There are essentially two ways to optimize memory resources in virtualization: Select the proper amount of memory to run a VM, and enable the appropriate features that optimize memory use for your workload and server.
To start, provision enough memory for the workload. Too little memory will force the system to use page files -- disk swapping -- excessively, which significantly degrades performance. It's better to provision too much memory than too little. The guest OS can use excess memory for file system caching, and some hypervisors can reclaim those caches for the workload if memory demands increase. However, excessive memory allocation also adds unnecessary VM memory overhead.
Interestingly, memory resources are often the gating factor in VM provisioning. This means that a typical server runs out of physical memory capacity before it runs out of processor capacity. As a result, memory is often overcommitted on virtualized systems, meaning the VMs are collectively allocated more memory than the server physically contains. Overcommitting memory works because most workloads exhibit varying amounts of memory utilization and rarely use the maximum amount of allocated memory all the time.
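The arithmetic behind overcommitment can be sketched in a few lines of Python. This is a minimal illustration rather than any hypervisor's actual accounting: the function names, the 64 GB host and the per-VM figures are all hypothetical values chosen for the example.

```python
def overcommit_ratio(vm_allocations_gb, host_physical_gb):
    """Return total allocated VM memory divided by host physical memory."""
    if host_physical_gb <= 0:
        raise ValueError("host_physical_gb must be positive")
    return sum(vm_allocations_gb) / host_physical_gb


def fits_in_physical(vm_memory_gb, host_physical_gb):
    """Return True if the given per-VM memory figures fit in host RAM."""
    return sum(vm_memory_gb) <= host_physical_gb


# Hypothetical host with 64 GB of RAM and four VMs allocated 96 GB in total.
HOST_GB = 64
allocated = [16, 16, 32, 32]
active = [6, 9, 20, 14]  # illustrative working sets, not measurements

ratio = overcommit_ratio(allocated, HOST_GB)
print(f"overcommit ratio: {ratio:.2f}")
print("allocations fit in RAM:", fits_in_physical(allocated, HOST_GB))
print("active memory fits in RAM:", fits_in_physical(active, HOST_GB))
```

A ratio above 1.0 means the host is overcommitted. In this made-up case the allocations exceed physical memory, but the memory the VMs actively use still fits, which is the situation overcommitment relies on.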
Memory management techniques
Because VMs rarely use all of their allocated memory at once, various memory management techniques can be used to dynamically reduce the physical memory provided to a VM, often by sharing physical memory in a variety of ways. For example, VMware ESXi 6.0 and later employs five different techniques to optimize memory: page sharing, memory ballooning, memory compression, swap-to-host cache and regular swapping. Administrators can opt to employ one or more of these techniques to enhance memory usage and support additional VMs on the server without adding more physical memory to the system.
Page sharing works because VMs often contain redundant content stored in memory pages -- small, fixed-size portions of physical memory. For example, VMs running identical OSes often hold many memory pages with the same content. Without page sharing, physical memory would have to hold every instance of that redundant content. With page sharing enabled, the hypervisor keeps a single copy of each redundant page and lets every VM that needs it reference that copy. Page sharing can work within a single VM -- weeding out redundant memory pages within that VM's own memory space -- and it can also work across VMs, effectively allowing all VMs to share the redundant memory pages. Think of page sharing as a type of data deduplication applied to virtualized memory.
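The deduplication analogy can be made concrete with a small Python sketch. It is a simplified model and not how any particular hypervisor implements page sharing: the `share_pages` function, the VM names and the page contents are hypothetical, and it identifies duplicate pages purely by a SHA-256 digest of their content.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes


def share_pages(vm_pages):
    """Deduplicate identical pages across VMs.

    vm_pages maps a VM name to a list of page contents (bytes).
    Returns (store, mappings): store holds one copy of each distinct page,
    keyed by content digest; mappings lists the digests each VM references.
    """
    store = {}
    mappings = {}
    for vm_name, pages in vm_pages.items():
        refs = []
        for page in pages:
            digest = hashlib.sha256(page).hexdigest()
            store.setdefault(digest, page)  # keep only the first copy
            refs.append(digest)
        mappings[vm_name] = refs
    return store, mappings


# Hypothetical page contents: a shared OS page, a zeroed page, app data.
kernel_page = b"K" * PAGE_SIZE
zero_page = bytes(PAGE_SIZE)
app_page = b"A" * PAGE_SIZE

vm_pages = {
    "vm-a": [kernel_page, zero_page, app_page],
    "vm-b": [kernel_page, zero_page, zero_page],
}

store, mappings = share_pages(vm_pages)
total_pages = sum(len(pages) for pages in vm_pages.values())
print(f"pages without sharing: {total_pages}")
print(f"pages with sharing: {len(store)}")
```

Here the repeated zero page inside vm-b shows sharing within a single VM, while the kernel page shows sharing across VMs. A real implementation would also need to give a VM its own private copy of a shared page as soon as that VM writes to it.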
Memory ballooning is a technique used to reclaim low-priority memory from a VM when there is little free memory left on the host system. A balloon driver running inside the guest OS claims memory at the hypervisor's request, which prompts the guest to give up the memory pages it considers least important. The hypervisor can then reassign the physical memory behind the balloon to other VMs. Content the guest still needs isn't discarded but rather shifted to the guest's existing disk swap -- memory page swap -- file on disk. This allows pages to be recovered and used if need be.
Another way to optimize memory is to use disk swap features. Disk swap -- memory page swap -- techniques use disk space to make up for physical memory shortages, which keeps workloads running reliably when memory is tight. However, disk swapping is far slower than memory access, and excessive disk swapping can have a noticeable impact on workload performance. Modern servers can certainly include enough physical memory to make disk swapping unnecessary, but the push to optimize system utilization can still tax physical memory capacity to the point where disk swapping remains a necessary fallback. When disk swapping is still required, memory compression techniques can reduce the number of memory pages that must be swapped, which lessens the performance impact. Compressing a page and keeping it in memory typically imposes less overhead -- performance impact -- than swapping that page to disk.
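The choice between compressing a page and swapping it can be sketched as follows. This is an illustrative model only: the `reclaim_action` function is hypothetical, zlib stands in for whatever compressor a hypervisor uses, and the rule that a page must shrink to half its size or less to be worth keeping in memory is an assumption made for the example.

```python
import os
import zlib

PAGE_SIZE = 4096
COMPRESS_LIMIT = PAGE_SIZE // 2  # assumed threshold: keep only pages that halve


def reclaim_action(page):
    """Decide how to reclaim a page: compress it in memory or swap it out.

    Returns a (action, stored_bytes) tuple.
    """
    compressed = zlib.compress(page)
    if len(compressed) <= COMPRESS_LIMIT:
        return "compress", len(compressed)
    return "swap", len(page)


zero_filled = bytes(PAGE_SIZE)            # highly compressible content
random_filled = os.urandom(PAGE_SIZE)     # effectively incompressible content

print(reclaim_action(zero_filled)[0])
print(reclaim_action(random_filled)[0])
```

Pages with repetitive content stay in memory in compressed form, and only pages that do not compress well fall through to the slower swap path.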
Typical disk swapping relies on hard disk drives, but HDDs are notoriously slow and can impose significant performance penalties on applications that depend on conventional disk -- page -- swapping. Administrators can instead create a high-performance host cache on solid-state drives and use that host cache as a swap destination. The swap-to-host cache feature offers far lower latency and has less impact on workloads. If a swap-to-host cache isn't available or hasn't been configured, the hypervisor will swap to a regular disk -- page -- swap file on local disk.
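The fallback order described above can be expressed as a short Python sketch. The `HostCache` class and `swap_destination` function are hypothetical names, not part of any hypervisor API, and treating a full host cache the same as a missing one is an assumption added for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostCache:
    """Hypothetical SSD-backed host cache, measured in pages."""
    capacity_pages: int
    used_pages: int = 0

    def has_room(self) -> bool:
        return self.used_pages < self.capacity_pages


def swap_destination(cache: Optional[HostCache]) -> str:
    """Pick where a swapped-out page goes.

    Prefer the host cache; fall back to the regular swap file when no
    cache is configured or (by assumption) when the cache is full.
    """
    if cache is not None and cache.has_room():
        cache.used_pages += 1
        return "host-cache"
    return "swap-file"


cache = HostCache(capacity_pages=2)
destinations = [swap_destination(cache) for _ in range(3)]
print(destinations)
print(swap_destination(None))
```

With a two-page cache, the first two pages land on the SSD-backed cache and the third goes to the regular swap file. A host with no cache configured sends every page to the regular swap file.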