The seemingly simple question of how many cores each CPU in your virtual servers should have has a number of complex dimensions. First, CPUs tend to be underutilized in virtual environments. Second, memory size and speed have a bigger impact than CPU performance. Finally, virtual servers tend to be I/O-bound.
On the surface, those three points suggest that a low core count per CPU makes for the best server farms, and small might seem beautiful. The problem is the other dimensions of server design. These all have economic implications, which, in the end, might be the determining factors in choosing a configuration.
Dynamic RAM (DRAM) is expensive, and the latest, densest dual in-line memory modules (DIMMs) tend to carry a high premium over mainstream DIMMs. A larger number of cheaper DIMMs might be a better option, and we now also have the option of Optane or NAND non-volatile dual in-line memory modules (NVDIMMs), which give an effective memory expansion into the terabyte range.
Using NVDIMM and cheaper DRAM sticks means that the number of DIMM slots needs to increase. This implies a doubling of memory bandwidth in the system, and together with the capacity boost, we can load more instances onto that server.
Adding fast, non-volatile memory express (NVMe) solid-state drives (SSDs) to the server will dramatically boost the I/O rate per instance. This used to be an extremely expensive proposition, but NVMe has entered the consumer market and prices for the technology have generally fallen.
NVMe reduces OS overhead for I/O considerably, freeing up CPU and memory cycles in the process. Likewise, remote direct memory access (RDMA), which is starting to become ubiquitous in hyper-converged infrastructure (HCI) systems and will be a standard Ethernet feature in a couple of years, reduces overhead and latency between clustered virtual servers.
Taken together, the memory and I/O performance gains let us load virtual servers with many more instances, likely to the point that the CPUs become the bottleneck. This points to more cores per CPU to keep the server operating in balance.
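The balance argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Server` and `Instance` classes, the `max_instances` helper and every number in the usage note are hypothetical and not taken from any real product. The sketch treats CPU, memory and I/O as independent limits and reports which one caps the instance count.

```python
from dataclasses import dataclass


@dataclass
class Server:
    pcores: int        # physical cores across all CPUs in the server
    memory_gb: float   # usable memory (DRAM plus any NVDIMM expansion)
    iops: float        # sustained I/O operations per second


@dataclass
class Instance:
    vcores: int        # virtual cores allocated to one instance
    memory_gb: float   # memory allocated to one instance
    iops: float        # I/O rate one instance is expected to need


def max_instances(server, inst, vcores_per_pcore=1.0):
    """Return (instance_count, bottleneck) for a simple three-resource model.

    Assumption: each resource is divided evenly and independently, which
    ignores hypervisor overhead and bursty workloads.
    """
    limits = {
        "cpu": int(server.pcores * vcores_per_pcore // inst.vcores),
        "memory": int(server.memory_gb // inst.memory_gb),
        "io": int(server.iops // inst.iops),
    }
    bottleneck = min(limits, key=limits.get)
    return limits[bottleneck], bottleneck
```

With a hypothetical instance of `Instance(vcores=2, memory_gb=8, iops=2000)` and two vcores per pcore, a 16-core server with 64 GB and 20,000 IOPS is memory-bound at eight instances. Raising memory to 1,024 GB and I/O to 500,000 IOPS moves the limit to the CPU at 16 instances, which is the point at which more cores per CPU start to pay off.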
At this point in the discussion, it's worth looking at what an instance actually is. There is, of course, no "right" size. Instances come in a variety of configurations, allowing a matchup to application use cases. They range from one virtual core (vcore) per physical core (pcore) to multiple vcores per pcore, and can even reach one vcore per CPU. Memory and I/O allocations are also independent variables.
The trend in instances is toward larger DRAM and more I/O, coupled with a lower vcore ratio -- more CPU per instance. If you expect to service this class of instance in the next five years, a bigger server engine, with higher cores per CPU, probably makes sense.
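To see how the vcore ratio drives core count, here is a minimal sketch of the arithmetic. The function names, the two-socket default and the example figures are assumptions made for illustration, not values from any vendor.

```python
import math


def pcores_needed(instances, vcores_per_instance, vcores_per_pcore):
    """Physical cores required to host the given instances at a vcore ratio."""
    if vcores_per_pcore <= 0:
        raise ValueError("vcores_per_pcore must be positive")
    total_vcores = instances * vcores_per_instance
    return math.ceil(total_vcores / vcores_per_pcore)


def cores_per_cpu(pcores, sockets=2):
    """Cores each CPU must provide, assuming an even split across sockets."""
    return math.ceil(pcores / sockets)
```

For example, 64 hypothetical instances with two vcores each need 32 physical cores at four vcores per pcore, or 16 cores per CPU in a two-socket server. Dropping the ratio to one vcore per pcore raises that to 128 physical cores, or 64 cores per CPU in the same two-socket box.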
Containers and microservices
Containers, a relatively new approach to virtualization, look to supplant hypervisor-based instances. Typically, a server hosting containers needs less DRAM than a server running a hypervisor due to memory segment sharing, so a container host might see three to five times the instance count. This increase in instance count implies yet more CPU cores.
If we add in the move to microservices software architecture, where storage and networking service functions are converted to small, containerized modules and applications are also partitioned into microservices elements, the container count per server will jump again, perhaps significantly. Microservices approaches mean more state-swapping in the CPU. More performance and parallelism are needed, so again, more cores per CPU helps keep the balance.
Innovation in server architectures
All of this ignores the rumblings at the leading edge of server architectures. We've heard talk for around three years about Hybrid Memory Cube (HMC) modules that bring the CPU and a segment of DRAM into a tightly coupled module. This boosts DRAM speed dramatically, both because of the better electrical interface and because the architecture calls for many parallel channels to the DRAM.
HMC-like platforms are being touted by major vendors, with early versions limiting DRAM size to around 32 GB. This is enough to form a large intermediate cache between the L3 cache and DRAM, boosting effective memory performance to the point that more cores can be supported. Again, the conclusion is that more cores make sense.
Economic implications of adding more cores
Generally, fewer, more powerful servers are better economically than lots of small ones. They are cheaper to run and can effectively support 25 Gigabit Ethernet and 50 GbE links with RDMA. The underlying infrastructure of power supplies is better amortized, while running costs -- power, cooling and admin support -- come out cheaper, too.
In all these discussions, though, there is a sweet spot for the component choices. For example, cost per core might actually drop as cores per CPU goes up, but the top three or four CPUs in a product line are likely to be new and to carry a 50% to 100% premium. The same applies to memory and SSDs.
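One way to look for that sweet spot is to compare cost per core across a CPU lineup. The sketch below assumes a made-up catalog: the core counts and prices are invented purely to illustrate the shape of the curve and are not real list prices, and the helper names are hypothetical.

```python
def cost_per_core(price, cores):
    """Price divided by core count for a single CPU model."""
    return price / cores


def sweet_spot(catalog):
    """Return the (cores, price) entry with the lowest cost per core."""
    return min(catalog, key=lambda entry: cost_per_core(entry[1], entry[0]))


# Hypothetical lineup as (cores, price) pairs. The two largest parts carry
# an invented leading-edge premium; none of these figures are real prices.
CATALOG = [
    (8, 400.0),
    (16, 700.0),
    (24, 1000.0),
    (28, 1900.0),
    (32, 2600.0),
]

if __name__ == "__main__":
    for cores, price in CATALOG:
        print(f"{cores:>2} cores: {cost_per_core(price, cores):.2f} per core")
    print("sweet spot:", sweet_spot(CATALOG))
```

In this invented lineup, cost per core falls steadily up to the 24-core part and then rises sharply for the two newest models, so the sweet spot sits just below the leading edge. The same comparison could be run per gigabyte for memory and SSDs.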
If you avoid the leading-edge products, it would seem that higher cores per CPU and generally enriched servers make better sense than small servers for all except -- maybe -- popcorn-sized web servers.
Looking to the future
The future roadmaps for server architectures are a bit hazier than usual, due to debates about moving to a memory-centric framework, such as Gen-Z. Even so, a core count of up to 32 cores per CPU will be seen in 2018, and we can expect further growth if we expand HMC memory sizes to the terabyte level. Some vendors hint it will probably be a mix of DRAM and NAND in the module.
With high-core-count virtual servers and the HCI architecture, we'll see the footprint of the server storage farm shrinking dramatically for a given workload.
One caveat: Pay attention to network design and traffic flow. We are still pushing the load limits of LANs and WANs, even with the latest speeds and feeds. Encryption and compression are fast becoming required features for LANs, and this adds yet more CPU load, which means more cores.