It has long been recognized that the standard hypervisor approach to virtualizing a server has a basic flaw in its architectural premise: it requires each virtual machine (VM) to run a separate operating system instance. Hypervisors are designed to let any operating system run in a VM, which allows for a greater degree of flexibility. It also means that Windows instances can exist alongside Linux instances on the same machine.
At the scale found in cloud providers, it becomes apparent that there is no real need to mix OSes on any given server. There are so many instances that segregating them by OS doesn't reduce flexibility. With this realization comes the understanding that the hypervisor method wastes a great deal of memory and I/O cycles, since hundreds of copies of the OS could exist on a single server.
The container approach
The idea that we can live with a single shared copy took a while to reach the market. This is the container approach, which allows the OS and any applications to be shared. The resulting savings in DRAM allow many more instances to exist on any given server, often three to five times the instance count possible with hypervisors.
With containers running within that single copy, we lose one of the protections Intel built into the hardware. Multi-tenancy requires barriers to keep instances out of the memory space of other instances. This logical separation adds a degree of security, ensuring that if one VM is compromised, other VMs on the same host are not also at risk. Had this feature not been available in hypervisor-based systems, the cloud would never have grown to its current size.
Intel provides hardware assists in its processors to solidify multi-tenancy. Unfortunately, moving to containers meant these assists could no longer be used, leaving the containers exposed to boundary-crossing exploits.
The Docker daemon runs as root, and changing that requires major modifications to Docker. In the meantime, the workarounds include running the containers inside VMs, placing control of the Docker daemon in the hands of trusted users only, and having the daemon listen on a UNIX socket. This is assisted by the recent addition of a user namespace feature, which allows IT to separate access privileges for containers and the Docker daemon, preventing the containers from gaining root access on the host.
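As a rough illustration of the user namespace feature, the sketch below builds a Docker daemon configuration that turns on user namespace remapping through the `userns-remap` setting. The helper names are hypothetical and exist only for this example; the value `default` asks Docker to use its own unprivileged remapping user, and the rendered JSON would normally be placed in the daemon's configuration file (commonly `/etc/docker/daemon.json`) before restarting the daemon.

```python
import json


def build_daemon_config(remap_user: str = "default") -> dict:
    """Return a daemon configuration dict that enables user namespace remapping.

    "userns-remap" is a Docker daemon setting; "default" tells Docker to
    create and use its own unprivileged user for the remapping.
    """
    return {"userns-remap": remap_user}


def render_daemon_json(config: dict) -> str:
    """Serialize the configuration in the JSON form the daemon reads."""
    return json.dumps(config, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Print the JSON so it can be reviewed before it is written anywhere.
    print(render_daemon_json(build_daemon_config()))
```

With remapping enabled, root inside a container maps to an unprivileged user on the host, which is the separation of privileges described above.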
Putting containers and hypervisors together
In May 2015, Intel brought Clear Containers to the market. These provide a very streamlined hypervisor designed to host the containers. With an overhead of just 10 to 20 MB per instance, we get back the protection that hypervisors provide without the space burden of running multiple copies of the OS stack. At the same time, Linux DAX zero-copy sharing between the host and guest, together with kernel samepage merging, facilitates access to the OS image in DRAM.
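To put the 10 to 20 MB figure in perspective, here is a small back-of-the-envelope calculation. The per-instance range comes from the text above; the instance count and the 512 MB footprint for a full guest OS are purely hypothetical numbers chosen for illustration, not measurements.

```python
def overhead_range_mb(instances: int, low_mb: int = 10, high_mb: int = 20) -> tuple:
    """Total lightweight-hypervisor overhead for a number of instances, as (low, high) in MB."""
    return (instances * low_mb, instances * high_mb)


def memory_saved_mb(instances: int, full_guest_mb: int, per_instance_mb: int) -> int:
    """Memory saved versus giving every instance its own full guest OS.

    full_guest_mb is an assumed figure supplied by the caller, not a measured one.
    """
    return instances * (full_guest_mb - per_instance_mb)


if __name__ == "__main__":
    low, high = overhead_range_mb(100)
    print(f"100 instances: {low} to {high} MB of hypervisor overhead")
    # Hypothetical 512 MB guest OS compared with the worst-case 20 MB overhead.
    print(f"Saved vs. full guests: {memory_saved_mb(100, 512, 20)} MB")
```

Under these assumptions, 100 instances cost between 1,000 and 2,000 MB of overhead in total, a small fraction of what 100 full OS copies would consume.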
Docker images are also a point of attack. These are build templates for the container, which are interpreted by the Docker daemon running as root. Again, there is an opportunity for exploitation, so Docker has recently released Docker Content Trust, which is designed to verify the validity of an image. This involves hardware authentication of the image using Notary, an open source tool, and The Update Framework to validate the content and verify who published it.
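In practice, Docker Content Trust is switched on for the Docker client by setting the `DOCKER_CONTENT_TRUST` environment variable to `1`. The following is a minimal sketch of a wrapper that does this for an image pull; the function names and the image name in the usage note are hypothetical, and the code assumes the `docker` CLI is installed and on the path.

```python
import os
import subprocess


def content_trust_env(base_env=None) -> dict:
    """Return a copy of the environment with Docker Content Trust enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["DOCKER_CONTENT_TRUST"] = "1"
    return env


def pull_command(image: str) -> list:
    """Build the argument list for pulling an image."""
    return ["docker", "pull", image]


def pull_with_trust(image: str) -> None:
    """Pull an image with content trust enforced; raises if the pull fails."""
    subprocess.run(pull_command(image), env=content_trust_env(), check=True)
```

Calling `pull_with_trust("example/app:1.0")` would then refuse an image tag that has no valid signature, rather than silently pulling it.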
Docker also has an official repository where independent software vendors can present safe images to users. These images can be accessed via the Docker Hub site, allowing organizations to check them against their container security policies before use. This significantly increases the protection that Docker users enjoy, since these images are from known sources, fully validated, characterized from a security viewpoint and tested as an entity.
What does this mean for Linux container security?
Taken together, these improvements should ideally make containers as safe as hypervisors. However, the container approach is very new and still evolving. Only time will tell, but there's hope because containers have some advantages.
First and foremost, containers are much easier to update than traditional approaches, meaning software gets updated properly and quickly. Testing the result is also easier, so attacks exploiting old code should become far less likely. Outdated code is a critical weakness in many large clusters today, since updating involves different teams and is often disjointed and late.
Another major protection comes from containers being isolated from each other and from physical devices. This reduces the attack surface considerably. It's also good practice to use read-only file systems for images and other data wherever possible. Though that's true in all computing, the container approach shares image data more often, allowing for tighter control over fewer images.
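One way to apply the read-only advice with Docker is the `--read-only` flag on `docker run`, combined with `--tmpfs` for the few paths that must stay writable and the `:ro` suffix on volume mounts. The sketch below only assembles the command line; the function name, image name and paths are hypothetical examples.

```python
def read_only_run_args(image, command=(), tmpfs_paths=("/tmp",), read_only_mounts=None):
    """Build a `docker run` argument list with a read-only root file system.

    tmpfs_paths: container paths that get a writable in-memory file system.
    read_only_mounts: mapping of host path -> container path, mounted read-only.
    """
    args = ["docker", "run", "--rm", "--read-only"]
    for path in tmpfs_paths:
        args += ["--tmpfs", path]
    for host_path, container_path in (read_only_mounts or {}).items():
        # The ":ro" suffix makes the mounted data read-only inside the container.
        args += ["--volume", f"{host_path}:{container_path}:ro"]
    args.append(image)
    args.extend(command)
    return args


if __name__ == "__main__":
    print(" ".join(read_only_run_args(
        "example/app:1.0", ["serve"],
        read_only_mounts={"/srv/config": "/etc/app"},
    )))
```

Keeping the writable paths explicit makes it easier to review exactly where a compromised container could leave files behind.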
Linux container security in Docker continues to evolve, as does the underlying container approach. Compared to the evolution of hypervisors, the Docker roadmap appears very focused and crisp. Also, provided that containers are built on a hypervisor such as Clear Containers, they look to be as robust as standard hypervisor virtualization and enjoy superior security control.