The hype around containers may have reached a fever pitch, but enterprises weighing how many eggs to put in that basket should consider how well (or how poorly) containers meet their data storage needs before they take things any further.
Consider server virtualization and its impact on storage. Sure, running multiple servers as virtual machines (VMs) rather than on dedicated boxes translated into much better server utilization, but it also placed incredible demands on the underlying storage infrastructure. Instead of a single workload generating I/Os on a server, you had 10 VMs generating storage load, each with distinct -- and not necessarily complementary -- characteristics. Meanwhile, virtualization's killer app -- live migration -- requires some form of networked storage, pushing many shops to pricey storage area network (SAN) and network-attached storage (NAS) disk arrays.
As a variation on virtualization, containers have their own set of storage issues that must be understood and addressed -- persistence, performance, and integration with scheduling and orchestration systems, to name a few. But today's application containers are still relatively nascent, and the landscape changes daily. As the IT industry coalesces around containers as the foundation of next-generation application architecture, things that seemed like intractable problems yesterday may have already been solved.
Solving for statelessness
It doesn't take long for an IT professional to encounter a storage challenge when they start to poke around containers.
"If you're playing with Docker, these issues are going to be blatantly in your face right away," said Gou Rao, CTO at Portworx, a startup developing a container-native storage platform. "They're not subtle."
Take persistent storage, which, by default, containers do not support. From the app's perspective, a container looks like a full operating system, but that's a sleight of hand. Containers such as those popularized by Docker rely on an overlay file system that uses a copy-on-write process to store any updates to the container's root file system. When the container is deleted, those changes are lost.
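A minimal Docker CLI session makes that ephemerality concrete. This is a sketch, assuming a local Docker daemon and the stock busybox image; the container and file names are illustrative:

```shell
# Write a file inside a container's root file system.
# The write lands in the container's copy-on-write layer, not in the image.
docker run --name scratch busybox sh -c 'echo "important data" > /data.txt'

# Deleting the container discards its copy-on-write layer -- and the file.
docker rm scratch

# A fresh container from the same image starts from a clean slate:
docker run --rm busybox ls /data.txt
# fails: the file died with the first container
```

Stopping and restarting the same container preserves its layer; it's removal that throws the data away.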
In containers' early days, a lack of persistent storage wasn't expected to be much of an issue, because many containers run stateless applications such as Web apps, where the data is independent of the application, rather than "stateful" applications such as databases. Containers' tight association with microservices architectures -- decomposing large "monolithic" applications into individual components -- seemed to further exempt them from worrying about persistent data storage.
But the more things change, the more they stay the same, and data storage issues are vital for enterprises exploring containers for production deployments.
"When people are playing around with containers, it's easy to say, 'This is my architecture' until you start to ask, 'Where am I going to put my database?'" said Zachary Smith, CEO and founder at Packet, a bare metal cloud provider whose customers are big users of containers. Even if your environment is largely composed of stateless Web apps, "there's always a database in there somewhere."
And while it's always been possible for an application to store data in a Docker volume or Docker data container, those approaches have their own limitations, namely, limited support for external storage and difficulty sharing volumes between containers on different hosts.
Docker took an important step with version 1.9 last year, which added support for persistent storage through its volume plugin architecture.
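The plugin model slots into the same -v syntax developers already use for local volumes. A sketch, assuming Docker 1.9 or later; "mydriver" is a placeholder for whichever volume plugin is actually installed:

```shell
# A built-in named volume lives on the host, outside any one container,
# so data survives container deletion -- but it is tied to that host:
docker volume create --name appdata
docker run --rm -v appdata:/data busybox sh -c 'echo persisted > /data/note'

# With a volume plugin, the same workflow can target external storage
# (a SAN LUN, a cloud block device, a software-defined storage pool):
docker volume create --driver mydriver --name dbvol
docker run --rm -v dbvol:/var/lib/db busybox true
```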
Arguably, one of the most important Docker storage volume plugins is Flocker, an open source container data volume manager from ClusterHQ that makes it possible for external storage devices to provide persistent storage to containers. For instance, Swisscom has a platform as a service (PaaS) offering based on containers, OpenStack and EMC's ScaleIO software-defined storage stack. With Flocker, Swisscom can offer persistent storage as a sort of "sidecar to the PaaS" to save data for a number of applications, including Redis, MongoDB and MariaDB, said Marco Hochstrasser, Swisscom head of Application Cloud.
The Flocker driver is available for a number of storage devices (e.g., EMC XtremIO and NetApp ONTAP), software-defined storage platforms (e.g., Ceph, Hedvig), block storage services (AWS EBS, OpenStack Cinder and VMware vSphere), and the Docker Swarm, Google Kubernetes and Mesos cluster managers. And new platforms are being added all the time, said Mohit Bhatnagar, ClusterHQ vice president of products.
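In use, a Flocker-backed volume is requested by name at run time. A sketch, assuming the Flocker agent and its Docker plugin are installed on each host; the volume, container and image names are illustrative:

```shell
# Ask Flocker -- rather than the local disk -- to back /data/db:
docker run -d --name mongo \
    --volume-driver flocker \
    -v mongo-data:/data/db \
    mongo

# If the container is later started on another host, Flocker detaches
# the backing volume (or moves the data) and re-attaches it there, so
# /data/db follows the container around the cluster.
```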
Even with the advent of the volume plugin, the storage capabilities that today's container environments can consume are still limited, said Sheng Liang, co-founder and CEO at Rancher Labs, which makes operational infrastructure software for containers.
While the volume plugin is a good start, the Docker ecosystem will eventually expose additional capabilities beyond 'open,' 'close,' 'read' and 'write,' Liang said -- for example, how to take a snapshot or perform a backup.
Docker itself acknowledges that this isn't the last you'll hear from it about its storage volume plugin. Docker announced the plugin with version 1.7, but it wasn't until version 1.9 that the interface became generally available. "There was quite a bit of iterating on ease of use, partner flexibility, differentiation…" said Scott Johnston, Docker senior vice president for product management and design. "It was a good start, but that doesn't mean there's not more to do." For instance, Docker doesn't yet support live migration, he said, which could emerge as critical in production environments.
Then there's the container ecosystem to consider, said Alex Polvi, CEO at CoreOS, the force behind Rocket, a container engine that competes with Docker, and the Tectonic orchestration framework. "Even if you've implemented a volume manager at the container engine level, the cluster level is where it really matters," he said. As such, ops folks considering containers must explore how well cluster management systems such as Mesos, Docker Swarm and Kubernetes (on which Tectonic is based) map the containers they manage to their underlying storage volumes.
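Kubernetes, for example, expresses that mapping with persistent volume claims: a pod asks for storage abstractly, and the cluster binds the claim to a concrete volume. A sketch against a running cluster; the claim name and size are illustrative:

```shell
# A PersistentVolumeClaim requests storage by size and access mode;
# the cluster -- not the pod -- decides which disk actually backs it:
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
EOF

# Pods then mount the claim by name; wherever the scheduler places the
# pod, Kubernetes attaches the bound volume on that node.
```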
And while it's possible to run a containerized environment without a tool like Kubernetes, it's probably not a good idea if you plan on scaling the environment. "Things are more dynamic than ever," Polvi said. "Any storage system should be able to support [containers'] dynamic nature."
Running containers at scale is what savvy IT operations teams are planning for, said Itzik Reich, CTO at XtremIO, an EMC company that offers a scale-out all-flash storage platform. The company recently put together a demo for a large banking customer that consisted of XtremIO storage, the ClusterHQ Flocker driver, and Mesosphere/Marathon scheduling and orchestration. And while this customer is "at least two generations ahead of everyone else," they are actively evaluating container technologies, Reich said. "They want a working technology in their catalog -- they are not waiting around for the first order."
Brace yourself for storage strain
When those production workloads come to fruition, what kinds of strain can we expect them to place on storage?
Unlike traditional VMs, containers don't impose excessive strain on the underlying storage subsystem, said Docker's Johnston. "It's a different problem set with containers and virtualization," he said. Whereas VMs emulate the underlying hardware -- the CPU, I/O subsystem, etc. -- containers create isolation boundaries between resources. As such, containers offer "native performance" for both CPU and I/O, and only a small (5%) networking tax.
Organizations also tend to run many more containers per server than VMs, because containers share a single copy of the operating system. That translates to increased density, which, with storage systems, "always causes problems," said Portworx's Rao. And because containers take hardly any time to start up, resources need to be made available to them almost instantly.
In Portworx's case, its storage system minimizes challenges associated with density and bursty workloads, for example, by optimizing the system for containers' layered file system. Its software-defined storage stack is currently in beta, scheduled for availability this summer.
Likewise, Hedvig Inc., of Santa Clara, Calif., makes distributed, scale-out software-defined storage and believes its scalability is key for containerized environments -- as well as for next-generation applications like Cassandra and MongoDB that frequently run in those environments. "It's very important for customers that can't predict what their needs are three to five months out," said Rob Whiteley, Hedvig vice president of marketing. The system also provides advanced storage policies that can be applied at the container, application or VM level, and features an inherently multi-site architecture for built-in disaster recovery.
At Packet, the name of the game is providing container-friendly storage that is dead easy for customers to consume, Smith said. "We thought when we launched last year that customers would create their own storage, but most people are too lazy or too scared," he said. And yet, "people still want elastic block services with security and control." As such, Packet now offers its customers block storage from startup Datera, in two tiers ("slow and big" or "fast and small"), and leaves data placement and tiering to the Datera platform.
It's that level of abstraction and automation that may foreshadow containers' biggest impact on storage: not having to think about storage at all. Just as containers have freed developers from worrying about which version of Linux to build their application on, "as an app developer, you're not going to be thinking 'Now I need another terabyte with 700 IOPS,'" Smith said. "It's the kernel or workload manager that is going to deal with getting you the right storage."
Alex Barrett is editor in chief of Modern Infrastructure. Contact her at firstname.lastname@example.org.