How to Put Docker into Production with Persistent Storage


As the adoption of application containers and orchestration platforms such as Docker and Kubernetes continues its explosive growth, developers and IT departments are looking for effective ways to provide persistent storage for stateful applications in production.

Typically, applications and data are kept separate: server-side applications can be horizontally scaled to increase performance, cloned after a failure, or removed when no longer needed. Storage resources are decoupled from the application, and their lifecycle is independent. While many application components can be designed to be stateless, this is almost never true of an entire application. The classic example is a database: an application that may not be on the bleeding edge of application delivery, but one that may still run in Docker containers. Docker, therefore, needs a way to let applications store and manage persistent data.

One solution for adding persistent data storage to an application is leveraging object storage such as S3. However, S3 does not suit all data types: a typical database, for example, lives on a file system and doesn't fit the S3 model.

To solve the problem, Docker introduced volumes. A Docker volume adds a data directory to an otherwise ephemeral container; ephemerality here means that an application running in a Docker container can be removed or redeployed at any time. In practice, a storage volume is attached to a container at start-up, either with the -v option or via the Dockerfile's VOLUME instruction. The volume is managed independently of the container, so it isn't deleted automatically when the container it is attached to goes away. While volumes are a native Docker feature, the native implementation is very limited.
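The lifecycle described above can be sketched with the Docker CLI (the volume, container, and image names here are illustrative):

```shell
# Create a named volume, managed independently of any container
docker volume create app-data

# Start a container with the volume mounted at /var/lib/data
docker run -d --name db -v app-data:/var/lib/data postgres:16

# Removing the container does not remove the volume...
docker rm -f db

# ...so the data survives and can be attached to a replacement container
docker run -d --name db2 -v app-data:/var/lib/data postgres:16
```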

The most important thing to know about Docker volumes is that, out of the box, Docker only supports local volumes, stored on the file system of the server where the container is deployed. This may be acceptable for development and testing environments, but it's a show-stopper for production environments. This is because:

  • Storage coupled with the original host will be lost if the server is rebuilt.
  • Management and data protection are hard to implement with local volumes.
  • In a multi-server environment (with Swarm or Kubernetes), the application owner has little control over which server will be chosen to deploy a new container.
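You can see this host coupling directly: inspecting a volume created with the default driver shows a mount point on the host's own file system (the volume name is illustrative):

```shell
docker volume create app-data
docker volume inspect app-data
# The "Driver" field is "local" and "Mountpoint" is a path such as
# /var/lib/docker/volumes/app-data/_data on this specific host:
# if the host is rebuilt, that directory and its data are gone.
```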

For many, relying on local storage is not a viable solution. Attaching an NFS share as a mount point can work for a Swarm/Kubernetes cluster initially, but as production scaling begins, a flexible, manageable storage solution is required.
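As a stopgap, Docker's built-in local driver can mount an NFS export as a named volume, so that containers on any node with network access to the NFS server share the same data (the server address and export path below are placeholders):

```shell
docker volume create \
  --driver local \
  --opt type=nfs \
  --opt o=addr=nfs.example.com,rw \
  --opt device=:/exports/app-data \
  nfs-data

# Containers on any node reachable from the NFS server
# can now mount the same data
docker run -d -v nfs-data:/var/lib/data alpine:3 sleep infinity
```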

For Docker users, there are alternatives. A good solution can come in the form of software-defined storage that runs on the same hosts as the application containers and addresses all of Docker's storage needs. For example, you can connect a solution like Virtuozzo Storage through its Docker-certified plug-in, which provides an effective, high-performance storage backend for Docker volumes as an alternative to native local volumes. The plug-in approach adds much-desired flexibility, letting Docker users pick the storage solution most appropriate for their needs.
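In general, a volume plug-in is installed once per host and then selected per volume with --driver (the plug-in name below is a placeholder; check your vendor's documentation for the real one):

```shell
# Install a third-party volume plug-in (name is illustrative)
docker plugin install example/sds-volume-plugin

# Create a volume backed by the plug-in instead of the local driver
docker volume create --driver example/sds-volume-plugin shared-data

# Use it like any other volume; the data now lives in the
# software-defined storage cluster, not on this host's disks
docker run -d -v shared-data:/var/lib/data alpine:3 sleep infinity
```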

When leveraging a more complete software-defined storage solution to meet persistent storage needs, you also gain the opportunity to re-imagine how your company manages its storage resources. Below are just some of the higher-level benefits of a software-defined storage solution:

  • High performance: faster than Ceph.
  • Extremely cost-efficient: works with commodity off-the-shelf hardware.
  • Massively scalable: store petabytes of data.
  • Easy to manage: stay in control with an easy-to-use web-based management interface.

When it comes to persistent storage for app containers, success rests on two key technologies: SSD caching and journaling, which deliver a significant performance increase to the storage cluster with as little as one SSD per four HDDs; and automated load balancing, which keeps "hot" data responsive by moving it to less utilized disks, local disks, or SSDs, maximizing the utilization of all storage resources.

Once the Docker volume plug-in is in place, all the benefits of software-defined storage can be realized, in addition to persistent storage for application containers. Docker Swarm users get a production-ready solution that combines storage and compute, eliminating the need for dedicated storage hardware. At the same time, storage capacity and performance grow linearly with deployment size, ensuring that storage is never a bottleneck in the container orchestration platform.
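In a Swarm deployment, the same plug-in can be specified per service, so that every replica, wherever the scheduler places it, mounts its volume from the shared storage backend (the plug-in, volume, and service names are illustrative):

```shell
docker service create \
  --name web \
  --replicas 3 \
  --mount type=volume,source=shared-data,destination=/var/lib/data,volume-driver=example/sds-volume-plugin \
  nginx:1.27
```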

See how you can get started today with a certified Docker Plug-in solution for persistent storage – learn more here.