Containers and persistent storage

Containers are a method of operating system virtualization that allows you to run an application and its dependencies in resource-isolated processes. Containers let you package an application’s code, configurations, and dependencies into easy-to-use building blocks that deliver environmental consistency, operational efficiency, developer productivity, and version control. Containers are immutable, which helps you deploy applications in a reliable and consistent way, independent of the deployment environment.

As containers continue to rise in popularity beyond the developer populace, the way these constructs are used becomes increasingly varied. Especially (but not exclusively) in light of enterprise applications, the question of persistent storage comes up more and more. It is a fallacy to think that only stateless applications can or should be containerized. If you take a look at https://hub.docker.com/explore/ you’ll see that about half of the most popular applications on Docker Hub are stateful, databases for example. (Post hoc ergo propter hoc?) If you think about monolithic applications versus micro-services, a monolithic application typically requires state; if you break this application up into micro-services, some of these services can be stateless containers but others will require state.

I’ll mainly use Docker as the example for this post, but many other container technologies exist, like LXD, rkt, and OpenVZ; even Microsoft offers containers with Windows Server Containers, Hyper-V isolation, and Azure Container Service.

Running a stateless container using Docker is quite straightforward:

$ docker run --name demo-mysql mysql

When you execute docker run, the container process that runs is isolated in that it has its own file system, its own networking, and its own isolated process tree separate from the (local or remote) host.

A Docker container is created from a read-only template called a Docker image. The “mysql” part of the command refers to this image, i.e. the containerized application that you want to run, which is pulled from the registry. The data you create inside a container is stored on a thin writable layer, called the container layer, that sits on top of the stack of read-only layers, called the image layers, present in the base Docker image. When the container is deleted the writable layer is deleted with it, so your data does not persist. In Docker, the storage driver is responsible for enabling and managing both the read-only image layers and the writable layer; both read and write speeds through the storage driver are generally considered slow.
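
To make the layering a bit more tangible, here is a minimal Python sketch of the idea. It is only a toy model, not how a real storage driver works: the file paths, contents, and the start_container helper are all made up for illustration. Python’s ChainMap happens to behave much like a stack of layers, because lookups fall through from top to bottom while writes only ever land in the top mapping.

```python
from collections import ChainMap

# Read-only image layers, lowest first (file path -> content).
# Paths and contents are invented for this illustration.
base_layer = {"/etc/os-release": "debian", "/usr/bin/mysqld": "mysqld binary"}
config_layer = {"/etc/mysql/my.cnf": "default config"}


def start_container(*image_layers):
    """Return a container view: a fresh writable layer over the image layers."""
    writable = {}
    # ChainMap searches the first mapping first and sends all writes to it,
    # so the image layers underneath are never modified.
    return ChainMap(writable, *reversed(image_layers))


container = start_container(base_layer, config_layer)
container["/var/lib/mysql/data.ibd"] = "rows"      # new file, writable layer only
container["/etc/mysql/my.cnf"] = "tuned config"    # shadows the image's copy

print(container["/etc/mysql/my.cnf"])      # served from the writable layer
print(config_layer["/etc/mysql/my.cnf"])   # the image layer is unchanged

# Deleting the container discards its writable layer; a new container
# started from the same image sees none of the earlier writes.
replacement = start_container(base_layer, config_layer)
print("/var/lib/mysql/data.ibd" in replacement)
```

The last line prints False, which is exactly the persistence problem described above: everything written to the container layer is gone once the container is.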

Assuming you want persistent data for your containers, there are several ways to go about this. You can add a storage directory to a container’s virtual filesystem and map that directory to a directory on the host server. The data you create inside that directory in the container is saved on the host, allowing it to persist after the container shuts down. This directory can also be shared between containers. In Docker this is made possible by volumes. You can also use bind mounts, but these depend on the directory structure of the host machine, whereas volumes are completely managed by Docker itself. Keep in mind, though, that these volumes don’t move with container workloads as they are local to the host. Alternatively you can use volume drivers (Docker Engine volume plugins) to store data on remote systems instead of on the Docker host itself. If you are only interested in storing data in the container’s writable layer (i.e. on the Docker host itself), Docker storage drivers handle that, and the driver you pick determines which filesystems are supported.

Typically you would create a volume using the volume driver of your choice in the following manner:

$ docker volume create --driver=pure -o size=32GB testvol1

And then start a container and attach the volume to it:

$ docker run -ti -v testvol1:/data mysql
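
The difference between a volume and a bind mount shows up in the source part of the -v argument: a plain name such as testvol1 refers to a volume, an absolute host path makes it a bind mount, and a lone container path gives you an anonymous volume. The small Python sketch below classifies a -v argument along those lines. The classify_mount helper and the /srv/mysql path are hypothetical, and the sketch assumes Linux-style paths only.

```python
def classify_mount(spec: str) -> str:
    """Classify the source part of a Linux-style `docker run -v` argument."""
    parts = spec.split(":")
    if len(parts) == 1:
        # Only a container path was given, e.g. "/data".
        return "anonymous volume"
    source = parts[0]
    # An absolute host path means a bind mount; anything else is a volume name.
    return "bind mount" if source.startswith("/") else "named volume"


for spec in ("testvol1:/data", "/srv/mysql:/data", "/data"):
    print(f"-v {spec:<18} -> {classify_mount(spec)}")
```

In the command above, testvol1:/data is therefore a named volume, handled by whichever volume driver created it.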

Storage Vendors and Persistent Container Storage

Storage vendors have an incentive to make consuming their particular storage as easy as possible for these types of workloads, so many of them are providing plug-ins to do just that.

One example is Pure Storage, which provides a Docker Volume Plugin for its FlashArray and FlashBlade systems. Currently it supports Docker, Swarm, and Mesos. Most other big name storage vendors also have plugins available.

Then there are things like REX-Ray, an open source storage management solution that was born out of the now defunct {code} by Dell EMC team. It allows you to use multiple different storage backends and serve those up as persistent storage for your container workloads.

On the virtualization front, VMware has something called the vSphere Docker Volume Service, which consists of two parts: the Docker Volume Plugin and a vSphere Installation Bundle (VIB) to install on the ESXi hosts. This allows you to serve up vSphere Datastores (be it Virtual SAN, VMFS, or NFS based) as persistent storage to your container workloads.

Then there are newer companies that focus solely on providing persistent storage for container workloads; one of them is Portworx. Portworx wants to provide another abstraction layer between the storage pool and the container workload. The idea is that they provide a “storage” container that can then be integrated with the “application” containers. You can do this manually, or you can integrate with a container scheduler like Docker Swarm, using Docker Compose for example (Portworx provides a volume driver).

Docker itself has built specific plugins as well; Cloudstor is one such volume plugin. It comes pre-installed and pre-configured in Docker swarms deployed through Docker for AWS. Data volumes can be backed by either EBS or EFS. Workloads running in a Docker service that require access to low latency/high IOPS persistent storage, such as a database engine, can use a “relocatable” Cloudstor volume backed by EBS. When multiple swarm service tasks need to share data in a persistent storage volume, you can use a “shared” Cloudstor volume backed by EFS. Such a volume and its contents can be mounted by multiple swarm service tasks since EFS makes the data available to all swarm nodes over NFS.

Container Orchestration Systems and Persistent Storage

As most enterprise production container deployments will utilize some container orchestration system, we should also look at how external persistent storage is managed at this level. Kubernetes, for example, supports a volume plugin system (FlexVolumes) that makes it relatively straightforward to consume different types of block and file storage. Additionally, Kubernetes recently started supporting an implementation of the Container Storage Interface (CSI), which should help accelerate vendor support for these storage plug-ins. Volume plugins are currently part of the core Kubernetes code and ship with the core Kubernetes binaries, meaning that vendors wanting to add support for their storage system to Kubernetes (or even fix a bug in an existing volume plugin) must align themselves with the Kubernetes release process. With the adoption of the Container Storage Interface, the Kubernetes volume layer becomes extensible: third party storage developers can now write and deploy volume plugins exposing new storage systems in Kubernetes without having to touch the core Kubernetes code.

When using CSI with Docker, it relies on shared mounts (not Docker volumes) to provide access to external storage. With a mount, the external storage is mounted into the container; with volumes, a new directory is created within Docker’s storage directory on the host machine, and Docker manages that directory’s contents.

To use CSI you will need to deploy a CSI driver; a number of storage vendors have these available in various stages of development. For example, there is a Container Storage Interface (CSI) Storage Plug-in for VMware vSphere.

Pre-packaged container platforms

Another way vendors are trying to make it easier for enterprises to adopt these new platforms, including solving for persistence, is by providing packaged solutions (i.e. making it turnkey). This is not new of course; not too long ago we saw the same thing happening with OpenStack through the likes of VIO (VMware Integrated OpenStack), Platform9, Blue Box (acquired by IBM), etc. The public cloud providers, meanwhile, are moving more towards container as a service (CaaS) models with Azure Container Service, Google Container Engine, etc.

One example of a packaged container platform is the Cisco Container Platform. This is provided as an OVA for VMware (meaning it provisions containers inside virtual machines, not on bare metal at the moment). Initially it is supported on their HyperFlex platform, which provides the persistent storage layer via a Kubernetes FlexVolume driver. It can then communicate externally via Contiv, including talking to other components on the HX platform like VMs that are running non-containerized workloads. For the load-balancing piece (between k8s masters for example) they are bundling NGINX, and for logging and monitoring they are bundling Prometheus (monitoring) and an ELK stack (logging and analytics).

Another example would be VMware PKS which I wrote about in my previous post.

Conclusion

Containers are ready for enterprise use today; however, there are some areas that could do with a bit more maturity, one of them being storage. I fully expect to see continued innovation and tighter integrations as we figure out the validity of these use-cases. A lot of progress has been made in the toolkits themselves, leading to the demise of earlier attempts like ClusterHQ/Flocker. As adoption continues, these frameworks and plugins will continue to mature.

Docker networking overview

Introduction

There are of course a lot of blog posts out there already regarding Docker networking. I don’t want to replicate that work; instead I want to provide a clear overview of what is possible with Docker networking today by showing some examples of the different options.

In general the networking piece of Docker, and arguably Docker itself, is still quite young, so things move fast and will likely change over time. A lot of progress has been made via the SocketPlane acquisition last year and its subsequent pluggable model, but more about that later.

Docker containers are ephemeral by design (pets vs cattle), which leads to several potential issues. Not the least of these is not being able to keep your firewall configuration up to date because IP address management is difficult. It’s also hard to connect to services that might disappear at any moment, and no, using DNS as a stopgap is not a good solution (DNS as a SPOF, don’t go there). Of course there are several options and methods available to overcome these things.

Single host Docker networking

You basically have four options for single host Docker networking: Bridge mode, Host mode, Container mode, and No networking.

Bridge mode (the default Docker networking mode)

The Docker daemon creates “docker0”, a virtual Ethernet bridge that forwards packets between all interfaces attached to it. All containers on the host are attached to this internal bridge through a pair of virtual interfaces: one becomes the container’s “eth0” interface and the other lives in the host’s namespace (think VRF). The container gets a private IP address assignment. To prevent ARP collisions on the local network, the Docker daemon generates the container’s MAC address from the allocated IP address. In the example below Docker assigns the private IP 172.17.0.1 to the container.
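
As a rough illustration of how a MAC address can be derived from an IP address, here is a small Python sketch. It assumes the scheme commonly seen on the default bridge, a fixed 02:42 prefix followed by the four bytes of the IPv4 address; treat that prefix as an assumption on my part rather than a guarantee for every Docker version. The mac_from_ipv4 helper is hypothetical.

```python
import ipaddress


def mac_from_ipv4(ip: str) -> str:
    """Derive a MAC address from an IPv4 address: 02:42 plus the four IP bytes."""
    octets = ipaddress.IPv4Address(ip).packed  # the 4 raw bytes of the address
    # 0x02 0x42 is the assumed fixed prefix; the rest mirrors the IP address.
    return ":".join(f"{b:02x}" for b in (0x02, 0x42, *octets))


print(mac_from_ipv4("172.17.0.1"))
print(mac_from_ipv4("172.17.0.2"))
```

Because IP addresses are unique on the bridge, MAC addresses built this way are unique too, which is what avoids the ARP collisions.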

docker1

Host mode

In this mode the container shares the networking namespace of the host, directly exposing it to the outside world. Because services inside the container bind directly to ports on the host, you have to coordinate port usage yourself, whereas in Bridge mode Docker can automatically assign host ports and map them to the container, making its services reachable. In the example below the Docker host has the IP 10.0.0.4 and, as you can see, the container shares this IP address.

docker2

Container mode

This mode forces Docker to reuse the networking namespace of another container. This is used if you want to provide custom networking from said container; it is, for example, what Kubernetes uses to provide networking for multiple containers. In the example below, the container whose namespace the subsequent containers will join has the IP 172.17.0.2 and, as you can see, the container being launched has the same IP address.

docker3
docker4

No networking

This mode does not configure networking. It is useful for containers that don’t require network access, but it can also be used to set up custom networking.
This is the mode Nuage Networks leverages pre-Docker 1.9 (more info here).
In the example below you can see that our new container did not get any IP address assigned.

docker5

By default Docker has inter-container communication enabled (--icc=true), meaning that containers on a host are free to communicate without restrictions, which could be a security concern. Communication to the outside world is controlled via iptables and ip_forwarding.

Multi-host Docker networking

In a real world scenario you will most likely end up using Docker containers across multiple hosts, depending on the needs of your containerized application. So now you need to build container networks across these hosts to have your distributed application communicate internally and externally.

As alluded to above, in March of 2015 Docker, Inc. acquired the SDN startup SocketPlane. That has given rise to Libnetwork and the Container Network Model, meant to be the default multi-host networking setup going forward.

Libnetwork

Libnetwork provides a native Go implementation for connecting containers. The goal of libnetwork is to deliver a robust Container Network Model that provides a consistent programming interface and the required network abstractions for applications.

One of the benefits of Libnetwork is that it uses a driver / plugin model to support many underlying network technologies while still exposing a simple and consistent network model to the end-user (common API). Nuage Networks leverages this model by having a remote plugin.

Libnetwork also introduces the Container Network Model (CNM) to provide interoperation between networks and containers.

The CNM defines a network sandbox, an endpoint, and a network. The Network Sandbox is an isolated environment where the networking configuration for a Docker container lives. The Endpoint is a network interface that can be used for communication over a specific network. Endpoints join exactly one network, and multiple endpoints can exist within a single Network Sandbox. And the Network is a uniquely identifiable group of endpoints that are able to communicate with each other. You could create a “Frontend” and a “Backend” network and they would be completely isolated.
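
The relationship between the three CNM building blocks can be sketched in a few lines of Python. This is purely a conceptual model and not libnetwork’s actual Go API: the class layout, the join and can_reach methods, and the web/app/db container names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Network:
    """A uniquely identifiable group of endpoints that can talk to each other."""
    name: str


@dataclass(frozen=True)
class Endpoint:
    """A network interface; it belongs to exactly one network."""
    name: str
    network: Network


@dataclass
class Sandbox:
    """Per-container network configuration; may hold several endpoints."""
    container: str
    endpoints: list = field(default_factory=list)

    def join(self, network: Network) -> Endpoint:
        # Joining a network creates a new endpoint inside this sandbox.
        endpoint = Endpoint(f"{self.container}@{network.name}", network)
        self.endpoints.append(endpoint)
        return endpoint

    def can_reach(self, other: "Sandbox") -> bool:
        # Two sandboxes can communicate only if they share at least one network.
        mine = {e.network for e in self.endpoints}
        theirs = {e.network for e in other.endpoints}
        return bool(mine & theirs)


frontend, backend = Network("Frontend"), Network("Backend")

web, app, db = Sandbox("web"), Sandbox("app"), Sandbox("db")
web.join(frontend)
app.join(frontend)
app.join(backend)   # one sandbox, two endpoints
db.join(backend)

print(web.can_reach(app))  # True: both are on Frontend
print(app.can_reach(db))   # True: both are on Backend
print(web.can_reach(db))   # False: no network in common
```

The app container sits on both networks through two endpoints in one sandbox, while web and db stay isolated from each other.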

Unifying Docker Container and VM networking

Introduction

Most environments are not homogeneous. Typically you have multiple types of workloads, and I believe this will only increase in the near future with the rise of containers, PaaS, VMs, bare metal, and so on. In this brief overview I want to demonstrate how you can connect Virtual Machines and Containers on the same overlay network in an automated manner via our SDN solution. This way, every time you spin up a new workload it will automatically get its network and security policy applied and behave like any other endpoint on the network.

Screen Shot 2015-11-06 at 13.01.15

Docker networking

There are multiple options for networking in Docker. Typically a container (running a specific service) is exposed externally by mapping an internal port to an external one. When you install Docker, it creates three networks automatically (bridge, none, and host); when you run a container you can use the --net flag to specify which network the container should run on. By default the Docker daemon connects containers to the bridge network. If you run ifconfig on the host you can see the bridge as part of the host’s network stack.

The none network adds a container to a container-specific network stack. This is what we will use in the case of Nuage Networks to connect the Docker container to our VRS (Open vSwitch).

The host network adds a container on the host’s network stack. You’ll find that the network configuration inside the container is identical to that of the host.

Nuage Networks and Docker containers

In the case of Nuage Networks we attach every container to a tenant (overlay) network, which is provided by our centralised management (VSD) and control (VSC) plane and configured on the Docker host in our VRS (Open vSwitch). This allows us to use our centralised networking and security policies, providing IP configuration, firewall rules, QoS, etc. If traffic leaves the Docker host it is encapsulated in VXLAN, so from a management point of view this is no different from how we work with Virtual Machines.

Demo

I’ve created an L2 network (called DockerSN below) in Nuage (synced to OpenStack) where I’m connecting both my containers and VM workloads. The subnet has a range of 192.168.200.0/24.

Screen Shot 2015-11-06 at 09.11.28

So when I spin up a new container on my Docker host and connect it to the Nuage VRS I’ll automatically get the policies from that construct applied.

Screen Shot 2015-11-06 at 09.23.22

So as you can see above, my new container (gloomy_jang) has gotten the IP address 192.168.200.190. If we go back to our Virtualized Services Architect interface we can see 2 containers (one I created earlier) and a VM attached to the same subnet (they could of course also be on separate subnets).

Screen Shot 2015-11-06 at 09.28.40

We can drill down on the newly created container and get all the network and security policy details:

Screen Shot 2015-11-06 at 09.29.48

We now have connectivity between our container workloads and VM (192.168.200.2).

# docker exec 152f5660e56a ping -c 3 192.168.200.2
PING 192.168.200.2 (192.168.200.2) 56(84) bytes of data.
64 bytes from 192.168.200.2: icmp_seq=1 ttl=64 time=0.650 ms
64 bytes from 192.168.200.2: icmp_seq=2 ttl=64 time=0.450 ms
64 bytes from 192.168.200.2: icmp_seq=3 ttl=64 time=0.450 ms
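
The reason this works without any routing is simply that both workloads sit in the same L2 subnet. A quick way to sanity-check that for the addresses used in this demo is Python’s ipaddress module; this is just a small illustrative check, not part of the Nuage tooling.

```python
import ipaddress

# Subnet and addresses as used in the demo above.
subnet = ipaddress.ip_network("192.168.200.0/24")
workloads = {
    "container gloomy_jang": "192.168.200.190",
    "VM": "192.168.200.2",
}

for name, ip in workloads.items():
    inside = ipaddress.ip_address(ip) in subnet
    print(f"{name}: {ip} in {subnet} -> {inside}")
```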

VMware AppCatalyst, Bonneville, and Photon.

VMware has lots and lots of customers, running lots and lots of workloads, both dev and test workloads and production workloads. You know, like, super duper important stuff that cannot, under any circumstance, break.

As the whole DevOps “movement” makes clear, developers and operations teams have different requirements and different responsibilities. Developers want speed, the operations teams want stability, and both are trying to respond to the demands of the business, which wants faster time to market. Gross oversimplification, I know, but since DevOps is getting a lot of attention lately you’ll have no problem digging up articles and blog posts on that.

Now, as VMware’s customers rightfully expect, the idea is to try and marry both worlds and make these new methods enterprise grade without losing the original benefits. I personally notice a lot of resistance: enterprise customers understand the benefits, and some internal teams are pushing hard to incorporate these methods, but a lot of people still seem to adopt a “let’s wait and see / it’s just another fad” attitude.

VMware has done a lot of work in the past on making its products more “DevOps” friendly, with things like vRealize CloudClient, a command-line utility that provides verb-based access with a unified interface across vCloud Automation Center APIs, and vRealize Code Stream, which provides release automation and continuous delivery to enable frequent, reliable software releases while reducing operational risks.

You don’t have meetings with other teams, you talk to their API instead.
-Adrian Cockcroft

And now at the recent DockerCon in San Francisco VMware announced the tech-preview of AppCatalyst and Project Bonneville.

VMware AppCatalyst is an API and Command Line Interface (CLI)-driven MacOS type 2 hypervisor (based on VMware Fusion but without the GUI, 3D graphics support, virtual USB support, and Windows guest support) that is purpose-built for developers, with the goal of bringing the datacenter environment to the desktop. Currently a technology preview, VMware AppCatalyst offers developers a fast and easy way to replicate a private cloud locally on their desktop for building and testing containerized and microservices-based applications. The tool includes Project Photon (already announced in April), an open source minimal Linux container host, Docker Machine and integration with Vagrant. Panamax and Kitematic support are planned in the near future. AppCatalyst uses MacOS as its host operating system (i.e., the user must use MacOS 10.9.4 or later as their host operating system to use AppCatalyst).

You can download the Tech Preview of AppCatalyst here. It comes with an installer, so it is pretty easy to get up and running. Once it is installed AppCatalyst does not appear under your Applications folder; instead you can use your Terminal to navigate to /opt/vmware/appcatalyst.

Screen Shot 2015-06-27 at 10.11.41

As mentioned above AppCatalyst comes pre-bundled with Project Photon – VMware’s compact container host Linux distribution. When you download AppCatalyst, you can point docker-machine at it, start up a Photon instance almost instantly (since there’s no Linux ISO to download), and start using Docker.

Another common use of the desktop hypervisor is with Vagrant. Developers write Vagrantfiles and then run vagrant up to bring up their deployment. Vagrant creates and configures virtual development environments; it can be seen as a higher-level wrapper around AppCatalyst. You can find the plugin for Vagrant here. (git clone https://github.com/vmware/vagrant-vmware-appcatalyst.git)

Since Project Photon is included in AppCatalyst it’s pretty easy to get started with deploying a Photon Linux Container Host.

appcatalyst vm create photon1
Info: Cloned VM from ‘/opt/vmware/appcatalyst/photonvm/photon.vmx’ to ‘/Users/filipv/Documents/AppCatalyst/photon1/photon1.vmx’

appcatalyst vm list
Info: VMs found in ‘/Users/filipv/Documents/AppCatalyst’
photon1

appcatalyst vmpower on photon1
2015-06-27T10:18:38.530| ServiceImpl_Opener: PID 2949
Info: Completed power op ‘on’ for VM at ‘/Users/filipv/Documents/AppCatalyst/photon1/photon1.vmx’

appcatalyst guest getip photon1
192.168.2.128

I can now SSH into the VM:

Screen Shot 2015-06-27 at 10.22.28

And securely launch a Docker container via Project Photon:

Screen Shot 2015-06-27 at 10.26.17

Screen Shot 2015-06-27 at 10.28.07

As mentioned before, the idea is to interface with AppCatalyst via REST API calls. You can enable this by first starting the appcatalyst-daemon and then going to port 8080 on your localhost.

Screen Shot 2015-06-27 at 10.30.31

Once the daemon is running we can start to make REST API calls, for example to retrieve the IP address of the Docker VM we previously created:

Screen Shot 2015-06-27 at 10.34.09

Since the last VMworld, VMware has been talking about this concept of containers and VMs being better together. This kinda led to a lot of discussion about the overhead of the hypervisor, each container needing its own OS, potential lock-in, etc. But again, this is where VMware is trying to marry Dev with Ops and make the use of containers feasible in the enterprise environment. Project Bonneville takes another step in this direction by making containers first class citizens on the vSphere hypervisor.

Bonneville orchestrates all the back-end systems: VM template (with Photon), storage, network, Docker image cache, etc. It can manage and configure native ESX storage and network primitives automatically as part of a container deploy.

Bonneville is a Docker daemon with custom VMware graph, execution and network drivers that delivers a fully-compatible API to vanilla Docker clients. The pure approach Bonneville takes is that the container is a VM, and the VM is a container. There is no distinction, no encapsulation, and no in-guest virtualization. All of the necessary container infrastructure is outside of the VM in the container host. The container is an x86 hardware virtualized VM – nothing more, nothing less.

Screen Shot 2015-06-27 at 15.22.23

Bonneville uses VMFork (Instant Clone / Project Fargo) to spin up a new VM every time a container is launched. By doing this the operations team now sees VM instances in its environment that it can treat, i.e. “operationalize”, just like regular Virtual Machines (Bonneville updates VM names and metadata fields for the container VMs it creates for full transparency in vCenter and any vSphere ecosystem products). The obvious added benefit is that each container might be a VM, but each container is not using a full blown Linux host OS to run. Instant Cloned VMs are powered on and fully booted in under a second and use no physical memory initially.

You can see a demo of Project Bonneville below: