Evaluation of container orchestration systems with respect to auto-recovery of databases

Eddy Truyen, Vincent Reniers, Dimitri Van Landuyt, Bert Lagaisse, Wouter Joosen
imec-DistriNet, KU Leuven
[email protected]

Matt Bruzek*
Ubuntu community member
[email protected]
ABSTRACT
Container orchestration systems, such as Docker Swarm, Kubernetes and Mesos, provide automated support for deployment and management of distributed applications as sets of containers. While these systems were initially designed for running load-balanced stateless services, they have also been used for running database clusters because of improved resilience attributes such as fast auto-recovery of failed database nodes and location transparency at the level of TCP/IP connections between database instances. These improved resilience attributes depend on four common features of container orchestration systems: (i) automated container eviction in response to node failure events, (ii) automated migration of persistent volumes, (iii) virtual networks for containers and (iv) stable network addresses for services. It is expected, however, that the use of these features incurs certain performance costs. In this paper we evaluate the overhead on response latency and resource consumption of these features for Docker Swarm and Kubernetes, with MongoDB as database case study. As the baseline for comparison, we use an OpenStack IaaS cloud in a closed lab where the above features are also available, although in a less automated manner. The first feature of automated container eviction introduces no significant performance overhead for Docker Swarm. Adding the feature of automated volume migration to Swarm, by using Flocker as volume plugin, introduces an 8.3% overhead with respect to the 0.95 quantile of response latencies. For the Kubernetes+Flocker deployment, we measured a combined overhead of 11.4% on response latency. The third and fourth features, which support location transparency, cause deteriorated scalability for replicated MongoDB clusters, but only for CPU-intensive workloads where data set sizes are smaller than the available memory.
ACM Reference format: Eddy Truyen, Vincent Reniers, Dimitri Van Landuyt, Bert Lagaisse, Wouter Joosen and Matt Bruzek. 2017. Evaluation of container orchestration systems with respect to auto-recovery of databases. In Proceedings of ACM/IFIP/USENIX Middleware 2017, Las Vegas, Nevada, USA, Dec 11-15, 2017 (Middleware 2017), 7 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

* The work on this paper was performed by the author when affiliated with Canonical.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. Middleware 2017, Dec 11-15, 2017, Las Vegas, Nevada, USA © 2017 Association for Computing Machinery. ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. . . $15.00 https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION

Recently, there has been increasing industry adoption of Docker containers for simplifying the deployment of software [14, 20, 29, 34]. Almost in parallel, container orchestration middleware, such as Kubernetes [6], Docker Swarm, Mesos Marathon and OpenShift 3.0 [20], has arisen, providing support for automated container deployment, scaling and management. NoSQL databases are an important technology trend for supporting the storage of various kinds of data structures with various trade-offs between consistency, availability and network-partition tolerance [5]. The configuration and maintenance of the data tier of a distributed software service is not an easy task. There are many configuration parameters with inter-dependencies. Ensuring consistency of these configuration parameters between various development, testing and production environments is an important problem in contemporary DevOps environments. Docker containers have the potential to overcome this problem by offering a portable and light-weight component that encapsulates a specific performance-tuned configuration of a NoSQL database node, referenced by a globally unique name. As such, Docker containers have the potential to become the new incarnation of the reusable software component for Internet-based services. More and more companies are considering running Dockerized NoSQL databases, not only in development environments, but also in cloud-based production environments [8, 9, 31]. In large-scale deployments, container orchestration systems are then a logical necessity for managing important non-functional requirements, such as resilience, resource cost-efficiency, and elasticity [6]. It has been shown that containerizing NoSQL databases yields a number of benefits [4, 25, 31]. More specifically, it has been shown that by means of Kubernetes, failure recovery of databases can become considerably more automated and the time to recover from a failed database instance can be reduced drastically [25, 31]. These two advantages ensue because of four common features of container orchestration systems: (i) automated node failure detection and automated container eviction, which means that when a node has been identified as failed, all containers on that node are restarted on other nodes after a configurable timeout, (ii) persistent volumes which can be dynamically detached from an evicted container and reattached to the new container, (iii) virtual networks for containers where each container has an IP address that can be reached without NAT, and (iv) a built-in service proxy such that each client-visible application service can be reached via the same IP address, even after a container is evicted to another node. While the first two features support fast recovery of failed database instances, the latter two features support location transparency for TCP/IP connections
between database instances even when they are migrated to another node. These features are common among many container orchestration systems, including Kubernetes, Docker Swarm and Mesos Marathon (Kubernetes v1.2+, the Swarm-integrated mode of Docker v1.12+ and Mesos Marathon v1.3+ comprise support for all four features), although little is known about the performance overhead of these features. In this paper, we quantify the performance costs of using a fully automated container orchestration solution for the above features and compare these costs to existing IaaS platforms such as OpenStack, AWS, Microsoft Azure and GCE. In these IaaS platforms, the above-described features are also supported, although in a less automated way. Virtual networks are commonly used in multi-tenant IaaS platforms, and it is possible to configure location transparency at the level of TCP/IP connections by means of attaching so-called elastic or floating IP addresses to virtual machines or load balancers. Moreover, it is possible to start up a virtual machine with an attached persistent volume, such that a given VM state can be persisted and migrated across different VM instances. Node failure detection is also a basic building block of any cloud platform, and VM failure detection can be offered to customers as a separate monitoring service, or integrated with a load-balancing service. This paper is structured as follows. Section 2 summarizes the existing work on performance evaluation of container technologies. Two contributions of the paper are then presented in the subsequent sections: Section 3 gives an up-to-date overview of how the above four features are supported in the most popular container orchestration systems, and Section 4 evaluates the performance overhead of these features for two container orchestration systems – Docker Swarm and Kubernetes – with MongoDB as database case study. As baseline, we use an OpenStack IaaS cloud in a closed lab. Section 5 concludes by discussing the experimental findings of Section 4.
2 RELATED WORK
Existing research [14, 15, 18, 28, 29] has focused most attention on comparing the performance of a single container against a single virtual machine, both deployed directly in Linux on top of a bare-metal machine. Sharma et al. [29] evaluate the so-called hybrid model where a containerized Redis instance, inside a VM, is compared to Redis directly installed in another VM; their results show that the hybrid model performs slightly better. Literature on the design and evaluation of container orchestration (CO) systems originates mostly from Google [6, 32]. The Borg system, and its predecessor Omega, are used for running all Google services. For example, the GCE IaaS uses Borg to schedule VMs inside containers. Verma et al. [32] show that Borg improves resource utilization in terms of the number of machines needed to fit a given workload. Kubernetes is the successor of Borg. Existing studies of containers consistently report and confirm that containers trade improved cost-efficiency for reduced security isolation in comparison to virtual machines [14, 30, 34]. For this reason, cloud providers continue to use a virtual machine as the key abstraction for representing a computing node in order to protect their infrastructural assets. However, little research exists on the performance evaluation of CO systems that are deployed on top of a virtualization-based cloud platform. Kratzke et al. [21, 22] have already studied the overhead of the virtual networking feature [21] and argue that operating container clusters with highly similar, high-core machine types masks the data-transfer-rate-reducing effects of containers in several different public cloud providers.
3 ARCHITECTURE, FEATURES AND TOOLS
Docker Swarm (Swarm), Kubernetes (K8s), Mesos, and OpenShift 3.0 are the most popular container orchestration systems for running production-grade services in private OpenStack clouds, according to a recent survey among OpenStack cloud administrators [1]. As OpenShift is based on K8s, we focus our attention on the other three CO systems. K8s and Swarm support deploying and managing service- or job-oriented workloads. The Apache Mesos Marathon system supports deploying groups of apps together and managing their mutual dependencies, whereas the Apache Aurora system, which is also based on Mesos, supports deploying long-running jobs and services. The underlying Mesos system supports scheduling different frameworks on top of a shared cluster of machines. For example, Marathon, Aurora, K8s and Hadoop could be deployed next to each other. Mesos then governs the coordinated allocation and withdrawal of computing resources to these different frameworks [19]. First, we describe the common architectural tactics underlying CO systems. Then we describe how the above four features are realized in the different systems. Subsequently, we describe how these features can be used for deploying database clusters. Finally, we describe which deployment tools for K8s and Swarm have been selected for evaluation.
3.1 Common architecture
A CO system typically follows a declarative configuration management approach, which implies that an application manager describes or generates a declarative specification of the desired state of the distributed application. The CO system then continuously adapts the deployment and configuration of containers until the actual state of the distributed application matches the described desired state. The core architectural pattern underlying K8s, Swarm and Mesos Marathon/Aurora is very similar: a container cluster consists of Master and Worker nodes. The Master offers an API for configuring the desired state of the distributed application. It uses a distributed key-value store (e.g. etcd, Consul, or ZooKeeper) for storing the actual configuration state of all deployed services. It then schedules one or more containers on the Worker nodes for running the services. On each Worker node, an agent interacts with the Master. This agent also comprises a service proxy and a virtual network component. To ensure high availability, it is possible to configure multiple nodes as master.
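As a minimal illustration of this declarative model, consider the following sketch (hypothetical names; the Deployment API group shown is the one used around K8s v1.4): the application manager submits only the desired state, and the Master's control loop reconciles the actual state against it.

```bash
# Submit a desired state: three replicas of a stateless service.
kubectl apply -f - <<'EOF'
apiVersion: extensions/v1beta1   # Deployment API group around K8s v1.4
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                    # desired state: three running containers
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.11
EOF
# If a node fails, the Master observes that the actual state (2 replicas)
# no longer matches the desired state (3) and schedules a replacement
# container on another node.
```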
3.2 Four features
1. Node failure detection and container eviction: All CO systems support tolerance against node fail-stop errors. When a node has been identified as down by the master and has not returned to a normally functioning mode within a configurable timeout, all containers on that failed node are evicted and will be restarted on another node.

2. Persistent volumes and automated volume migration: In all CO systems, containers are by default stateless; when a container dies or is stopped, any internal state is lost. Therefore, so-called persistent volumes can be attached to store the persistent state of the container. Persistent volumes can be implemented by services for attaching block storage devices to guest virtual machines such as Cinder, by distributed file systems such as NFS, by cloud services such as GCE Persistent Disk, or simply by a directory on the host machine. In this paper we use Cinder volumes. Cinder is the OpenStack service for attaching block-storage devices to VMs [26]. We also use Flocker, which is a generic volume manager for Dockerized applications with support for volume migration: when K8s or Swarm evicts containers from a failed node and restarts them on another node, Flocker automatically migrates the persistent volumes of the old containers to the new containers [27, 33] (see the sketch after this list).

3. Virtual networks for containers: In order to improve on the original port-to-port linkage model of Docker containers, K8s has introduced from its beginning an improved container networking model [13]: each container (more precisely, each Pod, a unit of interdependent, co-located containers) can be reached via a separate virtual cluster IP address, which simplifies service discovery and connection management [17]. Virtual network software is used to provide a layer-3 network between the containers and to control how network traffic between containers is transported between nodes of the cluster. In K8s, the Flannel overlay network software is an often-used and well-working solution [23]. Mesos v0.25+ [3] and Docker v1.12+ [16] also support IP addresses per container. Note that containers have transient IP addresses: when a container is stopped or migrated to another node, its cluster IP address is released. The Container Network Interface (CNI) project [11] is a noteworthy standardization initiative that aims to support a common interface between container engines/orchestrators on the one hand and various (virtual) network plugins that configure and manage the network interfaces of Linux containers on the other hand.

4. Services: CO systems allow exposing a running container to a client via a Service resource (Mesos Marathon supports the same concept but names it Application), which embodies a stable DNS name or stable cluster IP address, a network protocol and one or more ports. The DNS name and cluster IP address are called stable because they always remain the same, such that clients of the service can depend on location transparency at the level of TCP/IP connections. A service can also have an external IP address in order to enable access by external clients outside of the container cluster. This external IP address is supported either by an external load balancer or by so-called NodePorts, cluster-wide unique port numbers on which the service can be reached (Mesos has a similar concept, but names it a service port). On each node of the container cluster, requests to the cluster IP or NodePort of a Service are processed by a local service proxy, which forwards requests to the transient IP addresses of the containers of the service using a load-balancing algorithm. All CO systems support an internal DNS service that enables lookup of the cluster IP address of a service based on the name specified in its Service configuration. (A concrete Service definition is sketched at the end of this subsection.)
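The following sketch illustrates how features 2 and 3 can be exercised in the Swarm-integrated mode of Docker. Volume and network names are hypothetical, and the Flocker volume plugin is assumed to be installed and configured against the Cinder backend, as in our testbed.

```bash
# Feature 3: an overlay network gives each container a NAT-free address.
docker network create --driver overlay mongo-net

# Feature 2: a Flocker-managed volume, backed here by a Cinder block device.
docker volume create --driver flocker --name mongo-data -o size=10GB

# Run MongoDB as a Swarm service with the volume mounted at MongoDB's
# data directory. If the node fails and Swarm reschedules the container,
# Flocker re-attaches mongo-data to the new node (features 1 and 2).
docker service create --name mongo --replicas 1 --network mongo-net \
  --mount type=volume,source=mongo-data,target=/data/db,volume-driver=flocker \
  mongo:3.2
```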
Finally, it is also possible to attach HostPorts to containers so that clients can directly send network requests to a container without passing through the service proxies and the virtual network layer. This allows clients to use their own load balancer solution, and possibly avoid the known performance overheads of the virtual network layer [21, 22]. The CNI standard currently lacks support for HostPorts [2].
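As an illustration of feature 4, the following sketch (hypothetical names) defines a Kubernetes Service for a single MongoDB container; clients connect to the stable cluster IP, DNS name or NodePort, and the local service proxy forwards requests to the container's transient IP address.

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: mongodb-0            # also yields the stable DNS name "mongodb-0"
spec:
  type: NodePort             # additionally allocates a cluster-wide unique port
  selector:
    app: mongodb-0           # forwards to whichever pod carries this label
  ports:
  - port: 27017              # stable ServicePort on the cluster IP
    targetPort: 27017        # port of the (transient) container behind the proxy
EOF
```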
3.3 Support for database clusters
A well-documented benefit of the above four features for databases is faster auto-recovery [25, 31]: managing a separate persistent volume per database node drastically reduces the time to recover from a node failure, because the persistent volume of the database container on the failed node can be migrated to a newly started database container, which as a consequence does not need to retrieve all the data using MongoDB's replication protocol. Moreover, the other database replicas and external clients can continue to interact with the new database container via the existing Service IP address or DNS name of the old database container [31]. CO systems were, however, initially designed for deploying and managing stateless applications and therefore proved less suited for databases. The essence of the difficulty is that the built-in service proxy and replication scheme of CO systems conflict with the existing load-balancing and replication algorithms of database cluster software systems such as MongoDB or Cassandra. In order to overcome this problem, four constraints have to be taken into account for defining a correct container-based configuration of a database cluster: (1) Separate Service resources need to be defined for each database container in the cluster, such that each database container has a separate Service IP address. An alternative to Service IP addresses is registering the cluster IP address of each database container as an SRV record in the internal DNS service of the CO system. (2) Each database instance needs to be associated with a uniquely identified persistent volume. (3) Database instances are created and destroyed in an ordered fashion. (4) Database instances need to be placed on different nodes. Therefore, the cluster scheduler, which controls the placement of containers on nodes, needs to be configured accordingly. K8s version 1.5+ offers support for the concept of a StatefulSet, which is a higher-level configuration abstraction for automating the deployment of database clusters such that the above four constraints are automatically applied (see the sketch below). However, it is also possible to adhere to the above four constraints in Swarm and Mesos Marathon by means of a shell script that first creates a persistent volume and then creates a new database container.
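The sketch below (hypothetical names; the StatefulSet API group shown is the one used in K8s v1.5-1.7) indicates how a StatefulSet encodes the first three constraints: a headless Service yields per-instance DNS (SRV) records, volumeClaimTemplates create a uniquely identified volume per instance, and instances are created and destroyed in order. Constraint 4 would additionally require anti-affinity scheduling rules, which are omitted here for brevity.

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: mongo
spec:
  clusterIP: None            # headless: per-instance DNS/SRV records (constraint 1)
  selector:
    app: mongo
  ports:
  - port: 27017
---
apiVersion: apps/v1beta1     # StatefulSet API group in K8s v1.5-1.7
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: mongo
  replicas: 3                # created in order: mongo-0, mongo-1, ... (constraint 3)
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
      - name: mongo
        image: mongo:3.2
        args: ["--replSet", "rs0"]
        volumeMounts:
        - name: data
          mountPath: /data/db
  volumeClaimTemplates:      # one uniquely named volume per instance (constraint 2)
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
EOF
```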
3.4 Used deployment tools
In order to simplify the installation procedure for container software, a number of deployment tools exist. We have experimented with three kinds of tools for private clouds before commencing our experiments.
• Containerized deployment tools in the form of a bash script that installs the CO software itself as a set of containers. For example, for K8s we have used the docker-multinode project, which uses the official hyperkube binary of K8s. Containerized deployments are portable as they can run on multiple Linux distributions and computer architectures.
• The CO software can be installed using traditional Linux packaging tools. For example, Swarm-integrated mode is automatically installed when installing docker-engine. Similarly, the kubeadm tool installs the kubelet agent of K8s from a Linux package, and the rest of K8s is installed as containers. We expect that CO software deployed using this kind of tool performs better than fully containerized deployments.
• Cloud orchestration platforms such as Juju (https://jujucharms.com/containers) and gcloud (https://kubernetes.io/docs/getting-started-guides/gce/) that come with specific deployment packages for installing a container cluster on one or more cloud platforms.
In our experiments we have evaluated docker-engine v17.04.0-ce (with Swarm integrated), K8s v1.4.8 using docker-multinode, and K8s v1.7.2 using kubeadm.
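As an indication of the effort involved, a kubeadm-based installation of the kind we evaluated for K8s v1.7.2 proceeds roughly as follows. This is a sketch based on the Kubernetes documentation of that era, not a verbatim transcript of our setup; the pod network CIDR, token and Flannel manifest URL are assumptions.

```bash
# On the master: the kubelet comes from a Linux package, and kubeadm
# starts the rest of the control plane as containers.
apt-get install -y kubelet kubeadm kubectl
kubeadm init --pod-network-cidr=10.244.0.0/16

# Install Flannel through CNI (the CNI-Flannel setup of Section 4).
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

# On each worker node:
kubeadm join --token <token> <master-ip>:6443
```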
4 PERFORMANCE EVALUATION
This section evaluates the performance overhead of the above four features for K8s and Swarm. As stated above, our case study is MongoDB. First, Section 4.1 presents an overview of our OpenStack testbed and measurement approach. Then, Section 4.2 evaluates the first two features of automated container eviction and migration of persistent volumes. Finally, Section 4.3 evaluates the features of virtual networks between containers and service proxies. The code and results of these two experiments are made available on GitHub [12].
4.1 Testbed and measurement approach
Our testbed is an isolated part of a private OpenStack cloud, version Liberty. The OpenStack cloud consists of controller machines and droplets, on which VMs can be scheduled. The droplets have Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz processors and 64 GB DIMM DDR3 memory and run Ubuntu xenial, while the master controller is an Intel(R) Xeon(R) CPU E5-2430 0 @ 2.20GHz machine with Ubuntu xenial. In our experiments, MongoDB v3.2.4 instances run in a VM with 2 vCPUs and 4 GB of RAM (in OpenStack, these resource quantities are referred to as the c2m4 flavor). Swapping is turned off. Each pair of vCPUs is mapped to two cores within the same motherboard socket of a droplet. Each droplet has two 10Gbit network interfaces. The storage infrastructure of the OpenStack private cloud is a two-tiered Ceph-based service consisting of replicated SSD storage which serves as a cache for 30 SAS drives. Infrequently used data is eventually flushed to the SAS drives. All VMs run Ubuntu trusty, except in the experiments with kubeadm, which requires Ubuntu xenial as OS.

We have used the YCSB benchmark (see Table 1) [7] for running the performance evaluation experiments. For workloads with a Zipfian-based record selection (in a Zipfian distribution, popular records are read or updated most), we measured a tri-modal distribution of response latencies. For such distributions, we only analyze the two modes with the lowest mean latency, which respectively correspond to processing data from memory and processing data from the SSD data tier. The mode with the highest mean corresponds to data fetches from the SAS drives and is ignored.

During the experiments, no VMs of other users were running on the entire isolated part of the OpenStack cloud. It was, however, not possible to be the single user of the storage service. Network traffic is not isolated either: it is handled by a switch that also processes network packets from other parts of the OpenStack cloud. The means and standard deviations of multiple ping tests between the YCSB VM on one droplet and several MongoDB VMs on another droplet are on average 0.694 ms and 0.306 ms respectively. All evaluated MongoDB clusters have their instances distributed across the same set of droplets. The YCSB VM is deployed on a separate droplet as a c4m8-flavored VM.
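For reference, a YCSB invocation of the kind used in Experiment 1 might look as follows. The endpoint address and thread count are hypothetical; the operation count (10^7) and the 300 requests/s throttle come from the measurement approach described in Section 4.2.

```bash
# Load the dataset once, then run workload D throttled to 300 requests/s
# against the HostPort of the MongoDB container.
./bin/ycsb load mongodb -s -P workloads/workloadd \
  -p mongodb.url=mongodb://10.0.0.5:27017/ycsb
./bin/ycsb run mongodb -s -P workloads/workloadd \
  -p mongodb.url=mongodb://10.0.0.5:27017/ycsb \
  -p operationcount=10000000 -target 300 -threads 8
```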
4.2 Container eviction and volume migration
Research questions. We investigate two research questions: (1) Total performance overhead (during normal operation): what is the accumulated performance overhead of both automated container eviction and automated migration of persistent volumes in comparison to a VM-based deployment where persistent volumes can also be migrated by means of a specifically written OpenStack orchestration script? (2) Decomposed performance overhead: what is the performance cost of automated migration of persistent volumes in comparison to container-based deployments with local persistent volumes? What is the performance cost of automated container eviction in comparison to a VM-based deployment without an external volume?

We expect that the overhead of using persistent volumes in Docker will be low because they bypass the AUFS or OverlayFS storage drivers used by Docker and therefore do not incur the known potential overheads for write-heavy workloads incurred by these storage drivers [10, 29]. Moreover, we expect that node failure detection and container eviction, which are default features in all the studied CO systems, do not incur much overhead. The overhead of automated migration of persistent volumes using Flocker is the most interesting part of the research question.

Experimental setup. With respect to research question 1, we compare 3 deployments: (i) Swarm + Flocker + Cinder, (ii) K8s v1.4.8 (deployed using docker-multinode) + Flocker + Cinder, and (iii) a VM with an attached Cinder volume as baseline (see Table 2). A generic out-of-the-box automated solution for volume migration, as Flocker offers, does not exist in OpenStack, although it is possible to program an OpenStack script that attaches and detaches persistent volumes to and from VMs. With respect to research question 2, we study the overhead of deployments similar to those of research question 1, but with persistent volumes pinned to the local host, i.e. without Flocker and Cinder. More specifically, we compare 4 deployments: (iv) Swarm + a local Docker volume, (v) K8s v1.4.8 with a HostPath volume, (vi) Docker with a local HostPath volume, and (vii) a VM without an attached Cinder volume.
Workload type         | Operations             | Record selection
D - Read latest       | Read: 95%, Insert: 5%  | Latest
E - Short ranges      | Scan: 95%, Insert: 5%  | Zipfian/Uniform
F - Read-modify-write | Read: 50%, RMW: 50%    | Zipfian

Table 1: Workloads D, E and F from the core package of the YCSB benchmark [7]

Deployment name                  | E_scan   | E_insert | F_read   | F_rmw    | ∆ RAM
Research question 1:
swarmHostPortFlocker             | 4.7%     | 5.6%     | 15.9%    | 6.9%     | 160 MB
kubeHostPortFlocker              | 5.9%     | 13.3%    | 14.3%    | 11.9%    | 230 MB
Research question 2:
swarmHostPort_DockerVolume       | < jitter | < jitter | < jitter | < jitter | 10 MB
kubeHostPort_HostpathVolume      | 1.0%     | 10.8%    | < jitter | 8.8%     | 60 MB
DockerNAT_HostpathVolume         | 1.1%     | 9.6%     | < jitter | < jitter | -40 MB
Absolute measurements for baseline deployments (RAM instead of ∆ RAM):
VM_CinderVolume (Question 1)     | 100.8 ms | 3.155 ms | 2.461 ms | 4.163 ms | 1.899 GB
VM_noExternalVolume (Question 2) | 90.6 ms  | 3.259 ms | 2.604 ms | 4.529 ms | 1.936 GB

Table 2: Experiment 1: Performance overheads for the 0.95 quantile of response latencies and additional memory consumption

Measurement approach. In each deployment a single MongoDB instance is started; its YCSB dataset is 11.66 GB. We measure response latency for disk-I/O-bound requests during normal operation by analyzing the second mode in the distribution of measured response latencies. The experiment is as follows: for each of the six YCSB workload types, and for each deployment, we perform a run of 10^7 requests at a moderate rate of 300 requests/s. We repeat this experiment five times. We send these requests directly to a HostPort of the MongoDB container. As such, we intend to bypass the virtual network software layer as much as possible. Note that kubeadm does not support HostPorts as it is based on the CNI standard; therefore, kubeadm cannot be used for investigating the research questions of this experiment.

Results. Significant differences in response latency (i.e., differences larger than the above-stated network jitter of 300 µs, with low standard deviations across all 5 experiments) could only be measured for workloads E and F. This is most likely because these workloads perform composite operations such as scanning a short range of data records or performing a read-modify-write (rmw) action. Table 2 compares the relative performance overhead of the CO-based deployments from research question 1 with the container-based deployments from research question 2. For the swarmHostPortFlocker and kubeHostPortFlocker deployments of research question 1, there is an average overhead of respectively 8.3% and 11.4% on the 0.95 quantile of response latencies. With respect to research question 2, only the kubeHostPort_HostpathVolume and DockerNAT_HostpathVolume deployments show an average overhead, of respectively 5% and 2.7%. Note that the DockerNAT_HostpathVolume deployment performs worse than the swarmHostPort_DockerVolume deployment because we started the MongoDB container in Docker without the --net=host option. These results indicate that Flocker is responsible for most of the performance overhead in Swarm and that containerized databases inside VMs, if well configured, can perform as well as VM-based deployments. The results are less conclusive for K8s. Besides Flocker, K8s also contributes to the increased response latency. A possible explanation is that a containerized deployment of K8s has been evaluated, while Swarm runs directly in the OS of the VM. Due to space constraints, we cannot show resource consumption measurements in detail. As memory is the most costly resource, Table 2 focuses on how much additional memory is consumed during workload E by a Worker node in the container-based deployments. Additionally, the VMs with the Master nodes consumed a total of 373 and 643 MB of RAM for Swarm and K8s respectively. On Worker nodes, we measured roughly 2.5 MB/s of additional incoming network traffic that originates from the virtual network interfaces docker0 and flannel0. There is also a 2% increase in CPU utilization for all Flocker-based deployments and for the kubeHostPort_HostpathVolume deployment.

Deployment name   | Used CO software (+ Docker v17.04-ce)  | MongoDB endpoints
VM_HostName       | none                                   | HostName:HostPort
swarmServiceIP    | Swarm, overlay & bridge networks       | ServiceIP:ContainerPort
kubeServiceIP     | K8s v1.4.8, docker-multinode, Flannel  | ServiceIP:ServicePort
kubeDNS           | K8s v1.4.8, docker-multinode, Flannel  | DNSname:ContainerPort
kubeadmDNS_xenial | K8s v1.7.2, kubeadm, CNI-Flannel       | DNSname:ContainerPort
kubeDNS_xenial    | K8s v1.4.8, docker-multinode, Flannel  | DNSname:ContainerPort
Docker_noNAT      | --net=host is set                      | VM_IP:HostPort

Table 3: Deployments with internal clients

Deployment name         | Used CO software (+ Docker v17.04-ce)  | MongoDB endpoints
VM_FloatingIP           | none                                   | FloatingIP:HostPort
swarmNodePort           | Swarm, overlay & bridge networks       | LocalhostIP:NodePort
swarmHostPortFloatingIP | Swarm, bridge network                  | FloatingIP:HostPort
kubeNodePort            | K8s v1.4.8, docker-multinode, Flannel  | LocalhostIP:NodePort
kubeHostPortFloatingIP  | K8s v1.4.8, docker-multinode, Flannel  | FloatingIP:HostPort

Table 4: Deployments with external clients
4.3 Virtual networks and Service proxies
Research question. What is the performance cost of the virtual networking functionality and stable network names for Services in comparison to an OpenStack-based deployment where TCP/IP location transparency can be achieved by means of attaching floating IP addresses to VMs? Floating IP addresses are stable IP addresses that can be detached from a failed VM and reattached to another VM. For disk-intensive workloads, we expect that the overhead on response latency of these features will be very low. However, for CPU-intensive workloads, such as processing small data sizes in MongoDB, in-memory databases, or databases that are designed for high-performance write throughput such as Cassandra, we expect that virtual networks may impose a non-negligible overhead on response latency.
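For comparison, the baseline's location transparency can be scripted roughly as follows with the OpenStack CLI (server names, the external network name and the address are hypothetical; the command syntax is that of recent python-openstackclient releases):

```bash
# Allocate a floating IP from the external network and attach it to
# the VM that currently runs MongoDB.
openstack floating ip create public
openstack server add floating ip mongodb-vm-1 198.51.100.10

# On VM failure, reattach the same stable address to a replacement VM,
# so that clients keep connecting to 198.51.100.10.
openstack server remove floating ip mongodb-vm-1 198.51.100.10
openstack server add floating ip mongodb-vm-2 198.51.100.10
```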
Experimental setup. As shown in Tables 3 and 4, we compare twelve deployments of a MongoDB cluster with one Primary and two Secondary instances. The size of the dataset (23 MB) is chosen very small so that all operations can be performed in memory. The YCSB VM, which plays the role of MongoDB client, is configured to always read from the nearest MongoDB instance, while write requests wait for one or more secondaries to also acknowledge. The twelve deployments differ according to three variation points: (i) are clients internal or external to the container cluster? Internal clients run inside a container; for the deployments with external clients, the MongoDB endpoints must be configured with network addresses that are reachable outside the container cluster in order for MongoDB's load-balancing functionality to work properly. (ii) which CO software (Swarm or K8s), deployment tool and network plugins are used; (iii) what kind of stable network addresses and service proxies are used for the endpoints of the MongoDB instances (ServiceIP, DNSname, NodePort or HostPort). We were not able to configure kubeadm such that ServiceIPs or NodePorts can be used as MongoDB endpoints, because we could not configure kubeadm in the hairpin mode [24] that allows containers to access themselves via the local service proxy.

Measurement approach. We measure response latency for YCSB workloads D and F. The experiment itself is set up as a scalability test: for each deployment, and for workloads D and F, 16 runs of 10^5 requests are performed. The number of threads in the YCSB VM is increased by a factor of 5 for each consecutive run.

Results. We have measured significant negative impacts on scalability for both read and write operations. The largest impact is for insert operations in workload D (see Figures 1 and 2; for workload D, the YCSB VM starts to become the bottleneck after 50 threads, so we only show results for the first 8 runs). We see that all container-orchestrated deployments perform worse in comparison to the baseline (VM_HostName). The kubeDNS deployment performs the worst of all deployments. This indicates that K8s version 1.4.8 is not optimized for accessing containers directly via their container IP. To better understand how the performance for this particular endpoint type has evolved, Figure 1 also compares the kubeadmDNS_xenial and kubeDNS_xenial deployments and shows considerable performance improvements. An interesting result in Figure 2 is that the swarmHostPortFloatingIP deployment performs better than the swarmNodePort deployment because requests are sent directly to the MongoDB container via the virtual network of the OpenStack cloud. The kubeHostPortFloatingIP deployment, surprisingly, does not perform better than the kubeNodePort deployment.

Figure 1: Experiment 2: Deployments with internal clients.
Figure 2: Experiment 2: Deployments with external clients.
5 CONCLUSION
In this paper we have evaluated the performance overhead of four common features of container orchestration systems that support improved auto-recovery of databases: (i) automated container eviction in response to node failure events, (ii) automated migration of persistent volumes, (iii) virtual networks for containers and (iv) stable network addresses for database containers. As baseline we compare with VM-based deployments in an OpenStack cloud in a closed lab. The first feature causes, on Worker nodes, an average overhead of 2% additional memory consumption and a 2.5% overhead for the 0.95 quantile of response latencies during normal operation.
The second feature incurs on average a 10% overhead on memory consumption and a 10% overhead on response latency. The third and fourth features negatively impact the scalability of replicated databases, but only for CPU-intensive workloads that are typical of in-memory and write-optimized databases. Figure 2 further demonstrates that the use of HostPorts in combination with a floating/elastic IP address can lead to better performance than using NodePorts in Docker Swarm, provided of course that the particular network deployment of the cloud provider facilitates this. However, automated support for detaching and reattaching floating IP addresses on container eviction events still needs to be implemented by a cloud orchestration tool. Finally, Sections 3.4 and 4.3 compare different versions of Kubernetes and different deployment tools in order to understand the evolution of the Kubernetes project. This shows that higher-level configuration abstractions for deploying and managing database clusters, as well as performance-specific enhancements for database clusters, have been added during the course of this project.
REFERENCES
[1] OpenStack user survey 2016. https://www.openstack.org/assets/survey/October2016SurveyReport.pdf. Accessed: August 18, 2017.
[2] Apache. A port-mapper plugin for CNI networks. http://mesos.apache.org/documentation/latest/cni/#a-port-mapper-plugin. Accessed: August 12, 2017.
[3] Kapil Arya. Networking for Mesos-managed containers. https://github.com/apache/mesos/blob/0.25.0/docs/networking-for-mesos-managed-containers.md. Accessed: August 16, 2017.
[4] E. Bekas and K. Magoutis. Cross-layer management of a containerized NoSQL data store. In 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), pages 1213–1221, May 2017.
[5] E. Brewer. CAP twelve years later: How the "rules" have changed. Computer, 45(2):23–29, 2012.
[6] Brendan Burns, Brian Grant, David Oppenheimer, Eric Brewer, and John Wilkes. Borg, Omega, and Kubernetes. Commun. ACM, 59(5):50–57, 2016.
[7] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing (SoCC '10), pages 143–154, 2010.
[8] Datadog. 8 surprising facts about real Docker adoption. https://www.datadoghq.com/docker-adoption/, 2017. Accessed: August 2, 2017.
[9] Sandeep Dinesh. Running MongoDB on Kubernetes with StatefulSets. http://blog.kubernetes.io/2017/01/running-mongodb-on-kubernetes-with-statefulsets.html, 2017. Accessed: May 24, 2017.
[10] Docker. Use the OverlayFS storage driver – OverlayFS and Docker performance. https://docs.docker.com/engine/userguide/storagedriver/overlayfs-driver/#overlayfs-and-docker-performance, 2017. Accessed: August 9, 2017.
[11] Che-Wei Lin et al. CNI – the Container Network Interface. https://github.com/containernetworking/cni#cni---the-container-network-interface. Accessed: August 16, 2017.
[12] Eddy Truyen et al. Container orchestration for auto-recovery of databases. https://github.com/eddytruyen/containers_on_openstack/wiki/Container-orchestration-for-auto-recovery-of-databases. Accessed: August 18, 2017.
[13] Matt T. Proud et al. Networking. https://github.com/kubernetes/kubernetes/blob/release-0.4/docs/networking.md. Accessed: August 16, 2017.
[14] Miguel G. Xavier et al. A performance comparison of container-based virtualization systems for MapReduce clusters. In 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pages 299–306, 2014.
[15] Miguel G. Xavier et al. A performance isolation analysis of disk-intensive workloads on container-based clouds. In 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pages 253–260, 2015.
[16] Misty Stanley-Jones et al. Swarm and container networks. https://github.com/docker/docker.github.io/blob/v1.12-release/swarm/networking.md. Accessed: August 16, 2017.
[17] Tomasz Napierala et al. Networking design proposal. https://github.com/kubernetes/community/blob/master/contributors/design-proposals/networking.md. Accessed: August 16, 2017.
[18] Wes Felter, Alexandre Ferreira, Ram Rajamony, and Juan Rubio. An updated performance comparison of virtual machines and Linux containers. In Performance Analysis of Systems and Software (ISPASS), 2015 IEEE International Symposium On, pages 171–172. IEEE, 2015.
[19] Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation (NSDI 2011). USENIX Association, 2011.
[20] Nane Kratzke. A lightweight virtualization cluster reference architecture derived from open source PaaS platforms. Open Journal of Mobile Computing and Cloud Computing, 1(2):17–30, 2014.
[21] Nane Kratzke. About microservices, containers and their underestimated impact on network performance. In Proceedings of CLOUD COMPUTING 2015 (6th International Conference on Cloud Computing, GRIDs and Virtualization), pages 165–169, 2015.
[22] Nane Kratzke and Peter-Christian Quint. How to operate container clusters more efficiently? Some insights concerning containers, software-defined networks, and their sometimes counterintuitive impact on network performance. International Journal On Advances in Networks and Services, 8(3&4):203–214, 2015.
[23] Kubernetes. Cluster networking. https://kubernetes.io/docs/concepts/cluster-administration/networking/#flannel. Accessed: August 16, 2017.
[24] Kubernetes. Debug Services – a Pod cannot reach itself via Service IP. https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/#a-pod-cannot-reach-itself-via-service-ip. Accessed: August 15, 2017.
[25] Stephen Nguyen. Ridiculously fast MongoDB replica recovery. https://dzone.com/articles/ridiculously-fast-mongodb-replica-recovery-part-1, 2016. Accessed: June 30, 2017.
[26] OpenStack. OpenStack documentation – Cinder system architecture. https://docs.openstack.org/cinder/latest/contributor/architecture.html, 2017. Accessed: August 9, 2017.
[27] ScatterHQ. Flocker – Container data volume manager for your Dockerized application. https://github.com/ScatterHQ/flocker/, 2017. Accessed: August 9, 2017.
[28] Mathijs Jeroen Scheepers. Virtualization and containerization of application infrastructure: A comparison. 2014.
[29] Prateek Sharma, Lucas Chaufournier, Prashant Shenoy, and Y. C. Tay. Containers and virtual machines at scale: A comparative study. In Proceedings of the 17th International Middleware Conference (Middleware '16). ACM, 2016.
[30] Stephen Soltesz, Herbert Pötzl, Marc E. Fiuczynski, Andy C. Bavier, and Larry L. Peterson. Container-based operating system virtualization: A scalable, high-performance alternative to hypervisors. In Proceedings of the 2007 EuroSys Conference, Lisbon, Portugal, March 21-23, 2007, pages 275–287. ACM, 2007.
[31] MongoDB. Running MongoDB as a microservice with Docker and Kubernetes. https://www.mongodb.com/blog/post/running-mongodb-as-a-microservice-with-docker-and-kubernetes, 2016. Accessed: August 2, 2017.
[32] Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, and John Wilkes. Large-scale cluster management at Google with Borg. In Proceedings of the European Conference on Computer Systems (EuroSys), 2015.
[33] Ryan Wallner. Tutorial: Deploying a replicated Redis cluster on Kubernetes with Flocker. https://clusterhq.com/2016/02/11/kubernetes-redis-cluster, 2016.
[34] Mingwei Zhang, Daniel Marino, and Petros Efstathopoulos. Harbormaster: Policy enforcement for containers. In 2015 IEEE 7th International Conference on Cloud Computing Technology and Science (CloudCom), pages 355–362, 2015.