eBay Tech Blog

Delivering eBay’s CI Solution with Apache Mesos – Part II

by The eBay PaaS Team on 05/12/2014

in Cloud, Data Infrastructure and Services, Software Engineering

In part I of this post we laid out in detail how to run a large Jenkins CI farm in Mesos. In this post we explore running the builds inside Docker containers and more:

  • Explain the motivation for using Docker containers for builds.
  • Show how to handle the case where the build itself is a Docker build.
  • Peek into how the Mesos 0.19 release is going to change Docker integration.
  • Walk through a Vagrant all-in-one-box setup so you can try things out.


Jenkins follows the master-slave model and is capable of launching tasks as remote Java processes on Mesos slave machines. Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications or frameworks. We can leverage the capabilities of Jenkins and Mesos to run a Jenkins slave process within a Docker container using Mesos as the resource manager.

Why use Docker containers?

This page gives a good picture of what Docker is all about.

At eBay Inc., we have several different build clusters. They are partitioned primarily due to a number of factors: requirements to run different OS flavors (mostly RHEL and Ubuntu), software version conflicts, associated application dependencies, and special hardware. When using Mesos, we try to operate a single cluster with heterogeneous workloads instead of maintaining specialized clusters. Docker provides a good solution for isolating the different dependencies inside the container, irrespective of the host setup where the Mesos slave is running, thereby helping us operate on a single cluster. Special hardware requirements can always be handled through slave attributes, which the Jenkins plugin already supports. Overall, this scheme helps maintain consistent host images in the cluster and avoids a wide combination of different Mesos slave host flavors, while still handling all the varied build dependencies within containers.

Now why support Docker-in-Docker setup?

When we started experimenting with running the builds in Docker containers, some of our teammates were working on enabling Docker images for applications. They posed the question: how do we support Docker build and push/pull operations within the Docker container used for the build? Valid point! So, we will explore two ways of handling this challenge. Many thanks to Jérôme Petazzoni from the Docker team for his guidance.

Environment setup

A Vagrant development VM setup demonstrates CI using Docker containers. This VM can also be used for testing other frameworks like Chronos and Aurora; however, we will focus on its CI use with Marathon. The screenshots shown below were taken from the Vagrant development environment, which runs a cluster of three Mesos masters, three Mesos slave instances, and one Marathon instance. (Marathon is a Mesos framework for long-running services. It provides a REST API for starting, stopping, and scaling services.)

Running Jenkins slaves inside Mesos Docker containers requires the following ecosystem:

  1. Jenkins master server with the Mesos scheduler plugin installed (used for building Docker containers via CI jobs).
  2. Apache Mesos master server with at least one slave server.
  3. Mesos Docker Executor installed on all Mesos slave servers. Mesos slaves delegate execution of tasks within Docker containers to the Docker executor. (Note that integration with Docker changes with the Mesos 0.19 release, as explained in the miscellaneous section at the end of this post.)
  4. Docker installed on all slave servers (to automate the deployment of any application as a lightweight, portable, self-sufficient container that will run virtually anywhere).
  5. Docker build container image in the Docker registry.
  6. Marathon framework.

1. Creating the Jenkins master instance

We needed to first launch a standalone Jenkins master instance in Mesos via the Marathon framework.  We placed Jenkins plugins in the plugins directory, and included a default config.xml file with pre-configured settings. Jenkins was then launched by executing the jenkins.war file. Here is the directory structure that we used for launching the Jenkins master:

├── README.md
├── config.xml
├── hudson.model.UpdateCenter.xml
├── jenkins.war
├── jobs
├── nodeMonitors.xml
├── plugins
│   ├── mesos.hpi
│   └── saferestart.jpi
└── userContent
    └── readme.txt
3 directories, 8 files

2. Launching the Jenkins master instance

Marathon launched the Jenkins master instance using the following command, also shown in the Marathon UI screenshots below. We zipped our Jenkins files and had Marathon download them for the job via the URIs field in the UI; however, for demonstration purposes, below we show how to use a Git repository to achieve the same goal.

git clone https://github.com/ahunnargikar/jenkins-standalone && cd jenkins-standalone;
export JENKINS_HOME=$(pwd);
java -jar jenkins.war
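In Marathon terms, the launch above corresponds to an app definition submitted to Marathon's REST API. The following JSON is an illustrative sketch only; the app id and resource values are assumptions, not our production settings:

```json
{
  "id": "jenkins-master",
  "cmd": "git clone https://github.com/ahunnargikar/jenkins-standalone && cd jenkins-standalone; export JENKINS_HOME=$(pwd); java -jar jenkins.war",
  "cpus": 1.0,
  "mem": 2048,
  "instances": 1
}
```

An app definition like this can be submitted with a POST to the `/v2/apps` endpoint of the Marathon instance, which then schedules the command on a Mesos slave with matching resources.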






3. Launching Jenkins slaves using the Mesos Docker executor


Here’s a sample supervisord startup configuration for a Docker image capable of executing Jenkins slave jobs:


; the [program] section name below is illustrative
[program:jenkinsslave]
command=/bin/bash -c "eval $JENKINS_COMMAND"

As you can see, Jenkins passed its slave launch command as an environment variable to the Docker container. The container then initialized the Jenkins slave process, which fulfilled the basic requirement for kicking off the Jenkins slave job.

This configuration was sufficient to launch regular builds within the Docker container of choice. Now let’s walk through the two options that we explored to run Docker operations for a CI build inside a Docker container. Strategy #1 required use of supervisord to control the Docker daemon process. For the default case (regular non-Docker builds) and strategy #2, supervisord was not required; one could simply pass the command directly to the Docker container.

3.1 Strategy #1 – Using an individual Docker-in-Docker (dind) setup on each Mesos slave

This strategy, inspired by this blog, involved a dedicated Docker daemon inside the Docker container. The advantage of this approach was that we didn’t have a single Docker daemon handling a large number of container builds. On the flip side, each container absorbed the I/O overhead of downloading and duplicating all the AUFS file system layers.


The Docker-in-Docker container had to be launched in privileged mode (by including the “-privileged” option in the Mesos Docker executor code); otherwise, nested Docker containers wouldn’t work. Using this strategy, we ended up having two Docker executors:  one for launching Docker containers in non-privileged mode (/var/lib/mesos/executors/docker) and the other for launching Docker-in-Docker containers in privileged mode (/var/lib/mesos/executors/docker2). The supervisord process manager configuration was updated to run the Docker daemon process in addition to the Jenkins slave job process.
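For reference, a supervisord configuration for this strategy could look like the sketch below. The program names and the Docker 0.9-era daemon invocation are assumptions for illustration, not our exact config:

```ini
; Illustrative supervisord config for strategy #1 (names and paths are assumptions).
[supervisord]
nodaemon=true

; A dedicated Docker daemon inside the container; this only works because the
; container itself was launched in privileged mode.
[program:docker]
command=/usr/local/bin/docker -d

; The Jenkins slave process, launched via the command that the Jenkins master
; passed in through the JENKINS_COMMAND environment variable.
[program:jenkinsslave]
command=/bin/bash -c "eval $JENKINS_COMMAND"
```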


The following Docker-in-Docker image has been provided for demonstration purposes for testing out the multi-Docker setup:


In real life, the actual build container image would capture the build dependencies and base image flavor, in addition to the contents of the above dind image. The actual command that the Docker executor ran looked similar to this one:

docker run \
  -cidfile /tmp/docker_cid.6c6bba3db72b7483 \
  -c 51 -m 302365697638 \
  -e JENKINS_COMMAND="wget -O slave.jar && java -DHUDSON_HOME=jenkins -server -Xmx256m -Xms16m -XX:+UseConcMarkSweepGC -Djava.net.preferIPv4Stack=true -jar slave.jar -jnlpUrl" \
  hashish/jenkins-dind

3.2 Strategy #2 – Using a shared Docker Setup on each Mesos slave

All of the Jenkins slaves running on a Mesos slave host could simply use a single Docker daemon for running their Docker containers, which was the default standard setup. This approach eliminated redundant network and disk I/O involved with downloading the AUFS file system layers. For example, all Java application projects could now reuse the same AUFS file system layers that contained the JDK, Tomcat, and other static Linux package dependencies. We lost isolation as far as the Docker daemon was concerned, but we gained a massive reduction in I/O and were able to leverage caching of build layers. This was the optimal strategy for our use case.


The Docker container mounted the host’s /var/run/docker.sock file descriptor as a shared volume so that its native Docker binary, located at /usr/local/bin/docker, could now communicate with the host server’s Docker daemon. So all Docker commands were now directly being executed by the host server’s Docker daemon. This eliminated the need for running individual Docker daemon processes on the Docker containers that were running on a Mesos slave server.

The following Docker image has been provided for demonstration purposes for a shared Docker setup. The actual build Docker container image of choice essentially just needed to execute the Docker binary via its CLI. We could even have mounted the Docker binary from the host server itself to the same end.
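As a sketch, a shared-Docker build image mainly needs a statically linked Docker client that talks to the host daemon through the bind-mounted socket. The base image, package list, and download URL below are assumptions for illustration rather than the exact image we used:

```dockerfile
# Illustrative shared-Docker build image (details are assumptions).
FROM ubuntu:12.04

# Typical CI build dependencies for a Java project.
RUN apt-get update && apt-get install -y wget openjdk-7-jdk supervisor

# A statically linked Docker client binary; at runtime it talks to the host's
# daemon via the /var/run/docker.sock volume mounted by the executor.
RUN wget -O /usr/local/bin/docker https://get.docker.io/builds/Linux/x86_64/docker-latest && \
    chmod +x /usr/local/bin/docker

# supervisord starts the Jenkins slave process via $JENKINS_COMMAND.
CMD ["/usr/bin/supervisord"]
```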


The actual command that the Docker executor ran looked similar to this:

docker run \
  -cidfile /tmp/docker_cid.6c6bba3db72b7483 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -c 51 -m 302365697638 \
  -e JENKINS_COMMAND="wget -O slave.jar && java -DHUDSON_HOME=jenkins -server -Xmx256m -Xms16m -XX:+UseConcMarkSweepGC -Djava.net.preferIPv4Stack=true -jar slave.jar -jnlpUrl" \
  hashish/jenkins-dind-single

4. Specifying the cloud configuration for the Jenkins master

We then needed to configure the Jenkins master so that it would connect to the Mesos master server and start receiving resource offers, after which it could begin launching tasks on Mesos. The following screenshots illustrate how we configured the Jenkins master via its web administration UI.






Note: The Docker-specific configuration options above are not available in the stable release of the Mesos plugin. Major changes are underway in the upcoming Mesos 0.19.0 release, which will introduce the pluggable containerizer functionality. We decided to wait for 0.19.0 to be released before making a pull request for this feature. Instead, a modified .hpi plugin file was created from this Jenkins Mesos plugin branch and has been included in the Vagrant dev setup.



5. Creating the Jenkins Mesos Docker job

Now that the Jenkins scheduler had registered as a framework in Mesos, it started receiving resource offers from the Mesos master. The next step was to create a Jenkins job that would be launched on a Mesos slave whose resource offer satisfied the cloud configuration requirements.

5.1 Creating a Docker Tomcat 7 application container image

Jenkins first needed a Docker container base image that packaged the application code and dependencies as well as a web server. For demonstration purposes, here’s a sample Docker Tomcat 7 image created from this Github repository:


Every application’s Git repository would be expected to have its unique Dockerfile with whatever combination of Java/PHP/Node.js pre-installed in a base container. In the case of our Java apps, we simply built the .war file using Maven and then inserted it into the Docker image during build time. The Docker image was then tagged with the application name, version, and timestamp, and then uploaded into our private Docker registry.
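As a hypothetical example of such a per-repository Dockerfile (the registry host, base image name, and paths are made up for illustration):

```dockerfile
# Hypothetical application Dockerfile kept in the app's Git repository.
FROM registry.example.com/tomcat7-base:latest

# The .war produced by the Maven build is baked into the image at build time.
ADD target/webapp.war /opt/tomcat7/webapps/webapp.war

EXPOSE 8080
CMD ["/opt/tomcat7/bin/catalina.sh", "run"]
```

The CI job would then tag the resulting image along the lines of `registry.example.com/webapp:1.0.0-20140512120000` (application name, version, timestamp) and `docker push` it to the private registry.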

5.2 Running a Jenkins Docker job

For demonstration purposes, the following example assumes that we are building a basic Java web application.







Once Jenkins built and uploaded the new application’s Docker image containing the war, dependencies, and other packages, this Docker image was launched in Mesos and scaled up or down to as many instances as required via the Marathon APIs.
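Scaling through the Marathon APIs amounts to updating the app's instance count. For example, a PUT to `/v2/apps/webapp` (a hypothetical app id) would carry a body such as:

```json
{
  "instances": 4
}
```

For instance, `curl -X PUT -H "Content-Type: application/json" http://marathon-host:8080/v2/apps/webapp -d '{"instances": 4}'`, assuming Marathon listens on port 8080; Marathon then launches or kills tasks to converge on the requested count.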

Miscellaneous points

Our Docker integration with Mesos is going to be outdated soon with the 0.19 release. Our setup was against Mesos 0.17 and Docker 0.9.  You can read about the Mesos pluggable containerizer feature in this blog and in this ticket. The Mesosphere team is also working on the deimos project to integrate Docker with the external containerization approach. There is an old pull request against the Mesos Jenkins plugin to integrate containerization once it’s released. We will update our setup accordingly when this feature is rolled out. We’d like to add a disclaimer that the Docker integration in the above post hasn’t been tested at scale yet; we will do our due diligence once Mesos 0.19 and deimos are out.

For different build dependencies, you can define a build label for each. A merged PR already specifies the attributes per label. Hence, a Docker container image of choice can be added per build label.


This concludes the description of our journey, giving a good overview of how we ran a distributed CI solution on top of Mesos, utilizing resources in the most efficient manner and isolating build dependencies through Docker.

Comments

Daniel May 18, 2014 at 5:41AM

Linking /var/run/docker.sock into the container seems an interesting approach. I have experimented with this setup as well, but now I struggle with the container: it cannot be removed (“device or resource busy” when removing the root file system of the container) or started again (“Error getting container … from driver aufs”). Did you face the same problem?


The eBay PaaS Team May 21, 2014 at 12:05PM

Daniel, thanks for bringing this point up. In the version of the mesos-docker executor from Mesosphere that we were using for our testing, cleanup of the Mesos task only did a docker stop operation, which succeeded even though the unmount failed.
In the latest version of the mesos-docker executor, cleanup_container() does a “docker rm”, which would hang at least in Docker 0.9 because of the mounted docker.sock. However, the good news is that in the latest Docker 0.11 release the issue is fixed. It was also tracked in this resolved issue.

We will also update the Vagrant setup mentioned in the article to use the latest Docker executor and Docker 0.11, so that containers can be removed properly.
The latest Docker documentation also endorses this approach
(to quote: ‘By bind-mounting the docker unix socket and statically linked docker binary (such as that provided by https://get.docker.io), you give the container the full access to create and manipulate the host’s docker daemon.’)

Thanks for bringing this point up.


Daniel May 27, 2014 at 3:28AM

Thanks for the answer and insights!


Adam Spektor June 1, 2014 at 7:16AM

Great article.
I have several questions. If the parent Docker executes all the commands, we will still have a problem with concurrent execution of Tomcat (for example); I'm talking about port collisions. I thought all containers would be isolated in the child Docker, so I could run several Tomcats inside different containers without worrying about random ports. Also, when I stop the inner Docker I can still see the containers that were executed inside, so I have to take care of tear-down.

If all this is true (I hope it's not :), what are the benefits of using Docker inside Docker?




The eBay PaaS Team June 2, 2014 at 3:24PM


Glad you asked! This blog post is in the context of CI builds and not about running application Docker containers on Mesos. To simply run multiple docker containers on Mesos, you do not need to apply any of the strategies that we talked about.

To answer your question about port assignments & collisions first – You can rely on the Marathon framework to run Docker containers in a Mesos cluster. In general Mesos has the ability to dynamically assign host-level ports which the executor (mesos-docker in this case) maps to the static ports defined in the app Dockerfile. This takes care of the port collision problem.

eBay has a polyglot platform running Java, C++, Node.js, Python, and Scala applications. Running CI builds as plain Mesos tasks would require us to install all the dependencies on the Mesos slave host servers. Imagine having to install the latest JDK or Python updates on thousands of production Mesos slave nodes… painful indeed! Downloading and installing these dependencies during build time in every CI job is equally painful due to the I/O overhead. So relying on Docker is a necessity, since all these dependencies are now isolated within the cached Docker container layers used to run the Jenkins job.

Now, Docker-in-Docker wouldn’t be required if the final deliverable of the Jenkins job was simply a war or zip file (a popular use case). A single Docker build container isolating the different dependencies can achieve that. Our final build deliverable in this case is a standalone Docker image and in our example the build job running inside the Docker container had to run docker operations like build and push. The outer Docker container handles caching of the build job CI dependencies and the inner Docker installation handles the build & push operation of the app Docker image. By using strategy #2 as described in the above post we’re simply relaying the Docker build & push commands to the Mesos slave host’s Docker daemon.

Finally, about the container tear down and cleanup – I believe that if you used strategy #1 then you would have to clean up the AUFS layers used by the nested Docker container manually and that’s why we preferred strategy #2. Using the Docker “rm” option should clean up the remnant container. Please refer to http://docs.docker.io/reference/commandline/cli/

Hope this answers your questions.



Chong Chen June 10, 2014 at 10:16AM

Very interesting article. I do have a technical question regarding mesos docker executor and Jenkin slave.

In the Mesos world, once a framework gets an offer, it launches tasks to use the offered resources. In the Jenkins context, what does each task represent here? A build item, or just a Jenkins slave daemon that needs to connect back to the master to fetch build items? In particular, what does the task mean in your picture?


And when do you terminate the Jenkins Docker container? Do you terminate it once each build item completes?


The eBay PaaS Team June 11, 2014 at 11:16AM

A “Task” is a generic term used for anything launched in Mesos. For example, if you ran a bash script in Mesos using the default command executor that is a task; if you ran a Docker container in Mesos using the Docker executor that too is a task. The Jenkins master instance, itself acting as a Mesos framework, launches its build job/task and a unique id is assigned to it which is “mesos-jenkins-be65c8fa-d409-4743-a3fa-c8679808c7cc”. This task is actually a Docker container inside which the Jenkins slave agent is initialized using supervisord. The slave agent then proceeds to download the job build scripts from the Jenkins master and executes them.

Jenkins has an “Idle Termination Minutes” field in its configuration (shown in one of the screenshots above) which controls how long the slave job is kept around after its build has completed. Developers can set the timeout appropriately so that the Docker container is terminated by the Jenkins master via the Docker executor immediately after the build job has completed or it can be kept around for several hours for reuse in a subsequent build. Either approach is fine depending on how you want to manage your Mesos cluster resources.



Julien Eid June 16, 2014 at 4:01PM

Hey! I’m trying to use https://github.com/ahunnargikar/mesos-plugin which is located in https://github.com/ahunnargikar/vagrant-mesos and I’m finding that I get this error with Docker enabled.

INFO: Received offers 1
Jun 16, 2014 10:54:06 PM org.jenkinsci.plugins.mesos.JenkinsScheduler resourceOffers
INFO: Received offers 1
Jun 16, 2014 10:54:09 PM org.jenkinsci.plugins.mesos.MesosCloud provision
INFO: Provisioning Jenkins Slave on Mesos with 1 executors. Remaining excess workload: 0 executors)
Jun 16, 2014 10:54:09 PM hudson.slaves.NodeProvisioner update
INFO: Started provisioning MesosCloud from MesosCloud with 1 executors. Remaining excess workload:0.0
Jun 16, 2014 10:54:09 PM org.jenkinsci.plugins.mesos.MesosComputerLauncher
INFO: Constructing MesosComputerLauncher
Jun 16, 2014 10:54:09 PM org.jenkinsci.plugins.mesos.MesosSlave
INFO: Constructing Mesos slave
Jun 16, 2014 10:54:12 PM org.jenkinsci.plugins.mesos.JenkinsScheduler resourceOffers
INFO: Received offers 1
Jun 16, 2014 10:54:17 PM org.jenkinsci.plugins.mesos.JenkinsScheduler resourceOffers
INFO: Received offers 1
Jun 16, 2014 10:54:19 PM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
INFO: Launching slave computer mesos-jenkins-bc416717-768b-4e8b-a7cd-3bab75ae0db4
Jun 16, 2014 10:54:19 PM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
INFO: Sending a request to start jenkins slave mesos-jenkins-bc416717-768b-4e8b-a7cd-3bab75ae0db4
Jun 16, 2014 10:54:19 PM org.jenkinsci.plugins.mesos.JenkinsScheduler requestJenkinsSlave
INFO: Enqueuing jenkins slave request
Jun 16, 2014 10:54:19 PM hudson.slaves.NodeProvisioner update
INFO: MesosCloud provisioning successfully completed. We have now 2 computer(s)
Jun 16, 2014 10:54:22 PM org.jenkinsci.plugins.mesos.JenkinsScheduler resourceOffers
INFO: Received offers 1
Jun 16, 2014 10:54:22 PM org.jenkinsci.plugins.mesos.JenkinsScheduler matches
WARNING: Ignoring disk resources from offer
Jun 16, 2014 10:54:22 PM org.jenkinsci.plugins.mesos.JenkinsScheduler matches
INFO: Ignoring ports resources from offer
Jun 16, 2014 10:54:22 PM org.jenkinsci.plugins.mesos.JenkinsScheduler resourceOffers
INFO: Offer matched! Creating mesos Docker task
Jun 16, 2014 10:54:22 PM org.jenkinsci.plugins.mesos.JenkinsScheduler createMesosDockerTask
INFO: Launching task mesos-jenkins-bc416717-768b-4e8b-a7cd-3bab75ae0db4 with command exec /var/lib/mesos/executors/docker ubuntu
java.lang.NoSuchMethodError: org.apache.mesos.MesosSchedulerDriver.launchTasks(Ljava/util/Collection;Ljava/util/Collection;Lorg/apache/mesos/Protos$Filters;)Lorg/apache/mesos/Protos$Status;
at org.jenkinsci.plugins.mesos.JenkinsScheduler.createMesosDockerTask(JenkinsScheduler.java:420)
at org.jenkinsci.plugins.mesos.JenkinsScheduler.resourceOffers(JenkinsScheduler.java:191)
Jun 16, 2014 10:54:22 PM org.jenkinsci.plugins.mesos.JenkinsScheduler$1 run
SEVERE: The mesos driver was aborted!

It is a NoSuchMethodError. I only have this issue when Docker is enabled, causing driver.launchTasks(offerIds, tasks, filters); to be run instead of normally driver.launchTasks(offer.getId(), tasks, filters); when Docker is disabled. I see that offerIds is a List of ID’s, does launchTasks actually take a List? Because normally, you pass a single id.


The eBay PaaS Team June 18, 2014 at 11:09AM


Just a guess but the exception might indicate an incompatibility between the Mesos driver version that plugin is using and your Mesos cluster version. The driver.launchTasks() method probably requires a list of OfferIds instead of simply passing in a single OfferId now.

Are you using the above Vagrant setup which has Mesos 0.18.2 and Docker 0.11.0 or is it a different environment? This plugin .hpi file was last built using Mesos 0.17.0 jars and seems to work fine with Mesos 0.18.2.



Ivan Kurnosov July 6, 2014 at 8:50PM

These are two really interesting articles, but there is something not clear to me:

you’ve started a Jenkins master as a Mesos (Marathon) job and didn’t do anything explicit to persist the changes.

Which means that as soon as the Jenkins master dies, Marathon will resurrect it, but it will be a clean installation without jobs and other configured things.

Is that for the sake of simplicity of the article, or am I missing something?


The eBay PaaS Team July 7, 2014 at 4:01PM


Great question! You’ve pointed out correctly that as soon as the Jenkins instance dies and Marathon re-spawns it, all the job configs and history will be lost. At eBay our PaaS system maintains preconfigured Jenkins config.xml and job templates depending on the stack chosen. It then provides a vanity Jenkins master URL to the developer using an HTTP proxy which resolves it to the correct Mesos instance. This vanity URL doesn’t change and the Marathon event bus can be used to update the proxy dynamically, capture lost tasks and create replacement tasks. Check out https://github.com/mesosphere/marathon/wiki/Event-Bus for more information.

The build artifacts can be persisted using NFS mounts, Amazon S3 or Openstack Swift which is outside the scope of this blog post.



ioan October 14, 2014 at 3:31AM

Great article guys, I think you managed to convey a pretty confusing subject in a straight-forward manner. Loved the diagrams!


mihai November 13, 2014 at 3:01AM

Great example.
It really showed me how it's done. I was very confused about all these new things. Great!

Any plans to update the software to latest, which supports docker natively?
Thank you!


Jay November 24, 2014 at 2:15PM

I too am interested in seeing if the docs could be updated with the latest versions of marathon, mesos, and docker. Thanks!


Joe Hughes December 12, 2014 at 3:26PM

Hey Guys,

Thanks for the article! It is great. I am running the latest Mesos and using the latest Mesos plugin. The problem I am seeing is that on the Mesos slaves I have to create a jenkins user. It seems that after that, the slave startup fails to fetch the Docker container. My guess is that the jenkins user needs to be added to the docker group. I am going to try that, and I will post if that fixes the issue.

There is a brief “Mesos Slave Setup” section at https://github.com/jenkinsci/mesos-plugin that mentions creating a jenkins user on the slaves. I was just curious if anyone else trying to set this up has hit this issue.

