Delivering eBay’s CI Solution with Apache Mesos – Part II

In part I of this post we laid out in detail how to run a large Jenkins CI farm in Mesos. In this post we explore running the builds inside Docker containers and more:

  • Explain the motivation for using Docker containers for builds.
  • Show how to handle the case where the build itself is a Docker build.
  • Peek into how the Mesos 0.19 release is going to change Docker integration.
  • Walk through a Vagrant all-in-one-box setup so you can try things out.


Jenkins follows the master-slave model and is capable of launching tasks as remote Java processes on Mesos slave machines. Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications or frameworks. We can leverage the capabilities of Jenkins and Mesos to run a Jenkins slave process within a Docker container using Mesos as the resource manager.

Why use Docker containers?

This page gives a good picture of what Docker is all about.

At eBay Inc., we have several different build clusters. They are partitioned primarily because of a number of factors: requirements to run different OS flavors (mostly RHEL and Ubuntu), software version conflicts, associated application dependencies, and special hardware. When using Mesos, we try to operate on a single cluster with heterogeneous workloads instead of having specialized clusters. Docker provides a good solution for isolating the different dependencies inside the container irrespective of the host setup where the Mesos slave is running, thereby helping us operate on a single cluster. Special hardware requirements can always be handled through slave attributes, which the Jenkins plugin already supports. Overall, this scheme helps maintain consistent host images in the cluster and avoids introducing a wide combination of Mesos slave host flavors, yet handles all the varied build dependencies within a container.

Now, why support a Docker-in-Docker setup?

When we started experimenting with running the builds in Docker containers, some of our teammates were working on enabling Docker images for applications. They posed the question, How do we support Docker build and push/pull operations within the Docker container used for the build? Valid point! So, we will explore two ways of handling this challenge. Many thanks to Jérôme Petazzoni from the Docker team for his guidance.

Environment setup

A Vagrant development VM setup demonstrates CI using Docker containers. This VM can be used for testing other frameworks like Chronos and Aurora; however, we will focus on its CI use with Marathon. The screenshots shown below have been taken from the Vagrant development environment setup, which runs a cluster of three Mesos masters, three Mesos slave instances, and one Marathon instance. (Marathon is a Mesos framework for long-running services. It provides a REST API for starting, stopping, and scaling services.)

Running Jenkins slaves inside Mesos Docker containers requires the following ecosystem:

  1. Jenkins master server with the Mesos scheduler plugin installed (used for building Docker containers via CI jobs).
  2. Apache Mesos master server with at least one slave server.
  3. Mesos Docker Executor installed on all Mesos slave servers. Mesos slaves delegate execution of tasks within Docker containers to the Docker executor. (Note that integration with Docker changes with the Mesos 0.19 release, as explained in the miscellaneous section at the end of this post.)
  4. Docker installed on all slave servers (to automate the deployment of any application as a lightweight, portable, self-sufficient container that will run virtually anywhere).
  5. Docker build container image in the Docker registry.
  6. Marathon framework.

1. Creating the Jenkins master instance

We needed to first launch a standalone Jenkins master instance in Mesos via the Marathon framework.  We placed Jenkins plugins in the plugins directory, and included a default config.xml file with pre-configured settings. Jenkins was then launched by executing the jenkins.war file. Here is the directory structure that we used for launching the Jenkins master:

├── config.xml
├── hudson.model.UpdateCenter.xml
├── jenkins.war
├── jobs
├── nodeMonitors.xml
├── plugins
│   ├── mesos.hpi
│   └── saferestart.jpi
└── userContent
    └── readme.txt
3 directories, 8 files

2. Launching the Jenkins master instance

Marathon launched the Jenkins master instance using the following command, also shown in the Marathon UI screenshots below. We zipped our Jenkins files and downloaded them for the job by using the URIs field in the UI; however, for demonstration purposes, below we show using a Git repository to achieve the same goal.

git clone && cd jenkins-standalone;
export JENKINS_HOME=$(pwd);
java -jar jenkins.war
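
As a rough illustration of the launch step, the sketch below builds a Marathon app definition for the Jenkins master and the HTTP request that would submit it to Marathon's REST API. It is a minimal sketch under stated assumptions, not the exact configuration we used: the app id, the archive URL, the Marathon address, and the CPU and memory figures are all hypothetical, and the command assumes the archive unpacks into the task's working directory.

```python
import json
import urllib.request


def jenkins_master_app(archive_uri, cpus=1.0, mem_mb=1024):
    """Build a Marathon app definition for a standalone Jenkins master.

    All values here are illustrative assumptions, not the original settings.
    """
    return {
        "id": "jenkins-master",  # hypothetical app id
        # Assumes the archive contents land in the task's working directory.
        "cmd": "export JENKINS_HOME=$(pwd); java -jar jenkins.war",
        "cpus": cpus,
        "mem": mem_mb,
        "instances": 1,
        "uris": [archive_uri],  # files fetched before the task starts
    }


def marathon_create_request(marathon_url, app):
    """Prepare (but do not send) the POST that creates the app in Marathon."""
    body = json.dumps(app).encode("utf-8")
    return urllib.request.Request(
        marathon_url.rstrip("/") + "/v2/apps",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical archive location and Marathon address.
app = jenkins_master_app("http://example.com/jenkins-standalone.zip")
request = marathon_create_request("http://marathon1:8080", app)
print(request.get_method(), request.full_url)
print(json.dumps(app, indent=2))
```

Passing the prepared request to urllib.request.urlopen() would submit it; the sketch stops short of that so it can be read and run without a cluster.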






3. Launching Jenkins slaves using the Mesos Docker executor


Here’s a sample supervisord startup configuration for a Docker image capable of executing Jenkins slave jobs:


command=/bin/bash -c "eval $JENKINS_COMMAND"

As you can see, Jenkins passed its slave launch command as an environment variable to the Docker container. The container then initialized the Jenkins slave process, which fulfilled the basic requirement for kicking off the Jenkins slave job.
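
To make the hand-off concrete, here is a small Python sketch of what the container start-up step amounts to: read the launch command from the JENKINS_COMMAND environment variable and execute it through a shell. This is an illustration of the idea only; in our setup supervisord ran the equivalent bash one-liner shown above, and the placeholder command used in the demo is made up.

```python
import shutil
import subprocess


def run_slave_command(environ):
    """Run the command stored in JENKINS_COMMAND through a shell.

    `environ` is any mapping of environment variables (for example os.environ).
    """
    command = environ.get("JENKINS_COMMAND", "")
    if not command.strip():
        raise RuntimeError("JENKINS_COMMAND is not set")
    # Prefer bash, as in the supervisord entry; fall back to sh if bash is absent.
    shell = shutil.which("bash") or "/bin/sh"
    return subprocess.run([shell, "-c", command], capture_output=True, text=True)


# Placeholder command standing in for the real "download slave.jar and start it".
demo = run_slave_command(
    {"JENKINS_COMMAND": "echo 'fetch slave.jar' && echo 'start agent'"}
)
print(demo.stdout, end="")
```

Inside a real container the function would be called with os.environ, and the command supplied by Jenkins would download slave.jar and start the agent.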

This configuration was sufficient to launch regular builds within the Docker container of choice. Now let’s walk through the two options that we explored to run Docker operations for a CI build inside a Docker container. Strategy #1 required use of supervisord to control the Docker daemon process. For the default case (regular non-Docker builds) and strategy #2, supervisord was not required; one could simply pass the command directly to the Docker container.

3.1 Strategy #1 – Using an individual Docker-in-Docker (dind) setup on each Mesos slave

This strategy, inspired by this blog, involved a dedicated Docker daemon inside the Docker container. The advantage of this approach was that we didn’t have a single Docker daemon handling a large number of container builds. On the flip side, each container was now absorbing the I/O overhead of downloading and duplicating all the AUFS file system layers.


The Docker-in-Docker container had to be launched in privileged mode (by including the “-privileged” option in the Mesos Docker executor code); otherwise, nested Docker containers wouldn’t work. Using this strategy, we ended up having two Docker executors:  one for launching Docker containers in non-privileged mode (/var/lib/mesos/executors/docker) and the other for launching Docker-in-Docker containers in privileged mode (/var/lib/mesos/executors/docker2). The supervisord process manager configuration was updated to run the Docker daemon process in addition to the Jenkins slave job process.


The following Docker-in-Docker image has been provided for demonstration purposes for testing out the multi-Docker setup:


In real life, the actual build container image would capture the build dependencies and base image flavor, in addition to the contents of the above dind image. The actual command that the Docker executor ran looked similar to this one:

docker run 
-cidfile /tmp/docker_cid.6c6bba3db72b7483 
-c 51 -m 302365697638 
-e JENKINS_COMMAND=wget -O slave.jar && java -DHUDSON_HOME=jenkins -server -Xmx256m -Xms16m -XX:+UseConcMarkSweepGC -jar slave.jar  -jnlpUrl hashish/jenkins-dind

3.2 Strategy #2 – Using a shared Docker setup on each Mesos slave

All of the Jenkins slaves running on a Mesos slave host could simply use a single Docker daemon for running their Docker containers, which was the default standard setup. This approach eliminated redundant network and disk I/O involved with downloading the AUFS file system layers. For example, all Java application projects could now reuse the same AUFS file system layers that contained the JDK, Tomcat, and other static Linux package dependencies. We lost isolation as far as the Docker daemon was concerned, but we gained a massive reduction in I/O and were able to leverage caching of build layers. This was the optimal strategy for our use case.


The Docker container mounted the host’s /var/run/docker.sock Unix socket as a shared volume so that its own Docker binary, located at /usr/local/bin/docker, could communicate with the host server’s Docker daemon. All Docker commands were therefore executed directly by the host server’s Docker daemon. This eliminated the need to run an individual Docker daemon process in each Docker container running on a Mesos slave server.
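
A build script running in such a container may want to confirm that the host's socket really was mounted before it invokes any Docker command. The following Python sketch shows one possible preflight check; it is an illustrative addition rather than part of our setup, and the demo uses a throwaway socket in a temporary directory so it runs without Docker.

```python
import os
import socket
import stat
import tempfile


def docker_socket_available(path="/var/run/docker.sock"):
    """Return True if `path` exists and is a Unix domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return False  # missing path or no permission to stat it
    return stat.S_ISSOCK(mode)


# Demo with a temporary socket standing in for the mounted docker.sock.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "docker.sock")
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(demo_path)  # creates the socket file on disk
    found_socket = docker_socket_available(demo_path)
    server.close()
    missing = docker_socket_available(os.path.join(tmp, "absent.sock"))

print(found_socket, missing)
```

The check only confirms that a socket is present; whether the build user is allowed to talk to the daemon through it is a separate permissions question.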

The following Docker image has been provided for demonstration purposes for a shared Docker setup. The actual build Docker container image of choice essentially just needed to execute the Docker binary via its CLI. We could even have mounted the Docker binary from the host server itself to the same end.


The actual command that the Docker executor ran looked similar to this:

docker run 
-cidfile /tmp/docker_cid.6c6bba3db72b7483 
-v /var/run/docker.sock:/var/run/docker.sock 
-c 51 -m 302365697638 
-e JENKINS_COMMAND=wget -O slave.jar && java -DHUDSON_HOME=jenkins -server -Xmx256m -Xms16m -XX:+UseConcMarkSweepGC -jar slave.jar  -jnlpUrl hashish/jenkins-dind-single
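
The two launch commands differ in only one respect: strategy #1 starts the container in privileged mode, while strategy #2 bind-mounts the host's Docker socket. The Python sketch below assembles both argument lists to make that difference explicit. It is a simplified illustration, not the executor's code: the function name, the placeholder slave command, the cid file path, and the memory figure are hypothetical, and it uses the long option spellings of current Docker CLIs rather than the single-dash forms shown above.

```python
import shlex

DOCKER_SOCKET = "/var/run/docker.sock"


def docker_run_args(image, jenkins_command, strategy,
                    cpu_shares=51, memory_bytes=512 * 1024 * 1024,
                    cid_file="/tmp/docker_cid.example"):
    """Assemble a `docker run` argument list for one of the two strategies.

    strategy: "dind" (own daemon, privileged) or "shared" (host daemon via socket).
    """
    args = [
        "docker", "run",
        "--cidfile", cid_file,
        "--cpu-shares", str(cpu_shares),
        "--memory", str(memory_bytes),
        "--env", "JENKINS_COMMAND=" + jenkins_command,
    ]
    if strategy == "dind":
        # A nested Docker daemon needs privileged mode.
        args.append("--privileged")
    elif strategy == "shared":
        # Reuse the host daemon by mounting its socket into the container.
        args += ["--volume", DOCKER_SOCKET + ":" + DOCKER_SOCKET]
    else:
        raise ValueError("unknown strategy: " + repr(strategy))
    args.append(image)
    return args


placeholder = "echo placeholder-slave-command"  # stands in for the real command
dind_args = docker_run_args("hashish/jenkins-dind", placeholder, "dind")
shared_args = docker_run_args("hashish/jenkins-dind-single", placeholder, "shared")
print(" ".join(shlex.quote(a) for a in dind_args))
print(" ".join(shlex.quote(a) for a in shared_args))
```

Building the command as a list and quoting it only for display avoids the shell-quoting problems that arise when the slave command itself contains spaces and && operators.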

4. Specifying the cloud configuration for the Jenkins master

We then needed to configure the Jenkins master so that it would connect to the Mesos master server and start receiving resource offers, after which it could begin launching tasks on Mesos. The following screenshots illustrate how we configured the Jenkins master via its web administration UI.






Note: The Docker-specific configuration options above are not available in the stable release of the Mesos plugin. Major changes are underway in the upcoming Mesos 0.19.0 release, which will introduce the pluggable containerizer functionality. We decided to wait for 0.19.0 to be released before making a pull request for this feature. Instead, a modified .hpi plugin file was created from this Jenkins Mesos plugin branch and has been included in the Vagrant dev setup.



5. Creating the Jenkins Mesos Docker job

Now that the Jenkins scheduler had registered as a framework in Mesos, it started receiving resource offers from the Mesos master. The next step was to create a Jenkins job that would be launched on a Mesos slave whose resource offer satisfied the cloud configuration requirements.
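
Conceptually, deciding whether an offer satisfies the cloud configuration comes down to comparing the offered resources and attributes with what the job needs. The Python sketch below illustrates that comparison with plain dictionaries. It is a simplified model written for this explanation, not the plugin's actual matching logic, and the field names and the "os" attribute are assumptions.

```python
def offer_matches(offer, required_cpus, required_mem_mb, required_attributes=None):
    """Return True if a (simplified) resource offer can host a Jenkins slave.

    offer: dict with "cpus", "mem" (MB) and optional "attributes" (dict).
    """
    if offer.get("cpus", 0.0) < required_cpus:
        return False
    if offer.get("mem", 0.0) < required_mem_mb:
        return False
    offered_attributes = offer.get("attributes", {})
    wanted = required_attributes or {}
    # Every required attribute must be present with the same value.
    return all(offered_attributes.get(key) == value for key, value in wanted.items())


offer = {"cpus": 2.0, "mem": 4096.0, "attributes": {"os": "ubuntu"}}
print(offer_matches(offer, 0.5, 512))
print(offer_matches(offer, 4.0, 512))
print(offer_matches(offer, 0.5, 512, {"os": "rhel"}))
```

An offer that fails the check is simply declined, and the scheduler waits for the next offer from the Mesos master.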

5.1 Creating a Docker Tomcat 7 application container image

Jenkins first needed a Docker container base image that packaged the application code and dependencies as well as a web server. For demonstration purposes, here’s a sample Docker Tomcat 7 image created from this GitHub repository:


Every application’s Git repository would be expected to have its unique Dockerfile with whatever combination of Java/PHP/Node.js pre-installed in a base container. In the case of our Java apps, we simply built the .war file using Maven and then inserted it into the Docker image during build time. The Docker image was then tagged with the application name, version, and timestamp, and then uploaded into our private Docker registry.
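
The tagging scheme can be pictured with a short Python sketch that composes an image reference from the application name, version, and build timestamp. The exact format we used is not spelled out here, so the layout below (version and a UTC timestamp joined by a hyphen), the registry host, and the application name are assumptions chosen for illustration.

```python
from datetime import datetime, timezone


def image_reference(registry, app_name, version, built_at):
    """Compose `registry/app:version-timestamp` for a freshly built image.

    The tag layout is an illustrative assumption, not the original format.
    """
    stamp = built_at.strftime("%Y%m%d%H%M%S")
    return f"{registry}/{app_name}:{version}-{stamp}"


# Hypothetical registry host, application name, and build time.
reference = image_reference(
    "registry.example.com:5000",
    "sample-app",
    "1.0.0",
    datetime(2014, 5, 1, 12, 30, 0, tzinfo=timezone.utc),
)
print(reference)
```

Including the timestamp keeps every build's image distinct, so an earlier image stays available in the registry if a rollback is needed.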

5.2 Running a Jenkins Docker job

For demonstration purposes, the following example assumes that we are building a basic Java web application.







Once Jenkins built and uploaded the new application’s Docker image containing the war, dependencies, and other packages, this Docker image was launched in Mesos and scaled up or down to as many instances as required via the Marathon APIs.
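
Scaling through Marathon amounts to updating the instance count of an existing app over its REST API. The Python sketch below prepares such an update request without sending it. Treat it as a minimal sketch: the Marathon address and the app id are hypothetical, and error handling and authentication are left out.

```python
import json
import urllib.request


def marathon_scale_request(marathon_url, app_id, instances):
    """Prepare (but do not send) a request that sets an app's instance count."""
    if instances < 0:
        raise ValueError("instances must not be negative")
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        f"{marathon_url.rstrip('/')}/v2/apps/{app_id.strip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical Marathon address and app id.
scale_up = marathon_scale_request("http://marathon1:8080", "sample-app", 5)
print(scale_up.get_method(), scale_up.full_url, scale_up.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen() would apply the change; scaling down is the same call with a smaller number.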

Miscellaneous points

Our Docker integration with Mesos is going to be outdated soon with the 0.19 release. Our setup was built against Mesos 0.17 and Docker 0.9. You can read about the Mesos pluggable containerizer feature in this blog and in this ticket. The Mesosphere team is also working on the deimos project to integrate Docker with the external containerization approach. There is an old pull request against the Mesos Jenkins plugin to integrate containerization once it’s released. We will update our setup accordingly when this feature is rolled out. We’d like to add a disclaimer that the Docker integration in the above post hasn’t been tested at scale yet; we will do our due diligence once Mesos 0.19 and deimos are out.

For different build dependencies, you can define a build label for each. A merged PR already specifies the attributes per label. Hence, a Docker container image of choice can be added per build label.
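
One way to picture the label-to-image association is a simple lookup table, sketched below in Python. The label names are hypothetical, and the two images are the demonstration images mentioned earlier in this post; a real setup would hold this mapping in the Jenkins cloud configuration rather than in code.

```python
# Hypothetical build labels mapped to the demonstration images from this post.
LABEL_TO_IMAGE = {
    "docker-shared": "hashish/jenkins-dind-single",
    "docker-isolated": "hashish/jenkins-dind",
}


def image_for_label(label, mapping=LABEL_TO_IMAGE):
    """Return the build container image configured for a Jenkins build label."""
    try:
        return mapping[label]
    except KeyError:
        raise KeyError("no Docker image configured for label: " + label) from None


print(image_for_label("docker-shared"))
print(image_for_label("docker-isolated"))
```

Failing loudly on an unknown label is a deliberate choice in this sketch, since silently falling back to a default image could hide a misconfigured job.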


This concludes the description of our journey, giving a good overview of how we ran a distributed CI solution on top of Mesos, utilizing resources in the most efficient manner and isolating build dependencies through Docker.

33 thoughts on “Delivering eBay’s CI Solution with Apache Mesos – Part II”

  2. Daniel

    Linking /var/run/docker.sock into the container seems an interesting approach. I have experimented with this setup as well. But now I am struggling with this container: it cannot be removed (“device or resource busy” when removing the root file system of the container) or started again (“Error getting container … from driver aufs”). Did you face the same problem?

    1. The eBay PaaS Team Post author

      Daniel, thanks for bringing this point up. In the version of the mesos-docker executor from Mesosphere that we were using for our testing, cleanup of the Mesos task only performed a docker stop operation, which succeeded even though the unmount failed.
      In the latest version of the mesos-docker executor, cleanup_container() runs “docker rm”, which would hang, at least in Docker 0.9, because of the mounted docker.sock. However, the good news is that the issue is fixed in the latest Docker 0.11 release. It was also tracked in this resolved issue.

      We will also update the Vagrant setup mentioned in the article to use the latest Docker executor and Docker 0.11 so that the container can be removed cleanly.
      The latest docker documentation is also endorsing this approach
      (To quote ‘By bind-mounting the docker unix socket and statically linked docker binary (such as that provided by, you give the container the full access to create and manipulate the host’s docker daemon.’)

      Thanks for bringing this point up.

  3. Adam Spektor

    Great article.
    I have several questions. If the parent Docker daemon executes all the commands, we will still have a problem with concurrent execution of Tomcat (for example); I’m talking about port collisions. I thought that all containers would be isolated in the child Docker so that I could run several Tomcats inside different containers without worrying about random ports. Also, when I stop the internal Docker I can still see the containers that were executed inside it, so I have to take care of tear-down myself.

    If all this is true (I hope it is not 🙂), what are the benefits of using Docker inside Docker?



    1. The eBay PaaS Team Post author


      Glad you asked! This blog post is in the context of CI builds and not about running application Docker containers on Mesos. To simply run multiple docker containers on Mesos, you do not need to apply any of the strategies that we talked about.

      To answer your question about port assignments & collisions first – You can rely on the Marathon framework to run Docker containers in a Mesos cluster. In general Mesos has the ability to dynamically assign host-level ports which the executor (mesos-docker in this case) maps to the static ports defined in the app Dockerfile. This takes care of the port collision problem.

      eBay has a polyglot platform running Java, C++, Node.js, Python and Scala applications. Running CI builds as a plain Mesos task would require us to install all the dependencies on the Mesos slave host server. Imagine having to install the latest JDK or Python updates on thousands of production Mesos slave nodes... painful indeed! Downloading and installing these dependencies during build time in every CI job is equally painful due to the I/O overhead. So relying on Docker is a necessity since all these dependencies are now isolated within the cached Docker container layers used to run the Jenkins job.

      Now, Docker-in-Docker wouldn’t be required if the final deliverable of the Jenkins job was simply a war or zip file (a popular use case). A single Docker build container isolating the different dependencies can achieve that. Our final build deliverable in this case is a standalone Docker image and in our example the build job running inside the Docker container had to run docker operations like build and push. The outer Docker container handles caching of the build job CI dependencies and the inner Docker installation handles the build & push operation of the app Docker image. By using strategy #2 as described in the above post we’re simply relaying the Docker build & push commands to the Mesos slave host’s Docker daemon.

      Finally, about the container tear down and cleanup – I believe that if you used strategy #1 then you would have to clean up the AUFS layers used by the nested Docker container manually and that’s why we preferred strategy #2. Using the Docker “rm” option should clean up the remnant container. Please refer to

      Hope this answers your questions.


  4. Chong Chen

    Very interesting article. I do have a technical question regarding the Mesos Docker executor and the Jenkins slave.

    In the Mesos world, once a framework gets an offer, it launches tasks to use the offered resources. In the Jenkins context, what does each task represent? A build item, or just a Jenkins slave daemon that needs to connect back to the master to fetch a build item? In particular, what does this task mean in your picture?


    And when do you terminate the Jenkins Docker container? Do you terminate it once each build item completes?

    1. The eBay PaaS Team Post author

      A “Task” is a generic term used for anything launched in Mesos. For example, if you ran a bash script in Mesos using the default command executor that is a task; if you ran a Docker container in Mesos using the Docker executor that too is a task. The Jenkins master instance, itself acting as a Mesos framework, launches its build job/task and a unique id is assigned to it which is “mesos-jenkins-be65c8fa-d409-4743-a3fa-c8679808c7cc”. This task is actually a Docker container inside which the Jenkins slave agent is initialized using supervisord. The slave agent then proceeds to download the job build scripts from the Jenkins master and executes them.

      Jenkins has an “Idle Termination Minutes” field in its configuration (shown in one of the screenshots above) which controls how long the slave job is kept around after its build has completed. Developers can set the timeout appropriately so that the Docker container is terminated by the Jenkins master via the Docker executor immediately after the build job has completed or it can be kept around for several hours for reuse in a subsequent build. Either approach is fine depending on how you want to manage your Mesos cluster resources.



  9. Julien Eid

    Hey! I’m trying to use which is located in and I’m finding that I get this error with Docker enabled.

    INFO: Received offers 1
    Jun 16, 2014 10:54:06 PM org.jenkinsci.plugins.mesos.JenkinsScheduler resourceOffers
    INFO: Received offers 1
    Jun 16, 2014 10:54:09 PM org.jenkinsci.plugins.mesos.MesosCloud provision
    INFO: Provisioning Jenkins Slave on Mesos with 1 executors. Remaining excess workload: 0 executors)
    Jun 16, 2014 10:54:09 PM hudson.slaves.NodeProvisioner update
    INFO: Started provisioning MesosCloud from MesosCloud with 1 executors. Remaining excess workload:0.0
    Jun 16, 2014 10:54:09 PM org.jenkinsci.plugins.mesos.MesosComputerLauncher
    INFO: Constructing MesosComputerLauncher
    Jun 16, 2014 10:54:09 PM org.jenkinsci.plugins.mesos.MesosSlave
    INFO: Constructing Mesos slave
    Jun 16, 2014 10:54:12 PM org.jenkinsci.plugins.mesos.JenkinsScheduler resourceOffers
    INFO: Received offers 1
    Jun 16, 2014 10:54:17 PM org.jenkinsci.plugins.mesos.JenkinsScheduler resourceOffers
    INFO: Received offers 1
    Jun 16, 2014 10:54:19 PM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
    INFO: Launching slave computer mesos-jenkins-bc416717-768b-4e8b-a7cd-3bab75ae0db4
    Jun 16, 2014 10:54:19 PM org.jenkinsci.plugins.mesos.MesosComputerLauncher launch
    INFO: Sending a request to start jenkins slave mesos-jenkins-bc416717-768b-4e8b-a7cd-3bab75ae0db4
    Jun 16, 2014 10:54:19 PM org.jenkinsci.plugins.mesos.JenkinsScheduler requestJenkinsSlave
    INFO: Enqueuing jenkins slave request
    Jun 16, 2014 10:54:19 PM hudson.slaves.NodeProvisioner update
    INFO: MesosCloud provisioning successfully completed. We have now 2 computer(s)
    Jun 16, 2014 10:54:22 PM org.jenkinsci.plugins.mesos.JenkinsScheduler resourceOffers
    INFO: Received offers 1
    Jun 16, 2014 10:54:22 PM org.jenkinsci.plugins.mesos.JenkinsScheduler matches
    WARNING: Ignoring disk resources from offer
    Jun 16, 2014 10:54:22 PM org.jenkinsci.plugins.mesos.JenkinsScheduler matches
    INFO: Ignoring ports resources from offer
    Jun 16, 2014 10:54:22 PM org.jenkinsci.plugins.mesos.JenkinsScheduler resourceOffers
    INFO: Offer matched! Creating mesos Docker task
    Jun 16, 2014 10:54:22 PM org.jenkinsci.plugins.mesos.JenkinsScheduler createMesosDockerTask
    INFO: Launching task mesos-jenkins-bc416717-768b-4e8b-a7cd-3bab75ae0db4 with command exec /var/lib/mesos/executors/docker ubuntu
    java.lang.NoSuchMethodError: org.apache.mesos.MesosSchedulerDriver.launchTasks(Ljava/util/Collection;Ljava/util/Collection;Lorg/apache/mesos/Protos$Filters;)Lorg/apache/mesos/Protos$Status;
    at org.jenkinsci.plugins.mesos.JenkinsScheduler.createMesosDockerTask(
    at org.jenkinsci.plugins.mesos.JenkinsScheduler.resourceOffers(
    Jun 16, 2014 10:54:22 PM org.jenkinsci.plugins.mesos.JenkinsScheduler$1 run
    SEVERE: The mesos driver was aborted!

    It is a NoSuchMethodError. I only have this issue when Docker is enabled, causing driver.launchTasks(offerIds, tasks, filters); to be run instead of normally driver.launchTasks(offer.getId(), tasks, filters); when Docker is disabled. I see that offerIds is a List of ID’s, does launchTasks actually take a List? Because normally, you pass a single id.

    1. The eBay PaaS Team Post author


      Just a guess but the exception might indicate an incompatibility between the Mesos driver version that plugin is using and your Mesos cluster version. The driver.launchTasks() method probably requires a list of OfferIds instead of simply passing in a single OfferId now.

      Are you using the above Vagrant setup which has Mesos 0.18.2 and Docker 0.11.0 or is it a different environment? This plugin .hpi file was last built using Mesos 0.17.0 jars and seems to work fine with Mesos 0.18.2.


  12. Ivan Kurnosov

    These are two really interesting articles, but there is something that’s not clear to me:

    You started a Jenkins master as a Mesos (Marathon) job and didn’t do anything explicit to persist the changes.

    This means that as soon as the Jenkins master dies, Marathon will resurrect it, but it will be a clean installation without jobs and other configuration.

    Is that for the sake of simplicity in the article, or am I missing something?

    1. The eBay PaaS Team Post author


      Great question! You’ve pointed out correctly that as soon as the Jenkins instance dies and Marathon re-spawns it, all the job configs and history will be lost. At eBay our PaaS system maintains preconfigured Jenkins config.xml and job templates depending on the stack chosen. It then provides a vanity Jenkins master URL to the developer using an HTTP proxy which resolves it to the correct Mesos instance. This vanity URL doesn’t change and the Marathon event bus can be used to update the proxy dynamically, capture lost tasks and create replacement tasks. Check out for more information.

      The build artifacts can be persisted using NFS mounts, Amazon S3 or Openstack Swift which is outside the scope of this blog post.


  14. ioan

    Great article guys, I think you managed to convey a pretty confusing subject in a straight-forward manner. Loved the diagrams!

  15. mihai

    Great example.
    It really showed me how it is done. I was very confused about all these new things. Great!

    Any plans to update the software to latest, which supports docker natively?
    Thank you!

  16. Jay

    I too am interested in seeing if the docs could be updated with the latest versions of marathon, mesos, and docker. Thanks!

  17. Joe Hughes

    Hey Guys,

    Thanks for the article! It is great. I am running the latest Mesos and using the latest Mesos plugin. The problem I am seeing is that I have to create a jenkins user on the Mesos slaves. It seems that after that, the slave startup fails to fetch the Docker container. My guess is that the jenkins user needs to be added to the docker group. I am going to try that and I will post if that fixes the issue.

    There is a brief section at that mentions “Mesos Slave Setup” that mentions creating a jenkins user on the slaves. I was just curious if anyone else trying to set this up has hit this issue.

  20. Arvind

    eBay has a polyglot platform running Java, C++, Node.js, Python and Scala applications. Running CI builds as a plain Mesos task would require us to install all the dependencies on the Mesos slave host server. Imagine having to install the latest JDK or Python updates on thousands of production Mesos slave nodes... painful indeed! Downloading and installing these dependencies during build time in every CI job is equally painful due to the I/O overhead. So relying on Docker is a necessity since all these dependencies are now isolated within the cached Docker container layers used to run the Jenkins job.

    Do you have a single container with the build toolchain for all of the above languages, or do you have one Docker image per build toolchain? If the latter, how does one configure Jenkins to use the right image as the build slave?

  21. Shaohua Wen

    Great article! Thank you so much! I’ve tried and it works!
    One question is about the event bus and the Nginx reverse proxy: how is it implemented? I’m thinking of creating a small webapp that subscribes to the Marathon events and then updates/reloads the Nginx config once the Jenkins master has crashed and come up again. Any better ideas?


  22. tony

    hi guys,

    thanks for sharing!

    i’m trying a similar setup and i have a couple of questions.

    (1) privs on /var/run/docker.sock:

    i’ve found i have to do something like “chmod o+rw /var/run/docker.sock” on the host to get this to work as i’m running as “jenkins” in the container. feels hacky, but i know there are some outstanding issues around that with docker ( what are you guys doing around that?
    (2) javax.servlet.ServletException: java.lang.UnsatisfiedLinkError: no mesos in java.library.path

    any guidance appreciated!


  25. Z

    We found it’s extremely slow to use make -j N to build C/C++ code within a Docker container.
    Have you guys run into this?
    Let’s say we run a Docker container with 10 cores and make -j10 within it.
    The total compile time is about 2-3 times slower than when we do the same thing outside the container.

    Hope somebody can help.

  26. Buildmaster

    Hello all.

    Are you guys still running the same setup?
    How has it evolved over time?
