eBay Tech Blog

Last year, our Gumtree Australia team started to use behavior-driven development in our agile development process. Initially, we searched for an open-source BDD framework to implement for BDD testing. Since we work on a web application, we wanted our automated BDD tests to exercise the application as a customer view, via a web interface. Of course, many web testing libraries do just that, including open-source tools such as Selenium and WebDriver (Selenium 2). However, we discovered that no existing BDD framework supports web automation testing directly.

We decided to simply integrate the open-source BDD tool JBehave with Selenium, but soon learned that this solution doesn’t completely fulfill the potential of automated BDD tests. It can’t drive the project development and documentation processes as much as we expected.

Different stakeholders need to view BDD tests at different levels. For instance, QA needs results to show how the application behaved under test. Upper managers are not so interested in the finer details, but rather want to see the number and complexity of the features defined and implemented so far, and if the project is still on track.

We needed a testing framework that could let us express and report on BDD tests at different levels, manage the stories and their scenarios effectively, and then drill down into the details as required. And so we started to develop a product that matches our needs. We named this product the Behavior Automation Framework (Beaf) and designed it to make the practice of behavior-driven development easier. Based on JBehave as well as more traditional tools like TestNg, Beaf includes a host of features to simplify writing automated BDD tests and interpreting the results.


Beaf’s extensions and utilities improve web testing on WebDriver/Selenium2 in four ways:

  • Supplying a Story web console, where PMs/QAs can manage all the existing stories, divide stories into different categories or groups, and create/edit stories using the correct BDD format.


  • Organizing web tests into reusable steps, and mapping them back to their original requirements and user stories.


  • Generating reports and documentation about BDD tests. Whenever a test case is executed, Beaf generates a report. Each report contains a narrative description of the test, including a short comment and screenshot for each step. The report provides not only information about the test results, but also documentation about how the scenarios under test have been implemented. Reports serve as living documentation, illustrating how an application implements the specified requirements.


  • Incorporating high-level summaries and aggregations of test results into reports. These overviews include how many stories and their scenarios have been tested by automated BDD tests. Taken as a set of objective metrics, test results show the relative size and progress of each feature being implemented.


When executing a test, whether it be with JUnit, jMock, or another framework, Beaf handles many of the Selenium 2/WebDriver infrastructure details. For example, Beaf can run cases cross-platform, including desktop (Firefox, Chrome), mobile (iPhone), and tablet (iPad). Testers can opt to open a new browser for each test, or use the same browser session for all of the tests in a class. The browser/device to be used can also be set in the Beaf configuration.


Using common steps provides a layer of abstraction between the behavior being tested and the way the web application implements that behavior. This level of abstraction makes it easier to manage changes in implementation details, because a desired behavior will generally change less frequently than the details of how it is to be implemented. Abstraction also allows implementation details to be centralized in one place.

@When("posting an Ad in the \"$category\" category")
public void postingAd(@Named("category") String category) throws Throwable
// 1. PageFactory will generate ad post pages under specified category.  
// 2. Page status will be sent to next step by ThreadLocal.
postAdPage.set(PostAdPageFactory.createPostAdPage(category, AdType.OFFER.name()));

In addition to test reports, Beaf supplies a very useful web module called the Beaf Dashboard, which provides a higher-level view of current status. The dashboard shows the state of all of the stories, both in terms of their relative priorities and in terms of how many P1/P2/P3 stories and scenarios are fully, partially, or not automated. This information gives a good idea of the amount of work involved in implementing different parts of the project. The dashboard also keeps track of test results over time, so that users can visualize in concrete terms the amount of work done so far versus the estimated amount of work remaining to be done.


Beaf facilitates QA people joining projects in stages. It provides PD/QA with the detailed information required to test and update code, while giving business managers and PMs the more high-level views and reports that suit their needs. However, the greater potential of Beaf is the ability to turn automated web tests into automated web acceptance testing, in the true spirit of BDD.


At eBay we run Hadoop clusters comprising thousands of nodes that are shared by thousands of users. We analyze data on these clusters to gain insights for improved customer experience. In this post, we look at distributing RPC resources fairly between heavy and light users, as well as mitigating denial of service attacks within Hadoop. By providing appropriate response times and increasing system availability, we offer a better Hadoop experience.

Problem: namenode slowdown

In our clusters, we frequently deal with slowness caused by heavy users, to the point of namenode latency increasing from less than a millisecond to more than half a second. In the past, we fixed this latency by finding and terminating the offending job. However, this reactive approach meant that damage had already been done—in extreme cases, we lost cluster operation for hours.

This slowness is a consequence of the original design of Hadoop. In Hadoop, the namenode is a single machine that coordinates HDFS operations in its namespace. These operations include getting block locations, listing directories, and creating files. The namenode receives HDFS operations as RPC calls and puts them in a FIFO call queue for execution by reader threads. The dataflow looks like this:

FIFO call queue

Though FIFO is fair in the sense of first-come-first-serve, it is unfair in the sense that users who perform more I/O operations on the namenode will be served more than users who perform less I/O. The result is the aforementioned latency increase.

We can see the effect of heavy users in the namenode auditlogs on days where we get support emails complaining about HDFS slowness:


Each color is a different user, and the area indicates call volume. Single users monopolizing cluster resources are a frequent cause of slowdown. With only one namenode and thousands of datanodes, any poorly written MapReduce job is a potential distributed denial-of-service attack.

Solution: quality of service

Taking inspiration from routers—some of which include QoS (quality of service) capabilities—we replaced the FIFO queue with a new type of queue, which we call the FairCallQueue.


The scheduler places incoming RPC calls into a number of queues based on the call volume of the user who made the call. The scheduler keeps track of recent calls, and prioritizes calls from lighter users over calls from heavy users.

The multiplexer controls the penalty of being in a low-priority queue versus a high-priority queue. It reads calls in a weighted round-robin fashion, preferring to read from high-priority queues and infrequently reading from the lowest-priority queues. This ensures that high-priority requests are served first, and prevents starvation of low-priority RPCs.

The multiplexer and scheduler are connected by a multi-level queue; together, these three form the FairCallQueue. In our tests at scale, we’ve found the queue is effective at preserving low latencies even in the face of overwhelming denial-of-service attacks on the namenode.

This plot shows the latency of a minority user during three runs of a FIFO queue (QoS disabled) and the FairCallQueue (QoS enabled). As expected, the latency is much lower when the FairCallQueue is active. (Note: spikes are caused by garbage collection pauses, which are a separate issue).


Open source and beyond

The 2.4 release of Apache Hadoop includes the prerequisites to namenode QoS. With this release, cluster owners can modify the implementation of the RPC call queue at runtime and choose to leverage the new FairCallQueue. You can try the patches on Apache’s JIRA: HADOOP-9640.

The FairCallQueue can be customized with other schedulers and multiplexers to enable new features. We are already investigating future improvements, such as weighting different RPC types for more intelligent scheduling and allowing users to manually control which queues certain users are scheduled into. In addition, there are features submitted from the open source community that build upon QoS, such as RPC client backoff and Fair Share queuing.

With namenode QoS in place, we have improved our users’ experience of our Hadoop clusters by providing faster and more uniform response times to well-behaved users while minimizing the impact of poorly written or badly behaved jobs. This in turn allows our analysts to be more productive and focus on the things that matter, like making your eBay experience a delightful one.

- Chris Li

eBay Global Data Infrastructure Analytics Team


The Platform and Infrastructure team at eBay Inc. is happy to announce the open-sourcing of Oink – a self-service solution to Apache Pig.

Pig and Hadoop overview

Apache Pig is a platform for analyzing large data sets. It uses a high-level language for expressing data analysis programs, coupled with the infrastructure for evaluating these programs. Pig abstracts the Map/Reduce paradigm, making it very easy for users to write complex tasks using Pig’s language, called Pig Latin. Because execution of tasks can be optimized automatically, Pig Latin allows users to focus on semantics rather than efficiency. Another key benefit of Pig Latin is extensibility:  users can do special-purpose processing by creating their own functions.

Apache Hadoop and Pig provide an excellent platform for extracting and analyzing data from very large application logs. At eBay, we on the Platform and Infrastructure team are responsible for storing TBs of logs that are generated every day from thousands of eBay application servers. Hadoop and Pig offer us an array of tools to search and view logs and to generate reports on application behavior. As the logs are available in Hadoop, engineers (users of applications) also have the ability to use Hadoop and Pig to do custom processing, such as Pig scripting to extract useful information.

The problem

Today, Pig is primarily used through the command line to spawn jobs. This model wasn’t well suited to the Platform team at eBay, as the cluster that housed the application logs was shared with other teams. This situation created a number of issues:

  • Governance – In a shared-cluster scenario, governance is critically important to attain. Pig scripts and requests of one customer should not impact those of other customers and stakeholders of the cluster. In addition, providing CLI access would make governance difficult in terms of controlling the number of job submissions.
  • Scalability – CLI access to all Pig customers created another challenge:  scalability. Onboarding customers takes time and is a cumbersome process.
  • Change management – No easy means existed to upgrade or modify common libraries.

Hence, we needed a solution that acted as a gateway to Pig job submission, provided QoS, and abstracted the user from cluster configuration.

The solution:  Oink

Oink solves the above challenges not only by allowing execution of Pig requests through a REST interface, but also by enabling users to register jars, view the status of Pig requests, view Pig request output, and even cancel a running Pig request. With the REST interface, the user has a cleaner way to submit Pig requests compared to CLI access. Oink serves as a single point of entry for Pig requests, thereby facilitating rate limiting and QoS enforcement for different customers.

oinkOink runs as a servlet inside a web container and allows users to run multiple requests in parallel within a single JVM instance. This capability was not supported initially, but rather required the help of the patch found in PIG-3866. This patch provides multi-tenant environment support so that different users can share the same instance.

With Oink, eBay’s Platform and Infrastructure team has been able to onboard 100-plus different use cases onto its cluster. Currently, more than 6000 Pig jobs run every day without any manual intervention from the team.

Special thanks to Vijay Samuel, Ruchir Shah, Mahesh Somani, and Raju Kolluru for open-sourcing Oink. If you have any queries related to Oink, please submit an issue through GitHub.

{ 1 comment }

Using Spark to Ignite Data Analytics

by eBay Global Data Infrastructure Analytics Team on 05/28/2014

in Data Infrastructure and Services,Machine Learning

At eBay we want our customers to have the best experience possible. We use data analytics to improve user experiences, provide relevant offers, optimize performance, and create many, many other kinds of value. One way eBay supports this value creation is by utilizing data processing frameworks that enable, accelerate, or simplify data analytics. One such framework is Apache Spark. This post describes how Apache Spark fits into eBay’s Analytic Data Infrastructure.


What is Apache Spark?

The Apache Spark web site describes Spark as “a fast and general engine for large-scale data processing.” Spark is a framework that enables parallel, distributed data processing. It offers a simple programming abstraction that provides powerful cache and persistence capabilities. The Spark framework can be deployed through Apache Mesos, Apache Hadoop via Yarn, or Spark’s own cluster manager. Developers can use the Spark framework via several programming languages including Java, Scala, and Python. Spark also serves as a foundation for additional data processing frameworks such as Shark, which provides SQL functionality for Hadoop.

Spark is an excellent tool for iterative processing of large datasets. One way Spark is suited for this type of processing is through its Resilient Distributed Dataset (RDD). In the paper titled Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, RDDs are described as “…fault-tolerant, parallel data structures that let users explicitly persist intermediate results in memory, control their partitioning to optimize data placement, and manipulate them using a rich set of operators.” By using RDDs,  programmers can pin their large data sets to memory, thereby supporting high-performance, iterative processing. Compared to reading a large data set from disk for every processing iteration, the in-memory solution is obviously much faster.

The diagram below shows a simple example of using Spark to read input data from HDFS, perform a series of iterative operations against that data using RDDs, and write the subsequent output back to HDFS.


In the case of the first map operation into RDD(1), not all of the data could fit within the memory space allowed for RDDs. In such a case, the programmer is able to specify what should happen to the data that doesn’t fit. The options include spilling the computed data to disk and recreating it upon read. We can see in this example how each processing iteration is able to leverage memory for the reading and writing of its data. This method of leveraging memory is likely to be 100X faster than other methods that rely purely on disk storage for intermittent results.

Apache Spark at eBay

Today Spark is most commonly leveraged at eBay through Hadoop via Yarn. Yarn manages the Hadoop cluster’s resources and allows Hadoop to extend beyond traditional map and reduce jobs by employing Yarn containers to run generic tasks. Through the Hadoop Yarn framework, eBay’s Spark users are able to leverage clusters approaching the range of 2000 nodes, 100TB of RAM, and 20,000 cores.

The following example illustrates Spark on Hadoop via Yarn.


The user submits the Spark job to Hadoop. The Spark application master starts within a single Yarn container, then begins working with the Yarn resource manager to spawn Spark executors – as many as the user requested. These Spark executors will run the Spark application using the specified amount of memory and number of CPU cores. In this case, the Spark application is able to read and write to the cluster’s data residing in HDFS. This model of running Spark on Hadoop illustrates Hadoop’s growing ability to provide a singular, foundational platform for data processing over shared data.

The eBay analyst community includes a strong contingent of Scala users. Accordingly, many of eBay’s Spark users are writing their jobs in Scala. These jobs are supporting discovery through interrogation of complex data, data modelling, and data scoring, among other use cases. Below is a code snippet from a Spark Scala application. This application uses Spark’s machine learning library, MLlib, to cluster eBay’s sellers via KMeans. The seller attribute data is stored in HDFS.

 * read input files and turn into usable records
 var table = new SellerMetric()
 val model_data = sc.sequenceFile[Text,Text](
   v => parseRecord(v._2,table)
   v => v != null


 * build training data set from sample and summary data
 val train_data = sample_data.map( v =>
     i => zscore(v._2(i),sample_mean(i),sample_stddev(i))

 * train the model
 val model = KMeans.train(train_data,CLUSTERS,ITERATIONS)
 * score the data
 val results = grouped_model_data.map( 
   v => (
         i => zscore(v._2(i),sample_mean(i),sample_stddev(i))

In addition to  Spark Scala users, several folks at eBay have begun using Spark with Shark to accelerate their Hadoop SQL performance. Many of these Shark queries are easily running 5X faster than their Hive counterparts. While Spark at eBay is still in its early stages, usage is in the midst of expanding from experimental to everyday as the number of Spark users at eBay continues to accelerate.

The Future of Spark at eBay

Spark is helping eBay create value from its data, and so the future is bright for Spark at eBay. Our Hadoop platform team has started gearing up to formally support Spark on Hadoop. Additionally, we’re keeping our eyes on how Hadoop continues to evolve in its support for frameworks like Spark, how the community is able to use Spark to create value from data, and how companies like Hortonworks and Cloudera are incorporating Spark into their portfolios. Some groups within eBay are looking at spinning up their own Spark clusters outside of Hadoop. These clusters would either leverage more specialized hardware or be application-specific. Other folks are working on incorporating eBay’s already strong data platform language extensions into the Spark model to make it even easier to leverage eBay’s data within Spark. In the meantime, we will continue to see adoption of Spark increase at eBay. This adoption will be driven by chats in the hall, newsletter blurbs, product announcements, industry chatter, and Spark’s own strengths and capabilities.


In part I of this post we laid out in detail how to run a large Jenkins CI farm in Mesos. In this post we explore running the builds inside Docker containers and more:

  • Explain the motivation for using Docker containers for builds.
  • Show how to handle the case where the build itself is a Docker build.
  • Peek into how the Mesos 0.19 release is going to change Docker integration.
  • Walk through a Vagrant all-in-one-box setup so you can try things out.


Jenkins follows the master-slave model and is capable of launching tasks as remote Java processes on Mesos slave machines. Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications or frameworks. We can leverage the capabilities of Jenkins and Mesos to run a Jenkins slave process within a Docker container using Mesos as the resource manager.

Why use Docker containers?

This page gives a good picture of what Docker is all about.

At eBay Inc., we have several different build clusters. They are primarily partitioned due to a number of factors:  requirements to run different OS flavors (mostly RHEL and Ubuntu), software version conflicts, associated application dependencies, and special hardware. When using Mesos, we try to operate on a single cluster with heteregeneous workloads instead of having specialized clusters. Docker provides a good solution to isolate the different dependencies inside the container irrespective of the host setup where the Mesos slave is running, thereby helping us operate on a single cluster. Special hardware requirements can always be handled though slave attributes that the Jenkins plugin already supports. Overall, then, this setup scheme helps maintain consistent host images in the cluster, avoids having to introduce a wide combination of different flavors of Mesos slave hosts running, yet handles all the varied build dependencies within a container.

Now why support Docker-in-Docker setup?

When we started experimenting with running the builds in Docker containers, some of our teammates were working on enabling Docker images for applications. They posed the question, How do we support Docker build and push/pull operations within the Docker container used for the build? Valid point! So, we will explore two ways of handling this challenge. Many thanks to Jérôme Petazzoni from the Docker team for his guidance.

Environment setup

A Vagrant development VM setup demonstrates CI using Docker containers. This VM can be used for testing other frameworks like Chronos and Aurora; however, we will focus on the CI use of it with Marathon. The screenshots shown below have been taken from the Vagrant development environment setup, which runs a cluster of three Mesos masters, three Mesos slave instances, and one Marathon instance. (Marathon is a Mesos framework for long-running services. It provides a REST API for starting, stopping, and scaling services.) mesos1 marathon1 mesos2 mesos3

Running Jenkins slaves inside Mesos Docker containers requires the following ecosystem:

  1. Jenkins master server with the Mesos scheduler plugin installed (used for building Docker containers via CI jobs).
  2. Apache Mesos master server with at least one slave server .
  3. Mesos Docker Executor installed on all Mesos slave servers. Mesos slaves delegate execution of tasks within Docker containers to the Docker executor. (Note that integration with Docker changes with the Mesos 0.19 release, as explained in the miscellaneous section at the end of this post.)
  4. Docker installed on all slave servers (to automate the deployment of any application as a lightweight, portable, self-sufficient container that will run virtually anywhere).
  5. Docker build container image in the Docker registry.
  6. Marathon framework.

1. Creating the Jenkins master instance

We needed to first launch a standalone Jenkins master instance in Mesos via the Marathon framework.  We placed Jenkins plugins in the plugins directory, and included a default config.xml file with pre-configured settings. Jenkins was then launched by executing the jenkins.war file. Here is the directory structure that we used for launching the Jenkins master:

├── README.md
├── config.xml
├── hudson.model.UpdateCenter.xml
├── jenkins.war
├── jobs
├── nodeMonitors.xml
├── plugins
│   ├── mesos.hpi
│   └── saferestart.jpi
└── userContent
└── readme.txt
3 directories, 8 files

2. Launching the Jenkins master instance

Marathon launched the Jenkins master instance using the following command, also shown in the Marathon UI screenshots below. We zipped our Jenkins files and downloaded them for the job by using the URIs field in the UI; however, for demonstration purposes, below we show using a Git repository to achieve the same goal.

git clone https://github.com/ahunnargikar/jenkins-standalone && cd jenkins-standalone;
export JENKINS_HOME=$(pwd);
java -jar jenkins.war






3. Launching Jenkins slaves using the Mesos Docker executor


Here’s a sample supervisord startup configuration for a Docker image capable of executing Jenkins slave jobs:


command=/bin/bash -c "eval $JENKINS_COMMAND"

As you can see, Jenkins passed its slave launch command as an environment variable to the Docker container. The container then initialized the Jenkins slave process, which fulfilled the basic requirement for kicking off the Jenkins slave job.

This configuration was sufficient to launch regular builds within the Docker container of choice. Now let’s walk through the two options that we explored to run Docker operations for a CI build inside a Docker container. Strategy #1 required use of supervisord to control the Docker daemon process. For the default case (regular non-Docker builds) and strategy #2, supervisord was not required; one could simply pass the command directly to the Docker container.

3.1 Strategy #1 – Using an individual Docker-in-Docker (dind) setup on each Mesos slave

This strategy, inspired by this blog,  involved a dedicated Docker daemon inside the Docker container. The advantage of this approach was that we didn’t have a single Docker daemon handling a large number of container builds. On the flip side, each container was now absorbing the I/O overhead of downloading and duplicating all the AUFS file system layers.


The Docker-in-Docker container had to be launched in privileged mode (by including the “-privileged” option in the Mesos Docker executor code); otherwise, nested Docker containers wouldn’t work. Using this strategy, we ended up having two Docker executors:  one for launching Docker containers in non-privileged mode (/var/lib/mesos/executors/docker) and the other for launching Docker-in-Docker containers in privileged mode (/var/lib/mesos/executors/docker2). The supervisord process manager configuration was updated to run the Docker daemon process in addition to the Jenkins slave job process.


The following Docker-in-Docker image has been provided for demonstration purposes for testing out the multi-Docker setup:


In real life, the actual build container image would capture the build dependencies and base image flavor, in addition to the contents of the above dind image. The actual command that the Docker executor ran looked similar to this one:

docker run 
-cidfile /tmp/docker_cid.6c6bba3db72b7483 
-c 51 -m 302365697638 
-e JENKINS_COMMAND=wget -O slave.jar && java -DHUDSON_HOME=jenkins -server -Xmx256m -Xms16m -XX:+UseConcMarkSweepGC -Djava.net.preferIPv4Stack=true -jar slave.jar  -jnlpUrl hashish/jenkins-dind

3.2 Strategy #2 – Using a shared Docker Setup on each Mesos slave

All of the Jenkins slaves running on a Mesos slave host could simply use a single Docker daemon for running their Docker containers, which was the default standard setup. This approach eliminated redundant network and disk I/O involved with downloading the AUFS file system layers. For example, all Java application projects could now reuse the same AUFS file system layers that contained the JDK, Tomcat, and other static Linux package dependencies. We lost isolation as far as the Docker daemon was concerned, but we gained a massive reduction in I/O and were able to leverage caching of build layers. This was the optimal strategy for our use case.


The Docker container mounted the host’s /var/run/docker.sock file descriptor as a shared volume so that its native Docker binary, located at /usr/local/bin/docker, could now communicate with the host server’s Docker daemon. So all Docker commands were now directly being executed by the host server’s Docker daemon. This eliminated the need for running individual Docker daemon processes on the Docker containers that were running on a Mesos slave server.

The following Docker image has been provided for demonstration purposes for a shared Docker setup. The actual build Docker container image of choice essentially just needed to execute the Docker binary via its CLI. We could even have mounted the Docker binary from the host server itself to the same end.


The actual command that the Docker executor ran looked similar to this:

docker run 
-cidfile /tmp/docker_cid.6c6bba3db72b7483 
-v /var/run/docker.sock:/var/run/docker.sock 
-c 51 -m 302365697638 
-e JENKINS_COMMAND=wget -O slave.jar && java -DHUDSON_HOME=jenkins -server -Xmx256m -Xms16m -XX:+UseConcMarkSweepGC -Djava.net.preferIPv4Stack=true -jar slave.jar  -jnlpUrl hashish/jenkins-dind-single

4. Specifying the cloud configuration for the Jenkins master

We then needed to configure the Jenkins master so that it would connect to the Mesos master server and start receiving resource offers, after which it could begin launching tasks on Mesos. The following screenshots illustrate how we configured the Jenkins master via its web administration UI.






Note: The Docker-specific configuration options above are not available in the stable release of the Mesos plugin. Major changes are underway in the upcoming Mesos 0.19.0 release, which will introduce the pluggable containerizer functionality. We decided to wait for 0.19.0 to be released before making a pull request for this feature. Instead, a modified .hpi plugin file was created from this Jenkins Mesos plugin branch and has been included in the Vagrant dev setup.



5. Creating the Jenkins Mesos Docker job

Now that the Jenkins scheduler had registered as a framework in Mesos, it started receiving resource offers from the Mesos master. The next step was to create a Jenkins job that would be launched on a Mesos slave whose resource offer satisfied the cloud configuration requirements.

5.1 Creating a Docker Tomcat 7 application container image

Jenkins first needed a Docker container base image that packaged the application code and dependencies as well as a web server. For demonstration purposes, here’s a sample Docker Tomcat 7 image created from this Github repository:


Every application’s Git repository would be expected to have its unique Dockerfile with whatever combination of Java/PHP/Node.js pre-installed in a base container. In the case of our Java apps, we simply built the .war file using Maven and then inserted it into the Docker image during build time. The Docker image was then tagged with the application name, version, and timestamp, and then uploaded into our private Docker registry.

5.2 Running a Jenkins Docker job

For demonstration purposes, the following example assumes that we are building a basic Java web application.







Once Jenkins built and uploaded the new application’s Docker image containing the war, dependencies, and other packages, this Docker image was launched in Mesos and scaled up or down to as many instances as required via the Marathon APIs.

Miscellaneous points

Our Docker integration with Mesos is going to be outdated soon with the 0.19 release. Our setup was against Mesos 0.17 and Docker 0.9.  You can read about the Mesos pluggable containerizer feature in this blog and in this ticket. The Mesosphere team is also working on the deimos project to integrate Docker with the external containerization approach. There is an old pull request against the Mesos Jenkins plugin to integrate containerization once it’s released. We will update our setup accordingly when this feature is rolled out. We’d like to add a disclaimer that the Docker integration in the above post hasn’t been tested at scale yet; we will do our due diligence once Mesos 0.19 and deimos are out.

For different build dependencies, you can define a build label for each. A merged PR already specifies the attributes per label. Hence, a Docker container image of choice can be added per build label.


This concludes the description of our journey, giving a good overview of how we ran a distributed CI solution on top of Mesos, utilizing resources in the most efficient manner and isolating build dependencies through Docker.


Problem statement

In eBay’s existing CI model, each developer gets a personal CI/Jenkins Master instance. This Jenkins instance runs within a dedicated VM, and over time the result has been VM sprawl and poor resource utilization. We started looking at solutions to maximize our resource utilization and reduce the VM footprint while still preserving the individual CI instance model. After much deliberation, we chose Apache Mesos for a POC. This post shares the journey of how we approached this challenge and accomplished our goal.

Jenkins framework’s Mesos plugin

The Mesos plugin is Jenkins’ gateway into the world of Mesos, so it made perfect sense to bring the plugin code in sync with our requirements. This video explains the plugin. The eBay PaaS team made several pull requests to the Mesos code, adding both new features and bug fixes. We are grateful to the Twitter engineering team (especially Vinod) for their input and cooperation in quickly getting these features validated and rolled out.  Here are all the contributions that we made to the recently released 0.2 Jenkins Mesos plugin version. We are adding more features as we proceed.

Mesos cluster setup



Our new Mesos cluster is set up on top of our existing OpenStack deployment. In the model we are pursuing, we would necessarily have lots of Jenkins Mesos frameworks (each Jenkins Master is essentially a Mesos framework), and we did not want to run those outside of the Mesos cluster so that we would not have to separately provision and manage them. We therefore decided to use the Marathon framework as the Mesos meta framework; we launched the Jenkins master (and the Mesos framework) in Mesos itself. We additionally wanted to collocate the Jenkins masters in a special set of VMs in the cluster, using the placement constraint feature of Marathon that leverages slave attributes. Thus we separated Mesos slave nodes into a group of Jenkins masters and another group of Jenkins slave nodes. For backup purposes, we associated special block storage with the VMs running the CI master. Special thanks to the Mesosphere.io team for quickly resolving all of our queries related to Marathon and Mesos in general.

Basic testing succeeded as expected. The Jenkins master would launch through Marathon with a preconfigured Jenkins config.xml, and it would automatically register as a Mesos framework without needing any manual configuration. Builds were correctly launched in a Jenkins slave within one of the distributed Mesos slave nodes. Configuring slave attributes allowed the Mesos plugin to pick the nodes on which  slave jobs could be scheduled. Checkpointing was enabled so that build jobs were not lost if the slave process temporarily disconnected and reconnected back to the Mesos master (the slave recovery feature). In case there was a new Mesos master leader elected, the plugin used Zookeeper endpoints to locate the new master (more on this a little later).



We decided to simulate a large deployment and wrote a Jenkins load test driver (mammoth). As time progressed we started uncovering use cases that were unsuccessful. Here is a discussion of each problem and how we addressed it.

Frameworks stopped receiving offers after a while

One of the first things we noticed occurred after we used Marathon to create the initial set of CI masters. As those CI masters started registering themselves as frameworks, Marathon stopped receiving any offers from Mesos; essentially, no new CI masters could be launched. The other thing we noticed was that, of the Jenkins Frameworks that were registered, only a few would receive offers. At that point, it was evident that we needed a very thorough understanding of the resource allocation algorithm of Mesos – we had to read the code. Here is an overview on the code’s setup and the dominant resource fairness algorithm.

Let’s start with Marathon. In the DRF model, it was unfair to treat Marathon in the same bucket/role alongside hundreds of connected Jenkins frameworks. After launching all these Jenkins frameworks, Marathon had a large resource share and Mesos would aggressively offer resources to frameworks that were using little or no resources. Marathon was placed last in priority and got starved out.

We decided to define a dedicated Mesos role for Marathon and to have all of the Mesos slaves that were reserved for Jenkins master instances support that Mesos role. Jenkins frameworks were left with the default role “*”. This solved the problem – Mesos offered resources per role and hence Marathon never got starved out. A framework with a special role will get resource offers from both slaves supporting that special role and also from the default role “*”. However, since we were using placement constraints, Marathon accepted resource offers only from slaves that supported both the role and the placement constraints.

Certain Jenkins frameworks were still getting starved

Our next task was to find out why certain Jenkins frameworks were getting zero offers even when they were in the same role and not running any jobs. Also, certain Jenkins frameworks always received offers. Mesos makes offers to frameworks and frameworks have to decline them if they don’t use them.

The important point was to remember that the offer was made to the framework. Frameworks that did not receive offers, but that had equal resource share to frameworks that received and declined offers, should receive offers from Mesos. Basically, past history has to be accounted for. This situation can also arise where there are fewer resources available in the cluster. Ben Hindman quickly proposed and implemented the fix for this issue, so that fair sharing happens among all of the Jenkins frameworks.

Mesos delayed offers to frameworks

We uncovered two more situations where frameworks would get starved out for some time but not indefinitely. No developer wants to wait that long for a build to get scheduled. In the first situation, the allocation counter to remember past resource offers (refer to the fix in the previous paragraph) for older frameworks (frameworks that joined earlier) would be much greater than for the new frameworks that just joined. New frameworks would continue to receive more offers, even if they were not running jobs; when their allocation counter reached the level of older frameworks, they would be treated equally. We addressed this issue by specifying that whenever a new framework joins, the allocation counter is reset for all frameworks, thereby bringing them to a level playing field to compete for resources. We are exploring an alternative that would normalize the counter values instead of setting the counter to zero. (See this commit.)

Secondly, we found that once a framework finished running jobs/Mesos tasks, the user share – representing the current active resources used – never came down to zero. Double arithmetic led to a ridiculously small value (e.g., 4.44089e-16), which unfairly put frameworks that had just finished builds behind frameworks that had their user share at 0. As a quick fix, we used precision 0.000001 to treat those small values as 0 in the comparator. Ben Hindman suggested an alternative: once a framework has no tasks and executors, it’s safe to set the resource share explicitly to zero. We are exploring that alternative as well in the Mesos bug fix review process.

Final hurdle

After making all of the above changes, we were in reasonably good shape. However, we discussed scenarios where certain frameworks running active jobs and finishing them always get ahead of inactive frameworks (due to the allocation counter); a build started in one of those inactive frameworks would waste some time in scheduling. It didn’t make sense for a bunch of connected frameworks to be sitting idle and competing for resource offers when they had nothing to build. So we came up with a new enhancement to the Jenkins Mesos plugin:  to register as a Mesos framework only when there was something in the build queue. The Jenkins framework would unregister as a Mesos framework as soon as there were no active or pending builds (see this pull request). This is an optional feature, not required in a shared CI master that’s running jobs for all developers. We also didn’t need to use the slave attribute feature any more in the plugin, as it was getting resource offers from slaves with the default role.

Our load tests were finally running with predictable results! No starvation, and quick launch of builds.

Cluster management

When we say “cluster” we mean a group of servers running the PaaS core framework services. Our cluster is built on virtual servers in our OpenStack environment. Specifically, the cluster consists of virtual servers running Apache Zookeeper, Apache Mesos (masters and slaves), and Marathon services. This combination of services was chosen to provide a fault-tolerant and high-availability (HA) redundant solution. Our cluster consists of at least three servers running each of the above services.

By design, the Mesos and Marathon services do not operate in the traditional redundant HA mode with active-active or active-passive servers. Instead, they are separate, independent servers that are all active, although for HA implementation they use the Zookeeper service as a communication bus among the servers. This bus provides a leader election mechanism to determine a single leader.

Zookeeper performs this function by keeping track of the leader within each group of servers. For example, we currently provision three Mesos masters running independently. On startup, each master connects to the Zookeeper service to first register itself and then to elect a leader among themselves. Once a leader is elected, the other two Mesos masters will redirect all client requests to the leader. In this way, we can have any number of Mesos masters for an HA setup and still have only a single leader at any one time.

The Mesos slaves do not require HA, because they are treated as workers that provide resources (CPU, memory, disk space) enabling the Mesos masters to execute tasks. Slaves can be added and removed dynamically. The Marathon servers use a similar leader election mechanism to allow any number of Marathon servers in an HA cluster. In our case we also chose to deploy three Marathon servers for our cluster. The Zookeeper service does have a built-in mechanism for HA, so we chose to deploy three Zookeeper servers in our cluster to take advantage of its HA capabilities.

Building the Zookeeper-Mesos-Marathon cluster

Of course we always have the option of building each server in the cluster manually, but that quickly becomes time-intensive and prone to inconsistencies. Instead of provisioning manually, we created a single base image with all the necessary software. We utilize the OpenStack cloud-init post-install to convert the base image into either a Zookeeper, Mesos Master, Mesos Slave, or Marathon server.

We maintain the cloud-init scripts and customization in github. Instead of using the nova client directly or the web gui to provision, we added another automation feature and wrote Python scripts to call the python-novaclient and pass in the necessary cloud-init and post-install instructions to build a new server. This combines all the necessary steps into a single command. The command provisions the VM, instructs the VM to download the cloud-init post-install script from github, activates the selected service, and joins the new VM to a cluster. As a result, we can easily add servers to an existing cluster as well as create new clusters.

Cluster management with Ansible

Ansible is a distributed systems management tool that helps to ease the management of many servers. It is not too difficult or time-consuming to make changes on one, two, or even a dozen servers, but making changes to hundreds or thousands of servers becomes a non-trivial task. Not only do such changes take a lot of time, but they have a high chance of introducing an inconsistency or error that would cause unforeseen problems.

Ansible is similar to cfengine, puppet, chef, salt, and many other systems management tools. Each tool has its own strengths and weaknesses. One of the reasons we decided to use Ansible is its ability to execute remote commands using ssh without having the need for any Ansible client to run on the servers.

Ansible can be used as a configuration management, software deployment, and do-anything-you-want kind of a tool. It employs a plug-and-play concept, where existing modules have already been written for many functions. For example, there are modules for connecting to hosts with a shell, for AWS EC2 automation, for networking, for user management, etc.

Since we have a large Mesos cluster and several of the servers are for experimentation, we use Ansible extensively to manage the cluster and make consistent changes when necessary across all of the servers.


Depending on the situation and use case, many different models for running CI in Mesos can be tried out. The model that we outlined above is only one. Another variation is a shared master using the plugin and temporary masters running the build directly.  In Part II of this blog post, we introduce an advanced use case of running builds in Docker containers on Mesos.


In the era of cloud and XaaS (everything as a service), REST/SOAP-based web services have become ubiquitous within eBay’s platform. We dynamically monitor and manage a large and rapidly growing number of web servers deployed on our infrastructure and systems. However, existing tools present major challenges when making REST/SOAP calls with server-specific requests to a large number of web servers, and then performing aggregated analysis on the responses.

We therefore developed REST Commander, a parallel asynchronous HTTP client as a service to monitor and manage web servers. REST Commander on a single server can send requests to thousands of servers with response aggregation in a matter of seconds. And yes, it is open-sourced at http://www.restcommander.com.

Feature highlights

REST Commander is Postman at scale: a fast, parallel asynchronous HTTP client as a service with response aggregation and string extraction based on generic regular expressions. Built in Java with Akka, Async HTTP Client, and the Play Framework, REST Commander is packed with features beyond speed and scalability:

  • Click-to-run with zero installation
  • Generic HTTP request template supporting variable-based replacement for sending server-specific requests
  • Ability to send the same request to different servers, different requests to different servers, and different requests to the same server
  • Maximum concurrency control (throttling) to accommodate server capacity

Commander itself is also “as a service”: with its powerful REST API, you can define ad-hoc target servers, an HTTP request template, variable replacement, and a regular expression all in a single call. In addition, intuitive step-by-step wizards help you achieve the same functionality through a GUI.

Usage at eBay

With REST Commander, we have enabled cost-effective monitoring and management automation for tens of thousands of web servers in production, boosting operational efficiency by at least 500%. We use REST Commander for large-scale web server updates, software deployment, config pushes, and discovery of outliers. All can be executed by both on-demand self-service wizards/APIs and scheduled auto-remediation. With a single instance of REST Commander, we can push server-specific topology configurations to 10,000 web servers within a minute (see the note about performance below). Thanks to its request template with support for target-aware variable replacement, REST Commander can also perform pool-level software deployment (e.g., deploy version 2.0 to QA pools and 1.0 to production pools).

Basic workflow

Figure 1 presents the basic REST Commander workflow. Given target servers as a “node group” and an HTTP command as the REST/SOAP API to hit, REST Commander sends the requests to the node group in parallel. The response and request for each server become a pair that is saved into an in-memory hash map. This hash map is also dumped to disk, with the timestamp, as a JSON file. From the request/response pair for each server, a regular expression is used to extract any substring from the response content.


 Figure 1. REST Commander Workflow.

Concurrency and throttling model with Akka

REST Commander leverages Akka and the actor model to simplify the concurrent workflows for high performance and scalability. First of all, Akka provides built-in thread pools and encapsulated low-level implementation details, so that we can fully focus on task-level development rather than on thread-level programming. Secondly, Akka provides a simple analogy of actors and messages to explain functional programming, eliminating global state, shared variables, and locks. When you need multiple threads/jobs to update the same field, simply send these results as messages to a single actor and let the actor handle the task.

Figure 2 is a simplified illustration of the concurrent HTTP request and response workflow with throttling in Akka. Throttling (concurrency control) indicates the maximum concurrent requests that REST Commander will perform. For example, if the throttling value is 100, REST Commander will not send the “n_th” request until it gets the “{n-100}_th” response back; so the 500th request will not be sent until the response from the 400th request has been received.


Figure 2. Concurrency Design with Throttling in Akka (see code)

Suppose one uniform GET /index.html HTTP request is to be sent to 10,000 target servers. The process starts with the Director having the job of sending the requests. Director is not an Akka actor, but rather a Java object that initializes the Actor system and the whole job. It creates an actor called Manager, and passes to it the 10,000 server names and the HTTP call. When the Manager receives the data, it creates one Assistant Manager and 10,000 Operation Workers. The Manager also embeds a task of “server name” and the “GET index.html HTTP request” in each Operation Worker. The Manager does not give the “go ahead” message for triggering task execution on the workers. Instead, the Assistant Manager is responsible for this part: exercising throttling control by asking only some workers to execute tasks.

To better decouple the code based on functionality, the Manager is only in charge of receiving responses from the workers, and the Assistant Manager is responsible for sending the “go ahead” message to trigger workers to work. The Manager initially sends the Assistant Manager a message to send the throttling number of messages; we’ll use 1500, the default throttling number, for this example. The Assistant Manager starts sending a “go ahead” message to each of 1500 workers. To control throttling, the Assistant Manager maintains a sliding window of [response_received_count, request_sent_count]. The request_sent_count is the number of “go ahead” messages the Assistant Manager has sent to the workers. The response_received_count comes from the Manager; when the Manager receives a response, it communicates the updated count to the Assistant Manager. Every half-second, the Assistant Manager sends itself a message to trigger a check of response_received_count and request_sent_count to determine whether the sliding window has room for sending additional messages. If so, the Assistant Manager sends messages until the sliding window is greater than or equal to the throttling number (1500).

Each Operation Worker creates an HTTP Worker, which also has Ning’s async HTTP client functions. When the Manager receives a response from an Operation Worker, it updates the response part of the in-memory hash map of for the associated server. In the event of failing to obtain the response or of timing out, the worker would return exception details (e.g., connection exception) back to the Manager. When the Manager has received all of the responses, it returns the whole hash map of back to the Director. As the job successfully completes, the Director dumps the hash map to disk as a JSON file, then returns.

Beyond web server management – generic HTTP workflows

When modeling and abstracting today’s cloud operations and workflows – e. g., provisioning, file distributions, and software deployment – we find that most of them are similar: each step is a certain form of HTTP call with certain responses, which trigger various operations in the next step. Using the example of monitoring cluster server health, the workflow goes like this:

  1. A single HTTP call to query data storage (such as database as a service) and retrieve the host names and health records of the target servers (1 call to 1 server)
  2. Massive uniform HTTP calls to check the current health of target servers (1 call to N servers); aggregating these N responses; and conducting simple analysis and extractions
  3. Data storage updates for those M servers with changed status (M calls to 1 server)

REST Commander flawlessly supports such use cases with its generic and powerful request models. It therefore is used to automate many tasks involving interactions and workflows (orchestrations) with DBaaS, LBaaS (load balancer as a service), IaaS, and PaaS.

Related work review

Of course, HTTP is a fundamental protocol to the World Wide Web, SOAP/REST-based web services, cloud computing, and many distributed systems. Efficient HTTP/REST/SOAP clients are thus critical in today’s platform and infrastructure services. Although many tools have been developed in this area, we are not aware of any existing tools or libraries on HTTP clients that combine the following three features:

  • High efficiency and scalability with built-in throttling control for parallel requests
  • Generic response aggregation and analysis
  • Generic (i.e., template-based) heterogeneous request generation to the same or different target servers

Postman is a popular and user-friendly REST client tool; however, it does not support efficient parallel requests or response aggregation. Apache JMeter, ApacheBench (ab), and Gatling can send parallel HTTP requests with concurrency control. However, they are designed for load/stress testing on a single target server rather than on multiple servers. They do not support generating different requests to different servers. ApacheBench and JMeter cannot conduct response aggregation or analysis, while Gatling focuses on response verification of each simulation step.

ql.io is a great Node.js-based aggregation gateway for quickly consuming HTTP APIs. However, having a different design goal, it does not offer throttling or generic response extraction (e.g., regular expressions). Also, its own language, table construction, and join query result in a higher learning curve. Furthermore, single-threaded Node.js might not effectively leverage multiple CPU cores unless running multiple instances and splitting traffic between them. 

Typhoeus is a wrapper on libcurl for parallel HTTP requests with throttling. However, it does not offer response aggregation. More critically, its synchronous HTTP library supports limited scalability. Writing a simple shell script with “for” loops of “curl” or “wget” enables sending multiple HTTP requests, but the process is sequential and not scalable.

Ning’s Async-http-client library in Java provides high-performance, asynchronous request and response capabilities compared to the synchronous Apache HTTPClient library. A similar library in Scala is Stackmob’s (PayPal’s) Newman HTTP client with additional response caching and (de)serialization capabilities. However, these HTTP clients are designed as raw libraries without features such as parallel requests with templates, throttling, response aggregation, or analysis.

Performance note

Actual REST Commander performance varies based on network speed, the slowest servers, and Commander throttling and time-out settings. In our testing with single-instance REST Commander, for 10,000 servers across regions, 99.8% of responses were received within 33 seconds, and 100% within 48 seconds. For 20,000 servers, 100% of responses were received within 70 seconds. For a smaller scale of 1,000 servers, 100% of responses were received within 7 seconds.

Conclusion and future work

“Speaking HTTP at scale” is instrumental in today’s platform with XaaS (everything as a service).  Each step in the solution for many of our problems can be abstracted and modeled by parallel HTTP requests (to a single or multiple servers), response aggregation with simple (if/else) logic, and extracted data that feeds into the next step. Taking scalability and agility to heart, we (Yuanteng (Jeff) Pei, Bin Yu, and Yang (Bruce) Li) designed and built REST Commander, a generic parallel async HTTP client as a service. We will continue to add more orchestration, clustering, security, and response analysis features to it. For more details and the video demo of REST Commander, please visit http://www.restcommander.com.  

Yuanteng (Jeff) Pei

Cloud Engineering, eBay Inc.






Async HTTP Client


Play Framework


Apache JMeter


ApacheBench (ab)








Apache HttpClient


Stackmob’s Newman



Yet Another Responsive vs. Adaptive Story

by Senthil Padmanabhan on 03/05/2014

in Software Engineering

Yes, like everyone else in web development, eBay has become immersed in the mystical world of Responsive Web Design. In fact, our top priority for last year was to make key eBay pages ready for multi-screen. Engineers across the organization started brainstorming ideas and coming up with variations on implementing a multi-screen experience. We even organized a “Responsive vs. Adaptive Design” debate meetup to discuss the pros and cons of various techniques. This post summarizes some of the learnings in our multi-screen journey.

There is no one-size-fits-all solution

This is probably one of the most talked-about points in the responsive world, and we want to reiterate it. Every web page is different, and every use case is different. A solution that works for one page might not work for another – in fact, sometimes it even backfires. Considering this, we put together some general guidelines for building a page. For read-only pages or web pages where users only consume information, a purely responsive design (layout controlled by CSS) would suffice. For highly interactive pages or single-page applications, an adaptive design (different views dependent on the device type) might be the right choice. But for most cases, the RESS (Responsive Design + Server Side Components) approach would be the ideal solution. With RESS we get the best of both worlds, along with easier code maintenance and enhancements. Here the server plays a smart role by not switching to a completely new template or wireframe per device; instead, the server helps deliver the best experience by choosing the right modules and providing hints to the client.

User interaction is as important as screen size

Knowing how a user interacts with the device (keyboard, mouse, touch, pointer, TV remote, etc.) is crucial to delivering the optimal experience. Screen size is required for deciding the layout, but is not in itself sufficient. This point resonates with the previous point about RESS:  the server plays a role. The first hint that our servers provide to the browser is an interaction type class (touch, no-touch, pointer, etc.) added to the root HTML or module element. This class helps CSS and JavaScript to enhance features accordingly. For instance, the CSS :hover pseudo selector is applied only to elements having a no-touch class as a predecessor; and in JavaScript, certain events are attached only when the touch class is present. In addition to providing hints, the server can include/exclude module and JavaScript plugins (e.g., Fastclick) based on interaction type.

Keeping the importance of user interaction in mind, we created a lightweight jQuery Plugin called tactile just to handle gesture-based events:  tap, drag (including dragStart, dragEnd), and swipe. Instead of downloading an entire touch library, we felt that tactile was sufficient for our use cases. By including this plugin for all touch-based devices, we enhance the user interaction to a whole new level, bringing in a native feel. These results would not be possible in a purely responsive design.

Understanding the viewport is essential

At a glance the term ‘viewport’ sounds simple, referring to the section of the page that is in view. But when you dig a little deeper, you will realize that the old idiom ‘The devil is in the detail’ is indeed true. For starters, the viewport itself can have three different perspectives:  visual viewport, layout viewport, and ideal viewport. And just adding the default viewport meta tag <meta name="viewport" content="width=device-width, initial-scale=1"/> alone may not always be sufficient (for example, in a mobile-optimized web app like m.ebay.com, the user-scalable=no property should also be used). In order to deliver the right experience, a deeper understanding of the viewport is needed. Hence before implementing a page, our engineers revisit the concept of viewport and make sure they’re taking the right approach.

To get a good understanding of viewports, see the documents introduction, viewport 1, viewport 2, and viewport 3, in that order.

Responsive components vs. responsive pages is a work in progress

Another idea that has been floating around is to build components that are responsive, instead of page layouts that are responsive. However, until element query becomes a reality, there is no clear technical solution for this. So for now, we have settled on two options:

  • The first option is to use media queries at a component level, meaning each component will have its own media queries. When included in a page, a component responds to the browser’s width and optimizes itself (based on touch/no-touch) to the current viewport and device. This approach, though, has a caveat:  It will fail if the component container has a restricted width, since media queries work only at a page level and not at a container level.
  • The second approach was suggested by some engineers in the eBay London office, where they came up with the idea of components always being 100% in width and all their children being sized in percentages. The components are agnostic of the container size; when dropped into a page, they just fit into whatever the container size is. A detailed blog about this technique can be found here.

We try to implement our components using one of the above approaches.  But the ultimate goal is to abstract the multi-screen factor from the page to the component itself. 

We can at least remove the annoyance

Finally, even if we are not able to provide the best-in-class experience on a device, at minimum we do not want to annoy our users. This means following a set of dos and don’ts.


  • Always include the viewport meta tag <meta name="viewport" content="width=device-width, initial-scale=1"/>
  • Add the interaction type class (touch, no-touch, etc.) to the root HTML or module element
  • Work closely with design to get an answer on how the page looks across various devices before even starting the project


  • Tiny click area, less than 40px
  • Hover state functionality on touch devices
  • Tightly cluttered design
  • Media queries based on orientation due to this issue

This post provides a quick summary of the direction in which eBay is heading to tackle ever-increasing device diversity. There is no silver bullet yet, but we are getting there.

Engineer @ eBay


When I started writing this blog post, my original goal was to provide (as alluded to in the title) some insights into my first year as a presentation engineer at eBay – such as my day-to-day role, some of the things we build here, and how we build them. However, before I can do that, I feel I first need to step back and talk about the renaissance. “The renaissance!?”, I hear you say.

The web dev renaissance

Unless you’ve been living under a rock for the past few years, you can’t help but have noticed a renaissance of sorts in the world of web development – propelled in large part, of course, by Node.js, NPM, GitHub, and PaaS, all of which are enabling and empowering developers like never before. Combined with the rapid innovations in the browser space – and within HTML, CSS, and JavaScript – what you have is an incredibly exciting and fun time to be a web developer! And I’m glad to say that the renaissance is truly alive and well here at eBay!


Of course the darling of this renaissance is Node.js. Discovering JavaScript on the server for me was just as exciting and liberating as the day I discovered it in the browser – and I’m sure many, many others will share that sentiment. Spinning up an HTTP server in the blink of an eye with just a few lines of JavaScript is simply audacious, and to this day it still makes me grin with delight – especially when I think of all the hours I’ve wasted in my life waiting for the likes of Apache or IIS! But it’s not just the speed and simplicity that enthralls; it’s also the feeling of utmost transparency and control.


But I digress. I hear you say, “What does this so-called renaissance have to do with eBay?” and “Isn’t eBay just a tired, old Java shop?” That might have been true in the past. But these days, in addition to an excellent new Java stack (we call it Raptor and, as the name correctly implies, it is anything but tired!), we now also have our very own Node.js stack (we call it CubeJS), which already powers several of our existing sites and applications. Yes, the wait is over; Node.js in the enterprise is finally a reality for developers. Since joining eBay in the spring of 2013, I have barely touched a line of Java or JSP code.

JavaScript everywhere

Why is this a big deal? Well, a common pattern for us web developers is that every time we change jobs more often than not we also have to change server-side languages. Over the years I’ve used Perl/CGI, ASP Classic, JSP, ColdFusion/CFML, PHP, and ASP.NET. Now as much as I do enjoy learning new skills (except the circus trapeze – that was ill-advised), I’d be stretching the truth if I said I knew all of those languages and their various intricacies inside out. Most of the time I will learn what I need to learn, but rarely do I feel the need or desire to specialize. It would be fair to say I wasn’t always getting the best out of the language and the language wasn’t always getting the best out of me. Really, deep down, I wanted to be using JavaScript everywhere. And now of course that pipe dream is true.


Adoption of Node.js is a win-win situation for eBay as we seek to embrace the flourishing world-wide community of JavaScript developers like myself as well as to leverage our excellent open-source eco-system. Node.js might only be the beginning; as eBay further adopts and advocates for such polyglotism, we increasingly welcome developers from different tribes – Python, PHP, Ruby on Rails, and beyond – and eagerly anticipate the day they become integrated with our PaaS (Platform as a Service). You see, it’s all about choice and removing barriers – which empowers our developers to delight our users.

Vive la Renaissance

In this post I’ve mainly focused my attention on Node.js but, as mentioned, the renaissance at eBay doesn’t stop there. We also embrace NPM. We embrace GitHub. We embrace PaaS. We embrace modern principles, tools, and workflows (Modular JavaScript, Grunt, JSHint, Mocha, LESS, and Jenkins – to name but a few!). Yes, we embrace open source – and it’s not all take, take, take either; be sure to check out KrakenJS (a web application framework built on Node.js by our good friends over at PayPal), RaptorJS (eBay’s end-to-end JavaScript toolkit for building adaptive modules and UI components), and Skin (CSS modules designed by eBay to build an online store). And be sure to keep your eyes open for more contributions from us in the near future!


Do you share our passion for JavaScript, Node.js, and the crazy, fast-paced world of front-end web development? Interested in finding out more about joining our presentation team? Please visit http://www.ebaycareers.com for current openings.


Deployment to the cloud is an evolving area. While many tools are available that deploy applications to nodes (machines) in the cloud, zero deployment downtime is rare or nonexistent. In this post, we’ll take a look at this problem and propose a solution. The focus of this post is on web applications—specifically, the server-side applications that run on a port (or a shared resource).

In traditional deployment environments, when switching a node in the cloud from the current version to a new version, there is a window of time when the node is unusable in terms of serving traffic. During that window, the node is taken out of traffic, and after the switch it is brought back into traffic.

In a production environment, this downtime is not trivial. Capacity planning in advance usually accommodates the loss of nodes by adding a few more machines. However, the problem becomes magnified where principles like continuous delivery and deployment are adopted.

To provide effective and non-disruptive deployment and rollback, a Platform as a Service (PaaS) should possess these two characteristics:

  • Best utilization of resources to minimize deployment downtime as much as possible
  • Instantaneous deployment and rollback

Problem analysis

Suppose we have a node running Version1 and we are deploying Version2 to that node. This is how the lifecycle would look:

  typical deployment workflow

Every machine in the pool undergoes this lifecycle. The machine stops serving traffic right after the first step and cannot resume serving traffic until the very last step. During this time, the node is effectively offline.

At eBay, the deployment lifecycle takes a reasonably sized application about 9 minutes. For an organization of any size, many days of availability can be lost if every node must go into offline phase during deployment.

So, the more we minimize the off-traffic time, the closer we get to instant/zero-downtime deployment/rollback.

Proposed solutions

Now let’s look into a few options for achieving this goal.

A/B switch

In this approach, we have a set of nodes standing by. We deploy the new version to those nodes and switch the traffic to them instantly. If we keep the old nodes in their original state, we could do instant rollback as well. A load balancer fronts the application and is responsible for this switch upon request.

The disadvantage to this approach is that some nodes will be idle, and unless you have true elasticity, it will amplify the node wastage. When a lot of deployments are occurring at the same time, you may end up needing to double the capacity to handle the load.

Software load balancers

In this approach, we configure the software load balancer fronting the application with more than one end point so that it can effectively route the traffic to one or another. This solution is elegant and offers much more control at the software level. However, applications will have to be designed with this approach in mind. In particular, the load balancer’s contract with the application will be very critical to successful implementation.

From a resource standpoint, both this and the previous approach are similar; both use additional resources, like memory and CPU. The first approach needs the whole node, whereas the other one is accommodated inside the same node.

Zero downtime

With this approach, we don’t keep a set of machines; rather, we delay the port binding. Shared resource acquisition is delayed until the application starts up. The ports are switched after the application starts, and the old version is also kept running (without an access point) to roll back instantly if needed.

Similar solutions exist already for common servers.

Parallel deployment – Apache Tomcat

Apache Tomcat has added the parallel deployment feature to their version 7 release. They let two versions of the application run at the same time and take the latest version as default. They achieve this capability through their context container. The versioning is pretty simple and straightforward, appending ‘##’ to  the war name. For example, webapp##1.war and webapp##2.war can coexist within the same context; and for rolling back to webapp##1, all that is required is to delete webapp##2.

Although this feature might appear to be a trivial solution, apps need to take special care with shared files, caches (as much write-through as possible), and lower-layer socket usage.

Delayed port binding

This solution is not available in web servers currently. A typical server first binds to the port, then starts the services. Apache lets you delay binding to some extent by overriding bindOnInit, but still the binding occurs after the connector is started.

What we propose here is the ability to start the server without binding the port and essentially without starting the connector. Later, a separate command will start and bind the connector. Version 2 of the software can be deployed while version 1 is running and already bound. When version 2 is started later, we can unbind version 1 and bind version 2. With this approach, the node is effectively offline only for a few seconds.

The lifecycle for delayed port binding would look like this:


However, there is still a few-second glitch, so we will look at the next solution.

Advanced port binding

Now that we have minimized the window of unavailability to a few seconds, we will see if we can reduce it to zero. The only way to do that would be to bring version 2 up before version 1 goes down. But first:

Breaking the myth:  ‘Address already in use’

If you’ve used a server to run an application, I am sure you’ve seen this exception at least once. Let’s consider this scenario: We start the server and bind to the port. If we try to start another instance (or another server with the same port), the process fails with the error ‘Address already in use’. We kill the old server and start it again, and it works.

But have you ever given a thought as to why we cannot have two processes listening to the same port? What could be preventing it? The answer is “nothing”! It is indeed possible to have two processes listening to the same port.


The reason we see this error in typical environments is because most servers bind with the SO_REUSEPORT option off. This option lets two (or more) processes bind to the same port, provided the application that bound the first process had this option set while binding. If this option is off, the OS interprets the setting to mean that the port is not to be shared, and it blocks subsequent processes from binding to that port.

The SO_REUSEPORT option also provides fair distribution of requests (important since threading suffers from bottlenecks in multi-cores). Both of the threading approaches—one thread listening and then dispatching, as well as multiple threads listening—suffer from the under/over utilization of cycles. An additional advantage of SO_REUSEPORT is that it takes care of sending the datagram from the same client to the same server process. However, it has a shortcoming:  packets might be dropped if new processes are added or removed on the fly. This shortcoming is being addressed.

You can find a good article about SO_REUSEPORT at this link on LWN.net. If you want to try this out yourself, see this post on the Free-Programmer’s Blog.

The SO_REUSEPORT option address two issues:

  • The small glitch between the application version switching:  The node can serve traffic all the time, effectively giving us zero downtime.
  • Improved scheduling:  Data indicates (see this article on LWN.net) that thread scheduling is not fair; the ratio between the busiest thread versus the one with the least connections is 3:1.


Please note that SO_REUSEPORT is not the same as SO_REUSEADDRESS, and that it is not available in Java as not all operating systems support it.


Applications can successfully serve traffic during deployment, if we carefully design and manage those applications to do so. Combining both late binding and port reuse, we can effectively achieve zero downtime. And if we keep the standby process around, we will be able to do an instant rollback as well.


Copyright © 2011 eBay Inc. All Rights Reserved - User Agreement - Privacy Policy - Comment Policy