Tiered Test Automation

As application architectures have evolved from monolithic to client-server and now to microservices, test automation has had to evolve as well. Traditional test automation relies heavily on the user interface and web service endpoint layers, which keeps test pass rates low. These layers are flaky due to factors such as data dependencies and inconsistencies, environment instability, slow execution, and expensive maintenance of the test code.

In a Tiered-Level Test Automation approach, the test code follows the test pyramid popularized by Martin Fowler: minimal focus is given to user interface and user-action tests, while more scenarios and test permutations are written at the lower tiers of the automation, closer to the code.

Black box testing

Black box testing exercises the application through the eyes of its users. In manual testing, the tester interacts with the page and verifies that functionality works as expected by visually checking the UI components and performing actions on them.

Customer Facing Public Website

White box testing

White box testing examines the application code under test.

public List fetchOverridesFromConfig(IAppConfigContext context) {...}
public void applyOverrides(PageDefinition template, List moduleOverrides, ExpClassification expClassification) {...}

Unit testing

Unit testing targets the smallest testable parts of an application, and test automation should be heaviest at this tier. We implement unit testing by creating tests for individual methods. Our project's technology stack comprises Java, JUnit, Mockito, JSON, and ObjectMapper.

Implementation highlights

Our unit tests mock the dependencies and pass them to the application method under test to run them through the business logic. They then assert that the actual response matches the expected response stored in a JSON file. The following sample code is a unit test.

@Test
public void testFetchTemplateFromRepo()
    throws JsonGenerationException, JsonMappingException, IOException {
  CustomizationSample s = getSample(name.getMethodName());
  LookUpConfiguration.RCS_TEMPLATE_CONFIG = Mockito.mock(LookUpConfiguration.class);
  String templateString = mapper.writeValueAsString(s.getDefinition());
  // The original snippet is truncated here; the mocked template string is
  // assumed to be the argument.
  PageDefinition definition = helper.fetchPageTemplateFromConfig(templateString);
  Assert.assertEquals(s.getOutput().toString(), definition.toString());
}

@Test
public void testPageDefinition() {
  CustomizationSample s = getSample(name.getMethodName());
  helper.applyOverrides(s.getDefinition(), s.getOverrides(), ExpClassification.ALL);
  Assert.assertEquals(s.getDefinition().toString(), s.getOutput().toString());
}

Integration mock tests

The strategy and rationale behind white box testing is to surface issues and problems further upstream. Integration tests can be written to verify the contracts of web services independently of the whole system.

For example, if a configuration system is down or its data is wiped out, does that mean the application code cannot be signed off because the tests are failing? One test implementation mocks the values from the configuration system and then uses them as the dependencies to run through the integrated business logic, asserting that the actual response matches the expected one. The technologies used in the mock tests include JUnit parameterized tests, Mockito, PowerMockito, JSON, and Gson.
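
To make the pattern concrete, here is a minimal, self-contained sketch. The class and method names are hypothetical, and a hand-rolled stub stands in for Mockito so that the example runs on its own:

```java
import java.util.Map;

// Hypothetical illustration: stub out a configuration system so business
// logic can be exercised even when the real system is down.
public class ConfigMockExample {

    // The dependency that business logic normally resolves over the network.
    interface ConfigSource {
        String get(String key);
    }

    // Business logic under test: builds a page title from configuration.
    static String buildPageTitle(ConfigSource config) {
        return config.get("site.name") + " | " + config.get("page.title");
    }

    public static void main(String[] args) {
        // Canned values play the role of the mocked configuration system.
        Map<String, String> canned = Map.of(
                "site.name", "Marketplace",
                "page.title", "Home");
        ConfigSource stub = canned::get;

        String actual = buildPageTitle(stub);
        if (!actual.equals("Marketplace | Home")) {
            throw new AssertionError("unexpected title: " + actual);
        }
        System.out.println("ok: " + actual);  // prints "ok: Marketplace | Home"
    }
}
```

With a real mocking library, the stub would instead come from `Mockito.mock(ConfigSource.class)` plus `when(...).thenReturn(...)` calls; the assertion pattern stays the same.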

Implementation highlights

How are the unit tests different from the integration tests? The unit tests run through the individual methods, while the integration tests can call those methods all together.

@Test
public void testResponse() throws IOException {
  PageDefinition pageDefinition = getPageDefinition(rcsConfigurationContext, true, true);
  new PageDefinitionValidator.Validator(pageDefinition)
      .validate();  // chained validation calls truncated in the original
}

private PageDefinition getPageDefinition(RCSConfigurationContext context,
    boolean customized, boolean includeToolMeta) {
  // Helper method names were truncated in the original snippet; the ones
  // shown earlier in this post are assumed here.
  PageDefinition template = PageDefinitionHelper.getInstance()
      .fetchPageTemplateFromConfig(context, includeToolMeta);
  if (customized) {
    List overrides = PageDefinitionHelper.getInstance()
        .fetchOverridesFromConfig(context);
    PageDefinitionHelper.getInstance()
        .applyOverrides(template, overrides, expClassification);
  }
  return template;
}

Services endpoint tests

The services layer tests run against the RESTful endpoints with only a few positive scenarios and invalid requests. The responses are then validated against the expected data values, error codes, and messages. The technologies used are Java, Gson, and the Jersey API.

Implementation highlights

We use JSON to store the requests and responses, then parse and pass them as a data provider into the test methods.

 {
   "getCustomizationById": [{
       "marketPlaceId": "0",
       "author": "SampleTestUser2",
       "responseCode": "200"
   }],
   "negative_getCustomizationById": [{
       "marketPlaceId": "0",
       "customizationId": "",
       "domain": "home_page",
       "responseCode": "500",
       "errorMessage": "Invalid experience usecase."
   }]
 }

public void testPageDefinitionService(String url, Site site,
    String propFile, JSONObject jsonObject) throws Exception {
  // Build the context builder from JSON
  // Create the customization
  Response createCustomizationResponse = CustomizationServiceClientResponse
      .createCustomization(url, propFile,
          xpdContextBuilder, xpdModuleOverridesList);
  // Get the customization ID by context
  String customizationId = CustomizationServiceClientResponse
      .getCustomizationIdByContext(url, propFile,
          jsonObject.getString("experienceUseCase"), xpdContextBuilder);
  // Validate that the module override is applied on the pageDefinition
  new PageDefServiceResponseDataValidator
      .Validator(response, xpdModuleOverridesList)
      .validate();  // chained validation calls truncated in the original
}

UI Tests

User Interface test automation is minimal and focuses on actions on the page, for example, editing and saving the page after making the customization changes. The technologies used are Java, Selenium WebDriver, HTTPClient, JSON, and Gson.

Implementation highlights

The UI tests iterate through a collected list of web elements and actions rather than addressing each locator individually for every row and column. The save-flow tests also integrate with the service response to verify against the source of truth: the data passed through the UI is validated against the service response and compared for equality. This strategy validates more data on the fly rather than hard-coding the expected output.

The functional flow is also verified through the integration with services. For example, the edit and save flows are tested by getting the service response, using it as the source of truth to validate the data that is saved, and verifying whether they are equal. Once again, this approach helps to validate more data on the fly rather than hard-coding the expected output in the test class or in some properties file.
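
The comparison pattern can be sketched in a few self-contained lines (the maps stand in for values scraped from the UI and parsed from the service response; all names here are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: validate values read from the UI against the service
// response (the source of truth) instead of hard-coded expected output.
public class UiVsServiceValidation {

    static void validate(Map<String, String> uiValues,
                         Map<String, String> serviceValues) {
        for (Map.Entry<String, String> e : uiValues.entrySet()) {
            String expected = serviceValues.get(e.getKey());
            if (!e.getValue().equals(expected)) {
                throw new AssertionError(e.getKey() + ": UI value '"
                        + e.getValue() + "' does not match service value '"
                        + expected + "'");
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> ui = new LinkedHashMap<>();
        ui.put("title", "Daily Deals");
        ui.put("moduleCount", "4");

        // In the real tests this map comes from the service response.
        Map<String, String> service = new LinkedHashMap<>();
        service.put("title", "Daily Deals");
        service.put("moduleCount", "4");

        validate(ui, service);  // passes: both sides agree
        System.out.println("UI matches service response");
    }
}
```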

A snippet of the Internal Tool's UI that is the subject of the test

A snippet of the rows and columns of the Internal Tool’s UI that is the subject of the test

public Validator save() {
   SaveServiceAPI saveService = new SaveServiceAPI();
   saveCustomization.editModuleBeforeSaving(content(propFile, "SAVE_ON_TITLE"));
   assertChain.string().equals(content(propFile, "SAVE_ON_TITLE"),
       saveService.getModuleTitle(content(propFile, "SAVE_REQUEST_URL")),
       "Saved Title in the UI doesn't match with the Title in Service response");
   return this;
}


Discovering issues upstream is more efficient and less expensive than finding them once the product is developed and in production. The Tiered-Level Test Automation approach encourages developers to think about, and sets an example for, where and when it is best to test the product.

Implementing the tiered test automation was indeed a collaborative effort. Thanks to my colleagues, Kalyana Gundamaraju, Srilatha Pedarla, Krishna Abothu, and Manoj Chandramohan for their contributions to this test automation design.

Ann Del Rio

Beats @ eBay – Collectbeat – A Journey where Company and Community Come Together

In the beginning…

In early 2016, the Monitoring Special Interest Group (SIG) ventured into solving the problem of shipping logs and metrics from Tess.io (eBay’s Kubernetes ecosystem). Kubernetes, as one may be aware, is a container management system. Users have the flexibility to drop in Docker containers and let Kubernetes manage them. These Kubernetes clusters, or Tess clusters inside of eBay, are multi-tenanted: many customers run their workloads at any given time. The multi-tenanted aspect of our Tess clusters brings about some interesting problems, among them:

  • All logs and metrics need to be logically grouped (namespaced) based on customer/application type.
  • Workloads are ephemeral and tend to move across Nodes.
  • Additional metadata is required to search/query logs and metrics from Pods.
  • Metrics and logs need to be exposed/collected in a cloud native fashion.
  • The ability to self-onboard logs and metrics to the centralized system.

Our first offering…

With these problems in mind, we wanted to offer a solution that allowed users to drop their Pods into Kubernetes and obtain their logs and metrics in the simplest possible way. If a user’s Pod logs to stdout/stderr, we should be able to collect the logs and append all the Pod metadata to each log line. If a user exposes metrics on a well-known HTTP endpoint, we should be able to collect them.

Knowing these challenges and the goal in mind, we embarked on the problem, attempting to solve one issue at a time.

Let us take logs as the first example and see how we attempted to solve it. Docker allows users to write logs to stdout/stderr, which the Docker daemon captures and places in a well-known file path:

/var/lib/docker/containers/<container-id>/<container-id>-json.log

If we were to listen to /var/lib/docker/containers/*/*-json.log, we would be able to collect all the logs being generated by all Pods on a given Node. Configuring Filebeat to listen to that path is simple and is exactly what we did. Once we had collected all these logs, we needed a way for users to query them by Pod name, Namespace name, etc. Kubelet also started exposing these files through symlinks of the form:

/var/log/containers/<pod-name>_<namespace>_<container-name>-<container-id>.log

It would be easy to write a processor on Filebeat to split the source value in the payload and extract the pod, namespace, and container name. But we realized that pod labels also carry significance in querying an entire deployment’s worth of logs, and that information was not present. To solve this, we wrote our own custom Beat called Annotatebeat, which can:

  • Listen on lumberjack protocol for Beat events
  • Look for the source field and extract the container ID
  • Look up the Kube API server for metadata of all pods in a given node
  • Use the container ID to append all the remaining metadata onto the event
  • Send it to a prescribed destination

As long as a user writes an application that logs to stdout/stderr, Docker picks up the log and places it in a well-known log file. Filebeat tails the logs and sends them to Annotatebeat, which annotates each log message with pod metadata and ships the logs out. At this time, the Beats community wasn’t fully invested in Kubernetes, so we built some of these features internally at eBay.
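
The Filebeat side of this pipeline takes only a short prospector configuration, along these lines (Filebeat 5.x syntax; the JSON options shown are the common ones for Docker's json-file driver):

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/lib/docker/containers/*/*-json.log
    # Docker writes each line as a JSON object; lift its fields to top level.
    json.keys_under_root: true
    json.message_key: log
```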

Seeing how simple it was to write logs and have them shipped, we wanted a simple experience for metrics as well. At Elastic{ON} 2016, the Elastic folks announced Metricbeat as a new offering. Metricbeat has the concept of “modules,” where a module is a procedural mechanism by which metrics can be collected from a given application. If Metricbeat is configured to listen to localhost:3306 for module type “mysql”, the MySQL module knows that it should connect to the host:port and run a `SHOW GLOBAL STATUS` query to extract metrics and ship them out to the configured backend.
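
As a concrete illustration, a Metricbeat module block of roughly this shape (5.x syntax; the DSN credentials are placeholders) is all it takes to start polling MySQL:

```yaml
metricbeat.modules:
  - module: mysql
    metricsets: ["status"]
    # Go MySQL driver DSN; user and password here are placeholders
    hosts: ["root:secret@tcp(localhost:3306)/"]
    period: 10s
```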

This concept appealed to us because it allows “drop in” installations like MySQL, Nginx, etc. to be monitored out of the box. However, we needed users who write their own code and deploy applications into Kubernetes to also be able to monitor their applications. We hence came up with Prometheus and Dropwizard modules so that users could expose their metrics via those formats as HTTP endpoints, and we could collect and ship them. At the time, however, Metricbeat was designed to be tailored for specific applications like MySQL, Apache, and Nginx, not for generic frameworks like Prometheus or Dropwizard. Hence our PR was not initially accepted by the community, and we managed the module internally.

Discovery is something that is not supported by Beats out of the box. We had to come up with a mechanism that says, “given a node on Kubernetes, find out all the pods that are exposing metrics and start polling for metrics.” How do we find the pods that are poll-worthy? We look for the following metadata, found as annotations:

io.collectbeat.metrics/type - the type of metrics exposed (Metricbeat module name)
io.collectbeat.metrics/endpoints - ports to look at
io.collectbeat.metrics/namespace - namespace to write metrics into

As long as these three mandatory annotations are present, we should be able to start polling for metrics and write them into the configured backend. This discovery module uses Kubernetes’ controller mechanism to keep watching for updates within the node and start polling configured endpoints. This discovery module resided in a custom Beat that we lovingly call Collectbeat. To sum up, we used Collectbeat for collecting metrics from pods and Filebeat for collecting logs. Both sent their data to Annotatebeat, which appended pod metadata and shipped it to the configured backend. We ran this setup internally for about a year on version 1.x. Then Beats came out with 5.x.
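
In practice, a Pod opts in by carrying those annotations in its manifest, something like the following (all values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: payments-service            # illustrative
  annotations:
    io.collectbeat.metrics/type: prometheus
    io.collectbeat.metrics/endpoints: ":9090/metrics"   # illustrative port/path
    io.collectbeat.metrics/namespace: payments
spec:
  containers:
    - name: app
      image: payments-service:1.0   # illustrative
      ports:
        - containerPort: 9090
```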

Challenges in managing an internal fork…

When we were ready to upgrade to Beats 5.x, most of the interfaces had changed, and all of our custom code had to be upgraded to the newer interfaces. By this time, the Beats community had evolved Metricbeat to support generic collectors like Prometheus and several other changes for which we had written changes in our internal fork were available upstream. The effort to upgrade to 5.x would be substantial.

We had two options in front of us. One was to keep going down this path of managing our internal fork and invest a month every major release to pull in all the new features and make the necessary changes to our internally owned features. The second was to open source anything that was generic enough to be accepted by the community. On taking stock of all the features that we had written, 90% of them were applicable to any Kubernetes cluster; the remaining 10% was required to ship data to our custom backend. Hence, we decided to upstream that 90% so that we would not have to manage it any longer.

Be one with community…

At Elastic{ON} 2016, we met with the Beats community and came to an agreement: we would open source as much as we could of the Kubernetes use case, since we already had expertise monitoring Kubernetes internally at eBay, in return for faster PR reviews.

The first thing that we decided to get rid of internally was Annotatebeat, which did the metadata enrichment. Today in libbeat there is a processor called add_kubernetes_metadata, which was a result of that decision. We took all the logic present in Annotatebeat and converted it into a processor with the help of Carlos Pérez-Aradros, a member of the Beats community. We also used our internal Prometheus implementation as a reference to update the community-available version to cover a few missing use cases. The Dropwizard and Kubernetes Metricbeat modules, which we used internally, were also open sourced.

Eventually we got to a point where we could run both Filebeat and Metricbeat as available upstream, without any changes. With Go 1.8 came support for plugins, so we offloaded all of our eBay-internal custom code into plugins that are managed independently of stock Beats.

We realized the hard way that it is impossible to keep up with the rapid pace of an open source community if we have custom code residing in our internal fork. Not having a custom fork internally has helped us to be on the most recent version of Beats all the time and has reduced the burden of pulling in new changes.

It is always easier to make progress when we work with the community on features that not only benefit us today but may also benefit someone else tomorrow. More thoughts and ideas on the code can always make it better. A good working relationship with the Beats community has helped us not only with code management, but also with features that were required internally and ended up getting built by the community. Today, eBay contributes more code to Beats than anyone outside of Elastic itself. This has benefited not only the product but eBay as well. With the combined effort of eBay and Elastic, Beats will have native Kubernetes support in 6.0.

A new day…

Removing all of our custom code improved our agility to think of newer use cases. We wanted to increase coverage for the number of applications from which metrics can be collected. We realized that writing Metricbeat modules for every application is an impossible task and that going after protocols is a more scalable option.

One protocol that has tremendous coverage is the plaintext protocol understood by Graphite. Tools like collectd and StatsD can write to destinations that understand the Graphite protocol. We implemented a “Graphite server” as a Metricbeat module and contributed it back to Beats. Combined with Collectbeat’s Kubernetes discovery, this module supports use cases where customers annotate their Pods with a parsing rule, and Collectbeat receives the metrics and parses them to split the metric name and tags before ingesting them into the desired backend. Another similar protocol that we went after was vanilla HTTP, where users can send metrics as JSON payloads to Metricbeat and have them shipped to the desired backend.
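
For reference, the Graphite plaintext protocol carries one metric per line: a dotted metric path, a value, and a Unix timestamp. A Pod's parsing rule tells Collectbeat how to split the dotted path into a metric name and tags:

```
# <metric.path> <value> <unix-timestamp>
web42.requests.count 1027 1510272000
web42.requests.p99_ms 184.5 1510272000
```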

Being able to discover metrics inside of a Kubernetes environment is a big win in itself. The benefits were quite huge, and we saw the need to do the same for logs as well to support two use-cases:

  • Being able to stitch stack trace-like log patterns
  • Being able to read logs that are not being written into stdout

Because Kubernetes clusters inside of eBay are multi-tenanted, it becomes impossible to configure a single multiline pattern on Filebeat for all Pods inside of the cluster. We applied our learnings from metrics to log collection and decided to expose annotations that users can use to define multi-line patterns based on how Filebeat expects multiline to be configured. A user can, at a container level, configure multiline via annotations, and Collectbeat ensures that the required Filebeat prospectors are spun up to stitch stack traces.
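
The annotation payload mirrors Filebeat's own multiline options. For Java stack traces, the configuration a user would express is typically something like the following (the annotation key shown is illustrative; the multiline options are standard Filebeat settings):

```yaml
metadata:
  annotations:
    # Illustrative annotation key; the value carries Filebeat multiline options
    io.collectbeat.logs/multiline: |
      pattern: '^\s+at\s'
      negate: false
      match: after
```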

A long-standing problem that we have seen in our Kubernetes clusters is performance: we rely heavily on Docker’s JSON log driver, and letting Filebeat decode each log line as a JSON payload is quite expensive. Also, there are many use cases where a container exposes one of its many log files via stdout, while all the others are written to specific files inside the container.

One such example is Apache Tomcat, where catalina.out’s logs are written to stdout, whereas access logs are not. We wanted to solve both of these problems with an unconventional solution. Collectbeat was rewritten to accept log paths in the Pod’s annotations; based on the underlying Docker file system, Collectbeat spins up prospectors by joining the container’s filesystem path with the annotated file path. This lets us tail log files present inside the container, frees us from relying on JSON log file processing, and also lets us collect the different log files written by a container.

Where we are today…

Collectbeat has become our defacto agent that sits on every node through DaemonSets in our Kubernetes clusters to collect logs and metrics. Collectbeat runs in both Filebeat mode and Metricbeat mode to be able to tail log files and collect metrics respectively. This is what our Node looks like:

What are the features that Collectbeat has today? We are able to:

  • Collect metrics from any Pod that exposes them in a format understood by a supported Metricbeat module
  • Collect logs written to stdout or files inside the Docker container
  • Append Pod metadata on every log and metric collected
  • Allow Pods to push metrics through Graphite protocol and parse them uniquely
  • Stitch stack traces for application logs

Today we run Collectbeat on over 1000 nodes shipping more than 3TB of logs and several billion data points per day. Our end goal is to put Collectbeat on every host in eBay and be able to collect logs and metrics from any application that is being deployed.

Are we there yet? No, but we are slowly, but surely, getting there. There are still several more features that we have yet to crack, like being able to give QoS for all Pods so that all Pods are treated equally when shipping logs and metrics. We also want to be able to provide quotas and throttle workloads when applicable.

We have greatly benefited from Collectbeat, and with great excitement we announce its open sourcing. Putting our code out in the open will help us get feedback from the community and improve our implementation, while at the same time helping others who are trying to solve the same problem we are. So go get github.com/ebay/collectbeat and let us know your feedback.


A big shout out to all the folks in eBay who made this a reality:

Also, a big shout out to the Elastic folks from the Beats community who have helped us along the way:

Automating the Creation of Standard Change Requests at eBay

eBay’s Network Engineering team operates a large-scale network infrastructure with a presence across the globe. Our mission is to provide a seamless experience connecting buyers and sellers wherever they may be. The network we created to support that goal comprises different vendors and designs that have evolved over time. Networks require care and feeding on a regular basis to ensure that performance targets are met. How can we make the numerous weekly changes required while minimizing the risk of an impact?

One way in which we accomplish our goal is by making all change management procedures as standard and reproducible as possible. Common tasks such as line card installations, BGP changes, or the turn-up of new ports are formalized into Standard Operating Procedures (SOPs). An SOP lays out all of the needed pre-checks, change steps, and post-checks for a successful change. Our SOPs are put through an engineering review process where we hone these steps so that the combined experience of all team members can be brought to bear on the problem.

As we went through this process of creating SOPs for most of our workload, we realized that we were doing many of the same things each time. Examples include things such as backing up the configuration, verifying that the console works, and executing commands that let us verify status before and after a change is executed. All of these steps, taken together, began to sound very much like a broken record to us as we created SOP after SOP.

Project Broken Record (PBR)

We determined that fully automating the creation of SOP-based change requests would be a worthwhile investment of our time. Now that we had the most common tasks well-documented in SOPs, we could actually run through most steps programmatically with some work invested. Because many steps were identical (such as collecting ‘show ip ospf neighbor’) from one type of change to another, bits of code would be reusable. Some challenges, such as how to detect different vendors, code versions, or design standards, would present themselves, but the important part for us was to get started and validate that the concept was workable before expanding it.

Our project outline for automating standard change requests was as follows:

  • Preparation and Planning
  • Design the System
  • Develop Proof of Concept
  • Document the System
  • Execute Pilot
  • Evaluation

Preparation and Planning

We decided to focus on a few common and relatively easier tasks with already defined SOPs. The tasks selected were:

  • Costing links in or out for maintenance
  • Enabling or disabling ports
  • Decommissioning switches
  • VLAN add/change
  • Code upgrades (various vendors)

A smaller set of tasks like this kept the scope contained to a reasonable size while still allowing the opportunity to bump into a few challenges and solve problems that might be encountered when the project is expanded to cover all of our SOPs.

Dividing the work among several people allowed us to build components in parallel. All coding was stored in a Git repository to facilitate group participation.

Design the System

The system is built out of various building blocks. The foundation is a Python script named ‘Auto About.’ This script contains functions that lay out the high-level outline of the pending change request. It defines specific devices, interfaces, or neighbors that are involved in the pending change. It gathers the most basic information, “What is this maintenance about?”, hence the name. A few examples of functions within Auto About are ‘get_routing_instances,’ ‘collect_vlan_info,’ and ‘collect_power_supply_data.’  Feeding Auto About the arguments of a device name and the type of maintenance is all that is required to gather information. The output of Auto About is a small YAML file that contains the information collected at this step.   
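
The YAML that Auto About emits might look something like this (the structure and field names here are purely illustrative; the real schema is internal):

```yaml
# Hypothetical Auto About output for a link-costing maintenance
maintenance_type: cost_out_link
device: switch01.example          # illustrative device name
interfaces:
  - Ethernet1/1
neighbors:
  - switch02.example
routing_instances:
  - default
```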

This small file is fed into the Collector script. This component gathers information from the network devices. Collector is written in Python and is a stateless system. There is no database or long-term storage of information at this point. Collector’s output is a YAML file, much longer than Auto About’s file, with everything we need to know about the change we’re about to execute.

At this point, we have all of the information we need, but reading a YAML is not very friendly for humans. We still track changes in a ticketing system, and we want to be able to review them.

A separate Python script, sop.py, combined with a Jinja2 template that matches the specific type of maintenance desired, takes that long YAML file and generates a few plain text files for us. Each step or check from our original SOP is broken down in the same way within the script, and the output lists each step and sub-step in the proper order. Any device-collected information is added where it is required. Output files created include an Action plan (your “forward” steps), a Verification plan where changes are tested, and a Rollback plan (your “backwards” steps in case you need to undo your changes). These plain text, human-readable files are used to create the Change Request (CR) in our in-house ticketing system. They represent a step-by-step and line-by-line plan to execute the work.
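
A stripped-down sketch of the rendering step (using the standard library's string.Template in place of Jinja2, and a hypothetical data structure, so the example is self-contained):

```python
from string import Template

# Hypothetical data, standing in for Collector's YAML output.
change = {
    "device": "switch01.example",
    "interface": "Ethernet1/1",
}

# One template per plan file, mirroring a Jinja2 template per maintenance type.
ACTION_PLAN = Template(
    "1. Back up configuration on $device\n"
    "2. Set OSPF cost on $interface for maintenance\n"
)
ROLLBACK_PLAN = Template(
    "1. Restore the original OSPF cost on $interface\n"
    "2. Verify neighbors on $device\n"
)

def render_plans(data):
    """Render the human-readable plan files from collected data."""
    return {
        "action.txt": ACTION_PLAN.substitute(data),
        "rollback.txt": ROLLBACK_PLAN.substitute(data),
    }

plans = render_plans(change)
print(plans["action.txt"])
```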

A final Python script called cr.py (‘cr’ indicating change request) handles the task of pushing the information files created by sop.py into Trace, our internally developed ticketing system that tracks changes. This saves engineer time by automating another piece of the CR puzzle for them. cr.py handles aspects of the change ticket process, such as filling in names of the people who submit or check tickets, setting the time and date of the proposed change, and requesting the creation of a new CR ticket.

Develop Proof of Concept

The proof of concept (PoC) involved creating the first versions of the components highlighted above and testing for functionality as well as interoperation of the individual pieces. A number of different people worked on this project, and the correct operation of all of the parts together was tested in the PoC phase. The PoC was a success, and we decided to press forward to a pilot phase.

Document the System

Documentation was created primarily within our Git repository, so that everything a contributor would need was in one place and could be easily updated by anybody working on the project. A simple ‘readme’ file uploaded into the appropriate directory in Git provided a place to put higher-level information about how a piece of code was supposed to function. This was done in addition to good commenting within each file, of course! Some project-tracking items were also hosted on a wiki page, where they were more easily accessed by stakeholders who were not directly involved in the coding aspects.

Execute Pilot

During the pilot phase, we saw a rapid expansion of the PBR program as we started onboarding more use cases and actually using this system in our live change management workflow.

Exposing the output from PBR to the wider group of engineers during our pilot phase was a great way to get additional feedback on how we could collect the right information that would be valuable for the change type being executed. During the pilot period of about six months, numerous small issues with the various CR templates were corrected. Many of these issues were uncovered in our regular change management meetings as we discussed pending CR tickets.

Where it was possible, we aimed to make the CR tickets have the same look and feel.  For example, standardizing the sequence numbers for prechecks, change steps, and post checks is one way we found to make the CRs more readable and faster to evaluate at change management meetings. As a result of this feedback loop, our templates and methods quickly evolved to be more comprehensive and polished.


Project Broken Record took us approximately one year to complete from the initial meetings to a working product that had been successfully piloted.  We found that all of the pieces of this product require updating and fine tuning from time to time as we strive to execute the perfect change.  This type of regular time investment is a good tradeoff for eBay, because we are confident that this system has helped to avoid outages while streamlining the change process.

Our change management meetings could be run more efficiently because we became familiar with the standard layout of change tickets. It was easier to review and approve standard, SOP-based tickets than the previous queue of tickets all written differently by different engineers. Increasing the throughput of the review process directly benefited our internal customers waiting on change work to be completed.

We track all impacts to business availability and analyze what we could have done to avoid impact. One way that we sort this data is by root cause. Causes could be things such as hardware failure, vendor software bugs, change tickets gone wrong, etc. In 2017, impact time due to change tickets was very nearly zero. There were several parallel initiatives that contributed to this, but Project Broken Record was a part of that success story to be sure. Doing the same change the same way each time reduces the chance of unexpected consequences and builds our confidence in our procedures.

Where We’re Headed

We are happy with the progress we have made so far, but there are still a number of things we would like to improve upon.

We want to become more disciplined in our coding by creating development and master branches of our code. Currently, most portions of this system are in a development state but are also being used daily. We are also testing systems built on top of this that will perform standard maintenances completely automatically by following the SOPs, using the information in the large YAML file.

The larger goal we are pursuing here is minimal human interaction with the production network. Now that we have seen a return on our initial investment, we want to take this to a higher level of engineered solution. A ground-up rewrite of many of the pieces described above is already underway to consolidate functions and improve the way in which we gather information from the network. We are committed to this program, and we expect it to continue to evolve and grow.

Our team exists to help eBay’s business be successful. As we explore this new automation-focused landscape, we are looking for the best ways to achieve that goal through solid uptime, delivery of projects, and a great user experience for everyone on the platform. The thought process on our team has shifted from one in which we directly care for an ever-increasing number of network devices to one in which we create tools that do that for us. This new way of approaching operations at eBay is much more scalable and is where we are placing a heavy emphasis as we march toward 2018 and beyond.