Introducing Regressr – An Open Source Command Line Tool to Regression Test HTTP Services


In the Artificial Intelligence-Human Language Technologies team at eBay, we work on the software that powers eBay’s conversational bot, ShopBot. We ship software to production daily to make our bot smarter and more human. As a crucial part of this effort, we have to make sure any regressions are caught quickly and fixed, to keep our customers doing what they love – making purchases on ShopBot.

ShopBot’s backend is built on a polyglot suite of Scala, Java, and Python-based microservices that work in unison to provide ShopBot’s functionality. Hence, many of the crucial services need to be regression tested before we can release a new version to production.

To help with that effort, we built and are open sourcing our regression testing tool, Regressr.

Why Regressr

We looked at the common approaches that are widely used in the industry today to build an automated regression testing suite. In no particular order, they are listed below.

  • Comprehensive JUnit suite that calls two versions (old and new) of the service and compares the minutiae of the responses – JSON elements, their values and the like.
  • Using SoapUI’s test runner to run functional tests and catch regressions through functionality failures.
  • No regression tests at all: wait for front-end regression tests to fail in dev or test, and trace the failure back to the backend.

We also looked at Diffy and were inspired by how simple it was to use for catching regressions.

We had some unique requirements for testing eBay ShopBot and found that none of these tools provided the features we wanted:

  1. Super-low ceremony: We must be able to productionize quickly and gain significant value without much coding or process.
  2. Low conceptual surface area: An engineer should be able to grok what the tool does and use it quickly without going through bricks of manuals and frameworks.
  3. Configurability of comparisons: We want to be able to specify how the response should be compared. Do we want to ignore JSON values? Do we want to ignore certain elements? What about comparing floating-point numbers, precision, and so on?
  4. Compare at high levels of abstraction: We want to capture high-level metrics of the responses and then perform regression testing on them. For example, we would like to be able to say that the number of search results in this response was 5, and then use that number to compare against future deployments.
  5. Low maintenance overhead: We want maintenance of the regression suite to have low or negligible coding effort. Once every deployment is approved for release, we just want the suite to automatically capture the current state of the deployment and use that as a reference for future deployments.
  6. CI/CD Integration: Finally, we wanted this suite to be hooked into our CI/CD build.

We built Regressr specifically to solve these requirements, so that the team can focus on the important stuff, which is serving great experiences and value to our customers who use ShopBot.

Regressr is a Scala-based command line tool that tests HTTP services, plain and simple. We built it to be really good at what it does. With Regressr, you can use the out-of-the-box components to get a basic regression test for your service up and running quickly and gain instantaneous value, and then code regression tests that cover close to 100% of the functionality incrementally, as time permits. Finally, Regressr doesn’t even need the two services to be up and running at the same time, because it uses a datastore to capture the details of the baseline.

Regressr works in two modes:

  1. Record – Use Record when you want to capture the current state of a deployment to be compared as the baseline for later deployments. A strategy file is specified that contains the specifics of what needs to be recorded.
  2. Compare/Replay – Compares the current state of a deployment with a baseline and generates a comparison report.

The image below captures what is done in these two flows.

The Strategy File

The strategy file is the configuration that drives what happens during a record and a compareWith execution.

An example strategy file that issues three requests (one GET and two POSTs) and performs regression testing on the responses is shown below:

service:
  baseURL       : http://localhost:9882/endpoint

commonHeaders:
  Content-Type    : application/json

requests:

  - requestName: say_hello
    path: /say_hello
    method: GET
    recorder: org.ebayopensource.regression.internal.components.recorder.SimpleHTTPJSONRecorder
    comparator: org.ebayopensource.regression.internal.components.comparator.SimpleHTTPJsonComparator

  - requestName: shop for a pair of shoes
    path: /shopping
    method: POST
    requestBuilder: org.ebayopensource.regression.example.ExampleRequestBuilder
    dataInput:
      conversationId     : 12345
      keyword            : Show me a pair of shoes
      mission_start      : yes
    recorder: org.ebayopensource.regression.internal.components.recorder.SimpleHTTPJSONRecorder
    comparator: org.ebayopensource.regression.internal.components.comparator.SimpleHTTPJsonComparator

  - requestName: say goodbye
    path: /goodbye
    method: POST
    requestBuilder: org.ebayopensource.regression.internal.components.requestBuilder.StringHTTPBuilder
    dataInput:
      payload            : '{"mission" : "12345", "keyword" : "good bye", "mission_start" : "no" }'
    recorder: org.ebayopensource.regression.internal.components.recorder.SimpleHTTPJSONRecorder
    comparator: org.ebayopensource.regression.internal.components.comparator.SimpleHTTPJsonComparator

The Components

The important parts of the strategy file are the different components: RequestBuilder, Recorder, and Comparator.

RequestBuilder is used to specify how the request should be built in case of a POST or a PUT request.

The interface for RequestBuilder accepts a Map of Strings and outputs the payload that will be sent in the request.

import scala.util.Try

abstract class RequestPayloadBuilder {

  // Builds the payload to send, from the dataInput map in the strategy file.
  def buildRequest(dataInput: Map[String, String]): Try[String]

}
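For illustration, here is what a custom builder might look like. This is a hypothetical sketch (the actual ExampleRequestBuilder referenced in the strategy file may differ) that renders the dataInput keys of the shopping request above into a JSON payload:

import scala.util.Try

// Hypothetical stand-in for the ExampleRequestBuilder named in the
// strategy file: renders the request's dataInput map into a JSON payload.
class ShoppingRequestBuilder extends RequestPayloadBuilder {

  override def buildRequest(dataInput: Map[String, String]): Try[String] = Try {
    s"""{ "mission" : "${dataInput("conversationId")}",
       |  "keyword" : "${dataInput("keyword")}",
       |  "mission_start" : "${dataInput("mission_start")}" }""".stripMargin
  }
}

Because the result is a Try, a missing dataInput key surfaces as a Failure instead of an unhandled exception.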

Recorder is used to specify which parts of the response should be recorded for future comparison. Regressr injects all parts of the response into the Recorder at this time.

The interface for Recorder accepts a list of HTTPResponses (most of the time this will be just one) and returns a RequestRecordingEntry.

The RequestRecordingEntry is a holder for a value that will be recorded in Regressr’s datastore. The response code can be stored in a RequestRecordingEntry; similarly, a JSON response can be stored in one. You can also do some computation on the JSON and store a derived number (like the number of search results).

The interface for Recorder looks like this:

protected def record(responses: Seq[HTTPResponse]): Try[RequestRecordingEntry]
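As a loose illustration of requirement 4 (comparing at high levels of abstraction), a custom Recorder could reduce the response to a single metric. The sketch below is assumption-laden and is not Regressr's actual API: the body accessor on HTTPResponse and the way a RequestRecordingEntry wraps a value are guesses.

import scala.util.Try

// Hypothetical recorder that stores only the number of search results
// found in the JSON body, rather than the whole response.
class SearchResultCountRecorder extends Recorder {

  override protected def record(responses: Seq[HTTPResponse]): Try[RequestRecordingEntry] = Try {
    val body  = responses.head.body                    // assumed accessor
    val count = "\"item\\d+\"".r.findAllIn(body).size  // crude count of result items
    RequestRecordingEntry(count.toString)              // assumed constructor
  }
}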

Finally, the Comparator is used to specify the details of the comparison during the compareWith mode. How do you want to compare JSON documents? What about strings?

The interface for Comparator looks like this. It accepts both the recorded RequestRecordingEntry and the current one, and returns a list of CompareMessages that will be included in the comparison report.

import scala.util.Try

abstract class Comparator {

  // Compares the baseline entry with the replayed one; any returned
  // messages appear in the comparison report.
  def compare(recorded: RequestRecordingEntry, replayed: RequestRecordingEntry): Try[Seq[CompareMessage]]

}
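In the same hedged spirit, a custom Comparator could implement requirement 3 (configurable comparisons) by tolerating small numeric drift instead of demanding exact equality. How values are read out of a RequestRecordingEntry and how a CompareMessage is built are assumptions here, not Regressr's real API:

import scala.util.Try

// Hypothetical comparator for numeric metrics (such as the search-result
// count recorded above): only differences beyond a tolerance are reported.
class TolerantMetricComparator extends Comparator {

  override def compare(recorded: RequestRecordingEntry, replayed: RequestRecordingEntry): Try[Seq[CompareMessage]] = Try {
    val before = recorded.toString.toDouble  // assumed: entry holds one number
    val after  = replayed.toString.toDouble
    if (math.abs(before - after) <= 0.5) Seq.empty
    else Seq(CompareMessage(s"metric drifted from $before to $after"))  // constructor assumed
  }
}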

Regressr comes with out-of-the-box components that can be plugged in to provide significant value instantaneously for many common types of services. However, you can also write your own components that implement these interfaces and include them in Regressr (use ./regressr.sh -r to build everything).

The comparison report is generated at the end of the compareWith lifecycle.

Testing HATEOAS services

HATEOAS (Hypermedia As The Engine Of Application State) is a style that some classes of RESTful services gravitate toward, especially when there are lightweight GUIs in front of them that mirror the conversation happening with the service. Regressr also supports simple and efficient breadth-first traversal of HATEOAS resources for regression testing.

We support this through a new component class called Continuations.

Let’s imagine you have a shopping cart service exposed at a URL such as /shoppingCart/items.

When a GET request is issued on this URL, if the service is modeled on HATEOAS principles, the response will be similar to:

{
    "items": {
        "item1": "http://<host>/shoppingCart/items/<item-id>/",
        "item2": "http://<host>/shoppingCart/items/<item-id>/",
        "item3": "http://<host>/shoppingCart/items/<item-id>/"
    }
}

As you can imagine, the follow-up requests to these item URLs are non-deterministic and cannot be modeled in Regressr’s configuration files, because the data may change over time.

That is where Continuations come in. With continuations, the tester can specify programmatically how many new requests should be created, based on the response of a previous service call.

This allows the tester to write a generic continuation that creates new requests based on how many items were present in the response of the /items call, as the sketch below illustrates.
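Here is the traversal pattern sketched independently of Regressr's actual Continuation interface; the fetch function and the URL extraction are stand-ins:

// Breadth-first traversal of HATEOAS resources. fetch stands in for an
// HTTP GET that returns the response body.
def followUpUrls(body: String): Seq[String] =
  "http://[^\"\\s]+".r.findAllIn(body).toSeq

def breadthFirst(start: String, fetch: String => String, maxDepth: Int): Seq[String] = {
  var frontier = Seq(start)
  var visited  = Seq.empty[String]
  for (_ <- 0 until maxDepth) {
    val bodies = frontier.map(fetch)           // request every URL at this level
    visited ++= frontier
    frontier = bodies.flatMap(followUpUrls)    // derive the next level from the links
                     .distinct
                     .filterNot(visited.contains)
  }
  (visited ++ frontier).distinct
}

Applied to the shopping cart response above, the first level yields the three item URLs, each of which can then be requested, recorded, and compared like any other request.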

An example of continuations is here.

What’s Next

  1. Maven plugin for Regressr that can be used in a CI/CD build.
  2. Jenkins plugin for Regressr report.
  3. Global comparators that can be used to capture global metrics across requests and compare them.

Conclusion and Credits

We have found Regressr to be a very useful regression testing tool for lean, low-ceremony engineering teams that wish to minimize the effort involved in regression testing their services.

Many people were involved in the design, build, and testing of Regressr, without whom this would not have been possible: Ashwanth Fernando, Alex Zhang, Robert Enyedi, Ajinkya Kale, and our director Amit Srivastava.

Comments and PRs are welcome at https://github.com/eBay/regressr.

Cube Planner – Build an Apache Kylin OLAP Cube Efficiently and Intelligently

Life is about budgeting carefully for daily necessities, and the same is true in technology. Frugal people spend money on the things that are needed most, and programmers are always seeking to reduce the resource costs of their code. Cube Planner is a new feature created by eBay’s engineers that helps you spend resources on building cost-effective dimension combinations.

Background

Apache Kylin is an open source Distributed Analytics Engine designed to provide multi-dimensional analysis (OLAP) on Hadoop. Originally contributed by eBay to the Apache Software Foundation in 2014, Kylin became an Apache top-level project in 2015. With the rapid development of the Apache Kylin community since then, Apache Kylin has been deployed by many companies worldwide.

There has also been a crescendo of data analysis applications deployed on the Apache Kylin platform within eBay. There are currently 144 cubes in eBay’s production environment, and this number continues to grow by an average of 10 per month. The largest cube is about 100 TB. The average query volume is between 40,000 and 50,000 per weekday, and over the last month the average query latency was 0.49 seconds.

Why is Cube Planner needed

In 2016, eBay’s Data Services and Solutions (DSS) department began to introduce self-service reports. Data analysts can drag and drop dimensions and measures to create their own reports. But a self-service report generally has many dimensions and measures, and it is difficult to define the query pattern in advance, which requires Kylin to pre-calculate a large number of dimension combinations.

In addition, in companies beyond a certain scale, data providers and data users don’t belong to the same department, and data providers may not be particularly familiar with the query patterns, which leads to a gap between the cube design and actual use. Some combinations of dimensions are pre-calculated but never queried, while others are often queried but not pre-calculated. These gaps affect both the resource utilization and the query performance of a cube.

We decided to make a cube build planner, Cube Planner, with the goal of efficiently using computing and storage resources to build cost-effective dimension combinations intelligently.

What is Cube Planner?

Cube Planner checks the costs and benefits of each dimension combination, and selects cost-effective sets of dimension combinations to improve cube build efficiency and query performance. The Cube Planner recommendation algorithm can run multiple times throughout the life cycle of a cube: for example, after the cube is first created, or periodically once a certain amount of query statistics has been collected.

Definitions and Formulas

Here are some definitions and formulas used in the Cube Planner algorithm.

  • Dimensions and Measures are the basic concepts of data warehousing.
  • A dimension combination (Cuboid) is any subset of the data’s dimensions. Suppose the data has three dimensions (time, location, category); then it has eight dimension combinations: [ ], [time], [location], [category], [time, location], [time, category], [location, category], [time, location, category].
  • The basic dimension combination (Basic Cuboid) is the combination of all dimensions, carrying the most detailed aggregation information, such as [time, location, category] above.
  • The symbol ≼ represents a parent-child relationship between dimension combinations:
    w ≼ i indicates that a query involving dimension combination w can be answered by aggregating the data of dimension combination i; w is the child combination and i is the parent combination. For example, dimension combination w [time, category] is a child of dimension combination i [time, location, category]. Each parent combination may have multiple child combinations, and the basic dimension combination is the parent of all other dimension combinations.
  • Cost of a dimension combination
    The cost of a dimension combination has two aspects: the build cost, defined by the number of rows in the combination itself, and the query cost, defined by the number of rows scanned to answer a query. Since a build is one-time while queries are repetitive, we ignore the build cost and use only the query cost as the cost of a dimension combination: C(i) = cost of dimension combination i.
  • Benefit of a dimension combination
    The benefit of a dimension combination can be understood as reduced cost: the total query cost on this cube that is saved by pre-calculating this dimension combination. B(i, S) is the benefit of adding dimension combination i to the dimension combination set S, calculated by summing the benefits over all child combinations of i: B(i, S) = ∑w≼i Bw, where the benefit for each child is Bw = max{C(w) – C(i), 0}.
  • Benefit ratio of a dimension combination
    Benefit ratio = (benefit of the dimension combination / cost of the dimension combination) * 100%:
    Br(i, S) = (B(i, S) / C(i)) * 100%

How to choose a cost-effective dimension combination?

We use a greedy algorithm to choose cost-effective dimension combinations, and the selection process is iterative. In each iteration, the system goes through the candidate set, selects the dimension combination with the highest benefit ratio at the current state, and adds it to the result set of combinations to be built.
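The following is a minimal sketch of that loop, not Kylin's actual implementation; the real stopping conditions (discussed later) are simplified here to a fixed number of picks:

// Greedy cuboid selection, sketched. rows is a cuboid's row count; the
// query cost of w is the row count of its smallest selected ancestor.
case class Cuboid(id: String, rows: Long)

def queryCost(w: Cuboid, selected: Set[Cuboid],
              isAncestor: (Cuboid, Cuboid) => Boolean): Long =
  selected.filter(s => s == w || isAncestor(s, w)).map(_.rows).min

// B(i, S): total query-cost reduction over i and all of its children.
def benefit(i: Cuboid, all: Seq[Cuboid], selected: Set[Cuboid],
            isAncestor: (Cuboid, Cuboid) => Boolean): Long =
  all.filter(w => w == i || isAncestor(i, w))
     .map(w => math.max(queryCost(w, selected, isAncestor) - i.rows, 0L))
     .sum

def greedySelect(all: Seq[Cuboid], base: Cuboid, maxPicks: Int,
                 isAncestor: (Cuboid, Cuboid) => Boolean): Set[Cuboid] = {
  var selected = Set(base)  // the basic cuboid is always built
  for (_ <- 1 to maxPicks) {
    val candidates = all.filterNot(selected.contains)
    if (candidates.nonEmpty) {
      // pick the candidate with the highest benefit ratio B(i, S) / C(i)
      val best = candidates.maxBy(i =>
        benefit(i, all, selected, isAncestor).toDouble / i.rows)
      selected += best
    }
  }
  selected
}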

Let’s now look at a simple example to understand the process of the iterative selection.

Example: how Cube Planner selects a dimension combination

Let’s assume there is a cube with the following dimension combinations.

Dimension Combination ID    Dimensions
a                           time, location, category
b                           time, location
c                           time, category

The parent-child relationships between the dimension combinations are shown in the following figure. In addition to its ID, each dimension combination has a pair of numbers (i, j).

i: the number of rows in this dimension combination

j: the cost of querying the dimension combination

In the following figure, dimension combination a is the basic dimension combination (Basic Cuboid), with 100 rows of data. In a Kylin cube, the basic dimension combination is pre-calculated by default, so initially the result set contains only the basic dimension combination. Therefore, queries on all other dimension combinations need to be answered by scanning the basic dimension combination’s data and then aggregating, so the cost of these queries is its row count, 100.

First iteration of Cube Planner greedy algorithm

The green dimension combinations are those already selected into the result set in previous rounds; initially, that is only the basic dimension combination. The red ones are combinations that may be selected into the result set in this round. The blue ones are combinations that have been selected and will be added to the result set in this round.

What is the benefit of selecting dimension combination b, that is, the reduced cost for the entire cube? If b is selected, the query cost of b and its children (d, e, g, h) drops from 100 to 50, so the benefit ratio of dimension combination b is (100 – 50) * 5 / 50 = 5.
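Spelled out with the formulas from the definitions above (taking C(b) = 50 from the figure, and the affected set being b itself plus its children d, e, g, and h):

B(b, S) = Bb + Bd + Be + Bg + Bh = 5 * (100 – 50) = 250
Br(b, S) = B(b, S) / C(b) = 250 / 50 = 5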

If you choose dimension combination c instead, the benefit ratio is 1.67.

If the other dimension combinations are selected, the benefit ratios are as follows. It is obvious that g is the combination with the highest benefit ratio, so dimension combination g is added to the result set in this round.

Second round selection of the Cube Planner

At this point, the result set has become (a, g). In this state, if b is selected, then although the children of b are d, e, g, and h, only queries over b, d, and e will be affected, because g has already been pre-calculated and has a smaller row count than b. When a query against dimension combination g arrives, the system will still choose g to answer it. This is exactly why the benefit of each child combination mentioned above is Bw = max{C(w) – C(i), 0}. So the benefit ratio of combination b is now 4.

If you pick combination c, the benefit ratio is 1.34.

The benefit ratios of the other combinations are as follows. Combination h is eventually selected and added to the result set in this round.

When do we stop the iterative selection?

When any one of the following three conditions is fulfilled, the system will stop the selection:

  • The expansion rate of the dimension combinations in the result set reaches the cube’s configured expansion rate.
  • The benefit ratio of the dimension combination selected in this round is lower than the configured minimum value. This indicates that the newly added dimension combination is not cost-effective, so there is no need to select any more.
  • The running time of the selection algorithm reaches the configured ceiling.

A Cube Planner algorithm based on weighted query statistics

In the previous algorithm, all dimension combinations carry the same weight when calculating the benefit that one dimension combination brings to the entire cube. In actual applications, however, the probability of querying each dimension combination varies. Therefore, when calculating the benefit, we use the query probability as the weight of each dimension combination.

Weighted dimension combination benefit WB(i,S) = ∑w≼i (Bw * Pw)

Pw is the probability of querying dimension combination w. For dimension combinations that are never queried, we set the probability to a small non-zero value to avoid overfitting.
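Continuing the earlier greedy sketch, the only change to the scoring is that each child's saving is scaled by its query probability (prob is a stand-in for the collected statistics):

// WB(i, S): as benefit(...), but each child's saving is weighted by the
// probability that the child combination is actually queried.
def weightedBenefit(i: Cuboid, all: Seq[Cuboid], selected: Set[Cuboid],
                    prob: Cuboid => Double,
                    isAncestor: (Cuboid, Cuboid) => Boolean): Double =
  all.filter(w => w == i || isAncestor(i, w))
     .map(w => math.max(queryCost(w, selected, isAncestor) - i.rows, 0L) * prob(w))
     .sum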

For a newly created cube, all probabilities are set to the same value, because there are no query statistics in the system yet. There is room for improvement here: in the future, we want to use platform-level statistics and data-mining algorithms to estimate the query probabilities of a new cube’s dimension combinations. For cubes that are already online, we collect the query statistics, which makes it easy to calculate the probability of querying each dimension combination.

In fact, the real algorithm based on weighted query statistics is more complex than this. Many practical problems need to be solved, for example, how to estimate the cost of a dimension combination that is not pre-calculated, since Kylin only stores statistics for the pre-calculated dimension combinations.

Greedy algorithm vs. genetic algorithm

Greedy algorithms have some limitations when there are too many dimensions: the candidate set of dimension combinations becomes very large, and the algorithm must evaluate every combination in it, which takes time to run. Thus, we implemented a genetic algorithm as an alternative. It uses the iterative, evolutionary model of genes to simulate the selection of the dimension combination set; the optimal set of dimensions is evolved (selected) over a number of iterations. We will not go into the details here; if anyone is interested, we can write a detailed description of the genetic algorithm used in Cube Planner.

Cube Planner one-click optimization

For a cube that is already in production, Cube Planner can use the collected query statistics to intelligently recommend a set of dimension combinations for pre-calculation, based on the cost-and-benefit algorithm. Users can then perform a one-click optimization based on this recommendation. The following figure shows the one-click optimization process. We use two jobs, Optimizejob and Checkpointjob, not only to allow the optimization to run concurrently, but also to ensure the atomicity of the data switch after optimization. To save resources when optimizing the cube, Optimizejob does not recalculate all the recommended dimension combinations; it computes only the new combinations in the recommended set and reuses any that are already pre-calculated in the cube.

Application of Cube Planner in eBay

Cube Planner’s development was largely completed at the end of 2016. After extensive testing, it was formally launched in the production environment in February 2017, where it has been highly praised and widely recognized by users. Here is an example of Cube Planner in use in eBay’s production environment: the customer optimized their cube on April 19th, and the optimized cube not only built significantly faster but also improved significantly in query performance.

Follow-up development of Cube Planner

In addition to improving resource utilization, Cube Planner has another goal: reducing cube design complexity. At present, designing a Kylin cube depends on an understanding of the data. Cube Planner enables Kylin to recommend and optimize based on data distribution and data usage, but this is not enough. We want to turn Kylin cube design into a smart recommendation system based on query statistics across cubes within the same domain.

eBay’s New Homepage


Building and releasing a modern homepage for eBay’s hundreds of millions of annual users and tens of millions of daily users was the team’s greatest challenge to date. We were given the mission to develop a personalized buying experience by mashing merchandised best offers with tailored ads, topped off with the hottest trends and relevant curated content. Any such virtual front door had to be robust to change, easily configurable for any site, and had to load really, really fast!

There’s no second chance to make a first impression, so we had to make it count. But how quickly can you go from 0 to 100 (100% deployed for 100% of traffic to the homepage)?

The team commenced this project in late 2016, and we rushed to get an early flavor available for an internal beta in mid-February. Traffic was then slowly ramped until reaching the 100% mark on the Big 4 sites (US, UK, DE, AU), less than three months after the beta, while we simultaneously added more scope and fresh content to the sites.

Our new homepage consists of three main components, spread across three large pools:

  • Node Application – The homepage front-end web server handles incoming user requests and acts as the presentation layer for the application, boasting responsive design.
  • Experience Service – The experience service tier is the primary application server that interacts with the page template and customization service and orchestrates the request handling to various data providers. It performs response post-processing that includes tracking. This layer returns a render-ready format that is consumable by all screen types. The experience layer is invoked by Node for desktop and mobile web traffic and directly by the native applications.
  • Data Provider Service – The data provider service is a facade that massages responses from datastores into a semantic format. Providers such as Merchandise, Ads, Marketing, Collections, and Trends interface with this service. JSON transformation, done using Jolt, produces the response datagrams.

Additional auxiliary components help keep the system configurable and make it simple to reorganize or publish new content to the site within seconds.

The Experience service was built on a shared platform, called Answers, developed jointly with the Browse and Search domains.

Due to the hierarchical nature of the system, its complex composition (16 unique modules), and its multiple dependencies (9 different services, to be exact), special consideration had to be given to the rollout and deployment process.

We adopted a bottom-up strategy by deploying in the reverse order of dependencies. That is, we start with datastores, data providers, experience services and lastly deploy the node application server. This approach helps us bubble up issues early on in the process and tackle misalignment in an effective manner.

All service-to-service calls are made via clearly defined, internally reviewed APIs. Interface changes are only allowed to be additive to maintain backwards compatibility. This is essential for the incremental bottom-up rollout scheme.
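As an illustration of the additive-only rule (a made-up model, not the team's actual API), a new field is introduced as optional with a default, so existing callers and serialized payloads remain valid:

// Version 1 of a hypothetical module response.
case class ModuleResponse(title: String, items: Seq[String])

// Additive evolution: the new field is optional and defaulted, so every
// existing producer and consumer of the response keeps working.
case class ModuleResponseV2(title: String, items: Seq[String],
                            trendingRank: Option[Int] = None)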

When developing and deploying a new build, as when delicately changing a layer in a house of cards, one must act with great care and much thought about the inherent risks. Test strategies like “mock-first” served as a working proof of concept and allowed testing to begin early on, with caching techniques quickly taking the place of those very same mocks in the final product. Unit tests, hand-in-hand with functional and nonfunctional tests, had to be done with rigor, to make regression testing easy, fast, and resilient to change.

The team always deployed to Staging and Pre-production before conducting any smoke-testing on three boxes in Production, one VM instance per Data Center. As a result, specific networking issues were exposed and quickly remediated.

Some of the engineering challenges the team faced had to do with Experimentation Platform complexities, where a selective pool channel redirect showed unexpected behavior with traffic rerouting while gradually diverting traffic from the current feed-based homepage to the new carousel-based one.

Internal beta testing, done by the Buyer Experience teams, and external in-the-wild testing, conducted by a crowd-testing company, took place under a zero-traffic experiment. This was followed by a slow 1% traffic ramp to help control the increments, with gradual 5% and 25% steps introduced to the pools and days to weeks between the phases. Each ramp involved multiple daily Load & Performance cycles, while fine-tuning timeout and retry variables.

To summarize our key learnings from the homepage experience:

  • Relying on clearly-defined APIs was vital in achieving an iterative, incremental development and rollout strategy.
  • Co-evolving our logging and monitoring infrastructure and the team’s debugging expertise was essential to achieving an aggressive rollout approach on a distributed stack.
  • Tracking and Experimentation setups, as well as baking in Accessibility, proved to be major variables that should be well-scoped into a product’s development lifecycle and launch timelines.
  • Introducing gradual functional increments, with a design for testability in mind, proved to be a valuable quality control measure and allowed the team to absorb change while moving forward.
  • A meticulous development and release process tracked the associated risks and mitigation actions, allowing the team to meet its milestones, as defined and on time.