eBay Tech Blog
Where e-commerce meets world-class technology

eBay's Font Loading Strategy (September 21, 2017)
The usage of custom fonts in web pages has steadily increased in recent years. As of this writing, 68% of sites in the HTTP Archive use at least one custom font. At eBay, we have been discussing custom web fonts for typography for quite some time, but never really pursued them. The main reason was uncertainty about the end user experience from a performance standpoint. But this changed recently.

Our design team made a strong case for a custom font to complement our new branding and, after multiple reviews, we all agreed it made sense. It was then up to the engineering team to come up with an optimized implementation that not only uses the new custom font, but also tackles the performance overhead. This post gives a quick overview of the strategy we use at eBay to load custom web fonts.

Meet “Market Sans”

Our new custom font is aptly named "Market Sans" to denote affiliation with an online marketplace. As the image below shows, it adds a subtle difference to the typography, but in its entirety it makes the whole page elegant and creates a unique eBay-branded experience. Check out our desktop homepage, where "Market Sans" has been deployed.

Custom Font vs. System Font


It is well known and documented that custom web fonts come with a cost, and that cost is performance. They often delay the rendering of text (critical to any web page) until the font is downloaded. A recent post from Akamai gives an excellent overview of the problems associated with custom fonts. To summarize, there are two major issues, and their behavior varies among browsers:

  • FOUT — Flash of Unstyled Text
  • FOIT — Flash of Invisible Text

As expected, the design and product teams were not happy with this compromise. Yes, custom fonts create a unique branded experience, but not at the cost of delaying that same experience. Additionally, from an e-commerce perspective, custom fonts are a nice enhancement, not an absolute necessity; system fonts can still provide compelling typography. So it was up to the engineering team to come up with an efficient font loading strategy with minimal tradeoffs.


Our strategy was pretty simple: avoid both FOUT and FOIT. Use the custom font if it is already available (that is, downloaded and cached); otherwise, use the default system fonts.

Fortunately, there is a CSS Font Rendering proposal that adds a new @font-face descriptor named font-display. Using font-display, developers can specify how a font is displayed, based on whether and when it's downloaded and ready to use. font-display accepts several values (check out this quick video to understand them), and the one that maps to our strategy is 'font-display: optional'.
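For illustration, a @font-face rule using this descriptor might look like the following sketch; the family name and font URLs are placeholders, not eBay's actual assets:

```css
@font-face {
  font-family: "Market Sans";
  /* placeholder paths, not eBay's real font files */
  src: url("/fonts/market-sans.woff2") format("woff2"),
       url("/fonts/market-sans.woff") format("woff");
  /* optional: use the custom font only if it is already available;
     otherwise render immediately with the fallback font */
  font-display: optional;
}

body {
  font-family: "Market Sans", Helvetica, Arial, sans-serif;
}
```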

Unfortunately, browser adoption of font-display is not yet widespread, as it is relatively new. So for now, until adoption becomes mainstream, we came up with a solution that leverages localStorage, the FontFaceSet API, and the Font Face Observer utility (as a fallback when the FontFaceSet API is not present).

The below illustration gives an overview of how it works:

Flow diagram for Font Loader

To summarize,

  • When users visit an eBay web page, we add a tiny inline CSS and JavaScript snippet in the <head> tag of the response HTML. We also include a small JavaScript snippet in the footer HTML that incorporates the font loader logic.
  • The JavaScript in the <head> checks localStorage for a font flag. If the flag is set, it immediately adds a CSS class to the document root to enable the custom font, and the page renders with the "Market Sans" custom font. This is the happy path.
  • The JavaScript in the footer again checks localStorage for the font flag. If it is NOT set, it calls the font loader function on the document load event.
  • The font loader function downloads the custom fonts using either the built-in FontFaceSet API (if present) or the Font Face Observer utility, which is itself downloaded asynchronously on demand.
  • Once the font download is complete, the font flag is set in localStorage. One thing to note: even though the font flag is set, we do not update the current view with the custom font. That happens on the next page visit, when Step 2 above kicks in.
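The steps above can be sketched as follows. This is a simplified illustration with a storage key and CSS class name of our own choosing; the real ebay-font module differs in detail:

```javascript
// Minimal sketch of the head/footer snippets described above.
const FONT_FLAG = 'fontMarketsans'; // hypothetical localStorage key

// Inline <head> snippet: enable the custom font only when a previous
// visit has already downloaded and cached it.
function applyFontIfCached(doc, storage) {
  if (storage.getItem(FONT_FLAG)) {
    doc.documentElement.classList.add('font-marketsans');
    return true; // happy path: page renders with "Market Sans"
  }
  return false; // this view renders with system fonts
}

// Footer snippet: if the flag is not set, download the font on the
// document load event (FontFaceSet or Font Face Observer, passed in
// here as `loadFonts`) and set the flag for the *next* page view.
function warmFontCache(storage, loadFonts) {
  if (storage.getItem(FONT_FLAG)) {
    return Promise.resolve(false); // nothing to do
  }
  return loadFonts().then(() => {
    storage.setItem(FONT_FLAG, '1');
    return true; // font cached; the next visit takes the happy path
  });
}
```

Injecting `doc` and `storage` keeps the two snippets testable outside a browser; in the page they would simply receive `document` and `window.localStorage`.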

We have open sourced this module as ebay-font. It is a small utility that works along with other eBay open source modules Skin, Marko, and Lasso, as well as in standalone mode. We hope others can benefit from it.


Tradeoffs

There are a few tradeoffs with this strategy:

  1. First-time users: A new user visiting eBay for the first time gets the system font. On navigation or subsequent visits, they get our custom font. This is acceptable, since we have to start using the custom font at some point, and for a new user that point is the second visit.
  2. Private or incognito mode: When a user starts a new browsing session in private or incognito mode, they initially get the system font, but subsequent browsing in the same session renders the custom font (Safari is an exception, though that is being fixed). We do not have metrics on how many users fall into this category, but this is something we have to live with.
  3. Cache eviction: In certain rare scenarios, we observed that the custom font entry in the browser cache is evicted while the localStorage entry is still present, probably because browsers clean up the cache more aggressively than localStorage. In these scenarios, users experience a FOIT or FOUT depending on the browser. This is an edge case and hence less concerning.

As a team we agreed that these tradeoffs are acceptable, considering the unpredictable behavior that comes with default font loading.


Conclusion

Custom web fonts do add value to the overall user experience, but not at the cost of delaying critical content. Each organization should have a font loading strategy based on its application needs. The new built-in CSS descriptor 'font-display' makes it very easy to choose one. We should start using it right away, even while browser support is still limited and an in-house implementation fills the gap.

Huge thanks to my colleague Raja Ramu for partnering on this effort and helping open-source the ebay-font module.

—  Senthil Padmanabhan

Dissect Helps Engineers Visualize and Debug Distributed Applications (September 19, 2017)
In a natural evolution from a services architecture, we at eBay have adopted microservices to help us drive faster and more productive development cycles. eBay's sophisticated application log infrastructure served us well for the most part, but with the move to microservices, we needed to build the next level of infrastructure to visualize and debug microservices and to achieve these goals:

  • Make debugging more efficient by reducing the time to troubleshoot and debug an API without having to understand all of its upstream and downstream services.
  • Provide clarity and transparency for API developers to understand callers and callees.
  • Treat instrumentation as a primary data point. Logs are great, but technically, instrumentation is a special kind of log.
  • Last but not least, provide the ability to query and visualize the service chain in near real time.

To help achieve these goals, we built Dissect. Dissect is an eBay distributed tracing solution, based on Google’s Dapper, that helps engineers visualize and debug distributed Java and Node.js applications. It identifies how to instrument services and which services to instrument.

How Dissect Works



  1. The Dissect instrumentation library intercepts the incoming request and instruments it.
  2. Dissect records the instrumented span data, based on sampling.
  3. Dissect provides a simple waterfall view to visualize the traces.
  4. The recorded traces can be analyzed to understand release-over-release performance changes, among other uses.


Dissect provides the following benefits:

  • Bottleneck Detection. Dissect helps us understand the depth of calls. On most occasions, a bottleneck is caused not by a single microservice but by multiple issues either up the chain or in downstream calls.
  • Developer Productivity. Dissect helps API developers produce a complete call graph and increases the room for optimizing the API.
  • Highly Scalable. Dissect supports eBay's large transaction volumes and scales to billions of requests for eBay's use cases.
  • Polyglot. Dissect includes SDKs for both Java and Node.js.

Concepts & Terminologies

The root of the transaction generates a CorrelationId, and every request generates a RequestId. The figure illustrates a sample request flow of distributed service chaining based on a user's activity on eBay.

  • CorrelationId – A unique 64-bit identifier that identifies the transaction and propagates across the whole service call chain.
  • RequestId – A unique 64-bit identifier generated and propagated for every request.
  • Response – A standard HTTP response with status and duration.


Dissect borrows ideas and inspiration from OpenTracing and follows its terminology:

  • A trace is a unit of work performed in a server-bound transaction. An HTTP request is a trace.
  • A span is a unit of work performed in the context of a trace; for example, an outbound HTTP client call made in the context of a trace is a span.

Data Construct

Trace and span are defined by this data construct:

| Name | Type | Description |
|---|---|---|
| Name | String | Name of the trace transaction. (Example: HTTP request path) |
| ID | UUID | Unique generated ID. (Example: 64-bit UUID) |
| TraceId | UUID | CorrelationId propagated across the call chain; originated when the transaction started. (Example: 64-bit UUID) |
| ParentId | UUID | ID of the parent transaction |
| Kind | String | Server or Client |
| Status | String | Status of the trace. (Example: 200 OK) |
| StartTime | Number | Start time of the trace, in milliseconds |
| Duration | Number | Time taken to complete the span transaction |
| edgeAttribute | Object | |
| edgeAttribute.Method | String | Type of the transaction. (Example: HTTP request method) |
| edgeAttribute.Caller | String | Caller or callee that originated the transaction. (Example: HTTP request connection address) |
| edgeAttribute.Attributes | Map<String, String> | Custom key/value map of string values. (Example: POOL, unique LOG ID that served the request) |
Example Dataset

The following example demonstrates how the TraceId, ParentId, and Id are propagated across different systems.

| id | traceId | parentId | Kind |
|---|---|---|---|
| 1 | 1 | – | Server |
| 2 | 1 | 1 | Client |
| 2 | 1 | 1 | Server |
| 3 | 1 | 2 | Client |
| 4 | 1 | 2 | Client |
| 3 | 1 | 2 | Server |
| 4 | 1 | 2 | Server |

The following figure illustrates the Dissect path across various distributed service nodes. Circle nodes are servers; square nodes are clients.


Custom Data Attributes

Along with the standard values described above, Dissect captures custom attributes for quicker debugging. These custom attributes provide enough information to quickly identify the source of a transaction. The following table lists the custom keys captured.

| Name | Description |
|---|---|
| POOL | eBay cloud deployment follows the concept of pools and instances. A pool is a cluster of instances serving an application across different regions. |
| Unique LOG ID | The Log ID represents a single unit of work. This Log ID is the primary index used to identify a request. |

Even though these examples are tailored to HTTP, the implementation is generic enough to adapt to RPCs. After all, it is a simple Java and Node.js API.


Dissect, by default, supports several sampling strategies:

  1. Sampling is implemented to collect sampled traces across the applications. Downstream applications can also set higher sampling rates to collect traces based on conditions like failures, status codes, and latency in milliseconds.
  2. Dissect also allows a sliding window to dissect requests for X minutes in every X hours, letting applications dissect requests automatically. This approach yields a representative sample rather than a fixed percentage.
  3. Finally, you can use a custom sampling strategy. At eBay we have developed extensible A/B strategies, and applications using an A/B strategy can also leverage it as a sampling strategy for Dissect.

Note: The sampling and experimentation platforms have been a part of the eBay platform for some years, and applications can sample based on any of the above strategies and use Dissect to trace the requests.
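As a hedged sketch of the first strategy above, conditional sampling can be layered on top of a base sampling rate. The function names and response shape here are illustrative; Dissect's actual sampler interfaces are not shown in this post:

```javascript
// Base sampler: keep roughly `rate` of all requests.
function makeProbabilisticSampler(rate) {
  return () => Math.random() < rate;
}

// Conditional sampler: always keep failures and slow requests,
// otherwise defer to the base sampler.
function makeConditionalSampler(base, { maxLatencyMs, errorStatuses }) {
  return (response) => {
    if (errorStatuses.has(response.status)) return true; // failures
    if (response.durationMs > maxLatencyMs) return true; // high latency
    return base(response);                               // base rate
  };
}
```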


Dissect provides a default set of SDKs for developers to quickly bootstrap integration with the Request Stack Trace.

Instrumentation SDKs

Framework Supported Model

The following SDKs support instrumenting your code with Trace. All the SDKs listed below support Java and Node.js implementations.

  • Interceptors/Filters
    • HTTP request interceptors or filters intercept all incoming HTTP requests and create a Trace.
  • Client Handler
    • The client handler wraps outbound HTTP client calls and creates a Span transaction in the Trace context.
    • The client handler also maintains the lifecycle of the Span.
  • JMX Interfaces
    • The Request Stack Trace provides support for on-demand sampling through JMX interfaces.
    • It also provides configuration to set up or update Reporters.

Collector aka Reporter SDKs

The collector, aka the Reporter SDK, is responsible for transporting the dissected requests and reporting them to the back-end infrastructure.

  • Dissect Reporters
    • A Reporter is an interface that ships the collected Traces out of the system to central storage. The default reporter is a simple log-line reporter.
    • Out of the box, Dissect supports Kafka messaging as the default reporter.
    • Dissect uses an Avro entity as the data format to transport data.
    • Elasticsearch is used as the warm datastore to ingest data consumed from the Kafka streams.

    The implementation at eBay uses Rheos for streaming and Pronto (scalable Elasticsearch) as the warm storage to query and aggregate results.

Visualizing Traces

We use Kibana dashboards to surface all the collected traces, and a simple waterfall view to visualize a selected Trace.

Aggregating Traces

Visualizing every single Trace is important, but it is not practical to drill through every single Trace. We are evaluating Druid to provide richer and faster aggregation across the collected Traces.

OpenSource Alternatives

When we evaluated open source alternatives for instrumenting the libraries, we considered how compelling each was to use and the polyglot support it offers:

| Alternative | Polyglot Support | Comments |
|---|---|---|
| Apache HTrace | No | Incubator status; supports only the JVM. |
| OpenTracing | Yes | Too new at the time this effort started. We are evaluating OpenTracing for the next iteration. |
| Spring Sleuth | No | Too tightly ingrained in the Spring ecosystem; Spring only. |
| Wingtips | No | Too new, and tailored only to HTTP request filters. |
| Zipkin | Yes | Promising, but most of the constructs are RPC-driven. Zipkin Brave was good, but we would have had to do lots of refactoring to adapt it to the existing spec. |

After evaluating the open source alternatives, we decided to go with our own implementation that suits the standard spec already defined and flowing through all the sub-systems. A few reasons we made the instrumentation libraries separate implementations:

  1. Instrumentation needed to conform to the eBay instrumentation spec.
    • We didn't want to adopt a library and try to retrofit it to the standard. Rather, we built a standard instrumentation library that adapts to the existing spec and data flow.
    • The instrumentation modules have eBay-internal monitoring and operability built in, as plugins that can be injected and operationalized into the eBay infrastructure.
  2. The ability to instrument based on a custom sampling strategy.
    • eBay uses sophisticated and robust A/B sampling strategies, and we wanted to leverage them by supplying one as a custom sampling strategy.
  3. Reporters.
    • Reporters need to emit enough monitoring and operability information within the eBay infrastructure. Dissect's internal reporters are built with the necessary hooks and are integrated with the eBay infrastructure.
  4. We wanted to take ideas from OpenTracing on semantics and other conventions, but OpenTracing was too new for us to pilot production-level applications with, and we didn't want external dependencies to derail our timelines.
  5. Lastly, building an instrumentation library is easy and quick as long as the standards are clearly defined.

Our goal in designing Dissect was to fill the service tracing need. eBay has sophisticated log, monitoring, and event systems; Dissect completes the missing piece of the puzzle, service tracing.

Status of Dissect

Dissect has been widely accepted as meeting a pressing need inside eBay. Currently, the checkout microservices have onboarded Dissect, and the teams are extremely happy with the results. Checkout is a super-critical flow for eBay, which underscores how important the product is.


Large projects and initiatives aren’t successful without the help of many people. Thanks to:


References

  1. Dapper
Introducing Regressr – An Open Source Command Line Tool to Regression Test HTTP Services (August 10, 2017)

In the Artificial Intelligence-Human Language Technologies team at eBay, we work on software that powers eBay's conversational bot, ShopBot. We ship software to production daily to make our bot smarter and more human. As a crucial part of this effort, we have to make sure any regressions are caught and fixed quickly, to keep our customers doing what they love – making purchases on ShopBot.

ShopBot’s backend is built on a polyglot suite of Scala, Java, and Python-based microservices that work in unison to provide ShopBot’s functionality. Hence, many of the crucial services need to be regression tested before we can release a new version to production.

To help with that effort, we built and are open sourcing our regression testing tool, Regressr.

Why Regressr

We looked at the common approaches that are widely used in the industry today to build an automated regression testing suite. In no particular order, they are listed below.

  • Comprehensive JUnit suite that calls two versions (old and new) of the service and compares the minutiae of the responses – JSON elements, their values and the like.
  • Using SOAP UI’s Test Runner to run functional tests and catch regressions as a result of catching functionality failures.
  • No regression tests. Wait for the front-end to fail as a result of front-end regression tests in dev or test, and trace the failure to the backend.

We also looked at Diffy and were inspired by how simple it was to use for catching regressions.

We had some very specific requirements for testing eBay ShopBot and found that none of these tools provided the features we wanted:

  1. Super-low ceremony: Must quickly be able to productionize and gain significant value without too much coding or process.
  2. Low conceptual surface area: An engineer should be able to grok what the tool does and use it quickly without going through bricks of manuals and frameworks.
  3. Configurability of comparisons: We want to be able to specify how the response should be compared. Do we want to ignore JSON values? Do we want to ignore certain elements? What about comparing floating-point numbers, precision, and so on?
  4. Compare at high levels of abstraction: We want to capture high-level metrics of the responses and then perform regression testing on them. For example, we would like to be able to say the number of search results in this response were 5 and then use that number to compare against future deployments.
  5. Low maintenance overhead: We want maintenance of the regression suite to have low or negligible coding effort. Once every deployment is approved for release, we just want the suite to automatically capture the current state of the deployment and use that as a reference for future deployments.
  6. CI/CD Integration: Finally, we wanted this suite to be hooked into our CI/CD build.

We built Regressr specifically to solve these requirements, so that the team can focus on the important stuff, which is serving great experiences and value to our customers who use ShopBot.
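Requirement 4 above can be illustrated with a small sketch: instead of diffing raw payloads, record a derived metric (here, a hypothetical search-results count) and compare that across deployments. This is a generic illustration, not Regressr's actual component API:

```javascript
// Record a high-level metric from a response body instead of the raw JSON.
// The `results` field is a hypothetical response shape.
function recordSearchMetrics(responseBody) {
  const body = JSON.parse(responseBody);
  return { resultCount: (body.results || []).length };
}

// Compare the baseline metric against the current deployment's metric.
// An empty message list means no regression was detected.
function compareMetrics(baseline, current) {
  const messages = [];
  if (baseline.resultCount !== current.resultCount) {
    messages.push('resultCount changed: ' + baseline.resultCount + ' -> ' + current.resultCount);
  }
  return messages;
}
```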

Regressr is a Scala-based command line tool that tests HTTP services, plain and simple. We built it to be really good at what it does. With Regressr, you can use the out-of-the-box components to get a basic regression test for your service up and running quickly and gain instantaneous value, while coding regression tests that will cover close to 100% of the functionality in a more delayed fashion as time permits. Finally, Regressr doesn’t even need the two services to be up and running at the same time, as it uses a datastore to capture the detail of the baseline.

Regressr works in two modes:

  1. Record – Use Record when you want to capture the current state of a deployment to be compared as the baseline for later deployments. A strategy file is specified that contains the specifics of what needs to be recorded.
  2. Compare/Replay – Compares the current state of a deployment with a baseline and generates a comparison report.

The image below captures what is done in these two flows.

The Strategy File

The strategy file is the configuration that drives what happens during a record and a compareWith execution.

An example strategy file that issues three requests (two of them POSTs) and performs regression testing is specified below:

  baseURL       : http://localhost:9882/endpoint

  Content-Type    : application/json


  - requestName: say_hello
    path: /say_hello
    method: GET
    recorder: org.ebayopensource.regression.internal.components.recorder.SimpleHTTPJSONRecorder
    comparator: org.ebayopensource.regression.internal.components.comparator.SimpleHTTPJsonComparator

  - requestName: shop for a pair of shoes
    path: /shopping
    method: POST
    requestBuilder: org.ebayopensource.regression.example.ExampleRequestBuilder
      conversationId     : 12345
      keyword            : Show me a pair of shoes
      mission_start      : yes
    recorder: org.ebayopensource.regression.internal.components.recorder.SimpleHTTPJSONRecorder
    comparator: org.ebayopensource.regression.internal.components.comparator.SimpleHTTPJsonComparator

  - requestName: say goodbye
    path: /goodbye
    method: POST
    requestBuilder: org.ebayopensource.regression.internal.components.requestBuilder.StringHTTPBuilder
      payload            : '{"mission" : "12345", "keyword" : "good bye", "mission_start" : "no" }'
    recorder: org.ebayopensource.regression.internal.components.recorder.SimpleHTTPJSONRecorder
    comparator: org.ebayopensource.regression.internal.components.comparator.SimpleHTTPJsonComparator

The Components

The important parts of the strategy file are the different components, RequestBuilder, Recorder, and Comparator.

RequestBuilder is used to specify how the request should be built in case of a POST or a PUT request.

The interface for RequestBuilder accepts a Map of Strings and returns the payload that will be sent in the request:

abstract class RequestPayloadBuilder {

  def buildRequest(dataInput: Map[String, String]): Try[String]

}

Recorder is used to specify which parts of the response should be recorded for future comparison. Regressr injects all parts of the response into the Recorder at this time.

The interface for Recorder accepts a list of HTTPResponses (most of the time this will be just one) and returns a RequestRecordingEntry.

The RequestRecordingEntry is a holder for a value that will be recorded in Regressr's datastore. A response code can be stored in a RequestRecordingEntry; so can a JSON response. You can also do some computation on the JSON and store a number (like the number of search results).

The interface for Recorder looks like this:

protected def record(responses: Seq[HTTPResponse]) : Try[RequestRecordingEntry]

Finally, the Comparator is used to specify the details of comparison during the compareWith mode: how do you want to compare JSON documents? What about strings?

The interface for Comparator looks like this. It accepts both the recorded RequestRecordingEntry and the current one, and returns a list of CompareMessages that will be included in the comparison report:

abstract class Comparator {

  def compare(recorded: RequestRecordingEntry, replayed: RequestRecordingEntry): Try[Seq[CompareMessage]]

}

Regressr comes with out-of-the-box components that can be plugged in to provide significant value instantaneously for many common types of services. However, you can write your own components implementing these interfaces and include them in Regressr (use ./regressr.sh -r to build everything).

The comparison report is generated at the end of the compareWith lifecycle and looks like this:

Testing HATEOAS services

HATEOAS (Hypermedia As The Engine Of Application State) is a style that some classes of RESTful services tend to gravitate toward, especially when there are lightweight GUIs in front of them that mimic the conversation happening with the service. Regressr also supports simple and efficient breadth-first traversal of HATEOAS resources for regression testing.

We support this through the use of a new component class called Continuations.

Let’s imagine you have a shopping cart service exposed at a URL such as /shoppingCart/items.

When issued a GET request on this URL, a service modeled on HATEOAS principles returns results similar to:

    "items": [
        "item1": "http://<host>/shoppingCart/items/<item-id>/",
        "item2": "http://<host>/shoppingCart/items/<item-id>/",
        "item3": "http://<host>/shoppingCart/items/<item-id>/"

As you can imagine, these new requests are non-deterministic and cannot be modeled with the help of Regressr’s configuration files, because the data may change over time.

That is where Continuations come in. With continuations, the tester can specify how many new requests should be created programmatically based on the response of a previous service call.

This allows the tester to write a continuation generically that creates new requests based on how many items were present in the response of the /items call.
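Conceptually, a continuation derives follow-up requests from a previous response at run time. A generic sketch of that idea (not Regressr's actual Scala Continuation interface, and with an illustrative request shape) might look like:

```javascript
// Given the /shoppingCart/items response body, produce one follow-up GET
// request per item link found in the response.
function continueFromItems(responseBody) {
  const items = JSON.parse(responseBody).items;
  return Object.entries(items).map(([name, url]) => ({
    requestName: 'get ' + name,
    method: 'GET',
    url: url,
  }));
}
```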

An example of continuations is here.

What’s Next

  1. A Maven plugin for Regressr that can be used in a CI/CD build.
  2. A Jenkins plugin for the Regressr report.
  3. Global comparators that can capture global metrics across requests and compare them.

Conclusion and Credits

We have found Regressr to be a very useful regression testing tool for lean and low ceremony engineering teams that wish to minimize effort when it comes to regression testing of their services.

There were many people involved in the design, build, and testing of Regressr, without whom this could not have been possible. Recognizing them: Ashwanth Fernando, Alex Zhang, Robert Enyedi, Ajinkya Kale, and our director Amit Srivastava.

Comments and PRs are welcomed with open arms at https://github.com/eBay/regressr.

Cube Planner – Build an Apache Kylin OLAP Cube Efficiently and Intelligently (August 2, 2017)
Life is about carefully calculating daily necessities and the same is true for technology. Frugal people spend money on things that are needed most, while programmers are always seeking to reduce the resources cost of their code. Cube Planner is a new feature created by eBay’s programmers that helps you spend resources on building cost-effective dimension combinations.


Apache Kylin is an open source distributed analytics engine designed to provide multi-dimensional analysis (OLAP) on Hadoop. It was originally contributed by eBay to the Apache Software Foundation in 2014 and became an Apache top-level project in 2015. With the rapid development of its community since 2015, Apache Kylin has been deployed by many companies worldwide.

There has also been a crescendo of data analysis applications deployed on the Apache Kylin platform within eBay. There are currently 144 cubes in eBay's production environment, and this number continues to grow by an average of 10 per month. The largest cube is about 100 TB. The query volume is between 40,000 and 50,000 per weekday, and over the last month the average query latency was 0.49 seconds.

Why is Cube Planner needed?

In 2016, eBay's Data Services and Solutions (DSS) department began to introduce self-service reports, where data analysts drag and drop dimensions and measures to create their own reports. But a self-service report generally has many dimensions and measures, and it is difficult to define the query pattern in advance, which requires Kylin to pre-calculate a large number of dimension combinations.

In addition, in companies at a certain scale, data providers and data users don't belong to the same department, and data providers may not be particularly familiar with the query patterns, which leads to a gap between the cube design and actual use. Some combinations of dimensions are pre-calculated but never queried, while others are often queried but not pre-calculated. These gaps affect the resource utilization and query performance of a cube.

We decided to make a cube build planner, Cube Planner, with the goal of efficiently using computing and storage resources to build cost-effective dimension combinations intelligently.

What is Cube Planner?

Cube Planner checks the costs and benefits of each dimension combination, and selects cost-effective dimension combination sets to improve cube build efficiency and query performance. The Cube Planner recommendation algorithm can be run multiple times throughout the life cycle of the cube. For example, after the cube is created for the first time, or after a certain amount of query statistics is collected, the cube is periodically optimized.

Definitions and Formulas

Here are some definitions and formulas used in the Cube Planner algorithm.

  • Dimensions and Measures are the basic concepts of data warehousing.
  • A dimension combination (Cuboid) refers to any combination of the data's dimensions. If the data has three dimensions (time, location, category), then it has eight dimension combinations: [ ], [time], [location], [category], [time, location], [time, category], [location, category], [time, location, category].
  • The basic dimension combination (Basic Cuboid) is the combination of all dimensions and carries the most detailed aggregation information, such as the combination above: [time, location, category].
  • The symbol ≼ represents a parent-child relationship between dimension combinations.
    w ≼ i indicates that a query involving dimension combination w can be answered by aggregating the data of dimension combination i; w is the child combination and i is the parent combination. For example, dimension combination w [time, category] is a child of dimension combination i [time, location, category]. Each parent combination may have multiple child combinations, and the basic dimension combination is the parent of all other dimension combinations.
  • Cost of a dimension combination
    The cost of a dimension combination has two aspects: the build cost, defined by the combination's own row count, and the query cost, defined by the number of rows scanned by a query. Since a build is one-time while queries are repetitive, we ignore the build cost and use only the query cost as the cost of a dimension combination: C(i) = cost of dimension combination i.
  • Benefit of a dimension combination
    The benefit of a dimension combination can be understood as reduced cost: the total query cost on this cube that can be saved by pre-calculating this dimension combination. B(i, S) is the benefit of adding dimension combination i to the dimension combination set S, calculated by summing the benefits over all children of i: B(i, S) = ∑w≼i Bw
    where the benefit of each child is Bw = max{ C(w) – C(i), 0 }
  • Benefit ratio of a dimension combination
    Dimension combination benefit ratio = (benefit of the dimension combination / cost of the dimension combination) * 100%:
    Br(i, S) = (B(i, S) / C(i)) * 100%
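The definitions above can be sketched in a few lines of Python. This is an illustrative model only, not Kylin's code: cuboids are represented as frozensets of dimension names, and a hypothetical `rows` mapping gives each cuboid's row count.

```python
from itertools import combinations

def all_cuboids(dimensions):
    """Enumerate every dimension combination (cuboid), including the empty one."""
    dims = list(dimensions)
    return [frozenset(c) for r in range(len(dims) + 1)
            for c in combinations(dims, r)]

def is_child(w, i):
    """w ≼ i: a query on w can be aggregated from the data of i."""
    return w <= i  # frozenset subset test

def cost(cuboid, rows):
    """C(i): query cost, modeled as the number of rows scanned."""
    return rows[cuboid]

def benefit(i, selected, candidates, rows):
    """B(i, S): total query-cost reduction if cuboid i is added to set S."""
    total = 0
    for w in candidates:
        if is_child(w, i):
            # current cheapest way to answer w: the smallest selected parent
            current = min(rows[p] for p in selected if is_child(w, p))
            total += max(current - rows[i], 0)  # Bw = max{C(w) - C(i), 0}
    return total

def benefit_ratio(i, selected, candidates, rows):
    """Br(i, S) = B(i, S) / C(i)."""
    return benefit(i, selected, candidates, rows) / cost(i, rows)
```

With three dimensions this enumerates the eight cuboids listed earlier; the benefit of a candidate is the sum of savings over all of its children that currently fall back to a larger parent.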

How to choose a cost-effective dimension combination?

We use a greedy algorithm to choose cost-effective dimension combinations, and the selection process is iterative. In each iteration, the system goes through the candidate set, selects the dimension combination with the highest benefit ratio at the current state, and adds it to the result set of combinations to be built.

Let’s now look at a simple example to understand the process of the iterative selection.

Cube Planner Select a dimension combination process example

Let’s assume there is a cube that has the following dimension combination.

Dimension Combination ID Dimension
a time, place, category
b time, place
c time, category

The parent-child relationship between each dimension combination is shown in the following figure. Each dimension combination has a set of numbers (i, j) in addition to ID.

i: the number of rows in this dimension combination

j: the cost of querying the dimension combination

In the following figure, dimension combination a is the basic dimension combination (Basic Cuboid), and its data has 100 rows. In a Kylin cube, the basic dimension combination is pre-calculated by default, so initially the result set contains only the basic dimension combination. Therefore, queries on other dimension combinations need to be answered by scanning the basic dimension combination's data and then aggregating, so the cost of these queries is its row count, 100.

First iteration of Cube Planner greedy algorithm

The green dimension combinations are those already selected into the result set during previous rounds, initially only the basic dimension combination. The red ones are the candidates that may be selected into the result set during this round. The blue ones are the combinations that are selected and added to the result set in this round.

What is the benefit of selecting dimension combination b, that is, the reduced cost for the entire cube? If b is selected, the query cost of b and its children (d, e, g, h) drops from 100 to 50, so the benefit ratio of dimension combination b is (100 – 50) * 5 / 50 = 5.
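The arithmetic can be checked directly, using the figure's values (b and its four children each drop from a 100-row scan to a 50-row scan):

```python
rows_saved_per_query = 100 - 50   # cost falls from the basic cuboid to b
affected = 5                      # b itself plus its four children d, e, g, h
cost_of_b = 50                    # row count of b, i.e. C(b)
ratio = rows_saved_per_query * affected / cost_of_b
assert ratio == 5.0
```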

If you choose dimension combination c, the benefit ratio is 1.67.

If other dimension combinations are selected, their benefit ratios are as follows. Obviously, g is the combination with the highest benefit ratio, so dimension combination g is added to the result set during this round.

Second round selection of the Cube Planner

At this point, the result set has become (a, g). In this state, if b is selected, although the children of b are d, e, g, and h, only queries over b, d, and e will be affected, because g has already been pre-calculated and has a smaller row count than b. When a query against dimension combination g arrives, the system will still choose g to answer it. This is exactly why the benefit of each child combination mentioned above is Bw = max{ C(w) – C(i), 0 }. So the benefit ratio of combination b is now 4.

If you pick the combination c, the benefit ratio is 1.34.

The benefit ratios of the other combinations are as follows. Combination h is eventually selected and added to the result set during this round.

When do we stop the iterative selection?

When any one of the following three conditions is fulfilled, the system will stop the selection:

  • The expansion rate of the dimension combinations in the result set reaches the cube's configured expansion rate.
  • The benefit ratio of the dimension combination selected this round is lower than the configured minimum. This indicates that the newly added combination is not cost-effective, so there is no need to select more dimension combinations.
  • The run time of the selection algorithm reaches the configured ceiling.

A Cube Planner algorithm based on weighted query statistics

In the previous algorithm, all dimension combinations carry the same weight when calculating the benefit that one dimension combination brings to the entire cube. But in actual applications, the probability of querying each dimension combination varies. Therefore, when calculating the benefit, we use the query probability as the weight of each dimension combination.

Weighted dimension combination benefit: WB(i, S) = ∑w≼i (Bw * Pw)

Pw is the probability of querying dimension combination w. For dimension combinations that have never been queried, we set the probability to a small non-zero value to avoid overfitting to the observed workload.

For a newly created cube, the probabilities are set to be equal because there are no query statistics in the system yet. There is room for improvement here: in the future, we want to use platform-level statistics and data-mining algorithms to estimate the query probabilities of a new cube's dimension combinations. For cubes that are already online, we collect query statistics, and it is easy to calculate the probability of querying each dimension combination.

In fact, the real algorithm based on weighted query statistics is more complex than this, and many practical problems need to be solved. For example, how do we estimate the cost of a dimension combination that is not pre-calculated, given that Kylin only stores statistics for the pre-calculated combinations?

Greedy algorithm vs. genetic algorithm

Greedy algorithms have limitations when there are too many dimensions: the candidate dimension combination set becomes very large, and since the algorithm must go through every combination, it takes a long time to run. Thus, we implemented a genetic algorithm as an alternative. It uses the iterative, evolutionary model of genes to simulate selection of the dimension combination set; through a number of iterations, a near-optimal set of dimension combinations is evolved (selected). We won't go into the details here; if readers are interested, we can write a detailed description of the genetic algorithm used in Cube Planner.

Cube Planner one-click optimization

For a cube that is already in production, Cube Planner uses the collected query statistics to intelligently recommend a dimension combination set for pre-calculation based on the cost-benefit algorithm, and users can apply the recommendation with one click. The following figure shows the one-click optimization process. We use two jobs, Optimizejob and Checkpointjob, both to let the optimization run concurrently and to guarantee the atomicity of the data switch after optimization. To save resources, Optimizejob does not recalculate all the recommended dimension combinations; it computes only the combinations that are new in the recommended set. If some recommended dimension combinations are already pre-calculated in the cube, we reuse them.
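The reuse described above boils down to set arithmetic: only recommended cuboids that are not already materialized need to be computed. A hypothetical sketch, not Kylin's actual job code:

```python
def plan_optimize_job(existing, recommended):
    """Split the recommended cuboid set into reused, newly built, and dropped parts."""
    reuse = existing & recommended   # already pre-calculated: copied as-is
    build = recommended - existing   # only these are computed by the optimize job
    drop = existing - recommended    # no longer cost-effective: discarded
    return reuse, build, drop
```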

Application of Cube Planner in eBay

Cube Planner's development was largely completed at the end of 2016. After extensive testing, it was formally launched in the production environment in February 2017, where it has been highly praised and widely adopted by users. Here is an example of Cube Planner in use on an eBay production cube: the customer optimized the cube on April 19th, and the optimized cube not only built significantly faster but also showed significantly better query performance.

Follow-up development of Cube Planner

In addition to improving resource utilization, Cube Planner has another goal: reducing cube design complexity. At present, designing a Kylin cube depends on an understanding of the data. Cube Planner lets Kylin recommend and optimize based on data distribution and data usage, but this is not enough. We want to turn Kylin cube design into a smart recommendation system based on query statistics across cubes within the same domain.

eBay’s New Homepage http://www.ebaytechblog.com/2017/07/26/ebays-new-homepage/ Wed, 26 Jul 2017 16:53:36 +0000 http://www.ebaytechblog.com/?p=7486
Continue Reading »

Building and releasing a modern homepage for eBay's hundreds of millions of annual users and tens of millions of daily users was the team's greatest challenge to date. Our mission was to develop a personalized buying experience by mashing up the best merchandised offers with tailored ads, topped off with the hottest trends and relevant curated content. Any such virtual front door had to be robust to change, easily configurable for any site, and really, really fast to load!

There’s no second chance to make a first impression, so we had to make it count. But how quickly can you go from 0 to 100 (100% deployed for 100% of traffic to the homepage)?

The team commenced this project in late 2016 and rushed to get an early flavor available for an internal beta in mid-February. Traffic was then slowly ramped up until reaching 100% for the Big 4 sites (US, UK, DE, AU), delivered less than three months after the beta, while we simultaneously added more scope and fresh content to the sites.

Our new homepage consists of three main components, spread across three large pools:

  • Node Application – The homepage front-end web server handles incoming user requests and acts as the presentation layer for the application, boasting responsive design.
  • Experience Service – The experience service tier is the primary application server that interacts with the page template and customization service and orchestrates the request handling to various data providers. It performs response post-processing that includes tracking. This layer returns a render-ready format that is consumable by all screen types. The experience layer is invoked by Node for desktop and mobile web traffic and directly by the native applications.
  • Data Provider Service – The data provider service is a facade that massages responses from datastores into a semantic format. Providers such as Merchandise, Ads, Marketing, Collections, and Trends interface with this service. JSON transformation, done using Jolt, produces the response datagrams.

Additional auxiliary components keep the system configurable and make it simple to reorganize or publish new content to the site within seconds.

The Experience service was built on a shared platform, called Answers, developed jointly with the Browse and Search domains.

Due to the hierarchical nature of the system, its complex composition (16 unique modules), and its multiple dependencies (9 different services, to be exact), special consideration had to be given to the rollout and deployment process.

We adopted a bottom-up strategy, deploying in the reverse order of dependencies: datastores first, then data providers, then experience services, and lastly the Node application server. This approach helped us surface issues early in the process and tackle misalignment effectively.

All service-to-service calls are made via clearly defined, internally reviewed APIs. Interface changes are only allowed to be additive to maintain backwards compatibility. This is essential for the incremental bottom-up rollout scheme.

When developing and deploying a new build, as when delicately changing a layer in a house of cards, one must act with great care and much thought about the inherent risks. A "mock-first" test strategy served as a working proof of concept and allowed testing to begin early, with caching techniques later taking the place of those same mocks in the final product. Unit tests, hand in hand with functional and nonfunctional tests, were done with rigor to make regression testing easy, fast, and resilient to change.

The team always deployed to Staging and Pre-production before conducting any smoke-testing on three boxes in Production, one VM instance per data center. As a result, specific networking issues were exposed and quickly remediated.

Some engineering challenges the team faced had to do with Experimentation Platform complexities, where a selective pool-channel redirect showed unexpected behavior with traffic rerouting while gradually diverting traffic from the current feed-based homepage to the new carousel-based one.

Internal beta testing by Buyer Experience teams, and external in-the-wild testing by a crowd-testing company, took place under a zero-traffic experiment. This was followed by a slow 1% traffic ramp to help control the increments, with gradual 5% and 25% steps introduced to the pools, spanning days to weeks between phases. Each ramp involved multiple daily load-and-performance cycles while fine-tuning timeout and retry variables.

To summarize our key learnings from the homepage experience:

  • Relying on clearly-defined APIs was vital in achieving an iterative, incremental development and rollout strategy.
  • Co-evolving our logging and monitoring infrastructure and the team’s debugging expertise was essential to achieving an aggressive rollout approach on a distributed stack.
  • Tracking and Experimentation setups, as well as baking in Accessibility, proved to be major variables that should get well-scoped into a product’s development lifecycle and launch timelines.
  • Introducing gradual functional increments along with a design for testability in mind proved to be a valuable quality control measure and allowed absorbing change while moving forward.
  • A meticulous development and release process kept the associated risks and mitigation actions under control, allowing the team to meet its milestones as defined and on time.
Enhancing the User Experience of the Hadoop Ecosystem http://www.ebaytechblog.com/2017/05/12/enhancing-the-user-experience-of-the-hadoop-ecosystem/ http://www.ebaytechblog.com/2017/05/12/enhancing-the-user-experience-of-the-hadoop-ecosystem/#comments Fri, 12 May 2017 19:24:55 +0000 http://www.ebaytechblog.com/?p=7223
Continue Reading »

At eBay, we have multiple large, multi-tenant clusters. Each of these clusters stores hundreds of petabytes of data and offers tens of thousands of cores to run computations on that data. We have thousands of internal users who use Hadoop in their roles, including data analysts, data scientists, engineers, and product managers. They use multiple technologies like MapReduce, Hive, and Spark to process data, and thousands of applications push and pull data from Hadoop and run computations.

Figure 1: Hadoop clusters, auxiliary components, users, applications, and services


Users normally interact with a cluster via the command line, SSHing to specialized gateway machines that reside in the same network zone as the cluster. To transfer job files and scripts, they need to SCP over multiple hops.

Figure 2: Old way of accessing a Hadoop cluster

The need to traverse multiple hops as well as the command-line-only usage was a major hindrance to the productivity of our data users.

On the other side, our website applications and services need to access data and perform computation. These applications and services reside in a different network zone and hence need network rules set up to access services like HDFS, YARN, and Oozie. Since our clusters are secured with Kerberos, the applications must be able to use Kerberos to authenticate to the Hadoop services. This placed an extra burden on our application developers.

In this post, I will share the work in progress to facilitate access to our Hadoop clusters for data and compute resources by users and applications.


We need better ways to achieve the following goals:

  • Our engineers and other users need to use multiple clusters and related components.
  • Data Analysts and other users need to run interactive queries and create shareable reports.
  • Developers need to be able to develop applications and services without spending time on connectivity problems or Kerberos authentication.
  • We can afford no compromise on security.


To improve user experience and productivity, we added three open-source components:

Hue — to perform operations on Hadoop and related components.

Apache Zeppelin — to develop interactive notebooks with queries, programs, and reports.

Apache Knox — to serve as a single point for applications to access HDFS, Oozie, and other Hadoop services.

Figure 3: Enhanced user experience with Hue, Zeppelin, and Knox

We will describe each product, the main use cases, a list of our customizations, and the architecture.


Hue is a user interface to the Hadoop ecosystem. It provides user interfaces to several components including HDFS, Oozie, Resource Manager, Hive, and HBase. It is a 100% open-source product, actively supported by Cloudera, and stored at the Hue GitHub site.

Similar products

Apache Airflow allows users to specify workflows in Python. Since we did not want a Python learning curve for our users, we chose Hue instead of Airflow. But we may find Airflow compelling enough to deploy it in future so that it can be used by people who prefer Airflow.

Use cases of Hue

Hue allows a user to work with multiple components of the Hadoop ecosystem. A few common use cases are listed below:

  • To browse, manage, upload, and download HDFS files and directories
  • To specify workflows comprising MapReduce, Hive, Pig, Spark, Java, and shell actions
  • Schedule workflows and track SLAs
  • To manage Hive metadata, run Hive queries, and share the queries with other users
  • To manage HBase metadata and interact with HBase
  • To view YARN applications and terminate applications if needed


Two-factor authentication — To ensure that the same security level is maintained as with command-line access, we needed to integrate our custom SAML-based two-factor authentication into Hue. Hue supports plugging in new authentication mechanisms, which allowed us to plug in our two-factor authentication.

Ability to impersonate other users — At eBay, users sometimes operate on behalf of a team account. We added a capability in Hue so that users can impersonate another account as long as they are authorized to do so. The authorization is controlled by LDAP group memberships, and users can switch back and forth between multiple accounts.
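The authorization check described above amounts to a group-membership lookup. A hedged sketch only: the group-naming convention and helper below are hypothetical, not Hue's or eBay's actual code.

```python
def can_impersonate(user, team_account, ldap_groups):
    """Allow impersonation only if the user belongs to the LDAP group that
    authorizes the team account (assumed naming scheme: '<account>-members')."""
    return f"{team_account}-members" in ldap_groups.get(user, set())
```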

Working with multiple clusters — Since we have multiple clusters, we wanted a single Hue instance to serve multiple Hadoop clusters and components. This enhancement required changes in the HDFS File Browser, Job Browser, Hive Metastore managers, Hive query editors, and workflow submission.


Figure 4: Hue architecture at eBay


A lot of our users, especially data scientists, want to run interactive queries on the data stored on Hadoop clusters. They run one query, check its results, and, based on the results, form the next query. Big data frameworks like Spark, Presto, Kylin, and to some extent, HiveServer2 provide this kind of interactive query support.

Apache Zeppelin (GitHub repo) is a user interface that integrates well with products like Spark, Presto, and Kylin, among others. In addition, Zeppelin provides an interface where users can develop data notebooks. The notebooks can express data processing logic in SQL, Scala, Python, or R. Zeppelin also supports data visualization in notebooks in the form of tables and charts.

Zeppelin is an Apache project and is 100% open source.

Use cases

Zeppelin allows a user to develop visually appealing interactive notebooks using multiple components of the Hadoop ecosystem. A few common use cases are listed below:

  • Run a quick Select statement on a Hive table using Presto.
  • Develop a report based on a dataset by reading files from HDFS and persisting them in memory as Spark data frames.
  • Create an interactive dashboard that allows users to search through a specific set of log files with custom format and schema.
  • Inspect the schema of a Hive table.


Two-factor authentication — To maintain security parity with command-line access, we plugged our custom two-factor authentication mechanism into Zeppelin. Zeppelin uses Shiro for security, and Shiro allows one to plug in a custom authentication mechanism, albeit with some difficulty.

Support for multiple clusters — We have multiple clusters and multiple instances of components like Hive. To support multiple instances in one Zeppelin server, we created different interpreters for different clusters or server instances.

Capability to override interpreter settings at the user level — Some interpreter settings, such as job queues and memory values, need to be customized by users for their specific use cases. To support this, we added a feature in Zeppelin so that users can override certain interpreter settings by setting properties. This is described in detail in the Apache JIRA ticket ZEPPELIN-1625.


Figure 5: Zeppelin Architecture at eBay


Apache Knox (GitHub repo) is an HTTP reverse proxy that provides a single endpoint for applications to invoke Hadoop operations. It supports multiple clusters and multiple components like WebHDFS, Oozie, and WebHCat. It can also support multiple authentication mechanisms, so we can hook up custom authentication alongside Kerberos authentication.

It is an Apache top-level project and is 100% open source.

Use cases

Knox allows an application to talk to multiple Hadoop clusters and related components through a single entry point using any application-friendly non-Kerberos authentication mechanism. A few common use cases are listed below:

  • To authenticate using an application token and put/get files to/from HDFS on a specific cluster
  • To authenticate using an application token and trigger an Oozie job
  • To run a Hive script using WebHCat
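For illustration, the first use case maps onto Knox's standard URL layout, where each cluster's WebHDFS endpoint is exposed under a topology path; the gateway host, port, and topology name below are placeholders.

```python
def webhdfs_url(gateway, topology, path, op):
    """Build a Knox gateway URL for a WebHDFS operation.

    Knox proxies WebHDFS under /gateway/<topology>/webhdfs/v1, so the same
    client code can target different clusters by switching the topology name.
    """
    return (f"https://{gateway}:8443/gateway/{topology}"
            f"/webhdfs/v1{path}?op={op}")

# e.g. create a file on the cluster behind a hypothetical 'prod' topology:
# requests.put(webhdfs_url("knox.example.com", "prod", "/tmp/a.txt", "CREATE"),
#              headers={"Authorization": "Bearer <application token>"})
```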


Authentication using application tokens — The applications and services in the eBay backend use a custom token-based authentication mechanism. To take advantage of the existing application credentials, we enhanced Knox to support our application token-based authentication mechanism in addition to Kerberos. Knox utilizes the Hadoop Authentication framework, which is flexible enough to plug in new authentication mechanisms. The steps to plug an authentication mechanism into Hadoop's HTTP interface are described in Multiple Authentication Mechanisms for Hadoop Web Interfaces.


Figure 6: Knox Architecture at eBay


In this blog post, we described the approach taken to improve user experience and developer productivity in using our multiple Hadoop clusters and related components. We illustrated the use of three open-source products, Hue, Zeppelin, and Knox, to make our Hadoop users' lives a lot simpler. We evaluated these products, customized them for eBay's purposes, and made them available for our users to carry out their projects efficiently.

A Creative Visualization of OLAP Cuboids http://www.ebaytechblog.com/2017/05/09/a-creative-visualization-of-olap-cuboids/ Tue, 09 May 2017 18:50:16 +0000 http://www.ebaytechblog.com/?p=6994
Continue Reading »

eBay is one of the world’s largest and most vibrant marketplaces with 1.1 billion live listings every day, 169 million active buyers, and trillions of rows of datasets ranging from terabytes to petabytes. Analyzing such volumes of data required eBay’s Analytics Data Infrastructure (ADI) team to create a fast data analytics platform for this big data using Apache Kylin, which is an open-source Distributed Analytics Engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets.

Apache Kylin creatively applies data warehouse OLAP technology to the Hadoop domain, which makes queries return within milliseconds to seconds against petabyte-scale datasets. The magic of Apache Kylin is that it pre-calculates metrics against defined dimensions. When a query is launched, it doesn't need to scan petabytes of source data; instead, it scans the pre-calculated metrics, which are much smaller than the source data, accelerating query speed.

Currently, hundreds of cubes run in eBay's Kylin production environment, serving dozens of business domains such as Inventory Health Analysis, User Behavior Analysis, eBay APIs Usage Analysis, and eBay Business Partner Channel Performance Analysis.

This post showcases the creative visualization of OLAP cuboids that has been implemented in the Cube Planner feature built on Apache Kylin by eBay’s ADI team to solve the challenge of showing huge OLAP cuboids in a fixed space. To better understand the challenge as well as the value of the visualization of sunburst charts that have been introduced into OLAP cuboids, some basic concepts need to be covered.

Basic Concepts

An OLAP cube is a term that typically refers to multi-dimensional array of data [1][2]. OLAP is an acronym for online analytical processing [3], which is a computer-based technique of analyzing data to look for insights. The term cube here refers to a multi-dimensional dataset, which is also sometimes called a hypercube if the number of dimensions is greater than 3.

A cuboid is one combination of dimensions.

For example, if a cube has 4 dimensions — time, item, location, and supplier — it has 16 cuboids, as shown here.

A basic cuboid has the most detailed data short of the source data itself; it is composed of all dimensions, like (time, item, location, supplier), and it can be rolled up to all the other cuboids. For example, a user can roll up the basic cuboid (time, item, location, supplier) along the dimension "supplier" to the cuboid (time, item, location). In this case, the basic cuboid is the parent cuboid, and the 3-D cuboid (time, item, location) is a child cuboid.

OLAP Cuboids Visualization Challenges

The OLAP cuboids visualization has the following characteristics:

  • All the cuboids have a root parent — the basic cuboid.
  • The relationship “rollup to” between two cuboids is directed, from parent to child.
  • The relationship “rollup to” is m:n mapping. One parent cuboid can be rolled up to multiple child cuboids, while one child cuboid can be rolled up from multiple parent cuboids.

So the visualization of cuboids is essentially a directed graph. But in the real OLAP world, not all the relationships are kept: the m:n mappings are simplified to 1:n mappings, so every child cuboid has just one parent. We usually keep the relationship with the parent that has the lowest row count and eliminate the others, because the lowest parent row count means the lowest cost of aggregating to the child cuboid. Hence, the visualization of cuboids simplifies to a tree with the basic cuboid as the root, as shown below.
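The simplification just described, keeping only the lowest-row-count parent for each cuboid, can be sketched as follows. An illustrative model only: cuboids are frozensets of dimensions and `rows` maps each cuboid to its row count.

```python
def build_cuboid_tree(cuboids, rows):
    """Reduce the m:n 'rollup to' graph to a tree: each cuboid keeps the single
    parent (a strict superset of its dimensions) with the lowest row count,
    i.e. the cheapest cuboid to aggregate from."""
    parent = {}
    for child in cuboids:
        parents = [p for p in cuboids if child < p]  # strict supersets
        if parents:  # the basic cuboid has no parent: it becomes the root
            parent[child] = min(parents, key=lambda p: rows[p])
    return parent
```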

Even with the simplified relationships between cuboids, there can still be some challenges to cuboids layout with a tree:

  • The tree must be collapsed to fit in a fixed space.
  • It is impossible to have an overall view of all cuboids.
  • Multiple clicks are needed to reach a target node from the root, layer by layer.
  • It is hard to get a complete view of all the child cuboids of a given cuboid.

Cuboid Visualization Solution in Cube Planner

Cube Planner makes OLAP cubes more resource-efficient. It intelligently builds a partial cube to minimize the cost of building a cube and at the same time maximizes the benefit of serving end user queries. Besides, it learns patterns from queries at runtime and dynamically recommends cuboids accordingly.

In Cube Planner, we want to show query usage down to the cuboid level, giving the cube owner more insight into their cubes. We want a color-coded heat map with all cuboids in one view to give the cube owner an overall feel for the cube design. Furthermore, when the user hovers over a cuboid, details of that individual cuboid (cuboid ID, query count, row count, rollup rate, etc.) are displayed. We also recommend a new cube design with recommended cuboids based on the query statistics, so we need to put the old and new cuboids together on one page to show the benefit of cube optimization.

We could not find any tree or directed-graph component that met all the requirements above. Luckily, our GUI engineer discovered a way to produce sunburst charts, which meet our expectations very well.

What is a Sunburst Chart?

Our sunburst charts are created with Angular-nvD3, an AngularJS directive for NVD3 re-usable charting library (based on D3). Users can easily customize their charts via a JSON API. Go to the Angular-nvD3 quick start page if you want to know more about how to include these fancy charts in your GUI.

How Are Sunburst Charts Used for Cube Cuboids Visualization?

Basically, at the back end we create a REST API to return the cuboid tree with the necessary information, and at the front end a JavaScript controller parses the REST response to a relative JSON format and then renders the sunburst chart. Below are samples of code from these two layers.

REST Service returns the Cuboid Tree response

@RequestMapping(value = "/{cubeName}/cuboids/current", method = RequestMethod.GET)
public CuboidTreeResponse getCurrentCuboids(@PathVariable String cubeName) {
    CubeInstance cube = cubeService.getCubeManager().getCube(cubeName);
    if (cube == null) {
        logger.error("Get cube: [" + cubeName + "] failed when get current cuboids");
        throw new BadRequestException("Get cube: [" + cubeName + "] failed when get current cuboids");
    }
    Map<Long, Long> cuboidList = Maps.newLinkedHashMap();
    try {
        cuboidList = cubeService.getCurrentCuboidStatistics(cube);
    } catch (IOException e) {
        logger.error("Get cuboids list failed.", e);
        throw new InternalErrorException("Get cuboids failed.", e);
    }
    CuboidTree currentCuboidTree = CuboidTreeManager.getCuboidTree(cuboidList);
    Map<Long, Long> hitFrequencyMap = getCuboidHitFrequency(cubeName);
    Map<Long, Long> queryMatchMap = getCuboidQueryMatchCount(cubeName);
    if (currentCuboidTree == null) {
        logger.warn("Get current cuboid tree failed.");
        return null;
    }
    return cubeService.getCuboidTreeResponse(currentCuboidTree, cuboidList, hitFrequencyMap, queryMatchMap, null);
}

The JavaScript controller parses the REST response to create the sunburst chart

// transform chart data and customized options.
    $scope.createChart = function(data, type) {
        var chartData = data.treeNode;
        var baseNodeInfo = _.find(data.nodeInfos, function(o) { return o.cuboid_id == data.treeNode.cuboid_id; });
        $scope.formatChartData(chartData, data, baseNodeInfo.row_count);

        if ('current' === type) {
            $scope.currentData = [chartData];
            $scope.currentOptions = angular.copy(cubeConfig.chartOptions);
            $scope.currentOptions.title.text = 'Current Cuboid Distribution';
            $scope.currentOptions.subtitle.text = 'Cuboid Count: ' + data.nodeInfos.length;
        } else if ('recommend' === type) {
            $scope.recommendData = [chartData];
            $scope.recommendOptions = angular.copy(cubeConfig.chartOptions);
            $scope.recommendOptions.caption = {
                enable: true,
                // color legend; the styling markup is trimmed here
                html: '<div>Existed: Hottest / Hot / Warm / Cold</div>'
                    + '<div>New: Hottest / Hot / Warm / Cold</div>',
                css: {
                    position: 'relative',
                    top: '-25px'
                }
            };
            $scope.recommendOptions.chart.color = function(d) {
                var cuboid = _.find(data.nodeInfos, function(o) { return o.name == d; });
                if (cuboid.row_count < 0) {
                    return d3.scale.category20c().range()[5];
                } else {
                    var baseRate = 1 / data.nodeInfos.length;
                    var colorIndex = 0;
                    if (!cuboid.existed) {
                        colorIndex = 8;
                    }
                    if (cuboid.query_rate > (3 * baseRate)) {
                        return d3.scale.category20c().range()[colorIndex];
                    } else if (cuboid.query_rate > (2 * baseRate)) {
                        return d3.scale.category20c().range()[colorIndex + 1];
                    } else if (cuboid.query_rate > baseRate) {
                        return d3.scale.category20c().range()[colorIndex + 2];
                    } else {
                        return d3.scale.category20c().range()[colorIndex + 3];
                    }
                }
            };
            $scope.recommendOptions.title.text = 'Recommend Cuboid Distribution';
            $scope.recommendOptions.subtitle.text = 'Cuboid Count: ' + data.nodeInfos.length;
        }
    };

    // transform chart data
    $scope.formatChartData = function(treeNode, orgData, parentRowCount) {
        var nodeInfo = _.find(orgData.nodeInfos, function(o) { return o.cuboid_id == treeNode.cuboid_id; });
        $.extend(true, treeNode, nodeInfo);
        treeNode.parent_row_count = parentRowCount;
        if (treeNode.children.length > 0) {
            angular.forEach(treeNode.children, function(child) {
                $scope.formatChartData(child, orgData, nodeInfo.row_count);
            });
        }
    };

Screenshots and Interaction

Below are some screenshots from the eBay Kylin production environment. With a sunburst chart, cube owners can easily understand their overall cube design with a color-coded cuboid usage heat map. The more dark blue elements in the sunburst chart, the more resource-efficient the cube is; the more light blue elements, the more room there is for improvement.

Putting the two sunburst charts of the current and recommended cuboids side by side, the changes become obvious. The cuboids that are recommended to be removed are marked with gray, and those recommended to be added are marked with green. A popup window with more details of a cuboid is shown when the mouse hovers over that cuboid element in a sunburst chart. The value of Cube Planner is now apparent.

Interaction with a sunburst chart is fast and convenient. The user is able to focus on any cuboid and its children with just one click, and the view changes automatically, like from the left chart to the right chart below.


If you want to go back to the parent of a leaf, click the center circle (the part marked yellow).


In short, leveraging sunburst charts as OLAP cuboid visualizations introduces a creative way to discover cube insights down to the cuboid level. With these insights, the cube owner can build a resource-efficient cube, making Apache Kylin more competitive as an OLAP engine on Hadoop.


Some graphics copyright Apache Kylin

A Surprising Pitfall of Human Judgement and How to Correct It http://www.ebaytechblog.com/2017/05/04/a-surprising-pitfall-of-human-judgement-and-how-to-correct-it/ http://www.ebaytechblog.com/2017/05/04/a-surprising-pitfall-of-human-judgement-and-how-to-correct-it/#comments Thu, 04 May 2017 17:00:10 +0000 http://www.ebaytechblog.com/?p=7156

Algorithms based on machine learning, deep learning, and AI are in the news these days. Evaluating the quality of these algorithms is usually done using human judgment. For example, if an algorithm claims to detect whether an image contains a pet, the claim can be checked by selecting a sample of images, using human judges to decide whether each image contains a pet, and then comparing the judges' answers to the algorithm's output. This post discusses a pitfall in using human judgment that has been mostly overlooked until now.

In real life, human judges are imperfect. This is especially true if the judges are crowdsourced. This is not a new observation. Many proposals to process raw judgment scores and improve their accuracy have been made. They almost always involve having multiple judges score each item and combining the scores in some fashion. The simplest (and probably most common) method of combination is majority vote: for example, if the judges are rating yes/no (for example, is there a pet), you could report the rating given by the majority of the judges.
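The majority-vote combination just described can be sketched in a few lines of JavaScript (a generic illustration, not code from any particular judging system):

```javascript
// Majority vote over yes/no judgments: each item gets several binary
// ratings, and the reported label is whichever answer the majority gave.
// Using an odd number of judges avoids ties.
function majorityVote(ratings) {
  // ratings: array of booleans, one per judge
  const yes = ratings.filter(Boolean).length;
  return yes > ratings.length / 2;
}
```

For example, `majorityVote([true, true, false])` reports "pet" because two of the three judges said yes.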

But even after such processing, some errors will remain. Judge errors are often correlated, and so multiple judgments can correct fewer errors than you might expect. For example, most of the judges might be unclear on the meaning of “pet”, incorrectly assume that a chicken cannot be a pet, and wrongly label a photo of a suburban home showing a child playing with chickens in the backyard. Nonetheless, any judgment errors remaining after processing are usually ignored, and the processed judgments are treated as if they were perfect.

Ignoring judge errors is a pitfall. Here is a simulation illustrating this. I’ll use the pet example again. Suppose there are 1000 images sent to the judges and that 70% of the images contain a pet. Suppose that when there is no pet, the judges (or the final processed judgment) correctly detect this 95% of the time. This is actually higher than typical for realistic industrial applications. And suppose that some of the pet images are subtle, and so when there is a pet, judges are correct only 90% of the time. I now present a simulation to show what happens.

Here’s how one round of the simulation works. I assume that there are millions of possible images and that 70% of them contain a pet. I draw a random sample of 1000. I assume that when the image has a pet, the judgment process has a 90% chance of reporting “pet”, and when there is no pet, the judgment process reports “no pet” 95% of the time. I get 1000 judgments (one for each image), and I record the fraction of those judgments that were “pet”.

That’s one round of the simulation. I do the simulation many times and plot the results as a histogram. I actually did 100,000 simulations, so the histogram has 100,000 values. The results are below in red. Since this is a simulation, I know the true fraction of images with pets: it’s 70%. The estimated fraction from the judges in each simulation is almost always lower than the true fraction. Averaged over all simulations, the estimated fraction is 64%, noticeably lower than the true value of 70%.

This illustrates the pitfall: by ignoring judge error, you get the wrong answer. You might wonder if this has any practical implications. How often do you need a precise estimate for the fraction of pets in a collection of images? But what if you’re using the judgments to determine the accuracy of an algorithm that detects pets? Suppose the algorithm has an accuracy rate of 70%, but the judges say it is 64%. In the machine learning/AI community, the difference between an algorithm that is 70% accurate and one that is 64% accurate is a big deal.

But maybe things aren’t as bad as they seem. When statistically savvy experimenters report the results of human judgment, they return error bars. The error bars (I give details below) in this case are

    \[       0.641 \pm 1.96 \sqrt{\frac{p (1-p)}{n}} \approx 0.641 \pm 0.030 \qquad (p = 0.64) \]

So even error bars don’t help: the actual accuracy rate of 0.7 is not included inside the error bars.
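The error-bar arithmetic above is easy to verify directly; here is a short JavaScript check using the simulation's numbers:

```javascript
// Naive 95% error bars: p_hat ± 1.96 * sqrt(p_hat * (1 - p_hat) / n)
const pHat = 0.64; // average fraction of "pet" judgments in the simulation
const n = 1000;    // number of judged images

const halfWidth = 1.96 * Math.sqrt(pHat * (1 - pHat) / n);
const lower = pHat - halfWidth;
const upper = pHat + halfWidth;
// halfWidth is about 0.030, so the interval is roughly [0.610, 0.670],
// which excludes the true value 0.7.
```

This confirms that the naive interval misses the true accuracy entirely.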

The purpose of this post is to show that you can compensate for judge error. In the image above, the blue bars are the simulation of the compensating algorithm I will present. The blue bars do a much better job of estimating the fraction of pets than the naive algorithm. This post is just a summary, but full details are in Drawing Sound Conclusions from Noisy Judgments from the WWW17 conference.

The histogram shows that the traditional naive algorithm (red) has a strong bias. But it also seems to have a smaller variance, so you might wonder if it has a smaller mean square error (MSE) than the blue algorithm. It does not. The naive algorithm has MSE 0.0037, the improved algorithm 0.0007. The smaller variance does not compensate for the large bias.

Finally, I can explain why judge error is so important. For a typical problem requiring ML/AI, many different algorithms are tried. Then human judgment can be used to detect if one is better than the other. Current practice is to use error bars as above, which do not take into account errors in the human judgment process. But this can lead the algorithm developer astray and suggest that a new algorithm is better (or worse) when in fact the difference is just noise.

The setup

I’m going to assume there are both an algorithm that takes an input and outputs a label (not necessarily a binary label) and also a judge that decides if the output is correct. So the judge is performing a binary task: determining if the algorithm is correct or not. From these judgments, you can compute the fraction of times (an estimate of the probability p) that the algorithm is correct. I will show that if you can estimate the error in the judgment process, then you can compensate for it and get a better estimate of the accuracy of the algorithm. This applies if you use raw judge scores or if you use a judgment process (for example, majority vote) to improve judge accuracy. In the latter case, you need to estimate the accuracy of the judgment process.

A simple example of this setup is an information retrieval algorithm. The algorithm is given a query and returns a document. A judge decides if the document is relevant or not. A subset of those judgments is reevaluated by gold judge experts, giving an estimate of the judges’ accuracy. Of course, if you are rich enough to be able to afford gold judges to perform all your labeling, then you don’t need to use the method presented here.

A slightly more subtle example is the pet detection algorithm mentioned above. Here it is likely that there would be a labeled set of images (a test set) used to evaluate the algorithm. You want to know how often the algorithm agrees with the labels, and you need to correct for errors in the judgment about agreement. To estimate the error rate, pick a subset of images and have them rejudged by experts (gold judges). The judges were detecting if an image had a pet or not. However, what I am really interested in is the error rate in judging whether the algorithm gave the correct answer or not. But that is easily computed from the gold judgments of the images.

The first formula

In the introduction, I mentioned that there are formulas that can correct for judge error. To explain the formula, recall that there is a task that requires labeling items into two classes, which I’ll call positive and negative. In the motivating example, positive means the algorithm gives the correct output for the item, negative that it is incorrect. I have a set of items, and judges perform the task on each item, getting an estimate of how many items are in the positive class. I’d like to use information on the accuracy of the judges to adjust this estimate. The symbols used in the formula are:

p: the true fraction of items in the positive class
p_J: the fraction of items the judges label positive
q_+: the probability that a judge correctly labels a positive item
q_-: the probability that a judge correctly labels a negative item

The first formula is

    \[ p = \frac{p_J + q_- - 1}{q_+ + q_- -1} \]

This formula is not difficult to derive. See the aforementioned WWW paper for details. Here are some checks that the formula is plausible. If the judges are perfect then q_+ = q_- = 1, and the formula reduces to p = p_J. In other words, the judges’ opinion of p is correct. Next, suppose the judges are useless and guess randomly. Then q_+ = q_- = 1/2, and the formula makes no sense because the denominator is zero. So that’s also consistent.

Notice the formula is asymmetric: the numerator has q_- but not q_+. To see why this makes sense, first consider the case when the judges are perfect on negative items so that q_- = 1. The judges’ only error is to take correct answers and claim they are negative. So an estimate of p by the judges is always too pessimistic. On the other hand, if q_+ = 1, then the judges are optimistic, because they sometimes judge incorrect items as correct.

I will now show that the formula achieves these requirements. In particular, if q_- = 1 then I expect p_J < p and if q_+ = 1 then p_J > p. To verify this, note that if q_- = 1 the formula becomes p = p_J/q_+ > p_J. And if q_+ = 1 then

    \begin{eqnarray*}   p  &=& \frac{p_J + q_- - 1}{q_-} \\ &=& \frac{p_J - 1}{q_-} + 1 \\ &=& \frac{p_J - 1}{q_-} + (1-p_J) + p_J \\   &= & (p_J - 1)\left(\frac{1}{q_-} - 1\right) + p_J \\   & < & p_J \end{eqnarray*}

the last inequality because p_J - 1 < 0 and 1/q_- -1 > 0.
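As a quick numerical check of the first formula, take the simulation's parameters (p = 0.7, q_+ = 0.9, q_- = 0.95). The expected judged fraction works out to 0.645, and inverting the formula recovers the true 0.7:

```javascript
// Expected judged fraction: judges say "positive" when they correctly
// label a positive item or incorrectly label a negative one.
const p = 0.7, qPlus = 0.9, qMinus = 0.95;
const pJ = p * qPlus + (1 - p) * (1 - qMinus); // = 0.645

// Invert with the first formula: p = (p_J + q_- - 1) / (q_+ + q_- - 1)
const pCorrected = (pJ + qMinus - 1) / (qPlus + qMinus - 1); // = 0.7
```

This also matches the simulation, where the naive estimate averaged about 64% while the true value was 70%.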

Up until now the formula is theoretical, because the precise values of p_J, q_+ and q_- are not known. I introduce some symbols for the estimates.

\widehat{p}_J: the observed fraction of items the judges label positive
\widehat{q}_+, \widehat{q}_-: estimates of the judge accuracies, obtained from a sample rejudged by gold judges
\widehat{p}: the corrected estimate of p

The practical formula is

    \[ \widehat{p} = \frac{\widehat{p}_J + \widehat{q}_\text{--} - 1}{\widehat{q}_+ + \widehat{q}_\text{--} -1} \]

The second formula

In the introduction, I motivated the concern with judgment error using the problem of determining the accuracy of an ML algorithm and, in particular, comparing two algorithms to determine which is better. If you had perfect judges and an extremely large set of labeled data, you could measure the accuracy of each algorithm on the large labeled set, and the one with the highest accuracy would clearly be best. But the labeled set is often limited in size. That leads to some uncertainty: if you had picked a different labeled set you might get a different answer. That is the purpose of error bars: they quantify the uncertainty. Traditionally, the only uncertainty taken into account is due to the finite sample size. In this case, the traditional method gives 95% error bars of \widehat{p}_J  \pm 2 \sqrt{v_{\widehat{p}_J}} where

    \begin{eqnarray*}       \widehat{p}_J  &=&  \frac{k}{n} \\       v_{\widehat{p}_J} &= &  \frac{\widehat{p}_J(1-\widehat{p}_J)}{n} \end{eqnarray*}

The judge gave a positive label k times out of n, \widehat{p}_J is the estimated mean and v_{\widehat{p}_J} the estimate of the variance of \widehat{p}_J. I showed via simulation that if there are judgment errors, these intervals are too small. The corrected formula is

    \[ v_{\widehat{p}} =  \frac{v_{\widehat{p}_J}}{(\widehat{q}_+ + \widehat{q}_\text{--} - 1)^2} + v_{\widehat{q}_+}\frac{(\widehat{p}_J - 1 + \widehat{q}_\text{--})^2}{(\widehat{q}_+ + \widehat{q}_\text{--} - 1)^4} + v_{\widehat{q}_\text{--}}\frac{(\widehat{p}_J - \widehat{q}_+)^2}{(\widehat{q}_+ + \widehat{q}_\text{--} - 1)^4} \]


Here v_{\widehat{q}_+} and v_{\widehat{q}_\text{--}} are the estimated variances of \widehat{q}_+ and \widehat{q}_\text{--}, each of the form \widehat{q}(1-\widehat{q})/n_\text{gold}, where n_\text{gold} is the number of gold-judged items in that class.

The formula shows that when taking judge error into account, the variance increases, that is, v_{\widehat{p}} >  v_{\widehat{p}_J}. The first term of the equation for v_{\widehat{p}} is already larger than v_{\widehat{p}_J}, since it is v_{\widehat{p}_J} divided by a number less than one. The next two terms add additional error, due to the uncertainty in estimating the judge errors \widehat{q}_+ and \widehat{q}_-.
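The corrected variance formula can be evaluated directly. The sketch below uses the simulation's settings and 200 gold-judged items per class (the same defaults as the R code at the end of this post); the specific numbers are illustrative.

```javascript
// Numerical sketch of the corrected variance formula.
const pJ = 0.645;                  // expected judged fraction
const qPlus = 0.9, qMinus = 0.95;  // judge accuracies
const n = 1000;                    // items judged
const nGoldPos = 200, nGoldNeg = 200;

const vPJ = pJ * (1 - pJ) / n;                    // naive variance of p_J
const vQPlus = qPlus * (1 - qPlus) / nGoldPos;    // variance of q_+ estimate
const vQMinus = qMinus * (1 - qMinus) / nGoldNeg; // variance of q_- estimate

const d = qPlus + qMinus - 1;
const vP = vPJ / (d * d)
    + vQPlus * Math.pow(pJ - 1 + qMinus, 2) / Math.pow(d, 4)
    + vQMinus * Math.pow(pJ - qPlus, 2) / Math.pow(d, 4);
// vP comes out strictly larger than vPJ, so the corrected
// error bars are wider than the naive ones.
```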


The theme of this post is correcting for judge errors, but I have only discussed the case when the judge gives each item a binary rating, as a way to estimate what fraction of items are in each of the two classes. For example, the judge reports if an algorithm has given the correct answer, and you want to know how often the algorithm is correct. Or the judge examines a query and a document, and you want to know how often the document is relevant to the query.

But the methods presented so far extend to other situations. The WWW paper referred to earlier gives a number of examples in the domain of document retrieval. It explains how to correct for judge error in the case when a judge gives documents a relevance rating rather than just a binary relevant or not. And it shows how to extend these ideas beyond the metric “fraction in each class” to the metric DCG, which is a common metric for document retrieval.


Analyses of the type presented here always have assumptions. The assumption of this work is that there is a task (for example, “is the algorithm correct on this item?”) with a correct answer and that there are gold judges who are expert enough and have enough time to reliably obtain that correct answer. Industrial uses of human judgment often use detailed guidelines explaining how the judgment should be done, in which case these assumptions are reasonable.

But what about the general case? What if different audiences have different ideas about the task? For example, there may be differences from country to country about what constitutes a pet. The analysis presented here is still relevant. You only need to compute the error rate of the judges relative to the appropriate audience.

I treat the judgment process as a black box, where only the overall accuracy of the judgments is known. This is sometimes very realistic, for example, when the judgment process is outsourced. But sometimes details are known about the judgment process that would lead you to conclude that the judgments of some items are more reliable than others. For example, you might know which judges judged which items, and so have reliable information on the relative accuracy of the judges. The approach I use here can be extended to apply in these cases.

The simulation

I close this post with the R code that was used to generate the simulation given earlier.

library(ggplot2)

# sim() returns a (4 x n.sim) array whose rows are:
#   the naive estimate for p,
#   the corrected estimate,
#   whether the true value of p is in a 95% confidence interval (0/1)
#        for the naive standard error,
#   the same value for the corrected standard error

sim = function(
  q.pos,  # probability a judge is correct on positive items
  q.neg,  # probability a judge is correct on negative items
  p,      # fraction of positive items
  n,      # number of items judged
  n.sim,  # number of simulations
  n.gold.neg = 200, # number of items the gold judge deems negative
  n.gold.pos = 200  # number of items the gold judge deems positive
)
{
   # number of positive items in sample
   n.pos = rbinom(n.sim, n, p)

   # estimate of judge accuracy
   q.pos.hat = rbinom(n.sim, n.gold.pos, q.pos)/n.gold.pos
   q.neg.hat = rbinom(n.sim, n.gold.neg, q.neg)/n.gold.neg

   # what the judges say (.jdg)
   n.pos.jdg = rbinom(n.sim, n.pos, q.pos) +
               rbinom(n.sim, n - n.pos, 1 - q.neg)

   # estimate of fraction of positive items
   p.jdg.hat = n.pos.jdg/n    # naive formula
   # corrected formula
   p.hat = (p.jdg.hat - 1 + q.neg.hat)/
                      (q.neg.hat + q.pos.hat - 1)

   # is p.jdg.hat within 1.96*sigma of p?
   s.hat =  sqrt(p.jdg.hat*(1 - p.jdg.hat)/n)
   b.jdg = abs(p.jdg.hat - p) < 1.96*s.hat

   # is p.hat within 1.96*sigma.hat of p?
   v.b = q.neg.hat*(1 - q.neg.hat)/n.gold.neg   # variance q.neg
   v.g = q.pos.hat*(1 - q.pos.hat)/n.gold.pos   # variance q.pos
   denom = (q.neg.hat + q.pos.hat - 1)^2
   v.cor = s.hat^2/denom +
           v.g*(p.jdg.hat - 1 + q.neg.hat)^2/denom^2 +
           v.b*(p.jdg.hat - q.pos.hat)^2/denom^2
   s.cor.hat = sqrt(v.cor)
   b.cor = abs(p.hat - p) < 1.96*s.cor.hat

   rbind(p.jdg.hat, p.hat, b.jdg, b.cor)
}

r = sim(n.sim = 100000, q.pos = 0.90,  q.neg = 0.95, p = 0.7, n=1000)

# plot v1 and v2 as two separate histograms, but on the same ggplot
hist2 = function(v1, v2, name1, name2, labelsz=12)
{
  df1 = data.frame(x = v1, y = rep(name1, length(v1)))
  df2 = data.frame(x = v2, y = rep(name2, length(v2)))
  df = rbind(df1, df2)

  freq = aes(y = ..count..)
  print(ggplot(df, aes(x=x, fill=y)) + geom_histogram(freq, position='dodge') +
      theme(axis.text.x = element_text(size=labelsz),
      axis.text.y = element_text(size=labelsz),
      axis.title.x = element_text(size=labelsz),
      axis.title.y = element_text(size=labelsz),
      legend.text = element_text(size=labelsz)))
}

hist2(r[1,], r[2,], "naive", "corrected", labelsz=16)


Building a UI Component in 2017 and Beyond http://www.ebaytechblog.com/2017/05/03/building-a-ui-component-in-2017-and-beyond/ http://www.ebaytechblog.com/2017/05/03/building-a-ui-component-in-2017-and-beyond/#comments Wed, 03 May 2017 14:00:56 +0000 http://www.ebaytechblog.com/?p=7230
As web developers, we have seen the guidelines for building UI components evolve over the years. Starting from jQuery UI to the current Custom Elements, various patterns have emerged. To top it off, there are numerous libraries and frameworks, each advocating their own style on how a component should be built. So in today’s world, what would be the best approach in terms of thinking about a UI component interface? That is the essence of this blog. Huge thanks to folks mentioned in the bottom Acknowledgments section. Also, this post leverages a lot of learnings from the articles and projects listed in the Reference section at the end.

Setting the context

Before we get started, let’s set the context on what this post covers.

  • By UI components, we mean the core, standalone UI patterns that apply to any web page in general. Examples would be Button, Dropdown Menu, Carousel, Dialog, Tab, etc. Organizations in general maintain a pattern library for these core components. We are NOT talking about application-specific components here, like a Photo Viewer used in social apps or an Item Card used in eCommerce apps. They are designed to solve application-specific (social, eCommerce, etc.) use cases. Application components are usually built using core UI components and are tied with a JavaScript framework (Vue.js, Marko, React, etc.) to create a fully featured web page.

  • We will only be talking about the interface (that is, API) of the component and how to communicate. We will not go over the implementation details in depth, just a quick overview.

Declarative HTML-based interface

Our fundamental principle behind building a UI component API is to make it agnostic in regard to any library or framework. This means the API should be able to work in any context and be framework-interoperable. Any popular framework can just leverage the interface out of the box and augment it with their own implementation. With this principle in mind, what would be the best way to get started? The answer is pretty simple — make the component API look like how any other HTML element would look like. Just like how a <div>, <img>, or a <video> tag would work. This is the idea behind having a declarative HTML-based interface/API.

So what does a component look like? Let’s take carousel as an example. This is what the component API will look like.

<carousel index="2" controls aria-label-next="next" aria-label-previous="previous" autoplay>
    <div>Markup for Item 1</div>
    <div>Markup for Item 2</div>
</carousel>

What does this mean? We are declaratively telling the component that the rendered markup should do the following.

  • Start at the second item in the carousel.
  • Display the left and right arrow key controls.
  • Use "previous" and "next" as the aria-label attribute values for the left and right arrow keys.
  • Also, autoplay the carousel after it is loaded.

For a consumer, this is the only information they need to know to include this component. It is exactly like how you include a <button> or <canvas> HTML element. Component names are always lowercase. They should also be hyphenated to distinguish them from native HTML elements. A good suggestion would be to prefix them with a namespace, usually your organization or project name, for example ebay-, core-, git-, etc.

Attributes form the base on how you will pass the initial state (data and configuration) to a component. Let’s talk about them.


  • An attribute is a name-value pair where the value is always a string. You might argue that anything can be serialized as a string and that the component can deserialize the string into the associated data type (JSON, for example). While that is true, the guideline is NOT to do that. A component can only interpret the value as a String (which is the default), a Number (similar to tabindex), or a JavaScript event handler (similar to DOM on-event handlers). Again, at the end of the day, this is exactly how an HTML element works.

  • Attributes can also be boolean. As per the HTML5 spec, “The presence of a boolean attribute on an element represents the true value, and the absence of the attribute represents the false value.” This means that as a component developer, when you need a boolean attribute, just check for the presence of it on the element and ignore the value. Having a value for it has no significance; both creator and consumer of a component should follow the same. For example, <button disabled="false"> will still disable the button, even if it is set to false, just because the boolean attribute disabled is present.

  • All attribute names should be lowercase. Camel case or Pascal case is NOT allowed. For certain multiword attributes, hyphenated names like accept-charset, data-*, etc. can be used, but that should be a rare occurrence. Even for multiwords, try your best to keep them as one lowercase name, for example, crossorigin, contenteditable, etc. Check out the HTML attribute reference for tips on how the native elements are doing it.

We can correlate the above attribute rules with our <carousel> example.

  • aria-label-next and aria-label-previous as string attributes. We hyphenate them as they are multiwords, very similar to the HTML aria-label attribute.
  • index attribute will be deserialized as a number, to indicate the position of the item to be displayed.
  • controls and autoplay will be considered as boolean attributes.
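The attribute rules above translate into straightforward parsing code. Below is a sketch of how a carousel implementation might read its initial state from the element; the helper name `readCarouselConfig` is hypothetical, but the attribute names match the `<carousel>` example in this post.

```javascript
// Read a <carousel> element's initial state from its attributes,
// following the rules above.
function readCarouselConfig(el) {
  return {
    // number-valued attribute, interpreted like tabindex
    index: parseInt(el.getAttribute('index'), 10) || 0,
    // boolean attributes: presence alone means true, the value is ignored
    controls: el.hasAttribute('controls'),
    autoplay: el.hasAttribute('autoplay'),
    // plain string attributes
    ariaLabelNext: el.getAttribute('aria-label-next'),
    ariaLabelPrevious: el.getAttribute('aria-label-previous')
  };
}
```

Note that `hasAttribute` is used for the boolean attributes, so even `controls="false"` would still enable the controls, exactly as `<button disabled="false">` still disables the button.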

A common pattern that used to exist (or still exists) is to pass configuration and data as JSON strings. For our carousel, it would be something like the following example.

<!-- This is not recommended -->
<carousel
    data-config='{"controls": true, "index": 2, "autoplay": true, "ariaLabelNext": "next", "ariaLabelPrevious": "previous"}'
    data-items='[{"title":"Item 1", ..}, {"title": "Item 2", ...}]'>
</carousel>

This is not recommended.

Here the component developer reads the data attribute data-config, does a JSON parse, and then initializes the component with the provided configuration. They also build the items of the carousel using data-items. This may not be intuitive, and it works against a natural HTML-based approach. Instead consider a declarative API as proposed above, which is easy to understand and aligns with the HTML spec. Finally, in the case of a carousel, give the component consumers the flexibility to build the carousel items however they want. This decouples a core component from the context in which it is going to be used, which is usually application-specific.


There will be scenarios where you really need to pass an array of items to a core component, for example, a dropdown menu. How to do this declaratively? Let’s see how HTML does it. Whenever any input is a list, HTML uses the <option> element to represent an item in that list. As a reference, check out how the <select> and <datalist> elements leverage the <option> element to list out an array of items. Our component API can use the same technique. So in the case of a dropdown menu, the component API would look like the following.

<dropdown-menu list="options" index="0">
    <option value="0" selected>--Select--</option>
    <option value="1">Option 1</option>
    <option value="2">Option 2</option>
    <option value="3">Option 3</option>
    <option value="4">Option 4</option>
    <option value="5">Option 5</option>
</dropdown-menu>

It is not necessary that we should always use the <option> element here. We could create our own element, something like <dropdown-option>, which is a child of the <dropdown-menu> component, and customize it however we want. For example, if you have an array of objects, you can represent each object ({"userName": "jdoe", "score": 99, "displayName": "John Doe"}) declaratively in the markup as <dropdown-option value="jdoe" score="99">John Doe</dropdown-option>. Hopefully you do not need a complex object for a core component.


You may also argue that there is a scenario where you need to pass a JSON config for a component to work, or else usability becomes painful. Although this is a rare scenario for core components, one use case I can think of is a core analytics component. This component may not have a UI, but it does all the tracking-related work, and you need to pass it a complex JSON object. What do we do? The AMP Project has a good solution for this. The component would look like the following.

<amp-analytics>
    <script type="application/json">
    {
      "requests": {
        "pageview": "https://example.com/analytics?pid=",
        "event": "https://example.com/analytics?eid="
      },
      "vars": {
        "account": "ABC123"
      },
      "triggers": {
        "trackPageview": {
          "on": "visible",
          "request": "pageview"
        },
        "trackAnchorClicks": {
          "on": "click",
          "selector": "a",
          "request": "event",
          "vars": {
            "eventId": "42",
            "eventLabel": "clicked on a link"
          }
        }
      }
    }
    </script>
</amp-analytics>
Here again we piggyback the interface based on how we would do it in simple HTML. We use a <script> element inside the component and set the type to application/json, which is exactly what we want. This brings back the declarative approach and makes it look natural.
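Recovering the embedded configuration takes only a few lines. The sketch below assumes a reference to the component's DOM node; the helper name `readJsonConfig` is hypothetical.

```javascript
// Read the JSON configuration embedded in a component's
// <script type="application/json"> child.
function readJsonConfig(componentEl) {
  const script = componentEl.querySelector('script[type="application/json"]');
  return script ? JSON.parse(script.textContent) : null;
}
```

Because the payload lives in a `<script>` element with a non-executable type, the browser never runs it, and the component parses it only when it initializes.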


Till now we talked only about the initial component API. This enables consumers to include a component in a page and set its initial state. Once the component is rendered in the browser, how do you interact with it? This is where the communication mechanism comes into play. And for this, the golden rule comes from reactive principles:

Data in via attributes and properties, data out via events

This means that attributes and properties can be used to send data to a component and events send the data out. If you take a closer look, this is exactly how any normal HTML element (input, button, etc.) behaves. We already discussed attributes in detail. To summarize, attributes set the initial state of a component, whereas properties update or reflect the state of a component. Let’s dive into properties a bit more.


At any point in time, properties are your source of truth. After setting the initial state, some attributes do not get updated as the component changes over time. For example, typing in a new phrase in an input text box and then calling element.getAttribute('value') will produce the previous (stale) value. But doing element.value will always produce the current typed-in phrase. Certain attributes, like disabled, do get reflected when the corresponding property is changed. There has always been some confusion around this topic, partly due to legacy reasons. It would be ideal for attributes and properties to be in sync, as the usability benefits are undeniable.

If you are using Custom Elements, implementing properties is quite straightforward. For a carousel, we could do this.

class Carousel extends HTMLElement {
    static get observedAttributes() {
        return ['index'];
    }
    // Called anytime the 'index' attribute is changed
    attributeChangedCallback(attrName, oldVal, newVal) {
        this[attrName] = newVal;
    }
    // Takes an index value
    set index(idx) {
        // First check if it is numeric
        const numericIndex = parseInt(idx, 10);
        if (isNaN(numericIndex)) {
            return;
        }
        // Update the internal state
        this._index = numericIndex;
        /* Perform the associated DOM operations */
    }
    get index() {
        return this._index;
    }
}

Here the index property gets all its associated characteristics. If you do carouselElement.index = 4, it will update the internal state and then perform the corresponding DOM operations to move the carousel to the fourth item. Additionally, even if you directly update the attribute with carouselElement.setAttribute('index', 4), the component will still update the index property and the internal state, and perform the same DOM operations to move the carousel to the fourth item.
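To trace this flow outside the browser, here is a minimal DOM-free mimic (a sketch; CarouselMimic is a hypothetical stand-in — in a real Custom Element, the browser itself invokes attributeChangedCallback when an observed attribute changes):

```javascript
// Hypothetical stand-in class: mimics how a setAttribute call on a
// Custom Element funnels into the 'index' property setter.
class CarouselMimic {
    constructor() {
        this._attrs = {};
        this._index = 0;
    }
    setAttribute(name, value) {
        this._attrs[name] = String(value);
        // The browser would call this for observed attributes;
        // we call it by hand in this DOM-free sketch.
        if (name === 'index') {
            this.attributeChangedCallback(name, null, value);
        }
    }
    attributeChangedCallback(attrName, oldVal, newVal) {
        this[attrName] = newVal;
    }
    set index(idx) {
        const numericIndex = parseInt(idx, 10);
        if (isNaN(numericIndex)) {
            return; // ignore non-numeric input, keep current state
        }
        this._index = numericIndex;
    }
    get index() {
        return this._index;
    }
}

const carousel = new CarouselMimic();
carousel.setAttribute('index', 4);
console.log(carousel.index); // 4 — the attribute update reached the property
```

The point of the sketch is only the plumbing: attribute writes and direct property writes converge on one setter, which validates input and keeps the internal state as the single source of truth.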

However, until Custom Elements gain massive browser adoption and have a good server-side rendering story, we need to come up with other mechanisms to implement properties. And one way would be to use the Object.defineProperty() API.

const carouselElement = document.querySelector('#carousel1');
Object.defineProperty(carouselElement, 'index', {
    set(idx) {
        // First check if it is numeric
        const numericIndex = parseInt(idx, 10);
        if (isNaN(numericIndex)) {
            return;
        }
        // Update the internal state
        this._index = numericIndex;
        /* Perform the associated DOM operations */
    },
    get() {
        return this._index;
    }
});

Here we are augmenting the carousel element DOM node with the index property. When you do carouselElement.index = 4, it gives us the same functionality as the Custom Element implementation. But directly updating an attribute with carouselElement.setAttribute('index', 4) will do nothing. This is the tradeoff in this approach. (Technically we could still use a MutationObserver to achieve the missing functionality, but that would be overkill.) If your team can standardize on state updates happening only through properties, this should be less of a concern.

With respect to naming conventions, since properties are accessed programmatically, they should always be camel-cased. All exposed attributes (an exception would be ARIA attributes) should have a corresponding camel-cased property, very similar to native DOM elements.


When the state of a component has changed, either programmatically or due to user interaction, it has to communicate the change to the outside world. And the best way to do it is by dispatching events, very similar to click or touchstart events dispatched by a native HTML element. The good news is that the DOM comes with a built-in custom eventing mechanism through the CustomEvent constructor. So in the case of a carousel, we can tell the outside world that the carousel transition has been completed by dispatching a transitionend event as shown below.

const carouselElement = document.querySelector('#carousel1');

// Dispatching 'transitionend' event
carouselElement.dispatchEvent(new CustomEvent('transitionend', {
    detail: {index: this._index}
}));

// Listening to 'transitionend' event
carouselElement.addEventListener('transitionend', event => {
    alert(`User has moved to item number ${event.detail.index}`);
});

By doing this, we get all the benefits of DOM events, like bubbling and capture, as well as the event APIs like event.stopPropagation(), event.preventDefault(), etc. Another added advantage is that it makes the component framework-agnostic, as most frameworks already have built-in mechanisms for listening to DOM events. Check out Rob Dodson’s post on how this works with major frameworks.

Regarding a naming convention for events, I would go with the same guidelines that we listed above for attribute names. Again, when in doubt, look at how the native DOM does it.


Let me briefly touch upon the implementation details, as they give the full picture. Until now we have been talking only about the component API and communication patterns. But the critical missing part is that we still need JavaScript to provide the desired functionality and encapsulation. Some components can be purely markup- and CSS-based, but in reality, most of them will require some amount of JavaScript. How do we implement this JavaScript? Well, there are a couple of ways.

  • Use vanilla JavaScript. Here the developer builds their own JavaScript logic for each component. But you will soon see a common pattern across components, and the need for abstraction arises. This abstraction library will pretty much be similar to those numerous frameworks out in the wild. So why reinvent the wheel? We can just choose one of them.

  • Usually in organizations, web pages are built with a particular library or framework (Angular, Ember, Preact, etc.). You can piggyback on that library to implement the functionality and provide encapsulation. The drawback here is that your core components are also tied with the page framework. So in case you decide to move to a different framework or do a major version upgrade, the core components should also change with it. This can cause a lot of inconvenience.

  • You can use Custom Elements. That would be ideal, as the mechanism is built into browsers, and the browser makers recommend it. But you need a polyfill to make it work across all of them. You can try a Progressive Enhancement technique as described here, but you would lose functionality in non-supporting browsers. Moreover, until we have a solid and performant server-side rendering mechanism, Custom Elements will lack mass adoption.

And yes, all options are open-ended. It all boils down to choices, and software engineering is all about the right tradeoffs. My recommendation would be to go with either Option 2 or 3, based on your use cases.


Though the title mentions the year “2017”, this post is more about building an interface that works not only today but also in the future. We are making the component API agnostic of the underlying implementation. This enables developers to use a library or framework of their choice and gives them the flexibility to switch in the future (based on what is popular at that point in time). The key takeaway is that the component API and the principles behind it always stay the same. I believe Custom Elements will become the default implementation mechanism for core UI components as soon as they gain mainstream browser adoption.

The ideal state is when a UI component can be used in any page without the need for a library or polyfill, and it can work with the page owner’s framework of choice. We need to design our component APIs with that ideal state in mind, and this is a step towards it. Finally, it is worth repeating: when in doubt, check how HTML does it, and you will probably have an answer.


Many thanks to Rob Dodson and Lea Verou for their technical reviews and valuable suggestions. Also huge thanks to my colleagues Ian McBurnie, Arun Selvaraj, Tony Topper, and Andrew Wooldridge for their valuable feedback.


Elasticsearch Cluster Lifecycle at eBay http://www.ebaytechblog.com/2017/04/12/elasticsearch-cluster-lifecycle-at-ebay/ Wed, 12 Apr 2017 15:20:37 +0000 http://www.ebaytechblog.com/?p=6821
Continue Reading »
Defining an Elasticsearch cluster lifecycle

eBay’s Pronto, our implementation of the “Elasticsearch as a service” (ES-AAS) platform, provides fully managed Elasticsearch clusters for various search use cases. Our ES-AAS platform is hosted in a private internal cloud environment based on OpenStack. The platform currently manages 35+ clusters and supports multiple data center deployments. This blog post provides guidelines on the different pieces of a cluster lifecycle that allow streamlined management of Elasticsearch clusters. All Elasticsearch clusters deployed within the eBay infrastructure follow our defined Elasticsearch lifecycle, depicted in the figure below.

Cluster preparation

This lifecycle stage begins when a new use case is being onboarded onto our ES-AAS platform.

On-boarding information

Customers’ requirements are captured in an onboarding template that contains information such as document size, retention policy, and read/write throughput requirements. Based on the inputs provided by the customer, infrastructure sizing is performed. The sizing uses historic learnings from our benchmarking exercises. On-boarding information has helped us in cluster planning and in defining SLAs for customer commitments.

We collect the following information from customers before any use case is onboarded:

  • Use case details: Consists of queries relating to use case description and significance.
  • Sizing Information: Captures the number of documents, their average document size, and year-on-year growth estimation.
  • Data read/write information: Consists of expected indexing/search rate, mode of ingestion (batch mode or individual documents), data freshness, average number of users, and specific search queries containing any aggregation, pagination, or sorting operations.
  • Data source/retention: Original data source information (such as Oracle, MySQL, etc.) is captured on an onboarding template. If the indices are time-based, then an index purge strategy is logged. Typically, we do not use Elasticsearch as the source of data for critical applications.

Benchmarking strategy

Before undertaking any benchmarking exercise, it’s really important to understand the underlying infrastructure that hosts your VMs. This is especially true in a cloud-based environment where such information is usually abstracted from end users. Be aware of different potential noisy-neighbors issues, especially on a multi-tenant-based infrastructure.

Like most folks, we have performed extensive benchmarking exercises on existing hardware infrastructure and image flavors. Data stored in Elasticsearch clusters is specific to customer use cases, and it is nearly impossible to perform benchmarking runs on all data schemas used by different customers. Therefore, we made assumptions before embarking on any benchmarking exercise, and the following assumptions were key.

  • Clients will use a REST path for any data access on our provisioned Elasticsearch clusters. (No transport client)
  • To start with, we kept a mapping of 1GB RAM to 32GB disk space ratio. (This was later refined as we learnt from benchmarking)
  • Indexing numbers were carefully profiled for different numbers of replicas (1, 2, and 3 replicas).
  • Search benchmarking was done always on GetById queries (as search queries are custom and profiling different custom search queries was not viable).
  • We used fixed-size 1KB, 2KB, 5KB, and 10KB documents.

Working from these assumptions, we arrived at a maximum shard size for performance (around 22GB), the right payload size for _bulk requests (~5MB), and so on. We used our own custom JMeter scripts to perform benchmarking. Recently, Elasticsearch has developed and open-sourced the Rally benchmarking tool, which can be used as well. Additionally, based on our benchmarking learnings, we created a capacity-estimation calculator tool that takes customer requirement inputs and calculates the infrastructure requirements for a use case. Sharing this tool directly with end users spared us a lot of conversations with customers about infrastructure cost.
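To make the arithmetic concrete, here is a toy sketch of the kind of calculation such a calculator performs, using the numbers mentioned above (~22GB maximum shard size, 1GB RAM per 32GB of disk). The function name, inputs, and formulas are illustrative assumptions, not eBay’s actual tool:

```javascript
// Toy capacity estimate: from document count/size and replica count,
// derive primary data volume, shard count, and a rough RAM figure.
function estimateCapacity({docCount, avgDocSizeKB, replicas}) {
    const MAX_SHARD_GB = 22;       // max shard size derived from benchmarking
    const DISK_GB_PER_GB_RAM = 32; // starting RAM:disk ratio

    const primaryDataGB = (docCount * avgDocSizeKB) / (1024 * 1024);
    const totalDataGB = primaryDataGB * (1 + replicas); // primaries + replicas
    const primaryShards = Math.ceil(primaryDataGB / MAX_SHARD_GB);
    const ramGB = Math.ceil(totalDataGB / DISK_GB_PER_GB_RAM);

    return {primaryDataGB, totalDataGB, primaryShards, ramGB};
}

// 200M documents of 2KB each, with 1 replica
const estimate = estimateCapacity({docCount: 2e8, avgDocSizeKB: 2, replicas: 1});
console.log(estimate.primaryShards); // 18
```

A real sizing model would also account for indexing overhead, growth projections, and headroom for query load, but the shape of the calculation is the same.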

VM cache pool

Our ES clusters are deployed by leveraging an intelligent warm-cache layer. The warm-cache layer consists of ready-to-use VM nodes that are prepared over a period of time based on predefined rules. This ensures that VMs are distributed uniformly across the underlying hardware. This layer has allowed us to quickly spawn large clusters within seconds. Additionally, our remediation engine leverages this layer to flex up nodes on existing clusters without errors or manual intervention. More details on our cache pool are available in another eBay tech blog post: Ready-to-use Virtual-machine Pool Store via warm-cache.

Cluster deployment

Cluster deployment is fully automated via a Puppet/Foreman infrastructure. We will not talk in detail about how the Elasticsearch Puppet module was leveraged for provisioning Elasticsearch clusters, as this is well documented at Elasticsearch puppet module. Along with every release of Elasticsearch, a corresponding version of the Puppet module is generally made publicly available. We have made minor modifications to these Puppet scripts to suit eBay-specific needs. Different configuration settings for Elasticsearch are customized based on our benchmarking learnings. As a general guideline, we do not set the JVM heap size to more than 28 GB (because doing so leads to long garbage-collection cycles), and we always disable in-memory swapping for the Elasticsearch JVM process. Independent clusters are deployed across data centers, and load-balancing VIPs (Virtual IP addresses) are provisioned for data access.

Typically, with each cluster provisioned we give out two VIPs, one for data read operations and another one for write operations. Read VIPs are always created over client nodes (or coordinating nodes), while write VIPs are configured over data nodes. We have observed improved throughput from our clusters with such a configuration.

Deployment diagram


We use a lot of open-source software on our platform, such as OpenStack, MongoDB, Airflow, Grafana, InfluxDB (open version), OpenTSDB, etc. Our internal services, such as cluster provisioning, cluster management, and customer management services, allow REST API-driven management for deployment and configuration. They also help in tracking clusters as assets against different customers. Our cluster provisioning service relies heavily on OpenStack. For example, we use Nova for managing compute resources (nodes), Neutron APIs for load balancer provisioning, and Keystone for internal authentication and authorization of our APIs.

We do not use federated or cross-region deployments for an Elasticsearch cluster. Network latency limits us from having such a deployment strategy. Instead, we host independent clusters for use cases across multiple regions. Clients will have to perform dual writes when clusters are deployed in multiple regions. We also do not use Tribe nodes.

Cluster onboarding

We create the cluster topology during customer onboarding. This helps us track resources and the cost associated with cluster infrastructure. The metadata stored as part of a cluster topology maintains region deployment information, SLA agreements, cluster owner information, etc. We use eBay’s internal configuration management system (CMS) to track cluster information in the form of a directed graph. External tools hook onto this topology; such integrations allow easy monitoring of our clusters from centralized eBay-specific systems.

Cluster topology example

Cluster management

Cluster security

Security is provided on our clusters via a custom security plug-in that provides a mechanism to both authenticate and authorize the use of Elasticsearch clusters. Our security plug-in intercepts messages and then performs context-based authorization and authentication using an internal authentication framework. Explicit whitelisting based on client IP is supported, which is useful for configuring Kibana or other external UI dashboards. Admin (DevOps) users are configured to have complete access to the Elasticsearch cluster. We encourage using HTTPS (based on TLS 1.2) to secure communication between clients and Elasticsearch clusters.

The following is a simple sample security rule that can be configured on our platform for provisioned clusters.

sample json code implementing a security rule

In the above sample rule, the enabled field controls if the security feature is enabled or not. whitelisted_ip_list is an array attribute for providing all whitelisted Client IPs. Any Open/Close index operations or delete index operations can be performed only by admin users.
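In textual form, such a rule might look like the following (a hypothetical sketch: only the enabled and whitelisted_ip_list field names come from the description above; the remaining fields and values are illustrative):

```json
{
    "enabled": true,
    "whitelisted_ip_list": ["10.120.10.11", "10.120.10.12"],
    "admin_only_operations": ["open_index", "close_index", "delete_index"]
}
```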

Cluster monitoring

Cluster monitoring is done by a custom monitoring plug-in that pushes 70+ metrics from each Elasticsearch node to a back-end TSDB-based data store. The plug-in works on a push-based design. External dashboards built on Grafana consume the data from the TSDB store. Custom templates are created on the Grafana dashboard, which allows easy centralized monitoring of our clusters.



We leverage an internal alert system that can be used to configure threshold-based alerts on data stored on OpenTSDB. Currently, we have 500+ active alerts configured on our clusters with varying severity. Alerts are classified as ‘Errors’ or ‘Warnings’. Error alerts, when raised, are immediately attended to either by DevOps or by our internal auto-remediation engine, based on the alert rule configured.

Alerts are created during cluster provisioning based on various thresholds. For example, if a cluster status turns RED, an ‘Error’ alert is raised, or if the CPU utilization of a node exceeds 80%, a ‘Warning’ alert is raised.
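As a sketch, threshold rules of that kind could be expressed as follows (a hypothetical format for illustration; our actual alert system uses its own configuration schema):

```json
[
    {"metric": "cluster.status", "condition": "equals", "value": "RED", "severity": "Error"},
    {"metric": "node.cpu.utilization", "condition": "greater_than", "value": 80, "severity": "Warning"}
]
```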

Cluster remediation

Our ES-AAS platform can perform an auto-remediation action on receiving any cluster anomaly event. Such actions are enabled via our custom Lights-Out-Management (LOM) module. Any auto-remediation module can significantly reduce manual intervention for DevOps. Our LOM module uses a rule-based engine which listens to all alerts raised on our cluster. The reactor instance maintains a context of the alerts raised and, based on cluster topology state (AUTO ON/OFF), takes remediation actions. For example, if a cluster loses a node and if this node does not return to its cluster within the next 15 minutes, the remediation engine replaces that node via our internal cluster management services. Optionally, alerts can be sent to the team instead of taking a cluster remediation action. The actions of the LOM module are tracked as stateful jobs that are persisted on a back-end MongoDB store. Due to the stateful nature of these jobs, they can be retried or rolled back as required. Audit logs are also maintained to capture the history or timeline of all remediation actions that were initiated by our custom LOM module.

Cluster logging

Along with the standard Elasticsearch distribution, we also ship our custom logging library. This library pushes all Elasticsearch application logs onto a back-end Hadoop store via an internal system called Sherlock. All centralized application logs can be viewed at both the cluster and node levels. Once Elasticsearch log data is available on Hadoop, we run daily Pig jobs on our log store to generate reports on error-log and slow-log counts. We generally keep our logging level at INFO, and whenever we need to triage issues, we use a transient logging setting of DEBUG, which collects detailed logs onto our back-end Hadoop store.

Cluster decommissioning

We follow a cluster decommissioning process for major version upgrades of Elasticsearch. For major upgrades of Elasticsearch clusters, we spawn a new cluster with our latest offering of the Elasticsearch version. We replay all documents from the old cluster to the newly created cluster. Clients (user applications) start using both cluster endpoints for all future ingestion until data catches up on the new cluster. Once data parity is achieved, we decommission the old cluster. In addition to freeing up infrastructure resources, we also clean up the associated cluster topology. Elasticsearch also provides a migration plug-in that can be used to check whether direct, in-place upgrades can be done for major Elasticsearch versions. Minor Elasticsearch upgrades are done on an as-needed basis and are usually performed in place.