The API Journey: Or How We Built a New Family of Modern Services


An API — or application programming interface — is an intermediary that enables applications to interact. It is a contract that specifies how applications talk to one another. Further, an API creates a separation between a service provider and its consumers. Essentially, it decouples their implementations. As long as the contract stays intact, API providers may continue changing their code and underlying dependencies without destabilizing clients.

APIs are a big deal. After dealing with SOAP-based legacy APIs for years, eBay started a journey to deliver a new, modern family of APIs for sellers and buyers. Our principal goal was to design a set of interfaces that would meet business objectives, attract developers, replace our legacy APIs, and be long-lived. This is not an easy job. As I mentioned, APIs are a contract, and as such, they cannot be changed in ways that break existing integrations. APIs should evolve and grow with the business, so they must also be expandable and flexible. Now, that is hard.

Our challenge was to create a vision for the API, plan ahead, and design a stable contract that would last for years, even as we add business capabilities. Here is how we did it.


Governance

“Ensuring and validating that assets and artifacts within the architecture are acting as expected and maintaining a certain level of quality.” — Gartner

To achieve consistency across the APIs, we followed a governance process and compliance model. One of our most important goals was improving the quality of the APIs by defining and enforcing standards and policies. We established a governance process that was objective, transparent, manageable, and explicit. Our compliance model for web services is platform- and tenant-agnostic and fits well into eBay’s overall API maturity model. Levels of compliance are specified by a set of characteristics and capabilities that may be measured and assessed. This helps to identify and quantify a realistic target maturity for our APIs in a given period. (And, it is testable!)

Design and beyond

“Unfortunately, people are fairly good at short-term design, and usually awful at long-term design.” — Roy Fielding

First and foremost, the API blueprint is the starting point. At eBay, a blueprint describes the API design in enough detail to verify, implement, maintain, and extend capabilities in the future. Designing APIs has many analogies to building a house. Without a proper blueprint, pressure to deliver on time often leads to a poor design. To stretch the house-construction analogy, working without a blueprint invites shortcuts such as building the bathroom in the kitchen because the plumbing work has already been done there. The challenge lies in finding a balance between our agile product development methodology and the time needed to come up with a detailed design. Implementation becomes straightforward once there is a blueprint and a clear understanding of what needs to be done.

For our new family of APIs, we followed our interface design method (“IDM”). The IDM is the process of arriving at an underlying model and a stable contract. It starts with capturing use cases by specifying actors, concepts, and actions, and then deriving entity relationships. Further, nouns are identified from the entities and verbs from the actions. The final phase of the IDM process is determining resource representation and specifying authorization details.

Use cases

Actor: Seller

Use case: Seller creates an advertising campaign

Description: The seller creates an advertising campaign and specifies a name, effective dates, funding strategy, and a criterion that defines inventory partitioning. The advertising campaign is either a rule-based campaign, where listings are auto-selected according to the specified inventory partitioning criteria, or a campaign with listings added by reference. A listing belongs to only one effective campaign.

Constraints: The currently supported funding model is the cost per sale (CPS). Inventory is partitioned based on the following dimensions:

  • eBay-defined item categories
  • Seller-defined item categories
  • Item price range
  • Brand
  • Fulfillment cost
  • Item condition

. . .

Entity relationships diagram



HTTP request: POST /sell/marketing/v1/ad_campaign

Authorization (OAuth 2.0 scope): . . .

We followed pragmatic RESTful principles with standard implementations around cross-cutting concerns: error handling, security, behavioral tracking, and operational metrics. APIs represent the consumer’s view of the capabilities, and the URI names must be meaningful to developers. Our URI pattern takes a consumer-centric approach by providing consistent, predictable, and understandable names across APIs. This pattern makes an API intuitive and easy to discover and consume. In most cases, the new APIs use JSON for resource representations. It is compact and easy to parse and translate. For certain use cases, supporting additional formats is straightforward, since our RESTful architecture style leaves room for such flexibility. So far, we have managed to stick to standard formats and media types. The OAuth 2.0 protocol is leveraged to address security and data privacy concerns. Here, the challenge was to balance the need for fine-grained scopes that protect data and activities against the overhead of managing the scope policies.
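To make the pattern concrete, a campaign-creation call could look like the example below. The field names, values, and scope shown here are illustrative only; they are not taken from the published contract.

POST /sell/marketing/v1/ad_campaign
Authorization: Bearer <access token granted a marketing scope>
Content-Type: application/json

{
  "campaignName": "summer-shoes-promo",
  "startDate": "2016-06-01T00:00:00Z",
  "endDate": "2016-06-30T23:59:59Z",
  "fundingStrategy": { "fundingModel": "COST_PER_SALE" },
  "campaignCriterion": {
    "selectionRules": [
      { "brands": ["ExampleBrand"], "minPrice": "10.00", "maxPrice": "50.00" }
    ]
  }
}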

APIs are more than pure design and implementation. They include documentation, technical support, terms of use, and various operational aspects. Bringing transparency to the process through frequent discussions between architects, product owners, and engineering teams was crucial. Getting feedback from technical writers helped us achieve vocabulary consistency across APIs. Above all, every team was aligned on what success means: building APIs that developers will love and want to use.

The road ahead

We delivered modern RESTful APIs that cover a subset of our overall marketplace capabilities and follow industry standards, well-established patterns, and best practices. Still, they are powered by a model that is flexible and extensible enough to capture new opportunities that might come in the future. Our journey is not yet complete. We are engaging customers, listening to feedback, and encouraging adoption of the new APIs, all to bring our new, long-term public API program to reality. Our goal is a large and powerful ecosystem of developer applications that add value and benefits to our buyers and sellers. Finally, we want to continue transforming our business by exposing valuable eBay solutions and capabilities to empower developers.

Griffin — Model-driven Data Quality Service on the Cloud for Both Real-time and Batch Data

Overview of Griffin

At eBay, when people work with big data (in Hadoop or other streaming systems), measuring data quality is a significant challenge. Different teams have built customized tools to detect and analyze data quality issues within their own domains. As a platform organization, we look for a platform approach to commonly occurring patterns. As such, we are building a platform to provide shared infrastructure and generic features to solve common data quality pain points. This will enable us to build trusted data assets.

Currently it is very difficult and costly to validate data quality when we have large volumes of related data flowing across multiple platforms (streaming and batch). Take eBay’s Bullseye Personalization Platform as an example: every day we have to validate the data quality for ~600M records. Data quality often becomes an enormous challenge in this complex environment and at this massive scale.

Our investigation found the following gaps at eBay:

  • No end-to-end unified view of data quality from multiple data sources to target applications that takes account of data lineage. This results in a long delay in identifying and fixing data quality issues.
  • No system to measure data quality in streaming mode through self-service. The need is for a system with a simple tool for registering data assets, defining data quality models, visualizing and monitoring data quality, and alerting teams when an issue is detected.
  • No shared platform and API service. Each team should not have to apply and manage its own hardware and software infrastructure to solve this common problem.

With these needs in mind, we decided to build Griffin, a data quality service that aims to solve these shortcomings. Griffin is an open-source solution for validating the quality of data in an environment with distributed data systems, such as Hadoop, Spark, and Storm. It creates a unified process to define, measure, and report quality for the data assets in these systems. You can see Griffin’s source code at its home page on GitHub.


Griffin provides the following key capabilities:

  • Accuracy measurement: Assessment of the accuracy of a data asset compared to a verifiable source
  • Data profiling: Statistical analysis and assessment of data values within a data asset for consistency, uniqueness, and logic
  • Anomaly detection: Pre-built algorithmic functions for the identification of events that do not conform to an expected pattern in a data asset
  • Visualization: Dashboards that can report the state of data quality

Key benefits

  • Real-time: The data quality checks can be executed in real time to detect issues faster.
  • Extensible: The solution can work with multiple data systems.
  • Scalable: The solution is designed to work on large volumes of data. It currently runs on ~1.2 PB of data.
  • Self-serviceable: The solution provides a simple user interface to define new data assets and rules. It also allows users to visualize the data quality dashboards and personalize their view of the dashboards.

System process

Griffin has been deployed at eBay and is serving major data systems. It takes a platform approach to providing generic features to solve common data quality validation pain points. To detect data quality issues, the key process is as follows.

  1. The user registers the data asset.
  2. The Model Engine creates a data quality model for the data asset.
  3. The Model Engine calculates metrics.
  4. Any data quality issue is reported through email or the web portal.

The following BPMN (Business Process Model and Notation) diagram illustrates the system process.

Business Process Model and Notation diagram for Griffin

The following sections describe each step in detail.

Registering the data asset

The user can register the data set to be used for a data quality check. The data set can be batch data in an RDBMS (for example, Teradata), a Hadoop system, or near real-time streaming data from Kafka, Storm, and other real-time data platforms. Normally, some basic information should be provided for the data asset, including name, type, schema definition, owner, and other items.
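As a rough illustration of the kind of metadata involved, the sketch below models a registration request as Scala types. The class and field names are our own invention for this example, not Griffin's actual API.

// Hypothetical Scala types (not Griffin's actual API) sketching the metadata
// captured when a data asset is registered.
case class FieldDef(name: String, dataType: String)

sealed trait AssetType
case object HiveTable extends AssetType          // batch source (Hadoop/Teradata)
case object KafkaTopic extends AssetType         // near real-time source

case class DataAsset(
  name: String,
  assetType: AssetType,
  schema: Seq[FieldDef],
  owner: String
)

val asset = DataAsset(
  name = "bullseye.user_profile",                // hypothetical asset name
  assetType = HiveTable,
  schema = Seq(FieldDef("user_id", "string"), FieldDef("email", "string")),
  owner = "data-platform-team"
)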

Creating the model

After the data asset is ready, the user can create a data quality model to define the data quality rules and metadata. We can define models for different data quality dimensions, such as accuracy, data profiling, anomaly detection, validity, timeliness, and so on.

Executing the model

The model or rule is executed automatically by the Model Engine; for streaming data, sample data quality validation results are available within a few seconds. The section “Data quality model design” describes how the Model Engine is designed and executed.

Calculating on Spark

The models run on Spark. They can calculate data quality values for both real-time and batch data, and they handle large-scale data in a timely fashion.

Generating the metrics value

After the data quality values are calculated, the metrics value is generated based on the calculation results and persisted in the MongoDB database.

Notifying by email

If any metrics value is below its threshold, an email notification is triggered and the end user is notified as soon as any data quality issue occurs.

Web portal and metrics display

Finally, all metrics values are displayed in the web portal, so that the user can analyze the data quality results through Griffin’s built-in visualization tool and then take action.

System architecture

To accomplish this process, we designed three layers for the entire system, as shown in the following architecture design diagram:

  • Data collection and processing layer
  • Back-end service layer
  • User interface

Griffin Architecture Design diagram

Data collection and processing layer

The key component of this layer is our Model Engine. Griffin is a model-driven solution: the user chooses among various data quality dimensions to run validation against a selected target data set, with a source data set serving as the golden reference. Each dimension is backed by a corresponding measurement library on the back end.

We support two kinds of data sources: batch data and real-time data. For batch mode, we can collect the data source from our Hadoop platform by various data connectors. For real-time mode, we can connect with messaging systems like Kafka to achieve near real-time analysis. After retrieving the data, the Model Engine computes data quality metrics in our Spark cluster.
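As an illustration of how a streaming source can feed a Spark computation, the sketch below reads records from a Kafka topic with Spark Structured Streaming and computes a toy per-source metric. The broker address, topic name, and metric are hypothetical, and Griffin's own connectors and Model Engine differ in the details.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("dq-streaming-sketch").getOrCreate()
import spark.implicits._

// Read raw records from a Kafka topic (hypothetical broker and topic names).
val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "kafka:9092")
  .option("subscribe", "target-events")
  .load()
  .selectExpr("CAST(value AS STRING) AS record")

// Toy metric: how many empty (malformed) records have been seen so far.
val emptyRecordCounts = events
  .withColumn("isEmpty", when(length(trim($"record")) === 0, 1).otherwise(0))
  .agg(sum($"isEmpty").as("empty_records"), count(lit(1)).as("total_records"))

emptyRecordCounts.writeStream
  .outputMode("complete")
  .format("console")
  .start()
  .awaitTermination()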

Back-end service layer

On the back-end service layer, we have three key components.

  • The Core Service is responsible for metadata management, such as model definition, subscription management, user customization, and so on.
  • The Job Scheduler is responsible for scheduling the jobs, interacting with Model Engine, saving metrics values, sending email notifications, etc.
  • RESTful web services accomplish all the functions of Griffin, such as registering data sets, creating data quality models, publishing metrics, retrieving metrics, and adding subscriptions. Developers can develop their own user interfaces using these web services.

User interface

We have a built-in visualization tool for Griffin. It’s a web front-end application that leverages AngularJS and eCharts to give you an effective tool for performing data quality activities. Here are some screenshots.

Griffin visualization UI showing Data Quality Metrics Heat Map

Griffin visualization UI showing multiple performance graphs

Besides the built-in UI, developers can easily develop other kinds of user interfaces by calling the RESTful services provided by Griffin.

Data quality model design

Currently, Griffin supports three types of models:

  • Accuracy
  • Data profiling
  • Anomaly detection

Accuracy provides the measurement of the accuracy rate for a data asset. Data profiling provides a way to perform data assessment by investigating the characteristics of subject data sets. Anomaly detection provides the ability to predict data issues by applying some mathematical algorithms.


Accuracy

Given a data set, does the target data set accurately represent the “real-world” values it is expected to represent? We can define the “real-world” values as a source of truth or golden reference data set, which could come from upstream after some data processing logic, directly from the user’s requirements, or from a third party’s certified data.

Once we have a golden data set and a target data set, along with mapping rules that define how to compare them, we can measure the accuracy of the target data set.

For example, if a source file has 100 records, but in the target file only 95 records exactly match with records in the source file, then the accuracy rate is 95/100 * 100% = 95.00%.


Creating an accuracy model takes three steps:

  1. The user defines the golden data set (as the source of truth). In our solution, the user can register the golden data set first or just select an existing one in the next step.
  2. The user defines mapping rules between the target data set and the golden data set. In our solution, the user can define mapping rules by selecting corresponding columns (fields) in the UI page.
  3. The user submits the job, and back-end jobs calculate the accuracy model.

Back-end implementation

This section describes how the back end measures the accuracy dimension of a target data set T, given the source of truth as golden data set S.

To measure the accuracy of target data set T, the basic approach is to calculate the discrepancy between the target and source data sets by going through their contents and examining whether all fields match exactly, as shown below:

Accuracy = Count(source.field1 == target.field1 && source.field2 == target.field2 && source.field3 == target.field3 && ...source.fieldN == target.fieldN)/Count(source)

Our two data sets are too big to fit on one box, so our approach is to leverage the MapReduce programming model and distributed computing.

The real challenge is to make this comparison algorithm generic enough to relieve data analysts and data scientists of coding burdens, while keeping it flexible enough to cover most accuracy requirements.

The conventional way is to calculate this with SQL joins, for example with Hive scripts, but such a SQL-based solution can be improved because it does not take into account the particular characteristics of the source and target data sets in this context.

Our approach is to provide a generic accuracy model that takes those characteristics of the source and target data sets into consideration.

Our implementation is in Scala, leveraging Scala’s declarative capability to accommodate various requirements and running in a Spark cluster.
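A minimal sketch of that idea, expressed with Spark DataFrames, is shown below. The table and column names are hypothetical, and Griffin's Model Engine generalizes this with configurable mapping rules rather than a fixed column list.

import org.apache.spark.sql.{DataFrame, SparkSession}

// Count source records that have an exactly matching record in the target,
// then divide by the total number of source records (the formula above).
def accuracy(source: DataFrame, target: DataFrame, matchCols: Seq[String]): Double = {
  val sourceCount = source.count()
  if (sourceCount == 0) 1.0
  else {
    val matched = source.join(target, matchCols, "left_semi").count()
    matched.toDouble / sourceCount
  }
}

val spark = SparkSession.builder.appName("accuracy-sketch").getOrCreate()
val golden = spark.table("golden_orders")        // hypothetical golden data set
val target = spark.table("replicated_orders")    // hypothetical target data set
val rate = accuracy(golden, target, Seq("order_id", "buyer_id", "amount"))
println(f"accuracy = ${rate * 100}%.2f%%")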

Data profiling

Profiling types

Data quality issues can be identified via different data profiling types. Profiling results can be compared with documented expectations, and an alert report is triggered if the result doesn’t meet the expectations.

There are three types of profiling provided in our framework:

  1. Simple statistics generates null, unique, and duplicate count profiles. For example, the null count profile reports the count of null values in the selected column. It helps the customer to identify problems in the data, such as an unexpectedly high ratio of null values in a column. An example is to profile an Email Address column and discover an unacceptably high volume of missing email addresses.
  2. Summary statistics generates max, min, mean, and median number profiles. For example, the value of Age usually should be greater than 0 and less than 150, so the user can do range checking with the max/min profile on the Age column.
  3. Advanced statistics generates the frequency of pattern profiles, expressed with regular expressions. For example, a pattern profile of a United States Zip Code column might produce the regular expressions \d{5}-\d{4}, \d{5}, and \d{9}. If you see other formats, your data likely contains values that are not valid or in an incorrect format.
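To make the first and third profiling types above concrete, here is a small Spark sketch that computes a null/unique count profile and a value-pattern frequency profile. The table and column names are hypothetical, and the pattern column here uses a simplified character-class encoding rather than full regular expressions.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("profiling-sketch").getOrCreate()
val users = spark.table("dw_users")              // hypothetical data asset

// Simple statistics: total, null, and unique counts for the email column.
users.agg(
  count(lit(1)).as("total"),
  sum(when(col("email").isNull, 1).otherwise(0)).as("null_count"),
  countDistinct(col("email")).as("unique_count")
).show()

// Pattern frequency: collapse digits and letters into placeholder characters so
// that "95125-1234" and "10001-0001" count toward the same pattern bucket.
users
  .withColumn("pattern",
    regexp_replace(regexp_replace(col("zip_code"), "[0-9]", "d"), "[A-Za-z]", "a"))
  .groupBy("pattern")
  .count()
  .orderBy(desc("count"))
  .show()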

Back-end implementation

Our data profiling mechanism is based on the column summary statistics functions provided in Spark MLlib, which let us compute all basic statistics for numeric columns in a single pass over the data.
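A minimal sketch of those MLlib column summary statistics, on a small hypothetical input, looks like this:

import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}

def summarize(sc: SparkContext): Unit = {
  // Each vector holds the numeric columns of one record, e.g. (price, quantity).
  val rows = sc.parallelize(Seq(
    Vectors.dense(19.99, 2.0),
    Vectors.dense(5.49, 1.0),
    Vectors.dense(120.00, 3.0)
  ))

  // A single pass yields mean, max, min, variance, and non-zero counts per column.
  val summary: MultivariateStatisticalSummary = Statistics.colStats(rows)
  println(s"mean = ${summary.mean}, max = ${summary.max}, min = ${summary.min}")
}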

Key benefits

  • Fast profiling of big data, since our framework is based on Spark
  • Auto-scheduling for data profiling after model creation
  • Visualization including a history trend

Anomaly detection

The goal of anomaly detection is to identify cases that are unusual within data that is seemingly homogeneous. Anomaly detection is an important tool for detecting data quality issues.

For now, we have implemented some statistical detection functions using the Bollinger Band and MAD (Mean Absolute Deviation) algorithms to find data sets whose total count falls outside the expected region. The expected region is calculated based on the history trend of each day’s total count.

Our anomaly detection also allows users to adjust parameters in the algorithm as needed and dynamically show the results after changing the parameters, so that anomaly detection is customized for the specific user.

Back-end implementation

Let’s take MAD as an example. The MAD of a data set is the average distance between each data value and the mean. These steps calculate the MAD:

  1. Find the mean (average).
  2. Find the difference between each data value and the mean.
  3. Take the absolute value of each difference.
  4. Find the mean (average) of these differences.

The formula for MAD (Mean Absolute Deviation) is:

MAD = (1/n) × Σ |xᵢ − x̄|, where x̄ is the mean of the n data values.

The calculation of Bollinger Bands is similar to that of MAD. For more information, refer to Wikipedia’s article about Bollinger Bands.
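The sketch below illustrates the MAD calculation and the kind of expected-region check described above. The threshold multiplier k and the sample counts are illustrative choices for this example, not the parameters Griffin actually uses.

// Mean Absolute Deviation of a series: average distance from the mean.
def mad(values: Seq[Double]): Double = {
  val mean = values.sum / values.size
  values.map(v => math.abs(v - mean)).sum / values.size
}

// Flag today's total count if it falls outside the expected region
// [mean - k * MAD, mean + k * MAD] built from the historical counts.
def isAnomalous(history: Seq[Double], today: Double, k: Double = 3.0): Boolean = {
  val mean = history.sum / history.size
  val deviation = mad(history)
  today < mean - k * deviation || today > mean + k * deviation
}

val dailyCounts = Seq(980.0, 1010.0, 1005.0, 995.0, 1002.0, 990.0, 1008.0)
println(isAnomalous(dailyCounts, today = 655.0))   // true: far below the expected region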

Griffin at eBay

Griffin is deployed in production at eBay and provides centralized data quality service for several eBay systems (for example, the Bullseye Personalization Platform, Hadoop data sets, and site-speed data). Griffin validates more than 800M records daily.

What’s Next?

  • We will introduce Griffin to more eBay systems, making it the unified data quality platform within eBay.
  • We will support more data quality dimensions, such as validity, completeness, uniqueness, timeliness, and consistency.
  • We will develop more machine-learning algorithms to detect even deeper relationships within data content and find data quality issues.

Monitoring Anomalies in the Experimentation Platform


The Experimentation platform at eBay runs around 1,500 experiments that together process hundreds of terabytes of reporting data contained in millions of files, using Hadoop infrastructure and thousands of computing resources. The report generation process covers well over 200 metrics, and it enables millions of customers to experience small and large innovations that help them buy and sell products in different countries, in different currencies, and with different payment mechanisms, a little better every day.

The Experimentation reporting platform at eBay is developed using Scala, Scoobi, Apache Hive, Teradata, MicroStrategy, InfluxDB, and Grafana.


Our user-behavior tracking platform enables us to gain insights into how customers behave and how products are used, and to unlock the information needed to build the right strategies for improving conversion, deepening engagement, and maximizing retention.

The eBay platform contains hundreds of applications that enable users to search for products, view specific products, and engage in commerce. These applications run on numerous servers in data centers across the world, and they log details of every event that occurs between a user and eBay (in a specific application), such as activity (view product, perform search, add to cart, and ask questions, to name a few) and transactions (BID, BIN, and Buyer Offer, for example), including the list of experiments that a user is qualified for and has experienced during that event. Tracking data is moved from application servers to distributed systems like Hadoop and Teradata for post-processing, analytics, and archival.


Any experiment that runs on the Experimentation platform can experience anomalies that need to be identified, monitored, and rectified in order to achieve the goal of that experiment.

  • Traffic corruption. An experiment is set up to ensure that it receives an approximately equal share of unique visitors, identified by a GUID (globally unique identifier) or, for signed-in users, a UID, throughout its life cycle. At times, this traffic share is significantly skewed between the experiment (the new experience) and the control (the default experience), potentially resulting in incorrect computation of bankable and non-bankable metrics. This is a critical anomaly and is carefully monitored.
  • Tag corruption. The vast amounts of user activity collected by eBay application servers include information (tags) about the related list of experiments that a user is qualified for. Any kind of corruption or data loss can significantly hamper metrics computed for any experiment.

Here are some typical reasons for these anomalies:

  • GUID reset: GUIDs are stored on browser cookies. Any kind of application error or mishandling of browser upgrades can cause GUID resets against either the experiment or the control, resulting in traffic corruption.
  • Cache refresh: eBay application servers maintain caches of experiment configurations. A software or hardware glitch can cause the caches on these servers to go out of sync. This problem can lead to both traffic and tag corruption.
  • Application anomalies: Web pages are served by application servers. These application servers invoke several experimentation services to determine the list of experiments that a user is qualified for, based on several factors. Application servers can incorrectly log this information, thereby corrupting essential tags because of incorrect encoding, truncation, and application errors. This problem results in both traffic and tag corruption.

flow chart showing the logical flow between users and the experimentation back end

Monitoring anomalies

Anomalies in experiments are detected daily, ingested into InfluxDB, an open-source time-series database, and visualized with Grafana.

InfluxDB is an open-source database, specifically designed to handle time-series data with high availability and high performance requirements. InfluxDB installs in minutes without external dependencies, yet is flexible and scalable enough for complex deployments. InfluxDB offers these features, among many others.

  • InfluxDB possesses on-the-fly computational capabilities that allow data to become available within milliseconds of its capture.
  • InfluxDB can store billions of data points for historical analysis.
  • InfluxDB aggregates and precomputes time-series data before it is written to disk.

Grafana provides a powerful and elegant way to create, explore, and share dashboards and data with your team. Grafana includes these features among many others:

  • Fast and flexible client-side graphs with a multitude of options
  • Drag-and-drop panels, where you can change row and panel widths easily
  • Support for several back-end time series databases, like InfluxDB, Prometheus, Graphite, and Elastic Search, with the capability to plug in custom databases
  • Shareable links to dashboards or full-screen panels

The Experimentation reporting platform leverages both InfluxDB and Grafana to monitor anomalies in experiments. It supports the following features.

Home page

The home page gives a bird’s-eye view of all anomalies, broken down at various levels such as channel, business (application), and country. Every anomaly has a certain threshold beyond which it needs to be further analyzed. The Gauge panel in Grafana enables us to do just that.

animated gif showing an example of the home page with a dashboard for multiple anomalies

Drill-Down view

Any anomaly can be further analyzed in a drill-down view that shows details of that anomaly, which is again broken down at various levels.

animated gif showing an example of selecting the graph for an anomaly and then displaying its enhanced drill-down view

Grafana allows quick duplication of each panel with a view that can be easily modified. The user can edit queries through either a SQL interface or a drop-down interface.

animated gif showing an example of duplicating a single panel and modifying the query for the duplicate


Search

There are several occasions during the triaging process when we need to quickly check whether a given experiment, channel, or country is experiencing any anomalies. The search feature provided by Grafana (through templating) allows us to do just that. The user can type or select from a drop-down to view details of all anomalies for a specific combination of filters.

animated gif showing an example of entering a search string into a search field

Every dashboard can be customized and shared across the organization.

animated gif showing an example of sharing a dashboard

Setup and scale

InfluxDB (v 0.11-1) is installed on a single node, and so is Grafana (v 3.0.2). Each is hosted on the eBay cloud with 45 GB of memory, 60 GB of disk space, and Ubuntu 14.04. Each day, around 2,000 points are ingested into InfluxDB using a Scala client, with an ingestion time of a few seconds. Currently, the system contains seven months of historical anomaly data, taking around 1.5 GB of disk space in InfluxDB and consuming approximately 19 GB of RAM. Anomaly data is archived on HDFS for recovery in case of system failure.
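For illustration, writing a single anomaly point from Scala over InfluxDB’s HTTP line-protocol endpoint can look like the sketch below. The host, database, measurement, and tag names are hypothetical, and the platform’s actual Scala client differs.

import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

// Write one point to InfluxDB's /write endpoint using the line protocol:
//   measurement,tag=value field=value timestamp(ns)
// Host, database, measurement, and tag names below are hypothetical.
def writePoint(host: String, db: String, line: String): Int = {
  val url = new URL(s"http://$host:8086/write?db=$db")
  val conn = url.openConnection().asInstanceOf[HttpURLConnection]
  conn.setRequestMethod("POST")
  conn.setDoOutput(true)
  val out = conn.getOutputStream
  out.write(line.getBytes(StandardCharsets.UTF_8))
  out.close()
  conn.getResponseCode                              // 204 indicates a successful write
}

val point = "experiment_anomaly,channel=web,country=US traffic_skew=0.37 1469404800000000000"
writePoint("influxdb.internal", "exp_anomalies", point)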

This data set is minuscule compared to the vast amounts of data that InfluxDB can handle, especially when it is set up as a cluster for fault tolerance, a capability that unfortunately is not supported beyond v 0.11-1.


The anomaly monitoring platform is the cornerstone for monitoring anomalies in experiments at eBay. It is becoming a single point for monitoring, sharing, and searching for anomalies in experiments for anyone in the company who runs experiments on the Experimentation platform. Its ability to be self-service (thanks to Grafana) in terms of creating new dashboards for new datasets is what makes it stand out.

There are several measures and metrics that determine whether an experiment is experiencing an anomaly. If the thresholds are breached, the experiment is flagged and a consolidated email notification is sent out. Grafana circles have long discussed when alerting would arrive (winter has come, and so will alerting), and it seems that alerting is indeed coming to Grafana, which will let users set alert thresholds for every monitored metric right from the dashboard.