Types of Content at eBay: Titles


All eBay-generated content is currently translated by our talented localization team, whereas eBay’s user-generated (UG) content is handled by our Machine Translation (MT) engine. It is common knowledge that UG text can get pretty noisy due to typos, non-dictionary terms, etc. At eBay, however, MT deals with more than that. We work with multiple types of UG content — search queries, item titles, and item descriptions — and each presents its own challenges. In the previous post we talked about search queries. This post discusses item titles.

Item Titles

Translating item titles provides our buyers from Russia, Brazil, Spanish Latin America, France, Italy, Germany, and Spain with an option to view eBay search results in their own language. This allows customers to look through pages of results and make an informed decision on which listings to open, because an image alone does not contain enough information. Being able to read and understand item titles is essential to a positive customer experience, which is why we invest a lot of effort into improving the MT engine for titles.

This type of content is very specific and presents a number of challenges for MT.


A title is a summarized item description composed of keywords. The eBay Help article on writing effective titles encourages sellers to omit punctuation and avoid trying to create a grammatically correct sentence. Following these and other tips is supposed to help sellers create a clear picture of an item and a good first impression, so it is important that the MT translation meets the same expectations. However, the lack of syntax and punctuation presents a problem for an MT engine that is normally trained on sentences. If it tries to translate a sequence of nouns, adjectives, and numbers as a sentence, meaning errors are unavoidable. It may start looking for a subject and a predicate and in general for a sentence structure, thus translating adjectives as verbs, moving words around, and so on.

As an example, let’s take a title for a can of paint: “20g Glow in the Dark Acrylic Luminous Paint Bright Pigment Graffiti Party DIY”.


What might go wrong here?

“Glow” may get translated as an imperative verb (as in “Stay in the shaded area!”), and “dark acrylic” as a noun phrase with “acrylic” as the noun, and that is just part of the title. A similar transformation may happen with polysemous words or words that belong to different parts of speech: “can”, “paint”, “party”, etc. The result of such a translation may describe a completely different item.


This is closely related to the previous issue. Segmenting a title and correctly identifying its semantic units is of utmost importance for machine translation. Take “Gucci fake snake leather purse”: with incorrect segmentation, we may get a translation of a “Gucci fake” instead of the intended “fake snake leather”. Such translations are the most dangerous because they sound correct and believable yet present misleading information, which in the end may leave both the buyer and the seller unhappy with the experience.

To address these major issues, the science team created an engine just for item titles; it is trained on separate data sets. In addition, they have been working on a named entity recognition (NER) algorithm that identifies semantic units in a title before it goes into the MT engine for translation.
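To make the idea concrete, here is a toy Scala sketch of dictionary-based chunking. It illustrates the goal, not eBay’s actual NER algorithm, and the unit list is invented: the point is that a phrase like “glow in the dark” should reach the engine as one semantic unit rather than four unrelated words.

```scala
// A toy sketch, not eBay's actual NER component: greedily match the longest
// known multi-word unit at each position in the title.
object TitleChunker {
  // Invented dictionary of multi-word semantic units.
  val knownUnits = Set("glow in the dark", "snake leather", "shoulder bag")

  def chunk(title: String): List[String] = {
    def go(rest: List[String]): List[String] = rest match {
      case Nil => Nil
      case _ =>
        // Try the longest candidate span first (up to 4 words here).
        val unit = (math.min(4, rest.length) to 1 by -1)
          .map(n => rest.take(n).mkString(" "))
          .find(knownUnits.contains)
        unit match {
          case Some(u) => u :: go(rest.drop(u.split(" ").length))
          case None    => rest.head :: go(rest.tail)
        }
    }
    go(title.toLowerCase.split("\\s+").toList)
  }

  def main(args: Array[String]): Unit =
    println(chunk("20g Glow in the Dark Acrylic Luminous Paint"))
    // List(20g, glow in the dark, acrylic, luminous, paint)
}
```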


Sellers tend to use multiple synonyms in a title assuming this will increase the chances of matching search queries and coming up high in search results (which is a common misconception). For MT this means several things:

A chain of adjacent nouns or adjectives that bear no relation to each other

The machine needs to learn to translate them independently of each other. This is similar to the first issue described above, because the engine may try to create agreement where there should be none.


For example: “Baby Toddler Kids Child Mini Cartoon Animal Backpack Schoolbag Shoulder Bag”.

We see four synonyms for the age reference and three synonyms for the item itself. The age reference terms are not all adjectives, nor can all of them be translated as adjectives. Even a human translator would have to get creative and produce something like “for a baby/toddler, kids’, child’s”, because we could not simply leave all four of them as nouns; it would sound too abrupt and possibly confusing. The task is much more challenging for a machine. Not only should it avoid creating noun phrases (“Kids Child” may turn into “a kid’s child”), it also needs to rephrase or insert prepositions where necessary (baby toddler child -> for baby, toddler, child; kids -> kids’). The best way to approach this varies depending on the target language.

Agreement with the head noun

In our example, there are three synonyms for a head noun: Backpack – Schoolbag – Shoulder Bag. What if they are of different gender in the target language? Which one should the adjectives agree with? A human translator would probably pick the first one, but MT may not think the same way. Here is a bigger challenge: the head noun does not immediately follow the adjectives describing it. In our example there are two other nouns between the attributes “Kids Child” and the head noun “Backpack”. The machine is supposed to figure out that “kids” describes “backpack”, not “cartoon” or “animal”. As you can imagine, however, the most logical decision for a machine would be to connect “kids” with “cartoon”.

Agreement plays a very important role in translating item titles, because it provides a customer with a description of features and qualities of the item. If you connect an attribute with the wrong noun, it will modify an incorrect object and produce an overall misleading translation. In our example, with the incorrect agreement, a user will read: “backpack with a kids’ cartoon animal”, which is in essence a different item than a “kids’ backpack with a cartoon animal”. One may argue that an image would be a clear indication that the item is a kids’ backpack. Unfortunately, a picture is not always a reliable source of information. In our case, there are similar backpacks for adults, which is why an accurate translation will make a difference.



Sellers use multiple acronyms to save space and fit as much information into a title as possible. For MT this presents several challenges.

  • Rare, unknown acronyms or acronyms that sellers made up on the spot. Gathering more training data and compiling additional lists of expanded out-of-vocabulary (OOV) acronyms is helping address that.
  • Polysemous acronyms that have different translations in different categories. The most challenging acronyms are the ones that have more than one meaning in the same category. For example, “RN” appears in Clothing, Shoes and Accessories as “registered nurse”, “Rapa Nui”, “Rusty Neal”, and as part of model names for Nike, Hugo Boss, A&F, and other brands. A category-aware lookup (sketched after this list) can resolve cross-category cases, but same-category ambiguity needs more context.
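For illustration, here is a hypothetical sketch of a category-aware acronym table; the entries are invented, not eBay’s actual inventory. A lookup keyed on (acronym, category) resolves cross-category polysemy, while same-category ambiguity (like “RN” above) needs context that a static table cannot encode.

```scala
// A hypothetical sketch of category-aware acronym expansion. The entries are
// invented for illustration, not eBay's actual acronym inventory.
object AcronymExpander {
  val expansions: Map[(String, String), String] = Map(
    ("NWT",  "Clothing, Shoes & Accessories") -> "new with tags",
    ("RN",   "Health & Beauty")               -> "registered nurse",
    ("BNIB", "Toys & Hobbies")                -> "brand new in box"
  )

  // Unknown (acronym, category) pairs are left untranslated rather than guessed.
  def expand(acronym: String, category: String): String =
    expansions.getOrElse((acronym, category), acronym)

  def main(args: Array[String]): Unit = {
    println(expand("NWT", "Clothing, Shoes & Accessories")) // new with tags
    // A key like ("RN", "Clothing, Shoes & Accessories") would need more
    // context than this table can encode, as discussed above.
  }
}
```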

Writings and names of songs/music bands/movies/video games

This is common content for certain categories. Singling a movie or song title out of the rest of the string may be difficult because there is often no contextual information pointing to the fact that it is a movie or a song. This is not much of a problem in the DVD or Music category, but quite often you will find a reference to a movie title or a music band name in other categories such as Collectables or Clothing. It is also common for sellers to quote the writing printed on the item they are selling. Ideally, we would want the writing left as is, so that the customer would know exactly what the item depicts. As you can imagine, however, literally anything can be written on a t-shirt or a poster, which is why it is very difficult for a machine to differentiate such a quotation from the actual item description. In such cases a user would have to rely on the quality and size of an image, which may not be the best on the search results page.


In this example, “New York Vermont Quebec” is part of the poster design, but it is barely visible. In the text of the item title, however, it may be interpreted as locations of the poster, places it originally came from, etc. Identifying this as verbatim writing, thus keeping it in English, would be a very difficult task for an MT engine, but it would clearly benefit an eBay customer.


With so many aspects to keep in mind, training the engine to translate eBay item titles is certainly a challenge. Our teams of scientists and linguists are actively and successfully working on ways to improve the quality of the training data and the MT output.

The API Journey: Or How We Built a New Family of Modern Services


An API — or application programming interface — is an intermediary that enables applications to interact. It is a contract that specifies how applications talk to one another. Further, an API creates a separation between a service provider and its consumers. Essentially, it decouples their implementations. As long as the contract stays intact, API providers may continue changing their code and underlying dependencies without destabilizing clients.

APIs are a big deal. After dealing with SOAP-based legacy APIs for years, eBay started a journey to deliver a new, modern family of APIs for sellers and buyers. Our principal goal was to design a set of interfaces that would meet business objectives, attract developers, replace our legacy APIs, and be long-lived. This is not an easy job. As I mentioned, APIs are a contract, and as such, they cannot be changed in ways that break existing integrations. APIs should evolve and grow with the business, so they must also be expandable and flexible. Now, that is hard.

Our challenge was to create a vision for the API, plan ahead, and design a stable contract that would last for years, even as we add business capabilities. Here is how we did it.


Governance

“Ensuring and validating that assets and artifacts within the architecture are acting as expected and maintaining a certain level of quality.” — Gartner

To achieve consistency across the APIs, we followed a governance process and compliance model. One of our most important goals was improving the quality of the APIs by defining and enforcing standards and policies. We established a governance process that was objective, transparent, manageable, and explicit. Our compliance model for web services is platform- and tenant-agnostic and fits well into eBay’s overall API maturity model. Levels of compliance are specified by a set of characteristics and capabilities that may be measured and assessed. This helps to identify and quantify a realistic target maturity for our APIs in a given period. (And, it is testable!)

Design and beyond

“Unfortunately, people are fairly good at short-term design, and usually awful at long-term design.” — Roy Fielding

First and foremost, the API blueprint is the starting point. At eBay, a blueprint describes the API design in enough detail to verify, implement, maintain, and extend capabilities in the future. Designing APIs has many analogies to building a house. Without a proper blueprint, pressure to deliver on time often leads to a poor design. To stretch the house-construction analogy: working without a blueprint invites shortcuts, like building a bathroom in the kitchen because the plumbing is already there. The challenge lies in finding a balance between our agile product development methodology and the time needed to come up with a detailed design. Implementation becomes straightforward once there is a blueprint and a clear understanding of what needs to be done.

For our new family of APIs, we followed our interface design method (“IDM”). The IDM is the process of arriving at an underlying model and a stable contract. It starts with capturing use cases by specifying actors, concepts, and actions, and then deriving entity relationships. Further, nouns are identified from the entities and verbs from the actions. The final phase of the IDM process is determining resource representation and specifying authorization details.

Use cases

Actor: Seller

Use case: Seller creates an advertising campaign

Description: The seller creates an advertising campaign and specifies a name, effective dates, a funding strategy, and a criterion that defines inventory partitioning. The advertising campaign is either a rule-based campaign, where listings are auto-selected according to the specified inventory partitioning criteria, or a campaign with listings added by reference. A listing belongs to only one effective campaign.

Constraints: The currently supported funding model is cost per sale (CPS). Inventory is partitioned based on the following dimensions:

  • eBay-defined item categories
  • Seller-defined item categories
  • Item price range
  • Brand
  • Fulfillment cost
  • Item condition
. . .

Entity relationships diagram



HTTP request: POST /sell/marketing/v1/ad_campaign
Authorization (OAuth 2.0 scope): https://api.ebay.com/oauth/api_scope/sell.marketing
. . .

We followed pragmatic RESTful principles with standard implementations around cross-cutting concerns: error handling, security, behavioral tracking, and operational metrics. APIs represent the consumer’s view of the capabilities, and the URI names must be meaningful to developers. Our URI pattern takes a consumer-centric approach by providing consistent, predictable, and understandable names across APIs. This pattern makes an API intuitive and easy to discover and consume. In most cases, the new APIs use JSON for resource representations. It is compact and easy to parse and translate. For certain use cases, supporting additional formats is straightforward, since our RESTful architecture style leaves room for such flexibility. So far, we have managed to stick to standard formats and media types. The OAuth 2.0 protocol is leveraged to address security and data privacy concerns. Here, the challenge was to balance fine-grained scopes that protect data and activities against the cost of managing the scope policies.
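As a minimal illustration, here is a sketch of calling the campaign-creation resource from the table above, using Scala and the standard Java HTTP client. The JSON field names in the body are assumptions for illustration, not the official request schema; the OAuth user token must carry the sell.marketing scope.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object CreateCampaign {
  def main(args: Array[String]): Unit = {
    // Hypothetical payload; field names are illustrative, not the official schema.
    val body =
      """{
        |  "campaignName": "spring-backpacks",
        |  "fundingStrategy": { "fundingModel": "COST_PER_SALE" }
        |}""".stripMargin

    // A user access token granted the sell.marketing scope.
    val token = sys.env.getOrElse("EBAY_OAUTH_TOKEN", "<user token>")

    // POST /sell/marketing/v1/ad_campaign with OAuth 2.0 bearer authorization.
    val request = HttpRequest.newBuilder()
      .uri(URI.create("https://api.ebay.com/sell/marketing/v1/ad_campaign"))
      .header("Authorization", s"Bearer $token")
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(body))
      .build()

    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    println(s"${response.statusCode()} ${response.body()}")
  }
}
```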

APIs are more than pure design and implementation. They include documentation, technical support, terms of use, and various operational aspects. Bringing transparency to the process through frequent discussions between architects, product owners, and engineering teams was crucial. Getting feedback from technical writers helped to achieve vocabulary consistency across APIs. Most importantly, all of the teams were aligned on what success means: building APIs that developers will love and want to use.

The road ahead

We delivered modern RESTful APIs that cover a subset of our overall marketplace capabilities and follow industry standards, well-established patterns, and best practices. Still, they are powered by a model that is flexible and extensible enough to capture new opportunities that might come in the future. Our journey is not yet complete. We are engaging customers, listening to feedback, and encouraging adoption of the new APIs, all to bring our new, long-term public API program to reality. Our goal is a large and powerful ecosystem of developer applications that add value and benefits to our buyers and sellers. Finally, we want to continue transforming our business by exposing valuable eBay solutions and capabilities to empower developers.

Griffin — Model-driven Data Quality Service on the Cloud for Both Real-time and Batch Data

Overview of Griffin

At eBay, when people use big data (Hadoop or other streaming systems), measurement of data quality is a significant challenge. Different teams have built customized tools to detect and analyze data quality issues within their own domains. As a platform organization, we take a platform approach to commonly occurring patterns: we are building a platform to provide shared infrastructure and generic features that solve common data quality pain points. This will enable us to build trusted data assets.

Currently it is very difficult and costly to validate data quality when we have large volumes of related data flowing across multiple platforms (streaming and batch). Take eBay’s Bullseye Personalization Platform as an example: every day we have to validate the data quality for ~600M records. Data quality often becomes an enormous challenge in this complex environment and at this massive scale.

Our investigation found the following gaps at eBay:

  • No end-to-end unified view of data quality from multiple data sources to target applications that takes account of data lineage. This results in a long delay in identifying and fixing data quality issues.
  • No system to measure data quality in streaming mode through self-service. What is needed is a simple tool for registering data assets, defining data quality models, visualizing and monitoring data quality, and alerting teams when an issue is detected.
  • No shared platform and API service. Each team should not have to apply and manage its own hardware and software infrastructure to solve this common problem.

With these needs in mind, we decided to build Griffin, a data quality service that aims to solve these shortcomings. Griffin is an open-source solution for validating the quality of data in an environment with distributed data systems, such as Hadoop, Spark, and Storm. It creates a unified process to define, measure, and report quality for the data assets in these systems. You can see Griffin’s source code at its home page on GitHub.


Key features

  • Accuracy measurement: Assessment of the accuracy of a data asset compared to a verifiable source
  • Data profiling: Statistical analysis and assessment of data values within a data asset for consistency, uniqueness, and logic
  • Anomaly detection: Pre-built algorithmic functions for the identification of events that do not conform to an expected pattern in a data asset
  • Visualization: Dashboards that can report the state of data quality

Key benefits

  • Real-time: The data quality checks can be executed in real time to detect issues faster.
  • Extensible: The solution can work with multiple data systems.
  • Scalable: The solution is designed to work on large volumes of data. It currently runs on ~1.2 PB of data.
  • Self-serviceable: The solution provides a simple user interface to define new data assets and rules. It also allows users to visualize the data quality dashboards and personalize their view of the dashboards.

System process

Griffin has been deployed at eBay and is serving major data systems. It takes a platform approach to providing generic features to solve common data quality validation pain points. To detect data quality issues, the key process is as follows.

  1. The user registers the data asset.
  2. The Model Engine creates a data quality model for the data asset.
  3. The Model Engine calculates metrics.
  4. Any data quality issue is reported through email or the web portal.

The following BPMN (Business Process Model and Notation) diagram illustrates the system process.

Business Process Model and Notation diagram for Griffin

The following sections describe each step in detail.

Registering the data asset

The user can register the data set to be used for a data quality check. The data set can be batch data in an RDBMS (for example, Teradata), a Hadoop system, or near real-time streaming data from Kafka, Storm, and other real-time data platforms. Normally, some basic information should be provided for the data asset, including name, type, schema definition, owner, and other items.

Creating the model

After the data asset is ready, the user can create a data quality model to define the data quality rules and metadata. We can define models for different data quality dimensions, such as accuracy, data profiling, anomaly detection, validity, timeliness, and so on.

Executing the model

The model or rule is executed automatically by the Model Engine; for streaming data, sample data quality validation results are available within a few seconds. The “Data quality model design” section below describes how the Model Engine is designed and executed.

Calculating on Spark

The models are running on Spark. They can calculate data quality values for both real-time and batch data. Large-scale data can be handled in a timely fashion.

Generating the metrics value

After the data quality values are calculated, the metrics value is generated based on the calculation results and persisted in the MongoDB database.

Notifying by email

If any metrics value is below its threshold, an email notification is triggered and the end user is notified as soon as any data quality issue occurs.

Web portal and metrics display

Finally, all metrics values are displayed in the web portal, so that the user can analyze the data quality results through Griffin’s built-in visualization tool and then take action.

System architecture

To accomplish this process, we designed three layers for the entire system, as shown in the following architecture design diagram:

  • Data collection and processing layer
  • Back-end service layer
  • User interface

Griffin Architecture Design diagram

Data collection and processing layer

The key component of this layer is our Model Engine. Griffin is a model-driven solution: the user can choose various data quality dimensions to execute data quality validation based on a selected target data set or source data set (as the golden reference data). Each dimension has a corresponding library supporting it in the back end for measurements.

We support two kinds of data sources: batch data and real-time data. For batch mode, we can collect the data source from our Hadoop platform by various data connectors. For real-time mode, we can connect with messaging systems like Kafka to achieve near real-time analysis. After retrieving the data, the Model Engine computes data quality metrics in our Spark cluster.

Back-end service layer

On the back-end service layer, we have three key components.

  • The Core Service is responsible for metadata management, such as model definition, subscription management, user customization, and so on.
  • The Job Scheduler is responsible for scheduling the jobs, interacting with Model Engine, saving metrics values, sending email notifications, etc.
  • RESTful web services accomplish all the functions of Griffin, such as registering data sets, creating data quality models, publishing metrics, retrieving metrics, and adding subscriptions. Developers can develop their own user interfaces using these web services.

User interface

We have a built-in visualization tool for Griffin. It’s a web front-end application that leverages AngularJS and eCharts to give you an effective tool for performing data quality activities. Here are some screenshots.

Griffin visualization UI showing Data Quality Metrics Heat Map

Griffin visualization UI showing multiple performance graphs

Besides the built-in UI, developers can easily develop other kinds of user interfaces by calling the RESTful services provided by Griffin.

Data quality model design

Currently, Griffin supports three types of models:

  • Accuracy
  • Data profiling
  • Anomaly detection

Accuracy provides the measurement of the accuracy rate for a data asset. Data profiling provides a way to perform data assessment by investigating the characteristics of subject data sets. Anomaly detection provides the ability to predict data issues by applying some mathematical algorithms.


Accuracy

Given a target data set, does it accurately represent the “real-world” values that it is expected to represent? We can define the “real-world” values as a source of truth, or golden reference data set, which could come from an upstream system after some data processing logic, directly from the user’s requirements, or from a third party’s certified data.

Once we know how to get a golden data set and a target data set, and how to compare the target data set against the golden data set by defining some mapping rules, we can measure the accuracy of the target data set.

For example, if a source file has 100 records, but in the target file only 95 records exactly match with records in the source file, then the accuracy rate is 95/100 * 100% = 95.00%.


Creating an accuracy model takes three steps:

  1. The user defines the golden data set (as the source of truth). In our solution, the user can register the golden data set first or just select an existing one in the next step.
  2. The user defines mapping rules between the target data set and the golden data set. In our solution, the user can define mapping rules by selecting corresponding columns (fields) in the UI.
  3. The user submits the job, and back-end jobs calculate the accuracy model.

Back-end implementation

This section describes how the back end measures the accuracy dimension of a target data set T, given the source of truth as golden data set S.

To measure the accuracy of target data set T, the basic approach is to calculate the discrepancy between the target and source data sets by going through their contents and examining whether all fields match exactly, as shown below:

Accuracy = Count(source.field1 == target.field1
                 && source.field2 == target.field2
                 && ...
                 && source.fieldN == target.fieldN) / Count(source)

Our two data sets are too big to fit in one box, so our approach is to leverage the MapReduce programming model through distributed computing.

The real challenge is how to make this comparison algorithm generic enough to relieve data analysts and data scientists from coding burdens, while at the same time keeping it flexible enough to cover most accuracy requirements.

The conventional way is to calculate this with SQL joins, such as scripts in Hive, but this SQL-based solution can be improved because it does not consider the particular natures of the source and target data sets in this context.

Our approach is to provide a generic accuracy model, after taking into consideration the special natures of the source data set and target data set.

Our implementation is in Scala, leveraging Scala’s declarative capability to accommodate various requirements and running in a Spark cluster.
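Here is a minimal sketch of that calculation on Spark. The schema and the mapping rule (an exact match on all mapped fields, expressed as a left-semi join) are illustrative; in Griffin they are derived from the user-defined model rather than hard-coded.

```scala
import org.apache.spark.sql.SparkSession

// A minimal sketch of the accuracy measurement on Spark; schemas and mapping
// rules are invented for illustration.
object AccuracySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("accuracy-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    // Golden (source of truth) data set S and target data set T.
    val source = Seq((1, "a"), (2, "b"), (3, "c"), (4, "d")).toDF("id", "value")
    val target = Seq((1, "a"), (2, "b"), (3, "x")).toDF("id", "value")

    // Records of T that exactly match a record of S on all mapped fields.
    val matched = target.join(source, Seq("id", "value"), "left_semi").count()
    val total   = source.count()

    // Accuracy = Count(matched) / Count(source), as in the formula above.
    println(f"accuracy = ${matched.toDouble / total * 100}%.2f%%") // 50.00%
    spark.stop()
  }
}
```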

Data profiling

Profiling types

Data quality issues can be identified via different data profiling types. Profiling results can be compared with documented expectations, and an alert report is triggered if the result doesn’t meet the expectations.

There are three types of profiling provided in our framework:

  1. Simple statistics generates null, unique, and duplicate count profiles. For example, the null count profile reports the count of null values in the selected column. It helps the customer to identify problems in the data, such as an unexpectedly high ratio of null values in a column. An example is to profile an Email Address column and discover an unacceptably high volume of missing email addresses.
  2. Summary statistics generates max, min, mean, and median number profiles. For example, for Age, the value usually should be less than 150 and greater than 0. The user can do range checking with the max/min profile on the Age column.
  3. Advanced statistics generates pattern-frequency profiles, expressed with regular expressions (see the sketch after this list). For example, a pattern profile of a United States Zip Code column might produce the regular expressions \d{5}-\d{4}, \d{5}, and \d{9}. If you see other formats, your data likely contains values that are not valid or are in an incorrect format.
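As a toy illustration of the pattern profile in item 3, the sketch below maps each value to the first pattern it matches and counts the frequency of each pattern; the patterns and sample values are invented.

```scala
// A toy sketch of a Zip Code pattern profile: classify each value by the
// first regular expression it matches, then count each pattern's frequency.
object PatternProfile {
  val zipPatterns = Seq("""\d{5}-\d{4}""", """\d{5}""", """\d{9}""")

  def classify(value: String): String =
    zipPatterns.find(value.matches).getOrElse("OTHER")

  def main(args: Array[String]): Unit = {
    val zips = Seq("95125", "95125-1234", "951251234", "9512A")
    val profile = zips.groupBy(classify).view.mapValues(_.size).toMap
    println(profile)
    // e.g. Map(\d{5} -> 1, \d{5}-\d{4} -> 1, \d{9} -> 1, OTHER -> 1)
    // The OTHER bucket points at values that are likely invalid.
  }
}
```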

Back-end implementation

Our data profiling mechanism is based on the column summary statistics functions provided in Spark MLlib, which enable us to calculate all basic statistics for numeric columns in a single pass.
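A minimal sketch of that single-pass computation, using MLlib’s colStats on invented sample columns:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.Statistics

// Sketch: MLlib's column summary statistics compute min, max, mean, and more
// for all numeric columns in one pass. Columns here are illustrative,
// e.g. (age, price).
object SummaryProfile {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "profiling-sketch")
    val rows = sc.parallelize(Seq(
      Vectors.dense(23.0, 19.99),
      Vectors.dense(34.0, 45.00),
      Vectors.dense(29.0, 32.50)
    ))
    val summary = Statistics.colStats(rows)
    println(s"min  = ${summary.min}")   // range check, e.g. Age > 0
    println(s"max  = ${summary.max}")   // range check, e.g. Age < 150
    println(s"mean = ${summary.mean}")
    sc.stop()
  }
}
```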

Key benefits

  • Fast profiling of big data, since our framework is based on Spark
  • Auto-scheduling for data profiling after model creation
  • Visualization including a history trend

Anomaly detection

The goal of anomaly detection is to identify cases that are unusual within data that is seemingly homogeneous. Anomaly detection is an important tool for detecting data quality issues.

For now, we have implemented some statistical detection functions using the Bollinger Bands and MAD (Mean Absolute Deviation) algorithms to find data sets whose total count falls outside the expected region. The expected region is calculated based on the history trend of each day’s total count.

Our anomaly detection also allows users to adjust parameters in the algorithm as needed and dynamically show the results after changing the parameters, so that anomaly detection is customized for the specific user.

Back-end implementation

Let’s take MAD as an example. The MAD of a data set is the average distance between each data value and the mean. These steps calculate the MAD:

  1. Find the mean (average).
  2. Find the difference between each data value and the mean.
  3. Take the absolute value of each difference.
  4. Find the mean (average) of these differences.

The following diagram shows the formula of MAD:

formula for MAD (Mean Absolute Deviation)
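In symbols, reconstructed from the four steps above (with x̄ the mean of the n values):

$$\mathrm{MAD} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - \bar{x}\right|$$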

The calculation of Bollinger Bands is similar to that of MAD. For more information, refer to Wikipedia’s article about Bollinger Bands.
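The sketch below shows the MAD-based check in plain Scala; the history window and the threshold multiplier k are assumptions, not Griffin’s tuned defaults.

```scala
// A minimal sketch of the MAD-based anomaly check: flag a day whose total
// count falls outside mean ± k * MAD of the recent history. The window size
// and multiplier k are assumptions for illustration.
object MadCheck {
  def mad(xs: Seq[Double]): Double = {
    val mean = xs.sum / xs.size
    xs.map(x => math.abs(x - mean)).sum / xs.size
  }

  def isAnomalous(history: Seq[Double], today: Double, k: Double = 3.0): Boolean = {
    val mean = history.sum / history.size
    math.abs(today - mean) > k * mad(history)
  }

  def main(args: Array[String]): Unit = {
    // Daily total counts for the past week, then a sudden drop today.
    val history = Seq(100.0, 98.0, 103.0, 101.0, 99.0, 102.0, 100.0)
    println(isAnomalous(history, today = 60.0)) // true: worth an alert
  }
}
```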

Griffin at eBay

Griffin is deployed in production at eBay and provides centralized data quality service for several eBay systems (for example, the Bullseye Personalization Platform, Hadoop data sets, and site-speed data). Griffin validates more than 800M records daily.

What’s Next?

  • We will introduce Griffin to more eBay systems, making it the unified data quality platform within eBay.
  • We will support more data quality dimensions, such as validity, completeness, uniqueness, timeliness, and consistency.
  • We will develop more machine-learning algorithms to detect even deeper relationships within data content and find data quality issues.