Monthly Archives: November 2011

Announcing ql.io

We are happy to announce ql.io – a declarative, evented, data-retrieval and aggregation gateway for HTTP APIs. Through ql.io, we want to help application developers increase engineering clock speed and improve end user experience. ql.io can reduce the number of lines of code required to call multiple HTTP APIs while simultaneously bringing down network latency and bandwidth usage in certain use cases.

ql.io consists of a domain-specific language inspired by SQL and JSON, and a node.js-based runtime to process scripts written in that language. Check out ql.io on Github for the source and http://ql.io for demos, examples, and docs.


Why ql.io

HTTP-based APIs – some call them services – are an integral part of eBay’s architecture. This is true not just for eBay, but for most companies that use the Web for content and information delivery. Within eBay’s platform engineering group, we noticed several pain points for application developers attempting to get the data they need from APIs:

  • Most use cases require accessing multiple APIs – which involves making several network round trips.
  • Often those API requests have interdependencies – which requires programmatic orchestration of HTTP requests – making some requests in parallel and some in sequence to satisfy the dependencies and yet keep the overall latency low.
  • APIs are not always consistent as they evolve based on the API producers’ needs – which makes code noisier in order to normalize inconsistencies.

We found that these issues have two critical impacts: engineering clock speed and end user experience.

  • Engineering clocks slow down because developers need to account for dependencies between API calls, and to arrange those calls to optimize overall latency. Implementing orchestration logic involves multi-threaded fork-join code, leads to code bloat, and distracts from the main business use case that the developer is striving to support.
  • End user experience suffers due to high bandwidth usage as well as the latency caused by the number of requests and the processing overhead of non-optimized responses from APIs.

The goal of ql.io is to ease both pain points:

  • By using a SQL- and JSON-inspired DSL to declare API calls, their interdependencies, forks and joins, and projections, you can cut down the number of lines of code from hundreds of lines to a few, and the development time from entire sprints to mere hours. Using this language, you can create new consumer-centric interfaces that are optimized for your application’s requirements.
  • You can deploy ql.io as an HTTP gateway between client applications and API servers so that ql.io can process and condense the data to just the fields that the client needs. This helps reduce the number of requests that the client needs to make as well as the amount of data transported to clients.

A quick taste

Here is a typical example of ql.io usage. It shows how ql.io can transform the experience of a developer getting the data needed to paint the UI in a native application.

prodid = select ProductID[0].Value from eBay.FindProducts where
    QueryKeywords = 'macbook pro';
details = select * from eBay.ProductDetails where
    ProductID in ('{prodid}') and ProductType = 'Reference';
reviews = select * from eBay.ProductReviews where
    ProductID in ('{prodid}') and ProductType = 'Reference';

return select d.ProductID[0].Value as id, d.Title as title,
    d.ReviewCount as reviewCount, r.ReviewDetails.AverageRating as rating
    from details as d, reviews as r
    where d.ProductID[0].Value = r.ProductID.Value
    via route '/myapi' using method get;

This script uses three API calls (in this case, all offered by eBay) to get four fields of products that match a keyword. The result is provided via a new HTTP resource with URI http://{host}:{port}/myapi. See the guide to build this example yourself, or copy and paste the above script into ql.io’s Web Console to see it in action.
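
Once the script is deployed, any HTTP client can fetch the condensed result from that route. A minimal node.js sketch – the host and port below are illustrative, not defaults you should rely on – might look like this:

var http = require('http');

// Fetch the condensed product data from the route declared by the script above.
http.get({ host: 'localhost', port: 3000, path: '/myapi' }, function (res) {
    var body = '';
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
        // The gateway returns only the projected fields (id, title,
        // reviewCount, rating), not the full upstream API payloads.
        console.log(JSON.parse(body));
    });
}).on('error', console.error);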

While we are still working on various benchmarks, we want to share some early results on developer productivity and end user benefits. One of the teams at eBay recently migrated an application that relies solely on eBay’s APIs to get the data needed to paint its UI. The first diagram below shows the request-response traces before migrating to ql.io.

Before migrating to ql.io

The code related to these API calls was about 2800 lines long. The diagram below shows the request-response traces after migrating API access to ql.io.

After migrating to ql.io

This effort brought the code down to about 1200 lines, in addition to reducing the number of requests from 18 to 5 and the data size from 274k to 91k. In this experiment, the drop in latency was not significant because the client application was on a broadband connection and some of the underlying APIs were slow.

How to use ql.io

ql.io is not intended to replace frameworks that are currently used to build HTTP APIs. API producers can continue to use existing frameworks to offer interfaces that are generic and broadly reusable. ql.io comes into play when a consumer of APIs wants to implement consumer-specific aggregation, orchestration, and optimizations. In other words, while existing frameworks continue to support “producer-controlled” interfaces, you can use ql.io to create “consumer-controlled” interfaces.

We are building ql.io with flexible deployment in mind. Depending on where the network costs are felt, you can deploy ql.io closer to API servers, closer to users on the edge, or even on front-end machines.

Deploying closer to API servers

The primary usage of ql.io is to run it as a gateway at the reverse proxy tier, potentially between your load balancers and API servers.

ql.io as a gateway on the reverse-proxy tier

Deploying closer to client applications

A secondary usage is to deploy ql.io closer to client applications on the edge.

ql.io on the edge

Edge-side deployment can further reduce network costs for client applications by pushing API orchestration closer to those applications. When API servers are globally distributed, the best place for aggregation is often closer to client applications, and edge-side deployment can yield significant gains. If you are a developer using third-party APIs, you can follow the same pattern and deploy ql.io yourself, closer to your applications.

Deploying on the front end

Our choice of Javascript and node.js for building ql.io provides an additional deployment option: front-end applications built on node.js can use ql.io programmatically.

ql.io on the node front end apps
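
As a sketch of what such programmatic use might look like – the module name, constructor options, and emitter events below are assumptions based on ql.io’s docs rather than a verbatim API reference – an embedding application could do something like:

// Hypothetical sketch of embedding the ql.io engine in a node.js app.
// Check the ql.io docs on Github for the actual module name and options.
var Engine = require('ql.io-engine');
var engine = new Engine({ tables: __dirname + '/tables' });

var script = "select ProductID[0].Value from eBay.FindProducts " +
             "where QueryKeywords = 'macbook pro';";

engine.execute(script, function (emitter) {
    emitter.on('end', function (err, results) {
        if (err) return console.error(err);
        console.log(results.body); // condensed, consumer-shaped data
    });
});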

Why node.js

Early on, one of the critical choices that we had to make was the software stack. We had two choices: Should we go with the proven Java stack that has full operational support within eBay? Or should we choose a stack like node.js with its excellent support for async I/O, but which was not yet proven when we started the project? Moreover, very few companies had operational experience with node.js. This was not an easy choice to make. In our deliberations, we considered the following systemic qualities, in their order of importance:

  • Performance and scalability for I/O workloads. A significant percentage of the work performed during script execution is I/O bound; CPU load is limited to in-memory tasks like joins and projections. Blocking I/O was therefore not an option for such workloads.
  • Operability. We need to be able to monitor the runtime, know what is going on, and react quickly when things go wrong. Furthermore, integrating with eBay’s logging and monitoring tools is a prerequisite for bringing in a new technology stack.
  • Low per-connection memory overhead. Since script execution involves some slow and some fast APIs, we need the stack to remain stable as the number of open connections increases.
  • Dynamic language support. This consideration had two parts. We wanted to build ql.io very quickly in a very small team with low code-to-execution turn-around times. This approach helps us iterate rapidly in the face of bugs as well as new use cases. In addition, we wanted application developers to be able to extend ql.io’s processing pipeline with small snippets of code.

After some analysis and prototyping, we chose Javascript as the language and node.js as the runtime stack. Here are some highlights of our experience so far:

  • Javascript and node.js allowed us to iterate very rapidly. Though we were initially concerned about finding the right tools and libraries, the node.js ecosystem proved sufficient for us to build as complex a system as ql.io.
  • We were able to tune a regular developer-quality Ubuntu workstation to handle more than 120,000 active connections per node.js process, with each connection consuming about 2 KB of memory. We knew we could push the connection count further; although we did not spend the time to do so, this experiment gave us the confidence to proceed with node.js.
  • ql.io’s core engine does automatic fork-join of HTTP requests by using compile-time analysis of scripts. Node’s evented I/O model freed us from worrying about locking and concurrency issues that are common with multithreaded async I/O (a hand-rolled version of this fork-join pattern is sketched just after this list).
  • We did pay some operationalization tax while we prepared the ql.io and node.js stack for integration with eBay’s monitoring and logging systems. This was a one-time penalty.
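
To illustrate the kind of orchestration the engine automates, here is a minimal hand-rolled fork-join over two HTTP calls in node.js. The URLs are placeholders; the real engine derives this structure from compile-time analysis of the script rather than from hand-written callbacks:

var http = require('http');

// Issue a GET and hand the response body to the callback.
function get(url, callback) {
    http.get(url, function (res) {
        var body = '';
        res.on('data', function (chunk) { body += chunk; });
        res.on('end', function () { callback(null, body); });
    }).on('error', callback);
}

// Fork: start all requests in parallel. Join: call done once all complete.
// No threads or locks are involved; the single-threaded event loop makes the
// shared counter safe to mutate from the callbacks.
function forkJoin(urls, done) {
    var results = [], pending = urls.length, failed = false;
    urls.forEach(function (url, i) {
        get(url, function (err, body) {
            if (failed) return;
            if (err) { failed = true; return done(err); }
            results[i] = body;
            if (--pending === 0) done(null, results); // join point
        });
    });
}

// Placeholder URLs standing in for two upstream API calls.
forkJoin(['http://example.com/details', 'http://example.com/reviews'],
    function (err, responses) {
        if (err) return console.error(err);
        console.log('joined', responses.length, 'responses');
    });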

What’s next

We’re not done with ql.io yet, and we want to continue to develop ql.io in the open. Go to ql.io on Github and http://ql.io to find more about ql.io, try it out, discuss it, and participate.

Click Modeling for eCommerce

Historical user click patterns on search result pages are considered a great resource for algorithms that attempt to learn to rank search results. Learning to rank is a well-studied problem in the field of Information Retrieval (IR). See Mike Mathiesonʼs excellent blog post, Using Behavioral Data to Improve Search, for more background on this subject.

If learn-to-rank algorithms could explicitly observe which results were deemed relevant by most users and which were not, we would have open access to usersʼ minds. Alas, we live in the real world where only user clicks can be observed; their thought patterns are forever hidden, for us to infer and speculate upon.

What are click models?

Click models are mathematical models that attempt to do just that: Describe a typical userʼs decision process as he or she interacts with the search results page, so that we may infer said userʼs judgments on the relevance and irrelevance of specific search results.

Take, for example, the following scenario:

The user searched for “Meaning of life”. A search result page (SRP) with 50 results was served back. The user then clicked on result #2, and we never heard from him again.

Consider the following two explanations:

  1. The user looked at the SRP, read the snippet returned for result #1, then ignored it as irrelevant. The user then moved to result #2, read the snippet, found it attractive, clicked through to the page, found the meaning of life in there, then stopped the search, satisfied with what he found.
  2. The user glanced at the SRP, chose result #2 randomly, read the snippet, found it somewhat relevant, clicked-through to the page, and found it completely irrelevant. Then his phone rang and he abandoned the search.

According to the first explanation, result #2 was relevant to the userʼs search. According to the second explanation, it wasnʼt. Both explanations (and many others) are possible. But are they equally likely?

A click model helps us assign mathematical probabilities to every such explanation, enabling us to use millions of scenarios to infer the likelihood of relevance for every search result against every search.

Commonly Used Click Models

The simplest class of click models is called position models. Position models assume that search results have a probability of being examined by the user that decays with the position of the result within the page. A click depends on a result being examined and deemed relevant, so that P(click) = P(examined) * P(relevant), and P(examined) is a decaying function of rank.

Cascade models are another class of click models, where the user is assumed to examine the results sequentially: starting from the top, clicking on the first relevant result examined, and stopping the search immediately. Here the probability of a click depends on the relevance of a result, as well as the irrelevance of all previous results. This model doesnʼt account for abandoned searches, or searches with multiple clicks.
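
Written out in the usual notation – with C_i, E_i, and R_i denoting the click, examination, and relevance events at position i, and r_i the relevance probability of the result shown there – the two models are:

$$ \text{Position model:}\qquad P(C_i = 1) = P(E_i = 1)\,P(R_i = 1), \qquad P(E_i = 1) = \epsilon_i \ \text{(decaying in } i\text{)} $$

$$ \text{Cascade model:}\qquad P(C_i = 1) = r_i \prod_{j=1}^{i-1} (1 - r_j) $$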

Recent work has shown that using a dynamic Bayesian network (DBN) to model general web search clicks outperforms both position models and cascade models [Olivier Chapelle and Ya Zhang, A dynamic Bayesian network click model for web search ranking, Proceedings of the 18th International World Wide Web Conference (WWW), 2009]. The DBN model assumes that users examine items in an SRP starting at position 1 and working downwards: skipping items that appear irrelevant and clicking on items that appear relevant, until either abandoning the search or landing on a satisfactory page following a click from the SRP. The user is assumed to leave the search as soon as the first satisfactory page is presented. SRP clicks are observed in the logs, but all other variables in the Bayesian network are hidden.

Figure 1 below illustrates the concept. The boolean variables inside the box relate to user behavior, whether observable or not, and the boolean variables below the box relate to intrinsic properties of URLs: a_u denotes how click-attractive a URL snippet is, and s_u denotes how satisfactory the page really is once you land on it following a search.

Here, clicks are the only observable events. All other events are to be inferred using expectation maximization (EM) guided by the causality links that the model assumes between boolean variables. The causality links are denoted in the figure by arrows.

A parameter of the model is the user persistence rate γ, which is the probability that a user would continue to examine results after landing on an irrelevant page.
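
For reference, the DBN assumptions described above are usually written as the following equations, where E_i, A_i, C_i, and S_i are the examine, attracted, click, and satisfied events at position i, and a_u, s_u, and γ are as defined above:

$$ E_1 = 1, \qquad C_i = 1 \iff E_i = 1 \text{ and } A_i = 1, \qquad P(A_i = 1) = a_u $$

$$ P(S_i = 1 \mid C_i = 1) = s_u, \qquad C_i = 0 \Rightarrow S_i = 0, \qquad S_i = 1 \Rightarrow E_{i+1} = 0 $$

$$ P(E_{i+1} = 1 \mid E_i = 1,\ S_i = 0) = \gamma, \qquad E_i = 0 \Rightarrow E_{i+1} = 0 $$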

Why We Need Something New

DBN and other web search click models are not well suited for explaining user behavior at eCommerce sites such as eBay. Compared to users of general web search, shoppers at eBay are unlikely to be satisfied by landing on the first relevant result. Shopping and web browsing are radically different IR exercises. For shopping, the items in the SRP are competing for the shopperʼs business, and the shopper is in a process of arbitration, where he/she must choose to award business to zero or one winning item.

Unlike web surfers, web shoppers (think referees) are expected to sample seemingly relevant items until they feel satisfied that they have come across a representative sample of good deals as well as bad deals. Only then can a shopper make an informed decision that an item is a winner. In this setting, a good deal is only good relative to other deals that are not as good. For example, if the top ten results are equally good deals then weʼd expect the shopper to keep scrolling down until a satisfactory number of bad deals are encountered. That is in sharp contrast to the general web browsing behavior assumed in DBN.

Proposed Click Model

The following model builds upon DBN, with a focus on eCommerce user behavior.

In this model, SRPs list items on sale rather than webpage URLs. Every listing L is assumed to have two independent intrinsic properties: a_L and g_L. The a_L property denotes how attractive the listing looks in the SRP, which influences clicks; and g_L denotes how much of a good shopping deal the listing really is, which depends on the price, item details, seller details, shipping details, etc. Both variables are latent – that is, cannot be observed directly; however, they are assumed to influence the behavior of users on the site, and therefore can be inferred (hopefully) by analyzing mountains of click data.

Notice how the model still contains E_i (examine) events, and how, just like the DBN model, the user is assumed to examine results sequentially top to bottom. The proposed model also contains A_i (attracted enough to click) events; but S_i (satisfied) events are replaced with G_i (good deal) events, which denote that the user found the listing to be a “good deal” overall. We introduce a new event S (success event) to denote a successful shopping experience. At eBay, an S event translates into a bid or buy-it-now action. Because S events are directly observable, this model enables learning from clicks, bids, and purchases simultaneously.

Upon finding a good deal, the user performs an S event with probability f(G), which depends not only on the goodness of the current listing, but on the goodness of all previously examined listings. This is how the model accounts for users browsing multiple good and bad deals before arriving at an S decision. In the simplest case, f(G) can be the sum of good deals encountered, but can be generalized to the form:

f(G) = αΣ(good deals) + βΣ(bad deals)

Where α and β are model parameters.
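
One way to formalize the behavior described above – a reconstruction from this description rather than a definitive specification – is to keep the DBN-style examination chain and write, for the listing at position i:

$$ C_i = 1 \iff E_i = 1 \text{ and } A_i = 1, \qquad P(A_i = 1 \mid E_i = 1) = a_L, \qquad P(G_i = 1 \mid C_i = 1) = g_L $$

$$ P(S = 1 \mid G_1, \ldots, G_i) = f(G) = \alpha \sum_{j \le i} G_j + \beta \sum_{j \le i} (1 - G_j) $$

where the sums run over the listings the user has examined and judged so far.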

This modeling framework, if properly applied to eCommerce click logs, would greatly enhance the learnability of ranking algorithms from click data.

Personalized Search at eBay, part II

In the first part of this blog posting, I talked about how to estimate a buyer’s propensity to purchase an auction over a fixed price item. I gave the Empirical Bayes formula f = (a + k)/(a + b + n) for the auction propensity f which is a compromise between the buyer’s shopping history k/n and the propensity of his peer group a/(a+b). In this posting I will explain where the formula comes from, and how to compute the numbers a and b.
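
As a quick numerical illustration (the numbers here are made up for the example): suppose the peer group’s prior has a = 1.2 and b = 2.3, so the peer propensity is a/(a+b) = 1.2/3.5 ≈ 0.34, and the buyer has purchased k = 5 auctions out of n = 20 items, a raw rate of k/n = 0.25. Then

$$ f = \frac{a + k}{a + b + n} = \frac{1.2 + 5}{1.2 + 2.3 + 20} = \frac{6.2}{23.5} \approx 0.26 $$

a value between the peer group’s 0.34 and the buyer’s own 0.25, and closer to the buyer’s history because twenty observations outweigh the prior.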

Since the method is called Empirical Bayes, it won’t surprise you to learn that it uses Bayes Theorem:

$$ \Pr(A \mid B) = \frac{\Pr(B \mid A)\,\Pr(A)}{\Pr(B)} $$

The specific form I’ll use is

$$ \Pr(p \mid k/n \text{ auctions}) = \frac{\Pr(k/n \text{ auctions} \mid p)\,\Pr(p)}{\Pr(k/n \text{ auctions})} $$

In this formula, p is the probability of buying an auction, n is the number of purchases that a shopper has made, and k is the number of those purchases that were auctions. The left-hand side is what I’d like to know: the probability (propensity) of an auction given that the buyer has previously purchased k/n auctions. As usual, the reason to use Bayes formula is that it relates the unknown left-hand side to the computable right-hand side.

The pleasure of Bayes formula is the first term on the right-hand side. This is the textbook formula we’ve seen before:

$$ \Pr(k/n \text{ auctions} \mid p) = \binom{n}{k}\, p^k (1-p)^{n-k} $$

The pain of Bayes formula is the second term, the prior Pr(p). This is the estimate of p before (prior to) learning there were k/n auction purchases. Much time has been spent debating what value to use for the prior. This is where empirical Bayes comes in. I will use the data itself to estimate Pr(p). I do it by assuming that Pr(p) follows a Beta probability distribution

$$ B_{a,b}(p) = \frac{p^{a-1}(1-p)^{b-1}}{B(a,b)} $$

and all I need to do is specify the two parameters a and b. Why this distribution? It’s general enough to fit most data, but simple enough to make explicit calculations possible. I’ll do those calculations in a moment. The following figure from Wikipedia shows the first point. By varying the parameters a and b (α and β in the figure) you can get a large range of distributions.

Now for the details of how to compute a and b. The probability of purchasing k/n auctions is

$$ p_k = \binom{n}{k}\, p^k (1-p)^{n-k} $$

If I don’t know a precise value for p, but do know it varies via some probability distribution Prob(p), then p_k is an average over those p, expressed as the integral

$$ p_k = \int_0^1 \binom{n}{k}\, p^k (1-p)^{n-k}\, \mathrm{Prob}(p)\, dp $$

As I mentioned above, I’ll assume that Prob(p) is a beta distribution

$$ \mathrm{Prob}(p) = B_{a,b}(p) = \frac{p^{a-1}(1-p)^{b-1}}{B(a,b)} $$

Now plug this value of Prob(p) into the formula for p_k, which I’ve written as p_k(a, b) to emphasize the dependence on a and b:

$$ p_k(a, b) = \int_0^1 \binom{n}{k}\, p^k (1-p)^{n-k}\, \frac{p^{a-1}(1-p)^{b-1}}{B(a,b)}\, dp = \binom{n}{k}\, \frac{B(a+k,\, b+n-k)}{B(a,b)} $$

If you’re following along in detail, you’ll see that I did some simplifications after the plug-in. The important thing is this: the value of p_k depends on the known k, n and the unknown a, b, and my job is to pick those a, b so that

$$ \sum_k \big( p_k(a, b) - \hat{p}_k \big)^2 $$

is minimized. In other words, I want to make the computed p_k as close as possible to the observed fraction, which I’ve written with a hat. For example, p_2(a, b) is the computed fraction of people in the peer group that bought 2 auctions assuming the prior is Beta(a, b), while p_2 with a hat is the observed fraction of users who bought two auctions.

Here’s an illustration. Suppose the peer group is users who’ve purchased 21 items. Then the best values of a and b are 1.16 and 2.22, which give the beta distribution B_{1.16, 2.22}(p).

Once I have a and b, I can compute the p_k(a, b) and see how closely they approximate the observed fractions.

The match is excellent. The red predicted curve has only two parameters and yet it has a very good fit to all 22 points.

Now that I’ve explained how to compute a and b, I can explain where the formula f = (a + k)/(a + b + n) comes from. Use Bayes formula

$$ \Pr(p \mid k/n \text{ auctions}) = \frac{\Pr(k/n \text{ auctions} \mid p)\,\Pr(p)}{\Pr(k/n \text{ auctions})} $$

and substitute the beta distribution B_{â,b̂} for Pr(p) as follows:

$$ \Pr(p \mid k/n \text{ auctions}) \;\propto\; p^k (1-p)^{n-k} \cdot p^{\hat{a}-1}(1-p)^{\hat{b}-1} \;=\; p^{(\hat{a}+k)-1}\,(1-p)^{(\hat{b}+n-k)-1} \;\propto\; B_{\hat{a}+k,\, \hat{b}+n-k}(p) $$

I’ve been a little pedantic and used â to show that it is an estimate of a derived from the peer group (and similarly for b). The calculations above show that when you plug the formula for B_{â,b̂}(p) into Bayes formula you get out another beta distribution! This is the explanation for my earlier statement that beta was chosen to make explicit calculations possible.

My goal is to estimate the auction propensity for a buyer who has purchased k/n auctions. What I have so far is a probability distribution for this propensity, specifically a beta distribution. If I want a single probability, the natural choice is the average of that beta distribution. The average of Beta(a,b) is a/(a+b). So if I want to get a single number for the auction propensity, I take the average of B_{â+k, b̂+n−k},

which is (a+k)/(a+b+n). This is the formula that I have used many times in this posting.
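
For readers who want to experiment, here is a small JavaScript sketch of the whole recipe as described in this post: the beta-binomial formula for p_k(a, b), a crude grid search that picks a and b by minimizing the squared error against the observed fractions, and the final propensity (a + k)/(a + b + n). The function names and grid ranges are illustrative choices, not part of any production system.

// log of the gamma function via the standard Lanczos approximation.
function logGamma(z) {
    var c = [0.99999999999980993, 676.5203681218851, -1259.1392167224028,
             771.32342877765313, -176.61502916214059, 12.507343278686905,
             -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7];
    if (z < 0.5) {
        return Math.log(Math.PI / Math.sin(Math.PI * z)) - logGamma(1 - z);
    }
    z -= 1;
    var x = c[0];
    for (var i = 1; i < 9; i++) { x += c[i] / (z + i); }
    var t = z + 7.5;
    return 0.5 * Math.log(2 * Math.PI) + (z + 0.5) * Math.log(t) - t + Math.log(x);
}

function logBeta(a, b) { return logGamma(a) + logGamma(b) - logGamma(a + b); }
function logChoose(n, k) { return logGamma(n + 1) - logGamma(k + 1) - logGamma(n - k + 1); }

// p_k(a, b) = C(n, k) * B(a + k, b + n - k) / B(a, b)
function pk(k, n, a, b) {
    return Math.exp(logChoose(n, k) + logBeta(a + k, b + n - k) - logBeta(a, b));
}

// Fit (a, b) by minimizing sum_k (p_k(a, b) - observed[k])^2 over a coarse grid.
// observed[k] is the fraction of peer-group members who bought k auctions (k = 0..n).
function fitPrior(observed, n) {
    var best = { a: 1, b: 1, sse: Infinity };
    for (var ai = 1; ai <= 100; ai++) {
        for (var bi = 1; bi <= 100; bi++) {
            var a = ai * 0.05, b = bi * 0.05, sse = 0;
            for (var k = 0; k <= n; k++) {
                var d = pk(k, n, a, b) - observed[k];
                sse += d * d;
            }
            if (sse < best.sse) { best = { a: a, b: b, sse: sse }; }
        }
    }
    return best;
}

// Posterior-mean auction propensity for a buyer with k auctions out of n purchases.
function propensity(k, n, a, b) { return (a + k) / (a + b + n); }

Given the observed purchase-count fractions for a peer group, fitPrior returns the fitted a and b, and propensity(k, n, best.a, best.b) then gives the personalized estimate for an individual buyer.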

So much for the calculations. I want to end by discussing whether personalization is desirable. Some have argued that it isn’t a good idea to show users only what they’ve seen before. I think these arguments are often attacking a straw man. A good personalization system doesn’t eliminate all variety; it merely tailors it. Suppose our Mr. X has purchased 20 items, all of them at fixed price. But his friend Ms. Y has done the opposite: all her purchases were made by auction. I find it hard to believe that anyone would seriously argue that X and Y should see the same search results. Just as bad as showing X a pile of auctions would be to never show him any auctions. A good personalization system should be nuanced. Methods like those shown in this posting provide principled estimates of users’ preferences. But how these preferences are used requires some finesse.