Personalized Search at eBay

In this blog post and a succeeding one, I will discuss personalization at eBay. Specifically, I’ll talk about how search can be personalized. First I’ll investigate whether buyers actually differ in their preferences for the items they’d like to see in their search results. Next I’ll discuss what preferences have the widest variation from buyer to buyer. Then I’ll talk about how to implement personalization, and finally I’ll explain the theory that underpins the implementation. Some people argue that personalization is actually not desirable. I’ll address that at the end of my last posting.

To make this concrete, I’ll focus on what eBay calls format: is the item being offered for sale via an auction, or is it a fixed-price item with a buy-it-now button? Of course some users buy more auctions than others. But does this indicate a preference? Perhaps not. Perhaps users simply search for the best deal, and if a user buys more auctions it’s not because they prefer auctions, but simply because they happened to purchase a string of items where the best deal was an auction.

The following graph suggests that the “just looking for the best deal” theory is not tenable.

Users who bought 1 item in the past year did not have much taste for auctions. They only purchased an auction 25% of the time. On the other hand, heavy buyers who purchased at least 600 items in the past year bought mostly auctions – over 70% of their purchases were auctions. So it seems that users do have different preferences. Infrequent buyers prefer fixed-price items, while frequent buyers prefer auctions.

But before discarding the “looking for best deal” hypothesis, I want to consider some alternate possibilities. First is the question you should always ask – is the data statistically significant? Given how consistently the curve moves upwards, it seems unlikely that we’re seeing an artifact due to noise. But I can check this rigorously. I’m measuring a binary variable – one that takes on two values, auction or fixed-price. Statistics textbooks tell us that the noise (as estimated by the standard error) is √ pq/n . The leftmost point represents 14 million buyers. So p=.25, q=.75 and n≈14000000. The noise is about 0.0001. I can be confident that the 25% is very accurate.

Here’s a more subtle alternate explanation. Suppose everyone has similar auction preferences. But suppose infrequent buyers tend to purchase electronics, which has fewer auctions, and frequent buyers are more likely to buy antiques, which have a high fraction of auctions. So the curve might be consistent with buyers having similar auction preferences, and simply be a reflection of the type of items that different buyers purchase.

I can test this alternate explanation by focusing on a single category, for example Jewelry and Watches. This is a category with a high proportion of auction listings.

The plot shows the same pattern as before: infrequent buyers prefer fixed price, heavy buyers prefer auctions. The curve is shifted upwards, because Jewelry is a high-auction category. But there’s a clear difference in preferences. The same pattern holds in a low-auction category like Computers & Networking.

The curve is shifted downward, but again more experienced buyers have a stronger preference for auctions. So to summarize, the most likely explanation of the graphs I’ve shown is that some buyers have a much stronger preference for auctions than other buyers.

But I want to dig deeper. Suppose I look at buyers with similar buying experience, for example all those who bought 8 items in the past year. Do they have different format preferences?

To answer that question, I need to be more precise about what it means for users to have similar format preferences. I’ll assume that buyers chose an auction with probability p. One way that might happen is that just before buyers make a purchase, they mentally throw dice in their head, and depending on the outcome say “I feel like an auction today”, or “I feel like buying fixed price today”. OK, that’s not too realistic, but there are other scenarios where it will appear that buyers are choosing an auction with probability p. For example their decision might depend on the inventory available, and that could vary randomly. Or more likely, their decision depends on factors that I don’t have available, such as how soon they want to receive the item (for an auction, you have to wait for bidding to end). In any case, imagining that buyers have some propensity for auctions that is captured as a probability is a simple but plausible model.

A consequence of this model is that I can predict how many auctions a user will buy. Recall that I’m focusing on those who bought 8 auctions. Assuming that all buyers have the same auction propensity p, the chance that k of the 8 purchases are an auction is the familiar formula In this formula n = 8. In Jewelry & Watches, about 75% of the purchases are auctions, so p = 0.75. The top histogram below labeled Predicted was generated using this formula.

Most users are predicted to buy 6/8 = 0.75 auctions. But many are predicted to buy 8 auctions. In our model they buy an auction with probability p, and so there’s random variation. They buy 6 on the average, but some buy 8 and others only 2. The bottom histogram gives the actual numbers. They are nothing like the prediction. This suggests that the hypothesis of all users have the same auction preference is false. Here’s another example in the Computers & Networking category.

Again, the count of auctions vs. fixed price is nothing like what would be predicted if all users had a similar auction preference. Here’s a summary of my reasoning. I focused on users with similar buying experience by looking at those who bought 8 items in the past year. I observed how many bought 0/8 auctions, how many bought 1/8, etc. If all buyers had similar auction preferences, that preference would be 75% (in Jewelry), and I could predict how many would buy 0/8, 1/8, etc. But the predicted values and observed values are totally different, so I conclude that buyers do not have similar auction preferences.

Let me move on to a new topic. Users have widely varying preferences for auctions. Are there other attributes of items that vary widely? I can repeat the analysis above to find out. I will examine 3 different attributes: item condition (new vs. used), whether the seller had an eTRS (eBay Top-Rated Seller) rating, and whether the item had free shipping. In each case I will compare the predicted and observed plots, and observe by how much they differ. First item condition:

The two histograms are very different, so condition appears to be a good attribute for personalization. Next is a graph for items sold by trusted sellers.

This time the histograms are similar. True, the tails are a bit larger in the observed histogram, but there is nowhere near the variation there was for item condition. Finally, here’s free shipping.

The observed histogram is not too different from what would be expected if everyone had a 44% preference for eTRS. So I conclude that personalizing search results on format and condition will probably be much more useful than personalizing on eTRS or free shipping.

Now that I’ve established that buyers have different preferences, I need to figure out how to estimate a buyer’s personal preferences. I will use a technique called Empirical Bayes. Like all things Bayes, it has some controversy. Brad Efron, the former chair of the Stanford Statistics department has said:

The suggestion here, made explicit in the final section, is that after 50 years of underuse, we are poised for an avalanche of empirical Bayes applications

Meanwhile across the Bay, the late statistician David Blackwell, who was chairman of the Berkeley statistics department, has been quoted as such:

He [David Blackwell] noted that he didn’t believe in empirical Bayes and showed that it didn’t make sense when applied to a single inference.

I will ignore the controversy and plunge ahead. First, here’s an extremely simple way to implement personalization: If a user has bought more than 10 auctions, I will assume I have enough information to estimate their auction preference, and compute it as k/n where k is the number of auction purchases and n is the total number of purchases. Otherwise I make no assumptions. This is clearly unsatisfying. Why 10? Do I really know nothing if they’ve purchased 9 items, but suddenly at 10 items I believe their purchase history is the whole story? As a rule of thumb, hard cutoffs like this do not perform as well as a smooth transition. And empirical Bayes is a perfect tool for eliminating the hard cutoff. It gives a principled way of estimating a buyer’s propensity (probability) of purchasing an auction. In the next posting, I’ll explain how to use Empirical Bayes to derive an estimation formula. For now, I’ll just explain how the formula applies to a typical buyer, Mr. X.

The first step is to aggregate buyers who are similar to Mr. X. This peer group is then summarized by two numbers, a and b. I’ll explain how to compute them later. These numbers encode the aggregate estimate of the auction propensity, f_A = a/(a+b). I write this as f, since this probability estimates the fraction of auctions. The next step is to record X’s purchases. If he has made n purchases, and k were in auction format, then his personal estimate is f_P = k/n. Empirical Bayes gives a compromise between these two numbers, via the formula f = (a+k)/(a + b+ n). I’ll next check whether this formula is reasonable.

Suppose there were no purchases, so that k = n = 0. Then the formula becomes f = (a+k)/(a + b+ n) = a/(a + b) which is the aggregate estimate. This makes sense. On the other extreme, suppose X has made many purchases so that k and n are large. Then the Bayes estimate becomes f = (a+k)/(a + b+ n) ≈ k/n which is the user’s personal preference. So again the formula makes sense: it gives the aggregate value initially, and then as X’s purchases increase, it becomes more like his personal value.

I’ll now work through an example. Suppose I’ve bought 8 items in the Computers & Networking category in the past year. I’ll use as my peer group the buyers who have also purchased 8 items in Computers & Networking. The a, b numbers are a=0.95, b=4.64. These numbers encode the behavior of my peer group, and I can use them to compute the average fraction of auctions purchased by my peers via , f_A = a/(a+b) = .95/(.95 + 4.64) = .17. The empirical Bayes formula says that my auction propensity is f = (a+k)/(a + b+ n) = (.95+k)/13.59. Here’s a plot of what the formula predicts for different values of k, the number of purchases that were auctions.

The empirical Bayes estimate is a compromise between the aggregate value and my personal value. Here’s the plot with n = 4 and n = 16 added to n = 8.

As n gets larger, the Empirical Bayes estimate gets closer and closer to the personal estimate. Here’s a second example in the Jewelry & Watches category. Again with 8 purchases, the numbers a and b are a = 0.75, b = 0.80 and so f_A = a/(a+b) = .75/(.75 + 40.8) = .48. The plot is below:

Again, the Empirical Bayes estimate is between the aggregate and personal. You might be wondering why the aggregate isn’t more simply represented by a single number f_A, rather than decomposing it into f_A = a/(a+b). The reason is that a and b encode not just the average number of auctions in your peer group, but also the variation in that group. When a and b are large, buyers in the peer group cluster closely near f_A (left plot below). When a and b are small, buyers have a large spread (right plot).

This is consistent with f = (a+k)/(a + b+ n). When a and b are large the left graph shows that the peer group is very consistent in their auction preference, and so we should be skeptical of a user whose personal preference is very different from f_A. And indeed, the formula f = (a+k)/(a + b+ n) shows that k and n have small influence. When a and b are small the right graph applies. It shows that peer group has no strong preference, and so k and n are more important.

I’ve actually already shown you an example of this. In Computers & Networking, a and b are large, and the Bayes compromise is roughly midway between the personal and aggregate. But in Jewelry & Watches, a and b are small and the compromise is much closer to the personal line – see below.

In my next posting I will explain where the formula f = (a+k)/(a + b+ n) comes from, and how to compute the numbers a and b.

Tags: Machine Learning, Search Science