Site Speed for eBay Search Results

First, welcome to the eBay technical blog! Each month, we will publish one or two entries describing technical challenges at eBay, and how we go about solving them. We look forward to your comments, and we welcome your suggestions for articles.

It’s my pleasure to write the first entry in our new blog. In two parts, I’m going to introduce you to how we’ve worked on site speed over the past year, and the results that work has delivered. The bottom line is that improving site speed has helped our customers and driven our business. With improvements in site speed, sellers have sold more, buyers have bought more, and eBay’s business has grown as a result. Site speed matters, and we continue to drive improvements.

Defining Site Speed

A simple definition of site speed is “average latency from the time the user submits a request for a page until it’s rendered in their browser”. However, this isn’t easy to measure: we have to decide which users we measure it for, and how we go about the measurement. In our case, to realistically simulate what users see, we use a third-party service to measure latency, and ask them to measure the typical experience. They do this by fetching our pages from hundreds of locations in the US and in Europe. They’re able to provide us with measurements from the US backbone (the main trunk that connects the major telecom providers and ISPs), as well as measurements from the “last mile”, that is, close to the small ISPs who provide the service to most of our customers. We collect measurements every few minutes from hundreds of points, and our team looks at latencies, availability, and several different types of requests; we’ll talk about these in a moment.

The simple definition isn’t the most effective for several reasons.

Importantly, mean latencies hide many sins. Take a look at the fictional example in Figure 1, which shows the distribution of latencies for two different implementations of the same page. The x-axis is the user latency in seconds, and the y-axis is the number of customers (in thousands) who are seeing the page. The distribution of values is different between the red and blue lines: customers in the “red line” experience have latencies in the range of 2.5 to 5.5 seconds, and customers in the “blue line” experience have latencies in the range of approximately 1 to 7 seconds. But the mean average latency for both experiences is 4 seconds. The problem is that over a quarter of the page views in the “blue line” experience are slower than any page fetch in the “red line” experience – and, as you’ve probably observed yourself, it’s the worst case scenarios that leave the largest impression. (We’ve all had those pages that occasionally take much longer to load, seemingly hanging on one browser fetch. You’ve probably closed some of those windows or tabs, and gone somewhere else. Or you’ve hit the refresh button.)

Figure 1: Latency distributions for two implementations of the same page, both with a mean of 4 seconds (fictional data).


So, what do we do to realistically measure the user experience? In our case, we track the 90th and 95th percentile latencies. This means we measure the latency thresholds that 90% and 95% of our page requests fall under; put another way, we track how slow the experience is for the slowest 10% and 5% of requests. In the fictional example in Figure 1, the 90th percentile for the “red line” experience is just under 5 seconds, and the 95th percentile is around 5.3 seconds. In the “blue line” experience, it’s around 6.5 seconds for the 90th percentile and just under 7 seconds for the 95th percentile. We’d therefore view the “red line” experience as substantially better (and, of course, we’d check that it really was using a statistical test, such as a one-sided t-test). Another benefit of the 90th and 95th percentile latencies is that they’re a useful diagnostic tool – we pull apart the data we get, look for the bottlenecks, and fix those. By fixing them we not only improve the percentile latencies and make our customers happier, but we also substantially improve the mean latencies.
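If you want to play with these ideas yourself, here’s a minimal sketch (in TypeScript) of computing the mean and the 90th/95th percentiles for two made-up sets of latency samples in the spirit of Figure 1. The numbers and the simple nearest-rank percentile method are purely illustrative; they’re not the data or the method our measurement provider uses.

```typescript
// Two made-up sets of page latencies (seconds), constructed so that both
// have a mean of 4.0s (like the "red" and "blue" lines in Figure 1),
// but with very different spreads.
const redLine = [3.6, 3.8, 3.9, 4.0, 4.0, 4.1, 4.2, 4.4];
const blueLine = [1.5, 2.5, 3.0, 4.0, 4.0, 5.0, 5.5, 6.5];

function mean(samples: number[]): number {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Nearest-rank percentile: the value that p percent of samples fall at or below.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const experiences: Array<[string, number[]]> = [
  ["red line", redLine],
  ["blue line", blueLine],
];
for (const [name, samples] of experiences) {
  console.log(
    `${name}: mean=${mean(samples).toFixed(1)}s, ` +
      `p90=${percentile(samples, 90).toFixed(1)}s, ` +
      `p95=${percentile(samples, 95).toFixed(1)}s`
  );
}
```

Both sets report a mean of 4.0 seconds, but the percentiles expose how much worse the tail of the “blue line” experience is.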

Another metric we measure is availability. At the “last mile”, many things can go wrong between a user’s machine and our servers, such as network glitches, transient machine failures at ISPs, and so on. When you’re viewing our search results page at eBay, you’re making many requests to eBay servers in our data centers, and also requests to other providers who deliver advertising and other page components. If any one of these requests fails or times out at the last mile, we count that as an availability issue for the page. As part of our site speed work, we track availability and work on improving it. Improvements include creating fewer opportunities to fail, working with our partners to improve their availability, and improving our own services.
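To make the page-level availability idea concrete, here’s a toy sketch of rolling individual request outcomes up into an availability number. The data shapes are invented for the example; they’re not how our monitoring actually records requests.

```typescript
// Each page view is the set of requests the browser made for that page.
interface RequestOutcome {
  url: string;
  ok: boolean; // false if the request failed or timed out
}

interface PageView {
  requests: RequestOutcome[];
}

// A page view counts as unavailable if any single request failed or timed out.
function availability(pageViews: PageView[]): number {
  const successful = pageViews.filter(pv => pv.requests.every(r => r.ok)).length;
  return successful / pageViews.length;
}

// Hypothetical sample: two clean page views and one with a failed ad request.
const sample: PageView[] = [
  { requests: [{ url: "i.html", ok: true }, { url: "80.jpg", ok: true }] },
  { requests: [{ url: "i.html", ok: true }, { url: "ads.js", ok: false }] },
  { requests: [{ url: "i.html", ok: true }, { url: "style.css", ok: true }] },
];

console.log(`availability: ${(availability(sample) * 100).toFixed(1)}%`); // 66.7%
```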

It’s also hard to agree on what “rendered in their [the users’] browser” really means. It’s easy to agree to start the timer when the user’s browser issues its first request. But it’s harder to agree when to stop the timer: is it when all network activity ceases? Is it when the browser fires its “onload” event? Is it when the page first becomes ready for user interaction? Is it when the visible area of the page (the “above the fold” area) is rendered completely? Is it when the components that most users interact with are rendered? In our case, we approximate a definition of “first ready for interaction”. Unfortunately, this can differ between browsers, and measuring it more precisely is a work in progress for us.
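To see why there are several defensible places to stop the timer, here’s a small in-browser sketch that reads a few candidate end points from the Navigation Timing API. It’s just one way to instrument a page from the client side, and it isn’t the method our third-party measurement service uses.

```typescript
// Run in the browser; waits a tick after onload so loadEventEnd is recorded.
window.addEventListener("load", () => {
  setTimeout(() => {
    const [nav] = performance.getEntriesByType(
      "navigation"
    ) as PerformanceNavigationTiming[];
    if (!nav) return;

    // All values are milliseconds since the navigation started; each is a
    // plausible answer to "when did the page finish?"
    console.table({
      firstByte: nav.responseStart,                  // server starts responding
      domInteractive: nav.domInteractive,            // DOM parsed; roughly "ready for interaction"
      domContentLoaded: nav.domContentLoadedEventEnd,
      onload: nav.loadEventEnd,                      // the browser's "load" event has finished
    });
  }, 0);
});
```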

I’m a big fan of measuring as much as possible, and making informed decisions using all of the data that’s available. We therefore measure many other aspects of our site’s performance. One metric I love is Time-To-First-Item or TTFI. This is the time it takes a user from beginning a search session to visiting their first view item page on eBay. This is a fantastic, user-centric way to look at eBay site speed: how long does it take a customer who wants to buy something using our search engine to get to the first destination where they could buy? It not only captures real site speed, which involves users interacting with potentially many pages on eBay, but it also captures something about how good our search results are. If the user finds what they want at the top of the results page, the TTFI falls (that’s good!). If the site is faster, the TTFI falls. So, improving TTFI helps our customers, and helps us take a holistic view on the eBay experience.
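TTFI is straightforward to compute once you have a per-session stream of click events. Here’s a minimal sketch; the event shape and field names are invented for illustration and aren’t our actual tracking schema.

```typescript
// Simplified session events: searches and view-item page visits.
interface SessionEvent {
  type: "search" | "view_item";
  timestampMs: number;
}

// TTFI: time from the first search in the session to the first
// view-item page visit. Returns null if the user never reached an item.
function timeToFirstItem(events: SessionEvent[]): number | null {
  const ordered = [...events].sort((a, b) => a.timestampMs - b.timestampMs);
  const firstSearch = ordered.find(e => e.type === "search");
  if (!firstSearch) return null;
  const firstItem = ordered.find(
    e => e.type === "view_item" && e.timestampMs >= firstSearch.timestampMs
  );
  return firstItem ? firstItem.timestampMs - firstSearch.timestampMs : null;
}

// Hypothetical session: search, refine the search, then open an item 14 seconds in.
const session: SessionEvent[] = [
  { type: "search", timestampMs: 0 },
  { type: "search", timestampMs: 6_000 },
  { type: "view_item", timestampMs: 14_000 },
];
console.log(timeToFirstItem(session)); // 14000
```

Better search results and a faster site both push this number down, which is exactly why we like it.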

Before we move on, it’s also important to note that there are significant differences between what users see when content is cached in their browser or near them, and when the browser is starting from a “cold start” and fetching everything afresh. We track both of these, and look at the performance of page fetches for new and returning users. In most cases, returning users don’t re-fetch static images and other static content, since it’s cached in their browser, and so their experience is typically much faster. We’ve observed that over 20% of our users are new users, that is, they don’t have any objects cached in their browser.
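One rough client-side way to tell a warm cache from a cold start is the Resource Timing API: a transferSize of 0 usually means an object was served from the browser cache (cross-origin resources that don’t send Timing-Allow-Origin also report 0, so treat it as approximate). This is just a sketch of that heuristic, not how we actually distinguish new from returning users.

```typescript
// Rough heuristic: resources with transferSize === 0 were (usually) served
// from the browser cache rather than fetched over the network.
function cacheHitRatio(): number {
  const resources = performance.getEntriesByType(
    "resource"
  ) as PerformanceResourceTiming[];
  if (resources.length === 0) return 0;
  const cached = resources.filter(r => r.transferSize === 0).length;
  return cached / resources.length;
}

window.addEventListener("load", () => {
  const ratio = cacheHitRatio();
  // A mostly-cold cache suggests a new (or long-absent) visitor.
  console.log(`~${Math.round(ratio * 100)}% of objects came from cache`);
});
```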

Looking at Site Speed

To give you an insight into how a browser interacts with eBay, take a look at Figure 2. It shows a waterfall of what objects a browser fetches when a new user loads the eBay search results page. I’ve produced this using the Fiddler2 web debugging proxy, hooked up to an instance of the Google Chrome browser running on my corporate machine at eBay.

Figure 2: Waterfall of the requests a browser makes when a new user loads the eBay search results page (captured with Fiddler2).

Right now, new users make just over 100 requests to fetch the entire results page, and we’re reducing that number every month. Figure 2 shows you the first forty requests or so that are made to fetch the page shown in Figure 3. Notice that only i.html (our base page) is fetched when the session begins, and then the browser requests around six objects simultaneously as we make our way down the timeline. You’ll also see that most requests in this example are for 80.jpg, and each one is actually a different image thumbnail shown in the search results page. All up, for this query, 1 request is for the base page, 45 are for image thumbnails, 6 are for JavaScript files, 3 are for CSS files, 7 are for advertising assets, 18 are for static images, 8 are for merchandizing assets, and 1 is for tracking. (Note that the time on the x-axis in the Figure isn’t realistic, because we’re intercepting what the browser is doing using Fiddler and slowing down the experience. Figure 2 is therefore for visualizing and diagnosing performance, not measuring actual elapsed times.)

Figure 3: The eBay search results page whose object requests are shown in Figure 2.
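If you’d like to produce a breakdown like the one above yourself, you can export a capture from Fiddler or your browser’s developer tools as a HAR file and summarize it. Here’s a rough Node sketch; the MIME-type buckets are much coarser than the categories listed above and are only meant as a starting point.

```typescript
import { readFileSync } from "fs";

// The minimal slice of the HAR format this sketch needs.
interface HarEntry {
  request: { url: string };
  response: { content: { mimeType: string } };
}
interface Har {
  log: { entries: HarEntry[] };
}

// Bucket each request by a coarse guess at its role on the page.
function classify(entry: HarEntry): string {
  const mime = entry.response.content.mimeType;
  if (mime.includes("javascript")) return "javascript";
  if (mime.includes("css")) return "css";
  if (mime.startsWith("image/")) return "image";
  if (mime.includes("html")) return "html";
  return "other";
}

// Usage: ts-node summarize-har.ts capture.har
const har: Har = JSON.parse(readFileSync(process.argv[2], "utf8"));
const counts = new Map<string, number>();
for (const entry of har.log.entries) {
  const bucket = classify(entry);
  counts.set(bucket, (counts.get(bucket) ?? 0) + 1);
}
console.log(`${har.log.entries.length} requests:`, Object.fromEntries(counts));
```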

You’ve now seen how the browser interacts with eBay. In our next blog post I’ll talk about how to improve site speed. Until then, thanks for stopping by!

Hugh E. Williams
Vice President, Buyer Experience Development

5 thoughts on “Site Speed for eBay Search Results”

  1. Ben Rollins

    Hey Hugh,

    Really nice presentation of the analysis you guys do on page loads – be interested to see where you head in terms of optimising the browser interactions.

    One of the issues that comes up a bit when I browse is that calls to third party sites slow down the site you’re actually reading – for example, overloaded or slow advertising servers that delay the loading of an entire page. How do you guys deal with those kind of availability problems? Do you give low priority to third party calls so they only kick in once your own content has loaded? And is this purely a technical problem, or are you able to leverage eBay’s strength in the marketplace to demand high-availability machines from anyone who wishes to serve advertising on your pages?

    Cheers,

    Ben

    1. Hugh Williams (post author)

      Hi Ben,

      Great comment, thanks for the feedback.

      We work closely with our partners, and typically we have agreed SLAs for the performance of their services. When things go wrong (which they occasionally do on our side and on our partners’ side), we tend to get our engineers talking and work out what we can do to solve or mitigate the problems we’re seeing. Solutions sometimes include disabling certain features (e.g., a particular type of ad), or excluding some traffic (e.g., an ad network) while we work on a permanent fix.

      You also point out a great idea, which is giving different “priorities” to different calls. Another way to think about this is to consider what order the calls are made in, and to ensure that the most important content is rendered first. As you point out, it’s also useful to consider how to make the page usable without the slower content, and to allow that content to be added to the page later without interfering with the user experience (or, when it’s really slow, to allow the page to render without that content at all). We have experimented with this over the past year, and we’ve made some good progress in that space — you’ll see a bit more about that in my next blog post, and maybe I’ll follow up with a new post sometime later that explains this more.

      Cheers, Hugh.

  2. l Cole

    I’ve noticed an increased number of 0 results from search since the fiddling has taken place. Also, Medved auction stats show significantly reduced sell-through rates as changes to search have been enacted.

    1. Hugh Williams (post author)

      Hi,

      Thanks for the feedback. I’ve passed on your feedback to the team, and asked them to take a look.

      However, the site speed work I’m discussing in this post has not caused the problems you’re seeing — it improves how quickly the search results page loads; it doesn’t affect which results you see on the page.

      Cheers, Hugh.

