Site Speed for eBay Search Results – Part two

Having described the problem of defining and measuring site speed in the first part, let’s now look at how to improve it.

Improving Site Speed

Most developers I’ve spoken to believe improving site speed is about reducing object sizes, so that less data is moved across the wire. Sure, that’s one thing to improve, but Figure 2 in our previous post gives you a good sense that there are often other important problems.

In my experience, there are six important areas of focus. I’ve ranked them here from what I believe is the most to least important for a large, modern web company:

  1. Reducing DNS lookups
  2. Reducing the number of HTTP requests
  3. Improving parallelization of requests
  4. Putting data closer to customers
  5. Improving JavaScript and CSS to reduce render times
  6. Reducing file size

1. Reducing DNS lookups

Why are DNS lookups the number one issue? It’s because DNS servers have a high variance in response times, typically dependent on their load and configuration. There are also often many network hops involved in looking up the IP address that matches a domain name. Given that website owners don’t have control over the DNS infrastructure of the web, it’s important to reduce the number of DNS requests to improve site speed and availability. How do you do this? Well, you reduce the number of domains and subdomains to which the customer’s browser makes requests. If you hook Fiddler or another debugging proxy up to Google’s product search, you’ll see it uses only four domains and subdomains. At eBay, we use more, and it’s something we’re working on.
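To make that concrete, here’s a rough sketch (not our production tooling) of counting the distinct hostnames a page references, which is a reasonable proxy for the DNS lookups a cold-cache browser has to make. The resource URLs are invented for illustration.

```typescript
// Sketch: estimate DNS lookups by counting distinct hostnames among a
// page's resource URLs. The URLs below are made-up examples.
const resourceUrls = [
  "http://www.example-shop.com/search?q=camera",
  "http://static1.example-cdn.com/css/results.css",
  "http://static1.example-cdn.com/js/results.js",
  "http://thumbs2.example-cdn.com/items/80.jpg",
  "http://ads.example-partner.net/banner.js",
];

function countDistinctHosts(urls: string[]): number {
  const hosts = new Set<string>();
  for (const u of urls) {
    // Each distinct hostname is a potential DNS lookup for a cold cache.
    hosts.add(new URL(u).hostname);
  }
  return hosts.size;
}

console.log(`Distinct hostnames (~ DNS lookups on a cold cache): ${countDistinctHosts(resourceUrls)}`);
```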

2. Reducing the number of HTTP requests

The second issue is reducing HTTP requests. Each HTTP request ties up a connection between the user’s browser and the server, and also has a fixed overhead (it requires setup, transfer, and acknowledgement of the transmission). Having fewer HTTP requests means lower fixed overhead, and better use of the concurrent bandwidth between the browser and the server. In July 2009, we had nearly 200 GET requests to compose our results page, and we’re now down to just over 100. We still have a way to go, and plenty of great ideas to get there. One of the easier ones is “spriting” static images into a single image, and slicing and dicing them on the client. To see this taken to its extreme, take a look at one of Google’s sprites for their product search – this is something they’ve done well.
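As a simple illustration of spriting (not how our pages are actually built), the sketch below generates the CSS that slices individual icons out of one combined image using background-position; the icon names, offsets, and sprite URL are invented.

```typescript
// Sketch: generate CSS rules that display individual icons out of a single
// sprite image by offsetting background-position.
interface SpriteSlice {
  name: string; // CSS class suffix
  x: number;    // left offset of the icon inside the sprite, in px
  y: number;    // top offset, in px
  w: number;    // icon width, in px
  h: number;    // icon height, in px
}

function spriteCss(spriteUrl: string, slices: SpriteSlice[]): string {
  return slices
    .map(
      (s) =>
        `.icon-${s.name} { background: url(${spriteUrl}) no-repeat -${s.x}px -${s.y}px; ` +
        `width: ${s.w}px; height: ${s.h}px; }`
    )
    .join("\n");
}

// Invented example: two 16x16 icons packed side by side in one image.
console.log(
  spriteCss("/img/sprite.png", [
    { name: "cart", x: 0, y: 0, w: 16, h: 16 },
    { name: "star", x: 16, y: 0, w: 16, h: 16 },
  ])
);
```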

3. Improving parallelization of requests

The third area is keeping the browser busy, making sure it’s fetching as much as it can. Browsers typically fetch up to 7 resources at the same time, but they place limits on how many simultaneous requests they make to one subdomain. Most large properties use tricks to get around this. For example, we serve our thumbnail images from several “thumbs” domains, which actually point to the same machines. This means the browser will happily open more simultaneous connections. Of course, the flipside is more DNS lookups. It seems that most folks in large web properties are settling for 2 or 3 image-serving subdomains these days. It’s also important to think about how the page is constructed, to make sure that the right elements are fetched in the right order; we do lots of experimentation to make sure we build the page in the best order possible for our customers.
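Here’s a minimal sketch of the idea (the subdomain names are hypothetical, and this isn’t our actual URL scheme): thumbnail requests are spread across a few image subdomains that all resolve to the same servers.

```typescript
// Sketch: spread thumbnail requests across a few image subdomains so the
// browser opens more parallel connections. Hostnames are hypothetical;
// all of them would point at the same image servers.
const THUMB_HOSTS = [
  "thumbs1.example-cdn.com",
  "thumbs2.example-cdn.com",
  "thumbs3.example-cdn.com",
];

function thumbnailUrl(itemId: number, file: string): string {
  // A stable mapping keeps a given item on the same host, so the browser
  // can reuse its cached copy of that thumbnail across page views.
  const host = THUMB_HOSTS[itemId % THUMB_HOSTS.length];
  return `http://${host}/items/${itemId}/${file}`;
}

console.log(thumbnailUrl(123456789, "80.jpg"));
```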

4. Putting data closer to customers

The fourth area is basically caching. The more you can cache in the user’s browser, or at the user’s ISP, the better. We work with partners to put static resources, such as GIFs, JavaScript, and CSS files as close to our customers as we can. Obviously, the closer the data is to the customer, the lower the latency in fetching the data and the more reliably it can be fetched. Of course, there are plenty of resources that are hard to cache – it’s impossible to cache advertising, or assets that continually change (such as counters at the bottom of pages).
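To illustrate the caching side of this, here’s a minimal sketch of a server that marks static, versioned assets as cacheable for a long time and dynamic content as uncacheable. It’s a toy example, not our serving infrastructure.

```typescript
// Sketch: serve static assets with long-lived caching headers so browsers
// and intermediate caches can keep them close to the user.
import * as http from "http";

const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60;

http
  .createServer((req, res) => {
    if (req.url && req.url.startsWith("/static/")) {
      // Static, versioned assets (GIFs, JavaScript, CSS) can be cached aggressively.
      res.setHeader("Cache-Control", `public, max-age=${ONE_YEAR_SECONDS}`);
    } else {
      // Dynamic content such as advertising or counters should not be cached.
      res.setHeader("Cache-Control", "no-cache");
    }
    res.end("ok");
  })
  .listen(8080);
```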

5. Improving JavaScript and CSS to reduce render times

The fifth area is a complex software engineering investment. In brief, I believe two things: the less time the browser spends “paused” processing JavaScript and not retrieving or showing pages, the better; and, second, the more efficiently you can process JavaScript and CSS, the faster the page will render. This typically requires significant custom work for each of the major browsers. For example, IE6, IE7, and IE8 are all very different. Shipping code for every browser in one bundle flies in the face of keeping pages compact, so it’s often a good idea to fetch different CSS and JavaScript files, depending on the browser and operating system.
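Here’s a rough client-side sketch of that last idea (the file names are invented, and user-agent sniffing like this is a simplification of what you’d do in production): pick a stylesheet tailored to the visitor’s browser instead of shipping one oversized file to everyone.

```typescript
// Sketch (runs in the browser): load a stylesheet tailored to the visitor's
// browser rather than shipping one oversized file to everyone.
function browserSpecificStylesheet(): string {
  const ua = navigator.userAgent;
  if (ua.indexOf("MSIE 6") !== -1) return "/css/results-ie6.css";
  if (ua.indexOf("MSIE 7") !== -1) return "/css/results-ie7.css";
  if (ua.indexOf("MSIE 8") !== -1) return "/css/results-ie8.css";
  return "/css/results-modern.css";
}

const link = document.createElement("link");
link.rel = "stylesheet";
link.href = browserSpecificStylesheet();
document.getElementsByTagName("head")[0].appendChild(link);
```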

6. Reducing file size

And, last, is the common starting point for many developers. It’s good to remove whitespace and comments from JavaScript files. It’s good to only download what you need; for example, remove JavaScript that isn’t used on the page. It’s great to compress images more effectively (while also making sure they have awesome fidelity), and to keep all assets as compact as possible. We spend plenty of time here too.
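A quick way to see how much compactness matters is to compare an asset’s raw size with its compressed size. The sketch below does that with gzip; the file path is just a placeholder.

```typescript
// Sketch: compare an asset's raw size with its gzipped size to see how
// much compression saves on the wire.
import * as fs from "fs";
import * as zlib from "zlib";

function reportSavings(path: string): void {
  const raw = fs.readFileSync(path);
  const gzipped = zlib.gzipSync(raw);
  const saved = 100 * (1 - gzipped.length / raw.length);
  console.log(
    `${path}: ${raw.length} bytes raw, ${gzipped.length} bytes gzipped (${saved.toFixed(1)}% smaller)`
  );
}

reportSavings("./results.js"); // placeholder path
```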

Our Experience

Our work on site speed over the last year has delivered amazing results to our customers. Our users are buying at least 3% more each week than they used to, simply because they get what they want faster and can use their time better. Of course, this is also great for our sellers who build their businesses on eBay. We continue to work on site speed, and we’ve got some real rocket science in the hopper that’ll transform our experience to a new, faster level. I’ll write about that again someday soon. Thanks for stopping by!

Hugh E. Williams
Vice President, Buyer Experience Development

Site Speed for eBay Search Results

First, welcome to the eBay technical blog! Each month, we will publish one or two entries describing technical challenges at eBay, and how we go about solving them. We look forward to your comments, and we welcome your suggestions for articles.

It’s my pleasure to write the first entry in our new blog. In two parts, I’m going to introduce you to how we’ve worked on site speed over the past year, and the results that work has delivered. The bottom line is that improving site speed has helped our customers and driven our business. With improvements in site speed, sellers have sold more, buyers have bought more, and eBay’s business has grown as a result. Site speed matters, and we continue to drive improvements.

Defining Site Speed

A simple definition of site speed is “average latency from the time the user submits a request for a page until it’s rendered in their browser”. However, this isn’t easy to measure: we have to decide which users we measure this for, and how we go about the measurement. In our case, to realistically simulate what users see, we use a third-party service to measure latency, and ask them to measure the typical experience. They do this by fetching our pages from hundreds of locations in the US and in Europe. They’re able to provide us with measurements from the US backbone (the main trunk that connects the major telecom providers and ISPs), as well as measurements from the “last mile”, that is, close to the small ISPs who provide the service to most of our customers. In our case, we collect measurements every few minutes from hundreds of points, and our team looks at latencies, availability, and several different types of requests; we’ll talk about these in a moment.

The simple definition isn’t the most effective for several reasons.

Importantly, mean latencies hide many sins. Take a look at the fictional example in Figure 1, which shows the distribution of latencies for two different implementations of the same page. The x-axis is the user latency in seconds, and the y-axis is the number of customers (in thousands) who are seeing the page. The distribution of values is different between the red and blue lines: customers in the “red line” experience have latencies in the range of 2.5 to 5.5 seconds, and customers in the “blue line” experience have latencies in the range of approximately 1 to 7 seconds. But the mean average latency for both experiences is 4 seconds. The problem is that over a quarter of the page views in the “blue line” experience are slower than any page fetch in the “red line” experience – and, as you’ve probably observed yourself, it’s the worst case scenarios that leave the largest impression. (We’ve all had those pages that occasionally take much longer to load, seemingly hanging on one browser fetch. You’ve probably closed some of those windows or tabs, and gone somewhere else. Or you’ve hit the refresh button.)

Figure 1: Latency distributions for two implementations of the same page (fictional example).

So, what do we do to realistically measure the user experience? In our case, we track the 90th and 95th percentile latencies. This means we measure the latency below which 90% and 95% of our page requests complete, respectively. In the fictional example in Figure 1, the 90th percentile for the “red line” experience is just under 5 seconds, and the 95th percentile is around 5.3 seconds. In the “blue line” experience, it’s around 6.5 seconds for the 90th percentile and just under 7 seconds for the 95th percentile. We’d therefore view the “red line” experience as substantially better (and, of course, we’d check that it really was using a statistical test, such as a one-sided t-test). Another thing that’s great about measuring 90th and 95th percentile latencies is that they’re a great diagnostic tool – we pull apart the data we get, look for the bottlenecks, and fix those. By fixing them we not only improve the percentile latencies and make our customers happier, but we also substantially improve the mean latencies.
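For concreteness, here’s a small sketch of how a percentile latency can be computed from a batch of samples. Real monitoring pipelines work on streams of measurements, but the idea is the same; the sample values are made up.

```typescript
// Sketch: compute a percentile latency from a batch of measured samples
// using the nearest-rank method.
function percentile(latenciesMs: number[], p: number): number {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  // Index of the sample below which p% of the measurements fall.
  const index = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[index];
}

// Made-up latency samples, in milliseconds.
const samples = [1200, 1450, 1600, 1750, 2100, 2300, 2600, 3100, 4800, 7200];
console.log(`p90: ${percentile(samples, 90)} ms, p95: ${percentile(samples, 95)} ms`);
```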

Another metric we measure is availability. At the “last mile”, many things can go wrong between a user’s machine and our servers, such as network glitches, transient machine failures at ISPs, and so on. When you’re viewing our search results page at eBay, you’re making many requests to eBay servers in our data centers, and also requests to other providers who deliver advertising and other page components. If any one of these requests fail or timeout at the last mile, we count this as an availability issue for that page. As part of our site speed work, we track availability and work on improving it. Improvements include creating fewer opportunities to fail, working with our partners to improve their availability, and working on our services to improve them.
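As a simplified illustration of that definition (not our actual measurement pipeline), a page view is counted as unavailable if any of its component requests failed or timed out:

```typescript
// Sketch: a page view counts as unavailable if any component request on it
// failed or timed out; availability is the fraction of fully successful views.
interface ComponentRequest {
  url: string;
  ok: boolean; // false if the request failed or timed out
}

type PageView = ComponentRequest[];

function availability(views: PageView[]): number {
  const successful = views.filter((view) => view.every((r) => r.ok)).length;
  return (100 * successful) / views.length;
}

// Made-up measurements: the second page view had one failed ad request.
const measured: PageView[] = [
  [{ url: "/i.html", ok: true }, { url: "/thumbs/80.jpg", ok: true }],
  [{ url: "/i.html", ok: true }, { url: "http://ads.example.net/banner.js", ok: false }],
];
console.log(`Availability: ${availability(measured).toFixed(1)}%`); // 50.0%
```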

It’s also hard to agree on what “rendered in their [the users’] browser” really means. It’s easy to agree to start the timer when the user’s browser issues its first request. But it’s harder to agree when to stop the timer: is it when all network activity ceases? Is it when the browser fires its “onload” event? Is it when the page first becomes ready for user interaction? Is it when the visible area of the page (the “above the fold” area) is rendered completely? Is it when the components that most users interact with are rendered? In our case, we approximate a definition of “first ready for interaction”. Unfortunately, this can differ between browsers, and it’s a work in progress for us to measure this more granularly.
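To show how crude the simple answers are, here’s a minimal in-page timing sketch that starts a timer when the page’s first script runs and stops it at the browser’s load event. It misses everything that happens before the HTML starts arriving, and it says nothing about when the page is actually usable, which is part of why we rely on external measurement instead.

```typescript
// Sketch (runs in the browser): one crude way to time "page load".
// The timer starts when this script executes in the page <head>, so it
// misses the network time spent before the HTML begins arriving.
const startMs = Date.now();

window.addEventListener("load", () => {
  const elapsedMs = Date.now() - startMs;
  // In practice this measurement would be beaconed back to a logging endpoint.
  console.log(`load event fired ${elapsedMs} ms after the page script started`);
});
```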

I’m a big fan of measuring as much as possible, and making informed decisions using all of the data that’s available. We therefore measure many other aspects of our site’s performance. One metric I love is Time-To-First-Item or TTFI. This is the time it takes a user from beginning a search session to visiting their first view item page on eBay. This is a fantastic, user-centric way to look at eBay site speed: how long does it take a customer who wants to buy something using our search engine to get to the first destination where they could buy? It not only captures real site speed, which involves users interacting with potentially many pages on eBay, but it also captures something about how good our search results are. If the user finds what they want at the top of the results page, the TTFI falls (that’s good!). If the site is faster, the TTFI falls. So, improving TTFI helps our customers, and helps us take a holistic view on the eBay experience.
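Here’s a small sketch of how TTFI could be computed from a session’s page-view log. The event format is hypothetical; it isn’t our tracking schema.

```typescript
// Sketch: compute Time-To-First-Item (TTFI) from a hypothetical session log.
// TTFI is the time from the start of a search session until the user's
// first view-item page.
interface PageEvent {
  page: "search" | "viewItem" | "other";
  timestampMs: number;
}

function timeToFirstItemMs(session: PageEvent[]): number | null {
  const firstSearch = session.find((e) => e.page === "search");
  if (!firstSearch) return null;
  const firstItem = session.find(
    (e) => e.page === "viewItem" && e.timestampMs >= firstSearch.timestampMs
  );
  return firstItem ? firstItem.timestampMs - firstSearch.timestampMs : null;
}

// Made-up session: two searches, then a view-item page 12 seconds in.
const session: PageEvent[] = [
  { page: "search", timestampMs: 0 },
  { page: "search", timestampMs: 6000 },
  { page: "viewItem", timestampMs: 12000 },
];
console.log(`TTFI: ${timeToFirstItemMs(session)} ms`);
```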

Before we move on, it’s also important to note that there are significant differences between what users see when content is cached in their browser or near them, and when the browser is starting from a “cold start” in fetching content. We track both of these, and look at the performance of page fetches for new and returning users. In most cases, returning users don’t re-fetch the static images and other static content, since it’s cached in their browser, and so their experience is typically much faster. We’ve observed that over 20% of our users are new users, that is, they don’t have any objects cached in their browser.

Looking at Site Speed

To give you an insight into how a browser interacts with eBay, take a look at Figure 2. It shows a waterfall of what objects a browser fetches when a new user loads the eBay search results page. I’ve produced this using the Fiddler2 web debugging proxy, hooked up to an instance of the Google Chrome browser running on my corporate machine at eBay.

Figure 2: Waterfall of the requests a new user’s browser makes when loading the eBay search results page.

Right now, new users make just over 100 requests to fetch the entire results page, and we’re reducing that number every month. Figure 2 shows you the first forty requests or so that are made to fetch the page shown in Figure 3. Notice that only i.html (our base page) is fetched when the session begins, and then the browser requests around six objects simultaneously as we make our way down the timeline. You’ll also see that most requests in this example are for 80.jpg, and each one is actually a different image thumbnail shown in the search results page. All up, for this query, 1 request is for the base page, 45 are for image thumbnails, 6 are for JavaScript files, 3 are for CSS files, 7 are for advertising assets, 18 are for static images, 8 are for merchandizing assets, and 1 is for tracking. (Note that the time on the x-axis in the Figure isn’t realistic, because we’re intercepting what the browser is doing using Fiddler and slowing down the experience. Figure 2 is therefore for visualizing and diagnosing performance, not measuring actual elapsed times.)

Figure 3: The search results page whose requests are shown in Figure 2.

You’ve now seen how the browser interacts with eBay. In our next blog post I’ll talk about how to improve site speed. Until then, thanks for stopping by!

Hugh E. Williams
Vice President, Buyer Experience Development