The Power of Perceived Performance

Recent years have seen a huge influx of SPAs — Single Page Applications. Though SPAs enhance the user experience, implementing them for large-scale web applications is a complex task. At eBay, we faced a similar challenge when we wanted to migrate a couple of our key desktop flows (search and item pages) from full page refreshes to an app-like experience. Some of the key challenges were:

  • Server and client sync: Solving this challenge is super critical for e-commerce applications. Both the browser and the server should maintain the state of the app. At any point in time the URL should be portable, meaning it should render the page markup the same way every time. From an e-commerce perspective, three main circumstances make this point critical: SEO, browser refreshes (especially for items ending soon), and URL sharing.
  • Code redundancy: This point follows from the previous one. In order for the app to be rendered on the server, all logic built in JavaScript for the browser should be replicated on the server. The result, however, is a maintenance nightmare, especially for large distributed teams. Although there are solutions like rendr and PhantomJS to simulate the browser environment on the server, they don’t work at scale.
  • Performance penalty for the first hit: Most SPA frameworks out there require the initial rendering on page load to happen on the browser. This is an anti-performance pattern and has proven to be a bad way to go for many (see, for example, Twitter's experience). In addition, with initial rendering on the browser we lose the huge benefits of the preload scanners in modern browsers. This point, then, is a reiteration that server-side rendering is not just an add-on, but a must.
  • Browser back/forward: This may come as a surprise to many, but from what we have observed, even the slightest deviation from the browser's default back/forward behavior affects consumer behavior. Users are so accustomed to these actions (mainly in the desktop environment) that we need to make sure they work as expected. This is not so much an SPA challenge as something to keep in mind.

Considering the above facts and still wanting to build a seamless experience, we decided to go the PJAX route. PJAX (pushState + AJAX) is a technique that delivers a fast browsing experience, without the SPA overhead. It has worked well for biggies like Twitter and GitHub. When looking for a PJAX library, we learned about YouTube's SPF — Structured Page Fragments — from the 2014 Velocity conference (yes, SPF not SPA; we know it's confusing). A quick dive into SPF indicated it was pretty much what we wanted. Moreover, the contributors to SPF responded promptly to all our queries and enhancement requests, thus enabling us to get started quickly. So what does SPF offer?

  • Application code remains intact: With SPF, we don't have to change the way we build applications, and no specialized client treatment is required. The only change needed was to add a conditional hook (based on a particular request URL param) in our server response pipeline to respond with the HTML in JSON format, instead of with standard HTML (a sketch of such a hook follows this list). This benefit is huge for us, as development teams can build and maintain applications while staying agnostic about how client-side navigations happen.
  • Markup is on server: With SPF, markup is always rendered on the server. Along with the code maintenance benefit, this feature also removes the dependency on client hardware specifically for rendering. Although client machines are getting increasingly powerful, we still have a sizable set of our global users on the lower hardware spectrum whom we need to support. Our previous attempts at client-side rendering for these users were not fruitful. On a side note, we are very interested in trying out React for client-side rendering after initial load.
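For illustration, here is a minimal sketch of what such a conditional hook could look like in an Express-style route. The spf URL param follows SPF's convention for tagging navigation requests, but the route, the render* helpers, and the model-building call are hypothetical stand-ins for our actual pipeline.

// Sketch of a conditional SPF hook (hypothetical helpers).
app.get('/itm/:itemId', function(req, res) {
    var model = buildItemModel(req.params.itemId); // hypothetical data/model step

    if (req.query.spf === 'navigate') {
        // SPF navigation request: respond with the page as HTML-in-JSON fragments.
        res.json({
            title: model.pageTitle,
            head: renderHead(model),                  // page-specific <link>/<style>
            body: { mainContent: renderBody(model) }, // element id -> HTML
            foot: renderFoot(model)                   // page-specific <script> tags
        });
    } else {
        // Normal request (first hit, SEO crawlers, shared URLs): full HTML page.
        res.send(renderFullPage(model));
    }
});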

JavaScript and CSS optimizations 

Moving to SPF provided an opportunity for us to clean up JavaScript and CSS. We did two types of optimization.

  1. JavaScript events: Our pages did not have a standard way of handling events — some were handled at an individual element level and some used delegation. This situation was not ideal, as complex pages were sometimes sluggish because they had tons of events. Now with PJAX, we've brought in a standard: widgetize our UI modules and delegate events at the widget container level (see the sketch after this list). This standard has made event handling more efficient. Furthermore, we need to re-initialize only those widgets that changed on page navigation.
  2. Resource bundling: Most pages were bundled so that there was only one JavaScript and one CSS URL per page. All library JS (jQuery, Raptor, tracking, utils, etc.) and CSS (skin) were combined with the application JS and CSS, making them one URL each. While this was good for reducing HTTP requests, it also worked against caching: when a user navigated from one page to another, the entire CSS and JS had to be downloaded and executed again, and the library files were the big, unnecessary chunk of this overhead. With SPF, this approach would fail right away, since navigation happens in a single page context and re-executing the same library code (like jQuery) would result in unintended behaviors. To fix this situation, we created two resource bundles for key pages — all common JS and CSS shared across pages as one resource, and the application JS and CSS per page as the second resource. This solution saves a lot of time in resource parsing and execution, as only the small amount of application JS and CSS has to be processed on each navigation. Also, in SPF mode the resource processing happens only for the first navigation; repeated views can leverage the previously executed CSS and JS.
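As an illustration of the event-delegation standard described in item 1 above, here is a minimal sketch; the data-widget markup, the init hook, and the handler names are hypothetical stand-ins for our internal widget framework.

// One delegated listener per widget container instead of one per element.
function initWidget($container) {
    $container.on('click', '.watch-btn', function(e) {
        e.preventDefault();
        toggleWatch($(this).data('item-id')); // hypothetical handler
    });
}

// On SPF navigation, re-initialize only the widgets inside the updated region.
function onRegionUpdated(updatedRoot) {
    $(updatedRoot).find('[data-widget]').each(function() {
        initWidget($(this));
    });
}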

Progress indicators

Now back to why we chose the title “The Power of Perceived Performance.” Moving to SPF measurably increased performance on each navigation. But we had a problem: the performance gain was not visually perceptible. In an internal demo, the feedback was “yeah, it seems to be a little quicker, but nothing else is different.” We were scratching our heads about what was missing. Finally, it all came down to one thing — a progress indicator. Yes, we did not have progress indicators when users navigated pages in SPF mode.

Transitions or progress indicators mask the slowness in applications. There has been much research around this phenomenon, and we actually experienced the humongous impact it has. Close observation of all major websites that use the PJAX technique reveals they use some sort of progress indicator. For instance, Twitter navigation uses a small throbber, replacing the bird icon in the static header. GitHub replaces the icon right next to a file or folder with a throbber. YouTube shows a red progress bar at the top to indicate the progress of a user’s action.

When we were considering how to implement a transition for SPF-based navigation, a lot of fancy ideas came up. From internal testing, the feedback we received was clear: more than the fancy stuff, customers just need an elegant transition. We ended up with a real-time progress bar similar to YouTube's. With the progress indicator in place, we did another internal launch. This time, the feedback was unanimous: “Wow! The page looks fast.”

It was surprising how a tiny progress indicator could change the perception of an entire application. The performance numbers with and without the progress indicator were the same, but with that indicator the application feels much faster. This is the real power of perceived performance. As a bonus, avoiding the re-parsing and re-execution of large CSS and JavaScript on each navigation made our pages ready for interaction almost instantly.

Currently the PJAX-based navigation is enabled within the item page and is in production A/B testing. Search is next, and soon other key flows will follow. The ultimate goal is ONE eBay desktop experience.

Huge thanks to YouTube engineers Alex Nicksay, David Phillips, and Marcel Duran for their continuous support throughout the migration. Last but not least, thanks to my colleagues Karthik, Yaniv, Vidya, and Raghu for teaming up and making this a reality.

Async Fragments: Rediscovering Progressive HTML Rendering with Marko

At eBay, we take site speed very seriously and are always looking for ways to allow developers to create faster-loading web apps. This involves fully understanding and controlling how web pages are delivered to web browsers. Progressive HTML rendering is a relatively old technique that can be used to improve the performance of websites, but it has been lost in a whole new class of web applications. The idea is simple: give the web browser a head start in downloading and rendering the page by flushing out early and multiple times. Browsers have always had the helpful feature of parsing and responding to the HTML as it is being streamed down from the server (even before the response is ended). This feature allows the HTML and external resources to be downloaded earlier, and for parts of the page to be rendered earlier. As a result, both the actual load time and the perceived load time improve.

In this blog post, we will take an in-depth look at a technique we call "Async Fragments" that takes advantage of progressive HTML rendering to improve site speed in ways that do not drastically complicate how web applications are built. For concrete examples we will be using Node.js, Express.js and the Marko templating engine (a JavaScript templating engine that supports streaming, flushing, and asynchronous rendering). Even if you are not using these technologies, this post can give you insight into how your stack of choice could be further optimized.

To see the techniques discussed in this post in action, please take a look at the accompanying sample application.

Background

Progressive HTML rendering is discussed in the post The Lost Art of Progressive HTML Rendering by Jeff Atwood, which was published back in 2005. In addition, the "Flush the Buffer Early" rule is described by the Yahoo! Performance team in their Best Practices for Speeding Up Your Web Site guide. Stoyan Stefanov provides an in-depth look at progressive HTML rendering in his Progressive rendering via multiple flushes post. Facebook discussed how they use a technique they call "BigPipe" to improve page load times and perceived performance by dividing up a page into "pagelets." Those articles and techniques inspired many of the ideas discussed in this post.

In the Node.js world, its most popular web framework, Express.js, unfortunately recommends a view rendering engine that does not allow streaming and thus prevents progressive HTML rendering. In a recent post, Bypassing Express View Rendering for Speed and Modularity, I described how streaming can be achieved with Express.js; this post is largely a follow-up to discuss how progressive HTML rendering can be achieved with Node.js (with or without Express.js).

Without progressive HTML rendering

A page that does not utilize progressive HTML rendering will have a slower load time because no bytes are flushed until the complete HTML response is built. Once the client finally receives the complete HTML, it will then discover that it needs to download additional external resources (such as CSS, JavaScript, fonts, and images), which requires additional round trips. Such pages will also have a slower perceived load time, since the screen will not update until the complete HTML is downloaded and the CSS and fonts referenced in the <head> section are downloaded. Without progressive HTML rendering, a server/client waterfall chart might be similar to the following:

Single Flush Waterfall Chart

The corresponding page controller might look something like this:

function controller(req, res) {
    // Kick off all of the back-end data calls in parallel ("async" module)
    async.parallel([
            function loadSearchResults(callback) {
                ...
            },
            function loadFilters(callback) {
                ...
            },
            function loadAds(callback) {
                ...
            }
        ],
        function() {
            ...
            var viewModel = { ... };
            // Only now, after ALL of the data has loaded, is any HTML rendered and flushed
            res.render('search', viewModel);
        });
}

As you can see in the above code, the page HTML is not rendered until all of the data is asynchronously loaded.

Because the HTML is not flushed until all back-end services are completed, the user will be staring at a blank screen for a large portion of the time. This will result in a sub-par user experience (especially with a poor network connection or with slow back-end services). We can do much better if we flush part of the HTML earlier.

Flushing the head early

A simple trick to improve the responsiveness of a website is to flush the head section immediately. The head section will typically include the links to the external CSS resources (i.e. the <link> tags), as well as the page header and navigation. With this approach the external CSS will be downloaded sooner and the initial page will be painted much sooner as shown in the following waterfall chart:

Flush Head Waterfall Chart

As you can see in the chart above, flushing the head early reduces the time to render the initial page. This technique improves the responsiveness of the page, but it does not significantly reduce the total time it takes to make the page fully functional. With this approach, the server is still waiting for all back-end services to complete before flushing the final HTML. In addition, downloading of external JavaScript resources will be delayed since <script> tags are placed at the end of the page (assuming you are following best practices) and don't get sent out until the second and final flush.
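As a rough illustration outside of any framework, flushing the head early from a Node.js handler amounts to writing the static top of the page before the back-end calls complete; the render helpers and the aggregate data call below are hypothetical.

// Minimal sketch of flushing the <head> early.
function controller(req, res) {
    res.setHeader('Content-Type', 'text/html; charset=utf-8');

    // Flush #1: everything that does not depend on back-end data.
    res.write(renderHeadAndHeader());

    loadAllPageData(req, function(err, data) {      // hypothetical aggregate call
        // Flush #2 (final): the rest of the page once the data is ready.
        res.end(renderRestOfPage(data));
    });
}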

Multiple flushes

Instead of flushing only the head early, it is often beneficial to flush multiple times before ending the response. Typically, a page can be divided into multiple fragments where some of the fragments may depend on data asynchronously loaded from various back-end services while others may not depend on any asynchronously loaded data. The fragments that depend on asynchronously loaded data should be rendered asynchronously and flushed as soon as possible.

For now, we will assume that these fragments need to be flushed in the proper HTML order (versus the order that the data asynchronously loads), but we will also show how out-of-order flushing can be used to further improve both page load times and perceived performance. When using "in-order" flushing, fragments that complete out of order will need to be buffered until they are ready to be flushed in the proper order.
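Before looking at how Marko automates this, here is a hand-rolled sketch of in-order flushing. The load functions (assumed to return Promises) and the render helpers are hypothetical; the point is that all data calls start in parallel, but each fragment is flushed only when its turn in the HTML arrives, so fragments that complete early are implicitly buffered.

// Hand-rolled in-order flushing of three fragments (hypothetical helpers).
function controller(req, res) {
    res.write(renderHeadAndHeader());           // flush the top of the page

    // Kick off all back-end calls in parallel...
    var searchResults = loadSearchResults(req);
    var filters = loadFilters(req);
    var ads = loadAds(req);

    // ...but write each fragment in document order as it becomes available.
    searchResults
        .then(function(data) { res.write(renderSearchResults(data)); return filters; })
        .then(function(data) { res.write(renderFilters(data)); return ads; })
        .then(function(data) { res.write(renderAds(data)); res.end(renderFooter()); });
}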

In-order flushing of async fragments

As an example, let's assume we have divided a complex page into the following fragments:

Page diagram

Each fragment is assigned a number based on the order that it appears in the HTML document. In code, our output HTML for the page might look like the following:

<html>
<head>
    <title>Clothing Store</title>
    <!-- 1a) Head <link> tags -->
</head>
<body>
    <header>
       <!-- 1b) Header -->
    </header>
    <div class="body">
        <main>
            <!-- 2) Search Results -->
        </main>
        <section class="filters">
            <!-- 3) Search filters -->
        </section>
        <section class="ads">
            <!-- 4) Ads -->
        </section>
    </div>
    <footer>
        <!-- 5a) Footer -->
    </footer>
    <!-- 5b) Body <script> tags -->
</body>
</html>

The Marko templating engine provides a way to declaratively bind template fragments to asynchronous data provider functions (or Promises). An asynchronous fragment is rendered when the asynchronous data provider function invokes the provided callback with the data. If the asynchronous fragment is ready to be flushed, then it is immediately flushed to the output stream. Otherwise, if the asynchronous fragment completed out of order then the rendered HTML is buffered in memory until it is ready to be flushed. The Marko templating engine ensures that fragments are flushed in the proper order.

Continuing with the previous example, our HTML page template with asynchronous fragments defined will be similar to the following:

<html>
<head>
    <title>Clothing Store</title>
    <!-- Head <link> tags -->
</head>
<body>
    <header>
        <!-- Header -->
    </header>
    <div class="body">
        <main>
            <!-- Search Results -->
            <async-fragment data-provider="data.searchResultsProvider"
                var="searchResults">

                <!-- Do something with the search results data... -->
                <ul>
                    <li for="item in searchResults.items">
                        $item.title
                    </li>
                </ul>

            </async-fragment>
        </main>
        <section class="filters">

            <!-- Search filters -->
            <async-fragment data-provider="data.filtersProvider"
                var="filters">
                <!-- Do something with the filters data... -->
            </async-fragment>

        </section>
        <section class="ads">

            <!-- Ads -->
            <async-fragment data-provider="data.adsProvider"
                var="ads">
                <!-- Do something with the ads data... -->
            </async-fragment>

        </section>
    </div>
    <footer>
        <!-- Footer -->
    </footer>
    <!-- Body <script> tags -->
</body>
</html>

The data provider functions should be passed to the template as part of the view model as shown in the following code for a sample page controller:

function controller(req, res) {
    template.render({
            searchResultsProvider: function(callback) {
                performSearch(req.params.category, callback);
            },

            filtersProvider: function(callback) {
                ...
            },

            adsProvider: function(callback) {
                ...
            }
        },
        res /* Render directly to the output HTTP response stream */);
}

In this particular example, the "search results" async fragment appears first in the HTML template, and it happens to take the longest time to complete. As a result, all of the subsequent fragments will need to be buffered on the server. The resulting waterfall with in-order flushing of async fragments is shown below:

In-order Flush Waterfall Chart

While the performance of this approach might be fine, we can enable out-of-order flushing for further performance gains as described in the next section.

Out-of-order flushing of async fragments

Marko achieves out-of-order flushing of async fragments by doing the following:

  1. Instead of waiting for an async fragment to finish, a placeholder HTML element with an assigned id is written to the output stream.
  2. Out-of-order async fragments are rendered before the ending </body> tag in the order that they complete.
  3. Each out-of-order async fragment is rendered into a hidden <div> element.
  4. Immediately after each out-of-order fragment, a <script> block is rendered to replace the placeholder DOM node with the DOM nodes of the corresponding out-of-order fragment.
  5. When all of the out-of-order async fragments complete, the remaining HTML (e.g. </body></html>) is flushed and the response is ended.

To clarify, here is what the output HTML might look like for a page with out-of-order flushing enabled:

<html>
<head>
    <title>Clothing Store</title>
    <!-- 1a) Head <link> tags -->
</head>
<body>
    <header>
        <!-- 1b) Header -->
    </header>
    <div class="body">
        <main>
            <!-- 2) Search Results -->
            <span id="asyncFragment0Placeholder"></span>
        </main>
        <section class="filters">
            <!-- 3) Search filters -->
            <span id="asyncFragment1Placeholder"></span>
        </section>
        <section class="ads">
            <!-- 4) Ads -->
            <span id="asyncFragment2Placeholder"></span>
        </section>
    </div>
    <footer>
        <!-- 5a) Footer -->
    </footer>

    <!-- 5b) Body <script> tags -->

    <script>
    window.$af=function(){
    // Small amount of code to support rearranging DOM nodes
    // Unminified:
    // https://github.com/raptorjs/marko-async/blob/master/client-reorder-runtime.js
    };
    </script>

    <div id="asyncFragment1" style="display:none">
        <!-- 4) Ads content -->
    </div>
    <script>$af(1)</script>

    <div id="asyncFragment2" style="display:none">
        <!-- 3) Search filters content -->
    </div>
    <script>$af(2)</script>

    <div id="asyncFragment0" style="display:none">
        <!-- 2) Search results content -->
    </div>
    <script>$af(0)</script>

</body>
</html>

One caveat with out-of-order flushing is that it requires JavaScript running on the client to move each out-of-order fragment into its proper place in the DOM. Thus, you would only want to enable out-of-order flushing if you know that the client's web browser has JavaScript enabled. Also, moving DOM nodes may cause the page to be reflowed, which can be visually jarring to the user and result in more client-side CPU usage. If reflow is an issue then there are tricks that can be used to avoid a reflow (e.g., reserving space as part of the initial wireframe). Marko also allows alternative content to be shown while waiting for an out-of-order async fragment.

To enable out-of-order flushing with Marko, the client-reorder="true" attribute must be added to each <async-fragment> tag, and the <async-fragments> tag must be added to the end of the page to serve as the container for rendered out-of-order fragments. Here is the updated <async-fragment> tag for the search results fragment:

<async-fragment data-provider="data.searchResultsProvider"
    var="searchResults"
    client-reorder="true>
    ...
</async-fragment>

The updated HTML page template with the new <async-fragments> tag is shown below:

<html>
<head>
    <title>Clothing Store</title>
    <!-- Head <link> tags -->
</head>
<body>
    ...

    <!-- Body <script> tags -->

    <async-fragments/>
</body>
</html>

In combination with out-of-order flushing, it may be beneficial to move <script> tags that link to external resources to the end of the first chunk (before all of the out-of-order chunks). While the server is busy preparing the rest of the page, the client can start downloading the external JavaScript required to make the page functional. As a result, the user will be able to start interacting with the page sooner.

Our final waterfall with out-of-order flushing will now be similar to the following:

Out-of-order Flush Waterfall Chart

The final waterfall shows that the strategy of out-of-order flushing of asynchronous fragments can significantly improve the load time and perceived load time of a page. The user will be met with a progressive loading of a page that is ready to be interacted with sooner.

Additional considerations

HTTP Transport and HTML compression

To allow HTML to be served in parts, chunked transfer encoding should be used for the HTTP response. Chunked transfer encoding uses delimiters to break up the response, and each flush results in a new chunk. If gzip compression is enabled (and it should be), then flushing the pending data to the gzip stream will result in a gzip data frame being written to the response as part of each chunk. Flushing too often will reduce the effectiveness of the compression algorithm, but without periodic flushing there is no progressive HTML rendering at all. By default, Marko will flush at the beginning of an <async-fragment> block (in order to send everything that has already completed), as well as when an async fragment completes. This default strategy results in efficient progressive loading of an HTML page as long as there are not too many async fragments.
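As a rough sketch of the plumbing (independent of Marko), a handler can pipe a gzip stream into the HTTP response and flush it at each fragment boundary; the render helpers and the data call are hypothetical, and real code would also check the request's Accept-Encoding header before compressing.

// Chunked, gzip-compressed progressive rendering in Node.js.
var zlib = require('zlib');

function handler(req, res) {
    res.setHeader('Content-Type', 'text/html; charset=utf-8');
    res.setHeader('Content-Encoding', 'gzip');

    var gzip = zlib.createGzip();
    gzip.pipe(res);             // no Content-Length, so Node uses chunked encoding

    gzip.write(renderHead());
    gzip.flush();               // chunk 1: emit a gzip frame for the <head> and header

    loadFragmentData(function(err, data) {   // hypothetical async data call
        gzip.write(renderFragment(data));
        gzip.flush();           // chunk 2: the completed fragment
        gzip.end(renderFooter());            // final chunk, then end the response
    });
}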

Binding behavior

For improved usability and responsiveness, there should not be a long delay between rendered HTML being displayed to the user in the web browser and behavior being attached to the associated DOM. At eBay, we use the marko-widgets module to bind behavior to DOM nodes. Marko Widgets supports binding behavior to rendered widgets immediately after each independent async fragment, as illustrated in the accompanying sample app. For immediate binding to work, the required JavaScript code must be included earlier in the page. For more details, please see the marko-widgets module documentation.

Error handling

It is important to note that as soon as a byte is flushed for the HTTP body, then the response is committed; no additional HTTP headers can be sent (e.g., no server-side redirects or cookie-setting), and the HTML that has been sent cannot be "unsent". Therefore, if an asynchronous data provider errors or times out, then the app must be prepared to show alternative content for that particular async fragment. Please see the documentation for the marko-async module for additional information on how to show alternative content in case of an error.

Summary

The Async Fragments technique allows web developers to maximize the benefits of progressive HTML rendering to produce web pages that have improved actual and perceived load times. Developers at eBay have found the concept of binding HTML template fragments to asynchronous data providers easy to grasp and utilize. In addition, the flexibility to support both in-order and out-of-order flushing of async fragments makes this technique applicable for all web browsers and user agents.

The Marko templating engine is being used as part of eBay's latest Node.js stack to improve performance while also simplifying how pages are constructed on both the server and the client. Marko is one of a few templating engines for Node.js and web browsers that support streaming, flushing, and asynchronous rendering. Marko has a simple HTML-based syntax, and the Marko compiler produces small and efficient JavaScript modules as output. We encourage you to try Marko online and in your next Node.js project. Because Marko is a key component of eBay's internal Node.js stack, and given that it is heavily documented and tested, you can be confident that it will be well supported.

Patrick Steele-Idem is a member of eBay's platform team who enjoys writing open-source software and improving how web applications are built. He is the author of RaptorJS, a suite of open-source front-end power tools that are being used within and outside eBay. You can follow Patrick on Twitter at @psteeleidem.

Diversity in Search

This post is about how to rank search results, specifically the list of items you get after doing a search on eBay.

Diversity

One simple approach to ranking is to give a score to each item. For concreteness, imagine the score to be an estimate of the probability that a user will buy the item. Each item is scored individually, and then the items are presented in score order, with the item most likely to be purchased at the top.

This simple approach makes a lot of sense. But there are some situations where it can work poorly. For example, suppose that there are 40 results, 20 for Sony and 20 for Panasonic. And suppose that all the Sony items have a score that is just a tiny bit better than any of the Panasonic items. Then the first 20 results will be Sony, giving the incorrect impression that we only have Sony. It would be better to show at least one Panasonic near the top, preferably high enough to be "above the fold". In other words, in this case I want to take diversity into account. An ultra-simple method to handle this case would be to add a small amount of random noise to each score. But I think you can see that this is somewhat of a kludge, and doesn't address the problem of balancing score and diversity head-on.

Solutions in the literature

One of the first principled methods for diversity goes back 15 years to the paper of Jaime Carbonell and Jade Goldstein, The use of MMR, diversity-based reranking for reordering documents and producing summaries, in Proceedings of SIGIR '98. Their idea is a simple one. Find a way to measure the similarity of items, where presumably two Sony items would be more similar than a Sony item and a Panasonic one. Then trade off the score and the similarity. In more detail, they propose placing items on the results page one step at a time. The first step places the item with the highest score, and records its index in the set of placed items I. The second step examines every unplaced item and compares each one's score with its similarity to the item i (where I = {i}). If P(j) is the score of j (P for probability) then each item j is evaluated using what they call marginal relevance: λP(j)−(1−λ)S(i,j). The marginal relevance is higher when the score P is better, and it is higher when the similarity to i is less. The parameter λ determines the relative importance of these two numbers. The item chosen for the second slot is the one with maximum marginal relevance (MMR):

 MMR = \mbox{argmax}_{j \notin I}\left(\lambda P(j) - (1 - \lambda)S(i, j)\right)

The MMR item is placed on the results page, and its index is added to I. In general the algorithm computes

MMR = \mbox{argmax}_{j \notin I}\left(\lambda P(j) - (1 - \lambda)\max_{i \in I}S(i, j)\right)

and places the resulting item on the results page as well as adding its index to I.
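To make the procedure concrete, here is a compact sketch of the MMR selection loop; the score array (P), the pairwise similarity function (S), and λ are assumed to be given.

// MMR re-ranking sketch: score[j] is P(j), sim(i, j) is S(i, j).
// Returns item indices in the order they should be placed.
function mmrRank(score, sim, lambda) {
    var placed = [];
    var unplaced = score.map(function(_, j) { return j; });

    while (unplaced.length > 0) {
        var best = null, bestValue = -Infinity;
        unplaced.forEach(function(j) {
            // Similarity to the closest already-placed item (0 when nothing is placed,
            // so the first pick is simply the highest-scoring item).
            var maxSim = placed.reduce(function(m, i) { return Math.max(m, sim(i, j)); }, 0);
            var value = lambda * score[j] - (1 - lambda) * maxSim;
            if (value > bestValue) { bestValue = value; best = j; }
        });
        placed.push(best);
        unplaced.splice(unplaced.indexOf(best), 1);
    }
    return placed;
}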

This method seems an improvement over adding random noise, but is unsatisfying for several reasons. Once it places an item, it is reluctant to place a second similar item. This reluctance may be fine for web search where you only want to see one instance of each kind of page, but is not so good for e-commerce where you'd like to compare multiple items. Also, there are almost certainly several dimensions of similarity (brand, cost, new vs. used, etc.), and it's not clear how to combine them all into a single similarity function.

Rakesh Agrawal et al. have a paper, Diversifying search results, from WSDM '09 that doesn't address any of those objections, but does address a different problem: how do you pick λ? They propose an algorithm that doesn't have an arbitrary tuning parameter. They suppose that each item is in a category and that the demand (for a fixed query) of each category is known. This approach maps well to eBay search, since we know the category demand for our top queries. How to use this info? They imagine that users who issue query Q are diverse: some are looking for items in category C1 while others want an item in a different category C2, and this diversity is captured in the category demand table. The paper gives an algorithm that maximizes the chance that a user will find an item in his or her desired category. Let's call this the Agrawal algorithm.

The Agrawal algorithm is a step above MMR in elegance because it doesn't require fiddling with an arbitrary constant like λ. It works well for categories, but what about other forms of diversity, like brand or item condition (new vs. used)? Just as we record category demand, we could record demand for brands, item condition, etc. But then we would need a way to soup up the Agrawal algorithm to combine these different demands, and presumably would need to prioritize them. And like MMR, the Agrawal algorithm heavily penalizes an item if a similar one already exists, which is not appropriate for e-commerce.

A paper by Michael Welch et al., Search result diversity for informational queries, WWW '11, addresses one of the objections to Agrawal. They assume that users want to see more than one document per category. Specifically, they introduce a random variable J representing the number of items a user wants to see. You have to provide an estimate for the probability distribution of J (i.e., find the numbers p_j = Pr(J = j)), and then the algorithm uses that in its ranking. But this approach still doesn't address our requirement to gracefully handle multiple dimensions of diversity.

There's another factor to take into account. With multiple dimensions, we are unlikely to have a good estimate of the demand. For example, if the dimensions are category, brand and condition, we would ideally want the demands for each tuple: for example, (category=TV, Brand=Sony, condition=Used). For all these reasons, we abandoned the Carbonell → Agrawal → Welch train of thought, and took a different approach.

Our solution: agents

For each query, we specify which dimensions are important, together with constraints in the form of a max and/or min. For example, for the query "flat screen TV", we might want at least 10% new items, and at most 50% with brand Samsung. This fits in nicely with our search architecture, which lets us customize the search behavior for a specific query by putting info into DSBE, a database indexed by query. Also, we expect that in the common case the min/max constraints aren't violated, so the exact values of the constraints aren't critical.

You can think of this as a very simple agent system. Each dimension has an agent with constraints (max/min). Each agent monitors the construction of the results page, and tries to keep its constraint satisfied. The agents won't always succeed, so think of the requirements as being soft constraints.

Whenever you have multiple constraints, your first question should be how to prioritize them. The answer falls out almost automatically from the agent algorithm, which I now describe.

Each agent monitors the construction of the result set. At each stage, the agent checks on its constraints. If any are violated, it scans through the unplaced items looking for an item that will get the result set closer to satisfying its constraints. This item is the candidate.

If the agent's constraint is violated, it is unhappy and it quantifies its unhappiness by summing two terms. The first term measures how much the placed items deviate from the agent's constraints. The more deviation, the more unhappiness. The second term involves the candidate. There will be a penalty for placing the candidate on the search results page (SRP) instead of placing the default item, because the candidate has a lower score (P) than the default has. Summarizing the terminology:

  • Candidate: An item proposed by the agent for the next spot on the results page. If no agent proposes a candidate, the default item is the unplaced item of highest score.
  • Deviance: How far the placed items deviate from the agent's constraints. Larger means more deviation.
  • Penalty: The penalty (in reduced score) for placing the candidate instead of the default. Larger means more of a penalty, i.e. more gap between the scores.
  • Unhappiness: The agent's unhappiness. If unhappiness>0 then the agent's candidate will be a contender for the next placed item.

Now back to the question of prioritizing. The priority falls out automatically. If multiple agents are unhappy, we simply pick the one with the largest unhappiness score, and use its candidate as the next item to be placed.

This approach isn't hassle-free. You need to pick constraints (max/min) for each query. And you need to quantify deviance and penalty. Or in other words, find how to weight them, which is reminiscent of the parameter λ in MMR. But we prefer this approach because it is efficient, handles multiple constraints, and is not too sensitive to the exact values of max and min. For most queries, we expect the constraints to hold with the default ranking, and so placing an agent's candidate is the exception rather than the rule.

A sample formula for unhappiness

The description in the previous section was abstract in that it talked about deviance and penalty without offering a way to compute them. Here is one way.

A typical constraint requires that the fraction of items with property P be at least f (this is a min constraint). P might be that an item is used, or has brand Sony, or is newly listed. The straightforward way to compute deviation from a constraint is to look at the number of items with P placed so far and compare it to the number needed to meet the constraint. Suppose that there are n items placed so far, and that k of them are in P. Since the constraint is min = f, we'd like at least nf of them to be in P. If k < nf the constraint isn't met, so set the deviance to nf − k. Or more precisely, deviance is (nf − k)+, where x+ = x when x > 0 and 0 otherwise.

A very simple way to compute the penalty is to compare the score of the candidate item IC with that of the item that would otherwise be placed, Idefault. The penalty is S(Idefault) − S(IC). Since items are sorted in score order and Idefault is the highest-scoring unplaced item, the penalty is always nonnegative.

Unhappiness is computed using the formula unhappiness = deviance − λ × penalty. Imagine deviance > 0 but the only way to fix it is by placing an item with a huge penalty. The large minus term makes unhappiness negative, so the agent isn't unhappy after all. In other words, agents are altruistic and don't insist on their constraint if it results in placing a poor item. The parameter λ controls the tradeoff between score and diversity.

As a preliminary formula, unhappiness is

\text{unhappiness} = \text{deviance} - \lambda \times \text{penalty} = (nf - k)^+ - \lambda \left( S(I_{\text{default}}) - S(I_C) \right)

But there's one issue remaining. Suppose the constraint asks for at least 10%, and suppose the first item placed does not have the desired property. Then n = 1, f = 0.1, and k = 0, so (nf − k)+ = (0.1)+ = 0.1, and the agent is already unhappy and will be in the running to have its proposed item placed next. This seems like jumping the gun, since the goal is 10% but only one item has been placed. There are at least two ways to account for this situation.

The first would be rounding: replace (nf − k)+ with round(nf − k)+. But I think of constraints as something only triggered in exceptional circumstances, and so prefer to have an agent become unhappy only at the last possible minute. In other words, if it is still possible to satisfy nf ≤ k on the next round, then don't trigger the constraint.

Writing this as an equation, suppose I pass on this constraint and the next item turns out not to have P. Then k stays the same and n becomes n + 1. But on the next round I pick an item with P. Then k becomes k + 1 and n becomes n + 2. So I can still satisfy the constraint in the future if (n + 2)f ≤ k + 1. So replace (nf − k)+ with ((n+2)f − (k+1))+. The formula becomes

\boxed{\text{unhappiness} = \text{deviance} - \lambda \times \text{penalty} = ((n+2)f - k - 1)^+ - \lambda \left( S(I_{\text{default}}) - S(I_C) \right)}
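As a quick illustration, here is the boxed formula in code for a single min constraint; the scores and the choice of λ in the example are made up.

// Unhappiness of a min-constraint agent, per the boxed formula above.
// n: items placed so far, k: placed items having property P,
// f: required minimum fraction, lambda: score-vs-diversity tradeoff.
function minConstraintUnhappiness(n, k, f, lambda, candidateScore, defaultScore) {
    var deviance = Math.max((n + 2) * f - k - 1, 0);   // ((n+2)f - k - 1)+
    var penalty = defaultScore - candidateScore;       // nonnegative score gap
    return deviance - lambda * penalty;
}

// Example (made-up numbers): 9 items placed, none with P, constraint min f = 0.1.
// Deviance is (11 * 0.1 - 0 - 1)+ = 0.1, so the agent is unhappy unless the
// candidate's score is far below the default's:
minConstraintUnhappiness(9, 0, 0.1, 0.5, 0.70, 0.72);  // 0.1 - 0.5 * 0.02 = 0.09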

Constraints

There are two types of constraints based on a property P. The first is a min constraint, which asks that the fraction of items with P be greater than fmin. The second is a max constraint, asking for the fraction to be less than fmax. Most properties are something very specific, like item=used. But there is also an any property. This would be used (for example) to keep any one seller from dominating the results. A max constraint on the seller property with bound f, using any, asks that no single seller id account for more than a fraction f of the results. The any property doesn't make sense with a min constraint.

The any property is also a nice way to do duplicate (or near-duplicate) removal. We assign a hash to each item, so that duplicate (or similar) items have the same hash. Then using a max constraint with f = 1/50 means (roughly) that each item will appear only once on each 50-item page. If λ = 0 the requirement is absolute (unless there are competing agents). But by setting λ to a small number, we can allow dups if the alternative is to show very low quality items.

The algorithm

I close with a more formal description of the agent-based diversity algorithm. It can be implemented quite efficiently. In MMR (as well as Welch and Agrawal), at each step you must scan all unplaced items (think O(n²)). In our algorithm, you scan each item once for the entire algorithm (although this is done separately for each constraint). Scanning is implemented by having a pointer per constraint. The pointer moves down the unplaced list as the algorithm proceeds.

Input: a list of unplaced items, sorted by score
Output: items in their final arrangement for display on the search results page (SRP)
Remove the top item from unplaced and append it to SRP
For every constraint C, set C.ptr := first unplaced item
While unplaced is not empty
   initialize the set candidates to be empty
   ForEach C in set of soft constraints
       If C.unhappiness() > 0
           add C to the candidates
           // this has the side effect of updating C.ptr
       end
   EndForEach
   If candidates = empty then
       set item to be the top (highest score) item on unplaced
   else
       take the element C in candidates with the largest unhappiness score
       set item := C.proposed_item()
   end
   if there are any C.ptr pointing at item, move them down one
   remove item from unplaced and append it to SRP
EndWhile
Method C.unhappiness() =
   n := number of items placed in SRP so far
   f := C.f // the bound for this constraint
   k := the number of items placed so far that satisfy the constraint C
   // For the "any" property, k is the max number of items with the
   // same property value that satisfy the constraint
   if C.op = min then
      deviance := ((n+2)f - k - 1)+
   else
      deviance := (k + 1 - (n+2)f)+
   end
   Starting at C.ptr, scan down unplaced until reaching an item I that will reduce deviance
   C.ptr := I
   Idefault := highest scoring unplaced item
   penalty := S(Idefault) - S(I)
   unhappiness := deviance - lambda * penalty
   // set proposed_item field and return
   C.proposed_item := I
   return unhappiness
end