Building a UI Component in 2017 and Beyond

As web developers, we have seen the guidelines for building UI components evolve over the years. Starting from jQuery UI to the current Custom Elements, various patterns have emerged. To top it off, there are numerous libraries and frameworks, each advocating their own style for how a component should be built. So in today’s world, what would be the best approach to thinking about a UI component interface? That is the essence of this blog. Huge thanks to the folks mentioned in the Acknowledgments section at the bottom. This post also leverages a lot of learnings from the articles and projects listed in the References section at the end.

Setting the context

Before we get started, let’s set the context on what this post covers.

  • By UI components, we mean the core, standalone UI patterns that apply to any web page in general. Examples would be Button, Dropdown Menu, Carousel, Dialog, Tab, etc. Organizations generally maintain a pattern library for these core components. We are NOT talking about application-specific components here, like a Photo Viewer used in social apps or an Item Card used in eCommerce apps. Those are designed to solve application-specific (social, eCommerce, etc.) use cases. Application components are usually built using core UI components and are tied to a JavaScript framework (Vue.js, Marko, React, etc.) to create a fully featured web page.

  • We will only be talking about the interface (that is, the API) of the component and how to communicate with it. We will not go over the implementation details in depth; we will only give a quick overview.

Declarative HTML-based interface

Our fundamental principle behind building a UI component API is to make it agnostic with regard to any library or framework. This means the API should be able to work in any context and be framework-interoperable. Any popular framework can just leverage the interface out of the box and augment it with its own implementation. With this principle in mind, what would be the best way to get started? The answer is pretty simple — make the component API look and behave like any other HTML element, just like a <div>, <img>, or <video> tag would work. This is the idea behind having a declarative HTML-based interface/API.

So what does a component look like? Let’s take a carousel as an example. This is what the component API would look like.

<carousel index="2" controls aria-label-next="next" aria-label-previous="previous" autoplay>
    <div>Markup for Item 1</div>
    <div>Markup for Item 2</div>
    ...
</carousel>

What does this mean? We are declaratively telling the component that the rendered markup should do the following.

  • Start at the second item in the carousel.
  • Display the left and right arrow key controls.
  • Use "previous" and "next" as the aria-label attribute values for the left and right arrow keys.
  • Also, autoplay the carousel after it is loaded.

For a consumer, this is the only information they need to know to include this component. It is exactly like how you include a <button> or <canvas> HTML element. Component names are always lowercase. They should also be hyphenated to distinguish them from native HTML elements. A good suggestion would be to prefix them with a namespace, usually your organization or project name, for example ebay-, core-, git-, etc.

Attributes form the basis of how you pass the initial state (data and configuration) to a component. Let’s talk about them.

Attributes

  • An attribute is a name-value pair where the value is always a string. The question may arise: anything can be serialized as a string, and the component can deserialize the string into the associated data type (JSON, for example). While that is true, the guideline is to NOT do that. A component should only interpret the value as a String (which is the default), a Number (similar to tabindex), or a JavaScript event handler (similar to DOM on-event handlers). Again, at the end of the day, this is exactly how an HTML element works.

  • Attributes can also be boolean. As per the HTML5 spec, “The presence of a boolean attribute on an element represents the true value, and the absence of the attribute represents the false value.” This means that as a component developer, when you need a boolean attribute, you just check for its presence on the element and ignore the value. Having a value for it has no significance; both the creator and the consumer of a component should follow this convention. For example, <button disabled="false"> will still disable the button, even though it is set to false, just because the boolean attribute disabled is present.

  • All attribute names should be lowercase. Camel case or Pascal case is NOT allowed. For certain multiword attributes, hyphenated names like accept-charset, data-*, etc. can be used, but that should be a rare occurrence. Even for multiwords, try your best to keep them as one lowercase name, for example, crossorigin, contenteditable, etc. Check out the HTML attribute reference for tips on how the native elements are doing it.

We can correlate the above attribute rules with our <carousel> example; a short sketch of how a component might read these attributes follows the list.

  • aria-label-next and aria-label-previous are string attributes. We hyphenate them because they are multiword names, very similar to the HTML aria-label attribute.
  • The index attribute will be deserialized as a number to indicate the position of the item to be displayed.
  • controls and autoplay will be treated as boolean attributes.
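
A minimal sketch of how the component’s implementation might read these attributes, assuming the <carousel> element above has been rendered (the variable names here are purely illustrative):

const carouselElement = document.querySelector('carousel');

// String attributes are used as-is
const ariaLabelNext = carouselElement.getAttribute('aria-label-next');

// Numeric attributes are deserialized, similar to how tabindex works
const index = parseInt(carouselElement.getAttribute('index'), 10);

// Boolean attributes: only their presence matters; the value is ignored
const showControls = carouselElement.hasAttribute('controls');
const autoplay = carouselElement.hasAttribute('autoplay');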

A common pattern that used to exist (or still exists) is to pass configuration and data as JSON strings. For our carousel, it would be something like the following example.

<!-- This is not recommended -->
<carousel 
    data-config='{"controls": true, "index": 2, "autoplay": true, "ariaLabelNext": "next", "ariaLabelPrevious": "previous"}' 
    data-items='[{"title":"Item 1", ..}, {"title": "Item 2", ...}]'>
</carousel>

This is not recommended.

Here the component developer reads the data attribute data-config, does a JSON parse, and then initializes the component with the provided configuration. They also build the items of the carousel using data-items. This may not be intuitive, and it works against a natural HTML-based approach. Instead consider a declarative API as proposed above, which is easy to understand and aligns with the HTML spec. Finally, in the case of a carousel, give the component consumers the flexibility to build the carousel items however they want. This decouples a core component from the context in which it is going to be used, which is usually application-specific.

Array-based

There will be scenarios where you really need to pass an array of items to a core component, for example, a dropdown menu. How do we do this declaratively? Let’s see how HTML does it. Whenever any input is a list, HTML uses the <option> element to represent an item in that list. As a reference, check out how the <select> and <datalist> elements leverage the <option> element to list out an array of items. Our component API can use the same technique. So in the case of a dropdown menu, the component API would look like the following.

<dropdown-menu list="options" index="0">
    <option value="0" selected>--Select--</option>
    <option value="1">Option 1</option>
    <option value="2">Option 2</option>
    <option value="3">Option 3</option>
    <option value="4">Option 4</option>
    <option value="5">Option 5</option>
</dropdown-menu>

It is not necessary to always use the <option> element here. We could create our own element, something like <dropdown-option>, which is a child of the <dropdown-menu> component, and customize it however we want. For example, if you have an array of objects, you can represent each object ({"userName": "jdoe", "score": 99, "displayName": "John Doe"}) declaratively in the markup as <dropdown-option value="jdoe" score="99">John Doe</dropdown-option>. Hopefully you will not need a complex object for a core component.
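
As a rough sketch (not from the original post), the dropdown component’s implementation could collect its declarative children into an internal array at initialization time:

const dropdownElement = document.querySelector('dropdown-menu');

// Build the internal item list from the declarative <option> children
const items = Array.from(dropdownElement.querySelectorAll('option')).map(option => ({
    value: option.value,
    label: option.textContent,
    selected: option.hasAttribute('selected')
}));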

Config-based

You may also argue that there are scenarios where you need to pass a JSON config for the component to work, or else usability becomes painful. Although this is rare for core components, one use case I can think of would be a core analytics component. This component may not have a UI, but it does all the tracking-related work, and you need to pass it a complex JSON object. What do we do? The AMP Project has a good solution for this. The component would look like the following.

<analytics>
    <script type="application/json">
    {
      "requests": {
        "pageview": "https://example.com/analytics?pid=",
        "event": "https://example.com/analytics?eid="
      },
      "vars": {
        "account": "ABC123"
      },
      "triggers": {
        "trackPageview": {
          "on": "visible",
          "request": "pageview"
        },
        "trackAnchorClicks": {
          "on": "click",
          "selector": "a",
          "request": "event",
          "vars": {
            "eventId": "42",
            "eventLabel": "clicked on a link"
          }
        }
      }
    }
    </script>
</analytics>

Here again we piggyback on how we would do it in plain HTML. We use a <script> element inside the component and set its type to application/json, which is exactly what we want. This brings back the declarative approach and makes it look natural.
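
To make this concrete, here is a small sketch (not from the original post) of how such a component could read its embedded configuration:

const analyticsElement = document.querySelector('analytics');

// Read the declarative config from the child <script type="application/json">
const configScript = analyticsElement.querySelector('script[type="application/json"]');
const config = configScript ? JSON.parse(configScript.textContent) : {};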

Communication

Until now we have talked only about the initial component API. This enables consumers to include a component in a page and set the initial state. Once the component is rendered in the browser, how do you interact with it? This is where the communication mechanism comes into play. And for this, the golden rule comes from reactive principles:

Data in via attributes and properties, data out via events

This means that attributes and properties can be used to send data to a component and events send the data out. If you take a closer look, this is exactly how any normal HTML element (input, button, etc.) behaves. We already discussed attributes in detail. To summarize, attributes set the initial state of a component, whereas properties update or reflect the state of a component. Let’s dive into properties a bit more.

Properties

At any point in time, properties are your source of truth. After setting the initial state, some attributes do not get updated as the component changes over time. For example, typing in a new phrase in an input text box and then calling element.getAttribute('value') will produce the previous (stale) value. But doing element.value will always produce the current typed-in phrase. Certain attributes, like disabled, do get reflected when the corresponding property is changed. There has always been some confusion around this topic, partly due to legacy reasons. It would be ideal for attributes and properties to be in sync, as the usability benefits are undeniable.
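
The difference is easy to see with a native <input> element; this small illustrative snippet assumes the user has already typed “hello” into the box:

const input = document.querySelector('input[type="text"]');

// The attribute still reflects the initial markup, not what the user typed
input.getAttribute('value'); // stale initial value (or null)

// The property is the source of truth for the current state
input.value;                 // "hello"

// Some attributes, like disabled, are reflected both ways
input.disabled = true;
input.hasAttribute('disabled'); // true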

If you are using Custom Elements, implementing properties is quite straightforward. For a carousel, we could do this.

class Carousel extends HTMLElement {  
    static get observedAttributes() {
        return ['index'];
    }
    // Called anytime the 'index' attribute is changed
    attributeChangedCallback(attrName, oldVal, newVal) {
        this[attrName] = newVal;
    }
    // Takes an index value
    set index(idx) {
        // First check if it is numeric
        const numericIndex = parseInt(idx, 10);
        if (isNaN(numericIndex)) {
            return;   
        }
        // Update the internal state
        this._index = numericIndex;
        /* Perform the associated DOM operations */
        moveCarousel();
    }
    get index() {
        return this._index;
    }
}

Here the index property gets a proper getter and setter. If you do carouselElement.index = 4, it will update the internal state and then perform the corresponding DOM operations to move the carousel to the fourth item. Additionally, even if you directly update the attribute with carouselElement.setAttribute('index', 4), the component will still update the index property and the internal state, and perform the same DOM operations to move the carousel to the fourth item.

However, until Custom Elements gain massive browser adoption and have a good server-side rendering story, we need to come up with other mechanisms to implement properties. And one way would be to use the Object.defineProperty() API.

const carouselElement = document.querySelector('#carousel1');
Object.defineProperty(carouselElement, 'index', {
    set(idx) {
        // First check if it is numeric
        const numericIndex = parseInt(idx, 10);
        if (isNaN(numericIndex)) {
            return;   
        }
        // Update the internal state
        this._index = numericIndex;
        /* Perform the associated DOM operations */
        moveCarousel();        
    },
    get() {
        return this._index;
    }
});

Here we are augmenting the carousel element DOM node with the index property. When you do carouselElement.index = 4, it gives us the same functionality as the Custom Element implementation. But directly updating the attribute with carouselElement.setAttribute('index', 4) will do nothing. This is the tradeoff of this approach. (Technically we could still use a MutationObserver to achieve the missing functionality, but that would usually be overkill.) If, as a team, you can standardize on state updates happening only through properties, then this should be less of a concern.
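
If attribute syncing really is required with this approach, a MutationObserver could be wired up as sketched below, though as noted above it is usually not worth the extra complexity:

// Optional: keep the 'index' property in sync with the 'index' attribute
const indexObserver = new MutationObserver(mutations => {
    mutations.forEach(mutation => {
        if (mutation.attributeName === 'index') {
            carouselElement.index = carouselElement.getAttribute('index');
        }
    });
});
indexObserver.observe(carouselElement, { attributes: true, attributeFilter: ['index'] });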

With respect to naming conventions, since properties are accessed programmatically, they should always be camel-cased. All exposed attributes (an exception would be ARIA attributes) should have a corresponding camel-cased property, very similar to native DOM elements.

Events

When the state of a component has changed, either programmatically or due to user interaction, it has to communicate the change to the outside world. And the best way to do it is by dispatching events, very similar to click or touchstart events dispatched by a native HTML element. The good news is that the DOM comes with a built-in custom eventing mechanism through the CustomEvent constructor. So in the case of a carousel, we can tell the outside world that the carousel transition has been completed by dispatching a transitionend event as shown below.

const carouselElement = document.querySelector('#carousel1');

// Dispatching the 'transitionend' event (typically done inside the component implementation)
carouselElement.dispatchEvent(new CustomEvent('transitionend', {
    detail: {index: carouselElement.index}
}));

// Listening to 'transitionend' event
carouselElement.addEventListener('transitionend', event => {
    alert(`User has moved to item number ${event.detail.index}`);
});

By doing this, we get all the benefits of DOM events, like bubbling and capture, and also the event APIs like event.stopPropagation(), event.preventDefault(), etc. Another added advantage is that it makes the component framework-agnostic, as most frameworks already have built-in mechanisms for listening to DOM events. Check out Rob Dodson’s post on how this works with major frameworks.

Regarding a naming convention for events, I would go with the same guidelines that we listed above for attribute names. Again, when in doubt, look at how the native DOM does it.

Implementation

Let me briefly touch upon the implementation details, as they complete the picture. We have only been talking about the component API and communication patterns so far. But the critical missing part is that we still need JavaScript to provide the desired functionality and encapsulation. Some components can be purely markup- and CSS-based, but in reality, most of them will require some amount of JavaScript. How do we implement this JavaScript? Well, there are a few ways.

  • Use vanilla JavaScript. Here the developer builds their own JavaScript logic for each component. But you will soon see a common pattern across components, and the need for an abstraction arises. This abstraction library will end up looking pretty similar to the numerous frameworks already out in the wild. So why reinvent the wheel? We can just choose one of them.

  • Usually in organizations, web pages are built with a particular library or framework (Angular, Ember, Preact, etc.). You can piggyback on that library to implement the functionality and provide encapsulation. The drawback here is that your core components are also tied to the page framework. So in case you decide to move to a different framework or do a major version upgrade, the core components also have to change with it. This can cause a lot of inconvenience.

  • You can use Custom Elements. That would be ideal, as the mechanism is built into browsers and browser makers recommend it. But you need a polyfill to make it work across all of them. You can try a Progressive Enhancement technique as described here, but you would lose functionality in browsers that do not support it. Moreover, until we have a solid and performant server-side rendering mechanism, Custom Elements will lack mass adoption. (A minimal registration sketch follows this list.)
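
As a quick illustration, registering the Carousel class from earlier as a Custom Element is a one-liner; the hyphenated, namespaced tag name here is just an example (custom element names must contain a hyphen):

// Assumes the Carousel class defined in the Properties section above
customElements.define('ebay-carousel', Carousel);

// Usage in markup: <ebay-carousel index="2" controls autoplay> ... </ebay-carousel>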

And yes, all options are open. It all boils down to choices, and software engineering is all about the right tradeoffs. My recommendation would be to go with either the second or the third option, based on your use cases.

Conclusion

Though the title mentions the year “2017”, this is more about building an interface that works not only today but also in the future. We are making the component API agnostic of the underlying implementation. This enables developers to use a library or framework of their choice, and it gives them the flexibility to switch in the future (based on what is popular at that point in time). The key takeaway is that the component API and the principles behind it always stay the same. I believe Custom Elements will become the default implementation mechanism for core UI components as soon as they gain mainstream browser adoption.

The ideal state is when a UI component can be used in any page, without the need for a library or polyfill, and it can work with the page owner’s framework of choice. We need to design our component APIs with that ideal state in mind, and this is a step towards it. Finally, it is worth repeating: when in doubt, check how HTML does it, and you will probably have an answer.

Acknowledgments

Many thanks to Rob Dodson and Lea Verou for their technical reviews and valuable suggestions. Also huge thanks to my colleagues Ian McBurnie, Arun Selvaraj, Tony Topper, and Andrew Wooldridge for their valuable feedback.

References

Elasticsearch Cluster Lifecycle at eBay

Defining an Elasticsearch cluster lifecycle

eBay’s Pronto, our implementation of the “Elasticsearch as a service” (ES-AAS) platform, provides fully managed Elasticsearch clusters for various search use cases. Our ES-AAS platform is hosted in a private internal cloud environment based on OpenStack. The platform currently manages 35+ clusters and supports multiple data center deployments. This blog provides guidelines on all the different pieces of a cluster lifecycle that allow streamlined management of Elasticsearch clusters. All Elasticsearch clusters deployed within the eBay infrastructure follow our defined Elasticsearch lifecycle, depicted in the figure below.

Cluster preparation

This lifecycle stage begins when a new use case is being onboarded onto our ES-AAS platform.

Onboarding information

Customers’ requirements are captured on an onboarding template that contains information such as document size, retention policy, and read/write throughput requirements. Based on the inputs provided by the customer, infrastructure sizing is performed. The sizing uses historic learnings from our benchmarking exercises. Onboarding information has helped us in cluster planning and in defining SLAs for customer commitments.

We collect the following information from customers before any use case is onboarded:

  • Use case details: Consists of queries relating to use case description and significance.
  • Sizing information: Captures the number of documents, the average document size, and the year-on-year growth estimate.
  • Data read/write information: Consists of expected indexing/search rate, mode of ingestion (batch mode or individual documents), data freshness, average number of users, and specific search queries containing any aggregation, pagination, or sorting operations.
  • Data source/retention: Original data source information (such as Oracle, MySQL, etc.) is captured on an onboarding template. If the indices are time-based, then an index purge strategy is logged. Typically, we do not use Elasticsearch as the source of data for critical applications.

Benchmarking strategy

Before undertaking any benchmarking exercise, it’s really important to understand the underlying infrastructure that hosts your VMs. This is especially true in a cloud-based environment, where such information is usually abstracted from end users. Be aware of the different potential noisy-neighbor issues, especially on multi-tenant infrastructure.

Like most folks, we have also performed extensive benchmarking exercises on our existing hardware infrastructure and image flavors. Data stored in Elasticsearch clusters is specific to customer use cases, and it is nearly impossible to perform benchmarking runs on all the data schemas used by different customers. Therefore, we made assumptions before embarking on any benchmarking exercise, and the following assumptions were key.

  • Clients will use a REST path for any data access on our provisioned Elasticsearch clusters (no transport client).
  • To start with, we kept a ratio of 1 GB of RAM to 32 GB of disk space. (This was later refined as we learned from benchmarking.)
  • Indexing numbers were carefully profiled for different numbers of replicas (1, 2, and 3 replicas).
  • Search benchmarking was always done on GetById queries (as search queries are custom, and profiling different custom search queries was not viable).
  • We used fixed-size documents of 1 KB, 2 KB, 5 KB, and 10 KB.

Working from these assumptions, we arrived at a maximum shard size for performance (around 22 GB), the right payload size for _bulk requests (~5 MB), and so on. We used our own custom JMeter scripts to perform benchmarking. More recently, Elasticsearch has developed and open-sourced the Rally benchmarking tool, which can be used as well. Additionally, based on our benchmarking learnings, we created a capacity-estimation calculator tool that can take in customer requirement inputs and calculate the infrastructure requirements for a use case. We avoided a lot of conversations with our customers on infrastructure cost by sharing this tool directly with end users.

VM cache pool

Our ES clusters are deployed by leveraging an intelligent warm-cache layer. The warm-cache layer consists of ready-to-use VM nodes that are prepared over a period of time based on some predefined rules. This ensures that VMs are distributed uniformly across different underlying hardware. This layer has allowed us to quickly spawn large clusters within seconds. Additionally, our remediation engine leverages this layer to flex up nodes on existing clusters without errors or any manual intervention. More details on our cache pool are available in another eBay tech blog post, Ready-to-use Virtual-machine Pool Store via warm-cache.

Cluster deployment

Cluster deployment is fully automated via a Puppet/Foreman infrastructure. We will not talk in detail about how the Elasticsearch Puppet module was leveraged for provisioning Elasticsearch clusters; this is well documented at the Elasticsearch Puppet module page. Along with every release of Elasticsearch, a corresponding version of the Puppet module is generally made publicly available. We have made minor modifications to these Puppet scripts to suit eBay-specific needs. Different configuration settings for Elasticsearch are customized based on our benchmarking learnings. As a general guideline, we do not set the JVM heap size to more than 28 GB (because doing so leads to long garbage collection cycles), and we always disable in-memory swapping for the Elasticsearch JVM process. Independent clusters are deployed across data centers, and load-balancing VIPs (Virtual IP addresses) are provisioned for data access.
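
As an illustration, those two guidelines roughly translate to settings like the following (the exact file names and keys depend on the Elasticsearch version in use, so treat this as a sketch rather than our exact configuration):

# jvm.options: cap the heap at 28 GB to avoid long GC cycles
-Xms28g
-Xmx28g

# elasticsearch.yml: lock memory so the JVM process is not swapped out
bootstrap.memory_lock: true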

Typically, with each cluster provisioned we give out two VIPs, one for data read operations and another one for write operations. Read VIPs are always created over client nodes (or coordinating nodes), while write VIPs are configured over data nodes. We have observed improved throughput from our clusters with such a configuration.

Deployment diagram

 

We use a lot of open-source software on our platform, such as OpenStack, MongoDB, Airflow, Grafana, InfluxDB (open version), OpenTSDB, etc. Our internal services, such as cluster provisioning, cluster management, and customer management services, allow REST API-driven management for deployment and configuration. They also help in tracking clusters as assets against different customers. Our cluster provisioning service relies heavily on OpenStack. For example, we use Nova for managing compute resources (nodes), Neutron APIs for load balancer provisioning, and Keystone for internal authentication and authorization of our APIs.

We do not use federated or cross-region deployments for an Elasticsearch cluster. Network latency prevents us from using such a deployment strategy. Instead, we host independent clusters for use cases across multiple regions. Clients have to perform dual writes when clusters are deployed in multiple regions. We also do not use Tribe nodes.

Cluster onboarding

We create the cluster topology during customer onboarding. This helps us track the resources and cost associated with the cluster infrastructure. The metadata stored as part of a cluster topology maintains region deployment information, SLA agreements, cluster owner information, etc. We use eBay’s internal configuration management system (CMS) to track cluster information in the form of a directed graph. External tools hook onto this topology; such integrations allow easy monitoring of our clusters from centralized eBay-specific systems.

Cluster topology example

Cluster management

Cluster security

Security is provided on our clusters via a custom security plug-in that provides a mechanism to both authenticate and authorize the use of Elasticsearch clusters. Our security plug-in intercepts messages and then performs context-based authorization and authentication using an internal authentication framework. Explicit whitelisting based on client IP is supported. This is useful for configuring Kibana or other external UI dashboards. Admins (DevOps) are configured to have complete access to the Elasticsearch cluster. We encourage using HTTPS (based on TLS 1.2) to secure communication between clients and Elasticsearch clusters.

The following is a simple sample security rule that can be configured on our platform for provisioned clusters.

Sample JSON code implementing a security rule
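
The original post shows this rule as an image; a rough reconstruction based only on the fields described below (the exact schema is an assumption) might look like this:

{
    "enabled": true,
    "whitelisted_ip_list": ["10.0.0.1", "10.0.0.2"]
}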

In the above sample rule, the enabled field controls whether the security feature is enabled. whitelisted_ip_list is an array attribute for providing all whitelisted client IPs. Any Open/Close index operations or delete index operations can be performed only by admin users.

Cluster monitoring

Cluster monitoring is done by a custom monitoring plug-in that pushes 70+ metrics from each Elasticsearch node to a back-end TSDB-based data store. The plug-in works on a push-based design. External dashboards built with Grafana consume the data from the TSDB store. Custom templates are created on the Grafana dashboard, which allows easy centralized monitoring of our clusters.

 

 

We leverage an internal alert system that can be used to configure threshold-based alerts on data stored on OpenTSDB. Currently, we have 500+ active alerts configured on our clusters with varying severity. Alerts are classified as ‘Errors’ or ‘Warnings’. Error alerts, when raised, are immediately attended to either by DevOps or by our internal auto-remediation engine, based on the alert rule configured.

Alerts are created during cluster provisioning based on various thresholds. For example, an ‘Error’ alert is raised if a cluster status turns RED, and a ‘Warning’ alert is raised if the CPU utilization of a node exceeds 80%.

Cluster remediation

Our ES-AAS platform can perform an auto-remediation action on receiving any cluster anomaly event. Such actions are enabled via our custom Lights-Out-Management (LOM) module. Any auto-remediation module can significantly reduce manual intervention for DevOps. Our LOM module uses a rule-based engine which listens to all alerts raised on our cluster. The reactor instance maintains a context of the alerts raised and, based on cluster topology state (AUTO ON/OFF), takes remediation actions. For example, if a cluster loses a node and if this node does not return to its cluster within the next 15 minutes, the remediation engine replaces that node via our internal cluster management services. Optionally, alerts can be sent to the team instead of taking a cluster remediation action. The actions of the LOM module are tracked as stateful jobs that are persisted on a back-end MongoDB store. Due to the stateful nature of these jobs, they can be retried or rolled back as required. Audit logs are also maintained to capture the history or timeline of all remediation actions that were initiated by our custom LOM module.

Cluster logging

Along with the standard Elasticsearch distribution, we also ship our custom logging library. This library pushes all Elasticsearch application logs onto a back-end Hadoop store via an internal system called Sherlock. All centralized application logs can be viewed at both the cluster and node levels. Once Elasticsearch log data is available on Hadoop, we run daily PIG jobs on our log store to generate reports for error log and slow log counts. We generally keep our logging level at INFO, and whenever we need to triage issues, we use a transient logging setting of DEBUG, which collects detailed logs onto our back-end Hadoop store.

Cluster decommissioning

We follow a cluster decommissioning process for major version upgrades of Elasticsearch. For major upgrades, we spawn a new cluster with our latest offering of the Elasticsearch version. We replay all documents from the old or existing cluster to the newly created cluster. Clients (user applications) start using both cluster endpoints for all future ingestion until the data catches up on the new cluster. Once data parity is achieved, we decommission the old cluster. In addition to freeing up infrastructure resources, we also clean up the associated cluster topology. Elasticsearch also provides a migration plug-in that can be used to check whether direct, in-place upgrades can be done between major Elasticsearch versions. Minor Elasticsearch upgrades are done on an as-needed basis and are usually performed in place.

Healthy Team Backlogs

 

What is a backlog?

Agile product owners use a backlog to organize and communicate the requirements for a team’s work. Product backlogs are deceptively simple, which can sometimes make them challenging to adopt for product owners who may be used to working with lengthy PRDs (“project requirement documents” or similar).

Scrum most commonly uses the term product backlog. However, many product owners who are new to Scrum are confused by this term. Reasonable questions arise: Does this suggest that a team working on multiple products would have multiple backlogs? If so, how do we prioritize between them? Where do bugs get recorded? What happens if work needs to be done, but it isn’t associated with a product; do we create a placeholder?

Therefore, we prefer the term team backlog. Our working definition of team backlog is “the maintained, ordered list of work that the team plans to do now or in the future.” This is a dense description, so let’s unpack it a little.

“The” and “Team”

  • We say the and team because each team needs a single source of truth to track their current and future work.
  • If a team is working on multiple projects or products, all of the work for those stories should appear on a single, unified, team backlog.
  • Teams do not generally share backlogs.

“Work”

  • Work includes almost everything that the development team needs to do.
  • Features, bugs, technical debt, research, improvements, and even user experience work all appear on the same backlog.
  • Generally speaking, recurring team meetings and similar events do not appear on the backlog.

“Maintained”

  • We say maintained because the backlog is a “living” artifact.
  • The product owner and team must continually update and refine their backlog. Otherwise, the team will waste time doing useless work and chasing requirements.
  • This requires several hours per week for the product owner and 1–2 hours per week for the team. It involves adding, ordering, discussing, describing, justifying, deleting, and splitting work.

“Ordered”

  • We say ordered list rather than prioritized list because the backlog is ordered, not just prioritized.
  • If the backlog is only prioritized, there can be multiple items that are all “very high priority.”
  • If the backlog is ordered, we communicate exactly in what order those “very high priority” tasks should be worked on.

“Plans to Do”

  • We say plans to do because we regularly delete everything from the backlog that we no longer plan to work on.
  • Deleting unnecessary work is essential. Unnecessary work clutters up our backlog and distracts from the actual work.

What makes a backlog healthy?

Now that we know what a backlog is, what makes a backlog healthy or not? While what makes for a good backlog is somewhat subjective — in the same way that what makes a good PRD could be subjective — there are 10 characteristics that we’ve found to be particularly important.

Would you like to know if your backlog is healthy? Download this handy PDF checklist, print it out, then open up your backlog and follow along. For each criterion, take note of whether your backlog currently does, doesn’t, or only somewhat meets the criterion. In exchange for less than half an hour of your time, you’ll have a good sense of the health of your backlog and a few ideas for improvement.

  1. Focused, ordered by priority, and the team follows the order diligently

    • At all times, anyone can look at the backlog and know what needs to be worked on next without ambiguity.
    • Even if you have several “P1” issues, the team needs to know which P1 issue needs to be addressed next. Simply saying “they’re all important” will paralyze the team.
    • Although the PO is responsible for the product backlog order and makes the final call, the PO should be willing to negotiate the order with their team. The team often has good insights that can mitigate dependencies or help the PO deliver more value.
    • Stay focused on one thing at a time when possible to deliver value earlier and reduce context switching waste.

  2. Higher-value items towards the top, lower-value items towards the bottom

    • In general, do high-value, low-cost work first (“lowest hanging fruit”).
    • Next, do high-value, high-cost work because it is usually more strategic.
    • Then, do low-value, low-cost work.
    • Finally, eliminate low-value, high-cost work. You will almost always find something better to do with your time and resources, so don’t waste your time tracking it. It will be obvious if and when that work becomes valuable.
    • Hint: You can use Weighted Shortest Job First or a similar technique if you’re having difficulty prioritizing.

  3. Granular, ready-to-work items towards the top, loosely-defined epics towards the bottom

    • Items that are at the top of the backlog will be worked on next, so we want to ensure that they are the right size to work on.
    • The typical team’s Definition of Ready recommends that items take ≤ ½ of a sprint to complete.
    • Delay decision-making and commitments — manifested as small, detailed, team-ready items — until the last responsible moment.
    • There is little value specifying work in detail if you will not work on it soon. Due to learning and changing customer/company/competitive conditions, your requirements may change or you may cancel the work altogether.

     
    What is an Epic?

    • An “epic” is simply a user story that is too large to complete in one sprint. It gets prioritized in the backlog like every other item.
    • JIRA Tip: “Epics” in JIRA do not appear in the backlog for Scrum boards. As a result, they behave more like organizing themes than epics. Therefore, we suggest using JIRA’s epic functionality to indicate themes, and user stories with the prefix “Epic: ” to indicate actual epics.

  4. Solutions towards the top, statements of need towards the bottom

    • Teams can decide to start working on an item as soon as they know what customer needs they hope to solve. However, collaborating between product, design, development, and stakeholders to translate customer needs into solutions takes time.
    • As with other commitments, defer solutioning decisions until the last responsible moment:
      • Your ideal solution may change through learning or changing conditions such as customer, competitors, company, or even technology options.
      • You may decide not to work on the problem after all.

  5. 1½ to 2 sprints worth of work that’s obviously ready to work on at the top

    • Teams sometimes surprise the product owner by having more capacity than expected.
    • Having enough ready stories ensures that the team is:
      • Unlikely to run out of work to pull into their sprint backlog during sprint planning.
      • Able to pull in additional work during the sprint if they complete the rest of the work on their sprint backlog.
    • It should be obvious what work is and isn’t ready to work on so that the team doesn’t have to waste time figuring it out each time they look at the backlog.
      • Some teams prefix a story title with a “* ” to indicate a ready story (or a story that isn’t ready).

  6. The value of each piece of work is clearly articulated

    • Your team should be able to understand why the work is important to work on.
    • There are three primary sources of value (and you can define your own):
      • User/Business Value: Increase revenue, reduce costs, make users happy
      • Time Criticality: Must it happen soon due to competition, risk, etc.?
      • Opportunity Enablement/Risk Reduction/Learning: Is it strategic? Is it necessary to enable another valuable item (for example, a dependency)?
    • You won’t usually need a complex financial projection, just a reasonable justification as to why the item should be worked on next relative to all other known possibilities. Time previously spent with complex projections can instead be used to talk to customers and identify other opportunities.

  7. The customer persona for the work is clearly articulated

    • The “As a” part of the “As a ____, I can ___, so that ____” user story isn’t a mere formality; it’s an essential part of user-centered product development.
    • Who is the customer? Who are you completing this work for? Even if you’re on a “back-end” team, keep the end-user in mind.
    • Partner with your designer to identify your personas and reference them whenever possible. Is this feature for “Serious Seller Sally?” Can you imagine her personality and needs just as well as any of your friends?
      • Example: “As Serious Seller Sally, I can list items using an ‘advanced’ flow so that I can get the options I need without the guidance for casual sellers that only slows me down.”
    • Tool Tip: Most teams and POs find it best to put just the “I can” part of the user story (for example, “List items using an ‘advanced’ flow”) in the planning tool’s title field. Otherwise, it can be harder to read the backlog. Put the entire user story at the top of your tool’s description field.

  8. ≤ 100 items (a rule of thumb), and contains no work that — realistically — will never be done

    • This is a general rule. If your team works on many very small items or has considerable work that you must track, your backlog could be longer.
    • Assuming that each backlog item takes a minute to read and understand, 100 items alone would take over an hour and a half to process. Keeping our backlog limited like this makes it easier and faster to fully understand.
    • A longer backlog is more likely to contain features that will never be built or bugs that will never be fixed. Keeping a short backlog helps us ensure that we triage effectively and delete items that we are unlikely to work on.

  9. The team backlog is not a commitment

    • A Scrum team cannot make a realistic, firm commitment on an entire team backlog because:
      • It has not been through high-level design (for example, tasking at end of Sprint planning).
      • The risk of missed dependencies and unexpected requests/impediments is too great.
      • “Locking in” a plan that far into the future considerably restricts flexibility.
    • A Scrum team can make a valid commitment on a sprint backlog if there are no mid-sprint scope changes and few unexpected requests and impediments.

  10. Backlog reflects the release plan if available

    • If the team has conducted release planning, create pro forma sprints with items in your planning tool to reflect the release plan.
    • If there are production release, moratorium, or similar dates, communicate those too.
    • Update the release plan at end of each sprint as you learn.

What does a healthy team backlog look like in JIRA?

Glad you asked. Here are four sample “sprints” that take good advantage of JIRA’s built-in functionality.

Sprint 1 (active sprint)

Sprint 2 (next sprint)

Sprint 3 (future sprint)

Sprint 4 (future sprint)

Conclusion

Now you know what a healthy team backlog looks like. If you’ve filled out our printable checklist, mark off up to three items that you’ll work to improve over the next week or two with your teams. We hope this is of use to you!