Monthly Archives: August 2012

Caching HTTP POST Requests and Responses

The basic purpose of HTTP caching is to provide a mechanism for applications to scale better and perform faster. But HTTP caching is applicable only to idempotent requests, which makes a lot of sense; only idempotent and nullipotent requests yield the same result when run multiple times. In the HTTP world, this fact means that GET requests can be cached but POST requests cannot.

However, there can be cases where an idempotent request cannot be sent using GET simply because that request exceeds the limits imposed by popular Internet software. For example, search APIs typically take a lot of parameters, especially for a product with numerous characteristics, all of which have to be passed as parameters. This situation leads to the question: what is the recommended way of communicating over the wire when the request contains more parameters than the “permitted” length of a GET request? Here are some of the answers:

  • You may want to re-evaluate the interface design if it takes a large number of parameters. Idempotent requests typically require a number of parameters that falls well within the GET limits.
  • There is no such hard and fast limit imposed by the specification, so the HTTP specification is not to be blamed. Internet clients and servers do impose a limit. Some of them support up to 8 KB, but the safe bet is to keep the length under 2 KB.  
  • Send the body in a GET request.

At this point, we come to the realization that all of the above answers are unsatisfactory. They do not address the underlying problem or change the situation whatsoever.

HTTP caching basics

To appreciate the rest of this topic, let’s first go through the caching mechanism quickly.

HTTP caching involves the client, the proxy, and the server. In this post, we will discuss mainly the proxy, which sits between the client and server. Typically, reverse proxies are deployed close to the server, and forward proxies close to the client. Figure 1 shows the basic topology. From the figure, it should be clear that a cache-hit in the forward proxy saves bandwidth and reduces round-trip time (RTT) and latency; and a cache-hit at the reverse proxy reduces the load on the server.


Figure 1. Basic topology of proxy cache deployment in a network

The HTTP specification allows a response from cache if one of the following is satisfied:

  1. The cached response is consistent with the origin server’s response, had the origin server handled the request – in short, the proxy can guarantee a semantic equivalence between the cached response and the origin server’s response.
  2. The freshness is acceptable to the client.
  3. The freshness is not acceptable to the client but an appropriate warning is attached.

The specification has a number of flavors and associated headers and controls; further details of caching and of cache controls are available in the HTTP/1.1 specification (RFC 2616).

A typical proxy caches idempotent requests. The proxy gets the request, examines it for cache headers, and sends it to the server. Then the proxy examines the response and, if it is cacheable, caches it with the URL as the key (along with some headers in certain cases) and the response as the value.  This scheme works well with GET requests, because for the same URL repeated invocation does not change the response. Intermediaries can make use of this idempotency to safely cache GET requests. But this is not the case with an idempotent POST request. The URL (and headers) cannot be used as the key because the response could be different – the same URL, but with a different body.

POST body digest

The solution is to digest the POST body (along with a few headers), append the URL with the digest, and use this digest instead of just the URL as the cache key (see Figure 2). In other words, the cache key is modified to include the payload in addition to the URL. Subsequent requests with the same payload will hit the cache rather than the origin server. In practice, we add a few headers and their values to the cache key to establish uniqueness, as appropriate for the use case. Although we don’t have a specific algorithm recommendation, if MD5 is used to digest the body, then Content-MD5 could be used as a header.

Figure 2. Digest-based cache
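As a concrete sketch of this scheme, here is how such a cache key might be computed. MD5 and the choice of which headers to mix in are illustrative assumptions, not a prescription:

```python
import hashlib

def cache_key(url: str, body: bytes, headers: dict = None) -> str:
    # Digest the POST body; MD5 is used only as an example algorithm.
    h = hashlib.md5(body)
    # Mix in any headers that affect the response (which ones matter
    # is use-case specific; this loop is an assumption for illustration).
    for name in sorted(headers or {}):
        h.update(name.encode() + b":" + str(headers[name]).encode())
    return url + "#" + h.hexdigest()

# Same URL, different payload -> different cache entries.
assert cache_key("/search", b'{"q":"laptop"}') != cache_key("/search", b'{"q":"camera"}')
# Same URL and payload -> same key, so a repeated request hits the cache.
assert cache_key("/search", b'{"q":"laptop"}') == cache_key("/search", b'{"q":"laptop"}')
```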

Now the problem is to distinguish idempotent POST requests from non-idempotent ones. There are a few ways to handle this problem:

  • Configure URLs and patterns in the proxy so that it does not cache if there is a match.
  • Add context-aware headers to distinguish between different requests.
  • Base the cache logic on some naming conventions. For example, APIs with names that start with words like “set”, “add”, and “delete” are not cached and will always hit the origin server.
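The naming-convention approach in the last bullet can be sketched as a simple predicate (the prefix list here is a hypothetical example):

```python
# Hypothetical list of mutating-verb prefixes; any API whose name matches
# is treated as non-idempotent and always sent to the origin server.
MUTATING_PREFIXES = ("set", "add", "delete", "update", "remove")

def is_cacheable(api_name: str) -> bool:
    return not api_name.lower().startswith(MUTATING_PREFIXES)

assert is_cacheable("findItemsByKeywords")
assert not is_cacheable("addItem")
```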

Handling Non-Idempotent Requests

Here’s how we solve the problem of non-idempotent requests:

  • Hit the origin server under any of the following circumstances:
    • if the URL is in the configured “DO NOT CACHE” URL list
    • if the digests do not match
    • after the cache expiry time
    • whenever a request to revalidate is received
  • Attach a warning saying the content could be stale, thereby accommodating the specification.
  • Allow users to hit the origin server by using our client-side tools to turn off the proxy.
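Put together, the proxy’s decision of when to bypass the cache might look like this sketch (the URL list and parameter names are assumptions for illustration):

```python
import time

DO_NOT_CACHE = {"/api/placeOrder"}  # hypothetical configured URL list

def should_hit_origin(url, digest, cached_digest, cached_at, ttl,
                      revalidate=False):
    if url in DO_NOT_CACHE:                   # configured "DO NOT CACHE" list
        return True
    if cached_digest is None or digest != cached_digest:
        return True                           # digests do not match
    if time.time() - cached_at > ttl:         # past the cache expiry time
        return True
    return revalidate                         # explicit revalidation request

now = time.time()
assert should_hit_origin("/api/placeOrder", "d1", "d1", now, 60)
assert should_hit_origin("/api/search", "d1", "d2", now, 60)
assert not should_hit_origin("/api/search", "d1", "d1", now, 60)
```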

We implemented this solution with Apache Traffic Server, customized to cache POST requests and to build the digest-based cache key.


This solution provides the following benefits:

  • We speed up repeated requests by not performing the round trip from the proxy to the origin server.
  • As a hosted solution, one user’s request speeds not only that user’s subsequent requests but also the requests from other users, provided that the cache is set to be shared across requests and that the header permits it.
  • We save the bandwidth between the proxy and the origin server.

Here is a performance comparison of an API invocation, with the solution deployed as a forward proxy and a total data transfer of 20 KB:

  • No caching: 188 ms
  • With caching: 90 ms

 Variants of this solution can be used to cache the request or response or both at the forward proxy, reverse proxy, or both.

Cache handshake

To get the full benefit, in this solution we deploy a forward proxy at the client end and a reverse proxy at the server end. The client sends the request to the forward proxy, and the proxy does a cache lookup. In the case of a cache miss, the forward proxy digests the body and sends only the digest to the reverse proxy. The reverse proxy looks for a match in the request cache and, if found, sends that request to the server. The difference is we don’t send the full request from the forward proxy to the reverse proxy.   

The server sends the response to the reverse proxy, which digests the response and sends only the digest – not the full response (see Figure 3). Essentially, we save the POST data from being sent, at the cost of an additional round trip (of the digest key) if the result is a cache miss. In most networks, the RTT between the client and the forward proxy, and between the server and the reverse proxy, is negligible compared to the RTT between the client and the server: the forward proxy and the client are typically close to each other on the same LAN, as are the reverse proxy and the server. The real network latency lies between the proxies, where data travels over the Internet.
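The handshake can be sketched as follows; an in-memory dict stands in for the reverse proxy’s request cache, and all names are illustrative:

```python
import hashlib

reverse_proxy_cache = {}  # (url, digest) -> full request body

def reverse_proxy_has(url, digest):
    return (url, digest) in reverse_proxy_cache

def reverse_proxy_store(url, digest, body):
    reverse_proxy_cache[(url, digest)] = body

def forward_proxy_send(url, body: bytes):
    """On a local cache miss, try sending only the digest first."""
    digest = hashlib.md5(body).hexdigest()
    if reverse_proxy_has(url, digest):
        return "sent digest only"   # payload never crosses the Internet
    reverse_proxy_store(url, digest, body)
    return "sent full body"         # one-time cost; cached for next time

assert forward_proxy_send("/search", b"q=laptop") == "sent full body"
assert forward_proxy_send("/search", b"q=laptop") == "sent digest only"
```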


Figure 3. Cache handshake

This solution can also be applied to just one proxy on either side, at the cost of client or server modification. In such cases, the client or server will have to send the digest instead of the whole body. With the two-proxy architecture, the client and server remain unchanged and, as a result, any HTTP client or server can be optimized.

POST requests typically are large in size. By not having the proxy send the whole request and the whole response, we not only save bandwidth but also save the response time involved in large requests and responses.

Although the savings might seem trivial at first glance, they are not so when it comes to real traffic loads. As we saw, even if the POST response is not cached, we still save bandwidth by not sending the payload. This solution gets even more interesting with distributed caches deployed within the network.


Here is a summary of the benefits of a cache-handshaking topology:

  • The request payload travels to the reverse proxy only if there is a cache miss. As a result, both the RTT and bandwidth are improved. The same applies to the response.
  • As a hosted solution, one user’s cache hit saves other users’ requests from travelling between the proxies.
  • There is no technical debt involved. If you remove the proxies, you have a fully HTTP-compliant solution.


HTTP caching is not just for GET requests. By digesting the POST body, handling non-idempotent requests, and distinguishing between idempotent and non-idempotent requests, you can realize a substantial savings in round trips and bandwidth. For further savings, you can employ cache handshaking to send only the digest across the Internet—ideally, by implementing a forward proxy on the client side and a reverse proxy on the server side, but one proxy is sufficient to make a difference.

Cassandra Data Modeling Best Practices, Part 2

In the first part, we covered a few fundamental practices and walked through a detailed example to help you get started with Cassandra data model design. You can follow Part 2 without reading Part 1, but I recommend glancing over the terms and conventions I’m using. If you’re new to Cassandra, I urge you to read Part 1.

September 2014 Update: Readers should note that this article describes data modeling techniques based on Cassandra’s Thrift API. See later posts for CQL API based techniques.

August 2015 Update: Readers can also sign up for a free online self-paced course on how to model their data in Apache Cassandra.

Some of the practices listed below might evolve in the future. I’ve provided related JIRA ticket numbers so you can watch any evolution.

With that, let’s get started with some basic practices!

Storing values in column names is perfectly OK

Leaving column values empty (“valueless” columns) is also OK.

It’s a common practice with Cassandra to store a value (actual data) in the column name (a.k.a. column key), and even to leave the column value field empty if there is nothing else to store. One motivation for this practice is that column names are stored physically sorted, but column values are not.


  • The maximum column key (and row key) size is 64KB.  However, don’t store something like ‘item description’ as the column key!
  • Don’t use timestamp alone as a column key. You might get colliding timestamps from two or more app servers writing to Cassandra. Prefer timeuuid (type-1 uuid) instead.
  • The maximum column value size is 2 GB. But because there is no streaming and the whole value is fetched in heap memory when requested, limit the size to only a few MBs. (Large objects are not likely to be supported in the near future – Cassandra-265. However, the Astyanax client library supports large objects by chunking them.)
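For instance, Python’s `uuid.uuid1()` produces exactly this kind of type-1, time-based UUID:

```python
import uuid

# Plain timestamps from two app servers can collide; a type-1 UUID embeds
# the timestamp plus node and clock-sequence bits, so values stay unique
# while still sorting (roughly) by creation time.
a = uuid.uuid1()
b = uuid.uuid1()
assert a != b
assert a.time <= b.time  # the embedded 100-ns timestamp is non-decreasing
```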

Leverage wide rows for ordering, grouping, and filtering

But don’t go too wide.

This goes along with the above practice. When actual data is stored in column names, we end up with wide rows.

Benefits of wide rows:

  • Since column names are stored physically sorted, wide rows enable ordering of data and hence efficient filtering (range scans). You’ll still be able to efficiently look up an individual column within a wide row, if needed.
  • If data is queried together, you can group that data up in a single wide row that can be read back efficiently, as part of a single query. As an example, for tracking or monitoring some time series data, we can group data by hour/date/machines/event types (depending on the requirements)  in a single wide row, with each column containing granular data or roll-ups. We can also further group data within a row using super or composite columns as discussed later.
  • Wide row column families are heavily used (with composite columns) to build custom indexes in Cassandra.
  • As a side benefit, you can de-normalize a one-to-many relationship as a wide row without data duplication. However, I would do this only when data is queried together and you need to optimize read performance.


Let’s say we want to store some event log data and retrieve that data hourly. As shown in the model below, the row key is the hour of the day, the column name holds the time when the event occurred, and the column value contains payload. Note that the row is wide and the events are ordered by time because column names are stored sorted. Granularity of the wide row (for this example, per hour rather than every few minutes) depends on the use case, traffic, and data size, as discussed next.
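A rough sketch of this layout follows; a plain dict stands in for the column family, and the row-key format is an assumption. In Cassandra it is the physical sorting of column names that provides the time ordering:

```python
# One wide row per hour; each column name is the event time, each
# column value is the event payload.
event_log = {
    "30081214": {  # hour bucket, e.g. ddmmyyhh (illustrative format)
        "14:05:01.123": "login user=42",
        "14:17:38.004": "search q=laptop",
        "14:59:59.871": "logout user=42",
    },
}

hour_row = event_log["30081214"]
# Reading the row back yields events in time order.
assert list(hour_row) == sorted(hour_row)
```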

But not too wide, as a row is never split across nodes:

It’s hard to say exactly how wide a wide row should be, partly because it’s dependent upon the use case. But here’s some advice:

Traffic: All of the traffic related to one row is handled by only one node/shard (by a single set of replicas, to be more precise). Rows that are too “fat” could cause hot spots in the cluster – usually when the number of rows is smaller than the size of the cluster (hope not!), or when wide rows are mixed with skinny ones, or some rows become hotter than others. However, cluster load balancing ultimately depends on the row key selection; conversely, the row key also defines how wide a row will be. So load balancing is something to keep in mind during design.

Size: As a row is not split across nodes, data for a single row must fit on disk within a single node in the cluster. However, rows can be large enough that they don’t have to fit in memory entirely. Cassandra allows 2 billion columns per row. At eBay, we’ve not done any “wide row” benchmarking, but we model data such that we never hit more than a few million columns or a few megabytes in one row (we change the row key granularity, or we split into multiple rows). If you’re interested, Cassandra Query Plans by Aaron Morton shows some performance concerns with wide rows (but note that the results can change in new releases).

However, these caveats don’t mean you should not use wide rows; just don’t go extra wide.

Note: Cassandra-4176 might add composite types for row key in CQL as a way to split a wide row into multiple rows. However, a single (physical) row is never split across nodes (and won’t be split across nodes in the future), and is always handled by a single set of replicas. You might also want to track Cassandra-3929, which would add row size limits for keeping the most recent n columns in a wide row.

Choose the proper row key – it’s your “shard key”

Otherwise, you’ll end up with hot spots, even with RandomPartitioner.

Let’s consider again the above example of storing time series event logs and retrieving them hourly. We picked the hour of the day as the row key to keep one hour of data together in a row. But there is an issue: All of the writes will go only to the node holding the row for the current hour, causing a hot spot in the cluster. Reducing granularity from hour to minutes won’t help much, because only one node will be responsible for handling writes for whatever duration you pick. As time moves, the hot spot might also move but it won’t go away!

Bad row key:  “ddmmyyhh”

One way to alleviate this problem is to add something else to the row key – an event type, machine id, or similar value that’s appropriate to your use case.

Better row key: “ddmmyyhh|eventtype”

Note that now we don’t have global time ordering of events, across all event types, in the column family. However, this may be OK if the data is viewed (grouped) by event type later. If the use case also demands retrieving all of the events (irrespective of type) in time sequence, we need to do a multi-get for all event types for a given time period, and honor the time order when merging the data in the application.

If you can’t add anything to the row key or if you absolutely need ‘time period’ as a row key, another option is to shard a row into multiple (physical) rows by manually splitting row keys: “ddmmyyhh | 1”, “ddmmyyhh | 2”,… “ddmmyyhh | n”, where n is the number of nodes in the cluster. For an hour window, each shard will now evenly handle the writes; you need to round-robin among them. But reading data for an hour will require multi-gets from all of the splits (from the multiple physical nodes) and merging them in the application. (An assumption here is that RandomPartitioner is used, and therefore that range scans on row keys can’t be done.)
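That manual sharding scheme can be sketched like this (the shard count and key format are assumptions):

```python
import itertools

N_SHARDS = 4  # assumption: on the order of the cluster size
_next_shard = itertools.cycle(range(1, N_SHARDS + 1))

def write_key(hour_key: str) -> str:
    # Round-robin writes across "ddmmyyhh|1" ... "ddmmyyhh|n".
    return f"{hour_key}|{next(_next_shard)}"

def read_keys(hour_key: str):
    # Reads must multi-get every shard and merge in the application.
    return [f"{hour_key}|{i}" for i in range(1, N_SHARDS + 1)]

written = {write_key("30081214") for _ in range(100)}
assert written == set(read_keys("30081214"))  # writes spread over all shards
```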

Keep read-heavy data separate from write-heavy data

This way, you can benefit from Cassandra’s off-heap row cache.

Irrespective of caching and even outside the NoSQL world, it’s always a good practice to keep read-heavy and write-heavy data separate because they scale differently.


  • A row cache is useful for skinny rows, but harmful for wide rows today because it pulls the entire row into memory. Cassandra-1956 and Cassandra-2864 might change this in future releases. However, the practice of keeping read-heavy data separate from write-heavy data will still stand.
  • Even if you have lots of data (more than available memory) in a column family but you also have particularly “hot” rows, enabling a row cache might be useful.

Make sure column key and row key are unique

Otherwise, data could get accidentally overwritten.

  • In Cassandra (a distributed database!), there is no unique constraint enforcement for row key or column key.
  • Also, there is no separate update operation (no in-place updates!). It’s always an upsert (mutate) in Cassandra. If you accidentally insert data with an existing row key and column key, the previous column value will be silently overwritten without any error (the change won’t be versioned; the data will be gone).

Use the proper comparator and validator

Don’t just use the default BytesType comparator and validator unless you really need to.

In Cassandra, the data type for a column value (or row key)  is called a Validator. The data type for a column name is called a Comparator.  Although Cassandra does not require you to define both, you must at least specify the comparator unless your column family is static (that is, you’re not storing actual data as part of the column name), or unless you really don’t care about the sort order.

  • An improper comparator will sort column names inappropriately on the disk. It will be difficult (or impossible) to do range scans on column names later.
  • Once defined, you can’t change a comparator without rewriting all data. However, the validator can be changed later.

See comparators and validators in the Cassandra documentation for the supported data types.

Keep the column name short

Because it’s stored repeatedly.

This practice doesn’t apply if you use the column name to store actual data. Otherwise, keep the column name short, since it’s repeatedly stored with each column value. Memory and storage overhead can be significant when the size of the column value is not much larger than the size of the column name – or worse, when it’s smaller.

For example, favor ‘fname’ over ‘firstname’, and ‘lname’ over ‘lastname’.

Note: Cassandra-4175 might make this practice obsolete in the future.

Design the data model such that operations are idempotent

Or, make sure that your use case can live with inaccuracies or that inaccuracies can be corrected eventually.

In an eventually consistent and fully distributed system like Cassandra, idempotent operations can help – a lot. Idempotent operations allow partial failures in the system, as the operations can be retried safely without changing the final state of the system. In addition, idempotency can sometimes alleviate the need for strong consistency and allow you to work with eventual consistency without causing data duplication or other anomalies. Let’s see how these principles apply in Cassandra. I’ll discuss partial failures only, and leave out alleviating the need for strong consistency until an upcoming post, as it is very much dependent on the use case.

Because of  Cassandra’s fully distributed (and multi-master) nature, write failure does not guarantee that data is not written, unlike the behavior of relational databases. In other words, even if the client receives a failure for a write operation, data might be written to one of the replicas, which will eventually get propagated to all replicas. No rollback or cleanup is performed on partially written data. Thus, a perceived write failure can result in a successful write eventually. So, retries on write failure can yield unexpected results if your model isn’t update idempotent.


  • “Update idempotent” here means a model where operations are idempotent. An operation is called idempotent if it can be applied one time or multiple times with the same result.
  • In most cases, idempotency won’t be a concern, as writes into regular column families are always update idempotent. The exception is with the Counter column family, as shown in the example below. However, sometimes your use case can model data such that write operations are not update idempotent from the use case perspective. For instance, in part 1, User_by_Item and Item_by_User in the final model are not update idempotent if the use case operation ‘user likes item’ gets executed multiple times, as the timestamp might differ for each like. However, note that a specific instance of  the use case operation ‘user likes item’ is still idempotent, and so can be retried multiple times in case of failures. As this is more use-case specific, I might elaborate more in future posts.
  • Even with a consistency level ONE, write failure does not guarantee data is not written; the data still could get propagated to all replicas eventually.


Suppose that we want to count the number of users who like a particular item. One way is to use the Counter column family supported by Cassandra to keep count of users per item. Since the counter increment (or decrement) is not update idempotent, retry on failure could yield an over-count if the previous increment was successful on at least one node. One way to make the model update idempotent is to maintain a list of user ids instead of incrementing a count, as shown below. Whenever a user likes an item, we write that user’s id against the item; if the write fails, we can safely retry. To determine the count of all users who like an item, we read all user ids for the item and count manually.

In the above update idempotent model, getting the counter value requires reading all user ids, which will not perform well (there could be millions). If reads are heavy on the counter and you can live with an approximate count, the counter column will be efficient for this use case. If needed, the counter value can be corrected periodically by counting the user ids from the update idempotent column family.
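In outline, the update-idempotent model behaves like this sketch, where a dict of sets stands in for the column family:

```python
# Row key: item id; column names: the user ids (column values may be empty).
likes = {}

def user_likes_item(user_id, item_id):
    # Writing the same column twice is a no-op, so retries are safe.
    likes.setdefault(item_id, set()).add(user_id)

def like_count(item_id):
    # Counting requires reading back every user-id column.
    return len(likes.get(item_id, ()))

user_likes_item("u1", "item9")
user_likes_item("u1", "item9")  # a retried write; final state unchanged
user_likes_item("u2", "item9")
assert like_count("item9") == 2
```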

Note: Cassandra-2495 might add a proper retry mechanism for counters in the case of a failed request. However, in general, this practice will continue to hold true. So make sure to always litmus-test your model for update idempotency.

Model data around transactions, if needed

But this might not always be possible, depending on the use case.

Cassandra has no multi-row, cluster-wide transaction or rollback mechanism; instead, it offers row-level atomicity. In other words, a single mutation operation of columns for a given row key is atomic. So if you need transactional behavior, try to model your data such that you would only ever need to update a single row at once. However, depending on the use case, this is not always doable. Also, if your system needs ACID transactions, you might re-think your database choice.

Note: Cassandra-4285 might add an atomic, eventually consistent batch operation.

Decide on the proper TTL up front, if you can

Because it’s hard to change TTL for existing data.

In Cassandra, TTL (time to live) is not defined or set at the column family level. It’s set per column value, and once set it’s hard to change; or, if not set, it’s hard to set for existing data. The only way to change the TTL for existing data is to read and re-insert all the data with a new TTL value. So think about your purging requirements, and if possible set the proper TTL for your data upfront.

Note: Cassandra-3974 might introduce TTL for the column family, separate from column TTL.

Don’t use the Counter column family to generate surrogate keys

Because it’s not intended for this purpose.

The Counter column family holds distributed counters meant (of course) for distributed counting. Don’t try to use this CF to generate sequence numbers for surrogate keys, like Oracle sequences or MySQL auto-increment columns. You will receive duplicate sequence numbers! Most of the time you really don’t need globally sequential numbers. Prefer timeuuid (type-1 uuid) as surrogate keys. If you truly need a globally sequential number generator, there are a few possible mechanisms; but all will require centralized coordination, and thus can impact the overall system’s scalability and availability.

Favor composite columns over super columns

Otherwise, you might hit performance bottlenecks with super columns.

A super column in Cassandra can be used to group column keys, or to model a two-layer hierarchy. However, super columns have the following implementation issues and are therefore becoming less favorable.


  • Sub-columns of a super column are not indexed. Reading one sub-column de-serializes all sub-columns.
  • Built-in secondary indexing does not work with sub-columns.
  • Super columns cannot encode more than two layers of hierarchy.

Similar (even better) functionality can be achieved by the use of the Composite column. It’s a regular column with sub-columns encoded in it. Hence, all of the benefits of regular columns, such as sorting and range scans, are available; and you can encode more than two layers of hierarchy.

Note: Cassandra-3237 might change the underlying super column implementation to use composite columns. However, composite columns will still remain preferred over super columns.

The order of sub-columns in composite columns matters

Because order defines grouping.

For example, a composite column key like <state|city> will be stored ordered first by state and then by city, rather than first by city and then by state. In other words, all the cities within a state are located (grouped) on disk together.
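Python tuples sort the same way, so the grouping effect is easy to see:

```python
# <state|city> composite keys sort first by state, then by city,
# grouping all of a state's cities together.
cols = [("NY", "Buffalo"), ("CA", "San Jose"), ("NY", "Albany"), ("CA", "Fresno")]
assert sorted(cols) == [
    ("CA", "Fresno"), ("CA", "San Jose"),
    ("NY", "Albany"), ("NY", "Buffalo"),
]
```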

Favor built-in composite types over manual construction

Because manual construction doesn’t always work.

Avoid manually constructing the composite column keys using string concatenation (with separators like “:” or “|”). Instead, use the built-in composite types (and comparators) supported by Cassandra 0.8.1 and above.


  • Manual construction won’t work if sub-columns are of different data types. For example, the composite key <state|zip|timeuuid> will not be sorted in a type-aware fashion (state as string, zip code as integer, and timeuuid as time).
  • You can’t reverse the sort order on components in the type – for instance, with the state ascending and the zip code descending in the above key.
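The first pitfall is easy to demonstrate: concatenated strings compare lexically, while typed components (modeled here as Python tuples) compare in a type-aware way:

```python
# Manually concatenated keys: the numeric component compares as text.
assert sorted(["event|9", "event|10"]) == ["event|10", "event|9"]  # wrong order!

# Typed components sort numerically, as a built-in composite type would.
assert sorted([("event", 9), ("event", 10)]) == [("event", 9), ("event", 10)]
```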

Note: Cassandra built-in composite types come in two flavors:

  • Static composite type: Data types for each part of a composite column are predefined per column family.  All the column names/keys within a column family must be of that composite type.
  • Dynamic composite type: This type allows mixing column names with different composite types in a column family or even in one row.

Find more information about composite types at Introduction to composite columns.

Favor static composite types over dynamic, whenever possible

Because dynamic composites are too dynamic.

If all column keys in a column family are of the same composite type, always use static composite types. Dynamic composite types were originally created to keep multiple custom indexes in one column family. If possible, don’t mix different composite types in one row using the dynamic composite type unless absolutely required. Cassandra-3625 has fixed some serious issues with dynamic composites.

Note: CQL 3 supports static composite types for column names via clustered (wide) rows. Find more information about how CQL 3 handles wide rows at DataStax docs.

Enough for now. I would appreciate your inputs to further enhance these modeling best practices, which guide our Cassandra utilization today.

—  Jay Patel, architect@eBay.


Now You See It: A Peer Feedback System for Scrum Teams

eBay Analytics Platform & Delivery (APD) is a roughly 100-person division within the eBay China Technology Center of Excellence (COE). In September of 2008, the APD COE team started its first Scrum pilot project – one of the earliest Agile pilots within eBay. Since then, the team has completed its transition to the Scrum framework of Agile software development.

Scrum, a term borrowed from rugby, embodies the spirit of the Agile Manifesto and has become mainstream within the Agile community. As the rugby metaphor implies, a Scrum team works as a jelled unit —  crossing functional silos, holding each other accountable, supporting each other —  to move the ball forward and achieve the goal together. Play as a team, win as a team.

The individual performance dilemma

Soon after the pilot, people started to realize that Scrum was not simply “yet another development process.” Agile/Scrum represents a new way of working in a broader sense and demands changes on every level — not only day-to-day development but also the way we do management. A big question soon emerged from senior management: “Scrum is all about the ‘team’. People self-organize and share the team’s performance. But how about the individual’s performance within the team? I’m not supposed to micro-manage each person, but it seems the Scrum team becomes a ‘black hole’ to me, and I lose sight of each team member’s performance behind the ‘event horizon’.”

Not coincidentally, this question has been a common one for companies adopting Agile since the beginning. It’s almost an inevitable topic at every Agile event. After hearing opinions, arguments, and debates ranging from setting “mission impossible” goals for individuals to completely abandoning individual appraisals, the APD China management team drew its own conclusions. In late 2011, managers started implementing a new framework for individual and team performance evaluation. The framework has four main components: product success (which is shared by team members), peer feedback (which distinguishes among team members), self-development, and management tasks.  Among these components, the peer feedback within teams becomes a much more significant measure of individual performance. This blog post focuses on the peer feedback component.

Using peer feedback to evaluate individual performance is not new. However, it becomes much more useful and meaningful in an Agile context. The basic idea behind it is that team members who work closely with you on a daily basis are the ones who know the most about your performance in making the team successful; thus, they can give the most meaningful feedback and evaluation. In addition, performance is not one or two managers’ judgment any more, but rather the aggregated evaluation of the other members of the same team. The “wisdom of the crowd” based on day-to-day facts tends to generate better accuracy.

How did we do it?

We needed a peer feedback system that would support summarizing, quantifying, and analyzing any number of responses. We decided to start with SurveyMonkey, a simple and free solution. Then we developed our own internal system to better suit our requirements.

The final step in implementing peer feedback for Scrum teams was determining what to ask. Since the feedback form is basically a survey, we held a “survey about the survey” to learn what people thought should be asked about their own performance. The results boiled down to the following eight question areas:

  • Q1:  Communication — This is the foundation of human interaction and teamwork.
  • Q2:  Quality — One member's defects have a negative impact on the other team members, and ultimately on the overall quality and productivity of the team.
  • Q3:  Collaboration — We value building consensus and seeking win-win outcomes over just getting one's own work done (i.e., "self-suboptimizing": focusing on one's own tasks rather than considering the team as a whole).
  • Q4:  Continuous Improvement — By improving oneself and helping others to improve, the capabilities of the overall team increase.
  • Q5:  Role Sharing — The willingness and ability to share responsibilities bi-directionally outside of one’s functional silo makes the team more robust.
  • Q6:  Energizing — An individual can positively influence the team, especially in tough times, instead of finger-pointing and dragging down team morale.
  • Q7:  Overall Satisfaction — “If you had a choice, would you continue working with this team member?”
  • Q8:  Other Comments

Questions 1 through 6 represent the teamwork behaviors that we value the most. You might wonder why there's no question about how much a team member contributes to the team. There's a good reason for that. Measuring actual work contribution and delivery, such as the complexity of a completed task, is related to job seniority more than to teamwork behaviors. A newly graduated junior programmer might not be able to independently design an excellent solution to a complex requirement; however, that person can be the glue enabling the team to come up with a brilliant solution by working together. On the other hand, a senior architect might prevent the team's success by not listening to a second opinion due to ego or status. We want highly functioning teams that can produce more and better results than the individuals could achieve separately, but that outcome is impossible without the positive teamwork behaviors that we believe in. That's why the questions are all about teamwork; Agile/Scrum is about teamwork.

Questions 1 through 7 are all multiple-choice questions on a scale of 1 to 10. Question 8 is free-form text. After we replaced SurveyMonkey with our own system, we added a free-text comment area for each multiple-choice question. The combination gives us both quantitative and qualitative data, enabling statistical analysis as well as drilling down into detailed information and facts. The way we organize each question's answer set also lets the respondent give relative feedback by comparing the team members on one dimension. For example:
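To make that answer-set layout concrete, here is a rough model of one respondent's submission (the names and the schema are hypothetical illustrations, not our internal system's actual data model): each multiple-choice question holds one 1-to-10 rating per teammate, plus an optional free-text comment.

```python
# Hypothetical model of one anonymous respondent's monthly submission.
# Each question is answered once per teammate, on a 1-10 scale,
# optionally with a free-text comment (added in our internal system).
response = {
    "team": "Team 11",
    "answers": {
        "Q1": {  # Communication
            "Alice": {"score": 8, "comment": "Keeps everyone in the loop"},
            "Bob":   {"score": 6, "comment": ""},
        },
        "Q7": {  # Overall Satisfaction
            "Alice": {"score": 9, "comment": ""},
            "Bob":   {"score": 7, "comment": "Would like more pairing"},
        },
    },
}

# Because all teammates are rated side by side within one question,
# relative feedback falls out naturally -- e.g., ranking teammates on Q1.
ranked = sorted(response["answers"]["Q1"],
                key=lambda m: response["answers"]["Q1"][m]["score"],
                reverse=True)
print(ranked)  # highest-rated teammate first
```

Laying the ratings out per question (rather than per person) is what makes the side-by-side comparison natural for the respondent.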

Question 1, with rows for separate ratings of each team member

The peer feedback survey is sent monthly — a pretty high frequency, which allows us to measure the "pulse" of performance and take necessary actions in time. Another reason for this frequency is to avoid the phenomenon of only the most recent performance counting, while the past grows vague in people's memory. The performance trend over a longer time, such as a half-year, is now visible; of course, a consistently upward trend indicates better performance than a fluctuating one, even though the averages over such a period may be the same.

The feedback survey is strictly anonymous and confidential. I believe that a perfectly mature team could openly discuss each other’s performance face to face, and that comments on areas for improvement could be treated as gifts without hard feelings. However, it’s more important to create a safe environment for people to offer frank and open evaluation; there are also culture and background factors to consider. Another reason for private feedback is to avoid the “anchoring effect,” where the first comment in a group discussion anchors the following ones.

What do we get from it?

After piloting with SurveyMonkey for several months and then officially switching to the internal system four months ago, we've gained much more than we had expected. The peer feedback results not only help the management team get much more insight into each individual's performance, but also help identify and fix team-level issues that have a more profound impact on our ability to improve how we work.

As a data analytics organization, naturally we utilize techniques for visualizing the survey results. First, let’s see the most detailed information at the individual level – which was our initial motivation for creating this peer feedback system. Each solid blue/red line represents one question, and the dashed line is the overall trend over the four months:

dashed line showing 4-month trend per individual

Here’s a deep dive into the question results for each individual:

4-month trend line per question, per team member

Now let’s move to a bit higher level to see the full picture of each team and its team members:

heat map per question, per team member

The above “heat map” reveals a lot of information. For example:

  • Different teams have different characteristics. The members of Team 12 evaluate each other similarly and tend to rate high on the scale, indicating good teamwork sentiment. Team 10, on the other hand, might still be in an earlier forming/storming stage. Team 11 has high scores for the majority; but notice there's one problematic team member to whom the rest of the team gives lower evaluations.
  • Different teams have different teamwork challenges. Team 10 has lower scores (bigger issues) on Q4 (Continuous Improvement) and Q6 (Energizing), while Team 11 may need to improve on Q5 (Role Sharing).
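Each cell in a heat map like the one above can be computed by averaging all the scores a member received on a question across the anonymous raters. Here is a minimal sketch (with made-up names, scores, and data layout; the internal system's schema is not shown in this post):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical flat peer-feedback records for one team and one month:
# (rated member, question, score given by one anonymous peer).
records = [
    ("Alice", "Q1", 8), ("Alice", "Q1", 9),
    ("Alice", "Q5", 4), ("Alice", "Q5", 5),
    ("Bob",   "Q1", 7), ("Bob",   "Q1", 6),
    ("Bob",   "Q5", 3), ("Bob",   "Q5", 4),
]

# One heat-map cell = the average score a member received on a question.
cells = defaultdict(list)
for member, question, score in records:
    cells[(member, question)].append(score)
heatmap = {key: mean(vals) for key, vals in cells.items()}

print(heatmap[("Alice", "Q1")])  # 8.5
print(heatmap[("Bob", "Q5")])    # 3.5
```

A low stripe across one question then points at a team-level challenge, while a low stripe for one member points at an individual issue — exactly the two kinds of observations listed above.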

Next, let’s go to an even higher level to see how teams are doing. The following graph shows the average score per team per question, sorted by the average score per team in descending order:

average score per team per question

The above graph indicates the overall sentiment among the team members toward each other. A lower average score (lighter green) may indicate lower satisfaction among the team peers.

This final graph shows the score standard deviation per team per question, sorted by the grand total of the score standard deviation per team in descending order:

score standard deviation per team per question

This graph indicates the variation in how team members evaluated each other. Higher variation (darker red) means that team members' ratings on a question diverged more widely; this may indicate the existence of outliers, or of specific issues between particular team members.


Companies want to benefit from Scrum’s focus on highly effective teams. But companies also need visibility into the performance of each team member. In true Agile fashion, eBay’s APD team in China has incrementally developed a peer feedback system that sheds light on both team and individual strengths and weaknesses. As a result, problems can be pinpointed and addressed more accurately and quickly.