eBay Analytics Platform & Delivery (APD) has a roughly 100-person division within the eBay China Technology Center of Excellence (COE). In September of 2008, the APD COE team started its first Scrum pilot project – one of the earliest Agile pilots within eBay. Since then, the team has completed its transition to the Scrum framework of Agile software development.
Scrum, a term borrowed from rugby, embodies the spirit of the Agile Manifesto and has become mainstream within the Agile community. As the rugby metaphor implies, a Scrum team works as a jelled unit -- crossing functional silos, holding each other accountable, supporting each other -- to move the ball forward and achieve the goal together. Play as a team, win as a team.
The individual performance dilemma
Soon after the pilot, people started to realize that Scrum was not simply "yet another development process." Agile/Scrum represents a new way of working in a broader sense and demands changes on every level -- not only day-to-day development but also the way we do management. A big question soon emerged from senior management: "Scrum is all about the 'team'. People self-organize and share the team's performance. But how about the individual's performance within the team? I'm not supposed to micro-manage each person, but it seems the Scrum team becomes a 'black hole' to me, and I lose sight of each team member's performance behind the 'event horizon'."
Not coincidentally, this question has been a common one for companies adopting Agile since the beginning. It's almost an inevitable topic at every Agile event. After hearing opinions, arguments, and debates ranging from setting "mission impossible" goals for individuals to completely abandoning individual appraisals, the APD China management team drew its own conclusions. In late 2011, managers started implementing a new framework for individual and team performance evaluation. The framework has four main components: product success (which is shared by team members), peer feedback (which distinguishes among team members), self-development, and management tasks. Among these components, the peer feedback within teams becomes a much more significant measure of individual performance. This blog post focuses on the peer feedback component.
Using peer feedback to evaluate individual performance is not new. However, it becomes much more useful and meaningful in an Agile context. The basic idea behind it is that team members who work closely with you on a daily basis are the ones who know the most about your performance in making the team successful; thus, they can give the most meaningful feedback and evaluation. In addition, performance is not one or two managers' judgment any more, but rather the aggregated evaluation of the other members of the same team. The "wisdom of the crowd" based on day-to-day facts tends to generate better accuracy.
How did we do it?
We needed a peer feedback system that would support summarizing, quantifying, and analyzing any number of responses. We decided to start with SurveyMonkey, a simple and free solution. Then we developed our own internal system to better suit our requirements.
The final step in implementing peer feedback for Scrum teams was determining what to ask. Since the feedback form is basically a survey, we held a "survey about the survey" to learn what people thought should be asked about their own performance. The results boiled down to the following eight question areas:
- Q1: Communication -- This is the foundation of human interaction and teamwork.
- Q2: Quality -- One person's defects have a negative impact on the other team members, and ultimately on the overall quality and productivity of the team.
- Q3: Collaboration -- We value building consensus and seeking win-win outcomes over just getting one's own work done (i.e., "self-suboptimizing": focusing on one's own tasks rather than considering the team as a whole).
- Q4: Continuous Improvement -- By improving oneself and helping others to improve, the capabilities of the overall team increase.
- Q5: Role Sharing -- The willingness and ability to share responsibilities bi-directionally outside of one's functional silo makes the team more robust.
- Q6: Energizing -- An individual can positively influence the team, especially in tough times, instead of finger-pointing and dragging down team morale.
- Q7: Overall Satisfaction -- "If you had a choice, would you continue working with this team member?"
- Q8: Other Comments
Questions 1 through 6 represent the teamwork behaviors that we value the most. You might wonder why there's no question about how much a team member contributes to the team. There's a good reason for that. Measuring actual work contribution and delivery, such as the complexity of a completed task, is related to job seniority more than to teamwork behaviors. A newly graduated junior programmer might not be able to independently design an excellent solution to a complex requirement; however, that person can be the glue enabling the team to come up with a brilliant solution by working together. On the other hand, a senior architect might prevent the team's success by not listening to a second opinion due to ego or status. We want highly functioning teams that can produce more and better results than the individuals combined could do, but that outcome is impossible without the positive teamwork behaviors that we believe in. That's why the questions are all about teamwork; Agile/Scrum is about teamwork.
Questions 1 through 7 are all multiple choice on a scale of 1 to 10. Question 8 is free-form text. After we replaced SurveyMonkey with our own system, we added a free-text comment area for each multiple-choice question. The combination gives us both quantitative and qualitative feedback, enabling us to do data analysis as well as to drill down to detailed information and facts. The way we organize each question's answer set also lets the respondent give relative feedback by comparing each team member on one dimension. For example:
The peer feedback survey is sent monthly -- a pretty high frequency, which allows us to measure the "pulse" of performance and take necessary actions in time. Another reason for this frequency is to avoid the phenomenon of only the most recent performance counting, while the past gets vague in people's memory. The performance trend over a longer time, such as a half-year, is now visible; a consistently upward trend indicates better performance than a fluctuating one, even though the average over such a period may be the same.
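To make the point about trends versus averages concrete, here is a minimal sketch. It assumes that a "trend" can be approximated by the least-squares slope of monthly scores; the post does not say how the internal system computes its trend line, and the function name and the two score series are hypothetical.

```python
from statistics import mean


def trend_slope(scores):
    """Least-squares slope of scores against month index (0, 1, 2, ...)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two monthly scores")
    xs = list(range(n))
    x_bar = mean(xs)
    y_bar = mean(scores)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, scores))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical half-year of monthly scores for two team members.
steady_riser = [5, 6, 7, 8, 9, 10]
fluctuating = [9, 6, 8, 5, 10, 7]

# Both series share the same average, but only one has a clear upward slope.
print(mean(steady_riser), trend_slope(steady_riser))
print(mean(fluctuating), trend_slope(fluctuating))
```

The two series are indistinguishable by average alone, which is why a monthly cadence that exposes the slope adds information a single half-year score would hide.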
The feedback survey is strictly anonymous and confidential. I believe that a perfectly mature team could openly discuss each other's performance face to face, and that comments on areas for improvement could be treated as gifts without hard feelings. However, it's more important to create a safe environment for people to offer frank and open evaluation; there are also culture and background factors to consider. Another reason for private feedback is to avoid the "anchoring effect," where the first comment in a group discussion anchors the following ones.
What do we get from it?
After piloting in SurveyMonkey for several months and then officially switching to the internal system four months ago, we've gained much more than we had expected. The peer feedback results not only help the management team get much more insight into each individual's performance, but also help identify and fix team-level issues that have more profound and meaningful impact on our ability to improve our work.
As a data analytics organization, naturally we utilize techniques for visualizing the survey results. First, let’s see the most detailed information at the individual level – which was our initial motivation for creating this peer feedback system. Each solid blue/red line represents one question, and the dashed line is the overall trend over the four months:
Here's a deep dive into the question results for each individual:
Now let’s move to a bit higher level to see the full picture of each team and its team members:
The above "heat map" reveals a lot of information. For example:
- Different teams have different characteristics. The members of Team 12 evaluate each other similarly and tend to measure high on the scale, indicating good teamwork sentiment. Team 10, on the other hand, might still be in an earlier forming/storming stage. Team 11 has high scores for the majority, but notice there's one problematic team member to whom the rest of the team gives lower evaluations.
- Different teams have different teamwork challenges. Team 10 has lower scores (bigger issues) on Q4 (Continuous Improvement) and Q6 (Energizing), while Team 11 may need to improve on Q5 (Role Sharing).
Next, let’s go to an even higher level to see how teams are doing. The following graph shows the average score per team per question, sorted by the average score per team in descending order:
The above graph indicates the overall sentiment among the team members toward each other. A lower average score (lighter green) may indicate lower satisfaction among the team peers.
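The aggregation behind such a graph can be sketched as follows. This is an illustration under assumptions, not the internal system's code: responses are modeled as hypothetical `(team, question, score)` tuples, and the "average score per team" used for sorting is taken to be the mean of that team's per-question averages.

```python
from collections import defaultdict
from statistics import mean


def team_question_averages(responses):
    """Return [(team, {question: average score})], best-scoring team first."""
    buckets = defaultdict(list)
    for team, question, score in responses:
        buckets[(team, question)].append(score)

    table = defaultdict(dict)
    for (team, question), scores in buckets.items():
        table[team][question] = mean(scores)

    # Assumption: rank teams by the mean of their per-question averages.
    return sorted(
        table.items(),
        key=lambda item: mean(list(item[1].values())),
        reverse=True,
    )


# Hypothetical anonymized responses: (team, question, score on a 1-10 scale).
sample_responses = [
    ("Team A", "Q1", 8), ("Team A", "Q1", 9), ("Team A", "Q2", 7),
    ("Team B", "Q1", 6), ("Team B", "Q2", 5), ("Team B", "Q2", 7),
]

for team, averages in team_question_averages(sample_responses):
    print(team, averages)
```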
This final graph shows the score standard deviation per team per question, sorted by the grand total of the score standard deviation per team in descending order:
This graph is an indication of the variation in how team members evaluated each other. Higher variation (darker red) means the evaluation numbers that team members gave to a question were more variable; this variability may indicate the existence of outliers, or of specific issues between specific team members.
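A comparable sketch for the variation view is below. It assumes the population standard deviation (`statistics.pstdev`), since the post does not say which variant was used, and it again models responses as hypothetical `(team, question, score)` tuples.

```python
from collections import defaultdict
from statistics import pstdev


def team_question_spread(responses):
    """Return [(team, {question: std deviation})], most variable team first."""
    buckets = defaultdict(list)
    for team, question, score in responses:
        buckets[(team, question)].append(score)

    table = defaultdict(dict)
    for (team, question), scores in buckets.items():
        # Assumption: population standard deviation of the raw scores.
        table[team][question] = pstdev(scores)

    # Sort by the grand total of each team's per-question deviations.
    return sorted(
        table.items(),
        key=lambda item: sum(item[1].values()),
        reverse=True,
    )


# Hypothetical responses: Team Y has one low outlier score on Q1.
spread_sample = [
    ("Team X", "Q1", 8), ("Team X", "Q1", 8), ("Team X", "Q1", 8),
    ("Team X", "Q2", 7), ("Team X", "Q2", 7),
    ("Team Y", "Q1", 9), ("Team Y", "Q1", 9), ("Team Y", "Q1", 3),
    ("Team Y", "Q2", 8), ("Team Y", "Q2", 6),
]

for team, spread in team_question_spread(spread_sample):
    print(team, spread)
```

In this made-up data, the single low score on Q1 is enough to push Team Y to the top of the list, which is the kind of outlier signal the graph is meant to surface.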
Companies want to benefit from Scrum's focus on highly effective teams. But companies also need visibility into the performance of each team member. In true Agile fashion, eBay's APD team in China has incrementally developed a peer feedback system that sheds light on both team and individual strengths and weaknesses. As a result, problems can be pinpointed and addressed more accurately and quickly.