Mastering the Fire

 

“If you play with fire, you’re gonna get burned.”  ~ Anonymous

There were several reasons we built the Node.js stack at eBay and now offer it as part of our polyglot initiative: an active open source community, development productivity, reliability, scalability, and speed. Community support and productivity proved true from the start, but reliability, scalability, and speed all depend on developer culture.

We use static code analysis tools, code reviews, unit tests, and regression testing to make sure our modules and applications work according to spec. One isolated module can perform perfectly well in its own test environment and as part of an application, but once all the modules are packaged together and ready to roll, the app may turn out to be much slower than expected, for example due to logging too much data. This can become a tough time for the application stack provider, who does not have answers.

Thankfully, flame graphs came on the scene. They were really promising, but their promise turned out to be far from the reality: the flame graphs were hot like real flames. We touched them a few times, got burned, and backed off. The first time we approached them, flame graphs were available only on SmartOS, and one had to follow specific steps to generate them. That was the problem, especially when one runs applications on a completely different platform. Addicted to the simplicity of Node, which just works, we found this option far from simple, and we put it in reserve for tough cases that we could not solve any other way. The second time we approached flame graphs, they were available on Linux and OS X, but creating them still required a special setup and too many steps (including merging symbols with profile results) to get SVG charts on OS X.

“It’s a living thing, Brian. It breathes, it eats, and it hates. The only way to beat it is to think like it.” ~ Robert De Niro (as Donald ‘Shadow’ Rimgale), Backdraft, 1991

Meanwhile, we were using v8-profiler to generate profile data that we would load into the Chrome Profiles tool, and then we would analyze the aggregation tree for performance hot spots. It is a laborious task when one has to look at all the call stacks of a big application, and it demands a lot of focus. We could not offer this solution to our application developers, as it would take too much of their time to troubleshoot. It was going to become a task for a dedicated profiling expert who would do a lot of profiling, gain a lot of experience, learn to spot things easily, and know where to look. This was not scalable. As a big project started knocking at our door, we had to figure out a better way to profile so that application developers could do the work themselves.

We got an idea: if Chrome shows profile results in an aggregated format, then there should be a way to calculate the same results ourselves and present them as flame graphs using one of the available tools. And we found our calculator and a suitable tool that was built to use JSON as profile data. All we needed to do was put it all together.

“Playing with fire is bad for those who burn themselves. For the rest of us, it is a very great pleasure.”  ~ Jerry Smith, National Football League tight end, Washington Redskins ‘65-77

The result is pretty exciting. We are now able to turn on profiling in production any time without restarting the server and look right into the problem via flame graphs with one click of a button. The results show the JavaScript part of the profiling (no native code), which is what developers want most of the time anyway when it comes to performance issues in their applications.

It also works anywhere that can run Node. For example, developers now can profile right on their Macs or Windows machines without any special effort on their part.

We have already successfully used it to find and optimize performance in platform code as well as in many applications that are soon to be rolled to production. We were able to quickly identify a performance problem in production for one critical application when, after a fresh deployment, it started using 80% of CPU instead of the expected 20–30%. Below you can see the problem: the application was loading templates over and over again with every request. The fix was simply to cache the templates at first load.
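As a rough sketch of the kind of fix described above, caching a compiled template on first load can look something like this. The function and variable names here are hypothetical illustrations, not eBay's actual code:

```javascript
// Hypothetical sketch of the fix: cache compiled templates on first load
// instead of re-reading and re-compiling them on every request.
var templateCache = {};

function getTemplate(name, read, compile) {
    // Serve from the cache when we have already compiled this template.
    if (templateCache[name]) {
        return templateCache[name];
    }
    // Expensive path: read and compile once, then remember the result.
    var compiled = compile(read(name));
    templateCache[name] = compiled;
    return compiled;
}

// Tiny fake reader/compiler to show the expensive path runs only once.
var reads = 0;
function read(name) { reads++; return '<h1>' + name + '</h1>'; }
function compile(src) { return function() { return src; }; }

getTemplate('home', read, compile);
getTemplate('home', read, compile);
console.log(reads); // the template is read only once
```

In the real application the reader would hit the file system and the compiler would be the templating engine, which is why repeating this work per request showed up so prominently in the flame graph.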

This first flame graph shows the application’s behavior before the fix. Total time spent on requests was 3500 msec.

flame graph of a sample application before its fix was applied

This next illustration shows a close-up view of the same flame graph, highlighting the trouble spots.

close up view of part of the flame graph of a sample application before its fix was applied

This next flame graph shows the optimization we got after applying the fix.

flame graph of the sample application after its fix was applied

As you can see, the rendering part became much smaller. The total time spent on all requests dropped to 1100 msec.

Most of the problems we discovered were not as big as the one that Netflix uncovered with flame graphs, but fixing them helped us save a lot on CPU usage.

“Don’t let your dreams go up in smoke — practice fire safety.”  ~ Unknown Author

cartoon shows a data center in flames with the caption someone rolled to production without CPU profiling

There is still work to do. We need to train developers to read flame graphs. Otherwise, this valuable tool can acquire an undeserved negative perception and disappear from developers’ toolsets.

After profiling many applications, we have also found common problems that we can highlight by default, and we can implement new rules for static code analysis to identify these problems.

We have found it useful to profile the following areas with flame graphs:

  • Application profiling during development
  • Unexpected activity detection during memory leak analysis
  • Capacity estimation based on CPU usage
  • Issue troubleshooting at runtime in production
  • Proactive smoke testing with live traffic in a special environment using a traffic mirror (cloning read requests and directing them to the target test box)
  • Sampling and storing for future investigation

To summarize our experience with Node and profiling: the successful employment of any language, no matter how promising, depends on the way it is used, and performance tools like flame graphs play a major role in helping developers accomplish what was claimed at the start.

Browse eBay with Style and Speed

One of the top initiatives for eBay this year is to provide a compelling browse experience to our users. In a recent interview, Devin Wenig gave a good overview of why this matters to eBay. The idea is to leverage structured data and machine learning to let users shop across a whole spectrum of value, where some users might desire great savings, while others may want to focus on, say, best-selling products.

When we started to design the experience, our first area of focus was mobile web. As at many other organizations, mobile web has been our fastest-growing sector. We wanted to launch the new browse experience on mobile web first, followed by desktop and native.

The core design principles of the new mobile web browse experience were to keep it simple, accessible, and fast, really fast. On the front-end side of things, we made a couple of choices to achieve this.

  • Lean and accessible — From the beginning we wanted the page to be as lean as possible. This meant keeping the HTML, CSS, and JS to a minimum. To achieve this goal, we followed a modular architecture and started building atomic components. Basically a page is a bunch of modules, and a module is built from other sub-modules, and so on. This practice enabled maximum code reuse, which in turn reduced the size of resources (CSS and JS) drastically. In addition, our style library enforced accessibility through CSS — by using ARIA attributes to define styles rather than just class names. This forces developers to write a11y-friendly markup from the beginning, instead of it being an afterthought. You can read more about it here.
  • Code with the platform — The web platform has evolved into a more developer-friendly stack, and we wanted to leverage this aspect: code with the platform vs. coding against it. What this meant was that we could reduce our dependency on big libraries and frameworks and start using the native APIs instead. For instance, we avoided jQuery for DOM manipulations and used the native DOM APIs, and we used the fetch polyfill instead of $.ajax. The end result was a faster-loading page that was also very responsive to user interactions. BTW, jQuery is still loaded in the page, because some eBay platform-specific code depends on it, and we are working towards removing that dependency altogether.

But our efforts did not stop there. The speed aspect was very critical for us, and we wanted to do more for speed. That is when we ran into AMP.

Experimenting with AMP

The AMP project was announced around the same time we started the initial brainstorming for browse. It seemed to resonate a lot with our own thinking on how we wanted to render the new experience. Although AMP was more tuned towards publisher-based content, it was still an open source project built using the open web. Also, a portion of the traffic to the new browse experience is going to be from search engines, which made it more promising to look into AMP. So we quickly pinged the AMP folks at Google and discussed the idea of building an AMP version for the browse experience, in addition to the normal mobile web pages. They were very supportive of it. This positive reaction encouraged us to start looking into AMP technology for the eCommerce world and in parallel develop an AMP version of browse.

Today we are proud to announce that the AMP version of the new browse experience is live, and about 8 million AMP-based browse nodes are available in production. Check out some of the popular queries in a mobile browser — Camera Drones and Sony PlayStation, for example. Basically, adding amp/ to the path of any browse URL will render an AMP version (for example, non-AMP, AMP). We have not linked all of them from our regular (non-AMP) pages yet. This step is waiting on a few pending tasks to be completed. For now, we have enabled this new browse experience only on mobile web. In the next couple of weeks, the desktop web experience will also be launched.

So how was the experience in implementing AMP for the eCommerce world? We have highlighted some of our learnings below.

What worked well?

  • Best practices — One of the good things about AMP is that at the end of the day it is a bunch of best practices for building mobile web pages. We were already following some of them, but adoption was scattered across various teams, each having its own preference. This initiative helped us consolidate the list and incorporate these best practices into our regular development life cycle. This made our approach towards AMP more organic rather than a forced function. The other good side effect is that even our non-AMP pages became faster.
  • Less forking in code — This follows the previous point. Since we started following some of the AMP best practices for building regular pages, we were able to reuse most of the UI components between our non-AMP and AMP browse page. This resulted in less forking in code, which otherwise would have become a maintenance nightmare. Having said that, there is still some forking when it comes to JavaScript-based components, and we are still figuring out the best solution.
  • AMP Component list — Although the AMP project’s initial focus was more towards publisher-based content and news feeds, the AMP component list was still sufficient to build a basic product for viewing eCommerce pages. Users will not be able to do actions on items (such as “Add To Cart”), but they still get a solid browsing experience. The good news is that the list is getting better and growing day by day. Components like sidebar, carousel, and lightbox are critical in providing a compelling eCommerce experience.
  • Internal AMP platform — We have been thinking about leveraging the AMP ecosystem for our own search, similar to how Google handles AMP results. This plan is in very early stages of discussion, but the possibility of our search using AMP technology is very interesting.

The complex parts

  • Infrastructure components — To launch an eBay page to production, a lot of infrastructure components automatically come into play: things like the global header/footer, the site speed beacon kit, the experimentation library, and the analytics module. All of them have some amount of JavaScript, which immediately disqualifies them from being used in the AMP version. This adds complexity in development. We had to fork a few infrastructure components to support the AMP guidelines. They had to go through a strict regression cycle before being published, which added delays. Also, our default front-end server pipeline had to be conditionally tweaked to exclude or swap certain modules. It was a good learning curve, and over time we have replaced our early quick hacks with more robust and sustainable solutions.
  • Tracking — AMP provides user activity tracking through its amp-analytics component. amp-analytics can be configured in various ways, but it still was not sufficient for the granular tracking needs that eBay has. We also do stuff like session stitching, which needs cookie access. Creating an amp-analytics configuration to suit our needs was slowly becoming unmanageable. We need some enhancements in the component, which we are hoping to develop and commit to the project soon.

What’s next?

We are excited to partner with Google and everyone else participating on the AMP Project to close the gap in launching a full-fledged eCommerce experience in AMP. We have created a combined working group to tackle the gap, and we will be looking into these items and more.

  • Smart buttons — These enable us to do actions like “Add To Cart” and “Buy It Now” with authentication support.
  • Input elements — User interactive elements are critical to eCommerce experiences, be they simple search text boxes or checkboxes.
  • Advanced tracking — As mentioned earlier, we need more granular tracking for eBay, and so we have to figure out a way to achieve it.
  • A/B Testing — This will enable experimentation on AMP.

With items like these in place, AMP for eCommerce will soon start surfacing.

We will also be looking into creating a seamless transition from the AMP view to a regular page view, similar to what the Washington Post did using Service Workers. This will enable users to have a complete and delightful eBay experience without switching contexts.

We are also asked whether we are focusing more on web than on native. The answer is NO. At eBay, we strongly believe that web and native do not compete with each other. Indeed, they complement each other, and the combined ecosystem works very well for us. We will soon be launching these browse experiences on our native platforms.

We are on our path to making eBay the world’s first place to shop, and this is a step towards it. Thanks to my colleague Suresh Ayyasamy, who partnered in implementing the AMP version of browse nodes and successfully rolling it to production.

Senthil

Igniting Node.js Flames

“Simple things bring infinite pleasure. Yet, it takes us a while to realize that. But once simple is in, complex is out – forever.” ― Joan F. Marques

Now that I have your attention, let me clear up the word “flames.” The flames I’m referring to have nothing to do with fire; all I am talking about is performance tools in Node.js. When it comes to performance, everyone thinks of fighting fires, as many consider performance optimization a nightmare, and most of us assume that only a few individuals are masters of profiling.

Anyone can become a master of profiling when given simple tools. At eBay, we strive to make things simple and easy for our developers to use. Over the course of Node.js development and production issues, we soon realized that profiling in Node.js is not an easy thing to do.

Before jumping to the CPU profiling tool that simplified our lives, let me walk you through the journey that led us to see flame charts from a completely different angle.

Flame graphs using kernel tools

With Brendan Gregg’s flame graph tooling, it became much easier to visualize CPU bottlenecks. However, one needs to run a number of tools and scripts to generate these graphs.

Yunong Xiao has posted an excellent blog on how to generate flame graphs using the perf command, based on Gregg’s tools. Kernel tools like DTrace (BSD and Solaris) and perf (Linux) are very useful for generating stack traces at the core level and transforming the stack calls into flame graphs. This approach gives us the complete picture of Node internals, from the V8 engine all the way up to JS code.

However, running tools like these requires a good understanding of the tools themselves, and sometimes a different OS altogether. In most cases, your production box and your profiling box are set up completely differently, which makes it hard to investigate an issue occurring in production, as one has to try to reproduce it in a completely different environment.

After managing to run the tools, you will end up with flame charts like this.

flame graph generated with perf (image source: Yunong Xiao’s blog)

Here are some pros and cons for this approach.

Pros:

  • Easy to find CPU bottlenecks
  • Graphical view
  • Complete profile graph covering both native and JS frames

Cons:

  • Complexity in generating the graphs
  • Limited DTrace support across platforms, making it harder to profile on DEV boxes

Chrome profiling tool

The Chrome browser is just amazing. It is famous not only for its speed but also for its V8 engine, which is the core of Node.js. In addition to these features, one tool that web developers love about Chrome is Developer Tools.

the Developer Tools menu in the Chrome browser

There is one tool inside Developer Tools, the Profiles tab, that is used to profile browser-side JS. The v8-profiler module enables us to load server-side profile data into this same tool.

the Profiles tab in Chrome Developer Tools

Let us see how we can use this for profiling our Node.js application. Before using Profiles in Chrome, we have to generate some profiling data from our running Node.js application. We will use v8-profiler for creating CPU profile data.

In the following code, I have created a route /cpuprofile for generating CPU profile data for a given number of seconds and then streaming the dump to a browser to open in Chrome.

This sample code creates a CPU dump using v8-profiler.

//file index.js
var express = require('express');
var profiler = require('v8-profiler');

var app = express();

app.get('/', function(req, res) {
    res.send('Hello World!!');
});

app.get('/cpuprofile', function(req, res) {
    var duration = req.query.duration || 2;
    res.header('Content-type', 'application/octet-stream');
    res.header('Content-Disposition', 'attachment; filename=cpu-profile' + Date.now() + '.cpuprofile');
    // Start profiling
    profiler.startProfiling('CPU Profile', true);
    setTimeout(function() {
        // Stop profiling after the requested duration
        var profile = profiler.stopProfiling();
        // Pipe the profile dump to the browser
        profile.export().pipe(res).on('finish', function() {
            profile.delete();
        });
    }, duration * 1000); // convert seconds to milliseconds
});

app.listen(8080);

To generate CPU profile data, use these steps:

  1. Start your app.
    node index.js

    It’s a good idea to run ab (ApacheBench) to put some load on the page.

  2. Access the CPU profile dump using http://localhost:8080/cpuprofile?duration=2. A cpu-profile.cpuprofile will be downloaded from the server.
  3. Load the downloaded file cpu-profile.cpuprofile in Chrome using Developer Tools > Profiles > Load. Upon loading, you should see something like the following in your Profiles tab.

     profile data loaded in the Chrome Profiles tab

Now that you have opened the profile data, you can drill down the tree and analyze which piece of code is taking more CPU time. With this tool, anyone can generate profile data anytime with just one click, but imagine how hard it is to drill down this tree structure when you have a big application.

In comparison to Flame Graphs using Kernel Tools, here are some pros and cons.

Pros

  • Easy generation of a profile dump
  • Platform independent
  • Profiling available during live traffic

Cons

  • Chrome provides a graphical view for profile data, but the data is not aggregated and navigation is limited.

Flame graphs @ eBay

Now that we have seen two different approaches for generating CPU profile data, let us see how we can bring in a nice graphical view like flame graphs to V8-profiler data.

At eBay, we have taken a different approach to make a simple, easy-to-use tool for our Node.js developers. We took the v8-profiler data, applied an aggregation algorithm, and rendered the data as flame charts using the d3-flame-graphs module.

If you look closely at the .cpuprofile file created above, you will see that it is basically a JSON file. We came across a generic d3-flame-graphs library that can draw flame graphs in a browser from input JSON data. Thanks to “cimi” for his d3-flame-graphs module.

After we made some modifications to the chrome2calltree aggregation algorithm and aggregated the profile data (removing core-level CPU profile data), we could convert the .cpuprofile file into JSON that d3-flame-graphs can read, and the final outcome is simply amazing.

Three-step process

  1. Generate .cpuprofile on demand using v8-profiler as shown in Chrome Profiling Tool.
  2. Convert .cpuprofile into aggregated JSON format (source code).
  3. Load the JSON using d3-flame-graphs to render the flame graph on browser.
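As a rough illustration of the conversion step, the aggregation walks the v8-profiler call tree and computes, for each frame, the total sample count of its subtree, producing the name/value/children shape that a flame-graph renderer such as d3-flame-graphs can draw. This is a simplified sketch under assumed field names from the v8-profiler head-node format, not the actual eBay converter:

```javascript
// Simplified sketch of the aggregation step (not the production converter):
// turn a v8-profiler call-tree node into a {name, value, children} node,
// where value is the total hit count (self + descendants) of the frame.
function toFlameNode(node) {
    var children = (node.children || []).map(toFlameNode);
    var childTotal = children.reduce(function(sum, c) { return sum + c.value; }, 0);
    return {
        name: node.functionName || '(anonymous)',
        value: (node.hitCount || 0) + childTotal,
        children: children
    };
}

// Tiny fake profile tree: (root) -> render (3 hits) -> loadTemplate (5 hits)
var head = {
    functionName: '(root)',
    hitCount: 0,
    children: [{
        functionName: 'render',
        hitCount: 3,
        children: [{ functionName: 'loadTemplate', hitCount: 5, children: [] }]
    }]
};

var flame = toFlameNode(head);
console.log(flame.value); // 8 samples total across the tree
```

The resulting JSON tree can then be handed to the renderer, which draws each node as a bar whose width is proportional to its value.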

Output

This time, access the CPU flame graph in the browser using the same URL (http://localhost:8080/cpuprofile?duration=2) as in the Chrome profiling tool section.

flame graph of a sample application rendered in the browser

The above flame chart shows only JS frames, which is what most Node application developers are interested in.

Third-party packages used

  • express
  • v8-profiler
  • d3-flame-graphs

Pros

  • Easy and simple to generate flame graphs
  • Doesn’t need special setup
  • Platform independent
  • Early performance analysis during development
  • Graphical view integrated into every application

Cons

  • Imposes 10% overhead during profiling

Summary

To summarize, we have seen three different ways of profiling CPU in Node.js, starting from using OS-level tools to rendering flame graphs on a browser using simple open source JS code. Simple and easy-to-use tools help anyone master profiling and performance tuning. At eBay, we always strive to make some difference in our developers’ lives.