eBay Tech Blog

How We Built eBay’s First Node.js Application

by Senthil Padmanabhan on 05/17/2013

in Software Engineering

For the most part, eBay runs on a Java-based tech stack. Our entire workflow centers around Java and the JVM. Considering the scale of traffic and the stability required by a site like ebay.com, using a proven technology was an obvious choice. But we have always been open to new technologies, and Node.js has been topping the list of candidates for quite some time. This post highlights a few aspects of how we developed eBay’s first Node.js application.

Scalability

It all started when a bunch of eBay engineers (Steven, Venkat, and Senthil) wanted to bring an eBay Hackathon-winning project called “Talk” to production. When we found that Java did not seem to fit the project requirements (no offense), we began exploring the world of Node.js. Today, we have a full Node.js production stack ready to rock. 

We had two primary requirements for the project. First was to make the application as real time as possible–i.e., maintain live connections with the server. Second was to orchestrate a huge number of eBay-specific services that display information on the page–i.e., handle I/O-bound operations. We started with the basic Java infrastructure, but it consumed many more resources than expected, raising questions about scalability for production. These concerns led us to build a new mid-tier orchestrator from scratch, and Node.js seemed to be a perfect fit.

Mindset

Since eBay revolves around Java and since Java is a strongly typed static language, initially it was very difficult to convince folks to use JavaScript on the backend. The numerous questions involved ensuring type safety, handling errors, scaling, etc. In addition, JavaScript itself (being the world’s most misunderstood language) further fueled the debate. To address concerns, we created an internal wiki and invited engineers to express their questions, concerns, doubts, or anything else about Node.js.

Within a couple of days, we had an exhaustive list to work on. As expected, the most common questions centered around the reliability of the stack and the efficiency of Node.js in handling eBay-specific functionality previously implemented in Java. We answered each one of the questions, providing details with real-world examples. At times this exercise was eye-opening even for us, as we had never considered the angle that some of the questions presented. By the end of the exercise, people understood the core value of Node.js; indeed, some of the con arguments proved to be part of the beauty of the language.

Once we had passed the test of our peers’ scrutiny, we were all clear to roll.

Startup

We started from a clean slate. Our idea was to build a bare minimum boilerplate Node.js server that scales; we did not want to bloat the application by introducing a proprietary framework. The first four node modules we added as dependencies were express, cluster, request, and async. For data persistence, we decided on MongoDB, to leverage its ease of use as well as its existing infrastructure at eBay. With this basic setup, we were able to get the server up and running on our developer boxes. The server accepted requests, orchestrated a few eBay APIs, and persisted some data.
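
As a rough illustration of what orchestrating several I/O-bound service calls can look like in callback-style Node.js, here is a minimal sketch. The `parallel` helper is a hand-rolled stand-in for the kind of utility the async module provides, and `fetchItem`, `fetchSeller`, and `buildPage` are hypothetical stubs; none of this is the actual eBay code.

```javascript
// Hypothetical stand-in for a helper like the ones in the async module:
// run named callback-style tasks concurrently and collect results by name.
function parallel(tasks, done) {
  const names = Object.keys(tasks);
  const results = {};
  let pending = names.length;
  let finished = false;

  if (pending === 0) return done(null, results);

  names.forEach((name) => {
    tasks[name]((err, value) => {
      if (finished) return; // ignore callbacks that arrive after completion
      if (err) {
        finished = true;
        return done(err);
      }
      results[name] = value;
      pending -= 1;
      if (pending === 0) {
        finished = true;
        done(null, results);
      }
    });
  });
}

// Stubbed service calls (assumptions); a real orchestrator would issue
// HTTP requests to backend services here.
function fetchItem(cb) {
  setImmediate(cb, null, { id: 42, title: 'Example item' });
}
function fetchSeller(cb) {
  setImmediate(cb, null, { name: 'example_seller' });
}

// Both calls are in flight at the same time; `done` fires once with
// either the first error or an object holding both results.
function buildPage(done) {
  parallel({ item: fetchItem, seller: fetchSeller }, done);
}
```

Because neither call blocks the event loop, the total wait is roughly that of the slowest service rather than the sum of all of them.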

For end-to-end testing, we configured our frontend servers to point to the Node.js server, and things seemed to work fine. Now it was time to get more serious. We started white-boarding all of our use cases, nailed down the REST end points, designed the data model and schema, identified the best node modules for the job, and started implementing each end point. The next few weeks we were heads down–coding, coding, and coding.   

Deployment

Once the application reached a stable point, it was time to move from a developer instance to a staging environment. This is when we started looking into deployment of the Node.js stack. Our objectives for deployment were simple: Automate the process, build once, and deploy everywhere. This is how Java deployment works, and we wanted Node.js deployment to be as seamless and easy as possible.

We were able to leverage our existing cloud-based deployment system. All we needed to do was write a shell script and run it through our Hudson CI job. Whenever code is checked in to the master branch, the Hudson CI job kicks off. Using the shell script, this job builds and packages the Node.js bundle, then pushes it to the deployment cloud. The cloud portal provides an easy user interface to choose the environment (QA, staging, or pre-production) and activate the application on the associated machines.

Now we had our Node.js web service running in various stable environments. This whole deployment setup was quicker and simpler than we had expected.  

Monitoring

At eBay, we have logging APIs that are well integrated with the Java thread model as well as at the JVM level. An excellent monitoring dashboard built on top of the log data can generate reports, along with real-time alerts if anything goes wrong. We achieved similar monitoring for the Node.js stack by hooking into the centralized logging system. Fortunately for us, we had logging APIs to consume. We developed a logger module and implemented three different logging APIs:

  1. Code-level logging. This level includes logging of errors/exceptions, DB queries, HTTP service calls, transaction metadata, etc.
  2. Machine-level logging. This level includes heartbeat data about CPU/memory and other OS statistics. Machine-level logging occurs at the cluster module level; we extended the npm cluster module and created an eBay-specific version.
  3. Logging at the load balancer level. All Node.js production machines are behind a load balancer, which sends periodic signals to the machines and ensures they are in good health. In the case of a machine going down, the load balancer fails over to a backup machine and sends alerts to the operations and engineering teams.
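
To make the machine-level heartbeat idea concrete, here is a minimal sketch using only Node.js built-ins. The field names and the `sink` parameter are assumptions made for illustration; the post does not describe the actual eBay log format or logger API.

```javascript
const os = require('node:os');

// Collect one machine-level heartbeat sample. Field names are illustrative.
function collectHeartbeat() {
  const mem = process.memoryUsage();
  return {
    type: 'heartbeat',
    pid: process.pid,
    timestamp: new Date().toISOString(),
    loadAvg1m: os.loadavg()[0], // always 0 on Windows
    freeMemBytes: os.freemem(),
    totalMemBytes: os.totalmem(),
    rssBytes: mem.rss,
    heapUsedBytes: mem.heapUsed,
  };
}

// Emit a heartbeat at a fixed interval through any sink function (console,
// file, or a client for a central logging system). Returns the timer so the
// caller can stop it with clearInterval.
function startHeartbeat(sink, intervalMs) {
  const timer = setInterval(() => sink(JSON.stringify(collectHeartbeat())), intervalMs);
  timer.unref(); // do not keep the process alive just for heartbeats
  return timer;
}
```

A call such as `startHeartbeat(console.log, 5000)` would print one JSON line every five seconds.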

We made sure the log data formats exactly matched the Java-based logs, thus generating the same dashboards and reports that everyone is familiar with.

One particular logging challenge we faced was due to the asynchronous nature of the Node.js event loop. The result was that the logging of transactions was completely crossed. To understand the problem, let’s consider the following use case:  The Node process starts a URL transaction and issues a DB query with an async callback. The process will now proceed with the next request, before the DB transaction finishes. This being a normal scenario in any event loop-based model like Node.js, the logs are crossed between multiple URL transactions, and the reporting tool shows scrambled output. We have worked out both short-term and long-term resolutions for this issue.

Conclusion

With all of the above work completed, we are ready to go live with our Hackathon project. This is indeed the first eBay application to have a backend service running on Node.js. We’ve already had an internal employee-only launch, and the feedback was very positive–particularly on the performance side. Exciting times are ahead!

A big shout-out to our in-house Node.js expert Cylus Penkar, for his guidance and contributions throughout the project. With the success of the Node.js backend stack, eBay’s platform team is now developing a full-fledged frontend stack running on Node.js. The stack will leverage most of our implementation, in addition to frontend-specific features like L10N, management of resources (JS/CSS/images), and tracking. For frontend engineers, this is a dream come true; and we can proudly say, “JavaScript is EVERYWHERE.”

Senthil Padmanabhan & Steven Luan
Engineers @ eBay

{ 50 comments }

Benjamin Gleitzman May 17, 2013 at 9:07AM

What did you build? Where can I use it?

Raghuram Gururajan May 17, 2013 at 10:25AM

Good blog. Can you elaborate on what specific feature of Java was not scalable compared to Node.js?

Senthil Padmanabhan May 19, 2013 at 10:02PM

Raghu, live connections with the basic JAVA based server did not scale. We could have solved it by using some advanced JAVA options, but wanted to try Node.js which was an easy solve.

Mark May 20, 2013 at 9:46PM

Disclaimer: I am not a java fan. Do mainly python and erlang. But have done extensive java in my past life.

My comment: This post is good but does a lot of hipster talk like “JAVA based server did not scale” and “real time live connection” requirements. Since this is the official ebay tech blog, I expect more concrete evidence on why you think x does not scale. Did you measure it? What kind of scale does scale? What do you mean by real time?

I apologize, I am not trying to be rude. It’s always good to experiment with new technology and use new approaches; it makes you think in a different way. But give proper justification of your choices on a tech blog. Otherwise it is nothing more than a tweet or a reddit post.

Senthil Padmanabhan May 22, 2013 at 11:34AM

We did not try out the various options in Java that would have solved the problem. That is why we mentioned a basic Java setup. We also wanted to use this opportunity to try out a new stack like Node.js.

JsCoder May 17, 2013 at 11:42AM

Awesome! Now check back in 5 years and let us know how your greenfield application turned out ;) I love JS and people always tap me to work with it, but I won’t lie and say it’s maintainable. At eBay scale, I’m pretty sure this will translate into you guys writing tons of internal tools to help deal with the inevitable mess it will create… As a greenfield tool, it’s awesome though, I’ll give you that.

Javascript: fast dev acceleration, slowest top speed, Java: slow app acceleration, highest top speed.

Dmitry May 17, 2013 at 11:54AM

“JavaScript is EVERYWHERE” YAY!

TJ May 17, 2013 at 12:44PM

Show us the code!

rynop May 17, 2013 at 1:57PM

One big hang-up for me with Node was scaling on a single machine (due to the 1.7 GB limit, among other things), which http://nodejs.org/api/cluster.html addresses. I was very hesitant to bet a production workload on the experimental cluster module, but it sounds like you are not. What are your thoughts on this module? Did your team vet it?

Senthil Padmanabhan May 17, 2013 at 3:26PM

Till now it seems to work fine. This is also one of the reasons we have end-to-end monitoring in place. We did some initial analysis, and our platform team extended the module to add some eBay-specific monitoring.

Michael J. Ryan May 17, 2013 at 2:21PM

This is great… At my last job I spent about half my time working on NodeJS infrastructure and utility scripts, the other half in the .Net world. I’m glad you were able to overcome the resistance that tends to come from .Net and Java development environments.

It might be helpful for others, if you were to share the questions/comments and answers that came from your interactions internally.

Andrej May 17, 2013 at 2:30PM

What about ql.io, wasn’t that the first? http://www.ebaytechblog.com/2011/11/30/announcing-ql-io/

Senthil Padmanabhan May 17, 2013 at 3:04PM

ql.io was more like a Node module or a Node stack open sourced by eBay for elegant orchestration, whereas this is a real-time, user-facing project serving production traffic. This is also the first time we have a Node.js production server setup.

Glenn Block May 18, 2013 at 12:26PM

Senthil

Isn’t ql.io used at eBay though?

Senthil Padmanabhan May 22, 2013 at 11:25AM

It’s used in a lot of internal projects. But this is the first time at production scale.

Nahlyee May 17, 2013 at 4:38PM

Great post! Can you elaborate on how you solved the short-term and long-term async logging challenge you referenced at the end?

goelvivek August 13, 2013 at 3:07AM

Any answer for this question?

Senthil Padmanabhan August 28, 2013 at 11:17AM

Short term, we generate a unique ID for each URL transaction, and all of its sub-transactions inherit it. So when reading a URL transaction from the logs, we can retrieve all of its sub-transactions with this unique ID. Long term, the infrastructure team is working on a buffer layer to format the transaction chunks properly before writing them to the log system.
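
The short-term approach described here can be sketched roughly as follows. All function names and the log line layout are hypothetical, chosen only to illustrate ID inheritance and regrouping; they are not taken from the eBay logger.

```javascript
const { randomBytes } = require('node:crypto');

// Start a root transaction for an incoming URL request.
function startTransaction(name) {
  return { id: randomBytes(8).toString('hex'), name, depth: 0 };
}

// A sub-transaction (DB query, service call) inherits the root's ID.
function startSubTransaction(parent, name) {
  return { id: parent.id, name, depth: parent.depth + 1 };
}

// Every log line carries the inherited ID as its first field.
function formatLogLine(txn, message) {
  return `${txn.id} ${txn.name} ${message}`;
}

// When reading crossed logs, regroup the lines by their leading ID.
function groupByTransaction(lines) {
  const groups = new Map();
  for (const line of lines) {
    const id = line.slice(0, line.indexOf(' '));
    if (!groups.has(id)) groups.set(id, []);
    groups.get(id).push(line);
  }
  return groups;
}
```

Even when lines from two URL transactions are interleaved in the log, `groupByTransaction` recovers each transaction together with its sub-transactions.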

Ivan November 22, 2013 at 6:03AM

Can you please explain a little further how you implemented inheriting the unique ID across other modules/transactions? I don’t see any normal solution except passing the request object (or some context data) to all modules and their functions in the chain. We are trying to do this with domains, but it is not a solution.

I see that node.js community is also talking about this problem:
https://github.com/joyent/node/issues/3733
https://github.com/nodeup/contribute/issues/20

HUI LUAN November 27, 2013 at 10:37AM

You are right about passing context; that’s the approach we use to create a transaction tree. But we have a separate layer to format the transaction tree view so that, in the final log, the trees do not cross each other. Given that Node.js is single-threaded and asynchronous, without formatting it would show very messy logs that cross each other.
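
A minimal sketch of the idea, assuming the tree is built by passing the parent explicitly and rendered only once the root transaction has finished. The `createNode` and `renderTree` names and the indented output format are assumptions, not the actual formatting layer.

```javascript
// Build a transaction tree by passing the parent node explicitly.
function createNode(name, parent) {
  const node = { name, children: [] };
  if (parent) parent.children.push(node);
  return node;
}

// Render the finished tree as one contiguous, indented block of lines, so
// that trees from different URL transactions never interleave in the log.
function renderTree(node, depth = 0) {
  const lines = ['  '.repeat(depth) + node.name];
  for (const child of node.children) {
    lines.push(...renderTree(child, depth + 1));
  }
  return lines;
}
```

Writing `renderTree(root).join('\n')` in a single call after the root completes keeps each transaction's lines together.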

Islam Sharabash May 22, 2014 at 9:04AM

Hi Ivan,

Just wrote a post on how to do this without passing around context.
https://datahero.com/blog/2014/05/22/node-js-preserving-data-across-async-callbacks/

Hope this helps! Let me know any questions you have.
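
For readers on current Node.js versions, one way to carry a transaction ID across async callbacks without passing context explicitly is the built-in `AsyncLocalStorage` class from `async_hooks`. The sketch below is an illustration only and is not necessarily the technique from the linked post; `runInTransaction`, `currentTransactionId`, and `log` are hypothetical names.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

const transactionStore = new AsyncLocalStorage();

// Run a request handler inside a transaction context.
function runInTransaction(id, fn) {
  return transactionStore.run({ id }, fn);
}

// Any code reached from the handler, including async callbacks it
// schedules, can read the ID without receiving it as an argument.
function currentTransactionId() {
  const store = transactionStore.getStore();
  return store ? store.id : undefined;
}

function log(message) {
  return `[${currentTransactionId() || 'no-txn'}] ${message}`;
}
```

Outside of `runInTransaction`, `getStore()` returns `undefined`, so `log` falls back to the `no-txn` marker.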

viji May 18, 2013 at 2:56AM

Great post! Node.js follows an event-driven (asynchronous) concurrency model for connections and is a good choice for handling more connections. But Java also supports an asynchronous model (Netty, Play framework) without creating a thread per connection. So why do we need to use Node.js over Java, considering the existing stack on the JVM? Is there any use case or specific feature where Node.js performs well for these problems? (I code most of the time in JavaScript rather than Java.)

Senthil Padmanabhan May 22, 2013 at 11:52AM

Absolutely, Java could have solved the issue. We did not want to compare the 2 languages here; sorry if the post implied that. All languages have their own advantages. We just wanted to use this opportunity to try Node.js.

Patrick Steele-Idem May 23, 2013 at 1:20PM

One of the differences between Java and Node.js is that each Node.js process is single threaded and it forces developers down the asynchronous, non-blocking I/O path. All of the Node.js I/O-related APIs are available as non-blocking, and the vibrant Node.js community has embraced non-blocking I/O. For those reasons, Node.js makes it very easy to create applications that are extremely scalable based on the non-blocking I/O model. That’s not to say that the JVM could not offer the same scalability, it’s just that the Node.js runtime was designed around non-blocking I/O and the Java runtime was not, and that makes a huge difference in practice.

Not having to compile JavaScript files before running a Node.js app and the dynamic nature and less verbosity of the JavaScript language are added bonuses. Plus, front-end developers can use JavaScript on both the server and the client so there is less context switching required and more code reuse.

bello May 18, 2013 at 5:40AM

Senthil, with all due respect, I want to retitle this blog entry to
HOW I ATE MY FIRST CHOCOLATE CAKE

…went to the shop and got a large piece of cake and put it on a china plate. We used our own silver fork…

my point being, without a picture, talking about how and what you did is useless.

Introduce some code, for God’s sake. A better, more useful entry would be
HOW WE EXTENDED NPM CLUSTER MODULE (as a side note, u can plug that u wrote ur first node.js app)

Tristan May 20, 2013 at 9:09AM

Any chance of making the wiki available? I’d love to read the Q&A between all of your engineers and what your conclusions were.

Thanks for the great post!

Patrick Steele-Idem May 20, 2013 at 12:07PM

Great job team! You all did a lot of work to pave the way for making Node.js a very successful platform at eBay, and I definitely agree that a Node.js stack will be a dream come true for front-end developers.

bastian bastias May 23, 2013 at 1:26PM

Wow, I think this is the future. I think that in five years everything will be pointing to this.

Artem May 24, 2013 at 5:50AM

Did you use node’s domains? How did you protect against errors? Does one request kill others in the same process if it throws an error?

Senthil Padmanabhan May 24, 2013 at 10:21AM

Not yet. We are planning to implement it in one of our upcoming releases. For now we are using the “uncaughtException” event on process (as shown below), which will be removed once we move to domains. Also, we made sure all our individual functional blocks have proper error/exception handling.

process.on('uncaughtException', function (err) {
  console.log(err);
});

brian May 24, 2013 at 7:22AM

Sorry guys. But the lack of detail in this post makes it almost useless. I’m glad you had fun with Node.js and solved some problems along the way. I look forward to your legal team clearing you to post the details of what you actually did to solve those problems.

william May 24, 2013 at 5:46PM

Thanks for the blog post.

We are building out our first app with Node.js, but we are taking small steps at first. The node app isn’t talking directly to our data stores but rather using our internal REST APIs. Is that similar to what you have done?

You mentioned MongoDB. What about access to an RDBMS, caching, and a message queue in Node.js? How do they interface with your existing infrastructure? Are you sharing cached data between the Node and Java apps, for example?

You also didn’t discuss if/how you might be using websockets. Our project has some background processing where we want to notify the user when done, and one challenge has been how to get messages back to the right Node.js instance in the cluster with the websocket for the given client. Looking at Redis Pub/Sub for that. Anything similar in your app?

Steven June 3, 2013 at 10:24AM

We use node to talk to both REST APIs and MongoDB instances.

We have not used node to talk to an RDBMS. We have tried out Redis for caching and message queues, but that is not yet released. There is no data shared directly between Java and Node so far.

Currently there is no use case for bi-directional communication with websockets. Instead, we are trying to use Server-Sent Events and Redis Pub/Sub to push data one way from server to client for notifications. This feature is not released yet either.
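
For context, the Server-Sent Events wire format is simple enough to sketch with plain Node.js. The helper names below are hypothetical, the Redis Pub/Sub side is not shown, and this is only an illustration of the protocol, not the unreleased eBay feature.

```javascript
// Format one Server-Sent Events message. Each line of the payload needs its
// own "data:" field, and a blank line terminates the message.
function formatSseMessage(eventName, payload) {
  const dataLines = String(payload).split('\n').map((line) => `data: ${line}`);
  return [`event: ${eventName}`, ...dataLines, '', ''].join('\n');
}

// Open an SSE stream on a Node http response object and return a function
// that pushes events to the client over the same long-lived response.
function openSseStream(res) {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });
  return (eventName, payload) => res.write(formatSseMessage(eventName, payload));
}
```

In a setup like the one described, a subscriber callback for a pub/sub channel could call the function returned by `openSseStream` whenever a notification arrives for that client.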

Aidan Black May 28, 2013 at 7:05AM

+1 for opening up the wiki! I hope your legal team clears it because it would be very useful to see if the questions and concerns you had to deal with line up with the ones I am asking myself.

Ryan Pendergast June 11, 2013 at 7:34AM

Any update on the progress with legal? I almost wish you had not posted this until you had it, because now it’s like teasing us.

steven luan October 11, 2013 at 11:44AM

Gurutechmind July 20, 2013 at 3:08AM

Great article! I agree with you that “JavaScript is EVERYWHERE.” Absolutely, Java could have solved the issue.

concerto49 August 14, 2013 at 8:23AM

Does node.js really make you code faster? How does it compare to the Java frameworks you’re used to? Does node have a lower or higher cost of maintenance?

Interesting stuff.

Senthil Padmanabhan August 28, 2013 at 11:16AM

From a frontend engineer’s perspective, yes, it makes you code faster. The main reason is the single-language (JavaScript) context. Also, build and server start-up times are a breeze here, which makes engineers very productive. Node.js also makes you IDE independent, so engineers can choose a lightweight IDE of their choice. The Java framework is closely tied to a heavy IDE (which slows you down sometimes), and build/compile/server start-up times are a little higher.

Maintenance-wise, we have to wait and see once more features get added.

Zac Tolley October 8, 2013 at 10:25AM

I’m about to deploy a node app for a very high profile and trafficked site and wondered if cluster is the best way to go to have something that won’t bring the platform to a standstill for the sake of one bad request. I’m also looking at using forever to monitor the node process to restart it if it crashes.

steven luan October 11, 2013 at 11:42AM

Yes, you cannot avoid using cluster if you want to utilize all CPU cores and manage worker status.

You may also check out https://github.com/ql-io/cluster2, a wrapper on top of the native Node.js cluster module that provides better stability and scalability.

Patrick Steele-Idem October 14, 2013 at 10:21AM

Hi Zac,

I’m the development lead on the team at eBay that is building the Node.js stack for web development at eBay so I can share with you some insight from our experience using cluster in production. To answer your question, you absolutely should use cluster because otherwise you will not be fully utilizing all of the CPU cores on your machine. The OS will load balance the connections across the worker processes so that you can handle processing requests in parallel. The only catch is that Node.js worker nodes are full-blown Node.js/V8 instances so they will consume a decent amount of memory.

While the “cluster” module is labelled as “Experimental”, we have not seen any issues with the “cluster” implementation. While the API might change in the future, I don’t suspect that will be the case. The “cluster” implementation is pretty straightforward: Node.js simply spawns a new process (using the same main script and command line arguments), enables port sharing with the master, sets up a parent/child communication channel, and provides an API for determining if a process is a worker process or the master process.

Error handling on Node.js is a completely different story. By default, any uncaught exception will kill a worker process. You can choose to capture and ignore an uncaught exception but it is generally recommended to catch the exception and to then have the worker process stop listening on the HTTP port and kill it after a minute (or as soon as there are no more active connections for that worker). After killing a worker, you should fork again from the master process so that you have enough workers to handle the traffic.
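
The recommended pattern above can be sketched as follows. The function names are hypothetical, and the cluster object, server, and exit function are passed in as parameters so the logic can be exercised in isolation; this is a sketch under those assumptions, not the cluster2 implementation.

```javascript
// Worker side: after a fatal error, stop accepting new connections, let
// in-flight requests finish, then exit. Force the exit after a grace period.
function shutdownGracefully(server, exit, graceMs) {
  let exited = false;
  const exitOnce = () => {
    if (exited) return;
    exited = true;
    clearTimeout(timer);
    exit(1);
  };
  const timer = setTimeout(exitOnce, graceMs);
  timer.unref(); // the timer alone should not keep the process alive
  server.close(exitOnce); // fires once existing connections have ended
}

// Master side: fork the initial pool, then fork a replacement whenever a
// worker exits so there are always enough workers to handle the traffic.
function superviseWorkers(cluster, workerCount) {
  for (let i = 0; i < workerCount; i += 1) cluster.fork();
  cluster.on('exit', () => cluster.fork());
}

// Possible wiring (not executed here):
//   const cluster = require('node:cluster');
//   if (cluster.isPrimary) {            // cluster.isMaster on older versions
//     superviseWorkers(cluster, require('node:os').cpus().length);
//   } else {
//     process.on('uncaughtException', (err) => {
//       console.error(err);
//       shutdownGracefully(server, process.exit, 60000);
//     });
//   }
```

Passing `process.exit` and a real `http.Server` gives the behavior described: the worker drains for up to a minute, exits, and the master forks a fresh one.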

Lastly, the “cluster” module does not provide any support for managing and monitoring worker processes. That is why eBay developed the open source “cluster2” module:
https://github.com/cubejs/cluster2

The cluster2 module has been battle-tested and we are already thinking about and developing some major improvements in future releases to support cluster caches, better monitoring and improved worker management strategies.

By the way, CubeJS is the name of the Node.js stack at eBay Marketplaces…Expect to hear more about it in the future as we share more of our learnings.

I hope that helps.

–Patrick

Ivan November 22, 2013 at 12:52PM

Nice post. I would just add that in our experiment we also saw a performance gain when we created more cluster workers than CPU cores (which people usually use as the default).

srirama April 1, 2014 at 7:45AM

Hi Patrick,
I am a Java developer. I have a project that was developed in the Struts framework, and I want to implement the same project in Node.js. Do I need to start from scratch in Node.js, or can I use the existing Java code with Node.js? Can you please advise?

Kyle DeFreitas October 8, 2013 at 7:03PM

This is a great article and raises some thought-provoking concepts for thinking about Node.js. I was wondering if it would be possible to make public some of the concerns, and how you responded to them, from the wiki you mentioned in the post. I think this would do a great deal to help others who may want to consider the pros and cons of using Node.js.

While the information exists, having it centralized in something like a blog post would be a great resource.

Chaitanya Teja Chinthapalli November 19, 2013 at 5:14AM

Hi guys, as I’m very new to Node.js, I got stuck on where to implement and how to connect my application with Node.js… Can you guys please help me out?

Prasad November 22, 2013 at 7:41PM

I really want to know how scalable Node.js is. I want to develop an application based on affiliate markets, and I am targeting some 1 million users. Here is my scenario: I want to develop the application using Node.js and MongoDB, host it on AWS EC2, and then scale it using Amazon ELB and autoscaling. Please let me know if any feasibility write-up on this has been published other than this article. Thanks.

Oleg Gromov December 13, 2013 at 5:19AM

Hello guys!

Thanks for the post. I’m digging around the Node.js world and am going to write my first webapp using Node and Mongo now. It’s a bit confusing to me how an asynchronous webapp should work after years of synchronous PHP-like programming.

You’ve mentioned some resolutions for the logging issue in async apps. Could you tell us more on this topic, please?

Senthil Padmanabhan December 16, 2013 at 5:02PM

Hi Oleg,

Please refer to the comment thread http://www.ebaytechblog.com/2013/05/17/how-we-built-ebays-first-node-js-application/#comment-137093 for more details about your question.

-Senthil

July 5, 2014 at 8:46AM

thank you for your post
it is good post to me

Copyright © 2011 eBay Inc. All Rights Reserved - User Agreement - Privacy Policy - Comment Policy