Category Archives: Hadoop

Monitoring Anomalies in the Experimentation Platform

The Experimentation Platform at eBay runs around 1500 experiments that are responsible for processing hundreds of terabytes of reporting data contained in millions of files using Hadoop infrastructure and consuming thousands of computing resources. The entire report generation process contains well over 200 metrics, and it enables millions of customers to experience small
Continue Reading »

Scalable and Nimble Continuous Integration for Hadoop Projects

The Experimentation Platform at eBay runs around 1500 experiments that are responsible for processing hundreds of terabytes of reporting data contained in millions of files using a 2500+ node Hadoop infrastructure and consuming thousands of computing resources. The entire report generation process contains well over 200 metrics. It enables millions of customers to experience small and
Continue Reading »

The Sprinting Pachyderm: Improving Runtime Performance of Your Big Data Application

  Disclaimer: No elephants were harmed while writing this blog post! Big Data applications have become ubiquitous in software development. With treasure troves of data being collected by companies, there is always a need to derive business sense from the data quickly or risk losing their temporal context. Companies typically enable the quick pace by
Continue Reading »