Category Archives: Data Infrastructure and Services

Platforms, frameworks, services, best practices, etc. for managing Big Data at eBay

Announcing Pulsar Reporting: Near-Real-Time Metrics Reporting Framework

We are excited to announce the first open-source release of Pulsar Reporting. Earlier this year, we announced Pulsar, an open-source project that included Pulsar Pipeline, a real-time analytics platform and stream processing framework. One of the most frequently requested features for Pulsar has been integration with a metrics store for visualizing near-real-time metrics. We’ve provided
Continue Reading »

Apache Eagle: Secure Hadoop in Real Time

Co-Authors: Chaitali Gupta and Edward Zhang

Update: Eagle was accepted as an Apache Incubator project on October 26, 2015.

Today’s successful organizations are data driven. At eBay we have thousands of engineers, analysts, and data scientists who crunch petabytes of data every day to provide a great experience for our users. We execute at massive scale using data
Continue Reading »

GZinga: Seekable and Splittable Gzip

Co-Author: Mahesh Somani

Generally, data compression techniques are used to conserve space and network bandwidth. Widely used compression techniques include Gzip, bzip2, lzop, and 7-Zip. According to performance benchmarks, lzop is one of the fastest compression algorithms, while bzip2 has a high compression ratio but is very slow. Gzip falls between the two in both speed and compression ratio. Gzip
Continue Reading »