Category Archives: Data Infrastructure and Services

Platforms, frameworks, services, best practices, etc. for managing Big Data at eBay

Announcing Pulsar: Real-time Analytics at Scale

We are happy to announce Pulsar – an open-source, real-time analytics platform and stream processing framework. Pulsar can be used to collect and process user and business events in real time, providing key insights and enabling systems to react to user activities within seconds. In addition to real-time sessionization and multi-dimensional metrics aggregation over time
Continue Reading »

HDFS Storage Efficiency Using Tiered Storage

At eBay, we run Hadoop clusters composed of thousands of nodes that are shared by thousands of users. We store hundreds of petabytes of data in our Hadoop clusters. In this post, we look at how to optimize big data storage based on frequency of data usage. This method helps reduce the cost in an
Continue Reading »

Announcing Kylin: Extreme OLAP Engine for Big Data

We are very excited to announce that eBay has released our distributed analytics engine, Kylin, to the open-source community. Designed to accelerate analytics on Hadoop and allow the use of SQL-compatible tools, Kylin provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop to support extremely large datasets. Kylin is currently used in production
Continue Reading »