Category Archives: Data Infrastructure and Services

Platforms, frameworks, services, best practices, etc. for managing Big Data at eBay

Ready-to-use Virtual-machine Pool Store via warm-cache

Problem overview: Conventional on-demand Virtual Machine (VM) provisioning methods on a cloud platform can be time-consuming and error-prone, especially when we need to provision VMs in large numbers quickly. The following list captures the issues that we often encounter while trying to provision a new VM instance on the fly: insufficient availability of compute resources, …

Secure Communication in Hadoop without Hurting Performance

Apache Hadoop is used for processing big data at many enterprises. A Hadoop cluster is formed by assembling a large number of commodity machines, and it enables the distributed processing of data. Enterprises store large amounts of important data on the cluster. Different users and teams process this data to obtain summary information, generate insights, …

Multiple Authentication Mechanisms for Hadoop Web Interfaces

Apache Hadoop is a base component for Big Data processing and analysis. Hadoop servers, in general, allow interaction via two protocols: a TCP-based RPC (Remote Procedure Call) protocol and the HTTP protocol. The RPC protocol currently allows only one primary authentication mechanism: Kerberos. The HTTP interface allows enterprises to plug in different authentication mechanisms.