Category Archives: Hadoop

Secure Communication in Hadoop without Hurting Performance

Apache Hadoop is used for processing big data at many enterprises. A Hadoop cluster is formed by assembling a large number of commodity machines, and it enables the distributed processing of data. Enterprises store large amounts of important data on the cluster. Different users and teams process this data to obtain summary information, generate insights, …
Continue Reading »
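
To make the setup concrete, the sketch below shows the handful of standard Hadoop properties that control wire security, assuming a Kerberized cluster with a valid Kerberos ticket. In real deployments these values live in core-site.xml and hdfs-site.xml; setting them on a client Configuration here is purely illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch: a client requesting wire encryption ("privacy") for
// Hadoop RPC. Assumes a Kerberized cluster and a valid Kerberos ticket;
// in practice these properties belong in the cluster's config files.
public class SecureRpcClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos"); // SASL/Kerberos authentication
        conf.set("hadoop.rpc.protection", "privacy");           // authenticate, verify integrity, and encrypt RPC
        conf.set("dfs.data.transfer.protection", "privacy");    // protect the DataNode data path as well

        try (FileSystem fs = FileSystem.get(conf)) {
            fs.listStatus(new Path("/")); // RPC traffic for this call is negotiated under "privacy"
        }
    }
}
```

The three supported protection levels are authentication, integrity, and privacy; only privacy encrypts payloads, which is where the performance question in the full post comes in.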

Multiple Authentication Mechanisms for Hadoop Web Interfaces

Apache Hadoop is a foundational component for Big Data processing and analysis. Hadoop servers generally allow interaction via two protocols: a TCP-based RPC (Remote Procedure Call) protocol and the HTTP protocol. The RPC protocol currently supports only one primary authentication mechanism: Kerberos. The HTTP interface, by contrast, allows enterprises to plug in different authentication mechanisms.
Continue Reading »
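
As a hedged illustration of that pluggability: Hadoop's HTTP endpoints sit behind an AuthenticationFilter that delegates to an AuthenticationHandler, and a custom handler class can be named via hadoop.http.authentication.type. The handler below is a hypothetical sketch that trusts an X-Custom-User header (a stand-in for real credential validation), not a production mechanism.

```java
import java.io.IOException;
import java.util.Properties;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.hadoop.security.authentication.client.AuthenticationException;
import org.apache.hadoop.security.authentication.server.AuthenticationHandler;
import org.apache.hadoop.security.authentication.server.AuthenticationToken;

// Hypothetical handler that trusts an "X-Custom-User" header; a real
// deployment would validate a token, certificate, or SSO cookie instead.
public class HeaderAuthenticationHandler implements AuthenticationHandler {
    public static final String TYPE = "header";

    @Override public String getType() { return TYPE; }
    @Override public void init(Properties config) throws ServletException { }
    @Override public void destroy() { }

    @Override
    public boolean managementOperation(AuthenticationToken token,
            HttpServletRequest request, HttpServletResponse response) {
        return true; // no delegation-token management in this sketch
    }

    @Override
    public AuthenticationToken authenticate(HttpServletRequest request,
            HttpServletResponse response) throws IOException, AuthenticationException {
        String user = request.getHeader("X-Custom-User");
        if (user == null) {
            // Challenge the client; returning null tells the filter that
            // authentication is still in progress.
            response.setStatus(HttpServletResponse.SC_UNAUTHORIZED);
            return null;
        }
        return new AuthenticationToken(user, user, TYPE);
    }
}
```

Wiring it in means setting hadoop.http.authentication.type in core-site.xml to the handler's fully qualified class name; the built-in values simple and kerberos remain available.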

Griffin — Model-driven Data Quality Service on the Cloud for Both Real-time and Batch Data

Overview of Griffin

At eBay, when people use big data (Hadoop or streaming systems), measuring data quality is a significant challenge. Different teams have built customized tools to detect and analyze data quality issues within their own domains. As a platform organization, we take a platform approach to these commonly occurring patterns.
Continue Reading »
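
To make "measurement of data quality" concrete, here is a minimal, self-contained sketch of one of the simplest such metrics, completeness: the fraction of records whose field of interest is non-null. The field name and in-memory data are invented for illustration; a service like Griffin evaluates model-defined metrics of this kind against Hadoop and streaming data at scale.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Illustrative completeness metric: share of non-null values in a field.
// The "email" field and the sample values are hypothetical.
public class CompletenessMetric {
    public static double completeness(List<String> values) {
        long nonNull = values.stream().filter(Objects::nonNull).count();
        return values.isEmpty() ? 1.0 : (double) nonNull / values.size();
    }

    public static void main(String[] args) {
        List<String> emails = Arrays.asList("a@example.com", null, "b@example.com", null);
        System.out.printf("completeness(email) = %.2f%n", completeness(emails)); // prints 0.50
    }
}
```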