Multiple Authentication Mechanisms for Hadoop Web Interfaces

 

Apache Hadoop is a base component for Big Data processing and analysis. Hadoop servers, in general, allow interaction via two protocols: a TCP-based RPC (Remote Procedure Call) protocol and the HTTP protocol.

The RPC protocol currently allows only one primary authentication mechanism: Kerberos. The HTTP interface allows enterprises to plug in different authentication mechanisms. In this post, we are focusing on enhancing Hadoop with a simple framework that allows us to plug in multiple authentication mechanisms for Hadoop web interfaces.

[Figure: a user interfacing with Hadoop via a custom authentication mechanism]

Note that the Hadoop HTTP Authentication module (deployed as the hadoop-auth-version.jar file) is reused by different Hadoop servers like NameNode, ResourceManager, NodeManager, and DataNode, as well as other Hadoop-based components like HBase and Oozie.

We can follow the steps below to plug in a custom authentication mechanism.

  1. Implement the AuthenticationHandler interface, which is in the org.apache.hadoop.security.authentication.server package.
  2. Specify the implementation class in the configuration. Make sure that the implementation class is available in the classpath of the Hadoop server.

AuthenticationHandler interface

The implementation of the AuthenticationHandler will be loaded by the AuthenticationFilter, which is a servlet Filter loaded during startup of the Hadoop server’s web server.

The definition of the AuthenticationHandler interface is as follows:

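package org.apache.hadoop.security.authentication.server;

import java.io.IOException;
import java.util.Properties;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.hadoop.security.authentication.client.AuthenticationException;

public interface AuthenticationHandler {
    public String getType();
    public void init(Properties config) throws ServletException;
    public void destroy();
    public boolean managementOperation(AuthenticationToken token, HttpServletRequest request, HttpServletResponse response) throws IOException, AuthenticationException;
    public AuthenticationToken authenticate(HttpServletRequest request, HttpServletResponse response) throws IOException, AuthenticationException;
}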

The init method accepts a Properties object. This contains the properties read from the Hadoop configuration. Any config property that is prefixed by hadoop.http.authentication.Type will be added to the Properties object.
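To make the prefix rule concrete, here is a minimal sketch of how prefixed configuration entries could be collected into the Properties object passed to init. The class HandlerConfigSketch, its method, and the example property names are hypothetical and not part of Hadoop. Stripping the prefix from each key is also an assumption made for illustration.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of Hadoop: gathers every configuration entry
// under a given prefix into the Properties object handed to init().
final class HandlerConfigSketch {
    static Properties propertiesWithPrefix(Map<String, String> conf, String prefix) {
        Properties props = new Properties();
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(prefix)) {
                // Assumption: the prefix is stripped so the handler sees short names.
                props.setProperty(key.substring(prefix.length()), entry.getValue());
            }
        }
        return props;
    }
}
```

With a prefix of hadoop.http.authentication.example. (a made-up type name), an entry such as hadoop.http.authentication.example.server-url would reach the handler as server-url, while unrelated entries are left out.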

The authenticate method performs the actual authentication. On successful authentication, it returns an AuthenticationToken. The AuthenticationToken implements java.security.Principal and contains the following set of properties:

  • Username
  • Principal
  • Authentication type
  • Expiry time
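The properties above can be pictured with a simplified stand-in class. This is only a sketch for illustration, not Hadoop's AuthenticationToken. The class name TokenSketch, its field names, and the isExpiredAt method are our own assumptions.

```java
import java.security.Principal;

// Simplified stand-in for an authentication token; names are illustrative only.
final class TokenSketch implements Principal {
    private final String userName;      // short user name, e.g. "alice"
    private final String principal;     // full principal, e.g. "alice@EXAMPLE.COM"
    private final String type;          // authentication type, e.g. "kerberos"
    private final long expiresAtMillis; // expiry time in epoch milliseconds

    TokenSketch(String userName, String principal, String type, long expiresAtMillis) {
        this.userName = userName;
        this.principal = principal;
        this.type = type;
        this.expiresAtMillis = expiresAtMillis;
    }

    // java.security.Principal requires getName(); here it returns the principal.
    @Override
    public String getName() { return principal; }

    String getUserName() { return userName; }
    String getType() { return type; }
    long getExpires() { return expiresAtMillis; }

    // The clock value is passed in so the check is easy to test.
    boolean isExpiredAt(long nowMillis) { return nowMillis > expiresAtMillis; }
}
```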

Existing AuthenticationHandlers

A few implementations of the AuthenticationHandler interface are part of the Hadoop distribution.

  • KerberosAuthenticationHandler: Performs SPNEGO authentication.
  • PseudoAuthenticationHandler: Performs simple authentication. It authenticates the user based on the identity passed via the user.name URL query parameter.
  • AltKerberosAuthenticationHandler: Extends KerberosAuthenticationHandler. Allows you to provide an alternate authentication mechanism by extending AltKerberosAuthenticationHandler. The developer has to implement the alternateAuthenticate method, which holds the custom authentication logic.
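As an illustration of the simple mechanism described above, the following sketch extracts the identity from the user.name query parameter. It works on a raw query string instead of an HttpServletRequest so that it stays self-contained. The class PseudoAuthSketch and its method are hypothetical and do not reproduce the code of PseudoAuthenticationHandler.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Optional;

// Hypothetical sketch: finds the user.name parameter in a URL query string.
final class PseudoAuthSketch {
    static Optional<String> userFromQuery(String queryString) {
        if (queryString == null || queryString.isEmpty()) {
            return Optional.empty();
        }
        for (String pair : queryString.split("&")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip parameters without a name or a value separator
            }
            if ("user.name".equals(decode(pair.substring(0, eq)))) {
                String value = decode(pair.substring(eq + 1));
                // An empty value is treated as "no identity supplied".
                return value.isEmpty() ? Optional.empty() : Optional.of(value);
            }
        }
        return Optional.empty();
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

A real handler would wrap the returned name in an AuthenticationToken, and would decide what to do when no name is present (for example, reject the request or fall back to anonymous access).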

Composite AuthenticationHandler

At eBay, we want to offer multiple authentication mechanisms in addition to Kerberos and anonymous authentication. Our operators also prefer to turn off any authentication mechanism by modifying the configuration rather than rolling out new code. For this reason, we implemented a CompositeAuthenticationHandler.

The CompositeAuthenticationHandler accepts a list of authentication mechanisms via the property hadoop.http.authentication.composite.handlers. This property contains a list of classes that are implementations for AuthenticationHandler corresponding to different authentication mechanisms.

[Figure: a user and a service interfacing with Hadoop via two different authentication mechanisms, Kerberos and 2FA]

The properties for each individual authentication mechanism can be passed via configuration properties prefixed with hadoop.http.authentication.Type. The following table lists the different properties supported by CompositeAuthenticationHandler.

 

| # | Property | Description | Default Value |
|---|----------|-------------|---------------|
| 1 | hadoop.http.authentication.composite.handlers | List of classes that implement AuthenticationHandler for various authentication mechanisms | - |
| 2 | hadoop.http.authentication.composite.default-non-browser-handler-type | The default authentication mechanism for non-browser access | - |
| 3 | hadoop.http.authentication.composite.non-browser.user-agents | List of user agents whose presence in the User-Agent header marks the client as a non-browser | java,curl,wget,perl |
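The non-browser check implied by the third property could look roughly like the sketch below. The class UserAgentSketch is hypothetical and is not taken from the CompositeAuthenticationHandler source. The case-insensitive substring match and the treatment of a missing User-Agent header as a non-browser are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical sketch: classifies a request as browser or non-browser
// from its User-Agent header and a configured comma-separated list.
final class UserAgentSketch {
    // Splits a value such as "java,curl,wget,perl" into lower-case entries.
    static List<String> parseAgents(String csv) {
        return Arrays.stream(csv.split(","))
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    static boolean isNonBrowser(String userAgentHeader, List<String> nonBrowserAgents) {
        if (userAgentHeader == null) {
            return true; // Assumption: no User-Agent header means a non-browser client.
        }
        String ua = userAgentHeader.toLowerCase(Locale.ROOT);
        for (String agent : nonBrowserAgents) {
            if (ua.contains(agent)) {
                return true;
            }
        }
        return false;
    }
}
```

With the default list, a header such as curl/7.68.0 is classified as a non-browser, so the request would be routed to the mechanism named by the default-non-browser-handler-type property, while a typical browser header would not match any entry.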

The source code for CompositeAuthenticationHandler is attached to the JIRA page HADOOP-10307.