HDFS

This document describes the parameters required to access HDFS. These parameters apply to:

  • Catalog properties.
  • Table Valued Function properties.
  • Broker Load properties.
  • Export properties.
  • Outfile properties.
  • Backup and restore.

Parameter Overview

| Property Name | Former Name | Description | Default Value | Required |
| --- | --- | --- | --- | --- |
| hdfs.authentication.type | hadoop.security.authentication | Authentication type for accessing HDFS. Supports kerberos and simple. | simple | No |
| hdfs.authentication.kerberos.principal | hadoop.kerberos.principal | Specifies the principal when the authentication type is kerberos. | - | No |
| hdfs.authentication.kerberos.keytab | hadoop.kerberos.keytab | Specifies the keytab when the authentication type is kerberos. | - | No |
| hdfs.impersonation.enabled | - | If true, HDFS impersonation is enabled. Doris uses the proxy user configured in core-site.xml to proxy the Doris login user when performing HDFS operations. | Not supported yet | - |
| hadoop.username | - | When the authentication type is simple, this user is used to access HDFS. By default, the Linux system user running the Doris process is used. | - | - |
| hadoop.config.resources | - | Specifies the location of HDFS-related configuration files (must include hdfs-site.xml and core-site.xml). A relative path must be used. The default directory is /plugins/hadoop_conf/ under the (FE/BE) deployment directory (it can be changed by modifying hadoop_config_dir in fe.conf/be.conf). All FE and BE nodes must configure the same relative path. Example: hadoop/conf/core-site.xml,hadoop/conf/hdfs-site.xml | - | - |
| dfs.nameservices | - | Manually configures parameters for HDFS high availability clusters. If hadoop.config.resources is configured, these parameters are read automatically from hdfs-site.xml. Must be used together with dfs.ha.namenodes.your-nameservice, dfs.namenode.rpc-address.your-nameservice.nn1, dfs.client.failover.proxy.provider, etc. | - | - |
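
The high availability parameters listed for dfs.nameservices can be sketched as a property fragment. This is a minimal illustration that follows the standard Hadoop HA client settings; the nameservice name your-nameservice, the NameNode IDs nn1 and nn2, and the host names and port are placeholders, not values from this document.

```sql
-- Hypothetical two-NameNode HA cluster; replace every name, host, and port.
"dfs.nameservices" = "your-nameservice",
"dfs.ha.namenodes.your-nameservice" = "nn1,nn2",
"dfs.namenode.rpc-address.your-nameservice.nn1" = "namenode1-host:8020",
"dfs.namenode.rpc-address.your-nameservice.nn2" = "namenode2-host:8020",
-- In Hadoop, the failover provider key carries the nameservice name as a suffix.
"dfs.client.failover.proxy.provider.your-nameservice" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
```

When hadoop.config.resources points at an hdfs-site.xml that already contains these settings, this fragment should not be needed.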

Authentication Configuration

  • hdfs.authentication.type: Specifies the authentication type. Options are kerberos and simple. With kerberos, Doris uses Kerberos authentication to interact with HDFS and the corresponding principal and keytab must be configured. With simple, no authentication is performed, which is suitable for open HDFS clusters.
  • hdfs.authentication.kerberos.principal: Specifies the Kerberos principal when the authentication type is kerberos. A Kerberos principal is a string that uniquely identifies an identity, usually made up of a service or user name, an optional hostname, and a realm.
  • hdfs.authentication.kerberos.keytab: This parameter specifies the path to the keytab file used for Kerberos authentication. The keytab file is used to store encrypted credentials, allowing the system to authenticate automatically without requiring the user to manually enter a password.
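
The Former Name column in the parameter overview pairs each hdfs.authentication.* property with an older spelling. As a sketch, the two fragments below express the same Kerberos settings in both spellings. Whether a given Doris version still accepts the former names is an assumption to verify for that version, and the principal and keytab values are placeholders.

```sql
-- Current property names
"hdfs.authentication.type" = "kerberos",
"hdfs.authentication.kerberos.principal" = "<your_principal>",
"hdfs.authentication.kerberos.keytab" = "<your_keytab>"

-- Former property names for the same settings (per the Former Name column)
"hadoop.security.authentication" = "kerberos",
"hadoop.kerberos.principal" = "<your_principal>",
"hadoop.kerberos.keytab" = "<your_keytab>"
```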

Authentication Types

HDFS supports two authentication methods:

  • Kerberos
  • Simple

Simple Authentication

Simple authentication is suitable for HDFS clusters where Kerberos is not enabled.

To use Simple authentication, the following parameter needs to be set:

"hdfs.authentication.type" = "simple"

In Simple authentication mode, the hadoop.username parameter can be used to specify the username. If not specified, the username of the current process will be used by default.
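
To show where these properties sit in a complete statement, here is a hedged sketch that reads a file through the hdfs table valued function with Simple authentication. The NameNode address, file path, and format are placeholders chosen for illustration, and the exact set of required TVF properties may differ between Doris versions.

```sql
-- Hypothetical query; replace the host, port, and path with real values.
SELECT * FROM hdfs(
    "uri" = "hdfs://namenode-host:8020/path/to/data.csv",
    "fs.defaultFS" = "hdfs://namenode-host:8020",
    "format" = "csv",
    "hdfs.authentication.type" = "simple",
    "hadoop.username" = "lakers"
);
```

Leaving out hadoop.username makes the request run as the user of the Doris process, as described above.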

Example:

Access HDFS using the lakers username

"hdfs.authentication.type" = "simple",
"hadoop.username" = "lakers"

Access HDFS using the default system user

"hdfs.authentication.type" = "simple"

Kerberos Authentication

Kerberos authentication is suitable for HDFS clusters where Kerberos is enabled.

To use Kerberos authentication, the following parameters need to be set:

"hdfs.authentication.type" = "kerberos",
"hdfs.authentication.kerberos.principal" = "<your_principal>",
"hdfs.authentication.kerberos.keytab" = "<your_keytab>"

Doris accesses HDFS with the identity specified by the hdfs.authentication.kerberos.principal property and authenticates with the keytab file specified by the hdfs.authentication.kerberos.keytab property.

Note:

  • The keytab file must exist at the same path on every FE and BE node, and the user running the Doris process must have read permission for it.
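
As a sketch of how the three Kerberos properties fit into a catalog definition, the statement below creates a Hive catalog. The catalog name hive_kerberos_catalog and the metastore URI are hypothetical, the principal and keytab path are illustrative, and any Kerberos settings required by the metastore itself are outside the scope of this document and omitted.

```sql
-- Hypothetical catalog; only the hdfs.authentication.* lines come from this document.
CREATE CATALOG hive_kerberos_catalog PROPERTIES (
    "type" = "hms",
    "hive.metastore.uris" = "thrift://metastore-host:9083",
    "hdfs.authentication.type" = "kerberos",
    "hdfs.authentication.kerberos.principal" = "hdfs/hadoop@HADOOP.COM",
    -- The keytab must exist at this path on every FE and BE node.
    "hdfs.authentication.kerberos.keytab" = "/etc/security/keytabs/hdfs.keytab"
);
```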

Example:

"hdfs.authentication.type" = "kerberos",
"hdfs.authentication.kerberos.principal" = "hdfs/hadoop@HADOOP.COM",
"hdfs.authentication.kerberos.keytab" = "/etc/security/keytabs/hdfs.keytab"

Configuration Files

Doris supports specifying the directory of HDFS-related configuration files through the hadoop.config.resources parameter.

The configuration file directory must include the hdfs-site.xml and core-site.xml files. The default directory is /plugins/hadoop_conf/ under the (FE/BE) deployment directory. All FE and BE nodes must configure the same relative path.

If a configuration file contains parameters described earlier in this document, the values explicitly configured by the user take precedence. Multiple configuration files can be specified, separated by commas. For example, hadoop/conf/core-site.xml,hadoop/conf/hdfs-site.xml.
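
A minimal fragment, assuming the two files sit under hadoop/conf/ relative to the configured directory on every FE and BE node, combines the configuration files with explicitly set properties. The principal and keytab values are placeholders.

```sql
-- Load cluster settings from the two files (relative paths, comma separated).
"hadoop.config.resources" = "hadoop/conf/core-site.xml,hadoop/conf/hdfs-site.xml",
-- Explicitly set properties take precedence over matching settings in the files.
"hdfs.authentication.type" = "kerberos",
"hdfs.authentication.kerberos.principal" = "<your_principal>",
"hdfs.authentication.kerberos.keytab" = "<your_keytab>"
```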