
Alibaba Cloud DLF

Data Lake Formation (DLF) is the unified metadata management service of Alibaba Cloud. It is compatible with the Hive Metastore protocol.

What is DLF

Doris can access DLF the same way as it accesses Hive Metastore.

Connect to DLF

Method 1: Create a Hive Catalog

CREATE CATALOG hive_with_dlf PROPERTIES (
"type"="hms",
"dlf.catalog.proxyMode" = "DLF_ONLY",
"hive.metastore.type" = "dlf",
"dlf.catalog.endpoint" = "dlf.cn-beijing.aliyuncs.com",
"dlf.catalog.region" = "cn-beijing",
"dlf.catalog.uid" = "uid",
"dlf.catalog.accessKeyId" = "ak",
"dlf.catalog.accessKeySecret" = "sk"
);

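type must always be hms. To access Alibaba Cloud OSS over the public network, add "dlf.catalog.accessPublic"="true".

Replace the placeholder values with your own. dlf.catalog.endpoint and dlf.catalog.region are the DLF endpoint and region. dlf.catalog.uid is your Alibaba Cloud account ID. dlf.catalog.accessKeyId and dlf.catalog.accessKeySecret are your AccessKey ID and AccessKey Secret. The other configuration items are fixed and require no modification.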
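As a sketch, a catalog that reads OSS data over the public network could combine the properties above with dlf.catalog.accessPublic. The catalog name hive_with_dlf_public is hypothetical. The uid, ak, and sk values are placeholders, as in the statement above.

```sql
-- Same DLF catalog definition as above, plus public-network access to OSS.
-- hive_with_dlf_public is a hypothetical name; uid, ak, and sk are placeholders.
CREATE CATALOG hive_with_dlf_public PROPERTIES (
    "type" = "hms",
    "dlf.catalog.proxyMode" = "DLF_ONLY",
    "hive.metastore.type" = "dlf",
    "dlf.catalog.endpoint" = "dlf.cn-beijing.aliyuncs.com",
    "dlf.catalog.region" = "cn-beijing",
    "dlf.catalog.uid" = "uid",
    "dlf.catalog.accessKeyId" = "ak",
    "dlf.catalog.accessKeySecret" = "sk",
    "dlf.catalog.accessPublic" = "true"
);
```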

Once the catalog is created, you can access metadata in DLF the same way as you access Hive Metastore.

Doris supports accessing Hive/Iceberg/Hudi metadata in DLF.
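Once the catalog exists, its databases and tables can be browsed and queried like those of any other Hive catalog in Doris. The following is a minimal sketch. The database db1 and table tbl1 are hypothetical names that stand in for whatever exists in your DLF catalog.

```sql
-- Switch the current session to the DLF-backed catalog.
SWITCH hive_with_dlf;

-- List the databases exposed through this catalog.
SHOW DATABASES;

-- db1 and tbl1 are hypothetical; replace them with a real database and table.
USE db1;
SHOW TABLES;
SELECT * FROM tbl1 LIMIT 10;

-- A fully qualified catalog.database.table name also works from any catalog.
SELECT * FROM hive_with_dlf.db1.tbl1 LIMIT 10;
```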

Method 2: Configure hive-site.xml

  1. Create a hive-site.xml file and place it in the fe/conf directory.
<?xml version="1.0"?>
<configuration>
<!--Set to use dlf client-->
<property>
<name>hive.metastore.type</name>
<value>dlf</value>
</property>
<property>
<name>dlf.catalog.endpoint</name>
<value>dlf-vpc.cn-beijing.aliyuncs.com</value>
</property>
<property>
<name>dlf.catalog.region</name>
<value>cn-beijing</value>
</property>
<property>
<name>dlf.catalog.proxyMode</name>
<value>DLF_ONLY</value>
</property>
<property>
<name>dlf.catalog.uid</name>
<value>20000000000000000</value>
</property>
<property>
<name>dlf.catalog.accessKeyId</name>
<value>XXXXXXXXXXXXXXX</value>
</property>
<property>
<name>dlf.catalog.accessKeySecret</name>
<value>XXXXXXXXXXXXXXXXX</value>
</property>
</configuration>
  2. Restart FE. Doris reads and parses fe/conf/hive-site.xml at startup. Then create a catalog with the CREATE CATALOG statement:
CREATE CATALOG hive_with_dlf PROPERTIES (
"type"="hms",
"hive.metastore.uris" = "thrift://127.0.0.1:9083"
);

type must always be hms. hive.metastore.uris can be set to any value because it is not actually used, but it must follow the Hive Metastore Thrift URI format (thrift://host:port).
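To confirm that FE registered the catalog, you can list the catalogs and print the catalog's definition. This short check assumes the catalog name hive_with_dlf from the statement above.

```sql
-- hive_with_dlf should appear in the list of catalogs.
SHOW CATALOGS;

-- Print the statement and properties the catalog was created with.
SHOW CREATE CATALOG hive_with_dlf;
```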