Skip to main content

LOCAL

Description​

Local table-valued-function(tvf), allows users to read and access local file contents on be node, just like accessing relational table. Currently supports csv/csv_with_names/csv_with_names_and_types/json/parquet/orc file format.

syntax​

LOCAL(
"file_path" = "<file_path>",
"backend_id" = "<backend_id>",
"format" = "<format>"
[, "<optional_property_key>" = "<optional_property_value>" [, ...] ]
);

Required Parameters​

ParameterDescriptionRemarks
file_pathThe path of the file to be read, which is relative to the user_files_secure_path directory. The user_files_secure_path parameter is a BE configuration item.
The path cannot include .., and glob syntax can be used for pattern matching, such as logs/*.log.
backend_idThe ID of the BE node where the file is located. It can be obtained via the show backends command.Before version 2.1.1, Doris only supports specifying a BE node to read local data files on that node.
formatThe file format, which is required. Supported formats are csv/csv_with_names/csv_with_names_and_types/json/parquet/orc.

Optional Parameters​

ParameterDescriptionRemarks
shared_storageDefaults to false. If true, the specified file is located on shared storage (e.g., NAS). The shared storage must support POSIX file interfaces and be mounted on all BE nodes.
When shared_storage is true, backend_id can be omitted. Doris may utilize all BE nodes to access the data. If backend_id is set, the data will be accessed only on the specified BE node.
Supported starting from version 2.1.2
column_separatorThe column separator, optional, defaults to \t.
line_delimiterThe line delimiter, optional, defaults to \n.
compress_typeThe compression type, optional. Supported types are UNKNOWN/PLAIN/GZ/LZO/BZ2/LZ4FRAME/DEFLATE/SNAPPYBLOCK. Defaults to UNKNOWN, and the type will be automatically inferred from the uri suffix.
read_json_by_lineFor JSON format imports, optional, defaults to true.Refer to: Json Load
strip_outer_arrayFor JSON format imports, optional, defaults to false.Refer to: Json Load
json_rootFor JSON format imports, optional, defaults to empty.Refer to: Json Load
json_pathsFor JSON format imports, optional, defaults to empty.Refer to: Json Load
num_as_stringFor JSON format imports, optional, defaults to false.Refer to: Json Load
fuzzy_parseFor JSON format imports, optional, defaults to false.Refer to: Json Load
trim_double_quotesFor CSV format imports, optional, defaults to false. If true, it will trim the outermost double quotes around each field in the CSV file.For CSV format
skip_linesFor CSV format imports, optional, defaults to 0, which means skipping the first few lines of the CSV file. When the format is csv_with_names or csv_with_names_and_types, this parameter is ignored.For CSV format
path_partition_keysOptional, specifies the partition column names carried in the file path, e.g., /path/to/city=beijing/date="2023-07-09", then fill in path_partition_keys="city,date". This will automatically read the corresponding column names and values from the path for import.

Access Control Requirements​

PrivilegeObjectNotes
ADMIN_PRIVglobal

Usage Notes​

  • For more detailed usage of local tvf, please refer to S3 tvf, The only difference between them is the way of accessing the storage system.

  • Access data on NAS through local tvf

    NAS shared storage allows to be mounted to multiple nodes at the same time. Each node can access files in the shared storage just like local files. Therefore, the NAS can be thought of as a local file system, accessed through local tvf.

    When setting "shared_storage" = "true", Doris will think that the specified file can be accessed from any BE node. When a set of files is specified using wildcards, Doris will distribute requests to access files to multiple BE nodes, so that multiple nodes can be used to perform distributed file scanning and improve query performance.

Examples​

Analyze the log file on specified BE:

select * from local(
"file_path" = "log/be.out",
"backend_id" = "10006",
"format" = "csv")
where c1 like "%start_time%" limit 10;
+--------------------------------------------------------+
| c1 |
+--------------------------------------------------------+
| start time: 2023 εΉ΄ 08 月 07 ζ—₯ ζ˜ŸζœŸδΈ€ 23:20:32 CST |
| start time: 2023 εΉ΄ 08 月 07 ζ—₯ ζ˜ŸζœŸδΈ€ 23:32:10 CST |
| start time: 2023 εΉ΄ 08 月 08 ζ—₯ 星期二 00:20:50 CST |
| start time: 2023 εΉ΄ 08 月 08 ζ—₯ 星期二 00:29:15 CST |
+--------------------------------------------------------+

Read and access csv format files located at path ${DORIS_HOME}/student.csv:

select * from local(
"file_path" = "student.csv",
"backend_id" = "10003",
"format" = "csv");
+------+---------+--------+
| c1 | c2 | c3 |
+------+---------+--------+
| 1 | alice | 18 |
| 2 | bob | 20 |
| 3 | jack | 24 |
| 4 | jackson | 19 |
| 5 | liming | d18 |
+------+---------+--------+
```--+---------+--------+

Query files on NAS:

select * from local(
"file_path" = "/mnt/doris/prefix_*.txt",
"format" = "csv",
"column_separator" =",",
"shared_storage" = "true");
+------+------+------+
| c1 | c2 | c3 |
+------+------+------+
| 1 | 2 | 3 |
| 1 | 2 | 3 |
| 1 | 2 | 3 |
| 1 | 2 | 3 |
| 1 | 2 | 3 |
+------+------+------+

Can be used with desc function :

desc function local(
"file_path" = "student.csv",
"backend_id" = "10003",
"format" = "csv");
+-------+------+------+-------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------+------+-------+---------+-------+
| c1 | TEXT | Yes | false | NULL | NONE |
| c2 | TEXT | Yes | false | NULL | NONE |
| c3 | TEXT | Yes | false | NULL | NONE |
+-------+------+------+-------+---------+-------+
``` "s3.endpoint" = "cos.ap-hongkong.myqcloud.com",
"s3.region" = "ap-hongkong",
"format" = "parquet",
"use_path_style" = "false");

-- MinIO
select * from s3(
"uri" = "s3://bucket/file.csv",
"s3.endpoint" = "http://172.21.0.101:9000",
"s3.access_key"= "ak",
"s3.secret_key" = "sk",
"s3.region" = "us-east-1",
"format" = "csv"
);

-- The BOS on Baidu Cloud will use 'virtual-hosted style' compatible with the S3 protocol to access s3.
-- BOS
select * from s3(
"uri" = "https://example-bucket.s3.bj.bcebos.com/your-folder/file.parquet",
"s3.access_key"= "ak",
"s3.secret_key" = "sk",
"s3.region" = "bj",
"s3.endpoint" = "http://bj.bcebos.com",
"format" = "parquet",
"use_path_style" = "false");
  • Example of s3://:

    -- Note how to write your bucket of URI, no need to set 'use_path_style'.
    -- s3 will be accessed in 'virtual-hosted style'.
    select * from s3(
    "URI" = "s3://bucket/file/student.csv",
    "s3.endpoint"= "endpont",
    "s3.region" = "region",
    "s3.access_key"= "ak",
    "s3.secret_key" = "sk",
    "format" = "csv");
  • Example of other uri styles:

    -- Virtual Host AWS Client (Hadoop S3) Mixed Style. Used by setting `use_path_style = false` and `force_parsing_by_standard_uri = true`.
    select * from s3(
    "URI" = "s3://my-bucket.s3.us-west-1.amazonaws.com/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88",
    "s3.access_key"= "ak",
    "s3.secret_key" = "sk",
    "s3.endpoint"= "endpont",
    "s3.region"= "region",
    "format" = "csv",
    "use_path_style"="false",
    "force_parsing_by_standard_uri"="true");

    -- Path AWS Client (Hadoop S3) Mixed Style. Used by setting `use_path_style = true` and `force_parsing_by_standard_uri = true`.
    select * from s3(
    "URI" = "s3://s3.us-west-1.amazonaws.com/my-bucket/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88",
    "s3.access_key"= "ak",
    "s3.secret_key" = "sk",
    "s3.endpoint"= "endpont",
    "s3.region"= "region",
    "format" = "csv",
    "use_path_style"="true",
    "force_parsing_by_standard_uri"="true");
  • csv format

    csv format: Read the file on S3 and process it as a csv file, read the first line in the file to parse out the table schema. The number of columns in the first line of the file n will be used as the number of columns in the table schema, and the column names of the table schema will be automatically named c1, c2, ..., cn, and the column type is set to String , for example:

    The file content of student1.csv:

    1,ftw,12
    2,zs,18
    3,ww,20
  • use S3 tvf

    select * from s3("uri" = "http://127.0.0.1:9312/test2/student1.csv",
    "s3.access_key"= "minioadmin",
    "s3.secret_key" = "minioadmin",
    "s3.endpoint" = "http://127.0.0.1:9312",
    "s3.region" = "us-east-1",
    "format" = "csv",
    "use_path_style" = "true") order by c1;
      +------+------+------+
    | c1 | c2 | c3 |
    +------+------+------+
    | 1 | ftw | 12 |
    | 2 | zs | 18 |
    | 3 | ww | 20 |
    +------+------+------+
  • use desc function S3() to view the table schema

    Desc function s3("uri" = "http://127.0.0.1:9312/test2/student1.csv",
    "s3.access_key"= "minioadmin",
    "s3.secret_key" = "minioadmin",
    "s3.endpoint" = "http://127.0.0.1:9312",
    "s3.region" = "us-east-1",
    "format" = "csv",
    "use_path_style" = "true");
      +-------+------+------+-------+---------+-------+
    | Field | Type | Null | Key | Default | Extra |
    +-------+------+------+-------+---------+-------+
    | c1 | TEXT | Yes | false | NULL | NONE |
    | c2 | TEXT | Yes | false | NULL | NONE |
    | c3 | TEXT | Yes | false | NULL | NONE |
    +-------+------+------+-------+---------+-------+
  • csv_with_names format

    csv_with_names format: The first line of the file is used as the number and name of the columns of the table schema, and the column type is set to String, for example:

    The file content of student_with_names.csv:

    id,name,age
    1,ftw,12
    2,zs,18
    3,ww,20
  • use S3 tvf

    select * from s3("uri" = "http://127.0.0.1:9312/test2/student_with_names.csv",
    "s3.access_key"= "minioadmin",
    "s3.secret_key" = "minioadmin",
    "s3.endpoint" = "http://127.0.0.1:9312",
    "s3.region" = "us-east-1",
    "format" = "csv_with_names",
    "use_path_style" = "true") order by id;
      +------+------+------+
    | id | name | age |
    +------+------+------+
    | 1 | ftw | 12 |
    | 2 | zs | 18 |
    | 3 | ww | 20 |
    +------+------+------+
  • You can also use desc function S3() to view the Table Schema.

    Desc function s3("uri" = "http://127.0.0.1:9312/test2/student_with_names.csv",
    "s3.access_key"= "minioadmin",
    "s3.secret_key" = "minioadmin",
    "s3.endpoint" = "http://127.0.0.1:9312",
    "s3.region" = "us-east-1",
    "format" = "csv_with_names",
    "use_path_style" = "true");
      +-------+------+------+-------+---------+-------+
    | Field | Type | Null | Key | Default | Extra |
    +-------+------+------+-------+---------+-------+
    | id | TEXT | Yes | false | NULL | NONE |
    | name | TEXT | Yes | false | NULL | NONE |
    | age | TEXT | Yes | false | NULL | NONE |
    +-------+------+------+-------+---------+-------+
  • csv_with_names_and_types format

    csv_with_names_and_types format: Currently, it does not support parsing the column type from a csv file. When using this format, S3 tvf will parse the first line of the file as the number and name of the columns of the table

    The file content of student_with_names_and_types.csv:

      id,name,age
    INT,STRING,INT
    1,ftw,12
    2,zs,18
    3,ww,20
  • use S3 tvf

    select * from s3("uri" = "http://127.0.0.1:9312/test2/student_with_names_and_types.csv",
    "s3.access_key"= "minioadmin",
    "s3.secret_key" = "minioadmin",
    "s3.endpoint" = "http://127.0.0.1:9312",
    "s3.region" = "us-east-1",
    "format" = "csv_with_names_and_types",
    "use_path_style" = "true") order by id;
    +------+------+------+
    | id | name | age |
    +------+------+------+
    | 1 | ftw | 12 |
    | 2 | zs | 18 |
    | 3 | ww | 20 |
    +------+------+------+
  • You can also use desc function S3() to view the Table Schema.

    Desc function s3("uri" = "http://127.0.0.1:9312/test2/student_with_names_and_types.csv",
    "s3.access_key"= "minioadmin",
    "s3.secret_key" = "minioadmin",
    "s3.endpoint" = "http://127.0.0.1:9312",
    "s3.region" = "us-east-1",
    "format" = "csv_with_names_and_types",
    "use_path_style" = "true");
      +-------+------+------+-------+---------+-------+
    | Field | Type | Null | Key | Default | Extra |
    +-------+------+------+-------+---------+-------+
    | id | TEXT | Yes | false | NULL | NONE |
    | name | TEXT | Yes | false | NULL | NONE |
    | age | TEXT | Yes | false | NULL | NONE |
    +-------+------+------+-------+---------+-------+
  • json format

    The json format involves many optional parameters, and the meaning of each parameter can be referred to: Json Load. When S3 tvf queries the json form

    data.json ζ–‡δ»Ά

    [{"id":1, "name":"ftw", "age":18}]
    [{"id":2, "name":"xxx", "age":17}]
    [{"id":3, "name":"yyy", "age":19}]
  • use S3 tvf:

    select * from s3(
    "uri" = "http://127.0.0.1:9312/test2/data.json",
    "s3.access_key"= "minioadmin",
    "s3.secret_key" = "minioadmin",
    "s3.endpoint" = "http://127.0.0.1:9312",
    "s3.region" = "us-east-1",
    "format" = "json",
    "strip_outer_array" = "true",
    "read_json_by_line" = "true",
    "use_path_style"="true");
    +------+------+------+
    | id | name | age |
    +------+------+------+
    | 1 | ftw | 18 |
    | 2 | xxx | 17 |
    | 3 | yyy | 19 |
    +------+------+------+

    select * from s3(
    "uri" = "http://127.0.0.1:9312/test2/data.json",
    "s3.access_key"= "minioadmin",
    "s3.secret_key" = "minioadmin",
    "s3.endpoint" = "http://127.0.0.1:9312",
    "s3.region" = "us-east-1",
    "format" = "json",
    "strip_outer_array" = "true",
    "jsonpaths" = "[\"$.id\", \"$.age\"]",
    "use_path_style"="true");
    +------+------+
    | id | age |
    +------+------+
    | 1 | 18 |
    | 2 | 17 |
    | 3 | 19 |
    +------+------+
  • parquet format

    S3 tvf supports parsing the column names and column types of the table schema from the parquet file. Example:

    select * from s3(
    "uri" = "http://127.0.0.1:9312/test2/test.snappy.parquet",
    "s3.access_key"= "minioadmin",
    "s3.secret_key" = "minioadmin",
    "s3.endpoint" = "http://127.0.0.1:9312",
    "s3.region" = "us-east-1",
    "format" = "parquet",
    "use_path_style"="true") limit 5;
         +-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
    | p_partkey | p_name | p_mfgr | p_brand | p_type | p_size | p_container | p_retailprice | p_comment |
    +-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
    | 1 | goldenrod lavender spring chocolate lace | Manufacturer#1 | Brand#13 | PROMO BURNISHED COPPER | 7 | JUMBO PKG | 901 | ly. slyly ironi |
    | 2 | blush thistle blue yellow saddle | Manufacturer#1 | Brand#13 | LARGE BRUSHED BRASS | 1 | LG CASE | 902 | lar accounts amo |
    | 3 | spring green yellow purple cornsilk | Manufacturer#4 | Brand#42 | STANDARD POLISHED BRASS | 21 | WRAP CASE | 903 | egular deposits hag |
    | 4 | cornflower chocolate smoke green pink | Manufacturer#3 | Brand#34 | SMALL PLATED BRASS | 14 | MED DRUM | 904 | p furiously r |
    | 5 | forest brown coral puff cream | Manufacturer#3 | Brand#32 | STANDARD POLISHED TIN | 15 | SM PKG | 905 | wake carefully |
    +-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
    MySQL [(none)]> desc function s3(
    "uri" = "http://127.0.0.1:9312/test2/test.snappy.parquet",
    "s3.access_key"= "minioadmin",
    "s3.secret_key" = "minioadmin",
    "s3.endpoint" = "http://127.0.0.1:9312",
    "s3.region" = "us-east-1",
    "format" = "parquet",
    "use_path_style"="true");
          +---------------+--------------+------+-------+---------+-------+
    | Field | Type | Null | Key | Default | Extra |
    +---------------+--------------+------+-------+---------+-------+
    | p_partkey | INT | Yes | false | NULL | NONE |
    | p_name | TEXT | Yes | false | NULL | NONE |
    | p_mfgr | TEXT | Yes | false | NULL | NONE |
    | p_brand | TEXT | Yes | false | NULL | NONE |
    | p_type | TEXT | Yes | false | NULL | NONE |
    | p_size | INT | Yes | false | NULL | NONE |
    | p_container | TEXT | Yes | false | NULL | NONE |
    | p_retailprice | DECIMAL(9,0) | Yes | false | NULL | NONE |
    | p_comment | TEXT | Yes | false | NULL | NONE |
    +---------------+--------------+------+-------+---------+-------+
  • orc format

    Same as parquet format, set format parameter to orc.

    select * from s3(
    "uri" = "http://127.0.0.1:9312/test2/test.snappy.orc",
    "s3.access_key"= "minioadmin",
    "s3.secret_key" = "minioadmin",
    "s3.endpoint" = "http://127.0.0.1:9312",
    "s3.region" = "us-east-1",
    "format" = "orc",
    "use_path_style"="true") limit 5;
      +-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
    | p_partkey | p_name | p_mfgr | p_brand | p_type | p_size | p_container | p_retailprice | p_comment |
    +-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
    | 1 | goldenrod lavender spring chocolate lace | Manufacturer#1 | Brand#13 | PROMO BURNISHED COPPER | 7 | JUMBO PKG | 901 | ly. slyly ironi |
    | 2 | blush thistle blue yellow saddle | Manufacturer#1 | Brand#13 | LARGE BRUSHED BRASS | 1 | LG CASE | 902 | lar accounts amo |
    | 3 | spring green yellow purple cornsilk | Manufacturer#4 | Brand#42 | STANDARD POLISHED BRASS | 21 | WRAP CASE | 903 | egular deposits hag |
    | 4 | cornflower chocolate smoke green pink | Manufacturer#3 | Brand#34 | SMALL PLATED BRASS | 14 | MED DRUM | 904 | p furiously r |
    | 5 | forest brown coral puff cream | Manufacturer#3 | Brand#32 | STANDARD POLISHED TIN | 15 | SM PKG | 905 | wake carefully |
    +-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
  • avro format

    S3 tvf supports parsing the column names and column types of the table schema from the avro file. Example:

    select * from s3(
    "uri" = "http://127.0.0.1:9312/test2/person.avro",
    "ACCESS_KEY" = "ak",
    "SECRET_KEY" = "sk",
    "s3.endpoint" = "http://127.0.0.1:9312",
    "s3.region" = "us-east-1",
    "FORMAT" = "avro");
    +--------+--------------+-------------+-----------------+
    | name | boolean_type | double_type | long_type |
    +--------+--------------+-------------+-----------------+
    | Alyssa | 1 | 10.0012 | 100000000221133 |
    | Ben | 0 | 5555.999 | 4009990000 |
    | lisi | 0 | 5992225.999 | 9099933330 |
    +--------+--------------+-------------+-----------------+
  • uri contains wildcards

    uri can use wildcards to read multiple files. Note: If wildcards are used, the format of each file must be consistent (especially csv/csv_with_names/csv_with_names_and_types count as different formats), S3 tvf uses the first file to parse out the table schema. For example:

    The following two csv files:

    // file1.csv
    1,aaa,18
    2,qqq,20
    3,qwe,19

    // file2.csv
    5,cyx,19
    6,ftw,21
  • You can use wildcards on the uri to query.

    select * from s3(
    "uri" = "http://127.0.0.1:9312/test2/file*.csv",
    "s3.access_key"= "minioadmin",
    "s3.secret_key" = "minioadmin",
    "s3.endpoint" = "http://127.0.0.1:9312",
    "s3.region" = "us-east-1",
    "format" = "csv",
    "use_path_style"="true");
     +------+------+------+
    | c1 | c2 | c3 |
    +------+------+------+
    | 1 | aaa | 18 |
    | 2 | qqq | 20 |
    | 3 | qwe | 19 |
    | 5 | cyx | 19 |
    | 6 | ftw | 21 |
    +------+------+------+
  • Using S3 tvf with insert into and cast

    -- Create doris internal table
    CREATE TABLE IF NOT EXISTS ${testTable}
    (
    id int,
    name varchar(50),
    age int
    )
    COMMENT "my first table"
    DISTRIBUTED BY HASH(id) BUCKETS 32
    PROPERTIES("replication_num" = "1");

    -- Insert data using S3
    insert into ${testTable} (id,name,age)
    select cast (id as INT) as id, name, cast (age as INT) as age
    from s3(
    "uri" = "${uri}",
    "s3.access_key"= "${ak}",
    "s3.secret_key" = "${sk}",
    "s3.endpoint" = "http://127.0.0.1:9312",
    "s3.region" = "us-east-1",
    "format" = "${format}",
    "strip_outer_array" = "true",
    "read_json_by_line" = "true",
    "use_path_style" = "true");