Skip to main content

HTTP

HTTP table-valued-function (tvf) allows users to read and access file content on HTTP paths as if accessing relational table format data. Currently supports csv/csv_with_names/csv_with_names_and_types/json/parquet/orc file formats.

note

Supported since 4.0.3

Syntax

HTTP(
"uri" = "<uri>",
"format" = "<format>"
[, "<optional_property_key>" = "<optional_property_value>" [, ...] ]
)

Required Parameters

ParameterDescription
uriHTTP address for access. Supports http, https and hf protocols.
formatFile format, supports csv/csv_with_names/csv_with_names_and_types/json/parquet/orc

About hf://(Hugging Face), please see Analyzing Hugging Face Data.

Optional Parameters

ParameterDescriptionNotes
http.header.xxxUsed to specify arbitrary HTTP Headers, which will be directly passed to the HTTP Client. For example "http.header.Authorization" = "Bearer hf_MWYzOJJoZEymb...", the final Header will be Authorization: Bearer hf_MWYzOJJoZEymb...
http.enable.range.requestWhether to use range request to access HTTP service. Default is true.
http.max.request.size.bytesMaximum access size limit when using non-range request mode. Default is 100MB

When http.enable.range.request is true, the system will first try to access the HTTP service using range request. If the HTTP service does not support range request, it will automatically fall back to non-range request mode. And the maximum access data size is limited by http.max.request.size.bytes.

Examples

  • Read CSV data from GitHub

    SELECT COUNT(*) FROM
    HTTP(
    "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/http_stream/all_types.csv",
    "format" = "csv",
    "column_separator" = ","
    );
  • Access Parquet data from GitHub

    SELECT arr_map, id FROM
    HTTP(
    "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/external_table_p0/tvf/t.parquet",
    "format" = "parquet"
    );
  • Access JSON data from GitHub and use with desc function

    DESC FUNCTION
    HTTP(
    "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/stream_load/basic_data.json",
    "format" = "json",
    "strip_outer_array" = "true"
    );