JSON | File Format

This document describes how to import JSON-format data files into Doris. Doris supports importing standard JSON-format data. Through parameter configuration, you can flexibly handle different JSON data structures, extract fields from JSON data, parse nested structures, and more.

Import methods

The following import methods support JSON-format data:

Import method	Applicable scenario	Parameter passing
Stream Load	Small-batch, near-real-time imports pushed from local or client	HTTP Header, for example `-H "jsonpaths: $.data"`
Broker Load	Bulk imports of large files from object storage, HDFS, and similar sources	`PROPERTIES`, for example `PROPERTIES("jsonpaths"="$.data")`
Routine Load	Continuously consume JSON from message queues such as Kafka	`PROPERTIES`, for example `PROPERTIES("jsonpaths"="$.data")`
INSERT INTO FROM S3 TVF	Read JSON files on S3 directly with SQL	TVF parameters, for example `S3("jsonpaths"="$.data")`
INSERT INTO FROM HDFS TVF	Read JSON files on HDFS directly with SQL	TVF parameters

Supported JSON formats

Doris supports the following three JSON file organization formats, each suitable for different business scenarios.

Format 1: Multiple rows represented as an Array

Suitable for bulk-importing multiple rows at once.

Requirements:

The root node must be an array.
Each element in the array is an object that represents one row of data.
strip_outer_array=true must be set.

Example data:

[
    {"id": 123, "city": "beijing"},
    {"id": 456, "city": "shanghai"}
]

// Nested structures are supported
[
    {"id": 123, "city": {"name": "beijing", "region": "haidian"}},
    {"id": 456, "city": {"name": "beijing", "region": "chaoyang"}}
]

Format 2: A single row represented as an Object

Suitable for importing a single row of data.

Requirements:

The root node must be an object.
The entire object represents one row of data.
The file contains only one JSON record per line.

Example data:

{"id": 123, "city": "beijing"}

// Nested structures are supported
{"id": 123, "city": {"name": "beijing", "region": "haidian"}}

Note

This format is typically used with Routine Load, for example a single message in Kafka.

Format 3: Multiple rows of Object data separated by a fixed delimiter

Suitable for bulk-importing multiple rows, where each row is an independent JSON object (similar to NDJSON).

Requirements:

Each line is a complete JSON object.
You do not need to set read_json_by_line=true explicitly; it is enabled by default.
The line delimiter can be specified with the line_delimiter parameter, and defaults to \n.

Example data:

{"id": 123, "city": "beijing"}
{"id": 456, "city": "shanghai"}

Parameter configuration

Parameter support matrix for each import method

The following table lists the support for JSON-related parameters across the different import methods:

Parameter	Default	Stream Load	Broker Load	Routine Load	TVF
`jsonpaths`	None	Supported	Supported	Supported	Supported
`json_root`	None	Supported	Supported	Supported	Supported
`strip_outer_array`	false	Supported	Supported	Supported	Supported
`read_json_by_line`	true	Supported	Not configurable	Not configurable	Supported
`fuzzy_parse`	false	Supported	Supported	Not supported	Supported
`num_as_string`	false	Supported	Supported	Supported	Supported
Compression format	plain	Supported	Supported	Not supported	Supported

Parameter passing and default behavior

Stream Load: parameters are passed directly through HTTP Header, for example -H "jsonpaths: $.data".
Broker Load: parameters are passed through PROPERTIES, for example PROPERTIES("jsonpaths"="$.data").
Routine Load: parameters are passed through PROPERTIES, for example PROPERTIES("jsonpaths"="$.data").
TVF: parameters are passed through the TVF statement, for example S3("jsonpaths"="$.data").
To import the JSON object at the root node of a JSON file directly, set jsonpaths to $., for example PROPERTIES("jsonpaths"="$.").
The default value of read_json_by_line is true, which means: if neither strip_outer_array nor read_json_by_line is specified during import, read_json_by_line is true.
read_json_by_line is not configurable in Broker Load and Routine Load. It is forced to true to enable streaming reads and reduce memory pressure on the BE.

Detailed parameter description

jsonpaths

Purpose: Specifies how to extract fields from JSON data.
Type: Array of strings.
Default: None. By default, columns are matched by name.

Examples:

-- Basic usage
["$.id", "$.city"]

-- Nested structure
["$.id", "$.info.city", "$.data[0].name"]

json_root

Purpose: Specifies the parsing entry point of JSON data.
Type: String.
Default: None. By default, parsing starts from the root node.

Examples:

-- Original data
{
  "data": {
    "id": 123,
    "city": "beijing"
  }
}

-- Set json_root
json_root = $.data

strip_outer_array

Purpose: Specifies whether to strip the outermost array structure.
Type: Boolean.
Default: false.

Examples:

-- Original data
[
  {"id": 1, "city": "beijing"},
  {"id": 2, "city": "shanghai"}
]

-- Set strip_outer_array=true

read_json_by_line

Purpose: Specifies whether to read JSON data line by line.
Type: Boolean.
Default: false.

Examples:

-- Original data (one complete JSON object per line)
{"id": 1, "city": "beijing"}
{"id": 2, "city": "shanghai"}

-- Set read_json_by_line=true

fuzzy_parse

Purpose: Speeds up JSON data import.
Type: Boolean.
Default: false.
Restrictions:
- The field order in every row of the array must be exactly the same.
- Typically used together with strip_outer_array.
Performance: Improves import efficiency by 3 to 5 times.

num_as_string

Purpose: Specifies whether numeric values in JSON are parsed as strings.
Type: Boolean.
Default: false.
Use cases:
- Handling numbers that exceed the supported numeric range.
- Avoiding loss of numeric precision.

Example:

-- Original data
{
  "id": "12345678901234567890",
  "price": "99999999.999999"
}
-- With num_as_string=true, the price field is parsed as a string

Relationship between JSON Path and Columns

During data import, jsonpaths and columns have distinct responsibilities:

Parameter	Responsibility	Result
`jsonpaths`	Defines the data extraction rules	Extracts fields from JSON data along the specified paths and reorders them according to the order defined in `jsonpaths`
`columns`	Defines the data mapping rules	Maps the extracted fields to columns in the target table, with optional column reordering and transformation

The two are processed sequentially:

jsonpaths first extracts fields from the source data and forms an ordered dataset.
columns then maps these data items to the table columns.

If columns is not specified, the extracted fields are mapped directly in the order of the table's columns.

Example 1: Using JSON Path only

Table schema and data:

-- Table schema
CREATE TABLE example_table (
    k2 int,
    k1 int
);

-- JSON data
{"k1": 1, "k2": 2}

Import command:

curl -v ... -H "format: json" \
    -H "jsonpaths: [\"$.k2\", \"$.k1\"]" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/db_name/table_name/_stream_load

Import result:

+------+------+
| k1   | k2   |
+------+------+
|    2 |    1 |
+------+------+

Example 2: Using JSON Path together with Columns

Use the same table schema and data as above, and add the columns parameter.

Import command:

curl -v ... -H "format: json" \
    -H "jsonpaths: [\"$.k2\", \"$.k1\"]" \
    -H "columns: k2, k1" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/db_name/table_name/_stream_load

Import result:

+------+------+
| k1   | k2   |
+------+------+
|    1 |    2 |
+------+------+

Example 3: Reusing a field

Table schema and data:

-- Table schema
CREATE TABLE example_table (
    k2 int,
    k1 int,
    k1_copy int
);

-- JSON data
{"k1": 1, "k2": 2}

Import command:

curl -v ... -H "format: json" \
    -H "jsonpaths: [\"$.k2\", \"$.k1\", \"$.k1\"]" \
    -H "columns: k2, k1, k1_copy" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/db_name/table_name/_stream_load

Import result:

+------+------+---------+
| k2   | k1   | k1_copy |
+------+------+---------+
|    2 |    1 |       1 |
+------+------+---------+

Example 4: Mapping nested fields

Table schema and data:

-- Table schema
CREATE TABLE example_table (
    k2 int,
    k1 int,
    k1_nested1 int,
    k1_nested2 int
);

-- JSON data
{
    "k1": 1,
    "k2": 2,
    "k3": {
        "k1": 31,
        "k1_nested": {
            "k1": 32
        }
    }
}

Import command:

curl -v ... -H "format: json" \
    -H "jsonpaths: [\"$.k2\", \"$.k1\", \"$.k3.k1\", \"$.k3.k1_nested.k1\"]" \
    -H "columns: k2, k1, k1_nested1, k1_nested2" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/db_name/table_name/_stream_load

Import result:

+------+------+------------+------------+
| k2   | k1   | k1_nested1 | k1_nested2 |
+------+------+------------+------------+
|    2 |    1 |         31 |         32 |
+------+------+------------+------------+

Usage examples

This section shows how to use the JSON format with each import method. You can copy and adapt the snippets directly.

Stream Load

# Use JSON Path
curl --location-trusted -u <user>:<passwd> \
    -H "format: json" \
    -H "jsonpaths: [\"$.id\", \"$.city\"]" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/example_db/example_table/_stream_load

# Specify JSON root
curl --location-trusted -u <user>:<passwd> \
    -H "format: json" \
    -H "json_root: $.events" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/example_db/example_table/_stream_load

# Read JSON line by line
curl --location-trusted -u <user>:<passwd> \
    -H "format: json" \
    -H "read_json_by_line: true" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/example_db/example_table/_stream_load

Broker Load

-- Use JSON Path
LOAD LABEL example_db.example_label
(
    DATA INFILE("s3://bucket/path/example.json")
    INTO TABLE example_table
    FORMAT AS "json"
    PROPERTIES
    (
        "jsonpaths" = "[\"$.id\", \"$.city\"]"
    )
)
WITH S3
(
    ...
);

-- Specify JSON root
LOAD LABEL example_db.example_label
(
    DATA INFILE("s3://bucket/path/example.json")
    INTO TABLE example_table
    FORMAT AS "json"
    PROPERTIES
    (
        "json_root" = "$.events"
    )
)
WITH S3
(
    ...
);

Routine Load

-- Use JSON Path
CREATE ROUTINE LOAD example_db.example_job ON example_table
PROPERTIES
(
    "format" = "json",
    "jsonpaths" = "[\"$.id\", \"$.city\"]"
)
FROM KAFKA
(
    ...
);

TVF

-- Use JSON Path
INSERT INTO example_table
SELECT *
FROM S3
(
    "uri" = "s3://bucket/example.json",
    "format" = "json",
    "jsonpaths" = "[\"$.id\", \"$.city\"]",
    ...
);

-- Specify JSON root
INSERT INTO example_table
SELECT *
FROM S3
(
    "uri" = "s3://bucket/example.json",
    "format" = "json",
    "json_root" = "$.events",
    ...
);

FAQ

Q1: When importing a JSON file in array form, all data is treated as a single row. Why?

You must explicitly set strip_outer_array=true. Otherwise, Doris parses the entire outer array as a single JSON object and cannot write the data row by row.

Q2: How should I import a file with one JSON object per line (NDJSON)?

Import it directly using Format 3. read_json_by_line is enabled by default, so no extra configuration is required. If the line delimiter is not \n, specify it with line_delimiter.

Q3: How do I import the JSON object at the root node of a JSON file directly?

Set jsonpaths to $., for example PROPERTIES("jsonpaths"="$.").

Q4: How do I avoid precision loss for very large numbers or high-precision decimals?

Set num_as_string=true. Numeric values in JSON are parsed as strings and then written into the corresponding column types, which avoids numeric overflow and precision loss.

Q5: Why does Broker Load / Routine Load not allow `read_json_by_line` to be configured?

To reduce memory pressure on the BE, Broker Load and Routine Load always read line by line in streaming mode. As a result, read_json_by_line cannot be configured explicitly.

Q6: How do I speed up the import of large numbers of JSON arrays with the same structure?

Enable fuzzy_parse=true together with strip_outer_array=true to gain a 3 to 5 times improvement in import performance. The field order in every row of the array must be exactly the same.

Q7: What is the relationship between `jsonpaths` and `columns`, and which takes effect first?

jsonpaths first extracts fields from the JSON and arranges them in its specified order, and columns then maps these fields to columns in the target table. For details, see Relationship between JSON Path and Columns.

Import methods​

Supported JSON formats​

Format 1: Multiple rows represented as an Array​

Format 2: A single row represented as an Object​

Format 3: Multiple rows of Object data separated by a fixed delimiter​

Parameter configuration​

Parameter support matrix for each import method​

Detailed parameter description​

jsonpaths​

json_root​

strip_outer_array​

read_json_by_line​

fuzzy_parse​

num_as_string​

Relationship between JSON Path and Columns​

Example 1: Using JSON Path only​

Example 2: Using JSON Path together with Columns​

Example 3: Reusing a field​

Example 4: Mapping nested fields​

Usage examples​

Stream Load​

Broker Load​

Routine Load​

TVF​

FAQ​

Q1: When importing a JSON file in array form, all data is treated as a single row. Why?​

Q2: How should I import a file with one JSON object per line (NDJSON)?​

Q3: How do I import the JSON object at the root node of a JSON file directly?​

Q4: How do I avoid precision loss for very large numbers or high-precision decimals?​

Q5: Why does Broker Load / Routine Load not allow read_json_by_line to be configured?​

Q6: How do I speed up the import of large numbers of JSON arrays with the same structure?​

Q7: What is the relationship between jsonpaths and columns, and which takes effect first?​

Import methods

Supported JSON formats

Format 1: Multiple rows represented as an Array

Format 2: A single row represented as an Object

Format 3: Multiple rows of Object data separated by a fixed delimiter

Parameter configuration

Parameter support matrix for each import method

Detailed parameter description

jsonpaths

json_root

strip_outer_array

read_json_by_line

fuzzy_parse

num_as_string

Relationship between JSON Path and Columns

Example 1: Using JSON Path only

Example 2: Using JSON Path together with Columns

Example 3: Reusing a field

Example 4: Mapping nested fields

Usage examples

Stream Load

Broker Load

Routine Load

TVF

FAQ

Q1: When importing a JSON file in array form, all data is treated as a single row. Why?

Q2: How should I import a file with one JSON object per line (NDJSON)?

Q3: How do I import the JSON object at the root node of a JSON file directly?

Q4: How do I avoid precision loss for very large numbers or high-precision decimals?

Q5: Why does Broker Load / Routine Load not allow `read_json_by_line` to be configured?

Q6: How do I speed up the import of large numbers of JSON arrays with the same structure?

Q7: What is the relationship between `jsonpaths` and `columns`, and which takes effect first?