PARQUET_META

The parquet_meta table-valued function (TVF) reads the footer metadata of Parquet files without scanning data pages. It gives quick access to Row Group statistics, the schema, file-level metadata, key-value (KV) metadata, and Bloom Filter probe results.

This is an experimental feature, supported since version 4.0.3.

Syntax

PARQUET_META(
"uri" = "<uri>",
"mode" = "<mode>",
{OptionalParameters},
{ConnectionParameters}
);
  • uri

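    Path of the Parquet file to inspect. It can be a local path without a scheme or a URI with a scheme (for example s3://), and it may contain wildcards.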

  • mode

    Metadata query mode. Optional; defaults to parquet_metadata. See the "Supported Modes" section for the valid values.

  • {OptionalParameters}

    • column: Required when mode is parquet_bloom_probe, specifies the column name to probe.
    • value: Required when mode is parquet_bloom_probe, specifies the literal value to probe.
  • {ConnectionParameters}

    Parameters required to access the storage system where the file is located, such as the s3.* properties shown in the "Examples" section.

Supported Modes

parquet_metadata

Default mode.

This mode queries the Row Group and column metadata stored in Parquet files, such as the statistics recorded for each column. The output helps determine which data-skipping operations can be applied to a file, and it gives a quick view of the content of each column.

Field Name                  Type
file_name                   STRING
row_group_id                BIGINT
row_group_num_rows          BIGINT
row_group_num_columns       BIGINT
row_group_bytes             BIGINT
column_id                   BIGINT
file_offset                 BIGINT
num_values                  BIGINT
path_in_schema              STRING
type                        STRING
stats_min                   STRING
stats_max                   STRING
stats_null_count            BIGINT
stats_distinct_count        BIGINT
stats_min_value             STRING
stats_max_value             STRING
compression                 STRING
encodings                   STRING
index_page_offset           BIGINT
dictionary_page_offset      BIGINT
data_page_offset            BIGINT
total_compressed_size       BIGINT
total_uncompressed_size     BIGINT
key_value_metadata          MAP<VARBINARY, VARBINARY>
bloom_filter_offset         BIGINT
bloom_filter_length         BIGINT
min_is_exact                BOOLEAN
max_is_exact                BOOLEAN
row_group_compressed_bytes  BIGINT
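
As an illustration of how these fields can be used together, the following sketch lists the value range, null count, and sizes of each column in each Row Group. It assumes a local file at /path/to/test.parquet (the same placeholder path used in the Examples section); all selected names come from the field list above.

```sql
-- Sketch: per-column statistics for every Row Group of one file.
-- The uri is a placeholder path, not a real file.
SELECT
    row_group_id,
    path_in_schema,
    num_values,
    stats_min_value,
    stats_max_value,
    stats_null_count,
    total_compressed_size,
    total_uncompressed_size
FROM parquet_meta(
    "uri" = "/path/to/test.parquet",
    "mode" = "parquet_metadata"
)
ORDER BY row_group_id, column_id;
```

Comparing stats_min_value and stats_max_value across Row Groups gives a rough idea of whether a range predicate on a column could skip any of them.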

parquet_schema

This mode queries the internal schema of Parquet files. Note that the result is the schema as it is stored in the Parquet file metadata.

Field Name       Type
file_name        VARCHAR
name             VARCHAR
type             VARCHAR
type_length      BIGINT
repetition_type  VARCHAR
num_children     BIGINT
converted_type   VARCHAR
scale            BIGINT
precision        BIGINT
field_id         BIGINT
logical_type     VARCHAR

parquet_file_metadata

This mode can be used to query file-level metadata, such as the format version and encryption algorithm used.

Field Name                   Type
file_name                    STRING
created_by                   STRING
num_rows                     BIGINT
num_row_groups               BIGINT
format_version               BIGINT
encryption_algorithm         STRING
footer_signing_key_metadata  STRING
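
Because this mode returns one row per file, it can be combined with a wildcard uri to summarize a set of files. The query below is a minimal sketch under that assumption; the bucket, path, and connection values are placeholders in the style of the Examples section.

```sql
-- Sketch: total rows and Row Groups across all files matching a glob.
-- Bucket, path, and credentials are placeholders.
SELECT
    COUNT(*)            AS files,
    SUM(num_rows)       AS total_rows,
    SUM(num_row_groups) AS total_row_groups
FROM parquet_meta(
    "uri" = "s3://bucket/path/*meta.parquet",
    "mode" = "parquet_file_metadata",
    "s3.access_key" = "...",
    "s3.secret_key" = "...",
    "s3.endpoint" = "s3.xxx.com",
    "s3.region" = "us-east-1"
);
```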

parquet_kv_metadata

This mode can be used to query custom metadata defined as key-value pairs.

Field Name  Type
file_name   STRING
key         STRING
value       STRING
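
A lookup of a single entry might look like the sketch below. The key name 'example.key' is hypothetical, and the file path is a placeholder. The column names are quoted with backticks as a precaution, in case key or value is treated as a keyword by the parser.

```sql
-- Sketch: fetch one custom key-value entry from a file.
-- 'example.key' is a hypothetical key name used only for illustration.
SELECT file_name, `key`, `value`
FROM parquet_meta(
    "uri" = "/path/to/test.parquet",
    "mode" = "parquet_kv_metadata"
)
WHERE `key` = 'example.key';
```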

parquet_bloom_probe

Doris supports using the Bloom filters stored in Parquet files for data filtering and pruning. This mode probes the Bloom filter of each Row Group to check whether a given value of a specified column can be ruled out.

Field Name             Type
file_name              STRING
row_group_id           INT
bloom_filter_excludes  INT

Meaning of bloom_filter_excludes:

  • 1: Bloom Filter determines that this Row Group definitely does not contain this value
  • 0: Bloom Filter determines that it may contain this value
  • -1: File does not have a Bloom Filter
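
The three result codes can be summarized per file, for example to see how many Row Groups a lookup for one value could skip. The following is a sketch only: the path is a placeholder, and the column name col and the value 500 are taken from the probe example in the Examples section.

```sql
-- Sketch: count Row Groups by probe outcome for column "col" and value 500.
-- 1 = definitely absent, 0 = possibly present, -1 = no Bloom Filter.
SELECT
    file_name,
    COUNT(*) AS row_groups,
    SUM(CASE WHEN bloom_filter_excludes = 1  THEN 1 ELSE 0 END) AS skippable_row_groups,
    SUM(CASE WHEN bloom_filter_excludes = 0  THEN 1 ELSE 0 END) AS possible_match_row_groups,
    SUM(CASE WHEN bloom_filter_excludes = -1 THEN 1 ELSE 0 END) AS row_groups_without_filter
FROM parquet_meta(
    "uri" = "/path/to/test.parquet",
    "mode" = "parquet_bloom_probe",
    "column" = "col",
    "value" = 500
)
GROUP BY file_name;
```

A result of 0 does not prove that the value is present, since Bloom filters can return false positives.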

Examples

  • Local file (without scheme)

    SELECT * FROM parquet_meta(
    "uri" = "/path/to/test.parquet"
    );
  • S3 file (with scheme + storage parameters)

    SELECT * FROM parquet_meta(
    "uri" = "s3://bucket/path/test.parquet",
    "mode" = "parquet_schema",
    "s3.access_key" = "...",
    "s3.secret_key" = "...",
    "s3.endpoint" = "s3.xxx.com",
    "s3.region" = "us-east-1"
    );
  • Using wildcards (glob)

    SELECT file_name FROM parquet_meta(
    "uri" = "s3://bucket/path/*meta.parquet",
    "mode" = "parquet_file_metadata"
    );
  • Using parquet_bloom_probe mode

    SELECT * FROM parquet_meta(
    "uri" = "${basePath}/bloommeta.parquet",
    "mode" = "parquet_bloom_probe",
    "column" = "col",
    "value" = 500,
    "s3.access_key" = "${ak}",
    "s3.secret_key" = "${sk}",
    "s3.endpoint" = "${endpoint}",
    "s3.region" = "${region}"
    );

Notes and Limitations

  • parquet_meta reads only the Parquet footer metadata, not data pages, which makes it suitable for quickly viewing metadata.
  • The uri parameter supports wildcards (such as *, {}, []). If no file matches, an error is reported.