跳到主要内容

Paimon Catalog

Doris 支持通过多种元数据服务访问 Paimon 表元数据,并进行 Paimon 数据查询。

目前只支持 Paimon 表的读操作,未来会支持的写入 Paimon 表。

使用 Docker 快速体验 Apache Doris & Paimon

适用场景

场景说明
查询加速利用 Doris 分布式计算引擎,直接访问 Paimon 数据进行查询加速。
数据集成读取 Paimon 数据并写入到 Doris 内表。或通过 Doris 计算引擎进行 ZeroETL 操作。
数据写回暂不支持。

配置 Catalog

语法

CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
'type' = 'paimon',
'paimon.catalog.type' = '<paimon_catalog_type>',
'warehouse' = '<paimon_warehouse>'
{MetaStoreProperties},
{StorageProperties},
{CommonProperties}
);
  • <paimon_catalog_type>

    Paimon Catalog 的类型,支持以下几种:

    • filesystem:默认。直接访问文件系统上存储的元数据。

    • hms:使用 Hive Metastore 作为元数据服务。

    • dlf:使用阿里云 DLF 作为元数据服务。

  • <paimon_warehouse>

    Paimon 的仓库路径。当 <paimon_catalog_type>filesystem 时,需指定这个参数。

    warehouse 的路径必须指向 Database 路径的上一级。如您的表路径是:s3://bucket/path/to/db1/table1,那么 warehouse 应该是:s3://bucket/path/to/

  • {MetaStoreProperties}

    MetaStoreProperties 部分用于填写 Metastore 元数据服务连接和认证信息。具体可参阅【支持的元数据服务】部分。

  • {StorageProperties}

    StorageProperties 部分用于填写存储系统相关的连接和认证信息。具体可参阅【支持的存储系统】部分。

  • {CommonProperties}

    CommonProperties 部分用于填写通用属性。请参阅 数据目录概述 中【通用属性】部分。

支持的 Paimon 版本

当前依赖的 Paimon 版本为 1.0.0。

支持的 Paimon 格式

  • 支持读取 Paimon Deletion Vector

支持的元数据服务

支持的存储系统

支持的数据格式

列类型映射

Paimon TypeDoris TypeComment
booleanboolean
tinyinttinyint
smallintsmallint
integerint
bigintbigint
floatfloat
doubledouble
decimal(P, S)decimal(P, S)
varcharstring
charstring
bianrystring
varbinarystring
datedate
timestamp_without_time_zonedatetime(N)会根据精度进行对应映射。如果精度大于 6,则最大映射到 6。(可能导致精度丢失)
timestamp_with_local_time_zonedatetime(N)会根据精度进行对应映射。如果精度大于 6,则最大映射到 6。(可能导致精度丢失)
arrayarray
mapmap
rowstruct
otherUNSUPPORTED

基础示例

Paimon on HDFS

CREATE CATALOG paimon_hdfs PROPERTIES (
'type' = 'paimon',
'warehouse' = 'hdfs://HDFS8000871/user/paimon',
'dfs.nameservices' = 'HDFS8000871',
'dfs.ha.namenodes.HDFS8000871' = 'nn1,nn2',
'dfs.namenode.rpc-address.HDFS8000871.nn1' = '172.21.0.1:4007',
'dfs.namenode.rpc-address.HDFS8000871.nn2' = '172.21.0.2:4007',
'dfs.client.failover.proxy.provider.HDFS8000871' = 'org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'hadoop.username' = 'hadoop'
);

Paimon on HMS

CREATE CATALOG paimon_hms PROPERTIES (
'type' = 'paimon',
'paimon.catalog.type' = 'hms',
'warehouse' = 'hdfs://HDFS8000871/user/zhangdong/paimon2',
'hive.metastore.uris' = 'thrift://172.21.0.44:7004',
'dfs.nameservices' = 'HDFS8000871',
'dfs.ha.namenodes.HDFS8000871' = 'nn1,nn2',
'dfs.namenode.rpc-address.HDFS8000871.nn1' = '172.21.0.1:4007',
'dfs.namenode.rpc-address.HDFS8000871.nn2' = '172.21.0.2:4007',
'dfs.client.failover.proxy.provider.HDFS8000871' = 'org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'hadoop.username' = 'hadoop'
);

Paimon on DLF

CREATE CATALOG paimon_dlf PROPERTIES (
'type' = 'paimon',
'paimon.catalog.type' = 'dlf',
'warehouse' = 'oss://xx/yy/',
'dlf.proxy.mode' = 'DLF_ONLY',
'dlf.uid' = 'xxxxx',
'dlf.region' = 'cn-beijing',
'dlf.access_key' = 'ak',
'dlf.secret_key' = 'sk'
);

Paimon on Google Dataproc Metastore

CREATE CATALOG `paimon_gms` PROPERTIES (
"type" = "paimon",
"paimon.catalog.type" = "hms",
"hive.metastore.uris" = "thrift://ip:port",
"warehouse" = "gs://bucket/warehouse",
"s3.access_key" = "ak",
"s3.secret_key" = "sk",
"s3.region" = "region",
"s3.endpoint" = "storage.googleapis.com"
);

Paimon on Google Cloud Storage

CREATE CATALOG `paimon_gcs` PROPERTIES (
"type" = "paimon",
"warehouse" = "gs://bucket/warehouse",
"s3.access_key" = "ak",
"s3.secret_key" = "sk",
"s3.region" = "region",
"s3.endpoint" = "storage.googleapis.com"
);

查询操作

基础查询

配置好 Catalog 后,可以通过以下方式查询 Catalog 中的表数据:

-- 1. switch to catalog, use database and query
SWITCH paimon_ctl;
USE paimon_db;
SELECT * FROM paimon_tbl LIMIT 10;

-- 2. use paimon database directly
USE paimon_ctl.paimon_db;
SELECT * FROM paimon_tbl LIMIT 10;

-- 3. use full qualified name to query
SELECT * FROM paimon_ctl.paimon_db.paimon_tbl LIMIT 10;

增量查询

该功能自 3.1.0 版本支持

支持类似 Flink 针对 Paimon 的 Batch Incremental 查询。

支持查询指定的快照或时间戳区间内的增量数据。区间为左闭右开区间。

-- read from snapshot 2
SELECT * FROM paimon_table@incr('startSnapshotId'='2');

-- between snapshots [0, 5)
SELECT * FROM paimon_table@incr('startSnapshotId'='0', 'endSnapshotId'='5');

-- between snapshots [0, 5) with specified scan mode
SELECT * FROM paimon_table@incr('startSnapshotId'='0', 'endSnapshotId'='5', 'incrementalBetweenScanMode'='diff');

-- read from start timestamp
SELECT * FROM paimon_table@incr('startTimestamp'='1750844949');

-- read between timestamp
SELECT * FROM paimon_table@incr('startTimestamp'='1750844949', 'endTimestamp'='1750944949');

参数说明:

参数说明示例
startSnapshotId起始快照 ID,必须大于 0'startSnapshotId'='3'
endSnapshotId结束快照 ID,必须大于 startSnapshotId。可选,如不指定,则表示从 startSnapshotId 开始读取到最新的快照'endSnapshotId'='10'
incrementalBetweenScanMode指定增量读取的模式,默认 auto,支持 deltachangelogdiff'incrementalBetweenScanMode'='delta'
startTimestamp起始快照时间,必须大于等于 0'startTimestamp'='1750844949'
endTimestamp结束快照时间,必须大于 startTimestamp。可选,如不指定,则表示从 startTimestamp 开始读取到最新的快照'endTimestamp'='1750944949'

注:

  • startSnapshotIdendSnapshotId 会组成 Paimon 参数 'incremental-between'='3,10'
  • startTimestampendTimestamp 会组成 Paimon 参数 'incremental-between-timestamp'='1750844949,1750944949'
  • incrementalBetweenScanMode 对应 Paimon 参数 incremental-between-scan-mode

可参阅 Paimon 文档 进一步了解这些参数。

系统表

该功能自 3.1.0 版本支持

Doris 支持查询 Paimon 系统表,用来查询表的相关元信息。支持使用系统表查看快照历史、清单文件、数据文件、分区等信息。

要访问 Paimon 表的元数据,可以在表名后添加 $ 符号,后跟系统表名称:

SELECT * FROM my_table$system_table_name;

例如,要查看表的审计记录,可以执行:

SELECT * FROM my_table$audit_log;

注意点:Doris 不支持读取 Paimon 全局系统表,其只在 Flink 中支持。

schemas

显示表的当前和历史模式信息。当使用 ALTER TABLECREATE TABLE ASCREATE DATABASE AS 语句修改表模式时,每次修改都会在 schemas 表中生成一条记录:

SELECT * FROM my_table$schemas;

结果:

+-----------+--------------------------------------------------------------------------------------------------------------------+----------------+--------------+---------+---------+-------------------------+
| schema_id | fields | partition_keys | primary_keys | options | comment | update_time |
+-----------+--------------------------------------------------------------------------------------------------------------------+----------------+--------------+---------+---------+-------------------------+
| 0 | [{"id":0,"name":"k","type":"INT NOT NULL"},{"id":1,"name":"f0","type":"INT"},{"id":2,"name":"f1","type":"STRING"}] | [] | ["k"] | {} | | 2025-03-04 22:48:41.666 |
+-----------+--------------------------------------------------------------------------------------------------------------------+----------------+--------------+---------+---------+-------------------------+

snapshots

显示表的所有有效快照信息,包括快照创建时间、提交用户、操作类型等:

SELECT * FROM my_table$snapshots;

结果:

+-------------+-----------+--------------------------------------+---------------------+-------------+-------------------------+------------------------------------------------------+------------------------------------------------------+-------------------------+--------------------+--------------------+------------------------+----------------------+
| snapshot_id | schema_id | commit_user | commit_identifier | commit_kind | commit_time | base_manifest_list | delta_manifest_list | changelog_manifest_list | total_record_count | delta_record_count | changelog_record_count | watermark |
+-------------+-----------+--------------------------------------+---------------------+-------------+-------------------------+------------------------------------------------------+------------------------------------------------------+-------------------------+--------------------+--------------------+------------------------+----------------------+
| 1 | 0 | d7ea4996-92c7-469f-b9ff-c76525954f1c | 9223372036854775807 | APPEND | 2025-03-04 22:48:45.575 | manifest-list-dc5490ba-420c-445a-b6f7-6962d394935c-0 | manifest-list-dc5490ba-420c-445a-b6f7-6962d394935c-1 | NULL | 1 | 1 | 0 | -9223372036854775808 |
| 2 | 0 | 34de47f6-31d1-4f06-b378-c85ef4fbca41 | 9223372036854775807 | APPEND | 2025-07-01 23:11:35.406 | manifest-list-dca6aa5b-6fc6-4b4f-ac22-acfa15bbf171-0 | manifest-list-dca6aa5b-6fc6-4b4f-ac22-acfa15bbf171-1 | NULL | 2 | 1 | 0 | -9223372036854775808 |
| 3 | 0 | 89f67183-a1f8-4ee9-b73c-3f7e992b79a7 | 9223372036854775807 | APPEND | 2025-07-01 23:11:45.114 | manifest-list-6d624d1b-c774-4d95-905e-8258a7b89ecb-0 | manifest-list-6d624d1b-c774-4d95-905e-8258a7b89ecb-1 | NULL | 3 | 1 | 0 | -9223372036854775808 |
| 4 | 0 | 31924a7c-1389-490c-adf1-3bb805b33cd7 | 9223372036854775807 | APPEND | 2025-07-01 23:12:42.042 | manifest-list-09097a51-afde-485e-929b-d2cc39eb6eb2-0 | manifest-list-09097a51-afde-485e-929b-d2cc39eb6eb2-1 | NULL | 5 | 2 | 0 | -9223372036854775808 |
| 5 | 0 | 1e90a80b-41cb-4242-b97c-889728f76810 | 9223372036854775807 | APPEND | 2025-07-01 23:14:26.445 | manifest-list-b8471969-9c4d-41cd-b790-64f6efb2d142-0 | manifest-list-b8471969-9c4d-41cd-b790-64f6efb2d142-1 | NULL | 6 | 1 | 0 | -9223372036854775808 |
| 6 | 0 | 1e90a80b-41cb-4242-b97c-889728f76810 | 9223372036854775807 | COMPACT | 2025-07-01 23:14:29.317 | manifest-list-b8471969-9c4d-41cd-b790-64f6efb2d142-2 | manifest-list-b8471969-9c4d-41cd-b790-64f6efb2d142-3 | NULL | 5 | -1 | 0 | -9223372036854775808 |
+-------------+-----------+--------------------------------------+---------------------+-------------+-------------------------+------------------------------------------------------+------------------------------------------------------+-------------------------+--------------------+--------------------+------------------------+----------------------+

options

显示表的当前配置选项。如果表选项未包含在表中,则该选项设置为默认值:

SELECT * FROM my_table$options;

结果:

+------------------------+--------------------+
| key | value |
+------------------------+--------------------+
| snapshot.time-retained | 5 h |
+------------------------+--------------------+

files

显示当前快照指向的所有数据文件信息,包括文件格式、记录数、文件大小等:

SELECT * FROM my_table$files;

结果:

mysql> SELECT * FROM my_table$files;
+-----------+--------+------------------------------------------------------------------------------------------------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+-------------------+---------------------+---------------------+---------------------+---------------------+-------------------------+-------------+
| partition | bucket | file_path | file_format | schema_id | level | record_count | file_size_in_bytes | min_key | max_key | null_value_counts | min_value_stats | max_value_stats | min_sequence_number | max_sequence_number | creation_time | file_source |
+-----------+--------+------------------------------------------------------------------------------------------------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+-------------------+---------------------+---------------------+---------------------+---------------------+-------------------------+-------------+
| {} | 0 | s3://paimon-warehouse-dev/test-flink/cookbook.db/my_table/bucket-0/data-b4a49c57-6ef6-4c04-8813-07a4960d987c-0.parquet | parquet | 0 | 5 | 5 | 1321 | [1] | [6] | {f0=0, f1=0, k=0} | {f0=4, f1=111, k=1} | {f0=11, f1=k7, k=6} | 0 | 5 | 2025-07-01 23:14:23.967 | COMPACT |
+-----------+--------+------------------------------------------------------------------------------------------------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+-------------------+---------------------+---------------------+---------------------+---------------------+-------------------------+-------------+

tags

显示表的所有标签信息,包括标签名称和关联的快照:

SELECT * FROM my_table$tags;

结果:

+----------+-------------+-----------+-------------------------+--------------+--------------+
| tag_name | snapshot_id | schema_id | commit_time | record_count | branches |
+----------+-------------+-----------+-------------------------+--------------+--------------+
| tag1 | 1 | 0 | 2025-03-04 14:55:29.344 | 3 | [] |
| tag3 | 3 | 0 | 2025-03-04 14:58:24.691 | 7 | [branch-1] |
+----------+-------------+-----------+-------------------------+--------------+--------------+

branches

显示表的所有已知分支信息:

SELECT * FROM my_table$branches;

结果:

+----------------------+-------------------------+
| branch_name | create_time |
+----------------------+-------------------------+
| branch1 | 2025-03-04 20:31:39.084 |
| branch2 | 2025-03-04 21:11:14.373 |
+----------------------+-------------------------+

consumers

显示表的消费者信息,用于跟踪数据消费情况:

SELECT * FROM my_table$consumers;

结果:

+-------------+------------------+
| consumer_id | next_snapshot_id |
+-------------+------------------+
| id1 | 1 |
| id2 | 3 |
+-------------+------------------+

manifests

显示表当前快照的清单文件信息:

SELECT * FROM my_table$manifests;

结果:

+-------------------------------------------------+-----------+-----------------+-------------------+-----------+---------------------+---------------------+
| file_name | file_size | num_added_files | num_deleted_files | schema_id | min_partition_stats | max_partition_stats |
+-------------------------------------------------+-----------+-----------------+-------------------+-----------+---------------------+---------------------+
| manifest-3df9bb64-5c11-4aef-994e-d8717fedfc70-0 | 1949 | 1 | 0 | 0 | {} | {} |
| manifest-d7eb4ec4-7238-478a-9ae6-91a4ccebd561-0 | 1946 | 1 | 0 | 0 | {} | {} |
| manifest-3b6f4079-c893-4413-aedc-1e8fbcea6db1-0 | 1948 | 1 | 0 | 0 | {} | {} |
| manifest-abe5177f-82da-4e86-9864-40efffb391bd-0 | 1964 | 1 | 0 | 0 | {} | {} |
| manifest-ee89dff3-a523-4655-a4b8-d7c9e471a1d6-0 | 1949 | 1 | 0 | 0 | {} | {} |
| manifest-ee89dff3-a523-4655-a4b8-d7c9e471a1d6-1 | 2232 | 1 | 5 | 0 | {} | {} |
+-------------------------------------------------+-----------+-----------------+-------------------+-----------+---------------------+---------------------+

aggregation_fields

显示表的聚合字段信息,用于聚合表的字段配置:

SELECT * FROM my_table$aggregation_fields;

结果:

+------------+--------------+----------+------------------+---------+
| field_name | field_type | function | function_options | comment |
+------------+--------------+----------+------------------+---------+
| k | INT NOT NULL | [] | [] | NULL |
| f0 | INT | [] | [] | NULL |
| f1 | STRING | [] | [] | NULL |
+------------+--------------+----------+------------------+---------+

partitions

显示表的分区信息,包括每个分区的总记录数和文件总大小:

SELECT * FROM my_table$partitions;

结果:

+-----------+--------------+--------------------+------------+-------------------------+
| partition | record_count | file_size_in_bytes | file_count | last_update_time |
+-----------+--------------+--------------------+------------+-------------------------+
| {} | 5 | 1321 | 1 | 2025-07-01 23:14:23.967 |
+-----------+--------------+--------------------+------------+-------------------------+

buckets

显示表的分桶信息,包括每个分桶的统计数据:

SELECT * FROM my_table$buckets;

结果:

+-----------+--------+--------------+--------------------+------------+-------------------------+
| partition | bucket | record_count | file_size_in_bytes | file_count | last_update_time |
+-----------+--------+--------------+--------------------+------------+-------------------------+
| {} | 0 | 5 | 1321 | 1 | 2025-07-01 23:14:23.967 |
+-----------+--------+--------------+--------------------+------------+-------------------------+

statistics

显示表的统计信息,包括行数、数据大小等统计数据:

SELECT * FROM my_table$statistics;

结果:

+--------------+------------+-----------------------+------------------+----------+
| snapshot_id | schema_id | mergedRecordCount | mergedRecordSize | colstat |
+--------------+------------+-----------------------+------------------+----------+
| 2 | 0 | 2 | 2 | {} |
+--------------+------------+-----------------------+------------------+----------+

table_indexes

显示表的索引信息:

SELECT * FROM my_table$table_indexes;

结果:

+--------------------------------+-------------+--------------------------------+--------------------------------+----------------------+----------------------+--------------------------------+
| partition | bucket | index_type | file_name | file_size | row_count | dv_ranges |
+--------------------------------+-------------+--------------------------------+--------------------------------+----------------------+----------------------+--------------------------------+
| {2025-03-01} | 0 | HASH | index-70abfebf-149e-4796-9f... | 12 | 3 | <NULL> |
| {2025-04-01} | 0 | DELETION_VECTORS | index-633857e7-cdce-47d2-87... | 33 | 1 | [(data-346cb9c8-4032-4d66-a... |
+--------------------------------+-------------+--------------------------------+--------------------------------+----------------------+----------------------+--------------------------------+

系统表使用场景

通过系统表,您可以轻松完成以下运维和监控任务。

查看表的最新快照信息,了解表的当前状态

SELECT snapshot_id, commit_time, commit_kind, total_record_count FROM catalog_sales$snapshots ORDER BY snapshot_id DESC;

结果:

+-------------+-------------------------+-------------+--------------------+
| snapshot_id | commit_time | commit_kind | total_record_count |
+-------------+-------------------------+-------------+--------------------+
| 1 | 2025-07-01 21:21:54.179 | APPEND | 14329288 |
+-------------+-------------------------+-------------+--------------------+

查看快照的表信息

SELECT s.snapshot_id, t.schema_id, t.fields FROM store_sales$snapshots s JOIN store_sales$schemas t ON s.schema_id=t.schema_id;

结果:

+-------------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| snapshot_id | schema_id | fields |
+-------------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | 0 | [{"id":0,"name":"ss_sold_date_sk","type":"INT"},{"id":1,"name":"ss_item_sk","type":"INT NOT NULL"},{"id":2,"name":"ss_ticket_number","type":"INT NOT NULL"},{"id":3,"name":"ss_sold_time_sk","type":"INT"},{"id":4,"name":"ss_customer_sk","type":"INT"},{"id":5,"name":"ss_cdemo_sk","type":"INT"},{"id":6,"name":"ss_hdemo_sk","type":"INT"},{"id":7,"name":"ss_addr_sk","type":"INT"},{"id":8,"name":"ss_store_sk","type":"INT"},{"id":9,"name":"ss_promo_sk","type":"INT"},{"id":10,"name":"ss_quantity","type":"INT"},{"id":11,"name":"ss_wholesale_cost","type":"DECIMAL(7, 2)"},{"id":12,"name":"ss_list_price","type":"DECIMAL(7, 2)"},{"id":13,"name":"ss_sales_price","type":"DECIMAL(7, 2)"},{"id":14,"name":"ss_ext_discount_amt","type":"DECIMAL(7, 2)"},{"id":15,"name":"ss_ext_sales_price","type":"DECIMAL(7, 2)"},{"id":16,"name":"ss_ext_wholesale_cost","type":"DECIMAL(7, 2)"},{"id":17,"name":"ss_ext_list_price","type":"DECIMAL(7, 2)"},{"id":18,"name":"ss_ext_tax","type":"DECIMAL(7, 2)"},{"id":19,"name":"ss_coupon_amt","type":"DECIMAL(7, 2)"},{"id":20,"name":"ss_net_paid","type":"DECIMAL(7, 2)"},{"id":21,"name":"ss_net_paid_inc_tax","type":"DECIMAL(7, 2)"},{"id":22,"name":"ss_net_profit","type":"DECIMAL(7, 2)"}] |
| 2 | 0 | [{"id":0,"name":"ss_sold_date_sk","type":"INT"},{"id":1,"name":"ss_item_sk","type":"INT NOT NULL"},{"id":2,"name":"ss_ticket_number","type":"INT NOT NULL"},{"id":3,"name":"ss_sold_time_sk","type":"INT"},{"id":4,"name":"ss_customer_sk","type":"INT"},{"id":5,"name":"ss_cdemo_sk","type":"INT"},{"id":6,"name":"ss_hdemo_sk","type":"INT"},{"id":7,"name":"ss_addr_sk","type":"INT"},{"id":8,"name":"ss_store_sk","type":"INT"},{"id":9,"name":"ss_promo_sk","type":"INT"},{"id":10,"name":"ss_quantity","type":"INT"},{"id":11,"name":"ss_wholesale_cost","type":"DECIMAL(7, 2)"},{"id":12,"name":"ss_list_price","type":"DECIMAL(7, 2)"},{"id":13,"name":"ss_sales_price","type":"DECIMAL(7, 2)"},{"id":14,"name":"ss_ext_discount_amt","type":"DECIMAL(7, 2)"},{"id":15,"name":"ss_ext_sales_price","type":"DECIMAL(7, 2)"},{"id":16,"name":"ss_ext_wholesale_cost","type":"DECIMAL(7, 2)"},{"id":17,"name":"ss_ext_list_price","type":"DECIMAL(7, 2)"},{"id":18,"name":"ss_ext_tax","type":"DECIMAL(7, 2)"},{"id":19,"name":"ss_coupon_amt","type":"DECIMAL(7, 2)"},{"id":20,"name":"ss_net_paid","type":"DECIMAL(7, 2)"},{"id":21,"name":"ss_net_paid_inc_tax","type":"DECIMAL(7, 2)"},{"id":22,"name":"ss_net_profit","type":"DECIMAL(7, 2)"}] |
+-------------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

查看 buckets 的数据分布情况

SELECT `bucket` , COUNT(*) as file_count, SUM(file_size_in_bytes)/1024/1024 as total_size_mb from paimon_s3.tpcds.catalog_sales$files GROUP BY `bucket`  ORDER BY total_size_mb;

注意:Paimon 系统表中的许多字段是 Doris 中的关键字,因此需要加上 `。

结果:

+--------+------------+--------------------+
| bucket | file_count | total_size_mb |
+--------+------------+--------------------+
| 35 | 1 | 12.144722938537598 |
| 81 | 1 | 12.143454551696777 |
| 37 | 1 | 12.14071273803711 |
| 36 | 1 | 12.139023780822754 |
| 63 | 1 | 12.137332916259766 |
| 7 | 1 | 12.122495651245117 |
| 15 | 1 | 12.117934226989746 |
| 11 | 1 | 12.116133689880371 |
| 12 | 1 | 12.11155891418457 |
| 46 | 1 | 12.111005783081055 |
+--------+------------+--------------------+

附录