Skip to main content

PERCENTILE

Description

Calculates the exact percentile, suitable for small datasets. First sorts the specified column in descending order, then takes the exact p-th percentile. The value of p is between 0 and 1. If p does not point to an exact position, it returns the linear interpolation of the adjacent values at position p. Note that this is not the average of the two numbers. Special cases:

Syntax

PERCENTILE(<col>, <p>)

Parameters

ParameterDescription
<col>The column to calculate the exact percentile for. Supported types: Double, Float, LargeInt, BigInt, Int, SmallInt, TinyInt.
<p>The exact percentile to be calculated, a constant value. Supported type: Double. Range: [0.0, 1.0]. The second parameter must be a constant.

Return Value

Return the exact percentile of the specified column, with a return type of Double. If there is no valid data in the group, returns NULL.

Examples

-- Setup
CREATE TABLE sales_data
(
product_id INT,
sale_price DECIMAL(10, 2)
) DUPLICATE KEY(`product_id`)
DISTRIBUTED BY HASH(`product_id`) BUCKETS AUTO
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
INSERT INTO sales_data VALUES
(1, 10.00),
(1, 15.00),
(1, 20.00),
(1, 25.00),
(1, 30.00),
(1, 35.00),
(1, 40.00),
(1, 45.00),
(1, 50.00),
(1, 100.00);
SELECT 
percentile(sale_price, 0.5) as median_price, -- Median
percentile(sale_price, 0.75) as p75_price, -- 75th percentile
percentile(sale_price, 0.90) as p90_price, -- 90th percentile
percentile(sale_price, 0.95) as p95_price, -- 95th percentile
percentile(null, 0.99) as p99_null -- Null value at 99th percentile
FROM sales_data;

Calculates sale prices at different percentiles.

+--------------+-----------+-------------------+-------------------+----------+
| median_price | p75_price | p90_price | p95_price | p99_null |
+--------------+-----------+-------------------+-------------------+----------+
| 32.5 | 43.75 | 54.99999999999998 | 77.49999999999994 | NULL |
+--------------+-----------+-------------------+-------------------+----------+
select percentile(if(sale_price>90,sale_price,NULL), 0.5) from sales_data;

Only non-NULL input values are considered in the calculation.

+----------------------------------------------------+
| percentile(if(sale_price>90,sale_price,NULL), 0.5) |
+----------------------------------------------------+
| 100 |
+----------------------------------------------------+
select percentile(sale_price, NULL) from sales_data;

If all input values are NULL, returns NULL.

+------------------------------+
| percentile(sale_price, NULL) |
+------------------------------+
| NULL |
+------------------------------+