JSON_HASH
Description
JSON_HASH calculates a hash value for a JSON object. This function accepts a JSON type parameter and returns a BIGINT hash value.
When calculating the hash value of a JSON object, the function sorts the keys of the JSON object before calculation, ensuring that JSON objects with identical content but different key orders produce the same hash value.
Syntax
JSON_HASH(json_value)
Alias
JSONB_HASH
Parameters
json_value - The JSON value for which to calculate a hash value. Must be of JSON type.
Return Value
Returns a BIGINT hash value.
When the input is NULL, the function returns NULL.
Usage
Since the JSON standard specifies that key-value pairs in JSON objects are unordered, to ensure consistent identification of JSON objects with the same content across different systems, the JSON_HASH function sorts the key-value pairs before calculating the hash value, similar to calling the SORT_JSON_OBJECT_KEYS function.
Additionally, for duplicate keys in JSON objects, although Doris allows them to exist, the hash calculation only considers the first occurring key-value pair, which better matches real-world application scenarios.
Examples
- Basic hash calculation
SELECT json_hash(cast('123' as json));
+--------------------------------+
| json_hash(cast('123' as json)) |
+--------------------------------+
| 5279066513252500087 |
+--------------------------------+
- Verifying alias function
SELECT json_hash(cast('123' as json)), jsonb_hash(cast('123' as json));
+--------------------------------+---------------------------------+
| json_hash(cast('123' as json)) | jsonb_hash(cast('123' as json)) |
+--------------------------------+---------------------------------+
| 5279066513252500087 | 5279066513252500087 |
+--------------------------------+---------------------------------+
As shown, json_hash and jsonb_hash functions produce identical hash values for the same input, confirming they are equivalent alias functions.
- Key sorting verification
SELECT
json_hash(cast('{"a":123, "b":456}' as json)),
json_hash(cast('{"b":456, "a":123}' as json));
+-----------------------------------------------+-----------------------------------------------+
| json_hash(cast('{"a":123, "b":456}' as json)) | json_hash(cast('{"b":456, "a":123}' as json)) |
+-----------------------------------------------+-----------------------------------------------+
| 82454694884268544 | 82454694884268544 |
+-----------------------------------------------+-----------------------------------------------+
The json_hash function generates the same hash value regardless of key order because it sorts the keys before calculating the hash value.
- Handling duplicate keys
SELECT
json_hash(cast('{"a":123}' as json)),
json_hash(cast('{"a":456}' as json)),
json_hash(cast('{"a":123, "a":456}' as json));
+--------------------------------------+--------------------------------------+-----------------------------------------------+
| json_hash(cast('{"a":123}' as json)) | json_hash(cast('{"a":456}' as json)) | json_hash(cast('{"a":123, "a":456}' as json)) |
+--------------------------------------+--------------------------------------+-----------------------------------------------+
| -7416836614234106918 | -3126362109586887012 | -7416836614234106918 |
+--------------------------------------+--------------------------------------+-----------------------------------------------+
When a JSON object contains duplicate keys ({"a":123, "a":456}), the json_hash function only considers the first key-value pair for hash calculation. As shown, the hash value of the JSON object with duplicate keys matches that of the object containing only the first key-value pair {"a":123}.
- Different number type handling
SELECT
json_hash(to_json(cast('123' as int))),
json_hash(to_json(cast('123' as tinyint)));
+----------------------------------------+--------------------------------------------+
| json_hash(to_json(cast('123' as int))) | json_hash(to_json(cast('123' as tinyint))) |
+----------------------------------------+--------------------------------------------+
| 7882559133986259892 | 5279066513252500087 |
+----------------------------------------+--------------------------------------------+
The same numeric value 123, when stored in JSON with different types (int and tinyint), produces different hash values. This is because Doris's JSON implementation preserves type information, and the hash calculation considers these type differences.
- Using normalize_json_numbers_to_double for uniform type
SELECT
json_hash(normalize_json_numbers_to_double(to_json(cast('123' as int)))),
json_hash(normalize_json_numbers_to_double(to_json(cast('123' as tinyint))));
+--------------------------------------------------------------------------+------------------------------------------------------------------------------+
| json_hash(normalize_json_numbers_to_double(to_json(cast('123' as int)))) | json_hash(normalize_json_numbers_to_double(to_json(cast('123' as tinyint)))) |
+--------------------------------------------------------------------------+------------------------------------------------------------------------------+
| 4028523408277343359 | 4028523408277343359 |
+--------------------------------------------------------------------------+------------------------------------------------------------------------------+
This example demonstrates how to solve the above issue: use the normalize_json_numbers_to_double function to first convert all numeric values to double precision floating-point type, then calculate the hash value. This ensures consistent hash values regardless of the original numeric type.
- Handling NULL values
SELECT json_hash(null);
+-----------------+
| json_hash(null) |
+-----------------+
| NULL |
+-----------------+
Notes
-
The
JSON_HASHfunction has an aliasJSONB_HASH, both functions have identical functionality. -
This function sorts the keys of JSON objects before calculating hash values, similar to calling the
SORT_JSON_OBJECT_KEYSfunction. -
For duplicate keys in JSON objects, the function only considers the first occurring key-value pair for hash calculation.
-
Because Doris's JSON can store numbers in different types (int, tinyint, bigint, float, double, decimal), the same numeric value with different types may produce different hash values. If consistency is required, you can use the
NORMALIZE_JSON_NUMBERS_TO_DOUBLEfunction to convert all numeric values to a uniform type before calculating hash values. -
When JSON objects are created through text parsing (such as using
CASTto convert a string to JSON), Doris automatically selects the appropriate numeric type for storage, so typically you don't need to worry about numeric type inconsistency issues. -
Note that if you don't manually convert "123" to a JSON object using
cast/to_json, but instead use text conversion (parsing JSON objects from strings), Doris will only store "123" as a tinyint type JSON object, and won't have a situation where "123" is stored as both int type and tinyint type.