Jemalloc Memory Analysis
Doris uses Jemalloc as the general memory allocator by default. The memory occupied by Jemalloc itself includes Cache and Metadata. Cache includes Thread Cache and Dirty Page. You can view the original profile of the memory allocator in real time at http://{be_host}:{be_web_server_port}/memz.
Jemalloc Cache Memory Analysis
If the Memory Tracker Label=tc/jemalloc_cache, Type=overview shows a large value, the Jemalloc or TCMalloc Cache is using a lot of memory. Doris uses Jemalloc as its default allocator, so this section only analyzes the case where the Jemalloc Cache uses a lot of memory.
MemTrackerLimiter Label=tc/jemalloc_cache, Type=overview, Limit=-1.00 B(-1 B), Used=410.44 MB(430376896 B), Peak=-1.00 B(-1 B)
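To compare tracker values across BE nodes or over time, it can help to extract the exact byte count from such a line. The following is a minimal sketch that assumes the line has the same shape as the sample above; the function name `tracker_used_bytes` is hypothetical and not part of Doris.

```python
import re

# Matches the shape of the sample line: "Label=<label>, ... Used=<human>(<bytes> B)".
# ASSUMPTION: real tracker lines follow the same layout as the sample in this document.
_TRACKER_RE = re.compile(r"Label=(?P<label>[^,]+),.*?Used=[^(]*\((?P<used>-?\d+) B\)")


def tracker_used_bytes(line):
    """Return (label, used_bytes) parsed from one MemTrackerLimiter line."""
    match = _TRACKER_RE.search(line)
    if match is None:
        raise ValueError("line does not look like a MemTrackerLimiter entry")
    return match.group("label").strip(), int(match.group("used"))


sample = (
    "MemTrackerLimiter Label=tc/jemalloc_cache, Type=overview, "
    "Limit=-1.00 B(-1 B), Used=410.44 MB(430376896 B), Peak=-1.00 B(-1 B)"
)
label, used = tracker_used_bytes(sample)
print(label, used)
```

The exact byte count in parentheses is used rather than the rounded human-readable value, so small changes between samples are not lost.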
Before Doris 2.1.6, Label=tc/jemalloc_cache also included Jemalloc Metadata, so a large Label=tc/jemalloc_cache value is likely caused by heavy Jemalloc Metadata memory usage. In that case, refer to the analysis of the Label=tc/jemalloc_metadata Memory Tracker.
While the BE process is running, the Jemalloc Cache consists of two parts.
- Thread Cache: caches a specified number of Pages in the Thread Cache. Refer to Jemalloc opt.tcache.
- Dirty Page: all memory Pages in the Arenas that can be reused.
Jemalloc Cache View Method
View the Doris BE web page http://{be_host}:{be_web_server_port}/memz (webserver_port defaults to 8040) to obtain the Jemalloc Profile, then interpret Jemalloc Cache usage from the following key information.
- tcache_bytes in the Jemalloc Profile is the total number of bytes in the Jemalloc Thread Cache. A large tcache_bytes value means the Jemalloc Thread Cache is using too much memory.
- A large sum of the dirty column in the extents table of the Jemalloc Profile means the Jemalloc Dirty Pages are using too much memory.
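The first check can be scripted. The sketch below builds the /memz URL described above and pulls tcache_bytes out of the profile text. The function names are hypothetical, and the line format `tcache_bytes: <integer>` is an assumption about the profile output that should be verified against a real profile.

```python
import re
from urllib.request import urlopen


def memz_url(be_host, be_web_server_port=8040):
    """Build the profile URL; 8040 is the default web server port named above."""
    return f"http://{be_host}:{be_web_server_port}/memz"


def fetch_profile(be_host, be_web_server_port=8040, timeout=10.0):
    """Download the raw Jemalloc Profile text from a BE node."""
    with urlopen(memz_url(be_host, be_web_server_port), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def tcache_bytes(profile_text):
    """Return tcache_bytes as an int, or None if the profile has no such line.

    ASSUMPTION: the profile contains a line shaped like "tcache_bytes: <integer>".
    """
    match = re.search(r"tcache_bytes:\s*(\d+)", profile_text)
    return int(match.group(1)) if match else None
```

For example, `tcache_bytes(fetch_profile("127.0.0.1"))` would return the value for a BE running locally, assuming the page is reachable without authentication.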
Thread Cache Memory is Too Large
The likely cause is that the Thread Cache holds a large number of large pages, because its upper limit is a number of pages, not a total number of bytes.
Consider reducing lg_tcache_max in JEMALLOC_CONF in be.conf. lg_tcache_max is the base-2 logarithm of the largest page size, in bytes, that is allowed to be cached. The default value is 15, that is, 32 KB (2^15). Pages exceeding this size will not be cached in the Thread Cache. lg_tcache_max corresponds to Maximum thread-cached size class in the Jemalloc Profile.
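As an illustration of the change, the sketch below edits one option inside a Jemalloc option string, which is a comma-separated list of key:value pairs. The helper name `set_jemalloc_opt` and the starting string are hypothetical; the real JEMALLOC_CONF value in your be.conf will contain different options.

```python
def set_jemalloc_opt(conf, key, value):
    """Return conf with key set to value, keeping the other options in order."""
    items = []
    found = False
    for item in (part.strip() for part in conf.split(",")):
        if not item:
            continue  # skip empty segments such as a trailing comma
        name, sep, _old = item.partition(":")
        if sep and name == key:
            item = f"{key}:{value}"
            found = True
        items.append(item)
    if not found:
        items.append(f"{key}:{value}")
    return ",".join(items)


# Hypothetical starting value, for illustration only.
conf = "background_thread:true,dirty_decay_ms:5000,lg_tcache_max:20"
print(set_jemalloc_opt(conf, "lg_tcache_max", "15"))

# lg_tcache_max is a base-2 logarithm, so 15 caps cached sizes at 2^15 bytes.
print(f"{2 ** 15 // 1024} KB")
```

The edited string still has to be written back to be.conf, and BE restarted, before the new value takes effect.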
Before Doris 2.1, the default value of lg_tcache_max in JEMALLOC_CONF in be.conf was 20, which caused the Jemalloc Cache to grow too large in some scenarios. Since Doris 2.1, it has been changed back to 15, the Jemalloc default.
This usually happens because a query or load in the BE process is requesting a large number of memory pages of large size classes, or because a memory-heavy query or load has finished and left many such pages cached in the Thread Cache. The Thread Cache is cleaned at two points: memory blocks that have not been used for a long time are recycled once memory allocations and releases reach a certain count, and all pages are recycled when the thread exits. This leaves a bad case: if the thread runs no further queries or loads, it stops allocating memory and falls into a so-called idle state. Users expect the memory to be released once the query completes, but in this scenario the Thread Cache is not cleaned unless the thread exits.
However, the Thread Cache usually needs no attention. When the process runs short of available memory and the Thread Cache exceeds 1G, Doris manually flushes the Thread Cache.
Dirty Page Memory Too Large
extents: size ind ndirty dirty nmuzzy muzzy nretained retained ntotal total
4096 0 7 28672 1 4096 21 86016 29 118784
8192 1 11 90112 2 16384 11 90112 24 196608
12288 2 2 24576 4 49152 45 552960 51 626688
16384 3 0 0 1 16384 6 98304 7 114688
20480 4 0 0 1 20480 5 102400 6 122880
24576 5 0 0 43 1056768 2 49152 45 1105920
28672 6 0 0 0 0 13 372736 13 372736
32768 7 0 0 1 32768 13 425984 14 458752
40960 8 0 0 31 1150976 35 1302528 66 2453504
49152 9 4 196608 2 98304 3 139264 9 434176
57344 10 0 0 1 57344 9 512000 10 569344
65536 11 3 184320 0 0 6 385024 9 569344
81920 12 2 147456 3 241664 38 2809856 43 3198976
98304 13 0 0 1 86016 6 557056 7 643072
114688 14 1 102400 1 106496 15 1642496 17 185139
Reduce dirty_decay_ms in JEMALLOC_CONF in be.conf to 2000 ms or less. The default dirty_decay_ms in be.conf is 5000 ms. Jemalloc releases dirty pages along a smooth decay curve within the time specified by dirty_decay_ms; see Jemalloc opt.dirty_decay_ms. When the BE process runs short of available memory and triggers Minor GC or Full GC, it actively releases all dirty pages according to a certain strategy.
Before Doris 2.1, the default value of dirty_decay_ms in JEMALLOC_CONF in be.conf was 15000, which caused the Jemalloc Cache to grow too large in some scenarios. Since Doris 2.1, the default value is 5000.
extents in the Jemalloc Profile contains statistics for buckets of different page sizes across all Jemalloc arenas, where ndirty is the number of dirty pages and dirty is the total memory of dirty pages. Refer to stats.arenas.<i>.extents.<j>.{extent_type}_bytes in Jemalloc, and add up the dirty values of all page sizes to get the total number of bytes of Dirty Pages in Jemalloc.
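The summation can be sketched as follows. The function name `sum_extent_column` is hypothetical, and the code assumes the table is laid out as in the excerpt above: a header line starting with the `extents:` label, followed by whitespace-separated integer rows. The sample uses the first three rows of that excerpt.

```python
def sum_extent_column(table, column="dirty"):
    """Sum one column of an extents table copied from the Jemalloc Profile."""
    rows = [line.split() for line in table.strip().splitlines() if line.strip()]
    header = rows[0]
    # The leading "extents:" token is a table label, not a data column.
    if header and header[0].rstrip(":") == "extents":
        header = header[1:]
    index = header.index(column)
    return sum(int(row[index]) for row in rows[1:])


sample = """
extents: size ind ndirty dirty nmuzzy muzzy nretained retained ntotal total
4096 0 7 28672 1 4096 21 86016 29 118784
8192 1 11 90112 2 16384 11 90112 24 196608
12288 2 2 24576 4 49152 45 552960 51 626688
"""
dirty_bytes = sum_extent_column(sample)
print(dirty_bytes, f"{dirty_bytes / 1024:.0f} KB")
```

Passing `column="retained"` or `column="muzzy"` gives the corresponding totals from the same table.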
Jemalloc Metadata Memory Analysis
If the Memory Tracker Label=tc/jemalloc_metadata, Type=overview shows a large value, Jemalloc or TCMalloc Metadata is using a lot of memory. Doris uses Jemalloc as its default allocator, so this section only analyzes the case where Jemalloc Metadata uses a lot of memory.
MemTrackerLimiter Label=tc/jemalloc_metadata, Type=overview, Limit=-1.00 B(-1 B), Used=144 MB(151759440 B), Peak=-1.00 B(-1 B)
The Label=tc/jemalloc_metadata Memory Tracker was added after Doris 2.1.6. Previously, Jemalloc Metadata was included in the Label=tc/jemalloc_cache Memory Tracker.
How to view Jemalloc Metadata
You can get the Jemalloc Profile by viewing the Doris BE web page http://{be_host}:{be_web_server_port}/memz (webserver_port defaults to 8040). Find the overall memory statistics of Jemalloc in the Jemalloc Profile as follows, where metadata is the memory size of Jemalloc Metadata.
Allocated: 2401232080, active: 2526302208, metadata: 535979296 (n_thp 221), resident: 2995621888, mapped: 3221979136, retained: 131542581248
- Allocated: the total number of bytes of memory allocated by Jemalloc for the BE process.
- active: the total number of bytes of all pages allocated by Jemalloc for the BE process. It is a multiple of Page Size and is usually greater than or equal to Allocated.
- metadata: the total number of bytes of Jemalloc metadata, which depends on the number of allocated and cached pages, memory fragmentation, and other factors. Refer to Jemalloc stats.metadata.
- retained: the size of the virtual memory mappings retained by Jemalloc, that is, not returned to the operating system through munmap or similar methods. It is not strongly associated with physical memory. Refer to Jemalloc stats.retained.
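To work with these fields programmatically, the summary line can be parsed into a dictionary. This is a minimal sketch that assumes the line looks like the sample shown above; `parse_jemalloc_summary` is a hypothetical helper name.

```python
import re


def parse_jemalloc_summary(line):
    """Parse 'Name: <int>' pairs from the summary line into a lowercase-keyed dict.

    Annotations without a colon, such as "(n_thp 221)", are ignored.
    """
    return {name.lower(): int(value) for name, value in re.findall(r"(\w+):\s*(\d+)", line)}


sample = (
    "Allocated: 2401232080, active: 2526302208, metadata: 535979296 (n_thp 221), "
    "resident: 2995621888, mapped: 3221979136, retained: 131542581248"
)
stats = parse_jemalloc_summary(sample)
print(f"metadata: {stats['metadata'] / 2**20:.1f} MiB")
print(f"retained: {stats['retained'] / 2**30:.1f} GiB")
print(f"active - allocated: {stats['active'] - stats['allocated']} bytes")
```

In the sample, retained is far larger than resident, which matches the description of retained as virtual memory that is not strongly tied to physical memory.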
Jemalloc Metadata memory is too large
The size of Jemalloc Metadata is positively correlated with the size of the process's virtual memory. The virtual memory of the Doris BE process is usually large because Jemalloc retains a large number of virtual memory mappings, that is, the retained value above. Virtual memory returned to Jemalloc is cached in Retained by default, waiting to be reused, and is not released automatically or manually.
The fundamental reason Jemalloc Retained grows large is insufficient memory reuse at the Doris code level: large amounts of virtual memory have to be requested, and they enter Jemalloc Retained after being released. The ratio of virtual memory to Jemalloc Metadata size is usually between 300 and 500; for example, with 10T of virtual memory, Jemalloc Metadata may occupy 20G.
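The ratio gives a rough way to estimate metadata size from virtual memory size. The sketch below applies the 300 to 500 range stated above; the function name is hypothetical, and the result is only a rule-of-thumb range, not a value Jemalloc reports.

```python
TIB = 2 ** 40
GIB = 2 ** 30


def metadata_estimate_range(virtual_bytes, ratio_low=300, ratio_high=500):
    """Return (low, high) estimated metadata bytes for a given virtual memory size.

    A higher virtual-to-metadata ratio means less metadata, so ratio_high
    produces the low end of the estimate.
    """
    return virtual_bytes // ratio_high, virtual_bytes // ratio_low


low, high = metadata_estimate_range(10 * TIB)
print(f"estimated metadata: {low / GIB:.1f} GiB to {high / GIB:.1f} GiB")
```

With 10T of virtual memory, the low end of the range is about 20G, which is the figure quoted in the text.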
If Jemalloc Metadata and Retained keep growing and the process virtual memory becomes too large, consider restarting the Doris BE process regularly. This usually occurs only after Doris BE has been running for a long time, and only a few Doris clusters encounter it. There is currently no way to reduce the virtual memory mappings held in Jemalloc Retained without losing performance. Doris is continuously optimizing memory usage.
If the above problems occur frequently, refer to the following methods.
- A fundamental solution is to stop Jemalloc Retained from caching virtual memory mappings: append retain:false to JEMALLOC_CONF in be.conf and restart BE. However, query performance may drop significantly; performance on the TPC-H benchmark is reduced by about 3 times.
- On Doris 2.1, you can turn off PipelineX and Pipeline by executing set global experimental_enable_pipeline_engine=false; set global experimental_enable_pipeline_x_engine=false; because PipelineX and Pipeline request more virtual memory. This also reduces query performance.