To help you understand TommyDS's performance, we ran a thorough benchmark comparing it against some of the best-known C and C++ data structure libraries, under conditions resembling a real-world application.
Here are the data structures included in the comparison:
Note that googlelibchash, googledensehash and concurrencykit are generally not shown in the primary performance graphs because they exhibit numerous performance spikes across different data sizes. You can find specific details and relevant graphs about these in the Notes on Other Libraries section.
The primary purpose of this benchmark is to measure the performance of storing and searching a collection of N pointers to distinct objects, indexed by an associated integer key.
This test methodology deliberately deviates from typical hash table comparisons where the entire object's data is copied and stored directly within the container.
Storing pointers is a more common requirement in real-world applications where the same object must be referenced by or included in multiple data structures (see Multi-Indexing: Searching Objects in Multiple Ways). Duplicating the object in such cases would be inefficient or incorrect. This difference is critical and explains why these performance metrics may differ from other general-purpose hash table comparisons.
To accurately simulate real-world usage where the application must retrieve and access the object's data, the search operation in this benchmark dereferences the stored pointers. This step typically necessitates an additional memory load, often resulting in a cache miss.
This overhead provides a relative performance advantage to intrusive containers (like the Tommy structures) where the required indexing metadata is stored directly within the user's object structure. For these designs, the cache miss incurred to access the object's data is the same one that retrieves the necessary indexing information, minimizing the "additional" cost.
The tests performed are:
Note that Change, Hit, and Miss tests are always performed with N objects already in the container, while Insert builds the container up, and Remove empties it. The objects are always dereferenced upon successful search and removal.
All objects are preallocated on the heap, and the time for memory allocation/deallocation itself is not included in the performance results.
Each object contains a unique integer key, a value field for consistency checks, and a 16-byte unused payload field, plus any required internal data structure links.
The keys used are unique and form a dense domain starting at 0x80000000 and consisting of N even numbers (e.g., 0x80000000, 0x80000002, ...).
Tests are repeated using two key access modes: Forward, visiting the keys in increasing order, and Random, visiting them in a randomly shuffled order.
The most relevant tests depend on the intended data usage model. If you are unsure which ones to consider, focus on Random Change, since it covers searches, insertions, and removals and gives a representative view of a real-world access pattern.

Note the vertical split in the graphs around 100,000 elements, roughly the point at which the working set stops fitting in the processor cache.
In the Random access tests, hashtables are the clear winners, followed by tries, with traditional trees generally being the slowest.
The best choices in TommyDS for this access pattern are tommy_hashdyn and tommy_hashlin. tommy_hashlin is often preferred for being real-time friendly as it minimizes heap fragmentation.
In the Forward (sequential) access tests, tries are the fastest, followed by hashtables. Trees remain the slowest option.
The best choices in TommyDS here are tommy_trie and tommy_trie_inplace. tommy_trie_inplace is often preferred as it does not require a custom allocator.
Note that hashtables are also faster in Forward mode than in Random mode. This happens because the objects are allocated sequentially in memory, and accessing them in key order still results in better cache utilization overall.
Here you can see how memory usage scales for the different data structures.
The benchmark was performed on a Core i7-10700 at 2.9 GHz running Linux. The compiler was gcc 15.2.0 with the optimization flags "-O3 -march=native -flto", plus "-fpermissive" to accept some nonconforming code.
Below is the pseudo-code for the benchmark setup, written here using the C++ unordered_map as an example implementation:
This section provides additional context on the tested libraries that are not part of TommyDS.
This C implementation was excluded from the main graphs because it exhibits poor performance and significant spikes in the Change test for certain N values. See this performance graph for a visual illustration of the issue.
This C++ implementation exhibits erratic performance with significant spikes during the Change benchmark test, particularly in version 2.0.4. The older version 2.0.3 does not exhibit this behavior.
The performance degradation is likely caused by the revised reallocation strategy in 2.0.4, which enters a pathological case under the specific access patterns of this test.
This type of degeneration is characteristic of hash tables that use tombstone entries for deletion handling, where the accumulation of tombstones can lead to increased probe lengths and degraded performance.
Note that downgrading to version 2.0.3 avoids this specific issue but does not guarantee immunity from similar pathological cases under different workloads or access patterns.
See this performance graph for a visual illustration of the issue.
Additionally, it does not automatically release memory upon deletion. To prevent an unfair advantage in the Remove test, we forced a periodic memory release by calling resize(0).
This library does not release memory when elements are deleted, which can lead to an unfair performance advantage in the Remove test. It also does not provide a way to shrink its internal storage, so this advantage remains in the benchmark.
A crash bug was found when inserting a key with the value 0. The issue was reported to the author and the necessary fix has been implemented.
The Judy library (specifically JudyL) can sometimes exhibit unpredictable performance depending on the specific platform and input data size. See for instance this graph showing a big, reproducible performance spike at 50,000 elements.
The non-blocking hash set displays severe performance degradation and numerous spikes in the Change test for some data sizes.
This type of degeneration is characteristic of hash tables that use tombstone entries for deletion handling, where the accumulation of tombstones can lead to increased probe lengths and degraded performance.
See this performance graph for a visual illustration of the issue.