Multi-Indexing: Searching Objects in Multiple Ways

In any real-world application where you use objects to represent information, you'll often need to search for those objects using different keys or criteria.

This is where the concept of multi-indexing becomes essential.

Consider a common example: files in a file system. A single file object might need to be found in several distinct ways:

  • By its name (e.g., 'document.pdf').
  • By its residing directory (e.g., '/home/user/documents/').
  • By its full path (e.g., '/home/user/documents/document.pdf').
  • By its unique inode number.

With multi-indexing, each search key requires the file object to be inserted into a separate, dedicated data structure (like a hash table or a tree) to allow for a fast search based on that specific key. This is exactly what TommyDS is designed to facilitate.

You can compare this concept to a SQL database. In a database, a single table can have multiple indexes. Each index allows for a fast lookup of rows based on the value in a specific column (the search key), without having to scan the entire table. TommyDS helps you achieve this same performance benefit in your C application for in-memory data structures.

TommyDS simplifies the management of these multiple indexes, allowing you to have a single object, but link it into several different data structures simultaneously, each using a different field of the object as the search key.

Note that TommyDS provides only partial iterator support through its simple "foreach" functions. If your application needs full, flexible iterators (meaning the ability to walk through all objects in the collection easily) or if you need to preserve the original insertion order of the objects, you must also insert all the objects into a separate tommy_list.

You can then use the tommy_list structure as your primary iterator. This gives you the best of both worlds: fast search via the indexed structures, and ordered traversal via the list.

The next example demonstrates using multiple data structures (a list and several hash tables) to store a 'file' object, allowing access and searching based on different fields.

First, we declare the file object structure, including the required intrusive nodes for the various data structures that will store it.

struct file {
// data fields
char dir[MAXDIR];
char name[MAXNAME];
inode_t inode;
// for containers
tommy_node node; // node for the file list
tommy_node node_by_dir; // node for the file dir
tommy_node node_by_name; // node for the file name
tommy_node node_by_path; // node for the file path
tommy_node node_by_inode; // node for the file inode
};
Data structure node.
Definition: tommytypes.h:211

Next, we define helper functions to compute the hash for each field used as a key and comparison functions to search for an object based on a specified key.

// search function by inode
int search_by_inode(const void* arg, const void* obj)
{
const inode_t* inode = arg;
const struct file* f = obj;
return *inode != f->inode;
}
// compute the hash of a inode
tommy_uint32 hash_by_inode(const char* dir, inode_t inode)
{
return tommy_inthash_u64(inode); // truncate to 32 bits
}
// compute the hash of a name
tommy_uint32 hash_by_name(const char* name)
{
return tommy_strhash_u32(0, name);
}
// compute the hash of a dir
tommy_uint32 hash_by_dir(const char* dir)
{
return tommy_strhash_u32(0, dir);
}
// search function by path
struct path {
char* dir;
char* name;
};
int search_by_path(const void* arg, const void* obj)
{
const struct path* p = arg;
const struct file* f = obj;
return strcmp(p->dir, f->dir) != 0 || strcmp(p->name, f->name) != 0;
}
// compute the hash of a path as combination of a dir and a name
tommy_uint32 hash_by_path(const char* dir, const char* name)
{
return tommy_strhash_u32(tommy_strhash_u32(0, dir), name);
}
TOMMY_API tommy_uint32_t tommy_strhash_u32(tommy_uint32_t init_val, const void *void_key)
String hash function with a 32 bits result.
tommy_uint64_t tommy_inthash_u64(tommy_uint64_t key)
Integer reversible hash function for 64 bits.
Definition: tommyhash.h:121

Now we declare and initialize the data structures.

tommy_hashdyn hashtable_by_dir;
tommy_hashdyn hashtable_by_name;
tommy_hashdyn hashtable_by_path;
tommy_hashdyn hashtable_by_inode;
// initializes the list and the hash tables
tommy_hashdyn_init(&hashtable_by_dir);
tommy_hashdyn_init(&hashtable_by_name);
tommy_hashdyn_init(&hashtable_by_path);
tommy_hashdyn_init(&hashtable_by_inode);
Hashtable container type.
Definition: tommyhashdyn.h:161
TOMMY_API void tommy_hashdyn_init(tommy_hashdyn *hashdyn)
Initializes the hashtable.
void tommy_list_init(tommy_list *list)
Initializes the list.
Definition: tommylist.h:114

We create a file object and insert it into all the data structures.

// creates an object
struct file* f = malloc(sizeof(struct file));
strcpy(f->dir, ...);
strcpt(f->name, ...);
f->inode = ...;
// inserts into the list and hash tables
tommy_list_insert_tail(&list, &f->node, f);
tommy_hashdyn_insert(&hashtable_by_dir, &f->node_by_dir, f, hash_by_dir(f->dir);
tommy_hashdyn_insert(&hashtable_by_name, &f->node_by_name, f, hash_by_name(f->name));
tommy_hashdyn_insert(&hashtable_by_path, &f->node_by_path, f, hash_by_path(f->dir, f>name));
tommy_hashdyn_insert(&hashtable_by_inode, &f->node_by_inode, f, hash_by_inode(f->inode));
TOMMY_API void tommy_hashdyn_insert(tommy_hashdyn *hashdyn, tommy_hashdyn_node *node, void *data, tommy_hash_t hash)
Inserts an element in the hashtable.
void tommy_list_insert_tail(tommy_list *list, tommy_node *node, void *data)
Inserts an element at the tail of a list.
Definition: tommylist.h:219

After all files are inserted, we can now search them by inode, remove a file with a specific path, and list all the files with a specific name, regardless of the directory they reside in.

// searches a file by inode
inode_t inode_to_find = ...;
struct file* found = tommy_hashdyn_search(&hashtable_by_inode, search_by_inode, &inode_to_find, hash_by_inode(inode_to_find));
if (found) {
printf("%s/%s\n", f->dir, f->name);
}
// searches a file by full path and deletes it
struct path path_to_find;
path_to_find.dir = ...;
path_to_find.name = ...;
struct file* found = tommy_hashdyn_search(&hashtable_by_path, search_by_path, &path_to_find, hash_by_path(path_to_find.dir, path_to_find.name));
if (found) {
printf("%s/%s\n", f->dir, f->name);
// if found removes all the references
tommy_list_remove_existing(&list, &obj->node);
tommy_hashdyn_remove_existing(&hashtable_by_dir, &obj->node_by_dir);
tommy_hashdyn_remove_existing(&hashtable_by_name, &obj->node_by_name);
tommy_hashdyn_remove_existing(&hashtable_by_path, &obj->node_by_path);
tommy_hashdyn_remove_existing(&hashtable_by_inode, &obj->node_by_inode);
}
// iterates over all files with a specific name, even in different directories
cont char* name_to_find = ...;
tommy_node* i = tommy_hashdyn_bucket(&hashtable_by_name, hash_by_name(name_to_find));
while (i) {
struct file* f = i->data; // gets the file pointer
if (strcmp(f->name, name_to_find) == 0) { // the bucket may contain also other names
printf("%s/%s\n", f->dir, f->name);
}
i = i->next; // goes to the next file
}
// iterates over all files
i = tommy_list_head(&list);
while (i != 0) {
struct file* found = i->data; // gets the file pointer
printf("%s/%s %lu\n", f->dir, f->name, f->inode);
i = i->next; // goes to the next file
}
struct tommy_node_struct * next
Next node.
Definition: tommytypes.h:216
void * data
Pointer to the object containing the node.
Definition: tommytypes.h:228
void * tommy_hashdyn_search(tommy_hashdyn *hashdyn, tommy_search_func *cmp, const void *cmp_arg, tommy_hash_t hash)
Searches an element in the hashtable.
Definition: tommyhashdyn.h:223
tommy_hashdyn_node * tommy_hashdyn_bucket(tommy_hashdyn *hashdyn, tommy_hash_t hash)
Gets the bucket of the specified hash.
Definition: tommyhashdyn.h:208
TOMMY_API void * tommy_hashdyn_remove_existing(tommy_hashdyn *hashdyn, tommy_hashdyn_node *node)
Removes an element from the hashtable.
void * tommy_list_remove_existing(tommy_list *list, tommy_node *node)
Removes an element from the list.
Definition: tommylist.h:240
tommy_node * tommy_list_head(tommy_list *list)
Gets the head of the list.
Definition: tommylist.h:123

Finally, we deallocate all the file objects and deinitialize the data structures.

// deallocates the files
tommy_list_foreach(&list, free);
// deallocates the hash tables
tommy_hashdyn_done(&hashtable_by_dir);
tommy_hashdyn_done(&hashtable_by_name);
tommy_hashdyn_done(&hashtable_by_path);
tommy_hashdyn_done(&hashtable_by_inode);
TOMMY_API void tommy_hashdyn_done(tommy_hashdyn *hashdyn)
Deinitializes the hashtable.
void tommy_list_foreach(tommy_list *list, tommy_foreach_func *func)
Calls the specified function for each element in the list.
Definition: tommylist.h:358