Archive | Hash Table RSS for this section

DMP Photo Booth: an Experimental GHashTable Transplant

DMP Photo Booth has had a problem; it has been battered by the club of poor design choices. Deep in the bowels of user_interface.c, there was a terrible cancer. There was a struct called dmp_pb_ui_cb_user_data that started life as an innocent little struct to be passed into GTK callbacks. Since the same pointer must be used for all callbacks to pass user data around, this was my solution. The only problem? Scalability.

At first, there was one member to worry about. One member to initialize, one member to free. Then there were 2. Then 4. Then 8. Things began to take a turn for the worse as my user data registration function ballooned to over 300 lines. Dozens of members to track, to new and destroy. Fearing that my poor user_interface.c file would end up on The Daily WTF I prepared for surgery. The cancer must go.

Easier said than done, right? It turns out, not so much.

GHashTable to the Rescue

Once again, GLib has the answer. After some thought, GLib’s Hash Table implementation offers the solution.

A hash table is a collection that associates a key with a value. You can pass in an integer, a string, or any other arbitrary type as a key, and get its associated value. In my case, I have ready-made keys: the Glade widget name. I can use these names as keys and associate them with their respective components as they are pulled out of the GtkBuilder. The filled hash can then be passed in as the user data pointer to gtk callbacks.

Let’s go over some of the details that make this is a good choice for DMP Photo Booth:

g_hash_table_new_full

This is, of course, the GHashTable constructor. Let’s examine an example, straight out of the new implementation:

dmp_pb_user_data = g_hash_table_new_full ( g_str_hash, g_str_equal, NULL, (GDestroyNotify) gtk_widget_destroy );

This constructor takes 4 arguments: a hashing function, an equality function, a key destruction function, and a value destruction function. The hashing function is probably the most important of these. Hash tables generate a hash based on the key to find the value, this function implements the hash on a given type. GLib provides 3 ready-made hashing functions, g_str_hash (for char arrays), g_int_hash (for integers), and g_direct_hash (for void pointers). You can implement your own if you like, just make sure it’s fast. This function is called on all lookups, a slow hash function will destroy your program’s performance.

Next is the equality function. This function is used to test keys for equality.

Finally, we have the key and value destruction functions. These functions are responsible for ensuring memory is cleaned up. These functions will be called on destruction of the GHashTable, ensuring all memory is cleaned up properly. This is the reason we are using g_hash_table_new_full, and not g_hash_table_new. Without these functions, you have to manually empty the hash table and free all values if you want to avoid a memory leak.

g_hash_table_lookup

Next, is our lookup function. This function takes a pointer to a GHashTable, and a key. A pointer to the key’s value is returned, or NULL if the key does not exist. Pretty straight forward.

The Prognosis

Favorable. The new GHashTable is humming along, and dmp_pb_ui_register_user_data has shed hundred of lines. With a little diet and excercise, the patient should make a full recovery.

Imagine: A World Without Strcmp

Recently, I’ve been working on implementing the config file for DMP Photo Booth, and luckily for me, GLib has the answer for this: GKeyFile. GKeyFile is GLib’s implementation of those flat config files you tend to see in Linux applications:

#Some comment! [section] value=words some_bool=true

…like that. I happily set to work, satisfied that today was not the day that I’d have to begin dealing with FILE pointers. Unfortunately, though I don’t have to deal with straight up file IO, I do still have to deal with functions that will fail if you so much as look at them funny. All the functions of GKeyFile take a group and key name string to look up, and if you pass in a bad string, things go off the rails. At first I had planned to use #defined strings, but halfway through implementing input validation in my first access function, I thought that there must be a better way. Strings are “difficult” to compare against known values, and it all seemed like so much work!

Next, I thought of using magic integers, but I’d have to come up with some way to resolve them to a string. Since this was all starting to sound like the beginnings of implementing a hash table, I turned to GLib for its implementation. This also seemed like more effort than it was worth. As I was poking around the docs, I found my answer!

Introducing Quark!

I’m talking about the GQuark. A GQuark functions in a similar manner to a hash table (in fact, if you dig through the source, you’ll see that GQuark uses a hash table.) GQuarks are a two-way association between a non-zero guint32 and a gchar * (A GQuark value of 0 resolves to NULL.) Basically, once you register a string as a quark, you can pass the GQuark object around and use it to obtain the string. You can also attempt to resolve a string to get its integer value if it has been registered. The best part about a GQuark is you can directly compare the GQuark object using operator==, avoiding a call to strcmp(), reducing boilerplate code.

Notable Functions

GQuarks do not have a lot of moving parts. Below are some important functions for using GQuark:

G_DEFINE_QUARK(some_string, function_name)

This macro defines a function named function_name_quark that returns a quark for “some_string”. Basically, this is your replacement for:

#define MAGIC_STRING_NAME "magic_string_literal"

Where you might use the MAGIC_STRING_NAME before, you can now just call your newly defined quark function! Notice up above that in the macro parameter, some_string is not in quotation marks. This is important; the function defined by the macro will strigify the first parameter. For example, if you enter this:

G_DEFINE_QUARK(The quick brown fox jumped over the lazy dog's back\n, dog_fox)

You’ll get a function:

GQuark dog_fox_quark()

…that returns a GQuark for the string:

"The quick brown fox jumped over the lazy dog's back\n"

The second thing to note about this macro is that it declares AND implements the function. In other words, it expands to something like this:

GQuark name_quark() { ...code... }

As you can see, it would be an error to place a semicolon behind the macro.

All of that may seem like a lot to remember, but if you want to define several quarks, it will save you hours of typing.

g_quark_to/from_string and friends

These functions are your GQuark manipulation functions. They will let you convert strings and GQuarks to each other. For instance, if you enter:

g_printf("%s\n", g_quark_to_string(some_quark));

You will print the string contained in some_quark. Pretty straight-forward.

Thread Safety

You may be thinking: “gee, this sounds like it uses a whole lot of globals! Is this thread safe?” I know I was, so I did a little investigation. Looking through the source for GQuark, all the functions that access global variables are protected by a mutex; therefore you may feel free to use this from multiple threads.

%d bloggers like this: