
On ArrayLists and Vectors

Long ago, when I first started doing Java, I struggled against Java’s limitations regarding the resizing of arrays. Eventually after poking around for solutions, I found the ArrayList. Before that, I found the Vector, but for whatever reason, NetBeans was saying that Vector was deprecated, so I settled on ArrayList, and it was fine.

A few years later, I took my first class on C++, and learned about the std::vector. I was also introduced to the concepts of the vector and the linked list. Needless to say, I was confused. If vectors and linked lists aren’t the same thing, then what is this ArrayList thing? At the time, I elected not to pursue the question, needing to focus on figuring out these “pointer” things they were making us learn. I put it out of my mind.

Flash forward to today. This morning I was travelling the internet, and I came across a forum thread. The topic of the ArrayList came up. Curiosity finally overcame me, and I looked into the issue.

So, About Those ArrayLists…

The Java Collections framework is broken up into various datatype interfaces. There are interfaces for lists, queues, maps, sets, and deques.

You’ll notice that there is no vector interface. While a list and an array have different semantics and use cases, you can use them roughly in the same way. “Array indexing” can be implemented on a list by iterating to the nth node, and arrays can be traversed like a list. Since there’s nothing a list can do that a vector can’t and vice versa, it follows that they can both implement the same interface. Sun had to make a choice: name that interface “list” or “vector”. They went with list.

Given that Java’s vector type implements the list interface, it follows that they would reflect that in the name: ArrayList. For all intents and purposes, ArrayList is equivalent to C++’s std::vector. ArrayList implements the Collections framework’s List interface, but it is a growable array. It behaves like a vector: indexing is cheap, resizing is expensive. Java also provides a LinkedList that behaves like a linked list.
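To make “indexing is cheap, resizing is expensive” concrete, here is a minimal sketch of a growable array in C. The IntVec type, its function names, and the doubling growth policy are all made up for illustration; Java’s ArrayList and C++’s std::vector choose their own growth policies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable int array, for illustration only. */
typedef struct {
    int    *items;
    size_t  len;   /* elements in use */
    size_t  cap;   /* elements allocated */
} IntVec;

/* Append a value; returns 0 on success, -1 if allocation fails. */
int intvec_push(IntVec *v, int value)
{
    if (v->len == v->cap) {
        /* The expensive part: get a bigger block, possibly copying
         * every existing element. Doubling is an assumed policy. */
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        int *grown = realloc(v->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        v->items = grown;
        v->cap = new_cap;
    }
    v->items[v->len++] = value;
    return 0;
}

/* The cheap part: indexing is a single pointer offset. */
int intvec_get(const IntVec *v, size_t index)
{
    assert(index < v->len);
    return v->items[index];
}

/* Release the storage and reset the vector to empty. */
void intvec_free(IntVec *v)
{
    free(v->items);
    v->items = NULL;
    v->len = v->cap = 0;
}
```

A linked list flips the trade-off: appending a node never copies anything, but reaching the nth element means walking n links.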

So, what about Java’s Vector? It seems that Vector predates the Collections framework. After the introduction of the Collections framework, Vector was retrofitted to be a part of it, also implementing the List interface. So, what’s the difference between ArrayList and Vector? Vector’s methods are synchronized, so individual calls are safe to make from multiple threads. Because every call takes a lock, it is slower than ArrayList.

Simply put: use ArrayList in single-threaded code, and Vector in multi-threaded code.

Why Names Matter

This is why names matter. One might not think much of it, but imagine how many hiring managers would be out an interview question if ArrayList and Vector had been named Vector and SynchronizedVector respectively?

DMP Photo Booth 1.0

Well, the day has come and gone. DMP Photo Booth’s final test on June 21st went off without issue, and DMP Photo Booth has left Beta and is now considered “production ready”. The initial 1.0 release can be found on GitHub.

The significance of June 21st is the very reason DMP Photo Booth was created; the 21st is the day of my wedding. My wife wanted a photo booth for the reception. We looked into renting a photo booth, but it turns out that they run around $1,000. I turned to open source. Some quick googling turned up some options, but they were all personal projects or out of date. Sure I could get somebody else’s project working, but what’s the fun in that? I decided that we didn’t need to rent one, or download one, I could build it!

In late 2013, I set to work in earnest. I had a couple of months of downtime in school, and since I’m not currently working, it was the perfect time. I decided I had three main objectives for this project: get some Arduino experience, get some GTK+ experience, and do this all as portably as possible. I had initially decided to mostly ignore GLib and focus on GTK, but slowly I grew to appreciate GLib for what it is: the standard library that C never had. First I used GModule to handle shared libraries in a portable manner. Next I decided to use GLib primitives to keep from having to deal with cross-platform type wonkiness. Next, having grown tired of dealing with return codes, I refactored the project to use GLib’s exception replacement: GError.

Lessons Learned

It’s not all roses and puppies though. There are certainly things I’d do differently. DMP Photo Booth is developed in an Object Oriented style, passing opaque structs with “method” functions that operate on them. Each component of the program is organized into its own source file, with file-scoped globals scattered throughout. Said globals are protected by mutexes to create a semblance of thread safety. That said, threading issues have been a major thorn in my side. Long story short: I regret this design choice. While I still feel that this is the correct way to structure C code, and that if globals are required, this is the correct way to handle them, I feel that I should have made more of an effort to limit side effects. Recently, I’ve spent some time doing functional programming, and if I could do it again I’d try to write in a more functional style. Fortunately for me, this is something that a little refactoring could help with.

Additionally, one thing I thought would be a major help is something that began to be a major thorn in my side: NetBeans. As the size of the project grew, NetBeans got slower and slower. It seemed that I spent more time fiddling with IDE settings than actually coding. Even worse is that the IDE-generated makefile is so convoluted that it’s extremely difficult to modify by hand in a satisfying way. I’ve always coded with an IDE, so I wouldn’t have even considered not using one, but then I spent some time with Haskell. One of Haskell’s “problems” is that it doesn’t have good IDE support. It doesn’t seem like any IDE really handles it well, so most people use Emacs. Personally, I haven’t really warmed up to Emacs, but GEdit has syntax highlighting for Haskell and a built-in terminal for GHCI. GEdit also has syntax highlighting for C. Next time, I will seriously consider using a lighter-weight text editor for a C project. All this said, I think NetBeans for Java remains the way to go.

What’s Next

Like any program, version 1.0 is just one of many versions. There certainly remains a lot of work to do with DMP Photo Booth. Some major items you are likely to see whenever I get around to working on DMP Photo Booth some more:

Options Dialog

I think anybody who has seen it will agree: the options dialog in DMP Photo Booth is bad. It’s poorly organized, and kind of wonky. Personally, I modify settings using the .rc file, which is telling. This is certainly a high-priority improvement.

Functional Refactor

Like I said above, the code could use a pass to limit side effects. Functions need to have their side effects limited, and globals need to be eliminated unless absolutely necessary. However, C is not a functional language. While one could argue that function pointers enable functional programming in C, this is a very pedantic argument. I won’t be going crazy with functional programming techniques. There will be no Monads, or for loops being turned into mappings of function pointers.

Optional Module API

An idea I’ve had on the back burner for a while is an optional module API. This would be used for very specific quality-of-life things. For instance, a module could provide a GTK widget to be shown in the options dialog. Any module that doesn’t want to implement any or all of the optional API can just ignore it. The module loading function will gracefully handle the dlsym failure, just treating it as it is: declining to implement the API. I have no plans to change the current existing API, so all you module developers can rest easy!

User Interface Module

It occurred to me that it might be good to have a UI module. This would provide the UI, and wouldn’t be tied to the trigger/printer/camera module start/stop system. This module would be loaded at startup and unloaded on shutdown. This would allow the Photo Booth to use different widget toolkits: Qt, Curses, Cocoa, WinForms, or whatever else. Under this scheme, the current GTK+ interface would be abstracted into the reference UI Module.

Forking A New Process Using GLib

These days we tend to think of concurrency in terms of spawning threads. Need to perform a long running calculation? Spawn a thread. However, there are other ways; we can fork and create a new process. Unfortunately for us, fork and threads don’t play nice together. How so, you ask? When you fork a new process, only the current thread is copied into the new process. If any other thread held a lock on a mutex, that mutex will never be unlocked in the new process. This includes mutexes held inside library functions such as malloc.
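For reference, the one pattern that stays safe in a threaded program, fork followed immediately by exec, looks roughly like this in plain POSIX C. This is a sketch, not GLib code; run_and_wait is a name I made up for illustration.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program named by argv[0] and wait for it.
 * Returns the child's exit status, or -1 on failure. */
int run_and_wait(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed */

    if (pid == 0) {
        /* Child: do nothing but exec. No malloc, no locks. */
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }

    /* Parent: wait for the child to finish. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child touches nothing that might be guarded by a lock another thread held at the moment of the fork; exec then replaces the whole process image, stale mutexes and all.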

In light of this, you may be wondering why I’m wasting your time with this. No, this isn’t just a PSA, there is something sane you can do with fork in a multi-threaded world: you can call exec and friends. And it just so happens that GLib can help us with this. GLib provides us with Process Spawning facilities that integrate with GIOChannel and GMainLoop.

The first thing you may notice is that there isn’t actually a GLib equivalent to fork or exec. These two calls are combined into the g_spawn_* family of functions. The reason for this is that GLib itself spawns threads to perform work. By default, *all* GLib applications potentially have threads running, and as such it is never safe to call fork without immediately calling exec.

Forking A New Process

First, let’s do some setup:

gchar * child_argv[] = {"[PROGRAM_TO_RUN]", "[ARGUMENTS]", NULL};

This is the command that will be executed (Your argv). Since this array is terminated by NULL, GLib is able to determine its length and we do not need an argc.

GPid pid;
gint stdout;
GError * error = NULL;

We’ll need these as well. Now, it’s time to start our process:

gboolean result = g_spawn_async_with_pipes (NULL, child_argv, NULL, G_SPAWN_DEFAULT, NULL, NULL, &pid, NULL, &stdout, NULL, &error);

Yeah… That one’s a doozy. Let’s go over all those arguments.

The first argument is the child’s working directory. If this is NULL, then the child inherits the parent’s working directory.

The second argument is the child’s argument vector. This is the command that will be executed.

The third argument is the child’s environment. Like the argv, this must be NULL-terminated. If NULL the child inherits the parent’s environment.

The fourth argument is the child’s spawn flags.

The fifth argument is a pointer to a GSpawnChildSetupFunc function, to be called just before exec. If NULL, then the process will fork and exec without additional setup.

The sixth argument is the gpointer to be passed to the GSpawnChildSetupFunc.

The seventh argument is a location to return the PID of the new process.

The eighth, ninth, and tenth arguments are return locations for the file descriptors of STDIN, STDOUT, and STDERR respectively.

The last argument is a return location for a GError if something goes wrong. This function returns FALSE if something goes wrong.
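If it helps to see what a call like that stands in for, here is a plain POSIX sketch of starting a child with its stdout connected to a pipe. To be clear, this is not GLib’s implementation, only the general pipe/fork/dup2/exec idea; spawn_with_stdout is a hypothetical name, and it skips most of what GLib handles for you (environment, working directory, GError reporting).

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: start argv with its stdout connected to a pipe.
 * On success returns the child's PID and stores the read end of the
 * pipe in *out_fd; returns -1 on failure. */
pid_t spawn_with_stdout(char *const argv[], int *out_fd)
{
    int fds[2];
    assert(argv != NULL && argv[0] != NULL && out_fd != NULL);

    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: make the pipe's write end our stdout, then exec. */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }

    /* Parent: keep only the read end. */
    close(fds[1]);
    *out_fd = fds[0];
    return pid;
}
```

The parent reads the child’s output from the returned descriptor and should eventually close it and reap the child with waitpid.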

What Now

So you’ve got your fancy new process, what do you do with it?

Well, first let’s create some GIOChannels using our file descriptors:

GIOChannel * outch = g_io_channel_unix_new(stdout);

Next, we add callbacks:

GSource * stdout_source = g_io_create_watch(outch, G_IO_IN);
g_source_set_callback(stdout_source, (GSourceFunc) stdout_callback, outch, NULL);
g_source_attach(stdout_source, main_context);

GSource * stdout_abort = g_io_create_watch(outch, G_IO_ERR | G_IO_HUP | G_IO_NVAL);
g_source_set_callback(stdout_abort, (GSourceFunc) abort_callback, NULL, NULL);
g_source_attach(stdout_abort, main_context);

Here, I’ve created two sources: one that will be called when there’s data to be read, and one to be called when something goes wrong. The first call, g_io_create_watch, creates a GSource that watches for a certain condition. The second call, g_source_set_callback, tells the watch what function to call when the condition is met. Because these sources come from g_io_create_watch, the callback has the GIOFunc signature rather than the plain GSourceFunc one, which is why it is cast to GSourceFunc when it is registered. The function should have the following signature:

static gboolean callback(GIOChannel * source, GIOCondition condition, gpointer data)

The final call to g_source_attach attaches a source to a GMainContext. If NULL is passed to the second argument, then the default context is used.

…and that’s all there is to it! Your callbacks can read from the channel using the g_io_channel_* family of functions, and when the abort callback is called, it can exit gracefully.
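For the curious, the two-watch arrangement above can be approximated without a main loop using poll(). This is a plain POSIX sketch of the idea, not how GMainLoop is actually written; watch_fd, WatchState, and both callbacks are hypothetical names. As with GLib callbacks, the data callback returns zero to say “stop watching”.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback types: on_data returns 0 to stop watching. */
typedef int  (*data_cb)(int fd, void *user_data);
typedef void (*abort_cb)(int fd, void *user_data);

/* Hypothetical state shared with the example callbacks below. */
typedef struct {
    size_t bytes;    /* total bytes read so far */
    int    aborted;  /* set once the abort callback has run */
} WatchState;

/* Block on fd: call on_data while input is readable, and on_abort once
 * the descriptor hangs up or breaks. Returns 0 when watching ends
 * normally, -1 if poll itself fails. */
int watch_fd(int fd, data_cb on_data, abort_cb on_abort, void *user_data)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    assert(on_data != NULL && on_abort != NULL);

    for (;;) {
        if (poll(&pfd, 1, -1) < 0)
            return -1;

        if (pfd.revents & POLLIN) {
            if (!on_data(fd, user_data))
                return 0;
        } else if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL)) {
            on_abort(fd, user_data);
            return 0;
        }
    }
}

/* Example data callback: drain what is available and count it. */
int count_bytes(int fd, void *user_data)
{
    WatchState *st = user_data;
    char buf[256];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n <= 0)
        return 0;               /* EOF or error: stop watching */
    st->bytes += (size_t)n;
    return 1;
}

/* Example abort callback: just record that it happened. */
void mark_aborted(int fd, void *user_data)
{
    (void)fd;
    ((WatchState *)user_data)->aborted = 1;
}
```

The big difference is that this loop blocks its caller, whereas sources attached to a GMainContext are dispatched alongside everything else the main loop is doing.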

DMP Photo Booth: Underwater

You’ve heard it before: “Premature optimization is the root of all Evil.” Capital Evil. So you go on about your day, arranging the ones and zeros in pretty Christmas tree shapes, and suddenly the day arrives: your program is slow as molasses. What are you going to do now?

Last Monday was that day for me, and I’ve been underwater ever since. “Why is this happening to me?!” I thought. While not prematurely optimizing, I thought I did things right. I have no nested for loops. I’m not using an array when I need a list. Threads aren’t modifying the UI willy-nilly. Why has God forsaken me?

The Symptoms

I first noticed it while working on the printer module. After the program is open for some length of time, my whole computer begins to lag. Not just a little bit either; things completely fall apart. In the space of about 5 minutes, the computer becomes unusably slow. Killing the Photo Booth process doesn’t help; only physically shutting the computer off helps. Of course, the computer is so slow that I can’t use the shutdown option; I have to press The Button.

At this point, I feel some context is in order. I had been trying to figure out how to make my printer print on photo paper. Apparently printing is one of the areas where Linux still hasn’t caught up to Windows, so this was proving to be difficult. After printing a few strips, I realized that my low-res photo strips weren’t going to cut it, so I bumped the resolution from 100 pixels wide to 1000. It was then that I noticed things were off.

Ten years of troubleshooting experience kicked in: “what changed?” I thought. The obvious answer was the image size. Clearly my photo strip assembly algorithm was operating at O(n^n^n) or something. What can be done?

Doing It Wrong

I took a look at my assemble strips function. After poking around for a while, I zeroed in on something that had been bugging me for a while. I had been using a function MagickResetImagePage combined with MagickCoalesceImages to composite images over each other. I had decided to use these functions before I knew this operation was called “compositing”, and I had found them in a tutorial on making animated .gif files in MagickWand. At the time, I was never really happy with this implementation, so I went back to the API docs to see if there was a function with “composite” in its name. There was.

MagickCompositeImage is a lot more intuitive to use than MagickResetImagePage. It doesn’t have that Magickal formatting string that MagickResetImagePage uses, it just takes coordinates. Perhaps this was the solution to my problem. I refactored, and recompiled.

Still broke.

Measure, Don’t Guess

That old gem: I’m sure you’ve heard it too. I decided that maybe this was my best course of action. I decided it was time to learn how to use this Valgrind thing all the Cool Kids are talking about these days. For those of you not in the know, Valgrind is a utility that will tell you various things about your program. The most important/most well-known thing that it can do for you is identify memory leaks. Thinking that perhaps I had a memory leak, I installed Valgrind and got to work.

It turns out that GTK has more than a few memory leaks. Allegedly this is due to the fact that it doesn’t clean up on exit, relying on the OS to free the memory on program termination. While the general consensus is that this is fine, it doesn’t help us. The folks at Gnome are aware of this, and there is even a Wiki page on ways to mitigate this. The CliffsNotes version of that page being: “Just search for ‘definitely lost’”.

Armed with this piece of wisdom, I set off. I ran the Photo Booth in Valgrind, and examined the results. Valgrind actually turned up some memory leaks, which I corrected. Maybe now we’re set!

Nope.

Breaking Out The Profiler

This is what they usually want you to do when they tell you to Measure. Unfortunately for me, NetBeans’ built-in profiler is only for Java. After some google searching, I found gprof. Gprof is a pretty bare-bones profiler. It does what it says and not much else, which is fine. I hooked my program into the profiler and got to work. The results? Nothing. My two GTK idle functions ran some 7 million times, returning basically immediately each time as expected. Every other function performed as expected.

What now?

Trying The Process Monitor

Having run through Valgrind and GProf, coming out empty-handed, I was at a loss. I got into development because I wanted to fix my own broken code instead of mitigate somebody else’s, and fix it I will. Luckily I have 10 years of sysadmin experience to fall back on. I dusted off my process monitor and got to work.

I fired up DMP Photo Booth, and watched it in the process monitor. I pushed the button. I pushed it again. And again. Memory use rose and fell predictably as the strip was assembled, but CPU usage stayed relatively low. Then boom!

I tried again, this time doing literally nothing. Still my computer sputtered and died. I killed the process, but again it was too late.

But wait, isn’t the OS supposed to clean up after me when my process ends? Something fishy is going on.

Have I Mentioned That Threads Are Hard?

Having eliminated all other possibilities, I was forced to consider that I was having a threading issue. “But I was so careful!” I thought. Shortly thereafter I noticed it: I was getting random pthread mutex errors on my console. Clearly I had a threading issue on my hands. Was I spawning extra threads? Was something not releasing its lock? Was I being victimized by gremlins? I set a break point on line one of main() and fired up my debugger. It was time to see just what was being done when nothing was being done.

So, I stepped through my program. Whenever I got to a g_thread_new call, I made sure the thread function was solid. Finally, I got to my g_idle_add calls. I had two of them, one to monitor the status indicators, and one to retrieve photo strip thumbnails. Both of these functions pop a result from a GAsyncQueue. These queues are fed by worker threads. I thought back to my profiler output and remembered how often these are called. Looking a few lines down I saw a call to g_timeout_add_seconds. This function basically adds an idle function, but one that is called at most once every X seconds. Maybe replacing the g_idle_add calls with g_timeout_add_seconds was my answer. I refactored and reran.

Nope.

Well, crud. “Are these functions even my problem?” I thought. I commented them out, recompiled and reran.

Fixed.

“So, what’s the difference?” I wondered. All three of these functions rely on the same basic behavior: pop from a GAsyncQueue some result placed there by a worker thread. I looked at the three threads: the thread that was working properly calls g_async_queue_ref/unref, and the two that don’t work do not take a reference, instead accessing the static global variable in their module. I refactored all thread functions that access a GAsyncQueue to take a reference and work on their local copy only. I recompiled, reran, and went to bed. 46,100 seconds later, everything was humming along just fine.

Wait, So I Just Had To Increment A Reference Count?

It certainly seemed odd. That’s like your car not starting if the headlights are out. Sure, they’re important, but the car should still start, right?

Looking through the source of GLib didn’t help. So far as I can tell, all that g_async_queue_ref does is increment the reference count and return a pointer. I turned to the documentation, which says “… Whenever another thread is creating a new reference of (that is, pointer to) the queue, it has to increase the reference count (using g_async_queue_ref()). Also, before removing this reference, the reference count has to be decreased (using g_async_queue_unref()). …” While not definitive, this certainly seems to indicate that taking a reference is important.
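For readers who haven’t met the pattern, here is a generic sketch of what “increment the reference count and return a pointer” looks like in C. It is not GLib’s code, and it doesn’t explain my bug; the Shared type and its functions are hypothetical, written with C11 atomics only to show the shape of ref/unref.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not GLib's implementation. */
typedef struct {
    atomic_int ref_count;
    int        payload;
} Shared;

/* Create an object owned by the caller (count starts at 1). */
Shared *shared_new(int payload)
{
    Shared *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    atomic_init(&s->ref_count, 1);
    s->payload = payload;
    return s;
}

/* Take another reference: bump the count and hand back the pointer. */
Shared *shared_ref(Shared *s)
{
    assert(s != NULL);
    atomic_fetch_add(&s->ref_count, 1);
    return s;
}

/* Drop a reference. Returns 1 if this call freed the object. */
int shared_unref(Shared *s)
{
    assert(s != NULL);
    if (atomic_fetch_sub(&s->ref_count, 1) == 1) {
        free(s);                /* that was the last reference */
        return 1;
    }
    return 0;
}
```

The general promise is that the object stays alive for as long as anybody holds a counted reference, which is the rule the documentation is asking each thread to follow.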

Frankly, I’m not happy about this answer. This is just the sort of magic solution that I hate; it’s fixed, but I’m not sure why. For the time being, I won’t dwell on it. Moving forward, I’ll be sure that my threads take a reference of a GAsyncQueue before calling methods on it. At some point when all of this is said and done, perhaps I’ll investigate this mysterious reference count.

I have taken away from this a new appreciation of just how brittle threads are. Sure, they are powerful, but shooting yourself in the foot with a 50 cal hurts a lot more than with a 9 mm. I’ll have to be more careful.

It was also a good introduction to GProf and Valgrind. Expect blog posts on the usage of each of these tools soon!
