Skim over this abbreviated and badly-formatted function definition, from glibpacrunner.c:
static void handle_method_call (GDBusConnection *connection, /* ... */) { /* ... */ if (/* ... */) { gchar *libproxy_url = g_strdup_printf ("pac+%s", pac_url); g_setenv ("http_proxy", libproxy_url, TRUE); g_free (libproxy_url); } else g_setenv ("http_proxy", "wpad://", TRUE); g_proxy_resolver_lookup_async (/* ... */); }
Note:
- This is a GDBus method call handler, so you know GDBus is currently running, and you know this is a multithreaded application.
- The code calls
setenv()
.
Because you’re a careful programmer, you check the manpage before using a function, and read all the way to the bottom, so you know that setenv()
is not threadsafe. Unless your application is trivial and links to no libraries, you can be pretty confident that it is running secondary threads, and that, somewhere, those threads call getenv()
, because getenv()
is used pervasively in libraries. (A frequent culprit for internationalized applications is gettext(), conventionally defined as the underscore function _()
.)
So, what does this mean in practice? It means that if any other thread is calling getenv()
at the same time that you call setenv()
, one or the other call will blow up. This is a timing-dependent race that will only happen occasionally. Thanks to Murphy, you can expect it to only occur on the hardware that your important customer is running, and never for your developers when they attempt to reproduce the issue.
The safest solution is to ensure your application only ever calls setenv()
at the very top of main()
, before your first secondary thread is initialized. It can be quite hard to know which library calls will initialize secondary threads. (What about if you use GOptionContext
? Or GApplication
? If you need to check a setting before changing the environment, how many threads does a GSettings
object spawn when you create it? Even if it was zero — and it’s not — what if a custom GIO module was in use?) Your best bet is to restrict yourself to using setenv()
at the very, very top of main()
, where you know it’s safe. This applies to unsetenv()
, as well.
This is not some theoretical problem. In practice, we have documented:
- Unsafe use of
setenv()
in gnome-session would cause gnome-session to crash, bringing down the entire desktop session and returning the user to the login screen. - In a downstream product using a modified version of gnome-initial-setup, unsafe use of
setenv()
to change LANG on the language selection page would cause the first boot experience to crash, leaving the sufficiently-unlucky user staring at an empty unusable desktop after turning on her new computer.
Both of these issues were rare races that were not possible to reproduce.
In a sense, setenv()
is not really special, in that lots of POSIX functions modify global state and are unsafe to use across threads. The problem with setenv()
is that, once such an issue is identified, it can be very difficult to fix. Sometimes there is an alternative to setting an environment variable, but often environment variables are the only way to configure important settings. If you have no other choice but to set an environment variable, then there is not really much you can do other than risk randomly crashing. Lesson: never make an environment variable the only way for an application to configure an important setting at runtime. Your inconsiderate mandatory environment variable might not be a problem for your own code, but it will be a problem for unforeseen future code that needs to use your code. Mandatory environment variables have caused many problems in GNOME, some unfixable on platforms without library constructors, and some frankly ludicrous. (libproxy, you are not batting very well right now.) The Linux desktop is already stuck with many mandatory environment variables that we cannot get rid of; we do not need to keep adding more. Please, please, please: learn from our mistakes.
I have no clue how to fix glib-pacrunner. It’s already a separate D-Bus service just to allow it to use unsetenv()
(which it does safely) to trick libproxy. I suppose fixing this would require spawning a second helper process in order to run setenv()
before the proxy lookup. More likely, we’ll just leave it broken, since to my knowledge, nobody has reported a crash here yet.
Software is wonderful!