Tuesday, December 15, 2009

¡Tan embarazado! (So pregnant!)

After writing my last post I was thinking about why it bothers me so much that JavaScript uses braces for blocks but doesn't use those blocks for variable scope: a variable declared with var inside a block is visible throughout the enclosing function. I think the reason is that the braces are a false cognate. Braces mean the same thing in many popular languages, and JavaScript uses them in identical contexts to mean something different. Natural languages have false cognates too, of course. My favorite is from Spanish, where embarazada means pregnant, not embarrassed. But in a programming language, designed deliberately by people, we should try to avoid this stuff, right?
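For contrast, here is a minimal C++ sketch of what braces mean in C-family languages (the function name is hypothetical). Each pair of braces opens a new scope, so the inner x below is a separate variable that disappears at the closing brace. In JavaScript of this era, writing var x = 2 inside the braces would instead reassign the single function-scoped x, and the equivalent function would return 2.

```cpp
// Hypothetical demo: braces open a new scope in C and C++.
int block_scope_demo() {
    int x = 1;
    {
        int x = 2;   // a new variable that exists only inside these braces
        (void)x;     // silence "unused variable" warnings
    }
    // Outside the braces the inner x is gone; this is the outer x.
    return x;
}
```

Calling block_scope_demo() returns 1, because the closing brace ended the inner variable's scope.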

Actually, one of the most confusing false cognates I've seen as a programmer is the static keyword of C and C++. There are three major ways to allocate memory for a variable: static, dynamic, and automatic (these terms aren't specific to C). In C, global variables are allocated statically and local variables automatically. (On many platforms you can also allocate automatic storage with the nonstandard alloca(), which allows automatic allocation of structures and arrays whose size isn't known at compile time.) Memory can be allocated dynamically with malloc() and released with free() (new and delete in C++). That's all fine. Here's where it gets tricky: you can apply the static keyword to both local and global declarations.
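The three kinds of allocation can be seen side by side in a short sketch. The names global_counter and allocation_demo are hypothetical, and alloca() is left out because it isn't part of standard C or C++.

```cpp
#include <cstdlib>

// Static allocation: one instance for the whole run of the program.
int global_counter = 0;

// Hypothetical demo function touching all three kinds of allocation.
int allocation_demo() {
    int local = 10;  // automatic: typically on the stack, gone when the function returns

    // Dynamic, C style: malloc() to allocate, free() to release.
    int *from_malloc = static_cast<int *>(std::malloc(sizeof(int)));
    if (from_malloc == nullptr) {
        return -1;  // allocation failed
    }
    *from_malloc = 5;

    // Dynamic, C++ style: new to allocate, delete to release.
    int *from_new = new int(7);

    ++global_counter;  // persists across calls because it is statically allocated
    int result = local + *from_malloc + *from_new + global_counter;

    std::free(from_malloc);
    delete from_new;
    return result;
}
```

The first call returns 23 and the second returns 24. The automatic and dynamic values are recreated on every call, but global_counter keeps its value between calls.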

For local declarations this makes sense: they're allocated automatically by default, but if you modify the declaration with static they're allocated statically instead. The scope is still limited to the block in which they're declared, but the lifetime is the full lifetime of the program. For global declarations it does something totally different. Globals are statically allocated either way. On a global, static instead prevents the symbol name from being exported. It limits the visibility of the variable or function name to code in the same translation unit (roughly, one source file plus the headers it includes, which compile into one object file). You can still export the resources manually using pointers, a common strategy to achieve polymorphism in abstraction layers.
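Both meanings can appear in one small sketch. It also shows the pointer trick: a file-static function is handed out through a table of function pointers. The counter_ops interface and every function name here are hypothetical. The code is valid as both C and C++.

```cpp
// File-scope 'static': these names are not exported, so code in
// other translation units cannot refer to them by name.
static int hidden_total = 0;

static void add_to_total(int n) { hidden_total += n; }

// Local 'static': one statically allocated counter shared by every call.
int next_id(void) {
    static int counter = 0;  // initialized once, keeps its value between calls
    return ++counter;
}

// A small table of operations (a hypothetical "counter_ops" interface).
struct counter_ops {
    void (*add)(int n);
};

// The table itself is also file-static...
static const struct counter_ops default_ops = { add_to_total };

// ...but this exported function hands out a pointer to it, so other files
// can call add_to_total through ops->add without knowing its name.
const struct counter_ops *get_counter_ops(void) { return &default_ops; }

int read_total(void) { return hidden_total; }
```

Successive calls to next_id() return 1, then 2. Calling get_counter_ops()->add(5) from another file updates hidden_total, even though that file could never name it directly. Swapping in a different table is how such abstraction layers get their polymorphism.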

Then C++ comes along and overloads static yet again. By default, every instance of a class gets its own copy of the member variables, while all instances share the same member functions. The static keyword lets you declare a member variable that is likewise shared by all instances. It is statically allocated and so traditionally needs a definition outside the class, at namespace scope, just as member functions defined outside the class body do. Similarly, the virtual keyword lets you declare member functions that are bound dynamically: which version runs is chosen at run time from the object's actual type (this isn't as cool as it is in languages with first-class functions, but it's useful enough).
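The C++ cases fit in one sketch. It contains a per-instance member, a static data member with its out-of-class definition, and a non-virtual function next to a virtual one. Shape, Circle, and describe are hypothetical names chosen for illustration.

```cpp
#include <string>

class Shape {
public:
    double scale = 1.0;    // per-instance: each object has its own copy
    static int instances;  // one copy shared by all instances, statically allocated

    Shape() { ++instances; }
    virtual ~Shape() { --instances; }

    // Non-virtual: the call is bound at compile time from the static type.
    std::string name() const { return "shape"; }

    // Virtual: the body that runs is chosen at run time from the dynamic type.
    virtual std::string kind() const { return "shape"; }
};

// The out-of-class definition the static member declaration needs (pre-C++17 style).
int Shape::instances = 0;

class Circle : public Shape {
public:
    std::string name() const { return "circle"; }           // hides Shape::name
    std::string kind() const override { return "circle"; }  // overrides Shape::kind
};

// Hypothetical helper: calls both functions through a base-class reference.
std::string describe(const Shape &s) { return s.name() + "/" + s.kind(); }
```

Suppose a Circle is passed to describe(). The non-virtual name() resolves to Shape's version, but the virtual kind() resolves to Circle's, giving "shape/circle". Shape::instances counts every live object because there is only one copy of it.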

The real problem, out of these three uses, is C's use of static to limit symbol visibility for globals. Because it's by far the most common use of static, C programmers talk about static variables and static functions when they mean the visibility of those symbols, not their allocation or whether they're shared. They use a term that really has nothing to do with visibility at all...

... which is just so pregnant!
