All entries for June 2022
June 06, 2022
Wherfore art thou – variable names matter!
Sometimes, the simplest questions are the hardest to answer. For instance - what is the meaning of the word "the"? If you've never thought about this, have a go. If you'd prefer a non-grammar, or non-English specific example, try to describe the number '1'. Trickier than you'd think, isn't it?
But that is hardly relevant to programming, so lets look at today's deceptively simple question instead. Namely, how long should variable and function names be?
Some people might seem to have an answer to this question, such as "between one and five words, usually two or more". They are wrong. Any answer containing numbers is unhelpful and either sometimes wrong, or too broad to mean anything. For instance, "Supercalifragilisticexpialidocious" [typed from memory... excuse spelling] is one word, and is terrible. And "SolveWaveEquationSecondOrderWithLimiter" is seven and (in the right context) is quite good. "FlagSettingWhetherWeShouldPrintTheAnswerToScreenOrNot" is clearly terrible - but it is thoroughly descriptive.
So why do we struggle so much to answer this question? Because we are in a sort of "tug-of-war" between two competing interests, and which one pulls a little harder depends on a lot of things. Both ends of the rope are anchored in clarity: on the one hand we want maximum clarity for the function - a very descriptive name. This tends to favour longer names, with more detail. On the other hand, we want maximum clarity for the code as a whole - a name that doesn't take up too much mental space in a block of code. This tends to favour shorter names, with fewer, simpler words. To find our optimum, we must somehow balance these two.
A brief aside here into our absolute best weapon in this - make the two ends stop pulling. Find a way to make "descriptive" and "compact" the same thing. Specialised definitions of terms specific to a field of interest, i.e. jargon, really works in our favour here. We must use it with caution, because new jargon can be very jarring and difficult to master, but in general jargon terms are addressing exactly the same problem - descriptive yet compact terms. For instance, a word you may barely think of as jargon - "laptop". Let us expand this definition - perhaps we get "a portable, all-in-one computing device". But we still have some specialised terms here - what is "all-in-one"? What is "computing"? We could continue the expansion almost indefinitely. Or, we can simply use the term "laptop" and in most cases be perfectly understood.
It is important to flag here that word, "most". Is a tablet a laptop? If I strap a monitor to a tower PC and plug in a keyboard, do I have a "laptop"? It's going to depend on context whether those are reasonable (OK, the second one is extremely unlikely to be). But consider similar questions - "do you have a car?" "No, I have a van" - sensible answer, or irritating pedant? "We need to seat 5 people, who has a car?" "I have an MX5" - as it turns out, "car" doesn't always mean 5 seats.
Back from our aside now, new weapon in hand. Careful, selective use of jargon can make our names very descriptive, while remaining compact. If the jargon we use is not universal, we can provide a glossary or add context in our docs. We should be very very careful about making up our own jargon here, because unfamiliar terms can lead to misunderstanding, but if your field provides words, exploit them.
One last important point - sometimes an otherwise optimal name is not useful due to ambiguity. This can be typographical - we should never have names that differ only in characters like '1' and 'l' or differ only in the case (small or CAPITAL) of the letters. Ambiguity can be similarity with some other function name - imagine we somehow tried to have "GetDiscreteName" and "GetDiscreetName" - could you even tell those apart? Ambiguity is an enemy of clarity - avoid it.
OK, actually there is one final point. Clearly redundancy can only harm our compactness of names. But it does have some applications. We might want two functions with very similar names, but different parameters - "SolveTypeAEquation(a, b, c)" and "SolveTypeAEquationNormalised(a, b, param)" for instance. If we can't design the code to make these less confusing, we might deliberately make one name less individually clear, if it makes its usage clearer. So perhaps the second one becomes "SolveTypeAEquationNormalisedWithParam(a, b, param)" which repeats information already in the signature (redundancy), but helps us keep in mind which function we want.
Short posts like this rarely have conclusions, because everything has already been said, usually repeatedly. Since repetition is another form of redundancy that actually works though, lets restate our key point:
We must balance the demands of clarity between our functions and our wider code and try and find a name which is BOTH descriptive and also short and easy to handle. Sometimes one of these will be a bit more important, sometimes the other.