All entries for January 2020
January 22, 2020
Magic numbers
We've mentioned magic numbers before, but they are one of those things that are always worth mentioning again. Magic numbers are literal numbers that you write into your code and that reduce the clarity of the code. The alternative to magic numbers is to store the values in variables, preprocessor definitions, or some other way of associating a number with a name. Exactly what reduces the clarity of your code is often a bit subjective and is certainly specific to your research field, but there are a few rules of thumb:
- Is the number only an approximation? You don't want to risk using different approximations in different places in your code. Imagine what would happen if you gave pi different values in different contexts.
- Is the number immediately recognizable? As a physicist I'd recognize "0.5*m*v**2" when I see it in code as being the kinetic energy of an object despite the 0.5, but I'd have more trouble with "4.0e-7 * pi". It's probably the pre-2019 definition of the vacuum permeability (the exact value used before the SI redefinition), but it isn't entirely clear.
- Is the number arbitrary? It's quite common to use a number to specify things like which problem to run or which optional package to use. You might remember what problem 3 is right now, but you'll have to look it up if you ever come back to this code. Even worse, if you lose discipline then you might change what the numbers do when one of the test cases becomes redundant. Replacing your arbitrary numbers with named constants greatly reduces these problems. If you give your problem number a sensible name then you'll have a much better chance of remembering what it does. Similarly, if you ever remove a problem and remember to remove its named constant as well, the code will no longer compile when someone tries to run the old test. A lot of languages provide "enumerated types" or "enums" that allow you to automatically map numbers to names and can also prevent a user from choosing to supply a simple integer instead.
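As a rough illustration of the three rules above, here is a minimal Python sketch. The problem names, the `Problem` enum and the `describe` function are invented for this example and don't come from any real code; the point is only to show a named constant for the non-obvious number, a literal left alone where it is recognizable, and an enum in place of arbitrary problem numbers.

```python
import math
from enum import Enum

# Named once, used everywhere: the pre-2019 exact value of the
# vacuum permeability in H/m, instead of a bare "4.0e-7 * pi".
MU_0 = 4.0e-7 * math.pi


class Problem(Enum):
    """Hypothetical test problems; the names are made up for illustration."""
    SHOCK_TUBE = 1
    BLAST_WAVE = 2
    KELVIN_HELMHOLTZ = 3


def kinetic_energy(mass, speed):
    # The 0.5 is immediately recognizable here, so it stays a literal.
    return 0.5 * mass * speed**2


def describe(problem):
    # Refuse bare integers so callers have to use the named members.
    if not isinstance(problem, Problem):
        raise TypeError("expected a Problem member, not a bare number")
    return f"running {problem.name}"


print(describe(Problem.KELVIN_HELMHOLTZ))
```

Python has no compile step, so deleting a member such as `BLAST_WAVE` shows up as an `AttributeError` when the old test is run rather than as a compile error, but the effect is similar: the stale reference fails loudly instead of silently running a different problem.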
It can feel like this is unnecessary and slows you down when you are just writing a quick code, but one of the major problems with academic software is that it tends to grow. You write a code to solve a problem that you encounter during your research and you aren't terribly careful because you aren't going to keep it. But it is quite likely that you won't put it to one side and never touch it again. You might encounter a similar problem later and modify the code. At that point elements like magic numbers are annoying but you can usually work out what your own code is doing. The major problem comes if you move on and your code is inherited by someone else. In this case they might work everything out perfectly (which is good), they might find that they can't understand it and have to start again (which is annoying but not too troublesome) but worst of all is that they might misunderstand what your code is doing and make changes that are incorrect.
January 09, 2020
Finding The Solution
New Year, new blog post. Just a short one this time, following on from my post on FizzBuzz a few months ago. Even a problem as simple as that can be solved in myriad ways, and as I program more and in more languages, I find myself less often wondering how I can solve a problem, and more often how I should.
Most of the languages I work with let me solve problems using the basic command structures, and, as I wrote about last time in The X-Y problem, it can be hard, but is vitally important, not to get confused by your partial solution and miss a better one.
Recently I've been learning Perl to do some complex text-processing, and find it to be a drastically different way of thinking to my C/Fortran background. It's tricky to think in terms of text matches and substitutions when I am so used to thinking of the position of each character in a string and working in terms of "index-of-character-X plus 1" (similar to working in terms of for-each loops when one has only used for). For the processing being done, the proper Perl solution is much shorter, easier to understand, etc, although it takes me a bit longer to produce initially.
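To make the contrast concrete, here is a small sketch of the two habits. It is written in Python rather than Perl, and the "key = value" input and the function names are invented purely for illustration; it is not the text-processing task described above.

```python
import re


def value_by_index(line):
    # C/Fortran habit: find the position of '=' and take everything
    # after it ("index-of-character-X plus 1").
    pos = line.find("=")
    if pos == -1:
        return None
    return line[pos + 1:].strip()


def value_by_match(line):
    # Pattern habit: describe the shape of the text and capture the
    # part that is wanted.
    match = re.search(r"=\s*(.*?)\s*$", line)
    return match.group(1) if match else None


def normalise(line):
    # Substitution: remove any whitespace around '=' in one step.
    return re.sub(r"\s*=\s*", "=", line.strip())


print(value_by_index("name =  Death "))
print(value_by_match("name =  Death "))
print(normalise("name =  Death "))
```

Both approaches give the same answer on input this simple. The pattern version tends to pay off as the format gets messier, because the regular expression changes while the surrounding code stays the same.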
A recent Stack Exchange post I saw had somebody asking why his boss didn't appreciate his brilliant coding techniques, because design patterns were second nature to him, and his boss wanted to use far simpler solutions. He probably came back to Earth with a bit of a bump when it was firmly pointed out that "patterns being second nature" was actually a bad thing, because it rather sounded like he trotted out the first "pattern" he could think of, instead of actually thinking about the problem he was solving. Nothing wrong with the patterns themselves, but critical thinking is required to decide whether they are suitable, optimal etc.
The other common mistake people make is demonstrated in Terry Pratchett's description of "Death's Swing" (e.g. https://en.wikipedia.org/wiki/Death_(Discworld)#Home), which mirrors the sunk cost fallacy. Trying to build a swing for his granddaughter, the character of Death plows forward in spite of all problems. He hangs the swing from the two strongest branches. These being on opposite sides of the tree, he cuts away the trunk, shores it up and ... This can easily happen when programming, and the trick is never to be afraid to throw away (or file away for future use) a solution, even a good one which took a long time, if it stops fitting the problem. In Death's case, it is less because he is unwilling to throw away the work already done, and more an issue of very linear thinking, but the effect is the same.
Hopefully this is already obvious, and you always think before you code, happily refactor or rework your own code, and have an ever-growing solution bank to call on. I suspect very few people are willing or able to throw away all the false starts they probably should though. Just keep in mind that there is a crucial difference between "what solves the immediate problem" and "the code I should probably actually write", and strive for the latter.