All entries for July 2018
July 30, 2018
Committing Sins
A short one again, this time about commit messages. Some of this, particularly the message formats, is specific to Git, but the rest applies to any version control system, even the trivial one of saving a timestamped version of code files.
What I say here has been said many times, but it always bears repeating, because things like this don't matter... until they do, and you're rummaging through weeks of messages looking for some key word you don't recall and trawling through endless useless messages that only say "Fixed a bug" or "Cleaned up code". Having a code history is only useful when you come to use it, usually when something has gone a bit wrong, at which point you want concise, meaningful and absolutely correct data.
The Unwritten Rules
First, there are some simple rules on how a git commit message should be formatted, that are a bit hard to find written down, although some editors like Vim will highlight commit messages according to them. They are noted in the Discussion section of the man page for git-commit too and are discussed here. The expanded rules are:
- Follow any specific rules for a particular project first!
- The first line is the subject, and should be less than 50 characters.
- The second line must be blank. Any text here is ignored.
- The subsequent lines are the message body, and should generally be less than 72 characters.
- Use as many lines as you like, but be concise. If you need paragraphs, put blank lines between them
- If your code uses an issue tracker, include reference numbers for connected bugs etc
- Explain why, and what in the broad sense (see next section)
DRYness and Wit
Next, one of the overarching themes in programming is 'Don't repeat yourself' (DRY). As soon as information appears in two different places, you risk them becoming out of sync, and double the work of changing or understanding the thing. I talked about this a bit in the last post under the banner of "self documenting code". There, the idea is to let the code speak for itself, rather than trying to explain it with comments, saving your comments for the 'whys' and the information not available this way. Commit messages are the same. Anybody can look at the changes you made to the code, and know what precisely you did. The message is your chance to tell them why you did this, how this fixes a problem or what was wrong.
XKCD has a comic on this (he always does) and there are many humoroussites, some ruderthan others, as well as endless blog, forum and user guide entries with their own ideas about what is best. I like to mention once again the 'Principle of least surprise'. Make sure your code does nothing surprising in light of your message. For instance, NEVER commit something with message"Whitespace changes" if it does other things too!
Making History
Finally, there is the question of how large, how contained and how functional commits should be. As usual, there are as many theories about this are there are programmers. Some like to commit little and often, and before sharing that branch with anybody, squash togetherchunks into logical blocks. Some consider that to be a forbidden sort of rewriting of history. There are many, many ways to manage your work, and few rules. But:
- Alwaysfollow any guidelines for the project you're working on first and foremost!
- Anything you share should compile (if relevant) and run, although it needn't work in all respects. Otherwise, tracking when a bug was introduced can be a nightmare, particularly once your part-working code is merged with somebody else's and the result does what neither of you expected. See below for more.
- A large feature should probably be spread across several commits - building it up piece by piece.
- What you share should make sense. It's OK to try things out, find they don't work, and go back, but you probably only want that to be part of the history if its useful in showing why it didn't work.
Note: There is a crucial difference between code which is written to not work, and code which doesn't work. The former is things such as a function called 'is_thingy' always returning 'false'. The latter is that function returning something undefined (in fact, one release of the GCC C compiler had exactly the former for its regular-expression parsing code, always returning no-match). However it is usually acceptable for such a function to return an error, or throw an exception, because the calling code can handle that.
Summary
There's nothing in here that hasn't been said many times before, but the real take away is pretty simple - whatever strategy for messages, comments and code style, it should always be about increasing what you know about the code, not repeating it!
July 11, 2018
Maths in Code
Programming Equations
Short post this week, with a simple theme: how should you translate equations into code?
Self Documenting Code
Self-documenting code (aka self-describing code) is code that documents itself, rather like the term 'self-documenting code' already tells you what it means. The idea is to choose function names, variable names and types, and (in relevant languages) Class or Object hierarchies so that they demonstrate what they do. This can mean a lot of things, and tends to mean slightly different things to different people. In general, especially in cases of ambiguity, one aims for the "principle of least surprise" - what is the least unexpected meaning of something with this name? For instance, old fashioned C programmers, might consider "i" to be a perfectly self-documenting loop variable, as the usage is very familiar.
This leads us nicely into the spirit of self-documentation. Programming is hard. Choosing sensible names and structures reduces the effort needed to keep track of what things are, as well as everything else in your program. This means that the names you choose should help you, and whoever reads or uses your code, work out what things mean. Imagine coming back to your code after weeks or months not working on it. Can you read it through and guess, correctly, what things mean? If not, consider reworking your namings or structure.
Names shouldn't just repeat information already available though, as that wastes space which could be better used. For example, a boolean or Logical variable already tells you that this is probably some sort of flag, controlling whether or not something should happen or has happened. "write_to_file" is a useful name. "flag" is not. "is_open" is a useful name for a flag associated with a file, "open_flag" is rather less so.
On this note, there is a common naming scheme which uses so-called Hungarian Warts, where some sort of 'type' information is prepended using a few (< 3) letters. Originally these were used to add 'extra' information, such as 'us' for 'unsafe string', which can be very useful. In languages like Python, where definitions don't specify type info, even just tages like 's' for string can be useful, as long as you make sure this is actually true!
Don't go mad with descriptions though. Supposedly the average person can keep 7 things in mind at once so you definitely don't want to occupy all that space with your variable names. 2 or 3 pieces is about the limit for most names, so try and use things like "input_file" not "file_to_read_data_from" and avoid linking words as far as practical. Keep names short and sweet, to save typing and brain-space.
By the way, if you use an IDE to program, you might feel that none of this matters, but while it will fill in the names for you, it really doesn't solve everything. You still need to keep track of which variable you want and select the correct one in ambiguous cases, and keep track in your head of what is what. The self-documenting ideal means that when your IDE gives you a list of 5 similar looking variables, you barely have to think to select the one you need in context.
Self Documenting Equations
Take a nice simple equation, ax^2 + bx + c = 0 and imagine programming something to solve it. If you took the self-documenting advice at face value, you might convert "x" into "polynomial_variable", and "a, b, c" into "coefficient_of_square_term", "coefficient_of_linear_term" and "coefficient_of_constant_term". That leaves the quadratic equation looking pretty complicated, as
polynomial_variable = (- coefficient_of_linear_term +|- sqrt( coefficient_of_linear_term^2 - 4 * coefficient_of_square_term * coefficient_of_constant_term) )/ (2 * coefficient_of_square_term)
instead of the far more familiar
x = (- b +|- sqrt( b^2 - 4 * a * c)) /(2 * a)
Similarly, if you program equations from a paper, it can be much, much easier to use variable names that match the paper, especially when trying to verify them. Sometimes it is even a good idea to mimic the paper's equation format, at least until things are working. More on that in the next section.
Optimising Equations from Papers
So what about rearranging equations for optimisation purposes? If you ask ten different people, you'll get ten different answers, but in my opinion the following are the keys:
- Only simplify once the code works. Don't bother collecting, hoisting or other tweaks until you need to.
- Keep names generally the same as the paper, but ideally remove ambiguity, so don't use 'a' as well as 'A'. Consider 'a' and 'bigA' for more distinction. Absolutely avoid ambiguities like 'l' and '1'.
- When you do change to rearranged equations, consider leaving the original code present in a comment. This is one of the few places where commented code is a good thing.
- When you combine or collect terms, start by just combining their names or the functions use, so use 'A_squared', 'sin_theta' or 'A_over_B'
- Include a link to the paper or source you used in the code, so that your variable names remain useful. Somebody with a source that uses different namings will find this style really tricky. If you can, consider including a snippet in your docs detailling the original equations and what rearrangements you did.
Find your style, stick to it, and optimise only as much as necessary to get the performance you need!
Mathematica for Complicated Expressions
If your equations need substantial rearranging or simplifying, you might turn to a tool like Mathematica to do this for you, but did you know that it can even write C or Fortran code directly?
https://reference.wolfram.com/language/tutorial/GeneratingCAndFortranExpressions.html
You can save a lot of typing, and avoid some really easy mistakes by taking advantage of this. Don't forget FullSimplify and to apply suitable restrictions on variables to get a nicer expression before doing this though.