March 18, 2022
Getting a little testy
Testing code is essential to knowing if it works. But how do you know what to test? How do you know you've done enough?
Let's be clear to start: "testing" as we think of it here is some form of comparing the code's answer to a predicted one, making sure the code "gets it right" or meets expectations. If there is a canonical "right answer" then it's fairly easy to do; if not, things get difficult, perhaps impossible. Either way, correctness is the goal.
What to test? Well, all of it, of course! We want to know the entire program works, from each function, to the entire chain. We want to know it all hangs together correctly and does what it should.
So when have you done enough testing to be sure? Almost certainly never. In a non-trivial program there is almost always more you could test. Anything which takes a user input has almost infinite possibilities. Anything which can be asked to repeat a task for as long as you like has literally infinite possibilities. Can we test them all? Of course not!
Aside - Testing Everything
Why did we say user input is only almost infinite? Well, all numbers in the computer have a fixed number of bits for storage, which means there's a strictly fixed number of possible numbers. Technically, for a lot of problems, we really can test absolutely every input. In practice, we can't afford the time, and anyway, how would we know what the right answer is without solving the problem completely?
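To make that concrete, here's a minimal sketch in C++ (the isqrt function is invented for the example) of exhaustively testing every possible 16-bit input. Since we don't have an independent "right answer" to hand, we check a property of the result instead.

```cpp
#include <cmath>
#include <cstdint>
#include <iostream>

// Invented function under test: integer square root of a 16-bit value
std::uint16_t isqrt(std::uint16_t x) {
    return static_cast<std::uint16_t>(std::sqrt(static_cast<double>(x)));
}

int main() {
    // A 16-bit input has only 65536 possible values, so we really can try them all
    for (std::uint32_t i = 0; i <= 0xFFFF; ++i) {
        const auto x = static_cast<std::uint16_t>(i);
        const std::uint32_t r = isqrt(x);
        // Property check: r*r <= x < (r+1)*(r+1), no precomputed answers needed
        if (r * r > x || (r + 1) * (r + 1) <= x) {
            std::cout << "isqrt wrong for input " << x << "\n";
            return 1;
        }
    }
    std::cout << "All 65536 possible inputs checked\n";
    return 0;
}
```

Of course, this only works because a single 16-bit argument is a tiny space - with two 64-bit arguments the same trick is hopeless, which is the point being made above.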
Picking random things to check is an idea that's used sometimes, as are "fuzzers", which aim to try a wide range of correct and incorrect inputs to find errors. But these are also costly to perform, and still require us to know the right answer to be really useful. They can find crashes and other "always wrong" behaviour though.
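As a rough illustration of the random-input idea, here's a sketch along those lines (parse_fraction is invented for the example). We throw a pile of arbitrary strings at it and only demand that it either answers or rejects them cleanly - the "always wrong" behaviours we're hunting are crashes, hangs and unexpected failures, not wrong values.

```cpp
#include <iostream>
#include <random>
#include <stdexcept>
#include <string>

// Invented function under test: parse strings like "3/4" into 0.75
double parse_fraction(const std::string& text) {
    const auto slash = text.find('/');
    if (slash == std::string::npos) throw std::invalid_argument("no '/' found");
    const double num = std::stod(text.substr(0, slash));
    const double den = std::stod(text.substr(slash + 1));
    if (den == 0.0) throw std::invalid_argument("zero denominator");
    return num / den;
}

int main() {
    std::mt19937 gen(12345);                         // fixed seed: failures stay reproducible
    std::uniform_int_distribution<int> ch(32, 126);  // printable ASCII
    std::uniform_int_distribution<int> len(0, 20);

    for (int trial = 0; trial < 100000; ++trial) {
        std::string input;
        for (int i = 0, n = len(gen); i < n; ++i)
            input.push_back(static_cast<char>(ch(gen)));

        try {
            parse_fraction(input);        // any returned value is acceptable here...
        } catch (const std::exception&) {
            // ...and a clean, documented rejection is fine too.
        }
        // What this run hunts for is a crash, a hang or corrupted state -
        // we can't confirm the answers are right without knowing the right answer.
    }
    std::cout << "Survived 100000 random inputs\n";
    return 0;
}
```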
Lastly, in lots of codes, we aren't expected to protect users from themselves, so entering an obviously silly value (like a person's height of 15 ft) needn't give a sensible answer. We can fall back to "Garbage In Garbage Out" to excuse our errors. But our users might be a lot happier if we don't, and in important circumstances we won't be allowed to either.
Back to the Grind
That all sounds rather dreary - writing good tests is hard, and we're saying they're never enough, so why bother? Let's step back for a minute and think about this. How do we turn an "infinite" space into a tractable one? We have to make it smaller. We have to impose restrictions, break links, and make more things independent. The smaller the space we need to test, the more completely we can cover it.
Unit testing
Most people have probably heard of unit testing by now - testing individual functions in isolation. It seems like if you do this then you have tested everything, but this is not true. What happens if you call this function followed by that one (interdependency)? What happens if you call this function before you did this other thing (violation of preconditions)?
Unit testing is not the solution! Unit testing is the GOAL!
If we could reliably say that all of the units working means the program works, then we can completely test our program, which is amazing! But in reality, we can't decouple all the bits to that extent, because our program is a chain of actions - they will always be coupled because the next action occurs in the arena set up by the preceding ones. But we have to try!
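To see the gap, here's a tiny sketch (the functions are invented for illustration): each function passes its own unit test, but the pair only work when called in the right order, and that ordering is exactly what testing each in isolation never exercises.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Invented example: load_data() must run before mean() makes any sense
std::vector<double> data;

void load_data() { data = {1.0, 2.0, 3.0, 4.0}; }

double mean() {
    // Hidden precondition: data has already been loaded
    return std::accumulate(data.begin(), data.end(), 0.0) / data.size();
}

int main() {
    // Unit test of load_data(): passes
    load_data();
    assert(data.size() == 4);

    // Unit test of mean(), run after load_data(): passes
    assert(mean() == 2.5);

    // But a caller who forgets load_data() gets a silent NaN from mean() -
    // neither unit test, on its own, can catch that ordering dependency.
    return 0;
}
```

Notice that the culprit is the shared global variable - which is exactly where the next section is heading.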
Meeting the Goal
So we have to design, architect and write our programs to decouple as much as possible, otherwise we can't understand them, we struggle to reason about them, and we are forced to try and test so much that we will certainly fail. A lot of advice is given with this sort of "decoupling" in mind - making sure the ways in which one part of a program affects another are:
- as few as possible
- as obvious as possible
- as thin/minimal as possible
What sorts of things do we do?
- Avoid global variables as much as possible, as anything which touches them is implicitly coupled
  - C++ statics are a form of global, as are public Fortran module variables (in most contexts)
- Similarly, restrict the scope of everything to be as small as we can to reduce coupling of things between and inside our functions
  - C++ namespaces, Fortran private module variables, Python "private" variables (starting with an underscore, although not enforced in any way by the language)
- Avoid side-effects from functions where possible, as these produce coupling. Certainly avoid unexpected side effects (a "getter" function should not make changes)
  - Fortran offers PURE functions explicitly. C++ constexpr is a more complicated, but related, idea
- Use language features to limit/indicate where things can change. Flag fixed things as constant.
  - Use PARAMETER, const etc. freely
  - In Fortran, use INTENT; in C/C++, make function arguments const etc.
  - In C++, aim for const correctness, and make member functions const where possible. Use const references for things you only want to read
- Reduce the "branchiness" (cyclomatic complexity) of code. More paths means more to test.
- Keep "re-entrancy" in mind - a function which depends only on its arguments, doesn't change it's arguments, and has no side effects can be called over and over and always give the same answer. It can be interrupted/stopped and started afresh and still give the same answer. This is the ultimate unit-test friendly function!
Overall, we are trying to break our code into separate parts, making the "seams" between parts small or narrow. If we drew a graph of the things in our code and denoted links between them with lines, we would want blobs connected with few lines wherever possible.
Every link between blobs is something we can't check via unit testing. And unit testing is king because it is possible to see that we have checked every function, even though we can never see whether we have checked every possible behaviour within a function, even with code coverage tools.
Aside - Writing useful unit tests
Even if everything is amenable to unit testing, that doesn't make writing the tests actually easy. There are a few things to keep in mind when writing them too.
- Poke at the edge cases. If an argument of '3' works, then '6' probably will too, but will '0'? A negative number? A very large number? Even (a+b)/2 for an average will break down if a + b overflows! Look for the edges where behaviour might change, and test extra hard there (there's a short sketch after this list)
- If your function is known only to work for certain inputs, make them preconditions (things which must be true). This can be done either in documentation, or using things like assert
- Check for consistency. If you mutate an object, check it is still a valid one at the end
- Try to come up with things you didn't think about when you wrote the function, or that no "sensible caller" would do
- Check that things fail! If you have error states or exceptions, make sure these do occur under error conditions
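Here's a short sketch of those points (midpoint is an invented example, and plain assert stands in for whatever test framework you prefer): the tests cluster around zero, negative values, the representable limits and the deliberate failure case.

```cpp
#include <cassert>
#include <climits>
#include <stdexcept>

// Invented function under test: midpoint of two ints, written as a + (b - a) / 2
// to sidestep the (a + b) / 2 overflow (though b - a can itself overflow for
// extreme mixed-sign inputs - another edge worth thinking about)
int midpoint(int a, int b) {
    if (a > b) throw std::invalid_argument("midpoint: a must not exceed b");
    return a + (b - a) / 2;
}

int main() {
    // The easy cases
    assert(midpoint(2, 6) == 4);
    assert(midpoint(0, 0) == 0);

    // Edge cases: negatives and values near the representable limit,
    // where a naive (a + b) / 2 would overflow and give nonsense
    assert(midpoint(-6, -2) == -4);
    assert(midpoint(INT_MAX - 2, INT_MAX) == INT_MAX - 1);

    // Check that failure actually fails: violating the precondition must throw
    bool threw = false;
    try { midpoint(5, 1); } catch (const std::invalid_argument&) { threw = true; }
    assert(threw);

    return 0;
}
```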
Dealing with the Rest
If we could actually produce a piece of code with no globals, no side-effects and every function fully re-entrant, unit tests would be sufficient to check everything. Sadly, this is impossible. Printing to screen is a side-effect. All useful code has some side effects somewhere. Mutating an argument makes a function non-reentrant (we'd have to make a copy and return the new one instead, and that has costs). So we seem doomed to fail.
But that is OK. We said unit-testing is a goal, something we're trying to make as useful as possible. We can do other things to make up for our shortfalls, and we definitely should, even if we think we got everything. Remember, all of those bullets are about minimising the places where we violate these ideals, and so minimising the chance of unexpected emergent behaviour when we plug functions together.
We need to check actual paths through our entire code (integration testing). We need to check that things we fixed previously still work (regression testing). We probably want to run some "real" scenarios, or get a colleague to do so (sorta beta testing). These are hard, and we might mostly do them by just running our program with our sceptical hats on and watching for any suspicious results. This is not a bad start, as long as we keep in mind its limitations.
Takeaway
What's the takeaway point here? Unit Testing works well on code that is written with the above points in mind. Those points also make our code easier to understand and reason about, meaning we're much less likely to make mistakes. Honestly, sometimes writing the code to be testable has already gained us so much that the tests themselves are only proving what we already know. Code where we're likely to have written bugs is unlikely to fail our unit tests - the errors will run deeper than that.
So don't get caught up in trying to jam tests into your existing code if that proves difficult. You won't gain nearly as much as by rewriting it to be test friendly, and then you almost get your tests for free. And if you can't do that, sadly unit tests might not get you much more than a false sense of security anyway.