July 24, 2019

SOUP: The FizzBuzz Variations

I like challenges - sometimes I do maths and programming puzzles for fun. It's good practice, and if you go into industry as a programmer you may well get hit with this sort of thing in an interview. Interviewers want to verify that somebody can vaguely code before moving on to more detailled questions, so often set something simple, but originally, novel. The novelty has rather been eroded now, so you can find entire sites dedicated to compiling these questions and drilling them - which we definitely don't recommend here - but they are a good source of challenges.

One classic "challenge" is FizzBuzz. You may have encountered this as a game - I remember it from Secondary School French lessons, to help learn the names of numbers. The basic idea is very simple - you go through reading out the numbers, usually from 1 to 100, and for every multiple of three, substitute the word "fizz", for every multiple of five, the word "buzz" and for multiples of three and five (i.e. fifteen) "fizzbuzz".

For anybody who has done maths past GCSE or so, you've probably met Modulo division in some form. But the best thing about FizzBuzz is that you don't need to have. In fact, there is a complete solution without using anything except addition and equality-checking. Moreover, the core problem is nice and simple, so anybody should be able to create the pseudo-code version even if they have no idea how to check for being a multiple of three or five.

I once read an article from somebody who thought that FizzBuzz was a terribly difficult problem that needed high-level maths to solve. In the comments was somebody claiming that anybody who could just write out the code must have spent far too long drilling interview questions. That seemed a little absurd, but it did make me want to produce a FizzBuzz solution which did indicate spending far too much time thinking about it. Just to see if it was possible.

More seriously though, simple problems like this can be made into a great little challenge. There are many solutions, and you can make the problem almost arbitrarily hard by imposing artificial restrictions. You can always verify the answer, and you can get surprisingly creative. I've compiled a bunch of variations on FizzBuzz, from the outline, to the "cleverest" solution I could create, to illustrate the learning opportunity of this sort of thing. They are all on Github here.

FizzBuzz outline code

Anybody who can program should be able to turn the description above into psuedo-code, which is itself a handy skill. Here, all you have to do is expand the words into something slightly more precise. For example:

loop variable i from 1 to 100
  if i divides by 3, but not 5, print "fizz"
  else if i divides by 5, but not 3, print "buzz"
  else if i divides by 3 and 5, print "fizzbuzz"
  else print i
end loop

So, here is the first place where knowing some maths will help us: I wrote those three branches very carefully to be independent. But they don't need to be! I could just do this:

if i divides by 3: 
  print "fizz"
  set flag indicating not to print i 
if i divides by 5:
  print "buzz"
  set flag indicating not to print i 
if not flag:
  print i

Note that this is not necessarily better! It's different, sure, but to get this to work I need to add an extra flag variable, and have three independent "if"s rather than one "if-else" chain. We can work around this pretty easily, by constructing a string rather than printing it directly, such as (I have this in the repo in Fortran as StringConstructionFizzBuzz.f90)

string = "" 
if i divides by 3: 
  string = "fizz"
if i divides by 5:
  string += "buzz"
if string =="":
  string = str(i)

Before going on to real code, note the other possible approach, if we didn't have a handy way of checking for divisibility:

loop variable i from 1 to 100 
  increment counter_3s 
  increment counter_5s 
  if counter_3s equals 3
    print "fizz"
    set flag_not_print_i
    counter_3s = 1
  if counter_5s equals 5
    print "buzz"
    set flag_not_print_i
    counter_5s = 1
  if not flag_not_print_i
    print i

Both of these "stock solutions" are in the repo in C, Fortran and Python as FizzBuzzModulo and FizzBuzzCounter.

FizzBuzz as a challenge part 1 - pseudo codes

So, what might we come up with as a deliberate variation on this? Well first off, lets fall back onto the "You Aren't Going to Need It" and assume that 3, 5, and 100 are absolutely fixed. Then we have a super simple solution:

print 1
print 2 
print "fizz" 
print 4 
print "buzz"

Simple, and it works. But it's seriously inflexible and we have to type a lot!

But wait a moment! There's a pattern here, isn't there, and it repeats every 15 lines. So what about

i = 1
while i is less than 100
  print i
  increment i
  print i
  increment i
  print "fizz"
  increment i
  print i
  increment i
  print "buzz"
  print i
  increment i
  print "fizzbuzz"
  increment i
end while

Where the ellipsis (...) is, we have enough lines that i ends up as 15 on iteration 1. The nice thing about this, is that we've relaxed the requirement that the end be at exactly 100. Except we have a rather large bug here: we get a multiple of 15 lines, every time. We'd need to find another way of stopping on the last round. In the GitHub repo for this post, I have a silly example using Bash taking this approach (UnrolledFizzBuzzGenerator.sh). It is undoubtedly silly, but this sort of approach is a pretty long way from the obvious first idea, which was after all the intention.

Note the biggest downsides of this code: it is not obvious what it's doing because it is unpacked a little too much, and the 3 and 5 are baked right into the code.

Fizz Buzz Without If

That last example was my first attempt at writing FizzBuzz without using "if", which technically it did achieve. However, "while" can trivially be used to replace if (assuming a language or variant that runs 0 times if the condition is untrue). We can do better than that, surely?

Of course we can, but first we need to think hard about what "if" is doing for us here.

Understanding What If Does

First, we need to find a way of describing the action of "if" without using the word "if". (Note that modern processors actually have inbuilt instructions called "jump-if" so this isn't strictly true, but helps us find a way to code the problem without using "if"). Note that "on" counts as a synonym here.

Consider a simple one line if chain such as

1 if (condition)
2  do thing one
3 else
4  do thing two
5 end if

If the condition is true, we end up on line 2, otherwise we end up on line 4. Then, we end up on line 5 and carry on. So, putting on our geeky hats and trying to be clever, we might think "aha! We end up on line (2 + 0 (true)|2 (false)).

The next step is far more intuitive if you come from C, or one of the other languages where true is just a number. In C, we can write something like

index = 2 + 2 * (condition != true) and get 2 if the condition is true, and 4 if its not, because "true" has value of 1, and false has 0.

Using an actual "goto" instruction is, of course, evil (not entirely true, but good enough), and in this case would be the essence of solving the letter of the problem, rather than the spirit. We need to think more generally.

Function Pointers

I have written before about using function pointers to replace "if" conditions. In this case we can't set a pointer beforehand, but we have just worked out how to replicate the if chains in our original pseudocode. So, what if we made an array, and used something like the expression above to calculate the offset into it, and then called the function we find there?

This variant works pretty well in C, but in Fortran, where Logicals are a distinct kind of variable, it is very clumsy to calculate the offsets, but can be done. This example is in the repo as FunctionPointerFizzBuzz.f90, .c and .py.

Once again, this a fairly silly way to solve FizzBuzz itself, but does use a technique that can be very very handy! Practice with this sort of thing on a silly example is a great way to learn, and if you keep these sorts of snippets around, you have something to refer to when you need to use the technique in anger later.

FizzBuzz for C programmers

The last example, which is the cleverest I could manage to be about this, is pretty deeply based on C semantics. It relies on both how C handles it's strings, and how its pointers work. I'm not going to explain what is going on, but I thought of it after trying to simplify the function pointers example because of how similar the print commands were. This example is in the repo as SillyStringFizzBuzz.c

This example really goes overboard on the cleverness and is not an approach I would ever use in "real code", but it was fun to write and a good challenge. I think it meets my original challenge pretty well - it is FizzBuzz being entirely too clever. If this sort of things appeals, look up the idea of Code Golf and/or Obfuscated C - coding answers to puzzles which are very very short and hard to understand.

That might sound exceptionally silly - why would anybody want to program in a way that's hard to understand? But of course, knowing what's hard to understand really helps know what makes things easy, and brings to your attention all of the sneaky things that might not do what you first think.

Wrap Up

To conclude - there can be a lot to learn even from super simple seeming challenges. In particular, you can lay the challenge of not using some code feature, especially one you first turn to to solve the problem normally. Developing the flair for thinking "outside the box" is very handy when you have such challenges imposed externally.

Cleverness that makes you say "oh that's clever" is often not a good thing - your first reaction should probably include "oh OK, I see how it works" before "that's clever". But developing clever solutions is a good way to learn. Think of it like something like keepy-uppies for a football (soccer) player. They're not a skill that's useful in a game (the Wikipedia anecdote about Scotland not-withstanding) but they develop control skills that absolutely are. It's also a handy display piece to demonstrate those skills, should you need one.

July 10, 2019

Datastructures – Linked lists part 3

Follow-up to Datastructures – Linked lists part 2 from Research Software Engineering at Warwick

This time we're presenting the final piece of the puzzle for linked lists - adding items. Adding items is very, very similar to removing them. To add an item "I" between two other items (lets call them "A" and "B", with A <-> B initially) you just have to set it up so that A's next pointer points to I, B's prev pointer also points to I and that I's prev pointer goes to A and I's next pointer goes to B. You do then have to take care if either A or B don't exist (i.e. you're adding at the start or the end of the list) but basically the idea is always the same. When you don't have a standard that you have to follow interface design in programming is often as much an art as a science and there are several ways of implementing a useful interface to do this. You can write a routine "insert_between(I,A,B)" that adds an item between two specified items (although this the has to deal with what should happen if you try to insert between two items that are not actually linked to each other and also requires the developer to provide redundant information. If this is going to work then A must be linked to B so if you know A you know B and vice versa). To add at the start of the list you specify "A" to be NULL, and to add at the end of the list you can specify "B" to be NULL. This isn't seen very often in the wild but it works perfectly well. You could also write "insert_after(I,A)" and "insert_before(I,B)" routines that would allow you to insert item I either "after A" or "before B", again specifying appropriate NULL items for A or B to add to the start or the end of the list. The most common approach in the wild at the moment is that used by the C++ Standard Type Library containers. Have a routine "insert(I,B)" that inserts the item before a specified item B, a routine "insert_after(I,A) to insert an item after the specified item, a routine "push_back(I)" that adds a new element to the end of your list and a "push_front(I)" routine to add the new item at the start of the list. While I'm not sure that I'd pick them from scratch I'll stick to the STL

void insert(llitem* head, llitem* new, llitem *before)
  new->next = before;
  if (before->prev) {
    before->prev->next = new;
    new->prev = before->prev;
  } else {
    head = new;
  before->prev = new;

void insert_after(llitem* new, llitem *after)
  /*head is not used here because inserting after an item cannot touch the head*/
  new->prev = after;
  if (after->next) {
    after->next->prev = new;
    new->next = after->prev;
  after->next = new;

void push_front(llitem* head, llitem* new)
  if (head){
    head->prev = new;
  head = new;

void push_back(llitem* head, llitem* new)
  llitem * current = head;
  if (!current) {
    head = new;
  } else {
    /*spin on until you find an item that has no next element. That element is the end of the list*/
    while(current->next) current = current->next;
    current->next = new;
    new->prev = current;
SUBROUTINE insert(head, new, before)
  TYPE(llitem), POINTER, INTENT(INOUT) :: head !Head of linked list
  TYPE(llitem), POINTER, INTENT(INOUT) :: new !Item to be inserted
  TYPE(llitem), POINTER, INTENT(INOUT) :: before !Item to insert before

  new%next => before
  IF (ASSOCIATED(before%prev)) THEN
    before%prev%next => new
    new%prev => before%prev
    head => new
  before%prev => new

SUBROUTINE insert_after(new, after)
  !No head pointer because inserting after an item can't touch the head
  TYPE(llitem), POINTER, INTENT(INOUT) :: new !Item to be inserted
  TYPE(llitem), POINTER, INTENT(INOUT) :: after !Item to insert after

  new%prev => after
  IF (ASSOCIATED(after%next)) THEN
    after%next%prev => new
    new%next => after%next
  after%next => new
END SUBROUTINE insert_after

SUBROUTINE push_front(head, new)
  !No head pointer because inserting after an item can't touch the head
  TYPE(llitem), POINTER, INTENT(INOUT) :: head !Head of the linked list
  TYPE(llitem), POINTER, INTENT(INOUT) :: new  !Item to be added to the list

    head%prev => new
  head => new

SUBROUTINE push_front(head, new)
  !No head pointer because inserting after an item can't touch the head
  TYPE(llitem), POINTER, INTENT(INOUT) :: head !Head of the linked list
  TYPE(llitem), POINTER, INTENT(INOUT) :: new  !Item to be added to the list
  TYPE(llitem), POINTER :: current

  current => head

    head => new
    DO WHILE(ASSOCIATED(current%next))
    END DO
    new%prev => current

Once again a diagram will probably make things clearer.

add item to linked list

There is one thing that you can spot easily from the implementations : in order to add to the end of the list you have to cycle through all of the intermediate items to get to the end. This isn't actually a major problem since for a lot of the uses of linked lists you can freely choose to add to the start of the list which happens immediately by just updating the head. However if you want to be able to add to the end of a linked list then it is common to actually store an equivalent of the "head" item that marks where the end of the list is. This item is generally called the "tail" item. I haven't described anything about using tails because it doesn't add anything to the fundamental nature of the linked list while making all of the implementation a little bit messier. Holding the tail also introduces a debugging risk because it is (technically) redundant and can be inconsistent with the actual last element that you would reach by iterating through the list (usually only because of errors in your implementation). The addition of multiple sources of information about something always complicates matters and you have to test very carefully to ensure that they can't become inconsistent.

This series has basically described how to implement a very simple linked list (if you wanted to). There are many limitations and weaknesses of this implementation (not being certain to nullify pointers on items when adding them, not having a tail item and also the fact that each item can only be in a single linked list at any point (usually you relax that requirement by having each llitem have a pointer to your real data rather than holding the data within the llitem structure itself. You do have to be careful to make sure that items are fully deleted when the last reference to them is released though!)) but it does, mostly, work. The final part of this series on linked lists is going to go through why you might not want to use them. Not just this limited implementation but when linked lists are not suitable datastructures and how you can create hybrids of linked lists and arrays that can perform much better.

June 27, 2019

"And then it just clicked

A bit more of the general "philosophy of programming" today, based on a quote I found on the brilliant "C FAQ", currently here and hopefully there to stay. The quote is from Q 18.9b, on learning resources and says:

A word of warning: there is some excellent code out there to learn from, but there is plenty of truly bletcherous code, too. If you find yourself perusing some code which is scintillatingly clear and which accomplishes its task as easily as it ought to (if not more so), do learn everything you can from that code. But if you come across some code that is unmanageably confusing, that seems to be operating with ten bandaged thumbs and boxing gloves on, please do not imagine that that's the way it has to be; if nothing else, walk away from such code having learned only that you're not going to commit any such atrocities yourself.

It's a very good idea to read other people's code when learning - either completely in the wild, or in the form of snippets on sites like Stack Overflow. But always keep in mind that there is some truly terrible code out there, even in commercial packages. It's easy to fall into the trap of thinking that your problem is just "soooo hard" that it just can't be done elegantly, or readably, or even conforming to standards, at all, and mostly that just isn't the case.

Obviously, different people will find different things to be readable, clear, elegant etc. I have an Undergrad degree including Maths, and I rather like the shorthand notation of sums and sets and implication. To some people though, the symbols are simply "all Greek" (pun thoroughly intended). Writing things in a way everybody will find clear is thus often a losing battle. But there is one very very important thing to avoid at all possible costs : the anti-click.

The Click

The 'click' is that feeling of clarity when you jump from grasping the parts of the thing to understanding the whole. I get it a lot with things like Anagram word puzzles, or even song intro quizzes (the kind where you're meant to guess the title). One moment you're seeing a jumble of letters, trying to hold all dozen in your head, and the next you can 'see' the word it must be. Similarly, one moment you're following along the notes and words and the next the key lyric pops into your head and the song is obvious. Optical illusions do it too - and once you've seen it you struggle to "un-see" it again.

This is the 'click' and it's a real asset that you can develop with practice, by reading plenty of code, and writing plenty of code, until you can 'get the gist' of the thing easily.

Symptoms and Diagnosis

It also comes up a lot in debugging - you just 'get the hang' of the way an error pops up, and know what must have happened. That's why reading issue trackers or mailing lists can feel so confusing - the people answering the questions seem to have some mysterious knack for guessing what's wrong from the most un-intuitive errors.

For example, writing in C you may use a function like 'sqrt', and get a 'undefined symbol' error from the linker, which is saying that it can't work out what that function is. So, being logical, you go back and check that you have absolutely definitely included the "math.h" header, which absolutely definitely contains a sqrt function. What did you do wrong? Well probably you forgot to compile using the '-lm' flag, which means "link against the maths library", because that library contains a lot of compiled code (the code isn't in the math.h file). This error can seem very weird, but actually, once you know the cause, it's "obvious".

We had a go at writing a catalogue of programming bugs a while ago, which tried to sort of encapsulate the 'symptoms' of a bug (available here, PDF download). Any bugs, errors or omissions please let us know (rse {@} warwick.ac.uk). Once you see enough bugs, you'll probably start to 'click' and 'just know' what to look for.

The Anti-Click

I mentioned optical illusions up there, and how you can't seem to "unsee" them. Sometimes code has this same sort of thing - an illusory apparent action or cause which is actually not real. This can be awkward and dangerous, because you, and others, will struggle to "unsee" it. It's hard to give an example that isn't really contrived, so I shall fall back onto a classic bit of C: pointer declarations.

In C, you declare pointers using a '*', such as `int * p;`. This is fine. But what about `int * p, q;`? Is q a pointer? No, it's not. Some people argue that the "proper style" will prevent this anti-click, that is, writing that as `int *p, q` and associating closely the * and the p. This is surprisingly hotly debated (e.g. hereand here) with some really tempting anti-clicks (such as here- do read the answers, as the OP there is wrong) but ultimately does need careful thought. You're probably best using whatever you personally find clearest, and changing that only after consideration.

The golden rule - DO NOT write misleading code, docs etc etc. Make it wordy if you have to, but it's not worth the hassle of an obvious, yet wrong, interpretation. As a last resort, just put in a comment explaining what this does not mean, or does not do.

And the flip side - NEVER trust the click - always verify. Always step through line by line and be sure you're right!

Click Immunity

One last thing - there are people for whom some things just never do 'click'. It's very hard to describe, but easy to spot. It's sort of like trying to explain computers to that one relative or quadratic equations to somebody who can't do maths. They're clearly not stupid, but they just don't seem to 'get it'. You explain one thing, like how to print, and it works, but now they need to print from a different program and isn't it just obvious that you do the same thing?

Some people seem to have a knack even beyond that - they don't just "not get it", it's worse! Bug reports display this sometimes - somebody who always gives a lot of information but somehow never includes the vital pieces. They'll post a 100MB log file, but forget to mention which OS. They'll explain exactly what commands they ran but omit some vital parameter.

There's two sides to this - dealing with people who don't click, and dealing when you can't click.

Hell is Other People

Usually you only have one real option to deal with other people who don't get it - guide them through it. Mostly, it just takes longer, not happens never. You probably want to try explaining things different ways, because what feels neat and obvious to you isn't to them. The Socratic style is handy - ask them what they're trying to do, or what they feel they should do, rather than telling.

I just don't GET IT

I expect everybody, sooner or later, finds something that just doesn't click right. You don't get it - it's unintuitive, your gut feeling is always wrong etc. So what do you do? Treat yourself the same as you would somebody else - read different explanations until you find one that fits you. Ask yourself "what am I trying to achieve". Ask yourself "when I ask this question, what might somebody need to answer it" - and if in doubt, ask them! Stop assuming anything, and walk through every step of the problem.

Walk away and do something else - give your brain time to work things through. Never stop asking questions and always learn from the answers. You don't need to 'click' to do something, you just need to watch for wrong intuition. Keep notes - add comments in your code for you to come back to.

Remember - intution is a skill like any other. It can be wrong. And while you might not have a talent for this thing, but with effort, you can make it work.

June 12, 2019

Datastructures – Linked lists part 2

Follow-up to Datastructures – Linked lists part 1 from Research Software Engineering at Warwick

This entry is back to the subject of linked lists. In the previous post on linked lists we described the idea of a linked list and how you created one. In this post I'm going to talk about how you remove items. The idea is pretty simple, to remove item "n" you want item "n-1" to believe that the item after it is now item "n+1" and you want item "n+1" to believe that the item before it is item "n-1". There are a few wrinkles to deal with if the item is at the start of the list (recall that this is usually called the `head` element) but mostly the idea is this simple. This is most easily shown by a code example recalling the definition of linked list items from part 1.

void remove_item(llitem *item, llitem *head)
    if (item->prev) {
    /*Item has an item before it in the list*/
    item->prev->next = item->next;
  } else {
    /*Item does not have an item before it. It is the head*/
    head = item->next;
    item->next->prev = NULL;

  if(item->next) {
    item->next->prev = item->prev;


SUBROUTINE remove_item(item, head)
  TYPE(llitem), POINTER, INTENT(INOUT) :: item !Item to be removed
  TYPE(llitem), POINTER, INTENT(INOUT) :: head !Head of linked list

  IF (ASSOCIATED(item%prev)) THEN
    !Item has an item before it in the list
    item%prev%next => item%next
    !Item does not have an item before it. It is the head
    head => item%next
    item%next%prev => NULL()

  IF (ASSOCIATED(item%next)) THEN
    item%next%prev => item%prev


END SUBROUTINE remove_item

This routine removes an item from a linked list and deals with changing the head of the list to match if needed. After the item is removed from the list it is deallocated. You might not want to do this automatically when an item is removed from a list (you can have multiple lists and freely move items between them) but you do have to do it when you are finished with the item or you will have a memory leak. One of the downsides of linked lists is that since they involve direct pointer operations in normal use memory leaks are a greater risk than in many data structures.

You can also see what's going on in a diagram showing how you remove both a "normal" and a "head" item from the linked list. In both cases links shown in green are new links and links shown in orange are links that are being deleted or changed.

Normal Linked List delete

Removing head element of a linked list

Hopefully the idea is quite clear between the code and the diagrams. If "item" is item "n" in the list then "n-1" is item->prev and "n+1" is item->next. Either or both of these items may not exist if "item" is either the start or the end of the list so you have to cope with these cases. So to update the next element of "n-1" you have to set "item->prev->next" which looks a bit odd but makes sense if you unpack it. Similarly the "prev" element of "n+1" is "item->next->prev".

You will notice that the links on the item being removed aren't touched and the code simply relies on that item being deleted immediately. In a system where you want to retain an item (perhaps to add it to another linked list) you'll probably want to nullify it's prev and next pointers after you remove it from the list.

That code will remove any item from the linked list that it is in. There is however a problem if you have multiple lists, each with it's own head element. Imagine that I tried to use that "remove_item" function where I passed it an element from one list but the head from a different list. So long as "item" isn't the first item of the list that it is in then nothing would go wrong but as soon as it is it would replace the head of the second list. This would cause real problems

  1. The list that "item" was in will become invalid. Because it's "head" element is never updated it will still point to the now deleted "item" and will sooner or later fail when you try to use it. It might, for a while, look as though "item" hasn't been deleted because of this
  2. The original list that you specified the "head" from will now be lost completely. The only reference that you have to a linked list is the head element so if anything happens to this the memory is lost forever. You have both lost data and caused a memory leak
  3. You have also moved data. The list that you specified the head from will now be a valid list, but it will be the list that "item" was originally part of, not the list that is meant to be there.

This type of error in a linked list thus causes memory leaks, segmentation faults and data corruption all in one nice package so you have to be careful to avoid it. You can program in safeguards to prevent doing things like this but they do slow down the operation of the linked list and require extra memory (as well as making rather harder some nice tricks that you can do with linked lists that I'm not going to talk about here) so many working linked list implementations don't bother. You can see graphically what's happening here too.

Error deleting an item in a linked list

Mostly linked lists in operation are stable and reliable but it is possible to get yourself into a bit of a tangle. But you can immediately see that this makes is a lot easier to remove an element from the middle of the linked list than if you had an array. If you had an array then you'd either have to flag that item as "invalid" somehow to know that you shouldn't use it anymore or failing that actually copy all of the elements in your array above the item being removed down to fill the gap. The first one adds complexity and means that you can't actually reduce your memory footprint when removing items (not quite true if you use an array of pointers to items but that's even more complicated) while the second is quite often very slow. This is especially true if you use a library that guarantees contiguity of the elements of your array after every delete operation when you are deleting multiple items (e.g. std::vector in C++. This behaviour is generally an advantage but here is a cost. You can get around it in other ways in C++ but other libraries are less flexible). The cost is managable if you delete many items and then pack the array up but packing after every deletion can be very costly.

The next entry in this section will describe adding items to a linked list which, as you might imagine, is quite similar to deletion but in reverse.

May 30, 2019

Black and white

Quick 'philosophy of programming' entry this time. General solutions to common programming probelms are often called 'design patterns' (e.g. the wiki articleor the book which started the name). The idea of these is to have language independent (as far as possible) 'patterns', like clothing patterns, which can be tweaked to fit a specific situation. A lot of these patterns seem obvious, which is good, and since they're developed and tested by many people they can be very valuable in showing you questions you hadn't even thought of.

This weeks topic is perhaps too simple to really call a pattern, but it is a very useful thing to keep in mind when doing anything that deals with restricting function which exists, but should not be allowed. For example, forms which take user information often disallow anything except numbers in a 'telephone' field. A code I work on has a lot of user-specifiable options, but as the programmer I know that some are incompatible where it might not be obvious to the user - and I want to either warn or abort if these are used together.

There are two general approaches to things like this, and which you choose depends on many things. You have to maintain some kind of list to check against, but you can choose to use either the "blacklist" or the "whitelist". The former, the "blacklist" is a list of the things which aren't allowed, and anything not in the list is OK. The "whitelist" approach means keeping a list of the things which are allowed, and anything not in the list is excluded.

Sometimes the choice is fairly easy, because one method is a much, much simpler list. For instance, in the phone number example, it is fair easier to use the whitelist, allowing only '1234567890', but nothing else. If you try the other way, you might think to exclude letters, but what about Greek or Cyrillic characters? On the other hand, this is a source of deep annoyance if you forget any needed character - in the example I just gave, one could not put any spaces in, which is annoying, nor brackets or the '+' symbol.

A classic example of the poorly-thought out whitelist is in name fields which often exclude characters like the apostrophe, annoying the Scots and the Dutch for eternity. And what about accented letters, or the German ess-tsett. With a whitelist, you need to be sure you've caught everything, or people will be, rightly, upset. For a user-name on a website, and for a password, it is probably fine to allow any ASCII or Unicode character and set up your systems to handle them, leaving far less upset without any real cost to you.

On the other hand, with a blacklist, anything not forbidden is permitted. These are generally used in cases where certain characters have a function and so must be excluded, even if this annoys. So, for instance, in most programming languages variable names may not start with a number, nor contain a comment character.

Apart from the length of the lists, the two methods trade off this annoyance to your user, who must wait until you fix the omission (with a whitelist) against potential unknown failures and security risk (with a blacklist). Imagine the 'incompatible features' problem with both methods. If I use a whitelist, and forget to allow some pairing, my worst case is that I will likely be asked (somewhat irately) why X and Y can't be used together. I realise they can be, I update the code and I make a new release version to fix the omission and everybody is happy. If I use a blacklist and forget that some X and Y don't work together, my worst case is that one day I have to tell somebody that their last n years of research is all invalid, because the simulation they ran didn't work as expected, and since nothing actually went wrong they didn't know. Worse still, would be having to tell them that their fascinating effect is just a code error, and it's my fault.

In some cases, only one or other list type is really viable. For instance, virus scanners keep a list of 'tells' for malicious code, because even though they let things slip through until their lists update, they could never describe all of the 'allowed' code. App permissions (on better, more granular systems) are a whitelist - you give an app the permissions you choose, and only those.

So as a general rules of thumb:

  • If only one method is viable, obviously use that
  • If one or other list is going to be much much shorter, you have better chances of getting it right, so use that method
  • If it is really important not to let things slip through, use a carefully managed, kept up to date, whitelist. If possible, put it into a file or something, so that updates just require sending out new definitions, not modifying the entire code
  • If it's really important not to get accidental exclusions (false positives) use a, similarly carefully managed, kept up to date etc, blacklist
  • In some cases, combine the two. Programming languages generally have a set of allowed characters (a whitelist) and small blacklists for specific contexts such as the first character of a name.

As well as the literal 'blacklist' and 'whitelist' there is a more general principle here - do I selectively forbid, or selectively allow? Do I stop somebody doing this thing here, here and perhaps here, or do I only permit them to do it there and there. If you find the 'here's' or 'there's' proliferating, re-examine whether you're doing it the right way around. In safety or security critical situations, you almost always must allow only what is permitted. If you find yourself trying to plug up security holes with ever growing blacklists, you should probably change tack and think about what should be allowed instead.

May 15, 2019

Datastructures – Linked lists part 1

Back to data structures this month with the linked list. Linked lists are a way of holding data that allows you to add and remove items quickly and easily.

Why not arrays?

First question: why is adding and removing items from an array not quick and/or easy? The problem with adding items is quite simple - arrays have a fixed size so eventually you will run out of spaces in your array to store items. When this happens you have to do something to allocate additional space. Many languages have a function called "realloc" or similar that tries to extend the length of your array but it can only do that if there is unused memory space "above" the location of your array because the array elements have to be arranged one after the other in memory. The concept of "space above" is a bit complex in general and depends on details of your OS etc. but as a general idea if you allocate two arrays then they are placed one after the other in the computer's underlying memory so if you try to realloc the first array then there won't be any space between it and the second array to grow it in. If you can't grow your array like this then you have to allocate new memory to store the bigger array and copy the existing elements in. If you keep adding items then this continual growing of your array can be quite expensive, although this can be mitigated by always growing your array by more elements than you immediately need.

Removing items has the opposite problem. Since arrays are required to be contiguous (can't have gaps in them) you can't just "remove" an item you have to either flag it as empty and ignore it when going through your array in future or take all of the items above the removed element and move them down to pack everything up. The first approach has three problems

  1. You have to use additional memory to flag items as being empty or not
  2. If you are both adding and removing items from your array then since you don't actually recover memory when you remove an item your total memory requirements will grow without bounds
  3. Depending on your algorithm you might have more difficulty getting optimal performance if you have to do fundamentally different things for empty and non-empty array elements

The second approach avoids those problems but on average involves copying half of the elements in your array every time you remove an item which can also be quite expensive.

It is quite possible to build a container based on arrays that you can add and remove items from that has good general performance (C++ std::vector is a good example of one) but they always have to make tradeoffs and if you are doing a lot of adding and removing of arbitrary elements it might be better to use a data structure other than an array.

Linked lists

The idea of a linked list is quite simple. Each element in a linked list is like a link in a chain - linked to the item after them, so you go through the linked list by taking the first item then going to the next item and the next etc. until you reach the end. This is generally implemented using pointers in what are often called "self referential structures", that is structures that contain pointers to themselves. These are easy enough to implement in either C/C++ or Fortran.

struct llitem{
  struct llitem *prev, *next;

TYPE :: llitem
  TYPE(llitem), POINTER :: next, prev
END TYPE llitem

These are more or less normal types but there is one more important rule: self referential structures can contain only pointers to their own type, not actual instances of their own type (try removing the *s in C or the POINTER attribute in Fortran and it will fail to compile). This is because types, much like arrays, are laid out contiguously in memory so they can only contain things that the compiler knows the length of and if you have a type that contains an instance of itself then there would be an infinite regression problem because you don't know how big it is until you have finished creating it and you can't create it until you know how big it is. Pointers are all of a fixed size so they work OK.

The structure as given is for what is technically called a doubly linked list because it contains links both to the next item and the previous item in the list. A singly linked list has each item linked only to the next item in the list. Doubly linked lists have some substantial advantages over singly linked lists, notably that you can go through it from either end, but also you can remove an item from the list needing only the item itself (and the list that it is held in if you have several).

Creating linked lists

Creating a linked list is quite easy. You hold a simple pointer to the first element in the list (generally called the head item) and then you simply create the list going down from that. The key thing is that you have to hook up the prev and next links as you go. This isn't too difficult and looks like


struct llitem{
  int value;
  struct llitem *next;
  struct llitem *prev;

void init_ll(struct llitem * l)
  l-> value = -1;
  l->next = NULL;
  l->prev = NULL;

int main(int argc, char** argv)
  struct llitem *head, *current;
  int i;

  head = malloc(sizeof(struct llitem));
  head->value = 1;
  current = head;
  for (i=0;i<10;++i){
    current->next = malloc(sizeof(struct llitem)); /*Create the next element*/
    init_ll(current->next); /*Initialise it to nullify pointers*/
    current->next->value = current->value + 1; /*Simple counter*/
    current->next->prev = current; /*It's previous pointer should be the current item*/
    current = current->next; /*Now move onwards so the newly created particle is now current*/

  current = head;
    printf("%i\n", current->value);
    current = current->next;


  TYPE :: llitem
    INTEGER :: value = -1
    TYPE(llitem), POINTER :: next => NULL()
    TYPE(llitem), POINTER :: prev => NULL()
  END TYPE llitem

  TYPE(llitem), POINTER :: head, current
  INTEGER :: i

  ALLOCATE(head) !Create the head
  head%value = 1
  current => head
  DO i = 1, 10
    ALLOCATE(current%next) !Create the next element
    current%next%value = current%value + 1
    current%next%prev => current !The next element's previous is the current element
    current => current%next !Now move onwards so the newly created particle is now current

  current => head
    PRINT *,current%value
    current => current%next


This example also shows how you how to step through the linked list from the head, simply by having a "current" pointer that starts at head and is then incremented by setting current = current->next (or current => current%next in Fortran). This can look a bit odd but it isn't that hard to understand. I start by manually creating the "head" element, using either ALLOCATE or malloc. Once I have a head element I then loop through, each time using the same ALLOCATE or malloc command on the "current->next" pointer, creating a new item every time. In C I then call the ll_init function to setup the values of the struct (in Fortran this is done for me since I gave the elements of my TYPE default values). After this the prev and next pointers are both NULL. This is correct for the next pointer becuase my new item is the last item in the list (it won't be next iteration but right now it is), but I have to set the prev pointer. If my new item is the next element in the chain from my current element then the previous element in the chain from my new element must be my current element so I set that up. After that I just have to repeat until I have added enough items.

Part 2 of this will be in a couple of weeks and will describe how you remove and item from a linked list and how to add new items to the middle of a linked list.

May 01, 2019

The Numba "stencil" directive

After a bit of a delay we're getting the blog posts going again with a mention of a slightly odd bit of the Python Numba compiler - the stencil directive. The purpose of Numba is to produce compiled code from Python source that should run at a reasonable fraction of the speed of classical C/Fortran etc. codes. In general it produces codes that is about 20-40% as fast as C or Fortran code, so typically only about 1-2% as fast as the computer can theoretically operate. In general, the more that you can tell Numba in advance about how you are going to use your data the more options it has to optimise the code as it compiles it. The "stencil" directive is used to indicate to Numba that you are going to operate on an array by moving a stencil across it and updating each point using data from neighbouring points.

This is a fairly common thing to want to do and crops up in algorithms from image smoothing to numerical solutions to differential equations so this is a useful bit of the library. As a simple example consider the simplest possible image smoothing algorithm. For each pixel in the image P(i,j) replace the value with the average of the surrounding pixels, so

P'(i,j) = 1/4 * (P(i+1,j) + P(i-1,j) + P(i,j+1) + P(i,j-1))

Note that the left hand side of this equation is P' not P, so when we calculate the average for each pixel using the original values surrounding it not the values that have already been through the averaging process. After you have finished you copy P' to P. There is then one last thing that you have to worry about: what do you do when you reach the edge of the array? Since you are using adjacent cells you have to do something or you will read outside your array. Numba's stencil operator at present only has two options: set the outer cells to be zero or some other constant value. In general, this will mean that you want to have an outer strip of cells added around your image otherwise your image will get smaller as it smooths. These outer cells are often called ghost or guard cells and are also common in numerical solution of differential equations for representing boundary conditions. The code for doing all this in Python is quite simple

def blur(A):
    s = A.shape
    B = np.zeros((s[0],s[1]))
    for i in range(1,s[0]-1): #Range only over the inner cells
        for j in range(1,s[1]-1): #Range only over the inner cells
             B[i, j] = 0.25 * (A[i-1,j] + A[i+1,j] + A[i,j-1] + A[i,j+1])
    return B

This code takes a numpy array and iterates over all but the outer strip of cells in every direction, averages and returns the value. Each call to this function smooths your image a bit (by a radius of about 1 pixel) so in general you'll want to call it several times to smooth your image as much as you want. Running this algorithm on a stock image of 1529 x 2250 pixels on a 3.4GHz processor takes about 3.3 seconds per iteration using the pure Python implementation and 0.006 seconds by using the Numba @njit decorator. For testing we ran 100 iterations of the pure Python code and 1000 iterations of the Numba code. If run to the same number of iterations, the results in both cases is the same

Circuit board image before smoothingCircuit board image after smoothing 1000 times

The simple equivalent using stencil is

def blur(A):
    return 0.25 * (A[-1,0] + A[1,0] + A[0,-1] + A[0,1])

You can see how i-1 becomes just -1 and similarly for other parameters in the stencil, and you can also see how this operation now becomes a one-liner so it's much easier to write. Unfortunately the performance is much worse than the @njit version too taking about 0.22 seconds per iteration, although that is still some 10x faster than the native python performance. Fortunately performance can be improved by calling the stencil from a Numba jit-ed function, so for example

def inner_blur(A):
    return 0.25 * (A[-1,0] + A[1,0] + A[0,-1] + A[0,1])

def blur(A):
    return inner_blur(A)

The result from this method gives performance that is indistinguishable from that of the direct Numba jit version and is still rather shorter. Since these stencils lay out the data dependency you can also set parallel=True in the @njit call and this can be quite succesful but tends to work better for more complex stencils. In this particular case despite showing the Python interpreter apparently using 6 cores solidly the execution speed slows down by a factor of 3.

March 14, 2019

Multiprocessor Profiling

Super quick this time - a bit of background and then something I didn't know worked as well as it does - namely gprof on multi-core programs.


Once code works (and NEVER before) you can start worrying about performance. Often, you can tell using a bit of intuition and a few test runs what takes the most time, especially in simple cases. But this is always a bit risky, because you can easily miss the real performance limiting steps. Sod's law (if it can go wrong, it will) usually means that if something is completely unexpectedly slow, it's either trivial to optimise, because it really oughn't to be slow, or it's almost impossible.

The answer to this is not to guess, but to use one of the many available profiling tools, which tell you, function by function, and line by line, where your code spends it time. You then know where to focus your efforts.

Our upcoming training course (Accelerating Python, 27/3/19, see our calendar) will discuss profiling in detail for Python programs, and there'll be a more specific post after that, so watch this space!

Profiling Compiled Code

For compiled codes, profiling is usually pretty simple, but does require you compile code specially. For C/Fortran etc, there is a very good profiler as part of the gcc toolchain, called gprof. Using gcc, you simple compile your code with `-pg -g`(the second g adds debugging symbols and is only required on older gcc) and run it. This produces a machine-readable file of all the timings for that specific run, usually gmon.out. You can then turn this data into nicer formats using the gprof tool. Something like

gprof <path to program executable> <path to profile file>

You have to provide the program file too so that gprof can report to you in terms of program source lines. Note that strange things happen if the program you give here isn't what you ran (wrong line numbers etc). This is terribly infuriating, so don't do it.

Old Dog New Trick

I have used gprof plenty in the past, but I recently found a really useful way to apply it to multi-core codes (using MPI). This doesn't tend to work very well, because the output from different processes gets all mixed up and the timings end up wonky. It turns out that there's a really, really, simple fix, decribed e.g. hereand reproduced below.

Firstly, you need to get the profiling data on a per-process basis. As the link above says, this isn't very well documented, but is easy to do. Before running, set the environment variable GMON_OUT_PREFIX. This has to be done where the code is running, so if on a cluster etc, you'll probably need to add an export or setenv to your submission script, and/or use the '-x' argument to mpiexec (see below). So we do something like

export GMON_OUT_PREFIX=gmon.out-

When the code runs, it will produce one file per process (so be very careful on a cluster not to overwhelm your home space) named 'gmon.out-<PID>' where PID is the process ID. In theory it is possible to have the files named by MPI rank, but apparently this doesn't work well.

Now you run the code as usual, although on some systems you may need to make sure MPI run is fed the above environment variable. You can do this using the '-x <env var>' flag to mpiexec. On a local machine, as long as you exported the variable, this shouldn't be needed.

Now you want to produce the function or line level profiles using the sum of all of them. You can supply all of the files to gprof, or you can have gprof sum them into a single file, by default gmon.sum, using

gprof myprog -s gmon.out-*

Make sure the wildcard matches the right set of files - when I first tried this I forgot to clean out the files from previous runs and ended up summing across runs, with very odd results. Of course, you can use this to sum across runs to very useful effect if you want to see average profiles across multiple sets of parameters etc.

Now you can run gprof as usual, using either

gprof <path to executable> gmon.out-*


gprof <path to executable> gmon.sum

and voila, properly added up profiling information for multiple cores. Super.

February 27, 2019

Data structures – Stacks

Another fairly simple data structure this week - the stack. The stack is basically the reverse queue. Rather than the first item to arrive being the first to be served as with a queue in a stack the last item to arrive is the first to be served. The name comes from the idea of a stack of paper where you add pages at the top and then take them off the top again when you want them. Formally this behaviour is called LIFO (Last In First Out) as opposed to the FIFO (First In First Out) queue.

As you might imagine this is quite a restrictive data structure and the classical stack actually only has two operations

  1. push - add an element to the top of the stack
  2. pop - take the top element off the stack and give it to you

You'll notice that pop and push are inverses since pop removes the top element from the stack and gives it back to you. This isn't always what you want since you might find that you have to simply push the element back onto the stack if you can't deal with it yet. To make this easier a lot of stack implementations also have a snoop operation which tells you what is on the top of the stack without actually taking it off. What you never have in a true stack is a way of accessing an arbitrary element in the stack. If you can do that then you have an array that you are accessing in a "stack-like" way. This is also something that people quite often want to do, so you find that lists in Python have a "pop" method to remove the rightmost element (although the equivalent of "push" is called "append" for a Python list, probably because it is used for a lot more than making stacks).

Implementations of stacks are generally very simple. The simplest implementation is just an array with an index to the first free element in the stack. When an element is pushed it is added at that location and the index is moved on by one element. When a pop happens the element behind the index is given to the user and the index is moved back by one.

Stack Example

This will all work perfectly well until you run out of space in your array when, much as with the queue, you have to do something else. You can reallocate your stack to give yourself more space, copying the existing elements as you go, you can implement a stack using a linked list (which we will cover in a later post) which means that you won't run out of stack until you run out of memory or you can refuse to accept the new item an return an error. If you do nothing and accept the new item you access memory that you are not supposed to be using at all which can cause all kinds of problems from crashes to possible security exploits and this is called a buffer overrun, which might be familiar with as one of the most common things that are listed as the cause of security problems in software.

At this point it is worth drawing a distinction between the stack as a data structure and all of the other things that are called stacks. There are call stacks that were mentioned in the last blog post which described how a program got to be in the place where it is and there is an associated CPU feature called a "stack register" that holds this information. There is "stack memory" which together with "heap memory" make up the most common way of dealing with memory management in compiled codes and there are plenty of other stacks that turn up all over the place. In general they get the name because they implement a stack data structure (sometimes they used to be a stack back when the name was given to them but no longer do) and otherwise are not really related. You have to be rather careful because lots of things are called stacks and that name itself doesn't alway tell you very much. Always read the documentation!

A question that might come to mind is "what would I ever do with a stack?" They do seem rather less use than a queue and in practice they are mostly fairly "low-level" data structures that most people don't encounter much, but you can easily enough think of cases where they are useful. Their most useful feature is as a "memory" of a sequence of events that can then be undone. Many "undo" systems in software use stacks behind the scenes for exactly this reason. For the same reason they are often used in "depth-first search" algorithms which were first investigated (and still used!) as ways of solving mazes. They also crop up in algorithms to parse mathematical and programming languages (a good example being the Shunting Yard Algorithm, named after train shunting yards. You can easily see the similarity between a stack as described here and a train siding because you can only get carriages off a siding in the reverse of the order you put them in on). But in general you probably won't need to implement your own stacks very often but it is worth knowing about them because they are so ubiquitous in the lower layers underpinning all of modern software development.

February 13, 2019

Again and again and again

"Flow control" means all the ways you can control what your program does, both now and next. Conditionals, loops, function calls all count. Exceptions (throw, raise etc) may or may not - whether its OK to use exceptions as flow control or whether they're meant for, well, exceptional occurences (not necessarily rare, but something that can't be handled by the current piece of code) is a seriously vexed question.


Loops are usually the first or second control option to be taught, and take two general forms, the 'for' type loop and the 'while' type loop. Different languages use different words, but the first one is meant to do something a certain number of times, and this number is known when the loop starts. The second is meant to do something 'until told to stop'. This is not a hard distinction though. A while-type loop can always mimic a for-type loop, and the reverse is also true (although sometimes considered to be inelegant and/or error prone).

Usually, these loops look like

For index = start to end
   {loop body}
End for


While condition
  {loop body}
End while

The condition can be as complicated as you like, it just has to evaluate to either True or False. It can use all the variables you might be changing in the loop, etc. So we create the other kind of loop something like this:

For index = start to max_iterations
  {loop body}
  If condition exit_loop
End for


index = start
While true //This means keep looping forever
  {loop body}
  If index > end exit_loop
   index = index + 1
End while

Note that whether we get (end-start) iterations of a for loop, or (end-start+1) or (end-start-1) can vary by language, but we can easily adjust to match.


The other way of doing something many times, is 'recursion'. This often gets classed as 'super advanced and difficult' for some reason, but is mostly quite simple. First though, we need to know a tiny bit about functions and how they're called.


Scope is very, very important: every variable, function etc within a program has a scope. For variables this means "parts of your code which can use this variable (bit of memory) with this name". For functions, it means "parts of your code which can use this function with this name". There's a few subtleties beyond that, but for now, this will do.

So, what scopes are there? A variable defined inside a function is only usable within that function: it is scoped to that function, or has 'local scope'. A variable defined globally (outside any functions, including main) has 'global scope' and is available everywhere. Do note that a variable defined in 'main' is only available in 'main' and NOT in any functions 'main' may call, as main is a function like any other.

Most languages also have an idea of 'block scope' where 'blocks' (in C anything inside curly braces {}) can contain variable declarations, which are only available inside the block. This can cause some particularly confusing errors, such as when you try and do the following:

while i < 10 
  int i
end while

which will not compile unless there is already a variable, called i, and the one you declare inside the loop then 'shadows' this - inside the loop i refers to one variable, outside the loop, including the loop condition line, i refers to something different. If this isn't completely clear, the following example should help:

string name = 'Nobody'
int i
for i = 1 to 3
  string name = get_string_from_user() 
  print i, name
end for
print 'You entered', name

which gets 3 names from the user an outputs something like:

1 Bill Bailey
2 Madonna
3 Engelbert Humperdinck
You entered Nobody

Another tempting thing is to try

if condition then
  int i = 0
  long int i = 1
end if
print i

which again has either an undefined 'i' or a shadowing problem. There is no way to get a different type for i using an if like this, and with good reason. i's type could only be determined in general when the program runs - so how much storage should be given for it, and can it be passed to any given function?

Function scope has one more really important thing though - each call to a function is a new scope. The variables you used last time do not keep their values.


In Fortran there is one really important idea called the 'SAVE' attribute. A variable in a function (or module) can be given this, as e.g. "INTEGER, SAVE :: a" and the value of 'a' will be kept from call to call. This is very useful. BUT there is a catch. Any variable declared and defined in a single line in a Fortran function is given the SAVE attribute. So, if you do something like `INTEGER :: alpha = 0`, declaring an Integer alpha and in the same line defining it, alpha is set to zero ONLY the first time the function is called. Subsequent calls will inherit whatever value alpha had last time. This is rarely what you intended. Be careful!

Call stack

When you write code to call a function, the computer has to stop what its currently doing, and enter a new scope containing only the variables available inside the function. It also has to remember where it should go back to after the function ends. This is done using the 'call stack'.

We haven't talked about 'stacks' as a data structure yet (coming soon) but we did mention here that they're a "last-in-first-out" structure where the last thing you add to the stack (think of a stack of papers or books) is the first one you take off. Each time you call a function, you add an entry to the stack, and when you return this is 'popped off' and the stack shrinks. Each entry is called a 'frame'.

The stack frame usually contains the location to return to, and also memory for all of the local variables in a function. It often also has space to hold all the parameters passed to the function and sometimes a few other bits of operational stuff. When a function is called, a frame is created with all this in, and when it returns this is destroyed.

We mentioned above that variables inside a function are available only inside it, but we didn't ask what happens if we call a function from within itself. We've seen that between calls to a function the values are 'reset' or lost, and having read the previous paragraph you probably guess that this is both because and why the stack frame gets destroyed.

Now, you might suggest that you could always make sure every call to a given function shares the same variables, but if you've ever used a function pointer you know that you can call a function without ever using its name at the place the call actually happens, so this isn't practical. So, each call to a function, any function, creates a stack frame containing all its local variables, and calling a function from within itself makes two, independent, sets of all the local variables, that know nothing about each other.

My First Recursive function is My First Recursive function is My First....

So what is recursion then? "Recursion occurs when a thing is defined in terms of itself or of its type." (Wikipedia, Recursion) For a function, recursion means having the function call itself. In maths, the factorial function of a number which is the product of all positive integers up to it. So we see immediately that factorial(n) is n*factorial(n-1), which we'd code up something like this:

function factorial(integer n)
  return factorial(n-1)*n
end function

We can pretty immediately see a problem there though - how does the chain ever end? We need something which is not recursive or we'll go on calling forever. This is called the 'base case' and for factorial its obvious from how we said 'positive numbers'. The function above won't stop when n = 1 and it should. So what we actually want is:

function factorial(integer n)
  if n > 1
    return factorial(n-1)*n
    return 1
  end if
end function

Follow this by hand, on paper, for a starting n of say 4. We enter factorial(4), which enters factorial(3), which enters... until we reach factorial(1), which immediately returns '1' to the layer above, factorial(2), which multiplies this by '2' to get '2' and returns this to factorial(3) and so on.

Each Call is Its Own Scope

Remember when looking at recursive functions that each layer of call is a separate scope, with a separate copy of any variables you may define. Anything which needs to go between the layers has to be passed as an argument.

Step by step by step

So we have these two ideas that both let us keep going until we reach some condition, namely recursion and a while loop: what's the difference. There isn't one. Anything you can do with a loop you can do with recursion and vice versa. There are differences in elegance, and often one is a better choice, but not more. In fact some functional languages don't have any concept of the loop, relying solely on recursion. Mostly, elegantly recursive problems are better written as 'while' type loops and rarely as 'for' type loops, because the base case is the same as the loop-stop condition. Some recursive problems, usually those involving trees, are very hard to do elegantly with a loop.


Most of programming is about working out the sequence of steps to get from A to B, so that what you actually code is just a series of things, one after the other. Sometimes these steps are completely independent, and sometimes they aren't but they always (in a single-threaded program) run one after the other. We're always having to think about things, not in terms of the big picture, but just in terms of getting from here to there, ignoring how to get here in the first place.

In maths, one of the simplest methods of proof is called 'induction' which is closely related to recursion. Rather than try and prove the 'whole of a thing' we say 'if it was true for a smaller thing, can we show it's true for the next larger thing?' and then we say 'can we prove its true for the smallest thing?'. If we can do both of these, we've shown its true. As used here a common example is climbing a ladder. We say 'can we climb onto the bottom rung?' and we say 'can we climb onto the next higher rung than we're on?' and if so, we can climb any ladder.

Sometimes, a slightly stronger assumption is used where we instead say 'if we have reached every rung below and including the one we're on, can we reach the next one'. This is actually equivalent, but is sometimes a more useful phrasing.

Proofs by induction only work if it doesn't matter which rung we're on to climb to the next one. We don't ever have to reach rung 53 to know we can reach rung 54. If you have a problem which is easy to think about this way, then it is a prime candidate for programming recursively. The step is always the part going from 'here' to the next 'there', and the base case is how you get to the first 'here'.

Problems with Recursion

Stack Overflow

The 'call stack' we've been talking about is the inspiration for the programming forum Stack Overflow which is probably the most encountered error when programming recursively. Each function call creates a call stack frame and there is a limit to how much memory is available for this. If you forget or mis-program the base case, your recursive function never stops calling itself, until it has filled the call stack and your program crashes horribly with a stack overflow.

The other common way to get a stack overflow is creating large temporaries inside functions since these are all part of the stack. Hopefully more on that soon.

Function Parameters

Secondly, recursion can be a bit tricky to actually set up. Our factorial function was nice and simple, with a single parameter, and a single return value. But what if we have more than one parameter? For instance, a binary search can be done nicely in recursive fashion. Each step is about deciding which half, upper or lower, our target is in, and passing only this half on to the next step, and the base case is when this has length 1. Here though, you want to pass at least two items - the segment of list, and the target value, and you want to return either true or false, or the index the target was found at. If returning the index as an offset into the passed segment, you then have to adjust this at each step so that you end up with the index in the original, complete list. This can get mucky.

Excess Work

The other common example used for a recursive operation is the Fibonacci sequence where the nth value is the sum of the (n-1)th and the (n-2)th. Usually, the first two values are 1, so the sequence goes 1, 1, 2, 3, 5, 8, 13 etc. It's not hard to write a recursive version of this, e.g.

function fibonacci(n)
  if n eq 1 or n eq 2 
    return 1
    return fibonacci(n-1) + fibonacci(n-2)
  end if

but if we work through this on paper for say n =5 we find that we calculate the n=4 case once, the n=3 case twice, the n=2 case three times and the n=1 case twice.

A Python version of this and my model answers to the challenges are here.

Challenge: what's the rule in general for how many times fib(m) is called for each m < n?

Challenge 2: rewrite this recursively with exactly n-1 calls to calculate fibonacci(n). Hint after the post, or solutions at the Github link above.

If we're not careful, we might never notice all the extra work we're doing, which we could avoid. In this case, there's a big hint that something is funny at the point where we put two values of n into our base case.

Challenge 3:

Some image sharing sites now try to use a few real words to create memorable random urls. If you're given n lists of words, can you write loop based and recursive variants of the code to create every combination of one word from each list? Your code should work for any value of 'n'.

For example [large, small] [radiant, lame] [picnic, bobcat] should give (order not important) large-radiant-picnic, small-lame-bobcat, small-radiant-bobcat etc etc.

Small hint below the post

Moral of this Post

The takeaway from this one is that recursion isn't scary if you just think about getting from here to there, pretending all the business to get 'here' has been dealt with. Never mind the rest of the ladder, just think about the next rung. This is one of the vital skills to develop as a programmer, on every scale. Break things down into manageable steps and then build them into a program.

Keep scrolling for the hints....

Hint 1: can you return both the (n-1) and the (n-2) values?

Hint 2: the list created by combining lists 1 and 2 is itself a list of words. 3 lists is just 2 lists, and then another list.

May 2024

Mo Tu We Th Fr Sa Su
Apr |  Today  |
      1 2 3 4 5
6 7 8 9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31      

Search this blog



Blog archive

RSS2.0 Atom
Not signed in
Sign in

Powered by BlogBuilder