All 7 entries tagged Soup

View all 8 entries tagged Soup on Warwick Blogs | View entries tagged Soup at Technorati | There are no images tagged Soup on this blog

July 24, 2019

SOUP: The FizzBuzz Variations

I like challenges - sometimes I do maths and programming puzzles for fun. It's good practice, and if you go into industry as a programmer you may well get hit with this sort of thing in an interview. Interviewers want to verify that somebody can vaguely code before moving on to more detailled questions, so often set something simple, but originally, novel. The novelty has rather been eroded now, so you can find entire sites dedicated to compiling these questions and drilling them - which we definitely don't recommend here - but they are a good source of challenges.

One classic "challenge" is FizzBuzz. You may have encountered this as a game - I remember it from Secondary School French lessons, to help learn the names of numbers. The basic idea is very simple - you go through reading out the numbers, usually from 1 to 100, and for every multiple of three, substitute the word "fizz", for every multiple of five, the word "buzz" and for multiples of three and five (i.e. fifteen) "fizzbuzz".

For anybody who has done maths past GCSE or so, you've probably met Modulo division in some form. But the best thing about FizzBuzz is that you don't need to have. In fact, there is a complete solution without using anything except addition and equality-checking. Moreover, the core problem is nice and simple, so anybody should be able to create the pseudo-code version even if they have no idea how to check for being a multiple of three or five.

I once read an article from somebody who thought that FizzBuzz was a terribly difficult problem that needed high-level maths to solve. In the comments was somebody claiming that anybody who could just write out the code must have spent far too long drilling interview questions. That seemed a little absurd, but it did make me want to produce a FizzBuzz solution which did indicate spending far too much time thinking about it. Just to see if it was possible.

More seriously though, simple problems like this can be made into a great little challenge. There are many solutions, and you can make the problem almost arbitrarily hard by imposing artificial restrictions. You can always verify the answer, and you can get surprisingly creative. I've compiled a bunch of variations on FizzBuzz, from the outline, to the "cleverest" solution I could create, to illustrate the learning opportunity of this sort of thing. They are all on Github here.

FizzBuzz outline code

Anybody who can program should be able to turn the description above into psuedo-code, which is itself a handy skill. Here, all you have to do is expand the words into something slightly more precise. For example:

loop variable i from 1 to 100
  if i divides by 3, but not 5, print "fizz"
  else if i divides by 5, but not 3, print "buzz"
  else if i divides by 3 and 5, print "fizzbuzz"
  else print i
end loop

So, here is the first place where knowing some maths will help us: I wrote those three branches very carefully to be independent. But they don't need to be! I could just do this:

if i divides by 3: 
  print "fizz"
  set flag indicating not to print i 
if i divides by 5:
  print "buzz"
  set flag indicating not to print i 
if not flag:
  print i

Note that this is not necessarily better! It's different, sure, but to get this to work I need to add an extra flag variable, and have three independent "if"s rather than one "if-else" chain. We can work around this pretty easily, by constructing a string rather than printing it directly, such as (I have this in the repo in Fortran as StringConstructionFizzBuzz.f90)

string = "" 
if i divides by 3: 
  string = "fizz"
if i divides by 5:
  string += "buzz"
if string =="":
  string = str(i)

Before going on to real code, note the other possible approach, if we didn't have a handy way of checking for divisibility:

loop variable i from 1 to 100 
  increment counter_3s 
  increment counter_5s 
  if counter_3s equals 3
    print "fizz"
    set flag_not_print_i
    counter_3s = 1
  if counter_5s equals 5
    print "buzz"
    set flag_not_print_i
    counter_5s = 1
  if not flag_not_print_i
    print i

Both of these "stock solutions" are in the repo in C, Fortran and Python as FizzBuzzModulo and FizzBuzzCounter.

FizzBuzz as a challenge part 1 - pseudo codes

So, what might we come up with as a deliberate variation on this? Well first off, lets fall back onto the "You Aren't Going to Need It" and assume that 3, 5, and 100 are absolutely fixed. Then we have a super simple solution:

print 1
print 2 
print "fizz" 
print 4 
print "buzz"

Simple, and it works. But it's seriously inflexible and we have to type a lot!

But wait a moment! There's a pattern here, isn't there, and it repeats every 15 lines. So what about

i = 1
while i is less than 100
  print i
  increment i
  print i
  increment i
  print "fizz"
  increment i
  print i
  increment i
  print "buzz"
  print i
  increment i
  print "fizzbuzz"
  increment i
end while

Where the ellipsis (...) is, we have enough lines that i ends up as 15 on iteration 1. The nice thing about this, is that we've relaxed the requirement that the end be at exactly 100. Except we have a rather large bug here: we get a multiple of 15 lines, every time. We'd need to find another way of stopping on the last round. In the GitHub repo for this post, I have a silly example using Bash taking this approach ( It is undoubtedly silly, but this sort of approach is a pretty long way from the obvious first idea, which was after all the intention.

Note the biggest downsides of this code: it is not obvious what it's doing because it is unpacked a little too much, and the 3 and 5 are baked right into the code.

Fizz Buzz Without If

That last example was my first attempt at writing FizzBuzz without using "if", which technically it did achieve. However, "while" can trivially be used to replace if (assuming a language or variant that runs 0 times if the condition is untrue). We can do better than that, surely?

Of course we can, but first we need to think hard about what "if" is doing for us here.

Understanding What If Does

First, we need to find a way of describing the action of "if" without using the word "if". (Note that modern processors actually have inbuilt instructions called "jump-if" so this isn't strictly true, but helps us find a way to code the problem without using "if"). Note that "on" counts as a synonym here.

Consider a simple one line if chain such as

1 if (condition)
2  do thing one
3 else
4  do thing two
5 end if

If the condition is true, we end up on line 2, otherwise we end up on line 4. Then, we end up on line 5 and carry on. So, putting on our geeky hats and trying to be clever, we might think "aha! We end up on line (2 + 0 (true)|2 (false)).

The next step is far more intuitive if you come from C, or one of the other languages where true is just a number. In C, we can write something like

index = 2 + 2 * (condition != true) and get 2 if the condition is true, and 4 if its not, because "true" has value of 1, and false has 0.

Using an actual "goto" instruction is, of course, evil (not entirely true, but good enough), and in this case would be the essence of solving the letter of the problem, rather than the spirit. We need to think more generally.

Function Pointers

I have written before about using function pointers to replace "if" conditions. In this case we can't set a pointer beforehand, but we have just worked out how to replicate the if chains in our original pseudocode. So, what if we made an array, and used something like the expression above to calculate the offset into it, and then called the function we find there?

This variant works pretty well in C, but in Fortran, where Logicals are a distinct kind of variable, it is very clumsy to calculate the offsets, but can be done. This example is in the repo as FunctionPointerFizzBuzz.f90, .c and .py.

Once again, this a fairly silly way to solve FizzBuzz itself, but does use a technique that can be very very handy! Practice with this sort of thing on a silly example is a great way to learn, and if you keep these sorts of snippets around, you have something to refer to when you need to use the technique in anger later.

FizzBuzz for C programmers

The last example, which is the cleverest I could manage to be about this, is pretty deeply based on C semantics. It relies on both how C handles it's strings, and how its pointers work. I'm not going to explain what is going on, but I thought of it after trying to simplify the function pointers example because of how similar the print commands were. This example is in the repo as SillyStringFizzBuzz.c

This example really goes overboard on the cleverness and is not an approach I would ever use in "real code", but it was fun to write and a good challenge. I think it meets my original challenge pretty well - it is FizzBuzz being entirely too clever. If this sort of things appeals, look up the idea of Code Golf and/or Obfuscated C - coding answers to puzzles which are very very short and hard to understand.

That might sound exceptionally silly - why would anybody want to program in a way that's hard to understand? But of course, knowing what's hard to understand really helps know what makes things easy, and brings to your attention all of the sneaky things that might not do what you first think.

Wrap Up

To conclude - there can be a lot to learn even from super simple seeming challenges. In particular, you can lay the challenge of not using some code feature, especially one you first turn to to solve the problem normally. Developing the flair for thinking "outside the box" is very handy when you have such challenges imposed externally.

Cleverness that makes you say "oh that's clever" is often not a good thing - your first reaction should probably include "oh OK, I see how it works" before "that's clever". But developing clever solutions is a good way to learn. Think of it like something like keepy-uppies for a football (soccer) player. They're not a skill that's useful in a game (the Wikipedia anecdote about Scotland not-withstanding) but they develop control skills that absolutely are. It's also a handy display piece to demonstrate those skills, should you need one.

May 01, 2019

The Numba "stencil" directive

After a bit of a delay we're getting the blog posts going again with a mention of a slightly odd bit of the Python Numba compiler - the stencil directive. The purpose of Numba is to produce compiled code from Python source that should run at a reasonable fraction of the speed of classical C/Fortran etc. codes. In general it produces codes that is about 20-40% as fast as C or Fortran code, so typically only about 1-2% as fast as the computer can theoretically operate. In general, the more that you can tell Numba in advance about how you are going to use your data the more options it has to optimise the code as it compiles it. The "stencil" directive is used to indicate to Numba that you are going to operate on an array by moving a stencil across it and updating each point using data from neighbouring points.

This is a fairly common thing to want to do and crops up in algorithms from image smoothing to numerical solutions to differential equations so this is a useful bit of the library. As a simple example consider the simplest possible image smoothing algorithm. For each pixel in the image P(i,j) replace the value with the average of the surrounding pixels, so

P'(i,j) = 1/4 * (P(i+1,j) + P(i-1,j) + P(i,j+1) + P(i,j-1))

Note that the left hand side of this equation is P' not P, so when we calculate the average for each pixel using the original values surrounding it not the values that have already been through the averaging process. After you have finished you copy P' to P. There is then one last thing that you have to worry about: what do you do when you reach the edge of the array? Since you are using adjacent cells you have to do something or you will read outside your array. Numba's stencil operator at present only has two options: set the outer cells to be zero or some other constant value. In general, this will mean that you want to have an outer strip of cells added around your image otherwise your image will get smaller as it smooths. These outer cells are often called ghost or guard cells and are also common in numerical solution of differential equations for representing boundary conditions. The code for doing all this in Python is quite simple

def blur(A):
    s = A.shape
    B = np.zeros((s[0],s[1]))
    for i in range(1,s[0]-1): #Range only over the inner cells
        for j in range(1,s[1]-1): #Range only over the inner cells
             B[i, j] = 0.25 * (A[i-1,j] + A[i+1,j] + A[i,j-1] + A[i,j+1])
    return B

This code takes a numpy array and iterates over all but the outer strip of cells in every direction, averages and returns the value. Each call to this function smooths your image a bit (by a radius of about 1 pixel) so in general you'll want to call it several times to smooth your image as much as you want. Running this algorithm on a stock image of 1529 x 2250 pixels on a 3.4GHz processor takes about 3.3 seconds per iteration using the pure Python implementation and 0.006 seconds by using the Numba @njit decorator. For testing we ran 100 iterations of the pure Python code and 1000 iterations of the Numba code. If run to the same number of iterations, the results in both cases is the same

Circuit board image before smoothingCircuit board image after smoothing 1000 times

The simple equivalent using stencil is

def blur(A):
    return 0.25 * (A[-1,0] + A[1,0] + A[0,-1] + A[0,1])

You can see how i-1 becomes just -1 and similarly for other parameters in the stencil, and you can also see how this operation now becomes a one-liner so it's much easier to write. Unfortunately the performance is much worse than the @njit version too taking about 0.22 seconds per iteration, although that is still some 10x faster than the native python performance. Fortunately performance can be improved by calling the stencil from a Numba jit-ed function, so for example

def inner_blur(A):
    return 0.25 * (A[-1,0] + A[1,0] + A[0,-1] + A[0,1])

def blur(A):
    return inner_blur(A)

The result from this method gives performance that is indistinguishable from that of the direct Numba jit version and is still rather shorter. Since these stencils lay out the data dependency you can also set parallel=True in the @njit call and this can be quite succesful but tends to work better for more complex stencils. In this particular case despite showing the Python interpreter apparently using 6 cores solidly the execution speed slows down by a factor of 3.

February 13, 2019

Again and again and again

"Flow control" means all the ways you can control what your program does, both now and next. Conditionals, loops, function calls all count. Exceptions (throw, raise etc) may or may not - whether its OK to use exceptions as flow control or whether they're meant for, well, exceptional occurences (not necessarily rare, but something that can't be handled by the current piece of code) is a seriously vexed question.


Loops are usually the first or second control option to be taught, and take two general forms, the 'for' type loop and the 'while' type loop. Different languages use different words, but the first one is meant to do something a certain number of times, and this number is known when the loop starts. The second is meant to do something 'until told to stop'. This is not a hard distinction though. A while-type loop can always mimic a for-type loop, and the reverse is also true (although sometimes considered to be inelegant and/or error prone).

Usually, these loops look like

For index = start to end
   {loop body}
End for


While condition
  {loop body}
End while

The condition can be as complicated as you like, it just has to evaluate to either True or False. It can use all the variables you might be changing in the loop, etc. So we create the other kind of loop something like this:

For index = start to max_iterations
  {loop body}
  If condition exit_loop
End for


index = start
While true //This means keep looping forever
  {loop body}
  If index > end exit_loop
   index = index + 1
End while

Note that whether we get (end-start) iterations of a for loop, or (end-start+1) or (end-start-1) can vary by language, but we can easily adjust to match.


The other way of doing something many times, is 'recursion'. This often gets classed as 'super advanced and difficult' for some reason, but is mostly quite simple. First though, we need to know a tiny bit about functions and how they're called.


Scope is very, very important: every variable, function etc within a program has a scope. For variables this means "parts of your code which can use this variable (bit of memory) with this name". For functions, it means "parts of your code which can use this function with this name". There's a few subtleties beyond that, but for now, this will do.

So, what scopes are there? A variable defined inside a function is only usable within that function: it is scoped to that function, or has 'local scope'. A variable defined globally (outside any functions, including main) has 'global scope' and is available everywhere. Do note that a variable defined in 'main' is only available in 'main' and NOT in any functions 'main' may call, as main is a function like any other.

Most languages also have an idea of 'block scope' where 'blocks' (in C anything inside curly braces {}) can contain variable declarations, which are only available inside the block. This can cause some particularly confusing errors, such as when you try and do the following:

while i < 10 
  int i
end while

which will not compile unless there is already a variable, called i, and the one you declare inside the loop then 'shadows' this - inside the loop i refers to one variable, outside the loop, including the loop condition line, i refers to something different. If this isn't completely clear, the following example should help:

string name = 'Nobody'
int i
for i = 1 to 3
  string name = get_string_from_user() 
  print i, name
end for
print 'You entered', name

which gets 3 names from the user an outputs something like:

1 Bill Bailey
2 Madonna
3 Engelbert Humperdinck
You entered Nobody

Another tempting thing is to try

if condition then
  int i = 0
  long int i = 1
end if
print i

which again has either an undefined 'i' or a shadowing problem. There is no way to get a different type for i using an if like this, and with good reason. i's type could only be determined in general when the program runs - so how much storage should be given for it, and can it be passed to any given function?

Function scope has one more really important thing though - each call to a function is a new scope. The variables you used last time do not keep their values.


In Fortran there is one really important idea called the 'SAVE' attribute. A variable in a function (or module) can be given this, as e.g. "INTEGER, SAVE :: a" and the value of 'a' will be kept from call to call. This is very useful. BUT there is a catch. Any variable declared and defined in a single line in a Fortran function is given the SAVE attribute. So, if you do something like `INTEGER :: alpha = 0`, declaring an Integer alpha and in the same line defining it, alpha is set to zero ONLY the first time the function is called. Subsequent calls will inherit whatever value alpha had last time. This is rarely what you intended. Be careful!

Call stack

When you write code to call a function, the computer has to stop what its currently doing, and enter a new scope containing only the variables available inside the function. It also has to remember where it should go back to after the function ends. This is done using the 'call stack'.

We haven't talked about 'stacks' as a data structure yet (coming soon) but we did mention here that they're a "last-in-first-out" structure where the last thing you add to the stack (think of a stack of papers or books) is the first one you take off. Each time you call a function, you add an entry to the stack, and when you return this is 'popped off' and the stack shrinks. Each entry is called a 'frame'.

The stack frame usually contains the location to return to, and also memory for all of the local variables in a function. It often also has space to hold all the parameters passed to the function and sometimes a few other bits of operational stuff. When a function is called, a frame is created with all this in, and when it returns this is destroyed.

We mentioned above that variables inside a function are available only inside it, but we didn't ask what happens if we call a function from within itself. We've seen that between calls to a function the values are 'reset' or lost, and having read the previous paragraph you probably guess that this is both because and why the stack frame gets destroyed.

Now, you might suggest that you could always make sure every call to a given function shares the same variables, but if you've ever used a function pointer you know that you can call a function without ever using its name at the place the call actually happens, so this isn't practical. So, each call to a function, any function, creates a stack frame containing all its local variables, and calling a function from within itself makes two, independent, sets of all the local variables, that know nothing about each other.

My First Recursive function is My First Recursive function is My First....

So what is recursion then? "Recursion occurs when a thing is defined in terms of itself or of its type." (Wikipedia, Recursion) For a function, recursion means having the function call itself. In maths, the factorial function of a number which is the product of all positive integers up to it. So we see immediately that factorial(n) is n*factorial(n-1), which we'd code up something like this:

function factorial(integer n)
  return factorial(n-1)*n
end function

We can pretty immediately see a problem there though - how does the chain ever end? We need something which is not recursive or we'll go on calling forever. This is called the 'base case' and for factorial its obvious from how we said 'positive numbers'. The function above won't stop when n = 1 and it should. So what we actually want is:

function factorial(integer n)
  if n > 1
    return factorial(n-1)*n
    return 1
  end if
end function

Follow this by hand, on paper, for a starting n of say 4. We enter factorial(4), which enters factorial(3), which enters... until we reach factorial(1), which immediately returns '1' to the layer above, factorial(2), which multiplies this by '2' to get '2' and returns this to factorial(3) and so on.

Each Call is Its Own Scope

Remember when looking at recursive functions that each layer of call is a separate scope, with a separate copy of any variables you may define. Anything which needs to go between the layers has to be passed as an argument.

Step by step by step

So we have these two ideas that both let us keep going until we reach some condition, namely recursion and a while loop: what's the difference. There isn't one. Anything you can do with a loop you can do with recursion and vice versa. There are differences in elegance, and often one is a better choice, but not more. In fact some functional languages don't have any concept of the loop, relying solely on recursion. Mostly, elegantly recursive problems are better written as 'while' type loops and rarely as 'for' type loops, because the base case is the same as the loop-stop condition. Some recursive problems, usually those involving trees, are very hard to do elegantly with a loop.


Most of programming is about working out the sequence of steps to get from A to B, so that what you actually code is just a series of things, one after the other. Sometimes these steps are completely independent, and sometimes they aren't but they always (in a single-threaded program) run one after the other. We're always having to think about things, not in terms of the big picture, but just in terms of getting from here to there, ignoring how to get here in the first place.

In maths, one of the simplest methods of proof is called 'induction' which is closely related to recursion. Rather than try and prove the 'whole of a thing' we say 'if it was true for a smaller thing, can we show it's true for the next larger thing?' and then we say 'can we prove its true for the smallest thing?'. If we can do both of these, we've shown its true. As used here a common example is climbing a ladder. We say 'can we climb onto the bottom rung?' and we say 'can we climb onto the next higher rung than we're on?' and if so, we can climb any ladder.

Sometimes, a slightly stronger assumption is used where we instead say 'if we have reached every rung below and including the one we're on, can we reach the next one'. This is actually equivalent, but is sometimes a more useful phrasing.

Proofs by induction only work if it doesn't matter which rung we're on to climb to the next one. We don't ever have to reach rung 53 to know we can reach rung 54. If you have a problem which is easy to think about this way, then it is a prime candidate for programming recursively. The step is always the part going from 'here' to the next 'there', and the base case is how you get to the first 'here'.

Problems with Recursion

Stack Overflow

The 'call stack' we've been talking about is the inspiration for the programming forum Stack Overflow which is probably the most encountered error when programming recursively. Each function call creates a call stack frame and there is a limit to how much memory is available for this. If you forget or mis-program the base case, your recursive function never stops calling itself, until it has filled the call stack and your program crashes horribly with a stack overflow.

The other common way to get a stack overflow is creating large temporaries inside functions since these are all part of the stack. Hopefully more on that soon.

Function Parameters

Secondly, recursion can be a bit tricky to actually set up. Our factorial function was nice and simple, with a single parameter, and a single return value. But what if we have more than one parameter? For instance, a binary search can be done nicely in recursive fashion. Each step is about deciding which half, upper or lower, our target is in, and passing only this half on to the next step, and the base case is when this has length 1. Here though, you want to pass at least two items - the segment of list, and the target value, and you want to return either true or false, or the index the target was found at. If returning the index as an offset into the passed segment, you then have to adjust this at each step so that you end up with the index in the original, complete list. This can get mucky.

Excess Work

The other common example used for a recursive operation is the Fibonacci sequence where the nth value is the sum of the (n-1)th and the (n-2)th. Usually, the first two values are 1, so the sequence goes 1, 1, 2, 3, 5, 8, 13 etc. It's not hard to write a recursive version of this, e.g.

function fibonacci(n)
  if n eq 1 or n eq 2 
    return 1
    return fibonacci(n-1) + fibonacci(n-2)
  end if

but if we work through this on paper for say n =5 we find that we calculate the n=4 case once, the n=3 case twice, the n=2 case three times and the n=1 case twice.

A Python version of this and my model answers to the challenges are here.

Challenge: what's the rule in general for how many times fib(m) is called for each m < n?

Challenge 2: rewrite this recursively with exactly n-1 calls to calculate fibonacci(n). Hint after the post, or solutions at the Github link above.

If we're not careful, we might never notice all the extra work we're doing, which we could avoid. In this case, there's a big hint that something is funny at the point where we put two values of n into our base case.

Challenge 3:

Some image sharing sites now try to use a few real words to create memorable random urls. If you're given n lists of words, can you write loop based and recursive variants of the code to create every combination of one word from each list? Your code should work for any value of 'n'.

For example [large, small] [radiant, lame] [picnic, bobcat] should give (order not important) large-radiant-picnic, small-lame-bobcat, small-radiant-bobcat etc etc.

Small hint below the post

Moral of this Post

The takeaway from this one is that recursion isn't scary if you just think about getting from here to there, pretending all the business to get 'here' has been dealt with. Never mind the rest of the ladder, just think about the next rung. This is one of the vital skills to develop as a programmer, on every scale. Break things down into manageable steps and then build them into a program.

Keep scrolling for the hints....

Hint 1: can you return both the (n-1) and the (n-2) values?

Hint 2: the list created by combining lists 1 and 2 is itself a list of words. 3 lists is just 2 lists, and then another list.

June 13, 2018

Fix Me

Fancy Latex Customisation

Writing up notes and papers, I often find that I want to leave myself reminders, hints and other snippets that I'd rather didn't make it into the final draft. Or I want to collate notes from throughout my text into a file I can put in my appendix, such as a list of future work. I often think of Latex as just a typesetting language, but Tex itself is actually a fully Turing complete language and Latex isn't too far off. The hardest bit of Turing completeness is usually the ability to do conditional branches which Latex can do with a bit of work. Thankfully there's a common package that takes some of the load off us.

NOTE: many journals wont accept manuscripts containing custom commands, so don't forget to remove your commands from papers you submit in future.

The Latex snippets are available on our Github page and are described below in more detail.

Defining Latex Commands

Firstly, we use the Latex \newcommand to do some custom formatting we can easily vary, in this case to make our text bold. \newcommand is fairly simple. The syntax is


where the parts in <> should be replaced with appropriate text. Thist creates a command called "name", taking "n_args" arguments and applies "command". The arguments are referred to using a '#' within your command, as in the following example where we feed the text inside the \fbrtext braces into \textbf.


If there are no arguments, you can omit the [<n_args>] part entirely.

We can define as many commands this way as we like, but they must not be already defined. To redefine something, use \renewcommand (but carefully, making sure not to clobber something important like \section).

Collating Snippets with Output Streams

The first snippet is to collate some things into a file so we can review things we'd like to fix or add to our document before we release it. Snippet 1 is a simple Latex document which does this. We put the commands in a .tex file named ReleaseFixes, and include this into our main document, so that we can reuse it easily.

The core part of this is a package called "newfile" which allows you to write to files as the Latex compilation runs. This is how things like index and glossary packages work at heart, creating a temporary file that can then be included. This package lets us create "streams" for output, that we can write to, which may be familiar from Java, C++ or other languages. We name our stream "releasefixes", and then open a file to which it should output.

Now we create the main command, which takes one argument, a text chunk. This chunk will appear in our document in bold red text, and will also appear in our output file. In the output file, it'll show the line number (in the input .tex file) where the text appears. We could also include the page number in the output pdf if we wanted or other such useful commands.

Including Snippets into our Document

Perhaps we want to include the snippets we've generated into an appendix, so that we can send the draft to somebody else and they have a handy list of all the changes we want to make, or things we need filling in. Or maybe we want to use a similar command to list the changes we've made to a draft for easy review.

In Snippet2 we adjust our command so that rather than creating a simple text file, we put a '\item' command at the start of every item. Then we add a section to our document, and include the file we generated, inside an \itemize environment. This should give a neat bullet-pointed list of things to fix.

This doesn't quite work yet: the stream seems to be empty at the point where we read it, even though the file is complete when we look afterwards. When you have an unexpectedly empty output file it is usually because the stream didn't get "flushed". Flushing an output stream means moving its contents from a buffer in memory to the actual output file. This could be done every time we write to the stream, but if we do lots of tiny writes it is much slower this way than waiting until we have "enough" and writing in a chunk.

Normally, output flushing happens when the compile ends and everything gets closed, but we want access to our file before this. So we insert an explicit \closeoutputstream in just before our new section to write the file immediately, and then we can open it for input. We include its contents, and they are then typeset just like a static included file.

Using ifthen to Typeset Variants

Suppose that rather than showing us the changes, we want a way to include or exclude them from the output file entirely. This is really easy to do with a quick edit, because we just redefine our \fbr command to omit the text instead, and compile again. Easy for one command, but what if we have several things we want to change for our "draft" mode? It's tedious and error prone to edit them all.

A perfectly good solution for this precise case, by the way, is to create two versions of our "ReleaseFixes.tex" file, one for draft mode, and one for release mode, and edit our file to swap between them

\include{ReleaseFixes_draft.tex} %Swap with next line for release


However, this is a good demonstration of conditionals, so let's try that. In Snippet3 we use the package "ifthen" which provides an "ifthenelse" command, that looks like

\ifthenelse{<condition>}{<command if true>}{<command else>}

The package allows us to define boolean flags. We define one called usered which chooses whether our text should be output in bold or red-bold.

However we can only define flags after we've include the package, and we'd prefer the draft mode selection it in our main document. So we define a normal command, \draft, and check if this is defined instead. We adjust the \fbr command to do nothing in release mode (\draft not defined) or to give us bold text (red if the flag says so) and a file entry in draft mode. Then by un/commenting line 6 of Document.tex we control the final form.


Latex typesetting gets a lot easier once you have a few simple commands at hand. It gets a lot more powerful once you add the ability to do conditionals. "Clever" commands like this should be used with caution, since anybody who wants to compile the document needs the relevant packages, but they are very handy when you need them, and can save a lot of time and effort. And do remember that you'll probably have to remove anything like this if you submit Latex to journals.

March 21, 2018

Branch By Boolean

Once upon a time, all programming was for performance. With computer power at a premium, every instruction was a cost. Oddly, the cost of branching (acting differently based on a condition) has got both more and less expensive. Branches are evaluated more quickly now, but improvements in branch prediction (see below) mean branches can be more costly relative to the branchless option. Modern processors are in some ways more similar to the original supercomputers than ever, in particular, their use of vector operations. In this post we talk a bit about what branch prediction means and show a couple of tricks which can be quite useful in code in important loops.

What is Branch Prediction?

Branch prediction refers to the ability of the processor to predict (aha!) which branch is likely, and to get ready to take it. This can happen before the condition is actually evaluated. Once the condition has been checked, if the guess was correct, the operation stream continues, but if it was wrong the incorrect instructions in the queue must be dumped and the correct instructions for the actual branch assembled. This can sometimes be quite costly. Some systems instead execute both possible branches, and throw away the unneeded result, which can halve their possible throughput.

Generally branch predictors are dumb. The commonest option is to assume whatever track was taken previously will be taken again. This is great when you have one rare case (for example, an error condition) and a common one. Sometimes the predictor can 'understand' simple alternation (option 1, then option 2, then option 1...) or similar patterns if they are 'obvious enough'.

How much does it cost, really?

The costs of branch misprediction vary widely, depending on hardware, vectorisation, surrounding code etc. Most profiling tools will give you a miss rate, and reducing this in regularly repeated pieces of code can be worthwhile. A simple demo of branch misses is given here. I have tried to make all paths as similar as possible. The final print statement ensures that the total is used so the loop cannot be simply optimised away.

On my machine, the example compiled without optimisation takes about 0.8 seconds with compile-time option -DBR_3, i.e. the maximum possible number of addition operations, and the case which would take the longest if there was nothing more than simple looping and addition in play. Options 1 and 2 both take about 1 second, and the option with half true, half false, just over 0.8 seconds. This suggests something like the 'same as last time' prediction occuring.

Avoiding Branches

When it does matter, there are several ways to avoid true branches and greatly improve performance in a piece of code. Always profile first, though, to be sure this piece warrants improving.

A Simple Rejig

The first thing to consider is whether a simple rewrite of the code can eliminate a branch, or improve its prediction. Many branch predictors are quite dumb and assume whatever branch was taken the previous iteration will be taken again. If you can rewrite to ensure longer 'runs' of one branch, this may help. The previous section showed one example where this can be quite dramatic.

Do More to Do Less

Sometimes it may even be more effective to do more work, but without a condition, rather than check if the work needs to be done. A trivial example is something like

  DO i = 0, 10000
    IF (B[i] /= 0) A = A + B[i]

where the check is redundant because if B[i] is zero, the addition does nothing.

Function Pointers

Another useful trick is to use function pointers (or named functions), as discussed in this previous Soup. Rather than putting an 'if' into a vital loop, first work out which function needs to be called and then call it. As given in the previous post, we can replace

  DO i = 0, 10000
    IF (condition) function_1()
    ELSE IF (condition) function_2()
    ELSE IF (condition) function_3()


  IF (condition) ptr = function_1
  ELSE IF (condition) ptr = function_2
  ELSE IF (condition) ptr = function_3

  DO i = 0, 10000

As usual, this may or may not help: in some languages function pointers are slow themselves (not C or Fortran though, but e.g. std::function in C++ can be), and sometimes the called function is already complex enough to break any vectorisation or similar optimisations.

Boolean Branch

One particularly sneaky one is the Boolean Branch, most useful in languages which equate 'True' with 1 and False with '0', but still viable with integers otherwise. The trick is to note that

IF condition THEN
  result = value1
  result = value2

is the same as

flag = condition !Flag may be an integer; in C it can be Boolean
result = value1*flag + value2*(1 - flag)

The conditional assignment in the first case is replaced with two simple assignments. This may or may not be faster! Profiling is essential to find out. It is more likely to pay off in a case where the two options alternate, especially when this is irregular, so the branch prediction fails regularly.

February 21, 2018

SOUP: Function Pointers

Today's snippet demos function pointers (objects in Python), in particular an array of function pointers. We use them to print a table of arithmetic operations on all of the IEEE special values.

IEEE 754 defines the behaviour of floating point numbers, in particular what happens when numbers get unrepresentable, whether that is too large, too small or plain not-numbers.

Infinity is a familiar enough concept and in floating-point it mostly means a number which is too large to be represented. There's a positive and a negative infinity and most arithmetic works as expected.

Negative zero (-0.0) seems quite odd at a first glance. The sign-bit is set, but in every other way, and in general, it is equal to positive zero. Comparisons like `0.0 == -0.0` are defined to be true and `-0.0 < 0.0` is false. Most languages have a function to copy the sign from one number to another though, and -0.0 works as expected.

NaN, Not a Number, mostly appears as a lament, "Why is it NaN!?" or worse "Why is it NaN again!?" Any operations involving NaN also give NaN as do several others.

It seems strange that `Inf * 0.0` or `Inf + -Inf` both give NaN where they could both reasonably give zero. Philosophically, both are numbers, but completely unknown ones. Inf isn't mathematical infinity, it is just too large to represent, and 0.0 stands in for any number too small to represent. Their product then can be absolutely anything, hence defining it as NaN.

Code snippets in C, Fortran and Python are in our SOUP repo under 001_IEEE. All three use named functions to "dry out" the code: in fact they use an array of them. Note in Fortran this needs us to use a custom type to hold the function pointer, as we can't have an array of simply pointers.

The core of all snippets is the loop over operations and inputs

In C:

  float (*op)(float, float);/*Holds operation function*/
  float(*allOps[4])(float, float);

  allOps[0] = &add; allOps[1] = &sub; allOps[2] = &mul; allOps[3] = &div;

  for(opNum=0; opNum< 4; opNum++){
    op = allOps[opNum];
    for(rowNum = 0; rowNum < 7; rowNum++){
      row = allRows[rowNum];
      /*print result of op(row, column)*/

In Fortran (where f_ptr is our type holding a pointer):

  TYPE(f_ptr), DIMENSION(4) :: allOps
  TYPE(f_ptr) :: currOp

  allOps(1)%ptr => add
  allOps(2)%ptr => sub
  allOps(3)%ptr => mult
  allOps(4)%ptr => div
  DO opNum = 1, 4
    currOp%ptr => allOps(opNum)%ptr

    DO rowNum = 1, 7
      row = allRows(rowNum)
      !Print results of currOp%ptr applied to row, column

And in Python (using range-based for loops on lists):

  allOps = [add, sub, mul, div]

  for opNum in range(0, 4):
    op = allOps[opNum]

    for rowNum in range(0, 7):
      row = allRows[rowNum]
      #Print result of op(row, column)

Note all three are subtly different in how one assigns the active operation. In C you just get a pointer to a function with the usual address-of operator, &. In Fortran you point your pointer to the proper target using the points-to operator, =>. In Python we just set the operator to equal the function, omitting the brackets () turning this into an assignment, not a call.

Function pointers also have a powerful use in High-Performance code. A long if-else chain inside a loop, which calls a different function in each branch, can be replaced with the same if-chain outside the loop setting a function pointer and then a simple call inside the loop, eliminating the branch. As pseudo-code:

  DO i = 0, 10000
    IF (condition) function_1()
    ELSE IF (condition) function_2()
    ELSE IF (condition) function_3()


  IF (condition) ptr = function_1
  ELSE IF (condition) ptr = function_2
  ELSE IF (condition) ptr = function_3

  DO i = 0, 10000

December 20, 2017

SOUP (Snippet of the Undefined Period)

Every-so-often we come across an interesting, imaginative, instructive or idiomatic code snippet. These can be amusing bugs, a task completed in several languages, or exemplars of how to do something well. Because we post them as and when we find them, they appear at unspecifed times, so we named them Snippet of the <Undefined Period> (SOUP). This gives us a nice acronym and is about as humorous as software engineering gets.

Often the interesting part of the code is buried in boilerplate etc, so we only post the interesting snippet directly. The full code is posted to the WarwickRSE github repository at WarwickRSE/SOUP. As a rule we try and produce a C, Fortran and a Python version of everything, with other languages on request. Comments and suggestions are welcome!

June 2024

Mo Tu We Th Fr Sa Su
May |  Today  |
               1 2
3 4 5 6 7 8 9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30

Search this blog



Blog archive

RSS2.0 Atom
Not signed in
Sign in

Powered by BlogBuilder