All entries for Wednesday 03 October 2018
October 03, 2018
Pretty please print
The possible impossible
A pretty short entry this week, on a likely familiar theme. You have some buggy code. You try and trace the bug by inserting 'print' statements. Eventually you have prints wrapped around the troublesome piece and you may find yourself saying something like:
Aha! It reaches line A, but never line B! But thats IMPOSSIBLE! There's nothing there but simple code that can't just not execute! In fact there's only blank lines!
As usual, once you find yourself uttering a line like that, it's time to step back. Computers only rarely do the impossible.
Many things can have happened here, but a common, very confusing one, is that the second print has run, but you're not seeing the result. This is common because most languages default to bufferredoutput to stdout. We mentioned this a little bit in context of output streams in a previous entry, here we give a bit more detail.
Aside - stdout
If stdout and stderr aren't already familiar, they are the short names for standard out and standard error. There's a little bit about those herebut basically they are the designated way of writing output (information) and errors to the terminal. They're separate because sometimes you only want to see one or the other. You can direct them to separate files if you want (see the link for how). It is very good practice to respect the designation, and use the correct stream when you write information. It's acceptable to write everything to stdout in your code, but you really shouldn't be writing anything except actual errors to stderr. Warnings are acceptable, but only if they're "important". Save it for things that a user really needs to know.
Back to the Print Problem
The conversation goes something like this:
"So, what's going on with my prints? If it's printing but I don't see it, where is it?"
"It's in an output buffer, waiting until there's enough stuff to be worth printing. "
"Hang on, worth printing? I think it's worth printing! Print it!"
However, there is a good reason. The easiest way to show this is with some code. The following Python3 code uses both buffered output and unbuffered output where we force each print statement to flush the buffer so each print happens exactly as it is reached:
import time _iters = 100000 def many_prints(flush): for i in range(0, _iters): print(i, flush=flush) start_time = time.time() many_prints(False) timef = time.time() - start_time start_time = time.time() many_prints(True) timet = time.time() - start_time print("No flushing --- %s seconds ---" % timef) print("Flushing --- %s seconds ---" % timet)
Run this a few times and you should see that flushing is slower. It's not always by much, but it can be 20%, or more, depending on system, which is sometimes significant.
Why so Slow?
So why is unbuffered output slow? In this case, mainly because interacting with the terminal and redrawing the screen takes time. Writing to a file can be even slower, because now we have to interact with the file system and disks, and maybe even wait for a success signal to come back. It's much better if these events can be collected into larger bursts, so we only have to do that rigamarole once.
Unbuffered Output and Flushing Options
We showed Python3 above, where we can simply use the flush Keyword. In Python 2 we have to do a little more. This linkshows the main options. We can force the Python interpreter to not use buffering at all, or we can override it for given output streams. Or we can write custom objects that flush exactly when we want.
In Fortran, we just call the FLUSH subroutine either on all outputs or the unit of our choice.
In C, we can flush stdout with fflush, set stdout to be unbuffered, or use stderr which is unbuffered by default.
In C++ there is a slight wrinkle as we can use either C-style IO via stdio, or C++ streamio. std::endl is a command to flush the output stream, not just a new-line, so it's usually recommended to use `'\n' to get just a newline, and std::endl only when required.
While we're here, we should mention on thing about mixed IO in C++. By default, a program can mix stdio via printf and stream-io (using <<), and similarly for input. BUT these two methods have to be force to share any buffering, or else the would not work together. This means they are forced to stay in sync, so inputs and outputs work as expected. In some circumstances, especially if a lot is written to stdout, it may be useful to only use one or other of these, and disable the syncing. See here for details, but be very very careful.
The bottom line of all of this is pretty simple. When doing print-style debugging, or printing error messages, either use stderr (making sure that's not buffered in your language or system), or flush after every essential stage of your prints, or disable buffering completely while debugging. You don't need all three. And finally, remember the golden rule. If it looks impossible, you're probably assuming something that isn't true. Not terribly snappy, but a useful thing to remember.