All entries for August 2018

August 22, 2018

Data structures 2 – Arrays Part 1

This time we're having a little bit on arrays. They're pretty much the prototypical data structure and there's a good chance that you're already familiar with them, but if you're not: arrays are collections of a single datatype (like int or float) in which you can access an individual element by its index (a number, usually starting from either 0 or 1, that identifies that individual element). They're pretty easy to use and there are lots of resources on arrays, so I won't introduce them much further here, but there are some subtleties that you should worry about when using them. Most of them are to do with a feature of modern computers called prefetching.

In the early days of computers (and to a degree up until the 1980s) memory and processors ran at pretty much the same speed. The main job of a programmer was simply making sure that the processor could keep up with the data that was fed to it from memory, by simplifying the computations to be done as much as possible. But processors kept getting faster and faster and memory wasn't able to keep up with them, at least not while also increasing the amount of memory in the computer. To smooth out this effect processors were given caches: small areas of memory that are very close to the processor and that can run as fast as the processor. If your program and data are small enough to fit in this cache memory then it will run very fast indeed, but this isn't a very common situation, so the question is: what should be in the cache memory? A lot of work goes into deciding this, but in general the rule is: data that you've used recently and data that is "near" data that you've used recently. There are two reasons why this nearby data ends up in the cache. Firstly, data is loaded into the cache in fixed-size chunks (called cache lines), so if your data is smaller than a cache line then adjacent memory is always loaded anyway. Secondly, more modern machines also have a prefetch unit in the CPU that actively pulls in additional cache lines if it "thinks" that the processor might need them.

The net effect of this is that not all memory access is equal. Memory access patterns that make your processor's prefetching strategy a success are likely to give better performing code than access patterns that fail to do so, and this has implications for how you should access your arrays.

Behind the scenes, all an array is is a contiguous strip of memory of a fixed length. Since they are contiguous, the most common prefetching patterns will succeed for arrays so long as you go through the array accessing element i+1 immediately after you've accessed element i. That at least is easy, but what happens when you have multidimensional arrays? (int a[100][100]; or INTEGER, DIMENSION(100,100) :: a etc.)

Even multidimensional arrays are 1D strips of contiguous memory (in C you can build ones that aren't, but I'm ignoring those here). The multidimensionality is created by the fact that, behind the scenes, the two-dimensional index is converted into a one-dimensional index. For example, if you have an array of nx x ny elements and you are accessing element (i,j), then you could convert this to a 1D index by calculating element = i * ny + j (assuming that the index goes from (0:nx-1, 0:ny-1)). As you increment j you move between adjacent elements, and when you reach j=ny-1 you increment i and set j=0, and you still move smoothly to the next index in the underlying 1D memory strip. The problem is that this is a completely arbitrary choice; I could just as easily have written element = i + j * nx, and then to move element by element through the array you have to increment i first and then j when i=nx-1. In fact, different programming languages make different choices for this, and the two conventions are called column major and row major order.
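
To make that mapping concrete, here's a minimal C sketch (the sizes NX and NY and the indices are just illustrative choices, not from any particular code) showing that indexing a 2D array and indexing the underlying 1D strip by hand reach the same element:

#include <stdio.h>

#define NX 4 /* Illustrative sizes only */
#define NY 3

int main(void)
{
  int a[NX][NY];        /* 2D array: NX * NY contiguous ints */
  int *flat = &a[0][0]; /* The same memory viewed as a 1D strip */
  int i = 2, j = 1;

  a[i][j] = 42;

  /* In C (row major) element (i,j) lives at 1D offset i*NY + j */
  printf("a[%d][%d] = %d, flat[%d] = %d\n",
         i, j, a[i][j], i * NY + j, flat[i * NY + j]);
  return 0;
}

Both prints show the same value, because a[i][j] and flat[i*NY + j] are the same piece of memory.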

Fortran is column major: elements that differ only in their first index sit next to each other in memory, so the innermost loop should run over the first index (equivalently, the outermost loop should run over the last index). C is row major, so the innermost loop should run over the last index and the outermost loop over the first. As a consequence, despite having their loop nests in apparently opposite orders, both of the following programs traverse memory in the correct, contiguous order.

INTEGER, DIMENSION(100,100) :: a
DO iy = 1, 100
  DO ix = 1, 100
    a(ix,iy) = a(ix,iy) + 1
  END DO
END DO

int a[100][100];
for (int ix = 0; ix < 100; ++ix) {
  for (int iy = 0; iy < 100; ++iy) {
    a[ix][iy] = a[ix][iy] + 1;
  }
}

Other languages make one or other of these choices, usually with an eye to the language that their "typical" original users were familiar with. Matlab, for example, is column major just like Fortran, because it is mostly used in scientific fields where Fortran is also very popular.

It's very important that you get this the right way round, as it can lead to significant (of order 2x) speedups compared to getting it wrong. So remember: FORTRAN, LAST INDEX IN THE OUTERMOST LOOP; C/C++, FIRST INDEX IN THE OUTERMOST LOOP!
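
If you want to see the effect for yourself, a rough sketch along these lines should show the difference on most machines. This is only an illustration: the array size and the use of clock() are arbitrary choices of mine, the exact speedup depends on your hardware, and you should compile without aggressive optimisation, since a good optimiser may swap the loops for you.

#include <stdio.h>
#include <time.h>

#define N 4000

static int a[N][N]; /* 64 MB: much larger than any cache */

int main(void)
{
  clock_t start;
  double good, bad;

  /* Touch the whole array once so the timings below aren't
     dominated by the operating system mapping the pages in */
  for (int ix = 0; ix < N; ++ix)
    for (int iy = 0; iy < N; ++iy)
      a[ix][iy] = 0;

  /* Good order for C: the last index (iy) changes fastest, so memory
     is accessed contiguously and the prefetcher can keep up */
  start = clock();
  for (int ix = 0; ix < N; ++ix)
    for (int iy = 0; iy < N; ++iy)
      a[ix][iy] += 1;
  good = (double)(clock() - start) / CLOCKS_PER_SEC;

  /* Bad order: the first index changes fastest, so consecutive accesses
     are a whole row apart and most of each cache line is wasted */
  start = clock();
  for (int iy = 0; iy < N; ++iy)
    for (int ix = 0; ix < N; ++ix)
      a[ix][iy] += 1;
  bad = (double)(clock() - start) / CLOCKS_PER_SEC;

  printf("contiguous order: %.3f s, strided order: %.3f s\n", good, bad);
  return 0;
}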

That's pretty much everything for statically sized arrays (where the size is hard coded into the source code), but there are a few remaining wrinkles to do with dynamically sized arrays (arrays where the size is specified at runtime), so that'll be the topic of part 2 of this entry.


August 08, 2018

Data structures 1 – Bitmasks

This month we're going to start an occasional series on data structures. Data structures are one of the core foundations of computer science, but are often underappreciated by general academic programmers. A large part of this is simply that data structures are usually chosen early in development of a code and are only rarely changed as the code evolves. Since most people don't start a code themselves, but simply work with an existing one, you only need to know how to work with a given data structure rather than why it was chosen or why it's a good choice. To try and remedy this we are going to create a few blog posts on common data structures, why they are chosen and how to implement them.

The first one is definitely one from the archives, being a trick that was more useful when computers only had a few kB of memory available to them, but it is still used in modern codes because of its simplicity. This month, we're going to be talking about bitmasks.

Bitmasks

Bitmasks are a solution to the problem of having a large number of simple logical flags that you want to keep track of and keep together. The common uses are things like status flags (what state is this object in?), capability flags (what can this object do?) and error flags (which errors have occurred?). The answer seems obvious:

LOGICAL, DIMENSION(N) :: flags !Fortran
std::vector<bool> flags; // C++
char flags[N]; /* C */

In fact, none of these are guaranteed to use as little memory as is needed to store your data.

Since all that you are storing is a single true/false state, all that you need is a single bit per state, so you can store 8 states in a byte. So how do you do it? Unsurprisingly, you need to use bitwise Boolean logic. In particular you will probably find uses for bitwise OR, bitwise AND and bitwise XOR. The syntax for these varies between languages, but almost every language has them:

Bitwise operators in different languages:

Operation     C/C++/Python    Fortran (95 or later)
Bitwise OR    A | B           IOR(A, B)
Bitwise AND   A & B           IAND(A, B)
Bitwise XOR   A ^ B           IEOR(A, B)
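
As a quick check of the syntax, here's a tiny C sketch applying all three operators to the values 15 and 24 that I'll use in the worked example below:

#include <stdio.h>

int main(void)
{
  int a = 15, b = 24;               /* Values from the worked example below */
  printf("15 | 24 = %d\n", a | b);  /* Bitwise OR  -> 31 */
  printf("15 & 24 = %d\n", a & b);  /* Bitwise AND -> 8  */
  printf("15 ^ 24 = %d\n", a ^ b);  /* Bitwise XOR -> 23 */
  return 0;
}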

Bitwise operators are exactly the same as logical operators except that they operate on each bit individually: you go through each bit in the two values and combine them as if each was a logical flag. As an example, imagine 15 OR 24.

Bit/Number         1   2   4   8  16  32  64 128
15                 1   1   1   1   0   0   0   0
24                 0   0   0   1   1   0   0   0
Result = 31        1   1   1   1   1   0   0   0

Every bit that was set in either of the two sources is set in the result. To make practical use of this, define named flags for each bit that you want to use. In this example I'm going to assume that I want my bitmask to represent error states based off a real code that I've worked with.

/*NOTE these values must be powers of two since they correspond to individual bits. Bitmasks don't work
for other values*/
char c_err_none = 0; //No error state is all bits zero
char c_err_unparsable_value = 1; //Value that can't be converted to integer
char c_err_bad_value = 2; //Value can be converted to integer, but integer is not valid in context
char c_err_missing_feature = 4; //Code has not been compiled with needed feature
char c_err_terminate = 8; //Error is fatal. Code should terminate

You can do the same in Fortran or Python, although Fortran still doesn't have a portable "char" equivalent so you'll have to use integers. Once you have your list of constants, you can write code to use them for error values. For example

/* Set errcode to c_err_none to start */
char errcode = c_err_none;
if (value == c_feature_tracking) {
  if (!tracking_enabled) {
    errcode = errcode | c_err_missing_feature; /* Feature is missing, so OR with c_err_missing_feature flag */
    errcode = errcode | c_err_terminate;       /* Code should not continue to run, so OR with c_err_terminate */
  } else { ... }
}

This code tests whether value refers to a feature that the code hasn't been compiled with, and sets two error-state bits by combining the errcode variable with the constants c_err_missing_feature and c_err_terminate. This sets the correct two bits, and errcode now has the error state stored safely in it. But how do you read it back? By using AND. If I want to test for a state, I simply AND the errcode variable with that state. To demonstrate:

Bit/Number         1   2   4   8  16  32  64 128
errcode            0   0   1   1   0   0   0   0
c_err_terminate    0   0   0   1   0   0   0   0
Result = 8         0   0   0   1   0   0   0   0

Since the value c_err_terminate has been set, the result is just the value of the bit corresponding to c_err_terminate. But what if I test a bit that isn't set?

Bit/Number         1   2   4   8  16  32  64 128
errcode            0   0   1   1   0   0   0   0
c_err_bad_value    0   1   0   0   0   0   0   0
Result = 0         0   0   0   0   0   0   0   0

The result of the AND operation is just zero. So to test whether a bit is set, simply AND your error variable with the flag being tested and check whether the result is zero. In languages like C or C++, where a conditional can test any value for being non-zero, this is as simple as

if (errcode & c_err_terminate) { ... }

But in Fortran, where the IF statement needs to take a logical, you have to explicitly test against zero:

IF (IAND(errcode, c_err_terminate) /= 0) THEN
  ...
END IF

The other common task with a collection of logical flags is to flip the state of a bit: if it is set, unset it, and if it's unset, set it. You do this using the XOR operator. XOR is a little less common than AND or OR, so I'll explain it briefly.

XOR is exclusive OR and it has this truth table:

XOR   0   1
  0   0   1
  1   1   0

So if either, BUT NOT BOTH, of the inputs is 1 then the output is 1; otherwise the output is 0. You can see what that does to a bitmask simply enough by considering what happens if I want to flip the state of the c_err_terminate bit in my previous bitmask.

Bit/Number         1   2   4   8  16  32  64 128
errcode = 12       0   0   1   1   0   0   0   0
c_err_terminate    0   0   0   1   0   0   0   0
Result = 4         0   0   1   0   0   0   0   0

You can see immediately that by performing the same operation again I'll just set the c_err_terminate bit back.

So by combining the values with XOR I simply flip the state of the bit that I am interested in while leaving everything else alone. In code this is very simple

errcode = IEOR(errcode, c_err_terminate) !Fortran
errcode = errcode ^ c_err_terminate;     /* C/C++ (Python is the same, minus the semicolon) */
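
To tie all of that together, here's a short, self-contained C sketch (reusing the flag values from the error-code example above) that sets flags with OR, tests them with AND and flips one with XOR:

#include <stdio.h>

/* Flag values from the example above: each one is a single bit */
const char c_err_none             = 0;
const char c_err_unparsable_value = 1;
const char c_err_bad_value        = 2;
const char c_err_missing_feature  = 4;
const char c_err_terminate        = 8;

int main(void)
{
  char errcode = c_err_none;

  /* Set two flags with OR */
  errcode = errcode | c_err_missing_feature;
  errcode = errcode | c_err_terminate;

  /* Test flags with AND */
  if (errcode & c_err_terminate)
    printf("terminate flag is set (errcode = %d)\n", errcode);     /* errcode = 12 */
  if (!(errcode & c_err_bad_value))
    printf("bad value flag is not set\n");

  /* Flip a flag with XOR; doing it twice restores the original state */
  errcode = errcode ^ c_err_terminate;
  printf("after flip, errcode = %d\n", errcode);                   /* 4 */
  errcode = errcode ^ c_err_terminate;
  printf("after second flip, errcode = %d\n", errcode);            /* 12 */

  return 0;
}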

