Data structures 1 – Bitmasks
This month we're going to start an occasional series on data structures. Data structures are one of the core foundations of computer science, but are often underappreciated by general academic programmers. A large part of this is simply that data structures are usually chosen early in development of a code and are only rarely changed as the code evolves. Since most people don't start a code themselves, but simply work with an existing one, you only need to know how to work with a given data structure rather than why it was chosen or why it's a good choice. To try and remedy this we are going to create a few blog posts on common data structures, why they are chosen and how to implement them.
The first one is definitely one from the archives, being a trick that was more useful when computers only had a few kB of memory available to them, but is still used in modern codes because of it's simplicity. This month, we're going to be talking about bitmasks.
Bitmasks
Bitmasks are solutions to the problem of you having a large number of simple logical flags that you want to keep track of and keep together. The common uses are things like status flags (what state is this object in), capability flags (what can this object do) and error flags (which errors have occured). The answer seems obvious:
LOGICAL, DIMENSION(N) :: flags !Fortran
std::vector<bool> flags; // C++
char[N] flags; /*C*/
In fact, none of these are guaranteed to use as little memory as is needed to store your data.
In fact, since all that you are storing is a single true/false state all that you need is a single bit for each state, so you can store 8 states in a byte. So how do you do it? Unsurprisingly, you need to use bitwise Boolean logic. In particular you will probably find uses for bitwise OR, bitwise AND and bitwise XOR. The syntax for these varies between languages but almost every language has them
C/C++/Python | Fortran (95 or later) | |
Bitwise OR | A | B | IOR(A, B) |
Bitwise AND | A & B | IAND(A, B) |
Bitwise XOR | A ^ B | IEOR(A, B) |
Bitwise operators are exactly the same as logical operators but they operates on each bit individually. So you go through each bit in the two values and operate as if each was a logical flag. As an example, imagine 15 OR 24
Bit/Number | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 |
15 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
24 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
Result = 31 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
Every bit that was set in either of the two sources is set in the result. To make practical use of this, define named flags for each bit that you want to use. In this example I'm going to assume that I want my bitmask to represent error states based off a real code that I've worked with.
/*NOTE these values must be powers of two since they correspond to individual bits. Bitmasks don't work
for other values*/
char c_err_none = 0; //No error state is all bits zero
char c_err_unparsable_value = 1; //Value that can't be converted to integer
char c_err_bad_value = 2; //Value can be converted to integer, but integer is not valid in context
char c_err_missing_feature = 4; //Code has not been compiled with needed feature
char c_err_terminate = 8; //Error is fatal. Code should terminate
You can do the same in Fortran or Python, although Fortran still doesn't have a portable "char" equivalent so you'll have to use integers. Once you have your list of constants, you can write code to use them for error values. For example
/*Set errcode to c_err_none to start*/
char errcode = c_err_none;
if (value == c_feature_tracking) {
if (!tracking_enabled){
errcode = errcode | c_err_missing_feature; /*Feature is missing so or with c_err_missing_feature flag*/
errcode = errcode | c_err_terminate; /*Code should not continue to run so or with c_err_terminate*/
} else { ... }
}
This code tests for value being a value that the code hasn't been compiled with and sets two error state bits by combining the errcode variable with the constants c_err_missing_feature and c_err_terminate. This sets the correct two bits and errcode now has the error state stored safely in it. But how do you read it back? By using AND. If I want to test for a state, I simply AND the errcode variable with that state. To demonstrate
Bit/Number | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 |
errcode | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
c_err_terminate | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
Result = 8 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
Since the value c_err_terminate has been set, the result is just the value of the bit corresponding to c_err_terminate. But what if I test a bit that isn't set?
Bit/Number | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 |
errcode | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
c_err_bad_value | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
Result = 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
The result of the AND operation is just zero. So to test for a bit being set, simply AND your error variable with the flag being tested and test if the result is zero or not. In languages like C or C++ where conditionals can just test for a value being zero or not, this is as simple as
if (errcode & c_err_terminate) { ... }
But in Fortran where the IF statement needs to take a logical, you have to explicitly test against zero
IF (IAND(errcode, c_err_terminate) /= 0)
The other common task with arrays of logicals is to flip the state of a bit. So if it is set, unset it and if it's unset, set it. You do this using the XOR operator. XOR is a little less common than AND or OR, so I'll explain it briefly.
XOR is exclusive OR and it has this truth table
0 | 1 | |
0 | 0 | 1 |
1 | 1 | 0 |
So if either, BUT NOT BOTHof the inputs is 1 then the output is 1. Otherwise the output is zero. You can see what that does to a bitmask simply enough by considering if I want to flip the state of the c_err_terminate bit in my previous bitmask
Bit/Number | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 |
errcode = 12 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
c_err_terminate | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
Result = 4 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
You can see immediately that by performing the same operation again I'll just set the c_err_terminate bit back.
So by combining the values with XOR I simply flip the state of the bit that I am interested in while leaving everything else alone. In code this is very simple
errcode = IEOR(errcode, c_err_terminate) !Fortran
errcode = errcode ^ c_err_terminate /* C/ C++ or Python */
No comments
Add a comment
You are not allowed to comment on this entry as it has restricted commenting permissions.