All 5 entries tagged Winkt
No other Warwick Blogs use the tag Winkt on entries | View entries tagged Winkt at Technorati | There are no images tagged Winkt on this blog
December 03, 2024
Fortran Types and Component Selection
Derived Types in Fortran
What and Why
Pretty much all languages have support for user-defined data types in some way or another. These can be as simple as a set of data items (called elements, components, properties, members and probably other things too) collected together under a single name, up to fully functioning classes (objects) which can define operators (like + and -), functions (like printing and even the call operator), and act in every way like a "built-in" data type.
Collections of data, without any concept of what the data can do, are often referred to as "POD types" or plain-old-data types, especially in C++ where this distinguishes them from classes like we just described. Note that C++20 has refined this specification and now uses some new terminology with extra caveats, but POD is a widely used term pre-20. Fortran and C++ both only really make the distinction between such 'trivial' types and ones with methods etc in places where it matters. For instance, struct and class in C++ are almost the same, differing only in whether members are accessible or not by default - this means using a struct for a 'data' type makes it clearer that members are intended be accessed directly, but doesn't really change anything.
In Fortran, the user-defined 'TYPE' construct (also called a derived type) can be a POD type, or can have functions (aka procedures or methods) attached to it. They can also be 'polymorphic' - loosely though of as substitutable based on ability rather than identity. True object oriented programming is therefore possible in Fortran since at least 2003 (opinions vary on what a language needs to support innately, versus what can be "hacked in" to be considered a true OO contender). Fortran types support 'access control' with the 'public' and 'private' keywords, but we wont cover that here.
All of the code snippets, and more are available here.
Defining and Using a Type
There's two elements to using your own types, namely defining one (saying what it looks like) and creating one (making a variable of that type). This is best done by example, so let's use a variation on the usual exemplar and do an 'atom' type.
Defining your type looks like below. 'atom' is going to be the name for the type, and it's going to contain a 'name' and an 'atomic number'. The code is
TYPE atom ! The name for the type
CHARACTER(LEN=30) :: name ! a field called name, of type string
INTEGER :: atomic_number ! an integer field for atomic number
END TYPE atom ! Optional to repeat the name here
and it can go in the same places as a variable declaration, e.g. in a MODULE, or at the top of your main program, as long as it's before any attempt to use it.
Now we have the "idea" of an atom, a thing consisting of a name and an atomic number. How do we use this? Well we can create a variable of the 'atom' type like this:
TYPE(atom) :: myAtom ! A variable of type atom
which we often describe as creating an "instance".
We can also create a function which expects something of type atom. This is powerful for two big reasons and plenty of smaller ones.
- The first big one: the code now documents itself far more clearly. Rather than taking two, possibly unrelated, data items, we see it takes an atom.
- The second big one: our function interface and call-lines get shorter and more compact, taking a small number of custom types, rather than a long list of generic ones.
- A third one (with many caveats in reality): we can extend the idea of an atom without changing all our functions signatures; although often doing this is a sign of problems.
So here's a function which takes two atoms and returns a third. Imagine we're some sort of weird alchemists...
FUNCTION transmute(first_atom, second_atom) RESULT(combined_atom)
TYPE(atom) :: first_atom, second_atom, combined_atom
...
Derived or user defined types are used here exactly like any other data type - specify INTENTs as usual, PARAMETER (remembering that we then need a way to give values - see below).
Type Components
So we can create a type, but that's not a lot of use without being able to get and set its elements. Like in most languages, what we need is an instance, and then we can refer to its contents. We do this like in this code block:
TYPE(atom) :: myAtom
myAtom%name = "Hydrogen"
myAtom%atomic_number = 1
PRINT*, myAtom
NOTE: the printing is pretty dumb - it just outputs the items in order with default format. We can do better, but that is a topic for a potential future post, not now.
Aside - one reason why it's not a dot
By the way, a lot of people really dislike the choice of '%' for component selection. I do not know for sure if what follows is defining reason, or just a part of why it's not so easy to fix, but remember that Fortran comes from a time when input characters might have been restricted and it was not certain that somebody had a '=' key. In C the solution was digraphs and trigraphs- 2 or 3 characters that were parsed as one. Think '<=' but potentially far more esoteric.
In Fortran, the decision was made to use the form ".OP." instead, such as .LT. .LE. etc. This was also used for things like .TRUE. and the logical operators .AND. etc. Fortran is also very generous about white-space, caring only when it is needed to divide identifiers.
Now consider this bit of code, if we could use '.' to get type components:
first.lt.other
Is this:
- a less-than comparison between a variable called 'first' and one called 'other'
- a nested type object property access, where we have a variable 'first' with property 'lt' which in turn has property 'other'.
Oh dear... If we can't tell, then we have a problem! Some compilers are willing to risk the ambiguity, but some are not, so do not offer the alternative symbol.
Default Values
We can give each element of our type a default value if we want. We do that inline in the definition, such as
TYPE atom ! The name for the type
CHARACTER(LEN=30) :: name = "" ! a field called name, of type string
INTEGER :: atomic_number = -1 ! an integer field for atomic number
END TYPE atom
NOTE: if you're worried about 'SAVE' behaviour, types don't have any lurking surprises here. All 'SAVE' would mean is that the type's properties retain their values for the life of the variable which is... exactly as we'd hope it would be. No surprises waiting for us here.
Structure Constructors
Not just a tongue-twister, I promise: just another less-known feature of Fortran types. For the atom type above, we can also create an instance with values set at construction, like this:
myAtom = atom("Hydrogen", 1)
or even using keywords, such as
myAtom = atom(name = "Helium", atomic_number=2)
which we think is actually pretty damn informative! As with function calls, keywords can be in any order and we can mix the two styles, as long as all the positional ones come first.
We can also create a temporary atom this way, for example to pass to a function. In many senses, this is a literal just like the string "Hello" or the number 2. If we store it into a variable, then that variable now contains the data. Otherwise, we can't modify it, or access it's individual components*. Mostly we can use it as the source for setting a variable, or to pass to a function.
NOTE: this atom will exist only for the duration of the line where it is created. For a function call this includes the bit inside the function. IF we don't specify intent, we can happily pass this temporary and modify it - but this is probably not what we intended, since it will go away before we can name it. Specifying INTENT(INOUT) or (OUT) will disable passing a temporary, and ensure we can't make this mistake.
*NOTE: once we have passed a temporary to a function, inside the function we have a dummy variable (formal argument) just like normal. If the dummy is "bound" to a temporary, then we can't write to it, as the previous note says. But we can access the components of the dummy regardless of this. So we can do 'myAtom%name' inside a function having passed "atom("Iron", 27), but we can't do atom("Iron", 27)%name. The code for this article, same as linked above, makes this clearer
Well, I Never Knew That: Arrays of types and arrays of components
We often say that arrays are the great strength of Fortran, and that whole array operations, slices etc are a huge part of the reason why. Well, Fortran kindly allows us to retain all of this when we're using types too. For example, the following is perfectly valid code:
TYPE(atom), DIMENSION(10) :: theAtoms
INTEGER, DIMENSION(:), ALLOCATABLE :: theNums
PRINT'(A)', theAtoms%name
theNums = theAtoms%atomic_number
which will PRINT just the names, and then give us an array of just the atomic numbers.
This is a handy trick in any case, but it's also a great thing to know if refactoring code that doesn't use types (or doesn't use them to their capacity yet). We can pretty freely wrap things into types, even if those types are then in arrays, and we can still unwrap the individual components very cheaply. The resulting arrays are "proper" arrays. We can assign them to things, assign to them, pass them to functions, slice them: everything we expect.
An Example Refactoring
Here's a quick example (see the full code here)
Old:
! Takes 3 arrays for components of position
New:
SUBROUTINE old(x, y, z, dt)
REAL, DIMENSION(N) :: x, y, z
REAL :: dt
TYPE pos
REAL :: x, y, z
END TYPE
! Takes one array of types instead
SUBROUTINE new(p, dt)
TYPE(pos), DIMENSION(N) :: p
REAL :: dt
The body of the function can still use idiom like "x = x + 0.1*dt" by just changing to "p%x = p%x + 0.1*dt" even though p is now an array of types. The function signature is clearer, and it is no longer possible to decouple x, y and z from each other at different indices by accident.
Take Away Point
Types are possibly the best way to make your Fortran code better and more modern for extremely little effort. They make function signatures clearer, let the compiler enforce type checking better, and reduce the "mental" load of having a bunch of connected variables which you have to remember the relationships between. Use them in new code, and consider the simple refactoring above with existing code. It wont be wasted effort even if a rewrite is in the future - having the data structures already figured out is a great bonus.
November 27, 2024
How does MPI parallel code actually run?
One of those things I wondered about for a long time, before just getting on and finding out is this:
when I run an MPI parallel code, what actually happens?
And, before forging ahead to answer my question I should clarify why I want to know - well basically so I can understand what happens in error cases and edge cases, such as starting a program without mpiexec, or mpiexec-ing a non MPI program.
I know that MPI lets me run multiple, interacting, instances of my program, and lets those communicate with each other. But I also know that I can start my program without using mpiexec (or mpirun, srun or any other invocation) and it might work as a serial code - but it doesn't always. I know that MPI_Init is really important, but I don't know what a program can and can't do before that line, or how a completely empty program would behave. I don't understand what an MPI program without any comms in would actually do. I am not certain whether any bits of my program are somehow shared - data or state or communicators.
As usual, I could answer all of these questions individually, but there's a good chance I can answer them all if I can just work out what's missing in my mental model. It turns out this is what I hadn't realised:
mpiexec starts N independent copies of my program. When my code reaches the MPI_Init function, communication is established (using some info provided by the launcher) - the copies are made aware of each other and assigned their ranks. MPI_Finalize is where the comms is shut down. *
Obvious in retrospect, but it answers all of my questions.
- Starting a single instance of my code (a serial version) will work as long as my algorithm _can_ work on a single processor, without deadlocks etc. But starting N independent copies won't make for a parallel run, because the information (or daemons etc) MPI_Init needs won't be present - it wont know about the other copies, or even how many copies there are.
- Before the MPI_Init line, my code can do anything that doesn't use communication - no MPI calls, no use of communicators etc. That means I can't know how many processors I am running on (we don't know ranks), or if I am the root (proc 0) or anything like that.
- A completely empty program, or one where I never call MPI_Init, will run N independent copies, but they will never know about each other. Just like the parts of my program before the Init. This also tells me what happens if I mpiexec a completely non-MPI program.
- A program that calls MPI_Init but has no actual comms can still be a parallel program - if I can split up my work with nothing other than MPI_Comm_size or MPI_Comm_rank, for instance dividing my work into N blocks, I can do that work in parallel (as long as I am careful about outputting the final product of my work blocks).
- The one thing I can't definitively answer using this is whether I can mpiexec a program that wasn't compiled as an MPI program. But I can guess, based off the fact that a program without MPI_Init can be valid, that I would probably get N independent programs, and I'd be right, as it happens.
- Finally, I can easily see that no program state can possibly be shared, because my program copies are independent, with their own memory spaces. Things like communicators must contain information sufficient for the message passing "layer" to pass information between copies of my program.
Note that I put a '*' on my statement of what actually happens - this is "correct from the perspective of my program", but a little incomplete in general. The mpiexec launcher can, and generally does, do some elements of setting up comms, but this is lower level than my program and doesn't affect how it behaves or what it can do. I also omitted anything about the compiler step - since I know MPI uses compiler wrapper scripts, something could happen at this stage, which is why I can't completely answer that penultimate question without using some more information.
March 02, 2022
Learning to Program
We (Warwick RSE) love quotes, and we love analogies. We do always caution against letting your analogies leak, by supposing properties of the model will necessarily apply to reality, but nontheless, a carefully chosen story can illustrate a concept like nothing else. This post discusses some of my favorite analogies for the process of learning to program - which lasts one's entire programming life, by the way. We're never done learning.
Jigsaw Puzzles
Learning to program is a lot like trying to complete a jigsaw puzzle without the picture on the box. At first, you have a blank table and a heap of pieces. Perhaps some of the pieces have snippets you can understand already - writing perhaps, or small pictures. You know what these depict, but not where they might fit overall. You start by gathering pieces that somehow fit together - edges perhaps, or writing, or small amounts of distinctive colours. You build up little blocks (build understanding of isolated concepts) and start to fit them together. As the picture builds up, new pieces fit in far more easily - you use the existing picture to help you understand what they mean. Of course, the programming puzzle is never finished, which is where this analogy starts to leak.
I particularly like this one for two reasons. Firstly, one really does tend to build little isolated mini-pictures, and move them around the table until they start to connect together, both in jigsaws and in programming. Secondly, occasionally, you have these little moments - a piece that seemed to be just some weird lines fits into place and you say "oh is THAT what that is!". One gets the same thing when programming - "oh, is that how that works!" or "is that what that means".
Personally, a lot of these moments then become "Why didn't anybody SAY that!". This was the motivation behind our blog series of "Well I Never Knew That" - all those little things that make so much sense once you know.
The other motivation for the WINKT series is better illustrated using a different analogy - hence why we like to have plenty on hand.
A Walk in the Woods
Another way to look at the learning process, is as a process of exploring some new terrain, building your mental map of it. At first, you mostly stick to obvious paths, slowly venturing further and further. You find landmarks and add them to your map. Some paths turn out to link up to other paths, and you start to assemble a true connected picture of how things fit together. Yet still there is terrain just off the paths you haven't gone into. There could be anything out there, just outside your view. Even as you venture further and deeper into the woods, you must also make sure to look around you, and truly understand the areas you've already walked through, and how they connect together.
My badly done keynote picture is below. The paths are thin lines, the red fuzzy marks show those we have explored and the grey shading shows the areas we've actually been into and now "understand". Eventually we notice that the two paths in the middle are joined - a nice "Aha" moment of clarity ensues. But much of our map remains a mystery.
And that is one of the reasons I really like this analogy - all of the things you still don't know - all that unexplored terrain. This too is very like the learning process - you are able to use things once you have walked their paths, even though there may be details about them you don't understand. On the one hand, this is very powerful. On the other, it can lead to "magic incantations" - things you can repeat without actually understanding. This is inevitable in a task this complex, but it is important to go back and understand eventually. Don't let your map get too thin and don't always stick to the paths, or when you do step off them, you'll be lost.
This was the other main motivation behind our WINKT series - the moments when things you do without thinking suddenly make sense, and a new chunk of terrain gets filled in. Like approaching the same bit of terrain from a new angle, or discovering that two paths join, you gain a new perspective, and ideally, this new perspective is simpler than before. Rather than two paths, you see there is only one. Rather than two mysterious houses deep in the woods, you see there is one, seen from two angles.
Take Away Point
If you take one thing away from this post, make it this: the more you work on fitting the puzzle pieces together, the clearer things become. Rather than brown and green and blue and a pile of mysterious shapes, eventually you just have a horse.
And one final thing, if we may stretch the jigsaw analogy a little further: just because a piece seems to fit, doesn't mean it's always the right piece...
October 10, 2019
The Basics aren't so Basic
Sorry for the long break! We've been busy with the start of term and busy expanding our training material (link). This week I am just going to talk about something that you should always keep in mind, not just with programming and computers but with a whole bunch of things, and that is, what does it mean to say something is "basic".
There is a quote often attributed to Einstein, although not directly traceable to himwhich goes
Common sense is the collection of prejudices acquired by age eighteen.
Whether the origin is real or not, it's often true that what people think is simple, or "just common sense" is only so because of their background. To somebody who cut teeth on a BBC Micro, programming might seem super BASIC.
Jokes aside, you will probably keep coming across things that are super-basic and feeling a bit awkward that they somehow escaped you until now. Especially if you have learned things in the usual manner, i.e. by necessity, it is very easy to miss some of the basics. You can find yourself doing really quite advanced things, while not knowing something that "everybody else" seems to. This is normal. It is not beneficial, but it is perfectly normal. Frankly, in computing and programming there is a vast, vast sea of "basics", and no matter how much you learn there always seems to be more.
When I were a Lad
When I was a PhD student, I was happily using 'ssh' to login to remote machines, but I would always type out the whole host spec, such as "username@machinename.blah". I remember feeling a bit dumb when my supervisor pointed out that I didn't need the "username", and he thought this was somehow basic and obvious. I was frankly a little bit irritated because nobody told me! How was I meant to know?
"Simplicity is the final achievement."
(Quote from Frederic Chopin)
Moreover, just because something is "basic" doesn't mean it is simple. In fact, Merriam-Webster's definitionof the adjective "basic", while perhaps a bit unhelpfully recursive does not say simple anywhere. That thing with the username isn't so simple. It's fundamental, sure, but it's not simple!
Years later, I am still regularly coming across things that are "basic" that I have never encountered before. The whole "learning how to program" thing is far more of a helix than a road. You come across fundamental things all the time, some for the first time, some repeatedly, and often you can understand them better every time. Eventually, you find them simple. Sometimes they feel even elegant, because they arise so smoothly from the things you do know, or perhaps even seem so obviously "the only way it could be".
This is most of the motivation for our "WINKT" blog post series. These are fundamental, mostly "basic" things, but they're mostly not things you could usefully be told about the first go-around. Mostly, they are the basics of how the complicated things work. For example:
- On the command line: if you use the '*' wildcard, when does this get expanded into the list of matches? Specifically, if you accidentally create a file called "-rf" in your home, and ran the command `rm *` to remove files, how much trouble would you be in? The answer is, _a lot_. * is processed first, by the shell, and unfortunately '-' comes first in the alphabet. You just ran the equivalent of `rm -rf *`. Ooops.
- Any C/C++ programmers: if you use a variable which is undefined, what is it's value? If you said "whatever is in the associated memory beforehand", you're close, but wrong. An undefined variable is undefined behaviour - it can be given any value, including a different one each time it is accessed. Why? Because the standard says so. But who needs to know that? It is enough to know that its value is unreliable. Using your "basic" knowledge of the C memory model, you would likely guess the above, and it would never matter. [Disclaimer: this is one that I personally only learned a few weeks ago. It's absolutely fundamental, but not at all simple.]
- For Fortran 2003 people: if you have a function-scoped ALLOCATABLE array, allocate it inside the function and forget to free it before the function exits, what happens? A memory leak? Nope! Fortran will helpfully deallocate the array on exit. If you didn't know this and freed everything yourself, there would never be a problem, but this one often surprises people.
- For Python people: suppose you give a function a default argument, like `def func(arg, list_arg = []): ...` and suppose inside the function, list_arg gets filled with stuff. If you call the function twice without supplying list_arg, what do you get the second time? If you said "the combined contents from the first and second calls" you would be correct! The default arg. is an empty list, but it is the SAME empty list each time!
Take Aways
What's the point of all this rambling? Just that there is so much often classed as "the basics" that nobodycan know it all, and there is nothing so basic that you wouldn't do well to re-examine it anyway. It gets said all the time, but with computers there really are no stupid questions. Well, OK, there are some pretty stupid questions. But I have never seen one yet that wasn't worth thinking about.
Postscript: any suggestions for things that make you go "Well I Never Knew That"!, email us or comment! We can always use more
September 10, 2019
New snippets series: WINKT
Super short blog post to start a new snippets series. Along with our SOUP series (Snippet of the <Undefined Period>) we're trying out a new way of posting. WINKT (Well I Never Knew That) is a place for all those things that we stumble across that don't warrant a complete post, but are interesting (if you're a giant geek, anyway).
Fortran Variable Declarations
To start the series off, something I never thought about in Fortran: when you declare a variable with multiple attributes, what is the Left-Hand side comma-separated list actually doing? I was reading a Doctor Fortran post about the potential pitfalls of blindly applying every attribute you might wantand saw his final example of applying (here erroneously) the DIMENSION keyword to a variable in a separate line. "Wait, " I said, "can you do that?" I wrote a quick example, compiled it (with strict standards adherence) and proclaimed "Well I Never Knew That!"
What am I talking about here? Well, consider creating an array of integers. You'll usually write something like:
INTEGER, DIMENSION(:, :), ALLOCATABLE :: I
for a 2-D array. As it turns out, you could equivalently write this as
INTEGER :: I DIMENSION :: I(:, :) ALLOCATABLE :: I
adding the attributes across several lines.
The comma-separated lists of attributes on the left of the double-colon are effectively "stacking" the attributes, but each attribute can also be applied separately.
In that example, it's clearly a bit silly and unwieldy, but it's something you might see "in the wild" and perhaps be confused by, so worth knowing. In some examples, it might actually help make things clear, with, for example, attributes such as SAVE or INTENTs, which can sometimes get "lost in the noise" of declarations. So rather than
INTEGER, SAVE :: I INTEGER :: j, k
I could write
INTEGER :: i, j, k SAVE :: i
This might look clearer, especially if I have more stacked attributes.
This is probably not something I will ever use, and I am not sure I would recommend it, since it looks a bit unexpected, and generally code should avoid unexpectedness. But it did show up to me just how the attribute lists must be working, and was an interesting ten minutes.