December 03, 2024
Fortran Types and Component Selection
Derived Types in Fortran
What and Why
Pretty much all languages have support for user-defined data types in some way or another. These can be as simple as a set of data items (called elements, components, properties, members and probably other things too) collected together under a single name, up to fully functioning classes (objects) which can define operators (like + and -), functions (like printing and even the call operator), and act in every way like a "built-in" data type.
Collections of data, without any concept of what the data can do, are often referred to as "POD types" or plain-old-data types, especially in C++ where this distinguishes them from classes like we just described. Note that C++20 has refined this specification and now uses some new terminology with extra caveats, but POD is a widely used term pre-20. Fortran and C++ both only really make the distinction between such 'trivial' types and ones with methods etc. in places where it matters. For instance, struct and class in C++ are almost the same, differing only in whether members are accessible or not by default - this means using a struct for a 'data' type makes it clearer that members are intended to be accessed directly, but doesn't really change anything.
In Fortran, the user-defined 'TYPE' construct (also called a derived type) can be a POD type, or can have functions (aka procedures or methods) attached to it. They can also be 'polymorphic' - loosely thought of as substitutable based on ability rather than identity. True object oriented programming has therefore been possible in Fortran since at least 2003 (opinions vary on what a language needs to support innately, versus what can be "hacked in", to be considered a true OO contender). Fortran types support 'access control' with the 'public' and 'private' keywords, but we won't cover that here.
All of the code snippets, and more, are available here.
Defining and Using a Type
There are two elements to using your own types, namely defining one (saying what it looks like) and creating one (making a variable of that type). This is best done by example, so let's use a variation on the usual exemplar and do an 'atom' type.
Defining your type looks like below. 'atom' is going to be the name for the type, and it's going to contain a 'name' and an 'atomic number'. The code is
TYPE atom ! The name for the type
CHARACTER(LEN=30) :: name ! a field called name, of type string
INTEGER :: atomic_number ! an integer field for atomic number
END TYPE atom ! Optional to repeat the name here
and it can go in the same places as a variable declaration, e.g. in a MODULE, or at the top of your main program, as long as it's before any attempt to use it.
Now we have the "idea" of an atom, a thing consisting of a name and an atomic number. How do we use this? Well we can create a variable of the 'atom' type like this:
TYPE(atom) :: myAtom ! A variable of type atom
which we often describe as creating an "instance".
We can also create a function which expects something of type atom. This is powerful for two big reasons and plenty of smaller ones.
- The first big one: the code now documents itself far more clearly. Rather than taking two, possibly unrelated, data items, we see it takes an atom.
- The second big one: our function interface and call-lines get shorter and more compact, taking a small number of custom types, rather than a long list of generic ones.
- A third one (with many caveats in reality): we can extend the idea of an atom without changing all our functions signatures; although often doing this is a sign of problems.
So here's a function which takes two atoms and returns a third. Imagine we're some sort of weird alchemists...
FUNCTION transmute(first_atom, second_atom) RESULT(combined_atom)
TYPE(atom) :: first_atom, second_atom, combined_atom
...
Derived or user-defined types are used here exactly like any other data type - specify INTENT as usual, or declare a PARAMETER (remembering that we then need a way to give it values - see below).
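To make that concrete, here is a minimal sketch of a complete transmute with INTENTs specified. The body is purely hypothetical (the snippet above elides it): we just add the atomic numbers and give the result a placeholder name. The module name atom_mod and the variable names in the program are also ours. It uses '%' to reach the components, which is covered in the next section.

```fortran
MODULE atom_mod
  IMPLICIT NONE

  TYPE atom
    CHARACTER(LEN=30) :: name
    INTEGER :: atomic_number
  END TYPE atom

CONTAINS

  ! Hypothetical body: the original does not say what transmute does.
  ! Here we simply add the atomic numbers and use a placeholder name.
  FUNCTION transmute(first_atom, second_atom) RESULT(combined_atom)
    TYPE(atom), INTENT(IN) :: first_atom, second_atom
    TYPE(atom) :: combined_atom

    combined_atom%name = "Unknown"
    combined_atom%atomic_number = first_atom%atomic_number &
                                + second_atom%atomic_number
  END FUNCTION transmute

END MODULE atom_mod

PROGRAM alchemy
  USE atom_mod
  IMPLICIT NONE
  TYPE(atom) :: hydrogen, helium, combined

  hydrogen%name = "Hydrogen"
  hydrogen%atomic_number = 1
  helium%name = "Helium"
  helium%atomic_number = 2

  ! The call-line takes two atoms, not four loose variables
  combined = transmute(hydrogen, helium)
  PRINT '(A, I0)', "Combined atomic number: ", combined%atomic_number
END PROGRAM alchemy
```

Putting the type and the function in the same module means any code that USEs the module gets both, along with an explicit interface for the function.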
Type Components
So we can create a type, but that's not a lot of use without being able to get and set its elements. Like in most languages, what we need is an instance, and then we can refer to its contents. We do this like in this code block:
TYPE(atom) :: myAtom
myAtom%name = "Hydrogen"
myAtom%atomic_number = 1
PRINT*, myAtom
NOTE: the printing is pretty dumb - it just outputs the items in order with default format. We can do better, but that is a topic for a potential future post, not now.
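Putting the pieces so far together, a small self-contained sketch might look like the following. The module and program names are our own, and we print the components with an explicit format so that the layout does not depend on the compiler's list-directed defaults.

```fortran
MODULE atom_types
  IMPLICIT NONE

  TYPE atom
    CHARACTER(LEN=30) :: name
    INTEGER :: atomic_number
  END TYPE atom

END MODULE atom_types

PROGRAM components_demo
  USE atom_types
  IMPLICIT NONE
  TYPE(atom) :: myAtom

  myAtom%name = "Hydrogen"
  myAtom%atomic_number = 1

  ! Components behave like ordinary variables of their own type,
  ! so we can read and update them in expressions
  myAtom%atomic_number = myAtom%atomic_number + 1
  myAtom%name = "Helium"

  ! TRIM drops the trailing blanks from the 30-character name
  PRINT '(A, 1X, I0)', TRIM(myAtom%name), myAtom%atomic_number
END PROGRAM components_demo
```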
Aside - one reason why it's not a dot
By the way, a lot of people really dislike the choice of '%' for component selection. I do not know for sure if what follows is the defining reason, or just a part of why it's not so easy to fix, but remember that Fortran comes from a time when input characters might have been restricted and it was not certain that somebody had a '<' key. In C the solution was digraphs and trigraphs: 2 or 3 characters that were parsed as one. Think '<=' but potentially far more esoteric.
In Fortran, the decision was made to use the form ".OP." instead, such as .LT. .LE. etc. This was also used for things like .TRUE. and the logical operators .AND. etc. Fortran is also very generous about white-space, caring only when it is needed to divide identifiers.
Now consider this bit of code, if we could use '.' to get type components:
first.lt.other
Is this:
- a less-than comparison between a variable called 'first' and one called 'other'
- a nested type object property access, where we have a variable 'first' with property 'lt' which in turn has property 'other'.
Oh dear... If we can't tell, then we have a problem! Some compilers are willing to risk the ambiguity and accept '.' as an extension, but others are not, and so do not offer the alternative symbol.
Default Values
We can give each element of our type a default value if we want. We do that inline in the definition, such as
TYPE atom ! The name for the type
CHARACTER(LEN=30) :: name = "" ! a field called name, of type string
INTEGER :: atomic_number = -1 ! an integer field for atomic number
END TYPE atom
NOTE: if you're worried about 'SAVE' behaviour, types don't have any lurking surprises here. All 'SAVE' would mean is that the type's properties retain their values for the life of the variable which is... exactly as we'd hope it would be. No surprises waiting for us here.
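As a quick sketch of what the defaults buy us (program and variable names are ours): a freshly declared atom already has well-defined contents, so we can test whether it has been filled in yet.

```fortran
PROGRAM defaults_demo
  IMPLICIT NONE

  TYPE atom
    CHARACTER(LEN=30) :: name = ""
    INTEGER :: atomic_number = -1
  END TYPE atom

  TYPE(atom) :: blank, hydrogen

  hydrogen%name = "Hydrogen"
  hydrogen%atomic_number = 1

  ! 'blank' was never assigned to, but its components hold the defaults
  IF (blank%atomic_number == -1) PRINT '(A)', "blank is still unset"
  PRINT '(A, 1X, I0)', TRIM(hydrogen%name), hydrogen%atomic_number
END PROGRAM defaults_demo
```

Choosing a default that can never be a real value (like -1 for an atomic number) makes a forgotten initialisation easy to spot.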
Structure Constructors
Not just a tongue-twister, I promise: just another less-known feature of Fortran types. For the atom type above, we can also create an instance with values set at construction, like this:
myAtom = atom("Hydrogen", 1)
or even using keywords, such as
myAtom = atom(name = "Helium", atomic_number=2)
which we think is actually pretty damn informative! As with function calls, keywords can be in any order and we can mix the two styles, as long as all the positional ones come first.
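Here is a short sketch of the different constructor styles side by side. The variable names are ours. It also shows the PARAMETER case mentioned earlier: a named constant has to get its value at declaration, and a structure constructor is how we give it one.

```fortran
PROGRAM constructor_demo
  IMPLICIT NONE

  TYPE atom
    CHARACTER(LEN=30) :: name = ""
    INTEGER :: atomic_number = -1
  END TYPE atom

  ! A constant of derived type, built with a structure constructor
  TYPE(atom), PARAMETER :: hydrogen = atom("Hydrogen", 1)
  TYPE(atom) :: a, b, c

  a = atom("Helium", 2)                      ! positional
  b = atom(atomic_number=3, name="Lithium")  ! keywords, in any order
  c = atom("Beryllium", atomic_number=4)     ! positional first, then keyword

  PRINT '(A, 1X, I0)', TRIM(hydrogen%name), hydrogen%atomic_number
  PRINT '(A, 1X, I0)', TRIM(a%name), a%atomic_number
  PRINT '(A, 1X, I0)', TRIM(b%name), b%atomic_number
  PRINT '(A, 1X, I0)', TRIM(c%name), c%atomic_number
END PROGRAM constructor_demo
```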
We can also create a temporary atom this way, for example to pass to a function. In many senses, this is a literal just like the string "Hello" or the number 2. If we store it into a variable, then that variable now contains the data. Otherwise, we can't modify it, or access its individual components*. Mostly we can use it as the source for setting a variable, or to pass to a function.
NOTE: this atom will exist only for the duration of the line where it is created. For a function call this includes the bit inside the function. If we don't specify intent, we can happily pass this temporary and modify it - but this is probably not what we intended, since it will go away before we can name it. Specifying INTENT(INOUT) or (OUT) will disable passing a temporary, and ensure we can't make this mistake.
*NOTE: once we have passed a temporary to a function, inside the function we have a dummy variable (formal argument) just like normal. If the dummy is "bound" to a temporary, then we can't write to it, as the previous note says. But we can access the components of the dummy regardless of this. So we can do 'myAtom%name' inside a function having passed atom("Iron", 26), but we can't do atom("Iron", 26)%name. The code for this article, same as linked above, makes this clearer.
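The two notes can be sketched as follows. The module name atom_io and the subroutines describe and relabel are hypothetical names of ours, not from the linked code. The commented-out call is the one we expect a compiler to reject, because the interface is explicit and a temporary is not something that can be written to.

```fortran
MODULE atom_io
  IMPLICIT NONE

  TYPE atom
    CHARACTER(LEN=30) :: name = ""
    INTEGER :: atomic_number = -1
  END TYPE atom

CONTAINS

  ! INTENT(IN): happy to take a variable or a temporary.
  ! Inside, the dummy is a normal named thing, so % works on it.
  SUBROUTINE describe(an_atom)
    TYPE(atom), INTENT(IN) :: an_atom
    PRINT '(A, 1X, I0)', TRIM(an_atom%name), an_atom%atomic_number
  END SUBROUTINE describe

  ! INTENT(INOUT): the actual argument must be a real variable
  SUBROUTINE relabel(an_atom, new_name)
    TYPE(atom), INTENT(INOUT) :: an_atom
    CHARACTER(LEN=*), INTENT(IN) :: new_name
    an_atom%name = new_name
  END SUBROUTINE relabel

END MODULE atom_io

PROGRAM temporary_demo
  USE atom_io
  IMPLICIT NONE
  TYPE(atom) :: myAtom

  CALL describe(atom("Iron", 26))   ! temporary, gone after this line

  myAtom = atom("Cobalt", 27)       ! stored, so now it has a name
  CALL relabel(myAtom, "Co")
  CALL describe(myAtom)

  ! CALL relabel(atom("Iron", 26), "Fe")  ! should not compile: a temporary
  !                                       ! cannot be an INOUT argument
END PROGRAM temporary_demo
```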
Well, I Never Knew That: Arrays of types and arrays of components
We often say that arrays are the great strength of Fortran, and that whole array operations, slices etc are a huge part of the reason why. Well, Fortran kindly allows us to retain all of this when we're using types too. For example, the following is perfectly valid code:
TYPE(atom), DIMENSION(10) :: theAtoms
INTEGER, DIMENSION(:), ALLOCATABLE :: theNums
PRINT'(A)', theAtoms%name
theNums = theAtoms%atomic_number
which will PRINT just the names, and then give us an array of just the atomic numbers.
This is a handy trick in any case, but it's also a great thing to know if refactoring code that doesn't use types (or doesn't use them to their capacity yet). We can pretty freely wrap things into types, even if those types are then in arrays, and we can still unwrap the individual components very cheaply. The resulting arrays are "proper" arrays. We can assign them to things, assign to them, pass them to functions, slice them: everything we expect.
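A small runnable sketch of those claims, using three atoms rather than ten so that we can fill them all in (the values and program name are ours). It pulls out a component array, assigns to a slice of one, and hands one to an intrinsic function.

```fortran
PROGRAM array_components
  IMPLICIT NONE

  TYPE atom
    CHARACTER(LEN=30) :: name = ""
    INTEGER :: atomic_number = -1
  END TYPE atom

  TYPE(atom), DIMENSION(3) :: theAtoms
  INTEGER, DIMENSION(:), ALLOCATABLE :: theNums

  theAtoms = [atom("Hydrogen", 1), atom("Helium", 2), atom("Lithium", 3)]

  ! Allocate-on-assignment: theNums becomes a plain integer array of size 3
  theNums = theAtoms%atomic_number

  ! Assign to one component across a slice of the array of types
  theAtoms(2:3)%atomic_number = theAtoms(2:3)%atomic_number + 10

  PRINT '(3(I0, 1X))', theNums                    ! the copy is unchanged
  PRINT '(3(I0, 1X))', theAtoms%atomic_number     ! the originals are updated
  PRINT '(I0)', SUM(theAtoms%atomic_number)       ! pass to a function
END PROGRAM array_components
```

Note that theNums is a copy taken at the time of assignment, so later changes to theAtoms do not show up in it.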
An Example Refactoring
Here's a quick example (see the full code here)
Old:
! Takes 3 arrays for components of position
SUBROUTINE old(x, y, z, dt)
REAL, DIMENSION(N) :: x, y, z
REAL :: dt
New:
TYPE pos
REAL :: x, y, z
END TYPE
! Takes one array of types instead
SUBROUTINE new(p, dt)
TYPE(pos), DIMENSION(N) :: p
REAL :: dt
The body of the function can still use idioms like "x = x + 0.1*dt" by just changing to "p%x = p%x + 0.1*dt" even though p is now an array of types. The function signature is clearer, and it is no longer possible to decouple x, y and z from each other at different indices by accident.
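For a version that compiles and runs, here is a sketch with both routines filled in and checked against each other. The bodies are our guess at a minimal update step built from the "x = x + 0.1*dt" idiom, the module name motion is ours, and we have assumed assumed-shape arrays (DIMENSION(:)) in place of DIMENSION(N) so that the example needs no global N.

```fortran
MODULE motion
  IMPLICIT NONE

  TYPE pos
    REAL :: x, y, z
  END TYPE pos

CONTAINS

  ! Takes 3 arrays for components of position
  SUBROUTINE old(x, y, z, dt)
    REAL, DIMENSION(:), INTENT(INOUT) :: x, y, z
    REAL, INTENT(IN) :: dt
    x = x + 0.1*dt
    y = y + 0.1*dt
    z = z + 0.1*dt
  END SUBROUTINE old

  ! Takes one array of types instead; the body barely changes
  SUBROUTINE new(p, dt)
    TYPE(pos), DIMENSION(:), INTENT(INOUT) :: p
    REAL, INTENT(IN) :: dt
    p%x = p%x + 0.1*dt
    p%y = p%y + 0.1*dt
    p%z = p%z + 0.1*dt
  END SUBROUTINE new

END MODULE motion

PROGRAM refactor_demo
  USE motion
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 4
  REAL, PARAMETER :: dt = 0.5
  REAL, DIMENSION(n) :: x, y, z
  TYPE(pos), DIMENSION(n) :: p
  INTEGER :: i

  ! Same starting data in both layouts
  x = [(REAL(i), i = 1, n)]
  y = 2.0*x
  z = 3.0*x
  p%x = x
  p%y = y
  p%z = z

  CALL old(x, y, z, dt)
  CALL new(p, dt)

  ! A cheap regression check: the refactor must not change the answers
  IF (MAXVAL(ABS(p%x - x)) < 1.0e-6 .AND. &
      MAXVAL(ABS(p%y - y)) < 1.0e-6 .AND. &
      MAXVAL(ABS(p%z - z)) < 1.0e-6) THEN
    PRINT '(A)', "old and new agree"
  ELSE
    ERROR STOP "old and new disagree"
  END IF
END PROGRAM refactor_demo
```

Keeping the old routine around for a while and comparing outputs like this is one low-risk way to do the switch.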
Take Away Point
Types are possibly the best way to make your Fortran code better and more modern for extremely little effort. They make function signatures clearer, let the compiler enforce type checking better, and reduce the "mental" load of having a bunch of connected variables which you have to remember the relationships between. Use them in new code, and consider the simple refactoring above with existing code. It won't be wasted effort even if a rewrite is in the future - having the data structures already figured out is a great bonus.
Fortran Refactoring
Fortran into the Future
This blog is about software engineering in academic contexts. We see a lot of "styles"* of code. Improving these, through refactoring (re-writing to have the same function, but an improved form), training academics who write and maintain them, and offering libraries and code snippets that address common function well, is a big part of our role.
*styles is in scare-quotes for a reason. Sometimes style is just an excuse for doing it wrong.
In STEM disciplines (at least the first 3 letters of it), Fortran code is pretty common, and some of it is the opposite of pretty. Modern approaches like modularity of design, well-defined interfaces, use of types and Object Oriented features, are often mysterious and underused.
There are two choices on how to handle these old* (see below) codes.
*old: synonyms 'ancient', 'decrepit', 'mature', 'venerable', 'familiar', 'long-lived'. Take your pick. Sometimes something is old because nobody bothered to renew it, sometimes it is old because it works.
Choice 1) Abandon them. Rewrite them in a cool new language (Rust! Julia!). Get rid of the cruft and the dust and make something new and better.
Choice 2) Refactor them. Take advantage of the experience that has gone into them and make them better slowly but surely.
These posts plan to talk about the second approach. Sometimes small improvements can greatly increase robustness, utility and our quality-of-life when we have to deal with these codes, without introducing new bugs and regressions or re-inventing the wheel.
Refactoring is an idea few seem to bother with, and even those who want to do it struggle to find any time or funding for the work!
So let's discuss some of the nice features of Fortran, with half-an-eye on the less well-known bits, especially those which help us to 'patch-in' the new feature cheaply. These kinds of refactors can reward the effort many times over, without the time and risk of a full rewrite.
A List (hopefully a growing one)
Links TBA once the corresponding post is ready
Types and Component Selection - use the type system for clarity and robustness
Common blocks - strategies for getting rid of them
Explicit Interfaces - a step towards modularity