All 6 entries tagged Fortran

View all 9 entries tagged Fortran on Warwick Blogs | View entries tagged Fortran at Technorati | There are no images tagged Fortran on this blog

December 03, 2024

Fortran Types and Component Selection

Derived Types in Fortran

What and Why

Pretty much all languages have support for user-defined data types in some way or another. These can be as simple as a set of data items (called elements, components, properties, members and probably other things too) collected together under a single name, up to fully functioning classes (objects) which can define operators (like + and -), functions (like printing and even the call operator), and act in every way like a "built-in" data type.

Collections of data, without any concept of what the data can do, are often referred to as "POD types" or plain-old-data types, especially in C++ where this distinguishes them from classes like we just described. Note that C++20 has refined this specification and now uses some new terminology with extra caveats, but POD is a widely used term pre-20. Fortran and C++ both only really make the distinction between such 'trivial' types and ones with methods etc in places where it matters. For instance, struct and class in C++ are almost the same, differing only in whether members are accessible or not by default - this means using a struct for a 'data' type makes it clearer that members are intended be accessed directly, but doesn't really change anything.

In Fortran, the user-defined 'TYPE' construct (also called a derived type) can be a POD type, or can have functions (aka procedures or methods) attached to it. They can also be 'polymorphic' - loosely though of as substitutable based on ability rather than identity. True object oriented programming is therefore possible in Fortran since at least 2003 (opinions vary on what a language needs to support innately, versus what can be "hacked in" to be considered a true OO contender). Fortran types support 'access control' with the 'public' and 'private' keywords, but we wont cover that here.

All of the code snippets, and more are available here.

Defining and Using a Type

There's two elements to using your own types, namely defining one (saying what it looks like) and creating one (making a variable of that type). This is best done by example, so let's use a variation on the usual exemplar and do an 'atom' type.

Defining your type looks like below. 'atom' is going to be the name for the type, and it's going to contain a 'name' and an 'atomic number'. The code is

TYPE atom ! The name for the type
CHARACTER(LEN=30) :: name ! a field called name, of type string
INTEGER :: atomic_number ! an integer field for atomic number
END TYPE atom ! Optional to repeat the name here

and it can go in the same places as a variable declaration, e.g. in a MODULE, or at the top of your main program, as long as it's before any attempt to use it.

Now we have the "idea" of an atom, a thing consisting of a name and an atomic number. How do we use this? Well we can create a variable of the 'atom' type like this:

TYPE(atom) :: myAtom ! A variable of type atom

which we often describe as creating an "instance".

We can also create a function which expects something of type atom. This is powerful for two big reasons and plenty of smaller ones.

  • The first big one: the code now documents itself far more clearly. Rather than taking two, possibly unrelated, data items, we see it takes an atom.
  • The second big one: our function interface and call-lines get shorter and more compact, taking a small number of custom types, rather than a long list of generic ones.
  • A third one (with many caveats in reality): we can extend the idea of an atom without changing all our functions signatures; although often doing this is a sign of problems.

So here's a function which takes two atoms and returns a third. Imagine we're some sort of weird alchemists...

FUNCTION transmute(first_atom, second_atom) RESULT(combined_atom)
TYPE(atom) :: first_atom, second_atom, combined_atom
...

Derived or user defined types are used here exactly like any other data type - specify INTENTs as usual, PARAMETER (remembering that we then need a way to give values - see below).

Type Components

So we can create a type, but that's not a lot of use without being able to get and set its elements. Like in most languages, what we need is an instance, and then we can refer to its contents. We do this like in this code block:

TYPE(atom) :: myAtom
myAtom%name = "Hydrogen"
myAtom%atomic_number = 1
PRINT*, myAtom

NOTE: the printing is pretty dumb - it just outputs the items in order with default format. We can do better, but that is a topic for a potential future post, not now.

Aside - one reason why it's not a dot

By the way, a lot of people really dislike the choice of '%' for component selection. I do not know for sure if what follows is defining reason, or just a part of why it's not so easy to fix, but remember that Fortran comes from a time when input characters might have been restricted and it was not certain that somebody had a '=' key. In C the solution was digraphs and trigraphs- 2 or 3 characters that were parsed as one. Think '<=' but potentially far more esoteric.

In Fortran, the decision was made to use the form ".OP." instead, such as .LT. .LE. etc. This was also used for things like .TRUE. and the logical operators .AND. etc. Fortran is also very generous about white-space, caring only when it is needed to divide identifiers.

Now consider this bit of code, if we could use '.' to get type components:

first.lt.other

Is this:

  • a less-than comparison between a variable called 'first' and one called 'other'
  • a nested type object property access, where we have a variable 'first' with property 'lt' which in turn has property 'other'.

Oh dear... If we can't tell, then we have a problem! Some compilers are willing to risk the ambiguity, but some are not, so do not offer the alternative symbol.

Default Values

We can give each element of our type a default value if we want. We do that inline in the definition, such as

TYPE atom ! The name for the type
CHARACTER(LEN=30) :: name = "" ! a field called name, of type string
INTEGER :: atomic_number = -1 ! an integer field for atomic number
END TYPE atom

NOTE: if you're worried about 'SAVE' behaviour, types don't have any lurking surprises here. All 'SAVE' would mean is that the type's properties retain their values for the life of the variable which is... exactly as we'd hope it would be. No surprises waiting for us here.

Structure Constructors

Not just a tongue-twister, I promise: just another less-known feature of Fortran types. For the atom type above, we can also create an instance with values set at construction, like this:

myAtom = atom("Hydrogen", 1)

or even using keywords, such as

myAtom = atom(name = "Helium", atomic_number=2)

which we think is actually pretty damn informative! As with function calls, keywords can be in any order and we can mix the two styles, as long as all the positional ones come first.

We can also create a temporary atom this way, for example to pass to a function. In many senses, this is a literal just like the string "Hello" or the number 2. If we store it into a variable, then that variable now contains the data. Otherwise, we can't modify it, or access it's individual components*. Mostly we can use it as the source for setting a variable, or to pass to a function.

NOTE: this atom will exist only for the duration of the line where it is created. For a function call this includes the bit inside the function. IF we don't specify intent, we can happily pass this temporary and modify it - but this is probably not what we intended, since it will go away before we can name it. Specifying INTENT(INOUT) or (OUT) will disable passing a temporary, and ensure we can't make this mistake.

*NOTE: once we have passed a temporary to a function, inside the function we have a dummy variable (formal argument) just like normal. If the dummy is "bound" to a temporary, then we can't write to it, as the previous note says. But we can access the components of the dummy regardless of this. So we can do 'myAtom%name' inside a function having passed "atom("Iron", 27), but we can't do atom("Iron", 27)%name. The code for this article, same as linked above, makes this clearer

Well, I Never Knew That: Arrays of types and arrays of components

We often say that arrays are the great strength of Fortran, and that whole array operations, slices etc are a huge part of the reason why. Well, Fortran kindly allows us to retain all of this when we're using types too. For example, the following is perfectly valid code:

TYPE(atom), DIMENSION(10) :: theAtoms
INTEGER, DIMENSION(:), ALLOCATABLE :: theNums
PRINT'(A)', theAtoms%name
theNums = theAtoms%atomic_number

which will PRINT just the names, and then give us an array of just the atomic numbers.

This is a handy trick in any case, but it's also a great thing to know if refactoring code that doesn't use types (or doesn't use them to their capacity yet). We can pretty freely wrap things into types, even if those types are then in arrays, and we can still unwrap the individual components very cheaply. The resulting arrays are "proper" arrays. We can assign them to things, assign to them, pass them to functions, slice them: everything we expect.

An Example Refactoring

Here's a quick example (see the full code here)

Old: ! Takes 3 arrays for components of position
SUBROUTINE old(x, y, z, dt)
REAL, DIMENSION(N) :: x, y, z
REAL :: dt
New:

TYPE pos
REAL :: x, y, z
END TYPE

! Takes one array of types instead
SUBROUTINE new(p, dt)
TYPE(pos), DIMENSION(N) :: p
REAL :: dt

The body of the function can still use idiom like "x = x + 0.1*dt" by just changing to "p%x = p%x + 0.1*dt" even though p is now an array of types. The function signature is clearer, and it is no longer possible to decouple x, y and z from each other at different indices by accident.

Take Away Point

Types are possibly the best way to make your Fortran code better and more modern for extremely little effort. They make function signatures clearer, let the compiler enforce type checking better, and reduce the "mental" load of having a bunch of connected variables which you have to remember the relationships between. Use them in new code, and consider the simple refactoring above with existing code. It wont be wasted effort even if a rewrite is in the future - having the data structures already figured out is a great bonus.


Fortran Refactoring

Fortran into the Future

This blog is about software engineering in academic contexts. We see a lot of "styles"* of code. Improving these, through refactoring (re-writing to have the same function, but an improved form), training academics who write and maintain them, and offering libraries and code snippets that address common function well, is a big part of our role.

*styles is in scare-quotes for a reason. Sometimes style is just an excuse for doing it wrong.

In STEM disciplines (at least the first 3 letters of it), Fortran code is pretty common, and some of it is the opposite of pretty. Modern approaches like modularity of design, well-defined interfaces, use of types and Object Oriented features, are often mysterious and underused.

There's two choices on how to handle these old* (see below) codes.

*old: synonyms 'ancient', 'decrepit', 'mature', 'venerable', 'familiar', 'long-lived'. Take your pick. Sometimes something is old because nobody bothered to renew it, sometimes it is old because it works.

Choice 1) Abandon them. Rewrite them in a cool new language (Rust! Julia!). Get rid of the cruft and the dust and make something new and better.

Choice 2) Refactor them. Take advantage of the experience that has gone into them and make them better slowly but surely.

These posts plan to talk about the second approach. Sometimes small improvements can greatly increase robustness, utility and our quality-of-life when we have to deal with these codes, without introducing new bugs and regressions or re-inventing the wheel.

Refactoring is an idea few seem to bother with, and even those who want to do it struggle to find any time or funding for the work!

So let's discuss some of the nice features of Fortran, with half-an-eye on the less well know bits, especially those which help us to 'patch-in' the new feature cheaply. These kinds of refactors can reward the effort many times over, without the time and risk of a full rewrite.

A List (hopefully a growing one)

Links TBA once the corresponding post is ready

Types and Component Selection - use the type system for clarity and robustness

Common blocks - strategies for getting rid

Explicit Interfaces - a step towards modularity


March 20, 2024

It's sort of pass by reference, ish

When teaching Fortran, it's often tempting to describe it as a "pass by reference" language. This is almost true. Unfortunately as the phrase has it, "almost only counts in horseshoes and handgrenades" and "almost true" in programming terms isn't usually good enough. All too often, the sort of complicated technical thing that should only matter to an expert trips up plenty of beginners who, like the Inexperienced Swordsman "doesn't do the thing he ought to do"

So what's the truth? The truth is that most of the time things work just like pass-by-reference because they are expected to look just as if they were. But actually, they are allowed to do a "copy-in copy-back" process, as long as the final result has the expected changes. Copies can be expensive, so compilers don't tend to actually do this without a good reason.

The place this usually comes to our notice is when we use slices of arrays. For example, we can take every second element of an array in Fortran, very easily. Or we can take half of a 2-dimensional array (in the row direction). Both of these are valid "array slices" but both consist of data that is not contiguous in memory. Items do not follow each other one after the other in memory.

But in Fortran, we can pass these to some function that expects just "an array". Now imagine what would have to happen to do a true "pass by reference" here. The compiler would have to pass the array, and then enough information to "slice out" only the values we wanted. This would be pretty weird, and if we passed that value on to another function also expecting an array, it could easily get out of control. So instead, the compiler will tend to generate the code to copy the values in the slice into a temporary array, use them there, and copy them back when we are done. To us, everything will work seamlessly.

That is, as long as we do what we're supposed to, everything works. But if we start breaking rules, we can get some very odd behaviour from this. Compilers are like puppies - they trust us to keep our promises. In fact, they rely on this so much that they simply assume that we will! When we don't funny things happen.

The following code is modified from this old post https://groups.google.com/g/comp.lang.fortran/c/z11RW0ezojE?pli=1 to use modern Fortran.

MODULE mymod

CONTAINS

SUBROUTINE set_to_two(B, C)
REAL, DIMENSION(10) :: B
REAL, DIMENSION(5) :: C
INTEGER :: i
DO I = 1, 10
B(I) = 2.0
ENDDO
END

END MODULE

PROGRAM main
USE mymod
REAL, DIMENSION(10) :: A

A = 1.0
CALL set_to_two(A, A(1:10:2))
PRINT *, A

END

Pretty simple code - it takes an array and sets every value to 2.0. For some reason it also takes a second unused irrelevant parameter, but we can ignore that one, surely?

Run that code, as written with current gfortran, and this is the result:

1.00000000  2.00000000  1.00000000  2.00000000  1.00000000  2.00000000  1.00000000  2.00000000  1.00000000  2.00000000  

The code is bad - passing A as two different parameters to a function violates a promise we make as programmers not to alias variables (call them by two different names). But even knowing this we're pretty surprised to get that result! In particular, on my machine, if I call the function with CALL set_to_two(A, A(2:6)) which violates the aliasing rules just as badly, nothing odd happens at all, and all I get is A = 2.0. In this case, the compiler is able to avoid a potentially costly copy as the data is contiguous even though it's a slice.

It's pretty obvious what is actually happening once we know about the copy-back idea. Because the compiler trusted us not to have two names for the same piece of data (the names here being B and C, and the data being A) it happily copies data for the C argument, copying it back at the end. This copy is never affected by the update to B so its content remains 1.0. That gets copied back into A, overwriting the changes we'd made.

This can happen even though we never use C in the function, so nothing actually changes - that second irrelevant argument is not so irrelevant after all.

Take Home Point

The real take-home point here is not to upset your compiler - don't do things wrong and these sorts of details wont matter. But when things do go wrong, it can be pretty helpful to understand what is actually going on.

Honestly a lot of the bugs we write as programmers are a case of miscommunication - what we thought we were writing is not what we actually have said. This is expecially true with modern optimising compilers, which very liberally agree to produce a program that works "as if" it was the one we wrote, and will happily omit things that do not affect the results, or, as in this case, will assume that we are writing correct code and act accordingly.

Fortran is usually a lot nicer to us than C/C++ in terms of undefined behaviour, but as this example shows, it will still do strange things if we break the rules.


September 10, 2019

New snippets series: WINKT

Super short blog post to start a new snippets series. Along with our SOUP series (Snippet of the <Undefined Period>) we're trying out a new way of posting. WINKT (Well I Never Knew That) is a place for all those things that we stumble across that don't warrant a complete post, but are interesting (if you're a giant geek, anyway).


Fortran Variable Declarations

To start the series off, something I never thought about in Fortran: when you declare a variable with multiple attributes, what is the Left-Hand side comma-separated list actually doing? I was reading a Doctor Fortran post about the potential pitfalls of blindly applying every attribute you might wantand saw his final example of applying (here erroneously) the DIMENSION keyword to a variable in a separate line. "Wait, " I said, "can you do that?" I wrote a quick example, compiled it (with strict standards adherence) and proclaimed "Well I Never Knew That!"

What am I talking about here? Well, consider creating an array of integers. You'll usually write something like:

INTEGER, DIMENSION(:, :), ALLOCATABLE :: I

for a 2-D array. As it turns out, you could equivalently write this as

INTEGER :: I
DIMENSION :: I(:, :)
 ALLOCATABLE :: I

adding the attributes across several lines.

The comma-separated lists of attributes on the left of the double-colon are effectively "stacking" the attributes, but each attribute can also be applied separately.

In that example, it's clearly a bit silly and unwieldy, but it's something you might see "in the wild" and perhaps be confused by, so worth knowing. In some examples, it might actually help make things clear, with, for example, attributes such as SAVE or INTENTs, which can sometimes get "lost in the noise" of declarations. So rather than

INTEGER, SAVE :: I
 INTEGER :: j, k

I could write

INTEGER :: i, j, k
 SAVE :: i

This might look clearer, especially if I have more stacked attributes.

This is probably not something I will ever use, and I am not sure I would recommend it, since it looks a bit unexpected, and generally code should avoid unexpectedness. But it did show up to me just how the attribute lists must be working, and was an interesting ten minutes.


August 21, 2019

Fortran Memory Management II

Follow-up to Fortran Memory Management from Research Software Engineering at Warwick

This month we're going to cover the question of what to watch out for with using ALLOCATABLEs in Fortran.

Array bounds in functions

Fortran arrays have the very nice property that their indices don't have to run from any specific value to any specific value. So if you want an array that runs from -3 to 103 that's fine. You can allocate it as

  INTEGER, DIMENSION(:), ALLOCATABLE :: array
  ALLOCATE(array(-3:103))

This maps quite neatly maps onto lots of scientific use cases so you quite often see arrays with explicit upper and lower bounds in real world Fortran codes. You can check the upper and lower bounds easily enough using the UBOUND and LBOUND functions

  INTEGER, DIMENSION(:), ALLOCATABLE :: array
  ALLOCATE(array(-3:103))
  PRINT *, LBOUND(array), UBOUND(array)

Produces the output "-3 103" as you'd expect. But there is a wrinkle. What if you move those calls to LBOUND and UBOUND into a function?

MODULE mdl
  IMPLICIT NONE
  CONTAINS
  SUBROUTINE print_array(array)
    INTEGER, DIMENSION(:), INTENT(IN) :: array
    PRINT *, LBOUND(array), UBOUND(array)
  END SUBROUTINE print_array
END MODULE mdl

PROGRAM p1
  USE mdl
  IMPLICIT NONE

  INTEGER, DIMENSION(:), ALLOCATABLE :: array
  ALLOCATE(array(-3:103))
  CALL print_array(array)
END PROGRAM p1

The result now is "1 107". The same number of elements but the lower bound has been moved back to the Fortran default of 1. This is the default behaviour of Fortran when you pass an array into a function; the lower bound is reset to 1. You can override this behaviour to specify the lower bound of the array in the function (INTEGER, DIMENSION(-3:), INTENT(IN) :: array will specify that the lower bound is -3), you can even pass in a parameter to the function to specify what the lower bound is (simply pass in an integer parameter and use it in the DIMENSION option in the same way that I used -3 before) but you can't just use the lower bound that the array was given when it was created. However, if you flag the array argument to the function as either ALLOCATABLE or POINTER things are different.

MODULE mdl
  IMPLICIT NONE
  CONTAINS
  SUBROUTINE print_array(array)
    INTEGER, DIMENSION(:), ALLOCATABLE, INTENT(IN) :: array
    PRINT *, LBOUND(array), UBOUND(array)
  END SUBROUTINE print_array
END MODULE mdl

PROGRAM p1
  USE mdl
  IMPLICIT NONE

  INTEGER, DIMENSION(:), ALLOCATABLE :: array
  ALLOCATE(array(-3:103))
  CALL print_array(array)
END PROGRAM p1

This version of the code looks almost identical but now reports the lower and upper bounds as -3 and 103. In fact, you now can't override the bounds of your array even if you wanted to (if you try putting the lower bound in the DIMENSION part of the parameter definition in the function the code will fail to compile). The main downside is that you now can only pass ALLOCATABLE arrays to the function because the main purpose of applying the ALLOCATABLE option to the parameter is to allow you to allocate and deallocate the array inside the function. The POINTER attribute works in much the same way but only for POINTER arrays

Mostly this sort of thing isn't too much of a problem but you do have to be careful. If you have a function that takes a normal non-ALLOCATABLE parameter then it's tempting to simply recognize that the array lower bound will start from 1 inside the function and write the function accordingly. The problem is that if you ever then have cause to add the ALLOCATABLE or POINTER attribute to the function then you'll have to completely rewrite the function because suddenly the lower bound is no longer under your control. It's generally a good idea to always specify the lower bound of an array argument to a function, either through a fixed value if it's always the same for all arrays that the function will be used on or by passing in the lower bound as a parameter to the function.


Automatic Reallocation

The fact that Fortran allows you to do whole array operations is another feature that makes it well suited to scientific programming but there are some features that can be mixed blessings. One feature that was added to Fortran 2003 is that if you do a whole array assignment to an allocatable array that is either not allocated or is allocated to be a different size than the source array then the array will be reallocated to match the size of the source. To give an example

PROGRAM p1
  IMPLICIT NONE

  INTEGER, DIMENSION(:), ALLOCATABLE :: array
  INTEGER, DIMENSION(-3:103) :: src
  array = src
  PRINT *, LBOUND(array), UBOUND(array)
END PROGRAM p1

This program always feels like it's invalid but from Fortran 2003 onwards it is completely valid and will give you lower bound of -3 and upper bound of 103 because "array" is automatically allocated when "src" is assigned to it. In this example "array" is not allocated before I used it but if it was then it would have been silently deallocated and reallocated to the new size. This is quite useful for many purposes, it allows you to return an array from a function and just store it in an allocatable array and have everything magically work, and sometimes you do want to do literally what I'm doing in this example and make a copy of an array. Why would this ever be a disadvantage? Because it can make debugging much harder by moving the place where the bug occurs. Imagine the following situation

PROGRAM p1
  IMPLICIT NONE

  INTEGER, DIMENSION(:), ALLOCATABLE :: array1, array2
  INTEGER :: i

  !The incorrect allocation of array2 is a typo
  ALLOCATE(array1(-3:103), array2(-2:103))
  array2 = 1

  array1 = 5 * array2
  DO i = -2, 103
    array1(i) = array(i) - array(i-1)
  END DO

END PROGRAM p1

This program does nothing even remotely useful but it has the same structure as a real program that I had a problem with. There was an error in the size of an array that was then used in an array assignment. The assignment caused a reallocation of "array1" then then meant that the later loop was iterating over more items than the array now had so it crashed during the operation of that loop. Specifically the first iteration of the loop is trying to access the -3 element that now no longer exists. The loop in fact was perfectly well written for how the code should have been working but due to the error in the allocation of array2 the code was now crashing there. Without the implicit reallocation the error could have much more easily been traced to the array assignment (that in the real code was much further away from the crash site than in this simple example). There are a surprising number of ways of tripping this behaviour and causing errors in unrelated parts of your code because of this behaviour so if you get very unexpected array behaviour you should watch out for this one.

A question that then quite often comes up is if you can suppress this behaviour if you don't want it, and you definitely can. Most compilers have an option to disable the behaviour entirely (-fno-realloc-lhs in gfortran for example) and equally assigning to an array sectiondoesn't trigger the behaviour so

PROGRAM p1
  IMPLICIT NONE

  INTEGER, DIMENSION(:), ALLOCATABLE :: array
  INTEGER, DIMENSION(-3:103) :: src
  array(:) = src
  PRINT *, LBOUND(array), UBOUND(array)
END PROGRAM p1

will crash because you are assigning to an unallocated array. Some people say that from F2003 onwards you should probably always do array assignment to an array section if you don't want to invoke the automatic reallocation behaviour. I wouldn't go quite that far but you should definitely consider the question of if any odd bugs that you have are related to the automatic reallocation behaviour


August 07, 2019

Fortran Memory Management

Memory Managment

Manually Managing

Memory management is one of the banes of the programmer's life in almost all programming languages. In many languages, such as C, you have to manually pair up all allocations of memory with associated deallocations or you will "leak" memory as your program runs. (Strictly a "memory leak" is when you create memory that you then lose track of in some way, so that you can't then deallocate it even if you want to. From the perspective of your program crashing when you run out of memory there's no difference between a true leak and just piling up unused but technically available memory somewhere so I tend to use the term a bit loosely).

Collecting the Garbage

Other languages, like Java, use mechanisms for counting how many times you use an item and when the last reference to an item is removed then the item will be deleted by something that is generally called a "garbage collector". The problem with these languages is that the garbage collector runs only periodically, not every time an item has its last reference released, so memory use can increase due to items that will be deleted when the garbage collector next runs but haven't been deleted yet.

Destructive Magic

Still other languages like C++ use objects that have systems of constructor and destructor functions that create memory when an object is created and release the memory when the object is deleted. This doesn't immediately sound like it's very helpful because surely that's still just a different way of saying that you have to have matched allocation and deallocation logic? The advantage comes from the fact that simple variables (like integers, floats etc.) don't have these memory management problems, the compiler automatically knows the lifetimes of these variables: they're either global variables the exist for the entire time that the code runs or they are local to a function and exist only when you are in that function (or in other functions called from that function etc.).

So long as you are dealing with a single instance of a C++ class and not a pointer to one or more of them then the compiler has the same level of lifetime guarantee that it has with the simple variables, and so long as the class knows how to allocate its memory when it is created (constructor function) and deallocate its memory when the compiler decides that it is finished with (destructor function) you don't have to worry about matching every single creation of the object with a matching destruction, you simply create them when you want them and let the compiler get rid of them when you are finished with them. To be strictly correct, modern C++ actually recommends against a developer doing memory management at all and recommends using STL containers to hold your data (which do the memory management themselves internally and have correctly implemented constructors and destructors). It really is a good idea to do this but scientific and research codes quite often find the odd edge cases where manually allocating and deallocating memory is the easiest solution.

Getting it for Free (Fortran Rules, OK?)

Since mostly in academic programming we tend to be working with arrays, C++ style objects are a fairly heavyweight way of dealing with problems of allocating and deallocating memory. It feels like it should be possible to have the same advantage of simply allocating an array when you need one but keep the advantage of allowing the compiler to automatically infer the lifetime of the variable and deallocate it when the lifetime is over without needing to go to a fully garbage collected model. In C that isn't really possible because arrays are mostly just pointers and the compiler can't be sure that a pointer is the only pointer to a block of memory. Programming would be impossible if your memory was deallocated as soon as any pointer to that memory went out of scope! But in more restrictive languages it is possible and fortunately Fortran is one of the languages that does have an option to do exactly that, and it's probably a feature of the language that you are already familiar with : the humble ALLOCATABLE array.


Allocatable Arrays in Fortran

Fortran allocatable arrays are very easy to use and anyone working in modern Fortran is probably familiar with them, but their properties are often not so well understood. For example to someone with a C background it feels as though this function should leak memory badly.

  SUBROUTINE alloc_fn(els)
    INTEGER, INTENT(IN) :: els
    INTEGER, DIMENSION(:), ALLOCATABLE :: array

    ALLOCATE(array(els))
  END SUBROUTINE alloc_fn

but it actually doesn't leak any memory at all (although it equally doesn't do anything useful here). It also doesn't crash because you are trying to reallocate an already allocated array. The Fortran standard requires that when an allocatable array goes out of scope it should be deallocated, so that function will run allocate the array as requested and then deallocate it again as soon as the function is over. One question that you might then ask is "What about if I did want the array to stick around?", and as usual in Fortran that is possible by using the SAVE attribute to the array

  SUBROUTINE alloc_fn(els)
    INTEGER, INTENT(IN) :: els
    INTEGER, DIMENSION(:), ALLOCATABLE, SAVE :: array

    IF (.NOT. ALLOCATED(array)) ALLOCATE(array(els))
  END SUBROUTINE alloc_fn

You'll notice that this time I've had to test if the array is allocated because otherwise I would wind up trying to allocate it twice and that will cause a runtime error. That is actually another nice feature of Fortran allocatable arrays that protects you from getting a common way of getting a memory leak in C - you cannot allocate an already allocated allocatable array. You can deallocate memory using a DEALLOCATE(array) statement and this can be useful if you want to explicitly manage the lifetime of your memory, for example if you have large intermediate arrays in a calculation that you don't want to have hanging around while you allocate other intermediate arrays. Many style guides do recommend manually deallocating memory on leaving a function, but that's mainly just a combination of caution and working round (mostly very old) broken compilers that don't comply with the standard.

Simple enough so far, but are there any pitfalls? Yes, but they tend to be a bit subtle.

Show your Intentions

Since Fortran 2003 an array argument to a function can have the "ALLOCATABLE" property, and that means that you can allocate and deallocate the array inside the function, for example

SUBROUTINE alloc_fn(els, array)
    INTEGER, INTENT(IN) :: els
    INTEGER, DIMENSION(:), ALLOCATABLE, INTENT(INOUT) :: array

    IF (ALLOCATED(array)) THEN
      PRINT *, "Deallocating"
      DEALLOCATE(array)
    END IF
    ALLOCATE(array(els))
  END SUBROUTINE alloc_fn

Nothing terribly surprising there. I call my function the first time on an unallocated array and it silently allocates, and if I call it again then it prints "Deallocating" and reallocates the array to the new size.

But what if I switch the INTENT argument of the array from "INOUT" (meaning that I can both read and write data to the array) to "IN" (meaning that I can read data from the array but not make changes). Happily as you might expect the compiler refuses to compile this code because it involves making changes to the array and I've specified that I can only read data from it.

But what about if I switch the intent to "OUT" (meaning that I can put data in the array but not read data from it)? You would probably expect this to work because I'm not using data from the array, but on second thoughts you might expect it to fail because I am testing the allocation status of the array. Well, if you try it it compiles, it runs and it allocates the array as expected. The strange thing is that the "Deallocating" print statement never triggers, and this is exactly how Fortran reads the "INTENT(OUT)" statement for allocatable arrays. Since INTENT(OUT) is supposed to mean that you don't take any information from this variable you must be assuming that it is not allocated when it enters the function SO IT DEALLOCATES IT IF IT IS ALLOCATED! This is useful but you have to be very careful! The same behaviour happens for types that contain allocatable components so watch out!

Coming Next - Pitfalls of Allocatables

There are more things to watch out for with Fortran allocatables, including the behaviour of array bounds when they are passed into functions in different ways, automatic reallocation of variables during array assignment (that sounds good but can cause absolute chaos!), the behaviour of types containing allocatable components and a few similar bits, but those will have to wait for part 2 of this post.


December 2024

Mo Tu We Th Fr Sa Su
Nov |  Today  |
                  1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31               

Search this blog

Tags

Galleries

Blog archive

Loading…
RSS2.0 Atom
Not signed in
Sign in

Powered by BlogBuilder
© MMXXIV