November 27, 2024
How does MPI parallel code actually run?
One of those things I wondered about for a long time, before just getting on and finding out, is this:
when I run an MPI parallel code, what actually happens?
Before forging ahead to answer my question, I should clarify why I want to know: basically, so I can understand what happens in error cases and edge cases, such as starting a program without mpiexec, or mpiexec-ing a non-MPI program.
I know that MPI lets me run multiple interacting instances of my program, and lets those communicate with each other. But I also know that I can start my program without using mpiexec (or mpirun, srun or any other invocation) and it might work as a serial code - but it doesn't always. I know that MPI_Init is really important, but I don't know what a program can and can't do before that line, or how a completely empty program would behave. I don't understand what an MPI program without any comms in it would actually do. I am not certain whether any bits of my program are somehow shared - data or state or communicators.
As usual, I could answer all of these questions individually, but there's a good chance I can answer them all if I can just work out what's missing in my mental model. It turns out this is what I hadn't realised:
mpiexec starts N independent copies of my program. When my code reaches the MPI_Init function, communication is established (using some info provided by the launcher) - the copies are made aware of each other and assigned their ranks. MPI_Finalize is where the comms is shut down. *
Obvious in retrospect, but it answers all of my questions.
- Starting a single instance of my code (a serial version) will work as long as my algorithm _can_ work on a single processor, without deadlocks etc. But starting N independent copies by hand, without the launcher, won't make for a parallel run, because the information (or daemons etc.) that MPI_Init needs won't be present - it won't know about the other copies, or even how many copies there are.
- Before the MPI_Init line, my code can do anything that doesn't use communication - essentially no MPI calls (the standard permits only a handful of query functions, such as MPI_Initialized, this early), no use of communicators etc. That means I can't know how many processes I am running with, what my rank is, whether I am the root (rank 0), or anything like that.
- A completely empty program, or one where I never call MPI_Init, will run as N independent copies that never know about each other, just like the parts of my program before the Init. This also tells me what happens if I mpiexec a completely non-MPI program.
- A program that calls MPI_Init but has no actual comms can still be a parallel program - if I can split up my work with nothing other than MPI_Comm_size or MPI_Comm_rank, for instance dividing my work into N blocks, I can do that work in parallel (as long as I am careful about outputting the final product of my work blocks).
- The one thing I can't definitively answer using this is whether I can mpiexec a program that wasn't compiled as an MPI program. But I can guess, based on the fact that a program without MPI_Init can be valid, that I would probably get N independent programs, and I'd be right, as it happens.
- Finally, I can easily see that no program state can possibly be shared, because my program copies are independent, with their own memory spaces. Things like communicators must contain information sufficient for the message passing "layer" to pass information between copies of my program.
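The "dividing my work into N blocks" case from the list above can be sketched in plain C. This is only one possible scheme, and the function name block_range is my own invention for illustration. In a real MPI program, rank and size would come from MPI_Comm_rank and MPI_Comm_size after MPI_Init; here they are ordinary arguments, so the arithmetic can be checked without a launcher.

```c
#include <assert.h>

/* Hypothetical helper: compute the half-open range [start, end) of work
   items owned by `rank` when `n_items` items are split across `size`
   copies. The first (n_items % size) ranks each take one extra item,
   so every item is assigned exactly once. */
static void block_range(int rank, int size, int n_items, int *start, int *end)
{
    assert(size > 0 && rank >= 0 && rank < size && n_items >= 0);

    int base  = n_items / size;   /* items every rank gets             */
    int extra = n_items % size;   /* leftover items, one per low rank  */

    *start = rank * base + (rank < extra ? rank : extra);
    *end   = *start + base + (rank < extra ? 1 : 0);
}
```

With 10 items and 4 copies this gives the ranges [0,3), [3,6), [6,8) and [8,10). Each copy can work out its own block from nothing but its rank and the total count, which is why no messages are needed until the results have to be combined or written out.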
Note that I put a '*' on my statement of what actually happens - this is "correct from the perspective of my program", but a little incomplete in general. The mpiexec launcher can, and generally does, do some elements of setting up comms, but this is lower level than my program and doesn't affect how it behaves or what it can do. I also omitted anything about the compiler step - since I know MPI uses compiler wrapper scripts, something could happen at this stage, which is why I can't completely answer that penultimate question without using some more information.