Benchmarking Parallel Python
Writing about web page http://www.artima.com/weblogs/viewpost.jsp?thread=214303
This post is Bruce Eckel’s follow-up to his previous post, which covered, among other things, concurrency in Python. In short, CPython has the Global Interpreter Lock (GIL), which makes life very awkward for anyone wanting to run Python on more than one processor.
Anyhow, in this post Bruce points to Parallel Python as an add-on module that offers a potential solution. I had a look at it and thought it was pretty cool. However, bearing in mind Guido van Rossum’s post about the performance implications of the last attempt to remove the GIL, I thought I’d benchmark it and see whether it actually does provide a speed-up.
The following stats are for calculating the sum of primes below every multiple of 10000 between 10^5 and 10^6 (including the lower bound and excluding the upper). The first set uses only one worker thread0 of my Core Duo laptop and the second set uses two (as I have two processors).
It should be noted that the code snippet being used is provided as an example on the Parallel Python website, so it is probably close to a best case for the library. Regardless, I think the numbers are helpful.
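I haven’t reproduced the Parallel Python snippet here, but the workload itself (summing the primes below n, for each of the 90 multiples of 10000 in the range described above) can be sketched with the standard library’s multiprocessing module instead of PP. The function names below are my own, not Parallel Python’s:

```python
import multiprocessing


def is_prime(n):
    """Trial-division primality test (adequate at this scale)."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True


def sum_primes(n):
    """Sum of all primes strictly below n."""
    return sum(x for x in range(2, n) if is_prime(x))


if __name__ == "__main__":
    # The 90 job sizes from the benchmark: multiples of 10000
    # from 100000 (inclusive) up to 1000000 (exclusive).
    inputs = range(100_000, 1_000_000, 10_000)

    # For demonstration, run the two-process pool over much smaller
    # inputs; the real benchmark maps sum_primes over `inputs`.
    with multiprocessing.Pool(processes=2) as pool:
        demo = pool.map(sum_primes, [10, 100, 1000])
    print(demo)  # [17, 1060, 76127]
```

Parallel Python’s own example submits each job through its job server rather than a Pool, but the shape of the computation is the same: 90 independent sums farmed out to the workers.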
One Processor
Real Time Taken: 1153.53128409 s
Number of jobs: 90
Total Job Time: 1153.53128409 s
Time/Job: 12.816742 s
Two Processors
Real Time Taken: 601.201694012 s
Number of jobs: 90
Total Job Time: 1180.9738
Time/Job: 13.121931 s
It can be seen that running two worker threads increases the total CPU time used by around 27 seconds (1180.97 s vs 1153.53 s), but because two processors are in use the real time drops, giving a speed-up factor of about 1.92, which is pretty impressive.
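The speed-up and overhead figures follow directly from the raw times above:

```python
# Raw figures from the benchmark runs above.
real_one = 1153.53128409   # real time, one worker (s)
real_two = 601.201694012   # real time, two workers (s)
total_two = 1180.9738      # summed job time, two workers (s)

# Speed-up is the ratio of real (wall-clock) times.
speedup = real_one / real_two

# Overhead is the extra CPU time spent when parallelising.
overhead = total_two - real_one

print(f"speed-up: {speedup:.4f}")          # ~1.9187
print(f"extra CPU time: {overhead:.1f} s")  # ~27.4 s
```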
—
0 I’m not sure of the internals, so I don’t know if it is technically a thread. Regardless, only one calculation will happen at a time.
Dror Levin
Parallel Python uses processes and IPC, not threads, precisely because of the GIL.
This is nothing new; there isn’t even a comparison with threads, which would show that the speed stays about the same.
11 Sep 2007, 17:36
I wasn’t really intending to compare this to threads or other ways of parallelising in Python. I just wanted to look at the numbers for this way of doing it (as Bruce Eckel seemed interested) and thought that other people might be interested in seeing those numbers as well.
Another thing to note is that Parallel Python can distribute jobs to other machines as well as running locally. Obviously that isn’t tested above, but it is a reason to use PP rather than threads if you might want to scale out to a number of machines…
11 Sep 2007, 18:20
I’ve now posted updated benchmarks including a comparison with threading.
11 Sep 2007, 21:49