All 5 entries tagged Free

View all 30 entries tagged Free on Warwick Blogs | View entries tagged Free at Technorati | There are no images tagged Free on this blog

September 14, 2007

Multi–Machine Parallel Python Benchmarks

Follow-up to Benchmarking Parallel Python Against Jython Threading (Benchmarks Take 3) from The Utovsky Bolshevik Show

Having claimed in a previous post that Parallel Python's ability to use the processing power of more than a single machine would work in its favour even when compared to the times for Jython threading, I thought I should probably look at some results to see if this is the case.

As previously, the benchmark being used is to sum all the primes beneath each multiple of 10000 between 100000 and 1000000.  The code examples can be found at http://oddbloke.uwcs.co.uk/parallel_benchmarks/

The Jython script uses Tim Lesher's cookbook recipe for a thread pool.  The Parallel Python script uses a slightly tweaked version of one of the examples on the Parallel Python site.

The two machines over which this is being tested are the University of Warwick Computing Society's servers, Backus and Codd, with Codd being used as the master server.  Both these machines have two CPUs.

The setup for the slave machine really is as easy as:

$ ./ppserver.py -p 35000 -w 2 

Once this was set up, I proceeded to test the Jython and Parallel Python scripts.  Disappointingly, the Jython script used more memory than I have available on my ulimit'ed account when running more than a single thread.  I have approximated based on the previous results I've had.


1 Worker
2 Workers
3 Workers
4 Workers
8 Workers
Jython Threading
289s
~150s
N/A
N/A
N/A
Parallel CPython (1 machine)
660s
352s
353s
351s
N/A
Parallel CPython (2 machines)
N/A
185s
180s
183s
188s


Looking solely at the numbers for Parallel Python, it seems that the speedup gained by using a second machine is significant.  It should be noted that Parallel Python's default regardless of whether or not it had the second machine available was 2 workers, so the automatic detection code is obviously sub-optimal.  It's trivial to override, so this wasn't a problem.

When this is compared to Jython's threading, it doesn't look significant but when we consider Jython's arithmetic ability and the fact that Parallel Python can continue to scale beyond this, Parallel Python begins to look better and better.  It should also be noted that, unsurprisingly, Jython uses a considerable amount more memory than CPython does.

EDIT: As pointed out in the comments, Jesse Noller has also started looking into benchmarking this sort of stuff.


September 11, 2007

Benchmarking Parallel Python Against Jython Threading (Benchmarks Take 3)

Follow-up to Benchmarking Parallel Python Against Threading from The Utovsky Bolshevik Show

Having had it pointed out to me that benchmarking against CPython threading is pointless, I am now going to do what I should have done originally (third time's the charm, right?) and benchmark Parallel CPython against threaded Java, in the hopes I will fail less at producing something useful.

Each of these results is the time it takes to sum the prime numbers below each multiple of 10000 between 100000 and 1000000 (i.e. perform the operation 90 times on numbers increasing by 10000 each time). 

I'm reusing the Parallel Python results from previously.

I decided to use Tim Lesher's cookbook recipe to test threads, as I already have a script which doesn't require a great deal of rewriting to make it Jython (i.e. CPython 2.2 or so) compatible.

Now, the results:


1 Worker
2 Workers
4 Workers
Vanilla CPython
1195s
N/A
N/A
Parallel CPython
1153s
601s
582s
Jython Threads
442s
241s
254s

As can be seen here, Jython threads by far and away beat Parallel CPython.  This does not, however, take into account the fact that Parallel Python can use several machines at once, which Jython threading obviously cannot do.

What's interesting to note is that Parallel CPython on one worker is roughly the same as standard GIL'd CPython (slightly faster, in fact, in this case).  If you need to write and deploy CPython as opposed to Jython, then there's no performance cost in writing parallelisable code to use Parallel Python regardless of end user (as PP, by default, spawns a number of workers equal to the available CPUs).

These statistics were taken on an IBM Thinkpad T60 with a Core Duo T2400 running Ubuntu Feisty GNU/Linux (using the standard packages where available) using the scripts found under http://oddbloke.uwcs.co.uk/parallel_benchmarks/ . 

Hopefully these are useful statistics and conclusions, as opposed to my previous efforts to produce such. :) 


Benchmarking Parallel Python Against Threading

Follow-up to Benchmarking Parallel Python from The Utovsky Bolshevik Show

Having had it pointed out to me that my last benchmarking post is fairly useless without a comparison to threading by a couple of people, I now have such a comparison.  The numbers for PP are those used in the last blog post.

For threads I initially tried using Christopher Arndt's threadpool module to make my life easier.  I've included these results in the table below and, looking at them, you can see why I thought had to find a different way of testing threads.

I decided to use Tim Lesher's cookbook recipe to retest threads.

The function used by all the methods is identical, so this should just be a measure of their performance.

Without further ado, the results: 


1 Worker
2 Workers
4 Workers
Parallel Python
1153s
601s
582s
threadpool
1176s
1246s
1254s
Cookbook Recipe
1175s
1238s
1362s


Obviously these results don't reflect brilliantly on threads.  What I did notice is that it was only Parallel Python that used more than 1 of my processors, which I presume is something GIL related.

Either Parallel Python is an excellent improvement over threads, or I'm doing something stupid regarding threads.  If the latter, please let me know and I'll run the benchmarks again.


Benchmarking Parallel Python

Writing about web page http://www.artima.com/weblogs/viewpost.jsp?thread=214303

This post is Bruce Eckel’s follow-up to his previous post which covered, among other things, concurrency within Python. Basically, CPython has the Global Interpreter Lock (GIL) which makes life very awkward for those wanting to run Python on more than one processor.

Anyhow, in this post Bruce points to Parallel Python as an add-on module which is a potential solution. I had a look at this and thought it was pretty cool. However, bearing in mind Guido van Rossum’s post about the performance implications of removing the GIL last time it was attempted I thought I’d see if this actually did provide a speed-up and benchmark it.

The following stats are for calculating the sum of primes below every multiple of 10000 between 105 and 106 (including the lower bound and excluding the upper). The first set uses only one working thread0 of my Core Duo laptop and the second set uses two (as I have two processors).

It should be noted that the code snippet being used is provided as an example on the Parallel Python website and so is probably one of their most optimal cases. Regardless, I think the numbers are helpful.

One Processor

Real Time Taken: 1153.53128409 s
Number of jobs: 90
Total Job Time: 1153.53128409 s
Time/Job: 12.816742

Two Processors

Real Time Taken: 601.201694012 s
Number of jobs: 90
Total Job Time: 1180.9738
Time/Job: 13.121931

It can be seen that running two worker threads increases the actual CPU time used by around 30 seconds but the fact that two processors are being used leads to a total speed up factor of 1.918709304, which is pretty impressive.

0 I’m not sure of the internals, so I don’t know if it is technically a thread. Regardless, only one calculation will happen at a time.


September 09, 2007

Vim Omnicomplete Awesomeness

I just discovered, through jerbear in #python, the omnicomplete feature in Vim 7. This is something that I’ve been idly in hope of for ages, and to discover it actually exists in Vim already is awesome (hence the title).

omnicomplete searches through any files you’ve imported (including Python library modules) and completes names you might possibly want to use:

To do this requires the rather awkward0 key combination of Ctrl-X, Ctrl-O. After much effort1, I rebound the key combination so Ctrl-Space will work as well. This requires the addition of the line “inoremap <Nul> <C-x><C-o>” to your .vimrc2. This doesn’t work for the graphical Vim, where you will probably want ‘C-space’ instead of ‘Nul’ (though I can’t be sure).

Everyone may already be aware of this, but for those who aren’t, check it out!




0 And strangely Emacs-y.

1 Thanks again to jerbear, as well as \amethyst and Heptite in #vim.

2 Or the use of the command “:inoremap <Nul> <C-x><C-o>” when within Vim.


August 2014

Mo Tu We Th Fr Sa Su
Jul |  Today  |
            1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30 31

Search this blog

Tags

Galleries

Most recent comments

  • I should note that the talk info is available at http://linux.conf.au/programme/detail?TalkID=293 an… by on this entry
  • "I X'd you a Y, but I eated it by Lamby on this entry
  • Nice. Did not know it was that easy, I had a few problems to get it working some time ago. Have you … by mats on this entry
  • You can't make progress if you just argue, I try to be constructive. Cheers! by James on this entry
  • Thanks a bunch for doing this. I think that both Guido and Bruce Eckel are right; we really need peo… by Eli on this entry

Blog archive

Loading…
RSS2.0 Atom
Not signed in
Sign in

Powered by BlogBuilder
© MMXIV