October 16, 2007

Writing functional Java with Google Collections

I’ve been experimenting with Google’s new Collections API, which are a kind of type-safe, stripped-down version of Jakarta Commons Collections, providing you with some (though not all) of the list-processing features that are commonplace in more functional languages – like each, collect, detect and inject in Ruby for instance.

In theory, this should give a big win in terms of reducing code complexity and making for a better-decoupled and more testable design.

In practice, this turns out to be true, but with a somewhat unpleasant side effect. Java’s type-checking and lack of support for closures or code blocks means that when you switch to this style of coding, you end up introducing a lot of new little classes, typically anonymous inner classes for things like predicates and functions, which get used once and then thrown away.

For instance; this code has a cyclomatic complexity of 9, which is just barely acceptable. But it’s fairly readable, if you know what the object model looks like.

 for (Content content: page.getContents().values()) {
            for (ContentFetcher cf: content.getContentFetchers()) {
                if (cf instanceof AbstractFileBackedContentFetcher) {
                    String cfFile = ((AbstractFileBackedContentFetcher) cf).getFileName();
                    File directory = new File(rootDir, cfFile).getParentFile();
                    if (directory != null && directory.exists()) {
                        for (File subfile: directory.listFiles()) {
                            if (subfile.isFile() && !filenames.contains(subfile.getAbsolutePath())) {
                                files.add(subfile);
                                filenames.add(subfile.getAbsolutePath());
                            }
                        }
                    }
                }
            }
        }

listifying it, we get something like this. Here the cyclomatic complexity is about 5 – much better – but we’ve had to introduce two new anonymous inner classes, and there are a lot of awfully long lines of code.

        Predicate<AbstractFileBackedContentFetcher> cfDirectoryExists = new Predicate<AbstractFileBackedContentFetcher>() {
            public boolean apply(AbstractFileBackedContentFetcher cf) {
                File dir = new File(rootDir, cf.getFileName()).getParentFile();
                return dir != null && dir.exists();
            }
        };
        FileFilter okToAddFile = new FileFilter() {
            public boolean accept(File pathname) {
                return pathname.isFile() && !filenames.contains(pathname.getAbsolutePath());
            }
        };

        for (Content content: page.getContents().values()) {

            Iterable<AbstractFileBackedContentFetcher> contentFetchersWithExistingDirs = filter(filter(
                    content.getContentFetchers(), AbstractFileBackedContentFetcher.class), cfDirectoryExists);

            for (AbstractFileBackedContentFetcher cf: contentFetchersWithExistingDirs) {
                File[] subfiles = new File(rootDir, cf.getFileName()).getParentFile().listFiles(okToAddFile);
                for (File subfile: subfiles) {
                    files.add(subfile);
                    filenames.add(subfile.getAbsolutePath());
                }
            }
        }

Is that better? I’m not convinced either way. The second block of code is structurally simpler, but it’s about 50% more code. As a general rule, less code is better code...

I think that if you have frequently-used predicates and transformation functions (the instanceOf predicate which GC uses in Iterables.filter(Iterable,class) for instance), then it’s well worth the effort. But if you’re defining a predicate just to avoid having an if inside a foreach, then it’s less clear. Maybe when java 7 comes along and gives us closures (with some syntactic sugar to keep them concise) things will be better.

A related ugliness here (at least, if you’re used to the Ruby way of doing this) is google’s choice not to extend java.util.Iterable, but rather to have the collections methods defined static on the Iterables class. So instead of writing something like

GoogleIterable pages = new GoogleIterable(getPagesList())
files = pages.filter(WebPage.class).filter(undeletedPages).transform(new PagesToFilesTransformation())

we have to do

files = Iterables.transform(Iterables.filter(Iterables.filter(getPagesList(),WebPage.class),undeletedPages),new PagesToFilesTransformation()))

which to my eye is a lot more confusing – it’s harder to see which parameters match up with which methods.

Update I’ve now written my own IternalIterable class, which wraps an Iterable and provides filter(), transform(), inject(), sort() and find() methods that just delegate to google-collections (except for inject() which is all mine :-) ). It’s not very long or complex, and it’s already tidied up some previously ugly code rather nicely.


- 7 comments by 1 or more people Not publicly viewable

[Skip to the latest comment]
  1. ocean

    I don’t think ‘less code is better code’ is true in any meaningful sense. Good code, like good writing, contains the bare minimum necessary to communicate some message while being maintainable and readable. There is no absolute value to less code. The second version is more readable, it’s much easier to quickly skim the code and get some idea of what the code should do and this is because it’s much more explicit and contains more code (it contains variables like ‘contentFetchersWithExistingDirs’). Though note that you shouldn’t need two filter’s there, it ought to be possible to combine both filters into a single predicate and write code like:

    for (AbstractFileBackedContentFetcher f : filter(content.getContentFetchers(), fileFetcherWithExistingDir)) {
    ....
    }

    That’s a bit more scannable and rolls off the tongue much easier. Anonymous, inner classes are throwaway classes and are meant just for cases like this where there’s a lot to be gained by defining some variable/classes up front. Personally I think this sort of code is both easier to read AND write since it lets you break down the problem a bit.

    And yeah, it really sucks that google didn’t extend java.util.Iterable.

    16 Oct 2007, 12:00

  2. Chris May

    I don’t think ‘less code is better code’ is true in any meaningful sense.

    yeah; I probably should have qualified that. All other things being equal less code is better code. Clearly in this case all other things are not equal :-)

    The two filters are so that I can avoid an instanceof test and a cast in my Predicate. content.getContentFetchers() returns other classes than AbstractFileBackedContentFetcher, so first I use filter(Iterable,type) to get an Iterable of the right type, which I then can predicate-filter type-safely. Whether the double filter is any ‘better’* than a predicate with a cast in, I’m not sure…

    * for a suitable value of ‘good’

    16 Oct 2007, 12:15

  3. Stephan Schmidt

    Am I wrong or does it not look more like:

      import static Iterables.*;
    
      transform( filter( filter(getPagesList(),WebPage.class), undeletedPages), new PagesToFilesTransformation()))

    Which isn’t that ugly.

    Better would have been to write an FluentIterable interface which returns itself, e.g. like here

    http://stephan.reposita.org/archives/2007/10/10/fluent-interface-and-reflection-for-object-building-in-java/

    Then it would be

    filter(getPagesList(), WebPage.class).filter(undeletedPages).transform( new PagesToFilesTransformation );

    The FluentInterable class could be as short as 20 lines of code I guess.

    Peace
    -stephan


    Stephan Schmidt :: stephan@reposita.org
    Reposita Open Source – Monitor your software development
    http://www.reposita.org
    Blog at http://stephan.reposita.org – No signal. No noise.

    16 Oct 2007, 16:30

  4. Chris May

    using static imports certainly improves things, but IMO still suffers from the problem that the parameters are too far away from the methods. Scanning the line above, it’s still very hard to see what the second argument to the first filter call is (it’s undeletedPages)

    Making a fluent interface to do this might be a little risky, because I think there’s an expectation* that a method like Iterable.filter() wouldn’t modify the source iterable itself, but rather would return a new, filtered iterable. As I understand them, Fluent interfaces return themselves, not copies of themselves. Maybe we need something like ruby’s ”!” convention( Iterable.filter!() ) to signify that it’s changing the source object.

    * at least, I have that expectation :-)

    16 Oct 2007, 17:19

  5. Stephan Schmidt

    Hmm.

    If played a little with my IDE and got:

        List names = Arrays.asList("Stephan", "Chris", "Mike", "Miha", "Katrin");
    
        Iterable filtered = with(names).filter( or(isEqualTo("Chris"), isEqualTo("Miha"))).filter(lengthLessThan(5));

    The or(..) construct isn’t very nice, this will improve with closures in Java 7 I hope*. Perhaps I can get

      isEqualTo("chris").or().isEqualTo("miha")

    working (or something like that). Filtereing, mapping and iterating is a closure thing for sure. But I think my line is readable enough. I’ll try to add some transformations next and see where I end.

    *) On the Quaere blog people have suggested to use ( s "Chris" || s “Miha” ) and then go back in the filter method and parse ther stack trace to get the right expression to filter.

    Peace
    -stephan

    16 Oct 2007, 18:24

  6. Stephan Schmidt

    And of course the source list is not changed :-)

    16 Oct 2007, 19:08

  7. Kevin Bourrillion

    Chris, I agree with you on all counts. The syntax sucks. At the same time, this stuff is now, has always been and will always be nothing but a stopgap measure until the right language change can come along, at which time we can finally really decide on the optimal syntax and have functional-style programming start actually making lives better in Java for once.

    So I’m undecided how much effort to put into the Function/Predicate stuff; it’s in the library more because it sort of had to be, not so much because we think it’s a crown jewel.

    Incidentally, comments like ocean’s “it really sucks that google didn’t extend java.util.Iterable” clearly assume a level of “finality” that just isn’t there yet. We still reserve the right to make changes, even large ones. 1.0 is quite some months away - I’d be stunned and thrilled if we had it ready for JavaOne - so we’ll see how things develop. We value the influx of ideas like yours. Please join our list(s).

    17 Oct 2007, 11:36


Add a comment

You are not allowed to comment on this entry as it has restricted commenting permissions.

Trackbacks

Most recent entries

Loading…

Search this blog

on twitter...


    My Flickr Photos Go to 'Uploads from chrismay'

    Uploads from chrismay

    Mon, Sep 21, 2009
    North America, looking up towards Cut Gate

    chrismay posted a photo:

    North America, looking up towards Cut Gate

    Cranberry Clough

    chrismay posted a photo:

    Cranberry Clough

    Cut Gate

    chrismay posted a photo:

    Cut Gate

    Langsett Reservoir

    chrismay posted a photo:

    Langsett Reservoir

    Wed, Aug 19, 2009
    Another leamington sunset

    chrismay posted a photo:

    Another leamington sunset

    Mon, Aug 17, 2009
    P1020332

    chrismay posted a photo:

    P1020332

    Old and new

    chrismay posted a photo:

    Old and new

    105 inner ring from the commuting bike after ~6,000 miles. I can see now why it was skipping a bit. Let's see if the new one lasts as well.

    Fancy duck

    chrismay posted a photo:

    Fancy duck

    No idea what this is. Alien?

    chrismay posted a photo:

    No idea what this is. Alien?

    Summertime!

    chrismay posted a photo:

    Summertime!

    del.icio.us links Go to 'Delicious/chrismay'

    Tue, Nov 03, 2009
    How To Use An Apostrophe - The Oatmeal
    Bits of Evidence
    "What we actually know about software development and Why we believe it's true"
    Mon, Nov 02, 2009
    campingbarns.net : Maggs Howe, Kendal, Cumbria
    Fri, Oct 30, 2009
    Git merge squash repeatedly - Stack Overflow
    Thu, Oct 29, 2009
    Underscore.js
    functional JS library
    Wed, Oct 28, 2009
    Vanilla - Free, Open-Source Forum Software
    Tue, Oct 27, 2009
    Sonar
    Mon, Oct 26, 2009
    Dormando's [crappy] Operations Mantras
    Fri, Oct 23, 2009
    As a librarian, it is heresy to say this, but in a few short years Google Scholar has improved sufficiently and now pretty much whomps up on a lot of the subscription databases.
    "As a librarian, it is heresy to say this, but in a few short years Google Scholar has improved sufficiently and now pretty much whomps up on a lot of the subscription databases."
    Thu, Oct 22, 2009
    Kanban development oversimplified: a simple explanation of how Kanban adds to the ever-growing Agile toolkit

    Tags

    Not signed in
    Sign in

    Powered by BlogBuilder
    © MMIX