October 16, 2007

Writing functional Java with Google Collections

I’ve been experimenting with Google’s new Collections API, which are a kind of type-safe, stripped-down version of Jakarta Commons Collections, providing you with some (though not all) of the list-processing features that are commonplace in more functional languages – like each, collect, detect and inject in Ruby for instance.

In theory, this should give a big win in terms of reducing code complexity and making for a better-decoupled and more testable design.

In practice, this turns out to be true, but with a somewhat unpleasant side effect. Java’s type-checking and lack of support for closures or code blocks means that when you switch to this style of coding, you end up introducing a lot of new little classes, typically anonymous inner classes for things like predicates and functions, which get used once and then thrown away.

For instance; this code has a cyclomatic complexity of 9, which is just barely acceptable. But it’s fairly readable, if you know what the object model looks like.

 for (Content content: page.getContents().values()) {
            for (ContentFetcher cf: content.getContentFetchers()) {
                if (cf instanceof AbstractFileBackedContentFetcher) {
                    String cfFile = ((AbstractFileBackedContentFetcher) cf).getFileName();
                    File directory = new File(rootDir, cfFile).getParentFile();
                    if (directory != null && directory.exists()) {
                        for (File subfile: directory.listFiles()) {
                            if (subfile.isFile() && !filenames.contains(subfile.getAbsolutePath())) {
                                files.add(subfile);
                                filenames.add(subfile.getAbsolutePath());
                            }
                        }
                    }
                }
            }
        }

listifying it, we get something like this. Here the cyclomatic complexity is about 5 – much better – but we’ve had to introduce two new anonymous inner classes, and there are a lot of awfully long lines of code.

        Predicate<AbstractFileBackedContentFetcher> cfDirectoryExists = new Predicate<AbstractFileBackedContentFetcher>() {
            public boolean apply(AbstractFileBackedContentFetcher cf) {
                File dir = new File(rootDir, cf.getFileName()).getParentFile();
                return dir != null && dir.exists();
            }
        };
        FileFilter okToAddFile = new FileFilter() {
            public boolean accept(File pathname) {
                return pathname.isFile() && !filenames.contains(pathname.getAbsolutePath());
            }
        };

        for (Content content: page.getContents().values()) {

            Iterable<AbstractFileBackedContentFetcher> contentFetchersWithExistingDirs = filter(filter(
                    content.getContentFetchers(), AbstractFileBackedContentFetcher.class), cfDirectoryExists);

            for (AbstractFileBackedContentFetcher cf: contentFetchersWithExistingDirs) {
                File[] subfiles = new File(rootDir, cf.getFileName()).getParentFile().listFiles(okToAddFile);
                for (File subfile: subfiles) {
                    files.add(subfile);
                    filenames.add(subfile.getAbsolutePath());
                }
            }
        }

Is that better? I’m not convinced either way. The second block of code is structurally simpler, but it’s about 50% more code. As a general rule, less code is better code...

I think that if you have frequently-used predicates and transformation functions (the instanceOf predicate which GC uses in Iterables.filter(Iterable,class) for instance), then it’s well worth the effort. But if you’re defining a predicate just to avoid having an if inside a foreach, then it’s less clear. Maybe when java 7 comes along and gives us closures (with some syntactic sugar to keep them concise) things will be better.

A related ugliness here (at least, if you’re used to the Ruby way of doing this) is google’s choice not to extend java.util.Iterable, but rather to have the collections methods defined static on the Iterables class. So instead of writing something like

GoogleIterable pages = new GoogleIterable(getPagesList())
files = pages.filter(WebPage.class).filter(undeletedPages).transform(new PagesToFilesTransformation())

we have to do

files = Iterables.transform(Iterables.filter(Iterables.filter(getPagesList(),WebPage.class),undeletedPages),new PagesToFilesTransformation()))

which to my eye is a lot more confusing – it’s harder to see which parameters match up with which methods.

Update I’ve now written my own IternalIterable class, which wraps an Iterable and provides filter(), transform(), inject(), sort() and find() methods that just delegate to google-collections (except for inject() which is all mine :-) ). It’s not very long or complex, and it’s already tidied up some previously ugly code rather nicely.


- 7 comments by 1 or more people Not publicly viewable

[Skip to the latest comment]
  1. ocean

    I don’t think ‘less code is better code’ is true in any meaningful sense. Good code, like good writing, contains the bare minimum necessary to communicate some message while being maintainable and readable. There is no absolute value to less code. The second version is more readable, it’s much easier to quickly skim the code and get some idea of what the code should do and this is because it’s much more explicit and contains more code (it contains variables like ‘contentFetchersWithExistingDirs’). Though note that you shouldn’t need two filter’s there, it ought to be possible to combine both filters into a single predicate and write code like:

    for (AbstractFileBackedContentFetcher f : filter(content.getContentFetchers(), fileFetcherWithExistingDir)) {
    ....
    }

    That’s a bit more scannable and rolls off the tongue much easier. Anonymous, inner classes are throwaway classes and are meant just for cases like this where there’s a lot to be gained by defining some variable/classes up front. Personally I think this sort of code is both easier to read AND write since it lets you break down the problem a bit.

    And yeah, it really sucks that google didn’t extend java.util.Iterable.

    16 Oct 2007, 12:00

  2. Chris May

    I don’t think ‘less code is better code’ is true in any meaningful sense.

    yeah; I probably should have qualified that. All other things being equal less code is better code. Clearly in this case all other things are not equal :-)

    The two filters are so that I can avoid an instanceof test and a cast in my Predicate. content.getContentFetchers() returns other classes than AbstractFileBackedContentFetcher, so first I use filter(Iterable,type) to get an Iterable of the right type, which I then can predicate-filter type-safely. Whether the double filter is any ‘better’* than a predicate with a cast in, I’m not sure…

    * for a suitable value of ‘good’

    16 Oct 2007, 12:15

  3. Stephan Schmidt

    Am I wrong or does it not look more like:

      import static Iterables.*;
    
      transform( filter( filter(getPagesList(),WebPage.class), undeletedPages), new PagesToFilesTransformation()))

    Which isn’t that ugly.

    Better would have been to write an FluentIterable interface which returns itself, e.g. like here

    http://stephan.reposita.org/archives/2007/10/10/fluent-interface-and-reflection-for-object-building-in-java/

    Then it would be

    filter(getPagesList(), WebPage.class).filter(undeletedPages).transform( new PagesToFilesTransformation );

    The FluentInterable class could be as short as 20 lines of code I guess.

    Peace
    -stephan


    Stephan Schmidt :: stephan@reposita.org
    Reposita Open Source – Monitor your software development
    http://www.reposita.org
    Blog at http://stephan.reposita.org – No signal. No noise.

    16 Oct 2007, 16:30

  4. Chris May

    using static imports certainly improves things, but IMO still suffers from the problem that the parameters are too far away from the methods. Scanning the line above, it’s still very hard to see what the second argument to the first filter call is (it’s undeletedPages)

    Making a fluent interface to do this might be a little risky, because I think there’s an expectation* that a method like Iterable.filter() wouldn’t modify the source iterable itself, but rather would return a new, filtered iterable. As I understand them, Fluent interfaces return themselves, not copies of themselves. Maybe we need something like ruby’s ”!” convention( Iterable.filter!() ) to signify that it’s changing the source object.

    * at least, I have that expectation :-)

    16 Oct 2007, 17:19

  5. Stephan Schmidt

    Hmm.

    If played a little with my IDE and got:

        List names = Arrays.asList("Stephan", "Chris", "Mike", "Miha", "Katrin");
    
        Iterable filtered = with(names).filter( or(isEqualTo("Chris"), isEqualTo("Miha"))).filter(lengthLessThan(5));

    The or(..) construct isn’t very nice, this will improve with closures in Java 7 I hope*. Perhaps I can get

      isEqualTo("chris").or().isEqualTo("miha")

    working (or something like that). Filtereing, mapping and iterating is a closure thing for sure. But I think my line is readable enough. I’ll try to add some transformations next and see where I end.

    *) On the Quaere blog people have suggested to use ( s "Chris" || s “Miha” ) and then go back in the filter method and parse ther stack trace to get the right expression to filter.

    Peace
    -stephan

    16 Oct 2007, 18:24

  6. Stephan Schmidt

    And of course the source list is not changed :-)

    16 Oct 2007, 19:08

  7. Kevin Bourrillion

    Chris, I agree with you on all counts. The syntax sucks. At the same time, this stuff is now, has always been and will always be nothing but a stopgap measure until the right language change can come along, at which time we can finally really decide on the optimal syntax and have functional-style programming start actually making lives better in Java for once.

    So I’m undecided how much effort to put into the Function/Predicate stuff; it’s in the library more because it sort of had to be, not so much because we think it’s a crown jewel.

    Incidentally, comments like ocean’s “it really sucks that google didn’t extend java.util.Iterable” clearly assume a level of “finality” that just isn’t there yet. We still reserve the right to make changes, even large ones. 1.0 is quite some months away - I’d be stunned and thrilled if we had it ready for JavaOne - so we’ll see how things develop. We value the influx of ideas like yours. Please join our list(s).

    17 Oct 2007, 11:36


Add a comment

You are not allowed to comment on this entry as it has restricted commenting permissions.

Most recent entries

Loading…

Search this blog

on twitter...


    Tags

    Not signed in
    Sign in

    Powered by BlogBuilder
    © MMXVI