All entries for Wednesday 08 June 2011

June 08, 2011

Learning Scala, a step at a time

So, I have decided to make a more serious effort to learn scala . It fits almost all of my ‘ideal language’ features; it’s object-oriented and it’s statically typed (like java), it’s reasonably terse (unlike java) and it has good support for a more functional style of programming, without forcing you to use it all the time. It also runs on the JVM, which is a good thing if you care about observability, manageability, and all the other stuff that ops people care about.

The downsides, as far as I’ve found so far, are that (a) the tooling is nothing like as good as it is for java. The Eclipse Scala plugin appears to be the best the current bunch, but even there, support for things like refactoring (which is kind of the point of a statically-typed language) is pretty sketchy. And (b) it’s a bit of an unknown quantity in production. Yes, there are some big, high-scale users (Twitter spring to mind), but they’re the kind that can afford to have the best engineers in the world. They could make anything scale. Maybe scala will scale just as well in a more “average” operational environment, but there doesn’t seem to be a huge amount of evidence one way or another at the moment, which may make it a hard sell to a more conservative operations group.

On with the show. The best resources I’ve found so far for learning the language are:

- Daniel Spiewak’s series Scala for Java Refugees . Absolutely brilliant series of articles that do exactly what they say on the tin – put scala firmly into the domain of existing java programmers. Which brings me to….

- Tony Morris’ Scala Exercises for beginners. I had a hard time with these, not so much because they were difficult, but because coming from a non-comp-sci background, I found it hard to motivate myself to complete exercises as dry as “implement add(x: Int, y: Int) without using the ’+’ operator”. I also found the slightly evangelical tone of some of the other blog entries (TL;DR:”All you java proles are dumb”) a bit annoying. But working through the exercises undoubtedly improved my understanding of Scala’s support for FP, so it was worth it. On the Intermediate exercises though, I don’t even know where to start

- Akka Akka is an actor framework for scala and java, so it’s not really a scala resource per se but, it does such a great job of explaining actor-based concurrency that it’s worth reading though and playing with, because you’ll come away with a much clearer idea of how to use either Akka or Scala’s own Actors.

- Project Euler – A series of maths-based programming challenges – find the prime factors of a number, sum the even fibonaci numbers less than a million, and so on. Being maths-ish, they’re a good match for a functional style of programming, so I find it helpful to start by just hacking together a java-ish solution, and then seeing how much smaller and more elegant I can make it. It’s a great way to explore both the syntax, and the Collections APIs, which are at the heart of most of scala’s FP support. It’s also a nice way to learn ScalaTest (the unit testing framework for scala), since my hack-and-refactor approach is reliant on having a reasonable set of tests to catch my misunderstandings.
It’s also exposed some of my concerns about the operational readiness of scala. I came up with what I thought was a fairly neat implementation of a prime-factoring function; obtain the lowest factor of a number (ipso facto prime), divide the original by that number, and repeat until you can’t factor any further:

 lazy val naturals: Stream[Long] = Stream.cons(1, naturals.map(_ + 1))

  def highestFactor(compound: Long): Long = {
   naturals.drop(1).takeWhile(_ < compound/ 2).find(
        (compound % _ == 0)).map(
            highestFactor(compound/_)).getOrElse(compound)
  }

(source)
– which works beautifully for int-sized numbers, but give it a prime towards the upper bounds of a Long and it runs out of heap space in the initial takeWhile (before it’s even had a chance to start recursively calling itself). It seems that Stream.takeWhile doesn’t permit the taken elements to be garbage-collected, which is counter-intuitive. I wouldn’t be at all surprised to find that I’m Doing It Wrong, but then again, shouldn’t the language be trying hard to stop me?


Java String.substring() heap leaks

Here’s an interesting little feature that we ran into a short while ago…

Suppose I have something like a Map which will exist for a long time (say, an in-memory cache), and a large, but short-lived String. I want to extract a small piece of text from that long string, and use it as a key in my map

Map<String,Integer> longLivedMap = ...
String veryLongString = ...
String shortString = veryLongString.subString(5,3);
longLivedMap.put(shortString,123);

Question: How much heap space have we just consumed by adding “abc”=>123 into our map? You might think that it would be just a handful of bytes – the 3-character String, the Integer, plus the overhead for the types. But you would be entirely wrong. Java Strings are backed by char arrays, and whilst the String.subString() method returns a new String, it is backed by the same char array as the originating String. So now the entire veryLongString char[] has a long-lived reference and can’t be garbage collected, even though only 3 chars of it are actually accessible. Rapid heap-exhaustion, coming right up!

The solution is pretty straightforward; if you want to hold a long-lived reference to a string, call new String(string) first. Something like

String shortString = veryLongString.subString(5,3);
longLivedMap.put(new String(shortString),123);

It would be counter-productive to do this on every substring, since most of the time the substring and the original source will go out of scope at the same time, so sharing the underlying char[] is a sensible way to reduce overhead.

Since you (hopefully) won’t have many separate places in your code where you’re storing long-lived references like this (just cache implementations of one sort or another), you can create the new strings inside the implementation of those classes instead. Now your caches are all light, and your heap is all right. Good Night.


Most recent entries

Loading…

Search this blog

on twitter...


    Tags

    Not signed in
    Sign in

    Powered by BlogBuilder
    © MMXIX