Java String.substring() heap leaks
Here’s an interesting little feature that we ran into a short while ago…
Suppose I have something like a Map that will live for a long time (say, an in-memory cache), and a large but short-lived String. I want to extract a small piece of text from that long string and use it as a key in my map:
Map<String,Integer> longLivedMap = ...
String veryLongString = ...
String shortString = veryLongString.substring(5,8);
longLivedMap.put(shortString,123);
Question: how much heap space have we just consumed by adding “abc”=>123 to our map? You might think it would be just a handful of bytes: the 3-character String, the Integer, plus the per-object overhead. But you would be entirely wrong. Java Strings are backed by char arrays, and whilst String.substring() returns a new String object, that object is backed by the same char array as the original String (it simply records a different offset and length). So now the entire char array of veryLongString has a long-lived reference and can’t be garbage collected, even though only 3 of its chars are actually accessible. Rapid heap exhaustion, coming right up! (This is how String was implemented in Oracle/OpenJDK before Java 7 update 6; from 7u6 onward, substring() copies the characters it needs, so newer JVMs no longer leak this way.)
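Because a modern JVM no longer shares arrays this way (and hides String's internals anyway), the sketch below uses a toy model to make the sharing visible. `LegacyString` and `SharedArrayDemo` are hypothetical classes written for illustration, not the real `java.lang.String`; the model assumes the pre-7u6 layout of a backing array plus an offset and a count, and its `substring()` hands that same array to the new object.

```java
import java.util.Arrays;

// Hypothetical toy model of how java.lang.String stored its characters
// before Java 7 update 6: a possibly shared array plus an offset and a length.
final class LegacyString {
    final char[] value; // backing array, possibly shared with other LegacyStrings
    final int offset;   // index of this string's first char within value
    final int count;    // number of chars visible through this string

    LegacyString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private LegacyString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Models the old behaviour: no copy, just a new view onto the same array.
    LegacyString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        return new LegacyString(value, offset + beginIndex, endIndex - beginIndex);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}

public class SharedArrayDemo {
    public static void main(String[] args) {
        char[] filler = new char[1_000_000];
        Arrays.fill(filler, 'x');
        LegacyString veryLong = new LegacyString("hello" + "abc" + new String(filler));
        LegacyString shortOne = veryLong.substring(5, 8);

        System.out.println(shortOne);              // abc
        System.out.println(shortOne.value.length); // 1000008: the whole array stays reachable
    }
}
```

Holding on to `shortOne` keeps all 1,000,008 chars alive, which is exactly the situation the map key above ends up in.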
The solution is pretty straightforward: if you want to hold a long-lived reference to a string, call new String(string) first. Something like:
String shortString = veryLongString.substring(5,8);
longLivedMap.put(new String(shortString),123);
It would be counter-productive to do this on every substring: most of the time the substring and its source go out of scope at the same time, and then sharing the underlying char array is a sensible way to reduce overhead.
Since you (hopefully) won’t have many separate places in your code where you’re storing long-lived references like this (just cache implementations of one sort or another), you can create the new strings inside the implementation of those classes instead. Now your caches are all light, and your heap is all right. Good Night.
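A minimal sketch of that idea, assuming a simple map-backed cache: `DetachingCache` and its methods are hypothetical names for illustration, not a library API. The copy happens once, inside `put()`, so callers can keep using `substring()` as normal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache wrapper: every key is copied with new String(...) before it
// is stored, so on JDKs where substring() shares its parent's char array the cache
// never pins a large parent string. On Java 7u6+ the copy is just an extra allocation.
final class DetachingCache<V> {
    private final Map<String, V> map = new HashMap<>();

    void put(String key, V value) {
        // On pre-7u6 JDKs, the String(String) constructor copies only the key's
        // visible chars when the backing array is larger than the key.
        map.put(new String(key), value);
    }

    V get(String key) {
        // Lookups use equals(), so any string with the same chars matches.
        return map.get(key);
    }

    int size() {
        return map.size();
    }
}

public class DetachingCacheDemo {
    public static void main(String[] args) {
        DetachingCache<Integer> cache = new DetachingCache<>();
        String veryLongString = "helloabc" + " (imagine megabytes of text here)";
        String shortString = veryLongString.substring(5, 8);

        cache.put(shortString, 123);
        System.out.println(cache.get("abc")); // 123
    }
}
```

Keeping the copy inside the cache means the defensive step lives in one place rather than at every call site that happens to store a substring.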