October 05, 2005

How are newlines represented in an HTML textarea?

OK, this is just a blatant attempt at stuffing a result into Google, since it took me a while to find this out. I hope someone else finds it useful.

If you have an HTML textarea, and you want to parse its content as a list of lines, you might be wondering what character you need to look for to indicate a line break. \n? \r\n? That weird thing that Mac OS 9 does (a bare \r)?

Contrary to what you might expect, the answer is not "it depends on the platform". The merciful W3C (blessed be their spec.) decreed it to be constant regardless of what hare-brained operating system choice your user has made.

If your form's enctype is application/x-www-form-urlencoded (the default), then the answer is '%0D%0A', i.e. a URL-encoded CRLF. If the enctype is multipart/form-data, then it's an unencoded CRLF. This behaviour, it turns out, is inherited from the MIME specification.
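To make the url-encoded case concrete, here is a minimal Python sketch of decoding such a request body. The field name "notes" and the body string are made up for illustration; a real body would come from whatever your server framework hands you.

```python
from urllib.parse import parse_qs

# Hypothetical request body, shaped like what a browser sends for a
# textarea named "notes" (the name is an assumption for this example)
# under application/x-www-form-urlencoded.
body = "notes=first+line%0D%0Asecond+line"

# parse_qs undoes the URL-encoding: '+' becomes a space and
# '%0D%0A' becomes the two characters CR and LF.
fields = parse_qs(body)
text = fields["notes"][0]

print(repr(text))  # 'first line\r\nsecond line'
```

The decoded value contains a literal CR followed by LF, whatever operating system the submitting browser ran on.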

Since any binding library worth its salt will decode the URL-encoding for you, the string that you should look for is CR-LF (\r\n). This is not the same as the default line separator (LF) on any Unix box. If you split on LF alone, you'll end up with a spurious \r character at the end of each line of parsed text.
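Here is a rough Python sketch of the difference, assuming the value has already been decoded by your form library. The helper name textarea_lines and the sample string are invented for this example.

```python
def textarea_lines(value):
    """Split a decoded textarea value into lines on CRLF.

    Hypothetical helper for illustration; assumes the value uses the
    CRLF line breaks that browsers submit.
    """
    return value.split("\r\n")


# Sample decoded textarea value (made up for this example).
submitted = "alpha\r\nbeta\r\ngamma"

# Splitting on the Unix separator leaves a stray CR on every line
# that was followed by a line break.
naive = submitted.split("\n")
print(naive)  # ['alpha\r', 'beta\r', 'gamma']

# Splitting on CRLF gives clean lines.
clean = textarea_lines(submitted)
print(clean)  # ['alpha', 'beta', 'gamma']
```

If you would rather not care which separator turns up, Python's built-in str.splitlines() treats \r\n, \n and \r all as line boundaries.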

