I mentioned in my last post that I've been looking at PLSA (Probabilistic Latent Semantic Analysis), an evolution of LSA (Latent Semantic Analysis). Now that I've brought it up, the term really does need a bit of explaining.
Basically, it's about measuring which terms are used often, and together, across a range of documents. I could use it to look through songs and find which terms tend to occur together, which suggests a theme. For example, 'cross', 'jesus', 'died', 'save', 'blood', 'me' and 'thank' might often appear together in a set of songs. That suggests those songs share a theme (Jesus' death / salvation in this example).
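For anyone curious what PLSA actually does with those co-occurrences, here's a rough sketch in Python. It's my own toy version, not any particular library's implementation: it treats each song as a bag of words and uses expectation-maximisation (EM) to learn P(word | theme) and P(theme | song). The function name `plsa`, the made-up lyrics and the choice of two themes are all just assumptions for illustration.

```python
import math
import random
from collections import Counter


def plsa(docs, n_topics, n_iter=50, seed=0):
    """Toy PLSA fitted by EM. docs is a list of non-empty token lists.

    Returns (vocab, p_w_z, p_z_d, log_likelihoods) where p_w_z[z][w] is
    P(w|z) and p_z_d[d][z] is P(z|d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    counts = [Counter(d) for d in docs]

    def normalize(xs):
        total = sum(xs)
        return [x / total for x in xs]

    # Random starting point: one word distribution per theme...
    p_w_z = [dict(zip(vocab, normalize([rng.random() for _ in vocab])))
             for _ in range(n_topics)]
    # ...and one theme distribution per document.
    p_z_d = [normalize([rng.random() for _ in range(n_topics)]) for _ in docs]

    log_likelihoods = []
    for _ in range(n_iter):
        exp_w_z = [dict.fromkeys(vocab, 0.0) for _ in range(n_topics)]
        exp_z_d = [[0.0] * n_topics for _ in docs]
        ll = 0.0
        for d, cnt in enumerate(counts):
            for w, n in cnt.items():
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d).
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                total = sum(joint)  # this is P(w|d)
                ll += n * math.log(total)
                for z in range(n_topics):
                    resp = n * joint[z] / total
                    exp_w_z[z][w] += resp
                    exp_z_d[d][z] += resp
        log_likelihoods.append(ll)
        # M-step: turn the expected counts back into probabilities.
        p_w_z = []
        for z in range(n_topics):
            total = sum(exp_w_z[z].values())
            p_w_z.append({w: c / total for w, c in exp_w_z[z].items()})
        p_z_d = [normalize(row) for row in exp_z_d]
    return vocab, p_w_z, p_z_d, log_likelihoods


# Hypothetical toy "songs", just to have something to run it on.
songs = [
    "cross jesus died save blood thank me".split(),
    "jesus blood cross save me".split(),
    "praise sing joy praise dance".split(),
    "sing praise joy dance heart".split(),
]
vocab, p_w_z, p_z_d, lls = plsa(songs, n_topics=2, n_iter=30)
for z, dist in enumerate(p_w_z):
    # Print the four most probable words for each theme.
    print(z, sorted(dist, key=dist.get, reverse=True)[:4])
```

Each theme ends up as a weighting over words, so words that keep turning up in the same songs get pulled into the same theme. Which theme number ends up with which words depends on the random starting point.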
I discovered that Google can do this! (Is there anything it can't do?!)
If you search for a term with a tilde ('~') in front of it, Google will magically perform LSA on it and return results containing terms similar to that word.
Combine this with Google's NOT operator ('-') by searching for '~computer -computer', and you get results that contain terms similar to 'computer' without searching for the word 'computer' itself. So instead of highlighting 'computer', as a plain search for 'computer' would, the results highlight words like 'hardware', 'PC', 'laptop' and 'computing'. Clever!
(If you're really interested: it seems that Google will only search for five terms, including the search term itself. In my '~computer' example, the results only cover 'computer', 'PC', 'laptop', 'computing' and 'computerized'. If you 'not' all of those terms, nothing gets returned.)