All entries for Monday 28 January 2008
January 28, 2008
I mentioned in my last post that I've been looking at PLSA (Probabilistic Latent Semantic Analysis), an evolution of LSA (Latent Semantic Analysis). Now that I've posted it, the term really does need a bit of explaining.
Basically, it's about measuring which terms are used often, and together, across a collection of documents. I could use it to look through songs and find which terms occur together, suggesting a theme. For example, 'cross', 'jesus', 'died', 'save', 'blood', 'me' and 'thank' might often appear together across a set of songs. This suggests that those songs share a theme (Jesus' death and salvation, in this example).
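PLSA turns "terms that appear together" into themes by treating each song as a mixture of a few hidden topics, where each topic is a probability distribution over words. Below is a minimal sketch of the standard EM fitting procedure, not the PSALM code. It assumes the lyrics have already been turned into a word-count matrix `counts[d][w]` (how often vocabulary word `w` appears in song `d`). The class name `Plsa` is hypothetical, and the number of topics, iteration count and random seed are arbitrary illustrative choices.

```java
import java.util.Random;

// Hypothetical sketch: PLSA fitted with EM. Each song d is a mixture P(z | d)
// of latent topics z, and each topic is a word distribution P(w | z).
class Plsa {
    final double[][] wordGivenTopic; // P(w | z), indexed [z][w]
    final double[][] topicGivenSong; // P(z | d), indexed [d][z]

    Plsa(int[][] counts, int topics, int iterations, long seed) {
        int songs = counts.length;
        int words = counts[0].length;
        Random rng = new Random(seed);
        wordGivenTopic = randomDistributions(topics, words, rng);
        topicGivenSong = randomDistributions(songs, topics, rng);
        double[] posterior = new double[topics];

        for (int it = 0; it < iterations; it++) {
            double[][] nextWordGivenTopic = new double[topics][words];
            double[][] nextTopicGivenSong = new double[songs][topics];
            for (int d = 0; d < songs; d++) {
                for (int w = 0; w < words; w++) {
                    if (counts[d][w] == 0) continue; // unseen pairs contribute nothing
                    // E-step: P(z | d, w) is proportional to P(w | z) * P(z | d)
                    double total = 0;
                    for (int z = 0; z < topics; z++) {
                        posterior[z] = wordGivenTopic[z][w] * topicGivenSong[d][z];
                        total += posterior[z];
                    }
                    if (total == 0) continue;
                    // M-step accumulators, weighted by the observed count n(d, w)
                    for (int z = 0; z < topics; z++) {
                        double share = counts[d][w] * posterior[z] / total;
                        nextWordGivenTopic[z][w] += share;
                        nextTopicGivenSong[d][z] += share;
                    }
                }
            }
            normaliseRows(nextWordGivenTopic);
            normaliseRows(nextTopicGivenSong);
            copyInto(nextWordGivenTopic, wordGivenTopic);
            copyInto(nextTopicGivenSong, topicGivenSong);
        }
    }

    // Rows of random positive values, each normalised to sum to 1.
    static double[][] randomDistributions(int rows, int cols, Random rng) {
        double[][] m = new double[rows][cols];
        for (double[] row : m) {
            for (int j = 0; j < cols; j++) row[j] = 0.5 + rng.nextDouble();
        }
        normaliseRows(m);
        return m;
    }

    // Scales each row to sum to 1; an all-zero row becomes uniform.
    static void normaliseRows(double[][] m) {
        for (double[] row : m) {
            double sum = 0;
            for (double v : row) sum += v;
            for (int j = 0; j < row.length; j++) {
                row[j] = sum > 0 ? row[j] / sum : 1.0 / row.length;
            }
        }
    }

    // Copies the contents of one matrix into another of the same shape.
    static void copyInto(double[][] from, double[][] to) {
        for (int i = 0; i < from.length; i++) {
            System.arraycopy(from[i], 0, to[i], 0, from[i].length);
        }
    }
}
```

After fitting, the highest-probability words in each row of `wordGivenTopic` describe a candidate theme, and each row of `topicGivenSong` shows how strongly a song leans towards each theme. EM only reaches a local optimum, so different seeds can produce different topics.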
I discovered that Google can do this! (Is there anything it can't do?!)
If you put a tilde ('~') before a search term, Google will magically perform something like LSA on it and return results that also contain terms similar to that word.
Combining this with Google's NOT operator ('-'), if you search for '~computer -computer', you get results that contain terms similar to 'computer' but exclude pages containing the word 'computer' itself. So the results highlight words like 'hardware', 'PC', 'laptop' and 'computing', rather than 'computer' as a plain search for 'computer' would. Clever!
(If you're really interested: it seems that Google will only search for five terms, including the search term itself. In my '~computer' example, the results are only for 'computer', 'PC', 'laptop', 'computing' and 'computerized'. If you exclude all of those terms with '-', nothing gets returned.)
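Queries like these are easy to generate, which makes the "exclude everything related" experiment simple to repeat for other words. This is a minimal sketch, assuming the search engine still honours the '~' and '-' operators as described above; `RelatedTermsQuery` and `build` are hypothetical names.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: builds a '~term -term' query, optionally excluding
// further related terms that earlier searches have already surfaced.
class RelatedTermsQuery {
    static String build(String term, List<String> excluded) {
        StringJoiner query = new StringJoiner(" ");
        query.add("~" + term);      // ask for the term and its related terms
        query.add("-" + term);      // but exclude the term itself
        for (String other : excluded) {
            query.add("-" + other); // and exclude each related term already seen
        }
        return query.toString();
    }
}
```

For example, `build("computer", List.of("PC", "laptop", "computing", "computerized"))` produces `~computer -computer -PC -laptop -computing -computerized`, the query that, going by the observation above, returns nothing.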
Hi there... if you're reading this, there's a good chance it's because you've just completed my online questionnaire. In which case, thank you very much! If not, please feel free to take the survey.
Although I've been doing lots of reading recently on subjects such as 'Probabilistic Latent Semantic Analysis' (!), in an attempt to find ways of automatically detecting song themes from lyrics alone, I haven't made much progress in the last couple of weeks on actually making it work.
But some tweaks have been made to my PSALM program. The main ones are:
- The database to load can now be supplied as an argument to the program (e.g. 'java -jar XMLGUI.jar mydb.xml').
- Keyword searches now find songs containing each word separately, rather than treating all the keywords as a single phrase. To search for an exact phrase, enclose it in "speech marks". This follows the convention used by Google.
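The new keyword behaviour can be sketched roughly as follows. This is a guess at the logic, not the PSALM source: it assumes a song must contain every term (AND, as on Google) and uses a plain case-insensitive substring check, so 'save' would also match 'saved'. `KeywordSearch`, `parseQuery` and `matches` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of keyword search with "speech mark" phrases.
class KeywordSearch {
    // Either a "quoted phrase" (group 1) or a run of non-space characters (group 2).
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]+)\"|(\\S+)");

    // Splits a query into lower-case search terms; quoted text stays together as one phrase.
    static List<String> parseQuery(String query) {
        List<String> terms = new ArrayList<>();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            String term = m.group(1) != null
                    ? m.group(1).trim()
                    : m.group(2).replace("\"", ""); // drop a stray, unmatched quote mark
            if (!term.isEmpty()) {
                terms.add(term.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Assumed AND semantics: the lyrics must contain every word or phrase in the query.
    static boolean matches(String lyrics, String query) {
        String text = lyrics.toLowerCase(Locale.ROOT);
        for (String term : parseQuery(query)) {
            if (!text.contains(term)) {
                return false;
            }
        }
        return true;
    }
}
```

So a query of `sweet how` matches "Amazing grace, how sweet the sound" because both words appear somewhere in it, whereas `"sweet how"` does not, since that exact phrase never occurs.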
If you're wondering just what PSALM is, it's the software component of my project. Take a look at the progress report or specification for more details. Basically, though, it is a song organisation system that will soon also include theme detection and setlist/song recommendation functionality. (PSALM = Personal Software Aid for Leading Music)
So, here is Psalm 1...
Blessed is the man
who does not walk in the counsel of the wicked
or stand in the way of sinners
or sit in the seat of mockers.
But his delight is in the law of the LORD,
and on his law he meditates day and night.
He is like a tree planted by streams of water,
which yields its fruit in season
and whose leaf does not wither.
Whatever he does prospers.
Not so the wicked!
They are like chaff
that the wind blows away.
Therefore the wicked will not stand in the judgment,
nor sinners in the assembly of the righteous.
For the LORD watches over the way of the righteous,
but the way of the wicked will perish.
And the newly released 'PSALM 1.1' ...