October 12, 2005

Latent Semantic Analysis

I am allergic to jargon! It can be used to

  • conceal the lack of substance or meaning in an idea

  • inflate the commonplace or trivial

  • exclude the uninitiated

  • confer pseudo-scientific authority and claim false superiority

So what about Latent Semantic Analysis and its inevitable contraction to LSA? I came across it in an interesting conversation with Mike Joy about current issues in e-learning and assessment. LSA is a statistical method designed to measure the commonality of meaning in a collection of text passages or documents. It compares the frequencies of significant words, numerically conflates their meanings, applies some mathematical jiggery-pokery to the data (viewed as sparse matrices), and comes up with numbers that may indicate how close the texts are in meaning. It can be used as an alternative to the more familiar comparison of strings (à la Google) in detecting likely plagiarism. I believe that it is used effectively to monitor plagiarism in program source code submitted for assessment by students in the Department of Computer Science.
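For readers curious about the jiggery-pokery, here is a minimal Python sketch of the usual LSA pipeline. It is an illustration under stated assumptions, not the tool used in the Department of Computer Science: the stop-word list, the log(1 + count) weighting, the choice of k = 2 latent dimensions and the use of power iteration (instead of a proper linear-algebra library) to obtain the singular value decomposition are all hypothetical choices made for brevity. It builds a term-by-document matrix, keeps the top k singular directions, and compares documents by the cosine of the angle between their reduced vectors.

```python
import math
import random
import re

# Hypothetical stop-word list (an assumption): LSA tools usually discard
# very common words so that only "significant" words are counted.
STOP_WORDS = {"a", "an", "the", "of", "in", "and", "or", "to", "is", "it", "for", "on"}


def tokenize(text):
    """Lower-case the text, split it into words and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def term_document_matrix(docs):
    """Build a terms x documents matrix of log-scaled word counts."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    counts = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(tokenized):
        for w in doc:
            counts[index[w]][j] += 1.0
    # log(1 + count) damps the influence of words repeated many times (assumed weighting)
    return [[math.log1p(c) for c in row] for row in counts]


def top_eigenpairs(sym, k, iterations=500, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix by power iteration with deflation."""
    n = len(sym)
    mat = [row[:] for row in sym]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iterations):
            w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate: subtract the pair just found so the next pass finds the next one
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def lsa_document_vectors(docs, k=2):
    """Project each document into a k-dimensional latent semantic space.

    If A = U S V^T is the SVD of the term-document matrix, the eigenvectors
    of A^T A are the columns of V and its eigenvalues are the squared
    singular values, so each document's coordinates are the rows of V_k S_k.
    """
    a = term_document_matrix(docs)
    n = len(docs)
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(ata, k)
    return [[v[d] * math.sqrt(max(lam, 0.0)) for lam, v in pairs] for d in range(n)]


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0


docs = [
    "students copied the essay and renamed the variables",
    "the essay was copied by students who renamed variables",
    "the weather in Coventry was wet and windy",
]
vecs = lsa_document_vectors(docs, k=2)
print(round(cosine(vecs[0], vecs[1]), 2), round(cosine(vecs[0], vecs[2]), 2))
```

On these three made-up passages, the two that share most of their significant words should score close to 1, while the unrelated third should score near 0. A real detector would use an optimised SVD routine and a much larger collection of documents, since LSA only uncovers useful latent structure when there are many texts to learn word associations from.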

Clearly jargon is both necessary and useful to experts, and LSA meets this test. It also has the virtue of meaning what it says: the analysis of hidden meaning. It might be interesting to run LSA on this and other blogs on plagiarism!

- One comment

  1. I am carrying out research into ‘Latent Semantic Analysis (LSA) and source-code plagiarism’, under the supervision of Mike Joy.

    Very briefly, Latent Semantic Analysis (LSA) applies statistical techniques to capture the major relationships between terms (e.g. words) and contexts (e.g. documents) and to categorise them into a semantic structure depending on their similarity, hence “latent semantic” in the title of the method.

    Regarding plagiarism: in an attempt to disguise copying, students may rename many words. LSA is suitable for detecting such documents, whereas software based on string-matching algorithms may fail to identify them.

    LSA has been successfully applied in educational settings such as automatic essay scoring and natural-language plagiarism detection. For more information, see link

    LSA is also commonly used in tasks such as search and retrieval, classification, and filtering, and it would be very interesting to apply it to the plagiarism discussion blogs.

    On my website, I have some information about LSA and plenty of references to papers. link

    Georgina Cosma

    13 Oct 2005, 18:53

