All entries for Monday 22 August 2011
August 22, 2011
Writing about web page http://dx.doi.org/10.3998/3336451.0013.305
I've linked to this article: J. Beel and B. Gipp, 'Academic Search Engine Spam and Google Scholar's Resilience Against it', Journal of Electronic Publishing, 13 (3), 2010.
The article discusses possibilities for academic search engine optimisation, and what happens when this becomes spamming activity. It has a very neat description of how people go about spamming search engines, and it considers some of the ways that scholars can manipulate academic search engines.
If (like me) you aren't already aware of all these nefarious techniques, then the article will be an eye-opener! The researchers experimented on Google Scholar, using the following approaches:
- "When creating an article, an author might place invisible text in it. This way, the article later might appear more relevant for certain keyword searches than it actually is."
- "A researcher could modify his own or someone else’s article and upload it to the Web. Modifications could include the addition of additional references, keywords, or advertisements."
- "A manipulating researcher could create complete fake papers that cite his or her own articles, to increase rankings, reputation, and visibility."
I find it interesting that three of the sites they used in their manipulation were Mendeley, Academia.edu and ResearchGate. I've blogged about these sites and their ilk before and I've suggested to researchers that having details about their work on these sites would help them raise their profiles on the Web. The article says that only the papers uploaded to Academia.edu were crawled and indexed by Google Scholar... which is kind of good news for the robustness of Google Scholar. It is also an indication that those searching for full-text versions of articles on the web should go directly to the kinds of sites which might hold them (I do recommend Mendeley), and not rely only on search engines.
After reading this article, I want to know how to go about modifying a journal article after it has been published (including those not your own!), in order to add references. The authors didn't go into detail about how to do that, but you can imagine the havoc it would play with Google Scholar's citation scores if we were all doing it!
I note that the authors described the journal 'Epidemiology' as "a reputable journal by the publisher JSTOR". JSTOR is not a publisher, it is a content aggregator. 'Epidemiology' is published by Wolters Kluwer. It probably takes a librarian to know this, and I wonder whether it is relevant anyway. It could be a deliberate faux pas on the part of the authors, because it kind of illustrates their point that people don't know where content online is coming from! And the authors are right that a journal available on JSTOR is a reputable academic title.
The discussion section of the paper explains that spamming academic search engines takes a lot of effort, that the benefit is not immediate or measurable for academics, and that academics are unlikely to undertake such work because their reputation is so valuable and could be permanently damaged if a search engine were to ban all their articles once spamming activity was discovered. The authors raise the matter of whether a journal or conference might engage in search engine spamming: they don't mention academic institutions, but I believe that universities could also have a motivation.
I do worry about where we draw the line between authors or journals raising their profile in legitimate ways and spamming. I have long advised authors to include key words in their article titles because of the way journal indexing tools work in ranking results, and this seems to me to make good sense from both a "discovery optimisation" point of view and from an academic accuracy perspective. I also believe that self-citation is a good idea, in that I think false modesty is pointless and potentially damaging. Authors ought to know whether their earlier work is relevant to their latest article or not, and how well such practice is accepted in their own field, and should therefore be able to self-cite with caution.
The big question about all such profile-raising practices, for me, is how far should we go? This article doesn't give the answer, but it describes an awful lot more about what could be done. The authors conclude by suggesting: "the academic community needs to decide what actions are appropriate and when academic search engine optimization ends and academic search engine spam begins."