August 07, 2012
Imagine a world without Mendeley, Evernote, Bookends or Zotero, but with plenty of Microsoft Excel or Google spreadsheets. Don't panic, it's just a hypothetical situation. You are attempting to catalogue your books using MS Excel, but you also want to include biographical information about your authors. You could have columns in your table for biographical information, but every time you enter that author you are going to have to re-enter that information. There are only so many times that you can enter Wordsworth's date of birth before you lose the will to live (or at least to work). If you are like me, that 'number of times' is quite small. So, you create a separate spreadsheet with a table of authors and biographical information, and you simply reference that table from your original table of books. Obviously you have read and want to index every Wordsworth text ever published, so this is going to save you a lot of copying and pasting. What you have created is a basic relational database: you have split the information into two tables to avoid repetition. In database-ese this is called normalizing your data. I think of data normalization like this: if you have to enter the information more than once, is it easier to split it into a different table? The author's name will probably be the 'key' that you use to relate the two tables. However, if you were in Byron's personal hell, and there were several William Wordsworths (or 'Turdsworths' as Byron would call them), you could use an arbitrary, but unique, number to cross-reference the two tables. Without any confusion, you'd know which of the ever-replicating Agent-Smith-esque Wordsworths authored which Prelude.
Resisting the temptation to explore the deep philosophical questions about identity that this raises, I can tell you that the best tools for creating relational databases like this are, in fact, not spreadsheet tools, but relational database management systems like Microsoft Access (best as a personal, desktop-based database) or MySQL (which is the wizard behind the curtain of many, many websites and institutional HR systems). As I understand it, then, a relational database is a way of recording information in a structure that maximizes efficiency by separating information into different tables which are linked by reference keys (in relational database speak, foreign keys and primary keys).
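To make the book catalogue example concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Access or MySQL. The table names, column names and the two sample titles are assumptions chosen for illustration; the point is only that the author's details are entered once and each book points at them through a numeric key.

```python
import sqlite3


def build_catalogue():
    """Create a tiny in-memory catalogue with separate authors and books tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE authors (
            author_id INTEGER PRIMARY KEY,  -- arbitrary but unique number
            name      TEXT NOT NULL,
            born      INTEGER
        );
        CREATE TABLE books (
            book_id   INTEGER PRIMARY KEY,
            title     TEXT NOT NULL,
            -- foreign key: must match an existing authors.author_id
            author_id INTEGER NOT NULL REFERENCES authors(author_id)
        );
        """
    )
    # The biographical details are entered exactly once.
    conn.execute(
        "INSERT INTO authors (author_id, name, born) VALUES (?, ?, ?)",
        (1, "William Wordsworth", 1770),
    )
    # Each book only stores the author's key, not the biography.
    conn.executemany(
        "INSERT INTO books (title, author_id) VALUES (?, ?)",
        [("Lyrical Ballads", 1), ("The Prelude", 1)],
    )
    return conn


def books_with_authors(conn):
    """Join the two tables back together for display."""
    return conn.execute(
        """
        SELECT books.title, authors.name, authors.born
        FROM books
        JOIN authors ON books.author_id = authors.author_id
        ORDER BY books.title
        """
    ).fetchall()


if __name__ == "__main__":
    for row in books_with_authors(build_catalogue()):
        print(row)
```

Because the key is a number rather than a name, a second William Wordsworth would simply get a different author_id, and every book would still point unambiguously at its own author.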
One planned output of the project that I am working on (Networks of Improvement) is an online relational database of eighteenth-century literary clubs and societies. After a few months of experimentation we now have an online platform for this research. For now, the platform is private, but by the end of the project in 2014 it will be freely available.
I began developing this platform by experimenting with MS Access, and this was really helpful because I learnt about normalizing databases and thought carefully about the design of the database and how the information related. This made me think about efficiency and how to get my users to enter data easily, consistently, and accurately. In the end, though, Access seemed the wrong way to go, given that we wanted to populate the database collaboratively and to publish it online. The solution I have ultimately used is to design a website powered by Drupal, a widely used, powerful, open-source content management system. I gather Drupal is really intended for people who are comfortable writing PHP code, but I have found you can do a great deal without this knowledge or even without knowing what PHP is (apart from a curiously vowelless noise). Using a cocktail of modules (plug-ins to the core Drupal code which are contributed by a community of developers to extend its functionality) I've been able to set up a website that allows users with permissions to add and edit content using forms, as well as to search and sort existing content. They can do this simultaneously, and from anywhere that they have an internet connection.
The site feels like a relational database management system. Drupal does use a MySQL database, but I still do not know whether the MySQL database behind the scenes organises the information we enter into different tables (I should really look into this!). However, when I turned to Drupal, I took the lessons I learnt from MS Access and applied them to content management, so the platform I have designed feels like a relational database to its users. Drupal allows you to define different content types, and to define 'fields' (think of these like columns in a spreadsheet) for each content type. It creates forms for users to populate those fields. I began by setting up content types which, in my mind, were the equivalent of each of the tables that I had originally made use of in MS Access. I added contributed modules to the core Drupal in order to acquire nice forms for fields. For instance, I added a location module and a date module which made it easy to add, yes, locations and dates. In my opinion, the module that really pushed this solution ahead of the competition from an academic point of view was 'biblio', a module that acts like a citation manager within the website.
In our case, we are trying to record clubs, club membership, and club venues. I have created Drupal content types for 'club', for 'membership records', for 'individuals', and for 'venues'. I separated membership records and individuals into two different content types or 'tables', as I think of them. This was a lesson learnt from normalizing my database structure in Access: one individual can have lots of membership records, so this way I don't have to enter the individual more than once. This also helps us investigate research questions about an individual's clubbish behaviour, allowing the discovery of individuals who belong to many clubs and form nodal points in the networks of improvement we're studying.
The crucial part of the process was how to link these 'tables'. I ended up using a tag team of contributed modules called entity_reference and entity_connect. These modules allowed me to add a field to my content types which was (at least from a user's point of view) equivalent to the foreign key in a conventional database. So, my membership records had fields which referenced a club and an individual, for example. To most intents and purposes, I have created a really usable online relational database management system. The data can even be queried and represented using Structured Query Language (SQL), which is the major benefit of relational databases as a methodology for answering research questions.
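As a rough illustration of the kind of question this structure can answer, the sketch below models clubs, individuals and membership records as three plain SQL tables using Python's built-in sqlite3 module. This is not how Drupal stores its content behind the scenes; the table layout, the column names and the placeholder data ('Club A', 'Person X' and so on) are all invented for the example.

```python
import sqlite3

# Hypothetical schema: each membership record references one club and one
# individual, much as a reference field does from the user's point of view.
SCHEMA = """
CREATE TABLE clubs (
    club_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE individuals (
    individual_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE memberships (
    membership_id INTEGER PRIMARY KEY,
    club_id       INTEGER NOT NULL REFERENCES clubs(club_id),
    individual_id INTEGER NOT NULL REFERENCES individuals(individual_id)
);
"""


def build_example():
    """Create an in-memory database filled with invented placeholder data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO clubs VALUES (?, ?)",
        [(1, "Club A"), (2, "Club B"), (3, "Club C")],
    )
    conn.executemany(
        "INSERT INTO individuals VALUES (?, ?)",
        [(1, "Person X"), (2, "Person Y"), (3, "Person Z")],
    )
    # (club_id, individual_id) pairs: Person X joins three clubs,
    # Person Y joins two, Person Z joins one.
    conn.executemany(
        "INSERT INTO memberships (club_id, individual_id) VALUES (?, ?)",
        [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (3, 3)],
    )
    return conn


def most_connected(conn, min_clubs=2):
    """List individuals who belong to at least min_clubs distinct clubs."""
    return conn.execute(
        """
        SELECT individuals.name,
               COUNT(DISTINCT memberships.club_id) AS club_count
        FROM memberships
        JOIN individuals
          ON memberships.individual_id = individuals.individual_id
        GROUP BY individuals.individual_id
        HAVING COUNT(DISTINCT memberships.club_id) >= ?
        ORDER BY club_count DESC, individuals.name
        """,
        (min_clubs,),
    ).fetchall()


if __name__ == "__main__":
    for name, club_count in most_connected(build_example()):
        print(name, club_count)
```

Because each individual is stored only once, counting their membership records is a single grouped query, which is what makes the 'nodal' members of a network easy to spot.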
We hope that this will ultimately produce a really valuable tool for scholars of the eighteenth century, as well as enriching people's sense of the history of their areas and of sociable or clubbish behaviour. I also hope that my experience of the technology behind all this will generate ideas for other applications of Drupal and relational databases in humanities research and give other non-developers the confidence to dabble in digital humanities. We're now considering the possibilities for wider collaboration that this platform might offer, but perhaps I'll save that for another post.