What's a document management system?
A theme that keeps coming up when I talk to colleagues, both within ITS and in academic and administrative departments, is that lots of people want a Document Management System (DMS for short from now on) to support their work. So I’ve agreed to try to pull together some kind of summary of what people actually mean when they say DMS (do they all mean the same thing?) and thus what kind of system we might be looking for.
As luck would have it, I know nothing about what a DMS is or does, nor am I very aware of what products there are in this space. This is either a gross disadvantage or a refreshing lack of preconceptions allowing for open-minded consideration of the issues. However, some of the problem domains are pretty easy to understand: archiving of documents we need to keep around for legal or business reasons; users who want to work together on authoring documents; users who want a record of the history of a document. So my discussions and reading so far lead me to believe that a DMS might encompass some, all, none or fewer of the following:-
Document creation and editing
A DMS should support the creation and editing of documents using the desktop applications people already use. So you should be able to create and edit your Word documents, your Photoshop images, your AutoCAD drawings, etc. just as you do at the moment, but store them in the DMS instead of on your local hard drive or a networked drive. An implication of this is that there would have to be some way to connect to the DMS directly from your desktop; it would add too much friction if you had to use a web app to download the document, edit it, save it to your hard disk, then re-upload it. You’d need to be able to open the document directly from the DMS. That suggests you’d need SMB/CIFS or WebDAV support, and possibly SFTP, especially if this kind of access was also supposed to work off-campus (when you’re at home, or collaborating with someone at another university) as well as on.
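To give a flavour of what “connecting directly” means at the protocol level, here is a minimal sketch of the WebDAV request a desktop client sends to list a folder: a PROPFIND (defined in RFC 4918) with a `Depth: 1` header, asking for a few standard `DAV:` properties. The server URL and the function name `propfind_request` are made-up placeholders, and the code only builds the request with Python’s standard `urllib.request`; it does not send it.

```python
import urllib.request

# Ask for a few standard DAV: properties of each item in the folder.
PROPFIND_BODY = b"""<?xml version="1.0" encoding="utf-8"?>
<D:propfind xmlns:D="DAV:">
  <D:prop>
    <D:displayname/>
    <D:getlastmodified/>
    <D:getcontentlength/>
  </D:prop>
</D:propfind>"""


def propfind_request(folder_url: str, depth: str = "1") -> urllib.request.Request:
    """Build (but do not send) a WebDAV PROPFIND for a folder.

    Depth "1" means the folder itself plus its immediate children.
    The URL passed in is assumed to point at a hypothetical WebDAV server.
    """
    return urllib.request.Request(
        folder_url,
        data=PROPFIND_BODY,
        method="PROPFIND",
        headers={"Depth": depth, "Content-Type": "application/xml; charset=utf-8"},
    )


if __name__ == "__main__":
    req = propfind_request("https://dms.example.ac.uk/dav/estates/")
    print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; a WebDAV server replies with a 207 Multi-Status XML document containing one `<D:response>` element per item. In practice the operating system’s own WebDAV client (or a mapped network drive over SMB) does this for you, which is the point: the user just sees a folder.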
And to support editing properly in this way, you’d also need to be able to lock documents, so that while I’m editing one, you can’t. And you might want different sorts of locks: a lock which says “I have this document open for editing right now” is one sort, but you might also want to be able to say “I’m going to be working on this document, on and off, for the next week. Nobody else should be able to change it until a pre-defined time comes around, or until I explicitly signal that I’m done with it.” Implicit in this is the need to set permissions on your files and folders to control who can see them, edit them, comment on them, allow others to edit them, etc.
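To make the two kinds of lock concrete, here is a minimal sketch, assuming a single in-memory table holding at most one lock per document path. The names `Lock`, `LockTable`, `acquire`, `release` and `can_edit` are hypothetical and not taken from any DMS product. A real system would persist locks and tie them into its permissions model. (WebDAV does define its own LOCK method with an optional Timeout header, so a time-limited lock is at least expressible over that protocol.)

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Lock:
    """Hypothetical lock record; field names are illustrative only."""
    holder: str
    kind: str                    # "session" (open right now) or "reservation" (on and off)
    expires: Optional[datetime]  # None means "until explicitly released"


class LockTable:
    """Tracks at most one lock per document path (a sketch, not a real DMS API)."""

    def __init__(self) -> None:
        self._locks: dict[str, Lock] = {}

    def current(self, path: str, now: datetime) -> Optional[Lock]:
        lock = self._locks.get(path)
        if lock is not None and lock.expires is not None and now >= lock.expires:
            # A time-limited lock lapses on its own once its expiry passes.
            del self._locks[path]
            return None
        return lock

    def acquire(self, path: str, user: str, kind: str, now: datetime,
                duration: Optional[timedelta] = None) -> bool:
        lock = self.current(path, now)
        if lock is not None and lock.holder != user:
            return False  # somebody else holds it
        expires = now + duration if duration is not None else None
        self._locks[path] = Lock(user, kind, expires)
        return True

    def release(self, path: str, user: str, now: datetime) -> bool:
        """Explicit "I'm done with it" signal; only the holder may release."""
        lock = self.current(path, now)
        if lock is None or lock.holder != user:
            return False
        del self._locks[path]
        return True

    def can_edit(self, path: str, user: str, now: datetime) -> bool:
        lock = self.current(path, now)
        return lock is None or lock.holder == user
```

With this sketch, a week-long “reservation” is just `acquire(path, user, "reservation", now, timedelta(days=7))`: other users are refused until either the week is up or the holder calls `release`. A “session” lock is the same call without a duration, released when the application closes the file.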
Another feature you’d want over and above what you get from a normal file system is version history. As changes are made to a document in the DMS, metadata about the changes should be stored so that it’s possible to see a history showing who changed the document and when, and you might also want to store all the previous versions of the document if you had disk space to burn.
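As a rough illustration of the difference between keeping only “who changed it and when” and keeping every previous version, here is a minimal sketch. `Version`, `VersionedDocument` and the `keep_old_content` flag are hypothetical names; the flag stands in for the “disk space to burn” decision. When it is off, older versions keep their metadata but lose their bytes.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Version:
    number: int
    author: str
    saved_at: datetime
    content: Optional[bytes]  # None once the bytes of an old version are discarded


@dataclass
class VersionedDocument:
    """Hypothetical document that records who saved each version and when."""
    name: str
    keep_old_content: bool = True  # False: keep metadata only for older versions
    history: list[Version] = field(default_factory=list)

    def save(self, author: str, content: bytes, when: datetime) -> Version:
        if not self.keep_old_content and self.history:
            # Discard the previous version's bytes but keep its who/when record.
            self.history[-1].content = None
        version = Version(len(self.history) + 1, author, when, content)
        self.history.append(version)
        return version

    def contents_of(self, number: int) -> Optional[bytes]:
        """Stored bytes for a 1-based version number (None if not retained)."""
        return self.history[number - 1].content

    def log(self) -> list[str]:
        """The 'who changed it and when' listing, oldest first."""
        return [f"v{v.number} {v.author} {v.saved_at:%Y-%m-%d %H:%M}" for v in self.history]
```

The latest version’s content is always kept; the flag only governs whether earlier versions’ bytes survive, which is where the storage cost grows.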
Archiving and lifecycle management
Once you’re done editing your document, a feature which several people have mentioned is the ability to archive it: a secure, stable place for long-term storage of a “frozen”, unchangeable version of the document. There are lots of documents which make the transition into this state: committee agendas, minutes and papers, annual reports, blueprints, etc. There’s also an interesting question about the lifecycle of archived documents. Some documents, notably those which contain personal data, may not be kept for longer than is required to perform the work of the institution, so they may have archiving rules such as “Keep for five years, then delete”. Others may be “Keep indefinitely”, but that raises challenges of its own, since it implies that the storage requirements for your DMS are going to rise every year. And how long, realistically, does “indefinitely” mean? Our Estates office have paper documents going back forty years. Is it reasonable to try to design a DMS to store documents for that kind of period? What’s the lifespan of a given document format (e.g. Word or AutoCAD)? Five years? Ten?
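Rules like “Keep for five years, then delete” and “Keep indefinitely” can be expressed quite simply. The sketch below assumes, hypothetically, that the retention period is counted from the date a document was archived and that “indefinitely” means no disposal date at all. `RetentionRule`, `disposal_date` and `due_for_deletion` are illustrative names, not any product’s API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    """Hypothetical retention rule; years=None means 'keep indefinitely'."""
    name: str
    years: Optional[int]


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 28 February.
        return d.replace(year=d.year + years, day=28)


def disposal_date(archived_on: date, rule: RetentionRule) -> Optional[date]:
    """Date the archived document becomes due for deletion, or None if never."""
    if rule.years is None:
        return None
    return add_years(archived_on, rule.years)


def due_for_deletion(archived_on: date, rule: RetentionRule, today: date) -> bool:
    due = disposal_date(archived_on, rule)
    return due is not None and today >= due
```

A nightly job could run `due_for_deletion` over the archive and flag documents for review or deletion. Every document under a `years=None` rule is one that never leaves, which is exactly why storage keeps growing.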
Sharing, publishing and retrieval
Once you’ve put a document into your DMS, you’re likely to want to share it with some people, so you need the same kind of permissions system as for editing, but applied to viewing. Equally importantly, you need to be able to find your document again later, and possibly other people need to be able to find it too. Web sites tend to allow browsing through a hierarchical structure, but a DMS may or may not work in that way, so good indexing, searching and metadata become important. Metadata is particularly relevant because not every file type contains content which can be indexed and searched; if the file I’ve uploaded is an image, it’s effectively unsearchable unless I also supply a description or some keywords alongside it. (This is particularly important if you plan to scan lots of paper documents and add them to your DMS; unless you intend to do OCR, a slow and expensive proposition, what you’ll have is effectively just an image, so it’ll only be discoverable if its metadata is good enough.)
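The point about images and scans can be seen in a toy search. This is a minimal sketch under simple assumptions: each hypothetical `Document` has some extracted text (empty for an image or an un-OCR’d scan), optional user-supplied description and keywords, and a set of users allowed to view it. `search` returns only documents the user may view that contain every query word. A real DMS would use a proper full-text index rather than scanning every document.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical DMS entry: extracted text plus user-supplied metadata."""
    title: str
    text: str = ""             # empty for images and un-OCR'd scans
    description: str = ""
    keywords: set[str] = field(default_factory=set)
    viewers: set[str] = field(default_factory=set)  # who may see it


def tokens(s: str) -> set[str]:
    """Lower-cased word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", s.lower()))


def searchable_terms(doc: Document) -> set[str]:
    return tokens(" ".join([doc.title, doc.text, doc.description, *doc.keywords]))


def search(docs: list[Document], user: str, query: str) -> list[str]:
    """Titles of viewable documents containing every word of the query."""
    wanted = tokens(query)
    return [d.title for d in docs
            if user in d.viewers and wanted <= searchable_terms(d)]
```

An untagged scan is only findable by its file name, whereas the same kind of image with a one-line description and a couple of keywords turns up for a search on “boiler” or “floor plan”; and in both cases the view permission filters results before anyone sees them.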
An interesting extra wrinkle which some people have mentioned is that once you’ve got the ability to share documents and edit them collaboratively, you might want other tools to help your collaboration too. So if you and I are working on a research paper, or a design for a new Library, then as well as the documents we’re creating, perhaps we’d also like to-do lists for the participants; maybe a calendar to show due dates, or Gantt charts, or ways of leaving messages for each other like a mini discussion forum. By this point, I think, you’ve moved beyond pure DMS into a different space. But I can see how, in people’s minds, the two spaces might be logically linked, and if you’re doing one, you might well want to do the other around it.
Some things which I think are probably out of scope for our purposes are:-
- Real-time collaborative editing: two or more people working on a document at the same time, able to see each other’s changes live on their respective screens. Google Docs lets you do this for word-processed documents and spreadsheets, but short of building our own web apps to do the same, I don’t think this is something you could easily get from a DMS; it would need your editing applications (your word processor, your spreadsheet tool, etc.) to support this kind of editing explicitly, and I don’t think many of them, if any, do. Users who want this feature should probably be directed towards Google Docs or Zoho or whatever.
- Workflow. Some papers I’ve read have suggested that a DMS could be the tool by which you define and enforce your workflow for certain types of document. So if I create an invoice within the DMS, the system knows that because it’s an invoice, and it’s for more than £5K, it should go first to my head of department for approval, and then to the finance office for processing, and a record should be created in SAP, and so on, and so on. I can see how this could be useful, but I don’t think there’s any realistic chance of implementing this in our very diverse and decentralised environment. So perhaps workflow support is out of scope.
- Document scanning. At least some of the people who want a DMS want to scan lots of paper documents and put them into it. My presumption at the moment is that the scanning and possible OCR work would be a separate project to the creation of a DMS, and the DMS engine wouldn’t particularly distinguish, or have additional support for, scanned documents as opposed to documents which were fully digital.
- Records management. I’m not as sure about this as I am about the other exclusions, but it seems to me that records management, where you have a set of documents which are all in the same, highly structured format like, er, records (in a database), is a niche of its own within document management, and the general purpose nature of a system which allows you to edit and store any kind of file may not be sufficient for more highly structured data.
Phew. So one of the first questions which occurs to me is: are all these activities really the same, in the sense that a single application could or should support them? Or is long-term archiving conceptually and technically different from shared editing? I guess that might become clearer once we start to consider possible products in this space, though again, I don’t know much about what products there are or what their individual strengths and weaknesses are. Several people I’ve spoken to so far have explicitly suggested Microsoft SharePoint as being what they’re thinking of when they say ‘DMS’; other products I’ve heard mentioned include Alfresco as a sort of open source SharePoint, FormScape (which I believe may already be in use within the Finance Office), and Documentum as a sort of heavyweight market leader in this space. One thing we’d need to watch out for with most off-the-shelf systems is that they generally claim to do a lot more than just document management; SharePoint in particular is like a Swiss Army knife of a server, claiming to do document management, web content management, portal management, project management and, for all I know, moon landing management. We’d need to be sure that we could wall the application off so that it doesn’t offer features which compete with tools we already use.
Another possibility is to consider whether we could build something for the job, or adapt one of our existing applications. Files.Warwick is probably the closest to what would be required, though it lacks version history, check-in/check-out, the idea of a “locked” archive and, most challengingly of all, any way to connect to it directly from your desktop (other than FTP). But it does have some of what’s needed: granular permissions, easy sharing, notifications and so on. Then again, I wonder whether the name recognition SharePoint seems to enjoy would matter, in that anything which isn’t SharePoint runs the risk of being rejected on that basis alone.
Anyway, we’re a long way from package selection. But it does seem as if some sense of what we might be looking for is starting to emerge from the mist.