March 16, 2005

Remixing culture with RDF

Matthew Haughey, Creative Commons

CC is 2 things:

  1. a set of liberal licenses
  2. a search engine to find licensed work

When designing the search infrastructure, the requirements were: decentralised, metadata-aware, manageable by a small organisation, and built on existing toolkits. RDF is a good fit.

Metadata format: the HTML head is generally ignored by search engines, robots.txt hacks are too limited and hacky, and supporting files are too much faff for end users. So they went for RDF documents embedded in HTML comments. This means that you can search for media with a particular re-usability status.
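To make the "RDF in HTML comments" idea concrete, here is a minimal sketch in Python of how a crawler might pull the licence URI out of a page. It is not the code CC used. The sample page, the `extract_licenses` function and the `CommentCollector` class are all made up for illustration, and the `http://web.resource.org/cc/` namespace is my recollection of the CC markup of the time, so treat it as an assumption.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumed namespaces: the CC one is recalled from CC markup of the period.
CC_NS = "http://web.resource.org/cc/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class CommentCollector(HTMLParser):
    """Collects the text of every HTML comment in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


def extract_licenses(html):
    """Return the licence URIs found in RDF blocks hidden in HTML comments."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()

    licenses = []
    for comment in collector.comments:
        if "rdf:RDF" not in comment:
            continue  # an ordinary comment, not embedded metadata
        try:
            root = ET.fromstring(comment.strip())
        except ET.ParseError:
            continue  # malformed RDF/XML: skip it rather than crash the crawl
        for element in root.iter("{%s}license" % CC_NS):
            uri = element.get("{%s}resource" % RDF_NS)
            if uri:
                licenses.append(uri)
    return licenses


# Illustrative page only; not taken from the talk.
SAMPLE_PAGE = """<html><body>
<p>Some photo</p>
<!--
<rdf:RDF xmlns="http://web.resource.org/cc/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="">
    <license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
  </Work>
</rdf:RDF>
-->
</body></html>"""

if __name__ == "__main__":
    print(extract_licenses(SAMPLE_PAGE))
```

Because the metadata sits in a comment, browsers ignore it, while a crawler that knows to look can still read it with an ordinary XML parser.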

Built a prototype crawler + search engine out of Python + PostgreSQL. It's not scalable, but it's a proof of concept.
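A rough sketch of what the database side of such a prototype could look like: one table mapping each crawled URL to its licence, and a query that filters by licence. This is a guess at the shape, not the actual schema. It uses an in-memory SQLite database as a stand-in for PostgreSQL so it runs anywhere, and the table name, function names and example URLs are all invented.

```python
import sqlite3


def build_index(pages):
    """Create a tiny licence index from (url, license_uri) pairs.

    SQLite is used here only as a stand-in for PostgreSQL.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, license TEXT)")
    conn.executemany("INSERT INTO pages (url, license) VALUES (?, ?)", pages)
    conn.commit()
    return conn


def find_by_license(conn, license_uri):
    """Return the URLs of all indexed pages carrying the given licence."""
    rows = conn.execute(
        "SELECT url FROM pages WHERE license = ? ORDER BY url",
        (license_uri,),
    )
    return [row[0] for row in rows]


# Invented crawl results, for illustration only.
CRAWLED = [
    ("http://example.org/photo", "http://creativecommons.org/licenses/by/2.0/"),
    ("http://example.org/song", "http://creativecommons.org/licenses/by-nc/2.0/"),
    ("http://example.org/essay", "http://creativecommons.org/licenses/by/2.0/"),
]

if __name__ == "__main__":
    index = build_index(CRAWLED)
    print(find_by_license(index, "http://creativecommons.org/licenses/by/2.0/"))
```

A single table like this is easy to build but does no full-text ranking, which is one plausible reason a prototype of this kind would not scale.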

Rebuilt the app on Lucene. Lucene is teH r0X0R for building open-source indexing tools; they used Nutch, an app which runs on top of Lucene. The CC search engine is about 500 lines of Java on top of Nutch. The search engine now indexes about 10 million pages.

This talk is not very good.
