Archive for the ‘Documentation’ Category

Final Product Post: MusicNet & The Alignment Tool

29 Jun

This is a final report and roundup of the MusicNet project. We’ll mainly be discussing the project’s primary outputs, but will also give an overview of the project as a whole.

We have two primary prototype outputs from the project:

  1. The Alignment Tool
  2. The MusicNet Codex

We’ll discuss each of these in turn and address what they are, who they are for and how you can use them in your own projects.



Progress Update

10 Mar

It’s time for a short update on how the project is progressing. We’ve had an increasingly feature-complete prototype of our Codex available on our project website since January, and we’ve been working hard to improve it. If you haven’t already, head on over to the Codex and search for a composer.

What have we added since January?

Content Negotiation

One of the most important features we’ve added since January is content negotiation. This enables our Codex to serve up the most appropriate content dependent on the ‘Accept’ header received in the HTTP request. For a more detailed write-up see Dan’s blog post on the MusicNet URI Scheme.

A simple example: if we request Franz Schubert’s URI from a regular web browser we are dereferenced to the HTML representation; if we request the same URI from a semantic web browser we are dereferenced to the RDF representation.
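
For those who want to try this programmatically, a request for the RDF view just needs an RDF media type in the Accept header. A minimal Javascript sketch (the URI shown is a placeholder, not a real MusicNet URI):

    // Sketch only: composerUri is a placeholder standing in for a real MusicNet URI.
    const composerUri = 'http://example.org/person/7ca5e11353f11c7d625d9aabb27a6174#id';

    // Asking for RDF: content negotiation forwards us to the RDF document.
    fetch(composerUri, { headers: { Accept: 'application/rdf+xml' } })
      .then((res) => res.text())
      .then((rdf) => console.log(rdf));

    // Asking for HTML (what a regular browser does) forwards to the HTML document.
    fetch(composerUri, { headers: { Accept: 'text/html' } })
      .then((res) => res.text())
      .then((html) => console.log(html));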

Data Enrichment

We have also been working hard to leverage the data we’ve aligned over the last year to enrich the information provided by our various data partners. Last year we met with the LinkedBrainz team and they provided us with a small set of composer data from MusicBrainz for us to align against. This has allowed us to draw additional information from other open data sources such as the BBC, Wikipedia/DBPedia, IMDB and even the New York Times to provide a more complete representation of the data available about a composer.

This data is available in both the RDF and the HTML representation of the Codex.

e.g. Schubert, Franz (HTML | RDF)

Alignment Progress

Alignment is progressing well and we’re currently at 89% complete.

What is left to do?

One of the discussions the MusicNet team has been involved in since the start of the project has been the need for in-perpetuity hosting of URIs minted by JISC projects, and the proposal of a shared domain dedicated to that purpose.

We’re currently in discussions to be one of the first projects able to make use of such a domain, and we hope that by the end of the project we’ll be able to move our Codex and URIs over to it. This will ensure that the data we’ve exposed remains available after the project’s end.

MusicNet Workshop

We’re also hosting a small workshop on the 12th May at JISC HQ to try to expose more people to the potential of the MusicNet URIs. The workshop will also look more broadly at the current Music & Linked Data landscape and should cater to a broad audience. It’s filling up very quickly, so if you’re interested and haven’t yet made contact please do so soon.

For more details see our announcement.


MusicNet URI scheme and Linked Data hosting

19 Jan

MusicNet’s key contribution is the minting of authoritative URIs for musical composers, which link to records for those composers in different scholarly and commercial catalogues and collections. MusicNet claims authority because the alignment across the sources has been performed by scholars in musicology. The alignment tool and the progress to date have been detailed previously. In this post I will give an overview of our methodology for publishing our work, in terms of the decisions made in choosing our URI scheme and how we model the information using RDF in the exposed Linked Data. I will then describe the architecture for generating the linked data, which has been designed to be easily deployed and maintained, so that it can be hosted centrally in perpetuity by a typical higher education computer science department.

URI Scheme

The URI scheme is designed to expose minimal structural information. Take the URI for Franz Schubert as an example (see below for a note on the volatility of the domain name):

It comprises the domain name, an abstract type (person), an ID taken from the musicSpace hash of the composer (7ca5e11353f11c7d625d9aabb27a6174), and a fragment to differentiate the person from the document describing them (#id).

We have chosen a hash rather than a human-readable label because we want to discourage people from using a URI just because its label looks like the composer they mean, when it might in fact refer to a different composer. This is important in this domain because there are a number of composers with the same or similar names, and part of the alignment process has musicologists make exactly this distinction. By forcing people to resolve the URI and check that it refers to the person they mean, we aim to avoid incorrect references being made. In addition, it gives us the freedom to alter the canonical label for a composer after we have minted the URI, so that we don’t end up with a label-based URI whose metadata carries a different label.
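
As an illustration, a person URI can be assembled from those parts roughly as follows; the domain is a placeholder (see the next section), and the exact path layout is simply our reading of the components listed above:

    // Sketch only: the domain is a placeholder pending the long-term domain
    // discussed below, and the path layout is illustrative.
    const DOMAIN = 'http://example.org';

    // Build a MusicNet person URI from the musicSpace hash of a composer.
    function personUri(hash) {
      return `${DOMAIN}/person/${hash}#id`;  // abstract type + hash ID + #id fragment
    }

    // Franz Schubert, using the hash quoted above:
    console.log(personUri('7ca5e11353f11c7d625d9aabb27a6174'));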

Domain Name

We intend for the domain name to change soon to one which isn’t explicitly tied to mSpace – the current domain is in place purely for our own convenience. In particular, our requirements are a domain that will not cost us anything to re-register in future, will remain in our control (i.e. not get domain-parked if someone forgets to renew it), and will not dissuade people from using it for any partisan or political reasons. Our preferred candidate is still unconfirmed at this point in time, and we may instead have to fall back on a Southampton-hosted domain, which is not preferred, since it might give the impression that the data is a Southampton-centric view of the information, which it is not. For a more in-depth discussion of the proposed solutions see our previous posts.

Ontological Constructs

In addition to the scheme for the URI, we also had to determine the best way to expose the data in terms of the ontological constructs (specifically the class types and predicates) used in the published RDF. We are fortunate that an excellent set of linked data in the musical composer domain already exists, in the form of the BBC /music linked data; for example, the BBC /music site exposes Franz Schubert with a URI of its own.

The BBC’s data uses the Music Ontology heavily, as well as other ontologies such as SKOS, Open Vocab and FOAF. Since we are publishing similar data, it makes sense for us to use the same terms and predicates as they do where possible, which is what we have done.

We are still in the process of finalising how we will model the different labels of composers. In the figure below we offer two possible methods. The first is to create a URI for each composer for every catalogue in which they are listed, publish the label from that catalogue under the new catalogue-based URI, and use owl:sameAs to link it to our canonical MusicNet URI. The second is to “flatten” all labels into simple skos:altLabel links, although this method loses provenance. Currently we do both, and we have not yet decided whether this is necessary or useful.


RDF model for MusicNet alternative labels
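
To make the two approaches concrete, here is a rough Javascript sketch that emits Turtle for the same composer under each model; the URIs, labels and helper names are illustrative placeholders, the predicates are our choices from the ontologies named above, and prefix declarations are omitted:

    // Sketch only: URIs and labels are placeholders, not actual generation code.
    const canonical = 'http://example.org/person/7ca5e11353f11c7d625d9aabb27a6174#id';

    // Option 1: one URI per catalogue entry, linked back with owl:sameAs.
    function catalogueModel(entries) {
      return entries.map((e) => `
    <${e.uri}> a foaf:Person ;
        foaf:name "${e.label}" ;
        owl:sameAs <${canonical}> .`).join('\n');
    }

    // Option 2: flatten every catalogue label onto the canonical URI as skos:altLabel.
    function flattenedModel(entries) {
      const alts = entries.map((e) => `"${e.label}"`).join(', ');
      return `<${canonical}> skos:altLabel ${alts} .`;
    }

    const entries = [
      { uri: 'http://example.org/catalogueA/schubert#id', label: 'Schubert, Franz' },
      { uri: 'http://example.org/catalogueB/schubert#id', label: 'Franz Peter Schubert' },
    ];
    console.log(catalogueModel(entries));
    console.log(flattenedModel(entries));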



Content Negotiation & Best Practice

Similarly, we also follow the BBC /music model of using HTTP 303 content negotiation to serve machine-readable RDF and human-readable HTML from the same URI. Specifically, the model we’ve borrowed is to append “.rdf” when forwarding to the RDF view of the data, and to append “.html” when forwarding to the human-readable view. This is now implemented, and you can try it out yourself by dereferencing the URIs above with different Accept headers.
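
The live implementation uses a simple PHP script and Apache URL rewriting (described below); purely to illustrate the 303 pattern, a minimal Javascript (Node) sketch of the same logic might look like this:

    // Illustrative sketch of the 303 forwarding described above; not the
    // project's actual PHP/Apache implementation.
    const http = require('http');

    http.createServer((req, res) => {
      const accept = req.headers['accept'] || '';
      // Clients asking for RDF are forwarded to the ".rdf" document,
      // everyone else to the human-readable ".html" document.
      const suffix = accept.includes('application/rdf+xml') ? '.rdf' : '.html';
      res.writeHead(303, { Location: req.url + suffix });
      res.end();
    }).listen(8080);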

There are several other offerings from the MusicNet site, some of which have been detailed before. First, there is the MusicNet Codex, the human-facing search engine for MusicNet. In addition, we have created a (draft!) VoiD document that describes the MusicNet data set.

The perceptive among you will notice that the VoiD document links to an RDF dump of all of the individual linked data files (14MB at the time of writing).

Simple Deployment & Hosting

As noted above, our requirements state that our deployment must be as simple as possible for the web admins of a typical higher education computer science department to maintain. In our bid we stated that we would work with the Southampton ECS Web Team to tweak our solution. As such, in order to keep our deployment simple, we have adopted an architecture where all RDF (including the individual Linked Data files for each composer) is generated once and hosted statically. The content negotiation method mentioned above makes serving static RDF files simple and easy to understand for web admins who might not know much about the Semantic Web. Similarly, the VoiD document and RDF dump are generated at the same time. The content negotiation itself is handled by a simple PHP script and some Apache URL rewriting.
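
As a rough sketch of the “generate once, host statically” idea (the file layout and the serialiser here are hypothetical stand-ins, not the project’s actual build script):

    // Sketch: write one RDF file per composer plus a combined dump, so the web
    // server only ever serves static files. File layout and helpers are hypothetical.
    const fs = require('fs');

    // Stand-in for whatever actually serialises a composer record to RDF.
    const composerToRdf = (c) => `<${c.uri}> rdfs:label "${c.label}" .`;

    function publishStatic(composers, outDir) {
      const dump = [];
      for (const c of composers) {
        const rdf = composerToRdf(c);
        fs.writeFileSync(`${outDir}/${c.hash}.rdf`, rdf);  // individual Linked Data file
        dump.push(rdf);
      }
      // The combined dump referenced from the VoiD description.
      fs.writeFileSync(`${outDir}/musicnet-dump.rdf`, dump.join('\n'));
    }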

Benefits of Linked Data

One of the benefits of using Linked Data is that we can easily integrate metadata from different sources, and one of the ways in which we do this is via the aforementioned BBC /music linked data. Specifically, we enrich our Linked Data offering through MusicBrainz: one of the sources of metadata we have aligned is MusicBrainz, based on a data dump we were given by the LinkedBrainz project team. The BBC have also aligned their data to MusicBrainz, and thus we have been able to automatically cross-reference the composers at the BBC with the composers in MusicNet. This means we can link directly to the BBC, which offers a number of benefits. Firstly, users can access BBC content, such as recent radio and television recordings that feature those composers (see the Franz Schubert link above for examples). Secondly, we can harvest some of the BBC’s outward links in order to enrich our own Linked Data offering: specifically, we have harvested the links that the BBC make to pages on IMDB, DBPedia and Wikipedia, among others, which we now re-publish.
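
A rough sketch of the kind of cross-referencing described (the record shapes and the BBC URI pattern here are assumptions made for illustration, not taken from the project code):

    // Sketch: cross-reference MusicNet composers with BBC /music via shared
    // MusicBrainz IDs. Data shapes and the BBC URI pattern are assumptions.
    function linkToBbc(composers /* e.g. [{ hash, mbid }] */) {
      const links = [];
      for (const c of composers) {
        if (!c.mbid) continue;  // not every composer has an aligned MusicBrainz ID
        links.push({
          musicnet: `http://example.org/person/${c.hash}#id`,          // placeholder domain
          bbc: `http://www.bbc.co.uk/music/artists/${c.mbid}#artist`,  // assumed pattern
        });
      }
      return links;  // each pair could then be published as an owl:sameAs triple
    }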

The data flow from the raw data sources to linked data serving is illustrated in the figure below.

MusicNet Architecture Data Flow Diagram

Future Work

The following tasks remain in this area of the project:

  1. Acquire control of a long-term domain name (see the discussion of our preferred candidate above).
  2. Discuss our RDF model with experts in Linked Data, Ontological Modelling and Provenance.
  3. Determine whether we will offer a SPARQL endpoint in future. If we decide not to host one ourselves (because it might not be sustainable once our hosting is passed over to the department), it might be desirable to put the data on the Data Incubator SPARQL host.

This post documents Work Package 3 from the MusicNet project deliverables. MusicNet is funded through the JISCEXPO programme.


End of year roundup

30 Dec

It’s been a busy 2010 for the MusicNet project and we’ve made great progress. Our Alignment Tool is maturing, and the latest code shows significant increases in task speed, making the workflow much more efficient. We expect to release Beta 2 in the new year, so keep checking back for more details.

Alignment Progress (Work Package 4)

The performance and usability improvements to our Alignment Tool (Work Package 2) have had a dramatic effect on our overall alignment progress. We are now at 56% complete, which places us firmly on target to complete the entire dataset before the proposed deadline (end of March 2011).

Codex/User Portal (Work Package 6.2)

Work has also begun on the MusicNet Codex, which aims to be a single point of search for musicologists to find information and links into our data partners’ catalogues. Although this is at a very early beta stage, it is functional and we are adding more composers as and when they are aligned.

Visit the beta of the MusicNet Codex.

The Codex publicly demonstrates for the first time the outputs of the Alignment Tool and shows the integration with the LinkedBrainz project (read about our meetup with the LinkedBrainz project).

Please feel free to leave any feedback on the Codex in the comments.

Linked Data (Work Package 5)

Work is underway to convert the output of the Alignment Tool into usable Linked Data. In the New Year we plan to release our proposed URI Scheme (Work Package 3) and also expose an early version of our data/alignment as Linked Data.


Performance & Usability Improvements

09 Dec

Lately we’ve been working hard to improve the workflow of our Alignment Tool. Based on the real user experience of the musicologist using the tool daily, we were able to implement some simple performance and UX (User Experience) updates that have had a dramatic effect on efficiency.

The improvements we’ve made and the effect they’ve had on the workflow are outlined below – although some of these may seem simple, it’s only in hindsight and after real-world user testing that the need for such tweaks becomes apparent.

Starting Point

When we released Beta 1 of our tool, we ran some benchmarks to get a sense of how quickly the alignment task could be achieved. Initially we found that the rate at which verified matches could be created was 135/hr.

Performance Improvements, Phase 1

Alphabetically auto-sort newly created groups

In Beta 1, newly created groups were pushed to the bottom of the groups list so as to reduce the need to re-sort and re-render a potentially large list – a procedure that typically doesn’t perform well in a Javascript environment. However, this is problematic from a UX perspective: after creating a new group, the alphabetical ordering of the grouped item column is broken, which makes comparing its contents to the alphabetically sorted ungrouped column more time-consuming.

Happily, recent improvements to Javascript engines and their native array functions, such as Array.sort(callback), allow the browser itself to perform the re-ordering efficiently rather than us hand-coding it in a Javascript routine. By altering the behaviour of the alignment tool so that groups added to the grouped item column are now listed in their correct alphabetical position, we were able to improve the user experience and remove the difficulty of comparing the contents of the ungrouped and grouped item columns. In testing the change we were unable to make the browser stall or freeze during the re-sort/re-render, and we did not notice any reduction in interface speed or responsiveness.
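
A minimal sketch of the change, assuming each group carries a display label (the data shapes are illustrative):

    // Sketch: add a newly created group and let the engine's native sort place
    // it in the correct alphabetical position.
    function addGroupSorted(groups, newGroup) {
      groups.push(newGroup);
      groups.sort((a, b) => a.label.localeCompare(b.label));
      return groups;
    }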

Scroll to newly created groups

In Beta 1 this action was the default, as we knew that the new group would be at the bottom of the list. In changing to the re-sorting model we needed to work out where the newly created group resided in the list, and then scroll the list to make sure the element was visible in the list’s viewport.

As it turns out this was quite a simple process: the server returns the ID of the newly created group, which allows us to find the element in the DOM after it has been sorted into place. We can then work out how far to scroll the list based on the newly created group element’s offsetTop property.

We also added a small Javascript ‘blink’ animation to draw the user’s attention to the newly created group.
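
Roughly, the scroll-and-blink behaviour looks like the sketch below; the element IDs and the highlight class are assumptions for illustration:

    // Sketch: scroll the grouped-item column so a newly created group is visible,
    // then briefly highlight it. Assumes the server response includes the new
    // group's ID and that #grouped-list is the scrollable column.
    function revealNewGroup(groupId) {
      const list = document.getElementById('grouped-list');  // assumed container id
      const el = document.getElementById(groupId);
      if (!list || !el) return;
      // Scroll so the element sits near the top of the list's viewport.
      list.scrollTop = el.offsetTop - list.offsetTop;
      // Simple "blink": toggle a highlight class a few times.
      let flashes = 6;
      const timer = setInterval(() => {
        el.classList.toggle('highlight');
        if (--flashes === 0) clearInterval(timer);
      }, 150);
    }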

Highlight List Items

When examining the associated metadata in the right-hand metadata view pane for a group that has been suggested by the system, the items’ metadata might sometimes be ordered differently from how the items were listed in the column. In the Beta 1 code this posed a problem if the entries’ labels were all identical, as it meant there was no way to tell which metadata belonged to which list item.

To solve this we added a hover effect in the metadata view pane which highlights the associated list item, allowing for much quicker and more accurate removal of single items.
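
A sketch of the hover-highlighting, assuming the metadata entries and list items share a data-item-id attribute (the attribute name and selectors are illustrative):

    // Sketch: hovering a metadata entry highlights its corresponding list item.
    document.querySelectorAll('#metadata-pane [data-item-id]').forEach((entry) => {
      const item = document.querySelector(
        `#grouped-list [data-item-id="${entry.dataset.itemId}"]`);
      if (!item) return;
      entry.addEventListener('mouseenter', () => item.classList.add('highlight'));
      entry.addEventListener('mouseleave', () => item.classList.remove('highlight'));
    });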

Phase 1 Improvements led to a new match rate of 169/hr (25% improvement on Beta 1)

Performance Improvements, Phase 2

Fix Diacritics

A lot of the tooling we used to generate the data files required as input to the Alignment Tool didn’t handle diacritics as well as expected. Specifically, all the composers we had imported from the Grove database seemed to have escape characters placed in front of any diacritic. The diacritic itself remained intact, but there were extra characters in the string.

We programmatically removed these characters to aid readability during the alignment process.
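
As an illustration of the clean-up (the actual escape character in the Grove export is not shown here; a backslash is assumed for the example):

    // Sketch: strip stray escape characters that precede accented characters.
    // The backslash is an assumed example of the escape character in question.
    function fixDiacritics(name) {
      // Remove a backslash that immediately precedes any non-ASCII character,
      // leaving the accented character itself untouched.
      return name.replace(/\\(?=[\u0080-\uFFFF])/g, '');
    }

    console.log(fixDiacritics('Dvo\\řák, Antonín'));  // -> "Dvořák, Antonín"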

Create a Merge function

One feature missing in Beta 1 that we anticipated we might need was the ability to merge two or more groups into one. The most common use cases where this is required are (i) where the system generates two different groups for the same composer based on two recurring variations in name usage, or (ii) where the user creates a new group for ungrouped items before realising that a suitable group for these items already existed.

This function has now been added and can be found in the Alignment Tool SVN repository on Google Code.
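
A minimal sketch of what such a merge might look like, using hypothetical in-memory group objects rather than the actual repository code:

    // Sketch: merge the items of several groups into a single target group.
    // Group objects here are hypothetical: { id, label, items: [...] }.
    function mergeGroups(groups, targetId, sourceIds) {
      const target = groups.find((g) => g.id === targetId);
      if (!target) throw new Error('Unknown target group: ' + targetId);
      for (const id of sourceIds) {
        const source = groups.find((g) => g.id === id);
        if (!source || source === target) continue;
        target.items.push(...source.items);  // move all items across
      }
      // Drop the now-empty source groups from the list.
      return groups.filter((g) => g === target || !sourceIds.includes(g.id));
    }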

Phase 2 Improvements led to a new rate of 279/hr (106% improvement on Beta 1)


Governance Model

13 Sep

In conjunction with OSS Watch we’ve started to create a Governance Model for the MusicNet project. The first draft can now be downloaded from the Google Code repository:

Governance Model (Draft v1.0)

If you’d like to contribute to the project then please don’t hesitate to get in contact.


WP1: Data Triage Report

20 Jul

We recently conducted an assessment of the metadata available in our data partners’ digital catalogues as part of our stated Work Package 1 deliverable. The aim of this assessment was to ascertain which metadata fields were good candidates to be exposed in our Linked Data.

Here is the introduction to the report:

This document outlines the metadata we aim to expose as part of the MusicNet Project. Decisions on what metadata to include are based on the following factors:

  1. Musicologists’ needs (recommended by David Bretherton)
  2. Technical feasibility (recommended by Joe Lambert)
  3. Licensing Restrictions

Our remit was to expose only information that is available in the public domain but to ensure enough is made available to allow a composer to be unambiguously identified.

You can download the full report from our Google Code Repository:
WP1: Data Triage Report v1.0 (67KB)



22 Jun

This is the budget submitted for the MusicNet project proposal.


Project Timeline, Workplan & Methodology

22 Jun

Here is the Gantt chart & work packages submitted with the MusicNet project proposal.

Work package 1 Data Triage

In collaboration with musicologists, decide on the key types of information to expose about composers.

Deliverable 1: Report

Work package 2 Composer Alignment Tool

Tools will need to be created to automatically recognise composer matches between multiple data sources. These tools should also allow input from musicology experts to improve matches, by manually approving them and by creating patterns for common recurring errors.

The musicSpace project has laid the groundwork for this alignment by mapping each different dataset to a common ontology created as part of the project in collaboration with musicology experts.

Deliverable 1: Tools for alignment

Work package 3 URI Scheme

Following Cabinet Office review guidelines on publishing URIs, an appropriate and sustainable URI scheme will be created.

Deliverable 1: URI Scheme

Deliverable 2: Justification for scheme

Work package 4 Perform Data Alignment

Using the tools developed in WP2, each of the available data sources will be aligned.

Deliverable 1: Data mappings between each of the data sources

Work package 5 Expose Linked Data

Using the scheme decided upon in WP3, URIs will be minted and structured linked data will be published to represent musical composers.

Deliverable 1: Linked Data

Work package 6 Prototype Development

Once the linked data has been published a prototype timeline visualisation will be produced.

Deliverable 1: Simile timeline using linked data from WP3

Deliverable 2: Codex on top of linked data

Deliverable 3: Documentation demonstrating how the prototypes were achieved

Work package 7 Community Engagement

A one day workshop to encourage use and reuse of the project outputs.

Deliverable 1: Workshop review

Work package 8 Reports & Guidance

Important research discoveries will be disseminated to the wider community via the project website, published deposits into the local EPrints repository as well as monthly posts to the project blog.


All project outputs will be deposited into the e-Framework Knowledgebase and/or the JISC InnovationBase. A similar approach to that used in the musicSpace project will be taken to ensure that sound agile software engineering practices are followed. The project will build upon existing specifications and standards from W3C, JISC, and other projects; in particular, it is expected to reference agreed standards such as RDF. Accessibility of web-based systems and software will be ensured by conforming to the W3C Web Accessibility Initiative guidelines at level Double-A.


Project Team Relationships and End User Engagement

22 Jun

These are the members of the MusicNet project team:

mc schraefel is a Reader in Computer Science at the University of Southampton. She has led a number of JISC-funded projects, including the musicSpace project upon which this proposal builds. More recently she has been a Co-Investigator on the EPSRC-funded EnAKTing project, whose main concern is the exposure of UK Government information as Linked Data.

Joe Lambert is a Research Fellow within the Intelligence, Agents and Multimedia group at the University of Southampton. He is the primary UI developer on the mSpace faceted browser and has worked on the JISC-funded Richtags project as well as the Arts and Humanities Research Council (AHRC) funded musicSpace. He also worked on the JISC-funded OpenPSI project, where he worked with public sector data in SPARQL databases, producing a standalone SPARQL version of the popular faceted browser, mSpace.

Daniel A. Smith is a Research Fellow and the primary developer of the mSpace server; he has also been lead developer on the JISC-funded Richtags project and the JISC/AHRC/EPSRC-funded musicSpace. His doctoral thesis focused on the intelligent linking of remotely hosted linked data resources in a process called ‘pivoting’. He is also a regular contributor to the wider Linked Data research community, including engagement with the BBC.

David Bretherton is a Research Fellow in the Music department at the University of Southampton. He has a doctorate in Musicology from Oxford University and has been the primary musicology consultant on the musicSpace project.

Engagement with the Community

To ensure that the URI resources are both published for use and made accessible for re-use by other tools and services, we will be working with stakeholders throughout the process, with regular review and updates of our approach. We have used this kind of development/evaluation approach successfully on projects like musicSpace.

In the first instance, the project technical team will be working regularly throughout the lifecycle of the project with our musicologist colleagues here at the University of Southampton. In particular we will be running standard evaluation and development processes to refine both the Codex and the Visualisation tools – the main outward-facing interfaces in the project.

At regular intervals we will also deploy beta prototypes of these services with our distributed stakeholders from Durham University and Royal Holloway, who are all on board to participate in these trials. Their letters of support for the project are appended.

A workshop will also be held toward the end of the project to promote the use and uptake of the linked data outputs of the project. The workshop will be suitable for musicologists as well as computer scientists and cater for a range of abilities and range of familiarity with the semantic web.

The aim of the workshop is to give tutorials on the use and reuse of the project outputs and to encourage linking to the minted URIs by publishers of existing linked data. We have liaised with Yves Raimond at the BBC, which publishes a large amount of linked data, including classical music performances from BBC Radio 3, and he has expressed an interest in such a workshop. We have also contacted DevCSI, who have agreed to aid our engagement with the UKHE developer community, and the DCC, who have agreed to help by linking to the announcements for our proposed workshop.

We anticipate that the outcomes from the project will also continue to be of interest to various research venues where we have published this kind of work in the past, such as the International Society for Music Information Retrieval (ISMIR) and International Association of Music Libraries.