MusicNet’s key contribution is the minting of authoritative URIs for musical composers, that link to records for those composers in different scholarly and commercial catalogues and collections. MusicNet claims authority because the alignment across the sources has been performed by scholars in musicology. The alignment tool and the progress to date has been detailed previously. In this post I will overview our methodology for publishing our work, in terms of the decisions made in choosing our URI scheme and how we model the information using RDF in the exposed Linked Data. I will then describe the architecture for generating the linked data, which has been designed to be easily deployed and maintained, so that it can be hosted centrally in perpetuity by a typical higher education computer science department.
The URI scheme is designed to expose minimal structural information, for example, the URI for Franz Schubert is currently (see below for a volatility note):
It is comprised of the domain name (musicnet.mspace.fm), an abstract type (person), an ID taken from the musicSpace hash of the composer (7ca5e11353f11c7d625d9aabb27a6174) and a fragment to differentiate the document from the person (#id).
We have chosen a hash rather than a human-readable label because we want to avoid people using the URI because they think that it refers to a composer when it might refer to a different composer. This is important in this domain because there are a number of composers with the same or similar names. Part of the alignment process has musicologists make this distinction. By forcing people to resolve the URI and check that it is the person they are referring to, we aim to avoid incorrect references being made. In addition it gives us the freedom to alter the canonical label for a composer after we have minted the URI, so that we don’t have a label-based URI with a different label in its metadata.
We intend for the domain name to change soon from one which isn’t explicitly tied to mSpace – this is in place right now for convenience to us. In particular our requirements are a domain that will not cost us anything to re-register in future, will remain in our control (i.e. not get domain parked if someone forgets to renew), and will not dissuade people from using it for any partisan or political reasons. The closest we might reasonably get is musicnet.data.ac.uk, although this is still unconfirmed at this point in time, and we may have to instead use musicnet.soton.ac.uk or musicnet.ecs.soton.ac.uk, which are not preferred, since they might give the impression that the data is a Southampton-centric view of the information, which it is not. For a more in depth discussion of a proposed solutions see our previous posts (data.ac.uk proposal & data.ac.uk revisited)
In addition to the scheme for the URI, we also had to determine the best way to expose the data in terms of the ontological constructs (specifically the class types and predicates) used in the published RDF. We are fortunate that an excellent set of linked data in the musical composer domain already exists, in the form of the BBC /music linked data. For example, the BBC /music site exposes Franz Schubert with the URI:
The BBC’s data uses the Music Ontology heavily, as well as other ontologies such as SKOS, Open Vocab and FOAF. Since we are publishing similar data, it makes sense for us to use the same terms and predicates as they do where possible, which is what we have done.
We are still in the process of finalising how we will model the different labels of composers. In the figure below we offer two possible methods, the first is to create a URI for each composer for every catalogue that they are listed in, publishing the label from that catalogue under the new catalogue-based URI, and use owl:sameAs to link it to our canonical MusicNet one. The second method is to “flatten” all labels as simple skos:altLabel links, although this method loses provenance. Currently we do both, and we’ve not finalised whether this is necessary or useful.
Content Negotiation & Best Practice
Similarly, we also follow the BBC /music model of using HTTP 303 content negotiation to serve machine-readable RDF and human-readable HTML from the same URI. Specifically, the model we’ve borrowed is to append “.rdf” when forwarding to the RDF view of the data, and to append “.html” when forwarding to the human readable view of the data. This is now implemented, and you can try this out yourself with the above URIs, which you can turn into the following:
There are several other offerings from the MusicNet site, some of which have been detailed before. First, the MusicNet Codex, which is the human search engine for MusicNet. In addition we have also created a (draft!) VoiD document that describes the MusicNet data set, available here:
The perceptive among you will notice that the VoiD document links to an RDF dump of all of the individual linked data files, available here (14MB at time of writing):
Simple Deployment & Hosting
As noted above, our requirements state that our deployment must be as simple as possible to maintain by typical higher education computer science department web admins. In our bid we stated that we will work with the Southampton ECS Web Team to tweak our solution. As such, in order to keep our deployment simple, we have adopted an architecture where all RDF (including the individual Linked Data files for each composer) are generated once and hosted statically. The content negotiation method (mentioned above) makes serving static RDF files simple and easy to understand by web admins that might not know much about the Semantic Web. Similarly, the VoiD document and RDF dump get generated at the same time. The content negotiation is handled by a simple PHP script and some Apache URL rewriting.
Benefits of Linked Data
One of the benefits of using Linked Data is that we can easily integrate metadata from different sources. One of the ways in which we use this is using the aforementioned BBC /music linked data. Specifically, we enrich our Linked Data offering through the use of MusicBrainz. One of the sources of metadata that we have aligned is musicbrainz, based on a data dump we were given by the LinkedBrainz project team. The BBC also have aligned their data to Musicbrainz, and thus we have been able to automatically cross-reference the composers at the BBC with the composers in MusicNet. Thus, we can link directly to the BBC, which offers a number of benefits. Firstly, it means that users can access BBC content, such as recently radio and television recordings that feature those composers (see the Franz Schubert link above, for examples), but also that we can harvest some of the BBC’s outward links in order to enrich our own Linked Data offering. Specifically, we have harvested links that the BBC make to pages on IMDB, DBPedia, Wikipedia, among others, which we now re-publish.
The data flow from the raw data sources to linked data serving is illustrated in the figure below.
The following tasks remain in this area of the project:
- Acquire control of a long-term domain name (preferably musicnet.data.ac.uk, see above).
- Discuss our RDF model with experts in Linked Data, Ontological Modelling and Provenance.
- Determine if we will offer a SPARQL endpoint in future. If we decide not to ourselves (because it might not be sustainable once our hosting is passed over to the department), it might be desirable to put the data on the Data Incubator SPARQL host.