
Posts Tagged ‘outputs’

Add MusicNet data to COPAC with ‘Composed’ bookmarklet

03 Oct

Here’s a great example of how the MusicNet data can be used to enhance existing sites. The ‘Composed’ bookmarklet decorates an existing COPAC composer record with all the extra information that MusicNet contains about that person.

Head on over to this blog post for more details. Incidentally this was created as an entry to the UK Discovery competition we blogged about earlier in the year.

So, who’s going to turn this into a Greasemonkey script so that the bookmarklet isn’t needed?

 
 

UK Discovery Developer Competition features the MusicNet dataset

11 Jul

The MusicNet dataset has been included in the UK Discovery global developer competition. The rules of the competition are simple: build an app or tool that makes use of at least one of the 10 featured datasets.

UK Discovery is working with libraries, archives and museums to open up data about their resources for free re-use and aggregation. DevCSI is working with developers in the education sector, many of whom will have innovative ideas about how to exploit this open data in new applications.

This Developer Competition runs throughout July 2011. It starts on Monday 4 July – Independence Day, a good day for liberating data – and closes on Monday 1 August. It’s open to anyone anywhere in the world.

For more information about the competition see http://discovery.ac.uk/developers/competition/. Prizes are available for the best entries.

 
 

Final Product Post: MusicNet & The Alignment Tool

29 Jun

This is a final report and roundup of the MusicNet project. We’ll mainly be discussing the primary outputs of the project but will also cover an overview of the project as a whole.

The project has two primary prototype outputs:

  1. The Alignment Tool
  2. The MusicNet Codex

We’ll discuss each of these in turn and address what they are, who they are for and how you can use them in your own projects.


 

Progress Update

10 Mar

It’s time for a short update on how the project is progressing. An increasingly full-featured prototype of our Codex has been available on the project website since January, and we’ve been working hard to improve it. If you haven’t already, head over to http://musicnet.mspace.fm/codex and search for a composer.

What have we added since January?

Content Negotiation

One of the most important features we’ve added since January is content negotiation. This enables the Codex to serve the most appropriate content depending on the ‘Accept’ header received in the HTTP request. For a more detailed write-up see Dan’s blog post on the MusicNet URI Scheme.

A simple example would be:

Franz Schubert’s URI is: http://musicnet.mspace.fm/person/7ca5e11353f11c7d625d9aabb27a6174

If we request this URI from a regular web browser we are redirected to the HTML content at: http://musicnet.mspace.fm/person/7ca5e11353f11c7d625d9aabb27a6174.html

However, if we request it from a semantic web browser we are redirected to the RDF content at: http://musicnet.mspace.fm/person/7ca5e11353f11c7d625d9aabb27a6174.rdf
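To make the redirect rule concrete, here is a minimal Python sketch of how a server might choose between the two representations from the ‘Accept’ header, honouring q-values. It is not the Codex’s code; the set of media types and the rule that browsers and unknown clients fall back to HTML are our assumptions.

```python
# Hypothetical sketch: choose ".rdf" or ".html" for a MusicNet-style URI
# from an HTTP Accept header, honouring q-values. The media types and the
# HTML fallback are assumptions, not the Codex's actual code.

RDF_TYPES = {"application/rdf+xml"}
HTML_TYPES = {"text/html", "application/xhtml+xml", "text/*", "*/*"}


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so equal q-values keep the client's order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def choose_extension(accept_header: str) -> str:
    """Return ".rdf" for RDF-preferring clients, otherwise ".html"."""
    for media_type, q in parse_accept(accept_header or ""):
        if q <= 0:
            continue
        if media_type in RDF_TYPES:
            return ".rdf"
        if media_type in HTML_TYPES:
            return ".html"
    return ".html"


uri = "http://musicnet.mspace.fm/person/7ca5e11353f11c7d625d9aabb27a6174"
print(uri + choose_extension("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
print(uri + choose_extension("application/rdf+xml"))
```

Sorting by q-value first means a client that lists HTML but weights RDF higher still receives RDF.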

Data Enrichment

We have also been working hard to use the data we’ve aligned over the last year to enrich the information provided by our various data partners. Last year we met with the LinkedBrainz team, who provided us with a small set of composer data from MusicBrainz to align against. This has allowed us to draw additional information from other open data sources such as the BBC, Wikipedia/DBpedia, IMDb and even the New York Times, giving a more complete picture of the data available about a composer.

This data is available in both the RDF and the HTML representation of the Codex.

e.g. Schubert, Franz (HTML | RDF)

Alignment Progress

Alignment is moving on well and we’re currently at 89%.

What is left to do?

One of the discussions the MusicNet team has been involved in since the start of the project concerns data.ac.uk and the need to host URIs minted by JISC projects in perpetuity.

We’re currently in discussions to be one of the first projects to make use of this domain, and we hope that by the end of the project we’ll be able to move our Codex and URIs over to a suitable domain such as musicnet.data.ac.uk. This will ensure that the data we’ve exposed remains available after the project ends.

MusicNet Workshop

We’re also hosting a small workshop on 12 May at JISC HQ to introduce more people to the potential of the MusicNet URIs. The workshop will also look more broadly at the current Music & Linked Data landscape and should cater to a broad audience. It’s filling up very quickly, so if you’re interested and haven’t yet made contact, please do so soon.

For more details see our announcement

 
 

MusicNet URI scheme and Linked Data hosting

19 Jan

MusicNet’s key contribution is the minting of authoritative URIs for musical composers that link to records for those composers in different scholarly and commercial catalogues and collections. MusicNet claims authority because the alignment across the sources has been performed by scholars in musicology. The alignment tool and our progress to date have been detailed previously. In this post I will give an overview of our methodology for publishing our work: the decisions behind our URI scheme, and how we model the information using RDF in the exposed Linked Data. I will then describe the architecture for generating the Linked Data, which has been designed to be easy to deploy and maintain, so that it can be hosted centrally in perpetuity by a typical higher education computer science department.

URI Scheme

The URI scheme is designed to expose minimal structural information, for example, the URI for Franz Schubert is currently (see below for a volatility note):

http://musicnet.mspace.fm/person/7ca5e11353f11c7d625d9aabb27a6174#id

It comprises the domain name (musicnet.mspace.fm), an abstract type (person), an ID taken from the musicSpace hash of the composer (7ca5e11353f11c7d625d9aabb27a6174) and a fragment that distinguishes the person from the document describing them (#id).
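The four parts can be pulled apart programmatically. This sketch uses Python’s standard `urllib.parse`; the helper name `parse_musicnet_uri` and the returned field names are hypothetical, chosen for illustration.

```python
from urllib.parse import urlsplit


def parse_musicnet_uri(uri: str) -> dict[str, str]:
    """Split a MusicNet-style person URI into its parts (hypothetical helper)."""
    parts = urlsplit(uri)
    segments = parts.path.strip("/").split("/")
    if len(segments) != 2:
        raise ValueError(f"expected /<type>/<id>, got {parts.path!r}")
    entity_type, entity_id = segments
    return {
        "domain": parts.netloc,
        "type": entity_type,
        "id": entity_id,
        "fragment": parts.fragment,
        # Dropping the fragment gives the URI of the document about the person
        "document": uri.split("#", 1)[0],
    }


info = parse_musicnet_uri(
    "http://musicnet.mspace.fm/person/7ca5e11353f11c7d625d9aabb27a6174#id"
)
print(info["type"], info["id"], info["fragment"])
```

The `document` entry shows why the fragment matters: the URI without `#id` names the web document, while the URI with it names the composer.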

We have chosen a hash rather than a human-readable label because we want to stop people from using a URI on the assumption that it refers to the composer they have in mind, when it might in fact refer to a different composer. This matters in this domain because a number of composers share the same or similar names, and part of the alignment process has musicologists make exactly this distinction. By forcing people to resolve the URI and check that it identifies the person they mean, we aim to prevent incorrect references. A hash also gives us the freedom to change the canonical label for a composer after we have minted the URI, so we never end up with a label-based URI whose metadata carries a different label.
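A minimal sketch of minting opaque URIs, assuming (as the 32 hexadecimal characters suggest, though the post does not say so) an MD5 digest of some stable internal record key. The keys below are hypothetical and the real musicSpace hash input is not described here, so this does not reproduce Schubert’s actual ID. The point is that the URI depends only on the key: two composers with identical names still get different URIs, and relabelling a composer leaves the URI unchanged.

```python
import hashlib

BASE = "http://musicnet.mspace.fm/person/"


def mint_person_uri(internal_key: str) -> str:
    """Mint an opaque person URI from a stable internal record key.

    Assumption: an MD5 hex digest, matching the 32-character shape of the
    IDs above; the real musicSpace hash input is not documented here.
    """
    digest = hashlib.md5(internal_key.encode("utf-8")).hexdigest()
    return f"{BASE}{digest}#id"


# Hypothetical keys: two distinct composers who happen to share a label
first = mint_person_uri("musicspace:composer:00017")
second = mint_person_uri("musicspace:composer:00018")
print(first)
print(second)
```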

Domain Name

We intend to change the domain name soon to one that isn’t explicitly tied to mSpace; the current one is in place for our convenience. Our requirements are a domain that will not cost us anything to re-register in future, will remain in our control (i.e. will not get domain-parked if someone forgets to renew it), and will not dissuade people from using it for partisan or political reasons. The closest we might reasonably get is musicnet.data.ac.uk, although this is still unconfirmed, and we may instead have to use musicnet.soton.ac.uk or musicnet.ecs.soton.ac.uk. These are not preferred, since they might give the impression that the data is a Southampton-centric view of the information, which it is not. For a more in-depth discussion of proposed solutions see our previous posts (data.ac.uk proposal & data.ac.uk revisited).

Ontological Constructs

In addition to the scheme for the URI, we also had to determine the best way to expose the data in terms of the ontological constructs (specifically the class types and predicates) used in the published RDF. We are fortunate that an excellent set of linked data in the musical composer domain already exists, in the form of the BBC /music linked data. For example, the BBC /music site exposes Franz Schubert with the URI:

http://www.bbc.co.uk/music/artists/f91e3a88-24ee-4563-8963-fab73d2765ed#artist

The BBC’s data uses the Music Ontology heavily, as well as other ontologies such as SKOS, Open Vocab and FOAF. Since we are publishing similar data, it makes sense for us to use the same terms and predicates as they do where possible, which is what we have done.

We are still in the process of finalising how we will model the different labels of composers. The figure below shows two possible methods. The first is to create a URI for each composer in every catalogue that lists them, publish that catalogue’s label under the new catalogue-based URI, and use owl:sameAs to link it to our canonical MusicNet URI. The second is to “flatten” all labels into simple skos:altLabel links, although this method loses provenance. Currently we do both, and we have not yet decided whether this is necessary or useful.
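Both methods can be sketched as N-Triples output. In this hypothetical Python helper, the catalogue-based URI pattern and the use of rdfs:label on those URIs are assumptions (the post names only owl:sameAs and skos:altLabel), and the catalogue names and labels are illustrative rather than taken from any partner catalogue.

```python
SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def literal(text: str) -> str:
    """Quote a plain N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def label_triples(canonical: str, labels: dict[str, str]) -> list[str]:
    """Emit both label models for one composer.

    labels maps a catalogue name to the label that catalogue uses.
    The per-catalogue URI pattern below is hypothetical.
    """
    person_id = canonical.rsplit("/", 1)[1].split("#", 1)[0]
    lines = []
    for catalogue, label in sorted(labels.items()):
        # Method 1: catalogue-based URI keeps provenance, linked via owl:sameAs
        catalogue_uri = f"http://musicnet.mspace.fm/{catalogue}/person/{person_id}#id"
        lines.append(f"<{catalogue_uri}> <{RDFS_LABEL}> {literal(label)} .")
        lines.append(f"<{catalogue_uri}> <{OWL_SAME_AS}> <{canonical}> .")
        # Method 2: the same label flattened onto the canonical URI (provenance lost)
        lines.append(f"<{canonical}> <{SKOS_ALT_LABEL}> {literal(label)} .")
    return lines


canonical = "http://musicnet.mspace.fm/person/7ca5e11353f11c7d625d9aabb27a6174#id"
for line in label_triples(canonical, {"catalogue-a": "Schubert, Franz",
                                      "catalogue-b": "Schubert, Franz Peter"}):
    print(line)
```

Emitting both per label makes the trade-off visible: method 2 alone yields two bare altLabel triples with no record of which catalogue each came from.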

 

RDF model for MusicNet alternative labels


 

 

Content Negotiation & Best Practice

Similarly, we follow the BBC /music model of using HTTP 303 (See Other) redirects with content negotiation to serve machine-readable RDF and human-readable HTML from the same URI. Specifically, the model we’ve borrowed is to append “.rdf” when redirecting to the RDF view of the data, and “.html” when redirecting to the human-readable view. This is now implemented, and you can try it yourself with the URIs above, which become the following:

http://musicnet.mspace.fm/person/7ca5e11353f11c7d625d9aabb27a6174.rdf
http://musicnet.mspace.fm/person/7ca5e11353f11c7d625d9aabb27a6174.html
http://www.bbc.co.uk/music/artists/f91e3a88-24ee-4563-8963-fab73d2765ed.rdf
http://www.bbc.co.uk/music/artists/f91e3a88-24ee-4563-8963-fab73d2765ed.html
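A minimal WSGI sketch of this 303 pattern, using only the Python standard library. The production setup is a PHP script plus Apache rewriting, so this illustrates the behaviour rather than reproducing the deployed code, and the substring test on the Accept header is a deliberate simplification. Note that the #id fragment is never sent to the server, so the handler only ever sees /person/<id>.

```python
import re

# Hypothetical WSGI sketch of the 303 pattern used by MusicNet and BBC /music:
# /person/<id> redirects to /person/<id>.rdf or /person/<id>.html.
PERSON_PATH = re.compile(r"^/person/([0-9a-f]{32})$")


def app(environ, start_response):
    path = environ.get("PATH_INFO", "")
    if PERSON_PATH.match(path) is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found\n"]
    # Simplification: any mention of RDF/XML wins; real negotiation should weigh q-values
    accept = environ.get("HTTP_ACCEPT", "")
    extension = ".rdf" if "application/rdf+xml" in accept else ".html"
    start_response("303 See Other", [("Location", path + extension),
                                     ("Content-Type", "text/plain")])
    return [b""]
```

To try it locally you could serve it with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` and request a person path with `curl -I -H "Accept: application/rdf+xml"`. The .rdf and .html targets themselves are then plain static files.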

There are several other offerings from the MusicNet site, some of which have been described before. First, there is the MusicNet Codex, the human-facing search engine for MusicNet. We have also created a (draft!) VoiD document that describes the MusicNet dataset, available here:

http://musicnet.mspace.fm/void#

The perceptive among you will notice that the VoiD document links to an RDF dump of all of the individual linked data files, available here (14MB at time of writing):

http://musicnet.mspace.fm/dump.rdf#

Simple Deployment & Hosting

As noted above, our requirements state that our deployment must be as simple as possible for typical higher education computer science department web admins to maintain. In our bid we stated that we would work with the Southampton ECS Web Team to refine our solution. To keep deployment simple, we have adopted an architecture in which all RDF (including the individual Linked Data file for each composer) is generated once and hosted statically. The content negotiation method described above makes serving static RDF files simple and easy to understand for web admins who might not know much about the Semantic Web. The VoiD document and RDF dump are generated at the same time. Content negotiation is handled by a simple PHP script and some Apache URL rewriting.
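The generate-once approach can be sketched as a script that writes one static RDF/XML file per composer plus a combined dump. This is a hypothetical illustration: the directory layout mirrors the URLs above, but using foaf:name as the only property is an assumption, and the HTML pages and VoiD document that the real pipeline also produces are omitted.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
BASE = "http://musicnet.mspace.fm/person/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)


def describe(parent: ET.Element, person_id: str, name: str) -> None:
    """Append an rdf:Description of one composer to an rdf:RDF element."""
    desc = ET.SubElement(parent, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": f"{BASE}{person_id}#id"})
    ET.SubElement(desc, f"{{{FOAF}}}name").text = name


def write_rdf(root: ET.Element, path: Path) -> None:
    """Serialise an rdf:RDF element to disk as UTF-8 RDF/XML."""
    ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)


def generate_static_site(composers: dict[str, str], out_dir: Path) -> None:
    """Write person/<id>.rdf for each composer, then one combined dump.rdf."""
    person_dir = out_dir / "person"
    person_dir.mkdir(parents=True, exist_ok=True)
    dump = ET.Element(f"{{{RDF}}}RDF")
    for person_id, name in sorted(composers.items()):
        single = ET.Element(f"{{{RDF}}}RDF")
        describe(single, person_id, name)
        write_rdf(single, person_dir / f"{person_id}.rdf")
        describe(dump, person_id, name)
    write_rdf(dump, out_dir / "dump.rdf")
```

Because every file is produced ahead of time, the web server only needs to serve static files and issue the 303 redirects; rerunning the script after new alignments regenerates everything consistently.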

Benefits of Linked Data

One of the benefits of using Linked Data is that we can easily integrate metadata from different sources, and the BBC /music Linked Data mentioned above is one example. One of the sources we have aligned is MusicBrainz, based on a data dump given to us by the LinkedBrainz project team. The BBC has also aligned its data to MusicBrainz, so we have been able to cross-reference the composers at the BBC with the composers in MusicNet automatically, and therefore link directly to the BBC. This offers a number of benefits. First, users can access BBC content, such as recent radio and television recordings that feature those composers (see the Franz Schubert link above for examples). Second, we can harvest some of the BBC’s outward links to enrich our own Linked Data offering: we have harvested the links the BBC makes to pages on IMDb, DBpedia and Wikipedia, among others, which we now re-publish.
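The cross-referencing step amounts to a join on MusicBrainz identifiers. A minimal sketch, assuming (as the Schubert example above suggests) that a BBC /music artist URI is formed from the MusicBrainz ID; the function names and input shapes are hypothetical.

```python
BBC_ARTIST = "http://www.bbc.co.uk/music/artists/{mbid}#artist"


def link_to_bbc(musicnet_to_mbid: dict[str, str], bbc_mbids: set[str]) -> dict[str, str]:
    """Map each MusicNet URI to a BBC artist URI when both share a MusicBrainz ID."""
    return {
        musicnet_uri: BBC_ARTIST.format(mbid=mbid)
        for musicnet_uri, mbid in musicnet_to_mbid.items()
        if mbid in bbc_mbids
    }


def harvest_outlinks(links: dict[str, str],
                     bbc_outlinks: dict[str, list[str]]) -> dict[str, list[str]]:
    """Re-publish the BBC's outward links (IMDb, DBpedia, ...) against our URIs."""
    return {ours: sorted(bbc_outlinks.get(bbc, [])) for ours, bbc in links.items()}
```

Composers without a shared MusicBrainz ID are simply left out, so the enrichment never guesses at a match the musicologists have not made.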

The data flow from the raw data sources to linked data serving is illustrated in the figure below.


MusicNet Data Flow Diagram

Future Work

The following tasks remain in this area of the project:

  1. Acquire control of a long-term domain name (preferably musicnet.data.ac.uk, see above).
  2. Discuss our RDF model with experts in Linked Data, Ontological Modelling and Provenance.
  3. Determine if we will offer a SPARQL endpoint in future. If we decide not to ourselves (because it might not be sustainable once our hosting is passed over to the department), it might be desirable to put the data on the Data Incubator SPARQL host.

This post documents Work Package 3 from the MusicNet project deliverables. MusicNet is funded through the JISCEXPO programme.

 

End of year roundup

30 Dec

It’s been a busy 2010 for the MusicNet project and we’ve made great progress. Our Alignment Tool is maturing, and the latest code shows significant improvements in task speed, making the workflow much more efficient. We expect to release Beta 2 in the new year, so keep checking back for more details.

Alignment Progress (Work Package 4)

The performance and usability improvements to our Alignment Tool (Work Package 2) have had a dramatic effect on our overall alignment progress. We are now at 56% complete, which places us firmly on target to complete the entire dataset before the proposed deadline (end of March 2011).

Codex/User Portal (Work Package 6.2)

Work has also begun on the MusicNet Codex, which aims to be a single search point for musicologists to find information and links into our data partners’ catalogues. Although it is at a very early beta stage, it is functional, and we are adding more composers as they are aligned.

Visit the beta of the MusicNet Codex: http://codex.musicnet.mspace.fm

The Codex publicly demonstrates for the first time the outputs of the Alignment Tool and shows the integration with the LinkedBrainz project (read about our meetup with the LinkedBrainz project).

Please feel free to leave any feedback on the Codex in the comments.

Linked Data (Work Package 5)

Work is underway to convert the output of the Alignment Tool into usable Linked Data. In the New Year we plan to release our proposed URI Scheme (Work Package 3) and also expose an early version of our data/alignment as Linked Data.

 
 

MusicNet at SDH 2010

12 Nov

Austrian Parliament Building

Last month I was very pleased to be able to present the work of the musicSpace and MusicNet research teams at the Supporting the Digital Humanities 2010 conference (Vienna, 19-20 October 2010), which was jointly organized by CLARIN and DARIAH. The musicology session was convened by PhD student Richard Lewis, and also featured presentations by Alan Marsden and Frans Wiering. Our paper explained how the motivation for MusicNet came out of our previous work on the musicSpace project.

Please download the slides from our presentation below, and take a look at the other presentations from the conference at http://www.dariah.eu/index.php?option=com_docman&Itemid=200.

 
 

MusicNet at AHM 2010

13 Oct
City Hall, Cardiff


Last month at the UK e-Science All Hands Meeting 2010 at City Hall in Cardiff (13-16 September 2010), we gave our first conference paper about the MusicNet project. Thank you to everyone who came to our session and asked questions. It was informative to learn that many other delegates have encountered datasets (across a range of subjects, from geography to chess!) in which synonymous entities are not aligned: precisely the problem that the alignment tool we are building for MusicNet aims to address.

A short abstract of our paper is given below, but please also take a look at the extended abstract and our presentation slides:

Thank you to the organising committee and administrators for making the event run so smoothly.

ABSTRACT: The MusicNet Composer URI Project
Daniel Alexander Smith, David Bretherton, Joe Lambert, and mc schraefel

In any domain, a key activity of researchers is to search for and synthesize data from multiple sources in order to create new knowledge. In many cases this process is laborious, to the point of making certain questions nearly intractable because the cost of the searches outstrips the time available to complete the research. As more resources are published as Linked Data, data from multiple heterogeneous sources should be more rapidly discoverable and automatically integrable, enabling previously intractable queries to be explored, and standard queries to be significantly accelerated for more rapid knowledge discovery. But Linked Data is not of itself a complete solution. One of the key challenges of Linked Data is that its strength is also a weakness: anyone can publish anything. So in classical music, for instance, 17 sources may publish data about ‘Schubert’, but there is no de facto way to know that any of these Schuberts are the same, because the sources are not aligned. Without alignment, much of the benefit of Linked Data is diminished: resources can effectively be stranded rather than discovered, or tangled nets of only guessed-at associations in a particular dataset can end up costing more than their value to untangle.

The MusicNet project, which emerged out of Southampton’s musicSpace project, is set to address the challenge just outlined by “minting” URIs for key musicology assets to provide a framework for the effective exploration of Linked Data about classical music. Unique URIs will be minted for each composer that exists in our data partners’ datasets. Basic biographical data will also be exposed, as well as name variants in different sources to allow for compatibility with legacy data. Crucially, this information will be curated by domain experts so that MusicNet will become a reliable source of data about the names of classical music composers. However, the real benefit of this work is that it will align identifiers across data sources, which is a prerequisite for the creation of Linked Data classical music and musicology resources, if such resources are to be optimally useful and usable.

The establishment of authoritative URIs for composers, and moreover the disambiguation of composers in online data sources that will flow from this, is an essential first step in the provision of Linked Data services for classical music and musicology. Our work will provide a model and tools that can usefully be employed elsewhere.

 
 

Alignment Tool (Beta Release 1)

30 Sep

There has been quite a bit of interest in the tool we’ve been developing to solve the (mis-)alignment of data across our multiple catalogs.

Today we’re pleased to be able to make a release available for download, albeit in an unsupported beta form!

Download Alignment Tool (beta 1) (Firefox 3.5+ or WebKit only)

Whilst we can’t make the data we’re working on available for download, we’ve created an example dataset using names from the ECS EPrints repository. The use case here is that you have multiple name representations for specific individuals and you’d like to find the matches. Once installed, the alignment tool lists all ‘Authors’ and, on request, presents basic information about the papers/deposits on which each author string was found. For the reasons why this additional metadata/context is important, be sure to read the previous post about the development process for the tool.
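To illustrate why the tool shows context rather than deciding automatically, here is a hypothetical Python sketch of a naive grouping heuristic (surname plus first initial, accents stripped). It is not the Alignment Tool’s algorithm. It only proposes candidate groups, and it happily lumps a Daniel and a David together, which is exactly the kind of call a human with the supporting papers must make.

```python
import re
import unicodedata
from collections import defaultdict


def name_key(raw: str) -> tuple[str, str]:
    """Hypothetical blocking key: (surname, first initial), accents stripped."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    if "," in text:  # "Surname, Given names"
        surname, _, given = text.partition(",")
    else:  # "Given names Surname"
        words = text.split()
        if not words:
            return ("", "")
        surname, given = words[-1], " ".join(words[:-1])
    surname = re.sub(r"[^a-z\- ]", "", surname).strip()
    letters = re.findall(r"[a-z]", given)
    return surname, (letters[0] if letters else "")


def candidate_groups(names: list[str]) -> dict[tuple[str, str], list[str]]:
    """Group name strings that share a key; only groups of 2+ are candidates."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}


print(candidate_groups(["Smith, Daniel A.", "Smith, D.", "Daniel Smith", "Smith, David"]))
```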

If you’d like to get the latest source-code for the tool as it evolves/improves you can check it out from our SVN repository which is hosted on the project’s Google Code page.

Finally, if you’d like to see what the Alignment tool should look like once you’ve got it installed then take a look at this screencast produced by our resident Musicologist David Bretherton for his recent presentation at AHM2010.

 
 

Aims, Objectives and Final Outputs

22 Jun

Problem. In any domain, a key activity of researchers is to search for and synthesise data from multiple sources in order to create new knowledge. In many cases this process is laborious, to the point of making certain questions nearly intractable because the cost of the search outstrips the time available to consider the work. As more resources are published as linked data, this should mean that, with appropriate tools, data from multiple heterogeneous sources can be more rapidly discovered and automatically integrated. This will enable previously intractable queries to be explored, and more standard queries to be significantly accelerated. But linked data is not of itself a complete solution. A key challenge of linked data is that its strength is also its weakness: anyone can publish anything. So in classical music, for instance, 17 sources may publish work on Schubert, but there is no de facto way to know that any of these Schuberts are the same. The sources are not aligned. Without alignment, much of the benefit of linked data is diminished: resources can effectively be stranded rather than discovered, or become tangled nets of only guessed associations.

Proposed Solution. To address these problems, this project proposes to produce a suite of resources and tools that will support effective linked data exploration with a focus on musicology. The project’s original data contribution will be archival, canonical linked data references, aka “minted” URIs, for classical music composers. These URIs will tie the records of recognised musicology reference sources such as COPAC, RISM, Grove and the British Library (see partner letters) to standard representative pointers for composers. The original tools contribution will be data alignment mechanisms that will easily enable domain experts to associate any linked data resources with our minted reference URIs. The URIs and the alignment tools mean that musicologists as data contributors will be able to harmonize rather than replicate their resources with standard sources. Our instructional prototype contributions will be a Codex and a Visualiser. The Codex will act as a dynamic catalogue of any linked data resource that uses our URIs. This prototype will act as a resource hub for musicologists: they will be able to access it with confidence of exploring well-aligned, disambiguated resources. Likewise for tool developers, this hub will be a clear data reference point for testing linked data resources. As an example of these features (resource hub, research access, tool demonstrator) we will provide a rich temporal visualisation tool. This visualisation will act as a model and service template both of how linked data can be richly visualised and explored by the researcher, and of how tool developers might take advantage of these affordances to develop new tool resources and interactions.

Domain. We are focusing on musicology because we already have strong relationships with both commercial and research resource partners in musicology – Grove, BBC, British Library, COPAC, to name a few – where, through the AHRC musicSpace project, we demonstrated how commercial and research developed heterogeneous data resources could be integrated for rapid exploration and knowledge building. Both the data partners of this project and our current musicSpace evaluators are keen to work with us to deliver minted URIs and these associated services that will make both their existing and new data more useful and usable by musicologists.

User Analysis. We are focusing on minted URIs and data alignment services within linked data because our extensive experience in musicSpace, with stakeholders and with the data resources themselves, shows these services to be a sine qua non for linked data resources to be useful and usable.

Deliverables. This project will deliver:

a. An archival, canonical reference set of minted musicology URIs

b. An ongoing commitment to maintain this research for ongoing scholarship

c. A suite of tools to support the alignment and integration of new linked data resources for increased discovery and usefulness

d. A backlinks service that will associate new linked data resources published to our Codex with our minted URIs, making them easy to integrate into new tools and services. A model tool to show how these resources can be dynamically added and explored in a rich hierarchical timeline and visualised alongside other historical events.

These deliverables address the following specific aims of the call:

Make a collection of resources available on the Web as structured linked data

The project will produce and publish linked data about classical music composers using data from publishers partnering on the musicSpace project. This data will be exposed using existing linked data technology and will form the basis of an online source of canonical data about (and, in time, a comprehensive index of) musical composers. It is intended that as well as exposing basic metadata about each composer (for example birth/death date and nationality) the linked data will provide URLs that reference back into the online web catalogues of our data partners so that musicologists can immediately access all relevant data from each partner collection. Composer data is fundamental to the work of musicologists and music educators, and we see this as the essential first step in the provision of linked data services for classical music.

Due to the nature of linked data, and the requirement to support the hosting of the data output of the project past the project end date, we have agreed with the ECS systems team (see supporting letter) to develop a best practice for a packaged lightweight linked data deployment strategy, to enable ECS to sustain hosting of the linked data at the permanent URIs into the future. A report on our best practice recommendations for lightweight hosting of linked data will be published.

As part of producing the linked data, unique URIs will need to be minted for each composer that exists within the data partners’ current datasets. The project team comprises experts from both musicology and the Semantic Web, which ensures that the ideal skill sets are available for creating an authoritative and reusable URI scheme. Utilising domain knowledge, data licensed from trusted musicological scholarly catalogues and in accordance with the ‘Four Rules to Linked Data’ as recommended by data.gov.uk, the project envisages producing the definitive URIs for musical composers, which can be trusted because they are backed by musicology scholars.

In addition to URI minting, the datasets from each data partner will need to be aligned to ensure that composers from one dataset will match up with the same composers from another. This matching should be capable of handling different formatting of names (composer disambiguation) as well as input errors occurring when the data partners digitised their catalogues. A subset of this co-reference alignment has been performed under the musicSpace project, and we propose for this project that the existing alignments are exposed as linked data, and that the alignment work be expanded to all composers within the data sets, by using an expanded version of our prototype alignment tool created for musicSpace.
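Input errors such as transposed letters can be flagged with a string-similarity score. The sketch below uses `difflib.SequenceMatcher` from Python’s standard library; the 0.85 threshold is an arbitrary assumption, and the output is only a list of suggestions for a domain expert to confirm, not an automatic alignment.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] from difflib's ratio()."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def suggest_matches(name: str, candidates: list[str], threshold: float = 0.85) -> list[str]:
    """Return candidates scoring at or above threshold, best first, for expert review."""
    scored = sorted(((similarity(name, c), c) for c in candidates if c != name), reverse=True)
    return [c for score, c in scored if score >= threshold]


print(suggest_matches("Schubert, Franz", ["Schbuert, Franz", "Schumann, Robert"]))
```

A character-level ratio catches keying slips well, but it cannot tell two genuinely different composers with near-identical names apart; that distinction still needs the musicologist.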

Develop a prototype with instructional step-by-step demonstration and documentation

During the musicSpace project and through engagement with musicologists at the University of Southampton and the musicological community more broadly (including stakeholders identified at Durham University and Royal Holloway) through musicSpace’s dissemination and demo activities, it was apparent that a number of crucial research and education tools have yet to be developed. The data required for these tools, however, is available, albeit in an unhelpful format. A prime example, cited as a teaching aid for HE musicology students, was a timeline visualisation. Currently the passage of time and the influence of composers throughout history can only be understood by time-consuming information triage across the multiple online musicology catalogues. If, however, linked data were available, it would be possible to use the popular open source Simile timeline software to better understand the temporal relationships between composers. Students could then use the timeline as an entry point into the multiple online musicology catalogues rather than having to perform an exhaustive search of each themselves. A benefit of using a Linked Data approach here is that any other Linked Data sources can be added to the same timelines, so that correlations between other historical events and music can be shown on the timeline, providing additional context for end-users.
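A minimal sketch of turning composer lifespans into timeline events. The JSON shape loosely follows SIMILE Timeline’s JSON event source, but the field names are our assumption and should be checked against the Timeline documentation; the input tuples, including the Beethoven placeholder URI, are hypothetical.

```python
import json


def timeline_json(composers: list[tuple[str, int, int, str]]) -> str:
    """Build timeline events from (label, birth year, death year, URI) tuples.

    The field names loosely follow SIMILE Timeline's JSON event format; treat
    them as an assumption to check against the Timeline documentation.
    """
    events = [
        {
            "start": f"{birth:04d}",
            "end": f"{death:04d}",
            "durationEvent": True,
            "title": label,
            "link": uri,  # entry point back into the linked data
        }
        for label, birth, death, uri in sorted(composers, key=lambda c: c[1])
    ]
    return json.dumps({"dateTimeFormat": "iso8601", "events": events}, indent=2)


print(timeline_json([
    ("Schubert, Franz", 1797, 1828,
     "http://musicnet.mspace.fm/person/7ca5e11353f11c7d625d9aabb27a6174#id"),
    ("Beethoven, Ludwig van", 1770, 1827, "http://example.org/person/beethoven"),  # placeholder URI
]))
```

Because each event carries the composer’s URI, any other Linked Data source keyed on the same URIs could add its own events to the same timeline.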

In order to unlock the benefits of the Linked Data directly for non-technical end-users, a Codex will be created on top of the data that will allow musicologists to search for items of interest and to get links to all references to those items in the partner collections. For example, suppose a user interested in the works of Beethoven searches the Codex. The search finds all Linked Data that references Beethoven and offers links to all of the collections, so that the user can quickly explore the data from those providers. These links include the musicSpace data partners as well as Linked Data publishers such as MusicBrainz, DBpedia and the BBC, allowing users to listen to works through the BBC iPlayer using the BBC’s existing Linked Data output, which includes classical music performances on Radio 3. The Codex will also use the backlinks technology developed by the co-located enAKTing project to update its links automatically to show all catalogues that use our minted URIs, so that future uses of the URIs are exposed to users.
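The Codex lookup described here can be sketched as a search over each composer’s preferred and alternative labels that returns the partner links. Everything in this Python sketch (the `Composer` record, the `search` function, the in-memory index and its example.org URLs) is hypothetical and stands in for whatever store the Codex actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Composer:
    """Hypothetical Codex record: canonical URI, labels and partner catalogue links."""
    uri: str
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)
    partner_links: dict[str, str] = field(default_factory=dict)  # partner -> URL


def search(index: list[Composer], query: str) -> list[Composer]:
    """Case-insensitive substring match over preferred and alternative labels."""
    needle = query.casefold()
    return [
        composer for composer in index
        if any(needle in label.casefold()
               for label in (composer.pref_label, *composer.alt_labels))
    ]


index = [
    Composer("http://example.org/person/beethoven", "Beethoven, Ludwig van",
             partner_links={"partner-a": "http://example.org/a/beethoven"}),
    Composer("http://musicnet.mspace.fm/person/7ca5e11353f11c7d625d9aabb27a6174#id",
             "Schubert, Franz", alt_labels=["Schubert, Franz Peter"],
             partner_links={"partner-b": "http://example.org/b/schubert"}),
]
for hit in search(index, "beethoven"):
    print(hit.pref_label, hit.partner_links)
```

Searching alternative labels as well as the preferred one is what lets a user find a composer under whichever catalogue’s spelling they happen to know.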

The project intends to produce the above prototype visualisations to meet the specific needs of musicology students and, by proxy, their educators and lecturers. In addition it will provide rich documentation to allow future projects to make use of the underlying data that is being exposed. Documentation will also be provided to demonstrate how third-party datasets, whether currently known or not yet in existence, can be joined with the published linked data to add additional information or meaning. Video tutorials on how to find Linked Data on the web, and how to explore it with the data we expose, will be posted to YouTube and the project blog.

Explore and report on the opportunities and barriers in making content structured

It is anticipated that much will be learned through the alignment of multiple data sources, and that the tools built to aid this process will be useful for other research domains. The project intends to publish findings regularly on the project blog, specifically regarding the discovery of similar resources within dissimilar unstructured datasets and how best to consolidate these into a single canonical structured linked dataset.

More formal findings resulting from the project will be deposited in the EPrints Open Access repository of the School of Electronics & Computer Science at the University of Southampton.

One of the roles of the musicologist on this project is to highlight important sources within musicology that can be further leveraged by conversion into Linked Data.