Problem In any domain, a key activity of researchers is to search for and synthesise data from multiple sources in order to create new knowledge. In many cases this process is laborious, to the point of making certain questions nearly intractable because the cost of the search leaves too little time to consider the results. As more resources are published as linked data, appropriate tools should allow data from multiple heterogeneous sources to be discovered more rapidly and integrated automatically. This will enable previously intractable queries to be explored, and more standard queries to be significantly accelerated. But linked data is not of itself a complete solution. A key challenge of linked data is that its strength is also its weakness: anyone can publish anything. So in classical music, for instance, 17 sources may publish work on Schubert, but there is no standard way to know whether any of these Schuberts are the same. The sources are not aligned. Without alignment, much of the benefit of linked data is lost: resources can effectively be stranded rather than discovered, or become tangled nets of merely guessed associations.
Proposed Solution To address these problems, this project proposes to produce a suite of resources and tools that will support effective linked data exploration, with a focus on musicology. The project’s original data contribution will be archival, canonical linked data references, also known as “minted” URIs, for classical music composers. These URIs will associate recognised reference data sources in musicology, such as COPAC, RISM, Grove and the British Library (see partner letters), into standard representative pointers for composers. The original tools contribution will be data alignment mechanisms that enable domain experts to associate any linked data resource with our minted reference URIs easily. Together, the URIs and the alignment tools mean that musicologists, as data contributors, will be able to harmonise their resources with standard sources rather than replicate them. Our instructional prototype contribution will be a Codex and a Visualiser. The Codex will act as a dynamic catalogue of any linked data resource that uses our URIs. This prototype will act as a resource hub for musicologists: they will be able to access it with confidence that they are exploring well-aligned, disambiguated resources. Likewise, for tool developers, this hub will be a clear data reference point for testing linked data resources. As an example of these features (resource hub, research access, tool demonstrator) we will provide a rich temporal visualisation tool. This visualisation will act as a model and service template, showing both how linked data can be richly visualised and explored by the researcher and how tool developers might take advantage of these affordances to develop new tool resources and interactions.
Domain We are focusing on musicology because we already have strong relationships with both commercial and research resource partners in the field (Grove, the BBC, the British Library and COPAC, to name a few). With these partners, through the AHRC musicSpace project, we demonstrated how heterogeneous data resources, both commercially developed and research-developed, could be integrated for rapid exploration and knowledge building. Both the data partners of this project and our current musicSpace evaluators are keen to work with us to deliver minted URIs and the associated services that will make both their existing and new data more useful and usable by musicologists.
User Analysis We are focusing on minted URIs and data alignment services within linked data because our extensive experience in musicSpace, both with stakeholders and with the data resources themselves, shows these services to be a sine qua non for linked data resources to be useful and usable.
Deliverables. This project will deliver:
a. An archival, canonical reference set of minted musicology URIs
b. An ongoing commitment to maintain this reference set for continuing scholarship
c. A suite of tools to support the alignment and integration of new linked data resources for increased discovery and usefulness
d. A backlinks service that will associate new linked data resources published to our Codex with our minted URIs, making them easy to integrate into new tools and services
e. A model tool to show how these resources can be dynamically added and explored in a rich hierarchical timeline and visualised alongside other historical events
These deliverables address the following specific aims of the call:
Make a collection of resources available on the Web as structured linked data
The project will produce and publish linked data about classical music composers using data from publishers partnering on the musicSpace project. This data will be exposed using existing linked data technology and will form the basis of an online source of canonical data about (and, in time, a comprehensive index of) musical composers. It is intended that, as well as exposing basic metadata about each composer (for example birth and death dates and nationality), the linked data will provide URLs that reference back into the online web catalogues of our data partners so that musicologists can immediately access all relevant data from each partner collection. Composer data is fundamental to the work of musicologists and music educators, and we see this as the essential first step in the provision of linked data services for classical music.
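As an illustration only, the kind of composer record described above could be serialised as N-Triples along the following lines. This is a minimal sketch under stated assumptions: the `example.org` URIs, the `Composer` class and the birth, death and nationality properties are hypothetical placeholders, not the project's actual vocabulary or URI scheme. Only `foaf:name`, `rdfs:seeAlso` and `xsd:date` are existing, widely used terms.

```python
from dataclasses import dataclass, field
from typing import List

# Existing, widely used vocabulary terms.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"

# Hypothetical properties, used here purely for illustration.
BIRTH_DATE = "http://example.org/vocab/birthDate"
DEATH_DATE = "http://example.org/vocab/deathDate"
NATIONALITY = "http://example.org/vocab/nationality"


@dataclass
class Composer:
    uri: str                 # the minted URI for this composer
    name: str
    birth: str               # ISO 8601 date, e.g. "1797-01-31"
    death: str
    nationality: str
    catalogue_urls: List[str] = field(default_factory=list)


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(c: Composer) -> List[str]:
    """Return one N-Triples line per statement about the composer."""
    s = f"<{c.uri}>"
    lines = [
        f"{s} <{FOAF_NAME}> {literal(c.name)} .",
        f"{s} <{BIRTH_DATE}> {literal(c.birth)}^^<{XSD_DATE}> .",
        f"{s} <{DEATH_DATE}> {literal(c.death)}^^<{XSD_DATE}> .",
        f"{s} <{NATIONALITY}> {literal(c.nationality)} .",
    ]
    # One link back into each partner's online catalogue.
    for url in c.catalogue_urls:
        lines.append(f"{s} <{RDFS_SEE_ALSO}> <{url}> .")
    return lines
```

Each partner catalogue link becomes one `rdfs:seeAlso` statement, so adding a new partner only adds lines and never changes existing ones.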
Due to the nature of linked data, and the requirement to support the hosting of the data output of the project past the project end date, we have agreed with the ECS systems team (see supporting letter) to develop a best practice for a packaged lightweight linked data deployment strategy, to enable ECS to sustain hosting of the linked data at the permanent URIs into the future. A report on our best practice recommendations for lightweight hosting of linked data will be published.
As part of producing the linked data, a unique URI will need to be minted for each composer that exists within the data partners’ current datasets. The project team comprises experts from both musicology and the Semantic Web, which ensures that the right skill sets are available for creating an authoritative and reusable URI scheme. Drawing on domain knowledge and on data licensed from trusted scholarly musicological catalogues, and following the ‘Four Rules to Linked Data’ as recommended by data.gov.uk, the project envisages producing definitive URIs for musical composers that can be trusted because they are backed by musicology scholars.
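To make the idea of minting concrete, the sketch below derives a readable HTTP URI from a composer's name and birth year. The namespace, the function names and the pattern itself (family name, given name, birth year) are assumptions made for illustration; the actual scheme would be decided by the project team, and an opaque identifier would be an equally valid choice.

```python
import re
import unicodedata

# Hypothetical namespace; the real one would live at a permanent project URI.
BASE = "http://example.org/composer/"


def slugify(text: str) -> str:
    """Lower-case ASCII slug: accents removed, other characters become hyphens."""
    # NFKD splits accented letters into base letter plus combining mark,
    # and the ASCII encoding step then drops the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


def mint_uri(family: str, given: str, birth_year: int) -> str:
    """Build a candidate URI; the birth year separates same-named composers."""
    return f"{BASE}{slugify(family)}-{slugify(given)}-{birth_year}"
```

Under these assumptions, `mint_uri("Dvořák", "Antonín", 1841)` yields `http://example.org/composer/dvorak-antonin-1841`. Once published, such a URI should never change, even if the preferred spelling of the name later does.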
In addition to URI minting, the datasets from each data partner will need to be aligned to ensure that composers from one dataset will match up with the same composers from another. This matching should be capable of handling different formatting of names (composer disambiguation) as well as input errors occurring when the data partners digitised their catalogues. A subset of this co-reference alignment has been performed under the musicSpace project, and we propose for this project that the existing alignments are exposed as linked data, and that the alignment work be expanded to all composers within the data sets, by using an expanded version of our prototype alignment tool created for musicSpace.
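The matching described above can be sketched as follows. This is not the musicSpace alignment tool, whose design is not described here; it is a minimal illustration that normalises name formatting and then applies the fuzzy string comparison from Python's standard `difflib` module. The function names and the 0.85 threshold are assumptions, and in practice borderline matches would be referred to a domain expert rather than accepted automatically.

```python
import difflib
import unicodedata
from typing import Dict, Iterable, Optional


def normalise(name: str) -> str:
    """Reduce formatting differences: 'Schubert, Franz' -> 'franz schubert'."""
    if "," in name:
        family, given = name.split(",", 1)
        name = f"{given} {family}"
    # Strip accents so 'Dvořák' and 'Dvorak' compare equal.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace punctuation with spaces, then collapse runs of whitespace.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in stripped.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalisation."""
    return difflib.SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def align(
    source_names: Iterable[str],
    reference: Dict[str, str],
    threshold: float = 0.85,
) -> Dict[str, Optional[str]]:
    """Map each source name to the best-matching reference URI, or None.

    `reference` maps a canonical name to its minted URI.
    """
    result: Dict[str, Optional[str]] = {}
    for name in source_names:
        best_uri, best_score = None, 0.0
        for ref_name, uri in reference.items():
            score = similarity(name, ref_name)
            if score > best_score:
                best_uri, best_score = uri, score
        result[name] = best_uri if best_score >= threshold else None
    return result
```

Normalisation handles the formatting differences, while the similarity threshold tolerates small digitisation errors such as "Shubert" for "Schubert". Names that fall below the threshold are left unmatched for manual review.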
Develop a prototype with instructional step-by-step demonstration and documentation
During the musicSpace project, engagement with musicologists at the University of Southampton and with the musicological community more broadly (including stakeholders identified at Durham University and Royal Holloway) through musicSpace’s dissemination and demo activities made it apparent that a number of crucial research and education tools have yet to be developed. The data required for these tools is, however, available, albeit in an unhelpful format. A prime example of a teaching aid cited for HE musicology students was a timeline visualisation. Currently the passage of time and the influence of composers throughout history can only be understood by time-consuming information triage across the multiple online musicology catalogues. If, however, linked data were available, it would be possible to use the popular open source timeline software Simile to better understand the temporal relationships between composers. Students could then use the timeline as an entry point into the multiple online musicology catalogues rather than having to perform an exhaustive search of each themselves. A benefit of using a Linked Data approach here is that any other Linked Data source can be added to the same timeline, so that correlations between music and other historical events can be shown, providing additional context for end-users.
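The following sketch shows how composer lifespans and other historical events might be reduced to a common event structure and filtered to a time window before being handed to a timeline widget. The field names (`title`, `start`, `end`, `link`) and both functions are hypothetical; they are not taken from the Simile software, and a real integration would map them onto whatever event format the chosen timeline tool expects.

```python
from datetime import date
from typing import Dict, List


def make_event(title: str, start: date, end: date, link: str) -> Dict[str, str]:
    """Describe a lifespan or historical event as a plain timeline record."""
    return {
        "title": title,
        "start": start.isoformat(),  # ISO 8601 dates sort correctly as strings
        "end": end.isoformat(),
        "link": link,                # entry point into a catalogue or dataset
    }


def events_between(
    events: List[Dict[str, str]], window_start: date, window_end: date
) -> List[Dict[str, str]]:
    """Return events overlapping the window, ordered by start date."""
    lo, hi = window_start.isoformat(), window_end.isoformat()
    # An event overlaps the window if it starts before the window ends
    # and ends after the window starts.
    selected = [e for e in events if e["start"] <= hi and e["end"] >= lo]
    return sorted(selected, key=lambda e: e["start"])
```

Because composers and non-musical events share one structure, events drawn from any other Linked Data source can be appended to the same list and will appear on the same timeline.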
In order to unlock the benefits of the Linked Data directly for non-technical end-users, a Codex will be created on top of the data that will allow musicologists to search for items of interest and to get links to all references to those items in the partner collections. For example, a user interested in the works of Beethoven searches the Codex. The search finds all Linked Data that references Beethoven and offers links into each of the collections so that the user can quickly explore the data from those providers. These links cover the musicSpace data partners as well as Linked Data publishers such as MusicBrainz, DBpedia and the BBC; the BBC’s existing Linked Data output includes classical music performances on Radio 3, which allows users to listen to works through the BBC iPlayer. The Codex will also use the backlinks technology developed by the co-located enAKTing project to update its links automatically to show all catalogues that use our minted URIs, so that future uses of the URIs are exposed to users.
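A simplified view of the Codex lookup is sketched below. It does not reproduce the enAKTing backlinks service, whose interface is not described here; it only illustrates the underlying idea of indexing every external resource that points at a minted URI and then searching that index by composer name. The function names, the triple representation and all URIs in the usage note are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# A statement as (subject, predicate, object), all given as URI strings.
Triple = Tuple[str, str, str]


def build_backlinks(
    triples: Iterable[Triple], minted: Set[str]
) -> Dict[str, List[str]]:
    """Map each minted URI to the external resources that reference it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for subject, _predicate, obj in triples:
        # Keep only links that arrive at a minted URI from outside our own set.
        if obj in minted and subject not in minted:
            index[obj].add(subject)
    return {uri: sorted(subjects) for uri, subjects in index.items()}


def search(
    labels: Dict[str, str], backlinks: Dict[str, List[str]], query: str
) -> Dict[str, List[str]]:
    """Find minted URIs whose label contains the query, with their backlinks."""
    q = query.casefold()
    return {
        uri: backlinks.get(uri, [])
        for uri, label in labels.items()
        if q in label.casefold()
    }
```

With a label table such as `{"http://example.org/composer/beethoven": "Ludwig van Beethoven"}`, calling `search(labels, backlinks, "beethoven")` returns that URI together with every catalogue record that links to it. Rebuilding the index as new catalogues publish data is what would keep the Codex links current.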
The project intends to produce the above prototype visualisations to meet the specific needs of students of musicology and, by proxy, their educators and lecturers. In addition, it will provide rich documentation to allow future projects to make use of the underlying data that is being exposed. Documentation will also be provided to demonstrate how third party datasets, whether currently known or not yet existing, can be joined with the published linked data to add further information or meaning. Video tutorials on how to find Linked Data on the web, and how to explore it with the data we expose, will be posted to YouTube and the project blog.
Explore and report on the opportunities and barriers in making content structured
It is anticipated that much will be learned through the alignment of multiple data sources and that the tools developed to support this technique will be useful for other research domains. The project intends to publish findings regularly on the project blog, specifically regarding the discovery of similar resources within dissimilar non-structured datasets and how best to converge these into a single canonical structured linked dataset.
More formal findings resulting from the efforts of the project will be deposited in the EPrints Open Access online repository of the School of Electronics & Computer Science at the University of Southampton.
One of the roles of the musicologist on this project is to highlight important sources within musicology that can be further leveraged by conversion into Linked Data.