
Archive for July, 2010

data.ac.uk proposal

20 Jul

Since our previous post on a domain name for our project, it has been suggested that a possible scalable solution is a central data repository for the UK academic community (data.ac.uk), in a similar style to data.gov.uk, which is a repository of open public government data.

To recap the problem: a commercial domain carries an annual fee (typically 10-20 GBP), which typical short-term academic projects cannot sustain for many years, so a purchased domain is likely to lapse once the project ends. While this may not be crucial for a project homepage, it is crucial for Linked Data, where the URIs of the data must exist in perpetuity because the data themselves use these URIs. Our project’s output is Linked Data intended for use by anyone publishing data that may include classical music (our partners include BBC /music, who are interested in using our URIs), so the domain must exist in perpetuity.
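To make concrete why the domain itself matters: a partner that links to our data copies our URIs verbatim into its own triples, so our domain name becomes part of someone else’s data. The Python sketch below is illustrative only. It uses plain tuples rather than an RDF library, and every URI except the standard OWL `sameAs` predicate is a hypothetical placeholder (`musicnet.example.ac.uk` is not a real MusicNet domain, and `links_into` is a made-up helper).

```python
from urllib.parse import urlparse

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical triples held in a partner's dataset. The partner mints its
# own subject URIs but copies the (placeholder) MusicNet composer URIs
# verbatim as objects.
partner_triples = [
    ("http://partner.example/artists/42", OWL_SAME_AS,
     "http://musicnet.example.ac.uk/composer/1"),
    ("http://partner.example/artists/43", OWL_SAME_AS,
     "http://musicnet.example.ac.uk/composer/2"),
    ("http://partner.example/artists/44", OWL_SAME_AS,
     "http://other.example.org/person/7"),
]


def links_into(triples, host):
    """Return the object URIs in `triples` whose hostname is `host`.

    These are the references that would stop resolving if `host` lapsed.
    """
    return [obj for (_subj, _pred, obj) in triples
            if urlparse(obj).hostname == host]


if __name__ == "__main__":
    broken = links_into(partner_triples, "musicnet.example.ac.uk")
    print(f"{len(broken)} links would break:")
    for uri in broken:
        print(" ", uri)
```

Nothing in the partner’s data changes when the domain lapses; the links simply stop resolving, which is why the domain has to outlive the project.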

We also concluded that it would be best not to use a subdomain of our school (such as musicnet.soton.ac.uk), since this would be seen as tied to one school/university and is likely to get less uptake than something at a higher level (such as musicnet.ac.uk).

Current JANET/UKERNA policy does not allow us to have a top-level academic domain (musicnet.ac.uk), because such domains are restricted to centrally funded projects lasting at least two years. This is sensible, as it avoids the overhead of registering domains for thousands of small projects.

Thus, one suggestion that has been floated is for JISC to fund/host a “data.ac.uk” domain and/or repository, providing a Linked Data domain and/or web hosting solution for academic data publishers (such as MusicNet).

There are two key points that I would like to make:

1) It will lower the technical and financial barrier to entry for people who have some RDF to publish.

If a project has some RDF to publish right now, it first has to work out how to publish it correctly as Linked Data. Few academic projects have managed this, and many probably don’t realise that they could be doing better. A central service that manages and hosts data properly could also add extra features. This is analogous to a cloud-hosted blog service such as tumblr.com or blogspot.com, where security patches and new features are added for free by the host, without the publishers doing anything. For RDF, I can foresee better human-readable access, backlink features to external RDF and so on being added over time, even to legacy RDF hosted for projects that have long since finished. The hosting and maintenance of the servers would also become the responsibility of the data.ac.uk team rather than of a short-term project, so the investment in creating the data is protected by the central repository.

2) It must not limit what technically able projects can do.

In our case, we do not require a hosting solution, because we already know how to host data, and we have already negotiated indefinite hosting of our data by the central School of Electronics and Computer Science administration team (this was a key part of our project bid).

Furthermore, we wish to host an alignment service that enables musicologists to make edits to data after it has been published, so that it is kept up-to-date, and any mistakes can be fixed over time. Other projects may have different needs.

Thus it is important for projects like ours to be able to apply for a subdomain of data.ac.uk alone, without hosting (e.g. musicnet.data.ac.uk), so that we can run our own hosting and bespoke services.

 

WP1: Data Triage Report

20 Jul

We recently conducted an assessment of the metadata available in our data partners’ digital catalogues as part of our Work Package 1 deliverable. The aim of this assessment was to ascertain which metadata fields were good candidates to be exposed in our Linked Data.

Here is the introduction to the report:

This document outlines the metadata we aim to expose as part of the MusicNet Project. Decisions on what metadata to include are based on the following factors:

  1. Musicologists’ needs (recommended by David Bretherton)
  2. Technical feasibility (recommended by Joe Lambert)
  3. Licensing restrictions

Our remit was to expose only information that is available in the public domain, while ensuring that enough is made available for each composer to be unambiguously identified.

You can download the full report from our Google Code Repository:
WP1: Data Triage Report v1.0 (67KB)

 

Data, URIs & Permanence

12 Jul

There is a strong drive in the UK at the moment to turn public sector data into semantically marked up and globally accessible resources (see OpenPSI for examples of use). There is also a heavy push from academic funders to make the outputs of research projects available in a similar format, such as the JISCexpo call under which this project is funded.

Whilst there is considerable focus on the design of URI schemes (Jeni Tennison, data.gov.uk), the permanence of the resources created does not seem to receive the same consideration. Typically, a project concerned with the creation of Linked Data is funded for a fixed period of time. During this time the funding pays for staff to conduct the work required, which ultimately results in the production of some Linked Data. Once the project has finished and the funding has stopped, what happens to the data that has been produced?

One of the primary requirements of useful Linked Data is that the URIs created exist in perpetuity, so that future data sets can link to them. Assuming a project is funded for a year and there is a commitment to host the data locally for a further year, what happens after this period has elapsed?

What we need is provision for UK academic data to be hosted on the JANET network under a suitable .ac.uk domain. For example, the data outputs of this project could be hosted on http://musicnet.ac.uk, ensuring that:

  1. The URIs are institution-independent
  2. There is no ongoing administration cost for renewing a project domain name (.ac.uk domains require only a one-off payment)



The current rules make it hard for short-term projects to acquire an academic .ac.uk domain, as a project must be funded for at least two years. It is important that there be a process for deciding which projects should and shouldn’t get a domain name. However, as the way we use the web changes, with the focus shifting from purely human-consumed HTML towards semantically marked up data, the academic research community needs to discuss and rethink the criterion on which this decision is made.

This issue of persistent URI hosting is a new and increasingly important problem for the emerging Semantic Web. If we started to assess projects on their impact over time rather than just their funding duration, we might encourage the creation of more short-term projects that expose Linked Data, which can only be a good thing for the community at large!

There is also the issue of hosting: who will actually provide the server space where the data will reside? However, this is a secondary issue, and one that is easier to discuss once a system is in place for maintaining the URIs themselves.

 

Minutes of Face-to-Face Project Kick-off Meeting

09 Jul

MusicNet project kick-off meeting held on 24 June 2010 at Southampton.

Completed actions to report:

mc had a call with DFF confirming the project plan and offer.
Project blog established with an initial 7 posts by Joe; update posted to DFF.
DFF email received confirming that the Project Offer Letter has been requested from JISC.

Agenda:

Data Preparation

Action: David and Joe to meet on 9th July to synchronise on the data triage task (WP1).
The deliverable of this action will be a document planning the core data resources to be used within the project, detailing what they offer and how they will be used.

Workshop Plan

Goal: hold a workshop to demonstrate the project, including how to use it and what it offers. Participants identified, including the stakeholders listed in the bid document. Planned to be hosted in London.

David: Decide on a date for the workshop.

David: Approach IMR about hosting the workshop, and negotiate a suitable date in May with stakeholders. Due: August.

Update bid partners

David: Write to authors of letters of support to thank and update them. Due: 9 July

Dan: Write to the other authors of letters of support to thank them. Due: 9 July

PR

David: To write a short publicity announcement for the music news web page and circulate it to us to check; we can then decide whether to release it through Joyce/ECS as well. Due: 16 July

Joe: Set up automatic posting from the blog to Twitter. Due: 9 July

OSSWatch

Joe: To reconnect with Gabriel at OSSWatch and confirm a date. Due: 9 July

Project Management

Joe: Upload Gantt chart in a format we can all look at that shows which tasks are outstanding and next.

 