
Archive for the ‘Discussion’ Category

data.ac.uk revisited

18 Aug

Following on from Dan’s recent post about data.ac.uk, there is a growing consensus that this is a necessary step to facilitate the growth of usable Linked Data in Higher Education. Next week JISC will publish a public briefing paper on the topic. We expect it to announce, among other things, that the data.ac.uk domain name has been ring-fenced to provide a data.gov.uk-style repository for HE.

The tools to visualise this type of data have been around for a long time, but as yet there has been no long-term strategy for maintaining the data that drives these tools.

Why data.ac.uk?

So why do we need a centralised datastore for HE data? Why can’t we just host it locally on data.southampton.ac.uk?

In our original post, Dan set out our stance on the need for institutional agnosticism in the data we are creating for this project:

We also found that it would be best to not use a subdomain of our school (such as musicnet.soton.ac.uk), since this would be seen as partisan to the school/university and is likely to get less uptake than something at a higher level (such as musicnet.ac.uk).

It doesn’t really make sense to expose the MusicNet data (a set of canonical names and datapoints for classical music composers) on a URI that contains the name of the institution (in our case, the University of Southampton). The data has global significance; it just so happens that we are the ones tasked with exposing it. Our data is a perfect fit for a politically neutral data.ac.uk implementation.

In his post “Time for data.ac.uk? Or a local data.open.ac.uk?”, Tony Hirst raises some interesting questions about where we should be storing data:

Another possible source of data in a raw form is from the data.gov.uk education datastore (an example can be found via here, which makes me wonder about the extent to which a data.ac.uk website might just be an HE/FE view over that wider datastore? (Related: @kitwallace on University data.) And then maybe, hence: would data.*.ac.uk be a view over data.ac.uk for a particular institution. Or *.sch.ac.uk a view over a data.sch.ac.uk view over the full education datastore?

These questions are yet to be answered, but he does make an interesting point, drawing on a presentation given by Mike Nolan at mashlib2010: at present, much of the information that could be (re-)exposed at minimal cost is the syndicated data that most institutions already make available, such as RSS feeds and CalDAV calendars. I would argue, however, that these institution-specific datasets, being already intrinsically politically aligned, would be a better fit for local hosting on a data.southampton.ac.uk-type URI. The URI then implies ownership and lends some context and authority to the data held there.

So perhaps there is room for both data.ac.uk AND data.southampton.ac.uk?
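Tony Hirst’s suggestion that an institutional site could be a view over a central store can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical record shape in which every statement carries the institution that published it (or none, for institution-neutral data such as MusicNet’s); the URIs, titles and institution labels are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Record:
    """Hypothetical record shape: one statement plus the institution that
    published it (None for institution-neutral data)."""
    subject: str
    predicate: str
    obj: str
    publisher: Optional[str]

# Hypothetical contents of a central data.ac.uk store.
CENTRAL_STORE = [
    Record("http://example.org/course/1", "http://purl.org/dc/terms/title",
           "Music Technology", "southampton"),
    Record("http://example.org/course/2", "http://purl.org/dc/terms/title",
           "Digital Humanities", "open"),
    Record("http://example.org/composer/1", "http://xmlns.com/foaf/0.1/name",
           "Example Composer", None),
]

def institutional_view(store: List[Record], institution: str) -> List[Record]:
    """What a data.<institution>.ac.uk front end might expose: only the
    statements that institution published, read from the shared store."""
    return [r for r in store if r.publisher == institution]

def neutral_data(store: List[Record]) -> List[Record]:
    """Statements with no institutional owner, served only from data.ac.uk."""
    return [r for r in store if r.publisher is None]
```

Under this arrangement, institution-specific data such as feeds and calendars has an obvious local home, while neutral datasets live only at the centre.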

How to make data.ac.uk work?

The only way we can make any datastore work is if the data is available in formats that people can actually use! For MusicNet we intend to host RDF/XML on our local server (until a data.ac.uk alternative becomes a reality), using standard HTTP content negotiation so that casual users are presented with a human-readable HTML representation.
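Content negotiation lets the same URI answer a browser with HTML and a Linked Data client with RDF/XML, depending on the HTTP Accept header. The sketch below shows only the selection step, assuming a hypothetical server that can produce exactly those two media types. It simplifies the HTTP rules (media-type parameters other than q are ignored), and it leaves out the redirect mechanics many Linked Data servers use, such as 303 See Other.

```python
# Media types this hypothetical server can produce, listed in the order
# preferred when the client rates them equally (HTML first for browsers).
AVAILABLE = ["text/html", "application/rdf+xml"]

def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        media_range, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_range, q))
    return prefs

def quality(media_type, prefs):
    """q-value the client gives media_type, taken from the most specific
    matching range (exact type, then type/*, then */*); 0.0 if none match."""
    main_type = media_type.split("/")[0]
    best_rank, best_q = -1, 0.0
    for media_range, q in prefs:
        if media_range == media_type:
            rank = 2
        elif media_range == main_type + "/*":
            rank = 1
        elif media_range == "*/*":
            rank = 0
        else:
            continue
        if rank > best_rank:
            best_rank, best_q = rank, q
    return best_q

def negotiate(accept_header):
    """Choose a representation, or None if nothing offered is acceptable."""
    if not accept_header:
        return "text/html"  # no Accept header: assume a casual (human) visitor
    prefs = parse_accept(accept_header)
    # Highest q wins; ties go to the type listed first in AVAILABLE.
    best_q, _, best_type = max(
        (quality(t, prefs), -i, t) for i, t in enumerate(AVAILABLE)
    )
    return best_type if best_q > 0 else None
```

A None result means nothing on offer is acceptable, which a server would typically turn into a 406 Not Acceptable response.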

We also intend to investigate the Linked Data API that was announced at the Second London Linked Data Meetup and has been developed by Dave Reynolds, Jeni Tennison and Leigh Dodds. The LD API will also allow us to provide our dataset in formats such as JSON and Turtle through a RESTful query API, which is currently the protocol of choice for mashup/Web 2.0 developers.
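Serving extra formats such as JSON and Turtle is, at its simplest, a matter of dispatching to a different serialiser. The sketch below chooses one from the suffix of the request path. The suffix convention, the resource URI and the JSON shape are all assumptions made for illustration; none of them is taken from the Linked Data API itself. The Turtle output uses one full-URI statement per line, which is valid (if verbose) Turtle.

```python
import json

def turtle_literal(value):
    """Quote a plain string as a Turtle literal, escaping \\, " and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def to_turtle(subject, props):
    """One statement per line, with full URIs; valid (if verbose) Turtle."""
    return "".join(
        f"<{subject}> <{pred}> {turtle_literal(val)} .\n"
        for pred, val in sorted(props.items())
    )

def to_json(subject, props):
    """Hypothetical JSON shape: not the Linked Data API's own format."""
    return json.dumps({"uri": subject, "properties": props}, sort_keys=True)

# Path suffix -> (media type, serialiser). The suffix convention is assumed.
SERIALISERS = {
    ".ttl": ("text/turtle", to_turtle),
    ".json": ("application/json", to_json),
}

def render(path, subject, props):
    """Serialise a resource in the format named by the request path's suffix."""
    for suffix, (media_type, serialise) in SERIALISERS.items():
        if path.endswith(suffix):
            return media_type, serialise(subject, props)
    raise ValueError(f"unsupported format for {path!r}")

# Hypothetical resource used for illustration.
SUBJECT = "http://example.org/musicnet/composer/1"
PROPS = {"http://xmlns.com/foaf/0.1/name": "Example Composer"}
```

For example, render("/composer/1.ttl", SUBJECT, PROPS) returns the Turtle media type together with a single statement giving the composer’s foaf:name.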

I would like to see a similar infrastructure in place on a centrally hosted data.ac.uk.

 
 

data.ac.uk proposal

20 Jul

Since our previous post on a domain name for our project, it has been suggested that a scalable solution would be a central data repository for the UK academic community (data.ac.uk), in a similar style to data.gov.uk, the repository of open public government data.

To recap the problem: a commercial domain carries a yearly fee (typically 10-20 GBP), which is impractical to keep paying for many years on typical short-term academic projects, so a purchased domain would eventually lapse. While this may not be crucial for a project homepage, it is crucial for Linked Data, where the URLs of the data must exist in perpetuity because the data themselves use these URIs. Since our project’s output is Linked Data intended for use by anyone who publishes data that may include classical music (our partners include BBC /music, who are interested in using our URLs), the domain must exist in perpetuity.
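The renewal burden is easy to quantify. With a yearly fee f and a domain kept alive for n years, the cumulative cost grows linearly; the ten-year horizon below is only an illustrative figure.

```latex
C(n) = f\,n, \qquad 10 \le f \le 20\ \text{GBP/year}
\quad\Longrightarrow\quad
C(10) \in [100,\ 200]\ \text{GBP}
```

The sum itself is modest; the difficulty is that it recurs indefinitely. For Linked Data URIs that must persist, n has no upper bound, and the payments continue long after the funding that could cover them has stopped.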

We also found that it would be best to not use a subdomain of our school (such as musicnet.soton.ac.uk), since this would be seen as partisan to the school/university and is likely to get less uptake than something at a higher level (such as musicnet.ac.uk).

Current JANET/UKERNA policy does not allow us to have a domain directly under ac.uk (musicnet.ac.uk), because these are limited to centrally funded projects lasting at least 2 years. This makes sense, as it avoids the overhead of registering domains for thousands of small projects.

Thus, one suggestion that has been floated is for JISC to fund and host a “data.ac.uk” domain and/or repository, providing a Linked Data domain and/or web hosting solution for academic data publishers (such as MusicNet).

There are two key points that I would like to make:

1) It will lower the technical and financial barrier to entry for people who have some RDF to publish.

If a project has some RDF to publish right now, it first has to figure out how to publish it correctly as Linked Data. Few academic projects have managed this, and they probably don’t realise that they could be doing better. A central service that can host and manage data properly also has the potential to add extra features. This is analogous to a cloud-hosted blog service, such as tumblr.com or blogspot.com, where security patches and new features are added for free by the host, without the publishers doing anything. For RDF, I can foresee better human-readable access, backlink features to external RDF and so on being added over time, even to legacy RDF hosted for projects that have long since finished. Hosting and maintaining the servers also becomes the responsibility of the data.ac.uk team rather than of a short-term project. The investment in creating this data is then protected, since the central repository holds it.
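Backlinks are one example of a feature a central host could add without publishers doing anything: for any hosted URI, list the hosted statements that point at it. A minimal sketch, assuming the host can scan every hosted dataset as a flat list of triples. The IRI and Literal types are minimal stand-ins (not a real RDF library’s classes), and all URIs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple, Union

class IRI(NamedTuple):
    value: str

class Literal(NamedTuple):
    value: str

Triple = Tuple[str, str, Union[IRI, Literal]]

def build_backlink_index(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each object IRI to the (subject, predicate) pairs that link to it.
    Literal objects are skipped because nothing can link 'to' a literal."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, predicate, obj in triples:
        if isinstance(obj, IRI):
            index[obj.value].append((subject, predicate))
    return dict(index)

def backlinks(index: Dict[str, List[Tuple[str, str]]], iri: str) -> List[Tuple[str, str]]:
    """Statements (across all hosted datasets) that point at iri."""
    return index.get(iri, [])

# Hypothetical statements from two hosted projects referring to one composer.
COMPOSER = "http://example.org/musicnet/composer/1"
HOSTED = [
    ("http://example.org/projectA/work/1", "http://purl.org/dc/terms/creator", IRI(COMPOSER)),
    ("http://example.org/projectB/article/7", "http://purl.org/dc/terms/subject", IRI(COMPOSER)),
    (COMPOSER, "http://xmlns.com/foaf/0.1/name", Literal("Example Composer")),
]
```

Because the index is rebuilt from whatever is hosted, the feature applies equally to legacy datasets whose projects have finished.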

2) It must not limit what technically able projects can do.

In our case, we do not require a hosting solution, because we already know how to host data, and we have already negotiated hosting of our data in perpetuity by the central School of Electronics and Computer Science administration team (this was a key part of our project bid).

Furthermore, we wish to host an alignment service that lets musicologists edit the data after it has been published, so that it is kept up to date and any mistakes can be fixed over time. Other projects may have different needs.

Thus it is important for projects like ours to be able to apply for just a subdomain of data.ac.uk (musicnet.data.ac.uk), without hosting, so that we can run our own hosting and bespoke services.
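The split between centrally hosted projects and projects that only want a subdomain can be pictured as a registry lookup. In practice the delegation would happen in DNS; this sketch only models the decision of who answers a given URI, and the registry contents (apart from the musicnet label used in the text) are hypothetical.

```python
from urllib.parse import urlsplit

CENTRAL = "data.ac.uk"

# Hypothetical registry: which project subdomains are hosted centrally and
# which are delegated (subdomain only) to the project's own servers.
REGISTRY = {
    "musicnet": "delegated",     # project runs its own hosting and services
    "exampleproject": "hosted",  # hypothetical project using central hosting
}

def who_serves(uri):
    """Return ('central' | 'project', label) for a URI under data.ac.uk."""
    host = (urlsplit(uri).hostname or "").lower()
    if host == CENTRAL:
        return ("central", None)
    suffix = "." + CENTRAL
    if not host.endswith(suffix):
        raise ValueError(f"{uri!r} is not under {CENTRAL}")
    label = host[: -len(suffix)]
    mode = REGISTRY.get(label)
    if mode is None:
        raise LookupError(f"no project registered for {label!r}")
    return ("project" if mode == "delegated" else "central", label)
```

Both kinds of project get the same stable data.ac.uk-based URIs; only the party that answers requests differs.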

 
 

Data, URIs & Permanence

12 Jul

There is a strong drive in the UK at the moment to turn public sector data into semantically marked-up and globally accessible resources (see OpenPSI for examples of use). There is also a heavy push from academic funders to make the outputs of research projects available in a similar form, such as the JISCexpo call under which this project is funded.

Whilst there seems to be a lot of focus on the creation of URI schemes (Jeni Tennison, data.gov.uk), there doesn’t seem to be the same consideration given to the permanence of the resources created. Typically, a project concerned with creating Linked Data is funded for a fixed period of time, during which the funding pays for the staff who carry out the work and ultimately produce some Linked Data. Once the project has finished and the funding has stopped, what happens to the data that has been produced?

One of the primary requirements of useful Linked Data is that the URIs created exist in perpetuity, so that future datasets can link to them. Assuming a project is funded for a year and there is a commitment to host the data locally for a further year, what happens after that period has elapsed?

What we need is provision for UK academic data to be hosted on the JANET network under a suitable .ac.uk domain. For example, the data outputs of this project could be hosted on http://musicnet.ac.uk, ensuring that:

  1. The URIs are institution-independent
  2. There is no ongoing administration cost for renewing a project domain name (.ac.uk domains are a one-off payment)
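The first of these properties can be checked mechanically when URIs are minted. A minimal sketch, assuming URIs are built under the example domain http://musicnet.ac.uk from opaque identifiers; the /composer/ path layout and the list of institutional domains are hypothetical. Minting from identifiers rather than names keeps a URI stable even when the data describing it is later corrected.

```python
from urllib.parse import urlsplit

BASE = "http://musicnet.ac.uk"  # example project domain from the text

# Hypothetical institutional domains that a permanent URI should not depend on.
INSTITUTIONAL_DOMAINS = ("soton.ac.uk", "southampton.ac.uk")

def mint_uri(kind, local_id, base=BASE):
    """Build a URI from an opaque, never-reused identifier (not a name), so
    later corrections to the data do not force the URI to change.
    The /<kind>/<id> path layout is hypothetical."""
    local_id = str(local_id)
    if not (local_id.isascii() and local_id.isalnum()):
        raise ValueError(f"identifier must be ASCII alphanumeric: {local_id!r}")
    return f"{base}/{kind}/{local_id}"

def is_institution_independent(uri):
    """True if the URI's host is not inside any listed institutional domain."""
    host = (urlsplit(uri).hostname or "").lower()
    return not any(host == d or host.endswith("." + d) for d in INSTITUTIONAL_DOMAINS)
```

A check like is_institution_independent could run whenever a dataset is published, flagging URIs such as http://musicnet.soton.ac.uk/... before anyone starts linking to them.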



The current rules make it hard for short-term projects to acquire an ac.uk domain, as a project must be funded for at least two years. It is important that there is a process for deciding which projects should and shouldn’t get a domain name. However, as the way we use the web changes, with the focus shifting towards semantically marked-up data rather than just human-consumed HTML, the academic research community needs to discuss and rethink the criteria on which this decision is made.

This issue of persistent URI hosting is a new and increasingly important problem for the emerging Semantic Web. If we started to assess projects on their impact over time rather than just their funding duration, we might encourage the creation of more short-term projects that expose Linked Data, which can only be a good thing for the community at large!

There is also the issue of hosting: who will actually provide the server space where the data is to reside? However, this is a secondary issue, and one that is easier to discuss once a system is in place for maintaining the URIs themselves.