
Archive for October, 2010

MusicNet & LinkedBrainz Meetup

25 Oct

Overview

Last Friday the MusicNet team headed to QMUL to meet with Kurt Jacobson & Simon Dixon from the LinkedBrainz project. LinkedBrainz is also funded by the JISC Expose (#jiscexpo) programme and is working, in conjunction with MusicBrainz, to produce an official Linked Data mapping for the MusicBrainz database. You can follow their progress on the project blog at http://linkedbrainz.c4dmpresents.org/.

What does a collaboration look like?

As well as learning a bit more about each other’s projects, we looked at a few ways in which we might collaborate over the coming months. We also came away with a data export of all Classical musicians from the most recent MusicBrainz database. This will allow us to link our exposed composer URIs directly to their MusicBrainz (or LinkedBrainz) equivalents, which will greatly increase the utility of our URIs, especially as organisations such as the BBC are already using MusicBrainz IDs.

Adding “same-as” links to LinkedBrainz is only one side of the solution; ideally we would also convince the MusicBrainz community to provide the reverse links. This is likely to be a longer-term outcome, and one we should approach once the issue of URI sustainability has been resolved (data.ac.uk?).
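To make the idea of a “same-as” link concrete, here is a minimal sketch in PHP that writes such links as N-Triples using owl:sameAs. The function name same_as_triple and both example.org URIs are hypothetical placeholders; they are not the real MusicNet or LinkedBrainz identifiers, and this is not necessarily how the project publishes its links.

```php
<?php
// Build one N-Triples line asserting that two URIs identify the same thing.
function same_as_triple($ours, $theirs)
{
    return sprintf('<%s> <http://www.w3.org/2002/07/owl#sameAs> <%s> .', $ours, $theirs);
}

// Hypothetical alignment result: our composer URI => external URI.
$alignment = array(
    'http://example.org/musicnet/composer/1' => 'http://example.org/linkedbrainz/artist/abc',
);

foreach ($alignment as $ours => $theirs) {
    echo same_as_triple($ours, $theirs), "\n";
}
```

The reverse linking mentioned above would simply be the same statement published by the other party with the two URIs swapped.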

How will we align our URIs to LinkedBrainz?

We’ll use our custom-built Alignment Tool! Over the last few months we’ve spent quite a while engineering the tool and making sure it’s as reusable as possible. We plan to add the LinkedBrainz data as though it were just another partner’s catalog. This means that once our Musicology expert has performed the alignment, we’ll know not only the overlaps between our partners’ catalogs but also how they map to MusicBrainz and, by proxy, to Wikipedia, the BBC, etc.

 
 

Alignment Tool Implementation

19 Oct

In this post we’ll discuss a little about the implementation of the relatively simple server component of the Alignment Tool. You can read more about the tool in previous posts (Beta Release, Assisted Manual Data Alignment), or download the source yourself and have a play.

Server Application Component

Our servers run a typical LAMP (Linux, Apache, MySQL, PHP) stack. Although they can also run Python, Perl & Ruby, we decided to develop the server component in PHP because that is where the project team’s experience lies. Usually when we need to write a PHP-driven application we reach for the Kohana Framework.

Kohana is an elegant HMVC PHP5 framework that provides a rich set of components for building web applications

HMVC (or Hierarchical MVC) is an extension of the more commonly used MVC (Model View Controller) pattern. HMVC is mainly useful for building the modular “widgets” that make up a webpage; we won’t discuss it further as it isn’t needed for MusicNet.

In MVC, each object in a system is separated into one of the following groups:

  1. Model: Objects which make up the data structures used in the system
  2. View: Typically the UI
  3. Controller: Where application specific code is implemented

MVC allows for proper code separation and makes for easier design and maintenance.
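As a rough illustration of this separation, here is a minimal sketch in PHP. The class names ComposerModel, ComposerView and ComposerController and the sample data are hypothetical; they do not come from Kohana or from the Alignment Tool.

```php
<?php
// Model: holds the data structures (hypothetical sample data).
class ComposerModel
{
    private $composers = array('Schubert', 'Haydn');

    public function all()
    {
        return $this->composers;
    }
}

// View: turns data into markup and does nothing else.
class ComposerView
{
    public function render(array $composers)
    {
        $items = '';
        foreach ($composers as $name) {
            $items .= '<li>' . htmlspecialchars($name) . '</li>';
        }
        return '<ul>' . $items . '</ul>';
    }
}

// Controller: application-specific code that connects the model to the view.
class ComposerController
{
    public function index()
    {
        $model = new ComposerModel();
        $view = new ComposerView();
        return $view->render($model->all());
    }
}

$controller = new ComposerController();
echo $controller->index(), "\n";
```

Because the view only receives an array, it could be swapped for one that emits JSON without touching the model or the controller logic.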

Using Kohana, each HTTP request to the server is interpreted as a method call on a controller object. For example:

http://myserver.com/api/get_tags

This URL equates to calling the public function get_tags() on the controller object Api.
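The mapping from URL to method call can be sketched as follows. This is a simplified illustration, not Kohana’s actual routing code: the dispatch function, the $controllers whitelist and the tags returned by get_tags() are all assumptions made for the example.

```php
<?php
// Hypothetical controller standing in for the Api controller mentioned above.
class Api
{
    public function get_tags()
    {
        return array('baroque', 'romantic');
    }
}

// Resolve a path such as "/api/get_tags" to a controller object and method,
// then call it with any remaining segments as arguments.
function dispatch($path)
{
    // Only controllers listed here can be reached from a URL.
    $controllers = array('api' => 'Api');

    $segments = explode('/', trim($path, '/'));
    $name = (string) array_shift($segments);     // "api"
    $method = (string) array_shift($segments);   // "get_tags"

    if (!isset($controllers[$name]) || !method_exists($controllers[$name], $method)) {
        return null;                             // unknown controller or action
    }

    $class = $controllers[$name];
    $controller = new $class();
    return call_user_func_array(array($controller, $method), $segments);
}

print_r(dispatch('/api/get_tags'));
```

The whitelist is a design choice for the sketch: it avoids instantiating an arbitrary class named in the URL.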

Lightweight PHP Framework

We felt that requiring the Kohana Framework for the server component of the Alignment Tool was a bit heavyweight, but we still wanted the flexibility of a lightweight MVC architecture in which to quickly code the AJAX API used by the JavaScript client. So, taking inspiration from Kohana’s URL interpreting, we wrote a lightweight framework of our own.

To achieve this we needed 3 distinct parts:

  1. URL Interpreting
  2. Controller Object
  3. Abstract UI Rendering

URL Interpreting

To enable Kohana-style requests we first needed to route all URL requests through a single PHP gateway script. As we’re using the LAMP stack this is easily done using Apache’s mod_rewrite. The .htaccess file in our /ajax folder looks like this:

# Turn on URL rewriting
RewriteEngine On
 
# Installation directory
RewriteBase /
 
# Allow any files or directories that exist to be displayed directly
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
 
# Rewrite all other URLs to index.php/URL
RewriteRule .* /ajax/index.php/$0 [PT,L]

The key part of this script is the last line, which tells Apache to send every request made below the ajax folder to the index file, unless the request matches a file or directory that already exists.

For our purposes we only need a single controller object so our bootstrapping code (index.php) only needs to work out the method/action and the arguments. Our bootstrap script looks like this:

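require_once('ajax.php');
 
// PATH_INFO holds whatever follows index.php in the rewritten URL
$path = isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '';
 
// The first segment is the method/action, the rest are its arguments
$parts = explode('/', trim($path, '/'));
$method = array_shift($parts);
$args = $parts;
 
// Call the requested method on the controller, then send the response
$ajax = new Ajax();
call_user_func_array(array($ajax, $method), $args);
$ajax->output();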
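To see what the bootstrap script extracts from a request, the splitting step can be isolated into a small function and tried on sample paths. The function name parse_request and the path /grouped/10/20 are hypothetical; the real grouped() action may not take any arguments.

```php
<?php
// Split a PATH_INFO string into a method name and its arguments,
// mirroring the explode/array_shift steps of the bootstrap script.
function parse_request($path)
{
    $parts = explode('/', trim($path, '/'));
    $method = array_shift($parts);
    return array('method' => $method, 'args' => $parts);
}

var_export(parse_request('/grouped/10/20'));
echo "\n";
```

For this path the method is 'grouped' and the arguments are '10' and '20', both strings. An empty path yields an empty method name, because explode() on an empty string returns an array containing one empty string.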

Controller Object

Now that all the routing is taken care of we just need a simple Controller class with a public function for each method/action in our API:

class Ajax
{
   private $status = 200;
   private $message = "Success";
   private $data = array();
 
   // Fetch all Ungrouped items
   public function ungrouped()
   { }
 
   // Fetch all Grouped items
   public function grouped()
   { }
}

We also need an output() function as this is what the bootstrap script calls to send output to the client.

public function output()
{
   if($this->status == 404)
      header('HTTP/1.0 404 Not Found');
 
   $output = array(
      "status"	=> $this->status,
      "message"	=> $this->message,
      "data"		=> $this->data,
   );
 
   header("Content-Type: application/json");
   echo json_encode($output);
}

It’s also a good idea to implement the __call method in case the client makes an unrecognised request:

public function __call($method, $args)
{
   $this->status = 404;
   $this->message = "Unknown method: '$method'";
}
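The fallback can be tried in isolation with a cut-down version of the controller. In this sketch the class name AjaxSketch, the ping() action and the toJson() method are hypothetical; toJson() builds the same payload as output() but returns it as a string and sends no headers, so that it can be run from the command line.

```php
<?php
// Cut-down stand-in for the Ajax controller; ping() is a hypothetical action.
class AjaxSketch
{
    private $status = 200;
    private $message = "Success";
    private $data = array();

    public function ping()
    {
        $this->data = array('pong');
    }

    // PHP invokes __call for any method that is undefined or inaccessible.
    public function __call($method, $args)
    {
        $this->status = 404;
        $this->message = "Unknown method: '$method'";
    }

    // Same payload as output(), returned as a string instead of echoed.
    public function toJson()
    {
        return json_encode(array(
            "status"  => $this->status,
            "message" => $this->message,
            "data"    => $this->data,
        ));
    }
}

$ajax = new AjaxSketch();
call_user_func_array(array($ajax, 'no_such_action'), array());
echo $ajax->toJson(), "\n";
```

The unknown action never raises a PHP error; the client simply receives a JSON body whose status field is 404.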

Abstract UI Rendering

The final piece of the system is abstract UI rendering. One of the calls in the Alignment Tool requires the server to return HTML rather than JSON. To keep this rendering out of the Controller class, and to enable 3rd parties (we hope the Alignment Tool will be useful to others too!) to write their own Views for their own data, we use PHP’s output buffering:

ob_start();
 
include_once("views/musicnet.php");
 
$html = ob_get_contents();
ob_end_clean();

By including the file in this way, the View script shares the variable scope of the method in the Controller object, including $this. Here’s an extract from our View file to give an idea of how it can be used.

<?php foreach($this->data as $item): ?>
	<div class="item" id="info-<?=$item->id?>">
		<h1><?=$item->label?></h1>
		<ul class="metadata">
			<?php if(isset($item->metadata->Birth_Date)): ?>
				<li><span class="title">Birth Date</span><?=$item->metadata->Birth_Date?></li>
			<?php endif; ?>
			<?php if(isset($item->metadata->Death_Date)): ?>
				<li><span class="title">Death Date</span><?=$item->metadata->Death_Date?></li>
			<?php endif; ?>
		</ul>
	</div>
<?php endforeach; ?>
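The output-buffering technique can also be wrapped in a small reusable helper. This is a minimal sketch under stated assumptions: render_view is a hypothetical function, the view is a one-line template written to a temporary file for the demo, and the data is passed in as a local $data variable rather than through $this as in the Alignment Tool.

```php
<?php
// Render a view script to a string. $data is visible inside the view
// because include runs in the scope of this function.
function render_view($file, array $data)
{
    ob_start();
    include $file;            // include rather than include_once, so a view can be rendered repeatedly
    return ob_get_clean();    // return the buffered output and discard the buffer
}

// Hypothetical one-line view written to a temporary file for the demo.
$view = tempnam(sys_get_temp_dir(), 'view');
file_put_contents($view, '<h1><?php echo htmlspecialchars($data["label"]); ?></h1>');

echo render_view($view, array('label' => 'Franz Schubert')), "\n";
unlink($view);
```

Nothing reaches the client until the returned string is echoed, so the controller is free to embed the HTML in a JSON response or send it directly.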
 
 

MusicNet at AHM 2010

13 Oct
City Hall, Cardiff

Last month at the UK e-Science All Hands Meeting 2010 at City Hall in Cardiff (13-16 September 2010), we gave our first conference paper about the MusicNet project. Thank you to everyone who came to our session and asked questions. It was informative to learn that many other delegates have encountered datasets (across a range of subjects, from geography to chess!) in which synonymous entities are not aligned, which is precisely the problem that the alignment tool we are building for MusicNet aims to address.

A short abstract of our paper is given below, but please also take a look at the extended abstract and our presentation slides.

Thank you to the organising committee and administrators for making the event run so smoothly.

ABSTRACT: The MusicNet Composer URI Project
Daniel Alexander Smith, David Bretherton, Joe Lambert, and mc schraefel

In any domain, a key activity of researchers is to search for and synthesize data from multiple sources in order to create new knowledge. In many cases this process is laborious, to the point of making certain questions nearly intractable because the cost of the searches outstrips the time available to complete the research. As more resources are published as Linked Data, data from multiple heterogeneous sources should be more rapidly discoverable and automatically integrable, enabling previously intractable queries to be explored, and standard queries to be significantly accelerated for more rapid knowledge discovery. But Linked Data is not of itself a complete solution. One of the key challenges of Linked Data is that its strength is also a weakness: anyone can publish anything. So in classical music, for instance, 17 sources may publish data about ‘Schubert’, but there is no de facto way to know that any of these Schuberts are the same, because the sources are not aligned. Without alignment, much of the benefit of Linked Data is diminished: resources can effectively be stranded rather than discovered, or tangled nets of only guessed at associations in a particular dataset can end up costing more than their value to untangle.

The MusicNet project, which emerged out of Southampton’s musicSpace project, is set to address the challenge just outlined by “minting” URIs for key musicology assets to provide a framework for the effective exploration of Linked Data about classical music. Unique URIs will be minted for each composer that exists in our data partners’ datasets. Basic biographical data will also be exposed, as well as name variants in different sources to allow for compatibility with legacy data. Crucially, this information will be curated by domain experts so that MusicNet will become a reliable source of data about the names of classical music composers. However, the real benefit of this work is that it will align identifiers across data sources, which is a prerequisite for the creation of Linked Data classical music and musicology resources, if such resources are to be optimally useful and usable.

The establishment of authoritative URIs for composers, and moreover the disambiguation of composers in online data sources that will flow from this, is an essential first step in the provision of Linked Data services for classical music and musicology. Our work will provide a model and tools that can usefully be employed elsewhere.