Goal: ontology matching aims to deal with heterogeneity in available data; it can also be used to interlink data. The goal of this session is to experiment with ontology matching in the context of data exposed on the web.

Learning Outcomes: students have working knowledge of simple ontology matching and simple alignment manipulation. They should experience the benefits of doing so in exploiting data.

Duration: 60mins

Champion: Jérôme

Involved: Mathieu, Andrea

Tools: Alignment API, SPARQL (Jena suite is OK)

Note on software and links: The tutorial may be achieved online or offline:

The Alignment API and server (version 4.4) are available from the software archive. To install the server, follow the simplest instructions in the README.AServ file. They indicate which URL prefixes to use.

Alternative playground: There are two sources of similar hands-on exercises that may be considered by students, in particular:

Starting with a SPARQL query

We start with the SPARQL query provided in the SPARQL hands-on session.

It may be summarised as:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?event ?title ?date
WHERE {
  ?event <http://linkedevents.org/ontology/inSpace> ?space .
  ?space <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat. 
  ?space <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?lon.
  ?event <http://purl.org/dc/elements/1.1/title> ?title .
  ?event <http://linkedevents.org/ontology/atTime> ?odate .
  ?odate <http://www.w3.org/2006/time#inXSDDateTime> ?date .
  FILTER ( ( xsd:float(?lat) - 45.22 ) > -0.5  && ( xsd:float(?lat) - 45.22 ) < 0.5 
        && ( xsd:float(?lon) - 5.81 ) > -0.5  && ( xsd:float(?lon) - 5.81 ) < 0.5 
        && regex(str(?date), "2012", "i") )
}
LIMIT 10

Evaluating it on http://eventmedia.eurecom.fr/sparql/ returns the following answers:

event | title | date
http://data.linkedevents.org/event/98d5a7e8-b74e-4ad5-8cd7-855bd3e2c9a3 | Apéro Web | 2012-04-26T00:00:00+02:00
http://data.linkedevents.org/event/98d5a7e8-b74e-4ad5-8cd7-855bd3e2c9a3 | Apéro JS | 2012-05-10T00:00:00+02:00
http://data.linkedevents.org/event/1c466435-05e9-4066-a240-1468e96df936 | Soirée Devops @ Kelkoo | 2012-06-25T00:00:00+02:00
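
The query above can also be submitted from a script. The following is a minimal sketch, assuming the endpoint follows the standard SPARQL protocol (the query text is passed in a `query` parameter of a GET request). The function names and the parameters `half_width` and `limit` are illustrative choices, not part of the hands-on material. The sketch only builds the query and the request URL; it does not contact the server.

```python
from urllib.parse import urlencode

# Endpoint named in the text above.
ENDPOINT = "http://eventmedia.eurecom.fr/sparql/"


def build_query(lat, lon, year, half_width=0.5, limit=10):
    """Return the SPARQL text for events within half_width degrees of
    (lat, lon) whose date string contains the given year."""
    return f"""PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?event ?title ?date
WHERE {{
  ?event <http://linkedevents.org/ontology/inSpace> ?space .
  ?space <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat .
  ?space <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?lon .
  ?event <http://purl.org/dc/elements/1.1/title> ?title .
  ?event <http://linkedevents.org/ontology/atTime> ?odate .
  ?odate <http://www.w3.org/2006/time#inXSDDateTime> ?date .
  FILTER ( ( xsd:float(?lat) - {lat} ) > -{half_width} && ( xsd:float(?lat) - {lat} ) < {half_width}
        && ( xsd:float(?lon) - {lon} ) > -{half_width} && ( xsd:float(?lon) - {lon} ) < {half_width}
        && regex(str(?date), "{year}", "i") )
}}
LIMIT {limit}"""


def build_request_url(endpoint, query):
    """Percent-encode the query and attach it as the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


# Same centre point and year as the query in the text.
url = build_request_url(ENDPOINT, build_query(45.22, 5.81, 2012))
```

The resulting URL can then be fetched with any HTTP client, typically with an Accept header asking for a SPARQL results format.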

This query and the underlying data it retrieves are expressed with respect to the LODE ontology [http://linkedevents.org/ontology/ | ontos/lode.rdf] (as well as Dublin Core).

Matching with schema.org

Schema.org has been set up by search companies to provide a uniform way to semantically annotate web pages, using the schema.org ontology [http://schema.org/docs/schemaorg.owl | ontos/schemaorg.owl]. However, neither this query nor the linkedevents data source is expressed in schema.org.

Your goal in this hands-on session is to find alignments between the two ontologies.

For that purpose, we will explore the use of alignment techniques.

For those using the Alignment API, we will first define a few variables:

   
$ SOFTDIR=   # the directory in which software is installed
$ JAVALIB=$SOFTDIR/alignapi/lib

and check that this works:

$ java -jar $JAVALIB/procalign.jar

  1. Try generating an alignment with the StringDistAlignment method, using either the command line or the Alignment server.

    $ java -jar $JAVALIB/procalign.jar file:ontos/schemaorg.owl file:ontos/lode.rdf -o results/schema-lode-stringeq.rdf
    

    Observe the results (in results/schema-lode-stringeq.rdf).

  2. Try to do the same with the Levenshtein measure.

     $ java -jar $JAVALIB/procalign.jar -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=levenshteinDistance  file:ontos/schemaorg.owl file:ontos/lode.rdf -o results/schema-lode-levenstein.rdf
    

    Observe the results (in results/schema-lode-levenstein.rdf).

  3. If a measure returns many low-confidence correspondences of poor quality, the alignment can be trimmed to discard them. Trim the obtained alignment with an adequate threshold (.5 in the example):

    $ java -jar $JAVALIB/procalign.jar -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=levenshteinDistance  file:ontos/schemaorg.owl file:ontos/lode.rdf -t .5 -o results/schema-lode-levenstein5.rdf
    

    Should the results be better? Observe them (in results/schema-lode-levenstein5.rdf).

  4. Inspecting an alignment by hand remains the best way to judge its quality, but tools can help with the comparison. The Alignment API provides evaluators, which can, for instance, compute precision and recall or display the differences between two alignments. Both need a reference alignment, which we provide.

    Use them as follows:

    $ java -cp $JAVALIB/procalign.jar fr.inrialpes.exmo.align.cli.EvalAlign -i fr.inrialpes.exmo.align.impl.eval.PRecEvaluator file:server/schemaorg-lode.rdf file:results/schema-lode-levenstein5.rdf
    $ java -cp $JAVALIB/procalign.jar fr.inrialpes.exmo.align.cli.EvalAlign -i fr.inrialpes.exmo.align.impl.eval.DiffEvaluator file:server/schemaorg-lode.rdf file:results/schema-lode-levenstein5.rdf
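
To make steps 2 to 4 concrete, here is a minimal, self-contained sketch of what they compute: an edit-distance similarity between labels, trimming under a threshold, and precision and recall against a reference. It is not the Alignment API implementation, and its normalisation of the Levenshtein distance may differ from the one used by `levenshteinDistance`. The labels and the reference alignment are toy values chosen for illustration, not extracted from the ontology files.

```python
def levenshtein(a, b):
    """Edit distance: insertions, deletions and substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Case-insensitive similarity in [0, 1]; 1.0 means identical labels."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def match(labels1, labels2):
    """Pair each label of the first list with its most similar label
    in the second list, keeping the similarity as confidence."""
    cells = []
    for l1 in labels1:
        best = max(labels2, key=lambda l2: similarity(l1, l2))
        cells.append((l1, best, similarity(l1, best)))
    return cells


def trim(cells, threshold):
    """Keep only the correspondences whose confidence reaches the threshold."""
    return [cell for cell in cells if cell[2] >= threshold]


def precision_recall(found, reference):
    """Compare two sets of (entity1, entity2) pairs."""
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Toy labels and a toy reference alignment (assumptions for illustration).
labels_a = ["Event", "startDate", "location"]
labels_b = ["Event", "atTime", "atPlace"]
reference = {("Event", "Event"), ("startDate", "atTime")}

cells = match(labels_a, labels_b)
trimmed = trim(cells, 0.5)
precision, recall = precision_recall([(a, b) for a, b, _ in trimmed], reference)
```

On this toy input, trimming removes the weak pairs, which raises precision but cannot recover a correspondence such as the second reference pair, whose labels are not similar as strings. The same trade-off is worth looking for in the evaluator output.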
    

Using the obtained alignment

Once this alignment has been obtained, our goal is to use it, assuming that a set of instances expressed in schema.org is available.

You can try using the Alignment API or the server to generate, from the alignment that you obtained, a transformation in a form applicable to your purpose. For instance, to generate an XSLT transformation, you may try:

$ java -cp $JAVALIB/procalign.jar fr.inrialpes.exmo.align.cli.ParserPrinter results/schema-lode-levenstein5.rdf -r fr.inrialpes.exmo.align.impl.renderer.XSLTRendererVisitor
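
A renderer turns the correspondences of an alignment into something executable. The same idea can be sketched by hand: read the cells of the alignment file and build a rename table. The sketch below assumes that the file is RDF/XML in which each correspondence is a `Cell` element with `entity1`, `entity2`, `relation` and `measure` children; check this against the files in results/ before relying on it. The sample document and its example.org URIs are made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_cells(xml_text):
    """Return (entity1, entity2, relation, measure) for every Cell element."""
    cells = []
    for elem in ET.fromstring(xml_text).iter():
        if local(elem.tag) != "Cell":
            continue
        fields = {local(child.tag): child for child in elem}
        cells.append((fields["entity1"].get(RDF_RESOURCE),
                      fields["entity2"].get(RDF_RESOURCE),
                      (fields["relation"].text or "").strip(),
                      float(fields["measure"].text)))
    return cells


def rename_table(cells):
    """Map entity1 to entity2 for equivalence correspondences only."""
    return {e1: e2 for e1, e2, relation, _ in cells if relation == "="}


# Hypothetical sample in the assumed layout (not a real alignment).
SAMPLE = """<rdf:RDF xmlns='http://knowledgeweb.semanticweb.org/heterogeneity/alignment#'
         xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
  <Alignment>
    <map>
      <Cell>
        <entity1 rdf:resource='http://example.org/onto1#Event'/>
        <entity2 rdf:resource='http://example.org/onto2#Event'/>
        <relation>=</relation>
        <measure rdf:datatype='http://www.w3.org/2001/XMLSchema#float'>1.0</measure>
      </Cell>
    </map>
  </Alignment>
</rdf:RDF>"""

table = rename_table(read_cells(SAMPLE))
```

Matching on local element names rather than on a fixed namespace keeps the sketch tolerant of variations in the namespace URI used by the file.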

From annotations to annotations

Here, we will try to take advantage of distributed resources for manipulating data on the web. Your goal is to be able to convert semantic markup from RDFa web pages (HTML pages with embedded RDF) so that it can be indexed by search engines. For that purpose, it will need to be converted to the schema.org ontology.

Looking at some pages online, such as http://data.linkedevents.org/event/89be8dd3-758e-4cec-9254-80f6d093bdd0 | pages/le-coderoma.html, which is a linkedevents page served by Virtuoso (that may have been returned by our previous query), or a corresponding offline page http://www.codemotion.it | pages/codemotion2012.html, you will find that they are RDFa pages.

For this, you can use the HTML interface of the Alignment server or its REST access (when the server is launched with the -W switch):

curl -L -H "Accept:application/rdf+xml" 'http://aserv.inrialpes.fr/rest/find?onto1=http://linkedevents.org/ontology/&onto2=http://schema.org/'

Try to use these alignments to convert the vocabulary used in the ontologies, by finding corresponding entities and replacing them (replace the id with the one returned by the previous request):

curl -L -H "Accept:application/rdf+xml" 'http://aserv.inrialpes.fr/rest/corresp?id=http://aserv.inrialpes.fr:8089/alid/1342351942522/157&entity=http://linkedevents.org/ontology/atTime'
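
The two requests and the replacement step can be sketched as follows. The server address and the parameter names (`onto1`, `onto2`, `id`, `entity`) are taken from the curl commands above. The helper names, the sample triple and the target URI are hypothetical, and the sketch does not parse the server's answer, whose format should be checked first. Percent-encoding the parameters is an assumption about what the server accepts, although it is the standard way to pass URIs in a query string.

```python
from urllib.parse import urlencode

# REST prefix used in the curl commands above.
SERVER = "http://aserv.inrialpes.fr/rest"


def rest_url(action, **params):
    """Build a REST URL with percent-encoded query parameters."""
    return f"{SERVER}/{action}?{urlencode(params)}"


def rewrite(triples, renames):
    """Replace every term of every triple that has a known correspondent;
    terms without a correspondent are left unchanged."""
    return [tuple(renames.get(term, term) for term in triple)
            for triple in triples]


# Request for alignments between the two ontologies.
find_url = rest_url("find",
                    onto1="http://linkedevents.org/ontology/",
                    onto2="http://schema.org/")

# Hypothetical rename table, as it could be filled from 'corresp' answers.
renames = {
    "http://linkedevents.org/ontology/atTime": "http://example.org/target#time",
}

# Hypothetical triple extracted from an RDFa page.
triples = [("http://example.org/event/1",
            "http://linkedevents.org/ontology/atTime",
            "http://example.org/time/1")]

converted = rewrite(triples, renames)
```

Leaving unmatched terms unchanged is a deliberate choice here: a partial alignment then yields a partially converted page rather than losing data.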