Goal: matching ontologies aims at dealing with heterogeneity in available data; it can also be used to interlink data. The goal of this session is to experiment with matching ontologies in the context of data exposed on the web.
Learning Outcomes: students have working knowledge of simple ontology matching and simple alignment manipulation. They should experience the benefits of doing so in exploiting data.
Duration: 60mins
Champion: Jérôme
Involved: Mathieu, Andrea
Tools: Alignment API, SPARQL (Jena suite is OK)
The tutorial may be carried out:
- At command line with the Alignment API (the instructions will be provided in this way),
- In a browser by using the Alignment server.
Note on software and links: The tutorial may be carried out online or offline:
- online, one may use the links provided here and the Alignment server available at [http://aserv.inrialpes.fr],
- offline, one may use the cached links and install a local alignment server.
Alternative playground: There are two sources of similar hands-on exercises that may be considered by students, in particular:
- our usual Alignment API tutorial [http://alignapi.gforge.inria.fr/tutorial/], whose principles (tutorial 1 [http://alignapi.gforge.inria.fr/tutorial/tutorial1/]) are reproduced here (they are also available in the html directory of the Alignment API archive),
- an extra exercise [extra.html] that may help in understanding this.
Starting with a SPARQL query
We start with the SPARQL query provided in the SPARQL hands-on session.
It may be summarised as:
SELECT DISTINCT ?event ?title ?date
WHERE {
  ?event <http://linkedevents.org/ontology/inSpace> ?space .
  ?space <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat .
  ?space <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?lon .
  ?event <http://purl.org/dc/elements/1.1/title> ?title .
  ?event <http://linkedevents.org/ontology/atTime> ?odate .
  ?odate <http://www.w3.org/2006/time#inXSDDateTime> ?date .
  FILTER (
    ( xsd:float(?lat) - 45.22 ) > -0.5 && ( xsd:float(?lat) - 45.22 ) < 0.5 &&
    ( xsd:float(?lon) - 5.81 ) > -0.5 && ( xsd:float(?lon) - 5.81 ) < 0.5 &&
    regex(str(?date), "2012", "i")
  )
}
LIMIT 10
Evaluating it on http://eventmedia.eurecom.fr/sparql/ returns the following answers:
event | title | date |
---|---|---|
http://data.linkedevents.org/event/98d5a7e8-b74e-4ad5-8cd7-855bd3e2c9a3 | Apéro Web | 2012-04-26T00:00:00+02:00 |
http://data.linkedevents.org/event/98d5a7e8-b74e-4ad5-8cd7-855bd3e2c9a3 | Apéro JS | 2012-05-10T00:00:00+02:00 |
http://data.linkedevents.org/event/1c466435-05e9-4066-a240-1468e96df936 | Soirée Devops @ Kelkoo | 2012-06-25T00:00:00+02:00 |
This query and the underlying data it retrieves are expressed with respect to the LODE ontology [http://linkedevents.org/ontology/ | ontos/lode.rdf] (as well as Dublin Core).
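The query can also be sent to the endpoint programmatically. The following is a minimal Python sketch, assuming the endpoint accepts standard SPARQL Protocol GET requests with a `query` parameter; the helper name `build_query_url` and the shortened sample query are illustrative only, and the endpoint may not be reachable from your network.

```python
from urllib.parse import urlencode

# Endpoint named in this tutorial; availability is not guaranteed.
ENDPOINT = "http://eventmedia.eurecom.fr/sparql/"

# Shortened illustrative query (not the full query of this section).
SAMPLE_QUERY = (
    "SELECT DISTINCT ?event ?title WHERE { "
    "?event <http://purl.org/dc/elements/1.1/title> ?title } LIMIT 10"
)


def build_query_url(endpoint: str, query: str) -> str:
    """Return a SPARQL Protocol GET URL carrying the query text."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Only the URL is printed here; fetching it is left to the reader,
    # for instance with urllib.request and an Accept header such as
    # "application/sparql-results+json".
    print(build_query_url(ENDPOINT, SAMPLE_QUERY))
```

Pasting the full query of this section in place of `SAMPLE_QUERY` works the same way, since the whole text is percent-encoded into one parameter.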
Matching with schema.org
Schema.org has been set up by search companies for providing a uniform way to semantically annotate web pages using the schema.org ontology [http://schema.org/docs/schemaorg.owl | ontos/schemaorg.owl]. However, this query and the linkedevents data source are not expressed in the schema.org vocabulary.
Your goal in this hands-on session is to find alignments between the two ontologies.
For that purpose, we will explore the use of alignment techniques.
For those using the Alignment API, we will first define a few variables:
$ SOFTDIR=    # the directory in which the software is installed
$ JAVALIB=$SOFTDIR/alignapi/lib
and check that this works:
$ java -jar $JAVALIB/procalign.jar
- Try generating alignments with either the command line or the Alignment server with the StringDistAlignment method.
$ java -jar $JAVALIB/procalign.jar file:ontos/schemaorg.owl file:ontos/lode.rdf -o results/schema-lode-stringeq.rdf
Observe the results (in results/schema-lode-stringeq.rdf).
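The output file name suggests that this first run compares entity names for equality. To make the idea concrete, here is a minimal Python sketch of name-equality matching. It is not the Alignment API implementation: the function names, the case-insensitive comparison, and the tuple layout `(entity1, entity2, relation, confidence)` are assumptions made for illustration, and the entity lists are a hand-picked handful rather than the full ontologies.

```python
def local_name(uri: str) -> str:
    """Return the part of a URI after its last '#' or '/'."""
    return uri.rstrip("/#").replace("#", "/").rsplit("/", 1)[-1]


def match_by_name_equality(onto1, onto2):
    """Pair entities whose local names are equal (ignoring case).

    Returns a list of (entity1, entity2, relation, confidence) tuples.
    """
    index = {}
    for uri in onto2:
        index.setdefault(local_name(uri).lower(), []).append(uri)
    return [
        (u1, u2, "=", 1.0)
        for u1 in onto1
        for u2 in index.get(local_name(u1).lower(), [])
    ]


# A few entities picked by hand from each vocabulary (illustrative subset).
SCHEMA = ["http://schema.org/Event", "http://schema.org/Place", "http://schema.org/startDate"]
LODE = ["http://linkedevents.org/ontology/Event", "http://linkedevents.org/ontology/atPlace"]

if __name__ == "__main__":
    for correspondence in match_by_name_equality(SCHEMA, LODE):
        print(correspondence)
```

On this subset only the two `Event` classes are paired; `Place` and `atPlace` are missed, which is the kind of near miss that an edit distance can catch.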
- Try to do the same with the Levenshtein measure.
$ java -jar $JAVALIB/procalign.jar -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=levenshteinDistance file:ontos/schemaorg.owl file:ontos/lode.rdf -o results/schema-lode-levenstein.rdf
Observe the results (in results/schema-lode-levenstein.rdf).
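For reference, the Levenshtein distance counts the minimal number of single-character insertions, deletions and substitutions needed to turn one string into another. The Python sketch below computes it and derives a similarity in [0, 1]; dividing by the length of the longer string is an assumption of this sketch and not necessarily the normalisation used by the Alignment API.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; normalising by the longer length is an assumption."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(levenshtein("atplace", "place"), similarity("atplace", "place"))
```

With such a measure every pair of names receives some score, so the resulting alignment typically contains many correspondences with low confidence.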
- If a measure returns many low-confidence correspondences of poor quality, the alignment can be trimmed to get rid of them. Trim the obtained alignment with an adequate threshold (.5 in this example):
$ java -jar $JAVALIB/procalign.jar -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=levenshteinDistance file:ontos/schemaorg.owl file:ontos/lode.rdf -t .5 -o results/schema-lode-levenstein5.rdf
Are the results better? Inspect them (in results/schema-lode-levenstein5.rdf).
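Conceptually, trimming with a threshold just filters correspondences by confidence. A minimal Python sketch follows, assuming that correspondences whose confidence equals the threshold are kept (the exact boundary behaviour of the `-t` option is not checked here); the sample correspondences and their confidence values are invented for illustration.

```python
def trim(correspondences, threshold):
    """Keep correspondences whose confidence is at least `threshold`.

    Each correspondence is an (entity1, entity2, relation, confidence) tuple.
    """
    return [c for c in correspondences if c[3] >= threshold]


# Invented sample alignment: the confidence values are made up.
SAMPLE = [
    ("http://schema.org/Event", "http://linkedevents.org/ontology/Event", "=", 1.0),
    ("http://schema.org/Place", "http://linkedevents.org/ontology/atPlace", "=", 0.71),
    ("http://schema.org/name", "http://linkedevents.org/ontology/atTime", "=", 0.33),
]

if __name__ == "__main__":
    for correspondence in trim(SAMPLE, 0.5):
        print(correspondence)
```

Raising the threshold usually removes wrong correspondences but may also remove correct ones, which is why the trimmed alignment still has to be inspected.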
- To evaluate the quality of the obtained result, nothing is better than inspecting it directly. However, some tools can help with the comparison. The Alignment API offers evaluators, which can, for instance, compute precision and recall measures or display the differences between two alignments. A reference alignment is needed for this purpose; one is provided.
Use the evaluators with it:
$ java -cp $JAVALIB/procalign.jar fr.inrialpes.exmo.align.cli.EvalAlign -i fr.inrialpes.exmo.align.impl.eval.PRecEvaluator file:server/schemaorg-lode.rdf file:results/schema-lode-levenstein5.rdf
$ java -cp $JAVALIB/procalign.jar fr.inrialpes.exmo.align.cli.EvalAlign -i fr.inrialpes.exmo.align.impl.eval.DiffEvaluator file:server/schemaorg-lode.rdf file:results/schema-lode-levenstein5.rdf
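As a reminder of what precision and recall measure here, the following Python sketch compares a found alignment with a reference, both seen as sets of correspondences. It is a simplified illustration and not the code of PRecEvaluator: correspondences are reduced to hypothetical `(entity1, entity2, relation)` triples and confidence values are ignored.

```python
def precision_recall(found, reference):
    """Return (precision, recall, f_measure) of `found` against `reference`.

    Both arguments are collections of (entity1, entity2, relation) triples.
    """
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Placeholder entity names, for illustration only.
    found = {("a1", "b1", "="), ("a2", "b2", "="), ("a3", "b9", "=")}
    reference = {("a1", "b1", "="), ("a2", "b2", "="), ("a4", "b4", "="), ("a5", "b5", "=")}
    print(precision_recall(found, reference))
```

Precision is the share of found correspondences that are in the reference; recall is the share of reference correspondences that were found. Trimming tends to trade recall for precision.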
Using the obtained alignment
Once this alignment has been obtained, our goal is to use it considering that a set of instances in Schema.org is available.
- What strategies are available to evaluate the given query with respect to schema.org?
- Can you express these strategies with respect to the definitions given in class?
- Can you illustrate the use of such an alignment?
- How could this alignment be used for translating the query?
You can try using the Alignment API or the server for generating transformations from the alignment that you generated to a form applicable to your purpose. For instance, for generating an XSLT transformation, you may try:
$ java -cp $JAVALIB/procalign.jar fr.inrialpes.exmo.align.cli.ParserPrinter results/schema-lode-levenstein5.rdf -r fr.inrialpes.exmo.align.impl.renderer.XSLTRendererVisitor
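One simple strategy for translating the query is to substitute each URI by its counterpart in the alignment. The Python sketch below does this textually, assuming the alignment has been reduced to a dictionary of equivalent URIs; the two mappings shown are hypothetical and are not taken from the reference alignment. It only rewrites full URIs written between angle brackets, and it cannot handle structural differences, such as a date reached through an intermediate node on one side and given directly on the other.

```python
import re

# Hypothetical equivalences, for illustration only.
ALIGNMENT = {
    "http://purl.org/dc/elements/1.1/title": "http://schema.org/name",
    "http://linkedevents.org/ontology/atPlace": "http://schema.org/location",
}

# A URI reference in SPARQL: angle brackets with no whitespace inside,
# which leaves comparison operators such as "?x < 5" untouched.
IRI = re.compile(r"<([^<>\s]+)>")


def translate_query(query: str, alignment: dict) -> str:
    """Replace every aligned URI in `query` by its counterpart."""

    def replace(match):
        uri = match.group(1)
        return "<" + alignment.get(uri, uri) + ">"

    return IRI.sub(replace, query)


if __name__ == "__main__":
    query = "SELECT ?t WHERE { ?e <http://purl.org/dc/elements/1.1/title> ?t }"
    print(translate_query(query, ALIGNMENT))
```

The alternative strategy goes the other way: the data is transformed into the vocabulary of the query, which is what a generated transformation is meant for.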
From annotations to annotations
Here, we will try to take advantage of distributed resources for manipulating data on the web. Your goal is to be able to convert semantic markup from RDFa web pages (HTML pages with embedded RDF) so that it can be indexed by search engines. For that purpose, it will need to be converted to the schema.org ontology.
Looking at some pages online, such as [http://data.linkedevents.org/event/89be8dd3-758e-4cec-9254-80f6d093bdd0 | pages/le-coderoma.html], a linkedevents page served by Virtuoso (which may have been returned by our previous query), or a corresponding offline page [http://www.codemotion.it | pages/codemotion2012.html], you will find that they are RDFa pages.
- Your first task is to try extracting the RDF markup from these pages. You may try various online resources such as http://rdfa.info/tools/ (the Python one seems to be more robust, and it can be downloaded for command-line use [http://www.w3.org/2012/pyRdfa/]).
- Then you will try to find out the vocabularies used in these pages, either by hand or programmatically.
- You will then have to consult your local alignment server for availability of alignments between these vocabularies and schema.org.
This can take advantage of the HTML interface of the Alignment server or use the REST access to that server (when launched with the -W switch):
curl -L -H "Accept:application/rdf+xml" 'http://aserv.inrialpes.fr/rest/find?onto1=http://linkedevents.org/ontology/&onto2=http://schema.org/'
You will try to use these alignments for converting the vocabulary used in the ontologies, by finding corresponding entities and replacing them (replace the id by the one returned in the previous query):
curl -L -H "Accept:application/rdf+xml" 'http://aserv.inrialpes.fr/rest/corresp?id=http://aserv.inrialpes.fr:8089/alid/1342351942522/157&entity=http://linkedevents.org/ontology/atTime'
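The last steps can be sketched in Python: identify the vocabularies from the predicate URIs extracted from a page, then build the REST URLs used in the curl commands above. The function names are illustrative, taking the namespace as everything up to the last '#' or '/' is a common heuristic rather than a rule, and the URL layout simply mirrors the two curl examples. The sketch only builds URLs and does not contact the server.

```python
from urllib.parse import urlencode

# Server named in this tutorial; use your local server when working offline.
SERVER = "http://aserv.inrialpes.fr"


def namespace(uri: str) -> str:
    """Return the URI up to and including its last '#' or '/' (heuristic)."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[: cut + 1]


def vocabularies(predicates):
    """Return the sorted set of namespaces used by the given predicate URIs."""
    return sorted({namespace(p) for p in predicates})


def find_url(onto1: str, onto2: str, server: str = SERVER) -> str:
    """URL asking the server for alignments between two ontologies."""
    params = urlencode({"onto1": onto1, "onto2": onto2}, safe=":/")
    return f"{server}/rest/find?{params}"


def corresp_url(alignment_id: str, entity: str, server: str = SERVER) -> str:
    """URL asking for the entities corresponding to `entity` in an alignment."""
    params = urlencode({"id": alignment_id, "entity": entity}, safe=":/")
    return f"{server}/rest/corresp?{params}"


if __name__ == "__main__":
    predicates = [
        "http://linkedevents.org/ontology/atTime",
        "http://purl.org/dc/elements/1.1/title",
    ]
    for vocabulary in vocabularies(predicates):
        print(find_url(vocabulary, "http://schema.org/"))
```

Each printed URL can be passed to curl with the same Accept header as above; the alignment identifier returned by the server is then given to `corresp_url` together with the entity to convert.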