From Annotations to Annotations
Here, we will try to take advantage of distributed resources for manipulating data on the web. Your goal is to convert semantic markup from HTML+RDFa web pages so that it can be indexed by search engines. For that purpose, it will need to be converted to the schema.org ontology. If you look at some pages online, such as http://data.linkedevents.org/event/89be8dd3-758e-4cec-9254-80f6d093bdd0, which is a LinkedEvents page served by Virtuoso, or a corresponding page such as http://www.codemotion.it, you will find that these are RDFa pages, i.e. they are provided with RDF annotations.
- Your first task is to extract the RDF markup from these pages. You may try various online resources such as http://rdfa.info/tools/ (honestly, I use the Python one, which is more robust).
- Then you may try to find out the vocabularies used in these pages, either by hand or programmatically.
- You will then have to consult our local alignment server for the availability of alignments between these vocabularies and schema.org:
curl -L -H "Accept:application/rdf+xml" \ 'http://aserv.inrialpes.fr/rest/find?onto1=http://xmlns.com/foaf/0.1/&onto2=http://schema.org/'You will try to use these alignments for converting the vocabulary used in the ontologies, by finding corresponding entities and replacing them:
curl -L -H "Accept:application/rdf+xml" \ 'http://aserv.inrialpes.fr/rest/corresp?id=http://aserv.inrialpes.fr:8089/alid/1335542900539/204&entity=http://xmlns.com/foaf/0.1/mbox'
From Text to Annotations
The pages:
- http://www.london2012.com/news/articles/day-olympic-flame-greeted-the-queen-and-the-duke-edinburgh-windsor-castle.html
- http://www.london2012.com/news/articles/largest-olympic-rings-unveiled-richmond-park.html
- http://www.london2012.com/news/articles/paralympic-torch-relay-route-revealed-1258473.html
Stanbol Enhancer, FRED, and Tipalo are text-processing tools that analyse the text of a page and generate RDF data reflecting its content. Try them on the pages above, inspect the results, and consider the following questions:
- What does the result express? Is it the whole content of the page? Can you understand how the tools came up with these results?
- Do you agree with the results? What would you have done differently?
- Do you think these results could be used automatically? Would you be able to combine the results in order to get a coherent representation of the page content?
Some practical hints:
- You can give the whole text to Stanbol Enhancer;
- FRED can take up to 1000 characters; if you have more, you will have to split the input into parts (see the chunking sketch after this list);
- Tipalo takes a Wikipedia page as input. Hence, once Stanbol Enhancer has recognised a set of DBpedia entities, you can obtain additional RDF data about each entity by giving its corresponding Wikipedia page URI as input to Tipalo. You will have to feed Tipalo one entity at a time.
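Here is a minimal sketch of the chunking step in Python, assuming a plain sentence-boundary split is good enough; the 1000-character threshold comes from the note above, and article.txt is a placeholder for the page text you saved. The actual submission to FRED is left out, since it depends on the endpoint and client you use.

import re

def chunk_text(text, limit=1000):
    """Split text into chunks of at most `limit` characters, cutting at
    sentence boundaries so each part is well-formed input for FRED."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if len(current) + len(sentence) + 1 <= limit:
            current = (current + " " + sentence).strip()
        else:
            if current:
                chunks.append(current)
            current = sentence  # assumes no single sentence exceeds the limit
    if current:
        chunks.append(current)
    return chunks

# article.txt is a placeholder for the text of one of the pages above.
for part in chunk_text(open("article.txt").read()):
    print(len(part), part[:60], "...")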
Enriching Annotations with Resources From the Web
The annotations you obtained previously use specific vocabularies that are driven by the tools used to produce them. What we would like to do here is to enrich this representation with more ontological constructs from resources available on the Web. In other words, we want to build a more complete representation of these annotations, by including links and relations to resources we will find using a variety of services.
- Using Watson, find ontologies that can be used to represent the classes and properties that are mentioned in the automatically generated annotations. Choose some alternative representations of these concepts and relationships and see how you would integrate them with the structures you obtained from the tools used before.
- Using sameAs.org, can you find links between the instances you obtained and other resources? (See the lookup sketch after this list.)
- Using Sindice.com and Sig.ma, can you find other entities that should be linked to the entities mentioned in the automatic annotations you obtained before?
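Here is a minimal lookup sketch in Python. It assumes sameAs.org exposes a JSON interface at http://sameas.org/json?uri=<URI> returning a list of bundles, each with a "duplicates" list of co-referent URIs; both the endpoint pattern and the response structure are assumptions to check against the site's documentation.

import json
import urllib.parse
import urllib.request

def sameas_links(uri):
    # Assumed endpoint: http://sameas.org/json?uri=<url-encoded URI>
    query = "http://sameas.org/json?uri=" + urllib.parse.quote(uri, safe="")
    with urllib.request.urlopen(query) as response:
        bundles = json.load(response)
    # Assumed structure: a list of bundles, each with a "duplicates" list.
    return [dup for bundle in bundles for dup in bundle.get("duplicates", [])]

# Example with a DBpedia entity you might have obtained from Stanbol Enhancer.
for link in sameas_links("http://dbpedia.org/resource/London"):
    print(link)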
Querying Across Linked Datasets
In the SQUIN directory, execute the shell command ./bin/squin.sh start to launch the SQUIN service. Then go to the URL http://localhost:8080/SQUIN/ to start running SPARQL queries directly on the Linked Data cloud!

Dereference the following URI in your browser and look at the owl:sameAs links:

<http://geo.linkeddata.es/resource/Provincia/Segovia>
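You can do the same dereferencing programmatically; here is a minimal sketch with rdflib, which content-negotiates for RDF when fetching the URI (if the automatic format detection fails, pass an explicit format argument to parse):

from rdflib import Graph
from rdflib.namespace import OWL

# Dereference the URI and print its owl:sameAs links.
g = Graph()
g.parse("http://geo.linkeddata.es/resource/Provincia/Segovia")
for s, o in g.subject_objects(OWL.sameAs):
    print(s, "owl:sameAs", o)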
Run the following SPARQL query in SQUIN:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT * WHERE {
  <http://geo.linkeddata.es/resource/Provincia/Segovia> owl:sameAs ?o1 .
}

Look at the results. What did you get back? Note that you are not running this query on the GeoLinkedData.es SPARQL endpoint.
Run the following SPARQL queries in SQUIN:
Get more owl:sameAs links
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT * WHERE {
  <http://geo.linkeddata.es/resource/Provincia/Segovia> owl:sameAs ?o1 .
  OPTIONAL { ?o1 owl:sameAs ?o2 . }
}

What are the lat and long? Where is that data coming from?
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT * WHERE {
  <http://geo.linkeddata.es/resource/Provincia/Segovia> owl:sameAs ?o1 .
  OPTIONAL {
    ?o1 wgs84_pos:lat ?lat .
    ?o1 wgs84_pos:long ?long .
  }
  OPTIONAL { ?o1 owl:sameAs ?o2 . }
}

Who was born in Segovia?
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT * WHERE {
  <http://geo.linkeddata.es/resource/Provincia/Segovia> owl:sameAs ?o1 .
  OPTIONAL {
    ?o1 wgs84_pos:lat ?lat .
    ?o1 wgs84_pos:long ?long .
    OPTIONAL {
      ?ppl <http://dbpedia.org/ontology/birthPlace> ?o1 ;
           foaf:name ?name .
    }
  }
  OPTIONAL { ?o1 owl:sameAs ?o2 . }
}
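The same queries can be posed to SQUIN from a script. Here is a minimal sketch, assuming SQUIN implements the standard SPARQL protocol (a GET request with a url-encoded query parameter) at the service URL given earlier; if your installation differs, adapt the URL accordingly.

import urllib.parse
import urllib.request

SQUIN = "http://localhost:8080/SQUIN/"  # service URL from the start of this section

query = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT * WHERE {
  <http://geo.linkeddata.es/resource/Provincia/Segovia> owl:sameAs ?o1 .
}
"""

url = SQUIN + "?query=" + urllib.parse.quote(query)
req = urllib.request.Request(url, headers={"Accept": "application/sparql-results+xml"})
print(urllib.request.urlopen(req).read().decode("utf-8"))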