Lab: SPARQL: Difference between revisions

=Lab 3: SPARQL=
==Topics==
* Meeting with Andreas to discuss group project idea.
* Setting up GraphDB
* Setting up the Blazegraph graph database. Previously we have only stored our triples in memory, which is not persistent.
* SPARQL queries and updates
* SPARQL queries and updates. We use SPARQL to retrieve or update triples in our databases/graphs of triples.


==Meeting with Andreas==
One group at a time will go and talk to Andreas Lothe Opdahl about their group project idea. This is an opportunity to get early feedback on the programming project that you will develop throughout the semester.
The office of Andreas is in Room 609 on the 6th floor of the SV-building next door (Laurits Melzers house).
Enter the doors on the left when you are facing the entrance of the building and walk up the stairs.

Remember, we have a wiki page (linked below) that describes details about the group project, including some example ideas.

==Useful materials==
GraphDB documentation:
* [https://graphdb.ontotext.com/documentation/10.8/ Getting Started with GraphDB]

Introduction to SPARQL:
* [https://graphdb.ontotext.com/documentation/10.8/sparql.html Getting Started with SPARQL]


SPARQL reference:
* [https://www.w3.org/TR/sparql11-query/ SPARQL Query Documentation]
<!--
* [http://www.w3.org/TR/sparql11-update/ SPARQL Update Documentation]
-->
* [https://en.wikibooks.org/wiki/SPARQL/Expressions_and_Functions SPARQL Expressions and Functions]


==Tasks==
We recommend you download and install the free desktop version of Ontotext's GraphDB to run the SPARQL exercises.

If you do not like proprietary software, it is still possible to do most of the exercises using Blazegraph, which you can [https://blazegraph.com/ download here] (requires Java). Blazegraph is a powerful open-source tool, but GraphDB offers even more functionality and is what the lab leader will prepare for this semester.

Meanwhile, you can start working on the tasks for the next lab, which is about SPARQL and storage of triples in Blazegraph. Install Blazegraph as described below.
If you have trouble installing Blazegraph, you can use this link for now: "i2s.uib.no:8888/bigdata/#splash".
This is the same Blazegraph interface, but it is hosted in the cloud and can only be used on the UiB network.

==Installing the Blazegraph database on your own computer==
Download Blazegraph (blazegraph.jar) from here: [https://blazegraph.com/ https://blazegraph.com/]
I recommend placing blazegraph.jar in the same folder as your Python project for the labs.
Navigate to the folder containing blazegraph.jar in your command line/terminal using cd (for example, cd C:\Users\Martin\PycharmProjects\info216_labs in my case). Now run this command:
<syntaxhighlight>
java -server -Xmx4g -jar blazegraph.jar
</syntaxhighlight>
You might have to install an older version of Java (7) if you have problems installing it. Similarly, there might be problems if you have a 32-bit version of Java.
If it works, it should now display a URL like "http://10.0.0.13:9999/blazegraph/". Open this in a browser.

You can now run SPARQL queries and updates and load RDF graphs from your file into Blazegraph.
In the update tab, load RDF data (select type below) and then paste the contents of your turtle/.txt file to add them all at once to the database. If you have not serialized your graph from lab 2 yet, you can use the triples at the bottom of the page instead. Just copy and paste them into the Update section.

===Installing and running GraphDB===
Follow the instructions in [https://graphdb.ontotext.com/documentation/10.8/ Getting Started with GraphDB] to download and install GraphDB. It seems pre-registration is no longer needed for the Free version (please let Andreas and Sondre know if that is not correct and we will update.)

When GraphDB has been properly installed and is started, it should open in a web browser window at the address http://localhost:7200/ .


===Setting up a repository===
Follow the instructions in [https://graphdb.ontotext.com/documentation/10.8/ Getting Started with GraphDB] to create a new GraphDB Repository called, for example, ''info216_lab2_NN'', where ''NN'' are your initials. Choose ''No inference'' for now. Otherwise, the default parameters are fine.

Connect to the new repository and pin it as your default repository.

===Load data===
Download the Turtle file [[File:russia_investigation_kg.txt]], and save it with the correct extension, as ''russia_investigation_kg.ttl'' (not ''.txt''). (You can also experiment with the Turtle file you saved after exercises 1 and 2.) Load the Russia_investigation data through the GraphDB Workbench as described in the QuickStart guide.

You can use ''http://example.org/'' as Base IRI.

===Graph visualisation===
Go to ''Explore'' -> ''Visual graph'' and create an ''Easy graph'' around the resource ''http://example.org#investigation_0''. Double-click on nodes to expand them. Are there any more investigations related to ''Richard Nixon''?

===SPARQL tasks===
Go to the ''SPARQL Query & Update'' tab.

'''Task:'''
Using the data in ''russia_investigation_kg.ttl'', write the following SPARQL SELECT queries.
([[Russian investigation KG | This page explains]] the Russian investigation KG a bit more.)
* List all triples in your graph.
* List the first 100 triples in your graph.
* Count the number of triples in your graph.
* Count the number of indictments in your graph.
* List everyone who pleaded guilty, along with the name of the investigation.
* List everyone who was convicted but had their conviction overturned, along with the president who overturned it.
* For each investigation, list the number of indictments made.
* For each investigation with multiple indictments, list the number of indictments made.
* For each investigation with multiple indictments, list the number of indictments made, sorted with the most indictments first.
* For each president, list the numbers of convictions and of pardons made after conviction.

Write the following SPARQL queries:
* SELECT all triples in your graph.
* SELECT all the interests of Cade.
* SELECT the city and country where Emma lives.
* SELECT only people who are older than 26.
* SELECT everyone who graduated with a Bachelor degree.

Use SPARQL Update's DELETE DATA to delete the fact that Cade is interested in Photography. Run your SPARQL query again to check that the graph has changed.

Use INSERT DATA to add information about Sergio Pastor, who lives at 4 Carrer del Serpis, 46021 Valencia, Spain. He has an M.Sc. in computer science from the University of Valencia from 2008. His areas of expertise include big data, semantic technologies and machine learning.

Write a SPARQL DELETE/INSERT update to change the name of "University of Valencia" to "Universidad de Valencia" wherever it occurs.

Write a SPARQL DESCRIBE query to get basic information about Cade.


==If you have more time==
Redo all the above steps, this time writing a Python/rdflib program. This will be the topic of lab 6.
You can look at the Python example page to see how to connect to your Blazegraph database in Python and perform some basic queries.
==Useful Links==
[https://wiki.uib.no/info216/index.php/About_the_programming_project About the programming project]
[https://wiki.uib.no/info216/index.php/File:S03-SPARQL-13.pdf Lecture Notes]
==Triples that you can base your queries on: (turtle format)==
<syntaxhighlight>
@prefix ex: <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:Cade a foaf:Person ;
    ex:address [ a ex:Address ;
            ex:city ex:Berkeley ;
            ex:country ex:USA ;
            ex:postalCode "94709"^^xsd:string ;
            ex:state ex:California ;
            ex:street "1516_Henry_Street"^^xsd:string ] ;
    ex:age 27 ;
    ex:characteristic ex:Kind ;
    ex:degree [ ex:degreeField ex:Biology ;
            ex:degreeLevel "Bachelor"^^xsd:string ;
            ex:degreeSource ex:University_of_California ;
            ex:year "2011-01-01"^^xsd:gYear ] ;
    ex:interest ex:Bird,
        ex:Ecology,
        ex:Environmentalism,
        ex:Photography,
        ex:Travelling ;
    ex:married ex:Mary ;
    ex:meeting ex:Meeting1 ;
    ex:visit ex:Canada,
        ex:France,
        ex:Germany ;
    foaf:knows ex:Emma ;
    foaf:name "Cade_Tracey"^^xsd:string .
ex:Mary a ex:Student,
        foaf:Person ;
    ex:age 26 ;
    ex:characteristic ex:Kind ;
    ex:interest ex:Biology,
        ex:Chocolate,
        ex:Hiking .
ex:Emma a foaf:Person ;
    ex:address [ a ex:Address ;
            ex:city ex:Valencia ;
            ex:country ex:Spain ;
            ex:postalCode "46020"^^xsd:string ;
            ex:street "Carrer_de_la Guardia_Civil_20"^^xsd:string ] ;
    ex:age 26 ;
    ex:degree [ ex:degreeField ex:Chemistry ;
            ex:degreeLevel "Master" ;
            ex:degreeSource ex:University_of_Valencia ;
            ex:year "2015-01-01"^^xsd:gYear ] ;
    ex:expertise ex:Air_Pollution,
        ex:Toxic_Waste,
        ex:Waste_Management ;
    ex:interest ex:Bike_Riding,
        ex:Music,
        ex:Travelling ;
    ex:meeting ex:Meeting1 ;
    ex:visit ( ex:Portugal ex:Italy ex:France ex:Germany ex:Denmark ex:Sweden ) ;
    foaf:name "Emma_Dominguez"^^xsd:string .


ex:Meeting1 a ex:Meeting ;
    ex:date "August, 2014"^^xsd:string ;
    ex:involved ex:Cade,
        ex:Emma ;
    ex:location ex:Paris .

ex:Paris a ex:City ;
    ex:capitalOf ex:France ;
    ex:locatedIn ex:France .

ex:France ex:capital ex:Paris .
</syntaxhighlight>

'''Task:''' Try to program some of the queries in a Python program (this will be the topic of later labs). You have two options:

''Using rdflib:''
Read the Turtle file into an rdflib Graph and use the ''query()'' method.
 from rdflib import Graph
 g = Graph()
 g.parse(..., format='ttl')
 r = g.query(...your_query_string...)
The hard part is picking the results out of the object ''r''...

''Using SPARQLWrapper:''
You can use SPARQLWrapper (another Python API) to connect to your running GraphDB endpoint. See the Python example page for how to do this.

'''Task:''' If you want to explore more, try out the Wikidata Query Service (WDQS):
* [https://query.wikidata.org/ Wikidata Query Service]

WDQS tutorials:
* [https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial Wikidata SPARQL tutorial]
* [https://wdqs-tutorial.toolforge.org/ Interactive WDQS tutorial]

Latest revision as of 15:16, 29 January 2025

Topics

  • Setting up GraphDB
  • SPARQL queries and updates

Useful materials

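GraphDB documentation:

  • Getting Started with GraphDB (https://graphdb.ontotext.com/documentation/10.8/)

Introduction to SPARQL:

  • Getting Started with SPARQL (https://graphdb.ontotext.com/documentation/10.8/sparql.html)

SPARQL reference:

  • SPARQL Query Documentation (https://www.w3.org/TR/sparql11-query/)
  • SPARQL Expressions and Functions (https://en.wikibooks.org/wiki/SPARQL/Expressions_and_Functions)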

Tasks

We recommend you download and install the free desktop version of Ontotext's GraphDB to run the SPARQL exercises.

If you do not like proprietary software, it is still possible to do most of the exercises using Blazegraph, which you can download from https://blazegraph.com/ (requires Java). Blazegraph is a powerful open-source tool, but GraphDB offers even more functionality and is what the lab leader will prepare for this semester.

Installing and running GraphDB

Follow the instructions in Getting Started with GraphDB to download and install GraphDB. It seems pre-registration is no longer needed for the Free version (please let Andreas and Sondre know if that is not correct and we will update.)

When GraphDB has been properly installed and is started, it should open in a web browser window at the address http://localhost:7200/ .

Setting up a repository

Follow the instructions in Getting Started with GraphDB to create a new GraphDB Repository called, for example, info216_lab2_NN, where NN are your initials. Choose No inference for now. Otherwise, the default parameters are fine.

Connect to the new repository and pin it as your default repository.

Load data

Download the Turtle file File:Russia investigation kg.txt, and save it with the correct extension, as russia_investigation_kg.ttl (not .txt). (You can also experiment with the Turtle file you saved after exercises 1 and 2.) Load the Russia_investigation data through the GraphDB Workbench as described in the QuickStart guide.

You can use http://example.org/ as Base IRI.

Graph visualisation

Go to Explore -> Visual graph and create an Easy graph around the resource http://example.org#investigation_0. Double-click on nodes to expand them. Are there any more investigations related to Richard Nixon?

SPARQL tasks

Go to the SPARQL Query & Update tab.

Task: Using the data in russia_investigation_kg.ttl, write the following SPARQL SELECT queries. (This page explains the Russian investigation KG a bit more.)

  • List all triples in your graph.
  • List the first 100 triples in your graph.
  • Count the number of triples in your graph.
  • Count the number of indictments in your graph.
  • List everyone who pleaded guilty, along with the name of the investigation.
  • List everyone who was convicted but had their conviction overturned, along with the president who overturned it.
  • For each investigation, list the number of indictments made.
  • For each investigation with multiple indictments, list the number of indictments made.
  • For each investigation with multiple indictments, list the number of indictments made, sorted with the most indictments first.
  • For each president, list the numbers of convictions and of pardons made after conviction.

If you have more time

Task: Try to program some of the queries in a Python program (this will be the topic of later labs). You have two options:

Using rdflib: Read the Turtle file into an rdflib Graph and use the query() method.

from rdflib import Graph
g = Graph()
g.parse(..., format='ttl')
r = g.query(...your_query_string...)

The hard part is picking the results out of the object r...

Using SPARQLWrapper: You can use SPARQLWrapper (another Python API) to connect to your running GraphDB endpoint. See the Python example page for how to do this.
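To show roughly what such a client does, here is a sketch that talks to the endpoint with only the Python standard library instead of SPARQLWrapper. It rests on two assumptions: that GraphDB serves each repository's SPARQL endpoint at http://localhost:7200/repositories/<repository id>, and that the repository is called info216_lab2_NN (a placeholder to replace with your own id). The function names are illustrative only.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint layout: <base>/repositories/<repository id>.
# The repository id is a placeholder; use the one you created.
GRAPHDB_BASE = "http://localhost:7200"
REPOSITORY_ID = "info216_lab2_NN"


def build_select_request(query, base=GRAPHDB_BASE, repo=REPOSITORY_ID):
    """Build (but do not send) a POST request carrying a SPARQL SELECT query."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/repositories/{repo}",
        data=body,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def rows_from_json(text):
    """Turn a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(text)
    names = doc["head"]["vars"]
    return [
        {name: binding[name]["value"] for name in names if name in binding}
        for binding in doc["results"]["bindings"]
    ]


def run_select(query):
    """Send the query to the running GraphDB instance and return its rows."""
    with urllib.request.urlopen(build_select_request(query)) as response:
        return rows_from_json(response.read().decode("utf-8"))
```

With GraphDB running and the repository created, calling run_select("SELECT * WHERE { ?s ?p ?o } LIMIT 5") should return up to five rows as dictionaries keyed by variable name. SPARQLWrapper wraps the same request and response handling behind a higher-level interface.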

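Task: If you want to explore more, try out the Wikidata Query Service (WDQS):

  • Wikidata Query Service (https://query.wikidata.org/)

WDQS tutorials:

  • Wikidata SPARQL tutorial (https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial)
  • Interactive WDQS tutorial (https://wdqs-tutorial.toolforge.org/)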