Lab: SPARQL: Difference between revisions

From info216
Line 3: Line 3:
* SPARQL queries and updates. We use SPARQL to retrieve or update triples in our databases/graphs of triples.


==Installing the Blazegraph database on your own computer==
==Useful materials==
Blazegraph:
* [https://blazegraph.com/ Welcome to Blazegraph]
 
SPARQL:
* [https://www.w3.org/TR/sparql11-query/ SPARQL Query Documentation]
* [http://www.w3.org/TR/sparql11-update/ SPARQL Update Documentation]
 
==Tasks==
'''Installing the Blazegraph database on your own computer:'''
Download Blazegraph (blazegraph.jar) from here: [https://blazegraph.com/ https://blazegraph.com/]
I recommend placing blazegraph.jar in the same folder of your python project for the labs.  
You can place blazegraph.jar in the same folder of your python project for the labs.  
Navigate to the folder of blazegraph.jar in your commandline/terminal using cd. (cd C:\Users\marti\info216 for me as an example). Now run this command:
<syntaxhighlight>
java -server -Xmx4g -jar blazegraph.jar
</syntaxhighlight>
You might have to install java 64-bit JDK if you have problems running Blazegraph. You can do it from [https://www.oracle.com/technetwork/java/javase/downloads/ this link].
If you get an "Address already in use" error, this is likely because Blazegraph has been terminated improperly. Either restart the terminal-session or try to run this command instead:  
You might have to install java 64-bit JDK if you have problems running blazegraph. You can do it from this link:
java -server -Xmx4g -Djetty.port=19999 -jar blazegraph.jar  
"https://www.oracle.com/technetwork/java/javase/downloads/"
This changes the port of the Blazegraph server.
If you get an "Address already in use" error, this is likely because blazegraph has been terminated improperly. Either restart the terminal-session or try to run this command instead:  
<syntaxhighlight>
java -server -Xmx4g -Djetty.port=19999 -jar blazegraph.jar  
</syntaxhighlight>
This changes the port of the blazegraph server.


If you have trouble installing Blazegraph you can use this link for now: "http://sandbox.i2s.uib.no/bigdata/".
'''Running Blazegraph online:'''
This is the same blazegraph interface, but its stored in the cloud and only be used on the UiB network. You may be able to access it without connecting to the UiB Network, but if you are unable to access the endpoint try connecting via the VPN. Instructions [https://hjelp.uib.no/tas/public/ssp/content/detail/service?unid=a566dafec92a4d35bba974f0733f3663 here].
If you have trouble installing Blazegraph, you can use [http://sandbox.i2s.uib.no/bigdata/ this link] for now. This is the same Blazegraph interface, but its stored in the cloud and only be used on the UiB network. You may be able to access it without connecting to the UiB Network, but if you are unable to access the endpoint try connecting via the VPN. Instructions [https://hjelp.uib.no/tas/public/ssp/content/detail/service?unid=a566dafec92a4d35bba974f0733f3663 here]. If it works it should now display an URL like: "http://10.0.0.13:9999/blazegraph/". Open this in a browser.  


 
'''Using Blazegraph:'''
If it works it should now display an url like: "http://10.0.0.13:9999/blazegraph/". Open this in a browser.
You can now run SPARQL queries and updates and load RDF graphs from your file into Blazegraph.
In the update tab, load RDF data (select type below) and then paste the contents of your turtle/.txt file to add them all at once to the database. If you have not serialized your graph from lab 2 yet, you can use the triples on the bottom of the page instead. Just copy and paste them into the Update section.
Line 63: Line 66:


==If you have more time==
Redo all the above steps, this time writing a Python/RDFlib program. This will be the topic of lab 6.
'''Task:''' Try to program some of the queries/updates in a Python program (this will be the topic of later labs). You have two options:
You can look at the python example page to see how to connect to your blazegraph endpoint in Python and how to perform some basic queries.
# Read the Turtle file into an rdflib Graph and use the ''query()'' method.  
 
g = Graph()
 
g.parse(..., format='ttl')
==Useful Links==
r = g.query(...your_query_string...)
[https://wiki.uib.no/info216/index.php/File:S03-SPARQL-13.pdf Lecture Notes]
The hard part is picking the results out of the object ''r''...
 
# You can use SPARQLwrapper (another Python API) to connect to your running Blazegraph endpoint. See the Python example page for how to do this.
[https://www.w3.org/TR/sparql11-query/ SPARQL Query Documentation]
 
[http://www.w3.org/TR/sparql11-update/ SPARQL Update Documentation]
 
If you want to explore more, try out Wikidata Query Service
 
[https://query.wikidata.org/ Wikidata Query Service]
 
Tutorials
 
[https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial Tutorials]
 
[https://wdqs-tutorial.toolforge.org/ Interactive tutorial]
 
==Triples that you can base your queries on: (turtle format)==
<syntaxhighlight>
@prefix ex: <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
 
ex:Cade a foaf:Person ;
    ex:address [ a ex:Address ;
            ex:city ex:Berkeley ;
            ex:country ex:USA ;
            ex:postalCode "94709"^^xsd:string ;
            ex:state ex:California ;
            ex:street "1516_Henry_Street"^^xsd:string ] ;
    ex:age 27 ;
    ex:characteristic ex:Kind ;
    ex:degree [ ex:degreeField ex:Biology ;
            ex:degreeLevel "Bachelor"^^xsd:string ;
            ex:degreeSource ex:University_of_California ;
            ex:year "2011-01-01"^^xsd:gYear ] ;
    ex:interest ex:Bird,
        ex:Ecology,
        ex:Environmentalism,
        ex:Photography,
        ex:Travelling ;
    ex:married ex:Mary ;
    ex:meeting ex:Meeting1 ;
    ex:visit ex:Canada,
        ex:France,
        ex:Germany ;
    foaf:knows ex:Emma ;
    foaf:name "Cade_Tracey"^^xsd:string .
 
ex:Mary a ex:Student,
        foaf:Person ;
    ex:age 26 ;
    ex:characteristic ex:Kind ;
    ex:interest ex:Biology,
        ex:Chocolate,
        ex:Hiking .
 
ex:Emma a foaf:Person ;
    ex:address [ a ex:Address ;
            ex:city ex:Valencia ;
            ex:country ex:Spain ;
            ex:postalCode "46020"^^xsd:string ;
            ex:street "Carrer_de_la Guardia_Civil_20"^^xsd:string ] ;
    ex:age 26 ;
    ex:degree [ ex:degreeField ex:Chemistry ;
            ex:degreeLevel "Master"^^xsd:string ;
            ex:degreeSource ex:University_of_Valencia ;
            ex:year "2015-01-01"^^xsd:gYear ] ;
    ex:expertise ex:Air_Pollution,
        ex:Toxic_Waste,
        ex:Waste_Management ;
    ex:interest ex:Bike_Riding,
        ex:Music,
        ex:Travelling ;
    ex:meeting ex:Meeting1 ;
    ex:visit ( ex:Portugal ex:Italy ex:France ex:Germany ex:Denmark ex:Sweden ) ;
    foaf:name "Emma_Dominguez"^^xsd:string .
 
ex:Meeting1 a ex:Meeting ;
    ex:date "August, 2014"^^xsd:string ;
    ex:involved ex:Cade,
        ex:Emma ;
    ex:location ex:Paris .
 
ex:Paris a ex:City ;
    ex:capitalOf ex:France ;
    ex:locatedIn ex:France .
 
ex:France ex:capital ex:Paris .


</syntaxhighlight>

'''Task:''' If you want to explore more, try out Wikidata Query Service (WDQS)
* [https://query.wikidata.org/ Wikidata Query Service]

WDQS tutorials:
* [https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial Wikidata SPARQL tutorial]
* [https://wdqs-tutorial.toolforge.org/ Interactive WDQS tutorial]

Revision as of 14:59, 17 January 2023

Topics

  • Setting up the Blazegraph graph database. Previously we have only stored our triples in memory, which is not persistent.
  • SPARQL queries and updates. We use SPARQL to retrieve or update triples in our databases/graphs of triples.

Useful materials

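Blazegraph:

  • Welcome to Blazegraph: https://blazegraph.com/

SPARQL:

  • SPARQL Query Documentation: https://www.w3.org/TR/sparql11-query/
  • SPARQL Update Documentation: http://www.w3.org/TR/sparql11-update/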

Tasks

Installing the Blazegraph database on your own computer: Download Blazegraph (blazegraph.jar) from here: https://blazegraph.com/ You can place blazegraph.jar in the same folder as your Python project for the labs. Navigate to the folder containing blazegraph.jar in your command line/terminal using cd (for example, cd C:\Users\marti\info216). Now run this command:

java -server -Xmx4g -jar blazegraph.jar

You might have to install a 64-bit Java JDK if you have problems running Blazegraph. You can download it from https://www.oracle.com/technetwork/java/javase/downloads/. If you get an "Address already in use" error, this is likely because Blazegraph was terminated improperly. Either restart the terminal session or try to run this command instead:

java -server -Xmx4g -Djetty.port=19999 -jar blazegraph.jar 

This starts the Blazegraph server on port 19999 instead of the default port 9999.

Running Blazegraph online: If you have trouble installing Blazegraph, you can use http://sandbox.i2s.uib.no/bigdata/ for now. This is the same Blazegraph interface, but it is hosted in the cloud and is intended for use on the UiB network. You may be able to access it from outside the UiB network, but if you cannot reach the endpoint, try connecting via the VPN. Instructions: https://hjelp.uib.no/tas/public/ssp/content/detail/service?unid=a566dafec92a4d35bba974f0733f3663. If your local installation works, the terminal should display a URL like "http://10.0.0.13:9999/blazegraph/". Open this in a browser.

Using Blazegraph: You can now run SPARQL queries and updates and load RDF graphs from your files into Blazegraph. In the Update tab, choose to load RDF data (select the type below) and then paste the contents of your Turtle/.txt file to add all the triples to the database at once. If you have not serialized your graph from lab 2 yet, you can use the triples at the bottom of the page instead. Just copy and paste them into the Update section.

Tasks

Write the following SPARQL queries:

  • SELECT all triples in your graph.
  • SELECT all the interests of Cade.
  • SELECT the city and country of where Emma lives.
  • SELECT only people who are older than 26.
  • SELECT everyone who graduated with a Bachelor degree.

Use SPARQL Update's DELETE DATA to delete the fact that Cade is interested in Photography. Run your SPARQL query again to check that the graph has changed.

Use INSERT DATA to add information about Sergio Pastor, who lives at 4 Carrer del Serpis, 46021 Valencia, Spain. He has an M.Sc. in computer science from the University of Valencia from 2008. His areas of expertise include big data, semantic technologies and machine learning.

Insert these triples into your RDF graph, if you have not done so before:

  • George Papadopoulos was adviser to the Trump campaign.
    • He pleaded guilty to lying to the FBI.
    • He was sentenced to prison.
  • Michael Flynn was adviser to Donald Trump.
    • He pleaded guilty to lying to the FBI.
    • He negotiated a plea agreement.
  • Michael Cohen was Donald Trump's attorney.
    • He pleaded guilty to lying to Congress.
  • Roger Stone is a Republican.
    • He was adviser to Trump.
    • He was an official in the Trump campaign.
    • He interacted with Wikileaks.
    • He was indicted for making false statements, witness tampering, and obstruction of justice.
    • He gave testimony to the House Intelligence Committee.

Write a SPARQL DELETE/INSERT update to change the name of "University of Valencia" to "Universidad de Valencia" wherever it occurs.

Write a SPARQL DESCRIBE query to get basic information about Sergio.

Write a SPARQL CONSTRUCT query that returns the fact that any city in an address is a cityOf the country of the same address.

If you have more time

Task: Try to program some of the queries/updates in a Python program (this will be the topic of later labs). You have two options:

  1. Read the Turtle file into an rdflib Graph and use the query() method.

     g = Graph()
     g.parse(..., format='ttl')
     r = g.query(...your_query_string...)

     The hard part is picking the results out of the object r...

  2. You can use SPARQLWrapper (another Python API) to connect to your running Blazegraph endpoint. See the Python example page for how to do this.

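Task: If you want to explore more, try out Wikidata Query Service (WDQS):

  • Wikidata Query Service: https://query.wikidata.org/

WDQS tutorials:

  • Wikidata SPARQL tutorial: https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial
  • Interactive WDQS tutorial: https://wdqs-tutorial.toolforge.org/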