Lab: SPARQL Programming

From info216

Revision as of 12:08, 1 February 2023

Topics

SPARQL programming in Python:

  • with rdflib: to manage an rdflib Graph internally in a program
  • with SPARQLWrapper and Blazegraph: to manage an RDF graph stored externally in Blazegraph (on your own local machine or on the shared online server)

Motivation: Last week we entered SPARQL queries and updates manually from the web interface. But in the majority of cases we want to program the management of triples in our graphs, for example to handle automatic or scheduled updates.

Useful materials

Tasks

SPARQL programming in Python with rdflib

Getting ready: No additional installation is needed. You are already running Python and rdflib.

Parse the file russia_investigation_kg.ttl into an rdflib Graph. (The original file is available here: File:Russia investigation kg.txt. Rename it from .txt to .ttl.)

Task: Write the following queries and updates with Python and rdflib. See boilerplate examples below.

  • Print out a list of all the predicates used in your graph.
  • Print out a sorted list of all the presidents represented in your graph.
  • Create a dictionary (Python dict) with all the represented presidents as keys. For each key, the value is a list of the names of people indicted under that president.
  • Use an ASK query to investigate whether Donald Trump has pardoned more than 5 people.
  • Use a DESCRIBE query to create a new graph with information about Donald Trump. Print out the graph in Turtle format.
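
For the dictionary task above, one possible pattern is to group the rows of a SELECT result by their first column. The sketch below shows only the grouping step. The (president, person) pairs are hypothetical placeholders standing in for query rows, not data from the lab file.

```python
from collections import defaultdict

# Hypothetical (president, person) pairs standing in for SELECT result rows.
rows = [
    ("President A", "Person 1"),
    ("President B", "Person 2"),
    ("President A", "Person 3"),
]

# Group the person names under each president key.
grouped = defaultdict(list)
for president, person in rows:
    grouped[president].append(person)

indicted_under = dict(grouped)  # plain dict: president -> list of names
print(indicted_under)
```

With real rdflib rows you would probably convert each value with str() before using it as a key or list item.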

Note that different types of queries return objects with different contents. You can use code completion in your IDE or Python's dir() function to explore this further (for example dir(results)).

  • SELECT: returns an object you can iterate over (among other things) to get the table rows (the result object also contains table headers)
  • ASK: returns an object that contains a single logical value (True or False)
  • DESCRIBE and CONSTRUCT: return an rdflib Graph

Contents of the file 'spouses.ttl':

@prefix ex: <http://example.org/> .
@prefix schema: <https://schema.org/> .

ex:Donald_Trump schema:spouse ( ex:IvanaTrump ex:MarlaMaples ex:MelaniaTrump ) .

Boilerplate code for rdflib query:

from rdflib import Graph

g = Graph()
g.parse("spouses.ttl", format='ttl')
result = g.query("""
    PREFIX ex: <http://example.org/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX schema: <https://schema.org/>

    SELECT ?spouse WHERE {
        ex:Donald_Trump schema:spouse / rdf:rest* / rdf:first ?spouse .
    }""")
for row in result:
    # each row is a tuple-like ResultRow; values can be accessed by variable name
    print(f"Donald has spouse {row.spouse}")

Boilerplate code for rdflib update: This is the KG4News graph again:

from rdflib import Graph

update_str = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX kg: <http://i2s.uib.no/kg4news/>
PREFIX ss: <http://semanticscholar.org/>

INSERT DATA {
    kg:paper_123 rdf:type ss:Paper ;
        ss:title "Semantic Knowledge Graphs for the News: A Review"@en ;
        kg:year 2022 ;
        dct:contributor kg:auth_456, kg:auth_789 .
}"""

g = Graph()
g.update(update_str)
print(g.serialize(format='ttl'))  # format='turtle' also works

SPARQL programming in Python with SPARQLWrapper and Blazegraph

Getting ready: Make sure you have access to a running Blazegraph as in Exercise 3: SPARQL. You can either run Blazegraph locally on your own machine (best) or online on a shared server at UiB (also ok).

Install SPARQLWrapper (in your virtual environment):

pip install SPARQLWrapper

Some older versions also require you to install the requests library. The SPARQLWrapper page on GitHub contains more information.

Continue working with the RDF graph you created in exercises 1-2 (perhaps creating a new namespace in Blazegraph first).

Task: Program the following queries and updates with SPARQLWrapper and Blazegraph.

Note that different types of queries return different data formats with different structures:

  • SELECT and ASK: return a SPARQL Results Document in either XML, JSON, or CSV/TSV format.
  • DESCRIBE and CONSTRUCT: return an RDF graph serialised in Turtle or RDF/XML syntax, for example.

  • Use a DESCRIBE query to create an rdflib Graph about Oliver Stone. Print the graph out in Turtle format.
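
To make the note on return formats concrete, here is a sketch of what a SPARQL Results Document in JSON looks like for SELECT and for ASK, following the W3C SPARQL 1.1 JSON results layout. The two documents are hand-written samples, not output from a real server. This is the same structure the SPARQLWrapper query boilerplate indexes into with results["results"]["bindings"].

```python
import json

# Hand-written sample of a SELECT result document (not real server output).
select_doc = json.loads("""
{
  "head": {"vars": ["spouse"]},
  "results": {"bindings": [
    {"spouse": {"type": "uri", "value": "http://example.org/IvanaTrump"}},
    {"spouse": {"type": "uri", "value": "http://example.org/MarlaMaples"}}
  ]}
}
""")

# Hand-written sample of an ASK result document.
ask_doc = json.loads('{"head": {}, "boolean": true}')

# SELECT: headers are under head/vars, rows under results/bindings.
headers = select_doc["head"]["vars"]
values = [b["spouse"]["value"] for b in select_doc["results"]["bindings"]]

# ASK: the answer is a single boolean.
answer = ask_doc["boolean"]

print(headers, values, answer)
```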

Boilerplate code for SPARQLWrapper query:

from SPARQLWrapper import SPARQLWrapper

SERVER = 'http://sandbox.i2s.uib.no/bigdata/'       # you may want to change this
NAMESPACE = 's03'                                   # you most likely want to change this

endpoint = f'{SERVER}namespace/{NAMESPACE}/sparql'  # standard path for Blazegraph queries

query = """
    PREFIX ex: <http://example.org/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX schema: <https://schema.org/>

    SELECT ?spouse WHERE {
        ex:Donald_Trump schema:spouse / rdf:rest* / rdf:first ?spouse .
    }"""
    
client = SPARQLWrapper(endpoint)
client.setReturnFormat('json')
client.setQuery(query)

print('Spouses:')
results = client.queryAndConvert()
for result in results["results"]["bindings"]:
    print(result["spouse"]["value"])

Boilerplate code for SPARQLWrapper update:

from SPARQLWrapper import SPARQLWrapper

SERVER = 'http://sandbox.i2s.uib.no/bigdata/'       # you may want to change this
NAMESPACE = 's03'                                   # you most likely want to change this

endpoint = f'{SERVER}namespace/{NAMESPACE}/sparql'  # standard path for Blazegraph updates

update_str = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX kg: <http://i2s.uib.no/kg4news/>
PREFIX ss: <http://semanticscholar.org/>

INSERT DATA {
    kg:paper_123 rdf:type ss:Paper ;
        ss:title "Semantic Knowledge Graphs for the News: A Review"@en ;
        kg:year 2023 ;
        dct:contributor kg:auth_654, kg:auth_789 .
}"""

client = SPARQLWrapper(endpoint)
client.setMethod('POST')
client.setQuery(update_str)
res = client.queryAndConvert()

If you have more time

Continue with the russia_investigation_kg.ttl example and either rdflib or SPARQLWrapper as you prefer - or both :-)

Task: Write a query that lists all the resources in your graph with Wikidata prefixes (i.e., http://www.wikidata.org/entity/).

Task: Generate a list of item identifiers (Q-codes like these Q13, Q42, Q80...)