Lab: SPARQL

Topics

  • Setting up GraphDB
  • SPARQL queries and updates

Useful materials

GraphDB documentation:

SPARQL reference:

Tasks

Registering for GraphDB Free

To retrieve a download link for Ontotext's GraphDB Free tool, you first need to register. Here is the registration link (or search for "ontotext graphdb registration").

If you do not like registering for proprietary software, it is still possible to do most of the exercises using Blazegraph, which you can download here (requires Java). Blazegraph is a powerful open-source tool, but GraphDB offers even more functionality and is what the lab leaders will prepare for this semester.

Installing and running GraphDB

When you have received the download link from the GraphDB Team, install and run GraphDB according to this Quick Start Guide, up to and including the section Run GraphDB as a Standalone Server.

Setting up a repository

Jump forward in the Quick Start Guide to the section Create a Repository.

Create a new GraphDB Repository called, for example, info216_lab3_NN, where NN are your initials. Connect to it and set it as your default repository.

Load data

Download the Turtle file Russia investigation kg.txt and save it with the correct extension, as russia_investigation_kg.ttl (not .txt). (You can also experiment with the Turtle file you saved after exercises 1 and 2.)

Load the russia_investigation_kg.ttl data through the GraphDB Workbench as described in the Quick Start Guide.

SPARQL tasks

Task: Using the data in russia_investigation_kg.ttl, write the following SPARQL SELECT queries. (This page explains the Russia investigation KG a bit more.)

  • List all triples in your graph.
  • List the first 100 triples in your graph.
  • Count the number of triples in your graph.
  • Count the number of indictments in your graph.
  • List everyone who pleaded guilty, along with the name of the investigation.
  • List everyone who was convicted but had their conviction overturned, along with the president who overturned it.
  • For each investigation, list the number of indictments made.
  • For each investigation with multiple indictments, list the number of indictments made.
  • For each investigation with multiple indictments, list the number of indictments made, sorted with the most indictments first.
  • For each president, list the numbers of convictions and of pardons made after conviction.

Task: Write the following SPARQL updates:

  • The muellerkg:name property is misnamed, because the object in those triples is always a resource. Rename it to something like muellerkg:person.
  • Update the graph so all the investigated person and president nodes (such as muellerkg:G._Gordon_Liddy and muellerkg:Richard_Nixon) become the subjects in foaf:name triples with the corresponding strings (G. Gordon Liddy and Richard Nixon) as the literals. (Tip: Use STR(muellerkg:) inside a REPLACE in a BIND statement to remove the URI path.)

Task: Load the RDF graph you created in exercises 1 and 2. (Maybe you want to create a new namespace in Blazegraph first.) Use INSERT DATA updates to add these triples to your graph:

  • George Papadopoulos was adviser to the Trump campaign.
    • He pleaded guilty to lying to the FBI.
    • He was sentenced to prison.
  • Roger Stone is a Republican.
    • He was adviser to Trump.
    • He was an official in the Trump campaign.
    • He interacted with Wikileaks.
    • He made a testimony for the House Intelligence Committee.
    • He was cleared of all charges.

Task: Use DELETE DATA and then INSERT DATA updates to correct that Roger Stone was cleared of all charges. Actually,

  • He was indicted for making false statements, witness tampering, and obstruction of justice.

Task:

  • Use a DESCRIBE query to show the updated information about Roger Stone.
  • Use a CONSTRUCT query to create a new RDF graph with triples only about Roger Stone (in other words, having Roger Stone as the subject).

If you have more time

Task: Install curl on your computer if you do not have it.

Windows 10/11: You most likely already have it. To check, type curl --help in your command prompt. If you do not have it, follow the guide at https://stackoverflow.com/questions/9507353/how-do-i-install-and-use-curl-on-windows.

Mac: If you do not have it, type the following in your terminal (requires MacPorts): sudo port install curl

Linux: If you do not have it, type the following in your terminal (Debian/Ubuntu): sudo apt install curl

Use the command below to download all the triples in your Blazegraph namespace. (You must replace NAMESPACE with the name of your Blazegraph namespace and FILENAME with the Turtle file you want to save to.)

curl -X POST http://sandbox.i2s.uib.no/bigdata/namespace/NAMESPACE/sparql \
     --data-urlencode 'query=CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }' \
     -H 'Accept: application/x-turtle' > FILENAME.ttl

(On Windows, you have to use double quotes and write everything on a single line.) This command works for the shared online server. If you run Blazegraph on your own machine, you must use a local address like http://10.112.161.87:9999/blazegraph/ instead of the cloud address http://sandbox.i2s.uib.no/bigdata/.
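
If curl is not available, roughly the same request can be sent from Python's standard library. This is a sketch under assumptions: the helper names build_export_request and export_turtle are made up for illustration, and the endpoint is the one from the curl command above with NAMESPACE still to be replaced.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the curl command; replace NAMESPACE with your own.
ENDPOINT = "http://sandbox.i2s.uib.no/bigdata/namespace/NAMESPACE/sparql"
EXPORT_QUERY = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }"


def build_export_request(endpoint: str) -> Request:
    """Build (but do not send) a POST request asking for all triples as Turtle."""
    # Form-encode the query, like curl's --data-urlencode.
    body = urlencode({"query": EXPORT_QUERY}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        headers={"Accept": "application/x-turtle"},
        method="POST",
    )


def export_turtle(endpoint: str, filename: str) -> None:
    """Send the request and save the response body to a file."""
    with urlopen(build_export_request(endpoint)) as response:
        with open(filename, "wb") as out:
            out.write(response.read())


request = build_export_request(ENDPOINT)
print(request.get_method(), request.full_url)
```

Calling export_turtle(ENDPOINT, "FILENAME.ttl") performs the actual download; nothing is sent over the network until then.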

Task: Go back to the russia_investigation_kg.ttl dataset (maybe you need to change to an old Blazegraph namespace). The muellerkg:name property used as predicate is already covered by a standard term from an established vocabulary in the LOD cloud: foaf:name, where foaf: is http://xmlns.com/foaf/0.1/.

  • If you have not done so already: write a SPARQL DELETE/INSERT update to change every muellerkg:name predicate in your graph to foaf:name. (It is easy to destroy your RDF graph when you do this, so it is good you saved a copy in the previous task.)
  • Otherwise: find another resource to rename everywhere. For example, you can change your local URI for a public person to a standard Wikidata URI.

Task: Write a DELETE/INSERT statement to change one of the prefixes in your graph, renaming all the resources that use that prefix.

Task: Write an INSERT statement to add at least one significant date to the Mueller investigation, with literal type xsd:date. Write a DELETE/INSERT statement to change the date to a string, and a new DELETE/INSERT statement to change it back to xsd:date.

Task: Try to program some of the queries/updates in a Python program (this will be the topic of later labs). You have two options:

Using rdflib: Read the Turtle file into an rdflib Graph and use the query() method.

from rdflib import Graph

g = Graph()
g.parse(..., format='ttl')  # path to your Turtle file
r = g.query(...your_query_string...)  # result object you can iterate over

The hard part is picking the results out of the object r...

Using SPARQLWrapper: You can use SPARQLWrapper (another Python API) to connect to your running Blazegraph endpoint. See the Python example page for how to do this.

Task: If you want to explore more, try out the Wikidata Query Service (WDQS):

WDQS tutorials: