Lab: SPARQL

From info216

Revision as of 12:59, 18 January 2023

Topics

  • Setting up the Blazegraph graph database.
  • SPARQL queries and updates.

Useful materials

Blazegraph:

SPARQL:

Tasks

Running Blazegraph

You can either run Blazegraph locally on your own machine (best) or on a shared online server (also OK).

Installing the Blazegraph database on your own computer: Download Blazegraph (blazegraph.jar) from https://blazegraph.com/. You can place blazegraph.jar in the same folder as your Python project for the labs. In your command line/terminal, use cd to navigate to the folder that contains blazegraph.jar (for example, cd C:\Users\marti\info216). Now run this command:

java -server -Xmx4g -jar blazegraph.jar

You might have to install the 64-bit Java JDK if you have problems running Blazegraph. If you get an "Address already in use" error, this is likely because an earlier Blazegraph process was not shut down properly and is still using the port. Either restart the terminal session or try to run this command instead:

java -server -Xmx4g -Djetty.port=19999 -jar blazegraph.jar 

This starts the Blazegraph server on port 19999 instead of the default port 9999.
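The "Address already in use" error means some process is already listening on the port Blazegraph wants. A minimal stdlib sketch for checking this from Python, assuming the default port 9999 and the alternative port 19999 from the command above (the helper name port_in_use is hypothetical):

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if some process already accepts TCP connections on host:port.

    Hypothetical helper for diagnosing "Address already in use".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds instead of raising
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 9999: Blazegraph's default port; 19999: the alternative set with -Djetty.port
    for port in (9999, 19999):
        print(port, "in use" if port_in_use(port) else "free")
```

If 9999 shows as in use while no Blazegraph window is open, an old process is probably still holding it.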

Running Blazegraph online: If you have trouble installing Blazegraph, you can use a shared online server for now. It provides the same Blazegraph interface, but runs in the cloud and can only be used from inside the UiB network. (If you are outside the UiB campus, you can connect through the UiB VPN first.)

Using Blazegraph:

  • Create namespace: In the Blazegraph interface, you can go to the NAMESPACES tab and create a new namespace using the default values and the Create namespace button. You must do this if you use the shared online server. You can also do this on your local server to keep your datasets separate. (If you do not create a namespace, the default namespace is kb.)
  • Uploading data: In the Blazegraph interface, go to the UPDATE tab and use the Browse... and Update buttons to load the file into Blazegraph.
    • You can use the data in the Turtle file Russia investigation kg.txt. Make sure you save it with the correct extension, as russia_investigation_kg.ttl (not .txt).
    • You can also use the Turtle file you saved after exercises 1 and 2.
  • Querying and updating: In the Blazegraph interface, go to the QUERY and UPDATE tabs to enter queries and updates.

SPARQL tasks

Task: Using the data in russia_investigation_kg.ttl, write the following SPARQL queries:

  • SELECT all triples in your graph.
  • SELECT all the interests of Cade.
  • SELECT the city and country where Emma lives.
  • SELECT only people who are older than 26.
  • SELECT everyone who graduated with a Bachelor's degree.

This page explains the Russia investigation KG in more detail.

Task: Load the RDF graph you created in exercises 1 and 2. Use INSERT DATA to add these triples to your graph:

  • George Papadopoulos was adviser to the Trump campaign.
    • He pleaded guilty to lying to the FBI.
    • He was sentenced to prison.
  • Roger Stone is a Republican.
    • He was adviser to Trump.
    • He was an official in the Trump campaign.
    • He interacted with Wikileaks.
    • He was indicted for making false statements, witness tampering, and obstruction of justice.
    • He testified before the House Intelligence Committee.

Use SPARQL Update's DELETE DATA to delete the fact that Cade is interested in photography. Run your SPARQL query again to check that the graph has changed.

Use INSERT DATA to add information about Sergio Pastor, who lives at 4 Carrer del Serpis, 46021 Valencia, Spain. He has an M.Sc. in computer science from the University of Valencia from 2008. His areas of expertise include big data, semantic technologies and machine learning.


Write a SPARQL DELETE/INSERT update to change the name of "University of Valencia" to "Universidad de Valencia" wherever it occurs.

Write a SPARQL DESCRIBE query to get basic information about Sergio.

Write a SPARQL CONSTRUCT query that returns, for every address, a triple stating that the city in that address is cityOf the country in the same address.

If you have more time

Task: Try to program some of the queries/updates in a Python program (this will be the topic of later labs). You have two options:

Using rdflib: Read the Turtle file into an rdflib Graph and use the query() method.

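from rdflib import Graph

g = Graph()
g.parse("russia_investigation_kg.ttl", format="ttl")  # or the Turtle file from exercises 1 and 2
r = g.query(your_query_string)  # your_query_string: a str containing your SPARQL query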

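The hard part is picking the results out of the result object r. For a SELECT query, you can iterate over r: each row holds the values bound to your query's variables.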

Using SPARQLWrapper: You can use SPARQLWrapper (another Python library) to connect to your running Blazegraph endpoint. See the Python example page for how to do this.

Task: If you want to explore more, try out the Wikidata Query Service (WDQS).

WDQS tutorials: