Lab: SPARQL

From info216

Latest revision as of 10:46, 11 February 2024

Topics

  • Setting up GraphDB
  • SPARQL queries and updates

Useful materials

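GraphDB documentation:

  • GraphDB 10.0 Quick Start Guide (https://graphdb.ontotext.com/documentation/10.0/quick-start-guide.html)
  • GraphDB 10.5 Documentation (https://graphdb.ontotext.com/documentation/10.5/)

SPARQL reference:

  • SPARQL Query Documentation (https://www.w3.org/TR/sparql11-query/)
  • SPARQL Expressions and Functions (https://en.wikibooks.org/wiki/SPARQL/Expressions_and_Functions)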

Tasks

Registering for GraphDB Free

To get a download link for Ontotext's GraphDB Free, you first need to register at https://www.ontotext.com/products/graphdb/download/ (or search for "ontotext graphdb registration").

If you would rather not register for proprietary software, you can still do most of the exercises using Blazegraph, which you can download from https://blazegraph.com/ (requires Java). Blazegraph is a powerful open-source tool, but GraphDB offers more functionality and is the tool the lab leaders will prepare for this semester.

Installing and running GraphDB

When you have received the download link in an email from the GraphDB Team, install and run GraphDB as follows, depending on your system:

  • On Windows:
    • Download the GraphDB Desktop .msi installer file.
    • Double-click the downloaded installer file and follow the on-screen installer prompts.
    • Locate the GraphDB Desktop application in the Windows Start menu and start it. The GraphDB Workbench opens at http://localhost:7200/.
  • On macOS:
    • Download the GraphDB Desktop .dmg file.
    • Double-click it to mount a virtual disk on your desktop, then copy the program from the virtual disk to your Applications folder.
    • Start GraphDB Desktop by clicking the application icon. The GraphDB Workbench opens at http://localhost:7200/.
  • On Linux:
    • Download the GraphDB Desktop .deb or .rpm file.
    • Install the package with sudo dpkg -i or sudo rpm -i followed by the name of the downloaded package. Alternatively, you can double-click the package file.
    • Start GraphDB Desktop by clicking the application icon. The GraphDB Workbench opens at http://localhost:7200/.

For more information about setting up GraphDB, see the GraphDB Quick Start Guide (https://graphdb.ontotext.com/documentation/10.0/quick-start-guide.html).

Setting up a repository

Follow the Create a Repository section in the Quick Start Guide. Create a new GraphDB Repository called, for example, info216_lab3_NN, where NN are your initials. Choose No inference for now. Otherwise, the default parameters are fine.

Connect to the new repository and pin it as your default repository.

Load data

Download the Turtle file File:Russia investigation kg.txt, and save it with the correct extension, as russia_investigation_kg.ttl (not .txt). (You can also experiment with the Turtle file you saved after exercises 1 and 2.) Load the Russia investigation data through the GraphDB Workbench as described in the Quick Start Guide.

You can use http://example.org/ as Base IRI.

Graph visualisation

Go to Explore -> Visual graph and create an Easy graph around the resource http://example.org#investigation_0. Double-click on nodes to expand them. Are there any more investigations related to Richard Nixon?

SPARQL tasks

Go to the SPARQL Query & Update tab.

Task: Using the data in russia_investigation_kg.ttl, write the following SPARQL SELECT queries. (A separate page explains the Russia investigation KG in more detail.)

  • List all triples in your graph.
  • List the first 100 triples in your graph.
  • Count the number of triples in your graph.
  • Count the number of indictments in your graph.
  • List everyone who pleaded guilty, along with the name of the investigation.
  • List everyone who was convicted but had their conviction overturned, along with the president who overturned it.
  • For each investigation, list the number of indictments made.
  • For each investigation with multiple indictments, list the number of indictments made.
  • For each investigation with multiple indictments, list the number of indictments made, sorted with the most indictments first.
  • For each president, list the numbers of convictions and of pardons made after conviction.

If you have more time

Task: Try to run some of the queries from a Python program (this will be the topic of later labs). You have two options:

Using rdflib: Read the Turtle file into an rdflib Graph and use the query() method.

from rdflib import Graph

g = Graph()
g.parse(..., format='ttl')  # path to your Turtle file
r = g.query(...your_query_string...)  # the SPARQL query as a string

The hard part is picking the results out of the object r...

Using SPARQLWrapper: You can use SPARQLWrapper (another Python API) to connect to your running GraphDB endpoint. See the Python example page for how to do this.
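As a minimal sketch of the SPARQLWrapper option, the code below sends a SELECT query to an endpoint and flattens the JSON result into plain dictionaries. The endpoint URL is an assumption: it combines GraphDB's default Workbench address from this lab (http://localhost:7200/) with the example repository name info216_lab3_NN, so replace it with your own repository. The helper names run_select, bindings_to_rows and main are made up for this example.

```python
def run_select(endpoint, query):
    """Send a SELECT query to a SPARQL endpoint and return the parsed JSON."""
    # Imported here so that bindings_to_rows() can be used on its own.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()


def bindings_to_rows(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def main():
    # Assumed endpoint: default GraphDB port plus the repository name
    # suggested in this lab; change both to match your own setup.
    endpoint = "http://localhost:7200/repositories/info216_lab3_NN"
    results = run_select(endpoint, "SELECT * WHERE { ?s ?p ?o } LIMIT 5")
    for row in bindings_to_rows(results):
        print(row)
```

Calling main() only works while GraphDB is running and the repository exists. The flattening step is kept separate so that it can be tried on a saved JSON result without a server.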

Task: If you want to explore more, try out the Wikidata Query Service (WDQS):

WDQS tutorials: