Lab: SPARQL: Difference between revisions

From info216
(67 intermediate revisions by 3 users not shown)
Line 1: Line 1:
==Topics==
==Topics==
* Setting up the Blazegraph graph database. Previously we have only stored our triples in memory, which is not persistent.
* Setting up GraphDB
* SPARQL queries and updates. We use SPARQL to retrieve of update triples in our databases/graphs of triples
* SPARQL queries and updates


==Useful materials==
==Useful materials==
Blazegraph:
GraphDB documentation:
* [https://blazegraph.com/ Welcome to Blazegraph]
* [https://graphdb.ontotext.com/documentation/10.0/quick-start-guide.html GraphDB 10.0 Quick Start Guide]
* [https://graphdb.ontotext.com/documentation/10.5/ GraphDB 10.5 Documentation]


SPARQL:
SPARQL reference:
* [https://www.w3.org/TR/sparql11-query/ SPARQL Query Documentation]
* [https://www.w3.org/TR/sparql11-query/ SPARQL Query Documentation]
<!--
* [http://www.w3.org/TR/sparql11-update/ SPARQL Update Documentation]
* [http://www.w3.org/TR/sparql11-update/ SPARQL Update Documentation]
-->
* [https://en.wikibooks.org/wiki/SPARQL/Expressions_and_Functions SPARQL Expressions and Functions]


==Tasks==
==Tasks==
===Running Blazegraph===
===Registering for GraphDB Free===
You can either run Blazegraph locally on your own machine (best) or online at a local server (also ok).
To retrieve a download link for Ontotext's GraphDB Free tool, you first need to register. Here is the [https://www.ontotext.com/products/graphdb/download/?utm_source=adwords&utm_medium=ppc&utm_term=ontotext%20graphdb&utm_campaign=Search+Graphdb&hsa_cam=19852701758&hsa_mt=p&hsa_ver=3&hsa_src=g&hsa_ad=651747487851&hsa_net=adwords&hsa_tgt=kwd-1467556044238&hsa_acc=9129462532&hsa_grp=148766495402&hsa_kw=ontotext%20graphdb&gad_source=1&gclid=Cj0KCQiAh8OtBhCQARIsAIkWb69Mvno3kVHLrHHpZ_FV2_vnwf9IWoMa207bd43maPDUOm2R53UAuYYaAgCxEALw_wcB registration link] (or search for "ontotext graphdb registration").


'''Installing the Blazegraph database on your own computer:'''
''If you do not like registering for proprietary software, it is still possible to do most of the exercises using Blazegraph, which you can [https://blazegraph.com/ download here] (requires Java). Blazegraph is a powerful open-source tool, but GraphDB offers even more functionality and is what the lab leaders will prepare for this semester.
Download Blazegraph (blazegraph.jar) from here: [https://blazegraph.com/ https://blazegraph.com/]
You can place blazegraph.jar in the same folder of your python project for the labs.
Navigate to the folder of blazegraph.jar in your commandline/terminal using cd. (cd C:\Users\marti\info216 for me as an example). Now run this command:
java -server -Xmx4g -jar blazegraph.jar
You might have to install java 64-bit JDK if you have problems running Blazegraph. You can do it from [https://www.oracle.com/technetwork/java/javase/downloads/ this link].
If you get an "Address already in use" error, this is likely because Blazegraph has been terminated improperly. Either restart the terminal-session or try to run this command instead:
java -server -Xmx4g -Djetty.port=19999 -jar blazegraph.jar
This changes the port of the Blazegraph server.


'''Running Blazegraph online:'''
===Installing and running GraphDB===
If you have trouble installing Blazegraph, you can use [http://sandbox.i2s.uib.no/bigdata/ a shared local server] for now. This is the same Blazegraph interface, but its stored in the cloud and only be used from the UiB network. You may be able to access it without connecting to the UiB Network, but if you are unable to access the endpoint try connecting via the VPN. Instructions [https://hjelp.uib.no/tas/public/ssp/content/detail/service?unid=a566dafec92a4d35bba974f0733f3663 here]. If it works it should now display an URL like: "http://10.0.0.13:9999/blazegraph/". Open this in a browser.  
When you have received the download link in an email from the ''GraphDB Team'', you can proceed to install and run GraphDB in the following manner, depending on your system:
* On Windows:
** Download the GraphDB Desktop .msi installer file.
** Double-click the application file and follow the on-screen installer prompts.
** Locate the GraphDB Desktop application in the Windows Start menu and start it. The GraphDB Workbench opens at http://localhost:7200/.


'''RDF data:'''
* On MacOS
You can use the data in the Turtle file [File:russia_investigation_kg.txt russia_investigation_kg.ttl]. Make sure you save it with the correct extension (''.ttl''). In the Blazegraph interface, go to the '''UPDATE''' tab and use the '''Browse...''' and '''Update''' buttons to load the data into Blazegraph.
** Download the GraphDB Desktop .dmg file.
** Double-click it and get a virtual disk on your desktop. Copy the program from the virtual disk to your hard disk Applications folder, and you’re set.
** Start GraphDB Desktop by clicking the application icon. The GraphDB Workbench opens at http://localhost:7200/.


'''Using Blazegraph:'''
* On Linux
In the Blazegraph interface, go to the '''QUERY''' and '''UPDATE''' tabs to enter queries and updates.
** Download the GraphDB Desktop .deb or .rpm file.
** Install the package with sudo dpkg -i or sudo rpm -i and the name of the downloaded package. Alternatively, you can double-click the package name.
** Start GraphDB Desktop by clicking the application icon. The GraphDB Workbench opens at http://localhost:7200/.


==Tasks==
For more information about setting up GraphDB you can check out their quick start guide:
[https://graphdb.ontotext.com/documentation/10.0/quick-start-guide.html Quick Start Guide].


Write the following SPARQL queries:  
===Setting up a repository===
Follow the ''Create a Repository'' section in the [https://graphdb.ontotext.com/documentation/10.0/quick-start-guide.html Quick Start Guide].
Create a new GraphDB Repository called, for example, ''info216_lab3_NN'', where ''NN'' are your initials. Choose ''No inference'' for now.
Otherwise, the default parameters are fine.


* SELECT all triples in your graph.
Connect to the new repository and pin it as your default repository.
* SELECT all the interests of Cade.
* SELECT the city and country of where Emma lives.
* SELECT only people who are older than 26.
* SELECT Everyone who graduated with a Bachelor Degree.  


Use SPARQL Update's DELETE DATA to delete that fact that Cade is interested in Photography. Run your SPARQL query again to check that the graph has changed.
===Load data===
Download the Turtle file [[File:russia_investigation_kg.txt]], and save it with the correct extension, as ''russia_investigation_kg.ttl'' (not ''.txt''). (You can also experiment with the Turtle file you saved after exercises 1 and 2.) Load the Russia_investigation data through the GraphDB Workbench as described in the QuickStart guide.


Use INSERT DATA to add information about Sergio Pastor, who lives in 4 Carrer del Serpis, 46021 Valencia, Spain. he has a M.Sc. in computer from the University of Valencia from 2008. His areas of expertise include big data, semantic technologies and machine learning.
You can use ''http://example.org/'' as Base IRI.


Insert these triples into your RDF graph, if you have not done so before:
===Graph visualisation===
* George Papadopoulos was adviser to the Trump campaign.
Go to ''Explore'' -> ''Visual graph'' and create an ''Easy graph'' around the resource ''http://example.org#investigation_0''. Double-click on nodes to expand them. Are there any more investigations related to ''Richard Nixon''?
** He pleaded guilty to lying to the FBI.
** He was sentenced to prison.  
* Michael Flynn was adviser to Donald Trump.
** He pleaded guilty for lying to the FBI.
** He negotiated a plea agreement. 
* Michael Cohen was Donald Trump's attorney.
** He pleaded guilty for lying to Congress.
* Roger Stone is a Republican.
** He was adviser to Trump.
** He was an official in the Trump campaign.
** He interacted with Wikileaks.
** He was indicted for making false statements, witness tampering, and obstruction of justice.
** He made a testimony for the House Intelligence Committee.


Write a SPARQL DELETE/INSERT update to change the name of "University of Valencia" to "Universidad de Valencia" whereever it occurs.
===SPARQL tasks===
Go to the ''SPARQL Query & Update'' tab.


Write a SPARQL DESCRIBE query to get basic information about Sergio.
'''Task:'''
Using the data in ''russia_investigation_kg.ttl'', write the following SPARQL SELECT queries.
([[Russian investigation KG | This page explains]] the Russian investigation KG a bit more.)
* List all triples in your graph.
* List the first 100 triples in your graph.
* Count the number of triples in your graph.
* Count the number of indictments in your graph.
* List everyone who pleaded guilty, along with the name of the investigation.
* List everyone who was convicted but had their conviction overturned, and by which president.
* For each investigation, list the number of indictments made.
* For each investigation with multiple indictments, list the number of indictments made.
* For each investigation with multiple indictments, list the number of indictments made, sorted with the most indictments first.
* For each president, list the numbers of convictions and of pardons made after conviction.


Write a SPARQL CONSTRUCT query that returns that: any city in an address is a cityOf the country of the same address.
==If you have more time==


==If you have more time==
'''Task:''' Try to program some of the queries in a Python program (this will be the topic of later labs). You have two options:
'''Task:''' Try to program some of the queries/updates in a Python program (this will be the topic of later labs). You have two options:
 
# Read the Turtle file into an rdflib Graph and use the ''query()'' method.  
''Using rdflib:''
Read the Turtle file into an rdflib Graph and use the ''query()'' method.  
  g = Graph()
  g = Graph()
  g.parse(..., format='ttl')
  g.parse(..., format='ttl')
  r = g.query(...your_query_string...)
  r = g.query(...your_query_string...)
The hard part is picking the results out of the object ''r''...
The hard part is picking the results out of the object ''r''...
# You can use SPARQLwrapper (another Python API) to connect to your running Blazegraph endpoint. See the Python example page for how to do this.


'''Task:''' If you want to explore more, try out Wikidata Query Service (WDQS)
''Using SPARQLWrapper:''
You can use SPARQLWrapper (another Python API) to connect to your running GraphDB endpoint. See the Python example page for how to do this.
 
'''Task:''' If you want to explore more, try out the Wikidata Query Service (WDQS):
* [https://query.wikidata.org/ Wikidata Query Service]
* [https://query.wikidata.org/ Wikidata Query Service]



Latest revision as of 10:46, 11 February 2024

Topics

  • Setting up GraphDB
  • SPARQL queries and updates

Useful materials

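GraphDB documentation:

  • GraphDB 10.0 Quick Start Guide: https://graphdb.ontotext.com/documentation/10.0/quick-start-guide.html
  • GraphDB 10.5 Documentation: https://graphdb.ontotext.com/documentation/10.5/

SPARQL reference:

  • SPARQL Query Documentation: https://www.w3.org/TR/sparql11-query/
  • SPARQL Expressions and Functions: https://en.wikibooks.org/wiki/SPARQL/Expressions_and_Functions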

Tasks

Registering for GraphDB Free

To retrieve a download link for Ontotext's GraphDB Free tool, you first need to register. Here is the registration link (or search for "ontotext graphdb registration").

If you would rather not register for proprietary software, you can still do most of the exercises using Blazegraph, which you can download from https://blazegraph.com/ (requires Java). Blazegraph is a powerful open-source tool, but GraphDB offers more functionality and is what the lab leaders will prepare for this semester.

Installing and running GraphDB

When you have received the download link in an email from the GraphDB Team, you can proceed to install and run GraphDB in the following manner, depending on your system:

  • On Windows:
    • Download the GraphDB Desktop .msi installer file.
    • Double-click the application file and follow the on-screen installer prompts.
    • Locate the GraphDB Desktop application in the Windows Start menu and start it. The GraphDB Workbench opens at http://localhost:7200/.
  • On macOS:
    • Download the GraphDB Desktop .dmg file.
    • Double-click it to mount a virtual disk on your desktop. Copy the program from the virtual disk to your Applications folder, and you’re set.
    • Start GraphDB Desktop by clicking the application icon. The GraphDB Workbench opens at http://localhost:7200/.
  • On Linux:
    • Download the GraphDB Desktop .deb or .rpm file.
    • Install the package with sudo dpkg -i (for .deb) or sudo rpm -i (for .rpm), followed by the name of the downloaded package. Alternatively, you can double-click the package name.
    • Start GraphDB Desktop by clicking the application icon. The GraphDB Workbench opens at http://localhost:7200/.

For more information about setting up GraphDB, see the GraphDB Quick Start Guide.

Setting up a repository

Follow the Create a Repository section in the Quick Start Guide. Create a new GraphDB Repository called, for example, info216_lab3_NN, where NN are your initials. Choose No inference for now. Otherwise, the default parameters are fine.

Connect to the new repository and pin it as your default repository.

Load data

Download the Turtle file File:Russia investigation kg.txt, and save it with the correct extension, as russia_investigation_kg.ttl (not .txt). (You can also experiment with the Turtle file you saved after exercises 1 and 2.) Load the Russia investigation data through the GraphDB Workbench as described in the Quick Start Guide.

You can use http://example.org/ as Base IRI.

Graph visualisation

Go to Explore -> Visual graph and create an Easy graph around the resource http://example.org#investigation_0. Double-click on nodes to expand them. Are there any more investigations related to Richard Nixon?

SPARQL tasks

Go to the SPARQL Query & Update tab.

Task: Using the data in russia_investigation_kg.ttl, write the following SPARQL SELECT queries. (This page explains the Russian investigation KG a bit more.)

  • List all triples in your graph.
  • List the first 100 triples in your graph.
  • Count the number of triples in your graph.
  • Count the number of indictments in your graph.
  • List everyone who pleaded guilty, along with the name of the investigation.
  • List everyone who was convicted but had their conviction overturned, and by which president.
  • For each investigation, list the number of indictments made.
  • For each investigation with multiple indictments, list the number of indictments made.
  • For each investigation with multiple indictments, list the number of indictments made, sorted with the most indictments first.
  • For each president, list the numbers of convictions and of pardons made after conviction.

If you have more time

Task: Try to program some of the queries in a Python program (this will be the topic of later labs). You have two options:

Using rdflib: Read the Turtle file into an rdflib Graph and use the query() method.

from rdflib import Graph

g = Graph()
g.parse(..., format='ttl')            # your Turtle file goes here
r = g.query(...your_query_string...)  # a SPARQL query as a Python string

The hard part is picking the results out of the object r...

Using SPARQLWrapper: You can use SPARQLWrapper (another Python API) to connect to your running GraphDB endpoint. See the Python example page for how to do this.
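
To see what such a connection involves, here is a sketch that deliberately does not use SPARQLWrapper but only the Python standard library. It assumes that GraphDB exposes each repository as a SPARQL endpoint at /repositories/<repository id> under the Workbench address, and that the repository is named as suggested earlier in this lab; adjust both to your setup. The function names are hypothetical helpers written for this example.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

GRAPHDB_URL = "http://localhost:7200"   # Workbench address used in this lab
REPOSITORY_ID = "info216_lab3_NN"       # assumption: replace with your repository name


def build_select_request(query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    endpoint = f"{GRAPHDB_URL}/repositories/{quote(REPOSITORY_ID)}"
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_json(text):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    bindings = json.loads(text)["results"]["bindings"]
    return [{var: cell["value"] for var, cell in b.items()} for b in bindings]


def run_select(query):
    """Send the query; requires GraphDB to be running with the repository created."""
    with urlopen(build_select_request(query)) as response:
        return rows_from_json(response.read().decode("utf-8"))


# Example call (needs a running GraphDB, so it is left commented out):
# for row in run_select("SELECT * WHERE { ?s ?p ?o } LIMIT 5"):
#     print(row)
```

SPARQLWrapper wraps these same steps (building the request, choosing the return format, decoding the result) behind a single object, which is why it is the more convenient choice for the later labs.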

Task: If you want to explore more, try out the Wikidata Query Service (WDQS):

  • Wikidata Query Service: https://query.wikidata.org/

WDQS tutorials: