Lab: SHACL

From info216

Revision as of 18:51, 18 February 2023

Topics

  • Validating RDF graphs with SHACL
  • Running pySHACL

Useful materials

SHACL:

pySHACL:

Tasks

Task: Go to the interactive online SHACL Playground (https://shacl.org/playground/). Copy and paste the Turtle triples below into the Data Graph text field.

@prefix ex: <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

ex:Paul_Manafort 
    a ex:PersonUnderInvestigation ;
    ex:hasBusinessPartner ex:Rick_Gates .

ex:Rick_Gates 
    a ex:PersonUnderInvestigation ;
    foaf:name 
        "Rick Gates" ,
        "Richard William Gates III"@en ;
    ex:chargedWith 
        "Foreign Lobbying"@en ,
        ex:MoneyLaundering ,
        ex:TaxEvasion .

The example is based on Exercises 1 and 2. Take some time to look at it in Turtle and also in JSON-LD, using the drop-down menu next to the Data Graph heading.

Task: Write Shapes Graphs in Turtle (recommended) or JSON-LD for each of the constraints below. Keep copies of your Shapes Graphs in a separate text editor and file; you will need them later. Each time you have entered a Shapes Graph into the text field, click Update to validate the contents of the Data Graph.

You can use the following prefixes:

@prefix ex: <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

Constraints:

  • Every person under investigation has exactly one name.
  • All person names must be language-tagged.
  • The value of an ex:chargedWith property must be a URI.

Difficult:

  • If one person (under investigation) has another as ex:hasBusinessPartner, the second person must have the first as ex:hasBusinessPartner in return.

Task: Write a Python program using rdflib and pySHACL, which:

  1. parses the Turtle example above into a data_graph,
    • Tip: you can either save it to a file, or parse it directly from a string using graph.parse(data=turtle_data, format='ttl')
  2. parses the contents of a shape_graph you made in the previous task (for example, checking that every person under investigation has exactly one name),
  3. uses pySHACL's validate method to apply the shape_graph constraints to the data_graph, and
  4. prints out the validation result (a boolean value, a results_graph, and a result_text).

Task: Add the Turtle triples below (from Exercises 3-5) to your data_graph.

@prefix ex: <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:investigation_162 a ex:Indictment ;
    ex:american "Yes" ;
    ex:cp_date "2018-02-23"^^xsd:date ;
    ex:cp_days 282 ;
    ex:indictment_days 166 ;
    ex:investigation ex:russia ;
    ex:investigation_days 659.0 ;
    # ex:investigation_end "None" ;
    ex:investigation_start "2017-05-17" ;
    ex:name ex:Rick_Gates ;
    ex:outcome ex:guilty-plea ;
    ex:overturned false ;
    ex:pardoned false ;
    ex:president "Donald Trump"@en .

Download the whole KG4NEWS graph (kg4news.ttl) that we used in the SPARQL lecture (S03) and parse it into the data graph. Re-run a selection of your shape_graph constraints on the larger graph.

Task: In some cases, the results_graph and result_text will report the same error many times, but for different nodes. Write a SPARQL query to print out each distinct sh:xxxMessage in the results_graph.

Task: Modify the above query so it prints out each sh:xxxMessage in the results_graph once, along with the number of times the message has been repeated in the results.


Task: Install pySHACL into your virtual environment:

pip install pyshacl

If you have more time

Task: Fix kg4news.txt (renamed to .ttl) so that:

  • Every kg:year value is a literal of datatype xsd:gYear.
  • xxx