Lab: SHACL

Latest revision as of 13:13, 19 March 2024

Topics

  • Validating RDF graphs with SHACL
  • Running pySHACL

Useful materials

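SHACL:

  • Chapter 5 SHACL in Validating RDF (available online): https://book.validatingrdf.com/bookHtml011.html
  • Interactive, online SHACL Playground: https://shacl.org/playground/
  • Lab presentation containing a short overview of SHACL and pySHACL: https://docs.google.com/presentation/d/1weO9SzssxgYp3g_44X1LZsVtL0i6FurQ3KbIKZ8iriQ/

pySHACL: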

Tasks

Task: Go to the interactive, online SHACL Playground. Copy and paste the Turtle triples below into the Data Graph text field, and click Update.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex: <http://example.org/> .

ex:Paul_Manafort 
    a ex:PersonUnderInvestigation ;
    foaf:name 
        "Paul Manafort"@en ;  
    ex:hasBusinessPartner ex:Rick_Gates .

ex:Rick_Gates 
    a ex:PersonUnderInvestigation ;
    foaf:name 
        "Rick Gates"@en ;  
    skos:altLabel 
        "Richard William Gates III"@en ;  
    ex:chargedWith 
        ex:ForeignLobbying ,  
        ex:MoneyLaundering ,
        ex:TaxEvasion ;
    ex:pleadedGuilty 
        ex:Conspiracy, [
                a ex:Lying ;
                ex:wasLyingTo ex:FBI 
            ] .

ex:ForeignLobbying a ex:Offense .  
ex:MoneyLaundering a ex:Offense .  
ex:TaxEvasion a ex:Offense .

The example is based on Exercises 1 and 2. Take some time to look at it in Turtle and also in JSON-LD, using the drop-down menu next to the Data Graph heading.

Task: Write shapes graphs in Turtle (recommended) or JSON-LD for each of the constraints below. Keep copies of your shapes graphs in a separate file, because you will need them later. Each time you have entered a shapes graph into the Shapes Graph text field, click Update to validate the contents of the Data Graph.

You can use the following prefixes:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.org/> .

Constraints:

  • Every person under investigation has exactly one name.
  • The object of an ex:chargedWith property must be a URI.
  • The object of an ex:chargedWith property must be an offense (an instance of ex:Offense).
  • All person names must be language-tagged (hint: rdf:langString is a datatype!).

As you go along, change the data_graph to remove the errors the validator detects. This keeps the validation output short and easier to read.

Task: Write a Python program using rdflib and pySHACL, which:

  1. parses the Turtle example above into a data_graph (tip: you can either save it to file, or parse directly from a string using graph.parse(data=turtle_data, format='ttl')),
  2. parses the contents of a shape_graph you made in the previous task (for example checking that every person under investigation has exactly one name),
  3. uses pySHACL's validate method to apply the shape_graph constraints to the data_graph, and
  4. prints out the validation result (a boolean value, a results_graph, and a result_text).

If you have more time

Task: Add the Turtle triples below (from exercise 3-5) to your data_graph.

ex:investigation_162 a ex:Indictment ;
    ex:american "unknown" ;
    ex:cp_date "2018-02-23"^^xsd:date ;
    ex:cp_days 282 ;
    ex:indictment_days 166 ;
    ex:investigation ex:russia ;
    ex:investigation_days 659 ;
    ex:investigation_end "unknown" ;
    ex:investigation_start "2017-05-17"^^xsd:date ;
    foaf:name "Rick Gates" ;
    ex:investigatedPerson ex:Rick_Gates ;
    ex:outcome ex:guilty_plea ;
    ex:overturned false ;
    ex:pardoned false ;
    ex:president ex:Donald_Trump .

Extend your shapes graph for each of these constraints:

  • The only allowed values for ex:american are true, false or unknown.
  • The value of a property that counts days must be an integer.
  • The value of a property that indicates a start date must be xsd:date.
  • The value of a property that indicates an end date must be xsd:date or unknown (tip: you can use sh:or (...)).
  • Every indictment must have exactly one FOAF name for the investigated person.
  • Every indictment must have exactly one ex:investigatedPerson property, and that person must have the type ex:PersonUnderInvestigation.
  • No URIs may contain hyphens ('-').
  • Presidents must be identified with URIs.

Task: When you run SHACL on large data graphs, the results_graph and result_text will report the same error many times (but for different nodes). Write a SPARQL query to print out each distinct sh:resultMessage in the results_graph.

Task: Modify the above query so it prints out each sh:resultMessage in the results_graph once, along with the number of times that message has been repeated in the results.