Lab: SHACL
Latest revision as of 13:13, 19 March 2024
Topics
- Validating RDF graphs with SHACL
- Running pySHACL
Useful materials
SHACL:
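- Section 7.4, "Expectation in RDF", in Allemang, Hendler & Gandon's textbook (Semantic Web for the Working Ontologist)
- Chapter 5, "SHACL", in Validating RDF (available online): https://book.validatingrdf.com/bookHtml011.html
- Interactive, online SHACL Playground: https://shacl.org/playground/
- Lab presentation containing a short overview of SHACL and pySHACL: https://docs.google.com/presentation/d/1weO9SzssxgYp3g_44X1LZsVtL0i6FurQ3KbIKZ8iriQ/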
pySHACL:
- pySHACL at PyPI (after installation, go straight to "Python Module Use").
Tasks
Task: Go to the interactive, online SHACL Playground. Copy and paste the Turtle triples below into the Data Graph text field, and click Update.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex: <http://example.org/> .

ex:Paul_Manafort
    a ex:PersonUnderInvestigation ;
    foaf:name
        "Paul Manafort"@en ;
    ex:hasBusinessPartner ex:Rick_Gates .

ex:Rick_Gates
    a ex:PersonUnderInvestigation ;
    foaf:name
        "Rick Gates"@en ;
    skos:altLabel
        "Richard William Gates III"@en ;
    ex:chargedWith
        ex:ForeignLobbying ,
        ex:MoneyLaundering ,
        ex:TaxEvasion ;
    ex:pleadedGuilty
        ex:Conspiracy , [
            a ex:Lying ;
            ex:wasLyingTo ex:FBI
        ] .

ex:ForeignLobbying a ex:Offense .
ex:MoneyLaundering a ex:Offense .
ex:TaxEvasion a ex:Offense .
The example is based on Exercises 1 and 2. Take some time to look at it in Turtle and also in JSON-LD, using the drop-down menu next to the Data Graph heading.
Task: Write shapes graphs in Turtle (recommended) or JSON-LD for each of the constraints below. Keep copies of your shapes graphs in a separate text file; you will need them later. Each time you enter a shapes graph into the Shapes Graph text field, click Update to validate the contents of the Data Graph.
You can use the following prefixes:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.org/> .
Constraints:
- Every person under investigation has exactly one name (foaf:name).
- The object of an ex:chargedWith property must be a URI.
- The object of an ex:chargedWith property must be an offense (ex:Offense).
- All person names must be language-tagged (hint: rdf:langString is a datatype!).
Fix the detected errors in the data graph as you go along; the validation output is easier to read that way.
Task: Write a Python program using rdflib and pySHACL, which:
- parses the Turtle example above into a data_graph (tip: you can either save it to file, or parse directly from a string using graph.parse(data=turtle_data, format='ttl')),
- parses the contents of a shape_graph you made in the previous task (for example checking that every person under investigation has exactly one name),
- uses pySHACL's validate method to apply the shape_graph constraints to the data_graph, and
- prints out the validation result (a boolean value, a results_graph, and a result_text).
If you have more time
Task: Add the Turtle triples below (from Exercises 3-5) to your data_graph.
ex:investigation_162 a ex:Indictment ;
    ex:american "unknown" ;
    ex:cp_date "2018-02-23"^^xsd:date ;
    ex:cp_days 282 ;
    ex:indictment_days 166 ;
    ex:investigation ex:russia ;
    ex:investigation_days 659 ;
    ex:investigation_end "unknown" ;
    ex:investigation_start "2017-05-17"^^xsd:date ;
    foaf:name "Rick Gates" ;
    ex:investigatedPerson ex:Rick_Gates ;
    ex:outcome ex:guilty_plea ;
    ex:overturned false ;
    ex:pardoned false ;
    ex:president ex:Donald_Trump .
Extend your shapes graph for each of these constraints:
- The only allowed values for ex:american are true, false or unknown.
- The value of a property that counts days must be an integer.
- The value of a property that indicates a start date must be xsd:date.
- The value of a property that indicates an end date must be xsd:date or unknown (tip: you can use sh:or (...) ).
- Every indictment must have exactly one FOAF name for the investigated person.
- Every indictment must have exactly one investigated person property, and that person must have the type ex:PersonUnderInvestigation.
- No URIs may contain hyphens ('-').
- Presidents must be identified with URIs.
Task: When you run SHACL on large data graphs, the results_graph and result_text will report the same error many times (but for different nodes). Write a SPARQL query to print out each distinct sh:resultMessage in the results_graph.
Task: Modify the above query so it prints out each sh:resultMessage in the results_graph once, along with the number of times that message has been repeated in the results.