Lab: SHACL
Revision as of 12:50, 17 February 2023
Topics
- Validating RDF graphs with SHACL
- Running pySHACL
Useful materials
SHACL:
- Section 7.4 Expectation in RDF in Allemang, Hendler & Gandon's textbook (Semantic Web for the Working Ontologist)
- Chapter 5 SHACL in Validating RDF (available online)
- Interactive, online SHACL Playground
pySHACL:
- pySHACL at PyPI.org. After installation, go straight to "Python Module Use".
Tasks
Task: Go to the interactive, online SHACL Playground. The file [File:xxx.txt] contains a small Turtle example you can paste into the Data Graph text field. The example is based on the kg4news.ttl graph introduced in the SPARQL lecture (S03). It contains several errors. Take some time to look at it in Turtle and also in JSON-LD, using the drop-down menu next to the Data Graph heading.
Task: Write Shapes Graphs in Turtle (recommended) or JSON-LD for each of the constraints below. Keep copies of your Shapes Graphs in a separate text editor and file; you will need them later. Each time you have entered a Shapes Graph into the text field, click Update to validate the contents of the Data Graph.
You can use the following prefixes:
xxx
Constraints:
- Every kg:MainPaper has (is the subject of) exactly one kg:year property.
  - In the example, the node xxx has more than one such property.
- Every kg:year value (literal object) is an xsd:integer.
  - In the example, xx nodes have kg:year-s that are not integers.
- Only kg:MainPaper-s have kg:year properties.
  - In the example, the kg:MainAuthor xx also has a kg:year property.
- Every kg:MainPaper has at least one dcterm:subject, whose value is a skos:Concept (see Sessions xx and xx on Vocabularies).
  - In the example, ...
- More specifically, every value of dcterm:subject is either
  (Hint: use SHACL's sh:or property to express this.)
  - In the example, ...
- Only kg:MainPaper-s have dcterm:subject-s.
  - In the example, ...
Task: Write a Python program using rdflib and pySHACL, which:
- parses the contents of [File:xxx.ttl] as a Turtle file into a data_graph,
- parses the contents of a shape_graph you made in the previous task (for example, one checking the kg:year-s of kg:MainPaper-s),
- uses pySHACL's validate method to apply the shape_graph constraints to the data_graph, and
- prints out the validation result (a boolean value, a results_graph, and a result_text).
Task: Download the whole [kg4news.ttl KG4NEWS graph] we used in the SPARQL lecture (S03) and parse it into the data graph. Re-run a selection of your shape_graph constraints on the larger graph.
Task: In some cases, the results_graph and result_text will report the same error many times, but for different nodes. Write a SPARQL query to print out each distinct sh:xxxMessage in the results_graph.
Task: Modify the above query so it prints out each sh:xxxMessage in the results_graph once, along with the number of times the message has been repeated in the results.
Task: Install pySHACL into your virtual environment:
pip install pyshacl
If you have more time
Task: Fix kg4news.txt (renamed to .ttl) so that:
- Every kg:year value has rdf:type xsd:year .
- xxx