Lab: SHACL: Difference between revisions

From info216

Revision as of 12:50, 17 February 2023

Topics

  • Validating RDF graphs with SHACL
  • Running pySHACL

Useful materials

SHACL:

pySHACL:

Tasks

Task: Go to the interactive, online SHACL Playground. The file [File:xxx.txt] contains a small Turtle example you can paste into the Data Graph text field. The example is based on the kg4news.ttl graph introduced in the SPARQL lecture (S03). It contains several errors. Take some time to look at it in Turtle and also in JSON-LD, using the drop-down menu next to the Data Graph heading.

Task: Write Shapes Graphs in Turtle (recommended) or JSON-LD for each of the constraints below. Keep copies of your Shapes Graphs in a separate text editor and file, because you will need them later. Each time you have entered a Shapes Graph into the text field, click Update to validate the contents of the Data Graph.

You can use the following prefixes:

xxx

Constraints:

  • Every kg:MainPaper has (is the subject of) exactly one kg:year property.
    • In the example, the node xxx has more than one such property.
  • Every kg:year value (literal object) is an xsd:integer.
    • In the example, xx nodes have kg:year values that are not integers.
  • Only kg:MainPaper-s have kg:year properties.
    • In the example, the kg:MainAuthor xx also has a kg:year property.
  • Every kg:MainPaper has at least one dcterm:subject, whose value is a skos:Concept (see Sessions xx and xx on Vocabularies).
    • In the example, ...
  • More specifically, every value of dcterm:subject is either

(Hint: use SHACL's sh:or property to express this.)

    • In the example, ...
  • Only kg:MainPaper-s have dcterm:subject-s.
    • In the example, ...

Task: Write a Python program using rdflib and pySHACL, which:

  1. parses the contents of [File:xxx.ttl] as a Turtle file into a data_graph,
  2. parses the contents of a shape_graph you made in the previous task (for example, one checking the kg:year-s of kg:MainPaper-s),
  3. uses pySHACL's validate method to apply the shape_graph constraints to the data_graph, and
  4. prints out the validation result (a boolean value, a results_graph, and a result_text).

Task: Download the whole [kg4news.ttl KG4NEWS graph] we used in the SPARQL lecture (S03) and parse it into the data graph. Re-run a selection of your shape_graph constraints on the larger graph.

Task: In some cases, the results_graph and result_text will report the same error many times, but for different nodes. Write a SPARQL query to print out each distinct sh:xxxMessage in the results_graph.

Task: Modify the above query so it prints out each sh:xxxMessage in the results_graph once, along with the number of times the message has been repeated in the results.


Task: Install pySHACL into your virtual environment:

pip install pyshacl


If you have more time

Task: Fix kg4news.txt (renamed to .ttl) so that:

  • Every kg:year value has rdf:type xsd:year .
  • xxx