Lab: JSON-LD
Topics
JSON-LD @context and processing in the JSON-LD Playground.
Using a Web API to retrieve JSON-LD data from ConceptNet, parsing it programmatically, and using JSON-LD to turn it into RDF.
Useful Reading
Imports:
- import json
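- rdflib-jsonld (a pip package that plugs into rdflib, so it is not imported directly; only needed with rdflib versions before 6.0)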
- import rdflib
- import requests
Tasks
Part 1: Basic JSON-LD
In the first part of the lab you will start with an existing JSON-LD document:
{
  "@context": {
    "@base": "http://example.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "edges": "http://example.org/triple",
    "start": "http://example.org/source",
    "rel": "http://example.org/predicate",
    "end": "http://example.org/object",
    "Person": "http://example.org/Person",
    "birthday": {
      "@id": "http://example.org/birthday",
      "@type": "xsd:date"
    },
    "nameEng": {
      "@id": "http://example.org/en/name",
      "@language": "en"
    }
  },
  "@graph": [
    {
      "@id": "people/Jeremy",
      "@type": "Person",
      "birthday": "1987-01-01",
      "nameEng": "Jeremy"
    },
    {
      "@id": "people/Tom",
      "@type": "Person"
    },
    {
      "edges": [
        {
          "start": "people/Jeremy",
          "rel": "knows",
          "end": "people/Tom"
        }
      ]
    }
  ]
}
Task: Using the JSON-LD document as a starting point, complete the following tasks:
- Create a new property named age that has an integer type, and add it to the people already in the graph.
- Add the following two people to the graph, making sure that the language tag of each name matches the person's nationality:
  * Ju: Chinese, 22 years old, likes to play basketball
  * Louis: French, 45 years old, has black hair
- Add the following edges to the graph:
  * Tom knows Louis
  * Louis teaches Ju
  * Ju plays basketball with Jeremy and Tom
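One possible pattern for the new terms, written as a Python dict so it can be printed as JSON and pasted into the document. This is a sketch, not the expected solution: the term names nameZh and nameFr and all the IRIs below are my own choices, modelled on nameEng and birthday. A typed term coerces its values to a datatype, and a term with @language tags its string values:

```python
import json

# Hypothetical term names and IRIs; only the pattern matters.
part1_terms = {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    # Typed term: a value such as 22 becomes "22"^^xsd:integer in RDF
    "age": {"@id": "http://example.org/age", "@type": "xsd:integer"},
    # Language-tagged terms in the style of nameEng
    "nameZh": {"@id": "http://example.org/zh/name", "@language": "zh"},
    "nameFr": {"@id": "http://example.org/fr/name", "@language": "fr"},
}

# Example node using the new terms (Ju's other facts are left for the task)
ju = {"@id": "people/Ju", "@type": "Person", "nameZh": "Ju", "age": 22}

# Fragment to merge by hand: add the terms to the existing @context
# and the node to the existing @graph.
print(json.dumps({"@context": part1_terms, "@graph": [ju]}, indent=2))
```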
Task: In a web browser, go to http://json-ld.org and continue to the Playground. Copy the JSON-LD document into the JSON-LD Input form.
In the Compacted form below the input form you will see a processed version of the JSON-LD input. Compare the information about Jeremy in the Input and Compacted output. Many of the keys and values in the output have been expanded according to mappings defined in the @context object at the beginning of the document.
http://json-ld.org provides many other processed variants of the JSON-LD, but Compacted may be easiest to start with.
Part 2: Retrieving JSON-LD from ConceptNet
Task: In a web browser, go to http://conceptnet.io and search for a term you are interested in. (A concept related to the Mueller investigation, for example 'indictment', is a good choice.)
The same URL, but with https://api.conceptnet.io/ instead of just https://conceptnet.io/, returns the data as JSON-LD. It looks awfully detailed, but it is easy to simplify!
Task: In another web browser tab, go to http://json-ld.org again and continue to the Playground. Copy your JSON-LD data from the ConceptNet tab to the JSON-LD Input form.
In the Expanded tab you will again see a processed version of the JSON-LD input. This time the keys and values in the output have been expanded according to mappings defined in the @context file http://api.conceptnet.io/ld/conceptnet5.7/context.ld.json, as specified at the beginning of the JSON-LD input:
"@context": [ "http://api.conceptnet.io/ld/conceptnet5.7/context.ld.json" ],
Task: Instead of a file, we will write our own simpler @context object into the JSON-LD Input. It should look like this instead:
"@context": { "current_key": "url_we_want_the_key_mapped_to", ... },
We are interested in these keys: edges, start, rel, end. Map them to simple URLs, like http://ex.org/t (for triple), http://ex.org/s, http://ex.org/p and http://ex.org/o. These are the basic triples we are most interested in!
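If you want to reuse this mapping in Part 3, one option is to write it once as a Python dict and print it as JSON for the Playground. A minimal sketch; the four IRIs are the ones suggested above and the variable name simple_context is my own:

```python
import json

# Map the four ConceptNet keys we care about to short example IRIs
simple_context = {
    "edges": "http://ex.org/t",  # triple
    "start": "http://ex.org/s",  # subject
    "rel": "http://ex.org/p",    # predicate
    "end": "http://ex.org/o",    # object
}

# Print a snippet that can replace the "@context": [...] line in the Playground input
print('"@context": ' + json.dumps(simple_context, indent=2) + ",")
```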
Look at the Expanded version again. It is much simpler now: the JSON-LD processor drops regular keys that are not mapped (the special keys starting with @ are still there).
Task: Remove the line that maps the edges key. What happens and why? Put the edges mapping back in again.
Task: In addition to the Expanded tab, the Playground can show Compacted and Flattened versions of the JSON-LD Input too. They are different ways of processing the same data, each of them useful for different purposes.
Which one do you prefer for reading? Which one would be easiest to work with as JSON in a program?
Task: We have lost the labels again!
Map label to http://www.w3.org/2000/01/rdf-schema#label and see what happens.
Part 3: Programming JSON-LD in Python
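Task: If your rdflib version is older than 6.0, install the rdflib-jsonld package in the same environment as rdflib. From rdflib 6.0 onward, JSON-LD support is built in and no extra package is needed.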
Create a graph object and parse the https://api.conceptnet.io/... URL you used to download JSON-LD data earlier. You need to add the argument format="json-ld" when you call parse(...), but you should not need to import more than rdflib as before.
Task: Inspect the graph object using simple SPARQL queries to find the distinct predicates and types used.
You can also count the number of triples in the graph:
print(len(g))
or iterate through all the triples:
for s, p, o in g:
    print(s, p, o)
Task: Unfortunately, the graph is much more complex than we need and it is not easy to pick out the triples we want. We want to add our own context object like we did in the Playground. Instead of parsing a graph directly from a URL, we first download it as a JSON object, for example:
import json
import requests

CN_BASE = 'http://api.conceptnet.io/c/en/'
json_obj = requests.get(CN_BASE + 'indictment').json()
Now, json_obj['@context'] contains the @context object. Define your own context object in Python, similar to the one you used in the Playground, and assign it to json_obj['@context'].
First serialise the modified JSON object into a JSON string with json.dumps(...) (after import json). Then create another graph object and parse the JSON string. You need to add the argument data=... in addition to format="json-ld" when you call parse(...), because you are no longer parsing from a file or URL, but from a string.
Save the JSON string for later, so you do not have to retrieve the same data over and over from ConceptNet.
Task: Create a new SPARQL SELECT query that lists all the (s, p, o) triples in your graph.
The URLs for predicates should be fine now, but the URLs of subjects and objects can be improved by mapping the special @base key in the @context object to a simple URL like http://ex.org/.
Task: Extend the SELECT query so that it also lists all the labels of subjects and objects.
Task: Change the SELECT query into a CONSTRUCT query that returns a new graph of all the basic triples in the original JSON-LD data. Save it to a file and look at it in a visualiser you like.
If you have more time...
Task: Merge the new triples with your existing graph if they fit there.
Task: Wrap the code you have written into a function describe_concept(...) that takes a concept name as argument (e.g., 'indictment') and returns a ConceptNet subgraph that describes the concept.
Task: The original JSON-LD data from https://api.conceptnet.io/... contains a view object at the end. Check it out!
By default, the API only returns 20 edges at a time. You can modify that by adding a ?limit=... argument to your URL.
Modify your describe_concept(...) function to take an extra argument that controls how many edges are downloaded.
Task: You still have way too many triples. Use FILTER and STRENDS to ignore some predicates like Synonym and general RelatedTo.
Task: Modify your @context and query so you can remove triples with concepts that are not in English language (en).