Lab: Web APIs and JSON-LD

From info216
==Topics==  
Programming with regular (non-semantic) Web APIs (RESTful web services), JSON and JSON-LD.


We will use Web APIs to retrieve regular JSON data, parse it programmatically, link the resources to established DBpedia resources where possible, and finally create an RDFLib graph from the data.


==Useful Reading==
* [https://stackabuse.com/reading-and-writing-json-to-a-file-in-python/ Reading and writing with JSON - stackabuse.com]
* [https://wiki.uib.no/info216/index.php/Python_Examples Examples]
* [https://realpython.com/python-requests/ Requests - realpython.com]
* [https://json-ld.org/ JSON for Linking Data]
* [https://www.dbpedia-spotlight.org/api Spotlight Documentation]
* [https://docs.google.com/presentation/d/1GpzP6825dxau-W21t4Nj0bccd4nbFplGTyMjRBbiXTU/edit?usp=sharing Exam Presentation]


 
'''Imports:'''
* import json
* import rdflib
* import requests
* import spotlight
 


==Tasks==
=== Task 1 ===
Write a small program that queries the Open Notify Astros API (link below) for the people currently in space. Create a graph from the response that connects each astronaut to the craft they are currently on, for instance using http://example.com/onCraft as the property. Also, since the space station is not very big, it is safe to assume that any two people who are on it at the same time know each other, so add this to the graph as well.

* Astros API url: http://api.open-notify.org/astros.json
* Documentation: http://open-notify.org/Open-Notify-API/People-In-Space/
* Requests Quickstart: https://docs.python-requests.org/en/latest/user/quickstart/


The response from the API follows this format:
<syntaxhighlight>
{
    "message": "success",
    "number": 7,
    "people": [
        {
            "craft": "ISS",
            "name": "Sergey Ryzhikov"
        },
        {
            "craft": "ISS",
            "name": "Kate Rubins"
        },
        ...
    ]
}
</syntaxhighlight>


We only need to consider what is inside the list under the "people" key. To create the graph, iterate over that list, extract the values of "craft" and "name", and add them as triples. Since neither the names nor the craft values are valid URIs, you can create URIs for them in the example namespace.
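A minimal sketch of Task 1, assuming rdflib and requests are installed. It uses http://example.com/ as the example namespace (matching the suggested <code>onCraft</code> property) and FOAF's <code>foaf:knows</code> for the "knows" relation; both are our own choices, not requirements of the task. The helper names <code>fetch_astros</code>, <code>to_uri</code> and <code>build_graph</code> are hypothetical. Building the graph is kept separate from downloading, so it can be tried on a saved response without calling the API every time.

<syntaxhighlight lang="python">
from itertools import combinations
from urllib.parse import quote

from rdflib import Graph, Namespace
from rdflib.namespace import FOAF

# Assumption: the example namespace is http://example.com/, matching the
# suggested property http://example.com/onCraft.
EX = Namespace("http://example.com/")
ASTROS_URL = "http://api.open-notify.org/astros.json"


def fetch_astros(url=ASTROS_URL):
    """Download the Astros API response and return it as a Python dict."""
    import requests  # imported here so build_graph() also works offline

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def to_uri(label):
    """Turn a label such as 'Kate Rubins' into a URI like ex:Kate_Rubins."""
    # Spaces become underscores; any other unsafe characters are percent-encoded.
    return EX[quote(label.strip().replace(" ", "_"))]


def build_graph(data):
    """Link each astronaut to their craft, and crew mates to each other."""
    g = Graph()
    g.bind("ex", EX)

    crew_by_craft = {}
    for person in data["people"]:
        astronaut = to_uri(person["name"])
        craft = to_uri(person["craft"])
        g.add((astronaut, EX.onCraft, craft))
        crew_by_craft.setdefault(craft, []).append(astronaut)

    # Assume everyone on the same craft knows everyone else there.
    # foaf:knows is not declared symmetric, so add both directions.
    for crew in crew_by_craft.values():
        for a, b in combinations(crew, 2):
            g.add((a, FOAF.knows, b))
            g.add((b, FOAF.knows, a))
    return g
</syntaxhighlight>

With network access, <code>print(build_graph(fetch_astros()).serialize(format="turtle"))</code> downloads the current crew list and prints the graph. Grouping people by craft before adding the knows triples means that people on different crafts are not linked to each other.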


=== Task 2 ===
Serialise the graph to JSON-LD, and set the context of the JSON-LD object so that it contains terms for the knows and onCraft properties.


With rdflib versions older than 6.0, you first need to pip install the JSON-LD plugin for rdflib if you have not already (from rdflib 6.0 onwards, JSON-LD support is built into rdflib):
<syntaxhighlight>
pip install rdflib-jsonld
</syntaxhighlight>
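A sketch of Task 2, assuming rdflib 6 or later (or rdflib-jsonld on older versions), whose JSON-LD serializer accepts a <code>context</code> argument. The context maps the short keys <code>onCraft</code> and <code>knows</code> to the same assumed property IRIs as in the Task 1 sketch (the example namespace and FOAF); <code>"@type": "@id"</code> tells JSON-LD that their values refer to other resources rather than being plain strings. The helper name <code>to_jsonld</code> is hypothetical.

<syntaxhighlight lang="python">
from rdflib import Graph, Namespace

# Assumptions: the example namespace from Task 1, and FOAF for "knows".
EX = Namespace("http://example.com/")
FOAF_NS = Namespace("http://xmlns.com/foaf/0.1/")

# The @context maps the short keys "onCraft" and "knows" to full property IRIs.
# "@type": "@id" marks their values as IRIs (other resources), not strings.
CONTEXT = {
    "ex": str(EX),
    "foaf": str(FOAF_NS),
    "onCraft": {"@id": str(EX.onCraft), "@type": "@id"},
    "knows": {"@id": str(FOAF_NS.knows), "@type": "@id"},
}


def to_jsonld(g, context=CONTEXT):
    """Serialise an rdflib Graph to a JSON-LD string using the given @context."""
    data = g.serialize(format="json-ld", context=context, indent=4)
    # rdflib 6+ returns str; older versions return bytes.
    return data.decode("utf-8") if isinstance(data, bytes) else data


if __name__ == "__main__":
    # Two crew members from the example response above, both on the ISS.
    g = Graph()
    g.add((EX.Kate_Rubins, EX.onCraft, EX.ISS))
    g.add((EX.Sergey_Ryzhikov, EX.onCraft, EX.ISS))
    g.add((EX.Kate_Rubins, FOAF_NS.knows, EX.Sergey_Ryzhikov))
    g.add((EX.Sergey_Ryzhikov, FOAF_NS.knows, EX.Kate_Rubins))
    print(to_jsonld(g))
</syntaxhighlight>

Parsing the output again with <code>Graph().parse(data=text, format="json-ld")</code> should give back the same triples, which is a quick way to check that the context is correct.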


== If you have more time ==
DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text, providing a solution for linking unstructured information sources to the Linked Open Data cloud through DBpedia.


Build on your program by using the DBpedia Spotlight API (example code below) to use a DBpedia resource in your graph whenever one is available. Add some simple error handling for cases where no DBpedia resource is found, and use an example-namespace entity instead. Keep in mind that some resources may represent other people with the same name, so try changing the types parameter so that you only get astronauts back; the confidence parameter might also help with this.


The response from DBpedia Spotlight is a list of dictionaries, where each dictionary contains the URI of the resource, its types and some other metadata that we will not use now. Set the type (rdf:type) of the resource to the types listed in the response.

=== Example code for DBpedia Spotlight query ===
First pip install <b>pyspotlight</b>
<syntaxhighlight>
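# Although the module is imported as "spotlight", the package to pip install is "pyspotlight".
import spotlight

# Public DBpedia Spotlight endpoint for English text
SERVER = "https://api.dbpedia-spotlight.org/en/annotate"

# Returns a list of dictionaries, one per DBpedia resource found in the text,
# containing the resource URI, its types and some other metadata.
annotations = spotlight.annotate(SERVER, "str_to_be_annotated")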
</syntaxhighlight>
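A sketch of the linking step with simple error handling. It assumes pyspotlight's <code>annotate(address, text, confidence=...)</code> returns dictionaries with the keys "URI" and "types" (the raw Spotlight JSON uses "@URI" and "@types"), and that the types arrive as one comma-separated string such as <code>DBpedia:Astronaut,Schema:Person</code>. The prefix table, the 0.5 confidence value and the names <code>spotlight_lookup</code>, <code>type_uris</code> and <code>resolve_person</code> are our own illustrative choices. The lookup function is passed in as a parameter so the fallback logic can be tried without network access. Restricting the results to astronauts with the types parameter is not shown here; see the Spotlight documentation for that parameter.

<syntaxhighlight lang="python">
from urllib.parse import quote

from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

EX = Namespace("http://example.com/")  # assumed example namespace
SERVER = "https://api.dbpedia-spotlight.org/en/annotate"

# Assumption: Spotlight reports types as one comma-separated string such as
# "DBpedia:Astronaut,Schema:Person,..."; only these two prefixes are expanded.
TYPE_PREFIXES = {
    "DBpedia:": "http://dbpedia.org/ontology/",
    "Schema:": "http://schema.org/",
}


def spotlight_lookup(name, confidence=0.5):
    """Annotate `name` with DBpedia Spotlight (needs pyspotlight and network)."""
    import spotlight  # pip install pyspotlight

    return spotlight.annotate(SERVER, name, confidence=confidence)


def type_uris(types_field):
    """Expand Spotlight's prefixed type names into full type URIs."""
    uris = []
    for t in types_field.split(","):
        for prefix, base in TYPE_PREFIXES.items():
            if t.startswith(prefix):
                uris.append(URIRef(base + t[len(prefix):]))
    return uris


def resolve_person(g: Graph, name, lookup=spotlight_lookup):
    """Return a DBpedia URI for `name` (adding its types to g), else an ex: URI."""
    try:
        annotations = lookup(name)
    except Exception:  # no resource found, HTTP error, no network, ...
        annotations = []
    if not annotations:
        return EX[quote(name.strip().replace(" ", "_"))]
    best = annotations[0]  # simplest choice: the first annotation returned
    uri = URIRef(best["URI"])
    for type_uri in type_uris(best.get("types", "")):
        g.add((uri, RDF.type, type_uri))
    return uri
</syntaxhighlight>

The returned URI can then be used in place of the example-namespace URI when adding the onCraft and knows triples.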

Latest revision as of 20:13, 3 May 2023
