Lab: Getting started with VSCode, Python and RDFlib

Revision as of 14:41, 13 January 2023

Topics

  1. Get you set up for programming knowledge graphs with rdflib in Python.
  2. Get started with basic RDF programming.

Useful materials

VSCode:

RDFlib:

  • Namespaces and Bindings (https://rdflib.readthedocs.io/en/stable/namespaces_and_bindings.html)

RDFlib classes/interfaces and methods:

  • Graph (add and perhaps remove methods)
  • URIRef
  • Literal
  • Namespace
  • perhaps RDF (the RDF.type field)
  • perhaps BNode
  • perhaps Collection

Documentation for all the RDFlib modules can be found at https://rdflib.readthedocs.io/en/stable/py-modindex.html. Browser search (often Ctrl-F) can be useful to find the module that you want.

Tasks

Install Python, Pip, VSCode, and RDFlib

You need Python version 3.7 or later on your computer. To check, run python --version in a command/terminal window. You can download Python and Pip from https://www.python.org/downloads/. To make sure you have the most recent Pip, run:

python -m pip install --upgrade pip

You also need an Integrated Development Environment (IDE) that supports Python. If you are unsure which one to use, you can download the free and open source Visual Studio Code (VSCode) from https://code.visualstudio.com/Download.

Create a folder for your INFO216 exercises. Start VSCode and create a new project (from the File menu) by opening your exercise folder. Create a new file with a .py extension.

Go to your INFO216 exercise folder. You can do this in a command/terminal window outside VSCode, or use the Terminal menu to create a terminal inside VSCode. In the terminal, create and activate a virtual environment. It is easiest to use Python's built-in venv module together with pip, but it is OK to use pipenv or conda if you prefer.

python -m venv venv

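To activate the environment, run venv\Scripts\activate on Windows or source venv/bin/activate on macOS and Linux. Inside VSCode, closing the terminal and opening a new one should also activate it once the environment has been selected as interpreter.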

You can create the folder and file in the terminal or in Windows Explorer. Install Microsoft's Python extension in the VSCode extension manager. When the Python extension is installed, you can use the 'select interpreter' field on the bottom left to choose the virtual environment you made, or at least to make sure you are using a supported version of Python.

In the terminal, and inside your Python environment, install RDFlib:

pip install rdflib

You can now import rdflib into your .py file, or import specific classes/interfaces such as from rdflib import Namespace, Graph.

Programming tasks

Task: Write a program that creates an RDF graph containing the triples from the sentences below:

  • The Mueller Investigation was led by Robert Mueller.
    • It involved Paul Manafort, Rick Gates, George Papadopoulos, Michael Flynn, Michael Cohen, and Roger Stone.
  • Paul Manafort was business partner of Rick Gates.
    • He was campaign chairman for Donald Trump.
    • He was charged with money laundering, tax evasion, and foreign lobbying.
    • He was convicted for bank and tax fraud.
    • He pleaded guilty to conspiracy.
    • He was sentenced to prison.
    • He negotiated a plea agreement.
  • Rick Gates was charged with money laundering, tax evasion and foreign lobbying.
    • He pleaded guilty to conspiracy and lying to the FBI.

Tips: Note that some sentences can result in several triples. For the URIs, you can just use an example path like http://example.org/. So if you want to represent Donald Trump, the URI could be http://example.org/Donald_Trump. Remember that Namespaces can be used so that you don't have to write the full URI every time.

Task: Use the serialize method to write out the model in different formats (on screen or to file):

  • Turtle (format='ttl')
  • N-Triples (format='nt')
  • JSON-LD (format='json-ld')
  • RDF/XML (format='xml')

Which one is easiest to read? What are the pros and cons of the different formats? We will look more at some of them later in the course!

Task: Use the online RDF grapher (https://www.ldf.fi/service/rdf-grapher) to visualise your model.

Task: Loop through the triples in the model to print out all triples that have pleading guilty as predicate. If you have been inconsistent about some predicate or other term, you can first write loops that correct wrong terms everywhere in the model. (Tip: to correct a term in a model, you typically have to first remove the old triple and then add a new one.)

If you have more time...

Task: Continue extending your graph:

  • Michael Cohen was Donald Trump's attorney.
    • He pleaded guilty to lying to Congress.
  • Michael Flynn was adviser to Donald Trump.
    • He pleaded guilty to lying to the FBI.
    • He negotiated a plea agreement.

Task: According to this FRONTLINE article, the lies told by Gates, Cohen and Flynn were different and are described at different levels of detail. How can you modify your knowledge graph to account for this?

Task: Write a method (function) that submits your model to https://www.ldf.fi/service/rdf-grapher for rendering and saves the returned image to file.
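
A function for this could be sketched as follows, using only the Python standard library for the HTTP request. The query parameter names rdf, from and to are an assumption about how the service accepts input and should be checked against the grapher's own page. The function names and the default file name are made up.

```python
import urllib.parse
import urllib.request

GRAPHER_URL = "https://www.ldf.fi/service/rdf-grapher"


def grapher_request_url(turtle_text, image_format="png"):
    """Build the request URL for the RDF grapher.

    The parameter names rdf, from and to are assumed, not confirmed here.
    """
    query = urllib.parse.urlencode(
        {"rdf": turtle_text, "from": "ttl", "to": image_format}
    )
    return f"{GRAPHER_URL}?{query}"


def save_graph_image(graph, filename="graph.png"):
    """Send an rdflib Graph to the grapher and save the returned image."""
    url = grapher_request_url(graph.serialize(format="ttl"))
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
        out.write(response.read())
    return filename
```

Calling save_graph_image(g) with a graph g from the earlier tasks needs network access. Because the whole model is sent in the URL, a very large graph may be rejected.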