Lab: Getting started with VSCode, Python and RDFlib

Latest revision as of 13:18, 19 January 2024

Topics

  1. Prepare for programming knowledge graphs with rdflib in Python.
  2. Get started with basic RDF programming.

Useful materials

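VSCode:

  • Getting Started with Python in VS Code: https://code.visualstudio.com/docs/python/python-tutorial
  • Using Python environments in VS Code (pip is easiest to start with): https://code.visualstudio.com/docs/python/environments#

RDFlib:

  • Intro to RDFlib: https://rdflib.readthedocs.io/en/stable/gettingstarted.html
  • Intro to creating triples: https://rdflib.readthedocs.io/en/stable/intro_to_creating_rdf.html
  • Serialising and parsing: https://rdflib.readthedocs.io/en/stable/intro_to_parsing.html
  • 8 min introduction video to RDFlib: https://www.youtube.com/watch?v=sCU214rbRZ0

Lab Presentations:

  • Lab 1 - RDF Presentation: https://docs.google.com/presentation/d/1blXlTTTsL8jqeV5sRLhQuZ-nssNZqH_bjSgjj-kougE/edit?usp=sharing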

RDFlib classes/interfaces and methods:

  • Graph (add, close, and perhaps remove methods)
  • URIRef
  • Literal
  • Namespace

Tasks

Install Python, Pip, VSCode, and RDFlib

1) You need Python version 3.7 or newer on your computer. Use the command python --version in a command/terminal window to check. You can download Python and Pip from https://www.python.org/downloads/. To make sure you have the most recent Pip, run:

python -m pip install --upgrade pip
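
The version requirement in step 1 can also be checked from inside Python. This is only a minimal sketch; the helper name python_is_supported is made up for this example and is not part of the lab material.

```python
import sys


def python_is_supported(minimum=(3, 7)):
    """Return True if the running interpreter is at least `minimum`."""
    # sys.version_info[:2] is a (major, minor) tuple, e.g. (3, 11).
    return sys.version_info[:2] >= minimum


print("Running Python", sys.version.split()[0])
print("New enough for this lab:", python_is_supported())
```

If the second line prints False, install a newer Python before continuing.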

2) You need an Integrated Development Environment (IDE) that supports Python. If you are unsure which one to use, you can download the free and open source Visual Studio Code (VSCode) from https://code.visualstudio.com/Download.

3) Create a folder for INFO216. Start VSCode and create a workspace in the file menu (File Menu --> Save Workspace As) and save it in your folder. Afterwards, on the left side of VSCode, click on the document icon (explorer). Click Open Folder, and open your INFO216 folder. Create a new file with .py extension.

4) If you are asked to install the Python extension, install it. If you weren't asked, click on the 4 cubes (extension manager) on the left side of VSCode, search for Microsoft's Python extension, and install it.

5) If you don't have a terminal open, go to the top menu and click Terminal, then New Terminal. Check which folder your terminal is currently in. The bottom line, starting with PS or (base), shows the current folder. If you opened the folder earlier, you should already be in your INFO216 folder. If the path after PS is not your INFO216 folder, navigate to it with the cd command. For instance, if you are at PS C:\Users\YourName> and your INFO216 folder is on your desktop, you could type cd .\Desktop\INFO216\.

6) Once you are in the right folder, type the following commands into your terminal window to create and activate a virtual environment:

(Windows)

py -3 -m venv .venv
.venv\scripts\activate

(Mac / Linux)

python3 -m venv .venv
source .venv/bin/activate

7) If you get the message "... is not digitally signed. You cannot run this script on the current system." copy and paste the following in the terminal, and repeat step 6:

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope Process

8) Step 6 should now have created a virtual environment called .venv in your folder. In the bottom right corner you will receive a notification asking you to select the new environment for the workspace folder; select yes. You should now see (.venv) in front of PS/(base) in the terminal window. Your virtual environment should from now on be selected automatically when you open your workspace. However, sometimes you might need to open a new terminal for the (.venv) to appear.
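
If you are unsure whether the virtual environment is really the one running your code, a small check like the following can help. It is a sketch that relies on the standard venv behaviour of sys.prefix and sys.base_prefix; the function name in_virtual_environment is chosen just for this example.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtual_environment())
```

When the environment from step 6 is selected, the interpreter path should point into your .venv folder.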

9) In the terminal, type the following to install RDFlib:

pip install rdflib

10) You might need to close and reopen VSCode for RDFlib to work. You can now import rdflib into your .py file, or import specific classes/interfaces, for example from rdflib import Namespace, Graph.

11) Right click in your code and click "Run Current File in Interactive Window". When running your program for the first time, you might be asked to install the ipykernel package; if so, install it.

Programming tasks

Task: Represent the sentences below as triples. Note that some sentences can result in several triples.

  • The Mueller Investigation was led by Robert Mueller.
    • It involved Paul Manafort, Rick Gates, George Papadopoulos, Michael Flynn, Michael Cohen, and Roger Stone.
  • Paul Manafort was business partner of Rick Gates.
    • He was campaign chairman for Donald Trump.
    • He was charged with money laundering, tax evasion, and foreign lobbying.
    • He was convicted for bank and tax fraud.
    • He pleaded guilty to conspiracy.
    • He was sentenced to prison.
    • He negotiated a plea agreement.
  • Rick Gates was charged with money laundering, tax evasion and foreign lobbying.
    • He pleaded guilty to conspiracy and lying to the FBI.

Task: Write a program that creates an RDF graph and adds the triples you just created.

For the URIs, you can just use an example path like http://example.org/. So if you want to represent Donald Trump, the URI could be http://example.org/Donald_Trump, and you can create the resource like this:

from rdflib import URIRef

donaldTrump = URIRef('http://example.org/Donald_Trump')

You can even use a Namespace so you don't have to write the full URI every time:

from rdflib import Namespace

ex = Namespace('http://example.org/')
donaldTrump = ex.Donald_Trump

Task: Use the serialize method of rdflib.Graph to write out the model in different formats (on screen or to file):

  • Turtle (format='ttl')
  • N-Triples (format='nt')
  • JSON-LD (format='json-ld')
  • RDF/XML (format='xml')

Which one is easiest to read? What are the pros and cons of the different formats? We will look more at some of them later in the course!

Task: Use the simple online RDF grapher (https://www.ldf.fi/service/rdf-grapher) to visualise your model. :isSemantic offers a more advanced RDF visualiser (https://issemantic.net/rdf-visualizer) that you can also test if you want.

Task: Loop through the triples in the model to print out all triples that have pleading guilty as predicate. If you have been inconsistent about some predicate or other term, you can first write loops that correct wrong terms everywhere in the model. (Tip: to correct a term in a model, you typically have to first remove the old triple and then add a new one.)

If you have more time...

Task: If you have more time you can continue extending your graph:

  • Michael Cohen was Donald Trump's attorney.
    • He pleaded guilty to lying to Congress.
  • Michael Flynn was adviser to Donald Trump.
    • He pleaded guilty to lying to the FBI.
    • He negotiated a plea agreement.

Task: According to this FRONTLINE article (https://www.pbs.org/wgbh/frontline/article/the-mueller-investigation-explained-2/), the lies told by Gates, Cohen and Flynn were different and are described in different detail. How can you modify your knowledge graph to account for this?

Task: Write a method (function) that submits your model to https://www.ldf.fi/service/rdf-grapher for rendering and saves the returned image to file.
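
A rough starting point for such a function is sketched below, using only the Python standard library. The form field names (rdf, from, to) and their values are assumptions about the grapher's web interface and should be checked against the service page before you rely on them; the function names and the default file name are also made up for this example.

```python
import urllib.parse
import urllib.request

GRAPHER_URL = "https://www.ldf.fi/service/rdf-grapher"


def build_grapher_request(turtle_text, image_format="png"):
    """Build a POST request asking the grapher to render Turtle as an image."""
    # Assumed form fields: rdf (the data), from (input format), to (output format).
    form = urllib.parse.urlencode(
        {"rdf": turtle_text, "from": "ttl", "to": image_format}
    ).encode("utf-8")
    return urllib.request.Request(GRAPHER_URL, data=form)


def render_graph(graph, path="graph.png"):
    """Serialize an rdflib Graph, submit it, and save the returned image."""
    turtle_text = graph.serialize(format="ttl")
    if isinstance(turtle_text, bytes):  # older rdflib versions returned bytes
        turtle_text = turtle_text.decode("utf-8")
    request = build_grapher_request(turtle_text)
    with urllib.request.urlopen(request, timeout=30) as response:
        image = response.read()
    with open(path, "wb") as image_file:
        image_file.write(image)
    return path
```

Calling render_graph(g) with your own graph g would then save the picture as graph.png in the current folder, provided the service accepts the request in this form.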