Revision as of 21:56, 12 March 2020
Lab 9: Semantic Lifting - CSV
Topics
Today's topic is lifting data in CSV format into RDF. The goal is to work through an example of how non-semantic data can be converted into RDF.
CSV stands for Comma-Separated Values, meaning that the data values on each line are separated by commas.
Fortunately, CSV is already structured in a way that makes the creation of triples relatively easy.
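To illustrate why this structure maps well onto triples, here is a minimal sketch using only Python's standard csv module. The two-line sample and the variable names are assumptions made for illustration, and plain tuples stand in for RDF triples.

```python
import csv
import io

# A header line and one data row, held in a string for the sake of the example.
text = '"Name","Country","Town"\n"Achille Blaise","France","Nancy"\n'

reader = csv.reader(io.StringIO(text))
header = next(reader)   # the column names
row = next(reader)      # the values for one person

# The first column identifies the row, so it becomes the subject.
subject = row[0]

# Every other column gives one (subject, predicate, object) tuple:
# the column name is the predicate and the cell is the object.
triples = [(subject, predicate, obj) for predicate, obj in zip(header[1:], row[1:])]

for triple in triples:
    print(triple)
```

Each row therefore yields one triple per column after the first, which is the pattern the tasks build on.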
Relevant Libraries
- Pandas
- Python string methods: split(), replace()
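As a quick reminder of what the two string methods do, here is a small sketch applied to one cell of the kind found in the task data. The variable names are only illustrative.

```python
# A cell from a multi-valued column, as it might look after reading the CSV.
cell = "Computers, semantic networks"

# split(",") breaks the cell into its individual values;
# strip() removes the space left after each comma.
values = [value.strip() for value in cell.split(",")]

# replace(" ", "_") makes each value usable as the local part of a URI.
local_names = [value.replace(" ", "_") for value in values]

print(values)       # ['Computers', 'semantic networks']
print(local_names)  # ['Computers', 'semantic_networks']
```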
Tasks
Task 1
Below are four lines of CSV data, preceded by a header line, that could have been saved from a spreadsheet. Copy them into a file in your project folder and write a program with a loop that reads each line from that file (except the initial header line) and adds it to your graph as triples:
"Name","Gender","Country","Town","Expertise","Interests"
"Regina Catherine Hall","F","Great Britain","Manchester","Ecology, zoology","Football, music travelling"
"Achille Blaise","M","France","Nancy","","Chess, computer games"
"Nyarai Awotwi Ihejirika","F","Kenya","Nairobi","Computers, semantic networks","Hiking, botany"
"Xun He Zhang","M","China","Chengdu","Internet, mathematics, logistics","Dancing, music, trombone"
When solving the task take note of the following:
- The subject of each triple will be the person's name. The header (first line) contains the column names, which should act as the predicates of the triples.
- Some columns, such as Expertise, have multiple values for one person. You should create a separate triple for each of these values.
- Spaces should be replaced with underscores to form a valid URI, e.g. Regina Catherine should be Regina_Catherine.
- Any case of missing data should not form a triple.
- For consistency, make sure all resources start with a capital letter.
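The notes above can be sketched as a pair of helper functions. This is only one possible approach, not the official solution: the function names to_resource_name and row_to_triples are hypothetical, plain string tuples stand in for rdflib triples, and every column is assumed to be potentially multi-valued.

```python
def to_resource_name(text):
    """Trim, capitalise the first letter, and replace spaces with underscores."""
    text = text.strip()
    return (text[:1].upper() + text[1:]).replace(" ", "_")


def row_to_triples(header, row):
    """Return (subject, predicate, object) name tuples for one CSV row."""
    subject = to_resource_name(row[0])
    triples = []
    for column, cell in zip(header[1:], row[1:]):
        # A multi-valued cell yields one tuple per value.
        for value in cell.split(","):
            if not value.strip():
                continue  # missing data: no triple
            triples.append((subject, column, to_resource_name(value)))
    return triples


header = ["Name", "Gender", "Country", "Town", "Expertise", "Interests"]
row = ["Achille Blaise", "M", "France", "Nancy", "", "Chess, computer games"]

for triple in row_to_triples(header, row):
    print(triple)
```

In the actual task the names would still have to be turned into URIs in the ex namespace before being added to the graph. Note also that when the file is read with pandas, an empty cell is typically loaded as NaN rather than an empty string, so a check such as pd.isna() would be needed before calling string methods on it.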
Code to Get Started
from rdflib import Graph, Literal, Namespace, URIRef
import pandas as pd

# Load the CSV file from Task 1 into a DataFrame.
csv_data = pd.read_csv("task1.csv")

g = Graph()
ex = Namespace("http://example.org/")
g.bind("ex", ex)

# Iterate through each row. First select the subject of the triples, which will be the name.
for index, row in csv_data.iterrows():
    subject = row['Name'].replace(" ", "_")
    # Continue code here:

# rdflib 6 and later return a str here; on older versions append .decode().
print(g.serialize(format="turtle"))