Lab: Semantic Lifting - XML: Difference between revisions
From info216
Revision as of 23:18, 18 March 2020
Lab 10: Semantic Lifting - XML
Link to Discord server
Topics
Today's topic is lifting data in XML format into RDF. XML stands for Extensible Markup Language and is used to... The goal is for you to work through an example of how non-semantic data can be converted into RDF.
Relevant Libraries/Functions
Tasks
Task 1
Task 2
Parse through the fictional XML data below and add the correct journalist as the writer of each of the news_articles from earlier. For example, if a news article was written on a Tuesday, Thomas Smith is the one who wrote it. One way to do this is to check whether any of the days in the "whenWriting" attribute is contained in the news article's "pubDate".
<syntaxhighlight>
<data>
    <news_publisher name="BBC News">
        <journalist whenWriting="Mon, Tue, Wed" >
            <firstname>Thomas</firstname>
            <lastname>Smith</lastname>
        </journalist>
        <journalist whenWriting="Thu, Fri" >
            <firstname>Joseph</firstname>
            <lastname>Olson</lastname>
        </journalist>
        <journalist whenWriting="Sat, Sun" >
            <firstname>Sophia</firstname>
            <lastname>Cruise</lastname>
        </journalist>
    </news_publisher>
</data>
</syntaxhighlight>
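The day-matching idea from Task 2 can be sketched as follows. This is only one possible approach, not the official solution. It assumes that each article's "pubDate" starts with a three-letter weekday abbreviation followed by a comma, as RSS dates usually do (e.g. "Tue, 17 Mar 2020 10:15:00 GMT"). The names `JOURNALISTS_XML`, `build_day_lookup` and `journalist_for` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The fictional journalist data from Task 2, inlined so the sketch is self-contained.
JOURNALISTS_XML = """
<data>
    <news_publisher name="BBC News">
        <journalist whenWriting="Mon, Tue, Wed">
            <firstname>Thomas</firstname>
            <lastname>Smith</lastname>
        </journalist>
        <journalist whenWriting="Thu, Fri">
            <firstname>Joseph</firstname>
            <lastname>Olson</lastname>
        </journalist>
        <journalist whenWriting="Sat, Sun">
            <firstname>Sophia</firstname>
            <lastname>Cruise</lastname>
        </journalist>
    </news_publisher>
</data>
"""


def build_day_lookup(xml_text):
    """Map each weekday abbreviation (e.g. 'Tue') to a journalist's full name."""
    root = ET.fromstring(xml_text)
    lookup = {}
    for journalist in root.iter("journalist"):
        first = journalist.findtext("firstname")
        last = journalist.findtext("lastname")
        # "Mon, Tue, Wed" -> ["Mon", "Tue", "Wed"]
        for day in journalist.get("whenWriting", "").split(","):
            lookup[day.strip()] = f"{first} {last}"
    return lookup


def journalist_for(pub_date, lookup):
    """Return the journalist for a pubDate string, or None if no day matches.

    Assumption: the weekday abbreviation is the text before the first comma.
    """
    day = pub_date.split(",")[0].strip()
    return lookup.get(day)


lookup = build_day_lookup(JOURNALISTS_XML)
print(journalist_for("Tue, 17 Mar 2020 10:15:00 GMT", lookup))
```

Once the name is found, it can be added to the graph as the writer of the matching article, using whichever property you chose for authorship earlier in the lab.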
Task 3
If You Have More Time
Code to Get Started
<syntaxhighlight>
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD
import xml.etree.ElementTree as ET
import requests
import re

g = Graph()
ex = Namespace("http://example.org/")
prov = Namespace("http://www.w3.org/ns/prov#")
g.bind("ex", ex)
g.bind("prov", prov)

# URL of the RSS feed
url = 'http://feeds.bbci.co.uk/news/rss.xml'

# Create an HTTP response object from the given URL
resp = requests.get(url)

# Save the XML file
with open('test.xml', 'wb') as f:
    f.write(resp.content)
</syntaxhighlight>
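Once the feed is saved, a next step could be to read the articles out of the XML. The sketch below uses a small hand-written sample (`SAMPLE_RSS`, invented for this illustration) instead of the live feed, so it runs without network access. It assumes the usual RSS 2.0 layout, where each article is an `<item>` with `<title>`, `<link>` and `<pubDate>` children. `extract_items` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for the downloaded feed (assumed RSS 2.0 structure).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First headline</title>
      <link>http://example.org/news/1</link>
      <pubDate>Tue, 17 Mar 2020 10:15:00 GMT</pubDate>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.org/news/2</link>
      <pubDate>Sat, 21 Mar 2020 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def extract_items(rss_text):
    """Return one dict per <item> with its title, link and pubDate text."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "pubDate": item.findtext("pubDate"),
        })
    return items


articles = extract_items(SAMPLE_RSS)
for article in articles:
    print(article["pubDate"], "-", article["title"])
```

To work on the saved file instead of the sample string, the root element can be obtained with `ET.parse('test.xml').getroot()` and iterated in the same way.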
Hints
Replacing characters with a DataFrame: