Lab: Wikidata in RDF

From info216
Revision as of 14:07, 21 February 2024

Topics

Wikidata in RDF:

  • retrieve truthy triples about a Wikidata entity
  • load the semantic data and metadata into GraphDB
  • visualise the semantic data and metadata

Motivation: So far you have built your own knowledge graph and worked on a small graph you were given. This week we will look at how to retrieve knowledge graphs from Wikidata, which can then be merged with your own graph to provide additional context. This is not a trivial problem, because Wikidata most likely contains a lot more data, and in particular metadata, than you need.

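Useful materials

  • A gentle introduction to the Wikidata Query Service (simple): https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/A_gentle_introduction_to_the_Wikidata_Query_Service
  • RDF Dump Format: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format
  • SPARQL Expressions and Functions (you will need this a lot): https://en.wikibooks.org/wiki/SPARQL/Expressions_and_Functions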

Tasks

Getting ready: In a web browser, go to Wikidata's Query Service (WDQS). Always add a limit such as LIMIT 100 when you test queries. Otherwise, you risk being blocked from the query service or, worse, blocking out a whole subdomain.

Emergency data: If Wikidata's Query Service is unavailable, you can load this Turtle file (File:Q42-extended.txt) into GraphDB instead and continue there, using Q42 as your example entity. (Remember to rename it from .txt to .ttl first.)

Task: From Wikidata's ordinary UI, find the Q-code of one of the people or entities involved in the Mueller investigation. Use that entity as your reference in the rest of this lab. (The Q-code should look like http://www.wikidata.org/entity/Q42 or, in prefixed form, wd:Q42.)

Task: Use a DESCRIBE query to retrieve some triples about your entity (remember LIMIT 100, although it is less critical on DESCRIBE queries).
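A minimal sketch of the DESCRIBE task, assuming Q42 (the example entity used elsewhere in this lab) as a stand-in for your own Q-code. In SPARQL 1.1, LIMIT applies to query solutions rather than directly to the returned triples, so for a DESCRIBE of one fixed IRI it does not cap the number of triples. WDQS already knows the wd: prefix, so the PREFIX line is optional there.

```sparql
# Run on WDQS. wd:Q42 is a placeholder: replace it with your own Q-code.
PREFIX wd: <http://www.wikidata.org/entity/>

DESCRIBE wd:Q42
LIMIT 100   # limits solutions, not the triples in the description
```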

Task: Use a SELECT query to retrieve the first 100 triples about your entity.
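A possible SELECT for this task, again with wd:Q42 as a placeholder. Without an ORDER BY clause, the "first" 100 triples are simply whichever 100 the endpoint returns, so the same set is not guaranteed between runs. The VALUES line is just one way of fixing the subject; writing wd:Q42 directly in the triple pattern works as well.

```sparql
# Run on WDQS. wd:Q42 is a placeholder: replace it with your own Q-code.
PREFIX wd: <http://www.wikidata.org/entity/>

SELECT ?s ?p ?o WHERE {
    VALUES ?s { wd:Q42 }   # bind the subject to your entity
    ?s ?p ?o .             # every triple with that subject
}
LIMIT 100
```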

Tip: Always save your queries and updates as soon as they succeed. You may need to go back to them later.

Task: Start GraphDB on your local machine. Create a new repository (no inference is needed) and activate it. Write a local SELECT query that embeds a SERVICE query to <https://query.wikidata.org/bigdata/namespace/wdq/sparql> and retrieves the first 100 triples about your entity to your local machine.

Tip: wd: is a PREFIX for <http://www.wikidata.org/entity/>.

Tip: To make LIMIT work inside a SERVICE query, you have to add another SELECT inside it, like this:

SELECT ... {  # the local query
    SERVICE ... {  # the remote service
        SELECT ... {
            ...
        } LIMIT 100  # this limit works on the remote service
    }
}  # a limit here would work on your local service, 
   # but is not strictly necessary when you already have an inner LIMIT
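Filling in the template for the GraphDB task could look like the sketch below. It is meant to be run as a local query in your GraphDB repository; wd:Q42 is a placeholder, and the PREFIX line is included because GraphDB, unlike WDQS, may not predefine wd:.

```sparql
# Run locally in GraphDB. wd:Q42 is a placeholder: replace it with your own Q-code.
PREFIX wd: <http://www.wikidata.org/entity/>

SELECT ?s ?p ?o WHERE {                    # the local query
    SERVICE <https://query.wikidata.org/bigdata/namespace/wdq/sparql> {
        SELECT ?s ?p ?o WHERE {            # evaluated by WDQS
            VALUES ?s { wd:Q42 }
            ?s ?p ?o .
        } LIMIT 100                        # limits the remote result
    }
}
```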

Task: Change the SELECT query to an INSERT query that adds the Wikidata triples to your local repository. Use a local ASK and/or SELECT query to check that the triples have actually been added.
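One way to turn the federated SELECT into an update, assuming the same placeholder entity and that the triples should go into the repository's default graph. The inner SELECT and its LIMIT stay unchanged; only the outer SELECT is replaced by an INSERT template.

```sparql
# Run as an update in your local GraphDB repository.
PREFIX wd: <http://www.wikidata.org/entity/>

INSERT {
    ?s ?p ?o .                     # copy each remote triple into the default graph
} WHERE {
    SERVICE <https://query.wikidata.org/bigdata/namespace/wdq/sparql> {
        SELECT ?s ?p ?o WHERE {
            VALUES ?s { wd:Q42 }   # placeholder: use your own Q-code
            ?s ?p ?o .
        } LIMIT 100                # still limits the remote service
    }
}
```

Afterwards, a separate local query such as ASK { wd:Q42 ?p ?o } (with the same PREFIX line) should return true if the insert succeeded.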

Task: Truthy Wikidata statements use the prefix wd: for resources and wdt: for predicates. Use a FILTER statement to only SELECT truthy triples in this sense.
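A sketch of the truthy filter, written to run directly on WDQS with wd:Q42 as a placeholder. It relies on wdt: being the namespace <http://www.wikidata.org/prop/direct/> and compares predicate IRIs as strings with STRSTARTS. Because the FILTER is applied before the LIMIT, all 100 results are truthy; the same FILTER can be moved inside the inner SELECT of the SERVICE query. The commented-out line is one optional reading of "wd: for resources", keeping only objects that are wd: entities or literals.

```sparql
# Run on WDQS. wd:Q42 is a placeholder: replace it with your own Q-code.
PREFIX wd: <http://www.wikidata.org/entity/>

SELECT ?s ?p ?o WHERE {
    VALUES ?s { wd:Q42 }
    ?s ?p ?o .
    # keep only truthy predicates, i.e. those in the wdt: namespace
    FILTER(STRSTARTS(STR(?p), "http://www.wikidata.org/prop/direct/"))
    # optional: also require the object to be a wd: entity or a literal
    # FILTER(isLiteral(?o) || STRSTARTS(STR(?o), "http://www.wikidata.org/entity/"))
}
LIMIT 100
```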