Lab: Using Graph Embeddings: Difference between revisions
From info216
Revision as of 14:05, 14 April 2022
Lab 13: Using Graph Embeddings
Topics
Using knowledge graph embeddings with TorchKGE.
Classes and methods
The following TorchKGE classes are central:
- KnowledgeGraph - contains the knowledge graph (KG)
- Model - contains the embeddings (entity and relation vectors) for some KG
Tasks
Knowledge Graph:
- Use a dataset loader to load a KG you want to work with. Freebase FB15k is a good choice. (You will need a pre-trained model for your KG later, so choose one of FB15k, FB15k237, WDV5, WN18RR, or Yago3-10. This lab has mostly been tested on FB15k.)
- Use the methods provided by the KnowledgeGraph class to inspect the KG.
- Print out the numbers of entities, relations, and facts in the training, validation, and testing sets.
- Print the identifiers for the first 10 entities and relations (tip: ent2ix and rel2ix).
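The inspection steps above can be sketched as follows. This is a minimal sketch, assuming the KG object exposes the attributes n_ent, n_rel, n_facts, ent2ix, and rel2ix in the way TorchKGE's KnowledgeGraph does. The helper summarise_kg and the stand-in toy_kg object are made up for illustration, so that the block runs without downloading a dataset.

```python
from itertools import islice
from types import SimpleNamespace


def summarise_kg(kg, n=10):
    """Return size counts and the first n entity/relation identifiers of a KG-like object."""
    return {
        "entities": kg.n_ent,
        "relations": kg.n_rel,
        "facts": kg.n_facts,
        # ent2ix and rel2ix are dicts from identifier to index; take the first n keys.
        "first_entities": list(islice(kg.ent2ix, n)),
        "first_relations": list(islice(kg.rel2ix, n)),
    }


# Stand-in for a real KnowledgeGraph (hypothetical toy data, placeholder identifiers).
toy_kg = SimpleNamespace(
    n_ent=3,
    n_rel=2,
    n_facts=4,
    ent2ix={"/m/ent_a": 0, "/m/ent_b": 1, "/m/ent_c": 2},
    rel2ix={"/rel/one": 0, "/rel/two": 1},
)

summary = summarise_kg(toy_kg)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With TorchKGE installed, a loader such as load_fb15k() from torchkge.utils.datasets should return the training, validation, and testing splits, each of which can be passed to summarise_kg in turn.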
External identifiers:
- Download a dataset that provides more understandable labels for the entities (and perhaps relations) in your KnowledgeGraph.
- If you use FB15k, the relation names are reasonably readable, but the entity identifiers are not meaningful on their own. The same goes for WordNet. This repository contains mappings for the Freebase and WordNet datasets.
- If you use a Wikidata graph, the entities and relations are all P- and Q-codes. To get labels, you can try a combination of SPARQL queries and this API.
- Create mappings from external label to entity (and perhaps relation) ids in the KnowledgeGraph. Also create the inverse mappings.
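One way to build the two mappings is sketched below. It assumes you have already parsed the downloaded label file into a dict from external identifier to label (called id2label here), and that ent2ix is the identifier-to-index dict from the KnowledgeGraph. The function name build_label_maps and all identifiers in the toy data are hypothetical placeholders, not real Freebase ids.

```python
def build_label_maps(ent2ix, id2label):
    """Map labels to KG indexes and back, skipping entities that have no label."""
    label2ix = {}
    ix2label = {}
    for ext_id, ix in ent2ix.items():
        label = id2label.get(ext_id)
        if label is None:
            continue
        # Labels are not guaranteed to be unique; keep the first index seen for each label.
        label2ix.setdefault(label, ix)
        ix2label[ix] = label
    return label2ix, ix2label


# Hypothetical toy data: the identifiers below are placeholders, not real Freebase ids.
ent2ix = {"/m/id_1": 0, "/m/id_2": 1, "/m/id_3": 2}
id2label = {"/m/id_1": "J. K. Rowling", "/m/id_2": "WALL·E"}

label2ix, ix2label = build_label_maps(ent2ix, id2label)
print(label2ix["WALL·E"])
print(ix2label[0])
```

The same function can be reused for relations by passing rel2ix and a relation label dict. Entities without a label are left out of both mappings, so lookups for them need a fallback.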
Test entities and relations:
- Get the KG indexes for a few entities and relations. If you use the Freebase or Wikidata graphs, you can try 'J. K. Rowling' and 'WALL·E' as entities (note that the dot in 'WALL·E' is neither a hyphen nor an ordinary period). For relations, you can try 'influenced by' and 'genre'.
Model:
- Load a pre-trained TransE model that matches your KnowledgeGraph.
- Get the vectors for your test entities and relations (for example, 'J. K. Rowling' and 'influenced by').
- Find vectors for a few more entities (both unrelated and related ones, e.g., 'J. R. R. Tolkien', 'C. S. Lewis', ...). Use the model.dissimilarity()-method to estimate how semantically close your entities are. Do the distances make sense?
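To make the distance comparison concrete, here is a small sketch using plain Euclidean (L2) distance on made-up vectors. The vectors, the names in the dict, and the helper l2_distance are all hypothetical. The model's own dissimilarity() may use a different measure (for example L1 or a squared L2), so absolute values can differ from what you see here, but the idea of comparing a related pair against an unrelated pair is the same.

```python
import numpy as np


def l2_distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))


# Hypothetical 3-dimensional embeddings; real TransE vectors have many more dimensions.
vectors = {
    "author_a": np.array([1.0, 0.0, 0.0]),
    "author_b": np.array([1.0, 1.0, 0.0]),
    "film_c": np.array([-2.0, 0.0, 4.0]),
}

d_related = l2_distance(vectors["author_a"], vectors["author_b"])
d_unrelated = l2_distance(vectors["author_a"], vectors["film_c"])
print(d_related, d_unrelated)
```

In this toy setup the two "author" vectors are closer to each other than either is to the "film" vector, which is the pattern you would hope to see for related versus unrelated entities in a trained model.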
K-nearest neighbours:
- Find the indexes of the 10 entity vectors that are nearest neighbours to your entity of choice. You can use scikit-learn's sklearn.neighbors.NearestNeighbors.kneighbors()-method for this.
- Map the indexes of the 10-nearest neighbouring entities back into human-understandable labels. Does this make sense? Try the same thing with another entity (e.g., 'WALL·E').
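The two steps above can be sketched as follows. To keep the block dependency-light, it uses a brute-force NumPy search instead of scikit-learn's NearestNeighbors; on small inputs both should give the same ranking. The embedding matrix, the labels, and the helper nearest_neighbours are hypothetical toy values, assuming one row per entity.

```python
import numpy as np


def nearest_neighbours(query, ent_emb, k=10):
    """Return indexes and distances of the k rows of ent_emb closest to query (L2)."""
    dists = np.linalg.norm(ent_emb - query, axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]


# Hypothetical toy embeddings and labels (5 entities, 2 dimensions).
ent_emb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0], [0.5, 0.5]])
ix2label = {0: "A", 1: "B", 2: "C", 3: "D", 4: "E"}

idx, dist = nearest_neighbours(ent_emb[0], ent_emb, k=3)
labels = [ix2label[int(i)] for i in idx]
print(labels)
```

Note that the query entity is its own nearest neighbour at distance 0, so you may want to ask for 11 neighbours and drop the first. With scikit-learn, NearestNeighbors(n_neighbors=10).fit(ent_emb).kneighbors(query.reshape(1, -1)) returns a pair of arrays (distances, indices). If the embeddings are a PyTorch tensor, convert them with .detach().cpu().numpy() first.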
Translation:
- Add together the vectors for an entity and a relation that is meaningful for that entity (e.g., 'J. K. Rowling' + 'influenced by', 'WALL·E' + 'genre'). Find the 10 nearest neighbouring entities for the vector sum. Does this make sense? Try more entities and relations. Try to find examples that work well and examples that do not.
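The translation step relies on the TransE idea that, for a true fact, head + relation should land close to the tail. A minimal sketch on made-up vectors is shown below. The helper translate_and_rank and the toy arrays are hypothetical, and L2 distance is assumed for the ranking.

```python
import numpy as np


def translate_and_rank(head_vec, rel_vec, ent_emb, k=10):
    """Rank entities by L2 distance to head + relation, the TransE idea h + r ≈ t."""
    target = head_vec + rel_vec
    dists = np.linalg.norm(ent_emb - target, axis=1)
    return np.argsort(dists)[:k]


# Hypothetical toy data: entity 2 was placed exactly at entity 0 + relation 0.
ent_emb = np.array([[0.0, 0.0], [4.0, 4.0], [1.0, 2.0], [-3.0, 1.0]])
rel_emb = np.array([[1.0, 2.0]])

top = translate_and_rank(ent_emb[0], rel_emb[0], ent_emb, k=2)
print(top.tolist())
```

Here the best match is entity 2, because it was constructed to sit at the translated position. In a real model the head entity itself often shows up among the nearest neighbours of the sum, so it can be worth filtering it out before mapping the indexes back to labels.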
Code to get started
If You Have More Time
- Try it out with different datasets, for example one you create yourself using SPARQL queries on an open KG.