TorchicTab-Heuristic: Semantic Table Annotation with Wikidata
TorchicTab is a semantic table annotation system that automatically understands the content of a table and assigns semantic tags to its elements with high accuracy. It was originally developed for the SemTab challenge. You can find more about the full system in our dedicated article and paper.
This repository contains TorchicTab-Heuristic, the TorchicTab subsystem that annotates tables, using the Wikidata knowledge graph as a reference knowledge base. TorchicTab-Heuristic produces annotations for the following semantic annotation tasks:
- The Cell Entity Annotation (CEA) task associates a table cell with an entity.
- The Column Type Annotation (CTA) task assigns a semantic type to a column.
- The Column Property Annotation (CPA) task discovers a semantic relation contained in the RDF graph that best represents the relation between two columns.
- The Topic Detection (TD) task identifies the topic of a table that lacks a subject column and assigns a class.
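To make the four tasks concrete, here is a minimal Python sketch of what each kind of annotation might look like for a small table of cities. The class names, field names, and table layout are hypothetical and do not reflect TorchicTab-Heuristic's actual output format; only the Wikidata identifiers (Q90 Paris, Q142 France, Q515 city, P17 country) are real.

```python
from dataclasses import dataclass

# Wikidata URI prefixes for entities and direct properties.
WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"


@dataclass(frozen=True)
class CellEntityAnnotation:  # CEA: one table cell -> one entity
    row: int
    column: int
    entity: str


@dataclass(frozen=True)
class ColumnTypeAnnotation:  # CTA: one column -> one semantic type
    column: int
    semantic_type: str


@dataclass(frozen=True)
class ColumnPropertyAnnotation:  # CPA: a pair of columns -> one relation
    subject_column: int
    object_column: int
    relation: str


@dataclass(frozen=True)
class TopicAnnotation:  # TD: whole table (no subject column) -> one class
    topic_class: str


# Hypothetical table: column 0 holds cities, column 1 holds countries,
# and row 0 is ("Paris", "France").
annotations = [
    CellEntityAnnotation(row=0, column=0, entity=WD + "Q90"),    # Paris
    CellEntityAnnotation(row=0, column=1, entity=WD + "Q142"),   # France
    ColumnTypeAnnotation(column=0, semantic_type=WD + "Q515"),   # city
    ColumnPropertyAnnotation(0, 1, relation=WDT + "P17"),        # country
]

# TD only applies to tables that lack a subject column; for such a table
# describing cities, the topic could be annotated with the class "city".
topic = TopicAnnotation(topic_class=WD + "Q515")
```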
Installation
TorchicTab-Heuristic requires Python 3.9, 3.10, or 3.11. In case of conflicts, create a new virtual environment. For example, if you use conda, run:
conda create -n torchictab_env python=3.11
conda activate torchictab_env
Simple installation:
pip install torchic_tab_heuristic
Optional:
TorchicTab can also create an Elasticsearch index that contains all Wikidata entity-label pairs. This enables enhanced lookup techniques that leverage Elasticsearch features such as fuzzy querying. To use TorchicTab-Heuristic with Elasticsearch:
- Download a Wikidata RDF dump from Zenodo.
- Install Elasticsearch. Recommended version: Elasticsearch 8.
- Edit the config.py file to configure the index name and the RDF dump address.
- Run the Elasticsearch server:
  cd elasticsearch-X.X.X
  ./bin/elasticsearch
- Create the Elasticsearch index:
  python elasticsearch/create_index.py
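As an illustration of the fuzzy querying mentioned above, the sketch below builds an Elasticsearch `match` query with `fuzziness` set to `AUTO`, which tolerates small typos in a cell value. The function name and the `label` field name are assumptions made for this example; the field names actually used by the index are determined by `create_index.py` and may differ.

```python
from typing import Any, Dict


def build_fuzzy_label_query(
    cell_text: str, size: int = 10, field: str = "label"
) -> Dict[str, Any]:
    """Build a search body that fuzzily matches `cell_text` against `field`.

    `field` is a hypothetical field name; adjust it to the real index mapping.
    """
    return {
        "size": size,  # maximum number of candidate entities to return
        "query": {
            "match": {
                field: {
                    "query": cell_text,
                    # AUTO picks the allowed edit distance from the term length.
                    "fuzziness": "AUTO",
                }
            }
        },
    }


# A misspelled cell value that a fuzzy match may still resolve.
query = build_fuzzy_label_query("Amsterdm")
```

The resulting dictionary can be sent as the request body to the `_search` endpoint of the index configured in config.py.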
Usage
Example usage of TorchicTab-Heuristic with Wikidata:
Without Elasticsearch
python examples/sta_demo.py -i "examples/tables/cities.csv"
With Elasticsearch
python examples/sta_demo.py -i "examples/tables/cities.csv" -e
Cite
Thank you for reading! To cite our resource:
@InProceedings{dasoulas2023torchictab,
author = {Dasoulas, Ioannis and Yang, Duo and Duan, Xuemin and Dimou, Anastasia},
journal = {CEUR Workshop Proceedings},
publisher = {CEUR Workshop Proceedings},
title = {TorchicTab: Semantic Table Annotation with Wikidata and Language Models},
year = {2023},
month = nov,
}
