dtai-kg/TorchicTab-Heuristic

TorchicTab-Heuristic: Semantic Table Annotation with Wikidata

TorchicTab is a semantic table annotation system that automatically understands the content of a table and assigns semantic tags to its elements with high accuracy. It was originally developed for the SemTab challenge. You can find more about the full system in our dedicated article and paper.

This repository contains TorchicTab-Heuristic, the TorchicTab subsystem that annotates tables using the Wikidata knowledge graph as its reference knowledge base. TorchicTab-Heuristic produces annotations for the following semantic annotation tasks:

  • The Cell Entity Annotation (CEA) task associates a table cell with an entity.
  • The Column Type Annotation (CTA) task assigns a semantic type to a column.
  • The Column Property Annotation (CPA) task discovers a semantic relation contained in the RDF graph that best represents the relation between two columns.
  • The Topic Detection (TD) task identifies the topic of a table that lacks a subject column and assigns a class.
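
To make the four tasks concrete, the sketch below models what annotations for a small two-column table of cities and countries might look like. The class names and field layout are hypothetical and chosen only for illustration; they are not TorchicTab-Heuristic's actual output format. The Wikidata identifiers used (Q90 Paris, Q64 Berlin, Q142 France, Q183 Germany, Q515 city, Q6256 country, P17 country) are real.

```python
from dataclasses import dataclass

# Standard Wikidata URI prefixes for entities and direct properties.
WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"


@dataclass(frozen=True)
class CellEntityAnnotation:
    """CEA: links one table cell to a Wikidata entity (hypothetical layout)."""
    row: int
    column: int
    entity: str


@dataclass(frozen=True)
class ColumnTypeAnnotation:
    """CTA: assigns a semantic type (a Wikidata class) to a column."""
    column: int
    semantic_type: str


@dataclass(frozen=True)
class ColumnPropertyAnnotation:
    """CPA: names the property relating a subject column to another column."""
    subject_column: int
    object_column: int
    prop: str


@dataclass(frozen=True)
class TopicAnnotation:
    """TD: assigns a class to a whole table that has no subject column."""
    table_id: str
    topic_class: str


# Example table: column 0 holds cities, column 1 holds their countries.
table = [["Paris", "France"], ["Berlin", "Germany"]]

cea = [
    CellEntityAnnotation(0, 0, WD + "Q90"),   # Paris
    CellEntityAnnotation(0, 1, WD + "Q142"),  # France
    CellEntityAnnotation(1, 0, WD + "Q64"),   # Berlin
    CellEntityAnnotation(1, 1, WD + "Q183"),  # Germany
]
cta = [
    ColumnTypeAnnotation(0, WD + "Q515"),     # city
    ColumnTypeAnnotation(1, WD + "Q6256"),    # country
]
cpa = [ColumnPropertyAnnotation(0, 1, WDT + "P17")]  # country

if __name__ == "__main__":
    for annotation in cea + cta + cpa:
        print(annotation)
```

This table has a subject column (the cities), so no TD annotation is shown; `TopicAnnotation` would apply to a table without one.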

TorchicTab-Heuristic Overview

Installation

TorchicTab-Heuristic requires Python 3.9, 3.10, or 3.11. If you run into dependency conflicts, create a new virtual environment. For example, with conda:

conda create -n torchictab_env python=3.11
conda activate torchictab_env

Simple installation:

pip install torchic_tab_heuristic
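
Before installing, it can help to confirm that the active interpreter falls in the supported range. The following is a minimal sketch using only the standard library; the helper name `is_supported` is an illustration and not part of the package.

```python
import sys

# Minor versions of Python 3 listed as supported in this README.
SUPPORTED_MINORS = (9, 10, 11)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given (major, minor, ...) tuple is Python 3.9-3.11."""
    major, minor = version_info[0], version_info[1]
    return major == 3 and minor in SUPPORTED_MINORS


if __name__ == "__main__":
    current = f"{sys.version_info[0]}.{sys.version_info[1]}"
    if is_supported():
        print(f"Python {current} is in the supported range.")
    else:
        print(f"Python {current} is outside the supported range (3.9-3.11).")
```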

Optional:

TorchicTab can also build an Elasticsearch index containing all Wikidata entity-label pairs. This enables enhanced lookup techniques that leverage Elasticsearch functionality such as fuzzy querying. To use TorchicTab-Heuristic with Elasticsearch:

  • Download a Wikidata RDF dump from Zenodo

  • Install Elasticsearch. Recommended version: Elasticsearch 8

  • Edit the config.py file to set the index name and the RDF dump address.

  • Run the Elasticsearch server:

    cd elasticsearch-X.X.X
    ./bin/elasticsearch

  • Create the Elasticsearch index:

    python elasticsearch/create_index.py
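
As an illustration of the fuzzy querying mentioned above, the sketch below builds the request body for an Elasticsearch `match` query with `"fuzziness": "AUTO"`, which tolerates small spelling differences between a cell value and an indexed label. The field name `label` is an assumption; the actual field and index names depend on how create_index.py and config.py define the index. The function only constructs the query body and does not contact a server.

```python
import json


def build_fuzzy_label_query(mention: str, field: str = "label", size: int = 10) -> dict:
    """Build an Elasticsearch search body that fuzzily matches `mention`.

    `field` is an assumed name for the indexed label field.
    """
    return {
        "size": size,  # maximum number of candidate hits to return
        "query": {
            "match": {
                field: {
                    "query": mention,
                    # "AUTO" picks the allowed edit distance from the term length.
                    "fuzziness": "AUTO",
                }
            }
        },
    }


if __name__ == "__main__":
    # A misspelled cell value that a fuzzy lookup could still resolve.
    print(json.dumps(build_fuzzy_label_query("Pariss"), indent=2))
```

The resulting dictionary can be sent to the `_search` endpoint of whichever index is configured, for example through an Elasticsearch client library.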

Usage

Example usage of TorchicTab-Heuristic with Wikidata:

Without Elasticsearch

python examples/sta_demo.py -i "examples/tables/cities.csv"

With Elasticsearch

python examples/sta_demo.py -i "examples/tables/cities.csv" -e
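
To annotate several tables in one go, the demo command can be assembled programmatically. This is a minimal sketch that relies only on the `-i` and `-e` flags shown above; the helper names are hypothetical, and it assumes the script is launched from the repository root.

```python
import sys
from pathlib import Path

# Path of the demo script, relative to the repository root.
DEMO_SCRIPT = "examples/sta_demo.py"


def build_demo_command(table_path, use_elasticsearch=False):
    """Return the argument list for annotating one CSV table with the demo."""
    command = [sys.executable, DEMO_SCRIPT, "-i", str(table_path)]
    if use_elasticsearch:
        command.append("-e")  # enable the Elasticsearch-backed lookup
    return command


def commands_for_folder(folder, use_elasticsearch=False):
    """Return one demo command per CSV file in `folder`, in sorted order."""
    return [
        build_demo_command(path, use_elasticsearch)
        for path in sorted(Path(folder).glob("*.csv"))
    ]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for command in commands_for_folder("examples/tables"):
        print(" ".join(command))
```

Each returned list can be passed to `subprocess.run(command, check=True)` to actually run the annotation.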

Cite

Thank you for reading! To cite our resource:

@InProceedings{dasoulas2023torchictab,
    author    = {Dasoulas, Ioannis and Yang, Duo and Duan, Xuemin and Dimou, Anastasia},
    journal = {CEUR Workshop Proceedings},
    publisher = {CEUR Workshop Proceedings},
    title = {TorchicTab: Semantic Table Annotation with Wikidata and Language Models},
    year = {2023-11-02},
    }

Apache License 2.0