seq2rel: Datasets
This is a companion repository to seq2rel (https://github.com/JohnGiorgi/seq2rel), which makes it easy to preprocess training data.
Installation
This repository requires Python 3.8 or later.
Setting up a virtual environment
Before installing, you should create and activate a Python virtual environment. If you need pointers on setting up a virtual environment, please see the AllenNLP install instructions.
Installing the library and dependencies
If you do not plan on modifying the source code, install from git using pip:
pip install git+https://github.com/JohnGiorgi/seq2rel-ds.git
Otherwise, clone the repository and install from source using Poetry:
# Install poetry for your system: https://python-poetry.org/docs/#installation
curl -sSL https://install.python-poetry.org | python3 -
# Clone and move into the repo
git clone https://github.com/JohnGiorgi/seq2rel-ds
cd seq2rel-ds
# Install the package with poetry
poetry install
Usage
Installing this package gives you access to a simple command-line tool, seq2rel-ds. To see the list of available commands, run:
seq2rel-ds --help
Note that you can also call the underlying Python files directly, e.g.
python path/to/seq2rel_ds/main.py --help
To preprocess a dataset (and in most cases, download it), call one of the commands, e.g.
seq2rel-ds cdr main "path/to/cdr"
Note that you have to include main because typer does not support default commands.
This will create the preprocessed TSV files under the specified output directory, e.g.
cdr
┣ train.tsv
┣ valid.tsv
┗ test.tsv
which can then be used to train a seq2rel model.
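Before starting a training run, it can be useful to confirm that preprocessing produced all three splits. The following is a minimal sketch and is not part of seq2rel-ds: the helper names missing_splits and count_examples are hypothetical, the file names are taken from the example listing above, and the one-example-per-line layout is an assumption about the TSV files rather than something documented here.

```python
from pathlib import Path

# Split file names taken from the example directory listing above.
EXPECTED_SPLITS = ("train.tsv", "valid.tsv", "test.tsv")


def missing_splits(output_dir):
    """Return the expected split files that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_SPLITS if not (root / name).is_file()]


def count_examples(output_dir):
    """Map each split file that exists to its number of non-empty lines.

    Assumes one example per line (an assumption, not documented behaviour).
    """
    root = Path(output_dir)
    counts = {}
    for name in EXPECTED_SPLITS:
        path = root / name
        if path.is_file():
            with path.open(encoding="utf-8") as handle:
                counts[name] = sum(1 for line in handle if line.strip())
    return counts
```

For example, calling missing_splits("path/to/cdr") after running the cdr command should return an empty list if all three files were written, and count_examples("path/to/cdr") gives a rough size for each split.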