slavastar/word2vec

Word2Vec implementation in PyTorch.

Word2Vec

A PyTorch implementation of the Word2Vec paper, "Efficient Estimation of Word Representations in Vector Space".

Most of the code was taken from this repository.

Description

The Word2Vec model was trained on the Wiki2 dataset.
The hyperparameters of the model can be found in the config.yaml file.

Project structure

  • src/
    • model/
      • cbow.py - implements the CBOW model.
      • skip_gram.py - implements the Skip-Gram model.
      • utils.py - contains common functions used by the models.
    • dataloader.py - contains functions for preprocessing text and building the dataset.
    • train.py - contains a full pipeline for training and saving the model.
    • training.py - contains a wrapper for training a model.
    • utils.py - contains common functions used by other modules.
  • config.yaml - config file with the main parameters of the model.
  • demo.ipynb - demo notebook with several examples of model inference with visualisation.
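
To illustrate how the CBOW and Skip-Gram models listed above differ in the training examples they consume, here is a minimal sketch in plain Python. It is not taken from the repository: the function names `cbow_pairs` and `skip_gram_pairs`, the `window` parameter, and the choice to truncate the context window at the sequence edges are all assumptions made for illustration. The actual preprocessing in dataloader.py may differ.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[int], window: int) -> List[Tuple[List[int], int]]:
    """Build (context, target) pairs: the surrounding words predict the centre word.

    Hypothetical helper; the context is truncated at the edges of the sequence.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # skip tokens with no neighbours (single-token input)
            pairs.append((context, target))
    return pairs


def skip_gram_pairs(tokens: List[int], window: int) -> List[Tuple[int, int]]:
    """Build (centre, context_word) pairs: the centre word predicts each neighbour."""
    pairs = []
    for context, centre in cbow_pairs(tokens, window):
        pairs.extend((centre, neighbour) for neighbour in context)
    return pairs


# Token ids standing in for a tokenised, vocabulary-encoded sentence.
sentence = [10, 11, 12, 13]
print(cbow_pairs(sentence, window=1))
print(skip_gram_pairs(sentence, window=1))
```

CBOW produces one example per position with several context ids as input, while Skip-Gram expands the same window into one example per (centre, neighbour) pair.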

Usage

python train.py --config config.yaml
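
For readers wondering how a script might consume the `--config` flag shown above, the following is a hedged sketch and not the repository's actual train.py. The names `parse_args` and `load_config` are hypothetical, and reading the file with PyYAML's `yaml.safe_load` is an assumption about how the YAML config is parsed.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; only the --config flag from the usage line is modelled."""
    parser = argparse.ArgumentParser(description="Train a Word2Vec model.")
    parser.add_argument("--config", type=str, required=True,
                        help="Path to the YAML config file.")
    return parser.parse_args(argv)


def load_config(path):
    """Read the YAML file into a dict (assumes the third-party PyYAML package)."""
    import yaml  # imported lazily so argument parsing works without PyYAML installed

    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)


# Equivalent to running: python train.py --config config.yaml
args = parse_args(["--config", "config.yaml"])
print(args.config)
```

In a real entry point, `load_config(args.config)` would be called next and the resulting dictionary passed to the training pipeline.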

Created March 7, 2023
Updated March 8, 2023