MoleculeTransformers/moleculenet-smiles-bert-mixup
Training a pre-trained BERT language model on molecular SMILES from the MoleculeNet benchmark, using mixup and enumeration augmentations.
MoleculeNet SMILES BERT Mixup
This repository contains an implementation of mixup strategies for text classification. The implementation is primarily based on the paper "Augmenting Data with Mixup for Sentence Classification: An Empirical Study", although there are some differences.
Three variants of mixup are considered for text classification:
- Embedding mixup: Texts are mixed immediately after the word embedding layer
- Hidden/Encoder mixup: Mixup is done prior to the last fully connected layer
- Sentence mixup: Mixup is done before the softmax
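The three variants differ only in where the interpolation happens. As a minimal sketch (not the repository's code), the toy classifier below uses hypothetical stand-ins `embed`, `encode`, and `classify` for the embedding layer, the encoder, and the last fully connected layer. It assumes the standard mixup recipe: the mixing weight is drawn from a Beta(alpha, alpha) distribution and the one-hot labels are mixed with the same weight. The method names mirror the `--method` values used in the commands in this README; the value of `alpha` is purely illustrative.

```python
import math
import random


def mix(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two equal-length vectors."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


# Hypothetical toy stages standing in for a real BERT classifier.
def embed(token_ids, dim=4):
    # Deterministic fake "word embedding": one small vector per token id.
    return [[math.sin(t * (k + 1)) for k in range(dim)] for t in token_ids]


def encode(embeddings):
    # Mean-pool the token vectors, then apply a nonlinearity.
    dim = len(embeddings[0])
    pooled = [sum(tok[k] for tok in embeddings) / len(embeddings) for k in range(dim)]
    return [math.tanh(v) for v in pooled]


def classify(hidden, n_classes=2):
    # Fake last fully connected layer: one linear logit per class.
    return [sum((c + 1) * (k + 1) * h for k, h in enumerate(hidden)) for c in range(n_classes)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward_with_mixup(tokens_a, tokens_b, lam, method):
    """Return class probabilities for a mixed pair of inputs."""
    emb_a, emb_b = embed(tokens_a), embed(tokens_b)
    if method == "embed":
        # Mix token embeddings position by position (inputs assumed padded to equal length).
        mixed = [mix(ta, tb, lam) for ta, tb in zip(emb_a, emb_b)]
        return softmax(classify(encode(mixed)))
    hid_a, hid_b = encode(emb_a), encode(emb_b)
    if method == "encoder":
        # Mix encoder outputs just before the last fully connected layer.
        return softmax(classify(mix(hid_a, hid_b, lam)))
    if method == "sent":
        # Mix the per-class scores just before the softmax.
        return softmax(mix(classify(hid_a), classify(hid_b), lam))
    raise ValueError(f"unknown method: {method}")


def soft_cross_entropy(probs, soft_target):
    # Cross-entropy against a mixed (soft) label.
    return -sum(t * math.log(p) for t, p in zip(soft_target, probs))


random.seed(0)
lam = random.betavariate(0.4, 0.4)  # alpha = 0.4 is an illustrative assumption
mixed_label = mix([1.0, 0.0], [0.0, 1.0], lam)  # labels are mixed with the same weight
for method in ("embed", "encoder", "sent"):
    probs = forward_with_mixup([1, 2, 3], [4, 5, 6], lam, method)
    print(method, [round(p, 3) for p in probs], round(soft_cross_entropy(probs, mixed_label), 3))
```

Because the toy `classify` layer is linear, the `encoder` and `sent` variants coincide here up to floating-point rounding; only the `embed` variant passes the mixed input through the nonlinear encoder.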
Run Supervised Training with Late Mixup Augmentation
from tqdm import tqdm

# Experiment grid
SAMPLES_PER_CLASS = [50, 100, 150, 200, 250]
N_AUGMENT = [0, 2, 4, 8, 16]
DATASETS = ['bace', 'bbbp']
METHODS = ['embed', 'encoder', 'sent']
OUTPUT_FILE = 'eval_result_mixup_augment_v1.csv'
N_TRIALS = 20
EPOCHS = 20

# Notebook cell: the leading "!" runs a shell command (IPython/Jupyter only).
for method in METHODS:
    for dataset in DATASETS:
        for sample in SAMPLES_PER_CLASS:
            for n_augment in N_AUGMENT:
                for i in tqdm(range(N_TRIALS)):
                    !python bert_mixup/late_mixup/train_bert.py --dataset-name={dataset} --epoch={EPOCHS} \
                        --batch-size=16 --model-name-or-path=shahrukhx01/muv2x-simcse-smole-bert \
                        --samples-per-class={sample} --eval-after={EPOCHS} --method={method} \
                        --out-file={OUTPUT_FILE} --n-augment={n_augment}

# Show the accumulated results.
!cat {OUTPUT_FILE}

Run Supervised Training with Early Mixup Augmentation
from tqdm import tqdm

# Experiment grid
SAMPLES_PER_CLASS = [50, 100, 150, 200, 250]
N_AUGMENT = [2, 4, 8, 16, 32]
DATASETS = ['bace', 'bbbp']
# Absolute path from the author's environment; adjust as needed.
OUTPUT_FILE = '/nethome/skhan/moleculenet-smiles-bert-mixup/eval_result_early_mixup.csv'
N_TRIALS = 20
EPOCHS = 100

# Notebook cell: the leading "!" runs a shell command (IPython/Jupyter only).
for dataset in DATASETS:
    for sample in SAMPLES_PER_CLASS:
        for n_augment in N_AUGMENT:
            for i in tqdm(range(N_TRIALS)):
                !python bert_mixup/early_mixup/main.py --dataset-name={dataset} --epoch={EPOCHS} \
                    --batch-size=16 --model-name-or-path=shahrukhx01/muv2x-simcse-smole-bert \
                    --samples-per-class={sample} --eval-after={EPOCHS} \
                    --out-file={OUTPUT_FILE} --n-augment={n_augment}

# Show the accumulated results.
!cat {OUTPUT_FILE}

Acknowledgement:
The code in this repository is mainly adapted from the repo "xashru/mixup-text".
Apache License 2.0
Created December 22, 2022
Updated March 15, 2024