ss87021456/Venue-Prediction

Venue-Prediction

Venue prediction using bag-of-words and heterogeneous information as features, with the sklearn SGDClassifier

Dataset DBLP:

training: https://www.dropbox.com/s/rrbksqvvoefrr4p/training.txt?dl=0

validation: https://www.dropbox.com/s/tw094y2xfcoosv3/validation.txt?dl=0

Dataset description:

Tab-separated fields: Paper_Id, Paper_title, Publication_venue, Cited_Papers, Cited_Papers_Venues
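
The field layout above can be read with pandas along these lines. This is a minimal sketch, not code from the repository: the helper name `load_dblp` is hypothetical, the file is assumed to have no header row, and the sample rows (including the comma used inside the citation fields) are made up for illustration.

```python
import csv
import io

import pandas as pd

# Column names taken from the field list above.
COLUMNS = [
    "Paper_Id",
    "Paper_title",
    "Publication_venue",
    "Cited_Papers",
    "Cited_Papers_Venues",
]


def load_dblp(path_or_buffer):
    """Read a tab-separated dataset file into a DataFrame of strings."""
    return pd.read_csv(
        path_or_buffer,
        sep="\t",
        header=None,             # assumption: the file has no header row
        names=COLUMNS,
        dtype=str,
        keep_default_na=False,   # keep empty citation fields as "" instead of NaN
        quoting=csv.QUOTE_NONE,  # treat quote characters in titles literally
    )


# Made-up sample rows in the same layout (not real DBLP data); the comma
# separator inside the citation fields is a guess, not a documented format.
sample = io.StringIO(
    "p1\tA study of graph kernels\tICML\tp7,p9\tNIPS,ICML\n"
    "p2\tQuery optimization revisited\tVLDB\t\t\n"
)
df = load_dblp(sample)
print(df[["Paper_Id", "Publication_venue"]])
```

For the real files, a call such as `load_dblp("./input/training.txt")` would apply, provided the no-header assumption holds.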

Dependencies:

python3

scikit-learn (imported as sklearn)

pandas

numpy

pickle (Python standard library)

Pipeline:

mkdir input # Create input directory

<Download the training and validation datasets from the links above and move them into the input directory>

python3 ./src/clean_data.py --input ./input/training.txt --output ./input/cleaned_training.txt

python3 ./src/clean_data.py --input ./input/validation.txt --output ./input/cleaned_validation.txt

python3 ./src/create_data_example.py --train ./input/cleaned_training.txt --validation ./input/cleaned_validation.txt

python3 ./src/train_classifier.py --train ./input/cleaned_training.txt --validation ./input/cleaned_validation.txt

Default Configuration:

bag-of-words dimension: 3000

classifier: sklearn SGDClassifier (default)
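
The default configuration above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's `train_classifier.py`: the toy titles, cited-venue strings, and variable names are made up, and stacking two `CountVectorizer` outputs with `scipy.sparse.hstack` is an assumed way of combining title and cited-venue features. Only the 3000-term vocabulary limit and the default `SGDClassifier` come from the settings listed above.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier

# Made-up toy data; the real inputs would come from the cleaned dataset files.
titles = [
    "learning graph kernels for classification",
    "query optimization in relational databases",
    "deep networks for image classification",
    "indexing strategies for large databases",
]
cited_venues = ["ICML NIPS", "VLDB SIGMOD", "NIPS ICML", "SIGMOD VLDB"]
venues = ["ICML", "VLDB", "ICML", "VLDB"]

# Bag-of-words over titles, capped at 3000 terms as in the default configuration.
title_vec = CountVectorizer(max_features=3000)
# Assumption: cited venues are encoded as a second bag-of-words over venue names.
venue_vec = CountVectorizer()

X_title = title_vec.fit_transform(titles)
X_venue = venue_vec.fit_transform(cited_venues)

# Concatenate both sparse feature blocks column-wise.
X = hstack([X_title, X_venue]).tocsr()

# SGDClassifier with default hyperparameters; random_state only fixes shuffling.
clf = SGDClassifier(random_state=0)
clf.fit(X, venues)
pred = clf.predict(X)
print(pred)
```

Dropping `X_venue` from the `hstack` call gives the title-only variant of the features.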

Results (on the validation dataset):

| Feature | F1-micro | F1-macro | Accuracy |
|---|---|---|---|
| title info. | 0.266 | 0.172 | 0.267 |
| title + cited_venue info. | 0.982 | 0.758 | 0.981 |
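
The three reported metrics can be computed with `sklearn.metrics` as sketched below. The label lists are made up for illustration and are unrelated to the numbers above; whether the repository computes its scores exactly this way is an assumption.

```python
from sklearn.metrics import accuracy_score, f1_score

# Made-up true and predicted venues (not results from this project).
y_true = ["ICML", "ICML", "ICML", "ICML", "VLDB", "VLDB", "KDD"]
y_pred = ["ICML", "ICML", "ICML", "ICML", "VLDB", "ICML", "KDD"]

# Micro-averaging pools all decisions; macro-averaging weights each venue equally.
micro = f1_score(y_true, y_pred, average="micro")
macro = f1_score(y_true, y_pred, average="macro")
acc = accuracy_score(y_true, y_pred)

print(f"F1-micro={micro:.3f} F1-macro={macro:.3f} Accuracy={acc:.3f}")
```

For single-label predictions scored over all labels, micro-averaged F1 coincides with accuracy, while macro-averaged F1 drops when rare venues are predicted poorly.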