jc-healy/text_evaluation
A data science cookie cutter repo for doing text evaluation.
text evaluation
evaluation of text embedding techniques
GETTING STARTED
- Create and switch to the virtual environment:
cd text_evaluation
make create_environment
conda activate text_evaluation
make requirements
- Explore the notebooks in the `notebooks` directory
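
As a starting point for exploring, the notebooks can be listed from Python. This is a minimal sketch, assuming it is run from the repository root; the helper name `list_notebooks` is hypothetical and not part of this repo.

```python
from pathlib import Path


def list_notebooks(notebook_dir="notebooks"):
    """Return the names of the Jupyter notebooks in notebook_dir, sorted by file name."""
    directory = Path(notebook_dir)
    if not directory.is_dir():
        # Nothing to list if the directory does not exist.
        return []
    # Only top-level *.ipynb files are matched; subdirectories are not searched.
    return sorted(p.name for p in directory.glob("*.ipynb"))
```

Calling `list_notebooks()` returns a plain list of file names. Sorting is lexicographic, so a notebook numbered `10.0` would sort before one numbered `2.0`.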
Project Organization
- `LICENSE`
- `Makefile`: top-level makefile. Type `make` for a list of valid commands.
- `README.md`: this file.
- `data`: Data directory, often symlinked to a filesystem with lots of space.
  - `data/raw`: Raw (immutable) hash-verified downloads.
  - `data/interim`: Extracted and interim data representations.
  - `data/processed`: The final, canonical data sets for modeling.
- `docs`: A default Sphinx project; see sphinx-doc.org for details.
- `models`: Trained and serialized models, model predictions, or model summaries.
  - `models/trained`: Trained models.
  - `models/output`: Predictions and transformations from the trained models.
- `notebooks`: Jupyter notebooks. Naming convention is a number (for ordering), the creator's initials, and a short hyphen-delimited description, e.g. `1.0-jqp-initial-data-exploration`.
- `references`: Data dictionaries, manuals, and all other explanatory materials.
- `reports`: Generated analysis as HTML, PDF, LaTeX, etc.
  - `reports/figures`: Generated graphics and figures to be used in reporting.
  - `reports/tables`: Generated data tables to be used in reporting.
  - `reports/summary`: Generated summary information to be used in reporting.
- `requirements.txt`: (if using pip+virtualenv) The requirements file for reproducing the analysis environment, e.g. generated with `pip freeze > requirements.txt`.
- `environment.yml`: (if using conda) The YAML file for reproducing the analysis environment.
- `setup.py`: Turns the contents of `src` into a pip-installable Python module (`pip install -e .`) so it can be imported in Python code.
- `src`: Source code for use in this project.
  - `src/__init__.py`: Makes `src` a Python module.
  - `src/data`: Scripts to fetch or generate data. In particular:
    - `src/data/make_dataset.py`: Run with `python -m src.data.make_dataset fetch` or `python -m src.data.make_dataset process`.
  - `src/analysis`: Scripts to turn datasets into output products.
  - `src/models`: Scripts to train models and then use trained models to make predictions, e.g. `predict_model.py`, `train_model.py`.
- `tox.ini`: tox file with settings for running tox; see tox.testrun.org.
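
The notebook naming convention described above (a number, the creator's initials, and a hyphen-delimited description) can be checked mechanically. The following is a rough sketch only: the function `parse_notebook_name` and the exact regular expression are assumptions made for illustration, not code from this repository, and the pattern may be stricter or looser than what the project intends.

```python
import re

# Assumed pattern: dotted number, letters-only initials, then a
# hyphen-delimited description made of letters and digits.
NOTEBOOK_NAME = re.compile(
    r"^(?P<number>\d+(?:\.\d+)*)"
    r"-(?P<initials>[A-Za-z]+)"
    r"-(?P<description>[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*)$"
)


def parse_notebook_name(name):
    """Split a notebook name into its parts, or return None if it does not match."""
    # Accept names with or without the .ipynb extension.
    stem = name[: -len(".ipynb")] if name.endswith(".ipynb") else name
    match = NOTEBOOK_NAME.match(stem)
    return match.groupdict() if match else None
```

For the example name from the list, `parse_notebook_name("1.0-jqp-initial-data-exploration")` yields the number `1.0`, the initials `jqp`, and the description `initial-data-exploration`. A name that lacks the initials or the number returns `None`.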
This project was built using cookiecutter-easydata, an experimental fork of [cookiecutter-data-science](https://github.com/drivendata/cookiecutter-data-science) aimed at making your data science workflow reproducible.
Languages: Jupyter Notebook 81.8%, Python 16.9%, Makefile 1.3%
MIT License
Created February 26, 2019
Updated November 15, 2019