
markiskorova/Machine-Learning-NLP-Predict-Author

🧠 Machine Learning & Natural Language Processing: Predict the author of literary text snippets. Built with TensorFlow and Keras, this project trains an LSTM model on classic literature to identify writing style and authorship.

🧠 Machine Learning & NLP: Predicting Authors from Classic Literature

This project uses machine learning and natural language processing (NLP) to analyze classic literary works and predict the author of a given phrase. By learning textual patterns and stylistic nuances from labeled excerpts, the model attributes each snippet to its most likely author.

📚 Overview

  • Objective: Develop a model that can predict the author of a text snippet from classic literature.
  • Techniques Used:
    • Text vectorization and tokenization
    • Sequential modeling with LSTM (Long Short-Term Memory) networks
  • Tools & Libraries:
    • Python
    • TensorFlow & Keras
    • Pandas & NumPy
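
To make the text vectorization and tokenization step concrete, here is a minimal, library-free sketch. It is not taken from the project's script, which may well rely on Keras utilities instead. The function names (`tokenize`, `build_vocab`, `vectorize`), the reserved ids 0 (padding) and 1 (unknown word), and the sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter

PAD_ID = 0  # assumption: id 0 is reserved for padding
OOV_ID = 1  # assumption: id 1 stands for out-of-vocabulary words


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(texts, max_words=10000):
    """Map the most frequent words to integer ids, starting at 2."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    return {word: i + 2 for i, (word, _) in enumerate(counts.most_common(max_words))}


def vectorize(text, vocab, max_len=8):
    """Convert text to a fixed-length list of ids, truncating or right-padding."""
    ids = [vocab.get(tok, OOV_ID) for tok in tokenize(text)][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


# Hypothetical snippets, not rows from Text_Author.csv.
samples = ["It was the best of times", "Call me Ishmael"]
vocab = build_vocab(samples)
vector = vectorize("It was the worst of times", vocab)
print(vector)
```

Every snippet ends up as a list of the same length, which is the shape an embedding layer followed by an LSTM expects. Words that were not seen while building the vocabulary ("worst" here) collapse to the unknown-word id.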

📁 Repository Structure

  • Text_Author.csv: Dataset containing text excerpts and corresponding author labels.
  • text-analysis-detect-author-seq-lstm.py: Python script for data preprocessing, model training, and evaluation.
  • README.md: Project documentation.
  • LICENSE: MIT License.

🚀 Getting Started

Prerequisites

Ensure you have the following installed:

  • Python 3.x
  • pip (Python package installer)

Installation

  1. Clone the repository:

    git clone https://github.com/markiskorova/Machine-Learning-NLP-Predict-Author.git
    cd Machine-Learning-NLP-Predict-Author
  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install required packages:

    pip install tensorflow pandas numpy

Running the Model

Execute the script to train and evaluate the model:

    python text-analysis-detect-author-seq-lstm.py

The script will process the data, train the LSTM model, and output evaluation metrics.

📊 Dataset Details

  • Source: Curated collection of classic literary texts.
  • Format: CSV file with two columns:
    • text: Excerpt from a literary work.
    • author: Name of the author.
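
As a rough illustration of the two-column format described above, the sketch below loads a tiny in-memory sample with pandas and turns author names into integer class ids. The sample rows are hypothetical stand-ins, not rows from `Text_Author.csv`, and the use of `pd.factorize` for label encoding is an assumption rather than the script's confirmed approach.

```python
import io

import pandas as pd

# Hypothetical sample in the documented format; the real data lives in Text_Author.csv.
sample_csv = io.StringIO(
    "text,author\n"
    '"It was the best of times, it was the worst of times",Charles Dickens\n'
    '"Call me Ishmael.",Herman Melville\n'
    '"It is a far, far better thing that I do",Charles Dickens\n'
)

# For the real file this would be: pd.read_csv("Text_Author.csv")
df = pd.read_csv(sample_csv)

# Encode each author name as an integer class id for training.
labels, authors = pd.factorize(df["author"])

print(df.shape)
print(list(authors))
print(labels.tolist())
```

Keeping the `authors` index around makes it possible to map a predicted class id back to an author name later.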

🔍 Model Architecture

  • Embedding Layer: Converts words into vector representations.
  • LSTM Layer: Captures sequential dependencies in the text.
  • Dense Output Layer: Outputs probabilities for each author class.
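
The three layers listed above could be assembled in Keras roughly as follows. This is a sketch under stated assumptions, not the project's actual model: the vocabulary size, sequence length, layer widths, number of authors, optimizer, and loss are all illustrative values chosen here.

```python
import tensorflow as tf
from tensorflow.keras import layers

# All sizes below are illustrative assumptions, not values from the project's script.
VOCAB_SIZE = 10000  # number of distinct token ids
MAX_LEN = 50        # tokens per padded snippet
NUM_AUTHORS = 3     # number of author classes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 64),                 # token ids -> dense vectors
    layers.LSTM(64),                                  # summarizes the token sequence
    layers.Dense(NUM_AUTHORS, activation="softmax"),  # one probability per author
])

# Assumes integer author labels, hence the sparse variant of cross-entropy.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()
```

The softmax output has one entry per author, so the predicted author for a snippet is the class with the highest probability.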

📈 Evaluation Metrics

  • Accuracy: Measures the proportion of correct predictions.
  • Loss: Evaluates the model's prediction error.
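
To show what these two numbers measure, here is a small plain-Python sketch. It assumes the loss is categorical cross-entropy, which is a common choice for multi-class classification but is not confirmed by this README. The labels and probabilities are made-up values, and the helper names are hypothetical.

```python
import math


def accuracy(y_true, y_prob):
    """Fraction of samples whose highest-probability class matches the label."""
    correct = sum(
        1
        for label, probs in zip(y_true, y_prob)
        if max(range(len(probs)), key=probs.__getitem__) == label
    )
    return correct / len(y_true)


def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    total = sum(math.log(max(probs[label], eps)) for label, probs in zip(y_true, y_prob))
    return -total / len(y_true)


# Made-up labels and predicted probabilities for three author classes.
y_true = [0, 1, 2, 1]
y_prob = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.5, 0.3],
]

acc = accuracy(y_true, y_prob)
loss = cross_entropy(y_true, y_prob)
print(f"accuracy={acc:.2f} loss={loss:.4f}")
```

Accuracy only counts whether the top prediction is right, while the loss also penalizes low confidence in the correct author, so the two can move differently during training.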

🛠️ Future Enhancements

  • Incorporate more diverse literary works to improve model generalization.
  • Experiment with advanced architectures like Bidirectional LSTMs or Transformers.
  • Implement a user interface for interactive author prediction.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please fork the repository and submit a pull request for any enhancements or bug fixes.

📬 Contact

For questions or suggestions, feel free to open an issue or contact the repository maintainer.