meemanali/Saraiki-Poetry-Sentiment-Analysis-NLP
This project explores sentiment analysis of Saraiki poetry, a low-resource regional language, using classical NLP preprocessing and a lightweight deep learning model.
Saraiki Poetry Sentiment Analysis (NLP)
Overview
This project explores sentiment analysis of Saraiki poetry, a low-resource regional language, using classical NLP preprocessing and a lightweight deep learning model.
The focus is on:
- Manual data collection and labeling
- Emotion normalization and sentiment abstraction
- End-to-end NLP pipeline using Python and TensorFlow
- Preparing the model for potential mobile (Android) deployment via TensorFlow Lite
Dataset
- Language: Saraiki
- Domain: Poetry
- Total samples: 958 (full dataset kept private)
- Public release: Partial sample only
Annotation Strategy
Each poem was manually labeled with an emotion:
- happy, sad, surprise, fear, anger, disgust
Emotions were mapped to binary sentiment:
- Positive (p): happy, surprise
- Negative (n): sad, fear, anger, disgust
Only a subset of the dataset is uploaded to illustrate:
- Data format
- Labeling approach
- Reproducibility
- The complete dataset is retained for possible future academic use.
Data Cleaning & Analysis
Before training, the dataset was cleaned and analyzed to ensure consistency:
- Normalized inconsistent emotion labels (case differences, spelling variants)
- Verified emotion and sentiment distributions
- Analyzed emotion-to-sentiment relationships
Dataset Analysis (Visual)
The following visualizations summarize the dataset structure:
- Emotion distribution:
- Sentiment (positive / negative) distribution:
- Emotion → sentiment mapping:
Methodology
Preprocessing
- Data loaded from Excel files using Pandas
- Text tokenized using Keras Tokenizer
- Sequences padded/truncated to fixed length
- Out-of-vocabulary handling enabled
Model
A lightweight neural network was used due to dataset size:
- Embedding layer
- Global Average Pooling
- Dense (ReLU)
- Output layer (Sigmoid)
The model performs binary sentiment classification, not multi-class emotion prediction.
Training
- Train / test split: 80% / 20%
- Loss function: Binary Crossentropy
- Optimizer: Adam
- Epochs: 30
The model learns general sentiment patterns from poetic text rather than fine-grained emotional nuance.
Mobile Deployment
After training, the model is converted to TensorFlow Lite (TFLite).
This enables:
- On-device inference
- Low latency predictions
- Future integration with Android applications
Limitations
- Small dataset size
- Possible class imbalance
- Binary sentiment only
- Simple model architecture
Results should be interpreted as exploratory and baseline-level, not production-ready.
Technologies Used
- Python
- Pandas
- Matplotlib
- NLTK
- TensorFlow / Keras
- TensorFlow Lite
Note
This project emphasizes data preparation, language challenges, and applied ML workflow in a low-resource setting rather than optimized model performance.


