Comment Sentiment Analysis using Deep Learning
Comment-Sentiment-Analysis
Author : Minku Koo
Project Period : Dec/2020 ~ Jan/2021
Contact : corleone@kakao.com
Main Library : tensorflow, keras, KoNLPy
Keyword : "Sentiment Analysis", "Machine Learning", "Korean", "Deep Learning"
Table of Contents
- Introduction
- Data Scraping
- Data Labeling
- Data Preprocessing
- Build Deep Learning Network
- Predict Data Sentiments
- Result
1. Scraping Comment Data
- Python Crawler : ./python-code/comment_crawling.py
- Target Sites : Naver and Daum news comments
- Scraped Data : Comment, Reply, Article Date (+ Title, Content)
- News Search Keywords : "기독교" (Christianity), "불교" (Buddhism), "천주교" (Catholicism), "신천지" (Shincheonji), "종교" (religion)
- Data Storage : Database (MariaDB)
- Database export to text files - path : ./comment/raw-comment/
Scraping Period per Religion (dates in YY.MM.DD)
| Search keyword | Collection start | Reference date | Collection end |
|---|---|---|---|
| 신천지 (Shincheonji) | 19.09.17 | 20.02.17 | 20.07.18 |
| 기독교 (Christianity) | 19.08.20 | 20.01.20 | 20.10.20 |
| 천주교 (Catholicism) | 19.08.20 | 20.01.20 | 20.08.20 |
| 불교 (Buddhism) | 19.08.20 | 20.01.20 | 20.08.20 |
| 종교 (religion) | 19.08.20 | 20.01.20 | 20.10.10 |
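The period table can be turned into a small lookup that sorts an article date into the "before" or "after" window. This is a minimal sketch, not code from the repository: `PERIODS`, `parse_yymmdd`, and `classify_period` are hypothetical names, the keys are the Korean search keywords (Shincheonji, Christianity, Catholicism, Buddhism, religion), and treating the reference date itself as "after" is an assumption.

```python
from datetime import date, datetime

# Dates copied from the period table (YY.MM.DD):
# (collection start, reference date, collection end)
PERIODS = {
    "신천지": ("19.09.17", "20.02.17", "20.07.18"),
    "기독교": ("19.08.20", "20.01.20", "20.10.20"),
    "천주교": ("19.08.20", "20.01.20", "20.08.20"),
    "불교": ("19.08.20", "20.01.20", "20.08.20"),
    "종교": ("19.08.20", "20.01.20", "20.10.10"),
}


def parse_yymmdd(text: str) -> date:
    """Parse a 'YY.MM.DD' string such as '19.08.20' into a date."""
    return datetime.strptime(text, "%y.%m.%d").date()


def classify_period(keyword: str, article_date: date) -> str:
    """Return 'before', 'after', or 'out of range' for one article date."""
    start, reference, end = (parse_yymmdd(d) for d in PERIODS[keyword])
    if not (start <= article_date <= end):
        return "out of range"
    # Assumption: the reference date itself belongs to the "after" window.
    return "before" if article_date < reference else "after"


print(classify_period("신천지", date(2020, 2, 16)))
```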
Scraped Data Result (before / after the reference date)
| Search keyword | Articles (before) | Comments (before) | Articles (after) | Comments (after) |
|---|---|---|---|---|
| 신천지 (Shincheonji) | 211 | 22,658 | 2,974 | 262,840 |
| 기독교 (Christianity) | 1,771 | 94,405 | 1,186 | 85,443 |
| 천주교 (Catholicism) | 1,899 | 37,010 | 1,685 | 56,881 |
| 불교 (Buddhism) | 833 | 6,465 | 420 | 7,585 |
| 종교 (religion) | 1,939 | 52,527 | 2,373 | 122,206 |
2. Labeling Comment Data
- path : ./train-data/
- Human-labeled comments (manual inspection) : ./train-data/comment-labeling.csv
- Naver Movie Review Data : naver-ratings.csv
- ( Data from Here )
3. Using KoNLPy Okt
Text Data Preprocessing
- POS-tag each comment : okt.pos(comment)
- Remove tokens tagged 'Josa' (particles), 'Punctuation', 'Number'
- Save path : ./comment/after-okt-comment/
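The filtering step can be sketched as follows. This is an illustration under assumptions, not the repository's code: `REMOVED_TAGS` and `filter_tokens` are hypothetical names, and the sample pairs are hand-written in the `(token, tag)` shape that `okt.pos` returns rather than real Okt output.

```python
# Tags listed in this section as removed after okt.pos(comment).
REMOVED_TAGS = {"Josa", "Punctuation", "Number"}


def filter_tokens(tagged):
    """Keep the token of every (token, tag) pair whose tag is not removed."""
    return [token for token, tag in tagged if tag not in REMOVED_TAGS]


# Hand-written pairs shaped like okt.pos output (illustrative only).
sample = [
    ("영화", "Noun"),
    ("가", "Josa"),
    ("좋다", "Adjective"),
    ("!", "Punctuation"),
    ("10", "Number"),
    ("점", "Noun"),
]
cleaned = filter_tokens(sample)
print(cleaned)

# With KoNLPy installed (it needs a Java runtime), the pairs would come from:
#   from konlpy.tag import Okt
#   tagged = Okt().pos(comment)
```

How the remaining tokens are joined before being written to ./comment/after-okt-comment/ is not stated here, so the sketch stops at the token list.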
4. Build Deep Learning Network using Keras
- Python File Name : ./python-code/make_rnn_model.py
- Train Data path : ./train-data/
- Crawled Comment + Naver Movie Review => Transfer Learning
- Comment text is converted to vectors (using TextVectorization)
- Training Accuracy : 0.95
- Validation Accuracy : 0.83
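To make the vectorization step concrete, here is a pure-Python sketch of the idea behind integer text vectorization: build a frequency-ordered vocabulary, map tokens to integer ids, and pad to a fixed length. It is not the Keras `TextVectorization` layer and not the repository's model code. `build_vocab` and `vectorize` are hypothetical names; reserving index 0 for padding and index 1 for out-of-vocabulary tokens follows the convention of Keras's integer output mode, and the alphabetical tie-break is this sketch's own choice.

```python
from collections import Counter

PAD, OOV = 0, 1  # 0 = padding, 1 = out-of-vocabulary token


def build_vocab(token_lists, max_tokens=None):
    """Map each token to an integer id, most frequent tokens first."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    # Ties are broken alphabetically so the result is reproducible.
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    if max_tokens is not None:
        # Two ids are already taken by PAD and OOV.
        ordered = ordered[: max(max_tokens - 2, 0)]
    return {tok: i + 2 for i, tok in enumerate(ordered)}


def vectorize(tokens, vocab, length):
    """Convert tokens to ids, truncate to `length`, then right-pad with PAD."""
    ids = [vocab.get(tok, OOV) for tok in tokens][:length]
    return ids + [PAD] * (length - len(ids))


train = [["영화", "좋다"], ["영화", "별로"]]
vocab = build_vocab(train)
print(vectorize(["영화", "좋다", "최고"], vocab, 5))
```

Unknown tokens such as "최고" in the usage line map to the out-of-vocabulary id, which is how a model trained on one corpus can still accept comments containing unseen words.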
5. Predict Sentiments Value
- Make json file -> dict[date][article] = [[comment list],[]]
- Label every comment using the deep learning model
- Update json file -> dict[date][article] = [[comment list],[sentiment value list]] (path: ./comment/json-okt-comment)
- Calculate sentiment value per date
  - Article sentiment : weighted average (weight = article comment count / date comment count)
  - Date sentiment : using IMDb's rating system
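One possible reading of the aggregation steps above is sketched below; the exact computation in the repository is not stated in this section. The function names are hypothetical. The "IMDb rating system" is taken to mean the weighted rating formula IMDb formerly published for its Top 250 chart, WR = (v / (v + m)) * R + (m / (v + m)) * C, and mapping R to a date's weighted average, v to its comment count, C to an overall mean, and m to a minimum-count threshold is an assumption.

```python
def article_mean(scores):
    """Mean sentiment of one article's comments."""
    return sum(scores) / len(scores)


def date_weighted_average(articles):
    """Weighted average over one date's articles.

    `articles` has the shape dict[article] = [[comment list], [sentiment value list]].
    Each article's weight is (article comment count / date comment count).
    """
    total = sum(len(scores) for _, scores in articles.values())
    if total == 0:
        raise ValueError("no scored comments for this date")
    return sum(
        article_mean(scores) * len(scores) / total
        for _, scores in articles.values()
        if scores
    )


def imdb_weighted_rating(R, v, C, m):
    """IMDb-style weighted rating: shrink R toward C when v is small."""
    return (v / (v + m)) * R + (m / (v + m)) * C


# Made-up example data in the json layout described above.
data = {
    "2020-01-20": {
        "article-a": [["c1", "c2", "c3"], [1.0, 0.0, 1.0]],
        "article-b": [["c4"], [0.0]],
    }
}
day = data["2020-01-20"]
R = date_weighted_average(day)
v = sum(len(scores) for _, scores in day.values())
# C (overall mean) and m (threshold) are hypothetical values for illustration.
print(R, imdb_weighted_rating(R, v, C=0.3, m=4))
```

The shrinkage term keeps dates with very few comments from swinging to extreme values, which is the usual motivation for this formula.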
6. RESULT (Make Graph)
Average, Standard Deviation / Religion (before / after the reference date)
| Search keyword | Mean (before) | Std. dev. (before) | Mean (after) | Std. dev. (after) |
|---|---|---|---|---|
| 신천지 (Shincheonji) | 0.381 | 0.412 | 0.313 | 0.388 |
| 기독교 (Christianity) | 0.310 | 0.372 | 0.276 | 0.371 |
| 천주교 (Catholicism) | 0.375 | 0.405 | 0.284 | 0.377 |
| 불교 (Buddhism) | 0.356 | 0.392 | 0.272 | 0.369 |
| 종교 (religion) | 0.313 | 0.376 | 0.271 | 0.367 |
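Summary statistics of this kind can be computed from a list of sentiment values with the standard library. This is a sketch only: `summarize` is a hypothetical name, the input values are made up, and whether the table uses the population or the sample standard deviation is not stated, so the population form is assumed here.

```python
from statistics import mean, pstdev


def summarize(scores):
    """Return (mean, population standard deviation), rounded to 3 places."""
    return round(mean(scores), 3), round(pstdev(scores), 3)


# Made-up sentiment values in [0, 1], not data from the project.
print(summarize([1.0, 0.0, 0.0, 1.0]))
```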
Sentiment Average Bar Graph / Religion
(path : ./result-graph/emotion-average-stick/)

Sentiment time flow graph
(path : ./result-graph/emotion-flow/)
- Before COVID19 : green
- After COVID19 : red
- y axis
  - close to 1 : Positive
  - close to 0 : Negative
천주교 (Catholicism)
All Comment Count per Month / Religion
(path : ./result-graph/comment-count/)

WordCloud / Religion
(path : ./result-graph/word-cloud/)
Before COVID19, 기독교 (Christianity)

After COVID19, 기독교 (Christianity)

Top 30 Words / Religion
(path : ./result-graph/word-cloud/)
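A top-N word list like this can be derived from tokenized comments with `collections.Counter`. This is a minimal sketch with a hypothetical `top_words` helper and made-up tokens; the repository's own counting and plotting code is not shown in this README.

```python
from collections import Counter


def top_words(token_lists, n=30):
    """Return the n most frequent tokens as (token, count) pairs."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    return counts.most_common(n)


# Made-up tokenized comments for illustration.
comments = [["교회", "예배"], ["교회", "코로나"], ["교회", "코로나"]]
print(top_words(comments, n=2))
```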


