sertacafsari/disinformation-analysis-slm
Source Code for the Bachelor's Project: Hybrid Small Language Models for Accurate Multimodal Disinformation and Misinformation Analysis
This repository contains the source code for the Bachelor's project titled "Hybrid Small Language Models for Accurate Multimodal Disinformation and Misinformation Analysis", authored by Sertac Bahadir Afsari and supervised by Jiapan Guo. The thesis is presented to the Faculty of Science and Engineering at the University of Groningen for the degree of Bachelor of Science in Computing Science.
Abstract of the Thesis
Disinformation and misinformation are critical challenges in today's world, as they influence public opinion and threaten democratic institutions and good governance. With recent advancements in language models, their ability to analyze information and provide insights has improved significantly. These enhanced capabilities can be used to detect disinformation and misinformation on social media and in news sources, reducing their influence on the public. In this study, the accuracy, effectiveness, and capabilities of Small Language Models (SLMs) and small Vision-Language Models (VLMs) for detecting and classifying disinformation and misinformation in both textual and multimodal contexts were evaluated. To this end, several fine-tuning experiments were conducted using the LIAR2, Fake News, and Fauxtography datasets. The findings show that SLMs and small VLMs can accurately detect and classify disinformation and misinformation. Furthermore, they may be a more efficient alternative to the base LLMs used for this task.
Setting Up the Environment
This project requires a Python virtual environment in which to run the code and install the necessary libraries. After creating the virtual environment (for example with python3 -m venv .venv), activate it with the following command:
source .venv/bin/activate
Then install all required libraries with:
pip install -r requirements.txt
Setting Environment Variables
To run the experiments and reproduce the results, a Hugging Face access token and a Weights & Biases API key must be provided as environment variables. Once you have them, create a .env file containing:
HF_TOKEN = "<YOUR_HUGGINGFACE_TOKEN>"
WB_TOKEN = "<YOUR_WANDB_TOKEN>"
An example is also provided in the .env.example file.
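The tokens in the .env file have to be read into the process environment before training starts. As a minimal sketch, assuming the file contains only simple KEY = "value" lines like the ones above, the hypothetical helpers parse_env_file and load_tokens below (not part of the repository, which may instead rely on a package such as python-dotenv) read the file with the standard library, reject missing values or leftover placeholders, and export the tokens.

```python
import os
from pathlib import Path

# Hypothetical helper module: the repository may load these variables differently.
REQUIRED_KEYS = ("HF_TOKEN", "WB_TOKEN")


def parse_env_file(path):
    """Parse simple KEY = "value" lines from a .env file into a dict."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def is_placeholder(value):
    """Treat empty values and untouched <...> placeholders as unset."""
    return not value or (value.startswith("<") and value.endswith(">"))


def load_tokens(path=".env"):
    """Read the required tokens, fail early if any is unset, export them."""
    values = parse_env_file(path)
    missing = [k for k in REQUIRED_KEYS if is_placeholder(values.get(k, ""))]
    if missing:
        raise KeyError(f"unset in {path}: {', '.join(missing)}")
    for key in REQUIRED_KEYS:
        # Keep any value that was already exported in the shell.
        os.environ.setdefault(key, values[key])
    # Return the values as read from the file.
    return {k: values[k] for k in REQUIRED_KEYS}
```

Calling load_tokens() at the top of a training script makes HF_TOKEN visible to huggingface_hub, which reads that variable from the environment. Weights & Biases reads WANDB_API_KEY by default, so a key stored as WB_TOKEN would need to be passed explicitly, for example with wandb.login(key=os.environ["WB_TOKEN"]).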
Running the Experiments
Experiments can be conducted using the provided scripts on Habrok, the high-performance computing cluster of the University of Groningen.
Ownership & Contact
Author: Sertac Bahadir Afsari
Email: s.b.afsari@student.rug.nl