Perspectief LLM + RAG Project

End‑to‑end pipeline that turns a public website into a question‑answering assistant powered by a fine‑tuned Llama‑3.1 8B Instruct model, a FAISS knowledge base and a lightweight RAG wrapper.

Hugging Face: https://huggingface.co/ReinoutW/perspectief-llama-3-8b-dutch-qa-lora-v1

🗺️ Repository structure

.
├── Scrape-website/                  # website crawler → JSONL dataset
│   └── scraper.py
├── training/                 # fine‑tuning assets
│   ├── perspectief_training.py
├── perspectief-llama3-rag-website/  # retrieval‑augmented chat
│   ├── build_kb.py
│   └── rag_chat.py
└── README.md

0 . Prerequisites

Tool	Version tested
Python	3.10 – 3.12
CUDA	12.x (optional, CPU works but is slow)
PyTorch	2.2+
Unsloth	2025.5
Transformers	4.41
Sentence‑Transformers	2.6
FAISS	1.7

Install everything in one go:

pip install -r requirements.txt     # provided

GPU note: the training script uses 4‑bit QLoRA and runs comfortably on a single RTX 4090 or A6000 (≈ 24 GB vRAM).

1 . Scrape the website

python scraper/crawl_perspectief.py \
       --base https://perspectief.eu/ \
       --out  data/perspectief_raw.json

The crawler:

follows internal links only
extracts URL, title, main content
skips duplicates & junk pages

2 . Convert raw pages → fine‑tune dataset

The helper scraper/dataset-processor.py reads the raw pages and heuristically derives Question / Answer pairs (H2 → Q, paragraph → A). The resulting JSON‑Lines file uses the schema your model expects:

{
  "URL": "…",
  "Question": "What is the total cost for the 4‑hour plenary session, and is there a discount for colleagues?",
  "Answer":   "The plenary session costs €595 (excluding VAT)…",
  "DetailedContext": "The €595 fee applies…"
}

Run it:

python scraper/dataset-processor.py \
       --in  data/perspectief_raw.json \
       --out data/perspectief_dataset-1.jsonl

💡 Manual pass recommended: skim the JSONL and fix typos—garbage in, garbage out!

3 . Fine‑tune Llama‑3 with Unsloth

The training script below (already in training/) applies 4‑bit QLoRA to Meta‑Llama‑3.1‑8B‑Instruct. It formats every row as:

<|user|>
{Question}

Context:
{DetailedContext}
<|assistant|>
{Answer}<|end|>

3.1 Launch training

python training/perspectief_training.py \
  --output_dir training/checkpoints/perspectief-llama3 \
  --epochs 6 \
  --batch  2 \
  --lr     2e-5

Early stopping isn’t enabled; monitor loss and abort if it plateaus.
The script automatically splits 10 % for validation.

3.2 Result

training/checkpoints/perspectief-llama3/
 ├─ adapter_model.safetensors   # LoRA weights
 ├─ adapter_config.json
 ├─ tokenizer.json / tokenizer_config.json
 └─ …

4 . Build the knowledge base (FAISS)

python rag/build_kb.py \
       --data      data/perspectief_dataset-1.jsonl \
       --index_dir kb

Uses all‑MiniLM‑L6‑v2 embeddings (swapable via --embedder).
Normalises embeddings so IndexFlatIP ≈ cosine similarity.

Output:

kb/
 ├─ faiss.index   # vectors
 └─ meta.pkl      # original rows, aligned with vectors

5 . Run Retrieval‑Augmented Generation

python rag/rag_chat.py \
       --question "" \
       --top_k    3

Under the hood:

Embed the question → cosine search in FAISS.
Assemble prompt in the exact fine‑tune template (user/context/assistant).
Generate with the adapter on top of the base model.

Typical output:

Context passages used:
--------------------------------------------------------------------------------
The €595 fee applies to each primary participant…
[source] https://...
--------------------------------------------------------------------------------
A: De plenaire sessie kost €595 (excl. btw) en een tweede collega krijgt 40 % korting…

6 . Evaluation & QA checks

Check	Command
Retriever sanity	`python rag/quick_test.py --question "…"`
Exact‑match score	see `evaluation/simple_em.py`
Live chat	integrate `rag_chat.py` as a FastAPI endpoint

7 . Troubleshooting

FAISS `AttributeError: numpy_array_to_float_array`

You’re on FAISS ≥ 1.7. The fixed build_kb.py already passes a plain np.ndarray—make sure you pulled latest.

CUDA OOM during training

Reduce --batch or enable gradient‑checkpointing (already on).
Fit can be resumed from the last checkpoint.

Model ignores retrieved context

Increase --top_k in rag_chat.py.
Split very long DetailedContext strings (< 512 tokens ideal).

ReinoutWW/llm-finetuned-perspectief

Perspectief LLM + RAG Project

🗺️ Repository structure

0 . Prerequisites

1 . Scrape the website

2 . Convert raw pages → fine‑tune dataset

3 . Fine‑tune Llama‑3 with Unsloth

3.1 Launch training

3.2 Result

4 . Build the knowledge base (FAISS)

5 . Run Retrieval‑Augmented Generation

6 . Evaluation & QA checks

7 . Troubleshooting

FAISS `AttributeError: numpy_array_to_float_array`

CUDA OOM during training

Model ignores retrieved context

On this page

Languages

Contributors

ReinoutWW/llm-finetuned-perspectief

Perspectief LLM + RAG Project

🗺️ Repository structure

0 . Prerequisites

1 . Scrape the website

2 . Convert raw pages → fine‑tune dataset

3 . Fine‑tune Llama‑3 with Unsloth

3.1 Launch training

3.2 Result

4 . Build the knowledge base (FAISS)

5 . Run Retrieval‑Augmented Generation

6 . Evaluation & QA checks

7 . Troubleshooting

FAISS AttributeError: numpy_array_to_float_array

CUDA OOM during training

Model ignores retrieved context

On this page

Languages

Contributors

Perspectief LLM + RAG Project

FAISS `AttributeError: numpy_array_to_float_array`