YounesBB/kassalapp-rag

AI-powered Norwegian grocery shopping assistant using RAG (Pinecone + Llama 3.3). Find prices, store locations, and get shopping advice across Kiwi, Meny, Rema 1000, and more. Deployed on Hugging Face Spaces.

# 🛒 Kassalapp Assistant - RAG-Powered Grocery Shopping Helper

Open in Spaces

> [!TIP]
> 🚀 **Try it LIVE:** Click the Hugging Face button above to launch the application!
> Please note: if the Space is sleeping, it may take up to 30 seconds to start.

An intelligent Norwegian grocery shopping assistant built with Groq (Llama 3.3) and the Kassalapp API, designed for deployment on Hugging Face Spaces.


## ✨ Key Features

- 🌐 **Cloud-Hybrid RAG:** Combines static domain-expert knowledge with live market data for accurate shopping assistance.
- 🧠 **Extensible Knowledge Base:** Easily add your own data (loyalty programs, store guides) by dropping files into a folder.
- ⚡ **Inference Engine:** Powered by Groq and Llama 3.3 for responsive conversational AI.
- 🛠️ **Real-Time Data:** Dynamic tool calling against the Kassalapp API for live grocery price, product, and store information.
- 🛡️ **Universal Secrets:** Seamless switching between local `.env` and cloud `st.secrets` environments.

## 💡 Motivation & Inspiration

I built this project to get a general understanding of how to develop RAG (Retrieval-Augmented Generation) systems and the tools that exist for it.

The spark for using the Kassalapp API came from a 2023 Kode24 article about Helge, who built a price tracker for groceries. I thought it was a very cool project, and I'd always wanted to make something with that API.

When I started learning about RAG and LLMs, I realized this was the perfect opportunity to "kill two birds with one stone". While this solution is not perfect, it works well for common cases. It is also worth noting that AI was used as a tool during the development of this project.

โš ๏ธ Current Status & Limitations

This is an educational project. While it covers the essentials, you may encounter some limitations:

- **Prompt Sensitivity:** Tricky or advanced prompts can occasionally produce sub-optimal answers.
- **Data Availability:** The underlying API is relatively simple, and many grocery products lack price data or store-specific information.
- **Expansion Potential:** There is plenty of room to improve the system, such as fetching more data from the API (allergies, nutrition, etc.) or expanding the knowledge base further.

๐Ÿ—๏ธ Technology Choices & Trade-offs

This project evolved significantly during development. I made specific decisions to balance performance, scalability, and free-tier accessibility:

- **The transition from Gemini File Search to Groq + Pinecone**
  - My first iteration used Gemini's File Search (RAG-as-a-Service). It was an excellent "first run" because it handled chunking, embedding, and indexing automatically. However, I eventually migrated away because:
    1. **Rate limits:** The Gemini free tier had strict requests-per-day limits that I quickly exhausted. Groq (hosting Llama 3.3 70B) offered a much higher daily request volume for a capable high-reasoning model.
    2. **Cost & lock-in:** While Gemini File Search simplifies the stack, embedding generation isn't entirely free, and it locks the project into the Google ecosystem. Moving to Pinecone and local embeddings gave me more control and portability.
- **Why `all-MiniLM-L6-v2` for embeddings?**
  - This model is incredibly efficient for local development. It is tiny (approx. 23 MB), meaning it loads almost instantly and runs fast on a standard CPU. This was important for keeping the synchronization script lightweight and the project compatible with free-tier cloud hosting. More importantly, it was more than enough for the scope and complexity of this specific knowledge base.
- **Why Streamlit for the UI?**
  - I chose Streamlit because it allows rapid development of data-focused AI applications in pure Python. It handled the state management of a chat interface and tool-calling status badges with minimal boilerplate, while making deployment to Hugging Face or Streamlit Cloud seamless.

## 🧠 Architecture: Understanding "Hybrid RAG"

This project uses a Hybrid RAG (Retrieval-Augmented Generation) architecture. It merges two distinct types of data:

1. **The "Static" Semantic Layer (Pinecone Cloud)**
   - **What it is:** Expert domain knowledge that doesn't change every second (e.g. how the "Trumf" program works, return policies, or store chain history).
   - **Why we need it:** By storing these as vectors in Pinecone, the AI can "read" the official guides before answering.
   - **How to add to it:** Place any `.md` or `.txt` file into the `knowledge/` directory and run the sync script.
2. **The "Dynamic" Tool Layer (Kassalapp API)**
   - **What it is:** Real-time market data (e.g. the current price of a loaf of bread).
   - **Why we need it:** Even a 5-minute-old database is "stale" for prices. The AI uses function calling to fetch live data directly from the Kassalapp API.
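The dynamic layer hinges on a tool schema the model can emit calls against, plus a dispatcher that executes them. A minimal sketch of that wiring (the tool name `search_products` and its shape are illustrative assumptions, not the repo's actual definitions):

```python
import json

# JSON schema advertised to the LLM (Groq follows the OpenAI tool-calling format).
PRODUCT_SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search_products",
        "description": "Search the Kassalapp API for live grocery prices by product name.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string", "description": "Product name, e.g. 'melk'"}},
            "required": ["query"],
        },
    },
}

def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Execute a tool call emitted by the model and return a JSON string result."""
    args = json.loads(arguments_json)
    if name == "search_products":
        # The real app would query the Kassalapp products endpoint here; stubbed for illustration.
        return json.dumps({"query": args["query"], "results": []})
    raise ValueError(f"Unknown tool: {name}")
```

The returned JSON string is appended to the conversation as a `tool` message, so the model can ground its final answer in live data rather than guessing prices.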

Kassalapp Architecture

### Synchronizing your Knowledge

The `sync_to_pinecone.py` utility breaks your knowledge files into chunks, turns them into vector embeddings, and uploads them to your cloud database. This keeps the AI in sync with your custom documentation.
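A sketch of what that chunk-embed-upsert pipeline can look like (chunk sizes, the index name, and function names here are assumptions; see `sync_to_pinecone.py` for the real implementation):

```python
import os
from pathlib import Path

def chunk_text(text: str, max_chars: int = 800, overlap: int = 100) -> list[str]:
    """Split a document into overlapping character windows so context survives chunk borders."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

def sync(knowledge_dir: str = "knowledge", index_name: str = "kassalapp") -> None:
    # Heavy dependencies are imported lazily so chunking stays usable on its own.
    from sentence_transformers import SentenceTransformer
    from pinecone import Pinecone

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly, 384-dim vectors
    index = Pinecone(api_key=os.environ["PINECONE_API_KEY"]).Index(index_name)

    for path in Path(knowledge_dir).glob("*"):
        if path.suffix not in {".md", ".txt"}:
            continue
        chunks = chunk_text(path.read_text(encoding="utf-8"))
        vectors = model.encode(chunks).tolist()
        index.upsert(vectors=[
            {"id": f"{path.stem}-{i}", "values": v, "metadata": {"text": c, "source": path.name}}
            for i, (v, c) in enumerate(zip(vectors, chunks))
        ])
```

Storing the original chunk text in the vector metadata means retrieval can hand the matching passages straight to the LLM without a second lookup.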


## 🛠️ Tech Stack


## 🚀 Getting Started

To get the assistant up and running locally or in the cloud:

1. Refer to the Deployment Guide for step-by-step instructions.
2. Set up your environment variables via `.env` (local) or Secrets (cloud).
3. Ensure your data is synced using `sync_to_pinecone.py`.
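For step 2, a local `.env` at the project root typically looks like the following (the exact variable names are assumptions; the Deployment Guide has the authoritative list):

```ini
GROQ_API_KEY=...
PINECONE_API_KEY=...
KASSALAPP_API_KEY=...
```

On Hugging Face Spaces, the same keys go into the Space's Secrets settings instead.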

Built with ❤️ using Groq, Pinecone, Hugging Face, Streamlit, and Kassalapp.