# PRamoneda/music-crs-baselines

**Music Conversational Recommendation Challenge Baselines**

Welcome to the Music CRS Challenge! This repository provides baseline systems for building conversational music recommendation systems on the TalkPlayData-2 dataset.
## 📋 Challenge Overview

Build a conversational AI that can:

- Understand user music preferences through natural dialogue
- Recommend relevant tracks from a music catalog
- Generate engaging, personalized responses about music
## Baseline System

The baseline operates as a two-stage pipeline:

1. **RecSys**: retrieve candidate tracks matching the user's preferences
2. **LLM**: generate a natural-language response explaining the recommendations
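The two stages above compose as follows. This is a toy sketch: `retrieve_tracks`, `generate_response`, and the token-overlap scoring are illustrative stand-ins for the repo's BM25/BERT retrievers and the Llama-based generator.

```python
# Toy sketch of the two-stage pipeline (illustrative names, not the repo API).

def retrieve_tracks(query, catalog, k=5):
    """Stage 1 (RecSys): score catalog entries against the query.
    Naive token overlap stands in for BM25/BERT retrieval."""
    q_tokens = set(query.lower().split())
    scored = [
        (len(q_tokens & set(meta.lower().split())), track_id)
        for track_id, meta in catalog.items()
    ]
    scored.sort(reverse=True)
    return [track_id for score, track_id in scored[:k] if score > 0]

def generate_response(query, track_ids):
    """Stage 2 (LLM): build a response from the retrieved candidates.
    A real system would prompt the language model here."""
    return f"For {query!r}, try: {', '.join(track_ids)}."

catalog = {
    "t1": "Isabella Gregg Karukas smooth jazz piano",
    "t2": "Thunderstruck hard rock guitar",
}
hits = retrieve_tracks("I'm looking for jazz music", catalog)
print(generate_response("jazz music", hits))
```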
### Core Components

- **🤖 LLM (Language Model)**
  - Generates natural language responses
  - Model: `Llama-3.2-1B-Instruct`
  - Module: `mcrs/lm_modules`
- **🎯 RecSys (Recommendation System)**
  - Retrieves relevant tracks from the catalog
  - Methods: BM25 (sparse) or BERT (dense)
  - Module: `mcrs/retrieval_modules/`
- **👤 User DB (User Database)**
  - Stores user profiles (user_id, age, gender, country)
  - Module: `mcrs/db_user/user_profile.py`
- **🎵 Item DB (Music Catalog Database)**
  - Contains track metadata (track_id, track name, artist, album, tags, release date)
  - Module: `mcrs/db_item/music_catalog.py`
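For orientation, the records in the two databases look roughly like this. The field names come from the descriptions above; the actual classes live in `mcrs/db_user/user_profile.py` and `mcrs/db_item/music_catalog.py` and may differ.

```python
# Illustrative record shapes only -- the real implementations live in
# mcrs/db_user/user_profile.py and mcrs/db_item/music_catalog.py.
from dataclasses import dataclass, field
from typing import List

@dataclass
class UserProfile:
    user_id: str
    age: int
    gender: str
    country: str

@dataclass
class Track:
    track_id: str
    track_name: str
    artist: str
    album: str = ""          # album not shown in the demo output below
    tag_list: List[str] = field(default_factory=list)
    release_date: str = ""

track = Track("0001", "Isabella", "Gregg Karukas", tag_list=["smooth jazz"])
print(track.artist)
```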
## 📚 Challenge Resources

- Conversation Dataset: TalkPlayData-2
- Track Metadata
- Pre-extracted Track Embeddings
- User Profiles (User Metadata)
- Pre-extracted User Embeddings
## 🚀 Quick Start

### Installation

```bash
uv venv .venv --python=3.10
source .venv/bin/activate
uv pip install -e .
```

### Run a Demo Query

Try the baseline system with a simple query:

```bash
python run_crs.py --user_query "I'm looking for jazz music."
```

Example Output:
```
----------------------------------------------------------------------------------------------------
🎵 Music: https://open.spotify.com/track/3auejP8jQXX4soeSvMCtqL
🤖 Assistant Response:
I'm glad you liked the recommended track "Isabella" by Gregg Karukas!
Isabella is a smooth jazz track that exudes a soothing and intimate atmosphere.
The song features a gentle piano melody, accompanied by a subtle saxophone solo,
creating a warm and relaxing ambiance. The tempo is moderate, with a steady beat
that encourages you to sway to the rhythm. ...[omitted]
```
### Run Full Inference

Process the entire test dataset with batch inference:

```bash
# BM25 baseline
python run_inference.py --tid llama1b_bm25 --batch_size 16

# BERT baseline
python run_inference.py --tid llama1b_bert --batch_size 16
```

Results will be saved to `exp/inference/{tid}.json`.
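Once inference finishes, the saved JSON can be loaded for inspection. A minimal sketch: the result schema shown here is an assumption, not the actual output of `run_inference.py`, so the demo writes a synthetic file.

```python
# Sketch: load a saved inference file from exp/inference/{tid}.json.
# The record schema below is hypothetical; inspect a real file to see yours.
import json
import tempfile
from pathlib import Path

def load_results(tid, exp_dir="exp/inference"):
    """Read the batch-inference output saved as {exp_dir}/{tid}.json."""
    path = Path(exp_dir) / f"{tid}.json"
    with open(path) as f:
        return json.load(f)

# Demo against a synthetic file so the sketch is runnable end to end.
with tempfile.TemporaryDirectory() as d:
    fake = {"dialogue_0": {"query": "jazz", "recommended": ["t1"]}}
    (Path(d) / "llama1b_bm25.json").write_text(json.dumps(fake))
    results = load_results("llama1b_bm25", exp_dir=d)
    print(len(results))  # number of dialogues in the file
```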
## 🛠️ Custom Configuration

Create your own config file in `config/`:

```yaml
# config/my_model.yaml
lm_type: "meta-llama/Llama-3.2-1B-Instruct"
retrieval_type: "qwen_embedding"  # your custom retriever
item_db_name: "talkpl-ai/TalkPlayData-2-Track-Metadata"
user_db_name: "talkpl-ai/TalkPlayData-2-User-Metadata"
split_types:
  - "test_warm"
  - "test_cold"
corpus_types:
  - "track_name"
  - "artist_name"
  - "album_name"
  - "tag_list"
cache_dir: "./cache"
device: "cuda"
attn_implementation: "flash_attention_2"
```

Then run with your config:

```bash
python run_inference.py --tid my_model
```

## 📊 Evaluation
For evaluation, please refer to:
https://github.com/nlp4musa/music-crs-evaluator
## 🎯 Challenge Tips

- **Start simple**: run the baseline first and understand the pipeline
- **Iterate quickly**: test changes on a subset before the full evaluation
- **Use caching**: precompute embeddings to speed up experiments
- **Monitor metrics**: track both recommendation accuracy and response quality
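The caching tip can be as simple as memoizing the embedding call. A toy sketch; `embed` here is a cheap stand-in for a real embedding model, and for real workloads you would cache to disk instead of memory.

```python
# Sketch of the "use caching" tip: memoize an expensive call so repeated
# queries over the same text hit the cache instead of recomputing.
import functools

@functools.lru_cache(maxsize=None)
def embed(text):
    """Pretend-expensive embedding: bag-of-characters counts (a-z)."""
    return tuple(text.count(c) for c in "abcdefghijklmnopqrstuvwxyz")

embed("jazz")                    # computed on the first call
embed("jazz")                    # served from the cache on the second
print(embed.cache_info().hits)   # → 1
```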
See `./tips/` for advanced techniques and future directions:

- **Improve item representation**: add audio features or use better embedding models
- **Add a reranker module**: implement two-stage ranking with LLM- or embedding-based rerankers
- **Generative retrieval**: use semantic IDs for end-to-end track generation
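As a sketch of the reranker idea: a cheap first stage proposes candidates, then a finer-grained scorer reorders the short list. Both functions are hypothetical; a real reranker would use a cross-encoder or an LLM instead of the phrase check shown here.

```python
# Hypothetical two-stage ranking: cheap retrieval, then reranking of top-k.

def first_stage(query, corpus, k=10):
    """Stage 1: token overlap as a stand-in for BM25."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda doc: -len(q & set(doc.lower().split())))
    return ranked[:k]

def rerank(query, candidates):
    """Stage 2: reward exact phrase containment on the short candidate
    list. Swap in a cross-encoder or LLM scorer for real use."""
    return sorted(candidates, key=lambda doc: query.lower() not in doc.lower())

docs = ["smooth jazz piano ballad", "jazz fusion guitar", "hard rock anthem"]
ranked = rerank("jazz fusion", first_stage("jazz fusion", docs, k=2))
print(ranked[0])
```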
## 🤝 Contributing

Feel free to:

- Implement new retrieval or reranking modules
- Add evaluation metrics
- Improve prompt engineering
- Share your best-performing configurations

Good luck with the challenge! 🎵
