
PRamoneda/music-crs-baselines

Music Conversational Recommendation Challenge Baselines

Welcome to the Music CRS Challenge! This repository provides baseline systems for building conversational music recommendation systems using the TalkPlayData-2 dataset.

📋 Challenge Overview

Build a conversational AI that can:

  • Understand user music preferences through natural dialogue
  • Recommend relevant tracks from a music catalog
  • Generate engaging, personalized responses about music

Baseline System

System Architecture

The baseline runs a two-stage pipeline:

  1. RecSys: Find candidate tracks matching user preferences
  2. LLM: Create natural language responses explaining recommendations
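The two stages above can be sketched in a few lines. `retrieve` and `generate` here are hypothetical stand-ins for the real `mcrs` retrieval and language-model modules, using keyword overlap and a string template instead of BM25/BERT and Llama:

```python
# Minimal sketch of the two-stage pipeline: RecSys first, then LLM.
# `retrieve` and `generate` are toy stand-ins for the mcrs modules.

def retrieve(query: str, catalog: list[dict], k: int = 3) -> list[dict]:
    """Stage 1 (RecSys): score tracks by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = [(len(terms & t["tags"]), t) for t in catalog]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for score, t in scored[:k] if score > 0]

def generate(query: str, tracks: list[dict]) -> str:
    """Stage 2 (LLM): a template here; the baseline uses Llama-3.2-1B-Instruct."""
    names = ", ".join(f'"{t["name"]}" by {t["artist"]}' for t in tracks)
    return f"Based on your request ({query}), you might enjoy: {names}."

catalog = [
    {"name": "Isabella", "artist": "Gregg Karukas", "tags": {"jazz", "smooth"}},
    {"name": "Thunderstruck", "artist": "AC/DC", "tags": {"rock", "hard"}},
]
candidates = retrieve("I'm looking for jazz music.", catalog)
print(generate("jazz", candidates))
```

The key design point is the separation of concerns: the retriever narrows a large catalog to a handful of candidates, so the (much slower) language model only ever reasons about a few tracks.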

Core Components

  1. 🤖 LLM (Language Model)

    • Generates natural language responses
    • Model: Llama-3.2-1B-Instruct
    • Module: mcrs/lm_modules
  2. 🎯 RecSys (Recommendation System)

    • Retrieves relevant tracks from catalog
    • Methods: BM25 (sparse) or BERT (dense)
    • Module: mcrs/retrieval_modules/
  3. 👤 User DB (User Database)

    • Stores user profiles (user_id, age, gender, country)
    • Module: mcrs/db_user/user_profile.py
  4. 🎵 Item DB (Music Catalog Database)

    • Contains track metadata (track_id, track name, artist, album, tags, release date)
    • Module: mcrs/db_item/music_catalog.py
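To make the "sparse" retrieval option above concrete, here is a self-contained BM25 scorer over a toy two-track catalog. This is an illustrative sketch, not the code in `mcrs/retrieval_modules/`; each "document" concatenates the corpus fields (track name, artist, tags) as in the config's `corpus_types`:

```python
import math
from collections import Counter

def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each tokenized doc against the tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n          # average document length
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

# Toy "documents": track name + artist + tags, lowercased and tokenized.
docs = [
    "isabella gregg karukas smooth jazz piano".split(),
    "thunderstruck acdc hard rock guitar".split(),
]
print(bm25_scores("smooth jazz".split(), docs))
```

The jazz track matches both query terms and gets a positive score; the rock track matches none and scores zero, which is exactly the recall behavior the first stage relies on.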

📚 Challenge Resources

🚀 Quick Start

Installation

uv venv .venv --python=3.10
source .venv/bin/activate
uv pip install -e .

Run a Demo Query

Try the baseline system with a simple query:

python run_crs.py --user_query "I'm looking for jazz music."

Example Output:

----------------------------------------------------------------------------------------------------
🎵 Music: https://open.spotify.com/track/3auejP8jQXX4soeSvMCtqL
🤖 Assistant Response:
I'm glad you liked the recommended track "Isabella" by Gregg Karukas!

Isabella is a smooth jazz track that exudes a soothing and intimate atmosphere.
The song features a gentle piano melody, accompanied by a subtle saxophone solo,
creating a warm and relaxing ambiance. The tempo is moderate, with a steady beat
that encourages you to sway to the rhythm....[omitted]

Run Full Inference

Process the entire test dataset with batch inference:

# BM25 baseline
python run_inference.py --tid llama1b_bm25 --batch_size 16

# BERT baseline
python run_inference.py --tid llama1b_bert --batch_size 16

Results will be saved to exp/inference/{tid}.json.


🛠️ Custom Configuration

Create your own config file in config/:

# config/my_model.yaml
lm_type: "meta-llama/Llama-3.2-1B-Instruct"
retrieval_type: "qwen_embedding"  # your custom retriever
item_db_name: "talkpl-ai/TalkPlayData-2-Track-Metadata"
user_db_name: "talkpl-ai/TalkPlayData-2-User-Metadata"
split_types:
  - "test_warm"
  - "test_cold"
corpus_types:
  - "track_name"
  - "artist_name"
  - "album_name"
  - "tag_list"
cache_dir: "./cache"
device: "cuda"
attn_implementation: "flash_attention_2"
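A quick way to catch config mistakes is to load and validate the YAML before launching a long inference run. The snippet below is a hypothetical illustration (using PyYAML's `safe_load`; the keys checked are the ones from the example config above, not an exhaustive schema):

```python
import yaml  # PyYAML; assumed available in the environment

# A trimmed copy of the example config, inlined for illustration.
config_text = """
lm_type: "meta-llama/Llama-3.2-1B-Instruct"
retrieval_type: "qwen_embedding"
split_types:
  - "test_warm"
  - "test_cold"
device: "cuda"
"""

config = yaml.safe_load(config_text)

# Fail fast on missing keys rather than deep inside the pipeline.
required = {"lm_type", "retrieval_type", "split_types", "device"}
missing = required - config.keys()
assert not missing, f"config is missing keys: {missing}"
print(config["split_types"])  # ['test_warm', 'test_cold']
```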

Then run with your config:

python run_inference.py --tid my_model

📊 Evaluation

For evaluation, please refer to:
https://github.com/nlp4musa/music-crs-evaluator

🎯 Challenge Tips

  1. Start simple: Run baseline, understand the pipeline
  2. Iterate quickly: Test changes on a subset before full evaluation
  3. Use caching: Precompute embeddings to speed up experiments
  4. Monitor metrics: Track both recommendation accuracy and response quality
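Tip 3 can be sketched as a small disk cache keyed by a hash of the input, so repeated experiments skip recomputation. `embed` here is a hypothetical stand-in for a real embedding model, and `./cache` mirrors the `cache_dir` from the config:

```python
import hashlib
import pickle
from pathlib import Path

CACHE_DIR = Path("./cache")

def embed(text: str) -> list[float]:
    # Placeholder "embedding": mean character code per 2-gram.
    return [sum(map(ord, text[i:i + 2])) / 2 for i in range(0, len(text), 2)]

def cached_embed(text: str) -> list[float]:
    """Return the cached vector if present; otherwise compute and store it."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(text.encode()).hexdigest()
    path = CACHE_DIR / f"{key}.pkl"
    if path.exists():
        return pickle.loads(path.read_bytes())
    vec = embed(text)
    path.write_bytes(pickle.dumps(vec))
    return vec

v1 = cached_embed("smooth jazz")   # computed and written to ./cache
v2 = cached_embed("smooth jazz")   # served from the cache file
assert v1 == v2
```

Hashing the input (rather than using it directly as a filename) keeps cache keys filesystem-safe regardless of the text's length or characters.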

See ./tips/ for advanced techniques and future directions:

  • Improve Item Representation: Add audio features, use better embedding models
  • Add Reranker Module: Implement two-stage ranking with LLM or embedding-based rerankers
  • Generative Retrieval: Use semantic IDs for end-to-end track generation
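The reranker direction can be sketched as follows: a cheap, recall-oriented first stage narrows the catalog, then a costlier scorer reorders the survivors. Both scorers below are toy stand-ins (set overlap instead of BM25/BERT, overlap ratio instead of an LLM or cross-encoder), not the project's modules:

```python
def first_stage(query_terms: set, catalog: list[dict], k: int = 3) -> list[dict]:
    """Recall-oriented filter: keep tracks sharing any query term."""
    hits = [t for t in catalog if query_terms & t["tags"]]
    return hits[:k]

def rerank(query_terms: set, candidates: list[dict]) -> list[dict]:
    """Precision-oriented stage: order by overlap ratio
    (stand-in for an LLM or cross-encoder score)."""
    def score(t: dict) -> float:
        return len(query_terms & t["tags"]) / len(t["tags"])
    return sorted(candidates, key=score, reverse=True)

catalog = [
    {"name": "Isabella", "tags": {"jazz", "smooth", "piano"}},
    {"name": "So What", "tags": {"jazz", "modal"}},
    {"name": "Thunderstruck", "tags": {"rock"}},
]
query = {"jazz", "modal"}
ranked = rerank(query, first_stage(query, catalog))
print([t["name"] for t in ranked])  # 'So What' outranks 'Isabella'
```

The two-stage split keeps the expensive scorer's cost bounded by `k` rather than by the catalog size, which is what makes LLM-based reranking affordable.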

🤝 Contributing

Feel free to:

  • Implement new retrieval/reranking modules
  • Add evaluation metrics
  • Improve prompt engineering
  • Share your best-performing configurations

Good luck with the challenge! 🎵