
PRamoneda/music-crs-baselines

Music Conversational Recommendation Challenge Baselines

Welcome to the Music CRS Challenge! This repository provides baseline systems for building conversational music recommendation systems using the TalkPlayData-2 dataset.

📋 Challenge Overview

Build a conversational AI that can:

  • Understand user music preferences through natural dialogue
  • Recommend relevant tracks from a music catalog
  • Generate engaging, personalized responses about music

Baseline System

System Architecture

The baseline runs a two-stage pipeline:

  1. RecSys: Find candidate tracks matching user preferences
  2. LLM: Create natural language responses explaining recommendations
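The two stages above can be sketched in a few lines. `retrieve` and `generate` here are hypothetical stand-ins for the real `mcrs` retrieval and language-model modules, using keyword overlap and a string template instead of BM25/BERT and Llama:

```python
# Minimal sketch of the two-stage pipeline: RecSys first, then LLM.
# `retrieve` and `generate` are toy stand-ins for the mcrs modules.

def retrieve(query: str, catalog: list[dict], k: int = 3) -> list[dict]:
    """Stage 1 (RecSys): score tracks by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = [(len(terms & t["tags"]), t) for t in catalog]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for score, t in scored[:k] if score > 0]

def generate(query: str, tracks: list[dict]) -> str:
    """Stage 2 (LLM): a template here; the baseline uses Llama-3.2-1B-Instruct."""
    names = ", ".join(f'"{t["name"]}" by {t["artist"]}' for t in tracks)
    return f"Based on your request ({query}), you might enjoy: {names}."

catalog = [
    {"name": "Isabella", "artist": "Gregg Karukas", "tags": {"jazz", "smooth"}},
    {"name": "Thunderstruck", "artist": "AC/DC", "tags": {"rock", "hard"}},
]
candidates = retrieve("I'm looking for jazz music.", catalog)
print(generate("jazz", candidates))
```

The key design point is the separation of concerns: the retriever narrows a large catalog to a handful of candidates, so the (much slower) language model only ever reasons about a few tracks.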

Core Components

  1. 🤖 LLM (Language Model)

    • Generates natural language responses
    • Model: Llama-3.2-1B-Instruct
    • Module: mcrs/lm_modules
  2. 🎯 RecSys (Recommendation System)

    • Retrieves relevant tracks from catalog
    • Methods: BM25 (sparse) or BERT (dense)
    • Module: mcrs/retrieval_modules/
  3. 👤 User DB (User Database)

    • Stores user profiles (user_id, age, gender, country)
    • Module: mcrs/db_user/user_profile.py
  4. 🎵 Item DB (Music Catalog Database)

    • Contains track metadata (track_id, track name, artist, album, tags, release date)
    • Module: mcrs/db_item/music_catalog.py
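To make the "sparse" retrieval option above concrete, here is a self-contained BM25 scorer over a toy two-track catalog. This is an illustrative sketch, not the code in `mcrs/retrieval_modules/`; each "document" concatenates the corpus fields (track name, artist, tags) as in the config's `corpus_types`:

```python
import math
from collections import Counter

def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each tokenized doc against the tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n          # average document length
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

# Toy "documents": track name + artist + tags, lowercased and tokenized.
docs = [
    "isabella gregg karukas smooth jazz piano".split(),
    "thunderstruck acdc hard rock guitar".split(),
]
print(bm25_scores("smooth jazz".split(), docs))
```

The jazz track matches both query terms and gets a positive score; the rock track matches none and scores zero, which is exactly the recall behavior the first stage relies on.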

📚 Challenge Resources

🚀 Quick Start

Installation

uv venv .venv --python=3.10
source .venv/bin/activate
uv pip install -e .

Run a Demo Query

Try the baseline system with a simple query:

python run_crs.py --user_query "I'm looking for jazz music."

Example Output:

----------------------------------------------------------------------------------------------------
🎵 Music: https://open.spotify.com/track/3auejP8jQXX4soeSvMCtqL
🤖 Assistant Response:
I'm glad you liked the recommended track "Isabella" by Gregg Karukas!

Isabella is a smooth jazz track that exudes a soothing and intimate atmosphere.
The song features a gentle piano melody, accompanied by a subtle saxophone solo,
creating a warm and relaxing ambiance. The tempo is moderate, with a steady beat
that encourages you to sway to the rhythm....[omitted]

Run Full Inference

Process the entire test dataset with batch inference:

# BM25 baseline
python run_inference.py --tid llama1b_bm25 --batch_size 16

# BERT baseline
python run_inference.py --tid llama1b_bert --batch_size 16

Results will be saved to exp/inference/{tid}.json.


🛠️ Custom Configuration

Create your own config file in config/:

# config/my_model.yaml
lm_type: "meta-llama/Llama-3.2-1B-Instruct"
retrieval_type: "qwen_embedding"  # your custom retriever
item_db_name: "talkpl-ai/TalkPlayData-2-Track-Metadata"
user_db_name: "talkpl-ai/TalkPlayData-2-User-Metadata"
split_types:
  - "test_warm"
  - "test_cold"
corpus_types:
  - "track_name"
  - "artist_name"
  - "album_name"
  - "tag_list"
cache_dir: "./cache"
device: "cuda"
attn_implementation: "flash_attention_2"
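A quick way to catch config mistakes is to load and validate the YAML before launching a long inference run. The snippet below is a hypothetical illustration (using PyYAML's `safe_load`; the keys checked are the ones from the example config above, not an exhaustive schema):

```python
import yaml  # PyYAML; assumed available in the environment

# A trimmed copy of the example config, inlined for illustration.
config_text = """
lm_type: "meta-llama/Llama-3.2-1B-Instruct"
retrieval_type: "qwen_embedding"
split_types:
  - "test_warm"
  - "test_cold"
device: "cuda"
"""

config = yaml.safe_load(config_text)

# Fail fast on missing keys rather than deep inside the pipeline.
required = {"lm_type", "retrieval_type", "split_types", "device"}
missing = required - config.keys()
assert not missing, f"config is missing keys: {missing}"
print(config["split_types"])  # ['test_warm', 'test_cold']
```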

Then run with your config:

python run_inference.py --tid my_model

📊 Evaluation

For evaluation, please refer to:
https://github.com/nlp4musa/music-crs-evaluator

🎯 Challenge Tips

  1. Start simple: Run baseline, understand the pipeline
  2. Iterate quickly: Test changes on a subset before full evaluation
  3. Use caching: Precompute embeddings to speed up experiments
  4. Monitor metrics: Track both recommendation accuracy and response quality
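Tip 3 can be sketched as a small disk cache keyed by a hash of the input, so repeated experiments skip recomputation. `embed` here is a hypothetical stand-in for a real embedding model, and `./cache` mirrors the `cache_dir` from the config:

```python
import hashlib
import pickle
from pathlib import Path

CACHE_DIR = Path("./cache")

def embed(text: str) -> list[float]:
    # Placeholder "embedding": mean character code per 2-gram.
    return [sum(map(ord, text[i:i + 2])) / 2 for i in range(0, len(text), 2)]

def cached_embed(text: str) -> list[float]:
    """Return the cached vector if present; otherwise compute and store it."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(text.encode()).hexdigest()
    path = CACHE_DIR / f"{key}.pkl"
    if path.exists():
        return pickle.loads(path.read_bytes())
    vec = embed(text)
    path.write_bytes(pickle.dumps(vec))
    return vec

v1 = cached_embed("smooth jazz")   # computed and written to ./cache
v2 = cached_embed("smooth jazz")   # served from the cache file
assert v1 == v2
```

Hashing the input (rather than using it directly as a filename) keeps cache keys filesystem-safe regardless of the text's length or characters.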

See ./tips/ for advanced techniques and future directions:

  • Improve Item Representation: Add audio features, use better embedding models
  • Add Reranker Module: Implement two-stage ranking with LLM or embedding-based rerankers
  • Generative Retrieval: Use semantic IDs for end-to-end track generation
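The reranker direction can be sketched as follows: a cheap, recall-oriented first stage narrows the catalog, then a costlier scorer reorders the survivors. Both scorers below are toy stand-ins (set overlap instead of BM25/BERT, overlap ratio instead of an LLM or cross-encoder), not the project's modules:

```python
def first_stage(query_terms: set, catalog: list[dict], k: int = 3) -> list[dict]:
    """Recall-oriented filter: keep tracks sharing any query term."""
    hits = [t for t in catalog if query_terms & t["tags"]]
    return hits[:k]

def rerank(query_terms: set, candidates: list[dict]) -> list[dict]:
    """Precision-oriented stage: order by overlap ratio
    (stand-in for an LLM or cross-encoder score)."""
    def score(t: dict) -> float:
        return len(query_terms & t["tags"]) / len(t["tags"])
    return sorted(candidates, key=score, reverse=True)

catalog = [
    {"name": "Isabella", "tags": {"jazz", "smooth", "piano"}},
    {"name": "So What", "tags": {"jazz", "modal"}},
    {"name": "Thunderstruck", "tags": {"rock"}},
]
query = {"jazz", "modal"}
ranked = rerank(query, first_stage(query, catalog))
print([t["name"] for t in ranked])  # 'So What' outranks 'Isabella'
```

The two-stage split keeps the expensive scorer's cost bounded by `k` rather than by the catalog size, which is what makes LLM-based reranking affordable.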

🤝 Contributing

Feel free to:

  • Implement new retrieval/reranking modules
  • Add evaluation metrics
  • Improve prompt engineering
  • Share your best-performing configurations

Good luck with the challenge! 🎵