7 results for “topic:causal-intervention”
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
[SIGIR 2022] Source code and datasets for "Bias Mitigation for Evidence-aware Fake News Detection by Causal Intervention".
Demystifying Verbatim Memorization in Large Language Models
A framework for evaluating auto-interp pipelines, i.e., natural language explanations of neurons.
A causal intervention framework for learning robust and interpretable character representations inside subword-based language models.
[EMNLP 2023] A Causal View of Entity Bias in (Large) Language Models
Capture macOS dictation accurately without rewriting your words, so the input stays true to what you said and common app issues are avoided.