Repos
28
Stars
375
Forks
82
Top Language
Python
Top Repositories
[ICLR 23] A new framework to transform any neural network into an interpretable concept bottleneck model (CBM) without needing labeled concept data
[ICLR 23 spotlight] An automatic and efficient tool to describe functionalities of individual neurons in DNNs
[ICLR 25] A novel framework for building intrinsically interpretable LLMs with human-understandable concepts to ensure safety, reliability, transparency, and trustworthiness.
[NeurIPS 24] A new training and evaluation framework for learning interpretable deep vision models and benchmarking different interpretable concept-bottleneck-models (CBMs)
[CVPR 2025] Concept Bottleneck Autoencoder (CB-AE): efficiently transforms any pretrained (black-box) image generative model into an interpretable generative concept bottleneck model (CBM) with minimal concept supervision, while preserving image quality
[EMNLP 25] An effective and interpretable weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study uncovering how reasoning length is encoded in the model’s representation space.
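Several of the repositories above build on the concept bottleneck model (CBM) idea: the network first predicts a set of human-interpretable concepts, then predicts the label from those concepts alone, so every decision can be attributed to concept activations. A minimal sketch of that two-stage structure (all dimensions, weights, and function names here are illustrative and not taken from any of these repositories):

```python
import numpy as np

rng = np.random.default_rng(0)

def concept_bottleneck_forward(x, W_concept, W_label):
    """Two-stage CBM forward pass: input -> concept scores -> label logits."""
    # Stage 1: map raw input features to interpretable concept activations.
    concepts = 1.0 / (1.0 + np.exp(-(x @ W_concept)))  # sigmoid per concept
    # Stage 2: the label head sees ONLY the concept vector, so each
    # prediction can be explained in terms of the learned concepts.
    logits = concepts @ W_label
    return concepts, logits

# Toy dimensions: 8 input features, 4 concepts, 3 classes.
x = rng.normal(size=(2, 8))
W_concept = rng.normal(size=(8, 4))
W_label = rng.normal(size=(4, 3))
concepts, logits = concept_bottleneck_forward(x, W_concept, W_label)
print(concepts.shape, logits.shape)  # -> (2, 4) (2, 3)
```

In a real CBM the concept layer is trained (with labeled concepts, or label-free as in the ICLR 23 work above) so that each bottleneck unit corresponds to a nameable concept; the sketch only shows the information flow.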
Repositories
28
[NeurIPS 24] A new training and evaluation framework for learning interpretable deep vision models and benchmarking different interpretable concept-bottleneck-models (CBMs)
[ICLR 23] A new framework to transform any neural network into an interpretable concept bottleneck model (CBM) without needing labeled concept data
[ICML 24] S-DQN and S-PPO: Robust smoothed deep RL agents without sacrificing performance
[ICLR 23 spotlight] An automatic and efficient tool to describe functionalities of individual neurons in DNNs
[CVPR 2025] Concept Bottleneck Autoencoder (CB-AE): efficiently transforms any pretrained (black-box) image generative model into an interpretable generative concept bottleneck model (CBM) with minimal concept supervision, while preserving image quality
[NeurIPS'23 ATTRIB] An efficient framework to generate neuron explanations for LLMs
[ICLR 25] A novel framework for building intrinsically interpretable LLMs with human-understandable concepts to ensure safety, reliability, transparency, and trustworthiness.
[EMNLP 25] An effective and interpretable weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study uncovering how reasoning length is encoded in the model’s representation space.
[ICML 24] A novel automated neuron explanation framework that can accurately describe poly-semantic concepts in deep neural networks
A new training framework for Trustworthy Large Reasoning Models
[ICLR 24] This work proposes RSCP+ to provide robustness guarantees in evaluation, and two novel methods, PTT and RCT, to robustify conformal prediction with improved efficiency through post-hoc transformation and training
[ICML 24] AND: the first framework to provide automatic natural language explanations for deep acoustic networks
[ICML 25] A unified mathematical framework to evaluate neuron explanations of deep learning models with sanity tests
[NAACL 25] Two novel, light-weight, and training-free skill unlearning methods for LLMs
[TMLR 25] An automated method for explaining complex neuron behaviors in deep vision models using large language models
Boosting misclassification detection with radius-aware training (RAT)
[ICCV 23] Evaluating robustness of neuron explanation methods
[ECCV 24] A new and low-cost test-time defense for DNNs based on neuron-level-interpretability methods
Official code repository