Repos
28
Stars
375
Forks
82
Top Language
Python
Top Repositories
[ICLR 23] A new framework to transform any neural network into an interpretable concept bottleneck model (CBM) without needing labeled concept data
[ICLR 23 spotlight] An automatic and efficient tool to describe functionalities of individual neurons in DNNs
[ICLR 25] A novel framework for building intrinsically interpretable LLMs with human-understandable concepts to ensure safety, reliability, transparency, and trustworthiness.
[NeurIPS 24] A new training and evaluation framework for learning interpretable deep vision models and benchmarking different interpretable concept-bottleneck-models (CBMs)
[CVPR 2025] Concept Bottleneck Autoencoder (CB-AE): efficiently transforms any pretrained (black-box) image generative model into an interpretable generative concept bottleneck model (CBM) with minimal concept supervision, while preserving image quality
[EMNLP 25] An effective and interpretable weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study uncovering how reasoning length is encoded in the model’s representation space.
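Several of the repositories above build on the concept bottleneck model (CBM) idea: the network first predicts a set of human-interpretable concepts, then predicts the label from those concepts alone, so every decision can be attributed to concept activations. A minimal sketch of that two-stage structure (all dimensions, weights, and function names here are illustrative and not taken from any of these repositories):

```python
import numpy as np

rng = np.random.default_rng(0)

def concept_bottleneck_forward(x, W_concept, W_label):
    """Two-stage CBM forward pass: input -> concept scores -> label logits."""
    # Stage 1: map raw input features to interpretable concept activations.
    concepts = 1.0 / (1.0 + np.exp(-(x @ W_concept)))  # sigmoid per concept
    # Stage 2: the label head sees ONLY the concept vector, so each
    # prediction can be explained in terms of the learned concepts.
    logits = concepts @ W_label
    return concepts, logits

# Toy dimensions: 8 input features, 4 concepts, 3 classes.
x = rng.normal(size=(2, 8))
W_concept = rng.normal(size=(8, 4))
W_label = rng.normal(size=(4, 3))
concepts, logits = concept_bottleneck_forward(x, W_concept, W_label)
print(concepts.shape, logits.shape)  # -> (2, 4) (2, 3)
```

In a real CBM the concept layer is trained (with labeled concepts, or label-free as in the ICLR 23 work above) so that each bottleneck unit corresponds to a nameable concept; the sketch only shows the information flow.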
Repositories
28
[NeurIPS 24] A new training and evaluation framework for learning interpretable deep vision models and benchmarking different interpretable concept-bottleneck-models (CBMs)
[ICLR 23] A new framework to transform any neural network into an interpretable concept bottleneck model (CBM) without needing labeled concept data
[ICML 24] S-DQN and S-PPO: Robust smoothed deep RL agents without sacrificing performance
[ICLR 23 spotlight] An automatic and efficient tool to describe functionalities of individual neurons in DNNs
[CVPR 2025] Concept Bottleneck Autoencoder (CB-AE): efficiently transforms any pretrained (black-box) image generative model into an interpretable generative concept bottleneck model (CBM) with minimal concept supervision, while preserving image quality
[NeurIPS'23 ATTRIB] An efficient framework to generate neuron explanations for LLMs
[ICLR 25] A novel framework for building intrinsically interpretable LLMs with human-understandable concepts to ensure safety, reliability, transparency, and trustworthiness.
[EMNLP 25] An effective and interpretable weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study uncovering how reasoning length is encoded in the model’s representation space.
[ICML 24] A novel automated neuron explanation framework that can accurately describe poly-semantic concepts in deep neural networks
A new training framework for Trustworthy Large Reasoning Models
[ICLR 24] This work proposes RSCP+ to provide robustness guarantees in evaluation, and two novel methods, PTT and RCT, to robustify conformal prediction with improved efficiency through post-hoc transformation and training
[ICML 24] AND: the first framework to provide automatic natural language explanations for deep acoustic networks
[ICML 25] A unified mathematical framework to evaluate neuron explanations of deep learning models with sanity tests
[NAACL 25] Two novel, light-weight, and training-free skill unlearning methods for LLMs
[TMLR 25] An automated method for explaining complex neuron behaviors in deep vision models using large language models
Boosting misclassification detection with radius-aware training (RAT)
[ICCV 23] Evaluating robustness of neuron explanation methods
[ECCV 24] A new and low-cost test-time defense for DNNs based on neuron-level-interpretability methods
Official code repository