Trustworthy-ML-Lab

Languages: Python 56%, Jupyter Notebook 44%

Repos: 28
Stars: 375
Forks: 82
Top Language: Python

Repositories (28)

Trustworthy-ML-Lab/CB-SAE
No description provided.
3 stars · 0 forks · Updated 2 weeks ago

Trustworthy-ML-Lab/Efficient-Interpretability-Eval
No description provided.
1 star · 0 forks · Updated 3 months ago

Trustworthy-ML-Lab/VLG-CBM
[NeurIPS 24] A new training and evaluation framework for learning interpretable deep vision models and benchmarking different interpretable concept bottleneck models (CBMs)
Jupyter Notebook · 30 stars · 5 forks · Updated 9 months ago
Topics: computer-vision, concept-bottleneck-models, deep-learning, deep-neural-networks, explainable-ai, interpretable-machine-learning, large-language-models

Trustworthy-ML-Lab/Label-free-CBM
[ICLR 23] A new framework to transform any neural network into an interpretable concept bottleneck model (CBM) without needing labeled concept data
Jupyter Notebook · 134 stars · 31 forks · Updated 1 year ago
Topics: computer-vision, deep-learning, deep-neural-networks, interpretability, interpretable-deep-learning

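For context, the concept bottleneck idea shared by Label-free-CBM and VLG-CBM can be sketched generically: predictions are forced to pass through a layer of named, human-interpretable concept scores, so each class logit decomposes into per-concept contributions. The sketch below is an illustrative toy in plain NumPy, not the repository's API; all concept names and weights are hypothetical stand-ins.

```python
import numpy as np

# Toy concept bottleneck model: backbone features -> named concept scores -> logits.
# A CBM constrains predictions to flow only through human-interpretable concepts,
# so every class decision can be attributed to concept activations.
CONCEPTS = ["striped", "has-wings", "metallic"]  # hypothetical concept names

rng = np.random.default_rng(0)
W_concept = rng.normal(size=(4, len(CONCEPTS)))  # features -> concept scores
W_label = rng.normal(size=(len(CONCEPTS), 2))    # concept scores -> class logits

def predict(features):
    concept_scores = features @ W_concept  # the interpretable bottleneck
    logits = concept_scores @ W_label      # final prediction uses ONLY concepts
    return concept_scores, logits

features = rng.normal(size=(1, 4))
scores, logits = predict(features)

# Each logit is an exact sum of per-concept contributions:
contribs = scores[0] * W_label[:, 0]
assert np.isclose(contribs.sum(), logits[0, 0])
```

The decomposition in the last two lines is what makes the bottleneck "interpretable": one can read off how much each named concept pushed the prediction toward each class.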
Trustworthy-ML-Lab/Robust_HighUtil_Smoothed_DRL
[ICML 24] S-DQN and S-PPO: robust smoothed deep RL agents without sacrificing performance
Python · 6 stars · 0 forks · Updated 7 months ago
Topics: adversarial-machine-learning, deep-learning, deep-reinforcement-learning, randomized-smoothing, robust-learning, robust-machine-learning, robustness

Trustworthy-ML-Lab/CLIP-dissect
[ICLR 23 spotlight] An automatic and efficient tool to describe the functionality of individual neurons in DNNs
Jupyter Notebook · 63 stars · 16 forks · Updated 2 years ago
Topics: computer-vision, deep-learning, deep-neural-networks, explainable-ai, interpretable-deep-learning, interpretable-machine-learning, mechanistic-interpretability

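The general recipe behind activation-matching neuron description tools of this kind is to compare a neuron's activation pattern over a probing dataset against per-concept score patterns, and pick the best-matching concept. A toy NumPy sketch of that matching step (in CLIP-dissect the concept scores come from CLIP image-text similarity; here they are random stand-ins, and all names are hypothetical):

```python
import numpy as np

# Toy activation-matching neuron description: choose the concept whose
# per-image score pattern best correlates with the neuron's activations.
rng = np.random.default_rng(1)
n_images = 200
concepts = ["dog", "wheel", "grass"]

# Hypothetical per-image concept scores (random stand-ins for a real
# similarity matrix between probing images and concept texts).
concept_scores = rng.normal(size=(n_images, len(concepts)))

# Make the neuron secretly track "wheel", plus a little noise.
neuron_acts = 0.9 * concept_scores[:, 1] + 0.1 * rng.normal(size=n_images)

def describe(acts, scores, names):
    # Correlate the neuron's activation vector with each concept's score vector.
    sims = [np.corrcoef(acts, scores[:, j])[0, 1] for j in range(scores.shape[1])]
    return names[int(np.argmax(sims))]

print(describe(neuron_acts, concept_scores, concepts))  # prints: wheel
```

Real tools use more careful similarity measures than plain Pearson correlation, but the shape of the computation is the same: one activation vector matched against many concept vectors.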
Trustworthy-ML-Lab/posthoc-generative-cbm
[CVPR 2025] Concept Bottleneck Autoencoder (CB-AE): efficiently transform any pretrained (black-box) image generative model into an interpretable generative concept bottleneck model (CBM) with minimal concept supervision, while preserving image quality
Jupyter Notebook · 17 stars · 2 forks · Updated 2 weeks ago
Topics: computer-vision, concept-bottleneck-models, deep-learning, generative-ai, interpretability-and-explainability, interpretable-deep-learning, mechanistic-interpretability

Trustworthy-ML-Lab/Efficient-LLM-automated-interpretability
[NeurIPS 23 ATTRIB] An efficient framework to generate neuron explanations for LLMs
Python · 6 stars · 1 fork · Updated 2 years ago
Topics: deep-learning, explainable-ai, interpretability, large-language-models, mechanistic-interpretability

Trustworthy-ML-Lab/Steer2Edit
No description provided.
Python · 1 star · 0 forks · Updated 1 month ago

Trustworthy-ML-Lab/CB-LLMs
[ICLR 25] A novel framework for building intrinsically interpretable LLMs with human-understandable concepts to ensure safety, reliability, transparency, and trustworthiness
Python · 31 stars · 18 forks · Updated 1 month ago
Topics: deep-learning, explainable-ai, interpretable-deep-learning, large-language-models, mechanistic-interpretability, natural-language-processing

Trustworthy-ML-Lab/ReflCtrl
No description provided.
Python · 0 stars · 1 fork · Updated 1 month ago

Trustworthy-ML-Lab/ThinkEdit
[EMNLP 25] An effective and interpretable weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study uncovering how reasoning length is encoded in the model's representation space
Python · 17 stars · 1 fork · Updated 3 months ago
Topics: deep-learning, generative-ai, interpretable-machine-learning, large-language-models, mechanistic-interpretability, reasoning-language-models

Trustworthy-ML-Lab/Linear-Explanations
[ICML 24] A novel automated neuron explanation framework that can accurately describe polysemantic concepts in deep neural networks
Jupyter Notebook · 14 stars · 0 forks · Updated 10 months ago
Topics: computer-vision, deep-learning, interpretable-machine-learning, mechanistic-interpretability

Trustworthy-ML-Lab/Training_Trustworthy_LRM_with_Refine
A new training framework for trustworthy large reasoning models
Python · 4 stars · 1 fork · Updated 4 months ago
Topics: deep-learning, faithfulness, interpretability, llms, llms-reasoning, machine-learning, trustworthy-ai

Trustworthy-ML-Lab/Provably-Robust-Conformal-Prediction
[ICLR 24] Proposes RSCP+ to provide a robustness guarantee in evaluation, plus two novel methods, PTT and RCT, that robustify conformal prediction with improved efficiency through post-hoc transformation and training
Python · 5 stars · 1 fork · Updated 1 year ago
Topics: adversarial-machine-learning, deep-learning, deep-neural-networks, robust-machine-learning, robustness

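As background for this entry, the base procedure that such methods robustify is standard split conformal prediction: calibrate a nonconformity-score threshold so that prediction sets cover the true label with probability at least 1 - alpha. A minimal NumPy sketch under hypothetical random calibration data (this illustrates vanilla split conformal only, not RSCP+, PTT, or RCT):

```python
import numpy as np

# Toy split conformal prediction on hypothetical softmax outputs.
rng = np.random.default_rng(2)
alpha = 0.1                      # target miscoverage rate
n_cal, n_classes = 500, 5

# Hypothetical calibration set: class-probability vectors and true labels.
probs = rng.dirichlet(np.ones(n_classes), size=n_cal)
labels = rng.integers(0, n_classes, size=n_cal)

# Nonconformity score: 1 minus the probability assigned to the true class.
scores = 1.0 - probs[np.arange(n_cal), labels]

# Finite-sample-corrected empirical quantile of the calibration scores.
q = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal)

def prediction_set(p):
    # Include every class whose nonconformity score falls within the threshold.
    return np.where(1.0 - p <= q)[0]

test_probs = rng.dirichlet(np.ones(n_classes))
print(prediction_set(test_probs))
```

The robustness question the repository addresses is what happens to this coverage guarantee when the test input is adversarially perturbed, which plain split conformal does not account for.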
Trustworthy-ML-Lab/DSC-210-NLA-FA22
No description provided.
Jupyter Notebook · 1 star · 0 forks · Updated 1 year ago

Trustworthy-ML-Lab/Audio_Network_Dissection
[ICML 24] AND: the first framework to provide automatic natural-language explanations for deep acoustic networks
Jupyter Notebook · 4 stars · 0 forks · Updated 1 year ago
Topics: deep-learning, deep-neural-networks, interpretable-machine-learning, mechanistic-interpretability

Trustworthy-ML-Lab/Neuron_Eval
[ICML 25] A unified mathematical framework to evaluate neuron explanations of deep learning models with sanity tests
Jupyter Notebook · 7 stars · 0 forks · Updated 8 months ago
Topics: computer-vision, deep-neural-networks, explainable-ai, interpretable-deep-learning, large-language-models, mechanistic-interpretability

Trustworthy-ML-Lab/Concept-Bottleneck-LLM
No description provided.
Python · 5 stars · 0 forks · Updated 7 months ago

Trustworthy-ML-Lab/effective_skill_unlearning
[NAACL 25] Two novel, lightweight, and training-free skill unlearning methods for LLMs
Python · 4 stars · 0 forks · Updated 11 months ago
Topics: deep-learning, interpretability, large-language-model, natural-language-processing

Trustworthy-ML-Lab/efficient_neuron_eval
No description provided.
1 star · 0 forks · Updated 9 months ago

Trustworthy-ML-Lab/Describe-and-Dissect
[TMLR 25] An automated method for explaining complex neuron behaviors in deep vision models using large language models
Jupyter Notebook · 10 stars · 2 forks · Updated 1 year ago
Topics: computer-vision, deep-learning, deep-neural-networks, explainable-ai, generative-ai, interpretable-machine-learning, large-language-models, mechanistic-interpretability

Trustworthy-ML-Lab/RAT_MisD
Boosting misclassification detection ability by radius-aware training (RAT)
Python · 0 stars · 0 forks · Updated 1 year ago
Topics: deep-learning, misclassification-detection

Trustworthy-ML-Lab/corrupting_neuron_explanations
[ICCV 23] Evaluating the robustness of neuron explanation methods
Jupyter Notebook · 4 stars · 1 fork · Updated 2 years ago
Topics: computer-vision, deep-learning, deep-neural-networks, interpretable-machine-learning, mechanistic-interpretability, robust-machine-learning, robustness

Trustworthy-ML-Lab/Interpretability-Guided-Defense
[ECCV 24] A new, low-cost test-time defense for DNNs based on neuron-level interpretability methods
Python · 4 stars · 1 fork · Updated 1 year ago
Topics: adversarial-examples, adversarial-machine-learning, computer-vision, deep-learning, interpretability, robustness

Trustworthy-ML-Lab/provable-efficient-dataset-distill-KRR
No description provided.
Python · 1 star · 0 forks · Updated 1 year ago

Trustworthy-ML-Lab/NN-LPK
No description provided.
Python · 2 stars · 0 forks · Updated 1 year ago

Trustworthy-ML-Lab/concept-driven-continual-learning
Official code repository
Jupyter Notebook · 0 stars · 1 fork · Updated 1 year ago
