62 results for “topic:multimodal-fusion”
Semantic Segmentation for Remote Sensing
SuperYOLO is accepted by TGRS
Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images - ICCV 2021
Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020)
This repository contains the official implementation of the paper "Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis", accepted at EMNLP 2021.
MIntRec: A New Dataset for Multimodal Intent Recognition (ACM MM 2022)
E2E-MFD-OOD
Code for selecting an action based on multimodal inputs; in this case, the inputs are voice and text.
[AAAI-2026] Patho-AgenticRAG: Towards Multimodal Agentic Retrieval-Augmented Generation for Pathology VLMs via Reinforcement Learning
Creating multimodal multitask models
Multimodal sentiment analysis using hierarchical fusion with context modeling
[CVAMD 2021] "End-to-End Learning of Fused Image and Non-Image Feature for Improved Breast Cancer Classification from MRI"
Few-shot malware classification using fused features of static and dynamic analysis (a few-shot malware classification framework based on hybrid features from static and dynamic analysis)
Multimodal object tracking and scene analytics for highly actionable, real-world contextualized data
[2025] ModalFormer: Multimodal Transformer for Low-Light Image Enhancement
Multimodal sentiment analysis
[NeurIPS 2025] Implementation of the paper "InfMasking: Unleashing Synergistic Information by Contrastive Multimodal Interactions".
Official implementation of "Multi-scale Bottleneck Transformer for Weakly Supervised Multimodal Violence Detection"
This repository contains the dataset and baselines described in the paper "M2H2: A Multimodal Multiparty Hindi Dataset for Humor Recognition in Conversations".
FusionBrain Challenge 2.0: creating multimodal multitask models
E2E-MFD-HOD
VAPOR: Legged Robot Navigation in Outdoor Vegetation using Offline Reinforcement Learning (ICRA2024)
Deep-HOSeq: Deep Higher-Order Sequence Fusion for Multimodal Sentiment Analysis.
Repo for "Centaur: Robust Multimodal Fusion for Human Activity Recognition"
Source code for the paper "Automatic fused multimodal deep learning for plant identification" (Alfreds Lapkovskis, Natalia Nefedova & Ali Beikmohammadi, 2025)
Code for the paper "A Novel Cross Fusion Model with Fine-grained Detail Reconstruction for Remote Sensing Image Pan-sharpening", TGSI 2024.
A Transferability-guided Protein-Ligand Interaction Prediction Method
Official Pytorch Implementation of our paper: GAF-Net: Video-Based Person Re-Identification via Appearance and Gait Recognitions
[TGRS2025] This is the official PyTorch implementation of "PAD: Phase-Amplitude Decoupling Fusion for Multi-Modal Land Cover Classification"
Contributed to a vision-driven accessibility tool translating sign language into text