24 results for “topic:video-representation-learning”
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Awesome papers & datasets specifically focused on long-term videos.
Code for the paper Learning the Predictability of the Future (CVPR 2021)
PyTorch implementation of BEVT (CVPR 2022) https://arxiv.org/abs/2112.01529
[CVPR2023] Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning (https://arxiv.org/abs/2212.04500)
Implementation of "Generating Videos with Scene Dynamics" in TensorFlow
[ICCV 2025] Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning.
Winning SubNetwork (WSN), Fourier Subneural Operator (FSO), Video-Incremental Learning (VIL), Sequential Neural Implicit Representation (NIR)
Official PyTorch implementation of EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens [ICML2024].
👆PyTorch Implementation of JEDi Metric described in "Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality"
Official repository of the “Mask Again: Masked Knowledge Distillation for Masked Video Modeling” (ACM MM 2023)
This is the code accompanying the AAAI 2022 paper "Ranking Info Noise Contrastive Estimation: Boosting Contrastive Learning via Ranked Positives" (https://arxiv.org/abs/2201.11736). The method allows you to use additional ranking information for representation learning.
Actor-agnostic Multi-label Action Recognition with Multi-modal Query [ICCVW '23]
Official repository of the "Fine-grained Key-Value Memory Enhanced Predictor for Video Representation Learning" (ACM MM 2023)
The code for the paper "Efficient Self-Supervised Video Hashing with Selective State Spaces" (AAAI'25).
A paper list of partially relevant video retrieval.
[CVPR 2026] Imagine Before Concentration: Diffusion-Guided Registers Enhance Partially Relevant Video Retrieval.
Official code for CVPR2024 “VideoMAC: Video Masked Autoencoders Meet ConvNets”
Chainer implementation of Networks for Learning Video Representations
[Asilomar 2022] Contextual Explainable Video Representation: Human Perception-based Understanding
📚 Paper Notes (Computer vision)
The official repository for creating the causal action-effect (CAE) dataset for the IJCNLP-AACL 2023 paper: Implicit Affordance Acquisition via Causal Action–Effect Modeling in the Video Domain
The official repository for the IJCNLP-AACL 2023 paper: Implicit Affordance Acquisition via Causal Action–Effect Modeling in the Video Domain
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training