GitHunt — Discover GitHub Repositories

Awesome video understanding toolkits based on PaddlePaddle. It supports video data annotation tools, lightweight RGB- and skeleton-based action recognition models, and practical applications for video tagging and sports action detection.

Python · 1.7k · 387 · Updated 2 days ago

actbert, action-detection, action-localization, action-recognition, activitynet, +14

yjxiong/temporal-segment-networks

Code & Models for Temporal Segment Networks (TSN) in ECCV 2016

Python · 1.6k · 474 · Updated 1 week ago

action-recognition, temporal-segment-networks, video-understanding

bytedance/SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

1.4k · 111 · Updated 1 day ago

audio, audio-processing, audio-visual-understanding, bytedance, iclr2024, +10

TheShadow29/awesome-grounding

awesome grounding: A curated list of research papers in visual grounding

1.1k · 105 · Updated 4 weeks ago

arxiv, awesome-list, captioning-images, captioning-videos, computer-vision, +13

yjxiong/tsn-pytorch

Temporal Segment Networks (TSN) in PyTorch

Python · 1.1k · 308 · Updated 1 week ago

action-recognition, deep-learning, pytorch, temporal-segment-networks, video-understanding

PKU-YuanGroup/Chat-UniVi

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Python · 945 · 48 · Updated 4 days ago

image-understanding, large-language-models, video-understanding, vision-language-model

cuixing158/Awesome-CV-MasterHub

:fire: :fire: :fire: A paper list of some recent Computer Vision(CV) works

880 · 50 · Updated just now

awesome, image-captioning, image-classification, image-dehazing, image-denoising, +15

OpenGVLab/VideoMAEv2

[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

Python · 759 · 88 · Updated 1 day ago

action-detection, action-recognition, cvpr2023, foundation-model, self-supervised-learning, +2

yjxiong/action-detection

temporal action detection with SSN

Python · 647 · 175 · Updated 1 month ago

action-detection, action-recognition, structured-segment-networks, temporal-activity-localization, video-understanding

Vision-CAIR/MiniGPT4-video

Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

Python · 641 · 71 · Updated 2 days ago

long-video-understanding, video-question-answering, video-retrieval, video-understanding

waybarrios/vllm-mlx

OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.

Python · 523 · 71 · Updated 9 hours ago

anthropic, apple-silicon, audio-processing, claude-code, computer-vision, +15

henghuiding/MeViS

[ICCV 2023 & TPAMI 2025] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions

Python · 521 · 21 · Updated 6 days ago

mevis-dataset, mose-dataset, multimodal-learning, referring-expression-comprehension, referring-expression-segmentation, +2

HKUDS/VideoAgent

"VideoAgent: All-in-One Agentic Framework for Video Understanding, Editing, and Remaking"

Python · 489 · 71 · Updated 4 hours ago

agents, audio-editing, audio-understanding, llm-agents, notebooklm, +3

yoosan/video-understanding-dataset

A collection of recent video understanding datasets, under construction!

469 · 78 · Updated 1 month ago

action-recognition, computer-vision, datasets, video-understanding

chihyaoma/Activity-Recognition-with-CNN-and-RNN

Temporal Segments LSTM and Temporal-Inception for Activity Recognition

Lua · 444 · 146 · Updated 1 month ago

activity-recognition, convolutional-neural-networks, lstm-neural-networks, torch, video-understanding

Leon1207/Video-RAG-master

✨✨[NeurIPS 2025] This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension"

Python · 402 · 39 · Updated 1 week ago

long-video-understanding, multi-modal-large-language-model, plug-and-play, retrieval-augmented-generation, training-free, +2

MCG-NJU/TDN

[CVPR 2021] TDN: Temporal Difference Networks for Efficient Action Recognition

Python · 381 · 55 · Updated 2 months ago

action-recognition, cvpr2021, pytorch, temporal-modeling, video-classification, +1

SoccerNet/sn-gamestate

[CVPRW'24] SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap (CVPR24 - CVSports workshop)

Python · 381 · 83 · Updated 3 days ago

bird-eye-view, detection, multi-object-tracking, re-identification, soccer, +5

v-iashin/SpecVQGAN

Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)

Jupyter Notebook · 370 · 39 · Updated 2 weeks ago

audio, audio-generation, bmvc, evaluation-metrics, gan, +10

microsoft/DeepVideoDiscovery

Deep Video Discovery (DVD) is a deep-research style question answering agent designed for understanding extra-long videos.

Python · 359 · 13 · Updated 2 days ago

agent, deepresearch, video-processing, video-understanding

boheumd/MA-LMM

(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Python · 347 · 30 · Updated 5 days ago

llm, video-understanding
