Teng Wang
ttengwang
Tencent & HKU. 🌟 Actively looking for research interns in multimodality learning.
Repos: 34 · Stars: 3.4k · Forks: 240 · Top Language: Python
Top Repositories
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything
A curated list of prompt-based papers in computer vision and vision-language learning.
Awesome papers & datasets specifically focused on long-term videos.
End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)
Second-place solution to the dense video captioning task in the ActivityNet Challenge (CVPR 2020 workshop)
Event Sequence Generation Network
Repositories
Awesome papers & datasets specifically focused on long-term videos.
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything
End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)
A curated list of prompt-based papers in computer vision and vision-language learning.
No description provided.
Second-place solution to the dense video captioning task in the ActivityNet Challenge (CVPR 2020 workshop)
Event Sequence Generation Network
Accelerating the development of large multimodal models (LMMs) with lmms-eval
Code for the paper "Event-Centric Hierarchical Representation for Dense Video Captioning" (TCSVT 2020)
EVA Series: Visual Representation Fantasies from BAAI
Show, Tell and Rephrase (TMM)
Recent Advances in Vision and Language Pre-training (VLP)
Reading list for research topics in multimodal machine learning
MERLOT: Multimodal Neural Script Knowledge Models
Must-read papers on prompt-based tuning for pre-trained language models.
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
No description provided.
Prefix-Tuning: Optimizing Continuous Prompts for Generation
No description provided.
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Feature Extractor module for videos using the PySlowFast framework
Python code for CIDEr: Consensus-based Image Caption Evaluation
No description provided.
PyTorch implementation of "Efficient Neural Architecture Search via Parameter Sharing"
Unofficial PyTorch implementation of "Self-critical Sequence Training for Image Captioning" and other methods.
No description provided.
Dense video captioning in PyTorch
Evaluation code for Dense-Captioning Events in Videos
A faster PyTorch implementation of Faster R-CNN
Image captioning codebase in PyTorch (fine-tunable CNN in the "with_finetune" branch; diverse beam search in the "dbs" branch; self-critical training lives in my self-critical.pytorch repository).