1,108 results for “topic:clip”
Effortless data labeling with AI support from Segment Anything and other awesome models.
BoxMOT: Pluggable SOTA multi-object tracking modules for segmentation, object detection and pose estimation models
A Chinese version of CLIP that enables Chinese cross-modal retrieval and representation generation.
Open-source push-notification service that needs no app: on iOS 14+, just scan a QR code to use it. Also supports Quick App, iOS and Mac clients, an Android client, and DIY devices.
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.
OpenMMLab Pre-training Toolbox and Benchmark
Chinese NLP solutions (large models, data, models, training, inference).
Collection of AWESOME vision-language models for vision tasks
Image to prompt with BLIP and CLIP
Easily compute CLIP embeddings and build a CLIP retrieval system with them.
🥂 Gracefully face hCaptcha challenge with multimodal large language model.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Rapid Android UI development that tames the quirks of native widgets.
Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
Famous Vision Language Models and Their Architectures
This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models.
Stable Diffusion in NCNN with C++, supporting txt2img and img2img.
Search photos on Unsplash using natural language
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Search inside YouTube videos using natural language
[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.
Official Pytorch Implementation for "Text2LIVE: Text-Driven Layered Image and Video Editing" (ECCV 2022 Oral)
[CVPR'23] OpenScene: 3D Scene Understanding with Open Vocabularies
CLIP + FFT/DWT/RGB = text to image/video
🔥🔥🔥 Free offline AI algorithm toolbox for Java, supporting face recognition, liveness detection, expression recognition, object detection, instance segmentation, pedestrian detection, OCR, license plate recognition, table recognition, ASR+TTS, machine translation, and more — ready to use via a Maven dependency. Supports PyTorch and TensorFlow, with mainstream models already integrated, including Mtcnn, InsightFace, SeetaFace6, YOLOv8~v12, PaddleOCR (PPOCRv5), and Whisper.
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrained models and a diffusion model toolbox, with high performance and flexibility.
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm