1,108 results for “topic:clip”
Effortless data labeling with AI support from Segment Anything and other awesome models.
BoxMOT: Pluggable SOTA multi-object tracking modules for segmentation, object detection and pose estimation models
A Chinese version of CLIP that enables Chinese cross-modal retrieval and representation generation.
Open-source push-notification service that needs no app: on iOS 14+, just scan a QR code to use it. Also supports Quick App, iOS and Mac clients, an Android client, and DIY devices.
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.
OpenMMLab Pre-training Toolbox and Benchmark
Chinese NLP solutions (large models, data, models, training, inference).
Collection of AWESOME vision-language models for vision tasks
Image to prompt with BLIP and CLIP
Easily compute CLIP embeddings and build a CLIP retrieval system with them.
🥂 Gracefully face hCaptcha challenge with multimodal large language model.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Rapid Android UI development that tames the quirks of native widgets.
Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
Famous Vision Language Models and Their Architectures
This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models.
Stable Diffusion in NCNN with C++, supporting txt2img and img2img.
Search photos on Unsplash using natural language
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Search inside YouTube videos using natural language
[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.
Official Pytorch Implementation for "Text2LIVE: Text-Driven Layered Image and Video Editing" (ECCV 2022 Oral)
[CVPR'23] OpenScene: 3D Scene Understanding with Open Vocabularies
CLIP + FFT/DWT/RGB = text to image/video
🔥🔥🔥 Free offline AI algorithm toolbox for Java, supporting face recognition, liveness detection, expression recognition, object detection, instance segmentation, pedestrian detection, OCR, license plate recognition, table recognition, ASR+TTS, machine translation, and more — ready to use via a Maven dependency. Supports PyTorch and TensorFlow, with mainstream models already integrated, including Mtcnn, InsightFace, SeetaFace6, YOLOv8~v12, PaddleOCR (PPOCRv5), and Whisper.
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrained models and a diffusion model toolbox, with high performance and flexibility.
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm