77 results for “topic:gpt-4-vision”
Resources, examples & tutorials for multimodal AI, RAG and agents using vector search and LLMs
[New: PDF and Office file parsing and upload] An all-scenario GPT assistant for Android, activated via the volume keys for voice interaction, with support for internet access, taking photos, templates, and parsing PDF and Office documents.
The most advanced Web UI for AI chat
Cool experiments at the intersection of Computer Vision and Sports ⚽🏃
SGPT is a command-line tool that provides a convenient way to interact with OpenAI models, enabling users to run queries, generate shell commands and produce code directly from the terminal.
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
AI Device Template Featuring Whisper, TTS, Groq, Llama3, OpenAI and more
AI agent that can SEE 👁️, control, navigate, & do stuff for you on your browser.
Convert a screenshot to a working Flutter app.
A versatile multi-modal chat application that enables users to develop custom agents, create images, leverage visual recognition, and engage in voice interactions. It integrates seamlessly with local LLMs and commercial models like OpenAI, Gemini, Perplexity, and Claude, and allows users to converse with uploaded documents and websites.
Extract information, summarize, ask questions, and search videos using OpenAI's Vision API 🚀🎦
GPT-4 Vision Chatbot examples
ChatGPT wrapper in your TTY
GPT 4 Turbo Vision with Chainlit
API | GPT-5, GML-4.5, VEO-3, Kling, gpt-4o, Claude 4 opus, command a, Recraft v3, Dalle-3, Stable Diffusion, Flux, Kandinsky, Suno V4.5, Hailuo, TTS
This sample project integrates OpenAI's GPT-4 Vision, with advanced image recognition capabilities, and DALL·E 3, the state-of-the-art image generation model, with the Chat completions API. This powerful combination allows for simultaneous image creation and analysis.
Language instructions to mycobot using GPT-4V
This tool offers an interactive way to analyze and understand your screenshots using OpenAI's GPT-4 Vision API. Capture any part of your screen and engage in a dialogue with ChatGPT to uncover detailed insights, ask follow-up questions, and explore visual data in a user-friendly format.
Curated resources about automated GUI computer-use via LLMs. Highly opinionated, focus is on quality vs quantity.
Using an Azure OpenAI deployment of GPT-4 Turbo with Vision to analyse an out-of-stock situation in a fictitious retail shop.
Object detection using the OpenAI Vision model
A web-based tool that utilizes GPT-4's vision capabilities to analyze and describe system architecture diagrams, providing instant insights and detailed breakdowns in an interactive chat interface.
Capture images with HoloLens and receive descriptive responses from OpenAI's GPT-4V(ision).
An AI-powered Mattermost ChatGPT chatbot that utilizes the OpenAI API to provide helpful, contextual responses to user messages, extract text from links, and describe or generate images. With Docker support!
Use text or image prompts to generate components and apps built with React.
A customizable GPT in a single page, using OpenAI models text-embedding-ada-002, tts-1, whisper-1, dall-e-3, and gpt-4-vision-preview
A simple chat app with vision using Next.js, Vercel AI SDK, and GPT-4V.
Create AI Agents equipped with tools and extensions
OSINT Platform - Provides image analysis, digital footprints, video transcription and more. Retrieval Augmented Generation (RAG) capable platform
Building a Telegram bot with ChatGPT o1, o3-mini, DeepSeek, Claude 3.7, Command-A, MiniMax