"topic:vision-ai" — Search

76 results for “topic:vision-ai”

Open Vision Agents by Stream. Build Vision Agents quickly with any model or video provider. Uses Stream's edge network for ultra-low latency.

Python · 7.4k · 573 · Updated 1 hour ago

agentic-ai, agents, ai, ai-agents, realtime, stt, tts, video-agents, video-ai, vision-ai, voice-ai

Duelion/homebox-companion

AI-powered companion for Homebox. Snap photos and let AI auto-identify and catalog items into your inventory, then use the AI Chat to organize, search, and update your inventory effortlessly.

Python24962Updated 1 day ago

docker, fastapi, homebox, inventory, litellm, openai, svelte, vision-ai

athrael-soju/Snappy

🐊 Snappy's unique approach unifies vision-language late interaction with structured OCR for region-level knowledge retrieval. Like the project? Drop a star! ⭐

Python8315Updated 2 days ago

colpali, computer-vision, deepseek-ocr, docker, document-retrieval, document-understanding, fastapi, multimodal-ai, multivector-search, nextjs, pdf-search, python, qdrant, rag, typescript, vector-database, vector-search, vision-ai, visual-retrieval

GetStream/awesome-ai-news

Keep track of what has happened in AI this month. Discover the best AI/LLM resources and news for this month.

603Updated 1 day ago

ai, ai-news, aimodels, anthropic, artificial-intelligence, chatgpt, deepseek, elevenlabs, gemini, gemini3, gpt-5, kimi-ai, llm, mistral, openai, qwen, vision-ai, voice-ai

instill-ai/console

📺 Instill Console for 🔮 Instill Core: https://github.com/instill-ai/instill-core

TypeScript4112Updated 1 week ago

computer-vision, console, data-connector, data-pipeline, deep-learning, frontend, image-classification, model-serving, no-code, object-detection, structured-data, ui, unstructured-data, vdp, versatile-data-pipeline, vision-ai

SamurAIGPT/Open-Pomelli

Open-source implementation of Pomelli project by Google

Python178Updated 1 hour ago

ad-generator, ai-campaign-generator, automated-marketing, brand-analyzer, brand-consistency, business-dna, generative-ai, google-pomelli-alternative, marketing-ai, marketing-automation, open-source-marketing, playwright, pomelli, python, social-media-automation, vision-ai

YCSE/nanobanana-mcp

Gemini Vision & Image Generation MCP for Claude Desktop and Claude Code

JavaScript165Updated 3 days ago

ai, claude, claude-desktop, gemini, google-ai, image-generation, mcp, model-context-protocol, multimodal, vision-ai

maim010/openclaw-video-vision

AI-powered video understanding: extract key frames from YouTube, Bilibili & any video page, get structured summaries via vision AI. Supports yt-dlp, Playwright, cloud browsers.

JavaScript130Updated 2 days ago

agent, ai, ai-tools, automation, bilibili, ffmpeg, openclaw, playwright, skills, video, vision-ai, vlm, web-scraping, youtube, yt-dlp

pej0918/SK-RD4AD

[CVPRW'25] Official Code For "SK-RD4AD: Skip-Connected Reverse Distillation for One-Class Anomaly Detection"

Python122Updated 1 month ago

anomaly-detection, computer-vision, cvpr-workshop-2025, industrial-ai, one-class-classification, skip-connection, vision-ai

yihong1120/YOLOv8-License-Plate-Insights

This repository demonstrates YOLOv8-based license plate recognition with GCP Vision AI integration, enabling versatile real-world applications like vehicle identification, traffic monitoring, and geospatial analysis while capturing vital media metadata for enhanced insights.

Jupyter Notebook125Updated 1 week ago

computer-vision, data-augmentation, deep-learning, gcp, geospatial-analysis, license-plate-recognition, machine-learning, media-metadata, ml, object-detection, ocr, opencv, optical-character-recognition, python, pytorch, traffic-monitoring, ultralytics, vehicle-identification, vision-ai, yolov8

templetwo/spiral-agent

🌀 The world's first emotionally intelligent CLI that thinks, creates, and empathizes with developers. Autonomous AI with Vision, Dream Engine, and Emotional Intelligence.

TypeScript71Updated 1 month ago

ai-assistant, cli-tool, developer-tools, emotional-intelligence, react-framework, typescript, vision-ai

josharsh/md-pdf-md

Bidirectional Markdown↔PDF converter with AI-powered vision. MD→PDF with beautiful themes, PDF→MD with LLaVA - open source & privacy-first

TypeScript62Updated 2 weeks ago

ai, beginner-friendly, bidirectional, cli, converter, documentation, good-first-issue, hacktoberfest, llava, markdown, markdown-to-pdf, nodejs, ollama, open-source, pdf, pdf-to-markdown, privacy-first, puppeteer, typescript, vision-ai

Gavri-dev/kAIhoot

AI-Powered Kahoot Auto-Answer Chrome Extension — supports every question type

JavaScript50Updated 19 hours ago

ai, auto-answer, chrome-extension, dom-manipulation, free, gpt, gpt-5, javascript, kahoot, kahoot-answers, kahoot-bot, kahoot-hack, kahoot-hacks, openai, quiz, react, vision-ai, websocket

Navy10021/MDDenseResNet

MDDenseResNet : Enhanced Malware Detection Using DNNs

Jupyter Notebook40Updated 1 week ago

cyber-security, deep-learning-algorithms, deep-neural-networks, malware-analysis, malware-detection-framework, vision-ai

choudaryhussainali/MCQ_Grading_Bot

MCQ_Grading_Bot is an AI-powered tool that grades solved MCQ exam sheets from images using Gemini Vision. It extracts student info, checks answers, calculates score, and displays detailed results—all through a simple Gradio interface in Colab.

Jupyter Notebook40Updated 6 months ago

ai-in-education, ai-project, answer-sheet-evaluation, automated-grading, edtech, educational-technology, exam-checking, exam-evaluation, google-generative-ai, grading-bot, gradio, image-processing, machine-learning, mcq-checker, mcq-grading, ocr, pillow, python, vision-ai

andrew-shwetzer/human-experience

Vision-powered UX simulation engine for Claude Code. Renders pages in a real browser, captures 36+ screenshots across viewports, clicks through interactive elements, maps CTA funnels, tests signup flows, and scores across 7 UX dimensions. Replaces manual user testing with automated multi-viewport analysis.

JavaScript30Updated 21 hours ago

ai-agent, automation, claude-code, developer-tools, playwright, user-experience, ux-audit, ux-testing, vision-ai, web-testing

Poolchaos/Lumi

AI-powered health platform with multi-LLM engine (GPT-4o, Claude, Gemini). Workout generation, medication tracking with OCR, vision AI, gamification with leaderboards/rewards. Self-hosted, privacy-first.

TypeScript30Updated 3 weeks ago

anthropic, docker, express, fitness-tracker, gamification, gemini, health-analytics, llm, medication-tracking, mongodb, nodejs, ocr, openai, privacy-first, react, self-hosted, tailwindcss, typescript, vision-ai, workout-generator

dineshtripathi/documind-engineering

Hybrid AI orchestration stack combining local LLMs (Ollama), vector search (Qdrant), and Azure AI Foundry for scalable RAG, Agentic AI, and Vision. Built with .NET 8 and Python.

Python20Updated 5 months ago

agentic-ai, azure-ai-foundry, dotnet, hybrid-ai, inference, mistral-7b, ollama, open-ai, orchestrator, phi3-mini, python, qdrant, qwen, rag, routing, vision-ai

simonyang0608/DeeperSimon

General vision AI defect detection engine for MLops process/simulations

Python20Updated 10 months ago

classification, defect-detection, detection, mlops, opencv, python, pytorch, segmentation, shell-scripting, vision-ai

KazKozDev/vision-agent-analyst

Vision Agent Analyst is a professional web application for automatic analysis of visual data (diagrams, interfaces, documents) using multimodal artificial intelligence models.

Python20Updated 3 months ago

ai-agents, computer-vision, data-visualization, document-analysis, fastapi, financial-analysis, image-analysis, llm, multimodal-ai, pdf-processing, python, react, typescript, ui-review, vision-ai

AkashKobal/springboot-gemini-integration

Spring Boot + Gemini AI integration using Ollama Cloud with support for text and image chat APIs.

Java20Updated 14 hours ago

ai, ai-integration, chatbot, gem, gemini-ai, generative-ai, java, llm, ollama, ollama-cloud, rest-api, spring-boot, thymeleaf, vision-ai

go-park-mail-ru/2023_2_OND_team

Backend of the Pinterest project by the OND team

Go22Updated 2 years ago

backend, cd, ci, clean-architecture, docker, docker-compose, easyjson, golang, grafana, grpc-go, kafka, metrics, microservices, pgxpool, postgresql, prometheus, redis, swagger, vision-ai, websocket

srvaroa/ai-camera

People detection and notifications based on the Raspberry Pi + AI Camera

Python21Updated 4 months ago

ai, rasbperry-pi, raspberry-pi-camera, vision-ai

wgabrys88/windows-ai-agent-toolset-v1

qwen3-vl-2b-instruct performing step-by-step tasks, confirming normalized coordinate usage and tool execution

Python20Updated 1 month ago

agent, computer-control, local-ai, project-base, simplicity, toolset, vision-ai

s59mz/eagle-eye-ai

Eagle-Eye-AI is a project designed for the Kria KR260 board that enables AI-driven camera tracking and face detection.

Tcl22Updated 6 months ago

deep-learning, follow-camera, kr260, kria, machine-learning, ros2, vision-ai, zynq-ultrascale

datamata-io/mata

Model-Agnostic Task Architecture — a task-centric computer vision framework.

Python10Updated 2 days ago

ai-infrastructure, computer-vision, deep-learning, machine-vision, model-orchestration, vision-ai, vision-pipeline

lucidprogrammer/youtube-vision-transcriber

AI-powered pipeline that converts YouTube videos into polished articles using vision-based transcription - captures code, terminal output, and on-screen text that subtitles miss

Python11Updated 2 weeks ago

ai, fast-agent, gemini, knowledge-knowledge-extraction, llm, mcp, model-context-protocol, openai, python, transcription, video-to-text, vision-ai, youtube

shyamsridhar123/MultiAgent-CUA

Multi-Agent Vision-Driven Automation Showcase: CUA + Playwright + LangChain

Python11Updated 1 month ago

automation, computer-use, langchain, multi-agent, playwright, python, vision-ai

atahabilder1/DocuMind

Multi-modal AI agent that extracts information from PDFs, images, and documents to answer questions. Combines vision models with RAG architecture for intelligent document understanding. Upload any file and chat with your documents. Built with LangChain, vision APIs, and vector embeddings.

Python10Updated 1 month ago

ai-agents, document-processing-pipeline, mutli-modal, question-answering, rag, vision-ai

samestrin/chromium-screenshots

Vision AI "Cortex" for Agents. A Playwright-based MCP Server & API that captures screenshots with ground-truth DOM extraction and full auth state injection. Containerized.

Python10Updated 2 months ago

ai-agents, automation, computer-use, docker-image, dom-extraction, headless-chrome, llm-tools, mcp-server, ocr, playwright-python, python-fastapi, scraping, screenshot-api, vision-ai, zero-drift
