40 results for “topic:web-agent”
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Fuji is an AI agent that lives in your browser's side panel. You can now get tasks done online with a single command!
Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"
This is the repo for the paper "OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use" (ACL 2025 Oral).
AI-powered login automation. Uses Claude to classify login pages and Playwright to interact with them.
Web-Use is a browser agent powered by the Chrome DevTools Protocol (CDP).
Official repository for "RLVR-World: Training World Models with Reinforcement Learning" (NeurIPS 2025), https://arxiv.org/abs/2505.13934
Run Surfer-H agents powered by Holo1 using the Surfer-H-CLI. Includes example tasks, scripts, and configurations.
[NAACL2025] LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications
The Library for LLM-based multi-agent applications
vibebin: code and host inside Incus containers on your own VPS/server.
Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval And Synthesis For SLMs
Open-source benchmark for evaluating the performance of web operators/agents
Agent Skill Induction: "Inducing Programmatic Skills for Agentic Tasks"
Code for 🌍 UI-Simulator: LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training
Navi-Bench: benchmarking web agents on everyday tasks directly on real websites
CLI + SDK to automate, scrape, and extract from the web — for AI agents and humans. Cloud or local browser, one command.
Based on the MCP protocol, enables defining MCP servers on the frontend so that AI can operate web applications through natural language.
Secure browser tab control for Claude/Codex
Screen recording and computer interaction capture tool that records keyboard/mouse input, screen video, DOM snapshots, and accessibility trees. Perfect for creating datasets to train and evaluate computer-use AI models.
Python scripts for generating and categorizing web browsing tasks for benchmark datasets
Agent-CE is a containerized continuous evaluation (CE) platform for web browsing agents. It provides production-ready Docker images and CI/CD pipelines for running and evaluating multiple agent frameworks including Browser Use, Notte, Anthropic Computer Use, and OpenAI Computer Use.
A DOM-based browser agent that can tackle anything.
This dataset contains 3,167 completed tasks of human-computer interactions captured with video, screenshots, DOM snapshots, and detailed interaction events. Created by Paradigm Shift AI for advancing computer use AI agent research.
Neurosim is a Python framework for building, running, and evaluating AI agent systems. It provides core primitives for agent evaluation, cloud storage integration, and an LLM-as-a-judge system for automated scoring.
A Chrome extension that helps blind and visually impaired users navigate the web through voice commands. The user speaks their goal; the agent interprets it, acts, and confirms via speech.
🤖 SpidyCrawler: synthetic web traffic agent with anti-detection.
A web application that summarizes the content of any public web page using advanced AI language models.
AI-powered Chrome side panel assistant that understands natural language and performs real actions in your browser.