"topic:ai-ops" — Search

82 results for “topic:ai-ops”

incidentfox/incidentfox

AI-powered SRE platform for automated incident investigation

Python · 489 · 53 · Updated 2 weeks ago

ai-ops, ai-sre, cloud, devops, incident-management, observability, on-call

traas-stack/holoinsight (Archived)

HoloInsight is a cloud-native observability platform with a special focus on real-time log analysis and AI integration.

Java · 351 · 71 · Updated 8 months ago

ai-ops, alerting, devops, k8s, log-analysis, logging, metrics, observability, prometheus, traas, trace

GMSSH/GMSSH

GMSSH: Desktop-Grade AI-Driven Operations Terminal. High Performance · Non-Intrusive · AI-Powered

Go · 334 · 41 · Updated 4 days ago

ai, ai-ops, devops, devops-tools, gui, intelligent-ops, linux-administration, ops, shell, ssh, ssh-client, sysadmin, terminal, vps

bgdnvk/clanker

Autonomous systems engineering CLI agent for any cloud environment: AWS, GCP, Cloudflare, etc.

Go · 205 · 12 · Updated 6 hours ago

agent, ai, ai-agent, ai-ops, aws, clanker, clanker-cli, cli, cloudflare, cloudflare-workers, devops, gcp, golang, infrastructure, kubernetes, ml, observability, openclaw, sre, terraform

ucsandman/DashClaw

🛡️ Decision infrastructure for AI agents. Intercept actions, enforce guard policies, require approvals, and produce audit-ready decision trails.

JavaScript · 160 · 33 · Updated 1 day ago

agent-framework, agent-governance, agent-runtime, ai-agents, ai-infrastructure, ai-ops, autogen, crew-ai, decision-engine, developer-tools, hermes, langchain, openclaw

Temaki-AI/clawd-control

🏰 Real-time dashboard for monitoring and managing Clawdbot AI agents

HTML · 115 · 20 · Updated 3 weeks ago

agent-management, ai-agents, ai-ops, claude, clawdbot, dashboard, llm, monitoring

R3gm/InsightSolver-Colab

InsightSolver: Colab notebooks for exploring and solving operational issues using deep learning, machine learning, and related models.

Jupyter Notebook · 102 · 31 · Updated 1 year ago

ai-ops, aiops, autogpt, colab-notebook, colorization, computer-vision, deep-learning, llama-2, llama-cpp, llm, machine-learning, object-detection, stable-diffusion, text-to-speech

avivl/cloud-sre-agent

An autonomous SRE agent that monitors cloud logs across multiple platforms, leveraging AI models from various providers to detect anomalies, perform root cause analysis, and automate remediation by creating GitHub Pull Requests.

Python · 38 · 4 · Updated 1 month ago

ai-agents, ai-ops, automation, aws, cloud, devops, gcp, gemini-ai, google-cloud, incident-response, llm, log-analysis, log-monitoring, platform-engineering, python, resilience, sre, vertex-ai

easyshell-ai/easyshell

Lightweight server management & intelligent ops platform with Docker one-click deployment, batch script execution, web terminal, and AI-powered operations.

Java · 37 · 5 · Updated 21 hours ago

ai-ops, server-management, web-terminal

mverab/metaclaw

Founder-first meta-layer to generate production-ready OpenClaw multi-agent setups from one prompt: agents, skills, workflows, memory, and install docs.

Go Template · 23 · 6 · Updated 2 weeks ago

agent-orchestration, ai-agents, ai-ops, autonomous-agents, developer-tools, founder-tools, llm, metaclaw, multi-agent, openclaw, prompt-engineering, setup-generator, skills, skills-sh, workflow-automation

sherpa-sh/Sherpa-Action

AI that ships your code. Deploy to any cloud with plain English.

21 · 2 · Updated 1 month ago

ai, ai-developer-tools, ai-ops, aws, ci-cd, claude-code, claude-code-plugin, claude-code-plugins, cloudflare, continuous-deployment, deployment, devops, github-action, github-actions, infrastructure-as-code, nextjs, sherpa-sh

petterjuan/agentic-reliability-framework

ARF is an agentic reliability intelligence platform that separates decision intelligence (OSS) from governed execution (Enterprise), enabling autonomous operations with deterministic safety guarantees.

Python · 19 · 5 · Updated 3 weeks ago

ai-agents, ai-infrastructure, ai-ops, anomaly-detection, autonomous-systems, devops, graph-memory, incident-management, mlops-workflow, observability, observability-platform, production-monitoring, python-library, reliability-engineering, self-healing, self-healing-infrastructure, sre

botanu-ai/botanu-sdk-python

SDK to track cost-per-outcome for AI workflows

Python · 15 · 1 · Updated 3 weeks ago

ai-ops, cloud-cost-efficiency, cost-optimization, enterprise-solutions, finops, genai, genai-usecase, llm, machine-learning, observability, opentelemetry, opentelemetry-python, outcomes-analytics, roi-analysis, tracing

Zyling-ai/ZyHive

🐝 引巢 · ZyHive | AI Team Operating System: give AI members a soul and manage multi-agent teams visually. Self-hosted AI Team OS · Go + Vue 3 · One-click deploy

Go · 13 · 3 · Updated 2 days ago

agent-framework, ai-agent, ai-ops, ai-team, chatbot, docker, golang, llm, multi-agent, open-source, self-hosted, vue3

bitsandbrains/ai-ad-creative-strategist

Advanced, end-to-end, enterprise-grade agentic AI pipeline that automates competitor ad intelligence, performs multimodal creative strategy extraction, enables brand-safe adaptation, and generates AI video ads using LLM reasoning, multimodal analysis, and deterministic workflow orchestration with full auditability.

8 · 1 · Updated 2 months ago

ad-intelligence, agentic-ai-development, ai-advertising, ai-content-generation, ai-ops, ai-orchestration, ai-strategy-engine, ai-video-generation, brand-intelligence, creative-intelligence, enterprise-ai, event-driven-architecture, generative-video, growth-engineering, llm-pipelines, marketing-ai-generative, multimodal-ai, n8n-automation, sora-video-ai, workflow-automation

vitas/evidra-lock

MCP Kill-switch for AI agents. Validates infrastructure operations before execution. Fail-closed. Evidence-backed.

Go · 7 · 1 · Updated 1 week ago

ai, ai-agent, ai-guard, ai-integration, ai-ops, ai-ops-guardrails, audit, compliance, devops, devops-tools, evidence, guardrails, kubernetes-security, kustomize, mcp-server, opa, policy-as-code, terraform

alibaba/AIOpsServing

Open source code for AIOpsServing

Python · 6 · 3 · Updated 3 years ago

ai-ops, alicloud-compatible, machine-learning, mlflow-compatible, model-benchmarking, model-serving

Runbook-Agent/RunbookAI

Hypothesis-driven AI agent for incident investigation. AWS, K8s, PagerDuty.

TypeScript · 6 · 0 · Updated 2 weeks ago

ai-ops, aws, claude, devops, incident-response, kubernetes, llm-agent, on-call, on-call-documentation, sre

seehiong/n8n-k8s-monitor

n8n workflows for Kubernetes monitoring, diagnostics, and automated remediation

6 · 0 · Updated 1 week ago

ai-ops, automation, devops, homelab, k8s, kubectl, kubernetes, n8n, platform-engineering, self-healing, talos

LinChuang2008/vigilops

AI-powered open-source monitoring platform with auto-remediation. 6 built-in runbooks, MCP integration (global first), DeepSeek root cause analysis. 5-minute Docker setup.

Python · 5 · 2 · Updated 1 day ago

ai, ai-ops, aiops, alerting, auto-remediation, devops, docker, fastapi, incident-response, infrastructure-monitoring, mcp, monitoring, observability, open-source, react, self-healing, self-hosted

flavienbwk/repochat-action

Deploy an AI-powered chatbot for your repo in under 2 minutes.

JavaScript · 3 · 0 · Updated 1 year ago

ai-ops, chatbot, devops, github-actions, llm, llm-ops

pguso/ai-agents-saas-edition

Real-world patterns for shipping AI agents to production. Learn versioning, cost optimization, multi-tenancy, guardrails, and observability through runnable TypeScript examples.

TypeScript · 3 · 1 · Updated 2 months ago

agent-architecture, agents, ai-agents, ai-ops, ai-testing, guardrails, llms, observability, openai, production, production-ai, prompt-engineering, saas

clay-good/tpu-doc

tpu-doc is a zero-dependency diagnostic binary for Google Cloud TPU environments that instantly validates hardware health, discovers software stack configurations, and provides AI-powered log analysis to eliminate expensive debugging downtime.

Rust · 2 · 0 · Updated 2 months ago

ai-ops, cloud-infrastructure, diagnostics, distrbuted-training, gcp, google-cloud-tpu, hardware-valida, high-performance-computing, jax, llm-ops, machine-learning-infrastructure, mlops, performance-monitoring, pytorch, rust, tensorflow, tpu, troubleshooting

renezander030/fixclaw

Deterministic first. Fetching, filtering, and routing are plain code. AI is only used for judgment calls: classification, drafting, summarization.

Go · 2 · 0 · Updated 10 hours ago

ai-automation, ai-ops, email-assistant, golang, human-in-the-loop, llm, microsoft-365, slack, small-business, yaml-pipelines

omri3193/Enterprise-Multi-AI-Agent-Systems-

🤖 Build and deploy scalable Multi-AI Agent systems with LangGraph and Groq LLMs to enhance intelligence across enterprise applications.

Python · 2 · 0 · Updated 1 hour ago

agentic-ai, agentic-engineering, agentic-framework, agentic-workflow, ai-ops, anomaly-detection, autonomous-agents, autonomous-systems, claude-code, codex, mcp-server, observability-platform, python-library, reliability-engineering, self-healing, self-healing-infrastructure, swarm, swarm-intelligence

khael-kun-cmd/gemini-sre-agent

🚀 Enhance Google Cloud operations with the Gemini SRE Agent, automating log monitoring and incident response for smarter site reliability.

Python · 2 · 1 · Updated 2 hours ago

ai-ops, automation, devops, gemini-ai, google-cloud, incident-response, log-monitoring, python, resilience, sre, vertex-ai

vijayanandmit/auto-runbook

Automated Runbook Engine for Incident Response, CLI, command line documentation, and AI-assisted Operations

Shell · 1 · 0 · Updated 1 month ago

agent, agentic-ai, agents, ai-ops, automation, automation-framework, automation-testing, chatops, devops, incident-response, observability, operations-automation, platform-engineering, playbooks, runbook, self-healing, site, sitereliability, sre, sre-infrastructure

arkon-ai/arkon

The AI Operations Control Plane — monitor, govern, and automate your AI agents

TypeScript · 1 · 0 · Updated 4 days ago

ai-agent-control, ai-agents, ai-governance, ai-ops, control-plane, llm-monitoring, mcp, nemoclaw, openclaw

bluet/arguslm

The hundred-eyed watcher for your LLM providers. Monitor uptime, TTFT, TPS, and latency across OpenAI, Anthropic, Azure, Bedrock, Ollama, LM Studio, and 100+ providers through a single dashboard. Benchmark, compare, and get alerts — all self-hosted.

Python · 1 · 0 · Updated 3 weeks ago

ai-ops, anthropic, dashboard, fastapi, litellm, llm, llm-benchmark, llm-monitoring, llm-ops, lm-studio, mlops, monitoring, observability, ollama, openai, performance-monitoring, python, react, self-hosted, typescript

dongkoony/LLM-Quality-Observer

Production-ready MLOps platform for monitoring and evaluating LLM response quality with automated alerts and real-time analytics

TypeScript · 1 · 0 · Updated 1 week ago

ai-ops, analytics, docker, fastpai, llm, llm-evaluation, mlops, monitoring, openai, openai-api, python
