44 results for “topic:ai-infra”
OpenSandbox is a general-purpose sandbox platform for AI applications, offering multi-language SDKs, unified sandbox APIs, and Docker/Kubernetes runtimes for scenarios like Coding Agents, GUI Agents, Agent Evaluation, AI Code Execution, and RL Training.
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
A full-stack AI Red Teaming platform securing AI ecosystems via AI Infra scan, MCP scan, Agent skills scan, and LLM jailbreak evaluation.
The context backend for AI agents. Durable agent memory you can trust. Build, version, and retrieve grounded context from a context graph.
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
AI fundamentals: GPU architecture, CUDA programming, large-model basics, and AI Agent knowledge
High-performance distributed multi-tier cache system. Built in Rust.
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
HPC tutorials covering collective communication (MPI, NCCL), CUDA programming, SIMD vectorization, RDMA communication, and more
Implement a PyTorch-like deep learning library in C++ from scratch, step by step
Transform your pythonic research code into an artifact that engineers can deploy easily.
This is a landscape of the infrastructure that powers the generative AI ecosystem
Cloud Native ML/DL Platform
A High-Performance LLM Inference Engine with vLLM-Style Continuous Batching
Multi-Level Triton Runner supporting Python, IR, PTX, and cubin.
LLM inference via Triton (flexible & modular): focused on kernel optimization using CUBIN binaries, starting from the gpt-oss model
Agent Sandbox is an E2B-compatible, enterprise-grade, AI-first, cloud-native runtime environment for AI Agents. It lets agents securely run untrusted LLM-generated code, browser use, computer use, shell commands, and more, with stateful, long-running, multi-session, multi-tenant execution.
💥 Make peer-to-peer work globally
KsanaDiT: High-Performance DiT (Diffusion Transformer) Inference Framework for Video & Image Generation
A curated list of awesome tools, frameworks, platforms, and resources for building scalable and efficient AI infrastructure, including distributed training, model serving, MLOps, and deployment.
ElasticMM: Elastic and Efficient MLLM Serving System
TME: Structured memory engine for LLM agents to plan, rollback, and reason across multi-step tasks.
vgpu.rs is a fractional-GPU and vGPU-hypervisor implementation written in Rust
An OpenCL backend for Triton, using mlir-translate to emit OpenCL source code
This repository contains a list of various service-specific Azure Landing Zone implementation options.
Memory Management Service: a long-term memory solution for AI
A distributed cluster orchestrator for AI/ML batch workloads. Orchestrates containers via a custom Rust runtime.
The coordination protocol for autonomous AI agents across networks. Summoner lets you compose, run, and coordinate agents over a WAN with a Python SDK and Rust server.
The Lisa programming language.