RustyGPT

RustyGPT is a workspace of Rust crates that together provide a chat assistant server, a Yew web UI, and a command line interface. The project focuses on end-to-end Rust implementations for authentication, threaded conversations, Server-Sent Event (SSE) streaming, and local LLM execution through a pluggable llama.cpp provider.

Workspace layout

| Crate | Purpose |
| --- | --- |
| `rustygpt-server` | Axum HTTP server with authentication, rate limiting, SSE streaming, and OpenAPI documentation. |
| `rustygpt-web` | Yew single-page application that consumes the server APIs and renders threaded conversations. |
| `rustygpt-cli` | Command line client for logging in, inspecting conversations, following SSE streams, and running the server locally. |
| `rustygpt-shared` | Shared models, configuration loader, and llama.cpp integration code reused by all binaries. |
| `rustygpt-doc-indexer` | Helper used by the docs build to generate the machine-readable index. |
| `rustygpt-tools/confuse` | Development helper that runs frontend/backend watchers via the `just dev` recipe. |

Other notable directories include scripts/pg for schema/procedure SQL, deploy/grafana for metrics dashboards, and docs for the mdBook documentation.

Capabilities

- Threaded conversations – `/api/conversations` and `/api/threads` endpoints manage conversation membership, invites, roots, and replies (`rustygpt-server/src/handlers/{conversations,threads}.rs`).
- Streaming updates – `conversation_stream` in `handlers/streaming.rs` broadcasts `ConversationStreamEvent` values over SSE at `/api/stream/conversations/:conversation_id`, with optional PostgreSQL persistence configured through `[sse.persistence]`.
- Authentication – cookie-backed sessions, refresh, and logout flows (see `handlers/auth.rs`) plus optional GitHub or Apple OAuth handlers when the relevant environment variables are present. First-time setup uses `/api/setup` to create the initial administrator (`handlers/setup.rs`).
- Rate limiting – `middleware::rate_limit` enforces per-route buckets populated from the database using stored procedures in `scripts/pg/procs/034_limits.sql`. Admin APIs under `/api/admin/limits/*` allow live updates when `rate_limits.admin_api_enabled` and `features.auth_v1` are enabled.
- Local LLM inference – `AssistantService` streams replies via llama.cpp models configured under `[llm]` in `config.toml`, with metrics such as `llm_model_cache_hits_total` and `llm_model_load_seconds`.
- Observability – Prometheus counters and gauges for health checks, bootstrap progress, rate limiting, and LLM usage, plus `/metrics`, `/healthz`, and `/readyz` endpoints. Grafana dashboards live in `deploy/grafana/`.
- Typed configuration – `rustygpt-shared::config::server::Config` loads layered TOML/YAML/JSON files with environment overrides (e.g. `RUSTYGPT__SERVER__PORT`). The template `config.example.toml` documents all sections.
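
To make the layered configuration idea concrete, here is a minimal sketch of how an override such as `RUSTYGPT__SERVER__PORT` could be mapped onto a nested key and applied on top of file values. This is not the loader from `rustygpt-shared`. The helper names `env_to_key` and `resolve`, the dotted-key representation, and the rule that a double underscore separates nesting levels are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: turns `RUSTYGPT__SERVER__PORT` into `server.port`.
/// Assumes the `RUSTYGPT__` prefix and `__` as the nesting separator.
/// Returns `None` for variables that do not carry the prefix.
fn env_to_key(var: &str) -> Option<String> {
    let rest = var.strip_prefix("RUSTYGPT__")?;
    if rest.is_empty() {
        return None;
    }
    let key = rest
        .split("__")
        .map(|segment| segment.to_ascii_lowercase())
        .collect::<Vec<_>>()
        .join(".");
    Some(key)
}

/// Hypothetical resolver: applies file layers in order (later layers win),
/// then applies environment overrides last.
fn resolve(
    layers: &[BTreeMap<String, String>],
    env: &[(&str, &str)],
) -> BTreeMap<String, String> {
    let mut merged = BTreeMap::new();
    for layer in layers {
        for (key, value) in layer {
            merged.insert(key.clone(), value.clone());
        }
    }
    for &(name, value) in env {
        if let Some(key) = env_to_key(name) {
            merged.insert(key, value.to_string());
        }
    }
    merged
}
```

Under these assumptions, a file layer that sets `server.port` to `8080` combined with `RUSTYGPT__SERVER__PORT=9000` resolves to `9000`. Check `config.example.toml` and the shared crate for the actual precedence rules.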

Quick start

Install prerequisites
- Rust 1.81+ (rustup default stable)
- just, cargo-watch, and trunk
- PostgreSQL 15+ (local install or Docker)
- Optional: llama.cpp-compatible model files for streaming replies
Create a configuration file
```
cp config.example.toml config.toml
```
Adjust values as needed. For a full local experience set:
```
[features]
auth_v1 = true
sse_v1 = true
well_known = true
```
Ensure [db].url points to your PostgreSQL instance and that the database already exists.
Start PostgreSQL
You can use the provided Compose service:
```
docker compose up postgres -d
```
The server automatically runs the bootstrap SQL in scripts/pg on startup.
Run the backend
```
just run-server
```
The process listens on http://127.0.0.1:8080 by default.

Perform first-time setup
POST to `/api/setup` once to create the initial admin account:
```
curl -X POST http://127.0.0.1:8080/api/setup \
  -H 'Content-Type: application/json' \
  -d '{"username":"admin","email":"admin@example.com","password":"change-me"}'
```

Run the web client
```
just web-serve
```
The SPA proxies API requests to the backend and renders conversations, presence, and streaming updates.
Use the CLI
```
just cli login
just cli chat --conversation <uuid>
just cli follow --root <thread-uuid>
```
Commands reuse the same configuration loader and session cookies as the server. See rustygpt-cli/src/main.rs for the full list of subcommands (serve, chat, reply, follow, spec, completion, config, login, me, logout).
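
The `follow` subcommand reads a Server-Sent Events stream. As a rough illustration of the wire format involved, the sketch below parses a single SSE frame (the lines that precede the blank separator) using only the standard `event:`, `id:`, and `data:` fields. It is not the parser used by `rustygpt-cli`. The `SseEvent` type, the `parse_sse_frame` function, and the event name used in the usage note are hypothetical.

```rust
/// One parsed Server-Sent Event frame (hypothetical type for illustration).
#[derive(Debug, Default, PartialEq)]
struct SseEvent {
    event: Option<String>,
    id: Option<String>,
    data: String,
}

/// Parses the text of a single frame. Returns `None` when the frame carries
/// no `data:` line, for example a comment-only keep-alive.
fn parse_sse_frame(frame: &str) -> Option<SseEvent> {
    let mut parsed = SseEvent::default();
    let mut data_lines: Vec<&str> = Vec::new();

    for line in frame.lines() {
        // Blank lines and lines starting with ':' (comments) carry no fields.
        if line.is_empty() || line.starts_with(':') {
            continue;
        }
        // "field: value" -- a single leading space in the value is dropped.
        let (field, value) = match line.split_once(':') {
            Some((f, v)) => (f, v.strip_prefix(' ').unwrap_or(v)),
            None => (line, ""),
        };
        match field {
            "event" => parsed.event = Some(value.to_string()),
            "id" => parsed.id = Some(value.to_string()),
            "data" => data_lines.push(value),
            _ => {} // unknown fields are ignored
        }
    }

    if data_lines.is_empty() {
        return None;
    }
    // Multiple data lines are joined with newlines.
    parsed.data = data_lines.join("\n");
    Some(parsed)
}
```

For example, a frame with `event: message.delta` (a made-up event name), `id: 42`, and one `data:` line yields an `SseEvent` with those three values, while a `: keep-alive` comment yields `None`. The actual payload shapes are defined by `ConversationStreamEvent` in the source.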

Observability

Metrics are exposed at /metrics after calling server::metrics_handle(). Key instruments include:

| Metric | Description |
| --- | --- |
| `health_checks_total{endpoint,status}` | Count of `/healthz` and `/readyz` responses. |
| `db_bootstrap_batches_total{stage,status}` / `db_bootstrap_script_duration_seconds{stage,status}` | Bootstrap progress per SQL stage (`schema`, `procedures`, `indexes`, `seed`). |
| `db_liveness_checks_total{status}` / `db_readiness_checks_total{status}` | Database readiness probes. |
| `db_pool_max_connections`, `db_statement_timeout_ms` | Gauges reflecting the active configuration. |
| `http_rate_limit_requests_total{profile,result}` | Requests allowed or denied by the rate limit middleware. |
| `http_rate_limit_remaining{profile}` / `http_rate_limit_reset_seconds{profile}` | Current token state per bucket. |
| `rustygpt_limits_profiles`, `rustygpt_limits_assignments` | Gauges updated when admin routes reload configuration. |
| `llm_model_cache_hits_total{provider,model}` / `llm_model_load_seconds{provider,model}` | llama.cpp model cache activity. |
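
When scraped, each labelled instrument appears as one line per label combination in the Prometheus text exposition format. The sketch below renders such a line so you know what to look for in the `/metrics` output. It does not reflect how the server records metrics. `render_sample` and `escape_label_value` are hypothetical helpers, and the `status` value `ok` in the usage note is an assumed example.

```rust
/// Escapes a label value for the Prometheus text format:
/// backslash, double quote, and newline must be escaped.
fn escape_label_value(value: &str) -> String {
    value
        .replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n")
}

/// Renders one sample line: `name{label="value",...} value`.
/// Labels are emitted in the order given.
fn render_sample(name: &str, labels: &[(&str, &str)], value: u64) -> String {
    if labels.is_empty() {
        return format!("{} {}", name, value);
    }
    let rendered: Vec<String> = labels
        .iter()
        .map(|(key, val)| format!("{}=\"{}\"", key, escape_label_value(val)))
        .collect();
    format!("{}{{{}}} {}", name, rendered.join(","), value)
}
```

With labels `endpoint="/healthz"` and `status="ok"` and a value of 3, `render_sample("health_checks_total", ...)` produces `health_checks_total{endpoint="/healthz",status="ok"} 3`.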

Import the Grafana dashboards in deploy/grafana/*.json to visualise these metrics.
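
The `http_rate_limit_remaining` and `http_rate_limit_reset_seconds` gauges report token state per bucket. To help interpret them, here is a minimal token bucket sketch. The algorithm actually used by `middleware::rate_limit` is not described in this README, so the `TokenBucket` type, its fields, and the meaning given to "reset seconds" (time until the bucket is full again) are assumptions. Time is passed in explicitly to keep the example deterministic.

```rust
/// Hypothetical token bucket; not the server's implementation.
#[derive(Debug)]
struct TokenBucket {
    capacity: f64,
    refill_per_sec: f64, // must be greater than zero
    tokens: f64,
    last_refill_secs: f64,
}

impl TokenBucket {
    /// Creates a full bucket at time `now_secs`.
    fn new(capacity: u32, refill_per_sec: f64, now_secs: f64) -> Self {
        let capacity = f64::from(capacity);
        TokenBucket {
            capacity,
            refill_per_sec,
            tokens: capacity,
            last_refill_secs: now_secs,
        }
    }

    /// Adds tokens for the time elapsed since the last refill, capped at capacity.
    fn refill(&mut self, now_secs: f64) {
        if now_secs > self.last_refill_secs {
            let elapsed = now_secs - self.last_refill_secs;
            self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
            self.last_refill_secs = now_secs;
        }
    }

    /// Consumes one token if available; returns whether the request is allowed.
    fn try_acquire(&mut self, now_secs: f64) -> bool {
        self.refill(now_secs);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }

    /// Whole tokens left (what a "remaining" gauge might report).
    fn remaining(&self) -> u64 {
        self.tokens.floor() as u64
    }

    /// Seconds until the bucket is full again (one possible "reset" meaning).
    fn reset_seconds(&self) -> f64 {
        (self.capacity - self.tokens) / self.refill_per_sec
    }
}
```

With a capacity of 2 and a refill rate of one token per second, two immediate requests are allowed, a third is denied, and a request one second later is allowed again.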

Documentation

The mdBook at `docs/` covers architecture, API reference, configuration keys, and operational guides. Run `just docs-serve` to preview it locally or browse the published version via GitHub Pages.

Contributing

Contributions are welcome! Please review `CONTRIBUTING.md` and the code of conduct before opening a pull request. Run `just check` and `just test` before submitting changes. Report security concerns through the process described in `SECURITY.md`.

License

RustyGPT is available under the Apache 2.0 license. See LICENSE for details.
