⚖️ AI Legal Assistant — a modular, explainable legal-tech platform that uses Django and GPT-4o to analyze contracts, extract clauses, assess risk, and generate transparent summaries with case-level reasoning and evidence provenance.

AI Legal Assistant

AI Legal Assistant is a modular Django + React project for document-level legal analysis with explainable findings.

Summary

AI Legal Assistant demonstrates how AI can support document-level legal review without obscuring how conclusions were reached. Users upload a legal document, start an asynchronous review run, and retrieve structured findings produced by deterministic rule checks plus schema-constrained LLM analysis. Each run persists status, chunk-level provenance, evidence spans, and optional recommendations and embeddings so the review process remains inspectable, retry-safe, and auditable after execution.

For a fuller description of product behavior, see Project Overview. For system design details, see the Software Architecture Document in docs/architecture.md.

Current Capabilities (Phase 2 + Phase 1.5 Complete)

The project currently supports:

  • Uploading legal documents (.txt, .pdf, .csv, .xlsx)
  • Clause extraction
  • Deterministic rule checks plus LLM analysis
  • Always-async review execution (POST /v1/review/run returns run_id)
  • Review run lifecycle tracking (queued, running, succeeded, failed, partial)
  • Idempotency keys, concurrency caps, and request rate limits for review execution
  • Persisted chunk artifacts with stable chunk_id provenance
  • Run-level instrumentation (token_usage, stage_timings, cache hit/miss fields)
  • Persisting review runs and findings
  • Retrieving findings by document (optionally by run), with pagination/sorting
  • Optional finding recommendations and persisted embeddings
  • pgvector bootstrap support for Postgres deployments
  • A minimal React + TypeScript UI for upload, run, and findings review

Tech Stack

  • Backend: Django 5 + Django REST Framework
  • LLM: OpenAI API with strict JSON schema validation
  • Auth library available: djangorestframework-simplejwt
  • Database:
    • SQLite for local dev by default
    • PostgreSQL 16 in Docker Compose
    • pgvector bootstrap migration for Postgres (vector extension + vector index/column)
  • Frontend: React 18 + TypeScript + Vite
  • Container orchestration: Docker Compose
  • Async jobs: Celery + Redis

Repository Layout

ai-legal-assistant/
|-- apps/
|   |-- accounts/
|   |-- documents/
|   `-- review/
|-- backend/
|-- docker/
|-- docs/
|-- frontend/
|-- docker-compose.yml
|-- Dockerfile
|-- manage.py
`-- requirements.txt

API Endpoints (Implemented)

  • GET / - health check
  • POST /v1/documents/upload - upload/ingest a document
  • POST /v1/review/run - enqueue clause extraction + rules + LLM analysis (returns run_id)
  • GET /v1/review-runs/{id} - retrieve run status/progress for a review run
  • GET /v1/documents/{id}/findings - retrieve findings for the latest run
  • GET /v1/documents/{id}/findings?run_id=<uuid> - retrieve findings for a specific run
  • GET /v1/documents/{id}/findings?page=1&page_size=50&ordering=-created_at - paginated/sorted retrieval

Async Run Semantics

  • POST /v1/review/run response codes:
    • 202 Accepted: run was queued and task enqueue succeeded
    • 200 OK: the idempotency key matched an existing unexpired run, which is reused
    • 409 Conflict: the idempotency key exists but has expired (older than 24h)
    • 429 Too Many Requests: concurrency or rate limit reached
    • 503 Service Unavailable: enqueue failed
  • Run status values:
    • queued, running, succeeded, failed, partial
  • Findings retrieval query params:
    • run_id=<uuid> (optional)
    • page=<int> and page_size=<int> (optional)
    • ordering=<field> where field is one of created_at, severity, source, confidence (prefix with - for descending)
  • Partial-result policy:
    • If the deterministic stages succeed but the LLM stage fails or times out, the run is marked partial.
    • Rule findings are still persisted and retrievable for that run.
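The run lifecycle above can be sketched as a small polling helper. This is illustrative only: get_status stands in for whatever wrapper a client puts around GET /v1/review-runs/{id}.

```python
import time

# Terminal statuses per the lifecycle above; "partial" still has
# retrievable rule findings, so it is terminal, not retryable.
TERMINAL = {"succeeded", "failed", "partial"}

def wait_for_run(get_status, *, interval: float = 1.0, timeout: float = 300.0) -> str:
    """Poll a review run until it reaches a terminal state.

    get_status is any callable returning one of: queued, running,
    succeeded, failed, partial.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in TERMINAL:
            return status
        time.sleep(interval)
    raise TimeoutError("review run did not reach a terminal state in time")
```

Because of the partial-result policy, callers should treat partial as "fetch the findings anyway" rather than as a hard failure.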

Run Locally (Backend + Frontend)

  1. Create and activate a virtual environment.
  2. Install backend dependencies.
  3. Apply migrations.
  4. Run the backend.
  5. Install frontend dependencies and run Vite.
python -m venv .venv
.\.venv\Scripts\activate        # Windows; on macOS/Linux use: source .venv/bin/activate
pip install -r requirements.txt
python manage.py migrate
python manage.py runserver

In a second terminal:

cd frontend
npm install
npm run dev

In a third terminal (required for async review execution):

celery -A backend worker -l info

URLs:

  • Backend: http://localhost:8000
  • Frontend: http://localhost:5173

Run with Docker Compose

docker compose up --build

Compose services:

  • db (PostgreSQL 16)
  • redis (Celery broker/result backend)
  • web (Django API on port 8000)
  • worker (Celery worker)
  • frontend (Vite dev server on port 5173)

Environment

Use .env (or copy from .env.example) for configuration:

  • LLM_PROVIDER (mock or openai)
  • OPENAI_API_KEY
  • OPENAI_MODEL
  • OPENAI_EMBEDDING_MODEL
  • DB_NAME, DB_USER, DB_PASSWORD, DB_HOST, DB_PORT
  • CELERY_BROKER_URL, CELERY_RESULT_BACKEND
  • REVIEW_MAX_CONCURRENT_RUNS, REVIEW_RATE_LIMIT_PER_MINUTE
  • REVIEW_ENABLE_PIPELINE_CACHE, REVIEW_CACHE_TTL_SECONDS
  • REVIEW_ENABLE_EMBEDDINGS, REVIEW_EMBEDDING_PROVIDER, REVIEW_EMBEDDING_DIM
  • REVIEW_FINDINGS_DEFAULT_PAGE_SIZE, REVIEW_FINDINGS_MAX_PAGE_SIZE
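A minimal .env sketch for local development with the mock provider (all values below are illustrative defaults, not the project's actual .env.example):

```
LLM_PROVIDER=mock
OPENAI_API_KEY=
OPENAI_MODEL=gpt-4o
DB_NAME=ai_legal
DB_USER=postgres
DB_PASSWORD=postgres
DB_HOST=localhost
DB_PORT=5432
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/1
REVIEW_MAX_CONCURRENT_RUNS=4
REVIEW_RATE_LIMIT_PER_MINUTE=30
```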

Note:

  • If LLM_PROVIDER=mock, analysis runs without external API calls.
  • If LLM_PROVIDER=openai and no API key is set, the code falls back to mock findings.
  • Default embedding provider is mock; set REVIEW_EMBEDDING_PROVIDER=openai to use OpenAI embeddings.
  • For existing findings, run embedding backfill:
    • python manage.py backfill_finding_embeddings --batch-size 100

Validation Commands

.\.venv\Scripts\python.exe manage.py check
.\.venv\Scripts\python.exe manage.py test -v 1
cd frontend; npm run build

Project Docs

  • docs/overview.md
  • docs/architecture.md
  • docs/MVP_Checklist.md
  • docs/PHASE_2_Checklist.md
  • docs/POST_MVP_PLAN.md
  • docs/AI_Legal_ARCHITECTURE.md (legacy pointer)
  • docs/verification_logs/

License

MIT