chen-friedman/awesome-legaltech

Curated open-source Legal AI & LegalTech — tools, datasets, benchmarks, and learning resources. Global scope; jurisdiction-tagged.

Awesome Legal AI

Curating the best production-ready tools, datasets, and communities for legal professionals and developers

🇬🇧 English | 🇮🇱 עברית (Hebrew)


What Makes This List Special?

Rigorous Quality Standards → Actively maintained projects with real-world adoption; research-stage resources are labeled as such
Global Legal Coverage → Worldwide scope with clear jurisdiction tagging (🇺🇸 🇪🇺 🇬🇧 🇩🇪 🇮🇳 🇮🇱 🌍)
Practitioner-Tested Tools → Real solutions that legal professionals deploy in actual workflows
Premium AI Resources → Curated datasets, benchmarks, and models purpose-built for legal applications
Thriving Ecosystems → Active communities driving innovation and collaborative development

New to Legal AI? Start with our Quick Start Guide below!


Table of Contents

Quick Navigation | Count | Best For
NLP Libraries & Domain Models | 7 projects | Text processing & analysis
AI-Powered Contract & Document Analytics | 5 platforms | Contract intelligence
Legal Research & Case Law Data/APIs | 6 resources | Research & citations
E-Discovery & Litigation | 3 tools | Legal discovery
Speech Recognition & Transcription | 10 tools | Audio/video transcription
Document Signing & Collaboration | 3 platforms | Digital signatures & wikis
Document Management, OCR & PDF | 9 solutions | Document processing
Document Assembly & Rules-as-Code | 5 platforms | Automation & workflows
Datasets & Benchmarks | 10 collections | Training & evaluation
General-Purpose Document Intelligence | 6 tools | Document understanding
Learning, Communities & Curations | 4 resources | Education & networking

Total: 60+ High-Quality Open-Source Legal Tech Resources


Quick Start Guide

For Legal Professionals

  1. Start with: AI-Powered Contract Analytics for document review
  2. Research tools: Legal Research & Case Law APIs for case discovery
  3. Document processing: Document Management & OCR for digitization

For Developers

  1. Begin with: NLP Libraries & Models for text processing
  2. Training data: Datasets & Benchmarks for model development
  3. Infrastructure: Document Intelligence for pipelines

For Organizations

  1. Enterprise solutions: OpenContracts for contract analytics
  2. Document workflows: docassemble for automation
  3. Case-law data: CourtListener for legal research data

NLP Libraries & Domain Models

Essential tools for processing and understanding legal text with specialized language models

Project | Focus | Scope | Status
LexNLP | Information extraction from unstructured legal text (Python) | Global | Active
Blackstone | spaCy pipeline for long-form legal text processing | Global | Active
LEGAL-BERT | Pretrained BERT variants for legal corpora (contracts, ECHR, EU law) | EU/Global | Stable
InLegalBERT 🇮🇳 | BERT models and recipes for Indian law corpora | India | Active
CaseHOLD 🇺🇸 | Tasks and baselines for case-law holdings analysis | United States | Research
LeXLMs | Corpora and probing tasks for legal language models | Multilingual | Research
Legal-HeBERT 🇮🇱 | BERT model for Hebrew legal and legislative domains | Israel | Research

AI-Powered Contract & Document Analytics

Enterprise-grade platforms for intelligent contract analysis and document understanding

Project | Features | Best For | Maturity
OpenContracts | Enterprise document analytics with AI-powered analysis (GPL-3) | Large organizations | Enterprise
ContraxSuite | Full contract analytics & document platform (AGPL) | Commercial use | Production
LawGlance | Free, open-source RAG-based AI legal assistant | SME & individuals | Community
OpenEDGAR 🇺🇸 | Framework for searchable EDGAR filings databases | US Securities | Stable
CUAD Tools | Code and data interfaces for Contract Understanding | Research | Research

Legal Research & Case Law Data/APIs

Comprehensive databases and APIs for legal research and case law discovery

Project | Coverage | Jurisdiction | API Access
CourtListener 🇺🇸 | Primary legal data & research platform | United States | API
Juriscraper 🇺🇸 | Scrapers for opinions, oral arguments, PACER content | United States | Tools
Eyecite 🇺🇸 | Fast, robust legal citation extractor | United States | Library
Caselaw Access Project 🇺🇸 | 6.7M+ U.S. court decisions with API | United States | API
UK National Archives 🇬🇧 | Public API for UK court judgments | United Kingdom | API
Open Legal Data 🇩🇪 | German legal data platform & API | Germany | Platform

E-Discovery & Litigation

Specialized tools for legal discovery, document review, and litigation support

Project | Capabilities | Use Case | License
FreeEed | Complete eDiscovery processing (OCR, indexing, metadata) | Large-scale discovery | Open Source
FreeDiscovery | Information retrieval engine based on scikit-learn | Document analysis | Open Source
FOIAMachine 🇺🇸 | Manage and send FOIA requests with agency directory | Government transparency | Open Source

Speech Recognition & Transcription

Essential tools for converting audio/video to text in legal workflows

Project | Specialty | Highlight | Use Case
Whisper | General-purpose speech recognition by OpenAI | High accuracy | Multilingual transcription
WhisperX | Fast ASR with word-level timestamps and speaker diarization | Ultra Fast | Speaker identification
faster-whisper | Optimized Whisper implementation | Fast | Efficient transcription
insanely-fast-whisper | Ultra-fast Whisper implementation | Insane | Batch processing
WhisperLiveKit | Real-time speech recognition with Whisper | Real-time | Live transcription
whisper-diarization | Speaker diarization with Whisper | Specialized | Multi-speaker identification
Vibe | Desktop transcription app with Whisper | Desktop | Self-hosted transcription
Scriberr | Transcription and note-taking tool | Notes | Meeting transcription
hebrew_whisper 🇮🇱 | GUI for Hebrew transcription using ivrit.ai Whisper models | Hebrew | Hebrew legal transcription
ivrit.ai Whisper Turbo 🇮🇱 | Optimized Hebrew Whisper model trained on 388 hours of data | Optimized | Hebrew speech recognition

Document Signing & Collaboration

Platforms for digital document signing and collaborative documentation

Project | Primary Use | Best For | License
Documenso | Open-source DocuSign alternative | Digital signatures | AGPL
DocuSeal | Document filling and signing platform | PDF forms & signatures | AGPL
Docmost | Collaborative wiki and documentation software | Team documentation | AGPL

Document Management, OCR & PDF

Essential tools for document digitization, management, and processing workflows

Project | Core Function | Highlight | Special Features
paperless-ngx | Self-hosted document management system | High | Searchable archive, AI tagging
Stirling-PDF | Local web-based PDF toolbox | Fast | Split/merge/convert/optimize
OCRmyPDF | Add OCR text layer to scanned PDFs | Reliable | Searchable PDF/A output
Paperless-AI | AI addon for paperless-ngx | Smart | Semantic search, auto-classification
paperless-gpt | ChatGPT integration for paperless-ngx | AI | Document Q&A, AI assistance
Tesseract | Industry-standard OCR engine | Standard | Text recognition, 100+ languages
EasyOCR | Ready-to-use OCR with 80+ languages | Easy | Quick text extraction
markitdown | Convert documents to Markdown | Convert | PDF/DOCX/PPTX to Markdown
ExifTool | Read/write metadata in files | Metadata | Digital evidence analysis

Document Assembly & Rules-as-Code

Platforms for automating legal document creation and implementing legal logic

Project | Primary Use | Target Users | Technical Level
docassemble | Expert-system platform for guided interviews | Legal professionals | Medium
AssemblyLine 🇺🇸 | Court-form automation toolkit | Court systems | Low
Blawx | Visual Rules-as-Code environment | Legal technologists | High
Catala | Programming language for statute implementation | Developers | High
LEOS 🇪🇺 | Legislative editing platform for the Akoma Ntoso XML format | EU institutions | Enterprise

Datasets & Benchmarks

High-quality training data and evaluation benchmarks for legal AI development

Dataset | Content Type | Coverage | Scale | Best For
Pile of Law 🇺🇸 | Legal/administrative texts | US-centric | Large | Language model training
MultiLegalPile 🌍 | Multilingual legal corpus | 24 languages | Massive | Multilingual models
LexGLUE 🇪🇺🇺🇸 | Multi-task benchmark | EU/US/Multi | Medium | Legal NLU evaluation
LEXTREME 🌍 | Multilingual legal tasks | 24 languages | Large | Cross-lingual evaluation
LegalBench | Legal reasoning tasks | Global | Comprehensive | LLM legal reasoning
LegalBench-RAG | Contract retrieval benchmark | Global | Focused | RAG system evaluation
CUAD | Contract clause annotations | Global | Specialized | Contract understanding
CaseHOLD 🇺🇸 | Case holdings analysis | United States | Targeted | Legal reasoning
ivrit.ai datasets 🇮🇱 | Hebrew speech dataset creation platform | Israel | Platform | Hebrew model training
crowd-transcribe-v5 🇮🇱 | Hebrew speech dataset with 388 hours of transcribed data | Israel | Large | Hebrew speech models

General-Purpose Document Intelligence

Not legal-specific, but widely used in legal AI pipelines for document processing

Project | Specialty | Input Types | Highlight
GROBID | ML extraction of document structure | PDF → TEI/XML | High
Unstructured | Pre-processing for RAG pipelines | PDF/Office/HTML | Versatile
Layout Parser | Deep learning layout detection | Multi-format | Advanced
Docling | Modern document parsing | PDF/DOCX/PPTX/HTML | Modern
Nougat | Neural OCR for academic documents | Academic PDFs | Specialized
Marker | Fast PDF to Markdown conversion | PDF | Fast

Learning, Communities & Curations

Essential communities and learning resources for legal AI professionals

Resource | Focus | Community Size | Activity Level
Free Law Project | Open legal data ecosystem | Large | Very Active
Awesome Legal NLP | Curated academic research | Medium | Active
Legal ML Datasets | Comprehensive collection of legal ML datasets and tasks | Large | Active
EOLE Conference 🇪🇺 | European Open Source & Free Software Law Event | Medium | Annual

Contributing

We'd love your help making this list even better! Here's how to contribute:

Submission Guidelines

Must Have:

  • Open-source with OSI-approved license
  • Clear documentation and README
  • Active maintenance (commits within 12 months)
  • Clear relevance to legal workflows
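The license and maintenance requirements above can be screened from a shell before you open a pull request. Below is a minimal sketch, not tooling shipped with this list: `is_osi_license` and `is_recently_maintained` are hypothetical helper names, the allowlist is a small hand-picked subset of OSI-approved SPDX identifiers (not the full OSI list), and "within 12 months" is approximated as 365 days.

```shell
# Sketch only: hypothetical helpers for screening a candidate project
# against the "Must Have" criteria. Not tooling shipped with this list.

# Succeed if $1 is in a small, hand-picked subset of OSI-approved SPDX
# license identifiers. This is NOT the full OSI list; extend as needed.
is_osi_license() {
  case "$1" in
    MIT|Apache-2.0|BSD-2-Clause|BSD-3-Clause|MPL-2.0) return 0 ;;
    GPL-2.0-only|GPL-2.0-or-later|GPL-3.0-only|GPL-3.0-or-later) return 0 ;;
    LGPL-3.0-only|LGPL-3.0-or-later|AGPL-3.0-only|AGPL-3.0-or-later) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeed if the Unix timestamp $1 (e.g. the last commit time) lies
# within the last 365 days, used here as an approximation of 12 months.
# $2 optionally overrides "now" (handy for testing); it defaults to date +%s.
is_recently_maintained() {
  last_commit=$1
  now=${2:-$(date +%s)}
  max_age=$((365 * 24 * 60 * 60))
  [ $((now - last_commit)) -le "$max_age" ]
}
```

In a local clone of the candidate project, `git log -1 --format=%ct` prints the committer timestamp of the latest commit in Unix seconds, so `is_recently_maintained "$(git log -1 --format=%ct)"` applies the 12-month rule. `date +%s` is widely supported (GNU, BSD, BusyBox) but may not be available on every minimal system.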

Nice to Have:

  • Community adoption (GitHub stars)
  • Production usage examples
  • Testing and CI/CD
  • Performance benchmarks

How to Submit

  1. Fork this repository
  2. Add your project in the appropriate section (alphabetical order)
  3. Include: Name, one-line description, primary link(s), jurisdiction flag if applicable
  4. Test your links and formatting
  5. Submit a pull request with a clear description
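The alphabetical-order rule in step 2 is easy to check locally. A minimal sketch, assuming you extract the project names of a single section yourself, one per line; `check_alphabetical` is a hypothetical helper name. It compares case-insensitively in the C locale, so `docassemble` sorts before `Eyecite`.

```shell
# Hypothetical helper: succeed if the project names on stdin (one per line)
# are in case-insensitive alphabetical order. On failure, sort -c reports
# the first out-of-order line on stderr and exits non-zero.
check_alphabetical() {
  LC_ALL=C sort -c -f
}

# Example: names from one section, one per line.
# printf '%s\n' Blackstone CaseHOLD LexNLP | check_alphabetical
```

`sort -c` only checks order and never rewrites its input, so it is safe to run against a draft section before editing the README.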

Optional Quality Checks

# Link checker
npx lychee --no-progress --accept 200,999 README.md

# Awesome list linter  
npx awesome-lint

Curation Policy

We Include | We Exclude
Open-source projects only | Closed-source SaaS platforms
Global scope (jurisdiction-tagged) | Internal/private tools
Production-ready tools | Abandoned experimental repos
High-value datasets/benchmarks | Low-quality or duplicate data
Active, reputable communities | Inactive or harmful communities

Quality First: We prioritize well-maintained projects with good documentation and real-world usage over comprehensive coverage.


License

CC0 1.0 Universal – No rights reserved.

Feel free to copy, remix, and build upon this list.

By contributing, you agree to license your contribution under CC0.


Credits

Curated by Chen Friedman
Powered by Legal Tech Systems


Star this repo if you found it helpful!

Made with ❤️ for the legal tech community