chen-friedman/awesome-legaltech
Curated open-source Legal AI & LegalTech — tools, datasets, benchmarks, and learning resources. Global scope; jurisdiction-tagged.
Awesome Legal AI

The Ultimate Collection of Open-Source Legal Technology & AI Resources
Curating the best production-ready tools, datasets, and communities for legal professionals and developers
What Makes This List Special?
- Rigorous Quality Standards → Only production-ready, actively maintained projects with real-world adoption
- Global Legal Coverage → Worldwide scope with clear jurisdiction tagging (🇺🇸 🇪🇺 🇬🇧 🇩🇪 🇮🇳 🌍)
- Practitioner-Tested Tools → Real solutions that legal professionals deploy in actual workflows
- Premium AI Resources → Curated datasets, benchmarks, and models purpose-built for legal applications
- Thriving Ecosystems → Active communities driving innovation and collaborative development
New to Legal AI? Start with our Quick Start Guide below!
Table of Contents
| Quick Navigation | Count | Best For |
|---|---|---|
| NLP Libraries & Domain Models | 7 projects | Text processing & analysis |
| AI-Powered Contract & Document Analytics | 5 platforms | Contract intelligence |
| Legal Research & Case Law Data/APIs | 6 resources | Research & citations |
| E-Discovery & Litigation | 3 tools | Legal discovery |
| Speech Recognition & Transcription | 10 tools | Audio/video transcription |
| Document Signing & Collaboration | 3 platforms | Digital signatures & wikis |
| Document Management, OCR & PDF | 9 solutions | Document processing |
| Document Assembly & Rules-as-Code | 5 platforms | Automation & workflows |
| Datasets & Benchmarks | 10 collections | Training & evaluation |
| General-Purpose Document Intelligence | 6 tools | Document understanding |
| Learning, Communities & Curations | 4 resources | Education & networking |
Total: 60+ High-Quality Open-Source Legal Tech Resources
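The counts in the navigation table can drift as entries are added. The following is a minimal sketch for recomputing them, assuming the README marks sections with `## ` headings and uses pipe tables. `count_table_rows` is a hypothetical helper for illustration, not part of this repository.

```python
import re

# Matches a Markdown table delimiter row such as |---|---| or | :-- | --: |
DELIMITER_ROW = re.compile(r"^\|(\s*:?-+:?\s*\|)+\s*$")


def count_table_rows(markdown: str) -> dict:
    """Return {section heading: number of table data rows under it}.

    Assumption: sections start with '## ' and every entry is one table row.
    """
    counts = {}
    heading = None
    in_body = False  # True once the delimiter row of a table has been seen
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            heading = stripped[3:].strip()
            in_body = False
        elif DELIMITER_ROW.match(stripped):
            in_body = True
        elif stripped.startswith("|"):
            # Header rows come before the delimiter row and are not counted.
            if in_body and heading is not None:
                counts[heading] = counts.get(heading, 0) + 1
        else:
            in_body = False  # any non-table line ends the current table
    return counts


if __name__ == "__main__":
    with open("README.md", encoding="utf-8") as fh:
        for section, n in count_table_rows(fh.read()).items():
            print(f"{section}: {n}")
```

Comparing this output with the Count column before opening a pull request is one way to keep the two in step.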
Quick Start Guide
For Legal Professionals
- Start with: AI-Powered Contract & Document Analytics for document review
- Research tools: Legal Research & Case Law Data/APIs for case discovery
- Document processing: Document Management, OCR & PDF for digitization
For Developers
- Begin with: NLP Libraries & Domain Models for text processing
- Training data: Datasets & Benchmarks for model development
- Infrastructure: General-Purpose Document Intelligence for pipelines
For Organizations
- Enterprise solutions: OpenContracts for contract analytics
- Document workflows: docassemble for automation
- Case law and court data: CourtListener for legal research
NLP Libraries & Domain Models
Essential tools for processing and understanding legal text with specialized language models
| Project | Focus | Scope |
|---|---|---|
| LexNLP | Information extraction from unstructured legal text (Python) | Global |
| Blackstone | spaCy pipeline for long-form legal text processing | Global |
| LEGAL-BERT | Pretrained BERT variants for legal corpora (contracts, ECHR, EU law) | EU/Global |
| InLegalBERT 🇮🇳 | BERT models and recipes for Indian law corpora | India |
| CaseHOLD | Tasks and baselines for case-law holdings analysis | Global |
| LeXLMs | Corpora and probing tasks for legal language models | Multilingual |
| Legal-HeBERT 🇮🇱 | BERT model for Hebrew legal and legislative domains | Israel |
AI-Powered Contract & Document Analytics
Enterprise-grade platforms for intelligent contract analysis and document understanding
| Project | Features | Best For |
|---|---|---|
| OpenContracts | Enterprise document analytics with AI-powered analysis (GPL-3) | Large organizations |
| ContraxSuite | Full contract analytics & document platform (AGPL) | Commercial use |
| LawGlance | Free, open-source RAG-based AI legal assistant | SME & individuals |
| OpenEDGAR 🇺🇸 | Framework for searchable EDGAR filings databases | US Securities |
| CUAD Tools | Code and data interfaces for Contract Understanding | Research |
Legal Research & Case Law Data/APIs
Comprehensive databases and APIs for legal research and case law discovery
| Project | Coverage | Jurisdiction |
|---|---|---|
| CourtListener 🇺🇸 | Primary legal data & research platform | United States |
| Juriscraper 🇺🇸 | Scrapers for opinions, oral arguments, PACER content | United States |
| Eyecite 🇺🇸 | Fast, robust legal citation extractor | United States |
| Caselaw Access Project 🇺🇸 | 6.7M+ U.S. court decisions with API | United States |
| UK National Archives 🇬🇧 | Public API for UK court judgments | United Kingdom |
| Open Legal Data 🇩🇪 | German legal data platform & API | Germany |
E-Discovery & Litigation
Specialized tools for legal discovery, document review, and litigation support
| Project | Capabilities | Use Case |
|---|---|---|
| FreeEed | Complete eDiscovery processing (OCR, indexing, metadata) | Large-scale discovery |
| FreeDiscovery | Information retrieval engine based on scikit-learn | Document analysis |
| FOIAMachine 🇺🇸 | Manage and send FOIA requests with agency directory | Government transparency |
Speech Recognition & Transcription
Essential tools for converting audio/video to text in legal workflows
| Project | Specialty | Use Case |
|---|---|---|
| Whisper | General-purpose speech recognition by OpenAI | Multilingual transcription |
| WhisperX | Fast ASR with word-level timestamps and speaker diarization | Speaker identification |
| faster-whisper | Optimized Whisper implementation | Efficient transcription |
| insanely-fast-whisper | Ultra-fast Whisper implementation | Batch processing |
| WhisperLiveKit | Real-time speech recognition with Whisper | Live transcription |
| whisper-diarization | Speaker diarization with Whisper | Multi-speaker identification |
| Vibe | Desktop transcription app with Whisper | Self-hosted transcription |
| Scriberr | Transcription and note-taking tool | Meeting transcription |
| hebrew_whisper 🇮🇱 | GUI for Hebrew transcription using ivrit.ai Whisper models | Hebrew legal transcription |
| ivrit.ai Whisper Turbo 🇮🇱 | Optimized Hebrew Whisper model with 388 hours of training data | Hebrew speech recognition |
Document Signing & Collaboration
Platforms for digital document signing and collaborative documentation
| Project | Primary Use | Best For |
|---|---|---|
| Documenso | Open-source DocuSign alternative | Digital signatures |
| DocuSeal | Document filling and signing platform | PDF forms & signatures |
| Docmost | Collaborative wiki and documentation software | Team documentation |
Document Management, OCR & PDF
Essential tools for document digitization, management, and processing workflows
| Project | Core Function | Special Features |
|---|---|---|
| paperless-ngx | Self-hosted document management system | Searchable archive, AI tagging |
| Stirling-PDF | Local web-based PDF toolbox | Split/merge/convert/optimize |
| OCRmyPDF | Add OCR text layer to scanned PDFs | Searchable PDF/A output |
| Paperless-AI | AI addon for paperless-ngx | Semantic search, auto-classification |
| paperless-gpt | ChatGPT integration for paperless-ngx | Document Q&A, AI assistance |
| Tesseract | Industry-standard OCR engine | Text recognition, 100+ languages |
| EasyOCR | Ready-to-use OCR with 80+ languages | Quick text extraction |
| markitdown | Convert documents to Markdown | PDF/DOCX/PPTX to Markdown |
| ExifTool | Read/write metadata in files | Digital evidence analysis |
Document Assembly & Rules-as-Code
Platforms for automating legal document creation and implementing legal logic
| Project | Primary Use | Target Users |
|---|---|---|
| docassemble | Expert-system platform for guided interviews | Legal professionals |
| AssemblyLine 🇺🇸 | Court-form automation toolkit | Court systems |
| Blawx | Visual Rules-as-Code environment | Legal technologists |
| Catala | Programming language for statute implementation | Developers |
| LEOS 🇪🇺 | Legislative editing platform for the Akoma Ntoso XML format | EU institutions |
Datasets & Benchmarks
High-quality training data and evaluation benchmarks for legal AI development
| Dataset | Content Type | Coverage | Best For |
|---|---|---|---|
| Pile of Law 🇺🇸 | Legal/administrative texts | US-centric | Language model training |
| MultiLegalPile 🌍 | Multilingual legal corpus | 24 languages | Multilingual models |
| LexGLUE 🇪🇺🇺🇸 | Multi-task benchmark | EU/US/Multi | Legal NLU evaluation |
| LEXTREME 🌍 | Multilingual legal tasks | 24 languages | Cross-lingual evaluation |
| LegalBench | Legal reasoning tasks | Global | LLM legal reasoning |
| LegalBench-RAG | Contract retrieval benchmark | Global | RAG system evaluation |
| CUAD | Contract clause annotations | Global | Contract understanding |
| CaseHOLD 🇺🇸 | Case holdings analysis | United States | Legal reasoning |
| ivrit.ai datasets 🇮🇱 | Hebrew speech dataset creation platform | Israel | Hebrew model training |
| crowd-transcribe-v5 🇮🇱 | Hebrew speech dataset with 388 hours of transcribed data | Israel | Hebrew speech models |
General-Purpose Document Intelligence (useful for legal)
Not legal-specific, but widely used in legal AI pipelines for document processing
| Project | Specialty | Input Types |
|---|---|---|
| GROBID | ML extraction of document structure | PDF → TEI/XML |
| Unstructured | Pre-processing for RAG pipelines | PDF/Office/HTML |
| Layout Parser | Deep learning layout detection | Multi-format |
| Docling | Modern document parsing | PDF/DOCX/PPTX/HTML |
| Nougat | Neural OCR for academic documents | Academic PDFs |
| Marker | Fast PDF to Markdown conversion | PDF |
Learning, Communities & Curations
Essential communities and learning resources for legal AI professionals
| Resource | Focus |
|---|---|
| Free Law Project | Open legal data ecosystem |
| Awesome Legal NLP | Curated academic research |
| Legal ML Datasets | Comprehensive collection of legal ML datasets and tasks |
| EOLE Conference 🇪🇺 | European Open Source & Free Software Law Event |
Contributing
We'd love your help making this list even better! Here's how to contribute:
Submission Guidelines
Must Have:
- Open-source with OSI-approved license
- Clear documentation and README
- Active maintenance (commits within 12 months)
- Clear relevance to legal workflows
Nice to Have:
- Community adoption (GitHub stars)
- Production usage examples
- Testing and CI/CD
- Performance benchmarks
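The "commits within 12 months" criterion can be checked mechanically. The sketch below assumes the last commit timestamp has already been obtained in ISO 8601 form (for example from `git log -1 --format=%cI` in a local clone) and approximates 12 months as 365 days. `is_actively_maintained` is a hypothetical helper, not an official check used by this list.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_actively_maintained(
    last_commit: datetime,
    now: Optional[datetime] = None,
    max_age_days: int = 365,  # assumption: "12 months" treated as 365 days
) -> bool:
    """Return True if the last commit is no older than max_age_days.

    Both datetimes must be timezone-aware so they can be compared safely.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_commit <= timedelta(days=max_age_days)


if __name__ == "__main__":
    # Example timestamp in the format printed by `git log -1 --format=%cI`.
    last = datetime.fromisoformat("2025-01-15T10:00:00+00:00")
    print(is_actively_maintained(last))
```

A project that fails this check may still be worth listing if it is stable and feature-complete, so treat the result as a prompt for review rather than a hard gate.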
How to Submit
- Fork this repository
- Add your project in the appropriate section (alphabetical order)
- Include: Name, one-line description, primary link(s), jurisdiction flag if applicable
- Test your links and formatting
- Submit a pull request with a clear description
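Before submitting, the entry format and ordering can be sanity-checked with a few lines of Python. This is a rough sketch under stated assumptions: entries are table rows whose first cell contains a Markdown link and whose second cell is the one-line description. `parse_entry` and `is_alphabetical` are hypothetical helpers, and the URL is a placeholder.

```python
import re

# Assumption: the project name is written as a Markdown link, [Name](https://...)
LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def parse_entry(row: str):
    """Parse one table row; return a dict, or None if required parts are missing."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    if len(cells) < 2:
        return None
    match = LINK.search(cells[0])
    if match is None or not cells[1]:
        return None  # no link in the first cell, or an empty description
    return {"name": match.group(1), "url": match.group(2), "description": cells[1]}


def is_alphabetical(names) -> bool:
    """Case-insensitive check that names are already in sorted order."""
    keys = [name.casefold() for name in names]
    return keys == sorted(keys)


if __name__ == "__main__":
    row = "| [Blawx](https://example.org/blawx) | Visual Rules-as-Code environment |"
    print(parse_entry(row))
    print(is_alphabetical(["AssemblyLine", "Blawx", "Catala", "docassemble"]))
```

The check does not validate that the link resolves or that a jurisdiction flag is correct; those still need a manual look or the link checker under Optional Quality Checks.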
Optional Quality Checks
# Link checker
npx lychee --no-progress --accept 200,999 README.md
# Awesome list linter
npx awesome-lint
Curation Policy
| We Include | We Exclude |
|---|---|
| Open-source projects only | Closed-source SaaS platforms |
| Global scope (jurisdiction-tagged) | Internal/private tools |
| Production-ready tools | Abandoned experimental repos |
| High-value datasets/benchmarks | Low-quality or duplicate data |
| Active, reputable communities | Inactive or harmful communities |
Quality First: We prioritize well-maintained projects with good documentation and real-world usage over comprehensive coverage.
License
CC0 1.0 Universal – No rights reserved.
Feel free to copy, remix, and build upon this list.
By contributing, you agree to license your contribution under CC0.
Credits
Curated by Chen Friedman
Powered by Legal Tech Systems
Star this repo if you found it helpful!
Made with ❤️ for the legal tech community
