resume-parser (resume-to-flock)
Parse resumes (DOCX/PDF) into jobs.json, skills.json, categories.json, and other-sections.json.
LLM-driven skill term merges with manual approval.
Installation (for consumers like resume-flock)
Install the package so you get the CLI commands and the resume_parser Python API.
From Git (branch or tag):
pip install "resume-parser @ git+https://github.com/sbecker11/resume-parser.git@main"
# or pin a release: @v1.0.0
From PyPI (once published):
pip install resume-parser
# or pin: resume-parser==1.0.0
In a consumer project (e.g. resume-flock):
- requirements.txt:
resume-parser @ git+https://github.com/sbecker11/resume-parser.git@v1.0.0
- pyproject.toml:
dependencies = ["resume-parser @ git+https://github.com/sbecker11/resume-parser.git@v1.0.0"]
After install, root-level CLIs are on your PATH: resume-to-flock, render-resume-html, run-merge-on-parsed, validate-parsed-resume. Resume-flock can invoke the renderer with render-resume-html -i <folder> (see contracts/RENDER_RESUME_HTML-v1.0.md).
Single source of truth for integration: the schema, validator, and contract specs all live in contracts/ (see contracts/README.md). The installed package provides the CLIs and the resume_parser Python API; contracts/ holds the documents consumers need.
Setup (developers)
cd /path/to/resume-parser
python -m venv .venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows
pip install -e ".[dev]"
Create .env and set the LLM provider key (LLM_PROVIDER=anthropic uses ANTHROPIC_API_KEY):
ANTHROPIC_API_KEY=your-anthropic-key-here
# Optional: LLM_PROVIDER=anthropic
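As a rough illustration of how the provider setting and its key relate, the sketch below checks that the key matching LLM_PROVIDER is present. The function name check_llm_env and the mapping are hypothetical and are not part of the resume_parser API. It also assumes that anthropic is the default when LLM_PROVIDER is unset and that the .env values have already been loaded into the environment.

```python
import os

# Hypothetical mapping for illustration; this README only documents "anthropic".
REQUIRED_KEY_BY_PROVIDER = {"anthropic": "ANTHROPIC_API_KEY"}


def check_llm_env(env=None):
    """Return (provider, key_name), or raise if the provider or its key is unusable."""
    env = os.environ if env is None else env
    # Assumption: "anthropic" is the default provider when LLM_PROVIDER is unset.
    provider = env.get("LLM_PROVIDER", "anthropic").strip().lower()
    key_name = REQUIRED_KEY_BY_PROVIDER.get(provider)
    if key_name is None:
        raise ValueError(f"unsupported LLM_PROVIDER: {provider!r}")
    if not env.get(key_name):
        raise RuntimeError(f"{key_name} is not set")
    return provider, key_name
```

Calling check_llm_env() before a parse run gives an early, readable error instead of a failure partway through the pipeline.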
CLI utilities
When the package is installed, these commands are on your PATH. (Developers can also run the scripts in the repo.)
| Command | Purpose |
|---|---|
| resume-to-flock | Parse a resume (DOCX/PDF) → write JSON (+ optional merge and HTML). |
| render-resume-html | Generate resume.html from existing JSON in a folder. Contract: contracts/RENDER_RESUME_HTML-v1.0.md. Invoked by resume-flock. |
| run-merge-on-parsed | Run skill merge on an existing parsed folder (read/write JSON, optional --render). |
| validate-parsed-resume | Validate a parsed-resume folder's JSON against the schema. |
See docs/SMOKE_TEST.md for manual smoke-test steps for all four.
Usage
Provide the path to your resume file (DOCX or PDF); there is no project resumes/ folder.
resume-to-flock /path/to/resume.docx -o /path/to/output-files
Options
| Option | Description |
|---|---|
| -o, --output-dir | Where to write .json files (default: flock-of-postcards/static_content if found, else cwd) |
| --id | Resume id for meta.json (default: output dir basename) |
| --no-llm | Skip LLM; only extract text (for testing) |
| --no-enrich | Skip LLM skill URL enrichment |
| --provider | Force LLM_PROVIDER: anthropic (requires ANTHROPIC_API_KEY) |
| --no-merge | Skip interactive skill merge step (for CI / non-interactive use) |
| --render | After writing .json, generate resume.html (calls render-resume-html) |
Output
All files are written in the output folder (no subfolders). The output folder also includes a copy of the original resume file (DOCX or PDF) under its original filename.
Data model (consistent across the three dicts):
- Skills dictionary uses skillID as primary key. Skill item has display name (name), optional list of categoryIDs, optional list of jobIDs.
- Jobs dictionary uses jobID as primary key. Job item has display name (role, employer), optional list of skillIDs.
- Categories dictionary uses categoryID as primary key. Category item has display name (name), optional list of skillIDs.

Files:
- jobs.json - Jobs dict keyed by jobID (resume-flock format). Each job has role, employer, start, end, Description, skillIDs, etc.
- skills.json - Skills dict keyed by skillID (slug): { "skillID": { "name": "Display Name", "url": "", "img": "", "categoryIDs": ["id1", ...], "jobIDs": [0, 1, ...] }, ... }. Same structure as jobs and categories (ID as key, display name inside). Includes skills from job descriptions (with job indices in jobIDs) plus any from the resume's skills section (jobIDs empty). categoryIDs reference categories.json for display names.
- categories.json - Categories dict keyed by categoryID (resume-flock format). Each category has name, skillIDs.
- other-sections.json - Contact, title, summary, certifications, websites, custom_sections, skills (resume-flock format).
- meta.json - Resume metadata for list UI: id, displayName, createdAt, fileName, jobCount, skillCount.
- resume.html - Rendered resume (generate with render-resume-html -i /path/to/output or --render).
- resume_template.html - Copy of the template (written when generating resume.html).
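Because the three dicts refer to each other by ID, a consumer may want a quick cross-reference check after loading them. The sketch below is only an illustration and does not replace validate-parsed-resume. The function names load_parsed and find_dangling_refs are hypothetical. It assumes that the jobIDs in skills.json correspond to keys in jobs.json, and it compares IDs as strings because JSON object keys are always strings.

```python
import json
from pathlib import Path


def load_parsed(folder):
    """Load jobs.json, skills.json, and categories.json from an output folder."""
    folder = Path(folder)
    names = ("jobs.json", "skills.json", "categories.json")
    return tuple(json.loads((folder / n).read_text(encoding="utf-8")) for n in names)


def find_dangling_refs(jobs, skills, categories):
    """Return sorted (file, item_id, field, missing_id) tuples for unknown IDs."""
    problems = []
    # JSON object keys are strings, so normalize every ID to str before comparing.
    job_ids = {str(k) for k in jobs}
    skill_ids = {str(k) for k in skills}
    category_ids = {str(k) for k in categories}

    def check(file_name, items, field, valid_ids):
        for item_id, item in items.items():
            # The reference lists are optional, so treat a missing field as empty.
            for ref in item.get(field) or []:
                if str(ref) not in valid_ids:
                    problems.append((file_name, str(item_id), field, str(ref)))

    check("skills.json", skills, "categoryIDs", category_ids)
    check("skills.json", skills, "jobIDs", job_ids)
    check("jobs.json", jobs, "skillIDs", skill_ids)
    check("categories.json", categories, "skillIDs", skill_ids)
    return sorted(problems)
```

A typical call would be find_dangling_refs(*load_parsed("/path/to/output")); an empty list means every referenced ID resolves.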
Pipeline
- Extract - python-docx (DOCX) or pdfplumber (PDF) → raw text
- Parse jobs - LLM (Anthropic) extracts structured jobs with dates and descriptions
- Parse resume sections - LLM extracts contact, summary, certifications, skills, other sections → resume_meta.json
- Extract skills - Regex [text]{img}(url) from job descriptions; merge resume skills section
- Enrich - Optional LLM pass to suggest URLs for skills without one
- Categorize - LLM assigns each skill a list of categories (e.g. Programming Language, Framework) → skills.json
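The "Extract skills" step can be sketched as follows. This is a minimal illustration of matching the [text]{img}(url) markup, not the package's parsers.extract_skills_from_text. The regex, the slugify rule used for skillID, and the function names are all assumptions.

```python
import re

# Assumed pattern for the [text]{img}(url) markup; img and url may be empty.
SKILL_PATTERN = re.compile(r"\[([^\]]+)\]\{([^}]*)\}\(([^)]*)\)")


def slugify(name):
    """Hypothetical slug rule: lowercase, runs of non-alphanumerics become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def extract_skills(description, job_id, skills=None):
    """Add skills found in one job description to a skills dict keyed by skillID."""
    skills = {} if skills is None else skills
    for name, img, url in SKILL_PATTERN.findall(description):
        entry = skills.setdefault(
            slugify(name),
            {"name": name.strip(), "url": url, "img": img, "categoryIDs": [], "jobIDs": []},
        )
        # Record the job once, even if the skill is mentioned several times.
        if job_id not in entry["jobIDs"]:
            entry["jobIDs"].append(job_id)
    return skills
```

Passing the same skills dict across successive job descriptions accumulates jobIDs per skill, which mirrors the shape described for skills.json.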
HTML generation (optional)
Generate resume.html from the JSON files:
render-resume-html -i /path/to/output-folder
Or use --render with resume-to-flock to run this step automatically after parsing. Contract for resume-flock: contracts/RENDER_RESUME_HTML-v1.0.md.
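A consumer such as resume-flock could invoke the renderer from Python roughly as below. This is a sketch only; the contract in contracts/RENDER_RESUME_HTML-v1.0.md is authoritative. The helper names are hypothetical, and the use of check=True assumes the CLI exits with a non-zero status on failure.

```python
import shutil
import subprocess


def build_render_command(folder):
    """Build the argument list for the documented 'render-resume-html -i <folder>' call."""
    return ["render-resume-html", "-i", str(folder)]


def render_resume_html(folder):
    """Run the renderer on a parsed-resume folder."""
    if shutil.which("render-resume-html") is None:
        raise FileNotFoundError("render-resume-html is not on PATH; install resume-parser first")
    # Assumption: a non-zero exit status signals failure, so check=True raises.
    subprocess.run(build_render_command(folder), check=True)
```

Keeping the command builder separate from the subprocess call makes the invocation testable without the CLI installed.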
Run merge on existing parsed folder
To run the skill-merge step on a folder that already has parsed JSON (e.g. from a previous parse or from parsed_resumes/), use run-merge-on-parsed. It reads jobs.json, skills.json, and categories.json, runs the LLM merge (interactive or --accept-all), updates those files and job descriptions, and optionally re-renders resume.html.
# One folder (interactive prompts; re-render HTML after)
run-merge-on-parsed /path/to/parsed-folder --render
# All subfolders, apply all suggested merges, then render
run-merge-on-parsed parsed_resumes --all --accept-all --render
Requires ANTHROPIC_API_KEY in .env. Options: --all (each subfolder), --accept-all (no prompts), --render / --render-after-merging (run render-resume-html after merging).
Validate parsed output
To check that a parsed-resume folder’s JSON conforms to the schema:
validate-parsed-resume /path/to/parsed-folder
Requires the package to be installed (pip install -e . or from Git). On success, prints the list of validated files (e.g. jobs.json, skills.json, categories.json, other-sections.json, meta.json).
Tests
From the repo root (with venv activated):
python -m unittest discover -s tests
Tests in tests/ cover extractors.extract_text, parsers.extract_skills_from_text, and parsers.get_llm_provider (no LLM calls).
Coverage report
Install coverage (add to venv if needed):
pip install coverage
Run tests under coverage and print a report:
coverage run -m unittest discover -s tests
coverage report
To generate an HTML report (open it in a browser afterward):
coverage run -m unittest discover -s tests
coverage html
# open htmlcov/index.html
Flock integration
The generated output files can be read by workspace-resume/resume-flock.