59 results for “topic:extract-text”
Node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
Python-based open-source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition) and data enrichment (annotation) pipelines, plus an ingestor for Solr or Elasticsearch indexes and linked-data graph databases
⚠️ ARCHIVED ⚠️ Search across and get full text for OA and closed journals
Use the Java Tika text extraction library on the .NET platform
Text extraction from multiple and large PDF documents.
Extract text from plaintext, .docx, .odt and .rtf files. Pure Go.
Read PDF files in JavaScript
C# and VB.NET samples for Docotic.Pdf library
R wrapper for antiword utility
R Interface to Apache Tika
Build search across multiple documents client-side in your file storage
Simple rule-based named entity recognition
An R package to extract text from pdf.
Tools for export and import scripts
pdfRest API Toolkit is a REST API service for processing PDF documents, made by developers, for developers. Rapidly integrate PDF workflows with your existing projects and applications, simply and seamlessly. Get started for free in seconds.
A collection of tools for OCR (optical character recognition).
Text Processing & Segmentation Framework
VNDB explorer and VNR-like text hooker.
A small demo that extracts text from images (OCR) using the Google Vision API in Python
ZWSP-Tool is a powerful toolkit for manipulating zero-width spaces quickly and easily. In particular, ZWSP-Tool can detect, clean, hide, extract and brute-force text containing zero-width spaces.
View PDFs on X11 and the Linux framebuffer; resize PDFs; convert PDF to text, HTML, TeX, groff
tokyo is a REST API that, given any type of document 📄, identifies its MIME type 🧐, suggests an extension 🦔 and finally extracts its text 💪.
Extract text from a document with Apache Tika
Wrapper for 'unrtf' utility to extract text from RTF documents
Optical character recognition app, built with FastAPI, Tailwind CSS and pytesseract, that extracts text from images
Library for extracting text from RPG Maker files.
This repository contains examples to extract text from PDF documents in Flutter apps using Syncfusion PDF Flutter library.
A steganography program that can embed text into, and extract it from, the pixels of an image.
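Several projects in this list pull text out of .docx files. As a rough illustration of what that involves (not the implementation of any project listed here), a .docx file is a ZIP archive whose main body lives in word/document.xml; the visible text sits in w:t runs grouped into w:p paragraphs. The minimal Python sketch below uses only the standard library; the helper name docx_text is hypothetical, and the sketch ignores headers, footers, footnotes, tabs and line breaks.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_text(path_or_file):
    """Hypothetical helper: return the body text of a .docx, one line per paragraph.

    Accepts a filesystem path or a binary file-like object.
    Simplified sketch: headers, footers, footnotes, tabs and breaks are ignored.
    """
    with zipfile.ZipFile(path_or_file) as zf:
        xml_bytes = zf.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    lines = []
    # Each w:p element is a paragraph; its text is split across w:t runs.
    for para in root.iter(W_NS + "p"):
        lines.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(lines)
```

For example, `print(docx_text("report.docx"))` would print one paragraph per line. Reading the XML directly keeps the sketch dependency-free; the libraries in this list handle far more of the format.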
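The ZWSP-Tool entry mentions detecting, cleaning, hiding and extracting text carried by zero-width characters. The Python sketch below illustrates that idea under an encoding scheme assumed purely for illustration, not ZWSP-Tool's own: each bit of a UTF-8 secret is written as U+200B (zero width space) for 0 or U+200C (zero width non-joiner) for 1 and appended to a visible cover string. The function names detect, clean, hide and extract are hypothetical.

```python
# Assumed scheme (illustrative only): U+200B encodes bit 0, U+200C encodes bit 1.
ZERO = "\u200b"  # ZERO WIDTH SPACE
ONE = "\u200c"   # ZERO WIDTH NON-JOINER
# Common invisible characters to look for when detecting or cleaning.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def detect(text):
    """Return (index, code point) pairs for each zero-width character in text."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if ch in ZERO_WIDTH]

def clean(text):
    """Remove every zero-width character, leaving only the visible text."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)

def hide(cover, secret):
    """Append the secret's UTF-8 bits to cover as invisible characters."""
    bits = "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))
    return cover + "".join(ONE if b == "1" else ZERO for b in bits)

def extract(text):
    """Recover a secret written by hide(); visible characters are skipped."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    # Decode whole bytes only; any trailing partial byte is dropped.
    usable = len(bits) - len(bits) % 8
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8")
```

A message such as `hide("hello", "hi")` looks like plain "hello" when displayed, but `extract` recovers "hi" as long as nothing along the way strips the invisible characters.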