51 results for “topic:apache-tika”
Smart local file search engine that understands your files
Extracts the text content of Word (doc, docx), Excel, PDF, PPT, CSV, and TXT files, and can also extract the table of contents from Word and PDF files
Open Source Computer Vision with TensorFlow, MiniFi, Apache NiFi, OpenCV, Apache Tika and Python. For processing images from IoT devices like Raspberry Pis, NVidia Jetson TX1, NanoPi Duos and more, which are equipped with attached cameras or external USB webcams, we use Python to interface via OpenCV and PiCamera. From there we run image processing at the edge on these IoT devices using OpenCV and TensorFlow to determine attributes and image analytics. Apache MiniFi coordinates running these Python scripts and decides when and what to send from that analysis and the image to a remote Apache NiFi server for additional processing. The Apache NiFi cluster routes the images to one processing path and the JSON-encoded metadata to another flow. The JSON data (with its schema referenced from a central Schema Registry) is routed using Record Processing and SQL; this data is enriched and augmented before conversion to AVRO to be sent via Apache Kafka to SAM. Streaming Analytics Manager then does deeper processing on this stream and others, including weather and Twitter, to determine what should be done with this data. References: https://community.hortonworks.com/articles/103863/using-an-asus-tinkerboard-with-tensorflow-and-pyth.html https://community.hortonworks.com/articles/118132/minifi-capturing-converting-tensorflow-inception-t.html https://github.com/tspannhw/rpi-noir-screen https://community.hortonworks.com/articles/77988/ingest-remote-camera-images-from-raspberry-pi-via.html https://community.hortonworks.com/articles/107379/minifi-for-image-capture-and-ingestion-from-raspbe.html https://community.hortonworks.com/articles/58265/analyzing-images-in-hdf-20-using-tensorflow.html
Python bindings for Apache Tika
A suite of Machine Learning / Deep Learning Dockerfiles to allow Apache Tika to extract objects and to produce textual captions for images and video
tokyo, a REST API that, when given any type of document 📄, identifies the MIME type 🧐, suggests an extension 🦔, and also extracts text 💪.
Extract text from a document using Apache Tika
AWS Lambda layer containing the latest version of Apache Tika
Visualize unstructured data using Watson NLU
Text extraction from scanned PDF documents in Java
Apache NiFi + Apache Tika + OptimaizeLangDetector
ApacheDeepLearning101
Golang client for Apache Tika
A permissively licensed crate to detect MIME types
The metadata and text content extractor for almost every file type.
All my processors (NARs) in one place
🚴‍♂️⛷ Data Lake: performance tuning for text extraction from a huge number of files.
A file-uploading web app built with security in mind
Directory tree metadata parser using Apache Tika
A place to release saved machine learning models for tika-dl
A spatial search website that allows users to search documents from the FBI Vault website. It extracts the most frequently occurring location in each document, loads the geo-tagged data into Apache Solr to index the documents, and visualizes search results using the Google Maps API.
Tika detector for MKV and WebM
Document management system implemented with microservices
A simple information retrieval system, a PDF Search Engine for UN agencies and NGOs.
No description provided.
Extraction analysis of the PixStory social media dataset using language detection, language translation, the Tika GeoTopic parser, Tika image object recognition and image caption generation, and PyTorch Detoxify.
A toolset for indexing and searching through documents
This API uses Annif as a local server and includes an NER component. It also includes Tesseract and uses Apache Tika for language detection, and it has limited multilingual support.
This application is designed for managing OCR (Optical Character Recognition) tasks. It allows users to define, schedule, and execute OCR tasks through a REST API. The core technologies used are Spring Framework, MongoDB, and Tesseract OCR.
AI assistant microservice built with Spring Boot and Spring AI, based on RAG. Integrates OpenAI and pgvector for document ingestion, vector search, and generation of responses contextualized by domain.