AnyParse
AnyParse is a multimodal document parsing and understanding engine that converts complex files into structured Markdown and JSON. It covers basic text processing, professional document conversion, and recognition based on OCR and Vision-Language Models (VLMs) in a single tool.
Core Capabilities
- Multimodal Document Understanding: Parses both images and documents, combining OCR and VLM technologies to extract unstructured data.
- Comprehensive Format Coverage: Parses office documents, web pages, spreadsheets, e-books, and emails with a single tool.
- Structured Output: Transforms complex files into standardized Markdown and JSON, streamlining downstream data processing and Large Language Model (LLM) applications.
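As an illustration of the downstream processing mentioned above, the sketch below splits Markdown text into heading-delimited chunks that could be passed to an LLM one at a time. It does not call AnyParse. The function name `split_by_heading` is hypothetical, and the code assumes the Markdown uses ATX-style headings (`#`, `##`, ...), which this README does not confirm.

```python
import re
from typing import List, Tuple

# Matches ATX headings such as "# Title" or "### Subsection".
# Assumption: the Markdown being processed uses this heading style.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_by_heading(markdown: str) -> List[Tuple[str, str]]:
    """Split Markdown text into (heading, body) chunks.

    Text that appears before the first heading is returned with an
    empty heading string.
    """
    chunks: List[Tuple[str, str]] = []
    heading = ""
    body: List[str] = []

    def flush() -> None:
        # Skip a chunk that has neither a heading nor any visible text.
        if heading or any(line.strip() for line in body):
            chunks.append((heading, "\n".join(body).strip()))

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading, body = match.group(2), []
        else:
            body.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    sample = "intro\n# A\ntext a\n\n## B\ntext b\n"
    for title, text in split_by_heading(sample):
        print(repr(title), "->", repr(text))
```

This simple splitter treats any line starting with `#` as a heading, so a comment line inside a fenced code block would also start a new chunk. A real pipeline would need to track code fences.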
Key Features
- Documents & Layouts: PDF, DOCX, PPTX, XLSX, EPUB, IPYNB
- Text & Markup: TXT, MD, RST, HTML/XHTML/HTM/SHTML
- Spreadsheets & Data: CSV, TSV
- Images & Multimedia: PNG, JPEG/JPG
- Others: EML (email)
- Built-in CLI and FastAPI interface
- Runs in CPU-only environments and also supports GPU
- Outputs text in human reading order for single-column, multi-column, and complex layouts
- Retains the original document structure, such as titles, paragraphs, and lists
- Extracts images, image descriptions, tables, table titles, and footnotes
- Automatically identifies formulas in documents and converts them to LaTeX
- Automatically identifies tables in documents and converts them to HTML
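The format groups listed above can be captured as a small lookup table, for example to pre-filter files before handing them to a parser. The sketch below is illustrative only: `FORMAT_GROUPS` and `format_group` are hypothetical names, not part of AnyParse, and the extensions are copied from the list in this README.

```python
from pathlib import Path
from typing import Optional

# Format groups copied from the feature list above.
# The dictionary and function are illustrative, not an AnyParse API.
FORMAT_GROUPS = {
    "Documents & Layouts": {"pdf", "docx", "pptx", "xlsx", "epub", "ipynb"},
    "Text & Markup": {"txt", "md", "rst", "html", "xhtml", "htm", "shtml"},
    "Spreadsheets & Data": {"csv", "tsv"},
    "Images & Multimedia": {"png", "jpeg", "jpg"},
    "Others": {"eml"},
}


def format_group(path: str) -> Optional[str]:
    """Return the format group for a file path, or None if it is not listed."""
    # Only the final suffix is considered, compared case-insensitively.
    suffix = Path(path).suffix.lower().lstrip(".")
    for group, extensions in FORMAT_GROUPS.items():
        if suffix in extensions:
            return group
    return None


if __name__ == "__main__":
    for name in ["report.PDF", "table.csv", "mail.eml", "archive.tar.gz"]:
        print(name, "->", format_group(name))
```

The check looks at the file extension only. It does not inspect file contents, so a renamed file would be classified by its name rather than by its real type.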
All Thanks To Our Contributors
License Information
This repository is licensed under the AnyParse Open Source License, which is based on the Apache License 2.0 with additional conditions.