
Docext

Docext is a document extraction and processing workspace. It contains the document extractor API, shared extraction code, model inference components, and documentation for the surrounding system.

Hosted Sites

Local Development

This repository is managed with uv. The main documentation can optionally be served with mdbook. Building the test PDF additionally requires typst.

Prerequisites

  • uv
  • Optional: mdbook, for serving main-docs locally
  • Optional: typst, for building test-pdf/test-pdf.pdf

uv, mdbook, and typst can be installed through Rust's package manager, cargo:

cargo install uv
cargo install mdbook
cargo install typst-cli

cargo can be installed on its own, or together with the rest of the Rust toolchain through rustup. When using rustup, you may need to add Cargo's binary directory to your shell PATH:

export PATH="$HOME/.cargo/bin:$PATH"
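
To confirm that the prerequisites are visible on your PATH after installation, a small check along these lines may help. This is a minimal sketch, not part of the repository: the helper names `find_tools` and `missing_tools` are hypothetical, and the tool names are taken from the prerequisites list above.

```python
import shutil

# Tool names from the prerequisites list: uv is required, the others are optional.
REQUIRED = ("uv",)
OPTIONAL = ("mdbook", "typst")


def find_tools(names):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(found):
    """Return the names whose PATH lookup failed."""
    return [name for name, path in found.items() if path is None]


if __name__ == "__main__":
    for name, path in find_tools(REQUIRED + OPTIONAL).items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If uv is reported as not found even though cargo install succeeded, the PATH export above is the first thing to check.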

Install Dependencies

Install the workspace dependencies with:

make uv-sync

Configuration

Example environment files are provided as .env.example and .env.modal.example. Copy the relevant example file to a local .env file and fill in the values needed for your setup.
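
The copy step can be scripted so that an existing local file is never overwritten. The following is a sketch under assumptions: the function name `copy_env_example` is hypothetical, and the source and target paths are left to the caller because this README does not fix them beyond "a local .env file".

```python
import shutil
from pathlib import Path


def copy_env_example(example: Path, target: Path) -> bool:
    """Copy example to target unless target already exists.

    Returns True if a copy was made and False if target was left untouched.
    """
    if target.exists():
        return False  # keep existing local settings as they are
    shutil.copyfile(example, target)
    return True
```

For the local runtime settings this would be called as `copy_env_example(Path(".env.example"), Path(".env"))` from the repository root.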

.env.example contains local runtime settings:

  • PYTORCH_ROCM_ARCH: optional ROCm setting for selecting the AMD GPU architecture PyTorch should target.
  • HSA_OVERRIDE_GFX_VERSION: optional ROCm setting for overriding the detected AMD GPU GFX version when needed by the local ROCm stack.
  • TF_MIN_GPU_MULTIPROCESSOR_COUNT: optional TensorFlow setting for the minimum number of multiprocessors a GPU must have before TensorFlow will use it.
  • DOCLING_CUDA_USE_FLASH_ATTENTION2: optional Docling setting for enabling or disabling FlashAttention 2 usage on CUDA setups.
  • OPENROUTER_API_KEY: optional API key for OpenRouter-backed model access.

.env.modal.example contains Modal-specific settings:

  • HF_TOKEN: optional Hugging Face token used by Modal jobs when they need authenticated access to Hugging Face resources.
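
Since every variable listed above is optional, it can be useful to see which ones a local file actually sets. The sketch below assumes plain KEY=VALUE lines with optional quotes and # comments; it is a simplified reader for illustration, not the loader the project itself uses, and the names `parse_env` and `unset_keys` are hypothetical.

```python
# Variable names as documented for .env.example and .env.modal.example.
LOCAL_KEYS = (
    "PYTORCH_ROCM_ARCH",
    "HSA_OVERRIDE_GFX_VERSION",
    "TF_MIN_GPU_MULTIPROCESSOR_COUNT",
    "DOCLING_CUDA_USE_FLASH_ATTENTION2",
    "OPENROUTER_API_KEY",
)
MODAL_KEYS = ("HF_TOKEN",)


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines, # comments, and lines without '='."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any, from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unset_keys(values, keys=LOCAL_KEYS):
    """Return the documented keys that are missing or empty in values."""
    return [key for key in keys if not values.get(key)]
```

A typical use is `unset_keys(parse_env(Path(".env").read_text()))` after importing `Path` from `pathlib`. An unset key is not an error here; it only shows which optional settings are left at their defaults.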

The repository also contains experimental, disabled ROCm support. The relevant PyTorch ROCm dependency configuration is currently commented out in pyproject.toml; enable and adjust it only if you are working on an AMD GPU setup and know which ROCm versions and GPU architecture values apply to your machine.

Run The Document Extractor API

Start the local API server with:

make serve-document-extractor

The API will be served at http://localhost:8100. Its local Swagger UI is available at http://localhost:8100/docs.
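
To check from a script that the server has come up, a probe like the following could be used. It is a rough sketch that relies only on the two URLs stated above; the function name `is_reachable` and the two-second timeout are assumptions, and no API routes beyond /docs are assumed.

```python
import urllib.error
import urllib.request

API_BASE = "http://localhost:8100"  # address stated in this README
DOCS_URL = f"{API_BASE}/docs"       # local Swagger UI


def is_reachable(url, timeout=2.0):
    """Return True if a GET to url answers with a 2xx status within timeout seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

Calling `is_reachable(DOCS_URL)` shortly after `make serve-document-extractor` should return True once the server has finished starting.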

Serve The Documentation

If mdbook is installed, serve the main documentation locally with:

make serve-main-docs

The documentation will be available at http://localhost:3080.

Serve the pipeline documentation locally with:

make serve-pipeline-docs

The pipeline documentation will be available at http://localhost:3081.