Thurs Jan 22, 2026 1:11pm PST

Show HN: Pdfpull – PDF extraction API with structured invoice and resume parsing

A friend of mine was struggling with manually extracting data from PDFs, so I built a REST API to automate it.

Main features:

- Text extraction (full document or specific pages)
- Table extraction → JSON with headers/rows
- Invoice parsing → vendor, amounts, line items, tax (auto language detection)
- Resume parsing → contact info, skills, work experience, education
- Metadata, links, embedded images extraction

API endpoint: https://pdfpull-895295000838.europe-west1.run.app/docs

Playground: https://bnacar.dev/pdfpull-landing/playground.html

Tech: FastAPI, PyMuPDF, pdfplumber. Rule-based extraction (no LLM API calls).

The invoice/resume parsers detect language automatically (EN/DE/TR) and extract fields without per-template configuration.
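To give a feel for what rule-based language detection can look like, here is a minimal keyword-scoring sketch. It is an illustration only, not Pdfpull's actual logic: the keyword lists, the `detect_language` name, and the English fallback are all assumptions.

```python
import re

# Hypothetical keyword lists; a real detector would use far larger ones.
KEYWORDS = {
    "en": {"invoice", "total", "date", "tax", "amount", "due"},
    "de": {"rechnung", "gesamt", "datum", "mwst", "betrag", "rechnungsnummer"},
    "tr": {"fatura", "toplam", "tarih", "kdv", "tutar", "vergi"},
}


def detect_language(text: str, default: str = "en") -> str:
    """Pick the language whose keywords appear most often in the text."""
    # Note: str.lower() does not handle Turkish dotted/dotless I correctly,
    # so uppercase words containing "İ" or "I" would need extra care.
    words = set(re.findall(r"\w+", text.lower()))
    scores = {lang: len(words & kws) for lang, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # Fall back to the default when no keyword matched at all.
    return best if scores[best] > 0 else default
```

With these lists, a German invoice header such as "Rechnung Nr. 42 Gesamt 100,00 EUR inkl. MwSt" scores highest for `de`, and text with no matching keyword falls back to the default.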

Demo key for testing: sk_demo_123456789

Example request:

    curl -X POST "https://pdfpull-895295000838.europe-west1.run.app/api/v1/parse/invoice" \
      -H "X-API-Key: sk_demo_123456789" \
      -F "file=@invoice.pdf"
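For anyone calling it from Python instead of curl, a rough standard-library equivalent might look like the sketch below. The URL, header name, form field name, and demo key are taken from the curl example; the function names `build_invoice_request` and `parse_invoice` are made up for illustration, and the sketch assumes the API replies with a JSON body.

```python
import json
import os
import urllib.request
import uuid

# URL and demo key are copied from the post; the helpers are illustrative.
BASE_URL = "https://pdfpull-895295000838.europe-west1.run.app"
DEMO_KEY = "sk_demo_123456789"


def build_invoice_request(pdf_bytes: bytes, filename: str = "invoice.pdf",
                          api_key: str = DEMO_KEY) -> urllib.request.Request:
    """Build (but do not send) a multipart POST mirroring the curl call."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/parse/invoice",
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def parse_invoice(path: str) -> dict:
    """Upload the PDF at `path` and decode the JSON reply (network call)."""
    with open(path, "rb") as f:
        req = build_invoice_request(f.read(), filename=os.path.basename(path))
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```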

Returns structured JSON:

    {
      "vendor_name": "ACME Corp",
      "invoice_number": "INV-2024-001",
      "total_amount": 1250.00,
      "currency": "USD",
      "line_items": [...],
      "confidence": 0.92
    }
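One way a client might use the `confidence` field is to route low-scoring or incomplete results to manual review. The sketch below assumes the response shape shown above; the 0.8 threshold, the list of required fields, and the `needs_review` helper are hypothetical choices, not part of the API.

```python
import json

# Sample body copied from the post. line_items is elided there, so an
# empty list stands in as a placeholder.
SAMPLE = """
{
  "vendor_name": "ACME Corp",
  "invoice_number": "INV-2024-001",
  "total_amount": 1250.00,
  "currency": "USD",
  "line_items": [],
  "confidence": 0.92
}
"""

# Assumed set of fields a caller cannot do without.
REQUIRED_FIELDS = ("vendor_name", "invoice_number", "total_amount", "currency")


def needs_review(invoice: dict, threshold: float = 0.8) -> bool:
    """Flag a result when a required field is empty or confidence is low."""
    missing = [k for k in REQUIRED_FIELDS if invoice.get(k) in (None, "")]
    # A missing confidence value is treated as zero confidence.
    confidence = invoice.get("confidence", 0.0)
    return bool(missing) or confidence < threshold
```

Calling `needs_review(json.loads(SAMPLE))` on the sample above does not flag it, since every required field is present and 0.92 is above the assumed threshold.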

Free to try (100 requests with demo key). Looking for feedback on the API design and what document types to add next.
