Tecnologías VM
AI research · Nairu · 2025

Oxcart

Oxcart is a domain-specialized GraphRAG system for Costa Rican philately — 162 years of stamp history made answerable. Researchers ask about issues, varieties, plates, overprints, forgeries and postal history, and get grounded, cited answers from a corpus of 1,424 documents fused with a catalog knowledge graph.

The corpus dissolving into the catalog knowledge graph — and back. Issues, stamps, varieties, plates, legal acts; the amber node is the anchor entity, the 1863 first issue.
Imagery

The research UI.

The Gradio research interface and the live Neo4j graph viewer — real captures on the way.

1,424 PDF documents indexed
22,940 pages parsed
193,180 text chunks indexed
162 years of philately, 1863–2025

Costa Rican philately has a documentation problem: 162 years of stamp history (first issue, 1863) recorded across scattered, hard-to-search literature — catalogs, decades of society bulletins, monographs, forgery studies, postal bulletins, auction results. Mostly scanned PDFs, about 70% in Spanish, that don't cross-reference each other and use several incompatible catalog-numbering systems. Oxcart makes that corpus answerable.

Philatelic questions mix exact nomenclature — catalog numbers, denominations, perforation gauges — with narrative: history, provenance, forgery analysis. A plain vector RAG isn't enough. Oxcart parses the corpus with Dolphin (LandingAI ADE as a selective high-fidelity fallback), embeds it into Weaviate, and fuses it with a Neo4j catalog knowledge graph — so exact queries resolve against precise entities and then expand into textual evidence. Every answer carries first-class citations down to document, page and chunk.
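The graph-first flow described above can be sketched in a few lines. This is a minimal illustration, not Oxcart's implementation: the in-memory dictionaries stand in for Neo4j and Weaviate, the word-overlap score stands in for embedding similarity, and every name, catalog number, file name and chunk text here is invented for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc: str
    page: int
    chunk_id: str
    text: str


# Toy stand-in for the catalog graph: catalog number -> entity id (invented values).
CATALOG_INDEX = {"12a": "variety:12a", "12": "stamp:12"}

# Toy stand-in for graph edges linking an entity to the chunks that mention it.
ENTITY_CHUNKS = {
    "variety:12a": {"c1", "c3"},
    "stamp:12": {"c2"},
}

# Toy stand-in for the vector index (invented documents and text).
CHUNKS = {
    "c1": Chunk("bulletin-a.pdf", 4, "c1", "plate flaw reported on the overprint variety"),
    "c2": Chunk("catalog-b.pdf", 17, "c2", "perforation gauge and denomination listing"),
    "c3": Chunk("study-c.pdf", 9, "c3", "forgery analysis of the overprint"),
    "c4": Chunk("bulletin-d.pdf", 2, "c4", "auction results for an unrelated overprint forgery"),
}

# Hypothetical pattern for catalog numbers written as "#12a" or "No. 12a".
CATALOG_NO = re.compile(r"(?:\bNo\.\s*|#)(\d+[a-z]?)\b", re.IGNORECASE)


def overlap_score(query: str, text: str) -> int:
    """Crude lexical stand-in for embedding similarity: count shared words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query: str, k: int = 2) -> list[Chunk]:
    """Resolve an exact catalog number first, then rank the linked evidence."""
    match = CATALOG_NO.search(query)
    entity = CATALOG_INDEX.get(match.group(1).lower()) if match else None
    if entity is not None:
        # Exact path: only chunks the graph links to the resolved entity.
        candidates = [CHUNKS[c] for c in sorted(ENTITY_CHUNKS[entity])]
    else:
        # Narrative path: no resolvable entity, so search the whole index.
        candidates = list(CHUNKS.values())
    ranked = sorted(candidates, key=lambda c: overlap_score(query, c.text), reverse=True)
    return ranked[:k]


top = retrieve("forgery analysis for #12a")
print([c.chunk_id for c in top])  # ['c3', 'c1']
```

In this toy run, chunk `c4` also mentions a forgery but is never considered, because the graph does not link it to the resolved entity. That is the point of grounding before expanding: similarity alone would have let it compete.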

It's the same GraphRAG pattern we validated end to end in Canopy Intelligence, matured through years of Nairu retrieval work — Elasticsearch to embeddings to RAG to GraphRAG — pointed at a new domain: cultural heritage. We present it honestly as an advanced research prototype with a working demo, not a production service. MIT-licensed and open source.

The pipeline

From scanned PDFs to cited answers.

  1. Parse

    Dolphin parses 1,424 PDFs (22,940 pages of scanned philatelic literature), with LandingAI ADE as a selective high-fidelity fallback.

  2. Enrich

    Chunks are typed (text, headers, decrees, issue notices, auction results, captions) and tied to issues, catalogs and dates.

  3. Dual-index

    193,180 chunks embedded into Weaviate; the Mena 2018 catalog structured into a Neo4j knowledge graph.

  4. Ground

    Catalog-number and issue-name queries resolve against graph entities, then expand into literature evidence.

  5. Answer

    Grounded, cited answers (document, page and chunk) in a Gradio UI with a live graph viewer.
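As a rough picture of what the Enrich and Answer steps imply for the data, the sketch below models a typed chunk and renders a citation down to document, page and chunk. The chunk types come from the list above; the class names, field names and citation format are assumptions made for illustration, not Oxcart's actual schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ChunkType(Enum):
    """Chunk types named in the Enrich step."""
    TEXT = "text"
    HEADER = "header"
    DECREE = "decree"
    ISSUE_NOTICE = "issue_notice"
    AUCTION_RESULT = "auction_result"
    CAPTION = "caption"


@dataclass(frozen=True)
class EnrichedChunk:
    # Hypothetical record layout; the real field names are not given in the text.
    document: str
    page: int
    chunk_id: str
    kind: ChunkType
    text: str
    issue: Optional[str] = None  # link to a catalog issue, if one was resolved


def cite(chunk: EnrichedChunk) -> str:
    """Format one citation at document, page and chunk granularity."""
    return f"{chunk.document}, p. {chunk.page}, chunk {chunk.chunk_id}"


def render_answer(answer: str, evidence: list[EnrichedChunk]) -> str:
    """Attach numbered citations; refuse to render an uncited answer."""
    if not evidence:
        raise ValueError("an answer must carry at least one citation")
    lines = [answer, ""]
    for n, chunk in enumerate(evidence, start=1):
        lines.append(f"[{n}] {cite(chunk)} ({chunk.kind.value})")
    return "\n".join(lines)


# Invented example data, for illustration only.
chunk = EnrichedChunk("decree-x.pdf", 3, "c7", ChunkType.DECREE, "toy text")
print(render_answer("Toy answer.", [chunk]))
```

Raising on empty evidence is one simple way to keep citations first-class: an answer without a source never reaches the interface.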

What it is — and isn't

Oxcart is an advanced research prototype with a real indexed corpus and a working live demo — not a hardened production service. We say so up front; the repo is MIT-licensed and open for anyone to verify.
