AI research · Nairu · 2025

Oxcart

Oxcart is a domain-specialized GraphRAG system for Costa Rican philately — 162 years of stamp history made answerable. Researchers ask about issues, varieties, plates, overprints, forgeries and postal history, and get grounded, cited answers from a corpus of 1,424 documents fused with a catalog knowledge graph.

Repository ↗

The corpus dissolving into the catalog knowledge graph — and back. Issues, stamps, varieties, plates, legal acts; the amber node is the anchor entity, the 1863 first issue.

Imagery

The research UI.

The Gradio research interface and the live Neo4j graph viewer — real captures on the way.


1,424 PDF documents indexed

22,940 pages parsed

193,180 text chunks indexed

162 years of philately (1863–2025)

Costa Rican philately has a documentation problem: 162 years of stamp history (first issue, 1863) are recorded across scattered, hard-to-search literature, including catalogs, decades of society bulletins, monographs, forgery studies, postal bulletins and auction results. The sources are mostly scanned PDFs, about 70% of them in Spanish; they don't cross-reference each other, and they use several incompatible catalog-numbering systems. Oxcart makes that corpus answerable.

Philatelic questions mix exact nomenclature — catalog numbers, denominations, perforation gauges — with narrative: history, provenance, forgery analysis. A plain vector RAG isn't enough. Oxcart parses the corpus with Dolphin (LandingAI ADE as a selective high-fidelity fallback), embeds it into Weaviate, and fuses it with a Neo4j catalog knowledge graph — so exact queries resolve against precise entities and then expand into textual evidence. Every answer carries first-class citations down to document, page and chunk.
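The graph-first idea can be sketched in a few lines of Python. This is a minimal illustration, not Oxcart's actual code: the regular expression, the `Chunk` fields, the `graph` dictionary and the `vector_search` callback are all hypothetical stand-ins for the Neo4j and Weaviate lookups, and the catalog names in the pattern are assumptions.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for catalog references such as "Mena 12" or "Scott C45".
# The catalog names and the number format are assumptions for illustration.
CATALOG_REF = re.compile(r"\b(Mena|Scott)\s+([A-Z]{0,2}\d+[a-z]?)\b", re.IGNORECASE)


@dataclass(frozen=True)
class Chunk:
    document: str
    page: int
    chunk_id: str
    text: str
    entity_ids: tuple  # graph entities linked to this chunk during enrichment


def resolve_entities(question, graph):
    """Return the graph entity id for each catalog reference found in the question."""
    ids = []
    for catalog, number in CATALOG_REF.findall(question):
        key = (catalog.lower(), number.lower())
        if key in graph:
            ids.append(graph[key])
    return ids


def retrieve(question, graph, chunks, vector_search):
    """Use exact entities when the question names one, else fall back to vectors."""
    wanted = set(resolve_entities(question, graph))
    if wanted:
        hits = [c for c in chunks if wanted & set(c.entity_ids)]
        if hits:
            return hits
    # Narrative questions, or entities with no linked text, go to semantic search.
    return vector_search(question)
```

In this sketch, a question that names a catalog number is answered from chunks tied to that exact entity, while a narrative question with no recognizable reference goes straight to semantic search. A fuller version would likely combine both result sets rather than choose one.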

It's the same GraphRAG pattern we validated end to end in Canopy Intelligence, matured through years of Nairu retrieval work — Elasticsearch to embeddings to RAG to GraphRAG — pointed at a new domain: cultural heritage. We present it honestly as an advanced research prototype with a working demo, not a production service. MIT-licensed and open source.

The pipeline

From scanned PDFs to cited answers.

01 · Parse
Dolphin parses 1,424 PDFs (22,940 pages of scanned philatelic literature), with LandingAI ADE as a selective high-fidelity fallback.

02 · Enrich
Chunks are typed (text, headers, decrees, issue notices, auction results, captions) and tied to issues, catalogs and dates.

03 · Dual-index
193,180 chunks embedded into Weaviate; the Mena 2018 catalog structured into a Neo4j knowledge graph.

04 · Ground
Catalog-number and issue-name queries resolve against graph entities, then expand into literature evidence.

05 · Answer
Grounded, cited answers (document, page and chunk) in a Gradio UI with a live graph viewer.
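To make the Enrich and Answer steps concrete, here is a hedged sketch of how typed chunks and document/page/chunk citations might be modeled. The chunk types mirror the list above; the class names, field names and the output format are assumptions for illustration, not the repository's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class ChunkType(Enum):
    # Mirrors the chunk types named in the Enrich step.
    TEXT = "text"
    HEADER = "header"
    DECREE = "decree"
    ISSUE_NOTICE = "issue_notice"
    AUCTION_RESULT = "auction_result"
    CAPTION = "caption"


@dataclass(frozen=True)
class Citation:
    document: str
    page: int
    chunk_id: str

    def render(self) -> str:
        return f"{self.document}, p. {self.page}, chunk {self.chunk_id}"


@dataclass(frozen=True)
class EnrichedChunk:
    text: str
    chunk_type: ChunkType
    citation: Citation


def format_answer(answer: str, evidence: list) -> str:
    """Append a numbered source list with one entry per distinct citation."""
    seen = []
    for chunk in evidence:
        if chunk.citation not in seen:  # keep first-seen order, drop repeats
            seen.append(chunk.citation)
    lines = [answer, "", "Sources:"]
    lines += [f"[{i}] {c.render()}" for i, c in enumerate(seen, start=1)]
    return "\n".join(lines)
```

Keeping the citation as its own immutable value means two chunks from the same place collapse into a single source entry, so the reader sees each document, page and chunk only once.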

What it is — and isn't

Oxcart is an advanced research prototype with a real indexed corpus and a working live demo — not a hardened production service. We say so up front; the repo is MIT-licensed and open for anyone to verify.

Contact

Let’s build something that works.

Have a real problem where emerging technology might be part of the answer? We’d like to hear about it.

info@tecnologiasvm.com

