Oxcart
Oxcart is a domain-specialized GraphRAG system for Costa Rican philately — 162 years of stamp history made answerable. Researchers ask about issues, varieties, plates, overprints, forgeries and postal history, and get grounded, cited answers from a corpus of 1,424 documents fused with a catalog knowledge graph.
The research UI.
The Gradio research interface and the live Neo4j graph viewer — real captures on the way.
Costa Rican philately has a documentation problem. Its 162 years of stamp history (first issue, 1863) are recorded across scattered, hard-to-search literature: catalogs, decades of society bulletins, monographs, forgery studies, postal bulletins and auction results. The sources are mostly scanned PDFs, about 70% of them in Spanish; they don't cross-reference each other, and they use several incompatible catalog-numbering systems. Oxcart makes that corpus answerable.
Philatelic questions mix exact nomenclature — catalog numbers, denominations, perforation gauges — with narrative: history, provenance, forgery analysis. A plain vector RAG isn't enough. Oxcart parses the corpus with Dolphin (LandingAI ADE as a selective high-fidelity fallback), embeds it into Weaviate, and fuses it with a Neo4j catalog knowledge graph — so exact queries resolve against precise entities and then expand into textual evidence. Every answer carries first-class citations down to document, page and chunk.
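As a rough illustration of that graph-first flow, here is a minimal Python sketch. It is not Oxcart's code: the in-memory `GRAPH` and `CHUNKS` stand in for Neo4j and Weaviate, the catalog-number pattern and the sample records are invented, and a keyword match takes the place of vector search.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    document: str
    page: int
    chunk_id: str
    text: str


# Hypothetical stand-in for the catalog knowledge graph:
# catalog number -> issue entity. The number format is made up.
GRAPH = {
    "A12": {"issue": "Example issue", "aliases": ["example issue"]},
}

# Hypothetical stand-in for the embedded literature index.
CHUNKS = [
    Chunk("bulletin-001.pdf", 4, "c-17",
          "Notes on the example issue and its plate varieties."),
    Chunk("monograph-002.pdf", 88, "c-903",
          "Unrelated text about postal routes."),
]

# Assumed pattern for a catalog number; real numbering systems differ.
CATALOG_NUMBER = re.compile(r"\b[A-Z]{1,2}\d{1,4}[a-z]?\b")


def resolve_entities(query):
    """Step 1: resolve exact catalog numbers in the query against the graph."""
    return [GRAPH[m] for m in CATALOG_NUMBER.findall(query) if m in GRAPH]


def expand_to_evidence(entities, chunks):
    """Step 2: expand resolved entities into textual evidence.

    A keyword match is used here as a placeholder for vector search.
    """
    terms = {alias.lower() for e in entities for alias in e["aliases"]}
    return [c for c in chunks if any(t in c.text.lower() for t in terms)]


def answer_context(query):
    """Return (document, page, chunk) citations for the grounded evidence."""
    entities = resolve_entities(query)
    evidence = expand_to_evidence(entities, CHUNKS)
    return [(c.document, c.page, c.chunk_id) for c in evidence]
```

In this sketch a query such as "What varieties exist for A12?" resolves `A12` in the graph first and only then pulls matching literature, while a query with no recognizable catalog number returns no graph-grounded evidence and would fall back to plain retrieval.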
It's the same GraphRAG pattern we validated end to end in Canopy Intelligence, matured through years of Nairu retrieval work — Elasticsearch to embeddings to RAG to GraphRAG — pointed at a new domain: cultural heritage. We present it honestly as an advanced research prototype with a working demo, not a production service. MIT-licensed and open source.
From scanned PDFs to cited answers.
- 01 Parse: Dolphin parses 1,424 PDFs (22,940 pages of scanned philatelic literature), with LandingAI ADE as a selective high-fidelity fallback.
- 02 Enrich: Chunks are typed (text, headers, decrees, issue notices, auction results, captions) and tied to issues, catalogs and dates.
- 03 Dual-index: 193,180 chunks embedded into Weaviate; the Mena 2018 catalog structured into a Neo4j knowledge graph.
- 04 Ground: Catalog-number and issue-name queries resolve against graph entities, then expand into literature evidence.
- 05 Answer: Grounded, cited answers (document, page and chunk) in a Gradio UI with a live graph viewer.
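To make the typed chunks and the document, page and chunk citations concrete, here is a small Python sketch. The chunk types come from the Enrich step; the class names, field names and citation layout are assumptions for illustration, not Oxcart's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class ChunkType(Enum):
    # Types listed in the Enrich step; the identifiers are illustrative.
    TEXT = "text"
    HEADER = "header"
    DECREE = "decree"
    ISSUE_NOTICE = "issue_notice"
    AUCTION_RESULT = "auction_result"
    CAPTION = "caption"


@dataclass(frozen=True)
class TypedChunk:
    document: str      # source file name
    page: int          # page within the source document
    chunk_id: str      # identifier of the chunk on that page
    chunk_type: ChunkType
    text: str


def format_citation(chunk):
    """Render one citation down to document, page and chunk."""
    return f"[{chunk.document}, p. {chunk.page}, chunk {chunk.chunk_id}]"


def cite(answer, chunks):
    """Append a citation for every supporting chunk to an answer string."""
    refs = " ".join(format_citation(c) for c in chunks)
    return f"{answer} {refs}" if refs else answer
```

Keeping the citation fields on the chunk record itself means any retrieval path, graph-first or vector-only, can produce the same style of reference without extra lookups.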
Oxcart is an advanced research prototype with a real indexed corpus and a working live demo — not a hardened production service. We say so up front; the repo is MIT-licensed and open for anyone to verify.
Let’s build something that works.
Have a real problem where emerging technology might be part of the answer? We’d like to hear about it.
info@tecnologiasvm.com