Datasets with receipts
The Data Operations division of Blue Hen RE, as a product. Every dataset here carries its provenance (source, extractor, chunking strategy, and an OKF card), because training data you can't audit is training data you can't trust.
Datasets in the catalog
0Source documents
0Retrieval-ready chunks
0Last refinery sync
Fri, 03 Jul 2026 17:40:19 GMT
The pipeline
source → fetch (SSRF-guarded) → structure → chunk → OKF dataset card → catalog. Unchanged content is skipped by hash; every collection is logged; consented contributions enter the same pipeline with a provenance receipt.