Data Refinery — Blue Hen RE

Provenance

Source: content/fleet/bd/queue.json · extractor heuristic · chunking sentence

OKF dataset card

---
type: Dataset
title: Validation Lab — promotion queue and certification scorecards
description: "Point-in-time collection: 1 docs, 1 chunks."
tags: [dataset, datalab]
timestamp: "2026-07-02T23:50:48Z"
datasetId: 20260702-185048-validation-lab---promotion-queue-and-certificati
---

# Provenance

Point-in-time collection run `20260702-185048-validation-lab---promotion-queue-and-certificati`: 1 document,
1 chunk (sentence strategy, ~298 tokens). Raw artifacts live at
`data/datalab/20260702-185048-validation-lab---promotion-queue-and-certificati/` (docs.jsonl, chunks.jsonl, manifest.json);
the trace `20260702-185048-c3b16b54` is in `data/traces/`.
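
As a minimal sketch of the layout described above, the artifact paths for a run can be derived from its `datasetId`. The helper name `artifact_paths` and the `root` parameter are illustrative assumptions, not part of the pipeline; only the directory and the three file names come from this card.

```python
from pathlib import Path

# File names listed on this card for each collection run.
ARTIFACT_NAMES = ("docs.jsonl", "chunks.jsonl", "manifest.json")


def artifact_paths(dataset_id: str, root: Path = Path("data/datalab")) -> dict[str, Path]:
    """Map each artifact name to its path under the run directory.

    Hypothetical helper: it only joins paths and does not check
    that the files exist.
    """
    run_dir = root / dataset_id
    return {name: run_dir / name for name in ARTIFACT_NAMES}


if __name__ == "__main__":
    run_id = "20260702-185048-validation-lab---promotion-queue-and-certificati"
    for name, path in artifact_paths(run_id).items():
        print(f"{name}: {path.as_posix()}")
```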

# Sources

| Source | Status |
|--------|--------|
| `content/fleet/bd/queue.json` | ok |

# Consumption

Chunks feed the training worker's pair builder and the retrieval index.
Filter on `token_estimate` and `strategy` in `chunks.jsonl`. See the
[data pipeline](/platform/data-pipeline.md) concept for stage details.
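
The filtering step above might look like the following sketch. It assumes each line of `chunks.jsonl` is a JSON object carrying `token_estimate` (a number) and `strategy` (a string), as named on this card; the function name `filter_chunks`, the `max_tokens` threshold, and the `id` field in the sample records are hypothetical.

```python
import json
from typing import Iterable, Iterator


def filter_chunks(
    lines: Iterable[str],
    strategy: str = "sentence",
    max_tokens: int = 512,  # assumed threshold, not a pipeline default
) -> Iterator[dict]:
    """Yield chunk records that match the strategy and fit the token budget."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        chunk = json.loads(line)
        if chunk.get("strategy") != strategy:
            continue
        tokens = chunk.get("token_estimate")
        # Skip records with a missing or non-numeric estimate.
        if isinstance(tokens, (int, float)) and tokens <= max_tokens:
            yield chunk


if __name__ == "__main__":
    # Hypothetical records; real files may carry additional fields.
    sample = [
        '{"id": "a", "strategy": "sentence", "token_estimate": 298}',
        '{"id": "b", "strategy": "sentence", "token_estimate": 900}',
        '{"id": "c", "strategy": "fixed", "token_estimate": 100}',
    ]
    for chunk in filter_chunks(sample):
        print(chunk["id"], chunk["token_estimate"])
```

To run it against a real collection, pass an open file handle for the run's `chunks.jsonl` instead of the sample list; the generator reads one line at a time, so the file does not need to fit in memory.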

Sample chunks (first 1, sanitized)

{ "version": 1, "updated": "2026-06-28", "description": "Research → BD promotion queue. Research writes candidates; BD runs pilots and issues charters.", "candidates": [ { "id": "barlow-wave2-research-rag", "siteId": "research", "method": "Barlow Twins", "status": "awaiting_pilot", "submittedAt": "2026-06-28", "recipe": { "loss": "barlow", "barlowLam

33a0298c9a21f1e1
