wiki-smoke
2 docs · 5 chunks · created Thu, 02 Jul 2026 15:47:00 GMT
Provenance
Source: docs/wiki/GOALS.md · extractor heuristic · chunking sentence
OKF dataset card
---
type: Dataset
title: wiki-smoke
description: "Point-in-time collection: 2 docs, 5 chunks."
tags: [dataset, datalab]
timestamp: "2026-07-02T15:47:00Z"
datasetId: 20260702-104700-wiki-smoke
---

# Provenance

Point-in-time collection run `20260702-104700-wiki-smoke`: 2 documents, 5 chunks (sentence strategy, ~1414 tokens). Raw artifacts live at `data/datalab/20260702-104700-wiki-smoke/` (docs.jsonl, chunks.jsonl, manifest.json); trace `20260702-104700-9f108f80` in `data/traces/`.

# Sources

| Source | Status |
|--------|--------|
| `docs/wiki/GOALS.md` | ok |
| `docs/wiki/BUILD.md` | ok |

# Consumption

Chunks feed the training worker's pair builder and the retrieval index. Filter on `token_estimate` and `strategy` in `chunks.jsonl`. See the [data pipeline](/platform/data-pipeline.md) concept for stage details.
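The filtering step mentioned under Consumption could look roughly like the sketch below. It assumes each line of `chunks.jsonl` is a flat JSON object carrying `token_estimate` and `strategy` as top-level keys, which the card implies but does not spell out. The function name `filter_chunks` and the 512-token cap are hypothetical and only serve the illustration.

```python
import json
from pathlib import Path
from typing import Iterator


def filter_chunks(
    path: Path,
    strategy: str = "sentence",
    max_tokens: int = 512,  # hypothetical cap, not taken from the card
) -> Iterator[dict]:
    """Yield chunk records that match `strategy` and fit the token cap.

    Assumes one JSON object per line with top-level `strategy` and
    `token_estimate` keys; records lacking a numeric `token_estimate`
    are skipped rather than guessed at.
    """
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the file
            record = json.loads(line)
            if record.get("strategy") != strategy:
                continue
            tokens = record.get("token_estimate")
            if not isinstance(tokens, (int, float)) or tokens > max_tokens:
                continue
            yield record
```

For this run, a caller would point the function at `data/datalab/20260702-104700-wiki-smoke/chunks.jsonl` and iterate over the result. Streaming line by line keeps memory use flat for larger harvests.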
Sample chunks (first 5, sanitized)
# Goal alignment baselines

Evaluate new actions against these deliverables. Product claims must match `EVIDENCE.md`.

## Primary mission

Multi-tenant **synthetic orgs**: each runs **collect → train → eval → deploy** to beat zero-shot embedders (BGE, e5) on *its* corpus, with an **edge tier** (Matryoshka t=8 + int8).

## Phase A deliverables (active)

| Deliverable | Acceptance signal |
|---|---|
| F
Non-trivial features need a spec in `specs/`

## Current blockers (check live)

```powershell
pnpm work:blockers
```

Disk (BLK-DISK) blocks Docker, real-text evals, and large corpus harvests.
# B.U.I.L.D. framework

**Philosophy:** Action supersedes over-engineering. The system sharpens through consistent repetition and execution. If a process or tool does not actively add value, remove it.

## Agent runtimes (shared)

All coding agents use the same wiki + queue. Runtime-specific lanes live in `config/agents.json`.

| Runtime | Claim as | Entrypoint |
|---|---|---|
| **Cursor** | `--agent
Base: architecture & tooling

| Layer | Location | Notes |
|---|---|---|
| Raw ingest | `docs/raw/` | Dumps, API responses, unstructured reference |
| Wiki | `docs/wiki/` | Indexed docs, ADRs, finalized guidelines |
| Knowledge bundle | `knowledge/` | OKF v0.1: platform concepts, dataset cards, living SME reviews |
| Data collection | `packages/datalab` | `python -m datalab collect` → `data/datal
```powershell
uv run python scripts/build_sync.py inflow-sessions --agent claude --limit 10
uv run python scripts/build_sync.py inflow-sessions --agent opencode --limit 10
```

## 4. Loop: triaged improvement

See [IMPROVEMENT_LOOP.md](./IMPROVEMENT_LOOP.md). Route all structural changes through buckets 1–3; do not attempt full automation for system modifications.

| Runtime | Bucket policy |
|---|-