We focus on the theoretical sciences (math, theoretical CS, physics, etc.), where progress centers on definitions, conjectures, reductions, constructions, heuristics, and proof strategies.

What we collect (beta)

  • Task briefs (problem framing, priors, constraints)
  • Interaction transcripts (curated “episodes” with turns, branches, rationales)
  • Milestones (new lemmas, promising directions, dead-ends)
  • Outcome labels (novelty, utility, correctness proxy, “breakthrough points”)
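
As a rough illustration of how these four pieces could fit together in a single record, here is a minimal Python sketch. The beta schema itself is not specified here, so every class name, field name, and the 0-to-1 score range below is a hypothetical placeholder, and the sample episode is made up purely for demonstration.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List, Optional


@dataclass
class TaskBrief:
    # Problem framing, priors, and constraints (hypothetical field names).
    framing: str
    priors: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)


@dataclass
class Turn:
    turn_id: str
    parent_id: Optional[str]  # None for the root; turns sharing a parent form branches
    speaker: str
    text: str
    rationale: str = ""


@dataclass
class Milestone:
    turn_id: str
    kind: str  # assumed vocabulary: "lemma", "direction", "dead_end"
    note: str


@dataclass
class OutcomeLabels:
    # Scores assumed to lie in [0, 1]; the real scale is not stated in the text.
    novelty: float
    utility: float
    correctness_proxy: float
    breakthrough_turn_ids: List[str] = field(default_factory=list)


@dataclass
class Episode:
    brief: TaskBrief
    turns: List[Turn]
    milestones: List[Milestone]
    labels: OutcomeLabels

    def branch_points(self) -> List[str]:
        """Return ids of turns that have more than one child turn."""
        child_counts: Dict[str, int] = {}
        for turn in self.turns:
            if turn.parent_id is not None:
                child_counts[turn.parent_id] = child_counts.get(turn.parent_id, 0) + 1
        return [tid for tid, n in child_counts.items() if n > 1]


# Made-up sample episode, for illustration only.
episode = Episode(
    brief=TaskBrief(
        framing="Bound a quantity for a toy graph family",
        priors=["known bound for trees"],
        constraints=["elementary methods only"],
    ),
    turns=[
        Turn("t0", None, "human", "State the problem."),
        Turn("t1", "t0", "model", "Try induction on vertices.", "standard first attempt"),
        Turn("t2", "t0", "model", "Try a probabilistic argument.", "induction looks brittle"),
    ],
    milestones=[Milestone("t1", "dead_end", "Induction step fails.")],
    labels=OutcomeLabels(0.4, 0.6, 0.5, ["t2"]),
)

serialized = json.dumps(asdict(episode), indent=2)
```

Storing a parent pointer on each turn is one simple way to represent branches: the transcript stays a flat list, and branch points can be recovered by counting children, as `branch_points()` does.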

Why this matters

  • Training: episodes become curricula for creative reasoning, not just short Q&A pairs.
  • Evaluation: we score creative moves and trajectory quality, not only final answers.

Roadmap

  • v0 (now): recruit beta testers and pilot the schema with guided prompts
  • v1: public benchmark + leaderboard for creative scientific tasks
  • v2: open Arena for model-vs-model co-creative sessions (observer-rated)