# A self-building knowledge graph with a GAN-style overnight self-improvement loop

> How parallel research agents extract atomic facts into a vector-search-free knowledge graph, and how an overnight generator/discriminator loop with self-written rubrics improves voice notes, email, and skills
> by Artemii Novoselov (Forsyn)
> group: personal
> event: ai-native-startups
> source: cyber•OS — https://os.cyber.fund/case-studies/self-building-knowledge-graph-autoimprove

## Problem

If you've tried to give an agent durable memory, you've probably hit the same wall: a knowledge store that just accumulates raw notes isn't useful, and you don't want to point the agent at the exact right slice of memory every single time you ask it something. Vector search felt like the wrong primitive for this. And there's a second problem underneath it — once an agent produces an artifact (a voice note, an email, a piece of code), how do you make it get *better* over time without you hand-tuning every output? Artemii Novoselov (Forsyn) hit both running multiple businesses off one shared "bigger brain."

## Approach

Artemii runs two interlocking systems, demoed live on a stablecoin-compliance research task.

**Auto-built knowledge graph.** He kicks off research by voice (Super Whisper into the console), which fans out to parallel Claude Code agents *and* submits a task to a self-hosted OpenClaw instance on EC2. He runs about five such instances, one per business, and they're addressable from Claude, Telegram, or anywhere. The agents don't just research; they extract **atomic facts** into a live graph: red nodes are source papers/articles, yellow are concepts, blue are facts. OpenClaw builds a DAG for the given tags and pulls relevant prior facts from the company's bigger brain as context, rather than running on the submitted prompt alone. The graph is built automatically on every research run, not hand-authored. When he then asks a question ("how much does it cost to review a flagged stablecoin transaction"), the system answers by drawing from the graph and literally injecting the chosen facts, concepts, and papers into the prompt, and it can show exactly which ones it used.
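To make the three node types and the "inject and show what was used" step concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `Node` and `Graph` classes, the `tag_path` field, and `build_prompt` are hypothetical names, not Artemii's or OpenClaw's actual schema, and the node text is placeholder content.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    kind: str                      # "source" (red), "concept" (yellow) or "fact" (blue)
    text: str
    tag_path: tuple = ()           # hypothetical hierarchical tag, e.g. ("compliance", "stablecoin")
    links: list = field(default_factory=list)  # ids of the concepts/sources this node cites


class Graph:
    def __init__(self):
        self.nodes = {}

    def add(self, node):
        self.nodes[node.id] = node

    def under(self, prefix):
        """Nodes whose tag path starts with `prefix`: a tree descent, no embeddings."""
        return [n for n in self.nodes.values() if n.tag_path[:len(prefix)] == prefix]


def build_prompt(graph, question, prefix):
    """Inject matching nodes plus the nodes they link to; return the prompt and the ids used."""
    used = set()
    for node in graph.under(prefix):
        used.add(node.id)
        used.update(i for i in node.links if i in graph.nodes)
    ordered = sorted(used)
    context = "\n".join(
        f"[{graph.nodes[i].kind}:{i}] {graph.nodes[i].text}" for i in ordered
    )
    return f"Context:\n{context}\n\nQuestion: {question}", ordered


# Placeholder content only; none of this text comes from the talk.
g = Graph()
g.add(Node("s1", "source", "<title of a source paper>"))
g.add(Node("c1", "concept", "<a concept>", links=["s1"]))
g.add(Node("f1", "fact", "<an atomic fact>", ("compliance", "stablecoin"), ["c1", "s1"]))
g.add(Node("f2", "fact", "<an unrelated fact>", ("hardware",)))

prompt, used_ids = build_prompt(g, "How much does a review cost?", ("compliance",))
print(used_ids)  # ['c1', 'f1', 's1']
```

Returning `used_ids` alongside the prompt is what makes provenance cheap in this sketch: the answer can always be traced back to the exact nodes that were injected.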

- **No vector search.** It's a wiki / hierarchical-tag approach: search a tag, then a heading, then go deeper in the tree. He treats the graph as augmented context, not a source of truth. Periodic cleanup/consolidation runs keep nodes dense but tidy.
- **'autoimprove' loop.** A generator/discriminator loop (he frames it as Karpathy-style auto-research crossed with Goodfellow's GANs) iterates on an artifact overnight until every scored part of it improves. For a voice note that meant both the words and the voice itself (the loop swapped to a newer ElevenLabs v3 model). The same skill applies to email campaigns, tasks, and the skills themselves.
- **Self-generated eval rubrics.** Instead of hand-writing evals, the agent designs its own scoring rubric for an artifact (e.g. Gemini listening to a voice note), scores against it, runs autoimprove until it plateaus, then critiques and improves the rubric itself.
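The loop and the rubric step described above can be sketched as plain propose-and-score iteration. This is an illustration under stated assumptions, not the actual autoimprove skill: there are no gradients as in a real GAN, the `patience` stopping rule is my stand-in for "until it plateaus", and `toy_generate`, `toy_score`, and `toy_refine` are hypothetical placeholders for the model calls (for example, a model listening to a voice note and scoring it).

```python
def autoimprove(artifact, rubric, generate, score, patience=3, max_rounds=50):
    """Keep the best-scoring candidate; stop after `patience` rounds without a gain."""
    best, best_score = artifact, score(artifact, rubric)
    history = [best_score]
    stale = 0
    for _ in range(max_rounds):
        candidate = generate(best, rubric)      # generator proposes a revision
        s = score(candidate, rubric)            # discriminator scores it against the rubric
        history.append(s)
        if s > best_score:
            best, best_score, stale = candidate, s, 0
        else:
            stale += 1                          # dips are tolerated, not adopted
            if stale >= patience:
                break
    return best, best_score, history


def overnight(artifact, rubric, generate, score, refine_rubric, cycles=2):
    """Improve to a plateau, then let the rubric itself be critiqued and revised."""
    for _ in range(cycles):
        artifact, _, _ = autoimprove(artifact, rubric, generate, score)
        rubric = refine_rubric(rubric, artifact)
    return artifact, rubric


# Toy stand-ins for the model calls (hypothetical, deterministic).
def toy_generate(text, rubric):
    missing = [c for c in rubric if c not in text]
    return f"{text} {missing[0]}" if missing else text


def toy_score(text, rubric):
    return sum(weight for criterion, weight in rubric.items() if criterion in text)


def toy_refine(rubric, artifact):
    return {**rubric, "call_to_action": 1.5}


start_rubric = {"greeting": 1.0, "deadline": 2.0}
best, best_score, history = autoimprove("hello", start_rubric, toy_generate, toy_score)
final, final_rubric = overnight("hello", start_rubric, toy_generate, toy_score, toy_refine)
print(best_score, final)
```

Tracking the best candidate separately from the latest one is the design choice that matters here: it lets scores dip between rounds without losing earlier gains.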

What worked and what didn't, in his own words: getting the rubric self-design to work "took me a while"; he had to nudge/force it several times before it produced good rubrics. The loop "doesn't go in a straight slope"; scores dip before they climb. It's polished on voice and "works pretty well" on email, but on more subjective artifacts like HTML code it sometimes "produces hilarious slop". He keeps those failures as edge cases and feeds them back into the system.

## Results

Most of the impact shown was qualitative and demonstrated live rather than measured. The graph answered the stablecoin-cost question with full provenance: it surfaced exactly which facts, concepts, and papers were injected. The autoimprove loop visibly improved a voice note (better words plus a better voice model) in real time during the demo, and he showed the same skill running over email campaigns and tasks. On scale, the strongest stated result: he's run the graph for his own company for two months and for a second company for about a month without hitting a ceiling, which he attributes explicitly to choosing the hierarchical wiki approach over vector search. The same setup also worked on a hardware company's codebase. He also runs an OpenClaw-based Kanban (proposed / running / blocked) that he "barely looks at," where the system revisits blocked tasks overnight to unblock them.
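The overnight revisit of blocked Kanban tasks can be pictured as a single pass over the board. This is a hedged sketch only: the `Task` shape, the `try_unblock` callback, and the choice to move an unblocked task to "running" are assumptions, since the talk names the three columns but not the transitions. The task titles are made up.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"
    RUNNING = "running"
    BLOCKED = "blocked"


@dataclass
class Task:
    title: str
    status: Status
    blocker: str = ""


def overnight_pass(board, try_unblock):
    """Revisit every blocked task; move it to RUNNING if the agent resolves the blocker."""
    moved = []
    for task in board:
        if task.status is Status.BLOCKED and try_unblock(task):
            task.status = Status.RUNNING   # assumed transition; not stated in the talk
            task.blocker = ""
            moved.append(task.title)
    return moved


# Hypothetical board; `try_unblock` stands in for an agent attempting a fix.
board = [
    Task("draft email", Status.PROPOSED),
    Task("fetch filings", Status.BLOCKED, "missing credentials"),
    Task("export report", Status.BLOCKED, "waiting on review"),
]
moved = overnight_pass(board, lambda t: t.blocker == "missing credentials")
print(moved)  # ['fetch filings']
```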