The Latticework · A Mental-Models Reading · May 2026
Field Note № 19 · Tools & Cognition

Tokenmaxxing.

A latticework reading of Y Combinator's Lightcone on Garry Tan's return to code: which mental models hold, which crack, and which ones to add.

Lightcone hosts on the Y Combinator stage

Photo: Y Combinator / Lightcone

400× · Output multiplier
$200 · Claude Code budget
5 days · Posterous, rebuilt
15 · Parallel agents
I · The Frame

What this transcript is really about.

Charlie Munger's "latticework" idea is that worldly wisdom comes from holding many disciplines' core models in your head at once, and reaching for the right one in the right moment. Most podcast episodes give you anecdote. A useful one gives you a perturbation: it tests the latticework you already have. This Lightcone episode is the second kind. Across forty minutes, Y Combinator's CEO and his partners are not just describing what Garry Tan did with Claude Code. They are, without quite saying so, proposing edits to the canon.

Three kinds of edits are on offer. Some classic Farnam-Street models come out amplified: leverage, activation energy, inversion, and margin of safety all get crisper, more extreme illustrations than they have ever had. Some get contradicted, or at least bent: diminishing returns, the path of least resistance, specialization, and the old "lines of code is a vanity metric" piety. And finally, the episode dangles a handful of new models — not yet on Farnam Street's list — that earn a place in the latticework.

What follows is a structured pass through all three.

II · The Reinforced

Models the episode amplifies.

These are existing entries on the Farnam Street list. Tan's run gives each of them a sharper, more memorable instance than they had before.

Reinforced · 01
Physics & Chemistry · Leverage

Leverage, turned to eleven.

Munger's classic: a small force, applied with mechanical advantage, yields disproportionate output. The episode is a study in this — but the multipliers are unfamiliar. One person, one Mac, $200 of inference, and the work of four hundred engineers comes out the other side.

Tan delivers the 400× number not as a brag but as a personal shock; his own opener is "I'm relatively shocked myself." The hosts nudge him back to the project that started the run: rebuilding Posterous (his 2008 YC startup, sold to Twitter for ~$20M) for a third time as Garry's List, in five days, on a $200 Claude Code Max account, including a full agentic newsroom built on top.

>> Well, I'm relatively shocked myself. So I'm amazed as well. It was 13 years of not coding and then suddenly boom, I'm doing about 400x the amount of work that I was that year. The last time I was even sort of like two-thirds of the time writing code. Maybe to start things off, how about we go back to the project that started it all off, which was Garry's List. Oh, yeah. And just like talk about a few months ago how you powered up Claude Code and like started to get back to coding. >> It was right after one of the Lyon episodes, right? >> Oh yeah, definitely. I realized that I wanted to bring together all the people who believed what I believed, particularly for California. And so I started a 501(c)(4) and now it's a (c)(3) and a PAC, which is sort of what a lot of political groups do. It's a very common way to bring people together. You know, everyone focuses on the money, but we're trying to bring together smart people. You know what I learned in

The lesson is not that Tan is exceptional. It is that the lever has lengthened so much that the question becomes: what are you willing to push against?

Reinforced · 02
Physics & Chemistry · Activation Energy

The threshold most people don't cross.

"Activation energy" is the one-shot cost to get a reaction started. Tan's complaint about his critics is, mechanically, an activation-energy complaint: the people most equipped to benefit are the people who haven't yet paid the upfront cost — installing the tools, paying for Opus tier, learning to write skills, tolerating the first week of slop.

The frame surfaces near the end of the episode, while Tan is responding to critics who don't believe his output is real. He declines to argue the math. "Stop fighting. Just open Claude Code and try it." The threshold he names isn't intelligence — it's the cost of the first install, plus the willingness to spend a real token budget for a week.

know, believe, right? So, stop fighting, just open Claude Code and try it. You know, >> I think another thing that's potentially going on is just like the experiences vary dramatically depending on like the models and the harnesses. Like certainly something I've noticed is any sort of like semi-complicated programming task I try and do through my OpenClaw agent just like kind of fails. Like it's exactly the same model, and so like Opus 4.7 as Claude Code, but it just like, like anything above like a simple script I just find like it's not like
Reinforced · 03
General Thinking · Inversion

A second model, used to falsify the first.

Inversion asks: what would guarantee failure? Tan's /codex skill operationalises it. After a feature is planned and built by a confident generalist agent (Claude Code), a second, slower, more rigorous agent is invoked with a single instruction: find all the problems and bugs. The optimist proposes; the pessimist disposes.

Tan describes arriving at a YC batch event "brain totally frazzled" and overhearing founders praising Codex over Claude Code. He'd been Claude-only. The reason, they told him: Claude is good but will "BS a bunch of stuff"; Codex is the "200-IQ, nearly nonverbal CTO" you call in when something gets harder. That conversation seeded the /codex GStack skill — it hands the freshly-built code to a second, slower model with one instruction: find every bug.

to an event and brain totally frazzled but you know went to one of our batch events and we were just you shooting the about what was going on with Claude Code versus Codex and at the time I was a total Claude Code only guy and I realized oh a lot of people actually prefer Codex. Why is that? And I discovered that Claude Code is ideal for the ADHD CEO, but once in a while there's a, you know, Claude Code will just BS a bunch of stuff. Like Claude models are very very good, but like they are not the smartest, it turns out. And so a lot of people, you know, explained to me that if you have a problem that's much crazier, you need the 200 IQ nearly nonverbal CTO. So you can just call in a friend and then that's what like /codex is. It's a, you know, GStack skill that takes whatever plan your plan is or if you're out of plan mode and you already implement it, it'll take your repo and it'll run Codex in a command line prompt with the prompt that says find
Reinforced · 04
Systems · Margin of Safety

80% test coverage as a load-bearing wall.

Without margin, vibe-coded software is "10× worse than human-written code" — Tan's words. The fix is unromantic: tests, before users touch anything. The new wrinkle is that the cost of producing them has collapsed, so the old excuse for skipping (it's tedious) is gone. Margin of safety used to be expensive prudence. It's now the default setting.

Tan's setup is a confession: early Garry's List was "slop" because he skipped tests. "I knew I needed to have it, but I'm here to write fun new code." The fix wasn't discipline; it was discovering Claude could produce the tests cheaply. Hitting 80–90% coverage went from chore to default once the cost collapsed.

Reinforced · 05
Systems · Hierarchical Organization

Thin harness, fat skills.

The phrase, which Tan credits to Pete Koomen, describes a clean two-layer hierarchy: a generic execution loop below, an editable layer of plain-English judgement above. Don't rebuild the bottom. Don't bury the top.

The phrase comes from YC partner Pete Koomen, who noticed every team was rewriting the same generic agent loop. Tan's wedding-planner example does the work: a checklist teaching the next person to throw a wedding, in plain English, is markdown; calling twenty venues is code. The boundary becomes the architecture.

like you know why should we rewrite a version of that over and over again like you know we should just use the things that are really awesome as you know harnesses like a harness is the core loop that takes the user input gives it to the LLM runs what the LLM does like it can do tool calls and things like that I mean why would we build that like what we should spending all our time doing is thinking about what markdown should there be? And the way to think about markdown is if you were an event planner and throwing a wedding and you were trying to write down a checklist of how to throw a wedding again, like what would you what would you write in plain English to teach the next person who had to do it what to do? All of that should be in the markdown. Whereas all the things that should you know be deterministic like I mean or is is a real action like a a wedding planner might have to call like 20 venues right but you wouldn't use markdown for that like you would make a you know a call to Twilio for instance right there's like a
Reinforced · 06
Economics · Trade-offs & Opportunity Cost

The SF-rent argument, generalised.

Tan's analogy: founders see San Francisco rent as expensive, but the truer accounting is that it is more expensive not to be in Dogpatch, where the serendipity is. Token spend works the same way. The naive frame is "models cost too much"; the correct frame is "the cheaper option is the one that quietly costs you the upside."

The analogy lands when the hosts ask whether it's reasonable to expect founders to drop $500 a day on tokens. Tan revisits a familiar YC moment — founders insisting Bay Area rent is too expensive to be worth it. "It's so expensive to not live there." Tokens, he says, are the new rent: the cheap option is the actively expensive one.

paradigm. >> It actually reminds me of rent. San Francisco rents. Like one of the things that I feel like we always have to do with YC founders is that it's like a general thing. I was like, "Oh, like I don't want to move to San Francisco because it's like so expensive to live there, but it's like >> it's so expensive to not live there." >> Yeah, exactly. That's the whole point, right? Like early on in a YC batch, like I'm used to like a fan of being like like this like this apartment is like thousands of dollars a month in rent. Like seems ridiculous. Like should I like pay it or not? And it's like, no, you should absolutely pay. And if anything, you should pay more to not just be in San Francisco, but be in like the dog patch and just like be in like neighborhoods where you create this serendipity. Like token maxing is going to be one of those things for founders that we sort of have to teach them where it's not immediately obvious that you shouldn't. This is actually like rent. Like this is one of the things where you should like spend as much as you can to like get the like most utility out of it
III · The Contradicted

Models that do not survive intact.

Some entries on Farnam Street's list look different after this episode — sometimes inverted, sometimes simply de-rated. Use them with care from now on.

Bent · 01
Systems · Law of Diminishing Returns

The curve has not yet bent.

Conventional wisdom says that each marginal dollar buys progressively less. Tan's claim is that, for inference today, the curve is still nearly linear: each extra $5 of Opus calls buys real new context — another twenty sources, another round of red-team review, another full pass of tests. We are, briefly, in a regime where diminishing returns hasn't caught up. Plan accordingly; the regime won't last.

Tan never says "diminishing returns," but the figure makes the claim implicit: for the cost of $5–$10 in Opus calls, the agent does what would take a meticulous human a month of reading and cross-referencing. He frames the deal as boil the ocean — the machine doesn't care; the curve hasn't bent yet.
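
One way to state the claim precisely, offered as a sketch rather than anything said in the episode: write V(s) for the value obtained from s dollars of inference. Both symbols, and the threshold s*, are assumptions introduced here for illustration.

```latex
\underbrace{V''(s) < 0}_{\text{diminishing returns}}
\qquad \text{versus} \qquad
\underbrace{V'(s) \approx c > 0 \quad \text{for } 0 \le s \le s^{*}}_{\text{the regime Tan describes}}
```

Here s* is the unknown spend at which the curve starts to bend. The caution that the regime won't last amounts to saying that s* is finite and will be reached.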

Bent · 02
Biology · Tendency to Minimize Energy Output

The path of least resistance is now the wrong default.

Organisms conserve effort. Engineers historically have too — write the test that catches the bug, not the test that catches every bug. Tan's "boil the ocean" inverts this. When the marginal cost of thoroughness has crashed, the lazy choice and the right choice diverge. Catch yourself any time you start economising on tokens, sources, or test cases.

When the hosts press on whether token-maxxing is sustainable, Tan invokes his earlier Boil the Ocean essay. The thrust: when an LLM can read twenty sources instead of one, settling for one is the lazy mistake, not the prudent move. The path of least resistance has inverted; conservation of effort no longer protects you.

Bent · 03
Economics · Specialization

The generalist returns.

For half a century, the playbook said: hire a frontend specialist, a backend specialist, a QA specialist. The episode reverses the polarity. The human stays generalist — taste, judgement, agency, prompts. The agents specialise, via skills like plan-CEO, codex, browse-QA. Specialization migrates from carbon to silicon.

Tan describes his daily setup running 15 Conductor windows in parallel — each one a separately-skilled agent (plan-CEO, codex, browse-QA, designer). He's the generalist orchestrator; the specialists are silicon. Diana Hu reflects that this is the inverse of how YC has been advising team composition for years.
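
The fan-out shape can be sketched in a few lines. This is a minimal illustration, not Conductor's interface: `run_agents`, `make_stub`, and the task string are hypothetical, and each "agent" is a plain function standing in for a separately-skilled model session.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

Agent = Callable[[str], str]  # hypothetical: takes a task, returns a report


def run_agents(agents: Dict[str, Agent], task: str) -> Dict[str, str]:
    """Hand the same task to every specialist at once and collect the reports."""
    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        futures = {name: pool.submit(agent, task) for name, agent in agents.items()}
        return {name: future.result() for name, future in futures.items()}


def make_stub(role: str) -> Agent:
    """Stand-in for a real agent session; it only labels the task with its role."""
    def agent(task: str) -> str:
        return f"[{role}] handled: {task}"
    return agent


# Role names are the ones mentioned in the text; the task is made up.
roles = ("plan-CEO", "codex", "browse-QA", "designer")
reports = run_agents({role: make_stub(role) for role in roles}, "add login page")
```

The human generalist sits outside this function: choosing the task, reading the reports, and deciding which specialist to trust.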

Bent · 04
Folk Wisdom · "Lines of Code is a Vanity Metric"

Old proxy, new validity.

The orthodox dismissal of LoC was correct in its native context: humans pad code, optimise for legible effort, and game whatever metric they're paid against. Put an agent in the author's seat and the metric quietly re-acquires signal, because agent output carries far less padding. Tan's claimed 100× came out higher after de-padding, not lower: roughly 400×. A retired metric, conditionally rehabilitated.

When critics challenged his 100× claim on X, Tan ran a published logical-lines-of-code normalizer against his 2013 code and his 2026 output. The de-padded multiplier wasn't smaller; it was larger, about 400×. "It actually went up." By his account the normalizer trimmed the Claude Code output only a little but cut his 2013 line count by roughly 70%. The vanity-metric dismissal was calibrated to human authors; the agents pad far less.
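
The arithmetic behind "it actually went up" can be sketched as follows. This is an illustration under stated assumptions, not the tool Tan ran: `logical_lines` is a deliberately crude counter (non-blank, non-comment lines), and `adjusted_multiplier` and its parameter names are hypothetical.

```python
def logical_lines(source: str) -> int:
    """Crude logical-line count: skip blank lines and pure '#' comment lines."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def adjusted_multiplier(raw_multiplier: float, old_cut: float, new_cut: float) -> float:
    """Rescale an output multiplier after both codebases are de-padded.

    old_cut: fraction of lines removed from the baseline (here, the 2013 code).
    new_cut: fraction of lines removed from the recent, agent-directed code.
    """
    if not (0 <= old_cut < 1 and 0 <= new_cut < 1):
        raise ValueError("cuts must be fractions in [0, 1)")
    return raw_multiplier * (1 - new_cut) / (1 - old_cut)


sample = "x = 1\n\n# a comment\ny = 2\n"
sample_count = logical_lines(sample)                   # two logical lines

face_value = adjusted_multiplier(100, 0.70, 0.0)       # roughly 333
needed_for_400 = adjusted_multiplier(100, 0.75, 0.0)   # 400
```

Taken at face value, a raw 100× with the baseline cut by 70% and the agent output untouched lands near 333×; reaching 400× needs the baseline cut to be 75% or a little more, which is within the slack of a spoken "like 70%". Either way the direction of the correction is the point: shrinking the denominator raises the ratio.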

It also kind of does, right? >> Yeah. Like it does. It's clearly And you know what's interesting is you can actually there's well-published git repos out there that you can run to strip away and like standardize what is actual logical lines of code. And so I actually did go ahead and do that. You know, and I got into trouble for saying like, oh, I'm coding at like a 100x the rate that I was in 2013. And then after I did the logical lines of code strip down it actually went up. >> It actually went up. So it turns out that I was actually doing 400x the amount of code. But you know obviously I wasn't writing it. I was directing you know 15 agents at a time to do so. And then by the numbers like it was not that it did like knock down my lines of code from Claude Code a little bit but the surprising thing to me was that it knocked down the amount of lines of code that I was writing in 2013 by like 70%. >> And so I think that that's sort of the mismatch here. Like people get very
Bent · 05
General Thinking · The Map Is Not the Territory

When the map compiles.

Korzybski's warning held when maps were inert representations. Markdown skills are not inert — they are the executable artifact. The English description of the wedding-planner checklist is the program that runs the wedding. Map and territory do not collapse, exactly, but the gap shrinks to something thinner than the canon assumes.

The reframing comes from Tan defending himself on X against the charge of "just peddling markdown." His rebuttal: markdown is code now. It is compiled differently, but it executes.

IV · The New

Models worth adding to the latticework.

These don't appear on Farnam Street's index. They earn entry by being load-bearing in the episode and portable beyond it.

New · 01
Coined · Cognition & Capital

Tokenmaxxing.

The deliberate practice of overspending on inference because the bottleneck is not cost but completeness. Generalises beyond LLMs: any context where the marginal cost of "thoroughness" has crashed — simulation, A/B testing, code review, research — invites a tokenmaxxing posture. Inverse of: minimum viable effort.

The verb arrives mid-episode as a joke and sticks. By the closing argument Tan is using it in the generalized sense, well beyond inference, and due diligence joins the list of places where it applies.

New · 02
Coined · Architecture

Thin Harness, Fat Skills.

Architectural principle. Keep the generic execution loop ("the harness") as small and replaceable as possible; push every domain-specific judgement into editable, plain-language skill files. Optimise for what is hot-swappable. Applies far beyond agents — any system where the rules change faster than the runtime should look this way.

Tan attributes the framing to Pete Koomen, who had been writing about it after YC's partners rebuilt their internal agent harness for the third time. The post-hoc realization: every team was building a harness, not a product. The valuable code is markdown — and it's hot-swappable in a way the loop isn't.
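
A minimal sketch of the split, assuming a toy model interface: `ModelReply`, `run_harness`, `scripted_model`, `call_venue`, and the skill text are all hypothetical names invented for illustration, and the "model" is a scripted stand-in rather than a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ModelReply:
    tool: Optional[str] = None  # tool the model wants to call, or None when finished
    argument: str = ""          # argument for that tool
    answer: str = ""            # final answer, used when tool is None


Model = Callable[[str, List[str]], ModelReply]  # (skill text, transcript) -> reply
Tool = Callable[[str], str]


def run_harness(model: Model, skill_markdown: str, tools: Dict[str, Tool],
                user_input: str, max_steps: int = 8) -> str:
    """The thin part: a generic loop that holds no domain knowledge of its own."""
    transcript = [f"user: {user_input}"]
    for _ in range(max_steps):
        reply = model(skill_markdown, transcript)
        if reply.tool is None:
            return reply.answer
        tool = tools.get(reply.tool)
        result = tool(reply.argument) if tool else f"unknown tool: {reply.tool}"
        transcript.append(f"tool {reply.tool}({reply.argument}) -> {result}")
    return "stopped: step limit reached"


# The fat part: plain-English judgement, editable without touching the loop.
WEDDING_SKILL = """# Skill: plan a wedding
1. Confirm the date with the couple.
2. Check venue availability before anything else.
"""


def call_venue(name: str) -> str:
    # Deterministic action; a real version might place a call or hit an API.
    return f"{name}: available"


def scripted_model(skill: str, transcript: List[str]) -> ModelReply:
    # Stand-in for an LLM: request one venue check, then give a final answer.
    if not any(line.startswith("tool ") for line in transcript):
        return ModelReply(tool="call_venue", argument="Venue A")
    return ModelReply(answer=f"Done after: {transcript[-1]}")


result = run_harness(scripted_model, WEDDING_SKILL,
                     {"call_venue": call_venue}, "Plan our wedding")
```

Swapping `WEDDING_SKILL` for another markdown file changes what the system knows how to do while `run_harness` stays untouched, which is the hot-swappable property the principle is after.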

New · 03
Metaphor · Tools & Self-reliance

The Ferrari–Mechanic Bargain.

Powerful new tools require their users to also be the repair crew. Capability and self-reliance must scale together; you cannot accept the one without the other. Implication: the population that benefits from frontier tools is bounded by the population willing to debug them at 2 a.m.

The metaphor opens the episode. Tan is mid-explanation of OpenClaw when he reaches for it — exhilarating, he says, "but it'll break down on the side of the road when you most need it." Pop the hood, grab the wrench, fix it yourself. The implication: capability and self-reliance are not separable purchases.

control over you? Using OpenClaw these days is like driving a Ferrari and it's like exhilarating. It's insane. Like you get to do things like it figures things out you would never think a machine could figure out and it does it so quickly. But then it's also like a Ferrari in that you better be a mechanic. Like it's a Ferrari that will break down on the side of the road, you know, when you most need it and you need to get out with your wrench and pop the hood and like fix it, you know, you're gonna have to fix it yourself. And so this is a very exciting time in computer science and technology. Welcome back to a special episode of the Lightcone. In this episode, we're going to talk about how Garry Tan got back to building. If you follow us on Twitter, you'll know that after a multi-year
New · 04
Composition Pattern · Decision-making

The CEO + Codex Pair.

A two-model protocol: pair an optimistic, fast generalist with a slower, more rigorous auditor. The first proposes; the second falsifies. Generalises: any high-stakes judgement benefits from a structurally different second opinion before commitment. Investment committees, code review, and medical second opinions all look like this.

The pattern emerges from the same YC-batch-event conversation as Inversion. Tan formalizes it later: an ADHD-CEO model (Claude Code) proposes; a slower, more rigorous CTO model (Codex) audits. Generalizes well beyond agents — investment committees, code review, and medical second opinions all use the same shape.
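
The protocol reduces to a short loop. What follows is a sketch under assumptions, not the /codex skill itself: `propose_then_audit`, `Review`, and the two stub functions are hypothetical, with plain functions standing in for the fast model and the rigorous one.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Review:
    problems: List[str] = field(default_factory=list)

    @property
    def approved(self) -> bool:
        return not self.problems


Proposer = Callable[[str, List[str]], str]  # (task, known problems) -> draft
Auditor = Callable[[str], Review]           # draft -> problems found


def propose_then_audit(task: str, proposer: Proposer, auditor: Auditor,
                       max_rounds: int = 3) -> Tuple[str, List[str]]:
    """The optimist drafts, the pessimist hunts for faults; repeat until clean."""
    problems: List[str] = []
    draft = ""
    for _ in range(max_rounds):
        draft = proposer(task, problems)
        review = auditor(draft)
        if review.approved:
            return draft, []
        problems = review.problems
    return draft, problems  # unresolved problems go back to the human


def fast_proposer(task: str, problems: List[str]) -> str:
    # Stub for the confident generalist: fixes only what it is told about.
    if "no zero check" in problems:
        return "def divide(a, b): return a / b if b else None"
    return "def divide(a, b): return a / b"


def strict_auditor(draft: str) -> Review:
    # Stub for the rigorous second model: its one job is to find problems.
    return Review([] if "if b" in draft else ["no zero check"])


draft, open_problems = propose_then_audit("write divide", fast_proposer, strict_auditor)
```

The design choice that matters is that the auditor never sees the proposer's reasoning, only the artifact, so its objections are structurally independent of the first opinion.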

New · 05
Coined · Time & Attention

Time-Billionaire by Proxy.

You cannot extend your own life. You can buy machine-lifetimes pointed at the causes you care about. Token spend converts, with imperfect fidelity, into surrogate consciousness-hours. Reframes "compute budget" from operating cost to cognitive endowment.

Diana Hu asks if running YC made the side-project run easier — counter-intuitive given Tan's time-scarcity. Tan reframes scarcity entirely: "I am in a crazy rush in my brain… I need every single moment to count." Then the rotation: you can't extend your own life, but you can buy millions of years of machine-consciousness pointed at the causes you care about.

personally like I think my philosophy is I am in a crazy rush in my brain. I'm like probably live 10 billion lifetimes to live in this body right now and I need every single moment to count. and then if you can token max it's like I mean you could buy millions of years of consciousness of machine consciousness. Now I can be a time billionaire. It's not you know my own time. It's the time of a machine like doing work for me and like the human entities that I care about working on the causes that I care about, right? I care about YC. I care about builders being able to build. Even in a lot of our internal meetings last year, remember in our offsites, we would talk about like how do we teach the next generation how to use these tools? And
New · 06
Historical Analogy · Personal Computing

Personal AI as Personal Computer.

The 2026 analogue of the 1976 Apple-I moment. Two paths split: hosted AI (a curated feed; someone else's prompts and business model) versus owned AI (your prompts, your data, your loop). Frames the choice not as feature comparison but as autonomy. Most users will not notice the fork until it has closed.

Tan invokes the Homebrew Computer Club and the original Apple I — a breadboard in a wooden case, held together with duct tape. The 2026 analogue is a $500 token spend, a MacBook, and a stack of skill markdown. The fork in the road, he says, is between owning your prompts and renting your cognition from someone else's algorithm.

New · 07
Reframing · Code & Language

Markdown is Code.

Plain-English skill files are an executable specification compiled differently. The corollary: the people who can write precise prose now have a path into systems they previously could not author. Not democratisation in the cheap sense — the writing must still be good — but the union of "writers" and "developers" enlarges sharply.

Delivered while Tan is defending himself against trolling on X. His argument there is the corollary in its plainest form: precise prose is now a way into systems that once demanded a programming language.

New · 08
Architectural Heuristic · Latent vs Deterministic

Latent-Space-Aware Engineering.

Decide explicitly: which parts of your system run in deterministic code (zeros and ones, brittle, exact), and which run in LLM latent space (semantic, fuzzy, context-aware)? The new architecture diagram has two halves, not one. Most agentic-engineering pain comes from putting the wrong logic on the wrong side.

Tan extends the wedding-planner analogy: code doesn't know who you are; LLMs do. "The magic right now as an engineer is figuring out how much of it is over here in LLM land and how much over there in code land." Most agentic failures, he claims, come from putting logic on the wrong side of that line.
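
Where that line falls can be made concrete with a toy example. This is a sketch, with every name hypothetical (`total_deposit`, `choose_venue`, the venue names, the brief), and a lambda standing in for a model call.

```python
from typing import Callable, Dict, List


def total_deposit(quotes: Dict[str, int], rate_percent: int) -> int:
    """Code land: exact arithmetic that must give the same answer every time."""
    return sum(quotes.values()) * rate_percent // 100


def choose_venue(judge: Callable[[str], str], brief: str, venues: List[str]) -> str:
    """LLM land: a fuzzy preference call delegated to a model.

    The reply is validated in code, so a malformed answer fails loudly
    instead of flowing into the deterministic half.
    """
    prompt = f"{brief}\nPick exactly one of: {', '.join(venues)}"
    reply = judge(prompt).strip()
    if reply not in venues:
        raise ValueError(f"model picked unknown venue: {reply!r}")
    return reply


# Illustrative data only.
quotes = {"Garden Hall": 4000, "Pier 9": 6000}
deposit = total_deposit(quotes, rate_percent=20)

stub_judge = lambda prompt: "Garden Hall"  # stand-in for a real model call
pick = choose_venue(stub_judge, "The couple wants something small and outdoors.",
                    list(quotes))
```

Putting the deposit sum in a prompt, or the taste judgement in an if-statement, is the wrong-side mistake the heuristic warns about.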

things that should you know be deterministic like I mean or is is a real action like a a wedding planner might have to call like 20 venues right but you wouldn't use markdown for that like you would make a you know a call to Twilio for instance right there's like a you sort of all of the difficulty in agentic engineering today is when people try to do things that should be in markdown in code and it fails because code is brittle, it doesn't understand special cases. It actually you know code literally doesn't understand what you want or who you are. It is like you know executing deterministic zeros and ones in a Turing-complete loop right like it doesn't know but then now we have LLMs that have latent space and they know who you are and it knows what your motivations are and it can handle generic cases and then you know a lot of the the magic right now as an engineer is like figuring out okay how much of it is over here in LLM land and how how
New · 09
Re-rated Idiom · Strategy

Boil the Ocean (re-rated).

Old idiom, opposite advice. Once a synonym for misallocated effort, "boil the ocean" now describes the move that fits the moment. Re-rate it whenever the marginal cost of completeness collapses in your domain. The shape of the heuristic is the same; its sign has flipped.

Boil the Ocean is Tan's own essay title from before this episode. Old idiom for "don't try to do everything." His revision: when the LLM can do everything cheaply, you should. Same shape, opposite advice.

V · The Field Card

When to reach for which.

A practical question, not a theoretical one: standing in front of a real decision, which of these models do you actually pull off the shelf?

VI · Coda

The latticework, after Lightcone.

Munger's argument for the latticework was always anti-fragility: many independent disciplines, each generating models, so that no single failure of any one model ruins your judgement. This episode is useful precisely because it does not respect the existing inventory. It takes some classics and amplifies them. It bends others. It contributes a handful of new ones with surprising portability.

Will you have control over your own tools, or will your tools have control over you? That is the defining question. (Garry Tan, Lightcone S26E19)

The honest summary is that the latticework, after listening, is heavier. Heavier in the load-bearing sense — more tools, applied more often, against decisions that used to be made by reflex. The episode's most enduring contribution may turn out to be neither the 400× number nor the GStack repo, but the pattern it sets: watch closely whenever a frontier moves a marginal cost to zero, because the model you trusted last week probably needs to be re-rated.

★   END   ★
A latticework reading of Lightcone S26E19, against fs.blog/mental-models.