The current generation of AI coding tools — Cursor, Copilot, Claude Code, Windsurf — solved a real problem. For developers building web applications, internal tools, SaaS backends, and the broad swath of "code that runs on rented servers and shows up on screens," these tools delivered the largest productivity step function in a generation.

I've used them. They are remarkable.

But they were built for a specific kind of code, by people who write that kind of code, optimized for the patterns of that kind of code. And when you take them outside that context — when you ask them to reason about a firmware bug that only manifests after 47 hours of runtime, or a sensor driver that fails silently when the I2C bus is at 99% utilization, or an interrupt handler that crashes a forklift in the back corner of a warehouse — they break in instructive ways.

This essay is about what those breaks reveal, and why I think the next generation of AI engineering tools will look very different from the current one.


The hidden assumption

Look closely at any of the leading AI coding assistants and you'll find a hidden assumption baked into their design: the code being written today looks roughly like the code being written yesterday, on the same machine, in the same language, against the same APIs.

This is reasonable. The training data overwhelmingly reflects this world. JavaScript in the browser, Python in notebooks, Go services on Linux. The patterns are dense. The feedback loops are short. The errors surface in seconds. The runtime is forgiving.

The assumption holds beautifully — until the code has to leave the screen.

The moment your code begins to interact with the physical world — through a sensor, an actuator, a real-time constraint, a microcontroller, an industrial protocol — almost every assumption underneath modern AI coding tools begins to fail.

Let me show you specifically how.


Five places where general-purpose AI fails embedded engineers

1. Context that doesn't fit in a context window

A typical AI coding session works like this: the assistant reads the relevant files, understands the intent, and proposes changes. The model is reasoning over the visible code.

In embedded development, the most important context is often invisible:

- The silicon revision of the chip on your bench, and the errata sheet that comes with it
- The datasheet and reference manual, with their timing diagrams and footnotes
- The board schematic: pull-ups, bus loading, which pins are actually connected to what
- The compiler flags, the linker script, the memory map
- The oscilloscope trace from the last time this failure happened

None of this lives in your repository. None of it can be easily piped into a prompt. And without it, an AI's suggestions are worse than useless — they look authoritative while being subtly wrong in ways that pass code review and fail in production.

A general-purpose AI looks at your code and says "this looks fine."

A purpose-built tool says "this looks fine if you're running on the silicon revision after B0. If you're on A2, this exact pattern triggers an erratum on the SPI peripheral."

Those are not the same product.

2. Bugs that aren't bugs in the data

Most AI coding tools learned what a bug looks like by training on millions of examples of code-with-bugs and code-without-bugs from public repositories. This works extraordinarily well for the kinds of bugs that get caught, fixed, and committed publicly.

It works poorly for the bugs of physical computing.

Race conditions in interrupt handlers don't usually show up in a public commit titled "fix race condition in ISR." They show up as a single line changed in a commit titled "increase delay" — because the engineer who fixed it didn't know why increasing the delay worked, only that it did. The cause was never written down. It only exists in the head of someone who has debugged dozens of similar systems.
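To make the shape of that bug concrete, here is a minimal sketch in C. It assumes a Cortex-M target with CMSIS available, so __disable_irq() and __enable_irq() exist; the handler name is illustrative. Adding a delay makes the torn read rarer, which is exactly why "increase delay" commits exist. The critical section is the actual fix:

    #include <stdint.h>

    /* Shared between an ISR and the main loop. `volatile` stops the
       compiler from caching the value -- it does NOT make the access
       atomic. On a 32-bit core, this 64-bit read takes two loads. */
    static volatile uint64_t tick_count;

    void SysTick_Handler(void)
    {
        tick_count++;
    }

    uint64_t get_ticks_racy(void)
    {
        /* An interrupt between the two 32-bit loads yields a torn
           value: high half from before the tick, low half from after. */
        return tick_count;
    }

    uint64_t get_ticks_safe(void)
    {
        uint64_t snapshot;
        __disable_irq();          /* critical section (CMSIS intrinsic) */
        snapshot = tick_count;
        __enable_irq();
        return snapshot;
    }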

The training data has a survivorship bias problem in physical computing: the real lessons are encoded in folklore, not Git history.

3. The cost asymmetry of mistakes

In SaaS development, an AI suggestion that's wrong costs you a few seconds and maybe a deploy. In embedded development, an AI suggestion that's wrong can cost you:

- A hardware respin measured in months
- A fleet of devices bricked in the field, with no deploy button to press
- Weeks of debugging a failure that only reproduces on real hardware, at temperature, after hours of runtime
- In the worst cases, damaged equipment or a safety incident

This shifts the value of uncertainty quantification from "nice to have" to "essential." A general-purpose AI tool will confidently suggest a fix without flagging that it has lower confidence in this domain. A purpose-built tool for physical-world code needs to be deeply aware of what it doesn't know — and to refuse to guess.

Embedded engineers have a working theory of this already. They call it "respect for the metal."

The current generation of AI tools does not respect the metal.

4. Real-time constraints that AI doesn't see

When a general-purpose AI suggests refactoring a function for clarity, it considers correctness and readability. It does not consider:

- Worst-case execution time
- Interrupt latency, and whether the code runs in ISR context
- Stack depth on a system with kilobytes of RAM, not gigabytes
- Whether the "cleaner" version still fits the timing budget

In a typical embedded system, correctness alone is insufficient. A correct function that takes 12 microseconds where the system needs 8 microseconds is, for practical purposes, broken.
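A sketch of how this plays out, with illustrative numbers; the functions and constants are hypothetical:

    #include <stdint.h>

    /* Called from a control-loop ISR with a budget of a few
       microseconds. Both versions are correct. Only one fits. */

    /* Fixed-point: one integer multiply and a shift.
       gain = 1.37, encoded as round(1.37 * 2^16) in Q16 format. */
    int32_t scale_sample_fast(int32_t raw)
    {
        return (int32_t)(((int64_t)raw * 89784) >> 16);
    }

    /* The "clearer" refactor an AI might propose. On a core without
       an FPU, the soft-float multiply and the float/int conversions
       can push the execution time well past the deadline. */
    int32_t scale_sample_clear(int32_t raw)
    {
        return (int32_t)(raw * 1.37f);
    }

A general-purpose assistant sees two equivalent functions. The system sees one that meets its deadline and one that does not.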

This is not a small problem. Real-time constraints are the defining characteristic of an enormous fraction of physical-world software, and AI tools today are essentially blind to them.

5. The sensor data problem

The deepest failure of current AI tools in physical computing isn't about code at all. It's about the data the code consumes.

Every embedded system that does anything interesting reads from sensors. Every one of those sensors lies — in small, physically lawful ways. They drift with temperature. They pick up electromagnetic noise. They sample on edges they shouldn't sample on. They exhibit hysteresis. They saturate. They quantize.

A correct piece of code that reads a sensor naively and trusts the value will fail in the field 100% of the time. The handling of these physical realities — the filtering, the calibration, the cross-validation — is central to embedded software, but invisible to an AI that has only ever seen software running on machines that pretend to be perfect.
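Here is the difference in miniature, as a hedged sketch: read_adc_raw(), the thresholds, and the filter constant are all hypothetical stand-ins for whatever a real board and a real datasheet would dictate:

    #include <stdint.h>
    #include <stdbool.h>

    extern uint16_t read_adc_raw(void);    /* stand-in for your HAL */

    #define ADC_MAX   4095u   /* 12-bit converter saturates here */
    #define MAX_STEP   200u   /* max physically plausible change/sample */

    /* Naive version: trusts the wire. Fails in the field. */
    uint16_t read_sensor_naive(void)
    {
        return read_adc_raw();
    }

    /* Defensive version: reject saturated readings, reject jumps the
       physics can't produce, smooth the rest with a simple
       exponential moving average (alpha = 1/8). */
    bool read_sensor_filtered(uint16_t *out)
    {
        static uint16_t filtered;
        static bool primed = false;

        uint16_t raw = read_adc_raw();

        if (raw >= ADC_MAX)
            return false;              /* saturated: don't trust it */

        if (!primed) {
            filtered = raw;
            primed = true;
        } else {
            uint16_t delta = (raw > filtered) ? (raw - filtered)
                                              : (filtered - raw);
            if (delta > MAX_STEP)
                return false;          /* EMI spike, not physics */
            filtered = (uint16_t)(filtered + ((int32_t)raw - filtered) / 8);
        }

        *out = filtered;
        return true;
    }

Nothing in the second version is exotic. All of it is invisible to a model that has never had to distrust its inputs.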

This is the gap I find most interesting. It's also the gap where the next generation of tools will be built.


What a purpose-built tool for this world looks like

I've spent years thinking about this problem from multiple angles. As a researcher studying MEMS sensors and the physics of how mechanical motion becomes a digital signal. As an AI agent engineer at Alibaba, writing code that reasoned about other code. As an IoT developer building systems that, looking back, would have benefited enormously from tooling that didn't exist yet.

Here is what I think a tool built specifically for embedded, IoT, and robotics engineers should do, and not do:

It should not:

- Generate firmware wholesale and present it with unearned confidence
- Guess when the answer depends on context it doesn't have: the silicon revision, the board, the timing budget
- Require shipping proprietary code to someone else's cloud

It should:

- Run locally, against the code as it actually exists
- Encode the domain knowledge that never makes it into training data: RTOS rules, ISR-context constraints, errata patterns
- Review code rather than generate it, and state plainly what it checked and what it could not
- Treat uncertainty as a first-class output, not a failure mode

This is a narrower product than the current AI coding tools. It serves a smaller audience. It will never have the universal appeal of GitHub Copilot.

I think that's exactly why it should exist.


Why I think this is the shape of the next decade

The first wave of AI in software engineering was about reaching the largest possible audience: every developer, every codebase, every IDE. This worked. The tools are useful. The market is enormous.

The second wave will be about depth. Specifically, it will be about the kinds of engineering that the first wave was structurally incapable of serving — the engineering where the code has to interact with physical reality, where mistakes are expensive, where the context lives outside the repository, where the rules are different.

This is not an obvious thesis. It looks "smaller" than the first wave. It is. But the per-engineer value is much higher, the moats are much deeper, and the tools that succeed here will look almost nothing like the chat-driven, prompt-everything paradigm that dominates today.

The space between "AI that can write code" and "AI that can write code which controls a real machine that affects real people in real time" is enormous. Most of it is empty.

That's where I'm building.


A note on what I'm shipping

This essay is the rationale behind the first product from PhyCyber — a small studio I'm running to build AI tools for engineers in the physical world.

The first instrument is called Code Sentinel. It's a local code review tool for embedded, IoT, and robotics codebases. The first alpha is intentionally narrow: 8 static rules for FreeRTOS and ISR-context mistakes that often survive general code review and fail in the field.
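To give a feel for the category those rules target — this is a representative mistake, not a transcript of the tool's output — consider calling the task-level FreeRTOS queue API from interrupt context, which often works on the bench and fails under load:

    #include <stdint.h>
    #include "FreeRTOS.h"
    #include "queue.h"

    extern QueueHandle_t sample_queue;
    extern uint16_t read_adc(void);       /* hypothetical HAL call */

    /* The mistake: xQueueSend() is task-level API. Even with a zero
       timeout it touches scheduler state an ISR must not touch. It
       survives review because it usually appears to work. */
    void ADC_IRQHandler(void)
    {
        uint16_t sample = read_adc();
        xQueueSend(sample_queue, &sample, 0);
    }

    /* The ISR-safe form, with the deferred context switch handled. */
    void ADC_IRQHandler_fixed(void)
    {
        BaseType_t woken = pdFALSE;
        uint16_t sample = read_adc();
        xQueueSendFromISR(sample_queue, &sample, &woken);
        portYIELD_FROM_ISR(woken);
    }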

You can run the alpha today from GitHub. It ships with a FreeRTOS demo, terminal output, JSON for CI, and a self-contained HTML report. PyPI packaging is next; for now, install directly from the repo:

    pip install git+https://github.com/lonrun/code-sentinel
    sentinel scan ./firmware --target=stm32 --rtos=freertos

If you build software that has to interact with the physical world, and you've felt the gap I've described in this essay, run it on one real file and send the report. The next version will be shaped by those runs, not by a roadmap written in isolation.

I write about these ideas occasionally. I don't post daily updates, build-in-public threads, or product teasers. When the next thing is ready to talk about, I'll write again.