George Hotz Warns AI Coding Agents May Flood Software With Hidden Defects

AI coding agents are winning attention for the wrong reason, according to George Hotz, the hacker who broke into the iPhone at 17. He warns that wide adoption of these tools could become one of the most expensive mistakes in software history.

Hotz is not speaking from theory. He says he has used coding agents for six months on real work, including parts of Tinygrad and a full reverse-engineering effort on USB-PCIe chip firmware.

Fast gains, hidden damage

His main concern is not the early boost in speed. Hotz says agents can move projects forward quickly at the start, but the hard work often returns to humans once the output needs correction.

He argues that the defects in agent-written code are obvious at first, then grow subtler and harder to spot. In his view, that pattern fits a statistical model that gets better at imitation without truly understanding the program.

Hotz also rejects the idea that this criticism comes from ego or fear of replacement. For him, the real issue is that software quality can degrade in a systematic way.

The risk grows when everyone uses them

The danger, Hotz says, rises sharply when an organization pushes agents across a large number of developers. He sees pressure from major tech companies and Wall Street to adopt these tools broadly, and he thinks that trend could lower the average quality of code.

That, he argues, creates more output without creating more quality. He describes the result as a future filled with “buckets and buckets of slop” while high-quality work becomes much rarer.

Apple is one example that caught his attention. Hotz points to reports that the company is pushing AI coding tools across its engineering organization, and he questions what that could mean for macOS quality over the next two years.

Quality control matters more than raw volume

Hotz frames the issue as a team problem as much as a tool problem. High-performing workers, he says, usually have enough feedback loops to catch agent mistakes before code is shipped.

Lower-performing workers do not always have the same self-checking habits. If those workers use agents to produce ten times more output, Hotz believes the end result can be faster degradation that stays hidden behind volume.

He also contrasts coding agents with other tools that can find bugs. Google’s AFL fuzzer, he notes, can uncover more bugs than LLMs, but it does not trigger the same debate about pride or status.

The broader debate is still unresolved

Hotz now sits with a skeptical camp that includes Yann LeCun of Meta and Gary Marcus. That view holds that language models can mimic existing code distributions, but still cannot reason through new problems from first principles.

The industry, however, is moving in the opposite direction. Vibe coding, where a person describes what they want in plain language and lets AI build the implementation, has surged over the past year.

Microsoft also turned GitHub Copilot into a fully agentic system in 2025. Satya Nadella has described that shift as a platform-level change comparable to the move to the cloud.

Even former skeptics are changing course

The debate has grown sharper because some prominent figures have become more optimistic. Andrej Karpathy, who was skeptical about agents earlier in 2025, changed his position after a new model release.

Karpathy joined Anthropic’s pre-training team on 19 May and said the next few years at the frontier of AI will be “especially formative.” Anthropic’s own leadership says internal habits are changing as well.

Dario Amodei said at Davos that some Anthropic engineers have already stopped writing code themselves and now let the model do it before reviewing the results. Hotz says he tried a similar workflow, but still ended up doing the fixes manually.

For him, that remains the clearest sign that AI coding agents are not yet ready to serve as the backbone of software development at scale.