The original post:
If you google “elon musk ai will skip code machine code,” you will find that an overwhelming number of software developers disagree with and criticize the idea for various reasons. I picked a few in-depth articles whose authors apparently have broad knowledge and a deep understanding of computer science and software engineering.
I consider what Elon said quite feasible.
I don’t have any insider information; I take Elon Musk to mean that AI would generate machine code directly, rather than generating source code and invoking a compiler that targets a specific type of processor:
Current: Code → Compiler → Binary → Execute
Future: Prompt → AI-generated Binary → Execute
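The “Future” pipeline is not exotic in principle: executing a hand-written binary with no compiler in the loop can be sketched today with WebAssembly. The bytes below are a minimal hand-assembled module (chosen over raw x86 because it runs portably); this is an illustration of “binary → execute,” not anything an AI produced:

```typescript
// A hand-assembled WebAssembly module: the bytes below ARE the binary.
// No source language or compiler is involved in producing them here.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  // Type section: one function type (i32, i32) -> i32
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  // Function section: one function, using type 0
  0x03, 0x02, 0x01, 0x00,
  // Export section: export function 0 under the name "add"
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
  // Code section: local.get 0, local.get 1, i32.add, end
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,
]);

const instance = new WebAssembly.Instance(new WebAssembly.Module(wasmBytes));
const add = instance.exports.add as (a: number, b: number) => number;
console.log(add(2, 3)); // → 5
```

The interesting question is not whether a binary can be executed directly, but whether an AI can emit correct bytes like these at scale.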
The training of Grok Code would have to be fundamentally different from the training of other LLM-based code agents, which produce human-readable source code.
Disclaimer:
As a programmer, I am not old enough to have used plugboards and switches in the pre-machine-code era, nor punch cards and paper tape for writing machine code. However, I have used Fortran, Assembly, and PLC at university as part of coursework while studying digital technologies for a bachelor’s degree in Physics. I also wrote an M68K processor emulator as part of my master’s thesis, using C++ code to interpret M68K machine code.
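Interpreting machine code, as that M68K emulator did, boils down to a fetch-decode-execute loop. A toy sketch of the technique, with invented opcodes rather than real M68K encodings:

```typescript
// Toy machine-code interpreter: a fetch-decode-execute loop over raw bytes.
// The opcodes are invented for illustration, not real M68K encodings.
const LOAD = 0x01; // LOAD reg, imm : put an immediate byte into a register
const ADD  = 0x02; // ADD dst, src  : dst += src
const HALT = 0xff; // stop execution

function run(program: Uint8Array): number[] {
  const regs = [0, 0, 0, 0]; // four general-purpose registers
  let pc = 0;                // program counter
  while (pc < program.length) {
    const op = program[pc++];                 // fetch
    switch (op) {                             // decode
      case LOAD: { const r = program[pc++]; regs[r] = program[pc++]; break; }
      case ADD:  { const d = program[pc++]; regs[d] += regs[program[pc++]]; break; }
      case HALT: return regs;                 // execute until HALT
      default: throw new Error(`illegal opcode 0x${op.toString(16)}`);
    }
  }
  return regs;
}

// LOAD r0, 40; LOAD r1, 2; ADD r0, r1; HALT
const regsOut = run(new Uint8Array([LOAD, 0, 40, LOAD, 1, 2, ADD, 0, 1, HALT]));
console.log(regsOut[0]); // → 42
```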
Over the years, I have used C/C++, Turbo Pascal, VB, Delphi, C#, and TypeScript for day-to-day coding. As far as I understand, high-level languages and most principles of computer science and software engineering are designed for flesh-and-blood human brains: SOLID, design patterns, clean code, and the like.
No matter how well you conform to those principles or how clean your source code is, it will eventually be compiled and linked into something resembling spaghetti: something human brains hate but computers do not care about.
Clean code can facilitate better compile-time, link-time, and run-time optimizations; figures like a 25% performance boost are sometimes cited, though results vary widely. As far as I can tell, current optimization algorithms are largely fixed rules written by programmers, and such rules generally reward clean code.
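A fixed, hand-written optimization rule of this kind can be sketched in a few lines. Here, constant folding over a hypothetical toy expression IR (not any real compiler's internals):

```typescript
// Sketch: a compiler optimization as a fixed, hand-written rewrite rule.
// The expression IR below is a toy invented for illustration.
type Expr =
  | { kind: "const"; value: number }
  | { kind: "add"; left: Expr; right: Expr };

function fold(e: Expr): Expr {
  if (e.kind === "const") return e;
  const left = fold(e.left);   // fold sub-expressions bottom-up
  const right = fold(e.right);
  // The fixed rule: const + const rewrites to a single const.
  if (left.kind === "const" && right.kind === "const") {
    return { kind: "const", value: left.value + right.value };
  }
  return { kind: "add", left, right };
}

// (1 + 2) + 3 folds to the single constant 6 at "compile time".
const folded = fold({
  kind: "add",
  left: { kind: "add", left: { kind: "const", value: 1 }, right: { kind: "const", value: 2 } },
  right: { kind: "const", value: 3 },
});
// folded is { kind: "const", value: 6 }
```

Real optimizers are vastly larger, but they are built from the same kind of explicitly programmed rules rather than learned behavior.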
In short, these practices exist to help human brains digest functional and technical complexity in order to deliver a working program.
I presume you have a basic idea of how AI code agent vendors train their models, though I am not sure what raw code they collect or how they label it.
As a programmer, I have benefited greatly from AI code agents, partly because I am poor at remembering trivial technical details. So far, I have found that once I ask an AI code agent to implement a non-trivial feature, no matter how detailed, formal, or simple the prompt, the generated source code is usually over-bloated in design and implementation, and the line count is typically 3–5 times what it should be.
“3–5 times” sounds like a magic number to me. In several commercial rewrite projects and one rewrite of an open-source tool, using the same technical stack and language, the rewritten code came out at roughly one third to one fifth of the legacy line count.
Regarding the legacy codebases: they looked “politically correct” with respect to SOLID, but were simply over-complicated and too long.
I believe these programmers jumped directly into implementing their first workable idea without evaluating simpler alternatives, without spending enough time understanding business and technical contexts, and without following basic Agile practices: starting with basic working code, writing plenty of unit and integration tests, and actively refactoring in each iteration.
The purpose of that refactoring is not to make the design look elegant or impressive, but to deepen one’s understanding of the business and technical contexts through frequent communication with technical peers and business stakeholders.
Do I write 2–4 times more lines of code than the average senior developer in my city?
Not really. When I lead the SDLC, I typically spend around 1/3 to 1/4 of my billable hours coding (including testing), especially during the early 1/4 to 1/3 of the SDLC, when architecture and software design are taking shape. I spend the remaining time thinking, studying, and communicating with stakeholders. Even if I produce the same amount of code, or less, the overall maintenance cost is dramatically reduced.
Let’s review what Robert C. Martin said in “UML for Java Programmers”:
When should these principles be applied? At the first hint of pain. It is not wise to make all systems conform to all principles all the time. You will spend an eternity imagining possible environments for OCP or sources of change for SRP, create dozens of little interfaces for ISP, and invent many worthless abstractions for DIP.
The best way to apply these principles is reactively rather than proactively. When you detect a structural problem or notice a module being affected by changes elsewhere, then consider whether one or more of these principles can help.
A reactive approach also requires a proactive effort to create pressure early. If you want to react to pain, you must diligently search for sore spots.
One of the best ways to do this is to write lots of unit tests—ideally before writing the code itself. But that is a topic for another chapter.
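Martin’s point can be made concrete. A sketch of ISP applied proactively versus reactively, with hypothetical names:

```typescript
// Proactive ISP: dozens of speculative one-method interfaces, each created
// "just in case", long before any second consumer or implementation exists.
interface Readable { read(): string }
interface Writable { write(s: string): void }
interface Flushable { flush(): void }
interface Closable { close(): void }

// Reactive alternative: start with the one concrete class the feature needs...
class FileStore {
  private data = "";
  write(s: string): void { this.data += s; }
  read(): string { return this.data; }
}
// ...and only extract an interface when a second implementation or a
// change-ripple actually appears — Martin's "first hint of pain".

const store = new FileStore();
store.write("hello");
console.log(store.read()); // → "hello"
```

The proactive version costs indirection on day one for flexibility that may never be exercised; the reactive version pays that cost only when a real pressure point shows up.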
Do you see possible correlations between AI code agents and some senior developers?
Going straight from prompt to AI-generated binary may avoid the code bloat caused by the accumulated, complex designs that other AI code agents typically produce. A sufficiently powerful AI with strong hardware can handle enormous complexity without relying on the computer-science and software-engineering techniques developed to aid human programmers.
Grok Code would likely need its own mechanisms to avoid code bloat, and SOLID and design patterns would likely play almost no role.
This approach effectively eliminates human intervention in review, verification, and validation of product quality.
Even if you disassemble the machine code into assembly, the result is extremely difficult for humans to read, and that remains true even if an AI polishes it by adding symbolic names.
The entire binary becomes a black box. At best, it behaves as expected but does more than you bargained for; at worst, it may contain multiple Pandora’s boxes, unless you fully trust AI-based review, verification, and validation.
“A problem well stated is a problem half solved.” This statement is often attributed to Charles Kettering. In the context of AI generating code, it means that clearly stating the problem in a prompt makes it significantly easier for the AI to produce code that aligns with human expectations. However, here are two questions:
For example, as a mathematician, you might use Grok Code to generate entire MATLAB-like libraries.
Given a complex Swagger/OpenAPI definition, such as those used by Medicare Online, can an AI code generator produce usable code in C#, TypeScript, Java, and other languages?
Apparently, ChatGPT and Copilot cannot. Otherwise, Microsoft would have released an online AI-based code generator to handle this task, instead of delivering Microsoft Kiota.
A Swagger/OpenAPI definition is clearly a well-stated problem.
I’ll be interested to see whether, by the end of the year, Grok Code can generate a client library in machine code, based on the Medicare Online OpenAPI definitions, running on Windows 11 on an Intel processor.
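For a sense of what such a generator has to produce, here is a sketch of the kind of typed client a Kiota-style tool emits from a single OpenAPI operation. The endpoint and schema are hypothetical, not taken from the actual Medicare Online definitions:

```typescript
// What a generator must derive from one OpenAPI operation: a typed client.
// The path /claims/{claimId} and the ClaimStatus schema are hypothetical.
interface ClaimStatus {
  claimId: string;
  status: "ACCEPTED" | "REJECTED" | "PENDING";
}

class ClaimsClient {
  // fetchImpl is injectable so the client can be tested without a network.
  constructor(
    private baseUrl: string,
    private fetchImpl: typeof fetch = fetch,
  ) {}

  // GET /claims/{claimId} → ClaimStatus, per the hypothetical spec.
  async getClaim(claimId: string): Promise<ClaimStatus> {
    const res = await this.fetchImpl(
      `${this.baseUrl}/claims/${encodeURIComponent(claimId)}`,
    );
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return (await res.json()) as ClaimStatus;
  }
}
```

Multiply this by hundreds of operations, nested schemas, authentication, and error models, and the gap between “well stated” and “mechanically solvable” becomes visible; doing all of it directly in machine code, with no readable intermediate, raises the verification stakes further still.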