Creating a programming language
31 July 2025
1 From Electrons to Bytes
Every computer is a colossal city of microscopic switches called transistors. Each switch is either conducting electricity or blocking it. We label these states 1 and 0. A single bit isn’t impressive, yet eight neighbours joined at the hip form a byte—the basic currency of memory.
Fig 1 – eight bits dancing between 0 and 1 combine into a byte (0‑255).
Why care? Because every extra bit doubles the number of distinct patterns. One bit can only say “yes/no”; two bits track four states; ten bits already distinguish 1 024 cases. That exponential head‑room is how tiny chips stream 4K video and protect cryptographic keys.
Fig 2 – how the number of distinct patterns explodes with more bits.
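The doubling is easy to verify for yourself; here is a tiny Python sketch (the language is our choice for illustration, not part of the figures):

```python
# Every extra bit doubles the number of distinct patterns: 2**bits.
for bits in (1, 2, 10, 24):
    print(f"{bits:>2} bits -> {2 ** bits} patterns")
```

One bit gives 2 patterns, ten bits give 1 024, and twenty-four bits already cover 16 777 216 distinct values.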
Fig 3 – turning raw patterns into real‑world meaning.
The same 0‑1 patterns can wear many hats. With one bit we encode truth; two bits steer a robot across the four cardinal points; four bits cover every hexadecimal digit you type in CSS; bump to seven bits and you hold the entire classic ASCII table.
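As a quick sketch of that idea, one seven-bit pattern can be read as a number, a hexadecimal digit pair, or an ASCII character (Python used purely for demonstration):

```python
bits = 0b1000001          # seven bits: 1000001
print(bits)               # read as an unsigned integer: 65
print(hex(bits))          # read as hexadecimal: 0x41
print(chr(bits))          # read as classic ASCII: A
```

The pattern never changes; only the interpretation does.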
A cleaner mental image of memory is a long street of numbered mailboxes. Address 0x0000 is the first mailbox, 0x0001 the next, and so on. Each mailbox stores exactly one byte. Ask the hardware “what’s at 0x0042?” and it walks to that mailbox and hands back the byte inside.
Fig 4 – sixteen consecutive mailboxes. The first six spell “HELLO\n” in ASCII.
Meaning lives in the reader. The sequence 48 45 4C 4C 4F might be text ("HELLO"), five CPU opcodes, or part of a JPEG header—depending on the software interpreting those addresses. A pointer in C is just a street address; a Python string is an address plus a length; an executable is a giant relocation map telling the OS which mailboxes should land where.
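You can watch that duality in action: handed the raw bytes, a program only produces text once we choose to decode them as ASCII. A minimal Python sketch:

```python
raw = bytes.fromhex("48454C4C4F")   # the five bytes from the text above
print(list(raw))                    # as plain numbers: [72, 69, 76, 76, 79]
print(raw.decode("ascii"))          # as text: HELLO
```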
When a program launches, its bytes are copied onto this street and the CPU’s program counter points at the first instruction.
Fig 5 – the CPU repeats a fetch → decode → execute ballet billions of times per second.
At this raw level the processor knows nothing about variables or objects—it only consumes opcodes. All the structure we enjoy in high‑level languages eventually compresses into this binary choreography.
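To make the fetch → decode → execute loop concrete, here is a toy interpreter for an invented three-opcode machine (the opcodes 0x01, 0x02 and 0xFF are hypothetical, chosen only to illustrate the cycle):

```python
def run(memory):
    """Fetch, decode, execute until the halt opcode."""
    acc, pc = 0, 0
    while True:
        opcode = memory[pc]            # fetch the byte at the program counter
        if opcode == 0x01:             # decode: LOAD immediate
            acc = memory[pc + 1]       # execute
            pc += 2
        elif opcode == 0x02:           # decode: ADD immediate
            acc += memory[pc + 1]
            pc += 2
        elif opcode == 0xFF:           # decode: HALT
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode:#04x}")

print(run([0x01, 2, 0x02, 5, 0xFF]))   # LOAD 2; ADD 5; HALT -> prints 7
```

Real CPUs pipeline, predict and reorder, but the essence is this same loop run billions of times per second.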
```
48 65 6C 6C 6F 2C 20 43 50 55 21
```
Listing A – a hex dump of the ASCII string “Hello, CPU!” stored as raw bytes.
```
section .text
global _start

_start:
    mov rax, 60   ; sys_exit
    mov rdi, 42   ; status code
    syscall
```
Listing B – minimal x86‑64 assembly: exit with status 42.
These two listings are different facets of the same reality: bytes that shape behaviour. In Listing A we gaze at data; in Listing B the bytes are behaviour. Grasping that duality prepares us for the next chapter, where we climb one rung up to study machine instructions—the specific binary patterns each CPU understands.
2 Machine Instructions
The journey from abstract algorithms to working silicon begins with machine instructions. These are small binary sentences the CPU can speak natively. Each instruction encodes what to do (an opcode) and who to do it with (registers, memory addresses, or immediate values).
A modern x86‑64 chip exposes sixteen 64‑bit general‑purpose registers. The first four (RAX, RBX, RCX and RDX) are wired into the arithmetic‑logic unit (ALU) that performs additions, bitwise ops and comparisons.
Unlike high‑level languages, machine code is obsessed with placement: which register holds which operand, how many bytes the immediate value spans, and whether memory operands cross cache‑line boundaries. This low‑level specificity is what makes code run quickly but also what makes it brittle across architectures.
```
48 05 05 00 00 00 C3
^  ^  ^^^^^^^^^^^ ^
|  |  |           └─ 0xC3 = RET
|  |  └───────────── Immediate 32‑bit value 5
|  └──────────────── Opcode ADD (imm32 to RAX)
└─────────────────── REX.W (64‑bit operand)
```
Listing C – seven raw bytes implementing ADD RAX, 5 and RET. The initial 0x48 is the “REX.W” prefix that switches the CPU into 64‑bit operand mode.
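One detail worth noticing is that the 32‑bit immediate is stored little‑endian; a short Python sketch can pull it back out of the raw bytes:

```python
import struct

code = bytes([0x48, 0x05, 0x05, 0x00, 0x00, 0x00, 0xC3])  # ADD RAX, 5; RET
imm = struct.unpack_from("<i", code, 2)[0]  # little-endian int32 at offset 2
print(imm)  # 5
```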
```
; add 5 to RAX and return
mov rax, 2   ; set RAX = 2
add rax, 5   ; RAX = RAX + 5
ret
```
Listing D – human‑readable Intel syntax disassembly of the same idea.
Notice how the single assembly instruction add rax, 5 expands into multiple machine‑code fields: a REX prefix byte, an opcode byte and a little‑endian 32‑bit immediate operand (the mov above additionally needs a ModR/M byte to name its register). The assembler takes care of that busywork for us, which is why the next chapter pivots to Assembly Language—a thin textual veneer over these raw opcodes that humans can actually edit.
3 Assembly Language
After wrestling with raw opcodes, developers in the 1950s invented assembly languages: symbolic spellings for machine instructions plus labels, macros and directives that guide the build toolchain. Assemblers translate these mnemonics back into binary with zero loss of information.
Fig 4 – an assembler converts text into a relocatable object file; a linker stitches multiple objects into a runnable binary.
Two dominant syntaxes exist in x86 land: Intel (destination first, then source) and AT&T (source first, then destination, with % register prefixes). Compare them side by side and note how only the notation changes, not the underlying semantics.
```
; Intel syntax (NASM/FASM)
section .data
msg     db  "Hello, world!", 10
len     equ $-msg

section .text
global _start
_start:
    mov rax, 1          ; sys_write
    mov rdi, 1          ; stdout
    lea rsi, [rel msg]  ; buffer
    mov rdx, len        ; length
    syscall             ; write(...)
    mov rax, 60         ; sys_exit
    xor rdi, rdi
    syscall
```
Listing E – the canonical “Hello, world!” for Linux x86‑64 in Intel syntax.
Although assemblers differ in macro facilities, they all follow a predictable workflow:
- **Scan tokens** – identify mnemonics, registers, literals and labels.
- **Two‑pass resolution** – compute symbol addresses and fix forward jumps.
- **Encode** – translate each instruction into its final byte form.
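The workflow above can be sketched as a toy model, assuming an invented instruction set where every plain mnemonic encodes to one byte and JMP to two (0x90 and 0xEB are borrowed from real x86 NOP/JMP for flavour, though this toy stores an absolute target address rather than a relative displacement):

```python
def assemble(lines):
    # Pass 1: walk the source once, recording each label's byte address.
    symbols, addr = {}, 0
    for line in lines:
        if line.endswith(":"):
            symbols[line[:-1]] = addr      # a label occupies no bytes
        elif line.startswith("JMP"):
            addr += 2                      # opcode byte + target byte
        else:
            addr += 1
    # Pass 2: emit bytes; forward references now resolve via the symbol table.
    out = []
    for line in lines:
        if line.endswith(":"):
            continue
        if line.startswith("JMP"):
            out += [0xEB, symbols[line.split()[1]]]
        else:
            out.append(0x90)               # stand-in encoding for any other mnemonic
    return out

print(assemble(["start:", "NOP", "JMP end", "NOP", "end:", "NOP"]))
```

The two passes are what let "JMP end" be encoded even though the label end is defined further down the file.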
To illustrate, here is the single‑line arithmetic example re‑expressed in assembly and its final bytes:
```
_start:
    mov rax, 2
    add rax, 5
    ret
```

```
48 c7 c0 02 00 00 00   ; mov rax, 2
48 05 05 00 00 00      ; add rax, 5
c3                     ; ret
```

Listing F – assembly (top) and the bytes it encodes to (bottom).
You can already see the productivity gain: we write mov and add, and the assembler conjures up REX prefixes, opcode extensions, little‑endian immediates and relocation entries. Chapter 4 shows how high‑level languages like C free us from even these register shuffles, but the assembly mindset lingers in calling conventions, optimisation and debugging.
4 The Rise of C and Beyond
In 1972, Dennis Ritchie welded together ideas from BCPL and Ken Thompson’s B language to create C. Its mission: deliver a high‑level syntax while remaining close enough to the metal to implement an operating system. That OS was UNIX; C became its portable lingua franca.
Unlike assembly, C grants structured control flow, typed variables, and functions, yet compilers still emit tightly tuned machine code. This sweet‑spot made C the dominant systems language for decades.
```c
#include <stdio.h>

int main(void) {
    puts("Hello, portable world!");
    return 0;
}
```
Listing G – the canonical “Hello, world!” that every K&R reader typed first.
GCC output (‑O1, x86‑64):
```
main:
    push rbp
    mov  rbp, rsp
    lea  rdi, .L.str[rip]
    call puts@PLT
    xor  eax, eax
    pop  rbp
    ret
.L.str:
    .string "Hello, portable world!"
```
Listing H – what the compiler produces: still readable once you know assembly, but thankfully generated for you.
Pointers, manual memory management and undefined behaviour give C raw speed and room for foot‑guns. Successors attempt to tame that power without sacrificing performance:
Feature | C | C++ | Go | Rust |
---|---|---|---|---|
Memory mgmt | malloc/free | RAII + new/delete | Garbage‑collected | Ownership & borrow checker |
Generics | — | Templates | Type parameters (1.18+) | Parametric generics |
Concurrency | Threads (POSIX, Win32) | std::thread | Goroutines & channels | Async/await + threads |
Safety net | Undefined behaviour | UB + sanitize flags | Runtime panics | Compile‑time safety |
Each evolution tackles the same question: how close to the hardware can we code while protecting the developer from catastrophe? C++ layers zero‑cost abstractions like std::vector; Go accepts a garbage collector in exchange for unprecedented simplicity; Rust enforces safety at compile time via ownership rules.
The next chapter steps back from language history to a deeper insight: regardless of surface syntax, all Turing‑complete languages compute the same class of problems. We’ll explore that by comparing esoteric and mainstream languages under the lens of computational expressiveness.
5 Same Power, Different Styles
By now we have ascended from electrons to C. A natural question arises: does adding more syntactic sugar make a language fundamentally more powerful? Surprisingly, the answer is no. Once a language can store arbitrary data, branch on conditions, and loop or recurse, it can emulate any other such language.
Different languages and styles can reach the same results. The examples below show the same tasks expressed in a few ways—pick what reads best or runs fastest for your project.
```python
def sum_to_n(n: int) -> int:
    total = 0
    i = 1
    while i <= n:
        total += i
        i += 1
    return total
```
Listing I – one algorithm for the sum of integers from 1 to N; the same logic can be written in any general‑purpose language and always produces the same result.
What matters is not the characters you type, but the building blocks a language offers. Below we map three essential constructs across four radically different ecosystems:
Building Block | Python | Brainf*ck | Excel |
---|---|---|---|
Storage | x = 0 | > | A1 |
Conditional | if x == 0: | [...] | IF() |
Loop | while i<n: | [] | SEQUENCE() |
With these pieces you can build the same programs you could in other general‑purpose languages. We judge languages by how they help us work (clarity, tools, speed), not by what they can compute.
Same power, different paradigms
Imperative, functional, object‑oriented, and logic styles feel different but compute the same results. Below, the same function (sum of squares 1..N) appears in imperative and functional style; the flavour changes while the essence doesn’t.
Imperative
```python
def sum_squares(n: int) -> int:
    total = 0
    for i in range(1, n + 1):
        total += i * i
    return total
```
Functional
```python
def sum_squares(n: int) -> int:
    return sum(map(lambda i: i * i, range(1, n + 1)))
```
Listing J – same algorithm, different paradigms. The choice is about clarity, composition, and performance—not capability.
Loop
```python
def fact(n: int) -> int:
    acc = 1
    for i in range(2, n + 1):
        acc *= i
    return acc
```
Recursion
```python
def fact(n: int) -> int:
    return 1 if n <= 1 else n * fact(n - 1)
```
Listing K – loop and recursion are two lenses over the same computation.
Performance vs Power
While power caps out quickly, performance varies wildly. Implementing AES encryption in Brainf*ck is technically possible but practically absurd. In real‑world engineering we optimise for:
- Execution speed (CPU, JIT, GPU, distributed)
- Memory footprint and allocation patterns
- Developer velocity (readability, tooling, ecosystem)
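Even within a single language the same answer can cost wildly different amounts of work. A sketch comparing a linear loop with the closed‑form Gauss formula (Python chosen for illustration):

```python
def sum_loop(n: int) -> int:
    total = 0                    # O(n): touches every integer up to n
    for i in range(1, n + 1):
        total += i
    return total

def sum_formula(n: int) -> int:
    return n * (n + 1) // 2      # O(1): one multiplication, one shift

assert sum_loop(10_000) == sum_formula(10_000)  # identical power...
print(sum_formula(10_000))                      # ...very different cost: 50005000
```

Both functions compute the same thing; only the resources they consume differ, which is exactly the axis real engineering optimises along.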
The coming chapter translates these insights into design decisions for our own language: which primitives to expose, which to compile away, and how to keep the toolchain approachable.