Creating a programming language
31 July 2025
1 From Electrons to Bytes
Every computer is a colossal city of microscopic switches called transistors. Each switch is either conducting electricity or blocking it. We label these states 1 and 0. A single bit isn’t impressive, yet eight neighbours joined at the hip form a byte—the basic currency of memory.
Fig 1 – eight bits dancing between 0 and 1 combine into a byte (0‑255).
Why care? Because every extra bit doubles the number of distinct patterns. One bit can only say “yes/no”; two bits track four states; ten bits already distinguish 1 024 cases. That exponential head‑room is how tiny chips stream 4K video and protect cryptographic keys.
Fig 2 – how the number of distinct patterns explodes with more bits.
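The doubling is easy to verify for yourself; here is a tiny Python sketch (the language is our choice for illustration, not part of the figures):

```python
# Every extra bit doubles the number of distinct patterns: 2**bits.
for bits in (1, 2, 10, 24):
    print(f"{bits:>2} bits -> {2 ** bits} patterns")
```

One bit gives 2 patterns, ten bits give 1 024, and twenty-four bits already cover 16 777 216 distinct values.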
Fig 3 – turning raw patterns into real‑world meaning.
The same 0‑1 patterns can wear many hats. With one bit we encode truth; two bits steer a robot across the four cardinal points; four bits cover every hexadecimal digit you type in CSS; bump to seven bits and you hold the entire classic ASCII table.
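As a quick sketch of that idea, one seven-bit pattern can be read as a number, a hexadecimal digit pair, or an ASCII character (Python used purely for demonstration):

```python
bits = 0b1000001          # seven bits: 1000001
print(bits)               # read as an unsigned integer: 65
print(hex(bits))          # read as hexadecimal: 0x41
print(chr(bits))          # read as classic ASCII: A
```

The pattern never changes; only the interpretation does.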
A cleaner mental image of memory is a long street of numbered mailboxes. Address 0x0000 is the first mailbox, 0x0001 the next, and so on. Each mailbox stores exactly one byte. Ask the hardware “what’s at 0x0042?” and it walks to that mailbox and hands back the byte inside.
Fig 4 – sixteen consecutive mailboxes. The first six spell “HELLO\n” in ASCII.
Meaning lives in the reader. The sequence 48 45 4C 4C 4F might be text ("HELLO"), five CPU opcodes, or part of a JPEG header—depending on the software interpreting those addresses. A pointer in C is just a street address; a Python string is an address plus a length; an executable is a giant relocation map telling the OS which mailboxes should land where.
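You can watch that duality in action: handed the raw bytes, a program only produces text once we choose to decode them as ASCII. A minimal Python sketch:

```python
raw = bytes.fromhex("48454C4C4F")   # the five bytes from the text above
print(list(raw))                    # as plain numbers: [72, 69, 76, 76, 79]
print(raw.decode("ascii"))          # as text: HELLO
```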
When a program launches, its bytes are copied onto this street and the CPU’s program counter points at the first instruction.
Fig 5 – the CPU repeats a fetch → decode → execute ballet billions of times per second.
At this raw level the processor knows nothing about variables or objects—it only consumes opcodes. All the structure we enjoy in high‑level languages eventually compresses into this binary choreography.
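To make the fetch → decode → execute loop concrete, here is a toy interpreter for an invented three-opcode machine (the opcodes 0x01, 0x02 and 0xFF are hypothetical, chosen only to illustrate the cycle):

```python
def run(memory):
    """Fetch, decode, execute until the halt opcode."""
    acc, pc = 0, 0
    while True:
        opcode = memory[pc]            # fetch the byte at the program counter
        if opcode == 0x01:             # decode: LOAD immediate
            acc = memory[pc + 1]       # execute
            pc += 2
        elif opcode == 0x02:           # decode: ADD immediate
            acc += memory[pc + 1]
            pc += 2
        elif opcode == 0xFF:           # decode: HALT
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode:#04x}")

print(run([0x01, 2, 0x02, 5, 0xFF]))   # LOAD 2; ADD 5; HALT -> prints 7
```

Real CPUs pipeline, predict and reorder, but the essence is this same loop run billions of times per second.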
```
48 65 6C 6C 6F 2C 20 43 50 55 21
```
Listing A – a hex dump of the ASCII string “Hello, CPU!” stored as raw bytes.
```
section .text
global _start

_start:
    mov rax, 60   ; sys_exit
    mov rdi, 42   ; status code
    syscall
```
Listing B – minimal x86‑64 assembly: exit with status 42.
These two listings are different facets of the same reality: bytes that shape behaviour. In Listing A we gaze at data; in Listing B the bytes are behaviour. Grasping that duality prepares us for the next chapter, where we climb one rung up to study machine instructions—the specific binary patterns each CPU understands.
2 Machine Instructions
The journey from abstract algorithms to working silicon begins with machine instructions. These are small binary sentences the CPU can speak natively. Each instruction encodes what to do (an opcode) and who to do it with (registers, memory addresses, or immediate values).
A modern x86‑64 chip exposes sixteen 64‑bit general‑purpose registers. The first four (RAX, RBX, RCX and RDX) are wired into the arithmetic‑logic unit (ALU) that performs additions, bitwise ops and comparisons.
Unlike high‑level languages, machine code is obsessed with placement: which register holds which operand, how many bytes the immediate value spans, and whether memory operands cross cache‑line boundaries. This low‑level specificity is what makes code run quickly but also what makes it brittle across architectures.
```
48 05 05 00 00 00 C3
^  ^  ^^^^^^^^^^^ ^
|  |  |           └─ 0xC3 = RET
|  |  └───────────── Immediate 32‑bit value 5
|  └──────────────── Opcode ADD (imm32 to RAX)
└─────────────────── REX.W (64‑bit operand)
```
Listing C – seven raw bytes implementing ADD RAX, 5 and RET. The initial 0x48 is the “REX.W” prefix that switches the CPU into 64‑bit operand mode.
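One detail worth noticing is that the 32‑bit immediate is stored little‑endian; a short Python sketch can pull it back out of the raw bytes:

```python
import struct

code = bytes([0x48, 0x05, 0x05, 0x00, 0x00, 0x00, 0xC3])  # ADD RAX, 5; RET
imm = struct.unpack_from("<i", code, 2)[0]  # little-endian int32 at offset 2
print(imm)  # 5
```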
```
; add 5 to RAX and return
mov rax, 2   ; set RAX = 2
add rax, 5   ; RAX = RAX + 5
ret
```
Listing D – human‑readable Intel syntax disassembly of the same idea.
Notice how the single assembly instruction add rax, 5 expands into multiple machine‑code fields: a REX prefix byte, an opcode byte and a little‑endian 32‑bit immediate operand (the mov above additionally needs a ModR/M byte to name its register). The assembler takes care of that busywork for us, which is why the next chapter pivots to Assembly Language—a thin textual veneer over these raw opcodes that humans can actually edit.
3 Assembly Language
After wrestling with raw opcodes, developers in the 1950s invented assembly languages: symbolic spellings for machine instructions plus labels, macros and directives that guide the build toolchain. Assemblers translate these mnemonics back into binary with zero loss of information.
Fig 4 – an assembler converts text into a relocatable object file; a linker stitches multiple objects into a runnable binary.
Two dominant syntaxes exist in x86 land: Intel (destination first, then source) and AT&T (source first, then destination, with % register prefixes). Compare them side by side and note how only the notation changes, not the underlying semantics.
```
; Intel syntax (NASM/FASM)
section .data
msg     db  "Hello, world!", 10
len     equ $-msg

section .text
global _start
_start:
    mov rax, 1          ; sys_write
    mov rdi, 1          ; stdout
    lea rsi, [rel msg]  ; buffer
    mov rdx, len        ; length
    syscall             ; write(...)
    mov rax, 60         ; sys_exit
    xor rdi, rdi
    syscall
```
Listing E – the canonical “Hello, world!” for Linux x86‑64 in Intel syntax.
Although assemblers differ in macro facilities, they all follow a predictable workflow:
- **Scan tokens** – identify mnemonics, registers, literals and labels.
- **Two‑pass resolution** – compute symbol addresses and fix forward jumps.
- **Encode** – translate each instruction into its final byte form.
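The workflow above can be sketched as a toy model, assuming an invented instruction set where every plain mnemonic encodes to one byte and JMP to two (0x90 and 0xEB are borrowed from real x86 NOP/JMP for flavour, though this toy stores an absolute target address rather than a relative displacement):

```python
def assemble(lines):
    # Pass 1: walk the source once, recording each label's byte address.
    symbols, addr = {}, 0
    for line in lines:
        if line.endswith(":"):
            symbols[line[:-1]] = addr      # a label occupies no bytes
        elif line.startswith("JMP"):
            addr += 2                      # opcode byte + target byte
        else:
            addr += 1
    # Pass 2: emit bytes; forward references now resolve via the symbol table.
    out = []
    for line in lines:
        if line.endswith(":"):
            continue
        if line.startswith("JMP"):
            out += [0xEB, symbols[line.split()[1]]]
        else:
            out.append(0x90)               # stand-in encoding for any other mnemonic
    return out

print(assemble(["start:", "NOP", "JMP end", "NOP", "end:", "NOP"]))
```

The two passes are what let "JMP end" be encoded even though the label end is defined further down the file.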
To illustrate, here is the single‑line arithmetic example re‑expressed in assembly and its final bytes:
```
_start:
    mov rax, 2
    add rax, 5
    ret
```

```
48 c7 c0 02 00 00 00   ; mov rax, 2
48 05 05 00 00 00      ; add rax, 5
c3                     ; ret
```

Listing F – assembly (top) and the bytes it encodes to (bottom).
You can already see the productivity gain: we write mov and add, and the assembler conjures up REX prefixes, opcode extensions, little‑endian immediates and relocation entries. Chapter 4 shows how high‑level languages like C free us from even these register shuffles, but the assembly mindset lingers in calling conventions, optimisation and debugging.
4 The Rise of C and Beyond
In 1972, Dennis Ritchie welded together ideas from BCPL and Ken Thompson’s B language to create C. Its mission: deliver a high‑level syntax while remaining close enough to the metal to implement an operating system. That OS was UNIX; C became its portable lingua franca.
Unlike assembly, C grants structured control flow, typed variables, and functions, yet compilers still emit tightly tuned machine code. This sweet‑spot made C the dominant systems language for decades.
```c
#include <stdio.h>

int main(void) {
    puts("Hello, portable world!");
    return 0;
}
```
Listing G – the canonical “Hello, world!” that every K&R reader typed first.
GCC output (‑O1, x86‑64):
```
main:
    push rbp
    mov  rbp, rsp
    lea  rdi, .L.str[rip]
    call puts@PLT
    xor  eax, eax
    pop  rbp
    ret
.L.str:
    .string "Hello, portable world!"
```
Listing H – what the compiler produces: still readable once you know assembly, but thankfully generated for you.
Pointers, manual memory management and undefined behaviour give C raw speed and room for foot‑guns. Successors attempt to tame that power without sacrificing performance:
Feature | C | C++ | Go | Rust |
---|---|---|---|---|
Memory mgmt | malloc/free | RAII + new/delete | Garbage‑collected | Ownership & borrow checker |
Generics | — | Templates | Type parameters (1.18+) | Parametric generics |
Concurrency | Threads (POSIX, Win32) | std::thread | Goroutines & channels | Async/await + threads |
Safety net | Undefined behaviour | UB + sanitize flags | Runtime panics | Compile‑time safety |
Each evolution tackles the same question: how close to the hardware can we code while protecting the developer from catastrophe? C++ layers zero‑cost abstractions like std::vector; Go accepts a garbage collector in exchange for unprecedented simplicity; Rust enforces safety at compile time via ownership rules.
The next chapter steps back from language history to a deeper insight: regardless of surface syntax, all Turing‑complete languages compute the same class of problems. We’ll explore that by comparing esoteric and mainstream languages under the lens of computational expressiveness.
5 Same Power, Different Styles
By now we have ascended from electrons to C. A natural question arises: does adding more syntactic sugar make a language fundamentally more powerful? Surprisingly, the answer is no. Once a language can store arbitrary data, branch on conditions, and loop or recurse, it can emulate any other such language.
Different languages and styles can reach the same results. The examples below show the same tasks expressed in a few ways—pick what reads best or runs fastest for your project.
```python
def sum_to_n(n: int) -> int:
    total = 0
    i = 1
    while i <= n:
        total += i
        i += 1
    return total
```
Listing I – one algorithm for the sum of integers from 1 to N; the same logic can be written in any general‑purpose language and always produces the same result.
What matters is not the characters you type, but the building blocks a language offers. Below we map three essential constructs across four radically different ecosystems:
Building Block | Python | Brainf*ck | Excel |
---|---|---|---|
Storage | x = 0 | > | A1 |
Conditional | if x == 0: | [...] | IF() |
Loop | while i<n: | [] | SEQUENCE() |
With these pieces you can build the same programs you could in other general‑purpose languages. We judge languages by how they help us work (clarity, tools, speed), not by what they can compute.
Same power, different paradigms
Imperative, functional, object‑oriented, and logic styles feel different but compute the same results. Below, the same function (sum of squares 1..N) appears in imperative and functional style; the flavour changes while the essence doesn’t.
Imperative
```python
def sum_squares(n: int) -> int:
    total = 0
    for i in range(1, n + 1):
        total += i * i
    return total
```
Functional
```python
def sum_squares(n: int) -> int:
    return sum(map(lambda i: i * i, range(1, n + 1)))
```
Listing J – same algorithm, different paradigms. The choice is about clarity, composition, and performance—not capability.
Loop
```python
def fact(n: int) -> int:
    acc = 1
    for i in range(2, n + 1):
        acc *= i
    return acc
```
Recursion
```python
def fact(n: int) -> int:
    return 1 if n <= 1 else n * fact(n - 1)
```
Listing K – loop and recursion are two lenses over the same computation.
Performance vs Power
While power caps out quickly, performance varies wildly. Implementing AES encryption in Brainf*ck is technically possible but practically absurd. In real‑world engineering we optimise for:
- Execution speed (CPU, JIT, GPU, distributed)
- Memory footprint and allocation patterns
- Developer velocity (readability, tooling, ecosystem)
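Even within a single language the same answer can cost wildly different amounts of work. A sketch comparing a linear loop with the closed‑form Gauss formula (Python chosen for illustration):

```python
def sum_loop(n: int) -> int:
    total = 0                    # O(n): touches every integer up to n
    for i in range(1, n + 1):
        total += i
    return total

def sum_formula(n: int) -> int:
    return n * (n + 1) // 2      # O(1): one multiplication, one shift

assert sum_loop(10_000) == sum_formula(10_000)  # identical power...
print(sum_formula(10_000))                      # ...very different cost: 50005000
```

Both functions compute the same thing; only the resources they consume differ, which is exactly the axis real engineering optimises along.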
The coming chapter translates these insights into design decisions for our own language: which primitives to expose, which to compile away, and how to keep the toolchain approachable.