Monday, June 15, 2020

Source Code vs. Machine Code

An excerpt from my upcoming book:

Here is an example of source code:

```c
int add(int a, int b) {
    int c = a + b;
    return c;
}
```

This function takes two numbers (integers in this case), adds them, and returns the sum. Computers, of course, read code as zeros and ones. Raw data (the zeros and ones) that corresponds to instructions is called machine code, and here is what it looks like:
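Every byte of machine code is eight of those zeros and ones. As a rough illustration (a minimal sketch; `byte_to_bits` and `print_bits` are hypothetical helper names, not part of any library), here is a small C function that spells out a byte as its eight binary digits, most significant bit first:

```c
#include <assert.h>
#include <stdio.h>

/* Write the 8 bits of `byte` into `out` as '0'/'1' characters,
   most significant bit first, then a terminating NUL.
   `out` must have room for at least 9 chars. */
void byte_to_bits(unsigned char byte, char out[9]) {
    assert(out != NULL);
    for (int i = 0; i < 8; i++) {
        /* Bit 7 lands in out[0], bit 0 lands in out[7]. */
        out[i] = ((byte >> (7 - i)) & 1) ? '1' : '0';
    }
    out[8] = '\0';
}

/* Print a buffer of bytes as one long run of zeros and ones. */
void print_bits(const unsigned char *bytes, size_t len) {
    char bits[9];
    for (size_t i = 0; i < len; i++) {
        byte_to_bits(bytes[i], bits);
        fputs(bits, stdout);
    }
    putchar('\n');
}
```

For example, passing the two bytes `0x55` and `0xC3` to `print_bits` prints `0101010111000011`. A listing like the one above is just many such bytes written back to back.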

```
10101011000100111100101100000111110110000010000111010000110010
10000000000000000000000000000010110110000000110100000000000000
00010001011010101010000100010001011010001010000110000000001110
10000100010010100010111111100100010110100010111111100110010011
1000011
```

Got it, right?  Good, because if you don't understand that machine code, you should probably just give up on this cyber thing.

Totally kidding. No one, not even an expert, is expected to read binary. People still have to program computers somehow, though. Fortunately, there is a human-readable language for describing exactly what the processor should do: assembly language. Below is the assembly code corresponding to the `add` function above:

```assembly
push ebp
mov ebp, esp
sub esp, 16
call __x86.get_pc_thunk.ax
add eax, OFFSET FLAT:_GLOBAL_OFFSET_TABLE_
mov edx, DWORD PTR 8[ebp]
mov eax, DWORD PTR 12[ebp]
add eax, edx
mov DWORD PTR -4[ebp], eax
mov eax, DWORD PTR -4[ebp]
leave
ret
```

Each line of assembly code consists of an instruction and zero or more arguments (called operands) to that instruction. Consider the third line above, `sub esp, 16`. Here `esp` refers to a register, which is a physical part of the processor that stores a number. If you were a computer doing math on your fingers, you might say that each hand is a register that can store a number between zero and five. This line of assembly tells the processor to subtract 16 from the value in register `esp`. Because `esp` is the stack pointer, this has the effect of reserving 16 bytes of stack space for the function's local variables.
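To make the register idea concrete, here is a toy model in C. It is only an analogy, assuming we pretend three 32-bit registers are fields of a struct; `struct regs`, `sub_esp_16`, and `toy_add` are hypothetical names. It mimics `sub esp, 16` and the three lines of the listing that load the two arguments and add them:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for three 32-bit x86 registers. Real registers live
   inside the CPU; this struct is purely an illustration. */
struct regs {
    uint32_t eax;
    uint32_t edx;
    uint32_t esp;
};

/* Mimics `sub esp, 16`. uint32_t arithmetic wraps around modulo 2^32,
   just as a 32-bit register does (like a hand that only counts to five). */
void sub_esp_16(struct regs *r) {
    assert(r != NULL);
    r->esp -= 16;
}

/* Mimics the core of the listing:
     mov edx, DWORD PTR 8[ebp]    ; first argument  (a)
     mov eax, DWORD PTR 12[ebp]   ; second argument (b)
     add eax, edx
   The result is left in eax, where 32-bit x86 C code returns integers. */
uint32_t toy_add(struct regs *r, uint32_t a, uint32_t b) {
    assert(r != NULL);
    r->edx = a;
    r->eax = b;
    r->eax += r->edx;
    return r->eax;
}
```

Starting from `esp == 0x1000`, one call to `sub_esp_16` leaves `esp == 0x0FF0`, and `toy_add(&r, 2, 3)` leaves 5 in `eax`.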

It is actually quite easy to convert between assembly language and machine code, because assembly is intended to be a human-readable version of machine code. When Steve Wozniak wrote some of the first software for Apple computers, he did so in math class by writing the assembly in his notebook. Then, for each line, he looked up the corresponding hexadecimal value in a table and wrote it in the margin. Since going between machine code and assembly language requires little more than a table lookup, we will use assembly as a proxy for what the machine is actually doing.
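A hand-assembly table like that can be sketched in a few lines of C. This is a minimal sketch under stated assumptions: Wozniak's table was for the Apple I's 6502 processor, whereas the bytes below are 32-bit x86, matching the assembly listing above. `struct encoding`, `table`, and `lookup` are hypothetical names. Each row shows one valid encoding (x86 sometimes allows more than one; `mov ebp, esp` can also be `8B EC`), and lines such as `call` are left out because their bytes depend on addresses filled in later by the linker.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of a hand-written "assembly -> machine code" table. */
struct encoding {
    const char *assembly;     /* exact line of assembly text       */
    unsigned char bytes[3];   /* its x86 machine-code bytes        */
    size_t len;               /* how many of those bytes are used  */
};

/* A few lines from the listing, with one valid 32-bit x86 encoding each. */
static const struct encoding table[] = {
    { "push ebp",                  { 0x55 },             1 },
    { "mov ebp, esp",              { 0x89, 0xE5 },       2 },
    { "sub esp, 16",               { 0x83, 0xEC, 0x10 }, 3 },
    { "mov edx, DWORD PTR 8[ebp]", { 0x8B, 0x55, 0x08 }, 3 },
    { "add eax, edx",              { 0x01, 0xD0 },       2 },
    { "leave",                     { 0xC9 },             1 },
    { "ret",                       { 0xC3 },             1 },
};

/* Return the table row for `line`, or NULL if this tiny table lacks it. */
const struct encoding *lookup(const char *line) {
    assert(line != NULL);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].assembly, line) == 0) {
            return &table[i];
        }
    }
    return NULL;
}
```

Real assemblers do not store whole lines like this; they parse the mnemonic and operands and compute fields such as the ModRM byte. The mapping is still mechanical, which is why assembly works as a stand-in for machine code.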

Fun fact: the software mentioned above was a BASIC interpreter for the Apple I. Wozniak was inspired to write it because he was annoyed at all the attention some jerk named Bill Gates was getting at the Homebrew Computer Club meetings for writing a BASIC interpreter for the earlier Altair 8800.
