This is intended to be an accessible, simple tutorial that explains how a buffer overflow works.
Assumptions:
- You have some basic programming knowledge
- You have an Ubuntu 16.04 LTS setup or similar
First, we are going to create our vulnerable program. It doesn't need to be fancy; it just needs to accept user input in an insecure way.
Create a file called hackme.c and put this C code in it:
#include <stdio.h>

void win() {
    printf("This should never be printed.\n");
}

int main(int argc, char** argv) {
    char user_in[16];
    printf("Enter a string: ");
    scanf("%s", user_in);
    printf("Done.\n");
}
Let's go line by line through this and see what it does.
#include <stdio.h>
This line, like any line that starts with a "#", is a preprocessor directive. It instructs the preprocessor to include the stdio.h file, which contains the declarations of printf and scanf, the two library functions we plan to use. If you want to see what this file looks like, you can open it at /usr/include/stdio.h.
void win() { ...
This defines a function called win. The void in front indicates that it does not return anything, and the empty parentheses indicate that we will not pass it any arguments (strictly speaking, writing win(void) is the explicit way to say this in C). The curly braces after each function header bound the code that belongs to that function.
printf("This should never be printed.\n");
This line makes use of the printf function, which formats and prints data. It is similar to print functions in other languages.
int main(int argc, char** argv) { ...
This is a function definition like the one above, but for our main program. By naming this function main, we are telling the compiler that this is where execution should begin when the user starts the program. The "int" at the beginning says that we will return an integer. By convention, main functions return an integer which represents the status code of the program upon exit. The two arguments to the main function allow us to access command line arguments. More on this later.
char user_in[16];
Now it gets interesting. This creates an array of type "char" (for character) called "user_in" with a length of 16. Since a character is 1 byte, this is going to allocate a space of 16 bytes on the stack. At the assembly level, the stack pointer is going to be decremented by at least 16 (the stack grows downward in memory, and the compiler may reserve a few extra bytes for alignment). The variable name user_in is essentially just a convenient human shortcut for the offset from the base pointer to this space of 16 bytes on the stack.
Question: What is in this 16 byte space on the stack?
Answer: That is undefined! We just allocated space, so whatever was in that spot in memory is likely still there. You should never rely on memory space being "clean" or zeroed out.
scanf("%s", user_in);
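This is the last function we haven't talked about. scanf is going to look at stdin (keyboard input in this case) for something that fits the format of a string ("%s"), which means a run of characters up to the next whitespace. It is then going to write that string, followed by a terminating null byte, to user_in. Keep in mind that user_in is passed as a pointer to the 16-byte area in memory that we allocated, so scanf has no idea how big that area is and will keep writing for as long as the input continues.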
Now we'll compile and run our vulnerable program.
gcc hackme.c -o hackme
./hackme
You can try entering any short string, such as your name. It should simply print "Done." Whatever string you enter is written into the 16-byte space we allocated earlier. Since the string is stored with a terminating null byte, only 15 characters fit safely. But wait, what happens if we enter a string longer than that? Try it!
You can always run the program and type or copy-paste your input manually, but you can also use Python to automate your input:
python3 -c 'print("A"*37)' | ./hackme
Using the -c argument, we are running Python code on the command line. The print call outputs a string made of 37 copies of "A", and the pipe (|) feeds that output to our program as its input. Isn't Python cool?
If you enter a long enough string, you should get this message: "*** stack smashing detected ***"
Wait a second, what is the stack? What does smashing it mean?
[There will later be a stack tutorial here]
Our stack smashing attack was detected because the compiler puts special values called canaries on the stack. The compiler also inserts code to check (just before function return) whether these canaries have been modified. If they have been modified, something's fishy. One way to get around stack canaries is to figure out where they are and what their values are supposed to be. We can then make sure that when we overwrite the stack, we overwrite the canaries with those same values! For now, though, we'll just recompile without them. We'll also be using 32-bit from now on.
gcc -m32 hackme.c -fno-stack-protector -o hackme
To make that work you may have to run this:
sudo apt-get install lib32z1 lib32ncurses5 gcc-multilib
Feed our 37 "A"s into the program again and you should get a segmentation fault! This is exciting! Breaking stuff is usually the first step to exploiting it.
0804848b