Hi Guys,
For a long time I have been thinking of putting together some notes on the basics of C that give a better understanding not only of C but also of the OS side of things. So here is the start (oh please... "Hello World!" again).
0. Prerequisites
Throughout the post I assume that the reader has some knowledge of C. All explanations and examples below were tested on a Linux system. To prepare your system, install the following packages:
$ sudo apt-get install gcc vim
1. Compilation
Bear with the stupid example :)
This is a basic C program; almost every programmer has written it at least once in their life.
#include<stdio.h>
#define MSG "Hello Guys!"
int main()
{
    /* printing msg */
    printf(MSG);
    return 0;
}
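OK, so this is the C syntax that tells the computer to display "Hello Guys!". What happens next?
For a program to become an executable, it has to go through the following steps: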
1. Pre-processing
2. Compilation
3. Assembly
4. Linking
To see all the intermediate files produced during compilation, compile the above program as follows. The -save-temps option keeps the intermediate files in the current directory:
$ gcc -Wall -save-temps helloguys.c -o helloguys
$ ls
helloguys.c
helloguys.i
helloguys.o
helloguys.s
helloguys
$
1. Pre-processing
This is the very first step in compiling a C program. It mainly does the following tasks:
1. Macro substitution
2. Comment stripping
3. Expansion of included files
The preprocessor output for helloguys.c (saved as helloguys.i) is shown below:
...
...
...
# 870 "/usr/include/stdio.h" 3 4
extern FILE *popen (__const char *__command, __const char *__modes) ;
extern int pclose (FILE *__stream);
extern char *ctermid (char *__s) __attribute__ ((__nothrow__ , __leaf__));
# 910 "/usr/include/stdio.h" 3 4
extern void flockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__));
extern int ftrylockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__)) ;
extern void funlockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__));
# 940 "/usr/include/stdio.h" 3 4
# 2 "helloguys.c" 2
int main()
{
    printf("Hello Guys!");
    return 0;
}
Explanation of the file:
1. The first observation is at the end of the file: the argument of printf() is now the string "Hello Guys!" instead of the macro.
2. The second is that it contains none of the comments we added; they have been stripped out.
3. The third is that "#include<stdio.h>" has been replaced by lots of other stuff. The header has been expanded and added to the source file so that the compiler can see a declaration of the printf() function. The declaration below says that printf() is defined somewhere externally:
extern int printf (__const char *__restrict __format, ...);
2. Compilation
Next, the compiler takes helloguys.i as input and converts it into assembly code, which is the compiler's intermediate output. You can see the "helloguys.s" file below; it consists of assembly instructions for your processor. After that, the assembler takes these instructions and converts them into machine code.
        .file   "helloguys.c"
        .section        .rodata
.LC0:
        .string "Hello Guys!"
        .text
        .globl  main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        pushl   %ebp
        .cfi_def_cfa_offset 8
        .cfi_offset 5, -8
        movl    %esp, %ebp
        .cfi_def_cfa_register 5
        andl    $-16, %esp
        subl    $16, %esp
        movl    $.LC0, %eax
        movl    %eax, (%esp)
        call    printf
        movl    $0, %eax
        leave
        .cfi_restore 5
        .cfi_def_cfa 4, 4
        ret
        .cfi_endproc
.LFE0:
        .size   main, .-main
        .ident  "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
        .section        .note.GNU-stack,"",@progbits
3. Assembly
In this stage the "helloguys.s" file is taken as input and the intermediate file "helloguys.o" is generated. This is the object file.
It contains machine-level instructions, but calls to external functions are not resolved yet. Since this is machine code it is not human-readable, but if you open it anyway it looks like this:
^?ELF^A^A^A^@^@^@^@^@^@^@^@^@^A^@^C^@^A^@^@^@^@^@^@^@^@^@^@^@$^A^@^@^@^@^@^@4^@^@^@^@^@(^@^M^@
^@U<89>å<83>äð<83>ì^P¸^@^@^@^@<89>^D$èüÿÿÿ¸^@^@^@^@ÉÃ^@^@^@Hello Guys!^@^@GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3^@^@^T^@^@^@^@^@^@^@^AzR^@^A|^H^A^[^L^D^D<88>^A^@^@^\^@^@^@^\^@^@^@^@^@^@^@^]^@^@^@^@A^N^H<85>^BB^M^EYÅ^L^D^D^@^@^@.symtab^@.strtab^@.shstrtab^@.rel.text^@.data^@.bss^@.rodata^@.comment^@.note.GNU-stack^@.rel.eh_frame^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
...
...
...
The "ELF" string at the start of the file tells you that it is in Executable and Linkable Format.
4. Linking
So, this is the final stage of the compilation process. It links function calls to their definitions, for example the printf() function. Until now the code knows nothing about where printf() lives; the object file only holds a placeholder at the place of the function call. At this stage the call to printf() gets resolved and the function's address gets plugged in.
Functions of the linker:
1. It plugs in the actual function addresses.
2. It also adds some extra code and information to your file. To run our code we have to hand it over to the system, so this stage adds start-up and exit code around yours. Before your code starts, the environment variables and command-line arguments have to be set up, and some OS-specific code is needed so that the OS can signal your application. On exit, the code has to return a status to the OS, which helps it keep track of the state of the program and its environment. All of these extras are added by the linker.
To verify this, compare the sizes of the object file and the executable:
$ size helloguys.o
   text    data     bss     dec     hex filename
     97       0       0      97      61 helloguys.o
$ size helloguys
   text    data     bss     dec     hex filename
   1149     256       8    1413     585 helloguys
At least you can see that the executable is larger than the object file.
So, now you know the Life of C: the stages your code has to go through before becoming an executable. This is just the tip of the iceberg; there is a lot more if you want to go into the details.
The next part of the Life of C will cover the life of your executable: everything that happens when you hand it to the OS to run.
$ ./helloguys
Hello Guys!
Cheers!!!