“Compilation” Process in C Programming

This article uses the GCC compiler as its example. GCC ships with most Linux distributions and is also available on many other platforms, such as Windows and macOS. It is one of the most popular C compilers, and using it is similar to using the traditional UNIX cc compiler. GCC originally stood for GNU C Compiler; it now stands for GNU Compiler Collection, because it compiles programs written in a variety of languages, such as Ada, C, C++, Fortran and Objective-C (older releases also included a Java front end).

Let’s consider the "pun.c" program from K. N. King’s "C Programming – A Modern Approach":

#include <stdio.h>

int main(void)
{
    printf("To C, or not to C: that is the question.\n");
    return 0;
}

This is a simple C program in which:

  • #include <stdio.h> is a directive necessary to "include" information about C’s standard I/O (input/output) library.
  • main is a special mandatory function, which gets called automatically when the program is executed. The word int preceding main indicates that main returns an integer value, and the word void in parentheses after main indicates that main takes no arguments.
  • printf is a function from the standard I/O library that can produce nicely formatted output.
  • The return statement (return 0) terminates the program and returns the integer value 0 to the operating system.

Next, we need to know how to obtain executable code that, when run, prints the following line on standard output:

To C, or not to C: that is the question.

Obtaining executable code from the given C source code requires three important steps: preprocessing, compilation and linking. All three steps are usually automated with one command:
$ gcc pun.c -o pun
Here GCC invokes the preprocessor, compiler and linker one after another, in that order, taking the source file pun.c as input and generating the final executable file pun.

Let’s look at the entire process:

  1. Editor: First, a text file containing the C program (the source code) is created using a text editor such as vim. Let us call this file pun.c, as shown above. The name of the file doesn’t matter much, but the ".c" extension is often required by compilers.

  2. Preprocessor: Next comes the preprocessing stage. The program is given as input to the preprocessor, which processes all the lines beginning with # (known as directives).

Commands intended for the preprocessor are called directives. Program pun.c contains only one directive, "#include <stdio.h>", which states that the information in <stdio.h> is to be "included" into the program before it is compiled. <stdio.h> contains information about C’s standard I/O library. C has a number of headers like <stdio.h>, each containing information about some part of the standard library. The reason we’re including <stdio.h> is that C, unlike some programming languages, has no built-in "read" and "write" commands; the ability to perform input and output is provided instead by functions in the standard library. Directives always begin with the # character and are one line long by default; there is no semicolon or other special symbol at the end of a directive.

A preprocessor is a bit like a text editor: it can add things to the program and make modifications.
Invoking GCC with the "-E" flag tells GCC to stop after the preprocessing phase. The following command stores the output of the preprocessor in the file pun.i (".i" is the extension GCC recognizes as already-preprocessed C source):
$ gcc -E pun.c -o pun.i

  3. Compiler: Then comes the compilation stage. The output of the preprocessor is fed as input to the compiler, which translates the entire human-readable program into machine instructions (aka object code). GCC does this by first generating assembly code (the "-S" flag stops at that point) and then assembling it. The object code is stored in a newly created file with the same name as the source file but with the extension ".o". This is a binary file, though it isn’t quite ready to run yet.
    Invoking GCC with the "-c" flag tells GCC to stop after this phase, without linking. The following command generates the object code file pun.o:
    $ gcc -c pun.i

  4. Linker: Finally comes the linking stage, in which a linker program (ld, the GNU linker), invoked by GCC, links the object code file produced in the 3rd step with any additional code needed to yield a complete executable program. This additional code includes the library functions (like printf) that are used in the program. Once linked, the final output file containing the executable code is generated, with the default name "a.out".
    Invoking GCC with object code as input automatically links the object code with the required additional code and generates the final executable binary (a.out by default):
    $ gcc pun.o
    Note: You may invoke ld directly to link your object code file with the C library, though this isn’t recommended and may generate errors, because the command below leaves out the C runtime startup files that GCC normally adds:
    $ ld pun.o -dynamic-linker /lib64/ld-linux-x86-64.so.2 -lc -m elf_x86_64
    Here the additional code is linked dynamically; linking statically by hand is more complicated.


One may choose to compile their programs using the simple command discussed above:
gcc -o OUTPUTFILENAME SOURCECODEFILE.c
though a much better invocation is:
gcc -Wall -W -pedantic -std=c99 -g -o OUTPUTFILENAME SOURCECODEFILE.c
where:
  • -Wall: "-W" stands for warning and "all" for all, i.e. "all warnings", although in practice it enables a large set of common warnings rather than every warning GCC knows. -W can also be followed by the name of a specific warning. It causes the compiler to produce warning messages when it detects possible errors. It should be used in conjunction with -O for maximum effect, because some checks rely on analysis that is done only when optimizing.
  • -W issues additional warning messages beyond those produced by "-Wall". Newer versions of GCC call this option -Wextra; the older name is still accepted.
  • -pedantic issues all the warnings required by the C standard. Programs that use non-standard features are reported with warnings; use -pedantic-errors if they should be rejected instead.
  • -ansi turns off GCC features that aren’t standard C and enables a few standard features that are normally disabled. For C it is equivalent to -std=c89 (also spelled -std=c90), so it is left out of the command above, which uses -std=c99.
  • -std=c89 or -std=c99 specifies which version of the C standard the compiler should use to check the program.
  • -g tells the compiler to include the information needed by a debugger, which lets us debug the program later using a debugger like gdb.
Note: The three options -pedantic, -ansi and -std are great for ensuring you have a portable program.
