hello world, C and GNU as
May 11th, 2008 | Published in Assembler, C, Languages, Programming | 6 Comments

Finally, it's time to switch to the fabulous GNU as. We'll forget about DEBUG for some time. Thanks, DEBUG. GNU as, Gas, or the GNU Assembler, is the assembler used by the GNU Project. It is part of the Binutils package and acts as the default back end of gcc. Gas is very powerful and can target several computer architectures. Quite a program, then. Like most assemblers, Gas takes input made up of directives (also referred to as pseudo-ops), comments, and, of course, instructions. Instructions depend heavily on the target computer architecture. Directives, by contrast, tend to be relatively homogeneous across architectures.
1 Syntax
Originally, this assembler only accepted the AT&T assembler syntax, even for the Intel x86 and x86-64 architectures. The AT&T syntax differs from the one used in most Intel references. There are several differences, the most memorable being that two-operand instructions take their source and destination in the opposite order. For example, the instruction mov ax, bx would be expressed in AT&T syntax as movw %bx, %ax, i.e., the rightmost operand is the destination and the leftmost one is the source. Another difference is that register names used as operands must be preceded by a percent (%) sign, and immediate values by a dollar ($) sign. However, since version 2.10, Gas supports Intel syntax by means of the .intel_syntax directive. In what follows we'll be using AT&T syntax.
2 Our Goals
We'll create a new version of the hello, world! program. Let's recap the articles so far. First, we presented some reminiscences and motivations for hello, world!. Next, we coded a hello, world! program using the MS-DOS DEBUG program. Later, we encoded that program directly in hexadecimal (no need for DEBUG). And finally, we abused the MS-DOS ECHO command to create a binary, executable hello, world! program directly from the DOS command line (again, no need for DEBUG). One thing all these programs had in common was their use of function 09h of INT 21h to print the "hello, world!" string. But it's time to move forward. Now I plan to use the lovely C printf function. In C, our greeting program would be
    int main()
    {
        printf("hello, world!\n");
        return 0;
    }
We've omitted the inclusion of the stdio.h header. We could resort to a single statement, return printf("hello, world!\n") - 14; which works because printf returns the number of characters it wrote (14 here), but I think that two statements give clearer code. We save our program in a file called "hello.c", and compile with
gcc -o hello.exe hello.c
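As an aside on the one-statement variant mentioned above: it relies on printf returning the number of characters written, which is 14 for our greeting (13 visible characters plus the newline). Here is a minimal sketch of that arithmetic. The name greeting_status is a hypothetical helper made up for this illustration; it is not part of the program we compile in this post.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original program): prints the
   greeting and returns 0 when all 14 characters were written. */
int greeting_status(void)
{
    /* printf returns the number of characters written, or a negative
       value on error, so a fully successful call yields 14 - 14 == 0. */
    return printf("hello, world!\n") - 14;
}
```

Calling greeting_status() from main and returning its result would give the same exit status as the two-statement version, as long as the write succeeds.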
I'll be working on Windows, with the MinGW port of the GNU Compiler Collection. I like MinGW a lot, especially its ability to provide native functionality via direct Windows API calls, which is good for the performance of our applications. Working on Windows means that our executable files (object code and DLLs too) follow the PE/COFF format. The Portable Executable (PE) file format is a wrapper for all the information the Windows loader requires in order to run the code. PE is a modified version of the Unix COFF file format (hence the name PE/COFF). Another popular file format for executable code is ELF (Executable and Linkable Format), which is used by Linux, the Nintendo Wii and DS, and the PlayStation 3. For the time being, we only have to know that the behavior of GNU as varies according to the target file format (in our case, PE/COFF).
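To make the difference between the two formats a bit more tangible, here is a rough sketch that tells them apart by their first bytes. It assumes only two well-known facts: a PE image (an .exe or .dll) starts with the DOS stub signature "MZ", and an ELF file starts with the byte 0x7F followed by "ELF". The names exe_format and guess_format are hypothetical, and a real tool would inspect much more than these magic numbers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names, introduced only for this illustration. */
typedef enum { FMT_UNKNOWN, FMT_PE, FMT_ELF } exe_format;

/* Guess the executable format from the first bytes of a file. */
exe_format guess_format(const unsigned char *buf, size_t len)
{
    /* ELF files start with 0x7F followed by the letters "ELF".
       The literal is split so that 'E' is not read as a hex digit. */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return FMT_ELF;

    /* PE images start with the DOS stub, whose signature is "MZ". */
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return FMT_PE;

    return FMT_UNKNOWN;
}
```

Feeding it the first few bytes of hello.exe should give FMT_PE, while a Linux binary should give FMT_ELF.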
gcc can also provide us with the x86 assembly file it used. I typed gcc -S hello.c and this was the output I got:
        .file "hello.c"
        .def ___main; .scl 2; .type 32; .endef
        .section .rdata,"dr"
    LC0:
        .ascii "hello, world!\12\0"
        .text
    .globl _main
        .def _main; .scl 2; .type 32; .endef
    _main:
        pushl %ebp
        movl %esp, %ebp
        subl $8, %esp
        andl $-16, %esp
        movl $0, %eax
        addl $15, %eax
        addl $15, %eax
        shrl $4, %eax
        sall $4, %eax
        movl %eax, -4(%ebp)
        movl -4(%ebp), %eax
        call __alloca
        call ___main
        movl $LC0, (%esp)
        call _printf
        movl $0, %eax
        leave
        ret
        .def _printf; .scl 2; .type 32; .endef
3 Code Explanations
From a general view, we can identify three kinds of elements in the above listing. First, we have directives, which are symbols beginning with a '.' (dot). As mentioned, directives are mostly the same whatever the target architecture. If the symbol begins with a letter, the statement is an assembly language instruction, i.e., it will assemble into a machine language instruction, and it will surely differ between computer architectures. Finally, labels are those symbols immediately followed by a ':' (colon). We may think of labels as names for the addresses of data or code. Now let's do a shallow review of a few germane directives, so bear with me.
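The three-way split described above can be sketched as a tiny classifier that looks only at the first symbol of a line. This is a simplification made up for illustration: stmt_kind and classify_line are hypothetical names, and the sketch ignores comments and lines that carry several statements separated by semicolons.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical names, introduced only for this illustration. */
typedef enum { STMT_EMPTY, STMT_DIRECTIVE, STMT_LABEL, STMT_INSTRUCTION } stmt_kind;

/* Classify one line of assembly by looking at its first symbol. */
stmt_kind classify_line(const char *line)
{
    /* Skip leading whitespace. */
    while (*line && isspace((unsigned char)*line))
        line++;
    if (*line == '\0')
        return STMT_EMPTY;

    /* The first symbol runs up to the first blank, tab, or colon. */
    size_t n = strcspn(line, " \t:");

    if (line[n] == ':')
        return STMT_LABEL;        /* symbol immediately followed by ':' */
    if (line[0] == '.')
        return STMT_DIRECTIVE;    /* symbol beginning with '.' */
    return STMT_INSTRUCTION;      /* anything else: a mnemonic */
}
```

The label test comes first so that a line such as LC0: is not mistaken for an instruction just because it starts with a letter.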
.file string
This directive identifies the start of a logical file (string should be the file name). Actually, the directive is ignored and is only there for compatibility with old versions. We can remove it.
.def name ... .endef
This pair of directives encloses debugging information for the symbol name, and is only observed when Gas is configured for PE/COFF output. We don't need it for a simple hello, world! program.
.section name
This directive indicates that the following code has to be assembled into a section called name. For PE/COFF targets, the .section directive is used in one of the following ways:
.section name [, "flags"]
.section name [, subsegment]
The gcc output we got uses the form with flags. Specifically, two single-character flags indicate the attributes of the section: d (data section) and r (read-only section). But again, we don't need to explicitly state section attributes for our simple program.
.ascii "string"
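Defines one or more string literals (separated by commas). Each string is assembled into consecutive addresses, with no trailing zero character. That is why gcc's output spells out the terminator as \0; the \12 before it is the octal escape for the newline character.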
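Seen from the C side, the difference is easy to measure. The sketch below assumes nothing beyond standard C: a string literal always carries its terminating zero, so it has the same layout as the .ascii line with the explicit \0 in gcc's output. The helper names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* A C string literal already includes the terminating NUL, just like
   `.ascii "hello, world!\12\0"` in the gcc output. */
static const char greeting[] = "hello, world!\12";

/* Hypothetical helpers, named here only for illustration. */

/* Bytes emitted when the \0 is written out explicitly. */
size_t bytes_with_nul(void)    { return sizeof greeting; }

/* Bytes emitted by a bare .ascii without the \0. */
size_t bytes_without_nul(void) { return strlen(greeting); }

/* Octal 12 is decimal 10, the ASCII line feed, i.e. '\n'. */
int octal_12_is_newline(void)  { return '\12' == '\n'; }
```

Without the extra byte, printf would keep reading past the end of the greeting until it happened to find a zero.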
.text subsection
Tells Gas to assemble the following statements onto the end of the text subsection numbered subsection. If subsection is omitted (as in our case), subsection number zero is used. We need this directive: without it, Gas would keep assembling into the section selected earlier (a data section, in our listing), and the code that prints our hello, world! message would not end up in the text section, where executable code belongs.
.global symbol (or .globl symbol)
.global makes symbol visible to the linker. In our case, we want to inform the linker about the _main function that it is expecting. For compatibility with other assemblers, both spellings (.global or .globl) are valid.
Now, directives are done. After the label _main we only have assembly code up to the ret instruction. Some of this code should be clear if you have previous experience with assembly programming. Nevertheless, let's review these instructions too. Note that the 'l' suffix on mnemonics such as pushl and movl tells Gas that we want the version of the instruction that works with "long" (32-bit) operands.
The first three instructions are the typical code for setting up the stack frame:
    pushl %ebp
    movl %esp, %ebp
    subl $8, %esp
By subtracting 8 bytes from ESP we're reserving space on the stack to hold local variables (the Intel stack "grows" from high memory locations to lower ones). Next we have the rarer
andl $-16, %esp
Remember that, as a 32-bit value, -16 is 0xFFFFFFF0 in hexadecimal. Therefore, this and clears the four low bits of ESP, rounding the stack pointer down to the nearest 16-byte boundary. The reasons for this alignment are not very clear to me. It may be a gcc choice meant to speed up floating-point accesses, or it may be for compatibility with a particular architecture. Either way, we don't require such alignment for displaying hello, world!
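The effect of that and is easy to reproduce in C. This is only a sketch of the arithmetic, with a hypothetical helper name (align_down_16); the sample stack address mentioned in the note below is made up and was not taken from a real run.

```c
#include <stdint.h>

/* Hypothetical helper mirroring `andl $-16, %esp` on a 32-bit value. */
uint32_t align_down_16(uint32_t esp)
{
    /* (uint32_t)-16 is 0xFFFFFFF0, so the AND clears the low four bits
       and leaves the rest of the value untouched. */
    return esp & (uint32_t)-16;
}
```

For a made-up stack pointer of 0x0022FF78 this gives 0x0022FF70, and a value that is already a multiple of 16 comes back unchanged.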
The following code is mostly a very contrived way of storing a value in EAX:
    movl $0, %eax
    addl $15, %eax
    addl $15, %eax
    shrl $4, %eax
    sall $4, %eax
    movl %eax, -4(%ebp)
    movl -4(%ebp), %eax
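Clearly the code is not optimized, as there are a lot of unnecessary lines: starting from 0, adding 15 twice gives 30, and shifting right and then left by 4 bits rounds that down to 16. Moreover, the final value of EAX is also stored into the memory previously reserved on the stack. It seems the value in EAX is a parameter for the __alloca invocation, the first of the two following calls: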
    call __alloca
    call ___main
These two calls are unnecessary for our toy application. We won't delve into details, but I'll say that alloca() is a function used to allocate memory on the stack. And when PE/COFF binaries are used and our application has an int main() function, a function void __main() should be called first thing after entering main(). We'll leave it at that for now. More information can be found in this excellent and instructive article from OSDevWiki.
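Going back to the contrived EAX sequence that precedes those calls, we can replay it in C, one statement per instruction, to see the value it leaves behind. The function name trace_eax is hypothetical, and the sketch only reproduces the arithmetic; it says nothing about what __alloca does with the result.

```c
#include <stdint.h>

/* Hypothetical helper: replays the EAX arithmetic from gcc's output,
   starting from `initial` (gcc starts from 0). */
uint32_t trace_eax(uint32_t initial)
{
    uint32_t eax = initial; /* movl $0, %eax  */
    eax += 15;              /* addl $15, %eax */
    eax += 15;              /* addl $15, %eax */
    eax >>= 4;              /* shrl $4, %eax  */
    eax <<= 4;              /* sall $4, %eax  */
    return eax;             /* always a multiple of 16 */
}
```

With the initial value 0 used by gcc, the function returns 16: seven instructions to load a constant.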
At last, we find the useful code
    movl $LC0, (%esp)
    call _printf
It stores the address of the ASCII string at the top of the stack, as the argument for printf, and then invokes printf. Now, where's the definition of printf? Well, we'll take it from the C library, of course. The linker (ld) is responsible for associating our code with the definition of printf.
Finally, we find
    movl $0, %eax
    leave
    ret
These instructions constitute the "returning code": store the return value (0 == success!) in EAX, tear down the stack frame, and pop the saved instruction pointer from the stack in order to return control to the calling procedure or program.
If we strip all the unnecessary lines, our hello, world! takes this form:
        .data
    LC0:
        .ascii "hello, world!\n\0"
        .text
        .global _main
    _main:
        pushl %ebp
        movl %esp, %ebp
        subl $4, %esp
        movl $LC0, (%esp)
        call _printf
        movl $0, %eax
        leave
        ret
Shorter and clearer. I assembled and linked it the hard way, step by step:
    as -o hello.o hello.s
    ld -o hello.exe /mingw/lib/crt2.o C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/crtbegin.o -LC:/MinGW/bin/../lib/gcc/mingw32/3.4.5 -LC:/MinGW/lib hello.o -lmingw32 -lgcc -lmsvcrt -lkernel32 C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/crtend.o
But it's better to just type gcc -o hello.exe hello.s

May 13th, 2008 at 4:26 am
At last, the force is with us all… XD
May 23rd, 2008 at 2:08 pm
thank you for this post, i just began to learn assembly language, and i use the book , you know, the example in the book is for linux, and i use windows. so none of the examples ran well. again, thank you! i saw the “hello world!”
May 24th, 2008 at 10:03 pm
@Carlos: The force is GNU or GNU as? Explain ;=)
@Jiao: If you could provide more information, perhaps we could help you.
June 2nd, 2008 at 11:41 pm
[...] If you’re programming in assembly, 0x80 only works in Linux. For DOS/Windows, you must use 0x21, 0x25, 0x26, etc. That’s the rationale behind my decision for using a function of the C library, printf, in order to avoid this problem in the example code of this post. [...]
June 2nd, 2008 at 11:42 pm
This post may be helpful:
http://www.halcode.com/archives/2008/06/02/colinux-int-0×80-on-windows-and-other-rants/
June 6th, 2008 at 7:10 am
–Intel syntax support is a bliss.