Seldom does anybody code an entire program in assembly any more. While it’s still possible to do so, the majority of a program is typically written in a high-level language such as C, C++, or Java, and only the parts that require high speed or low code size are written in assembly language.
For this assignment, you’ll need to write a simple C program and a few simple assembly-language functions, then combine the two into a single executable file. This requires you to understand how functions are called, and requires you to have at least some vague understanding of the linking process.
You’ll need to work on Intel Linux/Cygwin machines to compile and run this, so if you don’t have a personal system with these features, you’ll need to use the iLab machines. The linked page has directions on how to set up an account, if you don’t already have one.
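The C portion of the program will need to demonstrate the difference between little- and big-endian encodings of numbers. To do this, it must take in one or more numbers at the command line; for each of these numbers, it must print out the number itself, its 32-bit hexadecimal form, the hexadecimal form with the byte order reversed, and the decimal value of that reversed form.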
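You must create two assembly functions for this assignment: writeHexASCII, which generates the hexadecimal string for a number, and swapEnds, which reverses the endianness of a number.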
Your C code must call writeHexASCII to generate the hexadecimal strings for its output, and it must call swapEnds to reverse endianness of each number. You may use printf to write the decimal form of the number, or to write the hexadecimal strings. Your C code must be valid ANSI C, as usual.
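To make the job of the two functions concrete, here is a minimal C sketch of what they compute. It is only a reference for checking results, not a substitute for the assembly you have to write. The names `swap_ends_ref` and `write_hex_ascii_ref`, their parameter lists, and the choice of uppercase digits are all assumptions made for illustration; the real signatures are whatever the assignment specifies.

```c
/* Hypothetical C stand-in for swapEnds: reverse the four bytes of a
 * 32-bit value. Only the low 32 bits of x are used. */
unsigned long swap_ends_ref(unsigned long x)
{
    return ((x & 0x000000FFUL) << 24) |
           ((x & 0x0000FF00UL) <<  8) |
           ((x & 0x00FF0000UL) >>  8) |
           ((x & 0xFF000000UL) >> 24);
}

/* Hypothetical C stand-in for writeHexASCII: write the 8 hex digits of
 * the low 32 bits of x into buf, most significant digit first, then a
 * terminating NUL. buf must have room for at least 9 chars. */
void write_hex_ascii_ref(unsigned long x, char *buf)
{
    static const char digits[] = "0123456789ABCDEF";
    int i;

    for (i = 0; i < 8; i++) {
        /* Digit i lives in bits (28 - 4*i) .. (31 - 4*i) */
        buf[i] = digits[(x >> (28 - 4 * i)) & 0xFUL];
    }
    buf[8] = '\0';
}
```

Comparing your assembly output against these for a handful of inputs is a cheap way to catch shifted or dropped digits.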
Your assembly code shouldn’t make calls to any outside functions, and it should be runnable on any 80486 or 80386 CPU in 32-bit protected mode. (I.e., precisely the environment we’ve used in class.)
You can write your assembly code in NASM or AT&T format; if it’s NASM, end your assembly file in .asm, and if it’s AT&T, end your file in .S (note the capital S).
Your output should have the following form:
| $ ./your_program NUM1 NUM2 NUM3... |
| NUM1: HEX_FORM <-> REV_HEX_FORM = REV_DEC_FORM |
| NUM2: HEX_FORM <-> REV_HEX_FORM = REV_DEC_FORM |
| NUM3: HEX_FORM <-> REV_HEX_FORM = REV_DEC_FORM |
| ... |
| $ ./your_program 1 512 4660 |
| 1: 00000001 <-> 01000000 = 16777216 |
| 512: 00000200 <-> 00020000 = 131072 |
| 4660: 00001234 <-> 34120000 = 873594880 |
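The sample lines above can be cross-checked in plain C by splitting each number into its four bytes and reading them back in the opposite order. This is a sketch for verifying expected output only; `expected_line` is a hypothetical helper, and it deliberately uses `sprintf` for the hex digits, which the real program is not supposed to do.

```c
#include <stdio.h>

/* Hypothetical helper: build one expected output line for num in out.
 * out must have room for at least 47 chars (the longest possible line
 * plus the NUL). Returns the number of characters written. */
int expected_line(char *out, unsigned long num)
{
    unsigned int b[4];
    unsigned long rev;

    num &= 0xFFFFFFFFUL;  /* keep only the low 32 bits */

    /* Bytes in the order we write hex: most significant first */
    b[0] = (unsigned int)((num >> 24) & 0xFFUL);
    b[1] = (unsigned int)((num >> 16) & 0xFFUL);
    b[2] = (unsigned int)((num >>  8) & 0xFFUL);
    b[3] = (unsigned int)(num & 0xFFUL);

    /* Reading the same bytes back to front gives the swapped value */
    rev = ((unsigned long)b[3] << 24) | ((unsigned long)b[2] << 16) |
          ((unsigned long)b[1] <<  8) |  (unsigned long)b[0];

    return sprintf(out, "%lu: %02X%02X%02X%02X <-> %02X%02X%02X%02X = %lu",
                   num, b[0], b[1], b[2], b[3],
                   b[3], b[2], b[1], b[0], rev);
}
```

Calling `expected_line(buf, 4660UL)` with a 64-byte `buf` should reproduce the third sample line.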
By the by, don’t try to just compile some C code and hand it in. We can always tell. We can always tell.
| $ nasm -felf -o filename.o filename.asm |
| $ gcc -c filename.S |
| $ gcc -o binary_filename object1.o object2.o ... |
| ; ...for NASM format: |
| section .text |
| bits 32 |
| global writeHexASCII |
| global swapEnds |
| /* ...for AT&T format: */ |
| .text |
| .code32 |
| .globl writeHexASCII |
| .globl swapEnds |
AT&T syntax is a horrible format for assembly language, but one that’s quite common, and it would behoove you to gain some familiarity with it. (It’s also useful if you want to integrate assembly code into C source code for GCC, or if you like sharing header files between C and assembly.)
There are a few big differences between AT&T and Intel/NASM formats:
| NASM | AT&T |
|---|---|
| [label] | label |
| [eax] | (%eax) |
| [eax+foo] | foo(%eax) |
| [eax+ebx] | (%eax,%ebx) |
| [eax+ebx+foo] | foo(%eax,%ebx) |
| [eax+4*ebx] | (%eax,%ebx,4) |
| [eax+4*ebx+foo] | foo(%eax,%ebx,4) |
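Every row of the table above is a special case of one formula: AT&T's `foo(%eax,%ebx,4)` and NASM's `[eax+4*ebx+foo]` both name the address base + index*scale + displacement. A small C sketch of that arithmetic follows; `effective_address` is a hypothetical helper written for illustration, not anything the assembler provides.

```c
/* Hypothetical helper mirroring the AT&T form disp(base,index,scale).
 * A missing component counts as 0, and a missing scale as 1. */
unsigned long effective_address(unsigned long disp, unsigned long base,
                                unsigned long index, unsigned long scale)
{
    return base + index * scale + disp;
}
```

For example, with `base` holding the address of an array of 4-byte integers and `index` holding `i`, `effective_address(0, base, i, 4)` is the address of element `i`, which is what `(%eax,%ebx,4)` expresses.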
| NASM | AT&T |
|---|---|
| cmp a,b | cmp b,a |
| jbe x | jbe x |
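Because the operand order flips, it helps to pin down what the pair of instructions actually tests. NASM's `cmp a,b` and AT&T's `cmp b,a` both compute a - b and set the flags; `jbe` then jumps when the carry flag or the zero flag is set, i.e. when a <= b as unsigned numbers. The C sketch below models that; `jbe_taken` is a hypothetical name used only for illustration.

```c
/* Models `cmp` followed by `jbe` for NASM `cmp a,b` / AT&T `cmp b,a`.
 * Returns nonzero when the jump would be taken. */
int jbe_taken(unsigned long a, unsigned long b)
{
    int cf = (a < b);   /* carry flag: a - b needed a borrow */
    int zf = (a == b);  /* zero flag: a - b was zero */

    return cf || zf;    /* jbe: jump if below or equal (unsigned) */
}
```

Note that the comparison is unsigned: `jbe_taken(0xFFFFFFFFUL, 1UL)` is 0, even though the same bit pattern read as a signed -1 would be less than 1.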
| Suffix | Meaning |
|---|---|
| b | Integer byte |
| w | Integer word |
| l | Integer doubleword (“longword”) |
| d | A pair of words that make a doubleword (e.g., DX:AX), or a pair of doublewords that make a quadword (e.g., EDX:EAX) |
| q | A single integer quadword value. Until the later x86en, this was only supported by the FPU. |
| s | A single-precision real (float in C) |
| l | A double-precision real (double in C) |
| t | An 80-bit temporary real |
| NASM/Intel | AT&T |
|---|---|
| stosb, stosw, stosd | stosb, stosw, stosl |
| movsb, movsw, movsd | movsb, movsw, movsl |
| lodsb, lodsw, lodsd | lodsb, lodsw, lodsl |
| cmpsb, cmpsw, cmpsd | cmpsb, cmpsw, cmpsl |
| scasb, scasw, scasd | scasb, scasw, scasl |
| movzx ax,al | movzbw %al,%ax |
| movzx eax,ax | movzwl %ax,%eax |
| movzx eax,al | movzbl %al,%eax |
| movsx ax,al | movsbw %al,%ax |
| movsx eax,ax | movswl %ax,%eax |
| movsx eax,al | movsbl %al,%eax |
| cbw, cwde, cwd, cdq | cbtw, cwtl, cwtd, cltd |
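The difference between the `movz` and `movs` families is easiest to see on a value whose top bit is set. The following C sketch models three of the AT&T forms from the table using plain arithmetic; the `_ref` function names are hypothetical and exist only for this illustration.

```c
/* movzbl: zero-extend the low byte to 32 bits */
unsigned long movzbl_ref(unsigned long x)
{
    return x & 0xFFUL;
}

/* movsbl: sign-extend the low byte to 32 bits.
 * Flipping the sign bit and subtracting it back propagates it upward. */
unsigned long movsbl_ref(unsigned long x)
{
    x &= 0xFFUL;
    return ((x ^ 0x80UL) - 0x80UL) & 0xFFFFFFFFUL;
}

/* movswl: sign-extend the low word to 32 bits */
unsigned long movswl_ref(unsigned long x)
{
    x &= 0xFFFFUL;
    return ((x ^ 0x8000UL) - 0x8000UL) & 0xFFFFFFFFUL;
}
```

With an input byte of 0x80, the zero-extending form leaves 0x00000080 while the sign-extending form produces 0xFFFFFF80; for 0x7F both give 0x0000007F.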
| Intel | AT&T |
|---|---|
| global foo | .globl foo |
| section foo | .section foo |
| section .text | .text |
| section .data | .data |
| bits 32 | .code32 |
| db 1,2,3 | .byte 1,2,3 |
| dw 1,2,3 | .word 1,2,3 |
| dd 1,2,3 | .long 1,2,3 |
| db "A string!",13,10 | .ascii "A string!\r\n" |
| db "A string!",13,10,0 | .asciz "A string!\r\n" |
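If the data directives are unfamiliar, rough C counterparts may help. The sketch below is an analogy rather than an exact equivalence: the array names are made up for illustration, and the mapping of `.word` to `unsigned short` and `.long` to `unsigned int` assumes the usual 16-bit and 32-bit sizes on 32-bit x86.

```c
/* db 1,2,3  /  .byte 1,2,3 */
const unsigned char bytes[] = { 1, 2, 3 };

/* dw 1,2,3  /  .word 1,2,3  (assumes a 16-bit unsigned short) */
const unsigned short words[] = { 1, 2, 3 };

/* dd 1,2,3  /  .long 1,2,3  (assumes a 32-bit unsigned int) */
const unsigned int longs[] = { 1, 2, 3 };

/* .ascii: exactly the 11 characters given, no terminating NUL.
 * Sizing the array to 11 makes C drop the NUL as well. */
const char ascii_str[11] = "A string!\r\n";

/* .asciz: the same characters followed by a terminating NUL,
 * which is what an ordinary C string literal gives you. */
const char asciz_str[] = "A string!\r\n";
```

The one-byte difference between `sizeof(ascii_str)` and `sizeof(asciz_str)` is the same difference as the trailing `,0` in the NASM column.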
...then you’ll need to download and build it. The NASM Sourceforge site is here; grab a source-code tarball (.tar.gz or .tar.bz2) and put it in your home directory somewhere. (Don’t grab binaries or RPMs or anything—you either can’t run them or can’t use them easily.)
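Assuming you’ve gotten nasm-2.00.tar.*, extract the contents of the tarball by running whichever of the first two commands below matches your download (.tar.gz or .tar.bz2), then configure, build, and install NASM under your home directory: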
| $ tar -zxvf nasm-2.00.tar.gz |
| $ tar -jxvf nasm-2.00.tar.bz2 |
| $ cd nasm-2.00 |
| $ ./configure --prefix=$HOME |
| ... the build environment is set up ... |
| ... a bunch of stuff prints out ... |
| $ make install |
| ... everything is built and installed ... |
| ... a bunch of other stuff prints out ... |