Assembly basics CS 2XA3

  1. Assembly basics CS 2XA3 Term I, 2020/21

  2. Outline
     ◮ What is Assembly Language?
     ◮ Assemblers
     ◮ Why Assembly?
     ◮ NASM
     ◮ Character and String Literals
     ◮ Integer literals
     ◮ Labels and Names
     ◮ Statements
     ◮ Program structure
     ◮ Input/Output
     ◮ Compiling+linking

  3. What is Assembly Language?
     ◮ In a high level language (HLL), one line of code usually translates to 2, 3 or more machine instructions. Some statements may translate to hundreds or thousands of machine instructions.
     ◮ In Assembly Language (AL), one line of code translates to one machine instruction; AL is a "human readable" form of machine language.
     ◮ HLLs are designed to be "machine-independent", but machine dependencies are almost impossible to eliminate.
     ◮ ALs are NOT machine-independent. Each different machine (processor) has a different machine language. Any particular machine can have more than one assembly language.

  4. Assemblers
     An assembler is a program that translates an assembly language program into the binary code of machine instructions.
     ◮ NASM: Netwide Assembler
     ◮ MASM: Microsoft Macro Assembler
     ◮ GAS: GNU Assembler
     ◮ ARM Assembly Language

  5. Why Assembly?
     There are two reasons to write programs in assembly:
     (a) it is the only “language” the CPU understands
     (b) to obtain an understanding of how the CPU works
     Item (b) is also the biggest disadvantage of assembly programming: you are programming from the perspective of a CPU, not that of a human brain. This proves difficult for many students who are not familiar with CPU architecture and are used to having “convenient” data structures at hand (such as list in Python).

  6. NASM
     ◮ We are using 64-bit NASM in this course
     ◮ NASM is operating system independent
       - One of the two widely used Linux assemblers (the other is GAS)
       - NASM is an open source assembler for the 80x86 and x86-64 architectures. Compared to MASM, TASM, or GAS, it is rather easy to use and provides convenient syntactic constructs.

  7. NASM
     ◮ We will not cover NASM syntax in full depth
       - We are interested in the basic machine interface, NOT in proficient production assembly programming
       - NASM has many syntactic constructs similar to C
       - NASM has an extensive preprocessor similar to the C preprocessor

  8. Character and String Literals
     Escape characters
     format   description            ASCII decimal value
     \'       single quote ( ' )     39
     \"       double quote ( " )     34
     \`       backquote ( ` )        96
     \\       backslash ( \ )        92
     \?       question mark ( ? )    63
     \t       tab (TAB)              9
     \n       newline (LF)           10
     \r       carriage return (CR)   13
     String literals
     'This is a string literal'
     "This is a string literal, too"
     `Backquoted strings can use escape chars \n`
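The difference between the three quoting styles is easiest to see in the bytes they produce. The following is a minimal sketch; the label names are made up for illustration. In NASM, single and double quotes take their contents verbatim, and only backquoted strings process the escape sequences listed above.

```nasm
segment .data
; Single (or double) quotes: \n is NOT an escape, it stays as two bytes, '\' and 'n'
verbatim:  db 'line\n', 0        ; 7 bytes: l i n e \ n 0
; Backquotes: \n is replaced by a single LF byte (10)
escaped:   db `line\n`, 0        ; 6 bytes: l i n e LF 0
; The same bytes as "escaped", with the newline written as a number
explicit:  db 'line', 10, 0      ; 6 bytes: l i n e LF 0
```

When a string has to end with a newline, either the backquoted form or the explicit numeric form can be used; the quoted form with \n would print the backslash and the letter n.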

  9. Integer literals
     200           integer in decimal notation
     0200          decimal - the leading 0 does not make it octal
     0200d         explicit decimal - d suffix
     0d200         also explicit decimal - 0d prefix
     0c8h          hexadecimal - h suffix; the leading 0 is required, because c8h looks like a name
     0xc8          hexadecimal - the classic 0x prefix
     0hc8          hexadecimal - for some reason NASM likes prefix 0h
     310q          octal - q suffix
     0q310         octal - 0q prefix
     11001000b     binary - b suffix
     0b1100_1000   binary - 0b prefix, underscores are allowed

  10. Labels and Names
      Names identify labels, variables, symbols, and keywords
      ◮ May contain:
        letters: a .. z  A .. Z
        digits: 0 .. 9
        special chars: ? _ @ $ . ~
      ◮ NASM (unlike most assemblers) is case-sensitive with respect to labels and variables; it is not case-sensitive with respect to keywords, mnemonics, register names, directives, etc.
      ◮ First character must be a letter, _, ? or . (a leading . has a special meaning in NASM: it marks a “local label”, whose name can be reused after a different non-local label)
      ◮ Names cannot match a reserved word (and there are many reserved words!)

  11. Statements
      Syntax: [label[:]] [mnemonic] [operands] [;comment]
      ◮ [ ] indicates optionality
      ◮ Note that all parts are optional → blank lines are legal
      ◮ Labels are used to identify locations in code (instruction labels) or in memory (data definition labels)
      ◮ Statements are free form; they need not be formed into columns
      ◮ A statement must be on a single line, max 128 chars

  12. Examples of Statements
      a100: add rax, rdx   ; add subtotal
      Labels often appear on a separate line for code clarity:
      a100:
            ADD RAX, RDX   ; add subtotal
      Note the case-insensitivity of mnemonics (add or ADD) and registers (rax or RAX); however, writing A100 instead of a100 would be wrong, because labels are case-sensitive.

  13. Type of statements
      ◮ Directives+Pseudo-instructions
        limit EQU 100          ;defines a symbol limit
        %define limit 100      ;like C #define
      ◮ Data Definitions
        msg: db 'Welcome to Assembler!'
             db 0Dh, 0Ah
        count dd 0
        mydat: dd 1,2,3,4,5
               resd 100        ;reserves 100 dwords
      ◮ Instructions
        mov rax, rbx
        ADD RCX, 10

  14. Directives
      Directives for the linker:
      extern printf          declares printf to be an external symbol
      global asm_main        declares asm_main to be an entry point
      Directives for the preprocessor:
      %define ctrl 0x1F      every occurrence of symbol ctrl is replaced by literal 0x1F
      %define b(x) 2*x       b(y) is replaced by the text 2*y
      %define a(x) 1+b(x)    a(y) is replaced by the text 1+2*y
      %include "file10"      replaced by the contents of the file
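A short sketch of how these %define macros expand when used as operands. The macro b2 is a hypothetical addition, included only to illustrate a common pitfall: the replacement is textual, as with the C #define, so an argument that is itself an expression may need parentheses in the macro body.

```nasm
%define ctrl 0x1F
%define b(x) 2*x
%define a(x) 1+b(x)
%define b2(x) 2*(x)           ; hypothetical variant of b with a parenthesized argument

segment .text
        mov al, ctrl          ; assembled as: mov al, 0x1F
        mov rbx, b(5)         ; expands to 2*5, so rbx = 10
        mov rcx, a(5)         ; expands to 1+2*5, so rcx = 11
        mov rdx, b(1+2)       ; expands to 2*1+2, so rdx = 4, not 6
        mov rdx, b2(1+2)      ; expands to 2*(1+2), so rdx = 6
```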

  15. Pseudo-instructions
      Pseudo-instructions are not x86 instructions; rather, they are part of the NASM assembler. They are used to declare initialized and uninitialized data and a few other things. Let us look over them briefly:
      ctrl EQU 0x1F    every occurrence of symbol ctrl is replaced by literal 0x1F and cannot be changed, i.e. it defines a constant (similar to %define)

  16. Data definitions
      Declaring Initialized Data
      General format is
      [label[:]] <pseudo-instruction> <value> [;comment]
      The pseudo-instructions DB, DW, DD, DQ are used to declare initialized data. The first letter D stands for data, and the second stands for: Byte (1 byte), Word (2 bytes), Dword (4 bytes), and Qword (8 bytes).

  17. ◮ label1 db 0ABh declares a byte with value AB in hex with label label1 (the leading 0 is required, because ABh looks like a name)
      ◮ label2 db 1010010b declares a byte with value 1010010 in binary with label label2
      ◮ label3: dw 12ABh declares a word with value 12AB in hex with label label3
      ◮ label4 dd 1A2Bh declares a double word with value 1A2B in hex with label label4
      ◮ label5: db "A" declares a byte with the value of the ASCII code of A, i.e. 65 in decimal

  18. Array: the only innate data structure available in NASM (there is a mechanism to define user data structures, which is an advanced topic not covered in this course). An array is several items of the same type stored in consecutive memory, one after another. String is another word for byte array. A C-string is a byte array terminated with byte 0 (null character).
      ◮ label6 db 0, 1, 2, 3 declares 4 consecutive bytes with values 0, 1, 2 and 3 respectively
      ◮ label7 db "h", "e", "l", "l", "o", 0 declares a C-string of length 5 (the terminator 0 is not counted), occupying 6 bytes
      ◮ label8 db "hello",0 is the same as label7

  19. Declaring Uninitialized Data
      General format is
      [label[:]] <pseudo-instruction> <count> [;comment]
      The pseudo-instructions RESB, RESW, RESD, RESQ are used to declare uninitialized data. The first part RES stands for reserve, and the last letter stands again for: Byte (1 byte), Word (2 bytes), Dword (4 bytes), and Qword (8 bytes).
      mybuffer: resb 64        reserves 64 bytes with label mybuffer
      mywordbuffer resw 64     reserves 128 bytes (64 words) with label mywordbuffer

  20. Times
      The times pseudo-instruction is very versatile. It is a kind of loop, but we will use it only in data definitions to initialize arrays. So, for us the format is
      [label[:]] TIMES <count> <pseudo-instruction> <value> [;comment]
      times 10 db 0 is the same as db 0,0,0,0,0,0,0,0,0,0
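A few more uses of times in data definitions, as a minimal sketch; the label names are invented for illustration.

```nasm
segment .data
zeros:   times 10 db 0         ; 10 bytes, all with value 0
ones:    times 4  dw 1         ; 4 words (8 bytes), each with value 1
dashes:  times 20 db '-'       ; 20 bytes, each holding the ASCII code of '-'
         db 0                  ; terminator: dashes is now a C-string of length 20
abab:    times 3 db 'ab'       ; the two bytes of 'ab' repeated 3 times: a b a b a b
```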

  21. The Location Counter
      str1 DB 'This is a string'
      slen EQU $-str1        ; const slen = 16
      ◮ The symbol $ refers to the location counter
      ◮ As the assembler processes source code, it emits either code or data into the object code.
      ◮ The location counter is incremented for each byte emitted
      ◮ With slen EQU $-str1 the assembler performs the arithmetic to compute the length of str1
      ◮ Note the use of str1 in this expression as a numeric value (the address of the first byte)
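The same idea extends to arrays of larger items, where the byte count is divided by the item size. The sketch below uses illustrative names (nums, ncount, late) that are not part of the course material; it also shows why the EQU line should directly follow the data it measures.

```nasm
segment .data
str1:    db 'This is a string'
slen     equ $-str1            ; 16, the number of bytes in str1

nums:    dd 10, 20, 30, 40, 50
ncount   equ ($-nums)/4        ; 20 bytes / 4 bytes per dword = 5 elements

; Pitfall: by this point $ has moved past nums as well
late     equ $-str1            ; 36 = 16 bytes of str1 + 20 bytes of nums
```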

  22. Program layout
      BSS came from “Block Started by Symbol”, a pseudo-operation of an assembler for the IBM 704 in the 1950s.

  23. NASM program structure
      %include "simple_io.inc"
      segment .data
            ;initialized data
      segment .bss
            ;uninitialized data
      segment .text
      global asm_main
      asm_main:
            enter 0,0       ;setup
            saveregs        ;save all registers (our macro)
            ;put your code here
            restoregs       ;restore all registers (our macro)
            mov rax,0       ;return value
            leave
            ret

  24. Input/Output
      Input/Output (commonly abbreviated as I/O) routines.
      ◮ We will only deal with standard input and standard output.
      ◮ We will deal with I/O through the preprogrammed routines in the file simple_io.asm. Using them requires that the header file simple_io.inc be included:
        %include "simple_io.inc"
      ◮ The great advantage is that the I/O routines in simple_io.asm do not use the system stack; the information is passed in and out in the RAX register. Thus, calling any of these routines does not involve manipulation of the system stack, a huge simplification.

  25. Simple I/O routines
      ◮ print_int prints the integer stored in RAX
      ◮ print_char prints the character whose ASCII code is stored in AL
      ◮ print_string prints the C-string stored at the address stored in RAX
      ◮ print_nl prints a newline
      ◮ read_char reads a character into AL
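Putting the program structure and the I/O routines together, a first program might look like the sketch below. It rests on assumptions not stated in the slides: that the routines are invoked with the call instruction, and that simple_io.inc makes their names and the saveregs/restoregs macros available. The label msg and the value 42 are chosen only for illustration.

```nasm
%include "simple_io.inc"

segment .data
msg:    db "Hello, world!", 0    ; C-string: byte array terminated by 0

segment .text
global asm_main
asm_main:
        enter 0,0                ; setup
        saveregs                 ; save all registers (course macro)

        mov  rax, msg            ; RAX = address of the C-string
        call print_string        ; assumed to be invoked via call
        call print_nl

        mov  rax, 42             ; RAX = integer to print
        call print_int
        call print_nl

        restoregs                ; restore all registers (course macro)
        mov  rax, 0              ; return value
        leave
        ret
```

If the routines behave as described above, this should print the greeting and then the number 42 on separate lines.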
