Assembler
Course Content
➢ Introduction to System Software: Definition, System Software, Machine
Structure, Components of a Programming System, Assemblers, Linker,
Loader, Compiler, Macros, Text Editor, Debugger, Program Development
Flow, Introduction to Operating System, Language Processor, Assembly
Language, Introduction to CISC and RISC Machine Architecture
Slide 2
Course Content
➢ Assembler: Basic Assembler Functions, Machine Dependent Features, Machine Independent
Features, One pass and Multi pass Assembler
➢ Linkers, Loaders, Macros and Macro Processors: Basic Loader Function, Loader Design
Options, Relocation and Linking Concepts, Design of a Linker, Case study for Linker and Loader,
Macro definition, Macro expansion, Basic Macro Processor Functions and Features, Macro
Processor Design Options, Implementation example for Macro Processor
Slide 3
Assembly Programming Language
Slide 4
Pentium Assembly Language
➢ Registers
➢ Memory Models
➢ Addressing Modes
➢ Instruction Set
➢ Instruction format
Slide 5
Registers
➢ The Intel 32-bit (IA-32) architecture provides 16 basic program execution registers for
use in general system and application programming.
◼ General purpose registers: Eight general purpose registers are available for
storing operands and pointers.
◼ Segment registers: Six segment registers hold 16-bit segment selectors.
◼ EFLAGS register: It holds the program status and allows the application program
limited control of the processor.
◼ EIP register: The instruction pointer holds the address of the next instruction to be executed.
Slide 6
Slide 7
General Purpose Registers
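➢ The eight 32-bit general purpose registers EAX, EBX, ECX,
EDX, ESI, EDI, EBP, and ESP are provided for holding the following
items:
◼ Operands for logical and arithmetic operations
◼ Operands for address calculations
◼ Memory pointers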
Slide 8
➢ The special uses of general purpose registers by instructions are as follows:
◼ EAX - Accumulator for operands and results data.
◼ EBX - Pointer to data in the data segment.
◼ ECX - Counter for string and loop operations.
◼ EDX - I/O pointer.
◼ ESI - Pointer to data in the segment pointed to by the DS segment
register; source pointer for string operations.
◼ EDI - Pointer to data (or destination) in the segment pointed to by the
segment register ES; destination pointer for string operations.
◼ EBP - Pointer to data on the stack (in the SS segment).
◼ ESP - Stack pointer in the segment pointed to by the segment register SS.
Slide 9
➢ The lower-order 16 bits of the general
purpose registers map directly to the
register set found in earlier Intel
processors, such as the 8086.
➢ These 16-bit registers can be referenced
with the names AX, BX, CX, DX, BP, SP,
SI, and DI. Each of AX, BX, CX, and DX
can also be accessed as two 8-bit
registers.
➢ The higher-order 8 bits of AX are referred to
as AH, while the lower-order 8 bits are
accessed as AL. Similarly, BX, CX, and
DX can be accessed as BH and BL, CH and
CL, and DH and DL, respectively.
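As an illustration of this aliasing, the following Python sketch treats AX, AH, and AL as views onto the bits of EAX. The function names `sub_registers` and `write_al` are hypothetical helpers used only here; the processor performs this mapping in hardware.

```python
def sub_registers(eax: int) -> dict:
    """Split a 32-bit EAX value into its 16-bit and 8-bit aliases."""
    eax &= 0xFFFFFFFF            # keep the value within 32 bits
    ax = eax & 0xFFFF            # AX: lower-order 16 bits of EAX
    ah = (ax >> 8) & 0xFF        # AH: higher-order 8 bits of AX
    al = ax & 0xFF               # AL: lower-order 8 bits of AX
    return {"EAX": eax, "AX": ax, "AH": ah, "AL": al}


def write_al(eax: int, value: int) -> int:
    """Writing AL changes only bits 7..0 of EAX; all other bits are preserved."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


regs = sub_registers(0x12345678)
print({name: hex(v) for name, v in regs.items()})
# {'EAX': '0x12345678', 'AX': '0x5678', 'AH': '0x56', 'AL': '0x78'}
print(hex(write_al(0x12345678, 0xFF)))  # 0x123456ff
```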
Slide 10
Segment Register
➢ The segment registers (CS, DS, SS, ES, FS, and GS) hold 16-
bit segment selectors.
➢ A segment selector is a special pointer that identifies a
segment in memory.
➢ To access a particular segment in memory, the segment
selector for that segment must be present in the appropriate
segment register.
➢ Each of the segment registers is associated with one of the
three types of storage: code, data, or stack.
Slide 11
➢ The DS, ES, FS, and GS registers point to four data segments.
The availability of four data segments permits efficient and
secure access to different types of data. To access additional data
segments, the application program must load segment selectors
for these segments into the corresponding segment register.
Slide 12
EFLAGS register
Slide 13
Slide 14
Memory Model
Slide 15
➢ In the flat memory model, memory appears as a continuous
address space. This linear address space is byte addressable,
with addresses running continuously from 0 to $2^{32} - 1$ (4 GB).
Slide 16
➢ With the segmented memory model, memory appears to a program as a group of
independent address spaces called segments. When using this model, code, data and
stack are typically contained in separate segments.
➢ To address a byte in a segment, the program must issue a logical address, which
consists of a segment selector and an offset.
➢ The segment selector identifies the segment to be accessed and the offset identifies a
byte in the address space of the segment.
➢ Programs can address up to 16,383 segments of different sizes and types, and
each segment can be as large as $2^{32}$ bytes (4 GB).
Slide 17
➢ The real-address mode memory model is the memory model
of the Intel 8086 processor. Individual segments can be up to 64
KB in size, and the maximum size of the linear address space
is $2^{20}$ bytes (1 MB). The linear address is obtained by multiplying the
value in the segment register by 16 and adding the offset to it.
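The real-mode calculation can be sketched in Python as follows. The function name is a hypothetical helper; the final mask to 20 bits models the 8086's 1 MB address space (later processors may behave differently when address line A20 is enabled). The last check shows that different segment:offset pairs can name the same byte.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Linear address = segment * 16 + offset (8086 real-address mode)."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    # Shifting left by 4 multiplies by 16; the 8086 has only 20 address
    # lines, so the sum wraps around at 2**20.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(real_mode_address(0x2550, 0x2400)))  # 0x27900
# Two different segment:offset pairs that refer to the same byte:
print(real_mode_address(0x1000, 0x0010) == real_mode_address(0x1001, 0x0000))  # True
```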
Slide 18
➢ Modern operating systems and applications use the flat (that
is, unsegmented) memory model: the code, data, and stack segment
registers are loaded with selectors for descriptors whose base
address is zero and whose limit covers the full 4 GB, so that all
memory references a program makes are to a single linear
address space.
Slide 19
Segmented Memory Model
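➢ In this mode, the values held in the segment registers are called selectors. Although the
segment registers are still 16 bits in size, the processor interprets them
differently. A selector consists of a 13-bit index into a descriptor table (bits 15–3), a
table indicator (TI) bit (bit 2) that selects the GDT (0) or the LDT (1), and a 2-bit
requested privilege level (RPL, bits 1–0).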
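A selector's three fields can be pulled apart with shifts and masks, as in the Python sketch below. The `Selector` type and the function names are hypothetical; the bit positions are the IA-32 selector layout. Selector 08h, used in the protected-mode `MOV AX, 08h` example later in these slides, decodes to entry 1 of the GDT with RPL 0.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int   # bits 15..3: entry number in the descriptor table
    ti: int      # bit 2: table indicator, 0 = GDT, 1 = LDT
    rpl: int     # bits 1..0: requested privilege level


def decode_selector(value: int) -> Selector:
    if not 0 <= value <= 0xFFFF:
        raise ValueError("a selector is a 16-bit value")
    return Selector(index=value >> 3, ti=(value >> 2) & 1, rpl=value & 0b11)


def descriptor_offset(sel: Selector) -> int:
    """Byte offset of the descriptor inside its table (each entry is 8 bytes)."""
    return sel.index * 8


print(decode_selector(0x08))  # Selector(index=1, ti=0, rpl=0)
print(decode_selector(0x2B))  # Selector(index=5, ti=0, rpl=3)
```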
Slide 20
Descriptor Tables
➢ Descriptor tables reside in system memory and are used by the processor to
perform address translation. Each entry in a descriptor table is 8 bytes long
and represents a single segment in memory.
➢ Each entry contains the 32-bit base address of the associated segment (a pointer to its
first byte) and a 20-bit limit value representing the size of the segment in memory.
Slide 21
➢ The individual fields are as follows:
➢ 1. Base: This 32-bit segment base address field points to the segment’s starting
location in the 4GB linear address space.
➢ 2. D/B: It is called the segment size (default operation size) bit. When the descriptor
entry describes a code segment, this bit specifies the default length of operands and
addresses. If the bit is set, the processor assumes a 32-bit segment; otherwise, a
16-bit segment is assumed.
➢ 3. DPL: The 2-bit descriptor privilege level field defines the privilege level of the
segment. It is used by the protection mechanism built into the processor to restrict
access to the segment.
➢ 4. G: The granularity bit controls the resolution of the segment limit field. When
this bit is clear, the limit is counted in units of one byte; when it is set, the limit is
counted in units of 4 KB.
Slide 22
➢ 5. Limit: The 20-bit segment limit field determines the size of the segment
in units of one byte or 4 KB (depending on the G bit), giving a maximum segment size
of 1 MB or 4 GB, respectively.
➢ 6. Type: The 4-bit segment type field determines the type of the segment, for
example, execute-only, execute-read, read-only, read-write, and so on.
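In the 8-byte descriptor these fields are not stored contiguously: the base and limit are split into pieces. The Python sketch below, assuming the standard IA-32 descriptor layout, reassembles them from a 64-bit integer. The function name and the dictionary keys are hypothetical; `0x00CF9A000000FFFF` is the widely used flat 4 GB code-segment descriptor.

```python
def decode_descriptor(raw: int) -> dict:
    """Decode a 64-bit IA-32 segment descriptor given as an integer."""
    # Limit: bits 15..0 plus bits 51..48 -> 20-bit value.
    limit = (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16)
    # Base: bits 39..16 plus bits 63..56 -> 32-bit value.
    base = ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24)
    access = (raw >> 40) & 0xFF      # access byte: P, DPL, S, Type
    g = (raw >> 55) & 1              # granularity bit
    d_b = (raw >> 54) & 1            # default operation size bit
    # Highest valid offset: with G set, the limit counts 4 KB units.
    max_offset = ((limit << 12) | 0xFFF) if g else limit
    return {
        "base": base,
        "limit": limit,
        "type": access & 0xF,
        "dpl": (access >> 5) & 0b11,
        "present": (access >> 7) & 1,
        "d_b": d_b,
        "g": g,
        "max_offset": max_offset,
    }


d = decode_descriptor(0x00CF9A000000FFFF)
print(hex(d["base"]), hex(d["max_offset"]), d["dpl"])  # 0x0 0xffffffff 0
```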
Slide 23
➢ Two types of descriptor tables are used by the processor when
working in the protected mode.
◼ The first one is known as the Global descriptor table
(GDT) and is used mainly for holding descriptor entries of
operating system segments.
◼ The second type is known as the Local descriptor table
(LDT). It contains entries of normal application segments.
➢ During initialization, the kernel creates a single GDT,
which is kept in memory until either the operating system
terminates or the processor is switched back to real mode.
Slide 24
➢ Whenever the user starts an application, the operating system creates a new LDT to
hold the descriptor entries, which represent the segments used by the new task.
➢ When looking up a specific descriptor entry, the addressing unit in the processor uses
the TI bit of the selector to decide which descriptor table should be used: the GDT or
the currently active LDT.
➢ The linear address and size of the GDT are stored in a special processor register
called GDTR. Similarly, the LDTR register holds the size and position of the currently
active LDT in memory.
Slide 25
➢ The logical address generated by
the CPU consists of two parts:
◼ a selector part and an offset
part.
➢ The segment selector points to a
segment descriptor, which
contains the base address of a
memory segment.
➢ The 32-bit offset from the logical
address is added to the segment’s
base address, generating a 32-bit
linear address.
➢ It has been illustrated here.
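The translation from a logical address to a linear address can be sketched in Python with a toy descriptor table. The table contents, the names `GDT`, `LDT`, and `logical_to_linear`, and the simple limit check are hypothetical simplifications: a real processor reads 8-byte descriptors from memory through GDTR/LDTR, also checks privilege and type, and raises a protection exception instead of a Python error.

```python
# Hypothetical descriptor tables: index -> (segment base, highest valid offset).
GDT = {
    1: (0x00000000, 0xFFFFFFFF),  # flat 4 GB segment
    2: (0x00400000, 0x0000FFFF),  # 64 KB segment starting at linear address 4 MB
}
LDT: dict = {}                    # no LDT entries in this toy example


def logical_to_linear(selector: int, offset: int) -> int:
    """Translate selector:offset into a 32-bit linear address."""
    index = selector >> 3                           # bits 15..3 pick the descriptor
    table = LDT if (selector >> 2) & 1 else GDT     # TI bit picks LDT or GDT
    if index not in table:                          # GDT entry 0 is the null descriptor
        raise LookupError(f"no descriptor for selector {selector:#06x}")
    base, max_offset = table[index]
    if offset > max_offset:                         # hardware would raise an exception
        raise ValueError("offset exceeds the segment limit")
    return (base + offset) & 0xFFFFFFFF             # linear = base + offset


print(hex(logical_to_linear(0x08, 0x12345678)))  # 0x12345678
print(hex(logical_to_linear(0x10, 0x1234)))      # 0x401234
```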
Slide 26
Addressing Mode
Slide 27
Register addressing mode
Slide 28
Immediate addressing mode
➢ In this mode, the source operand is a constant. After assembly, the operand comes
immediately after the opcode. This addressing mode can be used to load any of the
registers except the segment registers and the flag register.
➢ For example,
◼ MOV AX, 2550h ; move hexadecimal value 2550 into register AX.
➢ It may be noted that the segment registers can be loaded by transfer from general
purpose registers using the register addressing mode. The following are two examples
of it.
➢ Real-address mode:
◼ MOV AX, 2550h ; AX contains the segment address
◼ MOV DS, AX ; load DS segment register with AX
➢ Protected mode:
◼ MOV AX, 08h ; AX contains the segment selector
◼ MOV DS, AX ; load DS segment register with AX
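To see the immediate value sitting right after the opcode, the following Python sketch encodes `MOV r32, imm32`, whose opcode is B8h plus the register number, followed by the 4-byte immediate in little-endian order. It assumes a 32-bit code segment; `REG32` and `encode_mov_r32_imm32` are hypothetical names. In 16-bit code, `MOV AX, 2550h` from the slide encodes analogously as B8 50 25.

```python
# Three-bit register codes used in IA-32 instruction encodings.
REG32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3, "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}


def encode_mov_r32_imm32(reg: str, imm: int) -> bytes:
    """Encode MOV r32, imm32 (opcode B8+rd) for a 32-bit code segment."""
    opcode = 0xB8 + REG32[reg]                       # register number is added to the opcode
    immediate = (imm & 0xFFFFFFFF).to_bytes(4, "little")
    return bytes([opcode]) + immediate               # immediate follows the opcode


print(encode_mov_r32_imm32("EAX", 0x2550).hex(" "))  # b8 50 25 00 00
```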
Slide 29
Direct addressing mode
➢ In this case, the data is located in memory, and the address of the memory location
is encoded in the instruction itself, following the opcode. It may be noted that in immediate
addressing, the operand itself is provided with the instruction, whereas in direct addressing,
the address of the operand is found in the instruction.
➢ The following are two examples of direct addressing used in real and protected mode
respectively.
➢ Real-address mode:
◼ MOV DL, [2400h] ; moves content of memory location 16*DS + 2400h to DL
➢ Protected mode:
◼ MOV EAX, [12345678h] ; the segment selector in DS indexes into the descriptor table
to get the base of the data segment, to which 12345678h is added to obtain the
linear address; the content of this location is transferred to EAX
Slide 30
Register indirect addressing mode
➢ The address of the memory location where the operand resides is held in a register.
➢ In real-address mode, the registers SI, DI, and BX can be used for this purpose.
➢ In the protected mode, any of the registers EAX, EBX, ECX, EDX, ESI, or EDI can be
used.
➢ These register values are combined with the DS segment register (the default) to generate
the address of the operand. For example,
➢ Real-address mode:
◼ MOV AL, [BX] ; moves into AL the contents of the memory location pointed to by
DS:BX
➢ Protected mode:
◼ MOV AL, [EAX]; moves into AL the contents of the memory location pointed to
by DS:EAX
Slide 31
Based relative addressing mode
➢ In this case, in real-address mode, the base registers BX and BP, as well as a displacement value
are used to compute an effective address, which, combined with the segment register value, gives
the address of the operand.
➢ The default segment registers used for the calculation of physical address are DS for BX and SS
for BP.
➢ For Example,
◼ MOV CX, [BX + 10h]; moves 16-bit word at memory location DS:BX + 10h into CX
register
◼ MOV [BP + 22h], CX; moves contents of CX into memory location SS:BP + 22h
➢ In case of protected mode, registers EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP can be
used as base registers. The default segment registers used for the calculation of the address
are DS for EAX, EBX, ECX, EDX, ESI, and EDI, and SS for EBP and ESP.
➢ For example,
◼ MOV EAX, [EBX + 10h] ; moves 32-bit quantity at memory location DS:EBX + 10h into
EAX register
◼ MOV [EBP + 22h], EDX ; moves contents of EDX into memory location SS:EBP + 22h
Slide 32
Indexed relative addressing mode
➢ In real-address mode, index registers SI and DI, as well as a displacement
value are used to calculate effective address. The default segment register is
DS.
➢ For example,
◼ MOV CX, [SI+10h] ; moves 16-bit word at memory location DS:SI +
10h into CX register
➢ In case of protected mode, registers EAX, EBX, ECX, EDX, ESI, EDI, and EBP
can be used as index registers (ESP cannot serve as an index register). The default
segment registers used are DS for EAX, EBX, ECX, EDX, ESI, and EDI, and SS for EBP.
➢ For example,
◼ MOV EAX, [ESI+10h] ; moves 32-bit quantity at memory location DS:ESI +
10h into EAX register
Slide 33
Based indexed relative addressing mode
➢ It is the combination of based and indexed addressing modes.
➢ In this mode, one base register and one index register are used.
➢ For example,
➢ Real-address mode:
◼ MOV AL, [BX + SI + 8h] ; moves into AL contents of memory location pointed to
by DS:BX + SI + 8h
◼ MOV AL, [BP + SI + 8h] ; moves into AL contents of memory location pointed to
by SS:BP + SI + 8h
➢ Protected mode:
◼ MOV AL, [EAX + EBX + 10h] ; moves into AL the contents of the memory
location pointed to by DS:EAX + EBX + 10h
◼ MOV AL, [EBP + EBX + 10h] ; moves into AL the contents of the memory
location pointed to by SS:EBP + EBX + 10h
Slide 34
Segment overrides
➢ Each of the addressing modes has a default segment register associated with it.
However, it is possible to explicitly mention in the statement some other
segment register to be used to access the operands.
➢ It is called segment overrides.
➢ For example, the following instruction asks to use ES as the segment register,
though the default segment register is DS.
◼ MOV AL, ES:[SI + 8h] ; moves into AL the contents of the memory location
pointed to by ES:SI + 8h
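The real-address-mode rules from the last few slides (BX or BP as base, SI or DI as index, DS by default but SS when BP is the base, and an optional segment override) can be collected into one Python sketch. The function name and the register dictionary are hypothetical; the 16-bit wraparound of the effective address and the 20-bit wraparound of the physical address model the 8086.

```python
from typing import Optional


def real_mode_operand_address(regs: dict, base: Optional[str] = None,
                              index: Optional[str] = None, disp: int = 0,
                              override: Optional[str] = None) -> int:
    """Physical address of a 16-bit memory operand such as [BP + SI + 8h]."""
    if base not in (None, "BX", "BP") or index not in (None, "SI", "DI"):
        raise ValueError("16-bit addressing uses BX/BP as base and SI/DI as index")
    # Effective address: a 16-bit sum that wraps around inside the segment.
    ea = ((regs[base] if base else 0) + (regs[index] if index else 0) + disp) & 0xFFFF
    # Default segment: SS when BP is the base, otherwise DS, unless overridden.
    segment = override or ("SS" if base == "BP" else "DS")
    # Physical address = segment * 16 + EA, wrapped to 20 bits as on the 8086.
    return ((regs[segment] << 4) + ea) & 0xFFFFF


# Hypothetical register contents
regs = {"DS": 0x1000, "SS": 0x2000, "ES": 0x3000,
        "BX": 0x0100, "BP": 0x0200, "SI": 0x0010, "DI": 0x0020}
print(hex(real_mode_operand_address(regs, "BX", "SI", 0x8)))                 # 0x10118
print(hex(real_mode_operand_address(regs, "BP", "SI", 0x8)))                 # 0x20218
print(hex(real_mode_operand_address(regs, None, "SI", 0x8, override="ES")))  # 0x30018
```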
Slide 35
Scaling factors
➢ For 32-bit instructions, a scaling factor can be applied to the index register in the
indexed or based indexed addressing modes.
➢ The scaling factor is 1, 2, 4, or 8.
➢ For example,
◼ MOV EAX, [ESI + EAX*4] ; moves into EAX the 32-bit quantity at
DS:ESI + EAX*4
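➢ Thus in protected mode, the effective address can be viewed as
$\text{EA} = \text{base} + \text{index} \times \text{scale} + \text{displacement}$, where base is any
32-bit general purpose register, index is any 32-bit general purpose register except ESP,
scale $\in \{1, 2, 4, 8\}$, and displacement is an 8-bit or 32-bit constant; each component is optional.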
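The 32-bit effective-address formula, together with the default-segment rule from the based relative addressing slide, can be sketched in Python as follows. The function names and register values are hypothetical; the sketch computes only the offset within the segment, not the final linear address.

```python
from typing import Optional


def effective_address(regs: dict, base: Optional[str] = None,
                      index: Optional[str] = None, scale: int = 1,
                      disp: int = 0) -> int:
    """EA = base + index * scale + displacement, truncated to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    if index == "ESP":
        raise ValueError("ESP cannot be used as an index register")
    b = regs[base] if base else 0
    i = regs[index] if index else 0
    return (b + i * scale + disp) & 0xFFFFFFFF


def default_segment(base: Optional[str]) -> str:
    """SS when the base register is EBP or ESP, otherwise DS."""
    return "SS" if base in ("EBP", "ESP") else "DS"


# Hypothetical register contents
regs = {"EAX": 3, "EBX": 0x2000, "ESI": 0x1000, "EBP": 0x8000}
# MOV EAX, [ESI + EAX*4]
print(default_segment("ESI"), hex(effective_address(regs, "ESI", "EAX", 4)))  # DS 0x100c
```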
Slide 36
Instruction Set
➢ Most instructions take two operands. In that case, at most one of the operands can be a
memory location: the destination is a register or a memory location, and the source is a
register, a memory location (only when the destination is a register), or an immediate value.
➢ Most instructions are available with 8-, 16-, and 32-bit operands.
Slide 37
➢ The set of instructions can be classified into a number of
groups based on their functionality.
◼ Data movement
◼ Integer arithmetic
◼ Logical
◼ Floating point
◼ Control transfer
Slide 38
Data Movement
Slide 40
Integer Arithmetic
➢ ADD reg, r/m ; 2's complement addition
r/m, reg
reg, immed
r/m, immed
Slide 41
➢ MUL r/m ; unsigned multiplication, EDX || EAX ← EAX * r/m
; the 64-bit result is stored in the EDX:EAX pair
➢ IMUL r/m ; 2’s complement multiplication
; EDX || EAX ← EAX * r/m
reg, r/m ; reg ← reg * r/m
reg, immed ; reg ← reg * immed
➢ DIV r/m ; unsigned division, does EDX || EAX / r/m
; EAX ← quotient, EDX ← remainder
➢ IDIV r/m ; 2's complement division, does EDX || EAX / r/m
; EAX ← quotient, EDX ← remainder
➢ CMP reg, r/m ; sets EFLAGS based on
r/m, immed ; first operand - second operand
r/m8, immed8
r/m, immed8 ; sign extends immed8 before subtract
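The register conventions of MUL, DIV, IDIV, and CMP can be modelled in Python as below. All function names are hypothetical, and the sketch covers only the 32-bit forms. Python's own `//` rounds toward negative infinity, so `idiv32` adjusts the quotient to truncate toward zero, which is how IDIV behaves; its range check on the quotient is omitted. `cmp_zf_cf` shows just two of the flags CMP derives from first operand minus second operand.

```python
MASK32 = 0xFFFFFFFF


def mul32(eax: int, src: int) -> tuple:
    """MUL r/m32: unsigned EDX:EAX <- EAX * src. Returns (edx, eax)."""
    product = (eax & MASK32) * (src & MASK32)
    return product >> 32, product & MASK32          # high half, low half


def div32(edx: int, eax: int, src: int) -> tuple:
    """DIV r/m32: unsigned EDX:EAX / src. Returns (quotient, remainder)."""
    if (src & MASK32) == 0:
        raise ZeroDivisionError("divide error")
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, src & MASK32)
    if quotient > MASK32:                           # quotient must fit in EAX
        raise OverflowError("divide error: quotient too large for EAX")
    return quotient, remainder


def idiv32(dividend: int, divisor: int) -> tuple:
    """Signed division as IDIV does it: quotient truncated toward zero,
    remainder with the sign of the dividend (range check omitted)."""
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    return quotient, dividend - quotient * divisor


def cmp_zf_cf(first: int, second: int) -> dict:
    """ZF and CF as CMP sets them from first - second (unsigned view)."""
    diff = (first - second) & MASK32
    return {"ZF": int(diff == 0), "CF": int((first & MASK32) < (second & MASK32))}


print(idiv32(-7, 2))  # (-3, -1)
```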
Slide 42
Example
➢ Takes the doubleword at address EAX + 4, finds its
additive inverse, and places the additive inverse back at that
address:
◼ NEG DWORD PTR [EAX + 4]
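This read-modify-write can be sketched in Python with a `bytearray` standing in for memory. The function names are hypothetical. The carry-flag rule shown (CF cleared only when the operand is 0) is how NEG sets CF; the other flags are not modelled.

```python
def neg32(value: int) -> tuple:
    """NEG: two's-complement negation of a 32-bit value. Returns (result, CF)."""
    value &= 0xFFFFFFFF
    result = (-value) & 0xFFFFFFFF     # same as (~value + 1) & 0xFFFFFFFF
    carry = 0 if value == 0 else 1     # CF is cleared only when the operand is 0
    return result, carry


def neg_dword_at(memory: bytearray, eax: int) -> None:
    """Model NEG DWORD PTR [EAX + 4] on a little-endian byte array."""
    addr = eax + 4
    value = int.from_bytes(memory[addr:addr + 4], "little")   # read the doubleword
    result, _ = neg32(value)
    memory[addr:addr + 4] = result.to_bytes(4, "little")      # write it back


mem = bytearray(16)
mem[4:8] = (5).to_bytes(4, "little")
neg_dword_at(mem, 0)
print(hex(int.from_bytes(mem[4:8], "little")))  # 0xfffffffb
```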
Slide 43
Logical Instruction
➢ NOT r/m ; logical NOT
➢ AND reg, r/m ; logical AND
reg8, r/m8
r/m, reg
r/m8, reg8
r/m, immed
r/m8, immed8
➢ OR reg, r/m ; logical OR
reg8, r/m8
r/m, reg
r/m8, reg8
r/m, immed
r/m8, immed8
Slide 44
➢ XOR reg, r/m ; logical exclusive OR
reg8, r/m8
r/m, reg
r/m8, reg8
r/m, immed
r/m8, immed8
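The bitwise results and their effect on EFLAGS can be modelled in Python as below. The function names are hypothetical. AND, OR, and XOR always clear CF and OF and set ZF, SF, and PF from the result (AF is undefined and is not modelled), while NOT changes no flags. The usage line shows the common `XOR EAX, EAX` idiom for clearing a register.

```python
MASK32 = 0xFFFFFFFF


def logic32(op: str, a: int, b: int) -> tuple:
    """AND / OR / XOR on 32-bit values. Returns (result, flags)."""
    a, b = a & MASK32, b & MASK32
    result = {"AND": a & b, "OR": a | b, "XOR": a ^ b}[op]
    flags = {
        "CF": 0, "OF": 0,                                    # always cleared
        "ZF": int(result == 0),
        "SF": result >> 31,                                  # most significant bit
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),   # even parity of low byte
    }
    return result, flags


def not32(a: int) -> int:
    """NOT: one's complement; unlike AND/OR/XOR it leaves EFLAGS unchanged."""
    return ~a & MASK32


# XOR EAX, EAX: a common way to clear a register.
print(logic32("XOR", 0x1234, 0x1234))
# (0, {'CF': 0, 'OF': 0, 'ZF': 1, 'SF': 0, 'PF': 1})
```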
Slide 45
Control Instructions
Slide 46
Instruction Format
➢ The instructions in an Intel processor vary in size from one byte up to fifteen
bytes. However, all these instructions follow the six-part structure noted below.
◼ Prefix – 0 to 4 bytes
◼ Opcode – 1 to 2 bytes
◼ ModR/M – 0 or 1 byte
◼ SIB – 0 or 1 byte
◼ Displacement – 0, 1, 2, or 4 bytes
◼ Immediate – 0, 1, 2, or 4 bytes
Slide 47
➢ Prefixes. The optional prefixes modify the behavior of instructions in several ways. Each prefix
adds one byte to the instruction. An instruction can have one prefix from each of the four prefix
groups, for a maximum of four prefix bytes.
➢ Group 1: LOCK, REPNE/REPNZ, and REP/REPE/REPZ; the prefix bytes for these are F0H, F2H,
and F3H, respectively. The REP prefixes are used only with string instructions.
➢ Group 2: Segment overrides prefixes. The hexadecimal codes 2E, 36, 3E, 26, 64, and 65 are used
to override the default segment registers by CS, SS, DS, ES, FS, and GS respectively.
➢ Group 3: Operand-size override (16-bit vs. 32-bit); the prefix byte is 66H.
➢ Group 4: Address-size override (16-bit vs. 32-bit); the prefix byte is 67H.
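A disassembler's first step is to peel off these prefix bytes before reaching the opcode. The Python sketch below does this with a lookup table built from the four groups listed above; `PREFIX_GROUPS` and `split_prefixes` are hypothetical names, and the sketch does not enforce the one-prefix-per-group rule.

```python
# Prefix byte -> (group, meaning), following the four groups listed above.
PREFIX_GROUPS = {
    0xF0: (1, "LOCK"), 0xF2: (1, "REPNE/REPNZ"), 0xF3: (1, "REP/REPE/REPZ"),
    0x2E: (2, "CS override"), 0x36: (2, "SS override"), 0x3E: (2, "DS override"),
    0x26: (2, "ES override"), 0x64: (2, "FS override"), 0x65: (2, "GS override"),
    0x66: (3, "operand-size override"),
    0x67: (4, "address-size override"),
}


def split_prefixes(code: bytes) -> tuple:
    """Return (list of (group, meaning) prefixes, bytes starting at the opcode)."""
    prefixes = []
    pos = 0
    while pos < len(code) and code[pos] in PREFIX_GROUPS:
        prefixes.append(PREFIX_GROUPS[code[pos]])
        pos += 1
    return prefixes, code[pos:]


# F3 A4 is REP MOVSB: one Group 1 prefix, then the opcode A4h.
print(split_prefixes(bytes.fromhex("f3a4")))  # ([(1, 'REP/REPE/REPZ')], b'\xa4')
```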
Slide 48
➢ Opcode. The operation code, or opcode, comes after the prefix bytes (at most
four of them).
➢ The opcode field is one or two bytes. The code tells the processor which
instruction to execute. Apart from the operation, it also contains bit-fields
identifying the type and size of operands expected.
➢ For example, the NOT operation has an opcode 1111011w.
➢ Here, the ‘w’ bit determines whether the operand is a byte or a word.
➢ Similarly, the OR instruction has the format 000010dw. Here the ‘d’ bit
determines the direction of dataflow—that is, which operand is the source and
which one is the destination.
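The d and w bits can be read straight from the opcode byte. The Python sketch below interprets the four OR opcodes of the form 000010dw (08h to 0Bh); the function name and output strings are hypothetical. In 32-bit code, w = 1 selects full-size (16- or 32-bit, depending on the operand-size attribute) operands.

```python
def describe_or_opcode(opcode: int) -> str:
    """Interpret an OR opcode of the form 000010dw (08h-0Bh)."""
    if opcode >> 2 != 0b000010:
        raise ValueError("not an OR opcode of the form 000010dw")
    d, w = (opcode >> 1) & 1, opcode & 1
    size = "16/32-bit" if w else "8-bit"
    # d = 1: the register operand is the destination; d = 0: it is the source.
    form = "OR reg, r/m" if d else "OR r/m, reg"
    return f"{form} ({size} operands)"


for op in (0x08, 0x09, 0x0A, 0x0B):
    print(f"{op:02X}h -> {describe_or_opcode(op)}")
```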
Slide 49
➢ ModR/M: This byte is present only if the instruction requires it. If present, the ModR/M
byte comes just after the opcode. It tells the processor which registers or memory locations
to use as the instruction’s operands. The byte is divided into three fields: mod (bits 7–6),
reg2 (bits 5–3), and reg1 (bits 2–0).
➢ Both reg1 and reg2 fields take three-bit register codes, indicating which registers are to be
used as operands. By default, reg1 is the source operand and reg2 is the destination. However,
this may be overridden by some instructions like OR, using the direction bit there. If an
instruction requires only one operand, the unused reg2 field holds extra opcode bits.
➢ The mod field determines the meaning of the reg1 field.
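How mod changes the meaning of reg1 can be sketched in Python for 32-bit addressing. The reg1 and reg2 fields of these slides correspond to the fields Intel's manuals call r/m (bits 2–0) and reg (bits 5–3). `decode_modrm` is a hypothetical helper, and it names reg2 as a register even though, for one-operand instructions, those bits are really an opcode extension.

```python
REG32_NAMES = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def decode_modrm(byte: int) -> dict:
    """Decode a ModR/M byte under 32-bit addressing."""
    mod = byte >> 6
    reg2 = (byte >> 3) & 0b111   # register operand (or opcode extension)
    reg1 = byte & 0b111          # register or memory operand, interpreted via mod
    if mod == 0b11:
        operand = REG32_NAMES[reg1]               # both operands are registers
    elif reg1 == 0b100:
        operand = "memory via SIB byte"           # the ESP code means a SIB byte follows
    elif mod == 0b00 and reg1 == 0b101:
        operand = "[disp32]"                      # direct addressing, no base register
    else:
        disp = {0b00: "", 0b01: " + disp8", 0b10: " + disp32"}[mod]
        operand = f"[{REG32_NAMES[reg1]}{disp}]"
    return {"mod": mod, "reg2": REG32_NAMES[reg2], "reg1": operand}


# 09 C3 is OR EBX, EAX: mod = 11, reg2 = EAX (source), reg1 = EBX (destination).
print(decode_modrm(0xC3))  # {'mod': 3, 'reg2': 'EAX', 'reg1': 'EBX'}
```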
Slide 50
➢ SIB. In 32-bit addressing, for mod = 00, 01, or 10, when the reg1 field holds the code of
the ESP register (100), an additional byte, known as the SIB byte, follows the ModR/M byte.
➢ SIB stands for Scale-Index-Base.
➢ It provides a powerful 32-bit addressing mode that uses a combination of two
registers and a scaling factor, instead of reg1, to form the operand address.
➢ The SIB byte is divided into three fields: scale (bits 7–6), index (bits 5–3), and base (bits 2–0).
➢ In the SIB byte, both index and base are three-bit register codes, and scale is a two-bit
number. To compute the address, the processor uses the formula
$(\text{index} \times 2^{\text{scale}}) + \text{base}$, so the scaling factor $2^{\text{scale}}$ is 1, 2, 4, or 8.
Slide 51
➢ This value is used instead of reg1 field to access the memory.
➢ For example, the ModR/M and SIB bytes for a memory address such as “[EBX*4 +
ESI + displ]” are as follows.
◼ ModR/M.mod = 10 (that is, [reg1 + disp32])
◼ ModR/M.reg2 = the destination register
◼ ModR/M.reg1 = ESP (code 100, meaning a SIB byte follows)
◼ SIB.scale = 2 (scaling factor $2^2 = 4$)
◼ SIB.index = EBX
◼ SIB.base = ESI
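Putting these field values together, the following Python sketch packs the ModR/M and SIB bytes for an operand of the form `dest, [base + index*scale + disp32]`. It assumes the destination is EAX and the instruction is `MOV r32, r/m32` (opcode 8Bh). `REG_CODES`, `SCALE_BITS`, and `encode_sib_operand` are hypothetical names.

```python
REG_CODES = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3, "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}
SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}   # scaling factor -> 2-bit field


def encode_sib_operand(dest: str, base: str, index: str, scale: int, disp32: int) -> bytes:
    """ModR/M + SIB + disp32 bytes for `dest, [base + index*scale + disp32]`."""
    if index == "ESP":
        raise ValueError("ESP cannot be an index register")
    # mod = 10 (32-bit displacement), reg2 = destination, reg1 = 100 (SIB follows)
    modrm = (0b10 << 6) | (REG_CODES[dest] << 3) | 0b100
    # scale | index | base
    sib = (SCALE_BITS[scale] << 6) | (REG_CODES[index] << 3) | REG_CODES[base]
    return bytes([modrm, sib]) + (disp32 & 0xFFFFFFFF).to_bytes(4, "little")


# MOV EAX, [ESI + EBX*4 + 10h]  (opcode 8Bh = MOV r32, r/m32)
code = bytes([0x8B]) + encode_sib_operand("EAX", "ESI", "EBX", 4, 0x10)
print(code.hex(" "))  # 8b 84 9e 10 00 00 00
```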
Slide 52
➢ When mod = 01 or 10, a displacement is a part of the operand’s address.
Slide 53
➢ Immediate: If an instruction uses immediate value as an
operand, the immediate value is the last part of the instruction.
Slide 54
Assembler Design
Slide 55