Emu8086 Tutorial
numbering systems
part 3: variables
part 4: interrupts
part 8: procedures
this tutorial is intended for those who are not familiar with
assembler at all, or have a very distant idea about it. of course if
you have knowledge of some high level programming language
(java, basic, c/c++, pascal...) that may help you a lot.
but even if you are familiar with assembler, it is still a good idea to
look through this document in order to study emu8086 syntax.
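8086 CPU has 8 general purpose registers, each register has its own
name:
AX - the accumulator register (divided into AH / AL).
BX - the base address register (divided into BH / BL).
CX - the count register (divided into CH / CL).
DX - the data register (divided into DH / DL).
SI - source index register.
DI - destination index register.
BP - base pointer.
SP - stack pointer.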
because registers are located inside the cpu, they are much faster
than memory. accessing a memory location requires the use of a
system bus, so it takes much longer. accessing data in a register
takes almost no time. therefore, you should try to keep variables in
the registers. register sets are very small and most registers have
special purposes which limit their use as variables, but they are still
an excellent place to store temporary data of calculations.
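segment registers
CS - points at the segment containing the current program.
DS - generally points at the segment where variables are defined.
ES - extra segment register, its use is up to the programmer.
SS - points at the segment containing the stack.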
Memory Access
to access memory we can use these four registers: BX, SI, DI, BP.
combining these registers inside [ ] symbols, we can get different
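memory locations. these combinations are supported (addressing
modes), built from three columns:
column 1: BX or BP
column 2: SI or DI
column 3: a displacement (an 8-bit or 16-bit constant)
you can form all valid combinations by taking only one item from
each column or skipping the column by not taking anything from it.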
as you see, BX and BP never go together, and SI and DI don't go
together either. here are some examples of valid addressing
modes: [BX+5], [BX+SI], [DI+BX-4]
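The column rule can be checked mechanically. The Python sketch below is only a model for experimenting outside the emulator; `is_valid_mode` is a hypothetical helper, and displacements are ignored because any constant may be added to a valid combination.

```python
# Column rule: at most one of BX/BP and at most one of SI/DI.
BASE = {"BX", "BP"}
INDEX = {"SI", "DI"}


def is_valid_mode(regs):
    """Return True if the registers written inside [ ] form a valid 8086 addressing mode.

    A constant displacement may always be added, so only the registers are checked.
    """
    regs = [r.upper() for r in regs]
    if any(r not in BASE | INDEX for r in regs):
        return False  # only BX, BP, SI and DI may be used inside [ ]
    bases = [r for r in regs if r in BASE]
    indexes = [r for r in regs if r in INDEX]
    return len(bases) <= 1 and len(indexes) <= 1  # at most one item per column


print(is_valid_mode(["BX"]))        # [BX+5]    -> True
print(is_valid_mode(["BX", "SI"]))  # [BX+SI]   -> True
print(is_valid_mode(["DI", "BX"]))  # [DI+BX-4] -> True
print(is_valid_mode(["BX", "BP"]))  # [BX+BP]   -> False
print(is_valid_mode(["SI", "DI"]))  # [SI+DI]   -> False
```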
the value in a segment register (CS, DS, SS, ES) is called a segment,
and the value in a general purpose register (BX, SI, DI, BP) is called
an offset.
When DS contains the value 1234h and SI contains the value 7890h, the
address can also be written as 1234:7890. The physical address will be
1234h * 10h + 7890h = 19BD0h.
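multiplying a hexadecimal number by 10h appends a zero to it, and
since 10h = 16 this multiplies the value by 16, for example:
7h = 7
70h = 112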
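The segment:offset calculation is easy to try in Python. This sketch assumes the usual 8086 behaviour of keeping only 20 address bits, so an address above FFFFFh wraps around to the start of memory; `physical_address` is a hypothetical name.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit 8086 address."""
    # segment * 10h shifts the segment one hex digit to the left
    return (segment * 0x10 + offset) & 0xFFFFF  # keep 20 bits, as the 8086 does


print(hex(physical_address(0x1234, 0x7890)))  # 0x19bd0
print(0x7, 0x70)                              # 7 112
```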
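to tell the compiler the data type, these prefixes should be used:
byte ptr - for byte.
word ptr - for word (two bytes).
for example:
byte ptr [BX] ; byte access.
or
word ptr [BX] ; word access.
assembler supports shorter prefixes as well:
b. - for byte ptr
w. - for word ptr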
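MOV instruction
copies the second operand (source) to the first operand (destination).
the source operand can be an immediate value, a general-purpose register
or a memory location. the destination can be a general-purpose register
or a memory location. both operands must be the same size, which can be
a byte or a word. these types of operands are supported:
MOV REG, memory
MOV memory, REG
MOV REG, REG
MOV memory, immediate
MOV REG, immediate
REG: AX, BX, CX, DX, AH, AL, BL, BH, CH, CL, DH, DL, DI, SI, BP, SP.
memory: [BX], [BX+SI+7], variable, etc...
immediate: 5, -24, 3Fh, 10001101b, etc...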
you can copy & paste the above program to the code editor, and
press [Compile and Emulate] button (or press F5 key on your
keyboard).
1. select the above text with the mouse: click before the text and
drag down until everything is selected.
as you may guess, ";" is used for comments: anything after the ";"
symbol is ignored by the compiler.
Variables
A variable is a memory location. For a programmer it is much easier
to have some value kept in a variable named "var1" than at the
address 5A73:235B, especially when you have 10 or more
variables.
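Our compiler supports two types of variables: BYTE and WORD.
name DB value
name DW value
DB stands for Define Byte.
DW stands for Define Word.
name - can be any letter or digit combination, though it should start with
a letter. It's possible to declare unnamed variables by not specifying the
name (this variable will have an address but no name).
value - can be any numeric value in any supported numbering system
(hexadecimal, binary, or decimal), or "?" for variables that are not
initialized.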
For example:
ORG 100h
MOV AL, var1
MOV BX, var2
RET ; stops the program.
var1 DB 7
var2 DW 1234h
Copy the above code to the source editor, and press the F5 key to
compile it and load it in the emulator. The emulator window shows the
disassembled program.
As you see, this looks a lot like our example, except that variables
are replaced with actual memory locations. When the compiler makes
machine code, it automatically replaces all variable names with
their offsets. By default, the segment is taken from the DS register
(when a COM file is loaded, DS is set to the same value as the CS
register, the code segment).
You can see that there are some other instructions after
the RET instruction. This happens because the disassembler has no idea
where the data starts: it just processes the values in memory
and interprets them as valid 8086 instructions (we will learn
them later).
You can even write the same program using DB directive only:
ORG 100h
DB 0A0h
DB 08h
DB 01h
DB 8Bh
DB 1Eh
DB 09h
DB 01h
DB 0C3h
DB 7
DB 34h
DB 12h
Copy the above code to the source editor, and press F5 key to
compile and load it in the emulator. You should get the same
disassembled code, and the same functionality!
As you may guess, the compiler just converts the program source to
a set of bytes; this set is called machine code. The processor
understands the machine code and executes it.
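The DB-only program can be read back the same way the disassembler does. This Python sketch assumes the bytes are loaded at offset 100h (ORG 100h); the instruction names in the comments are a reading of the standard 8086 encoding, and `word_at` is a hypothetical helper. It also shows that the word 1234h is stored low byte first (34h, 12h).

```python
LOAD_OFFSET = 0x100  # ORG 100h: the first byte is at offset 100h

program = bytes([
    0xA0, 0x08, 0x01,        # MOV AL, [0108h]
    0x8B, 0x1E, 0x09, 0x01,  # MOV BX, [0109h]
    0xC3,                    # RET
    0x07,                    # var1 DB 7      (offset 108h)
    0x34, 0x12,              # var2 DW 1234h  (offset 109h, low byte first)
])


def word_at(data, offset):
    """Read a 16-bit little-endian word stored at `offset`."""
    i = offset - LOAD_OFFSET
    return data[i] | (data[i + 1] << 8)


var1_offset = word_at(program, 0x101)  # operand bytes of MOV AL, [...]
var2_offset = word_at(program, 0x105)  # operand bytes of MOV BX, [...]
print(hex(var1_offset), program[var1_offset - LOAD_OFFSET])  # 0x108 7
print(hex(var2_offset), hex(word_at(program, var2_offset)))  # 0x109 0x1234
```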
Arrays
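Arrays can be seen as chains of variables. A text string is an
example of a byte array: each character is represented as an ASCII
code value (0..255).
Here are some array definition examples:
a DB 48h, 65h, 6Ch, 6Ch, 6Fh, 00h
b DB 'Hello', 0
b is an exact copy of the a array: when the compiler sees a string
inside quotes, it automatically converts it to a set of bytes.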
You can access the value of any element in an array using square
brackets, for example:
MOV AL, a[3]
You can also use any of the memory index registers BX, SI, DI,
BP, for example:
MOV SI, 3
MOV AL, a[SI]
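To see the array bytes and the index arithmetic outside the emulator, here is a Python model, assuming an array declared as `a DB 48h, 65h, 6Ch, 6Ch, 6Fh, 00h` (the ASCII codes of "Hello" followed by a zero byte). It is an illustration only; Python indexing stands in for `a[3]` and `a[SI]`.

```python
# a DB 48h, 65h, 6Ch, 6Ch, 6Fh, 00h
a = bytes([0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x00])
# b DB 'Hello', 0 : a quoted string becomes one byte per character
b = b"Hello" + bytes([0])

print(a == b)      # True: both declarations produce the same bytes
print(hex(a[3]))   # 0x6c, the byte that MOV AL, a[3] loads
si = 3
print(chr(a[si]))  # l, the same element reached through MOV SI, 3 / MOV AL, a[SI]
```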
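If you need to declare a large array you can use the DUP operator.
The syntax for DUP:
number DUP ( value(s) )
number - number of duplicates to make (any constant value).
value - expression that DUP will duplicate.
for example:
c DB 5 DUP(9)
is an alternative way of declaring:
c DB 9, 9, 9, 9, 9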
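DUP simply repeats its value list. A hypothetical Python `dup` helper shows the expansion; the second call assumes DUP also accepts a list of values, as in `d DB 5 DUP(1, 2)`.

```python
def dup(count, *values):
    """Expand `count DUP(values)`: repeat the whole value list `count` times."""
    return list(values) * count


print(dup(5, 9))     # [9, 9, 9, 9, 9]                  c DB 5 DUP(9)
print(dup(5, 1, 2))  # [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]   d DB 5 DUP(1, 2)
```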
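LEA (Load Effective Address) and the OFFSET operator can both be used
to get the offset address of a variable. To tell the compiler the data
type when memory is accessed through a register, these prefixes should
be used:
BYTE PTR - for byte.
WORD PTR - for word (two bytes).
For example:
BYTE PTR [BX] ; byte access.
or
WORD PTR [BX] ; word access.
assembler supports shorter prefixes as well:
b. - for BYTE PTR
w. - for WORD PTR
in certain cases the assembler can calculate the data type automatically.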
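ORG 100h
LEA BX, VAR1           ; get the address of VAR1 in BX.
MOV BYTE PTR [BX], 44h ; modify the contents of VAR1.
MOV AL, VAR1           ; check the value of VAR1 by moving it to AL.
RET
VAR1 DB 22h
END
The same program using OFFSET instead of LEA:
ORG 100h
MOV BX, OFFSET VAR1    ; get the address of VAR1 in BX.
MOV BYTE PTR [BX], 44h ; modify the contents of VAR1.
MOV AL, VAR1           ; check the value of VAR1 by moving it to AL.
RET
VAR1 DB 22h
END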
These lines:
LEA BX, VAR1
MOV BX, OFFSET VAR1
are even compiled into the same machine code: MOV BX, num
where num is the 16-bit offset of the variable.
Please note that only these registers can be used inside square
brackets (as memory pointers): BX, SI, DI, BP!
(see previous part of the tutorial).
Constants
Constants are just like variables, but they exist only until your
program is compiled (assembled). After a constant is defined, its
value cannot be changed. To define constants, the EQU directive is
used:
name EQU < any expression >
For example:
k EQU 5
MOV AX, k
The above example is functionally identical to:
MOV AX, 5
Interrupts
Interrupts can be seen as a number of functions. These functions
make programming much easier: instead of writing code to
print a character, you can simply call the interrupt and it will do
everything for you. There are also interrupt functions that work with
the disk drive and other hardware. We call such functions software
interrupts.
To make a software interrupt, use the INT instruction. Its syntax is
very simple:
INT value
where value is a number between 0 and 255 (or 0 to 0FFh).
Copy & paste the above program to the source code editor, and
press [Compile and Emulate] button. Run it!
8086 assembler tutorial for beginners (part 5)
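To use any of the functions in emu8086.inc, you should have the
following line at the beginning of your source file:
include 'emu8086.inc'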
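To use any of the above macros, simply type its name somewhere in
your code, followed by its parameters if it requires any, for example:
include 'emu8086.inc'
ORG 100h
GOTOXY 10, 5      ; macro: move the cursor to column 10, row 5.
CALL PTHIS        ; print the null-terminated string that follows the CALL.
DB 'Hello World!', 0
RET               ; return to operating system.
DEFINE_PTHIS      ; declare the PTHIS procedure used above.
END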
To use any of the above procedures, you should first declare the
function at the bottom of your file (but before the END directive),
and then use the CALL instruction followed by the procedure name. For
example:
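include 'emu8086.inc'
ORG 100h
CALL scan_num        ; read a number from the keyboard into CX.
MOV AX, CX           ; print_num prints the number in AX.
CALL pthis           ; print the string that follows the CALL.
DB 13, 10, 'You have entered: ', 0
CALL print_num
RET                  ; return to operating system.
DEFINE_SCAN_NUM
DEFINE_PRINT_STRING
DEFINE_PRINT_NUM
DEFINE_PRINT_NUM_UNS ; required for print_num.
DEFINE_PTHIS
END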
First, the compiler processes the declarations (these are just regular
macros that are expanded to procedures). When the compiler gets
to a CALL instruction, it replaces the procedure name with the address
of the code where the procedure is declared. When the CALL instruction
is executed, control is transferred to the procedure. This is quite useful:
even if you call the same procedure 100 times in your code,
you will still have a relatively small executable. Seems
complicated, doesn't it? That's OK; with time you will learn more.
For now, you only need to understand the basic principle.
8086 assembler tutorial for beginners (part 6)
The flags register is 16 bits wide. Each bit is called a flag and can
take a value of 1 or 0 (the 8086 uses 9 of these bits as flags).
Zero Flag (ZF) - set to 1 when the result is zero. For a non-zero
result this flag is set to 0.
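Besides ZF, an addition also updates the Carry (CF), Sign (SF) and Overflow (OF) flags that the conditional jumps test later. This Python model of an 8-bit ADD is a sketch of those four rules, not an emulator; `add8` is a hypothetical name.

```python
def add8(a, b):
    """Add two bytes the way ADD does and return (result, flags)."""
    total = a + b
    result = total & 0xFF  # only the low 8 bits fit in the destination
    flags = {
        "CF": int(total > 0xFF),  # carry out of bit 7 (unsigned overflow)
        "ZF": int(result == 0),   # set to 1 when the result is zero
        "SF": result >> 7,        # copy of the result's sign bit
        # signed overflow: both operands have the same sign, the result has the other
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags


print(add8(2, 3))     # (5, {'CF': 0, 'ZF': 0, 'SF': 0, 'OF': 0})
print(add8(0xFF, 1))  # (0, {'CF': 1, 'ZF': 1, 'SF': 0, 'OF': 0})
print(add8(0x7F, 1))  # (128, {'CF': 0, 'ZF': 0, 'SF': 1, 'OF': 1})
```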
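ADD, SUB, CMP, AND, TEST, OR and XOR support these types of operands:
REG, memory
memory, REG
REG, REG
memory, immediate
REG, immediate
REG: AX, BX, CX, DX, AH, AL, BL, BH, CH, CL, DH, DL, DI, SI, BP, SP.
memory: [BX], [BX+SI+7], variable, etc...
immediate: 5, -24, 3Fh, 10001101b, etc...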
1 AND 1 = 1
1 AND 0 = 0
0 AND 1 = 0
0 AND 0 = 0
As you see we get 1 only when both bits are 1.
1 OR 1 = 1
1 OR 0 = 1
0 OR 1 = 1
0 OR 0 = 0
As you see we get 1 every time at least one of the bits
is 1.
1 XOR 1 = 0
1 XOR 0 = 1
0 XOR 1 = 1
0 XOR 0 = 0
As you see we get 1 every time the bits are different from
each other.
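The same rules apply bit by bit when AND, OR and XOR work on whole bytes or words. A small Python sketch (the `truth_table` helper is hypothetical) reproduces the three tables in the same row order and shows one byte-wide example for each operator.

```python
OPS = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
}


def truth_table(op):
    """Return [(x, y, result), ...] in the row order 1-1, 1-0, 0-1, 0-0."""
    return [(x, y, OPS[op](x, y)) for x in (1, 0) for y in (1, 0)]


for name in OPS:
    for x, y, r in truth_table(name):
        print(x, name, y, "=", r)

# The instructions apply the rule to every bit of the operands at once:
print(format(0b1100_1010 & 0b1111_0000, "08b"))  # 11000000
print(format(0b1100_1010 | 0b0000_1111, "08b"))  # 11001111
print(format(0b1100_1010 ^ 0b1111_1111, "08b"))  # 00110101
```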
MUL, IMUL, DIV and IDIV support these types of operands:
REG
memory
REG: AX, BX, CX, DX, AH, AL, BL, BH, CH, CL, DH, DL, DI, SI, BP, SP.
memory: [BX], [BX+SI+7], variable, etc...
INC, DEC, NOT and NEG support these types of operands:
REG
memory
REG: AX, BX, CX, DX, AH, AL, BL, BH, CH, CL, DH, DL, DI, SI, BP, SP.
memory: [BX], [BX+SI+7], variable, etc...
unconditional jumps
the basic instruction that transfers control to another point in the
program is JMP:
JMP label
To declare a label in your program, just type its name and add
":" to the end. A label can be any character combination, but it
cannot start with a number. For example, here are 3 legal label
definitions:
label1:
label2:
a:
A label can be declared on a separate line or before any other
instruction, for example:
x1:
MOV AX, 1
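org 100h
mov ax, 5      ; set ax to 5.
mov bx, 2      ; set bx to 2.
jmp calc       ; go to 'calc'.
back: jmp stop ; go to 'stop'.
calc:
add ax, bx     ; add bx to ax.
jmp back       ; go 'back'.
stop:
ret            ; return to operating system.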
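Jump instructions that test a single flag:
Instruction      Description                                   Condition   Opposite Instruction
JZ , JE          Jump if Zero (Equal).                         ZF = 1      JNZ, JNE
JC , JB, JNAE    Jump if Carry (Below, Not Above Equal).       CF = 1      JNC, JNB, JAE
JS               Jump if Sign.                                 SF = 1      JNS
JO               Jump if Overflow.                             OF = 1      JNO
JPE, JP          Jump if Parity Even.                          PF = 1      JPO
JNZ , JNE        Jump if Not Zero (Not Equal).                 ZF = 0      JZ, JE
JNC , JNB, JAE   Jump if Not Carry (Not Below, Above Equal).   CF = 0      JC, JB, JNAE
JNS              Jump if Not Sign.                             SF = 0      JS
JNO              Jump if Not Overflow.                         OF = 0      JO
JPO, JNP         Jump if Parity Odd (No Parity).               PF = 0      JPE, JP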
as you may notice, some instructions do the same thing: they are
different names for the same instruction, for example JNC, JNB and
JAE all jump when CF = 0. if you emulate the code below you will see
that all three are assembled into JNB. the operation code (opcode) of
this instruction is 73h; it has a fixed length of two bytes, and the
second byte is the number of bytes to add to the IP register if the
condition is true. because the instruction has only 1 byte to keep the
offset, it can pass control at most 128 bytes back or 127 bytes
forward; this value is always signed.
org 100h
jnc a
jnb a
jae a
mov ax, 4
a: mov ax, 5
ret
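To see where the two bytes come from, this Python sketch encodes and decodes a short conditional jump. It assumes the code above is assembled at org 100h and that `mov ax, 4` takes 3 bytes, which places label a at 109h; the function names are hypothetical.

```python
def encode_short_jump(opcode, address, target):
    """Encode a two-byte conditional jump placed at `address` that jumps to `target`.

    The second byte is a signed displacement added to IP, which already points
    at the next instruction (address + 2) when the jump executes.
    """
    displacement = target - (address + 2)
    if not -128 <= displacement <= 127:
        raise ValueError("target is out of range for a conditional jump")
    return bytes([opcode, displacement & 0xFF])  # two's complement byte


def decode_target(address, displacement_byte):
    """Inverse of encode_short_jump: treat the second byte as signed."""
    signed = displacement_byte - 256 if displacement_byte >= 128 else displacement_byte
    return (address + 2 + signed) & 0xFFFF


JNB = 0x73  # opcode shared by JNC, JNB and JAE
# Assumed layout with org 100h: jnc at 100h, jnb at 102h, jae at 104h,
# mov ax, 4 (3 bytes) at 106h, so label a is at 109h.
for address in (0x100, 0x102, 0x104):
    print(encode_short_jump(JNB, address, 0x109).hex())
# prints 7307, 7305 and 7303 on separate lines
```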
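Jump instructions for signed numbers:
Instruction   Description                               Condition            Opposite Instruction
JE , JZ       Jump if Equal (=).                        ZF = 1               JNE, JNZ
              Jump if Zero.
JNE , JNZ     Jump if Not Equal (<>).                   ZF = 0               JE, JZ
              Jump if Not Zero.
JG , JNLE     Jump if Greater (>).                      ZF = 0 and SF = OF   JNG, JLE
              Jump if Not Less or Equal (not <=).
JL , JNGE     Jump if Less (<).                         SF <> OF             JNL, JGE
              Jump if Not Greater or Equal (not >=).
JGE , JNL     Jump if Greater or Equal (>=).            SF = OF              JNGE, JL
              Jump if Not Less (not <).
JLE , JNG     Jump if Less or Equal (<=).               ZF = 1 or SF <> OF   JNLE, JG
              Jump if Not Greater (not >).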
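Jump instructions for unsigned numbers:
Instruction      Description                               Condition            Opposite Instruction
JE , JZ          Jump if Equal (=).                        ZF = 1               JNE, JNZ
                 Jump if Zero.
JNE , JNZ        Jump if Not Equal (<>).                   ZF = 0               JE, JZ
                 Jump if Not Zero.
JA , JNBE        Jump if Above (>).                        CF = 0 and ZF = 0    JNA, JBE
                 Jump if Not Below or Equal (not <=).
JB , JNAE, JC    Jump if Below (<).                        CF = 1               JNB, JAE, JNC
                 Jump if Not Above or Equal (not >=).
                 Jump if Carry.
JAE , JNB, JNC   Jump if Above or Equal (>=).              CF = 0               JNAE, JB
                 Jump if Not Below (not <).
                 Jump if Not Carry.
JBE , JNA        Jump if Below or Equal (<=).              CF = 1 or ZF = 1     JNBE, JA
                 Jump if Not Above (not >).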
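generally, when it is required to compare numeric values, the CMP
instruction is used (it does the same as the SUB (subtract)
instruction, but does not keep the result; it only affects the flags).
for example, it's required to compare 5 and 2,
5-2=3
the result is not zero (Zero Flag is set to 0).
Another example:
it's required to compare 7 and 7,
7-7=0
the result is zero! (Zero Flag is set to 1 and JZ or JE will do the
jump).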
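The flag rules behind CMP and the conditional jumps can be modelled in a few lines of Python. This is a sketch for 8-bit operands only; `cmp8` and the jump helpers are hypothetical names that mirror the conditions JE, JG, JL, JA and JB test.

```python
def cmp8(a, b):
    """Flags set by CMP a, b on two bytes (CMP subtracts like SUB, then discards the result)."""
    result = (a - b) & 0xFF
    return {
        "ZF": int(result == 0),
        "CF": int(a < b),   # borrow: a is below b as unsigned numbers
        "SF": result >> 7,  # sign bit of the result
        # signed overflow: a and b have different signs and the result's sign differs from a
        "OF": int(((a ^ b) & (a ^ result) & 0x80) != 0),
    }


def je(f):
    return f["ZF"] == 1


def jg(f):  # signed greater
    return f["ZF"] == 0 and f["SF"] == f["OF"]


def jl(f):  # signed less
    return f["SF"] != f["OF"]


def ja(f):  # unsigned above
    return f["CF"] == 0 and f["ZF"] == 0


def jb(f):  # unsigned below
    return f["CF"] == 1


print(je(cmp8(7, 7)))  # True: 7 - 7 = 0 sets ZF
print(je(cmp8(5, 2)))  # False: 5 - 2 = 3 clears ZF
f = cmp8(0xFF, 0x01)   # FFh is -1 when signed, 255 when unsigned
print(jl(f), ja(f))    # True True
```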
try the above example with different numbers for AL and BL:
open the flags by clicking on the flags button, use single step and
see what happens. you can use the F5 hotkey to recompile and reload
the program into the emulator.
loops
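instruction   operation and jump condition                                        opposite instruction
LOOP          decrease cx, jump to label if cx not zero.                          DEC CX and JCXZ
LOOPE         decrease cx, jump to label if cx not zero and equal (zf = 1).       LOOPNE
LOOPNE        decrease cx, jump to label if cx not zero and not equal (zf = 0).   LOOPE
LOOPNZ        decrease cx, jump to label if cx not zero and zf = 0.               LOOPZ
LOOPZ         decrease cx, jump to label if cx not zero and zf = 1.               LOOPNZ
JCXZ          jump to label if cx is zero.                                        OR CX, CX and JNZ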
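Because LOOP decrements CX before testing it, a loop that starts with CX = 0 runs 65536 times, not zero times, which is one reason to check CX with JCXZ first. The Python model below (with a hypothetical `loop_iterations` helper) counts the iterations.

```python
def loop_iterations(cx):
    """Count how many times a LOOP body runs when CX starts at `cx`.

    LOOP first decrements the 16-bit CX, then jumps back if CX is not zero.
    """
    count = 0
    while True:
        count += 1              # the loop body runs
        cx = (cx - 1) & 0xFFFF  # LOOP: CX = CX - 1 (wraps from 0 to FFFFh)
        if cx == 0:             # ...and falls through when CX reaches zero
            return count


print(loop_iterations(5))  # 5
print(loop_iterations(0))  # 65536
```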
All conditional jumps have one big limitation,
unlike JMP instruction they can only jump 127 bytes forward
and 128 bytes backward (note that most instructions are
assembled into 3 or more bytes).
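we can easily avoid this limitation with a simple trick:
1. take the opposite conditional jump instruction from the table
above and make it jump to label_x.
2. use the JMP instruction to jump to the desired location.
3. define label_x: just after the JMP instruction.
label_x: - can be any valid label name, but there must not be
two or more labels with the same name.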
here's an example:
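include "emu8086.inc"
org 100h
mov al, 5
mov bl, 5
cmp al, bl     ; compare al - bl.
; je equal     ; a short jump cannot reach 'equal' over 256 bytes,
jne not_equal  ; so jump to not_equal if al <> bl (zf = 0),
jmp equal      ; and use JMP to reach 'equal' otherwise.
not_equal:
add bl, al
sub al, 10
xor al, bl
jmp skip_data
db 256 dup(0)  ; 256 bytes
skip_data:
putc 'n'       ; if it gets here, then al <> bl,
jmp stop       ; so print 'n', and jump to stop.
equal:         ; if it gets here,
putc 'y'       ; then al = bl, so print 'y'.
stop:
ret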
org 100h
ret
8086 assembler tutorial for beginners (part 8)
Procedures
A procedure is a part of code that can be called from your program in
order to perform a specific task. Procedures make a program more
structured and easier to understand. Generally, a procedure returns to
the same point from where it was called.
The syntax for a procedure declaration:
name PROC
; here goes the code of the procedure ...
RET
name ENDP
name - is the procedure name; the same name should be at the top
and the bottom, this is used to check that procedures are closed correctly.
PROC and ENDP are compiler directives, so they are not assembled
into any real machine code. The compiler just remembers the address
of the procedure.
Here is an example:
ORG 100h
CALL m1
MOV AX, 2
RET ; return to operating system.
m1 PROC
MOV BX, 5
RET ; return to caller.
m1 ENDP
END
The above example calls procedure m1, does MOV BX, 5, and
returns to the next instruction after CALL: MOV AX, 2.
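There are several ways to pass parameters to a procedure; the easiest
is to use registers. Here is another example of a procedure that
receives two parameters in the AL and BL registers, multiplies them
and returns the result in the AX register:
ORG 100h
MOV AL, 1
MOV BL, 2
CALL m2
CALL m2
CALL m2
CALL m2
RET ; return to operating system.
m2 PROC
MUL BL ; AX = AL * BL.
RET ; return to caller.
m2 ENDP
END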
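With AL starting at 1 and BL at 2, every call doubles AL, so after four calls AX holds 2 to the power of 4, that is 16 (10h). A quick Python trace of the four calls, assuming unsigned 8-bit MUL semantics (AX = AL * BL); `mul_bl` is a hypothetical name.

```python
def mul_bl(al, bl):
    """MUL BL with byte operands: AX = AL * BL (unsigned)."""
    return al * bl  # 255 * 255 = 65025 still fits in the 16-bit AX


al, bl = 1, 2       # MOV AL, 1 / MOV BL, 2
for _ in range(4):  # CALL m2 four times
    ax = mul_bl(al, bl)
    al = ax & 0xFF  # AL is the low byte of AX, so the next call sees the new value
print(ax, hex(ax))  # 16 0x10
```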
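ORG 100h
LEA SI, msg      ; load the address of msg to SI.
CALL print_me
RET              ; return to operating system.
; ==========================================================
; this procedure prints a string, the string should be null
; terminated (have zero in the end),
; the string address should be in SI register:
print_me PROC
next_char:
CMP b.[SI], 0    ; check for zero to stop.
JE stop
MOV AL, [SI]     ; get the next ASCII char.
MOV AH, 0Eh      ; teletype function number.
INT 10h          ; use the interrupt to print the char in AL.
ADD SI, 1        ; advance the index of the string array.
JMP next_char    ; go back, and type another char.
stop:
RET              ; return to caller.
print_me ENDP
; ==========================================================
msg DB 'Hello World!', 0 ; null terminated string.
END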
"b." - the prefix before [SI] means that we need to compare bytes, not
words. When you need to compare words, add the "w." prefix instead.
When one of the compared operands is a register, the prefix is not
required because the compiler knows the size of each register.
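The loop in print_me stops at the first zero byte. A Python sketch of the same scan, where a bytes object stands in for memory and an integer index stands in for SI (both modelling choices, not emu8086 APIs); `print_me` here is only a model of the procedure.

```python
def print_me(memory, si):
    """Collect bytes starting at index `si` until a zero byte is found."""
    chars = []
    while memory[si] != 0:             # CMP b.[SI], 0 / JE stop (a byte compare)
        chars.append(chr(memory[si]))  # the character that gets printed
        si += 1                        # move to the next byte of the string
    return "".join(chars)


memory = b"Hello World!\x00"
print(print_me(memory, 0))  # Hello World!
```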
8086 assembler tutorial for beginners (part 9)
The Stack
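The stack is an area of memory for keeping temporary data. The stack is
used by the CALL instruction to keep the return address for a
procedure; the RET instruction gets this value from the stack and
returns to that offset. Quite the same thing happens
when the INT instruction calls an interrupt: it stores the flags
register, the code segment and the offset in the stack. The IRET
instruction is used to return from an interrupt call.
We can also use the stack to keep any other data. There are two
instructions that work with the stack:
PUSH - stores a 16 bit value in the stack.
POP - gets a 16 bit value from the stack.
Syntax for the PUSH instruction: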
PUSH REG
PUSH SREG
PUSH memory
PUSH immediate
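REG: AX, BX, CX, DX, DI, SI, BP, SP.
SREG: DS, ES, SS, CS.
memory: [BX], [BX+SI+7], 16 bit variable, etc...
immediate: 5, -24, 3Fh, 10001101b, etc...
Syntax for the POP instruction: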
POP REG
POP SREG
POP memory
REG: AX, BX, CX, DX, DI, SI, BP, SP.
SREG: DS, ES, SS (except CS).
memory: [BX], [BX+SI+7], 16 bit variable, etc...
Notes:
PUSH and POP work with 16 bit values only!
PUSH immediate works only on 80186 CPU and later!
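Here is an example that saves a value in the stack and restores it:
ORG 100h
MOV AX, 1234h
PUSH AX       ; store the value of AX in the stack.
MOV AX, 5678h ; modify the AX value.
POP AX        ; restore the original value of AX.
RET
END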
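Another use of the stack is exchanging values:
ORG 100h
MOV AX, 1212h ; store 1212h in AX.
MOV BX, 3434h ; store 3434h in BX.
PUSH AX       ; store the value of AX in the stack.
PUSH BX       ; store the value of BX in the stack.
POP AX        ; set AX to the original value of BX.
POP BX        ; set BX to the original value of AX.
RET
END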
The exchange happens because stack uses LIFO (Last In First Out)
algorithm, so when we push 1212h and then 3434h, on pop we
will first get 3434h and only after it 1212h.
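The stack memory area is set by the SS (Stack Segment) register and
the SP (Stack Pointer) register. PUSH and POP work like this:
PUSH:
Subtract 2 from SP register.
Write the value of the source to the address SS:SP.
POP:
Write the value at the address SS:SP to the destination.
Add 2 to SP register.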
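A minimal Python model of PUSH and POP, assuming SP starts at FFFEh (a common starting value for COM programs, used here only for illustration) and that words are stored low byte first. Replaying the exchange example shows the LIFO order; the `Stack` class is hypothetical.

```python
class Stack:
    """Model of the 8086 stack: SP moves down by 2 on PUSH and up by 2 on POP."""

    def __init__(self, sp=0xFFFE):  # assumed initial SP
        self.sp = sp
        self.memory = {}  # offset inside SS -> byte value

    def push(self, value):
        self.sp = (self.sp - 2) & 0xFFFF                # subtract 2 from SP
        self.memory[self.sp] = value & 0xFF             # low byte at SS:SP
        self.memory[self.sp + 1] = (value >> 8) & 0xFF  # high byte at SS:SP+1

    def pop(self):
        value = self.memory[self.sp] | (self.memory[self.sp + 1] << 8)  # read SS:SP
        self.sp = (self.sp + 2) & 0xFFFF                # add 2 to SP
        return value


stack = Stack()
ax, bx = 0x1212, 0x3434
stack.push(ax)
stack.push(bx)
ax = stack.pop()  # last in, first out: 3434h comes back first
bx = stack.pop()
print(hex(ax), hex(bx), hex(stack.sp))  # 0x3434 0x1212 0xfffe
```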
Macros
Macros are just like procedures, but not really. Macros look like
procedures, but they exist only until your code is compiled; after
compilation all macros are replaced with real instructions. If you
declare a macro and never use it in your code, the compiler will
simply ignore it. emu8086.inc is a good example of how macros
can be used: this file contains several macros to make coding easier
for you.
Macro definition:
name MACRO [parameters,...]
<instructions>
ENDM
Macro example:
MyMacro MACRO p1, p2, p3
MOV AX, p1
MOV BX, p2
MOV CX, p3
ENDM
ORG 100h
MyMacro 1, 2, 3
MyMacro 4, 5, DX
RET
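Macro expansion is plain text substitution. The Python sketch below expands MyMacro for both uses in the example; the `expand` helper is hypothetical and only replaces whole operands, which is enough for this macro.

```python
MYMACRO_PARAMS = ["p1", "p2", "p3"]
MYMACRO_BODY = ["MOV AX, p1", "MOV BX, p2", "MOV CX, p3"]


def expand(body, params, args):
    """Return the macro body with each parameter replaced by its argument."""
    mapping = dict(zip(params, args))
    lines = []
    for line in body:
        mnemonic, operands = line.split(" ", 1)
        # replace whole operands only, so "p1" never matches part of another name
        parts = [mapping.get(op.strip(), op.strip()) for op in operands.split(",")]
        lines.append(f"{mnemonic} {', '.join(parts)}")
    return lines


for args in (["1", "2", "3"], ["4", "5", "DX"]):
    print(expand(MYMACRO_BODY, MYMACRO_PARAMS, args))
# ['MOV AX, 1', 'MOV BX, 2', 'MOV CX, 3']
# ['MOV AX, 4', 'MOV BX, 5', 'MOV CX, DX']
```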
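Macros vs procedures: a procedure is called using the CALL instruction,
for example:
CALL MyProc
A macro is expanded directly in the program's code, so if you use the
same macro 100 times, the compiler expands it 100 times, and each
expansion makes the output executable larger.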
When you want to use a macro, you can just type its name. For
example:
MyMacro
To pass parameters to macro, you can just type them after the
macro name. For example:
MyMacro 1, 2, 3
To mark the end of the macro, the ENDM directive is enough.
To mark the end of the procedure, you should type the name of the
procedure before the ENDP directive.
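Because macros are expanded directly in the code, labels inside a macro
definition may cause a "Duplicate declaration" error when the macro is
used twice or more. To avoid this problem, use the LOCAL directive
followed by the names of variables, labels or procedures, for example:
MyMacro2 MACRO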
LOCAL label1, label2
CMP AX, 2
JE label1
CMP AX, 3
JE label2
label1:
INC AX
label2:
ADD AX, 2
ENDM
ORG 100h
MyMacro2
MyMacro2
RET
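Here is a Python sketch of what LOCAL achieves: each expansion gets its own label names, so two uses of MyMacro2 never declare the same label twice. The `<name>_<n>` naming scheme and the `expand_with_local` helper are invented for the sketch; emu8086 generates its own internal names.

```python
import re
from itertools import count

_expansion_number = count()  # a new number for every expansion


def expand_with_local(body, local_names):
    """Expand a macro body, giving each LOCAL name a fresh label per expansion."""
    n = next(_expansion_number)
    out = []
    for line in body:
        for name in local_names:
            # \b keeps e.g. "label1" from matching inside "label10"
            line = re.sub(rf"\b{re.escape(name)}\b", f"{name}_{n}", line)
        out.append(line)
    return out


MYMACRO2 = ["CMP AX, 2", "JE label1", "CMP AX, 3", "JE label2",
            "label1:", "INC AX", "label2:", "ADD AX, 2"]

first = expand_with_local(MYMACRO2, ["label1", "label2"])
second = expand_with_local(MYMACRO2, ["label1", "label2"])
print(first[1], "/", second[1])  # JE label1_0 / JE label1_1
```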