Advanced Processing Technique

The document provides an overview of assembly language, particularly focusing on the EMU8086 environment and the architecture of the 8086 CPU, including its registers and commands. It covers concepts such as memory management, dynamic RAM, memory interleaving, and pipelining, along with practical homework assignments for programming in EMU8086. Additionally, it discusses pipeline hazards and their impact on instruction execution in a CPU.

Practical session: we will use EMU8086.

what is assembly language?

Assembly language is a low-level programming language. You need some knowledge of computer structure in order to understand what follows.
The CPU is the heart of the computer; most computations occur inside the CPU.
RAM is the place where programs are loaded in order to be executed.
general purpose registers
The 8086 CPU has 8 general purpose registers, each with its own name:
AX - the accumulator register (divided into AH / AL).
BX - the base address register (divided into BH / BL).
CX - the count register (divided into CH / CL).
DX - the data register (divided into DH / DL).
SI - source index register.
DI - destination index register.
BP - base pointer.
SP - stack pointer.

segment registers
CS - points at the segment containing the current program.
DS - generally points at segment where variables are defined.
ES - extra segment register, it's up to a coder to define its usage.
SS - points at the segment containing the stack.
special purpose registers
IP - the instruction pointer.
flags register - determines the current state of the microprocessor.
MOV command: copies operand2 to operand1.
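For instance (a minimal illustration; the operand values are arbitrary):

MOV AX, 5      ; AX = 5 (immediate into register)
MOV BX, AX     ; BX = AX (register to register)
MOV AL, 0Fh    ; AL = 0Fh (writes only the low byte of AX)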
Flags set in the Status Register by the ALU
An important function of the ALU is to set up bits or flags which give information to
the control unit about the result of an operation. The flags are grouped together in the
status word.
As the ALU has only an adder, subtraction has to be done using 2's-complement arithmetic. The ALU has no knowledge of this at all; it simply adds two binary inputs and sets the flags. It is up to the control unit (or really the programmer's instructions executed by the control unit) to interpret the results.
Z Zero flag: This is set to 1 whenever the output from the ALU is zero.
N Negative flag: This is set to 1 whenever the most significant bit of the output is 1.
Note that it is not correct to say that it is set when the output of the ALU is negative; the ALU doesn't know or care whether you are working in 2's complement. However, this flag is used by the controller for just such an interpretation.

Overflow Flag (O): this flag is set (1) if the result of a signed operation is too large to fit in the number of bits available to represent it; otherwise it is reset (0).
Counter registers: the PC and SP are counting registers, which either act as loadable registers (loaded from the IR (address) register) or count, both on receipt of a clock pulse. Their internal workings appear complicated but are unremarkable, so we leave them as black boxes here.
Array multiplier for the above example.

Practical session
Example: write an EMU8086 program for adding the contents of an array a[4].
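A minimal sketch of such a program (the array name, its values and the use of SI/CX as pointer and counter are illustrative, not part of the original example):

ORG 100h
    MOV CX, 4            ; number of elements in a[4]
    MOV SI, offset a     ; SI points at the first element
    MOV AX, 0            ; running sum
sum_loop:
    ADD AX, [SI]         ; add the current element
    ADD SI, 2            ; advance to the next word
    LOOP sum_loop        ; repeat CX times
    RET                  ; the sum is left in AX
a DW 1, 2, 3, 4          ; the array a[4]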
Homework:
Q1. Write an EMU8086 program that checks whether a number is odd; print "odd" if so, otherwise print "even".
Q2. Write an EMU8086 program for adding the contents of an array a[10].
Q3. Write an EMU8086 program for adding 1…10.
Q4. Write an EMU8086 program for finding how many odd numbers there are in an array a[10].
Q5. Write an EMU8086 program for finding the factorial of any number.

Practical session
Homework:

Q1. Write an EMU8086 program for checking whether a number is prime or not.
Q2. Write an EMU8086 program for exchanging the contents of the AX and BX registers.
Q3. Write an EMU8086 program for finding the results of the following:
Practical session: drawing graphics.
Draw a pixel.
Draw a horizontal line.
Draw a vertical line.

Note: when drawing a vertical line the DX register changes, and when drawing a horizontal line the CX register changes (see the sketch below).
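A minimal sketch using the BIOS video service INT 10h (AH = 0Ch writes one pixel; the video mode, colour and coordinates chosen here are arbitrary):

ORG 100h
    MOV AH, 0        ; set video mode
    MOV AL, 13h      ; 320x200, 256 colours
    INT 10h
    MOV AH, 0Ch      ; write-pixel function
    MOV AL, 4        ; pixel colour
    MOV BH, 0        ; video page 0
    MOV CX, 100      ; column (x)
    MOV DX, 50       ; row (y)
    INT 10h          ; draw one pixel
    ; to draw a horizontal line, repeat while incrementing CX;
    ; to draw a vertical line, repeat while incrementing DX
    RET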
Homework:
Q1. Draw parallel lines.
Q2. Draw a rectangle.
Q3. Draw a triangle.
Q4. Check the values of x and y.

Practical session
Homework:
Q1. Write an EMU8086 program to rotate the number 40h right 10 times.
Q2. Write an EMU8086 program to rotate the number 40h left 10 times.
Q3. Write an EMU8086 program to shift the number FFh right 6 times.
Q4. Write an EMU8086 program to shift the number FFh left 6 times.
Dynamic random-access memory (dynamic RAM or DRAM) is a type of
random-access semiconductor memory that stores each bit of data in a
memory cell consisting of a tiny capacitor and a transistor . The
capacitor can either be charged or discharged; these two states are
taken to represent the two values of a bit, called 0 and 1. The electric
charge on the capacitors slowly leaks off, so without
intervention the data on the chip would soon be lost. To prevent this,
DRAM requires an external memory refresh circuit which periodically
rewrites the data in the capacitors, restoring them to their original
charge. This refresh process is the defining characteristic of dynamic
random-access memory, in contrast to static random-access memory
(SRAM) which does not require data to be refreshed. Unlike flash
memory, DRAM typically takes the form of an integrated circuit chip,
which can consist of dozens to billions of DRAM memory cells. DRAM
chips are widely used in digital electronics where low-cost and high-
capacity computer memory is required.
DRAM will lose values if not refreshed periodically
Memory Interleaving: it is a technique that divides memory into a number of modules such that successive words in the address space are placed in different modules.
Consecutive words in a module:

Assume 16 data items are to be distributed across four modules, where module 00 is module 1, module 01 is module 2, module 10 is module 3 and module 11 is module 4. The data to be transferred are 10, 20, 30, …, 160.

In module 1 the data 10 is placed first, then 20, 30 and finally 40. That is, data items are placed consecutively in a module until it reaches its capacity.

The most significant bits (MSBs) of the address select the module, and the least significant bits (LSBs) select the word within the module.

For example, to get the data 90 the processor issues the address 1000. The high bits 10 indicate that the data is in module 10 (module 3), and the low bits 00 give the address of 90 within module 10 (module 3). So:
Module 1 Contains Data : 10, 20, 30, 40


Module 2 Contains Data : 50, 60, 70, 80
Module 3 Contains Data : 90, 100, 110, 120
Module 4 Contains Data : 130, 140, 150, 160
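A small sketch of how the processor-issued address splits in this scheme (the 4-bit address 1000 is the one used in the example above; the register choice is illustrative):

MOV AL, 1000b        ; address issued for the data 90
MOV AH, AL
MOV CL, 2
SHR AH, CL           ; AH = 10b : the two MSBs select module 3
AND AL, 00000011b    ; AL = 00b : the two LSBs select word 0 inside module 3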
Consecutive words in consecutive modules:

Again assume 16 data items are to be distributed across the four modules, but now consecutive data items are placed in consecutive modules. That is, the data 10 goes in module 1, 20 in module 2, and so on.

The least significant bits (LSBs) of the address select the module, and the most significant bits (MSBs) select the word within the module.

For example, to get the data 90 the processor issues the address 1000. The low bits 00 indicate that the data is in module 00 (module 1), and the high bits 10 give the address of 90 within module 00 (module 1). That is:

Module 1 Contains Data : 10, 50, 90, 130


Module 2 Contains Data : 20, 60, 100, 140
Module 3 Contains Data : 30, 70, 110, 150
Module 4 Contains Data : 40, 80, 120, 160
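The same address splits the opposite way in this scheme (again a sketch with an illustrative register choice):

MOV AL, 1000b        ; address issued for the data 90
MOV AH, AL
AND AH, 00000011b    ; AH = 00b : the two LSBs select module 1
MOV CL, 2
SHR AL, CL           ; AL = 10b : the two MSBs select word 2 inside module 1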
Why do we use memory interleaving? [Advantages]:
Whenever the processor requests data from main memory, a block (chunk) of data is transferred to the cache and then to the processor, so whenever a cache miss occurs the data must be fetched from main memory. But main memory is relatively slow compared with the cache, so interleaving is used to improve the effective access time of main memory.
We can access all four modules at the same time, thus achieving parallelism. As shown in Figure 2, the data can be acquired from the modules using the higher-order bits. This method uses memory effectively.
Homework:
Q1. Given two integer arrays a and b, find their element-wise sum into array c, e.g. a = 1 1 1 1 1, b = 2 2 2 2 2, c = 3 3 3 3 3; print everything to the display.
Q2. Write a procedure for finding the factorial of any number and use it to find the factorials of 4 and 5.
Q3. Write a procedure that prints 'Computer Engineering' and call it 5 times.
Non-pipelined execution
Example of non-pipelined and pipelined execution.

Note: the total time without a pipeline = 8 units, but with a pipeline = 5 units.

Represent a pipeline with 4 stages and 4 instructions.

Draw the hardware of a 4-stage pipeline unit (Fetch, Decode, Execute, Write).

With pipelining: time = 210.
Without pipelining: time = 360.

Practical session
Example: write a macro for printing any string and use it to print 'Computer Science' and 'Network'.
Example: write a macro for printing 3 strings and call it 2 times.
Homework:
Q1. Write a macro for finding the factorial of any number and use it to find the factorials of 4 and 5.
Q2. Write a macro for printing any string and use it to print 'Computer Engineering'.
Q3. Find y using a macro:
y = 2^1 + 2^2 + 2^3 + 2^4 + … + 2^n
Pipelining
Example: if you have 9 instructions with 6 stages:
Fetch instruction (FI)
Decode instruction (DI)
Calculate operands (CO)
Fetch operands (FO)
Execute instruction (EI)
Write result (WR)
Represent this without a pipeline and with a pipeline (draw the timing diagram) and find the time for each case:
1. With pipelining: time = 14 units.
2. Without pipelining: time = 54 units.
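In general, for k pipeline stages, n instructions and a common stage time t_p, these two figures follow from the standard timing relations:

T(pipeline) = (k + n - 1) * t_p = (6 + 9 - 1) * t_p = 14 * t_p
T(non-pipeline) = n * k * t_p = 9 * 6 * t_p = 54 * t_p
Speedup = (n * k) / (k + n - 1) = 54 / 14 ≈ 3.86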


Homework: draw the timing diagram.
Types of Pipeline : It is divided into 2 categories:
1. Arithmetic Pipeline
2. Instruction Pipeline
Arithmetic Pipeline
Arithmetic pipelines are found in most computers. They are used for floating-point operations, multiplication of fixed-point numbers, etc. For example, the input to a floating-point adder pipeline is:
X = A * 2^a
Y = B * 2^b
Here A and B are mantissas (the significant digits of the floating-point numbers), while a and b are exponents. Floating-point addition and subtraction is done in 4 steps:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Produce the result.
Registers are used for storing the intermediate results between the above operations.
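A small worked example of the four steps (the operand values are illustrative):

X = 0.9504 * 10^3, Y = 0.8200 * 10^2
1. Compare the exponents: 3 - 2 = 1.
2. Align the mantissas: Y = 0.0820 * 10^3.
3. Add the mantissas: 0.9504 + 0.0820 = 1.0324.
4. Normalize the result: Z = 1.0324 * 10^3 = 0.10324 * 10^4.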
Instruction Pipeline
Here a stream of instructions can be executed by overlapping the fetch, decode and execute phases of the instruction cycle. This technique is used to increase the throughput of the computer system. An instruction pipeline reads instructions from memory while previous instructions are being executed in other segments of the pipeline. Thus we can execute multiple instructions simultaneously. The pipeline will be more efficient if the instruction cycle is divided into segments of equal duration.
Pipeline Conflicts
There are some factors that cause the pipeline to deviate from its normal performance. Some of these factors are given below:
1. Timing Variations: not all stages take the same amount of time. This problem generally occurs in instruction processing, where different instructions have different operand requirements and thus different processing times.
2. Data Hazards: when several instructions are in partial execution and they reference the same data, a problem arises. We must ensure that the next instruction does not attempt to access the data before the current instruction has written it, because this would lead to incorrect results.
3. Branching: in order to fetch and execute the next instruction, we must know what that instruction is. If the present instruction is a conditional branch, and its result determines the next instruction, then the next instruction may not be known until the current one is processed.
4. Interrupts: interrupts insert unwanted instructions into the instruction stream and affect the execution of instructions.
5. Data Dependency: this arises when an instruction depends on the result of a previous instruction but that result is not yet available. For example:
A = A + 1;
B = A + 2;
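In 8086 terms the same dependency looks like this (a sketch; the register assignment is illustrative):

ADD AX, 1     ; A = A + 1
MOV BX, AX    ; B needs the new value of A, so this cannot
ADD BX, 2     ; complete correctly until AX has been written back: B = A + 2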
The delay involved is the latency of that operation: the amount of time between when the instruction is issued and when it completes.

Practical session
Macros are expanded directly in the code, therefore if there are labels inside the macro definition you may get a "Duplicate declaration" error when the macro is used twice or more. To avoid this problem, use the LOCAL directive followed by the names of the variables, labels or procedure names. For example:
MyMacro2 MACRO
    LOCAL label1, label2   ; generate unique label names on each expansion
    CMP AX, 2
    JE label1
    CMP AX, 3
    JE label2
label1: INC AX
label2: ADD AX, 2
ENDM

ORG 100h
MyMacro2                   ; first expansion
MyMacro2                   ; second expansion - no duplicate-label error
RET
Homework:

Q1. Write an EMU8086 program for finding y = (x^2)/z and print y to the display.
Q2. Write an EMU8086 program to find the biggest of 3 numbers.
Q3. Write an EMU8086 program to check whether a number is divisible by 7 or not.
Q4. Write an assembly language program to add only the odd numbers in the following list of elements: 6, 5, 21, 3, 8, 9. Use a macro.
Q5. Write an assembly language program to find the length of the following string: "Computer Engineering Department".
Pipeline Hazards
There are situations, called hazards, that prevent the next instruction in the instruction stream from executing during its designated clock cycle. Hazards reduce the performance from the ideal speedup gained by pipelining.

There are three classes of hazards:

Structural Hazards. They arise from resource conflicts when the hardware cannot support all possible combinations of instructions in simultaneous overlapped execution.
Data Hazards. They arise when an instruction depends on the result of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline.
Control Hazards. They arise from the pipelining of branches and other instructions that change the PC.
Structural Hazards

When a machine is pipelined, the overlapped execution of instructions requires pipelining of functional units and duplication of resources to allow all possible combinations of instructions in the pipeline. If some combination of instructions cannot be accommodated because of a resource conflict, the machine is said to have a structural hazard.
Example 1:
A machine may have only one register-file write port, but in some cases the pipeline might want to perform two writes in a clock cycle.
Example 2:
A machine has a single shared memory pipeline for data and instructions. As a result, when an instruction contains a data-memory reference (a load), it will conflict with the instruction fetch of a later instruction (Instr 3):
Clock cycle number

Instr     1    2    3    4    5    6    7    8
Load      IF   ID   EX   MEM  WB
Instr 1        IF   ID   EX   MEM  WB
Instr 2             IF   ID   EX   MEM  WB
Instr 3                  IF   ID   EX   MEM  WB

To resolve this, we stall the pipeline for one clock cycle when the data-memory access occurs. The effect of the stall is actually to occupy the resources for that instruction slot. The following table shows how the stalls are actually implemented.
Clock cycle number

Instr     1     2     3     4       5       6       7       8       9
Load      IF    ID    EX    MEM     WB
Instr 1         IF    ID    EX      MEM     WB
Instr 2               IF    ID      EX      MEM     WB
Stall                       bubble  bubble  bubble  bubble  bubble
Instr 3                             IF      ID      EX      MEM     WB

The first instruction (Load) is assumed to be a data-memory reference (a load or store); this is why Instr 3 cannot start execution in cycle 4, for the reason given above.
To simplify the picture, it is also commonly shown like this:
Clock cycle number

Instr     1     2     3     4       5     6     7     8     9
Load      IF    ID    EX    MEM     WB
Instr 1         IF    ID    EX      MEM   WB
Instr 2               IF    ID      EX    MEM   WB
Instr 3                     stall   IF    ID    EX    MEM   WB

Introducing stalls degrades performance, as we saw before. Why, then, would a designer allow structural hazards? There are two reasons:
To reduce cost. For example, machines that support both an instruction access and a data access every cycle (to eliminate the structural hazard of the above example) require at least twice as much total memory bandwidth.
To reduce the latency of the unit. The shorter latency comes from the lack of pipeline registers, which introduce overhead.
Data Hazards
A major effect of pipelining is to change the relative timing of instructions by overlapping their execution. This introduces data and control hazards. Data hazards occur when the pipeline changes the order of read/write accesses to operands so that the order differs from the order seen by sequentially executing instructions on an unpipelined machine.
Consider the pipelined execution of these instructions:

                   1    2    3      4      5      6      7     8     9
ADD R1, R2, R3     IF   ID   EX     MEM    WB
SUB R4, R5, R1          IF   IDsub  EX     MEM    WB
AND R6, R1, R7               IF     IDand  EX     MEM    WB
OR  R8, R1, R9                      IF     IDor   EX     MEM   WB
XOR R10, R1, R11                           IF     IDxor  EX    MEM   WB


All the instructions after the ADD use the result of the ADD instruction (in R1). The ADD instruction writes the value of R1 in the WB stage, but the SUB instruction reads the value during its ID stage (IDsub). This problem is called a data hazard. Unless precautions are taken to prevent it, the SUB instruction will read the wrong value and try to use it.
The AND instruction is also affected by this data hazard. The write of R1 does not complete until the end of cycle 5. Thus, the AND instruction, which reads the registers during cycle 4 (IDand), will receive the wrong result.
The OR instruction can be made to operate without incurring a hazard by a simple implementation technique: perform register-file reads in the second half of the cycle and writes in the first half. Because both WB for ADD and IDor for OR occur in cycle 5, the write to the register file by ADD is performed in the first half of the cycle, and the read of the registers by OR is performed in the second half of the cycle.
The XOR instruction operates properly, because its register read occurs in cycle 6, after the register write by ADD.
Control Hazards

Control hazards can cause a greater performance loss for the pipeline than data hazards. When a branch is executed, it may or may not change the PC (program counter) to something other than its current value plus 4. If a branch changes the PC to its target address, it is a taken branch; if it falls through, it is not taken.
If instruction i is a taken branch, then the PC is normally not changed until the end of the MEM stage, after the completion of the address calculation and comparison.
The simplest method of dealing with branches is to stall the pipeline as soon as the branch is detected until we reach the MEM stage, which determines the new PC. The pipeline behavior looks like:
Branch                IF   ID         EX      MEM    WB
Branch successor           IF(stall)  stall   stall  IF    ID    EX    MEM   WB
Branch successor + 1                                       IF    ID    EX    MEM   WB
The stall does not occur until after the ID stage (where we know that the instruction is a branch). This control-hazard stall must be implemented differently from a data-hazard stall, since the IF cycle of the instruction following the branch must be repeated as soon as we know the branch outcome. Thus, the first IF cycle is essentially a stall (because it never performs useful work), which makes a total of 3 stall cycles.
Three clock cycles wasted for every branch is a significant loss. With a 30% branch frequency and an ideal CPI of 1, the machine with branch stalls achieves only about half the ideal speedup from pipelining!
The number of clock cycles can be reduced by two steps:
• Find out whether the branch is taken or not taken earlier in the pipeline;
• Compute the taken PC (i.e., the address of the branch target) earlier.
Both steps should be taken as early in the pipeline as possible. By moving the zero test into the ID stage, it is possible to know whether the branch is taken at the end of the ID cycle. Computing the branch target address during ID requires an additional adder, because the main ALU, which has been used for this function so far, is not usable until EX. With this revised datapath we need only a one-clock-cycle stall on branches.
In some machines, branch hazards are even more expensive in clock
cycles. For example, a machine with separate decode and register fetch
stages will probably have a branch delay - the length of the control
hazard - that is at least one clock cycle longer. The branch delay, unless it
is dealt with, turns into a branch penalty. Many older machines that
implement more complex instruction sets have branch delays of four
clock cycles or more. In general, the deeper the pipeline, the worse the
branch penalty in clock cycles.
Finding the length of any string: use $ - string.
msg db 'Hello, world!', 0Ah   ; the string (0Ah is a newline)
len equ $ - msg               ; length of the string in bytes
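Used in a complete EMU8086 program, the computed length can drive a character-printing loop (a sketch; the jump over the data and the loop structure are illustrative):

ORG 100h
    JMP start
msg db 'Hello, world!', 0Ah     ; the string
len equ $ - msg                 ; its length in bytes
start:
    MOV CX, len                 ; print len characters
    MOV SI, offset msg
next_char:
    MOV DL, [SI]
    MOV AH, 2                   ; INT 21h / AH=2 prints the character in DL
    INT 21h
    INC SI
    LOOP next_char
    RET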
Homework:
Q1. Write an EMU8086 program for printing the letters from A to Z.
Q2. Write an EMU8086 program for finding the biggest number in an integer array a[10].
Q3. Write an EMU8086 program for sorting an integer array a[6] in ascending order.
Q4. Write an EMU8086 program for finding the length of your name.
Vectorization and Vector Processors
A scalar processor is a normal processor, which works on one instruction at a time, operating on single data items. In today's world this technique can be highly inefficient, as the overall processing of instructions is very slow. A scalar processor is classified as a SISD processor in Flynn's taxonomy.
In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set containing instructions that operate on one-dimensional arrays of data called vectors, as compared to scalar processors, whose instructions operate on single data items. Vector processors can greatly improve performance on certain workloads, notably numerical simulation and similar tasks. Vector machines appeared in the early 1970s and dominated supercomputer design through the 1970s into the 1990s.
Characteristics of Vector Processors
A vector processor is a CPU (central processing unit) in a computer with parallel processors and the capability for vector processing. The main characteristic of a vector processor is that it makes use of the parallel processing capability of the processor, where two or more processors operate concurrently. This makes it possible for the processors to perform multiple tasks simultaneously, or for a task to be split into different subtasks handled by different processors and combined to get the result.
The vector processor treats all of the elements of the vector as a single operand as it traverses the vector in a single loop. Computers with vector processors find many uses that involve computation on massive amounts of data, such as image processing, artificial intelligence, mapping the human genome, space simulations, seismic data, and hurricane prediction.
Types of Array Processors
There are basically two types of array processors:
• Attached Array Processors
• SIMD Array Processors
SIMD Array Processors (Single Instruction, Multiple Data)
SIMD is the organization of a single computer containing multiple processors operating in
parallel. The processing units are made to operate under the control of a common control unit,
thus providing a single instruction stream and multiple data streams.
A general block diagram of an array processor is shown below. It contains a set of identical processing elements (PEs), each of which has a local memory M. Each processing element includes an ALU and registers. The master control unit controls all the operations of the processing elements. It also decodes the instructions and determines how each instruction is to be executed. The main memory is used for storing the program, and the control unit is responsible for fetching the instructions. Vector instructions are sent to all PEs simultaneously and the results are returned to memory.
Why use an array processor?

• Array processors increase the overall instruction processing speed.
• Since most array processors operate asynchronously from the host CPU, they improve the overall capacity of the system.
• An array processor has its own local memory, providing extra memory for systems with low memory.

Applications of Vector Processors

Computers with vector processing capabilities are in demand in specialized applications. The following are some areas where vector processing is used:
1. Petroleum exploration.
2. Medical diagnosis.
3. Data analysis.
4. Weather forecasting.
5. Aerodynamics and space flight simulations.
6. Image processing.
7. Artificial intelligence.
INT 21h / AH=9 - output of a string at DS:DX. String must be terminated by '$'.

example:

org 100h
mov dx, offset msg
mov ah, 9
int 21h
ret
msg db "hello world $"
INT 21h / AH=1 - read character from standard input, with echo, result is stored in AL.
if there is no character in the keyboard buffer, the function waits until any key is pressed.
example:
mov ah, 1
int 21h
ret
Computer Architecture: Flynn's taxonomy
Parallel computing is computing in which jobs are broken into discrete parts that can be executed concurrently. Each part is further broken down into a series of instructions, and instructions from each part execute simultaneously on different CPUs. Parallel systems deal with the simultaneous use of multiple computer resources, which can include a single computer with multiple processors or a number of computers connected by a network to form a parallel processing system.
Parallel systems are more difficult to program than computers with a single processor, because the architecture of parallel computers varies and the processes of multiple CPUs must be coordinated and synchronized.
The crux of parallel processing is the CPU. Based on the number of instruction and data streams that can be processed simultaneously, computing systems are classified into four major categories.
Flynn's taxonomy is a classification of computer architectures. The four classifications defined by Flynn are based upon the number of concurrent instruction (or control) streams and data streams available in the architecture. According to Flynn's taxonomy, computers can be divided into 4 major groups:
• SISD
• SIMD
• MISD
• MIMD

SISD (Single Instruction Stream, Single Data Stream)

This represents the organization of a single computer containing a control unit, a processor unit and a memory unit. Instructions are executed sequentially; limited parallelism may be achieved by pipelining or multiple functional units.
A SISD machine is a sequential computer which exploits no parallelism in either the instruction or data streams. A single control unit (CU) fetches a single instruction stream (IS) from memory. The CU then generates appropriate control signals to direct a single processing element (PE) to operate on a single data stream (DS), i.e., one operation at a time.
Examples of SISD architecture are traditional uniprocessor machines like older personal computers.
SIMD (Single Instruction Stream, Multiple Data Stream)
This represents an organization that includes multiple processing units under the control of a common control unit. All processors receive the same instruction from the control unit but operate on different parts of the data. They are highly specialized computers, basically used for numerical problems that are expressed in the form of vectors or matrices; they are not suitable for other types of computation.
A single instruction operates on multiple different data streams. Instructions can be executed sequentially, such as by pipelining, or in parallel by multiple functional units.
SIMD is an execution model used in parallel computing in which a single instruction is applied to multiple data items.
MISD (Multiple Instruction Stream, Single Data Stream)
This consists of a single computer containing multiple processors connected to multiple control units and a common memory unit. It is capable of processing several instructions over a single data stream simultaneously. The MISD structure is mainly of theoretical interest, since no practical system has been constructed using this organization.
Multiple instructions operate on one data stream. Heterogeneous systems operate on the same data stream and must agree on the result.
MIMD (Multiple Instruction Stream, Multiple Data Stream)
This represents an organization that is capable of processing several programs at the same time. It is the organization of a single computer containing multiple processors connected to multiple control units and a shared memory unit. The shared memory unit contains multiple modules so that it can communicate with all the processors simultaneously. Multiprocessors and multicomputers are examples of MIMD.
Multiple processors simultaneously execute different instructions on different data. MIMD architectures include multi-core superscalar processors and distributed systems, using either one shared memory space or a distributed memory space.

Practical session
Vector Processor
Definition: a vector processor is basically a central processing unit that has the ability to execute a complete vector input with a single instruction. More specifically, it is a complete unit of hardware resources that processes a sequential set of similar data items in memory using a single instruction.
Difference between a multiprocessor and a multicomputer:
A multiprocessor system is a single computer that operates with multiple CPUs, whereas a multicomputer system is a cluster of computers that operate as a single computer.

Multiprocessor:
A multiprocessor is a computer system with two or more central processing units (CPUs) that share full access to a common RAM. The main objective of using a multiprocessor is to boost the system's execution speed, with other objectives being fault tolerance and application matching.
There are two types of multiprocessors: shared-memory multiprocessors and distributed-memory multiprocessors. In a shared-memory multiprocessor all the CPUs share the common memory, but in a distributed-memory multiprocessor every CPU has its own private memory.
Multicomputer:
A multicomputer system is a computer system with multiple processors that are connected together to solve a problem. Each processor has its own memory, which is accessible only by that particular processor, and the processors communicate with each other via an interconnection network.

As a multicomputer is capable of message passing between the processors, it is possible to divide a task between the processors to complete it. Hence, a multicomputer can be used for distributed computing. It is cheaper and easier to build a multicomputer than a multiprocessor.
Difference between a multiprocessor and a multicomputer:

1. A multiprocessor is a system with two or more central processing units (CPUs) capable of performing multiple tasks, whereas a multicomputer is a system with multiple processors attached via an interconnection network to perform a computation task.
2. A multiprocessor system is a single computer that operates with multiple CPUs, whereas a multicomputer system is a cluster of computers that operate as a single computer.
3. A multicomputer is easier and more cost effective to build than a multiprocessor.
4. In a multiprocessor system, programming tends to be easier, whereas in a multicomputer system programming tends to be more difficult.
5. A multiprocessor supports parallel computing; a multicomputer supports distributed computing.
How do we print the result of a sum of two numbers?
How do we calculate the number of elements in an array?
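Two small sketches answering these questions (the numbers, array contents and names are illustrative):

; printing the sum of two small numbers (result 0-9)
ORG 100h
    MOV AL, 3
    ADD AL, 4            ; AL = 7
    ADD AL, 30h          ; convert the digit to its ASCII code
    MOV DL, AL
    MOV AH, 2            ; INT 21h / AH=2 prints the character in DL
    INT 21h
    RET

; counting the elements of a word array at assembly time
arr   DW 5, 9, 12, 7
count equ ($ - arr) / 2  ; size in bytes divided by 2 bytes per word = 4 elements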
Homework
Q1. Write an EMU8086 program for printing the 9 times table, e.g. 9*1=9, 9*2=18, …, 9*10=90.
Q2. Write an EMU8086 program for finding y and printing it:
y = 2^1 + 2^2 + 2^3 + … + 2^n

Q3. Write an EMU8086 program to display a menu:
1. Find the sum of an array.
2. Find the length of an array.
3. Exit.
Architecture of array processors
In this section, we define four simple parallel architectures:
1. Linear arrays and rings of processors
2. Binary trees of processors
3. Meshes and tori of processors
4. Shared-memory processors
There are several important terms:
Network Diameter D: It is the minimum distance between the farthest nodes in a network.
The distance is measured in terms of number of distinct hops between any two nodes.
Node degree d: Number of edges connected with a node is called node degree. If the
edge carries data from the node, it is called out degree and if this carries data into the
node it is called in degree.
Bisection Bandwidth: Number of edges required to be cut to divide a network into two
halves is called bisection bandwidth.
Linear array: N represents the number of nodes; here N = 5, so the diameter D = 4.

Ring: N represents the number of nodes; the bisection bandwidth b = 2.

Binary tree: h is the height of the tree; here h = 4, so D = 2*4 = 8.

Mesh: p represents the number of nodes; here D = 2√9 - 2 = 4, since going from P0 to P8 requires 4 links. The bisection bandwidth b = 3.
Shared memory. A shared-memory multiprocessor can be modeled as a complete graph, in which every node is connected to every other node, as shown in Fig. 4. In the linear array, by contrast, a node has to go through intermediaries to send/receive data to/from a distant node such as P4. In a shared-memory multiprocessor, every piece of data is directly accessible to every processor (we assume that each processor can simultaneously send/receive data over all of its p - 1 links). The diameter D = 1 of a complete graph is an indicator of this direct access. The node degree d = p - 1; here d = 9 - 1 = 8.

Fig. 4. A shared-variable architecture modeled as a complete graph

Practical session
Write a program to read a character from the keyboard:
MOV AH, 1h    ; keyboard input subprogram
INT 21h       ; character input; the character is stored in AL
MOV c, AL     ; copy the character from AL to the variable c
Write a program to read and display a character:
MOV AH, 1h    ; keyboard input subprogram
INT 21h       ; read character into AL
MOV DL, AL    ; copy character to DL
MOV AH, 2h    ; character output subprogram
INT 21h       ; display the character in DL
Write a program to display a string using library functions:
include emu8086.inc   ; macro declarations
ORG 100h
PRINT 'Hello World!'
GOTOXY 10, 5
PUTC 65               ; 65 is the ASCII code for 'A'
PUTC 'B'
RET                   ; return to the operating system
END                   ; directive to stop the compiler
Homework
Q1. Write an EMU8086 program that prints "Multiple" if A is a multiple of B.
Q2. Write an EMU8086 program that prints the average of 5 elements.
Q3. Write an EMU8086 program that checks whether a number is 'Prime' or 'Not Prime'.