Branch: CSE/IT (4th SEM)
Session-2012
Lecture – 1:
• Digital logics
• Boolean Algebra
• Logic Gates
• Truth table
Submitted by:
Prerna Mittal
Computer Architecture and
Organization
CSE – 210-F
Unit -1
Basic Principles
In this chapter we deal with the basic digital circuits of a computer: which hardware
components are used, how these components relate to and interact with each other, and
how the hardware is accessed or seen by the user.
This gives rise to the classification of computer study into:
· Computer design: This is concerned with the hardware design of the computer. Here the
designer decides on the specifications of the computer system.
· Computer Organization: This is concerned with the way the hardware components
operate and the way they are connected to form the computer system.
· Computer Architecture: This is concerned with the structure and behavior of the
computer as seen by the user. It includes the information formats, the instruction set and
addressing modes for accessing memory.
In our course we will be dealing with computer architecture and organization.
Before starting with computer architecture and organization, let's discuss the
components that make up the hardware, or organization, of the computer. This hardware
is composed of the digital circuits of a digital computer.
Digital Computers
· Imply that the computer deals with digital information
· Digital information: is represented by binary digits (0 and 1)
Gates – blocks of Hardware that produce 1 or 0 when input logic requirements are
satisfied
Functions of gates can be described by:
· Truth Table
· Boolean Function
· Karnaugh Map
Table 1.1: Table for various logic gates
[Figure: a GATE block with a binary digital input signal and a binary digital output signal]
Boolean algebra
· Algebra with Binary (Boolean) Variable and Logic Operations
· Boolean Algebra is useful in Analysis and Synthesis of Digital
Logic Circuits
- Input and Output signals can be represented by Boolean
Variables and
- Function of the Digital Logic Circuits can be
represented by Logic Operations , i.e., Boolean Function(s)
- From a Boolean function, a logic diagram can be
constructed using AND, OR, and Inverter gates
Note: We can have many circuits for the same Boolean expression.
For example:
Truth Table
· The most elementary specification of the function of a Digital
Logic Circuit is the Truth Table
· Table that describes the Output Values for all the combinations
of the Input Values, called MINTERMS
· n input variables → 2^n minterms
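As a small illustration of the truth table idea, the sketch below lists all 2^n input combinations for a Boolean function and uses the tables to check that two different circuits realize the same function. The function names and the example expressions (x + x'y and x + y) are assumptions chosen for illustration, not taken from the lecture.

```python
from itertools import product

def truth_table(func, n):
    """Return one (inputs, output) row for each of the 2**n input combinations."""
    return [(bits, func(*bits)) for bits in product((0, 1), repeat=n)]

# Two hypothetical circuits for the same Boolean function.
def circuit_a(x, y):
    # x + x'y
    return x | ((1 - x) & y)

def circuit_b(x, y):
    # x + y
    return x | y

table_a = truth_table(circuit_a, 2)
table_b = truth_table(circuit_b, 2)

for (x, y), out in table_a:
    print(x, y, out)
print("rows:", len(table_a), "same function:", table_a == table_b)
```

Comparing the two tables row by row is the simplest way to confirm that many circuits can exist for the same Boolean expression.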
Summary:
· Computer Design: what hardware components we need.
· Computer Organization: how these hardware components interact.
· Computer Architecture: how the computer appears to the user.
· Logic Gates: Blocks of hardware that produce 0 or 1 as a result. Of the 8 common
logic gates, 3 (AND, OR and Inverter) are basic.
· Boolean Algebra: The representation of input and output signals in the
form of expressions.
· Truth table: Table that describes the Output Values for all the
combinations of the Input Values
Lecture – 2:
• Combinational logic Blocks
Multiplexers
Adders
Encoders
Decoders
Combinational circuits are circuits without memory, where the outputs are obtained from
the inputs only. An n-input m-output combinational circuit is of the form shown in the figure.
A multiplexer is a combinational circuit that selects one of many inputs according to
the values on its selection lines.
The number of selection lines depends on the number of inputs through 2^x = y:
if y is the number of inputs, then x is the number of selection lines.
Thus if we have 4 input lines, we use 2 selection lines, as 2^2 = 4, and so on.
Such a circuit is called a 4:1 multiplexer or 4*1 multiplexer.
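The selection rule can be sketched in a few lines of Python. This is only a behavioural model, assuming the select bits are given most significant first (S1, S0); the function name mux is an illustrative choice.

```python
def mux(inputs, select):
    """Behavioural model of a y-to-1 multiplexer.

    inputs: list of y input values, where y must equal 2**x
    select: tuple of x select bits, most significant first (e.g. (S1, S0))
    """
    x = len(select)
    if len(inputs) != 2 ** x:
        raise ValueError("need 2**x inputs for x select lines")
    index = 0
    for bit in select:
        index = (index << 1) | bit  # build the binary number S1 S0 ...
    return inputs[index]

# 4:1 multiplexer: 4 inputs, 2 select lines.
lines = ["I0", "I1", "I2", "I3"]
for s1 in (0, 1):
    for s0 in (0, 1):
        print(s1, s0, mux(lines, (s1, s0)))
```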
This has been explained in the diagram as:
[Figure: 4-to-1 multiplexer with inputs I0, I1, I2, I3, select lines S1 and S0, and output Y]
4-to-1 Multiplexer
Select      Output
S1   S0     Y
0    0      I0
0    1      I1
1    0      I2
1    1      I3
[Figure: general combinational circuit block with n inputs and m outputs]
Adders
Half Adder
Full Adder
Half Adder: Adds 2 bits and gives carry and sum as the result.
Half adder truth table:
x  y  c  s
0  0  0  0
0  1  0  1
1  0  0  1
1  1  1  0
c = xy
s = xy’ + x’y
  = x ⊕ y
[Figure: half adder digital circuit with inputs x, y and outputs c, s]
Full Adder: Adds 2 bits with carry in and gives carry out and sum as result.
Full adder truth table:
x  y  cin  cout  s
0  0  0    0     0
0  0  1    0     1
0  1  0    0     1
0  1  1    1     0
1  0  0    0     1
1  0  1    1     0
1  1  0    1     0
1  1  1    1     1
cout = xy + x cin + y cin
     = xy + (x ⊕ y) cin
s = x’y’cin + x’y cin’ + xy’cin’ + xy cin
  = x ⊕ y ⊕ cin
  = (x ⊕ y) ⊕ cin
[Figure: Karnaugh maps for cout and s, and the full adder digital circuit with inputs x, y, cin and outputs cout, s]
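The adder equations (c = xy and s = x XOR y for the half adder; cout = xy + (x XOR y)cin and s = x XOR y XOR cin for the full adder) can be checked with a short sketch. Building the full adder from two half adders is one common construction and is assumed here; the function names are illustrative.

```python
def half_adder(x, y):
    """Return (carry, sum) for two input bits: c = xy, s = x XOR y."""
    return x & y, x ^ y

def full_adder(x, y, cin):
    """Return (carry_out, sum) using two half adders and an OR gate."""
    c1, s1 = half_adder(x, y)      # c1 = xy, s1 = x XOR y
    c2, s = half_adder(s1, cin)    # c2 = (x XOR y)cin, s = x XOR y XOR cin
    return c1 | c2, s              # cout = xy + (x XOR y)cin

print("x y cin cout s")
for x in (0, 1):
    for y in (0, 1):
        for cin in (0, 1):
            cout, s = full_adder(x, y, cin)
            print(x, y, cin, " ", cout, "  ", s)
```

Each printed row should agree with ordinary arithmetic: 2*cout + s equals x + y + cin.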
Decoder: A decoder takes n inputs and gives 2^n outputs.
That is, we get 8 outputs for 3 inputs, and such a circuit is called a 3*8 decoder.
We also have the 2*4 decoder, the 4*16 decoder and so on.
Here we implement the decoder with the help of NAND gates.
Using NAND gates, it becomes more economical.
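A behavioural sketch of an n-to-2^n decoder is given below. It assumes the input bits are given most significant first. The NAND version is modelled simply as the complement of the active-high outputs, since a decoder built directly from NAND gates produces outputs where the selected line is 0 and all others are 1. Function names are illustrative.

```python
def decoder(bits):
    """n-to-2**n decoder with active-high outputs; bits are MSB first."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    outputs = [0] * (2 ** len(bits))
    outputs[index] = 1             # exactly one output line is selected
    return outputs

def decoder_nand(bits):
    """Same decoder with complemented (active-low) outputs, as NAND gates give."""
    return [1 - out for out in decoder(bits)]

print(decoder((1, 0, 1)))      # 3*8 decoder, input 101 selects line 5
print(decoder_nand((0, 1)))    # 2*4 NAND decoder, input 01 pulls line 1 low
```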
Summary:
· Combinational circuits: where the outputs are obtained from the inputs
only.
· Various combinational circuits are:
o Multiplexers: The number of selection lines x depends on the number of
inputs y as 2^x = y.
o Half Adder: Adds 2 bits and gives carry and sum as the result.
o Full Adder: Adds 2 bits with carry in and gives carry out and sum as
the result.
o Encoder: Takes 2^n inputs and gives n outputs.
o Decoder: Takes n inputs and gives 2^n outputs.
Important Questions derived from this:
Q1. What is the difference between a multiplexer and a decoder?
Q2.Draw a 4*1 decoder with the help of AND gates.
Lecture – 3:
• Sequential logic Blocks
Latches
Flip flops
Registers
Counters
• Sequential logic Blocks : logic blocks whose output logic value
depends on the input values and the state of the blocks
– In this we have the concept of memory which was not
applicable for combinational circuits.
The various sequential blocks or circuits are:
Latches:
• A latch is a kind of bistable multivibrator, an electronic circuit which has two
stable states and thereby can store one bit of information. Today the word is
mainly used for simple transparent storage elements, while slightly more
advanced non-transparent (or clocked) devices are described as flip-flops.
Informally, as this distinction is quite new, the two words are sometimes used
interchangeably.
S-R latch:
To overcome the restricted combination, one can add gates to the inputs that would
convert (S,R) = (1,1) to one of non-restricted combinations. That can be:
Q = 1 (1,0) — referred to as an S-latch
Q = 0 (0,1) — referred to as an R-latch
Keep state (0,0) — referred to as an E-latch
D-LATCH
Forbidden input values are forced not to occur by using an inverter between the inputs.
Flip Flops:
D – flip flop:
[Figure: D latch circuit and block symbols with inputs D (data) and E (enable) and outputs Q and Q’]
D   Q(t+1)
0   0
1   1
If you compare the D flip-flop and the D latch, the main difference you find in the circuit
is that a latch is controlled by the level of an enable input, while a flip-flop is
controlled by a clock edge.
So you can note down the differences between latches and flip-flops as:
• A latch is a level-triggered device whereas a flip-flop is edge-triggered.
• The output of a latch can change at any time while its enable input is active, whereas
the output of a flip-flop changes only at specific times determined by a clocking signal.
• A simple latch does not require clock pulses, whereas flip-flops are clocked devices.
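The difference in timing can be seen in a small simulation. The sketch assumes the usual convention that a D latch is transparent while its enable input is 1, and that a positive edge-triggered D flip-flop samples D only when the clock goes from 0 to 1. The signal values are made up for illustration.

```python
def d_latch(d_values, enable_values, q=0):
    """Level-sensitive: Q follows D whenever enable is 1, else holds."""
    out = []
    for d, e in zip(d_values, enable_values):
        if e:
            q = d
        out.append(q)
    return out

def d_flip_flop(d_values, clock_values, q=0):
    """Positive edge-triggered: Q takes D only on a 0 -> 1 clock transition."""
    out = []
    prev_clock = 0
    for d, c in zip(d_values, clock_values):
        if prev_clock == 0 and c == 1:
            q = d
        prev_clock = c
        out.append(q)
    return out

d     = [1, 0, 1, 0, 0, 1]
clock = [1, 1, 0, 1, 1, 0]   # used as enable for the latch, clock for the flip-flop

print("latch     :", d_latch(d, clock))
print("flip-flop :", d_flip_flop(d, clock))
```

With the same D and clock signals, the latch output changes in the middle of a high clock period, while the flip-flop output changes only at the rising edges.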
Characteristics
- State transition occurs at the rising edge or
falling edge of the clock pulse
[Figure: clock waveform marking the periods during which latches respond to the input, and the instant at which positive edge-triggered flip-flops respond to the input]
Counters: A counter is a device which stores (and sometimes displays) the number of
times a particular event or process has occurred, often in relationship to a clock signal.
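4 – bit binary counter:
[Figure: 4-bit binary counter built from four J-K flip-flops with a common Clock, a Counter Enable input, outputs A0 A1 A2 A3 and an Output Carry]
RING COUNTER:
In a ring counter the output of each flip-flop is moved to the input of the next flip-flop,
and the output of the last flip-flop is fed back to the input of the first.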
JOHNSON COUNTER :
In Johnson counter the output of last flip flop is inverted and given to the first flip flop.
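The two feedback rules can be compared with a short simulation. This is a sketch that models each counter as a list of flip-flop outputs shifted once per clock pulse; the 4-bit width and the starting states are assumptions for illustration.

```python
def ring_counter(state, steps):
    """Last output is fed back unchanged to the first flip-flop."""
    sequence = []
    state = list(state)
    for _ in range(steps):
        sequence.append(tuple(state))
        state = [state[-1]] + state[:-1]
    return sequence

def johnson_counter(state, steps):
    """Last output is inverted and fed back to the first flip-flop."""
    sequence = []
    state = list(state)
    for _ in range(steps):
        sequence.append(tuple(state))
        state = [1 - state[-1]] + state[:-1]
    return sequence

for s in ring_counter([1, 0, 0, 0], 4):
    print("ring   ", s)
for s in johnson_counter([0, 0, 0, 0], 8):
    print("johnson", s)
```

With 4 flip-flops the ring counter cycles through 4 states, while the Johnson counter cycles through 8.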
Registers: It refers to a group of flip-flops operating as a coherent unit to hold data. This
is different from a counter, which is a group of flip-flops operating to generate new data
by tabulating it.
Shift register: A register that is capable of shifting data one bit at a time is called a shift
register. The logical configuration of a serial shift register consists of a chain of flip-flops
connected in cascade, with the output of one flip-flop being connected to the input of its
neighbor. The operation of the shift register is synchronous; thus each flip-flop is
connected to a common clock. Using D flip-flops forms the simplest type of shift registers.
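A serial shift register built from D flip-flops can be modelled as a list that moves one position per clock pulse. The sketch below assumes a 4-bit register that shifts towards the last flip-flop; the function name and the input bits are illustrative.

```python
def shift(register, serial_in):
    """One clock pulse: each flip-flop takes the value of its neighbour.

    Returns (new_register, serial_out).
    """
    serial_out = register[-1]
    return [serial_in] + register[:-1], serial_out

register = [0, 0, 0, 0]
history = []
for bit in (1, 0, 1, 1):
    register, out = shift(register, bit)
    history.append(list(register))
    print(register, "serial out:", out)
```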
Bi-directional shift register with parallel load
[Figure: four D flip-flops with outputs A0 A1 A2 A3 and a common Clock, each fed by a 4 x 1 MUX; the multiplexers share select lines S0 S1 and take a Serial Input at each end plus parallel inputs I0 I1 I2 I3]
Summary:
· Sequential circuits: output logic value depends on the input values and the
state of the blocks. These circuits have memory.
· Various sequential circuits are:
o Latches: An electronic circuit which has two stable states and
thereby can store one bit of information
o Flip flops: Also have 2 stable states, but change state only at a
clock edge.
o Counter: A device which stores the number of times a particular event
or process has occurred.
o Registers: A group of flip-flops operating as a coherent unit to hold
data.
Important Questions derived from this:
Q1. What is the difference between a latch and a flip-flop?
Q2. Explain the Johnson counter.
Q3. Draw shift register with parallel load.
Lecture – 4:
• Stored Program control concept
• Flynn’s classification of computers:
– SISD
– SIMD
– MISD
– MIMD
After discussing the basic principles of hardware and the combinational and sequential
circuits in our computer system, let’s see how these components interact to make up the
computer system we use. We start with the basic architectures of the computer system.
The most basic question is how programs are stored in the computer system, that is, how
the different programs and data are arranged in the system.
Stored Program control concept
• The simplest way to organize a computer is to have one processor register and an
instruction code with 2 parts.
– Opcode (What operation is to be completed)
– Address (Address of the operands on which the operation is to be
computed)
• A stored-program computer is one that by design includes an instruction set
architecture and can store in memory a set of instructions (a program) that details
the computation, together with the data on which the computation is to be done.
• The opcode tells us the operation to be performed.
• Address tells us the memory location where to find the operand.
• For a memory unit of 4096 words we need 12 bits to specify the address, since 2^12 = 4096.
Instruction Format
Bits 15-12: Opcode
Bits 11-0: Address
Binary Operand
Bits 15-0: Operand
[Figure: Memory 4096*16 holding Instructions (Program) and Operands (Data), connected to the Processor register (Accumulator or AC)]
Fig 1: Stored Program Organization
• When we store an instruction code in a 16-bit memory word, 4 bits specify one of 16
operations (as 12 bits are for the operand address).
• For each operation, control fetches the instruction from memory, decodes the
operation (one out of 16), finds the operand and then performs the operation.
• Computers with one processor register generally name it the accumulator (or AC).
The operation is performed with the operand and the content of AC.
• In case no operand is specified, the operation is performed on the
accumulator, e.g. Clear AC, Complement AC, etc.
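The fetch, decode and execute steps can be sketched as a toy accumulator machine. The 4-bit opcode and 12-bit address split follows the instruction format described above; the opcode numbers, the mnemonics LOAD, ADD, STORE, CLA and HALT, and the sample program are all hypothetical choices for this illustration and do not come from any real instruction set.

```python
# Hypothetical opcode assignments (4 bits allow up to 16 operations).
LOAD, ADD, STORE, CLA, HALT = 0x1, 0x2, 0x3, 0x4, 0xF

def decode(word):
    """Split a 16-bit instruction into (opcode, address)."""
    return (word >> 12) & 0xF, word & 0x0FFF

def run(memory):
    """Fetch, decode and execute until HALT; return the accumulator."""
    ac = 0      # single processor register (accumulator)
    pc = 0      # address of the next instruction
    while True:
        opcode, address = decode(memory[pc])   # fetch and decode
        pc += 1
        if opcode == LOAD:
            ac = memory[address]
        elif opcode == ADD:
            ac = (ac + memory[address]) & 0xFFFF
        elif opcode == STORE:
            memory[address] = ac
        elif opcode == CLA:                    # no operand needed
            ac = 0
        elif opcode == HALT:
            return ac
        else:
            raise ValueError("unknown opcode %d" % opcode)

memory = [0] * 4096                 # 4096 words of 16 bits
memory[0] = (LOAD << 12) | 100      # program: AC <- M[100]
memory[1] = (ADD << 12) | 101       #          AC <- AC + M[101]
memory[2] = (STORE << 12) | 102     #          M[102] <- AC
memory[3] = HALT << 12
memory[100], memory[101] = 7, 5     # data stored in the same memory

result = run(memory)
print(result, memory[102])
```

Note that the program and its data sit in the same memory array, which is the essence of the stored program concept.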
PARALLEL COMPUTERS
The organization we studied was a very basic one, but sometimes we have very large
computations for which one processor with a general architecture will not be of much help.
We then take the help of many processors, or divide the processor functions among many
functional units, or perform the same computation on many data values at once. To provide
solutions for all these cases, we have various types of computers.
Architectural Classification
– Flynn's classification
• Based on the multiplicity of Instruction Streams and Data Streams
• Instruction Stream
– Sequence of Instructions read from memory
• Data Stream
– Operations performed on the data in the processor
Fig 2: Classification according to Instruction and Data stream
• There are a variety of ways parallel processing can be classified.
• M.J.Flynn considered the organization of a computer system by the number of
instructions and data items manipulated simultaneously.
• The normal operation of a computer is to fetch instructions from memory and
execute them in the processor.
                                 Number of Data Streams
                                 Single      Multiple
Number of Instruction  Single    SISD        SIMD
Streams                Multiple  MISD        MIMD
• The sequence of instructions read from memory constitutes an instruction
stream.
• The operations performed on the data in the processor constitute a data
stream.
• Parallel processing can be implemented with either instruction stream, data stream
or both.
SISD COMPUTER SYSTEMS
SISD (Single instruction single data stream) is the simplest computer available. It
contains no parallelism. It has a single instruction stream and a single data stream.
The instructions associated with SISD are executed sequentially and the system may or
may not have external parallel processing capabilities.
Fig 3: SISD Architecture
Characteristics
- Standard von Neumann machine
- Instructions and data are stored in memory
- One operation at a time
Limitations
Von Neumann bottleneck
Maximum speed of the system is limited by the
Memory Bandwidth (bits/sec or bytes/sec)
- Limitation on Memory Bandwidth
- Memory is shared by CPU and I/O
Examples: Superscalar processors
Super pipelined processors
VLIW
[Figure 3 labels: Control Unit, Processor Unit, Memory, Instruction stream, Data stream]
MISD COMPUTER SYSTEMS
MISD (Multiple instruction, single data stream) is of little practical use, as there are
very few cases in which many instructions are executed on a single data stream.
Fig 4: MISD Architecture
• Characteristics
- There is no computer at present that can be
Classified as MISD
SIMD COMPUTER SYSTEMS
SIMD (Single instruction Multiple data stream) is the computer where a single instruction
is applied to different sets of data. It is executed with the help of many
processing units controlled by a single control unit. The shared memory must contain
multiple modules so that it can communicate with all the processors at the same time.
• Main memory is used for storage of programs.
• The master control unit decodes the instruction and determines the instruction to be
executed.
[Figure 4 labels: Memory, memory modules M1 ... Mn, control units CU1 ... CUn, processors P1 ... Pn, Instruction stream, Data stream]
[Figure 5 labels: Memory, Control Unit, Alignment network, Processor units P1 ... Pn, Memory modules M1 ... Mn, Data bus, Instruction stream, Data stream]
Fig 5: SIMD Architecture
• Characteristics
- Only one copy of the program exists
- A single controller executes one instruction at a time
Examples:
Array processors
Systolic arrays
Associative processors
MIMD COMPUTER SYSTEMS
MIMD (Multiple instruction, multiple data stream) refers to a computer system where we
have different processing elements working on different data.
This class includes the various multiprocessors and multicomputers.
• Characteristics
- Multiple processing units
- Execution of multiple instructions on multiple data
Fig 6: MIMD Architecture
• Types of MIMD computer systems
- Shared memory multiprocessors
• UMA
• NUMA
- Message-passing multi computers
SHARED MEMORY MULTIPROCESSORS
Example systems
Bus and cache-based systems
- Sequent Balance, Encore Multimax
Multistage IN-based systems
- Ultracomputer, Butterfly, RP3, HEP
Crossbar switch-based systems
- C.mmp, Alliant FX/8
Limitations
Memory access latency
Hot spot problem
[Figure labels: Interconnection Network; P1 M1, P2 M2 ... Pn Mn; Shared Memory]
SHARED MEMORY MULTIPROCESSORS (UMA)
Fig 7: Uniform Memory access(UMA)
Characteristics
All processors have equally direct access to one large memory address space. Thus
the access time to reach that memory is the same for all processors, hence the name UMA.
SHARED MEMORY MULTIPROCESSORS (NUMA)
[Figure labels for Fig 7 and Fig 8: Interconnection Network; processors P1, P2 ... Pn; memories M1, M2 ... Mn and M, M, M]
Fig 8: NUMA (Non uniform memory access)
Characteristics
All processors share one large memory address space, but each processor also has its own
local memory. Thus the access time depends on which memory a processor is reaching and
differs from processor to processor, hence the name NUMA.
MESSAGE-PASSING MULTICOMPUTER
Fig 9: Message passing multi computer Architecture
Characteristics
- Interconnected computers
- Each processor has its own memory, and communicates via message-passing
Example systems
- Tree structure: Teradata, DADO
- Mesh-connected: Rediflow, Series 2010, J-Machine
- Hypercube: Cosmic Cube, iPSC, NCUBE, FPS T Series, Mark III
Limitations
- Communication overhead
- Hard to program
[Figure 9 labels: Message-Passing Network with point-to-point connections; processors P1, P2 ... Pn, each with its own memory M]
Summary:
· Stored Program Control Concept: In this type of organization, instructions (the
program) and operands (the data) are both stored in memory, in separate parts of it.
· Flynn’s classification of computers: It classifies computers by their
instruction streams and data streams, which results in:
o SISD (Single instruction Single data)
o SIMD (Single instruction Multiple data)
o MISD (Multiple instruction Single data)
o MIMD (Multiple instruction Multiple data)
Important Questions:
Q1. Explain stored program control concept.
Q2. Explain Flynn’s classification of computers.
Q3. Describe the concept of data stream and instruction stream.
Lecture -5
MULTILEVEL VIEWPOINT OF A MACHINE
· MACRO ARCHITECTURE
· ISA
· MICRO ARCHITECTURE
CPU
CACHES
MAIN MEMORY AND SECONDARY MEMORY UNITS
INPUT / OUTPUT MAPPING
After discussing the stored program control concept and the various types of parallel
computers, let's study the different components of the computer structure.
MULTILEVEL VIEWPOINT OF A MACHINE
Our computer is built on various layers.
These layers are basically divided into:
Software layer
Hardware Layer
Instruction Set Architecture
Fig 1: Multilevel viewpoint of a machine
Computer system architecture is decided on the basis of the type of applications or usage
of the computer.
The computer architect decides the different layers and the function of each layer for a
specific computer.
These layers or functions of each can vary from one organization to another.
Our layered architecture is basically divided into 3 parts:
Macro-Architecture: as a unit of deployment, we will talk about Client
applications and COM Servers.
Computer Architecture is the conceptual design and fundamental operational structure
of a computer system. It is a blueprint and functional description of requirements
(especially speeds and interconnections) and design implementations for the various parts
of a computer .
• This is basically the software layer of the computer.
• It comprises:
– User Application layer
The user layer provides the interface between the user and the computer for which
the computer is designed. At this layer the user specifies what processing has to
be done. The requirements given by the user have to be implemented by the computer
architect with the help of the other layers.
[Figure 1 labels:
SOFTWARE LAYER (MACRO ARCHITECTURE): USER APPLICATION LAYER; COMPILER; ASSEMBLER; OS – MSDOS, WINDOWS, UNIX / LINUX
INSTRUCTION SET ARCHITECTURE (ISA)
HARDWARE LAYER (MICRO ARCHITECTURE): PROCESSOR MEMORY I/O SYSTEM; DATA PATH AND CONTROL; GATE LEVEL DESIGN; CIRCUIT LEVEL DESIGN; SILICON LAYOUT LAYER]
– High level language
High-level programming language is a programming language with strong
abstraction from the details of the computer. In comparison to low-level
programming languages, it may use natural language elements, be easier to
use, or more portable across platforms. Such languages hide the details of
CPU operations such as memory access models and management of
scope. E.g. C, Fortran, Pascal. These are not computer dependent.
– Assembly language
Assembly Language refers to the lowest-level human-readable method for
programming a particular computer. Assembly Languages are platform
specific, and therefore there is a different Assembly Language necessary for
programming every different type of computer.
– Machine language
Machine languages consist entirely of numbers and are almost impossible for
humans to read and write.
– Operating system
Operating systems interface with hardware to provide the necessary services
for application software. E.g. MS-DOS, Windows, Linux, UNIX etc.
• Functions of Operating system:
– Process management
– Memory management
– File management
– Device management
– Error Detection
– Security
• Types of Operating system:
– Multiprogramming Operating System
– Multiprocessing Operating system
– Time Sharing Operating system
– Real time Operating system
– Distributed Operating system
– Network Operating system
– Compiler
Software that translates a program written in a high-level programming
language (C/C++, COBOL, etc.) into machine language. A compiler usually
generates assembly language first and then translates the assembly language
into machine language. A utility known as a "linker" then combines all
required machine language modules into an executable program that can run
in the computer.
– Assembler is the software that translates assembly language into machine
language. Contrast with compiler, which is used to translate a high-level
language, such as COBOL or C, into assembly language first and then into
machine language.
Instruction set architecture: This is an abstraction on the interface between the
hardware and the low-level software. It deals with the functional behaviour of a
computer system as viewed by a programmer . Computer organization deals with
structural relationships that are not visible by a programmer. Instruction set
architecture is the attribute of a computing system, as seen by the assembly
language programmer or compiler.
ISA is determined by:
· Data Storage.
· Memory Addressing Modes.
· Operations in the Instruction Set.
· Instruction Formats.
· Encoding the Instruction Set.
· Compiler’s View.
Micro-Architecture: inside a unit of deployment we will talk about running
process, COM apartment, thread concurrency and synchronization, memory
sharing.
Micro architecture, also known as computer organization, is a lower-level, more
concrete description of the system that involves how the constituent parts of the
system are interconnected and how they interoperate in order to implement the
ISA. The size of a computer’s cache, for instance, is an organizational issue that
generally has nothing to do with the ISA.
· Processor memory I/O system – These are the basic hardware
devices required for the processing of any system application.
· Data path and control – Different computers have different
numbers and types of registers and other logic circuits. The data path
and control decide the flow of information within the various
parts of the computer system in various circuits.
· Gate level design – Circuits such as registers, counters etc. are
implemented in the form of the various gates available.
· Circuit level design – To combine the gates into a logical circuit or a
component we have the basic circuit level design, which ultimately
gives rise to all the hardware components of a computer system.
· Silicon layout layer
Other than the architecture of the computer, we have some very basic units which are
important for our computer.
Memory units:
· Main Memory: The main memory of the computer is also known as RAM,
standing for Random Access Memory. It is constructed from integrated circuits
and needs to have electrical power in order to maintain its information. When
power is lost, the information is lost too! It can be directly accessed by the CPU.
· Caches: A CPU cache is a cache used by the central processing unit of a
computer to reduce the average time to access memory. The cache is a smaller,
faster memory which stores copies of the data from the most frequently used main
memory locations. Cache memory is random access memory (RAM) that a
computer microprocessor can access more quickly than it can access regular
RAM. As the microprocessor processes data, it looks first in the cache memory
and if it finds the data there (from a previous reading of data), it does not have to
do the more time-consuming reading of data from larger memory.
· Secondary Memory: Secondary memory which is sometimes called backing
store or external memory, allows the permanent storage of large quantities of data.
Example: hard disk, floppy disk, CDs etc.
CPU: A central processing unit (CPU) is a machine that can execute computer
programs. The fundamental operation of most CPUs, regardless of the physical form they
take, is to execute a sequence of stored instructions called a program. The program is
represented by a series of numbers that are kept in some kind of computer memory. There
are four steps that nearly all CPUs use in their operation: fetch, decode, execute, and
writeback.
I/O units: I/O refers to the communication between an information processing system
(such as a computer), and the outside world – possibly a human, or another information
processing system. Inputs are the signals or data received by the system, and outputs are
the signals or data sent from it.
Summary:
· Multilevel view point of a machine describes the complete structure of the
computer system in a hierarchical manner, which comprises:
o Macro Architecture: Software components
– Operating system
– High level language
– Assembly language
– Compiler
– Assembler
o Micro Architecture: Hardware components
o ISA: How hardware components and software components are
connected. It describes
– Data Storage.
– Memory Addressing Modes.
– Operations in the Instruction Set.
– Instruction Formats.
– Encoding the Instruction Set.
– Compiler’s View
· Other than the structured organization of computer , other important
elements are:
o Memory
o CPU
o I/O
Important Questions:
Q1. Explain multi – level view point of a machine.
Q2. Describe micro architecture.
Q3. Describe macro architecture.
Q4. Explain ISA and why we call it a link between the hardware and software
components.
Q5. What is operating system?
Lecture – 6:
• CPU performance measures
• MIPS
• MFLOPS
After discussing all the elements of computer structure in the previous topics, we
describe the performance of a computer in this lecture with the help of performance
metrics.
• Performance of a machine is determined by:
– Instruction count
– Clock cycle time
– Clock cycles per instruction
• Processor design (datapath and control) will determine:
– Clock cycle time
– Clock cycles per instruction
• Single cycle processor - one clock cycle per instruction
– Advantages: Simple design, low CPI
– Disadvantages: Long cycle time, which is limited by the slowest
instruction
• We have different methods to calculate the performance of a CPU or to compare
two CPUs, but the result depends highly on what type of instructions we give to
these CPUs.
• The two measures we generally use are:
– MIPS
– MFLOPS
MIPS:
• For a specific program running on a specific computer MIPS is a measure of
how many millions of instructions are executed per second:
MIPS = Instruction count / (Execution Time x 10^6)
= Instruction count / (CPU clocks x Cycle time x 10^6)
= (Instruction count x Clock rate) /
(Instruction count x CPI x 10^6)
= Clock rate / (CPI x 10^6)
[Figure labels: CPI, Inst. Count, Cycle Time]
• Faster execution time usually means a higher MIPS rating.
MIPS is a useful measure, but it also has some pitfalls.
Problems with MIPS rating:
· No account for the instruction set used.
· Program-dependent: A single machine does not have a single MIPS rating
since the MIPS rating may depend on the program used.
· Easy to abuse: Program used to get the MIPS rating is often omitted.
· Cannot be used to compare computers with different instruction sets.
· A higher MIPS rating in some cases may not mean higher performance or
better execution time, e.g. due to compiler design variations.
• For a machine with instruction classes:
Instruction class   CPI
A                   1
B                   2
C                   3
• For a given program, two compilers produced the following instruction counts:
Instruction counts (in millions) for each instruction class
Code from:    A    B    C
Compiler 1    5    1    1
Compiler 2    10   1    1
• The machine is assumed to run at a clock rate of 100 MHz.
MIPS = Clock rate / (CPI x 10^6) = (100 x 10^6) / (CPI x 10^6)
CPI = CPU execution cycles / Instruction count
CPU time = Instruction count x CPI / Clock rate
• For compiler 1:
– CPI1 = (5 x 1 + 1 x 2 + 1 x 3) / (5 + 1 + 1) = 10 / 7 = 1.43
– MIPS1 = (100 x 10^6) / (1.428 x 10^6) = 70.0
– CPU time1 = ((5 + 1 + 1) x 10^6 x 1.43) / (100 x 10^6) = 0.10 seconds
• For compiler 2:
– CPI2 = (10 x 1 + 1 x 2 + 1 x 3) / (10 + 1 + 1) = 15 / 12 = 1.25
– MIPS2 = (100 x 10^6) / (1.25 x 10^6) = 80.0
– CPU time2 = ((10 + 1 + 1) x 10^6 x 1.25) / (100 x 10^6) = 0.15 seconds
• Compiler 2 has the higher MIPS rating, yet its code takes longer to run.
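The compiler comparison can be reproduced with a few lines of Python. The sketch uses the CPI values per instruction class, the instruction counts in millions and the 100 MHz clock rate from the example; the function and variable names are illustrative.

```python
CLOCK_RATE_HZ = 100e6                       # 100 MHz
CLASS_CPI = {"A": 1, "B": 2, "C": 3}        # cycles per instruction, per class

def cpi(counts_millions):
    """Average CPI = total clock cycles / total instruction count."""
    cycles = sum(counts_millions[k] * CLASS_CPI[k] for k in counts_millions)
    return cycles / sum(counts_millions.values())

def mips(cpi_value, clock_rate_hz=CLOCK_RATE_HZ):
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi_value * 1e6)

def cpu_time(counts_millions, clock_rate_hz=CLOCK_RATE_HZ):
    """CPU time = instruction count x CPI / clock rate, in seconds."""
    instruction_count = sum(counts_millions.values()) * 1e6
    return instruction_count * cpi(counts_millions) / clock_rate_hz

compiler1 = {"A": 5, "B": 1, "C": 1}
compiler2 = {"A": 10, "B": 1, "C": 1}

for name, counts in (("compiler 1", compiler1), ("compiler 2", compiler2)):
    print(name,
          "CPI = %.2f" % cpi(counts),
          "MIPS = %.1f" % mips(cpi(counts)),
          "CPU time = %.2f s" % cpu_time(counts))
```

The run shows the pitfall directly: the code with the higher MIPS rating is also the code with the longer CPU time.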
MFLOPS:
• MFLOPS, for a specific program running on a specific computer, is a measure of
millions of floating-point operations (megaflops) per second.
MFLOPS = Number of floating-point operations / (Execution time x 10^6)
• MFLOPS is a better comparison measure between different machines than MIPS,
but it also has some pitfalls.
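As a quick numeric sketch of the MFLOPS formula, the function below divides a floating-point operation count by the execution time in seconds times 10^6. The operation counts and times are made-up values for illustration only.

```python
def mflops(floating_point_ops, execution_time_s):
    """MFLOPS = number of floating-point operations / (execution time x 10^6)."""
    return floating_point_ops / (execution_time_s * 1e6)

# Hypothetical program: 50 million floating-point operations in 2 seconds.
numeric_rating = mflops(50e6, 2.0)

# Hypothetical program with no floating-point operations at all.
integer_only_rating = mflops(0, 2.0)

print(numeric_rating, integer_only_rating)
```

The second call illustrates the program dependence of the measure: a program without floating-point operations rates zero no matter how fast the machine is.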
Problems with MFLOPS:
• A floating-point operation is an addition, subtraction, multiplication, or division
operation applied to numbers represented by a single or a double precision
floating-point representation.
• Program-dependent: Different programs have different percentages of floating-point
operations present, e.g. compilers have no floating-point operations and
yield a MFLOPS rating of zero.
• Dependent on the type of floating-point operations present in the program.
Summary:
· Performance of a machine is determined by:
• Instruction count
• Clock cycle time
• Clock cycles per instruction
· MIPS = Instruction count / (Execution Time x 10^6)
· MFLOPS = Number of floating-point operations / (Execution time
x 10^6)
Important Questions:
Q1. What is MIPS?
Q2. What is MFLOPS?
Q3. What is the difference between MIPS and MFLOPS?
Q4. What are CPU performance measures?
Lecture – 7:
· Cache Memory
· Main Memory
· Secondary Memory
We have basically 3 types of memory attached to our processor.
Cache Memory
Main Memory
Secondary Memory
Primary storage, presently known as memory, is the only one directly accessible to the
CPU. The CPU continuously reads instructions stored there and executes them as
required. Any data actively operated on is also stored there in a uniform manner.
There are two more sub-layers of the primary storage, besides the main large-capacity RAM:
· Processor registers are located inside the processor. Each register typically holds a
word of data (often 32 or 64 bits). CPU instructions instruct the arithmetic and
logic unit to perform various calculations or other operations on this data (or with
the help of it). Registers are technically among the fastest of all forms of
computer data storage.
· Processor cache is an intermediate stage between ultra-fast registers and much
slower main memory. It's introduced solely to increase performance of the
computer. Most actively used information in the main memory is just duplicated
in the cache memory, which is faster, but of much lesser capacity. On the other
hand it is much slower, but much larger than processor registers. Multi-level
hierarchical cache setup is also commonly used—primary cache being smallest,
fastest and located inside the processor; secondary cache being somewhat larger
and slower.
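The way a cache duplicates actively used main-memory contents can be sketched with a tiny model. This is an assumption-laden illustration: the class name, the capacity of 2 entries and the first-in-first-out replacement rule are choices made here for simplicity, not a description of any particular processor.

```python
class Cache:
    """Small cache in front of a slower main memory (FIFO replacement)."""

    def __init__(self, memory, capacity):
        self.memory = memory        # main memory: a list indexed by address
        self.capacity = capacity    # number of entries the cache can hold
        self.lines = {}             # address -> value, in insertion order
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:   # found in cache: fast path
            self.hits += 1
            return self.lines[address]
        self.misses += 1            # not found: go to main memory
        value = self.memory[address]
        if len(self.lines) >= self.capacity:
            oldest = next(iter(self.lines))
            del self.lines[oldest]  # evict the oldest entry
        self.lines[address] = value
        return value

main_memory = list(range(100, 116))     # 16 words of hypothetical data
cache = Cache(main_memory, capacity=2)
values = [cache.read(a) for a in (3, 3, 4, 3, 5, 3)]
print(values, "hits:", cache.hits, "misses:", cache.misses)
```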
These are the types of memory accessed when we work with the processor. But if we have
to store some data permanently, we need the help of secondary or auxiliary memory.
Secondary memory (or secondary storage) is the slowest and cheapest form of memory. Its
contents cannot be processed directly by the CPU; they must first be copied into primary
storage (also known as RAM).
Secondary memory devices include magnetic disks like hard drives and floppy disks;
optical disks such as CDs and CD-ROMs; and magnetic tapes, which were the first forms
of secondary memory.
Primary memory                           Secondary memory
1. Fast                                  1. Slow
2. Expensive                             2. Cheap
3. Low capacity                          3. Large capacity
4. Connects directly to the processor    4. Not connected directly to the processor
Hard Disks:
Hard disks, like cassette tapes, use magnetic recording techniques: the magnetic medium
can be easily erased and rewritten, and it will "remember" the magnetic flux patterns stored onto
the medium for many years.
A hard drive consists of platters, a control circuit board and interface parts.
A hard disk is a sealed unit containing a number of platters in a stack. Hard disks may be mounted
in a horizontal or a vertical position. In this description, the hard drive is mounted horizontally.
Electromagnetic read/write heads are positioned above and below each platter. As the platters spin,
the drive heads move in toward the center surface and out toward the edge. In this way, the drive
heads can reach the entire surface of each platter.
On a hard disk, data is stored in thin, concentric bands. A drive head, while in one position can
read or write a circular ring, or band called a track. There can be more than a thousand tracks on a
3.5-inch hard disk. Sections within each track are called sectors. A sector is the smallest physical
storage unit on a disk, and is almost always 512 bytes (0.5 kB) in size.
The stack of platters rotates at a constant speed. The drive head, while positioned close to the center
of the disk, reads from a surface that is passing by more slowly than the surface at the outer edges
of the disk. To compensate for this physical difference, tracks near the outside of the disk are less
densely populated with data than the tracks near the center of the disk. The result of the different
data density is that the same amount of data can be read over the same period of time, from any
drive head position.
The disk space is filled with data according to a standard plan. One side of one platter contains
space reserved for hardware track-positioning information and is not available to the operating
system. Thus, a disk assembly containing two platters has three sides available for data.
Track-positioning data is written to the disk during assembly at the factory. The system disk controller
reads this data to place the drive heads in the correct sector position.
Magnetic Tapes:
An electric current in a coil of wire produces a magnetic field similar to that of a bar magnet, and
that field is much stronger if the coil has a ferromagnetic (iron-like) core.
Tape heads are made from rings of ferromagnetic material with a gap where the tape contacts it so
the magnetic field can fringe out to magnetize the emulsion on the tape. A coil of wire around the
ring carries the current to produce a magnetic field proportional to the signal to be recorded. If an
already magnetized tape is passed beneath the head, it can induce a voltage in the coil. Thus the
same head can be used for recording and playback.
Lecture – 8:
• Instruction Set based classification of computers
– Three address instructions
– Two address instructions
– One address instructions
– Zero address instructions
– RISC address instructions
– CISC address instructions
– RISC Vs CISC
In the last chapter we discussed the various architectures and the layers of the computer
architecture. In this chapter we are explaining the middle layer of the multilevel view
point of a machine i.e. Instruction Set Architecture.
Instruction Set Architecture (ISA) is an abstraction on the interface between the hardware
and the low-level software.
It comprises:
Instruction Formats.
Memory Addressing Modes.
Operations in the Instruction Set.
Encoding the Instruction Set.
Data Storage.
Compiler’s View.
Instruction Format
The instruction format is the representation of the instruction. It contains the following instruction fields :
· Opcode field – specifies the operation to be performed
· Address field(s) – designate memory address(es) or processor register(s)
· Mode field(s) – determine how the address field is to be interpreted to get the
effective address or the operand
• The number of address fields in the instruction format :
depends on the internal organization of the CPU
• The three most common CPU organizations :
- Single accumulator organization :
ADD X /* AC ← AC + M[X] */
- General register organization :
ADD R1, R2, R3 /* R1 ← R2 + R3 */
ADD R1, R2 /* R1 ← R1 + R2 */
MOV R1, R2 /* R1 ← R2 */
ADD R1, X /* R1 ← R1 + M[X] */
- Stack organization :
PUSH X /* TOS ← M[X] */
ADD
Address Instructions:
Three-address Instructions
- Program to evaluate X = (A + B) * (C + D) :
ADD R1, A, B /* R1 ← M[A] + M[B] */
ADD R2, C, D /* R2 ← M[C] + M[D] */
MUL X, R1, R2 /* M[X] ← R1 * R2 */
- Results in short program
- Instruction becomes long (many bits)
• Two-address Instructions
- Program to evaluate X = (A + B) * (C + D) :
MOV R1, A /* R1 ← M[A] */
ADD R1, B /* R1 ← R1 + M[B] */
MOV R2, C /* R2 ← M[C] */
ADD R2, D /* R2 ← R2 + M[D] */
MUL R1, R2 /* R1 ← R1 * R2 */
MOV X, R1 /* M[X] ← R1 */
One-address Instructions
- Use an implied AC register for all data manipulation
- Program to evaluate X = (A + B) * (C + D) :
LOAD A /* AC ← M[A] */
ADD B /* AC ← AC + M[B] */
STORE T /* M[T] ← AC */
LOAD C /* AC ← M[C] */
ADD D /* AC ← AC + M[D] */
MUL T /* AC ← AC * M[T] */
STORE X /* M[X] ← AC */
• Zero-address Instructions
- Can be found in a stack-organized computer
- Program to evaluate X = (A + B) * (C + D) :
PUSH A /* TOS ← M[A] */
PUSH B /* TOS ← M[B] */
ADD /* TOS ← (A + B) */
PUSH C /* TOS ← M[C] */
PUSH D /* TOS ← M[D] */
ADD /* TOS ← (C + D) */
MUL /* TOS ← (C + D) * (A + B) */
POP X /* M[X] ← TOS */
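The zero-address program above can be traced with a small stack-machine sketch. This is only an illustration: the function name run_stack_program, the tuple encoding of the instructions and the sample values of A, B, C and D are assumptions, not part of the course text.

```python
def run_stack_program(program, memory):
    """Execute (opcode, operand) pairs on a simple stack machine."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(memory[arg])        # TOS <- M[arg]
        elif op == "POP":
            memory[arg] = stack.pop()        # M[arg] <- TOS
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()  # pop the two top elements
            stack.append(a + b)              # push their sum
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)              # push their product
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory


# X = (A + B) * (C + D), written exactly as the zero-address program
PROGRAM = [
    ("PUSH", "A"), ("PUSH", "B"), ("ADD", None),
    ("PUSH", "C"), ("PUSH", "D"), ("ADD", None),
    ("MUL", None), ("POP", "X"),
]

memory = {"A": 2, "B": 3, "C": 4, "D": 5}   # hypothetical operand values
run_stack_program(PROGRAM, memory)
print(memory["X"])                           # (2 + 3) * (4 + 5)
```

Note that ADD and MUL carry no address at all: both operands are implied to be on top of the stack.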
CISC(Complex Instruction Set Computer)
• These computers with many instructions and addressing modes came to be known
as Complex Instruction Set Computers (CISC)
• One goal for CISC machines was to have a machine language instruction to match
each high-level language statement type.
Criticisms on CISC
-Complex Instruction
→ Format, Length, Addressing Modes
→ Complicated instruction cycle control due to the complex decoding HW
and decoding process
- Multiple memory cycle instructions
→ Operations on memory data
→ Multiple memory accesses/instruction
- Microprogrammed control is necessity
→ Microprogram control storage takes substantial portion of CPU chip area
→ Semantic Gap is large between machine instruction and microinstruction
- General purpose instruction set includes all the features required by
individually different applications
→ When any one application is running, all the features required by
the other applications are extra burden to the application
RISC
In the late ‘70s - early ‘80s, there was a reaction to the shortcomings of the CISC style of
processors
– Reduced Instruction Set Computers (RISC) were proposed as an
alternative
• The underlying idea behind RISC processors is to simplify the instruction set and
reduce instruction execution time
Note : In RISC-type instructions we cannot access memory operands directly in
arithmetic instructions.
Evaluate X = (A + B) * (C + D) :
MOV R1, A /* R1 ← M[A] */
MOV R2, B /* R2 ← M[B] */
ADD R1, R1, R2 /* R1 ← R1 + R2 */
MOV R2, C /* R2 ← M[C] */
MOV R3, D /* R3 ← M[D] */
ADD R2, R2, R3 /* R2 ← R2 + R3 */
MUL R1, R1, R2 /* R1 ← R1 * R2 */
MOV X, R1 /* M[X] ← R1 */
• RISC processors often feature:
– Few instructions
– Few addressing modes
– Only load and store instructions access memory
– All other operations are done using on-processor registers
– Fixed length instructions
– Single cycle execution of instructions
– The control unit is hardwired, not microprogrammed
Since all (but the load and store instructions) use only registers for operands,
– only a few addressing modes are needed
• By having all instructions the same length :
– reading them in is easy and fast
• The fetch and decode stages are simple, looking much more like Mano's Basic
Computer (BC) than a CISC machine
– The instruction and address formats are designed to be easy to decode
– (Unlike the variable length CISC instructions,) the opcode and register
fields of RISC instructions can be decoded simultaneously
• The control logic of a RISC processor is designed to be simple and fast :
– The control logic is simple because of the small number of instructions and
the simple addressing modes
– The control logic is hardwired, rather than microprogrammed, because
hardwired control is faster
ADVANTAGES OF RISC
VLSI Realization
- Control area is considerably reduced
⇒ RISC chips allow a large number of registers on the chip
- Enhancement of performance and HLL support
- Higher regularization factor and lower VLSI design cost
• Computing Speed
- Simpler, smaller control unit ⇒ faster
- Simpler instruction set; addressing modes; instruction format
⇒ faster decoding
- Register operation ⇒ faster than memory operation
- Register window ⇒ enhances the overall speed of execution
- Identical instruction length, One cycle instruction execution
⇒ suitable for pipelining ⇒ faster
Design Costs and Reliability
- Shorter time to design
⇒ reduction in the overall design cost and reduces the problem that the end
product will be obsolete by the time the design is completed
- Simpler, smaller control unit
⇒ higher reliability
- Simple instruction format (of fixed length)
⇒ ease of virtual memory management
• High Level Language Support
- A single choice of instruction ⇒ shorter, simpler compiler
- A large number of CPU registers ⇒ more efficient code
- Register window ⇒ Direct support of HLL
- Reduced burden on compiler writer
RISC VS CISC
• The CISC Approach
In the CISC approach, the entire task of multiplying two numbers can be completed
with one instruction:
– MULT 2:3, 5:2
• One of the primary advantages of this system is that the compiler has to do very
little work to translate a high-level language statement into assembly. Because the
length of the code is relatively short, very little RAM is required to store
instructions. The emphasis is put on building complex instructions directly into
the hardware.
• The RISC Approach
In order to perform the exact series of steps described in the CISC approach, a
programmer would need to code four lines of assembly:
LOAD A, 2:3
LOAD B, 5:2
PROD A, B
STORE 2:3, A
• At first, this may seem like a much less efficient way of completing the
operation. Because there are more lines of code, more RAM is needed to store the
assembly level instructions. The compiler must also perform more work to
convert a high-level language statement into this form.
RISC vs CISC
CISC                                                          RISC
Emphasis on hardware                                          Emphasis on software
Includes multi-clock complex instructions                     Single-clock, reduced instruction only
Memory-to-memory: "LOAD" and "STORE" incorporated             Register to register: "LOAD" and "STORE"
in instructions                                               are independent instructions
Small code sizes                                              Large code sizes
High cycles per second                                        Low cycles per second
Transistors used for storing complex instructions             Spends more transistors on memory registers
Summary:
· The instruction format is composed of the opcode field, address field, and mode field.
· The different types of address instructions used are three-address, two-address,
one-address and zero-address.
· RISC and CISC Introduction with its advantages and criticism
· RISC Vs CISC
Important Questions:
Q1.Explain the different addressing formats in detail with example.
Q2.Explain RISC AND CISC with their advantages and criticisms.
Q3 Numerical
Lecture – 9:
• Addressing modes
– Implied Mode
– Immediate Mode
– Register Mode
– Register Indirect Mode
– Autoincrement or Autodecrement Mode
– Direct Addressing Mode
– Indirect Addressing Mode
– Relative addressing Mode
In the last lecture we studied the instruction formats, now we study how the instructions
use the addressing modes of different types.
Addressing Modes
* Specifies a rule for interpreting or modifying the address field of the instruction
(before the operand is actually referenced)
* Variety of addressing modes
- to give programming flexibility to the user
- to use the bits in the address field of the instruction efficiently
In simple words, an addressing mode is the way in which an operand (or data) is located
and fetched, whether from a register or from memory.
TYPES OF ADDRESSING MODES
• Implied Mode
: Addresses of the operands are specified implicitly in the definition of the
instruction
- No need to specify address in the instruction
- EA = AC, or EA = Stack[SP]
- Examples from BC : CLA, CME, INP
• Immediate Mode
: Instead of specifying the address of the operand, the operand itself is specified
- No need to specify address in the instruction
- However, operand itself needs to be specified
- (-)Sometimes, require more bits than the address
- (+) Fast to acquire an operand
- Useful for initializing registers to a constant value
• Register Mode
: Address specified in the instruction is the register address
- Designated operand needs to be in a register
- (+) Shorter address than the memory address
-- Saving address field in the instruction
- (+) Faster to acquire an operand than the memory addressing
- EA = IR(R) (IR(R): Register field of IR)
• Register Indirect Mode
: Instruction specifies a register which contains the memory address of the
operand
- (+) Saving instruction bits since register address is shorter than the memory
address
- (-) Slower to acquire an operand than either register addressing or memory
addressing
- EA = [IR(R)] ([x]: Content of x)
• Autoincrement or Autodecrement Mode
- Similar to the register indirect mode except :
When the address in the register is used to access memory, the value in the
register is incremented or decremented by 1 automatically
• Direct Address Mode
: Instruction specifies the memory address which can be used directly to access
the memory
- (+) Faster than the other memory addressing modes
- (-) Too many bits are needed to specify the address for a large physical memory
space
- EA = IR(addr) (IR(addr): address field of IR)
- E.g., the address field in a branch-type instr
• Indirect Addressing Mode
: The address field of an instruction specifies the address of a memory location
that contains the address of the operand
- (-) Slow to acquire an operand because of an additional memory access
- EA = M[IR(address)]
• Relative Addressing Modes
: The address field of an instruction specifies a part of the address
(abbreviated address) which can be used along with a designated
register to calculate the address of the operand
--> Effective addr = addr part of the instr + content of a special register
- (+) Large physical memory can be accessed with a small number of
address bits
- EA = f(IR(address), R), R is sometimes implied
--> Typically EA = IR(address) + R
- 3 different Relative Addressing Modes depending on R
* (PC) Relative Addressing Mode (R = PC)
* Indexed Addressing Mode (R = IX, where IX: Index Register)
* Base Register Addressing Mode (R = BAR(Base Addr Register))
* Indexed addressing mode vs. Base register addressing mode
- IR(address) (addr field of instr) : base address vs. displacement
- R (index/base register) : displacement vs. base address
- Difference: the way they are used (NOT the way they are computed)
* indexed addressing mode : processing many operands in an array using the same instr
* base register addressing mode : facilitate the relocation of programs in memory in
multiprogramming systems
Addressing Modes: Examples
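A small numerical illustration of the modes listed above, written as a Python sketch. The memory contents, the register values (PC, R1, XR) and the function name load_operand are hypothetical values chosen for the illustration; they are not taken from the lecture.

```python
def load_operand(mode, field, mem, regs):
    """Return (effective_address, operand) for one addressing mode.

    The effective address is None when the operand does not come from memory.
    For register-based modes `field` is a register name, otherwise a number.
    """
    if mode == "immediate":            # operand is in the instruction itself
        return None, field
    if mode == "register":             # operand is in the named register
        return None, regs[field]
    if mode == "register_indirect":    # register holds the operand address
        ea = regs[field]
    elif mode == "autoincrement":      # use the address, then increment the register
        ea = regs[field]
        regs[field] += 1
    elif mode == "autodecrement":      # decrement the register, then use the address
        regs[field] -= 1
        ea = regs[field]
    elif mode == "direct":             # EA = address field
        ea = field
    elif mode == "indirect":           # EA = M[address field]
        ea = mem[field]
    elif mode == "relative":           # EA = PC + address field
        ea = regs["PC"] + field
    elif mode == "indexed":            # EA = XR + address field
        ea = regs["XR"] + field
    else:
        raise ValueError(f"unknown mode: {mode}")
    return ea, mem[ea]


def fresh_regs():
    """Hypothetical register contents used for every example."""
    return {"PC": 202, "R1": 400, "XR": 100}


MEM = {399: 450, 400: 700, 500: 800, 600: 900, 702: 325, 800: 300}

for mode, field in [("direct", 500), ("immediate", 500), ("indirect", 500),
                    ("relative", 500), ("indexed", 500), ("register", "R1"),
                    ("register_indirect", "R1"), ("autoincrement", "R1"),
                    ("autodecrement", "R1")]:
    ea, operand = load_operand(mode, field, MEM, fresh_regs())
    print(f"{mode:18} EA={ea} operand={operand}")
```

The same address field (500) yields a different operand in each mode, which is the flexibility the addressing modes are meant to provide.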
Summary:
· Addressing Modes: Specifies a rule for interpreting or modifying the address field
of the instruction.
· The different types of addressing modes are: Implied mode, Immediate mode,
Register mode, Register indirect mode, Autoincrement or auto decrement mode,
Direct mode, Indirect mode, Relative addressing mode.
Important Questions:
Q1. Explain the addressing modes with suitable examples.
Lecture – 10:
• Instruction set
– Data Transfer Instructions
o Typical Data Transfer Instructions
o Data Transfer Instructions with Different Addressing
Modes
– Data Manipulation Instructions
o Arithmetic instructions
o Logical and bit manipulation instructions
o Shift instructions
– Program Control Instructions
o Conditional Branch Instructions
o Subroutine Call & Return
DATA TRANSFER INSTRUCTIONS
These are the instructions used only for the transfer of data from
registers to registers, between registers and memory operands, and to other
memory components. No manipulation is done on the data values.
In the typical data transfer instructions no particular addressing mode is
implied; data is transferred directly between the various registers
and memory components.
Typical Data Transfer Instructions
Name        Mnemonic
Load        LD
Store       ST
Move        MOV
Exchange    XCH
Input       IN
Output      OUT
Push        PUSH
Pop         POP
Table 3.1
· Load and Store are used for the transfer of data to and from the
accumulator.
    LD 20
    ST D
· Move and Exchange are used for data transfer between the various
general purpose registers.
    MOV R1,R2
    MOV R1,X
    XCH R1,R2
· Input and Output are used for data transfer between processor registers
and I/O devices.
· Push and Pop operations are used for data transfer between processor
registers and the memory stack.
· Data Transfer Instructions with Different Addressing Modes
In these types of data transfer we use different addressing modes for loading the operand value
into the accumulator register.
Mode                 Assembly Convention    Register Transfer
Direct address       LD ADR                 AC ← M[ADR]
Indirect address     LD @ADR                AC ← M[M[ADR]]
Relative address     LD $ADR                AC ← M[PC + ADR]
Immediate operand    LD #NBR                AC ← NBR
Index addressing     LD ADR(X)              AC ← M[ADR + XR]
Register             LD R1                  AC ← R1
Register indirect    LD (R1)                AC ← M[R1]
Autoincrement        LD (R1)+               AC ← M[R1], R1 ← R1 + 1
Autodecrement        LD -(R1)               R1 ← R1 - 1, AC ← M[R1]
Table 3.2
DATA MANIPULATION INSTRUCTIONS
· Three Basic Types:
- Arithmetic instructions
- Logical and bit manipulation instructions
- Shift instructions
· Arithmetic Instructions : These are the instructions used for arithmetic
calculations like addition, subtraction, increment etc.
· Logical and Bit Manipulation Instructions
These are the instructions in which operations are performed on strings
of bits. The bits are treated individually, so an operation can be applied to a
single bit or a group of bits regardless of the value of the whole operand, and
even the insertion of new bits is possible.
For example:
CLR R1 will make all the bits 0.
COM R1 will invert all the bits.
AND, OR and XOR operate on each pair of corresponding bits of the
two operands.
E.g.: AND of 0011 and 1100 results in
0000.
The AND instruction is also known as the mask instruction: to mask
(clear) some bits of an operand we AND it with a value that has 0s in those
positions and 1s (high) in all the others.
E.g.: Suppose we have to mask a register holding the value 11000110
on its 1st, 3rd and 7th bits (counting from the left). Then we AND it with
the value 01011101, which gives 01000100.
CLRC, SETC and COMC work on only 1 bit, i.e. the
carry.
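The masking example can be checked with a few lines of Python. This is a sketch only: the helper mask_for_positions and its convention of counting bit positions from the left, starting at 1, are assumptions made to match the numbers used in the text.

```python
def mask_for_positions(positions, width=8):
    """Build a mask with 0s at the given positions (1 = leftmost bit) and 1s elsewhere."""
    mask = (1 << width) - 1                 # start with all ones
    for p in positions:
        mask &= ~(1 << (width - p))         # clear the bit at position p
    return mask


value = 0b11000110
mask = mask_for_positions([1, 3, 7])        # clears the 1st, 3rd and 7th bits
result = value & mask

print(format(mask, "08b"))                  # the mask used in the text
print(format(result, "08b"))                # the masked register value
print(format(0b0011 & 0b1100, "04b"))       # the AND example from the text
```

Only the positions holding a 0 in the mask are forced to 0; every other bit of the register passes through unchanged.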
Name                       Mnemonic
Increment                  INC
Decrement                  DEC
Add                        ADD
Subtract                   SUB
Multiply                   MUL
Divide                     DIV
Add with Carry             ADDC
Subtract with Borrow       SUBB
Negate (2’s Complement)    NEG
Table 3.3
Similarly, EI and DI work on only 1 bit, the interrupt flip-flop, to
enable or disable it.
Name                 Mnemonic
Clear                CLR
Complement           COM
AND                  AND
OR                   OR
Exclusive-OR         XOR
Clear carry          CLRC
Set carry            SETC
Complement carry     COMC
Enable interrupt     EI
Disable interrupt    DI
Table 3.4
· Shift Instructions : These are the instructions which modify the whole
value of the operand by shifting its bits to the left or to the right.
· Say R1 has the value 11001100
o SHR shifts all bits right and inserts 0 at the leftmost position.
  Result: 01100110
o SHL shifts all bits left and inserts 0 at the rightmost position.
  Result: 10011000
o SHRA : the sign bit remains the same, while every bit is shifted
right.
  Result: 11100110
o SHLA is the same as SHL, inserting 0 at the rightmost position.
  Result: 10011000
o In ROR, all the bits are shifted towards the right and the rightmost one moves
to the leftmost position.
  Result: 01100110
o In ROL, all the bits are shifted towards the left and the leftmost one moves to
the rightmost position.
  Result: 10011001
o In the case of RORC, suppose the carry bit is 0 with register R1. All
the bits of the register are shifted right, the value of the carry
is moved to the leftmost position and the rightmost bit is
moved to the carry.
  Result: 01100110 with carry as 0
o Similarly, in the case of ROLC, all the bits of the register are shifted
left, the value of the carry is moved to the rightmost position and the leftmost
bit is moved to the carry.
  Result: 10011000 with carry as 1
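The shift and rotate results listed above can be reproduced with the following Python sketch for an 8-bit register. The function names simply mirror the mnemonics; the 8-bit width and the helper names are assumptions made for the illustration.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1          # 0xFF, keeps results within 8 bits
SIGN = 1 << (WIDTH - 1)          # 0x80, the sign bit


def shr(x):                      # logical shift right: 0 enters on the left
    return x >> 1

def shl(x):                      # logical shift left: 0 enters on the right
    return (x << 1) & MASK

def shra(x):                     # arithmetic shift right: sign bit is kept
    return (x >> 1) | (x & SIGN)

def shla(x):                     # arithmetic shift left: same bit pattern as SHL
    return shl(x)

def ror(x):                      # rotate right: rightmost bit re-enters on the left
    return (x >> 1) | ((x & 1) << (WIDTH - 1))

def rol(x):                      # rotate left: leftmost bit re-enters on the right
    return ((x << 1) & MASK) | (x >> (WIDTH - 1))

def rorc(x, carry):              # rotate right through carry, returns (value, new carry)
    return (x >> 1) | (carry << (WIDTH - 1)), x & 1

def rolc(x, carry):              # rotate left through carry, returns (value, new carry)
    return ((x << 1) & MASK) | carry, x >> (WIDTH - 1)


r1 = 0b11001100
for name, fn in [("SHR", shr), ("SHL", shl), ("SHRA", shra),
                 ("SHLA", shla), ("ROR", ror), ("ROL", rol)]:
    print(name, format(fn(r1), "08b"))
print("RORC", rorc(r1, 0))
print("ROLC", rolc(r1, 0))
```

For the rotate-through-carry functions the carry is passed in and returned explicitly, since Python has no carry flag of its own.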
PROGRAM CONTROL INSTRUCTIONS:
Before starting with program control instructions, let's study the concept of the PC, i.e.
the program counter. The program counter is the register which holds the address of the
next instruction to be executed. When the instruction pointed to by the PC is fetched from
memory, the PC changes its value to the address of the next instruction to be
fetched. For sequential instructions it simply increments itself; for
branching or modular programs it is loaded with the address of the first instruction of the
called program. After the execution of the called program, the program counter
points back to the instruction following the one from which the subprogram
was called. For go-to type instructions the value of the program counter is simply
changed without keeping any reference to the previous
instruction.
Name                       Mnemonic
Logical shift right        SHR
Logical shift left         SHL
Arithmetic shift right     SHRA
Arithmetic shift left      SHLA
Rotate right               ROR
Rotate left                ROL
Rotate right thru carry    RORC
Rotate left thru carry     ROLC
Table 3.5
Program Control Instructions: These instructions are used for the transfer of control to
other instructions. That is, they are used when the next instruction has to be executed
from some other location instead of in sequence.
The conditions can be :
- Calling a subprogram
- Returning to the main program
- Jumping to some other instruction or location
- Skipping instructions, as in break and exit, or when the condition being
checked is false, and so on
*CMP and TST instructions do not retain the results of their operations (- and AND,
respectively). They only set or clear certain flags.
Conditional Branch Instructions: These are the instructions in which we test some
condition and, depending on the result, either branch or continue sequentially.
[Figure: Updating the PC. Either +1 for in-line sequencing (the next instruction is fetched
from the next adjacent location in memory), or an address from another source (current
instruction, stack, etc.) for branch, conditional branch, subroutine, etc.]
Name               Mnemonic
Branch             BR
Jump               JMP
Skip               SKP
Call               CALL
Return             RTN
Compare (by -)     CMP
Test (by AND)      TST
Table 3.6
Subroutine Call and Return:
Subroutine Call (known under several names):
- Call Subroutine
- Jump to Subroutine
- Branch to Subroutine
- Branch & save return address
Two most important operations are implied:
*Branch to the beginning of the Subroutine
-Same as the branch or conditional branch
*Save the return address to get the address of the location of the calling program
upon exit from the subroutine.
· Location of storing return address:
- Fixed location in the subroutine (memory)
- Fixed location in memory
- In a processor register
- In a memory stack (most efficient way)
Mnemonic    Branch condition             Tested condition
BZ          Branch if zero               Z = 1
BNZ         Branch if not zero           Z = 0
BC          Branch if carry              C = 1
BNC         Branch if no carry           C = 0
BP          Branch if plus               S = 0
BM          Branch if minus              S = 1
BV          Branch if overflow           V = 1
BNV         Branch if no overflow        V = 0
Unsigned compare conditions (A - B)
BHI         Branch if higher             A > B
BHE         Branch if higher or equal    A ≥ B
BLO         Branch if lower              A < B
BLOE        Branch if lower or equal     A ≤ B
BE          Branch if equal              A = B
BNE         Branch if not equal          A ≠ B
Signed compare conditions (A - B)
BGT         Branch if greater than       A > B
BGE         Branch if greater or equal   A ≥ B
BLT         Branch if less than          A < B
BLE         Branch if less or equal      A ≤ B
BE          Branch if equal              A = B
BNE         Branch if not equal          A ≠ B
Table 3.7
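The unsigned and signed compare conditions of Table 3.7 differ only in how the same bit patterns are interpreted. The Python sketch below illustrates this for 8-bit operands; the function names, the 8-bit width and the sample operands are assumptions, and the conditions are evaluated by direct comparison rather than from hardware status flags.

```python
def to_signed(x, width=8):
    """Interpret an unsigned bit pattern as a 2's complement number."""
    return x - (1 << width) if x & (1 << (width - 1)) else x


def unsigned_conditions(a, b):
    """Branch conditions when A and B are treated as unsigned numbers."""
    return {"BHI": a > b, "BHE": a >= b, "BLO": a < b,
            "BLOE": a <= b, "BE": a == b, "BNE": a != b}


def signed_conditions(a, b, width=8):
    """Branch conditions when A and B are treated as 2's complement numbers."""
    sa, sb = to_signed(a, width), to_signed(b, width)
    return {"BGT": sa > sb, "BGE": sa >= sb, "BLT": sa < sb,
            "BLE": sa <= sb, "BE": sa == sb, "BNE": sa != sb}


a, b = 0b11110000, 0b00010100      # 240 and 20 unsigned; -16 and 20 signed
print(unsigned_conditions(a, b))   # A is "higher" than B
print(signed_conditions(a, b))     # A is "less than" B
```

With these operands BHI would branch but BGT would not, which is why a processor provides both families of mnemonics.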
Summary:
· Data Transfer Instructions are of two types, namely Typical Data Transfer
Instructions and Data Transfer Instructions with Different Addressing Modes.
· The Data Manipulation Instructions are of three types: Arithmetic
instructions, Logical and bit manipulation instructions, and Shift instructions.
· Program Control Instructions can be divided into Conditional Branch Instructions
and Subroutine Call & Return instructions.
Important Questions:
Q1.Explain the data Transfer instructions.
Q2.Explain the data Manipulation instructions.
Q3.Explain the Program control instructions with example.
Subroutine call and return using a memory stack:
CALL: SP ← SP - 1
      M[SP] ← PC
      PC ← EA
RTN:  PC ← M[SP]
      SP ← SP + 1
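The CALL and RTN register transfers can be traced with a minimal Python sketch. The class name Machine, the dictionary used as memory and the starting values of SP and PC are hypothetical, chosen only to show how nested calls return in the right order.

```python
class Machine:
    """Tiny model of a CPU that keeps return addresses on a memory stack."""

    def __init__(self, sp, pc):
        self.memory = {}       # sparse memory: address -> stored word
        self.sp = sp           # stack pointer
        self.pc = pc           # program counter (address of next instruction)

    def call(self, ea):
        self.sp -= 1                    # SP <- SP - 1
        self.memory[self.sp] = self.pc  # M[SP] <- PC (save return address)
        self.pc = ea                    # PC <- EA (branch to subroutine)

    def rtn(self):
        self.pc = self.memory[self.sp]  # PC <- M[SP] (restore return address)
        self.sp += 1                    # SP <- SP + 1


m = Machine(sp=4001, pc=101)   # PC already points past the CALL instruction
m.call(500)                    # main program calls subroutine at 500
m.pc = 520                     # subroutine runs up to address 520 ...
m.call(700)                    # ... and calls a nested subroutine at 700
m.rtn()                        # back in the first subroutine
print(m.pc, m.sp)
m.rtn()                        # back in the main program
print(m.pc, m.sp)
```

Because the most recently saved address is the first one restored, the stack handles nested subroutine calls without any extra bookkeeping.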
Lecture – 11:
· Program Interrupts
· MASM
PROGRAM INTERRUPT:
Types of Interrupts:
1. External Interrupt: External interrupts are initiated from outside of CPU &
memory.
-I/O device-> Data transfer request or data transfer complete
-Timing device ->timeout
- Power failure
- Operator
2. Internal Interrupts (traps): Internal Interrupts are caused by the
currently running program.
- Register, Stack Overflow
- Divide by Zero
- OP- code violation
- Protection Violation
3. Software Interrupts: Both external and internal interrupts are
initiated by the computer hardware. Software interrupts are initiated
by the executing instruction.
- Supervisor Call -> Switching from user mode to supervisor
mode
-> Allows execution of a certain class of
operations which are not allowed in
user mode.
MASM:
If you have used a modern word processor such as Microsoft Word, you may have noticed the
macros feature, where you can record a series of frequently used actions or commands
into a macro. For example, suppose you always need to insert a 2 by 4 table with the titles
"Date" and "Time". You can start the macro recorder and create the table as you wish.
After that, you can save the macro. The next time you need to create the same kind of
table, you just need to execute the macro. The same applies to a macro assembler. It
enables you to record frequently performed actions or a frequently used block of
code so that you do not have to re-type it each time.
The Microsoft Macro Assembler (abbreviated MASM) is an x86 high-level assembler for
DOS and Microsoft Windows. It has long been one of the most popular x86 assemblers. It supports
a wide variety of macro facilities and structured programming idioms, including high-level
functions for looping and procedures. Later versions added the capability of
producing programs for Windows. MASM is one of the few Microsoft development tools
that target 16-bit, 32-bit and 64-bit platforms. Earlier versions were MS-DOS
applications. Versions 5.1 and 6.0 were OS/2 applications and later versions were Win32
console applications. Versions 6.1 and 6.11 included Phar Lap's TNT DOS extender so
that MASM could run in MS-DOS.
· The name MASM originally referred to MACRO ASSEMBLER, but over the
years it has become synonymous with Microsoft Assembler.
· An assembly language translator converts macros into several machine language
instructions.
· MASM isn't the fastest assembler around (it's not particularly slow, except in a
couple of degenerate cases, but there are faster assemblers available).
· Though MASM is very powerful, there are a couple of assemblers that are, arguably, more
powerful (e.g., TASM and HLA).
· MASM is only usable for creating DOS and Windows applications; you cannot
effectively use it to create software for other operating systems.
Benefits of MASM
There are some benefits to using MASM today:
–Steve Hutchesson's ("Hutch") MASM32 package provides the support for MASM that
Microsoft no longer provides.
–You can download MASM (and MASM32) free from Microsoft and other sites.
–Most Windows' assembly language examples on the Internet today use MASM syntax.
–You may download MASM directly from Webster as part of the MASM32 package.
Summary:
· Program Interrupts can be external, internal or software interrupts.
· MASM is the Microsoft (or macro) assembler, used for implementing macros.
Important Questions:
Q1.What are Program interrupts. Explain the types of Program interrupts.
Q2. Explain MASM in detail.
Lecture – 12:
· CPU Architecture types
o Accumulator
o Register
o Stack
o Memory / Register
· Detailed data path of a register based CPU
In Unit 3 we discussed the instruction set architecture (ISA), which deals with the various
types of address instructions, addressing modes and different types of instructions in
various computer architectures.
In this chapter we will discuss the various types of computer organization.
• In general, most processors or computers are organized in one of 3 ways
– Single register (Accumulator) organization
• Basic Computer is a good example
• Accumulator is the only general purpose register
– Stack organization
• All operations are done using the hardware stack
• For example, an OR instruction will pop the two top elements from
the stack, do a logical OR on them, and push the result on the stack
– General register organization
• Used by most modern computer processors
• Any of the registers can be used as the source or destination for
computer operations
Accumulator type of Organization:
In the accumulator type of organization, one operand is in memory and the other is in the
accumulator.
The instructions we can run with the accumulator are :
AC ← AC ∧ DR                 AND with DR
AC ← AC + DR                 Add with DR
AC ← DR                      Transfer from DR
AC(0-7) ← INPR               Transfer from INPR
AC ← AC′                     Complement
AC ← shr AC, AC(15) ← E      Shift right
AC ← shl AC, AC(0) ← E       Shift left
AC ← 0                       Clear
AC ← AC + 1                  Increment
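Circuit required:
[Figure: Accumulator circuit. An adder and logic circuit receives 16-bit inputs from DR and
from AC and an 8-bit input from INPR; its 16-bit output feeds the accumulator register AC.
Control gates drive the LD, INR and CLR inputs of AC, which is clocked; the 16-bit output
of AC goes to the bus.]
Stack Organization:
Stack
- Very useful feature for nested subroutines, nested interrupt services
- Also efficient for arithmetic expression evaluation
- Storage which can be accessed in LIFO
- Pointer: SP
- Only PUSH and POP operations are applicable
Stack type of organization is of two types: register stack and memory stack.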
REGISTER STACK ORGANIZATION
Register Stack
[Figure: Register stack of 64 words (addresses 0 to 63) holding items A, B and C, with a
6-bit stack pointer SP, data register DR, and FULL and EMPTY flags.]
Push, Pop operations
/* Initially, SP = 0, EMPTY = 1, FULL = 0 */
PUSH                             POP
SP ← SP + 1                      DR ← M[SP]
M[SP] ← DR                       SP ← SP - 1
If (SP = 0) then (FULL ← 1)      If (SP = 0) then (EMPTY ← 1)
EMPTY ← 0                        FULL ← 0
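The PUSH and POP sequences above can be modelled with a short Python sketch of a 64-word register stack. The class name RegisterStack is hypothetical, and the two guard checks that raise an error on a full or empty stack are an added assumption; the register transfers themselves follow the listing.

```python
class RegisterStack:
    """64-word register stack with a 6-bit SP and FULL / EMPTY flags."""

    SIZE = 64                          # a 6-bit stack pointer addresses 64 words

    def __init__(self):
        self.words = [0] * self.SIZE
        self.sp = 0                    # initially SP = 0
        self.empty = True              # EMPTY = 1
        self.full = False              # FULL = 0

    def push(self, dr):
        if self.full:                  # guard added for the sketch
            raise OverflowError("stack is full")
        self.sp = (self.sp + 1) % self.SIZE   # SP <- SP + 1 (wraps at 6 bits)
        self.words[self.sp] = dr              # M[SP] <- DR
        if self.sp == 0:                      # SP wrapped around: stack is full
            self.full = True
        self.empty = False

    def pop(self):
        if self.empty:                 # guard added for the sketch
            raise IndexError("stack is empty")
        dr = self.words[self.sp]              # DR <- M[SP]
        self.sp = (self.sp - 1) % self.SIZE   # SP <- SP - 1 (wraps at 6 bits)
        if self.sp == 0:                      # back at the start: stack is empty
            self.empty = True
        self.full = False
        return dr


stack = RegisterStack()
for item in ["A", "B", "C"]:
    stack.push(item)
print(stack.sp, stack.pop(), stack.sp)
```

Since SP has only 6 bits, incrementing it from 63 wraps it to 0, which is how the condition SP = 0 after a push signals a full stack.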
MEMORY STACK ORGANIZATION
Memory with Program, Data, and Stack Segments
[Figure: Memory divided into a program segment (instructions, marked at address 1000 and
pointed to by PC), a data segment (operands, marked at address 3000 and pointed to by AR)
and a stack segment (addresses 3997 to 4001, pointed to by SP), with an arrow showing the
direction in which the stack grows.]
A portion of memory is used as a stack with a processor register as a stack pointer
- PUSH: SP ← SP - 1
        M[SP] ← DR
- POP:  DR ← M[SP]
        SP ← SP + 1
Note: Most computers do not provide hardware to check stack overflow (full
stack) or underflow (empty stack) → must be done in software
Register type of organization:
In this we take the help of various registers, say R1 to R8, for transfer and
manipulation of data.
Detailed data path of a typical register based CPU
To avoid accessing memory directly (which is very time consuming and thus costly),
we prefer the register organization, as it proves to be more efficient and time
saving.
Here we are using 7 registers. The two multiplexers and a decoder decide which
registers are used as operand sources and which register is used as the destination for
storing the result.
MUX 1 selects the 1st operand register according to the value of SELS1 (selector
for source 1). Similarly, SELS2 is the select input of MUX 2 for the 2nd operand.
These two operands reach the ALU through the S1 bus and S2 bus. OPR denotes the type of
operation to be performed, and the operation is carried out in the ALU. The
result is then stored back in one of the 7 registers; the decoder, controlled by
SELD, decides which register receives it.
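The register selection described above can be sketched in Python as follows. The names RegisterFile and step, the set of ALU operations, and the convention that selector code 0 means "external input" for the multiplexers and "no register loaded" for the decoder are assumptions made for the illustration.

```python
ALU_OPS = {                      # a few illustrative OPR codes
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "AND": lambda a, b: a & b,
    "OR":  lambda a, b: a | b,
}


class RegisterFile:
    """Seven registers R1..R7 with two source multiplexers and a destination decoder."""

    def __init__(self):
        self.regs = [0] * 8      # index 1..7 = R1..R7, index 0 is unused

    def step(self, sels1, sels2, opr, seld, external_input=0):
        # MUX inputs: code 0 selects the external input, codes 1..7 select R1..R7
        mux_inputs = [external_input] + self.regs[1:]
        s1_bus = mux_inputs[sels1]           # MUX 1 output, selected by SELS1
        s2_bus = mux_inputs[sels2]           # MUX 2 output, selected by SELS2
        result = ALU_OPS[opr](s1_bus, s2_bus)
        if seld != 0:                        # decoder raises one of 7 load lines
            self.regs[seld] = result
        return result                        # also available as output/result


rf = RegisterFile()
rf.regs[2], rf.regs[3] = 5, 7
print(rf.step(sels1=2, sels2=3, opr="ADD", seld=1))   # R1 <- R2 + R3
print(rf.regs[1])
```

One call to step corresponds to one clock cycle: the three selector fields and OPR together form the control word for that cycle.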
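[Figure: Data path of a register-based CPU. Registers R1 to R7 and an external input feed
two multiplexers, MUX 1 (selected by SELS1) and MUX 2 (selected by SELS2), which drive the
S1 bus and S2 bus into the ALU. OPR selects the ALU operation. A 3 x 8 decoder controlled
by SELD drives the 7 load lines of the clocked registers. The ALU output is the
output/result.]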
Lecture – 13:
· Address Sequencing / Microinstruction Sequencing
· Implementation of control unit
Address Sequencing/Microinstruction Sequencing:
Microinstructions are stored in control memory in groups, with each group specifying a
routine. The hardware that controls the address sequencing of the control memory must
be capable of sequencing the microinstructions within a routine and of branching
from one routine to another.
Steps :
An initial address is loaded into the CAR when power is turned on; this is usually the address
of the first microinstruction that activates the instruction fetch routine. This routine may be
sequenced by incrementing the CAR. At the end of the fetch routine the instruction is in the IR of
the computer. Next, the control memory goes through the routine that computes the effective
address of the operand. The next step is the execution of the instruction fetched from memory.
The transformation from the instruction code bits to an address in control memory where
the routine is located is referred to as a mapping process.
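Fig: Address sequencing hardware for control memory (the instruction code feeds the mapping
logic; the status bits feed the branch logic, which drives the MUX select; the multiplexers
choose the next value of the control address register (CAR) from the incrementer, the branch
address, the mapping logic and the subroutine register (SBR); the control memory (ROM)
outputs the microoperations, the field that selects a status bit, and the branch address)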
At the completion of the execution of the instruction, control must return to the fetch
routine by executing an unconditional branch microinstruction to the first address of the
fetch routine.
Sequencing Capabilities Required in a Control Storage
- Incrementing of the control address register
- Unconditional and conditional branches
- A mapping process from the bits of the machine
instruction to an address for control memory
- A facility for subroutine call and return
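The four sequencing capabilities listed above can be sketched as a single next-address function. This is a hedged illustration: the mode names ("INC", "JMP", "CALL", "RET", "MAP") and the mapping rule (opcode shifted left by two bits, i.e. four control words per routine) are assumptions made for the example and are not taken from these notes.

```python
def next_address(car, sbr, mode, branch_addr=0, condition=False, opcode=0):
    """Return (new CAR, new SBR) after one microinstruction."""
    if mode == "INC":                  # step to the next microinstruction of the routine
        return car + 1, sbr
    if mode == "JMP":                  # branch; unconditional when condition is True
        return (branch_addr if condition else car + 1), sbr
    if mode == "CALL":                 # subroutine call: save the return address in SBR
        return (branch_addr, car + 1) if condition else (car + 1, sbr)
    if mode == "RET":                  # subroutine return: reload CAR from SBR
        return sbr, sbr
    if mode == "MAP":                  # map instruction bits to a routine address
        return opcode << 2, sbr        # assumed rule: four words per routine
    raise ValueError(f"unknown mode: {mode}")
```

For example, `next_address(10, 0, "CALL", branch_addr=64, condition=True)` moves control to address 64 and remembers 11 as the return address.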
Design of control Unit:
After obtaining the microoperation fields of a microinstruction, we must decode them before
the microoperations can be executed.
Each microinstruction has 3 microoperation fields (F1, F2 and F3). Each field uses 3 bits to
represent 8 microoperations, so the three fields are decoded with three 3 x 8 decoders.
Fig: Decoding of microoperation fields (fields F1, F2 and F3 each feed a 3 x 8 decoder with
outputs 0 to 7; the AND, ADD and DRTAC outputs go to the arithmetic logic and shift unit,
which takes its inputs from AC and DR and loads its result into AC; the PCTAR and DRTAR
outputs control the load input of AR and the select input of the multiplexers, which choose
between the value from PC and the value from DR(0-10); AR is clocked)
The decoded microoperation signals are then routed to the circuits that carry them out. Data
manipulation microoperations such as AND, ADD and SUB are given to the ALU, and the
corresponding results are moved to AC. The ALU takes its data from AC and DR.
For data transfer microoperations such as PCTAR or DRTAR we simply need to transfer the
values. Because AR has two possible sources, we use a 2 x 1 MUX to choose one. Its single
select line is connected to the DRTAR microoperation signal: if DRTAR is high the MUX
selects DR as the source for AR, otherwise it selects PC. The transfer itself takes place
only when the load input of AR is high, and the load input is high if either PCTAR or DRTAR
is high.
The clock signal is provided for the synchronization of microoperations.
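The decoding and the AR multiplexer described above can be sketched as follows. This is an illustration only: the decoder output numbers assigned to DRTAR and PCTAR (5 and 6) are assumed field codes, since these notes do not list the encodings, and the function names are hypothetical.

```python
def decode_3x8(field):
    """3 x 8 decoder: exactly one of the eight output lines is high."""
    return [1 if i == field else 0 for i in range(8)]


# Assumed F1 field codes; the notes do not give the real encodings.
DRTAR_LINE, PCTAR_LINE = 5, 6


def load_ar(f1, ar, pc, dr):
    """Return the value of AR after the F1 field of a microinstruction is applied."""
    lines = decode_3x8(f1)
    drtar, pctar = lines[DRTAR_LINE], lines[PCTAR_LINE]
    selected = (dr & 0x7FF) if drtar else pc   # 2 x 1 MUX, select = DRTAR; DR(0-10) is 11 bits
    load = drtar or pctar                      # AR is loaded if either signal is high
    return selected if load else ar            # otherwise AR keeps its old value
```

With these assumed codes, `load_ar(6, ar, pc, dr)` performs PCTAR and `load_ar(5, ar, pc, dr)` performs DRTAR; any other F1 value leaves AR unchanged.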
Lecture – 13:
· Fetch and decode cycle
· Control Unit
Fetch and Decode
T0: AR ← PC (S0S1S2 = 010, T0 = 1)
T1: IR ← M[AR], PC ← PC + 1 (S0S1S2 = 111, T1 = 1)
T2: D0, ..., D7 ← Decode IR(12-14), AR ← IR(0-11), I ← IR(15)
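The three timing steps above can be traced with a short simulation. This is a sketch that assumes a 16-bit instruction word with the fields named in T2 (bits 0-11 address, bits 12-14 operation code, bit 15 the I bit); the function name and the dictionary used as memory are illustrative choices.

```python
def fetch_and_decode(memory, pc):
    """Carry out T0, T1 and T2 for the instruction at address pc."""
    ar = pc                                  # T0: AR <- PC
    ir = memory[ar]                          # T1: IR <- M[AR]
    pc = pc + 1                              #     PC <- PC + 1
    opcode = (ir >> 12) & 0b111              # T2: decode IR(12-14)
    d = [1 if i == opcode else 0 for i in range(8)]   # decoder outputs D0 .. D7
    ar = ir & 0x0FFF                         #     AR <- IR(0-11)
    i_bit = (ir >> 15) & 1                   #     I <- IR(15)
    return {"PC": pc, "AR": ar, "IR": ir, "D": d, "I": i_bit}


memory = {0x100: 0x9234}                     # one instruction word at address 100 (hex)
state = fetch_and_decode(memory, 0x100)
```

For the word 9234 (hex), the I bit is 1, the operation code is 001 so D1 is the active decoder output, and AR ends up holding 234 (hex).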
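Fig: Register transfers for the fetch phase (memory unit with address and read inputs; AR
with LD, PC with INR and IR with LD, all clocked; bus select inputs S2, S1, S0; timing
signals T0 and T1; the numbers 1, 2, 5 and 7 mark AR, PC, IR and the memory unit as sources
on the common bus)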
Control Unit
• Control unit (CU) of a processor translates from machine instructions to the
control signals for the microoperations that implement them
• Control units are implemented in one of two ways
• Hardwired Control
– CU is made up of sequential and combinational circuits to generate
the control signals
• Microprogrammed Control
– A control memory on the processor contains microprograms that
activate the necessary control signals
• We will consider a hardwired implementation of the control unit for the
Basic Computer
Lecture – 15:
• Memory hierarchy and its organization
• Need of memory hierarchy
• Locality of reference principle
In the previous units we studied the instructions, data and registers associated
with our computer organization.
Let's now turn to the microarchitecture of the computer, in which memory is an important part.
Let's study what memory is and what types of memory are available.
The memory unit is an essential component of a computer and is used for storing
programs and data. We use main memory for running programs, and additional
capacity for storage. The various levels of memory units are arranged in a memory
hierarchy.
MEMORY HIERARCHY
The purpose of a memory hierarchy is to obtain the highest possible access speed while
minimizing the total cost of the memory system.
The various components are:
Main Memory: The memory unit that communicates directly with the CPU. The programs
and data currently needed by the processor reside in main memory.
Auxiliary Memory: This is made up of devices that provide backup storage. Examples:
magnetic tapes, magnetic disks etc.
Cache Memory: This is the memory that lies between main memory and the CPU.
Fig: Memory hierarchy (magnetic tapes and magnetic disks are connected through the I/O
processor to main memory; cache memory sits between main memory and the CPU)
In this hierarchy, magnetic tapes are at the lowest level, which means they are very
slow and very cheap. Moving to the upper levels, we reach main memory, where we get
increased speed but at an increased cost per bit.
Thus we can conclude that as we go towards the upper levels:
- Cost per bit increases
- Speed increases
- Access time decreases
- Size decreases
Many operating systems are designed to enable the CPU to process a number of
independent programs concurrently. This concept is called multiprogramming. It is
made possible by 2 or more programs residing in different parts of the memory
hierarchy at the same time. Example: the CPU executes one program while an I/O transfer
is in progress for another.
Locality of reference, also known as the locality principle, is the phenomenon that
the collection of data locations referenced in a short period of time in a running
computer often consists of relatively well-predictable clusters.
Analysis of a large number of typical programs has shown that the references to memory
in any given interval of time tend to be confined within a few localized areas of memory.
This phenomenon is known as locality of reference.
Fig: Levels of the memory hierarchy, from top to bottom: register, cache, main memory,
magnetic disk, magnetic tape
Important special cases of locality are temporal, spatial, equidistant, branch and sequential locality.
· Temporal locality: if at one point in time a particular memory location is
referenced, then it is likely that the same location will be referenced again in the
near future. There is a temporal proximity between the adjacent references to the
same memory location. In this case it is common to make efforts to store a copy
of the referenced data in special memory storage, which can be accessed faster.
Temporal locality is a very special case of the spatial locality, namely when the
prospective location is identical to the present location.
· Spatial locality: if a particular memory location is referenced at a particular time,
then it is likely that nearby memory locations will be referenced in the near future.
There is a spatial proximity between the memory locations, referenced at almost
the same time. In this case it is common to try to guess how large a
neighbourhood around the current reference is worth preparing for faster
access.
· Equidistant locality: it is halfway between the spatial locality and the branch
locality. Consider a loop accessing locations in an equidistant pattern, i.e. the path
in the spatial-temporal coordinate space is a dotted line. In this case, a simple
linear function can predict which location will be accessed in the near future.
· Branch locality: this occurs when there are only a few possible alternatives for the
prospective part of the path in the spatial-temporal coordinate space. This is the
case when an instruction loop has a simple structure, or the possible outcome of a
small system of conditional branching instructions is restricted to a small set of
possibilities. Branch locality is typically not a spatial locality since the few
possibilities can be located far away from each other.
· Sequential locality: In a typical program the execution of instructions follows a
sequential order unless branch instructions create out of order execution. This also
takes spatial locality into consideration, as the sequential instructions are stored
near to each other.
To benefit from the very frequently occurring temporal and spatial kinds of
locality, most information storage systems are hierarchical. Equidistant locality
is usually supported by the various nontrivial increment instructions of processors.
For branch locality, contemporary processors have sophisticated branch
predictors, and on the basis of this prediction the memory manager of the processor tries
to collect and preprocess the data of the plausible alternatives.
Reasons for locality
There are several reasons for locality. These reasons are either goals to achieve or
circumstances to accept, depending on the aspect. The reasons below are not disjoint; in
fact, the list below goes from the most general case to special cases.
· Predictability: In fact, locality is merely one type of predictable behavior in
computer systems. Luckily, many of the practical problems are decidable and
hence the corresponding program can behave predictably, if it is well written.
· Structure of the program: Locality occurs often because of the way in which
computer programs are created, for handling decidable problems. Generally,
related data is stored in nearby locations in storage. One common pattern in
computing involves the processing of several items, one at a time. This means that
if a lot of processing is done, the single item will be accessed more than once,
thus leading to temporal locality of reference. Furthermore, moving to the next
item implies that the next item will be read, hence spatial locality of reference,
since memory locations are typically read in batches.
· Linear data structures: Locality often occurs because code contains loops that
tend to reference arrays or other data structures by indices. Sequential locality, a
special case of spatial locality, occurs when relevant data elements are arranged
and accessed linearly. For example, the simple traversal of elements in a one-dimensional
array, from the base address to the highest element, would exploit the
sequential locality of the array in memory.[2] The more general equidistant
locality occurs when the linear traversal is over a longer area of adjacent data
structures having identical structure and size, and in addition not the whole
structures are accessed, but only the mutually corresponding elements of the
structures. This is the case when a matrix is stored as a sequence of
rows and the requirement is to access a single column of the matrix.
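The difference between sequential and equidistant locality in the matrix case above can be made concrete by listing the addresses that are touched. This is a small sketch that assumes a matrix stored row after row starting at a base address; the function names, the base address 1000 and the element size of one word are illustrative assumptions.

```python
def element_address(base, row, col, n_cols, elem_size=1):
    """Address of matrix[row][col] when the rows are stored one after another."""
    return base + (row * n_cols + col) * elem_size


def row_addresses(base, row, n_cols):
    """Addresses touched when one row is traversed (sequential locality)."""
    return [element_address(base, row, c, n_cols) for c in range(n_cols)]


def column_addresses(base, col, n_rows, n_cols):
    """Addresses touched when one column is traversed (equidistant locality)."""
    return [element_address(base, r, col, n_cols) for r in range(n_rows)]
```

For a 3 x 4 matrix at base address 1000, row 1 occupies the consecutive addresses 1004 to 1007, while column 2 occupies 1002, 1006 and 1010, a constant stride of 4 that a simple linear function can predict.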
Use of locality in general
If, most of the time, a substantial portion of the references aggregate into clusters, and if
the shape of this system of clusters can be well predicted, then it can be used for speed
optimization. There are several ways to benefit from locality. The common
techniques for optimization are:
· to increase the locality of references. This is usually achieved on the software
side.
· to exploit the locality of references. This is usually achieved on the hardware side.
Temporal and spatial locality can be exploited by hierarchical storage
hardware. Equidistant locality can be used by appropriately specialized
instructions of the processor; this is not only the responsibility of the
hardware but of the software as well, which must have a structure suitable for compiling
into a binary program that calls the specialized instructions in question. Branch
locality is a more elaborate possibility, so more development effort is needed,
but there is a much larger reserve for future exploration in this kind of locality than
in all the remaining ones.
Lecture – 16:
· Main Memory
o RAM chip organization
o ROM chip organization
· Expansion of main memory
o Memory connections to CPU
o Memory address map
So far we have discussed the levels of the memory hierarchy and how they compare.
Let's take each in detail.
Main Memory: Main memory is a large (w.r.t. cache memory) and fast (w.r.t.
magnetic tapes, disks etc.) memory used to store the programs and data during the computer
operation. The I/O processor manages data transfers between auxiliary memory and main
memory.
The principal technology used for main memory is based on semiconductor integrated
circuits. Main memory is available in 2 types: RAM and ROM.
RAM: This is the part of main memory where we can both read and write data.
Typical RAM chip:
Fig: Block diagram of a 128 x 8 RAM chip (inputs: chip select 1 (CS1), chip select 2 (CS2),
read (RD), write (WR) and a 7-bit address (AD 7); 8-bit data bus)
CS1 and CS2 are used to enable or disable a particular RAM chip.
The truth table of the chip corresponds to the following behaviour:
The RAM is enabled only when CS1 is 1 and CS2 is 0. For any other combination the chip is
inhibited and its data bus is in the high impedance state.
Even when the chip is enabled, if both read and write are 0 no operation takes place and
the data bus remains in the high impedance state.
The RD pin indicates that the RAM is being used for a read operation.
Similarly, the WR pin indicates that a write operation is being performed on the RAM.
If both WR and RD are high, the read operation is chosen; otherwise the data would be
inconsistent.
Since this is a 128 x 8 RAM, it holds 128 words, each 8 bits long.
Thus we need an 8-bit data bus to transfer the data, and the bus is bidirectional.
To access 128 words we need 7 address bits, since 2^7 = 128.
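The behaviour of the control inputs described above can be written as a small function. This is a sketch derived only from the rules stated in the text; the function name and the returned strings are illustrative.

```python
def ram_chip_function(cs1, cs2, rd, wr):
    """Return the operation performed by the RAM chip for the given control inputs."""
    if not (cs1 == 1 and cs2 == 0):
        return "inhibit"     # chip not selected: data bus in high impedance
    if rd:
        return "read"        # RD takes priority when both RD and WR are 1
    if wr:
        return "write"
    return "inhibit"         # selected, but neither RD nor WR is asserted
```

Calling it for all 16 input combinations reproduces the chip's function table: only the combinations with CS1 = 1 and CS2 = 0 can produce a read or a write.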
Integrated circuit RAM chips are available in 2 modes: static memory and dynamic memory.
Typical ROM chip:
Fig: Block diagram of a 512 x 8 ROM chip (inputs: chip select 1 (CS1), chip select 2 (CS2)
and a 9-bit address (AD 9); 8-bit data bus)
The ROM is enabled when CS1 is 1 and CS2 is 0.
There is no WR pin, as a ROM does not allow write operations. An RD pin is not needed
either, because an enabled ROM can only be read.
Since this is a 512 x 8 ROM, it holds 512 words, each 8 bits long.
· Thus we need an 8-bit data bus to transfer the data, but it is unidirectional as the ROM
only allows reading.
· To access 512 words we need 9 address bits, since 2^9 = 512.
Memory Expansion:
Sometimes we need to combine RAM or ROM chips to expand memory. As an example, suppose
we need 512 words of RAM built from 128-word RAM chips, and we also need 512 words of
ROM.
For this we use 4 RAM chips of 128 words each and one 512-word ROM chip.
Selecting a particular word of memory involves 3 steps:
1. To select a particular word within a 128-word RAM we need 7 bits, and to select a
particular word within the 512-word ROM we need 9 bits.
2. To choose a particular RAM out of the four we need 2 bits, and thus a
2 x 4 decoder.
3. To choose between RAM and ROM we need 1 more bit.
The connections to the CPU are made as follows:
· We have 4 RAMs and one ROM.
· Address lines 1 to 7 are connected to all the RAMs.
· Address lines 1 to 9 are connected to the ROM.
· 2 lines, the 8th and 9th, select one of the 4 RAMs through a 2 x 4 decoder.
· To distinguish between RAM and ROM we use one more line, the 10th. If the 10th line
is low a RAM is enabled, otherwise the ROM is enabled.
Fig: Memory connection to CPU (the CPU address bus lines are labelled 16-11, 10, 9, 8 and
7-1; lines 1-7 go to the AD7 inputs of the four 128 x 8 RAM chips, lines 8 and 9 drive the
decoder whose outputs 0 to 3 select RAM 1 to RAM 4, lines 1-9 go to the AD9 input of the
512 x 8 ROM, and line 10 goes to the chip selects; RD and WR go to the RAMs; all chips share
the data bus)
To represent this properly we use a memory address map:
Component   Hexadecimal address   Address bus
                                  10 9 8 7 6 5 4 3 2 1
RAM 1       0000 - 007F            0 0 0 x x x x x x x
RAM 2       0080 - 00FF            0 0 1 x x x x x x x
RAM 3       0100 - 017F            0 1 0 x x x x x x x
RAM 4       0180 - 01FF            0 1 1 x x x x x x x
ROM         0200 - 03FF            1 x x x x x x x x x
· x (don't care) is used for bits 1-7 of each RAM and bits 1-9 of the ROM, because whatever
values these bits take, 0 or 1, the address still lies within that particular chip.
· 2 bits, the 9th and 8th, select one of the 4 RAMs:
o RAM1 – 0 0
o RAM2 – 0 1
o RAM3 – 1 0
o RAM4 – 1 1
· To distinguish between RAM and ROM we use one more bit, the 10th. If the 10th bit
is low a RAM is enabled, otherwise the ROM is enabled.
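The memory address map can be checked with a short decoding routine that follows the three selection steps. This is a sketch under the assumptions of the example above (four 128-word RAMs and one 512-word ROM on address lines 1 to 10); the function name and the returned labels are illustrative.

```python
def select_chip(address):
    """Map a 10-bit CPU address to (chip name, address inside the chip)."""
    if not 0 <= address <= 0x3FF:
        raise ValueError("address must fit in 10 bits")
    if address & 0x200:                          # line 10 high: the ROM is enabled
        return "ROM", address & 0x1FF            # lines 1-9 address the ROM word
    ram_number = ((address >> 7) & 0b11) + 1     # lines 9 and 8 feed the 2 x 4 decoder
    return f"RAM {ram_number}", address & 0x7F   # lines 1-7 address the RAM word
```

For example, address 00FF (hex) falls in RAM 2 and address 0200 (hex) is the first word of the ROM, in agreement with the hexadecimal ranges in the map.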
Lecture – 17:
· Static RAM and Dynamic RAM
· Associative Memory
In the last lecture we discussed RAM and ROM chips and their expansion
mechanisms. Let's discuss the various types of RAM.
Static Random Access Memory (SRAM) is a type of semiconductor memory where the
word static indicates that, unlike dynamic RAM (DRAM), it does not need to be
periodically refreshed, as SRAM uses bistable latching circuitry to store each bit. SRAM
exhibits data remanence,[1] but is still volatile in the conventional sense that data is
eventually lost when the memory is not powered.
Dynamic random access memory (DRAM) is a type of random access memory that
stores each bit of data in a separate capacitor within an integrated circuit. Since real
capacitors leak charge, the information eventually fades unless the capacitor charge is
refreshed periodically. Because of this refresh requirement, it is a dynamic memory as
opposed to SRAM and other static memory.
The advantage of DRAM is its structural simplicity: only one transistor and a capacitor
are required per bit, compared to six transistors in SRAM. This allows DRAM to reach
very high density. Unlike Flash memory, it is volatile memory (cf. non-volatile memory),
since it loses its data when the power supply is removed.
Static RAM is a type of RAM that holds its data without external refresh, for as long as
power is supplied to the circuit. This is contrasted to dynamic RAM
(DRAM), which must be refreshed many times per second in order to hold its data
contents. SRAMs are used for specific applications within the PC, where their
strengths outweigh their weaknesses compared to DRAM:
· Simplicity: SRAMs don't require external refresh circuitry or other work in order
for them to keep their data intact.
· Speed: SRAM is faster than DRAM.
In contrast, SRAMs have the following weaknesses, compared to DRAMs:
· Cost: SRAM is, byte for byte, several times more expensive than DRAM.
· Size: SRAMs take up much more space than DRAMs (which is part of why the
cost is higher).
We normally access memory by address, but sometimes it is useful to access
memory by content value rather than by address, for example in search
mechanisms. This is where the concept of associative memory comes in.
Associative Memory: This is the type of memory where we access data by searching
or matching the contents and not by address value.
- Accessed by the content of the data rather than by an address
- Also called Content Addressable Memory (CAM)
The data we want to search for is kept in the argument register (A), which has the same
length as a word. Since the memory has m words of n bits each, the argument
register is n bits long.
The match register (M) reports the result of the search: the bit corresponding to each
matching word is set high. It has one bit per word in memory, so the match
register is m bits long.
Fig: Block diagram of associative memory (argument register (A) and key register (K);
associative memory array and logic of m words with n bits per word, with input, read and
write lines; match register M)
For example, suppose we have to search for 1011 and the memory holds the following words:
1011
0111
1000
1100
0010
1011
0111
1011
Here 1011 occurs 3 times, in the 1st, 6th and 8th words. Thus the match register is high
at the 1st, 6th and 8th places and low elsewhere, i.e. its value is 10000101.
In this case the value of the key register is 1111, which means that all the bits of the
argument register are compared with every word of the associative memory.
Sometimes we need to compare only some of the bits, for example when we want all the words
ending with 1. The value of the key register is then 0001 (as we are matching only the
last bit), and the value of the match register is 11000111.
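The search described above can be reproduced with a few lines of code. This is a sketch of the matching rule only (hardware compares all words in parallel, whereas the loop here visits them one by one); the function name and the string form of the match register are illustrative choices.

```python
def cam_match(words, argument, key):
    """Return the match register as a string with one bit per stored word."""
    bits = []
    for word in words:
        # A word matches when it agrees with the argument in every bit
        # position where the key register holds a 1.
        bits.append("1" if (word ^ argument) & key == 0 else "0")
    return "".join(bits)


words = [0b1011, 0b0111, 0b1000, 0b1100, 0b0010, 0b1011, 0b0111, 0b1011]
full_match = cam_match(words, argument=0b1011, key=0b1111)
last_bit_match = cam_match(words, argument=0b1011, key=0b0001)
```

With key 1111 the match register marks the 1st, 6th and 8th words; with key 0001 it marks every word whose last bit is 1.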
Lecture – 18:
· Cache Memory:
o Locality of reference
o Associative mapped cache organizations
o Direct mapped cache organizations
o Set associative mapped cache organizations
Cache Memory: The basic idea of cache organization is that by keeping the most
frequently accessed instructions and data in a fast memory, i.e. cache memory, the
average memory access time is reduced.
Examples are important subprograms, iterative procedures etc. If these active portions
of the program and data are placed in a fast, small memory, the average access time can be
reduced, thus reducing the total execution time of the program. Such a fast, small
memory is referred to as a cache memory.
It is placed between main memory and the CPU.
When the CPU needs to access memory, the cache is examined first. If the word is found in
the cache, it is read from this fast memory; this is called a cache hit. Otherwise we access
main memory; this is called a cache miss.
The performance of cache memory is frequently measured in terms of the hit ratio.
Analysis of a large number of typical programs has shown that the references to memory
at any given interval of time tend to be confined within a few localized areas in memory.
This phenomenon is known as locality of reference .
The various types of locality of reference are:
- Temporal Locality
The information which will be used in the near future
is likely to be in use already (e.g. reuse of information in loops)
- Spatial Locality
If a word is accessed, adjacent (nearby) words are likely to be accessed soon
(e.g. related data items (arrays) are usually stored together;
instructions are executed sequentially)
- Sequential Locality
In a typical program the execution of instructions follows a sequential order
unless branch instructions create out of order execution. This also takes spatial
locality into consideration, as the sequential instructions are stored near to each
other.
Performance of Cache Memory System
Fig: CPU, cache memory and main memory
Hit ratio (h): fraction of memory accesses satisfied by the cache memory
Te: Effective memory access time in cache memory system
Tc: Cache access time
Tm: Main memory access time
Te = Tc + (1 - h) Tm
Example: Tc = 0.4 µs, Tm = 1.2 µs, h = 0.85 (i.e. 85%)
Te = 0.4 + (1 - 0.85) * 1.2 = 0.58 µs
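The formula above is easy to evaluate for other hit ratios. The function below is a direct transcription of Te = Tc + (1 - h) Tm; its name and parameter names are illustrative.

```python
def effective_access_time(tc, tm, h):
    """Te = Tc + (1 - h) * Tm, with h given as a fraction between 0 and 1."""
    return tc + (1 - h) * tm


te = effective_access_time(tc=0.4, tm=1.2, h=0.85)   # the example from the notes
```

Note that h must be passed as a fraction (0.85), not as a percentage. With h = 1 the result reduces to Tc, and with h = 0 it becomes Tc + Tm.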
MEMORY AND CACHE MAPPING –
Mapping Function:
Specification of correspondence between main memory blocks and cache blocks
There are 3 types of mapping mechanisms:
· Associative mapping
· Direct mapping
· Set-associative mapping
ASSOCIATIVE MAPPING –
- Any block location in cache can store any block in memory
-> Most flexible (we can place a word of any address value anywhere)
- Mapping table is implemented in an associative memory
-> Fast, very expensive (the associative logic is expensive; also the word size
increases by the number of bits in the main memory address)
- Mapping table stores both the address and the content of the memory word
With associative mapping, the important words are fetched and placed into cache memory.
Since the CPU supplies only an address when it fetches data, the address must be stored in
the cache along with the data. In the cache, however, the address is stored as content. We
therefore search for the address by content, using the concept of associative memory: the
address to be fetched is placed in the argument register and searched for in the cache
memory. When it is found, the corresponding data is read out.
Fig: Associative mapping cache (all numbers in octal). The 15-bit address in the argument
register is compared with the address part of every CAM word:
Address   Data
01000     3450
02777     6710
22235     1234
DIRECT MAPPING-
- Each memory block has only one place to load in cache
- Mapping table is made of RAM instead of CAM
- The n-bit memory address consists of 2 parts: k bits of index field and
n-k bits of tag field
- The n-bit address is used to access main memory
and the k-bit index is used to access the cache
In direct mapping we divide the main memory address into 2 fields:
Index: the number of bits required to address the cache memory.
Tag: the remaining bits, i.e. total address bits minus index bits.
The reason for this division is that each word of main memory is placed in the cache
at the address given by its index bits.
Fig: Addressing relationships between main and cache memories (octal addresses)
Main memory: 32K x 12, address = 15 bits = Tag(6) + Index(9), data = 12 bits, addresses 00 000 to 77 777
Cache memory: 512 x 12, address = 9 bits, data = 12 bits, addresses 000 to 777
Main memory contents:
Memory address   Memory data
00000            1220
00777            2340
01000            3450
01777            4560
02000            5670
02777            6710
Example: Main memory holds 1220 at address 00000. Dividing this address gives tag 00 and
index 000, so 1220 is saved at address 000 of the cache memory. But the word at address
01000 also has index 000. To distinguish between them, the tag value is saved along with
the data in the cache memory.
Similarly 2340, which is stored at address 00777 in main memory, is saved at address 777 of
the cache memory with tag 00, and so on.
Fig: Direct mapping cache organization
Index address   Tag   Data
000             00    1220
777             02    6710
Problem:
The cache cannot hold the data of addresses 00000, 01000 and 02000 at the same time, because
they have the same index value; one must be replaced to store another. Similarly, 00777
and 01777 cannot both be held. That means that even when there are free words in the cache,
words with the same index cannot be stored together.
On the other hand, direct mapping is relatively inexpensive, and its word size is smaller
than in the associative type of organization.
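The tag/index split and the conflict between words that share an index can be demonstrated with a small model. This is a sketch that assumes the 15-bit address with a 9-bit index used in the example (addresses and data written in octal); the class name and the dictionary-based storage are illustrative, and it caches single words rather than blocks.

```python
INDEX_BITS = 9   # 512-word cache, as in the example


class DirectMappedCache:
    def __init__(self):
        self.lines = {}                          # index -> (tag, data)

    def read(self, address, main_memory):
        """Return (data, hit) for a read of the given address."""
        index = address & ((1 << INDEX_BITS) - 1)
        tag = address >> INDEX_BITS
        entry = self.lines.get(index)
        if entry is not None and entry[0] == tag:
            return entry[1], True                # hit: tag stored at this index matches
        data = main_memory[address]              # miss: read main memory
        self.lines[index] = (tag, data)          # replace whatever shared this index
        return data, False


main_memory = {0o00000: 0o1220, 0o00777: 0o2340, 0o01000: 0o3450,
               0o01777: 0o4560, 0o02000: 0o5670, 0o02777: 0o6710}
cache = DirectMappedCache()
first = cache.read(0o00000, main_memory)     # miss, loads tag 00 at index 000
second = cache.read(0o00000, main_memory)    # hit
third = cache.read(0o01000, main_memory)     # miss, replaces the word at index 000
fourth = cache.read(0o00000, main_memory)    # miss again because of the conflict
```

The last two reads show the problem described above: addresses 00000 and 01000 keep evicting each other even though the rest of the cache is empty.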
To avoid the problems of both the direct and the associative organizations we use the 3rd
concept, i.e. set-associative mapping.
Here we can save more than one data value under the same index, and we save the
tag corresponding to each data value.
Index   Tag   Data   Tag   Data
000     01    3450   02    5670
777     02    6710   00    2340
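A two-way set-associative cache like the one in the table can be sketched as follows. This is an illustration under stated assumptions: the class name is hypothetical, single words are cached rather than blocks, and first-in first-out replacement is assumed because the notes do not name a replacement policy.

```python
class SetAssociativeCache:
    def __init__(self, ways=2, index_bits=9):
        self.ways = ways
        self.index_bits = index_bits
        self.sets = {}                       # index -> list of (tag, data), oldest first

    def read(self, address, main_memory):
        """Return (data, hit) for a read of the given address."""
        index = address & ((1 << self.index_bits) - 1)
        tag = address >> self.index_bits
        entries = self.sets.setdefault(index, [])
        for stored_tag, data in entries:
            if stored_tag == tag:
                return data, True            # hit: one of the tags in the set matches
        data = main_memory[address]          # miss: read main memory
        if len(entries) == self.ways:
            entries.pop(0)                   # FIFO replacement (assumed policy)
        entries.append((tag, data))
        return data, False


main_memory = {0o00000: 0o1220, 0o00777: 0o2340, 0o01000: 0o3450,
               0o01777: 0o4560, 0o02000: 0o5670, 0o02777: 0o6710}
cache = SetAssociativeCache()
cache.read(0o01000, main_memory)             # miss, stored with tag 01
cache.read(0o02000, main_memory)             # miss, stored with tag 02 in the same set
again = cache.read(0o01000, main_memory)     # hit: both words share index 000
```

After these reads, the set at index 000 holds the pairs (01, 3450) and (02, 5670), which is the first row of the table above.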
Lecture – 17,18,19 and 20:
· Cache to memory write
o Write back policy
o Write through policy
· Cache Coherence
o Software precautions
o Snoopy controller
In the last lecture we studied that, on the basis of the locality principle, important or
frequently used data is placed in cache memory. We can change this data in
cache memory, but since main memory holds a copy as well, the change must also be made
in main memory.
In this lecture we study how this update is done and what problems
we face.
Cache to memory write: Once we access data in cache memory and make changes to
it, we need to reflect these changes in main memory too. There are two policies for this.
Write Through
When writing into memory:
o If Hit, both cache and memory are written in parallel
o If Miss, memory is written
o For a read miss, the missing block may be loaded onto a cache block, overwriting it
Memory is always updated
-> Important when CPU and DMA I/O are both executing
Slow, due to the memory access time
Write-Back (Copy-Back)
When writing into memory:
o If Hit, only cache is written
o If Miss, the missing block is brought to cache and the write is made into cache
o For a read miss, the candidate block must be written back to the memory
Memory is not up-to-date, i.e., the same item in cache and memory may have
different values
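The difference between the two policies can be seen in a small model of the write path. This is a simplified sketch: it works on single words instead of blocks, and the class name, the `dirty` set and the `evict` method are illustrative assumptions rather than part of the notes.

```python
class WritePolicyCache:
    def __init__(self, memory, write_back=False):
        self.memory = memory            # dict: address -> value (main memory)
        self.write_back = write_back
        self.data = {}                  # cached copies
        self.dirty = set()              # addresses modified only in the cache

    def write(self, address, value):
        if self.write_back:
            self.data[address] = value  # hit or miss: only the cache copy is written
            self.dirty.add(address)     # memory is now out of date
        else:
            if address in self.data:    # hit: update the cache copy as well
                self.data[address] = value
            self.memory[address] = value    # memory is always written

    def evict(self, address):
        if address in self.dirty:       # write back before the word is replaced
            self.memory[address] = self.data[address]
            self.dirty.discard(address)
        self.data.pop(address, None)


wt_memory = {10: 1}
wt = WritePolicyCache(wt_memory)
wt.write(10, 5)                          # write through: memory updated immediately

wb_memory = {10: 1}
wb = WritePolicyCache(wb_memory, write_back=True)
wb.write(10, 5)                          # write back: memory still holds the old value
stale_value = wb_memory[10]
wb.evict(10)                             # memory is updated only now
```

With write-back, memory keeps the old value until the word is evicted, which is exactly the window in which cache and memory disagree.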
The policy is chosen depending on 2 important considerations:
1. How frequent the changes in cache memory are.
2. Whether main memory is also used by some other cache memory.
The second case arises in a multiprocessor system, where all the processors use the same
main memory but have their own separate cache memories.
Suppose one part of main memory, containing the value of X, is shared by the cache memories
of all the processors.
Then whether we use the write-through or the write-back policy, the copies of X can become
inconsistent. Keeping them consistent is known as the cache coherence problem.
Cache coherence:
Maintaining Cache Coherency
• Shared Cache
– Disallow private cache
– Access time delay
• Software Approaches
– Read-Only Data are Cacheable
• Private Cache is for Read-Only data
• Shared Writable Data are not cacheable
• Compiler tags data as cacheable and non-cacheable
• Degrade performance due to software overhead
– Centralized Global Table
• Status of each memory block is maintained in CGT: RO(Read-
Only); RW(Read and Write)
• All caches can have copies of RO blocks
• Only one cache can have a copy of RW block
• Hardware Approaches
– Snoopy Cache Controller
• Cache Controllers monitor all the bus requests from CPUs and IOPs
• All caches attached to the bus monitor the write operations
• When a word in a cache is written, memory is also updated (write
through)
• Local snoopy controllers in all other caches check their memory to
determine if they have a copy of that word; If they have, that
location is marked invalid(future reference to this location causes
cache miss)
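The snoopy controller scheme in the last bullets (write through plus invalidation of other copies) can be sketched as follows. This is a toy model: the class names, the shared `Bus` object and the use of a dictionary for main memory are assumptions made for the illustration, and real controllers work on cache blocks rather than single words.

```python
class Bus:
    """Shared bus connecting all caches to main memory."""

    def __init__(self, memory):
        self.memory = memory            # dict: address -> value
        self.caches = []

    def write(self, writer, address, value):
        self.memory[address] = value    # write through to main memory
        for cache in self.caches:
            if cache is not writer:
                cache.snoop(address)    # every other cache sees the write on the bus


class SnoopyCache:
    def __init__(self, bus):
        self.lines = {}                 # address -> value; a missing key means invalid
        self.bus = bus
        bus.caches.append(self)

    def read(self, address):
        if address not in self.lines:   # miss, or copy invalidated earlier
            self.lines[address] = self.bus.memory[address]
        return self.lines[address]

    def write(self, address, value):
        self.lines[address] = value
        self.bus.write(self, address, value)

    def snoop(self, address):
        self.lines.pop(address, None)   # mark our copy of the word invalid


bus = Bus({0: 52})                      # shared variable X at address 0
c1, c2 = SnoopyCache(bus), SnoopyCache(bus)
c1.read(0)
c2.read(0)                              # both caches now hold X = 52
c1.write(0, 120)                        # c2's copy is invalidated by snooping
```

After the write, the next read of X by the second cache misses and fetches the new value from memory, so no processor sees a stale copy.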
Lecture – 20:
· Goals of parallelism
o Segmentation of processor in functional units
· Amdahl’s law
Parallel Processing
In the last unit we discussed the various types of instructions and microinstructions; here
we discuss how instructions can be executed in parallel.
Parallel processing is the term used for the simultaneous execution of 2 or more
instructions, i.e. the execution of concurrent events in the computing process to achieve
faster computational speed.
Various levels of parallel processing are:
- Job or Program level
- Task or Procedure level
- Inter-Instruction level
- Intra-Instruction level
The first type of parallelism is implemented by increasing the number of processors. Such
systems can be classified on the basis of their instruction and data streams, a
classification given by M. J. Flynn:
SISD
SIMD
MISD
MIMD
The other technique is that, instead of using more than one processor, we
divide the work among several processing units within one processor. This is called
segmentation of the processor into functional units.
Fig: Processor with multiple functional units
This saves the cost of adding more and more processors. We divide one
processor into various functional units so that they can work simultaneously on different
types of instructions.
That means an instruction requiring a shift operation can run simultaneously with an
instruction requiring addition on a single processor.
Another way to improve performance is through pipelining. Pipelining is a technique of
decomposing a sequential process into suboperations, with each suboperation being
executed in a special dedicated segment that operates concurrently with all other
segments.
Whether we add processors or segment the processor into functional units,
the speedup we achieve depends on the percentage of the work that can be executed in
parallel. This idea is captured by Amdahl's law.
(Functional units shown in the figure: adder/subtractor, multiplier/divider, logic unit,
shift unit, incrementer, floating point multiply, floating point add/subtract and floating
point divide, all placed between the processor registers and the path to memory)
Amdahl's law, also known as Amdahl's argument, is named after computer architect
Gene Amdahl and is used to find the maximum expected improvement to an overall
system when only part of the system is improved. It is often used in parallel computing to
predict the theoretical maximum speedup using multiple processors.
The speedup of a program using multiple processors in parallel computing is limited by
the time needed for the sequential fraction of the program. For example, if a program
needs 20 hours using a single processor core, and a particular portion of 1 hour cannot be
parallelized, while the remaining portion of 19 hours (95%) can be parallelized, then
regardless of how many processors we devote to a parallelized execution of this program,
the execution time cannot be less than that critical 1 hour. Hence the speedup is limited
to 20x.
The argument comes from Gene Amdahl's 1967 paper titled "Validity of the single processor
approach to achieving large scale computing capabilities". It is usually stated as:
If F is the fraction of the calculation that is sequential and (1 – F) is the fraction that
can be parallelized, then the maximum speedup that can be achieved by using P
processors is:
Speedup = 1 / (F + (1 - F)/P)
For example:
1. If 50% of the program can be parallelized, adding 1 more processor (P = 2) gives
a speedup of only 1.333 times rather than 2 times:
1/(0.50 + 0.50/2) = 1.333
Similarly, if we add 4 more processors (P = 5), the speedup is only 1.667 times.
2. If 75% of the program can be parallelized, the speedup from adding one more
processor is 1.6 times, which is more than the 1.33 times obtained with a 50%
parallel portion.
Speedup for a given percentage of parallel execution and number of processors:
Number of      Percent parallel execution
processors     0      50      75      90      95      100
2              1      1.33    1.6     1.82    1.9     2
5              1      1.67    2.5     3.57    4.17    5
10             1      1.82    3.08    5.26    6.9     10
100            1      1.98    3.88    9.17    16.8    100
Thus the speedup increases with both the percentage of the parallel portion and the
number of processors, but only up to the limit 1/F set by the sequential portion.
The extreme cases are:
· With a 0% parallel portion, there is no speedup whatever the number of
processors.
· With a 100% parallel portion, the maximum speedup is equal to the
number of processors used.
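The formula can be checked numerically. The following is a minimal sketch, not part of the original notes; the function name amdahl_speedup and its parameter names are chosen here for illustration only.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup predicted by Amdahl's law: 1 / (F + (1 - F) / P).

    parallel_fraction is (1 - F), the share of the work that can be
    parallelized; processors is P.
    """
    sequential_fraction = 1.0 - parallel_fraction  # F in the formula
    return 1.0 / (sequential_fraction + parallel_fraction / processors)


def speedup_table(fractions, processor_counts):
    """Return {P: [speedup for each parallel fraction]} rounded to 2 places."""
    return {
        p: [round(amdahl_speedup(f, p), 2) for f in fractions]
        for p in processor_counts
    }
```

Calling speedup_table([0.0, 0.5, 0.75, 0.9, 0.95, 1.0], [2, 5, 10, 100]) produces the speedup values for the same percentages and processor counts used in the table of this section.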
Lecture – 21:
· Pipelining or Pipeline processing
o Example
o Data table
Another way to improve performance is through pipelining. Pipelining is a technique of
decomposing a sequential process into suboperations, with each suboperation being
executed in a special dedicated segment that operates concurrently with all other
segments.
Example :
Suppose we have to evaluate the expression A * B + C for several sets of data values:
Ai * Bi + Ci for i = 1, 2, 3, ... , 7
The evaluation is divided into three steps, each carried out in its own segment:
Segment 1: R1 and R2 are loaded from memory with Ai and Bi
Segment 2: the multiplier forms R1 * R2 in R3, and R4 is loaded with Ci
Segment 3: the adder forms R3 + R4 in R5
While we work in one segment, we do not leave the other segments idle. The segments
work simultaneously, each on a different set of data values. If the seven sets of data
were processed sequentially, we would need 7 * 3 = 21 clock pulses (counting 1 pulse
for each segment). Pipelining reduces this number.
This is implemented as:
Step 1: In the first clock pulse, R1 is loaded with the first value of A (A1) and R2 is
loaded with B1.
Step 2: In the second clock pulse, the values of R1 and R2 are given to the multiplier and
the product A1 * B1 is loaded into R3, while C1 is loaded into R4. R1 and R2 are now free,
so we load A2 and B2 into them. That means in pulse 2 both segment 1 and segment 2 are
working.
Step 3: In the third clock pulse, the product in R3 and the value C1 in R4 are given to the
adder, and the sum is loaded into R5. R3 and R4 are then free, so we multiply A2 and B2
from R1 and R2 and load C2. Segment 1 is now free as well, so we load A3 and B3 into R1
and R2.
Similarly, in the next step:
In segment 1 : R1 and R2 take the values A4 and B4.
In segment 2 : R1 and R2 (A3 and B3) are multiplied into R3, and R4 is loaded with C3.
In segment 3 : The adder is working on A2 * B2 and C2.
Data table:
Clock pulse   Segment 1     Segment 2           Segment 3
number        R1     R2     R3         R4       R5
1             A1     B1     --         --       --
2             A2     B2     A1 * B1    C1       --
3             A3     B3     A2 * B2    C2       A1 * B1 + C1
4             A4     B4     A3 * B3    C3       A2 * B2 + C2
5             A5     B5     A4 * B4    C4       A3 * B3 + C3
6             A6     B6     A5 * B5    C5       A4 * B4 + C4
7             A7     B7     A6 * B6    C6       A5 * B5 + C5
8             --     --     A7 * B7    C7       A6 * B6 + C6
9             --     --     --         --       A7 * B7 + C7
So the number of clock pulses needed for sequential execution,
(no of segments) * (no of data sets) = 3 * 7 = 21,
is replaced with (no of segments) + (no of data sets) - 1 = 3 + 7 - 1 = 9.
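The clock-by-clock behaviour of this three-segment pipeline can be reproduced with a small simulation. This is only an illustrative sketch; the function name simulate_pipeline and the use of None for an empty register are assumptions made here, not part of the original notes.

```python
def simulate_pipeline(a, b, c):
    """Simulate the 3-segment pipeline computing a[i] * b[i] + c[i].

    Returns (results, history), where history holds one tuple
    (clock, R1, R2, R3, R4, R5) per clock pulse.
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    results, history = [], []
    for clock in range(1, n + 3):  # n + 3 - 1 clock pulses in total
        # Update from the last segment to the first, so each segment
        # uses the values left in the registers by the previous pulse.
        r5 = r3 + r4 if r3 is not None else None       # segment 3: adder
        if r1 is not None:                              # segment 2: multiplier
            r3, r4 = r1 * r2, c[clock - 2]
        else:
            r3 = r4 = None
        if clock <= n:                                  # segment 1: load inputs
            r1, r2 = a[clock - 1], b[clock - 1]
        else:
            r1 = r2 = None
        if r5 is not None:
            results.append(r5)
        history.append((clock, r1, r2, r3, r4, r5))
    return results, history
```

For seven data sets the history has 9 entries, which matches the count of no of segments + no of data sets - 1.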
Lecture – 22:
· Instruction level parallelism
o Instruction steps
o Example
o Flowchart
· Pipelining hazards
Instruction Pipelining
The pipeline we discussed in the last lecture applied a single operation to a stream of
data values. We can also divide the steps of an instruction into segments and execute
them with the help of pipelining.
This is known as instruction pipelining, a form of instruction level parallelism.
The steps of a particular instruction are:
[1] Fetch an instruction from memory
[2] Decode the instruction
[3] Calculate the effective address of the operand
[4] Fetch the operands from memory
[5] Execute the operation
[6] Store the result in the proper place
* Some instructions skip some phases
* Effective address calculation can be done as part of the decoding phase
* Storage of the operation result into a register is done automatically in the
execution phase
==> 4-Stage Pipeline
[1] FI: Fetch an instruction from memory
[2] DA: Decode the instruction and calculate the effective address of the operand
[3] FO: Fetch the operand
[4] EX: Execute the operation
Say we have 3 instructions. If we execute them sequentially, the scenario is:
i      FI DA FO EX
i+1                FI DA FO EX
i+2                            FI DA FO EX
But if we use pipelining, the scenario is:
i      FI DA FO EX
i+1       FI DA FO EX
i+2          FI DA FO EX
However, there are exceptions, as in the case of branches and interrupts. Let us discuss
this with the help of a timing diagram and a flowchart.
Here instructions are pipelined in sequence up to instruction 3. When the 3rd instruction
is decoded, we learn that it branches to some other address, so the fourth instruction,
which has already been fetched, is not the next instruction to be executed. We do not
take it any further and wait until the 3rd instruction has executed, which gives us the
address of the instruction to execute next. The instruction at that address becomes the
new 4th instruction, and pipelining continues.
Timing of the instruction pipeline (instruction 3 is a branch):
Step:           1   2   3   4   5   6   7   8   9   10  11  12  13
Instruction 1   FI  DA  FO  EX
            2       FI  DA  FO  EX
   (Branch) 3           FI  DA  FO  EX
            4               FI  --  --  FI  DA  FO  EX
            5                   --  --  --  FI  DA  FO  EX
            6                                   FI  DA  FO  EX
            7                                       FI  DA  FO  EX
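The effect of the branch on the timing can be sketched in a few lines. The sketch assumes the simple policy described in the text, in which the instructions after a branch are fetched again only after the branch has completed its EX stage; the names STAGES, fetch_steps and total_steps are hypothetical.

```python
STAGES = ("FI", "DA", "FO", "EX")


def fetch_steps(num_instructions, branch_at=None):
    """Step at which each instruction's final FI stage takes place.

    branch_at is the (1-based) number of the branch instruction, if any.
    """
    steps = []
    next_fetch = 1
    for number in range(1, num_instructions + 1):
        steps.append(next_fetch)
        if number == branch_at:
            # Wait for FI, DA, FO and EX of the branch before fetching again.
            next_fetch += len(STAGES)
        else:
            next_fetch += 1
    return steps


def total_steps(num_instructions, branch_at=None):
    """Step in which the last instruction finishes its EX stage."""
    return fetch_steps(num_instructions, branch_at)[-1] + len(STAGES) - 1
```

With 7 instructions and a branch at instruction 3, the last instruction completes at step 13; without the branch it would complete at step 10.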
Fig: Flowchart of the four-segment instruction pipeline
Segment 1: Fetch instruction from memory
Segment 2: Decode instruction and calculate effective address. Branch? If yes, update PC
and empty the pipe
Segment 3: Fetch operand from memory
Segment 4: Execute instruction. Interrupt? If yes, perform interrupt handling, update PC
and empty the pipe
Limitations of Pipelining / Pipelining hazards
Pipelining has many advantages, but there are some problem areas that we face when
using it.
The major hazards we face in pipelined execution are:
Structural Hazards
Occur when some resource has not been duplicated enough to allow all combinations
of instructions in the pipeline to execute.
Example: With one memory-port, a data and an instruction fetch
cannot be initiated in the same clock
The pipeline is stalled for a structural hazard. For example, two loads with a one-port
memory:
i      FI DA FO EX
i+1       FI DA FO EX
i+2          stall stall FI DA FO EX
A two-port memory will serve both accesses without a stall.
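Data Hazards: Occur when the execution of an instruction depends on the result of a
previous instruction, for example:
ADD R1, R2, R3
SUB R4, R1, R5
Here SUB uses R1, which is produced by ADD (taking the first register as the destination).
A data hazard can be dealt with by either HW techniques or SW techniques.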
HW Technique
- Interlock
- hardware detects the data dependencies and delays the scheduling
of the dependent instruction by stalling enough clock cycles
- (Operand) Forwarding (bypassing, short-circuiting)
- Accomplished by a data path that routes a value from a source
(usually an ALU) to a user, bypassing a designated register. This
allows the value to be produced to be used at an earlier stage in the
pipeline than would otherwise be possible
SW Technique
- Instruction Scheduling (compiler) for delayed load
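The dependency check that an interlock or a scheduling compiler has to perform can be sketched as follows. This is only an illustration under stated assumptions: a three-address format with the destination register written first, and a check between adjacent instructions only. The helper names parse and raw_hazards are hypothetical.

```python
def parse(instruction):
    """Split 'ADD R1, R2, R3' into (opcode, destination, sources).

    Assumes the destination register is the first operand.
    """
    opcode, operands = instruction.split(None, 1)
    registers = [r.strip() for r in operands.split(",")]
    return opcode, registers[0], registers[1:]


def raw_hazards(program):
    """Find adjacent instruction pairs where the second reads the first's result.

    Returns a list of (producer_index, consumer_index, register).
    """
    hazards = []
    for i in range(len(program) - 1):
        _, dest, _ = parse(program[i])
        _, _, sources = parse(program[i + 1])
        if dest in sources:
            hazards.append((i, i + 1, dest))
    return hazards
```

For the ADD and SUB pair shown above, raw_hazards reports a dependency on R1, which is the case where hardware would stall or forward, or where a compiler would try to place an independent instruction between the two.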
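Control Hazards: Occur when branches and other instructions that change the PC disrupt
the sequential fetching of instructions. The following techniques are used to deal with them: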
Prefetch Target Instruction
– Fetch instructions in both streams, branch not taken and branch taken
– Both are saved until branch is executed. Then, select the right
instruction stream and discard the wrong stream
Branch Target Buffer (BTB; Associative Memory)
– Entry: Addr of previously executed branches; Target instruction
and the next few instructions
– When fetching an instruction, search BTB.
– If found, fetch the instruction stream in BTB;
– If not, new stream is fetched and update BTB
Loop Buffer (High Speed Register file)
– Storage of entire loop that allows to execute a loop
without accessing memory
Branch Prediction
– Guessing the branch condition, and fetch an instruction stream based on
the guess. Correct guess eliminates the branch penalty
Delayed Branch
– Compiler detects the branch and rearranges the instruction sequence
by inserting useful instructions that keep the pipeline busy
in the presence of a branch instruction
Lecture – 23:
· Vector Processors
· Super computers
· Memory Interleaving
· Array Processors
o SIMD array processor
o Attached array processor
There are various types of processors which perform particular operations.
Vector Processors:
One more type of processor that we use is the vector processor. It has the
ability to process vectors, and related data structures such as matrices and multidimensional
arrays, much faster than conventional computers.
Vector Processing Applications
• Problems that can be efficiently formulated in terms of vectors
– Long-range weather forecasting
– Petroleum explorations
– Seismic data analysis
– Medical diagnosis
– Aerodynamics and space flight simulations
– Artificial intelligence and expert systems
– Mapping the human genome
– Image processing
Vector processors may also be pipelined.
Example:
DO 20 I = 1, 100
20 C(I) = B(I) + A(I)
Conventional computer
Initialize I = 1
20 Read A(I)
Read B(I)
Store C(I) = A(I) + B(I)
Increment I = I + 1
If I ≤ 100 goto 20
Vector computer
C(1:100) = A(1:100) + B(1:100)
Supercomputers: Supercomputer is a broad term for one of the fastest computers currently
available. Such computers are typically used for number crunching including scientific
simulations, (animated) graphics, analysis of geological data (e.g. in petrochemical
prospecting), structural analysis, computational fluid dynamics, physics, chemistry,
electronic design, nuclear energy research and meteorology. Perhaps the best known
supercomputer manufacturer is Cray Research.
The chief difference between a supercomputer and a mainframe is that a supercomputer channels
all its power into executing a few programs as fast as possible, whereas a mainframe uses its
power to execute many programs concurrently.
A supercomputer is a computer that leads the world in terms of processing capacity,
particularly speed of calculation, at the time of its introduction. The first supercomputers were
introduced in the 1960s, led primarily by Seymour Cray at Control Data Corporation (CDC),
which led the market into the 1970s until Cray split off to form his own company, Cray
Research, and then took over the market. In the 1980s a large number of smaller competitors
entered the market, a parallel to the creation of the minicomputer market a decade earlier, many
of whom disappeared in the mid-1990s "supercomputer market crash". Today supercomputers
are typically one-off custom designs produced by "traditional" companies such as IBM and HP,
who had purchased many of the 1980s companies to gain their experience.
Technologies developed for supercomputers include:
· Vector processing : A vector processor, or array processor, is a CPU design where
the instruction set includes operations that can perform mathematical operations
on multiple data elements simultaneously. This is in contrast to a scalar processor
which handles one element at a time using multiple instructions. The vast
majority of CPUs are scalar (or close to it). Vector processors were common in
the scientific computing area, where they formed the basis of most
supercomputers through the 1980s and into the 1990s, but general increases in
performance and processor design saw the near disappearance of the vector
processor as a general-purpose CPU.
· Liquid cooling : An uncommon practice is to submerse the computer's
components in a thermally conductive liquid. Personal computers that are cooled
in this manner do not generally require any fans or pumps, and may be cooled
exclusively by passive heat exchange between the computer's parts, the cooling
fluid and the ambient air.
· Non-Uniform Memory Access (NUMA) : Non-Uniform Memory Access or Non-
Uniform Memory Architecture (NUMA) is a computer memory design used in
multiprocessors, where the memory access time depends on the memory location
relative to a processor. Under NUMA, a processor can access its own local
memory faster than non-local memory, that is, memory local to another processor
or memory shared between processors.
· Striped disks (the first instance of what was later called RAID): In computer data
storage, data striping is the segmentation of logically sequential data, such as a
single file, so that segments can be assigned to multiple physical devices (usually
disk drives in the case of RAID storage, or network interfaces in the case of
Grid-oriented Storage) in a round-robin fashion and thus written concurrently.
· Parallel filesystems : In computing, a file system (often also written as filesystem)
is a method for storing and organizing computer files and the data they contain to
make it easy to find and access them. File systems may use a data storage device
such as a hard disk or CD-ROM and involve maintaining the physical location of
the files, they might provide access to data on a file server by acting as clients for
a network protocol (e.g., NFS, SMB, or 9P clients), or they may be virtual and
exist only as an access method for virtual data (e.g., procfs). It is distinguished
from a directory service and registry.
Memory Interleaving
Also known as multiple memory modules and interleaving.
Memory interleaving is the term used because the addresses are assigned across several
memory modules, which are then accessed in an interleaved manner to exchange the data.
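One common way of assigning addresses to modules can be sketched as follows. The notes do not say which scheme is meant, so this sketch assumes low-order interleaving, in which consecutive addresses fall in consecutive modules; the function name interleave and the default of 4 modules are illustrative assumptions.

```python
def interleave(address, modules=4):
    """Map a memory address to (module number, address inside the module).

    Assumes low-order interleaving: the module is selected by
    address mod modules, so consecutive addresses go to different modules.
    """
    return address % modules, address // modules
```

With 4 modules, addresses 0, 1, 2, 3 map to modules 0, 1, 2, 3 and address 4 returns to module 0, so a run of consecutive accesses can be spread over all the modules.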
Array Processors:
A microprocessor that executes one instruction at a time but on an array or table of data at the
same time rather than on single data elements.
• Array processor performs a single instruction in multiple execution units in the same clock
cycle
• The different execution units have same instruction using same set of vectors in the array.
Features of array processor:
• Use of parallel execution units for processing different vectors of the arrays
• Use of memory interleaving, n memory address registers and n memory data registers in
case of k pipelines and use of vector register files
A computer/processor that has an architecture especially designed for processing arrays (e.g.
matrices) of numbers. The architecture includes a number of processors (say 64 by 64) working
simultaneously, each handling one element of the array, so that a single operation can apply to all
elements of the array in parallel. To obtain the same effect in a conventional processor, the
operation must be applied to each element of the array sequentially, and so consequently much
more slowly.
An array processor may be built as a self-contained unit attached to a main computer via an I/O
port or internal bus; alternatively, it may be a distributed array processor where the processing
elements are distributed throughout, and closely linked to, a section of the computer's memory.
Array processors are very powerful tools for handling problems with a high degree of
parallelism. They do however demand a modified approach to programming. The conversion of
conventional (sequential) programs to serve array processors is not a trivial task, and it is
sometimes necessary to select different (parallel) algorithms to suit the parallel approach.
Array processors are most importantly implemented in 2 ways:
SIMD array processors: A SIMD array processor is a computer with multiple processing
units operating in parallel. The processing units are synchronized to perform the same operation
under the control of a common control unit, thus providing a single instruction stream, multiple
data stream organization.
• Data level parallelism in an array processor: for example, the multiplier unit pipelines
operate in parallel, computing x[i] × y[i] in a number of parallel units.
• Its multiple functional units perform their actions simultaneously
Fig: SIMD Array Processor
Attached array processors:
The various components of this structure are:
General purpose computer : Used for general processing
Main memory : Memory attached to the general purpose computer
I/O interface : Connects the two processors
Attached array processor : The array processor used for heavy computations
Local memory : Attached to the array processor
• The attached array processor has an input-output interface to the general purpose computer
and another interface with a local memory
• The local memory is connected to the main memory
Fig: Attached Array Processor
Lecture – 24:
· Instruction Codes
· Type of Instructions
o Memory reference type of Instructions
o Register reference type of Instructions
o I/O reference type of Instructions
Instruction Codes:
In the last topics we studied the various types of organization of our computer. Today we
will study the various types of instructions supported by our computer.
Before that, let us look at the cycle of an instruction.
• In Basic Computer, a machine instruction is executed in the following cycle:
1. Fetch an instruction from memory
2. Decode the instruction
3. Read the effective address from memory if the instruction has an indirect
address
4. Execute the instruction
• After an instruction is executed, the cycle starts again at step 1, for the next
instruction
• Note: Every different processor has its own (different)
instruction cycle
The Basic Computer instruction format for memory-reference instructions
(OP-code = 000 ~ 110) is:
 15  14     12  11              0
 I   Opcode     Address
In this type of instruction we refer to memory to fetch the operand. These
instructions are therefore called memory reference instructions.
The fields are:
· I – the mode field, which tells us whether the operand is fetched using direct or
indirect addressing, i.e.
o I = 0 Direct address
o I = 1 Indirect address
· Opcode – the field which tells us the type of operation to be performed. Since it is
3 bits wide, 2^3 = 8 codes are possible. Codes 000 to 110 give the 7 memory
reference operations, and code 111 is used for register-reference and
input-output instructions.
· Address – the field which gives the address from which we have to fetch the
operand.
The operations possible in memory reference instructions are AND, ADD, LDA, STA,
BUN, BSA and ISZ.
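Splitting an instruction word into the three fields of this format can be sketched with shifts and masks. The function name decode_fields is a hypothetical helper introduced here for illustration.

```python
def decode_fields(instruction):
    """Split a 16-bit instruction word into (I, opcode, address)."""
    mode = (instruction >> 15) & 0x1      # bit 15: I
    opcode = (instruction >> 12) & 0x7    # bits 14-12: opcode
    address = instruction & 0xFFF         # bits 11-0: address
    return mode, opcode, address
```

For example, the word 0x5087 decodes to I = 0, opcode = 5 and address = 135, and 0xC087 decodes to I = 1, opcode = 4 and address = 135.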
The effective address of the instruction is in AR and was placed there during
timing signal T2 when I = 0, or during timing signal T3 when I = 1
- Memory cycle is assumed to be short enough to complete in a CPU cycle
- The execution of MR instruction starts with T4
AND to AC
D0T4: DR ← M[AR] Read operand
D0T5: AC ← AC ∧ DR, SC ← 0 AND with AC
ADD to AC
D1T4: DR ← M[AR] Read operand
D1T5: AC ← AC + DR, E ← Cout, SC ← 0 Add to AC and store carry in E
LDA: Load to AC
D2T4: DR ← M[AR]
D2T5: AC ← DR, SC ← 0
STA: Store AC
D3T4: M[AR] ← AC, SC ← 0
BUN: Branch Unconditionally
D4T4: PC ← AR, SC ← 0
BSA: Branch and Save Return Address
M[AR] ← PC, PC ← AR + 1
Summary of memory-reference instructions:
Symbol   Operation decoder   Symbolic description
AND      D0                  AC ← AC ∧ M[AR]
ADD      D1                  AC ← AC + M[AR], E ← Cout
LDA      D2                  AC ← M[AR]
STA      D3                  M[AR] ← AC
BUN      D4                  PC ← AR
BSA      D5                  M[AR] ← PC, PC ← AR + 1
ISZ      D6                  M[AR] ← M[AR] + 1, if M[AR] + 1 = 0 then PC ← PC + 1
BSA:
D5T4: M[AR] ← PC, AR ← AR + 1
D5T5: PC ← AR, SC ← 0
ISZ: Increment and Skip-if-Zero
D6T4: DR ← M[AR]
D6T5: DR ← DR + 1
D6T6: M[AR] ← DR, if (DR = 0) then (PC ← PC + 1), SC ← 0
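Example of BSA execution (the instruction 0 BSA 135 is stored at address 20):
Memory, PC and AR before execution:
Address       Contents
20            0 BSA 135
21 (PC)       Next instruction
135 (AR)      (return address is stored here)
136           Subroutine
              1 BUN 135
Memory and PC after execution:
Address       Contents
20            0 BSA 135
21            Next instruction
135           21
136 (PC)      Subroutine
              1 BUN 135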
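The two BSA steps can be traced with a few lines of code. This is a minimal sketch in which memory is modelled as a Python dict from address to contents; the function name execute_bsa is an assumption made for illustration.

```python
def execute_bsa(memory, pc, ar):
    """Apply the BSA microoperations and return the new PC.

    memory is a dict mapping address -> contents.
    """
    memory[ar] = pc   # D5T4: M[AR] <- PC (save the return address)
    ar = ar + 1       # D5T4: AR <- AR + 1
    pc = ar           # D5T5: PC <- AR (first instruction of the subroutine)
    return pc
```

Starting with PC = 21 and AR = 135, as in the example of BSA 135 stored at address 20, the return address 21 is stored at location 135 and PC becomes 136. The subroutine later returns through the indirect branch 1 BUN 135.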
Fig: Flowchart for memory-reference instructions, summarizing the microoperations of AND,
ADD, LDA, STA, BUN, BSA and ISZ from D0T4 to D6T6
Register – Reference Instructions
The instruction format to represent the register-reference type of instructions
(OP-code = 111, I = 0) is:
 15        12  11                    0
 0  1  1  1    Register operation
In this case the 4 high-order bits (15 to 12) are fixed, i.e. 0111, and the 12 low-order
bits B0 to B11 denote the operation to be performed. Each of these bits represents an
individual instruction.
In this type of instruction the instruction itself tells us the operation and the
register on which the operation has to be performed.
- D7 = 1, I = 0
- Register Ref. Instr. is specified in B0 ~ B11 of IR
- Execution starts with timing signal T3
r = D7 I′ T3 => Register Reference Instruction
Bi = IR(i) , i=0,1,2,...,11
r: SC ← 0
CLA rB11: AC ← 0
CLE rB10: E ← 0
CMA rB9: AC ← AC′
CME rB8: E ← E′
CIR rB7: AC ← shr AC, AC(15) ← E, E ← AC(0)
CIL rB6: AC ← shl AC, AC(0) ← E, E ← AC(15)
INC rB5: AC ← AC + 1
SPA rB4: if (AC(15) = 0) then (PC ← PC+1)
SNA rB3: if (AC(15) = 1) then (PC ← PC+1)
SZA rB2: if (AC = 0) then (PC ← PC+1)
SZE rB1: if (E = 0) then (PC ← PC+1)
HLT rB0: S ← 0 (S is a start-stop flip-flop)
Input-Output Instructions
The instruction format to represent input-output type of instructions
(OP-code = 111, I = 1) is:
 15        12  11               0
 1  1  1  1    I/O operation
In this case the 4 high-order bits (15 to 12) are fixed, i.e. 1111, and the 12 low-order
bits B0 to B11 denote the input-output operation to be performed.
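The register-reference encoding described in this section, with bits 15 to 12 fixed at 0111 and one of B0 to B11 selecting the operation, can be decoded as in the following sketch. The names REGISTER_REFERENCE and decode_register_reference are hypothetical; the bit assignments are taken from the register-reference table.

```python
# Bit position in IR -> register-reference mnemonic
REGISTER_REFERENCE = {
    11: "CLA", 10: "CLE", 9: "CMA", 8: "CME", 7: "CIR", 6: "CIL",
    5: "INC", 4: "SPA", 3: "SNA", 2: "SZA", 1: "SZE", 0: "HLT",
}


def decode_register_reference(instruction):
    """Return the mnemonics selected by B11..B0 of a 16-bit word.

    Returns None if bits 15-12 are not 0111, i.e. the word is not a
    register-reference instruction.
    """
    if (instruction >> 12) != 0x7:
        return None
    return [
        name
        for bit, name in REGISTER_REFERENCE.items()
        if instruction & (1 << bit)
    ]
```

For example, 0x7800 has only B11 set and decodes to CLA, while 0x7001 has only B0 set and decodes to HLT. A word starting with 1111, such as 0xF800, is an input-output instruction and is rejected by this helper.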
To understand these operations, let us discuss a simple computer with input and
output devices connected to it.
Later we will discuss how the control unit identifies what type of instruction is
being executed.
Input-Output Configuration
Here are the details of the registers used in this organization:
INPR Input register - 8 bits : When we enter a value from the keyboard (or
from any other input device), its alphanumeric code is stored in INPR, from where
it is then moved to the accumulator.
OUTR Output register - 8 bits : Similar to INPR, OUTR is the register
which holds the alphanumeric code it gets from the accumulator before it
is printed on the printer (or displayed on the monitor).
AC Accumulator – 16 bits: The accumulator is the main processor register, which
receives the inputs from INPR and supplies the outputs to OUTR.
FGI Input flag - 1 bit: This is a control flip-flop used to synchronize the
timing difference between the input device and the processor.
FGO Output flag - 1 bit: Similar to FGI, this is a control flip-flop used to
synchronize the timing difference between the output device and the
processor.
IEN Interrupt enable - 1 bit: This is a flip-flop which tells us whether the
running program may be interrupted or not.
Fig: Input-output configuration. The input-output terminal (keyboard and printer) is
connected through the serial communication interface (transmitter interface and receiver
interface) to the computer registers and flip-flops INPR, FGI, OUTR, FGO and AC. Serial
and parallel communication paths are marked.
Important points:
- The terminal sends and receives serial information
- The serial info. from the keyboard is shifted into INPR
- The serial info. for the printer is stored in the OUTR
- INPR and OUTR communicate with the terminal
serially and with the AC in parallel.
- The flags are needed to synchronize the timing
difference between I/O device and the computer
The process continues as:
Initially, the input flag FGI is cleared to 0. When a key is struck on the keyboard,
an 8-bit alphanumeric code is shifted into INPR and the input flag FGI is set to 1. As
long as FGI is 1, new information cannot be entered into INPR. The computer checks
the flag bit; if it is 1, the information from INPR is transferred in parallel to AC and FGI
is cleared to 0, which means INPR is ready to take new key input.
The operation is similar for output devices, except for the direction of flow. Initially
FGO is set to 1. The computer checks the flag bit; if it is 1, the information from AC
is transferred to OUTR and FGO is cleared to 0. The output device accepts the coded
information, prints the corresponding character, and when the operation is completed,
sets the flag to 1. The computer does not load a new character into OUTR while FGO is 0.
Having understood the operation of the I/O organization, let us discuss the various
input-output instructions.
In this type of instruction, too, the instruction itself tells us the operation to be
performed. No memory operand is needed.
p = D7 I T3 => Input-Output Instruction
Bi = IR(i), i = 6, …, 11
That means six operations are supported for input-output and interrupt
instructions, selected by bits B6 to B11. Bits B0 to B5 are not used.
The operations are:
p: SC ← 0 Clear SC
INP pB11: AC(0-7) ← INPR, FGI ← 0 Input char. to AC
OUT pB10: OUTR ← AC(0-7), FGO ← 0 Output char. from AC
SKI pB9: if (FGI = 1) then (PC ← PC + 1) Skip on input flag
SKO pB8: if (FGO = 1) then (PC ← PC + 1) Skip on output flag
ION pB7: IEN ← 1 Interrupt enable on
IOF pB6: IEN ← 0 Interrupt enable off
INP: Transfers the value of INPR to the low-order 8 bits of the accumulator and clears
FGI to 0.
OUT: Sends the low-order 8 bits of the accumulator to OUTR and clears FGO to 0.
SKI: Checks the input flag. If FGI is 1, the next instruction is skipped.
SKO: Similarly, checks the output flag. If FGO is 1, the next instruction is skipped.
Note that the instruction following SKI or SKO is normally a branch back to the skip
instruction, so that the program returns and checks the flag again until it becomes 1.
ION: This turns the interrupt facility on, i.e. the running program can be interrupted
once the IEN (interrupt enable) flag is set to 1.
IOF: This clears IEN to 0, so that no interrupt is possible.
To explain more about interrupts and the role of the IEN flag, let us discuss the
interrupt cycle.
R is the interrupt flip-flop, which determines whether the computer goes through
a normal instruction cycle or an interrupt cycle. Thus we check whether R = 0 or not.
If R is 0, this is the case of the normal instruction cycle. We fetch, decode and execute
the instruction and in parallel check whether there is an interrupt request. The system
accepts interrupts only if IEN is 1. Thus, if IEN is 0, there is no chance of an interrupt
cycle. If IEN is 1, we check the flags FGI and FGO. If both are 0, no device is ready for
a transfer, so there is no interrupt. If either of them is 1, we set the value of R to 1
and continue with an interrupt cycle.
In case of an interrupt, we have to store the address of the next instruction of the
normal execution sequence in some place. It is stored at memory address 0, PC is set
to 1, and IEN and R are cleared to 0 to avoid a further interrupt until this interrupt
has been serviced.
Fig: Flowchart for the interrupt cycle
R = 0 (instruction cycle): Fetch and decode instructions, then execute instructions.
If IEN = 1 and (FGI = 1 or FGO = 1), then R ← 1.
R = 1 (interrupt cycle): Store return address in location 0 (M[0] ← PC), branch to
location 1 (PC ← 1), then IEN ← 0, R ← 0.
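The decision and the interrupt cycle described above can be sketched as two small functions. The function names check_interrupt and interrupt_cycle are hypothetical, and memory is modelled as a dict from address to contents.

```python
def check_interrupt(ien, fgi, fgo):
    """Value given to R during the execute phase of the instruction cycle.

    R becomes 1 only if interrupts are enabled and at least one of the
    I/O flags is set.
    """
    return 1 if ien == 1 and (fgi == 1 or fgo == 1) else 0


def interrupt_cycle(memory, pc):
    """Perform M[0] <- PC, PC <- 1, IEN <- 0, R <- 0.

    Returns the new (pc, ien, r).
    """
    memory[0] = pc    # store the return address in location 0
    return 1, 0, 0    # branch to location 1 and disable further interrupts
```

The program placed at location 1 would normally branch to the service routine, and the routine would return through the address saved in location 0.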
Lecture – 25:
· Computer Registers
· Instruction set completeness
· Timing and control circuit
Another type of organization uses some processor registers other than accumulator.
Some important points to be noticed are :
• A processor has many registers to hold instructions, addresses, data, etc
• The processor has a register, the Program Counter (PC) that holds the memory
address of the next instruction to get
– Since the memory in the Basic Computer only has 4096 locations, the PC
only needs 12 bits
• In a direct or indirect addressing, the processor needs to keep track of what
locations in memory it is addressing: The Address Register (AR) is used for this
– The AR is a 12 bit register in the Basic Computer
• When an operand is found, using either direct or indirect addressing, it is placed
in the Data Register (DR). The processor then uses this value as data for its
operation
• The Basic Computer has a single general purpose register – the Accumulator
(AC).
• The significance of a general purpose register is that it can be referred to in
instructions
• e.g. load AC with the contents of a specific memory location; store the contents of
AC into a specified memory location
• Often a processor will need a scratch register to store intermediate results or other
temporary data; in the Basic Computer this is the Temporary Register (TR)
• The Basic Computer uses a very simple model of input/output (I/O) operations
• Input devices are considered to send 8 bits of character data to the processor
• The processor can send 8 bits of character data to output devices
• The Input Register (INPR) holds an 8 bit character received from an input device
• The Output Register (OUTR) holds an 8 bit character to be sent to an output
device
The organization of these basic registers looks like:
The data registers are of 16 bit length and the address registers are of 12 bit.
Common Bus System
The common bus system deals with how these various registers are connected and how they
interact with each other.
• The registers in the Basic Computer are connected using a bus.
• This saves circuitry compared with complete connections between registers.
If every register were connected directly to every other register, the design
would become very complex and a large number of connections would be
required.
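Fig: Basic Computer registers and memory
Register   Bit positions   Width
PC         11 - 0          12 bits
AR         11 - 0          12 bits
IR         15 - 0          16 bits
TR         15 - 0          16 bits
DR         15 - 0          16 bits
AC         15 - 0          16 bits
INPR       7 - 0           8 bits
OUTR       7 - 0           8 bits
Memory: 4096 x 16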
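Fig: Basic Computer registers connected to a 16-bit common bus
Unit                      Number in figure   Control inputs
Memory unit (4096 x 16)   7                  Write, Read (address from AR)
AR                        1                  LD, INR, CLR
PC                        2                  LD, INR, CLR
DR                        3                  LD, INR, CLR
AC (with ALU and E)       4                  LD, INR, CLR
IR                        5                  LD
TR                        6                  LD, INR, CLR
OUTR                      (none shown)       LD
INPR                      (none shown)       (none shown)
All registers are driven by the common clock.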
• In Basic Computer, there is only one general purpose register, the Accumulator
(AC)
• In modern CPUs, there are many general purpose registers
• It is advantageous to have many registers
– Transfer between registers within the processor are relatively fast
– Going “off the processor” to access memory is much slower
Instruction set completeness
A computer should have a set of instructions so that the user can
construct machine language programs to evaluate any function that is known to be
computable.
Instruction Types
Functional Instructions
- Arithmetic, logic, and shift instructions
- ADD, CMA, INC, CIR, CIL, AND, CLA
Transfer Instructions
- Data transfers between the main memory
and the processor registers
- LDA, STA
Control Instructions
- Program sequencing and control
- BUN, BSA, ISZ
Input/Output Instructions
- Input and output
- INP, OUT
Flowchart for complete computer operations:
start: SC ← 0, IEN ← 0, R ← 0
If R = 0 (Instruction Cycle):
R′T0: AR ← PC
R′T1: IR ← M[AR], PC ← PC + 1
R′T2: AR ← IR(0~11), I ← IR(15), D0...D7 ← Decode IR(12 ~ 14)
If D7 = 1 (Register or I/O):
D7IT3 (I = 1, I/O): Execute I/O Instruction
D7I′T3 (I = 0, Register): Execute RR Instruction
If D7 = 0 (Memory Ref):
D7′IT3 (I = 1, Indir): AR ← M[AR]
D7′I′T3 (I = 0, Dir): Idle
D7′T4: Execute MR Instruction
If R = 1 (Interrupt Cycle):
RT0: AR ← 0, TR ← PC
RT1: M[AR] ← TR, PC ← 0
RT2: PC ← PC + 1, IEN ← 0, R ← 0, SC ← 0
Lecture – 26:
· Instruction Cycle
o Flowchart for determining the type of instruction
o Timing and control circuit
o Timing Signals
The instruction cycle consists of fetch, decode and execute. At the time of decoding
the instruction we find out the type of the instruction.
In this section we will discuss the flowchart and the corresponding circuit that do so.
Flowchart for determining the type of instruction:
Start: SC ← 0
T0: AR ← PC
T1: IR ← M[AR], PC ← PC + 1
T2: Decode Opcode in IR(12-14), AR ← IR(0-11), I ← IR(15)
D7 = 1 (Register or I/O):
  T3, I = 1 (I/O): Execute Input-output Instruction, SC ← 0
  T3, I = 0 (register): Execute Register-reference Instruction, SC ← 0
D7 = 0 (Memory-reference):
  T3, I = 1 (indirect): AR ← M[AR]
  T3, I = 0 (direct): Nothing
  T4: Execute Memory-reference Instruction, SC ← 0
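The decision made at T2 and T3 can be sketched in Python. This is only an illustration of the flowchart, not a description of the hardware: the function name classify and the returned strings are assumptions, while the bit fields follow the text (I is IR(15), the opcode is IR(12-14), the address is IR(0-11)).

```python
def classify(instruction: int) -> str:
    """Classify a 16-bit instruction word by its I bit and opcode."""
    i_bit = (instruction >> 15) & 0x1    # I <- IR(15)
    opcode = (instruction >> 12) & 0x7   # IR(12-14), decoded into D0..D7
    address = instruction & 0x0FFF       # AR <- IR(0-11)

    if opcode == 7:                      # D7 = 1: register or I/O
        return "input-output" if i_bit else "register-reference"

    # D7 = 0: memory-reference; I selects indirect or direct addressing
    mode = "indirect" if i_bit else "direct"
    return f"memory-reference ({mode}, address {address:03X})"


print(classify(0x7800))   # opcode 111, I = 0
print(classify(0xF800))   # opcode 111, I = 1
print(classify(0x9234))   # opcode 001, I = 1
```

A word with opcode 111 never uses its address field as a memory address, which is why the I bit can be reused to separate register-reference from input-output instructions.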
Control unit of Basic Computer
This circuit shows how the instruction is fetched into IR and how bits 12, 13 and 14 are
decoded to determine the type of instruction. To make this decision, the combinational
control logic also uses the mode bit I (bit 15). Once the type of instruction is known,
the corresponding control signals are generated.
The register transfers at T3 are:
D7’IT3: AR ← M[AR]
D7’I’T3: Nothing
D7I’T3: Execute a register-reference instruction.
D7IT3: Execute an input-output instruction.
[Figure: Control unit of Basic Computer. Instruction register (IR) with fields 15, 14-12
and 11-0; bit 15 feeds I and bits 14-12 feed a 3 x 8 decoder with outputs D0 to D7. A
4-bit sequence counter (SC) with Increment (INR), Clear (CLR) and Clock inputs feeds a
4 x 16 decoder with outputs T0 to T15. I, D0 to D7, T0 to T15 and other inputs enter the
combinational control logic, which produces the control signals.]
To synchronize the fetch, decode and execute phases of the instruction cycle we use a
timing circuit. It contains a 4-bit sequence counter (SC) whose output is converted by a
4 x 16 decoder into 16 timing signals, T0 to T15. Because the set of timing signals is
fixed at 16, SC is incremented for the successive phases of an instruction and cleared
when the instruction completes. When the next instruction is fetched, the timing signals
therefore start again from T0.
To explain this further, take the instruction STA as an example; its execution completes
at D3T4.
- The timing signals are generated by the 4-bit sequence counter and the 4 x 16 decoder
- The SC can be incremented or cleared.
- Example: T0, T1, T2, T3, T4, T0, T1, . . .
Assume: At time T4, SC is cleared to 0 if decoder output D3 is active.
[Timing diagram: Clock, T0 to T4, D3 and CLR SC over the clock cycles T0 T1 T2 T3 T4 T0.]
D3T4: SC ← 0
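The behaviour of the sequence counter in this example can be sketched as a small Python simulation. The function name timing_sequence and its parameters are hypothetical; the sketch assumes that D3 stays active for the whole instruction, so SC is cleared whenever it reaches the given timing step and is incremented otherwise.

```python
def timing_sequence(clear_at: int, cycles: int) -> list:
    """Return the timing signal that is active in each clock cycle.

    clear_at is the index of the timing signal at which the control
    logic clears SC (4 for the D3T4 example).
    """
    sc = 0
    signals = []
    for _ in range(cycles):
        signals.append(f"T{sc}")     # output of the 4 x 16 decoder
        if sc == clear_at:
            sc = 0                   # CLR: SC <- 0
        else:
            sc = (sc + 1) % 16       # INR: SC <- SC + 1 (4-bit counter)
    return signals


print(timing_sequence(clear_at=4, cycles=7))
```

With clear_at = 4 the sequence repeats after T4, matching the example T0, T1, T2, T3, T4, T0, T1 given above.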
Lecture – 27:
· Control Memory
o Its Organization
· Mapping Logic
· Microprogram Example
Control Memory:
The function of the control unit in a digital computer is to initiate sequences of
microoperations. The number of such sequences determines the complexity of the digital
system.
The control signals can be hardwired (generated using conventional logic design
techniques) or microprogrammed. In general, a control function is a binary variable that
is in either the 1 state or the 0 state depending on the application. The control
variables at any given time can be represented by a string of 1’s and 0’s called a
control word.
A control unit whose binary control variables are stored in memory is called a
microprogrammed control unit.
Each word in a control memory contains a microinstruction, which specifies a set of one
or more microoperations. A sequence of microinstructions constitutes a microprogram.
Since alterations are not required once the control unit is in operation, the control
memory can be a static memory or ROM (read-only memory).
We can also use the technique of dynamic microprogramming, in which the control memory
can be written (to change the microprogram) but is used mostly for reading. This type of
memory is also called writable control memory.
Thus we can say:
A memory that is part of the control unit is known as control memory.
A computer with a microprogrammed control unit has two separate memories:
Main Memory: This is used for storing programs, which can be altered.
Control Memory: This holds a fixed microprogram that cannot be altered by the
occasional user. The microprogram consists of microinstructions that specify the internal
control signals for the execution of register microoperations.
These microinstructions generate the microoperations to:
· Fetch the instruction from memory.
· Evaluate the effective address.
· Execute the operation specified by the instruction.
· Return control to the fetch phase in order to repeat the cycle for the next
instruction.
Configuration of a micro programmed control unit:
Fig: Micro programmed Control Unit
The control memory is assumed to be a ROM, within which all control information is
permanently stored. The control address register (CAR) contains the address of the
microinstruction. The control data register holds the microinstruction read from memory.
The microinstruction contains a control word that specifies one or more microoperations
for the data processor.
While these microoperations are being executed, the control unit must determine the
address of the next microinstruction, which can also depend on external input. To find
the next address we need a next address generator, also called a sequencer because it
determines the address sequence that is read from control memory.
The typical functions of a microprogram sequencer:
· Incrementing the CAR by 1 (in case of sequential execution).
· Loading into the CAR an address from control memory (in case of branching).
· Transferring an external address (in case of interrupts).
· Loading an initial address to start the control operations (in case of the first
microoperation).
The control data register holds the present microinstruction while the next address is
computed and read from memory. It is also called a pipeline register because it allows
the execution of microoperations simultaneously with the generation of the next
microinstruction. It requires a two-phase clock, one phase applied to the address
register and one to the data register.
We can also work without the control data register, using a single-phase clock; the
control word and next address information are then taken directly from control memory.
The ROM operates as a combinational circuit, with the address value as the input and the
corresponding word as the output. The content of the specified word remains on the output
lines as long as its address remains in the address register.
[Figure: External Input → Next Address Generator → Control Address Register (CAR) →
Control Memory-ROM → Control Data Register → Control Word]
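The four sequencer functions listed above can be sketched as a next-address selection in Python. This is a simplified model under stated assumptions: the names NextAddr and next_car are hypothetical, the select input stands in for whatever condition logic the real sequencer uses, and the default size of 128 words is only an example value.

```python
from enum import Enum


class NextAddr(Enum):
    INCREMENT = "increment"   # sequential execution
    BRANCH = "branch"         # address supplied from control memory
    EXTERNAL = "external"     # externally supplied address
    INITIAL = "initial"       # start of the control operations


def next_car(car, select, branch_addr=0, external_addr=0,
             initial_addr=0, size=128):
    """Return the next CAR value for a control memory of `size` words."""
    if select is NextAddr.INCREMENT:
        return (car + 1) % size
    if select is NextAddr.BRANCH:
        return branch_addr % size
    if select is NextAddr.EXTERNAL:
        return external_addr % size
    return initial_addr % size


car = next_car(0, NextAddr.INITIAL, initial_addr=64)   # start at word 64
car = next_car(car, NextAddr.INCREMENT)                # sequential: 65
car = next_car(car, NextAddr.BRANCH, branch_addr=12)   # branch: 12
print(car)
```

Each call corresponds to one clock period: the control word read from the current address is applied while the function output becomes the next CAR value.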
The main advantage of microprogrammed control is that once the hardware
configuration is established, there should be no need for further hardware or wiring
changes. The only thing that changes is the microprogram residing in control memory.
Mapping of instructions:
Mapping is the transformation from the OP-code of an instruction to the address of the
first microinstruction of its execution microprogram.
Here we have to generate the address of the microinstruction from the instruction. We
fetch the instruction and extract its opcode. For mapping, we copy the opcode bits
directly into the microinstruction address and append some fixed bits before and after
them. Which values are appended is entirely the designer's decision. In this example we
have placed a 0 before the copied opcode bits and 00 after them, and the same mapping
rule applies to all opcodes.
Note: The number of bits appended at the end determines the maximum length of each
microprogram routine. With two bits appended, consecutive routines start four addresses
apart, so each routine has room for four microinstructions.
The next diagram shows how the opcode of an instruction is mapped to the address of its
microprogram.
Machine instruction:        1 0 1 1 | Address    (OP-code = 1 0 1 1)
Mapping bits:               0 x x x x 0 0
Microinstruction address:   0 1 0 1 1 0 0
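The mapping rule can be written as a short Python function. This is a sketch, assuming the address is built purely by concatenating fixed bits around the opcode; the name map_opcode and its parameters are hypothetical, and the result is returned as a bit string so that it can be compared directly with the diagram.

```python
def map_opcode(opcode, prefix="0", suffix="00", width=4):
    """Build a microinstruction address as prefix + opcode bits + suffix."""
    if not 0 <= opcode < 2 ** width:
        raise ValueError("opcode does not fit in the given width")
    return prefix + format(opcode, f"0{width}b") + suffix


address = map_opcode(0b1011)
print(address)           # bit string of the control memory address
print(int(address, 2))   # the same address as an integer
```

A different choice of mapping bits only changes the prefix and suffix arguments, for example map_opcode(0b0011, prefix="10", suffix="010") for mapping bits of the form 10 xxxx 010.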
MICROPROGRAM EXAMPLE
Computer Hardware Configuration
This configuration contains two memory units:
· Main memory for storing instructions and data
· Control memory for storing microinstructions/microprogram.
Main memory is accessed with the help of PC, AR and DR, and information is transferred
between registers through multiplexers (MUX) instead of a common bus.
Similarly, control memory is accessed with the help of CAR, and SBR assists in address
sequencing.
The control signals fetched from control memory then direct the arithmetic, logic and
shift unit, which takes its operands from DR and AC and stores the result in AC.
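[Figure: Computer hardware configuration. Components shown: two MUX blocks, PC (bits
10-0), AR (bits 10-0), Memory 2048 x 16, DR (bits 15-0), arithmetic logic and shift unit,
AC (bits 15-0), and a control unit containing SBR (bits 6-0), CAR (bits 6-0) and Control
memory 128 x 20.]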
Mapping function implemented by ROM or PLD
The mapping function is sometimes implemented by means of an integrated circuit called
a programmable logic device, or PLD.
A PLD is similar to a ROM; the mapping function is expressed in terms of Boolean
expressions, which are implemented with the PLD.
Direct Mapping (Mapping Bits: 10 xxxx 010)
Instruction   OP-code   Address       Control Storage
ADD           0000      10 0000 010   ADD Routine
AND           0001      10 0001 010   AND Routine
LDA           0010      10 0010 010   LDA Routine
STA           0011      10 0011 010   STA Routine
BUN           0100      10 0100 010   BUN Routine
...
Mapping through a mapping memory:
OP-code → Mapping memory (ROM or PLD) → Control address register → Control Memory
Lecture – 30:
· Direct Memory Access
DMA is used for block data transfers from high-speed devices such as drum, disk and tape.
* DMA controller - Interface which allows I/O transfer directly between Memory
and Device, freeing the CPU for other tasks
* The CPU initializes the DMA controller by sending the memory address and the block
size (number of words)
Block Diagram of DMA controller
[Figure: CPU bus signals for DMA transfer. The CPU receives Bus request (BR) and issues
Bus granted (BG). Its Address bus (ABUS), Data bus (DBUS), Read (RD) and Write (WR) lines
are in the high-impedance (disabled) state when BG is enabled.]
[Figure: Block diagram of DMA controller. Data bus buffers, address bus buffers, address
register, word count register and control register on an internal bus; control logic with
DMA select (DS), Register select (RS), Read (RD), Write (WR), Bus request (BR), Bus grant
(BG) and Interrupt lines; DMA request and DMA acknowledge lines connect to the I/O
device.]
Starting an I/O
- The CPU executes instructions to
Load Memory Address Register
Load Word Counter
Load Function (Read or Write) to be performed
Issue a GO command
Upon receiving a GO command, the DMA controller performs the I/O operation as follows,
independently of the CPU.
Input
[1] Input Device <- R (Read control signal)
[2] Buffer (DMA Controller) <- Input Byte
[3] Assemble the bytes into a word until the word is full
[4] M <- memory address, W (Write control signal)
[5] Address Reg <- Address Reg + 1; WC (Word Counter) <- WC - 1
[6] If WC = 0, then Interrupt to acknowledge done, else go to [1]
Output
[1] M <- M Address, R (Read control signal);
M Address Reg <- M Address Reg + 1, WC <- WC - 1
[2] Disassemble the word
[3] Buffer <- One byte; Output Device <- W, for all disassembled bytes
[4] If WC = 0, then Interrupt to acknowledge done, else go to [1]
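The input sequence above can be sketched in Python. This is a simplified software model, not a description of real DMA hardware: the function name dma_input is hypothetical, memory is modelled as a list of words, and the sketch assumes two bytes per word with the first byte received placed in the high-order position.

```python
def dma_input(memory, device_bytes, start_address, word_count,
              bytes_per_word=2):
    """Transfer word_count words from a byte-oriented device into memory.

    Returns True to model the interrupt raised when the transfer is done.
    """
    address = start_address      # address register, loaded by the CPU
    wc = word_count              # word counter, loaded by the CPU
    source = iter(device_bytes)

    while wc > 0:
        word = 0
        for _ in range(bytes_per_word):
            # read one byte from the device and assemble it into the word
            word = (word << 8) | next(source)
        memory[address] = word   # write the full word to memory
        address += 1             # Address Reg <- Address Reg + 1
        wc -= 1                  # WC <- WC - 1
    return True                  # WC = 0: interrupt to signal completion


memory = [0] * 8
done = dma_input(memory, bytes([0x12, 0x34, 0xAB, 0xCD]),
                 start_address=2, word_count=2)
print(done, [hex(w) for w in memory])
```

The CPU's only involvement is supplying start_address and word_count before the transfer begins, which mirrors loading the address register and word counter before the GO command.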
While DMA I/O takes place, the CPU is also executing instructions
DMA Controller and CPU both access Memory -> Memory Access Conflict
Memory Bus Controller
- Coordinates the activities of all devices requesting memory access
- Priority System
Memory accesses by the CPU and the DMA Controller are interwoven,
with top priority given to the DMA Controller
-> Cycle Stealing
Cycle Steal
- The CPU is usually much faster than I/O (DMA), so
the CPU uses most of the memory cycles
- The DMA Controller steals memory cycles from the CPU
- During the stolen cycles, the CPU remains idle
- With a slow CPU, the DMA Controller may steal most of the memory
cycles, which may leave the CPU idle for a long time
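DMA TRANSFER
[Figure: DMA transfer in a computer system. The CPU, the random-access memory unit (RAM)
and the DMA controller each have RD, WR, Addr and Data connections to the read control,
write control, address bus and data bus lines. The CPU and the DMA controller are linked
by BR, BG and Interrupt. An address select block drives DS and RS on the DMA controller.
The DMA controller exchanges DMA request and DMA ack. with the I/O peripheral device.]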
Lecture – 30:
· Interrupts
o Types of interrupts
· Interrupt cycle
Types of Interrupts:
External interrupts
External Interrupts are initiated from outside the CPU and Memory
- I/O Device → Data transfer request or Data transfer complete
- Timing Device → Timeout
- Power Failure
- Operator
Internal interrupts (traps)
Internal Interrupts are caused by the currently running program
- Register, Stack Overflow
- Divide by zero
- OP-code Violation
- Protection Violation
Software Interrupts
Both External and Internal Interrupts are initiated by the computer hardware.
Software Interrupts are initiated by executing an instruction.
- Supervisor Call → Switches from user mode to supervisor mode
→ Allows execution of a certain class of operations
which are not allowed in user mode
Interrupt Procedure:
An interrupt procedure is similar to a subroutine call except for the following:
- The interrupt is usually initiated by an internal or
an external signal rather than by the execution of
an instruction (except for the software interrupt)
- The address of the interrupt service program is
determined by the hardware rather than from the
address field of an instruction
- An interrupt procedure usually stores all the
information necessary to define the state of the CPU
rather than storing only the PC.
The state of the CPU is determined from:
Content of the PC
Content of all processor registers
Content of status bits
There are many ways of saving the CPU state,
depending on the CPU architecture.
Flowchart of interrupts:
To explain interrupts further and to see the role of the IEN flag, let us discuss the
interrupt cycle.
R is the interrupt flip-flop, which indicates whether the computer goes through a
normal instruction cycle or an interrupt cycle. Thus we check whether R = 0 or not.
If R is 0, this is the normal instruction cycle. We fetch, decode and execute the
instruction and, in parallel, check whether there is an interrupt. The system accepts
interrupts only if IEN is 1; if IEN is 0, the interrupt cycle cannot be entered. If IEN
is 1, we check the flags FGI and FGO. If both are 0, neither the input nor the output
device is ready for a transfer, so no interrupt occurs. If either flag is 1, R is set to
1 and the next cycle is an interrupt cycle.
In the interrupt cycle, the address of the next instruction of the interrupted
program must be saved so that execution can resume later. We store it in memory at
address 0 and set PC to 1. We also clear IEN and R to 0 so that no further interrupt is
accepted until this one has been serviced.
[Flowchart: interrupt cycle]
R = 0 (Instruction cycle):
  Fetch and decode instructions
  Execute instructions
  If IEN = 1 and (FGI = 1 or FGO = 1): R ← 1
R = 1 (Interrupt cycle):
  Store return address in location 0: M[0] ← PC
  Branch to location 1: PC ← 1
  IEN ← 0, R ← 0
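The interrupt check and the interrupt cycle can be sketched as two small Python functions. This is a minimal model under stated assumptions: the class State and the function names are hypothetical, memory is a list of 4096 words (matching a 12-bit address field), and the flags are plain integers rather than flip-flops.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    pc: int = 0
    ien: int = 0    # interrupt enable flag
    r: int = 0      # interrupt flip-flop
    fgi: int = 0    # input flag
    fgo: int = 0    # output flag
    memory: list = field(default_factory=lambda: [0] * 4096)


def check_interrupt(s: State) -> None:
    """Instruction cycle: set R when interrupts are enabled and a flag is set."""
    if s.ien and (s.fgi or s.fgo):
        s.r = 1


def interrupt_cycle(s: State) -> None:
    """R = 1: save the return address and branch to location 1."""
    s.memory[0] = s.pc   # M[0] <- PC
    s.pc = 1             # PC <- 1
    s.ien = 0            # IEN <- 0
    s.r = 0              # R <- 0


s = State(pc=0x123, ien=1, fgi=1)
check_interrupt(s)
if s.r:
    interrupt_cycle(s)
print(hex(s.memory[0]), s.pc, s.ien, s.r)
```

Clearing IEN inside the interrupt cycle is what prevents a second interrupt from overwriting the return address stored at location 0 before the service routine has used it.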
 