CPU DESIGN_2

Started by ganeshbala, Apr 18, 2008, 01:15 PM

ganeshbala

To perform a memory fetch operation, we need to complete the following tasks:
The CPU transfers the address of the required memory location to the Memory Address Register (MAR).
The MAR is connected to the memory address line of the memory bus, hence the address of the required word is transferred to the main memory.
Next, the CPU uses the control lines of the memory bus to indicate that a Read operation is initiated. After issuing this request, the CPU waits until it receives an answer from the memory indicating that the requested operation has been completed.
This is accomplished by another control signal of the memory bus known as Memory-Function-Complete (MFC).
The memory sets this signal to 1 to indicate that the contents of the specified memory location are available on the memory data bus.
As soon as the MFC signal is set to 1, the information available on the data bus is loaded into the Memory Data Register (MDR), where it is available for use inside the CPU.
As an example, assume that the address of the memory location to be accessed is kept in register R2 and that the memory contents are to be loaded into register R1. This is done by the following sequence of operations:
1. MAR ← [R2]
2. Read
3. Wait for MFC signal
4. R1 ← [MDR]
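The four steps above can be sketched in Python as follows. This is only a rough model, not how any real CPU is built: the Memory class, the latency value (assumed to be at least 1 tick) and the fetch_word helper are all made up for illustration.

```python
# Minimal sketch of the asynchronous read handshake (all names are illustrative).
class Memory:
    def __init__(self, contents, latency):
        self.contents = contents      # address -> word
        self.latency = latency        # ticks needed per access (assumed >= 1)
        self.mfc = 0                  # Memory-Function-Complete signal
        self.data_bus = None
        self._remaining = 0
        self._address = None

    def read(self, address):
        """CPU raises Read: memory latches the address and starts working."""
        self.mfc = 0
        self._address = address
        self._remaining = self.latency

    def tick(self):
        """One time unit passes on the memory side."""
        if self._remaining > 0:
            self._remaining -= 1
            if self._remaining == 0:
                self.data_bus = self.contents[self._address]
                self.mfc = 1          # data is now valid on the data bus


def fetch_word(memory, regs, src="R2", dst="R1"):
    regs["MAR"] = regs[src]           # 1. MAR <- [R2]
    memory.read(regs["MAR"])          # 2. Read
    waited = 0
    while memory.mfc == 0:            # 3. Wait for MFC
        memory.tick()
        waited += 1
    regs["MDR"] = memory.data_bus     # data bus is loaded into MDR
    regs[dst] = regs["MDR"]           # 4. R1 <- [MDR]
    return waited


memory = Memory({0x20: 99}, latency=3)
regs = {"R1": 0, "R2": 0x20, "MAR": 0, "MDR": 0}
waited = fetch_word(memory, regs)
print(regs["R1"], waited)             # 99 3
```

The number of iterations spent in the wait loop depends only on the memory's latency, which is the point of step 3.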
The time required for step 3 depends on the speed of the memory unit. In general, the time required to access a word from the memory is longer than the time required to perform any operation within the CPU.
The scheme that is used here to transfer data from one device (memory) to another device (CPU) is referred to as an asynchronous transfer.
This asynchronous transfer enables transfer of data between two independent devices that have different speeds of operation. The data transfer is synchronized with the help of control signals. In this example, the Read request and the MFC signal perform the synchronization.
An alternative scheme is synchronous transfer. In this case all the devices are controlled by a common clock (a continuously running clock of fixed frequency). Its pulses provide a common timing signal to the CPU and the main memory, and a memory operation is completed during every clock period. Though the synchronous data transfer scheme leads to a simpler implementation, it is difficult to accommodate devices with widely varying speeds. In such cases, the clock period must be matched to the slowest device, which reduces the speed of all the devices to that of the slowest one.
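A small back-of-the-envelope calculation shows the cost of matching the clock to the slowest device. The device names, access times and access counts below are invented purely for the example, and handshake overhead in the asynchronous case is ignored.

```python
# Illustrative access times in nanoseconds; the numbers are made up.
access_time_ns = {"fast_memory": 10, "slow_memory": 40, "io_device": 100}
accesses = {"fast_memory": 1000, "slow_memory": 200, "io_device": 10}

# Synchronous: one common clock, so the period must cover the slowest device.
clock_period_ns = max(access_time_ns.values())
synchronous_total = clock_period_ns * sum(accesses.values())

# Asynchronous: each transfer takes only as long as its own device needs
# (handshake overhead is ignored in this sketch).
asynchronous_total = sum(access_time_ns[d] * n for d, n in accesses.items())

print(clock_period_ns, synchronous_total, asynchronous_total)
# 100 121000 19000
```

With these assumed numbers the synchronous scheme is several times slower, because the 1000 fast accesses each pay the 100 ns period of the slowest device.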

Storing a word into memory
The procedure for writing a word into a memory location is similar to that for reading one from memory. The only difference is that the data word to be written is first loaded into the MDR, and then the Write command is issued.
As an example, assume that the data word to be stored in the memory is in register R1 and that the memory address is in register R2. The memory write operation requires the following sequence:
1. MAR ← [R2]
2. MDR ← [R1]
3. Write
4. Wait for MFC
In this case steps 1 and 2 are independent, so they can be carried out in any order. In fact, steps 1 and 2 can be carried out simultaneously if the architecture allows it, that is, if the two data transfers (memory address and data) do not use the same data path.
For both memory read and memory write operations, the total time depends on the wait for the MFC signal, which in turn depends on the speed of the memory module.
CPU performance can be improved if the CPU is allowed to perform some other operation while waiting for the MFC signal. During this period, the CPU can execute other instructions that do not require the use of MAR and MDR.
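The write sequence, and the fact that the order of its first two steps does not matter, can be checked with a short sketch. The store_word helper and the dictionary used as memory are hypothetical; the wait for MFC is modelled as the store simply completing.

```python
def store_word(memory, regs, address_first=True):
    """Hypothetical helper modelling the four-step write sequence."""
    transfers = [("MAR", "R2"), ("MDR", "R1")]   # steps 1 and 2
    if not address_first:
        transfers.reverse()                      # the order does not matter
    for dst, src in transfers:
        regs[dst] = regs[src]
    # 3. Write, 4. Wait for MFC: modelled as the store simply completing.
    memory[regs["MAR"]] = regs["MDR"]
    return memory


regs = {"R1": 7, "R2": 0x30, "MAR": 0, "MDR": 0}
a = store_word({}, dict(regs), address_first=True)
b = store_word({}, dict(regs), address_first=False)
print(a, b, a == b)   # {48: 7} {48: 7} True
```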

Register Transfer Operation


Register transfer operations enable data transfer between the various blocks connected to the common bus of the CPU. There are several registers inside the CPU, and information often has to be transferred from one register to another. For example, during a memory write operation, data from the appropriate register must be moved to MDR.
Since the input and output lines of all the registers are connected to the common internal bus, appropriate input and output gating is needed. The input and output gates for register Ri are controlled by the signals Riin and Riout respectively.
Thus, when Riin is set to 1, the data available on the common bus is loaded into Ri. Similarly, when Riout is set to 1, the contents of register Ri are placed on the bus. To transfer data from one register to another, we need to generate the appropriate register gating signals.
For example, to transfer the contents of register R1 to register R2, the following actions are needed:
• Enable the output gate of register R1 by setting R1out to 1.
-- This places the contents of R1 on the CPU bus.
• Enable the input gate of register R2 by setting R2in to 1.
-- This loads data from the CPU bus into the register R2.
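The gating idea can be modelled in a few lines. The Bus class below is an illustrative sketch, not a description of real hardware: one call to step stands for one time slot, with at most one register driving the bus and any number of registers loading from it.

```python
class Bus:
    """Single internal CPU bus; registers reach it only through gates."""

    def __init__(self, registers):
        self.registers = dict(registers)
        self.value = None

    def step(self, out=None, ins=()):
        """One time slot: 'out' names the register whose Rout is 1,
        'ins' names the registers whose Rin is 1."""
        self.value = self.registers[out] if out is not None else None
        for name in ins:
            if self.value is None:
                raise ValueError("no register is driving the bus")
            self.registers[name] = self.value


bus = Bus({"R1": 42, "R2": 0})
bus.step(out="R1", ins=["R2"])   # R1out, R2in
print(bus.registers)             # {'R1': 42, 'R2': 42}
```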

Performing the arithmetic or logic operation:

The ALU is generally used inside the CPU to perform arithmetic and logic operations. It is a combinational logic circuit, which does not have any internal storage.
Therefore, to perform an arithmetic or logic operation (say, a binary operation), both operands must be available at the two inputs of the ALU simultaneously. Once both inputs are available, the appropriate signal is generated to perform the required operation.
We may have to use temporary storage (registers) to carry out an operation in the ALU.
The sequence of operations that has to be carried out to perform one ALU operation depends on the organization of the CPU. Consider an organization in which one operand of the ALU is stored in a temporary register Y and the other operand is taken directly from the CPU internal bus. The result of the ALU operation is stored in another temporary register Z.
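As a rough illustration of this organization, the sketch below adds R1 and R2 into R3 through Y and Z. The three-step control sequence used here (R1out, Yin / R2out, Add, Zin / Zout, R3in) is an assumption chosen to match the description above, and the run function and the signal naming are invented for the example.

```python
def run(regs, sequence):
    """Execute control steps on a single-bus CPU model.
    Each step is a tuple of signal names such as ("R1out", "Yin")."""
    for signals in sequence:
        bus = None
        alu = None
        for s in signals:                  # at most one "...out" per step
            if s.endswith("out"):
                bus = regs[s[:-3]]
        if "Add" in signals:               # combinational ALU: Y + bus
            alu = regs["Y"] + bus
        for s in signals:
            if s == "Zin":
                regs["Z"] = alu            # Z captures the ALU output
            elif s.endswith("in"):
                regs[s[:-2]] = bus         # ordinary register loads the bus
    return regs


regs = {"R1": 5, "R2": 7, "R3": 0, "Y": 0, "Z": 0}
ADD_R1_R2_TO_R3 = [                        # assumed sequence, see lead-in
    ("R1out", "Yin"),
    ("R2out", "Add", "Zin"),
    ("Zout", "R3in"),
]
run(regs, ADD_R1_R2_TO_R3)
print(regs["R3"])                          # 12
```

Three time slots are needed because the single bus can carry only one value at a time: first one operand, then the other, then the result.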

Multiple Bus Organization


Till now we have considered only one internal bus in the CPU. The single-bus organization is only one of the possibilities for interconnecting the different building blocks of the CPU.
An alternative is the two-bus structure, where two different internal buses are used in the CPU. All register outputs are connected to bus A, and all register inputs are connected to bus B.
There is a special arrangement to transfer data from one bus to the other: the buses are connected through the bus tie G. When this tie is enabled, data on bus A is transferred to bus B. When G is disabled, the two buses are electrically isolated.
Since two buses are used, the temporary register Z, which the single-bus organization needs to hold the ALU result, is not required. The result can now be placed directly on bus B, since one of the inputs is on bus A. With the bus tie disabled, the result can be transferred directly to the destination register.
For example, the operation [R3] ← [R1] + [R2] can now be performed as
1. R1out, Genable, Yin
2. R2out, Add, ALUout, R3in
In this case the source register R2 and the destination register R3 have to be different, because the two operations R2in and R2out cannot be performed together. This restriction applies only if the registers are made of simple latches.
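The two-step sequence for the two-bus structure can be traced with a small model. The two_bus_step function is a hypothetical sketch: register outputs drive bus A, register inputs load from bus B, Genable copies A onto B, and ALUout places Y plus bus A on bus B.

```python
def two_bus_step(regs, signals):
    """One time slot of the two-bus organization (illustrative model)."""
    bus_a = bus_b = None
    for s in signals:
        if s.endswith("out") and s != "ALUout":
            bus_a = regs[s[:-3]]           # register outputs drive bus A
    if "Genable" in signals:
        bus_b = bus_a                      # bus tie G copies A onto B
    if "Add" in signals and "ALUout" in signals:
        bus_b = regs["Y"] + bus_a          # ALU result goes straight to bus B
    for s in signals:
        if s.endswith("in"):
            regs[s[:-2]] = bus_b           # register inputs load from bus B
    return regs


regs = {"R1": 5, "R2": 7, "R3": 0, "Y": 0}
two_bus_step(regs, ("R1out", "Genable", "Yin"))        # step 1
two_bus_step(regs, ("R2out", "Add", "ALUout", "R3in")) # step 2
print(regs["R3"])                                      # 12
```

Note that the register set contains no Z: the result reaches R3 in the same time slot in which it is computed.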
We may have another CPU organization in which three internal CPU buses are used. In this organization each bus is connected to only one output and a number of inputs. Eliminating the need to connect more than one output to the same bus leads to faster bus transfers and simpler control.

Instruction execution proceeds as follows:

In Step1:
The instruction fetch operation is initiated by loading the contents of the PC into the MAR and sending a read request to memory.
To perform this task, the contents of the PC first have to be brought onto the internal bus and then loaded into the MAR. For this, the control circuit has to generate the PCout and MARin signals.
After issuing the read signal, the CPU has to wait for some time to get the MFC signal. During that time the PC is incremented by 1 through the use of the ALU. This is accomplished by setting one of the inputs to the ALU (register Y) to 0, while the other input, the current value of the PC, is available on the bus.
At the same time, the carry-in to the ALU is set to 1 and an add operation is specified.
In Step 2:
The updated value is moved from register Z back into the PC. Step 2 is initiated immediately after issuing the memory Read request without waiting for completion of memory function. This is possible, because step 2 does not use the memory bus and its execution does not depend on the memory read operation.
In Step 3:
Step 3 is delayed until MFC is received. Once MFC is received, the word fetched from the memory is transferred to the IR (Instruction Register), because it is an instruction. Steps 1 through 3 constitute the instruction fetch phase of the control sequence.
The instruction fetch portion is the same for all instructions. From the next step onwards, the instruction execution phase takes place.
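The three fetch steps can be traced in a short sketch. The fetch function, the dictionary standing in for memory and the string used as an instruction word are all illustrative assumptions; the memory delay itself is not modelled, only the order of events.

```python
def fetch(regs, memory):
    """Illustrative model of fetch steps 1 to 3 on the single-bus CPU."""
    # Step 1: PCout, MARin, Read, Clear Y, Set carry-in, Add, Zin
    bus = regs["PC"]
    regs["MAR"] = bus
    read_pending = regs["MAR"]             # memory read is now in progress
    regs["Y"] = 0
    carry_in = 1
    regs["Z"] = regs["Y"] + bus + carry_in # ALU computes PC + 1
    # Step 2: Zout, PCin, Wait for MFC (PC update overlaps the memory access)
    regs["PC"] = regs["Z"]
    regs["MDR"] = memory[read_pending]     # MFC arrives: word is in MDR
    # Step 3: MDRout, IRin
    regs["IR"] = regs["MDR"]
    return regs


memory = {100: "ADD NUM R1"}               # made-up instruction word
regs = {"PC": 100, "MAR": 0, "MDR": None, "IR": None, "Y": 0, "Z": 0}
fetch(regs, memory)
print(regs["PC"], regs["IR"])              # 101 ADD NUM R1
```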
As soon as the IR is loaded with the instruction, the instruction decoding circuit interprets its contents. This enables the control circuitry to choose the appropriate signals for the remainder of the control sequence, steps 4 to 8, which we refer to as the execution phase. To design the control sequence of the execution phase, one needs to know the internal structure and the instruction format of the CPU. Also, the length of the execution phase differs from instruction to instruction.
In this example, we have assumed the following instruction format:
opcode | M | R
i.e., opcode: Operation Code
M: Memory address for source
R: Register address for source/destination
In Step 5:
The destination field of the IR, which contains the address of register R1, is used to transfer the contents of register R1 to register Y, and the CPU waits for Memory-Function-Complete. When the read operation is completed, the memory operand is available in MDR.
In Step 6:
The addition operation is performed in this step.
In Step 7:
The result of the addition is transferred from temporary register Z to the destination register R1 in this step.
In Step 8:
The End signal is generated. This indicates completion of execution of the current instruction and causes a new fetch cycle to be started by going back to step 1.
Branching
With the help of a branch instruction, control of program execution is transferred from one position to some other position, which breaks the sequential flow of control. Branching is accomplished by replacing the current contents of the PC with the branch address, that is, the address of the instruction to which branching is required.
Consider a branch instruction in which the branch address is obtained by adding an offset X, given in the address field of the branch instruction, to the current value of the PC.
Consider the following unconditional branch instruction:
JUMP X
i.e., the format is
opcode | offset of jump
The control sequence that enables execution of an unconditional branch instruction using the single-bus organization is as follows:
Steps Actions
1. PCout, MARin, Read, Clear Y, Set Carry-in to ALU, Add, Zin
2. Zout, PCin, Wait for MFC
3. MDRout, IRin
4. PCout, Yin
5. Address-field-of-IRout, Add, Zin
6. Zout, PCin
7. End
Execution starts as usual with the fetch phase, ending with the instruction being loaded into the IR in step 3. To execute the branch instruction, the execution phase starts in step 4.
In Step 4
The contents of the PC are transferred to register Y.
In Step 5
The offset X of the instruction is gated to the bus and the addition operation is performed.
In Step 6
The result of the addition, which represents the branch address, is loaded into the PC.
In Step 7
It generates the End signal to indicate the end of execution of the current instruction.
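Steps 4 to 6 of the branch amount to PC ← PC + X, carried out through Y and Z. A minimal sketch follows, with a hypothetical execute_jump helper and made-up addresses. Note that the PC has already been incremented during the fetch phase, so the offset is added to the address of the instruction after the JUMP.

```python
def execute_jump(regs, offset):
    """Execution phase of JUMP X on the single-bus model (illustrative)."""
    regs["Y"] = regs["PC"]             # 4. PCout, Yin
    regs["Z"] = regs["Y"] + offset     # 5. Address-field-of-IRout, Add, Zin
    regs["PC"] = regs["Z"]             # 6. Zout, PCin
    return regs["PC"]                  # 7. End


# JUMP stored at address 200; after the fetch phase the PC already holds 201.
regs = {"PC": 201, "Y": 0, "Z": 0}
target = execute_jump(regs, offset=-5)
print(target)                          # 196
```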
Now consider a conditional branch instruction instead of an unconditional one. In this case, we need to check the status of the condition codes between steps 3 and 4, i.e., before adding the offset value to the PC contents.
For example, if the instruction decoding circuitry interprets the contents of the IR as a Branch on Negative (BRN) instruction, the control unit proceeds as follows. First the condition code register is checked. If bit N (negative) is equal to 1, the control unit proceeds with steps 4 through 7 of the control sequence of the unconditional branch instruction.
If, on the other hand, N is equal to 0, an End signal is issued.
This, in effect, terminates execution of the branch instruction and causes the instruction immediately following the branch instruction to be fetched when a new fetch operation is performed.
Therefore, the control sequence for the conditional branch instruction BRN can be obtained from the control sequence of the unconditional branch instruction by replacing step 4 with
4. If N = 0 then End
If N = 1 then PCout, Yin
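The effect of the modified step 4 on the steps actually issued can be listed with a small sketch. The brn_steps function is hypothetical; the step strings are the JUMP control sequence given earlier in this post.

```python
JUMP_STEPS = [
    "PCout, MARin, Read, Clear Y, Set Carry-in to ALU, Add, Zin",
    "Zout, PCin, Wait for MFC",
    "MDRout, IRin",
    "PCout, Yin",
    "Address-field-of-IRout, Add, Zin",
    "Zout, PCin",
    "End",
]


def brn_steps(n_flag):
    """Control steps issued for BRN, given condition-code bit N."""
    steps = JUMP_STEPS[:3]              # fetch phase is unchanged
    if n_flag == 0:
        return steps + ["End"]          # step 4: N = 0, so End at once
    return steps + JUMP_STEPS[3:]       # N = 1: continue exactly as JUMP


print(len(brn_steps(0)), len(brn_steps(1)))   # 4 7
```

When the branch is not taken, the instruction finishes in four time slots instead of seven, and the PC keeps the incremented value from the fetch phase.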
The most commonly needed conditional branch instructions are
BNZ : Branch on not Zero
BZ : Branch on Zero
BP : Branch on Positive
BNP : Branch on not Positive
BO : Branch on Overflow
Design of Control Unit
To execute an instruction, the control unit of the CPU must generate the required control signals in the proper sequence. For example, during the fetch phase, the CPU has to generate the PCout signal along with other required signals in the first clock pulse. In the second clock pulse the CPU has to generate the PCin signal along with other required signals. So, during the fetch phase, the proper sequence for generating the signals to read from and store to the PC is PCout followed by PCin.
A wide variety of techniques exist for generating the control signals in the proper sequence. Most of these techniques, however, fall into one of two categories:
1. Hardwired Control

2. Microprogrammed Control.
Hardwired Control
In the hardwired control technique, the control signals are generated by means of a hardwired circuit. The main objective of the control unit is to generate the control signals in the proper sequence.

Consider the sequence of control signals required to execute the ADD instruction that was explained in the previous lecture. Eight non-overlapping time slots are required for proper execution of the instruction represented by this sequence.

Each time slot must be at least long enough for the function specified in the corresponding step to be completed. The control unit is implemented with hardwired devices, and every device has a propagation delay, so it takes some time for a stable signal to appear at the output after the input is applied. Determining the length of each time slot is therefore a complicated design task.
For the moment, for simplicity, let us assume that all slots are equal in duration. The required controller may then be implemented using a counter driven by a clock.
Each state, or count, of this counter corresponds to one of the steps of the control sequence of the instructions of the CPU.
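The counter idea can be sketched as follows, using the seven-step JUMP sequence since it is listed in full above. The StepCounter class, the slots_for helper and the shortened signal names (ClearY, SetCarryIn, WMFC, IRaddrOut) are assumptions made for the example; a real hardwired controller would be built from a counter, a decoder and gates rather than a lookup table.

```python
JUMP = [                                 # one set of signals per time slot
    {"PCout", "MARin", "Read", "ClearY", "SetCarryIn", "Add", "Zin"},
    {"Zout", "PCin", "WMFC"},
    {"MDRout", "IRin"},
    {"PCout", "Yin"},
    {"IRaddrOut", "Add", "Zin"},
    {"Zout", "PCin"},
    {"End"},
]


class StepCounter:
    """Clock-driven counter; each count selects one control step."""

    def __init__(self, sequence):
        self.sequence = sequence
        self.count = 0

    def clock(self):
        signals = self.sequence[self.count]
        # End resets the counter so the next clock starts a new fetch.
        self.count = 0 if "End" in signals else self.count + 1
        return signals


def slots_for(signal, sequence):
    """Time slots (numbered from 1) in which a given signal is asserted."""
    return [t for t, signals in enumerate(sequence, start=1) if signal in signals]


counter = StepCounter(JUMP)
issued = [counter.clock() for _ in range(len(JUMP))]
print(slots_for("PCout", JUMP), slots_for("PCin", JUMP), counter.count)
# [1, 4] [2, 6] 0
```

The output of slots_for is the kind of information a hardwired design needs: for this instruction, PCin has to be asserted in time slots 2 and 6.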
In the previous lecture, we gave the control sequences for only two instructions (one for add and one for branch). In the same way, we need to design the control sequences of all the instructions.
Looking at the design of the CPU, there can be various instructions for the add operation. For example,
ADD NUM R1    Add the contents of the memory location specified by NUM to the contents of register R1