CS340 - COMPUTER ARCHITECTURE - II
ANNA UNIVERSITY :: CHENNAI – 600 025
MODEL QUESTION PAPER
B.E. COMPUTER SCIENCE AND ENGINEERING
Time: Three Hours Maximum: 100 Marks
Answer All The Questions
PART – A (10 x 2 = 20 Marks)
1. Consider a processor with a clock cycle time of 1 ns running code with the following instruction mix:
Instruction      Frequency   Clock cycle count
ALU operation    45%         1
Loads            22%         3
Stores           15%         3
Branches         18%         2
A new level of cache is being added that speeds up memory access by a factor of 2 and increases the clock cycle time by 50%. Find the improvement in performance.
2. The common trend in processors is that the operand size keeps increasing (8 to 16 to 32 bits, etc.). Recently there has been a move towards keeping operands small. What are the advantages?
3. Dhrystone is a well-known integer benchmark. Computers A and B are measured to perform D_A and D_B executions of the Dhrystone benchmark per second, respectively. Computer A has a MIPS rating of MIPS_A. What is the fallacy in calculating the MIPS rating of B as MIPS_B = MIPS_A × (D_B / D_A)?
4. What is meant by ‘having precise exceptions’? When can a processor be allowed not to have precise exceptions?
5. What is meant by predication?
6. How is loop level parallelism determined?
7. Is it possible to use a global branch prediction for certain branches, and a local branch prediction for others? If so, how? If not, why not?
8. What are the key ideas used in hardware-based speculation?
9. How can an atomic operation be achieved using a pair of separate instructions?
10. How does the SCSI bus allow interleaving of commands from different initiators to a single target?
PART – B (5 x 16 = 80 Marks)
11.i) Explain the different stages of the pipeline for a processor other than the MIPS processor. Compare it with the MIPS processor pipeline. (8)
ii) If you have to design a processor for a mobile application (say, a cell phone), what design choices (of all that you studied in this course) would you make? (8)
12.a)i) When is normalized execution time used for performance comparison? Why should the arithmetic mean not be used to average normalized execution times? What are the pros and cons of using the geometric mean? (6)
ii) If you are designing a processor for digital signal processing applications, what addressing modes would you include in your processor? Why? (4)
iii) What architectural support (visibility) must be provided to a compiler to aid the various static branch prediction schemes? What is the disadvantage of architectural visibility? (6)
12.b)i) A design choice is to be made in enhancing a processor. One option is to invest in additional hardware that works at twice the speed, which will benefit 60% of the program. The other is to keep the hardware simple but invest in compiler optimization, which provides varying benefits for different programs: 40% of the programs can be sped up by a factor of 2, while 30% of the programs can be sped up by a factor of 3. Which option would be better? (6)
ii) Design the forwarding logic for the DLX FP pipeline. (10)
13. Consider the following code (DAXPY loop):
Loop:   LD    F2, 0(R1)
        MULD  F4, F2, F0
        LD    F6, 0(R2)
        ADDD  F6, F4, F6
        SD    0(R2), F6
        ADDI  R1, R1, #8
        ADDI  R2, R2, #8
        CMPI  R3, R1, #800
        BEQZ  R3, Loop
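For reference, the computation this loop performs can be sketched in Python. This is only an illustrative reading of the assembly, assuming F0 holds the scalar a, R1 and R2 point to the double-precision arrays x and y, and each iteration handles one 8-byte element. The function name daxpy and its parameters are hypothetical and not part of the question.

```python
def daxpy(a, x, y):
    """Update y in place so that y[i] = a * x[i] + y[i] (assumed meaning of the loop)."""
    for i in range(len(x)):
        product = a * x[i]      # MULD F4, F2, F0  (F2 = x[i], F0 = a)
        y[i] = product + y[i]   # ADDD F6, F4, F6 followed by SD 0(R2), F6
    return y


# Small usage example with three elements.
result = daxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(result)  # [12.0, 24.0, 36.0]
```

The sketch ignores timing entirely. It only fixes the data flow (load, multiply, load, add, store) that the scheduling questions below refer to.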
Assume the following:
i. Functional units are not pipelined
ii. Results are communicated by CDB
iii. 5 stage pipeline – IF/ID/IS/EX/WB
(IS – issue and write-back take 1 clock cycle)
FU type    Cycles in EX   Number of FUs   Number of reservation stations
Integer    1              1               5
FP ADD     4              1               3
FP MUL     15             1               2
LD/ST      2              2               5 LD / 5 ST buffers
13.a) Show the working of this code for 3 iterations of the loop on:
i) Single-issue Tomasulo processor (8)
ii) Scoreboarding with pipelined functional units. Compare the two. (8)
13.b) Show the working of this code for 3 iterations of the loop on:
i) Two-issue Tomasulo processor (8)
ii) Tomasulo processor with speculation. Compare the two. (8)
14.a)i) List all the dependences in the following code fragment. Determine the loop-level parallelism in the code. (6)
a(i,j) = b(i,j) + c(i,j);
a(i-1,j) = b(i,j) - d(i,j);
b(i+1,j) = b(i,j) + c(i-1,j);
c(i,j-1) = (c(i+1,j) + c(i-1,j)) / 2;
ii) Compare the VLIW and superscalar approaches to ILP. (6)
iii) Comment on the use of ‘scatter-gather’ operations and the use of vector masks in a vector processor. (4)
14.b)i) Critically compare the various dynamic branch prediction schemes. (8)
ii) What is software pipelining? Software pipeline the code given in Question No. 13. (8)
15.a)i) Explain, with a state diagram, the snoop-based cache-coherence protocol for a ‘write through’ cache. (8)
ii) How does the sense-reversing barrier work? (4)
iii) What are the features of the PCI bus? (4)
15.b)i) Using an example, demonstrate why implementing synchronization on top of a ‘directory-based’ cache coherence protocol can degrade performance. (10)
ii) What are the unusual features found in the MIPS and SPARC processors? (6)