Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

Consider the design of a stalling control for a pipelined implementation of the

ID: 3862647 • Letter: C

Question

Consider the design of a stalling control for a pipelined implementation of the MIPS data path

(a) Design the Hazard Detection Unit that triggers the Stall Control Mechanism by
activating a flag (hazard detected).


(b) Define the control lines that need to be used / added to the figure in Problem 1 to
enable the stalling mechanism. Explain how your solution stalls the pipe for the exact
number of clock cycles needed. Does your solution work when more than one hazard is
found in the pipeline?

Consider the pipelined implementation (without forwarding and/or stalling) of the MIPS microprocessor as given below. register 1 register Registers Road register

Explanation / Answer

A pipeline processor is comprised of a sequential, linear list of segments, where each segment performs one computational task or group of tasks.A pipeline processor can be represented in two dimensions  the pipeline segments (Seg #1 through Seg #3) are arranged vertically, so the data can flow from the input at the top left downward to the output of the pipeline (after Segment 3). The progress of an instruction is charted in blue typeface, and the next instruction is shown in red typeface.There are three things that one must observe about the pipeline. First, the work (in a computer, the ISA) is divided up into pieces that more or less fit into the segments alloted for them. Second, this implies that in order for the pipeline to work efficiently and smoothly, the work partitions must each take about the same time to complete. Otherwise, the longest partition requiring time T would hold up the pipeline, and every segment would have to take time T to complete its work. For fast segments, this would mean much idle time. Third, in order for the pipeline to work smoothly, there must be few (if any) exceptions or hazards that cause errors or delays within the pipeline. Otherwise, the instruction will have to be reloaded and the pipeline restarted with the same instruction that causes the exception. There are additional problems we need to discuss about pipeline processors,

Suppose you wanted to make an automobile from scratch. You might gather up the raw materials, form the metal into recognizeable shapes, cast some of the metal into an engine block, connect up fuel lines, wires, etc., to eventually (one would hope) make a workable automobile. To do this, you would need many skills - all the skills of the artisans that make autos, and management skills in addition to being an electrician and a metallurgist. This would not be an efficient way to make a car, but would definitely provide many challenges.

That is the way a multicycle datapath works - it is designed to do everything - input, output, and computation (recall the fetch-decode-execute sequence). We need to ask ourselves if this is really the best way to compute efficiently, especially when we consider the complexity of control for large (CISC) systems or even smaller RISC processors.

Fortunately, our analogy with car-making is not so far-fetched, and can actually help us arrive at a more efficient processor design. Consider the modern way of making cars - on an assembly line. Here, there is an orderly flow of parts down a conveyor belt, and the parts are processed by different stations (also called segments of the assembly line). Each segment does one thing, over and over. The segments are coordinated to exploit the sequentiality inherent in the automobile assembly process. The work gets done more smoothly (because of the orderly flow of input parts and output results), more efficiently (because each assembler at each segment of the pipeline does his or her task at what one hopes is maximum efficiency), and more reliably because there is greater consistency in one task being done repetitively (provided the assembly line is designed correctly).

A similar analogy exists for computers. Instead of a multicycle datapath with its complex control system that walks, talks, cries, and computes - let us suppose that we could build an assembly line for computing. Such objects actually exist, and they are called pipeline processors. They have sequentially-arranged stages or segments, each of which perform a specific task in a fixed amount of time. Data flows through these pipelines like cars through an assembly line.It is easily verified, through inspection of Figure 5.1., that the response time for any instruction that takes three segments must be three times the response time for any segment,provided that the pipeline was full when the instruction was loaded into the pipeline. As we shall see later in this section, if an N-segment pipeline is empty before an instruction starts, then N + (N-1) cycles or segments of the pipeline are required to execute the instruction, because it takes N cycles to fill the pipe.we designed a multicycle datapath based on the assumption that computational work associated with the execution of an instruction could be partitioned into a five-step process,

In order to implement MIPS instructions effectively on a pipeline processor, we must ensure that the instructions are the same length (simplicity favors regularity) for easy IF and ID, similar to the multicycle datapath. We also need to have few but consistent instruction formats, to avoid deciphering variable formats during IF and ID, which would prohibitively increase pipeline segment complexity for those tasks. Thus, the register indices should be in the same place in each instruction. In practice, this means that the rd, rs, and rt fields of the MIPS instruction must not change location across all MIPS pipeline instructions.

Additionally, we want to have instruction decoding and reading of the register contents occur at the same time, which is supported by the datapath architecture that we have designed thus far. Observe that we have memory address computation in the lw and sw instructions only, and that these are the only instructions in our five-instruction MIPS subset that perform memory operations. As before, we assume that operands are aligned in memory, for straightforward access.