We’ve now seen how to design a simple single-cycle CPU. Next, we’re going to start talking about how to improve its performance and look at a more realistic design.
As discussed, computer architects have two fundamental ways to improve performance and overcome the physical constraints of electrical systems: parallelism and locality. The pipelined processor leverages parallelism, specifically “pipelined” parallelism, to overlap instruction execution and improve performance.
In the next section, on instruction-level parallelism, we will see another type of parallelism and how it can further increase performance.
Computer Organization and Design
Sections 4.6-4.10
Edition 1
Sections 4.5-4.9
Computer Architecture: A Quantitative Approach
Appendix C.2-C.7
Note: In this section there are a couple of videos which have examples using the DINO CPU in great detail. These are very useful for the students in 154B while they are working on the DINO CPU project. Those of you in 201A can skip these videos or run through at 2x speed.
Computer Organization and Design
Section 4.6
Edition 1
Section 4.5
Computer Architecture: A Quantitative Approach
Appendix C.2
This is a video introducing the concept of pipelining through the metaphor of laundry. I’m sure you don’t love doing laundry, but it’s a pretty universal chore. I tried using a different metaphor once… I used one of my hobbies, brewing beer, but since most people didn’t know the beer brewing process, it was a bit less effective. So, you’re stuck with me talking about laundry.
Computer Organization and Design
Section 4.7
Edition 1
Section 4.6
Computer Architecture: A Quantitative Approach
Appendix C.3
The videos in this section use the DINO CPU design from Spring Quarter 2020 which is slightly different than the design for this quarter.
The main difference is this quarter we have a NextPC unit which outputs the address of the next instruction and also whether a branch is “taken”.
This is equivalent to the two muxes at the “top” of the execute stage in the previous design.
The next video explains how to modify the single-cycle DINO CPU to be pipelined and goes through an example of how a few instructions may be executed.
This video shows an example of how a program will execute in a pipelined CPU.
This video discusses the performance of a pipelined CPU.
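To put some numbers on pipelined performance, here is a minimal sketch comparing a single-cycle design against an ideal pipeline. It assumes no stalls or flushes, and the stage latencies in `EXAMPLE_LATENCIES_PS` are made-up illustrative values, not measurements of the DINO CPU. The function names are hypothetical as well.

```python
# Illustrative stage latencies in picoseconds (IF, ID, EX, MEM, WB).
# These are assumed values for the sake of the example.
EXAMPLE_LATENCIES_PS = [200, 100, 200, 200, 100]


def single_cycle_time(num_instructions, stage_latencies_ps):
    """Total time when one long clock cycle must cover every stage."""
    cycle_time = sum(stage_latencies_ps)
    return num_instructions * cycle_time


def pipelined_time(num_instructions, stage_latencies_ps):
    """Total time for an ideal pipeline (no stalls, no flushes)."""
    if num_instructions == 0:
        return 0
    # The clock is limited by the slowest stage.
    cycle_time = max(stage_latencies_ps)
    num_stages = len(stage_latencies_ps)
    # The first instruction needs num_stages cycles to drain through the
    # pipeline; every later instruction finishes one cycle after the last.
    cycles = num_stages + num_instructions - 1
    return cycles * cycle_time


def speedup(num_instructions, stage_latencies_ps):
    """Ratio of single-cycle time to ideal pipelined time."""
    return (single_cycle_time(num_instructions, stage_latencies_ps)
            / pipelined_time(num_instructions, stage_latencies_ps))
```

With these assumed latencies, three instructions take 2400 ps on the single-cycle design and 1400 ps on the pipeline. For long programs the speedup approaches 800 / 200 = 4 rather than 5, because the stages are not perfectly balanced.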
Use Canvas to complete the quiz!
Now that we understand the basics of pipelining, let’s look at some of the limitations and requirements of implementing a real system.
Note: these lectures will be useful when completing Part II of DINO CPU assignment 3.
Computer Organization and Design
Section 4.8
Edition 1
Section 4.7
Computer Architecture: A Quantitative Approach
Appendix C.4
In the following videos, we will look at data hazards and the way they affect pipeline performance and implementation. These videos are a bit more detailed than some of the others. I suggest you give yourself enough time to go through the videos a couple of times to make sure you understand these concepts.
First, let’s walk through how an example application uses this pipelined design. This video is very detailed, so you may want to watch the next video first, which goes over the example at a higher level, and then come back to this one.
This video introduces data hazards and goes over a very detailed example.
Next, let’s bring things up a level and look at a different way to represent the same example.
This video introduces data hazards in a slightly different way than the previous video. After this video, you may want to watch the first video again to make sure you understand the details.
Now that we see how data hazards can cause a problem, let’s look at a naive way to “solve” this problem by stalling some instructions.
This video discusses how to “fix” the data hazard problem by stalling instructions that cause the hazard.
Note that in a “statically” scheduled design, the compiler can actually detect/predict which instructions are going to cause hazards and insert nop instructions in the right places. These designs are not common since you would have to recompile your program for every different hardware design. This is a good example of why you don’t want your microarchitecture leaking into your architecture.
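As a rough illustration of that compiler-side approach, here is a minimal sketch that pads a straight-line instruction list with nops. It assumes a five-stage pipeline with no forwarding and a register file that can be written and read in the same cycle, so a consumer must sit at least 3 slots after its producer. The `Instr` type, `insert_nops`, and `min_distance` are hypothetical names for this example, and branches are ignored.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str                 # assembly text, for display only
    dest: Optional[int]       # destination register number, or None
    srcs: Tuple[int, ...]     # source register numbers


NOP = Instr("nop", None, ())


def insert_nops(program, min_distance=3):
    """Pad with nops so each consumer is >= min_distance slots after its producer."""
    out = []
    for instr in program:
        # Only the most recent (min_distance - 1) emitted slots can be too close.
        window = out[-(min_distance - 1):] if min_distance > 1 else []
        needed = 0
        # back == 1 is the instruction immediately before this one.
        for back, prev in enumerate(reversed(window), start=1):
            writes_real_reg = prev.dest is not None and prev.dest != 0
            if writes_real_reg and prev.dest in instr.srcs:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append(instr)
    return out


program = [
    Instr("add x1, x2, x3", 1, (2, 3)),
    Instr("sub x4, x1, x5", 4, (1, 5)),   # reads x1 right after it is written
    Instr("or  x6, x7, x8", 6, (7, 8)),   # independent of the others
]
padded = [i.text for i in insert_nops(program)]
```

Notice that `min_distance` is a property of one particular pipeline. Changing the hardware changes the value, which is exactly why the program would need to be recompiled.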
This video introduces forwarding to handle data hazards.
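The heart of forwarding is a small piece of selection logic for each ALU operand. Below is a minimal software sketch of that logic in the style of the textbook's forwarding unit. The signal names and the numeric encodings are assumptions for illustration and may not match the DINO CPU's forwarding unit.

```python
# Assumed encodings for the operand mux select (not necessarily DINO's).
NO_FORWARD = 0    # use the value read from the register file
FROM_MEM_WB = 1   # forward the value held in the MEM/WB pipeline register
FROM_EX_MEM = 2   # forward the ALU result held in the EX/MEM pipeline register


def forward_select(src_reg, ex_mem_rd, ex_mem_regwrite,
                   mem_wb_rd, mem_wb_regwrite):
    """Choose where one ALU operand in the execute stage should come from."""
    # The EX/MEM result is the most recent write, so it takes priority.
    # Register x0 is hardwired to zero and is never forwarded.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return FROM_EX_MEM
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return FROM_MEM_WB
    return NO_FORWARD
```

The same function would be evaluated twice per cycle, once for each source register of the instruction in the execute stage.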
This video talks about a special kind of data hazard: load to use hazards.
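Forwarding alone cannot fix a load-to-use hazard, because the loaded value is not available until the end of the memory stage. Here is a minimal sketch of the textbook-style check that decides when to stall for one cycle. The function and signal names are assumptions for illustration, not the DINO CPU's hazard unit interface.

```python
def must_stall_for_load(id_ex_memread, id_ex_rd, if_id_rs1, if_id_rs2):
    """Return True if the instruction in decode must wait one cycle.

    id_ex_memread: the instruction now in execute is a load
    id_ex_rd:      destination register of that load
    if_id_rs1/rs2: source registers of the instruction now in decode
    """
    # A load into x0 can never create a real dependence.
    return bool(id_ex_memread
                and id_ex_rd != 0
                and id_ex_rd in (if_id_rs1, if_id_rs2))
```

When this returns True, the usual response is to hold the PC and the IF/ID register for a cycle and send a bubble down the pipeline. After that one-cycle stall, the loaded value can be forwarded from the MEM/WB register.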
Use Canvas to complete the quiz!
Computer Organization and Design
Section 4.9
Edition 1
Section 4.8
Computer Architecture: A Quantitative Approach
Appendix C.2, Section 3.3
This video introduces control hazards with an example.
This video introduces the idea of branch prediction.
This video discusses the requirements of a branch predictor and some more advanced branch prediction schemes.
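One widely used scheme is a table of 2-bit saturating counters indexed by the branch's PC. The sketch below is a software model of that idea. The table size, the indexing by `pc >> 2`, and the initial counter value are assumptions chosen for the example, and the video may cover different schemes.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters (0-1: not taken, 2-3: taken)."""

    def __init__(self, num_entries=16):
        self.num_entries = num_entries
        # Start every counter at "weakly not taken" (an arbitrary choice).
        self.counters = [1] * num_entries

    def _index(self, pc):
        # Drop the low 2 bits, assuming 4-byte aligned instructions.
        return (pc >> 2) % self.num_entries

    def predict(self, pc):
        """Return True if the branch at pc is predicted taken."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Move the counter toward the actual outcome, saturating at 0 and 3."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Because the counter has to be wrong twice in a row before the prediction flips, a loop branch that is taken many times and then falls through once mispredicts only on the exit, not again on the next entry to the loop.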
This video talks about structural hazards.
Computer Organization and Design
Section 4.10
Edition 1
Section 4.9
This video talks about how to handle exceptions/interrupts/errors in a pipelined processor design.
This video goes over an end-to-end example of the pipelined DINO CPU. This video should be helpful on Assignment 3.2. While it uses the design from WQ ’21, there are very few differences in how the pipeline control should be implemented. Details of the design from WQ ’22 will be covered in discussion.
Use Canvas to complete the quiz!