Pipelined CPU Design

XKCD comic

We’ve now seen how to design a simple single-cycle CPU. Next, we’re going to start talking about how to improve its performance and look at a more realistic design.

As discussed, computer architects have two fundamental ways to improve performance and overcome the physical constraints of electrical systems: parallelism and locality. The pipelined processor leverages parallelism, specifically “pipelined” parallelism, to overlap instruction execution and improve performance.

In the next section on Instruction-level parallelism, we will see another type of parallelism and how it can further increase performance.

Reading (154B)

Computer Organization and Design

Sections 4.6-4.10

Edition 1

Sections 4.5-4.9

Reading (201A)

Computer Architecture: A Quantitative Approach

Appendix C.2-C.7

Note: In this section there are a couple of videos which have examples using the DINO CPU in great detail. These are very useful for the students in 154B while they are working on the DINO CPU project. Those of you in 201A can skip these videos or run through at 2x speed.

Basic pipeline design

What is pipelining?

Reading (154B)

Computer Organization and Design

Section 4.6

Edition 1

Section 4.5

Reading (201A)

Computer Architecture: A Quantitative Approach

Appendix C.2

This is a video introducing the concept of pipelining through the metaphor of laundry. I’m sure you don’t love doing laundry, but it’s a pretty universal chore. I tried using a different metaphor once… I used one of my hobbies, brewing beer, but since most people didn’t know the beer brewing process, it was a bit less effective. So, you’re stuck with me talking about laundry.

Play on AggieVideo
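To put rough numbers on the laundry metaphor, here is a minimal sketch comparing doing each load start to finish against overlapping the loads. The four stages and their 30-minute durations are assumptions chosen for illustration and are not taken from the video.

```python
def sequential_time(loads, stage_minutes):
    """Total minutes when each load finishes every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_time(loads, stage_minutes):
    """Total minutes when loads overlap and every step is paced by the slowest stage."""
    step = max(stage_minutes)
    # The first load needs one step per stage; each later load finishes one step after it.
    return (len(stage_minutes) + loads - 1) * step


# Assumed stages: wash, dry, fold, put away (30 minutes each).
stages = [30, 30, 30, 30]
print(sequential_time(4, stages))  # 480
print(pipelined_time(4, stages))   # 210
```

Each individual load still takes two hours from start to finish; only the throughput improves, which is the same trade-off a pipelined CPU makes.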

Pipelined design for the DINO CPU

Reading (154B)

Computer Organization and Design

Section 4.7

Edition 1

Section 4.6

Reading (201A)

Computer Architecture: A Quantitative Approach

Appendix C.3

The videos in this section use the DINO CPU design from Spring Quarter 2020, which is slightly different from the design for this quarter. The main difference is that this quarter we have a NextPC unit, which outputs the address of the next instruction and whether a branch is “taken”. This is equivalent to the two muxes at the “top” of the execute stage in the previous design.

The next video explains how to modify the single-cycle DINO CPU to be pipelined and goes through an example of how a few instructions may be executed.

Play on AggieVideo

Example execution in pipelined DINO CPU

This video shows an example of how a program will execute in a pipelined CPU.

Play on AggieVideo

Pipelined CPU performance

This video discusses the performance of a pipelined CPU.

Play on AggieVideo
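As a rough illustration of pipelined performance, the sketch below compares total execution time for a single-cycle design and a five-stage pipeline. The stage latencies are assumed values, not measurements of the DINO CPU, and the model ignores pipeline-register overhead unless you add stall cycles yourself.

```python
def single_cycle_time_ps(n_instructions, stage_latencies_ps):
    """Single-cycle design: the clock period must cover every stage back to back."""
    return n_instructions * sum(stage_latencies_ps)


def pipelined_time_ps(n_instructions, stage_latencies_ps, stall_cycles=0):
    """Pipelined design: the clock period is set by the slowest stage."""
    clock_ps = max(stage_latencies_ps)
    # Cycles to fill the pipeline, plus one per remaining instruction, plus any stalls.
    cycles = len(stage_latencies_ps) + n_instructions - 1 + stall_cycles
    return cycles * clock_ps


# Assumed latencies in picoseconds for fetch, decode, execute, memory, writeback.
latencies = [200, 100, 200, 200, 100]
n = 1_000_000
speedup = single_cycle_time_ps(n, latencies) / pipelined_time_ps(n, latencies)
print(round(speedup, 2))  # 4.0
```

With these assumed numbers the speedup approaches 4 rather than 5, because the stages are not perfectly balanced: the clock runs at the pace of the 200 ps stages even when a 100 ps stage is doing the work.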

QUIZ Basic pipelining

Use Canvas to complete the quiz!

Pipeline hazards

Now that we understand the basics of pipelining, let’s look at some of the limitations and requirements to implement a real system.

Note: these lectures will be useful when completing Part II of DINO CPU assignment 3.

Limits of our basic pipelined design and data hazards

Reading (154B)

Computer Organization and Design

Section 4.8

Edition 1

Section 4.7

Reading (201A)

Computer Architecture: A Quantitative Approach

Appendix C.4

In the following videos we will learn about data hazards and the way they affect pipeline performance and implementation. These videos are a bit more detailed than some of the others. I suggest you give yourself enough time to go through the videos a couple of times to make sure you understand these concepts.

Detailed examples of data hazards and dependencies

First, let’s look at a very detailed example of how an application executes on this pipelined design. This video is very detailed. You may want to watch the next video first, which goes over the example at a higher level, and then come back to this video.

Play on AggieVideo

This video introduces data hazards and goes over a very detailed example.

Next, let’s bring things up a level and look at a different way to represent the same example.

Play on AggieVideo

This video introduces data hazards in a slightly different way than the previous video. After this video, you may want to watch the first video again to make sure you understand the details.
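To make the idea of a data dependence concrete, here is a small sketch that scans a short RISC-V style instruction sequence for read-after-write dependences that are close enough to be hazards. The `Instr` type, the example program, and the `window` of two instructions are all assumptions for illustration; the window assumes a five-stage pipeline without forwarding in which the register file is written before it is read within the same cycle.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str                # assembly text, for printing only
    dest: Optional[str]      # register written, or None
    srcs: Tuple[str, ...]    # registers read


def find_raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) for each nearby RAW dependence."""
    hazards = []
    for i, consumer in enumerate(program):
        for src in sorted(set(consumer.srcs)):
            if src == "x0":  # x0 is hardwired to zero, so it never carries a dependence
                continue
            # Walk backwards so only the most recent writer of src is reported.
            for j in range(i - 1, max(i - window, 0) - 1, -1):
                if program[j].dest == src:
                    hazards.append((j, i, src))
                    break
    return hazards


program = [
    Instr("add x5, x1, x2", "x5", ("x1", "x2")),
    Instr("sub x6, x5, x3", "x6", ("x5", "x3")),
    Instr("and x7, x5, x6", "x7", ("x5", "x6")),
    Instr("or  x8, x1, x2", "x8", ("x1", "x2")),
]
for producer, consumer, reg in find_raw_hazards(program):
    print(f"{program[consumer].text!r} needs {reg} from {program[producer].text!r}")
```

The last instruction reads only registers that no nearby instruction writes, so it is not reported.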

“Solving” data hazards with stalling

Now that we see how data hazards can cause a problem, let’s look at a naive way to “solve” this problem by stalling some instructions.

Play on AggieVideo

This video discusses how to “fix” the data hazard problem by stalling instructions that cause the hazard.

Note that in a “statically” scheduled design, the compiler can actually detect/predict which instructions are going to cause hazards and insert nop instructions in the right places. These designs are not common since you would have to recompile your program for every different hardware design. This is a good example of why you don’t want your microarchitecture leaking into your architecture.
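The nop-insertion idea can be sketched as a tiny compiler pass. This is only an illustration under stated assumptions: instructions are hypothetical `(text, dest, srcs)` tuples, and `min_distance=3` assumes a five-stage pipeline with no forwarding where a consumer must be at least three slots behind its producer. A different pipeline would need a different distance, which is exactly why this approach ties the program to one hardware design.

```python
NOP = ("nop", None, ())


def insert_nops(program, min_distance=3):
    """Pad the program with nops so every consumer trails its producer by min_distance slots."""
    scheduled = []
    for instr in program:
        srcs = instr[2]
        needed = 0
        # Look back over the slots already emitted, nearest first.
        for back, earlier in enumerate(reversed(scheduled), start=1):
            if back >= min_distance:
                break
            dest = earlier[1]
            if dest is not None and dest != "x0" and dest in srcs:
                needed = max(needed, min_distance - back)
        scheduled.extend([NOP] * needed)
        scheduled.append(instr)
    return scheduled


program = [
    ("add x5, x1, x2", "x5", ("x1", "x2")),
    ("sub x6, x5, x3", "x6", ("x5", "x3")),
    ("and x7, x5, x6", "x7", ("x5", "x6")),
    ("or x8, x1, x2", "x8", ("x1", "x2")),
]
for text, _, _ in insert_nops(program):
    print(text)
```

For this example the pass emits two nops before `sub` and two more before `and`, doubling the length of the program. A hardware stall has the same effect on timing without changing the program itself.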

How to handle data hazards (forwarding)

This video introduces forwarding to handle data hazards.

Play on AggieVideo
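The decision a forwarding unit makes for one ALU operand can be sketched as follows. This follows the textbook-style rule of preferring the most recent result; the function and signal names are hypothetical and do not match the DINO CPU's actual signals.

```python
from enum import Enum


class Forward(Enum):
    REGISTER_FILE = 0  # no hazard: use the value read during decode
    FROM_EX_MEM = 1    # use the ALU result of the instruction one ahead
    FROM_MEM_WB = 2    # use the result of the instruction two ahead


def forward_select(src_reg, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Choose where one ALU operand should come from (register numbers are ints)."""
    # Check the younger producer first so the most recent value wins.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return Forward.FROM_EX_MEM
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return Forward.FROM_MEM_WB
    return Forward.REGISTER_FILE


# Both older instructions write x5: the EX/MEM value is newer, so it is chosen.
print(forward_select(5, ex_mem_rd=5, ex_mem_regwrite=True, mem_wb_rd=5, mem_wb_regwrite=True))
```

The checks against register 0 are there because x0 always reads as zero in RISC-V, so a "write" to it must never be forwarded.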

Load to use hazards

This video talks about a special kind of data hazard: load to use hazards.

Play on AggieVideo
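A load-to-use hazard cannot be fixed by forwarding alone, because the loaded value does not exist until the end of the memory stage. A minimal sketch of the check a hazard detection unit might perform is shown below; the signal names are hypothetical and follow the textbook convention rather than the DINO CPU's.

```python
def must_stall_for_load(id_ex_memread, id_ex_rd, if_id_rs1, if_id_rs2):
    """Return True if the instruction in decode reads a register that the
    load currently in execute has not fetched from memory yet."""
    return bool(
        id_ex_memread
        and id_ex_rd != 0  # x0 never carries a dependence
        and id_ex_rd in (if_id_rs1, if_id_rs2)
    )


# lw x5, 0(x1) followed immediately by add x6, x5, x2: stall for one cycle.
print(must_stall_for_load(True, 5, 5, 2))   # True
# The same add following a non-load instruction: forwarding is enough.
print(must_stall_for_load(False, 5, 5, 2))  # False
```

After the one-cycle stall, the loaded value can be forwarded from the memory/writeback pipeline register like any other result.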

QUIZ Data hazards

Use Canvas to complete the quiz!

Control hazards and branch prediction

Reading (154B)

Computer Organization and Design

Section 4.9

Edition 1

Section 4.8

Reading (201A)

Computer Architecture: A Quantitative Approach

Appendix C.2, Section 3.3

This video introduces control hazards with an example.

Play on AggieVideo

This video introduces the idea of branch prediction.

Play on AggieVideo

This video discusses the requirements of a branch predictor and some more advanced branch prediction schemes.

Play on AggieVideo
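As one concrete example of a prediction scheme, here is a sketch of a table of 2-bit saturating counters, a widely taught design. It is not necessarily the scheme covered in the videos, and the table size, indexing function, and initial counter value are all assumptions.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, entries=16):
        self.entries = entries
        self.counters = [1] * entries  # assumed start state: weakly not taken

    def _index(self, pc):
        # Drop the two low bits (4-byte instructions), then wrap into the table.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, pc, outcomes):
    """Replay a sequence of outcomes for one branch and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A loop branch that is taken 9 times and then falls through, executed twice.
outcomes = ([True] * 9 + [False]) * 2
print(count_mispredictions(TwoBitPredictor(), 0x400, outcomes))  # 3
```

The second bit is what keeps the predictor from flipping after a single loop exit: the second pass through the loop mispredicts only at the exit, not at re-entry.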

Other kinds of hazards: Structural hazards

Play on AggieVideo

This video talks about structural hazards.

Exceptions in a pipelined processor

Reading (154B)

Computer Organization and Design

Section 4.10

Edition 1

Section 4.9

Play on AggieVideo

This video talks about how to handle exceptions/interrupts/errors in a pipelined processor design.

Putting it all together: examples of pipelined execution

This video goes over an end-to-end example of the pipelined DINO CPU. This video should be helpful on Assignment 3.2. While it uses the design from WQ ‘21, there are very few differences in how the pipeline control should be implemented. Details with the design from WQ ‘22 will be covered in discussion.

Play on AggieVideo

QUIZ Pipelining review

Use Canvas to complete the quiz!
