# Discussion 9

March 5th

#### Outline

- gem5 assignment 5
  - Introduction to the gem5 simulator
  - Look at the assignment 5 components
- Quiz 9

#### Introduction to the gem5 simulator

### What is gem5?

#### Michigan m5 + Wisconsin GEMS = gem5

"The gem5 simulator is a modular platform for computer-system architecture research, encompassing system-level architecture as well as processor microarchitecture."

Lowe-Power et al. **The gem5 Simulator: Version 20.0+**. ArXiv Preprint ArXiv:2007.03152, 2021. https://doi.org/10.48550/arXiv.2007.03152

Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven K. Reinhardt, Ali Saidi, Arkaprava Basu, Joel Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sardashti, Rathijit Sen, Korey Sewell, Muhammad Shoaib, Nilay Vaish, Mark D. Hill, and David A. Wood. 2011. **The gem5 simulator**. SIGARCH Comput. Archit. News 39, 2 (August 2011), 1-7. DOI=http://dx.doi.org/10.1145/2024716.2024718







Created at Michigan by Steve Reinhardt and his students, principally Nate Binkert.

"A tool for simulating systems"





#### Two Views of M5

1. A framework for event-driven simulation
   - Events, objects, statistics, configuration
2. A collection of predefined object models
   - CPUs, caches, buses, devices, etc.

- This tutorial focuses on #2
- You may find #1 useful even if #2 is not














GEMS: created at Wisconsin by students of Mark Hill and David Wood.

Detailed memory system





#### GEMS From 50,000 Feet







#### Why simulation

- Need a tool to evaluate systems that don't exist (yet)
  - Performance, power, energy, etc.
- Very costly to actually make the hardware
- Computer systems are complex with many interdependent parts
  - Not easy to be accurate without the full system
- Simulation can be parameterized
  - Design-space exploration
  - Sensitivity analysis





# Computer systems research/engineering





From *Computer Architecture Performance Evaluation Methods* by Lieven Eeckhout

Computer architecture simulation!





## gem5's goals



#### Agile Hardware Dev. Methodology







# Discrete event simulation example
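To make the idea of discrete event simulation concrete, here is a minimal sketch in Python. It is not gem5's actual event queue; the `EventQueue` class, the three callbacks, and the 10-tick and 100-tick latencies are all hypothetical values chosen for illustration. The key point it shows is that simulated time jumps from one scheduled event to the next instead of advancing one cycle at a time.

```python
import heapq
import itertools


class EventQueue:
    """Toy discrete-event kernel (illustrative only, not gem5's implementation)."""

    def __init__(self):
        self._queue = []                  # min-heap ordered by event time
        self._order = itertools.count()   # tie-breaker for events at the same tick
        self.now = 0                      # current simulated time in ticks

    def schedule(self, delay, callback):
        """Schedule `callback` to run `delay` ticks after the current time."""
        heapq.heappush(self._queue, (self.now + delay, next(self._order), callback))

    def run(self):
        """Pop events in time order until none remain."""
        while self._queue:
            self.now, _, callback = heapq.heappop(self._queue)
            callback()


log = []
eq = EventQueue()


def cpu_request():
    log.append((eq.now, "CPU sends a memory request"))
    eq.schedule(10, cache_miss)        # hypothetical 10-tick cache lookup


def cache_miss():
    log.append((eq.now, "cache misses and forwards to memory"))
    eq.schedule(100, memory_response)  # hypothetical 100-tick memory latency


def memory_response():
    log.append((eq.now, "memory responds"))


eq.schedule(0, cpu_request)
eq.run()

for tick, what in log:
    print(f"tick {tick:>3}: {what}")
```

Each callback models one component reacting to an event and scheduling the next one, so nothing is simulated during the ticks in which no event occurs.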



Let's look at the setup for gem5 assignment 5

# Quiz 9

**Question 4** 4 pts

tic are constant.

Assume you have an application where 75.0% is parallelizable.

You have three systems, a one core system, a two core system, and a four core system. Fill in the table below with the relative performance, power, and energy for each of the systems.

| Cores | Capacitance | Voltage | Frequency | Time   | Power | Energy |
|-------|-------------|---------|-----------|--------|-------|--------|
| 1     | 1*c         | 1V      | 2 GHz     | 1*t    | 2*c   | 2ct    |
| 2     | 1.8*c       | .9V     | 1.5 GHz   | 0.833t | 2.19c | 1.82ct |
| 4     | 2.5*c       | .7V     | 1 GHz     | 0.88t  | 1.22c | 1.07ct |

Which system is the most energy efficient (uses the least energy)?

Time:

$$t_1 = I \times CPI \times \frac{1}{f_1}$$

$$t_2 = I \times \left(25\% + \frac{75\%}{2}\right) \times CPI \times \frac{1}{f_2}$$

$$t_2 = t_1 \times \left(25\% + \frac{75\%}{2}\right) \times \frac{f_1}{f_2} = 1 \times t \times \left(25\% + \frac{75\%}{2}\right) \times \frac{2\,\text{GHz}}{1.5\,\text{GHz}} = 0.833t$$

$$t_4 = t_1 \times \left(25\% + \frac{75\%}{4}\right) \times \frac{2\,\text{GHz}}{1\,\text{GHz}} = 0.875t \approx 0.88t$$

Power ($CV^2f$):

$$Power_2 = 1.8c \times (0.9)^2 \times 1.5\,\text{GHz} = 2.19c$$

$$Power_4 = 2.5c \times (0.7)^2 \times 1\,\text{GHz} = 1.22c$$

Energy (time × power):

$$Energy_2 = 0.833t \times 2.19c = 1.82ct$$

$$Energy_4 = 0.88t \times 1.22c = 1.07ct$$

Which system is the most energy efficient (uses the least energy)?

[Select] 4 core.
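The Question 4 arithmetic can be checked with a short script. This is a sketch under the same assumptions as the quiz: relative time follows Amdahl's law with a 75% parallel fraction, scaled by the frequency ratio against the 2 GHz baseline, and relative power is $CV^2f$. The names `systems`, `relative_time`, and `relative_power` are made up for this example.

```python
PARALLEL_FRACTION = 0.75   # 75% of the application is parallelizable
BASE_FREQ_GHZ = 2.0        # frequency of the one core baseline system

# cores -> (relative capacitance, voltage in V, frequency in GHz), from the quiz table
systems = {
    1: (1.0, 1.0, 2.0),
    2: (1.8, 0.9, 1.5),
    4: (2.5, 0.7, 1.0),
}


def relative_time(cores, freq_ghz):
    """Amdahl's law, then scale by how much slower the clock is than the baseline."""
    amdahl = (1 - PARALLEL_FRACTION) + PARALLEL_FRACTION / cores
    return amdahl * BASE_FREQ_GHZ / freq_ghz


def relative_power(capacitance, voltage, freq_ghz):
    """Dynamic power C * V^2 * f, in units of c (GHz treated as a plain number)."""
    return capacitance * voltage ** 2 * freq_ghz


results = {}
for cores, (cap, volt, freq) in systems.items():
    time = relative_time(cores, freq)
    power = relative_power(cap, volt, freq)
    results[cores] = (time, power, time * power)   # energy = time * power
    print(f"{cores} core: time={time:.3f}t power={power:.3f}c energy={time * power:.3f}ct")

best = min(results, key=lambda n: results[n][2])
print(f"Least energy: {best} core system")
```

The unrounded four core energy is slightly different from multiplying the already rounded 0.88t and 1.22c, but both round to 1.07ct.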





A webserver serving requests from many clients

When the application has [Select] parallelism, a SIMD architecture is probably more efficient than a MIMD architecture.

[Select] Data-level.

Question 7 2 pts

Implementing communication via message passing is required when which ... true?

**Question 8** 2 pts

Which of the following are examples of message passing architectures?

- Most smartphones
- The AMD Epyc system discussed at the end of the memory section
- A GPU
- Super computers (e.g., Summit at ORNL and Sierra at LLNL)
- Most desktop computers
- Data centers

Question 9 2 pts

Read the Wikipedia page on the minimax algorithm. This algorithm can be used for playing games and maximizes the minimum value of an opponent's move.

Which parallel programming technique do you think would be most appropriate to parallelize minimax?



- Data-level parallelism / SIMD by using array operations

