Due on 3/3 11:59 pm (PST): See Submission for details
GitHub Classroom link for 154B: https://classroom.github.com/a/bDGsMxaa
GitHub Classroom link for 201A: https://classroom.github.com/a/HZ0NCDOx
In this assignment, you will explore the design space of virtual memory translation caches.
Virtual memory translation is a critical component of modern computer systems. Every memory access requires translating a virtual address to a physical address. To make this translation fast, systems employ translation lookaside buffers (TLBs) and page walk caches.
Should we allocate area to a TLB or to the page walk cache?
To answer this question, you will need to simulate different configurations of TLBs and page walk caches and measure their impact on system performance.
You will use two workloads for this assignment:

- Breadth-first search (BFS): `bfs_x86_run` or `bfs_fs_run`
- Matrix multiplication with a blocked layout (MM-Block-IK): `mm_block_ik_x86_run` or `mm_block_ik_fs_run`
Both workloads are available in syscall emulation (SE) and full system (FS) modes.
The system you are simulating looks something like the picture below:
Before exploring the main part of this assignment, we will look at the differences between running our workloads in syscall emulation (SE) mode versus full system (FS) mode.
SE mode simulates only the application: gem5 intercepts system calls and handles them directly, so no operating system runs inside the simulated machine.
FS mode simulates the entire software stack, including the operating system, so OS behavior such as page table management is actually modeled.
Let’s look at the differences when running in SE and FS modes.
The provided run script (`run.py`) runs the workloads in SE mode by default:

```sh
gem5 run.py bfs
gem5 run.py mm_block_ik
```
Run these workloads and note the number of instructions, the time, and the IPC.
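Rather than hunting through `stats.txt` by hand, you can pull these numbers out with a small parser like the sketch below. It assumes the plain-text `stats.txt` format and guesses the IPC statistic by its suffix, since the exact stat name depends on the processor configuration; adjust the names to match what you see in your own output.

```python
#!/usr/bin/env python3
"""Minimal sketch: pull headline numbers out of a gem5 stats.txt file."""
import sys

def read_stats(path):
    """Return a dict mapping stat names to float values."""
    stats = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            # Stat lines look like: <name> <value> [# description]
            if len(parts) >= 2:
                try:
                    stats[parts[0]] = float(parts[1])
                except ValueError:
                    pass  # skip headers and non-numeric values
    return stats

if __name__ == "__main__":
    stats = read_stats(sys.argv[1])  # e.g., m5out/<path to output>/stats.txt
    insts = stats.get("simInsts")
    seconds = stats.get("simSeconds")
    # The IPC stat name varies with the board (e.g., it lives under
    # board.processor.cores.core in SE mode), so grab any stat ending in ".ipc".
    ipc = next((v for k, v in stats.items() if k.endswith(".ipc")), None)
    print(f"instructions={insts}  simSeconds={seconds}  IPC={ipc}")
```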
We have provided `Workload`s that allow you to run the exact same binary in FS mode. However, before you can run the binary, you have to boot the operating system. Since booting would take a very long time (an hour or more) in detailed simulation mode, the run script uses a fast-forward mode to boot the OS. After the OS has booted and the binary has been copied from your host system into the guest system, the run script switches to detailed simulation mode and runs the workload. If you are interested in the details, the `exit_event_handler` function handles the transition from fast-forward to detailed simulation.
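If you are curious, the handler follows gem5's generator-based exit-event pattern. The sketch below illustrates the general shape, assuming a standard-library switchable processor (e.g., `SimpleSwitchableProcessor`); it is not the provided script's exact code.

```python
def make_exit_event_handler(processor):
    """Return a generator that drives the boot -> detailed-run transition.

    `processor` is assumed to be a switchable processor that starts on fast
    (fast-forward) cores and can switch() to detailed timing cores.
    """
    def handler():
        # First exit event: the guest kernel has finished booting.
        print("Done booting Linux")
        yield False  # keep simulating on the fast cores
        # Second exit event: the benchmark has been copied into the guest;
        # switch to the detailed cores before the region of interest.
        processor.switch()
        yield False
        # Final exit event: the workload is finished, so end the simulation.
        yield True
    return handler

# Hooked up roughly like this (assuming `board` and `processor` built above):
# from gem5.simulate.exit_event import ExitEvent
# from gem5.simulate.simulator import Simulator
# sim = Simulator(board=board, on_exit_event={
#     ExitEvent.EXIT: make_exit_event_handler(processor)(),
# })
```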
You can see the output of the boot process by either watching the terminal output file or by connecting to the gem5 terminal. To watch the output file, you can run:
```sh
tail -f m5out/<path to output>/board.pc.com_1.device
```
To connect to the gem5 terminal, you first need to build the `m5term` tool:

```sh
cd util/term
make
```
Then you can run:
```sh
./m5term <port>
```
The default port for the gem5 terminal is 3456, but if you run more than one simulation at a time the port will be different. The port number is printed to the terminal when you run the simulation.
Now you can run the workloads in FS mode:
```sh
gem5 run.py bfs --fs
gem5 run.py mm_block_ik --fs
```
Just like in SE mode, when the application begins the region of interest, the statistics are reset. When the region of interest ends, the statistics are dumped to the output file. So, when you compare the statistics between SE and FS modes, you are comparing the same region of interest.
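The region of interest is handled with the same exit-event mechanism: the benchmark marks the beginning and end of the region (typically via gem5's work-item annotations), and the statistics are reset and dumped in response. The provided script already does this for you; the sketch below, with the hookup left as a comment under assumed standard-library names, just illustrates the idea.

```python
import m5

def workbegin_handler():
    """Runs when the guest signals the start of the region of interest."""
    m5.stats.reset()  # discard stats gathered before the ROI (e.g., OS boot)
    yield False       # keep simulating

def workend_handler():
    """Runs when the guest signals the end of the region of interest."""
    m5.stats.dump()   # write stats covering only the ROI
    yield True        # end the simulation

# Hooked up roughly like this (assuming a standard-library board built above):
# from gem5.simulate.exit_event import ExitEvent
# from gem5.simulate.simulator import Simulator
# sim = Simulator(board=board, on_exit_event={
#     ExitEvent.WORKBEGIN: workbegin_handler(),
#     ExitEvent.WORKEND: workend_handler(),
# })
```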
Look at the Hints section for where to get started reading the `stats.txt` file.
You will use the following configuration for your experiments:
The provided `run.py` script allows you to configure these parameters.
Complete the following steps and answer the questions for your report.
Before running any experiments, run both workloads with the small TLB (16 entries) and the small page walk cache configuration to get a baseline. Then run experiments varying both the TLB size and the page walk cache configuration (a sweep sketch follows below).
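One way to organize the sweep is a small driver script like the sketch below. The `--tlb-size` and `--pwc-size` flags, the example sizes, and the output-directory layout are placeholders; substitute whatever options the provided `run.py` actually exposes (`--outdir` is a standard gem5 option for keeping each run's output separate).

```python
# Hypothetical sweep driver: launches one gem5 run per configuration.
# The --tlb-size/--pwc-size flags and the values are placeholders for the
# options your run.py actually provides.
import itertools
import subprocess

workloads = ["bfs", "mm_block_ik"]
tlb_sizes = [16, 64]             # example entry counts
pwc_configs = ["small", "large"]

for wl, tlb, pwc in itertools.product(workloads, tlb_sizes, pwc_configs):
    outdir = f"m5out/{wl}-tlb{tlb}-pwc-{pwc}"
    cmd = [
        "gem5", f"--outdir={outdir}", "run.py", wl,
        f"--tlb-size={tlb}", f"--pwc-size={pwc}",
    ]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```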
Using the data from your experiments, answer the research question: should we allocate area to a larger TLB or to a larger page walk cache?
Assume that the additional area required for the larger TLB and for the larger page walk cache is the same. (This is approximately true for the configurations we are using, since the TLB is fully associative and requires a lower access time.)
Hints: the following statistics in `stats.txt` are a good place to start.

- `simSeconds`: the simulated time for the region of interest.
- The hits, misses, and total miss latency for the data TLB's (DTB) page walk cache. You can divide the total miss latency by the number of misses to get the average latency per miss (see the parsing sketch after this list). Note that the latency is given in ticks, not cycles or seconds.
  - `board.cache_hierarchy.dptw_caches.overallHits::total`
  - `board.cache_hierarchy.dptw_caches.overallMisses::total`
  - `board.cache_hierarchy.dptw_caches.overallMissLatency::total`
- The TLB statistics (hits, misses, accesses). Feel free to ignore the instruction TLB (itb).
  - SE mode:
    - `board.processor.cores.core.mmu.dtb.rdAccesses`
    - `board.processor.cores.core.mmu.dtb.wrAccesses`
    - `board.processor.cores.core.mmu.dtb.rdMisses`
    - `board.processor.cores.core.mmu.dtb.wrMisses`
  - FS mode:
    - `board.processor.switch.core.mmu.dtb.rdMisses`
    - `board.processor.switch.core.mmu.dtb.wrMisses`
    - `board.processor.switch.core.mmu.dtb.rdAccesses`
    - `board.processor.switch.core.mmu.dtb.wrAccesses`
- The accesses by the page table walker to memory and to the L2 cache:
  - `board.memory.mem_ctrl.dram.bwTotal::processor.switch.core.mmu.dtb.walker`
  - `board.cache_hierarchy.l2-cache-0.overallAccesses::processor.switch.core.mmu.dtb.walker`
  - `board.cache_hierarchy.l2-cache-0.overallMisses::processor.switch.core.mmu.dtb.walker`
  - `board.cache_hierarchy.l2-cache-0.overallHits::processor.switch.core.mmu.dtb.walker`
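Once you have a parser (such as the `read_stats()` helper in the earlier sketch), the derived metrics are a few lines of arithmetic. The sketch below assumes the FS-mode stat names listed above and gem5's default tick rate (1 tick = 1 ps); adjust the prefixes if your components are named differently.

```python
TICKS_PER_SECOND = 1e12  # gem5's default: 1 tick = 1 picosecond

def walk_cache_avg_miss_latency_ns(stats):
    """Average data-TLB page walk cache miss latency, in nanoseconds."""
    pre = "board.cache_hierarchy.dptw_caches"
    misses = stats[f"{pre}.overallMisses::total"]
    latency_ticks = stats[f"{pre}.overallMissLatency::total"]
    return latency_ticks / misses / TICKS_PER_SECOND * 1e9

def dtb_miss_rate(stats):
    """Data TLB miss rate (misses per access)."""
    dtb = "board.processor.switch.core.mmu.dtb"  # FS; use cores.core for SE
    accesses = stats[f"{dtb}.rdAccesses"] + stats[f"{dtb}.wrAccesses"]
    misses = stats[f"{dtb}.rdMisses"] + stats[f"{dtb}.wrMisses"]
    return misses / accesses

def walker_l2_hit_fraction(stats):
    """Fraction of page-table-walker requests that hit in the L2 cache."""
    l2 = "board.cache_hierarchy.l2-cache-0"
    walker = "processor.switch.core.mmu.dtb.walker"
    return (stats[f"{l2}.overallHits::{walker}"]
            / stats[f"{l2}.overallAccesses::{walker}"])

# Example usage (reusing read_stats() from the earlier sketch):
# stats = read_stats("m5out/bfs-tlb16-pwc-small/stats.txt")
# print(walk_cache_avg_miss_latency_ns(stats), dtb_miss_rate(stats))
```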
You will submit this assignment via GitHub Classroom.
Make sure you include your run script, an explanation of how to use your script, and the answers to the questions in the `questions.md` file.
Include a detailed explanation of how to use your script and how you used it to generate your answers. Make sure that all paths are relative to this directory (`virtual-memory/`).
You are required to work on this assignment individually. You may discuss high level concepts with others in the class but all the work must be completed on your own.
Remember, DO NOT POST YOUR CODE PUBLICLY ON GITHUB! Any code found on GitHub that is not the base template you are given will be reported to SJA. If you want to sidestep this problem entirely, don’t create a public fork and instead create a private repository to store your work.