## Machine-Level Representation of Programs: Data and Locality

#### Yipeng Huang

**Rutgers University** 

April 2, 2024

## Table of contents

Procedures, function calls, and support for recursion

Transfer of control to callee Transfer of data to callee Transfer of control back to caller Transfer of data back to caller and restoring caller state Architecture support for recursive programming Cache, memory, storage, and network hierarchy trends Static random-access memory (registers, caches) Dynamic random-access memory (main memory) Solid state and hard disk drives (storage) Locality: How to create illusion of fast access to capacious data

Spatial locality Temporal locality

Caches: motivation

Hardware caches supports software locality Software locality exploits hardware caches

(ロ)、(部)、(E)、(E)、 E) の(C) 2/32

### Announcements

#### Class session plan

 Tuesday, 4/2: Arrays and data structures in assembly. (Book chapter 3.8). Bomblab phase\_5, phase\_6.

<□▶ < @ ▶ < E ▶ < E ▶ E < 9< 0 3/32</p>

## Table of contents

Procedures, function calls, and support for recursion

Transfer of control to callee Transfer of data to callee Transfer of control back to caller Transfer of data back to caller and restoring caller state Architecture support for recursive programming Cache, memory, storage, and network hierarchy trends Static random-access memory (registers, caches) Dynamic random-access memory (main memory) Solid state and hard disk drives (storage) Locality: How to create illusion of fast access to capacious data

Spatial locality Temporal locality

Caches: motivation

Hardware caches supports software locality Software locality exploits hardware caches

### Procedures and function calls



Figure: Steps of a C function call. Image credit CS:APP

#### To create the abstraction of functions, need to:

- Transfer control to function
- Transfer data to function (parameters)
- Transfer control back from function
- Transfer data back from function (return type)
- Restore caller context

## CPU and memory state in support of procedures and functions

#### **Carnegie Mellon**

12

#### Assembly/Machine Code View



#### **Programmer-Visible State**

- PC: Program counter
  - Address of next instruction
  - Called "RIP" (x86-64)

#### Register file

- Heavily used program data
- Condition codes
  - Store status information about most recent arithmetic or logical operation
- Bryant and O'Hallaron, Computer systems: A Programmer's Perspective, Third Edition

Memory

- Byte addressable array
- Code and user data
- Stack to support procedures

### Relevant state in CPU:

 %rip register / instruction pointer / program counter

(ロ)、(同)、(E)、(E)、(E)、(A)(C) 6/32

- %rsp register / stack pointer
- Relevant state in Memory:



## Procedures and function calls: Transferring data

For purposes of this class, the Bomb Lab, and the CS:APP textbook, we study the x86-64 Linux Application Binary Interface (ABI). Would be different on ARM or in Windows. So, don't memorize this, but it is helpful for PA4 Lab.

#### Passing parameters

| Parameter      | Register / stack | Subset registers | Mnemonic <sup>1</sup> |
|----------------|------------------|------------------|-----------------------|
| 1st            | %rdi             | %edi, %di        | Diane's               |
| 2nd            | %rsi             | %esi, %si        | silk                  |
| 3rd            | %rdx             | %edx, %dx, %dl   | dress                 |
| 4th            | %rcx             | %ecx, %cx, %cl   | cost                  |
| 5th            | %r8              | %r8d             | \$8                   |
| 6th            | %r9              | %r9d             | 9                     |
| 7th and beyond | Stack            |                  |                       |

<sup>&</sup>lt;sup>1</sup>http://csappbook.blogspot.com/2015/08/dianes-silk-dress-costs-89.htmloge 7/32

### PA4 Defusing a Binary Bomb: sscanf();

```
1 int sscanf (
2 const char *str, // lst arg, %rdi
3 const char *format, // 2nd arg, %rsi
4 ...
5 )
```

## Memory stack frames



## Structure of stack for currently executing function Q()

P() calls Q(). P() is the caller function. Q() is the callee function.

(ロ)、(型)、(E)、(E)、(E)、(O)() 9/32

### Stack instructions: push src and pop dest



Figure: x86-64 offers dedicated instructions to work with stack in memory. In addition to moving data, the updating of %rsp is implied. Image credit: CS:APP.

### Procedure call and return: call and ret



Figure: Effect of call 0x400540 instruction and subsequent return. call and ret instructions update the instruction pointer, the stack pointer, and the stack to create the procedure / function call abstraction. Image credit: CS:APP.

## Example in GDB

```
1 #include <stdio.h>
2
3 int return_neg_one() {
4   return -1;
5 }
6
7 int main() {
8   int num = return_neg_one();
9   printf("%d", num);
10   return 0;
11 }
```

```
return_neg_one:
    movl $-1, %eax
    ret
main:
    subq $8, %rsp
    movl $0, %eax
    call return_neg_one
    movl %eax, %edx
    ...
```

Compile, and then run it in GDB:

gdb return

In GDB, see evolution of %rip, %rsp, and stack:

- (gdb) layout split
- (gdb) break return\_neg\_one
- (gdb) info stack
- ▶ (gdb) print /a \$rip
- (gdb) print /a \$rsp
- ▶ (gdb) x /a \$rsp

# Step past return instruction, and inspect again:

- ▶ (gdb) stepi

### Transfer of data back to caller

### Passing function return data

Function return data is passed via:

- ▶ the 64-bit %rax register
- the 32-bit subset %eax register

Example from textbook slides on assembly procedures Slides 33 through 38.

<ロト < 回 ト < 三 ト < 三 ト 三 の < で 13/32

## Restoring caller state



# Structure of stack for currently executing function Q()

P() calls Q(). P() is the caller function. Q() is the callee function.

Example from textbook slides on assembly procedures

Slides 40 through 44.

## Table of contents

Procedures, function calls, and support for recursion

Transfer of control to callee Transfer of data to callee Transfer of control back to caller Transfer of data back to caller and restoring caller state Architecture support for recursive programming Cache, memory, storage, and network hierarchy trends Static random-access memory (registers, caches) Dynamic random-access memory (main memory) Solid state and hard disk drives (storage)

Locality: How to create illusion of fast access to capacious data Spatial locality

Temporal locality

Caches: motivation

Hardware caches supports software locality Software locality exploits hardware caches 3\_recursion.c: Putting it all together to support recursion

### **Discussion points**

- ▶ Use info stack, info args in GDB to see recursion depth
- Difference between compiling with and without -g for debugging information.
- Memory costs of recursion.
- Compilers can recognize tail recursive calls to reduce memory use. Enabled with -foptimize-sibling-calls, -O2, -O3, and -Os.

(ロ)、(同)、(目)、(目)、(目)、(16/32)

## Table of contents

Procedures, function calls, and support for recursion

Transfer of control to callee Transfer of data to callee Transfer of control back to caller Transfer of data back to caller and restoring caller state Architecture support for recursive programming Cache, memory, storage, and network hierarchy trends Static random-access memory (registers, caches) Dynamic random-access memory (main memory) Solid state and hard disk drives (storage)

Locality: How to create illusion of fast access to capacious data Spatial locality

Temporal locality

Caches: motivation

Hardware caches supports software locality Software locality exploits hardware caches

## Cache, memory, storage, and network hierarchy trends

 Assembly programming view of computer: CPU and memory.

 Full view of computer architecture and systems: +caches, +storage, +network



Figure: Memory hierarchy. Image credit CS: APP

## Cache, memory, storage, and network hierarchy trends



Topic of this chapter:

- Technology trends that drive CPU-memory gap.
- How to create illusion of fast access to capacious data.

Figure: Widening gap: CPU processing time vs. memory access time. Image credit CS:APP

### Static random-access memory (registers, caches)

- SRAM is bistable logic
- Access time: 1 to 10 CPU clock cycles
- Implemented in the same transistor technology as CPUs, so improvement has matched pace.



Figure: SRAM operating principle. Image credit Wikimedia

## Dynamic random-access memory (main memory)

- Needs refreshing every 10s of milliseconds
- 8GB typical in laptop; 1TB on each ilab machine
- Access time: 100 CPU clock cycles
- Memory gap: DRAM technological improvement slower relative to CPU/SRAM.



Figure: DRAM operating principle. Image credit ocw.mit.edu

## CPU / DRAM main memory interface



Figure: Memory Bus. Image credit CS:APP

 DDR4 bus standard supports 25.6GB/s transfer rate



Figure: Intel 2020 Gulftown die shot. Image credit AnandTech

(日) (同) (目) (日) (日) (0) (0)

22/32

## Solid state and hard disk drives (storage)

### Technology

- SSD: flash nonvolatile memory stores data as charge.
- ► HDD: magnetic orientation.
- Access time: 100K CPU clock cycles

# For in-depth on storage, file systems, and operating systems, take:

- CS214 Systems Programming
- CS416 Operating Systems Design

Since summer 2021, LCSR (admins of iLab) have moved the storage systems that supports everyone's home directories to SSD. https://resources.cs.rutgers.edu/docs/file-storage/storage-technology-options/



Figure: SSD. Image credit CS:APP



#### Figure: HDD. Image credit CS:APP

## I/O interfaces



Figure: I/O Bus. Image credit CS:APP

### Storage interfaces

- SATA 3.0 interface (6Gb/s transfer rate) typical
- PCIe (15.8 GB/s) becoming commonplace for SSD
- But interface data rate is rarely the bottleneck.

For in-depth on computer network layers, take:

CS352 Internet Technology

24/32

## Cache, memory, storage, and network hierarchy trends



Topic of this chapter:

- Technology trends that drive CPU-memory gap.
- How to create illusion of fast access to capacious data.

Figure: Widening gap: CPU processing time vs. memory access time. Image credit CS:APP

## Table of contents

Procedures, function calls, and support for recursion

Transfer of control to callee Transfer of data to callee Transfer of control back to caller Transfer of data back to caller and restoring caller state Architecture support for recursive programming Cache, memory, storage, and network hierarchy trends Static random-access memory (registers, caches) Dynamic random-access memory (main memory) Solid state and hard disk drives (storage)

Locality: How to create illusion of fast access to capacious data Spatial locality

Temporal locality

Caches: motivation

Hardware caches supports software locality Software locality exploits hardware caches Locality: How to create illusion of fast access to capacious data

From the perspective of memory hierarchy, locality is using the data in at any particular level more frequently than accessing storage at next slower level.

(ロ)、(同)、(目)、(目)、(目)、(0)、(0), 27/32

First, let's experience the puzzling effect of locality in sumArray.c

- sumArrayRows()
- sumArrayCols()

Well-written programs maximize locality

- Spatial locality
- Temporal locality

## Spatial locality

```
1 double dotProduct (
    double a[N],
2
    double b[N],
3
4)
5
    double sum = 0.0;
    for(size_t i=0; i<N; i++) {</pre>
6
       sum += a[i] * b[i];
7
8
9
    return sum;
10 }
```

### Spatial locality

- Programs tend to access adjacent data.
- Example: stride 1 memory access in a and b.

◆□ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ <

## Temporal locality

```
1 double dotProduct (
    double a[N],
2
    double b[N],
3
4)
5
    double sum = 0.0;
    for(size_t i=0; i<N; i++) {</pre>
6
      sum += a[i] * b[i];
7
8
9
    return sum;
10 }
```

### Temporal locality

- Programs tend to access data over and over.
- Example: sum gets accessed *N* times in iteration.

◆□ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ < □ ▶ <

## Table of contents

Procedures, function calls, and support for recursion

Transfer of control to callee Transfer of data to callee Transfer of control back to caller Transfer of data back to caller and restoring caller state Architecture support for recursive programming Cache, memory, storage, and network hierarchy trends Static random-access memory (registers, caches) Dynamic random-access memory (main memory) Solid state and hard disk drives (storage)

Locality: How to create illusion of fast access to capacious data Spatial locality

Temporal locality

Caches: motivation

Hardware caches supports software locality Software locality exploits hardware caches

## CPU / cache / DRAM main memory interface



Figure: Cache resides on CPU chip close to register file. Image credit CS:APP





Figure: Intel 2020 Gulftown die shot. Image credit AnandTech

Figure: Cache stores a temporary copy from the slower main memory. Image credit CS:APP

## CPU / cache / DRAM main memory interactions



Figure: Cache stores a temporary copy from the slower main memory. Image credit CS:APP

### When CPU stores (ST) to memory

<ロト < 同ト < 三ト < 三ト 三 の < で 32/32

- Cache write hit
- Cache write miss