Memory Management

Memory management needs to provide allocation and protection. The memory system has no knowledge of how memory addresses are generated or what they are used for. There is a hierarchy of memory:

  1. Register access takes one clock cycle.
  2. Caches sit between the registers and main memory.
  3. Main memory can take many cycles.

1. Memory Allocation

Memory management binds the logical address space (addresses generated by the CPU, also called virtual addresses) to the physical address space (addresses seen by the memory unit).

These are the same under compile-time and load-time binding, but differ under execution-time binding.

The Memory Management Unit (MMU) maps logical addresses to physical addresses. It has to be fast, since every memory access passes through it, so it is implemented in hardware.

1.1 Contiguous Allocation

Main memory is split into two partitions: the resident operating system, held in low memory along with the interrupt vector, and user processes, held in high memory.

Contiguous allocation uses relocation and limit registers: the relocation register holds the smallest physical address of a process's partition, the limit register holds the range of its logical addresses, and the MMU maps each logical address dynamically by adding the relocation register to it.

We also check that each memory access is valid to protect user processes. If a logical address falls outside the user's segment, we get a segmentation fault (SEGFAULT).

1.2 Multiple Partition Allocation

A hole is a block of available memory; holes of various sizes are scattered throughout memory. When a process arrives, it is allocated memory from a hole large enough to accommodate it. The OS maintains information about allocated partitions and free partitions.

How do we satisfy a request of size $n$ from a list of free holes? First fit takes the first hole that is big enough; best fit takes the smallest hole that is big enough; worst fit takes the largest hole. In practice, first fit and best fit are generally better than worst fit in both speed and storage utilisation.
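
A minimal sketch of the three strategies, over a made-up free list of (start, size) holes:

```python
def first_fit(holes, n):
    """Take the first hole that is large enough."""
    return next((h for h in holes if h[1] >= n), None)

def best_fit(holes, n):
    """Take the smallest hole that is large enough."""
    return min((h for h in holes if h[1] >= n), key=lambda h: h[1], default=None)

def worst_fit(holes, n):
    """Take the largest hole."""
    return max((h for h in holes if h[1] >= n), key=lambda h: h[1], default=None)

holes = [(0, 100), (300, 500), (1000, 200)]
print(first_fit(holes, 150))  # (300, 500): first hole that fits
print(best_fit(holes, 150))   # (1000, 200): smallest hole that fits
print(worst_fit(holes, 150))  # (300, 500): largest hole
```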

1.3 Fragmentation

Fragmentation occurs as memory is allocated and deallocated. There are two types: external fragmentation, where enough total memory exists to satisfy a request but it is not contiguous, and internal fragmentation, where the memory allocated to a process is slightly larger than the memory requested, leaving unused space inside the partition.

Fragmentation can be reduced with compaction: shuffle memory contents to place all free memory together in one block. This is expensive and may lead to I/O bottlenecks.

1.4 Swapping

Currently, the number of processes is limited by the amount of available memory. To overcome this, we can swap a process temporarily out of memory to disk, and bring it back into memory for continued execution. This requires swap space on disk, and transfer time is the main bottleneck.

2. Virtual Memory

Virtual memory is the separation of logical memory from physical memory. This means that only part of a program needs to be in memory for execution, so the logical address space can be much larger than the physical address space, and address spaces can be shared by several processes.

Virtual memory may be implemented via paging or segmentation.

2.1 Paging

Logical address spaces of processes can be noncontiguous. Physical memory is divided into fixed-size blocks called frames, and the logical address space into blocks of the same size called pages; the OS keeps track of all free frames. To run a program of size $n$ pages:

  1. Find $n$ free frames and load the program.
  2. Set up a page table to translate logical to physical addresses.

Addresses generated by the CPU are divided into:

  1. A page number $p$, used as an index into the page table, which holds the base address of each page's frame in physical memory.
  2. A page offset $d$, combined with the base address to form the physical memory address.

For a logical address space of size $2^m$ and a page size of $2^n$, the page number has $m - n$ bits, and the page offset has $n$ bits.
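
For example, extracting the two fields is just a shift and a mask (the 16-bit address and sizes below are made up for illustration):

```python
m, n = 16, 10                      # 16-bit logical addresses, 2^10-byte pages
addr = 0b101011_0111001101         # a made-up logical address
page   = addr >> n                 # top m - n = 6 bits
offset = addr & ((1 << n) - 1)     # bottom n = 10 bits
print(page, offset)                # -> 43 461
```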

Fixed-size paging avoids external memory fragmentation, though it can still suffer internal fragmentation within a process's last page.

Memory protection is implemented by associating protection bits with pages. A valid-invalid bit is attached to each page table entry: valid indicates that the page is in the process's logical address space and may be accessed; invalid indicates that it is not.

2.2 Page Tables

A page table can be kept in main memory with two registers: the page-table base register (PTBR), which points to the page table, and the page-table length register (PTLR), which indicates its size.

However, this is inefficient, as every data/instruction access then needs two memory accesses: one for the page table entry and one for the data/instruction itself.

Instead, we use a special fast-lookup hardware cache as associative memory. Associative memory supports parallel search.

For address translation on page number $p$: if $p$ is in the TLB, we read the frame number out directly (a hit); otherwise, we fetch the frame number from the page table in memory and insert the mapping into the TLB (a miss).

Translation Lookaside Buffers (TLBs) can store address space IDs (ASIDs) to uniquely identify each process, providing address space protection. Without ASIDs, the TLB must be flushed after each context switch, which can lead to substantial overhead.

We can measure the performance of a TLB by calculating the effective access time (EAT). With memory access time $t$, TLB lookup time $\epsilon$, and hit ratio $\alpha$:

$$\text{EAT} = \alpha (t + \epsilon) + (1 - \alpha)(2t + \epsilon)$$

since a TLB miss costs an extra memory access to read the page table entry.
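
A small sketch of the calculation, with illustrative (not measured) timings:

```python
def eat(t_mem, t_tlb, hit_ratio):
    """A hit costs one memory access plus the TLB lookup; a miss costs the
    lookup plus two memory accesses (page table entry, then the data)."""
    hit  = t_tlb + t_mem
    miss = t_tlb + 2 * t_mem
    return hit_ratio * hit + (1 - hit_ratio) * miss

print(eat(100, 20, 0.80))  # 100 ns memory, 20 ns TLB, 80% hits -> 140.0 ns
```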

There are three types of page tables: hierarchical, hashed, and inverted.

2.2.1 Heirarchical Page Tables

We break up the logical address space into multiple levels of page tables, so that the page table itself is paged. For example, in two-level paging, an outer page table points to pages of the page table itself.

A logical address is divided into a page number and a page offset. For example, on a 32-bit machine with a 4 KB ($2^{12}$ byte) page size, the page number has 20 bits and the offset $d$ has 12 bits; the page number is further split into a 10-bit index $p_1$ into the outer page table and a 10-bit index $p_2$ into the inner page table.

Thus, the logical address is $p_1$ (10 bits), then $p_2$ (10 bits), then $d$ (12 bits).

On a 64-bit machine with 4 KB pages, a single-level page table would need $2^{52}$ entries; at 8 bytes per entry that is $2^{55}$ bytes, or roughly 33 million GB, which is far too large.
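
As an illustration of the 10/10/12 split above, applied to an arbitrary 32-bit address:

```python
def split(addr):
    d  = addr & 0xFFF              # low 12 bits: page offset
    p2 = (addr >> 12) & 0x3FF      # next 10 bits: inner page table index
    p1 = addr >> 22                # top 10 bits: outer page table index
    return p1, p2, d

print(split(0xDEADBEEF))           # -> (890, 731, 3823)
```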

2.2.2 Hashed Page Tables

We hash the virtual page number into the page table: each entry contains a chain of (virtual page number, frame) pairs that hash to the same slot, and a lookup walks the chain comparing virtual page numbers.
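
A toy sketch of the structure, using Python's built-in hash where real hardware would use a dedicated hash function:

```python
NBUCKETS = 16
buckets = [[] for _ in range(NBUCKETS)]   # each bucket chains colliding pages

def map_page(vpn, frame):
    buckets[hash(vpn) % NBUCKETS].append((vpn, frame))

def lookup(vpn):
    for v, frame in buckets[hash(vpn) % NBUCKETS]:
        if v == vpn:                      # walk the chain comparing page numbers
            return frame
    raise KeyError("page fault")          # no mapping for this page

map_page(731, 42)
print(lookup(731))  # -> 42
```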

2.2.3 Inverted Page Tables

An inverted page table has one entry for each physical page frame, recording the virtual page number held in that frame and the ID of the process that owns it.

This decreases the memory needed to store the table, but increases the time to search the table. A hash table can be used to limit the search to a few entries.
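
A toy sketch, with a made-up four-frame memory:

```python
# entry i describes physical frame i: which (process, virtual page) it holds
frames = [("p1", 0), ("p2", 0), ("p1", 7), None]   # None marks a free frame

def translate(pid, vpn):
    for frame, entry in enumerate(frames):
        if entry == (pid, vpn):           # linear search over all frames
            return frame
    raise KeyError("page fault")

print(translate("p1", 7))  # -> 2
```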

2.3 Shared Memory

Processes can set up shared memory areas (implicitly or explicitly). Once shared memory is established, no kernel involvement is needed to use it. The System V API is as follows:

| System call | Description |
| ----------- | ----------- |
| shmget | Create a new shared memory segment. |
| shmat | Attach a shared memory segment to the address space. |
| shmdt | Detach a shared memory segment from the address space. |
| shmctl | Control shared memory. |

It is more common to use mmap instead.
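
For instance, Python's standard mmap module exposes this model. A minimal sketch, assuming an existing file data.bin of at least five bytes:

```python
import mmap

with open("data.bin", "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:   # length 0 maps the whole file
        mm[0:5] = b"hello"                 # modify the file in place via paging
        mm.flush()                         # push dirty pages back to disk
```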

2.4 Segmentation

Paging gives a one-dimensional virtual address space. Segmentation provides a two-dimensional one: a program is a collection of segments (logical units such as the code, the stack, or the heap), each addressed by a segment number and an offset within the segment.

Memory allocation is harder due to the variable segment sizes, and may suffer from external fragmentation. Segments can be shared between processes.
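
A minimal sketch of segmented translation, with a made-up segment table of (base, limit) pairs:

```python
segment_table = {0: (1400, 1000), 1: (6300, 400), 2: (4300, 1100)}

def translate(segment, offset):
    """Map a two-dimensional (segment, offset) address to a physical one."""
    base, limit = segment_table[segment]
    if offset >= limit:                # out-of-range offsets trap to the OS
        raise MemoryError("segmentation fault")
    return base + offset

print(translate(1, 50))                # -> 6350
# translate(1, 450) would raise: offset 450 exceeds segment 1's limit of 400
```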

2.5 Demand Paging

Bring a page into memory only when it is needed. This means less I/O and less memory are needed, response is faster, and more processes can be supported.

When a page is needed, we check whether it is a valid reference; only if it is valid do we bring it into memory. If the page is not marked valid in the page table, the access traps as a page fault.

2.6 Page Faults

The first reference to a page always leads to a page fault. If the reference is invalid, we abort. Otherwise, we handle the request:

  1. Get an empty page frame.
  2. Swap the required page into the frame from disk.
  3. Update the page table to reflect the new mapping.
  4. Restart the instruction that caused the page fault.

The effective access time for demand paging depends on the page fault rate. Now:

$$\text{EAT} = (1 - p) \times \text{memory access time} + p \times \text{page fault service time}$$

where $p$ is the page fault rate and the service time includes servicing the fault, swapping the page in (and possibly a victim out), and restarting the instruction.
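
For instance, assuming a 200 ns memory access time and an 8 ms page fault service time, a fault rate of $p = 0.001$ gives $\text{EAT} = 0.999 \times 200\,\text{ns} + 0.001 \times 8\,\text{ms} \approx 8.2\,\mu\text{s}$: a one-in-a-thousand fault rate slows memory down roughly 40-fold.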

2.7 Virtual Memory Tricks

Copy-on-write (COW) allows parent and child processes to initially share the same pages of memory; only when one process writes to a page does the OS copy it. This makes the fork system call efficient, since process creation no longer copies the whole address space. Free pages for these copies are allocated from a pool of zeroed pages.

Memory-mapped files map files into the virtual address space using paging. This simplifies the programming model for I/O.

3. Page Replacement

If no free frame is available, we need to find an unused page in memory to swap out. We want a page replacement strategy that minimises the number of page faults.

In essence, we have to (see the sketch after this list):

  1. Find the location of the desired page on disk.
  2. Find a free frame. If it is found, use it. If not, use a replacement algorithm to select a victim frame.
  3. Read the desired page into the (newly) free frame.
  4. Update the page and frame tables.
  5. Restart the process.
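
A toy model of this path, using FIFO as the victim policy and plain Python structures in place of real page and frame tables:

```python
from collections import deque

page_table = {}          # resident pages: page -> frame
free_frames = [0, 1]     # physical frames not yet in use
fifo = deque()           # resident pages, oldest first (the victim policy)

def handle_fault(page):
    if free_frames:                     # 2. use a free frame if one exists
        frame = free_frames.pop()
    else:
        victim = fifo.popleft()         # 2. otherwise select a victim
        frame = page_table.pop(victim)  # (a dirty victim would be written back)
    # 1./3. the desired page would be located on disk and read into `frame`
    page_table[page] = frame            # 4. update the page and frame tables
    fifo.append(page)                   # 5. the instruction is then restarted

for p in ["A", "B", "C"]:
    handle_fault(p)
print(page_table)  # {'B': 0, 'C': 1}: 'A' was evicted to make room for 'C'
```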

Page replacement algorithms aim for the lowest page-fault rate. In general, as the number of frames increases, the page-fault rate decreases.

3.1 FIFO Page Replacement

In FIFO page replacement, we replace the oldest page. This is easy to implement with a circular queue, but it is rarely the best policy: FIFO ignores the frequency and recency of page accesses, evicting the oldest page regardless of its future usefulness, so it can evict pages that will be needed again soon. It may also suffer from Belady's Anomaly, where increasing the number of frames increases the number of page faults.
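
The anomaly is easy to reproduce. A minimal simulation, using the classic reference string that faults 9 times with 3 frames but 10 times with 4:

```python
from collections import deque

def fifo_faults(refs, nframes):
    """Count page faults under FIFO replacement with nframes frames."""
    resident, queue, faults = set(), deque(), 0
    for page in refs:
        if page not in resident:
            faults += 1
            if len(resident) == nframes:
                resident.remove(queue.popleft())  # evict the oldest page
            resident.add(page)
            queue.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))  # 9
print(fifo_faults(refs, 4))  # 10: more frames, yet more faults
```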

An optimal algorithm would replace the page that will not be used for the longest period of time. However, this is impossible to implement in practice, as it requires knowledge of future references; it serves mainly as a benchmark for other algorithms.

3.2 LRU Page Replacement

In the Least Recently Used (LRU) algorithm, each page has a counter. When a page is referenced, the system clock is copied into its counter; when a page must be replaced, the page with the smallest counter (the least recently used) is chosen. This is a good approximation to the optimal algorithm. However, it is expensive to implement, so approximations are used instead.

With a reference bit, each page is given a bit initially set to 0. When a page is referenced, the hardware sets the bit to 1. We always replace a page whose bit is 0, and periodically reset all bits to 0. This does not provide a proper ordering of pages for LRU.

A second chance algorithm combines round robin with a reference bit. If the page to be replaced (in round robin order) has a reference bit of 1, set the bit to 0 and give the page a second chance (increment the index), then repeat. If the page's reference bit is 0, replace it and increment the round robin index. This is a good approximation to LRU.
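
A minimal sketch of one sweep of the clock hand, where pages holds (page, reference bit) pairs:

```python
def second_chance(pages, hand):
    """Return (victim index, new hand position) for the clock algorithm."""
    while True:
        page, ref = pages[hand]
        if ref:                            # referenced since the last sweep:
            pages[hand] = (page, 0)        # clear the bit, give a second chance
            hand = (hand + 1) % len(pages)
        else:                              # bit already 0: evict this page
            return hand, (hand + 1) % len(pages)

pages = [("A", 1), ("B", 0), ("C", 1)]
victim, hand = second_chance(pages, 0)
print(pages[victim][0])  # -> B: 'A' used its second chance, 'B' had bit 0
```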

3.3 Counting Algorithms

Counting algorithms keep a counter of the number of references made to each page.

For a program to run efficiently, the system must maintain the program's favoured subset of pages in main memory. Otherwise, thrashing occurs: excessive paging activity causes low processor utilisation as programs repeatedly request pages from secondary storage. In general, programs repeatedly reference pages that are close together in space and time; this is called locality of reference.

3.4 Working Set Model

A working set is the set of pages referenced by a process during the most recent time interval $\Delta$ (that is, from $t - \Delta$ to $t$). A working set clock algorithm adds a time of last use to the clock algorithm, to keep track of whether each page is in the current working set. On each page fault, the clock hand sweeps: a page with its reference bit set has the bit cleared and its time of last use updated; a page with a clear bit whose age exceeds $\Delta$ is outside the working set and can be replaced.

How do we choose the size of the working set window? One method is to observe the page fault frequency: if the working set is too small, the fault frequency will be high; if it is too large, the fault frequency will be low, but the working set may be too large to fit in memory.

3.5 Global vs Local Replacement

In local replacement, each process selects a victim frame only from its own allocated frames. This gives more consistent per-process performance, but memory may be underutilised.

In global replacement, a process selects a victim from the set of all frames, so one process can take a frame from another. This gives greater throughput, but a process's page fault rate then depends on the behaviour of other processes.

There is no single best page replacement algorithm. The best algorithm depends on the system and the workload. Linux uses global page replacement, and Windows uses local page replacement.
