Memory Management

Memory management needs to provide allocation and protection. The memory system has no knowledge of how memory addresses are generated or what they are used for. There is a hierarchy of memory:

  1. Register access takes one clock cycle.
  2. Caches sit between the registers and main memory.
  3. Main memory can take many cycles.

1. Memory Allocation

Memory management binds the logical address space (addresses generated by the CPU, also called virtual addresses) to the physical address space (addresses seen by the memory unit).

These are the same under compile-time and load-time binding, but differ under execution-time binding.

The Memory Management Unit (MMU) maps logical addresses to physical addresses. It has to be fast, since every memory access passes through it, so it is implemented in hardware.

1.1 Contiguous Allocation

Main memory is split into two partitions: the resident operating system, held in low memory along with the interrupt vector, and user processes, held in high memory.

Contiguous allocation uses relocation and limit registers: the relocation register holds the smallest physical address of a process's partition, the limit register holds the range of its logical addresses, and the MMU maps each logical address dynamically by adding the relocation register to it.

We also check that each memory access is valid to protect user processes. If a logical address falls outside the user's segment, we get a segmentation fault (SEGFAULT).

1.2 Multiple Partition Allocation

A hole is a block of available memory; holes of various sizes are scattered throughout memory. When a process arrives, it is allocated memory from a hole large enough to accommodate it. The OS maintains information about allocated partitions and free partitions.

How do we satisfy a request of size $n$ from a list of free holes? First fit takes the first hole that is big enough; best fit takes the smallest hole that is big enough; worst fit takes the largest hole. In practice, first fit and best fit are generally better than worst fit in both speed and storage utilisation.
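
A minimal sketch of the three strategies, over a made-up free list of (start, size) holes:

```python
def first_fit(holes, n):
    """Take the first hole that is large enough."""
    return next((h for h in holes if h[1] >= n), None)

def best_fit(holes, n):
    """Take the smallest hole that is large enough."""
    return min((h for h in holes if h[1] >= n), key=lambda h: h[1], default=None)

def worst_fit(holes, n):
    """Take the largest hole."""
    return max((h for h in holes if h[1] >= n), key=lambda h: h[1], default=None)

holes = [(0, 100), (300, 500), (1000, 200)]
print(first_fit(holes, 150))  # (300, 500): first hole that fits
print(best_fit(holes, 150))   # (1000, 200): smallest hole that fits
print(worst_fit(holes, 150))  # (300, 500): largest hole
```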

1.3 Fragmentation

Fragmentation occurs as memory is allocated and deallocated. There are two types: external fragmentation, where enough total memory exists to satisfy a request but it is not contiguous, and internal fragmentation, where the memory allocated to a process is slightly larger than the memory requested, leaving unused space inside the partition.

Fragmentation can be reduced with compaction: shuffle memory contents to place all free memory together in one block. This is expensive and may lead to I/O bottlenecks.

1.4 Swapping

Currently, the number of processes is limited by the amount of available memory. To overcome this, we can swap a process temporarily out of memory to disk, and bring it back into memory for continued execution. This requires swap space on disk, and transfer time is the main bottleneck.

2. Virtual Memory

Virtual memory is the separation of logical memory from physical memory. This means that only part of a program needs to be in memory for execution, so the logical address space can be much larger than the physical address space, and address spaces can be shared by several processes.

Virtual memory may be implemented via paging or segmentation.

2.1 Paging

Logical address spaces of processes can be noncontiguous. Physical memory is divided into fixed-size blocks called frames, and the logical address space into blocks of the same size called pages; the OS keeps track of all free frames. To run a program of size $n$ pages:

  1. Find $n$ free frames and load the program.
  2. Set up a page table to translate logical to physical addresses.

Addresses generated by the CPU are divided into:

  1. A page number $p$, used as an index into the page table, which holds the base address of each page's frame in physical memory.
  2. A page offset $d$, combined with the base address to form the physical memory address.

For a logical address space of size $2^m$ and a page size of $2^n$, the page number has $m - n$ bits, and the page offset has $n$ bits.
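
For example, extracting the two fields is just a shift and a mask (the 16-bit address and sizes below are made up for illustration):

```python
m, n = 16, 10                      # 16-bit logical addresses, 2^10-byte pages
addr = 0b101011_0111001101         # a made-up logical address
page   = addr >> n                 # top m - n = 6 bits
offset = addr & ((1 << n) - 1)     # bottom n = 10 bits
print(page, offset)                # -> 43 461
```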

Fixed-size paging avoids external memory fragmentation, though it can still suffer internal fragmentation within a process's last page.

Memory protection is implemented by associating protection bits with pages. A valid-invalid bit is attached to each page table entry: valid indicates that the page is in the process's logical address space and may be accessed; invalid indicates that it is not.

2.2 Page Tables

A page table can be kept in main memory with two registers: the page-table base register (PTBR), which points to the page table, and the page-table length register (PTLR), which indicates its size.

However, this is inefficient, as every data/instruction access then needs two memory accesses: one for the page table entry and one for the data/instruction itself.

Instead, we use a special fast-lookup hardware cache as associative memory. Associative memory supports parallel search.

For address translation on page number $p$: if $p$ is in the TLB, we read the frame number out directly (a hit); otherwise, we fetch the frame number from the page table in memory and insert the mapping into the TLB (a miss).

Translation Lookaside Buffers (TLBs) can store address space IDs (ASIDs) to uniquely identify each process, providing address space protection. Without ASIDs, the TLB must be flushed after each context switch, which can lead to substantial overhead.

We can measure the performance of a TLB by calculating the effective access time (EAT). With memory access time $t$, TLB lookup time $\epsilon$, and hit ratio $\alpha$:

$$\text{EAT} = \alpha (t + \epsilon) + (1 - \alpha)(2t + \epsilon)$$

since a TLB miss costs an extra memory access to read the page table entry.
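
A small sketch of the calculation, with illustrative (not measured) timings:

```python
def eat(t_mem, t_tlb, hit_ratio):
    """A hit costs one memory access plus the TLB lookup; a miss costs the
    lookup plus two memory accesses (page table entry, then the data)."""
    hit  = t_tlb + t_mem
    miss = t_tlb + 2 * t_mem
    return hit_ratio * hit + (1 - hit_ratio) * miss

print(eat(100, 20, 0.80))  # 100 ns memory, 20 ns TLB, 80% hits -> 140.0 ns
```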

There are three types of page tables: hierarchical, hashed, and inverted.

2.2.1 Heirarchical Page Tables

We break up the logical address space into multiple levels of page tables, so that the page table itself is paged. For example, in two-level paging, an outer page table points to pages of the page table itself.

A logical address is divided into a page number and a page offset. For example, on a 32-bit machine with a 4 KB ($2^{12}$ byte) page size, the page number has 20 bits and the offset $d$ has 12 bits; the page number is further split into a 10-bit index $p_1$ into the outer page table and a 10-bit index $p_2$ into the inner page table.

Thus, the logical address is $p_1$ (10 bits), then $p_2$ (10 bits), then $d$ (12 bits).

On a 64-bit machine with 4 KB pages, a single-level page table would need $2^{52}$ entries; at 8 bytes per entry that is $2^{55}$ bytes, or roughly 33 million GB, which is far too large.
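
As an illustration of the 10/10/12 split above, applied to an arbitrary 32-bit address:

```python
def split(addr):
    d  = addr & 0xFFF              # low 12 bits: page offset
    p2 = (addr >> 12) & 0x3FF      # next 10 bits: inner page table index
    p1 = addr >> 22                # top 10 bits: outer page table index
    return p1, p2, d

print(split(0xDEADBEEF))           # -> (890, 731, 3823)
```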

2.2.2 Hashed Page Tables

We hash the virtual page number into the page table: each entry contains a chain of (virtual page number, frame) pairs that hash to the same slot, and a lookup walks the chain comparing virtual page numbers.
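
A toy sketch of the structure, using Python's built-in hash where real hardware would use a dedicated hash function:

```python
NBUCKETS = 16
buckets = [[] for _ in range(NBUCKETS)]   # each bucket chains colliding pages

def map_page(vpn, frame):
    buckets[hash(vpn) % NBUCKETS].append((vpn, frame))

def lookup(vpn):
    for v, frame in buckets[hash(vpn) % NBUCKETS]:
        if v == vpn:                      # walk the chain comparing page numbers
            return frame
    raise KeyError("page fault")          # no mapping for this page

map_page(731, 42)
print(lookup(731))  # -> 42
```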

2.2.3 Inverted Page Tables

An inverted page table has one entry for each physical page frame, recording the virtual page number held in that frame and the ID of the process that owns it.

This decreases the memory needed to store the table, but increases the time to search the table. A hash table can be used to limit the search to a few entries.
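
A toy sketch, with a made-up four-frame memory:

```python
# entry i describes physical frame i: which (process, virtual page) it holds
frames = [("p1", 0), ("p2", 0), ("p1", 7), None]   # None marks a free frame

def translate(pid, vpn):
    for frame, entry in enumerate(frames):
        if entry == (pid, vpn):           # linear search over all frames
            return frame
    raise KeyError("page fault")

print(translate("p1", 7))  # -> 2
```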

2.3 Shared Memory

Processes can set up shared memory areas (implicitly or explicitly). Once shared memory is established, no kernel involvement is needed to use it. The System V API is as follows:

| System call | Description |
| ----------- | ----------- |
| shmget | Create a new shared memory segment. |
| shmat | Attach a shared memory segment to the address space. |
| shmdt | Detach a shared memory segment from the address space. |
| shmctl | Control shared memory. |

It is more common to use mmap instead.
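
For instance, Python's standard mmap module exposes this model. A minimal sketch, assuming an existing file data.bin of at least five bytes:

```python
import mmap

with open("data.bin", "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:   # length 0 maps the whole file
        mm[0:5] = b"hello"                 # modify the file in place via paging
        mm.flush()                         # push dirty pages back to disk
```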

2.4 Segmentation

Paging gives a one-dimensional virtual address space. Segmentation provides a two-dimensional one: a program is a collection of segments (logical units such as the code, the stack, or the heap), each addressed by a segment number and an offset within the segment.

Memory allocation is harder due to the variable segment sizes, and may suffer from external fragmentation. Segments can be shared between processes.
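
A minimal sketch of segmented translation, with a made-up segment table of (base, limit) pairs:

```python
segment_table = {0: (1400, 1000), 1: (6300, 400), 2: (4300, 1100)}

def translate(segment, offset):
    """Map a two-dimensional (segment, offset) address to a physical one."""
    base, limit = segment_table[segment]
    if offset >= limit:                # out-of-range offsets trap to the OS
        raise MemoryError("segmentation fault")
    return base + offset

print(translate(1, 50))                # -> 6350
# translate(1, 450) would raise: offset 450 exceeds segment 1's limit of 400
```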

2.5 Demand Paging

Bring a page into memory only when it is needed. This means less I/O and less memory are needed, response is faster, and more processes can be supported.

When a page is needed, we check whether it is a valid reference; only if it is valid do we bring it into memory. If the page is not marked valid in the page table, the access traps as a page fault.

2.6 Page Faults

The first reference to a page always leads to a page fault. If the reference is invalid, we abort. Otherwise, we handle the request:

  1. Get an empty page frame.
  2. Swap the required page into the frame from disk.
  3. Update the page table to reflect the new mapping.
  4. Restart the instruction that caused the page fault.

The effective access time for demand paging depends on the page fault rate. Now:

$$\text{EAT} = (1 - p) \times \text{memory access time} + p \times \text{page fault service time}$$

where $p$ is the page fault rate and the service time includes servicing the fault, swapping the page in (and possibly a victim out), and restarting the instruction.
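
For instance, assuming a 200 ns memory access time and an 8 ms page fault service time, a fault rate of $p = 0.001$ gives $\text{EAT} = 0.999 \times 200\,\text{ns} + 0.001 \times 8\,\text{ms} \approx 8.2\,\mu\text{s}$: a one-in-a-thousand fault rate slows memory down roughly 40-fold.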

2.7 Virtual Memory Tricks

Copy-on-write (COW) allows parent and child processes to initially share the same pages of memory; only when one process writes to a page does the OS copy it. This makes the fork system call efficient, since process creation no longer copies the whole address space. Free pages for these copies are allocated from a pool of zeroed pages.

Memory-mapped files map files into the virtual address space using paging. This simplifies the programming model for I/O.

3. Page Replacement

If no free frame is available, we need to find an unused page in memory to swap out. We want a page replacement strategy that minimises the number of page faults.

In essence, we have to (see the sketch after this list):

  1. Find the location of the desired page on disk.
  2. Find a free frame. If it is found, use it. If not, use a replacement algorithm to select a victim frame.
  3. Read the desired page into the (newly) free frame.
  4. Update the page and frame tables.
  5. Restart the process.
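
A toy model of this path, using FIFO as the victim policy and plain Python structures in place of real page and frame tables:

```python
from collections import deque

page_table = {}          # resident pages: page -> frame
free_frames = [0, 1]     # physical frames not yet in use
fifo = deque()           # resident pages, oldest first (the victim policy)

def handle_fault(page):
    if free_frames:                     # 2. use a free frame if one exists
        frame = free_frames.pop()
    else:
        victim = fifo.popleft()         # 2. otherwise select a victim
        frame = page_table.pop(victim)  # (a dirty victim would be written back)
    # 1./3. the desired page would be located on disk and read into `frame`
    page_table[page] = frame            # 4. update the page and frame tables
    fifo.append(page)                   # 5. the instruction is then restarted

for p in ["A", "B", "C"]:
    handle_fault(p)
print(page_table)  # {'B': 0, 'C': 1}: 'A' was evicted to make room for 'C'
```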

Page replacement algorithms aim for the lowest page-fault rate. In general, as the number of frames increases, the page-fault rate decreases.

3.1 FIFO Page Replacement

In FIFO page replacement, we replace the oldest page. This is easy to implement with a circular queue, but it is rarely the best policy: FIFO ignores the frequency and recency of page accesses, evicting the oldest page regardless of its future usefulness, so it can evict pages that will be needed again soon. It may also suffer from Belady's Anomaly, where increasing the number of frames increases the number of page faults.
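
The anomaly is easy to reproduce. A minimal simulation, using the classic reference string that faults 9 times with 3 frames but 10 times with 4:

```python
from collections import deque

def fifo_faults(refs, nframes):
    """Count page faults under FIFO replacement with nframes frames."""
    resident, queue, faults = set(), deque(), 0
    for page in refs:
        if page not in resident:
            faults += 1
            if len(resident) == nframes:
                resident.remove(queue.popleft())  # evict the oldest page
            resident.add(page)
            queue.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))  # 9
print(fifo_faults(refs, 4))  # 10: more frames, yet more faults
```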

An optimal algorithm would replace the page that will not be used for the longest period of time. However, this is impossible to implement in practice, as it requires knowledge of future references; it serves mainly as a benchmark for other algorithms.

3.2 LRU Page Replacement

In the Least Recently Used (LRU) algorithm, each page has a counter. When a page is referenced, the system clock is copied into its counter; when a page must be replaced, the page with the smallest counter (the least recently used) is chosen. This is a good approximation to the optimal algorithm. However, it is expensive to implement, so approximations are used instead.

With a reference bit, each page is given a bit initially set to 0. When a page is referenced, the hardware sets the bit to 1. We always replace a page whose bit is 0, and periodically reset all bits to 0. This does not provide a proper ordering of pages for LRU.

A second chance algorithm combines round robin with a reference bit. If the page to be replaced (in round robin order) has a reference bit of 1, set the bit to 0 and give the page a second chance (increment the index), then repeat. If the page's reference bit is 0, replace it and increment the round robin index. This is a good approximation to LRU.
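
A minimal sketch of one sweep of the clock hand, where pages holds (page, reference bit) pairs:

```python
def second_chance(pages, hand):
    """Return (victim index, new hand position) for the clock algorithm."""
    while True:
        page, ref = pages[hand]
        if ref:                            # referenced since the last sweep:
            pages[hand] = (page, 0)        # clear the bit, give a second chance
            hand = (hand + 1) % len(pages)
        else:                              # bit already 0: evict this page
            return hand, (hand + 1) % len(pages)

pages = [("A", 1), ("B", 0), ("C", 1)]
victim, hand = second_chance(pages, 0)
print(pages[victim][0])  # -> B: 'A' used its second chance, 'B' had bit 0
```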

3.3 Counting Algorithms

Counting algorithms keep a counter of the number of references made to each page.

For a program to run efficiently, the system must maintain the program's favoured subset of pages in main memory. Otherwise, thrashing occurs: excessive paging activity causes low processor utilisation as programs repeatedly request pages from secondary storage. In general, programs repeatedly reference pages that are close together in space and time; this is called locality of reference.

3.4 Working Set Model

A working set is the set of pages referenced by a process during the most recent time interval $\Delta$ (that is, from $t - \Delta$ to $t$). A working set clock algorithm adds a time of last use to the clock algorithm, to keep track of whether each page is in the current working set. On each page fault, the clock hand sweeps: a page with its reference bit set has the bit cleared and its time of last use updated; a page with a clear bit whose age exceeds $\Delta$ is outside the working set and can be replaced.

How do we choose the size of the working set window? One method is to observe the page fault frequency: if the working set is too small, the fault frequency will be high; if it is too large, the fault frequency will be low, but the working set may be too large to fit in memory.

3.5 Global vs Local Replacement

In local replacement, each process selects a victim frame only from its own allocated frames. This gives more consistent per-process performance, but memory may be underutilised.

In global replacement, a process selects a victim from the set of all frames, so one process can take a frame from another. This gives greater throughput, but a process's page fault rate then depends on the behaviour of other processes.

There is no single best page replacement algorithm. The best algorithm depends on the system and the workload. Linux uses global page replacement, and Windows uses local page replacement.
