Random Access Lists

Binary Lists

We can represent natural numbers as peano naturals with a successor function and a base, zero:

data Nat = Z | S Nat

From this, we can see a deep correspondence between natural numbers and lists:

1inc :: Nat -> Nat
2inc n = S n
3
4cons :: a -> [a] -> [a]
5cons x xs = x : xs
6
7dec :: Nat -> Nat
8dec (S n) = n
9
10tail :: [a] -> [a]
11tail (_ : xs) = xs
12
13add :: Nat -> Nat -> Nat
14add Z n = n
15add (S m) n = S (add m n)
16
17(++) :: [a] -> [a] -> [a]
18[] ++ ys = ys
19(x : xs) ++ ys = x : (xs ++ ys)

Addition on peano numbers is inneficient, and peano numbers are clumsy to work with at scale. Instead, we use binary, a denser, more efficient representation that is easier to read and work with.

1data Bit = O | I deriving (Show, Eq, Enum)
2type Binary = [Bit]
3
4instance Num Bit where fromInteger = toEnum . fromInteger

We will describe binary as a list of bits, in reversed notation from what we are used to. For example [1, 0, 1, 1] is equal to 0b1101, or 13. We can also define zero as [], [0], [0, 0], etc. Let's define some basic operations on binary numbers:

1inc :: Binary -> Binary -- ~ O(1)
2inc [] = [1]
3inc (0 : bs) = 1 : bs
4inc (1 : bs) = 0 : inc bs

The amortized complexity of this is . Why?

If we have a number full of 1s, we will have to flip all of them to 0s, and then flip the last one to a 1. This is .
However, this only needs to be done rarely. Incrementing a number with a leading 0 is , and we have a while to go before a number full of 1s is reached.

1leadingOnes :: Binary -> Int
2leadingOnes = Seq.length . takeWhile (== 1)
3
4bitCount :: Binary -> Int
5bitCount = Seq.length . filter (== 1)

We can trivially state that = leadingOnes n + 1. This is also works for , which means that it describes the cost in all cases. Nice!
So, we must solve leadingOnes n 1 .

A nice strategy to find is to consider the expensive case . In this case, given that the will also drop to after the expensive operation. Hence, we can try . In the expensive case:

leadingOnes n + 1

leadingOnes n + 1 0 - leadingOnes n

leadingOnes n + 1 leadingOnes n

1

However, observe its value just before we reach the expensive case, which is always at odd numbers:

leadingOnes n + 1

leadingOnes n + 1 leadingOnes (inc n) - leadingOnes n

1 leadingOnes (inc n)

leadingOnes (inc n) + 1 $\leq A_\texttt{inc}(\texttt{n})`

Now, we have constrained the amortised complexity to , so we can see that leadingOnes is too chaotic: it forgets about expensiveness as it approaches.

Instead, we could consider a measure that grows more smoothly as we get closer to the most expensive case. This can be achieved with = bitCount n. This has the desired property that the difference between and is still leadingOnes (2 ^ n - 1), but this time we have that . Now, we can perform the check for all :

leadingOnes n + 1
leadingOnes n + 1 bitCount (inc n) - bitCount n
leadingOnes n + 1 1 - bitCount n
leadingOnes n - bitCount n + 2 $\leq A_\texttt{inc}(\texttt{n})`
2 $\leq A_\texttt{inc}(\texttt{n})`

From there, we can stipulate that and hence the amortised complexity is . However, this leads to an interesting question: if Peano numbers are closely related to cons lists, what do binary numbers correspond to, and what properties do they have?

Peano numbers have S, which is analogous to the list cons (:) as they store the same amount of information. Binary is a list of bits, where the th bit stores information. This is analogous to a list of trees, each of which store to the power of some number of elements perfectly:

1data BinList a = BinList !Int [Maybe (Bush a)] deriving Show
2data Bush a | Leaf a | Fork (Bush a) (Bush a) deriving Show
3
4instance Seq BinList where
5  nil :: BinList a
6  nil = BinList 0 []
7
8  length :: BinList a -> Int
9  length (BinList n _) = n

Just like there was a relationship between cons and inc for lists and and Peanos, there is a relationship between cons and inc for BinList and binary numbers:

1cons :: a -> BinList a -> BinList a
2cons x (BinList n bs) = BinList (n + 1) (inc (Leaf x) bs)
3  where inc :: Bush a -> [Maybe (Bush a)] -> [Maybe (Bush a)] -- ~ O(1)
4        inc t [] = [Just t]
5        inc t (Nothing : ts) = Just t : ts
6        inc t (Just t' : ts) = Nothing : inc (Fork t t') ts

If we compare binary inc to this cons, we can see that they have the same shape. Hence, the complexities match up the same way. However, maybe we can optimise indexing to , as a binary tree has depth for elements, and there are trees in a BinList of length .

The strategy is to go find a bush with the power of two we are interested in:

1(!!) :: BinList a -> Int -> a -- O(log n)
2BinList n ts !! i
3  | i < 0 || i >= n = error "Index out of bounds"
4  | otherwise       = findBush ts i 1
5  where findBush :: [Maybe (Bush a)] -> Int -> Int -> a
6
7        -- No values here, we must be further up
8        findBush (Nothing : ts) i szT = findBush ts i (szT * 2)
9        
10        findBush (Just t : ts)
11          -- i is inside this bush!
12          | i < szT = findIndex t i (szT `div` 2)
13          -- this is not the bush we are looking for
14          | otherwise = findBush ts (i - szT) (szT * 2)
15
16        findIndex :: Bush a -> Int -> Int -> a
17        findIndex (Leaf x) 0 1 = x
18        findIndex (Fork lt rt) i szT
19          | i < szT = findIndex lt i (szT `div` 2)
20          | otherwise = findIndex rt (i - szT) (szT `div` 2)

Now, we can implement head and last cheaply:

head :: BinList a -> a -- O(log n)
head = (!! 0)

last :: BinList a -> a -- O(log n)
last xs = xs !! (length xs - 1)

We can also convert the BinList back to a list with the function catMaybe:

1catMaybe :: [Maybe a] -> [a]
2catMaybe [] = []
3catMaybe (Nothing : xs) = catMaybe xs
4catMaybe (Just x : xs) = x : catMaybe xs
5
6toList :: BinList a -> [a] -- O(n)
7toList (BinList _ ts) = toList (foldr (append . values) nil (catMaybes ts))
8  where values :: Bush a -> DList a
9        values (Leaf x) = cons x nil
10        values (Fork lt rt) = values lt `append` values rt

We can also use the amortised cons to implement fromList and append:

1fromList :: [a] -> BinList a -- O(n)
2fromList = foldr cons nil
3
4append :: BinList a -> BinList a -> BinList a -- O(m)
5append xs ys = foldr cons ys (toList xs)

tail corresponds to decrementing the binary number. This time, we need to carry the "bit" backwards, and each time it has to be split in half before it can be slotted back in. The final bush will have size , which is the element we want to discard.

1tail :: BinList a -> BinList a -- ~O(1)
2tail (BinList n ts) = BinList (n - 1) (snd (dec ts))
3  where dec :: [Maybe (Bush a)] -> (Bush a, [Maybe (Bush a)])
4        dec (Just t : ts) = (t, ts)
5        -- The tree returned by recursion is twice our size
6        -- We need to break it in half and feed the rest back
7        dec (Nothing : ts) = let (Fork t t', ts') = dec ts in (t, Just t' : ts')

tail is problematic. When we can both inc and dec a number, amortised analysis no longer works: if we have a BinList with elements, then repeatedly cons and tail it, we will be stuck in the case. This is because we are at the boundary of all 1s, and we need to flip all of them to 0s. This is a problem, as we can't just "forget" about the expensiveness as we approach it. We need to find a measure that grows smoothly as we approach the expensive case. To fix this, we need to find a that is representative of both operations. To find such a value, we'd have to lose our correspondence with numbers, which we can do in an RAList. First, let's implement init and null to complete BinList:

1init :: BinList a -> BinList a -- O(n)
2init = fromList . init . toList
3
4null :: BinList a -> Bool -- O(1)
5null = (== 0) . length

Random Access Lists

Previously, we found the BinList implementation was flawed from the inclusion of tail, which ruined our amortised analysis. Instead, we can consider a list which contains no Nothings:

1data NETree = Tip a | Node (NETree a) a (NETree a) deriving Show
2data RAList = RAList !Int [(Int, NETree a)] deriving Show

In this structure, a non empty tree will always contain elements for depth . In the random access list, every NETree a is paired up with its number of elements. The invariance of the structure is that every element in the list has an increasing size, except for the first two trees, which may be the same size.

1instance Seq RAList where
2  nil :: RAList a
3  nil = RAList 0 []

When both of the first trees are the same size, we can combine them to make a tree of the next size up by adding a single element into a Node that links them. This new tree might also be next to a list of the same size, or on its own. Because two trees of the same size always emerge where possible, there is no danger of the invariance being broken.

1cons :: a -> RAList a -> RAList a -- O(1)
2cons x (RAList n ((sz1, t1) : (sz2, t2) : ts))
3  -- The trees are equal size, so we can combine
4  | sz1 == sz2 = RAList (n + 1) ((sz1 + sz2 + 1, Node t1 x t2) : ts)
5-- Otherwise, add a new tip
6cons x (RAList n ts) = RAList (n + 1) ((1, Tip x) : ts)

Tail can be defined similarly, also without recursion:

1tail :: RAList a -> RAList a -- O(1)
2-- It is easy to drop a tip
3tail (RAList n ((1, Tip _) : ts)) = RAList (n - 1) ts
4-- We split and discard the top, making two trees of the same size
5-- This will make two trees of the same size sat next to each other
6-- The next list is guaranteed to be double the size of the current one
7-- Which preserves the invariant
8tail (RAList n ((sz, Node t1 _ t2) : ts)) = RAList (n - 1) ((sz', t1) : (sz', t2) : ts)
9  where sz' = sz `div` 2

For indexing, the algorithm is pretty much the same as for binary lists:

1(!!) :: RAList a -> Int -> a -- O(log n)
2RAList n ts !! i
3  | i < 0 || i >= n = error "Index out of bounds"
4  | otherwise       = findTree ts i
5  where findTree :: [(Int, NETree a)] -> Int -> a
6        findTree ((sz, t) : ts) i
7          -- This is our tree
8          | i < sz   = findIndex t i ((sz - 1) `div` 2)
9          -- This is not our tree
10          | otherwise = findTree ts (i - sz)
11
12        findIndex :: NETree a -> Int -> Int -> a -- O(log n)
13        findIndex (Tip x) 0 0 = x
14        findIndex (Node t1 x t2) i sz
15          | i == 0 = x
16          | i <= sz = findIndex t1 (i - 1) sz'
17          | otherwise = findIndex t2 (i - sz - 1) sz'
18          where sz' = (sz - 1) `div` 2

We have to be careful with findIndex:

The top of the tree is always index 0.
The next sz elements are found on the left.
The remaining sz elements are found on the right.

We can then turn it back into a list, taking care of the order of insertion:

1toList :: RAList a -> [a] -- O(n)
2toList (RAList _ ts) = toList (foldr (append . values . snd) nil ts)
3  where values :: NETree a -> DList a
4        values (Tip x) = cons x nil
5        values (Node t1 x t2) = cons x (append (values t1) (values t2))

And the rest of the functions are as before:

1head :: RAList a -> a -- O(log n)
2head = (!! 0)
3
4last :: RAList a -> a -- O(log n)
5last xs = xs !! (length xs - 1)
6
7length :: RAList a -> Int -- O(1)
8length (RAList n _) = n
9
10null :: RAList a -> Bool -- O(1)
11null = (== 0) . length
12
13init :: RAList a -> RAList a -- O(n)
14init = fromList . init . toList
15
16fromList :: [a] -> RAList a -- O(n)
17fromList = foldr cons nil
18
19append :: RAList a -> RAList a -> RAList a -- O(m)
20append xs ys = foldr cons ys (toList xs)

Now that we have adjusted the structure to fix the complexities, can we tie this back to our analogousness with numbers?

In our new number system, each digit corresponds to , and we can sometimes have two of them. Our invariance states that we can only have two of the least-significant digit. Let's try representing some numbers:

This is really interesting, because at any one time, at most two bits flip when you increment or decrement. As such, we have operations for these assuming you can "jump" straight to the least-significant set bit. Our tree representation allows us to do just this.

Can we tie this back to our analogousness with numbers? Yes, we can.

Back to Home

Table of Contents

Random Access Lists

Binary Lists

Random Access Lists

Now that we have adjusted the structure to fix the complexities, can we tie this back to our analogousness with numbers?