Random Access Lists
Binary Lists
We can represent natural numbers as peano naturals with a successor function and a base, zero:
data Nat = Z | S Nat
From this, we can see a deep correspondence between natural numbers and lists:
1inc :: Nat -> Nat 2inc n = S n 3 4cons :: a -> [a] -> [a] 5cons x xs = x : xs 6 7dec :: Nat -> Nat 8dec (S n) = n 9 10tail :: [a] -> [a] 11tail (_ : xs) = xs 12 13add :: Nat -> Nat -> Nat 14add Z n = n 15add (S m) n = S (add m n) 16 17(++) :: [a] -> [a] -> [a] 18[] ++ ys = ys 19(x : xs) ++ ys = x : (xs ++ ys)
Addition on peano numbers is inneficient, and peano numbers are clumsy to work with at scale. Instead, we use binary, a denser, more efficient representation that is easier to read and work with.
1data Bit = O | I deriving (Show, Eq, Enum) 2type Binary = [Bit] 3 4instance Num Bit where fromInteger = toEnum . fromInteger
We will describe binary as a list of bits, in reversed notation from what we are used to. For example [1, 0, 1, 1] is equal to 0b1101, or 13. We can also define zero as [], [0], [0, 0], etc. Let's define some basic operations on binary numbers:
1inc :: Binary -> Binary -- ~ O(1) 2inc [] = [1] 3inc (0 : bs) = 1 : bs 4inc (1 : bs) = 0 : inc bs
The amortized complexity of this is
- If we have a number full of
1s, we will have to flip all of them to0s, and then flip the last one to a1. This is. - However, this only needs to be done rarely. Incrementing a number with a leading
0is, and we have a while to go before a number full of 1s is reached.
1leadingOnes :: Binary -> Int 2leadingOnes = Seq.length . takeWhile (== 1) 3 4bitCount :: Binary -> Int 5bitCount = Seq.length . filter (== 1)
- We can trivially state that
= leadingOnes n + 1. This is also works for, which means that it describes the cost in all cases. Nice! - So, we must solve
leadingOnes n 1.
A nice strategy to find
is to consider the expensive case . In this case, given that the will also drop to after the expensive operation. Hence, we can try . In the expensive case:
leadingOnes n + 1leadingOnes n + 10 - leadingOnes nleadingOnes n + 1leadingOnes n1However, observe its value just before we reach the expensive case, which is always at odd numbers:
leadingOnes n + 1leadingOnes n + 1leadingOnes (inc n) - leadingOnes n1leadingOnes (inc n)leadingOnes (inc n) + 1$\leq A_\texttt{inc}(\texttt{n})`Now, we have constrained the amortised complexity to
, so we can see that leadingOnesis too chaotic: it forgets about expensiveness as it approaches.
Instead, we could consider a measure that grows more smoothly as we get closer to the most expensive case. This can be achieved with = bitCount n. This has the desired property that the difference between leadingOnes (2 ^ n - 1), but this time we have that
leadingOnes n + 1leadingOnes n + 1bitCount (inc n) - bitCount nleadingOnes n + 11 - bitCount nleadingOnes n - bitCount n + 2$\leq A_\texttt{inc}(\texttt{n})`2$\leq A_\texttt{inc}(\texttt{n})`
From there, we can stipulate that
Peano numbers have S, which is analogous to the list cons (:) as they store the same amount of information. Binary is a list of bits, where the
1data BinList a = BinList !Int [Maybe (Bush a)] deriving Show 2data Bush a | Leaf a | Fork (Bush a) (Bush a) deriving Show 3 4instance Seq BinList where 5 nil :: BinList a 6 nil = BinList 0 [] 7 8 length :: BinList a -> Int 9 length (BinList n _) = n
Just like there was a relationship between cons and inc for lists and and Peanos, there is a relationship between cons and inc for BinList and binary numbers:
1cons :: a -> BinList a -> BinList a 2cons x (BinList n bs) = BinList (n + 1) (inc (Leaf x) bs) 3 where inc :: Bush a -> [Maybe (Bush a)] -> [Maybe (Bush a)] -- ~ O(1) 4 inc t [] = [Just t] 5 inc t (Nothing : ts) = Just t : ts 6 inc t (Just t' : ts) = Nothing : inc (Fork t t') ts
If we compare binary inc to this cons, we can see that they have the same shape. Hence, the complexities match up the same way. However, maybe we can optimise indexing to BinList of length
The strategy is to go find a bush with the power of two we are interested in:
1(!!) :: BinList a -> Int -> a -- O(log n) 2BinList n ts !! i 3 | i < 0 || i >= n = error "Index out of bounds" 4 | otherwise = findBush ts i 1 5 where findBush :: [Maybe (Bush a)] -> Int -> Int -> a 6 7 -- No values here, we must be further up 8 findBush (Nothing : ts) i szT = findBush ts i (szT * 2) 9 10 findBush (Just t : ts) 11 -- i is inside this bush! 12 | i < szT = findIndex t i (szT `div` 2) 13 -- this is not the bush we are looking for 14 | otherwise = findBush ts (i - szT) (szT * 2) 15 16 findIndex :: Bush a -> Int -> Int -> a 17 findIndex (Leaf x) 0 1 = x 18 findIndex (Fork lt rt) i szT 19 | i < szT = findIndex lt i (szT `div` 2) 20 | otherwise = findIndex rt (i - szT) (szT `div` 2)
Now, we can implement head and last cheaply:
head :: BinList a -> a -- O(log n)
head = (!! 0)
last :: BinList a -> a -- O(log n)
last xs = xs !! (length xs - 1)
We can also convert the BinList back to a list with the function catMaybe:
1catMaybe :: [Maybe a] -> [a] 2catMaybe [] = [] 3catMaybe (Nothing : xs) = catMaybe xs 4catMaybe (Just x : xs) = x : catMaybe xs 5 6toList :: BinList a -> [a] -- O(n) 7toList (BinList _ ts) = toList (foldr (append . values) nil (catMaybes ts)) 8 where values :: Bush a -> DList a 9 values (Leaf x) = cons x nil 10 values (Fork lt rt) = values lt `append` values rt
We can also use the amortised cons to implement fromList and append:
1fromList :: [a] -> BinList a -- O(n) 2fromList = foldr cons nil 3 4append :: BinList a -> BinList a -> BinList a -- O(m) 5append xs ys = foldr cons ys (toList xs)
tail corresponds to decrementing the binary number. This time, we need to carry the "bit" backwards, and each time it has to be split in half before it can be slotted back in. The final bush will have size
1tail :: BinList a -> BinList a -- ~O(1) 2tail (BinList n ts) = BinList (n - 1) (snd (dec ts)) 3 where dec :: [Maybe (Bush a)] -> (Bush a, [Maybe (Bush a)]) 4 dec (Just t : ts) = (t, ts) 5 -- The tree returned by recursion is twice our size 6 -- We need to break it in half and feed the rest back 7 dec (Nothing : ts) = let (Fork t t', ts') = dec ts in (t, Just t' : ts')
tail is problematic. When we can both inc and dec a number, amortised analysis no longer works: if we have a BinList with cons and tail it, we will be stuck in the 1s, and we need to flip all of them to 0s. This is a problem, as we can't just "forget" about the expensiveness as we approach it. We need to find a measure that grows smoothly as we approach the expensive case. To fix this, we need to find a RAList. First, let's implement init and null to complete BinList:
1init :: BinList a -> BinList a -- O(n) 2init = fromList . init . toList 3 4null :: BinList a -> Bool -- O(1) 5null = (== 0) . length
Random Access Lists
Previously, we found the BinList implementation was flawed from the inclusion of tail, which ruined our amortised analysis. Instead, we can consider a list which contains no Nothings:
1data NETree = Tip a | Node (NETree a) a (NETree a) deriving Show 2data RAList = RAList !Int [(Int, NETree a)] deriving Show
In this structure, a non empty tree will always contain NETree a is paired up with its number of elements. The invariance of the structure is that every element in the list has an increasing size, except for the first two trees, which may be the same size.
1instance Seq RAList where 2 nil :: RAList a 3 nil = RAList 0 []
When both of the first trees are the same size, we can combine them to make a tree of the next size up by adding a single element into a Node that links them. This new tree might also be next to a list of the same size, or on its own. Because two trees of the same size always emerge where possible, there is no danger of the invariance being broken.
1cons :: a -> RAList a -> RAList a -- O(1) 2cons x (RAList n ((sz1, t1) : (sz2, t2) : ts)) 3 -- The trees are equal size, so we can combine 4 | sz1 == sz2 = RAList (n + 1) ((sz1 + sz2 + 1, Node t1 x t2) : ts) 5-- Otherwise, add a new tip 6cons x (RAList n ts) = RAList (n + 1) ((1, Tip x) : ts)
Tail can be defined similarly, also without recursion:
1tail :: RAList a -> RAList a -- O(1) 2-- It is easy to drop a tip 3tail (RAList n ((1, Tip _) : ts)) = RAList (n - 1) ts 4-- We split and discard the top, making two trees of the same size 5-- This will make two trees of the same size sat next to each other 6-- The next list is guaranteed to be double the size of the current one 7-- Which preserves the invariant 8tail (RAList n ((sz, Node t1 _ t2) : ts)) = RAList (n - 1) ((sz', t1) : (sz', t2) : ts) 9 where sz' = sz `div` 2
For indexing, the algorithm is pretty much the same as for binary lists:
1(!!) :: RAList a -> Int -> a -- O(log n) 2RAList n ts !! i 3 | i < 0 || i >= n = error "Index out of bounds" 4 | otherwise = findTree ts i 5 where findTree :: [(Int, NETree a)] -> Int -> a 6 findTree ((sz, t) : ts) i 7 -- This is our tree 8 | i < sz = findIndex t i ((sz - 1) `div` 2) 9 -- This is not our tree 10 | otherwise = findTree ts (i - sz) 11 12 findIndex :: NETree a -> Int -> Int -> a -- O(log n) 13 findIndex (Tip x) 0 0 = x 14 findIndex (Node t1 x t2) i sz 15 | i == 0 = x 16 | i <= sz = findIndex t1 (i - 1) sz' 17 | otherwise = findIndex t2 (i - sz - 1) sz' 18 where sz' = (sz - 1) `div` 2
We have to be careful with findIndex:
- The top of the tree is always index
0. - The next
szelements are found on the left. - The remaining
szelements are found on the right.
We can then turn it back into a list, taking care of the order of insertion:
1toList :: RAList a -> [a] -- O(n) 2toList (RAList _ ts) = toList (foldr (append . values . snd) nil ts) 3 where values :: NETree a -> DList a 4 values (Tip x) = cons x nil 5 values (Node t1 x t2) = cons x (append (values t1) (values t2))
And the rest of the functions are as before:
1head :: RAList a -> a -- O(log n) 2head = (!! 0) 3 4last :: RAList a -> a -- O(log n) 5last xs = xs !! (length xs - 1) 6 7length :: RAList a -> Int -- O(1) 8length (RAList n _) = n 9 10null :: RAList a -> Bool -- O(1) 11null = (== 0) . length 12 13init :: RAList a -> RAList a -- O(n) 14init = fromList . init . toList 15 16fromList :: [a] -> RAList a -- O(n) 17fromList = foldr cons nil 18 19append :: RAList a -> RAList a -> RAList a -- O(m) 20append xs ys = foldr cons ys (toList xs)
Now that we have adjusted the structure to fix the complexities, can we tie this back to our analogousness with numbers?
In our new number system, each digit corresponds to
, and we can sometimes have two of them. Our invariance states that we can only have two of the least-significant digit. Let's try representing some numbers:
This is really interesting, because at any one time, at most two bits flip when you increment or decrement. As such, we have
operations for these assuming you can "jump" straight to the least-significant set bit. Our tree representation allows us to do just this. Can we tie this back to our analogousness with numbers? Yes, we can.