Sequences
1. Persistent Data Structures
In Haskell, we have access to singly-linked lists. These are persistent data structures.
data [a] where
  [] :: [a]
  (:) :: a -> [a] -> [a]
These operations are both O(1).
Persistent data structures always want to share as much as they can with their substructures.
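For example (an illustration of my own, not from the notes), consing onto an existing list copies nothing; the new lists simply point at the old one as their tail:

xs :: [Int]
xs = [2, 3]   -- the heap structure 2 : 3 : []

ys, zs :: [Int]
ys = 1 : xs   -- one new cell whose tail is xs
zs = 5 : xs   -- likewise; xs is shared, never copied

Consider appending: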
(++) :: [a] -> [a] -> [a]
A bad definition of (++) would be to destroy ys, that is, to tear it down and rebuild it cell by cell, losing all sharing:
[] ++ [] = []
[] ++ (y:ys) = y : ([] ++ ys)
(x:xs) ++ ys = x : (xs ++ ys)
A better definition preserves ys, and so it has structural sharing. We still have to tear down xs, because we can't get at its last cell and we can't mutate it:
[] ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)
This (++) is O(n), where n = length xs.
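To see both the cost and the sharing, here is a small evaluation trace (my own sketch):

[1, 2] ++ ys
  = 1 : ([2] ++ ys)      -- copy the first cell of xs
  = 1 : 2 : ([] ++ ys)   -- copy the second
  = 1 : 2 : ys           -- ys is shared as-is, never copied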
2. Folds
concat :: [[a]] -> [a]
concat [] = []
concat (xs:xss) = xs ++ concat xss
Let's assume all the xs in xss have average length n and there are m of them. Then concat is O(nm): each of the m appends copies one O(n) list.
foldr f k is a recipe for the right-associative application of the function f to the elements of a list, with k at the end. Writing a list with explicit applications of (:), the effect of foldr f k is to replace each (:) with f and the final [] with k:

foldr f k ((:) x ((:) y ((:) z []))) = f x (f y (f z k))
It is defined as:
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f k [] = k
foldr f k (x:xs) = f x (foldr f k xs)

xs ++ ys = foldr (:) ys xs
concat xss = foldr (++) [] xss
We also have foldl, which is left-associative:
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f acc [] = acc
foldl f acc (x:xs) = foldl f (f acc x) xs
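When f is not associative, the two folds genuinely disagree; a quick check with (-) (an example of my own):

foldr (-) 0 [1, 2, 3] = 1 - (2 - (3 - 0)) =  2
foldl (-) 0 [1, 2, 3] = ((0 - 1) - 2) - 3 = -6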
3. Monoids
foldl will produce the left-associative application f (f (f k x) y) z. Extrinsically, that is, looking only at the results, foldl and foldr will do the same thing, given f is associative and k is a "zero" (an identity) for f:
f x k = x
f k x = x
A mathematical term for this is a Monoid:
class Monoid m where
  mempty :: m
  (<>) :: m -> m -> m

-- laws:
mempty <> x = x
x <> mempty = x
(x <> y) <> z = x <> (y <> z)
Under monoids, foldr (<>) mempty = foldl (<>) mempty. Some example Monoids are:
instance Monoid Int where
  mempty = 0   -- alternatively: 1
  (<>) = (+)   -- alternatively: (*)

instance Monoid [a] where
  mempty = []
  (<>) = (++)
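As a quick sanity check, using the course's simplified Monoid class above (GHC's real Monoid goes via Semigroup and has no bare Int instance, so treat this as a sketch):

foldr (<>) mempty [1, 2, 3 :: Int] = 1 + (2 + (3 + 0)) = 6
foldl (<>) mempty [1, 2, 3 :: Int] = ((0 + 1) + 2) + 3 = 6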
As lists are monoids, foldr (++) [] and foldl (++) [] do the same thing. However, the complexities of foldl (++) [] and foldr (++) [] are different:
foldl (++) [] is O(nm^2), whereas foldr (++) [] is O(nm).
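The gap comes from foldl re-copying its ever-growing accumulator; a sketch, with as, bs, cs each of length n:

foldl (++) [] [as, bs, cs] = (([] ++ as) ++ bs) ++ cs
-- the appends copy 0, then n, then 2n cells: 0 + n + ... + (m-1)n = O(nm^2)

foldr (++) [] [as, bs, cs] = as ++ (bs ++ (cs ++ []))
-- each ++ copies exactly one original list: m * O(n) = O(nm)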
4. Trees
data Tree a = Leaf a | Fork (Tree a) (Tree a)

values :: Tree a -> [a]
values (Leaf x) = [x]
values (Fork l r) = values l ++ values r

t :: Tree Int
t = Fork (Fork (Leaf 1) (Leaf 2)) (Fork (Leaf 3) (Leaf 4))
What values does is replace forks with ++ and leaves with singleton lists.
vs :: [Int]
vs = (++) ((++) [1] [2]) ((++) [3] [4])
-- equivalently, in infix form:
-- vs = ([1] ++ [2]) ++ ([3] ++ [4])
We have no control over the associativity of the ++s; as a result, the worst case is that values is O(n^2) in the number of leaves n.
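For instance (a worst case of my own construction), a left-leaning tree nests every ++ to the left:

tBad :: Tree Int
tBad = Fork (Fork (Fork (Leaf 1) (Leaf 2)) (Leaf 3)) (Leaf 4)

-- values tBad = (([1] ++ [2]) ++ [3]) ++ [4]
-- the leftmost leaves are re-copied by every enclosing ++, so a tree of
-- n leaves can cost O(n^2)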
5. Sequences
class Seq seq where
  nil :: seq a
  cons :: a -> seq a -> seq a
  snoc :: seq a -> a -> seq a
  append :: seq a -> seq a -> seq a
  len :: seq a -> Int

  -- Special:
  toList :: seq a -> [a]
  fromList :: [a] -> seq a
Unsurprisingly, lists are sequences. What is interesting is to observe the complexities of their operations.
instance Seq [] where
  nil = []               -- O(1)
  cons = (:)             -- O(1)
  snoc xs x = xs ++ [x]  -- O(n)
  append = (++)          -- O(n)
  len = length           -- O(n)
  toList = id            -- O(1)
  fromList = id          -- O(1)
We can make this a bit more insightful by making a new sequence; let's choose to improve the O(n) complexity of len for [].
data LenList a = LenList Int [a]

instance Seq LenList where
  nil = LenList 0 []                                                -- O(1)
  cons x (LenList n xs) = LenList (n+1) (x:xs)                      -- O(1)
  snoc (LenList n xs) x = LenList (n+1) (snoc xs x)                 -- O(n)
  append (LenList n xs) (LenList m ys) = LenList (n+m) (xs ++ ys)   -- O(n)
  len (LenList n _) = n                                             -- O(1)
  head (LenList _ xs) = head xs                                     -- O(1)
  tail (LenList n xs) = LenList (n - 1) (tail xs)                   -- O(1)
  init (LenList n xs) = LenList (n - 1) (init xs)                   -- O(n)
  last (LenList _ xs) = last xs                                     -- O(n)
  LenList _ xs !! n = xs !! n                                       -- O(n)
  toList (LenList _ xs) = xs                                        -- O(1)
  fromList xs = LenList (length xs) xs                              -- O(n)
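A quick usage sketch (values of my own choosing):

ll :: LenList Int
ll = fromList [1 .. 1000]   -- pays O(n) once, caching the length

n :: Int
n = len ll                  -- now O(1): just reads the cached field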
The rationale behind what we are about to do comes from some maths that isn't really relevant to this course, so we'll focus on pure intuition. Firstly, let's look at function composition:

((f . g) . h) x = f (g (h x)) = (f . (g . h)) x

So, bracketing has no effect on the result. This is called associativity. Now we can do some manipulation:
xs ++ (ys ++ zs) :: [a]
((xs ++) . (ys ++) . (zs ++)) [] :: [a]
From this notion, we can build a difference list:
data DList a = DList ([a] -> [a])

instance Seq DList where
  nil = DList id                              -- O(1)
  cons x (DList f) = DList ((x:) . f)         -- O(1)
  snoc (DList f) x = DList (f . (x:))         -- O(1)
  append (DList f) (DList g) = DList (f . g)  -- O(1)
  len = length . toList                       -- O(n)
  head = head . toList                        -- O(n)
  tail = fromList . tail . toList             -- O(n)
  init = fromList . init . toList             -- O(n)
  last = last . toList                        -- O(n)
  (!!) = (!!) . toList                        -- O(n)
  toList (DList f) = f []                     -- O(n)
  fromList xs = DList (xs ++)                 -- O(1)
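A small usage sketch (my own): repeated snocs stay cheap because no list is materialised until toList:

ds :: DList Int
ds = foldl snoc nil [1 .. 5]   -- five O(1) snocs, nothing built yet

built :: [Int]
built = toList ds              -- pay O(n) exactly once: [1, 2, 3, 4, 5]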
So, difference lists are fantastic for constructing lists. They are awful at processing them. But that's all we need. Let's rewrite the values function from before and hopefully do better than O(n^2):
data Tree a = Leaf a | Fork (Tree a) (Tree a)

values' :: Tree a -> [a]
values' t = toList (go t)  -- O(n)
  where go :: Tree a -> DList a  -- O(n)
        go (Leaf x) = cons x nil  -- O(1)
        go (Fork l r) = go l `append` go r  -- O(1)
So, the total complexity using difference lists is O(n): go does O(1) work at each of the O(n) nodes, and the final toList pays O(n) once.
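Tracing it on the t from before (eliding the ids that nil and cons introduce):

values' t = toList (go t)
          = ((1:) . (2:) . (3:) . (4:)) []
          = [1, 2, 3, 4]

Each (x:) is applied exactly once, so no cell is ever copied twice.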
6. Conclusion
In a purely functional language, like Haskell, difference lists are fantastic for building lists, just as long as we don't look at them along the way.
In an impure language with mutation, like Scala, we can do better: we can just "add things to the end" of a mutable builder. In fact, sometimes toList is O(1) even for those.
Regardless, the lesson is simple: we like immutable results, but when constructing, we should favour representations that have fast building, even with other disadvantages. When we are ready to return the result of the function, we can "freeze" the data structure into the representation we want moving forward.
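Haskell's own arrays follow this freeze pattern too; as an illustration beyond these notes (a sketch using Data.Array.ST, a real module, though the example is mine), we can build mutably inside ST and freeze to an immutable array at the end:

import Data.Array.ST (newArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray)

squares :: Int -> UArray Int Int
squares n = runSTUArray $ do
  arr <- newArray (1, n) 0                         -- mutable while we build
  mapM_ (\i -> writeArray arr i (i * i)) [1 .. n]
  return arr                                       -- frozen into an immutable UArray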