Poset
We want sets that have better time complexity than linear for searches. For this, we use partially ordered sets, or posets:
1class Poset set where 2 empty :: set a 3 insert :: Ord a => a -> set a -> set a 4 delete :: Ord a => a -> set a -> set a 5 6 singleton :: Ord a -> a -> set a 7 singleton x = insert x empty 8 9 member :: Ord a -> a -> set a -> Bool 10 minVal :: Ord a => set a -> a 11 maxVal :: Ord a => set a -> a 12 13 fromList :: Ord a => [a] -> set a 14 fromList = foldr insert empty 15 16 toList :: set a -> [a] 17 18 union :: Ord a => set a -> set a -> set a 19 union s1 s2 = foldr insert s1 (toList s2) 20 difference :: Ord a => set a -> set a -> set a 21 difference s1 s2 = foldr delete s1 (toList s2) 22 intersection :: Ord a => set a -> set a -> set a 23 intersection s1 s2 = fromList $ filter (`member` s2) (toList s1)
A very bad example of a Poset, which does not improve the complexity of member is to use []:
1instance Poset [] where 2 empty = [] -- O(1) 3 insert x [] = [x] -- O(1) 4 insert x yys@(y : ys) 5 | x == y = yys 6 | x < y = x : yys 7 | otherwise = y : insert x ys 8 delete = List.delete -- O(n) 9 10 member = elem -- O(n) 11 minValue = head -- O(1) 12 maxValue = last -- O(n) 13 14 toList = id -- O(1)
While this is very simple, it doesn't make use of the Ord constraint. For good set structures, most operations should be
1data Tree a = Tip a | Node (Tree a) a (Tree a) 2 3instance Poset Tree where 4 empty = Tip -- O(1) 5 6 insert :: Ord a => a -> Tree a -> Tree a -- O(n) 7 insert x Tip = Node Tip x Tip 8 insert x t@(Node lt y rt) = case compare x y of 9 EQ -> t -- x is in the set, nothing happens 10 LT -> Node (insert x lt) y rt 11 GT -> Node lt y (insert x rt) 12 13 toList :: Tree a -> [a] -- O(n) 14 toList = Seq.toList . go 15 where go :: Tree a -> Seq.DList a 16 go Tip = Seq.nil 17 go (Node lt x rt) = go lt `Seq.append` Seq.cons x (go rt)
The
insertfunction can be seen as a frozen representation of the divide step of quicksort. From here, we can intuitvely see thatinsertwill generate a poorly balanced tree.Similarly,
fromListwill also produce very biased trees. This is why we need balanced trees. However, from our existing definition, we can also definequicksortas:1quicksort :: Ord a => [a] -> [a] 2quicksort = toList . fromList @TreeQuicksort also suffers when the data is already sorted.
To fix the above problems, we can use a self balancing tree called an AVL Tree:
1data Tree a = Tip | Node !Int (Tree a) a (Tree a) 2 3height :: Tree a -> Int 4height Tip = 0 5height (Node h _ _ _) = h
For height to be useful, we need to actually update the !Int. We can do this with a smart constructor:
1node :: Tree a -> a -> Tree a -> Tree a 2node l x r = Node (1 + max (height l) (height r)) l x r
Additionally, we need to define an invariant on the tree - we only allow each pair of subtrees to be at most 1 different in height - or we could have non-perfect trees.
1balanced :: Tree a -> Tree a -> Bool 2balanced l r = abs (height l - height r) <= 1
We need to ensure a newly constructed tree is balanced. We assume that we know where an imbalance may have occurred to save some work:
1balanceL :: Tree a -> a -> Tree a -> Tree a -- O(1) 2balanceL lt x rt 3 | balanced lt rt = node lt x rt 4 -- Pre: height lt > height rt + 1 5 | height llt >= height lrt = rotr $ node lt x rt 6 | otherwise = rotr $ node (rotl lt) x rt 7 where Node _ llt _ lrt = lt
(?) How does this work?
The
balanceLfunction makes sure that the left subtree is at most 1 higher than the right subtree. If the tree is imbalanced, there are two cases:
- The leftmost subtree is the tallest: we rotate to ensure that the leftmost subtree is the rightmost subtree of the left subtree.
- The middle subtree is the tallest: we rotate
ltto make the case similar to the one above.A rotation on a tree is a simple operation that changes the structure of the tree. For example:
- A right rotation on a tree
Node a x (Node b y c) z dwill result inNode b y (Node a x c) z d.- A left rotation on a tree
Node a x b y (Node c z d)will result inNode (Node a x b y c) z d.
Now, we can define rotr, rotl and then balanceR:
1-- Pre: llt is taller than or equal to lrt and rt 2rotr :: Tree a -> Tree a -- O(1) 3rotr (Node _ (Node _ llt x lrt) y rt) = node llt x (node lrt y rt) 4 5-- Pre: rrt is taller than or equal to rlt and lt 6rotl :: Tree a -> Tree a -- O(1) 7rotl (Node _ lt x (Node _ rlt y rrt)) = node (node lt x rlt) y rrt 8 9balanceR :: Tree a -> a -> Tree a -> Tree a -- O(1) 10balanceR lt x rt 11 | balanced lt rt = node lt x rt 12 -- Pre: height lt + 1 < height rt 13 | height rrt >= height rlt = rotl $ node lt x rt 14 | otherwise = rotl $ node lt x (rotr rt) 15 where Node _ rlt _ rrt = rt
Finally, we can create our poset:
1instance Poset Tree where 2 empty :: Tree a 3 empty = Tip 4 5 singleton :: a -> Tree a 6 singleton x = node Tip x Tip 7 8 member :: Ord a => a -> Tree a -> Bool -- O(log n), as the tree is balanced 9 member _ Tip = False 10 member x (Node _ lt y rt) = case compare x y of 11 LT -> member x lt 12 EQ -> True 13 GT -> member x rt 14 15 toList :: Tree a -> [a] -- O(n) 16 toList = Seq.toList . go 17 where go :: Tree a -> Seq.DList a 18 go Tip = Seq.nil 19 go (Node _ lt x rt) = go lt `Seq.append` Seq.cons x (go rt)
When we insert / delete into the tree, we want to guarantee it is balanced, giving us a balanceL and balanceR:
- BalanceL is used when the left tree has grown or the right tree has shrunk.
- BalanceR is used when the right tree has grown or the left tree has shrunk.
1insert :: Ord a => a -> Tree a -> Tree a -- O(log n) 2insert x Tip = singleton x 3insert x t@(Node _ lt y rt) = case compare x y of 4 EQ -> t 5 LT -> balanceL (insert x lt) y rt 6 GT -> balanceR lt y (insert x rt) 7 8delete :: Ord a => a -> Tree a -> Tree a -- O(log n) 9delete _ Tip = Tip 10delete x t@(Node _ lt y rt) = case compare x y of 11 EQ -> glue lt rt 12 LT -> balanceR (delete x lt) y rt 13 GT -> balanceL lt y (delete x rt)
In delete, when the element to remove is found in the node, we need a new element to root the tree. We know that lt and rt are balanced with respect to each other, so we can use the glue function. The glue function will require extractMin and extractMax, which can be used to define minValue and maxValue on the poset:
1-- Find the minimum element in a tree and return the rest of the tree 2extractMin :: Tree a -> a -> Tree a -> (a, Tree a) -- O(log n) 3extractMin Tip min rest = (min, rest) 4extractMin (Node _ llt lx lrt) x rt = (min, balanceR rest x rt) 5 where (min, rest) = extractMin llt lx lrt 6 7-- Find the maximum element in a tree and return the rest of the tree 8extractMax :: Tree a -> a -> Tree a -> (a, Tree a) -- O(log n) 9extractMax rest max Tip = (max, rest) 10extractMax lt x (Node _ rlt rx rrt) = (max, balanceL lt x rest) 11 where (max, rest) = extractMax rlt rx rrt 12 13minValue :: Ord a => Tree a -> a -- O(log n) 14minValue (Node _ lt x rt) = fst $ extractMin lt x rt 15 16maxValue :: Ord a => Tree a -> a -- O(log n) 17maxValue (Node _ lt x rt) = fst $ extractMax lt x rt
Now we can define glue, which connects two balanced trees:
1glue :: Tree a -> Tree a -> Tree a -- O(log n) 2glue Tip rt = rt 3glue lt Tip = lt 4glue lt@(Node lh llt lx lrt) rt@(Node rh rlt rx rrt) 5 | lh < rh = let (x, rt') = extractMin rlt rx rrt in balanceL lt x rt' 6 | otherwise = let (x, lt') = extractMax llt lx lrt in balanceR lt' x rt
(?) How does
gluework?
- In the first case, the height
rhis guaranteed to belh + 1. This means that the minimum ofrtis a good candidate to be the root of the new tree. We extract the minimum ofrtand use it as the root, with the rest of the tree as the right subtree.- In the second case, the height
lhis guaranteed to berh + 1. This means that the maximum ofltis a good candidate to be the root of the new tree. We extract the maximum ofltand use it as the root, with the rest of the tree as the left subtree.
From the AVL list, we learned that:
- Datastructures often represent algorithms frozen in time. This provides an insight to the performance characteristics of various algorithms.
- The importance of invariance preservation - this can easily be done with a smart constructor.