Poset

We want sets that have better time complexity than linear for searches. For this, we use partially ordered sets, or posets:

1class Poset set where
2  empty :: set a
3  insert :: Ord a => a -> set a -> set a
4  delete :: Ord a => a -> set a -> set a
5
6  singleton :: Ord a -> a -> set a
7  singleton x = insert x empty
8
9  member :: Ord a -> a -> set a -> Bool
10  minVal :: Ord a => set a -> a
11  maxVal :: Ord a => set a -> a
12
13  fromList :: Ord a => [a] -> set a
14  fromList = foldr insert empty
15
16  toList :: set a -> [a]
17
18  union :: Ord a => set a -> set a -> set a
19  union s1 s2 = foldr insert s1 (toList s2)
20  difference :: Ord a => set a -> set a -> set a
21  difference s1 s2 = foldr delete s1 (toList s2)
22  intersection :: Ord a => set a -> set a -> set a
23  intersection s1 s2 = fromList $ filter (`member` s2) (toList s1)

A very bad example of a Poset, which does not improve the complexity of member is to use []:

1instance Poset [] where
2  empty = [] -- O(1)
3  insert x [] = [x] -- O(1)
4  insert x yys@(y : ys)
5    | x == y = yys
6    | x < y = x : yys
7    | otherwise = y : insert x ys
8  delete = List.delete -- O(n)
9
10  member = elem -- O(n)
11  minValue = head -- O(1)
12  maxValue = last -- O(n)
13
14  toList = id -- O(1)

While this is very simple, it doesn't make use of the Ord constraint. For good set structures, most operations should be or better. Therefore, a better example of a poset may be to use trees:

1data Tree a = Tip a | Node (Tree a) a (Tree a)
2
3instance Poset Tree where
4  empty = Tip -- O(1)
5
6  insert :: Ord a => a -> Tree a -> Tree a -- O(n)
7  insert x Tip = Node Tip x Tip 
8  insert x t@(Node lt y rt) = case compare x y of
9    EQ -> t -- x is in the set, nothing happens
10    LT -> Node (insert x lt) y rt
11    GT -> Node lt y (insert x rt)
12
13  toList :: Tree a -> [a] -- O(n)
14  toList = Seq.toList . go
15    where go :: Tree a -> Seq.DList a
16          go Tip = Seq.nil
17          go (Node lt x rt) = go lt `Seq.append` Seq.cons x (go rt)

The insert function can be seen as a frozen representation of the divide step of quicksort. From here, we can intuitvely see that insert will generate a poorly balanced tree.

Similarly, fromList will also produce very biased trees. This is why we need balanced trees. However, from our existing definition, we can also define quicksort as:
1quicksort :: Ord a => [a] -> [a]
2quicksort = toList . fromList @Tree
Quicksort also suffers when the data is already sorted.

To fix the above problems, we can use a self balancing tree called an AVL Tree:

1data Tree a = Tip | Node !Int (Tree a) a (Tree a)
2
3height :: Tree a -> Int
4height Tip = 0
5height (Node h _ _ _) = h

For height to be useful, we need to actually update the !Int. We can do this with a smart constructor:

1node :: Tree a -> a -> Tree a -> Tree a
2node l x r = Node (1 + max (height l) (height r)) l x r

Additionally, we need to define an invariant on the tree - we only allow each pair of subtrees to be at most 1 different in height - or we could have non-perfect trees.

1balanced :: Tree a -> Tree a -> Bool
2balanced l r = abs (height l - height r) <= 1

We need to ensure a newly constructed tree is balanced. We assume that we know where an imbalance may have occurred to save some work:

1balanceL :: Tree a -> a -> Tree a -> Tree a -- O(1)
2balanceL lt x rt
3  | balanced lt rt           = node lt x rt
4  -- Pre: height lt > height rt + 1
5  | height llt >= height lrt = rotr $ node lt x rt
6  | otherwise                = rotr $ node (rotl lt) x rt
7  where Node _ llt _ lrt = lt

(?) How does this work?

The balanceL function makes sure that the left subtree is at most 1 higher than the right subtree. If the tree is imbalanced, there are two cases:

The leftmost subtree is the tallest: we rotate to ensure that the leftmost subtree is the rightmost subtree of the left subtree.

The middle subtree is the tallest: we rotate lt to make the case similar to the one above.

A rotation on a tree is a simple operation that changes the structure of the tree. For example:

A right rotation on a tree Node a x (Node b y c) z d will result in Node b y (Node a x c) z d.

A left rotation on a tree Node a x b y (Node c z d) will result in Node (Node a x b y c) z d.

Now, we can define rotr, rotl and then balanceR:

1-- Pre: llt is taller than or equal to lrt and rt
2rotr :: Tree a -> Tree a -- O(1)
3rotr (Node _ (Node _ llt x lrt) y rt) = node llt x (node lrt y rt)
4
5-- Pre: rrt is taller than or equal to rlt and lt
6rotl :: Tree a -> Tree a -- O(1)
7rotl (Node _ lt x (Node _ rlt y rrt)) = node (node lt x rlt) y rrt
8
9balanceR :: Tree a -> a -> Tree a -> Tree a -- O(1)
10balanceR lt x rt
11  | balanced lt rt           = node lt x rt
12  -- Pre: height lt + 1 < height rt
13  | height rrt >= height rlt = rotl $ node lt x rt
14  | otherwise                = rotl $ node lt x (rotr rt)
15  where Node _ rlt _ rrt = rt

Finally, we can create our poset:

1instance Poset Tree where
2  empty :: Tree a
3  empty = Tip
4
5  singleton :: a -> Tree a
6  singleton x = node Tip x Tip
7
8  member :: Ord a => a -> Tree a -> Bool -- O(log n), as the tree is balanced
9  member _ Tip = False
10  member x (Node _ lt y rt) = case compare x y of
11    LT -> member x lt
12    EQ -> True
13    GT -> member x rt
14
15  toList :: Tree a -> [a] -- O(n)
16  toList = Seq.toList . go
17    where go :: Tree a -> Seq.DList a
18          go Tip = Seq.nil
19          go (Node _ lt x rt) = go lt `Seq.append` Seq.cons x (go rt)

When we insert / delete into the tree, we want to guarantee it is balanced, giving us a time complexity. To do this we use balanceL and balanceR:

BalanceL is used when the left tree has grown or the right tree has shrunk.
BalanceR is used when the right tree has grown or the left tree has shrunk.

1insert :: Ord a => a -> Tree a -> Tree a -- O(log n)
2insert x Tip = singleton x
3insert x t@(Node _ lt y rt) = case compare x y of
4  EQ -> t
5  LT -> balanceL (insert x lt) y rt
6  GT -> balanceR lt y (insert x rt)
7
8delete :: Ord a => a -> Tree a -> Tree a -- O(log n)
9delete _ Tip = Tip
10delete x t@(Node _ lt y rt) = case compare x y of
11  EQ -> glue lt rt
12  LT -> balanceR (delete x lt) y rt
13  GT -> balanceL lt y (delete x rt)

In delete, when the element to remove is found in the node, we need a new element to root the tree. We know that lt and rt are balanced with respect to each other, so we can use the glue function. The glue function will require extractMin and extractMax, which can be used to define minValue and maxValue on the poset:

1-- Find the minimum element in a tree and return the rest of the tree
2extractMin :: Tree a -> a -> Tree a -> (a, Tree a) -- O(log n)
3extractMin Tip min rest = (min, rest)
4extractMin (Node _ llt lx lrt) x rt = (min, balanceR rest x rt)
5  where (min, rest) = extractMin llt lx lrt
6
7-- Find the maximum element in a tree and return the rest of the tree
8extractMax :: Tree a -> a -> Tree a -> (a, Tree a) -- O(log n)
9extractMax rest max Tip = (max, rest)
10extractMax lt x (Node _ rlt rx rrt) = (max, balanceL lt x rest)
11  where (max, rest) = extractMax rlt rx rrt
12
13minValue :: Ord a => Tree a -> a -- O(log n)
14minValue (Node _ lt x rt) = fst $ extractMin lt x rt
15
16maxValue :: Ord a => Tree a -> a -- O(log n)
17maxValue (Node _ lt x rt) = fst $ extractMax lt x rt

Now we can define glue, which connects two balanced trees:

1glue :: Tree a -> Tree a -> Tree a -- O(log n)
2glue Tip rt = rt
3glue lt Tip = lt
4glue lt@(Node lh llt lx lrt) rt@(Node rh rlt rx rrt)
5  | lh < rh   = let (x, rt') = extractMin rlt rx rrt in balanceL lt x rt'
6  | otherwise = let (x, lt') = extractMax llt lx lrt in balanceR lt' x rt

(?) How does glue work?

In the first case, the height rh is guaranteed to be lh + 1. This means that the minimum of rt is a good candidate to be the root of the new tree. We extract the minimum of rt and use it as the root, with the rest of the tree as the right subtree.

In the second case, the height lh is guaranteed to be rh + 1. This means that the maximum of lt is a good candidate to be the root of the new tree. We extract the maximum of lt and use it as the root, with the rest of the tree as the left subtree.

From the AVL list, we learned that:

Datastructures often represent algorithms frozen in time. This provides an insight to the performance characteristics of various algorithms.
The importance of invariance preservation - this can easily be done with a smart constructor.

Back to Home

Table of Contents

Poset

(?) How does this work?

(?) How does `glue` work?

Table of Contents

Poset

(?) How does this work?

(?) How does glue work?

(?) How does `glue` work?