Binary Trees
Mr.
Jitesh Attry
BCA 2nd yr 2011
BCA 2nd yr 2011
College
of Engineering & Technology, Bikaner
Trees are natural structures for representing certain kinds of
hierarchical data. A (rooted) tree consists of a set of nodes (or
vertices) and a set of arcs (or edges). Each arc links a
parent node to one of the parent's children. A special root node
has no parent. Every other node has exactly one parent. It is possible to reach
any node by following a unique path of arcs from the root. If arcs are
considered bidirectional, there is a unique path between any two nodes. The
simplest kind of tree is a binary tree where each parent has at most two
children.
(We consider unrooted trees elsewhere, e.g. in [minimum spanning trees].)
(We consider unrooted trees elsewhere, e.g. in [minimum spanning trees].)
A way to think of a binary tree is that it is either
empty (emptyTree) or it is a fork which contains an element and two subtrees which are themselves
binary trees.
Types:
type tree e = emptyTree | fork e × (tree e) × (tree e)
Operations:
empty: tree e -> boolean
leaf : tree e -> boolean
fork : e × tree e × tree e -> tree e
left : tree e -> tree e
right: tree e -> tree e
contents: tree e -> e
Rules:
empty(emptyTree) = true
empty(fork(x, T, T'))= false
leaf(fork(x, emptyTree, emptyTree)) = true
leaf(fork(x, T, T'))
= false, if not empty(T) or not empty(T')
leaf(emptyTree) = error
left(fork(x, T, T')) = T
left(emptyTree) = error
right(fork(x, T, T')) = T'
right(emptyTree) = error
contents(fork(x, T, T')) = x
contents(emptyTree) = error
The Tree Abstract
Data Type.
(It is also common to use curried versions of constructors and
functions, such as fork, especially in functional languages.)
Many operations can be defined on trees. For example:
height(emptyTree) = 0 |
height(fork(e,T,T')) = 1+max(height(T),
height(T'))
weight(emptyTree) = 0 |
weight(fork(e,T,T')) =
1+weight(T)+weight(T')
Note that these functions are binary-recursive as is the definition of
the tree type. (Also note that a tree consisting of a single leaf is
sometimes defined to be of height zero, rather than 1, as here.)
Trees have many uses in computing. For example a parse-tree can
represent the structure of an expression:
input: a+b*c
----------> +
. .
.
.
a *
. .
.
.
b c
Multiplication has a higher priority then addition and
binds more tightly. This tree shows that a+b*c is interpreted as a+(b*c) rather
than as (a+b)*c. Such trees are used in compilers and other programs.
Implementation of Binary Trees by
Pointers and Records
A tree data type can be implemented as a collection of records and
pointers.
The basic operations can create new records and manipulate pointers.
Not surprisingly this tree implementation is similar to the
implementation of hierarchical
lists.
An output routine for trees is a virtual necessity. A tree can be
printed in a style like that used for lists but a graphical two-dimensional
layout is more informative. It is difficult to print trees down the page,
because they quickly grow too wide, but it is relatively easy to print them
across the page so that the root is at the left and the tree grows to the
right.
For an example of tree output, see later sections.
Example: Parse Trees
Parsing an input language according to a grammar is frequently necessary
in computing. Here we consider a language of simple arithmetic expressions. A
grammar written in Backus Naur form (BNF) is given below.
<exp> ::= <exp> + <term> |
<term>
<term ::= <term> * <operand> |
<operand>
<operand> ::= <identifier>
| ( <exp> )
<identifier> ::= <letter>
<identifier> | <letter>
<letter> ::= "a" | "b" | ... |
"z"
Grammar for Simple
Expression.
The grammar can be read as a definition of the syntactic structure of
expressions, <exp>. The symbol `::=' can be read as `is' or `can be
replaced by'. The symbol `|' can be read as `or'. The names in angle brackets,
such as '<exp>', are variables for parts of the language. The string
<exp>+1 is not a legal expression but it stands for many legal
expressions such as x+1, y+1 and (x+y*z)+1. The first line of the grammar can
be read as: an <exp> is an <exp> plus <term> or a
<term>. An expression is either an expression plus a term or just a term.
Note that the syntax rules are recursive. A little thought shows that an
expression is a sequence of terms separated by plus signs. Similarly, a term is
a sequence of operands separated by multiplication signs. An operand is either
an identifier or a bracketed subexpression. An identifier is a sequence of
letters.
A parser can be written using the recursive-descent technique.
A procedure is written to recognise each class of object in the grammar - exp,
term, operand. These routines are recursive, as is the grammar, and call each
other to implement the grammar rules. For example, the routine for expression
calls the routine for term. The repetitive nature of expressions and terms is
coded by the use of a loops. A bracketed subexpression is inherently recursive
and so is the parser at that point. The complete parser is given below. It is
moderately long but not complex, especially if read with the grammar in mind.
A lexical routine, insymbol, skips white space and packages
input letters into identifiers and other kinds of symbol. It is followed by the
parser proper consisting of the routines Operand, Term and Exp.
[Java/Exp/]
[SML97/Exp/]
Note the use of mutual recursion. Various errors may be detected during
parsing. If the expression is syntactically correct, the tree representing its
structure is finally printed.
input: a + b*(c+d)
Parse Tree:
| | |d---------|
| |+---------|
| | |c---------|
|*---------|
| |b---------|
+---------|
|a---------|
Recursive Tree Traversal
There are three classic ways of recursively traversing a
tree or of visiting every one of its nodes once. In each of these, the left and
right subtrees are visited recursively and the distinguishing feature is when
the element in the root is visited or processed.
In a preorder or prefix traversal the
root is visited first (pre) and then the left and right subtrees are traversed.
In an infix traversal, the left subtree is traversed
and then the root is visited and finally the right subtree is traversed.
In a postorder or postfix traversal
the left and right subtrees are traversed and then the root is visited
afterwards (post).
This method can be used to generate postfix or reverse Polish code
for a stack machine.
Given the following tree, formed by starting with the empty binary
search tree and inserting, in turn, the words, the land rover is a type of motor car,
|type------|
the-------|
| |rover-----|
| | |of--------|
| | | |motor-----|
|land------|
| |is--------|
| | | |car-------|
| | |a---------|
Example Tree.
the three traversals give the following results. The results of yet
another method, breadth-first traversal, are included for
comparison; see the next section.
infix: a car is land
motor of rover the
type
prefix: the land is a
car rover of
motor type
postfix: car a is
motor of rover land
type the
b-first: the land
type is rover a
of car motor
Traversals.
Note that the method given for printing a tree is a reversed infix
traversal.
Breadth-First Traversal
A breadth-first traversal of a tree starts at the root
of the tree. It next visits the children, then the grand-children and so on.
1
. .
. .
2 3
. . . .
. .
. .
4 5
6 7
. . .
. .
.
. . .
.
8
9 10 11
12
. . .
. .
.
13 14
15
Example:
Breadth-First Order.
The numbers indicate the order in which the nodes are visited, not the
contents of the nodes. Because children are only accessible from a parent, they
must be stored while the parent's siblings and cousins are visited. A [queue] is used to do this.
Note that the queue is a queue of pointer to node or a queue of tree
type. Initially the queue contains just the root of the tree. At each iteration
of the algorithm, the first element is removed from the queue. Its children, if
any, are pushed onto the end of the queue and the element is processed. The
algorithm terminates when the queue is empty. See also the chapter on queues.
Recursion
Most routines on trees are recursive. This is natural because the tree
is a recursive data type. It is possible to write iterative versions of these
operations but it is harder to do so than is the case for flat lists because
the tree type is binary recursive. The flat list and hence most of
its operations are linear recursive and a linear recursive
routine usually has a simple iterative version. It is often necessary to
introduce an explicit stack into a program when writing a
non-recursive tree routine. This is often not worth the effort as the language
implementors can usually do a better job with the system stack.
The object-oriented programming language Java has the notion of
an Enumeration, i.e. an iterator, which steps through the elements of a set of values.
An enumerator must implement the predicate hasMoreElements and the
function nextElement. To program an enumerator which will emit the contents of a tree in one
of the standard orders it is useful to a employ a Stack:
Side-Effects
The tree operations described so far have no side-effects except for
input, output and manipulation of dynamic storage; they are pure tree
operations. As is the case with lists, it is often necessary or desirable to
use operations having side-effects on efficiency grounds. This is particularly
natural if a program uses a single tree as a dictionary or database structure.
As before, should multiple trees share components, changing one tree may change
another and if this is not anticipated it will cause program bugs.
Demonstration: There is a demonstration
of Binary Search (& AVL) Trees [here].
|
A binary search tree can be created so that the
elements in it satisfy an ordering property. This allows elements to be
searched for quickly. All of the elements in the left subtree are less than the
element at the root which is less than all of the elements in the right subtree
and this property applies recursively to all the subtrees. The great advantage
of this is that when searching for an element, a comparison with the root will
either find the element or indicate which one subtree to
search. The ordering is an invariant property of the search tree. All routines
that operate on the tree can make use of it provided that they also keep it
holding true.
It takes O(h) time to search a search tree of height h. Since a tree of
height `h' can hold n=2h-1 elements, the search takes O(log(n)) time
under favourable circumstances. The search-tree and its search routine should
be compared with the use of the binary search algorithm on sorted arrays in the
chapter on tables.
The use of the tree speeds up the insertion and deletion operations at
the price of the space needed to hold the pointers. The tree has the speed
advantage when the data in the structure changes rapidly.
The routine given here to insert an element does so as a side-effect by
changing the tree.
If elements are inserted in the order d, b, a, c, e, f, g the following
tree is created:
----------->d
. .
.
.
. .
b e
. . .
. . .
a c f
.
.
g
Note that an insertion takes O(h) time. Under favourable circumstances,
a balanced tree is created, as for b, a, c, giving O(log(n))
search time. If the input is sorted however an unbalanced tree approximating a
list is created, as for e, f, g, and the search degenerates to an O(n) linear
search. This problem is addressed later.
Search Tree:
|type------|
the-------|
| |rover-----|
| | |of--------|
| | | |motor-----|
|land------|
| |is--------|
| | | |car-------|
| | |a---------|
Example Search Tree.
A new element is added to the tree as a new peripheral, leaf node.
However if an element can also be deleted it is possible for it to be internal.
This makes deletion rather more difficult than insertion.
A leaf element is easily deleted by setting the pointer to it to
emptyTree.
T:-----| T:emptyTree
|
|
v
x
before after
Deletion of a Leaf.
The node becomes garbage and can be freed.
An element with one child can be deleted by by-passing it.
T:------| T:|
| |
| |
v |
x |
. emptyTree |
. v
e e
. . . .
.
. .
.
before after
Deletion of a
Single-Child Parent.
An internal element x with two children cannot easily be bypassed
without loosing one of its subtrees. The solution is to overwrite x with some
other element y of the tree and then to delete the original copy of y. There
are two obvious elements to choose from - either the largest element `A' less
than x or the smallest element `B' greater than x. Each of these has at most
one child! The sortedness of the tree is maintained if x is overwritten with
either of these.
T:----------| T:-----|
| |
| |
v v
x B
.
. . .
. . . .
L R L R
. . . . . . . .
. . .
. . .
. .
A B A C
. . . . .
. . . .
.
C
. .
. .
before after
Deletion of a
Two-Child Parent.
Both of A and B fall into one of the cases previously dealt with. A can
be found from x by taking one step left and then as many steps right as
possible. B can be found by taking one step right and then as many steps left
as possible.
Deletion takes O(h) time.
Notes
- [Tries] and [PATRICIA trees] are
an alternatives to binary search trees.
- The [Suffix Tree] is a
data-structure for solving the string searching problem.
Demonstration: There is a demonstration
of AVL (& bin. s.) Trees [here].
|
As mentioned above, if elements are inserted into a search tree in
sorted order, a tree is created that is equivalent to a list. This will lead to
inefficient searches. A height-balanced tree or an AVL-tree
(after G. M. Adel'son-Velskii and E. M. Landis) is a search tree in
which the height of the right subtree minus the height of the left subtree
equals 1, 0, or -1. This property also applies recursively to all subtrees. It
can be shown that a height-balanced tree of `n' elements has height O(log(n))
and this guarantees efficient search. Fortunately fast O(log(n)) insertion and
deletion is still possible.
A flag indicating the balance of each subtree is added to the node
record.
There are four crucial cases during insertion. In the first case, the
left subtree L grows too tall on its left:
T L
. . . .
. . . .
L . -----> .
T
. . | | . .
. . | | . .
. . | | . .
| | * | |
| | * | |
| | * | |
* new elt' rebalanced
* disturbs
* balance
Left-2 Rotation.
By rotating about L and T the height balance is restored and the
ordering in the tree is maintained. In the second case, L grows too tall on its
right:
T LR
. . . .
. . . .
L . L T
. . | . . . .
. . |
----> . .
. .
.
LR | .
. . .
|
.. | |
| | |
|
. . |
| | | |
|
. . |
| | | |
|
| | |
| | | |
|
| | |
| |
| |
|
| | |
| | | |
|
| | | *
|
|
| | | *
|
|
| | | *
|
*
*
*
Left-3 Rotation.
The rotation restores the balance while maintaining the tree ordering.
In the above example left(LR) grew; an alternative is that right(LR) grew but
the same operations still restore the balance. There are mirror-image right-2
and right-3 rotations.
Maintaining the balance significantly complicates the tree insertion
routine. However a fixed amount of work is done at each node that is visited
and so it takes O(h) time.
Note that if a tree has just grown in height, it can only be perfectly
balanced if it is a single (new) leaf. Otherwise one subtree must be higher
than the other.
| |type------|
|the-------|
rover-----|
| | |of--------|
| |motor-----|
| | |land------|
|is--------|
| | |car-------|
| |a---------|
Height-Balanced Tree.
Compare this with the tree given earlier and created by the
non-balancing insert.
Notes
- G. M. Adelson-Velskii and E.
M. Landis [AVL]. An algorithm for the organization of information. Soviet
Math. Dokl., 3, pp.1259-1263, 1962.
- N. Wirth. Algorithms
and Data Structures. pp.218, Prentice-Hall, 1986.
AVL-tree insertion and deletion. - M. A. Weiss. Data
Structures and Algorithm Analysis in C. s4.4, pp.110, Addison Wesley,
1997.
Implementation of Binary Trees by
Arrays
A binary tree can be implemented as an array of records.
type TreeT :0..nmax
var Tree: array 1..nmax of
record elt :Element_Type
left, right :TreeT
end record
The empty tree is represented by zero. Left and right are indexes to
left and right subtrees.
This implementation has no advantage in a language supporting dynamic
storage unless random access to nodes is also needed. It is useful in a
language without dynamic storage.
Full Trees by Arrays:-
It is possible to define a full or weight
balanced tree in contrast to a height balanced tree. The empty tree is
a full tree of zero levels. If T and T' are full binary trees of n levels then
fork(e,T,T') is a full binary tree of n+1 levels.
1
. .
. .
. .
. .
2 3
. . . .
. . .
.
4 5 6
7
Full Binary Tree of 3
Levels.
The numbering of the nodes corresponds to a breadth-first traversal.
This suggests that such a tree can be stored in an array:
const n = 2**Levels - 1
array 1..n of Element_Type
type tree = 0..n
empty(T) = T=0
left(T) = 2*T
right(T) = 2*T+1
parent(T) = T div 2
leaf(T) = T >= 2**(Levels-1)
Such an implementation is very efficient indeed, if the tree is full,
because no space at all is used for pointers. See the chapter on sorting for an
application of this technique in[heap-sort].
Testing and Debugging:-
Programming techniques for trees share a great deal in common with those
for lists. Pre and post conditions and assertions should be included where
possible. Minimise the use of side-effects in general and the manipulation of
global variables in particular.
Note that there are really just two kinds of tree - emptyTree and
non-emptyTree. Most operations on a tree follow this pattern:
f(emptyTree) = ... |
f(fork(e,T,T')) = ...often
f(T)...f(T')...
If a case is missing it probably indicates an error. The empty case is
usually very simple, often returning a constant. The main case often operates
on one or both subtrees recursively.
When testing or debugging a tree routine, test cases should cover the
above options. For non-emptyTree trees, it may be appropriate to try a tree of
a "few" nodes and a tree of a single node. As usual it is invaluable
to write the output routine first. The most common problems are: . unassigned
pointers, particularly those that should be emptyTree, . lost pointer values
through a wrong ordering of operations, . knotted pointers maybe introducing a
cycle, . side-effects through shared pointers, intentional or otherwise.
The coding of reasonably complex routines such as those on AVL trees is
prone to small typographical errors. Nothing can beat careful proof reading but
one good testing trick is to use a set of ascending data and then its reversal
in which case the mirror image tree should be produced. This test is not
sufficient on its own! It is important to exercise all the paths through a
complex routine and this requires a great deal of thought.
Exercises
1. Classify tree
operations (height, weight, min_element, max_element, flatten, insert, lookup,
delete, pre,- in- and post-order traversal) on unordered binary trees and on
search trees as either linear or binary recursive. Note that a linear-recursive
operation may contain two or more calls on itself but only ever execute at most
one of them under the control of an if statement. Write
iterative versions of one or more of the linear operations.
2. Add a full set of
operators +, -, *, / to the expression parser.
3. Write a routine to
traverse the expression parse tree to generate reverse Polish code. eg. a*b+c*d
---> a b * c d * +
4. Write a routine to do
symbolic differentiation. Input is the parse tree of an arithmetic expression
which may contain numerals, variables {x, y, ...}, + and *. Output is the tree
of the expression that is the result of differentiation with respect to a given
variable.
Use the rules:
diff(x, x) = 1
diff(y, x) = 0 if y
is a constant or a variable different from x
diff(A+B, x) =
diff(A,x)+diff(B,x)
diff(A*B, x) =
A*diff(B,x)+diff(A,x)*B
eg. diff(x*x+3*x+4, x) =
x*1+1*x+3*1+0*x+0 = 2*x+3
5. Write a routine
to free all of the nodes in a binary tree. Count the number
of new and free operations and check that
they are equal.
6. Write a routine to
collect only the elements in all the leaf nodes of a binary tree into a list.
7. The path of
a node in a tree is the sequence of elements contained in nodes from the root,
down to and including the node itself. Give a procedure to print the path of
every leaf node in a tree.
8. A height-balanced
tree is not necessarily full or weight-balanced. How un-weight-balanced can one
be? ie. What is the minimum possible number `n' of nodes in a height-balanced
tree for height h=1, 2, ..., 8. Is there a pattern? How does n compare with the
value 2h-1 for a full tree?
9. (Hard) Implement
a delete operation for height-balanced trees.
Comments
Post a Comment