Introduction to Trees#

Group Presentation#

What’s the difference between Type ** and Type *? Why use Type ** in homework?
How to use valgrind to check memory leak?
What is a dynamic array, and what are the similarities and differences compared to h6 ex5?
What is a doubly linked list, and what are the similarities and differences compared to h6 ex7?

Homework Feedback#

Generally, you did a great job! However, remember the following things:

Always indent and format your code. You can use code formatters to do that easily.
Push to master branch for your final submission. Don’t forget to release.
Follow the code quality rules (we’ll talk about that in detail later)

Warning

This lab is quite difficult. If I don’t make something clear, please forgive me.

Tree Structure#

What is tree?#

In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be connected to exactly one parent, except for the root node, which has no parent.

—Wikipedia

../_images/tree.png — A tree with data integers#

Basic Concepts#

A node is a structure which may contain data and connections to other nodes. It forms the most basic elements in a tree.

The root node is the node that has no parent (the ancestor of all the other nodes in a tree).

The leaf node of a tree is the node that doesn’t haven any child.

A binary tree is a special tree: every node have at most 2 children nodes.

../_images/binary_tree.png — A binary tree#

Note

Tree is an abstract data structure, which means that it is not necessarily related to a specific programming language. You can implement tree in any Turing-Complete language.

Why trees?#

Trees are good because it can represent many data in our real life naturally.

What does tree look like in C?#

In this part, we will focus on the implementations in C.

Why pointers are involved?#

Pointers are helpful to writing “compound data structure with self-referring” (a self-made word).

Examples

Linked list
Tree
Graph (more general than tree)
Recursion

Think

Why can recursion also be related to pointers?

Think

What’s the common things among those examples?

Tip

Think about what their “nodes” are. Write out the concrete implementations.

Here I give two examples. Note that the data can be any data type, I use int only for demonstration.

Doubly Linked list#

typedef struct _node{
    int data;
    // other staffs
    struct _node *next;
    struct _node *prev;
} node;

Binary Tree#

typedef struct _node{
    int data;
    // other staffs
    struct _node *left;
    struct _node *right;
} node;

Review of the Game#

In the last lab, we have a fun game which simulates a FIFA world cup.

However, the method we used last time is not very easy to understand (using array to represent the teams, etc..).

Now we have learned tree. Is it useful in this game?

We use Post Order DFS to traverse all the nodes in the tree.

Garbage Collection: Easy Version#

After we simulate the whole game, we have to carefully collect all the “garbages” generated through the game. (Memory Management)

We will take a look at a simple example. In this example, we only have 2 countries and 3 nodes.

After we finish the game, all the nodes have set the country variable.

In the simple version, every data in the node is not a pointer. It is directly the Country data type. Hence, many countries are copied. (eg. Node 1 and Node 2 have the same country, but they don’t share the same memory)

struct Node_t {
     // data
     Country country;
     // children
     Node *left;
     Node *right;
 };

skinparam classAttributeIconSize 0
package "Resource" as R1 <<Rectangle>> #yellow {
object "Country" as Country1
Country1 : name = "UFO"
Country1 : id = 10
}

package "Resource" as R1c <<Rectangle>> #yellow {
object "Country" as Country1c
Country1c : name = "UFO"
Country1c : id = 10
}

package "Resource" as R2 <<Rectangle>> #yellow {
object "Country" as Country2
Country2 : name = "ABC"
Country2 : id = 3
}

object "Node 1 (Root Node)" as node1
node1 : data
node1 : *left
node1 : *right

node1::data -> R1c

object "Node 2" as node2
node2 : data
node2 : *left
node2 : *right

node2::data -> R1

object "Node 3" as node3
node3 : data
node3 : *left
node3 : *right

node3::data -> R2

node1::left --> node2
node1::right --> node3 — Simple Node#

The Node_create() function is defined by:

Node *Node_create() {
     Node   *new_node = malloc(sizeof(Node));
     if (!new_code){
         exit(-1);
     }
     Country s;
     s.rank            = NON;
     new_node->country = s;
     new_node->left    = NULL;
     new_node->right   = NULL;
     return new_node;
 }

Tip

Always check the return value of malloc()!

The destroy function should be:

void tree_destroy(Node *root) {
    // release memory
    if (root == NULL) {
        return;
    } else {
        tree_destroy(root->left);
        tree_destroy(root->right);
        free(root);
    }
}

As you can see, we don’t need to delete any other resources.

When we are destroying the tree, there are three steps:

Step 1#

We first delete the left child. When we delete Node 2, its country will be automatically destroyed.

skinparam classAttributeIconSize 0
package "Resource" as R1 <<Rectangle>> #grey {
object "Country" as Country1 #grey
Country1 : name = "UFO"
Country1 : id = 10
}

package "Resource" as R1c <<Rectangle>> #yellow {
object "Country" as Country1c
Country1c : name = "UFO"
Country1c : id = 10
}

package "Resource" as R2 <<Rectangle>> #yellow {
object "Country" as Country2
Country2 : name = "ABC"
Country2 : id = 3
}

object "Node 1 (Root Node)" as node1
node1 : data
node1 : *left
node1 : *right

node1::data -> R1c

object "Node 2" as node2 #grey
node2 : data
node2 : *left
node2 : *right

node2::data -> R1

object "Node 3" as node3
node3 : data
node3 : *left
node3 : *right

node3::data -> R2

node1::left --> node2
node1::right --> node3 — Step 1#

Step 2#

We then delete the right child. When we delete Node 3, its country will also be automatically destroyed.

Step 3#

Finally we destroy all the objects.

Think

What is the disadvantages of this simple method?

Reference counting#

In the difficult version, we define a new Country_proxy type by:

typedef struct {
    Country *proxy;
    unsigned int counter;
} Country_proxy;

And then define the Node by:

struct Node_t {
    // data
    Country_proxy *country_proxy;
    // children
    Node *left;
    Node *right;
};

Using pointers to store country is actually better when Country type is a large structure. It will cost less memory.

Think

Is it possible to use Country *country in the definition of Node to implement the simulation. And why?

(Definitely yes! But some solutions may need the special property of this tree and cannot generalize)

Here I want to introduce a general solution to those kinds of problems.

The key trick is to use a reference counter to track the number of references to a resource.

Originally, we directly save country inside node:

skinparam classAttributeIconSize 0
package "Resource" as R1 <<Rectangle>> #yellow {
object "Country" as Country1
Country1 : name = "UFO"
Country1 : id = 10
}

object "Node" as node1
node1 : data
node1 : *left
node1 : *right

node1::data -> R1 — Original Node#

Now we add a “proxy” (or you can call it a counter) between them:

The counter in the proxy will remember the number of references. (In the above example, it’s 1 because only one node want to point to the resource)

And, one resource will only have one proxy.

When the proxy found that its counter is 0, it will automatically delete the resource it points to and also itself. It is safe because there are no other references.

skinparam classAttributeIconSize 0
package "Resource" as R1 <<Rectangle>> #grey {
object "Country" as Country1 #grey
Country1 : name = "UFO"
Country1 : id = 10
}

object "Proxy" as proxy1 #grey
proxy1 : *country
proxy1 : counter = 0

object "Node" as node1
node1 : *proxy
node1 : *left
node1 : *right

node1::proxy .> proxy1
proxy1::country -> R1 — Proxy is deleted#

Now let’s try to use this in the example we talked about in the simple version.

After simulation:

Step 1#

Step 2#

Step 3#

Reference code to create the node:

Node *Node_create(Country *country) {
    // Create a Node with given country
    // return a new node
    Node *new_node = malloc(sizeof(Node));
    if (country) {
        new_node->country_proxy          = malloc(sizeof(Country_proxy));
        new_node->country_proxy->counter = 0;
        new_node->country_proxy->proxy   = country;
        new_node->country_proxy->counter++;
    } else {
        new_node->country_proxy = NULL;
    }
    new_node->left  = NULL;
    new_node->right = NULL;
    return new_node;
}

Reference code for destroying:

void tree_destroy(Node *root) {
    // release memory
    if (root == NULL) {
        return;
    } else {
        // free all resources
        tree_destroy(root->left);
        tree_destroy(root->right);
        if (root->country_proxy && root->country_proxy->counter > 0) {
            // proxy still alive, decrease the counter by 1
            root->country_proxy->counter--;
            if (root->country_proxy->counter == 0) {
                // counter is 0, free it!
                free(root->country_proxy->proxy);
                free(root->country_proxy);
            }
        }
        free(root);
    }
}

Drone Setup Checking#

In the last lab we did this in a hurry. Let’s check if everyone has done it!

Warning

If you didn’t set up drone for your projects, you will not be able to see the compilation results, code quality feedback, etc.. As a result, you may get deductions on your projects.

Explanation of the checks#

There are many checks we enabled on the server.

Code Check#

clang-tidy with custom checks. The checks could be found on the materials repo.
Please follow code_quality.pdf on canvas.
Not following it will lead to deductions on exams/projects.

Warning

Please read the code quality document now (download it from Canvas to get the latest version).

Runtime Check#

This is the check mainly for your p2.

You have to learn how to manage memory by yourself in p2. (although in the real world it’s not recommended to manage the memory by yourself)

We use msan, asan, lsan, ubsan to perform the checks. The feedback is very clear. It shows the track-back and tell you where the problem is in your source files.

Alternative memory checking tools are valgrind (not available on Windows). For Windows users, you can either try searching tools online or use WSL.

Attendance Quiz#

Please do the attendance quiz on Canvas now.