Lecture Notes: Princeton Algorithms, Part I

Algorithms

Here’s my notes of the course Algorithms, Part I by Princeton University, which is an introduction course to algorithms and data structures. The course is based on the book: Algorithms, 4th Edition by Robert Sedgewick and Kevin Wayne.

Week 1

Lecture 1: Course Introduction

Bob Sedgewick Professor of Computer Science at Princeton

What’s this course?
Intermediate-level survey course on algorithms.
Programming and problem solving, with applications.
Algorithm: method for solving a problem.
Data structure: method ot store information.

Data types, sorting and searching are the first part of this course.

Why should we study algorithms?
Their impact is broad and far-reaching.
Ancient roots. The first algorithms was created in 300 BCE.
The concept of algorithms was formalized by Church and Turing in 1930s.

To solve problems that could not otherwise be addressed. For example, the Network connectivity problem.

Network connectivity problem

“Algothrims + Data Structures = Programs”

They may unlock the secrets of life and of the universe.

For fun and profit.

Lecture 2: Union Find

We illustrate our basic approach to developing and analyzing algorithms by considering the dynamic connectivity problem.
We introduce the union–find data type and consider several implementations (quick find, quick union, weighted quick union, and weighted quick union with path compression).
Finally, we apply the union–find data type to the percolation problem from physical chemistry.

Dynamic Connectivity

Given a set of N objects.
Union command: connect two objects.
Find/connected query: is there a path connecting the two objects?

union(x,y) Create a connection between x and y
connected(x,y) Whether x and y are connected.

Modeling the connections:

We assume “is connected to” is an equivalence relation:
Reflexive: p is connected to p.
Symmetric: if p is connected to q, then q is connected to p.
Transitive: if p is connected to q and q is connecte to r, then p is connected to r.

Connected components: Maximal set of objects that are mutually connected.

For example: {0} {1 4 5} {2 3 6 7}

Implement the operations:
Find query: Check if two objects are in the same component.
Union command: Replace components containing two objects with their union.

Union-find data type (API):

public class UF
     UF(int N) //initialize union-find data structure with N ojbects (0 to N-1)
     void union(int p, int q) //add connection between p and q
     boolean connected(int p, int q) //whether p and q are in the same compoment
     int find(int p) //compoment identifier for p(0 to N-1)
     int count() //number of compoments

Quick Find

Data structure

Integer array id[] of size N.
Interperetation: p and q are connected if and only if they have the same id.

Find

Check if p and q have the same id.

Union

To merge components containing p and q, change all entries whose id equals id[p] to id[q].
Change the first one to match the second one.

Java implementation
public class QuickFindUF
    {
            private int[] id;
            
            public QuickFindUF(int N)
            {
                id = new int[N];
                for (int i = 0; i < N; i++)
                    id[i] = i
            }
            
            public boolean connected(int p, int q)
            {
                return id[p] == id[q];
            }
            
            public void union(int p, int q)
            {
                int pid = id[p];
                int qid = id[q];
                for (int i = 0; i < id.length; i++)
                    if (id[i] == pid) id[i] == qid;
            }
    }

However, the union operation is too expensive.
Quadratic algorithms do not scale.

Quick Union

Data structure

Integer array id[] of size N.
Interpretation: id[i] is parent of i.
Root of i is id[id[id[…id[i]]]].

Find

Check if p and q have the same root.

Union

To merge components containing p and q, set the id of p’s root to the id of q’s root.

Java implementation
 public class QuickUnionUF
        {
            private int[] id;
            
            public QuickUnionUF(int N)
            {
                id = new int[N];
                for (int i = 0; i < N; i++) id[i]=i;
            }
            
            private int root(int i)
            {
                while (i != id[i]) i = id[i];
                return i;
            }
            
            public boolean connected(int p, int q)
            {
                return root(p) == root(q)
            }
            
            public void union(int p, int q)
            {
                int i = root(p);
                int j = root(q);
                id[i] = j;
            }
        }

Trees can get tall and find operation might be too expensive.

Improvement

Weighting

Weighted quick-union.
Modify quick-union to avoid tall trees.
Keep track of size of each tree(number of objects).
Balance by linking root of smaller tree to root of larger tree. (Smaller tree goes below)

Data structure

Same as quick-union, but maintain extra array sz[i] to count number of objects in the tree rooted at i.

Find

Identical to quick-union.

Union

Modify quick-union to:
Link root of smaller tree to root of larger tree.
Update the sz[] array.

Java implementation of weighting
int i = root(p);
int j = root(q);
if (i == j) return;
if (sz[i] < sz[j])  { id[i] = j; sz[j] += sz[i]; }
else                { id[j] = i; sz[i] += sz[j]; }
Analysis

Running time:
Find: Takes time proportional to depth of p and q.
Union: takes constant time, given roots.

Proposition. Depth of any node x is at most lg(N).
Proof. When does depth of x increase?
Increase by 1 when tree T1 containing x is merged into another tree T2.
The size of the tree containing x at least doubles since T2 is larger than T1.
Size of tree containing x can double at most lg(N) times.

Path Compression

Just after computing the root of p, set the id of each examined node to point to that root.

Java implementation:


In the private function root, we could add one line of code.

id[i] = id[id[i]]

Summary

Weighted quick union with path compression makes it possible to solve problems that could not otherwise be addressed.
Quick Find: M N
Quick Union: M N
Wegithed QU: N+M log N
QU and Path compression: N+M log N
Weighted Quick Union and Path compression: N + M lg* N

Applications

Percolation

N-by-N grid of sites. Each site is open with probability p or blocked with probability 1-p. System percolates if and only if top and bottom are connected by open sites. Likelihood of percolation depends on site vacancy probability p. The phase of transition is sharp. p>p: almost certainly percolates; p almost certainly does not percolate.

Solution: Monte Carlo simulation

1.Initialize N-by-N whole grid to be blocked.
2.Declare random sites open until top connected to bottom.
3.Vacancy percentage estimates p*.
4.Create a virtual top site and virtual bottom site and check whether they are connected.

Lecture 3: Analysis of Algorithms

Cast of Characters

Programmer: Develop Solution
Client: Solve Problem
Theoretician: Understand Solution
Basic Blocking and Tackling

We analysis algorithms to predict performance, compare algorithms and provide guarantees.
The primary practical reason is to avoid performance bugs.

Scientific method applied to analysis of algorithms.
Scientific method:
Observe some feature of the natural world.
Hypothesize a model that is consistent with the observations.
Predict event using the hypothesis.
Verify the predictions by making further observations.
Validate by repeating until the hypothesis and observations agree.

Principles:
Expeeriments must be reproducible.
Hypotheses must be falsifiable.

Observation

Example: 3-Sum Algorithm
How to time a program?
1. Watch
2. Automatic (Stopwatch class in Java)
Run the program for various input sizes and measure running time.

Doubling hypothesis
Quick way to estimate b in a power-law relationship.

System independent effects:
Algorithm, Input Data
System dependent effects:
Software, Hardware

Mathematical Model

Total running time: sum of cost * frequency for all operations
Simplification:
Use basic computation as a proxy of time.
Use tilde notation.

How to estimate a discrete sum?
1. Take discrete mathematics course
2. Replace the sum with an integral and use calculus.

In principle, accurate mathematical models are avaliable.
In practice, formulas can be complicated, advanced mathematics might be required, and exact models best left for expers.

Order of Growth Classifications

We could use a few functions to describe order-of-growth of typical algorithms. 1, \log N, N, N \log N, N^{2}, N^{3}

Better order of growth means faster in practice.

Theory of Algorithms

Types of analyses
Best case: Lower bound
Worst case: Upper bound

Notations
Big Theta: asymoptotic order of growth, which used to classify algorithms.
Big Oh: Theta(N^{2}) and smaller, which used to develop upper bounds.
Big Omega: Theta(N^{2}) and larger, which used to develop lower bounds.

Memory

Bit0 or 1
Byte8 bits
Megabyte1 million bytes
Gigabyte1 billion bytes

For an old machine, we used to assume a 32-bit machine with 4 byte pointers. For a new machine, we now assume a 64-bit machine with 9 bytes pointers, which can address more memory and pointers use more space.

Here’s a table of basic memory usage.
Padding Each object uses a multiple of 8.

boolean1
byte1
char2
int4
float4
long8
double8
char[]2N+24
int[]4N+24
double[]8N+24
char[][]~2MN
int[][]~4MN
double[][]~4MN
Obejct overhead16
Reference8

Published by

Siujoeng Lau

Liberty will never perish.

Leave a Reply

Your email address will not be published. Required fields are marked *