CSC2040  Data Structures, Algorithms and Programming Languages
Data Structures, Algorithms and Programming Languages
Assessment Information
Assessment Dates
Test  Date 
First Practical  4th Oct 
Test 1  25th Oct 
Test 2  15 Nov 
Test 3  13 Dec 
Test 4  Week of 5 Feb 
Test 5  Week of 19 Mar 
Format
Asked to write programs to solve a set of problems
Open book: Textbook and lecture notes
Books
Objects, Abstraction, Data Structures and Design Using C++
C++
Lecture 1
Overview
C++ is an object orientated language. It has many features common with Java. C++ is much more complex than Java. C++ is generally regarded as the most powerful and fastest language.
It provides the programmer with low level programming features and lots of control over how the code is executed. It mostly avoids run time checks, assuming the programmer will have written the code correctly.
C is a high level language. It offers control of the computer hardware with high level language statements. C++ is a superset of C, which means it includes all of the functions included in C, with added OOP capabilities.
C++ Program Structure
A C++ program is written in two separate parts:

Header files
A header file (.h extension) contains the declaration of variables, functions or classes used in the program 
Source file
(.cpp extension) which includes the header file#include
, contains the definition or implementation of the functions or classes declared in the header file.
A header file
#ifndef MY_FUNCTIONS_H
#define MY_FUNCTIONS_H
// add two integers
int add(int a, int b);
// return maximum of two integers
int max(int a, int b);
#endif
The source file
#include <header.h>
int add(int a, int b)
{
return a + b;
}
int max(int a, int b)
{
if(a > b) return a;
return b;
}
main()
A C++ program contains a main function. This is the place where the operating system begins the execution of the program.
#include "my_functions.h"
int main()
{
int sum_value = add(5, 13);
int max_value = max(23896, 31653);
return 0;
}
#include
The #include
directive is part of the preprocessing, which instructs the compiler to read the source code within the referenced file into the main program, so that they can be seen and used.
The #include
directive effectively builds a complete source code by inserting all other source code for execution
This is advantageous as the same functions (add
, max
) or other classes entirely can be used in different programs by simply including their header files, instead of rewriting the same functions. This could also make the construction of the final executable code much faster (less to compile).
#include
guards
Say that two files are referenced in the header, and both files have some function name that is identical. An error would be produced at runtime. This problem can be solved using include guards, which is part of the preprocessing.
Using the #ifndef
directive, a block of text can be included only if a particular expression is undefined. Then, within the header file, the expression can be defined. This ensures that the code in the #ifndef
block is included only the first time that the file is loaded.
General class declaration form
#ifndef MY_CLASS_H
#define MY_CLASS_H
/* Class code */
#endif
Visual Studio uses #pragma once
C++ Program Execution
Preprocessing
The preprocessor merges the include files into the source code. It also deals with other preprocessing directives. The output of this step is a pure C++ file without preprocessing directives.
Compiling
The compiler takes the preprocessor's output and translates it into machine code. This is classed the Object code (.obj file).
Linking (Building)
The linker combines the user's object code files with that of the library to produce an executable code file (.exe file).
Console I/O
Output any built in data types onto the screen: cout
with the output operator <<
, and multiple items can be chained for output.
Read any built in data types from the keyboard: cin
with the input operator >>
, and multiple items can be chained for input. Use function getline(cin, s)
to read a whole line where s
is a string
type variable. The C++ library classes <iostream>
and <string>
should be included, using namespace std
.
Libraries
Study to understand what library functions and classes are available to be used (by #include
)
 C++ Standard Function Library
 C++ Standard Class Library
 C++ Standard Template Library
Example
#include <iostream> // input / output etc
#include <string> // a class handling char string
using namespace std;
int main()
{
// an empty string
string line;
int d, m, y;
// output a line to command prompt
cout << "give today's day month year" << endl; //new line
cin >> d >> m >> y;
cout << d << " " << m << " " << y << endl;
return 0;
}
Example 2 (getline(cin, s)
)
#include <iostream> // input / output etc
#include <string> // a class handling char string
using namespace std;
int main()
{
// an empty string
string line;
int d, m, y;
// output a line to command prompt
cout << "give today's day month year" << endl; //new line
// read a whole line
getline(cin, line);
cout << line << '\n';
return 0;
}
Lecture 2
Arrays
In C++, the size of an array must be a constant value. Arrays will be stored in the stack memory along with local variables, and parameters passed to functions.
#include <iostream>
#include <cmath>
uding namespace std;
#define SIZE 100
int main()
{
// a double array of size 100
double sample[100];
for (int i = 0; i < 100; i++)
{
sample[i] = sqrt(i);
}
// Display array
for (int i = 0; i < 100; i++)
{
cout << "value[" << i << "] root [" << sample[i] << "]" << endl;
}
return 0;
}
The following is not valid as length is a variable, not a constant.
int length = 100;
float sample[length];
Pointers
A pointer is a variable, which is used to hold the memory address of another variable.
Declaration of a pointer
int* p;
Declares a pointer p
, of type integer, but it is not pointed to any specific integer.
This declares an uninitialised pointer, which is not recommended.
The NULL
pointer
It is good practice to assign the pointer NULL
to a pointer variable in case an exact address to be assigned is not known.
int* p = NULL;
int* p = 0; // 0 is identical to NULL for pointers
Int* p = nullptr;
Always initialise an unused pointer as a NULL pointer
Two special operators: *
and &
int m = 23;
int* p = &m;
Declare an integer pointer p
, and point it to an integer m
. So that p
holds the memory address of m
.
int n = *p;
This statement reads the value at memory address pointed to by p
, which is m
(as above = 23)
*p = 62;
This statement assigns a value 62 to memory address pointed to by p
. This is equivalent to the statement m = 62;
Pointer arithmetic
p++; //1
++p; //2
p; //3
p; //4
p = p + 5; //5
p = p  8; //6

p
is post incremented to the next integer address. 
p
is preincremented to the next integer address. 
p
is post decremented to the previous integer address. 
p
is predecremented to the previous integer address. 
p
is moved forward 5 integer addresses. 
p
is moved backwards 8 integer addresses.
Dangling Pointers
if(true){
int x = 50;
p = &x;
}
*p = 100;
x
's location is only valid within the scope of the if
statement. It may still work, but the memory address may be used for another value at any time by the application or the operating system. It is undefined behaviour.
PointerArray Equivalence
In C++, arrays and pointers are closely related  the name of an array is a pointer to the 1st element of that array. Therefore, is is possible to have two methods of accessing array elements:
 Array indexing
 Pointer arithmetic
// a double array
double sample[100];
for (int i = 0; i < 100; i++)
sample[i] = sqrt(i);
// display the array using array indexing
for (int i = 0; i < 100; i++)
cout << sample[i] << endl;
//display the array using pointer arithmetic
for (int i = 0; i < 100; i++)
cout << *(sample + i) << endl;
*(sample)
is a pointer to the start of the array, then + i
moves the pointer i
places along the array each time the for loop executes.
References
In a computer language, there are two ways that arguments can be passed to a function:
 Call by value  copies the value of an argument into the parameter of the function; changes made to the parametes have no effect on the argument
 Call by reference  copies the address of the argument into the parameter, which is used inside the cuntion to access the argument; changes made to the parameter thus affect the argument.
By default, C++ and Java uses call by value to pass arguments. This means that the code within a function cannot alter the argments used to call the function. C++ contains a feature that is related to the pointer called a reference. The most important use is to allow for the creation of functions that automatically use callbyreference parameter passing.
Dynamic Allocation in C++
Creates memory spaces in the Heap memory. Heap memory must be manually freed. There are two operators for managing heap memory, new
and delete

new
allocates memory and returns a pointer to the start of it 
delete
frees memory previously allocated using new for reuse  Two two operators must match to avoid a memory leak, or undefined behaviour
Allocating a single item with optional initialisation:
type_name* p = new type_name(initialiser);
delete p;
Allocating arrays:
type_name* p = new type_name[size];
delete[] p;
Allocated arrays can be accessed using array indexing. p[0]
, p[1]
etc because of the pointerarray equivallence. No array allocated by new can have an initialiser.
Each new must have a matching delete. If there are more new
calls than delete
calls, there will be a memory leak. More delete
calls than new
calls, the program can start to give undefined behaviour.
Vector
It is not straightforward to do the following with an array, even if it is dynamically allocated (using new
):
 Increase or decrease the length
 Insert an element at a specified position
 Remove an element at a specified position
The C++ template class vector
supports a dynamic array, that solves all these problems by allocating memory as needed. Although a vector is dynamic, you can still use the standard array subscript notation to access its elements. The word template means vector is generic or a framework, leaving the details to be filled in by the compiler.
The C++ vector
combines the best features of arrays in C++ and Java.
Vector vs Pointer
The vector
class eases the handling of dynamic arrays at the price of performance.
Lecture 3
Class definition
The header file defines a class. Example:
#ifndef STACK_H
#define STACK_H
class Stack{
public:
Stack(int size);
// destructor
~Stack();
void push(int i);
int pop();
private:
int stack_size;
int* stck;
int tos;
};
Members (Data and functions) defined after public
, can be accessed by functions outside the class. Members defined after private
can only be accessed by functions that are members of the class, or are declared to be friends of the class.
A constructor is a special ember function of a class and has the same name as that class. It can have parameters or be parameterless, can be overloaded, and can't have a return type. It is automatically called when an object of the class is created, to provide initialisation as appropriate.
The destructor has the same name as the class, but it is preceded by a ~
. It can't have parameters or a return type. It is called automatically when an object is destroyed or out of scope, to undo what the constructor does. For example, it will deallocate dynamic memory previously allocated (otherwise the program will suffer from a memory leak), or to close a file that had been opened, etc.
Integer stack
#ifndef ISTACK_H
#define ISTACK_H
class intStack {
public:
// Constructor
intStack(int size);
// Destructor
~intStack();
//public members (Data or functions)
void push(int i);
int pop();
private:
int* data;
int tos; // Top of stack
int stack_size;
};
#endif
The .
and >
operators
Accessing members for an object is done by using the .
operator. Accessing members for pointers is done using the >
operator.
.
int main() {
Stack s(3);
s.push(10);
std::cout << s.pop() << std::endl;
return 0;
}
>
int main(){
Stack* s = new Stack(3);
s>push(10);
std::cout << s>pop() << std::endl;
}
Dynamic array of objects
No array allocated by new
can have an initialising parameters. To use dynamic arrays of objects, a parameterless constructor must be created. Stack()
Static Class Members
Each object of a class keeps an individual copy of the members of that class. When you precede a data member's declaration with static
, you tell the compiler that only one copy of that member will exist, and that all objects of that class will share that member.
Lecture 4
Operator overloading
You can overload most operators such as +
, 
, ++
, 
, ==
, !=
, +=
etc so that they perform special operations over classes as if they were primitive types.
When an operator is overloaded, none of its original meaning is lost. Instead, the type of objects it can be applied to is expanded. You overload an operator by creating an operator function, which defines the operations the overloaded operator will perform over the class it will work on. Operator functions can be members or nonmembers.
A member operator function takes this general form:
returntype classname :: operator#(argumentlist)
{
//operations
}

#
 a placeholder. If you overload/
, useoperator/
.  If you overload a unary operator (eg
++
), the argument list should be empty.  If you overload a binary operator (eg
+
), the object on the left of+
generates the call to the operator function, and the object on the right of+
is an argument of the function.
Example  Stack
// header
void operator+(int i);
int operator();
bool operator==(const iStack& z);
// overload + for push
void Stack::operator+(int i)
{
push(i);
}
// overload  for pop
int Stack::operator()
{
return pop();
}
// overload ==
bool iStack::operator==(const iStack &z)
{
if (tos !=z.tos) return false;
if (stack_size != z.stack_size) return false;
for (int i = 0; i < tos; i++)
if (data[i] != z.data[i]) return false;
return true;
}
int main()
{
Stack s(3);
// push using overloaded +
s + (8);
s + 21;
s + 192;
s + (82);
// pop using overloaded 
cout << s << endl;
cout << s << endl;
cout << s << endl;
....
return 0;
}
Inheritance
In C++, public
is used instead of extends
(as in Java) to denote inheritance. The most common class inheritance uses this general form:
class derivedclassname : public baseclassname
{
// body of class
};
Access status: All public members of the base class become public members of the derived class. The private elements of the base class remain private to the base and are not accessible by members of the derived class.
To provide greater flexibility, in addition to public
and private
, C++ also has protected
, to create class members that are private to their class, but can still be inherited and accessed by a derived class.
In case the base class' constructor requires parameters, the derived class' constructor uses the following form of definition that passes along arguments to base class' constructor:
derivedclassconstructor(arglist) : baseclassconstructor(arglist)
{
// body of derived constructor
}
In the derived class, a base class member function can be redefined to include operations specific to the derived class. This is called function overriding.
Virtual functions and polymorphism
A virtual function is a member function that is declared within a base class and redefined by a dervied class.
To create a virtual function, precede the function's declaration in the base class with the keyword virtual
. A pure virtual function without definition in the base class can be declared as follows:
virtual returntype functionanem(arguments) = 0;
When a base class pointer points to a derived object that contains a virtual function, C++ calls the version of the function associated with the derived object pointed to by rhe pointer.
The virtual function declared in the base class defines the interface to that function. Each redefinition in a derived class implements a specific operation for that derived class. The different operations can be accessed by using a common interface  the base class pointer, pointing to the different derived objects.
Default function arguments
C++ allows a function to assign a parameter to a default value whne no arument corresponding to that parameter is specified in the call to that function.
Lecture 5
Template functions
Templates allow for use of the same code for a range of data types. The idea is to pass the type of data that the function will operate on to the function as a parameter.
The general form of a template function definiton:
template <typename T> returntype functionname(args)
{
// body of function
}
T
is a placeholder name for a data type used by the function. T
may be used within the function definition. However, it is only a placeholder that the compiler will automatically replace with an actual data type when it creates a specific version of the function.
template<typename T>
T add(T x, T y)
{
return x + y;
}
Template functions are similar to overlaoded functions except that they are more restrictive. When functions are overloaded, different actions may be performed within the body of each function. A template function function performs the same general action for all versions  only the type of data can differ.
Applying template functions
Whenever a function that defines a generalisable algorithm, it can be made into a template function. Once this is done, it can be used with any type of data without having to recode it.
A function template is not a function. It's a recipe for creating a new funtion for each T
that is encountered. A template cannot be compiled into code. For any code to appear, a template must be instantiated. The template arguments must be known so that the compiler can generate an actual function. This is called Instantiationstyle polymorphism.
Templates can be viewed as declarations rather than implementations. It is most convient to define them in the header file.
Template Classes
Class templates are also possible, in the same way function templates are written.
template <typename T>
class classname
{
// class body
}
When this is done, a class is created that defines all the algorithms used by that class; howver the actual type of the data being manipulated will be specified as a parameter when objects of that class are created.
T is the placeholder type name, which will be specified when a class is instantiated. More than one generic data type can be defined using a commaseparated list.
Member functions of a generic class are themselves automatically generic, and are defined using the following general form:
template <typename T>
returntype classname<T>::functionname(parameters)
{
// body
}
Once a class template is created, a specific instance of that class is created using classname <type> o;
. type
is the type name of the data that the class will be operating on.
Standard template library
Part of the C++ standard library, STL provides generalpurpose classes and functions that implement many popular and commonly used algorithms and data structures, including support for vectors, lists, queues, stacks.
Because the STL is constructed using template classes, the algorithms and data structures can be applied to nearly any type of data.
At the core of the STL, there are three foundational items: containers, algorithms and iterators. These items work in conjunction with one another to provide offtheshelf solutions to a variety of programming problems.
Containers
Objects that hold other objects. Examples include the vector
class that defines a dynamic array, deque
that creates a doubleended queue, and list
that provides a linear list.
Each container class defines a set of member functions that may be applied to the container. For example, a list container included functions that insert, delete and merge elements.
Iterators
Objects that act, more or less, like pointers. They give you the ability to cycle through the contents of a container in much the same way that you would use a pointer to cycle through an array.
Algorithms
Act on containers. Their capabilities include initialisation, sorting, searching and transforming the contens of the containers.
While each container class provides support for its own basic operations, the standard algorithms provide more extended or complex actions.
Function objects  functor
A functor is simply any object that can be called as if it is a function. An ordinary function is a function object, and so is an object of a class that defines the function call operator()
.
Sometimes, a function object can be used when an ordinary function won't work. The STL provides a rich assortment of builtin function objects, using the header <functional>
.
To create a function object, overload the operator()
method.
Command line arguments
#include <iostream>
#include <string>
using namespace std;
int main(int argc, char* argv[])
{
if (argc < 4)
{
// it needs 3 parameters: a double, int and string
cout << "Input: a double, an integer, and a string" << endl;
return 1;
}
double d = stod(argv[1]);
int i = stoi(argv[2]);
string s = argv[3];
return 0;
}
Data Structures and Algorithms
Lecture 1
Data structures in programs
In well designed software, the data objects in the program should match the real world objects as closely as possible. Therefore, common patterns are expected for arranging groups of software objects. These reusable patterns are called data structures.
Each data structure provides a specialised and limited set of methods for accessing existing items, adding new items, etc. One of the goals of focussing on data structures is to increase software reusability.
When using template classes, it is accepable to put the declaration and the implementation in the same header file, rather than putting the implementation in a source file.
Using a stack for evaluating expressions
Using a stack as the intermediate storage.
If a value is encountered, push it.
If an operator is encountered, pop the top two items and push the result.
The answer should be the onl item remaining on the stack. Writring the operator between the operands is called INFIX notation.
Writing the operator after the operands (X Y +)
is called POSTFIX notation. With POSTFIX expressions, parentheses are unnecessary.
Dynamic data structures
When an array is used to hold a stack, there is always an upper limit on the size. This is called a static implementation, because it is fixed. In some applications, the size of the stack is not always known in advanced.
One solution is to create a new, larger array then copy the old one into it when the stack is about to overflow. However, this can cause unexpected pauses, and it is also slow.
In a dynamic stack representation, a new space for every new item is added as needed. As each new item will therefore be somewhere random in memory, the items are linked together using pointers.
LinkedStack  Stacks using dynamic implementation
A data item is stored in a Stack Node consisting of the data item, and a pointer to the next node.
template <typename T>
class StackNode
{
public:
StackNode(T i, StackNode<T> *nxt);
~StackNode() {};
T item;
StackNode<T> *next;
};
template <typename T>
StackNode<T>::StackNode(T i, StackNode<T> *nxt)
: item(i), next(nxt)
{
}
A class for the links of stack nodes in a LIFO structure
template <typenmae T>
class LinkedStack
{
public:
LinkedStack();
~LinkedStack();
void push(T val);
T* pop();
T* top();
private:
StackNode<T> *tos;
int stackSize;
T temp_item;
};
Lecture 2
In some linear data structures, it is only possible to join at one of the ends, but not in the middle. Likewise, only the items at the end may be accessed/processed.
In pther cases, the data structure may need to be more flexible  to be able to join at any point, and to access an item at any position
A list
A list is one of the most versatile and general one dimensional data structures.
ArrayList<string> s1(100);
string name1 = "Fred";
string name2 = "joe";
string name3 = "Mary";
sa.addAtFront(name1);
s1.addAt(1, name3);
s1.addAt(1, name2);
string * ps = s1.getAt(0);
s1.addAtEnd(name1);
s1.addAtEnd(name2);
List implemented using arrays
To insert an item at position p
, somewhere in the middle of an array list, requires the following steps:
 Move everything from position
p
up one place (starting at the end, not fromp
)  Copy the new item into the 'gap' at position
p
.
To remove an item at position p
, the following it done:
 Move everything from position
p
down one place (starting atp+1
), not from the end)  Return the item which had been at position
p
.
LinkedList
Avoids having to shift the items up and down the array. In a stack, only one pointer is needed  the item below. In a list, a link to both neighbouring items is needed.
#ifndef LISTNODE_H
#define LISTNODE_H
// An object of type ListNode is a link which simply holds three things:
//  an item (of type T)
//  a pointer to the previous ListNode in the list(if any)
//  a pointer to the next ListNode in the list(if any)
template <typename T>
class ListNode
{
public:
ListNode(T i, ListNode<T>* prev, ListNode<T>* nxt);
~ListNode() {};
// The three things held in a ListNode link
T item;
ListNode<T> *next, *previous;
};
template <typename T>
ListNode<T>::ListNode(T i, ListNode<T>* prev, ListNode<T>* nxt)
: item(i), previous(prev), next(nxt)
{
}
#endif
Inserting a new item
template <typename T>
void LinkedList<T>::addAt(int p, T i)
{
if (p == 0)
addAtFront(i);
else if (p == size)
addAtEnd(i);
else {
ListNode<T>* temp = findAt(p);
if (temp != NULL) {
ListNode<T>* l = new ListNode<T>(i, temp>previous, temp);
temp>previous>next = l;
temp>previous = l;
size++;
}
}
}
The problem with findAt(p)
/ getAt(p)
To find the 23rd item in a linked list, start at the first item and follow the chain of links until position 22 is reached, if it exists. An array provides direct immediate access to any elements.
This is particularly a problem for lists since processing every element in a list is such a common procesing pattern.
Anything proportional to the square of the problem size can be a problem.
One solution
A moveToNext
or getNext
operation, rather than treating every access as being independent and potentially random access.
There are different ways of achieveing this. Another private pointer could be maintained which will be the next one to be accessed:
 Before starting a for loop, set the pointer to the first item
 At each
getNext
, return the current one and move it on to the next
Linked lists and the vector
class
The vector class is essentially a linked list class which was programmed and included in the standard library.
Its push_back()
function is more or less equivalent to addAtEnd()
. Instead of linking each individual item, a vector is a linked list of smaller arrays. When adding items to the end, it only grabs another chunk of memory if the current chink is full.
Not all chunks necessarily use their full capacity  to save filling gaps when removing an item from the middle. The vector implementation is more complex than LinkedList
. It has been optimised and thoroughly tested.
Special cases of lists
Queues
Where only the front item can be accessed and removed, and items can only be added at the end. Queus are good for buffers between two processes, for smoothing out the variations in their processing rates.
Doubleended queues  deques
In a deque, either the front item or last item can be accessed and removed. New items can be added either at the end or at the front.
Deques can be suitable for buffering twoway converstaions between processes, or when each process is both a producer and a consumer. They are not as common as Queues.
Cyclic buffer  Array implementation of a queue
removeAtFront
previously moved everything else up to fill the gap. Instead, just move the value of 'first' and leave the gap. Conceptually arrange the array (buffer) as a ring.
Lecture 5
Algorithms and their analysis
Algorithms
An algorithm is a set of instructions to be followed which will solve a problem. There can be many different algorithms for solving the same problem. In coding terms, an algorithm is a method or function to solve a problem.
Time to execute code
for (int i = 0; i < N; i++)
action();
If action()
takes 1 microsecond to run on a given computer, how long will this loop take to execute?
T(N) = N * (1 + loop overhead per iteration)
in microseconds. T(N) ∝ N
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
action();
T(N) ∝ N^{2}
The big O notation and complexity
The upper bound on the time taken to execute an algorithm is of interest and the asymptotic performance (what it tends to, large N). In these conditions ∝ N^{2}
is rephrased as of order N^{2}
. If an action doesn't depend on N
, then action is O(1)
.
This is called the complexity of the algorithm. It doesn't mean how complex it is. It does not give an absolute time, but a mathematical measure of how the execution time will increase as the problem size increases.
In general, for a loop which ma or may not include a nested loop (Loop3)
for (int i = 0; i < N; i++)
action(N);
C(Loop3)
is O(N * C(action(N))
.
If S
is the statement:
if (condition)
action1(N)
else
action2(N)
... Then the complexity of S
is: C(S) = maximum ( C(action1(N)), C(action2(N)) )
.
This is the complexity of the worst case, in general. Sometimes it is better to measure the complexity of the most common case.
Linear search and analysis
Linear search of a sorted array/list A
of length N
for a value key.
int i = 0;
while(i < N) && A[i] < Key)
i++;
if(i < N && A[i] == Key)
return true;
return false;
Average number of comparisons is N/2
when found, and N
when not found. Linear search is O(N)
. Searching an unsorted array is O(N)
.
Binary search and analysis
Binary search of a sorted array / list A
of length N
for a value Key.
// search can be performed using recursion
if (p != NULL)
if (p>item == Key) return true;
else if (Key < p>item)
return binarySearch(p>left);
else
return binarySearch(p>right);
else return false; // not foung
Average number of comparisons: log2N where log_{2}16 = 4; log_{2}2^{X} = X
.
Binary Search is O(log_{2}N)
No matter how small the slop of O(N)
is, it will always eventually outstrip O(logN)
. So an algorithm of O(logN)
is better than one of O(N)
.
Building a balanced binary search tree (BST)
Given a sorted array, write a function that creates a balanced binary search tree sing array elements.
 Get the middle of the array and make it root.
 Recursively do the same for left half and right half.
 Get the middle of the left half and make it the left child of the root created in step 1.
 Get the middle of the right half and make it the right child of the root created in step 1.
Lecture 6
Algorithms for sorting, and their analysis
Every item in the input seqence must appear in the output sequence, the same number of times. Sorting is a very common operation in computer systems, so sorting has been studied intensely. There are many different algorithms for sorting a sequence.
Selection sort
 Find the position of the minimum in the array
 Swap it with the (current) start position
 Sort the rest of the array
for (int p  0; p < N  1; p++)
{
find the minimum (at position m) in A[p...N1];
swap A[p] with A[m];
}
Finding the minimum is O(N). Therefore, selection sort is O(N^{2})
for (int p = 0; p < N  1; p++)
{
min = A[p]; m = p;
for (int j = p + 1; j <= N  1; j++)
if (A[j] < min) {
min = A[j];
m = j;
}
}
Bubble sort
do {
sorted = true;
for (int i = 0; i < N  1; i++)
if(A[i] > A[i + 1]) {
swapAt(i, i + 1);
sorted = false;
}
} while (!sorted)
The inner for loop is O(N). Best case scenario is when A was already sorted, so the loop runs once. Best is O(N). Worst case scenario is when A
is in reverse order, so the loop runs N
times. Perhaps on average, N / 2
.
Therefore, bubble sort has a complexity of O(N^{2}).
for (;;) {
for (int i = 0; i < N  1; i++) {
if (A[i] > A[i+1]) {
swapAt(i, i + 1);
goto outer;
}
break;
}
break;
outer:;
}
Insertion sort
Build up a new sorted list, one element at a time. Always maintain the order of the result list B.
for (int i = 0; i < N; i++) {
item = A[i];
p = 0;
while (p < i && B[p] <= item)
p++
<insert item at position p in B>;
}
item = A[i]
is the next item to be inserted into B
. i
is the length of the current sorted solution. Insert the next item from A
into the correct position in B
, at or before position i
. Search for insertion position p
in B
. p = 0;
is the position in B
. Find the location of the first item in B
which is > item
, or end of B
.
Array list insertion
To find the position in sorted A
for a new value, and insert it:
 Find position
p
of the new item (the inner while loop):p
comparisons  Insert at position
p
. Note: the remainder of the array (i  p
) must be moved up
This operation is O(N).
Linked list insertion
To access a random position p
is O(N), but using setAtStart()
and getNext()
, to scan for the position of the new value is O(N). To find position and insert new value, there are N / 2
accesses on average: O(N).
For an array list 
C(S) = O(N^{2}) (N * {N})

For a linked list using getNext to scan 
C(S) = O(N^{2}) (N * {N})

For a linked list not using getNext

C(S) = O(N^{3}) (N * {N * N})

QuickSort
Reorder the array so that everything on the left (although unsorted) is less than everything on the right (although the right is unsorted)
This is called partitioning the array. Now the left part of the array can be sorted, then the right part can be sorted. This can be done recursively.
Sort(A, first, last)
Partition A into A[first...M1] and A[M...last]
Sort(A, first, M1)
Sort(A, M, last);
Implementing partitioning
 A middle value is chosen, called the pivot. Ideally this is the median, but this is unknown before it is sorted.
 Scan from the left, and if something >= median, it must go to the right side
 At the same time, scan from the right to find something <= 25, it must go to the left side
 The two values found in 2 and 3 are swapped
 This is continued until the scans meet in the middle
/* partition *?
int i = first, j = last;
int pivot = A[(first + last) / 2];
while (i <= j)
{
while (A[i] < pivot) i++;
while (A[j] > pivot) j;
if (i <= j) {
tmp = A[i];
A[i++] = A[j];
A[j] = tmp;
}
}
if (first < i1) Sort(A, first, i1);
if (i < last) Sort(A, i, last);
Analysis of QuickSort
In an ideal scenario, the array is broken down into a nice balanced binary tree.
There are N
comparisons done in the partitioning at each level, and there are log_{2} N
levels. The complexity of QuickSort in good conditions is O(N log_{2} N).
Lecture 7
Graphs
Graphs are data structures rather like trees. However, graphs are different from trees in many way. In trees a node can have only one parent. In graphs, this limitation is overcome.
A tree is a special case of a graph with restricted relations between nodes. Graphs are more general forms of trees, allowing for arbitrary relations between nodes.
Trees are more closely associated with computer science. Trees are shaped in a way to make it easy to search for data and insert new data.
Graphs were studied long before a computer was invented. Graphs often have a shape dictated by a physical problem, rather than by the algorithms used on them. In graphs, we call a node a vertex, and a relation an edge. The nomenclature arose in mathematics centuries ago.
Terminology
A graph consists of a set of vertices, V, and a set of edges, E: G = {V, E}
.
The vertex set V
The vertext set contains all of the vertices in a graph. Each vertex can be labelled by a name, or by a number.
The edge set E
The edge set contains all the edges in a graph. An edge connects two vertices, source and destination, and taking a form (source, destination, weight)
, where weight is the weight giving to this specific connection.
Directed graphs  Digraphs
Edges are directed, only go from source to destination. This is indicated by edges with arrows indicating the direction of the edge. The arrow head points to the destination. If not specified, each edge is given a weight of 1.0 by default.
Undirected graphs
Edges are undirected, or go in both directions. Indicated by edges without arrows. Only upward edges are defined  When the destination is greater than or equal to the source. The downward edges can be obtained by swapping the source and destination in each upward case.
Again, each edge is given a default weight of 1.0 if not specified.
Adjacent vertices
A vertex is adjacent to another if there is an edge to it from that other vertex. Basically two vertices are adjacent if there is a path between them (through other vertices if need be?)
Adjacency lists
Each list stores the vertices (or the corresponding edges) adjacent to a particular vertex.
Directed
Vertex  Adjacency 
A  (A, B, 1.0) 
B  (B, A, 1.0), (B, E, 1.0) 
C  
D  (D, A, 1.0) 
E  (E, A, 1.0), (E, C, 1.0), (E, D, 1.0) 
Undirected
Vertex  Adjacency Lists 
0  (0, 1, 1.0), (0, 2, 1.0), (0, 3, 1.0), (0, 5, 1.0) 
1  (1, 2, 1.0), (1, 4, 1.0), (1, 6, 1.0) 
2  (2, 4, 1.), (2, 5, 1.0) 
3  (3, 5, 1.0) 
4  (4, 6, 1.0) 
5  (5, 6, 1.0) 
6 
For undirected graphs, only list the upward edges. The downward edges can be obtained by swapping the source / destination in each upward edge.
Adjacency matrices
The matrix stores the weights (assuming the default weight is 1.0
for two vertices with an edge, and 0
for two vertices without an edge.
Directed
W[A][B] = 1.0
, so there is an edge from A
to B
. W[D][C] = 0.0
so there is no edge from D
to C
.
Undirected
Only store the lower triangle of the array. When destination <= source check if W[source][dest] = 1.0
or 0.0
for an edge or no edge.
When dest > source
check if W[dest][source] = 1.0
or 0.0
for an edge or no edge.
Reading a graph representation from a File
A graph definition file contains a list of vertex pairs with edges, with weight != 0.0
.
Using the examples from above:
Format
number of vertices
directed / undirected (u/d)
Edge1 from vertex A (or 0) 1.0
Edge2 from vertex A (or 0) 1.0
...
Edge1 from vertex B (or 1) 1.0
Edge2 from vertex B (or 1) 1.0
Directed
5
d
A B 1.0
B A 1.0
B E 1.0
D A 1.0
E A 1.0
E C 1.0
E D 1.0
Undirected
7
u
0 1 1.0
0 2 1.0
0 3 1.0
0 5 1.0
1 2 1.0
1 4 1.0
1 6 1.0
2 4 1.0
2 5 1.0
3 5 1.0
4 6 1.0
5 6 1.0
Typical applications
 Finding the shortest path between two vertices
 Minimum cost scheduling
 Maximum probability event sequence
 Maximum network flow
Representing a graph in a program
Two classes are used: class Edge
to represent a single edge (source, destination, weight)
and class Graph
to represent a complete graph with num_v
vertices, directed
or undirected
. The Graph
class uses an array of adjacency lists to represent the edges associated with each vertex.
The other approach is the Adjacency Matrix representation. Where the graph is loaded from a graph definition file.
The Edge
class
Header
class Edge {
public:
Edge(int src, int dst, double weight = 1.0);
~Edge() {};
int get_source() const { return source; }
int get_dest() const { return dest; }
double get_weight() const { return weight; }
bool operator==(const Edge& other);
private:
int source, dest;
double weight;
};
Source
#include "Edge.h"
Edge::Edge(int src, int dst, double weight)
: source(src), dest(dst), weight(weight)
{}
bool Edge::operator==(const Edge& other)
{
return (source == other.source && dest == other.dest);
}
The Graph
class
Header
#include "Edge.h"
#include <list>
using std::list;
class Graph {
public:
Graph(char* file_name);
~Graph();
int get_num_v() const {return num_v; }
Edge get_edge(int source, int destination) const;
list<Edge>::iterator begin(int vertex) const {
return edges[vertex.begin();]
}
list<Edge>::iterator end(int vertex) const {
return edges[vertex].end();
}
private:
int num_v;
bool directed;
list<Edge>* edges;
};
Source
#include "Graph.h"
#inculde <iostream>
#include <fstream>
#include <algorithm> // part of the C++ standard library
using namespace std;
Graph::Graph(char* fname) {
edges = 0;
ifstream fin(fname);
if(!fin) {
cout << "Can't open file : " << fname << endl;
return;
}
fin >> num_v;
char c;
fin >> c;
if (c == 'd') directed = true;
else if (c == 'u') directed = false;
edges = new list<Edge>[num_v];
int source, dest; double weight;
while(fin >> source >> dest >> weight) {
edges[source].push_back(Edge(source, dest, weight));
if (!directed) edges[dest].push_back(Edge(dest, source, weight));
}
fin.close();
}
Graph::~Graph() {
if (edges) delete[] edges;
}
Edge Graph:: get_edge(int source, int dest) const {
if (edges[source].size() == 0) return Edge(1, 1, 0.);
list<Edge>::iterator itr = find(edges[source].begin(), edges[source].end(), Edge(source, dest));
if (itr != edges[source].end()) return *itr;
return Edge(1, 1, 0.);
}
Traversal of graphs
Determining which vertices can be reached from a specified vertex, and/or the shortest paths in terms of going through the smallest number of vertices.
There are two fundamental algorithms: a breadthfirst search and a depthfirst search.
Breadthfirst search
Visit the start vertex first of all, then all vertices that are adjacent to it next, then all vertices that are adjacent to each of the adjacent vertices, and so on, until all the vertices have been visited.

Variables:
queue
 stores the vertices to visit.visited
bool
array of sizenum_v
, noting the visit status of each vertex.backtrack
 array of sizenum_v
, noting the parent vertex of each vertex, for retrieving the shortest path. 
Initialisation:
For all verticesv
, setvisited[v] = false;
backtrack[v] = 1;
.
Take thestart
vertex, push it into queue, and mark it as visited:visited[start] = true;

Breadthfirst search:
whilequeue
is not empty, take a vertex source fromqueue
.
for each of its adjacent verticesdest
, ifdest
has not been visited:
pushdest
intoqueue
, mark it as visitedvisited[dest] = true;
, and set the parent ofdest
assource
:backtrack[dest] = source;

Backtracking:
For any vertexv
, retrieve the shortest path to it fromstart
through backtracking.
To retrieve the shortest path through backtracking, assume 0
is the start
vertex. After search, the backtrack
array may look like the following, in which each element notes the parent vertex that leads to the current vertex.
Shortest paths for weighted graphs
Breadthfirst search finds the shortest path assuming each edge has the same weight (ie. equal length). Weights can have arbitrary positive values, eg distance or cost to be minimised through selecting a best path.
For example, in speech recognition, the vertices may represent individual words, and the weight represent probabilities of words following one another to form a sentence. The maximum probability word sequence given a speech signal is found.
Dijkstra's algorithm
This algorithm finds the shortest paths from one specified vertex to all the other vertices. A greedy algorithm which solves a problem in stages by doing what appears to be the best thing at each stage.

Variables:
set v_s
 store unprocessed vertices.distance
 array of sizenum_v
, holding the shortest distance from start vertex to each vertex.backtrack
 array of sizenum_v
, noting the parent vertex of each vertex, for retrieving the shortest path. 
Initialisation:
Insert all vertices except thestart
vertex intov_s
.
For each vertexv
inv_s
, setdistance[v]
to the weightweight(start, v)
or infinity if no edge.
For each vertexv
inv_s
, setbacktrack[v]
as thestart
vertex. 
Dijkstra's algorithm:
whilev_s
is not empty, for allu
inv_s
, find theu
with the smallestdistance[u]
Removeu
fromv_s
For each ofu
's adjacent vertices,v
, ifdistance[u] + weight(u, v) < distance[v]
Setdistance[v] = distance[v] + weight(u, v)
and setbacktrack[v] = u
.
Complexity of single source shortest path algorithms
Unweighted graphs using a breadthfirst search: O(E)
, there E
is the total number of edges.
For weighted graphs, with unbounded nonnegative weights, dijkstra's original algorithm O(V^{2})
, where V
is the number of vertices. The implementation based on a minpriority queue: O(E + V log V)
 so far this is the fastest.
C++ File I/O