Data Structures
The structure of these notes takes inspiration from my computer vision notes.
at a glance:
| Data Structure | Average Access | Average Search | Average Insertion | Average Deletion | Worst Access | Worst Search | Worst Insertion | Worst Deletion | Space Complexity (Worst) |
|---|---|---|---|---|---|---|---|---|---|
| Array | Θ(1) | Θ(n) | Θ(n) | Θ(n) | O(1) | O(n) | O(n) | O(n) | O(n) |
| Stack | Θ(n) | Θ(n) | Θ(1) | Θ(1) | O(n) | O(n) | O(1) | O(1) | O(n) |
| Queue | Θ(n) | Θ(n) | Θ(1) | Θ(1) | O(n) | O(n) | O(1) | O(1) | O(n) |
| Singly-Linked List | Θ(n) | Θ(n) | Θ(1) | Θ(1) | O(n) | O(n) | O(1) | O(1) | O(n) |
| Doubly-Linked List | Θ(n) | Θ(n) | Θ(1) | Θ(1) | O(n) | O(n) | O(1) | O(1) | O(n) |
| Hash Table | N/A | Θ(1) | Θ(1) | Θ(1) | N/A | O(n) | O(n) | O(n) | O(n) |
| Binary Search Tree | Θ(log(n)) | Θ(log(n)) | Θ(log(n)) | Θ(log(n)) | O(n) | O(n) | O(n) | O(n) | O(n) |
| B-Tree | Θ(log(n)) | Θ(log(n)) | Θ(log(n)) | Θ(log(n)) | O(log(n)) | O(log(n)) | O(log(n)) | O(log(n)) | O(n) |
| Red-Black Tree | Θ(log(n)) | Θ(log(n)) | Θ(log(n)) | Θ(log(n)) | O(log(n)) | O(log(n)) | O(log(n)) | O(log(n)) | O(n) |
| AVL Tree | Θ(log(n)) | Θ(log(n)) | Θ(log(n)) | Θ(log(n)) | O(log(n)) | O(log(n)) | O(log(n)) | O(log(n)) | O(n) |
primitives
schematic/s
python
| type | example |
|---|---|
| `int` | `32` |
| `float` | `3.2` |
| `complex` | `(3+2j)` |
| `bool` | `False` |
| `str` | `"Hello World!"` |
| `bytes` | `b'\x00\x00\x00'` |
| `NoneType` | `None` |
java
| type | size (bits) | description | usage |
|---|---|---|---|
| `byte` | 8 | -128 to 127 | `byte b = 100;` |
| `short` | 16 | -32,768 to 32,767 | `short s = 1000;` |
| `int` | 32 | -$2^{31}$ to $2^{31}-1$ | `int i = 12345;` |
| `long` | 64 | -$2^{63}$ to $2^{63}-1$ | `long l = 123456789L;` |
| `float` | 32 | single-precision floating-point number | `float f = 3.14f;` |
| `double` | 64 | double-precision floating-point number | `double d = 3.14159;` |
| `char` | 16 | single 16-bit unicode character | `char c = 'A';` |
| `boolean` | 1 bit or 1 byte (JVM-dependent) | `true` or `false` | `boolean flag = true;` |
c
| type | size (bits) | description | usage |
|---|---|---|---|
| `char` | typically 8 | character type (smallest addressable unit) | `char c = 'A';` |
| `short` | typically 16 | short integer | `short s = 1000;` |
| `int` | typically 32 | integer number | `int i = 42;` |
| `long` | typically 32 or 64 | long integer | `long l = 123456L;` |
| `long long` | typically $\geq$ 64 | extended-precision integer | `long long ll = 123456789LL;` |
| `float` | typically 32 | single-precision floating-point number | `float f = 3.14f;` |
| `double` | typically 64 | double-precision floating-point number | `double d = 3.14159;` |
| `long double` | $\geq$ 80 (platform-dependent) | extended-precision floating-point number | `long double ld = 3.141592653589793L;` |
| `_Bool` | typically 8 | boolean value (true or false, since C99) | `_Bool flag = 1;` or `bool flag = true;` |
| `void` | — | represents "no value" or "no type" | `void functionName();` |
arrays
schematic
time complexity
| | Access | Search | Insertion | Deletion |
|---|---|---|---|---|
| Average | $\mathcal{O}(1)$ | $\mathcal{O}(n)$ | $\mathcal{O}(n)$ | $\mathcal{O}(n)$ |
| Worst | $\mathcal{O}(1)$ | $\mathcal{O}(n)$ | $\mathcal{O}(n)$ | $\mathcal{O}(n)$ |
explanations are warranted for these. access will be thought of as the time complexity required to get at the $k$th item in the data structure.
to "access" the $k$th item we can index into the array: constant time.
searching for a particular key is not something we can do intelligently in this contiguous block of memory, so we must check all $n$ items.
insertion would on average take $\dfrac n2$ steps, but because we are working with asymptotics the constant factor disappears. the worst case is insertion at the front of the array, where every existing item has to be shifted one position along.
deletion follows a similar argument, with the worst case being deletion of the first element, and the average case decaying to the worst-case bound asymptotically.
space complexity
this is less of a question when you have a data structure, as opposed to an algorithm, because with $n$ elements you will have to store all of them uniquely. as such, for arrays and all data structures on this page we have $\mathcal{O}(n)$ space complexity.
operations
naturally, whilst we are considering access, search, insertion and deletion operations on this page, an array is represented in Python as a list. these lists have the following methods:
| method | explanation |
|---|---|
| `append(arg)` | adds an element to the end of the list |
| `clear()` | removes all the elements from the list |
| `copy()` | returns a copy of the list |
| `count(arg)` | returns the number of elements with value `arg` |
| `extend(another_list)` | adds the elements of another list to this list |
| `index(arg)` | returns the index of the first element with value `arg` |
| `insert(i, arg)` | adds element `arg` at position `i` |
| `pop(i)` | removes and returns the element at position `i` (last element by default) |
| `remove(arg)` | removes the first item with value `arg` |
| `reverse()` | reverses the order of the list in place |
| `sort()` | sorts the list in place. mutates the array |
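
a quick illustration of a few of these methods (the values here are arbitrary):

```python
# a few of the list methods above in action
nums = [3, 1, 4]

nums.append(1)        # [3, 1, 4, 1]
nums.extend([5, 9])   # [3, 1, 4, 1, 5, 9]
nums.insert(0, 2)     # [2, 3, 1, 4, 1, 5, 9] -- insert at the front: everything shifts (the worst case above)
nums.remove(1)        # [2, 3, 4, 1, 5, 9]    -- removes the first 1 only
last = nums.pop()     # returns 9, list is now [2, 3, 4, 1, 5]

print(nums.count(1))  # 1
print(nums.index(4))  # 2

nums.sort()           # [1, 2, 3, 4, 5] -- in place
nums.reverse()        # [5, 4, 3, 2, 1]
```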
linked lists
singly linked list
schematic
time complexity
| | Access | Search | Insertion | Deletion |
|---|---|---|---|---|
| Average | $\mathcal{O}(n)$ | $\mathcal{O}(n)$ | $\mathcal{O}(1)$ | $\mathcal{O}(1)$ |
| Worst | $\mathcal{O}(n)$ | $\mathcal{O}(n)$ | $\mathcal{O}(1)$ | $\mathcal{O}(1)$ |
once again, explanations are necessary:
- to access the $k$th item, we need to start at the `head` and make our way to this item; hence $\mathcal{O}(n)$ for access
- same for search; even if you know which item you want, there is no way of knowing where it is. you have to traverse from the `head`.
- the very act of insertion will take constant time. if you wish to find a "middle" item and then insert there, the complexity would be $\mathcal{O}(n)$, but here we are decoupling the operations (see the sketch below)
- same as above. simply freeing memory / deleting a node will take constant time.
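
a minimal sketch of a singly linked list (the class and method names here are my own, purely illustrative), showing why inserting at the `head` is constant time while access and search require traversal:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class SinglyLinkedList:
    def __init__(self):
        self.head = None

    def insert_at_head(self, value):   # O(1): no traversal needed
        node = Node(value)
        node.next = self.head
        self.head = node

    def access(self, k):               # O(n): walk from head to the kth node
        node = self.head
        for _ in range(k):
            node = node.next
        return node.value

    def search(self, target):          # O(n): must traverse from head
        node = self.head
        while node is not None:
            if node.value == target:
                return True
            node = node.next
        return False

lst = SinglyLinkedList()
for v in [3, 2, 1]:
    lst.insert_at_head(v)
print(lst.access(0), lst.search(3))    # 1 True
```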
doubly linked lists
schematic
time complexity
| | Access | Search | Insertion | Deletion |
|---|---|---|---|---|
| Average | $\mathcal{O}(n)$ | $\mathcal{O}(n)$ | $\mathcal{O}(1)$ | $\mathcal{O}(1)$ |
| Worst | $\mathcal{O}(n)$ | $\mathcal{O}(n)$ | $\mathcal{O}(1)$ | $\mathcal{O}(1)$ |
with respect to the asymptotics of operations, the doubly linked list provides no advantage over the singly linked list.
tradeoffs
non-contiguous use of memory is an advantage in terms of finding more memory for nodes, but is also a disadvantage in terms of traversal.
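
for completeness, a sketch of a doubly linked node: the extra `prev` pointer is a practical (not asymptotic) convenience, e.g. unlinking a node you already hold a reference to:

```python
class DNode:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

def unlink(node):
    """remove a node we already have a reference to, in O(1),
    by rewiring its neighbours (head/tail bookkeeping omitted)."""
    if node.prev is not None:
        node.prev.next = node.next
    if node.next is not None:
        node.next.prev = node.prev
    node.prev = node.next = None
```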
strings
strings really are just special cases of arrays. but because their operations vary so wildly from the usual array operations:
- concatenation
- joining
- comparison
- splitting
- searching for substrings
it makes sense for this topic to have its own little nook.
this page is not about algorithms, and so there is nothing really novel to add here at the moment, but a number of clever algorithms will relate back to this heading.
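
still, a few of the operations listed above, in Python (the strings are arbitrary):

```python
s = "hello"
t = "world"

print(s + " " + t)              # concatenation: "hello world"
print(", ".join(["a", "b"]))    # joining: "a, b"
print(s == t, s < t)            # comparison: False True (lexicographic)
print("a,b,c".split(","))       # splitting: ['a', 'b', 'c']
print("lo" in s, s.find("lo"))  # substring search: True 3
```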
stacks & queues
stack
schematic
time complexity
| | Access | Search | Insertion | Deletion |
|---|---|---|---|---|
| Average | $\mathcal{O}(n)$ | $\mathcal{O}(n)$ | $\mathcal{O}(1)$ | $\mathcal{O}(1)$ |
| Worst | $\mathcal{O}(n)$ | $\mathcal{O}(n)$ | $\mathcal{O}(1)$ | $\mathcal{O}(1)$ |
as expected:
- random access will take $n$ steps; presumably a linked-list implementation is being used here, so arbitrary accesses will always take (asymptotically) $n$ steps
- similarly search as in an array or linked list requires $n$ steps
- the advantage comes from inserting into the stack which always takes constant time
- and deletion costs constant time. this is the use-case for this data structure anyhow
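
a Python list already behaves as a stack if we only ever `append` and `pop` at the end (one possible implementation, not the only one):

```python
stack = []

stack.append(1)     # push: O(1)
stack.append(2)
stack.append(3)

print(stack.pop())  # 3 -- LIFO: last in, first out
print(stack.pop())  # 2
print(3 in stack)   # False -- search still requires scanning: O(n)
```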
queue
schematic
time complexity
| | Access | Search | Insertion | Deletion |
|---|---|---|---|---|
| Average | $\mathcal{O}(n)$ | $\mathcal{O}(n)$ | $\mathcal{O}(1)$ | $\mathcal{O}(1)$ |
| Worst | $\mathcal{O}(n)$ | $\mathcal{O}(n)$ | $\mathcal{O}(1)$ | $\mathcal{O}(1)$ |
this is the same as the stack:
- underlying implementation details would be identical, so access behaviour wouldn't change
- neither would search functionality
- only the location of insertion
- and location of deletion would change
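
for a queue, `collections.deque` gives constant-time operations at both ends (a plain Python list would make dequeuing from the front $\mathcal{O}(n)$):

```python
from collections import deque

queue = deque()

queue.append("a")       # enqueue at the back: O(1)
queue.append("b")
queue.append("c")

print(queue.popleft())  # "a" -- FIFO: first in, first out
print(queue.popleft())  # "b"
```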
double ended queue (deque)
this is an interesting data structure. at first I thought it was short for dequeue, but it is not. instead this structure is pronounced "deck", and is a list with two pointers, one to the front and one to the back, so elements can be inserted or removed at either end in constant time.
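
`collections.deque` is exactly this in Python:

```python
from collections import deque

d = deque([2, 3])

d.appendleft(1)   # add at the front: O(1)
d.append(4)       # add at the back: O(1)
print(d)          # deque([1, 2, 3, 4])

d.popleft()       # remove from the front: O(1)
d.pop()           # remove from the back: O(1)
print(d)          # deque([2, 3])
```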
hash tables / hashmaps
Note that insertion order is an implementation detail rather than a property of the data structure: Python's `dict` preserves it (since 3.7), but a hash table in general does not guarantee it.
schematic
time complexity
| | Access | Search | Insertion | Deletion |
|---|---|---|---|---|
| Average | n/a | $\mathcal{O}(1)$ | $\mathcal{O}(1)$ | $\mathcal{O}(1)$ |
| Worst | n/a | $\mathcal{O}(n)$ | $\mathcal{O}(n)$ | $\mathcal{O}(n)$ |
obviously the time complexity depends on how the hash table is implemented and on what we treat as an atomic operation.
having said this, hash-tables are sort of a mystery to me at the moment. they have varying implementations and even then I need to study each to understand what the associated compute would look like for each method.
for now I have just copied down what was given at bigocheatsheet.com
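
for some intuition in the meantime, here is a toy sketch of one common implementation, separate chaining (the class name and bucket count are arbitrary choices of mine):

```python
# toy hash table using separate chaining: hash the key to pick a bucket,
# then scan that bucket's (hopefully short) list of (key, value) pairs
class ToyHashTable:
    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):      # average O(1); worst O(n) if every key collides
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # key already present: overwrite
                return
        bucket.append((key, value))

    def search(self, key):             # average O(1); worst O(n)
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key):             # average O(1); worst O(n)
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket.pop(i)
                return

table = ToyHashTable()
table.insert("apples", 3)
table.insert("pears", 5)
print(table.search("apples"))   # 3
table.delete("apples")
print(table.search("apples"))   # None
```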
trees
schematic
Binary Search Tree
schematic
the critical distinguishing property that makes a binary tree a binary search tree is:
- every key in a node's left subtree must be less than the node's key, and;
- every key in its right subtree must be greater than the node's key.
time complexity
note that the following time complexities are for a binary search tree, because there is really no benefit to performing these operations on a regular binary tree. the time complexities are all $\mathcal{O}(n)$ for the unsorted tree.
| | Access | Search | Insertion | Deletion |
|---|---|---|---|---|
| Average | $\mathcal{O}(\log(n))$ | $\mathcal{O}(\log(n))$ | $\mathcal{O}(\log(n))$ | $\mathcal{O}(\log(n))$ |
| Worst | $\mathcal{O}(n)$ | $\mathcal{O}(n)$ | $\mathcal{O}(n)$ | $\mathcal{O}(n)$ |
naturally, this table warrants explanation.
the average case of $\log(n)$ steps comes from the height of a reasonably balanced tree. the worst case is not the same because a plain binary search tree can degenerate into a chain (for example when keys are inserted in sorted order), so its height, and therefore every operation, becomes $\mathcal{O}(n)$.
insertion into a plain binary search tree causes no rebalancing at all; a binary search tree is not necessarily balanced, which is precisely what the self-balancing variants below (red-black and AVL trees) address.
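
a minimal sketch of insertion and search in a plain (unbalanced) binary search tree; the lack of rebalancing is exactly why the worst case above degrades to $\mathcal{O}(n)$:

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """insert key, returning the (possibly new) root. no rebalancing is done."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root                       # duplicates are simply ignored here

def search(root, key):
    """O(height): O(log n) if the tree is balanced, O(n) if it is a chain."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None

root = None
for k in [8, 3, 10, 1, 6]:
    root = insert(root, k)
print(search(root, 6), search(root, 7))   # True False
```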
full vs. complete vs. balanced vs. perfect
- Complete binary tree
  - every level is completely filled except possibly the last, and all nodes on the last level appear as far left as possible.
  - use-cases: binary-heap, array representations
- Full binary tree
  - every node has either \(0\) or \(2\) children (no node has exactly one child).
  - use-cases: huffman-encoding
- Perfect binary tree
  - both full and complete; equivalently, all internal nodes have \(2\) children and all leaves lie at the same depth.
- Balanced binary tree
  - take every node and find the heights of its left and right subtrees. if the difference in heights exceeds $k=1$, where $k$ can be any criterion we choose, at any node, then the tree is unbalanced. a sketch of this check follows below.
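
a sketch of that balance check with $k=1$; it works on any node object with `left` and `right` attributes (e.g. the `BSTNode` sketch above):

```python
def is_balanced(node):
    """return (balanced?, height); a missing child has height 0."""
    if node is None:
        return True, 0
    left_ok, left_h = is_balanced(node.left)
    right_ok, right_h = is_balanced(node.right)
    balanced = left_ok and right_ok and abs(left_h - right_h) <= 1   # k = 1
    return balanced, 1 + max(left_h, right_h)
```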
Red Black Trees
schematic
time complexity
| | Access | Search | Insertion | Deletion |
|---|---|---|---|---|
| Average | $\mathcal{O}(\log(n))$ | $\mathcal{O}(\log(n))$ | $\mathcal{O}(\log(n))$ | $\mathcal{O}(\log(n))$ |
| Worst | $\mathcal{O}(\log(n))$ | $\mathcal{O}(\log(n))$ | $\mathcal{O}(\log(n))$ | $\mathcal{O}(\log(n))$ |
I've just copied these down from bigocheatsheet.com blindly to close that tab and be done with it for now.
AVL Trees
An AVL tree is a binary search tree that is height-balanced; for each node $x$, the heights of the left and right subtrees of $x$ differ by at most 1.
To implement such a tree, maintain an extra attribute $h$ in each node such that $x.h$ is the height of node $x$.
| | Access | Search | Insertion | Deletion |
|---|---|---|---|---|
| Average | $\mathcal{O}(\log(n))$ | $\mathcal{O}(\log(n))$ | $\mathcal{O}(\log(n))$ | $\mathcal{O}(\log(n))$ |
| Worst | $\mathcal{O}(\log(n))$ | $\mathcal{O}(\log(n))$ | $\mathcal{O}(\log(n))$ | $\mathcal{O}(\log(n))$ |
same thing as above, here. I've just copied the time-complexities down without understanding them.
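
a small sketch of the bookkeeping described above: each node stores its height in `h`, and a balance factor is derived from the children's heights (the rotations that restore balance are omitted):

```python
class AVLNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.h = 1                    # height of a leaf node

def height(node):
    return node.h if node is not None else 0

def update_height(node):
    node.h = 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    # AVL property: this must stay in {-1, 0, 1} for every node;
    # insertions/deletions that violate it trigger rotations (not shown)
    return height(node.left) - height(node.right)
```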
I also worry about the additional 5 'data-structures':
- skip lists
- cartesian tree
- b-tree (this one makes sense)
- splay tree
- kd tree
and then the short few that I have found of my own accord:
- tries
- heaps
- priority queue (of which a heap seems to be one common implementation)
heaps
schematic
graphs
schematic
representations
adjacency matrix, adjacency lists
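
a small sketch of both representations for the same (arbitrary) undirected graph:

```python
# the same 4-vertex undirected graph in both representations
# edges: (0,1), (0,2), (1,2), (2,3)
n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# adjacency matrix: O(n^2) space, O(1) edge lookup
matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = matrix[v][u] = 1

# adjacency list: O(n + e) space, cheap iteration over neighbours
adj = {u: [] for u in range(n)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

print(matrix[0][1], adj[2])   # 1 [0, 1, 3]
```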