
Hands-on Algorithmic Problem Solving

Data Structures, Algorithms, Python Modules and Coding Interview Problem Patterns

Li Yin

February 6, 2022

https://liyinscience.com
Contents

0 Preface 1

1 Reading of This Book 7


1.1 Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Reading Suggestions . . . . . . . . . . . . . . . . . . . . . . . 10

I Introduction 13

2 The Global Picture of Algorithmic Problem Solving 15


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1.1 What? . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.2 How? . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1.3 Organization of the Contents . . . . . . . . . . . . . . 18
2.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3 Problem Modeling . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3.1 Understand Problems . . . . . . . . . . . . . . . . . . 20
2.3.2 Understand Solution Space . . . . . . . . . . . . . . . 22
2.4 Problem Solving . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.4.1 Apply Design Principle . . . . . . . . . . . . . . . . . 25
2.4.2 Algorithm Design and Analysis Principles . . . . . . . 26
2.4.3 Algorithm Categorization . . . . . . . . . . . . . . . . 28
2.5 Programming Languages . . . . . . . . . . . . . . . . . . . . . 28
2.6 Tips for Algorithm Design . . . . . . . . . . . . . . . . . . . . 29
2.7 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.7.1 Knowledge Check . . . . . . . . . . . . . . . . . . . . . 29


3 Coding Interviews and Resources 33


3.1 Tech Interviews . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.1.1 Coding Interviews and Hiring Process . . . . . . . . . 33
3.1.2 Why Coding Interviews? . . . . . . . . . . . . . . . . . 35
3.2 Tips and Resources . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2.1 Tips . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2.2 Resources . . . . . . . . . . . . . . . . . . . . . . . . . 38

II Warm Up: Abstract Data Structures and Tools 43

4 Abstract Data Structures 47


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.2 Linear Data Structures . . . . . . . . . . . . . . . . . . . . . . 48
4.2.1 Array . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.2.2 Linked List . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2.3 Stack and Queue . . . . . . . . . . . . . . . . . . . . . 50
4.2.4 Hash Table . . . . . . . . . . . . . . . . . . . . . . . . 51
4.3 Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . 56
4.3.2 Types of Graphs . . . . . . . . . . . . . . . . . . . . . 56
4.3.3 Reference . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.4 Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . 59
4.4.2 N-ary Trees and Binary Tree . . . . . . . . . . . . . . . 62

5 Introduction to Combinatorics 65
5.1 Permutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.1.1 n Things in m positions . . . . . . . . . . . . . . . . . 66
5.1.2 Recurrence Relation and Math Induction . . . . . . . 67
5.1.3 See Permutation in Problems . . . . . . . . . . . . . . 67
5.2 Combination . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.2.1 Recurrence Relation and Math Induction . . . . . . . 68
5.3 Partition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.3.1 Integer Partition . . . . . . . . . . . . . . . . . . . . . 69
5.3.2 Set Partition . . . . . . . . . . . . . . . . . . . . . . . 69
5.4 Array Partition . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.5 Merge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.6 More Combinatorics . . . . . . . . . . . . . . . . . . . . . . . 72

6 Recurrence Relations 75
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.2 General Methods to Solve Linear Recurrence Relation . . . . 78
6.2.1 Iterative Method . . . . . . . . . . . . . . . . . . . . . 78

6.2.2 Recursion Tree . . . . . . . . . . . . . . . . . . . . . . 79


6.2.3 Mathematical Induction . . . . . . . . . . . . . . . . . 80
6.3 Solve Homogeneous Linear Recurrence Relation . . . . . . . . 81
6.4 Solve Non-homogeneous Linear Recurrence Relation . . . . . 83
6.5 Useful Math Formulas . . . . . . . . . . . . . . . . . . . . . . 84
6.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

III Get Started: Programming and Python Data Structures 85

7 Iteration and Recursion 89


7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7.2 Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
7.3 Factorial Sequence . . . . . . . . . . . . . . . . . . . . . . . . 91
7.4 Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
7.5 Iteration VS Recursion . . . . . . . . . . . . . . . . . . . . . . 94
7.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

8 Bit Manipulation 97
8.1 Python Bitwise Operators . . . . . . . . . . . . . . . . . . . . 97
8.2 Python Built-in Functions . . . . . . . . . . . . . . . . . . . . 99
8.3 Twos-complement Binary . . . . . . . . . . . . . . . . . . . . 100
8.4 Useful Combined Bit Operations . . . . . . . . . . . . . . . . 102
8.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
8.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

9 Python Data Structures 111


9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
9.2 Array and Python Sequence . . . . . . . . . . . . . . . . . . . 112
9.2.1 Introduction to Python Sequence . . . . . . . . . . . . 112
9.2.2 Range . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
9.2.3 String . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
9.2.4 List . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
9.2.5 Tuple . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
9.2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 124
9.2.7 Bonus . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
9.2.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . 124
9.3 Linked List . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
9.3.1 Singly Linked List . . . . . . . . . . . . . . . . . . . . 126
9.3.2 Doubly Linked List . . . . . . . . . . . . . . . . . . . . 130
9.3.3 Bonus . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

9.3.4 Hands-on Examples . . . . . . . . . . . . . . . . . . . 132


9.3.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . 134
9.4 Stack and Queue . . . . . . . . . . . . . . . . . . . . . . . . . 134
9.4.1 Basic Implementation . . . . . . . . . . . . . . . . . . 135
9.4.2 Deque: Double-Ended Queue . . . . . . . . . . . . . . 137
9.4.3 Python built-in Module: Queue . . . . . . . . . . . . . 138
9.4.4 Bonus . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
9.4.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . 140
9.5 Hash Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
9.5.1 Implementation . . . . . . . . . . . . . . . . . . . . . . 142
9.5.2 Python Built-in Data Structures . . . . . . . . . . . . 145
9.5.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . 148
9.6 Graph Representations . . . . . . . . . . . . . . . . . . . . . . 149
9.6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . 149
9.6.2 Use Dictionary . . . . . . . . . . . . . . . . . . . . . . 152
9.7 Tree Data Structures . . . . . . . . . . . . . . . . . . . . . . . 153
9.7.1 LeetCode Problems . . . . . . . . . . . . . . . . . . . 155
9.8 Heap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
9.8.1 Basic Implementation . . . . . . . . . . . . . . . . . . 158
9.8.2 Python Built-in Library: heapq . . . . . . . . . . . . . 162
9.9 Priority Queue . . . . . . . . . . . . . . . . . . . . . . . . . . 166
9.10 Bonus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
9.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

IV Core Principle: Algorithm Design and Analysis 173

10 Algorithm Complexity Analysis 177


10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
10.2 Asymptotic Notations . . . . . . . . . . . . . . . . . . . . . . 180
10.3 Practical Guideline . . . . . . . . . . . . . . . . . . . . . . . . 183
10.4 Time Recurrence Relation . . . . . . . . . . . . . . . . . . . . 184
10.4.1 General Methods to Solve Recurrence Relation . . . . 185
10.4.2 Solve Divide-and-Conquer Recurrence Relations . . . 188
10.4.3 Hands-on Example: Insertion Sort . . . . . . . . . . . 190
10.5 *Amortized Analysis . . . . . . . . . . . . . . . . . . . . . . . 192
10.6 Space Complexity . . . . . . . . . . . . . . . . . . . . . . . . . 192
10.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
10.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
10.8.1 Knowledge Check . . . . . . . . . . . . . . . . . . . . . 194

11 Search Strategies 195


11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
11.2 Uninformed Search Strategies . . . . . . . . . . . . . . . . . . 198
11.2.1 Breadth-first Search . . . . . . . . . . . . . . . . . . . 199
11.2.2 Depth-first Search . . . . . . . . . . . . . . . . . . . . 201
11.2.3 Uniform-Cost Search (UCS) . . . . . . . . . . . . . . 203
11.2.4 Iterative-Deepening Search . . . . . . . . . . . . . . . 204
11.2.5 Bidirectional Search** . . . . . . . . . . . . . . . . . . 206
11.2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 208
11.3 Graph Search . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
11.3.1 Depth-first Search in Graph . . . . . . . . . . . . . . . 210
11.3.2 Breadth-first Search in Graph . . . . . . . . . . . . . 215
11.3.3 Depth-first Graph Search . . . . . . . . . . . . . . . . 218
11.3.4 Breadth-first Graph Search . . . . . . . . . . . . . . . 222
11.4 Tree Traversal . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
11.4.1 Depth-First Tree Traversal . . . . . . . . . . . . . . . 224
11.4.2 Iterative Tree Traversal . . . . . . . . . . . . . . . . . 227
11.4.3 Breadth-first Tree Traversal . . . . . . . . . . . . . . . 231
11.5 Informed Search Strategies** . . . . . . . . . . . . . . . . . . 232
11.5.1 Best-first Search . . . . . . . . . . . . . . . . . . . . . 232
11.5.2 Hands-on Examples . . . . . . . . . . . . . . . . . . . 233
11.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
11.6.1 Coding Practice . . . . . . . . . . . . . . . . . . . . . 234

12 Combinatorial Search 235


12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
12.2 Backtracking . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
12.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . 238
12.2.2 Permutations . . . . . . . . . . . . . . . . . . . . . . . 239
12.2.3 Combinations . . . . . . . . . . . . . . . . . . . . . . . 245
12.2.4 More Combinatorics . . . . . . . . . . . . . . . . . . . 247
12.2.5 Backtracking in Action . . . . . . . . . . . . . . . . . . 250
12.3 Solving CSPs . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
12.4 Solving Combinatorial Optimization Problems . . . . . . . . . 254
12.4.1 Knapsack Problem . . . . . . . . . . . . . . . . . . . . 256
12.4.2 Travelling Salesman Problem . . . . . . . . . . . . . . 260
12.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261

13 Reduce and Conquer 263


13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
13.2 Divide and Conquer . . . . . . . . . . . . . . . . . . . . . . . 265
13.2.1 Concepts . . . . . . . . . . . . . . . . . . . . . . . . . 265
13.2.2 Hands-on Examples . . . . . . . . . . . . . . . . . . . 267
13.3 Constant Reduction . . . . . . . . . . . . . . . . . . . . . . . 269

13.3.1 Concepts . . . . . . . . . . . . . . . . . . . . . . . . . 269


13.3.2 Hands-on Examples . . . . . . . . . . . . . . . . . . . 270
13.4 Divide-and-conquer VS Constant Reduction . . . . . . . . . . 271
13.5 A to B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
13.5.1 Concepts . . . . . . . . . . . . . . . . . . . . . . . . . 272
13.5.2 Practical Guideline and Examples . . . . . . . . . . . 272
13.6 The Skyline Problem . . . . . . . . . . . . . . . . . . . . . . . 273
13.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273

14 Decrease and Conquer 275


14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
14.2 Binary Search . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
14.2.1 Lower Bound and Upper Bound . . . . . . . . . . . . 277
14.2.2 Applications . . . . . . . . . . . . . . . . . . . . . . . 281
14.3 Binary Search Tree . . . . . . . . . . . . . . . . . . . . . . . . 285
14.3.1 Operations . . . . . . . . . . . . . . . . . . . . . . . . 286
14.3.2 Binary Search Tree with Duplicates . . . . . . . . . . 293
14.4 Segment Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
14.4.1 Implementation . . . . . . . . . . . . . . . . . . . . . . 296
14.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
14.5.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . 300

15 Sorting and Selection Algorithms 303


15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
15.2 Python Comparison Operators and Built-in Functions . . . . 305
15.3 Naive Sorting . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
15.3.1 Insertion Sort . . . . . . . . . . . . . . . . . . . . . . . 308
15.3.2 Bubble Sort and Selection Sort . . . . . . . . . . . . . 310
15.4 Asymptotically Best Sorting . . . . . . . . . . . . . . . . . . . 313
15.4.1 Merge Sort . . . . . . . . . . . . . . . . . . . . . . . . 314
15.4.2 HeapSort . . . . . . . . . . . . . . . . . . . . . . . . . 316
15.4.3 Quick Sort and Quick Select . . . . . . . . . . . . . . 316
15.5 Linear Sorting . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
15.5.1 Bucket Sort . . . . . . . . . . . . . . . . . . . . . . . . 320
15.5.2 Counting Sort . . . . . . . . . . . . . . . . . . . . . . . 322
15.5.3 Radix Sort . . . . . . . . . . . . . . . . . . . . . . . . 326
15.6 Python Built-in Sort . . . . . . . . . . . . . . . . . . . . . . . 331
15.7 Summary and Bonus . . . . . . . . . . . . . . . . . . . . . . . 334
15.8 LeetCode Problems . . . . . . . . . . . . . . . . . . . . . . . . 335

16 Dynamic Programming 341


16.1 Introduction to Dynamic Programming . . . . . . . . . . . . 343
16.1.1 Concepts . . . . . . . . . . . . . . . . . . . . . . . . . 343
16.1.2 From Complete Search to Dynamic Programming . . . 345

16.1.3 Fibonacci Sequence . . . . . . . . . . . . . . . . . . . . 346


16.2 Dynamic Programming Knowledge Base . . . . . . . . . . . . 348
16.2.1 When? Two properties . . . . . . . . . . . . . . . . . . 348
16.2.2 How? Five Elements and Steps . . . . . . . . . . . . . 350
16.2.3 Which? Tabulation or Memoization . . . . . . . . . . 352
16.3 Hands-on Examples (Main-course Examples) . . . . . . . . . 352
16.3.1 Exponential Problem: Triangle . . . . . . . . . . . . . 353
16.3.2 Polynomial Problem: Maximum Subarray . . . . . . . 355
16.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
16.4.1 Knowledge Check . . . . . . . . . . . . . . . . . . . . . 357
16.4.2 Coding Practice . . . . . . . . . . . . . . . . . . . . . 357
16.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371

17 Greedy Algorithms 375


17.1 Exploring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
17.2 Introduction to Greedy Algorithm . . . . . . . . . . . . . . . 380
17.3 *Proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
17.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . 383
17.3.2 Greedy Stays Ahead . . . . . . . . . . . . . . . . . . . 385
17.3.3 Exchange Arguments . . . . . . . . . . . . . . . . . . . 386
17.4 Design Greedy Algorithm . . . . . . . . . . . . . . . . . . . . 390
17.5 Classical Problems . . . . . . . . . . . . . . . . . . . . . . . . 391
17.5.1 Scheduling . . . . . . . . . . . . . . . . . . . . . . . . 391
17.5.2 Partition . . . . . . . . . . . . . . . . . . . . . . . . . 398
17.5.3 Data Compression, File Merge . . . . . . . . . . . . . 400
17.5.4 Fractional S . . . . . . . . . . . . . . . . . . . . . . . 401
17.5.5 Graph Algorithms . . . . . . . . . . . . . . . . . . . . 401
17.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401

18 Hands-on Algorithmic Problem Solving 403


18.1 Direct Approach . . . . . . . . . . . . . . . . . . . . . . . . . 403
18.1.1 Search in Graph . . . . . . . . . . . . . . . . . . . . . 403
18.1.2 Self-Reduction . . . . . . . . . . . . . . . . . . . . . . 404
18.1.3 Dynamic Programming . . . . . . . . . . . . . . . . . 405
18.2 A to B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
18.2.1 Self-Reduction . . . . . . . . . . . . . . . . . . . . . . 405
18.2.2 Dynamic Programming . . . . . . . . . . . . . . . . . 406
18.2.3 Divide and Conquer . . . . . . . . . . . . . . . . . . . 407

V Classical Algorithms 411

19 Advanced Search on Linear Data Structures 415


19.1 Slow-Faster Pointers . . . . . . . . . . . . . . . . . . . . . . . 416

19.1.1 Array . . . . . . . . . . . . . . . . . . . . . . . . . . . 416


19.1.2 Minimum Window Substring (L76, hard) . . . . . . . 419
19.1.3 When Two Pointers do not work . . . . . . . . . . . . 421
19.1.4 Linked List . . . . . . . . . . . . . . . . . . . . . . . . 421
19.2 Opposite-directional Pointers . . . . . . . . . . . . . . . . . . 426
19.3 Follow Up: Three Pointers . . . . . . . . . . . . . . . . . . . . 427
19.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
19.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430

20 Advanced Graph Algorithms 431


20.1 Cycle Detection . . . . . . . . . . . . . . . . . . . . . . . . . . 432
20.2 Topological Sort . . . . . . . . . . . . . . . . . . . . . . . . . 434
20.3 Connected Components . . . . . . . . . . . . . . . . . . . . . 438
20.3.1 Connected Components Detection . . . . . . . . . . . 439
20.3.2 Strongly Connected Components . . . . . . . . . . . . 442
20.4 Minimum Spanning Trees . . . . . . . . . . . . . . . . . . . . 444
20.4.1 Kruskal’s Algorithm . . . . . . . . . . . . . . . . . . . 445
20.4.2 Prim’s Algorithm . . . . . . . . . . . . . . . . . . . . . 448
20.5 Shortest-Paths Algorithms . . . . . . . . . . . . . . . . . . . . 453
20.5.1 Algorithm Design . . . . . . . . . . . . . . . . . . . . . 454
20.5.2 The Bellman-Ford Algorithm . . . . . . . . . . . . . . 461
20.5.3 Dijkstra’s Algorithm . . . . . . . . . . . . . . . . . . . 467
20.5.4 All-Pairs Shortest Paths . . . . . . . . . . . . . . . . . 469

21 Advanced Data Structures 475


21.1 Monotone Stack . . . . . . . . . . . . . . . . . . . . . . . . . . 475
21.2 Disjoint Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
21.2.1 Basic Implementation with Linked-list or List . . . . . 481
21.2.2 Implementation with Disjoint-set Forests . . . . . . . 483
21.3 Fibonacci Heap . . . . . . . . . . . . . . . . . . . . . . . . . . 489
21.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
21.4.1 Knowledge Check . . . . . . . . . . . . . . . . . . . . . 489
21.4.2 Coding Practice . . . . . . . . . . . . . . . . . . . . . 489

22 String Pattern Matching Algorithms 491


22.1 Exact Single-Pattern Matching . . . . . . . . . . . . . . . . . 491
22.1.1 Prefix Function and Knuth Morris Pratt (KMP) . . . 493
22.1.2 More Applications of Prefix Functions . . . . . . . . . 499
22.1.3 Z-function . . . . . . . . . . . . . . . . . . . . . . . . . 499
22.2 Exact Multi-Patterns Matching . . . . . . . . . . . . . . . . . 503
22.2.1 Suffix Trie/Tree/Array Introduction . . . . . . . . . . 503
22.2.2 Suffix Array and Pattern Matching . . . . . . . . . . . 503
22.2.3 Rabin-Karp Algorithm (Exact or Anagram Pattern Matching) . . 509

22.3 Bonus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509


22.4 Trie for String . . . . . . . . . . . . . . . . . . . . . . . . . . . 510

VI Math and Geometry 517

23 Math and Probability Problems 519


23.1 Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
23.1.1 Prime Numbers . . . . . . . . . . . . . . . . . . . . . . 519
23.1.2 Ugly Numbers . . . . . . . . . . . . . . . . . . . . . . 521
23.1.3 Combinatorics . . . . . . . . . . . . . . . . . . . . . . 523
23.2 Intersection of Numbers . . . . . . . . . . . . . . . . . . . . . 526
23.2.1 Greatest Common Divisor . . . . . . . . . . . . . . . . 526
23.2.2 Lowest Common Multiple . . . . . . . . . . . . . . . . 527
23.3 Arithmetic Operations . . . . . . . . . . . . . . . . . . . . . . 528
23.4 Probability Theory . . . . . . . . . . . . . . . . . . . . . . . . 529
23.5 Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . 530
23.6 Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 530
23.7 Miscellaneous Categories . . . . . . . . . . . . . . . . . . . . . 532
23.7.1 Floyd’s Cycle-Finding Algorithm . . . . . . . . . . . . 532
23.8 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533
23.8.1 Number . . . . . . . . . . . . . . . . . . . . . . . . . . 533

VII Problem-Patterns 535

24 Array Questions (15%) 537


24.1 Subarray . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 538
24.1.1 Absolute-conditioned Subarray . . . . . . . . . . . . . 540
24.1.2 Vague-conditioned Subarray . . . . . . . . . . . . . . 547
24.1.3 LeetCode Problems and Misc . . . . . . . . . . . . . . 552
24.2 Subsequence (Medium or Hard) . . . . . . . . . . . . . . . . . 556
24.2.1 Others . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
24.3 Subset (Combination and Permutation) . . . . . . . . . . . . 560
24.3.1 Combination . . . . . . . . . . . . . . . . . . . . . . . 561
24.3.2 Combination Sum . . . . . . . . . . . . . . . . . . . . 564
24.3.3 K Sum . . . . . . . . . . . . . . . . . . . . . . . . . . . 568
24.3.4 Permutation . . . . . . . . . . . . . . . . . . . . . . . 573
24.4 Merge and Partition . . . . . . . . . . . . . . . . . . . . . . . 574
24.4.1 Merge Lists . . . . . . . . . . . . . . . . . . . . . . . . 574
24.4.2 Partition Lists . . . . . . . . . . . . . . . . . . . . . . 574
24.5 Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574
24.5.1 Speedup with Sweep Line . . . . . . . . . . . . . . . . 576
24.5.2 LeetCode Problems . . . . . . . . . . . . . . . . . . . 578

24.6 Intersection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579


24.7 Miscellaneous Questions . . . . . . . . . . . . . . . . . . . . . 580
24.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581
24.8.1 Subsequence with (DP) . . . . . . . . . . . . . . . . . 581
24.8.2 Subset . . . . . . . . . . . . . . . . . . . . . . . . . . . 586
24.8.3 Intersection . . . . . . . . . . . . . . . . . . . . . . . . 587

25 Linked List, Stack, Queue, and Heap Questions (12%) 589


25.1 Linked List . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589
25.2 Queue and Stack . . . . . . . . . . . . . . . . . . . . . . . . . 591
25.2.1 Implementing Queue and Stack . . . . . . . . . . . . . 591
25.2.2 Solving Problems Using Queue . . . . . . . . . . . . . 592
25.2.3 Solving Problems with Stack and Monotone Stack . . 593
25.3 Heap and Priority Queue . . . . . . . . . . . . . . . . . . . . 601

26 String Questions (15%) 605


26.1 Ad Hoc Single String Problems . . . . . . . . . . . . . . . . . 606
26.2 String Expression . . . . . . . . . . . . . . . . . . . . . . . . . 606
26.3 Advanced Single String . . . . . . . . . . . . . . . . . . . . . . 606
26.3.1 Palindrome . . . . . . . . . . . . . . . . . . . . . . . . 606
26.3.2 Calculator . . . . . . . . . . . . . . . . . . . . . . . . . 612
26.3.3 Others . . . . . . . . . . . . . . . . . . . . . . . . . . . 616
26.4 Exact Matching: Sliding Window and KMP . . . . . . . . . . 616
26.5 Anagram Matching: Sliding Window . . . . . . . . . . . . . . 616
26.6 Exact Matching . . . . . . . . . . . . . . . . . . . . . . . . . . 617
26.6.1 Longest Common Subsequence . . . . . . . . . . . . . 617
26.7 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 617
26.7.1 Palindrome . . . . . . . . . . . . . . . . . . . . . . . . 617

27 Tree Questions (10%) 619


27.1 Binary Search Tree . . . . . . . . . . . . . . . . . . . . . . . . 619
27.2 Segment Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . 626
27.3 Trie for String . . . . . . . . . . . . . . . . . . . . . . . . . . . 630
27.4 Bonus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 636
27.5 LeetCode Problems . . . . . . . . . . . . . . . . . . . . . . . . 637

28 Graph Questions (15%) 639


28.1 Basic BFS and DFS . . . . . . . . . . . . . . . . . . . . . . . 639
28.1.1 Explicit BFS/DFS . . . . . . . . . . . . . . . . . . . . 639
28.1.2 Implicit BFS/DFS . . . . . . . . . . . . . . . . . . . . 639
28.2 Connected Components . . . . . . . . . . . . . . . . . . . . . 641
28.3 Islands and Bridges . . . . . . . . . . . . . . . . . . . . . . . . 645
28.4 NP-hard Problems . . . . . . . . . . . . . . . . . . . . . . . . 647

29 Dynamic Programming Questions (15%) 651


29.1 Single Sequence O(n) . . . . . . . . . . . . . . . . . . . . . . . 652
29.1.1 Easy Type . . . . . . . . . . . . . . . . . . . . . . . . 653
29.1.2 Subarray Sum: Prefix Sum and Kadane’s Algorithm . 657
29.1.3 Subarray or Substring . . . . . . . . . . . . . . . . . . 662
29.1.4 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . 664
29.2 Single Sequence O(n²) . . . . . . . . . . . . . . . . . . . . . 664
29.2.1 Subsequence . . . . . . . . . . . . . . . . . . . . . . . 664
29.2.2 Splitting . . . . . . . . . . . . . . . . . . . . . . . . . . 665
29.3 Single Sequence O(n³) . . . . . . . . . . . . . . . . . . . . . 671
29.3.1 Interval . . . . . . . . . . . . . . . . . . . . . . . . . . 671
29.4 Coordinate: BFS and DP . . . . . . . . . . . . . . . . . . . . 677
29.4.1 One Time Traversal . . . . . . . . . . . . . . . . . . . 677
29.4.2 Multiple-time Traversal . . . . . . . . . . . . . . . . . 683
29.4.3 Generalization . . . . . . . . . . . . . . . . . . . . . . 689
29.5 Double Sequence: Pattern Matching DP . . . . . . . . . . . . 690
29.5.1 Longest Common Subsequence . . . . . . . . . . . . . 691
29.5.2 Other Problems . . . . . . . . . . . . . . . . . . . . . . 692
29.5.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 698
29.6 Knapsack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 698
29.6.1 0-1 Knapsack . . . . . . . . . . . . . . . . . . . . . . . 699
29.6.2 Unbounded Knapsack . . . . . . . . . . . . . . . . . . 701
29.6.3 Bounded Knapsack . . . . . . . . . . . . . . . . . . . . 702
29.6.4 Generalization . . . . . . . . . . . . . . . . . . . . . . 702
29.6.5 LeetCode Problems . . . . . . . . . . . . . . . . . . . 703
29.7 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 705
29.7.1 Single Sequence . . . . . . . . . . . . . . . . . . . . . . 705
29.7.2 Coordinate . . . . . . . . . . . . . . . . . . . . . . . . 705
29.7.3 Double Sequence . . . . . . . . . . . . . . . . . . . . . 709

VIII Appendix 711

30 Cool Python Guide 713


30.1 Python Overview . . . . . . . . . . . . . . . . . . . . . . . . . 714
30.1.1 Understanding Objects and Operations . . . . . . . . 714
30.1.2 Python Components . . . . . . . . . . . . . . . . . . . 717
30.2 Data Types and Operators . . . . . . . . . . . . . . . . . . . 719
30.2.1 Arithmetic Operators . . . . . . . . . . . . . . . . . . 720
30.2.2 Assignment Operators . . . . . . . . . . . . . . . . . . 720
30.2.3 Comparison Operators . . . . . . . . . . . . . . . . . . 720
30.2.4 Logical Operators . . . . . . . . . . . . . . . . . . . . 720
30.2.5 Special Operators . . . . . . . . . . . . . . . . . . . . 721
30.3 Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 722

30.3.1 Python Built-in Functions . . . . . . . . . . . . . . . . 722


30.3.2 Lambda Function . . . . . . . . . . . . . . . . . . . . . 722
30.3.3 Map, Filter and Reduce . . . . . . . . . . . . . . . . . 723
30.4 Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 725
30.4.1 Special Methods . . . . . . . . . . . . . . . . . . . . . 725
30.4.2 Class Syntax . . . . . . . . . . . . . . . . . . . . . . . 726
30.4.3 Nested Class . . . . . . . . . . . . . . . . . . . . . . . 726
30.5 Shallow Copy and Deep Copy . . . . . . . . . . . . . . . . . 727
30.5.1 Shallow Copy using Slice Operator . . . . . . . . . . . 727
30.5.2 Iterables, Generators, and Yield . . . . . . . . . . . . 728
30.5.3 Deep Copy using copy Module . . . . . . . . . . . . . 728
30.6 Global VS Nonlocal . . . . . . . . . . . . . . . . . . . . . . . 730
30.7 Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 730
30.8 Special Skills . . . . . . . . . . . . . . . . . . . . . . . . . . . 730
30.9 Supplemental Python Tools . . . . . . . . . . . . . . . . . . . 731
30.9.1 Re . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 731
30.9.2 Bisect . . . . . . . . . . . . . . . . . . . . . . . . . . . 731
30.9.3 collections . . . . . . . . . . . . . . . . . . . . . . . . . 732
List of Figures

1.1 Four umbrellas: each row indicates corresponding parts as


outlined in this book. . . . . . . . . . . . . . . . . . . . . . . 8

2.1 The State Space Graph. This may appear as a tree, but we
can redraw it as a graph. . . . . . . . . . . . . . . . . . . . . 23
2.2 State Transfer process on a linear structure . . . . . . . . . . 24
2.3 State Transfer Process on the tree . . . . . . . . . . . . . . . 25
2.4 Linear Search on explicit linear data structure . . . . . . . . 25
2.5 Binary Search on an implicit Tree Structure . . . . . . . . . . 26
2.6 The State Space Graph . . . . . . . . . . . . . . . . . . . . 30
2.7 State Transfer Tree Structure for LIS; each path represents
a possible solution. Each arrow represents a move: find an
element among the following elements that is larger than the
current node. . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.1 Computer Prices, Computer Speed and Cost/MHz . . . . . . 35


3.2 Topic tags on LeetCode . . . . . . . . . . . . . . . . . . . . . 39
3.3 Use Test Case to debug . . . . . . . . . . . . . . . . . . . . . 39
3.4 Use Test Case to debug . . . . . . . . . . . . . . . . . . . . . 40

4.1 Array Representation . . . . . . . . . . . . . . . . . . . . . . . 48


4.2 Singly Linked List . . . . . . . . . . . . . . . . . . . . . . . . 50
4.3 Doubly Linked List . . . . . . . . . . . . . . . . . . . . . . . . 50
4.4 Stack VS Queue . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.5 Example of a Hash Table, where keys are replaced by indices . 52
4.6 Hash table chaining to resolve collisions . . . . . . . . . . . . 54

xv
xvi LIST OF FIGURES

4.7 Example of graphs. Middle: undirected graph; right: directed
graph; left: an undirected graph represented as directed; right-
most: weighted graph. . . . . . . . . . . . . . . . . . . . . . . 56
4.8 Bipartite Graph . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.9 Example of Trees. Left: Free Tree, Right: Rooted Tree with
height and depth denoted . . . . . . . . . . . . . . . . . . . . 60
4.10 A 6-ary Tree Vs a binary tree. . . . . . . . . . . . . . . . . . . 62
4.11 Example of different types of binary trees . . . . . . . . . . . 63

6.1 The process of constructing a recursion tree for T(n) = 3T(⌊n/4⌋)
+ O(n). There are k+1 levels in total. . . . . . . . . . . . . . 79

7.1 Iteration vs recursion: in recursion, the line denotes the top-
down process and the dashed line denotes the bottom-up process. 89
7.2 Call stack of recursion function . . . . . . . . . . . . . . . . . 92

8.1 Two’s Complement Binary for Eight-bit Signed Integers. . . . 100

9.1 Linked List Structure . . . . . . . . . . . . . . . . . . . . . . 126


9.2 Doubly Linked List . . . . . . . . . . . . . . . . . . . . . . . . 130
9.3 Four ways of graph representation . . . . . . . . . . . . . . . 149
9.4 A max-heap visualized as a binary tree structure on the left,
and implemented with an array on the right. . . . . . . . . . 157
9.5 A Min-heap. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
9.6 Left: delete node 5, and move node 12 to root. Right: 6 is
the smallest among 12, 6, and 7, swap node 6 with node 12. . 160
9.7 Heapify: The last parent node 45. . . . . . . . . . . . . . . . 162
9.8 Heapify: On node 1 . . . . . . . . . . . . . . . . . . . . . . . 162
9.9 Heapify: On node 21. . . . . . . . . . . . . . . . . . . . . . . 162

10.1 Order of Growth of Common Functions . . . . . . . . . . . . 181


10.2 Graphical examples for asymptotic notations . . . . . . . . . 182
10.3 The process of constructing a recursion tree for T(n) = 3T(⌊n/4⌋)
+ O(n). There are k+1 levels in total. . . . . . . . . . . . . . 186
10.4 The cheat sheet for time and space complexity with recurrence
relations; for example, T(n) = T(n−1) + T(n−2) + ... + T(1) +
O(n−1) grows exponentially. The growth orders are called facto-
rial, exponential, quadratic, linearithmic, linear, logarithmic,
and constant. . . . . . . . . . . . . . . . . . . . . . . . . . . 193

11.1 Graph Searching . . . . . . . . . . . . . . . . . . . . . . . . . 197


11.2 Exemplary Acyclic Graph. . . . . . . . . . . . . . . . . . . . 199

11.3 Breadth-first search on a simple search tree. At each stage, the
node to be expanded next is indicated by a marker. . . . . . 200
11.4 Depth-first search on a simple search tree. The unexplored
region is shown in light gray. Explored nodes with no de-
scendants in the frontier are removed from memory as node
L disappears. Dark gray marks nodes that are being explored
but not finished. . . . . . . . . . . . . . . . . . . . . . . . . . 201
11.5 Bidirectional search. . . . . . . . . . . . . . . . . . . . . . . . 206
11.6 Exemplary Graph: Free Tree, Directed Cyclic Graph, and
Undirected Cyclic Graph. . . . . . . . . . . . . . . . . . . . . 209
11.7 Search Tree for Exemplary Graph: Free Tree, Directed Cyclic
Graph, and Undirected Cyclic Graph. . . . . . . . . . . . . . 212
11.8 Depth-first Graph Search Tree. . . . . . . . . . . . . . . . . . 214
11.9 Breadth-first Graph Search Tree. . . . . . . . . . . . . . . . . 217
11.10 The process of Depth-first Graph Search in a Directed Graph.
The black arrows denote the relation of u and its unvisited
neighbors v, and the red arrows mark the backtrack edges. . . 219
11.11 Classification of Edges: black marks tree edges, red marks
back edges, yellow marks forward edges, and blue marks cross
edges. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
11.12 The process of Breadth-first Graph Search. The black arrows
denote the relation of u and its unvisited neighbors v, and
the red arrows mark the backtrack edges. . . . . . . . . . . . 223
11.13 Exemplary Binary Tree . . . . . . . . . . . . . . . . . . . . . 224
11.14 Left: PreOrder, Middle: InOrder, Right: PostOrder. The red
arrows mark the traversal ordering of nodes. . . . . . . . . . . 225
11.15 The process of iterative preorder tree traversal. . . . . . . . . 227
11.16 The process of iterative postorder tree traversal. . . . . . . . 228
11.17 The process of iterative tree traversal. . . . . . . . . . . . . . 230
11.18 The breadth-first traversal order . . . . . . . . . . . . . . . . 231

12.1 A Sudoku puzzle and its solution . . . . . . . . . . . . . . . . 236


12.2 The search tree of permutation . . . . . . . . . . . . . . . . . 240
12.3 The search tree of permutation by swapping. The indexes of
items to be swapped are represented as a two element tuple. 241
12.4 The search tree of permutation with repetition . . . . . . . . 242
12.5 The Search Tree of Combination. . . . . . . . . . . . . . . . . 245
12.6 Acyclic graph . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
12.7 The Search Tree of subsequences. The red circled nodes are
redundant. Each node has a variable s to indicate the starting
index of candidates to add to the current subsequence; i indi-
cates the candidate to add to the current node. . . . . . . . . 250
12.8 A Sudoku puzzle and its solution . . . . . . . . . . . . . . . . 252

12.9 Depth-First Branch and Bound . . . . . . . . . . . . . . . . 258

12.10 A complete undirected weighted graph. . . . . . . . . . . . . 260

13.1 Divide and Conquer Diagram . . . . . . . . . . . . . . . . . . 265


13.2 Merge Sort with non-overlapping subproblems where sub-
problems form a tree . . . . . . . . . . . . . . . . . . . . . . . 268
13.3 Fibonacci number with overlapping subproblems where sub-
problems form a graph. . . . . . . . . . . . . . . . . . . . . . 270

14.1 Example of Binary Search . . . . . . . . . . . . . . . . . . . . 276


14.2 Binary Search: Lower Bound of target 4. . . . . . . . . . . . . 278
14.3 Binary Search: Upper Bound of target 4. . . . . . . . . . . . 278
14.4 Binary Search: Lower and Upper Bounds of target 5 are the same. 278
14.5 Example of Binary search tree of depth 3 and 8 nodes. . . . . 285
14.6 The red colored path from the root down to the position where
the key 9 is inserted. The dashed line indicates the link in
the tree that is added to insert the item. . . . . . . . . . . . 287
14.7 A BST with node 3 duplicated twice. . . . . . . . . . . . . . 294
14.8 A BST with node 3 marked with two occurrences. . . . . . . 294
14.9 A Segment Tree . . . . . . . . . . . . . . . . . . . . . . . . . 295
14.10 Illustration of Segment Tree for Sum Range Query. . . . . . 297

15.1 The whole process for insertion sort: Gray marks the item to
be processed, and yellow marks the position after which the
gray item is to be inserted into the sorted region. . . . . . . . 309
15.2 One pass for bubble sort . . . . . . . . . . . . . . . . . . . . . 311
15.3 The whole process for Selection sort . . . . . . . . . . . . . . 312
15.4 Merge Sort: The dividing process is marked with dark arrows
and the merging process is with gray arrows with the merge
list marked in gray color too. . . . . . . . . . . . . . . . . . . 315
15.5 Lomuto’s Partition. Yellow, white, and gray mark regions (1),
(2), and (3), respectively. . . . . . . . . . . . . . . . . . . . . 318
15.6 Bucket Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
15.7 Counting Sort: The process of counting occurrences and com-
puting the prefix sum. . . . . . . . . . . . . . . . . . . . . . . 323
15.8 Counting sort: Sort keys according to prefix sum. . . . . . . . 324
15.9 Radix Sort: LSD sorting integers in iteration . . . . . . . . . 327
15.10 Radix Sort: MSD sorting strings in recursion. The black
and grey arrows indicate the forward and backward pass in
recursion, respectively. . . . . . . . . . . . . . . . . . . . . . 329
15.11 The time complexity for common sorting algorithms . . . . . 335

16.1 Dynamic Programming Chapter Recap . . . . . . . . . . . . . 341


16.2 Subproblem Graph . . . . . . . . . . . . . . . . . . . . . . . . 343
16.3 State Transfer for the palindrome splitting . . . . . . . . . . 370

16.4 Summary of different types of dynamic programming problems 373

17.1 All intervals sorted by start and end time. . . . . . . . . . . . 377


17.2 All intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
17.3 All intervals sorted by start and end time. . . . . . . . . . . . 393
17.4 Left: sort by start time, Right: sort by finish time. . . . . . . 397
17.5 Left: sort by start time, Right: sort by finish time. . . . . . . 397

18.1 Graph Model for LIS, each path represents a possible solution. 406
18.2 The solution to LIS. . . . . . . . . . . . . . . . . . . . . . . . 407

19.1 Two pointer Technique . . . . . . . . . . . . . . . . . . . . . . 415


19.2 The data structures to track the state of window. . . . . . . . 419
19.3 The partial process of applying two pointers. The grey shaded
arrow indicates the pointer that is on the move. . . . . . . . . 420
19.4 Slow-fast pointer to find middle . . . . . . . . . . . . . . . . . 422
19.5 Circular Linked List . . . . . . . . . . . . . . . . . . . . . . . 423
19.6 Floyd’s Cycle finding Algorithm . . . . . . . . . . . . . . . . . 424
19.7 Sliding Window Property . . . . . . . . . . . . . . . . . . . . 429

20.1 Undirected Cyclic Graph. (0, 1, 2, 0) is a cycle . . . . . . . . . 432


20.2 Directed Cyclic Graph, (0, 1, 2, 0) is a cycle. . . . . . . . . . . 432
20.3 DAG 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
20.4 The connected components in an undirected graph; each dashed
red circle marks a connected component. . . . . . . . . . . . . 438
20.5 The strongly connected components in a directed graph; each
dashed red circle marks a strongly connected component. . . . 438
20.6 A graph with four SCCs. . . . . . . . . . . . . . . . . . . . . . 442
20.7 Example of a minimum spanning tree in an undirected graph;
the green edges are edges of the tree, and the yellow filled
vertices are vertices of the MST. . . . . . . . . . . . . . . . . 445
20.8 The process of Kruskal’s Algorithm . . . . . . . . . . . . . . . 446
20.9 A cut, denoted by the red curve, partitions V into {1,2,3} and
{4,5}. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
20.10 Prim’s Algorithm: at each step, we manage the cross edges. . 449
20.11 Prim’s Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 451
20.12 A weighted and directed graph. . . . . . . . . . . . . . . . . . 454
20.13 All paths from source vertex s for the graph in Fig. 20.12 and
its shortest paths. . . . . . . . . . . . . . . . . . . . . . . . . 456
20.14 The simple graph and its adjacency matrix representation . . 457
20.15 DP process using Eq. 20.4 for Fig. 20.14 . . . . . . . . . . . 458
20.16 DP process using Eq. 20.5 for Fig. 20.14 . . . . . . . . . . . 459
20.17 DP process using Eq. 20.6 for Fig. 20.14 . . . . . . . . . . . 460

20.18 The updates on D for Fig. 20.12. The gray filled spots mark
the nodes that updated their estimate values, with their pre-
decessors indicated by incoming red arrows. . . . . . . . . . . 462
20.19 The tree structure indicating the updates on D, with the
shortest-path tree marked by red arrows. . . . . . . . . . . . . 463
20.20 The execution of the Bellman-Ford Algorithm with ordering
[s, t, y, z, x]. . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
20.21 The execution of the Bellman-Ford Algorithm on a DAG using
topologically sorted vertices. The red color marks the shortest-
paths tree. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
20.22 The execution of Dijkstra’s Algorithm on a non-negative weighted
graph. Red circled vertices represent the priority queue, and
blue circled vertices represent the set S. Eventually, the blue
colored edges represent the shortest-paths tree. . . . . . . . . 468
20.23 All shortest-path trees starting from each vertex. . . . . . . . 472

21.1 The process of decreasing monotone stack . . . . . . . . . . . 476


21.2 The connected components using disjoint set. . . . . . . . . . 482
21.3 A disjoint forest . . . . . . . . . . . . . . . . . . . . . . . . . . 484

22.1 The process of the brute force exact pattern matching . . . . 492
22.2 The Skipping Rule . . . . . . . . . . . . . . . . . . . . . . . . 494
22.3 The Sliding Rule . . . . . . . . . . . . . . . . . . . . . . . . . 495
22.4 Proof of Lemma . . . . . . . . . . . . . . . . . . . . . . . . . 495
22.5 Z function property . . . . . . . . . . . . . . . . . . . . . . . . 500
22.6 Cyclic Shifts . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504
22.7 Building a Trie from Patterns . . . . . . . . . . . . . . . . . . 509
22.8 Trie VS Compact Trie . . . . . . . . . . . . . . . . . . . . . . 511
22.9 Trie Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . 512

23.1 Example of Floyd’s cycle finding . . . . . . . . . . . . . . . . 532

24.1 Subsequence Problems Listed on LeetCode . . . . . . . . . . 556


24.2 Interval questions . . . . . . . . . . . . . . . . . . . . . . . . . 575
24.3 One-dimensional Sweep Line . . . . . . . . . . . . . . . . . . 576
24.4 Min-heap for Sweep Line . . . . . . . . . . . . . . . . . . . . . 577

25.1 Example of insertion in circular list . . . . . . . . . . . . . . . 590


25.2 Histogram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593
25.3 Track the peaks and valleys . . . . . . . . . . . . . . . . . . . 599
25.4 profit graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
25.5 Task Scheduler. Left is the first step; right is the one we end
up with. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603

26.1 LPS length at each position for palindrome. . . . . . . . . . 610



27.1 Example of Binary search tree of depth 3 and 8 nodes. . . . . 620


27.2 The lightly shaded nodes indicate the simple path from the
root down to the position where the item is inserted. The
dashed line indicates the link in the tree that is added to
insert the item. . . . . . . . . . . . . . . . . . . . . . . . . . 620
27.3 Illustration of Segment Tree. . . . . . . . . . . . . . . . . . . 628
27.4 Trie VS Compact Trie . . . . . . . . . . . . . . . . . . . . . . 631
27.5 Trie Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . 631

29.1 State Transfer Tree Structure for LIS; each path represents
a possible solution. Each arrow represents a move: find an
element among the following elements that is larger than the
current node. . . . . . . . . . . . . . . . . . . . . . . . . . . 665
29.2 Word Break with DFS. For the tree, each arrow means check-
ing the word between parent and child and then recursively
checking the result of the child. . . . . . . . . . . . . . . . . . 667
29.3 Caption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 672
29.4 One-Time Graph Traversal. Different colors mean different
levels of traversal. . . . . . . . . . . . . . . . . . . . . . . . . 679
29.5 Caption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 682
29.6 Caption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 684
29.7 Tree Structure for One dimensional coordinate . . . . . . . . 688
29.8 Longest Common Subsequence . . . . . . . . . . . . . . . . . 690
29.9 Caption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706

30.1 Copy process . . . . . . . . . . . . . . . . . . . . . . . . . . . 727


30.2 Caption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 728
30.3 Caption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 729
30.4 Caption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 729
List of Tables

3.1 10 Main Categories of Problems on LeetCode, total 877 . . . 38


3.2 Problems categorized by data structure on LeetCode, total
877 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3 10 Main Categories of Problems on LeetCode, total 877 . . . 41

9.1 Common Methods of String . . . . . . . . . . . . . . . . . . . 116


9.2 Common Boolean Methods of String . . . . . . . . . . . . . . 116
9.3 Common Methods of List . . . . . . . . . . . . . . . . . . . . 119
9.4 Common Methods for Sequence Data Type in Python . . . . 125
9.5 Common out of place operators for Sequence Data Type in
Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
9.6 Common Methods of Deque . . . . . . . . . . . . . . . . . . . 138
9.7 Data types in the Queue module. maxsize is an integer that
sets the upper bound on the number of items that can be
placed in the queue. Insertion will block once this size has
been reached, until queue items are consumed. If maxsize is
less than or equal to zero, the queue size is infinite. . . . . . . 139
9.8 Methods for Queue’s three classes; here we focus on the single-
thread background. . . . . . . . . . . . . . . . . . . . . . . . . 139
9.9 Methods of heapq . . . . . . . . . . . . . . . . . . . . . . . . 163

10.1 Analog of Asymptotic Relation . . . . . . . . . . . . . . . . . 182

11.1 Performance of Search Algorithms on Trees or Acyclic Graph 208

14.1 Methods of bisect . . . . . . . . . . . . . . . . . . . . . . . . 280

15.1 Comparison operators in Python . . . . . . . . . . . . . . . . 305


15.2 Operator and its special method . . . . . . . . . . . . . . . . 307


16.1 Tabulation VS Memoization . . . . . . . . . . . . . . . . . . . 352

27.1 Time complexity of operations for BST in big O notation . . 626

29.1 Different Type of Single Sequence Dynamic Programming . . 652


29.2 Different Type of Coordinate Dynamic Programming . . . . . 652
29.3 Process of using prefix sum for the maximum subarray . . . . 658
29.4 Different Type of Coordinate Dynamic Programming . . . . . 677

30.1 Arithmetic operators in Python . . . . . . . . . . . . . . . . . 720


30.2 Comparison operators in Python . . . . . . . . . . . . . . . . 721
30.3 Logical operators in Python . . . . . . . . . . . . . . . . . . . 721
30.4 Identity operators in Python . . . . . . . . . . . . . . . . . . 722
30.5 Membership operators in Python . . . . . . . . . . . . . . . . 722
30.6 Special Methods for Object Creation, Destruction, and Rep-
resentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 726
30.7 Special Methods for Object Creation, Destruction, and Rep-
resentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 726
30.8 Container Data types in collections module. . . . . . . . . . 733
0 Preface

Graduating with a computer science or engineering degree? Converting
from physics, math, or any unrelated field to computer science? Dreaming
of getting a job as a software engineer at top tech companies such as
Google, Facebook, Amazon, Microsoft, Oracle, LinkedIn, and so on? Un-
fortunately, the most challenging “coding interview” guards the door to
these top-notch tech companies. The interview process can be intimidating,
with the interviewer scrutinizing every keystroke of your typing or scribbling
on the whiteboard. Meanwhile, you are required to express whatever is on
your mind to walk your interviewer through the design and analysis process
and end the interview with clean and supposedly functional code.
What kind of weapons or martial arts do we need to toughen ourselves up
so that we can knock down the “watchdog” and kick the door in? By weapons
and martial arts, I mean books and resources. Naturally, you pull your first-
or second-year college textbook Introduction to Algorithms off the bookshelf,
dust it off, and are determined to read this 1000-plus-page massive book to
refresh your brain on data structures, divide and conquer, dynamic pro-
gramming, greedy algorithms, and so on. If you are a bit more knowledgeable,
you would be able to find another widely used book, Cracking the Coding
Interview, and online coding websites, LeetCode and LintCode, to prepare.
How much time do you think you need to put in? A month? Two months?
Or three months? You would think that after this you are done with inter-
views, but for software engineers it is not uncommon to switch companies
frequently. Then you need to start the whole process again, until you gain a
free pass to “coding interviews” by becoming an experienced senior engineer
or manager.
I was in exactly the same shoes. My first war started in the fall of 2015,
continued for two months, and ended without a single victory. I gave up the
whole interview thing until two years ago, when my life (I mean financial)
situation demanded that I get an internship. This time, I got to know


LeetCode and started to be more problem and practice driven from the
beginning. ‘Cause God knows how much I did not want to redo this process,
I naturally started to dig, summarize or create, and document problem-
patterns, from sources such as both English and Chinese blogs, class slides,
competitive programming guideline and so on.
I found I was not content with just passing interviews. I wanted to seek
the source of the wisdom of algorithmic problem solving: the principles. I
wanted to reorganize my continuously growing knowledge of algorithms in
a way that is as clear and concise as possible. I wanted to attach the math
that closely relates to the topic of algorithmic problem solving, which would
ease my nerves when reading related books. But meanwhile I tried to avoid
getting too deep and theoretical, which might deviate me from the topic
and add more stress. All in all, we are not majoring in math, which is not
meant to be easy; we use it as a practical tool, a powerful one! When it
comes to data structures, I wanted to connect the abstract structures to
real Python objects and modules, so that when I’m using data structures
in Python, I know the underlying data structures and their corresponding
behaviors and efficiency. I felt more at ease seeing each particular algorithm
explained with the source principle of algorithm design, why it is so, instead
of treating each as a standalone case and telling me “what” it is.
Three or four months into the journey of searching for answers to the
above “wants”, the idea of writing a book on this topic appeared in my
mind. I did not do any market research, and did not know anything about
writing a book. I just embarked on the boat, drifted along, and as I went
farther and deeper into the ocean of writing the book, I realized how much
work it could be. If you are reading this sometime in the future, then I
landed. The long process is more of an agile development in software engi-
neering; knowledge, findings, and guidelines are added piece by piece, con-
stantly going through revision. Yet, when I started to do research, I found
that there are plenty of books out there focusing on either teaching algorith-
mic knowledge (Introduction to Algorithms, Algorithmic Problem Solving,
etc.) or introducing interview processes and solving interview problems
(Cracking the Coding Interview, Coding Interview Questions, etc.), but
barely any that combines the two. This book naturally fills this role; learn-
ing algorithmic problem solving by analyzing and practicing interview
problems creates a reciprocal relationship, creating passion and confidence
to make 1+1=4.
What’s my expectation? First, your feeling of enjoyment when reading
and practicing along with the book is of the utmost importance to me.
Second, I really wish that you would be able to sleep well the night before
the interview, which proves that your investment, both financial and time-
wise, was worthwhile.
In all, this is a book that unites algorithmic problem solving, coding
interviews, and Python objects and modules. I tried hard to do a good job.
This book differs from books that focus on extracting the exact formulation
of problems from the fuzzy and obscure world. We focus on learning the
principles of algorithm design and analysis and practicing them using well-
defined classical problems. This knowledge will also help you define a problem
more easily in your job.
Li Yin
http://liyinscience.com
8/30/2019

Acknowledgements
1 Reading of This Book

1.1 Structures
I summarize the characteristics that potentially set this book apart from
other books on the market: starting from introducing what I think the core
principles of algorithm design are, the “source” of the wisdom I was after
as mentioned in the preface, to illustrating the concise organization of the
content, and to highlighting other unique features of this book.

Core Principles
Algorithmic problem solving follows a few core principles: Search and
Combinatorics, Reduce and Conquer, and Optimization via Space-Time
Trade-off or Greedy Choice. We specifically put these principles in one single
part of the book–Part IV.

1. In Chapter IV (Search and Combinatorics), I teach how to formulate
problems as searching problems, using combinatorics from the field of
math to enumerate the state space–the solution space, or all possibilities.
Then we further optimize and improve efficiency through “backtracking”
techniques.

2. In Chapter ?? (Reduce and Conquer), we either reduce problem A to
problem B (solving problem B means we solved problem A) or apply
self-reduction to reduce a problem to a series of subproblems (algorithm
design methodologies such as divide and conquer, some search
algorithms, dynamic programming, and greedy algorithms fall into
this area). Mathematical induction and recurrence relations play an
important role in problem solving, complexity analysis, and even correctness
proofs.

3. When optimization is needed, we have potentially two methods: when
we see that the subproblems/states overlap, a space-time trade-off can
be applied, as in dynamic programming; or we can make a greedy
choice based on the current situation.

Concise and Clear Organization

Figure 1.1: Four umbrellas: each row indicates corresponding parts as out-
lined in this book.

In this book, we organize the content in the order of Part, Chapter,
Section, Subsection, Subsubsection, and Paragraph. The parts are categorized
under four umbrellas, each serving an essential purpose:

1. Preparation: Introduce the global picture of algorithmic problem solving
and coding interviews, learn abstract data structures and highly
related and useful math such as recurrence relations, and get hands-on
Python practice by relating the abstract data structures to Python
data structures.

2. Principles: As introduced in the core principles above, we organize
the design principles here so that readers can use them as guidance
rather than seeking a peculiar algorithm for each problem.

3. Classical algorithms: We enhance our algorithm database by learning
how to apply the core principles to a variety of classical problems–a
database that we can quickly relate to when seeing new problems.

4. Coding interview problem patterns: We close the book by analyzing
and categorizing problems by patterns. We address classical and best
solutions for each problem pattern.

Other Features and Uniqueness


1. The exercise and answer setting: in the problem-pattern part, the
first chapter is a problem pool that lists all problems with descriptions.
Each exercise section across chapters refers only to problem ids. The
answers are instead organized by pattern, so that readers can review
problem-solving skills quickly when preparing for an interview.

2. Real coding interview problems are drawn from LeetCode, so readers
can easily practice online and join discussions with other users.

3. Real Python code is included in the textbook and offered via Google
Colab, instead of pseudo-code.

4. The content is fine-grained, making it easy to skim when necessary to
prepare for interviews.

5. Practical algorithms that are extremely useful for solving coding interview
problems, yet almost never included in other books, are covered:
monotone stack, two-pointer techniques, and bit manipulation with
Python.

6. Highly related math methods are included to ease the learning of the
topic: recurrence relations, math formulas, and mathematical induction.

7. Explanations of concepts are problem-solving oriented, which makes
the concepts easier to grasp. We introduce concepts along with examples,
and strengthen and formalize them in the summary sections.

Q&A
What do we not cover? In the spectrum of coding interviews and the
spectrum of algorithms, we do not include:

• Although this book is a comprehensive combination of algorithmic
problem solving and coding interview coaching, I decided not to provide
a preparation guideline for System Design, to avoid deviating from
our main topic. An additional reason is that, personally, I have no
experience with this topic yet, and it is not a topic I am currently
interested in, so a better option is to look for it in another book.

• On the algorithm side, we only briefly explain approximate algorithms,
heuristic search, and linear programming, which are mainly seen in
Artificial Intelligence, such as in machine learning algorithms and neural
networks. We do mention them because I think it is important to
know that the field of artificial intelligence is simply a subfield of algorithms;
it is powerful because of its high-dimensional modeling and
large amounts of training data.

How much do we include about Python 3? We use Python 3 as our
programming language to demonstrate the algorithms, for its high readability
and popularity in both industry and academia. We mainly focus on
Python built-in data types, frequently used modules, and a single class, and
leave out knowledge such as object-oriented programming that deals with
class inheritance and composition, exception handling, and so on. Our approach
is to provide a brief introduction to any prior Python 3 knowledge
when it is first used in the book, and to put slightly more detail in the
Appendix for further reading and reference. We follow the PEP 8 Python
programming style. If you want to learn object-oriented programming in
Python, Python 3 Object-oriented Programming is a good book to use.

Problem Setting Compared with other books that discuss problem solving
(e.g., Problem Solving with Algorithms and Data Structures), we do not
put problems in complex settings. We want the audience to have a simple
setting so that they can focus on analyzing the algorithm or the data structure’s
behavior. This way, we keep our code clean, and it also serves the
purpose of coding interviews, in which interviewees are required to write
simpler and less code than in a real engineering problem because of the
time limit.
Therefore, the purpose of this book is three-fold: to answer your questions
about the interview process, to prepare you fully for the coding interview,
and, most importantly, to help you master algorithm design and analysis
principles, sense their beauty, and use them in your future work.

1.2 Reading Suggestions


We divide the learning of this book into four stages, each building on the
previous one. Evaluate which stage you are at, and we kindly suggest you
read in this order:

• First Stage: I recommend readers first start with Part Two, fundamental
algorithm design and analysis, and Part Three, bit manipulation
and data structures, to learn the basics of both algorithm design and
data structures. In this stage, for graph data structures, we learn BFS
and DFS with their corresponding properties, which helps us understand
more graph- and tree-based algorithms. Also, DFS is a good
example of recursive programming.

• Second Stage: We move further to Part Four, Complete Search, and
Part Five, Advanced Algorithm Design. The purpose of this stage is
to learn more advanced algorithm design methodologies: universal
search, dynamic programming, and greedy algorithms. At the end, we
will understand under what conditions we can improve efficiency from
searching-based algorithms to dynamic programming, and similarly
from dynamic programming to greedy algorithms.

• Third Stage: After we have practiced the universal algorithm design
methodologies, know their differences, and can handle their basic
problems, we move to the third stage, where we push ourselves further
in algorithms and learn more advanced and special topics that can be
very helpful in our careers. The content is in Part Six, Advanced and
Special Topics.

1. For example, we learn more advanced graph algorithms, which
can be either BFS or DFS based.

2. Dynamic programming special, where we explore different types
of dynamic programming problems to gain an even better understanding
of the topic.

3. String pattern matching special.

• Fourth Stage: In this stage, I recommend the audience review the
content by topic:

1. Graph: Chapter Graphs, Chapter Non-linear Recursive Backtracking,
Chapter Advanced Graph Algorithms, Chapter Graph
Questions.

2. Tree: Chapter Trees, Chapter Tree Questions.

3. String matching: Chapter String Pattern Matching Special, Chapter
String Questions.

4. Other topics: Chapter Complete Search, Chapter Array Questions,
Chapter Linked List, Stack, Queue and Heap.

Wanna Finish the Book ASAP? Or just review it for interviews?


I organize the book into forty chapters in all. That is a lot, but they
are carefully placed under different parts to highlight each one’s purpose
in the book. We can skim difficult chapters, marked by an asterisk (*),
that are unlikely to appear in a short interview. The fine-grained categorization
helps us skim at the chapter level: if you are confident with some
chapters, or you think they are too trivial, just skim them, given that the
book is designed to be self-contained across multiple fields (programming
languages, algorithmic problem solving, and coding interview problem patterns).
The content within the book is almost always partitioned into titled
paragraphs. This conveniently allows us to skip parts that are just for
enrichment, such as “stories”. This helps us skim within each chapter.
Part I

Introduction

2 The Global Picture of Algorithmic Problem Solving

“We do problem modeling with data structures and problem solving
with algorithms. Data structures often influence the details of an
algorithm; because of this, the two often go hand in hand.”
– Niklaus Wirth, Algorithms + Data Structures = Programs, 1976

In this chapter, we build up a global picture of algorithmic problem
solving to guide the reader through the whole “ride”.

2.1 Introduction
In the past, a person capable of solving complex math or physics computation
problems faster than ordinary people stood out and was highly sought
after. For example, during World War II, Alan Turing’s team recruited
engineers who were fast at solving puzzles. These kinds of stories died with
the rise of powerful machines, which handed the magic wand over to programmers–
those able to harness the continually growing computation power of the
hardware to solve, with algorithms, problems that once only a handful of
people, or none, could solve.
There are many kinds of programmers. Some code up real-world, obvious,
and easy-to-implement rules to build applications; others tackle more
computational problems with knowledge of math, calculus, geometry, physics,
and so on. We give a universal definition of algorithmic problem solving–
information processing. Three essential parts are involved: data structures,
algorithms, and programming languages. Knowing some basic data structures,
some programming languages, and some basic algorithms is enough
for the first type of programmer. They might focus more on the front end,
such as mobile design and webpage design. The second type of programmer,
however, needs to be equipped with more advanced data structures and
algorithm design and analysis techniques. Sadly, that is all just a start; the
real power lies in the combination of these algorithm design methodologies
and other subjects. Math is the most important among them, for both
design and analysis, as we will see in this book. Still, a candidate with
strong algorithmic skills is off to a good start: with some basic math knowledge,
we can almost always manage to solve problems with brute-force
searching, and some others with dynamic programming.
Let us continue to define algorithmic problem solving as information
processing–just what it is, not how, at this moment.

2.1.1 What?
Introduction to Data Structures Information is the data we care about,
and it needs to be structured. We can think of a data structure as a low-level
file manager that supports four basic operations: ’find’ the file belonging
to Bob, ’add’ Emily’s file, ’delete’ Shawn’s file, ’modify’ Bob’s file. Why
structured? If you were the file manager, would you just throw the hundreds
of files on the floor, or toss them all into a drawer? No, you would line them
up in the drawer, or even put a name on top of each file and order them
by first name. The way data is structured in a program is similar to a
real-world system: simply lining things up, or organizing them like a tree
when there is a hierarchical, belonging relation, as appears in institutions
and companies.

Introduction to Algorithms Algorithms further process data with a
series of basic operations–searching, modifying, inserting, deleting, and so
on–that work on the input data’s structures, or even on auxiliary data
structures if necessary. How to design and analyze this series of operations
is the field of algorithmic problem solving.
The same problem can be solved with different levels of complexity in
time and storage space; deep down, algorithm designers are seeking the
best trade-off between the two.
With this information processing step, we get our task done–computing
our high school math, sorting student ids in order, searching for a word in
a document, you name it.

Programming Language A programming language, especially a higher-level
language such as Python, comes with data structures that already
support the basic operations: search, modify, insert, delete. For example,
the list type in Python is used to store an array of items and comes with
append(), insert(), remove(), and pop() to operate on your data; thus, a
list can be viewed as a data structure. If we know in what data structure
we save our input instance and what algorithms to use to operate on the
data, we can code these rules in a certain programming language and let
the computer take over; and if it does not demand billions of operations,
the computer will get the result far faster than humans are capable of–this
is why we need computers anyway.
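As a quick illustration, here is a hedged sketch of how those basic operations
map onto Python’s built-in list (the variable data is hypothetical):

data = [1, 3, 5]

data.append(7)      # add 7 at the end      -> [1, 3, 5, 7]
data.insert(1, 2)   # insert 2 at index 1   -> [1, 2, 3, 5, 7]
data.remove(5)      # delete first 5        -> [1, 2, 3, 7]
last = data.pop()   # remove & return last  -> 7, data = [1, 2, 3]
print(3 in data)    # search                -> True
data[0] = 0         # modify in place       -> [0, 2, 3]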

2.1.2 How?

Knowing what it is, now you would ask how. How can we know how to
organize our data, how to design and analyze our algorithm, and how to
program it? We need to study existing and well-designed data structures,
algorithm design principles, and algorithm analysis techniques; understand
and analyze our problems; and study the classical algorithms that our
predecessors invented for classical problems. Only then, when we see a
problem, old or new, are we prepared; we compare it with problems we
know how to solve. If it is exactly the same, congratulations, we can solve
our problem. If it is similar to a certain category of problems, at least we
start from a direction and not from scratch. If it is totally new, at least we
have our algorithm design principles and analysis techniques; we design an
algorithm after understanding the problem and relating it to all our skills.
Of course, there are problems that nobody has been able to solve yet. We
will study them in this book so that you can identify when the problem
you are solving is too hard.

The Tree of Algorithmic Problem Solving Back to the question: how?
We study and build up our knowledge and skill base. A well-organized and
well-explained knowledge base will surely ease our nerves and make things
easier. The field of algorithms and computer science is highly flexible.
Assume the knowledge of computer science is a tree, and that each leaf is a
specific algorithm to solve a specific type of problem. What would be the
root of the tree? The main trunk, the branches? It is impossible for us to
check or even count the number of leaves, but we can understand the tree
by knowing its structure. This book is built on this belief and puts a lot
of effort into organizing the algorithm design and analysis methodologies,
data structures, and problem patterns. It starts with the rooting algorithm
design and analysis principles, and we study classical leaves by relating
them to, and explaining them with, our principles rather than treating
each one individually.
The algorithm design and analysis principles comprise the trunk of the
algorithm tree. A branch is the application of a type of algorithm design
principle to a certain type of data structure–for example, algorithms related
to tree structures, graph structures, strings, lists, sets, stacks, priority
queues, and so on.

2.1.3 Organization of the Contents

Based on our understanding of what algorithmic problem solving is and
how to approach it, we organize the content of the book as:

• Part ?? includes the abstract, commonly used data structures in computer
science, the math tools for design and correctness proofs, and
some geometry knowledge that we need to solve the geometry problems
still often seen in interviews.

• Part ?? strengthens programming skills by implementing data structures
and some basic coding.

• Part ?? is our main trunk of algorithmic problem solving.

• Part ?? to Part ?? take us to different branches and showcase classical
algorithms within each branch. One or many algorithm design
principles can be applied to solve these problems.

• Part ?? is the problem patterns. Actually, if we have a good grasp
of the preceding parts, this part is more of a review and exercise
section. The purpose of identifying patterns is to ease our coding
interview preparation.

As I always believe, setting up the big picture should be the very first
part of any technical book; it helps to know how each part plays its role
globally, with more details coming from the preface. The organization of
this chapter follows the global picture, and each element of algorithmic
problem solving is briefed on in its own section:

• Problem Modeling (Section 2.3), which includes data structures and
hands-on examples.

• Problem Solving (Section 2.4), which includes Algorithm Design and
Analysis Methodologies (Section 2.4.2) and Programming Languages
(Section 2.5).

2.2 Introduction
Algorithms are Not New Algorithms should not be considered purely
abstract and obscure. They originate from real-life problem solving, including
times before computers (machines) even existed. Recurrences were
studied as early as 1202 by Leonardo Fibonacci, for whom the Fibonacci
numbers are named. Algorithms, as sets of rules/actions to solve a problem,
leverage any form of knowledge–math, physics. Math stands out among all,
as it is our tool to understand problems, express relations, solve problems,
and analyze complexity. In this book, we use math in the most practical
way and only at places where it really matters. The difference today is
that a computer program written in a certain language to execute the
algorithm is generally far more efficient than carrying it out in person.

Algorithms are Everywhere in our daily life. Assume you are given a
group of people, and your task is to find whether there is a person in the
group born on a certain day. The most intuitive way is to check each of
them and see if his/her birthday matches the target; this requires a full
round over the group. This first way is the easiest and most straightforward
way to solve the problem, and is called brute force. If you observe that the
group is already organized by month, then you can cut down the number
of checks by examining only the subgroup that matches the month of your
target day. The second way involves more observation and might take less
time to get the answer. However, both have one thing in common: they
narrow down the possibilities. In the first way, we narrow them down one by
one; in the second, we cut away almost 11/12 of the original possibilities at
once. We can say that solving the problem is finding its solution in the
solution space, and a different way of finding the solution is called a different
algorithm.
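Here is a minimal sketch of the two ways just described, assuming each
person is a (name, birthday) pair with the birthday stored as (month, day);
the names and helper functions are hypothetical:

from collections import defaultdict

people = [("Ada", (3, 14)), ("Bob", (7, 1)), ("Cai", (11, 5))]
target = (7, 1)

# Way 1, brute force: one full round over the group.
def find_brute_force(people, target):
    for name, birthday in people:
        if birthday == target:
            return name
    return None

# Way 2: group by month first, then check only the matching subgroup,
# cutting away roughly 11/12 of the possibilities in one step.
by_month = defaultdict(list)
for name, birthday in people:
    by_month[birthday[0]].append((name, birthday))

def find_grouped(by_month, target):
    for name, birthday in by_month[target[0]]:
        if birthday == target:
            return name
    return None

print(find_brute_force(people, target))  # Bob
print(find_grouped(by_month, target))    # Bob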

2.3 Problem Modeling


The very first thing is to find, or be given, a problem that exists in the
world, whose solution can bring practical value and hopefully have some
good effect on nature or humanity. In problem modeling, we analyze the
characteristics of the problem and map it onto certain data structures.
In the stage of problem modeling, we define the problem and model it
with data structures. In this section, we first answer the question, “what
is a problem in computer science?” Then we introduce the “skeleton”–data
structures–to prepare for our next step, problem solving. Then we give
hands-on examples of how to model a problem with data structures.
If you were a zoologist, this is how you would define a species: describe
the flesh and appearance, put together its skeleton, search for similar well-studied
species in a dataset, match observed behaviors, and induce the
unknown traits from similar species. There are two key steps to problem
modeling:

1. Understand the problem (Section 2.3.1): We give the definition of
problems, followed by problem categories that classify problems without
the context of data structures. This is like describing the flesh of
a species.

2. Apply data structures to the problem (Section 2.3.2): We then describe
our problem in terms of data structures, connecting the flesh
and the skeleton. We also analyze the problem by exploring its solution
space and simulating the process–finding the series of actions
between the input and output instances.

2.3.1 Understand Problems


Problem Formulation

A problem can be a task or something to be done, according to the definition
of “problem” in an English dictionary–such as finding a ball numbered 11
in a pool of numbered balls, or sorting a list of students by their IDs. The
first thing we need to understand is problem formulation, and the closest
knowledge for defining a problem comes from the field of math. The
intuitive definition of a problem is that it is a set of related tasks, usually
infinite.
The formal definition: a problem is characterized by:

1. A set of input instances: The instances represent real examples
of this type of problem. Input instances are data, which need to be
saved in and accessed from the machine. This mostly requires us to
define a data structure; however, different data structures can be used
to define the same problem.

2. A task to be performed on the input instances: The problem
definition should usually come with examples that better explain how
the task decides the output for the exemplary input instances.

For example, we formulate the problem of drawing a ball from the pool
as: given a list of unsorted integers, find whether the number 11 is in the
list; return true or false.
Example:
Given the list: [1, 34, 8, 15, 0, 7]
Return False because 11 does not appear in the list.
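A one-pass brute-force sketch of this formulation might look as follows
(the function name is hypothetical):

def contains(nums, target=11):
    # Scan the whole input instance once.
    for n in nums:
        if n == target:
            return True
    return False

print(contains([1, 34, 8, 15, 0, 7]))  # False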

Problem Categories

Now, to better understand what computer science deals with, we categorize
the problems commonly solved in the field.

Continuous or Discrete? Based on whether the variables are continuous
or discrete, we have two categories of problems:

1. Continuous problems: relate to continuous solution spaces.

2. Discrete problems: relate to discrete solution spaces.



The field of algorithmic problem solving is highly correlated with Discrete
Mathematics, which covers topics such as arithmetic and geometric sequences,
recurrence relations, induction, graph theory, generating functions, number
theory, combinatorics, and so on. Throughout this book, some important
parts are detailed (recurrence relations, induction, combinatorics, graph
theory), serving as powerful tools for doing a good job in computer science.

What do They Ask? We may be asked to answer four types of questions:

1. Yes/No Decision Problems: answering whether a number is prime,
or odd or even, are examples of such decision problems.

2. Search Problems: find one/all feasible solutions that meet the problem
requirement, which requires identifying a solution from within
a potentially infinite set of possible solutions–for example, finding the
nth prime number. Almost all problems are, or can be converted to, a
search problem in some way. Further, search problems can be divided
into:

• Counting Problems: count all feasible solutions to a search
problem, such as answering, ‘how many of the 100 integers are
prime?’.

• Optimization Problems: find the best solution among all feasible
solutions to a search problem. In addition to the search
problem, optimization problems answer the decision, ‘is this solution
the best among all feasible ones?’.

Combinatorics When discrete problems are asked with counting or optimization
questions, in computer science we further have combinatorial problems,
a field widely called combinatorics.
Combinatorics originates from discrete mathematics and has become part
of computer science. As the name suggests, combinatorics is about combining
things; it answers questions like: "How many ways can these items be
combined?", and "Is a certain combination possible, or what combination
is the ‘best’ in some sense?"
Throughout this book, permutations, combinations, subsets, strings, and
points in linear order, as well as trees, graphs, and polygons in non-linear
order, will be examined. We give a brief study of this topic in Chapter ??.

Tractable or Intractable? The complexity of a problem is normally
described in relation to the size of the input instance. If a problem is
algorithmic and computable, being able to produce a solution may depend
on the size of the input or the limitations of the hardware used to implement
it. Based on whether a problem can feasibly be solved by existing machines,
we have:

1. Tractable problems: If a problem has a reasonable solution–that is,
it can be solved in no more than polynomial time–it is said to be
tractable.

2. Intractable problems: Some problems can only be solved by algorithms
whose execution time grows too quickly in relation to their
input size, say exponentially; these problems are considered intractable.
For example, the classical Traveling Salesperson Problem.

Problems can also be categorized as:

1. P Problems: decision problems that can be solved in polynomial
time by a deterministic machine.

2. NP Problems: decision problems whose candidate solutions can be
verified in polynomial time.

There are more types, such as undecidable problems and the halting problem;
feel free to look them up if interested.

2.3.2 Understand Solution Space


A data structure is a specialized format for organizing, storing, processing,
and retrieving data. As Dr. Wirth states in the chapter quote, we do
problem modeling with data structures, and the data structures often influence
the details of an algorithm: the input/output instances and the intermediate
results in the process of an algorithm all associate with data structures.
In this section, we do not intend to get into the details of data structures,
but rather point out directions. Quickly skim the first section of Part ??
to get a sense of the categories of data structures. When a problem is
modeled with data structures, it can further be classified based on those
data structures. At this stage, we should try to model our input on a data
structure, and analyze the following five components to better understand
our problem.

Five Components There are generally five components of a problem that
we can define and depend on to correlate the problem to data structures,
and to algorithms–searching, divide and conquer, dynamic programming,
and greedy algorithms. We introduce the five components with a dummy
example:

Given a list of items A = [1, 2, 3, 4, 5, 6], find the position of the item
with value 4.

Figure 2.1: The State Space Graph. This may appear as a tree, but we can
redraw it as a graph.

1. Initial State: the state where our algorithm starts. In our example,
we can scan the whole list starting from the leftmost position 0; we
denote this as S(0). Note that a state does not have to equal a single
point of the input instance; it can be a range–such as from position 0
to 5, or from 0 to 2–or any state you define.

2. Actions or MOVES: describe the possible actions relating to a state.
Given position 1 with value 2, we can move only one step forward
and get to position 2, or we can move 2, 3, 4, or 5 steps, and so
on. Thus, we should find all possible actions or moves that we can
take to progress to the next state. We can denote this as ACTIONS(1)
= {MOVE(1), MOVE(2), MOVE(3), MOVE(4), MOVE(5)}.

3. State Transfer Model: decides the state that results from doing
action a at state s. We denote it as T(a, s). For example, if we are at
position 1 and move one step, MOVE(1), then we reach state 2, which
can be denoted as 2 = T(MOVE(1), 1).

4. State Space: the set of all states reachable from the initial state by
any sequence of actions; in our case, it is 0, 1, 2, 3, 4, 5. We can infer
the state space of the problem from the initial state, actions, and
transfer model. The state space forms a directed network or graph in
which the nodes are states and the links between nodes are actions.
Graph, with all its flexibility, is a universal and natural way to represent
relations. For example, if we limit the maximum moves we can make at
each state to two, the state space is formed as shown in Fig. 2.1.
In practice, drawing the graph as a tree structure is another option; in
the tree, we observe repeated states due to the expansion of nodes that
have multiple incoming links in the graph. A path in the state space is
a sequence of states connected by a sequence of actions.

5. Goal Test: determines whether a given state is a goal state. In this
example, the goal state is 4. The goal is not limited to such enumerated
sets of states; it can also be specified by an abstract property. For
example, in constraint satisfaction problems (CSP) such as n-queens,
the goal is to reach a state in which no pair of queens attacks each
other.

In this example, the state space graph is an analysis tool; we use it to
represent the transition relationships between different states, not the exact
data structure on which we operate and define algorithms.
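To tie the five components together, here is a minimal sketch that encodes
them for the dummy example, assuming moves of at most two steps as in
Fig. 2.1; the function names (ACTIONS, transfer, is_goal) are hypothetical:

from collections import deque

A = [1, 2, 3, 4, 5, 6]
initial = 0                            # Initial State S(0)

def ACTIONS(s):                        # possible moves at state s
    return [a for a in (1, 2) if s + a < len(A)]

def transfer(a, s):                    # State Transfer Model T(a, s)
    return s + a

def is_goal(s):                        # Goal Test
    return A[s] == 4

frontier, seen = deque([initial]), {initial}
while frontier:                        # explore the State Space
    s = frontier.popleft()
    if is_goal(s):
        print("found at position", s)  # found at position 3
        break
    for a in ACTIONS(s):
        t = transfer(a, s)
        if t not in seen:
            seen.add(t)
            frontier.append(t)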

Figure 2.2: State Transfer process on a linear structure

Apply Data Structures With the state space graph, our problem is
abstracted to finding a node with value 4, and graph algorithms–more
specifically, graph search–can be applied to solve the problem. It does not
take an expert to tell us, "This graph just complicated the situation, because
our intuition can lead us to a much simpler and more straightforward solution:
scan the items from the leftmost to the rightmost, one by one." True! As
depicted in Fig. 2.2, the problem can be modeled using a linear structure,
possibly a list or linked list, and we only need to consider one action out of
all the options, MOVE(1); then our search covers the whole state space,
which makes the algorithm we designed complete (check the complexity
analysis). On the other side, in the state space graph, if we insist on moving
two steps each time, we would not be able to cover the whole state space
and might end up not finding our target, which indicates that this algorithm
is incomplete.
Instead of using a linear data structure, we can restructure the states as
a tree if we refine a state to be a range of items. The initial state is the
subarray in which the target may be found, denoted S(0, 5). Starting from
the initial state, each time we divide the space into two halves: S(s, m) and
S(m, e), where s and e are the start and end indices respectively, and m =
(s + e)//2, the integer part of (s + e) divided by 2. We do this to all nodes
repeatedly, and we get another state transfer graph, shown in Fig. 2.3. From
this graph we can see that the last nodes are the ones we cannot divide
further, that is,

Figure 2.3: State Transfer Process on the tree

when s = e. Going from state 0-5 to 3-5 needs an action–move to the right.
Similarly, going from 0-5 to 0-3 needs the action of moving to the left. We
use MOVE(L) and MOVE(R) to denote the whole set of possible actions
to take.
In this example, we showed how the same simple problem can be modeled
using two different data structures–a linked list and a tree.

2.4 Problem Solving


In this section, we first demonstrate how algorithms can be applied on
these two data structures with their corresponding state transfer processes.
Following this, we introduce the fundamental algorithm design and analysis
methodologies–the “soul/brain”. We end this section by briefing on how
algorithms are categorized.

2.4.1 Apply Design Principle

Figure 2.4: Linear Search on explicit linear data structure

Given the state transfer graph in Fig. 2.2, we simply iterate over the
states and compare each item with our target to see if they are equal; if so,
we have found our target and return; if not, we continue to the end. This
simple search method is depicted in Fig. 2.4. What if we know that the
data is already organized in ascending order? With the tree data structure,
when given a specific target, we only need to choose one action from the
action set–either move to the left or to the right–based on a condition:
whether the target is larger or smaller than the item in the middle of the
state. When 4 is the target, we have the search process depicted in Fig. 2.5.

Figure 2.5: Binary Search on an implicit Tree Structure
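A minimal sketch of both searches on the dummy example, assuming A is
sorted ascending for the binary case (the function names are hypothetical):

def linear_search(A, target):
    # Iterate the states one by one: MOVE(1) at every step.
    for i, v in enumerate(A):
        if v == target:
            return i
    return -1

def binary_search(A, target):
    # Requires A sorted ascending; each step halves the state range.
    s, e = 0, len(A) - 1
    while s <= e:
        m = (s + e) // 2
        if A[m] == target:
            return m
        elif A[m] < target:
            s = m + 1        # MOVE(R): continue in the right half
        else:
            e = m - 1        # MOVE(L): continue in the left half
    return -1

A = [1, 2, 3, 4, 5, 6]
print(linear_search(A, 4))   # 3
print(binary_search(A, 4))   # 3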

All these notions–state space, data structure, algorithm, and analysis–
might appear overwhelming for now. As you learn, you will see that some
of these steps are not always necessary, but knowing these elements is good
for analyzing and learning new algorithms; think of it as gathering terminology
into your language base.

2.4.2 Algorithm Design and Analysis Principles


Algorithm Design
Most of the time, the most naive and inefficient solution–the brute-force
solution–strikes us right away: simply searching for a feasible solution to
the problem in its solution space using the massive computation power
of the hardware. Although the naive solution will neither be preferred by
your boss nor incorporated into the real product, it offers the baseline for
your complexity comparison and showcases how good your well-designed
algorithm is.
In the dummy example, we actually used two different searching algorithms–
linear search and binary search. The process of looking for a sequence of
actions that reaches the goal is called search. Therefore, searching is the
fundamental strategy and the very first step of problem solving. How
could it not be? Algorithms are about finding answers to problems; if and
assuming we can define the potential state/solution space, then a naive,
exhaustive search would do the magic and solve the problem. However,
back in reality, we are limited by computation resources and speed, so we
compromise by:

1. being smarter, so that we decrease the cost and increase the speed,
yet still get the exact solution we are looking for. This comes down
to optimization, for which we have divide and conquer (Chapter ??),
dynamic programming (Chapter ??), and greedy algorithms (Chapter ??).
What is the commonality between them? They all in some way need
us to obtain a recurrence relation (Chapter ??), which is essentially
mathematical induction (Chapter ??)–a view I generalized from another
book, Introduction to Algorithms: A Creative Approach, by Udi Manber.
To explain it another way, these principles use recurrence relations to
relate a problem to its smaller instances (see the sketch after this list).
Why is it smarter? First, smaller problems are just easier to solve
than larger problems. Second, the cost of assembling the answers to
smaller problems into the answer to the larger problem is possibly
small.

2. approximating the answer. Instead of trying to get the exact answer,
we find one that is good enough. Here go heuristic search, machine
learning, and artificial intelligence. Admittedly, my limited knowledge
is currently not enough to give more context than this.
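As a tiny, hedged illustration of the recurrence-relation idea in point 1,
consider the Fibonacci numbers, f(n) = f(n-1) + f(n-2): the subproblems
overlap, so caching them (a space-time trade-off) turns an exponential
search into a linear one:

def fib(n, memo={0: 0, 1: 1}):
    # Overlapping subproblems are cached: each f(k) is computed once.
    if n not in memo:
        memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]

print(fib(30))  # 832040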
Equally, we can say all algorithms can be described and categorized as
searching algorithms. Yet there are three algorithm design paradigms–divide
and conquer, dynamic programming, and greedy algorithms–that can be
applied in the searching process for faster speed or less space. Don’t worry,
this is just the introduction chapter; all these concepts and algorithm design
principles will be explained later.

Algorithm Analysis of Performance


How do we measure problem-solving performance? Up to now, we have
some basic ways to solve the problem; we now need the criteria used to
measure them. We can evaluate an algorithm’s performance in four ways:

1. Completeness: Is the algorithm guaranteed to find a solution when
there is one?

2. Optimality: Does the strategy find the optimal solution, as defined?

3. Time Complexity: How long does it take to find a solution?

4. Space Complexity: How much memory is needed to perform the
search?

Time and space complexity are always considered with respect to some
measure of the problem difficulty. In theoretical computer science, the typical
measure is the size of the state space graph, |V| + |E|, where V is the
set of vertices (nodes) of the graph and E is the set of edges (links). This is
appropriate when the graph is an explicit data structure that is input to the
search program. However, in reality, it is often better to describe the search
tree used to find our solutions. For this reason, complexity can be expressed
in terms of three quantities: b, the branching factor or maximum number
of successors of any node; d, the depth of the shallowest goal node; and m,
the maximum length of any path in the state space. Time is often measured
in terms of the number of nodes in the search tree, and space in terms of
the maximum number of nodes stored in memory.
For the most part we describe time and space complexity for search on a
tree; for a graph, the answer depends on how “redundant” paths or “loops”
in the state space are handled.

2.4.3 Algorithm Categorization


Countless algorithms have been invented. For these traditional, data-independent
algorithms (as opposed to the current data-oriented deep learning models,
which are trained with data), it is important to be able to categorize them,
understand the similarities and characteristics of each type, and compare
the types:

• By implementation: the most useful distinction in this book is recursive
versus iterative. Understanding the difference between these two, and
the special usage of recursion (Chapter III), is fundamental to the
further study of algorithm design. We can also have serial versus
parallel/distributed and deterministic versus non-deterministic algorithms.
In this book, all the algorithms we learn are serial and deterministic.

• By design: algorithms can be interpreted through one or several of
the fundamental problem-solving paradigms: Divide and Conquer
(Part ??), Dynamic Programming and Greedy (Part ??). In Section ??,
we briefly introduce and compare these paradigms to gain a global
picture of the spirit of algorithms.

• By complexity: mostly, algorithms can be categorized by their time
complexity. Given an input size of n, we normally have the categories
O(1), O(log n), O(n), O(n log n), O(n^2), O(n^3), O(2^n), and O(n!).
More details and a comparison are given in Section ??.

Intractable problems still get solved by computers. We can limit the
input instance size; however, that is not very practical. When the input
size is large and we still hope to get solutions–maybe not the best, but good
enough–in a reasonable (polynomial) time, approximate algorithms, such
as heuristic algorithms, come in handy. In this book, we focus more on
non-approximate algorithmic methods for problems in discrete solution
spaces, and only brief on approximate algorithms.

2.5 Programming Languages


For certain types of problems, there are algorithms specifically designed
and tuned to that type of question; these will be introduced in Part ??,
and they might give us nearly the best efficiency we can find.

2.6 Tips for Algorithm Design


Principle

1. Understand the problem; analyze it with searching and combinatorics
to get the complexity of the naive solution.

2. If it is an exponential problem, check whether dynamic programming
applies. If not, we have to stick to a search algorithm. If it applies,
we can decrease the complexity to polynomial.

3. If it is polynomial already, or polynomial after dynamic programming
is applied, check whether a greedy approach or divide and conquer
can be applied to further decrease the polynomial complexity. For
example, if it is O(n^2), divide and conquer might decrease it to
O(n log n), and a greedy approach might end up with O(n) (see the
sketch after this list).

4. If none of these design principles applies, we stick to searching and
try to optimize with better searching techniques–such as backtracking,
bidirectional search, A*, sliding window, and so on.

This process can be generalized with “BUD”–bottlenecks, unnecessary
work, and duplicated work.
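As a hypothetical illustration of step 3, take the classic maximum-subarray-sum
problem: a brute-force enumeration is O(n^2), while a single greedy/DP
pass (Kadane’s algorithm) brings it down to O(n):

def max_subarray_brute(nums):
    # O(n^2): try every starting index and extend to every end.
    best = nums[0]
    for i in range(len(nums)):
        total = 0
        for j in range(i, len(nums)):
            total += nums[j]
            best = max(best, total)
    return best

def max_subarray_greedy(nums):
    # O(n): either extend the current run or start fresh at x.
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

nums = [-2, 1, -3, 4, -1, 2, 1, -5, 4]
print(max_subarray_brute(nums), max_subarray_greedy(nums))  # 6 6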

2.7 Exercise
2.7.1 Knowledge Check
Longest Increasing Subsequence

Practice first before you check the solution on the next page.

Given a list of items A = [1, 2, 3, 4, 5, 6], find the position of the item with
value 4.

1. Initial State: the state where our algorithm starts. In our example,
we can scan the whole list starting from the leftmost position 0,
denoted S(0).

2. Actions or MOVES: a description of the possible actions available
at a state. If we are at position 1, we have different possible actions:
we can move only one step forward and get to position 2, or we can
move 2, 3, 4, or 5 steps. We can denote this as ACTIONS(1) =
{MOVE(1), MOVE(2), MOVE(3), MOVE(4), MOVE(5)}.

Figure 2.6: The State Space Graph

3. State Transfer or Transition Model: returns the state that results
from doing action a at state s. We denote it as T(a, s). For example,
if we are at position 1 and take the action of moving one step,
MOVE(1), then we reach state 2, denoted 2 = T(MOVE(1), 1). We
also use the term successor to refer to any state reachable from a given
state by a single action.

4. State Space: Together, the initial state, actions, and transition model
implicitly define the state space of the problem–the set of all states
reachable from the initial state by any sequence of actions. The state
space forms a directed network or graph in which the nodes are states
and the links between nodes are actions. For example, if we limit the
maximum moves we can make at each state to one and two, the state
space is formed as shown in Fig. 2.6. A path in the state space is a
sequence of states connected by a sequence of actions.

5. Goal Test: determines whether a given state is a goal state. Sometimes
there is an explicit set of possible goal states, and the test simply
checks whether the given state is one of them; in this example, the
goal state is 4. Sometimes the goal is specified by an abstract property
rather than an explicitly enumerated set of states. For example, in
constraint satisfaction problems (CSP) such as n-queens, the goal is
to reach a state in which no pair of queens will attack each other.

In practice, analyzing and solving a problem is not answering a yes-or-no
question. There are always multiple angles from which to model a problem,
and the way we model and formalize a problem decides the corresponding
algorithm that can be used to solve it. It might also decide the efficiency
and difficulty of solving the problem. Take the Longest Increasing Subsequence
(LIS) as an example.

Figure 2.7: State Transfer Tree Structure for LIS, each path represents a
possible solution. Each arrow represents a move: find an element among
the following elements that is larger than the current node.

There are different ways to model this LIS problem, including:

1. Model the problem as a directed graph, where each node is an element
of the array, and an edge from u to v means v > u. The problem
now becomes finding the longest path from any node to any node in
this directed graph.

2. Model the problem as a tree. The tree starts from an empty root
node; at each level i, the tree has n-i possible children: nums[i+1],
nums[i+2], ..., nums[n-1]. There is an edge only if the child’s value is
larger than its parent’s. Or we can model the tree as a multi-choice
tree: as in combination problems, each element can either be chosen
or not chosen. We end up with two branches at each element, and the
chosen nodes form a candidate increasing subsequence; the LIS is
found at the leaf node with the longest such path.

3. Model it with divide and conquer and optimal substructure, as sketched
below.
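A minimal sketch of the third modeling, assuming the optimal substructure
lis[i] = 1 + max(lis[j]) over j < i with nums[j] < nums[i]; this gives an
O(n^2) dynamic program (the function name is hypothetical):

def length_of_lis(nums):
    if not nums:
        return 0
    lis = [1] * len(nums)              # lis[i]: LIS length ending at i
    for i in range(1, len(nums)):
        for j in range(i):
            if nums[j] < nums[i]:
                lis[i] = max(lis[i], lis[j] + 1)
    return max(lis)

print(length_of_lis([10, 9, 2, 5, 3, 7, 101, 18]))  # 4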


3 Coding Interviews and Resources

In my humble opinion, it is a waste of our precious time to read books
or long blogs that focus purely on the interview process and preparation.
Personally, I would rather read a book that amuses me, work on a personal
project that has some meaning, or just enjoy time with friends and family.
This chapter consists of two parts:

1. Tech Interviews (Section 3.1)

2. Tips and Resources on Coding Interviews (Section ??)

3.1 Tech Interviews


In this section, a brief introduction to coding interviews and the hiring
process for a general software engineering position is provided. For such
positions, coding interviews covering data structures and algorithms are
necessary; the required mastery of this knowledge varies with the requirements
of the specific role.

3.1.1 Coding Interviews and Hiring Process


Coding interviews, a.k.a. whiteboard coding interviews, are one part of the
whole interview process: interviewees are asked to solve one or a few
well-defined problems and write down the code in a 45-60 minute window
while the interviewer watches. This can be done either remotely, via a file
shared between interviewer and interviewee, or face-to-face in a conference
room, on a whiteboard with the interviewer present.


Typically, the interview pipeline for software developer jobs consists of
three stages–an exploratory chat with a recruiter, screening interviews, and
on-site interviews:

• Exploratory chat with recruiters: Whether you applied for the position
and passed the initial screening or were luckily found by recruiters,
they will contact you to schedule a short chat, normally by phone.
During the call, the recruiter introduces the company and the position,
and asks about your field of interest–just to check the degree of
interest on either side and decide whether the process should continue.

• Screening interviews: The screening interviews are usually two back-to-back
coding interviews, each lasting 45-60 minutes. This process
includes introductions on each side–interviewer and interviewee–which
are kept as short as possible to save enough time for the coding
interviews.

• On-site interviews: If you pass the first two rounds, you are invited
to the on-site interviews, which are the most fun and exciting, but
might also be the most daunting and tiring, part of the whole process,
since they can last anywhere from four hours to the entire day. The
company offers both transportation and accommodation to get you
there. The on-site interview consists of 4 to 6 one-on-one rounds, each
with an engineer on the team and lasting 45-60 minutes; due to the
long process, a lunch interview is typically included. There are some
extras that may or may not be included: a group presentation, a
recruiter conversation, or a conversation with the hiring manager or
team manager. Presentations tend to happen for research scientist or
higher-level positions. The on-site interview is more diverse than the
screening interview: introductions, coding interviews, brain-teaser
questions, behavioral questions, and questions related to the field of
the position, such as machine learning or web development. The lunch
interview is just hanging out with whoever is arranged to accompany
you, chatting while eating, and in some cases being shown around the
company.

In some cases, you may have to do an online assignment, which happens
more with start-ups and second-tier tech companies; it requires you to spend
at least two hours solving problems without any promise that it will lead to
real interviews. Personally, I have done that twice, and I never heard back
from those companies. I fully resent such assignments: they are unfair
because they waste my time but not the company’s; I learned nothing, and
the process was boring as hell! Ever since then, I have decided to stop the
interview process whenever such a chore is demanded!

Both the first and second stages serve as initial screening; the purpose is
obvious–a trade-off because of cost, since the remaining interview process
can be quite costly both financially (accommodation and transportation if
you get on-site interviews) and in time (the time spent on each side, but
mainly the 5-8 hours that multiple engineers of the hiring company spend
on the interviewee).
Sometimes the process differs slightly between internship and full-time
positions; interns typically do not need on-site interviews. For new graduates,
getting an internship first and converting it into a full-time offer can ease
the whole job-hunting process a bit. More experienced engineers might get
invited on-site without the screening.

3.1.2 Why Coding Interviews?


The History
Coding interviews originated in the form of in-person interviews with code
written on paper back in the 1970s, were popularized as whiteboard interviews
in the 1990s with the rise of the Internet, and froze in time, living on
to today, the 2020s.
Back in the 70s, computer time was expensive; the electricity, data
processing, and disk space costs were outrageous. Figure 3.1 shows the cost
per megahertz from 1970 to 2007, courtesy of Dr. Mark J. Perry:

Figure 3.1: Computer Prices, Computer Speed and Cost/MHz

Writing code on paper, to be typed into the computer later, was a common,
natural, and effective practice in this era. As Bill Gates described the
experience in a commencement speech at his alma mater, Lakeside School:
“You had to type up your program off-line and create this paper tape–and
then you would dial up the computer and get on, and get the paper in there,
and while you were programming, everybody would crowd around, shouting:
‘Hey, you made a typing mistake.’ ‘Hey, you messed this up.’ ‘Hey, you’re
taking too much time.’”
In the 1990s, code came to be written on whiteboards rather than on
paper, as software engineering grew exponentially with the rise of the Internet.
It was a natural transition: the whiteboard is easy to set up, mistakes are
easy to erase, and the entire team can see the code, which makes it a perfect
medium for discussion.
Now, in the 21st century, when computation is virtually free, the act of
writing out code on a whiteboard or paper continues to be part of our
interview process.

Discussion
Is it a good way to test and select talented candidates? There are different
opinions; in all, people either favor it or oppose it. Stating the reasons on
each side is boring and does not bear much value, so let us see what people
say about it in their own words.

Me: How do you think about coding interviews?

Susie Chen: Ummmmmm well there was like one full month I only did
LeetCode before interviewing with Facebook. LOL, was a bad experience
but worth it hahahah. (Susie was an intern at Facebook, with a
bachelor’s degree from University of.)
.....................................................................
Me: How does the coding interview play its role for new graduates versus
experienced engineers?

Eric Lin:

• Common: Both require previous project demos/descriptions. Your
work matters more than your score. People care more about the
actual experience than the paperwork.

• Diffs: Grads are asked more for a passion or attitude of learning
and problem solving. For experienced engineers, the coding
interview doesn’t matter at all.

(Eric is a Cloud Engineer at Contino, with four years of experience
and a Master’s degree in Information Technology from Monash University,
Australia.)

3.2 Tips and Resources


Because of the focus of this book–learning computer science while having
fun–and the opinion I hold of coding interviews, I will keep this section
short and offer only general information and tips.

3.2.1 Tips
Tips for Preparation

1. First, and the most important tip: Do not be afraid of applying!
Apply to any company you want to join and try your best in the
interviews. There is no need to check statistics about hiring ratios
and be terrified of trying. You have nothing to lose, and it is a good
chance to get first-hand interview experience with them, which might
help you next time. So, be bold, and just do it!

2. Schedule your interviews with companies in increasing order of preference.
This way you get as much practice as possible before you go
on to your dream company.

3. Before doing real interviews, do mock interviews; either ask your
friends for help or find mock-interview websites online.

4. If you are not fluent in English, practice even more using English! This
is the situation for a lot of international STEM students, including me.

I wish I had known the last three tips when I first prepared to interview
with Google back in 2016 (at least I followed the first tip and went for my
dream company without hesitation, huh), one year into my PhD. It was my
very first try; I prepared for a month (I mean at least 8 hours a day),
reading and finishing all problems in Cracking the Coding Interview. I
failed the screening interview, which was conducted over the phone with
a shared Google document. I was super duper nervous, taking a long time
just to understand the question itself given my poor spoken English at the
time, and the noise on the phone made the situation worse (talking with
people on the phone scares me more than the ghosts in The Shining, by
Stephen King). At that time, I also did not have a clue about LeetCode.

5 Tips for the Interview Here we summarize five tips for when we are
doing a real interview, or mocking one beforehand in preparation.

1. Identify Problem Types Quickly: When given a problem, we read
through the description to first understand the task clearly, run small
examples from input to output, and see how it works intuitively in
our mind. After this process, we should be able to identify the type
of the problem. Table 3.1 shows 10 main categories and their distribution
on LeetCode, which also reflects the frequency of each type in
real coding interviews.

2. Do Complexity Analysis: We brainstorm as many solutions as possible,
and use the given maximum input size n to get the upper bound of the
time and space complexity, to see if we can get AC (Accepted) while
not TLE (Time Limit Exceeded).
For example, suppose the maximum size of the input n is 100K, or
10^5 (1K = 1,000), and your algorithm is of order O(n^2). Your common

Table 3.1: 10 Main Categories of Problems on LeetCode, total 877

Types                              Count   Ratio 1   Ratio 2
Array                              -       -         -
Ad Hoc                             -       -         -
String                             -       -         -
Iterative Search                   84      27.8%     15.5%
Complete Search                    -       -         -
Recursive Search                   43      22.2%     13.6%
Divide and Conquer                 15      8%        4.4%
Dynamic Programming                114     6.9%      3.9%
Greedy                             38      -         -
Math and Computational Geometry    103     3.88%     2.2%
Bit Manipulation                   31      2.9%      1.6%
Total                              490     N/A       55.8%

Your common sense tells you that (100K)^2 is an extremely big number: 10^10. So you will try to devise a faster (and correct) algorithm to solve the problem, say of order O(n log_2 n). Now 10^5 log_2 10^5 is only about 1.7 × 10^6. Since computers nowadays are quite fast and can process on the order of 1M, or 10^6 (1M = 1,000,000), operations in a second, your common sense tells you that this one is likely able to pass the time limit.
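This back-of-the-envelope estimate is easy to script. Below is a minimal Python sketch (the 10^6 operations-per-second budget mirrors the rough figure above, not a real benchmark):

import math

n = 100_000  # maximum input size, 10**5

ops_quadratic = n ** 2            # O(n^2): 10**10 operations
ops_loglinear = n * math.log2(n)  # O(n log n): ~1.7 * 10**6 operations

print(f"O(n^2):     {ops_quadratic:.1e} ops")  # 1.0e+10 -- hopeless
print(f"O(n log n): {ops_loglinear:.1e} ops")  # 1.7e+06 -- likely fast enough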

3. Master the Art of Testing Code: We need to design good, comprehen-


sive, edges cases of test cases so that we can make sure our devised
algorithm can solve the problem completely while not partially.

4. Master the Chosen Programming Language: be fluent in its core syntax, data structures, and standard library, so that expressing your ideas in code is effortless during the interview.

3.2.2 Resources
Online Judge System
Leetcode LeetCode is a website where you can practice real interview questions used by tech companies such as Facebook, Amazon, Google, and so on.
Here are a few tips to navigate the usage of LeetCode:

• Use category tags to focus your practice: with the category or topic tags, it is a better strategy to practice and solve problems one type after another, as shown in Fig. 3.2.

• Use test cases to debug: before we submit our code on LeetCode, we should use the test-case function shown in Fig. 3.4 to debug and verify our code first. This is also the right mindset and process in a real interview.

• Use the Discussion section to get more solutions: after solving a problem, reading others' solutions there often reveals cleaner or faster approaches.



Figure 3.2: Topic tags on LeetCode

Figure 3.3: Use Test Case to debug

Algorithm Visualizer If you are inspired more by visualization, check out https://algorithm-visualizer.org/. It offers a tool to visualize the running process of algorithms.

Mock Interviews Online


Interviewing.io Using the website interviewing.io, you can have realistic mock interviews conducted by software engineers working at top tech companies. This can greatly help you overcome fear and tension. Also, if you do well in the practice interviews, you can get real interview opportunities with their partner companies.

Interviewing is a skill that you can get better at. The steps mentioned above can be rehearsed over and over again until you have fully internalized them and following them becomes second nature to you. A good way to practice is to find a friend to partner with, and both of you can take turns interviewing each other.

Figure 3.4: Use Test Case to debug

Table 3.2: Problems categorized by data structure on LeetCode, total 877

Data Structure        Count   % of the 490 categorized   % of all 877 problems
Array                 136     27.8%                      15.5%
String                109     22.2%                      13.6%
Linked List           34      6.9%                       3.9%
Hash Table            87
Stack                 39      8%                         4.4%
Queue                 8       1.6%                       0.9%
Heap                  31      6.3%                       3.5%
Graph                 19      3.88%                      2.2%
Tree                  91      18.6%                      10.4%
Binary Search Tree    13
Trie                  14      2.9%                       1.6%
Segment Tree          9       1.8%                       1%
Total                 490     N/A                        55.8%

A great resource for practicing mock coding interviews is interviewing.io. interviewing.io provides free, anonymous practice technical interviews with Google and Facebook engineers, which can lead to real jobs and internships. By virtue of being anonymous during the interview, the inclusive interview process is de-biased and low risk. At the end of the interview, both interviewer and interviewee can provide feedback to each other for the purpose of improvement. Doing well in your mock interviews will unlock the jobs page and allow candidates to book interviews (also anonymously) with top companies like Uber, Lyft, Quora, Asana and more. For those who are totally new to technical interviews, you can even view a demo interview on the site (requires sign-in).
Aline Lerner, the CEO and co-founder of interviewing.io, and her team are passionate about revolutionizing the technical interview process and helping candidates improve their skills at interviewing. She has also published a number of technical interview-related articles on the interviewing.io blog.

Table 3.3: Problems categorized by algorithm on LeetCode, total 877

Algorithms             Count   Ratio 1   Ratio 2
Depth-first Search     84      27.8%     15.5%
Breadth-first Search   43      22.2%     13.6%
Binary Search          58      18.6%     10.4%
Divide and Conquer     15      8%        4.4%
Dynamic Programming    114     6.9%      3.9%
Backtracking           39      6.3%      3.5%
Greedy                 38
Math                   103     3.88%     2.2%
Bit Manipulation       31      2.9%      1.6%
Total                  490     N/A       55.8%

interviewing.io is still in beta now, but I recommend signing up as early as possible to increase the likelihood of getting an invite.

Pramp Another platform that allows you to practice coding interviews is Pramp. Where interviewing.io matches potential job seekers with seasoned technical interviewers, Pramp takes a different approach: it pairs you up with another peer who is also a job seeker, and the two of you take turns assuming the roles of interviewer and interviewee. Pramp also prepares questions for you, along with suggested solutions and prompts to guide the interviewee.

Communities
If you understand Chinese, there is a good community¹ where people share information about interviews, career advice, and job package comparisons.

1
http://www.1point3acres.com/bbs/
Part II

Warm Up: Abstract Data Structures and Tools

We warm up our "algorithmic problem solving"-themed journey by getting to know the abstract data structures that represent data; fundamental problem-solving strategies–searching and combinatorics; and math tools–recurrence relations and useful math functions, to which we dedicate a standalone chapter due to their important role in both algorithm design and analysis, as we shall see in the following chapters.
4

Abstract Data Structures


4.1 Introduction
Leaving alone statements that “data structures are building blocks of algo-
rithms”, they are just mimicking how things and events are organized in
real-world in the digital sphere. Imagine that a data structure is an old-
schooled file manager that has some basic operations: searching, modifying,
inserting, deleting, and potentially sorting. In this chapter, we are simply
learning how a file manager use to ‘lay out’ his or her files (structures) and
each ‘lay out’s corresponding operations to support his or her work.
We say the data structures introduced in this chapter are abstract or
idiomatic, because they are conventionally defined structures. Understand-
ing these abstract data structures are like the terminologies in computer
science. We further provide each abstract data structure’s corresponding
Python data structure in Part. III.
There are generally three broad ways to organize data: Linear, tree-like,
and graph-like, which we introduce in the following three sections.

Items We use the notion of items throughout this book as a generic name for an unspecified data type.

Records A record is a composite item that bundles several fields together; the node of a linked list, holding a value and a pointer, is an example.


4.2 Linear Data Structures


4.2.1 Array

Figure 4.1: Array Representation

Static Array An array or static array is a container that holds a fixed-size sequence of items stored at contiguous memory locations, where each item is identified by an array index or key. The array representation is shown in Fig. 4.1. Because it uses contiguous memory locations, once we know the physical position of the first element, an offset based on the data type can be used to access any item in the array in O(1) time, which is characterized as random access. Because the items are physically stored one right after the other, the array is the most efficient data structure to store and access items. Specifically, arrays are designed and used for fast random access of data.

Dynamic Array With a static array, once we have declared its size, we are not allowed to do any operation that would change that size; that is, we are banned from inserting or deleting an item at any position of the array. To be able to change size, we can go for a dynamic array: static and dynamic arrays differ in whether the size is fixed. A simple dynamic array can be constructed by allocating a static array, typically larger than the number of elements immediately required. The elements of the dynamic array are stored contiguously at the start of the underlying array, and the remaining positions towards the end of the underlying array are reserved, or unused. Elements can be added at the end of a dynamic array in constant time by using the reserved space, until this space is completely consumed. When all space is consumed and an additional element is to be added, the underlying fixed-size array needs to be increased in size. Typically, resizing is expensive because it involves allocating a new underlying array and copying each element from the original array. Elements can be removed from the end of a dynamic array in constant time, as no resizing is required. The number of elements in the dynamic array's contents is its logical size or size, while the size of the underlying array is called the dynamic array's capacity or physical size, which is the maximum possible size without relocating data. Moreover, if the memory size of the array is beyond the memory size of your computer, it could be impossible to fit the entire array in memory, and then we would resort to other data structures that do not require physical contiguity, such as the linked list, tree, heap, and graph that we introduce next.

Operations To summarize, an array supports the following operations:

• Random access: it takes O(1) time to access one item in the array given its index;

• Insertion and deletion (for dynamic arrays only): it takes O(n) time on average to insert or delete an item from the middle of the array, due to the fact that we need to shift all the items behind it;

• Search and iteration: it takes O(n) time to iterate over all elements in the array. Similarly, searching for an item by value through iteration takes O(n) time too.
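These costs can be seen directly with Python's list, which is a dynamic array; a minimal illustration:

arr = [3, 1, 4, 1, 5]
x = arr[2]         # random access by index: O(1)
arr.append(9)      # append at the end: amortized O(1)
arr.insert(0, 2)   # insert at the front: O(n), shifts all items
arr.pop(0)         # delete from the front: O(n), shifts all items
found = 5 in arr   # search by value: O(n) linear scan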

No matter it’s static or dynamic array, they are static data structures; the
underlying implementation of dynamic array is static array. When frequent
need of insertion and deletion, we need dynamic data structures, The concept
of static array and dynamic array exist in programming languages such as
C–for example, we declare int a[10] and int* a = new int[10], but not
in Python, which is fully dynamically typed(need more clarification).

4.2.2 Linked List


Dynamic data structures, on the other hand, are designed to support flexible size and efficient insertion and deletion. The linked list is one of the simplest dynamic data structures; it achieves flexibility by abandoning the idea of storing items at contiguous locations. Each item is represented separately–meaning it is possible to have items of different data types–and all items are linked together through pointers. A pointer is simply a variable that holds the address of an item as its value. Normally we define a record data structure, namely a node, with two variables: one is the value of the item and the other is a pointer addressing the next node.

Why is it a highly dynamic data structure? Imagine each node as a 'signpost' which says two things: the name of the stop and the address of the next stop. Suppose you start from the first stop; you can head to the next stop since the first signpost tells you its address. You only learn the total number of stops by arriving at the last signpost, which carries no next address. To add a stop, you can put it at the end, at the head, or anywhere in the middle, by modifying the signposts just before and after the one you add.

Figure 4.2: Singly Linked List

Figure 4.3: Doubly Linked List

Singly and Doubly Linked List When each node has only one pointer, the list is called a singly linked list, which means we can only scan nodes in one direction; when there are two pointers, one pointing to its predecessor and another to its successor, it is called a doubly linked list, which supports traversal in both the forward and backward directions.

Operations and Disadvantages

• No random access: to find and access an item in a linked list, we need to start from a given node (usually the head) and scan the items sequentially until we find it;

• Insertion and deletion: it takes only O(1) time to insert or delete an item if we already hold a reference to the node at the position in question (e.g., the node after which to insert);

• Search and iteration: it takes O(n) time to iterate over all items. Similarly, searching for an item by value through iteration takes O(n) time too;

• Extra memory: space for a pointer is required with each element of the list.

Recursive A linked list is actually a recursive data structure: any node can be treated as a head node, making that node and the nodes after it a sub-linked list.
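A minimal sketch of a singly linked list in Python follows; the names Node and insert_after are illustrative, not from a library:

class Node:
    def __init__(self, val, next=None):
        self.val = val    # the item's value
        self.next = next  # pointer (reference) to the successor node

def insert_after(node, val):
    """Insert a new node holding val right after node, in O(1)."""
    node.next = Node(val, node.next)

# Build 1 -> 2 -> 3, then insert 9 after the head: 1 -> 9 -> 2 -> 3.
head = Node(1, Node(2, Node(3)))
insert_after(head, 9)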

4.2.3 Stack and Queue


Stacks and queues are dynamic arrays with restrictions on the deletion operation. Adding and deleting items in a stack follows the "Last in, First out" (LIFO) rule, and in a queue the rule is "First in, First out" (FIFO);

Figure 4.4: Stack VS Queue

this process is shown in Fig. 4.4. We can simply think of a stack as a pile of plates: we always put back and fetch a plate from the top of the pile. A queue is just like a real-life waiting line: to be served first with your delicious ice cream, you need to be at the head of the line.

Implementation-wise, a stack or a queue can simply be a dynamic array to which we add items by appending at the end; they differ only in the delete operation: for a stack, we delete the item from the end; for a queue, we delete the item from the front instead. Of course, we can also implement them with any other linear data structure, such as a linked list. Conventionally, the add and delete operations are called "push" and "pop" in a stack, and "enqueue" and "dequeue" in a queue.

Operations Stacks and queues support limited access, insertion, and deletion; search and iteration rely on the underlying data structure.
Stacks and queues are widely used in computer science. First, they are used to implement the three fundamental searching strategies–Depth-first, Breadth-first, and Priority-first Search. Also, a stack is a recursive data structure, as it can be defined as:

• a stack is either empty or

• it consists of a top and the rest which is a stack;
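As a quick sketch in Python, a list serves as a stack and collections.deque as a queue (popping the front of a plain list would cost O(n)):

from collections import deque

stack = []
stack.append('a')  # push
stack.append('b')  # push
stack.pop()        # pop -> 'b': Last in, First out

queue = deque()
queue.append('a')  # enqueue
queue.append('b')  # enqueue
queue.popleft()    # dequeue -> 'a': First in, First out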

4.2.4 Hash Table


A hash table is a data structure that (a) stores items formed as {key: value} pairs, and (b) uses a hash function index = h(key) to compute an index into an array of buckets or slots, in which the mapped value is stored and from which it is accessed. Ideally, given a key, we can find its value in constant time–just by computing the hash function. An example is shown in Fig. 4.5. Hashing will not allow two pairs that have the same key.

Figure 4.5: Example of a hash table

First, the key needs to be a number; when it is not, a conversion from whatever type it is to a number is necessary. From now on, we assume the keys passed to our hash function are all numbers. We define a universe set of keys U = {0, 1, 2, ..., |U| − 1}. To frame hashing as a math problem: given a set of n {key: value} pairs whose keys are drawn from U, a hash function needs to be designed to map each key to a slot in the range {0, ..., m − 1} so that the pairs fit into a table of size m (denoted by T[0...m − 1]), usually with n > m. We denote this mapping relation as h : U → {0, ..., m − 1}. The simplest hash function is h = key, called direct hashing, which is only possible when the keys are drawn from {0, ..., m − 1}, and that is usually not the case in reality.
Continuing from this framing: when two keys are mapped to the same slot–which will surely happen given n > m–we have a collision. In reality, a well-designed hashing mechanism should include: (1) a hash function which minimizes the number of collisions, and (2) an efficient collision resolution for when they occur.

Hashing Functions
The essence of designing hash functions is uniformity and randomness. We further use h(k, m) to represent our hash function, which takes two variables as input: the key k, and the size m of the table where values are saved. One essential rule for hashing is that if two keys are equal, the hash function should produce the same hash value (h(s, m) = h(t, m) if s = t). And we try our best to minimize collisions, making it unlikely for two distinct keys to map to the same value. The expected number of keys mapped to the same slot is therefore α = n/m, which is called the load factor and is a critical statistic for designing hashing and analyzing its performance. Besides, a good hash function satisfies the condition of simple uniform hashing: each key is equally likely to be mapped to any of the m slots. But usually it is not possible to check this condition, because one rarely knows the probability distribution according to which the keys are drawn. There are generally four methods:

1. The direct addressing method, h(k, m) = k, with m = |U|. Direct addressing can be impractical when the key universe is beyond the memory size of a computer; it also simply wastes space when the number of stored keys n is far smaller than m.

2. The division method, h(k, m) = k % m, where % is the modulo operation in Python; it gives the remainder of k divided by m. A large prime number not too close to an exact power of 2 is often a good choice of m. The purpose of the prime number is to minimize collisions when the data exhibits particular patterns. For example, compare m = 4 and m = 7 on keys = [10, 20, 30, 40, 50]:

   key   m = 4            m = 7
   10    10 = 4*2 + 2     10 = 7*1 + 3
   20    20 = 4*5 + 0     20 = 7*2 + 6
   30    30 = 4*7 + 2     30 = 7*4 + 2
   40    40 = 4*10 + 0    40 = 7*5 + 5
   50    50 = 4*12 + 2    50 = 7*7 + 1

   Because the keys share a common factor c = 2 with the bucket size m = 4, the range of the remainder shrinks to 1/c of its original range. As shown in the example, the remainders are just {0, 2}, which is only half of the space of m. The effective load factor increases to cα. Using a prime number is an easy way to avoid this, since a prime number has no factors other than 1 and itself.
   If the table size cannot easily be adjusted to a prime number, we can use h(k, m) = (k % p) % m, where p is a prime number chosen from the range m < p < |U|.
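A few lines of Python reproduce the table above and show why m = 7 spreads the keys better; this is just an illustrative sketch:

def h(k, m):
    return k % m  # the division method: remainder of k divided by m

keys = [10, 20, 30, 40, 50]
print([h(k, 4) for k in keys])  # [2, 0, 2, 0, 2] -- only slots {0, 2} are used
print([h(k, 7) for k in keys])  # [3, 6, 2, 5, 1] -- spreads across the table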

3. The multiplication method, h(k, m) = ⌊m(kA % 1)⌋, where A ∈ (0, 1) is a chosen constant; a common suggestion is A = (√5 − 1)/2. Here kA % 1 means the fractional part of kA, which equals kA − ⌊kA⌋ and is also written {kA}. For example, the fractional part of 45.2 is 0.2. In this case, the choice of m is not as critical as in the division method; for convenience, m = 2^p for some integer p is suggested.
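A sketch of the multiplication method with the suggested constant (the helper name h_mul is mine):

import math

A = (math.sqrt(5) - 1) / 2  # ~0.618, the suggested constant

def h_mul(k, m):
    frac = (k * A) % 1.0        # fractional part {kA} = kA - floor(kA)
    return math.floor(m * frac)

print([h_mul(k, 8) for k in [10, 20, 30, 40, 50]])  # [1, 2, 4, 5, 7]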

4. The universal hashing method: because any fixed hash function is vulnerable to worst-case behavior where all n keys hash to the same index, an effective defense is to choose the hash function randomly, for each execution, from a set of predefined hash functions–the same hash function must then be used for all accesses to the same table. However, building the set of predefined functions with the division method would require finding multiple prime numbers, which is not easy. A practical family is h(k, m) = ((ak + b) % p) % m, where p is a prime and a, b < p are integers chosen at random.

Resolving Collision
Collisions are unavoidable given m < n, and sometimes it is just pure bad luck that your data and the chosen hash function produce lots of collisions; thus, we need mechanisms to resolve them. We introduce three methods: chaining, open addressing, and perfect hashing.

Figure 4.6: Chaining to resolve collisions in a hash table

Chaining An easy resolution is to chain the keys that have the same hash value together using a linked list (either singly or doubly linked). For example, with h(k, m) = k % 4 and keys = [10, 20, 30, 40, 50], the keys 10, 30, and 50 are mapped to the same slot 2; therefore, we chain them up at index 2 using a singly linked list, as shown in Fig. 4.6. This method has the following characteristics:

• The average-case time for searching is O(α) under the assumption of simple uniform hashing.

• The worst-case running time for insertion is O(1).

• The worst-case behavior occurs when all keys are mapped to the same slot.

The advantage of chaining is its flexible size: we can always add more items by chaining them on, which is useful when the size of the input set is unknown or too large. However, this advantage comes at the price of extra space for the pointers.
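A minimal chained hash table sketch in Python (using lists as the chains rather than explicit linked-list nodes, for brevity; the class name is illustrative):

class ChainedHashTable:
    def __init__(self, m=4):
        self.m = m
        self.slots = [[] for _ in range(m)]  # one chain per slot

    def put(self, key, value):
        chain = self.slots[key % self.m]
        for i, (k, _) in enumerate(chain):
            if k == key:               # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))     # otherwise chain it on: O(1)

    def get(self, key):
        for k, v in self.slots[key % self.m]:  # O(chain length) scan
            if k == key:
                return v
        raise KeyError(key)

t = ChainedHashTable(m=4)
for k in [10, 20, 30, 40, 50]:
    t.put(k, k * 10)
print(t.get(30))  # 300; keys 10, 30, 50 all share slot 2 via chaining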

Open Addressing In open addressing, all items are stored in the hash table itself; this requires the size of the hash table to satisfy m ≥ n, makes each slot either contain an item or be empty, and bounds the load factor by α ≤ 1. It avoids the use of pointers, saving space. So here is the question: what do we do when there is a collision?

• Linear probing: Assume that, at first, we save an item at position p1 = h(k1, m). When another pair {k2: v2} comes, we compute its index h(k2, m). If this index equals p1, we simply check the position right after p1 in cyclic order (from p1 to the end of the hash table, then continuing from the start of the table and ending at p1 − 1): if it is empty, we save the value at p1 + 1; otherwise, we try p1 + 2, and so on until we find an empty spot. This is called linear probing. However, another key, say k3, might also hash to index p1; k3 would collide with k1 at p1 and with k2 at p1 + 1, and such a follow-on collision is called a secondary collision. When the table is relatively full, secondary collisions can degrade searching in the hash table to linear search. We can denote linear probing as

  h′(k, i, m) = (h(k, m) + i) % m, for i = 0, 1, 2, ..., m − 1,    (4.1)

  where i marks the number of tries. Now, try deleting an item from a linear probing table where T[p1] = k1, T[p1 + 1] = k2, T[p1 + 2] = k3. Say we delete k1: we repeat the hash function, find it at p1, and delete it, leaving T[p1] = NULL. Later, when we need to delete or search for k2, we first probe p1, find that it is already empty, and conclude that k2 is not in the table. Do you see the problem here? k2 is actually at p1 + 1, but the probing process could not know that. A simple resolution is, instead of really deleting the value, to put a flag, say deleted, at any position whose key is supposedly deleted. Now, to delete k1, we mark p1 as deleted. This time, when we try to delete k2, we first go to p1 and see that its content does not match k2, so we know we should move on to p1 + 1 and check that value: it matches, so we put a marker there as well. A code sketch follows below.

• *Other methods: Even though linear probing works, it is far from perfect. We can try to decrease secondary collisions using quadratic probing or double hashing.

In general, open addressing computes a probe sequence [h′(k, 0, m), h′(k, 1, m), ..., h′(k, m − 1, m)], which is a permutation of [0, 1, 2, ..., m − 1]; we successively probe the slots until an empty slot is found.
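Below is a minimal Python sketch of linear probing with a deleted marker; EMPTY and DELETED are illustrative sentinels, and the class omits search and resizing for brevity:

EMPTY, DELETED = object(), object()

class LinearProbingTable:
    def __init__(self, m=8):
        self.m = m
        self.keys = [EMPTY] * m

    def insert(self, key):
        for i in range(self.m):
            p = (key % self.m + i) % self.m       # h'(k, i, m) from Eq. 4.1
            if self.keys[p] in (EMPTY, DELETED):  # markers are reusable
                self.keys[p] = key
                return
        raise RuntimeError("table is full")

    def delete(self, key):
        for i in range(self.m):
            p = (key % self.m + i) % self.m
            if self.keys[p] is EMPTY:   # a truly empty slot ends the probe
                raise KeyError(key)
            if self.keys[p] == key:
                self.keys[p] = DELETED  # mark instead of emptying the slot
                return
        raise KeyError(key)

t = LinearProbingTable(m=8)
for k in (1, 9, 17):  # all hash to slot 1, then probe to slots 1, 2, 3
    t.insert(k)
t.delete(1)           # slot 1 becomes DELETED, so 9 and 17 remain findable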

*Perfect Hashing

Figure 4.7: Examples of graphs. Left: representing an undirected graph as directed; middle: undirected graph; right: directed graph; rightmost: weighted graph.

4.3 Graphs
4.3.1 Introduction
Graphs are a natural way to represent connections and reasoning between things or events. A graph is made up of vertices (nodes or points) which are connected by edges (arcs or lines). A graph structure is shown in Fig. 4.7. We use G to denote the graph, and V and E to refer to its collections of vertices and edges, respectively. Accordingly, |V| and |E| denote the number of vertices and edges in the graph. An edge between vertices u and v is denoted as a pair (u, v); depending on the type of the graph, the pair can be either ordered or unordered.

Many fields rely heavily on graphs, such as probabilistic graphical models applied in computer vision, route problems, network flow in network science, link structures of websites in social media, and so on. We present the graph here as a data structure. However, a graph is really a broad way to model problems; for example, we can model the possible solution space as a graph and apply graph search to find a solution to a problem. So, do not let the physical graph data structure limit our imagination.

The representation of graphs is deferred to Chapter ??. In the next section, we introduce the types of graphs.

4.3.2 Types of Graphs


Undirected Graph VS Directed Graph If an edge is directed, it points from u (the tail) to v (the head), but not the other way around; this means we can reach v from u, but not the opposite. An ordered pair (u, v) can denote this edge; in this book, we denote it as (u → v). If all edges are directed, we say it is a directed graph, as shown on the right of Fig. 4.7. If every edge e ∈ E is an unordered pair (u, v)–reachable in both directions for u, v ∈ V–then the graph is an undirected graph, as shown in the middle of Fig. 4.7.

Unweighted Graph VS Weighted Graph In a weighted graph, each edge of G is assigned a numerical value, or weight. For example, a road network can be drawn as a directed and weighted graph: two arcs if a road is two-way, and one arc if it is one-way instead; the weight of an edge might be the length, speed limit, or traffic. In unweighted graphs, there is no cost distinction between various edges and vertices.

Embedded Graph VS Topological Graph A graph is defined without geometric positions of its own, meaning we can literally draw the same graph with its vertices arranged at different positions. We call a specific drawing of a graph an embedding, and the drawn graph is called an embedded graph. Occasionally, the structure of a graph is completely defined by the geometry of its embedding, as in the famous travelling salesman problem; grids of points are another example of topology arising from geometry. Many problems on an n × m grid involve walking between neighboring points, so the edges are implicitly defined by the geometry.

Implicit Graph VS Explicit Graph Certain graphs are not explicitly constructed and traversed, but can still be modeled as graphs. For example, a grid of points can be viewed as an implicit graph, where each point is a vertex and a point usually links to its neighbors through implicit edges. Working with implicit graphs takes more imagination and practice. Another example will be seen in backtracking, which we study in Chapter ??: the vertices of the implicit graph are the states of the search vector, while edges link pairs of states that can be directly generated from each other. It is totally OK if you do not get this right now; relax, and come back to think about it later.

Terminologies of Graphs In order to apply graph abstractions to real problems, it is important to get familiar with the following terminologies:

1. Path: A path in a graph is a sequence of adjacent vertices. For example, there is a path (0, 1, 2, 4) in both the undirected and directed graphs in Fig. 4.7. The length of a path in an unweighted graph is the total number of edges that it passes through–i.e., one less than the number of vertices on the path. A simple path is a path with no repeated vertices. In a weighted graph, the length may instead be the sum of the weights of its constituent edges. Obtaining the shortest path is a common task of real value.

2. Cycle: A cycle is a path that starts and ends at the same vertex. In a directed graph, a cycle can have length one, i.e., a self-loop. A simple cycle is a cycle that has no repeated vertices other than the start and the end vertices being the same. In an undirected graph, a (simple) cycle is a path that starts and ends at the same vertex, has no repeated vertices other than the first and last, and has length at least three. In this book we will exclusively talk about simple cycles and hence, as with paths, we will often drop "simple". A graph is acyclic if it contains no cycles. Directed acyclic graphs are often abbreviated as DAGs.

3. Distance: The distance σ(u, v) from a vertex u to a vertex v in a graph G is the length of the shortest path (minimum number of edges) from u to v. It is also referred to as the shortest-path length from u to v.

4. Diameter: The diameter of a graph is the maximum shortest-path length over all pairs of vertices: diam(G) = max{σ(u, v) : u, v ∈ V}.

5. Tree: An acyclic and undirected graph is a forest, and if it is connected it is called a tree. A rooted tree is a tree with one vertex designated as the root. A tree can be a directed graph too, with the edges typically all directed toward the root or away from the root. We will detail this more in the next section.

6. Subgraph: A subgraph is another graph whose vertex set Vs and edge set Es are subsets of those of G, where all endpoints of edges in Es must be included in Vs (Vs might have additional vertices). When Vs = V and Es ⊆ E, so that the subgraph includes all vertices of G, it is called a spanning subgraph; when Vs ⊆ V and Es contains exactly the edges of E whose endpoints both belong to Vs, it is called an induced subgraph.

7. Complete Graph: A graph in which each pair of vertices is connected by an edge is a complete graph. A complete graph with n = |V| vertices is denoted Kn, and it has C(n, 2) = n(n − 1)/2 edges in total.

Figure 4.8: Bipartite Graph

8. Bipartite Graph: A bipartite graph, a.k.a. bigraph, is a graph whose vertices can be divided into two disjoint sets V1 and V2 such that no two vertices within the same set are adjacent. Equivalently, a bipartite graph is a graph with no odd cycles, or a graph that can be properly colored with two colors. See Fig. 4.8.

9. Connected Graph: A graph is connected if there is a path joining each pair of vertices; that is, it is always possible to travel in a connected graph from one vertex to any other. If a graph is not fully connected, its maximal connected parts are called its (connected) components.

4.3.3 Reference
1. http://www.cs.cmu.edu/afs/cs/academic/class/15210-f14/www/
lectures/graph-intro.pdf

4.4 Trees
Trees in Interviews The most widely used trees are the binary tree and the binary search tree, which are also behind the most popular tree problems in real interviews. There is a large chance you will be asked to solve a binary-tree or binary-search-tree related problem in a real coding interview, especially as a new graduate, who has no industrial experience and has had little chance yet to put the relevant knowledge into practice.

4.4.1 Introduction
A tree is essentially a simple graph which is (1) connected, (2) acyclic, and (3) undirected. Connecting n nodes without a cycle requires n − 1 edges; adding one more edge creates a cycle, and removing one edge divides the tree into two components. A tree represented simply as such a graph–whose representations we learned in the last section–is called a free tree. A forest is a set of n ≥ 0 disjoint trees.

However, free trees are not commonly applied in computer science (nor in coding interviews), and there is a better form–the rooted tree. In a rooted tree, a special node called the root is singled out, and all edges are oriented to point away from the root. The root node and the one-way structure enable a rooted tree to indicate a hierarchical relation between nodes, which a free tree cannot. A comparison between a free tree and a rooted tree is shown in Fig. 4.9.

Rooted Trees
A rooted tree introduces parent-child and sibling relationships between nodes to indicate the hierarchy.

Figure 4.9: Example of Trees. Left: Free Tree, Right: Rooted Tree with
height and depth denoted

Three Types of Nodes Just like a real tree, we have the root, branches, and finally the leaves. The first node of the tree is called the root node, which is likely connected to several underlying children nodes, making the root node the parent node of its children. Besides the root node, there are two other kinds of nodes: inner nodes and leaf nodes. A leaf node is found at the last level of the tree and has no further children. An inner node is any node that has both a parent and children–that is, any node that cannot be characterized as either a leaf or the root. A node can be both the root and a leaf at the same time if it is the only node in the tree.

Terminologies of Nodes We define the following terminologies to char-


acterize nodes in a tree.

• Depth: The depth (or level) of a node is the number of edges from
the node to the tree’s root node. The depth of the root node is 0.

• Height: The height of a node is the number of edges on the longest


path from the node to a leaf. A leaf node will have a height of 0.

• Descendant: A descendant of a node is any node that is reachable by repeatedly proceeding from parent to child starting from this node. Descendants are also known as subchildren.

• Ancestor: The ancestor of a node is any node that is reachable by


repeated proceeding from child to parent starting from this node.

• Degree: The degree of a node is the number of its children. A leaf necessarily has degree zero.

Terminologies of Trees Following the characteristics of nodes, we fur-


ther define some terminologies to describe a tree.

• Height: The height (or depth) of a tree is the height of its root node, or equivalently, the depth of its deepest node.

• Diameter: The diameter (or width) of a tree is the number of nodes


(or edges) on the longest path between any two leaf nodes.

• Path: A path is defined as a sequence of nodes and edges connecting


a node with a descendant. We can classify them into three types:

1. Root->Leaf Path: the starting and ending nodes of the path are the root and a leaf node, respectively;
2. Root->Any Path: the starting and ending nodes of the path are the root and any node (inner or leaf), respectively;
3. Any->Any Path: the starting and ending nodes of the path can each be any node (root, inner, or leaf).

Representation of Trees Like a linked list, which chains nodes together via pointers–once the first node is given, we can reach the information of all nodes–a rooted tree can be represented with nodes consisting of values and pointers too. Because a node in a tree can have multiple children, a node can hold multiple pointers. Such a representation makes a rooted tree a recursive data structure: each node can be viewed as a root node, making that node and all nodes reachable from it a subtree of its parent. This recursive structure is the main reason we separate trees from the field of graphs and make them a data structure of their own. The advantages are summarized as:

• A tree is an easier data structure that can be recursively represented as a root node connected with its children.

• Trees can be used to organize data and come with efficient information retrieval. Because of the recursive tree structure, divide and conquer can easily be applied to trees (a problem can most likely be divided into subproblems related to its subtrees). Examples include the segment tree, binary search tree, and binary heap, and, for pattern matching, tries and suffix trees.

The recursive representation is also called the explicit representation. Its counterpart–the implicit representation–uses no pointers but an array, wherein the connections are implied by the positions of the nodes. We will see how it works in the next section.
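A minimal sketch of the explicit (recursive) representation in Python, with a recursive node count to show how each child is itself a subtree root:

class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left    # pointer to the left child (root of a subtree)
        self.right = right  # pointer to the right child

def count(root):
    """Count nodes recursively: a node plus its two subtrees."""
    if root is None:
        return 0
    return 1 + count(root.left) + count(root.right)

root = TreeNode(1, TreeNode(2, TreeNode(4)), TreeNode(3))
print(count(root))  # 4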

Applications of Trees Trees have various applications due to their convenient recursive structure, which connects trees with one fundamental algorithm design methodology–divide and conquer. We summarize the following important applications of trees:

1. Unlike arrays and linked lists, trees are hierarchical: (1) we can store information that naturally forms a hierarchy, e.g., the file system on a computer or the employee relations at a company; (2) we can organize the keys of a tree with an ordering, e.g., the binary search tree, the segment tree, and the trie used to implement prefix lookup for strings.

2. Trees are relevant to the study and analysis of algorithms, not only because they implicitly model the behavior of recursive programs but also because they are explicitly involved in many basic algorithms that are widely used.

3. Algorithms on graphs can be analyzed with the concept of trees: the traversal orders of BFS and DFS can be represented as trees, and a spanning tree includes all of the vertices of a graph. These trees are the basis of other kinds of computational problems in the field of graphs.

A tree is a recursive structure; it can be used to visualize almost any recursion-based algorithm design, or even to compute the complexity, in which case it is specifically called a recursion tree.

4.4.2 N-ary Trees and Binary Tree

Figure 4.10: A 6-ary Tree Vs a binary tree.

For a rooted tree, if each node has no more than N children, it is called an N-ary tree. When N = 2, it is further distinguished as a binary tree, whose (up to) two children are typically called the left child and right child. Fig. 4.10 shows a comparison of a 6-ary tree and a binary tree. The binary tree is more common than the N-ary tree because it is simpler and more concise, thus making it more popular in coding interviews.

Figure 4.11: Example of different types of binary trees

Types of Binary Tree There are five common types of binary tree:

1. Full Binary Tree: A binary tree is full if every node has either 0 or 2 children; equivalently, it is a binary tree in which all nodes except leaves have two children. In a full binary tree, the number of leaves (|L|) and the number of non-leaf nodes (|NL|) satisfy |L| = |NL| + 1. The maximum total number of nodes for height h (attained when every level is completely filled) is:

   n = 2^0 + 2^1 + 2^2 + ... + 2^h    (4.2)
     = 2^(h+1) − 1                    (4.3)

2. Complete Binary Tree: A binary tree is complete if all levels are completely filled except possibly the last level, and the last level has all its keys as far left as possible.

3. Perfect Binary Tree: A binary tree is perfect if all internal nodes have two children and all leaves are at the same level. This also means a perfect binary tree is both a full and a complete binary tree.

4. Balanced Binary Tree: A binary tree is balanced if the height of the tree is O(log n), where n is the number of nodes. For example, the AVL tree maintains O(log n) height by making sure that the difference between the heights of the left and right subtrees is at most 1.

5. Degenerate (or pathological) Tree: A tree where every internal node has exactly one child. Such trees are performance-wise the same as a linked list.

We show one example of each in Fig. 4.11.

A complete or perfect binary tree can be represented with an array: we assign index 0 to the root node, and given a node with index i, its children are at indices 2i + 1 and 2i + 2. This is called the implicit representation; its counterpart, the recursive representation, is called the explicit representation.
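A small sketch of the implicit representation in Python (index arithmetic only; the helper names are mine):

tree = [1, 2, 3, 4, 5, 6]  # a complete binary tree, root at index 0

def children(i):
    """Indices of node i's children (they may fall outside the array)."""
    return 2 * i + 1, 2 * i + 2

def parent(i):
    return (i - 1) // 2

print(children(1))  # (3, 4): the node holding 2 has children 4 and 5
print(parent(4))    # 1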
5

Introduction to Combinatorics

In discrete optimization, some or all of the variables in a model are required to belong to a discrete set; this is in contrast to continuous optimization, in which the variables are allowed to take on any value within a range. There are two branches of discrete optimization: integer programming and combinatorial optimization, where the discrete set is a set of objects, or combinatorial structures, such as assignments, combinations, routes, schedules, or sequences. Combinatorial optimization is the process of searching for the maxima (or minima) of an objective function F whose domain is a discrete but large configuration space (as opposed to an N-dimensional continuous space). Typical combinatorial optimization problems are the travelling salesman problem ("TSP"), the minimum spanning tree problem ("MST"), and the knapsack problem. We start with basic combinatorics, which lets us enumerate all solutions exhaustively; in later chapters we will dive into different combinatorial/discrete optimization problems.
Combinatorics, a branch of mathematics mainly concerned with counting and enumerating, is a means of obtaining results about certain properties of finite structures. Combinatorics is used frequently in computer science to obtain formulas and estimates in both the design and analysis of algorithms. It is a broad–and thus seemingly hard to define–topic that can solve the following types of questions:

• the counting or enumeration of specified structures, sometimes referred to as arrangements or configurations in a very general sense, associated with finite systems;

• the existence of such structures satisfying certain given criteria, usually formulated as Constraint Satisfaction Problems (CSPs);


• optimization: finding the "best" structure or solution among several possibilities, be it the "largest", the "smallest", or one satisfying some other optimality criterion.

In this chapter, we introduce common combinatorics that help us come up with the simplest–though potentially quite large–state space. At the very least this is a first step, and solving a small problem in this way might offer more insight toward finding a better solution.

When the situation is easy, we can mostly figure out the counting with some logic and get a closed-form solution; when the situation is more complex, as in the partition section, we take a detour through recurrence relations and mathematical induction.

5.1 Permutation
Given a list of integers [1, 2, 3], how many ways can we order these three numbers? Imagine that we have three positions for the three integers. The first position can choose among 3 integers, leaving the second position with 2 options. When we reach the last position, it can only take whatever is left: 1 option. The total count is 3 × 2 × 1.

Similarly, for n distinct numbers, the number of permutations is n × (n − 1) × ... × 1, abbreviated as the factorial, n!. It is worth noticing that the factorial sequence grows even more quickly than an exponential sequence such as 2^n.

5.1.1 n Things in m positions


Permutation of n things in n positions is denoted as p(n, n). What if we have m ∈ [1, n − 1] positions instead? How do we get a closed-form function for p(n, m)? The process is the same: we fix each position and consider the number of choices of things each one has.

p(n, m) = n × (n − 1) × ... × (n − (m − 1))                                            (5.1)
        = [n × (n − 1) × ... × (n − m + 1) × (n − m) × ... × 1] / [(n − m) × ... × 1]  (5.2)
        = n! / (n − m)!                                                                (5.3)

If we want p(n, n) to follow the same form, it requires us to define 0! = 1.
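A two-line check of the closed form in Python (math.perm, available in Python 3.8+, computes the same quantity):

import math

def P(n, m):
    return math.factorial(n) // math.factorial(n - m)  # n! / (n - m)!

print(P(3, 3))  # 6: all orderings of three distinct numbers
print(P(3, 2))  # 6, and math.perm(3, 2) agrees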

What if there are repeated things, i.e., the things are not all distinct?

5.1.2 Recurrence Relation and Math Induction

The count–and the full set–of permutations can be generated incrementally. We demonstrate how with a recurrence relation and mathematical induction. We start from P(n, 0) = 1: there is exactly one way to fill zero positions. With mathematical induction, assume we know P(n, m − 1), the number of ways to fill the first m − 1 positions. What choices does the m-th position have? It can pick any of the n − (m − 1) things not yet used, giving P(n, m) = (n − m + 1) × P(n, m − 1).

Now we can use the iterative method to obtain the closed-form solution:

P(n, m) = (n − m + 1) × P(n, m − 1)                                     (5.4)
        = (n − m + 1) × (n − m + 2) × P(n, m − 2)                       (5.5)
        ...                                                             (5.6)
        = (n − m + 1) × (n − m + 2) × ... × n × P(n, 0) = n!/(n − m)!   (5.7)

5.1.3 See Permutation in Problems

Suppose we want to sort an array of integers in increasing order, say A = [4, 8, 2]. The right order is [2, 4, 8], which is trivial to obtain in this case. If we were to frame it as a search problem, we would need to define a search space. Using our knowledge of combinatorics, we know all possible orderings of these numbers: [4,8,2], [4,2,8], [2,4,8], [2,8,4], [8,2,4], [8,4,2]. We could generate all possible orderings, save them in an array, and convert the sorting problem into checking which ordering is increasingly sorted. However, this comes at a large price in space, since for n numbers the number of possible orderings is n!. A smarter way is to check each ordering as we generate it.
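A sketch of this "sorting as search" idea in Python: itertools.permutations yields orderings one at a time, so we can test each as it is generated instead of storing all n! of them:

from itertools import permutations

def permutation_sort(A):
    for p in permutations(A):
        if all(p[i] <= p[i + 1] for i in range(len(p) - 1)):
            return list(p)  # first non-decreasing ordering found

print(permutation_sort([4, 8, 2]))  # [2, 4, 8]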

5.2 Combination
Same as before, we choose m things out of n, but with one difference–the order does not matter. How many ways do we have? This problem is called combination, and it is denoted as C(n, m). For example, for [1, 2, 3], C(3, 2) gives [1, 2], [2, 3], [1, 3]. Comparatively, P(3, 2) gives [1, 2], [2, 1], [2, 3], [3, 2], [1, 3], [3, 1].
To get the combination count, we can apply permutation first. However, this over-counts: as shown in our example, when there are two things in a combination, the permutation counts it twice. If there are m things, we over-count by m! times. Therefore, dividing the permutation count by the number of permutations of m things gives our formula for combination:

C(n, m) = P(n, m) / P(m, m)     (5.8)
        = n! / ((n − m)! m!)    (5.9)

Back to the earlier question about permutations with repeated things, we can use the same idea. Assume we have n things in total but only m types, and let ai, i ∈ [1, m], denote the number of things of each type, so that a1 + a2 + ... + am = n. The number of ways to linearly order these objects is n! / (a1! a2! ... am!).

Choosing k things out of n is the same as choosing the n − k things to leave out:

C(n, k) = C(n, n − k) (5.10)

5.2.1 Recurrence Relation and Math Induction


We also show how combinations can be generated incrementally. We start from C(0, 0) = 1, and easily we get C(n, 0) = 1. Assume we know the counts for the first n − 1 items; now consider the n-th item:

• Use the n-th item: then we need to pick the remaining k − 1 items from the other n − 1 items, contributing C(n − 1, k − 1).

• Do not use the n-th item: then we need to pick all k items from the other n − 1 items, contributing C(n − 1, k).

Thus, we have C(n, k) = C(n − 1, k − 1) + C(n − 1, k), which is called Pascal's Identity.
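Pascal's identity translates directly into code; a memoized Python sketch:

from functools import lru_cache

@lru_cache(maxsize=None)
def C(n, k):
    if k == 0 or k == n:  # one way to choose nothing or everything
        return 1
    return C(n - 1, k - 1) + C(n - 1, k)  # use item n, or don't

print(C(3, 2))  # 3
print(C(4, 2))  # 6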

What if things are not distinct?

5.3 Partition
We discuss three types of partitions: (1) integer partition, (2) set partition, and (3) array partition. In this section, counting becomes less obvious than for combinations and permutations; this is where we rely more on recurrence relations and mathematical induction.

5.3.1 Integer Partition


Integer Partition Definition Integer partition is to partition a given integer n into multisets of positive integers that add up to n. For example, given n = 5, the resulting partitions are these 7 multisets:

{5}
{4, 1}
{3, 2}
{3, 1, 1}
{2, 2, 1}
{2, 1, 1, 1}
{1, 1, 1, 1, 1}

Analysis Let us write a resulting partition as a sequence (a1, a2, ..., ak) with a1 ≥ a2 ≥ ... ≥ ak ≥ 1 and a1 + a2 + ... + ak = n; the ordering simply helps us track the sequence.

The easiest way to generate integer partitions is to construct them incrementally. We first start from the partition {n}; for n = 5, we get {5} first. Then we subtract one from an item larger than 1 (call it l) and either add it to the smallest item s–if it exists and the result still satisfies s + 1 < l, i.e., s < l − 1–or put it aside as a new item of value 1. For {5} there is no other item, so it becomes {4, 1}. For {4, 1}, following the same rule, we get {3, 2}; for {3, 2}, we get {3, 1, 1}; and so on:

1. {5}: no other smaller item, put a 1 aside
2. {4, 1}: satisfies s < l − 1, becomes {3, 2}
3. {3, 2}: does not satisfy s < l − 1, put a 1 aside
4. {3, 1, 1}: satisfies s < l − 1, add it in
5. {2, 2, 1}: does not satisfy, put a 1 aside
6. {2, 1, 1, 1}: does not satisfy, put a 1 aside
7. {1, 1, 1, 1, 1}

Try to generate the partition when n=6.

If we draw out the transfer graph, we can see a lot of overlap among states. Therefore, we add one more limitation to the condition: s > 1.
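For reference, here is a short recursive Python generator that produces all partitions of n in the decreasing order used above. It follows the standard "largest part first, capped by the previous part" recursion rather than the subtract-one rule just described:

def partitions(n, cap=None):
    cap = n if cap is None else cap
    if n == 0:
        yield []
        return
    for first in range(min(n, cap), 0, -1):  # next part is at most cap
        for rest in partitions(n - first, first):
            yield [first] + rest

for p in partitions(5):
    print(p)  # [5], [4, 1], [3, 2], [3, 1, 1], [2, 2, 1], ...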

5.3.2 Set Partition


Set Partition Problem Definition How many ways exist to partition a set of n distinct items a1, a2, ..., an into k nonempty subsets, k ≤ n? Here are the 7 ways that we can partition the set {a1, a2, a3, a4} into 2 nonempty subsets:

{a1}, {a2, a3, a4};
{a2}, {a1, a3, a4};
{a3}, {a1, a2, a4};
{a4}, {a1, a2, a3};
{a1, a2}, {a3, a4};
{a1, a3}, {a2, a4};
{a1, a4}, {a2, a3}.

Let us denote the total number of ways as s(n, k). As seen in the example, with 2 groups and 4 items there are two combinations of group sizes: 1+3 and 2+2. For the combination 1+3, this is equivalent to choosing one item from the set for the first subset, C(4, 1), and then choosing 3 items for the other subset, C(3, 3). For the combination 2+2, we have C(4, 2) for one subset and C(2, 2) for the other; however, because the ordering of equal-sized subsets does not matter, we need to divide by 2!. The set partition count thus comes from two steps:

• Partition n into k integers: this subroutine can be solved with the integer partition we just learned, giving b1 + b2 + ... + bk = n.

• For each such integer partition, we compute the number of ways of choosing bi items for each set, getting C(n, b1) × C(n − b1, b2) × C(n − b1 − b2, b3) × ... × C(bk, bk). Then, among b1, ..., bk, we find each distinct value and its number of appearances; if there are m distinct values with counts c1, ..., cm, we divide the above product by c1! c2! ... cm!.

From this solution, it is hard to get a closed form for s(n, k).

Find the Recurrence Relation There is a cleaner way to handle this problem: the incremental method–finding a recurrence relation. We start with the base cases s(0, 0) = 1 and s(n, 0) = 0 for n ≥ 1. Now, with mathematical induction, assume we have solved the subproblems for n − 1 items; can we induce s(n, k)? Consider the n-th item with respect to a target of k groups. There are two ways:

• Put the n-th item into a group of its own. The remaining n − 1 items must then form the other k − 1 groups, contributing s(n − 1, k − 1).

• Spread the n − 1 items over all k groups, which can be done in s(n − 1, k) ways; the n-th item then has k groups to choose from, contributing k × s(n − 1, k) in total.

Combining the counts of these two ways, we get the recurrence relation

s(n, k) = s(n − 1, k − 1) + ks(n − 1, k) (5.11)
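This recurrence (whose values are the Stirling numbers of the second kind) is straightforward to compute; a memoized Python sketch:

from functools import lru_cache

@lru_cache(maxsize=None)
def s(n, k):
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return s(n - 1, k - 1) + k * s(n - 1, k)

print(s(4, 2))  # 7, matching the example above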



5.4 Array Partition


Problem Definition How many ways exist to partition an array of n items a1, a2, ..., an into subarrays? There are different subtypes, depending on the number of subarrays, say m:

1. When the number m is flexible, m ∈ [1, n − 1].

2. When the number m is fixed to some value in the range [2, n − 1].

When the number of subarrays is fixed For example, it is common to partition an array into 2 or 3 subarrays. First, we pick an item ai as a partition point, getting the last subarray a[i : n] and leaving an array a[0 : i] to consider further. If m = 2, a[0 : i] becomes the first subarray and the partition process is done; this gives n ways of partitioning. When m = 3, we need to further partition a[0 : i] into two parts. This can be represented with the recurrence relation:

d(n, m) = (d(i, m − 1), a[i : n]), i ∈ [0, n − 1]    (5.12)

Further, for d(i, m − 1):

d(i, m − 1) = (d(j, m − 2), a[j : i]), j ∈ [0, i − 1]    (5.13)

This can be done recursively: we will have a recursive function of depth m.

When the number of subarrays is flexible The process is the same, except that m can be as large as n − 1. If we are to use dynamic programming, we need an ordering over the states (i, j), where i identifies the subproblem a[0 : i] and j is the number of partitions. We can imagine the states as a matrix with i as the row and j as the column:

        j=0   j=1   j=2   ...   j=n−1
i=0      X     −     −           −
i=1      X     X     −           −
i=2      X     X     X           −
...
i=n      X     X     X     X     X

Does the ordering of the for loop matter? Actually it does not.

Applications Many applications involve splitting an array or string, or cutting a rod. This relates to the splitting type of dynamic programming.

5.5 Merge

5.6 More Combinatorics


Combinatorics is about enumerating specified structures. Some structures are of main interest throughout this book and often appear in interviews: the subarray, the subsequence, and the subset.

Subarray We have solved one example with subarrays. A subarray is defined as a contiguous sequence within the array, which can be represented as a[i..j]. The number of subarrays in an array of size n is:

s_a = Σ_{i=1}^{n} i = n(n + 1)/2    (5.14)

A substring is a contiguous sequence of characters within a string. For instance, "the best of" is a substring of "It was the best of times". This is not to be confused with a subsequence, which is a generalization of substring: for example, "Itwastimes" is a subsequence of "It was the best of times", but not a substring.

Prefix and suffix are special cases of substring. A prefix of a string S is a substring of S that occurs at the beginning of S; a suffix of S is a substring that occurs at the end of S.

Subsequence A subsequence is any sequence we can find in the array; it is not required to be contiguous, but the ordering still matters. For example, for the array [A, B, C, D], the subsequences are:

[],
[A], [B], [C], [D],
[AB], [AC], [AD], [BC], [BD], [CD],
[ABC], [ABD], [ACD], [BCD],
[ABCD]

You can see that for n = 4 there are 16 possible subsequences, which is 2^4. This is no coincidence: each item in the array has two options–either it is chosen into the subsequence or not–which makes the count

s_s = 2^n    (5.15)
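The "two options per item" argument maps directly onto an n-bit mask; a Python sketch enumerating all 2^n subsequences:

def subsequences(arr):
    n = len(arr)
    for mask in range(2 ** n):  # bit i of mask decides if item i is chosen
        yield [arr[i] for i in range(n) if mask & (1 << i)]

print(len(list(subsequences(['A', 'B', 'C', 'D']))))  # 16 = 2**4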

Subset A subset B of a set A is defined as a set all of whose elements are from A. In other words, the subset B is contained inside the set A: B ⊆ A. There are two kinds of subset problems: if the order within the subset doesn't matter, it is a combination problem; otherwise, it is a permutation problem.

If the ordering does not matter, then for n distinct things the number of possible subsets, also called the power set, is:

|power set| = C(n, 0) + C(n, 1) + ... + C(n, n) = 2^n    (5.16)


6

Recurrence Relations

As we have mentioned briefly, the power of recursion runs through the whole of algorithm design and analysis, so we dedicate this chapter to recurrence relations. To summarize, recurrence relations can help with the following:

• Recurrence relations naturally represent the relation behind recursion. Examples will be shown in Chapter 13.

• Any iteration can be translated into a recurrence relation. Some examples can be found in Part IV.

• Recurrence relations, together with mathematical induction, are the most powerful tools to design algorithms and prove their correctness (Chapter 13 and Part IV).

• Recurrence relations can be applied to algorithm complexity analysis (Part IV).

In the following chapters of this part, we endow these formulas with applied meanings and discuss how to realize the uses mentioned above.

6.1 Introduction
Definition and Concepts A recurrence relation is a function expressed in terms of itself. More precisely, as defined in mathematics, a recurrence relation is an equation that recursively defines a sequence or multidimensional array of values: once one or more initial terms are given, each further term of the sequence or array is defined as a function of the preceding terms. The Fibonacci sequence is one of the most famous recurrence relations, defined as f(n) = f(n − 1) + f(n − 2), f(0) = 0, f(1) = 1.

an = Ψ(n, an−1) for n ≥ 1,    (6.1)

We use an to denote the value at index n, and mark the recurrence function as Ψ(n, P), where P stands for all preceding terms needed to build up the relation. As in the case of the factorial–each factorial number relies only on the result of the previous number and its current index–such a recurrence relation can be written in the form of Eq. 6.1.

A recurrence relation needs to start from initial value(s). For the above relation, a0 needs to be defined, and it will be the first element of the sequence. The above relation involves only the immediately preceding term and is called a recurrence relation of first order. If P includes multiple preceding terms, a recurrence relation of order k is easily extended as:

a_n = Ψ(n, a_{n−1}, a_{n−2}, ..., a_{n−k}) for n ≥ k,    (6.2)


In this case, k initial values are needed for defining a sequence. Initial
values can be chosen freely, but once they are decided, the recurrence
determines the sequence uniquely. Thus, the initial values are also called
the degrees of freedom for solutions to the recurrence.
Many natural functions are easily expressed as recurrence:

• Polynomial: a_n = a_{n−1} + 1, a_1 = 1 → a_n = n.

• Exponential: a_n = 2 × a_{n−1}, a_1 = 1 → a_n = 2^{n−1}.

• Factorial: a_n = n × a_{n−1}, a_1 = 1 → a_n = n!

Solving Recurrence Relations In real problems, we might care about
the value of the recursion at n, that is, computing a_n for any given n, and
there are two ways to do it:

• Programming: we utilize the computational power of computers and code either iteration or recursion to build up the value at any given n. For example, f(2) = f(1) + f(0) = 1, f(3) = f(2) + f(1) = 2, and so on. With this iteration, we would need n − 1 steps to compute f(n) (see the sketch after this list).

• Math: we solve the recurrence relation by obtaining an explicit or closed-form expression, which is a non-recursive function of n. With the solution at hand, we can get a_n right away.
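For instance, a minimal iterative sketch (names are ours) that builds up the Fibonacci value f(n) in n − 1 steps:

def fib(n):
    if n < 2:
        return n
    prev, cur = 0, 1  # f(0), f(1)
    for _ in range(2, n + 1):
        prev, cur = cur, prev + cur
    return cur

print(fib(10))  # 55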

Recurrence relations play an important role in the analysis of algorithms.
Usually, a time recurrence relation T(n) is defined to analyze the time
complexity of solving a problem with an input instance of size n. The field
of complexity analysis studies the closed-form solution of T(n); that is to
say, it cares about the functional relation between T(n) and n, not each
exact value.
In this section, we focus on solving recurrence relations using math
to get a closed-form solution. Categorizing recurrence relations can help
us pinpoint each type's solving methods.

Categories A recurrence relation is essentially a discrete function, which
can be naturally categorized as linear (such as the function y = mx + b) and
non-linear: quadratic, cubic and so on (such as y = ax^2 + bx + c, y =
ax^3 + bx^2 + cx + d). In the field of algorithmic problem solving, linear
recurrence relations are commonly used and researched, thus we deliberately
leave non-linear recurrence relations and their methods of solving out of the
scope of this book.

• Homogeneous linear recurrence relation: When the recurrence relation is linear homogeneous of degree k with constant coefficients, it is in the following form, and is also called an order-k homogeneous linear recurrence with constant coefficients.

a_n = c_1 a_{n−1} + c_2 a_{n−2} + ... + c_k a_{n−k}.    (6.3)

a_0, a_1, ..., a_{k−1} will be initial values.

• Non-homogeneous linear recurrence relation: An order-k non-homogeneous linear recurrence with constant coefficients is defined in the form:

a_n = c_1 a_{n−1} + c_2 a_{n−2} + ... + c_k a_{n−k} + f(n).    (6.4)

f(n) can be 1 or n or n^2 and so on.

• Divide-and-conquer recurrence relation: When n is decreased not by a constant as in Eq. 6.3 and Eq. 6.4, but by a constant factor, as shown below, it is called a divide-and-conquer recurrence relation.

a_n = a · a_{n/b} + f(n)    (6.5)

where a ≥ 1, b > 1, and f(n) is a given function, which usually has the form f(n) = cn^k.

We will introduce general methods to solve linear recurrence relations, but
leave out divide-and-conquer recurrence relations in this chapter, for the
reason that they will most likely be solved only roughly, as shown in Part IV,
just to estimate the time complexity resulting from the divide-and-conquer
method.

6.2 General Methods to Solve Linear Recurrence Relations
No general method for solving arbitrary recurrence functions is known yet;
however, a linear recurrence relation with finitely many initial values and
preceding states and constant coefficients can always be solved. Because
recursion is essentially mathematical induction, the most general way of
solving any recurrence relation is to use mathematical induction and the
iterative method. This also makes mathematical induction, in some form, the
foundation of all correctness proofs for computer programs. We examine these
two methods by solving two recurrence relations: a_n = 2 × a_{n−1} + 1, a_0 = 0
and a_n = a_{n/2} + 1.

6.2.1 Iterative Method

The most straightforward method for solving a recurrence relation, no matter
whether it is linear or non-linear, is the iterative method. The iterative
method is a technique or procedure in computational mathematics that
iteratively replaces/substitutes each a_n with its recurrence relation
Ψ(n, a_{n−1}, a_{n−2}, ..., a_{n−k}) till all terms "disappear" other than the
initial values. The iterative method is also called the substitution method.
We demonstrate iteration with a simple non-overlapping recursion:

T(n) = T(n/2) + O(1)    (6.6)
     = T(n/2^2) + O(1) + O(1)
     = T(n/2^3) + 3 · O(1)
     = ...
     = T(1) + k · O(1)    (6.7)

We have n/2^k = 1; solving this equation gives k = log_2 n. Most likely
T(1) = O(1) will be the initial condition; we substitute this, and we get
T(n) = O(log_2 n).
However, let us try to apply iteration on a third recurrence: T(n) =
3T(n/4) + O(n). It might be tempting to assume that T(n) = O(n log n),
due to the fact that T(n) = 2T(n/2) + O(n) leads to this time complexity.

T(n) = 3T(n/4) + O(n)    (6.8)
     = 3(3T(n/4^2) + n/4) + n = 3^2 T(n/4^2) + n(1 + 3/4)
     = 3^2 (3T(n/4^3) + n/4^2) + n(1 + 3/4) = 3^3 T(n/4^3) + n(1 + 3/4 + (3/4)^2)    (6.9)
     = ...    (6.10)
     = 3^k T(n/4^k) + n \sum_{i=0}^{k−1} (3/4)^i    (6.11)

6.2.2 Recursion Tree

Since the number of terms in the expansion of T(n) grows, the iteration can
look messy. We can use a recursion tree to better visualize the process of
iteration. In a recursion tree, each node represents the cost of a single
subproblem, and each leaf is a base-case subproblem. As a start, we expand
T(n) as a root node with cost n, and it would have three children, each
representing a subproblem T(n/4). We further do the same with each leaf node,
until the subproblem is trivial and becomes a base case, which here is T(1).
In practice, we just need to draw a few layers to find the rule. The total
cost will be the sum of the costs of all layers. The process can be seen in
Fig. 6.1.

Figure 6.1: The process to construct a recursion tree for T(n) = 3T(⌊n/4⌋) + O(n). There are k + 1 levels in total.

Through the expansion with iteration and the recursion tree, our time
complexity function becomes:

T(n) = \sum_{i=1}^{k} L_i + L_{k+1}    (6.12)
     = n \sum_{i=1}^{k} (3/4)^{i−1} + 3^k T(n/4^k)    (6.13)

In the process, we can see that Eq. 6.13 and Eq. 6.11 are the same.

Because T(n/4^k) = T(1) = 1, we have k = log_4 n.

T(n) ≤ n \sum_{i=1}^{k} (3/4)^{i−1} + 3^k T(n/4^k)    (6.14)
     ≤ n · 1/(1 − 3/4) + 3^{log_4 n} T(1) = 4n + n^{log_4 3} ≤ 5n    (6.15)
     = O(n)    (6.16)

6.2.3 Mathematical Induction


Mathematical induction is a mathematical proof technique, essentially used
to prove that a property P(n) holds for every natural number n, i.e.
for n = 0, 1, 2, 3, and so on. Therefore, in order to use induction, we need
to make a guess of the closed-form solution for a_n. Induction requires two
cases to be proved.

1. Base case: proves that the property holds for the number 0.

2. Induction step: proves that, if the property holds for one natural num-
ber n, then it holds for the next natural number n + 1.

For T(n) = 2 × T(n − 1) + 1, T(0) = 0, we can get the following results by
expanding T(i), i ∈ [0, 7]:

n      0  1  2  3   4   5   6    7
T(n)   0  1  3  7  15  31  63  127

It is not hard to find the rule and guess T(n) = 2^n − 1. Now, we prove
this equation by induction:

1. Show that the basis is true: T(0) = 2^0 − 1 = 0.

2. Assume it holds true for T(n − 1), i.e. T(n − 1) = 2^{n−1} − 1. By induction, we get

T(n) = 2T(n − 1) + 1    (6.17)
     = 2(2^{n−1} − 1) + 1    (6.18)
     = 2^n − 1    (6.19)

This shows that the induction step holds true too.
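A quick numeric sanity check of the guess (not a proof; the induction above is the proof):

def T(n):
    return 0 if n == 0 else 2 * T(n - 1) + 1

assert all(T(n) == 2 ** n - 1 for n in range(20))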

Exercise: solve T(n) = T(n/2) + O(1) and T(2n) ≤ 2T(n) + 2n − 1, T(2) = 1.

Briefing on Other Methods When the form of the linear recurrence is
more complex, say a large degree k or a more complex f(n), neither the
iterative nor the induction method is practical and manageable. For the
iterative method, the expansion will be way too messy for us to handle. On
the side of the induction method, it is quite challenging or sometimes
impossible for us to "guess" or "generalize" the exact closed form of the
recurrence relation solution purely by observing a range of expansions.
The more general and approachable method for solving homogeneous linear
recurrence relations derives from making a rough guess rather than an exact
guess, and then solving it via the characteristic equation. This general
method is pinpointed in Section 6.3 with examples. For non-homogeneous
linear recurrence relations (Section 6.4), there are generally two ways to
solve them, symbolic differentiation and the method of undetermined
coefficients, and both of them relate to solving the homogeneous linear
relation. The remaining content is the most math-saturated in the book, but
we will later find it tremendously helpful in complexity analysis in Part IV
and potentially in problem solving.

6.3 Solve Homogeneous Linear Recurrence Relations

In this section, we offer a more general and more manageable method for
solving recurrence relations that are homogeneous as defined in Eq. 6.3.
There are three broad methods: using the characteristic equation, which we
will learn in this section, while the other two, linear algebra and the
Z-transform¹, will not be included.

Make a General "Guess" From our previous examples, we can figure out
the closed-form solution for a simplified homogeneous linear recurrence such
as the Fibonacci recurrence relation:

a_n = a_{n−1} + a_{n−2}, a_0 = 0, a_1 = 1    (6.20)

A reasonable guess would be that a_n doubles every time; namely, it is
approximately 2^n. Let's guess a_n = c·2^n for some constant c. Now we
substitute this into Eq. 6.20 and get

c·2^n = c·2^{n−1} + c·2^{n−2} = (3/4)·c·2^n    (6.21)

We can see that c cancels, and the left side is always greater than the
right side. Thus we learn that c·2^n is too large a guess, and that the
multiplicative constant c plays no role in the induction step.
¹ Visit https://en.wikipedia.org/wiki/Recurrence_relation for details.

Based on the above example, we introduce a parameter γ as a base: a_n = γ^n
for some γ. We then compute its value by solving the characteristic
equation as introduced below.

Characteristic Equation Now, we substitute our guess into Eq. 6.3, then

γ^n = a_n    (6.22)
    = c_1 γ^{n−1} + c_2 γ^{n−2} + ... + c_k γ^{n−k}.    (6.23)

We rewrite Eq. 6.23 as:

γ^n − c_1 γ^{n−1} − c_2 γ^{n−2} − ... − c_k γ^{n−k} = 0.    (6.24)

By dividing both the left and right side of the equation by γ^{n−k}, we get
the simplified equation, which is called the characteristic equation of the
recurrence relation in the form of Eq. 6.3:

γ^k − c_1 γ^{k−1} − c_2 γ^{k−2} − ... − c_k = 0.    (6.25)

The concept of the characteristic equation is related to generating functions.
The solutions of the characteristic equation are called characteristic roots.

Characteristic Roots and Solution Now, we have a linear homogeneous
recurrence relation and its characteristic equation. Assume that the equation
has k distinct roots γ_1, γ_2, ..., γ_k; then we can build, upon these
characteristic roots, the general guess, and k other constants d_1, d_2, ..., d_k,
the solution of {a_n} as:

a_n = d_1 γ_1^n + d_2 γ_2^n + ... + d_k γ_k^n    (6.26)

The unknown constants d_1, d_2, ..., d_k of {a_n} can be found using the initial
values a_0, a_1, ..., a_{k−1} by solving the following equations:

a_0 = d_1 γ_1^0 + d_2 γ_2^0 + ... + d_k γ_k^0,    (6.27)
a_1 = d_1 γ_1^1 + d_2 γ_2^1 + ... + d_k γ_k^1,    (6.28)
...,    (6.29)
a_{k−1} = d_1 γ_1^{k−1} + d_2 γ_2^{k−1} + ... + d_k γ_k^{k−1}.    (6.30)

Within the context of computer science, the degree is mostly at most 2. Here,
we introduce the formula for solving the characteristic roots of a
characteristic equation of the following form:

0 = ax^2 + bx + c    (6.31)

The root(s) can be computed from the following formula³:

x = (−b ± √(b² − 4ac)) / (2a)    (6.32)

Hands-on Example For a_n = 2a_{n−1} + 3a_{n−2}, a_0 = 3, a_1 = 5, we can
write the characteristic equation as γ² − 2γ − 3 = 0. Because γ² − 2γ − 3 =
(γ − 3)(γ + 1), the characteristic roots are γ_1 = 3, γ_2 = −1. Now our
solution has the form:

a_n = d_1 3^n + d_2 (−1)^n    (6.33)

Now, we find the constants by listing the initial values we know:

a_0 = d_1 3^0 + d_2 (−1)^0 = d_1 + d_2 = 3,    (6.34)
a_1 = d_1 3^1 + d_2 (−1)^1 = 3d_1 − d_2 = 5.    (6.35)

We get d_1 = 2, d_2 = 1. Finally, we have the solution a_n = 2·3^n + (−1)^n.
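As a quick sanity check, a minimal sketch (names are ours) comparing the recurrence against the closed form:

def a_rec(n):
    if n == 0:
        return 3
    if n == 1:
        return 5
    return 2 * a_rec(n - 1) + 3 * a_rec(n - 2)

assert all(a_rec(n) == 2 * 3 ** n + (-1) ** n for n in range(15))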

Exercise: continue to solve a_n = a_{n−1} + a_{n−2}.

6.4 Solve Non-homogeneous Linear Recurrence Relations

There are two main methods: the method of undetermined coefficients, where
the solution is composed by summing up the solution of the homogeneous part
and a particular solution for the f(n) part; and the method of symbolic
differentiation, which converts the equation into the same form as a
homogeneous linear recurrence relation.
The complexity analysis for most algorithms falls into the form of a
non-homogeneous linear recurrence relation. For example: for the Fibonacci
sequence, if it is solved by using recursion as shown in Chapter 15 without
a caching mechanism, the time recurrence relation is T(n) = T(n − 1) + T(n −
2) + 1; for the merge sort discussed in Chapter 13, the recurrence relation is
T(n) = 2T(n/2) + n. Examples of the recurrence relation T(n) = T(n − 1) + n
can be easily found, such as the maximum subarray.

Method of Undetermined Coefficients Suppose we have a recurrence
relation in the form of Eq. 6.4. Suppose we ignore the non-homogeneous part
f(n) and just look at the homogeneous part:

h_n = c_1 h_{n−1} + c_2 h_{n−2} + ... + c_k h_{n−k}.    (6.36)
³ Visit http://www.biology.arizona.edu/biomath/tutorials/Quadratic/Roots.html for the derivation.

Symbolic Differentiation

6.5 Useful Math Formulas

Knowing these facts can be very important in practice; we can treat each as
an element in problem solving. Sometimes, when it's hard to get the closed
form of a recurrence relation, or hard to find the recurrence relation, we
decompose it into multiple parts with these elements.

Binomial theorem:

\sum_{k=0}^{n} C(n, k) x^k = (1 + x)^n    (6.37)

An example of using this is the cost of generating a power set, where x = 1
gives \sum_{k=0}^{n} C(n, k) = 2^n.
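A one-line numeric check of the x = 1 case (assuming Python 3.8+ for math.comb):

import math

n = 10
assert sum(math.comb(n, k) for k in range(n + 1)) == 2 ** n  # Eq. 6.37 with x = 1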

6.6 Exercises
1. Compute factorial sequence using while loop.

2. Greatest common divisor: The Euclidean algorithm, which computes


the greatest common divisor of two integers, can be written recursively.

gcd(x, y) = x if y = 0; gcd(y, x % y) if y > 0.    (6.38)

Function definition (see the sketch below):
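A minimal recursive sketch, a direct translation of Eq. 6.38:

def gcd(x, y):
    # Euclidean algorithm: gcd(x, 0) = x, otherwise recurse on (y, x % y)
    if y == 0:
        return x
    return gcd(y, x % y)

print(gcd(48, 36))  # 12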

6.7 Summary

If a recursive algorithm can be further optimized, the optimization method
can be either divide and conquer or decrease and conquer. We have put much
effort into solving the recurrence relations of both: the linear recurrence
relation for decrease and conquer, and the divide-and-conquer recurrence
relation for divide and conquer. Right now, do not struggle too eagerly to
know what divide or decrease and conquer is; it will be explained in the next
two chapters.
Further, the Akra-Bazzi method applies to recurrences such as T(n) =
T(n/3) + T(2n/3) + O(n); please look into it for more details if interested.
Generating functions can also be used to solve linear recurrences.

Part III Get Started: Programming and Python Data Structures


After the warm-up, we prepare ourselves with hands-on skills: basic
programming with Python 3, including two function types, iteration and
recursion, and connecting the dots between the abstract data structures and
Python 3 built-in data types and commonly used modules.

Python is an object-oriented programming language and its underlying
implementation (CPython) is written in C, which maps well onto the abstract
data structures we discussed. How to use the Python data types can be learned
from the official Python tutorial: https://docs.python.org/3/tutorial/.
However, in order to grasp the efficiency of the data structures, we need to
examine the C source code (https://github.com/python/cpython), which relates
easily to abstract data structures.
7 Iteration and Recursion

“The power of recursion evidently lies in the possibility of defin-


ing an infinite set of objects by a finite statement. In the same
manner, an infinite number of computations can be described by a
finite recursive program, even if this program contains no explicit
repetitions.”
– Niklaus Wirth, Algorithms + Data Structures = Programs, 1976

7.1 Introduction

Figure 7.1: Iteration vs recursion: in recursion, the line denotes the top-
down process and the dashed line is the bottom-up process.

In computer science, software programs can be categorized as either iteration
or recursion, making iteration and recursion the topmost level of concepts in
software development and the very first base for us to study computer science
techniques. Iteration refers to a looping process which repeats some part of
the code until a certain condition is met. Recursion, similarly, needs to
stop at a certain condition, but it replaces the loop with recursive function
calls, meaning a function calls itself from within its own code. The process
is shown in Fig. 7.1.
Do you still have the feeling that you seemingly already understand iteration
even without code, but wonder what recursion is exactly? Recursion can be a
bit challenging for beginners; it differs from our normal way of thinking.
It is a bit like the vision of being in a restroom which has two mirrors
abreast on each side, facing each other: we see multiple images of the things
in front of each mirror, and these images usually appear from large to small.
This is similar to recursion. The relation between these recurred images can
be called a recurrence relation.
Understanding recursion and learning basic rules to solve recurrence
relations are two of the main purposes of this chapter. Thus, we organize the
content of this chapter along this line:

1. Section ?? will first address our question by analyzing the recursion mechanism within a computer program; we further understand the difference between the two by seeing the example of the factorial series and examining the pros and cons of each.

2. Section 7.4 advances our knowledge about recursion by studying the recurrence relation, including its definition, categorization, and how to solve it.

3. Section ?? gives us two examples to see how iteration and recursion work in real practice.

Deducing (finding) the recurrence relation, and sometimes solving it, is a
key step in algorithm design and problem solving; solving the time recurrence
relation is important to algorithm analysis.

In this section, we first learn iteration and the Python syntax that can be
used to implement it. We then examine a classic and elementary example, the
factorial sequence, to catch a glimpse of how iteration and recursion can be
applied to solve this problem. Then, we discuss more details about recursion.
We end this section by comparing iteration and recursion: their pros and cons
and the relation between them.

7.2 Iteration

In simple terms, an iterative function is one that loops to repeat some part
of the code. In Python, the loops can be expressed with for and while loops.
Enumerating the numbers from 1 to 10 is a simple iteration. Implementation-wise:
• for is usually used together with the function range(start, stop, step), which creates a sequence of numbers from start to stop in the range [start, stop), incrementing by step (1 by default). Thus, we need to set start as 1 and stop as 11 to get the numbers from 1 to 10.

# enumerate 1 to 10 with a for loop
for i in range(1, 11):
    print(i, end=' ')

• while is used with the syntax

while expression:
    statement

In our case, we need to set the start condition i = 1, and the expression will be i <= 10. In the statement, we need to manually increment the variable i so that we won't end up with an infinite loop.

i = 1
while i <= 10:
    print(i, end=' ')
    i += 1

7.3 Factorial Sequence

The factorial of a positive integer n, denoted by n!, is the product of all
positive integers less than or equal to n. For example:

5! = 5 × 4 × 3 × 2 × 1 = 120,
0! = 1.

To compute the factorial sequence at n, we need to know the factorial
sequence at n − 1, which can be expressed as a recurrence relation:
n! = n × (n − 1)!.

• Solving with iteration: we use a for loop that starts at 1 and goes up to n, so that we eventually build up our answer at n. We use a variable ans to save the factorial result for each number; once the program stops, ans gives the result of the factorial of n.

def factorial_iterative(n):
    ans = 1
    for i in range(1, n + 1):
        ans = ans * i
    return ans

• Solving with recursion: we start by calling a recursive function at n; within this function, it calls itself but with n − 1, just as shown in the recurrence relation. We then multiply this recursive call with n. We need to define a bottom, which is the end condition for the recursive function calls, to avoid an infinite loop. In this case, it bottoms out at n = 1, whose answer we know is 1; thus we return 1 to stop further function calls and recursively return to the topmost level.

def factorial_recursive(n):
    if n == 1:
        return 1
    return n * factorial_recursive(n - 1)

7.4 Recursion
In this section, we reveal how the recursion mechanism works: function calls
and the stack, and the two passes.

Figure 7.2: Call stack of a recursive function

Two Elements When a routine calls itself either directly or indirectly, it
is said to be making a recursive function call. The basic idea behind solving
problems via recursion is to break the instance of the problem into smaller
and smaller instances until the instances are so small they can be solved
trivially. We can view a recursive routine as consisting of two parts.

• Recursive Calls: As in the factorial sequence, when the instance of the problem is still too large to solve directly, we recursively call the function itself to solve problems of smaller size. Then the results returned from the recursive calls are used to build up the result of the upper level using the recurrence relation. For example, if we use f(n) to denote the factorial at n, the recurrence relation would be f(n) = n × f(n − 1), n > 0.

• End/Base Cases: The above recursive calls need to bottom out; they stop when the instance is so small it can be solved directly. This stop condition is called the end/base case. Without this case, the recursion will continue to dive infinitely deep and eventually we run out of memory and get a crash. A recursive function can have one or more base cases. In the example of factorial, the base case is n = 0, since by definition 0! = 1.

Recursive Calls and Stacks The recursive function calls of the recursive
factorial we implemented in the last section can be demonstrated as in Fig. 7.2.
The execution of the recursive function f(n) pays two visits to each recursive
function f(i), i ∈ [1, n], through two passes: top-down and bottom-up, as we
have illustrated in Fig. 7.1. The recursive function handles this process via
a stack data structure, which follows the Last In First Out (LIFO) principle,
to record all function calls.

• In the top-down pass, each recursive function's execution context is "pushed" into the stack in the order f(n), f(n − 1), ..., f(1). The process ends when it hits the end case f(0), which will not be "pushed" into the stack but executes some code and returns value(s). The end case marks the start of the bottom-up process.

• In the bottom-up pass, the recursive functions' execution contexts are "popped" off the stack in the reversed order f(1), ..., f(n − 1), f(n). f(1) takes the returned value from the base case to construct its value using the recurrence relation; then it returns its value up to the next recursive function f(2). This whole process ends at f(n), which returns its value.

How Important Is Recursion? Recursion is a very powerful and fundamental
technique, and it is the basis for several other design principles, such as:

• Divide and Conquer (Chapter 13).
• Recursive search, such as tree traversal and graph search.
• Dynamic Programming (Chapter 15).
• Combinatorics, such as enumeration (permutation and combination) and branch and bound, etc.
• Some classes of greedy algorithms.

It also supports the proof of correctness of algorithms via mathematical
induction, and consistently arises in algorithm complexity analysis. We shall
see this throughout this book and will end up drawing this conclusion
ourselves.

Practical Guideline In real algorithmic problem solving, the two passes
normally have different usages.
In the top-down pass we do:

1. Breaking problems into smaller problems: there are different ways of "breaking", and depending on which, they can be either divide and conquer or decrease and conquer, which we will further expand in Chapters ?? and ??. Divide and conquer divides the problem into disjoint subproblems, whereas decrease and conquer reduces the problem to a single smaller subproblem.

2. Searching: visit nodes in non-linear data structures (graph/tree) or in linear data structures. Also, at the same time, we can use pass by reference to track state changes, such as the traveled path in path-related graph algorithms.

In the bottom-up pass, we can either return None or variables. If we have
already used pass by reference to track the change of state, then it is not
necessary to return variables. In some scenarios, tracking states by passing
by reference can be easier and more intuitive; for example, in graph
algorithms we most likely use this method.

Tail Recursion A recursion is called tail recursion when the function calls
itself at the end (the "tail") of the function, so that no computation is done
after the return of the recursive call. Many compilers optimize such a
recursive call into an iterative call.

7.5 Iteration VS Recursion

Stack Overflow Problem In our example, if we call the function
factorial_recursive() with n = 1000, Python would complain with an error:

RecursionError: maximum recursion depth exceeded in comparison

which is a stack overflow problem. A stack overflow occurs when we run out of
memory to hold items in the call stack. These situations can incur the stack
overflow problem:

1. No base case is defined.

2. The recursion is too deep, exceeding the assigned memory limit of the executing machine.

Stack Overflow for Recursive Functions and Iterative Implementation
According to Wikipedia, in software, a stack overflow occurs if the call
stack pointer exceeds the stack bound. The call stack may consist of a
limited amount of address space, often determined at the start of the program
depending on many factors, including the programming language, machine
architecture, multi-threading, and the amount of available memory. When a
program attempts to use more space than is available on the call stack, the
stack is said to overflow, typically resulting in a program crash. A very
deep recursive function is faced with the threat of stack overflow. The only
way we can fix it is by transforming the recursion into a loop and storing
the function arguments in an explicit stack data structure; this is often
called the iterative implementation corresponding to the recursive
implementation.
We need to follow these points:

1. End conditions, base cases and return values: either return an answer for base cases or None; these are used to end the recursive calls.

2. Parameters: parameters include the data needed to implement the function, current paths, the global answers, and so on.

3. Variables: decide what the local and global variables are. In Python, any pointer-like (mutable) data type put in the parameters can be used to hold a global result.

4. Construct the current result: decide when to collect the results from the subtrees and combine them to get the result for the current node.

5. Check the depth: check whether the program will lead to a call stack overflow.

Conversion For a given problem, conversion between iteration and recursion
is possible, but the difficulty of the conversion depends highly on the
specific problem context. For example, the iteration over a range of numbers
can be represented with the recurrence relation T(n) = T(n − 1) + 1. On the
side of implementation, some recursions and iterations can be easily
converted to each other, such as linear search; in some other cases, it takes
more tricks and requires more sophisticated data structures to assist the
conversion; for example, the iterative implementation of the recursive
depth-first search uses an explicit stack. Do not worry about these concepts
here; as you flip more pages of this book, you will know them and start to
think better.
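As a small illustration (the function names are ours), a recursive sum over a list and an iterative counterpart that keeps the pending work in an explicit stack instead of the call stack:

def sum_recursive(arr):
    if not arr:
        return 0
    return arr[0] + sum_recursive(arr[1:])

def sum_iterative(arr):
    # an explicit stack replaces the implicit call stack
    total = 0
    stack = list(arr)
    while stack:
        total += stack.pop()
    return total

print(sum_recursive([1, 2, 3]), sum_iterative([1, 2, 3]))  # 6 6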

Tail Recursion and Optimization In a typical recursive function, we
usually make the recursive calls first, and then take the return value of the
recursive call to calculate the result. Therefore, we only get the final
result after all the recursive calls have returned. But in a tail-recursive
function, the various calculations and statements are performed first, and
the recursive call to the function is made after that. By doing this, we pass
the result of the current step to the next recursive call. Hence, the last
statement in a tail-recursive function is the recursive call. This means that
when we perform the next recursive call, the current stack frame (occupied by
the current function call) is not needed anymore, which allows us to optimize
the code: we simply reuse the current stack frame for the next recursive step
and repeat this process for all the other function calls.
Using regular recursion, each recursive call pushes another entry onto the
call stack; when the functions return, they are popped from the stack. In the
case of tail recursion, we can optimize it so that only one stack entry is
used for all the recursive calls of the function. This means that even on
large inputs, there can be no stack overflow. This is called tail recursion
optimization.
Languages such as Lisp and C/C++ have this sort of optimization. But the
Python interpreter doesn't perform tail recursion optimization. Due to this,
the recursion limit of Python is set to a fairly small value (1000 by default
in CPython). This means that when you provide a large input to a recursive
function, you will get an error. This is done to avoid a stack overflow; the
Python interpreter limits the recursion depth so that infinite recursions are
avoided.
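A sketch of a tail-recursive factorial: the accumulator acc carries the partial result, so nothing remains to be computed after the recursive call returns (Python will still not optimize this, as noted above):

def factorial_tail(n, acc=1):
    if n <= 1:
        return acc
    # the recursive call is the last statement; acc already holds the work done
    return factorial_tail(n - 1, acc * n)

print(factorial_tail(5))  # 120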

Handling the Recursion Limit The sys module in Python provides a function
called setrecursionlimit() to modify the recursion limit in Python. It takes
one parameter, the value of the new recursion limit. By default, this value
is 1000. If you are dealing with large inputs, you can set it to, say, 10^6
so that large inputs can be handled without errors.
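For example:

import sys

print(sys.getrecursionlimit())  # inspect the current limit (1000 by default)
sys.setrecursionlimit(10**6)    # raise it before running a deep recursion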

7.6 Exercises
1. Compute factorial sequence using while loop.

7.7 Summary

In this chapter we compared iteration and recursion: how recursive calls are
handled by the call stack in two passes, the stack overflow problem and
Python's recursion limit, tail recursion and its optimization, and how
iteration and recursion can be converted into each other. Deducing and
solving the corresponding recurrence relations, as covered in the previous
chapter, is what connects these two forms of programs to algorithm design and
analysis.
8 Bit Manipulation

Many books on algorithmic problem solving seem to forget about one topic:
bits and bit manipulation. Bits are how data is represented and saved on the
hardware. Knowing these concepts and bit manipulation in Python can help us
devise more efficient algorithms, in either space or time complexity, in
later chapters.
For example: how to convert a char or integer to bits, how to get each bit,
set each bit, and clear each bit, plus some more advanced bit manipulation
operations. After this, we will see some examples that show how to apply bit
manipulation to real-life problems.

8.1 Python Bitwise Operators

Bitwise operators include <<, >>, &, |, ~, ^. All of these operators operate
on signed or unsigned numbers, but instead of treating a number as if it were
a single value, they treat it as if it were a string of bits.
Two's-complement binary is used for representing signed numbers.
Now, we introduce the six bitwise operators.

x << y Returns x with the bits shifted to the left by y places (new bits on
the right-hand side are zeros). This is the same as multiplying x by 2^y.

x >> y Returns x with the bits shifted to the right by y places. This is the
same as dividing x by 2^y, with the same result as the // operator. This
right shift is also called an arithmetic right shift; it fills in the new
bits with the value of the sign bit.
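A quick demonstration of both shifts:

x = 5            # 0b101
print(x << 1)    # 10, i.e. 5 * 2
print(x >> 1)    # 2, i.e. 5 // 2
print(-5 >> 1)   # -3, the arithmetic shift keeps the sign bit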


x & y "Bitwise and". Each bit of the output is 1 if the corresponding bits of
x and y are both 1; otherwise it's 0. It has the following properties:

# keep 1 or 0 the same as the original
1 & 1 = 1
0 & 1 = 0
# set to 0 with & 0
1 & 0 = 0
0 & 0 = 0

x | y "Bitwise or". Each bit of the output is 0 if the corresponding bits of
x and y are both 0; otherwise it's 1.

# set to 1 with | 1
1 | 1 = 1
0 | 1 = 1

# keep 1 or 0 the same as the original
1 | 0 = 1
0 | 0 = 0

~x Returns the complement of x: the number you get by switching each 1 for a
0 and each 0 for a 1. This is the same as −x − 1.

x ^ y "Bitwise exclusive or". Each bit of the output is the same as the
corresponding bit in x if that bit in y is 0, and it's the complement of the
bit in x if that bit in y is 1. It has the following basic properties:

# toggle 1 or 0 with ^ 1
1 ^ 1 = 0
0 ^ 1 = 1

# keep 1 or 0 with ^ 0
1 ^ 0 = 1
0 ^ 0 = 0

An example:

A = 5 = 0101, B = 3 = 0011
A ^ B = 0101 ^ 0011 = 0110 = 6
3

More advanced properties of the XOR operator include:

a ^ b = c
c ^ b = a

n ^ n = 0
n ^ 0 = n
e.g. a = 00111011, b = 10100000, c = 10011011, c ^ b = a

Logical right shift The logical right shift differs from the above right
shift: after shifting, it puts a 0 in the most significant bit. It is
indicated with the >>> operator in Java. In Python there is no such operator,
but we can get the behavior easily using the bitstring module, which pads
with zeros when using its >>= operator:

>>> from bitstring import BitArray
>>> a = BitArray(int=-1000, length=32)
>>> a.int
-1000
>>> a >>= 3
>>> a.int
536870787

8.2 Python Built-in Functions

bin() The bin() method takes a single parameter num, an integer, and returns
its binary string. If the argument is not an integer, it raises a TypeError
exception.

a = bin(88)
print(a)
# output
# 0b1011000

However, bin() doesn't return binary bits that apply the two's complement
rule; for a negative value it simply prefixes a minus sign:

a1 = bin(-88)
# output
# -0b1011000

int(x, base=10) The int() method takes a string x and returns the integer it
represents in the given base. Common bases are 2, 10, and 16 (hex).

b = int('01011000', 2)
c = int('88', 10)
print(b, c)
# output
# 88 88

chr() The chr() method takes a single integer parameter and returns the
character (a string) whose Unicode code point is that integer. If the integer
is outside the valid range, a ValueError will be raised.

d = chr(88)
print(d)
# output
# X

Figure 8.1: Two’s Complement Binary for Eight-bit Signed Integers.

ord() The ord() method takes a string representing one Unicode character and
returns an integer representing the Unicode code point of that character.

e = ord('a')
print(e)
# output
# 97

8.3 Two's-complement Binary

Given 8 bits, if the number is unsigned, it can represent the values 0 to 255
(1111,1111). However, a two's-complement 8-bit number can only represent
positive integers from 0 to 127 (0111,1111), because the most significant bit
is used as the sign bit: '0' for positive and '1' for negative.

\sum_{i=0}^{N−1} 2^i = 2^{N−1} + 2^{N−2} + ... + 2^2 + 2^1 + 2^0 = 2^N − 1    (8.1)

Two's-complement binary is the same as the classical binary representation
for positive integers, and differs slightly for negative integers. Negative
integers are represented by performing the two's complement operation on
their absolute value: −n is represented as 2^N − n with N bits. Here, we show
the two's-complement binary for eight-bit signed integers in Fig. 8.1.

Get Two’s Complement Binary Representation In Python, to get


the two’s complement binary representation of a given integer, we do not re-
ally have a built-in function to do it directly for negative number. Therefore,
if we want to know how the two’s complement binary look like for negative
integer we need to write code ourselves. The Python code is given as:
1 bits = 8
2 ans = ( 1 << b i t s ) −2
3 p r i n t ( ans )
4 # output
5 # ' 0 b11111110 '

There is another method to compute it: inverting the bits of n (this is
called one's complement) and adding 1. For instance, using the 8-bit integer
5, we compute it as follows:

5_{10} = 0000,0101_2,    (8.2)
−5_{10} = 1111,1010_2 + 1_2,    (8.3)
−5_{10} = 1111,1011_2    (8.4)

To flip a binary representation, we XOR x with '1111,1111', which is 2^N − 1.
The Python code is given:

def twos_complement(val, bits):
    # first flip: XOR val against all 1's, i.e. (1 << bits) - 1
    flip_val = val ^ ((1 << bits) - 1)
    return bin(flip_val + 1)

Get Two’s Complement Binary Result In Python, if we do not want


to see its binary representation but just the result of two’s complement of a
given positive or negative integer, we can use two operations −x or ∼ +1.
For input 2, the output just be a negative integer -2 instead of its binary
representation:
1 d e f twos_complement_result ( x ) :
2 ans1 = −x
3 ans2 = ~x + 1
4 p r i n t ( ans1 , ans2 )
5 p r i n t ( b i n ( ans1 ) , b i n ( ans2 ) )
6 r e t u r n ans1
7 # output
8 # −8 −8
9 # −0b1000 −0b1000

This is helpful if we just need two’s complement result instead of getting the
binary representation.

8.4 Useful Combined Bit Operations

For operations that handle a single bit, we first need a mask that sets only
that bit to 1 and all the others to 0. This can be implemented with the
arithmetic left shift, shifting 1 by 0 to n−1 steps for n bits:

mask = 1 << i

Get the ith Bit Here we use the property of the AND operator: a bit ANDed
with 1 keeps its value, while a bit ANDed with 0 becomes 0.

# for n bits, i in range [0, n-1]
def get_bit(x, i):
    mask = 1 << i
    if x & mask:
        return 1
    return 0

print(get_bit(5, 1))
# output
# 0

Alternatively, we can right shift x by i and AND the result with a single 1:

def get_bit2(x, i):
    return x >> i & 1

print(get_bit2(5, 1))
# output
# 0

Set the ith Bit We either need to set it to 1 or to 0. To set this bit to 1,
we need the mapping 1 → 1, 0 → 1; therefore, we use the | operator with the
mask. To set it to 0, we need 1 → 0, 0 → 0; because b & 1 keeps b and
b & 0 = 0, we need a mask that has 0 at that bit and 1 everywhere else, which
is ~mask.

# set it to 1
x = x | mask

# set it to 0
x = x & (~mask)

Toggle the ith Bit Toggling means turning a bit to 1 if it was 0 and to 0 if
it was 1. We use the XOR operator here due to its properties:

x = x ^ mask

Clear Bits In some cases, we need to clear a range of bits and set them to 0;
our base mask needs to put 1s at all those positions. Before we solve this
problem, we need to know a property of binary subtraction. Check if you can
find the property in the examples below:

1000 − 0001 = 0111
0100 − 0001 = 0011
1100 − 0001 = 1011

The property is: subtracting 1 from a binary number n flips all the bits to
the right of the rightmost 1, including the rightmost 1 itself. Using this
amazing property, we can create our mask as:

# base mask
i = 5
mask = 1 << i
mask = mask - 1
print(bin(mask))
# output
# 0b11111

With this base mask, we can clear bits: (1) all bits from the most
significant bit down to bit i (leftmost till the ith bit) by using the above
mask; (2) all bits from the least significant bit up to the ith bit by using
~mask as the mask. The Python code is as follows:

# keep positions i-1, i-2, ..., 2, 1, 0; clear the rest
def clear_bits_left_right(val, i):
    print('val', bin(val))
    mask = (1 << i) - 1
    print('mask', bin(mask))
    return bin(val & mask)

# erase positions i-1, i-2, ..., 2, 1, 0; keep the rest
def clear_bits_right_left(val, i):
    print('val', bin(val))
    mask = (1 << i) - 1
    print('mask', bin(~mask))
    return bin(val & (~mask))

Run one example:

print(clear_bits_left_right(int('11111111', 2), 5))
print(clear_bits_right_left(int('11111111', 2), 5))
# output
# val 0b11111111
# mask 0b11111
# 0b11111
# val 0b11111111
# mask -0b100000
# 0b11100000

Get the lowest set bit Suppose we are given '0010,1100'; we need to get the
lowest set bit and return '0000,0100'. And for 1100, we get 0100. If we do an
AND between 5 and its two's complement, as shown in Eq. 8.2 and 8.4, we see
that only the rightmost 1 bit is kept and all the others are cleared to 0.
This can be done using the expression x & (−x), where −x is the two's
complement of x.

def get_lowest_set_bit(val):
    return bin(val & (-val))

print(get_lowest_set_bit(5))
# output
# 0b1

Or, optionally, we can use the property of subtracting by 1:

x ^ (x & (x - 1))

Clear the lowest set bit In many situations we want to strip off the lowest
set bit, for example in the Binary Indexed Tree data structure, or when
counting the number of set bits in a number. We use the following operation:

def strip_last_set_bit(val):
    print(bin(val))
    return bin(val & (val - 1))

print(strip_last_set_bit(5))
# output
# 0b101
# 0b100

8.5 Applications

Recording States Some algorithms, like combination, permutation, and graph
traversal, require us to record states of the input array. Instead of using
an array of the same size, we can use a single integer, where each bit's
location indicates the state of the element with the same index in the array.
For example, to record the states of an array with length 8, we can do the
following:

used = 0
for i in range(8):
    if used & (1 << i):  # check state at i
        continue
    used = used | (1 << i)  # set state at i to used
    print(bin(used))

It has the following output:

0b1
0b11
0b111
0b1111
0b11111
0b111111
0b1111111
0b11111111

XOR Single Number

8.1 136. Single Number (easy). Given a non-empty array of integers, every
element appears twice except for one. Find that single one.
Note: Your algorithm should have a linear runtime complexity. Could you
implement it without using extra memory?

Example 1:
Input: [2,2,1]
Output: 1

Example 2:
Input: [4,1,2,1,2]
Output: 4

Solution: XOR. This one is kinda straightforward. You'll need to know the
properties of XOR as shown in Section 8.1:

n ^ n = 0
n ^ 0 = n

Therefore, we only need one variable v to record the state, initialized with
0: the first time a number n appears, v = n; the second time it appears,
v = 0. At the end, v will be the single number. To update the state, we use
XOR.

def singleNumber(self, nums):
    """
    :type nums: List[int]
    :rtype: int
    """
    v = 0
    for e in nums:
        v = v ^ e
    return v

8.2 137. Single Number II. Given a non-empty array of integers, every element
appears three times except for one, which appears exactly once. Find that
single one. Note: Your algorithm should have a linear runtime complexity.
Could you implement it without using extra memory?

Example 1:
Input: [2,2,3,2]
Output: 3

Example 2:
Input: [0,1,0,1,0,1,99]
Output: 99

Solution: XOR and Two Variables. In this problem, every element but one
appears three times. To record a count of three states, we need at least two
variables, initialized as a = 0, b = 0. For example, when 2 appears the first
time, we set a = 2, b = 0; when it appears a second time, a = 0, b = 2; when
it appears a third time, a = 0, b = 0. A number that appears once or twice
will thus be saved either in a or in b. As in the above example, we use XOR
to change the state of each variable, but masked so that a only accepts a
value not already in b and vice versa: a = a ^ (num & ~b), then
b = b ^ (num & ~a).

def singleNumber(self, nums):
    """
    :type nums: List[int]
    :rtype: int
    """
    a = b = 0
    for num in nums:
        a = a ^ num & ~b
        b = b ^ num & ~a
    return a | b

8.3 421. Maximum XOR of Two Numbers in an Array (medium). Given a non-empty
array of numbers a_0, a_1, a_2, ..., a_{n−1}, where 0 ≤ a_i < 2^31, find the
maximum result of a_i XOR a_j, where 0 ≤ i, j < n. Could you do this in O(n)
runtime?

Example:
Input: [3, 10, 5, 25, 2, 8]
Output: 28
Explanation: The maximum result is 5 ^ 25 = 28.
Solution 1: Build the Max Bit by Bit. First, let's convert these integers
into binary representation by hand:

3     0000,0011
10    0000,1010
5     0000,0101
25    0001,1001
2     0000,0010
8     0000,1000

Look at the highest bit position i where some number has a 1 while another
number has a 0: the maximum XOR m must have a 1 at that bit. Now look at two
bits, i and i−1: the possible maximum appends a 0 or a 1 to m, so we first
try the larger candidate (m << 1) + 1. By the XOR property, m XOR a = b means
that if b exists among the prefixes for some prefix a, the candidate is
achievable; if so, the answer becomes (m << 1) + 1, otherwise m << 1. We
carry on this process bit by bit; in the code below, answer + 1 is the
candidate max.

def findMaximumXOR(self, nums):
    """
    :type nums: List[int]
    :rtype: int
    """
    answer = 0
    for i in range(32)[::-1]:
        answer <<= 1  # multiply it by two
        # shift right by i: divide by 2^i, keeping the first (32 - i) bits
        prefixes = {num >> i for num in nums}
        answer += any((answer + 1) ^ p in prefixes for p in prefixes)
    return answer

Solution 2: Use a Trie. (Requires import collections.)

def findMaximumXOR(self, nums):
    def Trie():
        return collections.defaultdict(Trie)

    root = Trie()
    best = 0

    for num in nums:
        candidate = 0
        cur = this = root
        for i in range(32)[::-1]:
            curBit = num >> i & 1
            this = this[curBit]  # insert num's path into the trie
            if curBit ^ 1 in cur:
                # an opposite bit exists: this bit of the XOR can be 1
                candidate += 1 << i
                cur = cur[curBit ^ 1]
            else:
                cur = cur[curBit]
        best = max(candidate, best)
    return best

With Mask

8.4 190. Reverse Bits (easy). Reverse the bits of a given 32-bit unsigned
integer.

Example 1:
Input: 00000010100101000001111010011100
Output: 00111001011110000010100101000000
Explanation: The input binary string 00000010100101000001111010011100
represents the unsigned integer 43261596, so return 964176192, whose binary
representation is 00111001011110000010100101000000.

Example 2:
Input: 11111111111111111111111111111101
Output: 10111111111111111111111111111111
Explanation: The input binary string 11111111111111111111111111111101
represents the unsigned integer 4294967293, so return 3221225471, whose
binary representation is 10111111111111111111111111111111.

Solution: Get Bit and Set Bit with Masks. We read bits from the most
significant position to the least significant position, get each bit with a
mask, and set the corresponding bit in our answer with a mask indicating the
position (31 − i):

# @param n, an integer
# @return an integer
def reverseBits(self, n):
    ans = 0
    for i in range(32)[::-1]:  # from high to low
        mask = 1 << i
        set_mask = 1 << (31 - i)
        if (mask & n) != 0:  # get bit
            ans |= set_mask  # set bit
    return ans

8.5 201. Bitwise AND of Numbers Range (medium). Given a range [m, n] where
0 ≤ m ≤ n ≤ 2147483647, return the bitwise AND of all numbers in this range,
inclusive.

Example 1:
Input: [5, 7]
Output: 4

Example 2:
Input: [0, 1]
Output: 0

Solution 1: O(n), do the AND operation directly. We start with 32 bits of 1s
and AND every number in the range. This solution receives a TLE (time limit
exceeded) error.

def rangeBitwiseAnd(self, m, n):
    """
    :type m: int
    :type n: int
    :rtype: int
    """
    ans = int('1' * 32, 2)
    for c in range(m, n + 1):
        ans &= c
    return ans

Solution 2: Use a mask, check bit by bit. Think: if we AND all numbers, the
resulting integer is definitely smaller than or equal to m. For Example 1:

0101   5
0110   6
0111   7

We start from the least significant bit of 5. If it is 1, then we find the
closest number to m that has 0 at this bit; for the least significant bit it
would be 0110. If this number is in the range [m, n], then this bit is offset
to 0 in the result. We then move on to check the next bit. To make this
closest number: first we clear the lowest i+1 positions in m to get 0100, and
then we add 1 << (i + 1), i.e. 0010, to get 0110.
def rangeBitwiseAnd(self, m, n):
    ans = 0
    mask = 1
    for i in range(32):
        bit = mask & m != 0
        if bit:
            # clear positions i, i-1, ..., 0
            mask_clear = (mask << 1) - 1
            left = m & (~mask_clear)
            check_num = (mask << 1) + left
            if check_num < m or check_num > n:
                ans |= 1 << i
        mask = mask << 1
    return ans

Solution 3: Use a While Loop. We keep ANDing n with (n − 1); as long as the
resulting integer is still larger than m, we keep doing such AND operations.

def rangeBitwiseAnd(self, m, n):
    ans = n
    while ans > m:
        ans = ans & (ans - 1)
    return ans

8.6 Exercises

1. Write a function to determine the number of bits required to convert integer A to integer B.

def bitswaprequired(a, b):
    count = 0
    c = a ^ b
    while c != 0:
        count += c & 1
        c = c >> 1
    return count

print(bitswaprequired(12, 7))

2. 389. Find the Difference (easy). Given two strings s and t which consist of only lowercase letters, string t is generated by randomly shuffling string s and then adding one more letter at a random position. Find the letter that was added in t.

Example:
Input:
s = "abcd"
t = "abcde"

Output:
e
Explanation:
'e' is the letter that was added.

Solution 1: Use Counter Difference. This way we need O(M + N) space to save
the counter results for each letter.

def findTheDifference(self, s, t):
    s = collections.Counter(s)
    t = collections.Counter(t)
    diff = t - s
    return list(diff.keys())[0]

Solution 2: Single Number with XOR. Using bit manipulation, with O(1) space
we can find it in O(M + N) time, which is the best conceivable runtime (BCR):

def findTheDifference(self, s, t):
    """
    :type s: str
    :type t: str
    :rtype: str
    """
    v = 0
    for c in s:
        v = v ^ ord(c)
    for c in t:
        v = v ^ ord(c)
    return chr(v)

3. 50. Pow(x, n) (medium). For n, such as 10, we represent it in binary as 1010. Keeping a base and a result, we start from the least significant bit; each time we move one bit, the base becomes base * base, and if the bit's value is 1, then we multiply the answer by the base.
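A sketch of that idea, iterative fast exponentiation for a non-negative integer n (handling negative n, as the full problem requires, is left out here):

def my_pow(x, n):
    ans = 1
    base = x
    while n:
        if n & 1:      # current least significant bit is 1
            ans *= base
        base *= base   # square the base each time we move one bit
        n >>= 1
    return ans

print(my_pow(2, 10))  # 1024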
9 Python Data Structures

9.1 Introduction

Python is an object-oriented programming language where each object is
implemented in C in the backend (CPython). The backing C data structures
follow the abstract data structures more rigidly. We could get by just
learning how to use the Python data types alone: each type's property
(immutable or mutable), its built-in in-place operations such as append(),
insert(), add(), remove(), replace() and so on, and the built-in functions
and operations that offer additional ability to manipulate the data
structure, an object here. However, some data types' behaviors might confuse
us with respect to the abstract data structures, making it hard to assess and
evaluate their efficiency.
In this chapter and the following three chapters, we start to learn Python
data structures by relating their C implementations to the abstract data
structures we learned, and then introduce each one's properties, built-in
operations, and built-in functions. Please read the section Understanding
Objects in the Appendix (Python Knowledge Base) to study the properties of
built-in data types first if Python is not a familiar language to you.

Python Built-in Data Types In Python 3, we have four built-in scalar data
types: int, float, complex, bool. At a higher level, it includes four
sequence types: str (the string type), list, tuple, and range; one mapping
type: dict; and two set types: set and frozenset. Among these 12 built-in
data types, other than the scalar types, the others represent some of our
introduced abstract data structures.


Abstract Data Types with Python Data Types/Modules To relate the abstract
data types to the built-in data types we have:

• The sequence types correspond to the array data structure: string, list, tuple, and range.

• dict, set, and frozenset map to hash tables.

• For linked lists, stacks, and queues, we either need to implement them with the built-in data types or use Python modules.

9.2 Array and Python Sequence

In the remaining contents of this part, we will see how array-based Python
data structures are used to implement the other data structures. On LeetCode,
these two data structures are involved in about 25% of the problems.

9.2.1 Introduction to Python Sequence

In Python, sequences are defined as ordered sets of objects indexed by
non-negative integers; we use an index to refer to elements, and in Python it
starts at 0 by default. Sequence types are iterable: iterables are able to be
iterated over, and iterators are the agents that perform the iteration,
obtained with the iter() built-in function.

• string is a sequence of characters; it is immutable, with a static array as its backing data structure in C.

• list and tuple are sequences of arbitrary objects, meaning they accept different types of objects, including the 12 built-in data types and any other objects. This sounds fancy and like magic! However, it does not change the fact that their backing abstract data structure is the dynamic array. They are able to hold arbitrary types of objects through the usage of pointers to objects, pointing to each object's physical location, where each pointer takes a fixed number of bytes in space (in a 32-bit system, 4 bytes; in a 64-bit system, 8 bytes instead).

• range: In Python 3, range() is a type. But range does not have a backing array data structure to save a sequence of values; it computes them on demand. Thus we will first introduce range and get done with it before we focus on the other sequence types.

>>> type(range)
<class 'type'>

All these sequence-type data structures share the most common methods and
operations shown in Tables 9.4 and 9.5. Note that in Python, indexing starts
from 0.
Let us examine each type of sequence further to understand its performance
and its relation to the array data structure.

9.2.2 Range
Range Syntax
The range object has three attributes: start, stop, and step, and a range object can be created as range(start, stop, step). These attributes need to be integers—both negative and positive work—to define a range, which will be [start, stop). The default value for start is 0 and for step is 1. For example:

>>> a = range(10)
>>> b = range(0, 10, 2)
>>> a, b
(range(0, 10), range(0, 10, 2))

Now, we print it out:

>>> for i in a:
...     print(i, end=' ')
...
0 1 2 3 4 5 6 7 8 9

And for b, it will be:

>>> for i in b:
...     print(i, end=' ')
...
0 2 4 6 8

Like any other sequence types, range is iterable, can be indexed and sliced.
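A quick sketch of this (the example values here are our own, not from the text):

>>> r = range(0, 10, 2)
>>> r[2]
4
>>> r[1:4]
range(2, 8, 2)
>>> 6 in r
True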

What you do not see

The range object might seem a little bizarre when we first learn it. Is it an iterator? A generator? The answer to both questions is no. What is it then? It is more like a sequence type that distinguishes itself from its counterparts with its own unique properties:

• It is "lazy" in the sense that it doesn't generate every number that it "contains" when we create it. Instead it gives those numbers to us as we need them while looping over it. Thus, it saves us space:

>>> a = range(1_000_000)
>>> b = [i for i in a]
>>> a.__sizeof__(), b.__sizeof__()
(48, 8697440)

This is simply how the behavior of the range class is defined in the underlying C code: it does not need to store all the integers in the range; each one is generated by a function only when specifically asked for.

• It is not an iterator; it won’t get consumed. We can iterate it multiple


times. This is understandable given how it is implemented.
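A small sketch to verify this (the variable names are ours):

>>> r = range(3)
>>> it = iter(r)
>>> list(it)
[0, 1, 2]
>>> list(it)    # the iterator is consumed
[]
>>> list(r)     # the range itself can be iterated again
[0, 1, 2]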

9.2.3 String
A string is backed by a static array whose items are characters, represented using ASCII or Unicode 1 . A string is immutable, which means once it is created we can no longer modify its content or extend its size. A string is more compact than storing the same characters in a list, because its backing array is not assigned any extra space.
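A quick sketch to see this compactness (getsizeof is standard; the exact byte counts vary across Python versions):

import sys

s = 'abcde'
l = list(s)
# the string stores its characters inline; the list stores five pointers
print(sys.getsizeof(s), sys.getsizeof(l))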

String Syntax

Strings can be created in Python by wrapping a sequence of characters in single or double quotes. Multi-line strings can easily be created using three quote characters.

New a String We introduce some especially common and useful methods.

Join The str.join() method concatenates strings, joining the items of an iterable with the calling string placed between them. For example, we can use str.join() to add whitespace between every character of a string, like so:

balloon = "Sammy has a balloon."
print(" ".join(balloon))
# Output
S a m m y   h a s   a   b a l l o o n .

The str.join() method is also useful to combine a list of strings into a new single string.

print(",".join(["a", "b", "c"]))
# Output
a,b,c

1
In Python 3, all strings are represented in Unicode. In Python 2, strings are stored internally as 8-bit ASCII, hence it was required to attach 'u' to make a string Unicode. That is no longer necessary.

Split Just as we can join strings together, we can also split strings up using the str.split() method. This method separates the string by whitespace if no other parameter is given.

print(balloon.split())
# Output
['Sammy', 'has', 'a', 'balloon.']

We can also use str.split() to remove certain parts of an original string. For example, let's remove the letter 'a' from the string:

print(balloon.split("a"))
# Output
['S', 'mmy h', 's ', ' b', 'lloon.']

Now the letter a has been removed and the string has been separated where each instance of the letter a had been, with whitespace retained.

Replace The str.replace() method takes an original string and returns an updated string with some replacement.
Let's say that the balloon that Sammy had is lost. Since Sammy no longer has this balloon, we will change the substring "has" from the original string balloon to "had" in a new string:

print(balloon.replace("has", "had"))
# Output
Sammy had a balloon.

We can also use the replace method to delete a substring (remember that it returns a new string rather than modifying balloon in place):

balloon.replace("has", "")

Using the string methods str.join(), str.split(), and str.replace()


will provide you with greater control to manipulate strings in Python.

Conversion between Integer and Character The function ord() gets the int value (Unicode code point) of a character. And in case you want to convert back after playing with the number, the function chr() does the trick.

print(ord('A'))  # given a string of length one, return an integer representing the Unicode code point of the character
print(chr(65))

String Functions
Because string is one of the most fundamental built-in data types, mastering its common built-in methods, shown in Tables 9.1 and 9.2, is necessary. Using boolean methods to check whether characters are lower case, upper case, or title case can help us sort our data appropriately, and also provides the opportunity to standardize the data we collect by checking and then modifying strings as needed.

Table 9.1: Common Methods of String


Method Description
count(substr, [start, end]) Counts the occurrences of a substring with op-
tional start and end position
find(substr, [start, end]) Returns the index of the first occurrence of a
substring or returns -1 if the substring is not
found
join(t) Joins the strings in sequence t with current
string between each item
lower()/upper() Converts the string to all lowercase or upper-
case
replace(old, new) Replaces old substring with new substring
strip([characters]) Removes whitespace or optional characters
split([characters], [maxsplit]) Splits a string separated by whitespace or an
optional separator. Returns a list
expandtabs([tabsize]) Replaces tabs with spaces.

Table 9.2: Common Boolean Methods of String


Boolean Method Description
isalnum() String consists of only alphanumeric charac-
ters (no symbols)
isalpha() String consists of only alphabetic characters
(no symbols)
islower() String’s alphabetic characters are all lower
case
isnumeric() String consists of only numeric characters
isspace() String consists of only whitespace characters
istitle() String is in title case
isupper() String’s alphabetic characters are all upper
case

9.2.4 List
The underlying abstract data structure of the list data type is the dynamic array, meaning we can add, delete, and modify items in the list. It supports random access by indexing. List is the most widely used among the sequence types due to its mutability.
Even though list supports data of arbitrary types, we generally prefer not to mix types; use tuple or namedtuple instead for better practice and clarity.

What You see: List Syntax

New a List: We have multiple ways to new either an empty list or one with initialized data. List comprehension is an elegant and concise way to create a new list from an existing iterable in Python.

# new an empty list
lst = []
lst2 = [2, 2, 2, 2]  # new a list with initialization
lst3 = [3] * 5       # new a list of size 5 with 3 as initialization
print(lst, lst2, lst3)
# output
# [] [2, 2, 2, 2] [3, 3, 3, 3, 3]

We can use list comprehension and the enumerate function to loop over its items.

lst1 = [3] * 5  # new a list of size 5 with 3 as initialization
lst2 = [4 for i in range(5)]
for idx, v in enumerate(lst1):
    lst1[idx] += 1

Search We use the method list.index() to obtain the index of the searched item.

print(lst.index(4))  # find 4, and return the index
# output
# 3

Calling print(lst.index(5)) would raise ValueError: 5 is not in list. Use the following code instead.

if 5 in lst:
    print(lst.index(5))

Add Item We can add items into a list through insert(index, value)—inserting an item at a given position in the original list—or list.append(value)—appending an item at the end of the list.

# INSERTION
lst.insert(0, 1)  # insert an element at index 0; since lst is empty, lst.insert(1, 1) has the same effect
print(lst)

lst2.insert(2, 3)
print(lst2)
# output
# [1]
# [2, 2, 3, 2, 2]

# APPEND
for i in range(2, 5):
    lst.append(i)
print(lst)
# output
# [1, 2, 3, 4]

Delete Item We can delete an item by position with list.pop(index), by value with list.remove(value), or remove everything with list.clear(); a short sketch follows.
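A minimal sketch of these deletions (the starting list here is our own example):

lst = [1, 2, 3, 4]
lst.pop()       # removes and returns the last item -> 4
lst.pop(0)      # removes and returns the item at index 0 -> 1
lst.remove(3)   # removes the first occurrence of the value 3
print(lst)      # [2]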
Get Size of the List We can use the len built-in function to find out the number of items stored in the list.

print(len(lst2))
# 4

What you do not see: Understand List

To understand list, we need to start with its C implementation. We do not walk through the C source code, but instead use functions to access and evaluate its properties.

List Object and Pointers On a 64-bit system, such as in Google Colab, a pointer is represented with 8 bytes of space. In Python 3, the list object itself takes 64 bytes of space, and each additional element takes 8 bytes. In Python, we can use getsizeof() from the sys module to get an object's memory size, for example:

lst_lst = [[], [1], ['1'], [1, 2], ['1', '2']]

And now, let us get the memory size of lst_lst and each list item in this list.

import sys
for lst in lst_lst:
    print(sys.getsizeof(lst), end=' ')
print(sys.getsizeof(lst_lst))

The output is:

64 72 72 80 80 104

We can see a list of integers takes the same memory size as a list of strings of equal length.

insert and append Whenever insert or append is called, assuming the original length is n, Python compares n + 1 with the allocated length. If you append or insert into a Python list and the backing array isn't big enough, the backing array must be expanded. When this happens, the backing array is grown by roughly 12.5%, following this formula (from the CPython C source):

new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);

With an experiment we can see how it works. Here we use the id() function to obtain the object's memory address. We compare the size of the list with its underlying backing array's real additional space (with 8 bytes as the unit).

a = []
for size in range(17):
    a.insert(0, size)
    print('size:', len(a), 'bytes:', (sys.getsizeof(a) - 64) // 8, 'id:', id(a))

The output is:



size: 1  bytes: 4  id: 140682152394952
size: 2  bytes: 4  id: 140682152394952
size: 3  bytes: 4  id: 140682152394952
size: 4  bytes: 4  id: 140682152394952
size: 5  bytes: 8  id: 140682152394952
size: 6  bytes: 8  id: 140682152394952
size: 7  bytes: 8  id: 140682152394952
size: 8  bytes: 8  id: 140682152394952
size: 9  bytes: 16 id: 140682152394952
size: 10 bytes: 16 id: 140682152394952
size: 11 bytes: 16 id: 140682152394952
size: 12 bytes: 16 id: 140682152394952
size: 13 bytes: 16 id: 140682152394952
size: 14 bytes: 16 id: 140682152394952
size: 15 bytes: 16 id: 140682152394952
size: 16 bytes: 16 id: 140682152394952
size: 17 bytes: 25 id: 140682152394952

The output reveals the growth pattern of the allocated capacity: [0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...].
Amortized, append takes O(1). However, insert is O(n) because it has to first shift all items in the original list in the range [pos, end] by one position, and then put the item at pos with random access.
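A rough timing sketch of this difference (the timeit usage is standard; the exact timings will vary by machine):

import timeit

# appending at the end is amortized O(1)
print(timeit.timeit('a.append(0)', setup='a = []', number=100_000))
# inserting at the front is O(n) per call and noticeably slower
print(timeit.timeit('a.insert(0, 0)', setup='a = []', number=100_000))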

Common Methods of List

We have already seen how to use append, insert. Now, Table 9.3 shows us
the common List Methods, and they will be used as list.methodName().

Table 9.3: Common Methods of List


Method Description
append() Add an element to the end of the list
extend(l) Add all elements of a list to the another list
insert(index, val) Insert an item at the defined index
pop(index) Removes and returns an element at the given
index
remove(val) Removes an item from the list
clear() Removes all items from the list
index(val) Returns the index of the first matched item
count(val) Returns the number of occurrences of val
sort() Sort items in a list in ascending order
reverse() Reverse the order of items in the list (same as
list[::-1])
copy() Returns a shallow copy of the list (same as
list[::])

Two-dimensional List
A two-dimensional list is a list within a list. In this type of array, the position of a data element is referred to by two indices instead of one, so it represents a table with rows and columns of data. For example, we can declare the following 2-d array:

ta = [[11, 3, 9, 1], [25, 6, 10], [10, 8, 12, 5]]

The scalar data in a two-dimensional list can be accessed using two indices: one referring to the outer list and another referring to the position of the data in the inner list. If we give only one index, the entire inner list is returned for that index position. The example below illustrates how it works.

print(ta[0])
print(ta[2][1])

And with the output

[11, 3, 9, 1]
8

In the above example, we new a 2-d list and initialize it with values. There are also ways to new an empty 2-d array, or to fix the dimension of the outer array and leave the inner arrays empty:

# empty two dimensional list
empty_2d = [[]]

# fix the outer dimension
fix_out_d = [[] for _ in range(5)]
print(fix_out_d)

All the other operations such as delete, insert, update are the same as of
the one-dimensional list.

Matrices We are going to need the concept of a matrix, which is defined as a collection of numbers arranged into a fixed number of rows and columns. For example, a 3 × 4 (read as 3 by 4) matrix is a set of numbers arranged in 3 rows and 4 columns. The declarations of m1 and m2 below do the same thing.

rows, cols = 3, 4
m1 = [[0 for _ in range(cols)] for _ in range(rows)]  # rows * cols
m2 = [[0] * cols for _ in range(rows)]                # rows * cols
print(m1, m2)

The output is:

[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]] [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

We assign to m1 and m2 at index (1, 2) the value 1:

m1[1][2] = 1
m2[1][2] = 1
print(m1, m2)

And the output is:

[[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]] [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]

However, we cannot declare it in the following way, because we would end up with multiple references to the same inner list; modifying one element in an inner list would thus change the corresponding position in all of them—unless that behavior suits the situation.

# wrong declaration
m4 = [[0] * cols] * rows
m4[1][2] = 1
print(m4)

With output:

[[0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 1, 0]]

Access Rows and Columns In real problem solving, we might need to access rows and columns. Accessing rows is quite easy since it follows the declaration of the two-dimensional array.

# accessing rows
for row in m1:
    print(row)

With the output:


[0 , 0 , 0 , 0]
[0 , 0 , 1 , 0]
[0 , 0 , 0 , 0]

However, accessing columns is less straightforward. To get each column, we need an inner for loop or list comprehension that goes through all rows and collects the value at that column. This is usually a lot slower than accessing a row, because each row is a pointer we can grab directly, whereas each column has to be gathered from every row.

# accessing columns
for i in range(cols):
    col = [row[i] for row in m1]
    print(col)

The output is:


[0 , 0, 0]
[0 , 0, 0]
[0 , 1, 0]
[0 , 0, 0]

There's also a handy "idiom" for transposing a nested list, turning 'columns' into 'rows':

transposedM1 = list(zip(*m1))
print(transposedM1)

The output will be:

[(0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 0)]

9.2.5 Tuple
A tuple has a static array as its backing abstract data structure in C, which is immutable—we cannot add, delete, or replace items once the tuple is created and assigned a value. You might wonder: if list is a dynamic array without the tuple's restrictions, why would we need tuple at all?

Tuple VS List Let us look at how we use each data type and why. The main benefit of the tuple's immutability is that it is hashable: we can use tuples as keys in hash tables—the dictionary type—whereas mutable types such as list cannot be used. Besides, when the data does not need to change, the tuple's immutability guarantees that it remains write-protected, and iterating over an immutable sequence is faster than over a mutable one, giving a slight performance boost. Also, we generally use a tuple to store a variety of data types. For example, in a class scoring system, for a student we might want to record the name, student id, and test score, which we can write as ('Bob', 12345, 89). A sketch of the hashability difference follows.
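A minimal sketch of tuples as dictionary keys (the coordinates here are our own example):

locations = {}
locations[(40.7, -74.0)] = 'New York'    # a tuple works as a key
print(locations[(40.7, -74.0)])
# locations[[40.7, -74.0]] = 'New York'  # TypeError: unhashable type: 'list'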

Tuple Syntax
New and Initialize a Tuple Tuples are created by separating items with a comma. They are commonly wrapped in parentheses for better readability. A tuple can also be created via the built-in function tuple(); if the argument to tuple() is a sequence, this creates a tuple of the elements of that sequence. This is also used to realize type conversion.
An empty tuple:

tup = ()
tup3 = tuple()

When there is only one item, put a comma behind it so that it won't be treated as a plain parenthesized value, which is a bit bizarre!

tup2 = ('crack',)
tup1 = ('crack', 'leetcode', 2018, 2019)

Converting a string to a tuple with each character separated:

tup4 = tuple("leetcode")  # the sequence is passed as a tuple of elements
>> tup4: ('l', 'e', 'e', 't', 'c', 'o', 'd', 'e')

Converting a list to a tuple:

tup5 = tuple(['crack', 'leetcode', 2018, 2019])  # same as tup1

If we print out these tuples, it will be

tup1: ('crack', 'leetcode', 2018, 2019)
tup2: ('crack',)
tup3: ()
tup4: ('l', 'e', 'e', 't', 'c', 'o', 'd', 'e')
tup5: ('crack', 'leetcode', 2018, 2019)

Changing a Tuple Assume we have the following tuple:

tup = ('a', 'b', [1, 2, 3])

If we want to change it to ('c', 'b', [4, 2, 3]), we cannot do the following operation, since as we said a tuple cannot be changed in place once it has been assigned.

tup = ('a', 'b', [1, 2, 3])
# tup[0] = 'c'  # TypeError: 'tuple' object does not support item assignment

Instead, we initialize another tuple and assign it to the tup variable.

tup = ('c', 'b', [4, 2, 3])

However, items that are themselves mutable can still be manipulated. For example, we can use an index to access the list at the last position of the original tuple and modify that list.

tup[-1][0] = 4
# ('a', 'b', [4, 2, 3])

Understand Tuple
The backing structure is a static array, which means the tuple is structured similarly to a list, other than being write-protected. We will just brief on its properties.

Tuple Object and Pointers The tuple object itself takes 48 bytes, and everything else is similar to the corresponding section for list.

lst_tup = [(), (1,), ('1',), (1, 2), ('1', '2')]
import sys
for tup in lst_tup:
    print(sys.getsizeof(tup), end=' ')

The output will be:

48 56 56 64 64

Named Tuples
With a named tuple, we can give the record a type name, say "Computer_Science" to indicate the class name, and we give each field a name, say 'name', 'id', and 'score'. We need to import the namedtuple class from the module collections. For example:

record1 = ('Bob', 12345, 89)
from collections import namedtuple
Record = namedtuple('Computer_Science', 'name id score')
record2 = Record('Bob', id=12345, score=89)
print(record1, record2)

The output will be:

('Bob', 12345, 89) Computer_Science(name='Bob', id=12345, score=89)
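The named fields can then be accessed by attribute rather than by position, which is the main readability win (a small sketch continuing the example above):

print(record2.name, record2.score)  # Bob 89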

9.2.6 Summary
All these sequence-type data structures share the common methods and operations shown in Tables 9.4 and 9.5. Note again that in Python, indexing starts from 0.

9.2.7 Bonus
Circular Array The corresponding problems include:
1. 503. Next Greater Element II

9.2.8 Exercises
1. 985. Sum of Even Numbers After Queries (easy)
2. 937. Reorder Log Files
You have an array of logs. Each log is a space delimited string of
words.
For each log, the first word in each log is an alphanumeric identifier.
Then, either:
Each word after the identifier will consist only of lowercase letters, or;
Each word after the identifier will consist only of digits.
We will call these two varieties of logs letter-logs and digit-logs. It is
guaranteed that each log has at least one word after its identifier.
Reorder the logs so that all of the letter-logs come before any digit-log.
The letter-logs are ordered lexicographically ignoring identifier, with
the identifier used in case of ties. The digit-logs should be put in their
original order.
Return the final order of the logs.

Table 9.4: Common Methods for Sequence Data Type in Python

Function Method Description
len(s) Get the size of sequence s
min(s[, default=obj, key=func]) The minimum value in s (alphabetically for strings)
max(s[, default=obj, key=func]) The maximum value in s (alphabetically for strings)
sum(s[, start=0]) The sum of elements in s (raises TypeError if s is not numeric)
all(s) Return True if all elements in s are True (similar to and)
any(s) Return True if any element in s is True (similar to or)

Table 9.5: Common out-of-place operators for Sequence Data Type in Python

Operation Description
s + r Concatenates two sequences of the same type
s * n Makes n copies of s, where n is an integer
v1, v2, ..., vn = s Unpacks n variables from s
s[i] Indexing—returns the ith element of s
s[i:j:stride] Slicing—returns elements between i and j with optional stride
x in s Return True if element x is in s
x not in s Return True if element x is not in s

Example 1:

Input: ["a1 9 2 3 1", "g1 act car", "zo4 4 7", "ab1 off key dog", "a8 act zoo"]
Output: ["g1 act car", "a8 act zoo", "ab1 off key dog", "a1 9 2 3 1", "zo4 4 7"]

Note:

0 <= logs.length <= 100
3 <= logs[i].length <= 100
logs[i] is guaranteed to have an identifier, and a word after the identifier.

def reorderLogFiles(self, logs):
    letters = []
    digits = []
    for log in logs:
        splited = log.split(' ')
        iden = splited[0]
        first = splited[1]

        if first.isnumeric():
            digits.append(log)
        else:
            letters.append((' '.join(splited[1:]), iden))
    # default tuple sorting: by the content first, then the identifier
    letters.sort()

    return [iden + ' ' + other for other, iden in letters] + digits

def reorderLogFiles(logs):
    digit = []
    letters = []
    info = {}
    for log in logs:
        if '0' <= log[-1] <= '9':
            digit.append(log)
        else:
            letters.append(log)
            index = log.index(' ')
            info[log] = log[index + 1:]

    letters.sort(key=lambda x: info[x])
    return letters + digit

9.3 Linked List

Python does not have a built-in data type or module that offers a linked-list-like data structure; however, it is not hard to implement one ourselves.

9.3.1 Singly Linked List

Figure 9.1: Linked List Structure

A linked list consists of nodes. For a singly linked list, each node consists of at least two variables: val to store data and next, a pointer that points to the successive node. The Node class is given as:

class Node(object):
    def __init__(self, val=None):
        self.val = val
        self.next = None

In a singly linked list, we usually start with a head node which points to the first node in the list; with only this single node we are able to trace all other nodes. For simplicity, we demonstrate the process without using a class, but we provide a class implementation named SinglyLinkedList in our online Python source code. Now, let us create an empty node named head.

head = None

We need to implement its standard operations, including insert/append, delete, search, and clear. However, if we allow the head node to be None, there will be special cases to handle. Thus, we use a dummy node—a node with None as its value—as the head, to simplify the coding. We point the head to a dummy node:

head = Node(None)

Append Operation As with the append function of list, we add a node at the very end of the linked list. Without the dummy node, there would be two cases:

• When head is an empty node, we assign the new node to head.

• When it is not empty, because all that is available to us is the head pointer, we need to first traverse the nodes up to the very last node, whose next is None, then connect the new node to the last node by assigning it to the last node's next pointer.

The first case is simply bad: we would generate a new node and we could not track the head through an in-place operation. With the dummy node, only the second case can appear. The code is:

def append(head, val):
    node = Node(val)
    cur = head
    while cur.next:
        cur = cur.next
    cur.next = node
    return

Now, let us create the exact same linked list as in Fig. 9.1:

for val in ['A', 'B', 'C', 'D']:
    append(head, val)

Generator and Search Operations In order to traverse and iterate over the linked list with a for ... in statement like any other sequence data type in Python, we implement a gen() function that returns a generator over all nodes of the list. Because we have a dummy node, we always start at head.next.

def gen(head):
    cur = head.next
    while cur:
        yield cur
        cur = cur.next

Now, let us print out the linked list we created:

for node in gen(head):
    print(node.val, end=' ')

Here is the output:

A B C D

For the search operation, we find a node by value and return this node; otherwise, we return None.

def search(head, val):
    for node in gen(head):
        if node.val == val:
            return node
    return None

Now, we search for the value 'B' with:

node = search(head, 'B')

Delete Operation For deletion, there are two scenarios: deleting a node by value when we are given the head node, and deleting a given node such as the one we got from searching for 'B'.
The first case requires us to first locate the node, then rewire the pointers between the predecessor and successor of the node being deleted. Again, if we did not have a dummy node, we would have two cases: if the node is the head node, repoint the head to the next node; otherwise, connect the previous node to the deleted node's next node while the head pointer remains untouched. With a dummy node, only the second situation occurs. In the process, we use an additional variable prev to track the predecessor.

def delete(head, val):
    cur = head.next
    prev = head  # the dummy node
    while cur:
        if cur.val == val:
            # rewire
            prev.next = cur.next
            return
        prev = cur
        cur = cur.next

Now, let us delete one more node–’A’ with this function.


1 d e l e t e ( head , 'A ' )
2 f o r n i n gen ( head ) :
3 p r i n t ( n . v a l , end = ' ' )

Now the output will indicate we only have two nodes left:
1 C D

The second case might seem a bit impossible—we do not know the node's previous node. The trick is to copy the value of the next node into the current node, and delete the next node instead by pointing the current node past it. This only works when the deleted node is not the last node. When it is the last node, we have no way to completely delete it, but we can make it "invalid" by setting its value and next to None.

def deleteByNode(node):
    if node.next:
        # pull the next node's value into this node, then bypass the next node
        node.val = node.next.val
        node.next = node.next.next
    else:  # last node: we can only invalidate it
        node.val = None
        node.next = None

Now, let us try deleting the node ’B’ via our previously found node.
1 deleteByNode ( node )
2 f o r n i n gen ( head ) :
3 p r i n t ( n . v a l , end = ' ' )

The output is:


1 A C D

Clear When we need to clear all the nodes of the linked list, we just set the dummy head's next to None (in the class implementation, we also reset the size).

def clear(self):
    self.head.next = None
    self.size = 0

Question: Some linked lists only allow inserting a node at the tail, which is append, while others allow insertion at any location. To get the length of the linked list in O(1), we keep a variable that tracks the size.

Figure 9.2: Doubly Linked List

9.3.2 Doubly Linked List

On the basis of the singly linked list, a doubly linked list (DLL) contains an extra pointer in the node structure, typically called prev (short for previous), which points back to the node's predecessor in the list. We define the Node class as:

class Node:
    def __init__(self, val=None, prev=None, next=None):
        self.val = val
        self.prev = prev  # reference to previous node in DLL
        self.next = next  # reference to next node in DLL

Similarly, let us start by setting a dummy node as head:

head = Node()

Now, instead of me continuing to implement all the operations, which are only slight variants of the singly linked list's, why don't you implement them yourself? Do not worry—try it first; I have the answers covered in the Google Colab, enjoy!
Now, I assume that you have implemented those operations or checked the solutions. You will notice that in search() and gen() the code is exactly the same, and for the other operations only one or two lines of code differ from the SLL. Let's quickly list these operations:

Append Operation In a DLL, we also have to set the appended node's prev pointer to the previously last node of the linked list. The code is:

def append(head, val):
    node = Node(val)
    cur = head
    while cur.next:
        cur = cur.next
    cur.next = node
    node.prev = cur  # the only difference
    return

Generator and Search Operations There is not much difference if we just search through the next pointer. However, with the extra prev pointer we have two options when the given starting node is an arbitrary node: search forward through next or backward through prev. For an SLL this is not an option, because we would not be able to conduct a complete search—we could only search among the items behind the given node. When the data is ordered in some way, or when the program is parallel, a bidirectional search can make sense.

def gen(head):
    cur = head.next
    while cur:
        yield cur
        cur = cur.next

def search(head, val):
    for node in gen(head):
        if node.val == val:
            return node
    return None

Delete Operation To delete a node by value, we first find it in the linked list, and the rewiring process additionally needs to handle the next node's prev pointer if the next node exists.

def delete(head, val):
    cur = head.next  # start after the dummy node
    while cur:
        if cur.val == val:
            # rewire
            cur.prev.next = cur.next
            if cur.next:
                cur.next.prev = cur.prev
            return
        cur = cur.next

For deleteByNode, because we are cutting off node.next, we need to connect node to node.next.next in both directions: point the later node's prev to the current node, and point the current node's next to the later node.

def deleteByNode(node):
    # pull the next node into the current node
    if node.next:
        node.val = node.next.val
        if node.next.next:
            node.next.next.prev = node
        node.next = node.next.next
    else:  # last node
        node.prev.next = None
    return node

Comparison We can see the DLL has a slight advantage over the SLL, but it comes with the cost of handling the extra prev pointer. This is only an advantage when bidirectional searching is the dominant factor in efficiency; otherwise, we are better off sticking with the SLL.

Tips In our implementation, in some cases we still need to worry about whether a node is the last one or not. The coding logic can be further simplified if we also put a dummy node at the end of the linked list.

9.3.3 Bonus
Circular Linked List A circular linked list is a variation of the linked list in which the last node connects back to the first node. To make a circular linked list from a normal one: in a singly linked list, we simply set the last node's next pointer to the first node; in a doubly linked list, besides setting the last node's next pointer, we also set the prev pointer of the first node to the last node, making it circular in both directions.
Compared with a normal linked list, a circular linked list saves us time going from the last node to the first (in both SLL and DLL) or from the first node to the last (in DLL), doing it in a single step through the extra connection. Because it is a circle, whenever a search with a while loop is needed, we must make sure of the end condition: stop once a whole cycle has been searched, by comparing the iterating node to the starting node, as in the sketch below.
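A minimal traversal sketch under these assumptions (start is any node of a circular SLL built from the Node class above):

def traverse(start):
    # visit every node exactly once in a circular singly linked list
    if start is None:
        return
    cur = start
    while True:
        print(cur.val, end=' ')
        cur = cur.next
        if cur is start:  # a whole cycle has been searched
            break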

Recursion Recursion offers an additional pass of traversal—bottom-up, on top of the top-down direction—and in practice it often gives cleaner and simpler code than iteration.

9.3.4 Hands-on Examples

Remove Duplicates (L83) Given a sorted linked list, delete all duplicates such that each element appears only once.

Example 1:
Input: 1->1->2
Output: 1->2

Example 2:
Input: 1->1->2->3->3
Output: 1->2->3

Analysis

This is a linear-complexity problem. The most straightforward way is to iterate through the linked list and compare the current node's value with the next's to check equality: (1) if YES, delete one of the two nodes—here we delete the next node; (2) if NO, we can move to the next node safe and sound.

Iteration without Dummy Node We start from the head in a while loop; if the next node exists and the values are equal, we delete the next node. However, after the deletion we cannot move forward directly: say we have 1->1->1; when the second 1 is removed, if we moved, we would be at the last 1 and would fail to remove all possible duplicates. The code is given:

def deleteDuplicates(self, head):
    """
    :type head: ListNode
    :rtype: ListNode
    """
    if not head:
        return None

    def iterative(head):
        current = head
        while current:
            if current.next and current.val == current.next.val:
                # delete next
                current.next = current.next.next
            else:
                current = current.next
        return head

    return iterative(head)

With Dummy Node With a dummy node, we put current.next in the while condition, because only if the next node exists do we need to compare values; we then no longer need to check this condition inside the loop body.

def iterative(head):
    dummy = ListNode(None)
    dummy.next = head
    current = dummy
    while current.next:
        if current.val == current.next.val:
            # delete next
            current.next = current.next.next
        else:
            current = current.next
    return head

Recursion Now, if we use recursion and return the node, then at each step we can compare our node with the returned node (which sits behind the current node), and the same logic applies. Drawing out an example helps: with 1->1->1, the last 1 is returned; at the second-to-last 1 we compare them, and since they are equal we delete the last 1; now we backtrack to the first 1 with the second-to-last 1 as the returned node, and compare again. This code is the simplest among all the solutions.

def recursive(node):
    if node.next is None:
        return node

    next = recursive(node.next)
    if next.val == node.val:
        node.next = node.next.next
    return node

9.3.5 Exercises
Basic operations:

1. 237. Delete Node in a Linked List (easy, delete only given current
node)

2. 2. Add Two Numbers (medium)

3. 92. Reverse Linked List II (medium, reverse in one pass)

4. 83. Remove Duplicates from Sorted List (easy)

5. 82. Remove Duplicates from Sorted List II (medium)

6. Sort List

7. Reorder List

Fast-slow pointers:

1. 876. Middle of the Linked List (easy)

2. Two Pointers in Linked List

3. Merge K Sorted Lists

Recursive and linked list:

1. 369. Plus One Linked List (medium)

9.4 Stack and Queue

Stack data structures fit well for tasks that require us to check previous states from the closest level to the furthest level. Here are some exemplary applications: (1) reversing an array, (2) implementing DFS iteratively as we will see in Chapter ??, (3) keeping track of the return address during function calls, (4) recording previous states for backtracking algorithms.
Queue data structures can be used to: (1) implement BFS as shown in Chapter ??, (2) implement a queue buffer.
9.4. STACK AND QUEUE 135

In the remainder of this section, we will discuss implementations with the built-in data types and with built-in modules. After this, we will learn about more advanced queues and stacks: the priority queue and the monotone queue, which can be used to solve medium to hard problems on LeetCode.

9.4.1 Basic Implementation

For the queue and stack data structures, the two essential operations are adding and removing an item. In a stack, they are usually called PUSH and POP: PUSH adds one item, and POP removes one item and returns its value. These two operations should take only O(1) time. Sometimes we need another operation called PEEK, which returns the next accessible element of the queue or stack without removing it. In a queue, the corresponding operations are named ENQUEUE and DEQUEUE.
The simplest implementation uses a Python list with the functions insert() (insert an item at an appointed position), pop() (remove the element at a given index, update the list, and return the value; the default is to remove the last item), and append(). However, the list cannot meet the time-complexity requirement, as these operations can potentially take O(n). We show it anyway because the code is simple, which saves you from using a specific module or implementing something more complex.

Stack The implementation for a stack simply adds and deletes elements from the end.

# stack
s = []
s.append(3)
s.append(4)
s.append(5)
s.pop()

Queue For a queue, we can append at the end and always pop from the first index; or we can insert at the first index and pop the last element.

# queue
# 1: use append and pop
q = []
q.append(3)
q.append(4)
q.append(5)
q.pop(0)

Running the above code will give us the following output:

print('stack:', s, 'queue:', q)
stack: [3, 4] queue: [4, 5]

The other way is to write classes and implement them using the concept of a node, which shares the same definition as the linked list node. Such an implementation satisfies the O(1) time restriction. For both the stack and the queue, we utilize the singly linked list data structure.

Stack and Singly Linked List with a top pointer Because in a stack we only need to add or delete the item at the top, we use one pointer pointing at the top item, and the linked list's next pointers run from the top toward the bottom.

# stack with linked list
'''a<-b<-c<-top'''
class Stack:
    def __init__(self):
        self.top = None
        self.size = 0

    # push
    def push(self, val):
        node = Node(val)
        if self.top:  # connect top and node
            node.next = self.top
        # reset the top pointer
        self.top = node
        self.size += 1

    def pop(self):
        if self.top:
            val = self.top.val
            if self.top.next:
                self.top = self.top.next  # reset top
            else:
                self.top = None
            self.size -= 1
            return val
        else:  # no element to pop
            return None

Queue and Singly Linked List with Two Pointers For a queue, we need to access items from each side, so we use two pointers pointing at the head and the tail of the singly linked list, with the linking direction running from head to tail.

# queue with linked list
'''head->a->b->tail'''
class Queue:
    def __init__(self):
        self.head = None
        self.tail = None
        self.size = 0

    # push
    def enqueue(self, val):
        node = Node(val)
        if self.head and self.tail:  # connect tail and node
            self.tail.next = node
            self.tail = node
        else:
            self.head = self.tail = node

        self.size += 1

    def dequeue(self):
        if self.head:
            val = self.head.val
            if self.head.next:
                self.head = self.head.next  # reset head
            else:
                self.head = None
                self.tail = None
            self.size -= 1
            return val
        else:  # no element to pop
            return None

Also, Python provides two built-in modules for this purpose: deque and queue. We will detail them in the next section.

9.4.2 Deque: Double-Ended Queue

The deque object is a container data type from the Python collections module. It is a generalization of stacks and queues, and the name is short for "double-ended queue". A deque is optimized for adding/popping items from both ends of the container in O(1), so it is preferred over list in such cases. To new a deque object, we use deque([iterable[, maxlen]]). This returns a new deque object initialized left-to-right with data from iterable. If maxlen is not specified or is set to None, the deque may grow to arbitrary length. Before using it, let us first learn the methods of the deque class in Table 9.6.
In addition to the above, deques support iteration, pickling, len(d), re-
versed(d), copy.copy(d), copy.deepcopy(d), membership testing with the in
operator, and subscript references such as d[-1].
Now, we use deque to implement a basic stack and queue; the main methods we need are append(), appendleft(), pop(), and popleft().

'''Use deque from collections'''
from collections import deque
q = deque([3, 4])
q.append(5)
q.popleft()

Table 9.6: Common Methods of Deque

Method Description
append(x) Add x to the right side of the deque.
appendleft(x) Add x to the left side of the deque.
pop() Remove and return an element from the right side of the deque. If no elements are present, raises an IndexError.
popleft() Remove and return an element from the left side of the deque. If no elements are present, raises an IndexError.
maxlen A read-only attribute: the maximum size of the deque, or None if unbounded.
count(x) Count the number of deque elements equal to x.
extend(iterable) Extend the right side of the deque by appending elements from the iterable argument.
extendleft(iterable) Extend the left side of the deque by appending elements from iterable. Note, the series of left appends results in reversing the order of elements in the iterable argument.
remove(value) Remove the first occurrence of value. If not found, raises a ValueError.
reverse() Reverse the elements of the deque in place and then return None.
rotate(n=1) Rotate the deque n steps to the right. If n is negative, rotate to the left.

s = deque([3, 4])
s.append(5)
s.pop()

Printing out q and s:

print('stack:', s, 'queue:', q)
stack: deque([3, 4]) queue: deque([4, 5])

Deque and Ring Buffer A ring buffer, or circular queue, is a linear data structure in which the operations are performed on a FIFO (First In First Out) basis and the last position is connected back to the first position to make a circle. This normally requires us to predefine the maximum size of the queue. To implement a ring buffer, we can use a deque as a queue as demonstrated above, setting maxlen when we initialize the object. Once a bounded-length deque is full, adding new items discards a corresponding number of items from the opposite end, as the sketch below shows.
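A minimal sketch of this discarding behavior (the variable name buf is our own):

from collections import deque

buf = deque(maxlen=3)
for i in range(5):
    buf.append(i)
print(buf)  # deque([2, 3, 4], maxlen=3) -- the oldest items were discarded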

9.4.3 Python built-in Module: Queue

The queue module provides thread-safe implementations of stack-like and queue-like data structures. It encompasses three types of queue, shown in Table 9.7. In Python 3 the module is the lowercase queue, while Python 2.x used Queue; in this book we use Python 3.

Table 9.7: Data types in the queue module. maxsize is an integer that sets the upper bound on the number of items that can be placed in the queue. Insertion will block once this size has been reached, until queue items are consumed. If maxsize is less than or equal to zero, the queue size is infinite.
Class Data Structure
class queue.Queue(maxsize=0) Constructor for a FIFO queue.
class queue.LifoQueue(maxsize=0) Constructor for a LIFO queue.
class queue.PriorityQueue(maxsize=0) Constructor for a priority queue.

Queue objects (Queue, LifoQueue, or PriorityQueue) provide the public


methods described below in Table 9.8.

Table 9.8: Methods for Queue’s three classes, here we focus on single-thread
background.
Class Data Structure
Queue.put(item[, block[, timeout]]) Put item into the queue.
Queue.get([block[, timeout]]) Remove and return an item from the
queue.
Queue.qsize() Return the approximate size of the
queue.
Queue.empty() Return True if the queue is empty,
False otherwise.
Queue.full() Return True if the queue is full, False
otherwise.

Now, using Queue() and LifoQueue() to implement a queue and a stack respectively is straightforward:

# python 3
import queue
# implementing queue
q = queue.Queue()
for i in range(3, 6):
    q.put(i)

import queue
# implementing stack
s = queue.LifoQueue()

for i in range(3, 6):
    s.put(i)

Now, printing the objects directly is not informative:

print('stack:', s, 'queue:', q)
stack: <queue.LifoQueue object at 0x000001A4062824A8> queue: <queue.Queue object at 0x000001A4062822E8>

Instead we print with:

print('stack:')
while not s.empty():
    print(s.get(), end=' ')
print('\nqueue:')
while not q.empty():
    print(q.get(), end=' ')

stack:
5 4 3
queue:
3 4 5

9.4.4 Bonus
Circular Linked List and Circular Queue The circular queue is a linear data structure in which operations are performed on a FIFO basis and the last position is connected back to the first position to make a circle; it is also called a "ring buffer". A circular queue can be implemented with either a list or a circular linked list. If we use a list, we initialize our queue with a fixed size and None as values. To find the position for enqueue(), we use rear = (rear + 1) % size. Similarly, for dequeue(), we use front = (front + 1) % size to find the next front position. A sketch follows.
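A minimal list-backed sketch of these modular-index updates (the class name and fields are our own):

class CircularQueue:
    def __init__(self, size):
        self.queue = [None] * size
        self.size = size
        self.front = 0  # index of the oldest item
        self.count = 0  # number of stored items

    def enqueue(self, val):
        if self.count == self.size:
            return False  # full
        rear = (self.front + self.count) % self.size
        self.queue[rear] = val
        self.count += 1
        return True

    def dequeue(self):
        if self.count == 0:
            return None  # empty
        val = self.queue[self.front]
        self.queue[self.front] = None
        self.front = (self.front + 1) % self.size
        self.count -= 1
        return val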

9.4.5 Exercises
Queue and Stack

1. 225. Implement Stack using Queues (easy)

2. 232. Implement Queue using Stacks (easy)

3. 933. Number of Recent Calls (easy)

Queues fit well for buffering problems.

1. 933. Number of Recent Calls (easy)

2. 622. Design Circular Queue (medium)

Write a class RecentCounter to count recent requests.

It has only one method: ping(int t), where t represents some time in milliseconds.

Return the number of pings that have been made from 3000 milliseconds ago until now.

Any ping with time in [t - 3000, t] will count, including the current ping.

It is guaranteed that every call to ping uses a strictly larger value of t than before.

Example 1:

Input: inputs = ["RecentCounter", "ping", "ping", "ping", "ping"], inputs = [[], [1], [100], [3001], [3002]]
Output: [null, 1, 2, 3, 3]

Analysis: This is a typical buffer problem. When data falls outside the window, we squeeze out the earliest data. Thus, a queue can be used to save each t, and on every ping we squeeze out any time not in the range [t-3000, t]:

import collections

class RecentCounter:

    def __init__(self):
        self.ans = collections.deque()

    def ping(self, t):
        """
        :type t: int
        :rtype: int
        """
        self.ans.append(t)
        while self.ans[0] < t - 3000:
            self.ans.popleft()
        return len(self.ans)

Monotone Queue

1. 84. Largest Rectangle in Histogram

2. 85. Maximal Rectangle

3. 122. Best Time to Buy and Sell Stock II

4. 654. Maximum Binary Tree

5. 42. Trapping Rain Water

6. 739. Daily Temperatures

7. 321. Create Maximum Number

Obvious applications:

1. 496. Next Greater Element I

2. 503. Next Greater Element II

3. 121. Best Time to Buy and Sell Stock

9.5 Hash Table

9.5.1 Implementation
In this section, we practice the learned concepts and methods by implementing a hash set and a hash map.

Hash Set Design a HashSet without using any built-in hash table libraries. To be specific, your design should include these functions (705. Design HashSet):

add(value): Insert a value into the HashSet.
contains(value): Return whether the value exists in the HashSet or not.
remove(value): Remove a value in the HashSet. If the value does not exist in the HashSet, do nothing.

For example:

MyHashSet hashSet = new MyHashSet();
hashSet.add(1);
hashSet.add(2);
hashSet.contains(1);    // returns true
hashSet.contains(3);    // returns false (not found)
hashSet.add(2);
hashSet.contains(2);    // returns true
hashSet.remove(2);
hashSet.contains(2);    // returns false (already removed)

Note: (1) All values will be in the range [0, 1000000]. (2) The number of operations will be in the range [1, 10000].
class MyHashSet:

    def _h(self, k, i):
        # linear probing: i is the probe number
        return (k + i) % 10001

    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.slots = [None] * 10001
        self.size = 10001

    def add(self, key: 'int') -> 'None':
        i = 0
        while i < self.size:
            k = self._h(key, i)
            if self.slots[k] == key:
                return
            elif not self.slots[k] or self.slots[k] == -1:
                self.slots[k] = key
                return
            i += 1
        # double size
        self.slots = self.slots + [None] * self.size
        self.size *= 2
        return self.add(key)

    def remove(self, key: 'int') -> 'None':
        i = 0
        while i < self.size:
            k = self._h(key, i)
            if self.slots[k] == key:
                self.slots[k] = -1  # mark the slot as deleted
                return
            elif self.slots[k] is None:
                return
            i += 1
        return

    def contains(self, key: 'int') -> 'bool':
        """
        Returns true if this set contains the specified element
        """
        i = 0
        while i < self.size:
            k = self._h(key, i)
            if self.slots[k] == key:
                return True
            elif self.slots[k] is None:
                return False
            i += 1
        return False

Hash Map Design a HashMap without using any built-in hash table libraries. To be specific, your design should include these functions (706. Design HashMap (easy)):

• put(key, value): Insert a (key, value) pair into the HashMap. If the value already exists in the HashMap, update the value.

• get(key): Returns the value to which the specified key is mapped, or -1 if this map contains no mapping for the key.

• remove(key): Remove the mapping for the value key if this map contains the mapping for the key.

Example:

hashMap = MyHashMap()
hashMap.put(1, 1);
hashMap.put(2, 2);
hashMap.get(1);      // returns 1
hashMap.get(3);      // returns -1 (not found)
hashMap.put(2, 1);   // update the existing value
hashMap.get(2);      // returns 1
hashMap.remove(2);   // remove the mapping for 2
hashMap.get(2);      // returns -1 (not found)

class MyHashMap:
    def _h(self, k, i):
        return (k + i) % 10001  # [0, 10001]

    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.size = 10002
        self.slots = [None] * self.size

    def put(self, key: 'int', value: 'int') -> 'None':
        """
        value will always be non-negative.
        """
        i = 0
        while i < self.size:
            k = self._h(key, i)
            if not self.slots[k] or self.slots[k][0] in [key, -1]:
                self.slots[k] = (key, value)
                return
            i += 1
        # double the size and try again
        self.slots = self.slots + [None] * self.size
        self.size *= 2
        return self.put(key, value)

    def get(self, key: 'int') -> 'int':
        """
        Returns the value to which the specified key is mapped,
        or -1 if this map contains no mapping for the key.
        """
        i = 0
        while i < self.size:
            k = self._h(key, i)
            if not self.slots[k]:
                return -1
            elif self.slots[k][0] == key:
                return self.slots[k][1]
            else:  # if it is deleted, keep probing
                i += 1
        return -1

    def remove(self, key: 'int') -> 'None':
        """
        Removes the mapping of the specified key if this map
        contains a mapping for the key.
        """
        i = 0
        while i < self.size:
            k = self._h(key, i)
            if not self.slots[k]:
                return
            elif self.slots[k][0] == key:
                self.slots[k] = (-1, None)
                return
            else:  # if it is deleted, keep probing
                i += 1
        return
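As a quick sanity check, we can run the problem's own example against
this class (the expected outputs are noted in the comments):

hashMap = MyHashMap()
hashMap.put(1, 1)
hashMap.put(2, 2)
print(hashMap.get(1))   # 1
print(hashMap.get(3))   # -1 (not found)
hashMap.put(2, 1)       # update the existing value
print(hashMap.get(2))   # 1
hashMap.remove(2)       # remove the mapping for 2
print(hashMap.get(2))   # -1 (not found)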

9.5.2 Python Built-in Data Structures


SET and Dictionary
In Python, the standard built-in data structures dictionary and set are
both implemented with a hash table. The set classes are implemented
using dictionaries. Accordingly, the requirements for set elements are the
same as those for dictionary keys; namely, that the object defines both
__eq__() and __hash__() methods. The built-in function hash(object)
implements the hashing function and returns an integer hash value, pro-
vided the object defines __eq__() and __hash__(). Because hash() can
only take immutable objects as an input key, an object must be immutable
and comparable (have an __eq__() or __cmp__() method) in order to be
hashable.
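For instance, a quick check of which objects hash() accepts (a small
illustrative snippet; exact hash values can vary between runs):

print(hash(42))        # small integers hash to themselves in CPython
print(hash((1, 2)))    # tuples of hashables are hashable
try:
    hash([1, 2])       # lists are mutable, hence unhashable
except TypeError as e:
    print(e)           # unhashable type: 'list'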

Python 2.X VS Python 3.X In Python 2.X, we can use slicing to access
the result of keys() or items() of the dictionary. However, in Python
3.X the same syntax gives us TypeError: 'dict_keys' object does not
support indexing. Instead, we need to use the function list() to convert
it to a list and then slice it. For example:
# Python 2.x
d.keys()[0]

# Python 3.x
list(d.keys())[0]

set Data Type Commonly used methods of the built-in set type:

• add(x): adds an element to the set
• clear(): removes all elements from the set
• copy(): returns a shallow copy of the set
• difference(): returns the difference of two sets
• difference_update(): updates the calling set with the difference of sets
• discard(x): removes an element from the set if present
• intersection(): returns the intersection of two or more sets
• intersection_update(): updates the calling set with the intersection of sets
• isdisjoint(): checks whether two sets are disjoint
• issubset(): checks if a set is a subset of another set
• issuperset(): checks if a set is a superset of another set
• pop(): removes and returns an arbitrary element
• remove(x): removes an element from the set; raises KeyError if absent
• symmetric_difference(): returns the symmetric difference of two sets
• symmetric_difference_update(): updates the set with the symmetric difference
• union(): returns the union of sets
• update(): adds elements to the set
If we want to put a string in a set, it should be like this:
>>> a = set('aardvark')   # a set of the characters of the string
>>> a
{'d', 'v', 'a', 'r', 'k'}
>>> b = {'aardvark'}      # or set(['aardvark']): convert a list of strings to a set
>>> b
{'aardvark'}
>>> c = {('a', 'tuple')}  # or set([('a', 'tuple')]): put a tuple in the set

Note the difference between {word} and set(word) with a single string
argument: the former creates a set containing the whole string, while the
latter creates a set of its characters.

dict Data Type Commonly used methods of the built-in dict type:

• clear(): removes all the elements from the dictionary
• copy(): returns a copy of the dictionary
• fromkeys(): returns a dictionary with the specified keys and values
• get(): returns the value of the specified key
• items(): returns a view of (key, value) tuples, one for each key-value pair
• keys(): returns a view of the dictionary's keys
• pop(): removes the element with the specified key and returns its value
• popitem(): removes the last inserted key-value pair
• setdefault(): returns the value of the specified key; if the key does not
  exist, inserts the key with the specified value
• update(): updates the dictionary with the specified key-value pairs
• values(): returns a view of all the values in the dictionary
See use cases at https://www.programiz.com/python-programming/dictionary.
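A few of these methods in action (a small illustrative snippet):

d = {'a': 1}
print(d.get('b', -1))  # -1: returns the default instead of raising KeyError
d.setdefault('b', 2)   # inserts 'b' -> 2 since the key is missing
print(d.pop('a'))      # 1, and removes key 'a'
print(d)               # {'b': 2}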

Collection Module
OrderedDict Standard dictionaries are unordered, which means that any
time you loop through a dictionary, you will visit every key, but you are
not guaranteed to get them in any particular order. The OrderedDict
from the collections module is a special type of dictionary that keeps track
of the order in which its keys were inserted. Iterating the keys of an
OrderedDict has predictable behavior. This can simplify testing and
debugging by making all the code deterministic.
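A brief illustration:

from collections import OrderedDict

od = OrderedDict()
od['banana'] = 3
od['apple'] = 4
od['pear'] = 1
print(list(od.keys()))  # ['banana', 'apple', 'pear']: insertion order kept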

defaultdict Dictionaries are useful for bookkeeping and tracking statis-
tics. One problem is that when we try to add to an element, we have no
idea whether the key is present or not, which requires us to check this
condition every time:
d = {}
key = "counter"
if key not in d:
    d[key] = 0
d[key] += 1

The defaultdict class from the collections module simplifies this process by
pre-assigning a default value when a key is not present. Each value type
has its own default value; for example, for int it is 0. A defaultdict works
exactly like a normal dict, but it is initialized with a function (the "default
factory") that takes no arguments and provides the default value for a
nonexistent key. Therefore, a defaultdict will never raise a KeyError; any
key that does not exist gets the value returned by the default factory. For
example, the following code uses a lambda function to provide 'Vanilla'
as the default value for any unassigned key, and the second code snippet
functions as a counter.
from collections import defaultdict

ice_cream = defaultdict(lambda: 'Vanilla')
ice_cream['Sarah'] = 'Chunky Monkey'
ice_cream['Abdul'] = 'Butter Pecan'
print(ice_cream['Sarah'])  # Chunky Monkey
print(ice_cream['Joe'])    # Vanilla

from collections import defaultdict

d = defaultdict(int)  # the default value for int is 0
d['counter'] += 1

Time Complexity The operations search, insert, and delete on a hash-
table-based dictionary or set all take O(1) time on average.

Counter
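Counter is a dict subclass from the collections module for counting hash-
able objects: keys are elements and values are their counts. A brief
example:

from collections import Counter

count = Counter([1, 1, 1, 2, 2, 3])
print(count)                 # Counter({1: 3, 2: 2, 3: 1})
print(count[1])              # 3; a missing key returns 0 instead of KeyError
print(count.most_common(2))  # [(1, 3), (2, 2)]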

9.5.3 Exercises
1. 349. Intersection of Two Arrays (easy)

2. 350. Intersection of Two Arrays II (easy)

3. 929. Unique Email Addresses

Every email consists of a local name and a domain name, separated by
the @ sign.

For example, in alice@leetcode.com, alice is the local name, and
leetcode.com is the domain name.

Besides lowercase letters, these emails may contain '.'s or '+'s.

If you add periods ('.') between some characters in the local name
part of an email address, mail sent there will be forwarded to the same
address without dots in the local name. For example,
"alice.z@leetcode.com" and "alicez@leetcode.com" forward to the same
email address. (Note that this rule does not apply for domain names.)

If you add a plus ('+') in the local name, everything after the first
plus sign will be ignored. This allows certain emails to be filtered;
for example m.y+name@email.com will be forwarded to my@email.com.
(Again, this rule does not apply for domain names.)

It is possible to use both of these rules at the same time.

Given a list of emails, we send one email to each address in the list.
How many different addresses actually receive mails?

Example 1:

Input: ["test.email+alex@leetcode.com", "test.e.mail+bob.cathy@leetcode.com", "testemail+david@lee.tcode.com"]
Output: 2
Explanation: "testemail@leetcode.com" and "testemail@lee.tcode.com"
actually receive mails.

Note:
1 <= emails[i].length <= 100
1 <= emails.length <= 100
Each emails[i] contains exactly one '@' character.

Answer: Use a hash set of tuples to save each normalized receiving
address as a (local name, domain name) pair:
1 class Solution :
2 d e f numUniqueEmails ( s e l f , e m a i l s ) :
3 """
4 : type e m a i l s : L i s t [ s t r ]
9.6. GRAPH REPRESENTATIONS 149

5 : rtype : int
6 """
7 i f not e m a i l s :
8 return 0
9 num = 0
10 handledEmails = s e t ( )
11 f o r email in emails :
12 local_name , domain_name = e m a i l . s p l i t ( '@ ' )
13 local_name = local_name . s p l i t ( '+ ' ) [ 0 ]
14 local_name = local_name . r e p l a c e ( ' . ' , ' ' )
15 h a n d l e d E m a i l s . add ( ( local_name , domain_name ) )
16 return l e n ( handledEmails )
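Running it on the example from the problem statement:

emails = ["test.email+alex@leetcode.com",
          "test.e.mail+bob.cathy@leetcode.com",
          "testemail+david@lee.tcode.com"]
print(Solution().numUniqueEmails(emails))  # 2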

9.6 Graph Representations


The graph data structure can be thought of as a superset of the array,
linked list, and tree data structures. In this section, we only introduce the
representation and implementation of graphs, and defer the searching
strategies to the principle part. Searching strategies on graphs mark a
starting point in algorithmic problem solving; knowing and analyzing these
strategies in detail deserves an independent chapter as a problem-solving
principle.

9.6.1 Introduction
Graph representations need to capture the full information of the graph
itself, G = (V, E), including its vertices, its edges, and its weights, and
to distinguish whether it is directed or undirected, weighted or unweighted.
There are generally four ways: (1) Adjacency Matrix, (2) Adjacency List,
(3) Edge List, and (4) optionally, Tree Structure, if the graph is a free
tree. Each is preferred in different situations. An example is shown in
Fig. 9.3.

Figure 9.3: Four ways of graph representation.

Double Edges in Undirected Graphs In a directed graph, the number
of edges is denoted as |E|. However, for an undirected graph, one edge
(u, v) means that vertices u and v are connected both ways: we can reach
v from u, and u from v. To represent an undirected graph, we have to
double the number of edges shown in the structure; it becomes 2|E| in all
of our representations.

Adjacency Matrix

An adjacency matrix of a graph is a 2-D matrix of size |V | × |V |: each
dimension, row and column, is vertex-indexed. Assume our matrix is am;
if there is an edge between vertices 3 and 4, and the graph is unweighted,
we mark it by setting am[3][4] = 1. We do the same for all edges, leaving
all other spots in the matrix zero-valued. For an undirected graph, it will
be a symmetric matrix along the main diagonal, as shown in A of Fig. 9.3;
the matrix is its own transpose: am = am^T. We can choose to store only
the entries on and above the diagonal of the matrix, thereby cutting the
memory need in half. For an unweighted graph, the adjacency matrix is
typically zero-and-one valued. For a weighted graph, the adjacency matrix
becomes a weight matrix, with w(i, j) denoting the weight of edge (i, j);
the weight can be negative, positive, or even zero-valued in practice, so we
might want to figure out how to distinguish the non-edge relation from the
edge relation when the situation arises.
The Python code that implements the adjacency matrix for the graph
in the example is:

am = [[0] * 7 for _ in range(7)]

# set 8 edges
am[0][1] = am[1][0] = 1
am[0][2] = am[2][0] = 1
am[1][2] = am[2][1] = 1
am[1][3] = am[3][1] = 1
am[2][4] = am[4][2] = 1
am[3][4] = am[4][3] = 1
am[4][5] = am[5][4] = 1
am[5][6] = am[6][5] = 1
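For the weighted case discussed above, one way to distinguish non-edges
from zero-weight edges is to initialize the matrix with None; this is a
sketch, and the weights below are made up purely for illustration:

wm = [[None] * 7 for _ in range(7)]  # None marks "no edge"
wm[0][1] = wm[1][0] = 2.5            # hypothetical weight on edge (0, 1)
wm[0][2] = wm[2][0] = 0.0            # a zero-weight edge is still an edge
print(wm[0][1] is not None)          # True: the edge exists
print(wm[3][5] is not None)          # False: no edge between 3 and 5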

Applications The adjacency matrix usually fits well to a dense graph,
where the number of edges is close to |V |^2, leaving only a small ratio of
the matrix blank and unused. Checking if an edge exists between two
vertices takes only O(1). However, an adjacency matrix requires exactly
O(V ) to enumerate the neighbors of a vertex v–an operation commonly
used in many graph algorithms–even if vertex v only has a few neighbors.
Moreover, when the graph is sparse, an adjacency matrix will be inefficient
in both space and iteration cost; a better option is the adjacency list.
9.6. GRAPH REPRESENTATIONS 151

Adjacency List
An adjacency list is a more compact and space-efficient form of graph rep-
resentation compared with the adjacency matrix above. In an adjacency
list, we have a list of |V | vertices which is vertex-indexed, and for each
vertex v we store another list of neighboring nodes, which can be repre-
sented with an array or a linked list. For example, with the adjacency
list [[1, 2, 3], [3, 1], [4, 6, 1]], node 0 connects to 1, 2, 3; node 1 connects
to 3, 1; and node 2 connects to 4, 6, 1.
In Python, we can use a normal 2-D array to represent the adjacency
list; the same graph in the example is represented with the following code:
al = [[] for _ in range(7)]

# set 8 edges (each edge appears in both endpoints' lists)
al[0] = [1, 2]
al[1] = [0, 2, 3]
al[2] = [0, 1, 4]
al[3] = [1, 4]
al[4] = [2, 3, 5]
al[5] = [4, 6]
al[6] = [5]

Applications The upper bound on the space complexity of an adjacency
list is O(|V |^2). However, with an adjacency list, checking if there is an
edge between nodes u and v takes O(|V |) time with a linear scan of the
list al[u]. If the graph is static, meaning we do not add more vertices but
can modify the existing edges and their weights, we can use a set or a
dictionary Python data type for the second dimension of the adjacency
list. This change enables O(1) search of an edge, just as in the adjacency
matrix.
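A sketch of this set-based second dimension, built from the list-based al
above:

al_set = [set(neighbors) for neighbors in al]
print(4 in al_set[2])  # True, and the lookup is O(1) on average
print(6 in al_set[2])  # False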

Edge List
The edge list is a one-dimensional list of edges, where the index of the list
does not relate to a vertex, and each edge is usually in the form (starting
vertex, ending vertex, weight). We can use either a list or a tuple to
represent an edge. The edge list representation of the example is given:
el = []
el.extend([[0, 1], [1, 0]])
el.extend([[0, 2], [2, 0]])
el.extend([[1, 2], [2, 1]])
el.extend([[1, 3], [3, 1]])
el.extend([[3, 4], [4, 3]])
el.extend([[2, 4], [4, 2]])
el.extend([[4, 5], [5, 4]])
el.extend([[5, 6], [6, 5]])

Applications The edge list is not as widely used as the AM and AL, and
is usually only needed as a subroutine of an algorithm implementation–
such as in Kruskal's algorithm to find the Minimum Spanning Tree (MST)–
where we might need to order the edges by their weights.
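As a sketch, sorting a weighted edge list by weight, as Kruskal's algorithm
would (the weights here are made up for illustration):

wel = [(0, 1, 4), (0, 2, 1), (1, 2, 3)]  # (u, v, weight) triples
wel.sort(key=lambda e: e[2])             # ascending by weight
print(wel)  # [(0, 2, 1), (1, 2, 3), (0, 1, 4)]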

Tree Structure

If a connected graph has no cycle and |E| = |V | − 1, it is essentially a
tree. We can choose to represent it with any of the three representations
above. Optionally, we can use the tree structure, formed as a rooted tree
of nodes, each of which has a value and pointers to its children. We will
see later how this type of tree is implemented in Python.

9.6.2 Use Dictionary


In the last section, we always used vertex-indexed structures. This works,
but it might not be human-friendly to work with; in practice a vertex often
comes with a "name"–for example, in a system of cities, a vertex would be
a city's name. Another inconvenience arises when we have no idea of the
total number of vertices: using the index-numbering system requires us to
first figure out all vertices and number each, which is an overhead.
To avoid these two inconveniences, we can replace the adjacency list,
which is a list of lists, with an embedded dictionary structure: a dictionary
of dictionaries or of sets.

Unweighted Graph For example, we demonstrate how to give a "name"
to the exemplary graph; we replace 0 with 'a', 1 with 'b', and the others
with 'c', 'd', 'e', 'f', 'g'. We declare defaultdict(set): the outer list is
replaced by the dictionary, and the inner neighboring node list is replaced
with a set for O(1) access to any edge.
In the demo code, we simply construct this representation from the
edge list.
from collections import defaultdict

d = defaultdict(set)
for v1, v2 in el:
    d[chr(v1 + ord('a'))].add(chr(v2 + ord('a')))
print(d)

And the printed graph is as follows:

defaultdict(<class 'set'>, {'a': {'b', 'c'}, 'b': {'d', 'c', 'a'},
'c': {'b', 'e', 'a'}, 'd': {'b', 'e'}, 'e': {'d', 'c', 'f'},
'f': {'e', 'g'}, 'g': {'f'}})

Weighted Graph If we need weights for each edge, we can use a two-
dimensional dictionary. We use 10 as the weight for all edges just to
demonstrate.
dw = defaultdict(dict)
for v1, v2 in el:
    vn1 = chr(v1 + ord('a'))
    vn2 = chr(v2 + ord('a'))
    dw[vn1][vn2] = 10
print(dw)

We can access an edge and its weight through dw[v1][v2]. The output of
this structure is given:
defaultdict(<class 'dict'>, {'a': {'b': 10, 'c': 10}, 'b': {'a': 10,
'c': 10, 'd': 10}, 'c': {'a': 10, 'b': 10, 'e': 10}, 'd': {'b': 10,
'e': 10}, 'e': {'d': 10, 'c': 10, 'f': 10}, 'f': {'e': 10, 'g': 10},
'g': {'f': 10}})

9.7 Tree Data Structures


In this section, we focus on implementing a recursive tree structure, since
a free tree works the same way as the graph structure. Also, we have
already covered the implicit array structure of a tree in the topic of heaps.
Here we first implement the recursive tree data structure and the construc-
tion of a tree. In the next section, we discuss the searching strategies on
the tree–tree traversal–including both its recursive and iterative variants.
Because a tree is a hierarchical structure–here represented recursively–
of a collection of nodes, we define two classes, one each for the N-ary tree
node and the binary tree node. A node is composed of a variable val
saving the data and children pointers to connect the nodes in the tree.

Binary Tree Node In a binary tree, there are at most two children
pointers, which we define as left and right. The binary tree node is
defined as:
class BinaryNode:
    def __init__(self, val):
        self.left = None
        self.right = None
        self.val = val

N-ary Tree Node For an N-ary node, we initialize the length of the
node's children list with an additional argument n.
class NaryNode:
    def __init__(self, n, val):
        self.children = [None] * n
        self.val = val

In this implementation, the children are ordered by their index in the list.
In real practice, there is a lot of flexibility: it is not necessary to pre-
allocate the length of the children list; we can start with an empty list []
and just append more nodes to it on the fly. Also, we can replace the list
with a dictionary data type, which might be a better and more space-
efficient choice.

Construct A Tree Now that we have defined the tree node, the process
of constructing a tree in the figure will be a series of operations:
1
/ \
2 3
/ \ \
4 5 6

root = BinaryNode(1)
left = BinaryNode(2)
right = BinaryNode(3)
root.left = left
root.right = right
left.left = BinaryNode(4)
left.right = BinaryNode(5)
right.right = BinaryNode(6)

We see that the above is not convenient in practice. A more practical
way is to represent the tree with a heap-like array, which treats the tree
as a complete tree. Because the above binary tree is not complete by
definition, we pad the left child of node 3 with None in the list, giving
the array [1, 2, 3, 4, 5, None, 6]. The root node has index 0, and
given a node with index i, its children in an n-ary tree are indexed with
n * i + j, j ∈ [1, ..., n]. Thus, a better way to construct the above tree
is to start from the array and traverse the list recursively to build up the
tree.
We define a recursive function with two arguments: a–the input array
of nodes–and idx–indicating the position of the current node in the array.
At each recursive call, we construct a BinaryNode and set its left and
right child to the nodes returned by two recursive calls of the same func-
tion. Equivalently, we can say that the two subprocesses–constructTree(a,
2*idx + 1) and constructTree(a, 2*idx + 2)–build up two subtrees
rooted at nodes 2*idx+1 and 2*idx+2, respectively. When there are no
items left in the array, it naturally indicates the end of the recursion and
we return None to indicate an empty node. We give the following Python
code:
def constructTree(a, idx):
    '''
    a: input array of nodes
    idx: index indicating the location of the current node
    '''
    if idx >= len(a):
        return None
    if a[idx] is not None:  # "is not None" so that a node valued 0 is kept
        node = BinaryNode(a[idx])
        node.left = constructTree(a, 2 * idx + 1)
        node.right = constructTree(a, 2 * idx + 2)
        return node
    return None

Now, we call this function and pass it our input array:
nums = [1, 2, 3, 4, 5, None, 6]
root = constructTree(nums, 0)

Please write a recursive function to construct the N-ary
tree given in Fig. ???

In the next section, we discuss tree traversal methods, and we will use
those methods to print out the tree we just built.

9.7.1 LeetCode Problems


To show the nodes at each level, we use a LevelOrder function to print
out the tree:
def LevelOrder(root):
    q = [root]
    while q:
        new_q = []
        for n in q:
            if n is not None:
                print(n.val, end=',')
                if n.left:
                    new_q.append(n.left)
                if n.right:
                    new_q.append(n.right)
        q = new_q
        print('\n')

LevelOrder(root)
# output
# 1,
#
# 2,3,
#
# 4,5,6,

Lowest Common Ancestor The lowest common ancestor (LCA) of two
nodes p and q is defined as the lowest node in T that has both p and q as
descendants (where we allow a node to be a descendant of itself). There
are two cases in the LCA problem, which will be demonstrated in the
following example.
9.1 Lowest Common Ancestor of a Binary Tree (L236). Given a
binary tree, find the lowest common ancestor (LCA) of two given nodes
in the tree. Given the following binary tree: root = [3,5,1,6,2,0,8,null,null,7,4]
_______3______
/ \
___5__ ___1__
/ \ / \
6 _2 0 8
/ \
7 4

Example 1:
Input: root = [3,5,1,6,2,0,8,null,null,7,4], p = 5, q = 1
Output: 3
Explanation: The LCA of nodes 5 and 1 is 3.

Example 2:
Input: root = [3,5,1,6,2,0,8,null,null,7,4], p = 5, q = 4
Output: 5
Explanation: The LCA of nodes 5 and 4 is 5, since a node can be a
descendant of itself according to the LCA definition.

Solution: Divide and Conquer. There are two cases for the LCA: 1)
the two nodes are each found in a different subtree, as in example 1; 2)
the two nodes are in the same subtree, as in example 2. During the tree
traversal, we compare the current node with p and q; if it equals either of
them, we return the current node. Therefore, in example 1, at node 3 the
left subtree returns node 5 and the right subtree returns node 1, so node
3 is the LCA. In example 2, node 5 returns itself, while the right subtree
of node 3 returns None, which makes node 5 the only valid return and the
final LCA. The time complexity is O(n).
def lowestCommonAncestor(self, root, p, q):
    """
    :type root: TreeNode
    :type p: TreeNode
    :type q: TreeNode
    :rtype: TreeNode
    """
    if not root:
        return None
    if root == p or root == q:
        # found one valid node (case 1: stop at 5 or 1; case 2: stop at 5)
        return root
    left = self.lowestCommonAncestor(root.left, p, q)
    right = self.lowestCommonAncestor(root.right, p, q)
    if left is not None and right is not None:
        # p and q are each in a different subtree
        return root
    # otherwise return whichever side found a node, or None
    return left if left is not None else right

9.8 Heap
A heap is a tree-based data structure that satisfies the heap ordering prop-
erty. The ordering can be one of two types:

• the min-heap property: the value of each node is greater than or equal
(≥) to the value of its parent, with the minimum-value element at the
root.

• the max-heap property: the value of each node is less than or equal to
(≤) the value of its parent, with the maximum-value element at the
root.

Figure 9.4: A max-heap visualized as a binary tree structure on the left,
and implemented with an array on the right.

Binary Heap A heap is not a sorted structure but can be regarded as
partially ordered. The maximum number of children of a node in a heap
depends on the type of heap. However, in the most commonly used heap
type, there are at most two children per node, and it is known as a binary
heap. A binary heap is shown in Fig. 9.4. Throughout this section the
word "heap" will always refer to a min-heap.
A heap is commonly used to implement a priority queue, in which the
item of the highest priority is popped out each time–this can be done in
O(log n). As we go through the book, we will find how often a priority
queue is needed to solve our problems. A heap can also be used in sorting,
such as in the heapsort algorithm.

Heap Representation A binary heap is always a complete binary tree,
in which each level is fully filled before the next level starts to be filled.
Therefore it has a height of log n for a binary heap with n nodes. A
complete binary tree can be uniquely represented by storing its level order
traversal in an array. The array representation is more space efficient due
to the non-existence of children pointers for each node.
In the array representation, index 0 is skipped for convenience of imple-
mentation. Therefore, the root locates at index 1. Consider the k-th item
of the array; its parent and children relations are:

• its left child is located at index 2k,

• its right child is located at index 2k + 1,

• and its parent is located at index k/2 (in Python 3, use integer
  division k // 2).
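A tiny illustration of this index arithmetic (the values below are stand-ins
forming a valid min-heap, not the exact values from the figure):

heap = [None, 5, 6, 7, 12, 8, 10, 15]  # index 0 is skipped
k = 3
print(heap[2 * k], heap[2 * k + 1])    # 10 15: the children of heap[3] = 7
print(heap[k // 2])                    # 5: the parent of heap[3]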

9.8.1 Basic Implementation


The basic methods of a heap class should include: push–push an item into
the heap, pop–pop out the first item, and heapify–convert an arbitrary
array into a heap. In this section, we use the heap shown in Fig. 9.5 as our
example.

Figure 9.5: A Min-heap.

Push: Percolation Up When we push an item in, it is initially appended
to the end of the heap (as the last element of the array). If the new item
is smaller than some of its ancestors, such as 5 in our example, the heap
property is violated along the path from the end of the heap to the root.
To repair the violation, we traverse the path and compare the added item
with its parent, moving the added item up a level (swapping positions
with the parent) when needed. This process is called percolation up.
Concretely, at each step:

• if the parent is smaller than or equal to the added item, no action is
  needed and the traversal is terminated, e.g., adding item 18 leads to
  no action;

• otherwise, swap the item with its parent, and move up to the parent's
  position to keep traversing.

Each step fixes the heap ordering property for a subtree. The time com-
plexity is the same as the height of the complete tree, which is O(log n).
To generalize the process, a _float() function is first implemented, which
enforces the min-heap ordering property on the path from a given index
to the root.
def _float(idx, heap):
    while idx // 2:  # stop once idx reaches the root at index 1
        p = idx // 2
        if heap[idx] < heap[p]:
            # violation of the heap property: swap with the parent
            heap[idx], heap[p] = heap[p], heap[idx]
        else:
            break
        idx = p
    return

With _float(), the function push is implemented as:

def push(heap, k):
    heap.append(k)
    _float(idx=len(heap) - 1, heap=heap)
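For instance, building a heap by repeated pushes with the 1-based helpers
above:

heap = [None]  # index 0 is skipped
for k in [21, 1, 45, 78, 3, 5]:
    push(heap, k)
print(heap)    # [None, 1, 3, 5, 78, 21, 45]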

Pop: Percolation Down When we pop an item out, whether it is the
root item or any other item in the heap, an empty spot appears at that
location. We first move the last item of the heap to this spot, and then
start to repair the heap ordering property by comparing the new item at
this spot with its children:

• if one of its children has a smaller value than this item, swap this item
  with that child, set the location to that child's location, and continue;

• otherwise, the process is done.

Figure 9.6: Left: delete node 5, and move node 12 to the root. Right: 6 is
the smallest among 12, 6, and 7, so swap node 6 with node 12.

Similarly, this process is called percolation down. The complexity is the
same as for insertion: O(log n). We demonstrate this process with two
cases:

• if the item is the root, which is the minimum item 5 in our min-heap
  example, we move 12 to the root first. Then we compare 12 with its
  two children, which are 6 and 7. Swap 12 with 6, and continue. The
  process is shown in Fig. 9.6.

• if the item is any other node instead of the root, say node 7 in our
  example, the process is exactly the same. We move 12 to node 7's
  position. Comparing 12 with its children 10 and 15, 10 and 12 are
  swapped. With this, the heap ordering property is sustained.

We first use a function _sink to implement the percolation down part
of the operation.

def _sink(idx, heap):
    size = len(heap)
    while 2 * idx < size:
        li = 2 * idx    # index of the left child
        ri = li + 1     # index of the right child
        mi = idx        # index of the smallest among the three
        if heap[li] < heap[mi]:
            mi = li
        if ri < size and heap[ri] < heap[mi]:
            mi = ri
        if mi != idx:
            # swap the item at idx with the smaller child at mi
            heap[idx], heap[mi] = heap[mi], heap[idx]
        else:
            break
        idx = mi

The pop is implemented as:



def pop(heap):
    val = heap[1]
    # move the last item into the root position
    heap[1] = heap.pop()
    _sink(idx=1, heap=heap)
    return val
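Continuing the push example above:

print(pop(heap))  # 1
print(pop(heap))  # 3
print(heap)       # [None, 5, 21, 45, 78]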

Heapify Heapify is a procedure that converts a list to a heap. To heapify
a list, we can naively do it through a series of insertion operations on the
items of the list, which gives us an upper-bound time complexity of
O(n log n). However, a more efficient way is to treat the given list as a
tree and to heapify directly on the list.
To satisfy the heap property, we first start from the smallest subtrees,
which are the leaf nodes. Leaf nodes have no children, so they satisfy the
heap property naturally. Therefore we can jump to the last parent node,
which is at position n//2 with 1-based indexing. We apply the percolation
down process used in the pop operation, which compares the node with
its children and applies swapping if the heap property is violated. At the
end, the subtree rooted at this particular node obeys the heap ordering
property. We then repeat the same process for all parent nodes from n/2
down to 1–in reverse order of [1, n/2]–which guarantees that the final
complete binary tree is a binary heap. This follows a dynamic program-
ming fashion: the leaf nodes a[n/2 + 1, n] are naturally a heap, and then
the subarrays are heapified in the order a[n/2, n], a[n/2 − 1, n], ..., a[1, n]
as we work on nodes n/2, ..., 1. This process gives us a tighter upper
bound, which is O(n).
We show how the heapify process is applied on a = [21, 1, 45, 78, 3, 5]
in Figs. 9.7–9.9.
Implementation-wise, the heapify function calls _sink as its subroutine.
The code is shown as:

def heapify(lst):
    heap = [None] + lst  # shift to 1-based indexing
    n = len(lst)
    for i in range(n // 2, 0, -1):  # from the last parent down to the root
        _sink(i, heap)
    return heap
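A quick check on the running example:

heap = heapify([21, 1, 45, 78, 3, 5])
print(heap)  # [None, 1, 3, 5, 78, 21, 45]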

Which way is more efficient for building a heap from a list:
using insertion or heapify? What is the efficiency of each method?
The experimental result can be seen in the code.

Figure 9.7: Heapify: the last parent node, 45.

Figure 9.8: Heapify: on node 1.

Figure 9.9: Heapify: on node 21.

Try to use the percolation up process to heapify the list.

9.8.2 Python Built-in Library: heapq


When we are solving a problem, unless an implementation is specifically
required, we can always use an existing Python module/package. heapq is
one of the most frequently used libraries in problem solving.
heapq (https://docs.python.org/3.0/library/heapq.html) is a built-in
library in Python that implements the heap queue algorithm. It imple-
ments a minimum binary heap and provides three main functions–heappush,
heappop, and heapify–similar to what we implemented in the last section.
The API differs from our last section in one aspect: it uses zero-based
indexing. There are three other functions–nlargest, nsmallest, and
merge–that come in handy in practice. These functions are listed and
described in Table 9.9.

Table 9.9: Methods of heapq

Method                    Description
heappush(h, x)            Push x onto the heap, maintaining the heap
                          invariant.
heappop(h)                Pop and return the smallest item from the heap,
                          maintaining the heap invariant. If the heap is
                          empty, IndexError is raised.
heappushpop(h, x)         Push x on the heap, then pop and return the
                          smallest item from the heap. The combined action
                          runs more efficiently than heappush() followed by
                          a separate call to heappop().
heapify(x)                Transform list x into a heap, in-place, in linear
                          time.
nlargest(k, iterable,     Return the k largest elements from the iterable
  key=fun)                specified, satisfying the key if mentioned.
nsmallest(k, iterable,    Return the k smallest elements from the iterable
  key=fun)                specified, satisfying the key if mentioned.
merge(*iterables,         Merge multiple sorted inputs into a single sorted
  key=None,               output. Returns a generator over the sorted
  reverse=False)          values.
heapreplace(h, x)         Pop and return the smallest item from the heap,
                          and also push the new item.

Now, let's see some examples.

Min-Heap Given the exemplary list h = [21, 1, 45, 78, 3, 5], we call the
function heapify() to convert it to a min-heap.
from heapq import heappush, heappop, heapify
h = [21, 1, 45, 78, 3, 5]
heapify(h)

The heapified result is h = [1, 3, 5, 78, 21, 45]. Let's try heappop and
heappush:
heappop(h)      # returns 1, the smallest item
heappush(h, 15)

The print out for h is:

[3, 21, 5, 78, 45, 15]

nlargest and nsmallest Getting the first k largest or smallest items
with these two functions does not require the list to be heapified first with
heapify, because that step is built into them.
from heapq import nlargest, nsmallest
h = [21, 1, 45, 78, 3, 5]
nl = nlargest(3, h)
ns = nsmallest(3, h)

The print out for nl and ns is:

[78, 45, 21]
[1, 3, 5]

Merge Multiple Sorted Arrays The function merge merges multiple
iterables into a single generator-typed output. It assumes all the inputs
are sorted. For example:
from heapq import merge
a = [1, 3, 5, 21, 45, 78]
b = [2, 4, 8, 16]
ab = merge(a, b)

Printing ab directly only gives us a generator object with its address in
memory:
<generator object merge at 0x7fdc93b389e8>

We can use a list comprehension to iterate through ab and save the sorted
result in a list:
ab_lst = [n for n in ab]

The print out for ab_lst is:

[1, 2, 3, 4, 5, 8, 16, 21, 45, 78]

Max-Heap As we can see, the default heap implemented in heapq en-
forces the heap property of a min-heap. What if we want a max-heap
instead? The library does offer such functions, but they are intentionally
hidden from users. They can be accessed as heapq._[function]_max().
For example, we can heapify a max-heap with the function _heapify_max:
from heapq import _heapify_max
h = [21, 1, 45, 78, 3, 5]
_heapify_max(h)

The print out for h is:

[78, 21, 45, 1, 3, 5]

Also, in practice, a simple hack for a max-heap is to store the data
negated. Whenever we use an item, we convert it back to its original
value. For example:
h = [21, 1, 45, 78, 3, 5]
h = [-n for n in h]
heapify(h)
a = -heappop(h)

a will be 78, the largest item in the heap.



With Tuple/List or Customized Object as Items for Heap Any
object that supports comparison (__lt__()) can be used in a heap with
heapq. When we want our item to include information such as "priority"
and "task", we can put both in a tuple or a list. For example, our item is
a list whose first entry is the priority and whose second entry denotes the
task id.
heap = [[3, 'a'], [10, 'b'], [5, 'c'], [8, 'd']]
heapify(heap)

The print out for heap is:

[[3, 'a'], [8, 'd'], [5, 'c'], [10, 'b']]

However, if we have multiple tasks with the same priority, the relative
order of these tied tasks cannot be sustained. This is because list items
are compared with the whole list as key: the first entries are compared
first, and whenever there is a tie, the next entries are compared. For
example, suppose our example has multiple items with 3 as the first value
in the list.
h = [[3, 'e'], [3, 'd'], [10, 'c'], [5, 'b'], [3, 'a']]
heapify(h)

The printout indicates that the relative ordering of the items [3, 'e'],
[3, 'd'], [3, 'a'] is not kept:
[[3, 'a'], [3, 'd'], [10, 'c'], [5, 'b'], [3, 'e']]

Keeping the relative order of tasks with the same priority is a requirement
for the priority queue abstract data structure. We will see in the next
section how a priority queue can be implemented with heapq.

Modify Items in heapq In the heap, we can change the value of any
item just as we can in a list. However, a violation of the heap ordering
property may occur after the change, so we need a way to fix it. We have
the following two private functions to use, according to the type of change:

• _siftdown(heap, startpos, pos): pos is where the new violation
  is; startpos is up to where we want to restore the heap invariant,
  and is usually set to 0. Because _siftdown() goes backwards and
  compares the node with its parents, we call this function to fix the
  heap when an item's value is decreased.

• _siftup(heap, pos): in _siftup(), items starting from pos are
  compared with their children so that smaller items are sifted up along
  the way. Thus, we call this function to fix the heap when an item's
  value is increased.

We show one example:
import heapq
from heapq import heapify

heap = [[3, 'a'], [10, 'b'], [5, 'c'], [8, 'd']]
heapify(heap)
print(heap)

# increased value: fix with _siftup
heap[0] = [6, 'a']
heapq._siftup(heap, 0)
print(heap)
# decreased value: fix with _siftdown
heap[2] = [3, 'a']
heapq._siftdown(heap, 0, 2)
print(heap)

The printout is:

[[3, 'a'], [8, 'd'], [5, 'c'], [10, 'b']]
[[5, 'c'], [8, 'd'], [6, 'a'], [10, 'b']]
[[3, 'a'], [8, 'd'], [5, 'c'], [10, 'b']]

9.9 Priority Queue


A priority queue is an abstract data type (ADT) and an extension of a
queue with the following properties:

1. Each item in the queue has a priority associated with it.

2. An item with higher priority is served (dequeued) before an item
   with lower priority.

3. If two items have the same priority, they are served according to
   their order in the queue.

Priority queues are commonly applied in:

1. CPU Scheduling,

2. Graph algorithms like Dijkstra’s shortest path algorithm, Prim’s Min-


imum Spanning Tree, etc.

3. All queue applications where priority is involved.

The properties of a priority queue demand sorting stability from our cho-
sen sorting mechanism or data structure. A heap is generally preferred
over arrays or linked lists as the underlying data structure for a priority
queue. In fact, the Python class PriorityQueue() from the queue module
uses heapq under the hood too. We will later see how to implement a
priority queue with heapq and how to use the PriorityQueue() class for
our purpose. By default, the lower the value, the higher the priority,
making the min-heap the underlying data structure.

Implement with heapq Library


The core functions heapify(), heappush(), and heappop() from the
heapq library are used in our implementation. In order to implement a
priority queue, our binary heap needs to have the following features:

• Sort stability: when we get two tasks with equal priorities, we return
  them in the same order as they were originally added. A potential
  solution is to modify the original 2-element list [priority, task]
  into a 3-element list [priority, count, task]. A list is preferred
  because a tuple does not allow item assignment. The entry count
  indicates the original order of the task in the list and serves as a
  tie-breaker, so that two tasks with the same priority are returned in
  the same order as they were added, preserving the sort stability.
  Also, since no two entry counts are the same, in the tuple comparison
  the task will never be directly compared with another. For example,
  using the same example as in the last section:

import itertools
counter = itertools.count()
h = [[3, 'e'], [3, 'd'], [10, 'c'], [5, 'b'], [3, 'a']]
h = [[p, next(counter), t] for p, t in h]

The printout for h is:

[[3, 0, 'e'], [3, 1, 'd'], [10, 2, 'c'], [5, 3, 'b'], [3, 4, 'a']]

Heapifying h gives us the same order as the original h: the relative
ordering of ties in the sense of priority is sustained.

• Remove arbitrary items or update the priority of an item:
  in situations where the priority of a task changes or a pending task
  needs to be removed, we have to update or remove an item in the
  heap. We have seen how to update an item's value in O(log n) time
  with the two functions _siftdown() and _siftup(). But how do we
  remove an arbitrary item? We could find it and replace it with the
  last item in the heap; then, depending on the comparison between
  the value of the deleted item and the last item, the two mentioned
  functions can be applied to restore the heap property.
  However, there is a more convenient alternative: whenever we "re-
  move" an item, we do not actually remove it but simply mark it as
  "removed". These "removed" items will eventually be popped out
  through the normal pop operation that comes with the heap data
  structure, at the same time cost of O(log n). With this alternative,
  when we update an item, we mark the old item as "removed" and
  add the new item to the heap. Therefore, with this marking method,
  we are able to implement a heap in which arbitrary removal and
  update are supported with just the three common functionalities:
  heapify, heappush, and heappop.
  Let's use the same example here. We first remove task 'd' and then
  update task 'b''s priority to 14. Then we use another list vh to get
  the relative ordering of tasks in the heap according to priority.
REMOVED = '<removed-task>'
# remove task 'd'
h[1][2] = REMOVED
# update task 'b''s priority to 14
h[3][2] = REMOVED
heappush(h, [14, next(counter), 'b'])
vh = []
while h:
    item = heappop(h)
    if item[2] != REMOVED:
        vh.append(item)

The printout for vh is:

[[3, 0, 'e'], [3, 4, 'a'], [10, 2, 'c'], [14, 5, 'b']]

• Search in constant time: searching the heap for an arbitrary
  item–non-root or root–takes linear time. In practice, tasks should
  have unique task ids to distinguish them from each other, making it
  possible to use a dictionary where the task serves as the key and the
  3-element list as the value (for a list, the value is just a pointer to
  the start of the list). With the dictionary to help search, the time
  cost decreases to constant. We name this dictionary entry_finder.
  Now we modify the previous code. The following code shows how to
  add items into a heap associated with entry_finder:

# a heap associated with entry_finder
counter = itertools.count()
entry_finder = {}
h = [[3, 'e'], [3, 'd'], [10, 'c'], [5, 'b'], [3, 'a']]
heap = []
for p, t in h:
    item = [p, next(counter), t]
    heap.append(item)
    entry_finder[t] = item
heapify(heap)

Then, we execute the remove and update operations with entry_finder.

REMOVED = '<removed-task>'
def remove_task(task_id):
    if task_id in entry_finder:
        entry_finder[task_id][2] = REMOVED
        entry_finder.pop(task_id)  # delete from the dictionary
    return

# remove task 'd'
remove_task('d')
# update task 'b''s priority to 14
remove_task('b')
new_item = [14, next(counter), 'b']
heappush(heap, new_item)
entry_finder['b'] = new_item

In the notebook, we provide a comprehensive class named PriorityQueue


that implements just what we have discussed in this section.

Implement with PriorityQueue class


The PriorityQueue() class has the same member functions as the classes
Queue() and LifoQueue(), which are shown in Table 9.8; therefore, we
skip the introduction. First, we build a queue with:
from queue import PriorityQueue
data = [[3, 'e'], [3, 'd'], [10, 'c'], [5, 'b'], [3, 'a']]
pq = PriorityQueue()
for d in data:
    pq.put(d)

process_order = []
while not pq.empty():
    process_order.append(pq.get())

The printout for process_order, shown as follows, indicates that
PriorityQueue works the same as our heapq:

[[3, 'a'], [3, 'd'], [3, 'e'], [5, 'b'], [10, 'c']]

Customized Object If we want a higher value to mean a higher prior-
ity, we can demonstrate this with a customized object implementing two
comparison operators, < and ==, via the magic functions __lt__() and
__eq__(). The code is as follows:
class Job():
    def __init__(self, priority, task):
        self.priority = priority
        self.task = task
        return

    def __lt__(self, other):
        try:
            # reversed comparison: a larger priority sorts first
            return self.priority > other.priority
        except AttributeError:
            return NotImplemented

    def __eq__(self, other):
        try:
            return self.priority == other.priority
        except AttributeError:
            return NotImplemented
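A minimal usage sketch of this Job class with PriorityQueue (outputs
noted in the comments):

pq = PriorityQueue()
for p, t in [(3, 'e'), (10, 'c'), (5, 'b')]:
    pq.put(Job(p, t))
while not pq.empty():
    job = pq.get()
    print(job.priority, job.task)  # 10 c, then 5 b, then 3 e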

Similarly, if we apply the wrapper shown in the heapq section, we can
have a priority queue that has sort stability, supports removing and up-
dating items, and offers constant search time.

In single-thread programming, is heapq or PriorityQueue
more efficient?
In fact, the PriorityQueue implementation uses heapq under the hood
to do all prioritisation work, with the base Queue class providing the
locking to make it thread-safe. The heapq module offers no locking
and operates on standard list objects. This makes the heapq module
faster; there is no locking overhead. In addition, you are free to use
the various heapq functions in different, novel ways, while Priority-
Queue only offers the straight-up queueing functionality.

Hands-on Example
Top K Frequent Elements (L347, medium) Given a non-empty array
of integers, return the k most frequent elements.
Example 1:
Input: nums = [1, 1, 1, 2, 2, 3], k = 2
Output: [1, 2]

Example 2:
Input: nums = [1], k = 1
Output: [1]

Analysis: We first use a hashmap to gather the information: each item
and its frequency. Then the problem becomes obtaining the top k most
frequent items in our counter: we can either use sorting or use a heap. Our
exemplary code here is for the purpose of getting familiar with the related
Python modules.

• Counter(). Counter() has the function most_common(k) that returns
  the top k most frequent items. The time complexity is O(n log n).

from collections import Counter
def topKFrequent(nums, k):
    return [x for x, _ in Counter(nums).most_common(k)]

• heapq.nlargest(). The complexity should be better than O(n log n).

from collections import Counter
import heapq
def topKFrequent(nums, k):
    count = Counter(nums)
    # use the frequency as the key to compare with
    return heapq.nlargest(k, count.keys(), key=lambda x: count[x])

  key=lambda x: count[x] can also be replaced with key=count.get.
• PriorityQueue(): we put the negative count into the priority queue
  so that it performs as a max-heap.

from queue import PriorityQueue
from collections import Counter
def topKFrequent(self, nums, k):
    count = Counter(nums)
    pq = PriorityQueue()
    for key, c in count.items():
        pq.put((-c, key))
    return [pq.get()[1] for i in range(k)]

9.10 Bonus
Fibonacci heap With a Fibonacci heap, insert() and getHighestPriority()
can be implemented in O(1) amortized time, and deleteHighestPriority()
can be implemented in O(log n) amortized time.

9.11 Exercises
Selection with keyword "kth". These problems can be solved by
sorting, by using a heap, or by using quickselect:

1. 703. Kth Largest Element in a Stream (easy)

2. 215. Kth Largest Element in an Array (medium)

3. 347. Top K Frequent Elements (medium)

4. 373. Find K Pairs with Smallest Sums (medium)

5. 378. Kth Smallest Element in a Sorted Matrix (medium)

Priority queue or quicksort, quickselect:

1. 23. Merge k Sorted Lists (hard)

2. 253. Meeting Rooms II (medium)

3. 621. Task Scheduler (medium)
Part IV

Core Principle: Algorithm
Design and Analysis

This part embodies the principles of algorithm design and analysis tech-
niques–the central part of this book.
Before we start, I want to emphasize that the tree and graph data
structures, especially the tree, are great visualization tools to assist us
with algorithm design and analysis. A tree is a recursive structure; it can
be used to visualize almost any recursion-based algorithm design, or even
to compute the complexity, in which case it is specifically called a recursion
tree.
In the next three chapters we introduce the principles of algorithm
analysis (Chapter 10) and the fundamental algorithm design principles–
Divide and Conquer (Chapter 13) and Reduce and Conquer (Chapter IV).
In Algorithm Analysis, we familiarize ourselves with common concepts
and techniques for analyzing the performance of algorithms–running time
and space complexity. Divide and conquer is a widely used principle in
algorithm design; in this book, we dedicate a whole chapter to its sibling
design principle–reduce and conquer–which is essentially a superset of the
optimization design principles–dynamic programming and greedy algo-
rithms–that are further detailed in Chapter 15 and Chapter 17.
10

Algorithm Complexity Analysis

When a software program runs on a machine, we genuinely care about
the hardware space and the running time it takes to complete the execu-
tion of the program; space and running time are the costs we pay to get
the problem solved. The lower the cost, the happier we would be. Thus,
space and running time are two metrics we use to evaluate the performance
of programs, or rather, algorithms.
Now, if I ask you the question, "How do we evaluate the performance
of algorithms?", do not go low and tell me, "You just write the code and
run it on a computer." Because here is the reality: (a) these two metrics
will mostly vary with the physical machine and the programming language
used, and (b) the cost would be too high. First, when we are solving a
problem, we always try to come up with many possible solutions–algo-
rithms. Implementing and running all candidates just boosts your cost in
labor and finance. Second, even in the best case, where you have only one
candidate, what if your designated machine cannot load the program due
to memory limits, or what if your algorithm takes millions of years to run?
Would you prefer to sit and wait?
Given these situations, it is obvious that we need to predict an algo-
rithm's performance–running time and space–without implementing or
running it on a particular machine, and meanwhile the prediction should
be independent of the hardware. In this chapter, we study the complexity
analysis method that strives to give us exactly this ability. The space com-
plexity is mostly obvious and far easier to obtain than its counterpart, the
time complexity. This is why, in this chapter, the analysis of time com-
plexity will outweigh the pages we spend on space complexity. Before we
dive into a plethora of algorithms and data structures, learning the com-
plexity analysis techniques can help us evaluate each algorithm.

10.1 Introduction
In reality, it is impossible to predict the exact behavior of an algorithm;
complexity analysis thus only tries to extract the main influencing factors
and ignores trivial details. Complexity analysis is therefore only approxi-
mate, but it works.

What are the main influencing factors? Imagine sorting an array of
integers of size 10 versus size 10,000,000. The time and space these two
input sizes take will mostly differ hugely. Thus, the number of items in
the input is a straightforward factor. Assume we use n to denote the size
of the input; complexity analysis will then define an expression of the
running time as T(n) and of the space as S(n).
Complexity analysis is based upon the RAM model, where instructions
are executed one after another, without concurrency. Therefore, the run-
ning time of an algorithm on a particular input can be expressed by count-
ing the number of operations or "steps" it runs.

What are the different cases? Yet when two input instances have ex-
actly the same size but different values–say one input array is already
sorted and the other is totally random–the time they take will possibly
vary, depending on the sorting algorithm you chose. In complexity anal-
ysis, best-case, worst-case, and average-case analysis are used to differen-
tiate the behavior of the same algorithm applied to different input in-
stances.

1. Worst-case: the behavior of the algorithm, or an operation of a data
   structure, with respect to the worst possible input instance. This
   gives us a way to measure the upper bound on the running time for
   any input, which is denoted as O. Knowing it gives us a guarantee
   that the algorithm will never take any longer.

2. Average-case: the expected behavior when the input is randomly
   drawn from a given distribution. The average-case running time is
   used as an estimate of the complexity for a normal case. The expected
   case here offers us the asymptotic bound Θ. Computation of the
   average-case running time entails knowing all possible input se-
   quences, the probability distribution of occurrence of these sequences,
   and the running times for the individual sequences. Often it is
   assumed that all inputs of a given size are equally likely.

3. Best-case: the best possible behavior, when the input data is ar-
   ranged in such a way that your algorithm runs the least amount of
   time. Best-case analysis leads us to the lower bound Ω of an algo-
   rithm or data structure.

Toy Example: Selection Sort Given a list of integers, sort the items
incrementally.
For example, given the list A = [10, 3, 9, 2, 8, 7, 9], the sorted
list will be A = [2, 3, 7, 8, 9, 9, 10].

There are many sorting algorithms; in this case, let us examine selection
sort. Given the input array A of size n, we have indices [0, n − 1]. In
selection sort, each time we select the current largest item and swap it
with the item at its corresponding position in the sorted list, thus dividing
the list into two parts: an unsorted list on the left and a sorted list on the
right. For example, at the first pass, we choose 10 from A[0, n − 1] and
swap it with A[n − 1], which is 9; at the second pass, we choose the largest
item 9 from A[0, n − 2] and swap it with 7 at A[n − 2]; and so on. In
total, after n − 1 passes, we get an incrementally sorted array. More
details of selection sort can be found in Chapter 15.
In the implementation, we use ti to denote the target position and li
the index of the largest item which can only get by scanning. We show the
Python code:
1  def selectSort(a):                          # cost   times
2      '''Implement selection sort'''
3      n = len(a)
4      for i in range(n - 1):                  #        n-1 passes
5          ti = n - 1 - i                      # c      n-1
6          li = 0                              # c      n-1
7          for j in range(n - i):
8              if a[j] > a[li]:                # c      \sum_{i=0}^{n-2}(n-i)
9                  li = j                      # c      \sum_{i=0}^{n-2}(n-i)
10         # swap li and ti
11         print('swap', a[li], a[ti])
12         a[ti], a[li] = a[li], a[ti]         # c      n-1
13         print(a)
14     return a

First, we ignore the distinction between different operation types and
treat them all alike with a cost of c. In the code above, the lines annotated
with cost and times are the operations we count. In line 5, we point at the
target position ti; because of the for loop above it, this operation runs
n − 1 times, and the same holds for lines 6 and 12. The operations in lines
8 and 9 run \sum_{i=0}^{n-2}(n − i) times due to the two nested for
loops, where the range of j depends on the outer loop variable i. We get
our running time T(n) by summing these costs over i:

T(n) = 3c(n − 1) + 2c \sum_{i=0}^{n−2} (n − i)                    (10.1)
     = 3c(n − 1) + 2c(n + (n − 1) + (n − 2) + ... + 2)
     = 3c(n − 1) + 2c · (n − 1)(n + 2)/2
     = cn^2 + cn − 2c + 3cn − 3c
     = cn^2 + 4cn − 5c                                            (10.2)
     = an^2 + bn + c                                              (10.3)

We introduce three constants a, b, c to rewrite Eq. 10.2 as Eq. 10.3.


For sorting in general, an incrementally sorted array is potentially the
best case, taking the least running time, while a decrementally sorted
array is the worst case. However, selection sort does not take advantage
of an already sorted input: it still runs n − 1 passes, and in each pass it
still scans a fixed-size window to find the largest item (we only know an
item is the largest after looking at all candidates). Thus, for selection
sort, the best case, worst case, and average case all happen to have the
same running time shown in Eq. 10.3.
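We can confirm this empirically. Below is a minimal sketch (not from the
book, with the print statements removed) that times selection sort on
sorted, reversed, and random inputs; all three timings should be close:

import random, timeit

def select_sort(a):
    n = len(a)
    for i in range(n - 1):
        ti, li = n - 1 - i, 0
        for j in range(n - i):
            if a[j] > a[li]:
                li = j
        a[ti], a[li] = a[li], a[ti]
    return a

n = 1000
for name, data in [('sorted', list(range(n))),
                   ('reversed', list(range(n - 1, -1, -1))),
                   ('random', random.sample(range(n), n))]:
    t = timeit.timeit(lambda: select_sort(list(data)), number=10)
    print(name, round(t, 3))   # the three times are nearly identical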

10.2 Asymptotic Notations

Order of Growth and Asymptotic Running Time In Eq. 10.3 we
ended up with three constants a, b, c and two terms of order n^2 and n.
When the input is large enough, all the lower-order terms, even those with
large constants, become insignificant relative to the highest-order term; we
thus neglect the lower terms and end up with an^2. Further, we neglect
the constant coefficient a for the same reason. However, we cannot simply
write T(n) = n^2, because mathematically speaking that is wrong.
Instead, since we are only interested in the behavior of T(n) when n is
large enough, we say the original complexity function an^2 + bn + c is
“asymptotically equivalent” to n^2, read as “T(n) is asymptotic to n^2”
and denoted as T(n) = an^2 + bn + c ∼ n^2. From Fig. 10.1, we can
visualize that when n is large enough, the term n is trivial compared
with n^2.
In this way, we manage to classify our complexity into a family of
functions, say, exponential 2^n or polynomial n^2.

Figure 10.1: Order of Growth of Common Functions

Definition of Asymptotic Notations

The “asymptotically equivalent” relation just mentioned can be formalized
with the Θ-notation, e.g., T(n) = Θ(n^2). It is one of the three main
asymptotic notations–asymptotically equivalent, smaller, and larger–that
we cover in this section.

Θ-Notation For a given function g(n), we define Θ(g(n)) (pronounced
“big theta”) as the set of functions f(n) that can be bounded by g(n) as
0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0, for some positive constants
c1, c2, and n0. We show this relation in Fig. 10.2. Strictly speaking, we
should write f(n) ∈ Θ(g(n)) to indicate that f(n) is just one member of
the set of functions that Θ(g(n)) represents; however, in computer science
we conventionally write f(n) = Θ(g(n)) instead.
We say g(n) is an asymptotically tight bound of f(n). For example, n^2
is an asymptotically tight bound for 2n^2 + 3n + 4, for 5n^2 + 3n + 4,
for 3n^2, or for any similar function. We can thus denote our running
time as T(n) = Θ(n^2).
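As a quick numeric sanity check (a sketch, not from the book), we can
exhibit witnesses c1 = 2, c2 = 3, n0 = 4 for f(n) = 2n^2 + 3n + 4 = Θ(n^2):

f = lambda n: 2 * n**2 + 3 * n + 4
g = lambda n: n**2
# 2*g(n) <= f(n) <= 3*g(n) holds for every n >= 4
assert all(2 * g(n) <= f(n) <= 3 * g(n) for n in range(4, 10000))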

O-Notation Further, we define the asymptotic upper bound of a set of
functions f(n) as O(g(n)) (pronounced “big oh”), with 0 ≤ f(n) ≤ c·g(n)
for all n ≥ n0, for some positive constants c and n0. We show this relation
in Fig. 10.2.

Figure 10.2: Graphical examples for asymptotic notations.

Note that T(n) = Θ(g(n)) implies T(n) = O(g(n)), but not the other way
around. For 2n^2 + 3n + 4, 5n^2 + 3n + 4, or 3n^2, we can also write
T(n) = O(n^2). Big Oh notation is widely applied in computer science to
describe either the running time or the space complexity.

Ω-Notation It provides an asymptotic lower bound on the running time.
With T(n) = Ω(g(n)) (pronounced “big omega”) we represent the set of
functions f(n) such that 0 ≤ c·g(n) ≤ f(n) for all n ≥ n0, for some
positive constants c and n0.

Does it mean that O is the worst case, Θ the average
case, and Ω the best case? How do these notations relate
to the three cases?

Properties of Asymptotic Comparisons

We should note that only if f (n) = O(g(n)) and f (n) = Ω(g(n)), we can
have f (n) = Θ(g(n)).

Table 10.1: Analog of Asymptotic Relations

Notation            Similar Relation
f(n) = Θ(g(n))      f(n) = g(n)
f(n) = O(g(n))      f(n) ≤ g(n)
f(n) = Ω(g(n))      f(n) ≥ g(n)

It is fair to liken the relation between f(n) and g(n) to relations between
real numbers, as shown in Table 10.1. Thus properties of the real-number
relations carry over: transitivity and reflexivity hold for all three
notations, symmetry holds for Θ, and transpose symmetry relates O and Ω
(f(n) = O(g(n)) if and only if g(n) = Ω(f(n))).

10.3 Practical Guideline


In the previous two sections, we introduced the complexity function T(n),
how it is influenced by different cases of input instances–worst, average,
and best–and how asymptotic notations let us focus only on the dominant
term of T(n). In this section, we provide some practical guidelines that
arise in real applications.

Input Size and Running Time In general, the time taken by an
algorithm grows with the size of the input, so it is universal to describe
the running time of a program as a function f(n) of the size of its input,
denoted n.
The notion of input size depends on the specific problem and data
structure. For example, the size of an array is denoted by an integer n;
for binary representations it is the total number of bits; and sometimes,
if the input is a matrix or a graph, we need two integers, such as (m, n)
for a two-dimensional matrix or (V, E) for the vertices and edges of a
graph. We use the function T to denote the running time: with input size
n it is T(n), and given (m, n) it is T(m, n).

Worst-case Analysis is Preferred In practice, the worst-case input is
chosen as our indicator over the best and average inputs because: (a) the
best input is not representative–there is usually some input on which the
algorithm becomes trivial; (b) the average input is sometimes very hard
to define and measure; (c) in many cases, the worst-case input is close
to the average and to the inputs observed in practice; (d) the algorithm
with the best efficiency on the worst case usually achieves the best
overall performance.

Relate Asymptotic Notations to Three Cases of Input Instance
It might seem confusing how the asymptotic notations relate to the three
cases of input instance–worst case, best case, and average case.
Think about it this way: asymptotic notations apply to any function;
they abstract away the lower-order terms to characterize the behavior of
the function when the input is large or infinite. In that sense, they have
nothing to do with the three cases.
However, suppose we are characterizing the complexity of an algorithm
and we have analyzed its best-case and worst-case inputs:

• Worst-case: T(n) = an^2 + bn + c, so we can say T(n) = Θ(n^2),
which implies T(n) = Ω(n^2) and T(n) = O(n^2).

• Best-case: T(n) = an, so we can say T(n) = Θ(n), which implies
T(n) = Ω(n) and T(n) = O(n).

To describe the complexity of our algorithm in general–setting aside any
particular input instance, and unlike average-case analysis, where it is
typically hard to “average” over different inputs–we can come up with an
estimate and safely say that in general an ≤ T(n) ≤ an^2 + bn + c. This
can be further expanded as:

c1·n ≤ an ≤ T(n) ≤ an^2 + bn + c ≤ c2·n^2                        (10.4)

Equivalently, we can safely characterize a lower bound based on the best
case and an upper bound based on the worst case; thus we say the time
complexity of our algorithm is T(n) = Ω(n) and T(n) = O(n^2).

Big Oh is a Popular Notation for Complexity Analysis We have
concluded that worst-case analysis is both easy to carry out and a good
indicator of overall complexity. Big Oh, as the upper bound of the worst
case, also bounds the algorithm in general.
Even when we can get a tight bound for an algorithm, as in the case of
selection sort, it is still correct to state it as an upper bound, because
Θ(g(n)) is a subset of O(g(n)). This is like knowing that a dog is a
canine and a canine is a mammal: we are therefore right to say that a
dog is a mammal.

10.4 Time Recurrence Relation


We have studied recurrence relations thoroughly in Part II. How do they
relate to complexity analysis? We can represent either a recursive or an
iterative function with a time recurrence relation. The complexity analysis
can then be done in two steps: (1) derive the recurrence relation, and
(2) solve it.

• For a recursive function, this representation is natural. For example,
merge sort can be represented as T(n) = 2T(n/2) + O(n): each step
divides a problem of size n into two subproblems of half the size, and
the cost to combine the solutions of these two subproblems is at most
n, which is why we add O(n).

• A time recurrence relation applies just as easily to an iterative
program. Say we search for a target in a list by scanning: we can
write its recurrence as T(n) = T(n − 1) + 1, because each move of the
scan reduces the problem to a smaller size at a constant cost of 1.
Using asymptotic notation, we write T(n) = T(n − 1) + O(1). Solving
this recurrence straightforwardly through the iteration method gives
T(n) = O(n). (A symbolic check of this recurrence is sketched right
after this list.)
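As a side note, linear recurrences like this one can also be solved
symbolically. A minimal sketch, assuming SymPy is available (this tool is
not part of the book's own toolkit):

from sympy import Function, rsolve, symbols

n = symbols('n', integer=True)
T = Function('T')
# Solve T(n) = T(n-1) + 1 with T(0) = 0; prints the closed form: n
print(rsolve(T(n) - T(n - 1) - 1, T(n), {T(0): 0}))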

As discussed earlier, there are generally two ways of reducing a problem:
divide and conquer, and reduce by constant size; the latter yields a
non-homogeneous recurrence relation. In Part II, we showed how to solve
linear recurrence relations exactly, which looked complex and terrifying.
The good news is that complexity analysis is about estimating the cost, so
we can loosen up a bit: a lower/upper bound is often good enough, and the
base case will almost always be T(1) = O(1).

10.4.1 General Methods to Solve Recurrence Relation

We have shown in Part II that the iterative method and mathematical
induction are general methods for solving simple recurrence relations. We
first demonstrate how these two methods can be used to solve time
recurrence relations; additionally, we introduce the recursion tree method.

Iterative Method

The most straightforward method for solving a recurrence relation, whether
linear or non-linear, is the iterative method: a procedure that repeatedly
replaces/substitutes each a_n with its recurrence relation
Ψ(n, a_{n−1}, a_{n−2}, ..., a_{n−k}) until all terms other than the
initial values “disappear”. The iterative method is also called the
substitution method.
We demonstrate iteration with a simple non-overlapping recursion:

T(n) = T(n/2) + O(1)                                              (10.5)
     = T(n/2^2) + O(1) + O(1)
     = T(n/2^3) + 3·O(1)
     = ...
     = T(1) + k·O(1)                                              (10.6)

Here n/2^k = 1; solving this equation gives k = log2(n). Most likely
T(1) = O(1) is the initial condition; substituting it in, we get
T(n) = O(log2 n).
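Binary search is a classic algorithm whose running time obeys exactly this
recurrence: each comparison halves the remaining range at constant cost. A
minimal iterative sketch (not the book's own implementation):

def binary_search(a, t):
    '''Search target t in sorted list a; T(n) = T(n/2) + O(1).'''
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2   # halve the problem size
        if a[mid] == t:
            return mid
        if a[mid] < t:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

print(binary_search([2, 3, 7, 8, 9, 10], 8))  # 3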
However, consider applying iteration to the recurrence
T(n) = 3T(n/4) + O(n). It might be tempting to assume that
T(n) = O(n log n), since T(n) = 2T(n/2) + O(n) leads to that complexity.

T(n) = 3T(n/4) + O(n)                                             (10.7)
     = 3(3T(n/4^2) + n/4) + n = 3^2·T(n/4^2) + n(1 + 3/4)         (10.8)
     = 3^2(3T(n/4^3) + n/4^2) + n(1 + 3/4)
     = 3^3·T(n/4^3) + n(1 + 3/4 + 3/4^2)
     = ...                                                        (10.9)
     = 3^k·T(n/4^k) + n \sum_{i=0}^{k−1} (3/4)^i                  (10.10)

Recursion Tree

Since the number of terms grows, the iteration can look messy. We can use
a recursion tree to better visualize the process of iteration. In a
recursion tree, each node represents the cost of a single subproblem. As a
start, we expand T(n) as a root node with cost n, which has three children,
each representing a subproblem T(n/4). We keep doing the same with each
leaf node until the subproblem is trivial and becomes a base case–here,
T(1). In practice, we just need to draw a few layers to find the pattern;
the total cost is the sum of the costs of all layers. The process is shown
in Fig. 10.3.

Figure 10.3: The process of constructing a recursion tree for
T(n) = 3T(⌊n/4⌋) + O(n). There are k + 1 levels in total.

Through the expansion with iteration and the recursion tree, our time
complexity function becomes (L_i denotes the cost of level i):

T(n) = \sum_{i=1}^{k} L_i + L_{k+1}                               (10.11)
     = n \sum_{i=1}^{k} (3/4)^{i−1} + 3^k·T(n/4^k)                (10.12)

In the process, we can see that Eq. 10.12 and Eq. 10.10 are the same.
Because T(n/4^k) = T(1) = 1, we have k = log4(n). Bounding the finite
geometric series by the infinite one:

T(n) ≤ n \sum_{i=1}^{∞} (3/4)^{i−1} + 3^k·T(n/4^k)                (10.13)
     ≤ (1/(1 − 3/4))·n + 3^{log4 n}·T(1) = 4n + n^{log4 3} ≤ 5n   (10.14)
     = O(n)                                                       (10.15)

Mathematical Induction
Mathematical induction is a proof technique, essentially used to prove that
a property P(n) holds for every natural number n, i.e., for n = 0, 1, 2, 3,
and so on. To use induction, we first need to guess the closed-form
solution for a_n. Induction then requires two cases to be proved:

1. Base case: prove that the property holds for the number 0.

2. Induction step: prove that if the property holds for one natural
number n, then it holds for the next natural number n + 1.

For T(n) = 2·T(n − 1) + 1 with T(0) = 0, we obtain the following values by
expanding T(i) for i ∈ [0, 7]:

n      0   1   2   3   4    5    6    7
T(n)   0   1   3   7   15   31   63   127

It is not hard to find the pattern and guess T(n) = 2^n − 1. Now, we prove
this equation by induction:

1. Show that the base case is true: T(0) = 2^0 − 1 = 0.

2. Assume it holds true for T(n − 1), i.e., T(n − 1) = 2^{n−1} − 1. Then:

   T(n) = 2·T(n − 1) + 1                                          (10.16)
        = 2(2^{n−1} − 1) + 1                                      (10.17)
        = 2^n − 1                                                 (10.18)

This shows that the induction step holds true as well.
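A guessed closed form can also be spot-checked numerically before proving
it; a minimal sketch:

# Check T(n) = 2^n - 1 against T(n) = 2*T(n-1) + 1, T(0) = 0
T = [0]
for i in range(1, 11):
    T.append(2 * T[i - 1] + 1)
assert all(T[i] == 2 ** i - 1 for i in range(11))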



Solve T(n) = T(n/2) + O(1), and T(2n) ≤ 2T(n) + 2n − 1 with
T(2) = 1.

10.4.2 Solve Divide-and-Conquer Recurrence Relations

All the previous recurrence relations, homogeneous or non-homogeneous,
reduce the problem size by a constant (decrease and conquer); there is yet
another type of recursion–divide and conquer. As before, we ignore how we
obtain such a recurrence and focus on how to solve it.
Writing divide-and-conquer recurrences with the time complexity function,
there are two types, shown in Eq. 10.19 (n is divided equally) and
Eq. 10.20 (n is divided unequally):

T(n) = a·T(n/b) + f(n)                                            (10.19)

where a ≥ 1, b > 1, and f(n) is a given function, usually of the form
f(n) = cn^k.

T(n) = \sum_{i=1}^{k} a_i·T(n/b_i) + f(n)                         (10.20)

Considering that the first type is much more commonly seen than the other,
we only learn how to solve the first type; in fact, I assure you that
within this book the second type never appears.

Sit and Deduct For simplicity, we assume n = b^m, so that n/b is always
an integer. First, let us use the iterative method and expand Eq. 10.19
m times so that T(n) becomes T(1):

T(n) = a·T(n/b) + cn^k                                            (10.21)
     = a(a·T(n/b^2) + c(n/b)^k) + cn^k                            (10.22)
     = a(a(a·T(n/b^3) + c(n/b^2)^k) + c(n/b)^k) + cn^k            (10.23)
     ...                                                          (10.24)
     = a(a(. . . a·T(n/b^m) + c(n/b^{m−1})^k . . .) + c(n/b)^k) + cn^k  (10.25)
     = a(a(. . . a·T(1) + c·b^k . . .) + c(n/b)^k) + cn^k         (10.26)

Now, assume T(1) = c for simplicity, to get rid of this constant in our
sequence. Then:

T(n) = c·a^m + c·a^{m−1}·b^k + c·a^{m−2}·b^{2k} + . . . + c·b^{mk},   (10.27)

which implies that

T(n) = c \sum_{i=0}^{m} a^{m−i}·b^{ik}                            (10.28)
     = c·a^m \sum_{i=0}^{m} (b^k/a)^i                             (10.29)

So far, we have a geometric series, which is a good sign for obtaining a
closed-form expression. We first summarize all substitutions that will
help our further analysis:

f(n) = cn^k                                                       (10.30)
n = b^m                                                           (10.31)
m = logb(n)                                                       (10.33)
f(n) = c·b^{mk}                                                   (10.34)
a^m = a^{logb n} = n^{logb a}                                     (10.35)
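The identity in Eq. 10.35 is the one that most often trips readers up; a
quick numeric sanity check (a sketch, not from the book):

import math

a, b, m = 3, 4, 10
n = b ** m                      # n = b^m, so m = log_b(n)
lhs = a ** m                    # a^m
rhs = n ** math.log(a, b)       # n^(log_b a)
print(lhs, round(rhs, 2))       # both are 59049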

Depending on the relation between a and b^k, there are three cases:

1. b^k < a: In this case, b^k/a < 1, so the geometric series converges
to a constant even if m goes to infinity. Then we have an upper bound
T(n) = O(a^m). According to Eq. 10.35, we further get:

   T(n) = O(n^{logb a})                                           (10.36)

2. b^k = a: With b^k/a = 1, the sum has m + 1 equal terms, so
T(n) = O(a^m·m). With Eq. 10.35 and Eq. 10.33, our upper bound is:

   T(n) = O(n^{logb a}·logb n)                                    (10.37)

3. b^k > a: In this case, denote d = b^k/a (d is a constant and d > 1).
Using the standard formula for summing a geometric series:

   T(n) = c·a^m·(d^{m+1} − 1)/(d − 1) = O(a^m·d^m)                (10.38)
        = O(b^{mk}) = O(n^k) = O(f(n))                            (10.39)

Master Method
Comparing b^k with a is equivalent to comparing b^{km} with a^m which, by
the substitutions above, is in turn equivalent to comparing f(n) with
n^{logb a}. This is where the master method kicks in, and we will see how
it helps us apply these three cases in real situations.
Compare f(n)/c = n^k with n^{logb a}. Intuitively, the larger of the two
functions dominates the solution to the recurrence. Now, we rephrase the
three cases in master-method form for ease of memorization:

1. If n^k < n^{logb a}, or more precisely if it is polynomially smaller by
a factor of n^ε for some constant ε > 0, we have:

   T(n) = O(n^{logb a})                                           (10.40)

2. If n^k > n^{logb a}, similarly polynomially larger by a factor of n^ε
for some constant ε > 0, we have:

   T(n) = O(f(n))                                                 (10.41)

3. If n^k = n^{logb a}, then:

   T(n) = O(n^{logb a}·logb n)                                    (10.42)
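To make the three cases concrete, here is a small helper (a sketch, not
part of the book's code) that classifies a recurrence of the form
T(n) = a·T(n/b) + O(n^k):

import math

def master_method(a, b, k):
    '''Return the asymptotic bound of T(n) = a*T(n/b) + O(n^k).'''
    e = math.log(a, b)               # the critical exponent log_b(a)
    if abs(e - k) < 1e-9:            # case 3: n^k = n^(log_b a)
        return f'O(n^{k} log n)'
    if k < e:                        # case 1: f(n) polynomially smaller
        return f'O(n^{round(e, 3)})'
    return f'O(n^{k})'               # case 2: f(n) polynomially larger

print(master_method(2, 2, 1))  # merge sort: O(n^1 log n)
print(master_method(3, 4, 1))  # our example: O(n^1), since log_4(3) < 1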

10.4.3 Hands-on Example: Insertion Sort

In this section, we look at an example whose asymptotic bound differs as
the input differs; we focus on the worst-case and average-case analysis.
Along the way, we will also see how asymptotic notations can be used in
equations and inequalities to assist the process.
Because most of the time the average-case running time is asymptotically
equal to the worst case, we usually do not analyze it separately. The best
case only matters if you know your application context fits it; otherwise
it is trivial and unhelpful when comparing multiple algorithms. We will
see an example below.

Insertion Sort: Worst-case and Best-case Another sorting
algorithm–insertion sort–sets aside a second array sl to hold the sorted
items. At first, we place the first item, which by itself is already
sorted. In the second pass, we insert a[1] into its right position in sl.
Once the last item is handled, we return the sorted list. The code is:
1  def insertionSort(a):
2      '''Implement insertion sort'''
3      if not a or len(a) == 1:
4          return a
5      n = len(a)
6      sl = [a[0]] + [None] * (n - 1)  # sorted list
7      for i in range(1, n):  # items to be inserted into the sorted list
8          key = a[i]
9          j = i - 1
10
11         while j >= 0 and sl[j] > key:  # compare key with the sorted elements
12             sl[j + 1] = sl[j]  # shift sl[j] backward
13             j -= 1
14         sl[j + 1] = key
15         print(sl)
16     return sl

The outer for loop in line 7 always makes n − 1 passes. However, for the
inner while loop, the actual number of executions of lines 12 and 13
depends on the state of sl relative to key. Suppose we sort the input
array incrementally. If the input array is already sorted, e.g.,
A = [2, 3, 7, 8, 9, 9, 10], then no item in the sorted list is larger than
the key, so only line 14 executes. This is the best case, and we can
denote the running time of the while loop by Ω(1), because it takes
constant time in its best case. However, if the input array is sorted in
reverse of the desired order, i.e., decreasingly sorted such as
A = [10, 9, 9, 8, 7, 3, 2], then the inner while loop executes i times at
pass i, which we denote by O(n). We can write the running time recurrences
as:

T(n) = T(n − 1) + O(n)                                            (10.43)
     = O(n^2)

And,

T(n) = T(n − 1) + Ω(1)                                            (10.44)
     = Ω(n)

Using simple iteration, we can solve these formulas and obtain the
asymptotic upper and lower bounds for the time complexity of insertion
sort.
For the average case, we can assume that each pass needs half of the
comparisons, giving:

T(n) = T(n − 1) + Θ(n/2)                                          (10.45)
     = T(n − 2) + Θ(n/2 + (n − 1)/2)
     = Θ(n^2)

For an algorithm whose complexity is stable across inputs, we
conventionally analyze its average performance and use the Θ-notation to
give an asymptotically tight bound, as with selection sort. For an
algorithm such as insertion sort, whose complexity varies with the input
data distribution, we conventionally analyze its worst case and use the
O-notation.

10.5 *Amortized Analysis

There are two different ways to evaluate an algorithm/data structure:

1. Consider each operation separately: look at each operation incurred in
the algorithm/data structure separately and give the worst-case running
time O and average running time Θ for each operation. For the whole
algorithm, sum these up over how many times each operation is incurred.

2. Amortize among a sequence of (related) operations: amortized analysis
can show that the average cost of an operation is small when averaged over
a sequence of operations, even though a single operation might be
expensive. Amortized analysis guarantees the average performance of each
operation in the worst case.

Amortized analysis does not look at each operation on a given data
structure purely in isolation; it averages the time required to perform a
sequence of data structure operations over all operations performed. With
amortized analysis, we might see that even though one single operation is
expensive, its amortized cost over all operations is small. Unlike
average-case analysis, no probability is involved: amortized analysis
views the data structure in its applicable scenario and asks, to complete
these tasks, what is the average cost of each operation–and this cost is
achievable on any input. Therefore, for the same complexity expression,
say O(f(n)): worst-case ≥ amortized ≥ average.
There are three types of amortized analysis:

1. Aggregate analysis: determine an upper bound T(n) on the total cost of
a sequence of n operations, so each operation costs T(n)/n on average.

2. Accounting method: assign each operation an amortized charge; operations
charged more than their actual cost store the difference as credit, which
later pays for operations charged less than their actual cost.

3. Potential method: like the accounting method, but the prepaid work is
tracked as a “potential” function of the data structure's state rather
than as credit on individual items.
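The classic illustration is a dynamic array that doubles its capacity when
full; a minimal sketch (not from the book) showing aggregate analysis–n
appends trigger fewer than 2n item copies in total, so each append is
amortized O(1):

class DynamicArray:
    def __init__(self):
        self.capacity, self.size = 1, 0
        self.data = [None]
        self.copies = 0              # copies caused by resizing

    def append(self, x):
        if self.size == self.capacity:   # rare, expensive: O(size)
            self.capacity *= 2
            new_data = [None] * self.capacity
            for i in range(self.size):
                new_data[i] = self.data[i]
                self.copies += 1
            self.data = new_data
        self.data[self.size] = x         # common, cheap: O(1)
        self.size += 1

arr = DynamicArray()
for i in range(1024):
    arr.append(i)
print(arr.copies)   # 1023 copies for 1024 appends: amortized O(1)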

10.6 Space Complexity

The analysis of space complexity is more straightforward, given that we are
essentially the ones who allocate space for the application: we simply
relate it to the size of the items in the data structures. The only obscure
part is with recursive programs, which take space from the call stack that
is hidden from the user by the programming language's compiler or
interpreter. A recursive program can be represented as a recursion tree,
and the maximum stack space it needs is decided by the height of that
tree, thus O(h), with h as the height.
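A small sketch makes the point: the stack cost depends on the recursion
tree's height, not on the input's magnitude. A chain-style recursion of
depth n hits Python's recursion limit long before a halving recursion
does, even for an astronomically larger n:

import sys
sys.setrecursionlimit(10_000)

def chain(n):     # T(n) = T(n-1) + O(1): recursion depth n
    return 0 if n == 0 else chain(n - 1)

def halve(n):     # T(n) = T(n/2) + O(1): recursion depth log2(n)
    return 0 if n <= 1 else halve(n // 2)

chain(9_000)       # ~9000 stack frames, near the limit
halve(2 ** 9_000)  # also only ~9000 frames, despite n being 2^9000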

Space and Time Trade-off In algorithm design, we can usually trade
space for time efficiency, or time for space efficiency. For example, if
your algorithm runs on a backend server that must respond to user
requests, decreasing the response time is especially useful there.
Normally we decrease the time complexity by sacrificing more space,
provided the extra space is not a problem for the physical machine. But
in some cases, reducing space usage matters more, and we might go for an
alternative algorithm that uses less space at the price of higher time
complexity.

10.7 Summary

For your convenience, Fig. 10.4 shows the time complexity of frequently
used recurrence equations.

Figure 10.4: Cheat sheet for time and space complexity of common
recurrence relations; the resulting orders of growth are called factorial,
exponential, quadratic, linearithmic, linear, logarithmic, and constant.

10.8 Exercises
10.8.1 Knowledge Check
1. Use iteration and recursion tree to get the time complexity of T (n) =
T (n/3) + 2T (2n/3) + O(n).

2. Get the time complexity of T (n) = 2T (n/2) + O(n2 ).

3. T (n) = T (n − 1) + T (n − 2) + T (n − 3) + ... + T (1) + O(1).


11

Search Strategies

Our standing in the series of graph algorithm chapters:

1. Search Strategies (current chapter)

2. Combinatorial Search (next chapter)

3. Advanced Graph Algorithms (later chapter)

4. Graph Problem Patterns (future chapter)

Searching¹ is one of the most effective tools in algorithms. Search
algorithms are widely applied in the field of artificial intelligence to
offer either exact or approximate solutions for complex problems such as
puzzles, games, routing, scheduling, motion planning, and navigation. On
the spectrum of discrete problems, nearly every one can be modeled as a
searching problem, together with enumerative combinatorics and
optimization. Searching solutions serve as naive baselines, or even as the
only known solutions for some problems. Understanding common search
strategies–the main goal of this chapter–along with the search space of a
problem lays the foundation of problem analysis and solving; it is just
indescribably powerful and important!

11.1 Introduction
Linear and tree-like data structures are all special cases of graphs,
which makes graph searching universal to all search algorithms. There are
many search strategies, and we focus only on a few, selected based on the
completeness of the algorithm–being absolutely sure to find an answer if
there is one.

¹ https://en.wikipedia.org/wiki/Category:Search_algorithms
Search algorithms can be categorized into the following two types,
depending on whether domain knowledge is used to guide the selection of
the best path while searching:

1. Uninformed search: these strategies work only with the basic problem
definition and are not guided by any estimate of how promising a node is.
The basic algorithms include depth-first search (DFS), breadth-first
search (BFS), bidirectional search, uniform-cost search, iterative
deepening search, and so on. We choose to cover the first four.

2. Informed (heuristic) search: these strategies, on the other hand, use
additional domain-specific information to build a heuristic function that
estimates the cost of a solution from a node. “Heuristic” means “serving
to aid discovery”. Common algorithms here include best-first search,
greedy best-first search, and A* search; we introduce only best-first
search.

Following this introductory chapter, in Chapter Combinatorial Search we
introduce combinatorial problems and their search spaces, and how to prune
a search space to search more efficiently.
The search space of a problem is either linear or tree-structured–an
implicit free tree–which makes graph search a “big deal” in practical
problem solving. Compared with reduce and conquer, search algorithms
treat states and actions atomically: they do not consider any internal or
optimal structure these might possess. We first recap linear search,
given its simplicity and that we have already learned how to search
multiple linear data structures.

Linear Search As the naive baseline for other search algorithms, linear
search, a.k.a. sequential search, simply traverses a linear data structure
sequentially, checking items until a target is found. It consists of a
single for/while loop, which gives O(n) time complexity and needs no
extra space. For example, we search list A for a target t:

def linearSearch(A, t):
    '''A is the array, and t is the target.'''
    for i, v in enumerate(A):
        if v == t:
            return i
    return -1

Linear search is rarely used in practice due to its lack of efficiency
compared with other searching methods, such as hashmaps and binary search,
which we will learn soon.

Searching in Non-linear Space Non-linear search spaces–whether a data
structure or a space arising from combinatorics–are generally graphs, and
sometimes rooted trees. Because the search space usually forms a search
tree, we first introduce search strategies on a search tree; then we
specifically explore searching in a tree, recursive tree traversal, and
searching in a graph.

Generics of Search Strategies

Assuming we know our state space, searching (state-space search) is the
process of searching through a state space for a solution by making
explicit a sufficient portion of an implicit state-space graph, in the
form of a search tree, to include a goal node.

Figure 11.1: Graph Searching

Nodes in Searching Process During the search, the nodes of the targeted
data structure can be categorized into three sets, as shown in Fig. 11.1,
and we distinguish the state of a node–which set it belongs to–with a
color:

• Unexplored set–WHITE: initially all nodes in the graph are in the
unexplored set, and we assign them the color WHITE. Nodes in this set have
not been visited yet.

• Frontier set–GRAY: nodes that have just been discovered/visited are put
into the frontier set, waiting to be expanded; that is, their children or
adjacent nodes (through outgoing edges) are about to be discovered and
have not all been visited yet. This is an intermediate state between WHITE
and BLACK: visiting is ongoing but not yet complete. A gray vertex may
have adjacent vertices in all three possible states.

• Explored set–BLACK: nodes that have been fully explored after leaving
the frontier set; that is, none of their adjacent nodes remains
unexplored. For a black vertex, all adjacent vertices are nonwhite.

All searching strategies follow the general tree-search procedure below
(a generic code sketch follows the steps):

1. At first, put the start node in the frontier set:

   frontier = {S}

2. Loop through the frontier set; if it is empty, the search terminates.
Otherwise, pick a node n from the frontier set:

   (a) If n is a goal node, then return the solution.

   (b) Otherwise, generate all of n's successor nodes and add them all to
   the frontier set.

   (c) Remove n from the frontier set.
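A minimal code sketch of this loop is below; is_goal and expand are
hypothetical problem-specific hooks, and the policy for picking n from the
frontier is exactly what distinguishes the strategies that follow:

def tree_search(start, is_goal, expand):
    frontier = [start]
    while frontier:
        n = frontier.pop()    # FIFO/LIFO/priority choice = BFS/DFS/UCS
        if is_goal(n):
            return n
        frontier.extend(expand(n))
    return None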

The search process constructs a search tree whose root is the start state.
Loops in a graph may cause the search tree to be infinite even if the
state space is small. In this section, we use only acyclic graphs or trees
to demonstrate the general search methods. In an acyclic graph, there may
exist multiple paths from the source to a target. Later, in the graph
search section, we discuss how to handle cycles and explain single-path
graph search. Changing the ordering in the frontier set leads to different
search strategies.

11.2 Uninformed Search Strategies


Throughout this section, we use Fig. 11.2 as our exemplary graph to
search on.

Figure 11.2: Exemplary Acyclic Graph.

The data structure representing the graph is:

from collections import defaultdict

al = defaultdict(list)
al['S'] = [('A', 4), ('B', 5)]
al['A'] = [('G', 7)]
al['B'] = [('G', 3)]

With uninformed search, we only know the goal test and the adjacent nodes,
without knowing which non-goal states are better. For now, we assume and
limit the state space to be a tree, so that we need not worry about
repeated states.
There are generally three ways to order nodes in the frontier without
domain-specific information:

• Queue: nodes are first in, first out (FIFO) of the frontier set. This
is called breadth-first search.

• Stack: nodes are last in, first out (LIFO) of the frontier set. This
is called depth-first search.

• Priority queue: nodes are sorted increasingly by the path cost from
the source to each node. This is called uniform-cost search.

11.2.1 Breadth-first Search

Breadth-first search always expands the shallowest node in the frontier
first, visiting nodes in the tree level by level, as illustrated in
Fig. 11.3. Using Q to denote the frontier set, the search process is:

Figure 11.3: Breadth-first search on a simple search tree. At each stage,
the node to be expanded next is indicated by a marker.

Q = [A]
Expand A, add B and C into Q
Q = [B, C]
Expand B, add D and E into Q
Q = [C, D, E]
Expand C, add F and G into Q
Q = [D, E, F, G]
Finish expanding D
Q = [E, F, G]
Finish expanding E
Q = [F, G]
Finish expanding F
Q = [G]
Finish expanding G
Q = []

The implementation can be done with a FIFO queue iteratively:

def bfs(g, s):
    q = [s]
    while q:
        n = q.pop(0)
        print(n, end=' ')
        for v, _ in g[n]:
            q.append(v)

Calling the function as bfs(al, 'S'), the output is:

S A B G G

Properties Breadth-first search is complete: it always finds the goal
node if one exists in the graph. It is also optimal, provided all actions
(arcs) have the same constant cost, or costs are positive and
non-decreasing with depth.

Time Complexity BFS scans each node in the tree exactly once; if the
tree has n nodes, the time complexity is O(n). However, the search can
terminate as soon as the goal is found, which may visit fewer than n
nodes, so we measure the time complexity by counting the number of nodes
expanded while the search runs. Assume the tree has branching factor b at
each non-leaf node and the goal node is located at depth d. Summing the
number of nodes from depth 0 to depth d, the total number of nodes
expanded is:

n = \sum_{i=0}^{d} b^i                                            (11.1)
  = (b^{d+1} − 1)/(b − 1)                                         (11.2)

Therefore, we have a time complexity of O(b^d). BFS is usually very slow
at finding solutions with a large number of steps, because it must look
at all shorter possibilities first.

Space Complexity The space is measured by the maximum size of the
frontier set during the search. In BFS, the maximum size is the number of
nodes at depth d, making the total space cost O(b^d).

11.2.2 Depth-first Search

Figure 11.4: Depth-first search on a simple search tree. The unexplored
region is shown in light gray. Explored nodes with no descendants in the
frontier are removed from memory as node L disappears. Dark gray marks
nodes that are being explored but not finished.

Depth-first search, on the other hand, always expands the deepest node
from the frontier first. As shown in Fig. 11.4, depth-first search starts
at the root node and keeps branching down a particular path. Using S to
denote the frontier set, which is indeed a stack, the search process is:

S = [A]
Expand A, add C and B into S
S = [C, B]
Expand B, add E and D into S
S = [C, E, D]
Expand D
S = [C, E]
Expand E
S = [C]
Expand C, add G and F into S
S = [G, F]
Expand F
S = [G]
Expand G
S = []

Depth-first search can be implemented either recursively or iteratively.

Recursive Implementation In the recursive version, the function keeps
calling itself to expand adjacent nodes. Starting from a source node, it
always deepens down a path until a leaf node is met, and then it
backtracks to expand the other siblings (or other adjacent nodes). The
code is:

def dfs(g, vi):
    print(vi, end=' ')
    for v, _ in g[vi]:
        dfs(g, v)

Calling the function as dfs(al, 'S'), the output is:

S A G B G

Iterative Implementation By definition, we can implement DFS with a
LIFO stack. The code is similar to BFS, only using a different data
structure for the frontier set:

def dfs_iter(g, s):
    stack = [s]
    while stack:
        n = stack.pop()
        print(n, end=' ')
        for v, _ in g[n]:
            stack.append(v)

Calling the function as dfs_iter(al, 'S'), the output is:

S B G A G

We observe that the ordering is not exactly the same as in the recursive
counterpart. To keep the ordering consistent, we simply add the adjacent
nodes in reversed order: in practice, replace g[n] with g[n][::-1].

Properties DFS may not terminate without a fixed depth bound limiting
the number of nodes it expands. DFS is not complete, because it always
deepens the search, and in some cases the supply of nodes, even within a
fixed depth bound, can be infinite. DFS is not optimal: in our example,
if the goal node is C, it goes through nodes A, B, D, and E before
finding C, whereas BFS goes through only A and C. However, when we are
lucky, DFS can find long solutions quickly.

Time Complexity DFS might need to explore all nodes in the graph to
find the target; thus its worst-case time complexity is determined not by
the depth of the goal but by the maximum depth m of the graph. DFS has
the same form of time complexity as BFS, O(b^m).

Space Complexity The stack stores at most a single path from the root
to a leaf (goal) node, along with the remaining unexpanded siblings of
each node on the path, so that once all children are visited it can back
up to a parent node and know which sibling to explore next. Therefore,
the space needed for DFS is O(bm). In most cases, the branching factor is
a constant, so the space complexity is mainly influenced by the depth of
the search tree. Clearly DFS has great space efficiency, which is why it
is adopted as the basic technique in many areas of computer science, such
as solving constraint satisfaction problems (CSPs). The backtracking
technique we are about to introduce further optimizes the space complexity
on the basis of DFS.

11.2.3 Uniform-Cost Search (UCS)

When a priority queue is used to order the frontier by the path cost from
the root to each node, the strategy is called uniform-cost search, a.k.a.
cheapest-first search. In UCS, the frontier is expanded only in the
direction that requires the minimum travel cost from the root node. UCS
terminates only when a popped path has reached the goal node, and that
path is the cheapest among all paths from the initial node to the goal.
When UCS is applied to find shortest paths in a graph, it is called
Dijkstra's algorithm.
We demonstrate the process of UCS with the example shown in Fig. 11.2.
Here, our source is 'S' and the goal is 'G'; we want to find a path from
source to goal with minimum cost. The process is:

Q = [(0, S)]
Expand S, add A and B
Q = [(4, A), (5, B)]
Expand A, add G
Q = [(5, B), (11, G)]
Expand B, add G
Q = [(8, G), (11, G)]
Expand G, goal found, terminate.

And the Python source code is:

import heapq

def ucs(graph, s, t):
    q = [(0, s)]   # initial path with cost 0
    while q:
        cost, n = heapq.heappop(q)
        # Goal test
        if n == t:
            return cost
        for v, c in graph[n]:
            heapq.heappush(q, (c + cost, v))
    return None

Properties Uniform-cost search is complete, like breadth-first search,
provided every edge cost is bounded below by some positive constant.
Under the same condition it is also optimal; note that optimality is not
guaranteed when negative edge costs exist.

Time and Space Complexity Similar to BFS, both the worst-case time and
space complexity are O(b^d). When all edge costs are at least c and C* is
the cost of the best goal path, the time and space complexity can be more
precisely stated as O(b^{C*/c}).

11.2.4 Iterative-Deepening Search

Iterative-deepening search (IDS) is a modification on top of DFS, more
specifically depth-limited DFS (DLS). As the name suggests, IDS sets a
maximum depth as a “depth bound” and calls DLS as a subroutine, looping
from depth zero to the maximum depth; DLS expands nodes just as DFS does,
but performs the goal test only for nodes at the testing depth.
Using the graph in Fig. 11.2 as an example, the process is:

maxDepth = 3

depth = 0: S = [S]
Test S, goal not found

depth = 1: S = [S]
Expand S, S = [B, A]
Test A, goal not found
Test B, goal not found

depth = 2: S = [S]
Expand S, S = [B, A]
Expand A, S = [B, G]
Test G, goal found, STOP
The implementation of DLS is easier with recursive DFS: we count down the
variable maxDepth inside the function and only perform the goal test when
this variable reaches zero. The code is:

def dls(graph, cur, t, maxDepth):
    # Only test the goal at the testing depth
    if maxDepth == 0:
        return cur == t
    if maxDepth < 0:
        return False
    # Recur for adjacent vertices
    for n, _ in graph[cur]:
        if dls(graph, n, t, maxDepth - 1):
            return True
    return False

With the help of function dls, IDS is just an iterative call to the
subroutine:

def ids(graph, s, t, maxDepth):
    for i in range(maxDepth + 1):   # test depths 0, 1, ..., maxDepth
        if dls(graph, s, t, i):
            return True
    return False

Analysis It appears that we are undermining the efficiency of the
original DFS, since the algorithm ends up visiting the nodes above the
goal level multiple times. However, it is not as expensive as it seems,
because in a tree most of the nodes are in the bottom levels. If the goal
node is at the bottom level, DLS shows no obvious efficiency decline. And
if the goal sits at an upper level on the right side of the tree, IDS
avoids first visiting all nodes across all depths in the left half before
finding the goal.

Properties Through depth-limited DFS, IDS inherits the advantages of
DFS:

• Limited space, linear in the depth and branching factor, giving O(bd)
as space complexity.

• In practice, even with the redundant effort, it still finds longer
paths more quickly than BFS does.

By iterating from lower to higher depths, IDS also gains the advantages
of BFS: the same completeness and optimality guarantees as BFS.

Time and Space Complexity The space complexity, like DFS's, is O(bd).
The time complexity is slightly worse than that of BFS or DFS, due to the
repeated visits of nodes near the top of the search tree, but it retains
the same worst-case exponential bound, O(b^d).

11.2.5 Bidirectional Search**

Figure 11.5: Bidirectional search.

Bidirectional search applies breadth-first search from both the start and
the goal node: one BFS moves forward from the start and one moves backward
from the goal, until their frontiers meet. This process is shown in
Fig. 11.5. Each BFS process visits only O(b^{d/2}) nodes, compared with
the O(b^d) nodes a single BFS visits, improving both time and space
efficiency by a factor of b^{d/2} over vanilla BFS.

Implementation Because the BFS starting from the goal needs to move
backward, the easy way to achieve this is to create a copy of the graph
in which each edge has the opposite direction; on this reversed graph we
can run a normal forward BFS from the goal.
We apply BFS level by level instead of updating the queue one node at a
time. For better efficiency of intersecting the frontier sets of the two
searches, we use the set data structure rather than a list or a FIFO
queue.
Using Fig. 11.2 as an example with source 'S' and goal 'G', if we advance
both BFS processes simultaneously, the process looks like this:

qs = ['S']
qt = ['G']
Check intersection, and proceed
qs = ['A', 'B']
qt = ['A', 'B']
Check intersection, frontiers meet, STOP

No problem in this case; however, the above process will end up missing
the goal if we change our goal to 'A'. That process looks like:

qs = ['S']
qt = ['A']
Check intersection, and proceed
qs = ['A', 'B']
qt = ['S']
Check intersection, and proceed
qs = ['G']
qt = []
STOP

This is because, for source and goal nodes whose shortest path has odd
length, advancing both searches simultaneously always misses the
intersection–the two frontiers step past each other. Therefore, we advance
the two BFS processes alternately, one at a time, to avoid such trouble.
The code for the one-level-at-a-time BFS with sets and for the
intersection check is:

def bfs_level(graph, q, bStep):
    if not bStep:
        return q
    nq = set()
    for n in q:
        for v, c in graph[n]:
            nq.add(v)
    return nq

def intersect(qs, qt):
    if qs & qt:   # non-empty intersection
        return True
    return False

The main code for bidirectional search is:

def bis(graph, s, t):
    # First build a graph with opposite edges
    bgraph = defaultdict(list)
    for key, value in graph.items():
        for n, c in value:
            bgraph[n].append((key, c))
    # Start bidirectional search
    qs = {s}
    qt = {t}
    step = 0
    while qs and qt:
        if intersect(qs, qt):
            return True
        qs = bfs_level(graph, qs, step % 2 == 0)
        qt = bfs_level(bgraph, qt, step % 2 == 1)
        step = 1 - step
    return False

11.2.6 Summary

Table 11.1: Performance of Search Algorithms on Trees or Acyclic Graphs

Method                 Complete   Optimal              Time          Space
BFS                    Y          Y, if uniform cost   O(b^d)        O(b^d)
UCS                    Y          Y                    O(b^{C*/c})   O(b^{C*/c})
DFS                    N          N                    O(b^m)        O(bm)
IDS                    Y          Y, if uniform cost   O(b^d)        O(bd)
Bidirectional Search   Y          Y, if uniform cost   O(b^{d/2})    O(b^{d/2})

Here b is the branching factor, d the depth of the goal node, and m the
maximum graph depth. The properties and complexities of the five
uninformed search strategies are summarized in Table 11.1.

11.3 Graph Search

Cycles This section discusses two search strategies–BFS and DFS–in a
more general graph setting. In the last section, we assumed our graph was
either a tree or an acyclic directed graph. In the more general real-world
setting, a graph can contain cycles, which would lead our programs into
infinite loops.

Print Paths Second, we talked about paths, but we never discussed how
to track them. In this section, we first see how to track paths; then,
with the tracked paths, we detect cycles to avoid getting into infinite
loops.

More Efficient Graph Search Third, the last section was all about tree
search; in a large graph this is inefficient, as it visits some nodes
multiple times when they happen to lie on multiple paths between the
source and another node. Usually, depending on the application scenario,
graph search remembers the already-expanded nodes/states and avoids
expanding them again, by checking each node about to be expanded against
the frontier set and the explored set. In this section, we introduce
graph search suited to general-purpose graph problems.

Visiting States We have already explained that we can use three
colors–WHITE, GRAY, and BLACK–to denote nodes in the unexpanded, frontier,
and explored sets, respectively. This avoids the hassle of tracking three
different sets; everything is simplified to a color check. We define a
STATE class for convenience:

class STATE:
    white = 0
    gray = 1
    black = 2

Figure 11.6: Exemplary Graphs: Free Tree, Directed Cyclic Graph, and
Undirected Cyclic Graph.

In this section, we use Fig. 11.6 as our exemplary graphs. Their data
structures are defined as:

• Free tree:

  ft = [[1], [2], [4], [], [3, 5], []]

• Directed cyclic graph:

  dcg = [[1], [2], [0, 4], [1], [3, 5], []]

• Undirected cyclic graph:

  ucg = [[1, 2], [0, 2, 3], [0, 1, 4], [1, 4], [2, 3, 5], [4]]

Search Tree It is important to realize that the search ordering always
forms a tree, termed the search tree. For a tree structure, the search
tree is the tree itself. For a graph, we need to figure out the search
tree, and it decides our time and space complexity.

11.3.1 Depth-first Search in Graph

In this section, we go beyond depth-first tree search and explore
depth-first graph search, comparing their properties and complexities.

Depth-first Tree Search

Vanilla Depth-first Tree Search Our previous code, slightly modified
to suit the new graph data structure, works fine with the free tree in
Fig. 11.6:

def dfs(g, vi):
    print(vi, end=' ')
    for nv in g[vi]:
        dfs(g, nv)

However, if we call it on the cyclic graph, as dfs(dcg, 0), it runs into
infinite recursion and a stack overflow.

Cycle-Avoiding Depth-first Tree Search So, how do we avoid cycles? By
definition, a cycle is a closed path in which at least one node repeats;
in our failed run, we were stuck in the cycle [0, 1, 2, 0]. Therefore,
let us pass a path into the recursive function, and whenever we want to
expand a node, check whether it would form a cycle by testing the
candidate's membership in the nodes comprising the path. We save all
paths and the visiting order of nodes in two lists: paths and orders. The
recursive version of the code is:

def dfs(g, vi, path):
    paths.append(path)
    orders.append(vi)
    for nv in g[vi]:
        if nv not in path:
            dfs(g, nv, path + [nv])
    return

Now we call function dfs on ft, dcg, and ucg; the paths and orders for
each example are listed:

• For the free tree and the directed cyclic graph, the outputs are the
same. The orders are:

[0, 1, 2, 4, 3, 5]

And the paths are:

[[0], [0, 1], [0, 1, 2], [0, 1, 2, 4], [0, 1, 2, 4, 3], [0, 1, 2, 4, 5]]

• For the undirected cyclic graph, the orders are:

[0, 1, 2, 4, 3, 5, 3, 4, 2, 5, 2, 1, 3, 4, 5, 4, 3, 1, 5]

And the paths are:

[[0],
 [0, 1],
 [0, 1, 2],
 [0, 1, 2, 4],
 [0, 1, 2, 4, 3],
 [0, 1, 2, 4, 5],
 [0, 1, 3],
 [0, 1, 3, 4],
 [0, 1, 3, 4, 2],
 [0, 1, 3, 4, 5],
 [0, 2],
 [0, 2, 1],
 [0, 2, 1, 3],
 [0, 2, 1, 3, 4],
 [0, 2, 1, 3, 4, 5],
 [0, 2, 4],
 [0, 2, 4, 3],
 [0, 2, 4, 3, 1],
 [0, 2, 4, 5]]

These paths mark the search tree; we visualize the search tree for each
exemplary graph in Fig. 11.7.

Depth-first Graph Search

From the above implementation, we see that for a graph with only 6 nodes,
we visited nodes a total of 19 times. Many nodes repeat: 1 appears 3
times, 3 appears 4 times, and so on. With the visiting order represented
as a search tree in Fig. 11.7, our complexity gets close to O(b^h), where
b is the branching factor and h is the total number of vertices of the
graph, which marks the upper bound on the maximum depth the search can
traverse. If we simply want to know whether a value or state exists in
the graph, this approach insanely complicates the situation. What we do
next is avoid revisiting the same vertex again and again by tracking the
visiting state of each node.
In the implementation, we only track the longest paths–from the source
vertex to vertices that have no more unvisited adjacent vertices.

Figure 11.7: Search trees for the exemplary graphs: free tree, directed
cyclic graph, and undirected cyclic graph.

def dfgs(g, vi, visited, path):
    visited.add(vi)
    orders.append(vi)
    bEnd = True   # node without unvisited adjacent nodes
    for nv in g[vi]:
        if nv not in visited:
            if bEnd:
                bEnd = False
            dfgs(g, nv, visited, path + [nv])
    if bEnd:
        paths.append(path)

Now, we call this function with ucg as:

paths, orders = [], []
dfgs(ucg, 0, set(), [0])

The output for paths and orders is:

([[0, 1, 2, 4, 3], [0, 1, 2, 4, 5]], [0, 1, 2, 4, 3, 5])

Did you notice that depth-first graph search on the undirected cyclic
graph of Fig. 11.6 has the same visiting order of nodes, and the same
search tree, as the free tree and the directed cyclic graph in Fig. 11.6?

Efficient Path Backtrace In graph search, each node is added to the
frontier and expanded only once, and the search tree of a graph with |V|
vertices has only |V| − 1 edges. Tracing paths by saving each path as a
list in the frontier set is costly: a partial path in the search tree,
such as 0->1->2->4, repeats itself for every full path it belongs to. We
can bring the memory cost down to O(|V|) by saving only edges, using a
parent dict whose key and value are a node and its parent node on the
path, respectively. For example, edge 0->1 is saved as parent[1] = 0.
Once we find the goal state, we can backtrace from it to recover the
path. The backtrace code is:

def backtrace(s, t, parent):
    p = t
    path = []
    while p != s:
        path.append(p)
        p = parent[p]
    path.append(s)
    return path[::-1]

Now, we modify the dfs code as follows to find a given state (vertex) and
obtain the path from source to target:

def dfgs(g, vi, s, t, visited, parent):
    visited.add(vi)
    if vi == t:
        return backtrace(s, t, parent)

    for nv in g[vi]:
        if nv not in visited:
            parent[nv] = vi
            fpath = dfgs(g, nv, s, t, visited, parent)
            if fpath:
                return fpath

    return None

The whole depth-first graph search tree constructed from the parent dict
is delineated in Fig. 11.8 for the given example.

Properties The completeness of DFS depends on the search space. If the
search space is finite, depth-first search is complete. However, if there
are infinitely many alternatives, it might not find a solution. For
example, suppose you were coding a path-search problem on city streets,
and every time your partial path came to an intersection, you always
searched the left-most street first: you might just keep going around the
same block indefinitely.
Depth-first graph search is non-optimal, just like depth-first tree
search. For example, if the task is to find the shortest path from source
0 to target 2, the shortest path is 0->2, but depth-first graph search
returns 0->1->2. Depth-first tree search can find the shortest path from
0 to 2 within its search tree, but it explores the whole left branch
starting from 1 before finding the goal node on the right side.

Figure 11.8: Depth-first Graph Search Tree.

Time and Space Complexity For the depth-first graph search, we use
aggregate analysis. The search process covers all |E| edges and all |V|
vertices, making the time complexity O(|V| + |E|). For the space, it uses
O(|V|) in the worst case to store the stack of vertices on the current
search path as well as the set of already-visited vertices.

Applications
Depth-first tree search is adopted as the basic workhorse of many areas of AI,
such as solving CSPs, as it is a brute-force solution. In Chapter Combinatorial
Search, we will learn how the “backtracking” technique, along with others, can
be applied to speed things up. Depth-first graph search is widely used to solve
graph-related tasks in non-exponential time, such as cycle check (linear
time) and shortest paths.

Questions to ponder:
• Only track the longest paths.

• How to trace the edges of the search tree?

• Implement the iterative version of the recursive code.



11.3.2 Breadth-first Search in Graph


We further our breadth-first tree search and explore breadth-first graph search
in this section to gain a better understanding of one of the most general search
strategies. Because BFS is implemented iteratively, the implementations
in this section shed light on the iterative counterparts of DFS's recursive
implementations from the last section.

Breadth-first Tree Search


Similarly, our vanilla breadth-first tree search shown in Section ?? will get
stuck with the cyclic graph in Fig. 11.6.

Cycle-avoiding Breadth-first Tree Search We avoid cycles with a strategy
similar to DFS tree search: trace paths and check the membership of each
node. In BFS, we track paths by explicitly adding paths to the queue.
Each time we expand from the frontier (queue), the node we need is the
last item in the path from the queue. In the implementation, we only track
the longest paths from the search tree and the visiting orders of nodes. The
Python code is:
def bfs(g, s):
    q = [[s]]
    paths, orders = [], []
    while q:
        path = q.pop(0)
        n = path[-1]
        orders.append(n)
        bEnd = True
        for v in g[n]:
            if v not in path:
                if bEnd:
                    bEnd = False
                q.append(path + [v])
        if bEnd:
            paths.append(path)
    return paths, orders

Now we call function bfs on ft, dcg, and ucg; the paths and orders for
each example are listed:

• For the free tree and the directed cyclic graph, they have the same
output. The orders are:

[0, 1, 2, 4, 3, 5]

And the paths are:

[[0, 1, 2, 4, 3], [0, 1, 2, 4, 5]]

• For the undirected cyclic graph, the orders are:

[0, 1, 2, 2, 3, 1, 4, 4, 4, 3, 3, 5, 3, 5, 2, 5, 4, 1, 5]

And the paths are:

[[0, 2, 4, 5], [0, 1, 2, 4, 3], [0, 1, 2, 4, 5], [0, 1, 3, 4, 2],
 [0, 1, 3, 4, 5], [0, 2, 4, 3, 1], [0, 2, 1, 3, 4, 5]]

Properties We can see that the visiting orders of nodes differ from their
depth-first tree search counterparts. However, the corresponding search
tree for each graph in Fig. 11.6 is the same as its depth-first tree search
counterpart illustrated in Fig. 11.7. This highlights that different search
strategies differ in the visiting order of nodes, but not in the search tree,
which depicts the search space of all possible paths.

Applications Breadth-first tree search with path tracing is far more costly
than its DFS counterpart. When our goal is to enumerate paths, go for DFS;
when we are trying to find shortest paths, mostly use BFS.

Breadth-first Graph Search


Similar to depth-first graph search, we use a visited set to make sure
each node is added to the frontier (queue) only once and thus expanded only
once.

BFGS Implementation The implementation of breadth-first graph search
with a goal test is:
def bfgs(g, s, t):
    q = [s]
    parent = {}
    visited = {s}
    while q:
        n = q.pop(0)
        if n == t:
            return backtrace(s, t, parent)
        for v in g[n]:
            if v not in visited:
                q.append(v)
                visited.add(v)
                parent[v] = n
    return parent

Now, we use the undirected cyclic graph as an example to find the path from
source 0 to target 5:

bfgs(ucg, 0, 5)

The found path is:

[0, 2, 4, 5]

This found path is also the shortest path between the two vertices measured
by length. The whole breadth-first graph search tree constructed from
the parent dict is delineated in Fig. 11.9 for the given example.

Figure 11.9: Breadth-first Graph Search Tree.

Time and Space Complexity As with DFGS, the time complexity is
O(|V| + |E|). For the space, it uses O(|V|) in the worst case to store
vertices on the current search path, the set of already-visited vertices, and
the dictionary used to store edge relations. The memory disadvantage of
breadth-first graph search relative to depth-first graph search is less
pronounced than that of breadth-first tree search relative to depth-first
tree search.

Tree Search VS Graph Search

There are two important characteristics of tree search and graph search:

• Within a graph G = (V, E), whether it is undirected or directed, acyclic
or cyclic, breadth-first and depth-first tree search result in the same
search tree: they both enumerate all possible states (paths) of the
search space.

• The conclusion is different for breadth-first and depth-first graph search.
For an acyclic and directed graph (tree), both search strategies result in the
same search tree. However, whenever there exist cycles, the depth-first
graph search tree might differ from the breadth-first graph search tree.

11.3.3 Depth-first Graph Search


Within this section and the next, we focus on explaining more characteristics
of the graph search that avoids repeatedly visiting a vertex. These features
and details may not seem that useful in the current context, but we will
see how they can be applied to solve problems more efficiently in Chapter
Advanced Graph Algorithms, such as detecting cycles, topological sort, and
so on.
As shown in Fig. 11.10 (a directed graph), we start from 0, mark it gray,
and visit its first unvisited neighbor 1; we mark 1 as gray and visit 1's first
unvisited neighbor 2, then 2's unvisited neighbor 4, and 4's unvisited neighbor
3. Node 3 doesn't have white neighbors, so we mark it complete
with black. Now we “backtrack” to its predecessor, which is 4, and
we keep the process going until 5 becomes gray. Because 5 has no more outgoing
edges, it becomes black. Then the search backtracks to 4, to 2, to 1, and
eventually back to 0. Notice that the orderings in which vertices become gray
and black are different. From the figure, the gray ordering is [0, 1, 2, 4, 3,
5], and the black ordering is [3, 5, 4, 2, 1, 0]. Therefore, it is necessary to
distinguish at least these three states in the depth-first graph search.

Three States Recursive Implementation We add an additional colors
list to track the color of each vertex, orders to track the ordering of turning
gray, and complete_orders to order vertices by when they turn
black, which happens after the recursive call in the code, once all of a
node's neighbors are finished.
def dfs(g, s, colors, orders, complete_orders):
    colors[s] = STATE.gray
    orders.append(s)
    for v in g[s]:
        if colors[v] == STATE.white:
            dfs(g, v, colors, orders, complete_orders)
    colors[s] = STATE.black
    complete_orders.append(s)
    return
Now, we try to call the function with the undirected cyclic graph in Fig. 11.6.
v = len(ucg)
orders, complete_orders = [], []
colors = [STATE.white] * v
dfs(ucg, 0, colors, orders, complete_orders)
Now, the orders and complete_orders will end up differently:
[0, 1, 2, 4, 3, 5] [3, 5, 4, 2, 1, 0]

Figure 11.10: The process of depth-first graph search in a directed graph.
The black arrows denote the relation of u and its unvisited neighbors v,
and the red arrows mark the backtrack edges.

Three States and Edges Depth-first graph search on a graph G = (V, E)
connects all vertices reachable from a given source into a depth-first
forest Gπ. Edges within Gπ are called tree edges; they are the edges
marked with black arrows in Fig. 11.11. The other edges in G can be
classified into three categories based on the depth-first forest Gπ:

Figure 11.11: Classification of edges: black marks tree edges, red marks back
edges, yellow marks forward edges, and blue marks cross edges.

1. Back edges connect a node back to one of its ancestors in the
depth-first forest Gπ. Marked as red edges in Fig. 11.11.

2. Forward edges point from a node to one of its descendants in the
depth-first forest Gπ. Marked as yellow edges in Fig. 11.11.

3. Cross edges point from a node to a previously visited node that is neither
an ancestor nor a descendant in the depth-first forest Gπ. Marked
as blue edges in Fig. 11.11.

We can decide the type of each edge during the DFS execution using the states:
for an edge (u, v), it depends on whether we have visited v before in the DFS
and, if so, on the relationship between u and v.

1. If v is WHITE, then the edge is a tree edge.

2. If v is GRAY, meaning u and v are both being visited, then the edge is a
back edge. In a directed graph, this indicates that we have met a cycle.

3. If v is BLACK, meaning v is finished, and start_time[u] < start_time[v],
then the edge is a forward edge.

4. If v is BLACK, but start_time[u] > start_time[v], then the edge
is a cross edge.

In an undirected graph, there is no forward edge or cross edge. Therefore, it
does not really need three colors; usually, we can simply mark a node as visited
or not visited.
The classification of edges provides important information about the graph:
e.g., if we detect a back edge in a directed graph, we have found a cycle.
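
As a minimal sketch of this classification rule, assuming the STATE colors used above and a discover list of discovery times as tracked by the timed dfs variant below:

def classify_edge(u, v, colors, discover):
    # Classify directed edge (u, v) at the moment DFS at u examines v.
    if colors[v] == STATE.white:
        return 'tree'
    if colors[v] == STATE.gray:
        return 'back'  # in a directed graph this signals a cycle
    # v is black, i.e., finished
    if discover[u] < discover[v]:
        return 'forward'
    return 'cross'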

Parenthesis Structure In either an undirected or a directed graph, the
discovery time, when the state goes from WHITE to GRAY, and the finish time,
when the state turns from GRAY to BLACK, have a parenthesis structure. We
modify dfs to track the time: a static variable t tracks the time, and
discover and finish record the first discovery and finish times.
The implementation is shown:
def dfs(g, s, colors):
    dfs.t += 1  # static variable
    colors[s] = STATE.gray
    dfs.discover[s] = dfs.t
    for v in g[s]:
        if colors[v] == STATE.white:
            dfs(g, v, colors)
    # complete
    dfs.t += 1
    dfs.finish[s] = dfs.t
    return

Now, we call the above function with the directed graph in Fig. 11.11.

v = len(dcg)
colors = [STATE.white] * v
dfs.t = -1
dfs.discover, dfs.finish = [-1] * v, [-1] * v
dfs(dcg, 0, colors)

The output for dfs.discover and dfs.finish is:

([0, 1, 2, 4, 3, 6], [11, 10, 9, 5, 8, 7])

From the dfs.discover and dfs.finish lists, we can generate a new merged
list, merge_orders, that arranges nodes in order of their discovery and
finish times. The code is:
def parenthesis(dt, ft, n):
    merge_orders = [-1] * 2 * n
    for v, t in enumerate(dt):
        merge_orders[t] = v
    for v, t in enumerate(ft):
        merge_orders[t] = v

    print(merge_orders)
    nodes = set()
    for i in merge_orders:
        if i not in nodes:
            print('(', i, end=', ')
            nodes.add(i)
        else:
            print(i, '),', end=' ')

The output is:

[0, 1, 2, 4, 3, 3, 5, 6, 6, 5, 4, 2, 1, 0]
( 0, ( 1, ( 2, ( 4, ( 3, 3 ), ( 5, ( 6, 6 ), 5 ), 4 ), 2 ), 1 ), 0 ),

We can easily see that ordering the nodes according to their discovery and
finish times yields a well-defined expression, in the sense that the
parentheses are properly nested.

Questions to ponder:
• Implement the iterative version of the recursive code.

11.3.4 Breadth-first Graph Search


We already know how to implement BFS in both the tree search and graph
search versions. In this section, we first exemplify the state-change
process of BFGS with the example shown in Fig. 11.10. Second, we focus on
proving that within the breadth-first graph search tree, a path between the
root and any other node is a shortest path.

Three States Iterative Implementation When a node is first put into
the frontier set, it is marked with the gray color. A node is complete only
when all its adjacent nodes have turned gray or black. With the visiting
ordering of the breadth-first graph search, the state change of nodes in the
search process is shown in Fig. 11.12. The Python code is:
def bfgs_state(g, s):
    n = len(g)  # renamed from v to avoid shadowing the loop variable below
    colors = [STATE.white] * n

    q, orders = [s], [s]
    complete_orders = []
    colors[s] = STATE.gray  # mark the state of the visiting node
    while q:
        u = q.pop(0)
        for v in g[u]:
            if colors[v] == STATE.white:
                colors[v] = STATE.gray
                q.append(v)
                orders.append(v)

        # complete
        colors[u] = STATE.black
        complete_orders.append(u)
    return orders, complete_orders

Figure 11.12: The process of breadth-first graph search. The black arrows
denote the relation of u and its unvisited neighbors v, and the red arrows
mark the backtrack edges.

The printout of orders and complete_orders is:

([0, 1, 2, 4, 3, 5], [0, 1, 2, 4, 3, 5])

Properties In breadth-first graph search, the discovery and finishing
times differ for each node, but the discovery ordering and the finishing
ordering of the nodes are the same.

Shortest Path

Applications
The common problems that can be solved by BFS are those that need only one
solution: the best one, such as getting the shortest path. As we will learn
later, breadth-first search is commonly used as the archetype for solving
graph optimization problems, such as Prim's minimum-spanning-tree algorithm
and Dijkstra's single-source shortest-paths algorithm.

11.4 Tree Traversal


11.4.1 Depth-First Tree Traversal

Figure 11.13: Exemplary Binary Tree

Introduction
Depth-first search starts at the root node and continues branching down a
particular path; it selects the child node at the deepest level of the tree
from the frontier to expand next and defers the expansion of this node's
siblings. Only when the search hits a dead end (a node that has no child)
does the search “backtrack” to its parent node and continue to branch
down the other siblings that were deferred. A tree can be traversed
recursively: we print the value of the current node, then apply recursive
calls on the left and right children; by treating each node as a subtree, a
recursive call on a node naturally handles the traversal of that
subtree. The code is quite straightforward:
def recursive(node):
    if not node:
        return
    print(node.val, end=' ')
    recursive(node.left)
    recursive(node.right)

Now, we call this function on the tree shown in Fig. 11.13; the output,
which indicates the traversal order, is:

1 2 4 5 3 6
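
For reproducibility, here is a minimal sketch of a node class and of the exemplary tree of Fig. 11.13, reconstructed from the traversal orders printed in this section; the class name TreeNode is our assumption:

class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

# Reconstructed tree: preorder 1 2 4 5 3 6, inorder 4 2 5 1 3 6
root = TreeNode(1)
root.left, root.right = TreeNode(2), TreeNode(3)
root.left.left, root.left.right = TreeNode(4), TreeNode(5)
root.right.right = TreeNode(6)
recursive(root)  # prints: 1 2 4 5 3 6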

Three Types of Depth-first Tree Traversal

Figure 11.14: Left: PreOrder, Middle: InOrder, Right: PostOrder. The red
arrows mark the traversal ordering of nodes.

The visiting order between the current node, its left child, and its
right child decides the following types of recursive tree traversal:

(a) Preorder traversal, with ordering [current node, left child,
right child]: it visits the nodes in the tree in the ordering [1, 2, 4, 5,
3, 6]. In our example, the recursive function first prints the root node 1,
then goes to its left child, which prints out 2. Then it goes to node 4.
From node 4, it next moves to its left child, which is empty; this terminates
that recursive call, and the recursion backtracks to node 4. Since node 4
has no right child, it further backtracks to node 2, and then checks 2's
right child 5. The same process as at node 4 happens at node 5. It backtracks
to node 2, backtracks to node 1, then visits its right child 3, and the
process goes on. We draw out this process in Fig. 11.14.

(b) Inorder traversal, with ordering [left child, current node, right
child]: it traverses the nodes in the ordering [4, 2, 5, 1, 3, 6]. Three
segments appear in the inorder traversal for a root node: nodes
in the left subtree, the root, and nodes in the right subtree.

(c) Postorder traversal, with ordering [left child, right child,
current node]: it traverses the nodes in the ordering [4, 5, 2, 6, 3,
1].

We offer the code for the inorder traversal:

def inorder_traversal(node):
    if not node:
        return
    inorder_traversal(node.left)
    print(node.val, end=' ')
    inorder_traversal(node.right)

Try to work out the other two orderings, [left child, current node,
right child] and [left child, right child, current node], by hand
first, and then write the code to see if you got them right.

Return Values
Here, we want to do the task in a different way: we do not want to just
print out the visiting order, but instead write the ordering into a list and
return this list. How would we do it? The process is the same, except that
we need to return something (not None, which is the default in Python). For
an empty node, it shall return an empty list []; if there is only
one node, it returns [1] instead.
Let us use preorder traversal as an example. To make it easier to
understand, the same queen this time wants to do the same job in a different
way: she wants to gather all the data from these different states into her
own hands. This time, she assumes the two generals A and B will each return a
list for their subtree, safe and sound. Her job is to combine the list
returned from the left subtree, her own data, and the list returned from the
right subtree. Therefore, the left general brings back A = [2, 4, 5], and the
right general brings back B = [3, 6]. The final result will be queen + A + B =
[1, 2, 4, 5, 3, 6]. The Python code is given:
def PreOrder(root):
    if root is None:
        return []
    left = PreOrder(root.left)
    right = PreOrder(root.right)
    ans = [root.val] + left + right
    return ans

An Example of Divide and Conquer Being able to understand the returned
values and combine them is exactly the method of divide and conquer,
one of the fundamental algorithm design principles. This is a seemingly
trivial change, but it approaches problem solving from a totally different
angle: from atomic searching to divide and conquer, which highlights the
structure of the problem. The printing traversal and the returning traversal
represent two types of problem solving: the first is through searching,
treating each node separately; the second is through reduce and
conquer, reducing the problem to a series of smaller subproblems (subtrees,
where the smallest are empty subtrees) and constructing the result from the
current problem and the solutions of the subproblems.

Complexity Analysis
It is straightforward to see that we visit each node twice, once in the
forward pass and once in the backward pass of the recursive call, making
the time complexity linear in the total number of nodes, O(n). The other way
is through the recurrence relation: T(n) = 2 × T(n/2) + O(1),
which gives O(n) too.

11.4.2 Iterative Tree Traversal

From Chapter Iteration and Recursion, we know that a recursive function
might suffer from stack overflow, and in Python the default recursion depth
limit is 1000. In this section, we explore the iterative tree traversals
corresponding to preorder, inorder, and postorder tree traversal. Recursion
is implemented implicitly with the call stack; therefore, our iterative
counterparts all use an explicit stack data structure to mimic the
recursive behavior.

Figure 11.15: The process of iterative preorder tree traversal.

Simple Iterative Preorder Traversal If we know how to implement
DFS iteratively with a stack on a graph, we know our iterative preorder
traversal. In this version, the stack saves all our frontier nodes.

• At first, we start from the root and put it into the stack; this is node 1
in our example.

• Our frontier set has only one node, so we pop out node 1
and expand the frontier set. When we expand node 1, we add
its children to the frontier set by pushing them into the stack. In
the preorder traversal, the left child should be expanded first from the
frontier stack, meaning we should push the left child into the stack
after the right child is pushed. Therefore, we add nodes 3 and
2 into the stack.

• We continue step 2. Each time, we expand the frontier stack by pushing
the top node's children into the stack after popping out
this node. This way, we use the last-in, first-out ordering of the stack
data structure to replace the recursion.

We illustrate this process in Fig. 11.15. The code is shown as:


def PreOrderIterative(root):
    if root is None:
        return []
    res = []
    stack = [root]
    while stack:
        tmp = stack.pop()
        res.append(tmp.val)
        if tmp.right:
            stack.append(tmp.right)
        if tmp.left:
            stack.append(tmp.left)
    return res

Figure 11.16: The process of iterative postorder tree traversal.



Simple Iterative Postorder Traversal Similar to the above preorder
traversal, the postorder is the ordering in which nodes finish expanding
both their left and right subtrees, thus following the ordering left subtree,
right subtree, root. In the preorder traversal, we obtained the ordering
root, left subtree, right subtree. If we reverse that ordering, it becomes
right subtree, left subtree, root. This ordering differs from the postorder
only by a swap between the left and right subtrees. So we can use the same
process as in the preorder traversal, but expand a node's children in the
order left child then right child instead of right then left. The reversed
ordering of the items popped out is then the postorder traversal ordering.
The process is shown in Fig. 11.16. The Python implementation is:
Python implementation is shown as:
def PostOrderIterative(root):
    if root is None:
        return []
    res = []
    stack = [root]
    while stack:
        tmp = stack.pop()
        res.append(tmp.val)
        if tmp.left:
            stack.append(tmp.left)
        if tmp.right:
            stack.append(tmp.right)
    return res[::-1]

General Iterative Preorder and Inorder Traversal In the depth-first
traversal, we always branch down via the left child of the node at the
deepest level in the frontier. The branching stops only when it can no longer
find a left child for the deepest node in the frontier. Only then does it
look at expanding the right child of this deepest node, and if no such
right child exists, it backtracks to the parent node and continues to check
its right child to continue the branching-down process.
Inspired by this process, we use a pointer, say cur, pointing to the root
node of the tree, and we prepare an empty stack. The iterative process is:

• The branching-down process is implemented by visiting the cur
node and pushing it into the stack, and then setting cur = cur.left,
so that it keeps deepening down.

• When one branching-down process terminates, we pop a node from the
stack and set cur = node.right, so that we extend the branching
process to its right sibling.

Figure 11.17: The process of iterative tree traversal.

We illustrate this process in Fig. 11.17. The ordering of items pushed into
the stack is the preorder traversal ordering, which is [1, 2, 4, 5, 3, 6], and
the ordering of items popped out of the stack is the inorder traversal
ordering, which is [4, 2, 5, 1, 3, 6].

Implementation We use two lists, preorders and inorders, to save the
traversal orders. The Python code is:
def iterative_traversal(root):
    stack = []
    cur = root
    preorders = []
    inorders = []
    while stack or cur:
        while cur:
            preorders.append(cur.val)
            stack.append(cur)
            cur = cur.left
        node = stack.pop()
        inorders.append(node.val)
        cur = node.right
    return preorders, inorders
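
A quick usage sketch on the exemplary tree built earlier:

print(iterative_traversal(root))
# ([1, 2, 4, 5, 3, 6], [4, 2, 5, 1, 3, 6])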

11.4.3 Breath-first Tree Traversal

Figure 11.18: The breadth-first traversal order.

Instead of traversing the tree recursively, deepening down each time, the
alternative is to visit nodes level by level, as illustrated in Fig. ?? for our
exemplary binary tree. We first visit the root node 1, and then its children
2 and 3. Next, we visit 2's and 3's children in order, going to nodes 4, 5, and
6. This type of level order tree traversal uses the breadth-first search
strategy, which differs from the depth-first search strategy we have covered.
As we see in the example, the root node is expanded first, then all successors
of the root node are expanded next, and so on, following a level-by-level
ordering. We can also spot the rule: the nodes that come first get expanded
first. For example, 2 is visited before 3, so we expand 2's children first,
giving us 4 and 5; next, we expand 3's children. This first come, first
expanded rule tells us we can rely on a queue to implement BFS.

Simple Implementation We start from the root, say it is our first level,
and put it in a list named nodes_same_level. Then we use a while loop; in
each iteration we visit all children of the nodes in nodes_same_level from
the last level. We put all these children in a temporary list temp, and before
the loop body ends, we assign temp to nodes_same_level. At the deepest level,
no more children are found, temp is left empty, and the while loop terminates.
def LevelOrder(root):
    if not root:
        return
    nodes_same_level = [root]
    while nodes_same_level:
        temp = []
        for n in nodes_same_level:
            print(n.val, end=' ')
            if n.left:
                temp.append(n.left)
            if n.right:
                temp.append(n.right)
        nodes_same_level = temp

The above outputs the following for our exemplary binary tree:

1 2 3 4 5 6

Implementation with Queue As we discussed, we can use a FIFO queue
to save the nodes waiting for expansion. In this case, in each while iteration
we only handle the one node at the front of the queue.
def bfs(root):
    if not root:
        return
    q = [root]
    while q:
        node = q.pop(0)  # get node at the front of the queue
        print(node.val, end=' ')
        if node.left:
            q.append(node.left)
        if node.right:
            q.append(node.right)

11.5 Informed Search Strategies**

11.5.1 Best-first Search

Best-first search is a search algorithm that explores a graph by expanding
the most promising node chosen according to a specified rule. The degree
of promise of a node is described by a heuristic evaluation function
f(n), which, in general, may depend on the description of the node n, the
description of the goal, the information gathered by the search up to
that point, and, most importantly, any extra knowledge about the problem
domain.
Breadth-first search fits as a special case of best-first search when the
objective of the problem is to find the shortest path from the source to the
other nodes in the graph; it uses the estimated distance to the source as the
heuristic function. At the start, the only node in the frontier set is the
source node; we expand this node and add all of its unexplored neighboring
nodes to the frontier set, each with distance 1. Now, among all nodes in the
frontier set, we choose the most promising node to expand. In this case,
since they all have the same distance, expanding any of them is fine. Next,
we add nodes that have f(n) = 2 to the frontier set, choosing any node with
the smaller distance.
A generic best-first search needs a priority queue for its implementation,
instead of the FIFO queue used in breadth-first search.
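
Below is a minimal sketch of a generic best-first search, using Python's heapq module as the priority queue. The interface is our assumption: the heuristic f is passed in as a function, and the backtrace helper from earlier in this chapter is reused.

import heapq

def best_first_search(g, s, t, f):
    # f(n): heuristic score of node n; smaller means more promising.
    frontier = [(f(s), s)]
    parent, visited = {}, {s}
    while frontier:
        _, n = heapq.heappop(frontier)  # expand the most promising node
        if n == t:
            return backtrace(s, t, parent)
        for v in g[n]:
            if v not in visited:
                visited.add(v)
                parent[v] = n
                heapq.heappush(frontier, (f(v), v))
    return None

With f(n) set to the node's estimated distance to the source, this behaves like the breadth-first special case described above.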

11.5.2 Hands-on Examples


Triangle (L120)

Given a triangle, find the minimum path sum from top to bottom. Each
step you may move to adjacent numbers on the row below.

Example:
Given the following triangle:

[
     [2],
    [3,4],
   [6,5,7],
  [4,1,8,3]
]

The minimum path sum from top to bottom is 11 (i.e., 2 + 3 + 5 + 1 = 11).

Analysis Solution: first, we can use DFS to traverse as the problem requires,
and use a nonlocal variable to save the minimum value. The time complexity
for this is O(2^n). When we try to submit this code, we get a TLE (Time Limit
Exceeded) error. The code is as follows:

import sys

def min_path_sum(t):
    '''
    Purely complete search
    '''
    min_sum = sys.maxsize
    def dfs(i, j, cur_sum):
        nonlocal min_sum
        # edge case
        if i == len(t) or j == len(t[i]):
            # gather the sum
            min_sum = min(min_sum, cur_sum)
            return
        # only two edges/choices at this step
        dfs(i + 1, j, cur_sum + t[i][j])
        dfs(i + 1, j + 1, cur_sum + t[i][j])
    dfs(0, 0, 0)
    return min_sum
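
A quick sanity check on the example triangle:

print(min_path_sum([[2], [3, 4], [6, 5, 7], [4, 1, 8, 3]]))  # 11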

11.6 Exercises
11.6.1 Coding Practice
Property of Graph

1. 785. Is Graph Bipartite? (medium)

2. 261. Graph Valid Tree (medium)

3. 797. All Paths From Source to Target (medium)


12

Combinatorial Search

So far, we have learned the most fundamental search strategies on general
data structures such as arrays, linked lists, graphs, and trees. In this
chapter, instead of searching explicit and well-defined data structures, we
extend the discussion to more exhaustive search algorithms that can solve
rather obscure and challenging combinatorial problems, such as sudoku and
the famous Traveling Salesman Problem. For combinatorial problems, we have
to figure out the potential search space and rummage through it for a solution.

12.1 Introduction
A combinatorial search problem consists of n items and a requirement to
find a solution, i.e., a set of L < n items that satisfies specified conditions
or constraints. For example, in a sudoku problem, a 9 × 9 grid is partially
filled with numbers between 1 and 9; we fill the empty spots with numbers that
satisfy the following conditions:

1. Each row has all numbers from 1 to 9.

2. Each column has all numbers from 1 to 9.

3. Each sub-grid (3 × 3) has all numbers from 1 to 9.

This sudoku together with one possible solution is shown in Fig. 12.1. In
this case, we have 81 items, and we are required to fill 51 empty spots under
the above three constraints.


Figure 12.1: A Sudoku puzzle and its solution

Model Combinatorial Search Problems We can model a combinatorial
search solution as a vector s = (s0, s1, ..., sL−1), where each variable si
is selected from a finite set A, called the domain of the variable.
Such a vector might represent an arrangement where si contains the i-th
item of a permutation; in a combination problem, a boolean denoting whether
the i-th item is selected; or a path in a graph or a sequence of moves in a
game. In the sudoku problem, each si can be chosen from a number in the
range [1, 9].

Problem Categories Combinatorial search problems arise in many areas
of computer science, such as artificial intelligence, operations research,
bioinformatics, and electronic commerce. These problems typically involve
finding a grouping, ordering, or assignment of a discrete, finite set of
objects that satisfies given conditions or constraints. We introduce two
well-studied types of problems that are likely to be NP-hard and of at least
exponential complexity:

1. Constraint Satisfaction Problems (CSPs) are mathematical questions
defined as a set of variables whose state must satisfy a number of
constraints or limitations (mathematical equations or inequations), such
as sudoku, N-queens, map coloring, crosswords, and so on. The size
of the search space of CSPs can be roughly given as:

O(c d^L) (12.1)

where there are L variables, each with domain size d, and there are c
constraints to check.

2. Combinatorial optimization problems consist of searching for the maxima
or minima of an objective function F whose domain is a discrete but
large configuration space. Some classic examples are:

• Traveling Salesman Problem (TSP): given the positions (x, y) of n
different cities, find the shortest possible path that visits each
city exactly once.

• Integer Linear Programming: maximize a specified linear combination
of a set of integers X1, ..., Xn subject to a set of linear
constraints, each of the form:

a1 X1 + ... + an Xn ≤ c (12.2)

• Knapsack Problem: given a set of items, each with a weight
and a value, determine the number of each item to include in a
collection so that the total weight is less than or equal to a given
limit and the total value is as large as possible.

Search Strategies From Chapter Discrete Programming, we have learned
basic enumerative combinatorics, including counting principles and knowledge
of permutations, combinations, partitions, subsets, and subsequences.
Combinatorial search builds atop this subject; through different search
strategies such as depth-first search and best-first search, it
enumerates the search space and finds the solution(s) with the necessary
speedup methods. In this chapter, we only discuss complete search
and merely acknowledge the existence of approximate search techniques.
Backtracking is a depth-first based search process that “builds”
the search tree on the fly, incrementally, instead of having a tree/graph
structure beforehand to search through. Backtracking fits combinatorial
search problems because:

1. It is space efficient due to its use of DFS; candidates are built
incrementally, and their validity to fit a solution is checked right away.

2. It is time efficient in that some partial candidates can be pruned if the
algorithm determines that they will not lead to a final complete solution.

Because the ordering of the variables s0, ..., sL−1 can potentially affect the
size of the search space, backtracking search relies on one
or more heuristics to select which variable to consider next. Look-ahead is
one such heuristic, preferably applied to check the effects of choosing
a given variable to evaluate or to decide the order of values to give to it.
There are also breadth-first search based strategies that might work
better than backtracking: for combinatorial optimization problems,
best-first branch-and-bound search might be more efficient than its
depth-first counterpart.

Speedups The speedup methods are well studied in computer science,
and we list two general ways to prune unqualified or unnecessary branches
during the backtracking search:

1. Branch and Prune: This method prunes unqualified branches using the
constraints of the problem. It is usually applied to solve constraint
satisfaction problems (CSPs).

2. Branch and Bound: This method prunes unnecessary branches by
comparing an estimate for a partial candidate with the best global
solution found so far. If the estimate says that the partial candidate
can never lead us to a better solution, we cut off this branch. This
technique can be applied to solve general optimization problems, such
as the Traveling Salesman Problem (TSP), knapsack problems, and so on.

12.2 Backtracking
In this section, we first introduce the technique of backtracking, and then
demonstrate it by implementing the common enumerative combinatorics seen
in Chapter Discrete Programming.

12.2.1 Introduction
Backtracking search is an exhaustive search algorithm (depth-first search)
that systematically assigns all possible combinations of values to the
variables and checks whether these assignments constitute a solution.
Backtracking is all about choices and consequences, and it shows the
following two properties:

1. No Repetition and Completion: It is a systematic generating
method that enumerates all possible states at most once: it will
not miss any valid solution but avoids repetition. If “correct”
solution(s) exist, they are guaranteed to be found. This property makes
it ideal for solving combinatorial problems where the search space has
to be constructed and enumerated. Therefore, the worst-case running
time of backtracking search is exponential in the length of the state
(b^L, where b is the average number of choices for each variable in the
state).

2. Search Pruning: Along the way of working with partial solutions,
in some cases it is possible to decide whether they will lead to a valid
complete solution. As soon as the algorithm is confident that the
partial configuration is either invalid or nonoptimal, it abandons this
partial candidate, “backtracks” (returns to the upper level),
and resets to the upper level's state, so that the search process can
continue to explore the next branch for the sake of efficiency. This
is called search pruning, with which the algorithm, amortized, ends up
visiting each vertex less than once. This property makes backtracking
the most promising way to solve CSPs and combinatorial optimization
problems.

When solving the sudoku problem with the backtracking algorithm, each time
at a level in the DFS, it tries to extend the last partial solution
s = (s0, s1, ..., sk) by trying out all 9 numbers for sk+1; say we choose 1 at
this step. It tests the partial solution against the desired solution:

1. If the partial solution s = (s0, s1, ..., sk, 1) is still valid, move on to
the next level and work on trying out sk+2.

2. If the partial solution is invalid and cannot possibly lead to a complete
solution, it “backtracks” to the last level and resets the state to s =
(s0, s1, ..., sk), so that it can try other choices if some are left (in our
example, we would try sk+1 = 2) or keep “backtracking” to an even higher
level.

The process should become much clearer once we have studied the examples in
the following subsections.

12.2.2 Permutations
Given a list of items, generate all possible permutations of these items. If
the set has duplicated items, only enumerate all unique permutations.

No Duplicates (L46. Permutations)
When there are no duplicates, from Chapter Discrete Programming, we
know the number of all permutations is:

p(n, m) = n! / (n − m)! (12.3)

where m is the number of items we choose from the total n items to make
the permutations.
For example:
a = [1, 2, 3]
There are 6 total permutations:
[1, 2, 3], [1, 3, 2],
[2, 1, 3], [2, 3, 1],
[3, 1, 2], [3, 2, 1]

Analysis Let us apply the philosophy of the backtracking technique. We have
to build a state of length 3, and each variable in the state has three choices:
1, 2, and 3. The constraint here comes from permutation, which requires that
no two variables in the state have the same value. To build this
incrementally with backtracking, we start with an empty state []. At first,
we have three options, and we get three partial results: [1], [2], and [3].
Next, we handle the second variable in the state: for [1], we can choose
either 2 or 3, getting [1, 2] and [1, 3]; the same for [2], where we end up
with [2, 1] and [2, 3]; for [3], we have [3, 1] and [3, 2]. At last, each
partial result has only one option left, and we get all permutations as shown
in the example. We visualize this incremental building of candidates in
Fig. 12.2.

Figure 12.2: The search tree of permutation

However, we have only managed to enumerate the search space, not yet
systematically or recursively with the depth-first search process. With DFS,
we depict the traversal order of the vertices in the virtual search space with
red arrows in Fig. 12.2. The backward arrows mark the “backtracking” process,
where we have to reset the state to the upper level.

Implementation We use a list of booleans, used, to track which items are
used in the search process. n is the total number of items, d is the depth of
the depth-first search process, curr is the current state, and ans saves
all permutations. With the following code, we generate p(n, m):
def p_n_m(a, n, m, d, used, curr, ans):
    if d == m:  # end condition
        ans.append(curr[::])
        return

    for i in range(n):
        if not used[i]:
            # generate the next solution from curr
            curr.append(a[i])
            used[i] = True
            print(curr)
            # move to the next solution
            p_n_m(a, n, m, d + 1, used, curr, ans)
            # backtrack to previous partial state
            curr.pop()
            used[i] = False
    return

Check out the running process in the source code.
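
A usage sketch generating p(3, 3); the print inside the function also traces the partial states:

a = [1, 2, 3]
ans = []
p_n_m(a, len(a), len(a), 0, [False] * len(a), [], ans)
# ans == [[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]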

Figure 12.3: The search tree of permutation by swapping. The indexes of
items to be swapped are represented as a two-element tuple.

Alternative: Swapping Method We first start with a complete state,
s = [1, 2, 3] in our case. By swapping 1 and 2, we get [2, 1, 3], and
[2, 3, 1] can be obtained by swapping 1 and 3 on top of [2, 1, 3]. With all
permutations as leaves in the search space, the generating process is similar
to Fig. 12.2. We show this alternative process in Fig. 12.3. At first, we swap
index 0 with all indexes, including 0, 1, and 2. At the second layer,
we move on to swap index 1 with all successive indexes, and so on for
all other layers. The Python code is:
ans = []
def permutate(a, d):
    global ans
    if d == len(a):
        ans.append(a[::])
    for i in range(d, len(a)):
        a[i], a[d] = a[d], a[i]
        permutate(a, d + 1)
        a[i], a[d] = a[d], a[i]
    return

The Johnson–Trotter algorithm also utilizes such a swapping method; it
avoids recursion and instead computes the permutations with an iterative
method.

With Duplicates (47. Permutations II)

We already know that p(n, n) is further decided by the duplicates
within the n items. Assume that a total of d items are repeated, and
each such item is repeated xi times; then the number of all arrangements
pd(n, n) is:

pd(n, n) = p(n, n) / (x0! x1! ... x_{d−1}!) (12.4)

w.r.t. Σ_{i=0}^{d−1} xi ≤ n (12.5)

For example, when a = [1, 2, 2, 3], there are 4!/2! unique permutations, which
is 12 in total, listed below:
[1, 2, 2, 3], [1, 2, 3, 2], [1, 3, 2, 2],
[2, 1, 2, 3], [2, 1, 3, 2], [2, 2, 1, 3],
[2, 2, 3, 1], [2, 3, 1, 2], [2, 3, 2, 1],
[3, 1, 2, 2], [3, 2, 1, 2], [3, 2, 2, 1]

Figure 12.4: The search tree of permutation with repetition

Analysis The enumeration of all these possible permutations can be
obtained with backtracking exactly as if there were no duplicates.
However, this is not efficient, since it doubles the search space with
repeated permutations. Here comes our first application of the branch and
prune method: we avoid repetition by pruning off redundant branches.
One main advantage of backtracking is not saving intermediate states;
thus we should find a mechanism that avoids generating these repeated
states in the first place. One solution is to sort all n items, making all
repeated items adjacent to each other. We know the current intermediate
state is redundant by simply comparing the item with its predecessor: if
they are equal, we skip building the state with this item and move on to the
next item in line. The search tree of our example is shown in Fig. 12.4.

Implementation The implementation is highly similar to the previous
standard permutation code, apart from three differences:

1. The items are sorted before permutate is called.

2. A simple condition check avoids generating repeated states.

3. We use a dictionary data structure, tracker, which has all unique
items as keys and each item's occurrence count as its value, to
replace the boolean vector used, for slightly better space efficiency.
The Python code is as:

from collections import Counter

def permuteDup(nums, k):
    ans = []
    def permutate(d, n, k, curr, tracker):
        nonlocal ans
        if d == k:
            ans.append(curr)
            return
        for i in range(n):
            if tracker[nums[i]] == 0:
                continue
            if i - 1 >= 0 and nums[i] == nums[i - 1]:
                continue
            tracker[nums[i]] -= 1
            curr.append(nums[i])

            permutate(d + 1, n, k, curr[:], tracker)
            curr.pop()
            tracker[nums[i]] += 1
        return

    nums.sort()
    permutate(0, len(nums), k, [], Counter(nums))
    return ans
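
A usage sketch reproducing the 12 unique permutations listed above:

print(permuteDup([1, 2, 2, 3], 4))  # 12 unique length-4 permutations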

Can you extend the swap-method-based permutation to
handle duplicates?

Discussion
From the example of permutation, we have demonstrated how backtracking
constructs candidates within an implicit search tree structure: the root
node is the initial state, internal nodes represent intermediate states,
and all leaves are our candidates, of which in this case there are n! for the
p(n, n) permutation. In this subsection, we point out its unique properties
and its computational and space complexities.

Two Passes Backtracking builds an implicit search tree on the fly, and
it does not memorize any intermediate state. It visits the vertices in the
search tree in two passes:

1. Forward pass: it builds the solution incrementally and reaches the
leaf nodes in a DFS fashion. One example of a forward pass is
[] -> [1] -> [1, 2] -> [1, 2, 3].

2. Backward pass: while returning from the recursion of DFS, it
backtracks to the previous state. One example of a backward pass is
[1, 2, 3] -> [1, 2] -> [1].

The change of curr in the source code indicates all vertices and the process
of backtracking; it starts with [] and ends with []. This is the core
characteristic of backtracking. We print out the process for the example as:
[]->[1]->[1, 2]->[1, 2, 3]->backtrack: [1, 2]
backtrack: [1]
[1, 3]->[1, 3, 2]->backtrack: [1, 3]
backtrack: [1]
backtrack: []
[2]->[2, 1]->[2, 1, 3]->backtrack: [2, 1]
backtrack: [2]
[2, 3]->[2, 3, 1]->backtrack: [2, 3]
backtrack: [2]
backtrack: []
[3]->[3, 1]->[3, 1, 2]->backtrack: [3, 1]
backtrack: [3]
[3, 2]->[3, 2, 1]->backtrack: [3, 2]
backtrack: [3]
backtrack: []

Time Complexity of Permutation In the search tree of permutation
in Fig. 12.2, there are in total |V| nodes, which equals Σ_{k=0}^{n} p(n, k).
Because in a tree the number of edges |E| is |V| − 1, the time complexity
O(|V| + |E|) is the same as O(|V|). Since p(n, n) alone takes n! time,
permutation is an NP-hard problem.

Space Complexity A standard depth-first search consumes O(bd) space
in the worst case, where b is the branching factor and d is the depth of
the search tree. In combinatorial search problems, depth and
branching are usually decided by the total number of variables in the state,
making b ~ d ~ n; thus, backtracking has space complexity O(n^2). However, in
standard DFS, the input tree or graph data structure is given and
not attributed to the space complexity. For an NP-hard combinatorial search
problem, this input is often exponential. Backtracking search outcompetes
standard DFS by avoiding such space consumption; it only keeps a
dynamic data structure (curr) to construct nodes on the fly.

12.2.3 Combinations
Given a list of n items, generate all possible combinations of these items. If
the input has duplicated items, only enumerate unique combinations.

No Duplicates (L78. Subsets)

From Chapter Discrete Programming, we count the powerset, all m-subsets
for m ∈ [0, n], as:

C(n, m) = P(n, m) / P(m, m) = n! / ((n − m)! m!) (12.6)

For example, when a = [1, 2, 3], there are in total 8 m-subsets:
C(3, 0): []
C(3, 1): [1], [2], [3]
C(3, 2): [1, 2], [1, 3], [2, 3]
C(3, 3): [1, 2, 3]

Figure 12.5: The Search Tree of Combination.

Analysis We could simply reuse the method of permutation, but it would
generate lots of duplicates; for example, P(3, 2) includes [1, 2] and [2, 1],
which are indeed the same subset. Of course, we could check
redundancy against the saved m-subsets, but that is not ideal. A systematic
solution that avoids duplicates all along is preferred: if we limit the items
we put into the m-subsets to be strictly increasing (in index or in value),
then [2, 1], [3, 1], and [3, 2] will never be generated. The enumeration
of combinations through backtracking search is shown in Fig. 12.5.

Implementation Two modifications based on the permutation code:

1. The for loop: in the loop iterating over all possible candidates, we limit
the candidates to those with larger indexes only.

2. We do not need a data structure to track the state of each candidate,
because any candidate with a larger index is a valid candidate.

We use start to track the starting position of valid candidates. The code
of combination is:
of combination is:
1 d e f C_n_k( a , n , k , s t a r t , d , c u r r , ans ) :
2 i f d == k : #end c o n d i t i o n
3 ans . append ( c u r r [ : : ] )
4 return
5
6 f o r i in range ( s t a r t , n) :
7 c u r r . append ( a [ i ] )
8 C_n_k( a , n , k , i +1 , d+1, c u r r , ans )
9 c u r r . pop ( )
10 return
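
A usage sketch enumerating the whole powerset of [1, 2, 3] by varying k:

a, ans = [1, 2, 3], []
for k in range(len(a) + 1):
    C_n_k(a, len(a), k, 0, 0, [], ans)
# ans == [[], [1], [2], [3], [1, 2], [1, 3], [2, 3], [1, 2, 3]]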

Alternative: 0 and 1 Selection We have discussed the powerset written
as P(S). With each item either appearing or not appearing in the
resulting set, the value set is {0, 1}, resulting in |P(S)| = 2^n. Following
this pattern with our given example, we can alternatively generate the
powerset like this:

s   sets
1   {1}, {}
2   {1, 2}, {1}, {2}, {}
3   {1, 2, 3}, {1, 2}, {1, 3}, {1}, {2, 3}, {2}, {3}, {}

This process can be better visualized as a tree, as in Fig. ??. This process
results in 2^n leaves; compared with our previous implementation, which has a
total of 2^n nodes, it is slightly less efficient. The code is:
def powerset(a, n, d, curr, ans):
    if d == n:
        ans.append(curr[::])
        return

    # Case 1: select item
    curr.append(a[d])
    powerset(a, n, d + 1, curr, ans)
    # Case 2: do not select item
    curr.pop()
    powerset(a, n, d + 1, curr, ans)
    return

Time Complexity The total number of nodes within the implicit search space
of combination shown in Fig. 12.5 is Σ_{k=0}^{n} C(n, k) = 2^n, as explained in
Chapter Discrete Programming. Thus, the time complexity of enumerating
the powerset is O(2^n), which is less than the O(n!) that comes with
permutation.

Space Complexity Similarly, combination with backtracking search uses
slightly less space, but we can still claim the upper bound O(n^2).

With Duplicates (L90. Subsets II)

Assume we have m unique items, and the frequency of each is marked xi,
with Σ_{i=0}^{m−1} xi = n. Then:

Σ_{k=0}^{n} c(n, k) = Π_{i=0}^{m−1} (xi + 1) (12.7)

For example, when a = [1, 2, 2, 3], there are 2 × 3 × 2 = 12 combinations
in the powerset, listed below:
[], [1], [2], [3], [1, 2], [1, 3], [2, 2], [2, 3],
[1, 2, 2], [1, 2, 3], [2, 2, 3],
[1, 2, 2, 3]

However, counting c(n, k) with duplicates in the input relies on the specific
input and the distribution of its items. We are still able to count by
enumerating with backtracking search.

Analysis and Implementation The enumeration of the powerset with
backtracking search handles duplicates the same way as the enumeration of
permutations with duplicates. We first sort the items in
increasing order of value. Then we replace the for loop in the above
code with the following snippet to handle repeated items from
the input:
for i in range(start, n):
    if i - 1 >= start and a[i] == a[i - 1]:
        continue
    ...

12.2.4 More Combinatorics

In this section, we supplement more use cases of backtracking search on
other types of combinatorics.

Figure 12.6: Acyclic graph

All Paths in Graph

For a given acyclic graph, enumerate all paths from a starting vertex s. For
example, for the graph shown in Fig. 12.6 and starting vertex 0, print out
the following paths:
0, 0->1, 0->1->2, 0->1->2->5, 0->1->3, 0->1->4, 0->2, 0->2->5

Analysis The backtracking search here is the same as applying a DFS
on an explicit graph, with one extra point: a state path, which might
hold up to n items (the total number of vertices in the graph). In the
implementation, the path vector is dynamically modified to track all paths
constructed as the DFS goes. The code is offered as:
def all_paths(g, s, path, ans):
    ans.append(path[::])
    for v in g[s]:
        path.append(v)
        all_paths(g, v, path, ans)
        path.pop()

You can run the above code in Google Colab to see how it works on our
given example.
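
A driver sketch; the adjacency list below is our reconstruction of Fig. 12.6, inferred from the paths listed above (edges 0->1, 0->2, 1->2, 1->3, 1->4, 2->5), not the book's original definition:

g = {0: [1, 2], 1: [2, 3, 4], 2: [5], 3: [], 4: [], 5: []}
ans = []
all_paths(g, 0, [0], ans)
# ans == [[0], [0, 1], [0, 1, 2], [0, 1, 2, 5], [0, 1, 3], [0, 1, 4],
#         [0, 2], [0, 2, 5]]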

Subsequences (940. Distinct Subsequences II)

Given a string, list all unique subsequences. There may or may not be
duplicated characters in the string. For example, when s = '123', there are
in total 8 subsequences, which are:
'', '1', '2', '3', '12', '13', '23', '123'

When s = '1223', which comes with duplicates, there are 12 subsequences:

'', '1', '2', '3', '12', '13', '22', '23',
'122', '123', '223',
'1223'

Analysis From Chapter Discrete Programming, we have explained that
we can count the number of unique subsequences through a recurrence relation,
and we pointed out the relation of subsequences to subsets (combinations).
Let the number of unique subsequences of a sequence be seq(n) and the
number of unique subsets of a set be set(n), with n items in the input.
Every subset corresponds to a subsequence, so the subsequence set has at
least the cardinality of the subsets: |seq(n)| ≥ |set(n)|. From the above
example, we can also see that when there are only unique items in the sequence,
or when there are duplicates but all duplicates of an item are adjacent to
each other:

• The cardinalities of subsequences and subsets are equal: |seq(n)| = |set(n)|.

• The subsequences and subsets share the same items when the ordering
within the subsequences is ignored.

This indicates that the process of enumerating subsequences is almost the
same as enumerating a powerset. This should give us a good start.

Implementation However, if we change the ordering of the duplicated
characters in the above string to s = '1232', there are in total 14
subsequences instead:
'', '1', '2', '3', '12', '13', '23', '22', '32',
'123', '122', '132', '232',
'1232'

Therefore, our code to handle duplicates should differ from that of a pow-
erset. In the case of powerset, the algorithm first sorts items so that all
duplicates are adjacent to each other, making the checking of repetition as
simple as checking the equality of item with its predecessor. However, in a
given sequence, the duplicated items are not adjacent most of the time, we
have to do things differently. We draw the search tree of enumerating all
subsequences of string “1232” in Fig. 12.7. From the figure, we can observe
that to avoid redundant branches, we simply check if a current new item in
the subsequence is repeating by comparing it with all of its predecessors in
range [s, i]. The code for checking repetition is as:
def check_repetition(start, i, a):
    for j in range(start, i):
        if a[i] == a[j]:
            return True
    return False

And the code to enumerate subsequences is:

def subseqs(a, n, start, d, curr, ans):
    ans.append(''.join(curr[::]))
    if d == n:
        return
    for i in range(start, n):
        if check_repetition(start, i, a):
            continue
        curr.append(a[i])
        subseqs(a, n, i + 1, d + 1, curr, ans)
        curr.pop()
    return

Figure 12.7: The search tree of subsequences. The red circled nodes are
redundant nodes. Each node has a variable s to indicate the starting index
of candidates to add to the current subsequence; i indicates the candidate
added at that node.
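As a sanity check (a small driver of ours), running it on '1232' reproduces the 14 subsequences listed above:

ans = []
subseqs(list('1232'), 4, 0, 0, [], ans)
print(len(ans))  # 14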

12.2.5 Backtracking in Action


So far, we have applied backtracking search to enumerate combinatorics.
In this section, we shall see how backtracking search, along with search-pruning
speedup methods, solves two types of challenging NP-hard problems:
Constraint Satisfaction Problems (CSPs) and Combinatorial Optimization
Problems (COPs).
We have briefly introduced the speedup methods needed to solve larger-scale
CSPs and COPs. For example, assume that within the virtual search tree
the algorithm is currently at level 2 with state s = [s_0, s_1]. If there are c
choices for state s_1, and one choice is verified to be invalid, this prunes
off 1/c of the whole search space. In this section, we demonstrate backtracking
search armed with the Branch and Prune method solving CSPs, and Branch
and Bound solving COPs.

12.3 Solving CSPs


Formally, a constraint satisfaction problem (CSP) consists of a set of n variables,
each denoted as s_i, i ∈ [0, n − 1]; their respective value domains, each
denoted as d_i; and a set of m constraints, each denoted as c_j, j ∈ [0, m − 1]. A
solution to a CSP is an assignment of values to all the variables such that no
constraint is violated. A binary CSP is one in which each of the constraints
involves at most two variables. A CSP can be represented by a constraint
graph, which has a node for each variable and each constraint, and an arc
connecting each variable node contained in a constraint to the corresponding
constraint node.
We explain a few strategies from the CSP-solver's arsenal that can
potentially speed up the process:

1. Forward Checking: The essential idea is that when a variable X is
instantiated with a value x from its domain, the domain of each
future uninstantiated variable Y is examined. If a value y is found such
that X = x conflicts with Y = y, then y is temporarily removed from
the domain of Y.

2. Variable Ordering: The order in which variables are considered while
solving a CSP can have a substantial impact on the search
space. One effective ordering is to always select the next variable with
the smallest remaining domain. In dynamic variable ordering, the
order of variables is determined as the search progresses, and it often
goes together with forward checking, which keeps updating the domains
of the uninstantiated variables. Selecting the variable with the minimal domain first
can pinpoint the solution quickly: the branch is still early on, and
pruning at this stage is more rewarding. Another way to reason about it: at
each step, when we multiply the domain size d_i into the cost, we
are adding the least expensive factor, making this a greedy approach.

Sudoku (L37)
A Sudoku grid, shown in Fig. 12.8, is an n² × n² grid, arranged into n² mini-grids
of size n × n each, containing the values 1, ..., n² such that no value is repeated
in any row, column, or mini-grid.

Search Space First, we analyze the number of distinct states in the search
space, which relies on how we construct the intermediate states and on our
knowledge of enumerative combinatorics. We discuss two different formulations
on the 9 × 9 grid:

1. For each empty cell in the puzzle, we create a set by taking the values
1, ..., 9 and removing from it those values that appear as a given in the
same row, column, or mini-grid as that cell. Assume we have m empty
spots and the size of the candidate set of spot i is c_i; an initial
estimation of the search space is:

T(n) = Π_{i=0}^{m−1} c_i    (12.8)

Figure 12.8: A Sudoku puzzle and its solution

2. Each row can be represented by a 9-tuple, and there are 9 rows in total,
resulting in nine 9-tuples to represent the search state. With c_i as the
number of non-given values in the i-th 9-tuple, there are c_i! ways of
ordering these values by permutation. The number of different states in
the search space is thus:

T(n) = Π_{i=0}^{8} c_i!    (12.9)

The two formulations take different approaches to the state
space, making their corresponding backtracking searches differ too. We mainly
focus on the first formulation with backtracking search.

Speedups Assume we know all the empty spots (variables) to fill in,
and we construct the search tree using backtracking. In our source code, we
ran an experiment comparing the effect of ordering variables with the minimal-domain-first
rule against an arbitrary ordering. The experiment shows that the
first method is more than 100 times faster than the second at solving our
exemplary Sudoku puzzle. Therefore, we decide to always select the variable
that has the smallest domain set to proceed next in the backtracking.
Further, we apply forward checking: for the current variable and a value
we are able to assign, we recompute the domain sets of all the remaining empty
spots, and use the updated domain sets to decide:

• whether this assignment leads to an empty domain for any of the other remaining
spots, in which case we terminate the search and backtrack;

• which spot to select next, using the ordering rule we chose.

Implementation We set aside state for all 9 rows, columns, and mini-grids:
two lists of length 9, row_state and col_state, and a 3 × 3 nested list
block_state. Each entry is a set() holding the numbers already filled in
that row, column, or mini-grid, respectively. There are two stages in the implementation:
1. Initialization: We scan each spot in the 9 × 9 grid to record
the states of the filled spots and to find all empty spots waiting to
be filled. With (i, j) denoting the position of a spot, it corresponds
to row_state[i], col_state[j], and block_state[i//3][j//3]. We
also write two functions to set and reset the state of one assignment
during backtracking. The Python code is as follows:
from copy import deepcopy

class Sudoku():
    def __init__(self, board):
        self.org_board = deepcopy(board)
        self.board = deepcopy(board)

    def init(self):
        self.A = set(range(1, 10))
        self.row_state = [set() for i in range(9)]
        self.col_state = [set() for i in range(9)]
        self.block_state = [[set() for i in range(3)] for i in range(3)]
        self.unfilled = []

        for i in range(9):
            for j in range(9):
                c = self.org_board[i][j]
                if c == 0:
                    self.unfilled.append((i, j))
                else:
                    self.row_state[i].add(c)
                    self.col_state[j].add(c)
                    self.block_state[i//3][j//3].add(c)

    def set_state(self, i, j, c):
        self.board[i][j] = c
        self.row_state[i].add(c)
        self.col_state[j].add(c)
        self.block_state[i//3][j//3].add(c)

    def reset_state(self, i, j, c):
        self.board[i][j] = 0
        self.row_state[i].remove(c)
        self.col_state[j].remove(c)
        self.block_state[i//3][j//3].remove(c)

2. Backtracking search with speedups: In the initialization, we also created
the variable A, which serves as the full domain set {1, ..., 9} of a spot.
To get the domain set of spot (i, j) according to the constraints, a simple set operation
is executed: A - (row_state[i] | col_state[j] | block_state[i//3][j//3]).
In the solver, to pick a spot each time, we first recompute the domains of all remaining
spots in unfilled and then choose the one with the minimal domain.
This process takes O(n) time, which is trivial compared with the cost of the
search: 9 operations to compute the domain set of a single spot, 9n for n
spots, plus another n to choose the one with the smallest
size. The solver is implemented as:
def _ret_len(self, args):
    i, j = args
    option = self.A - (self.row_state[i] | self.col_state[j] |
                       self.block_state[i//3][j//3])
    return len(option)

def solve(self):
    if len(self.unfilled) == 0:
        return True
    # Dynamic variable ordering
    i, j = min(self.unfilled, key=self._ret_len)
    # Forward looking
    option = self.A - (self.row_state[i] | self.col_state[j] |
                       self.block_state[i//3][j//3])
    if len(option) == 0:
        return False
    self.unfilled.remove((i, j))
    for c in option:
        self.set_state(i, j, c)
        if self.solve():
            return True
        # Backtracking
        else:
            self.reset_state(i, j, c)
    # Backtracking
    self.unfilled.append((i, j))
    return False
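A minimal sketch of driving the solver (this driver is our assumption, not part of the book's source; board is any 9 × 9 list of lists with 0 marking the empty spots):

s = Sudoku(board)
s.init()            # build row/col/block states and the unfilled list
if s.solve():
    print(s.board)  # the completed grid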

12.4 Solving Combinatorial Optimization Problems


Combinatorial optimization is an emerging field at the forefront of combi-
natorics and theoretical computer science that aims to use combinatorial
techniques to solve discrete optimization problems. From a combinatorics
perspective, it interprets complicated questions in terms of a fixed set of
objects about which much is already known: sets, graphs, polytopes, and
matroids. From the perspective of computer science, combinatorial opti-
mization seeks to improve algorithms by using mathematical methods either
to reduce the size of the set of possible solutions or to make the search itself
faster.

Generally, the inherent complexity of a COP is at least exponential, and
its solutions fall into two classes: exact methods and heuristic methods. In
some cases, we may find efficient exact algorithms with either
greedy algorithms or dynamic programming; for example, finding
shortest paths on a graph can be solved by the Dijkstra (greedy) or Bellman-Ford
(dynamic programming) algorithms, which provide exact optimal solutions
in polynomial running time. For more complex problems, a COP can be
mathematically formulated as a Mixed Integer Linear Programming (MILP) model,
which is generally solved using a linear-programming based branch-and-bound
algorithm. In other cases no exact algorithms are feasible, and
randomized heuristic search algorithms such as the following, though we do not cover
them in this section, should be applied:

1. Random-restart hill-climbing.

2. Simulated annealing.

3. Genetic Algorithms.

4. Tabu search.

Model Combinatorial Optimization Problems It is good practice
to formulate COPs with mathematical equations and inequalities, which
involves three steps:

1. Choose the decision variables that typically encode the result we are
interested in. For example, in a subset-selection problem such as knapsack,
each item gets a variable, and each variable encodes two decisions–take
or not take–making its value set {0, 1}.

2. Express the problem constraints in terms of these decision variables
to specify which solutions of the problem are feasible.

3. Express the objective function to specify the quality of each solution.

There are generally many ways to model a COP.

Branch and Bound Branch and bound (BB, B&B, or BnB) is an algorithm
design paradigm for discrete and combinatorial optimization problems,
as well as mathematical optimization. A branch-and-bound algorithm
consists of a systematic enumeration of candidate solutions by means of
state space search: the set of candidate solutions is thought of as forming a
rooted tree with the full set at the root. The algorithm explores branches
of this tree, which represent subsets of the solution set. Before enumerating
the candidate solutions of a branch, the branch is checked against upper
and lower estimated bounds on the optimal solution, and is discarded if it
cannot produce a better solution than the best one found so far by the algorithm.
“Branching” splits the problem into a number of subproblems, and
“bounding” finds an optimistic estimate of the best solution to
the subproblems: an upper bound for a maximization problem or a lower
bound for a minimization problem. To get the optimistic estimate, we have to relax constraints. In this
section, we exemplify both a minimization problem (TSP) and a maximization
problem (knapsack).

Search Strategies In practice, we can apply different search strategies to
enumerate the search space of the problem, such as depth-first, best-first,
and least-discrepancy search. The way each listed strategy is applied
in combinatorial optimization problems is:

• Depth-First: it prunes when a node's estimate is worse than the best
found solution.

• Best-First: it selects the node with the best estimate among the
frontier set to expand each time. In the worst scenario, the whole search tree
has to be saved: when the estimates are extremely optimistic,
not a single branch is pruned in the process.

• Least-Discrepancy: it trusts a greedy heuristic, and then moves away
from the heuristic in a very systematic fashion.

In this section, we discuss exact algorithms using Branch and Bound
with a variety of search strategies. During interviews, questions that
have polynomial exact solutions are more likely to appear, testing your mastery
of dynamic programming or greedy algorithm design methodologies.
However, it is still good to discuss this option.

12.4.1 Knapsack Problem


Given n items with weights and values indicated by two vectors W and V
respectively, and a knapsack with capacity c, maximize the value of the
items selected into the knapsack with the total weight bounded by c.
Each item can be used at most once. For example, given the following
data, the optimal solution is to choose items 1 and 3, with a total weight of 8
and an optimal value of 80:

c = 10
W = [5, 8, 3]
V = [45, 48, 35]

Search Space In this problem, x_i denotes the decision for item i, with w_i and v_i its
corresponding weight and value, i ∈ [0, n − 1]. Each item can either
be selected or left behind, so x_i ∈ {0, 1}. The selected items cannot
exceed the capacity, giving Σ_{i=0}^{n−1} w_i x_i ≤ c. And we capture the total value
of the selected items as Σ_{i=0}^{n−1} v_i x_i. Putting it all together:

max_x Σ_{i=0}^{n−1} v_i x_i    (12.10)

s.t. Σ_{i=0}^{n−1} w_i x_i ≤ c    (12.11)

x_i ∈ {0, 1}    (12.12)

With each variable having two choices, our search space is as large as 2^n.

Branch and Bound To bound the search, we have to develop a heuristic
function that estimates an optimistic–here, maximum–total value a branch can lead
to.
In the case of the knapsack problem, the simplest estimate sums up
the total value of the items selected so far and adds the values
of all the remaining unselected items along the search.
A tighter heuristic function can be obtained with constraint relaxation:
by relaxing the condition x_i ∈ {0, 1} to x_i ∈ [0, 1], a
fraction of an item can be chosen at any time. By sorting the items by
value per unit of weight, v_i/w_i, a better estimate can be obtained by filling
the remaining capacity of the knapsack with the unselected items, larger unit
values considered first. A branch is checked against this optimistic estimate
and is discarded if it cannot
produce a better solution than the best one found so far by the algorithm.
Both heuristic functions are more optimistic than the true value,
but the latter is a tighter bound, able to prune more branches along
the search, making it more time efficient. We demonstrate branch and
bound with two different search strategies: DFS (backtracking) and Best-First
search.

Depth-First Branch and Bound


We set up a class BranchandBound to implement this algorithm. First, in
the initialization, we compute v_i/w_i to mark each item's value per unit of weight,
and sort the items by this value in decreasing order. Second, we have a
function estimate which takes three parameters: idx as the start index of the
remaining items, curval as the total value based on all previous decisions,
and left_cap as the remaining capacity of the knapsack. The code snippet is:
import heapq

class BranchandBound:
    def __init__(self, c, v, w):
        self.best = 0
        self.c = c
        self.n = len(v)
        # (value per unit of weight, weight, value), sorted by unit value
        self.items = [(vi / wi, wi, vi) for vi, wi in zip(v, w)]
        self.items.sort(key=lambda x: x[0], reverse=True)

    def estimate(self, idx, curval, left_cap):
        est = curval
        # use the v/w ratio to estimate
        for i in range(idx, self.n):
            ratio, wi, _ = self.items[i]
            if left_cap - wi >= 0:  # use all of the item
                est += ratio * wi
                left_cap -= wi
            else:  # use a fraction of the item
                est += ratio * left_cap
                left_cap = 0
        return est

Figure 12.9: Depth-First Branch and bound

The depth-first search process consists of two main branches: one
chooses the current item, and the other handles the case
where the item is not selected. The first branch has to be bounded
both by the capacity and by comparing the best found solution against
the estimate. An additional status list helps visualize the process of
the search by tracking the combination of items. The process is shown in
Fig. 12.9, and the code is as:
def dfs(self, idx, est, val, left_cap, status):
    if idx == self.n:
        self.best = max(self.best, val)
        return
    print(status, val, left_cap, est)  # visualize the search process

    _, wi, vi = self.items[idx]
    # Case 1: choose the item
    if left_cap - wi >= 0:  # prune by constraint
        # Bound by estimate, increase value and volume
        if est > self.best:
            status.append(True)
            nest = self.estimate(idx + 1, val + vi, left_cap - wi)
            self.dfs(idx + 1, nest, val + vi, left_cap - wi, status)
            status.pop()

    # Case 2: not choose the item
    if est > self.best:
        status.append(False)
        nest = self.estimate(idx + 1, val, left_cap)
        self.dfs(idx + 1, nest, val, left_cap, status)
        status.pop()
    return
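A minimal driver on the example data (a sketch of ours, not part of the book's source):

c, W, V = 10, [5, 8, 3], [45, 48, 35]
bb = BranchandBound(c, V, W)
bb.dfs(0, bb.estimate(0, 0, c), 0, c, [])
print(bb.best)  # 80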

Best-First Branch and Bound

In best-first search, we use a priority queue keyed on the estimated value,
and each time the node with the largest estimated value within the frontier set
is expanded first. As with depth-first branch and bound, we prune any branch
whose estimated value can never surpass the best solution found so far.
The search space is the same as in Fig. 12.9; only the order of exploration
differs from depth-first. In the implementation, the priority queue is
a min-heap where the minimum value is popped first, so we store the
negated estimated value to conveniently pop the largest value instead of
writing code for a max-heap.
def bfs(self):
    # Each entry: (negated estimate, val, left_cap, idx),
    # where idx is the next item to decide on.
    q = [(-self.estimate(0, 0, self.c), 0, self.c, 0)]
    self.best = 0
    while q:
        est, val, left_cap, idx = heapq.heappop(q)
        est = -est
        if idx == self.n:  # all items decided
            self.best = max(self.best, val)
            continue
        _, wi, vi = self.items[idx]

        # Case 1: choose the item (only when it still fits)
        if left_cap - wi >= 0:
            nest = self.estimate(idx + 1, val + vi, left_cap - wi)
            if nest > self.best:
                heapq.heappush(q, (-nest, val + vi, left_cap - wi,
                                   idx + 1))

        # Case 2: not choose the item
        nest = self.estimate(idx + 1, val, left_cap)
        if nest > self.best:
            heapq.heappush(q, (-nest, val, left_cap, idx + 1))
    return
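Under the same driver as before (our sketch), the best-first variant returns the same optimum:

bb = BranchandBound(c, V, W)
bb.bfs()
print(bb.best)  # 80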

12.4.2 Travelling Salesman Problem

Figure 12.10: A complete undirected weighted graph.

Given a set of cities and the distances between every pair, find the shortest
possible path that visits every city exactly once and returns to the origin
city. For example, with the graph shown in Fig. 12.10, such a shortest path is
[0, 1, 3, 2, 0] with a path weight of 80.

Search Space In TSP, a possible complete solution is a Hamiltonian cycle,
a graph cycle that visits each vertex exactly once. Since it is a cycle, it does
not matter where it starts; for convenience, we choose vertex 0 as the
origin city. Therefore, in our example, our path starts and ends at 0, and
the remaining n − 1 vertices in between form a permutation of those vertices,
making the complexity (n − 1)!.
Because this is a complete graph, it might be tempting to apply backtracking
on the graph to enumerate all possible paths and check each candidate
solution. However, this path searching builds an (n − 1)-ary search
tree with height equal to n − 1, making the complexity (n − 1)^{n−2}, which
is larger than the space of permutations among n − 1 items. Therefore, in
our implementation, we apply backtracking to enumerate all permutations
of the n − 1 vertices and check each corresponding cost.

Speedups Since we only care about the minimum cost, any partial
result whose cost is already larger than the minimum cost of all known complete
solutions can be pruned. This is the branch and bound method, introduced
earlier, that is often used in combinatorial optimization.

Implementation We build the graph as a list of dictionaries; the dictionary
at index u stores vertex u's other cities and the corresponding distances
as keys and values, respectively. Compared with a standard permutation
backtracking, we add four additional variables: start to track the starting
vertex, g to pass the graph for distance lookups, mincost to
save the cost of the best complete solution found so far, and cost to track the
current partial path's cost. The code is shown as:
def tsp(a, d, used, curr, ans, start, g, mincost, cost):
    if d == len(a):
        # Add the cost from the last vertex back to the start
        c = g[curr[-1]][start]
        cost += c
        if cost < mincost[0]:
            mincost[0] = cost
            ans[0] = curr[::] + [start]
        return

    for i in a:
        if not used[i] and cost + g[curr[-1]][i] < mincost[0]:
            cost += g[curr[-1]][i]
            curr.append(i)
            used[i] = True
            tsp(a, d + 1, used, curr, ans, start, g, mincost, cost)
            curr.pop()
            cost -= g[curr[-1]][i]
            used[i] = False
    return
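A minimal driver (the edge weights below are hypothetical–chosen only so that the optimal tour matches the one quoted above, since the actual weights of Fig. 12.10 are not reproduced here):

g = [{1: 10, 2: 30, 3: 35},
     {0: 10, 2: 35, 3: 25},
     {0: 30, 1: 35, 3: 15},
     {0: 35, 1: 25, 2: 15}]
a = [1, 2, 3]                 # vertices to permute; 0 is the fixed start
used = {i: False for i in a}
ans, mincost = [None], [float('inf')]
tsp(a, 0, used, [0], ans, 0, g, mincost, 0)
print(mincost[0], ans[0])     # 80 [0, 1, 3, 2, 0]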

TSP is an NP-hard problem, and no polynomial-time solution is known
so far.

Other Solutions
Whenever we are faced with optimization, we can consider the other
two algorithm design paradigms–Dynamic Programming and Greedy Algorithms.
In fact, both of the above problems have corresponding dynamic
programming solutions: for the knapsack problem, a pseudo-polynomial solution is
possible; for TSP, though it is still of exponential time complexity, it is much
better than O(n!). We will further discuss these two problems in Chapter 15,
Dynamic Programming.

12.5 Exercises
1. 77. Combinations

2. 17. Letter Combinations of a Phone Number

3. 797. All Paths From Source to Target

4. N-bit String: enumerate all n-bit strings with the backtracking algorithm.
For example, with n = 3, all 3-bit strings are:
000, 001, 010, 011, 100, 101, 110, 111

5. 940. Distinct Subsequences II

6. N-queens

7. Map coloring

8. 943. Find the Shortest Superstring (hard). It can be modeled as the
traveling salesman problem and solved with dynamic programming.
13

Reduce and Conquer

“Everything should be made as simple as possible, but not simpler.”

– Albert Einstein

This chapter is the essence of algorithmic problem solving–‘Reduction’.

Reduction is the essence of problem solving, and self-reduction
is the “center” of the essence. The recurrence relation is our tool
from math. The correctness of self-reduction is proved with
mathematical induction, and the complexity analysis relies
on solving the recurrence relation.

13.1 Introduction
Story Imagine that your mom asks you to get 10,000 pounds of corn.
What would you do? First, you would think: where should I get the corn?
I can go to Walmart, or I can grow the corn on a farm. This is
when one problem/task is reduced to some other problems/tasks; solving
the other ones means you solved your assignment from your mom. This is
one example of reduction: converting problem A to problem B.
Now, you are at Walmart and ready to load the 10,000 pounds of
bagged corn, but the trunk of your car cannot fit all the corn at once. You
decide to do 10 rounds of loading and transporting home.
Now, your task becomes loading 1,000 pounds of corn. After you are done
with this, you have solved a subtask–getting 1,000 pounds of corn. In the
second round, you load another 1,000 pounds; you have solved another subtask–getting
2,000 pounds of corn. After 10 rounds in total, you will have solved the
original task. This is the other side of reduction: reducing one problem to one
or multiple smaller instances of itself.

Definition of Reduction In computability theory and computational
complexity theory, a reduction is an algorithm for transforming one problem
A into another problem or other problems. There are two types of reduction:

1. Self-Reduction: the other problems can be smaller instances–or, we
say, subproblems–of the problem itself; if the original problem
is a_n, the smaller problems can be a_{n/2}, a_{n−1}, a_{n−2}, and so on.
Self-reduction is a recursive process; we reduce the problem into one
or more subproblems of smaller size recursively until the subproblem
is small enough to be a base case. We need to differentiate whether
the subproblem is reduced by a constant factor or by a constant size.

• If it is by constant size, say a_{n−k}, this characterizes searching,
dynamic programming, and greedy algorithms.

• If it is by constant factor, say in the form a_{n/b} for an integer
b ≥ 2, this characterizes divide and conquer, which
we detail further in Section 13.2.1.

The recurrence relations, on which we put so much effort
in the last chapter, conveniently represent and interpret the relation
between a problem and its subproblems in self-reduction. Optionally,
we can also use a recursion tree to help with visualization. In the next
two sections, we shall see how, and discuss additional techniques for
each type.

2. A to B: The other problem can be a totally different problem, say B.
Intuitively, if we know how to solve B, this induces a solution to A. On
the other hand, this also means that if either of A and B is unsolvable,
it indicates or proves that the other is unsolvable too. More details
are given in Section 13.5.

Reducing a Problem to Subproblem(s)


“Reducing” a problem into subproblem(s), as the first step of self-reduction,
potentially results in two types of subproblems: non-overlapping
subproblems and overlapping subproblems.

Non-overlapping subproblems Like cutting a rod into multiple pieces,
the resulting subproblems each stand alone, disjoint from each other, each
becoming another, smaller rod. The most general way is to
divide equally; thus, conventionally, a_{n/b} means the problem is reduced into
non-overlapping subproblems, each of size n/b.

Overlapping subproblems Different from the non-overlapping case,
the feature of overlapping subproblems is more abstract. Simply put, it means
subproblems share subproblems. Say a_n is reduced to a_{n−1} and a_{n−2};
according to this recursive rule, a_{n−1} will be reduced to a_{n−2} and a_{n−3}. Now
we can see that problems a_n and a_{n−1} both share a common subproblem a_{n−2};
this is to say that these problems overlap. Overlapping subproblems
are one of the signals that further optimization might apply, which is detailed
in dynamic programming in Chapter 15.

Self-Reduction and Mathematical Induction

The term ‘self-reduction’ is not commonly used, nor even put under the umbrella
of ‘reduction’. In other materials, you might see the content of self-reduction
appear in the form of mathematical induction¹. Self-reduction
and mathematical induction are inseparable: self-reduction can be represented
with a recurrence relation, mathematical induction is the
most straightforward and powerful tool to prove its correctness, and their
focus aligns–“concentrating on reducing a problem and solving subproblems
rather than solving it directly”.
Mathematical induction can guide us in reducing the problem: we assume
we know the solutions to problems of size a_{n/b} or a_{n−k}, and we focus on how
to construct a solution for a_n from the solutions to those subproblems.
We will further see the distinction between these two characteristics of problems
in the following examples.

13.2 Divide and Conquer


13.2.1 Concepts

Figure 13.1: Divide and Conquer Diagram


¹ Such as Introduction to Algorithms: A Creative Approach.

Divide and conquer is the most fundamental problem solving paradigm
for computer programming; the strategy is to divide a problem into smaller
problems recursively until the subproblem is trivial to solve. In more detail,
it consists of two processes:

1. Divide: divide one problem into a series of non-overlapping subproblems
that are smaller instances of the same problem, recursively, until
reaching the base cases, where the subproblem is trivial to solve.
Usually, the problem is divided equally, most likely into two halves.
We say a problem of size n, denoted as p_n, is divided into
a subproblems each of size n/b, denoted as a·p_{n/b}, where a and b are
usually integers with a ≥ 1, b ≥ 2. As we explained in Chapter III, this process
happens in the top-down pass.

2. Conquer: in the bottom-up pass, once the solutions of the a subproblems,
each of size n/b, are available, we combine these solutions into the
solution of our current problem of size n.

We can interpret divide and conquer with a recurrence relation as in
Eq. 13.1:

p_n = Ψ(n, a·p_{n/b})    (13.1)

Here Ψ is no longer a function; instead, it represents the operations
needed to combine the solutions to the subproblems into the solution of
the current problem, and n means the size of the combined solution, which also means
n elements.

Decrease and Conquer When a = 1, each problem is reduced to only one
subproblem, and this case is named decrease and conquer. Decrease
and conquer reduces the search space at each step. If our time complexity is
T(n) = T(n/2) + O(1), we get O(log n). This method
cuts the search space into half of its original size at each step until it reaches its
target. Because logarithmic time is far faster than even linear time, this is
a significant efficiency gain. We will discuss classical algorithms with this
paradigm, such as Binary Search, Binary Search Tree, and Segment Tree, in the
next chapter.

Common Applications of Divide and Conquer


Divide-and-conquer is mostly used in well-developed algorithms and
some data structures. In this book, we cover the following:

• Various sorting algorithms like Merge Sort and Quick Sort (Chapter 15);

• Binary Search (Section ??);

• Heap (Section ??);

• Binary Search Tree (Section 27.1);

• Segment Tree (Section 27.2).

13.2.2 Hands-on Examples


Merge Sort
The concept can be quite dry, so let us look at the simple example of merge sort.
Given the array [2, 5, 1, 8, 9], the task is to sort it into [1, 2, 5, 8, 9]. To
apply divide and conquer, we first divide it into two halves, [2, 5, 1] and [8, 9],
sort each half to get [1, 2, 5] and [8, 9], and then merge the two sorted
parts. The process can be represented as the following, where the combine
step merges two sorted lists:

def combine(s1, s2):
    # merge two sorted lists into one sorted list
    res, i, j = [], 0, 0
    while i < len(s1) and j < len(s2):
        if s1[i] <= s2[j]:
            res.append(s1[i]); i += 1
        else:
            res.append(s2[j]); j += 1
    return res + s1[i:] + s2[j:]

def divide_conquer(A, s, e):
    # base case, cannot be divided farther
    if s == e:
        return [A[s]]
    # divide into n/2, n/2 from the middle position
    m = (s + e) // 2

    # conquer
    s1 = divide_conquer(A, s, m)
    s2 = divide_conquer(A, m + 1, e)

    # combine
    return combine(s1, s2)
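As a quick check (driver of ours):

print(divide_conquer([2, 5, 1, 8, 9], 0, 4))  # [1, 2, 5, 8, 9]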

This process is visualized in Fig. 13.2. From the visualization, we
can clearly see that all subproblems form a tree; they never interact or
overlap with each other, and each subproblem is visited only once.

Figure 13.2: Merge Sort with non-overlapping subproblems where subproblems form a tree

Maximum Subarray (53. medium).


Find the contiguous subarray within an array (containing at least one number)
which has the largest sum.

For example, given the array [-2, 1, -3, 4, -1, 2, 1, -5, 4],
the contiguous subarray [4, -1, 2, 1] has the largest sum = 6.

Solution: divide and conquer. The answer is max(T(left), T(right), T(cross)),
where taking the max is the merging step and T(cross) handles the case where
the maximum subarray crosses the mid point. For the complexity, T(n) = 2T(n/2) + n;
the master method gives us O(n log n). We write the following Python code:

def maxSubArray(self, nums):
    """
    :type nums: List[int]
    :rtype: int
    """
    def getCrossMax(low, mid, high):
        left_sum, right_sum = 0, 0
        left_max, right_max = float('-inf'), float('-inf')
        left_i, right_j = -1, -1
        for i in range(mid, low - 1, -1):  # [low, mid]
            left_sum += nums[i]
            if left_sum > left_max:
                left_max = left_sum
                left_i = i
        for j in range(mid + 1, high + 1):  # [mid+1, high]
            right_sum += nums[j]
            if right_sum > right_max:
                right_max = right_sum
                right_j = j
        return (left_i, right_j, left_max + right_max)

    def maxSubarray(low, high):
        if low == high:
            return (low, high, nums[low])
        mid = (low + high) // 2
        rslt = []
        rslt.append(maxSubarray(low, mid))        # [low, mid]
        rslt.append(maxSubarray(mid + 1, high))   # [mid+1, high]
        rslt.append(getCrossMax(low, mid, high))  # crossing mid
        return max(rslt, key=lambda x: x[2])

    return maxSubarray(0, len(nums) - 1)[2]

13.3 Constant Reduction


13.3.1 Concepts
In this category, a problem instance of size n is reduced to one or more
instances of size n − 1 or less, recursively, until the subproblem is small and
trivial to solve. This process can be interpreted with Eq. 13.2:

p_n = Ψ(n, p_{n−1}, p_{n−2}, ..., p_{n−k}),    (13.2)

with base cases handling n ≤ k. The number of subproblems that the
current problem relies on should be as small as possible. The ideal option is
when it relies on p_{n−1} only; this is the case of an exhaustive search, which
can be implemented easily both with recursion and iteration.

Overlapping Subproblems
When the number of subproblems appearing in this relation is greater than or
equal to 2, the subproblems might overlap. This implies that a straightforward
recursion-based solution without optimization will be expensive, because
the overlapped subproblems are solved again and again; optimization
is possible with dynamic programming or greedy algorithms shown in
Part ??, which use a caching mechanism that saves the solution
of each subproblem and thus avoids recomputation. However, to stick to
the reduction itself, we delay our examples' possible optimization to
Part ??.
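As a tiny sketch of that caching mechanism (using Python's built-in functools.lru_cache on the Fibonacci example that appears in the next subsection):

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # each distinct subproblem is now computed only once
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)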

Subproblem Space
To count all possible subproblems–the subproblem space–is important for
us to understand the complexity. For an array, a subproblem can be a subarray
[a_i, ..., a_j], 0 ≤ i ≤ j ≤ n − 1, which makes the number of potential
subproblems O(n²). Sometimes, it is enough to fix a_i = a_0, so that the subarray
always starts from the beginning. Reduction by constant size is less
likely to be seen in tree structures, which are more naturally organized by
divide-and-conquer reduction. In practice, we shall always try to define our
subproblems with the smallest possible subproblem space; we only enlarge it
when the further shrinking would make the construction of the solution impossible.

13.3.2 Hands-on Examples

Figure 13.3: Fibonacci number with overlapping subproblems where subproblems form a graph.

Example 2: Fibonacci Sequence The Fibonacci sequence is defined
as:

Given f(0) = 0, f(1) = 1, and f(n) = f(n-1) + f(n-2) for n >= 2, return the
value for any given n.

This is the classical Fibonacci sequence. To get the Fibonacci number
at position n, we first need the answers to the subproblems f(n-1) and
f(n-2); we can solve it easily using a recursive function:

def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

The above recursive function has the recursion tree shown in Fig. 13.3; we
also drew the recursion tree of the recursive calls for merge sort in
Fig. 13.2. We notice that we call f(2) multiple times for Fibonacci,
but in merge sort each call is unique and won't be made more than
once. The recurrence of merge sort is T(n) = 2T(n/2) + n, and
for the Fibonacci sequence it is T(n) = T(n-1) + T(n-2) + 1.

Maximum Subarray Using reduction by constant size, the problem is to
find a subarray [a_i, a_{i+1}, ..., a_j], 0 ≤ i ≤ j < n. Now, let's assume that the
solution for the subarray of size n − 1 is known by the induction hypothesis,
and try to figure out the “operations” to construct the solution for n.
For the subproblem of size 3, [-2, 1, -3], where is the maximum subarray?
A naive, human-doable way is to check all subarrays with their running sums:

subarray sums starting from -2: -2, -1, -4
subarray sums starting from 1: 1, -2
subarray sums starting from -3: -3

We would find that [1] is our maximum subarray. To construct the solution
for the subproblem [-2, 1, -3, 4], we have two cases:

1. j = n − 1: to put 4 inside, we just need to choose among three cases:
the previous maximum subarray [a_i, ..., a_{n−1}], the extended maximum
subarray [a_i, ..., a_{n−1}, a_n], and [a_n] alone.

2. j ≠ n − 1: in this case, say our maximum subarray is [a_i, ..., a_j], j ≠
n − 1; the case of extension is more complex, in that we need to try
extending from any item in [a_{j+1}, ..., a_n]. It is doable but gives a time
complexity of T(n) = T(n − 1) + O(n), which is the same as the naive
solution that enumerates all subarrays.

13.4 Divide-and-conquer VS Constant Reduction


Therefore, we draw the conclusion:

1. For non-overlapping subproblems, as in Eq. ??, when we use recursive
programming to solve the problem directly, we get the best time complexity,
since there is no overlap between subproblems.

2. For overlapping subproblems, as in Eq. ??, programming them recursively
ends up with redundancy in time complexity, because some subproblems
are computed more than once. This also means they
can be further optimized: either recursively with memoization or
iteratively through tabulation, as we explain later in Chapter 15,
Dynamic Programming.

Practical Guideline The master theorem applied to either divide and conquer
or reduction by constant size tells us the asymptotic time complexity:
divide and conquer is either polynomial or logarithmic, while reduction by constant
size can go up to exponential when f(n) ≥ n. This reminds us that
when our problem's state space is exponential or beyond, we might do better
reducing by constant size, and when the problem's state space is within polynomial,
divide and conquer should work better to further boost efficiency.

13.5 A to B
13.5.1 Concepts
Definition A reduction or transformation between two problems A and B
is to say that a solution to one problem is also a solution to the other. Its
common applications are:

1. Designing algorithms: given an algorithm for B, we can solve A.

2. Proving limits: if A is hard and its lower bound is known, then so is
B's.

3. Classifying problems using the established relative difficulty of problems.

The strategy is to reduce problem A into a more general-purpose problem.
One such general-purpose problem is linear programming, which
we study in Chapter ??. Only experience and the study of classical algorithms
designed on certain data structures can help us identify possible
‘patterns’. In this section, we emphasize the principle of this method instead
of pointing out all possible reductions; as we get further into the book and the
journey of practice, we will learn more of them.

13.5.2 Practical Guideline and Examples


Sorting and Ordering is always a good try. With this method, we only focus
on designing algorithms in this book. In practice, sorting or ordering
is a common trick to reduce a problem to another that is more structured
and has well-known algorithms to solve it. For example, if we need to find
the k-th largest item in an array, sorting the array first serves a more general
purpose and solves the problem. The same goes for the problem where, given
an unsorted array, we must decide whether duplicate items exist, as sketched
below. These cases fully demonstrate how a well-known, general-purpose sorting
algorithm can solve a problem that might seem bizarre or unique.
More often, however, sorting will be a subroutine of our algorithm design,
whose purpose is to order the input in the hope that it simplifies the subsequent
design. Similar examples can be found across the book; another appears
in the convex hull problem, where the points are sorted by angle around
a chosen outskirt point. The real world is fuzzy; as computer scientists, we
are responsible for enforcing order, so always keep this in mind.

Other Reduction Examples There are also more creative reductions. As
we saw in the maximum subarray example of the last section, when we have
trouble applying a straightforward induction hypothesis, we can strengthen our
hypothesis.
We reduce our problem to problem B: find the maximum subarray that ends
at the last position, that is, find the [a_i, ..., a_{n−1}], i ≥ 0, with the maximum
value; we then have only n candidates to compare. To construct the answer
to A from the reduced problem B, note that the maximum subarray of A is the
maximum over the n instances of B, one ending at each position, over the
prefixes [a_0], [a_0, a_1], ..., [a_0, ..., a_{n−1}]. Within B, the reduction corresponds
to case j = n − 1 of the original problem, which is enough to construct
the solution: with b_n as the answer to B at position n, we can write
b_n = max(b_{n−1} + a_n, a_n), and p_n = max(p_{n−1}, b_n) for the overall
answer.
In an array, a suffix is defined as any subarray that includes its last
item. Another way to put the induction hypothesis for problem B: if we
know how to find the maximum suffix of the prefix of size k < n, we can easily
induce the maximum suffix of the prefix of size k + 1.
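Under this strengthened hypothesis, a minimal one-pass sketch (ours) tracks the maximum suffix b and the overall answer p:

def max_subarray(nums):
    p = b = nums[0]          # p: best so far, b: best suffix ending here
    for x in nums[1:]:
        b = max(b + x, x)    # extend the suffix or restart at x
        p = max(p, b)
    return p

print(max_subarray([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6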
Another, more creative option is to convert this problem to the best time to
buy and sell stock problem via prefix sums; for the example array the prefix
sums are [0, -2, -1, -4, 0, -1, 1, 2, -3, 1], and the maximum subarray sum is the
largest rise in this sequence, found in O(n). In the implementation below we
track a running prefix_sum; the difference is that we reset prefix_sum to 0
whenever it drops below 0, again O(n):
class Solution(object):
    def maxSubArray(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        max_so_far = float('-inf')
        prefix_sum = 0
        for i in range(0, len(nums)):
            prefix_sum += nums[i]
            if max_so_far < prefix_sum:
                max_so_far = prefix_sum

            if prefix_sum < 0:
                prefix_sum = 0
        return max_so_far

13.6 The Skyline Problem


Define and solve it by two cases.
Both the skyline problem and the maximum subarray problem have illustrated
how we can use reduction to solve a problem, whether self-reduction or the
A-to-B reduction is used. Real algorithm design is usually a composite of multiple
design steps and methods.

13.7 Exercises
1. Binary Search.

2. Use self-reduction by constant size to solve the maximum subarray
problem.

3. Skyline problem.
14

Decrease and Conquer

Want to do even better than linear complexity? Decrease and conquer reduces
one problem into only one smaller subproblem, and the most common
case is to reduce the state space into half of its original size. If the combining
step takes only constant time, we get an elegant recurrence relation:

T(n) = T(n/2) + O(1),    (14.1)

which gives us logarithmic time complexity!


We introduce three classical algorithms–binary search in an array, binary
search tree, and segment tree–to reinforce our understanding of decrease and
conquer. Importantly, binary search and binary search trees make up about 10% of
the total interview questions.

14.1 Introduction
All the searching we have discussed before never assumed any ordering between
the items, and searching for an item in an unordered space is doomed to
have a time complexity linear in the space size. This is about to change
in this chapter.
Think about these two questions: what if we have a sorted list instead
of an arbitrary one? What if the parent and children nodes within a tree
are ordered in some way? With such special ordering between items in a
data structure, can we increase the searching efficiency and do better than
the blind one-by-one search of the state space? The answer is YES.
Let's take advantage of the ordering and the decrease and conquer methodology.
To find a target in a space of size n, we first divide it into two subspaces,
each of size n/2, say from the middle of the array. If the array is
increasingly ordered, all items in the left subspace are smaller than all items
in the right subspace. If we compare our target with the item in the middle,
we will know whether the target is on the left or right side. With just one step,
we have reduced our state space to half its size. We repeat this process on
the reduced space until we find the target. This process is called Binary
Search. Binary search has the recurrence relation:

T(n) = T(n/2) + O(1),    (14.2)

which decreases the time complexity from O(n) to O(log n).

14.2 Binary Search


Binary search can be easily applied to a sorted array or string.

For example, given a sorted and distinct array
nums = [1, 3, 4, 6, 7, 8, 10, 13, 14, 18, 19, 21, 24, 37, 40, 45, 71],
find target t = 7.

Figure 14.1: Example of Binary Search

Find the Exact Target This is the most basic application of binary
search. We set two pointers, l and r, which point to the first and
last positions, respectively. Each time we compute the middle position m =
(l+r)//2 and check if the item nums[m] is equal to the target t.

• If it equals, the target is found; return the position.

• If it is smaller than the target, the target can only lie in the right half;
move there by setting the left pointer to the position right after the
middle position, l = m + 1.

• If it is larger than the target, the target can only lie in the left half;
move there by setting the right pointer to the position right before the
middle position, r = m − 1.

We repeat the process until we find the target or we have searched the whole
space; the space is exhausted once l becomes larger
than r. Therefore, in the implementation, we use a while loop with condition
l ≤ r to make sure we scan the searching space only once. The process of
applying binary search on our exemplary array is depicted in Fig. 14.1, and
the Python code is given as:
def standard_binary_search(lst, target):
    l, r = 0, len(lst) - 1
    while l <= r:
        mid = l + (r - l) // 2
        if lst[mid] == target:
            return mid
        elif lst[mid] < target:
            l = mid + 1
        else:
            r = mid - 1
    return -1  # target is not found

In the code, we compute the middle position with mid = l + (r - l) //
2 instead of just mid = (l + r) // 2: the two always give the
same computational result, but the latter is more likely to lead to overflow,
due to its addition, in languages with fixed-width integers.

14.2.1 Lower Bound and Upper Bound


Duplicates and Target Missing What if there are duplicates in the
array?

For example,
nums = [1, 3, 4, 4, 4, 4, 6, 7, 8]
Find target t = 4.

Applying the standard binary search above will return position 4, which is
the third 4 in the array. This does not seem like a problem at
first. However, what if you want to know the predecessor or successor (3 or
6) of this target? In a distinct array, the predecessor and successor would
be adjacent to the target. However, when the target has duplicates, the
predecessor comes before the first occurrence and the successor after the last
occurrence, so returning an arbitrary occurrence is not helpful.
Another case: what if our target is 5, and we first want to see if it exists
in the array, and, if it does not, insert it into the array while still
keeping the array sorted? The above implementation simply returns −1, which
is not helpful at all.
The lower and upper bound of a binary search are the lowest and
highest position where the value could be inserted without breaking the
ordering.

Figure 14.2: Binary Search: Lower Bound of target 4.

Figure 14.3: Binary Search: Upper Bound of target 4.

For example, if our t = 4, the first position where it can be inserted is index 2
and the last is index 6.

• With index 2 as the lower bound l: for i ∈ [0, l−1], a[i] < t; a[l] = t;
and for i ∈ [l, n), a[i] ≥ t. A lower bound is also the first position that has
a value v ≥ t. This case is shown in Fig. 14.2.

• With index 6 as the upper bound u: for i ∈ [0, u − 1], a[i] ≤ t, and for
i ∈ [u, n), a[i] > t. An upper bound is also the first position that has
a value v > t. This case is shown in Fig. 14.3.

Figure 14.4: Binary Search: Lower and Upper Bound of target 5 is the same.

If t = 5, the only position where it can be inserted is index 6, which indicates
l = u. We show this case in Fig. 14.4.
Now that we know the meaning of the upper and lower bound, here
comes the question: how do we implement them?

Implement Lower Bound If the target equals the value at
the middle index, we still have to move to the left half to find the leftmost position
of the same value. Therefore, the logic is to move left as long as possible,
until we can't any more. When the loop stops, l > r, and l points to the first position
whose value v satisfies v ≥ t. Another way to see the return value:
assume the middle pointer m is at the first position that
equals the target, which is index 2 in the case of target 4. According
to the searching rule, we go to the left search space and change the right
pointer to r = m − 1. From this point on, no value in the valid search space
can be larger than or equal to the target, so
the search will only move to the right side, increasing the l pointer and leaving the
r pointer untouched until l > r and the search stops. The first time
that l > r, the left pointer is l = r + 1 = m, which is the first position
whose value equals the target.
The search process for targets 4 and 5 is described as follows:

0: l = 0, r = 8, mid = 4
1: mid = 4, 4 == 4, l = 0, r = 3
2: mid = 1, 4 > 3, l = 2, r = 3
3: mid = 2, 4 == 4, l = 2, r = 1
return l = 2

Similarly, we run the case for target 5:

0: l = 0, r = 8, mid = 4
1: mid = 4, 5 > 4, l = 5, r = 8
2: mid = 6, 5 < 6, l = 5, r = 5
3: mid = 5, 5 > 4, l = 6, r = 5
return l = 6

The Python code is as follows:
def lower_bound_bs(nums, t):
    l, r = 0, len(nums) - 1
    while l <= r:
        mid = l + (r - l) // 2
        if t <= nums[mid]:  # move as left as possible
            r = mid - 1
        else:
            l = mid + 1
    return l

Implement Upper Bound To find the upper bound, we need
to move the left pointer to the right as much as possible. Assume the
middle index is 5, with target 4. The binary search moves to the
right side of the state space, making l = mid + 1 = 6. Now, in the right state
space, the middle pointer will always see values larger than 4, so the search
only moves to the left side of that space, which only changes the right pointer
r and leaves the left pointer l untouched when the program ends. Therefore, l
still holds our final upper bound index. The Python code is as follows:
def upper_bound_bs(nums, t):
    l, r = 0, len(nums) - 1
    while l <= r:
        mid = l + (r - l) // 2
        if t >= nums[mid]:  # move as right as possible
            l = mid + 1
        else:
            r = mid - 1
    return l
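A quick check of both bounds on the array with duplicates (driver of ours):

nums = [1, 3, 4, 4, 4, 4, 6, 7, 8]
print(lower_bound_bs(nums, 4), upper_bound_bs(nums, 4))  # 2 6
print(lower_bound_bs(nums, 5), upper_bound_bs(nums, 5))  # 6 6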

Python Module bisect Conveniently, Python has a built-in module
bisect that offers two methods: bisect_left() for obtaining the lower
bound and bisect_right() for obtaining the upper bound. For example, on
our array with duplicates we can use it as follows (expected results in the comments):

from bisect import bisect_left, bisect_right, bisect
nums = [1, 3, 4, 4, 4, 4, 6, 7, 8]
l1 = bisect_left(nums, 4)   # lower bound of 4 -> 2
r1 = bisect_right(nums, 4)  # upper bound of 4 -> 6
l2 = bisect_left(nums, 5)   # lower bound of 5 -> 6
r2 = bisect_right(nums, 5)  # upper bound of 5 -> 6

It offers six methods, as shown in Table 14.1.

Table 14.1: Methods of bisect

bisect_left(a, x, lo=0, hi=len(a)): The parameters lo and hi may be used to
specify a subset of the list; the function is the same as bisect_left_raw.

bisect_right(a, x, lo=0, hi=len(a)): The parameters lo and hi may be used to
specify a subset of the list; the function is the same as bisect_right_raw.

bisect(a, x, lo=0, hi=len(a)): Similar to bisect_left(), but returns an insertion
point which comes after (to the right of) any existing entries of x in a.

insort_left(a, x, lo=0, hi=len(a)): Equivalent to
a.insert(bisect.bisect_left(a, x, lo, hi), x).

insort_right(a, x, lo=0, hi=len(a)): Equivalent to
a.insert(bisect.bisect_right(a, x, lo, hi), x).

insort(a, x, lo=0, hi=len(a)): Similar to insort_left(), but inserts x in a after
any existing entries of x.

Bonus For the lower bound, if we return l − 1 instead, we get
the last position whose value is < target. Similarly, for the upper bound,
l − 1 is the last position whose value is <= target.

14.2.2 Applications
Binary search is a powerful problem-solving tool. Let's go beyond the
sorted array: how about when the array is sorted in a way that is not as
monotonic as what we are familiar with? How about solving math functions
with binary search, whether they are continuous or discrete, equations or
inequations?

First Bad Version(L278) You are a product manager and currently


leading a team to develop a new product. Unfortunately, the latest version
of your product fails the quality check. Since each version is developed based
on the previous version, all the versions after a bad version are also bad.
Suppose you have n versions [1, 2, ..., n] and you want to find out the
first bad one, which causes all the following ones to be bad. You are given
an API bool isBadVersion(version) which will return whether version
is bad. Implement a function to find the first bad version. You should
minimize the number of calls to the API.
Given n = 5, and version = 4 is the first bad version.

call isBadVersion(3) -> false
call isBadVersion(5) -> true
call isBadVersion(4) -> true

Then 4 is the first bad version.

Analysis and Design  In this case, we have a search space in the range [1, n]. Think of the value at each position as the result of the function isBadVersion(i). Assume the first bad version is at position b; then the values over the positions follow the pattern [F, ..., F, T, ..., T]. We can directly apply binary search on the search space [1, n]: finding the first bad version is the same as finding the first position where we could insert the value True, that is, the lower bound of the value True. Therefore, whenever the value we find is True, we move to the left space to locate its first occurrence. The Python code is given below:
def firstBadVersion(n):
    l, r = 1, n
    while l <= r:
        mid = l + (r - l) // 2
        if isBadVersion(mid):
            r = mid - 1
        else:
            l = mid + 1
    return l

Search in Rotated Sorted Array


“How about we rotate the sorted array?”

Problem Definition (L33, medium)  Suppose an array (without duplicates) is first sorted in ascending order, but is later rotated at some pivot unknown to you beforehand: all items before the pivot are moved to the end of the array. For example, the array [0, 1, 2, 4, 5, 6, 7] rotated at pivot 4 becomes [4, 5, 6, 7, 0, 1, 2]. If the pivot is at 0, nothing is changed. If it is at the end of the array, say 7, the array becomes [7, 0, 1, 2, 4, 5, 6]. You are given a target value to search. If it is found in the array return its index, otherwise return -1.
Example 1:
Input: nums = [3, 4, 5, 6, 7, 0, 1, 2], target = 0
Output: 5

target = 8
Output: -1

Analysis and Design  In the rotated sorted array, the array is not purely monotonic. Instead, there is at most one drop in the array because of the rotation; denote the item right before the drop as a_h and the item right after it as a_l. This drop cuts the array into two parts, a[0 : h + 1] and a[l : n], and both parts are sorted in ascending order. If the middle pointer falls within the left part, the left side of the state space is sorted; if it falls within the right part, the right side of the state space is sorted. Therefore, in any situation there is always one side of the state space that is sorted. To check which side is sorted, simply compare the value at the middle pointer with that at the left pointer:

• If nums[l] < nums[mid], the left part is sorted.

• If nums[l] > nums[mid], the right part is sorted.

• Otherwise, when they equal each other, which is only possible when there is no left part left, we have to move to the right part. For example, when nums = [1, 3], we move to the right part.

With a sorted half of the state space, we can check whether our target is within that sorted half: if it is, we continue in the sorted space; otherwise, we move to the other, unknown half. The Python code is shown as:
def RotatedBinarySearch(nums, t):
    l, r = 0, len(nums) - 1
    while l <= r:
        mid = l + (r - l) // 2
        if nums[mid] == t:
            return mid
        # Left is sorted
        if nums[l] < nums[mid]:
            if nums[l] <= t < nums[mid]:
                r = mid - 1
            else:
                l = mid + 1
        # Right is sorted
        elif nums[l] > nums[mid]:
            if nums[mid] < t <= nums[r]:
                l = mid + 1
            else:
                r = mid - 1
        # Left and middle index are the same, move to the right
        else:
            l = mid + 1
    return -1

What happens if there are duplicates in the rotated sorted array?
In fact, the same comparison rule applies, with one minor change. When nums = [1, 3, 1, 1, 1], the middle pointer and the left pointer have the same value, and in this case the right side consists of a single value, so we should move to the left side instead. However, if nums = [1, 1, 3], we need to move to the right side instead. Moreover, for nums = [1, 3], it is because there is no left side that we have to search the right side. Therefore, in this case it is impossible to decide which way to go; a simple strategy is to just move the left pointer forward by one position, retreating to linear search.
# The left half is sorted
if nums[mid] > nums[l]:
    ...
# The right half is sorted
elif nums[mid] < nums[l]:
    ...
# For the third case
else:
    l += 1
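
Putting the skeleton together with the earlier RotatedBinarySearch, a hedged sketch of the full search with duplicates follows (essentially L81; it returns a boolean rather than an index, since with duplicates the index is ambiguous):

def rotatedBinarySearchDup(nums, t):
    l, r = 0, len(nums) - 1
    while l <= r:
        mid = l + (r - l) // 2
        if nums[mid] == t:
            return True
        if nums[mid] > nums[l]:        # the left half is sorted
            if nums[l] <= t < nums[mid]:
                r = mid - 1
            else:
                l = mid + 1
        elif nums[mid] < nums[l]:      # the right half is sorted
            if nums[mid] < t <= nums[r]:
                l = mid + 1
            else:
                r = mid - 1
        else:                          # cannot decide: shrink linearly
            l += 1
    return False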

Binary Search to Solve Functions

Now, let's see how binary search can be applied to solve equations or inequalities. Assume our function is y = f(x) and the function is monotonic, such as y = x, y = x^2 + 1 (for x >= 0), or y = √x. Solving the function means finding a solution x_t for a given target y_t. We generally have three steps to solve such problems:
1. Set a search space for x, say [x_l, x_r].

2. If the function is an equation, we find an x_t whose value either equals y_t or is close enough, such as |y_t − y| <= 1e-6, using standard binary search.

3. If the function is an inequality, we check whether it asks for the first or the last x_t that satisfies the constraint on y. This is the same as finding the lower bound or the upper bound.
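
As a minimal sketch of step 2, take the monotonic function y = x^2 on x >= 0; the function name solve_sqrt, the search bounds, and the tolerance eps are illustrative choices, not fixed by the method:

def solve_sqrt(yt, eps=1e-6):
    l, r = 0.0, max(1.0, yt)      # search space for x
    while r - l > eps:
        mid = (l + r) / 2
        if mid * mid < yt:
            l = mid               # the solution lies to the right
        else:
            r = mid
    return l

solve_sqrt(2)                     # ~1.4142, i.e., sqrt(2)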

Arranging Coins (L441, easy) You have a total of n coins that you
want to form in a staircase shape, where every k-th row must have exactly
k coins. Given n, find the total number of full staircase rows that can be
formed. n is a non-negative integer and fits within the range of a 32-bit
signed integer.
Example 1:
n = 5
The coins can form the following rows:
*
* *
* *

Because the 3rd row is incomplete, we return 2.

Analysis and Design  Each row x has x coins; summing up, we get 1 + 2 + ... + x = x(x + 1)/2. The problem is equivalent to finding the last integer x that makes x(x + 1)/2 <= n. Of course, this is just a quadratic equation that can easily be solved directly if you remember the formula, as in the following Python code:

import math
def arrangeCoins(n: int) -> int:
    return int((math.sqrt(1 + 8 * n) - 1) // 2)

However, in the case where we do not know a direct closed-form solution, we resort to binary search. First, the function of x is monotonically increasing, which indicates that binary search applies. We set the range of x to [1, n]; what we need is the last position where the condition x(x + 1)/2 <= n holds, which is the position right before the upper bound. The Python code is given as:
def arrangeCoins(n):
    def isValid(row):
        return (row * (row + 1)) // 2 <= n

    def bisect_right():
        l, r = 1, n
        while l <= r:
            mid = l + (r - l) // 2
            # Move as right as possible
            if isValid(mid):
                l = mid + 1
            else:
                r = mid - 1
        return l

    return bisect_right() - 1

14.3 Binary Search Tree

A sorted array supports logarithmic query time with binary search; however, it still takes linear time to update, that is, to delete or insert items. The binary search tree (BST), a type of binary tree designed for fast access and updates to items, on the other hand takes only O(log n) time to update when it is reasonably balanced. How does it work? In the array data structure we simply sort the items, but how do we apply sorting in a binary tree? Recall the min-heap data structure, which recursively requires a node to hold the smallest value among all nodes in its subtree; this gives us a clue. In a binary search tree, we define that for any given node x, all nodes in the left subtree of x have keys smaller than x, while all nodes in the right subtree of x have keys larger than x. An example is shown in Fig. 14.5. With this definition, simply comparing a search target with the root points us to one half of the search space, given that the tree is balanced enough. Moreover, if we do an in-order traversal of the nodes from the root, we end up with the keys nicely sorted in ascending order, which makes the binary search tree the basis of a sorting algorithm (tree sort).

Figure 14.5: Example of a binary search tree of depth 3 with 8 nodes.

A binary search tree needs to support many operations, including searching for a given key, for the minimum and maximum keys, and for the predecessor or successor of a given key, as well as inserting and deleting items while maintaining the binary search tree property. Because of the efficiency of these operations compared with other data structures, the binary search tree is often used as a dictionary or a priority queue.

With l and r representing the left and right children of node x, there are two other definitions besides the one we just introduced: (1) l.key <= x.key < r.key, and (2) l.key < x.key <= r.key. In these two cases, the resulting BSTs allow duplicates. The exemplary implementation follows the definition that does not allow duplicates.

14.3.1 Operations
In order to build a BST, we need to insert a series of items into the tree organized by the search tree property. And in order to insert, we need to search for a proper position first and then insert the new item while sustaining the search tree property. Thus, we introduce these operations in the order of search, insert, and generation.

Search  The search is highly similar to binary search in an array. It starts from the root. Unless the node's value equals the target, the search proceeds to either the left or the right child, depending on the comparison result. The search process terminates when either the target is found or an empty node is reached. It can be implemented either recursively or iteratively with a time complexity of O(h), where h is the height of the tree, which is roughly log n if the tree is balanced enough. The recursive search is shown as:
def search(root, t):
    if not root:
        return None
    if root.val == t:
        return root
    elif t < root.val:
        return search(root.left, t)
    else:
        return search(root.right, t)

Because this is a tail recursion, it can easily be converted to iteration, which saves us the call stack space. The iterative code is given as:
# Iterative searching
def iterative_search(root, key):
    while root is not None and root.val != key:
        if root.val < key:
            root = root.right
        else:
            root = root.left
    return root

Write code to find the minimum and maximum keys in the BST.
The minimum key is located at the leftmost node of the BST, while the maximum key is located at the rightmost node of the tree.

Figure 14.6: The red colored path from the root down to the position where
the key 9 is inserted. The dashed line indicates the link in the tree that is
added to insert the item.

Insert  Assume we are inserting a node with key 9 into the tree shown in Fig. 14.5. We start from the root, compare 9 with 8, and go to node 10. Next, the search leads us to the left child of node 10, and this is where we should put node 9. The process is shown in Fig. 14.6.
The process itself is easy and clean. Now for the implementation. We treat each node as a subtree: whenever the search goes into a node, the algorithm hands the insertion task over to that node entirely and assumes it has inserted the new node and returned the updated subtree root. The main program then simply resets its left or right child with the return value from its children. The insertion of the new node happens when the search hits an empty node: it returns a new node with the target value. The implementation is given as:
def insert(root, t):
    if not root:
        return BiNode(t)
    if root.val == t:
        return root
    elif t < root.val:
        root.left = insert(root.left, t)
        return root
    else:
        root.right = insert(root.right, t)
        return root

In the notebook, I offer a variant of this implementation; check it out if you are interested. To insert iteratively, we need to track the parent node while searching. The while loop stops when it hits an empty node. There are three cases regarding the parent node:

1. When the parent node is None, which means the tree is empty, we assign the root a new node with the target value.

2. When the target's value is larger than the parent node's, we put a new node as the right child of the parent node.

3. When the target's value is smaller than the parent node's, we put a new node as the left child of the parent node.

The iterative code is given as:


def insertItr(root, t):
    p = None
    node = root   # Keep the root node
    while node:
        # The node exists already
        if node.val == t:
            return root
        if t > node.val:
            p = node
            node = node.right
        else:
            p = node
            node = node.left
    # Assign the new node
    if not p:
        root = BiNode(t)
    elif t > p.val:
        p.right = BiNode(t)
    else:
        p.left = BiNode(t)
    return root

BST Generation  To generate our exemplary BST shown in Fig. 14.5, we set keys = [8, 3, 10, 1, 6, 14, 4, 7, 13], then call the insert function we implemented on each key to generate the same tree. The time complexity is O(n log n) for a reasonably balanced tree.

keys = [8, 3, 10, 1, 6, 14, 4, 7, 13]
root = None
for k in keys:
    root = insert(root, k)

Find the Minimum and Maximum Keys  Because the minimum key is at the leftmost node of the tree, the search always traverses into the left subtree and returns the last non-empty node, which is our minimum node. The time complexity is the same as searching for any key: O(h), which is O(log n) if the tree is balanced.

def minimum(root):
    if not root:
        return None
    if not root.left:
        return root
    return minimum(root.left)

It can easily be converted to an iterative version:

def minimumIter(root):
    while root:
        if not root.left:
            return root
        root = root.left
    return None

To find the maximum node, replacing left with right will do; a sketch is given below, since the predecessor code later relies on it. Also, sometimes we need to search for two additional items related to a given node: its successor and predecessor. The structure of a binary search tree allows us to determine the successor or the predecessor of a node without ever comparing keys.
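
A minimal sketch of the maximum helper, mirroring minimum; the predecessor function later in this section assumes it exists:

def maximum(root):
    if not root:
        return None
    if not root.right:
        return root
    return maximum(root.right)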

Successor  The successor of node x is the smallest node in the BST that is strictly greater than x. It is also called the in-order successor: the node that follows x in the in-order traversal ordering, which is the sorted ordering. Every node other than the maximum node of the BST has a successor. The simplest implementation walks down from the root toward the node while recording the last node whose key is larger; in the worst case, a tree with no branching, this costs linear time. The code is shown as:
def successorInorder(root, node):
    if not node:
        return None
    if node.right is not None:
        return minimum(node.right)
    # Walk down from the root, tracking the candidate successor
    succ = None
    while root:
        if node.val > root.val:
            root = root.right
        elif node.val < root.val:
            succ = root
            root = root.left
        else:
            break
    return succ
Let us try something else. In the BST shown in Fig. 14.6, node 3's successor is node 4. For node 4, the successor is node 6. For node 7, the successor is node 8. What are the cases here?

• An easy case is when a node has a right subtree: its successor is the minimum node within its right subtree.

• However, if a node does not have a right subtree, there are two more cases:

  – If it is a left child of its parent, such as nodes 4 and 9, its direct parent is its successor.
  – However, if it is a right child of its parent, such as nodes 7 and 14, we traverse backward through its parents. When we find a parent node that is the left child of its own parent, that upper parent is the successor. For example, for node 7 we traverse through 6 and 3, and 3 is a left child of node 8, making node 8 the successor of node 7.

The above two rules can be merged: starting from the target node, traverse backward through its parents and find the first pair of nodes that stand in a left child–parent relation; the parent node of that pair is our targeted successor. This works because the left subtree is always smaller than its root: while traversing backward, a node that is smaller than its parent tells us that the target node is smaller than that parent too.
We write three functions to implement the successor:
• The function findNodeAddParent finds the target node and adds an attribute p to each node along the search path, pointing to its parent. The code is as:

def findNodeAddParent(root, t):
    if not root:
        return None
    if t == root.val:
        return root
    elif t < root.val:
        root.left.p = root    # assumes t exists in the tree
        return findNodeAddParent(root.left, t)
    else:
        root.right.p = root
        return findNodeAddParent(root.right, t)

• The function reverse finds the first left child–parent relation while traversing backward from a node through its parents.

def reverse(node):
    if not node or not node.p:
        return None
    # node is a left child
    if node.val < node.p.val:
        return node.p
    return reverse(node.p)

• The function successor takes a node as input and returns its successor.

def successor(root):
    if not root:
        return None
    if root.right:
        return minimum(root.right)
    else:
        return reverse(root)

To find the successor for a given key, we use the following code:

root.p = None
node = findNodeAddParent(root, 4)
suc = successor(node)

This approach gives us O(log n) time complexity on a balanced tree.

Predecessor  The predecessor of node x, on the other side, is the largest node in the BST that is strictly smaller than x. It is also called the in-order predecessor, which denotes the previous node in the in-order traversal of the BST. For example, for node 6 the predecessor is node 4, which is the maximum node within its left subtree. For node 4, the predecessor is node 3, the parent in the first right child–parent relation found while tracing back through the parents. Now, assuming we find the target node with the function findNodeAddParent, we first write the counterpart of reverse as reverse_right:

def reverse_right(node):
    if not node or not node.p:
        return None
    # node is a right child
    if node.val > node.p.val:
        return node.p
    return reverse_right(node.p)

Next, we implement the above rules to find the predecessor of a given node.

def predecessor(root):
    if not root:
        return None
    if root.left:
        return maximum(root.left)
    else:
        return reverse_right(root)

The expected time complexity is O(log n). The worst case is when the tree lines up with no branching, which makes it O(n). Similarly, we can walk down from the root as we did for the successor:

def predecessorInorder(root, node):
    if not node:
        return None
    if node.left is not None:
        return maximum(node.left)
    # Walk down from the root, tracking the candidate predecessor
    pred = None
    while root:
        if node.val > root.val:
            pred = root
            root = root.right
        elif node.val < root.val:
            root = root.left
        else:
            break
    return pred

Delete  When we delete a node, we need to restructure the subtree of that node to make sure the BST property is maintained. There are different cases:

1. The node to be deleted is a leaf: simply remove it from the tree. For example, nodes 1, 4, 7, and 13.

2. The node to be deleted has only one child: replace the node with its only child. For example, deleting node 14 promotes its child, node 13, into its place.

3. The node to be deleted has two children: for example, to delete node 3, which has both a left and a right subtree, we need a replacement value, either its predecessor (node 1) or its successor (node 4); we copy that value into the position about to be deleted and remove the copied node from its subtree.
To support the delete operation, we write a function deleteMinimum that obtains the minimum node of a subtree and returns the subtree with that node removed.

def deleteMinimum(root):
    if not root:
        return None, None
    if root.left:
        mini, left = deleteMinimum(root.left)
        root.left = left
        return mini, root
    # root itself is the minimum; its right subtree takes its place
    return root, root.right

Next, we implement the above three cases in the function _delete, which is given the node to delete and returns the processed subtree with that node removed.

def _delete(root):
    if not root:
        return None
    # No children: delete it
    if not root.left and not root.right:
        return None
    # Two children: copy the value of the successor
    elif all([root.left, root.right]):
        succ, right = deleteMinimum(root.right)
        root.val = succ.val
        root.right = right
        return root
    # One child: replace the node with its only child
    else:
        return root.left if root.left else root.right

Finally, we call the above two functions to delete a node with a target key.

def delete(root, t):
    if not root:
        return None
    if root.val == t:
        return _delete(root)
    elif t > root.val:
        root.right = delete(root.right, t)
        return root
    else:
        root.left = delete(root.left, t)
        return root

14.3.2 Binary Search Tree with Duplicates

If we use either of the other two definitions we introduced that allow duplicates, things can get more complicated. For example, if we use the definition x.left.key <= x.key < x.right.key, we can end up with a tree that looks like Fig. 14.7:

Figure 14.7: A BST with nodes 3 duplicated twice.

Note that the duplicates are not on contiguous levels. This is a big issue when allowing duplicates in a BST representation: because duplicates may be separated by any number of levels, detecting duplicates is difficult.
An option to avoid this issue is to not represent duplicates structurally (as separate nodes) but instead to use a counter that records the number of occurrences of the key. The previous example is then represented as in Fig. 14.8:

Figure 14.8: A BST with nodes 3 marked with two occurrence.

This simplifies the related operations at the expense of a few extra bytes and counter updates per node.
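
A minimal sketch of this counter-based representation; the class CountedNode and the function name are illustrative, not part of the earlier code:

class CountedNode:
    def __init__(self, val):
        self.val = val
        self.count = 1        # number of occurrences of this key
        self.left = None
        self.right = None

def insertWithCount(root, t):
    if not root:
        return CountedNode(t)
    if t == root.val:
        root.count += 1       # a duplicate: just bump the counter
    elif t < root.val:
        root.left = insertWithCount(root.left, t)
    else:
        root.right = insertWithCount(root.right, t)
    return root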

14.4 Segment Tree

Answering queries over an array is called a range query problem, e.g., finding the sum of a consecutive subarray a[l : r], or finding the minimum item in such a range. A direct, linear solution computes the required query over the subarray on the fly each time. When the array is large and the updates are frequent, even this linear approach is too slow. Let's try to solve this problem faster than linear time. How about computing the query for each range in advance and saving it in a dictionary? If we could, the query time would be constant. However, because there are O(n^2) subarrays, the space cost becomes quadratic, which is definitely not good. There is another problem: what if we need to change the value of an item? We would have to update every stored entry whose range includes that item.
With the technique of decrease and conquer, we can balance the search, the update, and the space of the dictionary approach down to logarithmic time. In binary search, we keep dividing our search space into halves recursively until a search space can no longer be divided. We can apply the same dividing process here and construct a binary tree in which each node has l and r to indicate the range that node represents. For example, if our array has index range [0, 5], its left subtree covers [0, mid] and its right subtree covers [mid + 1, 5]. A binary tree built in this binary search manner is shown in Fig. 14.9.

Figure 14.9: A Segment Tree

To get the answer for the range query [0, 5], we just return the value at the root node. If the range is [0, 1], which is on the left side of the tree, we go to the left branch, cutting off half of the search space. For a range that happens to span two nodes, such as [1, 3], which needs node [0, 1] and part of [2, 5], we search [0, 1] in the left subtree and [2, 3] in the right subtree and combine them together. Since a query visits at most a constant number of nodes on each level, any search stays within O(log n), proportional to the height of the tree.

Segment tree  The above binary tree is called a segment tree. From our analysis, we can see a segment tree is a static full binary tree. 'Static' here means that once the data structure is built, its shape cannot be modified or extended; however, we can still propagate updates of values in the original array into the segment tree. Segment trees are applied widely to answer numerous dynamic range query problems efficiently (in logarithmic time), such as finding the minimum, maximum, sum, greatest common divisor, or least common multiple within a range of an array.
Consider an array A of size n and a corresponding segment tree T:

1. The root of T represents the whole array A[0 : n].

2. Each internal node in T represents an interval A[i : j] where 0 <= i < j <= n.

3. Each leaf in T represents a single element A[i], where 0 <= i < n.

4. If a parent node covers the range [i, j], we split the range at the middle position m = (i + j)//2: the left child takes the range [i, m], and the right child takes the interval [m + 1, j].

Because at each step of building the segment tree the interval is divided into two halves, the height of the segment tree is O(log n). And there are n leaves and n − 1 internal nodes in total, which makes the total number of nodes 2n − 1, indicating a linear space cost. Besides an explicit tree, an implicit tree implemented with an array can be used too, similar to the case of the heap data structure.
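
A hedged sketch of this implicit representation for range sums: as in a heap, node i stores its children at indices 2i + 1 and 2i + 2, and an array of size 4n is a safe upper bound on the tree size; the function name build is illustrative:

def build(nums, tree, node, s, e):
    # Store the sum of nums[s..e] at index `node` of the flat array
    if s == e:
        tree[node] = nums[s]
        return
    m = (s + e) // 2
    build(nums, tree, 2 * node + 1, s, m)
    build(nums, tree, 2 * node + 2, m + 1, e)
    tree[node] = tree[2 * node + 1] + tree[2 * node + 2]

nums = [2, 9, 4, 5, 8, 7]
tree = [0] * (4 * len(nums))
build(nums, tree, 0, 0, len(nums) - 1)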

14.4.1 Implementation
The implementation of a functional segment tree consists of three core operations: tree construction, range query, and value update, named _buildSegmentTree(), _rangeQuery(), and _update(), respectively. We demonstrate the implementation with the Range Sum Query (RSQ) problem, but we generalize the process so that the template can easily be reused for other range query problems. In our implementation, we use an explicit tree data structure for both convenience and ease of understanding. We define a general tree node data structure as:
class TreeNode:
    def __init__(self, val, s, e):
        self.val = val
        self.s = s    # start index of the range
        self.e = e    # end index of the range
        self.left = None
        self.right = None

Range Sum Query (L307, medium)  Given an integer array, find the sum of the elements between indices i and j, the range [i, j], i <= j.

Example:

Given nums = [2, 9, 4, 5, 8, 7]

sumRange(0, 2) -> 15
update(1, 3)
sumRange(0, 2) -> 9
Figure 14.10: Illustration of Segment Tree for Sum Range Query.

Tree Construction  The function _buildSegmentTree() takes three arguments: nums, s as the start index, and e as the end index. Because there are 2n − 1 nodes in total, the time and space complexity are both O(n).
def _buildSegmentTree(nums, s, e):
    '''
    s, e: start index and end index
    '''
    if s > e:
        return None
    if s == e:
        return TreeNode(nums[s], s, e)

    m = (s + e) // 2
    # Divide: build the two subtrees
    left = _buildSegmentTree(nums, s, m)
    right = _buildSegmentTree(nums, m + 1, e)

    # Conquer: merge the two subtrees
    node = TreeNode(left.val + right.val, s, e)
    node.left = left
    node.right = right
    return node

We build a segment tree for our example as:

nums = [2, 9, 4, 5, 8, 7]
root = _buildSegmentTree(nums, 0, len(nums) - 1)

It generates the tree shown in Fig. 14.10.

Range Query  Each query with range [i, j], i <= j, i >= s, j <= e, is answered at a single node or by combining multiple nodes. In the query process, check the following cases:

• If the range [i, j] matches the node's range [s, e], return the value of the node; otherwise, proceed to the other cases.

• Compute the middle index m = (s + e)//2. The range [i, j] is within the left state space [s, m] if j <= m, within the right state space [m + 1, e] if i >= m + 1, and crosses the two spaces otherwise.

  – For the first two cases, a recursive call on that branch returns our result.
  – For the third case, where the range crosses the two spaces, two recursive calls on both children of the current node are needed: the left one handles the range [i, m], and the right one handles the range [m + 1, j]. The final result is the combination of the two.

The code is as follows:


def _rangeQuery(root, i, j, s, e):
    if s == i and j == e:
        return root.val if root else 0
    m = (s + e) // 2
    if j <= m:
        return _rangeQuery(root.left, i, j, s, m)
    elif i > m:
        return _rangeQuery(root.right, i, j, m + 1, e)
    else:
        return _rangeQuery(root.left, i, m, s, m) + \
               _rangeQuery(root.right, m + 1, j, m + 1, e)
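
A quick usage sketch on the tree built above for nums = [2, 9, 4, 5, 8, 7]; the caller passes the root range [0, n − 1]:

n = len(nums)
_rangeQuery(root, 0, 2, 0, n - 1)   # 2 + 9 + 4 = 15
_rangeQuery(root, 1, 3, 0, n - 1)   # 9 + 4 + 5 = 18, crosses both subtrees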

Update  To update nums[1] = 3, all nodes on the path from the root to the corresponding leaf are affected and need to be updated to incorporate the change at the leaf. We search through the tree with the range [1, 1] just as we did in _rangeQuery, except that we no longer need the case of crossing two ranges. Once we reach the leaf node, we set its value to the new value; as the recursion backtracks to the parents, we recompute each parent node's value from the results of its children. This operation takes O(log n) time, and we can do it in place since the structure of the tree does not change.
def _update(root, s, e, i, val):
    if s == e == i:
        root.val = val
        return
    m = (s + e) // 2
    if i <= m:
        _update(root.left, s, m, i, val)
    else:
        _update(root.right, m + 1, e, i, val)
    root.val = root.left.val + root.right.val
    return

Minimum and Maximum Range Query  To get the minimum or maximum value within a given range, we just need to modify how a node's value is computed. For example, for updates we only change the final merge line of _update to root.val = min(root.left.val, root.right.val), as sketched below.
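
A minimal sketch of the update for a range-minimum tree; the name _updateMin is illustrative, and it assumes the tree was also built by merging with min() instead of +:

def _updateMin(root, s, e, i, val):
    if s == e == i:
        root.val = val
        return
    m = (s + e) // 2
    if i <= m:
        _updateMin(root.left, s, m, i, val)
    else:
        _updateMin(root.right, m + 1, e, i, val)
    root.val = min(root.left.val, root.right.val)   # the only changed line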
There are many more variants of the segment tree; check out https://cp-algorithms.com/data_structures/segment_tree.html if you want to know more.

14.5 Exercises
1. 144. Binary Tree Preorder Traversal

2. 94. Binary Tree Inorder Traversal

3. 145. Binary Tree Postorder Traversal

4. 589. N-ary Tree Preorder Traversal

5. 590. N-ary Tree Postorder Traversal

6. 429. N-ary Tree Level Order Traversal

7. 103. Binary Tree Zigzag Level Order Traversal(medium)

8. 105. Construct Binary Tree from Preorder and Inorder Traversal

938. Range Sum of BST (Medium)

Given the root node of a binary search tree, return the sum of the values of all nodes with value between L and R (inclusive).
The binary search tree is guaranteed to have unique values.

Example 1:

Input: root = [10, 5, 15, 3, 7, null, 18], L = 7, R = 15
Output: 32

Example 2:

Input: root = [10, 5, 15, 3, 7, 13, 18, 1, null, 6], L = 6, R = 10
Output: 23

Tree Traversal + Divide and Conquer  We need at most O(n) time complexity. For each node, there are three cases: (1) L <= val <= R, (2) val < L, (3) val > R. In the first case, we obtain the results of both subtrees and merge them with the node's own val. For the other two, by the property of the BST, only the result of one subtree is needed.
def rangeSumBST(self, root, L, R):
    if not root:
        return 0
    if L <= root.val <= R:
        return self.rangeSumBST(root.left, L, R) + \
               self.rangeSumBST(root.right, L, R) + root.val
    elif root.val < L:   # the left subtree is not needed
        return self.rangeSumBST(root.right, L, R)
    else:                # the right subtree is not needed
        return self.rangeSumBST(root.left, L, R)

14.5.1 Exercises
14.1 35. Search Insert Position (easy). Given a sorted array and a
target value, return the index if the target is found. If not, return the
index where it would be if it were inserted in order.
You can assume that there are no duplicates in the array.
Example 1:
Input: [1, 3, 5, 6], 5
Output: 2

Example 2:
Input: [1, 3, 5, 6], 2
Output: 1

Example 3:
Input: [1, 3, 5, 6], 7
Output: 4

Example 4:
Input: [1, 3, 5, 6], 0
Output: 0

Solution: Standard Binary Search Implementation  For this problem, we just standardize the Python code of binary search, which takes O(log n) time and O(1) space without using a recursive function. In the following code, we use an exclusive right index len(nums), so the loop stops when l == r; the result can be as small as 0 or as large as n, the array length, for a target that is no larger than nums[0] or larger than nums[-1]. We can also make the right index inclusive.

# Exclusive version
def searchInsert(self, nums, target):
    l, r = 0, len(nums)   # start from 0, end at len (exclusive)
    while l < r:
        mid = (l + r) // 2
        if nums[mid] < target:     # move to the right side
            l = mid + 1
        elif nums[mid] > target:   # move to the left side, not mid - 1
            r = mid
        else:                      # found the target
            return mid
    # where the position should go
    return l

# Inclusive version
def searchInsert(self, nums, target):
    l = 0
    r = len(nums) - 1
    while l <= r:
        m = (l + r) // 2
        if target > nums[m]:     # search the right half
            l = m + 1
        elif target < nums[m]:   # search the left half
            r = m - 1
        else:
            return m
    return l

Standard binary search


1. 611. Valid Triangle Number (medium)
2. 704. Binary Search (easy)
3. 74. Search a 2D Matrix. Write an efficient algorithm that searches for a value in an m x n matrix. This matrix has the following properties:
   (a) Integers in each row are sorted from left to right.
   (b) The first integer of each row is greater than the last integer of the previous row.
For example, consider the following matrix:

[
  [1,  3,  5,  7],
  [10, 11, 16, 20],
  [23, 30, 34, 50]
]

Given target = 3, return true.

Also, we can treat the matrix as one-dimensional; the time complexity is then O(log(m * n)), which is the same as O(log m + log n).
class Solution:
    def searchMatrix(self, matrix, target):
        if not matrix or target is None:
            return False

        rows, cols = len(matrix), len(matrix[0])
        low, high = 0, rows * cols - 1

        while low <= high:
            mid = (low + high) // 2
            num = matrix[mid // cols][mid % cols]

            if num == target:
                return True
            elif num < target:
                low = mid + 1
            else:
                high = mid - 1

        return False

Check http://www.cnblogs.com/grandyang/p/6854825.html for more examples.
Search on rotated sorted arrays and 2D matrices:

1. 81. Search in Rotated Sorted Array II (medium)

2. 153. Find Minimum in Rotated Sorted Array (medium). The key here is to compare the middle item with the left side; if the item at mid − 1 has a larger value than the item at mid, then the item at mid is the minimum.

3. 154. Find Minimum in Rotated Sorted Array II (hard)

Search on Result Space:

1. 367. Valid Perfect Square (easy) (standard search)

2. 363. Max Sum of Rectangle No Larger Than K (hard)

3. 354. Russian Doll Envelopes (hard)

4. 69. Sqrt(x) (easy)


15 Sorting and Selection Algorithms

Sorting is the most basic building block for many other algorithms and is often the very first step that reduces an original problem to an easier one.

15.1 Introduction
Sorting  In computer science, a sorting algorithm is designed to rearrange the items of a given array in a certain order based on each item's key. The most frequently used orders are numerical order and lexicographical order. For example, given an array of size n, sort the items in increasing order of their numerical values:

Array  = [9, 10, 2, 8, 9, 3, 7]
sorted = [2, 3, 7, 8, 9, 9, 10]

Selection  A selection algorithm finds the k-th smallest number in a given array; such a number is called the k-th order statistic. For example, given the above array, find the 3rd smallest number.

Array = [9, 10, 2, 8, 9, 3, 7], k = 3
Result: 7

Sorting and selection often go hand in hand: either we first sort and then select the desired order statistic by indexing, or we derive a selection algorithm from a corresponding sorting algorithm. Due to this relation, this chapter mainly introduces sorting algorithms, and occasionally we introduce the corresponding selection algorithms on the side.


Lexicographical Order  For a list of strings, sorting puts them in lexicographical order. The order is decided by a comparison function, which compares corresponding characters of the two strings from left to right. In the process, the first pair of characters that differ from each other determines the ordering: the string whose character of the pair is smaller is the smaller string.
Characters are compared using the Unicode character set. All uppercase letters come before lowercase letters. If two letters have the same case, alphabetical order is used to compare them. For example:

'ab'  < 'bc'  (differs at i = 0)
'abc' < 'abd' (differs at i = 2)

A special case appears when the two strings are of different lengths and the shorter one s is a prefix of the longer one t; then s < t. For example:

'ab' < 'abab' ('ab' is a prefix of 'abab')

How to Learn Sorting Algorithms?  We list a few terms commonly used to describe the properties of a sorting algorithm:

• In-place Sorting: An in-place sorting algorithm uses only a constant amount of extra space to assist its implementation. If a sorting algorithm is not in-place, it is called out-of-place sorting instead.

• Stable Sorting: A stable sorting algorithm maintains the relative order of items with equal keys. For example, two different tasks that come with the same priority in a priority queue should be scheduled in their relative pending order.

• Comparison-based Sorting: This kind of sorting technique determines the sorted order of an input array by comparing pairs of items and moving them around based on the outcomes of the comparisons. It has a lower bound of Ω(n log n) comparisons.

Sorting Algorithms in Coding Interviews  The fundamental sorting and selection algorithms can still come up in interviews, where we might be asked to implement and analyze any sorting algorithm we like. Therefore, it is necessary to understand the most commonly known sorting algorithms. Also, Python provides built-in sorting routines to use directly, and we shall master their syntax too.

The Applications of Sorting  The importance of sorting techniques comes from its many fields of application:

1. Sorting can organize information in a human-friendly way. For example, lexicographical order is used in dictionaries and inside library systems to help users locate a wanted word or book quickly.

2. Sorting algorithms are often used as a key subroutine of other algorithms. As we have shown before, binary search, sliding window algorithms, and cyclic shifts of a suffix array need the data to be in sorted order to carry on the next step. When ordering does not lead to a wrong solution, sorting beforehand should always be atop our mind, since sorting first might ease the problem later.

Organization  We organize the content mainly based on worst-case time complexity. Sections 15.3 and 15.4 focus on comparison-based sorting algorithms, and Section 15.5 introduces classical non-comparison-based sorting algorithms.

• Naive Sorting (Section 15.3): bubble sort, insertion sort, selection sort;

• Asymptotically Best Sorting (Section 15.4): merge sort, quick sort, and quick select;

• Linear Sorting (Section 15.5): counting sort, where k is the range between the smallest and largest keys;

• Python Built-in Sort (Section 15.6).

15.2 Python Comparison Operators and Built-in Functions

Comparison Operators  Python offers 7 comparison operators, shown in Table 15.1, to compare values. Each returns True or False according to the condition.

Table 15.1: Comparison operators in Python

>   Greater than: True if the left operand is greater than the right
<   Less than: True if the left operand is less than the right
==  Equal to: True if both operands are equal
!=  Not equal to: True if the operands are not equal
>=  Greater than or equal to: True if the left operand is greater than or equal to the right
<=  Less than or equal to: True if the left operand is less than or equal to the right

For example, compare two numerical values:

c1 = 2 < 3
c2 = 2.5 > 3

The printout is:

(True, False)

Also, comparing two strings follows the lexicographical order:

c1 = 'ab' < 'bc'
c2 = 'abc' > 'abd'
c3 = 'ab' < 'abab'
c4 = 'abc' != 'abc'

The printout is:

(True, False, True, False)

What's more, Python can compare other types of sequences, such as list and tuple, using lexicographical order too:

c1 = [1, 2, 3] < [2, 3]
c2 = (1, 2) > (1, 2, 3)
c3 = [1, 2] == [1, 2]

The printout is:

(True, False, True)

However, Python 3 mostly does not support comparison between different types of sequences, nor does it support ordering comparisons for dictionaries. For example, a comparison between a list and a tuple raises a TypeError:

[1, 2, 3] < (2, 3)

The error is shown as:

----> 1 [1, 2, 3] < (2, 3)
TypeError: '<' not supported between instances of 'list' and 'tuple'

An ordering comparison between dictionaries as follows raises the same error:

{1: 'a', 2: 'b'} < {1: 'a', 2: 'b', 3: 'c'}

Comparison Functions  The Python built-in functions max() and min() support two forms of syntax: max(iterable, *[, key, default]) and max(arg1, arg2, *args[, key]). If one positional argument is provided, it should be an iterable, and the function returns the largest item in the iterable based on the key. It also accepts two or more positional arguments, which can be numerical or sequential; in that case, the function returns the largest of them.
For example, with one iterable it returns 20:

max([4, 8, 9, 20, 3])

With two or more positional arguments, either numerical or sequential:

m1 = max(24, 15)
m2 = max([4, 8, 9, 20, 3], [6, 2, 8])
m3 = max('abc', 'ba')

The printout of these results is:

(24, [6, 2, 8], 'ba')

With a dictionary, max() compares the keys by default; a key function lets us compare by value instead:

dict1 = {'a': 5, 'b': 8, 'c': 3}
k1 = max(dict1)
k2 = max(dict1, key=dict1.get)
k3 = max(dict1, key=lambda x: dict1[x])

The printout is:

('c', 'b', 'b')

When the sequence is empty, we need to set a default value:

max([], default=0)

Rich Comparison  To compare instances of a self-defined class in Python 2.X, the special method __cmp__(self, other) is used to implement comparison between two objects. __cmp__(self, other) returns a negative value if self < other, a positive value if self > other, and zero if they are equal. However, in Python 3 this cmp style of comparison is dropped and rich comparison is introduced, which assigns a special method to each operator, as shown in Table 15.2:

Table 15.2: Operator and its special method

==  __eq__
!=  __ne__
<   __lt__
<=  __le__
>   __gt__
>=  __ge__

To avoid the hassle of providing all six functions, we can implement only __eq__, __ne__, and one of the ordering operators, and use the functools.total_ordering() decorator to fill in the rest. For example, we write a class Person:
from functools import total_ordering

@total_ordering
class Person(object):
    def __init__(self, firstname, lastname):
        self.first = firstname
        self.last = lastname

    def __eq__(self, other):
        return ((self.last, self.first) == (other.last, other.first))

    def __ne__(self, other):
        return not (self == other)

    def __lt__(self, other):
        return ((self.last, self.first) < (other.last, other.first))

    def __repr__(self):
        return "%s %s" % (self.first, self.last)

Then we are able to use any of the above comparison operators on our class:

p1 = Person('Li', 'Yin')
p2 = Person('Bella', 'Smith')
p1 > p2

It outputs True because the last name 'Yin' is larger than 'Smith'.

15.3 Naive Sorting

As the most naive and intuitive group of comparison-based sorting methods, this group takes O(n^2) time and usually consists of two nested for loops. In this section, we cover three different sorting algorithms quickly due to their simplicity: insertion sort, bubble sort, and selection sort.

15.3.1 Insertion Sort

Insertion sort is one of the most intuitive sorting algorithms for humans. Given an array of n items to process, we divide it into two regions: a sorted region and an unrestricted region. Each time we take one item out of the unrestricted region and move it into the sorted region by inserting it at its proper position.

In-place Insertion  The logic behind this algorithm is simple; we could do it easily with a separate sorted array, but here we focus on in-place insertion. Given an array of size n, we use indices 0 and i to point to the start of the sorted and unrestricted regions, respectively, with i = 1 at the beginning, indicating the sorted region naturally starts with one item. We have the sorted region in [0, i − 1] and the unrestricted region in [i, n − 1]. We scan the items of the unrestricted region from left to right and insert each item a[i] into the sorted sublist.

Figure 15.1: The whole process for insertion sort: Gray marks the item to
be processed, and yellow marks the position after which the gray item is to
be inserted into the sorted region.

The key step is to find the proper position in the region [0, i − 1] to insert a[i] into. There are two directions in which to scan the sorted region: forward and backward. We use a pointer j inside the sorted region.

• Forward: j iterates over the range [0, i − 1]. We compare a[j] with a[i] and stop at the first place where a[j] > a[i] (to keep the sort stable). All items a[j : i − 1] are shifted backward by one position, and a[i] is placed at index j. Here we need up to i comparisons and shifts.

• Backward: j iterates over the range [i − 1, 0]. We compare a[j] with a[i] and stop at the first place where a[j] <= a[i] (to keep the sort stable). In this process, we can do the shifting simultaneously: while a[j] > a[i], we shift a[j] to a[j + 1].

In the forward direction, the shifting process still requires us to traverse the range in reverse, so the backward iteration makes better sense.
For example, given an array a = [9, 10, 2, 8, 9, 3]: first, 9 by itself is a sorted region. We demonstrate the backward iteration process. In the first pass, 10 is compared with 9 and stays where it is. In the second pass, 2 is compared with 10 and 9, and is then put at the first position. The whole process for this example is demonstrated in Fig. 15.1.

With Extra Space Implementation  The Python list.insert() function handles the insert and the shifting at the same time. We need to pay attention to the case when the item is larger than all items in the sorted list: we then have to insert it at the end.

def insertionSort(a):
    if not a or len(a) == 1:
        return a
    n = len(a)
    sl = [a[0]]   # sorted list
    for i in range(1, n):
        for j in range(i):
            if sl[j] > a[i]:
                sl.insert(j, a[i])
                break
        if len(sl) != i + 1:   # not inserted yet
            sl.insert(i, a[i])
    return sl

Backward In-place Implementation  We use a while loop to handle the backward iteration: whenever the target is smaller than an item in the sorted region, we shift that item backward. The while loop stops either when j = −1 or when t >= a[j].

• When j = −1, we need to insert the target at the first position, which is j + 1 = 0.

• When t >= a[j], we need to insert the target one position behind j, which is j + 1.

The code is shown as:


def insertionSort(a):
    if not a or len(a) == 1:
        return a
    n = len(a)
    for i in range(1, n):
        t = a[i]
        j = i - 1
        while j >= 0 and t < a[j]:
            a[j + 1] = a[j]   # Move the item backward
            j -= 1
        a[j + 1] = t
    return

15.3.2 Bubble Sort and Selection Sort

Bubble Sort

Bubble sort compares each pair of adjacent items in an array and swaps them if they are out of order. Given an array of size n, a single pass makes n − 1 pair comparisons, and at the end of the pass one item is put in place.

Figure 15.2: One pass for bubble sort

Passes  For example, Fig. 15.2 shows the first pass of sorting the array [9, 10, 2, 8, 9, 3]. When comparing a pair (a_i, a_{i+1}), if a_i > a_{i+1} we swap the two items. We can clearly see that after one pass, the largest item, 10, is in place. The next pass only compares pairs within the unrestricted window [0, 4]. This is what "bubble" means in the name: after a pass, the largest item in the unrestricted window bubbles up to the end of the window and becomes in place.

Implementation  With the understanding of the valid window of each pass, we can implement bubble sort with two nested for loops in Python. The first for loop enumerates the passes, say i, of which there are n − 1 in total. The second for loop scans the pairs in the unrestricted window [0, n − i − 1] from left to right; the index j points to the first item of the pair, making its range [0, n − i − 2].

def bubbleSort(a):
    if not a or len(a) == 1:
        return
    n = len(a)
    for i in range(n - 1):        # n - 1 passes
        for j in range(n - i - 1):
            # Swap an out-of-order pair
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return

When a pair has equal values, we do not need to swap them. The advantages of doing so are (1) saving unnecessary swaps and (2) keeping the original order of items with the same key, which makes bubble sort a stable sort. Also, the implementation assigns no extra space, which makes bubble sort an in-place sort.

Complexity Analysis and Optimization  In the i-th pass, the valid window has n − i items, with at most n − i − 1 comparisons and swaps, and we need a total of n − 1 passes. The total time is

T = Σ_{i=0}^{n-2} (n − i − 1) = (n − 1) + (n − 2) + ... + 2 + 1 = n(n − 1)/2 = O(n^2).

The above implementation runs in O(n^2) even if the array is already sorted. We can optimize the inner for loop by stopping the whole program if no swap is detected in a single pass, as sketched below. When the input is nearly sorted, this strategy can get us O(n) time complexity.
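
A minimal sketch of this early-stop optimization; the name bubbleSortOpt is illustrative:

def bubbleSortOpt(a):
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - i - 1):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:   # no swap in this pass: already sorted
            break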

Selection Sort

Figure 15.3: The whole process for Selection sort

In bubble sort, each pass puts the largest element of the valid window in place through a series of swap operations. Selection sort makes a slight optimization: it searches for the largest item in the current unrestricted window and swaps it directly with the last item of the window. This avoids the constant swapping that occurs in bubble sort. The whole sorting process for the same array is shown in Fig. 15.3.

Implementation  Similar to the implementation of bubble sort, we keep the concept of passes in the outer for loop and the concept of the unrestricted window in the inner for loop. We use the variables ti and li for the target position of the largest item and for its current index, respectively.

def selectSort(a):
    n = len(a)
    for i in range(n - 1):   # n - 1 passes
        ti = n - 1 - i       # where the largest item should go
        li = 0               # the index of the largest item so far
        for j in range(n - i):
            if a[j] >= a[li]:
                li = j
        # Swap the items at li and ti
        a[ti], a[li] = a[li], a[ti]
    return

Like bubble sort, selection sort is in-place. In the comparison we used a[j] >= a[li], which selects the rightmost of equal largest keys. Note, however, that the long-distance swap can still move an item past equal keys: for example, sorting [4, 2, 3, 2] swaps the 4 with the last 2, placing it before the first 2, so this implementation of selection sort is not stable.

Complexity Analysis  As with bubble sort, selection sort has a worst-case and average time complexity of O(n^2), but it performs fewer swaps and is thus more efficient when the input is not nearly sorted.

15.4 Asymptotically Best Sorting

We have learned a few comparison-based sorting algorithms, and they all have an upper bound of O(n^2) in time complexity due to the number of comparisons that must be executed. Can we do better than O(n^2), and how?

Comparison-based Lower Bound for Sorting  Given an input of size n, there are n! possible permutations of the input, which means our sorting algorithm must identify the one and only sorted permutation by comparing pairs of items. So how many comparisons do we need to reach the answer? Let's try the case n = 3; all possible permutations of the indices are (1, 2, 3), (1, 3, 2), (3, 1, 2), (2, 1, 3), (2, 3, 1), (3, 2, 1). First we compare the pair (1, 2): if a_1 < a_2, our candidate set is narrowed down to {(1, 2, 3), (1, 3, 2), (3, 1, 2)}.
We can draw a decision tree, a full binary tree with n! leaves, one per permutation, where each branch represents one decision made on a comparison result. The cost of any comparison-based algorithm is abstracted as the length of the path from the root of the decision tree to its final sorted permutation. The longest path represents the worst-case number of comparisons.
Using h to denote the height of the binary tree and l the number of leaves: first, a binary tree has at most 2^h leaves, so we get l <= 2^h. Second, the tree must have at least n! leaves to represent all possible orderings, so we have l >= n!. Therefore we get the lower bound on the worst-case time complexity:

n! ≤ l ≤ 2^h        (15.1)
2^h ≥ n!            (15.2)
h ≥ log(n!)         (15.3)
h = Ω(n log n)      (15.4)

The last step follows from Stirling's approximation, log(n!) = Θ(n log n).

In this section, we introduce three classical sorting algorithms that have O(n log n) time complexity: merge sort and quick sort, which both utilize the divide-and-conquer method, and heap sort, which uses the max/min heap data structure.

15.4.1 Merge Sort

As we know, there are two main steps in merge sort, "divide" and "merge", and we have already seen an illustration of the "divide" process in Chapter 13.

Divide  In the divide stage, the original problem is a[s...e], where s and e are the start and end indices of the subarray, respectively. The divide process splits its parent problem into two halves at the middle index m = (s + e)//2: a[s...m] and a[m + 1...e]. The recursive calls keep moving downward until the size of the subproblem becomes one, when s = e; this is the base case, for a list of size 1 is naturally sorted. The divide process is shown in Fig. 15.4.

Merge Once we have obtained two sorted sublists from the left and right sides,
the task of the current subproblem is to merge the two sorted lists into one. The
merge is done with the two-pointer method: we allocate a new list and put
one pointer at the start of each sublist; each time, between the two items
indicated by the pointers, we append the smaller one to the new list. Once a
smaller item is chosen, we move its corresponding pointer to the next item in
that sublist. We continue this process until either pointer reaches the end.
Then, the remainder of the sublist whose pointer has not yet reached the end
is copied to the end of the newly generated list. This subprocess is shown in
Fig. 15.4 and its implementation is as follows:
def merge(l, r):
    ans = []
    # Two pointers, one into l and one into r
    i = j = 0
    n, m = len(l), len(r)

    while i < n and j < m:
        if l[i] <= r[j]:
            ans.append(l[i])
            i += 1
        else:
            ans.append(r[j])
            j += 1
    # Copy whatever remains in the unfinished sublist
    ans += l[i:]
    ans += r[j:]
    return ans

Figure 15.4: Merge Sort: The dividing process is marked with dark arrows and the merging process with gray arrows, with the merged list also marked in gray.

In the code, we use l[i] <= r[j] instead of l[i] < r[j] because, when the
left and right sublists contain items with equal keys, we put the ones from the
left first in the merged list, so that the sorting is stable. However, we use
O(n) temporary space to save the merged result, making merge sort an
out-of-place sorting algorithm.

Implementation The whole implementation is straightforward.


def mergeSort(a, s, e):
    if s == e:
        return [a[s]]

    m = (s + e) // 2

    l = mergeSort(a, s, m)
    r = mergeSort(a, m + 1, e)
    return merge(l, r)
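For instance (a usage sketch we add here; any input works), sorting the array [29, 10, 14, 37, 13] returns a new sorted list:

a = [29, 10, 14, 37, 13]
print(mergeSort(a, 0, len(a) - 1))
# [10, 13, 14, 29, 37]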

Complexity Analysis Because each level of the divide process needs a total of
O(n) time to merge the sublists back into larger lists, the recurrence relation of
the complexity function can be deduced as follows:

T(n) = 2T(n/2) + O(n)
     = 4T(n/4) + O(n) + O(n)
     = ...
     = O(n log n)          (15.5)

Thus, we get O(n log n) as the upper bound for merge sort, which is asymp-
totically optimal among comparison-based sorting algorithms.

15.4.2 HeapSort
To sort the given array in increasing order, we can use a min-heap. We first
heapify the given array. To get a sorted list, we then simply pop items
until the heap is empty; the popped items come out in sorted order.

Implementation We can implement heap sort easily with the built-in module
heapq, through its heapify() and heappop() functions:

from heapq import heapify, heappop

def heapsort(a):
    heapify(a)
    return [heappop(a) for i in range(len(a))]

Complexity Analysis The heapify takes O(n), and the subsequent popping
takes O(log n + log(n − 1) + ... + 0) = O(log(n!)), which has an upper bound of
O(n log n).
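Since heapq offers only a min-heap, a common trick (a sketch we add here, not from the text) for sorting in decreasing order is to negate the keys:

from heapq import heapify, heappop

def heapsort_desc(a):
    # Negate the keys so the min-heap pops the largest original value first
    b = [-x for x in a]
    heapify(b)
    return [-heappop(b) for _ in range(len(b))]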

15.4.3 Quick Sort and Quick Select


Like merge sort, quick sort applies the divide-and-conquer method and is mainly
implemented with recursion. Unlike merge sort, the conquering step of the
sorting process, the partition, happens before "dividing" the problem into sub-
problems through recursive calls.

Partition and Pivot In the partition, quick sort chooses a pivot item
from the subarray, either randomly or deterministically. Given a subarray
A[s...e], the pivot can be located at s or e, or at a random position in the
range [s, e]. The partition then rearranges the subarray A[s...e] into three parts
according to the value of the pivot: A[s...p − 1], A[p], and A[p + 1...e], where p
is the position at which the pivot is placed. The parts to the left and right of
the pivot satisfy the following conditions:
• A[i] ≤ A[p], i ∈ [s, p − 1],

• A[i] > A[p], i ∈ [p + 1, e].


If we were allowed linear extra space, this partition process would be trivial
to implement. However, we should strive for better and learn an in-place
partition method, Lomuto's partition, which uses only constant space.

Conquer After the partition, one item, the pivot A[p], is placed in its final
position. Next, we only need to handle two subproblems: sorting A[s...p − 1]
and A[p + 1...e] by recursively calling the quicksort function. We can write down
the main steps of quick sort as:
def quickSort(a, s, e):
    # Base case
    if s >= e:
        return
    p = partition(a, s, e)

    # Conquer
    quickSort(a, s, p - 1)
    quickSort(a, p + 1, e)
    return

In the next subsection, we talk about the partition algorithm. The
requirement for this step is to do it in-place, using only a series of
swapping operations.

Lomuto’s Partition
We use the example A = [3, 5, 2, 1, 6, 4] to demonstrate this partition method.
Assume our given range for the partition is [s, e], and p = A[e] is chosen as the
pivot. We use the two-pointer technique with pointers i and j to maintain three
regions in the subarray A[s...e]: (1) region [s, i] with items smaller than or
equal to p; (2) region [i + 1, j − 1] with items larger than p; (3) the unrestricted
region [j, e − 1]. These three regions and the partition process on the example
are shown in Fig. 15.5.

• At first, i = s − 1 and j = s. This initialization guarantees that regions (1)
and (2) are both empty, and region (3) is the full range other than the
pivot.
Figure 15.5: Lomuto's Partition. Yellow, white, and gray mark regions (1), (2), and (3), respectively.

• Then, we scan the items in the unrestricted region using pointer j:

– If the current item A[j] belongs to region (2), that is, A[j] > p,
we just increment pointer j;
– Otherwise, when A[j] ≤ p, the item should go to region (1).
We accomplish this by swapping it with the first item of region (2),
at position i + 1, and incrementing i. Region (1) thus grows by one,
and region (2) shifts one position backward.

• After the for loop, we put the pivot at the first position of region (2)
by swapping. Now the whole subarray is successfully partitioned into the
three regions we need, and we return the index where the pivot sits,
i + 1, as the partition index.

The implementation is as follows:

def partition(a, s, e):
    p = a[e]
    i = s - 1
    # Scan the unrestricted region
    for j in range(s, e):
        # Swap items <= pivot into region (1)
        if a[j] <= p:
            i += 1
            a[i], a[j] = a[j], a[i]
    # Place the pivot right after region (1)
    a[i + 1], a[e] = a[e], a[i + 1]
    return i + 1
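Putting partition and quickSort together on our running example sorts the array in place:

a = [3, 5, 2, 1, 6, 4]
quickSort(a, 0, len(a) - 1)
print(a)  # [1, 2, 3, 4, 5, 6]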

Complexity Analysis The worst case of the partition appears when the
input array is already sorted or is the reverse of the sorted order. In this
case, each partition turns a problem of size n into one subproblem of size
n − 1 and one empty subproblem. The recurrence function is
T(n) = T(n − 1) + O(n), which gives a time complexity of O(n^2). The best
case appears when each subproblem is divided into two halves, as in merge
sort, where the time complexity is O(n log n). Randomly picking the pivot
from A[s...e] and swapping it with A[e] helps us achieve a reliable performance
with an average O(n log n) time complexity.

Stability of Quick Sort

Quick sort is not stable, because there are cases where items are swapped
regardless of key order: (1) the first item in region (2) can be swapped to the
end of region (2); (2) the pivot is swapped with the first item in region (2)
too. Therefore, it is hard to guarantee the stability among equal keys. We can
experiment with A = [(2, 1), (2, 2), (1, 1)], using the first element of each tuple
as the key; quick sort produces A = [(1, 1), (2, 2), (2, 1)], reversing the relative
order of the two equal keys.
However, we can still make quick sort stable if we get rid of the swaps
by using two extra lists: one for saving the smaller and equal items and
the other for the larger items.
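The following is a minimal sketch of that two-list idea (our addition; the key parameter is ours as well, added so stability is observable on the tuple example):

def stableQuickSort(a, key=lambda x: x):
    if len(a) <= 1:
        return a
    p = a[-1]  # Pivot: the last item, so all equal keys in a[:-1] precede it
    # Two extra lists preserve the relative order of equal keys
    smaller = [x for x in a[:-1] if key(x) <= key(p)]
    larger = [x for x in a[:-1] if key(x) > key(p)]
    return stableQuickSort(smaller, key) + [p] + stableQuickSort(larger, key)

a = [(2, 1), (2, 2), (1, 1)]
print(stableQuickSort(a, key=lambda x: x[0]))  # [(1, 1), (2, 1), (2, 2)]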

Quick Select
Quick select is a variant of quick sort; it is used to find the k-th smallest item
(0-indexed) in a list in average linear time. Quicksort recurs on both sides of the
partition index, while quick select recurs only on the side that contains the
k-th smallest item. Similar to binary search, the comparison of k and the
partition index p results in three cases:

• If p = k, we have found the k-th smallest item; return it.

• If p > k, we recur on the left side.

• If p < k, we recur on the right side.

Based on this structure, quick select has the following recurrence for its
(average) time complexity:

T(n) = T(n/2) + O(n)                     (15.6)
T(n) = O(n)  (by the master theorem)     (15.7)

Implementation We first assume k is in the range [s, e]. When s = e, there is
only one item in the list and we can no longer divide it. This is our end
condition, and it also covers the case when the original list has only one
item, where we have to return that item as the 0-th smallest item.

def quickSelect(a, s, e, k, partition=partition):
    if s >= e:
        return a[s]

    p = partition(a, s, e)
    if p == k:
        return a[p]
    if k > p:
        return quickSelect(a, p + 1, e, k, partition)
    else:
        return quickSelect(a, s, p - 1, k, partition)
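For example (a usage sketch we add), finding the median, the 2nd smallest item 0-indexed, of a five-item list:

a = [3, 5, 2, 1, 4]
print(quickSelect(a, 0, len(a) - 1, 2))  # 3, the median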

15.5 Linear Sorting


Sorting without relying on comparisons is possible, creative, and even
faster, as proved by the three non-comparative sorting algorithms we are about
to introduce: Bucket Sort, Counting Sort, and Radix Sort. For these algo-
rithms, the theoretic lower bound Ω(n log n) of comparison-based sorting no
longer applies; they all work in linear time. However, there are limitations on
the input data, as these sorting techniques rely on certain assumptions about
the data to be sorted in order to work.
Although the three algorithms we see in this section come in different
forms and rely on different assumptions about the input data, they have one
thing in common: they all use the divide-and-conquer algorithm design paradigm.
Let's explore their unique tricks and their restricted applications!

15.5.1 Bucket Sort


Bucket Sort assumes that the input data satisfy a uniform distribution,
usually taken to be on the interval [0, 1); it can be extended to any uniform
distribution with a simple modification. Bucket sort applies a one-time
divide-and-conquer trick: it distributes the input data into n independent
segments, with n the size of the input; insertion sort is then applied to each
segment, and finally the sorted segments are combined to get the result.
Bucket sort manages the dividing process by allocating n empty buckets
and then distributing each input item a[i] to the bucket with index int(a[i]*n).
For example, if n = 10 and a[i] = 0.15, the number goes to the bucket with
index 1. We use the example a = [0.42, 0.72, 0., 0.3, 0.15, 0.09, 0.19, 0.35,
0.4, 0.54] and visualize the process in Fig. 15.6.

Implementation First, we prepare the input data with random.uniform
from the numpy library. For simplicity, and to be able to reconstruct the same
input, we use a random seed and round the floats to two decimals.

Figure 15.6: Bucket Sort

import numpy as np

np.random.seed(1)
a = np.random.uniform(0, 1, 10)
a = np.round(a, decimals=2)

Now, the code for the bucket sort is straightforward:

from functools import reduce

def bucketSort(a):
    n = len(a)
    buckets = [[] for _ in range(n)]
    # Divide numbers into buckets
    for v in a:
        buckets[int(v * n)].append(v)
    # Apply insertion sort within each bucket
    for i in range(n):
        insertionSort(buckets[i])
    # Combine sorted buckets
    return reduce(lambda a, b: a + b, buckets)
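Assuming the insertionSort function from earlier in this chapter is in scope, running bucketSort on the prepared input returns the values in sorted order:

print(bucketSort(a))
# [0.0, 0.09, 0.15, 0.19, 0.3, 0.35, 0.4, 0.42, 0.54, 0.72]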

Complexity Analysis Under the uniform-distribution assumption, each bucket receives O(1) items in expectation, so the per-bucket insertion sorts take O(n) expected time in total, giving bucket sort an average time complexity of O(n). In the worst case, when all items fall into a single bucket, the insertion sort on that bucket degrades the running time to O(n^2). The n auxiliary buckets add O(n) extra space, making bucket sort out-of-place.

Extension To extend to a uniform distribution on any range, we first find
the minimum and maximum values, minV and maxV, and compute the bucket
index i for the number a[i] with the formula:

i = n * (a[i] − minV) / (maxV − minV)          (15.8)

(taking the floor, and clamping the index to n − 1 when a[i] = maxV so that it stays in range).

15.5.2 Counting Sort


Counting sort is an algorithm that sorts items according to keys that are
small integers. It works by counting the occurrences of each distinct key
value, and then using arithmetic on those counts, a prefix sum, to determine
the position of each key value in the sorted sequence. Counting sort does not
fit into the comparison-based sorting paradigm because it uses the keys as
indexes to assist the sorting instead of comparing them directly to decide
relative positions. For an input of size n whose integer keys span a range of
size k (the difference between the maximum and minimum keys), counting sort
has a time complexity of O(n + k).

Premise: Prefix Sum


Before we introduce counting sort, let us first see what a prefix sum is. The
prefix sum, a.k.a. cumulative sum, inclusive scan, or simply scan, of a sequence
of numbers x_i, i ∈ [0, n − 1] is a second sequence of numbers y_i, i ∈ [0, n − 1],
where y_i is the sum of the prefix of the input sequence ending at position i:

y_i = x_0 + x_1 + ... + x_i          (15.9)

For instance, the prefix sums of the following array are:

Index: 0 1 2 3  4  5
x:     1 2 3 4  5  6
y:     1 3 6 10 15 21

Prefix sums are trivial to compute in O(n) with the following simple recurrence
relation:

y_i = y_{i−1} + x_i,  i ≥ 1          (15.10)

Despite the ease of computation, the prefix sum is a useful primitive in cer-
tain algorithms such as counting sort and Kadane's algorithm, as you shall
see throughout this book.
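As a quick sketch (our addition), the recurrence translates directly into code:

def prefix_sum(x):
    y = list(x)
    # y[i] accumulates the sum of x[0..i]
    for i in range(1, len(y)):
        y[i] += y[i - 1]
    return y

print(prefix_sum([1, 2, 3, 4, 5, 6]))  # [1, 3, 6, 10, 15, 21]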

Counting Sort
Given an input array [1, 4, 1, 2, 7, 5, 2], let's see how exactly counting sort
works by explaining it in three steps. Because our input array comes with
duplicates, we distinguish the duplicates by their relative order, shown in the
parentheses. For this input, we want it to be sorted as:

Index:  0    1    2    3    4 5 6
Key:    1(1) 4    1(2) 2(1) 7 5 2(2)
Sorted: 1(1) 1(2) 2(1) 2(2) 4 5 7

Figure 15.7: Counting Sort: The process of counting occurrences and computing the prefix sum.

1. Count Occurrences: We allocate a count array C of size 8, with indexes
in the range [0, 7], able to contain our keys whose range is [1, 7]; in general,
its size matches the size of the key range k. We loop over each key in the
input array and use the key as an index to count each key's occurrences.
Doing so tells us that the input array has two 1's, two 2's, one 4, one 5,
and one 7. The process is shown in Fig. 15.7.
Counting sort is indeed a subtype of bucket sort, where the number of
buckets is k and each bucket stores keys implicitly: the key serves as the
index, and the occurrence count tracks the total number of identical keys.

2. Prefix Sum on Count Array: We compute the prefix sum of the count
array, which is shown as:

Index:      0 1 2 3 4 5 6 7
Count:      0 2 2 0 1 1 0 1
Prefix Sum: 0 2 4 4 5 6 6 7

Denote the prefix sum array as ps. For key i, ps_{i−1} tells us the number
of items whose keys are strictly less than (<) key i. This information can be
used to place key i directly into its correct position. For example, for
key 2, summing the occurrences of all smaller keys (ps_1) gives us 2,
indicating that we can put key 2 at position 2. However, key 2 appears
two times, and the last position of key 2 is indicated by ps_2 − 1, which
is 3. Therefore, for any key i, its locations in the sorted array lie in the
range [ps_{i−1}, ps_i). We could simply scan the prefix sum array and
use the prefix sums as locations for the keys indicated by the indexes of
the prefix sum array; however, that shortcut is limited to inputs that are
plain integers, and it is unable to keep the relative ordering of items with
the same key.

3. Sort Keys with Prefix Sum Array: First, suppose we loop over the
input keys from position 0 to n − 1. For key_i, we decrease its prefix
sum by one, ps_{key_i} = ps_{key_i} − 1, to get the last position at which
this key can be placed in the sorted array. The whole process is shown in
Fig. 15.8, where we see that items with the same key end up in reverse order.
Looping over the keys of the input in reverse order corrects this,
making counting sort a stable sorting algorithm.

Figure 15.8: Counting sort: Sort keys according to prefix sum.



Implementation In our implementation, we first find the range of the
input data, say [minK, maxK], making our key range k = maxK − minK + 1.
We recast each key as key − minK for two purposes:

• To save space for the count array.

• To be able to handle negative keys.

The implementation of the three main steps is nearly the same as what we
have discussed, other than the recasting of the key. In the process, we use two
auxiliary arrays: a count array, for counting and accumulating the occurrences
of keys, with O(k) space, and an order array, for storing the sorted output, with
O(n) space, giving our implementation a space complexity of O(n + k).
The Python code is shown as:
def countSort(a):
    minK, maxK = min(a), max(a)
    k = maxK - minK + 1
    count = [0] * k
    n = len(a)
    order = [0] * n
    # Get occurrences
    for key in a:
        count[key - minK] += 1

    # Get prefix sum
    for i in range(1, k):
        count[i] += count[i - 1]

    # Put each key in position, scanning in reverse for stability
    for i in range(n - 1, -1, -1):
        key = a[i] - minK
        count[key] -= 1  # The decremented count is the position
        order[count[key]] = a[i]
    return order
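Running it on the chapter's example input returns the stable sorted order:

print(countSort([1, 4, 1, 2, 7, 5, 2]))
# [1, 1, 2, 2, 4, 5, 7]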

Properties Counting sort is out-of-place because of the auxiliary count and
order arrays. It is stable, given that we iterate over the keys of the input
array in reverse order. Counting sort has O(n + k) for both its space and time
complexity.

Applications Due to the special characteristic that counting sort sorts by
using the key as an index, and that the range of keys decides the time and space
complexity, counting sort's applications are limited. We list the most common
ones:

• Because the time complexity depends on the size of k, in practice
counting sort is usually used when k = O(n), in which case it makes
the time complexity O(n).

• Counting sort is often used as a subroutine. For example, it is a part
of other sorting algorithms such as radix sort, which is a linear sorting
algorithm. We will also see some examples in the string matching chapter.

15.5.3 Radix Sort


The word "radix" is a mathematical term for the base of a number. For
example, decimal and hexadecimal numbers have a radix of 10 and 16, respec-
tively, and strings of alphabets have a radix of 26, given that there are 26
letters in the alphabet. Radix sort is a non-comparative sorting method that
utilizes the concept of radix, or base, to order a list of integers digit by digit,
or a list of strings letter by letter. The sorting of integers differs from the
sorting of strings of alphabets because of the different concepts of ordering:
numeric ordering versus lexicographical ordering, as we have introduced. We
show one example for a list of integers and one for a list of strings, together
with their sorted (numeric or lexicographical) order:

Integers: 170, 45, 75, 90, 802, 24
Sorted:   24, 45, 75, 90, 170, 802

Strings: apple, pear, berry, peach, apricot
Sorted:  apple, apricot, berry, peach, pear

We see that the integers are ordered by the number of digits, whereas for the
sorted strings, the length of a string does not by itself decide the ordering.
Within radix sorting, it is usually either bucket sort or counting sort that
does the sorting, using one radix as the key at a time. Based on the order in
which digits are processed, there are two types of radix sort: Most Significant
Digit (MSD) radix sort, which starts from the left-most radix and moves
toward the right-most radix, and Least Significant Digit (LSD) radix sort,
which works the other way around. We address the details of the two forms
of radix sort, MSD and LSD, using our two examples.

LSD Radix Sorting Integers

LSD radix sort is often used to sort lists of integers. It sorts the numbers
one digit/radix at a time, from the least significant to the most significant
digit. For a list of positive integers whose maximum has m digits, LSD radix
sort takes a total of m passes to finish sorting.
Here, we demonstrate the process of LSD radix sorting on our exemplary
list of integers, with counting sort as the subroutine that sorts the items using
each radix as the key; m = 3 in our example:

• As shown in Fig. 15.9, in the first pass, the least significant digit (the 1s
place) is used as the key. After this pass, the numbers are ordered by their
unit digits.

Figure 15.9: Radix Sort: LSD sorting integers in iteration

• In the second pass, the 10s place digit is used. After this pass, we
see that the numbers with at most two digits, 24, 45, 75, 90 in our example,
are in order.

• In the last, third pass, the 100s place digit is used; for numbers that
lack a 100s place digit, 0 is used instead. Afterwards, all the numbers are
in order.

Note that the sorting will not work unless the subroutine we apply is stable.
For example, in our last pass there are four zeros as keys, meaning four numbers
share the same key value; if their relative ordering were not kept, the previous
sorting effort would be wasted.

Implementation To implement this in Python, we first need to know how to
get each digit out of an integer. Take the number 178 as an example:

• The least significant digit 8 is the remainder of 178 % 10.

• The second least significant digit 7 is the remainder of 17 % 10.

• The most significant digit 1 is the remainder of 1 % 10.

As we see, for digit 8 we need 178, for digit 7 we need 17, and for digit 1 we
need only 1; 178, 17, and 1 are the prefixes up to the digit we need. We can
obtain these prefixes via a base exp:

exp = 1,   178 // exp = 178, 178 % 10 = 8
exp = 10,  178 // exp = 17,  17 % 10 = 7
exp = 100, 178 // exp = 1,   1 % 10 = 1

We can also get the digits by looping, dividing the number by 10 each time.
For example, the following code outputs [8, 7, 1]:

a = 178
digits = []
while a > 0:
    digits.append(a % 10)
    a = a // 10

The number of passes we need is decided by the maximum positive integer
in the input array. In the code, we use a while loop that continues as long as
the prefix of the maximum under the current base exp is larger than 0. At each
pass, we call the count_sort subroutine to sort the input list by the current
digit. The code is shown as:

def radixSort(a):
    maxInt = max(a)
    exp = 1
    while maxInt // exp > 0:
        a = count_sort(a, exp)
        exp *= 10
    return a

The count_sort subroutine is highly similar to our previously implemented
counting sort, with two minor differences:

• Because we sort by digit, we use the formula key = (key // exp) % 10
to convert each key to a digit.

• Because decimal numbers have only 10 possible digits, we allocate only
10 slots for the count array.

The code is as:


def count_sort(a, exp):
    count = [0] * 10  # Digits in [0, 9]
    n = len(a)
    order = [0] * n
    # Get occurrences
    for key in a:
        key = (key // exp) % 10
        count[key] += 1

    # Get prefix sum
    for i in range(1, 10):
        count[i] += count[i - 1]

    # Put each key in position, scanning in reverse for stability
    for i in range(n - 1, -1, -1):
        key = (a[i] // exp) % 10
        count[key] -= 1  # The decremented count is the position
        order[count[key]] = a[i]
    return order
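As a quick check (our addition), running LSD radix sort on the chapter's integer example reproduces the expected order:

print(radixSort([170, 45, 75, 90, 802, 24]))
# [24, 45, 75, 90, 170, 802]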

Properties and Complexity Analysis Radix sorting for integers takes
m passes, with m the total number of digits, and each pass takes O(n + k),
where k = 10 since there are only 10 decimal digits. This gives a total time
complexity of O(mn); since m is essentially a constant compared with the
variable n, radix sorting for integers with counting sort as the subroutine has
a linear time complexity. Because the counting sort subroutine is stable, radix
sorting is a stable sorting algorithm too.
With the use of the auxiliary count and order arrays, it has O(n) space
complexity, which makes LSD integer sorting an out-of-place sorting algo-
rithm.

MSD Radix Sorting Strings


Our fruit alphabetization example uses MSD radix sorting: it groups
the strings by a single letter, with either bucket sort or counting sort under
the hood, starting from the very first letter on the left and moving toward the
very last on the right when necessary. MSD radix sorting is usually implemented
with recursion.

Figure 15.10: Radix Sort: MSD sorting strings in recursion. The black and
grey arrows indicate the forward and backward pass in recursion, respec-
tively.

For better demonstration, we add two more strings: "ap" and "pear".
String "ap" shows what happens when strings fall into the same bucket
but one is shorter and has no valid letter to compare with; string "pear"
showcases how the algorithm handles duplicates. The algorithm indeed
applies the recursive divide-and-conquer design methodology.

Implementation We show bucket sort here since, in lexicographical order-
ing, the first letter of the strings already decides the ordering of the groups of
strings bucketed by that letter. This MSD sorting method divides the strings
into different buckets indexed by letter, and then combines the returned
results together to get the final sorted strings, which is highly similar to
merge sort.

• At first, the recursion handles the first letter of each string, as in the
process where i = 0 shown in Fig. 15.10. There are three buckets, for the
letters 'a', 'b', and 'p'.

• At depth 2, when i = 1, the buckets resulting from depth 1 are further
bucketed by the second letter. Bucket 'b' contains only one item,
which is sorted by itself, so the recursion ends for this bucket. The
buckets 'a' and 'p' are further bucketed by the letters 'p' and 'e',
respectively.

• At depth 3, when i = 2, the last bucket 'p' further produces
two more buckets, 'p' and 'r'. However, for the string 'ap' in the bucket,
there is no valid third letter to use as an index; according to the
lexicographical order, 'ap' is placed before the resulting buckets.

• In our example, the forward process of the recursion is completely done
when i = 4. The recursion then enters its backward phase, which merges
buckets that are composed either of a single item or of the done_bucket.
The code is offered as:


def MSD_radix_string_sort(a, i):
    # End condition: the bucket has at most one item
    if len(a) <= 1:
        return a

    # Divide
    buckets = [[] for _ in range(26)]
    done_bucket = []  # Strings with no letter at position i
    for s in a:
        if i >= len(s):
            done_bucket.append(s)
        else:
            buckets[ord(s[i]) - ord('a')].append(s)
    # Conquer and chain all buckets
    ans = []
    for b in buckets:
        ans += MSD_radix_string_sort(b, i + 1)
    return done_bucket + ans
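Calling it on the extended fruit example (our usage sketch; the recursion starts at letter index 0):

fruit = ['apple', 'pear', 'berry', 'peach', 'apricot', 'ap', 'pear']
print(MSD_radix_string_sort(fruit, 0))
# ['ap', 'apple', 'apricot', 'berry', 'peach', 'pear', 'pear']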

Properties and Complexity Analysis Because bucket sort itself is
a stable sorting algorithm, the radix sort for strings is stable too.
The complexity analysis for the recursive radix sorting can be accom-
plished with a recursion tree. The tree has nearly n leaves. The worst case
occurs when all strings within the input array are the same: the recursion
tree then degrades to a linear structure of length n, and within each node
O(n) is spent scanning the items' corresponding letters, making the worst-case
time complexity O(n^2).
Because of the auxiliary buckets, done_bucket, and ans arrays used in the
sorting of strings, it is an out-of-place sort. The same recursion-tree analysis
applied to space gives a linear space complexity as well.

15.6 Python Built-in Sort


There are two built-in ways to sort lists and other iterable objects in
Python 3, and both of them are stable. By default, they use < comparisons
between items and sort the items in increasing order.

• The built-in method list.sort(key=None, reverse=False) sorts the
items of a list in-place and returns None.

• The built-in function sorted(iterable, key=None, reverse=False) works
on any iterable object, including list, string, tuple, dict, and so
on. It sorts the items out-of-place, returning a new list and keeping
the original input unmodified.

Basics
Using the above two built-ins to sort a list of integers is as simple as:

lst = [4, 5, 8, 1, 2, 7]
lst.sort()

Printing out lst shows that the sorting happens in-place within lst:

[1, 2, 4, 5, 7, 8]

Now, use sorted() on the same list:

lst = [4, 5, 8, 1, 2, 7]
new_lst = sorted(lst)

Printing both out shows that the original is left unmodified:

new_lst, lst
([1, 2, 4, 5, 7, 8], [4, 5, 8, 1, 2, 7])

Let's try to sort another iterable object, a tuple of strings:

fruit = ('apple', 'pear', 'berry', 'peach', 'apricot')
new_fruit = sorted(fruit)

Printing out new_fruit, we also see that it returns a list instead of a tuple:

['apple', 'apricot', 'berry', 'peach', 'pear']

Note: For a list, list.sort() is faster than sorted() because it doesn't
have to create a copy. For any other iterable, we have no choice but to
apply sorted() instead.

Change Comparison Operator What if we want to redefine the be-
havior of the comparison operator <? Other than writing a class and defin-
ing __lt__(), in Python 2 these two built-ins had another argu-
ment, cmp, but it was dropped entirely in Python 3. We can use functools's
cmp_to_key method to convert a comparison function to a key in Python 3.
For example, to sort [4, 5, 8, 1, 2, 7] in reverse order, we can define a cmp
function that reverses the ordering of the items being compared:

def cmp(x, y):
    return y - x

And then we call this function as:

from functools import cmp_to_key
lst.sort(key=cmp_to_key(cmp))

The printout of lst is:

[8, 7, 5, 4, 2, 1]

Timsort These two methods both use the same sorting algorithm, Tim-
sort, and take the same parameters. Timsort is a hybrid stable sorting
algorithm, derived from merge sort and insertion sort, designed to
perform well on many kinds of real-world data. It uses techniques from Peter
McIlroy's "Optimistic Sorting and Information Theoretic Complexity" (Jan-
uary 1993) and was implemented by Tim Peters in 2002 for use in the Python
programming language. The algorithm finds subsequences of the data that
are already ordered, and uses that knowledge to sort the remainder more
efficiently.

Arguments
They both take two keyword-only arguments, key and reverse, whose default
values are None and False, respectively.

• Argument key: it specifies a function of one argument that is used to
extract a comparison key from each list item (for example, key=str.lower).
If not set, the default value None means that the list items are sorted
directly.

• Argument reverse: a boolean value. If set to True, the list or
iterable is sorted as if each comparison were reversed, i.e., the sorted list
is in descending order. The default value is False.

Sort in Reverse Order Setting reverse=True sorts a list of integers in
decreasing order:

lst = [4, 5, 8, 1, 2, 7]
lst.sort(reverse=True)

Printing out lst, we see:

[8, 7, 5, 4, 2, 1]

This is equivalent to customizing a class Int and rewriting its __lt__() special
method as:

class Int(int):
    def __init__(self, val):
        self.val = val
    def __lt__(self, other):
        return other.val < self.val

Now, sorting the same list without setting reverse gets us exactly the
same result:

lst = [Int(4), Int(5), Int(8), Int(1), Int(2), Int(7)]
lst.sort()

Customize key We have mainly two options to customize the key argu-
ment: (1) through a lambda function, or (2) through a pre-defined function.
Either way, the function takes exactly one argument. For example, to sort
the following list of tuples using the second item of each tuple as the key:

lst = [(8, 1), (5, 7), (4, 1), (1, 3), (2, 4)]

We can write a function and set the key argument to this function:

def get_key(x):
    return x[1]

new_lst = sorted(lst, key=get_key)

The sorted result is:

[(8, 1), (4, 1), (1, 3), (2, 4), (5, 7)]

The same result can be achieved via a lambda function, which is more conve-
nient:

new_lst = sorted(lst, key=lambda x: x[1])

The same rule applies to objects with named attributes. For example, suppose
we have the following class Student with three attributes, name, grade, and
age, and we want to sort a list of Student objects by age alone.

class Student(object):
    def __init__(self, name, grade, age):
        self.name = name
        self.grade = grade
        self.age = age

    # To support indexing
    def __getitem__(self, key):
        return (self.name, self.grade, self.age)[key]

    def __repr__(self):
        return repr((self.name, self.grade, self.age))

We can do it by setting the key argument:

students = [Student('john', 'A', 15), Student('jane', 'B', 12), Student('dave', 'B', 10)]
sorted(students, key=lambda x: x.age)

which outputs the following result:

[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

The key-function patterns shown above are very common, so Python
provides convenience functions to make accessor functions easier and faster.
The operator module has itemgetter() and attrgetter(), which fetch
items by index and attributes by name, respectively. For the sorting above, we
can do it like this:

from operator import attrgetter
sorted(students, key=attrgetter('age'))

attrgetter() can take multiple arguments; for example, to sort the list
first by 'grade' and then by 'age', we can do it as:

sorted(students, key=attrgetter('grade', 'age'))

which outputs the following result:

[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

If our object supports indexing, which is why we defined __getitem__() in
the class, we can use itemgetter() to do the same thing:

from operator import itemgetter
sorted(students, key=itemgetter(2))

15.7 Summary and Bonus


Here, we give a comprehensive summary of the time complexity for different
sorting algorithms.

Figure 15.11: The time complexity for common sorting algorithms

15.8 LeetCode Problems


Problems

15.1 Insertion Sort List (147). Sort a linked list using insertion sort.
A graphical example of insertion sort: the partially sorted list (black)
initially contains only the first element of the list; with each iteration,
one element (red) is removed from the input data and inserted in place
into the sorted list.
Algorithm of insertion sort: insertion sort iterates, consuming one
input element on each repetition and growing a sorted output list. At
each iteration, insertion sort removes one element from the input data,
finds the location where it belongs within the sorted list, and inserts it there.
It repeats until no input elements remain.

Example 1:
Input: 4->2->1->3
Output: 1->2->3->4

Example 2:
Input: -1->5->3->4->0
Output: -1->0->3->4->5

15.2 Merge Intervals (56, medium). Given a collection of intervals,
merge all overlapping intervals.

Example 1:
Input: [[1,3],[2,6],[8,10],[15,18]]
Output: [[1,6],[8,10],[15,18]]
Explanation: Since intervals [1,3] and [2,6] overlap,
merge them into [1,6].

Example 2:
Input: [[1,4],[4,5]]
Output: [[1,5]]
Explanation: Intervals [1,4] and [4,5] are considered
overlapping.

15.3 Valid Anagram (242, easy). Given two strings s and t, write a
function to determine whether t is an anagram of s.

Example 1:
Input: s = "anagram", t = "nagaram"
Output: true

Example 2:
Input: s = "rat", t = "car"
Output: false

Note: You may assume the string contains only lowercase alphabets.
Follow up: What if the inputs contain Unicode characters? How would
you adapt your solution to such a case?

15.4 Largest Number (179, medium).

15.5 Sort Colors (75). Given an array with n objects colored
red, white, or blue, sort them so that objects of the same color are
adjacent, with the colors in the order red, white, and blue. Here, we
use the integers 0, 1, and 2 to represent the colors red, white, and
blue, respectively. Note: you are not supposed to use the library's sort
function for this problem.

15.6 Sort List (148). Sort a linked list using merge sort or quick
sort.

Solutions
1. Solution: The insertion sort itself is easy: we compare the current node
with all previously sorted elements. However, to do it in a linked list,
we need to know how to iterate over elements and how to build a new list.
In this algorithm, we need two while loops: the first loop goes
from the second node to the last node, and the second loop goes
through the sorted list to compare the value of the current node
with each sorted element (the sorted list starts with one element). There
are three cases for the comparison: if cmp_node does not move,
we put the current node in front of the previous
head, and cur_node becomes the new head; if cmp_node stops
at the back, the current node becomes the new tail, so we set its next
pointer to None and attach it to the saved pre_node; if it stops in the middle,
we insert cur_node between pre_node and cmp_node.

def insertionSortList(self, head):
    """
    :type head: ListNode
    :rtype: ListNode
    """
    if head is None:
        return head
    cur_node = head.next
    head.next = None  # The sorted list starts with one node
    while cur_node:
        next_node = cur_node.next  # Save the next node
        cmp_node = head
        # Compare the current node with the previously sorted nodes
        pre_node = None
        while cmp_node and cmp_node.val <= cur_node.val:
            pre_node = cmp_node
            cmp_node = cmp_node.next

        if cmp_node == head:  # Put it at the front
            cur_node.next = head
            head = cur_node
        elif cmp_node is None:  # Put it at the back
            cur_node.next = None  # The current node is the new end
            pre_node.next = cur_node
            # head is not changed
        else:  # Insert it in the middle
            pre_node.next = cur_node
            cur_node.next = cmp_node
        cur_node = next_node
    return head

2. Solution: Merging intervals is a classical use case for sorting. If we
sort first and keep track of the merged intervals in a heap
(which is itself kept sorted), we simply iterate over the sorted intervals and
check whether each one should be merged into a previous interval or pushed
onto the heap. The code here was tested in Python 2 on LeetCode; for
Python 3 one needs to resolve the problem of heappush with
a customized class as the heap item.
# Definition for an interval.
# class Interval(object):
#     def __init__(self, s=0, e=0):
#         self.start = s
#         self.end = e
from heapq import heappush, heappop

class Solution(object):
    def merge(self, intervals):
        """
        :type intervals: List[Interval]
        :rtype: List[Interval]
        """
        if not intervals:
            return []
        # Sort the intervals: O(n log n)
        intervals.sort(key=lambda x: (x.start, x.end))
        h = [intervals[0]]
        # Iterate over the intervals to add
        for i in intervals[1:]:
            s, e = i.start, i.end
            bAdd = False
            for idx, pre_interval in enumerate(h):
                s_before, e_before = pre_interval.start, pre_interval.end
                if s <= e_before:  # Overlap: merge into the same interval
                    h[idx].end = max(e, e_before)
                    bAdd = True
                    break
            if not bAdd:
                # No overlap: push onto the heap
                heappush(h, i)
        return h

3. Solution: There are many ways to do this. The easiest is
to sort the letters of each string and check whether the results are equal.
Alternatively, we can use an array of size 26 to save the count of each letter
of one string and check it against the letters of the other.

def isAnagram(self, s, t):
    """
    :type s: str
    :type t: str
    :rtype: bool
    """
    return ''.join(sorted(list(s))) == ''.join(sorted(list(t)))

The second solution uses a fixed-size counter:

def isAnagram(self, s, t):
    """
    :type s: str
    :type t: str
    :rtype: bool
    """
    if len(s) != len(t):
        return False
    table = [0] * 26
    start = ord('a')
    for c1, c2 in zip(s, t):
        table[ord(c1) - start] += 1
        table[ord(c2) - start] -= 1
    for n in table:
        if n != 0:
            return False
    return True

For the follow-up, use a hash table instead of a fixed-size counter.
Imagine allocating a large array to fit the entire range of Unicode
characters, which could go beyond one million entries; a hash table is
a more generic solution and adapts to any range of characters.

4. Solution: Instinctively, we know we need sorting to solve this problem.
From the example above, we can see that sorting the values as plain integers
does not work: with 30 and 3 we would get 303, while the
right answer is 333. Recalling the sort built-ins, we supply a key
class and rewrite its comparison: to decide whether a should come before b,
we compare the concatenations a+b and b+a. The time
complexity here is O(n log n).

class LargerNumKey(str):
    def __lt__(x, y):
        # x should come before y if the concatenation x+y is larger
        return x + y > y + x

class Solution:
    def largestNumber(self, nums):
        largest_num = ''.join(sorted(map(str, nums), key=LargerNumKey))
        return '0' if largest_num[0] == '0' else largest_num
16 Dynamic Programming

Figure 16.1: Dynamic Programming Chapter Recap

Dynamic programming simplifies a complicated problem by break-
ing it down into simpler subproblems in a recursive manner. As intro-
duced alongside Divide-and-Conquer in Chapter 13, dynamic programming is ap-
plied to problems whose subproblems overlap when you construct them
in the Divide-and-Conquer manner. We use the recurrence function T(n) =
T(n − 1) + T(n − 2) + ... + T(1) + f(n) to highlight its most special charac-
teristic, Overlapping Subproblems, compared with T(n) = 2T(n/2) + f(n)
for Divide-and-Conquer's non-overlapping subproblems. As we shall see, there
are more types of recurrence functions in the dynamic programming field than
this exemplary formulation, covered either briefly in this chapter or in the later
chapter where a comprehensive list of dynamic programming categories and pat-
terns is given.

Importance and Applications Dynamic programming is one of the fun-
damental methods in computer science and plays a very important role in
computer algorithms; even the book Artificial Intelligence: A Modern Ap-
proach mentions this terminology as many as 47 times. Dynamic
programming fits optimization problems that fall into the following cate-
gories:

1. Optimization: compute the maximum or minimum value;

2. Counting: count the total number of solutions;

3. Checking whether a solution works.

Note that not all problems of the above forms can necessarily be
solved with dynamic programming; a problem must exhibit two
properties, overlapping subproblems and optimal substructure, for dynamic
programming to be applied. These two properties will be defined
and explained in this chapter.

Our Difference and Plan Many textbooks and courses describe dy-
namic programming as obscure, demanding creativity and subtle insight
from users to identify and construct its dynamic programming solutions.
We are determined to unfold this "mystery" by staying grounded and
practical. We have two chapters on dynamic programming:
the current chapter is oriented around clear definitions, distinguishing, relat-
ing, and illustrating the concept against divide-and-conquer and complete search,
and serves as the frontline of our contents on dynamic program-
ming. A later chapter focuses on categorizing problem patterns
and giving examples of each.

• In order to understand dynamic programming's role in the evolution map
of algorithms, the very first thing we do, in Section 16.1, is show
how to evolve a complete search into a dynamic programming so-
lution by: (1) discussing the relation between complete search, divide-
and-conquer, and dynamic programming; and (2) examining two
elementary examples, the Fibonacci sequence and the Longest Increasing
Subsequence.

• Dynamic programming is typically applied to optimization problems.
Section 16.2 discusses its principal properties, elements, and experience-
based guidelines, and shows how these key character-
istics relate to the field of optimization.

• The naive solutions of problems amenable to dynamic programming
take either exponential or polynomial time using the complete search
method. In Section 16.3 we showcase how to decrease the com-
plexity from these two baselines: from exponential to polynomial, and
from polynomial to polynomial of lower degree.

16.1 Introduction to Dynamic Programming


In this section, we answer two questions:

1. How do we distinguish divide and conquer from dynamic program-
ming? We already know conceptually that they differ in the
characteristics of their subproblems: overlapping sub-
problems for dynamic programming, where subproblems share
subproblems, and non-overlapping subproblems for divide and conquer,
where the subproblems are disjoint from each other. In this section,
we further answer this question in a more visual way using the
concept of a subproblem graph.

2. How do we develop a dynamic programming solution from the
naive complete search method? We do not offer a fully
detail-oriented answer in this section. Instead, we first identify the
inefficiency of complete search on dynamic-programming-applicable
problems using the subproblem graph. Then we answer the question through
two elementary examples, presenting their solutions in order of refinement
so that we can demonstrate the relation between complete search and
dynamic programming.

16.1.1 Concepts

Figure 16.2: Subproblem Graph. (a) Subproblem graph for the Fibonacci sequence; (b) subproblem graph for merge sort.



Subproblem Graph If we treat each subproblem as a vertex and the
relations between subproblems as directed edges, we get a directed subprob-
lem graph. If the edges point from larger subproblems to smaller
subproblems, we say the graph is in top-down fashion. In contrast, if the edges
point from smaller subproblems to larger subproblems, it is in bottom-up
fashion. In Fig. 16.2 we draw the subproblem graph for the Fibonacci sequence,
defined earlier, with n = 4. In comparison, we also give
the subproblem graph for sorting the array [29, 10, 14, 37, 13]. In the follow-
ing contents, we show how to use the subproblem graph to answer the two
questions we laid out at the beginning of this section.

Terminologies To make reading other materials accessible, we introduce
more related terminology that is widely used in the field.

• State: "state" and "subproblem" are interchangeable across different books.
Both can be used to describe the optimal solution to a problem
with a solution space/search space.

• State Transition: "state transition" and "recurrence function" are inter-
changeable. In the case of the Fibonacci sequence, our state transition/re-
currence function is given as f(n) = f(n − 1) + f(n − 2). On the flip
side, there are problems that do not specify the state transition; we
will have to figure it out by ourselves.

Distinction between Divide and Conquer and Dynamic Program-
ming In divide and conquer, problems are divided into disjoint subprob-
lems, and the subproblem graph degrades to a tree structure: each sub-
problem other than the base problems (here, each individual element
of the array) has an out-degree equal to 1. In comparison, in the
case of the Fibonacci sequence shown in Fig. 16.2, some subproblems have an
out-degree larger than one, which makes the structure a graph instead of a
tree. This characteristic is directly induced by the fact that the
subproblems overlap, that is, subproblems share subproblems. For example,
f(4) and f(3) share f(2), and f(3) and f(2) share f(1).

Complete Search and Dynamic Programming If we program these
two problems in a recursive, top-down divide-and-conquer manner, it is
essentially equivalent to applying depth-first search on the subproblem
graph/tree. The only difference is that for sorting there is no recomputa-
tion (as in merge sort and quick sort), while for the Fibonacci sequence,
subproblems with in-degree larger than 1 are recomputed multiple
times, which leaves room for optimization; this is where dynamic
programming comes to the rescue.

With the subproblem graph, we have reconstructed the problem as a graph
problem, which means all complete search methods can be applied, such
as breadth-first search as well as the recursive depth-first search.

Depth-first-search Because the results of subproblems must be returned and
reused, depth-first search implemented in the divide-and-conquer manner
(returning the result of each subproblem to its caller) is preferable to
breadth-first search. BFS does not compute the values of "optimal" sub-
solutions of sub-instances and use them to build up a solution; there is nothing
"bottom-up" in the process of BFS, and no natural place for "intermediate"
states to accumulate. Therefore, we shall see that dynamic programming bonds
closely with DFS rather than with BFS.

16.1.2 From Complete Search to Dynamic Programming


So far, we know dynamic programming is an optimization methodology over
the complete search solutions of typical optimization problems. Dynamic
programming's core principle is to solve each subproblem only once by sav-
ing and reusing its solution. Therefore, compared with its naive counterpart,
Complete Search:

1. Dynamic programming avoids the redundant recomputation met in
its complete search counterpart, as demonstrated in the last section.

2. Dynamic programming uses additional memory as a trade-off for bet-
ter computation/time efficiency; it serves as an example of a time-space
trade-off. In most cases, as we shall see in this chapter, the space over-
head is well worth it: it can decrease the time complexity dramatically,
from exponential to polynomial.

Two Forms of Dynamic Programming Solution There are two general
ways, recursive or iterative, to add a caching mechanism to the naive
complete search and so construct our dynamic programming solution. But do
remember that we cannot eliminate recursive thinking completely: we will
always have to define a recurrence relation irrespective of the approach we
use.

1. Top-down + Memoization (recursive DFS): we start from the larger
problem (the top) and recursively search the subproblem space (toward the
bottom) until the leaf nodes. This method is built on top of depth-first
graph search together with the divide-and-conquer methodology, which
treats each node as a subproblem and returns its solution to its caller
so that the caller can build up its own solution. Following a top-down
fashion, as in divide and conquer, a hashmap is relied upon along the
recursive calls to save and look up solutions.
The memoization works in such a way that the very first time a
subproblem is solved, its result is saved in the hashmap, and whenever
this subproblem is met again, its solution is found and returned directly
instead of being recomputed. The key elements of this style of dynamic
programming are:

(a) Define the subproblem;

(b) Develop the solution using depth-first graph search and divide and
conquer (leaving the recomputation alone at first);
(c) Add a hashmap to save and look up the state of each subproblem.

2. Bottom-up + Tabulation (iterative): unlike the last method,
which uses recursive calls, in this method we approach the subproblems
from the smallest ones and construct the solutions to larger
subproblems using the tabulated results. The nodes of the subprob-
lem graph are visited in reverse topological order. This means
that when we construct the state of the current subproblem, all the states
it depends on (its predecessors) have already been computed and saved.

Comparison Figure 16.1 summarizes the two methods, which we call
memoization and tabulation for short. Memoization and tabulation yield
the same asymptotic time complexity; however, the tabulation approach often
has much better constant factors, since it has less overhead from procedure
calls.
The memoization method suits beginners who have a de-
cent understanding of divide and conquer. However, once you study further
and have enough practice, tabulation should become more intuitive than the
recursive solution. Usually, "the dynamic programming solution" to a
problem refers to the solution with tabulation.
We walk through two examples, the Fibonacci sequence (Subsection 16.1.3)
and the Longest Increasing Subsequence, in the remainder of this section
to showcase memoization and tabulation in practice.

16.1.3 Fibonacci Sequence


Problem Definition Given f(0) = 0, f(1) = 1, and f(n) = f(n − 1) + f(n −
2) for n ≥ 2, return the value of f(n) for any given n.
As the most elementary and classical example demonstrating dynamic
programming, we carry on this tradition and give multiple solutions for
the Fibonacci sequence. Since the naive recursive solution was already given
in Chapter 13, we only briefly explain it here.

Complete Search Because the relation between the current state and the
previous states is given directly, it is straightforward to solve the problem in
a top-down fashion using depth-first search. The time complexity, easily
obtained by induction or with a recursion tree, is O(2^n), where the base
2 is the branching factor of the tree and n is its depth. The Python code is given:

# DFS on the subproblem graph
def fibonacciDFS(n):
    # Base case
    if n <= 1:
        return n
    # Use the results of the subtrees to build up the result of the current tree
    return fibonacciDFS(n - 1) + fibonacciDFS(n - 2)

Memoization As we explained, some subproblems are computed more
than once in the complete search solution. To avoid the recomputation, we
can use a hashtable memo to save the solved subproblems. We need to make
memo global and available to all recursive calls; in the memoized complete
search, instead of calling the recursive function f (n) = f (n − 1) + f (n − 2) to
get the answer for the current state n, we first check if the problem is
already solved and available in memo.
Because solving f (n) spawns n subproblems, and each subproblem only
depends on two smaller problems, the time complexity is lowered
to O(n) if we use DFS + memoization.
# DFS on the subproblem graph + memoization
def fibonacciDFSMemo(n, memo):
    if n <= 1:
        return n
    if n not in memo:
        memo[n] = fibonacciDFSMemo(n - 1, memo) + fibonacciDFSMemo(n - 2, memo)
    return memo[n]
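
In Python, the same memoization comes almost for free from the standard
library: functools.lru_cache caches return values keyed by the arguments.
A minimal sketch:

from functools import lru_cache

@lru_cache(maxsize=None)  # cache every distinct n that gets solved
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)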

Bottom-up Tabulation In the top-down recursive solution, there exist
two passes: one pass that divides the problem into subproblems, and a
second, returning pass that gathers solutions from the base cases and
constructs solutions for larger problems. In the bottom-up tabulation
approach, for this specific problem, we have four key steps:

1. We start by assigning a dp array to save each state's result. It represents
the Fibonacci number at each index (from 0 to n), which is also called
a state.

2. Then, we initialize the results of the base cases, which either were given
or can be obtained easily with simple deduction.

3. We iterate through each subproblem/state in reversed topological sort
order, which is [0, 1, 2, 3, 4] as in Fig. 16.2, and use the tabulated
solutions to build up the answer of the current state through the given
recurrence function f (n) = f (n − 1) + f (n − 2).

4. We return the last state in dp as the final answer.

The tabulation code of dynamic programming is given:


# Dynamic programming: bottom-up tabulation, O(n) time, O(n) space
def fibonacciDP(n):
    if n <= 1:  # guard so dp[1] below is always valid
        return n
    dp = [0] * (n + 1)
    # initialize the base cases
    dp[1] = 1
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]
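
Since each state depends only on its two preceding states, the O(n) dp
table can be replaced by two rolling variables, bringing the space down to
O(1); this is the same space-optimization idea we revisit later in this
chapter. A sketch:

def fibonacciDPConstantSpace(n):
    if n <= 1:
        return n
    prev, curr = 0, 1  # f(i - 2) and f(i - 1)
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr
    return curr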

16.2 Dynamic Programming Knowledge Base

So far, we have learned most of the knowledge related to dynamic
programming, including the basic concepts and two examples. In this section,
we officially answer three questions: when and how to apply dynamic
programming, and which type of dynamic programming we need, tabulation
or memoization. With clear definitions, practical guidelines, complexity
analysis, and a comprehensive comparison between memoization and
tabulation, we are determined to demystify dynamic programming. The
subsections are organized as:

1. Two Properties and Practical Guideline (Section 16.2.1) that an
optimization problem must have, to answer the when question.

2. Five Key Elements, General Steps to Solve Dynamic Programming, and
Complexity Analysis (Section 16.2.2) for implementing the dynamic
programming solution, to answer the how question.

3. Tabulation VS Memoization (Section 16.2.3) to answer the which
question.

16.2.1 When? Two Properties

In order for dynamic programming to apply, two properties, overlapping
subproblems and optimal substructure, must be found in the problem we
are solving. From our illustrated examples: 1) the step of identifying
overlap shows the overlapping subproblems property; 2) the recurrence
function in fact shows the optimal substructure. Formally, these two
essential properties are stated as:

Overlapping Subproblems When a recursive algorithm revisits the same
subproblem repeatedly, we say that the optimization problem has
overlapping subproblems. This can be easily visualized in the top-down
subproblem graph, where one state is reached from multiple other states.
This property explains the recomputation overhead seen in the complete
search solutions of our two examples.
The overlapping subproblems property helps us find room for optimization
and leads us to its solution: the caching mechanism used in dynamic
programming. On the flip side, when subproblems are disjoint, as in
merge sort and binary search, dynamic programming does not help.

Optimal Substructure A given problem has the optimal substructure
property if its optimal solution can be obtained from the optimal
solutions of its subproblems. Only if the optimal substructure property
applies can we find the recurrence relation function, which is a key
implementation step as we have seen from the above two examples. Optimal
substructure varies across problem domains in two ways:

1. Subproblem space: how many subproblems an optimal solution to
the original problem uses. For example, in the Fibonacci sequence, each
integer in the range [0, n] is a subproblem, making n + 1 the
total subproblem space. The state of each subproblem is the optimal
solution for that subproblem, which is f (n). In the example of
LIS, each subproblem is the subarray with indices in range [0, i], and its
state is the length of the longest increasing subsequence that ends at
(and includes) index i; this makes the whole subproblem space n as well.

2. State choice: how many choices we have in determining which
subproblem(s) to use in the recurrence function for the current
state. In the Fibonacci sequence, each state only relies on its two
preceding states, as seen in the recurrence function f (i) = f (i − 1) + f (i − 2),
making the cost constant. For LIS, each state requires knowing the
solutions of all smaller subproblems, which makes the cost O(n),
proportional to the subproblem space.

Subproblem space and state choice together not only formulate the
recurrence relation, with which we very much have the implementation in hand;
they also decide the time and space complexity we will need to
tackle our dynamic programming problems.

Practical Guideline Beyond the textbook definition, we also summarize
the experience shared by seasoned software programmers. Dynamic
programming problems are normally asked in certain ways, and their naive
solutions show certain time complexities. Here, we summarize the situations
in which to use or not to use dynamic programming as Dos and Don'ts.

• Dos: Dynamic programming fits for optimizing the following kinds of
problems, whose complete search solutions are of either exponential or
polynomial complexity:

1. Optimization: Compute the maximum or minimum value;


2. Counting: Count the total number of solutions;
3. Checking if a solution works.

• Don'ts: In the following cases we might not be able to apply dynamic
programming:

1. When the naive solution already has a low time complexity, such
as O(n^2) or O(n^3).
2. When the input dataset is a set rather than an array, string, or
matrix; in that case there is roughly a 90% chance we will not use DP.
3. When the overlapping subproblems property applies but it is not an
optimization problem, so we cannot identify the optimal substructure
property and thus find its recurrence function. For example,
with the same problem context as in the Dos but where we are required to
obtain or print all solutions, we need to retreat
to DFS + memo (top-down) instead of DP.

16.2.2 How? Five Elements and Steps

In the previous section, we provided two forms of dynamic programming
solutions, memoization and tabulation, as two different ways of bringing
the caching mechanism into practice. In this section, we focus on the
iterative tabulation and generalize its five key elements and practical
guidelines for the important steps.

Five Key Elements of Tabulation As the first guideline for tabulation,
we summarize the five key elements in the implementation of dynamic
programming:

1. Subproblem and State: Define the subproblem space and the optimal
state/solution for each subproblem. In practice, it is normally
the total/maximum/minimum for the subproblem.
This requires us to know how to divide the problem into subproblems;
there are patterns to follow, which will be detailed in Chapter ??.

2. State Transfer (Recurrence) Function: Derive the function describing
how the current state is obtained from previously computed
state(s). This requires us to identify the optimal substructure and
know how to make the state choice.

3. Assignment and Initialization: Having determined the subproblem
space, we typically assign a space data structure and initialize its
values. Base or edge cases might need to be initialized differently
from the other, more general cases.

4. Iteration: Decide the order of iterating through the subproblem space
so that we scan each subproblem/state exactly once. Using
the subproblem graph and visiting the subproblems in reversed topological
order is a good way to go.

5. Answer: Decide which state, or which combination of all states (such as
the max/min over all states), is the final result needed.

Five Steps to Solve Dynamic Programming This is a general guideline
for dynamic programming, whether memoization or tabulation. The key
advice is to be "flexible": given a real problem, all in all, we are
credited for our understanding of the concepts in computer science, so
we should not be too bothered or stressed if we cannot come up with a
"perfect" answer.

1. Read the question: search for the key words of the problem patterns:
counting, checking, or maximum/minimum.

2. Come up with the most naive solution as soon as possible and analyze
its time complexity. Is it a typical DFS solution? Try drawing a
subproblem graph to visualize it. Is there room for optimization?

3. Apply Section 16.2.1: Is there overlapping? Can you define the optimal
substructure/recurrence function?

4. If the conclusion is yes, try to define the five key elements so that
we can solve it using the preferable tabulation. If you can figure it out
intuitively just like that, great! What to do if not? Maybe retreat to
memoization, which is a combination of divide and conquer, DFS,
and a memo.

5. If we are just too nervous or time is short, we can go ahead and
implement the complete search solution instead. An implementation is
better than nothing, and with it in hand, maybe we can figure out the
rest later.

Complexity Analysis The complexity analysis of the tabulation is
seemingly more straightforward than that of its counterpart, the recursive
memoization. For the tabulation, we can simply draw the conclusion,
without any prior knowledge of dynamic programming, by observing the for
loops and the recurrence function. However, for both variants there exists a
common analysis method. The core quantities for analyzing the complexity of
dynamic programming are: (1) the subproblem space |S|, that is, the total
number of subproblems; and (2) the number of state choices needed to
construct each state, |C|. Multiplying these two quantities gives the
time complexity O(|S||C|).

For example, if the subproblem space is n and each state i relies on (1)
only one or two previous states, as we have seen in the example of Fibonacci
Sequence, the time complexity is O(n); or (2) all previous states in
range [0, i − 1], as seen in the example of Longest Increasing Subsequence,
which can be viewed as O(n) work to solve each subproblem, bringing the
complexity up to O(n^2).

16.2.3 Which? Tabulation or Memoization

As we can see, the way the bottom-up DP table is filled is not as intuitive
as the top-down DP, since it requires some 'reversals' of the signs in the
complete search recurrence that we developed in previous sections. However,
we are aware that some programmers actually feel that the bottom-up
version is more intuitive. The decision on which DP style to use is in your
hands. To help you decide which style to take when presented
with a DP solution, we present the trade-off comparison between top-down
memoization and bottom-up tabulation in Table 16.1.

Table 16.1: Tabulation VS Memoization

Memoization
• Pros: 1. A natural transformation from the normal recursive complete
search. 2. Computes subproblems only when necessary, which can sometimes
be faster.
• Cons: 1. Slower if many subproblems are revisited, due to the overhead
of recursive calls. 2. If there are n states, it can use up to O(n) table
size, which might lead to Memory Limit Exceeded (MLE) for some hard
problems. 3. Faces stack overflow due to the recursive calls.

Tabulation
• Pros: 1. Faster if many subproblems are revisited, as there is no
overhead of recursive calls. 2. Can save memory space with the dynamic
programming 'on-the-fly' technique (see Section ??).
• Cons: 1. For programmers who are inclined toward recursion, this may
not be intuitive.

16.3 Hands-on Examples (Main-course Examples)

In the practical guideline, we mentioned that the problems that can be
further optimized with dynamic programming show certain complexity
patterns in their naive solutions: either exponential, such as O(2^n), or
polynomial, such as O(n^3) or O(n^2).
The purpose of this section is to further enhance our knowledge and
put both our theoretical and practical guidelines to the test. We examine
two examples: Triangle and Maximum Subarray. We have seen how Maximum
Subarray can be solved with linear search and divide and conquer
in Chapter ??. In this section, we expand the old solutions into
dynamic programming solutions and see the differences and connections.

16.3.1 Exponential Problem: Triangle

Triangle (L120) Given a triangle, find the minimum path sum from top
to bottom. Each step you may move to adjacent numbers on the row below.
Example:
Given the following triangle:

[
     [2],
    [3,4],
   [6,5,7],
  [4,1,8,3]
]
The minimum path sum from top to bottom is 11 (i.e., 2 + 3 + 5 + 1 = 11).

Analysis

1. We quickly read the question and find the key word: minimum.

2. We come up with the most naive solution, which would be DFS, already
covered in an earlier chapter. A quick drawing of the DFS traversal
graph shows that some nodes are visited repeatedly.

3. Apply the two properties: First, define the subproblem. Each node in
the triangle is determined by two indices (i, j), the row and column index
respectively. The subproblem can be straightforward: the minimum
path sum from the starting point (0, 0) to the current position (i, j). The
subproblem graph is exactly the same as the graph we used in
DFS, and we identify the overlapping easily.
Now, develop the recurrence function. To build up the solution for the state
at (i, j), we need two other states, (i − 1, j) and (i − 1, j − 1), plus
one value from the current position. The function is: f (i, j) =
min(f (i − 1, j), f (i − 1, j − 1)) + t[i][j].

4. Five key elements: we need to figure out how to assign and initialize
the dp space and do the iteration. To get the boundary conditions:

(a) By observation: the first element at (0, 0) has neither of the
two states f (i − 1, j), f (i − 1, j − 1); the leftmost and rightmost
elements of each row have only one of these two states.
(b) By simple math induction: i ∈ [0, n − 1], j ∈ [0, i]. When i =
0, j = 0, f (i, j) = t[i][j]; when i ∈ [1, n − 1], j = 0, f (i − 1, j − 1)
is invalid; and when j = i (the rightmost element of row i),
f (i − 1, j) is invalid.

The answer is the minimum value of dp at the last row. The
Python code is given:
def min_path_sum(t):
    n = len(t)
    # initialize dp to 0 for f(i, j)
    dp = [[0 for c in range(r + 1)] for r in range(n)]
    # initialize the first point at the top
    dp[0][0] = t[0][0]
    # initialize the left and the right edge of the triangle
    for i in range(1, n):
        dp[i][0] = dp[i - 1][0] + t[i][0]
        dp[i][i] = dp[i - 1][i - 1] + t[i][i]
    for i in range(1, n):
        for j in range(1, i):
            dp[i][j] = t[i][j] + min(dp[i - 1][j], dp[i - 1][j - 1])
    return min(dp[-1])

Space Optimization From the recurrence function we can see that the
current state is only related to two states from the previous row. We can
reuse the original triangle matrix itself to save the states. If we follow
the forward induction of the previous solution, we still have the problem
of edge cases: some states have only one previous state, or none, to decide
their current state. We can write the code as:
from copy import deepcopy

def min_path_sum(t):
    '''
    Space optimization with forward induction
    '''
    t = deepcopy(t)  # avoid mutating the caller's triangle
    if not t:
        return 0
    n = len(t)
    for i in range(0, n):
        for j in range(0, i + 1):
            if i == 0 and j == 0:
                continue
            elif j == 0:
                t[i][j] = t[i][j] + t[i - 1][j]
            elif j == i:
                t[i][j] = t[i][j] + t[i - 1][j - 1]
            else:
                t[i][j] = t[i][j] + min(t[i - 1][j], t[i - 1][j - 1])
    return min(t[-1])

Further Optimization Let us reverse the traversal order: start from the
last row and traverse upward to the first row. For the last row, each state
equals its triangle value. For every remaining row, each element's state
relies on two other states located directly below it. This backward
induction is consistent across all rows, and the final state at the first
row is the final global answer. In this method, we reverse the recurrence
function as f (i, j) = min(f (i + 1, j + 1), f (i + 1, j)) + t[i][j].
from copy import deepcopy

def min_path_sum(t):
    '''
    Space optimization with backward induction
    '''
    t = deepcopy(t)
    if not t:
        return 0
    n = len(t)
    # start from the second-to-last row
    for i in range(n - 2, -1, -1):
        for j in range(i, -1, -1):
            t[i][j] = t[i][j] + min(t[i + 1][j], t[i + 1][j + 1])
    return t[0][0]

16.3.2 Polynomial Problem: Maximum Subarray

Maximum Subarray (L53) Find the contiguous subarray within an array
(containing at least one number) which has the largest sum.
For example, given the array [-2, 1, -3, 4, -1, 2, 1, -5, 4], the
contiguous subarray [4, -1, 2, 1] has the largest sum = 6.

The problem is analyzed following our two properties and solved
following our five-step guideline and five key elements.

Analysis and O(n) Solution

1. First step: we read the problem and quickly catch the key word:
maximum.

2. Second step: the naive solution. From other chapters, we have
seen how maximum subarray can be approached as either graph search
(O(2^n); more details later) or linear search along the solution space
(O(n^3), or O(n^2) if tweaked with incremental computation of the subarray sums).

3. Third step: apply the two properties. The solution space we concluded
for maximum subarray is of size O(n^2) in total, denoted as
a[i, j] where i, j ∈ [0, n − 1], i ≤ j; the maximum
subarray is one of these subarrays. Here, in
order to think in the dynamic programming way, let us define the subproblem.
We first define it as a[i, j], with the state being the sum of this
subarray. There is already some hidden recurrence function:
f (i, j) = f (i, j − 1) + a[j]. We can also see there is overlap: a[0, 4]
actually includes a[1, 4].
However, something is missing: the state we defined did not take
advantage of the optimal substructure. Let us define the subproblem in
another way that has the optimal condition in it. We define f (i), i ∈
[0, n − 1], as the maximum sum over the subarrays that start at index i.
Therefore, the subproblem space is only O(n). The solution space
of subproblem f (i) is a[i, j], j ∈ [i, n − 1]. Assume
we are comparing f (0) and f (1), that is, the maximum
subarray starting at 0 and the maximum subarray starting at 1;
f (1) is a subproblem of f (0). If f (1) is already computed, f (0)
either includes f (1) (if it is positive) or not, giving two
possible state choices. Thus, we get our recurrence function
f (i) = max(f (i + 1) + a[i], a[i]), with the base case f (n − 1) = a[n − 1].

4. Fourth step: given all the conclusions, we can work through the five key
elements. The above solution requires us to start from the maximum index
and move in reverse order; this is called backward induction in materials
explaining dynamic programming from the angle of optimization. We
always need to pay attention to the empty-subarray case, where the
maximum subarray sum is defined as zero. This makes our total number of
states n + 1 instead of n; in the backward induction, this empty state is
located at index n of a list of size n + 1.
def maximum_subarray_dp(a):
    '''
    Backward induction dp solution
    '''
    # assignment and initialization
    dp = [0] * (len(a) + 1)
    # fill out the dp space in reverse order;
    # we do not need to fill the base case dp[n]
    for i in reversed(range(len(a))):
        dp[i] = max(dp[i + 1] + a[i], a[i])
    return max(dp)

Space Optimization If we observe the iterating process, we always use
only one previous state. If we use another global variable, say maxsum, to
track the global maximum subarray value, and use a single variable state to
replace the dp array, we can decrease the space complexity from O(n) to O(1).
def maximum_subarray_dp_sp(a):
    '''
    dp solution with space optimization
    '''
    # assignment and initialization
    state = 0
    maxsum = 0
    # fill out the dp space in reverse order;
    # we do not need to fill the base case dp[n]
    for i in reversed(range(len(a))):
        state = max(state + a[i], a[i])
        maxsum = max(maxsum, state)
    return maxsum

All of the above steps are for deep analysis purposes. When you are more
experienced, you can go directly to the five elements of tabulation and
develop the solution without connecting it to the naive solution. Also,
this is actually Kadane's Algorithm, which will be further detailed in
Chapter ??.
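
For reference, a sketch of the conventional forward-iterating form of
Kadane's algorithm; note that, unlike the version above, it assumes a
non-empty subarray, so an all-negative input returns its largest element
rather than zero:

def maximum_subarray_kadane(a):
    # best over all "maximum subarray ending at i" states
    best = curr = a[0]
    for x in a[1:]:
        curr = max(curr + x, x)  # extend the run or restart at x
        best = max(best, curr)
    return best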

16.4 Exercises

16.4.1 Knowledge Check

1. The completeness of dynamic programming.

16.4.2 Coding Practice

In order to understand how the efficiency is boosted from searching
algorithms to dynamic programming, readers are asked to give solutions
using both searching algorithms and dynamic programming algorithms, and
then to compare and analyze the difference. (Two problems to be asked)
1. Coordinate Type: 63. Unique Paths II (medium).
A robot is located at the top-left corner of an m x n grid (marked 'Start'
in the diagram below).
The robot can only move either down or right at any point in time. The
robot is trying to reach the bottom-right corner of the grid (marked
'Finish' in the diagram below).
Now consider if some obstacles are added to the grids. How many
unique paths would there be?
An obstacle and empty space are marked as 1 and 0 respectively in the
grid.
Note: m and n will be at most 100.
Example 1:

Input:
[
  [0,0,0],
  [0,1,0],
  [0,0,0]
]
Output: 2

Explanation: There is one obstacle in the middle of the 3x3 grid above.
There are two ways to reach the bottom-right corner (a solution sketch
follows):
1. Right -> Right -> Down -> Down
2. Down -> Down -> Right -> Right
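
A possible tabulation sketch for this exercise (try the searching solution
yourself first): dp[i][j] counts the paths reaching cell (i, j), with
obstacle cells forced to zero:

def uniquePathsWithObstacles(grid):
    if not grid or grid[0][0] == 1:
        return 0
    m, n = len(grid), len(grid[0])
    dp = [[0] * n for _ in range(m)]
    dp[0][0] = 1
    for i in range(m):
        for j in range(n):
            if grid[i][j] == 1:   # an obstacle blocks every path through it
                dp[i][j] = 0
            elif i or j:          # any cell other than the start
                dp[i][j] = (dp[i - 1][j] if i else 0) + (dp[i][j - 1] if j else 0)
    return dp[-1][-1]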

Sequence Type

2. 213. House Robber II

Note: This is an extension of House Robber.
After robbing those houses on that street, the thief has found himself
a new place for his thievery so that he will not get too much attention.
This time, all houses at this place are arranged in a circle, which means
the first house is the neighbor of the last one. Meanwhile, the security
system for these houses remains the same as for those on the previous
street.
Given a list of non-negative integers representing the amount of money
of each house, determine the maximum amount of money you can rob
tonight without alerting the police.
Example: nums = [3,6,4], return 6.
def rob(self, nums):
    """
    :type nums: List[int]
    :rtype: int
    """
    if not nums:
        return 0
    if len(nums) == 1:
        return nums[0]

    def robber1(nums):
        # rolling array of size 2 over the linear house-robber dp
        dp = [0] * 2
        dp[0] = 0
        dp[1] = nums[0]  # if the length is 1
        for i in range(2, len(nums) + 1):  # length i, last index i - 1
            dp[i % 2] = max(dp[(i - 2) % 2] + nums[i - 1], dp[(i - 1) % 2])
        return dp[len(nums) % 2]

    # the circle: either skip the last house or skip the first
    return max(robber1(nums[:-1]), robber1(nums[1:]))

3. 337. House Robber III

4. 256. Paint House

There is a row of n houses; each house can be painted with one of
three colors: red, blue, or green. The cost of painting each house
with a certain color is different. You have to paint all the houses such
that no two adjacent houses have the same color.
The cost of painting each house with a certain color is represented by
an n x 3 cost matrix. For example, costs[0][0] is the cost of painting
house 0 with color red; costs[1][2] is the cost of painting house 1 with
color green, and so on. Find the minimum cost to paint all houses.
Solution: the state minCost[i][c] is the minimum cost of painting the
houses up to i with house i painted color c, for c in {0, 1, 2}. For
color 0: minCost[i][0] = min(minCost[i-1][1], minCost[i-1][2]) + costs[i][0],
and symmetrically for colors 1 and 2. The answer is min(minCost[-1]).
def minCost(self, costs):
    """
    :type costs: List[List[int]]
    :rtype: int
    """
    if not costs:
        return 0
    if len(costs) == 1:
        return min(costs[0])

    minCost = [[0 for col in range(3)] for row in range(len(costs) + 1)]
    minCost[0] = [0, 0, 0]
    minCost[1] = [cost for cost in costs[0]]
    colorSet = set([0, 1, 2])
    for i in range(2, len(costs) + 1):
        for c in range(3):
            # previous colors other than c
            pres = list(colorSet - set([c]))
            minCost[i][c] = min([minCost[i - 1][pre_cor]
                                 for pre_cor in pres]) + costs[i - 1][c]
    return min(minCost[-1])

5. 265. Paint House II

There is a row of n houses; each house can be painted with one of
k colors. The cost of painting each house with a certain color is
different. You have to paint all the houses such that no two adjacent
houses have the same color.

The cost of painting each house with a certain color is represented by
an n x k cost matrix. For example, costs[0][0] is the cost of painting
house 0 with color 0; costs[1][2] is the cost of painting house 1 with
color 2, and so on. Find the minimum cost to paint all houses.
Note: All costs are positive integers.
Follow up: Could you solve it in O(nk) runtime? (See the sketch after
the code below.)
Solution: this is exactly the same as the last one:
def minCostII(self, costs):
    if not costs:
        return 0
    if len(costs) == 1:
        return min(costs[0])

    k = len(costs[0])
    minCost = [[0 for col in range(k)] for row in range(len(costs) + 1)]
    minCost[0] = [0] * k
    minCost[1] = [cost for cost in costs[0]]
    colorSet = set([i for i in range(k)])
    for i in range(2, len(costs) + 1):
        for c in range(k):
            # previous colors other than c
            pres = list(colorSet - set([c]))
            minCost[i][c] = min([minCost[i - 1][pre_cor]
                                 for pre_cor in pres]) + costs[i - 1][c]
    return min(minCost[-1])
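
For the O(nk) follow-up, one well-known trick is to keep only the smallest
and second smallest costs of the previous row, so each house-color pair
takes O(1). A sketch (assuming k >= 2):

def minCostII_nk(costs):
    if not costs:
        return 0
    prev = costs[0][:]
    for i in range(1, len(costs)):
        m1 = min(prev)                         # smallest cost of previous row
        idx = prev.index(m1)
        m2 = min(prev[:idx] + prev[idx + 1:])  # second smallest
        # a house may take the overall minimum unless it repeats that color
        prev = [costs[i][c] + (m2 if c == idx else m1)
                for c in range(len(costs[i]))]
    return min(prev)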

6. 276. Paint Fence

There is a fence with n posts; each post can be painted with one of
the k colors.
You have to paint all the posts such that no more than two adjacent
fence posts have the same color.
Return the total number of ways you can paint the fence.
Note: n and k are non-negative integers. Hint: for any three consecutive
posts, if the last two share a color, the first two must differ; so track
two counts, same (the last two posts share a color) and diff (they differ).
def numWays(self, n, k):
    """
    :type n: int
    :type k: int
    :rtype: int
    """
    if n == 0 or k == 0:
        return 0
    if n == 1:
        return k

    # for two posts: 'same' counts colorings where the last two posts
    # share a color, 'diff' where they differ
    same = k
    diff = k * (k - 1)
    for i in range(3, n + 1):
        pre_diff = diff
        diff = (same + diff) * (k - 1)
        same = pre_diff
    return same + diff

Double Sequence Type DP

7. 115. Distinct Subsequences (hard)

Given a string S and a string T, count the number of distinct
subsequences of S which equals T.
A subsequence of a string is a new string which is formed from the
original string by deleting some (possibly none) of the characters
without disturbing the relative positions of the remaining characters.
(I.e., "ACE" is a subsequence of "ABCDE" while "AEC" is not.)
Example 1:
Example 1:
Input: S = "rabbbit", T = "rabbit"
Output: 3

Explanation:
As shown below, there are 3 ways you can generate "rabbit" from S.
(The caret symbol ^ means the chosen letters)
rabbbit
^^^^ ^^
rabbbit
^^ ^^^^
rabbbit
^^^ ^^^
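
A standard double-sequence DP sketch for this exercise: let dp[i][j] be the
number of distinct subsequences of S[:i] that equal T[:j]:

def numDistinct(S, T):
    m, n = len(S), len(T)
    # the empty T can always be formed in exactly one way
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = 1
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = dp[i - 1][j]           # skip S[i - 1]
            if S[i - 1] == T[j - 1]:
                dp[i][j] += dp[i - 1][j - 1]  # match S[i - 1] with T[j - 1]
    return dp[m][n]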

8. 97. Interleaving String


Given s1, s2, s3, find whether s3 is formed by the interleaving of s1
and s2.
Example 1:
Input: s1 = "aabcc", s2 = "dbbca", s3 = "aadbbcbcac"
Output: true

Example 2:
Input: s1 = "aabcc", s2 = "dbbca", s3 = "aadbbbaccc"
Output: false
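
One possible DP sketch: dp[i][j] is True when s3[:i + j] is an interleaving
of s1[:i] and s2[:j]:

def isInterleave(s1, s2, s3):
    m, n = len(s1), len(s2)
    if m + n != len(s3):
        return False
    dp = [[False] * (n + 1) for _ in range(m + 1)]
    dp[0][0] = True
    for i in range(m + 1):
        for j in range(n + 1):
            if i and dp[i - 1][j] and s1[i - 1] == s3[i + j - 1]:
                dp[i][j] = True  # last char of s3[:i + j] taken from s1
            if j and dp[i][j - 1] and s2[j - 1] == s3[i + j - 1]:
                dp[i][j] = True  # last char taken from s2
    return dp[m][n]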

Splitting Type DP

9. 132. Palindrome Partitioning II (hard)

Given a string s, partition s such that every substring of the partition
is a palindrome.
Return the minimum cuts needed for a palindrome partitioning of s.
Example:
Input: "aab"
Output: 1
Explanation: The palindrome partitioning ["aa","b"] could be produced
using 1 cut. A solution sketch follows:
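
A sketch combining a precomputed palindrome table (as in Problem 131 later
in this chapter) with a one-dimensional cut dp, where cut[j] is the minimum
number of cuts for s[:j + 1]:

def minCut(s):
    n = len(s)
    if n <= 1:
        return 0
    pal = [[False] * n for _ in range(n)]  # pal[i][j]: s[i:j + 1] is a palindrome
    cut = [0] * n
    for j in range(n):
        best = j  # worst case: a cut between every pair of characters
        for i in range(j + 1):
            if s[i] == s[j] and (j - i < 2 or pal[i + 1][j - 1]):
                pal[i][j] = True
                best = 0 if i == 0 else min(best, cut[i - 1] + 1)
        cut[j] = best
    return cut[-1]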
Exercise: max difference between two subarrays: return an integer
indicating the maximum difference between two non-overlapping subarrays.
A draft Java solution is:
public int maxDiffSubArrays(int[] nums) {
    int size = nums.length;
    int[] leftMax = new int[size];
    int[] leftMin = new int[size];
    int[] rightMax = new int[size];
    int[] rightMin = new int[size];

    int localMax = nums[0];
    int localMin = nums[0];

    leftMax[0] = leftMin[0] = nums[0];
    // search for leftMax
    for (int i = 1; i < size; i++) {
        localMax = Math.max(nums[i], localMax + nums[i]);
        leftMax[i] = Math.max(leftMax[i - 1], localMax);
    }
    // search for leftMin
    for (int i = 1; i < size; i++) {
        localMin = Math.min(nums[i], localMin + nums[i]);
        leftMin[i] = Math.min(leftMin[i - 1], localMin);
    }

    rightMax[size - 1] = rightMin[size - 1] = nums[size - 1];
    // search for rightMax
    localMax = nums[size - 1];
    for (int i = size - 2; i >= 0; i--) {
        localMax = Math.max(nums[i], localMax + nums[i]);
        rightMax[i] = Math.max(rightMax[i + 1], localMax);
    }
    // search for rightMin
    localMin = nums[size - 1];
    for (int i = size - 2; i >= 0; i--) {
        localMin = Math.min(nums[i], localMin + nums[i]);
        rightMin[i] = Math.min(rightMin[i + 1], localMin);
    }
    // search for the separating position
    int diff = 0;
    for (int i = 0; i < size - 1; i++) {
        diff = Math.max(Math.abs(leftMax[i] - rightMin[i + 1]), diff);
        diff = Math.max(Math.abs(leftMin[i] - rightMax[i + 1]), diff);
    }
    return diff;
}

10. 152. Maximum Product Subarray (medium)


Given an integer array nums, find the contiguous subarray within an
array (containing at least one number) which has the largest product.
Example 1:
Input: [2,3,-2,4]
Output: 6
Explanation: [2,3] has the largest product 6.

Example 2:
Input: [-2,0,-1]
Output: 0
Explanation: The result cannot be 2, because [-2,-1] is not a subarray.

Solution: this is similar to the maximum sum subarray; the difference is
that we need two local arrays, one to track the minimum value, min_local,
and the other, max_local, which denote the minimum and the maximum
subarray product including the i-th element. The recurrence functions are
as follows.

min_local[i] = min(min_local[i − 1] * nums[i], nums[i]) if nums[i] > 0,
               min(max_local[i − 1] * nums[i], nums[i]) otherwise.    (16.1)

max_local[i] = max(max_local[i − 1] * nums[i], nums[i]) if nums[i] > 0,
               max(min_local[i − 1] * nums[i], nums[i]) otherwise.    (16.2)

def maxProduct(nums):
    if not nums:
        return 0
    n = len(nums)
    min_local, max_local = [0] * n, [0] * n
    max_so_far = nums[0]
    min_local[0], max_local[0] = nums[0], nums[0]
    for i in range(1, n):
        if nums[i] > 0:
            max_local[i] = max(max_local[i - 1] * nums[i], nums[i])
            min_local[i] = min(min_local[i - 1] * nums[i], nums[i])
        else:
            max_local[i] = max(min_local[i - 1] * nums[i], nums[i])
            min_local[i] = min(max_local[i - 1] * nums[i], nums[i])
        max_so_far = max(max_so_far, max_local[i])
    return max_so_far

With space optimization:


def maxProduct(self, nums):
    if not nums:
        return 0
    n = len(nums)
    max_so_far = nums[0]
    min_local, max_local = nums[0], nums[0]
    for i in range(1, n):
        if nums[i] > 0:
            max_local = max(max_local * nums[i], nums[i])
            min_local = min(min_local * nums[i], nums[i])
        else:
            pre_max = max_local  # save before overwriting
            max_local = max(min_local * nums[i], nums[i])
            min_local = min(pre_max * nums[i], nums[i])
        max_so_far = max(max_so_far, max_local)
    return max_so_far

An even simpler way to write it:

def maxProduct(self, nums):
    if not nums:
        return 0
    n = len(nums)
    max_so_far = nums[0]
    min_local, max_local = nums[0], nums[0]
    for i in range(1, n):
        a = min_local * nums[i]
        b = max_local * nums[i]
        max_local = max(nums[i], a, b)
        min_local = min(nums[i], a, b)
        max_so_far = max(max_so_far, max_local)
    return max_so_far

11. 122. Best Time to Buy and Sell Stock II


Say you have an array for which the ith element is the price of a given
stock on day i.
Design an algorithm to find the maximum profit. You may complete
as many transactions as you like (i.e., buy one and sell one share of
the stock multiple times).
Note: You may not engage in multiple transactions at the same time
(i.e., you must sell the stock before you buy again).
Example 1:
Input: [7,1,5,3,6,4]
Output: 7
Explanation: Buy on day 2 (price = 1) and sell on day 3 (price = 5),
profit = 5-1 = 4. Then buy on day 4 (price = 3) and sell on day 5
(price = 6), profit = 6-3 = 3.

Example 2:
Input: [1,2,3,4,5]
Output: 4
Explanation: Buy on day 1 (price = 1) and sell on day 5 (price = 5),
profit = 5-1 = 4. Note that you cannot buy on day 1, buy on day 2 and
sell them later, as you are engaging in multiple transactions at the
same time. You must sell before buying again.

Example 3:
Input: [7,6,4,3,1]
Output: 0
Explanation: In this case, no transaction is done, i.e. max profit = 0.

Solution: the difference compared with the first problem is that we can
make multiple transactions, so whenever we can make a profit we can make
a transaction. Notice that if we have [1,2,3,5], we only need one
transaction: buy at 1 and sell at 5, which makes profit 4. This problem
can be solved with a decreasing monotonic stack: whenever the incoming
price would break the decreasing order, we pop the top, which is the
smallest price so far, and take the profit current price minus that
popped price; otherwise, we keep pushing smaller prices onto the stack.
def maxProfit(self, prices):
    """
    :type prices: List[int]
    :rtype: int
    """
    mono_stack = []
    profit = 0
    for p in prices:
        if not mono_stack:
            mono_stack.append(p)
        else:
            if p < mono_stack[-1]:
                mono_stack.append(p)
            else:
                # pop and take the profit, then keep kicking out
                # until the stack is decreasing again
                if mono_stack and mono_stack[-1] < p:
                    price = mono_stack.pop()
                    profit += p - price
                while mono_stack and mono_stack[-1] < p:
                    mono_stack.pop()
                mono_stack.append(p)
    return profit
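
Equivalently, since any profitable transaction can be decomposed into
consecutive-day legs, simply summing every positive day-over-day difference
yields the same maximum profit; a sketch:

def maxProfitGreedy(prices):
    # collect every positive day-over-day move
    return sum(max(prices[i] - prices[i - 1], 0) for i in range(1, len(prices)))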

12. 188. Best Time to Buy and Sell Stock IV (hard)

Say you have an array for which the i-th element is the price of a given
stock on day i.
Design an algorithm to find the maximum profit. You may complete
at most k transactions.
Note: You may not engage in multiple transactions at the same time
(i.e., you must sell the stock before you buy again).
Example 1:
Input: [2,4,1], k = 2
Output: 2
Explanation: Buy on day 1 (price = 2) and sell on day 2 (price = 4),
profit = 4-2 = 2.

Example 2:
Input: [3,2,6,5,0,3], k = 2
Output: 7
Explanation: Buy on day 2 (price = 2) and sell on day 3 (price = 6),
profit = 6-2 = 4. Then buy on day 5 (price = 0) and sell on day 6
(price = 3), profit = 3-0 = 3.
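
A common DP sketch: hold[t] is the best balance while holding stock with at
most t buys so far, and cash[t] the best balance after at most t completed
transactions; when k is large enough, the problem degenerates into the
unlimited case of the previous problem:

def maxProfitK(k, prices):
    n = len(prices)
    if n == 0 or k == 0:
        return 0
    if k >= n // 2:  # effectively unlimited transactions
        return sum(max(prices[i] - prices[i - 1], 0) for i in range(1, n))
    hold = [float('-inf')] * (k + 1)  # holding stock, t buys used
    cash = [0] * (k + 1)              # not holding, t sells completed
    for p in prices:
        for t in range(1, k + 1):
            hold[t] = max(hold[t], cash[t - 1] - p)  # open the t-th transaction
            cash[t] = max(cash[t], hold[t] + p)      # close the t-th transaction
    return cash[k]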

13. 644. Maximum Average Subarray II (hard)

Given an array consisting of n integers, find the contiguous subarray
whose length is greater than or equal to k that has the maximum
average value, and output that maximum average value.
Example 1:
Input: [1,12,-5,-6,50,3], k = 4
Output: 12.75
Explanation:
when length is 5, maximum average value is 10.8,
when length is 6, maximum average value is 9.16667.
Thus return 12.75.

Note:
1 <= k <= n <= 10,000.
Elements of the given array will be in range [-10,000, 10,000].
An answer with calculation error less than 10^-5 will be accepted.
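
One known approach, sketched here: binary-search the answer x; a subarray
of length >= k with average >= x exists iff, after subtracting x from every
element, some subarray of length >= k has a non-negative sum, which prefix
sums can check in O(n):

def findMaxAverage(nums, k):
    def feasible(x):
        # after subtracting x, is there a length >= k subarray with sum >= 0?
        prefix = [0.0]
        for v in nums:
            prefix.append(prefix[-1] + v - x)
        min_pre = float('inf')
        for j in range(k, len(prefix)):
            min_pre = min(min_pre, prefix[j - k])
            if prefix[j] - min_pre >= 0:
                return True
        return False

    lo, hi = min(nums), max(nums)
    while hi - lo > 1e-5:  # the precision tolerance given in the note
        mid = (lo + hi) / 2
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo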

14. Backpack Type: Backpack II Problem

Given n items with size A[i] and value V[i], and a backpack with size
m, what is the maximum value you can put into the backpack? Notice:
you cannot divide an item into small pieces, and the total size of the
items you choose should be smaller than or equal to m.
Example: Given 4 items with size [2, 3, 5, 7] and value [1, 5, 2,
4], and a backpack with size 10, the maximum value is 9.
Challenge: O(n x m) memory is acceptable; can you do it in O(m) memory?
Hint: similar to Backpack I; the difference is that for dp[j] we want to
maximize the value, not the volume, so we just replace f[j-A[i]]+A[i]
with f[j-A[i]]+V[i].
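
Following the hint, a one-dimensional sketch where dp[j] is the best value
achievable with capacity j; iterating the capacity downwards ensures each
item is used at most once:

def backPackII(m, A, V):
    dp = [0] * (m + 1)  # dp[j]: max value with capacity j
    for size, value in zip(A, V):
        for j in range(m, size - 1, -1):  # backwards: each item used once
            dp[j] = max(dp[j], dp[j - size] + value)
    return dp[m]

For the example above, backPackII(10, [2, 3, 5, 7], [1, 5, 2, 4]) returns 9
(the items of size 3 and 7).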

15. 801. Minimum Swaps To Make Sequences Increasing

We have two integer sequences A and B of the same non-zero length.
We are allowed to swap elements A[i] and B[i]. Note that both elements
are in the same index position in their respective sequences.
At the end of some number of swaps, A and B are both strictly
increasing. (A sequence is strictly increasing if and only if
A[0] < A[1] < A[2] < ... < A[A.length - 1].)
Given A and B, return the minimum number of swaps to make both
sequences strictly increasing. It is guaranteed that the given input
always makes it possible.
Example:
Input: A = [1,3,5,4], B = [1,2,3,7]
Output: 1
Explanation: Swap A[3] and B[3]. Then the sequences are
A = [1,3,5,7] and B = [1,2,3,4], which are both strictly increasing.
Note:
A, B are arrays with the same length, and that length will be in the
range [1, 1000].
A[i], B[i] are integer values in the range [0, 2000].

Simple DFS. The brute force solution is to generate all the valid
sequences and find the minimum swaps needed. Because each element
can either be swapped or not, the time complexity is O(2^n).
Whether we need to swap the current index i depends only on four elements
at two states, (A[i], B[i], A[i-1], B[i-1]), at states i and i-1 respectively.
For each path, we keep the last visited elements a
and b picked for A and B respectively. Then:
import sys

def minSwap(self, A, B):
    if not A or not B:
        return 0

    def dfs(a, b, i):  # a, b: the last elements kept for A and B
        if i == len(A):
            return 0
        if i == 0:
            # either keep or swap the first pair
            count = min(dfs(A[i], B[i], i + 1), dfs(B[i], A[i], i + 1) + 1)
            return count
        count = sys.maxsize
        if A[i] > a and B[i] > b:  # not swap
            count = min(dfs(A[i], B[i], i + 1), count)
        if A[i] > b and B[i] > a:  # swap
            count = min(dfs(B[i], A[i], i + 1) + 1, count)
        return count

    return dfs([], [], 0)

DFS with a single-state memo does not work. Consider the pair [5,4],
[3,7]: in DFS, the subproblems are built in reversed order compared with
normal dynamic programming, so simply using the index alone to identify a
state will end up with a wrong answer.
DFS with a multiple-choice memo. This problem has two potential
choices at each index, swap or keep, so the right way is to distinguish
different states with an additional variable. Here we use swapped to
represent whether we decided to swap at the current level.

import sys

def minSwap(self, A, B):
    if not A or not B:
        return 0

    def dfs(a, b, i, memo, swapped):  # a, b: the last elements kept
        if i == len(A):
            return 0
        if (swapped, i) not in memo:
            if i == 0:
                memo[(swapped, i)] = min(
                    dfs(A[i], B[i], i + 1, memo, False),
                    dfs(B[i], A[i], i + 1, memo, True) + 1)
                return memo[(swapped, i)]
            count = sys.maxsize
            if A[i] > a and B[i] > b:  # not swap
                count = min(count, dfs(A[i], B[i], i + 1, memo, False))
            if A[i] > b and B[i] > a:  # swap
                count = min(count, dfs(B[i], A[i], i + 1, memo, True) + 1)
            memo[(swapped, i)] = count
        return memo[(swapped, i)]

    return dfs([], [], 0, {}, False)

Dynamic Programming. Because there are two choices at each index, we
define two dp state arrays: one represents the minimum swaps if the
current i is not swapped, and the other if the current i is swapped.
import sys

def minSwap(self, A, B):
    if not A or not B:
        return 0

    dp_not = [sys.maxsize] * len(A)
    dp_swap = [sys.maxsize] * len(A)
    dp_swap[0] = 1
    dp_not[0] = 0
    for i in range(1, len(A)):
        if A[i] > A[i - 1] and B[i] > B[i - 1]:
            # i-1 not swapped and i not swapped
            dp_not[i] = min(dp_not[i], dp_not[i - 1])
            # if i-1 is swapped, i needs to swap as well
            dp_swap[i] = min(dp_swap[i], dp_swap[i - 1] + 1)
        if A[i] > B[i - 1] and B[i] > A[i - 1]:
            # i-1 not swapped, i swapped
            dp_swap[i] = min(dp_swap[i], dp_not[i - 1] + 1)
            # if i-1 is swapped, the current one does not need to swap
            dp_not[i] = min(dp_not[i], dp_swap[i - 1])
    return min(dp_not[-1], dp_swap[-1])

Actually, for this problem the DFS + memo solution is no longer easy to
understand, while the dynamic programming is easier and more
straightforward.

16. Example 1. 131. Palindrome Partitioning (medium)

Given a string s, partition s such that every substring of the partition
is a palindrome.
Return all possible palindrome partitionings of s.
For example, given s = "aab",
Return

[
  ["aa", "b"],
  ["a", "a", "b"]
]

Solution: here we not only need to count the solutions, we need to
record all of them. Before using dynamic programming, we
can use DFS, with a helper function to check whether a split substring is a
palindrome or not. The time complexity is T (n) = T (n − 1) +
T (n − 2) + ... + T (1) + O(n), which solves to O(2^n).
This is also called a backtracking algorithm. The running time is 152 ms.

Figure 16.3: State transfer for the palindrome splitting

def partition(self, s):
    """
    :type s: str
    :rtype: List[List[str]]
    """
    # the whole purpose is to find palindromes, which makes it a DFS
    def bPal(s):
        return s == s[::-1]

    def helper(s, path, res):
        if not s:
            res.append(path)
        for i in range(1, len(s) + 1):
            if bPal(s[:i]):
                helper(s[i:], path + [s[:i]], res)

    res = []
    helper(s, [], res)
    return res

Now we use dynamic programming for the palindrome check: if substring
s(i, j) is a palindrome and s[i − 1] == s[j + 1], then s(i − 1, j + 1) is a
palindrome too. So, for the state, f[i][j] denotes whether s[i : j + 1] is a
palindrome (1 or 0); for the function, f[i][j] = f[i + 1][j − 1] if
s[i] == s[j], else False; for the initialization, f[i][i] = True and
f[i][i+1] = (s[i] == s[i+1]); for the loop, we start with
size 3 and set the start and end indices. However, for this problem, this
table only acts like the function bPal, checking palindromicity in O(1)
time. The running time is 146 ms.
def partition(s):
    f = [[False for i in range(len(s))] for i in range(len(s))]

    for d in range(len(s)):
        f[d][d] = True
    for d in range(1, len(s)):
        f[d - 1][d] = (s[d - 1] == s[d])
    for sz in range(3, len(s) + 1):       # substring size
        for i in range(len(s) - sz + 1):  # the start index
            j = i + sz - 1                # the end index
            f[i][j] = f[i + 1][j - 1] if s[i] == s[j] else False

    res = []
    def helper(start, path, res):
        if start == len(s):
            res.append(path)
        for i in range(start, len(s)):
            if f[start][i]:
                helper(i + 1, path + [s[start:i + 1]], res)
    helper(0, [], res)
    return res

This is actually an example where, if we want to print out all the
solutions, we need to use DFS and backtracking; it is hard for dynamic
programming alone to save time here.

16.5 Summary

Steps of Solving Dynamic Programming Problems
Reading through the problems, most of them use array or string
data structures. We search for key words such as "min/max number" or
"Yes/No" in "subsequence"-type problems. After this process, we make sure
that we are going to solve the problem with dynamic programming. Then we
use the following steps to solve it:

1. Allocate new storage (a list) f to store the answers, where f_i denotes
the answer for the subarray that starts at 0 and ends at i. (Typically, one
extra space is needed.) This step implicitly tells us the way we do divide
and conquer: we first divide the sequence S into S(1, n) and a_0, and
reason about the relation between these elements.

2. We construct a recurrence function using f between subproblems.

3. We initialize the storage, and we figure out where in the storage the
final answer is (f[-1], max(f), min(f), f[0]).

Other important points from this chapter:

1. Dynamic programming is an algorithm theory, and divide and conquer
+ memoization is one way to implement dynamic programming.

2. Dynamic programming starts from the initialization state and deduces
the result of the current state from previous states, until it gets to
the final state where we collect our final answer.

3. The reason that dynamic programming is faster is that it avoids
repeated computation.

4. Dynamic programming ≈ divide and conquer + memoization.

The following table shows the summary of the different types of dynamic
programming with their main elements.

Figure 16.4: Summary of the different types of dynamic programming problems


17 Greedy Algorithms

Greedy algorithm is a further optimization strategy on top of dynamic
programming. It usually constructs and tracks a single optimal solution to
the problem directly and incrementally; like dynamic programming, it works
with subproblems, and at each step it extends the last partial solution by
evaluating all available candidates and picking the best one at the moment,
without regard to the discarded alternatives. A greedy algorithm picks
the best immediate output but does not consider the big picture, hence it
is considered greedy.
Because of the "greediness" of greedy algorithms, whether the single
solution we derive is optimal or not is for us to ponder and decide.
Being conscious of its optimality is important: if we require an absolutely
optimal solution, we have to prove its optimality with systematic induction
methods; if we are aware that it won't lead to the optimal solution, but
is close enough, a good approximation to an optimal solution that is
too expensive to achieve exactly, we can still go for it. This chapter is a
systematic study of the greedy algorithm; we focus on design and proof
methods that always try to achieve the optimal solution.
Greedy algorithms are highly related to and rely on mathematical
optimization. A greedy algorithm is "easy" if you can reason out a solution
and prove it easily with math, which comes "naturally". It can be "hard"
when we need to identify important and less obvious properties, design a
greedy approach, and prove its correctness with more systematic induction
methods; this can require even more analysis effort than dynamic
programming does. Because of the high flexibility of greedy algorithms, a
lot of algorithm books do not even cover this topic. It is not frequently
seen in real interviews, but we cover it because in the field of AI the
search is approximate, and greedy algorithms can be approximate and
efficient too. Maybe it will inspire us in other fields.

17.1 Exploring

Maximum Non-overlapping Intervals (L435) Given a collection of
intervals, find the minimum number of intervals you need to remove to
make the rest of the intervals non-overlapping. Note: You may assume the
interval's end point is always bigger than its start point. Intervals like
[1,2] and [2,3] have borders "touching", but they don't overlap each other.

Example 1:

Input: [[1,2], [2,3], [3,4], [1,3]]
Output: 1
Explanation: [1,3] can be removed and the rest of the intervals are
non-overlapping.

Analysis Naively, this is a combination problem: each interval can be
taken or not taken, which gives a total of O(2^n) combinations. For each
combination, we check feasibility, i.e., that none of the chosen intervals
overlap. The process of enumerating the combinations has been well
explained in the chapters on search and combinatorics. As a routine for
optimization problems, we use a sequence X to represent whether each item
in the original array is chosen or not, x_i ∈ {0, 1}. Our objective is to
optimize the value:

o = max_X Σ_{i=0}^{n−1} x_i    (17.1)

However, if we sort the items by either start or end time, checking an
item's compatibility with a combination only requires comparing it with
the combination's last item.

Dynamic Programming

Figure 17.1: All intervals sorted by start and end time.

A-B: Convert to Longest Increasing Subsequence A feasible solution
is a sequence a_0, a_1, ..., a_k with s(a_i) ≤ f(a_i) ≤ s(a_{i+1}) < f(a_{i+1}),
where s and f denote start and finish time. If we sort the intervals by
either start or end time, we get the sorted intervals shown in Fig. 17.1.
We can reduce our problem to finding the length of the longest subsequence
that does not overlap, which is equivalent to requiring s[i + 1] ≥ f[i]
within the resulting subsequence. This is similar enough to the concept of
the longest increasing subsequence that we can apply dynamic programming
to solve this problem with a time complexity of O(n^2).
For this problem, there can exist multiple optimal solutions, and dynamic
programming can tell us from the LIS array which one has the maximum.
Let's define a subproblem d[i] as the maximum number of
non-overlapping intervals for subarray [a[0], a[1], ..., a[i − 1]], over
subsequences that include a[i − 1]. Then, our recurrence relation is:

d[i] = max(d[j]) + 1, j ∈ [0, i − 1], f(j) ≤ s(i).    (17.3)

And the answer is max(d[i]), i ∈ [0, n − 1].


from typing import List

def eraseOverlapIntervals(intervals: List[List[int]]) -> int:
    if not intervals:
        return 0
    intervals.sort(key=lambda x: x[0])
    n = len(intervals)
    LIS = [0] * (n + 1)
    for i in range(n):
        max_before = 0
        for j in range(i, -1, -1):
            if intervals[i][0] >= intervals[j][1]:
                max_before = max(max_before, LIS[j + 1])
        LIS[i + 1] = max(LIS[i], max_before + 1)
    return len(intervals) - max(LIS)

Simplified Dynamic Programming Let's approach the problem directly:
define a subproblem d[i] as the maximum number of non-overlapping
intervals for the subarray a[0 : i]. With induction, assume we have solved
all subproblems from d[0] up to d[i − 1], meaning we know the answers to
all of them. Now we want the recurrence relation between subproblem d[i]
and its preceding subproblems. We have a[i] at hand; what effect can it
have?
It can either increase the previous maximum value d[i − 1] by one, or the
optimal solution remains unchanged. This makes the d array a
non-decreasing sequence. With this characteristic, we do not need
to try all preceding compatible intervals; instead we only need the nearest
preceding one, because its value is at least as large as that of all
earlier ones. We denote the index of this preceding compatible interval of
a[i] by p[i]; then our recurrence relation becomes

d[i] = max(d[i − 1], d[p[i]] + 1).    (17.4)

The final answer is dp[-1]. After the sorting, the dynamic programming
part takes only O(n), making the total time O(n log n), dominated by the
sort. The code differs by only one line (the added break) from the above
approach:
def eraseOverlapIntervals(intervals: List[List[int]]) -> int:
    if not intervals:
        return 0
    intervals.sort(key=lambda x: x[0])
    n = len(intervals)
    dp = [0] * (n + 1)

    for i in range(n):
        max_before = 0
        for j in range(i, -1, -1):
            if intervals[i][0] >= intervals[j][1]:
                max_before = max(max_before, dp[j + 1])
                break
        dp[i + 1] = max(dp[i], max_before + 1)
    return n - dp[-1]

Greedy Algorithm
In the previous solution, the process looks like this (refer to the
intervals in Fig. 17.1, sorted by end time): first we have e, with m = 1.
Interval a is not compatible with e; according to the previous recurrence
relation, m = 1, with either a or e in the optimal solution. When we
process d, its preceding compatible interval is e, making the maximum
value 2 for this subproblem. For c, the length of the optimal solution
remains the same, but with an additional optimal solution: e, c. Going
back, when we were processing a, was it necessary to keep both a and e as
optimal solutions? If we just want the maximum length of the optimal
solution, we do not need to track all optimal solutions, only the one that
is the most "optimistic", which is e in our case: choosing e instead of a
leaves more space and is thus more likely to fit more intervals in later
subproblems. Similarly, c is incompatible with d, so it is safe to throw c
away: it has the largest end time, i.e., it is the least optimistic, so it
is unnecessary to replace any previous interval in the optimal solution
with it. The greedy algorithm takes this simplification aggressively:
if there are multiple optimal solutions, it usually cares only about the
one that is the most promising and optimistic, and incrementally builds on
it. The code is given:

import sys

def eraseOverlapIntervals(intervals: List[List[int]]) -> int:
    if not intervals:
        return 0
    min_rmv = 0
    intervals.sort(key=lambda x: x[1])  # sort by end time
    last_end = -sys.maxsize
    for i in intervals:
        if i[0] >= last_end:  # non-overlap: keep this interval
            last_end = i[1]
        else:                 # overlap: remove this interval
            min_rmv += 1
    return min_rmv
If we sort the intervals by start time instead, we need to tweak the code a
bit: whenever one interval is incompatible with the previous one, we check
whether it has an earlier end time than the previous one. If it does, we
replace the previous interval with the current one—it has a later start time
and an earlier end time, so whatever optimal solution the previous interval
was in, substituting the current one will not create an overlap, and it is
more promising.

for i in intervals:
    if i[0] < last_end:       # overlap: delete one of the two intervals
        if i[1] < last_end:   # keep the one with the earlier end time
            last_end = i[1]
        min_rmv += 1
    else:
        last_end = i[1]
Summary and Comparison

We have seen that both dynamic programming and greedy algorithms solve the
problem incrementally—from small subproblems to larger ones. Dynamic
programming plays safe by tracking all previous states, so it does not matter
how the intervals are sorted; sorting by start time and by end time work
equally well. The greedy approach, in contrast, cares much less about
previous states.

For example, if our intervals are [[1, 11], [2, 12], [13, 14], [11, 22]],
dynamic programming gives us LIS = [0, 1, 1, 2, 2], which indicates there are
two optimal solutions. The greedy algorithm finds just one of them:
[1, 11], [13, 14]. The result is that we may obtain only one of multiple
equally long optimal solutions, but in exchange we greatly improve time
efficiency and simplify both the algorithm design and the coding.
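A quick way to gain confidence in the greedy version is to cross-check it
against the dynamic programming version on random inputs. Below is a minimal
sketch of such a check; erase_dp and erase_greedy are hypothetical names
standing in for the two implementations above:

import random

def cross_check(trials: int = 1000) -> None:
    for _ in range(trials):
        n = random.randint(0, 8)
        intervals = []
        for _ in range(n):
            s = random.randint(0, 20)
            intervals.append([s, s + random.randint(1, 10)])
        # both functions sort internally, so pass fresh copies
        assert erase_dp([i[:] for i in intervals]) == \
               erase_greedy([i[:] for i in intervals]), intervals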
Questions to Ponder

• Are you absolutely sure the greedy result is one of the optimal solutions?
  If it is optimal, how do we prove it?

• When can I use a greedy algorithm instead of dynamic programming?

What if each interval is weighted with a real value wi, and the
objective is to maximize the sum of weights of a non-overlapping
set of intervals?
If the weights can be both negative and positive, we have to use the first
dynamic programming method: a preceding suboptimal solution can still lead
to a globally optimal one if it happens to be compatible with later
intervals that carry large weights, so we must track the best solution
ending at every interval. If wi ≥ 0 for every i, we can apply the second
dynamic programming method.
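For the weighted variant, the classic recurrence sorts by finish time and
combines "skip" with "take plus best compatible prefix". Below is a sketch;
the function name and the (start, end, weight) tuple format are assumptions
for illustration:

import bisect

def max_weight_non_overlap(intervals):
    # intervals: list of (start, end, weight) tuples
    intervals.sort(key=lambda x: x[1])  # sort by finish time
    finish = [e for _, e, _ in intervals]
    n = len(intervals)
    dp = [0] * (n + 1)  # dp[i]: best total weight using the first i intervals
    for i, (s, e, w) in enumerate(intervals):
        # p = number of intervals finishing no later than s; since the list
        # is sorted by finish time, all of them are compatible with i
        p = bisect.bisect_right(finish, s, 0, i)
        dp[i + 1] = max(dp[i], dp[p] + w)
    return dp[n]

Because skipping an interval (dp[i]) is always an option, this recurrence
stays correct even with negative weights; it is the greedy shortcut that
breaks, not the dynamic program.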
17.2 Introduction to Greedy Algorithm

What is Greedy Algorithm?
We saw that dynamic programming tracks the best solutions to all subproblems
(in the above example, the d array) and incrementally builds up the solution
to each subproblem from its subproblems (in our example, we used all
subproblems for j in range(i) to build up the solution to subproblem d[i]).

A greedy algorithm follows the same trend in the sense that it solves
overlapping subproblems where the optimal substructure property shows. But
it maintains only one optimal solution for each subproblem—the most
promising one. For example, for [[1, 11]], the optimal solution is [1, 11];
for [1, 11], [2, 12], the optimal solution is still [1, 11], even though
[2, 12] is another optimal solution for this subproblem. For [1, 11],
[2, 12], [13, 14], [11, 22], the greedy approach gives us [1, 11], [13, 14]
as our optimal solution, while with dynamic programming we can still find
another optimal solution: [1, 11], [11, 22].
Three Properties We define three properties for greedy algorithms:

• Overlapping subproblems and optimal substructure: these two properties are
  defined exactly as in dynamic programming. If an optimal solution to the
  problem contains within it optimal solutions to its subproblems, the
  problem is said to have optimal substructure. In our example
  [1, 11], [2, 12], [13, 14], the optimal solution [1, 11], [13, 14]
  contains the optimal solution [1, 11] to its subproblem [1, 11], [2, 12].
• Greedy-choice property: this is the only additional property that a greedy
  algorithm requires compared with dynamic programming. We can assemble a
  globally optimal solution by making a locally optimal (greedy) choice.
  For example, given an array [2, 1, 3, 7, 5, 6], both [1, 3, 5, 6] and
  [2, 3, 5, 6] are longest increasing subsequences. We define LIS as the
  longest increasing subsequence of a[0 : i] that ends at a[i − 1]. The
  process of constructing it with dynamic programming is as follows:

subproblems
[2],                LIS = [2]
[2, 1],             LIS = [1]
[2, 1, 3],          LIS = [1, 3], [2, 3]
[2, 1, 3, 7],       LIS = [1, 3, 7], [2, 3, 7]
[2, 1, 3, 7, 5],    LIS = [1, 3, 5], [2, 3, 5]
[2, 1, 3, 7, 5, 6], LIS = [1, 3, 5, 6], [2, 3, 5, 6]
We clearly see that to get the best solution, we have to rely on the optimal
solutions of all preceding subproblems. If we insist on applying a greedy
algorithm, the process looks like this:

subproblems
[2],                LIS = [2]
[2, 1],             LIS = [2], only compare [2] and 1
[2, 1, 3],          LIS = [2, 3]
[2, 1, 3, 7],       LIS = [2, 3, 7]
[2, 1, 3, 7, 5],    LIS = [2, 3, 7]
[2, 1, 3, 7, 5, 6], LIS = [2, 3, 7]

LIS = [2, 3, 7] is locally optimal but not part of the globally optimal
solutions, which are [1, 3, 5, 6] and [2, 3, 5, 6]. In our non-overlapping
interval problem, by contrast, if an interval is optimal in the local
subproblem, it is surely part of an optimal solution to the final (global)
problem.
To summarize, a greedy algorithm incrementally builds up a single optimal
solution. For this single optimal solution to be globally optimal, each
partial optimal solution has to match some prefix of a globally optimal
solution exactly. Both the design of a greedy algorithm and the proof of its
correctness are thus done by induction, showing that at each stage the
greedy choice is the best one.

To correctly design a greedy algorithm, it has to make a locally optimal
choice according to some rule or ordering. In the above example, we know the
optimal solution has the property that si ≤ fi and fi ≤ fi+1, so we sort the
intervals in increasing order of finish time. The greedy approach chooses
the interval with the earliest finish time and declares that it belongs to
the optimal solution. It then just goes through the remaining candidates in
order of finish time, checking whether each is compatible with the last
chosen item, thereby building up a feasible and optimal solution. Ordering
the subproblems this way makes sure each partial optimal solution is a
"prefix" of the globally optimal solution.
Practical Guideline
Clearly, like dynamic programming, greedy algorithms are for solving
optimization problems subject to a set of constraints. For example:

• Maximize the number of events you can attend, but do not attend any
  overlapping events.

• Minimize the number of jumps.

• Minimize the cost of all edges chosen, but do not disconnect the graph.

Do not worry too much about the definition of a greedy algorithm; it is hard
to pin down and often confusing, because it is informal and highly dependent
on the problem context. The hard part is to come up with a greedy rule that
actually works.

I suggest starting with dynamic programming: it is more systematic, its
correctness is easier to prove, and it guides us toward the more efficient
greedy algorithm, just as the process shown in the example.
Ordering, Monotone Property These constraints bring a sense of ordering into
the optimal solution, and this is where greedy algorithms apply. Therefore
we say: "beneath every greedy algorithm, there is almost always a more
cumbersome dynamic programming solution". But not for every dynamic program
can we find a more efficient greedy algorithm, because for a greedy
algorithm to work there must be some ordering in the optimal solution that
makes the local optimization globally optimal. We shall see this in our
examples! In activity scheduling, the ordering is that d[i] is
non-decreasing, as shown in its dynamic programming solution; because of
this property, the locally best choice is also globally safe. In Dijkstra's
algorithm, the monotone property is that w(s, u) ≤ w(s, v) = w(s, u) +
w(u, v) along a shortest path. Once this monotone property breaks, as in a
graph with negative weights, the greedy algorithm no longer applies and we
have to retreat to dynamic programming.
Pros and Cons

As we saw, greedy algorithms have the following pros:

• Simplicity: greedy algorithms are often easier to describe and code up
  than other algorithms.

• Efficiency: greedy algorithms can often be implemented more efficiently
  than other algorithms.

However:

• Hard to get right: once you have found the right greedy approach,
  designing the greedy algorithm can be easy. However, finding the right
  rule can be hard.

• Hard to verify/prove: showing a greedy algorithm is correct often requires
  a nuanced argument.
17.3 *Proof
The main challenge with greedy algorithms is to prove their correctness,
which matters most in theoretical study. In real coding practice, we can
leverage the dynamic programming solution to compare against, scrutinizing
many kinds of examples to make sure the greedy algorithm and the dynamic
programming produce the same results. Still, let us learn these proof
techniques as another powerful tool to master.

17.3.1 Introduction
First, we introduce the two standard techniques/arguments for proving the
correctness of a greedy algorithm step by step using mathematical induction:
Greedy Stays Ahead and Exchange Arguments.
Greedy stays ahead

This style of proof works by showing that, according to some measure, the
partial solution built by the greedy algorithm is always at least as good as
the corresponding part of an optimal solution at each iteration of the
algorithm. Once we have established this argument, we can show that the
greedy solution must be optimal. Typically there are four steps:

1. Define the solutions: define our greedy solution as G and compare it
   against some optimal solution O∗.

2. Define the measurements: the goal is to find a series of measurements of
   the greedy solution and the optimal solution. Define measures
   m1(G), m2(G), ..., mn(G) such that m1(O∗), m2(O∗), ..., mk(O∗) are also
   defined. Note that there might be a different number of measures for G
   and O∗, since at this point we cannot assume that G is optimal.

3. Prove greedy stays ahead: prove that mi(G) ≥ mi(O∗) or mi(G) ≤ mi(O∗),
   whichever is appropriate, for all reasonable values of i. This argument
   is usually done inductively.

4. Prove optimality: using the fact that greedy stays ahead, prove that the
   greedy algorithm must produce an optimal solution. This argument is often
   done by contradiction: assume the greedy solution is not optimal and use
   the fact that greedy stays ahead to derive a contradiction.

The main challenge with this style of argument is finding the right
measurements to make.
Exchange Arguments
This technique proves that the greedy solution is optimal by showing that we
can iteratively transform any optimal solution into the greedy solution
without worsening its cost. This transformation is what the word "exchange"
refers to. Exchange arguments are a more versatile technique than greedy
stays ahead. It can be generalized into three steps:

1. Define the solutions: define our greedy solution as G = {g1, ..., gk} and
   compare it against some optimal solution O = {o1, ..., om}.

2. Compare solutions: assume the optimal solution is not the same as the
   greedy solution; show that if m(G) ≠ m(O), then G and O must differ in
   some way. How they differ depends on the measurement and the problem
   context.

   (a) If it is a combination problem measured by length, then m(G) = k and
       m(O) = m, and we need to prove k = m.
   (b) If it is a combination problem with a value objective, we assume o1
       and g1 differ while all other items are identical; then we swap o1
       with g1.
   (c) If it is a permutation problem with an objective function, we assume
       there are two consecutive items in O that are in a different order
       than they are in G (i.e., there is an inversion).

3. Exchange pieces: show how to transform O by exchanging some piece of O
   for some piece of G, and prove that doing so does not worsen the cost of
   O. Transforming O into G over the iterations proves that greedy is just
   as good as any optimal solution and hence is optimal.

Guideline
We will simply go through the proofs for our examples, but the point is that
we should use these proof methods as a way to design the greedy algorithm on
top of the dynamic programming.

17.3.2 Greedy Stays Ahead
Maximum Non-overlapping Intervals Proofs Let's use G and O for our greedy
and optimal solutions respectively. G consists of {G1, G2, ..., Gk} in the
order the items were added into G. Similarly, O1, O2, ..., Om is one of the
optimal sets.

17.3.1 G is a compatible set of intervals.

G is trivially feasible because, by design, we discard any interval that
overlaps with our previous greedy choice. Now all that matters is to prove
its optimality.

We know there might exist multiple optimal solutions for a problem, as shown
in our example. In this particular problem, we do not intend to prove that
G = O, because that might not hold; instead we prove that G and O have the
same length: |G| = |O|. We apply the "greedy stays ahead" principle along
with mathematical induction. We first assume that the intervals in O are
ordered by the same rule applied to G: f(Oi) < f(Oj) if i < j. In this case,
we use the finish time f of each interval as the measure. To prove
|G| = |O|, we need to prove that greedy stays ahead, and then the optimality.

17.3.2 f(Gi) ≤ f(Oi), i ∈ [1, k].

For i = 1, the statement is true: our algorithm starts by choosing the
interval with the minimum finish time. We further assume the statement is
true for
n − 1 as our induction hypothesis. Claim 17.3.2 is proved if we can show
that f(Gn) ≤ f(On).

f(Gn−1) ≤ f(On−1) ≤ s(On) tells us that, right after the selection of Gn−1,
On (together with others) remains in the available set for the selection at
step n. Here f(On−1) ≤ s(On) always holds because O is a compatible set,
and naturally f(On) > s(On) ≥ f(On−1). The greedy algorithm selects the
available interval with the smallest finish time, which guarantees that
f(Gn) ≤ f(On). Now we have formally proved our sense of "greedy stays
ahead"—or rather "greedy never falls behind"—using induction.

17.3.3 G is optimal: |G| = |O|.

We apply contradiction: if G is not optimal, then we must have m > k.
Similarly, after step k, we have f(Gk) ≤ f(Ok) ≤ s(Ok+1), so Ok+1 must still
be in the available set for the greedy algorithm to choose. But the greedy
algorithm only stops when the available set is empty—a contradiction. We
have thus successfully applied the "greedy stays ahead" method to prove
correctness in the example of maximum non-overlapping intervals.

17.3.3 Exchange Arguments
Scheduling to minimize lateness: instead of having a fixed start and end
time for each interval, we relax the start time. We represent each interval
as [ti, di], where ti is its contiguous processing time and di is its
deadline. There are many objective functions we might want to optimize.
Here, we assume we only have one resource. We could ask for the maximum
number of meetings that can be scheduled into a single conference room with
none of them late; or we can allow some meetings to run late, defining the
lateness li as:

l[i] = f[i] − d[i]    (17.5)

Say our objective is to minimize the total lateness of scheduling all
meetings in one conference room; find the optimal solution:

O = min Σ_{i=0}^{n−1} li    (17.6)
  = min Σ_{i=0}^{n−1} (f[i] − d[i])    (17.7)

For example, we have nums = [[4, 6], [2, 6], [5, 5]].
The optimal solution is [2, 6], [4, 6], [5, 5] with total
lateness (2-6) + (2+4-6) + (2+4+5-5) = -4 + 0 + 6 = 2.
Example 2:
nums = [[2, 15], [36, 45], [9, 29], [16, 23], [7, 9], [4, 9], [5, 9]]
ans = 47
Analysis First, let us assume all intervals have distinct deadlines. A naive
solution is to try all permutations of the n intervals and find the one with
the minimum total lateness. But instead, what if we start from a random
order, compute its lateness, and repeatedly exchange two adjacent items to
see whether the change decreases the total lateness?

______a_i____a_j

Say the adjacent items are ai and aj, and the schedule before them finishes
at time s. Before the exchange, the two latenesses are s + ti − di and
s + ti + tj − dj. After the exchange, they are s + tj − dj and
s + tj + ti − di: i becomes later, while j becomes less late. Let us compare
the additional lateness of i with the decreased lateness of j:

(s + tj + ti − di) − (s + ti − di)  vs.  (s + ti + tj − dj) − (s + tj − dj)    (17.8)
tj  vs.  ti    (17.9)

Therefore, we should exchange i and j whenever tj < ti. Ordering the list by
increasing duration of each meeting thus gives the best objective. Our
Python code is:
def lateness(intervals):
    # sort by duration t (the first item of each [t, d] pair)
    intervals = sorted(intervals, key=lambda x: x[0])
    f = 0    # finish time so far
    ans = 0  # total lateness
    for t, d in intervals:
        f += t
        ans += f - d
    return ans
Modification However, suppose we modify our definition of lateness to:

l[i] = 0,           if f[i] ≤ d[i],
l[i] = f[i] − d[i], otherwise.    (17.10)

That is, we no longer reward intervals that are not late with negative
values. Things get more complex. Consider two adjacent items i and j again:

1. If neither is late, exchanging them or not makes no difference to the
   total lateness.
   ______a_i____a_j__d_i__d_j

2. If both are late, exchange the items if tj < ti, as before.

3. If i is late and j is not, then regardless of their durations, exchanging
   them only makes things later.

4. If i is not late and j is late, whether exchanging helps depends on the
   concrete values; no simple local rule decides it.

Therefore, with this definition of lateness, a greedy solution is no longer
available to us—not even dynamic programming, as we will see. But the greedy
approach that first sorts the intervals by duration gives us a good start;
we can then search for the smallest total lateness with backtracking,
pruning the search by the minimum lateness found so far.
def lateness(intervals, f, l, globalmin, globalans, ans, used):
    # f: current finish time, l: accumulated lateness
    if len(ans) == len(intervals):
        if l < globalmin[0]:
            globalmin[0] = l
            globalans[0] = ans[::]
        return
    for i, (t, d) in enumerate(intervals):
        if used[i]:
            continue
        used[i] = True
        f += t
        if f - d >= 0:
            l += (f - d)
        if l < globalmin[0]:  # prune: only recurse if still promising
            ans.append(i)
            lateness(intervals, f, l, globalmin, globalans, ans, used)
            ans.pop()
        if f - d >= 0:
            l -= (f - d)
        f -= t
        used[i] = False
    return

We call this function with the following code:

intervals = sorted(intervals, key=lambda x: (x[0]))
globalmin, globalans = [float('inf')], [[]]
ans = []
used = [False] * len(intervals)
lateness(intervals, 0, 0, globalmin, globalans, ans, used)
print(globalmin, globalans[0])
for i in globalans[0]:
    print(intervals[i], end=' ')
We will get the following output:

63 [1, 2, 0, 3, 4, 5, 6]
[4, 9] [5, 9] [2, 15] [7, 9] [9, 29] [16, 23] [36, 45]

We can see that no single sorting rule—not by d, not by t, and not by the
slack time d − t—lets us solve this variant greedily in polynomial time.
However, can we use dynamic programming?
Dynamic Programming We can first do a simple experiment: take [2, 15] out of
our intervals, and the resulting optimal solution keeps the same relative
order: [4, 9] [5, 9] [7, 9] [9, 29] [16, 23] [36, 45]. This is a very good
indicator that dynamic programming might apply. We can keep taking intervals
out, and the optimal solution still keeps the same order, which hints at
optimal substructure.

Let us assume we have found the best order O for the subarray
intervals[0 : i]; now we have to prove that the best solution for subarray
intervals[0 : i + 1] can be obtained by inserting intervals[i] into O.
Assume the insertion position is j, so O[0 : j] is not affected at all; we
care about O[j : i]. We would have to prove that no matter where we insert
intervals[i], the ordering of O must remain unchanged for the solution to be
optimal. If the insertion position is at the end of O, the ordering indeed
does not need to change. For the other positions, however, this is where the
argument breaks down. Let the start time be s at positions j, j + 1. We
know:

l(s + tj − dj) + l(s + tj + tj+1 − dj+1) ≤ l(s + tj+1 − dj+1) + l(s + tj + tj+1 − dj)    (17.11)

Because l(c) = max(0, c) ∈ [0, c], we would need to prove:

l(s + ti + tj − dj) + l(s + ti + tj + tj+1 − dj+1) ≤ l(s + ti + tj+1 − dj+1) + l(s + ti + tj + tj+1 − dj)    (17.12)

We cannot prove it—and if we try this method anyway, it gives wrong answers.
So far, all our attempts to use a greedy algorithm on this variant have
failed miserably.

Note also what happens when there is a tie at the deadline: if we schedule
the meeting that takes the most time first, we end up with higher lateness.
For example, if our solution is [5, 5], [4, 6], [2, 6], the lateness is
(5 + 4 − 6) + (5 + 4 + 2 − 6) = 3 + 5 = 8 instead of 6.
What if we allow multiple resources: what is the least number of
conference rooms we need to schedule all meetings?

630. Course Schedule III: find the maximum number of
non-overlapping meetings that can be scheduled within one
resource.
Prove Kruskal's Algorithm Let the optimal minimum spanning tree be
O = (V, E∗), and the one generated by the greedy approach be G = (V, E).
|E∗| = |E| because both are trees, and the number of edges of a tree always
equals |V| − 1. Assume there is an edge e ∈ E∗ with e ∉ E; this means there
is another edge f ∈ E that differs from e, while all the other edges are the
same. For example, in the graph say e = (1, 5). With the constraint that
only one edge differs, f has to be one of the edges (2, 3), (3, 5): adding e
to G forms a cycle, so O cannot contain both (2, 3) and (3, 5) at the same
time, and the one it lacks is the f that was removed from O. It is always
true that cost(e) ≥ cost(f), because otherwise the greedy approach would
have chosen e instead of f.

Starting from the optimal solution, if we replace e with f, then
cost(G) = cost(O − e + f) ≤ cost(O). That is, with this swap of e and f
between G and O, the cost of the greedy tree is still at most the optimal
cost: transforming the optimal solution into the greedy solution does not
worsen it.
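As a complement to the proof, here is a minimal sketch of Kruskal's
algorithm itself, using a simple union-find; the function name and the
(cost, u, v) edge format are assumptions for illustration:

def kruskal(n, edges):
    # n: number of vertices labeled 0..n-1
    # edges: list of (cost, u, v); assumes the graph is connected
    parent = list(range(n))

    def find(x):  # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0
    for cost, u, v in sorted(edges):  # greedy: cheapest edge first
        ru, rv = find(u), find(v)
        if ru != rv:                  # skip edges that would form a cycle
            parent[ru] = rv
            total += cost
    return total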
17.4 Design Greedy Algorithm

We have seen greedy algorithm design, its definition, different examples of
greedy approaches, and their proofs. One obvious sign that a greedy approach
might apply is: "sorting does not hurt the correctness of the optimal
solution, but rather greatly simplifies the design". Generally, people
design a greedy algorithm by trying out ordering rules against the
objective, hoping to find one good enough, and then proving its correctness.
This approach is simple but fuzzy. Thus, we prefer a more systematic design
process:

1. Search: analyze the problem with search and combinatorics—no
   implementation needed—to know the complexity of the naive search.

2. Dynamic programming: design a dynamic programming approach first, by
   defining a state and constructing a recurrence relation, repeatedly,
   until we find one that works well. This step brings us closer to the
   greedy approach: it gives us the definition of the state, the recurrence
   relation, and a polynomial time complexity.

3. Greedy: then check whether the greedy-choice property holds—between a
   subproblem pi and its succeeding subproblem pi+1, whether the optimal
   solution within pi is also part of the optimal solution within pi+1, or
   whether we can construct an optimal solution from the previous optimal
   solution without checking multiple subproblems. If it holds, great: the
   previous dynamic programming becomes an overkill, and we can improve the
   approach by simplifying the recurrence relation into a "rule", saving
   time and/or space. To derive a good rule, we have to study a bunch of
   "facts" about the problem, then strengthen our choice by asking: does the
   greedy partial solution always stay ahead and remain the most promising
   optimal solution at each step? If not, try exchanging some items within
   the previous partial solution and see whether that keeps us staying
   ahead.

We will solve the following classical problems with this systematic
approach.
17.5 Classical Problems

In this section, we apply the approach to several classical greedy problems:
scheduling, partition, and file merging.
17.5.1 Scheduling
We have seen two scheduling problems. Scheduling is a time-based problem
that naturally follows a leftmost-to-rightmost order along the timeline, and
it is about assigning tasks/meetings to the allowed resources. We need to
pay attention to the following aspects of the context:

• Do we have to assign all intervals, or just select a maximum subset of
  them? This relates to the number of resources that are available.

• What are the conditions? Are both start and end times fixed, or are they
  flexible, bounded only by an earliest possible start time and a latest end
  time?

The core principle is to answer these questions:

• Start and end time:

  – Are both fixed? If yes, then the intervals themselves are the state—no
    change at all—and the orderings by start and by end time are both
    natural. If the question gives only one resource and we need the maximum
    set of non-overlapping intervals, that is a piece of cake: we follow the
    order and check whether each interval is compatible with the previously
    chosen one. If it asks for the minimum number of resources needed, that
    is the depth of the set; we have discussed why assigning a meeting to
    any preceding freed meeting room does not affect the number of free
    rooms for the next meeting.

  – If they are not fixed, we are given either ti and di for each
    interval—we can start at any time si and finish at si + ti, but there is
    a deadline di that had better be met—or we are given bi, ti, di: we can
    start at any si with si ≥ bi and must end with si + ti ≤ di (this second
    variant is less common). The fundamental rule is: Earliest Deadline
    First (see the sketch after this list). We have proved that if there is
    an inversion, swapping it only improves the objective value. This
    usually means the optimal solution has no inversion and no idle time on
    the resource. Whenever we meet a tie at the deadlines, the total
    lateness is usually the same no matter the order of the intervals with
    equal deadlines, which can also be shown with the inversion argument.
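For the relaxed model [ti, di] with a single resource, Earliest Deadline
First is the classic rule for minimizing the maximum lateness: schedule the
jobs back to back in increasing order of deadline. A minimal sketch (the
function name is a hypothetical choice):

import sys

def min_max_lateness(jobs):
    # jobs: list of (t, d) pairs; returns the minimized maximum lateness
    jobs = sorted(jobs, key=lambda x: x[1])  # earliest deadline first
    finish, max_late = 0, -sys.maxsize
    for t, d in jobs:
        finish += t                          # no idle time on the resource
        max_late = max(max_late, finish - d)
    return max_late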
Scheduling all Intervals (L253. Meeting Rooms II) In our previous scheduling
problem, there was only a single resource to fill with non-overlapping
intervals. Scheduling all intervals, on the other hand, requires us to
schedule all the intervals with as few resources as possible. This problem
is also known as the interval partitioning problem or the interval
coloring problem, because our goal is to partition all intervals across
multiple resources, as if each resource were assigned a color.

Given an array of meeting time intervals consisting of start and end times
[[s1, e1], [s2, e2], ...] (si < ei), find the minimum number of conference
rooms required, and assign a label to each interval.

Example 1:
Input: [[2, 15, 'a'], [36, 45, 'b'], [9, 29, 'c'], [16, 23, 'd'], [4, 9, 'e']]
Output: 2

Figure 17.2: All intervals
Analysis The example is plotted in Fig. 17.2. A universal solution is to
treat each interval as a vertex and connect two intervals with an edge
whenever they overlap, forming a graph. The problem is then reduced to graph
coloring; however, that may overcomplicate things.

Find Minimum Number of Conference Rooms

First, let us solve the first question: what is the minimum number of
conference rooms required? By observation and intuition, if a number of
intervals overlap at the same time point, we have to assign each of these
intervals a different resource. Now, define the depth d as the maximum
number of overlapping intervals at any single point on the timeline. We
claim:
17.5.1 In the interval partitioning problem, the number of resources needed
is at least the depth d of the set of intervals.

Before we head off to the proof, let's discuss how to find the depth.
According to the definition, if we are lucky enough that the times are
integers, the most straightforward way is the sweep line method, with a
counter tracking the number of intervals alive at each integer time moment.
We sweep a vertical line from the leftmost to the rightmost interval. This
follows a natural order: when the earliest meeting starts, we have to assign
a room to it no matter what (a start contributes +1), and we can reuse a
previously assigned meeting room only when one is freed (an end contributes
−1). We need to watch out for the edge case where the finish time of one
interval equals the start time of another, such as [4, 9], [9, 29]: this is
not counted as an overlap. Therefore we make sure to exclude the finish time
when scanning, treating each interval as the half-open range [s, e).

Figure 17.3: All intervals sorted by start and end time.
However, this process can be simplified. While scanning, the count only
changes when we encounter a start or a finish time point. First, at the
start time of a we have to assign one room; then at the start time of e we
have to assign a second room; at the end of e we free that room, and at the
start time of c the second room is reused right away. Later, at the end of
a, room 1 is released, so d starts reusing the first room right away. We can
simply assign +1 to each start time and −1 to each finish time, put all of
these points into one list, and sort them by time. To handle the edge case—a
tie where a start and an end time are equal—a second sorting key is used to
put −1 in front of +1, avoiding overcounting the rooms. This process is
shown in Fig. 17.3.
def minMeetingRooms(intervals):
    if not intervals:
        return 0
    points = []
    for s, e in intervals:
        points.append((s, +1))
        points.append((e, -1))
    # sort by time; on ties, ends (-1) come before starts (+1)
    points = sorted(points, key=lambda x: (x[0], x[1]))
    ans = 0
    total = 0
    for _, flag in points:
        total += flag
        ans = max(ans, total)
    return ans
Label Assignment
We can modify the previous code to incorporate label assignment. We separate
the start and end times into two independent sorted lists, because a room is
assigned only when we meet a start time:

starts: 2(0)  4(4)  9(2)  16(3)  36(1)
ends:   9(4) 15(0) 23(3) 29(2) 45(1)

Each entry reads time(interval index). We put two pointers, sp and ep, at
the beginnings of the start-time list and end-time list respectively. We
need zero rooms at first. For the start pointer at 2, we assign room 1 to
interval 0 because 2 < 9: no room has been freed for reuse. Then sp moves to
4; since 4 < 9, no room is freed, and we assign room 2 to interval 4. With
sp at 9, we have 9 ≥ 9, meaning we can reuse the room that belonged to
interval 4, so interval 2 takes room 2. Now we move both sp and ep and
compare 16 > 15: interval 3 can reuse the room that belonged to interval 0,
so interval 3 gets room 1. Next we compare 36 > 23: interval 1 takes room 1
from interval 3. Since one of the pointers has reached the end of its list,
the process ends.
def minMeetingRooms(intervals):
    starts, ends = [], []
    for i, (s, e) in enumerate(intervals):
        starts.append((s, i))
        ends.append((e, i))
    starts.sort(key=lambda x: x[0])
    ends.sort(key=lambda x: x[0])
    n = len(intervals)
    rooms = [0] * n
    sp, ep = 0, 0
    label = 0
    while sp < n:
        index = starts[sp][1]
        if starts[sp][0] < ends[ep][0]:
            # assign a new room
            rooms[index] = label
            label += 1
        else:
            # reuse the room of the meeting that just ended
            room_of_end = rooms[ends[ep][1]]
            rooms[index] = room_of_end
            ep += 1
        sp += 1
    print(rooms)
    return label
The above method is natural, but it is indeed greedy! We sort the intervals
by start time; in the worst case, we assign each meeting a different room.
However, the number of rooms can be reduced whenever we can reuse a
previously assigned room that is free at the moment. The depth is controlled
by the timeline. For example, for interval c, if both a's and e's rooms are
free at that moment, does it matter which room c is put in? Nope. No matter
which room c takes, interval d overlaps with c and thus cannot use c's room,
but there is still the room left by either a or e. This is what makes this
problem essentially different from the maximum non-overlapping interval
problem. The greedy part is that we always reassign the room that becomes
available the earliest. A non-greedy and naive way is to check all preceding
meeting rooms and find any available one.

The "Have To" Property Did you notice that, for resource assignment, we have
little choice: we have to assign each meeting a room. The only choice is
which room, and we greedily reuse whenever we can. All the solutions—whether
they reassign the earliest finished meeting room or an arbitrary available
one—do it for a single purpose: reduce the number of resources whenever
possible.

An easy way to understand this problem is to notice that for each meeting we
HAVE TO assign a room. The worst case is to assign a different room to every
single meeting, which does not even require sorting the intervals. Well, how
can we optimize it and minimize the number of rooms? We have to reuse a room
whenever possible. Therefore, we sort the meetings by start time. The first
meeting leaves no choice: we must assign it a room. For each following
meeting, we have two options: either assign a new room, or reuse one that is
available now.
• If I choose to reuse a room, does it influence my optimal solution later
  on? No. If we reuse a room, we decrease the total number of rooms by one,
  and it does not affect the rooms available for the next meeting. It's
  like: here is a candy, take it, and it won't affect your chance of having
  candy later at all! Of course we take it.

• Does it matter which room we reuse? Nope. Why? Because the smallest number
  of rooms needed is decided by how many meetings collide at a single time
  point. No matter which available room we put this meeting in, the set of
  rooms available for the following meetings is always the same: any rooms
  freed by preceding meetings.

When we scan from the leftmost interval to the rightmost by start time, any
freed room is equally good. This is why there are so many different working
approaches: iterate over preceding meetings and find any one that is
available; keep a min-heap and always reuse the room with the earliest end
time; or, as in the second solution above, reassign the room of the meeting
that ends earliest. This optimization process is natural—and GREEDY!
Proof We have proved this informally already. The greedy procedure always
ends with a compatible/feasible solution: in each meeting room, no two
overlapping meetings are scheduled.

17.5.2 Using the greedy algorithm above, we can schedule every interval
with d resources, which is optimal.

We know from Claim 17.5.1 that at least d resources are needed; it remains
to prove that the greedy algorithm indeed schedules all the intervals with
d resources.
Organize Second, how do we assign a label to each meeting? Actually, it is
not necessary to know the minimum number of conference rooms d in advance to
assign the labels; we can obtain d afterwards by counting the total number
of distinct labels. Now, back to being greedy: it might be tempting at first
to follow the non-overlapping scheduling problems. We sort the intervals by
finish time, with the intuitive strategy for assigning labels: go through
the intervals in order and assign each interval a label that differs from
the labels of all previous overlapping intervals. The code is:
def colorInterval(intervals):
    intervals = sorted(intervals, key=lambda x: x[1])
    labels = []  # labels of the sorted intervals
    n = len(intervals)
    for i, (s, e) in enumerate(intervals):
        excluded = []
        for j in range(i):
            if s < intervals[j][1]:  # overlap
                excluded.append(labels[j])
        # assign the smallest label not excluded
        for l in range(n):
            if l not in excluded:
                labels.append(l)
                break
    return len(set(labels))
Figure 17.4: Left: sort by start time, Right: sort by finish time.

Unfortunately, this gives us the wrong answer: it uses three resources
instead of the 2 we proved before. The sorting and the answer are plotted in
Fig. 17.4. What went wrong? In the non-overlapping interval scheduling
problem, what matters is the maximum number of intervals we can fit into one
resource, and sorting by finish time guarantees we fit as many intervals as
possible. In this problem, however, we try to use as few resources as
possible, so we want each resource to be packed as tightly as possible.
Therefore, the right way is to sort by start time. If you insist on sorting
by finish time, you will get the right answer only by traversing the
intervals in reversed order.

This type of assignment takes O(n²) time. We can easily do better with the
same greedy strategy.
Optimizations
We keep a list rooms, which starts empty; for each room, we only keep its
end time. After sorting the intervals by start time, we go through each
interval and try to put it into an existing room: if it does not overlap
with the room's last meeting, we put the interval in that room and update
the room's end time. If no available room is found, we open a new room
instead. With this strategy, we end up with O(nd) time. When d is small,
this saves a lot of time.
def minMeetingRooms(intervals):
    intervals = sorted(intervals, key=lambda x: x[0])
    rooms = []  # a list that tracks the end time of each room
    for s, e in intervals:
        bFound = False
        for i, re in enumerate(rooms):
            if s >= re:      # this room is free: reuse it
                rooms[i] = e
                bFound = True
                break
        if not bFound:       # no free room: open a new one
            rooms.append(e)
    return len(rooms)
Priority Queue Is there a way to fully get rid of the factor d? In the code
above we loop over all rooms checking availability, but we do not care which
room is free—we just need one! So instead we replace rooms with a priority
queue backed by a min-heap, making sure each time we only check the room
with the earliest end time: if it does not overlap, we put the meeting into
that room and update its finish time; otherwise we open a new room. This
brings the total time to O(n log n).
import heapq

def minMeetingRooms(intervals):
    intervals = sorted(intervals, key=lambda x: x[0])
    rooms = []  # a min-heap of the end times of the rooms in use
    for s, e in intervals:
        # only check the room that frees up the earliest
        if rooms and rooms[0] <= s:
            heapq.heappop(rooms)  # reuse that room
        heapq.heappush(rooms, e)
    return len(rooms)
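A quick check on the earlier example (labels dropped, since only the count
matters here):

print(minMeetingRooms([[2, 15], [36, 45], [9, 29], [16, 23], [4, 9]]))  # 2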
17.5.2 Partition
763. Partition Labels A string S of lowercase letters is given. We want to
partition this string into as many parts as possible so that each letter
appears in at most one part, and return a list of integers representing the
sizes of these parts.

Example 1:

Input: S = "ababcbacadefegdehijhklij"
Output: [9, 7, 8]
Explanation:
The partition is "ababcbaca", "defegde", "hijhklij".
This is a partition such that each letter appears in at most one part.
A partition like "ababcbacadefegde", "hijhklij" is incorrect,
because it splits S into fewer parts.

Take a smaller example, "abaefegdehi", whose maximal partition is:
{aba}, {efegde}, {h}, {i}
Analysis We know we can use the partition type of dynamic programming to
find the maximum number of parts. Define d[i] as the maximum number of parts
for the prefix s[0 : i]; then:

d[i] = max(d[j] + 1),  j < i,  where s[j : i] is an independent part;  d[0] = 0.    (17.13)

The answer is d[n].
This gives us a solution in O(n²). Not bad! In the dynamic programming
solution, when we are solving subproblem "abaefegdeh", we check all previous
subproblems' solutions. However, if we observe carefully, we only need the
optimal solution of the immediately preceding subproblem "abaefegde"—the
parts "aba", "efegde"—to figure out the optimal solution of the current one:
simply check whether 'h' occurs in any part of the preceding optimal
solution, and merge all parts from the earliest part containing 'h' up to
the last. Now observe how the parts evolve between subproblems, for example
up to "abaefegdehi":

a,                 o = {a}
ab,                o = {a}, {b}
aba,    merge,     o = {a, b}
abae,              o = {a, b}, {e}
abaef,             o = {a, b}, {e}, {f}
abaefe, e exists,  merge, o = {a, b}, {e, f}
abaefeg,           o = {a, b}, {e, f}, {g}
abaefegd,          o = {a, b}, {e, f}, {g}, {d}
abaefegde, e exists, merge, o = {a, b}, {e, f, g, d}
abaefegdeh,        o = {a, b}, {e, f, g, d}, {h}
abaefegdehi,       o = {a, b}, {e, f, g, d}, {h}, {i}
This is already, in essence, a greedy algorithm. We no longer blindly track
every subproblem state; we explicitly work on a single partial optimal
solution and incrementally grow it until it is globally optimal. Moreover,
tracking these part sets explicitly is unnecessary work: we can use a dict
to record the last location at which each character is seen in the string.
We then loop through the characters one by one, tracking the current part's
start and end locations in the string. Whenever a new character arrives, it
either extends the current part, falls within the current range, or marks
the end of the range, and a different kind of processing applies in each
case. The Python code shows the details of this greedy approach:
from collections import defaultdict
from typing import List

class Solution:
    def partitionLabels(self, S: str) -> List[int]:
        n = len(S)
        loc = defaultdict(int)
        for i, c in enumerate(S):
            loc[c] = i  # the last location of each char
        last_loc = -1   # the farthest reach of the current part
        prev_loc = -1   # the end of the previous part
        ans = []
        for i, c in enumerate(S):
            last_loc = max(last_loc, loc[c])
            if i == last_loc:  # the current part closes here
                ans.append(last_loc - prev_loc)
                prev_loc = last_loc
        return ans

With the greedy approach, we further decreased the complexity to O(n).
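A quick run on the original example, using the class above:

print(Solution().partitionLabels("ababcbacadefegdehijhklij"))  # [9, 7, 8]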
17.5.3 Data Compression, File Merge

File Merge Given an array F of size n, where each item is the length of the
i-th file, find the best ordering in which to merge all files one by one.

For example, F = [10, 5, 100, 50, 20, 15]

At first sight this is an exponential problem, because we might need to try
all permutations of the files. Merging the first two files costs 10 + 5, and
merging the result with 100 costs 10 + 5 + 100, and so on. The cost of a
given order is:

c = (F0 + F1) + (F0 + F1 + F2) + ... + (F0 + F1 + ... + Fn−1)    (17.17)
  = (n − 1)F0 + (n − 1)F1 + (n − 2)F2 + ... + Fn−1    (17.18)

and we minimize c over all orderings. From this objective, because all file
sizes are positive, the files with the largest coefficients must be the
smallest: F0 and F1 should be the two smallest items, F2 the next smallest,
and so on. Sorting the files in increasing order of size and merging them in
that order therefore yields the least merging cost. This is a very simple
and natural greedy approach.
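Under this sequential-merge model, computing the minimum cost takes only a
few lines. A sketch with a hypothetical function name (note this sequential
strategy is distinct from the optimal two-way merge that repeatedly merges
the two currently smallest files):

def sequential_merge_cost(files):
    files = sorted(files)   # greedy: smallest files first
    total, acc = 0, 0
    for i, f in enumerate(files):
        acc += f            # acc is the size of the merged file so far
        if i > 0:           # each merge after the first file costs acc
            total += acc
    return total

print(sequential_merge_cost([10, 5, 100, 50, 20, 15]))  # 395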
Data Compression

17.5.4 Fractional S

17.5.5 Graph Algorithms

17.6 Exercises
• 630. Course Schedule III (hard)

18

Hands-on Algorithmic Problem Solving
The purpose of this chapter is to see how the algorithm design principles we
have learned can be applied to problem solving. We approach one problem with
different design principles, step by step, and watch how the time and space
complexity changes along the way.

Longest Increasing Subsequence (300L) Given an unsorted array of integers,
find the length of the longest increasing subsequence.

Example:

Input: [10, 9, 2, 5, 3, 7, 101, 18]
Output: 4
Explanation: The longest increasing subsequence is [2, 3, 7, 101],
therefore the length is 4.
18.1 Direct Approach

18.1.1 Search in Graph
In a subsequence, each item has only two choices: either in or out of the
resulting subsequence, which makes the total number of subsequences O(2^n).
Since a searching process always forms a search tree, let us generate the
subsequences starting from []. At the first level we have item 10, with two
actions: not adding it, or adding it, which makes two branches. At the
second level, we consider item 9. This makes the search space a binary tree,
and we generate nodes implicitly, since we only need to track the length of
the path so far and the value of the last item to decide the children of the
current level. We have thereby modeled our problem as finding the longest
path in the search tree, a binary tree of height n. The Python code is:
import sys

def lengthOfLIS(self, nums: List[int]) -> int:
    def dfs(nums, idx, cur_len, last_num, ans):
        if idx >= len(nums):
            ans[0] = max(ans[0], cur_len)
            return
        if nums[idx] > last_num:  # branch: take nums[idx]
            dfs(nums, idx + 1, cur_len + 1, nums[idx], ans)
        dfs(nums, idx + 1, cur_len, last_num, ans)  # branch: skip it
    ans = [0]
    last_num = -sys.maxsize
    dfs(nums, 0, 0, last_num, ans)
    return ans[0]
18.1.2 Self-Reduction
Now, let us use a smaller example, say [2, 5, 3, 7], which has an LIS of
length 3: [2, 3, 7]. Let us treat each node not as an atomic state but as a
subproblem. It is the same tree, but we interpret each node differently. We
consider the problem top down: we have problem [2, 5, 3, 7] and a start
index = 0, meaning we start from item 2. The problem can be divided into
two situations:

• Not take 2: we find the LIS length of subproblem [5, 3, 7]. In this case,
  our subsequence can start from any of these 3 items; we indicate this
  case by not changing the previous value. Using idx to indicate the
  subproblem/subarray, we call dfs with idx + 1.

• Take 2: we need to find the LIS length of subproblem [5, 3, 7] whose
  chosen items must all be greater than 2. Thus, we set last_num to 2
  (that is, nums[idx]) in the recursive call.

Therefore, our code becomes:
import sys

def lengthOfLIS(self, nums: List[int]) -> int:
    def dfs(nums, idx, last_num):
        if idx >= len(nums):
            return 0
        len1 = 0
        if nums[idx] > last_num:  # take nums[idx]
            len1 = 1 + dfs(nums, idx + 1, nums[idx])
        len2 = dfs(nums, idx + 1, last_num)  # skip nums[idx]
        return max(len1, len2)

    last_num = -sys.maxsize
    return dfs(nums, 0, last_num)

In this solution the time complexity has not improved yet, but from this
formulation we can further increase the efficiency with dynamic programming.
18.1.3 Dynamic Programming

Memoization We have seen that the recurrence relation is
LIS(i, prev) = max(LIS(i + 1, prev), LIS(i + 1, nums[i])). How many possible
states can LIS(i, prev) take? i ∈ [0, n − 1], and prev has at most n
candidates too, which makes the whole state space only O(n²). Yet with the
plain depth-first tree search we revisited states multiple times, which blew
the time complexity up to O(2^n). Now, let us modify the approach and use
memo, a dictionary keyed by the tuple (idx, last_num). If the state has not
been computed, we compute it as in the previous implementation; if it
already exists in the memo, we directly return the stored value and avoid
recomputing:
import sys

def lengthOfLIS(self, nums: List[int]) -> int:
    def dfs(nums, idx, last_num, memo):
        if idx >= len(nums):
            return 0
        if (idx, last_num) not in memo:
            len1 = 0
            if nums[idx] > last_num:
                len1 = 1 + dfs(nums, idx + 1, nums[idx], memo)
            len2 = dfs(nums, idx + 1, last_num, memo)
            memo[(idx, last_num)] = max(len1, len2)
        return memo[(idx, last_num)]

    last_num = -sys.maxsize
    memo = {}
    return dfs(nums, 0, last_num, memo)
18.2 A to B
Another approach is to use the concept of "prefix" or "suffix". The LIS must
start from one of the items in the array, so finding the length of the LIS
of the original array can be achieved by comparing n subproblems—the LIS
length of:

[2, 5, 3, 7], LIS starts at 2
[5, 3, 7],    LIS starts at 5
[3, 7],       LIS starts at 3
[7],          LIS starts at 7
18.2.1 Self-Reduction
We model the problem as in Fig. 18.1. Here our problem becomes finding the
longest path in an N-ary tree instead of a binary tree. Define f(i) as the
length of the LIS starting with index i in the array; then its relation to
the other states is f(i) = maxⱼ(f(j)) + 1 for j > i, a[j] > a[i], with
f(n) = 0. The base case is when there is no element left to start from,
which gives an LIS of length 0.

Figure 18.1: Graph model for LIS; each path represents a possible solution.
import sys

def lengthOfLIS(self, nums: List[int]) -> int:
    def dfs(nums, idx, cur_num):
        max_len = 0
        # generate the next node: any later item larger than cur_num
        for i in range(idx + 1, len(nums)):
            if nums[i] > cur_num:
                max_len = max(max_len, 1 + dfs(nums, i, nums[i]))
        return max_len
    return dfs(nums, -1, -sys.maxsize)
18.2.2 Dynamic Programming

Memoization Similar to the last approach, we can write the code:

import sys

def lengthOfLIS(self, nums: List[int]) -> int:
    def dfs(nums, idx, cur_num, memo):
        if (idx, cur_num) not in memo:
            max_len = 0
            # generate the next node
            for i in range(idx + 1, len(nums)):
                if nums[i] > cur_num:
                    max_len = max(max_len, 1 + dfs(nums, i, nums[i], memo))
            memo[(idx, cur_num)] = max_len
        return memo[(idx, cur_num)]

    memo = {}
    return dfs(nums, -1, -sys.maxsize, memo)
Tabulation In the bottom-up manner, we need to tweak the above recurrence
function and the definition of the state. The subproblem f(i) is now defined
as the length of the LIS ending at index i. Note that with n elements there
are n + 1 states in total, because there is an empty state with the empty
array []. The recurrence function is shown in Eq. 18.1: the LIS ending at
index i is transitioned from the LIS ending at any compatible previous index
by adding one. The whole analysis process is illustrated in Fig. 18.2.

f(i) = 1 + max(f(j)),  −1 ≤ j < i < n,  arr[j] < arr[i];   f(−1) = 0.    (18.1)

To simplify the implementation, we insert a −∞ value at the beginning of the
array. We initialize dp[0] = 0, and the answer is max(dp). The time
complexity is O(n²), because we need two for loops: an outer loop over i and
an inner loop over j. The space complexity is O(n). The Python code is:

Figure 18.2: The solution to LIS.
import sys

def lis(a):
    # define the dp array; dp[i]: LIS length ending with index i - 1
    dp = [0] * (len(a) + 1)
    a = [-sys.maxsize] + a
    for i in range(len(a)):
        for j in range(i):
            if a[j] < a[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp)
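A quick sanity check on the original example:

print(lis([10, 9, 2, 5, 3, 7, 101, 18]))  # 4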
18.2.3 Divide and Conquer

We can speed this up further with binary search: the inner loop can be
replaced by a binary search costing O(log n), bringing the total to
O(n log n). Here the dp array stores, for each length, the smallest possible
tail value of an increasing subsequence of that length. Each time we binary
search for an insertion point: if it is at the end, the length grows;
otherwise we replace the existing tail with the smaller value. For example,
for [4, 10, 3, 8, 9], dp evolves as
[4] → [4, 10] → [3, 10] → [3, 8] → [3, 8, 9].
def lengthOfLIS(self, nums):
    """
    :type nums: List[int]
    :rtype: int
    """
    def binarySearch(arr, l, r, num):
        # the leftmost insertion point of num in arr[l:r]
        while l < r:
            mid = l + (r - l) // 2
            if num > arr[mid]:
                l = mid + 1
            elif num < arr[mid]:
                r = mid
            else:
                return mid
        return l

    if not nums:
        return 0
    dp = [0] * len(nums)  # dp[:length] holds the smallest tails
    length = 0
    for idx in range(len(nums)):
        pos = binarySearch(dp, 0, length, nums[idx])  # find insertion point
        dp[pos] = nums[idx]  # if not at the end, replace the larger tail
        if pos == length:
            length += 1
    return length
Part V

Classical Algorithms

In this part, we focus on application: solving a few families of classical
real-world problems, ranging from advanced search algorithms on linear data
structures and advanced graph algorithms to typical string pattern matching.
By studying and analyzing each problem's representative algorithm, in which
the fundamental algorithm design and analysis principles are leveraged, we
further reinforce our algorithmic problem solving skills.
19

Advanced Search on Linear Data Structures

Figure 19.1: Two pointer Technique

On linear data structures, or on an implicit linear state space, we can
search either for a particular targeted item or for a consecutive
substructure such as a subarray or a substring.
To find a single item in linear space, we can apply linear search in
general, or binary search with logarithmic cost if the data structure is
ordered/sorted. In this chapter, we introduce two-pointer techniques that
are commonly used to solve two types of problems:

1. Searching: to search for an item such as a median, a predefined
substructure, or a substructure that satisfies certain conditions, such as
finding the minimum length of a subarray whose sum equals a targeted value,
or finding a substructure that satisfies a string pattern.

2. Adjusting: to adjust the ordering or arrangement of items in the data
structure, such as removing duplicates from a sorted array.

As the name suggests, the two-pointer technique involves two pointers that
start and move with the following two patterns:

1. Equi-directional: both pointers start from the beginning of the array,
and usually one moves faster and the other slower. The sliding window
algorithm can be put into this category.

2. Opposite-directional: one pointer starts at the start position and,
conversely, the other pointer starts at the end. These two oppositely
placed pointers move toward each other and usually meet in the middle.

In the following sections, we detail the two-pointer technique, exemplified
with real interview questions.

19.1 Slow-Faster Pointers


Suppose we have two pointers, i and j, which may or may not start at the
start position of the linear data structure, but one moves slower (i) and
the other faster (j). The two pointers can delimit either a pair or a
subarray to solve related problems. For the case of a subarray, the
algorithm is called the sliding window algorithm. On the span of the array,
at most three potential subspaces exist: from the start index to i ([0, i]),
from i to j ([i, j]), and from j to the end index ([j, n]).
Even though the slow-faster pointers technique is rarely given a formal
introduction in books, it is widely used in algorithms. In sorting,
Lomuto's partition in quicksort uses slow-faster pointers to divide the
whole region into three parts according to the comparison result against
the pivot: a smaller-items region, a larger-items region, and the
unrestricted region. In string pattern matching, there is the fixed-size
sliding window and the variant we will introduce in this chapter.
In this section, we explain how two pointers work on two types of linear
data structures: array and linked list.

19.1.1 Array
Remove Duplicates from Sorted Array(L26)
Given a sorted array a = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4], remove the
duplicates in-place such that each element appears only once and return the
new length. Do not allocate extra space for another array; you must do this
by modifying the input array in-place with O(1) extra memory. In the given
example, there are in total 5 unique items, and 5 is returned.

Analysis We set both the slower pointer i and the faster pointer j at the
first item in the array. Recall that slow-fast pointers cut the space of
the sorted array into three parts; we can define them as:

1. unique items in region [0, i],

2. untouched items in region [i + 1, j],

3. and unprocessed items in region [j + 1, n).

In the process, we compare the items pointed to by the two pointers; once
these two items are not equal, we have found a new unique item. We copy
this unique item at the faster pointer to the position right next to the
slower pointer, and move the slow pointer forward by one position so that
duplicates of the copied value are skipped.
With our example, at first i = j = 0; region one has one item, which is
trivially unique, and region two has zero items. Part of the process is
illustrated as:

i  j  [0, i]     [i+1, j]      process
0  0  [0]        []            item 0 == 0, j+1=1
0  1  [0]        [0]           item 0 == 0, j+1=2
0  2  [0]        [0, 1]        item 0 != 1, i+1=1, copy 1 to index 1, j+1=3
1  3  [0, 1]     [1, 1]        item 1 == 1, j+1=4
1  4  [0, 1]     [1, 1, 1]     item 1 == 1, j+1=5
1  5  [0, 1]     [1, 1, 1, 2]  item 1 != 2, i+1=2, copy 2 to index 2, j+1=6
2  6  [0, 1, 2]  [1, 1, 2, 2]

The code is given as:


def removeDuplicates(nums) -> int:
    i, j = 0, 0
    while j < len(nums):
        if nums[i] != nums[j]:
            # Copy nums[j] to position i + 1
            i += 1
            nums[i] = nums[j]
        j += 1
    return i + 1

After calling the above function on our given example, array a becomes
[0, 1, 2, 3, 4, 2, 2, 3, 3, 4]. Check the source code for the whole
visualized process.

Minimum Size Subarray Sum(L209)

Given an array of n positive integers and a positive integer s, find the
minimal length of a contiguous subarray of which the sum >= s. If there
isn't one, return 0 instead.

Example:

Input: s = 7, nums = [1, 4, 1, 2, 4, 3]
Output: 2
Explanation: the subarray [4, 3] has the minimal length under the problem
constraint.
418 19. ADVANCED SEARCH ON LINEAR DATA STRUCTURES

Analysis In this problem, we need to secure a substructure, a subarray,
that not only satisfies a condition (sum >= s) but also has the minimal
length. Naively, we can enumerate all subarrays and search through them to
find the minimal length, which requires at least O(n^2) time complexity
using prefix sums.
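A minimal sketch of this brute-force idea could look as follows (the
function name is illustrative, not from the original listing):

def minSubArrayLenBF(s: int, nums) -> int:
    n = len(nums)
    # prefix[i] stores the sum of nums[:i]
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i + 1] = prefix[i] + nums[i]
    ans = float('inf')
    for i in range(n):
        for j in range(i, n):
            if prefix[j + 1] - prefix[i] >= s:
                ans = min(ans, j - i + 1)
                break  # any longer subarray starting at i only adds length
    return ans if ans < float('inf') else 0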

However, we can use two pointers i and j (i <= j), both pointing at the
first item. In this case, the two pointers define a subarray a[i : j + 1],
and we care about the region [i, j]. As we increase pointer j, we keep
adding positive items into the sum of the subarray, making the subarray sum
monotonically increasing. Oppositely, if we increase pointer i, we remove
positive items from the subarray, making its sum monotonically decreasing.
The detailed steps of the two-pointer technique in this case are:

1. Get the optimal subarray for all subproblems (subarrays) that start from
the current i, which is 0 at first. We accomplish this by forwarding
pointer j to include enough items until sum >= s, at which point we pause
and go to the next step. Let's assume pointer j stops at e0.

2. Get the optimal subarray for all subproblems (subarrays) that end with
the current j, which is e0 at the moment. We do this by forwarding pointer
i to shrink the window size until sum >= s no longer holds. Let's assume
pointer i stops at index s0. Now, we have found the optimal solution for
subproblems a[0 : i, 0 : j] (denoting subarrays with the start point in
range [0, i] and the end point in range [0, j]).

3. Now that i = s0 and j = e0, we repeat steps 1 and 2.

In our example, we first move j until j = 3, with a subarray sum of 8. Then
we move pointer i until i = 1, when the subarray sum drops below 7. For the
subarray [1, 4, 1, 2], we have found its optimal solution with length 3.
The Python code is given as:
def minSubArrayLen(s: int, nums) -> int:
    i, j = 0, 0
    acc = 0
    ans = float('inf')
    while j < len(nums):
        acc += nums[j]
        # Shrink the window
        while acc >= s:
            ans = min(ans, j - i + 1)
            acc -= nums[i]
            i += 1
        j += 1

    return ans if ans < float('inf') else 0
Because both pointers i and j move at most n steps, the total number of
operations is at most 2n, making the time complexity O(n). The above
question would be trivial if the maximum subarray length were asked.

19.1.2 Minimum Window Substring (L76, hard)


Given a string S and a string T, find all the minimum windows in S which
contain all the characters in T, in O(n) complexity.

Example:
Input: S = "AOBECDBANC", T = "ABC"
Output: ["CDBA", "BANC"]

Figure 19.2: The data structures used to track the state of the window.

Analysis Applying two pointers, the region between pointers i and j is our
candidate substring. For this problem, the condition for the window [i, j]
is that it at least has all the characters from T. The intuition is that we
keep expanding the window by moving j forward until all characters in T are
found. Afterwards, we contract the window so that we can find the minimum
window with the condition satisfied. Instead of using another data
structure to track the state of the current window, we can depict the
pattern T as a dictionary, where the unique characters comprise the keys
and the number of occurrences of each character is the value. We use
another variable count to track the number of unique characters. Together,
they track the state of the moving window [i, j]: the values of the
dictionary indicate how many occurrences of each character the window is
still short of, and count represents how many unique characters are not yet
fully found. We depict the state in Fig. 19.2.
Along with the expanding and shrinking of the window that comes with the
movement of pointers i and j, we track the state as follows:
• When forwarding j, we encompass S[j] in the window. If S[j] is a key in
the dictionary, decrease its value by one. Further, if the value reaches
the threshold 0, we decrease count by one, meaning we are short of one less
character in the window.

Figure 19.3: The partial process of applying two pointers. The grey shaded
arrow indicates the pointer that is on the move.

• Once count = 0, our window satisfies the condition for contracting. We
then forward i, removing S[i] from the window: if it is an existing key in
the dictionary, we increase that key's value, meaning the window is short
of one more character. Once the value reaches the threshold of 1, we
increase count.

Part of this process with our example is shown in Fig. 19.3. The Python
code is given as:
from collections import Counter

def minWindow(s, t):
    dict_t = Counter(t)
    count = len(dict_t)
    i, j = 0, 0
    ans = []
    minLen = float('inf')
    while j < len(s):
        c = s[j]
        if c in dict_t:
            dict_t[c] -= 1
            if dict_t[c] == 0:
                count -= 1
        # Shrink the window
        while count == 0 and i < j:
            curLen = j - i + 1
            if curLen < minLen:
                minLen = j - i + 1
                ans = [s[i:j + 1]]
            elif curLen == minLen:
                ans.append(s[i:j + 1])

            c = s[i]
            if c in dict_t:
                dict_t[c] += 1
                if dict_t[c] == 1:
                    count += 1
            i += 1

        j += 1
    return ans
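A quick usage check on the example above (an illustrative call):

print(minWindow("AOBECDBANC", "ABC"))  # ['CDBA', 'BANC']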

19.1.3 When Two Pointers Do Not Work

Two pointers do not always work on subarray-related problems.

What happens if there exist negative numbers in the array?

Since the sum of the subarray is no longer monotonically increasing with
the number of items between the two pointers, we cannot figure out how to
move the two pointers at each step. Instead, (1) we can use prefix sums,
organize them in order, and use binary search to find all possible start
indexes, or (2) use a monotone stack (see LeetCode problem 325. Maximum
Size Subarray Sum Equals k (hard)).

What if we are asked to check the maximum average subarray?

See 644. Maximum Average Subarray II (hard). Similarly, the average of a
subarray does not follow a certain order with the movement of the two
pointers at each side, making it impossible to decide how to move them.

19.1.4 Linked List



Middle of the Linked List(L876)


Given a non-empty, singly linked list with head node head, return a middle
node of the linked list. When the linked list is of odd length, there
exists one and only one middle node; when it is of even length, two exist
and we return the second middle node.

Example 1 (odd length):

Input: [1, 2, 3, 4, 5]
Output: Node 3 from this list (Serialization: [3, 4, 5])

Example 2 (even length):

Input: [1, 2, 3, 4, 5, 6]
Output: Node 4 from this list (Serialization: [4, 5, 6])

Analysis If the data structure were an array, we could compute the position
of the middle item simply from the total length. Following this method with
a single pointer, we first iterate over the whole linked list in O(n) time
to get the length, then do another iteration to reach the middle node;
n + n/2 operations are needed, making the time complexity O(n).
However, we can apply two pointers simultaneously at the head node, each
moving at a different pace: the slow pointer moves one step at a time while
the fast one moves two steps. When the fast pointer reaches the end, the
slow pointer stops at the middle. This slow-faster pointer technique
requires only n/2 iterations, three times fewer operations than the naive
method, although the big-O time complexity still remains O(n).

Figure 19.4: Slow-fast pointer to find middle

Implementation We illustrate the process of running the two-pointer
technique on our two examples in Fig. 19.4. As we can see, when the slow
pointer reaches item 3, the fast pointer is at item 5, which is the last
item in the first example, the one of odd length. Further, when the slow
pointer reaches item 4, the fast pointer reaches the empty node after the
last item in the second example, the one of even length. Therefore, in the
implementation, we check two conditions in the while loop:

1. For example 1: if the fast pointer has no successor (fast.next == None),
the loop terminates.

2. For example 2: if the fast pointer is invalid (fast == None), the loop
terminates.

The Python code is as:


def middleNode(head):
    slow = fast = head
    while fast and fast.next:
        fast = fast.next.next
        slow = slow.next
    return slow

Floyd’s Cycle Detection (Floyd’s Tortoise and Hare)

Figure 19.5: Circular Linked List

When a linked list has a cycle, as shown in Fig. 19.5, iterating over the
list makes the program stuck in an infinite loop: the pointer starts from
the head, traverses to the start of the loop, then comes back to the start
of the loop again and continues this process endlessly. To avoid being
stuck in such a "trap", we have to solve the following three problems:

1. Check if there exists a cycle.

2. Check where the cycle starts.

3. Remove the cycle once it is detected.

The solution uses exactly the same slow-faster pointer traversal of the
linked list as our last example. With the slow pointer iterating one item
at a time and the faster pointer moving at double pace, the two pointers
will definitely meet at one item in the loop. In our example, they will
meet at node 6. So, is it possible that they meet in the non-loop region
that starts from the head and ends at the start node of the loop? The
answer is no, because the faster pointer traverses the non-loop region only
once and is always ahead of the slow pointer, making it impossible for them
to meet in this region. This method is called Floyd's Cycle Detection, aka
Floyd's Tortoise and Hare. Let's see in more detail how to solve the three
problems above with this method.

Check Linked List Cycle(L141) Compared with the code in the last example,
we only need to check whether the slow and fast pointers point at the same
node: if they do, there must be a loop in the list and we return True;
otherwise we return False.
def hasCycle(head):
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow == fast:
            return True
    return False

Check Start Node of Linked List Cycle(L142) Given a linked list,


return the node where the cycle begins. If there is no cycle, return None.

Figure 19.6: Floyd’s Cycle finding Algorithm

For a given linked list, assume the slow and fast pointers meet at a node
somewhere in the cycle. As shown in Fig. 19.6, we denote three nodes: the
head (h), the start node of the cycle (s), and the meeting node in the
cycle (m). We denote the distance between h and s as x, the distance
between s and m as y, and the remaining distance from m back around to s as
z. Because the faster pointer traverses the list at double speed, when it
meets up with the slow pointer, the distance it has traveled (x + y + z + y)
is two times the distance traveled by the slow pointer (x + y):

x + y + z + y = 2(x + y)    (19.1)

x = z    (19.2)

From the above equation, we obtain the equality between x and z: x is the
distance of the cycle's start node from the head, y is the distance from
the start node to the meeting node of the slow and fast pointers, and z is
the remaining distance from the meeting point back to the start node.
Therefore, after we have detected the cycle as in the last example, we can
reset the slow pointer to the head of the linked list. Then we let the slow
and the fast pointer both traverse at the same pace, one node at a time,
until they meet at a node, where we stop the traversal. The node where they
stop is the start node of the cycle. The code is given as:
def detectCycle(head):
    slow = fast = head

    def getStartNode(slow, fast, head):
        # Reset the slow pointer to the head
        slow = head
        while fast and slow != fast:
            slow = slow.next
            fast = fast.next
        return slow

    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        # A cycle is detected
        if slow == fast:
            return getStartNode(slow, fast, head)

    return None

Remove Linked List Cycle We can remove the cycle by redirecting the last
node in the cycle, which in the example in Fig. 19.5 is node 6, to an empty
node. Therefore, we modify the above code to make the slow and fast
pointers stop at the last node instead of the start node of the loop. This
subroutine is implemented as:
def resetLastNode(slow, fast, head):
    slow = head
    while fast and slow.next != fast.next:
        slow = slow.next
        fast = fast.next
    fast.next = None
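For a complete picture, a minimal driver that combines the detection loop
with the reset subroutine could look like the following sketch (the name
removeCycle is chosen here for illustration):

def removeCycle(head):
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow == fast:  # a cycle is detected; cut it at the last node
            resetLastNode(slow, fast, head)
            break
    return head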

The complete code to remove a cycle is provided in Google Colab, together
with running examples.

What if the linked list has not just one, but multiple cycles?

19.2 Opposite-directional Pointers


Another variant of the two-pointer technique is to place the two pointers
oppositely: one at the beginning and the other at the end of the array.
Through the process, they move toward each other until they meet in the
middle. Details such as how much each pointer moves or which pointer moves
at each step are decided by the specific problem to solve. We just have to
make sure that when we apply this technique, we have considered the whole
state space and will not miss out some area, which would make the search
incomplete.
The simplest example of this two-pointer method is to reverse an array or a
string. For example, when the list a = [1, 2, 3, 4, 5] is reversed, it
becomes [5, 4, 3, 2, 1]. Of course we could simply allocate a new list and
copy the items over in reversed order. But with two pointers, we are able
to reverse it in place, using only n/2 swaps, through the following code:
def reverse(a):
    i, j = 0, len(a) - 1
    while i < j:
        # Swap items
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1

Moreover, binary search can be viewed as an example of opposite-directional
pointers. At first, the two pointers are at the first and the last item of
the array. Then, depending on which side of the middle item the target
falls, one of the pointers moves forward or backward to the middle point,
reducing the search space by half at each step. We now explore another
example of this technique.

Two Sum on Sorted Array(L167)


Given an array of integers that is already sorted in ascending order, find
two numbers such that they add up to a specific target number.

Input: numbers = [2, 7, 11, 15], target = 9
Output: [1, 2]
Explanation: The sum of 2 and 7 is 9; the answer uses 1-based positions, so
index1 = 1, index2 = 2.

Analysis If we simply enumerate all possible pairs, it takes O(n^2) to
solve this problem. However, the opposite-directional two pointers give
linear performance.
Denote the list as A = [a_1, a_2, ..., a_{n-1}, a_n]; for the sorted array
we have a_1 <= a_2 <= ... <= a_{n-1} <= a_n. The sum of any two items in
the array lies within two ranges: [a_1 + a_2, a_1 + a_n] and
[a_1 + a_n, a_{n-1} + a_n]. By placing one pointer i at a_1 and the other j
at a_n to start with, we get a_1 + a_n as the sum. Pointer i can only move
forward, accessing larger items; pointer j can only move backward,
accessing smaller items. There are three scenarios according to the
comparison between the target and the current sum of the two pointers:

1. If t == a[i] + a[j], the target sum is found.

2. If t > a[i] + a[j], we have to increase the sum; we can only do this by
moving pointer i forward.

3. If t < a[i] + a[j], we have to decrease the sum; we can only do this by
moving pointer j backward.

The Python code is as:


def twoSum(a, target):
    n = len(a)
    i, j = 0, n - 1
    while i < j:
        temp = a[i] + a[j]
        if temp == target:
            # L167 expects 1-based indices
            return [i + 1, j + 1]
        elif temp < target:
            i += 1
        else:
            j -= 1
    return []
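A quick usage check on the example above (an illustrative call):

print(twoSum([2, 7, 11, 15], 9))  # [1, 2]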

19.3 Follow Up: Three Pointers


Sometimes, manipulating two pointers is not enough to distinguish different
subspaces; we might need the assistance of one more pointer to make things
work.

Binary Subarrays With Sum (L930)


In an array A of 0s and 1s, how many non-empty subarrays have sum S?

Example 1:
Input: A = [1, 0, 1, 0, 1], S = 2
Output: 4
Explanation:
The 4 subarrays are listed below:
[1, 0, 1], index (0, 2)
[1, 0, 1, 0], index (0, 3)
[0, 1, 0, 1], index (1, 4)
[1, 0, 1], index (2, 4)

Analysis This problem is highly similar to the minimum-length subarray
problem we encountered before. We naturally start with two pointers i and
j, restricting the subarray in range [i, j] to satisfy the condition
sum <= S. The window is contracted when the condition is violated. We might
write the following code:
def numSubarraysWithSum(a, S):
    i, j = 0, 0
    win_sum = 0
    ans = 0
    while j < len(a):
        win_sum += a[j]
        while i < j and win_sum > S:
            win_sum -= a[i]
            i += 1
        if win_sum == S:
            ans += 1
            print('({}, {})'.format(i, j))
        j += 1
    return ans

However, the above code returns 3 instead of 4 as shown in the example. By
printing out pointers i and j, we can see that the above code misses case
(2, 4). Why? Because we are restricting the subarray sum in range [i, j] to
be smaller than or equal to S, while 0s might appear at the front or at the
rear of the subarray:

• In the process of expanding the subarray, pointer j is moved one step at
a time. Thus, even when 0s appear at the rear of the subarray, the counting
is correct.

• However, in the process of shrinking the subarray while the restriction
is violated (sum > S), we stop right away once sum <= S. In the code, we
end up counting this as only one occurrence. With 0s at the beginning of
the subarray, such as the subarray [0, 1, 0, 1] spanning indexes 1 to 4,
the count should be two instead of one.

The solution is to add another pointer i_h to handle the missed case: when
sum = S, count the total occurrences of 0 at the front. Compared with the
above solution, the code differs only slightly, with the additional pointer
and one extra while loop to deal with this case. We also need to make sure
that i_h stays smaller than j; otherwise, the while loop would fail on an
example with only zeros and a target sum of 0.

def numSubarraysWithSum(a, S):
    i, i_h, j = 0, 0, 0
    win_sum = 0
    ans = 0
    while j < len(a):
        win_sum += a[j]
        while i < j and win_sum > S:
            win_sum -= a[i]
            i += 1
        # Move i_h to count all zeros at the front
        i_h = i
        while i_h < j and win_sum == S and a[i_h] == 0:
            ans += 1
            i_h += 1

        if win_sum == S:
            ans += 1
        j += 1
    return ans
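Running the fixed version on the example confirms the count (an
illustrative call):

print(numSubarraysWithSum([1, 0, 1, 0, 1], 2))  # 4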

We notice that in this case we have to explicitly restrict i < j and
i_h < j due to this special case, while in all our previous examples we did
not have to.

19.4 Summary
Two pointers is a powerful tool for solving problems on linear data
structures, such as "certain" subarray and substring problems, as we have
shown in the examples. The "window" secluded between the two pointers can
be viewed as a sliding window: it slides forward as the slower pointer
advances. Two important properties are generally required for this
technique to work:

Figure 19.7: Sliding Window Property

1. Sliding window property: whether we move the faster pointer j forward by
one or move the slower pointer i, we can get the state of the current
window at O(1) cost, knowing the state of the last window.
For example, given an array, imagine that we have a fixed-size window as
shown in Fig. 19.7, and we slide it forward one position at a time,
computing the sum of each window. The brute-force solution would be of
O(kn) complexity, where k is the window size and n is the array size, using
two nested for loops: one to set the starting point, and the other to
compute the sum in O(k). However, the sum of the current window (Sc) can be
computed from the last window (Sl) and the items that just slid in and out,
aj and ai respectively: Sc = Sl - ai + aj. Getting the state of the window
between two pointers in O(1), as in this example, is what we call the
sliding window property (see the code sketch after this list).
Usually, an array with numerical values satisfies the sliding window
property if we are to compute its sum or product. For substrings, as shown
in our minimum window substring example, we can get the state of the
current window from the state of the last window in O(1) with the
assistance of a dictionary data structure. For substrings this is more
obscure, and the general requirement is that the state of the substring
does not relate to the order of the characters (anagram-like state).
2. Monotonicity: for subarray sums/products, the array should comprise all
positive (or all negative) values so that the prefix sum/product is
monotone: moving the faster pointer forward and moving the slower pointer
forward change the state in opposite directions. The same goes for the
substring problems: in the minimum window substring example, the change of
the state (count and the values of the dictionary) is monotone, each either
increasing or decreasing with the movement of the two pointers.
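As a concrete illustration of the sliding window property, here is a
minimal sketch (an illustrative helper, not a listing from the text above)
that computes every fixed-size window sum in O(n):

def window_sums(a, k):
    # state of the first window, computed once in O(k)
    s = sum(a[:k])
    sums = [s]
    for j in range(k, len(a)):
        # O(1) update: a[j] slides in, a[j - k] slides out
        s += a[j] - a[j - k]
        sums.append(s)
    return sums

print(window_sums([1, 2, 3, 4, 5], 3))  # [6, 9, 12]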

19.5 Exercises
1. 3. Longest Substring Without Repeating Characters
2. 674. Longest Continuous Increasing Subsequence (easy)
3. 438. Find All Anagrams in a String
4. 30. Substring with Concatenation of All Words
5. 159. Longest Substring with At Most Two Distinct Characters
6. 567. Permutation in String
7. 340. Longest Substring with At Most K Distinct Characters
8. 424. Longest Repeating Character Replacement
20

Advanced Graph Algorithms

Our standing at graph algorithms:

1. Search Strategies (past chapter)

2. Combinatorial Search (past chapter)

3. Advanced Graph Algorithms (current chapter)

4. Graph Problem Patterns (future chapter)

This chapter applies the basic search strategies and two advanced algorithm
design methodologies, Dynamic Programming and Greedy Algorithms, to a
variety of classical graph problems:

• Cycle Detection (Section 20.1), Topological Sort (Section 20.2), and
Connected Components (Section 20.3), which all require a thorough
understanding of the properties of basic graph search, especially
depth-first graph search.

• On the other hand, Minimum Spanning Tree (MST) and shortest path
algorithms entail our mastering of breadth-first graph search.

• Moreover, to achieve better efficiency, Dynamic Programming and Greedy
Algorithms have to be leveraged in the graph search process. For example,
the Bellman-Ford algorithm uses Dynamic Programming to avoid recomputing
intermediate paths while searching the shortest paths from a single source
to all other targets. The classical Prim's and Kruskal's MST algorithms
both demonstrate how a greedy algorithm can be applied, each in a different
way.


20.1 Cycle Detection

Figure 20.1: Undirected Cyclic Graph. (0, 1, 2, 0) is a cycle

Figure 20.2: Directed Cyclic Graph, (0, 1, 2, 0) is a cycle.

Problem Definition Detect cycles in both directed and undirected graphs.
Specifically, given a path with k + 1 vertices, denoted as v0, v1, ..., vk
in graph G:

1. When G is directed: a cycle is formed if v0 = vk and the path contains
at least one edge. For example, there is a cycle 0, 1, 2, 0 in the directed
graph of Fig. 20.2.

2. When G is undirected: the path forms a cycle only if v0 = vk and the
path length is at least three (i.e., there are at least three distinct
vertices within the path). For example, in the undirected graph of
Fig. 20.1, we cannot say (0, 2) is a cycle even though there is a path
0, 2, 0, but the path 0, 2, 1, 0 is a cycle, as the path length >= 3.

DFS to Solve Cycle Detection Recall the process of DFS graph search, where
a vertex has three possible states: white, gray, and black. A back edge
appears when, from the current vertex u, we reach an adjacent vertex v that
is in the gray state. Such an edge (u, v) connects u back to its ancestor
v, and we have found our cycle if the graph is directed. When the graph is
undirected, we have discussed that it has only tree edges and back edges.
Thus, we use only two states: visited and not visited. For an edge (u, v),
we check two conditions:

1. whether v is visited already. In Fig. 20.1, when we are at vertex 1, we
first visit 0.

2. avoiding cycles of length one, which would be any existing edge within
the graph. We can easily achieve this by tracking the predecessor p of the
exploring vertex during the search, and making sure the visited adjacent
vertex is not the predecessor: v != p.

Cycle Detection for Directed Graph We define a function hasCycleDirected
with g as the adjacency list of the graph, state as a list tracking the
state of each vertex, and s as the exploring vertex. The function returns a
boolean value indicating whether there is a cycle. The function is
essentially a DFS graph search along with an extra condition check on the
back edge.
def hasCycleDirected(g, s, state):
    state[s] = STATE.gray  # first visited
    for v in g[s]:
        if state[v] == STATE.white:
            if hasCycleDirected(g, v, state):
                print(f'Cycle found at node {v}.')
                return True
        elif state[v] == STATE.gray:  # a back edge
            print(f'Cycle starts at node {v}.')
            return True
        else:  # black: already fully explored
            pass
    state[s] = STATE.black  # mark it as complete
    return False
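The STATE used throughout these listings is defined earlier in the book's
source; a minimal stand-in definition could be:

from enum import Enum

class STATE(Enum):
    white = 0  # not yet discovered
    gray = 1   # discovered, still on the recursion stack
    black = 2  # fully explored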

Because a graph can be disconnected, with multiple components, we run
hasCycleDirected on each unvisited vertex of the graph in a main function.
def cycleDetectDirected(g):
    n = len(g)
    state = [STATE.white] * n
    for i in range(n):
        if state[i] == STATE.white:
            if hasCycleDirected(g, i, state):
                return True
    return False

Cycle Detection for Undirected Graph First, we add another variable p to
track the predecessor. p is initialized to -1, because the root of the
rooted search tree has no predecessor (or ancestor). We could use the
three-coloring states as we did for the directed graph, but that is a
slight overkill. In the implementation, we only use a boolean value to mark
each vertex's state:
def hasCycleUndirected(g, s, p, visited):
    visited[s] = True
    for v in g[s]:
        if not visited[v]:
            if hasCycleUndirected(g, v, s, visited):
                print(f'Cycle found at node {v}.')
                return True
        else:
            if v != p:  # a visited vertex other than the predecessor
                print(f'Cycle starts at node {v}.')
                return True
    return False

The main function:

def cycleDetectUndirected(g):
    n = len(g)
    visited = [False] * n
    for i in range(n):
        if not visited[i]:
            if hasCycleUndirected(g, i, -1, visited):
                print(f'Cycle found at start node {i}.')
                return True

    return False

Please check the source code to try out the examples.

How to find all cycles? First, we need to enumerate all paths while
searching in order to get all cycles. This requires us to retreat to the
less efficient search strategy: depth-first tree search. Second, for each
path, we find where the cycle starts by comparing each vi with the current
vertex u: in a directed graph, once vi == u, the cycle is
vi, vi+1, ..., vk, vi; in an undirected graph, the cycle is found only if
the length of vi, ..., vk is at least 3.

20.2 Topological Sort


Problem Definition In a given Directed Acyclic Graph (DAG) G = (V, E), a
topological sort/ordering is a linear ordering of the vertices V such that
for each edge e = (u, v) in E, u comes before v. If a vertex represents a
task to be completed and each directed edge denotes the order between two
tasks, then a topological sort is a way of linearly ordering a number of
tasks into a completable sequence.
Every DAG has at least one topological ordering. For example, a topological
ordering of Fig. 20.3 can be [0, 1, 3, (2, 4, 5), 6], where (2, 4, 5) can
be in any order, i.e., (2, 4, 5), (2, 5, 4), (4, 2, 5), (4, 5, 2),
(5, 2, 4), (5, 4, 2). A topological ordering is only possible if no cycle
exists in the graph. Thus, cycle detection should be applied first when we
are given a possibly cyclic graph.

Figure 20.3: DAG 1

Kahn’s algorithm (1962)


In a topological sort, the first vertex is always one with in-degree 0 (a
vertex with no incoming edges). A naive algorithm is to pick a first node
(with in-degree 0), add it to the resulting order S, and remove all
outgoing edges of this node. Repeat this process until:

• V - S is empty, i.e., |S| = |V|, which indicates that we found a valid
topological ordering.

• no node with in-degree 0 is found in the remaining graph
G' = (V - S, E'), where E' are the remaining edges of E after the removals,
i.e., |S| < |V|, indicating that a cycle exists in V - S and no valid
answer exists.

For example, with the digraph in Fig. 20.3, the process is:

S        Removed Edges
         0, 3 are the in-degree 0 nodes
Add 0    (0, 1)
         1, 3 are the current in-degree 0 nodes
Add 1    (1, 2)
         3 is the only in-degree 0 node
Add 3    (3, 2), (3, 4), (3, 5)
         2, 4, 5 are the in-degree 0 nodes
Add 2
Add 4
Add 5    (5, 6)
         6 is the only in-degree 0 node
Add 6
         V - S empty, stop

In this process, we see that at one point 2, 4, and 5 are all in-degree 0
nodes at the same time; that is why their orderings can be permuted,
resulting in multiple topological orderings.
In the implementation, instead of removing edges from the graph explicitly,
a better option is to track V - S with each vertex's in-degree: whenever an
in-degree 0 vertex u is added into S, for every v with u -> v, decrease the
in-degree of v by one. We also keep a queue Q of all the nodes with
in-degree zero. Whenever a vertex in V - S is detected with zero in-degree,
add it into Q. Accumulatively, the cost of decreasing the in-degrees of
vertices in V - S is |E|, as from start to end "all edges are removed". The
cost of removing vertices from V - S is |V|, as all nodes are removed at
the end. With the initialization of the in-degrees of the vertices in
V - S, we have a total of O(2|E| + |V|), i.e., O(|E| + |V|), as the time
complexity. Python code:
from collections import defaultdict
import heapq

def kahns_topo_sort(g):
    S = []
    # Step 1: count the in-degree of every vertex
    indegrees = defaultdict(int)
    for u in range(len(g)):
        indegrees[u] = 0
    for u in range(len(g)):
        for v in g[u]:
            indegrees[v] += 1
    print(f'initial indegree: {indegrees}')
    # V - S kept as a min-heap of (in-degree, node) pairs
    V_S = [(indegree, node) for node, indegree in indegrees.items()]
    heapq.heapify(V_S)

    # Step 2: Kahn's algorithm
    while len(V_S) > 0:
        indegree, first_node = V_S.pop(0)  # the heap root has the minimum in-degree
        if indegree != 0:  # cycle found, no topological ordering
            return None
        S.append(first_node)
        # "Remove" outgoing edges by decreasing the in-degrees of neighbors
        for v in g[first_node]:
            indegrees[v] -= 1
        # Refresh stale entries in V_S and restore the heap property
        for idx, (indegree, node) in enumerate(V_S):
            if indegree != indegrees[node]:
                V_S[idx] = (indegrees[node], node)
        heapq.heapify(V_S)
    return S

Calling the function on the graph in Fig. 20.3 gives the result:

initial indegree: defaultdict(<class 'int'>, {0: 0, 1: 1, 2: 2, 3: 0, 4: 1, 5: 1, 6: 1})
[0, 1, 3, 2, 4, 5, 6]
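The adjacency list used in this call can be reconstructed from the
edge-removal trace above (written out here for illustration):

# DAG of Fig. 20.3 with edges (0,1), (1,2), (3,2), (3,4), (3,5), (5,6)
g = [[1], [2], [], [2, 4, 5], [], [6], []]
print(kahns_topo_sort(g))  # [0, 1, 3, 2, 4, 5, 6]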

Linear Topological Sort with Depth-first Graph Search


In depth-first graph search, if there is an edge u -> v, the recursive
search from v will always be completed ahead of the search of u. With a
simple reversal of the finishing order of the vertices in depth-first graph
search, the topological ordering takes O(|E| + |V|) time. The time
complexity equates to that of Kahn's algorithm, but this process is more
efficient in practice, as it does not require the counting and updating of
node in-degrees. The whole process is exactly the same as Cycle Detection,
with additional tracking of the completion ordering.
First, the code of the DFS is:
def dfs(g, s, colors, complete_orders):
    colors[s] = STATE.gray
    for v in g[s]:
        if colors[v] == STATE.white:
            if dfs(g, v, colors, complete_orders):
                return True
        elif colors[v] == STATE.gray:  # a cycle appears
            print(f'Cycle found at node {v}.')
            return True
    colors[s] = STATE.black
    complete_orders.append(s)
    return False

The main function is:

def topo_sort(g):
    n = len(g)
    complete_orders = []
    colors = [STATE.white] * n
    for i in range(n):  # run dfs on all the nodes
        if colors[i] == STATE.white:
            has_cycle = dfs(g, i, colors, complete_orders)
            if has_cycle:
                print('Cycle found, no topological ordering')
                return None
    return complete_orders[::-1]

Calling topo_sort on the graph, we get the sorted ordering:

[3, 5, 6, 4, 0, 1, 2]

which is another valid linear topological ordering.

Example: Course Schedule (L210, m)

There are a total of n courses you have to take. Some courses may have
prerequisites; for example, course 1 has to be taken before course 0, which
is expressed as the pair [0, 1]. Given the total number of courses and the
prerequisite pairs, return an ordering of courses you can take to finish
all courses. If it is impossible to finish all courses, return an empty
array.

Analysis Viewing a pair [u, v] as a directed edge v -> u, we get a directed
graph with n vertices, and we solve the ordering of courses by computing a
topological sort of the vertices of the resulting digraph.
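A minimal sketch reusing the topo_sort function from above could look as
follows (the driver findOrder mirrors the LeetCode signature; the wiring is
an illustration, not the book's listing):

def findOrder(numCourses, prerequisites):
    # a pair [u, v] becomes the directed edge v -> u
    g = [[] for _ in range(numCourses)]
    for u, v in prerequisites:
        g[v].append(u)
    order = topo_sort(g)
    return order if order is not None else []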

20.3 Connected Components


Problem Definition In graph theory, a connected component (or simply
component) is defined as a subgraph in which all vertices are mutually
connected, i.e., in which there exists a path between any two vertices. A
graph G = (V, E) is thus composed of separate connected components (sets)
which are mutually exclusive and together include all the vertices, i.e.,
V = V0 ∪ V1 ∪ ... ∪ Vm−1, with Vi ∩ Vj = ∅ for i != j. A connected
component algorithm should be able to cluster the vertices of each single
connected component. For example, the undirected graph in Fig. 20.4 has two
connected components: {0, 1, 2, 3, 4} and {5, 6}.

Figure 20.4: The connected components in an undirected graph; each dashed
red circle marks a connected component.
Given a directed graph:

• the term Strongly Connected Component (SCC), or diconnected, is used to
refer to the same idea: within an SCC, any two vertices are reachable from
each other by paths. In the leftmost directed graph shown in Fig. 20.5,
there is a total of five SCCs: {0, 1, 2}, {3}, {4}, {5}, and {6}. Vertices
5 and 6 are connected in only one direction, resulting in two separate
SCCs.

• ignoring the direction of edges, a weakly connected component (WCC)
equates to a connected component of the resulting undirected graph.

Figure 20.5: The strongly connected components in a directed graph; each
dashed red circle marks a strongly connected component.
Cycles and Strongly Connected Components A directed graph is acyclic if and
only if it has no strongly connected subgraph with more than one vertex. We
call SCCs with at least two vertices nontrivial SCCs. A nontrivial SCC
contains at least one directed cycle; more specifically, a nontrivial SCC
is composed of a set of directed cycles, as we observed in our example
above, where two directed cycles share at least one common vertex. The
shared common vertex acts as a "transfer stop" between these directed
cycles, so they all compose one component. Therefore, SCC algorithms can be
used indirectly to detect cycles: if there exists a nontrivial SCC, the
directed graph contains cycles.

20.3.1 Connected Components Detection


In general, there are two ways to detect connected components in an
undirected graph: graph search and union-find, each suiting different
needs.

Graph Search and Search Tree In an undirected graph G, executing a BFS or
DFS starting at some vertex u results in a rooted search tree. As the edges
are undirected or bidirectional, all vertices in the search tree belong to
the same connected component. To find all connected components, we simply
loop through all vertices V; for each vertex u:

• if u is not visited yet, we start a new DFS/BFS and mark all vertices
along the traversal as the same component.

• otherwise, u is already included in a previously found connected
component; continue.

The time complexity is O(|V| + |E|) and the space complexity is O(|V|).
Since the code is trivial, we only demonstrate it in the notebook.
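Since the listing lives in the companion notebook, a minimal DFS-based
sketch written under the description above could be:

def connectedComponentDFS(g):
    n = len(g)
    comp = [-1] * n  # component id of each vertex; -1 means unvisited

    def dfs(u, cid):
        comp[u] = cid
        for v in g[u]:
            if comp[v] == -1:
                dfs(v, cid)

    num = 0
    for u in range(n):
        if comp[u] == -1:  # a new component starts here
            dfs(u, num)
            num += 1
    return num, comp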

Union Find
We represent each connected component as a set. For the exemplary graph in
Fig. 20.4, we have two sets: {0, 1, 2, 3, 4} and {5, 6}. Unlike the
graph-search based approach, where the edges are visited in a certain
order, in the union-find approach the ordering of the edges to be visited
can be arbitrary. The algorithm using union-find is:

• Initialize |V| sets in total, one for each vertex of V.

• For each edge (u, v) in E, union the two sets that vertices u and v
previously belong to.

Implementing it with Python (the DisjointSet class is implemented elsewhere
in the book):
def connectedComponent(g):
    n = len(g)
    # initialize the disjoint set
    ds = DisjointSet(n)

    for i in range(n):
        for j in g[i]:  # for each edge i <-> j
            ds.union(i, j)
    return ds.get_num_sets(), ds.get_all_sets()

How we implement the union-find data structure decides the complexity of
this approach. For example, if we use a linked-list based structure, the
complexity will be O(|E| × |V|), as we traverse |E| edges and each step in
the worst case can take O(|V|) to find the set an endpoint belongs to.
However, if path compression and union by rank are used for optimization,
the time complexity can be lowered to O(|E| × log |V|).

Dynamic Graph Since union-find has worse time complexity compared with
graph search, why do we care about it? The answer is: with graph search,
whenever new edges and vertices are added to the graph, we have to rerun
the graph search algorithm. Imagine that if we double |V| and |E|, the
worst-case total time will be O(|V| × (|V| + |E|)), bringing the complexity
up to a polynomial of the number of edges. However, for each additional
edge, union-find adds only a single merge operation to address the change,
keeping the time complexity unchanged.
In detail, we adapt the union-find structure dynamically. Set up a dict to
track each vertex and its index in the union-find, with index = 0
initially. When a new edge (u, v) comes in, the union-find:

• checks if u and v exist in the dict. If not, (a) add a key-value pair
into the node tracker, (b) append index into the parent list of the
disjoint set, and (c) increase index by one.

• finds the sets that u and v belong to, and unions them.
Implementation Here we demonstrate how to implement a dynamic connected
component detection algorithm. First, convert the graph representation from
an adjacency list to a list of edges:

ug_edges = [(0, 1), (0, 2), (1, 2), (2, 4), (4, 3), (4, 3), (5, 6)]

Then, we implement a class DynamicConnectedComponent offering all the
functions needed.

from collections import defaultdict

class DynamicConnectedComponent():
    def __init__(self):
        self.ds = DisjointSet(0)
        self.node_index = defaultdict(int)
        self.index_node = defaultdict(int)
        self.index = 0

    def add_edge(self, u, v):
        if u not in self.node_index:
            self.node_index[u], self.index_node[self.index] = self.index, u
            self.ds.p.append(self.index)
            self.ds.n += 1
            self.index += 1

        if v not in self.node_index:
            self.node_index[v], self.index_node[self.index] = self.index, v
            self.ds.p.append(self.index)
            self.ds.n += 1
            self.index += 1
        u, v = self.node_index[u], self.node_index[v]
        self.ds.union(u, v)
        return

    def get_num_sets(self):
        return self.ds.get_num_sets()

    def get_all_sets(self):
        sets = self.ds.get_all_sets()
        return {self.index_node[key]: set([self.index_node[i] for i in list(value)])
                for key, value in sets.items()}

Now, to find the connected components dynamically based on incoming edges,
we can run:

dcc = DynamicConnectedComponent()
for u, v in ug_edges:
    dcc.add_edge(u, v)
dcc.get_num_sets(), dcc.get_all_sets()

The output is consistent with the previous result, which is:

(2, {3: {0, 1, 2, 3, 4}, 6: {5, 6}})

Examples
1. 547. Number of Provinces (medium)

2. 128. Longest Consecutive Sequence (hard), union-find solution:
https://leetcode.com/problems/longest-consecutive-sequence/discuss/1109808/Python-Clean-Union-Find-with-explanation

How would you implement a WCC detection algorithm in a directed graph?

20.3.2 Strongly Connected Components


In graph theory, two nodes u, v in V are called strongly connected iff v is
reachable from u and u is reachable from v. If we contract each SCC into a
single vertex, the resulting graph is a DAG. Denoting the contracted DAG as
G^SCC = (V^SCC, E^SCC), the vertices V^SCC are the SCCs, and the edges
E^SCC are defined as follows:
(C1, C2) is an edge in G^SCC iff there exist u in C1 and v in C2 such that
(u, v) is an edge in G.
In other words, if there is an edge in G from any node in C1 to any node in
C2, there is an edge in G^SCC from C1 to C2.

Figure 20.6: A graph with four SCCs.

Kosaraju's Algorithm If we were to do a DFS in G, and C1 -> C2 is an edge
in G^SCC, then at least one vertex in C1 will finish after all vertices in
C2 have finished. If we start with vertex 0, the finishing order of all
vertices is [1, 6, 5, 4, 3, 2, 0]; 0 finishes later than 4 from C2,
satisfying the claim. If we look purely at the last node of each SCC to
turn dark, we get a topological sort of G^SCC in reverse, which is
[C4, C3, C2, C1]. How do we find the last node of each SCC? We reverse the
DFS finishing order, getting [0, 2, 3, 4, 5, 6, 1].
What happens if we do another round of DFS following this ordering?
Starting from 0 (the last node to finish), we can (1) reach all vertices in
C1, as they are connected, but also (2) reach vertices in C2, since the
edges in between point from C1 to C2. If we reverse these edges, we can
avoid (2) and still keep (1). The way we do this is to reverse the
direction of all edges in graph G. Running DFS in the reversed finishing
ordering on the reversed graph, an SCC will include any vertex along the
traversal that hasn't been put into an SCC yet. In our example, the process
is:
0: find {0, 1, 2, 3}
4: find {4}
5: find {5}
6: find {6}

We formalize Kosaraju's algorithm into three steps:

1. Retrieve the reversed finishing order L of the vertices during a DFS.
This step is similar to topological sort on a DAG.

2. Transpose the original graph G to G^T by reversing the direction of
every edge in G.

3. Run another DFS on G^T in the ordering of L; every DFS tree that starts
from a vertex not yet assigned to an SCC makes up a new SCC.

Implementation The main function scc calls two functions, topo_sort_scc and
reverse_graph, to get L and G^T. The topological-ordering-like function:
# DFS traversal recording the complete (finishing) orders
def dfs(g, s, colors, complete_orders):
    colors[s] = STATE.gray
    for v in g[s]:
        if colors[v] == STATE.white:
            dfs(g, v, colors, complete_orders)
    colors[s] = STATE.black
    complete_orders.append(s)
    return

# topologically sort in terms of the last node of each SCC
def topo_sort_scc(g):
    v = len(g)
    complete_orders = []
    colors = [STATE.white] * v
    for i in range(v):  # run dfs on all the nodes
        if colors[i] == STATE.white:
            dfs(g, i, colors, complete_orders)
    return complete_orders[::-1]

The main function scc is straightforward:

# get the reversed (transposed) graph
def reverse_graph(g):
    rg = [[] for i in range(len(g))]
    for u in range(len(g)):
        for v in g[u]:
            rg[v].append(u)
    return rg

def scc(g):
    rg = reverse_graph(g)
    orders = topo_sort_scc(g)

    # track states
    colors = [STATE.white] * len(g)
    sccs = []

    # traverse the reversed graph in the reversed finishing order
    for u in orders:
        if colors[u] != STATE.white:
            continue
        scc = []
        dfs(rg, u, colors, scc)
        sccs.append(scc)
    return sccs
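A quick check on a toy digraph (an illustrative example, not the graph of
Fig. 20.6):

# 0 -> 1 -> 2 -> 0 forms one SCC; 2 -> 3 leads to a trivial SCC
g = [[1], [2], [0, 3], []]
print(scc(g))  # [[1, 2, 0], [3]]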

Try to take a look at Tarjan's algorithm for SCC.

Examples
1. 1520. Maximum Number of Non-Overlapping Substrings (hard): set up 26
nodes, one for each letter. A node represents a substring spanning from the
letter's first to its last occurrence. Given a string abacdb, for a (0-2),
add an edge from a to every other letter appearing between its start and
end. Then we have a directed graph. There is an SCC (loop) between a and b,
meaning a's substring has an occurrence of b and b's substring has an
occurrence of a, which conflicts with condition 2, so they have to be
combined. The results are the SCCs that are leaves in the contracted SCC
graph. We can think of the SCC graph as acyclic, i.e., a forest; if we
chose an internal node, we could not choose any of its leaves, so choosing
the leaves maximizes the count. Another solution uses two pointers:
https://zxi.mytechroad.com/blog/greedy/leetcode-1520-maximum-number-of-non-overlapping

20.4 Minimum Spanning Trees


Problem Definition A spanning tree of an undirected graph G = (V, E) is a
set of edges, with no cycles, that connects all vertices. There can exist
many spanning trees in a graph. Given a weighted graph, we are particularly
interested in the minimum spanning tree (MST): a spanning tree with the
least total edge cost.
One example is shown in Fig. 20.7. This graph can represent a collection of
houses and the possible wires that we can lay; how to lay wires to connect
all houses with the least total cost is equivalently an MST problem.
Figure 20.7: Example of a minimum spanning tree in an undirected graph; the
green edges are the edges of the tree, and the yellow filled vertices are
the vertices of the MST.

Spanning Tree To obtain a tree from a graph, the essence is to select edges
iteratively until we have |V| - 1 edges which form a tree connecting V. We
have two general approaches:

• Start with a forest of |V| trees, each containing only one node. We
design a method to merge these trees into a final connected MST by
selecting one edge at a time. This is the path taken by Kruskal's
algorithm.

• Start with a root node, which can be any vertex selected from G, and grow
the tree by spanning to more nodes iteratively. In the process, we maintain
two disjoint sets of vertices: one containing the vertices that are in the
growing spanning tree, S, and the other tracking all remaining vertices,
V - S. This is the path taken by Prim's algorithm.

We denote the set of edges in the growing tree as A. In this section, we
explain these two greedy algorithms for finding an MST.
greedy algorithms to find MST.

20.4.1 Kruskal’s Algorithm


Kruskal's algorithm starts with |V| trees, each having only one node. The
main process of the algorithm is to merge these trees into a single one by
iterating through all edges.

Generate a Spanning Tree with Union-Find For each edge (u, v):

• if u and v belong to the same tree, adding this edge would form a cycle,
thus we discard the edge.

Figure 20.8: The process of Kruskal’s Algorithm

• otherwise, combine these two trees and add this edge into A.

This process results in a single spanning tree. Implementation-wise, we can
do this easily using the union-find data structure: a tree is a set, and
adding one edge merges two sets/trees into a single one if they belong to
different sets.

Being Greedy with MST At each step i, we have |E| - i edges to choose from.
Applying the principle of greedy algorithms, we try to choose the edge with
the minimum cost among the |E| - i options. That is to say, we iterate over
the edges in increasing order of weight in the process of generating a
spanning tree. Doing so ensures that we end up with the MST, and this
algorithm is the so-called Kruskal's algorithm.
Fig. 20.8 demonstrates the run of Kruskal's algorithm on the input
undirected graph. Here, the edges ordered increasingly are [(1, 2), (3, 5),
(2, 3), (2, 5), (3, 4), (4, 5), (1, 3)]. As initialization, we assign a set
id to each vertex, marked in red and placed above its corresponding vertex.
The process is:
edge    logic                             action
(1, 2)  1's set_id 1 != 2's set_id 2      merge set 2 into set 1
(3, 5)  3's set_id 3 != 5's set_id 5      merge set 5 into set 3
(2, 3)  2's set_id 1 != 3's set_id 3      merge set 3 into set 1
(2, 5)  2's set_id 1 == 5's set_id 1      continue
(3, 4)  3's set_id 1 != 4's set_id 4      merge set 4 into set 1
(4, 5)  4's set_id 1 == 5's set_id 1      continue
(1, 3)  1's set_id 1 == 3's set_id 1      continue

This process produces [(1, 2), (3, 5), (2, 3), (3, 4)] as the edges of the
final MST. We can get slightly better performance by stopping the iteration
through the edges once we have selected |V| - 1 edges. The implementation
is as simple as:
from typing import Dict

def kruskal(g: Dict):
    # g is a dict mapping each node to its adjacent (node, weight) pairs
    vertices = list(g.keys())
    n = len(vertices)
    ver_idx = {v: i for i, v in enumerate(vertices)}

    # Initialize a disjoint set
    ds = DisjointSet(n)

    # Collect all edges, deduplicate the two directions, and sort by weight
    edges = []
    for u in vertices:
        for v, w in g[u]:
            if (v, u, w) not in edges:
                edges.append((u, v, w))
    edges.sort(key=lambda x: x[2])

    # Main section: take an edge if its endpoints are in different trees
    A = []
    for u, v, w in edges:
        if ds.find(ver_idx[u]) != ds.find(ver_idx[v]):
            ds.union(ver_idx[u], ver_idx[v])
            print(f'{u} -> {v}: {w}')
            A.append((u, v, w))
    return A
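The code above assumes a DisjointSet class supporting find and union on integer indices; a full list-based implementation is discussed in Section 21.2. A minimal array-based sketch, without the optimizations discussed there, could look like:

class DisjointSet:
    def __init__(self, n):
        # Each index starts as its own root, i.e., its own set id
        self.parent = list(range(n))

    def find(self, x):
        # Follow parent pointers up to the root, which acts as the set id
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        # Attach the root of y's tree under the root of x's tree
        self.parent[self.find(y)] = self.find(x)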

For the exemplary graph, we represent a weighted edge as a (key, value) pair, where the value is a tuple of two items: the first is the other endpoint seen from the key vertex, and the second is the weight of the edge. The graph is thus represented by a dictionary: {1: [(2, 2), (3, 12)], 2: [(1, 2), (3, 4), (5, 5)], 3: [(1, 12), (2, 4), (4, 6), (5, 3)], 4: [(3, 6), (5, 7)], 5: [(2, 5), (3, 3), (4, 7)]}.
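As a usage sketch, assume this dictionary is bound to a variable a (the name is only for illustration):

a = {1: [(2, 2), (3, 12)],
     2: [(1, 2), (3, 4), (5, 5)],
     3: [(1, 12), (2, 4), (4, 6), (5, 3)],
     4: [(3, 6), (5, 7)],
     5: [(2, 5), (3, 3), (4, 7)]}
A = kruskal(a)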
Running kruskal(a) will return the following edges:

[(1, 2, 2), (3, 5, 3), (2, 3, 4), (3, 4, 6)]

Complexity Analysis The sorting takes O(|E| log |E|) time. The cost of checking each edge's set id and of merging two trees into one is decided by the complexity of the disjoint set; it can range from O(log |V|) to O(|V|). Therefore, we can conclude that the time complexity is bounded by the sorting time, i.e., O(|E| log |E|).

20.4.2 Prim’s Algorithm

Figure 20.9: A cut denoted with a red curve partitions V into {1,2,3} and {4,5}.

In graph theory, a cut is a partition of V into S and V − S. For example, in Fig. 20.9 a cut is marked by the red curve; removing the three edges (2, 5), (3, 5), (3, 4) partitions the vertices into two subgraphs with vertex sets {1, 2, 3} and {4, 5}. A cross edge (u, v) ∈ E crosses the cut (S, V − S) if one of its endpoints is in S and the other is in V − S. A light edge is the minimum-weight edge among all cross edges; e.g., edge (3, 5) is the light edge in our example. We say a cut respects a set of edges A if no edge in A crosses the cut; e.g., the marked cut in the example respects the set of edges (1, 2), (2, 3), (1, 3).
Prim’s algorithm starts with a randomly chosen root node that is put into a set S, leaving us with two sets of vertices, S and V − S. Next, it iteratively grows the partial and connected MST by adding an edge from the cross edges of the cut (S, V − S). Prim’s algorithm is greedy in the sense that it chooses a light edge among its options to form the final MST. This process simulates the uniform-cost search that composes Dijkstra’s shortest-path algorithm.
Fig. 20.10 demonstrates the process of Prim’s algorithm. We start from vertex 1. With the set A, the sets S and V − S, the cross edges at each step denoted as CE, and a light edge being valid if it does not form a cycle within A, we list the process as:
A                          S           V−S       CE                   light edge
{}                         1           2,3,4,5   (1,2),(1,3)          (1,2)
(1,2)                      1,2         3,4,5     (1,3),(2,3),(2,5)    (2,3)
(1,2),(2,3)                1,2,3       4,5       (3,4),(3,5),(2,5)    (3,5)
(1,2),(2,3),(3,5)          1,2,3,5     4         (3,4),(5,4)          (3,4)
(1,2),(2,3),(3,5),(3,4)    1,2,3,4,5

Figure 20.10: Prim’s Algorithm: at each step, we manage the cross edges.

Implementation
One key step is to track all valid cross edges and be able to select the minimum edge among them. Naturally, we use a priority queue pq, which can be applied in two ways:

• Priority Queue by Edges: Considering the set S as a frontier set, pq maintains all edges expanded from the frontier set.

• Priority Queue by Vertices: pq maintains the minimum cross-edge cost between the vertices in S and each vertex in V − S. This is an optimization over the first approach, as it reduces multiple cross edges between S and a vertex v into a single cost: the minimum.

Priority Queue by Edges For the example shown in Fig. 20.10, at first the frontier set has only vertex 1, so we have edges (1, 2), (1, 3) in pq. Once edge (1, 2) is popped out as having the smallest weight, we explore all outgoing edges of vertex 2 to nodes in V − S, adding (2, 3), (2, 5) to pq, resulting in pq = (2, 3), (2, 5), (1, 3). Then we pop out edge (2, 3) and explore the outgoing edges of vertex 3, adding (3, 4), (3, 5) to pq, with pq = (2, 5), (1, 3), (3, 4), (3, 5). At this moment, we can see that edge (1, 3) is no longer a cross edge. Therefore, whenever we are about to add the light edge into the expanding tree, we check whether both of its endpoints are already in set S. If so, we skip this edge and use the next valid light edge. Repeating this process gets us the set of edges A forming an MST. The Python code is as follows:
 1 import queue
 2
 3 def _get_light_edge(pq, S):
 4     while not pq.empty():
 5         # Pick the light edge
 6         w, u, v = pq.get()
 7         # Filter out non-cross edges
 8         if v not in S:
 9             S.add(v)
10             return (u, v, w)
11     return None
12
13 def prim(g):
14     cur = 1
15     n = len(g.items())
16     S = {cur}  # spanning tree set
17     pq = queue.PriorityQueue()
18     A = []
19
20     while len(S) < n:
21         # Expand edges from the exploring vertex
22         for v, w in g[cur]:
23             if v not in S:
24                 pq.put((w, cur, v))
25
26         le = _get_light_edge(pq, S)
27         if le:
28             A.append(le)
29             cur = le[1]  # set the exploring vertex
30         else:
31             print(f'Graph {g} is not connected.')
32             break
33     return A

In line 24, we use a 3-item tuple consisting of the edge cost, the first endpoint in the set S, and the second endpoint in V − S, to align with the fact that PriorityQueue() uses the first item of a tuple as the key for sorting. The while loop is similar to our breadth-first search and terminates under either of two conditions:

• when the set S is as large as the set V, which we detect by checking the size of S;

• when we cannot find a light edge, which happens when the graph is not connected.

Calling prim(a) will return us the following A:

[(1, 2, 2), (2, 3, 4), (3, 5, 3), (3, 4, 6)]

Complexity Analysis The main cost of this implementation is in the priority queue, which holds a maximum of |E| items. In the worst case, we have to enqueue and dequeue all edges, making the complexity O(|E| log |E|). Since, in general, |E| < |V|^2, the complexity becomes O(|E| log |V|).

Priority Queue by Vertices Instead of tracking cross edges in the priority queue explicitly, we reduce all cross edges that reach a vertex in V − S to the smallest cost together with a predecessor that tracks the node in S responsible for that smallest cost, saving us some additional space and time in the queue operations.

Figure 20.11: Prim’s Algorithm

As shown in Fig. 20.11, we first initialize a priority queue with |V| items, each with a task id that is the same as the vertex id, a predecessor vertex p = −1, and a cost initialized to ∞. We start by picking vertex 1 as the root node, setting S = {1}, modifying task 1's cost to 0, and pointing its predecessor to itself. Then, we repeatedly pop out the vertex in the queue that has the smallest cost; together with the predecessor of this node, we are choosing the light edge. With this chosen node, we reach out to its adjacent

nodes that are still in V − S and see if we are able to find an even “lighter”
edge. Applying this process on the given example:

1. First, we have the start vertex 1 with the smallest cost; pop it out and explore edges (1, 2), (1, 3), resulting in (a) modifying task 2 and 3's costs to 2 and 12, respectively, and (b) setting 2 and 3's predecessor to 1.

2. Pop out vertex 2 and explore edges (2, 3), (2, 5), resulting in (a) modifying task 3 and 5's costs to 4 and 5, respectively, and (b) setting 3 and 5's predecessor to 2.

3. Pop out vertex 3 and explore edges (3, 5), (3, 4), resulting in (a) modifying task 5 and 4's costs to 3 and 6, respectively, and (b) setting 5 and 4's predecessor to 3.

4. Pop out vertex 5 and explore edge (5, 4): since the new cross edge (5, 4) has a larger cost than the previously reduced cross edge reaching vertex 4, vertex 4 in the queue is not modified.

5. Pop out vertex 4; there are no more new edges to expand, so terminate the program.

This process results in exactly the same MST as the implementation by edges. However, it adds an additional challenge to the implementation of the priority queue: we have to modify an enqueued item's record during the life cycle of the queue. In the Python implementation, we use our customized PriorityQueue() from Section ?? (also included in the notebook). The main process of the algorithm is:
def prim2(g):
    n = len(g.items())
    pq = PriorityQueue()
    S = set()
    A = []
    # Initialization; task: vertex, priority: edge cost, info: predecessor vertex
    for i in range(n):
        pq.add_task(task=i + 1, priority=float('inf'), info=None)

    S = {1}
    pq.add_task(1, 0, info=1)

    while len(S) < n:
        u, p, w = pq.pop_task()
        if w == float('inf'):
            print(f'Graph {g} is not connected.')
            break
        A.append((p, u, w))
        S.add(u)
        for v, w in g[u]:
            if v not in S and w < pq.entry_finder[v][0]:
                pq.add_task(v, w, u)

    return A

Calling prim2(a) will output the following A:

[(1, 1, 0), (1, 2, 2), (2, 3, 4), (3, 5, 3), (3, 4, 6)]

Examples
1. 1584. Min Cost to Connect All Points (medium)
2. 1579. Remove Max Number of Edges to Keep Graph Fully Traversable
(hard)

Try to prove the correctness of Kruskal’s Algorithm.

20.5 Shortest-Paths Algorithms


Problem Definition Given a weighted, directed graph G = (V, E), with
weight function w : E → R that maps edges to real-valued weights, the
weight of a path p = (v0 , v1 , ..., vk ) is the summation over its constituent
edge weights, denoted as w(p):
w(p) = \sum_{i=1}^{k} w(v_{i-1}, v_i) \quad (20.1)

The shortest path problem between vi and vj is to find the shortest path
weight σ(vi , vj ) along with the shortest path p.
\sigma(v_i, v_j) = \begin{cases} \min\{w(p) : v_i \xrightarrow{p} v_j\} & \text{if there is a path from } v_i \text{ to } v_j \\ \infty & \text{otherwise} \end{cases} \quad (20.2)
For example, for the graph shown in Fig. 20.12, the shortest-path weight and the corresponding shortest path between s and every other vertex in V are listed as:

(source, target)    shortest-path weight    shortest path
(s, s)              0                       s
(s, y)              7                       (s, y)
(s, x)              4                       (s, y, x)
(s, t)              2                       (s, y, x, t)
(s, z)              -2                      (s, y, x, t, z)

Figure 20.12: A weighted and directed graph.

Variants of Shortest-path Problems Generally, there exist a few variants of the shortest path problem:

1. Single-source shortest-path: Find a shortest path from a given source vertex s to each vertex v ∈ V.

2. Single-target shortest-path: Find a shortest path to a given target t from each vertex v ∈ V. By reversing the direction of each edge in the graph, we can reduce this problem to a single-source shortest-path problem.

3. Single-pair shortest-path problem: Find a shortest path from u to v for given vertices u and v. If we solve the single-source problem with source vertex u, we solve this problem too.

4. All-pairs shortest-path problem: Find a shortest path from u to v for every pair of vertices u and v in V, if one exists. Although we can solve this problem by running a single-source algorithm once for each vertex, we can usually solve it faster with the algorithms addressed in Sec. 20.5.4.

20.5.1 Algorithm Design

In this section, we discuss the shortest path problem and analyze it using both graph theory and the fundamental algorithm design principle of Dynamic Programming.

Shortest path and Cycle From our experience in Combinatorial Search, we have to detect cycles within a path in graph-based tree search to avoid being stuck in infinite recursion. So, how do cycles affect the detection of shortest paths? For example, in Fig. 20.12, a path p = (s, t, x, t) contains

the cycle (t, x, t). Because the cycle has a positive path weight 5 + (−2) = 3, the path (s, t) remains smaller than the path that includes the cycle. However, if we swap the weight of edge (t, x) with that of (x, t), then the same cycle (t, x, t) has a negative path weight (−5) + 2 = −3; repeating the cycle within the path infinitely yields a cost of −∞. Therefore, for a graph where the weights can be both negative and positive, one requirement posed on a single-source shortest-path algorithm, recursive or iterative, is to detect any negative-weight cycle that is reachable from the source. Once we get rid of all negative-weight cycles, the remainder of the algorithm can focus on shortest paths of at most |V| − 1 edges, and the resulting shortest paths will contain neither negative- nor positive-weight cycles.

Exponential Naive Solution
Assuming the given graph has no negative-weight cycle, a naive way to obtain the shortest path and its weight is simply through a tree search that starts from a source vertex s and enumerates all possible paths between s and every other vertex in V. The search tree has a maximum height of |V| − 1, making the time complexity of this naive solution O(b^{|V|}), where b is the maximum branching factor of a vertex. Recalling the path enumeration in Search Strategies, we implement this solution as:
def all_paths(g, s, path, cost, ans):
    ans.append({'path': path[:], 'cost': cost})
    for v, w in g[s]:
        # Avoid cycles
        if v in path:
            continue
        path.append(v)
        cost += w
        all_paths(g, v, path, cost, ans)
        cost -= w
        path.pop()

To obtain all possible paths, we call the function all_paths() with the following code:

g = {
    't': [('x', 5), ('y', 8), ('z', -4)],
    'x': [('t', -2)],
    'y': [('x', -3), ('z', 9)],
    'z': [('x', 7)],
    's': [('t', 6), ('y', 7)],
}
ans = []
all_paths(g, 's', ['s'], 0, ans)

Shortest-paths Tree We visualize all paths in ans in a tree structure shown in Fig. 20.13. We can easily extract the shortest paths between s

Figure 20.13: All paths from source vertex s for graph in Fig. 20.12 and its
shortest paths.

to any other vertex from this result, which is shown on the right side of Fig. 20.13. All possible paths starting from the source vertex can be viewed as a tree, and the shortest paths from the source to all other vertices within the graph form a subtree of that tree, known as the shortest-paths tree. Formally, a shortest-paths tree rooted at s is a directed subgraph G' = (V', E'), where V' ⊆ V and E' ⊆ E, such that

1. V' is the set of vertices reachable from s in G,

2. for each v ∈ V', the unique simple path from s to v in G' is a shortest path from s to v in G.

Predecessor Rule The shortest-paths tree makes it possible for us to track shortest paths with the predecessor rule: Given a graph G = (V, E), in the single-source shortest path problem we maintain for each vertex v ∈ V a predecessor π that is either another vertex or empty, as for the root node. The shortest path between s and another vertex v can then be obtained by following the chained predecessors starting from v all the way backward to the source s. To summarize, each vertex v in the graph stores two values, d(v) and π(v), which (inductively) describe a tentative shortest path from s to v.

Optimization
As we have seen, the shortest path problem is a truly combinatorial optimization problem, making it one of the best demonstration examples of the algorithm design principles of Dynamic Programming and Greedy Algorithms. On the other hand, depending on the characteristics of the target graph (dense or sparse, a directed acyclic graph (DAG) or not), we can further optimize efficiency beyond the design principle. In this chapter, however, we focus on the gist: how to solve all-pairs shortest path problems with dynamic programming?
First, we use an adjacency matrix to represent our weight matrix W of size |V| × |V|. In the process, we track the shortest-path weight estimate D and additionally the predecessor Π. Both D and Π are the same size as W. w_{ij} indicates the weight of the edge with start point i and endpoint j:

W(i, j) = \begin{cases} 0 & \text{if } i = j \\ w_{ij} & \text{if } i \neq j \text{ and } (i, j) \in E \\ \infty & \text{if } i \neq j \text{ and } (i, j) \notin E \end{cases} \quad (20.3)

With this definition, we show a simple directed graph in Fig. 20.14 along with its W.

Figure 20.14: The simple graph and its adjacency matrix representation

Overlapping Subproblems and Optimal Substructures For all-pairs shortest paths, we have |V|^2 optimal subproblems; each subproblem D(i, j) is defined as the shortest path between v_i and v_j. Optimal substructure states that “the optimal solution to a problem has the optimal solutions to subproblems in it.” All of this boils down to how to define the “subproblem” and how a larger subproblem is divided into smaller subproblems (the recurrence relation).
With our simple directed graph, the shortest path between a and d comes either from the shortest path between a and an intermediate node x or from the shortest path between a and d found so far. First, we define the subproblem as the shortest path between a and d with maximum path length (MPL) m. With this definition, we show two possible ways of dividing the subproblem:

1. We divide a subproblem with MPL m into a subproblem with MPL m − 1 and an edge. Therefore, the shortest path at this maximum length m is either the shortest path found so far or the shortest path between a and some x plus the weight of edge (x, d); our recurrence relation is:

   D^m(a, d) = \min(D^{m-1}(a, d), \min_x (D^{m-1}(a, x) + W(x, d))) \quad (20.4)

As we can see, each update of an item in the distance matrix D takes O(|V|) time, as it has to check all possible intermediate nodes. Furthermore, it takes |V| − 1 passes to update D^0 all the way to D^{|V|-1}. Therefore, this approach has a time complexity of O(|V|^4). We demonstrate the update process in Fig. 20.15 for our simple example.


Figure 20.15: DP process using Eq. 20.4 for Fig. 20.14

2. We divide a subproblem with MPL m into two equal-sized subproblems, each with MPL m/2. Therefore, the shortest path at this maximum length m is either the shortest path found so far or the shortest path between a and x of length m/2 plus the shortest path


Figure 20.16: DP process using Eq. 20.5 for Fig. 20.14

between x and d of length m/2, with recurrence relation:

   D^m(a, d) = \min(D^{m/2}(a, d), \min_x (D^{m/2}(a, x) + D^{m/2}(x, d))) \quad (20.5)

Similarly, each update takes |V| time. In contrast, it only takes log |V| passes to reach the final optimal subproblems. Thus, this approach gives a better time complexity, O(|V|^3 log |V|). The process is demonstrated in Fig. 20.16.
Alternatively, we define the subproblem as the shortest path between a and d with x as an intermediate node along the path; the number of intermediate nodes is |V|. Here, we use k to index the intermediate node, and i, j to index the start and end nodes. Then a subproblem D_k(i, j) is either the shortest path between i and j with intermediate nodes 0, 1, ..., k − 1, or the shortest path between i and k with all previous intermediate nodes plus the shortest path between k and j with all previous intermediate nodes. The recurrence relation is:

D_k(i, j) = \min(D_{k-1}(i, j), D_{k-1}(i, k) + D_{k-1}(k, j)) \quad (20.6)

As we see, each recurrence update only takes constant time. At the end, after we have considered all possible intermediate nodes, we arrive at the optimal solution. This approach gives the best time complexity so far, O(|V|^3). We demonstrate the update process in Fig. 20.17. At pass C, using C as the intermediate node, we end up using only the C-th row and C-th column to update our matrix.

Figure 20.17: DP process using Eq. 20.6 for Fig. 20.14

As we shall see later, the first way is similar to Bellman-Ford, the second is a repeated-squaring version of Bellman-Ford, and the third is the Floyd-Warshall algorithm.

Greedy algorithms With |V|^2 subproblems, solving each subproblem takes at least |V| time using the Floyd-Warshall algorithm. A greedy approach instead tries to decide the optimal solution to each subproblem in one attempt, bringing the cost toward |V|^2 or |V| + |E|. We will see Dijkstra's algorithm, which is only applicable when all weights in W are non-negative.
In the following sections, we start by going through algorithms that solve the single-source shortest path problem before adding more details to the all-pairs shortest path algorithms introduced above.

20.5.2 The Bellman-Ford Algorithm

The Bellman-Ford algorithm addresses the single-source shortest path problem using a single-source version of the first DP approach.

Dynamic Programming Representation Given a single source node s in graph G, we define D and Π as one-dimensional vectors instead of the matrices used in all-pairs shortest paths. D_i^m represents the shortest path between s and i with maximum path length m. When m = 0, there is a shortest path from s to i with no edges iff s = i:

D_i^0 = \begin{cases} 0 & \text{if } s = i \\ \infty & \text{otherwise} \end{cases} \quad (20.7)

Similarly, Π^0 is initialized as None. Our simplified recurrence relation is:

D_i^m = \min(D_i^{m-1}, \min_{k \in [0, n-1]} (D_k^{m-1} + W(k, i))) \quad (20.8)

which, since W(i, i) = 0 means the k = i term already recovers D_i^{m-1}, can be further simplified to:

D_i^m = \min_{k \in [0, n-1]} (D_k^{m-1} + W(k, i)) \quad (20.9)

In Eq. 20.9, once an intermediate node k is found to have a smaller tentative path weight than the current value, we set Π(i) = k.

Implementation In the function bellman_ford_dp, W is an n × n adjacency matrix. In the outer for loop, we run the recurrence relation of Eq. 20.9 for |V| − 1 passes, given the fact that, other than in a negative-weight cycle, there can be at most |V| − 1 edges in any path within the graph.
def bellman_ford_dp(s, W):
    n = len(W)
    # D: shortest-path estimates, P: predecessors (pi)
    D = [float('inf') if i != s else 0 for i in range(n)]
    P = [None] * n
    for m in range(n - 1):
        newD = D[:]
        for i in range(n):  # endpoint
            for k in range(n):  # intermediate node
                if D[k] + W[k][i] < newD[i]:
                    P[i] = k
                    newD[i] = D[k] + W[k][i]
        D = newD
        print(f'D{m+1}: {D}')
    return D, P

Now, to retrieve the path from the source s to other vertices, we implement a recursive function named get_path that starts from the target u and backtracks to the source s through Π. The code is:
def get_path(P, s, u, path):
    path.append(u)
    if u == s:
        print('Reached the source vertex, stop!')
        return path[::-1]
    elif u is None:
        print(f"No path found between {s} and {u}.")
        return []
    else:
        return get_path(P, s, P[u], path)
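As a usage sketch, we can convert the adjacency-list graph g defined earlier into an adjacency matrix following Eq. 20.3 and run both functions; the helper to_matrix and the variable names here are only for illustration:

def to_matrix(g):
    # Map each vertex key to a numerical index and build W per Eq. 20.3
    key2idx = {v: i for i, v in enumerate(g)}
    n = len(g)
    W = [[0 if i == j else float('inf') for j in range(n)] for i in range(n)]
    for u in g:
        for v, w in g[u]:
            W[key2idx[u]][key2idx[v]] = w
    return W, key2idx

W, key2idx = to_matrix(g)
D, P = bellman_ford_dp(key2idx['s'], W)
# Indices of the vertices along the shortest path from s to z
print(get_path(P, key2idx['s'], key2idx['z'], []))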

For the graph in Fig. 20.12, the updates on D using s as the source are visualized in Fig. 20.18.

Figure 20.18: The updates on D for Fig. 20.12. The gray filled spots mark the nodes that updated their estimate value, with their predecessor indicated by an incoming red arrow.

Connecting all red arrows along with the shaded gray nodes,

we have a tree structure; with each update on D, we expand the tree by one more level, updating the best estimate for reaching each target node with one more possible edge. We visualize this tree structure in Fig. 20.19. We read the tree like this: if we are at most one edge away from s, t gets a value as small as 6; if we are three edges away, t is able to gain a smaller value through its predecessor x, which is at most 2 edges away. After the last round of updates, when the tree reaches height |V| − 1, the predecessor vector Π gives out the shortest-paths tree: each edge in the shortest-paths tree can be obtained by connecting each predecessor with its vertex in the graph. The shortest-paths tree is marked in red in Fig. 20.19.

Figure 20.19: The tree structure indicating the updates on D, with the shortest-paths tree marked by red arrows.

Formal Bellman-Ford Algorithm In the above implementation, at each round we made a copy of D named newD. However, we can actually reuse the original D and update it directly. The difference is that we would update D[i] at pass m with values D[k] that may have already been updated within pass m instead of at pass m − 1, allowing some nodes' estimates to improve even earlier. In the previous implementation, we can guarantee that after pass m, for all i ∈ [0, n − 1], D_i is at most the weight of every path from s to i using at most m edges. In the new version, we may reach the optimal values even earlier, but it still takes n − 1 passes to guarantee convergence.
Second, the inner two for loops equivalently enumerate edges: for each possible edge (k, i), we update the best estimate for node i. With these two points modified, we get the official Bellman-Ford algorithm, which states:

1. Initialize D and Π as D^0 and Π^0.

2. Run a relaxation process for |V| − 1 passes. Within each pass, go through each edge (u, v) ∈ E; following Eq. 20.9, if using u as an intermediate node gives the tentative shortest path a smaller value, update D and Π.

Implement Bellman-Ford by checking edges from the adjacency list as defined in g for Fig. 20.12. We should notice that different orderings of the vertices/edges to be relaxed lead to different intermediate results in D, though the final result is the same.

Implementation We give an exemplary implementation:

def bellman_ford(g: dict, s: str):
    n = len(g)
    # Assign a numerical index for each key
    V = g.keys()
    # Key to index
    ver2idx = dict(zip(V, [i for i in range(n)]))
    # Index to key
    idx2ver = dict(zip([i for i in range(n)], V))
    # Initialize the estimate vector and the predecessor vector
    si = ver2idx[s]
    D = [float('inf') if i != si else 0 for i in range(n)]
    P = [None] * n

    # n-1 passes
    for i in range(n - 1):
        # Relax all edges
        for u in V:
            ui = ver2idx[u]
            for v, w in g[u]:
                vi = ver2idx[v]
                # Update the minimum path value and predecessor
                if D[vi] > D[ui] + w:
                    D[vi] = D[ui] + w
                    P[vi] = ui
        print(f'D{i+1}: {D}')
    return D, P, ver2idx, idx2ver

During each pass, we relax the estimates D with the following edge ordering:

's': [('t', 6), ('y', 7)],
't': [('x', 5), ('y', 8), ('z', -4)],
'x': [('t', -2)],
'y': [('x', -3), ('z', 9)],
'z': [('x', 7)],

Printing out the updates of D, we can see that it converges to the optimal values faster than the previous strict dynamic programming version.

Time Complexity The first dynamic programming solution takes O(|V|^3), and the formal Bellman-Ford takes O(|V||E|). The latter is more efficient than the first when the graph is sparse, i.e., when |E| is well below |V|^2.

Detect Negative-weight Cycle If the graph contains no negative-weight cycle, then after |V| − 1 passes of relaxation, D reaches the minimum path values. Thus, if we run an additional pass of relaxation, no vertex will be updated further. However, if there exists at least one negative-weight cycle, the |V|-th pass will decrease the value of at least one vertex in D.
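A minimal sketch of this check, assuming the bellman_ford function above (run here on the graph g of Fig. 20.12, which has no negative-weight cycle reachable from s):

def has_negative_cycle(g, D, ver2idx):
    # One extra relaxation pass: any further improvement implies a
    # negative-weight cycle reachable from the source
    for u in g:
        for v, w in g[u]:
            if D[ver2idx[u]] + w < D[ver2idx[v]]:
                return True
    return False

D, P, ver2idx, idx2ver = bellman_ford(g, 's')
print(has_negative_cycle(g, D, ver2idx))  # False for Fig. 20.12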

Special Cases and Further Optimization

From the perspective of optimization, there are at least two approaches we can try in order to further boost time efficiency:

1. a special linear ordering of the vertices whose leaving edges are relaxed, which leads us to the shortest paths in just one pass of the Bellman-Ford algorithm;

2. a greedy approach that takes only one pass of relaxation, similar to breadth-first graph search or Prim's algorithm.

For the graph in Fig. 20.12, suppose we relax the leaving edges of the vertices in the linear order [s, t, y, z, x]; the process is as follows:

vertex    edges relaxed          vertices
s         (s,t), (s,y)           {t:6, y:7}
t         (t,x), (t,y), (t,z)    {x:11, z:2, t:6, y:7}
y         (y,x), (y,z)           {x:4, z:2, t:6, y:7}
z         (z,x)                  {x:4, z:2, t:6, y:7}
x         (x,t)                  {t:2, x:4, z:2, y:7}

Figure 20.20: The execution of Bellman-Ford’s Algorithm with ordering [s, t, y, z, x]: (a) initialization; (b) after the first pass.

The process is also visualized in Fig. 20.20. We see that only vertex z did not find its shortest-path weight. Why? From s to z, there are the paths (s, t, z), (s, y, z), (s, t, y, z), and (s, y, x, t, z). If we want to make sure that after one pass of updates vertex z reaches its minimum shortest-path weight, we have to make sure its predecessors, vertices y and t, all reach their minimum path weights first. The same rule applies to their predecessors. In this graph, the predecessor dependencies are:
vertex    predecessor
s         None
t         s, x
y         s, t
x         t, y, z
z         y, t

From the listing, we see that the pair t and x conflict with each other: t needs x as a predecessor, and x needs t as a predecessor. Tracking down this clue, we find that it is due to the fact that t and x coexist in a cycle.

Figure 20.21: The execution of Bellman-Ford’s Algorithm on a DAG using topologically sorted vertices: (a) initialization; (b) after the first pass. The red color marks the shortest-paths tree.

Order Vertices with Topological Sort Taking away edge (x, t), we are able to obtain a topological ordering of the vertices, which is [s, t, y, z, x]. Relaxing the leaving edges of the vertices in this order guarantees reaching the globally optimal shortest-path weights that would otherwise be reached in |V| − 1 passes of the Bellman-Ford algorithm with an arbitrary ordering of vertices. The shortest-paths tree is shown in Fig. 20.21.
So far, we have discovered an O(|V| + |E|) linear algorithm for the single-source shortest-path problem when the given graph is directed, weighted, and acyclic. The algorithm consists of two steps: topological sorting of the vertices in G, and one pass of the Bellman-Ford algorithm using the reordered vertices instead of an arbitrary ordering. Calling the topo_sort function from Section 20.2, we have our Python code:
def bellman_ford_dag(g, s):
    n = len(g)
    # Key to index
    ver2idx = dict(zip(g.keys(), [i for i in range(n)]))
    # Index to key
    idx2ver = dict(zip([i for i in range(n)], g.keys()))
    # Convert g to an index-based adjacency list
    ng = [[] for _ in range(n)]
    for u in g.keys():
        for v, _ in g[u]:
            ui = ver2idx[u]
            vi = ver2idx[v]
            ng[ui].append(vi)
    V = topo_sort(ng)
    # Initialize the dp vector with estimate and predecessor
    si = ver2idx[s]
    dp = [(float('inf'), None) for i in range(n)]
    dp[si] = (0, None)

    # Relax all edges in topological order
    for ui in V:
        u = idx2ver[ui]
        for v, w in g[u]:
            vi = ver2idx[v]
            # Update dp's minimum path value and predecessor
            if dp[vi][0] > dp[ui][0] + w:
                dp[vi] = (dp[ui][0] + w, ui)
    return dp

20.5.3 Dijkstra’s Algorithm

From Prim’s to Dijkstra’s Breadth-first search hosts a FIFO queue, and whenever a vertex finishes exploring and turns BLACK, it is guaranteed to have the shortest path length from the source. Similarly, Prim’s algorithm maintains a priority queue of cross edges between the spanning tree set S and the remaining set V − S; whenever a vertex is added to S, it becomes part of the MST.
In the shortest-path problem, we use the same initialization as in the Bellman-Ford algorithm: the source vertex has estimate 0 and all other vertices take ∞. Following the process of Prim’s algorithm, we keep a set S of vertices that have found their shortest-path weight and predecessor, which is empty initially. Then, the algorithm repeatedly takes the “lightest” vertex in V − S, adds it to the set S (the source vertex s comes first), and relaxes the shortest-path estimates of the vertices that are endpoints of edges leaving the lightest vertex. This process is repeated in a loop until V − S is empty. This devised approach indeed follows the principle of greedy algorithms just as Prim’s algorithm does; it is called Dijkstra’s algorithm.

How is it greedy? Dijkstra’s is the “greedy” version of the Bellman-Ford algorithm. At each step, dynamic programming uses Eq. 20.9 to update D_i^m by trying all possible edges that extend the paths between s and i one at a time. Bellman-Ford can only guarantee reaching the optimal solution at the very end, after running all passes. In Dijkstra’s algorithm, however, the optimal solution is reached in only one step: whenever a vertex is added to S, it adds a vertex to the shortest-paths tree using only “local” information.

Correctness Condition: Non-negative Weight But how can we make sure that whenever a vertex is added to the set S, it has reached its shortest-path weight? Specifically, how do we ensure that our locally optimal decision is globally optimal? This means that after this step, no matter how many additional paths of larger path length can reach i, they shall never have a smaller distance. This requires all edges of the graph to be non-negative.
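As a small counterexample (a hypothetical three-vertex graph, not one of the book's figures): take vertices s, a, b with edges w(s, a) = 2, w(s, b) = 3, and w(b, a) = −2. Dijkstra's pops a first and finalizes d(a) = 2, yet the path (s, b, a) has weight 1; the negative edge breaks the local-is-global guarantee.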

Implementation The implementation relies on the customized PriorityQueue() data structure once again, where we can modify an existing item in the queue.

Figure 20.22: The execution of Dijkstra’s Algorithm on a non-negative weighted graph: (a) s enters the queue; (b) s enters S; (c) t and y enter the queue; (d) t enters S; (e) z and x enter the queue; (f) y enters S; (g) x is modified in the queue; (h) z enters S; (i) the queue is not modified; (j) x enters S. Red circled vertices represent the priority queue, and blue circled vertices represent the set S. Eventually, the blue colored edges represent the shortest-paths tree.

There are two ways to apply the priority queue:

• Add all vertices into the queue at once at the beginning. Then only dequeue and modification operations are needed.

• Add a vertex to the queue only when it is relaxed and has a non-∞ shortest-path estimate. The process of Dijkstra’s algorithm on a non-negative weighted graph taking this approach is demonstrated in Fig. 20.22, and the code is as follows:

def dijkstra(g, s):
    Q = PriorityQueue()
    S = []
    # task: vertex id, priority: shortest-path estimate, info: predecessor
    Q.add_task(task=s, priority=0, info=None)
    visited = set()
    while not Q.empty():
        # Use the light vertex
        u, up, ud = Q.pop_task()
        visited.add(u)
        S.append((u, ud, up))

        # Relax adjacent vertices
        for v, w in g[u]:
            # Already found the shortest path for this id
            if v in visited:
                continue

            vd, vp = Q.get_task(v)
            # First time adding the task, or already in the queue but needs an update
            if not vd or ud + w < vd:
                Q.add_task(task=v, priority=ud + w, info=u)
    return S

Complexity Analysis Once again, the complexity of Dijkstra’s relies on the specific implementation of the priority queue. In our implementation, we used a customized PriorityQueue() which takes O(|V|) time to initialize. In this queue, we do not really remove a task from the queue but instead mark it as “REMOVED,” so we can end up having a maximum of |E| entries in the queue, making the cost of extracting the minimum item O(log |E|). For an update, the main cost comes from inserting a new entry through a heappush-like operation, which is O(log |E|) too. In all, we have up to |V| effective pops and up to |E| insertions, ending up with a worst-case time complexity of O(|E| log |E|), which is O(|E| log |V|) since |E| < |V|^2.

Try to prove the correctness of Dijkstra’s using the two proof approaches for greedy algorithms.

20.5.4 All-Pairs Shortest Paths

In this section, we first summarize the solutions to the single-source shortest-path problem, due to the fact that the all-pairs shortest-path problem can be naturally decomposed into |V| such single-source subproblems. Next, we systematically build up the three all-pairs algorithms we are about to learn.

Summary of Single-source Shortest-Path Algorithms The solution varies with the type of weighted graph G we are dealing with:

• if (1) each weight w ∈ R and (2) there are only non-negative cycles, we can apply the generalist dynamic programming approach, the Bellman-Ford algorithm;

• if each weight is non-negative, i.e., w ∈ R+, we can take the greedy approach, Dijkstra's algorithm;

• if the graph is directed and acyclic, we can run one pass of the Bellman-Ford algorithm with vertices relaxed in a topologically sorted linear ordering.

Depending on which category the given graph G falls into, a naive and natural solution to the all-pairs shortest-path problem is to run the corresponding algorithm |V| times, once for each vertex viewed as the source, giving a complexity scaled by a factor of |V|.

Extended Bellman-Ford’s Algorithm

We leverage the first DP approach in Section 20.5.1. Define the weight matrix W, the shortest-path weight estimate matrix D, and the predecessor matrix Π. We have the recurrence relation:

D^m(i, j) = \min_{k \in [0, n-1]} (D^{m-1}(i, k) + W(k, j)) \quad (20.10)

Π^m(i, j) is updated by:

\Pi^m(i, j) = \begin{cases} \text{None}, & \text{if } D^m(i, j) = 0 \text{ or } D^m(i, j) = \infty \\ \operatorname{argmin}_{k \in [0, n-1]} (D^{m-1}(i, k) + W(k, j)), & \text{otherwise} \end{cases} \quad (20.11)

with initialization:

D^0(i, j) = \begin{cases} 0, & \text{if } i = j \\ \infty, & \text{otherwise} \end{cases} \quad (20.12)

\Pi^0(i, j) = \text{None} \quad (20.13)

In detail, our extended Bellman-Ford algorithm consists of these main steps:

1. Initialization: we initialize D and Π using Eq. 20.12 and 20.13.

2. For every pair of vertices i and j, we update D and Π using the recurrence relations in Eq. 20.10 and 20.11, respectively, for |V| − 1 passes.

3. Run a |V|-th pass to decide whether any negative-weight cycle exists in each rooted shortest-path tree.

Note that after one pass of updates on D following initialization, D^1 = W; thus, in our implementation, only |V| − 2 further passes of updates are actually needed. Assume we have converted the graph shown in Fig. 20.12 into a W adjacency matrix representation and a dictionary key2idx that maps each key to a numerical index from 0 to |V| − 1. This extended Bellman-Ford algorithm is implemented in the main function extended_bellman_ford_with_predecessor, which calls a subfunction bellman_ford_with_predecessor that does one pass of relaxation and does not detect negative-weight cycles. The code is:
import copy

def bellman_ford_with_predecessor(W, L, P):
    n = len(W)
    for i in range(n):  # source
        for j in range(n):  # endpoint
            for k in range(n):  # extend one edge
                if L[i][k] + W[k][j] < L[i][j]:
                    L[i][j] = L[i][k] + W[k][j]  # set d
                    P[i][j] = k  # set predecessor

def extended_bellman_ford_with_predecessor(W):
    n = len(W)
    # Initialize L; the first pass equals W
    L = copy.deepcopy(W)
    print(f'L1: {L} \n')
    P = [[None for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if L[i][j] != 0 and L[i][j] != float('inf'):
                P[i][j] = i
    # n-2 passes
    for i in range(n - 2):
        bellman_ford_with_predecessor(W, L, P)
        print(f'L{i+2}: {L} \n')
    return L, P

The resulting L matrix has all zeros along the diagonal; in this case, it is:

[[0,   2,  4,  7,  -2],
 [inf, 0,  3,  8,  -4],
 [inf, -2, 0,  6,  -6],
 [inf, -5, -3, 0,  -9],
 [inf, 5,  7,  13, 0]]

We reconstruct the shortest-path trees and visualize them in Fig. 20.23.

Repeated Squaring Extended Bellman-Ford Algorithm

We leverage the second DP approach in Section 20.5.1. This approach bears resemblance to the repeated-squaring optimization in matrix multiplication.

Figure 20.23: All shortest-path trees starting from each vertex: (a) s as source; (b) t as source; (c) x as source; (d) y as source; (e) z as source.

Repeated squaring is a general method for the fast computation of exponentiation with large powers of a number, or more generally of a polynomial or a square matrix. The underlying algorithm design methodology is divide and conquer. Assume our input is x^n, where x is an expression; repeated squaring computes this in O(log n) steps by repeatedly squaring an intermediate result. The repeated squaring method is actually used a lot in advanced algorithms; we will see another application in the string algorithms.
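A minimal sketch of repeated squaring for integer exponentiation (the function name power is introduced only for this illustration):

def power(x, n):
    # Compute x**n in O(log n) multiplications by repeated squaring
    result = 1
    while n > 0:
        if n & 1:       # lowest bit set: fold the current square into the result
            result *= x
        x *= x          # square the intermediate result
        n >>= 1
    return result

print(power(3, 13))  # 1594323

The same idea applies when the multiplication is replaced by the one-pass relaxation operation discussed next.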

Repeated Squaring Applied on Extended Bellman-Ford Algorithm
If we observe the one-pass relaxation bellman_ford_with_predecessor, it has three for loops and shows a pattern similar to matrix multiplication. Suppose A and B are both n × n matrices and we compute C = A × B; the formulation is c_{ij} = \sum_{k=0}^{n-1} a_{ik} \cdot b_{kj}, which has the same pattern as Eq. 20.10. If we use · to mark the one-pass operation on L and W, we have the following relations:
relations:

L1 = L0 · W = W, (20.14)
L =L ·W =W ,
2 1 2

L3 = L2 · W = W 3 ,
..
.
Ln−1 = Ln−2 · W = W n−1

With the repeated squaring technique, we can compute L^{n-1} with only ⌈log(n − 1)⌉ rounds of the one-pass operation:

L^1 = W,
L^2 = W · W,
L^4 = W^2 · W^2,
... \quad (20.15)

The above repetition stops once m ≥ n − 1. The implementation is:


import copy
import math

def bellman_ford_repeated_square(L):
    n = len(L)
    for i in range(n):  # source
        for j in range(n):  # endpoint
            for k in range(n):  # double the extending length
                L[i][j] = min(L[i][j], L[i][k] + L[k][j])

def extended_bellman_ford_repeated_square(W):
    n = len(W)
    # Initialize L; the first pass equals W
    L = copy.deepcopy(W)
    print(f'L1: {L} \n')
    # ceil(log2(n-1)) passes
    for i in range(math.ceil(math.log2(n - 1))):
        bellman_ford_repeated_square(L)
        print(f'L{2**(i+1)}: {L} \n')
    return L

The Floyd-Warshall Algorithm

We leverage the third DP approach in Section 20.5.1; this approach is called the Floyd-Warshall algorithm. We put the code here directly:

def floyd_warshall(W):
    L = copy.deepcopy(W)  # L0
    n = len(W)
    for k in range(n):  # intermediate node
        for i in range(n):  # start node
            for j in range(n):  # end node
                L[i][j] = min(L[i][j], L[i][k] + L[k][j])
    return L
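As a quick usage sketch, reusing the to_matrix helper and the graph g assumed earlier for illustration:

W, key2idx = to_matrix(g)
L = floyd_warshall(W)
# L[i][j] now holds the shortest-path weight from vertex i to vertex j
print(L[key2idx['s']][key2idx['z']])  # -2 for the graph in Fig. 20.12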
21

Advanced Data Structures

In this chapter, we extend the data structures learned in the first part with more advanced data structures. These data structures are not as widely used as the basic ones; however, they are often seen in the implementation of more advanced algorithms, or they can be more efficient than algorithms that rely on a more basic version.

21.1 Monotone Stack


A monotone stack is a stack whose elements, from the bottom to the top, are strictly increasing or strictly decreasing. As an analogy, imagine a line at the hair salon: you would naturally join at the end of the line. However, if you are allowed to kick out any person you can beat in a fight, and everyone follows this rule, then the line will start with the most powerful person and end with the weakest one. This is an example of a monotonically decreasing stack.

• Monotonically Increasing Stack: to push an element e, starting from the rear element, we pop out every element r >= e (a violation);

• Monotonically Decreasing Stack: to push an element e, we pop out every element r <= e (a violation).

The process of the monotone decreasing stack is shown in Fig. 21.1. Sometimes, we can relax the strict monotonic condition and allow the stack to hold repeated values.
To see the behavior of a monotone stack, take [5, 3, 1, 2, 4] as an example:


Figure 21.1: The process of a decreasing monotone stack

index  v   Increasing stack          Decreasing stack
1      5   [5]                       [5]
2      3   [3], 3 kicks out 5        [5, 3]  # 3->5
3      1   [1], 1 kicks out 3        [5, 3, 1]  # 1->3
4      2   [1, 2]  # 2->1            [5, 3, 2], 2 kicks out 1
5      4   [1, 2, 4]  # 4->2         [5, 4], 4 kicks out 2, 3

By observing the above process, what features can we extract?

• Pushing in to get the nearest smaller/larger item to the left: When we push an element in, if there exists an element right in front of it, then (1) for an increasing stack, we have found the nearest smaller item to the left of the current item, and (2) for a decreasing stack, we have found the nearest larger item to the left instead. In this case, we get [-1, -1, -1, 1, 2] and [-1, 5, 3, 3, 5], respectively.

• Popping out to get the nearest smaller/larger item to the right: When one element is popped out, the current item that is about to be pushed in is, for the kicked-out item, (1) for an increasing stack, the nearest smaller item to its right, and (2) for a decreasing stack, the nearest larger item to its right. For example, at step 2 of the increasing stack, 3 forces 5 to be popped out; for 5, 3 is the first smaller item to its right. In this case, we get [3, 1, -1, -1, -1] and [-1, 4, 2, 4, -1], respectively.

In conclusion, with a monotone stack, we can search for the nearest smaller/larger items on either the left or the right side of the current item.

Basic Implementation A monotone stack is a data structure where we need to add/remove elements from the end; in some applications, we may further need to remove elements from the front. Thus, deque from the collections module fits well to implement this data structure. Now, we set up the example data:

import collections
A = [5, 3, 1, 2, 4]

Increasing Stack We can find the first smaller item to the left/right:

def increasingStack(A):
    stack = collections.deque()
    firstSmallerToLeft = [-1] * len(A)
    firstSmallerToRight = [-1] * len(A)
    for i, v in enumerate(A):
        # The right neighbor is found from the popping out: A[stack[-1]] >= v
        while stack and A[stack[-1]] >= v:
            firstSmallerToRight[stack.pop()] = v
        # The left neighbor is found from the pushing in: A[stack[-1]] < v
        if stack:
            firstSmallerToLeft[i] = A[stack[-1]]
        stack.append(i)
    return firstSmallerToLeft, firstSmallerToRight, stack

Now, run the above example with this code:

firstSmallerToLeft, firstSmallerToRight, stack = increasingStack(A)
for i in stack:
    print(A[i], end=' ')
print('\n')
print(firstSmallerToLeft)
print(firstSmallerToRight)

The output is:

1 2 4

[-1, -1, -1, 1, 2]
[3, 1, -1, -1, -1]

Decreasing Stack We can find the first larger item to the left/right:

def decreasingStack(A):
    stack = collections.deque()
    firstLargerToLeft = [-1] * len(A)
    firstLargerToRight = [-1] * len(A)
    for i, v in enumerate(A):
        while stack and A[stack[-1]] <= v:
            firstLargerToRight[stack.pop()] = v
        if stack:
            firstLargerToLeft[i] = A[stack[-1]]
        stack.append(i)
    return firstLargerToLeft, firstLargerToRight, stack

Similarly, the output is:

5 4

[-1, 5, 3, 3, 5]
[-1, 4, 2, 4, -1]

For the above problem, a brute-force approach uses one for loop to point at the current element and another embedded for loop to look for the first element that is larger than the current one, which gives us O(n^2) time complexity. If we think about the BCR and trade space for efficiency by using a monotone stack instead, we gain O(n) linear time at O(n) space complexity.
A monotone stack is especially useful in subarray problems where we need to find the smaller/larger items on the left/right side of an item in the array. To better understand the features and applications of the monotone stack, let us look at some examples. First, we recommend the audience practice on the obvious applications shown in the LeetCode Problem Section before moving on to the examples.
There is one problem that is pretty interesting:

Sliding Window Maximum/Minimum Given an array nums, there is a sliding window of size k which is moving from the very left of the array to the very right. You can only see the k numbers in the window. Each time, the sliding window moves right by one position. Return the max sliding window. (LeetCode Problem: 239. Sliding Window Maximum (hard))
Example:

Input: nums = [1, 3, -1, -3, 5, 3, 6, 7], and k = 3
Output: [3, 3, 5, 5, 6, 7]
Explanation:

Window position                Max
---------------               -----
[1  3  -1] -3  5  3  6  7      3
 1 [3  -1  -3] 5  3  6  7      3
 1  3 [-1  -3  5] 3  6  7      5
 1  3  -1 [-3  5  3] 6  7      5
 1  3  -1  -3 [5  3  6] 7      6
 1  3  -1  -3  5 [3  6  7]     7

Analysis: In the process of moving the window, any item that is smaller than its predecessor will never affect the max result anymore; therefore, we can use a decreasing stack to remove any trough. If the window size equals the size of the array, then the maximum value is the first element in the stack (the bottom). With the sliding window, we record the max at each iteration once the window size reaches k, and at each iteration we also need to remove the out-of-window item from the stack. For the example of [5, 3, 1, 2, 4] with k = 3, we get the stack [5, 3, 4]. At step 3, we get 5; at step 4, we remove 5 from the stack and get 3; at step 5, we remove 3 if it is in the stack and get 4. With the monotone stack, we decrease the time complexity from O(kn) to O(n).
import collections

def maxSlidingWindow(self, nums, k):
    ds = collections.deque()
    ans = []
    for i in range(len(nums)):
        while ds and nums[i] >= nums[ds[-1]]:
            ds.pop()
        ds.append(i)
        if i >= k - 1:
            ans.append(nums[ds[0]])  # append the current maximum
        if i - k + 1 == ds[0]:
            ds.popleft()  # the front, also the maximum, is out of the window; pop it out
    return ans

21.1 907. Sum of Subarray Minimums (medium). Given an array of integers A, find the sum of min(B), where B ranges over every (contiguous) subarray of A. Since the answer may be large, return the answer modulo 10^9 + 7. Note: 1 <= A.length <= 30000, 1 <= A[i] <= 30000.

Example 1:

Input: [3, 1, 2, 4]
Output: 17
Explanation: Subarrays are [3], [1], [2], [4], [3,1], [1,2], [2,4], [3,1,2], [1,2,4], [3,1,2,4].
Minimums are 3, 1, 2, 4, 1, 1, 2, 1, 1, 1. Sum is 17.

Analysis: Using the naive solution of enumerating all possible subarrays, we end up with O(n^2) subarrays and an O(n^2) time complexity, which will receive TLE. For this problem, we only need to sum up the minimum of each subarray. Consider the problem from another angle: what if we can figure out how many times each item serves as the minimum value of a corresponding subarray, f(i)? Then res = sum(A[i]*f(i)). If there is no duplicate in the array, then to get f(i), we need to find:

• left[i], the length of strictly bigger numbers on the left of A[i],

• right[i], the length of strictly bigger numbers on the right of A[i].

For the given example, if A[i] = 1, then the left item is 3 and the right item is 4, so we add 1*(left_len*right_len) to the result. However,

if there are duplicates, such as in [3, 1, 4, 1], then for the first 1 we need the subarrays [3,1], [1], [1,4], [1,4,1], and for the second 1 we need [4,1], [1] instead. Therefore, we set the right length to find the first >= item. Now, the problem is converted to finding the first smaller item on the left side and the first smaller-or-equal item on the right side. From the features we drew above, we need to use an increasing stack: from the pushing in, we find the first smaller item on the left, and from the popping out, for the popped-out item, the current item is the first smaller item on its right side. The code is:
def sumSubarrayMins(self, A):
    n, mod = len(A), 10**9 + 7
    left, s1 = [1] * n, []
    right = [n - i for i in range(n)]
    for i in range(n):
        # Find the first smaller item to the left from pushing in
        while s1 and A[s1[-1]] > A[i]:  # can be equal
            index = s1.pop()
            right[index] = i - index  # kicked out
        if s1:
            left[i] = i - s1[-1]
        else:
            left[i] = i + 1
        s1.append(i)
    return sum(a * l * r for a, l, r in zip(A, left, right)) % mod

We can make a simple improvement to the above code by padding a 0 on each side of the array. Then, eventually, only [0, 0] remains in the stack: all of the items originally in the array will be popped out, and on each popping we can sum up the result directly:
def sumSubarrayMins(self, A):
    res = 0
    s = []
    A = [0] + A + [0]
    for i, x in enumerate(A):
        while s and A[s[-1]] > x:
            j = s.pop()
            k = s[-1]
            res += A[j] * (i - j) * (j - k)
        s.append(i)
    return res % (10**9 + 7)

21.2 Disjoint Set

A disjoint-set data structure (aka union-find data structure or merge-find set) maintains a collection S = {S_1, S_2, ..., S_k} of disjoint dynamic sets by partitioning a set of elements. We identify each set by a representative, which is some member of the set. It does not matter which member is used, as long as we get the same answer both times when we ask for the representative twice without modifying the set. Choosing the smallest member of a set as the representative is an example of a prespecified rule. According to its typical applications, such as implementing Kruskal's minimum spanning tree algorithm and tracking connected components dynamically, a disjoint set should support the following operations:
1. make_set(x): creates a new set whose only member is x. To keep the sets disjoint, this member should not already be in an existing set.

2. union(x, y): unites the two dynamic sets that contain x and y, say S_x and S_y, into a new set that is the union of the two. In practice, we merge one set into the other, say S_y into S_x, and then remove/destroy S_y. This is more efficient than creating a new union set and destroying the two originals.

3. find_set(x): returns a pointer to the representative of the set that contains x.

Applications Disjoint sets are used to implement the union-find algorithm,
which performs find_set and union. Union-find can be used in
some basic graph algorithms, such as cycle detection, tracking connected
components in a graph dynamically 1 , and Kruskal's MST algorithm.

Connected Components Before we move to the implementation, let us
first see how a disjoint set can be applied to connected components. At first,
we assign a set id to each vertex in the graph. Then we traverse each edge,
and if the two endpoints of the edge belong to different sets, we union
the two sets. As shown in the process, vertices 0 and 1 first have different set
ids, so we update 1's id to 0. For edge (1, 2), we update 2's id to 0. For
edge (0, 2), both endpoints are already in the same set, so no update is needed. We apply
the same process to edges (2, 4), (3, 4), and (5, 6), ending with the two sets
{0, 1, 2, 3, 4} and {5, 6}. A minimal sketch of this process in code follows.
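The sketch below assumes the DisjointSet interface implemented in the next subsection, with vertex ids as items:

edges = [(0, 1), (1, 2), (0, 2), (2, 4), (3, 4), (5, 6)]
ds = DisjointSet(items=[i for i in range(7)])
for u, v in edges:
    ds.union(u, v)  # union is a no-op when u and v already share a set id
# now ds.find_set(v) returns one shared id for v in {0, 1, 2, 3, 4}
# and another shared id for v in {5, 6}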

21.2.1 Basic Implementation with Linked-list or List


Before we head off to the more efficient and complex implementation, we first
implement a baseline for the convenience of comparison. The key to the
implementation is two dictionaries: item_set, which maps each item to its
set id (a one-to-one relation), and set_item, whose values are lists, because
one set maps to multiple items (a one-to-many relation).
1
In the dynamic setting new edges keep being added, and a search-based algorithm
would have to be rerun each time to find the components again.
Figure 21.2: The connected components using disjoint set.

If our code is correct, each item must already have a set id whenever find_set
is called; if not, we call make_set first. Each existing set has
at least one item. In union, we merge the set that has fewer
items into the one with more items.
class DisjointSet():
    '''Implement a basic disjoint set'''
    def __init__(self, items):
        self.n = len(items)
        # at first each set has only one item; item -> set id is one-to-one
        self.item_set = dict(zip(items, [i for i in range(self.n)]))
        # set id -> list of items; one set can hold multiple items
        self.set_item = dict(zip([i for i in range(self.n)],
                                 [[item] for item in items]))

    def make_set(self, item):
        '''make a set for a new incoming item'''
        if item in self.item_set:
            return
        self.item_set[item] = self.n
        self.set_item[self.n] = [item]  # register the new singleton set too
        self.n += 1

    def find_set(self, item):
        if item in self.item_set:
            return self.item_set[item]
        else:
            print('not in the set yet:', item)
            return None

    def union(self, x, y):
        id_x = self.find_set(x)
        id_y = self.find_set(y)
        if id_x == id_y:
            return
        # sid: the smaller set, lid: the larger set
        sid, lid = id_x, id_y
        if len(self.set_item[id_x]) > len(self.set_item[id_y]):
            sid, lid = id_y, id_x
        # merge the items in sid into lid
        for item in self.set_item[sid]:
            self.item_set[item] = lid
        self.set_item[lid] += self.set_item[sid]
        del self.set_item[sid]
        return
Complexity For n items, we spend O(n) time to initialize the two hashmaps.
With the help of the hashmap, find_set takes only O(1) time; accumulated
over n calls this gives O(n). The union function takes more effort to
analyze. Look at it from the angle of a single item x: its set id is only
updated when its set is merged into another set. After the first update the
resulting set has at least two items; after the second it has at least four,
because the merged set is always the smaller one; and so on for the third, ...,
up to the k-th update. Since a resulting set has at most n items, each item's
id is updated at most log n times. Over n items, this makes the upper bound
for all unions O(n log n).
However, our implementation has an additional cost inside union, where we
concatenate the two lists. This cost could easily be made constant by using
a linked list. Even with Python lists, there are different ways to concatenate
one list onto another (a short snippet below illustrates the difference):

1. The + operator: the time complexity of concatenating two lists A and B
is O(|A| + |B|). This is because you aren't adding to one list, but instead
are creating a whole new list and populating it with elements from both A
and B, requiring you to iterate through both.

2. extend(lst): extend doesn't create a new list but appends to the original,
so extending by a list B costs only O(|B|) amortized, independent of the size
of the destination list. The statement l += [i] likewise modifies the original
list and behaves like extend.
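A minimal illustration of the semantic difference (a sketch, not a benchmark):

a = [1, 2]
b = [3, 4]

c = a + b      # builds a brand-new list; a is unchanged
a.extend(b)    # appends in place; a is now [1, 2, 3, 4]
a += [5]       # also in place, equivalent to a.extend([5])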

21.2.2 Implementation with Disjoint-set Forests


Instead of a linear linked list, we can use a tree structure. Different from the trees
we introduced before, where a node points to its children, here each node
points only to its parent. A tree represents a set; the root node
is the representative, and it points to itself. The straightforward algorithms
that use this structure are no faster than the linked-list version. By
introducing two heuristics, "union by rank" and "path compression", we can
achieve an asymptotically optimal disjoint-set data structure.

Figure 21.3: A disjoint forest

Naive Version
We first create a Node class which stores the node's item and a parent
pointer. The item can be any immutable data structure carrying the
information that represents a node.
class Node:
    def __init__(self, item):
        self.item = item  # save node information
        self.parent = None
We need one dict item_finder and one set sets to track nodes and sets.
From item_finder we map an item to its node; from the node we can then find
its set's representative node or execute a union operation. sets tracks all the
representative nodes: when we union two sets, the root that is merged into the
other is deleted from sets. In this naive version, make_set creates a tree with
only one node. find_set starts from a node and traverses parents all the way
up to the root, which is the node with node.parent == node. A union operation
simply points one tree's root node to the root of the other through parent.
The code is as follows:
class DisjointSet():
    '''Implement with a disjoint-set forest'''
    def __init__(self, items):
        self.n = len(items)
        self.item_finder = dict()
        self.sets = set()  # sets holds only the root nodes

        for item in items:
            node = Node(item)
            node.parent = node
            self.item_finder[item] = node  # from an item we can find its node
            self.sets.add(node)

    def make_set(self, item):
        '''make a set for a new incoming item'''
        if item in self.item_finder:
            return
        node = Node(item)
        node.parent = node
        self.item_finder[item] = node
        self.sets.add(node)
        self.n += 1

    def find_set(self, item):
        # from item -> node -> parents to the set representative
        if item not in self.item_finder:
            print('not in the set yet:', item)
            return None
        node = self.item_finder[item]
        while node.parent != node:
            node = node.parent
        return node

    def union(self, x, y):
        node_x = self.find_set(x)
        node_y = self.find_set(y)
        if node_x.item == node_y.item:
            return
        # the root of one tree points to the root of the other: merge x to y
        node_x.parent = node_y
        # remove one set
        self.sets.remove(node_x)
        return

    def __str__(self):
        ans = ''
        for root in self.sets:
            ans += 'set: ' + str(root.item) + '\n'
        return ans

    def print_set(self, item):
        if item in self.item_finder:
            node = self.item_finder[item]
            print(node.item, '->', end=' ')
            while node.parent != node:
                node = node.parent
                print(node.item, '->', end=' ')
Let's run an example:

ds = DisjointSet(items=[i for i in range(5)])
ds.union(0, 1)
ds.union(1, 2)
ds.union(2, 3)
ds.union(3, 4)
print(ds)
for item in ds.item_finder.keys():
    ds.print_set(item)
    print('')

The output is:

set: 4

0 -> 1 -> 2 -> 3 -> 4 ->
1 -> 2 -> 3 -> 4 ->
2 -> 3 -> 4 ->
3 -> 4 ->
4 ->

In the above implementation, both make_set and union take O(1) time. The
main cost is incurred in find_set, which traverses a path from a node to its
root. If each tree in the disjoint-set forest were balanced, the upper bound of
this operation would be O(log n). However, if a tree degenerates into a linear
linked list, the time complexity grows to O(n). This pushes the total time
complexity from O(n log n) up to O(n^2).

Heuristics
Union by Rank As we have seen from the above example, a sequence
of n − 1 union operations may create a tree that is just a linear chain of
n nodes. Union by rank, which is similar to the weighted-union heuristic
we used with the list implementation, is applied to avoid this worst
case. Each node, in addition to the parent pointer, stores a rank that tracks the
upper bound of the height of the node (the number of edges in
the longest simple path between the node and a descendant leaf). In union
by rank, we make the root with smaller rank point to the root with larger
rank.
At initialization, and in the make_set operation, a single-node tree gets an
initial rank of 0. In union(x, y) on two roots x and y, there are three cases:

Case 1: x.rank == y.rank:
    join x to y
    y.rank += 1
Case 2: x.rank < y.rank:
    join x to y
    ranks stay unchanged
Case 3: x.rank > y.rank:
    join y to x
    ranks stay unchanged

Now, with rank added to the node, we modify the naive implementation:

class Node:
    def __init__(self, item):
        self.item = item  # save node information
        self.parent = None
        self.rank = 0

The updated implementation of union:

def union(self, x, y):
    node_x = self.find_set(x)
    node_y = self.find_set(y)
    if node_x.item == node_y.item:
        return

    # link by rank
    if node_x.rank > node_y.rank:
        node_y.parent = node_x
        self.sets.remove(node_y)  # remove one set
    elif node_x.rank < node_y.rank:
        node_x.parent = node_y
        self.sets.remove(node_x)
    else:
        node_x.parent = node_y
        node_y.rank += 1
        self.sets.remove(node_x)
    return

Path Compression In our naive implementation, find_set took the
most time. With path compression, during find_set we simply make each node
on the find path point directly to the root. Path compression does not affect
the rank of any node. We modify the function as follows:
def _find_parent(self, node):
    while node.parent != node:
        node = node.parent
    return node

def find_set(self, item):
    '''modified to do path compression'''
    # from item -> node -> parents to the set representative
    if item not in self.item_finder:
        print('not in the set yet:', item)
        return None
    node = self.item_finder[item]
    root = self._find_parent(node)
    # path compression: point every node on the path directly to the root
    while node != root:
        node.parent, node = root, node.parent
    return root

With the same example, the output becomes:

set: 1

0 -> 1 ->
1 ->
2 -> 1 ->
3 -> 1 ->
4 -> 1 ->


Experiment: running time of linked list VS naive forest VS heuristic forest
We run each disjoint-set implementation with n = 100,000 items and n union operations:
import time, random

t0 = time.time()
n = 100000
ds = DisjointSet(items=[i for i in range(n)])
for _ in range(n):
    i, j = random.randint(0, n-1), random.randint(0, n-1)  # both in [0, n-1]
    ds.union(i, j)
print('time:', time.time() - t0)

The resulting times are 1.09s (linked list), 50.4s (naive forest), and 1.19s (heuristic forest).

Note As we can see, our implementations never remove any item
from the disjoint-set structure. Also, in the forest implementation we know
the set of each node, but we can't trace the items from the root node. How can
we further improve this?
21.3 Fibonacci Heap

21.4 Exercises
21.4.1 Knowledge Check
21.4.2 Coding Practice
Disjoint Set

1. 305. Number of Islands II (hard)


22 String Pattern Matching Algorithms
Pattern matching is a fundamental string processing problem. Pattern
matching algorithms are also called string searching algorithms: a class of
string algorithms that try to find the places where one or
several strings (also called patterns) occur within a larger string or text.
Based on whether mismatches are allowed, we have exact or approximate
pattern matching. In this chapter, we start from exact single-pattern
matching algorithms, where we only need to find one pattern in a
given string or text. Based on how many queries will be made on the same text, we
have one-time or multiple-time string pattern matching problems. For
multiple-time matching, preprocessing the text into a suffix array/trie/tree
can improve the total efficiency. This chapter is organized as:

1. Exact Pattern Matching: includes one-pattern and multiple patterns.

2. Approximate Pattern Matching.

22.1 Exact Single-Pattern Matching


Exact Single-Pattern Matching Problem Given two strings, a pattern P
of size m and a target string or text T of size n, the exact single-pattern
matching problem is defined as finding the first or all occurrences of
pattern P in T as a substring, and returning the starting indexes of all
occurrences.

Brute Force Solution The naive search is straightforward: we slide
the pattern P through the text T one position at a time, like a sliding window.
At each position i, we compare P with T[i:i+m].
Figure 22.1: The process of the brute force exact pattern matching

In this process we do n − m + 1 comparisons, and each comparison takes at
most m operations. This brute force solution gives O(mn) time
complexity.
def bruteForcePatternMatching(p, s):
    if len(p) > len(s):
        return [-1]
    m, n = len(p), len(s)
    ans = []
    for i in range(n-m+1):
        if s[i:i+m] == p:
            ans.append(i)
    return ans

p = "AABA"
s = "AABAACAADAABAABA"
print(bruteForcePatternMatching(p, s))
# output
# [0, 9, 12]
We can also write it in a way that uses fewer built-in Python functions:
def bruteForcePatternMatchingAll(p, s):
    if not s or not p:
        return []
    m, n = len(p), len(s)
    i, j = 0, 0
    ans = []
    while i < n:
        # do the pattern matching
        if s[i] == p[j]:
            i += 1
            j += 1
            if j == m:  # collect the position
                ans.append(i - j)
                i = i - j + 1
                j = 0
        else:
            i = i - j + 1
            j = 0
    return ans

For LeetCode problems, most of the time a brute force solution will not be
accepted and will receive TLE. In real applications such as human genome
matching, the text can have an approximate size of 3 × 10^9 and the pattern can
be very long too, say 10^8. Therefore, other, faster algorithms are needed to
improve the efficiency.
The other algorithms require us to preprocess the pattern and/or the
text. In this book, we mainly discuss the following:

1. Knuth-Morris-Pratt (KMP) Algorithm (Section 22.1.1). KMP is a
linear algorithm, and it is mostly enough to solve interview-related
string matching; also, once we understand the algorithm, the
implementation is quite short, which makes it a very good algorithm
for interviews. It has O(m + n) time and O(m) space complexity.

2. Suffix Trie/Tree/Array Matching (Section 22.2.2).

22.1.1 Prefix Function and Knuth Morris Pratt (KMP)


In the above brute force solution, we compare the pattern against the window
starting at each position of the text. Each comparison is independent of the
others, which throws away a lot of information that could improve the efficiency.

Skipping Positions See Fig. 22.1: we find a match at step 1. Is it
necessary for us to do steps 2 and 3? The pattern itself tells us a match at
steps 2 and 3 is impossible, because 'b' would mismatch 'a'
and 'r' would mismatch 'a' too. However, at the original step 4, analyzing
the pattern itself tells us 'a' will match 'a', and beyond that we have
no information, so step 4 is necessary to compare
'c' with 'b' in the pattern. In this example, steps 4, 5, 6, 7 are all needed, but
steps 4, 5, 6 each end after only one or two comparisons.

Figure 22.2: The Skipping Rule

The reason why steps 2 and 3 can be skipped is shown in Fig. 22.2.
If we analyze the pattern first, we know that at steps 2 and 3, "bra"
does not equal "abr" and "ra" does not equal "ab", while at step 4 we do have
"a" equal to "a". If we observe the relations of these pairs further, each pair is
a suffix and a prefix of the same length of the pattern. Inspired
by this, we define a border of a string S as a prefix of S which equals a
suffix of S of the same length, but is not the whole of S. For example:

'a' is a border of 'arba'
'ab' is a border of 'abcdab'
'ab' is not a border of 'ab'

Prefix Function The prefix function of a string P generates an array π (also
written lps, short for the longest-prefix-suffix, or failure, lookup table) of the
same length as the string, where π[i] is the length of the longest border of the
prefix substring P[0...i]. Mathematically, the prefix function can be written as:

π[i] = max_{k=0,...,i} { k : P[0...k−1] = P[i−(k−1)...i] }        (22.1)

The naive implementation of the prefix function takes O(n^3):


Figure 22.3: The Sliding Rule

def naiveLps(p: str):
    dp = [0] * len(p)
    for i in range(1, len(p)):
        for l in range(i, 0, -1):  # from maximum length down to length 1
            prefix = p[0:l]
            suffix = p[i-l+1:i+1]
            if prefix == suffix:
                dp[i] = l
                break
    return dp

Figure 22.4: Proof of Lemma

For example, the prefix function of the string "abcabcd" is [0,0,0,1,2,3,0]. The
trivial algorithm above has O(n^3) time complexity (one for loop
over i, a nested for loop over k, and another O(n) for comparing the corresponding
substrings), which exactly follows the definition of the prefix function. An
efficient algorithm demonstrated to run in O(n) was proposed by
Knuth and Pratt and, independently of them, by Morris in 1977. It was
used as the main routine of a substring search algorithm, and it is the core
of the Knuth-Morris-Pratt (KMP) algorithm. In order to implement the prefix
function in linear time, we first need to utilize two properties (facts) for the
purpose of two further optimizations:

1. Observation: π[i + 1] ≤ π[i] + 1, which states that at each step the value
of the prefix function can increase by at most one, stay the same, or decrease
by some amount.

2. Lemma: If π[i] > 0, then all borders of P[0...i] except the
longest one are also borders of P[0...π(i) − 1]. Proof: As
shown in Fig. 22.4, π(i) is the longest border of P[0...i]. Let µ be
another, shorter border of P[0...i], so |µ| < π(i). Because the prefix of length
π(i) equals the suffix of length π(i), and µ is a suffix of P[0...i], µ is also a
suffix of the prefix of length π(i); together with µ being a prefix of P, this
makes µ a border of P[0...π(i) − 1].

Now, with such knowledge, we can do the following two further optimizations:

1. With fact 1, the complexity can be reduced to O(n^2) by getting rid of the
inner loop over k: at each step the prefix function can grow by at most
one, so across all iterations of i it can grow at most n steps in total, and
can therefore also only decrease a total of n steps.

2. With fact 2, we can further get rid of the O(n) string comparison at each step.
To accomplish this, we use all the information computed in the
previous steps: all borders of P[0...i] (say there are k in total) can be
enumerated from the longest to the shortest as b_0 = π(i), b_1 = π(b_0 − 1),
..., b_{k−1} = π(b_{k−2} − 1) = 0. Therefore, at position
i + 1, instead of comparing the string s[0...π(i)] with s[i − (π(i) − 1)...i + 1],
only the single character comparison of s[j] and s[i + 1] is needed for each
candidate border length j.

Implementation of the Prefix Function for a Given String S Let's recap
the above optimizations to get the final algorithm, which computes the prefix
function in O(n). This step is of key importance to the success of the KMP
algorithm. Let's understand it together with the algorithm statement
and the code.

1. Initialization: allocate n slots for the π array and set π(0) = 0.

2. A for loop over i in range [1, n−1] to compute π(i). Set a variable j =
π(i − 1), and run a while loop over j until j = 0: check if s[j] == s[i]; if
true, π(i) = j + 1; otherwise reassign j = π(j − 1) in order to check a
smaller border.

def prefix_function(s):
    n = len(s)
    pi = [0] * n
    for i in range(1, n):
        # compute pi(i)
        j = pi[i-1]
        # try all borders of s[0...i-1], from the longest to the shortest
        while j > 0 and s[i] != s[j]:
            j = pi[j-1]
        # check the character
        if s[i] == s[j]:
            pi[i] = j + 1
    return pi

Run an example:

S = 'abcabcd'
print('The prefix function of:', S, 'is', prefix_function(S))
# The prefix function of: abcabcd is [0, 0, 0, 1, 2, 3, 0]

Knuth-Morris-Pratt (KMP) Back to the problem of exact pattern
matching: we first build a new string s = P + '$' + T, the concatenation
of the pattern P, the separator '$', and the text T, and calculate the prefix
function of string s. Now, think about the meaning of the prefix function values
beyond the first m + 1 items (which belong to the pattern P and the separator
'$'):

1. For all i, π[i] ≤ m, because the separator '$' in the middle occurs in
neither the pattern nor the text.

2. If π[i] = m, i.e. s[0 : m] = s[i − m + 1 : i + 1] = P, this means that the
pattern P appears completely in the new string s and ends at position
i. Converting i to the starting position of this occurrence in T gives
i − 2m.

3. If π[i] < m, no full occurrence of the pattern ends at position i.

Thus the Knuth-Morris-Pratt algorithm solves the problem in O(n + m)
time and O(n + m) memory, and can be simply implemented with the prefix
function as follows:
def KMP_coarse(p, t):
    m = len(p)
    s = p + '$' + t
    n = len(s)
    pi = prefix_function(s)
    ans = []
    for i in range(2*m, n):
        if pi[i] == m:
            ans.append(i - 2*m)
    return ans
Because π[i] ≤ m for all i, we only need to store the prefix function for the
pattern part: for i in [0, m−1] we save the border length in π, and for i in
[m, n+m] we keep a single variable j to track the current border length. This
decreases the space complexity to O(m). The Python implementation is
given as:
def KMP(p, t):
    m = len(p)
    s = p + '$' + t
    n = len(s)
    pi = [0] * m
    j = pi[0]
    ans = []
    for i in range(1, n):
        # compute pi(i): try all borders of s[0...i-1], longest to shortest
        while j > 0 and s[i] != s[j]:
            j = pi[j-1]
        # check the character
        if s[i] == s[j]:
            j += 1
        # record the result
        if j == m:
            ans.append(i - 2*m)
        # save the border length if i in [0, m-1]
        if i < m:
            pi[i] = j
    return ans

Run an example:

t = 'textbooktext'
p = 'text'
print(KMP(p, t))
# output
# [0, 8]

Sliding Rule with Border Information Now, assuming we know how
to compute the border information, how do we slide the pattern, compared with
the brute force solution? There are three steps, with Fig. 22.3 as demonstration:

1. Find the longest common prefix µ of P and the current window of T.

2. Find w, the longest border of µ.

3. Move P so that the prefix w of P aligns with the suffix w of µ in T.

Knuth-Morris-Pratt in O(m + n) Now, to complete the picture of KMP:
once we have the lps lookup table of the pattern at hand, whenever we fail to
match at text index i and pattern index j, we set j = lps[j-1], and i does not
need to backtrack.

def KMP(p, s):
    f = naiveLps(p)  # the lps table of the pattern; prefix_function(p) works too
    m, n = len(p), len(s)
    i = 0  # index in s
    j = 0  # index in p
    ans = []
    while i < n:
        if p[j] == s[i]:
            i += 1
            j += 1
            if j == m:  # a full match starts at i - j
                ans.append(i - j)
                j = f[j-1]  # keep looking for the next match; i stays put
        else:  # mismatch at i and j
            if j != 0:
                j = f[j-1]  # j retreats with the lps table; i keeps the same
            else:
                i += 1  # j starts over, so i moves too
    return ans

print(KMP(p, s))
# [0, 9, 12]

22.1.2 More Applications of Prefix Functions


Counting the number of occurrences of each prefix

Counting the number of occurrences of different substrings in a string

Compressing a string
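These subsections are left as outlines. As one concrete sketch for the last of them, here is a hedged implementation of string compression, based on the standard fact that the shortest period of s has length n − π[n−1] whenever that length divides n (compress is our own helper name; it reuses prefix_function from above):

def compress(s):
    # shortest t such that s == t * k for some k, if it exists; else s itself
    n = len(s)
    pi = prefix_function(s)
    period = n - pi[-1]  # candidate period from the longest border
    if n % period == 0:
        return s[:period]
    return s

# compress('abcabcabc') -> 'abc'; compress('abcab') -> 'abcab'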

22.1.3 Z-function
Definition and Implementation
The Z-function of a string s of length n is defined as an array z, where each
item z[i] = k, i ∈ [1, n − 1], stores the length of the longest substring starting
at index i which is also a prefix of string s. Note that this length has
to be smaller than the whole length, therefore z[0] = 0. In other words, z[i]
is the length of the longest common prefix between s and the substring
s[i : n]. For example:
" aaaaa " − [ 0 ,4 ,3 ,2 ,1]


a
a substring ' aaaa ' = p r e f i x ' aaaa '
a substring ' aaa ' = p r e f i x ' aaa '
a substring ' aa ' = p r e f i x ' aa '
a substring 'a ' = prefix 'a '

Another Example.
" aaabaab " − [ 0 , 2 , 1 , 0 , 2 , 1 , 0 ]
a 0
a s u b s t r i n g ' aa ' = p r e f i x ' aa '
a substring 'a ' = prefix 'a '
b 0
a s u b s t r i n g ' aa ' = p r e f i x ' aa '
a substring 'a ' = prefix 'a '
b

The Z-function can be represented with a formula:

z[i] = max_{k=0,...,i} { k + 1 : P[0...k] = P[i...i + k] }        (22.2)

The naive implementation of the Z-function takes O(n^2) time complexity, just
like the prefix function:
def naiveZF(s):
    n = len(s)
    z = [0] * n
    for i in range(1, n):  # starting point
        k = 0
        while i + k < n and s[i + k] == s[k]:
            k += 1
        z[i] = k
    return z

Z-function Property Here, we show how we can implement it in O(n).
To compute z[i], do we have to start at i and then follow the order of i + 1,
i + 2, ..., i + k? The answer is no; Fig. 22.5 illustrates the property we can
exploit.
Figure 22.5: Z function property

For a given position i, let [l, r] be the window among the preceding positions
p < i with non-zero z[p] that has the furthest right boundary r. This is the
rightmost window, wherein s[l...r] = s[0...r − l]. If i lies inside this window,
then the range [i, r] is a copy of the prefix segment [i − l, r − l], for which we
already have the result z[i − l]: comparing the range [i, r] with the prefix is the
same as comparing the range [i − l, r − l] with the prefix. So instead of starting
from scratch, our match length k can start from z[i − l]. However, there are two
more restrictions:

1. To be able to utilize this property we need i ≤ r, because the index r can
be seen as the "boundary" up to which our string s has been scanned by the
algorithm.

2. The initial approximation for z[i] is also bounded by the length between r
and i, which is r − i + 1. Therefore, we modify our initial approximation
to z[i] = min(r − i + 1, z[i − l]) instead.

Now, the O(n) implementation is given as follows:

def linearZF(s):
    n = len(s)
    z = [0] * n
    l = r = 0
    for i in range(1, n):
        k = 0
        if i <= r:  # r is the right boundary scanned so far
            k = min(r - i + 1, z[i - l])
        while i + k < n and s[i + k] == s[k]:
            k += 1
        # update the boundary
        if i + k - 1 > r:
            l = i
            r = i + k - 1
        z[i] = k
    return z

Applications
The applications of the Z-function are largely similar to those of the prefix
function, so they will be explained more briefly here. If you have trouble
understanding this section, please read the prefix function section first.

Exact Single-Pattern Matching In this problem set, we are asked to
find all occurrences of the pattern p inside the text t. We can do the same
as in KMP: we create a new string s = p + '$' + t and compute
the Z-function of s. With the z array, for z[i] = k, if k = |p|, then we
know there is one occurrence of p starting at the i-th position in s, which is
position i − (|p| + 1) in t.
def findPattern(p, t):
    s = p + '$' + t
    m = len(p)
    z = linearZF(s)
    ans = []
    for i, v in enumerate(z):
        if v == m:
            ans.append(i - m - 1)
    return ans
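A quick check of the index conversion, a hedged usage example on a small input:

print(findPattern('aba', 'abababa'))
# [0, 2, 4]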

Number of distinct substrings in a string Given a string s of length
n, count the number of distinct substrings of s.
To solve this problem we use dynamic programming, and the
subproblems are s[0...0], s[0...1], ..., s[0...i], ..., s[0...n − 1]. For example, given
"abc":

subproblem 1: 'a',   dp[0] = 1
subproblem 2: 'ab',  dp[1] = 2, with new substrings 'b', 'ab'
subproblem 3: 'abc', dp[2] = 3, with new substrings 'c', 'bc', 'abc'

We know the maximum for dp[i] is i + 1; however, for cases like "aaa",
the situation is different:

subproblem 1: 'a',   dp[0] = 1
subproblem 2: 'aa',  dp[1] = 1, only 'aa' is new, because 'a'_1 == 'a'_0
subproblem 3: 'aaa', dp[2] = 1, only 'aaa' is new, because 'a_0 a_1' == 'a_1 a_2' and 'a_2' == 'a_0'

For each subproblem i, we take the string s[0...i] and reverse it. Running
the Z-function on this reversed string tells us which prefixes of the reversed
string (i.e., suffixes of s[0...i]) are found somewhere else in it: the maximum
value of the Z-function is the length of the longest such suffix, and all
shorter suffixes of it occur too. Therefore,
dp[i] = i + 1 − max(z[i]), and the total time complexity is O(n^2).
def distinctSubstrs(s):
    n = len(s)
    if n < 1:
        return 0
    ans = 1  # for dp[0]
    for i in range(1, n):
        reverse_str = s[0:i+1][::-1]
        z = linearZF(reverse_str)
        ans += (i + 1 - max(z))
    return ans

Run an example:

s = 'abab'
print(distinctSubstrs(s))
# output
# 7
22.2 Exact Multi-Pattern Matching

22.2.1 Suffix Trie/Tree/Array Introduction
Up till now, the prefix function and the KMP algorithm seem impeccable with
their linear time and space complexity. However, there are two problems that
KMP cannot resolve:

1. Approximate matching, which we will detail more in the next section.

2. Frequent queries made on the same text with different patterns; when
m << n, rerunning KMP for every query becomes impractical.

The solution to the second problem is to preprocess the text and
store it, in order to obtain an algorithm whose per-query time complexity depends
only on the length of the pattern. Building a suffix trie of the text
is such a solution.

Suffix Trie A suffix trie of a given string is a trie built from all suffixes of the string.

Suffix Tree If we compress the above suffix trie (contracting non-branching paths), we get a suffix tree.

Suffix Array A suffix array can further be applied, with the benefit of saving
storage space.

Suffix Tree VS Suffix Array Each data structure has its own pros and
cons. In practice, conversion between the two can be implemented in O(n)
time, so we can first construct one and convert it to the other later.

22.2.2 Suffix Array and Pattern Matching

Definition and Implementation
The suffix array of a given string s is defined as all suffixes of the string in
lexicographical order. Because no two suffixes can have the same length,
the sorting has no equal items. For example, given s = 'ababaa',
the sorted suffixes are:

'a'
'aa'
'abaa'
'ababaa'
'baa'
'babaa'

To avoid the prefix rule in lexicographical order (as shown by the
example 'ab' < 'abab'), we append a special character '$' at the end
of all suffixes; '$' is smaller than all other characters. With this operation,
we have 'ab$' and 'abab$': at position 2, '$' is smaller than 'a' or any
other character, so 'ab$' is still smaller than 'abab$'. Therefore, adding this
special character does not lead to a different sorting result, yet it avoids the
prefix rule when comparing two different strings.

Naive Solution with O(n^2 log n) time complexity With this knowledge,
we set s = s + '$', generate the suffixes, and sort them. A comparison-based
sorting algorithm takes O(n log n) comparisons, and each
comparison takes an additional O(n), which makes the total time complexity
O(n^2 log n).
def generateSuffixArray(s):
    s = s + '$'
    n = len(s)
    suffixArray = [None] * n
    # generate all suffixes
    for i in range(n):
        suffixArray[i] = s[i:]
    suffixArray.sort()
    print(suffixArray)
    # save space by storing only the starting index of each suffix
    for idx, suffix in enumerate(suffixArray):
        suffixArray[idx] = n - len(suffix)
    print(suffixArray)
    return suffixArray

Running the above example, we get the following output:

['$', 'a$', 'aa$', 'abaa$', 'ababaa$', 'baa$', 'babaa$']
[6, 5, 4, 2, 0, 3, 1]

Figure 22.6: Cyclic Shifts


Cyclic Shifts For our example, starting at position 0 we get the first
cyclic shift 'ababaa$'; starting at position 1 we get the second cyclic shift
'babaa$a'; and so on until the last position of the string. Now, let us see what
happens if we sort all of the cyclic shifts:

   Original    Sorted     To Suffix Array
0: ababaa$     $ababaa    $
1: babaa$a     a$ababa    a$
2: abaa$ab     aa$abab    aa$
3: baa$aba     abaa$ab    abaa$
4: aa$abab     ababaa$    ababaa$
5: a$ababa     baa$aba    baa$
6: $ababaa     babaa$a    babaa$

The number of cyclic shifts is the same as the number of suffixes of the string.
By observing the above example, sorting the cyclic shifts gives us the sorted
suffixes if we remove all characters after '$' in each cyclic shift. This
conclusion holds for all strings because '$' is smaller than all other characters,
and with '$' at a different position in each cyclic shift, once a comparison
reaches '$' it ends, since the string that hits '$' first is the smaller one.
Therefore, all the characters after the '$' do not affect the sorting at all.
Now we know that, with '$' appended, sorting the cyclic shifts and sorting
the suffixes of string s are equivalent.
If we can sort the cyclic shifts of a string faster, then we have a more
efficient suffix sorting algorithm. One obvious candidate is radix sort:
we first sort the cyclic shifts by the last character using counting sort,
then by the second to last character, and so on until the first character.
Sorting on one character position over the whole array takes O(n), and we
run n rounds, which makes the whole sorting O(n^2) time with O(n) space.
However, we can improve further by using special properties of the cyclic
shifts.

Partial Cyclic Shifts Different from the full cyclic shifts, the partial cyclic
shift C_i^L is the cyclic substring of length L starting at index i. For the above
example, the partial cyclic shifts of length 1, 2, and 4 are:

C^7        C^1    C^2    C^4
ababaa$    a      ab     abab
babaa$a    b      ba     baba
abaa$ab    a      ab     abaa
baa$aba    b      ba     baa$
aa$abab    a      aa     aa$a
a$ababa    a      a$     a$ab
$ababaa    $      $a     $aba
Carefully observe the relation between the pairs (C^1, C^2) and (C^2, C^4): C^1
and the second half of each string in C^2 have the same key set, and the same
rule applies to C^2 and the second halves of the strings in C^4.

Doubled Partial Cyclic Shifts The doubled partial cyclic shift of C_i^L is
C_i^{2L} = C_i^L C_{i+L}^L (the concatenation of these two strings). Applying
the same methodology as radix sort, we can sort the doubled partial shifts
by first sorting on the second half and then on the first half. Therefore, instead of
doing n rounds of counting sort, one per character, we do log n rounds of
sorting of the doubled partial cyclic shifts, each starting from the result of the
last round. If we can sort each round in O(n), then the time complexity becomes
O(n log n), which is way better than the O(n^2) of plain radix sort. The
starting point of sorting doubled partial cyclic shifts is sorting the partial cyclic
shifts of length one.

Order and Class The order array lists the starting indexes of the partial
cyclic shifts in sorted order. For example, for C^1 the order is
[6, 0, 2, 4, 5, 1, 3], which represents [$, a, a, a, a, b, b]. The class array has
one entry class[i] per shift C_i, denoting the number of distinct partial
cyclic shifts of the same length that are strictly smaller than C_i. For
'ababaa$', the class array for length 1 is [1, 2, 1, 2, 1, 1, 0]. The reason to bring
in the concept of class is the rule that the first and second
halves of the doubled partial cyclic shifts share the same key set, and the class
is equivalent to the converted (integer) key of the corresponding partial cyclic
shift.

Compute Order and Class of Partial Cyclic Shifts of Length 1
For C^1, we can obtain the order with counting sort over a range of 256, covering
all common ASCII characters. For C^1, sorted as [$, a, a, a, a, b, b], we assign
classes [0, 1, 1, 1, 1, 2, 2]; each one corresponds to C_{order[i]}. Because the
class array is indexed by the original string position, we just need to put
each class value back at position order[i] in the array. To recap: we first
set cls[order[0]] = 0, then loop over the order array for i in [1, n−1]; the
current character is s[order[i]] and the previous one in sorted order is
s[order[i − 1]]; we just need to compare whether they are equal:

if s[order[i]] != s[order[i-1]]:
    # use order as the index to put the result back
    cls[order[i]] = cls[order[i-1]] + 1
else:
    cls[order[i]] = cls[order[i-1]]

The Python implementation of computing the order for partial cyclic shifts
of length 1 follows; the time complexity is O(n + k), where k is the number of
possible characters.
def getCharOrder(s):
    n = len(s)
    numChars = 256
    count = [0] * numChars  # counting sort over all 256 ASCII chars

    order = [0] * n

    # count the occurrence of each char
    for c in s:
        count[ord(c)] += 1

    # prefix sum of each char
    for i in range(1, numChars):
        count[i] += count[i-1]

    # assign from the back to keep the sort stable
    for i in range(n-1, -1, -1):
        count[ord(s[i])] -= 1
        order[count[ord(s[i])]] = i  # store the index instead of the suffix string

    return order

The Python implementation of computing the class for partial cyclic shifts
of length 1 runs in O(n) given the order:
def getCharClass(s, order):
    n = len(s)
    cls = [0] * n
    cls[order[0]] = 0  # the smallest shift gets class 0
    for i in range(1, n):
        # use order[i] as the index back into the original positions
        if s[order[i]] != s[order[i-1]]:
            cls[order[i]] = cls[order[i-1]] + 1
        else:
            cls[order[i]] = cls[order[i-1]]
    return cls

Applying the above two functions (and one round of doubling), we get:

       L=1, cls   order    C^2 by 2nd half   sorted order   new cls
i=0:   a, 1       $ : 6    a$ : 5            $a : 6         ab : 3
i=1:   b, 2       a : 0    $a : 6            a$ : 5         ba : 4
i=2:   a, 1       a : 2    ba : 1            aa : 4         ab : 3
i=3:   b, 2       a : 4    ba : 3            ab : 0         ba : 4
i=4:   a, 1       a : 5    aa : 4            ab : 2         aa : 2
i=5:   a, 1       b : 1    ab : 0            ba : 1         a$ : 1
i=6:   $, 0       b : 3    ab : 2            ba : 3         $a : 0

Sort the Doubled Partial Cyclic Shifts To apply radix sorting, we
double our previously sorted partial shifts: each sorted C_i^L becomes the
second half of the doubled shift C_{i−L}^{2L} = C_{i−L}^L C_i^L. Given the fact
that the second halves C_i^L are already sorted, we just need to sort the first
halves with counting sort, using the class array of the last round of partial
cyclic shifts as keys. The time complexity of this step is O(n) too. The Python
implementation of computing the doubled partial cyclic shifts' order is:
'''It is a counting sort using the class of the first half as key'''
def sortDoubled(s, L, order, cls):
    n = len(s)
    count = [0] * n
    new_order = [0] * n
    # the key is the class
    for i in range(n):
        count[cls[i]] += 1

    # prefix sum
    for i in range(1, n):
        count[i] += count[i-1]

    # assign from the back to keep the sort stable:
    # sort on the first half
    for i in range(n-1, -1, -1):
        start = (order[i] - L + n) % n  # start index of the doubled shift
        count[cls[start]] -= 1
        new_order[count[cls[start]]] = start

    return new_order

Now, similarly, we compute the new class information. Comparing two
doubled strings is converted into comparing their class pairs
(P_1, P_2), the classes of the first and second halves:
def updateClass(order, cls, L):
    n = len(order)
    new_cls = [0] * n
    new_cls[order[0]] = 0  # the smallest shift gets class 0
    for i in range(1, n):
        cur_order, prev_order = order[i], order[i-1]
        # compare the class pairs of the first and second halves
        if cls[cur_order] != cls[prev_order] or \
           cls[(cur_order+L) % n] != cls[(prev_order+L) % n]:
            new_cls[cur_order] = new_cls[prev_order] + 1
        else:
            new_cls[cur_order] = new_cls[prev_order]
    return new_cls

Sorting Cyclic Shifts in O(n log n) Now we have derived an
O(n log n) suffix array construction algorithm: we start from sorting the partial
cyclic shifts of length 1 and double the length each round, until the
sorted length is >= the string's length.
def cyclic_shifts_sort(s):
    s = s + '$'
    n = len(s)
    order = getCharOrder(s)
    cls = getCharClass(s, order)
    L = 1
    while L < n:
        order = sortDoubled(s, L, order, cls)  # pass the current L, not 1
        cls = updateClass(order, cls, L)
        L *= 2

    return order
Applications
Number of Distinct Substrings of a string

22.2.3 Rabin-Karp Algorithm (Exact or Anagram Pattern
Matching)
Rabin-Karp compares hash values instead of full strings: it keeps a rolling hash
of the current window of the text and compares it with the hash of the pattern,
verifying the characters only on a hash hit. With a positional (polynomial) hash,
different anagrams of a string have different hash values, so the algorithm finds
exact occurrences of the pattern; with an order-insensitive hash such as a
character count, the same framework finds anagrams instead.
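A minimal sketch of the exact-matching variant with a polynomial rolling hash (rabin_karp is our own helper name, and the base/modulus constants are assumptions, not from the original):

def rabin_karp(p, t, base=256, mod=10**9 + 7):
    # expected O(m + n) time with a good modulus
    m, n = len(p), len(t)
    if m > n:
        return []
    hp = ht = 0
    high = pow(base, m - 1, mod)  # weight of the leading character
    for i in range(m):
        hp = (hp * base + ord(p[i])) % mod
        ht = (ht * base + ord(t[i])) % mod
    ans = []
    for i in range(n - m + 1):
        # verify on a hash hit to rule out collisions
        if hp == ht and t[i:i+m] == p:
            ans.append(i)
        if i + m < n:  # roll the window: drop t[i], add t[i+m]
            ht = ((ht - ord(t[i]) * high) * base + ord(t[i+m])) % mod
    return ans

print(rabin_karp("AABA", "AABAACAADAABAABA"))
# [0, 9, 12]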

22.3 Bonus

Figure 22.7: Building a Trie from Patterns

Multiple-Pattern Matching Previously, we mainly talked about
exact/approximate matching of one pattern. When there are multiple patterns,
the time complexity becomes O(Σ_i m_i · n) if the brute force solution is run
for each pattern. Instead, we can construct a trie of all patterns, as shown in
Section 27.3; Fig. 22.7 shows a trie built from a set of patterns.
Now, let us do trie matching exactly the same way as the brute force
pattern matching algorithm, sliding the pattern trie along the text. At each
position of the text, we walk down the trie by spelling out symbols
of the text, and a pattern from the pattern list matches the text each time we
reach a leaf. Try text = "panamabananas": we will first walk down the branch
p->a->n and stop at a leaf, thus finding the pattern 'pan'. With trie matching,
the runtime decreases to O(max_i m_i · n), plus the trie construction time
O(Σ_i m_i).
However, merging all patterns into a trie makes it impossible to use
advanced single-pattern matching algorithms such as KMP.

More Pattern Matching Tasks There are more types of matching beyond
finding the exact occurrences of one string in another:

1. Longest Common Substring (LCS): return the longest substring shared by
two strings.

2. Anagram Matching: find the substrings in T that consist of exactly the
letters of P, regardless of the order of those letters (see the sketch after
this list).

3. Palindrome Matching.
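A minimal sliding-window sketch for the anagram matching task (find_anagrams is our own helper name), running in O(n) since each window update changes at most two counter entries:

from collections import Counter

def find_anagrams(p, t):
    m, n = len(p), len(t)
    if m > n:
        return []
    need = Counter(p)
    window = Counter(t[:m])
    ans = [0] if window == need else []
    for i in range(m, n):
        window[t[i]] += 1       # the new right end enters the window
        window[t[i - m]] -= 1   # the old left end leaves the window
        if window[t[i - m]] == 0:
            del window[t[i - m]]
        if window == need:
            ans.append(i - m + 1)
    return ans

print(find_anagrams('ab', 'abab'))
# [0, 1, 2]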

22.4 Trie for String


Definition Trie comes from the word reTrieval. In computer science,
a trie, also called a digital tree, radix tree, or prefix tree, is, like a BST, a
kind of search tree, used here for finding substrings in a text. We can solve
string matching in O(|T|) time, where |T| is the size of our text; this
purely algorithmic approach has been studied extensively in the algorithms
Knuth-Morris-Pratt, Boyer-Moore, and Rabin-Karp. However, we entertain
the possibility that multiple queries will be made on the same text. This
motivates the development of data structures that preprocess the text to
allow for more efficient queries. One such efficient data structure is the trie, which
can answer each query in O(|P|), where |P| is the length of the pattern string. A trie
is an ordered tree structure, mostly used for storing strings (like
words in a dictionary) in a compact way.

1. In a trie, each child branch is labeled with a letter of the alphabet Σ.
Actually, it is not necessary to store the letter as the key, because if
we order the child branches of every node alphabetically from left to
right, the position in the tree defines the key it is associated with.
2. The root node of a trie represents the empty string.

Now we define a trie node: it has a boolean variable denoting
whether it is the end of a word, and children, a list of 26 child
TrieNodes.
class TrieNode:
    # Trie node class
    def __init__(self):
        self.children = [None] * 26
        # isEndOfWord is True if the node represents the end of a word
        self.isEndOfWord = False

Figure 22.8: Trie VS Compact Trie

Compact Trie If we assign only one letter per edge, we are not taking
full advantage of the trie's tree structure. It is more useful to consider
compact or compressed tries: tries where we remove the one-letter-per-edge
constraint, and contract non-branching paths by concatenating the letters on
these paths. In this way, every node branches out, and every node traversed
represents a choice between two different words. The compressed trie that
corresponds to our example trie is also shown in Fig. 22.8.

Operations: INSERT, SEARCH Both INSERT and SEARCH
take O(m), where m is the length of the word/string we want to insert
or search in the trie. Here, we use a LeetCode problem as an example
to show how to implement INSERT and SEARCH. Constructing a
trie is a series of INSERT operations taking O(n · m), where n is the total
number of words/strings and m is the average length of each item. The
space complexity of the non-compact trie is O(N · |Σ|), where
|Σ| is the alphabet size and N is the total number of nodes in the trie
structure; the upper bound of N is n · m.


Figure 22.9: Trie Structure

22.1 208. Implement Trie (Prefix Tree) (medium). Implement a trie
with insert, search, and startsWith methods.

Example:
Trie trie = new Trie();
trie.insert("apple");
trie.search("apple");   // returns true
trie.search("app");     // returns false
trie.startsWith("app"); // returns true
trie.insert("app");
trie.search("app");     // returns true

Note: You may assume that all inputs consist of lowercase letters
a-z. All inputs are guaranteed to be non-empty strings.

INSERT With the INSERT operation, we insert a given word into the trie.
Traversing the trie from the root node (a TrieNode), for each letter in the word,
if its corresponding child node is None, we create a node there and continue. At
the end, we set that node's is_word variable to True; thereafter, a new branch
ending at that node has been constructed. For example, as shown in Fig. 22.9,
when we first insert "app", we build the branch "app"; with "ape", we only add
the node "e", as demonstrated with red arrows.
def insert(self, word):
    """
    Inserts a word into the trie.
    :type word: str
    :rtype: void
    """
    node = self.root  # start from the root node
    for c in word:
        loc = ord(c) - ord('a')
        if node.children[loc] is None:  # char does not exist, create a node
            node.children[loc] = self.TrieNode()
        # move to the next node
        node = node.children[loc]
    # set the flag to True
    node.is_word = True

SEARCH For SEARCH, like INSERT, we traverse the trie using
the letters as pointers to the next branch. There are two cases:
1) for a word P, if some prefix of it exists in the trie but not the whole
word, we return False; 2) if we match all the letters of P, at the
last node we still need to check whether is_word is True.
STARTSWITH is only slightly different from SEARCH: it does not need
that final check, and returns True once all letters are matched.
def search(self, word):
    node = self.root
    for c in word:
        loc = ord(c) - ord('a')
        # case 1: not all letters matched
        if node.children[loc] is None:
            return False
        node = node.children[loc]
    # case 2: check the end-of-word flag
    return True if node.is_word else False

def startsWith(self, word):
    node = self.root
    for c in word:
        loc = ord(c) - ord('a')
        # not all letters matched
        if node.children[loc] is None:
            return False
        node = node.children[loc]
    return True

Finally, we complete the Trie class with its TrieNode and __init__ function:
class Trie:
    class TrieNode:
        def __init__(self):
            self.is_word = False
            self.children = [None] * 26  # the index of a child encodes a char

    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.root = self.TrieNode()  # the root represents the empty string

22.2 336. Palindrome Pairs (hard). Given a list of unique words, find
all pairs of distinct indices (i, j) in the given list, so that the concatenation
of the two words, i.e. words[i] + words[j], is a palindrome.

Example 1:

Input: ["abcd", "dcba", "lls", "s", "sssll"]
Output: [[0,1], [1,0], [3,2], [2,4]]
Explanation: The palindromes are ["dcbaabcd", "abcddcba", "slls", "llssssll"]

Example 2:

Input: ["bat", "tab", "cat"]
Output: [[0,1], [1,0]]
Explanation: The palindromes are ["battab", "tabbat"]

Solution: One Forward Trie and Another Backward Trie. We
start from the naive solution: for each element, check whether its concatenation
with each of the other strings is a palindrome. From example 1, [3,3] could
form a palindrome, but it is not one of the outputs, which means we need
ordered pairs of distinct indices: there are n(n−1) of them, and multiplying
by the average string length m makes the complexity O(mn^2). However, we
can use a trie structure:
from collections import defaultdict


class Trie:
    def __init__(self):
        self.links = defaultdict(self.__class__)
        self.index = None
        # holds indices which contain this prefix and whose remainder
        # is a palindrome
        self.pali_indices = set()

    def insert(self, word, i):
        trie = self
        for j, ch in enumerate(word):
            trie = trie.links[ch]
            if word[j+1:] and is_palindrome(word[j+1:]):
                trie.pali_indices.add(i)
        trie.index = i


def is_palindrome(word):
    i, j = 0, len(word) - 1
    while i <= j:
        if word[i] != word[j]:
            return False
        i += 1
        j -= 1
    return True


class Solution:
    def palindromePairs(self, words):
        '''Find pairs of palindromes in O(n*k^2) time and O(n*k) space.'''
        root = Trie()
        res = []
        for i, word in enumerate(words):
            if not word:
                continue
            root.insert(word[::-1], i)
        for i, word in enumerate(words):
            if not word:
                continue
            trie = root
            for j, ch in enumerate(word):
                if ch not in trie.links:
                    break
                trie = trie.links[ch]
                if is_palindrome(word[j+1:]) and trie.index is not None \
                        and trie.index != i:
                    # if this word completes to a palindrome and the
                    # prefix is a word, complete it
                    res.append([i, trie.index])
            else:
                # this word is a reverse suffix of other words; combine
                # with those that complete to a palindrome
                for pali_index in trie.pali_indices:
                    if i != pali_index:
                        res.append([i, pali_index])
        if '' in words:
            j = words.index('')
            for i, word in enumerate(words):
                if i != j and is_palindrome(word):
                    res.append([i, j])
                    res.append([j, i])
        return res

Solution 2: Sorting with reversed copies. Moreover, there are always
more clever ways to solve these problems. Consider 'abcd': its prefixes are
'', 'a', 'ab', 'abc', 'abcd'. If a prefix is a palindrome, then the reverse of
the remaining part, placed in front, forms a palindrome, so we look for
that reverse among the words; storing the words together with their indices
makes them fast to find. Note that when considering suffixes, we explicitly
leave out the empty string to avoid counting duplicates: if a palindrome
can be created by appending an entire other word to the current word, then
we already consider such a palindrome when considering the empty string as
prefix for the other word.
class Solution(object):
    def palindromePairs(self, words):
        # flag 0 means the word is not reversed, 1 means it is reversed
        words, length, result = sorted([(w, 0, i, len(w))
                                        for i, w in enumerate(words)] +
                                       [(w[::-1], 1, i, len(w))
                                        for i, w in enumerate(words)]), \
            len(words) * 2, []

        # after sorting, equal strings are adjacent, one with flag 0
        # and one with flag 1
        for i, (word1, rev1, ind1, len1) in enumerate(words):
            for j in range(i + 1, length):
                word2, rev2, ind2, _ = words[j]
                if word2.startswith(word1):  # word2 might be longer
                    if ind1 != ind2 and rev1 ^ rev2:  # one reversed, one not
                        rest = word2[len1:]
                        if rest == rest[::-1]:
                            # if rev2 is the reversed one, the pair goes
                            # from ind1 to ind2
                            result += ([ind1, ind2],) if rev2 \
                                else ([ind2, ind1],)
                else:
                    # sorted order: once startswith fails, no later
                    # word can match, so break
                    break
        return result
There are several other data structures, like balanced trees and hash
tables, that give us the ability to search for a word in a dataset of
strings. Then why do we need a trie? Although a hash table has O(1) time
complexity for looking up a key, it is not efficient for the following
operations:

• Finding all keys with a common prefix.

• Enumerating a dataset of strings in lexicographical order.

Sorting Lexicographic sorting of a set of keys can be accomplished by
building a trie from them and traversing it in pre-order, printing the
stored words as their terminal nodes are reached. This algorithm is a form
of radix sort, which is why a trie is also called a radix tree.
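As a quick, hedged illustration of this use of tries, here is a minimal
sketch, independent of the palindrome-pairs Trie above; the end-of-word
marker '$' is an assumption of this sketch, and duplicate words are
deduplicated:

from collections import defaultdict


def trie_sort(words):
    def make_node():
        return defaultdict(make_node)

    root = make_node()
    for w in words:
        node = root
        for ch in w:
            node = node[ch]
        node['$'] = w  # store the word at its terminal node

    out = []

    def preorder(node):
        # visit children in character order; '$' sorts before letters,
        # so shorter words come out before their extensions
        for ch in sorted(node):
            if ch == '$':
                out.append(node['$'])
            else:
                preorder(node[ch])

    preorder(root)
    return out


print(trie_sort(["banana", "band", "app", "apple"]))
# ['app', 'apple', 'banana', 'band']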
Part VI

Math and Geometry

23

Math and Probability Problems

In this chapter, we will specifically talk about math-related problems.
Normally, the problems appearing in this chapter can be solved with the
programming methodology we have already learned. However, that might be
inefficient (we will get a Time Limit Exceeded (TLE) error on LeetCode),
because we would be ignoring the math properties that can boost efficiency.
Thus, learning some of the most relevant math knowledge can make our life
easier.

23.1 Numbers
23.1.1 Prime Numbers
A prime number is an integer greater than 1, which is only divisible by 1
and itself. First few prime numbers are : 2 3 5 7 11 13 17 19 23 ...
Some interesting facts about Prime numbers:

1. 2 is the only even Prime number.

2. 2 and 3 are the only two consecutive natural numbers that are both prime.

3. Every prime number except 2 and 3 can be represented in the form 6n + 1
   or 6n - 1, where n is a natural number.

4. Goldbach Conjecture: Every even integer greater than 2 can be expressed
   as the sum of two primes. Separately, the fundamental theorem of
   arithmetic states that every integer greater than 1 can be decomposed
   into a product of primes.

5. The GCD of a prime and any natural number not divisible by that prime is always one.


6. Fermat's Little Theorem: If n is a prime number, then for every a with
   1 <= a < n, a^(n-1) mod n = 1.

7. Prime Number Theorem : The probability that a given, randomly


chosen number n is prime is inversely proportional to its number of
digits, or to the logarithm of n.

Check Single Prime Number


Learning to check whether a number is prime is necessary. The naive
solution comes from the direct definition: for a number n, we check whether
it can be divided by any number in the range [2, n - 1]; if some number
divides it, then n is not a prime number.
def isPrime(n):
    # corner case
    if n <= 1:
        return False
    # check from 2 to n-1
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

There is actually a lot of room to optimize this algorithm. First, instead
of checking up to n, we only need to check up to sqrt(n), because a larger
factor of n must pair with a smaller factor that has already been checked.
Also, because even numbers bigger than 2 are not prime, we can set the step
to 2. The algorithm can be improved further by using fact 3 that all primes
are of the form 6k ± 1, with the exception of 2 and 3, together with the
fact that every composite integer is divisible by a prime number smaller
than itself. So a more efficient method is to test whether n is divisible by
2 or 3, and then check through all the numbers of the form 6k ± 1 up to
sqrt(n).
def isPrime(n):
    # corner cases
    if n <= 1:
        return False
    if n <= 3:
        return True

    if n % 2 == 0 or n % 3 == 0:
        return False

    # 6k-1 and 6k+1, step 6, up till sqrt(n); when i=5, check 5 and 7
    for i in range(5, int(n ** 0.5) + 1, 6):
        if n % i == 0 or n % (i + 2) == 0:
            return False
    return True

Generate A Range of Prime Numbers


Wilson's theorem says that a number k > 1 is prime if and only if
((k - 1)! + 1) % k is 0. Below is a Python implementation of this approach.
Note that the solution works in Python because Python supports big integers
by default, so the factorial of a large number can be computed.
# Wilson's Theorem
def primesInRange(n):
    fact = 1
    rst = []
    for k in range(2, n):
        fact *= (k - 1)
        if (fact + 1) % k == 0:
            rst.append(k)
    return rst

print(primesInRange(15))
# output
# [2, 3, 5, 7, 11, 13]

Sieve of Eratosthenes To generate a list of primes. It works by using the
fact that every composite number is divisible by a prime smaller than
itself, so crossing off the multiples of each prime leaves only primes. An
optimization is to keep only odd numbers in the sieve, which saves half the
space and half the time; the only difference is that we need to do index
mapping, as sketched after the code below.
def primesInRange(n):
    primes = [True] * n
    primes[0] = primes[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        # cross off remaining multiples of prime i, starting with i*i
        if primes[i]:
            for j in range(i * i, n, i):
                primes[j] = False
    rst = []  # or use sum(primes) to get the total count
    for i, p in enumerate(primes):
        if p:
            rst.append(i)
    return rst

print(primesInRange(15))
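The odd-only optimization mentioned above can be sketched as follows; here
index i of the boolean list represents the odd number 2i + 1, which is the
index mapping we referred to (a hedged sketch; the mapping can be laid out
in other ways):

def primesInRangeOdd(n):
    if n <= 2:
        return []
    size = n // 2  # sieve[i] represents 2*i + 1; sieve[0] is 1, not prime
    sieve = [True] * size
    sieve[0] = False
    for i in range(1, int(n ** 0.5) // 2 + 1):
        if sieve[i]:
            p = 2 * i + 1
            # the first useful composite is p*p, at index 2*i*(i + 1)
            for j in range(2 * i * (i + 1), size, p):
                sieve[j] = False
    return [2] + [2 * i + 1 for i in range(1, size) if sieve[i]]

print(primesInRangeOdd(15))  # [2, 3, 5, 7, 11, 13]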

23.1.2 Ugly Numbers


Ugly numbers are positive numbers whose prime factors only include 2, 3, 5.
We can write them as ugly = 2^i * 3^j * 5^k, with i >= 0, j >= 0, k >= 0.
Examples of ugly numbers: 1, 2, 3, 5, 6, 10, 15, ... The concept of an ugly
number is quite simple. Now let us use LeetCode problems as examples to
derive the algorithms that identify ugly numbers.

Check a Single Number


263. Ugly Number (Easy)
Ugly numbers are positive numbers whose prime factors only include
2, 3, 5. For example, 6, 8 are ugly while 14 is not ugly since it
includes another prime factor 7.

Note:
    1 is typically treated as an ugly number.
    Input is within the 32-bit signed integer range.

Analysis: because an ugly number is only divisible by 2, 3, and 5, we keep
dividing the number by each factor f (num /= f) while the remainder
(num % f) is 0. If the number is ugly we eventually reach 1; otherwise some
other prime factor remains.
def isUgly(self, num):
    """
    :type num: int
    :rtype: bool
    """
    if num == 0:
        return False
    factor = [2, 3, 5]
    for f in factor:
        while num % f == 0:
            num //= f  # integer division keeps num an int
    return num == 1

Generate A Range of Number


264. Ugly Number II (medium)
Write a program to find the n-th ugly number.

Ugly numbers are positive numbers whose prime factors only include
2, 3, 5. For example, 1, 2, 3, 4, 5, 6, 8, 9, 10, 12 is the sequence
of the first 10 ugly numbers.

Note that 1 is typically treated as an ugly number, and n does not
exceed 1690.

Analysis: The first solution uses the rule ugly = 2^i * 3^j * 5^k, with
i >= 0, j >= 0, k >= 0: three nested loops generate at least the required
1690 ugly numbers within the 32-bit range (2^32), and then we sort them. The
time complexity is O(n log n), with O(n) space. However, if we need to serve
requests constantly, it seems reasonable to save a table: once the table is
generated and saved, each query needs only constant time.
from math import log, ceil


class Solution:
    ugly = [2 ** i * 3 ** j * 5 ** k
            for i in range(32)
            for j in range(ceil(log(2 ** 32, 3)))
            for k in range(ceil(log(2 ** 32, 5)))]
    ugly.sort()

    def nthUglyNumber(self, n):
        """
        :type n: int
        :rtype: int
        """
        return self.ugly[n - 1]

The second way is to generate only the first n ugly numbers, with dynamic programming:
class Solution:
    n = 1690
    ugly = [1]
    i2 = i3 = i5 = 0
    for i in range(n - 1):
        u2, u3, u5 = 2 * ugly[i2], 3 * ugly[i3], 5 * ugly[i5]
        umin = min(u2, u3, u5)
        ugly.append(umin)
        if umin == u2:
            i2 += 1
        if umin == u3:
            i3 += 1
        if umin == u5:
            i5 += 1

    def nthUglyNumber(self, n):
        """
        :type n: int
        :rtype: int
        """
        return self.ugly[n - 1]

23.1.3 Combinatorics
1. 611. Valid Triangle Number

23.1 Pascal’s Triangle II(L119, *). Given a non-negative index k where


k <= 33, return the kth index row of the Pascal’s triangle. Note that
the row index starts from 0. In Pascal’s triangle, each number is the
sum of the two numbers directly above it.
Example:
Input: 3
Output: [1, 3, 3, 1]

Follow up: Could you optimize your algorithm to use only O(k) extra
space? Solution: Generate from Index 0 to K.
def getRow(self, rowIndex):
    if rowIndex == 0:
        return [1]
    ans = [1]
    for i in range(rowIndex):
        tmp = [1] * (i + 2)
        # each inner entry is the sum of the two entries above it
        for j in range(1, i + 1):
            tmp[j] = ans[j - 1] + ans[j]
        ans = tmp
    return ans

Triangle Counting
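For 611 (Valid Triangle Number), a standard approach, sketched here with no
claim to being the book's intended solution, is to sort the array and, for
each choice of the largest side, count the pairs of smaller sides with two
pointers, O(n^2) overall:

def triangleNumber(nums):
    nums.sort()
    count = 0
    # fix the largest side nums[k]; count pairs with nums[i] + nums[j] > nums[k]
    for k in range(len(nums) - 1, 1, -1):
        i, j = 0, k - 1
        while i < j:
            if nums[i] + nums[j] > nums[k]:
                count += j - i  # every index in [i, j) works with this j
                j -= 1
            else:
                i += 1
    return count

print(triangleNumber([2, 2, 3, 4]))  # 3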

Smallest Larger Number


556. Next Greater Element III
Given a positive 32-bit integer n, you need to find the smallest
32-bit integer which has exactly the same digits existing in the
integer n and is greater in value than n. If no such positive
32-bit integer exists, you need to return -1.

Example 1:

Input: 12
Output: 21

Example 2:

Input: 21
Output: -1

Analysis: The first solution is to get all digits, e.g. [1, 2], generate all
permutations [[1, 2], [2, 1]], convert them back to integers, and sort the
generated integers so that we can pick the next one that is larger. But the
time complexity is O(n!).
Now, let us think about more examples to find the rule here:
Now, let us think about more examples to find the rule here:
435798 -> 435879
1432   -> 2134

For each digit we look to its right and find the closest larger digit, that
is, the smallest digit that is still larger than it (the "smallest larger
element to the right"); if no such digit exists we record -1. Like for 21, no
position has one, so we return -1. For the two examples above, the smallest
larger elements are:

[5, 5, 7, 8, -1, -1]
[2, -1, -1, -1]

After this, starting from the right, the first position with a valid entry
is switched with its smallest larger element: we switch 7 with 8 (and 1 with
2), and get:

435897
2431

For the remaining digits after the switched position, we sort them and put
them back to get the smallest value.

import sys


class Solution:
    def getDigits(self, n):
        digits = []
        while n:
            digits.append(n % 10)  # the least significant position first
            n = n // 10
        return digits

    def getSmallestLargerElement(self, nums):
        # for each position, record the index of the smallest element to
        # its right that is larger than nums[i], or -1 if none exists
        if not nums:
            return []
        rst = [-1] * len(nums)
        for i, v in enumerate(nums):
            smallestLargerNum = sys.maxsize
            index = -1
            for j in range(i + 1, len(nums)):
                if nums[j] > v and smallestLargerNum > nums[j]:
                    index = j
                    smallestLargerNum = nums[j]
            if smallestLargerNum < sys.maxsize:
                rst[i] = index
        return rst

    def nextGreaterElement(self, n):
        """
        :type n: int
        :rtype: int
        """
        if n == 0:
            return -1

        digits = self.getDigits(n)
        digits = digits[::-1]

        rst = self.getSmallestLargerElement(digits)
        stop_index = -1

        # switch: from the right, find the first digit that has a
        # smallest larger element and swap the two
        for i in range(len(rst) - 1, -1, -1):
            if rst[i] != -1:
                stop_index = i
                digits[i], digits[rst[i]] = digits[rst[i]], digits[i]
                break
        if stop_index == -1:
            return -1

        # sort from stop_index+1 to the end to make the tail smallest
        digits[stop_index + 1:] = sorted(digits[stop_index + 1:])

        # convert the digit list back to an integer
        nums = 0
        digit = 1
        for i in digits[::-1]:
            nums += digit * i
            digit *= 10
        if nums > 2147483647:
            return -1

        return nums

23.2 Intersection of Numbers


In this section, the intersection of numbers means finding the "common"
thing between them, for example the Greatest Common Divisor and the Lowest
Common Multiple.

23.2.1 Greatest Common Divisor


GCD (Greatest Common Divisor) or HCF (Highest Common Factor) of two numbers
a and b is the largest number that divides both of them, as shown in the
following example:

The divisors of 36 are: 1, 2, 3, 4, 6, 9, 12, 18, 36
The divisors of 60 are: 1, 2, 3, 4, 5, 6, 10, 12, 15, 30, 60
GCD = 12

A special case is when one number is zero: the GCD is then the value of the
other, gcd(a, 0) = a.
The basic algorithm is to get all divisors of each number and then find the
largest common value. Now, let's see how we can improve this algorithm.
We can reformulate the last example as:
36 = 2 * 2 * 3 * 3
60 = 2 * 2 * 3 * 5
GCD = 2 * 2 * 3
    = 12

If we subtract, 60 - 36 = 2*2*3*5 - 2*2*3*3 = (2*2*3)*(5 - 3) = 2*2*3*2,
which still contains the GCD 2*2*3 as a factor. So we can derive the
principle that the GCD of two numbers does not change if the larger number
is replaced by its difference with the smaller number.
The features of GCD:

1. gcd(a, 0) = a

2. gcd(a, a) = a,

3. gcd(a, b) = gcd(a − b, b), if a > b.

Based on the above features, we can use Euclidean Algorithm to gain GCD:
def euclid(a, b):
    while a != b:
        # replace the larger number by its difference with the smaller one
        if a > b:
            a = a - b
        else:
            b = b - a
    return a

print(euclid(36, 60))

The only problem with this Euclidean algorithm is that it can take many
subtraction steps to find the GCD if one of the given numbers is much bigger
than the other. A more efficient version replaces the subtraction with the
remainder operation. The algorithm then stops when reaching a zero
remainder, and it never requires more steps than five times the number of
digits (base 10) of the smaller integer.
The recursive version code:
def euclidRemainder(a, b):
    if a == 0:
        return b
    return euclidRemainder(b % a, a)

The iterative version code:


def euclidRemainder(a, b):
    while a > 0:
        # replace one number with the remainder of the pair
        a, b = b % a, a
    return b

print(euclidRemainder(36, 60))

23.2.2 Lowest Common Multiple


The Lowest Common Multiple (LCM) is the smallest number that is a multiple
of both a and b. For example, for 6 and 8:

The multiples of 6 are: 6, 12, 18, 24, 30, ...
The multiples of 8 are: 8, 16, 24, 32, 40, ...
LCM = 24

Computing the LCM depends on the GCD via the following formula:

    lcm(a, b) = (a * b) / gcd(a, b)        (23.1)
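A minimal sketch using the iterative Euclidean function above; dividing
before multiplying avoids an unnecessarily large intermediate product:

def lcm(a, b):
    # a // gcd(a, b) is exact, so the product stays small
    return a // euclidRemainder(a, b) * b

print(lcm(6, 8))  # 24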

23.3 Arithmetic Operations


Because the computer only understands binary representations, as we learned
in Bit Manipulation (Chapter III), the most basic arithmetic operations it
supports are binary addition and subtraction (besides bit manipulation
itself). The other common arithmetic operations, such as multiplication,
division, modulus, and exponentiation, are all implemented with addition and
subtraction as the basis. As a software engineer, having a sense of how the
other operations can be implemented from this basis is reasonable and good
practice for our coding skills. Also, when a factor is an extremely large
number that the computer cannot represent natively, we can still compute the
result by treating the numbers as strings.
In this section, we explore multiplication and division. There are
different algorithms we can use; we study the standard ones, called long
multiplication and long division. I assume you know the algorithms and focus
on the implementation of the code instead.

Long Multiplication
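The details are left to the reader here; as a hedged illustration, the
following sketch multiplies two non-negative integers given as strings,
digit by digit, the way long multiplication is done by hand (position
i + j + 1 of the result accumulates the product of digits i and j):

def long_multiply(num1, num2):
    # an m-digit by n-digit product has at most m+n digits
    m, n = len(num1), len(num2)
    res = [0] * (m + n)
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            d = int(num1[i]) * int(num2[j]) + res[i + j + 1]
            res[i + j + 1] = d % 10  # digit
            res[i + j] += d // 10    # carry
    s = ''.join(map(str, res)).lstrip('0')
    return s if s else '0'

print(long_multiply('123', '456'))  # '56088'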

Long Division We treat the dividend as a string, e.g. dividend = 3456 and
divisor = 12. We start with 34, which has as many digits as the divisor:
34/12 = 2 remainder 10, where 2 is the integer part. Next, we join the
remainder with the next digit of the dividend and get 105/12 = 8 remainder
9. Similarly, 96/12 = 8 remainder 0. Therefore we get the result by joining
the quotient of each division step: '288'. To see the coding, let us code it
the way required by the following LeetCode problem. In the process we need
n - m division operations (n, m are the numbers of digits of the dividend
and divisor, respectively), and each division operation takes at most 9
steps. This makes the time complexity O(n - m).

23.2 29. Divide Two Integers (medium) Given two integers dividend
and divisor, divide two integers without using multiplication, division
and mod operator. Return the quotient after dividing dividend by
divisor. The integer division should truncate toward zero.
Example 1:

Input: dividend = 10, divisor = 3
Output: 3

Example 2:

Input: dividend = 7, divisor = -3
Output: -2

Analysis: we can get the sign of the result first, and then convert the
dividend and divisor to their absolute values. We should also handle the
boundary condition where the divisor is larger than the dividend, in which
case we return 0 directly. The code is given:

def divide(self, dividend, divisor):
    def divide(dd):  # find the largest q <= 9 with divisor * q <= dd
        s = 0
        for i in range(9):
            tmp = s + divisor
            if tmp <= dd:
                s = tmp
            else:
                return str(i), str(dd - s)
        return str(9), str(dd - s)

    if dividend == 0:
        return 0
    sign = -1
    if (dividend > 0 and divisor > 0) or (dividend < 0 and divisor < 0):
        sign = 1
    dividend = abs(dividend)
    divisor = abs(divisor)
    if divisor > dividend:
        return 0
    ans, did, dr = [], str(dividend), str(divisor)
    n = len(dr)
    pre = did[:n - 1]
    for i in range(n - 1, len(did)):
        dd = int(pre + did[i])  # join the remainder with the next digit
        v, pre = divide(dd)
        ans.append(v)

    ans = int(''.join(ans)) * sign

    if ans > (1 << 31) - 1:
        ans = (1 << 31) - 1
    return ans

23.4 Probability Theory

In programming tasks, such problems are either solvable with some
closed-form formula, or one has no choice but to enumerate the complete
search space.
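As a tiny illustration of the enumeration route (the closed-form answer
6/36 = 1/6 is easy to verify here), we can compute the probability that two
fair dice sum to 7 by walking the whole sample space; the example itself is
ours, not tied to any particular problem:

from fractions import Fraction
from itertools import product

# enumerate the full 36-outcome sample space of two dice
outcomes = list(product(range(1, 7), repeat=2))
favorable = sum(1 for a, b in outcomes if a + b == 7)
print(Fraction(favorable, len(outcomes)))  # 1/6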

23.5 Linear Algebra


Gaussian elimination is one of several ways to find the solution of a
system of linear equations.
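A minimal sketch of Gaussian elimination with partial pivoting, solving
Ax = b for a square, non-singular A (assumed; there is no handling for
singular systems here):

def gaussian_elimination(A, b):
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        # partial pivoting: pick the row with the largest pivot
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c]
                              for c in range(r + 1, n))) / M[r][r]
    return x

print(gaussian_elimination([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
# ~[0.8, 1.4]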

23.6 Geometry
In this section, we will discuss coordinate related problems.
939. Minimum Area Rectangle(Medium)
Given a set of points in the xy-plane, determine the minimum area of a
rectangle formed from these points, with sides parallel to the x and y axes.
If there isn’t any rectangle, return 0.
Example 1:

Input: [[1,1],[1,3],[3,1],[3,3],[2,2]]
Output: 4

Example 2:

Input: [[1,1],[1,3],[3,1],[3,3],[4,1],[4,3]]
Output: 2

Combination. At first this looks like a combination problem: we pick four
points, check whether they form a rectangle, and compute its size. However,
the time complexity is C(n, 4), which is O(n^4). The following code
implements the best pruned combination search we could get; we still receive
a TLE:
import sys


def minAreaRect(self, points):
    def combine(points, idx, curr, ans):
        if len(curr) >= 2:
            lx, rx = min([x for x, _ in curr]), max([x for x, _ in curr])
            ly, hy = min([y for _, y in curr]), max([y for _, y in curr])
            size = (rx - lx) * (hy - ly)
            if size >= ans[0]:
                return
            xs = [lx, rx]
            ys = [ly, hy]
            # every chosen point must sit on a corner of the bounding box
            for x, y in curr:
                if x not in xs or y not in ys:
                    return

            if len(curr) == 4:
                ans[0] = min(ans[0], size)
                return

        for i in range(idx, len(points)):
            if len(curr) <= 3:
                combine(points, i + 1, curr + [points[i]], ans)
        return

    ans = [sys.maxsize]
    combine(points, 0, [], ans)
    return ans[0] if ans[0] != sys.maxsize else 0

Math: a diagonal decides a rectangle. We use the fact that if we know the
two diagonal points, say (1, 2) and (3, 4), then we need (1, 4) and (3, 2)
to make it a rectangle. If we save the points in a hash set, the time
complexity decreases to O(n^2). The condition for two points to be diagonal
is x1 != x2 and y1 != y2: if one coordinate is equal, they form a vertical
or horizontal line; if both are equal, they are the same point.
class Solution(object):
    def minAreaRect(self, points):
        S = set(map(tuple, points))
        ans = float('inf')
        for j, p2 in enumerate(points):  # decide the second point
            for i in range(j):           # decide the first point
                p1 = points[i]
                if (p1[0] != p2[0] and p1[1] != p2[1] and  # truly diagonal
                        (p1[0], p2[1]) in S and (p2[0], p1[1]) in S):
                    ans = min(ans,
                              abs(p2[0] - p1[0]) * abs(p2[1] - p1[1]))
        return ans if ans < float('inf') else 0

Math: sort by column. Group the points by x coordinate, so that we have
columns of points. Then, for every pair of points in a column (with
coordinates (x, y1) and (x, y2)), check for the smallest rectangle with this
pair of points as the rightmost edge. We can do this by keeping memory of
which pairs of points we have seen before.
import collections


def minAreaRect(self, points):
    columns = collections.defaultdict(list)
    for x, y in points:
        columns[x].append(y)
    lastx = {}  # one-pass hash: (y1, y2) -> last x where this pair was seen
    ans = float('inf')

    for x in sorted(columns):  # sort by the keys
        column = columns[x]
        column.sort()
        for j, y2 in enumerate(column):  # rightmost edge, upper point
            for i in range(j):           # rightmost edge, lower point
                y1 = column[i]
                if (y1, y2) in lastx:
                    # e.g. 1: [1, 3] is saved; when we reach 3: [1, 3],
                    # we can compute the answer
                    ans = min(ans, (x - lastx[y1, y2]) * (y2 - y1))
                lastx[y1, y2] = x  # y1, y2 form a tuple key
    return ans if ans < float('inf') else 0

23.7 Miscellaneous Categories


23.7.1 Floyd’s Cycle-Finding Algorithm
Without Floyd's algorithm, we can detect a cycle with extra space, using a set of visited nodes:
def detectCycle(self, A):
    visited = set()
    point = A
    while point:
        if point in visited:  # check the node itself, not its value
            return point
        visited.add(point)
        point = point.next
    return None

Traverse the linked list using two pointers, moving one pointer by one step
and the other by two. If the pointers meet at some node, there is a loop; if
the fast pointer reaches the end, the linked list has no loop. Once you
detect a cycle, think about finding its starting point: let a be the
distance from the head to the cycle entrance, b the distance from the
entrance to the meeting point, and c the cycle length. The fast pointer
travels twice the distance of the slow one, so 2(a + b) = a + b + kc for
some integer k, which gives a = kc - b. Therefore, walking one pointer from
the head and another from the meeting point one step at a time, they meet
exactly at the entrance.

Figure 23.1: Example of floyd’s cycle finding

def detectCycle(self, A):
    # phase 1: find the "intersection" inside the cycle
    p_f = p_s = A
    has_cycle = False
    while p_f and p_f.next:
        p_f = p_f.next.next
        p_s = p_s.next
        if p_f == p_s:
            has_cycle = True
            break
    if not has_cycle:
        return None
    # phase 2: find the "entrance" to the cycle
    ptr1 = A
    ptr2 = p_s
    while ptr1 != ptr2:
        ptr1 = ptr1.next
        ptr2 = ptr2.next
    return ptr1

23.8 Exercise
23.8.1 Number
313. Super Ugly Number
Super ugly numbers are positive numbers whose prime factors are all
in the given prime list primes of size k. For example,
[1, 2, 4, 7, 8, 13, 14, 16, 19, 26, 28, 32] is the sequence of the
first 12 super ugly numbers given primes = [2, 7, 13, 19] of size 4.

Note:
(1) 1 is a super ugly number for any given primes.
(2) The given numbers in primes are in ascending order.
(3) 0 < k <= 100, 0 < n <= 10^6, 0 < primes[i] < 1000.
(4) The nth super ugly number is guaranteed to fit in a 32-bit
    signed integer.

from sys import maxsize


def nthSuperUglyNumber(self, n, primes):
    """
    :type n: int
    :type primes: List[int]
    :rtype: int
    """
    nums = [1]
    idxs = [0] * len(primes)  # current index into nums for each prime
    for i in range(n - 1):
        min_v = maxsize
        min_j = []
        for j, idx in enumerate(idxs):
            v = nums[idx] * primes[j]
            if v < min_v:
                min_v = v
                min_j = [j]
            elif v == min_v:
                min_j.append(j)  # several j can tie
        nums.append(min_v)
        for j in min_j:
            idxs[j] += 1
    return nums[-1]
Part VII

Problem-Patterns

24

Array Questions(15%)

In this chapter, we mainly discuss array-based questions. We first
categorize these problems into different types, and then each type can
usually be solved and optimized with nearly the best efficiency.
Given an array, a subsequence is composed of elements whose subscripts are
increasing in the original array. A subarray is a special case of a
subsequence that is contiguous. A subset contains any possible combination
of elements of the original array. For example, for the array [1, 2, 3, 4]:
Subsequence
[1 , 3]
[1 , 4]
[1 , 2 , 4]
Subarray
[1 , 2]
[2 , 3]
[2 , 3 , 4]
Subset i n c l u d e s d i f f e r e n t length o f subset , e i t h e r
length 0: [ ]
length 1: [ 1 ] , [ 2 ] , [ 3 ] , [ 4 ]
length 2: [1 , 2] , [1 , 3] , [1 , 4] , [2 , 3] , [2 , 4] , [3 , 4]
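To make the distinction concrete, here is a small sketch that enumerates all
subarrays and all subsets with itertools (subsequences are exactly the
non-empty subsets kept in index order):

from itertools import combinations

arr = [1, 2, 3, 4]
# subarrays: contiguous, n*(n+1)/2 of them
subarrays = [arr[i:j] for i in range(len(arr))
             for j in range(i + 1, len(arr) + 1)]
# subsets: any combination of indices, 2^n of them (including [])
subsets = [list(c) for r in range(len(arr) + 1)
           for c in combinations(arr, r)]
print(len(subarrays), len(subsets))  # 10 16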

Here, array means a one-dimensional list. For array problems, math will play
an important role. The rules of thumb are as follows:

• Subarray: use dynamic-programming-based algorithms to bring the brute
  force O(n^3) down to O(n): two pointers for the increasing subarray;
  prefix sum or Kadane's algorithm, sometimes together with a hashmap, or
  two pointers (three pointers) for the maximum subarray.

• Subsequence: use dynamic-programming-based algorithms to bring the brute
  force O(2^n) down to O(n^2), which corresponds to the sequence type of
  dynamic programming.

• Duplicates: 217, 26, 27, 219, 287, 442;

• Intersections of Two Arrays:

Before we get into solving each type of problem, we first introduce the
algorithms we will need in this chapter, including two pointers (three
pointers or sliding window), prefix sum, and Kadane's algorithm. Kadane's
algorithm can be explained as the sequence type of dynamic programming.
After this chapter, we should know the steps to solve these problems:

1. Analyze the problem and categorize it. Knowing the naive solution's
   time complexity can help us identify the type.

2. If we cannot find the type, let us see if we can convert the problem. If
   not, we can try to identify a simpler version of this problem, and then
   upgrade the simple solution to the more complex one.

3. Solve the problem with the algorithms taught in this chapter.

4. Try to see if there are any more solutions.

5. Check the special cases. (Usually very important for this type of
   problem.)

24.1 Subarray
Note: for subarrays, the most important feature is contiguity; here we
definitely will not use sorting. Given an array of size n, the total number
of subarrays is 1 + 2 + ... + n = n(n + 1)/2, which makes the time
complexity of the naive solution that uses two nested for/while loops
O(n^2), or O(n^3) when each subarray additionally needs a linear scan.
There are two types of problems related to subarrays: Range Query and
optimization-based subarray. Range query problems include querying the
minimum/maximum or the sum of all elements in a given range [i, j] of an
array. Range query has a more standard way to solve, either by searching or
with a segment tree; a prefix-sum sketch for the simplest case follows the
problem list below:

Range Query

1. 303. Range Sum Query - Immutable

2. 307. Range Sum Query - Mutable

3. 304. Range Sum Query 2D - Immutable
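For 303 (Range Sum Query - Immutable), for instance, precomputing a
prefix-sum array answers each range-sum query in O(1); a minimal sketch:

class NumArray:
    def __init__(self, nums):
        self.ps = [0]  # ps[i] holds the sum of nums[:i]
        for v in nums:
            self.ps.append(self.ps[-1] + v)

    def sumRange(self, i, j):  # inclusive range [i, j]
        return self.ps[j + 1] - self.ps[i]

na = NumArray([-2, 0, 3, -5, 2, -1])
print(na.sumRange(0, 2), na.sumRange(2, 5))  # 1 -1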



Optimization-based subarray Given a single array, we would normally be
asked to return the maximum/minimum value, the maximum/minimum length, or
the number of subarrays whose sum/product satisfies a certain condition.
The condition decides the difficulty of these problems.
The questions can be classified into two categories:

1. Absolute-conditioned subarray, where the sum/product = K, or

2. Vague-conditioned subarray, where the condition is an inequality (e.g.,
   sum >= K).

With the proposed algorithms, the time complexity of subarray problems can
be decreased from the brute force O(n^3) to O(n). The brute force is
universal: two nested for loops mark the start and the end of the subarray
to enumerate all possible subarrays, and another O(n) is spent computing the
required result (sum or product, or checking a pattern like increasing or
decreasing).
As we have discussed in the algorithm section,

1. a stack/queue/monotone stack can be used to solve subarray problems
   that relate each item to the nearest smaller/larger item on its
   left/right side;

2. a sliding window can be used to find a subarray when the sum or product
   inside the window changes monotonically as the window grows or shrinks;
   this normally requires that the array elements are all positive or all
   negative, so that the sliding window covers the whole search space;
   otherwise we cannot use a sliding window;

3. for all problems related to subarray sum/product, whether vague- or
   absolute-conditioned, we have a universal algorithm: save the prefix
   sums (sometimes together with their indices) in a sorted array, and use
   binary search to find all possible starting points of the window;

4. prefix sum or Kadane's algorithm can be used when we need the sum of
   the subarray.

1. 53. Maximum Subarray (medium)

2. 325. Maximum Size Subarray Sum Equals k

3. 525. Contiguous Array

4. 560. Subarray Sum Equals K

5. 209. Minimum Size Subarray Sum (medium)

Monotone stack and vague-conditioned subarray:



1. 713. Subarray Product Less Than K (all positive)

2. 862. Shortest Subarray with Sum at Least K (with negative)

3. 907. Sum of Subarray Minimums (all positive; the minimum of every
   subarray, summed)

24.1.1 Absolute-conditioned Subarray


For the maximum subarray, you are asked to return either:

1. the maximum sum or product, solved using prefix sum or Kadane's
   algorithm;

2. the maximum length of a subarray with sum or product S equal to K,
   solved using prefix sum together with a hashmap that saves each previous
   prefix sum and the first index where it appears;

3. the maximum number (the total count) of subarrays with sum or product S
   equal to K, solved using prefix sum together with a hashmap that saves
   each previous prefix sum and its count.

Maximum/Minimum sum or product

24.1 53. Maximum Subarray (medium). Find the contiguous subarray within
an array (containing at least one number) which has the largest sum.

For example, given the array [-2, 1, -3, 4, -1, 2, 1, -5, 4],
the contiguous subarray [4, -1, 2, 1] has the largest sum = 6.

Solution: Brute force uses two for loops, the first marking the start and
the second the end; with O(n) to compute each sum, the brute force is
O(n^3). To optimize, we can use divide and conquer, O(n log n); the
divide-and-conquer method was shown in that chapter. A more efficient
algorithm uses the prefix sum; please check Section ?? for the answer (a
hedged sketch of the closely related Kadane's algorithm is also given after
the sliding-window code below).
Now what is the sliding window solution? The key step in a sliding window
is deciding when to move the left pointer (shrinking the window). The
window must include the current element j. For the maximum subarray, to
increase the sum of the window, we abandon the previous elements whenever
they have a negative sum.
from sys import maxsize


class Solution:
    def maxSubArray(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        if not nums:
            return 0
        i, j = 0, 0  # i <= j
        maxValue = -maxsize
        window_sum = 0
        while j < len(nums):
            window_sum += nums[j]
            j += 1
            maxValue = max(maxValue, window_sum)
            while i < j and window_sum < 0:
                window_sum -= nums[i]
                i += 1
        return maxValue
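For comparison, here is a minimal sketch of Kadane's algorithm mentioned
above, a sequence-type dynamic program in which dp is the best sum of a
subarray ending at the current element:

def maxSubArray(nums):
    dp = ans = nums[0]
    for v in nums[1:]:
        dp = max(v, dp + v)  # extend the previous subarray or restart here
        ans = max(ans, dp)
    return ans

print(maxSubArray([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6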

Maximum/Minimum length of subarray with sum or product S


For this type of problem we need to track the length of it.

24.2 325. Maximum Size Subarray Sum Equals k.


Given an array nums and a target value k, find the maximum length of
a subarray that sums to k. If there isn’t one, return 0 instead. Note:
The sum of the entire nums array is guaranteed to fit within the 32-bit
signed integer range.
Example 1:
Given nums = [1, -1, 5, -2, 3], k = 3,
return 4. (because the subarray [1, -1, 5, -2] sums to 3
and is the longest)

Example 2:
Given nums = [-2, -1, 2, 1], k = 1,
return 2. (because the subarray [-1, 2] sums to 1 and is
the longest)

Follow Up:
Can you do it in O(n) time?

Solution: Prefix Sum Saved in a Hashmap. The brute force solution of
this problem is the same as for the maximum subarray. Let y_i denote the
prefix sum of the first i elements, so the sum of subarray (i, j) is
S(i, j) = y_j - y_(i-1). We only need to track the value S(i, j) = k, i.e.
y_(i-1) = y_j - k, the current prefix sum minus k. We use a hashmap to save
each prefix sum together with the first index where it appears, i.e.
(y_i, first_index), so that max_len = max(idx - dict[y_j - k]).
def maxSubArrayLen(self, nums, k):
    """
    :type nums: List[int]
    :type k: int
    :rtype: int
    """
    prefix_sum = 0
    first_idx = {0: -1}  # prefix sum 0 occurs at index -1
    max_len = 0
    for idx, n in enumerate(nums):
        prefix_sum += n
        # save each prefix sum with the first index where it appears
        if prefix_sum not in first_idx:
            first_idx[prefix_sum] = idx
        # track the maximum length so far
        if prefix_sum - k in first_idx:
            max_len = max(max_len, idx - first_idx[prefix_sum - k])
    return max_len

Another example that asks for a pattern but can be converted to, and is
equivalent to, the last problem:

24.3 525. Contiguous Array. Given a binary array, find the maximum
length of a contiguous subarray with equal number of 0 and 1. Note:
The length of the given binary array will not exceed 50,000.
Example 1:

Input: [0, 1]
Output: 2
Explanation: [0, 1] is the longest contiguous subarray with
an equal number of 0 and 1.

Example 2:

Input: [0, 1, 0]
Output: 2
Explanation: [0, 1] (or [1, 0]) is a longest contiguous
subarray with an equal number of 0 and 1.

Solution: the problem is equivalent to finding the maximum-length subarray
with sum == 0 after mapping 0 to -1 and keeping 1 as 1. Here our k = 0.
def findMaxLength(self, nums):
    """
    :type nums: List[int]
    :rtype: int
    """
    nums = [1 if v == 1 else -1 for v in nums]

    max_len = 0
    cur_sum = 0
    mapp = {0: -1}

    for idx, v in enumerate(nums):
        cur_sum += v
        if cur_sum in mapp:
            max_len = max(max_len, idx - mapp[cur_sum])
        else:
            mapp[cur_sum] = idx

    return max_len

24.4 674. Longest Continuous Increasing Subsequence Given an un-


sorted array of integers, find the length of longest continuous increasing
subsequence (subarray).
Example 1:
Input: [1, 3, 5, 4, 7]
Output: 3
Explanation: The longest continuous increasing subsequence
is [1, 3, 5], its length is 3.
Even though [1, 3, 5, 7] is also an increasing subsequence,
it is not a continuous one, as 5 and 7 are separated by 4.

Example 2:
Input: [2, 2, 2, 2, 2]
Output: 1
Explanation: The longest continuous increasing subsequence
is [2], its length is 1.
Note: Length of the array will not exceed 10,000.

Solution: The description of this problem should use "subarray" instead of
"subsequence". The brute force solution is like any subarray problem,
O(n^3): two nested for loops to enumerate the subarrays, and another O(n)
to check that each is strictly increasing. Using two pointers, we get O(n)
time complexity. We put pointer i at the first element while j scans the
array; we restrict the subarray from i to j to be increasing, and whenever
this is violated, we reset the starting point of the subarray to the
violating position.
class Solution:
    def findLengthOfLCIS(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        if not nums:
            return 0
        if len(nums) == 1:
            return 1
        i, j = 0, 0
        max_length = 0
        while j < len(nums):
            j += 1  # slide the window
            max_length = max(max_length, j - i)
            # when the condition is violated, reset the window
            if j < len(nums) and nums[j - 1] >= nums[j]:
                i = j

        return max_length

24.5 209. Minimum Size Subarray Sum (medium) Given an array


of n positive integers and a positive integer s, find the minimal length
of a contiguous subarray of which the sum >= s. If there isn’t one,
return 0 instead.
Example:

Input: s = 7, nums = [2, 3, 1, 2, 4, 3]
Output: 2
Explanation: the subarray [4, 3] has the minimal length
under the problem constraint.

Solution 1: Sliding Window, O(n).


def minSubArrayLen(self, s, nums):
    ans = float('inf')
    n = len(nums)
    i = j = 0
    acc = 0
    while j < n:
        acc += nums[j]  # increase the window size
        while acc >= s:  # shrink the window to get the optimal result
            ans = min(ans, j - i + 1)
            acc -= nums[i]
            i += 1
        j += 1
    return ans if ans != float('inf') else 0

Solution 2: prefix sum and binary search, O(n log n). Let the current
prefix sum be p_i. We need to find the largest p_j <= p_i - s, which is the
rightmost value in the prefix-sum array (sorted, since all items are
positive) that is <= p_i - s.
from bisect import bisect_right


class Solution(object):
    def minSubArrayLen(self, s, nums):
        ans = float('inf')
        n = len(nums)
        j = 0
        ps = [0]
        while j < n:
            ps.append(nums[j] + ps[-1])
            # find a possible left index
            if ps[-1] - s >= 0:
                index = bisect_right(ps, ps[-1] - s)
                if index > 0:
                    index -= 1
                ans = min(ans, j - index + 1)
            j += 1
        return ans if ans != float('inf') else 0

The maximum number of subarray with sum or product S

24.6 560. Subarray Sum Equals K Given an array of integers and an


integer k, you need to find the total number of continuous subarrays
whose sum equals to k.
Example 1:
Input: nums = [1, 1, 1], k = 2
Output: 2

Answer: The naive solution enumerates all possible subarrays, which is
O(n^2), and computes and checks each sum in O(n), so the total time
complexity is O(n^3). We can decrease it to O(n^2) if we compute the sums
differently: first compute the prefix sum up to each index, then use the
equation sum(i, j) = sum(0, j) - sum(0, i). However, the OJ still gives us a
TLE error.
def subarraySum(self, nums, k):
    """
    :type nums: List[int]
    :type k: int
    :rtype: int
    """
    '''return the number of subarrays whose sum equals k'''
    count = 0
    sums = [0] * (len(nums) + 1)  # prefix sum up to each index
    for idx, v in enumerate(nums):
        sums[idx + 1] = sums[idx] + v
    for i in range(len(nums)):
        for j in range(i, len(nums)):
            value = sums[j + 1] - sums[i]
            count = count + 1 if value == k else count
    return count

Solution 2: prefix sum and hashmap; we just need to reformulate what the
hashmap stores. For this question we need the total number of subarrays, so
we store dict[prefix_sum] = count, which means every time a prefix sum
appears we increment its counter, with dict[0] = 1.
import collections


class Solution(object):
    def subarraySum(self, nums, k):
        """
        :type nums: List[int]
        :type k: int
        :rtype: int
        """
        '''return the number of subarrays whose sum equals k'''
        counts = collections.defaultdict(int)  # occurrences of each sum
        counts[0] = 1
        prefix_sum, count = 0, 0
        for v in nums:
            prefix_sum += v
            # count the subarrays ending here with sum k; default is 0
            count += counts[prefix_sum - k]
            # update the count of this prefix sum
            counts[prefix_sum] += 1
        return count

24.7 974. Subarray Sums Divisible by K. Given an array A of integers,


return the number of (contiguous, non-empty) subarrays that have a
sum divisible by K.
Example 1:

Input: A = [4, 5, 0, -2, -3, 1], K = 5
Output: 7
Explanation: There are 7 subarrays with a sum divisible by K = 5:
[4, 5, 0, -2, -3, 1], [5], [5, 0], [5, 0, -2, -3], [0],
[0, -2, -3], [-2, -3]

Analysis: for the above array, we can compute the prefix sums
[0, 4, 9, 9, 7, 4, 5]: let P[i+1] = A[0] + A[1] + ... + A[i]. Then each
subarray sum can be written as P[j] - P[i] (for j > i). For the current
index j, we need an i with (P[j] - P[i]) % K == 0, i.e.
P[j] % K == P[i] % K. Therefore, different from the sum == K case, we do
not check P[j] - K in the hashmap but P[j] % K: we save the prefix sums
modulo K. For the example, we end up with the dict {0: 2, 4: 4, 2: 1}.
from collections import defaultdict


class Solution:
    def subarraysDivByK(self, A, K):
        """
        :type A: List[int]
        :type K: int
        :rtype: int
        """
        a_sum = 0
        p_dict = defaultdict(int)
        p_dict[0] = 1  # the empty prefix has remainder 0
        ans = 0
        for i, v in enumerate(A):
            a_sum += v
            a_sum %= K
            if a_sum in p_dict:
                ans += p_dict[a_sum]
            p_dict[a_sum] += 1  # save the remainder instead of the raw sum
        return ans

Solution 2: use combinations. Taking P = [0, 4, 9, 9, 7, 4, 5] modulo K
gives the counts C_0 = 2, C_2 = 1, C_4 = 4. C_0 = 2 (at P[0] and P[6])
indicates C(2, 2) = 1 subarray with sum divisible by K, namely
A[0:6] = [4, 5, 0, -2, -3, 1]. C_4 = 4 (at P[1], P[2], P[3], P[5])
indicates C(4, 2) = 6 subarrays with sum divisible by K, namely A[1:2],
A[1:3], A[1:5], A[2:3], A[2:5], A[3:5].
def subarraysDivByK(self, A, K):
    P = [0]
    for x in A:
        P.append((P[-1] + x) % K)

    count = collections.Counter(P)
    return sum(v * (v - 1) // 2 for v in count.values())

24.1.2 Vague-conditioned subarray


In this section, we are asked the same type of questions as in the last
section; the only difference is the condition. For example, in the
following question we ask for subarrays with sum >= s.
Because of the vagueness of the condition, a hashmap + prefix sum solution
will no longer give us O(n) linear time. The best we can do, if the array
contains only positive numbers, is O(n log n) when combined with binary
search. However, a carefully designed sliding window can still help us
achieve linear time O(n). For an array with negative numbers, we can
utilize the monotonic queue mentioned in Section 21.1, which achieves O(n)
in both time and space complexity.

All Positive Array (Sliding Window) If the array is all positive, the
problem can still easily be solved with a sliding window. For example:
24.8 209. Minimum Size Subarray Sum (medium) Given an array
of n positive integers and a positive integer s, find the minimal length
of a contiguous subarray of which the sum >= s. If there isn’t one,
return 0 instead.
Example:
Input: s = 7, nums = [2, 3, 1, 2, 4, 3]
Output: 2
Explanation: the subarray [4, 3] has the minimal length
under the problem constraint.

Follow up: If you have figured out the O(n) solution, try coding another
solution of which the time complexity is O(n log n).
Analysis. For this problem, we can still use prefix sums saved in a
hashmap. However, since the condition is sum >= s, we would need to search
through the hashmap for every key <= prefix_sum - s. With linear search the
time complexity rises to O(n^2), and we receive a TLE error.
import sys
import collections


def minSubArrayLen(self, s, nums):
    """
    :type s: int
    :type nums: List[int]
    :rtype: int
    """
    if not nums:
        return 0
    d = collections.defaultdict(int)
    d[0] = -1  # prefix sum 0 with index -1
    prefixSum = 0
    minLen = sys.maxsize
    for idx, n in enumerate(nums):
        prefixSum += n
        # linear search over all saved prefix sums: O(n^2) overall
        for key, value in d.items():
            if key <= prefixSum - s:
                minLen = min(minLen, idx - value)
        d[prefixSum] = idx  # save the last index
    return minLen if 1 <= minLen <= len(nums) else 0

Solution 1: Prefix Sum and Binary Search. Because the items in the array
are all positive numbers, the prefix-sum array is increasing; since it is
ordered, we can use binary search to find the index of the largest value
<= (prefix sum - s). With the bisect module, bisect_right finds the
rightmost position at which the current value could be inserted to keep the
array ordered, so the index we want is rr - 1.
import bisect


def minSubArrayLen(self, s, nums):
    ps = [0]
    ans = len(nums) + 1
    for i, v in enumerate(nums):
        ps.append(ps[-1] + v)
        # find the rightmost position with prefix sum <= ps[i+1] - s
        rr = bisect.bisect_right(ps, ps[i + 1] - s)
        if rr:
            ans = min(ans, i + 1 - (rr - 1))
    return ans if ans <= len(nums) else 0

def minSubArrayLen(self, s, nums):
    """
    :type s: int
    :type nums: List[int]
    :rtype: int
    """
    def bSearch(nums, i, j, target):
        while i < j:
            mid = (i + j) // 2
            if nums[mid] == target:
                return mid
            elif nums[mid] < target:
                i = mid + 1
            else:
                j = mid - 1
        return i

    if not nums:
        return 0
    rec = [0] * len(nums)
    rec[0] = nums[0]
    if rec[0] >= s:
        return 1
    minlen = len(nums) + 1
    for i in range(1, len(nums)):
        rec[i] = rec[i - 1] + nums[i]
        if rec[i] >= s:
            index = bSearch(rec, 0, i, rec[i] - s)
            if rec[index] > rec[i] - s:
                index -= 1
            minlen = min(minlen, i - index)
    return minlen if minlen != len(nums) + 1 else 0

Solution 2: Sliding window in O(n). Using the sliding window, once the
sum in the window satisfies the condition, we keep shrinking the window
(moving the left pointer rightward) until the condition no longer holds.
This way, we achieve O(n) complexity.
def minSubArrayLen(self, s, nums):
    i, j = 0, 0
    sum_in_window = 0
    ans = len(nums) + 1
    while j < len(nums):
        sum_in_window += nums[j]
        j += 1
        # shrink the window while the condition is satisfied
        while i < j and sum_in_window >= s:
            ans = min(ans, j - i)
            sum_in_window -= nums[i]
            i += 1
    return ans if ans <= len(nums) else 0

24.9 713. Subarray Product Less Than K You are given an array of positive
integers nums. Count and print the number of (contiguous) subarrays where
the product of all the elements in the subarray is strictly less than k.
Example 1:
Input: nums = [10, 5, 2, 6], k = 100
Output: 8
Explanation: The 8 subarrays that have product less than
100 are: [10], [5], [2], [6], [10, 5], [5, 2], [2, 6],
[5, 2, 6].

Note that [10, 5, 2] is not included as the product of 100
is not strictly less than k.
Note:
0 < nums.length <= 50000.
0 < nums[i] < 1000.
0 <= k < 10^6.

Answer: Because we need subarrays with product less than k, prefix sums are
difficult to use. If we use a sliding window:

i=0, j=0, product 10 < 100, ans += j-i+1 (1) -> [10]
i=0, j=1, product 50 < 100, ans += j-i+1 (3) -> [5], [10, 5]
i=0, j=2, product 100: shrink the window to i=1, product = 10,
          ans += 2 -> [2], [5, 2]
i=1, j=3, product 60, ans += 3 -> [6], [2, 6], [5, 2, 6]
The python code:


class Solution:
    def numSubarrayProductLessThanK(self, nums, k):
        """
        :type nums: List[int]
        :type k: int
        :rtype: int
        """
        if not nums:
            return 0
        i, j = 0, 0
        window_product = 1
        ans = 0
        while j < len(nums):
            window_product *= nums[j]

            while i < j and window_product >= k:
                window_product //= nums[i]  # integer division stays exact
                i += 1
            if window_product < k:
                ans += j - i + 1
            j += 1
        return ans

Array with Negative Elements (Monotonic Queue) In this section, we work
through how to handle a vague-conditioned problem on an array with negative
elements. We find that a monotonic queue or stack (Section 21.1) fits this
scenario and gives O(n) time and O(n) space complexity.

24.10 862. Shortest Subarray with Sum at Least K Return the length of the
shortest, non-empty, contiguous subarray of A with sum at least K.
If there is no non-empty subarray with sum at least K, return -1.

Example 1:
Input: A = [1], K = 1
Output: 1

Example 2:
Input: A = [1, 2], K = 4
Output: -1

Example 3:
Input: A = [2, -1, 2], K = 3
Output: 3

Note: 1 <= A.length <= 50000, -10^5 <= A[i] <= 10^5, 1 <= K <= 10^9.
Analysis: The only difference of this problem compared with the last is the
negative values. Because of the negatives, the shrinking method no longer
works: when we shrink the window, the sum in the smaller window might even
grow if we just cut out a negative value. For instance, with
[84, -37, 32, 40, 95] and K = 167, the right answer is [32, 40, 95], but
the plain sliding window ends with i = 0, j = 4. So how do we handle the
negative values?
Solution 1: prefix sum and binary search over the prefix sums. TLE.
def shortestSubarray(self, A, K):
    def bisect_right(lst, target):
        l, r = 0, len(lst) - 1
        while l <= r:
            mid = l + (r - l) // 2
            if lst[mid][0] <= target:
                l = mid + 1
            else:
                r = mid - 1
        return l

    acc = 0
    ans = float('inf')
    prefixSum = [(0, -1)]  # (value, index)
    for i, n in enumerate(A):
        acc += n
        index = bisect_right(prefixSum, acc - K)
        for j in range(index):
            ans = min(ans, i - prefixSum[j][1])
        index = bisect_right(prefixSum, acc)
        prefixSum.insert(index, (acc, i))
    return ans if ans != float('inf') else -1

Now, let us analyze a simple example that includes both 0 and a negative
number: [2, -1, 2, 0, 1] with K = 3 has prefix sums P = [0, 2, 1, 3, 3, 4],
and the subarrays with sum at least three include [2, -1, 2], [2, -1, 2, 0]
and [2, 0, 1]. First, draw the prefix sums on an x-y axis: when we encounter
a negative number the prefix sum decreases, and when we encounter a zero it
stays flat. For the zero case, P[3] = P[4]: index 4 is always at least as
good a starting point as index 3 (same value, shorter subarray), so 3 is
not needed. For the negative case, P[1] = 2 > P[2] = 1 because A[1] < 0:
P[2] is always a better starting choice than P[1] (smaller, so more likely
to satisfy the condition, and a shorter distance). Therefore, we can keep
the valid prefix sums monotonically increasing, just like an array of all
positive numbers, by maintaining a monotonic queue.
import collections


class Solution:
    def shortestSubarray(self, A, K):
        P = [0] * (len(A) + 1)
        for idx, x in enumerate(A):
            P[idx + 1] = P[idx] + x

        ans = len(A) + 1  # N+1 is impossible
        monoq = collections.deque()
        for y, Py in enumerate(P):
            # both a negative and a zero kick out any previous larger
            # or equal prefix sum, keeping the queue increasing
            while monoq and Py <= P[monoq[-1]]:
                monoq.pop()
            # once an index is used as the best start, it never needs to
            # be considered again (similar to a sliding window where we
            # move the left pointer forward)
            while monoq and Py - P[monoq[0]] >= K:
                ans = min(ans, y - monoq.popleft())
            monoq.append(y)

        return ans if ans < len(A) + 1 else -1

24.1.3 LeetCode Problems and Misc


Absolute-conditioned Subarray
1. 930. Binary Subarrays With Sum
In an array A of 0s and 1s, how many non-empty subarrays have sum S?
Example 1:

Input: A = [1, 0, 1, 0, 1], S = 2
Output: 4
Explanation:
The 4 subarrays are bolded below:
[1,0,1,0,1]
[1,0,1,0,1]
[1,0,1,0,1]
[1,0,1,0,1]
Note:

A.length <= 30000
0 <= S <= A.length
A[i] is either 0 or 1.

Answer: this is exactly the third type of maximum-subarray question: the
total number of subarrays with a certain sum. We solve it using a prefix
sum and a hashmap that saves the count of each prefix-sum value.
import collections


class Solution:
    def numSubarraysWithSum(self, A, S):
        """
        :type A: List[int]
        :type S: int
        :rtype: int
        """
        counts = collections.defaultdict(int)  # occurrences of each sum
        counts[0] = 1  # the prefix sum starts from 0, which occurs once
        prefix_sum, count = 0, 0
        for v in A:
            prefix_sum += v
            # count the subarrays ending here with sum S; default is 0
            count += counts[prefix_sum - S]
            # update the count of this prefix sum
            counts[prefix_sum] += 1
        return count

We can write it as:


def numSubarraysWithSum(self, A, S):
    """
    :type A: List[int]
    :type S: int
    :rtype: int
    """
    P = [0]
    for x in A:
        P.append(P[-1] + x)
    count = collections.Counter()

    ans = 0
    for x in P:
        ans += count[x]
        count[x + S] += 1
    return ans

Also, it can be solved with a modified sliding-window algorithm. In a sliding window we have indices i and j, both starting from 0, which delimit the window, and each iteration j moves one position. In a normal sliding window, we shrink the window only when the sum becomes larger than the target. However, that misses cases here: in the example [1, 0, 1, 0, 1], when the window [1..4] has sum 2, the algorithm would miss the window starting at i = 2, which has the same sum because A[1] = 0. To solve this, we keep another index i_hi: in addition to the normal moving rule of i, it also moves forward while the sum equals S and the element it points to is 0. This is effectively a three-pointer algorithm.
def numSubarraysWithSum(self, A, S):
    i_lo, i_hi, j = 0, 0, 0  # i_lo <= i_hi <= j
    sum_lo = sum_hi = 0
    ans = 0
    while j < len(A):
        # maintain i_lo, sum_lo: while the sum is too big, i_lo += 1
        sum_lo += A[j]
        while i_lo < j and sum_lo > S:
            sum_lo -= A[i_lo]
            i_lo += 1

        # maintain i_hi, sum_hi: while the sum is too big,
        # or equal and we can still move over a zero, i_hi += 1
        sum_hi += A[j]
        while i_hi < j and (sum_hi > S or (sum_hi == S and not A[i_hi])):
            sum_hi -= A[i_hi]
            i_hi += 1

        if sum_lo == S:
            ans += i_hi - i_lo + 1
        j += 1
    return ans

2. 523. Continuous Subarray Sum


Given a list of non-negative numbers and a target integer k, write a function to check if the array has a continuous subarray of size at least 2 that sums up to a multiple of k, that is, sums up to n*k where n is also an integer.

Example 1:
Input: [23, 2, 4, 6, 7], k=6
Output: True
Explanation: Because [2, 4] is a continuous subarray of size 2 and sums up to 6.

Example 2:
Input: [23, 2, 6, 4, 7], k=6
Output: True
Explanation: Because [23, 2, 6, 4, 7] is a continuous subarray of size 5 and sums up to 42.

Note:
The length of the array won't exceed 10,000.
You may assume the sum of all the numbers is in the range of a signed 32-bit integer.

Answer: This is a variant of the subarray-with-sum-k problem. The difference: we save each prefix sum as its remainder modulo k, because if (a + b) % k == 0, then (a % k + b % k) % k == 0; two prefix sums with the same remainder differ by a multiple of k.
class Solution:
    def checkSubarraySum(self, nums, k):
        """
        :type nums: List[int]
        :type k: int
        :rtype: bool
        """
        if not nums:
            return False
        k = abs(k)
        prefixSum = 0
        first_index = {0: -1}  # first index where each remainder appears
        for i, v in enumerate(nums):
            prefixSum += v
            if k != 0:
                prefixSum %= k
            if prefixSum in first_index and i - first_index[prefixSum] >= 2:
                return True
            if prefixSum not in first_index:
                first_index[prefixSum] = i
        return False

For problems that bound the maximum, average, or minimum within a subarray, see:

24.11 795. Number of Subarrays with Bounded Maximum (medium)

24.12 907. Sum of Subarray Minimums (monotone stack)



24.2 Subsequence (Medium or Hard)


The difference between subsequence questions and subarray questions is that the elements no longer need to be consecutive. Because of this relaxation, the brute-force solution for this type of question is exponential, O(2^n): for each element we have two options, chosen or not chosen. Subsequence questions are often used as follow-ups to subarray questions because the non-consecutive constraint adds difficulty, and they are typically solved with dynamic programming. Fig. 24.1 shows a list of the related subsequence problems on LeetCode.
A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters (i.e., "ACE" is a subsequence of "ABCDE" while "AEC" is not). Among subsequence problems we commonly see increasing subsequences and counting distinct subsequences, and they are usually solved with single-sequence dynamic programming.

940. Distinct Subsequences II (hard)

Figure 24.1: Subsequence Problems Listed on LeetCode

Given a string S, count the number of distinct, non-empty subsequences of S. Since the result may be large, return the answer modulo 10^9 + 7.

Example 1:

Input: "abc"
Output: 7
Explanation: The 7 distinct subsequences are "a", "b", "c", "ab", "ac", "bc", and "abc".

Example 2:

Input: "aba"
Output: 6
Explanation: The 6 distinct subsequences are "a", "b", "ab", "ba", "aa" and "aba".

Example 3:

Input: "aaa"
Output: 3
Explanation: The 3 distinct subsequences are "a", "aa" and "aaa".

Sequence-type dynamic programming. The naive solution is to generate all subsequences recursively with DFS and check for repetition; there are up to 2^n - 1 of them. Let's try the forward induction method instead.
# define the result for each state: the number of distinct subsequences ending at each position
state:  a  b  c
ans:    1  2  4
a: "a"; dp[0] = 1
b: "b", "ab"; dp[1] = dp[0] + 1 (if this were another 'a', length 1 would repeat dp[0]; only length 2 is possible)
c: "c", "ac", "bc", "abc"; dp[2] = dp[0] + dp[1] + 1 (if it were 'a': "aa", "ba", "aba", giving dp[1] + 1)
d: "d", "ad", "bd", "abd", "cd", "acd", "bcd", "abcd"; dp[3] = dp[0] + dp[1] + dp[2] + 1

Thus the recurrence function can be written as Eq. 24.1:

dp[i] = 1 + sum_{j < i, S[j] != S[i]} dp[j]        (24.1)
Thus, we have O(n^2) time complexity, and the following code:


def distinctSubseqII(self, S):
    """
    :type S: str
    :rtype: int
    """
    MOD = 10**9 + 7
    dp = [1] * len(S)  # each position has at least one subsequence: itself
    for i, c in enumerate(S):
        for j in range(i):
            if c == S[j]:
                continue
            dp[i] += dp[j]
            dp[i] %= MOD
    return sum(dp) % MOD

However, we still get TLE. To improve further, keep a running total of all dp values plus a 26-entry tracker of the dp mass contributed by each letter. The inner for loop can then be replaced by dp[i] = 1 + (total - sum already contributed by the letter S[i]), lowering the complexity to O(n).
def distinctSubseqII(self, S):
    MOD = 10**9 + 7
    dp = [1] * len(S)  # each position contributes at least itself
    sum_tracker = [0] * 26  # dp mass already contributed by each letter
    total = 0
    for i, c in enumerate(S):
        index = ord(c) - ord('a')
        dp[i] += total - sum_tracker[index]
        total += dp[i]
        sum_tracker[index] += dp[i]
    return sum(dp) % MOD

24.2.1 Others
For example, the following questions are used as follow-ups to Longest Continuous Increasing Subsequence:
300. Longest Increasing Subsequence
673. Number of Longest Increasing Subsequence
Given an unsorted array of integers, find the number of longest increasing subsequences.
Example 1:

Input: [1,3,5,4,7]
Output: 2
Explanation: The two longest increasing subsequences are [1, 3, 4, 7] and [1, 3, 5, 7].

Example 2:
Input: [2,2,2,2,2]
Output: 5
Explanation: The length of the longest continuous increasing subsequence is 1, and there are 5 subsequences of length 1, so output 5.

Note: The length of the given array will not exceed 2000 and the answer is guaranteed to fit in a 32-bit signed int.

Solution: a different problem again, counting the number of longest increasing subsequences. A brute-force approach first enumerates every increasing subsequence recursively, then counts those whose length equals the maximum:
from sys import maxsize

class Solution:
    def findNumberOfLIS(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        if not nums:
            return 0
        rlst = []  # collects the subsequence chosen on every recursion path

        def recursive(idx, tail, res):
            if idx == len(nums):
                rlst.append(res)
                return 0
            if nums[idx] > tail:
                # either take nums[idx] or skip it
                addLen = 1 + recursive(idx + 1, nums[idx], res + [nums[idx]])
                notAddLen = recursive(idx + 1, tail, res)
                return max(addLen, notAddLen)
            else:
                return recursive(idx + 1, tail, res)

        ans = recursive(0, -maxsize, [])  # length of the LIS
        count = 0
        for lst in rlst:
            if len(lst) == ans:
                count += 1
        return count

Using dynamic programming, the difference from the standard longest-increasing-subsequence DP is that we add a counts array:


class Solution:
    def findNumberOfLIS(self, nums):
        N = len(nums)
        if N <= 1:
            return N
        lengths = [0] * N  # lengths[i] = longest ending in nums[i]
        counts = [1] * N   # counts[i] = number of longest ending in nums[i]

        for idx, num in enumerate(nums):   # i
            for i in range(idx):           # j
                if nums[i] < nums[idx]:    # nums[idx] can extend
                    if lengths[i] >= lengths[idx]:
                        lengths[idx] = 1 + lengths[i]  # set the biggest length
                        counts[idx] = counts[i]        # reset the count
                    elif lengths[i] + 1 == lengths[idx]:  # a tie
                        counts[idx] += counts[i]  # accumulate count[i]

        longest = max(lengths)
        return sum(c for i, c in enumerate(counts) if lengths[i] == longest)

128. Longest Consecutive Sequence


Given an unsorted array of integers, find the length of the longest consecutive elements sequence.

For example,
Given [100, 4, 200, 1, 3, 2],
The longest consecutive elements sequence is [1, 2, 3, 4]. Return its length: 4.

Your algorithm should run in O(n) complexity.

Solution: Ignoring the O(n) requirement, we can sort to get [1, 2, 3, 4, 100, 200] and then use two pointers to find [1, 2, 3, 4]. How about O(n)? Put the numbers in a set and pop one out, say 4; then repeatedly look for first - 1 to collect every smaller neighbor (here 3, 2, 1), and symmetrically for last + 1 to collect the larger neighbors, removing each found number from the set.
def longestConsecutive(self, nums):
    nums = set(nums)
    maxlen = 0
    while nums:
        first = last = nums.pop()
        while first - 1 in nums:  # keep finding the smaller neighbor
            first -= 1
            nums.remove(first)
        while last + 1 in nums:  # keep finding the larger neighbor
            last += 1
            nums.remove(last)
        maxlen = max(maxlen, last - first + 1)
    return maxlen

24.3 Subset(Combination and Permutation)


A subset B of a set A is a set all of whose elements are drawn from A; in other words, B is contained inside A, written B ⊆ A. There are two kinds of subset problems: if the order of the subset does not matter, it is a combination problem; otherwise it is a permutation problem. To solve the problems in this section, we rely on the backtracking techniques of Sec ??. When the subset has a fixed constant length, a hashmap can be used to lower the complexity by one power of n.
Subset vs. Subsequence. In a subsequence the elements keep their original order from the source sequence, while in a set there is no ordering, only a collection of elements.
In this type of question we are asked to return subsets of a list, and backtracking ?? can be applied.

24.3.1 Combination
The solutions in this section are heavily correlated with Section ??.

78. Subsets
Given a set of distinct integers, nums, return all possible subsets (the power set).

Note: The solution set must not contain duplicate subsets.

Example:

Input: nums = [1,2,3]
Output:
[
  [3],
  [1],
  [2],
  [1,2,3],
  [1,3],
  [2,3],
  [1,2],
  []
]

Backtracking. This is a combination problem, which we explained in the backtracking section; we give the code directly here.
def subsets(self, nums):
    return self.combine(nums, len(nums), len(nums))

def combine(self, nums, n, k):
    """
    :type n: int
    :type k: int
    :rtype: List[List[int]]
    """
    # d controls the degree (depth), k controls the return level,
    # curr saves the current result, ans collects all results
    def C_n_k(d, k, s, curr, ans):
        ans.append(curr)
        if d == k:  # the length is satisfied
            return
        for i in range(s, n):
            curr.append(nums[i])
            C_n_k(d + 1, k, i + 1, curr[:], ans)  # i+1: no repeats; pass a deep copy curr[:]
            curr.pop()

    ans = []
    C_n_k(0, k, 0, [], ans)
    return ans

Incremental. Backtracking is not the only way to solve the above problem. It can also be done iteratively; observe the following process, in which we keep appending each new element to the end of all previous results.
[1, 2, 3, 4]
l = 0: []
l = 1: for 1, []+[1] -> [1]; we get the power set of [1]
l = 2: for 2, []+[2], [1]+[2] -> [2], [1,2]; power set of [1,2]
l = 3: for 3, []+[3], [1]+[3], [2]+[3], [1,2]+[3] -> [3], [1,3], [2,3], [1,2,3]; power set of [1,2,3]
l = 4: for 4, []+[4], [1]+[4], [2]+[4], [1,2]+[4], [3]+[4], [1,3]+[4], [2,3]+[4], [1,2,3]+[4]; power set of [1,2,3,4]

def subsets(self, nums):
    result = [[]]  # two-dimensional; already holds the empty subset
    for num in nums:
        new_results = []
        for r in result:
            new_results.append(r + [num])
        result += new_results
    return result

90. Subsets II
Given a collection of integers that might contain duplicates, nums, return all possible subsets (the power set).

Note: The solution set must not contain duplicate subsets.

Example:

Input: [1,2,2]
Output:
[
  [2],
  [1],
  [1,2,2],
  [2,2],
  [1,2],
  []
]

Analysis: Because of the duplicates, the previous power-set algorithm would generate repeated subsets; for the above example we would get [1, 2] twice and [2] twice. To adapt the previous code, we first sort nums, which makes checking for repeats easier. Then the code goes like this:
def subsetsWithDup(self, nums):
    """
    :type nums: List[int]
    :rtype: List[List[int]]
    """
    nums.sort()
    result = [[]]  # already holds the empty subset
    for num in nums:
        new_results = []
        for r in result:
            new_results.append(r + [num])
        for rst in new_results:
            if rst not in result:  # check for repetition
                result.append(rst)
    return result

However, the above code is extremely inefficient because of the membership checking. A better way to do this:
[1, 2, 2]
l = 0: []
l = 1: for 1, []+[1]
l = 2: for 2, []+[2], [1]+[2]; []+[2,2], [1]+[2,2]

So it is more efficient to first save the numbers of the array in a counting dictionary; for the above case, dic = {1: 1, 2: 2}. Each time we extend the results, we use 2 up to 2 times. In the same way, we can use a dictionary in the backtracking too.
import collections

class Solution(object):
    def subsetsWithDup(self, nums):
        """
        :type nums: List[int]
        :rtype: List[List[int]]
        """
        if not nums:
            return [[]]
        res = [[]]
        dic = collections.Counter(nums)
        for key, val in dic.items():
            tmp = []
            for lst in res:
                for i in range(1, val + 1):
                    tmp.append(lst + [key] * i)
            res += tmp
        return res

77. Combinations
Given two integers n and k, return all possible combinations of k numbers out of 1 ... n.

Example:

Input: n = 4, k = 2
Output:
[
  [2,4],
  [3,4],
  [2,3],
  [1,2],
  [1,3],
  [1,4],
]

Analysis: In this problem it is difficult to generate the results iteratively; the only way to reuse the second (incremental) solution is to filter, keeping only the results with the length we want. Backtracking, however, solves the problem easily, as mentioned in Section ??.
def combine(self, n, k):
    """
    :type n: int
    :type k: int
    :rtype: List[List[int]]
    """
    ans = []

    def C_n_k(d, k, s, curr):
        if d == k:
            ans.append(curr)
            return
        for i in range(s, n):
            # equivalent explicit backtracking:
            #   curr.append(i+1); C_n_k(d+1, k, i+1, curr[:]); curr.pop()
            C_n_k(d + 1, k, i + 1, curr + [i + 1])

    C_n_k(0, k, 0, [])
    return ans

24.3.2 Combination Sum


39. Combination Sum

Given a set of candidate numbers (candidates) (without duplicates) and a target number (target), find all unique combinations in candidates where the candidate numbers sum to target.
The same repeated number may be chosen from candidates an unlimited number of times.
Note:

All numbers (including target) will be positive integers.
The solution set must not contain duplicate combinations.

Example 1:

Input: candidates = [2,3,6,7], target = 7,
A solution set is:
[
  [7],
  [2,2,3]
]

Example 2:

Input: candidates = [2,3,5], target = 8,
A solution set is:
[
  [2,2,2,2],
  [2,3,3],
  [3,5]
]

DFS Backtracking. Analysis: This is still a typical combination problem; the only differences are that we return when the running sum exceeds the target, and we collect the answer only when it is exactly equal. And because a number can be used an unlimited number of times, after using a number we do not advance the next start position.
def combinationSum(self, candidates, target):
    """
    :type candidates: List[int]
    :type target: int
    :rtype: List[List[int]]
    """
    ans = []
    candidates.sort()
    self.combine(candidates, target, 0, [], ans)
    return ans

def combine(self, nums, target, s, curr, ans):
    if target < 0:
        return  # backtrack
    if target == 0:
        ans.append(curr)
        return
    for i in range(s, len(nums)):
        # if nums[i] > target:
        #     return  # optional pruning on the sorted array
        self.combine(nums, target - nums[i], i, curr + [nums[i]], ans)  # use i, not i+1, because we can reuse

40. Combination Sum II


Given a collection of candidate numbers (candidates, which may contain duplicates) and a target number (target), find all unique combinations in candidates where the candidate numbers sum to target.
Each number in candidates may only be used once in the combination.
Note:

All numbers (including target) will be positive integers.
The solution set must not contain duplicate combinations.

Example 1:

Input: candidates = [10,1,2,7,6,1,5], target = 8,
A solution set is:
[
  [1, 7],
  [1, 2, 5],
  [2, 6],
  [1, 1, 6]
]

Example 2:

Input: candidates = [2,5,2,1,2], target = 5,
A solution set is:
[
  [1,2,2],
  [5]
]

Backtracking + Counter. If we reuse the code from the previous problem on the first example, we get extra combinations such as [7, 1] and [2, 1, 5]. To avoid this, we need a dictionary that stores each unique candidate with its number of appearances; a given number will be used at most its counter times.
import collections

def combinationSum2(self, candidates, target):
    """
    :type candidates: List[int]
    :type target: int
    :rtype: List[List[int]]
    """
    candidates = collections.Counter(candidates)
    ans = []
    # convert the Counter to a list of (key, count) tuples
    self.combine(list(candidates.items()), target, 0, [], ans)
    return ans

def combine(self, nums, target, s, curr, ans):
    if target < 0:
        return
    if target == 0:
        ans.append(curr)
        return
    for idx in range(s, len(nums)):
        num, count = nums[idx]
        for c in range(count):
            self.combine(nums, target - num * (c + 1), idx + 1, curr + [num] * (c + 1), ans)

377. Combination Sum IV (medium)


Given an integer array with all positive numbers and no duplicates, find the number of possible combinations that add up to a positive integer target.

Example:

nums = [1, 2, 3]
target = 4

The possible combination ways are:
(1, 1, 1, 1)
(1, 1, 2)
(1, 2, 1)
(1, 3)
(2, 1, 1)
(2, 2)
(3, 1)

Note that different sequences are counted as different combinations.

Therefore the output is 7.

Follow up:
What if negative numbers are allowed in the given array?
How does it change the problem?
What limitation we need to add to the question to allow negative numbers?

DFS + MEMO. This problem is similar to 39. Combination Sum. For [2,
3, 5], target = 8, comparison:
[2, 3, 5], target = 8
39. Combination Sum: there is ordering (each time the start index is the same or larger than before)
[
  [2,2,2,2],
  [2,3,3],
  [3,5]
]
377. Combination Sum IV: here there is no ordering (each time the start index stays the same); try every element.
[
  [2,2,2,2],
  [2,3,3],
* [3,3,2]
* [3,2,3]
  [3,5],
* [5,3]
]

def combinationSum4(self, nums, target):
    """
    :type nums: List[int]
    :type target: int
    :rtype: int
    """
    nums.sort()
    n = len(nums)

    def DFS(idx, memo, t):
        if t < 0:
            return 0
        if t == 0:
            return 1
        if t not in memo:
            count = 0
            for i in range(idx, n):
                # keep the same start index, so different orderings are counted
                count += DFS(idx, memo, t - nums[i])
            memo[t] = count
        return memo[t]

    return DFS(0, {}, target)

Because here we do not need to enumerate all the possible solutions, we can also use dynamic programming, which is shown in Section ??; a preview sketch follows.
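As a preview, here is a minimal bottom-up DP sketch for this count (the function name combination_sum4_dp is ours, and it assumes all numbers are positive, as the problem states): dp[t] is the number of ordered combinations that sum to t.

def combination_sum4_dp(nums, target):
    # dp[t] = number of ordered combinations of nums summing to t
    dp = [0] * (target + 1)
    dp[0] = 1  # one way to reach 0: choose nothing
    for t in range(1, target + 1):
        for num in nums:
            if num <= t:
                dp[t] += dp[t - num]
    return dp[target]

For nums = [1, 2, 3] and target = 4 this returns 7, matching the example above.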

24.3.3 K Sum
In this subsection, we are still trying to find subsets that sum up to a target, but here the length is fixed; typically we see 2-, 3-, and 4-sum problems. Because it is still a combination problem, backtracking applies. Second, because of the fixed length, we can use multiple pointers to build up candidate subsets of that length. And in some cases, because the length is fixed, a hashmap can simplify the complexity.
1. Two Sum Given an array of integers, return indices of the two num-
bers such that they add up to a specific target.
You may assume that each input would have exactly one solution, and
you may not use the same element twice.
Example:

Given nums = [2, 7, 11, 15], target = 9,

Because nums[0] + nums[1] = 2 + 7 = 9,
return [0, 1].

Hashmap. Using backtracking or brute force gives O(n^2) time complexity. Instead, we can save the numbers in a dictionary and simply check whether target - num is in the dictionary, giving O(n) time. There are two-pass and one-pass variants:
# two-pass hashmap
def twoSum(self, nums, target):
    """
    :type nums: List[int]
    :type target: int
    :rtype: List[int]
    """
    d = {}
    for i, t in enumerate(nums):
        d[t] = i
    for i, t in enumerate(nums):
        if target - t in d and i != d[target - t]:
            return [i, d[target - t]]

# one-pass hashmap
def twoSum(self, nums, target):
    """
    :type nums: List[int]
    :type target: int
    :rtype: List[int]
    """
    d = {}
    for i, t in enumerate(nums):
        if target - t in d:
            return [d[target - t], i]
        d[t] = i

15. 3Sum
Given an array S of n integers, are there elements a, b, c in S such that a + b + c = 0? Find all unique triplets in the array which give the sum of zero.
Note: The solution set must not contain duplicate triplets.
For example, given array S = [-1, 0, 1, 2, -1, -4],
A solution set is:
[
  [-1, 0, 1],
  [-1, -1, 2]
]

Solution: use three pointers and no extra space. i is the start point, ranging over [0, len-2]; l and r are the other two pointers, with l = i+1 and r = len-1 at the beginning. The saving in time complexity comes entirely from the sorting.
[-4, -1, -1, 0, 1, 2]
 i   l ->        <- r

How do we remove the duplicate triplets?


def threeSum(self, nums):
    res = []
    nums.sort()
    for i in range(len(nums) - 2):
        if i > 0 and nums[i] == nums[i - 1]:  # make sure this pointer does not repeat
            continue
        l, r = i + 1, len(nums) - 1
        while l < r:
            s = nums[i] + nums[l] + nums[r]
            if s < 0:
                l += 1
            elif s > 0:
                r -= 1
            else:
                res.append((nums[i], nums[l], nums[r]))
                l += 1
                r -= 1
                # after a hit, skip duplicate values on both sides
                while l < r and nums[l] == nums[l - 1]:
                    l += 1
                while l < r and nums[r] == nums[r + 1]:
                    r -= 1
    return res

Use hashmap:
def threeSum(self, nums):
    """
    :type nums: List[int]
    :rtype: List[List[int]]
    """
    res = []
    nums = sorted(nums)
    if not nums:
        return []
    if nums[-1] < 0 or nums[0] > 0:
        return []
    end_position = len(nums) - 2
    dic_nums = {}
    for i in range(1, len(nums)):
        dic_nums[nums[i]] = i  # for equal values, save the last index

    for i in range(end_position):
        target = 0 - nums[i]
        if i > 0 and nums[i] == nums[i - 1]:  # avoid repeats
            continue
        if target < nums[i]:  # target smaller than nums[i]: cannot be found on the right side
            break
        for j in range(i + 1, len(nums)):
            if j > i + 1 and nums[j] == nums[j - 1]:  # avoid repeats
                continue
            complement = target - nums[j]
            if complement < nums[j]:  # remaining numbers are all bigger: no need to keep searching
                break
            if complement in dic_nums and dic_nums[complement] > j:  # the complement must lie after j
                res.append([nums[i], nums[j], complement])
    return res

The following variant, which checks for duplicates up front in every iteration, takes more time:


for i in range(len(nums) - 2):
    if i > 0 and nums[i] == nums[i - 1]:
        continue
    l, r = i + 1, len(nums) - 1
    while l < r:
        if l - 1 >= i + 1 and nums[l] == nums[l - 1]:  # check the front
            l += 1
            continue
        if r + 1 < len(nums) and nums[r] == nums[r + 1]:
            r -= 1
            continue
        s = nums[i] + nums[l] + nums[r]
        if s < 0:
            l += 1
        elif s > 0:
            r -= 1
        else:
            res.append((nums[i], nums[l], nums[r]))
            l += 1
            r -= 1
return res

18. 4Sum
def fourSum(self, nums, target):
    def findNsum(nums, target, N, result, results):
        # early termination
        if len(nums) < N or N < 2 or target < nums[0] * N or target > nums[-1] * N:
            return
        if N == 2:  # two pointers solve the sorted 2-sum problem
            l, r = 0, len(nums) - 1
            while l < r:
                s = nums[l] + nums[r]
                if s == target:
                    results.append(result + [nums[l], nums[r]])
                    l += 1
                    r -= 1
                    while l < r and nums[l] == nums[l - 1]:
                        l += 1
                    while l < r and nums[r] == nums[r + 1]:
                        r -= 1
                elif s < target:
                    l += 1
                else:
                    r -= 1
        else:  # recursively reduce N
            for i in range(len(nums) - N + 1):
                if i == 0 or (i > 0 and nums[i - 1] != nums[i]):
                    # reduce nums size, reduce target, save result
                    findNsum(nums[i + 1:], target - nums[i], N - 1, result + [nums[i]], results)

    results = []
    findNsum(sorted(nums), target, 4, [], results)
    return results

454. 4Sum II
Given four lists A, B, C, D of integer values, compute how many tuples (i, j, k, l) there are such that A[i] + B[j] + C[k] + D[l] is zero.
To make the problem a bit easier, all of A, B, C, D have the same length N where 0 ≤ N ≤ 500. All integers are in the range of -2^28 to 2^28 - 1 and the result is guaranteed to be at most 2^31 - 1.
Example:
Input:
A = [ 1, 2]
B = [-2,-1]
C = [-1, 2]
D = [ 0, 2]

Output:
2

Explanation:
The two tuples are:
1. (0, 0, 0, 1) -> A[0] + B[0] + C[0] + D[1] = 1 + (-2) + (-1) + 2 = 0
2. (1, 1, 0, 0) -> A[1] + B[1] + C[0] + D[0] = 2 + (-1) + (-1) + 0 = 0

Solution: brute force with 4 nested for loops is O(N^4). If we use divide and conquer, summing the first half and saving a dictionary (counter), the time complexity is O(2N^2). With 6-sum we can likewise reduce it to O(2N^3), and similarly for 8-sum.
import collections

def fourSumCount(self, A, B, C, D):
    AB = collections.Counter(a + b for a in A for b in B)
    return sum(AB[-c - d] for c in C for d in D)

Summary
As we have seen from the examples in this section, backtracking (Section ??) offers a universal solution to combination problems. There is also an iterative solution that suits the power-set purpose, whose code we include here again:
def subsets(self, nums):
    result = [[]]  # two-dimensional; already holds the empty subset
    for num in nums:
        new_results = []
        for r in result:
            new_results.append(r + [num])
        result += new_results
    return result

If we have duplicates, how do we handle them in backtracking (??)? In the iterative solution we can replace the array with a dictionary that saves the counts; the backtracking counterpart is sketched below.
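A minimal sketch of the sort-and-skip technique for deduplication inside backtracking (the function name subsets_with_dup is ours): sort first, then at each tree depth skip an element equal to the one just tried.

def subsets_with_dup(nums):
    nums.sort()
    ans = []

    def backtrack(start, curr):
        ans.append(curr[:])
        for i in range(start, len(nums)):
            if i > start and nums[i] == nums[i - 1]:
                continue  # skip duplicates at the same tree depth
            curr.append(nums[i])
            backtrack(i + 1, curr)
            curr.pop()

    backtrack(0, [])
    return ans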

24.3.4 Permutation
46. Permutations
Given a collection of distinct numbers, return all possible permutations.

For example,
[1,2,3] have the following permutations:

[
  [1,2,3],
  [1,3,2],
  [2,1,3],
  [2,3,1],
  [3,1,2],
  [3,2,1]
]
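The original text gives no code here; a minimal backtracking sketch (the function name permute is ours) that, at each step, picks one of the remaining numbers:

def permute(nums):
    ans = []

    def backtrack(curr, remaining):
        if not remaining:
            ans.append(curr)
            return
        for i in range(len(remaining)):
            # choose remaining[i] and recurse on the rest
            backtrack(curr + [remaining[i]], remaining[:i] + remaining[i + 1:])

    backtrack([], nums)
    return ans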

47. Permutations II
Given a collection of numbers that might contain duplicates, return all
possible unique permutations.
For example,
[1,1,2] have the following unique permutations:

[
  [1,1,2],
  [1,2,1],
  [2,1,1]
]
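Again no code is given in the original; a minimal sketch for the duplicate case (the function name permute_unique is ours), reusing the Counter idea from the combination problems: at each depth we branch over distinct values only.

import collections

def permute_unique(nums):
    ans = []
    counter = collections.Counter(nums)

    def backtrack(curr):
        if len(curr) == len(nums):
            ans.append(curr[:])
            return
        for num in counter:  # distinct values only: no duplicate branches
            if counter[num] > 0:
                counter[num] -= 1
                curr.append(num)
                backtrack(curr)
                curr.pop()
                counter[num] += 1

    backtrack([])
    return ans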

301. Remove Invalid Parentheses



Remove the minimum number of invalid parentheses in order to make


the input string valid. Return all possible results.
Note: The input string may contain letters other than the parentheses (
and ).
Examples:
"()())()" -> ["()()()", "(())()"]
"(a)())()" -> ["(a)()()", "(a())()"]
")(" -> [""]
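The text lists this problem without a solution; a minimal BFS sketch (the function name remove_invalid is ours): try removing 0 characters, then 1, and so on, returning the first level at which valid strings appear, which guarantees the minimum number of removals.

def remove_invalid(s):
    def is_valid(t):
        bal = 0
        for ch in t:
            if ch == '(':
                bal += 1
            elif ch == ')':
                bal -= 1
                if bal < 0:
                    return False
        return bal == 0

    level = {s}
    while True:
        valid = [t for t in level if is_valid(t)]
        if valid:
            return valid
        # generate the next level by deleting one character everywhere
        level = {t[:i] + t[i + 1:] for t in level for i in range(len(t))}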

24.4 Merge and Partition


24.4.1 Merge Lists
We can use divide and conquer (see the merge sort) and the priority queue.
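A minimal sketch of the priority-queue approach for merging k sorted lists (plain Python lists here for brevity; the function name merge_k_lists is ours): always pop the smallest remaining head.

import heapq

def merge_k_lists(lists):
    # heap entries: (value, list index, element index)
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    merged = []
    while heap:
        val, i, j = heapq.heappop(heap)
        merged.append(val)
        if j + 1 < len(lists[i]):
            heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
    return merged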

24.4.2 Partition Lists


Partitioning a list can be converted into subarray, combination, or subsequence problems. For example (a sketch for the first follows the list):

1. 416. Partition Equal Subset Sum (combination)

2. 698. Partition to K Equal Sum Subsets
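For instance, a minimal sketch for 416 (the function name can_partition is ours): partitioning into two equal-sum subsets reduces to a subset-sum (combination) check against half of the total.

def can_partition(nums):
    total = sum(nums)
    if total % 2:
        return False
    target = total // 2
    reachable = {0}  # subset sums achievable so far
    for num in nums:
        reachable |= {s + num for s in reachable if s + num <= target}
        if target in reachable:
            return True
    return target in reachable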

24.5 Intervals
Sweep Line is a type of algorithm mainly used to solve problems involving one-dimensional intervals. Let us look at one example:

1. 253. Meeting Rooms II

Given an array of meeting time intervals consisting of start and end times [[s1,e1],[s2,e2],...] (si < ei), find the minimum number of conference rooms required.
Example 1:

Input: [[0, 30],[5, 10],[15, 20]]
Output: 2

Example 2:

Input: [[7,10],[2,4]]
Output: 1

It helps a lot to first draw an example on a coordinate axis.

Figure 24.2: Interval questions

The simplest situation, where we need only one meeting room, is when there is no intersection between the time intervals. If we add one interval that intersects just one of the previous intervals, we need two conference rooms. So to find the minimum number of conference rooms, we need to find the maximum number of time intervals that intersect at the same moment. The most naive solution is to scan every time slot in an outer loop and, in an inner loop, go through all the intervals; whenever the time slot lies inside an interval we increase the room counter. This gives time complexity O(n * m), where n is the number of intervals and m is the total number of time slots. The Python code is as follows; unfortunately, this solution gets TLE.
# Definition for an interval.
# class Interval(object):
#     def __init__(self, s=0, e=0):
#         self.start = s
#         self.end = e

from collections import defaultdict
from heapq import heappush, heappop

class Solution(object):
    def minMeetingRooms(self, intervals):
        """
        :type intervals: List[Interval]
        :rtype: int
        """
        if not intervals:
            return 0
        # solution 1: voting, time complexity O(n*m); 71/77 tests pass, then TLE
        votes = defaultdict(int)
        num_rooms = 0
        for interval in intervals:
            s = interval.start
            e = interval.end
            for i in range(s + 1, e + 1):
                votes[i] += 1
                num_rooms = max(num_rooms, votes[i])
        return num_rooms

24.5.1 Speedup with Sweep Line


Now, let us see how to speed up this process using the Sweep Line method. For the sweep line, we have three basic implementations: one-dimensional, min-heap, and map based.

One-dimensional Implementation
To get the maximum number of intersections of all the intervals, it is not necessary to scan every time slot; it is enough to scan the key slots: the starts and the ends. We open an array, put every start and end slot into it, marking a start with 1 and an end with 0, and then sort this array. How do we then get the maximum intersection? We go through the sorted array: if we get a start, the current number of rooms needed increases by one; if we encounter an end slot, one meeting room is freed, so we decrease the count of ongoing meetings by one. Another global variable tracks the maximum number of rooms needed over the whole process. Our time complexity is now decided by the number of slots, 2n, plus the sorting, which makes the whole time complexity O(n log n) with space complexity O(n). This sped-up algorithm is called the Sweep Line algorithm. Before we write our code, we should check a special case: a slot that is the start of one interval and simultaneously the end of another. In this case we must not increase the counter first; we need to decrease first, so the sorting should be based on the first element of the tuple followed by the second element (ends, marked 0, before starts, marked 1). For example, for the simple case [[13, 15], [1, 13]], we need a maximum of only one meeting room. Thus it can be implemented as:

Figure 24.3: One-dimensional Sweep Line

def minMeetingRooms(self, intervals):
    if not intervals:
        return 0
    # solution 2: sweep line
    slots = []
    # put slots onto a one-dimensional axis: (time, 1) for a start, (time, 0) for an end
    for i in intervals:
        slots.append((i.start, 1))
        slots.append((i.end, 0))
    # sort the slots; at equal times an end (0) sorts before a start (1)
    # slots.sort(key=lambda x: (x[0], x[1]))
    slots.sort()

    # now execute the counting
    crt_room, max_room = 0, 0
    for s in slots:
        if s[1] == 0:  # a meeting ends: free a room
            crt_room -= 1
        else:
            crt_room += 1
            max_room = max(max_room, crt_room)
    return max_room

Min-heap Implementation

Figure 24.4: Min-heap for Sweep Line

Instead of opening an array to save all the time slots, we can directly sort the intervals by start time. As Fig. 24.4 shows, we go through the intervals and track their end times in a min-heap. The first end time we encounter is 30, and we put it in the min-heap. Then we visit the next interval [5, 10]; 5 is smaller than the previous end time 30, which means this interval intersects a previous one, so the number of rooms increases by one and we now have 2 rooms. We put 10 into the min-heap. Next, we visit [15, 20]; 15 is larger than the smallest element in the min-heap, 10, which means these two intervals can be merged into one room as [5, 20], so we replace the end time 10 with 20.
This way, the time complexity is still dominated by the sorting, while the space complexity depends on the input: it varies from O(1) (no intersections) to O(n) (all the meetings intersect in at least one time slot).
def minMeetingRooms(self, intervals):
    if not intervals:
        return 0
    # solution 3: min-heap of end times
    intervals.sort(key=lambda x: x.start)
    h = [intervals[0].end]
    rooms = 1
    for i in intervals[1:]:
        s, e = i.start, i.end
        e_before = h[0]
        if s < e_before:  # overlap: need another room
            heappush(h, e)
            rooms += 1
        else:  # no overlap: merge into an existing room
            heappop(h)  # kick out 10 in our example
            heappush(h, e)  # replace 10 with 20
    return rooms

Map-based Implementation

class Solution {
public:
    int minMeetingRooms(vector<Interval>& intervals) {
        map<int, int> mp;
        for (auto val : intervals) {
            ++mp[val.start];
            --mp[val.end];
        }
        int max_room = 0, crt_room = 0;
        for (auto val : mp) {
            crt_room += val.second;
            max_room = max(max_room, crt_room);
        }
        return max_room;
    }
};
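Since the rest of this chapter uses Python, here is a sketch of the same map-based idea in Python (names are ours; this version assumes the intervals are given as (start, end) pairs rather than Interval objects):

def min_meeting_rooms(intervals):
    # delta map: +1 at each start time, -1 at each end time
    deltas = {}
    for s, e in intervals:
        deltas[s] = deltas.get(s, 0) + 1
        deltas[e] = deltas.get(e, 0) - 1
    max_room = crt_room = 0
    for t in sorted(deltas):  # sweep the time points in order
        crt_room += deltas[t]
        max_room = max(max_room, crt_room)
    return max_room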

24.5.2 LeetCode Problems


1. 986. Interval List Intersections. Given two lists of closed intervals, where each list is pairwise disjoint and in sorted order, return the intersection of these two interval lists.
24.6. INTERSECTION 579

Input: A = [[0,2],[5,10],[13,23],[24,25]], B = [[1,5],[8,12],[15,24],[25,26]]
Output: [[1,2],[5,5],[8,10],[15,23],[24,24],[25,25]]
Reminder: The inputs and the desired output are lists of Interval objects, and not arrays or lists.
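No solution is given in the text for this one; a minimal two-pointer sketch (the function name interval_intersection is ours, and it operates on [start, end] pairs rather than Interval objects): intersect the current pair, then advance whichever interval ends first.

def interval_intersection(A, B):
    ans = []
    i = j = 0
    while i < len(A) and j < len(B):
        lo = max(A[i][0], B[j][0])  # latest start
        hi = min(A[i][1], B[j][1])  # earliest end
        if lo <= hi:  # the two intervals overlap
            ans.append([lo, hi])
        if A[i][1] < B[j][1]:  # discard the interval that ends first
            i += 1
        else:
            j += 1
    return ans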

24.6 Intersection
For problems asking for the intersection of lists, we can use a hashmap, which takes O(m + n) time. Alternatively, we can sort first and use two pointers, one starting at the beginning of each array. Examples are shown below:

1. 349. Intersection of Two Arrays (Easy)


Given two arrays, write a function to compute their intersection.
Example:
Given nums1 = [1, 2, 2, 1], nums2 = [2, 2], return [2].

Note:

• Each element in the result must be unique.


• The result can be in any order.

Solution 1: using a hashmap; here we convert with set. This takes 43 ms.
def intersection(self, nums1, nums2):
    """
    :type nums1: List[int]
    :type nums2: List[int]
    :rtype: List[int]
    """
    if not nums1 or not nums2:
        return []
    if len(nums1) > len(nums2):
        nums1, nums2 = nums2, nums1
    ans = set()
    nums1 = set(nums1)
    for e in nums2:
        if e in nums1:
            ans.add(e)
    return list(ans)

Solution 2: sort first, then use two pointers. Takes 46 ms.


def intersection(self, nums1, nums2):
    """
    :type nums1: List[int]
    :type nums2: List[int]
    :rtype: List[int]
    """
    nums1.sort()
    nums2.sort()
    r = set()
    i, j = 0, 0
    while i < len(nums1) and j < len(nums2):
        if nums1[i] < nums2[j]:
            i += 1
        elif nums1[i] > nums2[j]:
            j += 1
        else:
            r.add(nums1[i])
            i += 1
            j += 1
    return list(r)

2. 350. Intersection of Two Arrays II (Easy)


Given two arrays, write a function to compute their intersection.
Example:
Given nums1 = [1, 2, 2, 1], nums2 = [2, 2], return [2, 2].

Note:

• Each element in the result should appear as many times as it shows in both arrays.

• The result can be in any order.

Follow up:

(a) What if the given array is already sorted? How would you opti-
mize your algorithm?
(b) What if nums1’s size is small compared to nums2’s size? Which
algorithm is better?
(c) What if elements of nums2 are stored on disk, and the memory is
limited such that you cannot load all elements into the memory
at once?
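The text leaves this one as an exercise; a minimal Counter-based sketch for the base problem (the function name intersect is ours):

import collections

def intersect(nums1, nums2):
    counts = collections.Counter(nums1)
    ans = []
    for x in nums2:
        if counts[x] > 0:  # x still has unmatched occurrences in nums1
            ans.append(x)
            counts[x] -= 1
    return ans

For the sorted follow-up, the two-pointer version from 349 adapts directly by appending on every match instead of adding to a set.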

24.7 Miscellaneous Questions


24.13 283. Move Zeroes. (Easy) Given an array nums, write a function
to move all 0’s to the end of it while maintaining the relative order of
the non-zero elements.
Note:

1. You must do this in-place without making a copy of the array.



2. Minimize the total number of operations.

Example:

Input: [0,1,0,3,12]
Output: [1,3,12,0,0]

Solution 1: Find the All-Zeros Subarray. Find the first all-zeros subarray [0, ..., 0] followed by a non-zero element x, then rotate x to the front of it: swap the last 0 with x, then the second-to-last 0 with x's new position, and so on. If a single 0 sits at index 0, this takes O(n) swaps; with a second 0 at index 1 it takes roughly (n - 1) + (n - 2), about 2n. The exact complexity analysis is a bit tricky, but the upper bound is O(n^2).
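For comparison, a minimal sketch of the common in-place two-pointer alternative (the function name move_zeroes is ours), which satisfies both notes in O(n): keep a write pointer for the next non-zero slot and swap each non-zero element into it.

def move_zeroes(nums):
    insert = 0  # next position for a non-zero element
    for i, x in enumerate(nums):
        if x != 0:
            nums[insert], nums[i] = nums[i], nums[insert]
            insert += 1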

24.8 Exercises
24.8.1 Subsequence (with DP)
1. 594. Longest Harmonious Subsequence
We define a harmonious array as an array where the difference between its maximum value and its minimum value is exactly 1.
Now, given an integer array, you need to find the length of its longest
harmonious subsequence among all its possible subsequences.
Example 1:
Input: [1,3,2,2,5,2,3,7]
Output: 5
Explanation: The longest harmonious subsequence is [3,2,2,2,3].

Note: The length of the input array will not exceed 20,000.
Solution: first, build a Counter of the whole array. Then visit the counter dictionary and check key + 1 (and, symmetrically, key - 1): the pair counts as valid only when that neighboring count is non-zero.
from collections import Counter

class Solution:
    def findLHS(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        if not nums or len(nums) < 2:
            return 0
        count = Counter(nums)
        maxLen = 0
        for key, item in count.items():  # visit each key: count pair in the counter
            if count[key + 1]:  # checking key+1 alone suffices; key-1 is symmetric
                maxLen = max(maxLen, item + count[key + 1])
        return maxLen

2. 521. Longest Uncommon Subsequence I


Given a group of two strings, you need to find the longest uncom-
mon subsequence of this group of two strings. The longest uncom-
mon subsequence is defined as the longest subsequence of one of these
strings and this subsequence should not be any subsequence of the
other strings.
A subsequence is a sequence that can be derived from one sequence by
deleting some characters without changing the order of the remaining
elements. Trivially, any string is a subsequence of itself and an empty
string is a subsequence of any string.
The input will be two strings, and the output needs to be the length
of the longest uncommon subsequence. If the longest uncommon sub-
sequence doesn’t exist, return -1.
Example 1:
Input: "aba", "cdc"
Output: 3
Explanation: The longest uncommon subsequence is "aba" (or "cdc"), because "aba" is a subsequence of "aba" but not a subsequence of any other string in the group of two strings.

Note:
Both strings' lengths will not exceed 100.
Only letters from a-z will appear in the input strings.
Solution: trying a few more examples reveals the rule. For equal strings such as "aba", "aba" we return -1; otherwise the answer is simply the length of the longer string (or either length when they are equal but the strings differ).
def findLUSlength(self, a, b):
    """
    :type a: str
    :type b: str
    :rtype: int
    """
    if len(b) != len(a):
        return max(len(a), len(b))
    # lengths are the same
    return len(a) if a != b else -1

3. 424. Longest Repeating Character Replacement


Given a string that consists of only uppercase English letters, you can
replace any letter in the string with another letter at most k times.
Find the length of a longest substring containing all repeating letters
you can get after performing the above operations.
Note:
Both the string's length and k will not exceed 10^4.
Example 1:
Input:
s = "ABAB", k = 2

Output:
4

Explanation: Replace the two ’A’s with two ’B’s or vice versa.
Example 2:
Input:
s = "AABABBA", k = 1

Output:
4

Explanation: Replace the one ’A’ in the middle with ’B’ and form
"AABBBBA". The substring "BBBB" has the longest repeating let-
ters, which is 4.
Solution: the brute-force recursive solution tries, at each position, either replacing the character with the chosen letter or leaving it, using at most k replacements. It gets TLE.
# brute force: a recursive function enumerating replacements (TLE)
def characterReplacement(self, s, k):
    if not s:
        return 0

    def getLen(news):  # helper (undefined in the original text): longest run of one character
        best = run = 1
        for a, b in zip(news, news[1:]):
            run = run + 1 if a == b else 1
            best = max(best, run)
        return best

    maxLen = 0

    def replace(news, idx, re_char, k):
        nonlocal maxLen
        if k == 0 or idx == len(s):
            maxLen = max(maxLen, getLen(news))
            return
        if s[idx] != re_char:  # replace this position
            news_copy = news[:idx] + re_char + news[idx + 1:]
            replace(news_copy, idx + 1, re_char, k - 1)
        replace(news[:], idx + 1, re_char, k)  # or leave it unchanged

    # try each candidate character as the repeated one
    for char1 in set(s):
        replace(s[:], 0, char1, k)
    return maxLen

To get the BCR (best conceivable runtime), think about a sliding window. A window can be turned into one repeating letter using (window length - max occurrence of any single letter in the window) replacements, so the constraint is: window length - maxCharCount <= k. We use a sliding window to track the max occurrence and, when the constraint is violated, shrink the window. For example, with strs = "BBCABBBAB" and k = 2: when i = 0 and j = 7, the window length is 8 and the max count is 5 (the letter 'B'), so 8 - 5 = 3 > 2 and we must shrink; with i = 1, 8 - 1 - 4 = 3; with i = 2, 8 - 2 - 3 = 3; with i = 3, 8 - 3 - 3 = 2, so the window becomes valid at i = 3 with current length 5.
def characterReplacement(self, s, k):
    """
    :type s: str
    :type k: int
    :rtype: int
    """
    i, j = 0, 0  # sliding window
    counter = [0] * 26
    ans = 0
    maxCharCount = 0
    while j < len(s):
        counter[ord(s[j]) - ord('A')] += 1
        maxCharCount = max(maxCharCount, counter[ord(s[j]) - ord('A')])
        while j - i + 1 - maxCharCount > k:  # now shrink the window
            counter[ord(s[i]) - ord('A')] -= 1
            i += 1
            maxCharCount = max(counter)  # update the max
        ans = max(ans, j - i + 1)
        j += 1
    return ans

4. 395. Longest Substring with At Least K Repeating Characters


Find the length of the longest substring T of a given string (consists
of lowercase letters only) such that every character in T appears no
less than k times.
Example 1:
Input:
s = "aaabb", k = 3

Output:
3

The longest substring is "aaa", as ’a’ is repeated 3 times.


Example 2:

Input:
s = "ababbc", k = 2

Output:
5

The longest substring is "ababb", as 'a' is repeated 2 times and 'b' is repeated 3 times.
Solution: dynamic programming with a memo works, but it takes too much space and still gets TLE:
from collections import Counter

class Solution:
    def longestSubstring(self, s, k):
        """
        :type s: str
        :type k: int
        :rtype: int
        """
        if not s:
            return 0
        if len(s) < k:
            return 0
        count = Counter(s)
        memo = [[None for col in range(len(s))] for row in range(len(s))]

        def cut(start, end, count):
            if start > end:
                return 0
            if memo[start][end] is None:
                if any(0 < item < k for key, item in count.items()):
                    # drop one character from either the front or the back
                    newCounterF = count.copy()
                    newCounterF[s[start]] -= 1
                    newCounterB = count.copy()
                    newCounterB[s[end]] -= 1
                    memo[start][end] = max(cut(start + 1, end, newCounterF),
                                           cut(start, end - 1, newCounterB))
                else:
                    memo[start][end] = end - start + 1
            return memo[start][end]

        return cut(0, len(s) - 1, count)

Now, use a splitting pointer mid that starts from 0. If the whole string satisfies the condition, return len(s). Otherwise, use two while loops to separate the string into three substrings: left, mid, and right, where left satisfies the condition, mid does not, and right is unknown; then recurse on left and right.
from collections import Counter

class Solution:
    def longestSubstring(self, s, k):
        """
        :type s: str
        :type k: int
        :rtype: int
        """
        if not s:
            return 0
        if len(s) < k:
            return 0
        count = Counter(s)
        mid = 0  # on the left side, 0..mid holds satisfied elements
        while mid < len(s) and count[s[mid]] >= k:
            mid += 1
        if mid == len(s):
            return len(s)
        left = self.longestSubstring(s[:mid], k)  # e.g. "ababb"
        # from the previous mid onward, skip characters that cannot satisfy the condition
        while mid < len(s) and count[s[mid]] < k:
            mid += 1
        # now keep doing the same on the right side
        right = self.longestSubstring(s[mid:], k)
        return max(left, right)

24.8.2 Subset
216. Combination Sum III
Find all possible combinations of k numbers that add up to a number
n, given that only numbers from 1 to 9 can be used and each combination
should be a unique set of numbers.
Note:

All numbers will be positive integers.
The solution set must not contain duplicate combinations.

Example 1:

Input: k = 3, n = 7
Output: [[1,2,4]]

Example 2:

Input: k = 3, n = 9
Output: [[1,2,6], [1,3,5], [2,3,4]]

def combinationSum3(self, k, n):
    """
    :type k: int
    :type n: int
    :rtype: List[List[int]]
    """
    # each number is used at most one time
    def combine(s, curr, ans, t, d, k, n):
        if t < 0:
            return
        if d == k:
            if t == 0:
                ans.append(curr)
            return
        for i in range(s, n):
            num = i + 1
            combine(i + 1, curr + [num], ans, t - num, d + 1, k, n)

    ans = []
    combine(0, [], ans, n, 0, k, 9)
    return ans

24.8.3 Intersection
160. Intersection of Two Linked Lists (Easy)
Write a program to find the node at which the intersection of two singly
linked lists begins.
For example, the following two linked lists:

A: a1 -> a2
            \
             c1 -> c2 -> c3
            /
B: b1 -> b2 -> b3

begin to intersect at node c1.


Notes:

• If the two linked lists have no intersection at all, return null.

• The linked lists must retain their original structure after the function
returns.

• You may assume there are no cycles anywhere in the entire linked
structure.

• Your code should preferably run in O(n) time and use only O(1) mem-
ory.
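A minimal sketch of the classic two-pointer solution meeting these requirements (names are ours; the list nodes are assumed to have a next field): when a pointer reaches the end of its list it jumps to the head of the other list, so both pointers traverse a + c + b and b + c + a nodes respectively and meet at the intersection node, or at None.

def getIntersectionNode(headA, headB):
    pa, pb = headA, headB
    while pa is not pb:
        # switch lists at the tail; both pointers walk the same total length
        pa = pa.next if pa else headB
        pb = pb.next if pb else headA
    return pa  # the intersection node, or None if there is none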
25

Linked List, Stack, Queue, and Heap Questions (12%)

In this chapter, we focus on solving problems whose solutions rely on data structures other than the array/string: the linked list, heap, queue, and stack.

25.1 Linked List


Problems on linked lists can involve basic operations such as adding or removing a node, or merging two different linked lists.

Circular Linked List. For a circular linked list, when we traverse the list the most important thing is to set up the end condition of the while loop correctly.
25.1 708. Insert into a Cyclic Sorted List (medium). Given a node
from a cyclic linked list which is sorted in ascending order, write a
function to insert a value into the list such that it remains a cyclic
sorted list. The given node can be a reference to any single node in
the list, and may not necessarily be the smallest value in the cyclic
list. See Fig. 25.1 for an example.
Analysis: We traverse the list at most one round. The potential
insertion position depends on the inserted value. Suppose the values
in the linked list are in the range [s, e], s <= e. Given the insert value m:
1. m ∈ [s, e]: we insert in the middle of the list.
2. m ≥ e or m ≤ s: we insert at the boundary between the end and
the start of the list; we detect that boundary where the current
node's value is larger than its successor's value.


Figure 25.1: Example of insertion in circular list

3. After one full loop, if we cannot find a place, we insert at the
end; for example, 2->2->2 with insert value 3, or 2->3->4->2
with insert value 2.

def insert(self, head, insertVal):
    if not head:  # 0 nodes
        head = Node(insertVal, None)
        head.next = head
        return head

    cur = head
    while cur.next != head:
        if cur.val <= insertVal <= cur.next.val:  # insert in the middle
            break
        elif cur.val > cur.next.val:  # boundary of end and start
            if insertVal >= cur.val or insertVal <= cur.next.val:
                break
            cur = cur.next
        else:
            cur = cur.next
    # insert after cur
    node = Node(insertVal, None)
    node.next, cur.next = cur.next, node
    return head

25.2 Queue and Stack


Because queue and stack are used to implement BFS and DFS respectively,
that type of implementation is covered in Chapter ??. The other problems
include buffering problems with a queue (circular queue), implementing the
structures themselves, and problems solved with a (monotone) stack.

25.2.1 Implementing Queue and Stack


25.2 622. Design Circular Queue (medium). Design your imple-
mentation of the circular queue. The circular queue is a linear data
structure in which the operations are performed based on FIFO (First
In First Out) principle and the last position is connected back to the
first position to make a circle. It is also called "Ring Buffer".
Your implementation should support the following operations:

• MyCircularQueue(k): constructor, set the size of the queue to be k.
• Front: get the front item from the queue. If the queue is empty,
return -1.
• Rear: get the last item from the queue. If the queue is empty,
return -1.
• enQueue(value): insert an element into the circular queue. Return
true if the operation is successful.
• deQueue(): delete an element from the circular queue. Return
true if the operation is successful.
• isEmpty(): checks whether the circular queue is empty or not.
• isFull(): checks whether the circular queue is full or not.

Solution 1: Singly Linked List with Predefined Size. This is a
typical queue data structure, and because it serves as a buffer, we
need to limit its size. As shown in the earlier theory chapters of the
book, a queue can be implemented with a singly linked list and two
pointers, one at the head and the other at the rear. The additional
control we need is to cap the size of the queue.
class MyCircularQueue:
    class Node:
        def __init__(self, val):
            self.val = val
            self.next = None

    def __init__(self, k):
        self.size = k
        self.head = None
        self.tail = None
        self.cur_size = 0

    def enQueue(self, value):
        if self.cur_size >= self.size:
            return False
        new_node = MyCircularQueue.Node(value)
        if self.cur_size == 0:
            self.tail = self.head = new_node
        else:
            self.tail.next = new_node
            new_node.next = self.head
            self.tail = new_node
        self.cur_size += 1
        return True

    def deQueue(self):
        if self.cur_size == 0:
            return False
        # delete the head node
        if self.cur_size == 1:
            self.head = self.tail = None
        else:
            self.head = self.head.next
        self.cur_size -= 1
        return True

    def Front(self):
        return self.head.val if self.head else -1

    def Rear(self):
        return self.tail.val if self.tail else -1

    def isEmpty(self):
        return self.cur_size == 0

    def isFull(self):
        return self.cur_size == self.size

25.3 641. Design Circular Deque (medium).

Solution: doubly linked list with predefined size, sketched below.
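The text leaves the implementation out; the following is a minimal sketch of the doubly linked list approach under the LeetCode 641 API (insertFront, insertLast, deleteFront, deleteLast, getFront, getRear, isEmpty, isFull), written in the same style as the circular queue above:

class MyCircularDeque:
    class Node:
        def __init__(self, val):
            self.val = val
            self.prev = None
            self.next = None

    def __init__(self, k):
        self.size = k
        self.cur_size = 0
        self.head = None  # front end
        self.tail = None  # rear end

    def insertFront(self, value):
        if self.cur_size >= self.size:
            return False
        node = MyCircularDeque.Node(value)
        if self.cur_size == 0:
            self.head = self.tail = node
        else:
            node.next = self.head
            self.head.prev = node
            self.head = node
        self.cur_size += 1
        return True

    def insertLast(self, value):
        if self.cur_size >= self.size:
            return False
        node = MyCircularDeque.Node(value)
        if self.cur_size == 0:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        self.cur_size += 1
        return True

    def deleteFront(self):
        if self.cur_size == 0:
            return False
        self.head = self.head.next
        if self.head:
            self.head.prev = None
        else:
            self.tail = None
        self.cur_size -= 1
        return True

    def deleteLast(self):
        if self.cur_size == 0:
            return False
        self.tail = self.tail.prev
        if self.tail:
            self.tail.next = None
        else:
            self.head = None
        self.cur_size -= 1
        return True

    def getFront(self):
        return self.head.val if self.head else -1

    def getRear(self):
        return self.tail.val if self.tail else -1

    def isEmpty(self):
        return self.cur_size == 0

    def isFull(self):
        return self.cur_size == self.size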

25.2.2 Solving Problems Using Queue


Use as a Buffer

25.4 346. Moving Average from Data Stream (easy). Given a stream
of integers and a window size, calculate the moving average of all
integers in the sliding window.
Example:

MovingAverage m = new MovingAverage(3);
m.next(1) = 1
m.next(10) = (1 + 10) / 2
m.next(3) = (1 + 10 + 3) / 3
m.next(5) = (10 + 3 + 5) / 3

Solution: deque with maxlen. When we have a fixed window size,
the window acts like a buffer with a maximum capacity: when the
(n+1)-th element comes, we need to delete the leftmost element first.
This is directly supported by deque from the collections module if we
set maxlen to the window size. It is then easy to use functions like
sum() and len() to compute the average value.
from collections import deque

class MovingAverage:
    def __init__(self, size):
        self.q = deque(maxlen=size)

    def next(self, val):
        self.q.append(val)
        return sum(self.q) / len(self.q)

25.2.3 Solving Problems with Stack and Monotone Stack

84. Largest Rectangle in Histogram

Figure 25.2: Histogram

Given n non-negative integers representing the histogram's bar heights
where the width of each bar is 1, find the area of the largest rectangle in
the histogram. Fig. 25.2 shows a histogram where the width of each bar
is 1, given height = [2, 1, 5, 6, 2, 3]. The largest rectangle is shown in
the shaded area, which has area = 10 units.
Solution: brute force. Start from the first bar 2, which will be included;
then we extend to the right while tracking the minimum height, giving
candidate areas (2 × 1, 1 × 2, 1 × 3, 1 × 4, ...). Tracking the min height
and width this way costs O(n^2).
class Solution:
    def largestRectangleArea(self, heights):
        """
        :type heights: List[int]
        :rtype: int
        """
        if not heights:
            return 0
        maxsize = max(heights)

        for i in range(len(heights)):
            minheight = heights[i]
            width = 1
            for j in range(i + 1, len(heights)):
                width += 1
                minheight = min(minheight, heights[j])
                maxsize = max(maxsize, minheight * width)
        return maxsize

Now, try the BCR (best conceivable runtime), which is O(n). The
maximum area is among the areas that use each height as the rectangle
height multiplied by the widest width that works. For the above example,
we would choose the maximum among 2 × 1, 1 × 6, 5 × 2, 6 × 1, 2 × 4,
3 × 1. So the important step here is to find the possible width for each
height. For element 2: if the following heights were increasing, its width
would keep growing; however, since the following height 1 is smaller, 2 is
popped out and contributes 2 × 1. This matches the behavior of a
monotonic increasing stack: when an element is popped out, it means we
have found the next element smaller than it, so its width span ends there.
How do we deal with a current number equal to the previous one, as in
6,6,6,6,6? We pop the previous and append the current. The structure we
use here is called a Monotonic Stack, which only allows increasing
elements to get in; once a smaller or equal one arrives, it kicks out the
previous larger-or-equal elements.
def largestRectangleArea(self, heights):
    """
    :type heights: List[int]
    :rtype: int
    """
    if not heights:
        return 0
    maxsize = max(heights)
    stack = [-1]

    # the stack will only grow (indices of increasing heights)
    for i, h in enumerate(heights):
        if stack[-1] != -1:
            if h > heights[stack[-1]]:
                stack.append(i)
            else:
                # start to pop and compute the area
                while stack[-1] != -1 and h <= heights[stack[-1]]:
                    # same or equal needs to be popped out
                    idx = stack.pop()
                    v = heights[idx]
                    maxsize = max(maxsize, (i - stack[-1] - 1) * v)
                stack.append(i)
        else:
            stack.append(i)
    # handle what is left in the stack
    while stack[-1] != -1:
        idx = stack.pop()
        v = heights[idx]
        maxsize = max(maxsize, (len(heights) - stack[-1] - 1) * v)
    return maxsize

85. Maximal Rectangle. Solution 1: brute force (passes 64/66 test cases, then TLE).


def maximalRectangle(self, matrix):
    """
    :type matrix: List[List[str]]
    :rtype: int
    """
    if not matrix:
        return 0
    if len(matrix[0]) == 0:
        return 0
    row, col = len(matrix), len(matrix[0])

    def check(x, y, w, h):
        # check the last column
        for i in range(x, x + h):  # iterate rows
            if matrix[i][y + w - 1] == '0':
                return 0
        # check the last row
        for j in range(y, y + w):  # iterate columns
            if matrix[x + h - 1][j] == '0':
                return 0
        return w * h

    maxsize = 0
    for i in range(row):
        for j in range(col):  # start point (i, j)
            if matrix[i][j] == '0':
                continue
            for h in range(1, row - i + 1):  # size of the window
                for w in range(1, col - j + 1):
                    rslt = check(i, j, w, h)
                    if rslt == 0:
                        # we definitely need to break here,
                        # or else we get a wrong result
                        break
                    maxsize = max(maxsize, rslt)
    return maxsize

Solution 2: prefix sums. Now, the same as before, but use a 2D prefix-sum matrix so that each window check takes O(1):


def maximalRectangle(self, matrix):
    """
    :type matrix: List[List[str]]
    :rtype: int
    """
    if not matrix:
        return 0
    if len(matrix[0]) == 0:
        return 0
    row, col = len(matrix), len(matrix[0])
    sums = [[0 for _ in range(col + 1)] for _ in range(row + 1)]
    # no need to initialize row 0 and col 0: they just need to be 0
    for i in range(1, row + 1):
        for j in range(1, col + 1):
            sums[i][j] = (sums[i - 1][j] + sums[i][j - 1]
                          - sums[i - 1][j - 1]
                          + (1 if matrix[i - 1][j - 1] == '1' else 0))

    def check(x, y, w, h):
        count = (sums[x + h - 1][y + w - 1] - sums[x + h - 1][y - 1]
                 - sums[x - 1][y + w - 1] + sums[x - 1][y - 1])
        return count if count == w * h else 0

    maxsize = 0
    for i in range(row):
        for j in range(col):  # start point (i, j)
            if matrix[i][j] == '0':
                continue
            for h in range(1, row - i + 1):  # size of the window
                for w in range(1, col - j + 1):
                    rslt = check(i + 1, j + 1, w, h)
                    if rslt == 0:
                        # we definitely need to break here,
                        # or else we get a wrong result
                        break
                    maxsize = max(maxsize, rslt)
    return maxsize

This still cannot be accepted. So we need another solution: reuse the
largest rectangle in histogram, row by row.
def maximalRectangle(self, matrix):
    """
    :type matrix: List[List[str]]
    :rtype: int
    """
    if not matrix:
        return 0
    if len(matrix[0]) == 0:
        return 0

    def getMaxAreaHist(heights):
        if not heights:
            return 0
        maxsize = max(heights)
        stack = [-1]
        # the stack will only grow
        for i, h in enumerate(heights):
            if stack[-1] != -1:
                if h > heights[stack[-1]]:
                    stack.append(i)
                else:
                    # start to pop and compute the area
                    while stack[-1] != -1 and h <= heights[stack[-1]]:
                        # same or equal needs to be popped out
                        idx = stack.pop()
                        v = heights[idx]
                        maxsize = max(maxsize, (i - stack[-1] - 1) * v)
                    stack.append(i)
            else:
                stack.append(i)
        # handle what is left in the stack
        while stack[-1] != -1:
            idx = stack.pop()
            v = heights[idx]
            maxsize = max(maxsize, (len(heights) - stack[-1] - 1) * v)
        return maxsize

    row, col = len(matrix), len(matrix[0])
    heights = [0] * col  # running height of '1's ending at this row
    maxsize = 0
    for r in range(row):
        for c in range(col):
            if matrix[r][c] == '1':
                heights[c] += 1
            else:
                heights[c] = 0
        maxsize = max(maxsize, getMaxAreaHist(heights))
    return maxsize

Monotonic Stack

122. Best Time to Buy and Sell Stock II


Say you have an array for which the ith element is the price of a given
stock on day i.
Design an algorithm to find the maximum profit. You may complete
as many transactions as you like (i.e., buy one and sell one share of
the stock multiple times).
Note: You may not engage in multiple transactions at the same time
(i.e., you must sell the stock before you buy again).
Example 1:

Input: [7,1,5,3,6,4]
Output: 7
Explanation: Buy on day 2 (price = 1) and sell on day 3 (price = 5),
profit = 5-1 = 4. Then buy on day 4 (price = 3) and sell on day 5
(price = 6), profit = 6-3 = 3.

Example 2:

Input: [1,2,3,4,5]
Output: 4
Explanation: Buy on day 1 (price = 1) and sell on day 5 (price = 5),
profit = 5-1 = 4. Note that you cannot buy on day 1, buy on day 2 and
sell them later, as you are engaging in multiple transactions at the
same time. You must sell before buying again.

Example 3:

Input: [7,6,4,3,1]
Output: 0
Explanation: In this case, no transaction is done, i.e. max profit = 0.

Solution: the difference compared with the first problem is that we
can have multiple transactions, so whenever we can make a profit we
make a transaction. Notice that if we have [1,2,3,5], we only need one
transaction: buy at 1 and sell at 5, which makes a profit of 4. This
problem can be solved with a decreasing monotonic stack: whenever the
incoming price is larger than the stack top, we pop that number, which
is the smallest price so far before index i, and take the transaction
that makes the biggest profit = current price − popped price.
Otherwise, we keep pushing smaller prices onto the stack.
def maxProfit(self, prices):
    """
    :type prices: List[int]
    :rtype: int
    """
    mono_stack = []
    profit = 0
    for p in prices:
        if not mono_stack:
            mono_stack.append(p)
        else:
            if p < mono_stack[-1]:
                mono_stack.append(p)
            else:
                # pop the most recent lower price and take the profit
                if mono_stack and mono_stack[-1] < p:
                    price = mono_stack.pop()
                    profit += p - price
                # keep popping until the stack is decreasing again
                while mono_stack and mono_stack[-1] < p:
                    price = mono_stack.pop()
                mono_stack.append(p)
    return profit

Also, there are other solutions that use only O(1) space. Say the given
array is [7, 1, 5, 3, 6, 4]. If we plot the numbers of the given array
on a graph (Fig. 25.3), we notice that the points of interest are the
consecutive valleys and peaks.

Figure 25.3: Track the peaks and valleys

Mathematically speaking:

$\text{TotalProfit} = \sum_{i} \big(\text{height}(peak_i) - \text{height}(valley_i)\big)$  (25.1)

The key point is that we need to consider every peak immediately
following a valley to maximize the profit. If we skip one of the peaks
(trying to obtain more profit), we will end up losing the profit of one
of the transactions, leading to an overall smaller profit.
class Solution {
    public int maxProfit(int[] prices) {
        int i = 0;
        int valley = prices[0];
        int peak = prices[0];
        int maxprofit = 0;
        while (i < prices.length - 1) {
            // walk down to the next valley
            while (i < prices.length - 1 && prices[i] >= prices[i + 1])
                i++;
            valley = prices[i];
            // walk up to the next peak
            while (i < prices.length - 1 && prices[i] <= prices[i + 1])
                i++;
            peak = prices[i];
            maxprofit += peak - valley;
        }
        return maxprofit;
    }
}

This next solution follows the same logic, with only a slight variation.
Instead of looking for every peak following a valley, we simply crawl
over the slope and keep adding the profit obtained from every
consecutive transaction. We still use the peaks and valleys effectively,
but we need not track the prices corresponding to the peaks and valleys:
we directly keep adding the difference between the consecutive numbers
of the array whenever the second number is larger than the first, and
the total sum we obtain is the maximum profit. This simplifies the
solution. It can be made clearer by taking the example
[1, 7, 2, 3, 6, 7, 6, 7], whose graph is shown in Fig. 25.4.
From the graph, we can observe that the sum A + B + C is equal to the
difference D between the heights of the consecutive peak and valley.
class Solution {
    public int maxProfit(int[] prices) {
        int maxprofit = 0;
        for (int i = 1; i < prices.length; i++) {
            if (prices[i] > prices[i - 1])
                maxprofit += prices[i] - prices[i - 1];
        }
        return maxprofit;
    }
}

Figure 25.4: profit graph

25.3 Heap and Priority Queue


25.5 621. Task Scheduler (medium). Given a char array representing
tasks a CPU needs to do, containing capital letters A to Z where
different letters represent different tasks. Tasks can be done without
the original order. Each task takes one interval, and for each interval
the CPU can either finish one task or be idle. However, there is a
non-negative cooling interval n: between two same tasks, there must be
at least n intervals during which the CPU is doing different tasks or
being idle. Return the least number of intervals the CPU will take to
finish all the given tasks.

Example:

Input: tasks = ["A", "A", "A", "B", "B", "B"], n = 2
Output: 8
Explanation: A -> B -> idle -> A -> B -> idle -> A -> B.

Analysis: we can approach the problem by asking when we get the least
idle time. Putting the same tasks close together incurs the largest
idle time, so rule number 1 is: put different tasks next to each other
whenever possible. However, consider the case "A":6, "B":1, "C":1,
"D":1, "E":1. If we simply do rounds of using all the available tasks
in decreasing order of frequency, we get 'A, B, C, D, E, A, ?, A, ?,
A, ?, A, ?', where each '?' represents an idle slot; here we end up
with four of them. This is not the best solution. A better way is to
use up the most frequent task as soon as its cooling time is finished.
The new order is 'A, B, C, A, D, E, A, ?, A, ?, A, ?, A'. We end up
with one less idle session. We implement it with heapq, due to the
fact that it is more efficient than PriorityQueue().
Solution 1: heapq and idle cycles. We use a map to get the
frequency of each task, then put the (negated) frequencies into a heap
with the heapify function. While the heap is not empty, for each idle
cycle, which is n+1 slots, we pop items out, decrease their frequency,
and add to the elapsed time. (Using PriorityQueue() here would receive
TLE.) We need O(n) to iterate through the tasks list to get the
frequencies; heapify takes O(26), and each heappush takes O(log 26).
This keeps the overall time complexity at O(n).
from collections import Counter
import heapq

def leastInterval(self, tasks, n):
    c = Counter(tasks)
    h = [-count for _, count in c.items()]
    heapq.heapify(h)

    ans = 0
    while h:
        temp = []
        i = 0
        while i <= n:  # a cycle is n + 1 slots
            if h:
                c = heapq.heappop(h)
                if c < -1:
                    temp.append(c + 1)
            ans += 1
            # if both the heap and temp are empty, we reached the
            # end: break, no idle slots needed
            if not h and not temp:
                break
            i += 1
        for c in temp:
            heapq.heappush(h, c)
    return ans

Solution 2: Use Sorting. Observing Fig. 25.5, the actual time =
idle time + total number of tasks. So all we need to do is compute the
idle time. We start with the initial idle time, which is (biggest
frequency − 1) * n. Then we traverse the sorted frequency list from the
second item and decrease the initial idle time. This gives us O(n) time
too, but the concept and the coding are easier.
Figure 25.5: Task Scheduler. Left is the first step; right is the schedule we end up with.

from collections import Counter

def leastInterval(self, tasks, n):
    c = Counter(tasks)
    f = [count for _, count in c.items()]
    f.sort(reverse=True)
    idle_time = (f[0] - 1) * n

    for i in range(1, len(f)):
        idle_time -= min(f[i], f[0] - 1)
    return idle_time + len(tasks) if idle_time > 0 else len(tasks)
26

String Questions (15%)

String problems can be divided into two categories: single-string problems
and two-string pattern matching.
For single-string problems, the first type is to perform operations that
meet certain requirements on one string. (1) For the ad hoc, easy string
processing problems, we only need to read the requirements carefully and
use basic programming skills and data structures; sometimes they require
familiarity with string libraries such as re, in addition to the basic
built-in string functions. We list some LeetCode problems of this type in
Section 26.1. (2) There are also more challenging problems, including
finding the longest/shortest/count of substrings or subsequences that
satisfy certain requirements. Usually a subsequence version is more
difficult than its substring counterpart. In this chapter, we cover the
following types in Section 26.3:

• Palindrome: a sequence of characters that reads the same forward and
backward.

• Anagram: a word or phrase formed by rearranging the letters of a
different word or phrase.

• Parentheses and others.

Application of pattern matching for two strings: given two strings
or arrays, a string S and a pattern P, the problems can be generalized to
finding pattern P in string S. (1) If we do not care about the order of
the letters in the pattern (anagram), it is best to use a Sliding Window;
this is detailed in Section 26.5. (2) If the order matters (identical to
the pattern), we use KMP. The problems of this type are listed in
Section 26.6.


26.1 Ad Hoc Single String Problems

1. 125. Valid Palindrome

2. 65. Valid Number

3. 20. Valid Parentheses (use a stack to save the left parentheses)

4. 214. Shortest Palindrome (KMP lookup table; for example, for s = abba,
construct S = abba#abba)

5. 5. Longest Palindromic Substring

6. 58. Length of Last Word (easy)

26.2 String Expression


1. 8. String to Integer (atoi) (medium)
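No solution is shown in the text; as a minimal sketch, the usual parse-and-clamp approach for LeetCode 8 (skip leading spaces, read an optional sign, accumulate digits, clamp to the 32-bit range) looks like this:

def myAtoi(s):
    INT_MAX, INT_MIN = 2**31 - 1, -2**31
    i, n = 0, len(s)
    while i < n and s[i] == ' ':  # 1. skip leading spaces
        i += 1
    sign = 1
    if i < n and s[i] in '+-':    # 2. optional sign
        sign = -1 if s[i] == '-' else 1
        i += 1
    num = 0
    while i < n and s[i].isdigit():  # 3. accumulate digits
        num = num * 10 + int(s[i])
        i += 1
    num *= sign
    return max(INT_MIN, min(INT_MAX, num))  # 4. clamp to 32-bit range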

26.3 Advanced Single String


For hard problems, reconstruct the problem into another one so that it can
be solved by an algorithm you already know.

26.3.1 Palindrome
A palindrome is a sequence of characters that reads the same forward and
backward. To identify whether a sequence, say "abba", is a palindrome, we
just need to check if s == s[::-1]. Structurally, if we know "bb" is a
palindrome, then "abba" is a palindrome if s[0] == s[3]. Due to this
structure, in problems about finding palindromic substrings, we can apply
dynamic programming and other algorithms to beat the naive solution.
To validate a palindrome we can use two pointers, one at the start and
the other at the end, and move them toward the middle.
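A minimal sketch of this two-pointer check:

def is_palindrome(s):
    i, j = 0, len(s) - 1
    while i < j:
        if s[i] != s[j]:
            return False
        i += 1
        j -= 1
    return True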

1. 409. Longest Palindrome (*)

2. 9. Palindrome Number (*)

3. Palindrome Linked List (234, *)

4. Valid Palindrome (125, *)

5. Valid Palindrome II (680, *)

6. Largest Palindrome Product (479, *)



7. 647. Palindromic Substrings (medium, check)

8. Longest Palindromic Substring (5, **, check)

9. Longest Palindromic Subsequence(516, **)

10. Shortest Palindrome (214, ***)

11. Find the Closest Palindrome(564, ***)

12. Count Different Palindromic Subsequences(730, ***)

13. Palindrome Partitioning (131, **)

14. Palindrome Partitioning II (132, ***)

15. 266. Palindrome Permutation (Easy)

16. Palindrome Permutation II (267, **)

17. Prime Palindrome (866, **)

18. Super Palindromes (906, ***)

19. Palindrome Pairs (336, ***)


26.1 Valid Palindrome II (L680, *). Given a non-empty string s, you
may delete at most one character. Judge whether you can make it a
palindrome.

Example 1:

Input: "aba"
Output: True

Example 2:

Input: "abca"
Output: True
Explanation: You could delete the character 'c'.

Solution: Two Pointers. If zero deletions were allowed, this would be
the normal two-pointer algorithm checking that positions i (start) and
j (end) hold the same char. With one deletion allowed, when the start
and end chars are not equal, we check whether deleting s[i] or s[j],
leaving s(i+1, j) or s(i, j-1), gives a palindrome.

def validPalindrome(self, s):
    if not s:
        return True

    i, j = 0, len(s) - 1
    while i <= j:
        if s[i] == s[j]:
            i += 1
            j -= 1
        else:
            left = s[i + 1:j + 1]
            right = s[i:j]
            return left == left[::-1] or right == right[::-1]
    return True

26.2 Palindromic Substrings (L647, **). Given a string, your task is to
count how many palindromic substrings are in this string. Substrings
with different start or end indexes are counted as different substrings
even if they consist of the same characters.

Example 1:

Input: "abc"
Output: 3
Explanation: Three palindromic strings: "a", "b", "c".

Example 2:

Input: "aaa"
Output: 6
Explanation: Six palindromic strings: "a", "a", "a", "aa", "aa", "aaa".

Solution 1: Dynamic Programming. First, we use dp[i][j] to denote
whether the substring s[i..j] is a palindrome. Thus we have a matrix of
size n × n. We can apply it to the simple example "aaa":

"aaa"
    0  1  2
0   1  1  1
1   0  1  1
2   0  0  1

From the example, we know this matrix only has valid values in its upper
triangle, because i <= j. If j − i > 2, which means the length is at
least 4, then dp[i][j] = 1 iff s[i] == s[j] and dp[i+1][j-1] == 1; for
shorter lengths, dp[i][j] = (s[i] == s[j]) suffices. Since dp[i][j]
depends on dp[i+1][j-1], we need to iterate i in reverse and j
incrementally.
def countSubstrings(self, s):
    """
    :type s: str
    :rtype: int
    """
    n = len(s)
    # dp[i][j]: whether s[i..j] is a palindrome
    dp = [[0 for _ in range(n)] for _ in range(n)]
    res = 0
    for i in range(n - 1, -1, -1):
        for j in range(i, n):
            if j - i > 2:  # length >= 4
                dp[i][j] = (s[i] == s[j] and dp[i + 1][j - 1])
            else:  # length 1, 2, or 3
                dp[i][j] = (s[i] == s[j])
            if dp[i][j]:
                res += 1
    return res

Range Type Dynamic Programming. A slightly different way to fill
out the matrix is by the length of the substring:
def countSubstrings(self, s):
    if not s:
        return 0

    rows = len(s)
    dp = [[0 for col in range(rows)] for row in range(rows)]
    ans = 0
    for i in range(0, rows):
        dp[i][i] = 1
        ans += 1

    for l in range(2, rows + 1):  # length of the substring
        for i in range(0, rows - l + 1):  # start index
            j = i + l - 1  # end index
            if s[i] == s[j]:
                if j - i > 2:
                    dp[i][j] = dp[i + 1][j - 1]
                else:
                    dp[i][j] = 1
                ans += dp[i][j]

    return ans

Solution 2: Center Expansion. For "aaa": s[0]='a' has its center at
0; s[0:2]='aa' has its center between 0 and 1; s[1]='a' has its center
at 1; s[0:3]='aaa' has its center at 1; s[1:3]='aa' has its center
between 1 and 2; s[2]='a' has its center at 2. Therefore our centers go
as follows, and the time complexity is O(n^2):

left = 0, right = 0, i = 0, i//2 = 0, i%2 = 0
left = 0, right = 1, i = 1, i//2 = 0, i%2 = 1
left = 1, right = 1, i = 2, i//2 = 1, i%2 = 0
left = 1, right = 2, i = 3, i//2 = 1, i%2 = 1
left = 2, right = 2, i = 4, i//2 = 2, i%2 = 0

def countSubstrings(self, S):
    n = len(S)
    ans = 0
    for i in range(2 * n - 1):
        l = i // 2
        r = l + i % 2
        while l >= 0 and r < n and S[l] == S[r]:
            ans += 1
            l -= 1
            r += 1
    return ans

Solution 3: Manacher's Algorithm. In the center expansion, we can
save the result according to each position i; Fig. 26.1 shows the LPS
(longest palindromic span) length at each position.

Figure 26.1: LPS length at each position for palindrome.

We can see that around position 6 the LPS table is symmetric. What
Manacher's algorithm does is identify, around the center of a
palindrome, when the table is symmetric and when it is not: for
instance at position 3, the immediate left and right (2, 4) are
symmetric, but (0, 5) are not. This is distinguished by the LPS length
at position 3: only the span (i − d, i, i + d) is symmetric. The code
is given below:
def countSubstrings(self, S):
    def manachers(S):
        # '@' and '$' are sentinels that stop the expansion loop
        A = '@#' + '#'.join(S) + '#$'
        Z = [0] * len(A)
        center = right = 0
        for i in range(1, len(A) - 1):
            if i < right:
                Z[i] = min(right - i, Z[2 * center - i])
            while A[i + Z[i] + 1] == A[i - Z[i] - 1]:
                Z[i] += 1
            if i + Z[i] > right:
                center, right = i, i + Z[i]
        return Z

    return sum((v + 1) // 2 for v in manachers(S))

26.3 Longest Palindromic Subsequence (L516, **). Given a string s,
find the longest palindromic subsequence's length in s. You may assume
that the maximum length of s is 1000.

Example 1:
Input: "bbbab"
Output: 4
One possible longest palindromic subsequence is "bbbb".

Example 2:
Input: "cbbd"
Output: 2
One possible longest palindromic subsequence is "bb".

Solution: Range Type Dynamic Programming. We use dp[i][j] to
denote the length of the longest palindromic subsequence of s[i..j].
Like the substring palindrome, we only need to fill the upper triangle
of the matrix. Let us dismantle the problem by the length L of the
substring, with i = 0..n-L and j = i+L-1:

L=2: bb bb ba ab
L=3: bbb bba bab; if s[i] == s[j]: dp[i][j] = dp[i+1][j-1] + 2, which
     we obtained from the previous length; otherwise:
     dp[i][j] = max(dp[i+1][j], dp[i][j-1])
L=4: bbba, bbab
L=5: bbbab

The process is controlled by the range of the substring length, and we
fill out the matrix in the following way; this is a range type of
dynamic programming:

[[1, 2, 0, 0, 0], [0, 1, 2, 0, 0], [0, 0, 1, 1, 0], [0, 0, 0, 1, 1], [0, 0, 0, 0, 1]]
[[1, 2, 3, 0, 0], [0, 1, 2, 2, 0], [0, 0, 1, 1, 3], [0, 0, 0, 1, 1], [0, 0, 0, 0, 1]]
[[1, 2, 3, 3, 0], [0, 1, 2, 2, 3], [0, 0, 1, 1, 3], [0, 0, 0, 1, 1], [0, 0, 0, 0, 1]]
[[1, 2, 3, 3, 4], [0, 1, 2, 2, 3], [0, 0, 1, 1, 3], [0, 0, 0, 1, 1], [0, 0, 0, 0, 1]]

def longestPalindromeSubseq(self, s):
    if not s:
        return 0
    if s == s[::-1]:
        return len(s)

    rows = len(s)
    dp = [[0 for col in range(rows)] for row in range(rows)]
    for i in range(0, rows):
        dp[i][i] = 1

    for l in range(2, rows + 1):  # length of the substring
        for i in range(0, rows - l + 1):  # start index
            j = i + l - 1  # end index
            if s[i] == s[j]:
                dp[i][j] = dp[i + 1][j - 1] + 2
            else:
                dp[i][j] = max(dp[i][j - 1], dp[i + 1][j])
    return dp[0][rows - 1]

26.3.2 Calculator
In this section, for the basic calculator, we have the operators '+',
'-', '*', '/', and parentheses. ('+', '-') and ('*', '/') have different
priorities, and the parentheses change the priority too. The basic step
is to obtain the integers digit by digit from the string; and if the
previous sign is '-', we make sure we get a negative number. Given a
string expression (a+b/c)*(d-e)+((f-g)-(h-i))+(j-k), the rule here is
to reduce it to:

$\underbrace{\underbrace{(a + \underbrace{b/c}_{b})}_{a} * \underbrace{(d-e)}_{d}}_{a} + \underbrace{(\underbrace{(f-g)}_{f} - \underbrace{(h-i)}_{h})}_{f} + \underbrace{(j-k)}_{j}$  (26.1)

Each underbrace marks the single value that a sub-expression is reduced
to. The rules are: 1) Reduce '*' and '/': we handle these when we
encounter the following operator or the end of the string. When we
encounter a sign (operator), we check the previous sign; if the previous
sign is '/' or '*', we compute the previous number with the current
number to reduce them into one. 2) Reduce each parenthesized group into
one number: (d-e) is reduced to d, and because the previous sign is '*',
it is further combined with a and becomes a. Thus, if we save the
reduced results into a stack, it will contain [a, f, j], and we just
need to sum them up. To avoid boundary conditions, we can append '+' at
the end of the string. In the later part, we will explain more about how
to handle the above two kinds of reduction. There are different levels
of calculators:

1. '+', '-', without parentheses: e.g., a+b+c, or a-b-c.

presign = '+', num = 0 for the digits, stack for saving
either negative or positive integers
1. iterate through the chars:
   if a digit: obtain the integer
   elif c in ['+', '-'] or c is the last char:
       if presign == '-':
           num = -num
       stack.append(num)
       num = 0
       presign = c
2. sum over the positive and negative values in the stack

2. '+', '-', with parentheses: e.g. a-b-c vs a-(b-c-d). To handle the
parentheses, we need to think of (b-c-d) as a single integer. When we
encounter a left parenthesis, we save the current state: the previous
sign and '('. When we encounter ')', we sum over the values in the
stack until we pop out the matching '(', and we restore the saved
state: the previous sign and the num.

if c == '(':
    stack.append(presign)
    stack.append('(')
    presign = '+'
    num = 0
elif c in ['+', '-', ')']:  # an operator or ')'
    if presign == '-':
        num = -num
    if c == ')':
        sum over the stack till the top is '(',
        restore the state
    else:
        stack.append(num)
    num = 0, presign = c

3. '+', '-', '*', '/', without parentheses: this is similar to Case 1,
except for '*' and '/'. For example, a-b/c/d*e: when we are at c, we
pop the top element off the stack and compute (-b/c) = f, and append f
onto the stack. When we are at d, similarly, we compute (f/d) = g and
append g onto the stack.

1. iterate through the chars:
   if a digit: obtain the integer
   if c in ['+', '-', '*', '/'] or c is the last char:
       if presign == '-':
           num = -num
       # we reduce the current num with the previous one
       elif presign in ['*', '/']:
           num = operator(stack.pop(), presign, num)
       stack.append(num)
       num = 0
       presign = c
2. sum over the positive and negative values in the stack

4. '+', '-', '*', '/', with parentheses: this is a combination of the
previous cases, so no pseudocode is given here; see Problem 26.5 below
for a full implementation.

26.4 Basic Calculator (L224, ***). Implement a basic calculator to
evaluate a simple expression string. The expression string may contain
open ( and closing parentheses ), the plus + or minus sign -,
non-negative integers, and empty spaces.

Example 1:
Input: "1 + 1"
Output: 2

Example 2:
Input: "2-1 + 2"
Output: 3

Example 3:
Input: "(1+(4+5+2)-3)+(6+8)"
Output: 23

Stack for Parentheses. Suppose first that we do not consider the
parentheses; then it is a linear pass over each char, handling the
digits and the sign (the first if and elif branches in the Python code
below). Now consider the parentheses: they do affect the result. For
2-(5-6), with and without parentheses the answers are 3 and -9
respectively. When we encounter a '(', we reset ans and the sign, and
save the previous ans and sign; here it is (2, -). Then when we
encounter a ')', we first collect the answer from the last '(' to the
current ')', and sum it up with the answer saved before the '('.

(1+(4+5+2)-3)+(6+8)
at first '(':  stack = [0, +]
at second '(': stack = [0, +, 1, +]
at first ')':  ans = 11, pop out [1, +], ans = 12
at second ')': ans = 9, pop out [0, +], ans = 9
at third '(':  stack = [9, +], reset ans = 0, sign = '+'

import collections

def calculate(self, s):
    s = s + '+'
    ans = num = 0  # num accumulates each number
    sign = '+'
    stack = collections.deque()
    for c in s:
        if c.isdigit():  # get the number
            num = 10 * num + int(c)
        elif c in ['-', '+', ')']:
            if sign == '-':
                num = -num
            if c == ')':
                while stack and stack[-1] != '(':
                    num += stack.pop()
                stack.pop()  # pop out '('
                sign = stack.pop()
            else:
                stack.append(num)
                num = 0
                sign = c
        elif c == '(':  # save the current sign and '(' in the stack
            stack.append(sign)
            stack.append('(')
            num = 0
            sign = '+'

    while stack:
        ans += stack.pop()
    return ans

26.5 Basic Calculator III (L772, ***). Implement a basic calculator to
evaluate a simple expression string. The expression string contains only
non-negative integers, the +, -, *, / operators, open ( and closing
parentheses ), and empty spaces. Integer division should truncate toward
zero. You may assume that the given expression is always valid. All
intermediate results will be in the range of [-2147483648, 2147483647].

Some examples:

"1 + 1" = 2
"6-4 / 2" = 4
"2*(5+5*2)/3+(6/2+8)" = 21
"(2+6*3+5-(3*14/7+2)*5)+3" = -12

Solution: Case 4.

import collections

def calculate(self, s):
    num = 0
    stack = collections.deque()
    presign = '+'
    s = s + '+'

    def op(pre, sign, cur):
        if sign == '*':
            return pre * cur
        if sign == '/':
            # integer division truncating toward zero
            return -(abs(pre) // cur) if pre < 0 else pre // cur

    for c in s:
        if c.isdigit():
            num = 10 * num + int(c)
        elif c in ['+', '-', '*', '/', ')']:
            if presign == '-':
                num = -num
            elif presign in ['*', '/']:
                num = op(stack.pop(), presign, num)
            if c == ')':  # reduce to one number and restore the state
                while stack and stack[-1] != '(':
                    num += stack.pop()
                stack.pop()  # pop out '('
                presign = stack.pop()
            else:
                stack.append(num)
                num = 0
                presign = c
        elif c == '(':  # save the state and restart a new process
            stack.append(presign)
            stack.append(c)
            presign = '+'
            num = 0

    ans = 0
    while stack:
        ans += stack.pop()
    return ans

26.6 227. Basic Calculator II (exercise)

26.3.3 Others
Possible methods: two pointers, or one loop plus two pointers.

26.4 Exact Matching: Sliding Window and KMP


Exact Pattern Matching
1. 14. Longest Common Prefix (easy)
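No solution is given in the text; a minimal sketch of the usual prefix-shrinking approach:

def longestCommonPrefix(strs):
    if not strs:
        return ""
    prefix = strs[0]
    for s in strs[1:]:
        # shrink the candidate prefix until it matches the start of s
        while not s.startswith(prefix):
            prefix = prefix[:-1]
            if not prefix:
                return ""
    return prefix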

26.5 Anagram Matching: Sliding Window

If the question is to find all anagrams of a pattern in string S, a
sliding window works well. For example: 438. Find All Anagrams in a
String.

Example 1:

Input:
s: "cbaebabacd" p: "abc"

Output:
[0, 6]

Explanation: the substring with start index = 0 is "cba", which is an
anagram of "abc"; the substring with start index = 6 is "bac", which is
an anagram of "abc".
Python code with a sliding window:
Python code with sliding window:

def findAnagrams(self, s, p):
    """
    :type s: str
    :type p: str
    :rtype: List[int]
    """
    if len(s) < len(p) or not s:
        return []
    # frequency table of the pattern
    table = {}
    for c in p:
        table[c] = table.get(c, 0) + 1

    begin, end = 0, 0
    r = []
    counter = len(table)  # number of distinct chars still to match
    while end < len(s):
        end_char = s[end]
        if end_char in table:
            table[end_char] -= 1
            if table[end_char] == 0:
                counter -= 1
        # grow the window to a longer string
        end += 1

        while counter == 0:  # the window covers all chars; start to trim
            # save the result: only the beginning index is needed
            if end - begin == len(p):
                r.append(begin)
            # move the window forward
            start_char = s[begin]
            if start_char in table:  # reverse the count
                table[start_char] += 1
                if table[start_char] == 1:  # only increase when it hits 1
                    counter += 1
            begin += 1
    return r

26.6 Exact Matching


26.6.1 Longest Common Subsequence
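This subsection is a stub in the text; for reference, a standard O(mn) dynamic-programming sketch for the longest common subsequence of two strings:

def longestCommonSubsequence(a, b):
    # dp[i][j]: LCS length of a[:i] and b[:j]
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]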

26.7 Exercise
26.7.1 Palindrome

26.1 EXERCISES
1. Valid Palindrome (L125, *). Given a string, determine if it is a
palindrome, considering only alphanumeric characters and ignoring cases.
Note: for the purpose of this problem, we define the empty string as a
valid palindrome.

Example 1:
Input: "A man, a plan, a canal: Panama"
Output: true

Example 2:
Input: "race a car"
Output: false

2. Longest Palindrome (L409, *). Given a string which consists of
lowercase or uppercase letters, find the length of the longest palindrome
that can be built with those letters. This is case sensitive; for
example, "Aa" is not considered a palindrome here.

Example:
Input: "abccccdd"
Output: 7
Explanation: one longest palindrome that can be built is "dccaccd",
whose length is 7.

3. Longest Palindromic Substring (L5, **). Given a string s, find the
longest palindromic substring in s. You may assume that the maximum
length of s is 1000.

Example 1:
Input: "babad"
Output: "bab"
Note: "aba" is also a valid answer.

Example 2:
Input: "cbbd"
Output: "bb"
27

Tree Questions(10%)

The purpose of this chapter is to practice tree questions, focusing on the binary search tree and the segment tree.

27.1 Binary Search Tree


In computer science, a search tree is a tree data structure used for
locating specific keys from within a set. In order for a tree to
function as a search tree, the key for each node must be greater than
any key in the subtree on its left and less than any key in the subtree
on its right.
The advantage of search trees is their efficient search time (O(log n))
given the tree is reasonably balanced, which is to say the leaves at
either end are of comparable depths, as we introduced with the balanced
binary tree.
The search tree data structure supports many dynamic-set operations,
including search for a key, minimum or maximum, predecessor or
successor, insert, and delete. Thus, a search tree can be used both as
a dictionary and as a priority queue.
A binary search tree (BST) is a search tree with up to two children per
node. There are three possible ways to properly define a BST, using l
and r to represent the left and right children of a node x:
1) l.key ≤ x.key < r.key; 2) l.key < x.key ≤ r.key;
3) l.key < x.key < r.key. Under the first and second definitions, the
resulting BST allows duplicates, while this is not the case under the
third definition. One example of a BST without duplicates is shown in
Fig. 27.1.

Operations
Figure 27.1: Example of a binary search tree of depth 3 and 8 nodes.

When looking for a key in a tree (or a place to insert a new key), we
traverse the tree from root to leaf, making comparisons to keys stored
in the nodes of the tree and deciding, on the basis of the comparison, to continue
searching in the left or right subtrees. On average, this means that each
comparison allows the operations to skip about half of the tree, so that each
SEARCH, INSERT or DELETE takes time proportional to the logarithm of
the number of items stored in the tree. This is much better than the linear
time required to find items by key in an (unsorted) array, but slower than
the corresponding operations on hash tables.
In order to build a BST, we need to INSERT a series of elements in the
tree organized by the searching tree property, and in order to INSERT, we
need to SEARCH the position to INSERT this element. Thus, we introduce
these operations in the order of SEARCH, INSERT and GENERATE.

Figure 27.2: The lightly shaded nodes indicate the simple path from the root
down to the position where the item is inserted. The dashed line indicates
the link in the tree that is added to insert the item.

SEARCH There are two different implementations for SEARCH: recursive
and iterative.

# recursive searching
def search(root, key):
    # base cases: root is None, or the key is present at root
    if root is None or root.val == key:
        return root
    # key is greater than root's key
    if root.val < key:
        return search(root.right, key)
    # key is smaller than root's key
    return search(root.left, key)

Also, we can write it in an iterative way, which saves call-stack space:

# iterative searching
def iterative_search(root, key):
    while root is not None and root.val != key:
        if root.val < key:
            root = root.right
        else:
            root = root.left
    return root

INSERT Assume we are inserting node 13 into the tree shown in
Fig. 27.2. A new key is always inserted at a leaf (there are other ways
to insert, but here we only discuss this one). We start searching for
the key from the root until we hit an empty node; then we create a
TreeNode and attach this new node as either the left or the right child
according to the search property. Here we again show both the recursive
and iterative solutions.
# recursive insertion
def insertion(root, key):
    if root is None:
        root = TreeNode(key)
        return root
    if root.val < key:
        root.right = insertion(root.right, key)
    else:
        root.left = insertion(root.left, key)
    return root

The above code needs a return value and reassigns the left or right
child on every call. We can use the following code instead, which may
look more complex with the extra if conditions, but it runs faster and
only assigns a child once, at the end.
# recursive insertion
def insertion(root, val):
    if root is None:
        root = TreeNode(val)
        return
    if val > root.val:
        if root.right is None:
            root.right = TreeNode(val)
        else:
            insertion(root.right, val)
    else:
        if root.left is None:
            root.left = TreeNode(val)
        else:
            insertion(root.left, val)

We can also search iteratively, saving the previous node along the way.
The while loop stops when it hits an empty node. There are three cases
for the previous node:

1. The previous node is None, which means the tree is empty, so we
assign a root node with the value.
2. The previous node has a value larger than the key, so we put the
key as its left child.
3. The previous node has a value smaller than the key, so we put the
key as its right child.

# iterative insertion
def iterativeInsertion(root, key):
    pre_node = None
    node = root
    while node is not None:
        pre_node = node
        if key < node.val:
            node = node.left
        else:
            node = node.right
    # we reached the empty spot; pre_node is its parent
    if pre_node is None:
        root = TreeNode(key)
    elif pre_node.val > key:
        pre_node.left = TreeNode(key)
    else:
        pre_node.right = TreeNode(key)
    return root

BST Generation First, let us declare the BST as its root node. Given
a list, we just need to call INSERT for each element. The time
complexity is O(n log n).
datas = [8, 3, 10, 1, 6, 14, 4, 7, 13]
BST = None
for key in datas:
    BST = iterativeInsertion(BST, key)
print(LevelOrder(BST))
# output
# [8, 3, 10, 1, 6, 14, 4, 7, 13]

DELETE Before checking the implementation of DELETE, I suggest reading
the next subsection, the Features of BST, first, and then coming back
here to finish this part. When we delete a node, three possibilities
arise.

1) Node to be deleted is a leaf: simply remove it from the tree.

        50                             50
       /  \       delete(20)          /  \
      30   70     --------->        30    70
     / \   / \                        \   / \
    20 40 60  80                      40 60  80

2) Node to be deleted has only one child: copy the child to the node
and delete the child.

        50                             50
       /  \       delete(30)          /  \
      30   70     --------->        40    70
        \  / \                            / \
        40 60 80                        60   80

3) Node to be deleted has two children: find the inorder successor of
the node, copy the contents of the inorder successor to the node, and
delete the inorder successor. Note that the inorder predecessor can
also be used.

        50                             60
       /  \       delete(50)          /  \
      40   70     --------->        40    70
           / \                               \
          60  80                              80

The important thing to note is that the inorder successor is needed
only when the right child is not empty. In this particular case, the
inorder successor can be obtained by finding the minimum value in the
right child of the node.
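The text above describes the three cases without code; below is a minimal recursive sketch following those rules (assuming the same TreeNode and the get_minimum helper defined in the next subsection):

def delete(root, key):
    if root is None:
        return None
    if key < root.val:
        root.left = delete(root.left, key)
    elif key > root.val:
        root.right = delete(root.right, key)
    else:
        # cases 1 and 2: zero children or one child
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # case 3: two children; copy the inorder successor, which is
        # the minimum of the right subtree, then delete it there
        succ = get_minimum(root.right)
        root.val = succ.val
        root.right = delete(root.right, succ.val)
    return root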

Features of BST
Minimum and Maximum The operation is similar to search: to find the
minimum, we always traverse into the left subtree; for the maximum, we
just replace "left" with "right". The time complexity is the same,
O(lg n).
# recursive
def get_minimum(root):
    if root is None:
        return None
    if root.left is None:  # a leaf, or a node with no left subtree
        return root
    return get_minimum(root.left)

# iterative
def iterative_get_minimum(root):
    while root.left is not None:
        root = root.left
    return root

Also, sometimes we need to find two additional items related to a given
node: its successor and its predecessor. The structure of a binary
search tree allows us to determine both without ever comparing keys,
provided parent pointers are available.

Successor of a Node A successor of node x is the smallest item in the
BST that is strictly greater than x. It is also called the in-order
successor: the next node in the inorder traversal of the binary tree.
The in-order successor is None for the last node of the inorder
traversal. Suppose our TreeNode data structure has a parent pointer.
Using the parent node, the algorithm has two cases based on the right
subtree of the input node:

1) If the right subtree of the node is not None, then the successor is
the minimum node in the right subtree, e.g. for node 12,
successor(12) = 13 = min(12.right).
2) If it is None, then the successor is one of its ancestors: we
traverse up using the parent pointer until we find a node which is the
left child of its parent; that parent is the successor,
e.g. successor(2) = 5.

The Python code is provided:

def Successor(root, n):
    # step 1 of the above algorithm
    if n.right is not None:
        return get_minimum(n.right)
    # step 2 of the above algorithm
    p = n.parent
    while p is not None:
        # if the current node is its parent's left child,
        # the parent is the successor
        if n == p.left:
            return p
        n = p
        p = p.parent
    return p

However, if your tree node has no parent pointer, we can not traverse
back to its parents, so we have only one option: follow the inorder
traversal order and find the element right after the node.

1) If the right subtree of the node is not None, then the successor is
the minimum node in the right subtree, e.g. for node 12,
successor(12) = 13 = min(12.right).
2) If it is None, then the successor is one of its ancestors: we
traverse down from the root till we find the current node, and the last
node where we turned left is the successor, e.g. successor(2) = 5.

def SuccessorInorder(root, n):
    # step 1 of the above algorithm
    if n.right is not None:
        return get_minimum(n.right)
    # step 2: track the last node where we turned left
    succ = None
    while root is not None:
        if n.val > root.val:
            root = root.right
        elif n.val < root.val:
            succ = root
            root = root.left
        else:  # we found the node, no need to traverse further
            break
    return succ

Predecessor of a Node A predecessor of node x, on the other side, is
the largest item in the BST that is strictly smaller than x. It is also
called the in-order predecessor: the previous node in the inorder
traversal of the BST, e.g. for node 14,
predecessor(14) = 12 = max(14.left). The symmetric searching rule
applies: if node x's left subtree exists, we return the maximum of the
left subtree; otherwise we traverse back through its parents until the
current node is a right child, in which case we return its parent, and
otherwise the reverse traversal keeps going.

def Predecessor(root, n):
    # step 1 of the above algorithm
    if n.left is not None:
        return get_maximum(n.left)
    # step 2 of the above algorithm
    p = n.parent
    while p is not None:
        # if the current node is its parent's right child,
        # the parent is smaller and is the predecessor
        if n == p.right:
            return p
        n = p
        p = p.parent
    return p

In the worst case, finding the successor or the predecessor in a BST
searches along the height of the tree: down one of the subtrees of the
current node, or back up through its parents and grandparents, which is
bounded by the tree height. The expected time complexity is O(lg n);
the worst case is when the tree lines up with no branching, which makes
it O(n). The table below summarizes the space and time complexity of
each operation.

Table 27.1: Time complexity of operations for BST in big O notation

Algorithm   Average     Worst Case
Space       O(n)        O(n)
Search      O(log n)    O(n)
Insert      O(log n)    O(n)
Delete      O(log n)    O(n)

27.2 Segment Tree


Segment Tree is a static full binary tree, similar to a heap, that is used for
storing intervals or segments. 'Static' here means that once the data structure
is built, it can not be modified or extended. A segment tree can efficiently
answer numerous dynamic range query problems (in logarithmic time), such as
finding the minimum, maximum, sum, greatest common divisor, or least common
multiple over a range of an array. 'Dynamic' means there are constant
modifications of the values of elements (but not of the tree structure). For
instance, a typical problem asks for the minimum/maximum/sum of all elements
in a given range [i:j] of an array.

Definition Consider an array A of size n and a corresponding Segment
Tree T (here a range [0, n-1] in A is written as A[0:n-1]):

1. The root of T represents the whole array A[0:n-1].

2. Each internal node in the Segment Tree T represents an interval
A[i:j], where 0 ≤ i < j < n.

3. Each leaf in T represents a single element A[i], where 0 ≤ i < n.

4. If a parent node covers the range [i, j], then we split this range at the
middle position m = (i + j)/2 (rounded down): the left child takes the range
[i, m], and the right child takes the range [m+1, j].

Because each step of building the segment tree divides the interval into two
halves, the height of the segment tree is log n. There are n leaves and n − 1
internal nodes, so the total number of nodes is 2n − 1, which makes the
segment tree a full binary tree.
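As a quick sanity check of this node count, a minimal sketch (the function
name is just for illustration):

# with N leaves a segment tree has N - 1 internal nodes, 2N - 1 nodes total
def segment_tree_size(n_leaves):
    return 2 * n_leaves - 1

assert segment_tree_size(8) == 15  # height log2(8) = 3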
Here, we use the Range Sum Query (RSQ) problem to demonstrate how
segment tree works:

27.1 307. Range Sum Query - Mutable (medium). Given an integer
array nums, find the sum of the elements between indices i and j
(i ≤ j), inclusive. The update(i, val) function modifies nums by
updating the element at index i to val.

Example:

Given nums = [1, 3, 5]

sumRange(0, 2) -> 9
update(1, 2)
sumRange(0, 2) -> 8

Note:

1. The array is only modifiable by the update function.


2. You may assume the number of calls to update and sumRange
function is distributed evenly.

Solution: Brute-Force. There are several ways to solve the RSQ.
The brute-force solution is to simply iterate the array from index i
to j, sum up the elements, and return that sum. This gives O(n) per
query, which may be infeasible if queries are issued constantly.
Because the update and query calls are distributed evenly, this still
gives O(n) time per operation with O(n) space, and it will get a TLE
(Time Limit Exceeded) error.
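A minimal sketch of this brute-force interface (the class name is
illustrative):

class NumArrayBrute:
    def __init__(self, nums):
        self.nums = nums

    def update(self, i, val):   # O(1)
        self.nums[i] = val

    def sumRange(self, i, j):   # O(n) per query
        return sum(self.nums[i:j + 1])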

Solution: Segment Tree. With Segment Tree, we can store the


TreeNode’s val as the sum of elements in its corresponding interval.
We can define a TreeNode as follows:
class TreeNode:
    def __init__(self, val, start, end):
        self.val = val
        self.start = start
        self.end = end
        self.left = None
        self.right = None

As we will see in the process, storing the start and end indices is actually
not necessary: if we save the size of the array, we can decide the start and
end index of each node on the fly, which saves space.

Build Segment Tree. Because each leaf of the tree is a single element, we
can use divide and conquer to build the tree recursively. For a given node,
in the 'divide' step we first build and return its left and right children
(including computing their sums); in the 'conquer' step we compute this
node's sum from its left and right children's sums and attach the two
children. Because there are in total 2n − 1 nodes, both the time and space
complexity are O(n).

Figure 27.3: Illustration of Segment Tree.

def _buildSegmentTree(self, nums, s, e):  # start index and end index
    if s > e:
        return None
    if s == e:
        return self.TreeNode(nums[s])

    m = (s + e) // 2
    # divide
    left = self._buildSegmentTree(nums, s, m)
    right = self._buildSegmentTree(nums, m + 1, e)

    # conquer
    node = self.TreeNode(left.val + right.val)
    node.left = left
    node.right = right
    return node

Update Segment Tree. Updating the value at index i is like searching the
tree for the leaf node with range [i, i]. We just need to recalculate the
value of every node on the search path. This operation takes O(log n) time.
def _updateNode(self, i, val, root, s, e):
    if s == e:
        root.val = val
        return
    m = (s + e) // 2
    if i <= m:
        self._updateNode(i, val, root.left, s, m)
    else:
        self._updateNode(i, val, root.right, m + 1, e)
    root.val = root.left.val + root.right.val
    return

Range Sum Query. Each query range [i, j] is a combination of one or
multiple node ranges. For instance, in the segment tree shown in Fig 27.3,
the range [2, 4] is the combination of [2, 3] and [4, 4]. The process is
similar to updating: we start from the root and get its middle index m.
1) If [i, j] is the same as [s, e], i.e. i == s and j == e, return the node's
value. 2) If [i, j] lies within the range [s, m], i.e. j <= m, we search
only the left branch. 3) If [i, j] lies within the range [m+1, e], i.e.
i > m, we search only the right branch. 4) Otherwise, we search both
branches, with target [i, m] on the left and target [m+1, j] on the right,
and the return value is the sum of both sides. The time complexity is still
O(log n).
def _rangeQuery(self, root, i, j, s, e):
    if s > e or i > j:
        return 0
    if s == i and j == e:
        return root.val if root is not None else 0

    m = (s + e) // 2

    if j <= m:
        return self._rangeQuery(root.left, i, j, s, m)
    elif i > m:
        return self._rangeQuery(root.right, i, j, m + 1, e)
    else:
        return self._rangeQuery(root.left, i, m, s, m) + \
               self._rangeQuery(root.right, m + 1, j, m + 1, e)

The complete code is given:


class NumArray:
    class TreeNode:
        def __init__(self, val):
            self.val = val
            self.left = None
            self.right = None

    def __init__(self, nums):
        self.n = 0
        self.st = None
        if nums:
            self.n = len(nums)
            self.st = self._buildSegmentTree(nums, 0, self.n - 1)

    def update(self, i, val):
        self._updateNode(i, val, self.st, 0, self.n - 1)

    def sumRange(self, i, j):
        return self._rangeQuery(self.st, i, j, 0, self.n - 1)

The segment tree lowers the complexity of each query to O(log n).

27.3 Trie for String


Definition Trie comes from the word reTrieval. In computer science, a trie,
also called a digital tree, radix tree, or prefix tree, is, like the BST, a
kind of search tree, here used for finding substrings in a text. We can
solve string matching in O(|T|) time, where |T| is the size of our text;
this purely algorithmic approach has been studied extensively:
Knuth-Morris-Pratt, Boyer-Moore, and Rabin-Karp. However, when multiple
queries will be made to the same text, it pays to preprocess the text into
a data structure that allows more efficient queries. One such data structure
is the trie, which answers each query in O(P), where P is the length of the
pattern string. A trie is an ordered tree structure, mostly used for storing
strings (like words in a dictionary) in a compact way.
1. In a Trie, each child branch is labeled with a letter of the alphabet Σ.
Actually, it is not necessary to store the letter as the key, because if
we order the child branches of every node alphabetically from left to
right, the position in the tree defines the key which it is associated to.

2. The root node in a Trie represents an empty string.

Now, we define a trie node: it has a boolean variable to denote whether it
marks the end of a word, and a children list of 26 child TrieNodes.
class TrieNode:
    # Trie node class
    def __init__(self):
        self.children = [None] * 26
        # isEndOfWord is True if the node represents the end of a word
        self.isEndOfWord = False

Compact Trie If we assign only one letter per edge, we are not taking
full advantage of the trie’s tree structure. It is more useful to consider
compact or compressed tries, tries where we remove the one letter per edge
constraint, and contract non-branching paths by concatenating the letters on
these paths. In this way, every node branches out, and every node traversed
represents a choice between two different words. The compressed trie that
corresponds to our example trie is also shown in Figure 27.4.

Figure 27.4: Trie VS Compact Trie

Operations: INSERT, SEARCH Both INSERT and SEARCH take O(m), where m is
the length of the word/string we want to insert or search in the trie.
Here, we use a LeetCode problem as an example showing how to implement
INSERT and SEARCH. Constructing a trie is a series of INSERT operations,
which takes O(n ∗ m), where n is the total number of words/strings and m is
the average length of each item. The space complexity of the non-compact
trie is O(N ∗ |Σ|), where |Σ| is the alphabet size and N is the total number
of nodes in the trie structure. The upper bound of N is n ∗ m.

Figure 27.5: Trie Structure



27.1 208. Implement Trie (Prefix Tree) (medium). Implement a trie


with insert, search, and startsWith methods.
Example:
Trie trie = new Trie();
trie.insert("apple");
trie.search("apple");   // returns true
trie.search("app");     // returns false
trie.startsWith("app"); // returns true
trie.insert("app");
trie.search("app");     // returns true

Note: You may assume that all inputs consist of lowercase letters a-z. All
inputs are guaranteed to be non-empty strings.

INSERT With the INSERT operation, we insert a given word into the trie. We
traverse the trie from the root node; for each letter in the word, if its
corresponding child node is None, we create a new node and continue. At the
end, we set that final node's is_word variable to True. Thereafter, a new
branch starting from that node has been constructed. For example, as shown
in Fig 27.4, when we first insert “app“ we build the branch “app“; inserting
“ape“ then adds the node “e“, as demonstrated with red arrows.
def insert(self, word):
    """
    Inserts a word into the trie.
    :type word: str
    :rtype: void
    """
    node = self.root  # start from the root node
    for c in word:
        loc = ord(c) - ord('a')
        if node.children[loc] is None:  # char does not exist, create a node
            node.children[loc] = self.TrieNode()
        # move to the next node
        node = node.children[loc]
    # set the flag to true
    node.is_word = True

SEARCH For SEARCH, like INSERT, we traverse the trie using the letters as
pointers to the next branch. There are two cases: 1) for a word P, if some
letter has no matching child, we return False, even if a prefix of P exists
in the trie; 2) if we match all the letters of P, at the last node we still
need to check whether is_word is True. STARTSWITH is just slightly different
from SEARCH: it does not need that final check and returns True once all
letters are matched.

def search(self, word):
    node = self.root
    for c in word:
        loc = ord(c) - ord('a')
        # case 1: not all letters matched
        if node.children[loc] is None:
            return False
        node = node.children[loc]
    # case 2: check the end-of-word flag
    return True if node.is_word else False

def startsWith(self, word):
    node = self.root
    for c in word:
        loc = ord(c) - ord('a')
        # not all letters matched
        if node.children[loc] is None:
            return False
        node = node.children[loc]
    return True

Now we complete the given Trie class with the TrieNode and the __init__
function.
class Trie:
    class TrieNode:
        def __init__(self):
            self.is_word = False
            self.children = [None] * 26  # position of a child encodes its char

    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.root = self.TrieNode()  # root carries no value

27.1 336. Palindrome Pairs (hard). Given a list of unique words, find
all pairs of distinct indices (i, j) in the given list, so that the
concatenation of the two words, i.e. words[i] + words[j], is a palindrome.
Example 1:

Input: ["abcd", "dcba", "lls", "s", "sssll"]
Output: [[0, 1], [1, 0], [3, 2], [2, 4]]
Explanation: The palindromes are ["dcbaabcd", "abcddcba", "slls", "llssssll"]

Example 2:

Input: ["bat", "tab", "cat"]
Output: [[0, 1], [1, 0]]
Explanation: The palindromes are ["battab", "tabbat"]

Solution: One Forward Trie and Another Backward Trie. We start from the
naive solution, which checks every element against all the other strings.
From example 1, [3, 3] could form a palindrome but is not one of the outputs,
so we need ordered pairs of distinct indices: there are n(n − 1) of them.
Multiplying by the average string length m, the complexity is O(mn²).
However, we can use a Trie structure:
from collections import defaultdict


class Trie:
    def __init__(self):
        self.links = defaultdict(self.__class__)
        self.index = None
        # holds indices which contain this prefix and whose remainder is a palindrome
        self.pali_indices = set()

    def insert(self, word, i):
        trie = self
        for j, ch in enumerate(word):
            trie = trie.links[ch]
            if word[j + 1:] and is_palindrome(word[j + 1:]):
                trie.pali_indices.add(i)
        trie.index = i


def is_palindrome(word):
    i, j = 0, len(word) - 1
    while i <= j:
        if word[i] != word[j]:
            return False
        i += 1
        j -= 1
    return True


class Solution:
    def palindromePairs(self, words):
        '''Find pairs of palindromes in O(n*k^2) time and O(n*k) space.'''
        root = Trie()
        res = []
        for i, word in enumerate(words):
            if not word:
                continue
            root.insert(word[::-1], i)
        for i, word in enumerate(words):
            if not word:
                continue
            trie = root
            for j, ch in enumerate(word):
                if ch not in trie.links:
                    break
                trie = trie.links[ch]
                if is_palindrome(word[j + 1:]) and trie.index is not None \
                        and trie.index != i:
                    # if this word completes to a palindrome and the prefix
                    # is a word, complete it
                    res.append([i, trie.index])
            else:
                # this word is a reverse suffix of other words; combine with
                # those that complete to a palindrome
                for pali_index in trie.pali_indices:
                    if i != pali_index:
                        res.append([i, pali_index])
        if '' in words:
            j = words.index('')
            for i, word in enumerate(words):
                if i != j and is_palindrome(word):
                    res.append([i, j])
                    res.append([j, i])
        return res

Solution 2: Sorting. Moreover, there are often more clever ways to solve
these problems. Consider the word 'abcd': its prefixes are '', 'a', 'ab',
'abc' and 'abcd'. If the remainder after a prefix is a palindrome, we only
need to find the reverse of that prefix among the words; storing every word
together with every reversed word in one sorted list makes such partners
adjacent and therefore fast to find. Note that when considering suffixes, we
explicitly leave out the empty string to avoid counting duplicates. That is,
if a palindrome can be created by appending an entire other word to the
current word, then we will already consider such a palindrome when
considering the empty string as prefix for the other word.
class Solution(object):
    def palindromePairs(self, words):
        # 0 means the word is not reversed, 1 means the word is reversed
        words, length, result = sorted(
            [(w, 0, i, len(w)) for i, w in enumerate(words)] +
            [(w[::-1], 1, i, len(w)) for i, w in enumerate(words)]), \
            len(words) * 2, []

        # after the sorting, entries sharing a prefix are adjacent,
        # one original (0) next to one reversed (1)
        for i, (word1, rev1, ind1, len1) in enumerate(words):
            for j in range(i + 1, length):
                word2, rev2, ind2, _ = words[j]
                if word2.startswith(word1):  # word2 might be longer
                    if ind1 != ind2 and rev1 ^ rev2:  # one is reversed, one is not
                        rest = word2[len1:]
                        if rest == rest[::-1]:
                            # if rev2 is the reversed one, the pair goes
                            # from ind1 to ind2
                            result += ([ind1, ind2],) if rev2 else ([ind2, ind1],)
                else:
                    break  # sorted order: no later entry can share this prefix
        return result

There are several other data structures, like balanced trees and hash
tables, which give us the possibility to search for a word in a dataset of
strings. Then why do we need a trie? Although a hash table has O(1) lookup
time for a key, it is not efficient for the following operations:

• Finding all keys with a common prefix.

• Enumerating a dataset of strings in lexicographical order.
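A minimal sketch of the first operation, reusing the 26-children Trie class
from problem 208 above (the function name keys_with_prefix is illustrative):

def keys_with_prefix(trie, prefix):
    # walk down to the node that represents the prefix
    node = trie.root
    for c in prefix:
        loc = ord(c) - ord('a')
        if node.children[loc] is None:
            return []
        node = node.children[loc]
    # collect every word below that node, visiting children in a-z order
    res = []
    def collect(node, path):
        if node.is_word:
            res.append(prefix + path)
        for loc, child in enumerate(node.children):
            if child is not None:
                collect(child, path + chr(ord('a') + loc))
    collect(node, '')
    return res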

Sorting Lexicographic sorting of a set of keys can be accomplished by
building a trie from them and traversing it in pre-order, printing the key
at every end-of-word node. This algorithm is a form of radix sort, which is
why the trie is also called a radix tree.
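With the keys_with_prefix sketch above, this pre-order sort is simply a
collection from the root with the empty prefix:

def trie_sort(trie):
    # pre-order collection visits shorter words before their extensions,
    # and children in a-z order, so the output is lexicographically sorted
    return keys_with_prefix(trie, '')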

27.4 Bonus
Solve Duplicate Problem in BST When there are duplicates, things can be
more complicated, and college algorithm books do not really tell us what to
do with them. If you use the definition "left <= root < right" and you have
a tree like:
3
/ \
2 4

then adding a “3” duplicate key to this tree will result in:
3
/ \
2 4
\
3

Note that the duplicates are not in contiguous levels.


This is a big issue when allowing duplicates in a BST representation like
the one above: duplicates may be separated by any number of levels, so
checking for a duplicate's existence is not as simple as checking the
immediate children of a node.
An option to avoid this issue is to not represent duplicates structurally
(as separate nodes) but instead use a counter that counts the number of
occurrences of the key. The previous example would then have a tree like:
   3(1)
  /   \
2(1)   4(1)

and after insertion of the duplicate "3" key it will become:


   3(2)
  /   \
2(1)   4(1)

This simplifies the SEARCH, DELETE and INSERT operations, at the expense of
some extra bytes and counter operations. In the following content, we assume
using definition three so that our BST will have no duplicates.
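A minimal sketch of counter-based insertion, assuming the usual TreeNode
with val/left/right plus a hypothetical count field:

def insert_with_count(root, val):
    if root is None:
        node = TreeNode(val)
        node.count = 1   # first occurrence of this key
        return node
    if val == root.val:
        root.count += 1  # duplicate: bump the counter, no new node
    elif val < root.val:
        root.left = insert_with_count(root.left, val)
    else:
        root.right = insert_with_count(root.right, val)
    return root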

27.5 LeetCode Problems


1. 144. Binary Tree Preorder Traversal
2. 94. Binary Tree Inorder Traversal
3. 145. Binary Tree Postorder Traversal
4. 589. N-ary Tree Preorder Traversal
5. 590. N-ary Tree Postorder Traversal
6. 429. N-ary Tree Level Order Traversal
7. 103. Binary Tree Zigzag Level Order Traversal(medium)
8. 105. Construct Binary Tree from Preorder and Inorder Traversal
938. Range Sum of BST (Medium)
Given the root node of a binary search tree, return the sum of values
of all nodes with value between L and R (inclusive).
The binary search tree is guaranteed to have unique values.
Example 1:

Input: root = [10,5,15,3,7,null,18], L = 7, R = 15
Output: 32

Example 2:

Input: root = [10,5,15,3,7,13,18,1,null,6], L = 6, R = 10
Output: 23

Tree Traversal + Divide and Conquer. We need at most O(n) time. For each
node, there are three cases: 1) L ≤ val ≤ R, 2) val < L, 3) val > R. In the
first case, we need the results of both subtrees merged with the node's own
val. In the other two cases, because of the BST property, only the result of
one subtree is needed.
def rangeSumBST(self, root, L, R):
    if not root:
        return 0
    if L <= root.val <= R:
        return self.rangeSumBST(root.left, L, R) + \
               self.rangeSumBST(root.right, L, R) + root.val
    elif root.val < L:  # the left subtree is not needed
        return self.rangeSumBST(root.right, L, R)
    else:  # the right subtree is not needed
        return self.rangeSumBST(root.left, L, R)
28 Graph Questions (15%)

In this chapter, we introduce a variety of algorithms for graphs and
summarize the different types of questions from LeetCode. There are mainly
three sections. The first covers searching algorithms in a graph, which we
already introduced in Chapter XX: breadth-first search, depth-first search,
and topological sort. The second covers shortest-path searching algorithms.
For the graph data structure we usually need to search, and basic DFS/BFS
can be applied to any graph. The following sections include more advanced
problems, including the concepts in Chapter ??.

28.1 Basic BFS and DFS


There are two types of questions:

• those that explicitly tell us to find a path (e.g. the shortest or
longest path) in the graph;

• those that implicitly require DFS/BFS; for this type of problem we
need to build the graph ourselves first.

28.1.1 Explicit BFS/DFS


28.1.2 Implicit BFS/DFS
28.1 582. Kill Process (medium). Given n processes, each process has
a unique PID (process id) and its PPID (parent process id).
Each process has only one parent process but may have one or more
child processes. This is just like a tree structure. Only one process


has PPID that is 0, which means this process has no parent process.
All the PIDs will be distinct positive integers.
We use two lists of integers to represent the processes, where the first
list contains the PID of each process and the second list contains the
corresponding PPID.
Now given the two lists, and a PID representing a process you want
to kill, return a list of PIDs of processes that will be killed in the
end. You should assume that when a process is killed, all its children
processes will be killed. No order is required for the final answer.
Example 1:

Input:
pid  = [1, 3, 10, 5]
ppid = [3, 0, 5, 3]
kill = 5
Output: [5, 10]
Explanation:
         3
        / \
       1   5
          /
         10
Kill 5 will also kill 10.

Analysis: The parent-child relations form a tree-like data structure, which
is also a graph. Instead of building an explicit tree first, we use a graph
defined as a defaultdict keyed by the parent node, with the children stored
in a list. In such a graph, finding the processes to kill is the same as
doing a DFS/BFS starting from the kill node and saving all the nodes we
pass. Here, we give a BFS solution.
from collections import defaultdict

def killProcess(self, pid, ppid, kill):
    """
    :type pid: List[int]
    :type ppid: List[int]
    :type kill: int
    :rtype: List[int]
    """
    # build an adjacency list keyed by the parent pid
    graph = defaultdict(list)
    for p_id, id in zip(ppid, pid):
        graph[p_id].append(id)

    # BFS from the kill node, collecting every node we reach
    q = [kill]
    path = set()
    while q:
        id = q.pop(0)
        path.add(id)
        for neig in graph[id]:
            if neig in path:
                continue
            q.append(neig)
    return list(path)

28.2 Connected Components


28.2 130. Surrounded Regions (medium). Given a 2D board containing 'X' and
'O' (the letter O), capture all regions surrounded by 'X'. A region is
captured by flipping all 'O's into 'X's in that surrounded region. A
surrounded region cannot touch the border: any 'O' on the border of the
board is not flipped to 'X', and neither is any 'O' connected to a border
'O'; every other 'O' will be flipped to 'X'. Two cells are connected if
they are adjacent horizontally or vertically.
Example:

X X X X
X O O X
X X O X
X O X X

After running your function, the board should be:

X X X X
X X X X
X X X X
X O X X

Solution 1: Use DFS and a visited matrix. The task is to either flip each
'O' or keep it. If an 'O' is on the border, it and any other 'O' connected
to it (the connected component found through DFS) will be kept. The
complexity is O(mn), where m and n are the numbers of rows and columns.
def solve(self, board):
    """
    :type board: List[List[str]]
    :rtype: void Do not return anything, modify board in-place instead.
    """
    if not board:
        return
    rows, cols = len(board), len(board[0])
    if rows == 1 or cols == 1:
        return
    if rows == 2 and cols == 2:
        return
    moves = [(0, -1), (0, 1), (-1, 0), (1, 0)]
    # find all 'O's connected to the border and mark them as visited,
    # then flip all unvisited 'O's
    visited = [[False for c in range(cols)] for r in range(rows)]

    def dfs(x, y):  # (x, y) is a border 'O'
        for dx, dy in moves:
            nx = x + dx
            ny = y + dy
            if nx < 0 or nx >= rows or ny < 0 or ny >= cols:
                continue
            if board[nx][ny] == 'O' and not visited[nx][ny]:
                visited[nx][ny] = True
                dfs(nx, ny)

    # first and last column
    for i in range(rows):
        if board[i][0] == 'O' and not visited[i][0]:
            visited[i][0] = True
            dfs(i, 0)
        if board[i][-1] == 'O' and not visited[i][-1]:
            visited[i][-1] = True
            dfs(i, cols - 1)
    # first and last row
    for j in range(cols):
        if board[0][j] == 'O' and not visited[0][j]:
            visited[0][j] = True
            dfs(0, j)
        if board[rows - 1][j] == 'O' and not visited[rows - 1][j]:
            visited[rows - 1][j] = True
            dfs(rows - 1, j)
    for i in range(rows):
        for j in range(cols):
            if board[i][j] == 'O' and not visited[i][j]:
                board[i][j] = 'X'

Solution 2: mark visited 'O' as '-1' to save space. Instead of using O(mn)
space to track the visited vertices, we can mark the connected components of
the border 'O's as '-1' during the DFS, and then iterate the matrix once
more to flip all the remaining 'O's to 'X' and the '-1's back to 'O'.
def solve(self, board):
    if not board:
        return
    rows, cols = len(board), len(board[0])
    if rows == 1 or cols == 1:
        return
    if rows == 2 and cols == 2:
        return
    moves = [(0, -1), (0, 1), (-1, 0), (1, 0)]
    # mark all 'O's connected to the border as '-1',
    # then flip the remaining 'O's and restore the '-1's

    def dfs(x, y):  # (x, y) is a border 'O'
        for dx, dy in moves:
            nx = x + dx
            ny = y + dy
            if nx < 0 or nx >= rows or ny < 0 or ny >= cols:
                continue
            if board[nx][ny] == 'O':
                board[nx][ny] = '-1'
                dfs(nx, ny)
        return

    # first and last column
    for i in range(rows):
        if board[i][0] == 'O':
            board[i][0] = '-1'
            dfs(i, 0)
        if board[i][-1] == 'O':
            board[i][-1] = '-1'
            dfs(i, cols - 1)
    # first and last row
    for j in range(cols):
        if board[0][j] == 'O':
            board[0][j] = '-1'
            dfs(0, j)
        if board[rows - 1][j] == 'O':
            board[rows - 1][j] = '-1'
            dfs(rows - 1, j)
    for i in range(rows):
        for j in range(cols):
            if board[i][j] == 'O':
                board[i][j] = 'X'
            elif board[i][j] == '-1':
                board[i][j] = 'O'

28.3 323. Number of Connected Components in an Undirected


Graph (medium). Given n nodes labeled from 0 to n - 1 and a list
of undirected edges (each edge is a pair of nodes), write a function to
find the number of connected components in an undirected graph.
Example 1:

Input: n = 5 and edges = [[0, 1], [1, 2], [3, 4]]

     0          3
     |          |
     1 --- 2    4

Output: 2

Example 2:

Input: n = 5 and edges = [[0, 1], [1, 2], [2, 3], [3, 4]]

     0           4
     |           |
     1 --- 2 --- 3

Output: 1

Solution: Use DFS. First, given n nodes and no edges, there are n
components; each edge that joins two previously separate components reduces
the count by one. We run DFS from every unvisited node and count how many
times we have to start:

for n in vertices:
    if n not visited:
        DFS(n)  # a new component: traverse all nodes connected
                # to n and mark them as visited

Before the main part, it is easier to first convert the edge list into an
undirected graph stored as an adjacency list. Because the graph is
undirected, each edge adds two directed entries to the adjacency list.
def countComponents(self, n, edges):
    """
    :type n: int
    :type edges: List[List[int]]
    :rtype: int
    """
    if not edges:
        return n

    def dfs(i):
        for nb in g[i]:
            if not visited[nb]:
                visited[nb] = True
                dfs(nb)
        return

    # convert edges into an adjacency list
    g = [[] for i in range(n)]
    for i, j in edges:
        g[i].append(j)
        g[j].append(i)

    # find components
    visited = [False] * n
    ans = 0
    for i in range(n):
        if not visited[i]:
            visited[i] = True
            dfs(i)
            ans += 1
    return ans

28.3 Islands and Bridges


An island is surrounded by water (usually '0's in the matrix) and is formed
by connecting adjacent lands horizontally or vertically. An island is
actually an instance of a connected component.

1. 463. Island Perimeter

2. 305. Number of Islands II

3. 694. Number of Distinct Islands

4. 711. Number of Distinct Islands II

5. 827. Making A Large Island

6. 695. Max Area of Island

7. 642. Design Search Autocomplete System

28.4 200. Number of Islands (medium). Given a 2D grid map of '1's (land)
and '0's (water), count the number of islands. An island is surrounded by
water and is formed by connecting adjacent lands horizontally or
vertically. You may assume all four edges of the grid are surrounded by
water.
Example 1:

Input:
11110
11010
11000
00000

Output: 1

Example 2:

Input:
11000
11000
00100
00011

Output: 3

Solution: DFS without extra space. We use DFS and mark the visited
components as '-1' in the grid.

def numIslands(self, grid):
    """
    :type grid: List[List[str]]
    :rtype: int
    """
    if not grid:
        return 0
    rows, cols = len(grid), len(grid[0])
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    def dfs(x, y):
        for dx, dy in moves:
            nx, ny = x + dx, y + dy
            if nx < 0 or ny < 0 or nx >= rows or ny >= cols:
                continue
            if grid[nx][ny] == '1':
                grid[nx][ny] = '-1'
                dfs(nx, ny)
        return

    ans = 0
    for i in range(rows):
        for j in range(cols):
            if grid[i][j] == '1':
                grid[i][j] = '-1'
                dfs(i, j)
                ans += 1
    return ans

28.5 934. Shortest Bridge (medium). In a given 2D binary array A, there are
two islands. (An island is a 4-directionally connected group of 1s not
connected to any other 1s.) Now, we may change 0s to 1s so as to connect
the two islands together to form one island. Return the smallest number of
0s that must be flipped. (It is guaranteed that the answer is at least 1.)
Example 1:

Input: [[0,1],[1,0]]
Output: 1

Example 2:

Input: [[0,1,0],[0,0,0],[0,0,1]]
Output: 2

Example 3:

Input: [[1,1,1,1,1],[1,0,0,0,1],[1,0,1,0,1],[1,0,0,0,1],[1,1,1,1,1]]
Output: 1

Note:

1 <= A.length = A[0].length <= 100
A[i][j] == 0 or A[i][j] == 1

Solution 1: DFS to find one complete connected component, then BFS. This is
a two-island problem. First we find one cell with value 1 and use DFS to
identify all the 1s composing this first island, marking them as -1 and
saving them in the list bfs. Then we run a BFS that starts from all these
marked cells at once and expands the frontier one step at a time, until we
reach a 1 belonging to the other island; the number of steps taken is the
shortest bridge. This is an algorithm that finds the shortest path between
multiple starting points and multiple ending points. The code is:
def shortestBridge(self, A):
    def dfs(i, j):
        A[i][j] = -1
        bfs.append((i, j))
        for x, y in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= x < n and 0 <= y < n and A[x][y] == 1:
                dfs(x, y)

    def first():
        for i in range(n):
            for j in range(n):
                if A[i][j]:
                    return i, j

    n, step, bfs = len(A), 0, []
    dfs(*first())         # mark the whole first island as -1
    while bfs:
        new = []
        for i, j in bfs:  # expand the frontier by one step
            for x, y in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if 0 <= x < n and 0 <= y < n:
                    if A[x][y] == 1:
                        return step
                    elif not A[x][y]:
                        A[x][y] = -1
                        new.append((x, y))
        step += 1
        bfs = new

28.4 NP-hard Problems


Traveling salesman problem (TSP): given a set of cities and the distance
between every pair of cities, find the shortest possible route that visits
every city exactly once and returns to the starting point. In fact, no
polynomial-time solution is available, as this is a known NP-hard problem.
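For small inputs we can still solve TSP by checking every tour; a minimal
brute-force sketch over a distance matrix (the function name is
illustrative):

from itertools import permutations

def tsp_brute_force(dist):
    n = len(dist)
    best = float('inf')
    for perm in permutations(range(1, n)):   # fix city 0 as the start
        route = (0,) + perm + (0,)
        cost = sum(dist[a][b] for a, b in zip(route, route[1:]))
        best = min(best, cost)
    return best

print(tsp_brute_force([[0, 1, 4], [1, 0, 2], [4, 2, 0]]))  # 0->1->2->0 = 7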
28.6 943. Find the Shortest Superstring (hard). Given an array A
of strings, find any smallest string that contains each string in A as a
substring. We may assume that no string in A is a substring of another
string in A.
Example 1:

Input: ["alex", "loves", "leetcode"]
Output: "alexlovesleetcode"
Explanation: All permutations of "alex", "loves", "leetcode" would also
be accepted.

Example 2:

Input: ["catg", "ctaagt", "gcta", "ttca", "atgcatc"]
Output: "gctaagttcatgcatc"

Note: 1 <= A.length <= 12, 1 <= A[i].length <= 20.

Solution 1: DFS Permutation. First, there are n! possible ways to arrange
the strings and connect them into a superstring, from which we pick the
shortest; this is a typical permutation problem. When we connect string i
to string j, we can precompute the maximum length of the prefix of j that
overlaps a suffix of i and can therefore be skipped. However, with Python,
this receives a TLE error.
def shortestSuperstring(self, A):
    """
    :type A: List[str]
    :rtype: str
    """
    if not A:
        return ''
    n = len(A)

    def getGraph(A):
        # G[i][j]: length of the overlap between a suffix of A[i]
        # and a prefix of A[j]
        G = [[0 for i in range(n)] for _ in range(n)]
        if not A:
            return G
        for i, s in enumerate(A):
            for j in range(n):
                if i == j:
                    continue
                t = A[j]
                m = min(len(s), len(t))
                for l in range(m, 0, -1):
                    if s[-l:] == t[0:l]:  # suffix of s matches prefix of t
                        G[i][j] = l
                        break
        return G

    def dfs(used, d, curr, path, ans, best_path):
        if curr >= ans[0]:
            return
        if d == n:
            ans[0] = curr
            best_path[0] = path
            return
        for i in range(n):
            if used & (1 << i):
                continue
            if curr == 0:
                dfs(used | (1 << i), d + 1, curr + len(A[i]),
                    path + [i], ans, best_path)
            else:
                dfs(used | (1 << i), d + 1,
                    curr + len(A[i]) - G[path[-1]][i],
                    path + [i], ans, best_path)
        return

    G = getGraph(A)
    ans = [0]
    for a in A:
        ans[0] += len(a)

    final_path = [[i for i in range(n)]]

    visited = 0  # bitmask instead of [False for i in range(n)]
    dfs(visited, 0, 0, [], ans, final_path)

    # generate result from path
    final_path = final_path[0]
    res = A[final_path[0]]
    for i in range(1, len(final_path)):
        last = final_path[i - 1]
        cur = final_path[i]
        l = G[last][cur]
        res += A[cur][l:]
    return res

Solution 2: Dynamic programming.
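No listing is given here; as one possible way to fill it in, a minimal
bitmask-DP sketch (Held-Karp style) reusing the same overlap graph G as in
Solution 1, where dp[mask][i] holds the shortest superstring covering the
string set mask and ending with A[i]:

def shortestSuperstringDP(A):
    n = len(A)
    # G[i][j]: overlap length when A[j] follows A[i]
    G = [[0] * n for _ in range(n)]
    for i, s in enumerate(A):
        for j, t in enumerate(A):
            if i != j:
                for l in range(min(len(s), len(t)), 0, -1):
                    if s[-l:] == t[:l]:
                        G[i][j] = l
                        break
    dp = [[None] * n for _ in range(1 << n)]
    for i in range(n):
        dp[1 << i][i] = A[i]
    for mask in range(1 << n):
        for i in range(n):
            if dp[mask][i] is None or not (mask & (1 << i)):
                continue
            for j in range(n):
                if mask & (1 << j):
                    continue
                cand = dp[mask][i] + A[j][G[i][j]:]
                nxt = mask | (1 << j)
                if dp[nxt][j] is None or len(cand) < len(dp[nxt][j]):
                    dp[nxt][j] = cand
    full = (1 << n) - 1
    return min((s for s in dp[full] if s is not None), key=len)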


29 Dynamic Programming Questions (15%)

In this chapter, we categorize dynamic programming problems into three
types according to the input data type: Single Sequence (Section 29.1 and
Section 29.2), Coordinate (Section 29.4), and Double Sequence
(Section 29.5). Each type has its own identifiable characteristics and can
be solved in a similar way. In this process, we find the Forward Induction
Method to be the most effective way to identify the recurrence (state
transfer) function. In the Forward Induction Method, we start from the base
cases (which correspond to the base cases in a DFS solution), incrementally
move to larger subproblems, and try to induce the state transfer function
between the current problem and its previous subproblems. If the current
state can be induced from only a constant number of smaller subproblems, we
get O(n) time; if it relates to all smaller subproblems, we get O(n²).
Using the forward induction method is intuitive and effective; the only
caveat is to try a variety of examples and make sure the recurrence
function we found is comprehensive and correct. At the end of the section,
we summarize a template for problems solved with dynamic programming. The
types are:

1. Single Sequence (50%): This is an easy type too. A state represents the
subsequence that ends at, and includes, the current element. Dividing the
problem this way, we can easily obtain the state transfer function and
find a pattern.

2. Coordinate (15%): 1D or 2D coordinates. This is the easiest type of DP
because the state transfer function can be read directly from the problem
(how moves are made to the next position).

3. Double Sequence (30%): Because a double sequence makes the state a
matrix with subproblem size O(mn), this type of dynamic programming is
similar to the coordinate type, except that we need to figure out the
transfer function (the moves) ourselves.

The single sequence type of dynamic programming is usually applied to
strings and arrays.

Table 29.1: Different Types of Single Sequence Dynamic Programming

Case          Input  Subproblems  f(n)   Time    Space
Section 29.1  O(n)   O(n)         O(1)   O(n)    O(n) -> O(1)
Section 29.2  O(n)   O(n)         O(n)   O(n²)   O(n)
Section 29.3  O(n)   O(n²)        O(n)   O(n³)   O(n²)
Hard          O(n)   O(n³)        O(n)   O(n⁴)   O(n³)

Table 29.2: Different Types of Coordinate Dynamic Programming

Case    Input   Subproblems  f(n)   Time     Space
Easy    O(mn)   O(mn)        O(1)   O(mn)    O(mn) -> O(m)
Medium  O(mn)   O(kmn)       O(1)   O(kmn)   O(kmn) -> O(mn)

Now, let us look at some examples:

29.1 Single Sequence O(n)


In this section, we will see how to solve the easy type of dynamic
programming shown in Table 29.1, where each subproblem depends only on the
states of a constant number of smaller subproblems; subarray and substring
problems are two instances. We will use the deduction method, which starts
from the base case and gradually derives the results of all cases after
it. The examples include problems with one or multiple choices per step.
Moreover, because each subproblem only looks back at a constant number of
smaller subproblems, we do not even need O(n) space to save all the
results, unless we are also asked for the best solution of every
subproblem. Thus, this section generally achieves O(n) time complexity and
O(1) space complexity.

1. 276. Paint Fence

2. 256. Paint House

3. 198. House Robber

4. 337. House Robber III (medium)



5. 53. Maximum Subarray (Easy)

6. 152. Maximum Product Subarray

7. 32. Longest Valid Parentheses(hard)

29.1.1 Easy Type


29.1 Paint Fence (L276, *). There is a fence with n posts, and each post
can be painted with one of k colors. You have to paint all the posts such
that no more than two adjacent fence posts have the same color. Return the
total number of ways you can paint the fence. Note: n and k are
non-negative integers.
Example:

Input: n = 3, k = 2
Output: 6
Explanation: Take c1 as color 1, c2 as color 2. All possible ways are:

     post1  post2  post3
-----------------------
  1    c1     c1     c2
  2    c1     c2     c1
  3    c1     c2     c2
  4    c2     c1     c1
  5    c2     c1     c2
  6    c2     c2     c1

Solution: Induction and Multi-choiced State. Suppose n = 1: dp[1] = k. When
n = 2, we have two cases: the two posts share the same color, with k ways
to paint, or they have different colors, with k*(k-1) ways.
dp[1] = k
dp[2] = same + diff; same = k, diff = k*(k-1)
dp[3]: from dp[2].same we may only use a different color:
           diff = dp[2].same * (k-1)
       from dp[2].diff we may use either the same or a different color:
           same = dp[2].diff, diff += dp[2].diff * (k-1)

Thus, using deduction, which is dynamic programming, the code is:
def numWays(self, n, k):
    if n == 0 or k == 0:
        return 0
    if n == 1:
        return k

    same = k
    diff = k * (k - 1)
    for i in range(3, n + 1):
        pre_diff = diff
        diff = (same + diff) * (k - 1)
        same = pre_diff  # the previous diff becomes the new same
    return same + diff

29.2 Paint House (L256, *). There is a row of n houses, and each house can
be painted with one of three colors: red, blue or green. The cost of
painting each house with a certain color is different. You have to paint
all the houses such that no two adjacent houses have the same color.
The cost of painting each house with a certain color is represented by an
n × 3 cost matrix. For example, costs[0][0] is the cost of painting house 0
with color red; costs[1][2] is the cost of painting house 1 with color
green, and so on. Find the minimum cost to paint all houses. Note: all
costs are positive integers.
Example:

Input: [[17,2,17],[16,16,5],[14,3,19]]
Output: 10
Explanation: Paint house 0 blue, house 1 green, house 2 blue.
Minimum cost: 2 + 5 + 3 = 10.

Solution: Induction and Multi-choiced State. For this problem, each house
has three choices, so we need to track the optimal solution of taking each
color. With zero houses the cost is 0; for one house, the answer is
min(c1, c2, c3).
for 1 house:  with three choices (c1, c2, c3), the result is min(c1, c2, c3)
for 2 houses: cost of taking c1 = costs[2][c1] + min(dp[1].c2, dp[1].c3)
              cost of taking c2 = costs[2][c2] + min(dp[1].c1, dp[1].c3)
              cost of taking c3 = costs[2][c3] + min(dp[1].c1, dp[1].c2)

def minCost(self, costs):
    if not costs:
        return 0
    c1, c2, c3 = costs[0]
    n = len(costs)
    for i in range(1, n):
        nc1 = costs[i][0] + min(c2, c3)
        nc2 = costs[i][1] + min(c1, c3)
        nc3 = costs[i][2] + min(c1, c2)
        c1, c2, c3 = nc1, nc2, nc3
    return min(c1, c2, c3)

29.3 House Robber (L198,*). You are a professional robber planning


to rob houses along a street. Each house has a certain amount of
money stashed, the only constraint stopping you from robbing each
of them is that adjacent houses have security system connected and
it will automatically contact the police if two adjacent houses were
broken into on the same night.
Given a list of non-negative integers representing the amount of money
of each house, determine the maximum amount of money you can rob
tonight without alerting the police.
Solution: Induction and Multi-choiced State. Each house has two choices:
rob or not rob. Thus the profit for each house can be deduced as follows:
1 house:  dp[1].rob = p[1], dp[1].not_rob = 0; return max(dp[1])
2 houses: if we rob house 2, we definitely can not rob house 1:
              dp[2].rob = dp[1].not_rob + p[2]
          if we do not rob house 2, we may either rob or skip house 1:
              dp[2].not_rob = max(dp[1].rob, dp[1].not_rob)

def rob(self, nums):
    if not nums:
        return 0
    if len(nums) == 1:
        return nums[0]
    rob = nums[0]
    not_rob = 0
    for i in range(1, len(nums)):
        new_rob = not_rob + nums[i]
        new_not_rob = max(rob, not_rob)
        rob, not_rob = new_rob, new_not_rob
    return max(rob, not_rob)

29.4 House Robber III (L337, medium). The thief has found himself
a new place for his thievery again. There is only one entrance to this
area, called the "root." Besides the root, each house has one and only
one parent house. After a tour, the smart thief realized that "all houses
in this place forms a binary tree". It will automatically contact the
police if two directly-linked houses were broken into on the same night.
Determine the maximum amount of money the thief can rob tonight
without alerting the police.
Example 1:

Input: [3,2,3,null,3,null,1]

     3
    / \
   2   3
    \   \
     3   1

Output: 7
Explanation: Maximum amount of money the thief can rob = 3 + 3 + 1 = 7.

Example 2:

Input: [3,4,5,1,3,null,1]

     3
    / \
   4   5
  / \   \
 1   3   1

Output: 9
Explanation: Maximum amount of money the thief can rob = 4 + 5 = 9.

Solution: Induction + Tree Traversal + Multi-choiced State. This is dynamic
programming applied on a tree structure; the brute force would take O(2ⁿ),
where n is the total number of nodes in the tree. For a tree, the result of
a node naturally depends on the results of both its left and right
subtrees. When a subtree is empty, we return (0, 0) for rob and not rob.
After we obtain the results of the left and right subtrees, each for
robbing and not robbing, we merge them with the current node. If we rob the
current node, then from the left and right subtrees we may only use the
not-robbing results: rob = left_not_rob + right_not_rob + the current
node's val. If we do not rob the current node, then each subtree may
independently take the robbing or not-robbing state, so we pick the maximum
combination. Walking through a carefully designed, sufficiently
sophisticated example is necessary to understand the process.
# class TreeNode(object):
#     def __init__(self, x):
#         self.val = x
#         self.left = None
#         self.right = None
def rob(self, root):
    def TreeTraversal(root):
        if not root:
            return (0, 0)

        l_rob, l_not_rob = TreeTraversal(root.left)
        r_rob, r_not_rob = TreeTraversal(root.right)

        rob = root.val + (l_not_rob + r_not_rob)
        not_rob = max(l_rob + r_rob, l_rob + r_not_rob,
                      l_not_rob + r_not_rob, l_not_rob + r_rob)
        # equivalently: not_rob = max(l_rob, l_not_rob) + max(r_rob, r_not_rob)
        return (rob, not_rob)
    return max(TreeTraversal(root))

29.1.2 Subarray Sum: Prefix Sum and Kadane’s Algorithm


This subsection is a continuation of the last section. The purpose of
separating it out is the importance of two algorithms, Prefix Sum and
Kadane's algorithm, in problems related to the sum or product of subarrays.
Both Prefix Sum and Kadane's algorithm use the dynamic programming
methodology, and they are highly correlated: each offers a different
perspective on a similar problem, the best example being the maximum
subarray problem. In the following two sections (Sec 29.1.2 and Sec ??) we
will demonstrate how prefix sum is used to solve the maximum subarray
problem and how Kadane's algorithm applies dynamic programming to it
directly. Let y denote the prefix-sum array of A; once we have it, the
formula S(i, j) = y_j − y_{i−1} gives the sum of any subarray A[i..j]. The
Python code to build the prefix sums:
P = [0] * (len(A) + 1)
for i, v in enumerate(A):
    P[i + 1] = P[i] + v
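A short usage sketch: with this P, the sum of A[i..j] (inclusive) is
P[j+1] − P[i]:

A = [-2, -3, 4, -1, -2, 1, 5, -3]
P = [0] * (len(A) + 1)
for i, v in enumerate(A):
    P[i + 1] = P[i] + v
print(P[7] - P[2])  # sum of A[2..6] = 4 - 1 - 2 + 1 + 5 = 7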

Prefix Sum and Kadane’s Algorithm Application


29.5 Maximum Subarray (L53, *). Given an integer array nums, find
the contiguous subarray (containing at least one number) which has
the largest sum and return its sum.
Example:

Input: [-2,1,-3,4,-1,2,1,-5,4]
Output: 6
Explanation: [4,-1,2,1] has the largest sum = 6.

Follow up: If you have figured out the O(n) solution, try coding another
solution using the divide and conquer approach, which is more subtle.
Solution 1: Prefix Sum. For the maximum subarray problem, the answer is
max(y_j − y_i) over j > i, which is equivalent to computing, for each j,
y_j − min(y_i) over i < j. Prefix sums thus solve the maximum subarray
problem in linear O(n) time, whereas brute force takes O(n³) and divide and
conquer O(n log n). For example, given the array
[−2, −3, 4, −1, −2, 1, 5, −3], we get the results shown in Table 29.3, and
the code follows.

Table 29.3: Process of using prefix sum for the maximum subarray

Array               -2  -3   4  -1  -2   1   5  -3
prefix sum          -2  -5  -1  -2  -4  -3   2  -1
updated prefix sum  -2  -3   4   3   1   2   7   4
current max sum     -2  -2   4   4   4   4   7   7
min prefix sum      -2  -5  -5  -5  -5  -5  -5  -5

import sys  # or we can use math.inf from the math module

# Function to compute the maximum subarray sum in linear time.
def maximumSumSubarray(nums):
    if not nums:
        return 0
    prefixSum = 0
    globalA = -sys.maxsize
    minSub = 0
    for i in range(len(nums)):
        prefixSum += nums[i]
        globalA = max(globalA, prefixSum - minSub)
        minSub = min(minSub, prefixSum)
    return globalA

# Driver program

# Test case 1
arr1 = [-2, -3, 4, -1, -2, 1, 5, -3]
print(maximumSumSubarray(arr1))  # 7

# Test case 2
arr2 = [4, -8, 9, -4, 1, -8, -1, 6]
print(maximumSumSubarray(arr2))  # 9

As we can see, we do not need extra space to store the whole prefix-sum
array, because at each step we only use the prefix sum at the current
index.

Solution 2: Kadane's Algorithm. Another, easier perspective is to apply
dynamic programming directly, triggered by the keyword "maximum" in the
question, one of the signals identified in the dynamic programming chapter.
dp: the maximum subarray sum ending at index i, which includes the current
    element nums[i]. We need n+1 space because we index dp[i-1].
Init: all 0
State transfer function: dp[i] = max(dp[i-1] + nums[i], nums[i]); at each
    element we either extend the previous subarray or start a new one.
Result: max(dp)

For space optimization, we only need to track the current maximum, since
dp[i] is only related to dp[i − 1]. For the last example, the dp values
(the 'updated prefix sum' row) are −2, −3, 4, 3, 1, 2, 7, 4; the comparison
can be seen in Table 29.3.
def maxSubArray(self, nums):
    if not nums:
        return 0
    dp = [-float('inf')] * (len(nums) + 1)
    for i, n in enumerate(nums):
        dp[i + 1] = max(dp[i] + n, n)
    return max(dp)

The prefix-sum idea can also be extended to strings, where it counts the
occurrences of each character in any query substring, as in 1177. Can Make
Palindrome from Substring:
from collections import defaultdict

def presum(s):
    n = len(s)
    ps = [defaultdict(int)] * (n + 1)
    for i, c in enumerate(s):
        ps[i + 1] = ps[i].copy()
        ps[i + 1][c] += 1
    return ps

def helper(queries, ps):
    ans = []
    for s, e, k in queries:
        ct = ps[e + 1]
        num_odd = 0
        for key in ct.keys():
            occurence = ct[key]
            if key in ps[s]:
                occurence -= ps[s][key]
            num_odd += occurence % 2
        # after pairing characters up, num_odd // 2 replacements are needed
        ans.append(num_odd // 2 <= k)
    return ans

Kadane's algorithm Actually, the simple dynamic programming above is
exactly Kadane's algorithm. Kadane's algorithm begins with a simple
inductive question: if we know the maximum subarray sum ending at position
i, what is the maximum subarray sum ending at position i + 1? The answer
turns out to be relatively straightforward: either the maximum subarray sum
ending at position i + 1 includes the maximum subarray sum ending at
position i as a prefix, or it doesn't. Thus, we can compute the maximum
subarray sum ending at position i for all positions i by iterating once
over the array. As we go, we simply keep track of the maximum sum we've
ever seen. The problem can be solved with the following code, expressed
here in Python:
def max_subarray(A):
    max_ending_here = max_so_far = A[0]
    for x in A[1:]:
        max_ending_here = max(x, max_ending_here + x)
        max_so_far = max(max_so_far, max_ending_here)
    return max_so_far

The algorithm can also be easily modified to keep track of the starting
and ending indices of the maximum subarray (when max_so_far changes) as
well as the case where we want to allow zero-length subarrays (with implicit
sum 0) if all elements are negative.
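A minimal sketch of that index-tracking variant (the function name is
illustrative):

def max_subarray_with_indices(A):
    max_ending_here = max_so_far = A[0]
    start = best_start = best_end = 0
    for i, x in enumerate(A[1:], 1):
        if x > max_ending_here + x:  # better to start a new subarray at i
            max_ending_here = x
            start = i
        else:
            max_ending_here += x
        if max_ending_here > max_so_far:
            max_so_far = max_ending_here
            best_start, best_end = start, i
    return max_so_far, best_start, best_end

print(max_subarray_with_indices([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # (6, 3, 6)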
Because of the way this algorithm uses optimal substructures (the max-
imum subarray ending at each position is calculated in a simple way from
a related but smaller and overlapping subproblem: the maximum subarray
ending at the previous position) this algorithm can be viewed as a sim-
ple/trivial example of dynamic programming.
Prefix Sum to get BCR. We can also convert this problem to the best time to buy and sell stock problem: on the prefix sums [0, -2, -1, -4, 0, -1, 1, 2, -3, 1] we look for the maximum gain, which takes O(n). The twist is that we reset prefix_sum to 0 whenever it drops below 0. Alternatively, we can try two pointers.
def maxSubArray(self, nums):
    """
    :type nums: List[int]
    :rtype: int
    """
    max_so_far = -float('inf')
    prefix_sum = 0
    for i in range(len(nums)):
        prefix_sum += nums[i]
        max_so_far = max(max_so_far, prefix_sum)
        if prefix_sum < 0:  # a negative prefix never helps; restart
            prefix_sum = 0
    return max_so_far
Generalize Kadane's Algorithm

We can still space-optimize the solution above: use one variable to replace the dp array, and track the maximum inside the for loop instead of taking the maximum value at the end. If we also rename dp to max_ending_here and max(dp) to max_so_far, the code is as follows:
import sys

def maximumSumSubarray(arr, n):
    if not arr:
        return 0
    max_ending_here = 0
    max_so_far = -sys.maxsize
    for i in range(len(arr)):
        max_ending_here = max(max_ending_here + arr[i], arr[i])
        max_so_far = max(max_so_far, max_ending_here)
    return max_so_far
This space-wise optimized dynamic programming solution to the maximum subarray problem is exactly Kadane's algorithm derived above. As noted there, it can be modified to keep track of the starting and ending indices of the maximum subarray (when max_so_far changes), as well as to allow zero-length subarrays (with implicit sum 0) when all elements are negative.
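For example, a minimal sketch (variable names illustrative) that also records the boundaries whenever max_so_far changes:

def max_subarray_with_indices(A):
    # best holds (best sum, start index, end index), both inclusive
    best = (A[0], 0, 0)
    cur_sum, cur_start = A[0], 0
    for i in range(1, len(A)):
        if cur_sum < 0:           # restarting beats extending
            cur_sum, cur_start = A[i], i
        else:
            cur_sum += A[i]
        if cur_sum > best[0]:     # max_so_far changes: record indices
            best = (cur_sum, cur_start, i)
    return best

# max_subarray_with_indices([-2, 1, -3, 4, -1, 2, 1, -5, 4]) == (6, 3, 6)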
Now, let us see how to solve the maximum subarray problem with the product operation instead of the sum.
29.6 Maximum Product Subarray (L152, **). Given an integer array nums, find the contiguous subarray within the array (containing at least one number) which has the largest product.

Example 1:
    Input: [2, 3, -2, 4]
    Output: 6
    Explanation: [2, 3] has the largest product 6.

Example 2:
    Input: [-2, 0, -1]
    Output: 0
    Explanation: The result cannot be 2, because [-2, -1] is not a contiguous subarray.
Solution: Kadane's Algorithm with product. Compared with the sum, the difference is that max_ending_here is not necessarily computed from the previous value with the current element: if the element is negative, the largest product so far may even become the smallest. Therefore we need to track another variable, min_ending_here. The Python code below is a straightforward implementation of the product-modified Kadane's algorithm.
class Solution(object):
    def maxProduct(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        if not nums:
            return 0
        n = len(nums)
        max_so_far = nums[0]
        min_local, max_local = nums[0], nums[0]
        for i in range(1, n):
            a = min_local * nums[i]
            b = max_local * nums[i]
            max_local = max(nums[i], a, b)
            min_local = min(nums[i], a, b)
            max_so_far = max(max_so_far, max_local)
        return max_so_far
29.1.3 Subarray or Substring

Dynamic programming on subarray and substring problems lowers the complexity from the O(n²) or O(n³) of brute force to O(n).
29.7 Longest Valid Parentheses (L32, hard). Given a string containing just the characters '(' and ')', find the length of the longest valid (well-formed) parentheses substring.

Example 1:
    Input: "(()"
    Output: 2
    Explanation: The longest valid parentheses substring is "()".

Example 2:
    Input: ")()())"
    Output: 4
    Explanation: The longest valid parentheses substring is "()()".
Solution 1: Dynamic programming. We define the state as the longest valid length ending at this position. Only ')' can possibly have a value larger than 0; at every '(' position the value is 0. With this definition, two example cases:

    ") ( ) ( ) )"
    dp: 0 0 2 0 4 0
    ") ( ) ( ( ( ) ) ) ("
    dp: 0 0 2 0 0 0 2 4 8 0

Thus, when we are at a ')' position i, we look at position i-1; there are two cases:

1. If s[i-1] == '(', it closes a pair: dp[i] += 2, and then we check dp[i-2] to connect with the previous longest length. For example, in case 1, ")()()" gives dp[i] = 4.
2. If s[i-1] == ')', we check position i-1-dp[i-1]: for instance, where dp[i] = 8 in case 2, if the character at that position is '(', we add 2 to dp[i-1] and also connect with the longest length ending just before that matching '('.
def longestValidParentheses(self, s):
    """
    :type s: str
    :rtype: int
    """
    if not s:
        return 0
    dp = [0] * len(s)
    for i in range(1, len(s)):
        c = s[i]
        if c == ')':  # check the previous position
            if s[i - 1] == '(':  # this is a closure
                dp[i] += 2
                if i - 2 >= 0:  # connect with previous length
                    dp[i] += dp[i - 2]
            if s[i - 1] == ')':  # look at i-1-dp[i-1] for '('
                if i - 1 - dp[i - 1] >= 0 and s[i - 1 - dp[i - 1]] == '(':
                    dp[i] = dp[i - 1] + 2
                    if i - 1 - dp[i - 1] - 1 >= 0:  # connect with previous length
                        dp[i] += dp[i - 1 - dp[i - 1] - 1]
    return max(dp)
# input "(()))())("
# dp     [0, 0, 2, 4, 0, 0, 2, 0, 0] -> 4
Solution 2: Using Stack.

def longestValidParentheses(self, s):
    if not s:
        return 0
    stack = [-1]  # sentinel: the index just before the current valid run
    ans = 0
    for i, c in enumerate(s):
        if c == '(':
            stack.append(i)
        else:
            stack.pop()
            if not stack:
                stack.append(i)  # new sentinel after an unmatched ')'
            else:
                ans = max(ans, i - stack[-1])
    return ans
29.1.4 Exercise
1. 639. Decode Ways II (hard)
29.2 Single Sequence O(n²)

In this section, we analyze the second type in Table 29.1, where we have O(n) subproblems and each subproblem depends on all the previous smaller subproblems, giving O(n²) time complexity. The problems here can be further categorized as Subsequence and Splitting.
1. 300. Longest Increasing Subsequence (medium)
2. 139. Word Break (medium)
3. 132. Palindrome Partitioning II (hard)
4. 123. Best Time to Buy and Sell Stock III (hard)
5. 818. Race Car (hard)
29.2.1 Subsequence

29.8 Longest Increasing Subsequence (L300, medium). Given an unsorted array of integers, find the length of the longest increasing subsequence.

Example:
    Input: [10, 9, 2, 5, 3, 7, 101, 18]
    Output: 4
    Explanation: The longest increasing subsequence is [2, 3, 7, 101], therefore the length is 4.

Note: (1) There may be more than one LIS combination; it is only necessary to return the length. (2) Your algorithm should run in O(n²) complexity.
Follow up: Could you improve it to O(n log n) time complexity?
Solution 1: Induction. Each state dp[i] represents the longest increasing subsequence that ends with nums[i]; it depends on all the previous i-1 subproblems, as shown in Eq. 29.1. For each subproblem, the results are:

Figure 29.1: State Transfer Tree Structure for LIS; each path represents a possible solution, and each arrow represents a move: find an element among the following elements that is larger than the current node.

    subproblem: [], [10], [10,9], [10,9,2], [10,9,2,5], [10,9,2,5,3], [10,9,2,5,3,7], ...
    ans:         0,    1,      1,        1,          2,            2,              3, ...

    f(i) = 1 + max{ f(j) : 0 <= j < i, arr[j] < arr[i] } if such j exists; otherwise f(i) = 1.   (29.1)
class Solution(object):
    def lengthOfLIS(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        LIS = [0] * (len(nums) + 1)  # LIS[i+1]: the LIS of the array ending at index i
        for i in range(len(nums)):
            max_before = 0
            for j in range(i):
                if nums[i] > nums[j]:
                    max_before = max(max_before, LIS[j + 1])
            LIS[i + 1] = max_before + 1
        return max(LIS)
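For the follow-up, a hedged O(n log n) sketch using binary search (the patience-sorting idea; function name illustrative). Here tails[k] holds the smallest possible tail of an increasing subsequence of length k+1:

from bisect import bisect_left

def lengthOfLIS_nlogn(nums):
    tails = []  # tails[k]: smallest tail of an increasing subsequence of length k+1
    for x in nums:
        pos = bisect_left(tails, x)  # first tail >= x
        if pos == len(tails):
            tails.append(x)          # x extends the longest subsequence so far
        else:
            tails[pos] = x           # x gives a smaller tail for length pos+1
    return len(tails)

# lengthOfLIS_nlogn([10, 9, 2, 5, 3, 7, 101, 18]) == 4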
29.2.2 Splitting

In splitting problems, the answer for each prefix is assembled by trying every split point inside it; figuring out how to fill the dp values along these splits is the key.
29.9 Word Break (L139, **). Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words. Note: (1) The same word in the dictionary may be reused multiple times in the segmentation. (2) You may assume the dictionary does not contain duplicate words.

Example 1:
    Input: s = "leetcode", wordDict = ["leet", "code"]
    Output: true
    Explanation: Return true because "leetcode" can be segmented as "leet code".

Example 2:
    Input: s = "applepenapple", wordDict = ["apple", "pen"]
    Output: true
    Explanation: Return true because "applepenapple" can be segmented as "apple pen apple". Note that you are allowed to reuse a dictionary word.

Example 3:
    Input: s = "catsandog", wordDict = ["cats", "dog", "sand", "and", "cat"]
    Output: false
Solution: Induction + Splitting. Like most single sequence problems, we have n overlapping subproblems; for example, for "leetcode":

    subproblem: '', 'l', 'le', 'lee', 'leet', 'leetc', 'leetco', 'leetcod', 'leetcode'
    ans:         1,   0,    0,     0,      1,       0,        0,         0,          1

Deduction still works here; we manually write down the result of each subproblem. Suppose we are computing the answer for 'leet': if 'lee' is true and 't' is a word, the answer is true; or if 'le' is true and 'et' is a word, the answer is true. Unlike the problems before, the answer for 'leet' can be constructed from any of the previous smaller subproblems.
def wordBreak(self, s, wordDict):
    wordDict = set(wordDict)
    n = len(s)
    dp = [False] * (n + 1)
    dp[0] = True  # the empty string '' is trivially segmentable
    for i in range(1, n + 1):
        for j in range(i):
            if dp[j] and s[j:i] in wordDict:  # previous result, plus new word s[j:i]
                dp[i] = True
    return dp[-1]
Figure 29.2: Word Break with DFS. Each arrow checks the word between parent and child and then recursively checks the result of the child.
DFS+Memo. To understand why each subproblem depends on O(n) even smaller subproblems, look at the DFS process shown in Fig. 29.2 (drawing the tree structure makes it even more obvious). For "leetcode" and "leet", both compute the subproblems '', 'l', 'le', 'lee'; thus we can use memoization to save solved subproblems. From the tree structure, each root node has O(n) subbranches, which explains the dependency count. To complete the discussion, we give the code for the DFS version.
def wordBreak(self, s, wordDict):
    wordDict = set(wordDict)

    # DFS with memoization: memo[start] answers "can s[start:end] be segmented?"
    def DFS(start, end, memo):
        if start >= end:
            return True
        if start not in memo:
            if s[start:end] in wordDict:
                memo[start] = True
                return memo[start]
            for i in range(start, end + 1):
                word = s[start:i]  # i is the splitting point
                if word in wordDict:
                    if i not in memo:
                        memo[i] = DFS(i, end, memo)
                    if memo[i]:
                        return True
            memo[start] = False
        return memo[start]

    return DFS(0, len(s), {})
29.10 Palindrome Partitioning II (L132, ***). Given a string s, partition s such that every substring of the partition is a palindrome. Return the minimum cuts needed for a palindrome partitioning of s.

Example:
    Input: "aab"
    Output: 1
    Explanation: The palindrome partitioning ["aa", "b"] could be produced using 1 cut.
Solution: we use two dp tables, one (pal) to track whether each range is a palindrome and the other (cuts) to compute the minimum cuts.
def minCut(self, s):
    """
    :type s: str
    :rtype: int
    """
    pal = [[False for _ in range(len(s))] for _ in range(len(s))]
    cuts = [len(s) - i - 1 for i in range(len(s))]  # worst case: cut between every char
    for start in range(len(s) - 1, -1, -1):
        for end in range(start, len(s)):
            if s[start] == s[end] and (end - start < 2 or pal[start + 1][end - 1]):
                pal[start][end] = True
                if end == len(s) - 1:
                    cuts[start] = 0
                else:
                    cuts[start] = min(cuts[start], 1 + cuts[end + 1])
    return cuts[0]
29.11 Best Time to Buy and Sell Stock III (L123, hard). Say you have an array for which the ith element is the price of a given stock on day i. Design an algorithm to find the maximum profit. You may complete at most two transactions. Note: You may not engage in multiple transactions at the same time (i.e., you must sell the stock before you buy again).

Example 1:
    Input: [3, 3, 5, 0, 0, 3, 1, 4]
    Output: 6
    Explanation: Buy on day 4 (price = 0) and sell on day 6 (price = 3), profit = 3-0 = 3. Then buy on day 7 (price = 1) and sell on day 8 (price = 4), profit = 4-1 = 3.

Example 2:
    Input: [1, 2, 3, 4, 5]
    Output: 4
    Explanation: Buy on day 1 (price = 1) and sell on day 5 (price = 5), profit = 5-1 = 4. Note that you cannot buy on day 1, buy on day 2 and sell them later, as you would be engaging in multiple transactions at the same time. You must sell before buying again.

Example 3:
    Input: [7, 6, 4, 3, 1]
    Output: 0
    Explanation: In this case, no transaction is done, i.e. max profit = 0.
Solution: the difference compared with problem I is that we are allowed at most two transactions. We split the array at each index i: the max profit we can get in [0, i] and the max profit we can get in [i, n-1]. Getting the maximum profit of each part is the same as problem I. The answer is max(preProfit[i] + postProfit[i]) over 0 <= i <= n-1. However, the following code gives O(n²) time complexity; it contains a lot of redundancy.
class Solution:
    def maxProfit(self, prices):
        """
        :type prices: List[int]
        :rtype: int
        """
        def maxProfitI(start, end):
            if start == end:
                return 0
            max_global_profit = 0
            min_local = prices[start]
            for i in range(start + 1, end + 1):
                max_global_profit = max(max_global_profit, prices[i] - min_local)
                min_local = min(min_local, prices[i])
            return max_global_profit

        if not prices:
            return 0
        n = len(prices)
        preProfit, postProfit = [0] * n, [0] * n
        for i in range(n):
            preProfit[i] = maxProfitI(0, i)
            postProfit[i] = maxProfitI(i, n - 1)
        return max([pre + post for pre, post in zip(preProfit, postProfit)])
To avoid repeated work, we can use one for loop to fill all values of preProfit and another for postProfit. For postProfit, we traverse from the end of the array to the start in reverse; this way we track max_local, the profit is max_local - prices[i], and both passes keep a global max profit. The code is as follows:
def maxProfit(self, prices):
    """
    :type prices: List[int]
    :rtype: int
    """
    if not prices:
        return 0
    n = len(prices)

    preProfit, postProfit = [0] * n, [0] * n
    # get preProfit, from 0 to n-1; track min_local and the global max
    min_local = prices[0]
    max_global_profit = 0
    for i in range(1, n):
        max_global_profit = max(max_global_profit, prices[i] - min_local)
        min_local = min(min_local, prices[i])
        preProfit[i] = max_global_profit
    # get postProfit, from n-1 to 0; track max_local and the global max
    max_local = prices[-1]
    max_global_profit = 0
    for i in range(n - 1, -1, -1):
        max_global_profit = max(max_global_profit, max_local - prices[i])
        max_local = max(max_local, prices[i])
        postProfit[i] = max_global_profit
    # combine preProfit and postProfit to get the maximum profit
    return max([pre + post for pre, post in zip(preProfit, postProfit)])
As an exercise, try 818. Race Car (hard).
29.3 Single Sequence O(n³)

The difference in this type of single sequence problem is that there are not only n subproblems, each a prefix A[0:i], i in [0, n], for a sequence of size n. Instead there are O(n²) subproblems, each a subarray A[i:j], i <= j. Usually this type shows the optimal substructure dp[i][j] = f(dp[i][k], dp[k][j]), k in [i, j], which gives O(n³) time complexity and O(n²) space complexity. Classical examples of this type of problem are matrix-chain multiplication, as explained in Introduction to Algorithms, and the stone game.
29.3.1 Interval

Problems include Stone Game, Burst Balloons, and Scramble String. The feature of this type of dynamic programming is that we want the min/max/count over a range of the array, and the state transfer function builds each range from its subranges, iterating from small ranges to large ranges; a generic skeleton is sketched below.
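A minimal interval-DP skeleton (an addition to the text; init and combine are placeholders to be filled in per problem):

def interval_dp(A, init, combine):
    # dp[i][j]: answer for the subarray A[i..j] (inclusive)
    n = len(A)
    dp = [[0] * n for _ in range(n)]
    for i in range(n):
        dp[i][i] = init(A[i])              # ranges of length 1
    for l in range(2, n + 1):              # range length, small to large
        for i in range(0, n - l + 1):      # range start
            j = i + l - 1                  # range end
            # all strictly smaller ranges inside [i, j] are already filled
            dp[i][j] = combine(dp, A, i, j)
    return dp[0][n - 1]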
29.12 486. Predict the Winner (medium). Given an array of scores that are non-negative integers, player 1 picks one of the numbers from either end of the array, followed by player 2, then player 1, and so on. Each time a player picks a number, that number is no longer available to the next player. This continues until all the scores have been chosen. The player with the maximum score wins. Given an array of scores, predict whether player 1 is the winner. You can assume each player plays to maximize his score.

Example 1: Input: [1, 5, 2]. Output: False
    Explanation: Initially, player 1 can choose between 1 and 2. If he chooses 2 (or 1), then player 2 can choose from 1 (or 2) and 5. If player 2 chooses 5, then player 1 will be left with 1 (or 2). So the final score of player 1 is 1 + 2 = 3, and player 2 has 5. Hence, player 1 will never be the winner and you need to return False.

Example 2: Input: [1, 5, 233, 7]. Output: True
    Explanation: Player 1 first chooses 1. Then player 2 has to choose between 5 and 7. No matter which number player 2 chooses, player 1 can choose 233. Finally, player 1 has more points (234) than player 2 (12), so you need to return True, representing that player 1 can win.

Note:
1. 1 <= length of the array <= 20.
2. Any scores in the given array are non-negative integers and will not exceed 10,000,000.
3. If the scores of both players are equal, then player 1 is still the winner.
Solution: At first, we cannot use f[i] to denote the state: because we can choose elements from both the left and the right side, we use f[i][j] instead, which represents the maximum value the player to move can get from range i to j. Second, when a problem has accumulated values, we can use sum[i][j] to represent the sum over the range [i, j]. Each player acts to maximize their total points f[i][j]; they have two choices, left or right, which leave f[i+1][j] or f[i][j-1] respectively for the other player. To gain the maximum score in range [i, j] we must assume the opponent also plays optimally, so we subtract the better of the opponent's two options, i.e., we leave them the minimum. Therefore, we have the state transfer function: f[i][j] = sum[i][j] - min(f[i+1][j], f[i][j-1]). Each subproblem relies on only two subproblems, so with O(n²) states the total time complexity is O(n²). This is a game theory type of problem. According to the function, when the range has size 1 (i == j), the value is nums[i], which is the initialization. For the loops, the first for loop is over the range length, from 2 to n; the second gets the start index i in [0, n-l], and then the end index is j = i + l - 1. The answer for this problem is whether f[0][n-1] >= sum/2; if it is, then it is true. The process of the loops is: initialize the diagonal elements, then fill out the elements above the diagonal.
def PredictTheWinner(nums):
    """
    :type nums: List[int]
    :rtype: bool
    """
    if not nums:
        return False
    if len(nums) == 1:
        return True
    # prefix sums: sum[i, j] = sums[j+1] - sums[i]
    sums = nums[:]
    for i in range(1, len(nums)):
        sums[i] += sums[i - 1]
    sums.insert(0, 0)

    n = len(nums)
    dp = [[0 for col in range(n)] for row in range(n)]
    for i in range(n):
        dp[i][i] = nums[i]

    for l in range(2, n + 1):
        for i in range(0, n - l + 1):  # start 0, end n-l+1
            j = i + l - 1
            dp[i][j] = (sums[j + 1] - sums[i]) - min(dp[i + 1][j], dp[i][j - 1])
    return dp[0][n - 1] >= sums[-1] / 2
Alternatively, we can use f[i][j] = max(nums[i] - f[i+1][j], nums[j] - f[i][j-1]) to represent the difference between the points gained by player one and player two on range [i, j]. When f[i][j] is the state of player one, then f[i][j-1] and f[i+1][j] are the potential states of player two.
class Solution:
    def PredictTheWinner(self, nums):
        """
        :type nums: List[int]
        :rtype: bool
        """
        n = len(nums)
        if n == 1 or n % 2 == 0:
            return True
        dp = [[0] * n for _ in range(n)]
        for i in range(n):
            dp[i][i] = nums[i]  # a single element: the player to move takes it
        for l in range(2, n + 1):
            for i in range(0, n - l + 1):  # start 0, end n-l+1
                j = i + l - 1
                dp[i][j] = max(nums[j] - dp[i][j - 1], nums[i] - dp[i + 1][j])
        return dp[0][-1] >= 0
Actually we can use a simpler loop ordering, though it is harder to understand than the standard length-based version: i goes from n-1 down to 0 so that dp[i+1][*] is ready, and j goes from i+1 up to n-1 so that dp[i][j-1] is ready.

for i in range(n - 1, -1, -1):
    dp[i][i] = nums[i]  # initialization
    for j in range(i + 1, n):
        dp[i][j] = max(nums[j] - dp[i][j - 1], nums[i] - dp[i + 1][j])
29.13 Stone Game

In the stone game, there are n piles of stones in a line at the beginning. The goal is to merge the stones into one pile, observing the following rules: at each step of the game, the player can merge two adjacent piles into a new pile, and the score is the number of stones in the new pile. You are to determine the minimum total score. Example: for [4, 1, 1, 4], in the best solution the total score is 18: merge the second and third piles to get [4, 2, 4], score +2; merge the first two piles to get [6, 4], score +6; merge the last two piles to get [10], score +10. Two other examples: [1, 1, 1, 1] returns 8, and [4, 4, 5, 9] returns 43. A sketch of a solution follows.
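A minimal interval-DP sketch (function name illustrative) under the recurrence dp[i][j] = min over k of (dp[i][k] + dp[k+1][j]) + sum(i, j), where sum(i, j) is the number of stones in piles i..j, computed with prefix sums:

def stone_game_min_score(piles):
    n = len(piles)
    if n < 2:
        return 0
    # prefix sums so that the sum of piles[i..j] is prefix[j+1] - prefix[i]
    prefix = [0] * (n + 1)
    for i, p in enumerate(piles):
        prefix[i + 1] = prefix[i] + p
    dp = [[0] * n for _ in range(n)]  # dp[i][i] = 0: a single pile needs no merge
    for l in range(2, n + 1):          # range length, small to large
        for i in range(0, n - l + 1):
            j = i + l - 1
            dp[i][j] = min(dp[i][k] + dp[k + 1][j] for k in range(i, j)) \
                       + prefix[j + 1] - prefix[i]
    return dp[0][n - 1]

# stone_game_min_score([4, 1, 1, 4]) == 18
# stone_game_min_score([1, 1, 1, 1]) == 8
# stone_game_min_score([4, 4, 5, 9]) == 43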
29.14 312. Burst Balloons

Given n balloons, indexed from 0 to n-1, each balloon is painted with a number on it, represented by the array nums. You are asked to burst all the balloons. If you burst balloon i, you will get nums[left] * nums[i] * nums[right] coins, where left and right are the indices adjacent to i. After the burst, left and right become adjacent.

Find the maximum coins you can collect by bursting the balloons wisely.

Note: (1) You may imagine nums[-1] = nums[n] = 1; they are not real, so you cannot burst them. (2) 0 <= n <= 500, 0 <= nums[i] <= 100.

Example:
    Given [3, 1, 5, 8]
    Return 167

    nums  = [3,1,5,8] --> [3,5,8] --> [3,8] --> [8] --> []
    coins = 3*1*5 + 3*5*8 + 1*3*8 + 1*8*1 = 167
Solution: let c[i][j] be the maximum coins obtainable by bursting all balloons in the range [i, j]. Pick k as the last balloon to burst in this range: we first collect c[i][k-1], then c[k+1][j], and finally burst k, whose neighbors at that point are the boundaries nums[i-1] and nums[j+1].
class Solution:
    def maxCoins(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        n = len(nums)
        nums.insert(0, 1)  # virtual boundary balloons
        nums.append(1)

        c = [[0 for _ in range(n + 2)] for _ in range(n + 2)]
        for l in range(1, n + 1):          # length in [1, n]
            for i in range(1, n - l + 2):  # start in [1, n-l+1]
                j = i + l - 1              # end = i + l - 1
                # the transfer function is a for loop over k,
                # the last balloon to burst in [i, j]
                for k in range(i, j + 1):
                    c[i][j] = max(c[i][j],
                                  c[i][k - 1] + nums[i - 1] * nums[k] * nums[j + 1] + c[k + 1][j])
        # the answer covers the range from 1 to n
        return c[1][n]
29.15 516. Longest Palindromic Subsequence

Given a string s, find the length of the longest palindromic subsequence in s. You may assume that the maximum length of s is 1000.

Example 1:
    Input: "bbbab"
    Output: 4
    One possible longest palindromic subsequence is "bbbb".

Example 2:
    Input: "cbbd"
    Output: 2
    One possible longest palindromic subsequence is "bb".
Solution: for this problem, the state dp[i][j] is the length of the longest palindromic subsequence of s[i..j], with dp[i][i] = 1. We then fill in the dp matrix (its upper triangle) by range length.
def longestPalindromeSubseq(self, s):
    """
    :type s: str
    :rtype: int
    """
    if not s:
        return 0
    if len(s) == 1:
        return 1

    def isPalindrome(s):
        l, r = 0, len(s) - 1
        while l <= r:
            if s[l] != s[r]:
                return False
            l += 1
            r -= 1
        return True

    if isPalindrome(s):  # to speed up
        return len(s)

    rows = len(s)
    dp = [[0 for col in range(rows)] for row in range(rows)]
    for i in range(rows):
        dp[i][i] = 1

    for l in range(2, rows + 1):          # use a length
        for i in range(0, rows - l + 1):  # start 0, end rows-l+1
            j = i + l - 1
            if s[i] == s[j]:
                dp[i][j] = dp[i + 1][j - 1] + 2
            else:
                dp[i][j] = max(dp[i][j - 1], dp[i + 1][j])
    return dp[0][rows - 1]
Alternatively, i can go from n-1 down to 0 (so dp[i+1][*] is ready) and j from i+1 up to n-1 (so dp[i][j-1] is ready):

for (int i = n - 1; i >= 0; --i) {
    dp[i][i] = 1;
    for (int j = i + 1; j < n; ++j) {
        if (s[i] == s[j]) {
            dp[i][j] = dp[i + 1][j - 1] + 2;
        } else {
            dp[i][j] = max(dp[i + 1][j], dp[i][j - 1]);
        }
    }
}
Now to do the space optimization, keeping a single 1-D array:

class Solution {
public:
    int longestPalindromeSubseq(string s) {
        int n = s.size(), res = 0;
        vector<int> dp(n, 1);
        for (int i = n - 1; i >= 0; --i) {
            int len = 0;
            for (int j = i + 1; j < n; ++j) {
                int t = dp[j];        // dp[i+1][j] before it is overwritten
                if (s[i] == s[j]) {
                    dp[j] = len + 2;  // len holds the max over dp[i+1][i+1..j-1]
                }
                len = max(len, t);
            }
        }
        for (int num : dp) res = max(res, num);
        return res;
    }
};
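A side note not in the original text: the longest palindromic subsequence of s equals the longest common subsequence of s and its reverse, so the LCS-length idea from Sec. 29.5.1 also solves this problem. A minimal sketch (function name illustrative):

def longestPalindromeSubseqViaLCS(s):
    # LPS(s) == LCS(s, reversed s)
    t = s[::-1]
    n = len(s)
    f = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            f[i + 1][j + 1] = f[i][j] + 1 if s[i] == t[j] else max(f[i][j + 1], f[i + 1][j])
    return f[-1][-1]

# longestPalindromeSubseqViaLCS("bbbab") == 4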
29.4 Coordinate: BFS and DP

In this type of problem, we are given an array or a matrix with a 1D or 2D axis. We either do 'optimization' to find the minimum path sum, or do 'counting' to get the total number of paths, or check whether we can start from A and end at B.

Two-dimensional. For an O(mn)-sized coordinate, Table 29.4 shows two different types: in one there are only O(mn) subproblems, and in the other O(kmn), where k normally represents the number of steps. Because a 2D coordinate is inherently a graph, this type is closely related to graph traversal algorithms: BFS for counting and DFS for optimization problems.

Table 29.4: Different Types of Coordinate Dynamic Programming

    Case     Input    Subproblems    f(n)    Time      Space
    Easy     O(mn)    O(mn)          O(1)    O(mn)     O(mn) -> O(m)
    Medium   O(mn)    O(kmn)         O(1)    O(kmn)    O(kmn) -> O(mn)

For this type of problem, understanding the BFS-based solution matters more than just memorizing the template of the dynamic programming solution. We study this in two parts, counting (Sec. 29.4.1) and optimization, with LeetCode examples to learn how to solve this type of dynamic programming problem.
29.4.1 One Time Traversal

In this section, we want to explore how to turn a BFS solution into a dynamic programming one. Inherently, the dynamic programming solutions for this type of problem are optimized breadth-first search.
Counting

In this type, any location in the coordinate is visited only once; this gives O(mn) time complexity.
62. Unique Paths

A robot is located at the top-left corner of an m x n grid (marked 'Start' in the diagram below). The robot can only move either down or right at any point in time. The robot is trying to reach the bottom-right corner of the grid (marked 'Finish' in the diagram below). How many possible unique paths are there?

Note: m and n will be at most 100.

Example 1:
    Input: m = 3, n = 2
    Output: 3
    Explanation: From the top-left corner, there are a total of 3 ways to reach the bottom-right corner:
    1. Right -> Right -> Down
    2. Right -> Down -> Right
    3. Down -> Right -> Right

Example 2:
    Input: m = 7, n = 3
    Output: 28
BFS. Fig. 29.4 shows the BFS traversal process in the matrix; we can clearly see that each node and edge is visited only once, so the BFS solution is straightforward and is the best solution. We use bfs to track the nodes in the queue at each level, and dp to record the number of unique paths to location (i, j). Because each location belongs to exactly one level, reusing the same dp across levels causes no conflict.
Figure 29.4: One Time Graph Traversal. Different colors mean different levels of traversal.

# BFS
def uniquePaths(self, m, n):
    dp = [[0 for _ in range(n)] for _ in range(m)]
    dp[0][0] = 1
    bfs = set([(0, 0)])
    dirs = [(1, 0), (0, 1)]
    while bfs:
        new_bfs = set()
        for x, y in bfs:
            for dx, dy in dirs:
                nx, ny = x + dx, y + dy
                if 0 <= nx < m and 0 <= ny < n:
                    dp[nx][ny] += dp[x][y]
                    new_bfs.add((nx, ny))
        bfs = new_bfs
    return dp[m - 1][n - 1]
Dynamic Programming. In the BFS solution, we use a set to track the nodes at each level. The corresponding dynamic programming solution should instead arrange the computation so that the current state depends only on previously computed states. Here each position (x, y) has a state f[x][y], the number of unique paths from the start position (0, 0) to (x, y). The state transfer function is f[x][y] = f[x-1][y] + f[x][y-1]. If we initialize the boundary locations (the first row and the first column) and visit each location by looping over rows and columns, we get the dynamic programming solution.
def uniquePaths(self, m, n):
    if m == 0 or n == 0:
        return 0
    dp = [[0 for col in range(n)] for row in range(m)]
    dp[0][0] = 1
    # initialize row 0
    for col in range(1, n):
        dp[0][col] = dp[0][col - 1]
    # initialize col 0
    for row in range(1, m):
        dp[row][0] = dp[row - 1][0]

    for row in range(1, m):
        for col in range(1, n):
            dp[row][col] = dp[row - 1][col] + dp[row][col - 1]
    return dp[m - 1][n - 1]
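As a sanity check (an addition to the text): each path consists of m-1 down moves and n-1 right moves, so the count also equals the binomial coefficient C(m+n-2, m-1). A one-liner using math.comb (Python 3.8+):

from math import comb

def uniquePathsClosedForm(m, n):
    # choose which of the m+n-2 moves are the m-1 "down" moves
    return comb(m + n - 2, m - 1)

# uniquePathsClosedForm(3, 2) == 3; uniquePathsClosedForm(7, 3) == 28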
377. Combination Sum IV (medium)

Given an integer array with all positive numbers and no duplicates, find the number of possible combinations that add up to a positive integer target.

Example:
    nums = [1, 2, 3]
    target = 4
    The possible combination ways are:
    (1, 1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 3), (2, 1, 1), (2, 2), (3, 1)
    Note that different sequences are counted as different combinations. Therefore the output is 7.

Follow up: What if negative numbers are allowed in the given array? How does it change the problem? What limitation would we need to add to the question to allow negative numbers?
Target as Climbing Stairs. Analysis: the DFS+memo solution was given earlier. Because we just need to count, dynamic programming is possible. From the DFS solution we can see the state depends only on the remaining target, so we can define a dp array of size target+1. This is like climbing stairs, where each time we can go either 1, 2, or 3 steps:

    [1, 2, 3], t = 4
    t = 0: dp[0] = 1; []
    t = 1: t(0)+1, dp[1] = 1; [1]
    t = 2: t(0)+2 and t(1)+1, dp[2] = 1 + 1 = 2; [2], [1,1]
    t = 3: t(0)+3, t(1)+2, t(2)+1, dp[3] = 1 + 1 + 2 = 4; [3], [1,2], [2,1], [1,1,1]
    t = 4: t(1)+3, t(2)+2, t(3)+1, dp[4] = 1 + 2 + 4 = 7; [1,3], [2,2], [1,1,2], [3,1], [1,2,1], [2,1,1], [1,1,1,1]
def combinationSum4(self, nums, target):
    """
    :type nums: List[int]
    :type target: int
    :rtype: int
    """
    nums.sort()
    dp = [0] * (target + 1)
    dp[0] = 1
    for t in range(1, target + 1):
        for n in nums:
            if t - n >= 0:
                dp[t] += dp[t - n]
            else:
                break  # nums is sorted, so no later n can fit either
    return dp[-1]
Optimization

64. Minimum Path Sum (medium)

Given an m x n grid filled with non-negative numbers, find a path from top left to bottom right which minimizes the sum of all numbers along its path. Note: You can only move either down or right at any point in time.

Example 1:
    [[1,3,1],
     [1,5,1],
     [4,2,1]]
Given the above grid map, return 7, because the path 1 -> 3 -> 1 -> 1 -> 1 minimizes the sum. Dynamic Programming. For this problem, the approach is exactly the same as in the previous problems; the only difference is the state transfer function: f(i, j) = g(i, j) + min(f(i-1, j), f(i, j-1)), where g(i, j) is the grid value at (i, j).
# dynamic programming
def minPathSum(self, grid):
    if not grid:
        return 0
    rows, cols = len(grid), len(grid[0])
    dp = [[0 for _ in range(cols)] for _ in range(rows)]
    dp[0][0] = grid[0][0]

    # initialize row 0
    for c in range(1, cols):
        dp[0][c] = dp[0][c - 1] + grid[0][c]

    # initialize col 0
    for r in range(1, rows):
        dp[r][0] = dp[r - 1][0] + grid[r][0]

    for r in range(1, rows):
        for c in range(1, cols):
            dp[r][c] = grid[r][c] + min(dp[r - 1][c], dp[r][c - 1])
    return dp[-1][-1]
Dynamic Programming with Space Optimization. As can be seen, each time we update sum[i][j], we only need sum[i-1][j] (at the current column) and sum[i][j-1] (at the left column), so we need not maintain the full m x n matrix. Maintaining two columns is enough, and now we have the following code.
rows, cols = len(grid), len(grid[0])
# O(rows) space
pre, cur = [0] * rows, [0] * rows
# initialize the first column, walking (0,0) -> (1,0) -> ... -> (rows-1,0)
pre[0] = grid[0][0]
for row in range(1, rows):
    pre[row] = pre[row - 1] + grid[row][0]
for col in range(1, cols):
    cur[0] = pre[0] + grid[0][col]  # initialize the first row of this column
    for row in range(1, rows):
        cur[row] = min(cur[row - 1], pre[row]) + grid[row][col]
    pre, cur = cur, pre
return pre[rows - 1]
Further inspecting the above code, it can be seen that maintaining pre only serves to recover pre[row], which is simply cur[row] before its update. So it is enough to use only one vector; now the space is further optimized and the code also gets shorter.

rows, cols = len(grid), len(grid[0])
# O(rows) space with a single vector
cur = [0] * rows
cur[0] = grid[0][0]
for row in range(1, rows):
    cur[row] = cur[row - 1] + grid[row][0]
for col in range(1, cols):
    cur[0] = cur[0] + grid[0][col]  # first row of this column
    for row in range(1, rows):
        cur[row] = min(cur[row - 1], cur[row]) + grid[row][col]
return cur[rows - 1]
Now, we use O(1) extra space by reusing the original grid.

rows, cols = len(grid), len(grid[0])
# O(1) extra space by reusing the grid itself
for i in range(0, rows):
    for j in range(0, cols):
        if i == 0 and j == 0:
            continue
        elif i == 0:
            grid[i][j] += grid[i][j - 1]
        elif j == 0:
            grid[i][j] += grid[i - 1][j]
        else:
            grid[i][j] += min(grid[i - 1][j], grid[i][j - 1])
return grid[rows - 1][cols - 1]
29.4.2 Multiple-time Traversal

In this type, we traverse the locations K times, making K steps of moves before reaching the final solution. This gives O(kmn) time complexity.
Two-dimensional Coordinate

935. Knight Dialer (Medium)

A chess knight can move as indicated in the chess diagram below. This time, we place our chess knight on any numbered key of a phone pad (indicated above), and the knight makes N-1 hops. Each hop must be from one key to another numbered key. Each time it lands on a key (including the initial placement of the knight), it presses the number of that key, pressing N digits total. How many distinct numbers can you dial in this manner? Since the answer may be large, output the answer modulo 10^9 + 7.

Example 1:
    Input: 1
    Output: 10

Example 2:
    Input: 2
    Output: 20

Example 3:
    Input: 3
    Output: 46

Note: 1 <= N <= 5000
Most Naive BFS. Analysis: First, we need to figure out, for each number, the possible next moves. We get this dictionary: moves = {0: [4, 6], 1: [6, 8], 2: [7, 9], 3: [4, 8], 4: [0, 3, 9], 5: [], 6: [0, 1, 7], 7: [2, 6], 8: [1, 3], 9: [2, 4]}. This is not exactly a coordinate; because we can keep moving endlessly, we have a graph. The brute force puts [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] in as the start positions and uses BFS to control the steps; the total number of paths is the number of leaves. At each step we do two things: 1) generate a list of all possible next numbers; 2) when we reach the leaves, count them all.
# naive BFS solution
def knightDialer(self, N):
    """
    :type N: int
    :rtype: int
    """
    if N == 1:
        return 10
    moves = {0: [4, 6], 1: [6, 8], 2: [7, 9], 3: [4, 8], 4: [0, 3, 9],
             5: [], 6: [0, 1, 7], 7: [2, 6], 8: [1, 3], 9: [2, 4]}

    bfs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  # all starting points
    step = 1
    while bfs:
        new = []
        for i in bfs:
            new += moves[i]
        step += 1
        bfs = new
        if step == N:
            return len(bfs) % (10**9 + 7)
Optimized BFS. However, the brute force BFS only passed 18/120 test cases. To improve it, note that we only need to record, for each level, the count of paths ending at each number; bfs is thus replaced with a count array. The new code is:
# optimized BFS, exactly a DP
def knightDialer(self, N):
    MOD = 10**9 + 7
    if N == 1:
        return 10
    moves = {0: [4, 6], 1: [6, 8], 2: [7, 9], 3: [4, 8], 4: [0, 3, 9],
             5: [], 6: [0, 1, 7], 7: [2, 6], 8: [1, 3], 9: [2, 4]}

    bfs = [1] * 10  # count of numbers ending at each key
    step = 1
    while bfs:
        new = [0] * 10
        for idx, count in enumerate(bfs):
            for m in moves[idx]:
                new[m] += count
                new[m] %= MOD
        step += 1
        bfs = new
        if step == N:
            return sum(bfs) % MOD
Optimized Dynamic Programming. This is exactly a dynamic programming algorithm: new[m] += bfs[idx]. For example, from 1 we can move to 6 and 8, so f(1, n) = f(6, n-1) + f(8, n-1). A state is represented by the key and the step, and it saves the count at that state. Now we write it in the dp template style:
# optimized dynamic programming template
def knightDialer(self, N):
    MOD = 10**9 + 7
    moves = {0: [4, 6], 1: [6, 8], 2: [7, 9], 3: [4, 8], 4: [0, 3, 9],
             5: [], 6: [0, 1, 7], 7: [2, 6], 8: [1, 3], 9: [2, 4]}

    dp = [1] * 10
    for step in range(N - 1):
        new_dp = [0] * 10
        for idx, count in enumerate(dp):
            for m in moves[idx]:
                new_dp[m] += count
                new_dp[m] %= MOD
        dp = new_dp
    return sum(dp) % MOD
688. Knight Probability in Chessboard (Medium)

On an N x N chessboard, a knight starts at the r-th row and c-th column and attempts to make exactly K moves. The rows and columns are 0-indexed, so the top-left square is (0, 0) and the bottom-right square is (N-1, N-1). A chess knight has 8 possible moves it can make. Each move is two squares in a cardinal direction, then one square in an orthogonal direction. Each time the knight is to move, it chooses one of the eight possible moves uniformly at random (even if the piece would go off the chessboard) and moves there. The knight continues moving until it has made exactly K moves or has moved off the chessboard. Return the probability that the knight remains on the board after it has stopped moving.

Example:
    Input: 3, 2, 0, 0
    Output: 0.0625
    Explanation: There are two moves (to (1,2), (2,1)) that will keep the knight on the board. From each of those positions, there are also two moves that will keep the knight on the board. The total probability the knight stays on the board is 0.0625.
Optimized BFS. Analysis: each time we can make 8 moves, so after K steps there are 8^K total unique paths; we just need the number of paths that end within the board (valid paths). The first step is to write down the possible moves, or directions. Then we initialize a two-dimensional array dp to record the number of paths ending at (i, j) after k steps. Using a BFS solution, at each step we only save the set of positions reachable at that step.
# Optimized BFS solution
def knightProbability(self, N, K, r, c):
    dirs = [[-2, -1], [-2, 1], [-1, -2], [-1, 2],
            [1, -2], [1, 2], [2, -1], [2, 1]]
    dp = [[0 for _ in range(N)] for _ in range(N)]
    total = 8**K
    last_pos = set([(r, c)])
    dp[r][c] = 1

    for step in range(K):
        new_pos = set()
        new_dp = [[0 for _ in range(N)] for _ in range(N)]
        for x, y in last_pos:
            for dx, dy in dirs:
                nx, ny = x + dx, y + dy
                if 0 <= nx < N and 0 <= ny < N:
                    new_dp[nx][ny] += dp[x][y]
                    new_pos.add((nx, ny))
        last_pos = new_pos
        dp = new_dp

    return float(sum(map(sum, dp))) / total
Optimized Dynamic Programming. If we delete last_pos and use dp itself to identify the positions reached at the previous step (its nonzero entries), we get a space-optimized dynamic programming solution. This solution is nearly the fastest; compared with the one above, each step cuts the cost of dynamically maintaining a set (a hash map).
# Best Dynamic Programming Solution
def knightProbability(self, N, K, r, c):
    dirs = [[-2, -1], [-2, 1], [-1, -2], [-1, 2],
            [1, -2], [1, 2], [2, -1], [2, 1]]
    dp = [[0 for _ in range(N)] for _ in range(N)]
    total = 8**K
    dp[r][c] = 1

    for step in range(K):
        new_dp = [[0 for _ in range(N)] for _ in range(N)]
        for i in range(N):
            for j in range(N):
                if dp[i][j] == 0:
                    continue  # position not reachable at this step
                for dx, dy in dirs:
                    nx, ny = i + dx, j + dy
                    if 0 <= nx < N and 0 <= ny < N:
                        new_dp[nx][ny] += dp[i][j]
        dp = new_dp

    return float(sum(map(sum, dp))) / total
One-dimensional Coordinate

For one dimension, it is the same as two dimensions, and it can be even simpler.

70. Climbing Stairs (Easy)

You are climbing a staircase. It takes n steps to reach the top. Each time you can either climb 1 or 2 steps. In how many distinct ways can you climb to the top? Note: Given n will be a positive integer.

Example 1:
    Input: 2
    Output: 2
    Explanation: There are two ways to climb to the top.
    1. 1 step + 1 step
    2. 2 steps

Example 2:
    Input: 3
    Output: 3
    Explanation: There are three ways to climb to the top.
    1. 1 step + 1 step + 1 step
    2. 1 step + 2 steps
    3. 2 steps + 1 step
Figure 29.7: Tree Structure for One dimensional coordinate
BFS. Fig. 29.7 demonstrates the state transfer relation between the positions. First, we can solve it using our standard BFS. In this problem we do not know the depth of the tree structure, so the end condition is that bfs is not empty. Eventually bfs becomes empty, and the per-level dp becomes empty too, so we use a global variable ans to accumulate the result.
# BFS
def climbStairs(self, n):
    dp = [0] * (n + 1)
    dp[0] = 1  # init starting point 0 to 1
    dirs = [1, 2]
    bfs = set([0])
    ans = 0
    while bfs:
        new_dp = [0] * (n + 1)
        new_bfs = set()
        for i in bfs:  # positions
            for dx in dirs:
                nx = i + dx
                if 0 <= nx <= n:
                    new_dp[nx] += dp[i]
                    new_bfs.add(nx)
        ans += dp[-1]
        bfs, dp = new_bfs, new_dp
    return ans
Dynamic Programming. If we observe the tree structure, we have the state transfer function f(i) = f(i-1) + f(i-2). Thus, a single for loop starting from 2 fills in the dp list without overlap.

# Dynamic Programming
def climbStairs(self, n):
    dp = [0] * (n + 1)
    dp[0] = 1  # init starting point 0 to 1
    dp[1] = 1
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[-1]
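The dp list can be reduced further to two variables, since f(i) only needs the last two values. A minimal sketch (function name illustrative):

def climbStairs_O1(n):
    a, b = 1, 1  # ways to reach steps i-2 and i-1
    for _ in range(2, n + 1):
        a, b = b, a + b
    return b

# climbStairs_O1(2) == 2; climbStairs_O1(3) == 3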
The BFS and the dynamic programming solutions have the same time and space complexity.
29.4.3 Generalization

1. State: f[x] or f[x][y] denotes the optimum value, the count, or the feasibility of solutions up to coordinate x for 1D, or (x, y) for 2D;
2. Function: usually we relate f[x] to f[x-1], or f[x][y] to f[x-1][y] and f[x][y-1];
3. Initialization: for f[x] we initialize the starting point, sometimes needing one extra slot (size n + 1); for f[x][y] we initialize the elements of row 0 and column 0;
4. Answer: usually it is f[n-1] or f[m-1][n-1].
Space Optimization. For f[i] = max(f[i-1], f[i-2] + A[i]), the recurrence can be converted into f[i mod 2] = max(f[(i-1) mod 2], f[(i-2) mod 2] + A[i]). Also, we can directly use the original matrix or array to save the state results. Note: there are several ways to optimize the space complexity; we can go from O(m*n) to O(m + n) to O(1), the last of which we get by reusing the original grid or array. A tiny sketch follows.
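A minimal sketch of the i mod 2 trick on the recurrence above (this recurrence is the House Robber problem; the function name is illustrative):

def rob_rolling(A):
    # f[i % 2] = max(f[(i-1) % 2], f[(i-2) % 2] + A[i]) with only two cells
    f = [0, 0]
    for i, x in enumerate(A):
        f[i % 2] = max(f[(i - 1) % 2], f[(i - 2) % 2] + x)
    return max(f)

# rob_rolling([2, 7, 9, 3, 1]) == 12  (take 2 + 9 + 1)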
One-Time Traversal

    dp[i][j] := answer of A[0..i][0..j]

    # template
    dp of size (n+1) x (m+1)
    for i in range(n):
        for j in range(m):
            dp[i][j] = f(dp[pre_i][pre_j])
    return f(dp)
Multiple-Dimensional Traversal

    dp[k][i][j] := answer of A[0..i][0..j] after k steps

    # template
    dp of size k x (n+1) x (m+1)
    for _ in range(k):
        for i in range(n):
            for j in range(m):
                dp[k][i][j] = f(dp[k - 1][pre_i][pre_j])
    return f(dp)
29.5 Double Sequence: Pattern Matching DP

Figure 29.8: Longest Common Subsequence
In this section, we focus on a double sequence P and S with input size O(m) + O(n). A double sequence can naturally be arranged as a matrix of size (m+1) x (n+1). We have an extra row and an extra column because we put the empty char '' at the beginning of each string, which makes initialization easier and handles empty strings too. One example is shown in Fig. 29.8. This mostly makes the time complexity for this section O(mn). This type of dynamic programming can be seen as a generalization of the coordinate problems; the difference is that the moves are not given as in the coordinate section (29.4.1). We need to find the deduction rules, or recurrence relation, ourselves. Most of the time, the moves are to the neighbors: for (i, j), the potential positions are (i-1, j-1), (i-1, j), and (i, j-1). For example, for the Longest Common Subsequence in Fig. 29.8: if the current P[i] and S[j] match, the state depends only on dp[i-1][j-1]; if not, it depends on the relation between (P(0, i), S(0, j-1)) and (P(0, i-1), S(0, j)). Filling out an example table manually can guide us to find the rules; if we do so, even problems marked as hard on LeetCode become solvable.
Brute Force. A brute force solution would have to enumerate exponentially many alignments of the two sequences; the dynamic programming solutions below avoid this by reusing subproblem results.
Problems shown in this section include:

1. 72. Edit Distance
2. 712. Minimum ASCII Delete Sum for Two Strings
3. 115. Distinct Subsequences (hard)
4. 44. Wildcard Matching (hard)
29.5.1 Longest Common Subsequence

Problem definition: given two strings A and B, for example A = "ABCD" and B = "ABD", the longest common subsequence is "ABD", so the length of the longest common subsequence is 3.
Coordinate + Moves. Each string contributes m and n subproblems respectively, so the two sequences make it a matrix problem. The result for the above example is shown in Fig. 29.8. We can observe the problem and generalize the moves, or state transfer function. For the red-marked positions, the chars in strings A and B are the same, so the length is the result of the previous substrings plus one. Otherwise, as in the black-marked positions, it is the maximum of the left and above positions. The equation is shown in Eq. 29.2. To initialize, we set the first row and the first column: f[i][0] = 0, f[0][j] = 0.

    f[i][j] = 1 + f[i-1][j-1] if a[i-1] == b[j-1]; otherwise f[i][j] = max(f[i-1][j], f[i][j-1]).   (29.2)
The Python code is shown as follows:

def LCSLen(S1, S2):
    if not S1 or not S2:
        return 0
    n, m = len(S1), len(S2)
    f = [[0] * (m + 1) for _ in range(n + 1)]
    # init: f[0][*] = f[*][0] = 0
    for i in range(n):
        for j in range(m):
            f[i + 1][j + 1] = f[i][j] + 1 if S1[i] == S2[j] else max(f[i][j + 1], f[i + 1][j])
    return f[-1][-1]

S1 = "ABCD"
S2 = "ABD"
LCSLen(S1, S2)
# f: [[0,0,0,0], [0,1,1,1], [0,1,2,2], [0,1,2,2], [0,1,2,3]]
# returns 3
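As an addition: once the table f is filled, the subsequence itself can be recovered by walking back from the bottom-right corner (this assumes LCSLen is modified to also return f; names illustrative):

def reconstructLCS(S1, S2, f):
    # walk back from f[n][m], emitting matched chars
    i, j, out = len(S1), len(S2), []
    while i > 0 and j > 0:
        if S1[i - 1] == S2[j - 1]:
            out.append(S1[i - 1])
            i, j = i - 1, j - 1
        elif f[i - 1][j] >= f[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return ''.join(reversed(out))

# with the table for ("ABCD", "ABD"): reconstructLCS("ABCD", "ABD", f) == "ABD"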
29.5.2 Other Problems

There are more pattern-matching related dynamic programming problems; we give them in this section.
29.16 72. Edit Distance (hard). Given two words word1 and word2, find the minimum number of operations required to convert word1 to word2. You have the following 3 operations permitted on a word: insert a character, delete a character, replace a character.

Example 1:
    Input: word1 = "horse", word2 = "ros"
    Output: 3
    Explanation:
    horse -> rorse (replace 'h' with 'r')
    rorse -> rose (remove 'r')
    rose -> ros (remove 'e')

Example 2:
    Input: word1 = "intention", word2 = "execution"
    Output: 5
    Explanation:
    intention -> inention (remove 't')
    inention -> enention (replace 'i' with 'e')
    enention -> exention (replace 'n' with 'x')
    exention -> exection (replace 'n' with 'c')
    exection -> execution (insert 'u')
Coordinate+Deduction. This is similar to the LCS length. We use
f[i][j] to denote the minimum number of operations needed to make
the first i chars of word1 (S1) the same as the first j chars of word2 (S2). The
upper bound of the minimum edit distance is max(m, n), achievable with
replacements and insertions only. The most important step is to decide the transfer function:
how to get the result of the current state f[i][j]. If directly filling in the matrix
is obscure, we can first try the recursion:
DFS("horse", "rose")
= DFS("hors", "ros")   # no edit at 'e'
= DFS("hor", "ro")     # no edit at 's'
= 1 + min(DFS("ho", "ro"),   # delete 'r' from the longer one
          DFS("hor", "r"),   # insert 'o' at the longer one, leaving "hor" and "r" to match
          DFS("ho", "r"))    # replace 'r' in the longer one with 'o' from the shorter one, leaving "ho" and "r" to match

This can be written as Eq. 29.3, and thus the problem can be solved by dynamic programming.

f[i][j] = min(f[i][j-1], f[i-1][j], f[i-1][j-1]) + 1    if S1[i-1] != S2[j-1]
        = f[i-1][j-1]                                    otherwise             (29.3)
The Python code is as follows:
def minDistance(word1, word2):
    if not word1:
        return len(word2) if word2 else 0
    if not word2:
        return len(word1)
    rows, cols = len(word1), len(word2)
    dp = [[0 for col in range(cols + 1)] for row in range(rows + 1)]
    # converting a prefix to/from the empty string costs its length
    for row in range(1, rows + 1):
        dp[row][0] = row
    for col in range(1, cols + 1):
        dp[0][col] = col

    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                # insert, delete, replace
                dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1,
                               dp[i - 1][j - 1] + 1)
    return dp[rows][cols]

29.17 115. Distinct Subsequences (hard).

Given a string S and a string T, count the number of distinct subsequences of S which equal T.
A subsequence of a string is a new string which is formed from the
original string by deleting some (can be none) of the characters without
disturbing the relative positions of the remaining characters. (i.e.,
"ACE" is a subsequence of "ABCDE" while "AEC" is not.)
Example 1:

Input: S = "rabbbit", T = "rabbit"
Output: 3
Explanation:
As shown below, there are 3 ways you can generate "rabbit" from S.
(The caret symbol ^ means the chosen letters)

rabbbit
^^^^ ^^
rabbbit
^^ ^^^^
rabbbit
^^^ ^^^

Example 2:

Input: S = "babgbag", T = "bag"
Output: 5
Explanation:
As shown below, there are 5 ways you can generate "bag" from S.
(The caret symbol ^ means the chosen letters)

babgbag
^^ ^
babgbag
^^    ^
babgbag
^    ^^
babgbag
  ^  ^^
babgbag
    ^^^

Coordinate. Here we still need to fill out a matrix. If the length
of s is smaller than the length of t, the answer is 0. On the diagonal,
where the two prefixes have equal length, the value only depends on
position (i-1, j-1) and whether s[i] == t[j]. The lower part of the matrix
has a different rule: for example, with s = 'ab' and t = 'a', when s[i] != t[j]
we need to match s[0, i-1] against t[0, j], so we check dp[i-1][j]; when
they are equal, we additionally add dp[i-1][j-1]. The table below is for
s = "abba" and t = "aba":
      ''  a  b  a
  ''   1  0  0  0
  a    1  1  0  0
  b    1  1  1  0
  b    1  1  2  0
  a    1  2  2  2

def numDistinct(self, s, t):
    if not s or not t:
        # an empty t is a subsequence of anything exactly once
        return 0 if (not s and t) else 1

    rows, cols = len(s), len(t)
    if cols > rows:
        return 0
    if cols == rows:
        return 1 if s == t else 0

    # initialize: first column is all 1 (empty t matches once)
    dp = [[0 for c in range(cols + 1)] for r in range(rows + 1)]
    for r in range(rows):
        dp[r + 1][0] = 1
    dp[0][0] = 1

    # fill out the lower part
    for i in range(rows):
        for j in range(min(i + 1, cols)):
            if i == j:  # diagonal
                if s[i] == t[j]:
                    dp[i + 1][j + 1] = dp[i][j]
            else:  # lower half of the matrix
                if s[i] == t[j]:
                    # they equal: skip s[i], plus match s[i] with t[j]
                    dp[i + 1][j + 1] = dp[i][j + 1] + dp[i][j]
                else:
                    # carry the count from the prefix of s without s[i]
                    dp[i + 1][j + 1] = dp[i][j + 1]
    return dp[-1][-1]

29.18 44. Wildcard Matching (hard). Given an input string (s) and
a pattern (p), implement wildcard pattern matching with support for
’?’ and ’*’.
’?’ Matches any single character. ’*’ Matches any sequence of charac-
ters (including the empty sequence).
The matching should cover the entire input string (not partial).
Note:
s could be empty and contains only lowercase letters a-z. p could be
empty and contains only lowercase letters a-z, and characters like ? or
*.

Example 1:

Input:
s = "aa"
p = "a"
Output: false
Explanation: "a" does not match the entire string "aa".

Example 2:

Input:
s = "aa"
p = "*"
Output: true
Explanation: '*' matches any sequence.

Example 3:

Input:
s = "cb"
p = "?a"
Output: false
Explanation: '?' matches 'c', but the second letter is 'a', which does not match 'b'.

Example 4:

Input:
s = "adceb"
p = "*a*b"
Output: true
Explanation: The first '*' matches the empty sequence, while the second '*' matches the substring "dce".

Solution 1: Complete Search: DFS. We start from the first elements
in s and p with indices i and j. If p[j] is a '?', or s[i] == p[j], we match
with dfs(i+1, j+1). The more complex case is '*': it can match anywhere from the empty string
up to the full remaining length of s. Therefore, we call dfs(k, j + 1), k ∈ [i, n], and check if
any of these recursive calls returns True. This solution receives a TLE (Time Limit Exceeded) error.
def isMatch(self, s, p):
    """
    :type s: str
    :type p: str
    :rtype: bool
    """
    ns, np = len(s), len(p)

    def helper(si, pi):
        if si == ns and pi == np:
            return True
        elif si == ns or pi == np:
            if si == ns:  # if pattern is left, make sure it is all '*'
                for i in range(pi, np):
                    if p[i] != '*':
                        return False
                return True
            else:  # if string is left, return False
                return False

        if p[pi] in ['?', '*']:
            if p[pi] == '?':
                return helper(si + 1, pi + 1)
            else:
                for i in range(si, ns + 1):  # '*' can match up to the end
                    if helper(i, pi + 1):
                        return True
                return False
        else:
            if p[pi] != s[si]:
                return False
            return helper(si + 1, pi + 1)

    return helper(0, 0)

Solution 2: Dynamic Programming. As with all the above problems,
we try to fill out the dp table. If p[i] is a '?', or p[i] == s[j], we check
dp[i-1][j-1]. For '*': if it is treated as the empty string, we check dp[i-1][j]
(above); because it can also match a string of any length, we check the
left cell dp[i][j-1] as well. The table below is for s = "adceb" and p = "*a*b":
      ''  a  d  c  e  b
  ''   1  0  0  0  0  0
  *    1  1  1  1  1  1
  a    0  1  0  0  0  0
  *    0  1  1  1  1  1
  b    0  0  0  0  0  1

def isMatch(self, s, p):
    ns, np = len(s), len(p)
    dp = [[False for c in range(ns + 1)] for r in range(np + 1)]

    # initialize: empty pattern matches empty string; leading '*'s match empty
    dp[0][0] = True
    for r in range(1, np + 1):
        if p[r - 1] == '*' and dp[r - 1][0]:
            dp[r][0] = True

    # dp main
    for r in range(1, np + 1):
        for c in range(1, ns + 1):
            if p[r - 1] == '?':
                dp[r][c] = dp[r - 1][c - 1]
            elif p[r - 1] == '*':
                dp[r][c] = dp[r - 1][c] or dp[r][c - 1]  # above or left
                if dp[r][c]:
                    # once '*' matches here, it matches all longer prefixes
                    for nc in range(c + 1, ns + 1):
                        dp[r][nc] = True
                    break
            else:
                if dp[r - 1][c - 1] and p[r - 1] == s[c - 1]:
                    dp[r][c] = True

    return dp[-1][-1]

29.5.3 Summary
The four elements include:

1. State: f[i][j], where i denotes the first i numbers or characters of the first sequence and j the first j elements of the second sequence; we need n + 1 and m + 1 entries in the two dimensions.

2. Function: f[i][j] describes how to match the ith element of the first sequence with the jth element of the second.

3. Initialization: f[i][0] for the first column and f[0][j] for the first row.

4. Answer: f[n][m].

29.6 Knapsack
The problems in this section are defined as: given n items with cost Ci
and value Vi, we choose items whose total cost either 1) equals an amount S or
2) is bounded by an amount S, and we are required to obtain either 1) the
maximum value or 2) the minimum number of items. Depending on whether we can use one item
multiple times, we have three categories:

1. 0-1 Knapsack (Section 29.6.1): each item is only allowed to be used 0 or 1 time.

2. Unbounded Knapsack (Section 29.6.2): each item is allowed to be used an unlimited number of times.

3. Bounded Knapsack (Section 29.6.3): each item is allowed to be used a fixed number of times.

How to solve the above three types of questions will be explained, with
Python examples, in the next three subsections (Sections 29.6.1,
29.6.2, and 29.6.3), under the second type of restriction: the total cost is
bounded by an amount S.
The problem itself is a combination problem with a restriction, so
we can definitely use DFS as the naive solution. Moreover, these problems
do not simply enumerate all the combinations — they are optimization
problems, which is what makes DFS with memoization applicable to them.
Thus, dynamic programming is not our only choice. We can refer to
Section ?? and Section ?? for the DFS-based solution and reasoning.
LeetCode problems:

1. 322. Coin Change (**) unbounded, fixed amount.

29.6.1 0-1 Knapsack


In this subsection, each item is only allowed to be used at most one time.
This is a combination problem with a restriction (the total cost is bounded by a
given budget, or say the total weight of the items must be <= the capacity of
the knapsack).
Given the following example, we can get the maximum value 9 by
choosing items 3 and 4, each with cost 2:
c = [1, 1, 2, 2]
v = [1, 2, 4, 5]
C = 4

Solution 1: Combination with DFS. Clearly this is a combination problem;
here we give the naive DFS solution. The time complexity is O(2^n).
def knapsack01DFS(c, v, C):
    def dfs(s, cur_c, cur_v, ans):
        ans[0] = max(ans[0], cur_v)
        if s == n:
            return
        for i in range(s, n):
            if cur_c + c[i] <= C:  # restriction
                dfs(i + 1, cur_c + c[i], cur_v + v[i], ans)
    ans = [0]
    n = len(c)
    dfs(0, 0, 0, ans)
    return ans[0]

c = [1, 1, 2, 2]
v = [1, 2, 4, 5]
C = 4
print(knapsack01DFS(c, v, C))
# output
# 9

Solution 2: DFS+Memo. Because this is an optimization problem, the plain DFS above recomputes the same (item index, remaining capacity) states many times; memoizing on these two variables brings the complexity down to O(n × C).
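A minimal sketch of this memoized DFS (the helper name and the use of functools.lru_cache are ours, not fixed by the text):

from functools import lru_cache

def knapsack01Memo(c, v, C):
    n = len(c)

    @lru_cache(maxsize=None)
    def dfs(i, cap):
        # best value achievable with items i..n-1 and remaining capacity cap
        if i == n:
            return 0
        best = dfs(i + 1, cap)  # skip item i
        if c[i] <= cap:         # take item i if it fits
            best = max(best, v[i] + dfs(i + 1, cap - c[i]))
        return best

    return dfs(0, C)

print(knapsack01Memo([1, 1, 2, 2], [1, 2, 4, 5], 4))  # 9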
Solution 3: Dynamic Programming. We can make the solution iterative with
dynamic programming. Because we have two variables to track, we use dp[i][c] to denote the maximum
value we can gain from the subproblem (0, i) with a cost budget of c. Thus, the size of
the dp matrix is (n+1) × (C+1), which makes the time complexity O(n × C).
Like any coordinate-type dynamic programming problem, we iterate
with two for loops, one for i and the other for c; which
one is inside or outside does not matter here. The state transfer function
takes the maximum value of 1) not choosing this item and 2) choosing this item,
which adds v[i] to the value of the first i-1 items with cost c - c[i]:

dp[i][c] = max(dp[i-1][c], dp[i-1][c-c[i]] + v[i]).
def knapsack01DP(c, v, C):
    dp = [[0 for _ in range(C + 1)] for r in range(len(c) + 1)]
    for i in range(len(c)):
        for w in range(c[i], C + 1):
            dp[i + 1][w] = max(dp[i][w], dp[i][w - c[i]] + v[i])
    return dp[-1][-1]

Optimize Space. Because updating dp only reads from the previous row (and
only cells at or to the left of the current column), we can reduce the space to O(C). If
we keep the same code as above but with a one-dimensional dp, the
later updates in each row would read results already updated at the same level,
which amounts to using each item multiple times — that is in fact the most efficient
solution to the unbounded knapsack problem in the next section. To avoid this
we have two choices: 1) use a temporary one-dimensional new dp for
each i; 2) update the cost in reverse order so that we never read
a newly updated result.
def knapsack01OptimizedDP1(c, v, C):
    dp = [0 for _ in range(C + 1)]
    for i in range(len(c)):
        new_dp = dp[:]  # carry over entries where item i does not fit
        for w in range(c[i], C + 1):
            new_dp[w] = max(dp[w], dp[w - c[i]] + v[i])
        dp = new_dp
    return dp[-1]

def knapsack01OptimizedDP2(c, v, C):
    dp = [0 for _ in range(C + 1)]
    for i in range(len(c)):
        for w in range(C, c[i] - 1, -1):  # reverse so each item is used once
            dp[w] = max(dp[w], dp[w - c[i]] + v[i])
    return dp[-1]

For the convenience of the later sections, we modularize the final code as:

def knapsack01(cost, val, C, dp):
    for j in range(C, cost - 1, -1):
        dp[j] = max(dp[j], dp[j - cost] + val)
    return dp

def knapsack01Final(c, v, C):
    n = len(c)
    dp = [0 for _ in range(C + 1)]
    for i in range(n):
        knapsack01(c[i], v[i], C, dp)
    return dp[-1]

29.6.2 Unbounded Knapsack


In unbounded knapsack problems, one item can be used an unlimited number of
times as long as the total cost stays within the limit. So each item can be used at most
C/c[i] times.
Solution 1: Combination with DFS. Because an item can be reused as long
as the cost stays within the restriction of the knapsack's capacity,
when we recursively call the DFS function we do not increase the index i as
we did in the 0-1 knapsack problem.
def knapsackUnboundDFS(c, v, C):
    def combinationUnbound(s, cur_c, cur_v, ans):
        ans[0] = max(ans[0], cur_v)
        if s == n:
            return
        for i in range(s, n):
            if cur_c + c[i] <= C:  # restriction
                combinationUnbound(i, cur_c + c[i], cur_v + v[i], ans)  # i, not i+1
    ans = [0]
    n = len(c)
    combinationUnbound(0, 0, 0, ans)
    return ans[0]

print(knapsackUnboundDFS(c, v, C))
# output
# 10

Solution 2: Reuse 0-1 knapsack's dynamic programming. We can
simply copy each item up to C/c[i] times. Or we can do better: because
any positive integer can be composed from 1, 2, 4, ..., 2^k (for instance,
3 = 1+2, 5 = 1+4, 6 = 2+4), we can shrink the C/c[i] copies to floor(log2(C/c[i])) + 1
items, with cost and value c[i], v[i]; 2c[i], 2v[i]; up to 2^k times the original cost and value.
import math

def knapsackUnboundNaiveDP2(c, v, C):
    n = len(c)
    dp = [0 for _ in range(C + 1)]
    for i in range(n):
        # binary decomposition: e.g., log2(3) ~ 1.58, and 3 = 1 + 2
        for j in range(int(math.log(C / c[i], 2)) + 1):
            knapsack01(c[i] << j, v[i] << j, C, dp)
    return dp[-1]
# output
# 10

Solution 3: Use the forward (covering) update of the one-dimensional dp.
As mentioned in the previous section, if we use a one-dimensional dp and iterate
the cost forward instead of in reverse, then dp[j - cost] may already include item i,
so taking it again is allowed — exactly the unbounded behavior we want.

def knapsackUnbound(cost, val, C, dp):
    for j in range(cost, C + 1):  # forward, so an item can be reused
        dp[j] = max(dp[j], dp[j - cost] + val)
    return dp

def knapsackUnboundFinal(c, v, C):
    n = len(c)
    dp = [0 for _ in range(C + 1)]
    for i in range(n):
        knapsackUnbound(c[i], v[i], C, dp)
    return dp[-1]

29.6.3 Bounded Knapsack


In this type of problem, each item can be used at most n[i] times.
Reduce to 0-1 Knapsack. As with the unbounded knapsack, this can be
reduced to a 0-1 knapsack by splitting item i into copies. Each item is effectively
usable at most k = min(n[i], C/c[i]) times, so we split it into copies of sizes
1, 2, 4, ..., plus a remainder, so that the copy counts sum exactly to k
(roughly log2(k) + 1 copies).
def knapsackboundDP(c, v, Num, C):
    n = len(c)
    dp = [0 for _ in range(C + 1)]
    for i in range(n):
        k = min(Num[i], C // c[i])  # item i can be used at most k times
        # exact binary decomposition: 1, 2, 4, ..., remainder
        p = 1
        while k > 0:
            cnt = min(p, k)
            knapsack01(c[i] * cnt, v[i] * cnt, C, dp)
            k -= cnt
            p <<= 1
    return dp[-1]

Num = [2, 3, 2, 2]
print(knapsackboundDP(c, v, Num, C))
# 10

Reduce to Unbounded Knapsack. If n[i] >= C/c[i], ∀i, then the Bounded
Knapsack can be reduced to Unbounded Knapsack.
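A minimal sketch of this reduction (the function name is ours; it reuses knapsackUnbound from the previous subsection and is only valid when the condition above holds):

def knapsackBoundedAsUnbounded(c, v, num, C):
    # precondition: num[i] >= C // c[i] for every i, so the bound never binds
    assert all(num[i] >= C // c[i] for i in range(len(c)))
    dp = [0 for _ in range(C + 1)]
    for i in range(len(c)):
        knapsackUnbound(c[i], v[i], C, dp)
    return dp[-1]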

29.6.4 Generalization
The four elements of the knapsack problems include:

1. State: dp[i][c] denotes the optimized value (maximum value, minimum number of items, total count) for the subproblem (0, i) with cost c.

2. State transfer function: dp[i][c] = f(dp[i-1][c-c[i]], dp[i-1][c]). For example, if we want:

   • maximum/minimum value: f = max/min, and dp[i-1][c-c[i]] becomes dp[i-1][c-c[i]] + v[i];
   • total possible solutions: dp[i][c] += dp[i-1][c-c[i]] (see the counting sketch after this list);
   • the maximum cost (how full we can fill the knapsack): dp[i][j] = max(dp[i-1][j], dp[i-1][j-c[i]] + c[i]).

3. Initialization: f[i][0] = True; f[0][1, ..., size] = False. The explanation: with i items we can always choose none of them, so size 0 is always reachable, while with 0 items we cannot fill a knapsack of any size in (1, size).

4. Answer: dp[n-1][C-1].
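As a sketch of the "total possible solutions" variant (the function name is ours; this is essentially LeetCode 518, Coin Change 2: unbounded items, exact amount):

def countCombinations(c, S):
    dp = [0] * (S + 1)
    dp[0] = 1  # one way to make amount 0: choose nothing
    for cost in c:
        for j in range(cost, S + 1):
            dp[j] += dp[j - cost]  # extend existing solutions with one more `cost`
    return dp[S]

print(countCombinations([1, 2, 5], 5))  # 4: 5, 2+2+1, 2+1+1+1, 1+1+1+1+1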

Restriction Requires Reaching the Exact Amount of Capacity

In the above sections we answered the different types of knapsack under the
second restriction; what about the first restriction, which requires
the total cost to be exactly equal to an amount S? Think about it: if we
are given an amount that no combination from the cost array can add
up to, then that state should be marked invalid, with value float("-inf")
for a max state function and float("inf") for a min state function in Python. For the
amount 0, the value is valid, with value 0. Thus, the only difference for the
first restriction lies in the initialization. Here we give an Exact example
for the unbounded type:
def knapsackUnboundExactNaiveDP2(c, v, C):
    n = len(c)
    dp = [float("-inf") for _ in range(C + 1)]
    dp[0] = 0  # only amount 0 is reachable initially
    for i in range(n):
        for j in range(int(math.log(C / c[i], 2)) + 1):
            knapsack01(c[i] << j, v[i] << j, C, dp)
    return dp[-1] if dp[-1] != float("-inf") else 0

c = [2, 2, 2, 7]
v = [1, 2, 4, 5]

C = 17
print(knapsackUnboundNaiveDP2(c, v, C))
print(knapsackUnboundExactNaiveDP2(c, v, C))
# output
# 32
# 25

29.6.5 LeetCode Problems


29.19 Coin Change (L322, **). You are given coins of different denominations and a total amount of money amount. Write a function to compute the fewest number of coins that you need to make up that amount. If that amount of money cannot be made up by any combination of the coins, return -1. Note: you may assume that you have an infinite number of each kind of coin.

Example 1:

Input: coins = [1, 2, 5], amount = 11
Output: 3
Explanation: 11 = 5 + 5 + 1

Example 2:

Input: coins = [2], amount = 3
Output: -1

Solution: Unbounded Knapsack with Restriction 1 and Minimum
Counting. First, we are required to get the fewest number of coins in a
valid combination, thus the state transfer function is dp[i][j] =
min(dp[i-1][j], dp[i-1][j-c[i]] + 1). Second, we need to make up the exact
amount, thus other than cost 0 all entries of the dp array are initialized with the
invalid value float("inf"). Third, this is an unbounded
knapsack, so we iterate the costs incrementally (forward) with a one-dimensional
dp array to be able to use coins multiple times.
def coinChange(self, coins, amount):
    dp = [float("inf") for c in range(amount + 1)]
    dp[0] = 0
    # unbounded knapsack: forward iteration reuses coins
    for i in range(len(coins)):
        for a in range(coins[i], amount + 1):
            dp[a] = min(dp[a], dp[a - coins[i]] + 1)

    return dp[-1] if dp[-1] != float("inf") else -1

29.20 Partition Equal Subset Sum (L416, **). Given a non-empty array containing only positive integers, find if the array can be partitioned into two subsets such that the sum of elements in both subsets is equal.

Example 1:
Input: [1, 5, 11, 5]
Output: true
Explanation: The array can be partitioned as [1, 5, 5] and [11].

Example 2:
Input: [1, 2, 3, 5]
Output: false
Explanation: The array cannot be partitioned into equal sum subsets.

Solution 1: 0-1 Knapsack. First, we compute the total sum; only
if the sum is an even integer can we possibly divide the array into two
equal subsets. With the target sum obtained, we use a 0-1 knapsack
where the state transfer function is dp[i][j] = dp[i-1][j] or dp[i-1][j-nums[i]],
and the dp array is initialized as False.
def canPartition(self, nums):
    if not nums:
        return False
    s = sum(nums)
    if s % 2:
        return False
    # 0-1 knapsack on the target s // 2
    dp = [False] * (s // 2 + 1)
    dp[0] = True

    for i in range(len(nums)):
        for j in range(s // 2, nums[i] - 1, -1):  # reverse: each number used once
            dp[j] = dp[j] or dp[j - nums[i]]

    return dp[-1]

Solution 2: DFS with Memo. Because here we only need to check
whether an exact sum is reachable, we can stop early with DFS.
def canPartition(self, nums):
    if not nums:
        return False
    s = sum(nums)
    if s % 2:
        return False
    memo = {}

    def dfs(start, t):
        if t == 0:
            return True
        if t < 0:
            return False
        if (start, t) not in memo:  # key on both the index and the target
            memo[(start, t)] = any(dfs(i + 1, t - nums[i])
                                   for i in range(start, len(nums)))
        return memo[(start, t)]

    return dfs(0, s // 2)

29.7 Exercise
29.7.1 Single Sequence
Unique Binary Search Tree
Interleaving String
Race Car

29.7.2 Coordinate
746. Min Cost Climbing Stairs (Easy)

On a staircase, the i-th step has some non-negative cost cost[i] assigned (0 indexed).

Once you pay the cost, you can either climb one or two steps. You need to find the minimum cost to reach the top of the floor, and you can either start from the step with index 0, or the step with index 1.

Example 1:

Input: cost = [10, 15, 20]
Output: 15
Explanation: Cheapest is start on cost[1], pay that cost and go to the top.

Example 2:

Input: cost = [1, 100, 1, 1, 1, 100, 1, 1, 100, 1]
Output: 6
Explanation: Cheapest is start on cost[0], and only step on 1s, skipping cost[3].

Note:

cost will have a length in the range [2, 1000].
Every cost[i] will be an integer in the range [0, 999].

Figure 29.9: Caption

Analysis: As Fig. 29.9 shows, the min cost to reach 2 depends only on
the min costs of 0 and 1, which take 2 and 1 steps respectively. To reach 3 depends only
on the min costs of 1 and 2. For 0 and 1, the cost is initialized as 0 because
these are the valid starting points.
import sys

def minCostClimbingStairs(self, cost):
    if not cost:
        return 0
    dp = [sys.maxsize] * (len(cost) + 1)
    dp[0] = 0  # we may start from step 0 or step 1 for free
    dp[1] = 0
    for i in range(2, len(cost) + 1):
        dp[i] = min(dp[i - 1] + cost[i - 1], dp[i - 2] + cost[i - 2])
    return dp[-1]

576. Out of Boundary Paths (Medium)

There is an m by n grid with a ball. Given the start coordinate (i, j) of the ball, you can move the ball to an adjacent cell or cross the grid boundary in four directions (up, down, left, right). However, you can move at most N times. Find the number of paths to move the ball out of the grid boundary. The answer may be very large; return it after mod 10^9 + 7.

Example 1:
Input: m = 2, n = 2, N = 2, i = 0, j = 0
Output: 6

Example 2:
Input: m = 1, n = 3, N = 3, i = 0, j = 1
Output: 12

Note:
Once you move the ball out of the boundary, you cannot move it back.
The length and height of the grid are in range [1, 50].
N is in range [0, 50].

Multiple-Pass Coordinate. The only difference compared with our earlier examples is that we accumulate the out-of-boundary paths each time the next location is not within bounds.
def findPaths(self, m, n, N, i, j):
    MOD = 10**9 + 7
    dirs = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    dp = [[0 for _ in range(n)] for _ in range(m)]
    dp[i][j] = 1
    ans = 0

    for step in range(N):
        new_dp = [[0 for _ in range(n)] for _ in range(m)]
        for x in range(m):
            for y in range(n):
                if dp[x][y] == 0:  # only check reachable locations at this step
                    continue
                for dx, dy in dirs:
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < m and 0 <= ny < n:
                        new_dp[nx][ny] += dp[x][y]
                    else:
                        ans += dp[x][y]
                        ans %= MOD
        dp = new_dp

    return ans

63. Unique Paths II

A robot is located at the top-left corner of an m x n grid (marked 'Start' in the diagram below).

The robot can only move either down or right at any point in time. The robot is trying to reach the bottom-right corner of the grid (marked 'Finish' in the diagram below).

Now consider if some obstacles are added to the grids. How many unique paths would there be?

An obstacle and empty space are marked as 1 and 0 respectively in the grid.

Note: m and n will be at most 100.

Example 1:

Input:
[
  [0,0,0],
  [0,1,0],
  [0,0,0]
]
Output: 2
Explanation:
There is one obstacle in the middle of the 3x3 grid above.
There are two ways to reach the bottom-right corner:
1. Right -> Right -> Down -> Down
2. Down -> Down -> Right -> Right

Coordinate.

def uniquePathsWithObstacles(self, obstacleGrid):
    """
    :type obstacleGrid: List[List[int]]
    :rtype: int
    """
    if not obstacleGrid or obstacleGrid[0][0] == 1:
        return 0
    m, n = len(obstacleGrid), len(obstacleGrid[0])
    dp = [[0 for c in range(n)] for r in range(m)]
    dp[0][0] = 1  # starting point (already checked to be obstacle-free)

    # init first column and first row: an obstacle blocks all cells after it
    for r in range(1, m):
        dp[r][0] = dp[r - 1][0] if obstacleGrid[r][0] == 0 else 0
    for c in range(1, n):
        dp[0][c] = dp[0][c - 1] if obstacleGrid[0][c] == 0 else 0

    for r in range(1, m):
        for c in range(1, n):
            dp[r][c] = dp[r - 1][c] + dp[r][c - 1] if obstacleGrid[r][c] == 0 else 0
    return dp[-1][-1]

29.7.3 Double Sequence

712. Minimum ASCII Delete Sum for Two Strings

Given two strings s1, s2, find the lowest ASCII sum of deleted characters to make the two strings equal.

Example 1:

Input: s1 = "sea", s2 = "eat"
Output: 231
Explanation: Deleting "s" from "sea" adds the ASCII value of "s" (115) to the sum.
Deleting "t" from "eat" adds 116 to the sum.
At the end, both strings are equal, and 115 + 116 = 231 is the minimum sum possible to achieve this.

Example 2:

Input: s1 = "delete", s2 = "leet"
Output: 403
Explanation: Deleting "dee" from "delete" to turn the string into "let" adds 100[d] + 101[e] + 101[e] to the sum. Deleting "e" from "leet" adds 101[e] to the sum.
At the end, both strings are equal to "let", and the answer is 100+101+101+101 = 403.
If instead we turned both strings into "lee" or "eet", we would get answers of 433 or 417, which are higher.

Note:
0 < s1.length, s2.length <= 1000.
All elements of each string will have an ASCII value in [97, 122].

def minimumDeleteSum(self, s1, s2):
    word1, word2 = s1, s2
    if not word1:
        return sum(ord(c) for c in word2) if word2 else 0
    if not word2:
        return sum(ord(c) for c in word1)

    rows, cols = len(word1), len(word2)

    dp = [[0 for col in range(cols + 1)] for row in range(rows + 1)]
    for i in range(1, rows + 1):
        dp[i][0] = dp[i - 1][0] + ord(word1[i - 1])  # delete in word1
    for j in range(1, cols + 1):
        dp[0][j] = dp[0][j - 1] + ord(word2[j - 1])  # delete in word2

    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                # delete in word2 vs delete in word1
                dp[i][j] = min(dp[i][j - 1] + ord(word2[j - 1]),
                               dp[i - 1][j] + ord(word1[i - 1]))
    return dp[rows][cols]
Part VIII

Appendix

30 Cool Python Guide

In this chapter, instead of installing Python 3 on your device, you can use
the IDE offered by the GeeksforGeeks website at https://ide.geeksforgeeks.org/index.php;
Google Colab is also a good place to write Python notebook-style documents.
Python is one of the easiest advanced scripting and object-oriented programming
languages to learn. Its intuitive structures and semantics make it
approachable even for people who are not computer scientists. Python is one of
the most widely used programming languages due to its:

• Easy syntax that hides concepts like pointers: both non-programmers
and programmers can learn to code easily and at a faster rate;

• Readability: Python is often referred to as "executable pseudo-code" because
its syntax mostly follows the conventions programmers use to outline their ideas;

• Cross-platform support: Python runs on all major operating systems such as
Microsoft Windows, Linux, and Mac OS X;

• Extensibility: in addition to the standard libraries there are extensive
collections of freely available add-on modules, libraries, frameworks,
and tool-kits.

Compared with other programming languages, Python code is typically 3-5
times shorter than equivalent Java code, and often 5-10 times shorter than
equivalent C++ code, according to www.python.org. All of this simplicity
and efficiency makes Python an ideal language to learn under the current trend,
and an ideal candidate language to use during time-limited coding interviews.


30.1 Python Overview


In this section, we provide a well-organized overview of how Python works as
an object-oriented programming language (Subsection 30.1.1), and of the
components of Python: built-in data types, built-in modules, third-party
packages/libraries, and frameworks (Subsection 30.1.2). In this book,
we selectively introduce the ones most useful for learning algorithms and
passing coding interviews.

30.1.1 Understanding Objects and Operations


Basics
Everything is an Object. Python built-in data types, modules, and classes
are all objects; each kind of object is an instance of a class. For example, we create an
instance of the int object with value 1:

>>> 1
1

We use the type() built-in function to see its underlying type — its class, for example:

>>> type([1, 2, 3, 4])
<class 'list'>
>>> type(1)
<class 'int'>
>>> type(range(10))
<class 'range'>
>>> type('abc')
<class 'str'>

Operators. Operators are used to perform operations on variables and instances
of objects. The example shows the operator + performed on two instances of objects:

>>> [1, 2, 3] + [4, 5, 6]
[1, 2, 3, 4, 5, 6]

Variables. When we create an instance of an object, common practice
is to have variables, which are essentially pointers to the instance
of the object's location in memory:

>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> c = a + b
>>> c
[1, 2, 3, 4, 5, 6]

Tools. To inspect what attributes and methods an object offers, we use
the built-in function dir(object).
To check an object's or a function's documentation, we use the built-in function help().
When we are done viewing, type q to exit.

Properties
In-place VS Standard Operations. An in-place operation changes
the content of a given object directly, without making a copy; an operator
that does this is called an in-place operator. E.g., a += b is equivalent to a =
operator.iadd(a, b). A standard operation, on the other hand, returns
a new instance of the object.
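A small sketch contrasting the two (the list example is ours):

import operator

a = [1, 2]
before = id(a)
a = operator.iadd(a, [3])  # in-place: mutates and returns the same list
print(id(a) == before)     # True
c = a + [4]                # standard: allocates a new list
print(id(c) == before)     # False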

Mutable VS Immutable Objects. All objects are either mutable or
immutable. Simply put, for immutable data types/objects we cannot add,
remove, or replace their content on the fly — attempting an update returns a
new object — whereas mutable objects can be changed in place. Custom classes
are generally mutable. The different behaviors of mutable and immutable
objects can be shown using operations: an in-place operation can only truly be
performed on mutable objects.

Object, Type, Identity, Value. Everything in Python is an object, including
the different data types, modules, classes, and functions. Each object
in Python has a type, a value, and an identity. When we create
an instance of an object, such as a string with value 'abc', it automatically
comes with an identity, which acts as a pointer to
the object's location in memory. The built-in function id() can be used to
return the identity of an object as an integer, which usually corresponds to
the object's location in memory. The is identity operator can be used directly
to compare the identity of two objects. The built-in function type() returns
the type of an object, and the operator == tells whether two objects
have the same value.

Examples

Behavior of Mutable Objects. Let us see an example: we create three
variables a, b, c; a and b are assigned objects of the same value, and c is
assigned variable a.

>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> c = a
>>> id(a), id(b), id(c)
(140222162413704, 140222017592328, 140222162413704)

We use the function and operators introduced above to demonstrate the behavior.
First, check a and b:

>>> a == b, a is b, type(a) is type(b)
(True, False, True)

We see that a and b have different identities, meaning each points
to a different location in memory; they are indeed two independent
objects. Now, let us compare a and c the same way:

>>> a == c, a is c, type(a) is type(c)
(True, True, True)

Ta-da! They have the same identity, meaning they point to the same piece
of memory, and c is more like an alias of a. Now, let's change a value in a
using an in-place operation and check the ids:

>>> a[2] = 4
>>> id(a), id(b), id(c)
(140222162413704, 140222017592328, 140222162413704)
>>> a += [5]
>>> id(a), id(b), id(c)
(140222162413704, 140222017592328, 140222162413704)

We do not see any change of identity, only changes of value. Now, let us
use a standard operation and observe the behavior:

>>> a = a + [5]
>>> a
[1, 2, 4, 5, 5]
>>> id(a), id(b), id(c)
(140222017592392, 140222017592328, 140222162413704)

Now, we see a has a different id compared with c, meaning they are no
longer the same instance of the same object.

Behavior of Immutable Objects. For mutable objects we saw that only
the assignment c = a produces a shared identity; with immutable objects,
even independently created equal values may share one. See an example:

>>> a = 'abc'
>>> b = 'abc'
>>> c = a
>>> id(a), id(b), id(c)
(140222162341424, 140222162341424, 140222162341424)

These three variables a, b, c all share the same identity, meaning they all
point to the same instance of the object in the same piece of memory, which
results in more efficient use of memory. Now, let's try to change the value of
variable a by calling the += operator, which is an in-place operator for mutable
objects:

>>> a += 'd'
>>> a
'abcd'
>>> id(a), id(b), id(c)
(140222017638952, 140222162341424, 140222162341424)

We see that a new instance of the string object is still created, with the new id
140222017638952.

30.1.2 Python Components


The plethora of built-in data types, built-in modules, third-party
packages/libraries, and frameworks contributes to the popularity and
efficiency of coding in Python.

Python Data Types. Python contains 12 built-in data types. These include
four scalar data types (int, float, complex, and bool), four sequence
types (string, list, tuple, and range), one mapping type (dict), and two set
types (set and frozenset). The four scalar data types together with
string, tuple, range, and frozenset are immutable, and the others are mutable.
Each of these can be manipulated using:

• Operators

• Functions

• Data-type methods

Module. A module is a file which contains Python functions, global variables, etc.
It is nothing but a .py file with executable Python code/statements.
With the built-in modules, we do not need to install external packages or
include their .py files explicitly in our Python project; all we need to do is
import them directly and use their objects and corresponding methods.
For example, we use the built-in module array (note the lowercase name):

import array
arr = array.array('i', [1, 2, 3])  # a typed array of ints

We can also write a .py file ourselves and import them. We provide reference
to some of the popular and useful built-in modules that is not covered in
Part ?? in Python in Section 30.9 of this chapter, they are:

• Re

Package/Library. A package or library is a namespace which contains multiple
packages/modules. It is a directory which contains a special file __init__.py.
Let's create a directory user. This package contains multiple packages/modules
to handle user-related requests:

user/                    # top level package
    __init__.py
    get/                 # first subpackage
        __init__.py
        info.py
        points.py
        transactions.py
    create/              # second subpackage
        __init__.py
        api.py
        platform.py

Now you can import it in the following way:

from user.get import info     # imports info module from get package
from user.create import api   # imports api module from create package

When we import any package, the Python interpreter searches its subdirectories/packages.
A library is a collection of various packages. There is no difference between a
package and a Python library conceptually; have a look at the requests
library — we use it as a package.

Framework. A framework is a collection of various libraries which architects the code
flow. Take Django as an example: it has various built-in libraries like
auth, user, database connectors, etc. Also, in the artificial intelligence field, we
have the TensorFlow, PyTorch, and scikit-learn frameworks.

When Mutability Matters


Mutability might seem like an innocuous topic, but when writing an efficient
program it is essential to understand. For instance, the following code is a
straightforward solution to concatenate a string together:
string_build = ""
for data in container:
    string_build += str(data)

In reality, this is very inefficient. Because strings are immutable, concate-


nating two strings together actually creates a third string which is the com-
bination of the previous two. If you are iterating a lot and building a large
string, you will waste a lot of memory creating and throwing away objects.
Also, at the end of the iteration you will be allocating and throwing away
very large string objects which is even more costly.
The following is a more efficient and pythonic way:
1 builder_list = [ ]
2 f o r data i n c o n t a i n e r :
30.2. DATA TYPES AND OPERATORS 719

3 b u i l d e r _ l i s t . append ( s t r ( data ) )
4 " " . join ( builder_list )
5
6 ### Another way i s t o u s e a l i s t comprehension
7 " " . j o i n ( [ s t r ( data ) f o r data i n c o n t a i n e r ] )
8
9 ### o r u s e t h e map f u n c t i o n
10 " " . j o i n (map( s t r , c o n t a i n e r ) )

This code takes advantage of the mutability of a single list object to gather
your data together and then allocate a single result string to put your data
in. That cuts down on the total number of objects allocated by almost half.
Another pitfall related to mutability is the following scenario:
def my_function(param=[]):
    param.append("thing")
    return param

my_function()  # returns ["thing"]
my_function()  # returns ["thing", "thing"]

What you might think would happen is that by giving an empty list as a
default value to param, a new empty list is allocated each time the function
is called and no list is passed in. But what actually happens is that every
call that uses the default list will be using the same list. This is because
Python (a) only evaluates function definitions once, (b) evaluates default
arguments as part of the function definition, and (c) allocates one mutable
list for every call of that function.
Do not put a mutable object as the default value of a function parameter.
Immutable types are perfectly safe. If you want to get the intended effect,
do this instead:
def my_function2(param=None):
    if param is None:
        param = []
    param.append("thing")
    return param

Conclusion. Mutability matters. Learn it. Primitive-like types are probably immutable; container-like types are probably mutable.

30.2 Data Types and Operators


Operators are special symbols in Python that carry out arithmetic or logical
computation. The value that the operator operates on is called the operand.
Python offers arithmetic operators, assignment operators, comparison operators,
logical operators, bitwise operators (shown in Chapter III), and
two special kinds: the identity operator and the membership operator.
720 30. COOL PYTHON GUIDE

30.2.1 Arithmetic Operators


Arithmetic operators are used to perform mathematical operations like ad-
dition, subtraction, multiplication etc.

Table 30.1: Arithmetic operators in Python

Operator  Description                                                          Example
+         Add two operands or unary plus                                       x + y + 2
-         Subtract right operand from the left or unary minus                  x - y - 2
*         Multiply two operands                                                x * y
/         Divide left operand by the right one (always results in a float)     x / y
//        Floor division - division that rounds down to a whole number         x // y
**        Exponent - left operand raised to the power of the right             x ** y (x to the power y)
%         Modulus - divides left operand by right operand, returns remainder   x % y
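A quick demonstration of the operators in Table 30.1 (a small sketch):

x, y = 7, 2
print(x + y, x - y, x * y)   # 9 5 14
print(x / y, x // y, x % y)  # 3.5 3 1
print(x ** y)                # 49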

30.2.2 Assignment Operators


Assignment operators are used in Python to assign values to variables.
a = 5 is a simple assignment operator that assigns the value 5 on the
right to the variable a on the left.
There are various compound operators that follows the order: vari-
able_name (arithemetic operator) = variable or data type. Such as a += 5
that adds to the variable and later assigns the same. It is equivalent to a =
a + 5.
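For example (a small sketch of several compound operators):

a = 5
a += 3   # a = a + 3  -> 8
a //= 2  # a = a // 2 -> 4
a **= 2  # a = a ** 2 -> 16
print(a)  # 16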

30.2.3 Comparison Operators

Comparison operators are used to compare values. They return either True or
False according to the condition.

Table 30.2: Comparison operators in Python

Operator  Description                                                              Example
>         Greater than - True if left operand is greater than the right            x > y
<         Less than - True if left operand is less than the right                  x < y
==        Equal to - True if both operands are equal                               x == y
!=        Not equal to - True if operands are not equal                            x != y
>=        Greater than or equal to - True if left operand is greater or equal      x >= y
<=        Less than or equal to - True if left operand is less or equal            x <= y

30.2.4 Logical Operators

Logical operators are the and, or, not operators. It is important for us to
understand which values Python considers False and True. The following values
are considered False, and all the other values are considered True:

• The None type

• Boolean False

• An integer, float, or complex zero

• An empty sequence or mapping data type

• An instance of a user-defined class that defines a __len__() or __bool__()
method that returns zero or False.
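A quick check of these falsy values with the built-in bool() (a small sketch of our own):

print(bool(None), bool(False), bool(0), bool(0.0), bool(0j))  # all False
print(bool(''), bool([]), bool(()), bool({}), bool(set()))    # all False
print(bool('a'), bool([0]), bool(-1))                         # True True True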

Table 30.3: Logical operators in Python

Operator  Description                                          Example
and       True if both operands are true                       x and y
or        True if either of the operands is true               x or y
not       True if operand is false (complements the operand)   not x

30.2.5 Special Operators


Python language offers some special type of operators like the identity op-
erator or the membership operator.

Identity operators Identity operators are used to check if two values (or
variables) are located on the same part of the memory. Two variables that
are equal does not imply that they are identical as we have shown in the
last section.

Table 30.4: Identity operators in Python

Operator  Description                                                              Example
is        True if the operands are identical (refer to the same object)            x is y
is not    True if the operands are not identical (do not refer to the same object) x is not y

Membership Operators. in and not in are the membership operators in
Python. They are used to test whether a value or variable is found in a
sequence (string, list, tuple, set, or dictionary).

Table 30.5: Membership operators in Python

Operator  Description                                            Example
in        True if value/variable is found in the sequence        5 in x
not in    True if value/variable is not found in the sequence    5 not in x

30.3 Function
30.3.1 Python Built-in Functions
Check out https://docs.python.org/3/library/functions.html.

Built-in Data Types. We have functions like int(), float(), str(), tuple(),
list(), set(), dict(), bool(), chr(), ord(). These functions can be used for
initialization, and also for type conversion between different data types.

30.3.2 Lambda Function


In Python, a lambda function is an anonymous function — a function that
is defined without a name. While normal functions are defined using the
def keyword, anonymous functions are defined using the lambda keyword,
which is where the name comes from.

Syntax. The syntax of a lambda function in Python is:

lambda arguments: expression

A lambda function can have zero to multiple arguments but only one expression,
which is evaluated and returned. For example, we define a lambda
function which takes one argument x and returns x^2:

square1 = lambda x: x**2

The above lambda function is equal to a normal function defined as:

def square(x):
    return x**2

Calling either gives the same output:

square1(5) == square(5)

Applications. The use of lambda creates an anonymous function object (which is callable).
In the case of sorted, the callable takes only one parameter. Python's
lambda is pretty simple: it can only do, and return, one thing.
The syntax of lambda is the word lambda followed by a list of parameter
names, then a single block of code. The parameter list and code block
are delineated by a colon. This is similar to other constructs in Python
such as while, for, and if: they are all statements that typically have
a code block. Lambda is just another instance of a statement with a code
block.
We can compare the use of lambda with that of def to create a function:

adder_lambda = lambda parameter1, parameter2: parameter1 + parameter2

The above code equals the following:

def adder_regular(parameter1, parameter2):
    return parameter1 + parameter2

30.3.3 Map, Filter and Reduce


These are three functions which facilitate a functional approach to program-
ming. We will discuss them one by one and understand their use cases.

Map
Map applies a function to all the items in an input list. Here is the
blueprint:

map(function_to_apply, list_of_inputs)

Most of the time we want to pass all the list elements to a function
one-by-one and then collect the output. For instance:

items = [1, 2, 3, 4, 5]
squared = []
for i in items:
    squared.append(i**2)

Map allows us to implement this in a much simpler and nicer way. Here you
go:

items = [1, 2, 3, 4, 5]
squared = list(map(lambda x: x**2, items))

Most of the time we use lambdas with map, so we did the same here. Instead of a
list of inputs we can even have a list of functions! Here we use x(i) to call
the function, where x is replaced with each function in funcs, and i is the
input to the function:

def multiply(x):
    return x * x

def add(x):
    return x + x

funcs = [multiply, add]
for i in range(5):
    value = list(map(lambda x: x(i), funcs))
    print(value)

# Output:
# [0, 0]
# [1, 2]
# [4, 4]
# [9, 6]
# [16, 8]

Filter
As the name suggests, filter creates a list of elements for which a function
returns true. Here is a short and concise example:

number_list = range(-5, 5)
less_than_zero = list(filter(lambda x: x < 0, number_list))
print(less_than_zero)

# Output: [-5, -4, -3, -2, -1]

The filter resembles a for loop, but it is a built-in function and faster.
Note: if map and filter do not appear beautiful to you, you can read
about list/dict/tuple comprehensions.

Reduce
Reduce is a really useful function for performing some computation on a list
and returning the result. It applies a rolling computation to sequential pairs
of values in a list — for example, computing the product of a
list of integers.
The normal way you might go about doing this task in Python is using
a basic for loop:

product = 1
nums = [1, 2, 3, 4]  # renamed from `list`, which would shadow the built-in
for num in nums:
    product = product * num

# product = 24

Now let's try it with reduce:

from functools import reduce
product = reduce((lambda x, y: x * y), [1, 2, 3, 4])

# Output: 24

30.4 Class
30.4.1 Special Methods
From [1]: http://www.informit.com/articles/article.aspx?p=453682&seqNum=6.
All the built-in data types implement a collection of special object
methods. The names of special methods are always preceded and followed
by double underscores (__). These methods are automatically triggered by
the interpreter as a program executes. For example, the operation x + y
is mapped to an internal method, x.__add__(y), and an indexing operation,
x[k], is mapped to x.__getitem__(k). The behavior of each data type
depends entirely on the set of special methods that it implements.
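As a quick demonstration of this mapping (a small sketch of our own):

x = [1, 2]
print(x + [3])                 # [1, 2, 3]
print(x.__add__([3]))          # same result through the special method
print(x[0], x.__getitem__(0))  # 1 1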
User-defined classes can define new objects that behave like the built-in
types simply by supplying an appropriate subset of the special methods
described in this section. In addition, built-in types such as lists and
dictionaries can be specialized (via inheritance) by redefining some of the special
methods. In this book, we only list the essential ones to speed up
our interview preparation.

Object Creation, Destruction, and Representation. We first list
these special methods in Table 30.6. A good and useful way to implement
a class is through the __repr__() method: represent the self-defined
class the same way as a built-in object by delegating to the built-in repr().
Doing so avoids implementing a lot of other special methods for
our class while keeping most of the behaviors needed. For example, we define a
Student class and represent it as a tuple:

class Student:
    def __init__(self, name, grade, age):
        self.name = name
        self.grade = grade
        self.age = age
    def __repr__(self):
        return repr((self.name, self.grade, self.age))

a = Student('John', 'A', 14)
print(hash(a))
print(a)

Table 30.6: Special methods for object creation, destruction, and representation

Method                                 Description
*__init__(self [,*args [,**kwargs]])   Called to initialize a new instance
__del__(self)                          Called to destroy an instance
*__repr__(self)                        Creates a full string representation of an object
__str__(self)                          Creates an informal string representation
__cmp__(self, other)                   Compares two objects and returns negative, zero, or positive
__hash__(self)                         Computes a 32-bit hash index
__nonzero__(self)                      Returns 0 or 1 for truth-value testing
__unicode__(self)                      Creates a Unicode string representation

Without __repr__(), the output of the two print calls above would be the default representation:

8766662474223
<__main__.Student object at 0x7f925cd79ef0>

With __repr__() defined, print(a) instead shows ('John', 'A', 14), while the instance remains hashable via the default __hash__().

Comparison Operations. Table 30.7 lists the comparison methods
that may need to be implemented in a class in order to support comparison
in applications such as sorting.

Table 30.7: Special methods for comparison operations

Method                Description
__lt__(self, other)   self < other
__le__(self, other)   self <= other
__gt__(self, other)   self > other
__ge__(self, other)   self >= other
__eq__(self, other)   self == other
__ne__(self, other)   self != other
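As a minimal sketch (the class name and fields are ours), implementing just __lt__ is enough for sorted(), min(), and max() to order instances:

class Ranked:
    def __init__(self, name, score):
        self.name = name
        self.score = score
    def __lt__(self, other):
        return self.score < other.score  # called by sorted()/min()/max()

ranked = sorted([Ranked('b', 2), Ranked('a', 1)])
print([r.name for r in ranked])  # ['a', 'b']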

30.4.2 Class Syntax


30.4.3 Nested Class
When we solve problems on LeetCode, sometimes we need to wrap another
class object inside the solution class. We can do this with a nested class.
When we create an instance, we use MainClassName.NestedClassName().
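A minimal sketch (the Solution and Node names are ours, mirroring a typical LeetCode setup):

class Solution:
    class Node:  # nested helper class
        def __init__(self, val):
            self.val = val
            self.next = None

    def build(self, vals):
        dummy = Solution.Node(0)  # created via MainClassName.NestedClassName()
        cur = dummy
        for v in vals:
            cur.next = Solution.Node(v)
            cur = cur.next
        return dummy.next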

30.5 Shallow Copy and Deep Copy

For list and string data structures, we constantly meet cases where we need
to copy. In programming languages like C++ and Python, we need
to know the difference between a shallow copy and a deep copy. Here we only
introduce the Python version.
Given the following two snippets of Python code:
Given the following two snippets of Python code:
colours1 = ["red", "green"]
colours2 = colours1
colours2 = ["rouge", "vert"]
print(colours1)
>>> ['red', 'green']

colours1 = ["red", "green"]
colours2 = colours1
colours2[1] = "blue"
print(colours1)
['red', 'blue']

From the above outputs, we can see that colours1 stays the same in the
first case but changes in the second, even though in both cases we only
ever assigned to colours2. The result may or may not be what we want.
In Python, assigning one list to another name works like a pointer in
C++: both names refer to the same object at the same physical address.
In the first case, colours2 is rebound to a brand-new list at a new
address, so colours1 is left completely untouched; in the second case,
colours2[1] = "blue" mutates the shared list, so the change is visible
through colours1 as well. We can visualize this process as follows:

(a) The copy process for code 1 (b) The copy process for code 2

Figure 30.1: Copy process

Often, however, we need to make a copy and leave the original list or
string unchanged. Lists also come in many shapes, from one-dimensional
and two-dimensional to multi-dimensional, which affects how we must copy
them.
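
A quick way to confirm the aliasing described above is to compare object identities with the is operator; a minimal sketch:

colours1 = ["red", "green"]
colours2 = colours1
print(colours2 is colours1)  # True: both names refer to the same object
colours2 = ["rouge", "vert"]
print(colours2 is colours1)  # False: colours2 is now bound to a new list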

30.5.1 Shallow Copy using Slice Operator


It is possible to completely copy flat (one-dimensional) list structures
with the slice operator, without any of the side effects described above:
list1 = ['a', 'b', 'c', 'd']
list2 = list1[:]
list2[1] = 'x'
print(list2)
# ['a', 'x', 'c', 'd']
print(list1)
# ['a', 'b', 'c', 'd']

Also, in Python 3 we can use the list.copy() method:


list2 = list1.copy()

But as soon as a list contains sublists, we face the same difficulty: the
slice copies only the outer list, whose entries are still just pointers to
the same sublists.
lst1 = ['a', 'b', ['ab', 'ba']]
lst2 = lst1[:]

This behaviour is depicted in the following diagram:

Figure 30.2: After a slice copy of a nested list, both lists still share the same sublist

If you assign a new value to the 0th element of one of the two lists, there
is no side effect. Problems arise if you change one of the elements of
the shared sublist:
>>> lst1 = ['a', 'b', ['ab', 'ba']]
>>> lst2 = lst1[:]
>>> lst2[0] = 'c'
>>> lst2[2][1] = 'd'
>>> print(lst1)
['a', 'b', ['ab', 'd']]

The diagram in Figure 30.3 depicts what happens when one of the elements


of the shared sublist is changed: the contents of both lst1 and lst2 change.

30.5.2 Iterables, Generators, and Yield


A detailed explanation of the yield keyword can be found at
https://pythontips.com/2013/09/29/the-python-yield-keyword-explained/.
The key point is that a generator does not build and return a whole list
at once; it yields its values one at a time, on demand.
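
A minimal generator sketch; the Fibonacci example is our own choice for illustration:

def fib(n):
    # Yield the first n Fibonacci numbers one at a time.
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

print(list(fib(7)))  # [0, 1, 1, 2, 3, 5, 8]
for x in fib(3):     # values are produced lazily, one per iteration
    print(x)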

30.5.3 Deep Copy using copy Module


A solution to the described problems is to use the copy module. Besides
copy() for shallow copies, it provides deepcopy(), which makes a complete
copy of an arbitrary list, recursively copying all nested sublists as well.

Figure 30.3: Changing an element of the shared sublist changes the contents seen through both lst1 and lst2

The following script uses our example above with deepcopy():

from copy import deepcopy

lst1 = ['a', 'b', ['ab', 'ba']]

lst2 = deepcopy(lst1)

lst2[2][1] = "d"
lst2[0] = "c"

print(lst2)
print(lst1)

If we save this script under the name deep_copy.py and call it with
"python deep_copy.py", we receive the following output:

$ python deep_copy.py
['c', 'b', ['ab', 'd']]
['a', 'b', ['ab', 'ba']]

Figure 30.4: After deepcopy, lst2 owns an independent copy of the sublist, so changing it does not affect lst1




30.6 Global vs Nonlocal
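
As a minimal sketch of the difference: global rebinds a module-level name from inside a function, while nonlocal rebinds a name in the nearest enclosing function's scope.

counter = 0

def bump_global():
    global counter   # rebind the module-level name
    counter += 1

def make_counter():
    count = 0
    def bump():
        nonlocal count   # rebind the enclosing function's name
        count += 1
        return count
    return bump

bump_global()
print(counter)        # 1
c = make_counter()
print(c(), c(), c())  # 1 2 3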

30.7 Loops
Loops are needed in almost every algorithm, and we have two choices: for
and while. Learning the basic grammar of the for loop helps us program
more efficiently. Usually a for loop is used to iterate over a sequence or
matrix-style data. For example, the following patterns work for either a
string or a list:
# for loop over a list to get each value directly
a = [5, 4, 3, 2, 1]
for num in a:
    print(num)
# for loop over a list using the index
for idx in range(len(a)):
    print(a[idx])
# for loop over a list to get both index and value directly
for idx, num in enumerate(a):
    print(idx, num)

Sometimes we want to iterate over two lists jointly at the same time. We
can use zip to join them together (note that zip stops at the end of the
shorter list), and all of the for loop patterns above still work. For example:
a, b = [1, 2, 3, 4, 5], [5, 4, 3, 2, 1]
for idx, (num_a, num_b) in enumerate(zip(a, b)):
    print(idx, num_a, num_b)
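
For comparison, here is a minimal sketch of the same index-based iteration written with while, the other loop choice mentioned above:

# while loop equivalent of iterating a list by index
a = [5, 4, 3, 2, 1]
i = 0
while i < len(a):
    print(a[i])
    i += 1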

30.8 Special Skills


1. Swap the values of two variables:

a, b = 7, 10
print(a, b)
a, b = b, a
print(a, b)
2. Join all the string elements in a list into one string:


a = ["Cracking", "LeetCode", "Problems"]
print(" ".join(a))

3. Find the most frequent element in a list


a = [1, 3, 5, 6, 9, 9, 4, 10, 9]
print(max(set(a), key=a.count))
# or use Counter from collections
from collections import Counter
cnt = Counter(a)
print(cnt.most_common(3))

4. Check if two strings are comprised of the same letters.


from collections import Counter
Counter(str1) == Counter(str2)

5. Reversing:

# 1. reversing a string or a list
a = 'crackingleetcode'
b = [1, 2, 3, 4, 5]
print(a[::-1], b[::-1])
# 2. iterate over each char of the string or list contents in reverse
# order efficiently; here we use zip too
for char, num in zip(reversed(a), reversed(b)):
    print(char, num)
# 3. reverse each digit in an integer
num = 123456789
print(int(str(num)[::-1]))

6. Remove the duplicates from a list or string. We can convert it to a set
first, but this won't keep the original order of the elements. If we want
to keep the order, we can use the OrderedDict.fromkeys() method from
collections:

a = [5, 4, 4, 3, 3, 2, 1]
no_duplicate = list(set(a))

from collections import OrderedDict
print(list(OrderedDict.fromkeys(a).keys()))

7. Find the min or max element or the index.
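
A minimal sketch of the common patterns for this item:

a = [3, 1, 4, 1, 5, 9, 2, 6]
print(min(a), max(a))                         # 1 9
print(a.index(min(a)))                        # 1: index of the first minimum
print(max(range(len(a)), key=a.__getitem__))  # 5: index of the maximum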

30.9 Supplemental Python Tools


30.9.1 Re
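A minimal sketch of the standard re module in use; the pattern and test string are our own illustration:

import re

s = "leetcode 42: two-sum, difficulty 3"
print(re.findall(r"\d+", s))       # ['42', '3']: every run of digits
m = re.search(r"(\w+)-(\w+)", s)   # first hyphenated word pair
if m:
    print(m.group(1), m.group(2))  # two sum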
30.9.2 Bisect

The following recipes, taken from the Python documentation for the bisect
module, perform exact and boundary lookups in a sorted list a:

from bisect import bisect_left, bisect_right

def index(a, x):
    'Locate the leftmost value exactly equal to x'
    i = bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    raise ValueError

def find_lt(a, x):
    'Find rightmost value less than x'
    i = bisect_left(a, x)
    if i:
        return a[i-1]
    raise ValueError

def find_le(a, x):
    'Find rightmost value less than or equal to x'
    i = bisect_right(a, x)
    if i:
        return a[i-1]
    raise ValueError

def find_gt(a, x):
    'Find leftmost value greater than x'
    i = bisect_right(a, x)
    if i != len(a):
        return a[i]
    raise ValueError

def find_ge(a, x):
    'Find leftmost item greater than or equal to x'
    i = bisect_left(a, x)
    if i != len(a):
        return a[i]
    raise ValueError
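
A quick usage sketch of these recipes on a sorted list:

a = [1, 2, 4, 4, 7]
print(index(a, 4))    # 2: leftmost position where a[i] == 4
print(find_lt(a, 4))  # 2: rightmost value strictly less than 4
print(find_ge(a, 5))  # 7: leftmost value greater than or equal to 5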

30.9.3 collections
collections is a module in Python that implements specialized container
data types as alternatives to Python's general-purpose built-in containers:
dict, list, set, and tuple. The included container types are summarized in
Table 30.8; most of them were covered earlier in the book. Before using
them, we import each data type as:

from collections import deque, Counter, OrderedDict, defaultdict, namedtuple

Table 30.8: Container data types in the collections module.


Container    Description
namedtuple   factory function for creating tuple subclasses with named fields
deque        list-like container with fast appends and pops on either end
Counter      dict subclass for counting hashable objects
defaultdict  dict subclass that calls a factory function to supply missing values
OrderedDict  dict subclass that remembers the order entries were added
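
A minimal sketch exercising each container; the values are our own illustration:

from collections import deque, Counter, OrderedDict, defaultdict, namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
print(p.x, p.y)            # 1 2

dq = deque([1, 2, 3])
dq.appendleft(0)           # O(1) append/pop at both ends
dq.pop()
print(dq)                  # deque([0, 1, 2])

print(Counter("abracadabra").most_common(2))  # [('a', 5), ('b', 2)]

groups = defaultdict(list)
groups['evens'].append(2)  # a missing key gets list() automatically
print(groups)

od = OrderedDict.fromkeys([3, 1, 2])
print(list(od))            # [3, 1, 2]: insertion order preserved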
