Top K problems
TopK Problem
Find the Top K elements in a given list of size \(n\).
Solutions
- sort the list and extract the top k elements -- inefficient; Time Complexity: \(O(nlogn)\)
- only sort the top k elements: bubble; Time Complexity: \(O(kn)\)
- min heap: maintain a min-heap of size k; Time Complexity: \(O(nlogk)\)
- Quick Selection; Average Time Complexity: \(O(n)\)
Quick Selection
1 | def partition(nums, l, r): |
The key idea of Quick Sort is Divide and Conquer (solving problem by solving multiple subproblems). Quick Selection can be considered as a special case of Quick Sort. Its basic idea is Reduce and Conquer (solving problem by solving one certain subproblem).
The average time complexity of Quick Selection is \(O(n)\) (sounds good right?) but the worst case could be \(O(n^2)\). Thus we introduce Median of Medians (also know as BFPRT, last names of the authors) to supply a better pivot.
Median of Medians
Median of medians finds an approximate median in linear time and can be used to find an improved pivot in Quick Sort and Quick Selection which reduces the worst-case complexity.
Steps:
- Divide input into groups of at most five elements.
- Find the medians of each group.
- Find the median of medians.
The median of medians algorithm computes an approximate median which is guaranteed to be between 30% and 70%.
Suppose there are \(n\) elements and divided to \(\lceil n/5 \rceil\) groups. Thus we can derive \(\lceil n/5 \rceil\) medians.
The median of medians is greater than at least \(\lfloor\lceil n/5 \rceil / 2\rfloor\) medians which means it is greater than \(\lfloor\lceil n/5 \rceil / 2\rfloor *3 + 2 \geq \lfloor 3n/10 \rfloor\) elements, thus 30% elements.