Top K problems

TopK Problem

Find the Top K elements in a given list of size \(n\).

Solutions

  1. sort the list and extract the top k elements -- inefficient; Time Complexity: \(O(nlogn)\)
  2. only sort the top k elements: bubble; Time Complexity: \(O(kn)\)
  3. min heap: maintain a min-heap of size k; Time Complexity: \(O(nlogk)\)
  4. Quick Selection; Average Time Complexity: \(O(n)\)

Quick Selection

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
def partition(nums, l, r):
# generally pick the first/last element as pivot
pivot = nums[l]
while r > l:
# choose the first element < pivot from right side
while nums[r] >= pivot and r > l: r -= 1
nums[l] = nums[r] # move this to the left side
# choose the first element > pivot from left side
while nums[l] <= pivot and r > l: l += 1
nums[r] = nums[l] # move this to the right side
nums[l] = pivot # do not forget pivot
return l # return pivot index

# Time Complexity: O(nlogn), Space Complexity: O(logn)
def quick_sort(nums, l, r):
if l >= r: return
# choose pivot and sort the array based on this pivot
pivot = partition(nums, l, r)
quick_sort(nums, l, pivot - 1) # sort left part
quick_sort(nums, pivot + 1, r) # sort right part

# Time Complexity: O(n), Space Complexity: O(logn)
def quick_selection(nums, k, l, r):
if l >= r: return
pivot = partition(nums, l, r)
if k == pivot: return # hit top k
elif k < pivot: quick_selection(nums, k, l, pivot - 1) # hit left part
elif k > pivot: quick_selection(nums, k, pivot + 1, r) # hit right part

quick_selection(nums, k, 0, len(nums) - 1) # select min top k: nums[:k]
quick_selection(nums, len(nums) - k, 0, len(nums) - 1) # select max top k: nums[-k:]

The key idea of Quick Sort is Divide and Conquer (solving problem by solving multiple subproblems). Quick Selection can be considered as a special case of Quick Sort. Its basic idea is Reduce and Conquer (solving problem by solving one certain subproblem).

The average time complexity of Quick Selection is \(O(n)\) (sounds good right?) but the worst case could be \(O(n^2)\). Thus we introduce Median of Medians (also know as BFPRT, last names of the authors) to supply a better pivot.

Median of Medians

Median of medians finds an approximate median in linear time and can be used to find an improved pivot in Quick Sort and Quick Selection which reduces the worst-case complexity.

Steps:

  1. Divide input into groups of at most five elements.
  2. Find the medians of each group.
  3. Find the median of medians.

The median of medians algorithm computes an approximate median which is guaranteed to be between 30% and 70%.

Suppose there are \(n\) elements and divided to \(\lceil n/5 \rceil\) groups. Thus we can derive \(\lceil n/5 \rceil\) medians.

The median of medians is greater than at least \(\lfloor\lceil n/5 \rceil / 2\rfloor\) medians which means it is greater than \(\lfloor\lceil n/5 \rceil / 2\rfloor *3 + 2 \geq \lfloor 3n/10 \rfloor\) elements, thus 30% elements.