Design a data structure that supports adding numbers from a data stream and retrieving the median of all added numbers efficiently. When the total count is even, the median is defined as the average of the two middle values.
Key Insights
Use two heaps: a max-heap for the lower half and a min-heap for the upper half of numbers.
Balancing the heaps ensures the median can be computed in O(1) time once numbers are added.
Rebalance the heaps after each insertion to ensure size properties hold (difference in sizes is at most 1).
Space and Time Complexity
Time Complexity:
addNum: O(log n) per insertion (heap operations).
findMedian: O(1) (accessing the top elements).
Space Complexity:
O(n), where n is the number of elements added in the data stream, as they are stored in the two heaps.
Solution
We maintain two heaps:
A max-heap (lowerHalf) for the smaller half of the numbers.
A min-heap (upperHalf) for the larger half of the numbers.
On inserting a new number, decide the heap membership based on the current median. After each insertion, ensure that the heaps have balanced sizes (the size difference is at most one).
When computing the median:
If both heaps have the same number of elements, the median is the average of the max of lowerHalf and the min of upperHalf.
If one heap has more elements, the median is the top element of that heap.
This method makes it efficient to dynamically update and retrieve the median as new numbers are inserted.
Code Solutions
import heapq
classMedianFinder:def__init__(self):# max-heap for the lower half (store negatives to simulate max-heap) self.lowerHalf =[]# min-heap for the upper half self.upperHalf =[]defaddNum(self, num:int)->None:# Push to max-heap (simulate by pushing negative) heapq.heappush(self.lowerHalf,-num)# Ensure every element in lowerHalf is <= every element in upperHalf# Move maximum of lowerHalf to upperHalf heapq.heappush(self.upperHalf,-heapq.heappop(self.lowerHalf))# Balance the heaps: if upperHalf exceeds lowerHalf in sizeiflen(self.upperHalf)>len(self.lowerHalf): heapq.heappush(self.lowerHalf,-heapq.heappop(self.upperHalf))deffindMedian(self)->float:# If both heaps are balanced, median is average of both heaps' top elementsiflen(self.lowerHalf)==len(self.upperHalf):return(-self.lowerHalf[0]+ self.upperHalf[0])/2.0# Otherwise, median is the top element of the larger heapreturn-self.lowerHalf[0]# Example usage:# medianFinder = MedianFinder()# medianFinder.addNum(1)# medianFinder.addNum(2)# print(medianFinder.findMedian()) # Outputs 1.5# medianFinder.addNum(3)# print(medianFinder.findMedian()) # Outputs 2.0