We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

Find Median from Data Stream

Number: 295

Difficulty: Hard

Paid? No

Companies: Amazon, Google, Meta, Apple, Anduril, Microsoft, Uber, Splunk, IXL, TikTok, PayPal, Tinder, Okta, KLA, Coupang, Pinterest, Oracle, Nvidia, Docusign, Walmart Labs, Citadel, Bloomberg, Yahoo, Goldman Sachs, Flipkart, Adobe, Cohesity


Problem Description

Design a data structure that supports adding numbers from a data stream and retrieving the median of all added numbers efficiently. When the total count is even, the median is defined as the average of the two middle values.


Key Insights

  • Use two heaps: a max-heap for the lower half and a min-heap for the upper half of numbers.
  • Balancing the heaps ensures the median can be computed in O(1) time once numbers are added.
  • Rebalance the heaps after each insertion to ensure size properties hold (difference in sizes is at most 1).

Space and Time Complexity

Time Complexity:

  • addNum: O(log n) per insertion (heap operations).
  • findMedian: O(1) (accessing the top elements). Space Complexity:
  • O(n), where n is the number of elements added in the data stream, as they are stored in the two heaps.

Solution

We maintain two heaps:

  • A max-heap (lowerHalf) for the smaller half of the numbers.
  • A min-heap (upperHalf) for the larger half of the numbers. On inserting a new number, decide the heap membership based on the current median. After each insertion, ensure that the heaps have balanced sizes (the size difference is at most one). When computing the median:
  • If both heaps have the same number of elements, the median is the average of the max of lowerHalf and the min of upperHalf.
  • If one heap has more elements, the median is the top element of that heap.

This method makes it efficient to dynamically update and retrieve the median as new numbers are inserted.


Code Solutions

import heapq

class MedianFinder:
    def __init__(self):
        # max-heap for the lower half (store negatives to simulate max-heap)
        self.lowerHalf = []
        # min-heap for the upper half
        self.upperHalf = []
    
    def addNum(self, num: int) -> None:
        # Push to max-heap (simulate by pushing negative)
        heapq.heappush(self.lowerHalf, -num)
        # Ensure every element in lowerHalf is <= every element in upperHalf
        # Move maximum of lowerHalf to upperHalf
        heapq.heappush(self.upperHalf, -heapq.heappop(self.lowerHalf))
        
        # Balance the heaps: if upperHalf exceeds lowerHalf in size
        if len(self.upperHalf) > len(self.lowerHalf):
            heapq.heappush(self.lowerHalf, -heapq.heappop(self.upperHalf))
    
    def findMedian(self) -> float:
        # If both heaps are balanced, median is average of both heaps' top elements
        if len(self.lowerHalf) == len(self.upperHalf):
            return (-self.lowerHalf[0] + self.upperHalf[0]) / 2.0
        # Otherwise, median is the top element of the larger heap
        return -self.lowerHalf[0]

# Example usage:
# medianFinder = MedianFinder()
# medianFinder.addNum(1)
# medianFinder.addNum(2)
# print(medianFinder.findMedian())  # Outputs 1.5
# medianFinder.addNum(3)
# print(medianFinder.findMedian())  # Outputs 2.0
← Back to All Questions