We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

Design an Array Statistics Tracker

Number: 3710

Difficulty: Hard

Paid? Yes

Companies: N/A


Problem Description

Design a data structure that supports dynamically adding numbers, removing the earliest added number, and querying statistical properties – mean (floored average), median (middle element in sorted order with the larger one chosen if there are two), and mode (most frequent element; if a tie exists, return the smallest one). All operations must be efficient as the total number of operations can be up to 10^5.


Key Insights

  • A FIFO (queue) is used to record the order of insertion so that the removal operation always deletes the first added element.
  • Maintaining the running sum and count allows computation of the mean in O(1) time.
  • For median, a dual heap (max-heap for lower half and min-heap for upper half) strategy with lazy deletion (similar to the “Sliding Window Median” technique) can efficiently track the median even when removals occur in arbitrary order.
  • For mode, a frequency dictionary coupled with a max-heap (storing tuples of (-frequency, number)) enables quick retrieval of the most frequent element, while lazy updates help maintain correctness after removals.

Space and Time Complexity

Time Complexity:

  • addNumber: O(log n) due to heap insertions and balancing (median update) and O(1) for updating the sum and frequency dictionary (plus O(log n) for mode heap update), overall O(log n).
  • removeFirstAddedNumber: O(log n) amortized, mainly due to lazy removal adjustments in heaps.
  • getMean, getMedian, getMode: O(1) to O(log n) in worst-case (when cleaning up lazy deletions for the heaps in median and mode queries).

Space Complexity: O(n) for storing the queue, heaps, frequency dictionary, and lazy removal counters.


Solution

We design the StatisticsTracker class by combining several data structures:

  1. A queue (e.g., a deque) for keeping the insertion order so that removeFirstAddedNumber always deletes the oldest element.
  2. A running sum and count variables to compute the mean in constant time.
  3. A frequency dictionary to track counts of each number.
  4. A max-heap for mode queries where each entry is (-frequency, number). When frequencies change (by addition or removal), new entries are pushed into the heap. In getMode(), we pop from the heap until the top entry reflects the current frequency; ties are automatically resolved by the natural order because if frequencies are equal the heap (min on the second element) gives the smaller number.
  5. A DualHeap structure (using two heaps – a max-heap for the lower half and a min-heap for the upper half) to support median queries. Lazy deletion is used in the dual heap to handle removals that may not be at the root. Balancing between the heaps is maintained so that the median is always available at the top of the max-heap.

Each API call (addNumber, removeFirstAddedNumber) will update these data structures accordingly.


Code Solutions

import heapq
from collections import deque, defaultdict

class DualHeap:
    def __init__(self):
        # max heap for lower half, we store negatives
        self.lower = []
        # min heap for higher half
        self.higher = []
        # lazy removal dictionaries for heaps
        self.delayedLower = defaultdict(int)
        self.delayedHigher = defaultdict(int)
        # effective sizes
        self.lowerSize = 0
        self.higherSize = 0
    
    def pruneHeap(self, heap, delayed):
        # Remove the element from heap while it is marked for deletion.
        while heap and delayed.get(heap[0] if heap is self.higher else -heap[0], 0):
            if heap is self.higher:
                num = heap[0]
            else:
                num = -heap[0]
            heapq.heappop(heap)
            delayed[num] -= 1
            if delayed[num] == 0:
                del delayed[num]
    
    def rebalance(self):
        # Make sure lower has either equal size or one more element than higher.
        if self.lowerSize > self.higherSize + 1:
            # Move top of lower to higher
            self.pruneHeap(self.lower, self.delayedLower)
            moved = -heapq.heappop(self.lower)
            self.lowerSize -= 1
            heapq.heappush(self.higher, moved)
            self.higherSize += 1
        elif self.lowerSize < self.higherSize:
            self.pruneHeap(self.higher, self.delayedHigher)
            moved = heapq.heappop(self.higher)
            self.higherSize -= 1
            heapq.heappush(self.lower, -moved)
            self.lowerSize += 1

    def add(self, num):
        # Add number into one of the heaps.
        if not self.lower or num <= -self.lower[0]:
            heapq.heappush(self.lower, -num)
            self.lowerSize += 1
        else:
            heapq.heappush(self.higher, num)
            self.higherSize += 1
        self.rebalance()
    
    def remove(self, num):
        # Lazy deletion: decide which heap the number likely belongs to.
        if self.lower and num <= -self.lower[0]:
            self.delayedLower[num] += 1
            self.lowerSize -= 1
            if self.lower and num == -self.lower[0]:
                self.pruneHeap(self.lower, self.delayedLower)
        else:
            self.delayedHigher[num] += 1
            self.higherSize -= 1
            if self.higher and num == self.higher[0]:
                self.pruneHeap(self.higher, self.delayedHigher)
        self.rebalance()
    
    def getMedian(self):
        # Make sure top of lower is valid.
        self.pruneHeap(self.lower, self.delayedLower)
        return -self.lower[0]
        

class StatisticsTracker:
    def __init__(self):
        # Queue to store insertion order.
        self.queue = deque()
        # Running sum and count for mean.
        self.total = 0
        self.count = 0
        # Frequency dictionary for mode.
        self.freq = defaultdict(int)
        # Max heap for mode: ( -frequency, number )
        self.modeHeap = []
        # DualHeap for median queries.
        self.dualHeap = DualHeap()
    
    def addNumber(self, number: int) -> None:
        # Add to queue.
        self.queue.append(number)
        self.total += number
        self.count += 1
        
        # Update frequency and push new value into modeHeap.
        self.freq[number] += 1
        freqVal = self.freq[number]
        heapq.heappush(self.modeHeap, (-freqVal, number))
        
        # Add to dual heap.
        self.dualHeap.add(number)
    
    def removeFirstAddedNumber(self) -> None:
        # Remove from queue FIFO.
        number = self.queue.popleft()
        self.total -= number
        self.count -= 1
        
        # Update frequency and push new frequency value.
        self.freq[number] -= 1
        newFreq = self.freq[number]
        heapq.heappush(self.modeHeap, (-newFreq, number))
        if self.freq[number] == 0:
            del self.freq[number]
        
        # Remove from dual heap using lazy deletion.
        self.dualHeap.remove(number)
    
    def getMean(self) -> int:
        # Return floored mean.
        return self.total // self.count
    
    def getMedian(self) -> int:
        # Median is the top of the lower half.
        return self.dualHeap.getMedian()
    
    def getMode(self) -> int:
        # Lazy removal from mode heap until top entry is correct.
        while self.modeHeap:
            freqNeg, num = self.modeHeap[0]
            # If the current frequency matches the stored frequency.
            if self.freq.get(num, 0) == -freqNeg:
                return num
            heapq.heappop(self.modeHeap)
        return -1  # Should not reach here if there is at least one element.
        
# Example usage:
# tracker = StatisticsTracker()
# tracker.addNumber(4)
# tracker.addNumber(4)
# tracker.addNumber(2)
# tracker.addNumber(3)
# print(tracker.getMean())   # Output: 3
# print(tracker.getMedian()) # Output: 4
# print(tracker.getMode())   # Output: 4
# tracker.removeFirstAddedNumber()
# print(tracker.getMode())   # Output: 2
← Back to All Questions