Problem Description
Design a data structure that supports dynamically adding numbers, removing the earliest added number, and querying statistical properties – mean (floored average), median (middle element in sorted order with the larger one chosen if there are two), and mode (most frequent element; if a tie exists, return the smallest one). All operations must be efficient as the total number of operations can be up to 10^5.
Key Insights
- A FIFO (queue) is used to record the order of insertion so that the removal operation always deletes the first added element.
- Maintaining the running sum and count allows computation of the mean in O(1) time.
- For median, a dual heap (max-heap for lower half and min-heap for upper half) strategy with lazy deletion (similar to the “Sliding Window Median” technique) can efficiently track the median even when removals occur in arbitrary order.
- For mode, a frequency dictionary coupled with a max-heap (storing tuples of (-frequency, number)) enables quick retrieval of the most frequent element, while lazy updates help maintain correctness after removals.
Space and Time Complexity
Time Complexity:
- addNumber: O(log n) due to heap insertions and balancing (median update) and O(1) for updating the sum and frequency dictionary (plus O(log n) for mode heap update), overall O(log n).
- removeFirstAddedNumber: O(log n) amortized, mainly due to lazy removal adjustments in heaps.
- getMean, getMedian, getMode: O(1) to O(log n) in worst-case (when cleaning up lazy deletions for the heaps in median and mode queries).
Space Complexity: O(n) for storing the queue, heaps, frequency dictionary, and lazy removal counters.
Solution
We design the StatisticsTracker class by combining several data structures:
- A queue (e.g., a deque) for keeping the insertion order so that removeFirstAddedNumber always deletes the oldest element.
- A running sum and count variables to compute the mean in constant time.
- A frequency dictionary to track counts of each number.
- A max-heap for mode queries where each entry is (-frequency, number). When frequencies change (by addition or removal), new entries are pushed into the heap. In getMode(), we pop from the heap until the top entry reflects the current frequency; ties are automatically resolved by the natural order because if frequencies are equal the heap (min on the second element) gives the smaller number.
- A DualHeap structure (using two heaps – a max-heap for the lower half and a min-heap for the upper half) to support median queries. Lazy deletion is used in the dual heap to handle removals that may not be at the root. Balancing between the heaps is maintained so that the median is always available at the top of the max-heap.
Each API call (addNumber, removeFirstAddedNumber) will update these data structures accordingly.