Problem Description
Given an integer array nums, the uniqueness array is defined as the sorted array of the number of distinct elements for every contiguous subarray of nums. For each subarray nums[i..j], let distinct(nums[i..j]) be the number of unique elements in that subarray. The task is to compute the median of this uniqueness array. When the uniqueness array has an even number of elements, the median is defined as the smaller of the two middle values.
Key Insights
- Directly generating all subarrays and calculating their distinct count leads to a time complexity that is infeasible for n up to 10^5.
- Instead of building the uniqueness array explicitly, we can count how many subarrays have exactly k distinct elements.
- Use the sliding window technique to efficiently compute count_at_most(k): the number of subarrays with at most k distinct elements.
- The number of subarrays with exactly k distinct elements equals count_at_most(k) minus count_at_most(k-1) (inclusion-exclusion).
- With these counts, and knowing that the total number of subarrays is n*(n+1)/2, we can iterate over possible k values (from 1 to n) and accumulate frequencies until reaching the median target, which is (total_subarrays + 1) // 2 when considering 1-based indexing.
- An alternative approach is to use binary search over candidate answers based on the monotonicity properties of count_at_most; however, the cumulative frequency iteration is straightforward given the range of k.
Space and Time Complexity
Time Complexity: O(n log n) overall, where each count_at_most(k) is O(n) but careful cumulative counting (or binary search optimization) helps control the overall complexity. Space Complexity: O(n) due to frequency maps and additional variables.
Solution
The solution uses a helper function count_at_most(nums, k) which implements a sliding window approach using two pointers and a frequency map to count the subarrays having at most k distinct numbers. Then, the difference count_at_most(k) - count_at_most(k-1) gives the number of subarrays with exactly k distinct elements. By iterating over all possible k values and maintaining a cumulative sum of these exact counts, we determine the smallest k for which the cumulative frequency reaches or exceeds the median index in the sorted uniqueness array.