Problem Description
Given an array of integers where each element represents the number of citations a paper received, determine the researcher's h-index. The h-index is defined as the maximum h such that there are at least h papers with at least h citations each.
Key Insights
- The problem requires determining a threshold h such that h papers have at least h citations.
- Sorting the array or using a bucket/counting sort method can facilitate finding this threshold.
- A bucket sort approach can achieve a linear time solution by counting citations up to the number of papers.
- Traversing the bucket array in reverse (from high citation counts to low) helps in summing the cumulative counts to pinpoint the h-index.
Space and Time Complexity
Time Complexity: O(n), where n is the number of papers (using the counting/bucket sort approach).
Space Complexity: O(n), for storing the bucket counts.
Solution
We use a bucket sort method since the maximum citation count we care about is bounded by n (the total number of papers). First, create a bucket array of size n+1. For each citation, if it is greater than n, increment the count in bucket[n] (to cap the count); otherwise, increment bucket[citation]. Then iterate the bucket array in reverse order accumulating the number of papers with citations greater than or equal to the current index. The first index where the accumulated count becomes equal to or exceeds the current index is the h-index.