We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

H-Index

Number: 274

Difficulty: Medium

Paid? No

Companies: Microsoft, Google, Amazon, LinkedIn, Bloomberg, Nvidia, Adobe, Apple, Yandex, ByteDance, Meta


Problem Description

Given an array of integers where each element represents the number of citations a paper received, determine the researcher's h-index. The h-index is defined as the maximum h such that there are at least h papers with at least h citations each.


Key Insights

  • The problem requires determining a threshold h such that h papers have at least h citations.
  • Sorting the array or using a bucket/counting sort method can facilitate finding this threshold.
  • A bucket sort approach can achieve a linear time solution by counting citations up to the number of papers.
  • Traversing the bucket array in reverse (from high citation counts to low) helps in summing the cumulative counts to pinpoint the h-index.

Space and Time Complexity

Time Complexity: O(n), where n is the number of papers (using the counting/bucket sort approach).
Space Complexity: O(n), for storing the bucket counts.


Solution

We use a bucket sort method since the maximum citation count we care about is bounded by n (the total number of papers). First, create a bucket array of size n+1. For each citation, if it is greater than n, increment the count in bucket[n] (to cap the count); otherwise, increment bucket[citation]. Then iterate the bucket array in reverse order accumulating the number of papers with citations greater than or equal to the current index. The first index where the accumulated count becomes equal to or exceeds the current index is the h-index.


Code Solutions

# Python solution using bucket sort.

def hIndex(citations):
    n = len(citations)  # Total number of papers.
    bucket = [0] * (n + 1)  # Initialize bucket of size n+1.
    
    # Fill the bucket.
    for citation in citations:
        if citation >= n:
            bucket[n] += 1  # Cap citations over n.
        else:
            bucket[citation] += 1
    
    total = 0  # This will accumulate the counts of papers.
    # Traverse from the highest possible h-index downwards.
    for i in range(n, -1, -1):
        total += bucket[i]
        if total >= i:  # Found the maximum h such that at least i papers have i or more citations.
            return i

# Example usage:
print(hIndex([3, 0, 6, 1, 5]))  # Output: 3
← Back to All Questions