We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

Statistics from a Large Sample

Number: 1183

Difficulty: Medium

Paid? No

Companies: Microsoft


Problem Description

Given an array count of length 256 where count[k] represents the number of times the integer k appears in a sample (with k in the range [0, 255]), compute the following statistics for the sample:

  • minimum: The smallest integer present.
  • maximum: The largest integer present.
  • mean: The average value of all integers.
  • median: The middle value when the sample is sorted (or the average of the two middle values for an even number of elements).
  • mode: The integer that appears the most (guaranteed to be unique).

Return these five statistics as an array of floating-point numbers.


Key Insights

  • The sample is represented indirectly by counts, so there is no need to store the individual sample elements.
  • The minimum and maximum can be obtained by scanning the count array for the first and last non-zero entries.
  • The mean is calculated as the total sum of (value * frequency) divided by the total number of elements.
  • The median is determined by using cumulative counts to find the middle element(s), handling even and odd total counts appropriately.
  • The mode is the index with the highest frequency in the count array.

Space and Time Complexity

Time Complexity: O(256) ≈ O(1) since the count array size is fixed. Space Complexity: O(1) — only constant extra space is used.


Solution

We solve the problem by iterating through the count array once to compute basic statistics: the minimum, maximum, total sum of values, and the frequency of each number to determine the mode. After calculating the total number of elements, we use a second pass (or cumulative sum within the same loop) to identify the median. If the total count is odd, the median is the element at position total_count/2 + 1; if even, the median is the average of the two middle elements. This approach avoids reconstructing the full sample and efficiently computes the required statistics using constant extra space.


Code Solutions

def sampleStatistics(count):
    # Calculate total count, sum, and determine min, max, and mode
    total_count = sum(count)
    min_val, max_val = None, None
    total_sum = 0
    mode_val = 0
    mode_count = 0
    
    for i in range(len(count)):
        if count[i] > 0:
            if min_val is None:
                min_val = i
            max_val = i
            total_sum += i * count[i]
            if count[i] > mode_count:
                mode_count = count[i]
                mode_val = i
                
    mean_val = total_sum / total_count

    # Find the median using cumulative frequency
    median_val = 0.0
    cumulative = 0
    if total_count % 2 == 1:
        mid_index = total_count // 2 + 1  # The middle index for an odd number of elements
        for i in range(len(count)):
            cumulative += count[i]
            if cumulative >= mid_index:
                median_val = i
                break
    else:
        mid_index1 = total_count // 2
        mid_index2 = mid_index1 + 1
        median1 = None
        median2 = None
        for i in range(len(count)):
            cumulative += count[i]
            if median1 is None and cumulative >= mid_index1:
                median1 = i
            if cumulative >= mid_index2:
                median2 = i
                break
        median_val = (median1 + median2) / 2.0

    return [float(min_val), float(max_val), mean_val, median_val, float(mode_val)]

# Example usage:
# For a sample count array representing the sample [1,2,2,2,3,3,3,3]
count = [0]*256
count[1] = 1
count[2] = 3
count[3] = 4
print(sampleStatistics(count))
← Back to All Questions