LeetCode Solutions Blog

Top K Frequent Words

Number: 692

Difficulty: Medium

Paid? No

Problem Description

Given an array of strings (words) and an integer k, return the k most frequent strings. The answer must be sorted by frequency (highest first). If two words have the same frequency, the word with the lower alphabetical order comes first.

Key Insights

Count the frequency of each unique word using a hash table or dictionary.
Use a min-heap (priority queue) to keep track of the top k frequent words. In the heap, the ordering is based on frequency; if frequencies are equal, order by lexicographical order (inverted condition for min-heap).
Alternatively, you can sort the unique words based on frequency and lexicographical order and then pick the top k.
For an efficient solution, use a min-heap that maintains size k achieving O(n log k) time complexity.

Space and Time Complexity

Time Complexity: O(n log k) when using a heap, where n is the number of unique words. Space Complexity: O(n) for storing the frequency counts and the heap.

Solution

We first calculate the frequency of each word using a hash table (or dictionary). Then, we use a min-heap that stores pairs of (frequency, word). The comparator for the heap is defined such that the root of the heap is the word with the smallest frequency; if two words have the same frequency, the word with the lexicographically larger value is considered “smaller” in our ordering so that it gets popped when the heap size exceeds k. After processing all words, the heap contains the top k frequent words. Finally, the heap elements are extracted and reversed (if needed) to produce the final list sorted from highest to lowest frequency with lexicographic tie-breaking.

Code Solutions

import heapq
from collections import Counter

def topKFrequent(words, k):
    # Count the frequency of each word
    count = Counter(words)
    
    # Initialize a min-heap; Python's heapq is a min-heap.
    # The comparator is (frequency, negative word) but use word in reverse order:
    # Since we want words with lower alphabetical order to be prioritized in the result,
    # and in the min heap if frequencies equal, we want the one with lexicographically larger word to be popped.
    heap = []
    
    for word, freq in count.items():
        # Push each pair into the heap. The tuple ordering:
        # First element is frequency and second element is the word (with reverse lex order trick)
        heapq.heappush(heap, (freq, -ord(word[0]), word))
        # However, using -ord(word[0]) is not robust for multi-letter words.
        # Instead, we can use tuple (freq, word) with a custom comparator by ensuring we invert the word order by using word in reverse.
        # For simplicity, we push tuple (freq, word) and then pop if heap size > k with a custom condition.
        
    # Reset heap and use a custom comparator implemented by the tuple (-freq, word) to simulate max heap behavior.
    heap = []
    for word, freq in count.items():
        # We push with frequency as negative so that larger frequencies come first;
        # use word as is to satisfy lexicographical order when frequencies are equal.
        heapq.heappush(heap, (-freq, word))
    
    # Extract k elements from the heap and sort them accordingly.
    result = []
    for _ in range(k):
        freq, word = heapq.heappop(heap)
        result.append(word)
    return result

# Example usage:
print(topKFrequent(["i","love","leetcode","i","love","coding"], 2))  # Output: ['i', 'love']