Problem Description
Given an array of strings words and an integer k, for every index i we remove words[i] from the array. Among the remaining words we then choose any k (from distinct indices) so that the k chosen words share the longest possible common prefix. For every removal (i.e. for every index i) we return the length of that longest common prefix; if fewer than k words remain after the removal, we return 0.
For example, if words = ["jump","run","run","jump","run"] and k = 2 then:
- Removing "jump" at index 0, the remaining list has three "run" and one "jump". Choosing any two "run" gives the common prefix "run" (length 3).
- Removing "run" at index 1, the best we can do is choose the two occurrences of "jump" (common prefix "jump", length 4).

The full answer array is [3,4,4,3,4].
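As a sanity check, the example can be verified by brute force: for each removal, try every k-subset of the remaining words and keep the longest common prefix. (This is only a verification sketch, not the efficient solution; the helper name `brute_force` is our own.)

```python
from itertools import combinations
from os.path import commonprefix  # longest common string prefix of a list

def brute_force(words, k):
    ans = []
    for i in range(len(words)):
        rest = words[:i] + words[i + 1:]
        best = 0
        # Try every way to pick k of the remaining words.
        for combo in combinations(rest, k):
            best = max(best, len(commonprefix(list(combo))))
        ans.append(best)  # stays 0 if fewer than k words remain
    return ans

print(brute_force(["jump", "run", "run", "jump", "run"], 2))  # [3, 4, 4, 3, 4]
```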
Key Insights
- A global data structure (specifically, a Trie) can be used to compute for each prefix (of any word) the total count of words that share that prefix.
- The “global” best answer (without any removal) is determined by the deepest level (largest prefix length) for which there exists at least one node (prefix) with count ≥ k.
- Removing a word only “affects” the nodes along the path corresponding to that removed word. For any node whose prefix is not shared with the removed word the count remains unchanged.
- In order to answer each removal efficiently, we pre-compute, for every depth d, the maximum count among all trie nodes at that depth (globalBest[d]), the number of nodes achieving that maximum (globalBestFreq[d]), and the second-highest count (globalSecond[d]). Then, for a removal, if the removed word passes through the best node at depth d and that node was the unique best, its effective count drops to count − 1, so we compare that against the backup candidate globalSecond[d].
- Note that for a removal word whose length is L, any prefix longer than L is “unaffected” (since the removed word does not contribute there) and the global count remains valid.
Space and Time Complexity
Time Complexity: O(S), where S is the sum of the lengths of all words (covering both building and traversing the trie, plus processing each removal).
Space Complexity: O(S), for the Trie and the auxiliary arrays keyed by prefix depth.
Solution
We first build a Trie with every word inserted. Each Trie node stores:
- A mapping from character to child node.
- A count: the number of words that pass through the node.
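A minimal Python sketch of this Trie with per-node counts (the names `TrieNode` and `build_trie` are our own choices):

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.count = 0      # number of words passing through this node

def build_trie(words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
            node.count += 1  # one more word shares this prefix
    return root

root = build_trie(["jump", "run", "run", "jump", "run"])
print(root.children["r"].count)                 # 3 words start with "r"
print(root.children["j"].children["u"].count)   # 2 words start with "ju"
```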
Then we perform a traversal (BFS or DFS) of the Trie to “group” nodes by their depth. For each depth d we record:
- globalBest[d]: the maximum count of any node at depth d.
- globalBestFreq[d]: the number of nodes that achieve that maximum.
- globalSecond[d]: the largest count that is strictly less than globalBest[d] (or 0 if none).
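The per-depth statistics can be gathered level by level with BFS, as in this sketch (arrays are 1-indexed by depth, with index 0 unused; a compact dict-based trie stands in for the node class so the snippet is self-contained):

```python
def build_trie(words):
    # Each node is {"count": int, "kids": {char: node}}.
    root = {"count": 0, "kids": {}}
    for w in words:
        node = root
        for ch in w:
            node = node["kids"].setdefault(ch, {"count": 0, "kids": {}})
            node["count"] += 1
    return root

def depth_stats(root):
    best, best_freq, second = [0], [0], [0]  # index 0 unused
    level = [root]
    while True:
        level = [child for node in level for child in node["kids"].values()]
        if not level:
            break
        b = bf = s = 0
        for node in level:
            c = node["count"]
            if c > b:
                s, b, bf = b, c, 1   # old best becomes the runner-up
            elif c == b:
                bf += 1              # another node ties the best
            elif c > s:
                s = c                # new runner-up strictly below best
        best.append(b); best_freq.append(bf); second.append(s)
    return best, best_freq, second

best, best_freq, second = depth_stats(build_trie(["jump", "run", "run", "jump", "run"]))
print(best)       # [0, 3, 3, 3, 2]
print(best_freq)  # [0, 1, 1, 1, 1]
print(second)     # [0, 2, 2, 2, 0]
```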
The “global answer” (if no removal occurred) is the maximum d (i.e. prefix length) for which globalBest[d] ≥ k.
For each removal (i.e. for each word removed), we “simulate” the effect along the path corresponding to that word. For a prefix of length d that is part of the removed word, the effective count becomes:
- If the removed word’s prefix node was the unique best at depth d: candidate = max(removed_node_count − 1, globalSecond[d]).
- Otherwise: candidate = globalBest[d].

For any depth d greater than the length of the removed word, the candidate remains globalBest[d] (since the removed word does not contribute there).
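The candidate rule can be written as a small helper (a sketch with names of our own choosing; `on_path` means depth d lies within the removed word, and `node_count` is the removed word’s prefix-node count at that depth):

```python
def candidate(on_path, node_count, best_d, best_freq_d, second_d):
    # If the removed word's prefix node was the unique maximum at this
    # depth, its count drops by one and the runner-up may take over.
    if on_path and node_count == best_d and best_freq_d == 1:
        return max(node_count - 1, second_d)
    return best_d

# Removing "jump" at depth 4: it was the unique best (count 2), no runner-up.
print(candidate(True, 2, 2, 1, 0))   # 1
# Removing "run" at depth 4: depth exceeds len("run"), so best is untouched.
print(candidate(False, 0, 2, 1, 0))  # 2
```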
Thus the answer for removal i is the maximum d (1 ≤ d ≤ maxDepth) for which the candidate count is still at least k. (If the total number of words after removal is less than k then the answer is 0.)
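Putting the pieces together, here is a complete Python sketch that follows the description directly. For simplicity it rescans all depths for each removal (O(n · maxDepth) in the worst case) rather than using the tighter per-word bookkeeping the complexity analysis assumes; the function name `remaining_lcp` is our own.

```python
def remaining_lcp(words, k):
    n = len(words)
    # Build a trie of prefix counts.
    root = {"count": 0, "kids": {}}
    for w in words:
        node = root
        for ch in w:
            node = node["kids"].setdefault(ch, {"count": 0, "kids": {}})
            node["count"] += 1
    # Per-depth best / best-frequency / second-best counts (1-indexed).
    best, best_freq, second = [0], [0], [0]
    level = [root]
    while True:
        level = [c for node in level for c in node["kids"].values()]
        if not level:
            break
        b = bf = s = 0
        for node in level:
            c = node["count"]
            if c > b:
                s, b, bf = b, c, 1
            elif c == b:
                bf += 1
            elif c > s:
                s = c
        best.append(b); best_freq.append(bf); second.append(s)
    max_depth = len(best) - 1
    ans = []
    for w in words:
        if n - 1 < k:          # too few words left after removal
            ans.append(0)
            continue
        # Prefix-node counts along the removed word's path.
        path, node = [], root
        for ch in w:
            node = node["kids"][ch]
            path.append(node["count"])
        res = 0
        for d in range(max_depth, 0, -1):  # deepest valid prefix wins
            if d <= len(w) and path[d - 1] == best[d] and best_freq[d] == 1:
                cand = max(path[d - 1] - 1, second[d])
            else:
                cand = best[d]
            if cand >= k:
                res = d
                break
        ans.append(res)
    return ans

print(remaining_lcp(["jump", "run", "run", "jump", "run"], 2))  # [3, 4, 4, 3, 4]
```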
A key “gotcha” is that even if the global best at a given depth comes from the removed word’s contribution, if another node exists (with the same global count) we can “borrow” that value and the candidate is not affected.
We now show complete annotated code solutions in Python, JavaScript, C++ and Java.