Problem Description
Given an array of strings words, count the number of index pairs (i, j) with i < j such that words[i] is both a prefix and a suffix of words[j]. In other words, for a valid pair, words[j] should begin with words[i] and end with words[i].
Key Insights
- For a pair (i, j) to be valid, words[i] must be one of the “borders” of words[j] (a string that is both its prefix and suffix).
- Rather than comparing each pair naively, we can precompute for each word the set of its borders using techniques such as the KMP prefix function.
- An offline approach is helpful where we iterate through the list in reverse order. For each word (treated as a potential S word), we add counts for each of its border strings to a global map.
- Then, when processing an earlier word (as a candidate prefix), we simply add how many future words have that word as one of their borders.
- The overall work is proportional to the total length of the strings (which is bounded by 5×10^5), making the approach efficient.
Space and Time Complexity
Time Complexity: O(total length of all words) — each word’s borders are computed in linear time relative to its length. Space Complexity: O(total length of all words) — to store the prefix function for computing borders and the frequency map for candidate border strings.
Solution
The main idea is to reframe the problem: instead of trying to match each possible pair on the fly, we count for each word j (as the “S” word) all of its border strings (which include the full length string and any proper border). We will iterate over the words from right to left and update a frequency map (futureMap) where each border string key’s value is the number of words (with index greater than the current index) that have that string as a border. Then, when processing a word at index i, we simply add the count from futureMap[words[i]] (if any) to our result. Finally, we compute the borders of words[i] (using the KMP prefix function) and update futureMap accordingly so that future indices (i.e., with smaller index) can match against the current word’s contributions.
Data Structures and Techniques:
- Hash Map: To store the count of how many times a candidate border string appears as a border for some j (j > i).
- KMP Prefix Function: To derive all border lengths of a word efficiently.
- Reverse Iteration: Ensures that when we process a candidate word at index i, all words with index j > i have already been processed and their border counts recorded.
Gotchas:
- When computing borders via the prefix function, be sure to include the whole word (which is always a valid border for itself) as well as all proper borders.
- Beware of the order condition (i < j); processing from right to left simplifies the counting logic.
Code Solutions
Below are implementations in Python, JavaScript, C++, and Java with inline comments.