Problem Description
Given two strings word1 and word2, count the total number of non-empty substrings of word1 that are valid. A substring is considered valid if, after any rearrangement, word2 appears as its prefix. In other words, the substring must contain at least the frequency of each character present in word2.
Key Insights
- A substring can be rearranged arbitrarily, so the order does not matter. We only need to ensure that the substring has at least as many of each character as word2.
- Create a frequency count for word2 (required counts) and use a sliding window on word1 to check if a current window meets or exceeds these counts.
- Once a valid window starting at index i is found (with a minimal ending index j), every substring starting from i and ending at any index beyond j will also be valid.
- Use two pointers (sliding window) to efficiently count the number of valid substrings and avoid O(n^2) enumeration.
Space and Time Complexity
Time Complexity: O(n * k) where n is the length of word1 and k is the size of the alphabet (typically 26).
Space Complexity: O(1) since the frequency arrays have a fixed size independent of input length.
Solution
We use a two-pointer (sliding window) approach with frequency arrays. First, compute the frequency of each character in word2 which serves as the requirement. Then, iterate over word1 with a left pointer (i) and expand a right pointer (j) until the window [i, j] meets or exceeds the required frequency for every character in word2. When a valid window is found, every substring that extends this window (i.e. [i, j], [i, j+1], …, [i, n-1]) will be valid, and we add (n - j) to our result. Before moving i to the next index, remove its effect from the window count and continue.
Key Data Structures:
- Two fixed-size integer arrays (or dictionaries) to store character frequencies for the current window and for word2.
- Two pointers to mark the boundaries of the current substring window.
The main trick is recognizing that once the window is valid, extending its right boundary keeps it valid. This allows efficient counting without re-checking all substrings one by one.
Code Solutions
Below are code solutions in multiple languages with line-by-line comments.