Problem Description
Given two strings, word1 and word2, count the number of nonempty substrings of word1 that are “valid”. A substring is considered valid if the characters in it can be rearranged so that word2 becomes its prefix. In other words, every letter in word2 must appear in the substring with at least the same frequency as in word2.
Key Insights
- The substring is valid if it contains at least as many of each character as word2.
- Instead of checking every substring (which is prohibitively expensive), use a sliding window approach.
- Maintain character frequency counters over the current window and “expand” the window until the substring meets the frequency requirements dictated by word2.
- When a minimal valid window [L, R] is found, every extension of this window (i.e. adding characters past R) is also valid.
- Use two pointers (L and R) and update a match counter to quickly determine when a window meets all the required counts for each character in word2.
Space and Time Complexity
Time Complexity: O(n) where n is the length of word1, since each character is processed at most twice. Space Complexity: O(1) because the frequency arrays are of constant size (26 for lowercase letters) plus the required frequency array for word2 which is also fixed by the 26 letters.
Solution
To solve the problem, precompute the frequency of every character in word2. Then, use a sliding window over word1 to maintain the frequency of characters in the current window. Keep a counter (matchCount) of how many characters have reached the required frequency. For each position L from the start of word1, move the right pointer R until the window [L, R] contains every character in word2 in sufficient quantity. Once a valid window is found, any substring [L, k] with k ≥ R is valid since adding extra characters does not remove validity. Count these substrings as (n - R + 1). Before moving L to the next index, update the frequency array by removing word1[L] and, if necessary, update the match counter. This guarantees that each character is added and removed at most once, resulting in linear runtime.