Problem Description
Given a string s of lowercase letters, find the maximum number of non-empty substrings that satisfy the following conditions:
- The selected substrings do not overlap.
- For any selected substring, if it contains a character c, then the substring must include every occurrence of c in s.
If multiple solutions yield the same number of substrings, return the one with the minimum total length. It is guaranteed that the solution with minimal total length is unique.
Key Insights
- Precompute the first and last indices of each character in the string.
- For each character based on its first occurrence, attempt to form the smallest valid substring that covers all its occurrences. During this expansion, if any character within the current candidate has a first occurrence outside the candidate’s range, the candidate is invalid.
- After collecting all valid intervals (candidate substrings), sort them by their ending indices.
- Use a greedy selection strategy: choose the substring that ends the earliest and then skip all substrings that overlap with it.
Space and Time Complexity
Time Complexity: O(n) – primarily from scanning the string and processing a constant number (26) of characters. Space Complexity: O(n) – space for storing occurrence indices and intervals.
Solution
We begin by determining the first and last occurrence for every character in s. For each character (considering only its first occurrence), expand its interval to include all characters within and check if the interval is still valid (i.e., for every character within, its first occurrence is not before the interval start). Collect all valid intervals. Finally, sort the intervals by ending index and greedily choose non-overlapping intervals to arrive at the answer.