Problem Description
Given a string s and an integer k, determine whether it is possible to choose k disjoint “special substrings” from s. A special substring is defined as a non‐empty substring (which cannot be equal to the entire string s) satisfying that for every character appearing inside the substring, none of its occurrences appear outside of the substring.
Key Insights
- The key observation is that if a substring is “special” then, for every character in that substring, the substring must cover all global occurrences of that character.
- For any character c, if we let L and R be the first and last positions of c in s, then the candidate substring [L, R] is “special” if every character within s[L, R] has its global first occurrence ≥ L and global last occurrence ≤ R.
- If a character appears exactly once then its candidate interval is a single character and can be used as a special substring.
- Although one may “combine” adjacent unique characters into a longer special substring, choosing the minimal valid intervals (for example, the candidate intervals defined for each letter) always yields a collection from which a greedy interval‐scheduling approach can pick a maximum set of disjoint intervals.
- Finally, note that even if a candidate interval is valid, if it covers the entire string s it is not allowed.
Space and Time Complexity
Time Complexity: O(n + 26 * L) where n is the length of s and L is the average length of candidate intervals (in worst-case L can be O(n) but only 26 candidates exist so it is effectively O(n)).
Space Complexity: O(n) for storing occurrences and the candidate intervals list (at most 26 intervals).
Solution
The approach is as follows:
- Precompute the first and last occurrence indices for every letter in s.
- For each letter that appears in s, define its candidate candidate interval as [firstOccurrence, lastOccurrence]. Then check if the candidate interval is “self-contained” – meaning that for every character in that interval, the global first occurrence is not before the candidate’s start and the global last occurrence is not after the candidate’s end.
- Exclude the candidate interval if it spans the entire string (i.e. start == 0 and end == n-1).
- Over all valid candidate intervals (each corresponding to some letter – note that if a letter appears only once the interval is just that one index), sort them by their ending indices.
- Use a greedy (interval scheduling) selection method to choose as many disjoint intervals as possible.
- Return true if the count of chosen intervals is at least k; otherwise return false.
The algorithm takes advantage of the fact that there are at most 26 letters so even if we check each candidate interval by scanning the substring, the cost is acceptable.