We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

Select K Disjoint Special Substrings

Number: 3771

Difficulty: Medium

Paid? No

Companies: Amazon


Problem Description

Given a string s and an integer k, determine whether it is possible to choose k disjoint “special substrings” from s. A special substring is defined as a non‐empty substring (which cannot be equal to the entire string s) satisfying that for every character appearing inside the substring, none of its occurrences appear outside of the substring.


Key Insights

  • The key observation is that if a substring is “special” then, for every character in that substring, the substring must cover all global occurrences of that character.
  • For any character c, if we let L and R be the first and last positions of c in s, then the candidate substring [L, R] is “special” if every character within s[L, R] has its global first occurrence ≥ L and global last occurrence ≤ R.
  • If a character appears exactly once then its candidate interval is a single character and can be used as a special substring.
  • Although one may “combine” adjacent unique characters into a longer special substring, choosing the minimal valid intervals (for example, the candidate intervals defined for each letter) always yields a collection from which a greedy interval‐scheduling approach can pick a maximum set of disjoint intervals.
  • Finally, note that even if a candidate interval is valid, if it covers the entire string s it is not allowed.

Space and Time Complexity

Time Complexity: O(n + 26 * L) where n is the length of s and L is the average length of candidate intervals (in worst-case L can be O(n) but only 26 candidates exist so it is effectively O(n)).
Space Complexity: O(n) for storing occurrences and the candidate intervals list (at most 26 intervals).


Solution

The approach is as follows:

  1. Precompute the first and last occurrence indices for every letter in s.
  2. For each letter that appears in s, define its candidate candidate interval as [firstOccurrence, lastOccurrence]. Then check if the candidate interval is “self-contained” – meaning that for every character in that interval, the global first occurrence is not before the candidate’s start and the global last occurrence is not after the candidate’s end.
  3. Exclude the candidate interval if it spans the entire string (i.e. start == 0 and end == n-1).
  4. Over all valid candidate intervals (each corresponding to some letter – note that if a letter appears only once the interval is just that one index), sort them by their ending indices.
  5. Use a greedy (interval scheduling) selection method to choose as many disjoint intervals as possible.
  6. Return true if the count of chosen intervals is at least k; otherwise return false.

The algorithm takes advantage of the fact that there are at most 26 letters so even if we check each candidate interval by scanning the substring, the cost is acceptable.


Code Solutions

# Python solution with detailed inline comments
class Solution:
    def canSelect(self, s: str, k: int) -> bool:
        n = len(s)
        # k==0 is always valid (choose none)
        if k == 0:
            return True

        # Precompute first and last occurrence for each letter
        first_occ = {c: n for c in "abcdefghijklmnopqrstuvwxyz"}
        last_occ = {c: -1 for c in "abcdefghijklmnopqrstuvwxyz"}
        for i, ch in enumerate(s):
            first_occ[ch] = min(first_occ[ch], i)
            last_occ[ch] = i
        
        candidates = []
        # For each letter that appears in s
        for ch in "abcdefghijklmnopqrstuvwxyz":
            if last_occ[ch] == -1:
                continue  # letter not in s
            L, R = first_occ[ch], last_occ[ch]
            # Skip if candidate interval is the entire string s
            if L == 0 and R == n - 1:
                continue
            valid = True
            # Check every character in the interval [L, R]
            for i in range(L, R + 1):
                c = s[i]
                # If any letter in the interval appears outside [L,R], candidate is not valid
                if first_occ[c] < L or last_occ[c] > R:
                    valid = False
                    break
            if valid:
                candidates.append((L, R))
        
        # Remove duplicates if any and sort the candidate intervals by their ending index
        candidates = list(set(candidates))
        candidates.sort(key = lambda interval: interval[1])
        
        # Greedy interval scheduling to count maximum number of disjoint intervals
        count = 0
        last_end = -1
        for start, end in candidates:
            if start > last_end:
                count += 1
                last_end = end
        
        return count >= k

# Example usage:
# s = "abcdbaefab", k = 2 -> should return True.
← Back to All Questions