We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

Find the Lexicographically Smallest Valid Sequence

Number: 3584

Difficulty: Medium

Paid? No

Companies: N/A


Problem Description

Given two strings – word1 and word2 – we need to find an array of indices (of length equal to word2’s length) that is “valid” in the following sense: • The indices are in increasing order. • If you take the characters from word1 at those indices (in order) you get a string that is “almost equal” to word2, meaning that by changing at most one character in that extracted string you can obtain word2. If there are several valid sequences, you must return the lexicographically smallest (i.e. the array which, when compared element‐by‐element, is smallest). If no valid sequence exists, return an empty array.


Key Insights

  • Recognize that an “almost equal” string is either exactly equal to word2 or differs in exactly one position.
  • First check if an exact subsequence of word2 exists in word1. (An exact match is valid because zero changes is at most one.)
  • If no exact match is found, then try to “force” one substitution at some position in word2.
  • For each candidate substitution position t in word2, pick the lexicographically smallest set of indices: use greedy two–pointers (backed by precomputed “next occurrence” arrays) to choose the earliest available index for every non–substituted letter; for the substitution position, choose any index (without requiring a character match) provided that the remainder of word2 (which must match exactly) can still be “found.”
  • Precomputing for every index in word1 and for every letter (a–z) the next occurrence helps quickly decide whether a complete exact subsequence exists from a given point, and it makes the greedy selection efficient.
  • The challenge is to decide when and where to “use up” the allowed substitution so that the final indices array is lexicographically smallest.

Space and Time Complexity

Time Complexity: O(n · 26 + m · (log n or constant))
 • Precomputing the next occurrence table takes O(26 · n) time.
 • The greedy selection (and checking feasibility for each potential substitution) runs in O(m) worst-case if implemented carefully.
Space Complexity: O(n · 26) for the next–occurrence table, plus O(m) for helper arrays.


Solution

The idea is to “simulate” the subsequence selection from word1 for word2 by using a next–occurrence table:

  1. Build an array nextPos such that for every index i in word1 (from 0 to n) and for every letter from a to z, nextPos[i][c] gives the next index (>= i) where letter c appears. This allows us to quickly “jump” to the next matching index.
  2. Attempt a greedy exact–match subsequence. For each position j in word2, starting from position 0 in word1, choose the smallest index i (>= current) where word1[i] equals word2[j] and ensure that the remainder of word2 can still be matched. If this succeeds for all j then return the indices.
  3. If an exact match is impossible then consider using one substitution. For each possible substitution position t (0-indexed in word2):   a. For j from 0 to m–1, if j is not t, choose the smallest index where word1’s letter equals word2[j] (using the nextPos table) while ensuring that the remaining part can be completed exactly.   b. For j == t, you are free to choose any index (starting at the current pointer) – choose the smallest possible candidate that still leaves a valid exact sequence for word2[t+1:].
  4. Compare all valid candidates and choose the lexicographically smallest indices array.
  5. Return the chosen indices array or an empty array if no candidate is valid.

Key details include handling edge cases (for instance, word2 being longer than any possible subsequence in word1) and ensuring that the feasibility check (for “completing” the match after a chosen index) is done in O(1) time by using precomputed data.


Code Solutions

Below are code solutions with detailed inline comments in Python, JavaScript, C++ and Java.

# Python solution
def find_lexicographically_smallest_valid_sequence(word1, word2):
    n, m = len(word1), len(word2)
    # Precompute next occurrence table:
    # next_pos[i][ch] = smallest index >= i where character ch appears in word1.
    # We'll use n as a sentinel (meaning not found).
    next_pos = [[n] * 26 for _ in range(n + 1)]
    # Initialize last row with n (i.e., not found)
    for i in range(n - 1, -1, -1):
        # Copy the row from i+1.
        next_pos[i] = next_pos[i + 1][:]
        # Update for the current character.
        next_pos[i][ord(word1[i]) - ord('a')] = i

    # Helper: check if it is possible to match word2 exactly from index `start` in word1.
    def can_match(start, idx):
        # idx: current index in word2 to match
        cur = start
        for j in range(idx, m):
            ch = ord(word2[j]) - ord('a')
            cur = next_pos[cur][ch]
            if cur == n:
                return False
            cur += 1
        return True

    # Attempt 1: Exact match (0 substitution)
    seq = []
    cur = 0
    valid_exact = True
    for j in range(m):
        ch_index = ord(word2[j]) - ord('a')
        cur = next_pos[cur][ch_index]
        if cur == n:
            valid_exact = False
            break
        seq.append(cur)
        cur += 1
    if valid_exact:
        return seq

    # Attempt 2: Use one substitution.
    best_seq = None
    # Try each candidate substitution index t in word2.
    for t in range(m):
        candidate_seq = []
        cur = 0
        valid = True
        for j in range(m):
            if j == t:
                # At substitution index: choose smallest possible valid index.
                # We try all possible choices starting at cur; pick the first candidate 'x'
                found_pos = -1
                for x in range(cur, n):
                    # Check that remaining pattern (from j+1 onward) can be matched exactly after x.
                    if can_match(x + 1, j + 1):
                        found_pos = x
                        break
                if found_pos == -1:
                    valid = False
                    break
                candidate_seq.append(found_pos)
                cur = found_pos + 1
            else:
                # Must match exactly.
                ch_index = ord(word2[j]) - ord('a')
                cur = next_pos[cur][ch_index]
                if cur == n:
                    valid = False
                    break
                candidate_seq.append(cur)
                cur += 1
        if valid:
            if best_seq is None or candidate_seq < best_seq:
                best_seq = candidate_seq
    return best_seq if best_seq is not None else []

# Example usage:
print(find_lexicographically_smallest_valid_sequence("vbcca", "abc"))  # Output: [0,1,2]
print(find_lexicographically_smallest_valid_sequence("bacdc", "abc"))  # Output: [1,2,4]
print(find_lexicographically_smallest_valid_sequence("aaaaaa", "aaabc"))  # Output: []
print(find_lexicographically_smallest_valid_sequence("abc", "ab"))  # Output: [0,1]
← Back to All Questions