LeetCode Solutions Blog

Shortest Common Supersequence

Number: 1170

Difficulty: Hard

Paid? No

Problem Description

Given two strings str1 and str2, return the shortest string that has both str1 and str2 as subsequences. A string s is a subsequence of string t if deleting some (possibly zero) characters from t results in s. If there exist multiple valid answers, any one of them is acceptable.

Key Insights

The problem can be solved by leveraging the concept of the Longest Common Subsequence (LCS).
By first finding the LCS of str1 and str2, we can merge the two strings while ensuring that common parts (LCS) are not repeated unnecessarily.
Dynamic Programming is used to compute the LCS.
Once the LCS is available, the final result is built by interleaving characters from str1 and str2 while aligning with the LCS.

Space and Time Complexity

Time Complexity: O(m * n), where m and n are the lengths of str1 and str2 respectively. Space Complexity: O(m * n) for the DP table to store LCS lengths, with additional space for the output string.

Solution

The solution is divided into two main steps:

Compute the LCS of the two strings using a Dynamic Programming table. This table dp[i][j] represents the length of the LCS of str1[0...i-1] and str2[0...j-1].
Reconstruct the shortest common supersequence by traversing the two strings in parallel and using the LCS as a guide. For every character in the LCS, append all non-LCS characters from str1 and str2 until that common character is reached, then add the LCS character. Append any remaining characters after processing the LCS.

The key observation is that by including the non-matching parts between matched LCS characters, we ensure both strings are subsequences of the constructed result, and the result is the shortest possible.

Code Solutions

# Python solution using DP for LCS and reconstruction for SCS.

def shortestCommonSupersequence(str1: str, str2: str) -> str:
    m, n = len(str1), len(str2)
    # DP table initialization: dp[i][j] will hold LCS for str1[:i] and str2[:j]
    dp = [["" for _ in range(n + 1)] for _ in range(m + 1)]
    
    # Build the dp table for LCS string reconstruction.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + str1[i - 1]
            else:
                # Choose the longer LCS when characters do not match
                if len(dp[i - 1][j]) > len(dp[i][j - 1]):
                    dp[i][j] = dp[i - 1][j]
                else:
                    dp[i][j] = dp[i][j - 1]
    
    # Reconstruct the shortest common supersequence using the LCS.
    lcs = dp[m][n]
    i = j = 0
    scs = []
    # Append non-LCS characters from str1 and str2 until the LCS characters appear.
    for c in lcs:
        # Append all characters from str1 until we hit the LCS character.
        while str1[i] != c:
            scs.append(str1[i])
            i += 1
        # Append all characters from str2 until we hit the LCS character.
        while str2[j] != c:
            scs.append(str2[j])
            j += 1
        # Now append the LCS character.
        scs.append(c)
        i += 1
        j += 1
        
    # Append any remaining characters after processing the LCS.
    scs.extend(list(str1[i:]))
    scs.extend(list(str2[j:]))
    return "".join(scs)

# Example test cases:
print(shortestCommonSupersequence("abac", "cab"))  # Expected output: "cabac" (or any valid SCS)
print(shortestCommonSupersequence("aaaaaaaa", "aaaaaaaa"))  # Expected output: "aaaaaaaa"