We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

Find the Shortest Superstring

Number: 980

Difficulty: Hard

Paid? No

Companies: Google


Problem Description

Given an array of unique strings, return the smallest string that contains each string as a substring. No string in the given array is a substring of another. In case of multiple valid answers, any one is accepted.


Key Insights

  • Precompute the maximum overlap for every pair of strings such that merging them minimizes additional length.
  • Use dynamic programming with bitmasking to try every possible permutation of words efficiently.
  • Reconstruct the final superstring using backtracking from the DP table.
  • The problem is analogous to solving a variant of the traveling salesman problem where the cost is the negative overlap.

Space and Time Complexity

Time Complexity: O(n^2 * 2^n), where n is the number of words. Space Complexity: O(n * 2^n) for storing DP states and backtracking information.


Solution

We solve the problem using dynamic programming with bitmasking. First, precompute the overlap between every two strings (i and j) which is the maximum length of the suffix of i that is a prefix of j. Then, use a DP state dp[mask][i] that represents the maximum saved overlap when the set of strings given by mask has been used and ends with the i-th string. The state transitions by considering appending a new word j to the current combination, updating the overlap saved. Once the DP table is computed, reconstruct the optimal order by backtracking and finally combine the words by merging them according to the computed overlaps.


Code Solutions

# Python implementation for Shortest Superstring

from typing import List

def shortestSuperstring(words: List[str]) -> str:
    n = len(words)
    
    # Precompute the overlaps between words: overlap[i][j] is the maximum suffix of words[i] that is a prefix of words[j]
    overlap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Compute maximum overlap between words[i] and words[j]
            max_len = min(len(words[i]), len(words[j]))
            for k in range(max_len, 0, -1):
                if words[i].endswith(words[j][:k]):
                    overlap[i][j] = k
                    break
                    
    # dp[mask][i] (mask: bitmask of visited words, i: ending word) stores maximum overlap sum
    dp = [[0] * n for _ in range(1 << n)]
    parent = [[-1] * n for _ in range(1 << n)]
    
    # Iterate over all bitmask states
    for mask in range(1, 1 << n):
        for j in range(n):
            if (mask & (1 << j)) == 0:
                continue
            prev_mask = mask ^ (1 << j)
            if prev_mask == 0:
                continue
            for i in range(n):
                if prev_mask & (1 << i):
                    value = dp[prev_mask][i] + overlap[i][j]
                    if value > dp[mask][j]:
                        dp[mask][j] = value
                        parent[mask][j] = i
    
    # Find the end word that gives maximal overlap sum
    max_overlap = -1
    last = -1
    full_mask = (1 << n) - 1
    for i in range(n):
        if dp[full_mask][i] > max_overlap:
            max_overlap = dp[full_mask][i]
            last = i

    # Reconstruct the path for maximum overlap
    path = []
    mask = full_mask
    while last != -1:
        path.append(last)
        temp = parent[mask][last]
        mask ^= (1 << last)
        last = temp
    path.reverse()
    
    # Build the superstring using the path and computed overlaps
    superstring = words[path[0]]
    for idx in range(1, len(path)):
        i, j = path[idx-1], path[idx]
        # Append the non-overlapping part of words[j]
        superstring += words[j][overlap[i][j]:]
    
    # Append remaining words not in path (if any) - though by problem statement all words appear in path
    used = set(path)
    for i in range(n):
        if i not in used:
            superstring += words[i]
    
    return superstring

# Example usage:
if __name__ == "__main__":
    test_words = ["catg","ctaagt","gcta","ttca","atgcatc"]
    print(shortestSuperstring(test_words))
← Back to All Questions