We are given an array of strings words (each of length exactly 2) composed of lowercase letters. A shortest common supersequence (SCS) is a string of minimum length that contains every word in words as a subsequence. Note that two SCSs that are simply rearrangements (i.e. letter‐permutations) of one another should be considered the same. The goal is to return a two‐dimensional array where each inner array is of size 26, corresponding to the frequency count (for letters a to z) for a distinct SCS.
Key Insights
Every input word is of length 2. If a word is “xx”, it forces at least two occurrences of that letter.
For words with two distinct letters (say “xy”), an ordering constraint is imposed: an occurrence of x must appear before an occurrence of y.
The collective ordering constraints naturally form a directed graph on the unique letters. A valid SCS must “respect” these constraints.
In cases where the constraints form a cycle, the minimal SCS cannot be a simple permutation. Extra copies of certain letters are needed to “break” the cycle.
The answer requires only the frequency counts of letters in distinct SCSs up to permutation. Thus, SCSs that are rearrangements of the same multiset are considered equivalent.
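The last two insights can be made concrete with words = ["ab", "ba"]: no length-2 string contains both words as subsequences, but the length-3 strings "aba" and "bab" both do, and they have different letter multisets. A minimal sketch (the helper name `is_subsequence` is illustrative, not part of the solution):

```python
def is_subsequence(word, candidate):
    # Standard iterator trick: `ch in it` consumes `it` until `ch` is found,
    # so the letters of `word` must appear in `candidate` in order.
    it = iter(candidate)
    return all(ch in it for ch in word)

words = ["ab", "ba"]
# The constraints a-before-b and b-before-a form a cycle, so no
# two-letter string can contain both words as subsequences.
assert not any(all(is_subsequence(w, c) for w in words) for c in ("ab", "ba"))
# Duplicating either letter breaks the cycle at length 3.
for candidate in ("aba", "bab"):
    assert all(is_subsequence(w, candidate) for w in words)
```

Note that "aba" and "bab" are not permutations of one another, so they count as two distinct SCSs under the problem's equivalence.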
Space and Time Complexity
Time Complexity: the enumeration over distributions of extra letter counts is exponential in the number of unique letters in the worst case; however, with at most 16 unique letters and small extra totals, the effective worst case is manageable.
Space Complexity: O(U + E) for the ordering graph and frequency maps, where U is the number of unique letters (at most 16) and E ≤ U² is the number of distinct ordering edges.
Solution
The key observation is to transform the problem into one of constraint satisfaction. First, we extract all unique letters from words and compute a base frequency for each letter. Every letter must appear at least once (and for words like “aa” the base frequency is 2). For each word with two distinct letters “xy”, we record the ordering constraint that in the final supersequence at least one occurrence of x must precede one occurrence of y.
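The base-frequency computation described here can be sketched on its own (the function name `base_frequencies` is illustrative, not from the final solution):

```python
from collections import Counter

def base_frequencies(words):
    # Every distinct letter must appear at least once; a doubled word "xx"
    # raises the requirement for x to two copies.
    base = Counter()
    for w in words:
        for ch in set(w):
            base[ch] = max(base[ch], 1)
        if w[0] == w[1]:
            base[w[0]] = max(base[w[0]], 2)
    return dict(base)
```

For example, with ["aa", "ab", "bc"] the word "aa" forces two copies of 'a', while 'b' and 'c' each need only one.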
We then build a directed graph from these ordering constraints. If the graph is acyclic then one valid SCS is any permutation of the unique letters (augmented with the base counts). However, when cycles occur the ordering constraints conflict when considering a simple permutation. In such cases, we need to duplicate letters so that an ordering exists that “separates” conflicting constraints (e.g. for words “ab” and “ba” the minimal SCS has length 3).
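Deciding whether a simple permutation suffices is exactly the question of whether the ordering graph is acyclic, which can be checked with Kahn's algorithm. A standalone sketch (the function name is hypothetical):

```python
from collections import defaultdict

def ordering_graph_is_acyclic(words):
    # Build the letter-ordering graph: edge x -> y for each word "xy", x != y.
    graph = defaultdict(set)
    letters = set()
    for w in words:
        letters.update(w)
        if w[0] != w[1]:
            graph[w[0]].add(w[1])
    indegree = {ch: 0 for ch in letters}
    for u in graph:
        for v in graph[u]:
            indegree[v] += 1
    # Kahn's algorithm: repeatedly remove vertices with indegree zero.
    stack = [ch for ch in letters if indegree[ch] == 0]
    removed = 0
    while stack:
        u = stack.pop()
        removed += 1
        for v in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                stack.append(v)
    # Acyclic exactly when every letter was removed.
    return removed == len(letters)
```

Here ["ab", "bc"] is acyclic (the permutation "abc" works as an SCS), while ["ab", "ba"] contains a cycle and forces a duplicated letter.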
A high‐level approach is:
Compute the base frequency requirements per letter. Every unique letter starts at a count of 1; for each word “xy” with x == y, raise that letter’s base count to 2.
Build the ordering graph from words with distinct letters.
Set extra = 0. While no valid multiset distribution (base frequencies plus extra counts) yields a valid ordering respecting the constraints, increment extra.
For every distribution of the extra counts among the unique letters (with the extras summing to extra), test whether an arrangement exists (for example via a greedy/topological sort) in which, for every word “xy”, an occurrence of x appears before an occurrence of y.
Among all valid frequency arrays, output only one frequency array for each distinct multiset distribution (ignoring different orders that result in the same frequency counts).
This approach leverages graph algorithms (topological sort) along with enumeration over small extra count distributions. Some of the nuances include handling cycles through duplications and ensuring that different arrangements (that are simply permutations) collapse to the same frequency distribution.
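The enumeration over extra-count distributions in step 3 is a standard stars-and-bars recursion. A minimal sketch (`distributions` is an illustrative name; the full solution inlines this as a DFS):

```python
def distributions(num_letters, extra):
    # Yield every tuple of non-negative extras of length `num_letters`
    # summing to `extra` (stars and bars).
    if num_letters == 0:
        if extra == 0:
            yield ()
        return
    for first in range(extra + 1):
        for rest in distributions(num_letters - 1, extra - first):
            yield (first,) + rest
```

For two unique letters and one extra copy this yields (0, 1) and (1, 0), matching the two minimal-length supersequence multisets for ["ab", "ba"].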
Code Solutions
Below is a Python code outline with detailed line-by-line comments.
```python
# Python solution outline
def shortest_superseq_frequencies(words):
    # Step 1: Determine the set of unique letters and initialize base frequency counts.
    unique_letters = set()
    base_freq = {chr(i + ord('a')): 1 for i in range(26)}
    for word in words:
        for ch in word:
            unique_letters.add(ch)
    # Letters that never appear get a count of zero.
    for ch in base_freq:
        if ch not in unique_letters:
            base_freq[ch] = 0
    # For words like "xx" we need at least 2 occurrences of that letter.
    for word in words:
        if word[0] == word[1]:
            base_freq[word[0]] = max(base_freq[word[0]], 2)

    # Step 2: Build ordering constraints for distinct letters.
    # For each word "xy" with x != y, record that an x must appear before a y.
    # (Using sets avoids duplicate edges automatically.)
    graph = {ch: set() for ch in unique_letters}
    for word in words:
        if word[0] != word[1]:
            graph[word[0]].add(word[1])

    # Helper: test whether a frequency distribution admits a valid arrangement.
    # A letter with count >= 2 can contribute one copy at the very start and one
    # at the very end of the supersequence, so every constraint touching it is
    # automatically satisfiable. The distribution is therefore feasible exactly
    # when the subgraph induced on the count-1 letters is acyclic, which we
    # check with a topological sort (Kahn's algorithm).
    def valid_ordering(freq):
        singles = {ch for ch in unique_letters if freq[ch] == 1}
        local_indegree = {ch: 0 for ch in singles}
        for u in singles:
            for v in graph[u]:
                if v in singles:
                    local_indegree[v] += 1
        queue = [ch for ch in singles if local_indegree[ch] == 0]
        processed = 0
        while queue:
            u = queue.pop()
            processed += 1
            for v in graph[u]:
                if v in singles:
                    local_indegree[v] -= 1
                    if local_indegree[v] == 0:
                        queue.append(v)
        # Feasible exactly when no cycle remains among the count-1 letters.
        return processed == len(singles)

    # Step 3: Enumerate over extra counts. `extra` is the total number of letters
    # added beyond the base requirements; because the search space is small, we
    # grow it until at least one distribution yields a valid ordering.
    found_distributions = set()
    valid_frequencies = []
    letters = sorted(unique_letters)
    extra = 0
    while True:
        freq_list = []

        # Recursively enumerate all ways to distribute `extra` among the letters.
        def dfs(index, extra_remaining, current):
            if index == len(letters):
                if extra_remaining == 0:
                    # Build a full a..z frequency map: base counts plus extras.
                    freq_map = {chr(j + ord('a')): 0 for j in range(26)}
                    for i, ch in enumerate(letters):
                        freq_map[ch] = base_freq[ch] + current[i]
                    freq_list.append(freq_map)
                return
            for add in range(extra_remaining + 1):
                current.append(add)
                dfs(index + 1, extra_remaining - add, current)
                current.pop()

        dfs(0, extra, [])
        # Check each frequency distribution for a valid ordering, and keep one
        # representative per distinct multiset (ordered by letter a..z).
        for freq in freq_list:
            sub_freq = {ch: freq[ch] for ch in unique_letters}
            if valid_ordering(sub_freq):
                freq_tuple = tuple(freq[ch] for ch in sorted(freq))
                if freq_tuple not in found_distributions:
                    found_distributions.add(freq_tuple)
                    valid_frequencies.append(list(freq_tuple))
        if valid_frequencies:
            break
        extra += 1
    # Each inner list holds the counts for letters 'a' through 'z' in order.
    return valid_frequencies


# Example usage:
if __name__ == "__main__":
    words = ["ab", "ba"]
    print(shortest_superseq_frequencies(words))
```