We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

Short Encoding of Words

Number: 839

Difficulty: Medium

Paid? No

Companies: N/A


Problem Description

Given an array of words, construct the shortest possible reference string s where each word appears as a suffix ending at the '#' character. Each word is encoded as the substring in s starting from a specific index up to (but not including) the next '#' character.


Key Insights

  • If one word is a suffix of another, only the longer word needs to be explicitly stored in the encoding.
  • By reversing the words, the suffix problem becomes a prefix problem, making it easier to detect and ignore redundant words.
  • Using data structures such as a Trie or a set can help efficiently eliminate words that are suffixes of other words.

Space and Time Complexity

Time Complexity: O(n * k^2), where n is the number of words and k is the maximum word length. This arises from checking every possible suffix in each word. Space Complexity: O(n * k), used for storing the words and potential Trie nodes or set entries.


Solution

We start by removing duplicate words using a set. For each word, we remove all its suffixes from the set. This ensures that only words which are not suffixes of any other word remain. The final encoded length is simply the sum of each remaining word's length plus one for the '#' separator.

We can use:

  • A set to store the words.
  • A nested loop to remove every suffix of each word that appears in the set.
  • A final pass to sum word length plus the separator for the constructed reference string.

Code Solutions

# Python implementation with line-by-line comments

class Solution:
    def minimumLengthEncoding(self, words):
        # Create a set to remove duplicate words
        words_set = set(words)
        
        # Remove any word that is a suffix of another word
        for word in list(words_set):
            for k in range(1, len(word)):
                # word[k:] is a potential suffix of word
                suffix = word[k:]
                if suffix in words_set:
                    words_set.discard(suffix)
        
        # The encoding length is the sum of word lengths plus one '#' per word
        total_length = sum(len(word) + 1 for word in words_set)
        return total_length

# Example usage:
# sol = Solution()
# print(sol.minimumLengthEncoding(["time", "me", "bell"]))  # Output: 10
← Back to All Questions