We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

Groups of Strings

Number: 2276

Difficulty: Hard

Paid? No

Companies: Lowe's


Problem Description

Given an array of strings (each with unique letters), we need to group the strings such that any two strings in the same group are “connected”. Two strings are connected if one can be transformed into the other by exactly one of the following operations:

  • Adding exactly one letter.
  • Deleting exactly one letter.
  • Replacing exactly one letter (note: the replacement letter can be the same as the original, but that operation will typically yield some “neighbor” when thought on a bitmask level).

We must return an array with:

  • The number of groups formed.
  • The size of the largest group.

A valid grouping ensures that no string in one group is connected by any of these operations to a string in any other group.


Key Insights

  • Represent each string as a bitmask, where each bit represents whether a given letter (from ‘a’ to ‘z’) is present.
  • Two strings are connected if their bitmasks differ by exactly one letter addition, deletion, or by replacing one bit (i.e. removing one bit and adding a different one).
  • Generate all neighbors for each bitmask by:
    • Removing every one of its set bits.
    • Adding every letter that is not yet set.
    • Replacing every set bit with every not-set bit.
  • Use a union-find (disjoint set union) data structure to group words that are connected.
  • Keep track of duplicates because multiple words might have the same bitmask.

Space and Time Complexity

Time Complexity: O(N * 26^2) in the worst case, where N is the number of words. Each word generates up to O(26 + 26*26) neighbor masks. Space Complexity: O(N) for the union-find data structure and bitmask mapping.


Solution

We first convert each word to its bitmask representation. Then, using union-find, we iterate over each bitmask and generate all its possible neighbor masks using three possible operations:

  1. Deletion – remove a set bit.
  2. Addition – add an unset bit.
  3. Replacement – for each set bit, remove it and for each unset bit, add it. For each neighbor that exists in our input mapping, we union the two groups. Finally, we count the number of distinct groups and the maximum size among them.

Code Solutions

# Python solution with line-by-line comments

class UnionFind:
    def __init__(self, n):
        # Initialize parent pointer and size array for n elements
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        # Path compression for efficient look-up
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        # Union the groups containing x and y if they are not already connected
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX == rootY:
            return False
        # Merge smaller tree into larger tree for balance
        if self.size[rootX] < self.size[rootY]:
            rootX, rootY = rootY, rootX
        self.parent[rootY] = rootX
        self.size[rootX] += self.size[rootY]
        return True

class Solution:
    def groupStrings(self, words):
        n = len(words)
        # Convert each word to a bitmask representation
        bitmasks = []
        for word in words:
            mask = 0
            for ch in word:
                mask |= (1 << (ord(ch) - ord('a')))
            bitmasks.append(mask)
        
        # Map each bitmask to list of indices (to handle duplicates)
        mask_to_indices = {}
        for idx, mask in enumerate(bitmasks):
            if mask not in mask_to_indices:
                mask_to_indices[mask] = []
            mask_to_indices[mask].append(idx)
        
        uf = UnionFind(n)
        
        # For each unique bitmask, iterate over operations and union connected indices
        for mask, indices in mask_to_indices.items():
            # For each index that has the exact same mask, union them together.
            base_index = indices[0]
            for idx in indices[1:]:
                uf.union(base_index, idx)
            
            # Operation 1: Delete one letter from mask
            for i in range(26):
                if mask & (1 << i):
                    neighbor = mask ^ (1 << i)  # remove letter i
                    if neighbor in mask_to_indices:
                        uf.union(base_index, mask_to_indices[neighbor][0])
            
            # Operation 2: Add one letter to mask
            for i in range(26):
                if mask & (1 << i) == 0:
                    neighbor = mask | (1 << i)  # add letter i
                    if neighbor in mask_to_indices:
                        uf.union(base_index, mask_to_indices[neighbor][0])
            
            # Operation 3: Replace one letter: remove one and add a different one
            # Essentially, try all (remove, add) pairs.
            for i in range(26):
                if mask & (1 << i):
                    remove_mask = mask ^ (1 << i)
                    for j in range(26):
                        # if letter j is not in the current mask, then add it
                        if remove_mask & (1 << j) == 0:
                            neighbor = remove_mask | (1 << j)
                            if neighbor in mask_to_indices:
                                uf.union(base_index, mask_to_indices[neighbor][0])
                                
        # Count groups and determine largest group size.
        groups = {}
        for i in range(n):
            root = uf.find(i)
            groups[root] = groups.get(root, 0) + 1

        num_groups = len(groups)
        max_group_size = max(groups.values())
        return [num_groups, max_group_size]

# Example usage:
# sol = Solution()
# print(sol.groupStrings(["a","b","ab","cde"]))
← Back to All Questions