Problem Description
Given an array of strings (each with unique letters), we need to group the strings such that any two strings in the same group are “connected”. Two strings are connected if one can be transformed into the other by exactly one of the following operations:
- Adding exactly one letter.
- Deleting exactly one letter.
- Replacing exactly one letter (note: the replacement letter can be the same as the original, but that operation will typically yield some “neighbor” when thought on a bitmask level).
We must return an array with:
- The number of groups formed.
- The size of the largest group.
A valid grouping ensures that no string in one group is connected by any of these operations to a string in any other group.
Key Insights
- Represent each string as a bitmask, where each bit represents whether a given letter (from ‘a’ to ‘z’) is present.
- Two strings are connected if their bitmasks differ by exactly one letter addition, deletion, or by replacing one bit (i.e. removing one bit and adding a different one).
- Generate all neighbors for each bitmask by:
- Removing every one of its set bits.
- Adding every letter that is not yet set.
- Replacing every set bit with every not-set bit.
- Use a union-find (disjoint set union) data structure to group words that are connected.
- Keep track of duplicates because multiple words might have the same bitmask.
Space and Time Complexity
Time Complexity: O(N * 26^2) in the worst case, where N is the number of words. Each word generates up to O(26 + 26*26) neighbor masks. Space Complexity: O(N) for the union-find data structure and bitmask mapping.
Solution
We first convert each word to its bitmask representation. Then, using union-find, we iterate over each bitmask and generate all its possible neighbor masks using three possible operations:
- Deletion – remove a set bit.
- Addition – add an unset bit.
- Replacement – for each set bit, remove it and for each unset bit, add it. For each neighbor that exists in our input mapping, we union the two groups. Finally, we count the number of distinct groups and the maximum size among them.