Problem Description
Given a starting gene string, an ending gene string, and a bank of valid gene sequences, determine the fewest single-character mutations required to change the start gene into the end gene. Each mutation must result in a gene that exists in the bank. If no valid mutation sequence exists, return -1.
Key Insights
- Use Breadth-First Search (BFS) as it finds the shortest path (minimum mutations) in an unweighted graph.
- Mutations are generated by changing each character of the gene string one by one to one of the valid characters: A, C, G, or T.
- Use a set for the bank for O(1) membership checks.
- Avoid revisiting the same gene by marking mutations as visited (or removing them from the bank).
Space and Time Complexity
Time Complexity: O(N * L * 4) where N is the number of genes in the bank and L is the length of the gene (here L=8), due to checking each position with 4 possible mutations. Space Complexity: O(N) for the queue and visited set, with additional O(N) for the bank set.
Solution
We solve the problem using a Breadth-First Search (BFS) algorithm. The idea is to start from the initial gene and generate all possible one-character mutations. Each valid mutation (i.e., one that exists in the bank) is added to the BFS queue. For each level of BFS, we increment the mutation count. When the end gene is reached, we return the current mutation count.
We store the bank in a set to ensure constant time membership checks and avoid revisiting states. This solution guarantees finding the minimum number of mutations needed if a valid sequence exists, otherwise, it returns -1.