Problem Description
Given a string s and a pattern p that contains exactly two '*' wildcards (each matching any sequence of zero or more characters), return the length of the shortest contiguous substring of s that matches p. The empty substring is considered valid. If no substring of s can match p, return -1.
For example, if s = "abaacbaecebce" and p = "bacce", one matching substring is "baecebce" (of length 8).
Key Insights
- The pattern p always contains exactly two '*' characters, so p can be split into three parts: A, B, and C (which could be empty).
- A valid match is any substring T of s that can be partitioned as: T = (any prefix) + A + (any string) + B + (any string) + C + (any suffix). In terms of matching, T must contain A, then later B, then later C (with possibly zero characters in between).
- To obtain the shortest matching substring, you want to pick occurrences of A, B, and C that are “as close together as possible.”
- A good approach is to first find all occurrences (starting indices) of A, B, and C in s. Then, for each occurrence of A you can binary–search for the next occurrence of B that appears after the end of A, and similarly find an occurrence of C after the end of B.
- The challenge is handling edge cases when any of the segments (A, B, or C) is empty (recall that p might begin or end with '*' or, in an extreme, be just "**", where the answer would be 0).
- Efficient substring search (using for example KMP) and binary search on sorted lists of occurrence indices keeps the solution within acceptable complexity bounds.
Space and Time Complexity
Time Complexity: O(n + m + occ*log(occ)), where n = s.length, m = p.length, and occ is the number of occurrences found for a given segment (in worst-case O(n)). Space Complexity: O(n) to store the occurrence indices.
Solution
We start by splitting p into three segments: A, B, and C, where p = A + '' + B + '' + C. In s, for each segment we use a KMP-based search (or any efficient substring search algorithm) to compute a sorted list of start indices where that segment occurs. (If the segment is empty, we conceptually allow it to “match” at every position; however, for the purpose of minimizing substring length we only need to consider boundary conditions.)
For each occurrence index a for A (if A is nonempty; if empty, any index may serve as a starting point), we determine the end index of the A match (a + |A|). Then, using binary search on all occurrences of B, we try to find an occurrence b such that b ≥ a + |A|. For the found b, the match for B extends to b + |B|. Then, again by binary search on all occurrences of C, we find c such that c ≥ b + |B|. The candidate matching substring spans from a to c + |C| - 1 and its length is (c + |C| - a).
We iterate over all possible occurrences of A in s and take the minimum such substring length. If we cannot form a valid chain, we return -1. Special handling is done when one or more segments are empty. For example, if p equals "**", the answer is 0 as the empty substring is valid.
This solution uses:
- KMP (Knuth-Morris-Pratt) algorithm for efficient pattern searching.
- Binary search to quickly find the next occurrence index that meets our boundary requirements.
A similar strategy is applied for each of the target languages.
Code Solutions
Below are code solutions in Python, JavaScript, C++, and Java with line-by-line comments.