
Distinct Echo Substrings

Number: 1244

Difficulty: Hard

Paid? No

Companies: Google


Problem Description

Given a string text, find the number of distinct non-empty substrings that can be written as a concatenation of a string with itself (i.e. as a + a). For example, if a substring is "abcabc", it is valid because it is formed by "abc" concatenated with "abc". The task is to count each distinct valid substring only once.
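To make the definition concrete, here is a minimal brute-force sketch that applies it literally: it slices every even-length substring, compares its two halves directly, and collects the matches in a set. The function name `echo_substrings_brute_force` is hypothetical and not part of the original solution. It runs in roughly cubic time, so it is only meant as a reference for small inputs.

```python
def echo_substrings_brute_force(text):
    """Return the set of distinct substrings of the form a + a (a non-empty)."""
    n = len(text)
    found = set()
    for start in range(n):
        # Only even lengths can be split into two equal halves.
        for length in range(2, n - start + 1, 2):
            half = length // 2
            first = text[start:start + half]
            second = text[start + half:start + length]
            if first == second:
                found.add(text[start:start + length])
    return found


print(sorted(echo_substrings_brute_force("abcabcabc")))
```

For "abcabcabc" the set contains "abcabc", "bcabca", and "cabcab", so the count is 3.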


Key Insights

  • Only even-length substrings can form an echo substring because they need to be split into two equal halves.
  • A brute-force approach that compares the two halves of every even-length substring character by character takes O(n) per comparison and O(n^3) overall. A rolling hash reduces each comparison to a constant-time check of two hash values.
  • Precomputing hash values and appropriate power values enables constant-time comparisons for any substring.
  • Store valid echo substrings (or a hash representation along with their length) in a set to ensure uniqueness.
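The precomputation idea in the list above can be sketched as follows. This is an illustrative fragment, not the full solution: the helper names `build_tables` and `substring_hash` are hypothetical, and the base and modulus are assumed to be 26 and 10^9 + 7. Once the prefix and power tables are built, the hash of any slice `text[l:r]` comes from two table lookups.

```python
MOD = 10**9 + 7  # assumed modulus
BASE = 26        # assumed base for lowercase letters


def build_tables(text):
    """Return (prefix, power) where prefix[i] is the hash of text[0:i]."""
    n = len(text)
    prefix = [0] * (n + 1)
    power = [1] * (n + 1)
    for i, ch in enumerate(text):
        prefix[i + 1] = (prefix[i] * BASE + (ord(ch) - ord('a'))) % MOD
        power[i + 1] = (power[i] * BASE) % MOD
    return prefix, power


def substring_hash(prefix, power, l, r):
    """Hash of text[l:r] in O(1); Python's % already yields a non-negative result."""
    return (prefix[r] - prefix[l] * power[r - l]) % MOD


prefix, power = build_tables("abcabc")
# "abc" at [0, 3) and "abc" at [3, 6) hash to the same value.
same = substring_hash(prefix, power, 0, 3) == substring_hash(prefix, power, 3, 6)
# "abc" at [0, 3) and "bca" at [1, 4) hash to different values.
different = substring_hash(prefix, power, 0, 3) != substring_hash(prefix, power, 1, 4)
print(same, different)
```

Equal substrings always produce equal hashes. The converse holds only with high probability, because distinct substrings can collide under a single modulus.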

Space and Time Complexity

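Time Complexity: O(n^2). We iterate over all even-length substrings and compare the hashes of their halves in constant time.

Space Complexity: O(n) for the prefix and power arrays, plus the hash set. The set stores one constant-size (hash, length) tuple per distinct echo substring, which is at most O(n^2) entries.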


Solution

Our approach uses a rolling hash to compare the two halves of every even-length substring in constant time. We start by precomputing a prefix hash array and a power array from a chosen base (26, for lowercase letters) and a large prime modulus (10^9 + 7), which keeps collisions unlikely. Then, for every even-length substring, we compute the hash of the first half and the hash of the second half. If they are equal, we treat the substring as valid and add a tuple of the whole substring's hash and its length to a set, so that each distinct echo substring is counted once. Finally, we return the size of this set. Because the comparison relies on a single hash, the result is correct with high probability rather than guaranteed: two different strings can, in rare cases, share a hash value.
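If the small risk of a hash collision is a concern, one possible variant is to use the hash only as a filter and confirm each match with a direct string comparison. The sketch below is an illustrative alternative, not the solution described above. The name `distinct_echo_substrings_verified` is hypothetical, and storing the substrings themselves instead of (hash, length) tuples is an assumption made for simplicity. The confirmation step costs O(n) per match, so inputs with many matches (for example, a long run of one repeated letter) lose the O(n^2) bound.

```python
def distinct_echo_substrings_verified(text, mod=10**9 + 7, base=26):
    """Count distinct echo substrings, confirming every hash match directly."""
    n = len(text)
    prefix = [0] * (n + 1)
    power = [1] * (n + 1)
    for i, ch in enumerate(text):
        prefix[i + 1] = (prefix[i] * base + (ord(ch) - ord('a'))) % mod
        power[i + 1] = (power[i] * base) % mod

    def get_hash(l, r):
        # Hash of text[l:r]
        return (prefix[r] - prefix[l] * power[r - l]) % mod

    seen = set()
    for length in range(2, n + 1, 2):
        half = length // 2
        for start in range(n - length + 1):
            mid, end = start + half, start + length
            if get_hash(start, mid) != get_hash(mid, end):
                continue  # different hashes mean the halves certainly differ
            # Hashes match: rule out a collision before accepting.
            if text[start:mid] == text[mid:end]:
                seen.add(text[start:end])
    return len(seen)


print(distinct_echo_substrings_verified("abcabcabc"))
```

The hash check still rejects almost all non-matching pairs in constant time, so the extra comparison runs only for candidates that are very likely genuine.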


Code Solutions

# Python solution using rolling hash

def distinct_echo_substrings(text):
    n = len(text)
    mod = 10**9 + 7  # modulus for hash
    base = 26        # number of characters (a-z)

    # Precompute prefix hash: prefix[i] is hash of text[0:i]
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i + 1] = (prefix[i] * base + (ord(text[i]) - ord('a'))) % mod

    # Precompute power array: power[i] = base^i mod mod
    power = [1] * (n + 1)
    for i in range(1, n + 1):
        power[i] = (power[i - 1] * base) % mod

    # Helper function to get hash of substring text[l:r]
    def get_hash(l, r):
        return (prefix[r] - (prefix[l] * power[r - l]) % mod + mod) % mod

    echo_set = set()
    # Check only even length substrings
    for length in range(2, n + 1, 2):
        half = length // 2
        for start in range(n - length + 1):
            # Compute hash for first half and second half
            if get_hash(start, start + half) == get_hash(start + half, start + length):
                # Store (hash, length) instead of the substring itself; including the
                # length keeps equal hashes from substrings of different lengths apart
                echo_value = (get_hash(start, start + length), length)
                echo_set.add(echo_value)
    return len(echo_set)

# Example usage:
print(distinct_echo_substrings("abcabcabc"))  # Output: 3
print(distinct_echo_substrings("leetcodeleetcode"))  # Output: 2