We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

Distinct Subsequences II

Number: 977

Difficulty: Hard

Paid? No

Companies: Google


Problem Description

Given a string s, count and return the number of distinct non-empty subsequences of s. Since the answer may be very large, return the result modulo 10^9 + 7.


Key Insights

  • Use dynamic programming to count subsequences by leveraging the idea that any new character doubles the count of existing subsequences.
  • To avoid counting duplicates when a character reappears, subtract the number of subsequences that existed before the previous occurrence of that character.
  • Use a hash map or an array of fixed size (for 26 lowercase letters) to keep track of the last index each character was seen.
  • Use modulo arithmetic at every step to keep the numbers within bounds.

Space and Time Complexity

Time Complexity: O(n), where n is the length of the string
Space Complexity: O(n) for the dp array and O(1) for the tracking structure since it is limited to 26 characters


Solution

We use a dynamic programming approach. Define dp[i] as the number of distinct subsequences using the substring s[0..i]. When a new character is processed, it potentially doubles the number of distinct subsequences by appending the new character to each previously valid subsequence. However, if the character has appeared before, then some subsequences are repeated. To fix this, subtract the count of subsequences counted before the previous occurrence of that character. A tracking structure (map or array) is maintained which holds the last position where each character was encountered. Care must be taken to handle subtraction modulo 10^9 + 7 so that negative values are corrected by adding the modulus.


Code Solutions

MOD = 10**9 + 7

class Solution:
    def distinctSubseqII(self, s: str) -> int:
        # dp[i] represents the number of distinct subsequences using s[0...i-1]
        n = len(s)
        dp = [0] * (n + 1)
        dp[0] = 1  # Counting the empty subsequence
        
        # last_occurrence stores the last index for each character.
        last_occurrence = {}
        
        for i in range(1, n + 1):
            char = s[i - 1]
            # Each character doubles the subsequences from previous state.
            dp[i] = (dp[i - 1] * 2) % MOD
            
            # If the character has been seen before, subtract the duplicate subsequences.
            if char in last_occurrence:
                j = last_occurrence[char]
                dp[i] = (dp[i] - dp[j - 1]) % MOD
            
            # Update last occurrence (we use i to represent 1-indexed position in dp)
            last_occurrence[char] = i
        
        # Subtract 1 to exclude the empty subsequence.
        return (dp[n] - 1) % MOD

# Example usage:
# sol = Solution()
# print(sol.distinctSubseqII("abc"))  # Output: 7
← Back to All Questions