Problem Description
Given a string s, count and return the number of distinct non-empty subsequences of s. Since the answer may be very large, return the result modulo 10^9 + 7.
Key Insights
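- Use dynamic programming: each new character doubles the count of subsequences, because every existing subsequence (including the empty one) can either be kept as is or extended with the new character.
- To avoid counting duplicates when a character reappears, subtract the number of subsequences that existed just before the previous occurrence of that character. Extending those with the character produced the same strings the first time around.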
- Use a hash map or a fixed-size array (26 entries if s contains only lowercase letters) to track the last index at which each character was seen.
- Use modulo arithmetic at every step to keep the numbers within bounds.
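The insights above can be written as a recurrence. This is a sketch under one assumed convention, which the text does not fix: dp[i] counts the distinct subsequences of the first i characters including the empty one, and p denotes the 0-based index of the previous occurrence of the current character s[i].

```latex
\[
dp[0] = 1, \qquad
dp[i+1] \equiv
\begin{cases}
2\,dp[i] & \text{if } s[i] \text{ has not appeared before} \\
2\,dp[i] - dp[p] & \text{if } s[i] \text{ last appeared at index } p < i
\end{cases}
\pmod{10^9 + 7}
\]
\[
\text{answer} \equiv dp[n] - 1 \pmod{10^9 + 7}
\]
```

For s = "aba" this gives dp[0] = 1, dp[1] = 2, dp[2] = 4, and dp[3] = 2 * 4 - dp[0] = 7. Dropping the empty subsequence leaves 6: a, b, aa, ab, ba, aba.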
Space and Time Complexity
Time Complexity: O(n), where n is the length of the string
Space Complexity: O(n) for the dp array and O(1) for the tracking structure since it is limited to 26 characters
Solution
We use a dynamic programming approach. Define dp[i] as the number of distinct subsequences of the first i characters of s, counting the empty subsequence, so dp[0] = 1. Processing the next character c doubles the count, since every existing subsequence can be kept as is or extended with c. If c has appeared before, some of those extensions are repeats: every subsequence that existed before the previous occurrence of c was already extended with c at that point. To fix this, subtract the count from just before that occurrence. With p as the 0-based index of the previous occurrence, this gives dp[i+1] = 2 * dp[i] - dp[p]. A tracking structure (map or array) holds the last position where each character was encountered. All arithmetic is done modulo 10^9 + 7; in languages where the remainder of a negative number can be negative, add the modulus after the subtraction. The final answer is dp[n] - 1, which removes the empty subsequence.
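The approach described above can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation. The function name distinct_subseq_count is hypothetical, and the sketch assumes that dp[i] includes the empty subsequence, which is why 1 is subtracted at the end.

```python
MOD = 10**9 + 7


def distinct_subseq_count(s: str) -> int:
    """Count distinct non-empty subsequences of s, modulo 10^9 + 7."""
    n = len(s)
    # dp[i] = distinct subsequences of s[:i], including the empty one (assumed convention).
    dp = [0] * (n + 1)
    dp[0] = 1
    # last[ch] = 0-based index of the most recent occurrence of ch.
    last = {}

    for i, ch in enumerate(s):
        # Keep every existing subsequence, or extend it with ch.
        dp[i + 1] = (2 * dp[i]) % MOD
        if ch in last:
            # Remove extensions already counted at the previous occurrence.
            # Python's % always returns a non-negative result for a positive
            # modulus; in C-like languages, add MOD before taking the remainder.
            dp[i + 1] = (dp[i + 1] - dp[last[ch]]) % MOD
        last[ch] = i

    # Drop the empty subsequence.
    return (dp[n] - 1) % MOD


if __name__ == "__main__":
    print(distinct_subseq_count("aba"))
```

Because dp[i + 1] depends only on dp[i] and on the values stored at previous occurrences, the dp array could be replaced by a running total plus one saved value per character, reducing the extra space to O(1). The full array is kept here to mirror the description.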