Problem Description
Given a string s and an integer k, delete at most k characters from s such that the run-length encoded version of s has the minimum possible length. In run-length encoding, consecutive identical characters are replaced by the character and the count (if count > 1). For instance, "aaabcccd" compressed becomes "a3bc3d". The task is to choose which characters (if any) to delete so that after compression the resulting string is as short as possible.
Key Insights
- Use dynamic programming to consider both deleting and keeping characters.
- The current state must capture the index in the string, the previous character (or group), and how many times it has been consecutively repeated.
- When keeping a character, if it extends the current group, increasing the count may change the encoded length only when crossing thresholds (e.g., 1 to 2, 9 to 10, etc.).
- When deleting a character, simply move forward reducing the available deletions.
- Leverage memoization to avoid recomputation over overlapping sub-problems.
Space and Time Complexity
Time Complexity: O(n * k * 27) where n is the length of s and the factor 27 comes from the possible states for the previous run (character and its count), as counts are effectively capped (only changes when passing 1, 9, 99, etc.). Space Complexity: O(n * k * 27) due to recursion depth and memoization state storage.
Solution
We solve the problem using recursion with memoization (a top-down dynamic programming approach). The recursion is defined by a function dp(i, prev, count, k) where:
- i is the current index in the string.
- prev is the last character that was kept in the compressed formation.
- count is the number of consecutive occurrences for prev.
- k is the number of deletions remaining.
At every step we have two options:
- Delete s[i] if k > 0.
- Keep s[i] and if it is the same as prev, extend the run (and possibly increase the encoded length if count becomes 2, 10, etc.). If it is different from prev, start a new group with an encoded contribution of 1 (for the character, plus additional cost if the count goes beyond 1 later).
The answer is the minimum compressed length achievable over these choices. Special attention is needed when extending a group: only when count reaches certain thresholds (1 to 2, 9 to 10, etc.) does the output length increase.