We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

String Compression II

Number: 1637

Difficulty: Hard

Paid? No

Companies: Microsoft, Amazon, Toptal


Problem Description

Given a string s and an integer k, delete at most k characters from s such that the run-length encoded version of s has the minimum possible length. In run-length encoding, consecutive identical characters are replaced by the character and the count (if count > 1). For instance, "aaabcccd" compressed becomes "a3bc3d". The task is to choose which characters (if any) to delete so that after compression the resulting string is as short as possible.


Key Insights

  • Use dynamic programming to consider both deleting and keeping characters.
  • The current state must capture the index in the string, the previous character (or group), and how many times it has been consecutively repeated.
  • When keeping a character, if it extends the current group, increasing the count may change the encoded length only when crossing thresholds (e.g., 1 to 2, 9 to 10, etc.).
  • When deleting a character, simply move forward reducing the available deletions.
  • Leverage memoization to avoid recomputation over overlapping sub-problems.

Space and Time Complexity

Time Complexity: O(n * k * 27) where n is the length of s and the factor 27 comes from the possible states for the previous run (character and its count), as counts are effectively capped (only changes when passing 1, 9, 99, etc.). Space Complexity: O(n * k * 27) due to recursion depth and memoization state storage.


Solution

We solve the problem using recursion with memoization (a top-down dynamic programming approach). The recursion is defined by a function dp(i, prev, count, k) where:

  • i is the current index in the string.
  • prev is the last character that was kept in the compressed formation.
  • count is the number of consecutive occurrences for prev.
  • k is the number of deletions remaining.

At every step we have two options:

  1. Delete s[i] if k > 0.
  2. Keep s[i] and if it is the same as prev, extend the run (and possibly increase the encoded length if count becomes 2, 10, etc.). If it is different from prev, start a new group with an encoded contribution of 1 (for the character, plus additional cost if the count goes beyond 1 later).

The answer is the minimum compressed length achievable over these choices. Special attention is needed when extending a group: only when count reaches certain thresholds (1 to 2, 9 to 10, etc.) does the output length increase.


Code Solutions

def getLengthOfOptimalCompression(s: str, k: int) -> int:
    n = len(s)
    
    # Helper function to compute cost change when extending a group
    # Returns 1 if adding one more character increases the encoded length (e.g. 1->2, 9->10, 99->100...), otherwise 0.
    def cost_increase(count):
        if count == 1 or count == 9 or count == 99:
            return 1
        return 0
    
    from functools import lru_cache
    
    @lru_cache(maxsize=None)
    def dp(i: int, prev: str, count: int, k_remaining: int) -> int:
        # Base case: reached the end of the string
        if i == n:
            return 0
        
        # Option 1: Delete s[i] if possible.
        best = float('inf')
        if k_remaining > 0:
            best = dp(i + 1, prev, count, k_remaining - 1)
        
        # Option 2: Keep s[i]
        if s[i] == prev:
            # If the character is same as previous, extend the current group.
            # Increase cost if extending the group crosses thresholds
            cost = cost_increase(count)
            best = min(best, cost + dp(i + 1, prev, count + 1, k_remaining))
        else:
            # Starting a new group: cost is 1 for the new character 
            best = min(best, 1 + dp(i + 1, s[i], 1, k_remaining))
        return best

    # Initial call: no previous character (use a placeholder '#' which is not in s) and count = 0.
    return dp(0, '#', 0, k)

# Example usage:
print(getLengthOfOptimalCompression("aaabcccd", 2))
← Back to All Questions