We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

Tag Validator

Number: 591

Difficulty: Hard

Paid? No

Companies: Microsoft


Problem Description

Given a string that represents a code snippet, determine if it is valid. A valid code snippet must be entirely wrapped within one valid closed tag. The snippet may contain nested valid closed tags and CDATA sections. Each tag must satisfy strict rules on formatting, naming (only uppercase letters with a length of 1 to 9), and proper nested structure. CDATA sections, which are denoted by are not parsed for tags but treated as plain text.


Key Insights

  • The entire code snippet must be wrapped in one valid outer closed tag.
  • A stack data structure is ideal for tracking nested tag structures.
  • When encountering a '<', determine if it starts a CDATA section, a start tag, or an end tag.
  • For start tags, validate the tag name (uppercase letters only, length between 1 and 9) before pushing onto the stack.
  • For end tags, ensure it matches the most recent start tag in the stack.
  • CDATA sections are skipped entirely (after finding the ending "]]>") since their content is not parsed.
  • After processing the entire snippet, the stack must be empty for the snippet to be valid.

Space and Time Complexity

Time Complexity: O(n), where n is the length of the code string, as we process each character at most once.
Space Complexity: O(n), due to the usage of a stack to store tags.


Solution

The solution leverages a stack to manage the nested tag structure. We iterate through the string, and upon encountering a '<' we determine its type:

  1. If it starts a CDATA section ("" and move the pointer forward, treating its contents as plain text.
  2. If it starts with "</", it denotes an end tag. We then extract the tag name, compare it with the tag on top of the stack and pop the stack if they match.
  3. Otherwise, if it is a start tag, we verify the tag name’s validity (only uppercase letters and a length between 1 and 9) and push it onto the stack. Any characters outside these recognized patterns are processed as plain text. Finally, when the string ends, the stack must be empty (meaning every start tag was properly closed) and the entire code must have been encapsulated within a valid outer tag.

Code Solutions

# Python solution using a stack and pointer iteration

def isValid(code: str) -> bool:
    n = len(code)
    stack = []
    i = 0
    # The code must start with a start tag.
    if n == 0 or code[0] != '<':
        return False

    while i < n:
        if i > 0 and not stack:
            # All characters must be inside a valid closed tag.
            return False
        if code.startswith("<![CDATA[", i):
            # Locate the end of the CDATA section.
            j = code.find("]]>", i)
            if j == -1:
                return False
            i = j + 3
        elif code.startswith("</", i):
            # Process end tag.
            j = code.find(">", i)
            if j == -1:
                return False
            tagName = code[i+2:j]
            if not stack or stack[-1] != tagName:
                return False
            stack.pop()
            i = j + 1
        elif code[i] == '<':
            # Process start tag.
            j = code.find(">", i)
            if j == -1:
                return False
            tagName = code[i+1:j]
            # Validate tag name: must be 1 to 9 uppercase letters.
            if len(tagName) < 1 or len(tagName) > 9 or not tagName.isupper():
                return False
            stack.append(tagName)
            i = j + 1
        else:
            # Regular character.
            i += 1

    return not stack

# Example usage:
print(isValid("<DIV>This is the first line <![CDATA[<div>]]></DIV>"))  # Expected output: True
← Back to All Questions