LeetCode Solutions Blog

Number of Valid Words in a Sentence

Number: 2047

Difficulty: Easy

Paid? No

Companies: Cisco

Problem Description

Given a sentence whose tokens are separated by one or more spaces, count the tokens that are valid words. A valid word contains only lowercase letters, hyphens, and punctuation marks, with no digits. It has at most one hyphen, which must have a lowercase letter on each side, and at most one punctuation mark ('!', '.', or ','), which must be the last character of the token.

Key Insights

Split the sentence into tokens by spaces.
Each token must be checked for the following:
- It contains only valid characters (lowercase letters, hyphens, and punctuation marks) and no digits.
- It has at most one hyphen, and if present the hyphen must be flanked by lowercase letters.
- It has at most one punctuation mark, and if present it must be located at the end of the token.
Use iteration to verify each token and count the valid ones.
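The three checks above can be sketched as a single per-token predicate. This is only an illustrative sketch: the helper name is_valid_token and the constant PUNCTUATION are hypothetical names chosen here, and the punctuation set is assumed to be '!', '.', and ','.

```python
PUNCTUATION = "!.,"  # assumed punctuation set for this problem


def is_lower(ch: str) -> bool:
    """True only for the ASCII lowercase letters 'a'..'z'."""
    return "a" <= ch <= "z"


def is_valid_token(token: str) -> bool:
    """Hypothetical helper: apply the three checks to one non-empty token."""
    # Check 1: only lowercase letters, hyphens, and punctuation (so no digits).
    if not all(is_lower(ch) or ch == "-" or ch in PUNCTUATION for ch in token):
        return False

    # Check 2: at most one hyphen, flanked by lowercase letters on both sides.
    if token.count("-") > 1:
        return False
    if "-" in token:
        i = token.index("-")
        if i == 0 or i == len(token) - 1:
            return False
        if not (is_lower(token[i - 1]) and is_lower(token[i + 1])):
            return False

    # Check 3: at most one punctuation mark, and only as the last character.
    positions = [i for i, ch in enumerate(token) if ch in PUNCTUATION]
    if len(positions) > 1:
        return False
    if positions and positions[0] != len(token) - 1:
        return False

    return True
```

For instance, is_valid_token("a-b.") and is_valid_token("!") hold, while "a-", "a-!", "a.b", and "ab1" are all rejected.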

Space and Time Complexity

Time Complexity: O(n * m), where n is the number of tokens and m is the average length of a token; this is linear in the length of the sentence.
Space Complexity: O(n * m) for the list of tokens produced by the split; the per-token checks themselves use only O(1) auxiliary space.

Solution

We solve this problem by:

Splitting the input sentence into tokens using spaces as delimiters.
For each token, performing the following checks:
- Verify that the token does not contain any digits.
- Count the number of hyphens. If a hyphen is found, ensure it is neither at the beginning nor at the end, and that the characters immediately before and after are lowercase letters.
- Count the number of punctuation marks (from the set {'!', '.', ','}). If a punctuation mark exists, ensure it appears only at the very end of the token.
Count tokens that satisfy all criteria. The solution primarily uses string manipulation and iteration through characters.
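As an alternative to the character-by-character scan described above, the same rules can be expressed as one regular expression. This swaps in a different technique from the one this post uses, so treat it as a sketch; the names VALID_WORD and count_valid_words_regex are hypothetical, and the pattern assumes the punctuation set is '!', '.', and ','.

```python
import re

# Optional letters with at most one interior hyphen, then optional trailing punctuation.
# Assumption: tokens are never empty, which holds because str.split() drops empty strings.
VALID_WORD = re.compile(r"(?:[a-z]+(?:-[a-z]+)?)?[!.,]?")


def count_valid_words_regex(sentence: str) -> int:
    """Count tokens that match the valid-word pattern in full."""
    return sum(1 for token in sentence.split() if VALID_WORD.fullmatch(token))


print(count_valid_words_regex("cat and  dog"))  # 3
```

The pattern requires letters on both sides of the hyphen, which covers the "flanked by lowercase letters" rule, and fullmatch ensures the punctuation mark can only be the final character.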

Code Solutions

# Function to count valid words in a sentence
def count_valid_words(sentence):
    # Define valid punctuation characters
    punctuation_marks = set("!.,")
    # Split on runs of whitespace, so consecutive spaces produce no empty tokens
    tokens = sentence.split()
    valid_count = 0

    # Process each token individually
    for token in tokens:
        hyphen_count = 0
        punctuation_count = 0
        valid = True

        for i, ch in enumerate(token):
            # Check if character is a digit; if so, token is invalid
            if ch.isdigit():
                valid = False
                break

            # Check if character is a hyphen
            if ch == '-':
                hyphen_count += 1
                # If more than one hyphen, invalid token
                if hyphen_count > 1:
                    valid = False
                    break
                # Hyphen must not be at the start or end and must be surrounded by lowercase letters
                if i == 0 or i == len(token) - 1 or not (token[i-1].islower() and token[i+1].islower()):
                    valid = False
                    break

            # Check if character is a punctuation mark
            elif ch in punctuation_marks:
                punctuation_count += 1
                # If more than one punctuation mark, invalid token
                if punctuation_count > 1:
                    valid = False
                    break
                # Punctuation must be the last character in the token
                if i != len(token) - 1:
                    valid = False
                    break

            # Any other character must be a lowercase letter
            elif not ch.islower():
                valid = False
                break

        if valid:
            valid_count += 1

    return valid_count

# Example usage:
sentence1 = "cat and  dog"
print(count_valid_words(sentence1))  # Output: 3
sentence2 = "!this  1-s b8d!"
print(count_valid_words(sentence2))  # Output: 0