We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

Word Frequency

Number: 192

Difficulty: Medium

Paid? No

Companies: Google, Amazon


Problem Description

Write a bash script to calculate the frequency of each word in a text file words.txt, where the file contains only lowercase characters and spaces. Words are separated by one or more whitespace characters, and the result should be output sorted by descending frequency.


Key Insights

  • The problem can be efficiently solved using Unix pipelines.
  • Use tr to transform spaces into newlines to list each word on a new line.
  • Utilize sort and uniq to count occurrences.
  • A secondary sort (with sort -nr) orders the words by descending frequency.
  • The guarantee of unique frequency counts simplifies handling of ties.

Space and Time Complexity

Time Complexity: O(n log n) due to sorting the list of words.
Space Complexity: O(n) where n is the number of words in words.txt.


Solution

The script processes the file words.txt by first converting spaces to newlines with tr. Then, it sorts the resulting list of words so that identical words are adjacent—making it easy for uniq to count occurrences. Another sort orders the counts in descending order. This method leverages Unix utilities to efficiently count and sort word frequencies without explicitly storing data in a custom data structure.


Code Solutions

Additionally, here's the one-liner solution in bash: cat words.txt | tr ' ' '\n' | sort | uniq -c | sort -nr | awk '{print $2, $1}'

# Python simulation to mimic the Unix pipeline approach.
# Open the file and read all words, then count frequencies using Counter.
with open("words.txt", "r") as file:
    content = file.read()
words = content.split()  # Split the content by any whitespace.
from collections import Counter
word_counts = Counter(words)
# Sort the word frequency pairs in descending order based on counts.
sorted_word_counts = sorted(word_counts.items(), key=lambda kv: kv[1], reverse=True)
# Print each word with its frequency.
for word, count in sorted_word_counts:
    print(f"{word} {count}")
← Back to All Questions