Problem Description
Given a list of directory info strings, each containing a directory path followed by one or more files with their contents (formatted as "fileName(content)"), return all groups of duplicate files in the system. Two files are considered duplicate if they have exactly the same content. Each file path is constructed by combining the directory path and the file name.
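To make the input format concrete, here is a small illustrative sample. The directory names, file names, and contents are made up for this sketch, and the `/` separator in the full paths is an assumption about how the directory and file name are combined.

```python
# Hypothetical sample input: each string is "directory file1(content1) file2(content2) ..."
paths = [
    "root/a 1.txt(abcd) 2.txt(efgh)",
    "root/c 3.txt(abcd)",
    "root/c/d 4.txt(efgh)",
    "root 4.txt(efgh)",
]

# Expected groups of full paths whose files share the same content.
# The order of the groups, and of the paths inside a group, is not significant.
expected_groups = [
    ["root/a/1.txt", "root/c/3.txt"],                    # content "abcd"
    ["root/a/2.txt", "root/c/d/4.txt", "root/4.txt"],    # content "efgh"
]
```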
Key Insights
- Parse each input string by separating the directory path from the file details.
- Extract the file name and content using string manipulation (split on spaces, then use the parentheses as delimiters).
- Use a hashmap (or dictionary) to map file content to a list of full file paths.
- Return only those groups that have more than one file path.
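The parsing step for a single directory string might look like the sketch below. The function name `parse_directory_info` is hypothetical, and the code assumes that file names contain no spaces or `(`, that contents contain no spaces, and that paths are joined with `/`.

```python
def parse_directory_info(info):
    """Return a list of (content, full_path) pairs for one directory info string."""
    # The first whitespace-separated token is the directory path;
    # every remaining token is a "fileName(content)" entry.
    directory, *entries = info.split()
    pairs = []
    for entry in entries:
        # Everything before the first '(' is the file name.
        name, _, rest = entry.partition("(")
        # The entry ends with ')', so dropping the last character leaves the content.
        content = rest[:-1]
        pairs.append((content, directory + "/" + name))
    return pairs
```

For example, `parse_directory_info("root/a 1.txt(abcd) 2.txt(efgh)")` yields the pairs `("abcd", "root/a/1.txt")` and `("efgh", "root/a/2.txt")`.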
Space and Time Complexity
Time Complexity: O(N * K), where N is the number of directory strings and K is the average length of a string, including its file details.
Space Complexity: O(N * K) for storing the hashmap mapping file content to file paths.
Solution
We iterate over each directory info string from the input. For each string, we:
- Split the string to obtain the directory path and the file information segments.
- For each file segment, take the text before '(' as the file name and the text between '(' and the closing ')' as the content.
- Construct the full file path by joining the directory path and the file name with '/'.
- Map the file content to a list of these full paths using a hashmap.
Once every string has been processed, we return the lists from the hashmap that contain more than one file path; each such list is one group of duplicate files.
This approach needs only a single pass over the input, using simple string manipulation and a hashmap to group files by content and identify duplicates.
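Putting the steps together, a minimal sketch of the whole procedure could look like this. The function name `find_duplicates` is hypothetical, and the same assumptions apply as in the description: entries are separated by whitespace, each entry has the form `fileName(content)`, and paths are joined with `/`.

```python
from collections import defaultdict


def find_duplicates(paths):
    """Group full file paths by file content and return groups with more than one path."""
    # Maps each file content to the list of full paths that have that content.
    paths_by_content = defaultdict(list)
    for info in paths:
        # First token is the directory; the rest are "fileName(content)" entries.
        directory, *entries = info.split()
        for entry in entries:
            name, _, rest = entry.partition("(")
            content = rest[:-1]  # drop the trailing ')'
            paths_by_content[content].append(f"{directory}/{name}")
    # Content seen only once has no duplicate, so those groups are filtered out.
    return [group for group in paths_by_content.values() if len(group) > 1]


sample = [
    "root/a 1.txt(abcd) 2.txt(efgh)",
    "root/c 3.txt(abcd)",
    "root/c/d 4.txt(efgh)",
    "root 4.txt(efgh)",
]
groups = find_duplicates(sample)
# groups == [["root/a/1.txt", "root/c/3.txt"],
#            ["root/a/2.txt", "root/c/d/4.txt", "root/4.txt"]]
```

Using `defaultdict(list)` avoids checking whether a content key already exists before appending. Because Python dictionaries preserve insertion order, the groups come out in the order their content was first seen.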