LeetCode Solutions Blog

Delete Duplicate Folders in System

Number: 2079

Difficulty: Hard

Paid? No

Problem Description

Given a list of absolute folder paths, build the hierarchical folder structure and identify duplicate subtrees (folders with identical subfolder structures). If two or more folders are identical (they have the same non-empty set of same subfolders recursively), then mark all such folders and their entire subtrees for deletion. After a one-time deletion on all marked folders, return the paths of the remaining folders.

Key Insights

Build a tree (trie) structure to represent the folder hierarchy.
Serialize each subtree into a unique string that captures the folder name and its sorted children’s serialization.
Use a hash map to count how many times each subtree serialization appears.
Mark nodes (and their subtrees) for deletion if their serialization occurs more than once.
Traverse the tree again to output all remaining folder paths, skipping any marked nodes.

Space and Time Complexity

Time Complexity: O(N * L * log(degree)) where N is the number of folder nodes, L is the average length of each serialization and log factor comes from sorting the children. Space Complexity: O(N * L) due to the tree structure and storage of serialization strings.

Solution

We first build a tree from the given paths, where each node represents a folder and has a dictionary of children nodes. We then traverse the tree from the bottom-up (postorder DFS) to compute a canonical string representation (serialization) for each subtree. This serialization includes the folder name and the sorted serializations of its children. We count the occurrences of each serialization in a global dictionary. If any subtree has a count greater than one, we mark that node for deletion. Finally, we perform another DFS traversal to collect all remaining folder paths by skipping over any subtree marked for deletion.

Code Solutions

# Python solution
class FolderNode:
    def __init__(self, name):
        self.name = name
        self.children = {}  # key: folder name, value: FolderNode instance
        self.serial = ""   # serialized string of the subtree
        self.delete = False  # mark for deletion

class Solution:
    def deleteDuplicateFolder(self, paths):
        # Build the folder tree with a dummy root
        root = FolderNode("")
        for path in paths:
            curr = root
            for folder in path:
                if folder not in curr.children:
                    curr.children[folder] = FolderNode(folder)
                curr = curr.children[folder]
        
        # Dictionary to count subtree serializations
        subtreeCount = {}
        
        # Postorder DFS to serialize each subtree
        def serialize(node):
            # If node has no children, the serialization is empty string
            # We use empty string to represent no folder below so that only non-empty subfolder sets are serialized.
            if not node.children:
                node.serial = ""
            else:
                # Serialize children in sorted order to ensure identical subtrees produce the same string
                parts = []
                for child in sorted(node.children.keys()):
                    parts.append(child + "(" + serialize(node.children[child]) + ")")
                node.serial = "".join(parts)
            # Count the serialization if it is non-empty (only consider folders with subfolders)
            if node.serial:
                subtreeCount[node.serial] = subtreeCount.get(node.serial, 0) + 1
            return node.serial
        
        serialize(root)
        
        # Mark nodes for deletion if their serialization count > 1
        def mark_deletion(node):
            for child in node.children.values():
                mark_deletion(child)
            if node.serial and subtreeCount.get(node.serial, 0) > 1:
                node.delete = True
        
        mark_deletion(root)
        
        # Collect the remaining paths (skip the entire marked subtree)
        result = []
        def collect(node, path):
            for folder, child in node.children.items():
                if child.delete:
                    continue  # Skip this folder and its subtree
                result.append(path + [folder])
                collect(child, path + [folder])
        collect(root, [])
        return result

# Example usage:
# solution = Solution()
# print(solution.deleteDuplicateFolder([["a"],["c"],["d"],["a","b"],["c","b"],["d","a"]]))