We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

Smallest Missing Genetic Value in Each Subtree

Number: 2131

Difficulty: Hard

Paid? No

Companies: N/A


Problem Description

Given a tree (represented by a parents array) with n nodes and an array nums which assigns each node a unique genetic value (in the range [1, 10^5]), find for each node the smallest positive integer that is missing from the subtree rooted at that node (the subtree includes the node itself and all its descendants).

For example, if a subtree’s genetic values are [1,2,3,4], then the smallest missing value is 5. If a subtree does not contain 1, then the answer is 1.


Key Insights

  • Only nodes on the path from the node containing genetic value 1 (if it exists) up to the root can have answers different from 1; all others will have answer = 1.
  • A DFS starting from the node with genetic value 1 can be used to “collect” genetic values in subtrees.
  • Propagate the union of seen genetic values upward along the parent chain, and update a pointer for the smallest missing integer.
  • Use an auxiliary visited/seen array (of booleans) for genetic values to efficiently check which numbers have been encountered.
  • Each node is processed at most once in the DFS, ensuring linear time complexity.

Space and Time Complexity

Time Complexity: O(n + m) where n is the number of nodes and m is the maximum genetic value (bounded by 10^5) – overall linear. Space Complexity: O(n + m) due to the tree structure (children list), recursion stack, and the visited array for genetic values.


Solution

We start by building an adjacency list from the parents array in order to easily access each node’s children. The key observation is that if genetic value 1 is not present in the tree, then every subtree is missing 1. Otherwise, we locate the node that contains 1. Then, for each node from that node up to the root (using the parent chain), we run an efficient DFS that marks all genetic values found in the currently processed subtree (only processing nodes that haven’t been visited before by taking advantage of the union property). After processing the subtree of the current node, we update a running variable (curMissing) that holds the smallest missing genetic value by checking the visited array. This value is then saved as the answer for the current node, and we move upward.

The approach is efficient because, even though we run a DFS from each node on the path, each node in the tree is processed at most once across all DFS calls.


Code Solutions

Below are code solutions in Python, JavaScript, C++, and Java with line-by-line comments.

# Python solution

# Build the children list from the parents array.
# Use a DFS to mark genetic values seen in the subtrees.
# Propagate upwards from the node containing 1.

def smallestMissingValueSubtree(parents, nums):
    n = len(nums)
    # build children list
    children = [[] for _ in range(n)]
    for node in range(1, n):
        children[parents[node]].append(node)
    
    # initialize answers with 1, the default answer for nodes missing gene 1
    ans = [1] * n

    # find the node that contains gene 1; if not found, return ans (all 1)
    try:
        start = nums.index(1)
    except ValueError:
        return ans

    # visited array for genetic values; size: max gene value possible +2
    max_gene = 10**5 + 2
    visited = [False] * (max_gene)

    # DFS function to traverse subtree from node and mark genes as seen
    def dfs(node):
        # if this genetic value already seen, no need to traverse again
        if visited[nums[node]]:
            return
        visited[nums[node]] = True
        # recursively visit children nodes
        for child in children[node]:
            dfs(child)
    
    # curMissing will track the smallest missing positive integer
    curMissing = 1
    # propagate upward from the node that has gene 1 to the root
    cur = start
    while cur != -1:
        # DFS from the current node to mark all genes in its subtree
        dfs(cur)
        # increase curMissing until a gene is missing
        while curMissing < max_gene and visited[curMissing]:
            curMissing += 1
        # update answer for the current node
        ans[cur] = curMissing
        # move upward to the parent of the current node
        cur = parents[cur]
    
    return ans

# Example usage:
# parents = [-1,0,0,2]
# nums = [1,2,3,4]
# print(smallestMissingValueSubtree(parents, nums))  # Output: [5,1,1,1]
← Back to All Questions