We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

Reshape Data: Melt

Number: 3073

Difficulty: Easy

Paid? No

Companies: N/A


Problem Description

Given a DataFrame where each row represents a product and its corresponding sales for four different quarters (quarter_1 to quarter_4), reshape the DataFrame so that each row represents the sales data for a product in a specific quarter. The output should have three columns: product, quarter, and sales.


Key Insights

  • The problem requires transforming data from wide format to long format.
  • Each product has multiple quarter columns that need to be "melted" into two columns: one for quarter labels and one for the corresponding sales.
  • Pandas’ melt function is a natural fit in Python; similar approaches can be simulated in other languages by iterating over each product and quarter.

Space and Time Complexity

Time Complexity: O(n * k), where n is the number of products and k is the number of quarter columns (in this case, constant 4).
Space Complexity: O(n * k) for storing the reshaped data.


Solution

The problem is solved by iterating over each row of the input data and for each product, iterating through the quarter columns (quarter_1 to quarter_4). For every quarter, we create a new record with the product name, quarter label, and the corresponding sales figure. In Python with pandas, the pd.melt function can perform this transformation concisely. For languages without a built-in melt functionality, the iteration approach is adopted by manually constructing the result set.


Code Solutions

import pandas as pd

# Define a function to reshape the DataFrame from wide to long format.
def reshape_data(report):
    # Use pd.melt to transform the DataFrame.
    reshaped_df = pd.melt(report,
                          id_vars=['product'],                            # Column to keep
                          value_vars=['quarter_1', 'quarter_2', 'quarter_3', 'quarter_4'],  # Columns to unpivot
                          var_name='quarter',                             # New column name for quarter labels
                          value_name='sales')                             # New column name for sales values
    return reshaped_df

# Example usage:
if __name__ == '__main__':
    # Create a sample DataFrame with product sales data.
    data = {
        'product': ['Umbrella', 'SleepingBag'],
        'quarter_1': [417, 800],
        'quarter_2': [224, 936],
        'quarter_3': [379, 93],
        'quarter_4': [611, 875]
    }
    report = pd.DataFrame(data)
    result = reshape_data(report)
    print(result)
← Back to All Questions