Mongodb Use Cases For Aggregation Pipelines Complete Guide

Last Update:2025-06-23T00:00:00 .NET School AI Teacher - SELECT ANY TEXT TO EXPLANATION. 8 mins read Difficulty-Level: beginner

Understanding the Core Concepts of MongoDB Use Cases for Aggregation Pipelines

MongoDB Use Cases for Aggregation Pipelines

1. Data Transformation and Filtering

Use Case: Data cleaning and preprocessing are crucial stages in data analysis. Aggregation pipelines can perform these tasks efficiently by filtering out unwanted data and transforming the existing data into a suitable format.

Important Information:

$match: Filters documents that pass the specified condition.
$project: Reshapes each document in the stream, such as by adding new fields or removing existing fields.
$redact: Restricts the documents to the set of fields that match the given condition.
$replaceRoot: Replaces the input document with the specified document.

Example:

db.sales.aggregate([
  { $match: { status: "A" } },
  { $project: { _id: 0, item: 1, price: 1, quantity: 1 } }
])

This pipeline filters documents with a status of "A" and projects only the item, price, and quantity fields.

2. Data Aggregation and Grouping

Use Case: Summarizing and aggregating large datasets is essential for generating insights and driving business decisions. Aggregation pipelines can perform complex grouping operations.

Important Information:

$group: Groups input documents by a specified expression and outputs to the next stage a document for each distinct grouping.
$accumulator: Adds user-defined accumulators to the $group stage.
$bucket: Categorizes incoming documents into groups based on specified boundaries.

Example:

db.sales.aggregate([
  { $group: { _id: "$item", totalQuantity: { $sum: "$quantity" } } }
])

This pipeline groups documents by the item field and sums up the quantity for each item.

3. Data Sorting and Limiting

Use Case: Sorting and limiting results are common requirements in data analysis to focus on top-performing or bottom-performing items.

Important Information:

$sort: Sorts all input documents and returns them in sorted order.
$limit: Limits the number of documents passed to the next stage in the pipeline.
$skip: Skips a specified number of documents and passes the remaining documents to the next stage.

Example:

db.sales.aggregate([
  { $sort: { totalQuantity: -1 } },
  { $limit: 5 }
])

This pipeline sorts the documents by totalQuantity in descending order and limits the results to the top 5.

4. Joining Data from Different Collections

Use Case: Data often resides in multiple collections or databases. Aggregation pipelines can perform lookups to combine data from different collections.

Important Information:

$lookup: Performs a left outer join to another collection in the same database to filter in documents from the “joined” collection for processing.
$graphLookup: Recursively joins subdocuments to the current document from the same collection.

Example:

db.orders.aggregate([
  {
    $lookup:
      {
        from: "customers",
        localField: "cust_id",
        foreignField: "customer_id",
        as: "customer_info"
      }
   }
])

This pipeline joins the orders collection with the customers collection based on the cust_id and customer_id fields.

5. Calculating Running Totals and Moving Averages

Use Case: Running totals and moving averages are useful for analyzing trends over time, such as stock price movements or sales performance.

Important Information:

$addFields: Adds new fields to documents. $addFields outputs documents that contain all existing fields from the input documents and newly added fields.
$window: Performs operations on a set of documents in a defined window and outputs the result for each document individually. $window can be used to calculate running totals and moving averages.

Example:

db.sales.aggregate([
  {
    $window:
      {
        output:
          {
            runningTotal: { $sum: "$value" },
            movingAvg: { $avg: "$value", window: { documents: [-2, 2] } }
          }
      }
  }
])

This pipeline calculates a running total and a moving average with a window of 5 documents centered on the current document.

6. Geospatial Aggregation

Use Case: Geospatial data is critical in location-based services, logistics, and urban planning. Aggregation pipelines can perform geospatial operations directly within MongoDB.

Important Information:

$geoNear: Outputs documents in order of nearest to farthest from a specified point.
$geoWithin: Selects documents with geospatial data within a specified shape.
$geoIntersects: Selects documents with geospatial data that intersect with a specified shape.

Example:

db.places.aggregate([
  {
    $geoNear:
      {
        near: { type: "Point", coordinates: [ -73.99279 , 40.719296 ] },
        distanceField: "dist.calculated",
        maxDistance: 2000,
        spherical: true
      }
  }
])

This pipeline retrieves locations within a maximum distance of 2000 meters from the specified point.

7. Data Analysis and Reporting

Use Case: Generating business reports and performing data analysis are vital for making informed decisions. Aggregation pipelines can provide the required data in real-time.

Important Information:

$facet: Performs multiple aggregation operations in parallel.
$bucketAuto: Categorizes incoming documents into a specific number of groups based on a specified expression.
$out: Writes the result of the aggregation pipeline to a specified collection.

Example:

db.orders.aggregate([
  {
    $facet:
      {
        byCategory:
          [
            { $group: { _id: "$category", totalSales: { $sum: "$amount" } } },
            { $sort: { totalSales: -1 } },
            { $limit: 5 }
          ],
        salesByDay:
          [
            { $group: { _id: { $dayOfYear: "$orderDate" }, totalSales: { $sum: "$amount" } } },
            { $sort: { _id: 1 } },
            { $limit: 12 }
          ]
      }
  }
])

This pipeline generates two reports: the top 5 categories by total sales and the total sales by day for the last 12 days.

Conclusion

Online Code run

🔔 Note: Select your programming language to check or run code at

💻 Run Code Compiler

Step-by-Step Guide: How to Implement MongoDB Use Cases for Aggregation Pipelines

Use Case 1: Calculating Total Sales for a Product

Scenario: Suppose you have a collection named orders with the following documents. You want to calculate the total sales for each product.

{ "_id": 1, "product_id": "A101", "quantity": 3, "price": 10 }
{ "_id": 2, "product_id": "B202", "quantity": 5, "price": 20 }
{ "_id": 3, "product_id": "A101", "quantity": 2, "price": 10 }
{ "_id": 4, "product_id": "C303", "quantity": 4, "price": 15 }
{ "_id": 5, "product_id": "B202", "quantity": 3, "price": 20 }

Objective: Calculate the total sales (quantity * price) for each product.

Steps:

Match: Filter the documents (if needed, but in this case, all documents are relevant).
Group: Group by product_id and calculate the total sales.
Project: Format the output.

Aggregation Pipeline:

db.orders.aggregate([
  {
    $group: {
      _id: "$product_id",
      totalSales: { $sum: { $multiply: ["$quantity", "$price"] } }
    }
  },
  {
    $project: {
      _id: 0,
      product_id: "$_id",
      totalSales: 1
    }
  }
])

Output:

{ "product_id": "A101", "totalSales": 50 }
{ "product_id": "B202", "totalSales": 170 }
{ "product_id": "C303", "totalSales": 60 }

Use Case 2: Filtering and Sorting Data

Scenario: Suppose you have a collection named transactions with the following documents. You want to filter transactions for a specific year, sort them by amount in descending order, and limit the results to the top 5.

{ "_id": 1, "account_id": "123", "year": 2021, "amount": 150 }
{ "_id": 2, "account_id": "456", "year": 2022, "amount": 200 }
{ "_id": 3, "account_id": "789", "year": 2021, "amount": 250 }
{ "_id": 4, "account_id": "123", "year": 2021, "amount": 300 }
{ "_id": 5, "account_id": "456", "year": 2021, "amount": 350 }
{ "_id": 6, "account_id": "789", "year": 2022, "amount": 100 }

Objective: Filter transactions for the year 2021, sort them by amount in descending order, and limit the results to the top 5.

Steps:

Match: Filter documents for the year 2021.
Sort: Sort the documents by amount in descending order.
Limit: Limit the results to the top 5.

Aggregation Pipeline:

db.transactions.aggregate([
  {
    $match: { year: 2021 }
  },
  {
    $sort: { amount: -1 }
  },
  {
    $limit: 5
  }
])

Output:

{ "_id": 5, "account_id": "456", "year": 2021, "amount": 350 }
{ "_id": 4, "account_id": "123", "year": 2021, "amount": 300 }
{ "_id": 3, "account_id": "789", "year": 2021, "amount": 250 }
{ "_id": 1, "account_id": "123", "year": 2021, "amount": 150 }

Use Case 3: Grouping by Multiple Fields and Using Accumulators

Scenario: Suppose you have a collection named sales with the following documents. You want to group sales by region and product, calculate the total sales amount, and find the average sales amount per product.

{ "_id": 1, "region": "North", "product": "Widget", "amount": 200 }
{ "_id": 2, "region": "South", "product": "Widget", "amount": 150 }
{ "_id": 3, "region": "North", "product": "Gadget", "amount": 250 }
{ "_id": 4, "region": "South", "product": "Gadget", "amount": 100 }
{ "_id": 5, "region": "North", "product": "Widget", "amount": 300 }
{ "_id": 6, "region": "South", "product": "Widget", "amount": 175 }

Objective: Group sales by region and product, calculate the total sales amount, and find the average sales amount per product.

Steps:

Group: Group by region and product, and calculate total and average sales.
Project: Format the output.

Aggregation Pipeline:

db.sales.aggregate([
  {
    $group: {
      _id: { region: "$region", product: "$product" },
      totalSales: { $sum: "$amount" },
      averageSales: { $avg: "$amount" }
    }
  },
  {
    $project: {
      _id: 0,
      region: "$_id.region",
      product: "$_id.product",
      totalSales: 1,
      averageSales: 1
    }
  }
])

Output:

{ "region": "North", "product": "Widget", "totalSales": 500, "averageSales": 250 }
{ "region": "South", "product": "Widget", "totalSales": 325, "averageSales": 162.5 }
{ "region": "North", "product": "Gadget", "totalSales": 250, "averageSales": 250 }
{ "region": "South", "product": "Gadget", "totalSales": 100, "averageSales": 100 }

Use Case 4: Joining Collections (Lookup)

Scenario: Suppose you have two collections named orders and customers. You want to join these collections to retrieve customer information for each order.

Collections:

orders:

{ "_id": 1, "customer_id": "C100", "product": "Widget", "amount": 200 }
{ "_id": 2, "customer_id": "C101", "product": "Gadget", "amount": 150 }
{ "_id": 3, "customer_id": "C100", "product": "Widget", "amount": 300 }

customers:

{ "_id": "C100", "name": "Alice", "email": "alice@example.com" }
{ "_id": "C101", "name": "Bob", "email": "bob@example.com" }

Objective: Join the orders and customers collections to retrieve customer information for each order.

Steps:

Lookup: Join the customers collection with the orders collection based on customer_id.
Project: Format the output.

Aggregation Pipeline:

db.orders.aggregate([
  {
    $lookup: {
      from: "customers",
      localField: "customer_id",
      foreignField: "_id",
      as: "customer_info"
    }
  },
  {
    $unwind: "$customer_info"
  },
  {
    $project: {
      _id: 0,
      order_id: "$_id",
      product: 1,
      amount: 1,
      customer_name: "$customer_info.name",
      customer_email: "$customer_info.email"
    }
  }
])

Output:

Top 10 Interview Questions & Answers on MongoDB Use Cases for Aggregation Pipelines

1. Data Transformation and Formatting

Question: How can MongoDB Aggregation Pipelines be used to format and transform raw data into a more readable format? Answer: The $project stage is used to select and reshape fields to be returned from the pipeline. You can also use $addFields to add new fields or modify existing ones.

Example: Suppose you have a collection employees with document fields firstName, lastName, and salary. You want to create a full name field combining firstName and lastName.

db.employees.aggregate([
    {
        $project: {
            _id: 0, // exclude _id
            fullName: { $concat: ["$firstName", " ", "$lastName"] },
            salary: 1
        }
    }
]);

2. Data Filtering

Question: How can Aggregation Pipelines filter documents to return only those that meet specific criteria? Answer: The $match stage is used to filter documents based on query conditions. Similar to a find() query, it is useful for narrowing down your dataset.

Example: To find all employees earning more than $70,000.

db.employees.aggregate([
    {
        $match: { salary: { $gt: 70000 } }
    }
]);

3. Grouping Data

Question: How does MongoDB handle data grouping within a pipeline? Answer: The $group stage groups documents by a certain key and can optionally aggregate data based on various accumulators like $sum, $avg, $max, etc.

Example: Group employees by department and find the average salary in each department.

db.employees.aggregate([
    {
        $group: {
            _id: "$department",
            avgSalary: { $avg: "$salary" }
        }
    }
]);

4. Sorting and Limiting Results

Question: Can Aggregation Pipelines sort and limit the number of documents returned? Answer: Yes, the $sort and $limit stages serve these purposes. $sort organizes documents in a specified order while $limit restricts the number of documents passed to the next stage.

Example: List the top 5 highest-paid employees.

db.employees.aggregate([
    {
        $sort: { salary: -1 }
    },
    {
        $limit: 5
    }
]);

5. Data Lookup and Joining

Question: How can Aggregation Pipelines perform complex lookups and joins between different collections? Answer: The $lookup stage facilitates left outer joins, while $unwind can deconstruct arrays into multiple documents. These are useful for combining data from related collections.

Example: List all employees along with their associated projects from the projects collection.

db.employees.aggregate([
    {
        $lookup: {
            from: "projects",
            localField: "_id",
            foreignField: "employeeId",
            as: "assignedProjects"
        }
    }
]);

6. Conditional Logic and Computations

Question: How can Aggregation Pipelines incorporate conditional logic and computations? Answer: Operators like $cond, $switch, $ifNull, and $add can be used to perform logic and arithmetic operations.

Example: Add a new field to indicate whether an employee is high earning (over $100,000).

db.employees.aggregate([
    {
        $addFields: {
            isHighEarning: { $cond: { if: { $gt: ["$salary", 100000] }, then: true, else: false } }
        }
    }
]);

7. Handling Null and Missing Values

Question: How can Aggregation Pipelines manage null or missing values? Answer: Operators such as $ifNull and $filter allow for handling missing or null values efficiently.

Example: Set salary to 0 for any null or missing value.

db.employees.aggregate([
    {
        $addFields: {
            salary: { $ifNull: ["$salary", 0] }
        }
    }
]);

8. Complex Data Extraction Using `$unwind` and `$group`

Question: How can Aggregation Pipelines handle arrays and extract complex data? Answer: $unwind splits the documents into multiple ones based on an array field. After $unwind, operations like $group can aggregate data across the expanded documents.

Example: Find out how many projects each employee is assigned to.

db.projects.aggregate([
    {
        $unwind: "$employeeIds"
    },
    {
        $group: {
            _id: "$employeeIds",
            projectCount: { $sum: 1 }
        }
    }
]);

9. Pipeline Optimization and Indexing

Question: How can you improve the performance of MongoDB Aggregation Pipelines? Answer: Proper indexing can significantly boost the performance of $match stages. Utilizing indexes on fields that are frequently used in filtering, sorting, and grouping operations can substantially speed up the aggregation processes.

Example: Ensure there is an index on department for the grouping query from earlier.

db.employees.createIndex({ department: 1 });

10. Data Export and Analysis

Question: Can Aggregation Pipelines aid in exporting and analyzing data? Answer: Aggregation Pipelines can output data into various formats for further analysis in other tools via $out to another collection, $merge to merge with another collection, or $export (in newer versions) to export results to a file or other destinations.

Example: Export the aggregated data to a CSV file using the MongoDB Shell.

Mongodb Use Cases For Aggregation Pipelines Complete Guide