Mongodb Use Cases For Aggregation Pipelines Complete Guide
Understanding the Core Concepts of MongoDB Use Cases for Aggregation Pipelines
MongoDB Use Cases for Aggregation Pipelines
1. Data Transformation and Filtering
Use Case: Data cleaning and preprocessing are crucial stages in data analysis. Aggregation pipelines can perform these tasks efficiently by filtering out unwanted data and transforming the existing data into a suitable format.
Important Information:
- $match: Filters documents that pass the specified condition.
- $project: Reshapes each document in the stream, such as by adding new fields or removing existing fields.
- $redact: Restricts the documents to the set of fields that match the given condition.
- $replaceRoot: Replaces the input document with the specified document.
Example:
db.sales.aggregate([
{ $match: { status: "A" } },
{ $project: { _id: 0, item: 1, price: 1, quantity: 1 } }
])
This pipeline filters documents with a status of "A" and projects only the item
, price
, and quantity
fields.
2. Data Aggregation and Grouping
Use Case: Summarizing and aggregating large datasets is essential for generating insights and driving business decisions. Aggregation pipelines can perform complex grouping operations.
Important Information:
- $group: Groups input documents by a specified expression and outputs to the next stage a document for each distinct grouping.
- $accumulator: Adds user-defined accumulators to the
$group
stage. - $bucket: Categorizes incoming documents into groups based on specified boundaries.
Example:
db.sales.aggregate([
{ $group: { _id: "$item", totalQuantity: { $sum: "$quantity" } } }
])
This pipeline groups documents by the item
field and sums up the quantity
for each item.
3. Data Sorting and Limiting
Use Case: Sorting and limiting results are common requirements in data analysis to focus on top-performing or bottom-performing items.
Important Information:
- $sort: Sorts all input documents and returns them in sorted order.
- $limit: Limits the number of documents passed to the next stage in the pipeline.
- $skip: Skips a specified number of documents and passes the remaining documents to the next stage.
Example:
db.sales.aggregate([
{ $sort: { totalQuantity: -1 } },
{ $limit: 5 }
])
This pipeline sorts the documents by totalQuantity
in descending order and limits the results to the top 5.
4. Joining Data from Different Collections
Use Case: Data often resides in multiple collections or databases. Aggregation pipelines can perform lookups to combine data from different collections.
Important Information:
- $lookup: Performs a left outer join to another collection in the same database to filter in documents from the “joined” collection for processing.
- $graphLookup: Recursively joins subdocuments to the current document from the same collection.
Example:
db.orders.aggregate([
{
$lookup:
{
from: "customers",
localField: "cust_id",
foreignField: "customer_id",
as: "customer_info"
}
}
])
This pipeline joins the orders
collection with the customers
collection based on the cust_id
and customer_id
fields.
5. Calculating Running Totals and Moving Averages
Use Case: Running totals and moving averages are useful for analyzing trends over time, such as stock price movements or sales performance.
Important Information:
- $addFields: Adds new fields to documents.
$addFields
outputs documents that contain all existing fields from the input documents and newly added fields. - $window: Performs operations on a set of documents in a defined window and outputs the result for each document individually.
$window
can be used to calculate running totals and moving averages.
Example:
db.sales.aggregate([
{
$window:
{
output:
{
runningTotal: { $sum: "$value" },
movingAvg: { $avg: "$value", window: { documents: [-2, 2] } }
}
}
}
])
This pipeline calculates a running total and a moving average with a window of 5 documents centered on the current document.
6. Geospatial Aggregation
Use Case: Geospatial data is critical in location-based services, logistics, and urban planning. Aggregation pipelines can perform geospatial operations directly within MongoDB.
Important Information:
- $geoNear: Outputs documents in order of nearest to farthest from a specified point.
- $geoWithin: Selects documents with geospatial data within a specified shape.
- $geoIntersects: Selects documents with geospatial data that intersect with a specified shape.
Example:
db.places.aggregate([
{
$geoNear:
{
near: { type: "Point", coordinates: [ -73.99279 , 40.719296 ] },
distanceField: "dist.calculated",
maxDistance: 2000,
spherical: true
}
}
])
This pipeline retrieves locations within a maximum distance of 2000 meters from the specified point.
7. Data Analysis and Reporting
Use Case: Generating business reports and performing data analysis are vital for making informed decisions. Aggregation pipelines can provide the required data in real-time.
Important Information:
- $facet: Performs multiple aggregation operations in parallel.
- $bucketAuto: Categorizes incoming documents into a specific number of groups based on a specified expression.
- $out: Writes the result of the aggregation pipeline to a specified collection.
Example:
db.orders.aggregate([
{
$facet:
{
byCategory:
[
{ $group: { _id: "$category", totalSales: { $sum: "$amount" } } },
{ $sort: { totalSales: -1 } },
{ $limit: 5 }
],
salesByDay:
[
{ $group: { _id: { $dayOfYear: "$orderDate" }, totalSales: { $sum: "$amount" } } },
{ $sort: { _id: 1 } },
{ $limit: 12 }
]
}
}
])
This pipeline generates two reports: the top 5 categories by total sales and the total sales by day for the last 12 days.
Conclusion
Online Code run
Step-by-Step Guide: How to Implement MongoDB Use Cases for Aggregation Pipelines
Use Case 1: Calculating Total Sales for a Product
Scenario: Suppose you have a collection named orders
with the following documents. You want to calculate the total sales for each product.
{ "_id": 1, "product_id": "A101", "quantity": 3, "price": 10 }
{ "_id": 2, "product_id": "B202", "quantity": 5, "price": 20 }
{ "_id": 3, "product_id": "A101", "quantity": 2, "price": 10 }
{ "_id": 4, "product_id": "C303", "quantity": 4, "price": 15 }
{ "_id": 5, "product_id": "B202", "quantity": 3, "price": 20 }
Objective: Calculate the total sales (quantity * price
) for each product.
Steps:
- Match: Filter the documents (if needed, but in this case, all documents are relevant).
- Group: Group by
product_id
and calculate the total sales. - Project: Format the output.
Aggregation Pipeline:
db.orders.aggregate([
{
$group: {
_id: "$product_id",
totalSales: { $sum: { $multiply: ["$quantity", "$price"] } }
}
},
{
$project: {
_id: 0,
product_id: "$_id",
totalSales: 1
}
}
])
Output:
{ "product_id": "A101", "totalSales": 50 }
{ "product_id": "B202", "totalSales": 170 }
{ "product_id": "C303", "totalSales": 60 }
Use Case 2: Filtering and Sorting Data
Scenario: Suppose you have a collection named transactions
with the following documents. You want to filter transactions for a specific year, sort them by amount in descending order, and limit the results to the top 5.
{ "_id": 1, "account_id": "123", "year": 2021, "amount": 150 }
{ "_id": 2, "account_id": "456", "year": 2022, "amount": 200 }
{ "_id": 3, "account_id": "789", "year": 2021, "amount": 250 }
{ "_id": 4, "account_id": "123", "year": 2021, "amount": 300 }
{ "_id": 5, "account_id": "456", "year": 2021, "amount": 350 }
{ "_id": 6, "account_id": "789", "year": 2022, "amount": 100 }
Objective: Filter transactions for the year 2021, sort them by amount in descending order, and limit the results to the top 5.
Steps:
- Match: Filter documents for the year 2021.
- Sort: Sort the documents by
amount
in descending order. - Limit: Limit the results to the top 5.
Aggregation Pipeline:
db.transactions.aggregate([
{
$match: { year: 2021 }
},
{
$sort: { amount: -1 }
},
{
$limit: 5
}
])
Output:
{ "_id": 5, "account_id": "456", "year": 2021, "amount": 350 }
{ "_id": 4, "account_id": "123", "year": 2021, "amount": 300 }
{ "_id": 3, "account_id": "789", "year": 2021, "amount": 250 }
{ "_id": 1, "account_id": "123", "year": 2021, "amount": 150 }
Use Case 3: Grouping by Multiple Fields and Using Accumulators
Scenario: Suppose you have a collection named sales
with the following documents. You want to group sales by region
and product
, calculate the total sales amount, and find the average sales amount per product.
{ "_id": 1, "region": "North", "product": "Widget", "amount": 200 }
{ "_id": 2, "region": "South", "product": "Widget", "amount": 150 }
{ "_id": 3, "region": "North", "product": "Gadget", "amount": 250 }
{ "_id": 4, "region": "South", "product": "Gadget", "amount": 100 }
{ "_id": 5, "region": "North", "product": "Widget", "amount": 300 }
{ "_id": 6, "region": "South", "product": "Widget", "amount": 175 }
Objective: Group sales by region
and product
, calculate the total sales amount, and find the average sales amount per product.
Steps:
- Group: Group by
region
andproduct
, and calculate total and average sales. - Project: Format the output.
Aggregation Pipeline:
db.sales.aggregate([
{
$group: {
_id: { region: "$region", product: "$product" },
totalSales: { $sum: "$amount" },
averageSales: { $avg: "$amount" }
}
},
{
$project: {
_id: 0,
region: "$_id.region",
product: "$_id.product",
totalSales: 1,
averageSales: 1
}
}
])
Output:
{ "region": "North", "product": "Widget", "totalSales": 500, "averageSales": 250 }
{ "region": "South", "product": "Widget", "totalSales": 325, "averageSales": 162.5 }
{ "region": "North", "product": "Gadget", "totalSales": 250, "averageSales": 250 }
{ "region": "South", "product": "Gadget", "totalSales": 100, "averageSales": 100 }
Use Case 4: Joining Collections (Lookup)
Scenario: Suppose you have two collections named orders
and customers
. You want to join these collections to retrieve customer information for each order.
Collections:
orders:
{ "_id": 1, "customer_id": "C100", "product": "Widget", "amount": 200 }
{ "_id": 2, "customer_id": "C101", "product": "Gadget", "amount": 150 }
{ "_id": 3, "customer_id": "C100", "product": "Widget", "amount": 300 }
customers:
{ "_id": "C100", "name": "Alice", "email": "alice@example.com" }
{ "_id": "C101", "name": "Bob", "email": "bob@example.com" }
Objective: Join the orders
and customers
collections to retrieve customer information for each order.
Steps:
- Lookup: Join the
customers
collection with theorders
collection based oncustomer_id
. - Project: Format the output.
Aggregation Pipeline:
db.orders.aggregate([
{
$lookup: {
from: "customers",
localField: "customer_id",
foreignField: "_id",
as: "customer_info"
}
},
{
$unwind: "$customer_info"
},
{
$project: {
_id: 0,
order_id: "$_id",
product: 1,
amount: 1,
customer_name: "$customer_info.name",
customer_email: "$customer_info.email"
}
}
])
Output:
Top 10 Interview Questions & Answers on MongoDB Use Cases for Aggregation Pipelines
1. Data Transformation and Formatting
Question: How can MongoDB Aggregation Pipelines be used to format and transform raw data into a more readable format?
Answer: The $project
stage is used to select and reshape fields to be returned from the pipeline. You can also use $addFields
to add new fields or modify existing ones.
Example: Suppose you have a collection employees
with document fields firstName
, lastName
, and salary
. You want to create a full name field combining firstName
and lastName
.
db.employees.aggregate([
{
$project: {
_id: 0, // exclude _id
fullName: { $concat: ["$firstName", " ", "$lastName"] },
salary: 1
}
}
]);
2. Data Filtering
Question: How can Aggregation Pipelines filter documents to return only those that meet specific criteria?
Answer: The $match
stage is used to filter documents based on query conditions. Similar to a find()
query, it is useful for narrowing down your dataset.
Example: To find all employees earning more than $70,000.
db.employees.aggregate([
{
$match: { salary: { $gt: 70000 } }
}
]);
3. Grouping Data
Question: How does MongoDB handle data grouping within a pipeline?
Answer: The $group
stage groups documents by a certain key and can optionally aggregate data based on various accumulators like $sum
, $avg
, $max
, etc.
Example: Group employees by department and find the average salary in each department.
db.employees.aggregate([
{
$group: {
_id: "$department",
avgSalary: { $avg: "$salary" }
}
}
]);
4. Sorting and Limiting Results
Question: Can Aggregation Pipelines sort and limit the number of documents returned?
Answer: Yes, the $sort
and $limit
stages serve these purposes. $sort
organizes documents in a specified order while $limit
restricts the number of documents passed to the next stage.
Example: List the top 5 highest-paid employees.
db.employees.aggregate([
{
$sort: { salary: -1 }
},
{
$limit: 5
}
]);
5. Data Lookup and Joining
Question: How can Aggregation Pipelines perform complex lookups and joins between different collections?
Answer: The $lookup
stage facilitates left outer joins, while $unwind
can deconstruct arrays into multiple documents. These are useful for combining data from related collections.
Example: List all employees along with their associated projects from the projects
collection.
db.employees.aggregate([
{
$lookup: {
from: "projects",
localField: "_id",
foreignField: "employeeId",
as: "assignedProjects"
}
}
]);
6. Conditional Logic and Computations
Question: How can Aggregation Pipelines incorporate conditional logic and computations?
Answer: Operators like $cond
, $switch
, $ifNull
, and $add
can be used to perform logic and arithmetic operations.
Example: Add a new field to indicate whether an employee is high earning (over $100,000).
db.employees.aggregate([
{
$addFields: {
isHighEarning: { $cond: { if: { $gt: ["$salary", 100000] }, then: true, else: false } }
}
}
]);
7. Handling Null and Missing Values
Question: How can Aggregation Pipelines manage null or missing values?
Answer: Operators such as $ifNull
and $filter
allow for handling missing or null values efficiently.
Example: Set salary
to 0 for any null or missing value.
db.employees.aggregate([
{
$addFields: {
salary: { $ifNull: ["$salary", 0] }
}
}
]);
8. Complex Data Extraction Using $unwind
and $group
Question: How can Aggregation Pipelines handle arrays and extract complex data?
Answer: $unwind
splits the documents into multiple ones based on an array field. After $unwind
, operations like $group
can aggregate data across the expanded documents.
Example: Find out how many projects each employee is assigned to.
db.projects.aggregate([
{
$unwind: "$employeeIds"
},
{
$group: {
_id: "$employeeIds",
projectCount: { $sum: 1 }
}
}
]);
9. Pipeline Optimization and Indexing
Question: How can you improve the performance of MongoDB Aggregation Pipelines?
Answer: Proper indexing can significantly boost the performance of $match
stages. Utilizing indexes on fields that are frequently used in filtering, sorting, and grouping operations can substantially speed up the aggregation processes.
Example: Ensure there is an index on department
for the grouping query from earlier.
db.employees.createIndex({ department: 1 });
10. Data Export and Analysis
Question: Can Aggregation Pipelines aid in exporting and analyzing data?
Answer: Aggregation Pipelines can output data into various formats for further analysis in other tools via $out
to another collection, $merge
to merge with another collection, or $export
(in newer versions) to export results to a file or other destinations.
Example: Export the aggregated data to a CSV file using the MongoDB Shell.
Login to post a comment.