Sql Window Functions And Analytics Complete Guide

 Last Update:2025-06-23T00:00:00     .NET School AI Teacher - SELECT ANY TEXT TO EXPLANATION.    8 mins read      Difficulty-Level: beginner

Understanding the Core Concepts of SQL Window Functions and Analytics

SQL Window Functions and Analytics: Detailed Explanation and Important Information

Overview of Key Concepts

  1. Window Functions vs Aggregate Functions:

    • Window Functions allow computations over a set of rows called a window. They perform calculations over a set of rows and return a value for each row.
    • Aggregate Functions summarize data and return a single value, typically used with GROUP BY to summarize data into one row per group.
  2. Basic Syntax:

    SELECT 
        <columns>,
        <window_function> (<arguments>)
            OVER (
                [PARTITION BY <partitioning columns>]
                [ORDER BY <ordering columns>]
            ) AS <alias>
    FROM 
        <table>;
    
  3. Partitioning:

    • PARTITION BY divides the result set into partitions to which the window function is applied. If no partition is specified, the entire result set is treated as a single partition.
  4. Ordering:

    • ORDER BY within the OVER clause defines the order of rows in each partition. This is crucial for functions like ROW_NUMBER(), RANK(), SUM(), etc., which often require a specific order to perform their calculations.

Types of Window Functions

  1. Ranking Functions:

    • ROW_NUMBER(): Assigns a unique sequential integer to rows within a partition, starting at 1 for the first row in each partition.
      SELECT 
          employee_id, department, salary,
          ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
      FROM 
          Employees;
      
    • RANK(): Assigns a rank to rows within a partition. Rows with the same value receive the same rank, but the next rank is incremented by the number of rows with the duplicate rank.
      SELECT 
          employee_id, department, salary,
          RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
      FROM 
          Employees;
      
    • DENSE_RANK(): Similar to RANK(), but the next rank is not incremented by the number of duplicate ranks.
      SELECT 
          employee_id, department, salary,
          DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank
      FROM 
          Employees;
      
    • NTILE(n): Divides a result set into n approximately equal parts.
      SELECT 
          employee_id, department, salary,
          NTILE(4) OVER (PARTITION BY department ORDER BY salary DESC) AS quartile
      FROM 
          Employees;
      
  2. Aggregate Window Functions:

    • SUM(): Calculates the running total of a column within a partition.
      SELECT 
          employee_id, department, salary,
          SUM(salary) OVER (PARTITION BY department ORDER BY employee_id) AS running_total
      FROM 
          Employees;
      
    • AVG(): Computes the moving average within a partition.
      SELECT 
          employee_id, department, salary,
          AVG(salary) OVER (PARTITION BY department ORDER BY salary ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS moving_avg
      FROM 
          Employees;
      
    • MIN(), MAX(): Determines the minimum and maximum values within a window frame.
      SELECT 
          employee_id, department, salary,
          MIN(salary) OVER (PARTITION BY department) AS min_salary,
          MAX(salary) OVER (PARTITION BY department) AS max_salary
      FROM 
          Employees;
      
  3. Analytic Functions:

    • LEAD() / LAG(): Retrieves data from a row before or after the current row in the partition.
      SELECT 
          employee_id, department, salary,
          LEAD(salary) OVER (PARTITION BY department ORDER BY salary) AS next_salary,
          LAG(salary) OVER (PARTITION BY department ORDER BY salary) AS prev_salary
      FROM 
          Employees;
      
    • FIRST_VALUE() / LAST_VALUE(): Returns the first and last values in a partition, based on an order.
      SELECT 
          employee_id, department, salary,
          FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary) AS first_salary,
          LAST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary) AS last_salary
      FROM 
          Employees;
      
  4. Frames:

    • Frames define a specific subset of rows within a partition, over which the window function is calculated. The most common frame clauses are ROWS BETWEEN and RANGE BETWEEN.

Online Code run

🔔 Note: Select your programming language to check or run code at

💻 Run Code Compiler

Step-by-Step Guide: How to Implement SQL Window Functions and Analytics

Introduction

SQL window functions perform a calculation across a set of table rows (a window frame) related to the current row. This window frame can be defined in a variety of ways, such as all preceding or following rows, or a specified set of rows.

Common Window Functions

Here are some of the common window functions:

  • ROW_NUMBER(): Assigns a unique number to each row within a partition of a result set.
  • RANK(): Assigns a rank to each row within a partition of a result set. If there are ties, the next row(s) will have the same rank, but the following row(s) will skip one or more ranks.
  • DENSE_RANK(): Similar to RANK(), but does not skip any ranks in case of ties.
  • NTILE(): Divides the ordered result set into a specified number of groups (buckets) and assigns a unique integer number to each row indicating to which group the row belongs.
  • SUM(): Calculates the cumulative sum of a column within a partition.
  • AVG(): Calculates the cumulative average of a column within a partition.
  • COUNT(): Counts the cumulative number of rows within a partition.
  • MIN() and MAX(): Finds the cumulative minimum and maximum values within a partition.

Setting Up the Environment

First, let's create a sample table for our examples. We'll create an employees table with the following data:

CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    department_id INT,
    salary DECIMAL(10, 2)
);

INSERT INTO employees (employee_id, department_id, salary) VALUES
(1, 1, 50000.00),
(2, 1, 55000.00),
(3, 1, 60000.00),
(4, 2, 55000.00),
(5, 2, 60000.00),
(6, 2, 65000.00),
(7, 3, 52000.00),
(8, 3, 57000.00),
(9, 3, 62000.00);

Examples

Example 1: ROW_NUMBER()

Let's assign a unique row number to each employee within their department.

SELECT 
    employee_id,
    department_id,
    ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary) AS row_num
FROM 
    employees;

Output:

employee_id  department_id  row_num
1            1             1
2            1             2
3            1             3
4            2             1
5            2             2
6            2             3
7            3             1
8            3             2
9            3             3

Example 2: RANK()

Let's assign a rank to each employee within their department based on their salary.

SELECT 
    employee_id,
    department_id,
    salary,
    RANK() OVER (PARTITION BY department_id ORDER BY salary) AS rank_val
FROM 
    employees;

Output:

employee_id  department_id  salary      rank_val
1            1              50000.00    1
2            1              55000.00    2
3            1              60000.00    3
4            2              55000.00    1
5            2              60000.00    2
6            2              65000.00    3
7            3              52000.00    1
8            3              57000.00    2
9            3              62000.00    3

Example 3: DENSE_RANK()

Let's use DENSE_RANK() to assign ranks to employees within their department.

SELECT 
    employee_id,
    department_id,
    salary,
    DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary) AS dense_rank_val
FROM 
    employees;

Output:

employee_id  department_id  salary      dense_rank_val
1            1              50000.00    1
2            1              55000.00    2
3            1              60000.00    3
4            2              55000.00    1
5            2              60000.00    2
6            2              65000.00    3
7            3              52000.00    1
8            3              57000.00    2
9            3              62000.00    3

Example 4: NTILE()

Let's divide each department into 3 salary bands (buckets).

SELECT 
    employee_id,
    department_id,
    salary,
    NTILE(3) OVER (PARTITION BY department_id ORDER BY salary) AS ntile_val
FROM 
    employees;

Output:

employee_id  department_id  salary      ntile_val
1            1              50000.00    1
2            1              55000.00    2
3            1              60000.00    3
4            2              55000.00    1
5            2              60000.00    2
6            2              65000.00    3
7            3              52000.00    1
8            3              57000.00    2
9            3              62000.00    3

Example 5: SUM() - Cumulative Sum

Let's calculate the cumulative sum of salaries for each department.

SELECT 
    employee_id,
    department_id,
    salary,
    SUM(salary) OVER (PARTITION BY department_id ORDER BY salary) AS cumulative_sum
FROM 
    employees;

Output:

employee_id  department_id  salary      cumulative_sum
1            1              50000.00    50000.00
2            1              55000.00    105000.00
3            1              60000.00    165000.00
4            2              55000.00    55000.00
5            2              60000.00    115000.00
6            2              65000.00    180000.00
7            3              52000.00    52000.00
8            3              57000.00    109000.00
9            3              62000.00    171000.00

Example 6: AVG() - Cumulative Average

Let's calculate the cumulative average salary for each department.

SELECT 
    employee_id,
    department_id,
    salary,
    AVG(salary) OVER (PARTITION BY department_id ORDER BY salary) AS cumulative_avg
FROM 
    employees;

Output:

Top 10 Interview Questions & Answers on SQL Window Functions and Analytics

1. What are SQL Window Functions?

Answer: SQL Window Functions allow you to perform calculations across a set of rows related to the current row, without using subqueries. They are commonly used for ranking, ordering, and summarizing data in a more efficient manner. Unlike aggregate functions, window functions do not collapse groups into a single row.

2. How do you use the ROW_NUMBER() function?

Answer: The ROW_NUMBER() function assigns a unique sequential integer to rows within a partition of a result set. It's often used to assign a unique row number to each row in a result set after sorting. Here’s an example:

SELECT 
    EmployeeName, 
    Salary,
    ROW_NUMBER() OVER (ORDER BY Salary DESC) AS RowNum
FROM Employees;

This query assigns a row number to each employee ordered by their salary in descending order.

3. Can you explain the use of RANK() and DENSE_RANK() functions?

Answer: Both RANK() and DENSE_RANK() assign a rank number to rows, but they handle ties differently.

  • RANK(): If two or more rows have the same salary, they receive the same rank. The next rank(s) is skipped. For example, if two rows are tied for rank 1, the next rank is 3.
    SELECT 
        EmployeeName, 
        Salary,
        RANK() OVER (ORDER BY Salary DESC) AS Rank
    FROM Employees;
    
  • DENSE_RANK(): Similar to RANK(), but the next rank is not skipped. If two rows are tied for rank 1, the next rank is 2.
    SELECT 
        EmployeeName, 
        Salary,
        DENSE_RANK() OVER (ORDER BY Salary DESC) AS DenseRank
    FROM Employees;
    

4. What is the difference between PARTITION BY and ORDER BY in window functions?

Answer:

  • PARTITION BY: This clause divides the result set into partitions to which window functions are applied. It effectively breaks the result set into smaller sets, and the window function is applied to each set independently.
  • ORDER BY: Used within the OVER clause to specify the order in which rows within a partition are processed by the window function. Example:
SELECT 
    Department, 
    EmployeeName, 
    Salary,
    ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary DESC) AS RowNum
FROM Employees;

This query assigns a unique row number to each employee within their respective department, ordered by salary descending.

5. How does the SUM() window function work?

Answer: The SUM() window function calculates the cumulative sum of a given column either within a partition or across the entire result set. Here’s an example using SUM():

SELECT 
    OrderDate, 
    Amount,
    SUM(Amount) OVER (ORDER BY OrderDate) AS CumulativeSum
FROM Orders;

This query calculates the cumulative sum of order amounts by order date.

6. What is the difference between a window frame and the OVER clause?

Answer:

  • OVER Clause: Defines the partitioning and ordering of the result set, and optionally specifies the window frame.
  • Window Frame: Specifies how the rows within a partition are to be grouped for the window function to operate upon. It can be defined as a set of rows preceding and/or following the current row. It includes three components: ROWS BETWEEN, RANGE BETWEEN, and GROUPS BETWEEN. Example with Window Frame:
SELECT 
    OrderDate, 
    Amount,
    AVG(Amount) OVER (ORDER BY OrderDate ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS MovingAverage3Day
FROM Orders;

This calculates the 3-day moving average of order amounts.

7. Can you explain the LEAD() and LAG() functions?

Answer:

  • LEAD(): A window function that provides access to a row at a specified offset position after the current row within the partition of a result set.
  • LAG(): A window function that provides access to a row at a specified offset position before the current row within the partition of a result set. Example:
SELECT 
    EmployeeName, 
    Salary,
    LEAD(Salary) OVER (ORDER BY EmployeeName) AS NextSalary,
    LAG(Salary) OVER (ORDER BY EmployeeName) AS PreviousSalary
FROM Employees;

This query fetches the next and previous employee's salary.

8. How can you use window functions for calculating running totals or running averages?

Answer: You can use window functions like SUM() and AVG() with the OVER clause to calculate running totals and averages. For instance:

SELECT 
    OrderDate, 
    Amount,
    SUM(Amount) OVER (ORDER BY OrderDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CumulativeSum,
    AVG(Amount) OVER (ORDER BY OrderDate ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS MovingAverage3Day
FROM Orders;

This calculates the cumulative sum and a 3-day moving average of the order amounts.

9. What are the performance considerations when using window functions?

Answer: While window functions are powerful, they can impact performance if not used efficiently. Considerations include:

  • Avoiding large window frames that process numerous rows.
  • Indexing columns used in PARTITION BY and ORDER BY clauses.
  • Ensuring that the ROWS BETWEEN clause is optimized.
  • Testing and analyzing query performance using tools like execution plans.

10. How can window functions be nested or combined?

Answer: While window functions are typically used within the OVER clause, it's possible to combine them or nest them for more complex calculations, such as calculating the average of running totals:

SELECT 
    OrderDate, 
    Amount,
    AVG(CumulativeSum) OVER (ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS AvgCumulativeSum
FROM (
    SELECT 
        OrderDate, 
        Amount,
        SUM(Amount) OVER (ORDER BY OrderDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CumulativeSum
    FROM Orders
) T;

This query calculates the average of the cumulative sums in a moving window of two rows.

Summary

You May Like This Related .NET Topic

Login to post a comment.