Chapter 7 - Advanced Queries

 

HIERARCHICAL QUERIES:

Queries that can be used to produce a tree of some kind such as a family tree or organizational chart.

Root Node: Top node in the tree.

Parent Node: One or nodes beneath it.

Child Node: Has a parent above it.

Leaf Node:  No children.

 

Example-1:

SELECT employee_id, manager_id, first_name, last_name
FROM more_employees
START WITH employee_id=1
CONNECT BY PRIOR employee_id = manager_id;

Example-2:

SELECT LEVEL, employee_id, manager_id, first_name, last_name
FROM more_employees
START WITH employee_id=1
CONNECT BY PRIOR employee_id = manager_id
ORDER BY LEVEL;

Example-3:

SET PAGESIZE 999

COLUMN employee FORMAT A25

SELECT LEVEL, LPAD(' ', 2 * LEVEL -1) || first_name ||' '|| last_name
FROM more_employees
START WITH employee_id=1
CONNECT BY PRIOR employee_id = manager_id;

 

Example-4: Start with Node other than ROOT

SELECT LEVEL, LPAD(' ', 2 * LEVEL -1) || first_name ||' '|| last_name
FROM more_employees
START WITH last_name='Jones'
CONNECT BY PRIOR employee_id = manager_id;

 

Example-5: Using a Sub-Query.

SELECT LEVEL, LPAD(' ', 2 * LEVEL -1) || first_name ||' '|| last_name
FROM more_employees
START WITH employee_id=
    (SELECT employee_id FROM more_employees WHERE first_name = 'Kevin' AND last_name = 'Black')
CONNECT BY PRIOR employee_id = manager_id;

 

Example-6: Traversing Upward through the tree.

SELECT LEVEL, LPAD(' ', 2 * LEVEL -1) || first_name ||' '|| last_name
FROM more_employees
START WITH last_name='Jones'
CONNECT BY PRIOR manager_id = employee_id;

 

Example-7: Removing Nodes.

SELECT LEVEL, LPAD(' ', 2 * LEVEL -1) || first_name ||' '|| last_name
FROM more_employees
WHERE last_name!='Johnson'
START WITH employee_id=1
CONNECT BY PRIOR employee_id = manager_id;

Example-8: Removing Branches.

SELECT LEVEL, LPAD(' ', 2 * LEVEL -1) || first_name ||' '|| last_name
FROM more_employees
START WITH employee_id=1
CONNECT BY PRIOR employee_id = manager_id AND last_name!='Johnson';

 

Example-9: Using as a Sub-Query.

SELECT lvl, empname FROM(
        SELECT LEVEL as lvl, LPAD(' ', 2 * LEVEL -1) || first_name ||' '|| last_name As empname
        FROM more_employees
        START WITH employee_id=1
        CONNECT BY PRIOR employee_id = manager_id) inner
WHERE lvl <= 2
 

Using the Extended GROUP BY Clauses

ROLLUP: Generates sub-totals for each column in the group by and totals for the whole

CUBE: Generates sub-totals for all combinations of the values in the group by clause

GROUPING:  Returns 1 if the columns value is generated by a ROLLUP or CUBE.

GROUPING_ID() Function: Can be used in a Having Clause to filter groups.

GROUP_ID() Function: Remove duplicate rows returned by GROUP BY clause.

Rollup and Grouping Examples 

Example 1: Rollup

select gender, avg(sal) from emp group by ROLLUP(gender);

Output 1: Rollup
G AVG(SAL)
- ----------
F 2025
M 2282.14286
    2153.57143

Example 2: Rollup

select gender, deptno, avg(sal) from emp group by rollup(gender, deptno);

Output 2
G DEPTNO AVG(SAL)
- ---------- ----------
F 10             1300
F 20             2987.5
F 30             1725
F                  2025
M 10            4225
M 20            1633.33333
M 30            1312.5
M                 2282.14286
                    2153.57143

 

        Example 3: Rollup and Grouping

SELECT DECODE(GROUPING(gender), 1, 'Overall ', gender) as gender, DECODE(GROUPING(deptno), 1, 'Sex Avg', deptno) as deptno, avg(sal)
FROM emp GROUP BY rollup(gender, deptno);

Output 3: Rollup and Grouping

        GENDER DEPTNO AVG(SAL)
        -------- ------------- ----------
                F         10          1300
                F         20          2987.5
                F         30          1725
                F         Sex Avg 2025
                M        10          4225
                M        20         1633.33333
                M        30         1312.5
                M       Sex Avg 2282.14286
               Overall Sex Avg 2153.57143

9 rows selected.

 

Cube and Grouping Handout CIS328

Example 4: Cube with one field

select gender, avg(sal) from emp group by cube(gender);

G AVG(SAL)
- ----------
F     2025
M    2282.14286
       2153.57143

Example 5: Cube with two fields

select gender, deptno, avg(sal) from emp group by cube(gender, deptno);

G DEPTNO AVG(SAL)
- ---------- ----------
F         10     1300
F         20     2987.5
F         30     1725
F                  2025
M         10    4225
M         20    1633.33333
M         30    1312.5
M                 2282.14286
            10     3250
            20     2175
            30     1587.5
                     2153.57143

 

Example 6: Cube and Grouping

SELECT DECODE(GROUPING(gender), 1, 'All Sex', gender) As gender,
             DECODE(GROUPING(deptno), 1, 'All Depts', deptno) as Dept, avg(sal)
FROM emp
GROUP BY CUBE(gender, deptno);

Output 6

GENDER DEPT         AVG(SAL)
------- --------------- ----------
F                     10         1300
F                     20         2987.5
F                     30         1725
F             All Depts       2025
M                    10         4225
M                    20         1633.33333
M                    30         1312.5
M          All Depts         2282.14286
All Sex             10         3250
All Sex             20         2175
All Sex             30         1587.5
All Sex   All Depts         2153.57143

 

Example 7: GROUPING_ID() Function:

SELECT DECODE(GROUPING(gender), 1, 'All Sex', gender) As gender,
             DECODE(GROUPING(deptno), 1, 'All Depts', deptno) as Dept, avg(sal)
FROM emp
GROUP BY CUBE(gender, deptno)
HAVING GROUPING_ID(gender, deptno) > 0;

 

Example 7: GROUPING_ID() Function:

SELECT DECODE(GROUPING(gender), 1, 'All Sex', gender) As gender,
             DECODE(GROUPING(deptno), 1, 'All Depts', deptno) as Dept, avg(sal)
FROM emp
GROUP BY CUBE(gender, deptno)
HAVING GROUP_ID() = 0;

 

Using Analytic Functions:

Ranking functions:  Calculate Ranks, percentiels, n-tiles, etc.

Inverse percentile functions:  Enable you to calculate the value that corresponds to a percentile.

Window functions:  Enable you to calculate cumulative and moving aggregates.

Reporting functions:  Enable you to calculate things like market shares.

Lag and lead functions:  Enable you to get a value in a row where that row is a certain number of rows away from the current now.

First and last functions:  Enalbe you to get the first and last values in an ordered group.

Linear regression functions:  Enable you to fit an ordinary-least-squares regression time to a set of number pairs.

Hypothetical rank and distribution functions:  Enable you to calculate the rank and percentile that a new row would have if you inserted it into a table.

 

SELECT * FROM all_sales; -- Example table.

 

Ranking Functions: RANK(), DENSE_RANK(), CUME_DIST(), PERCENT_RANK(), NTILE(), ROW_NUMBER()

SELECT prd_type_id, SUM(amount),
        RANK() OVER (ORDER BY SUM(amount) DESC) AS rank,
        DENSE_RANK() OVER (ORDER BY SUM(amount) DESC) AS dense_rank
FROM all_sales
WHERE year=2003
AND amount IS NOT NULL
GROUP BY prd_type_id
ORDER BY prd_type_id;
 

SELECT prd_type_id, SUM(amount),
        RANK() OVER (ORDER BY SUM(amount) DESC) AS rank,
        DENSE_RANK() OVER (ORDER BY SUM(amount) DESC) AS dense_rank
FROM all_sales
WHERE year=2003
GROUP BY prd_type_id
ORDER BY prd_type_id;
 

SELECT prd_type_id, SUM(amount),
        RANK() OVER (ORDER BY SUM(amount) DESC NULLS LAST) AS rank,
        DENSE_RANK() OVER (ORDER BY SUM(amount) DESC NULLS LAST) AS dense_rank
FROM all_sales
WHERE year=2003
GROUP BY prd_type_id
ORDER BY prd_type_id;
 

PARTITION clause:  Allows you to create sub-groups.

SELECT prd_type_id, month, SUM(amount),
        RANK() OVER (PARTITION BY month ORDER BY SUM(amount) DESC) AS rank
FROM all_sales
WHERE year=2003 AND amount IS NOT NULL
GROUP BY prd_type_id, month
ORDER BY prd_type_id, month;
 

You can also use the ROLLUP, CUBE, and GROUPING SETS functions:

SELECT prd_type_id, SUM(amount),
        RANK() OVER (ORDER BY SUM(amount) DESC) AS rank
FROM all_sales
WHERE year=2003 AND amount IS NOT NULL
GROUP BY ROLLUP(prd_type_id)
ORDER BY prd_type_id;
 

SELECT prd_type_id, emp_id, SUM(amount),
        RANK() OVER (ORDER BY SUM(amount) DESC) AS rank
FROM all_sales
WHERE year=2003 AND amount IS NOT NULL
GROUP BY CUBE(prd_type_id, emp_id)
ORDER BY prd_type_id, emp_id;
 

SELECT prd_type_id, emp_id, SUM(amount),
        RANK() OVER (ORDER BY SUM(amount) DESC) AS rank
FROM all_sales
WHERE year=2003 AND amount IS NOT NULL
GROUP BY GROUPING SETS(prd_type_id, emp_id)
ORDER BY prd_type_id, emp_id;
 

 

Chapter 7 - Part Two (2), Pages 221 to 244:

 

Using the CUME_DIST() and PERCENT_RANK() Functions:

CUME_DIST():  Calculate the position of a value relative to a group of values.

PERCENT_RANK():  Calculate the percent rank of a value relative to a group of values.

SELECT prd_type_id, SUM(amount), CUME_DIST() OVER (ORDER BY SUM(amount) DESC) AS cume_dist,
                PERCENT_RANK() OVER (ORDER BY SUM(amount) DESC) AS percent_rank
FROM  all_sales
WHERE year=2003
GROUP BY prd_type_id
ORDER BY prd_type_id;

 

NTILE(n): Calculate n-tiles by n. 

SELECT prd_type_id, SUM(amount), NTILE(2) OVER (ORDER BY SUM(amount) DESC) AS ntile
FROM  all_sales
WHERE year=2003 AND amount IS NOT NULL
GROUP BY prd_type_id
ORDER BY prd_type_id;

PRD_TYPE_ID SUM(AMOUNT)      NTILE
----------- ----------- ----------
          1   905081.84          1
          2   186381.22          2
          3   478270.91          1
          4   402751.16          2

ROW_NUMBER(): Row number for each group. 

SELECT prd_type_id, SUM(amount), ROW_NUMBER() OVER (ORDER BY SUM(amount) DESC) AS row_number
FROM  all_sales
WHERE year=2003
GROUP BY prd_type_id
ORDER BY prd_type_id;

 

Using the Inverse Percentile Functions PERCENTILE_CONT(n) and PERCENTILE_DISC(n):

SELECT  PERCENTILE_CONT(0.6) WITHIN GROUP (ORDER BY SUM(amount) DESC) AS percent_cont,
                PERCENTILE_DISC(0.6) WITHIN GROUP (ORDER BY SUM(amount) DESC) AS percent_disc
FROM  all_sales
WHERE year=2003
GROUP BY prd_type_id;

 

Using the Window Functions:  Calculate cumulative values for moving ranges.  Can be used with any aggregate function including: SUM, COUNT, AVG as well as the LAST_VALUE() and FIRST_VALUE functions.

SELECT month, SUM(amount) AS month_amount,
            SUM(SUM(amount))
                 OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_amount
FROM all_sales
WHERE year = 2003
GROUP BY month
ORDER BY month;

     MONTH MONTH_AMOUNT CUMULATIVE_AMOUNT
---------- ------------ -----------------
         1     95525.55          95525.55
         2     116671.6         212197.15
         3    160307.92         372505.07
         4     175998.8         548503.87
         5    154349.44         702853.31
         6    124951.36         827804.67
         7    170296.16         998100.83
         8    212735.68        1210836.51
         9    199609.68        1410446.19
        10    264480.79        1674926.98
        11    160221.98        1835148.96

     MONTH MONTH_AMOUNT CUMULATIVE_AMOUNT
---------- ------------ -----------------
        12    137336.17        1972485.13

Notice the third column is cumulative of the second column.

SELECT month, SUM(amount) AS month_amount,
            SUM(SUM(amount))
                 OVER (ORDER BY month ROWS UNBOUNDED PRECEDING) AS cumulative_amount
FROM all_sales
WHERE year = 2003
AND month BETWEEN 6 AND 12
GROUP BY month
ORDER BY month;

 

Performing a Moving Average:

SELECT month, SUM(amount) AS month_amount,
            AVG(SUM(amount))
                 OVER (ORDER BY month ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS moving_average
FROM all_sales
WHERE year = 2003
GROUP BY month
ORDER BY month;

 

Performing a Centered Average:

SELECT month, SUM(amount) AS month_amount,
            AVG(SUM(amount))
                 OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS center_average
FROM all_sales
WHERE year = 2003
GROUP BY month
ORDER BY month;

 

Getting First and Last Rows Using the FIRST_VALUE() and LAST_VALUE functions:

SELECT month, SUM(amount) AS month_amount,
        FIRST_VALUE(SUM(amount)) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) As previous_month_amount,
        LAST_VALUE(SUM(amount)) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) As next_month_amount
FROM all_sales
WHERE year = 2003
GROUP BY month
ORDER BY month;
 

SELECT month, SUM(amount) AS month_amount,
        SUM(amount) / FIRST_VALUE(SUM(amount)) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) As curr_div_prev,
        SUM(amount) / LAST_VALUE(SUM(amount)) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) As curr_div_next
FROM all_sales
WHERE year = 2003
GROUP BY month
ORDER BY month;

 

Page: 229 - USING The Reporting Functions:  These are the standard aggregate functions such as SUM, AVG, MAX, MIN, VARIANCE, COUNT, STDDEV

SELECT month, prd_type_id,
        SUM(SUM(amount)) OVER (PARTITION BY month) AS total_month_amount,
        SUM(SUM(amount)) OVER (PARTITION BY prd_type_id) AS total_product_type_amount
FROM all_sales
WHERE year = 2003 AND month <= 3
GROUP BY month, prd_type_id
ORDER BY month, prd_type_id;

 

Page: 230 - Using the RATIO_TO_REPORT() Function:

SELECT month, prd_type_id, SUM(amount) AS prd_type_amount,
        RATIO_TO_REPORT(SUM(amount)) OVER (PARTITION BY month) AS prd_type_ratio
FROM all_sales
WHERE year = 2003 AND month <= 3
GROUP BY month, prd_type_id
ORDER BY month, prd_type_id;

 

Page: 232 - Using the LAG() and LEAD() Function:

SELECT month, SUM(amount) AS month_amount,
        LAG(SUM(amount), 1) OVER (ORDER BY month) AS previous_month_amount,
        LEAD(SUM(amount), 1) OVER (ORDER BY month) AS next_month_amount
FROM all_sales
WHERE year = 2003
GROUP BY month
ORDER BY month;

 

Page: 232 - Using FIRST and LAST functions:

SELECT MIN(month) KEEP (DENSE_RANK FIRST ORDER BY SUM(amount)) AS highest_sales_month,
                MIN(month) KEEP (DENSE_RANK LAST ORDER BY SUM(amount)) AS lowest_sales_month
FROM all_sales
WHERE year = 2003
GROUP BY month
ORDER BY month;

 

PAGE 233 - Using the Linear Regression Functions:

 

SELECT prd_type_id,
                REGR_AVGX(amount, month) AS avgx,
                REGR_AVGY(amount, month) AS avgy,
                REGR_COUNT(amount, month) AS count,
                REGR_INTERCEPT(amount, month) AS inter,
                REGR_R2(amount, month) AS r2,
                REGR_SLOPE(amount, month) AS slope,
                REGR_SXX(amount, month) AS sxx,
                REGR_SXY(amount, month) AS sxy,
                REGR_SXY(amount, month) AS sxy
FROM all_sales
WHERE year = 2003
GROUP BY prd_type_id
ORDER BY prd_type_id;

 

PAGE 234 - Using the Hypothetical Rank and Distribution Functions:

SELECT prd_type_id, SUM(amount),
                RANK() OVER (ORDER BY SUM(amount) DESC) AS rank,
                PERCENT_RANK() OVER (ORDER BY SUM(amount) DESC) AS percent_rank
FROM all_sales
WHERE year = 2003 AND amount IS NOT NULL
GROUP BY prd_type_id
ORDER BY prd_type_id;

 

SELECT  RANK(500000) WITHIN GROUP (ORDER BY SUM(amount) DESC) AS rank,
                PERCENT_RANK(500000) WITHIN GROUP (ORDER BY SUM(amount) DESC) AS percent_rank
FROM all_sales
WHERE year = 2003 AND amount IS NOT NULL
GROUP BY prd_type_id
ORDER BY prd_type_id;

 

PAGE 235 - Using the MODEL clause.

SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount) (
    sales_amount[1,2004] = sales_amount[1,2003],
    sales_amount[2,2004] = sales_amount[2,2003] +  sales_amount[3,2003],
    sales_amount[3,2004] = ROUND(sales_amount[3,2003] * 1.25, 2)
)
ORDER BY prd_type_id, year, month;

 

PAGE 237 - Using Positional and Symbolic Notation to Access Cells:

SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount) (
    sales_amount[month=1,year=2004] = sales_amount[month=1,year=2003],
    sales_amount[month=2,year=2004] = sales_amount[month=2,year=2003] +  sales_amount[month=3,year=2003],
    sales_amount[month=3,year=2004] = ROUND(sales_amount[month=3,year=2003] * 1.25, 2)
)
ORDER BY prd_type_id, year, month;

 

PAGE 238 - Access a Range of Cells Using BETWEEN and AND:

SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount) (
    sales_amount[1, 2004] = ROUND(AVG(sales_amount)[month BETWEEN 1 AND 3, 2003], 2)
)
ORDER BY prd_type_id, year, month;

PAGE 238 - Access a Range of Cells Using ANY and IS ANY:

SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount) (
    sales_amount[1, 2004] = ROUND(SUM(sales_amount)[ANY, year IS ANY], 2)
)
ORDER BY prd_type_id, year, month;

PAGE 239 - Getting the Current Value of a Dimension Using CURRENTV():

SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount) (
    sales_amount[1, 2004] = ROUND(sales_amount[CURRENTV(), 2003] * 1.25, 2)
)
ORDER BY prd_type_id, year, month;

PAGE 240 - Access a Range of Cells Using a FOR Loop:

SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount) (
    sales_amount[FOR month FROM 1 TO 3 INCREMENT 1, 2004] = ROUND(sales_amount[CURRENTV(), 2003] * 1.25, 2)
)
ORDER BY prd_type_id, year, month;

 

PAGE 240 - Handling Null and Missing Values:

SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount) (
    sales_amount[FOR month FROM 1 TO 3 INCREMENT 1, 2004] =
        CASE WHEN sales_amount[CURRENTV(), 2003] IS PRESENT THEN
            ROUND(sales_amount[CURRENTV(), 2003] * 1.25, 2)
        ELSE
            0
        END
)
ORDER BY prd_type_id, year, month;

 

PAGE 242 - PRESENTV():

SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount) (
    sales_amount[FOR month FROM 1 TO 3 INCREMENT 1, 2004] =
        PRESENTV (sales_amount[CURRENTV(), 2003],
            ROUND(sales_amount[CURRENTV(), 2003] * 1.25, 2),0)
)
ORDER BY prd_type_id, year, month;

 

PAGE 242 - PRESENTNNV():

SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount) (
    sales_amount[FOR month FROM 1 TO 3 INCREMENT 1, 2004] =
        PRESENTNNV (sales_amount[CURRENTV(), 2003],
            ROUND(sales_amount[CURRENTV(), 2003] * 1.25, 2),0)
)
ORDER BY prd_type_id, year, month;

 

PAGE 243 - Using IGNORE NAV and KEEP NAV:

SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL IGNORE NAV
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount) (
    sales_amount[FOR month FROM 1 TO 3 INCREMENT 1, 2004] =
            ROUND(sales_amount[CURRENTV(), 2003] * 1.25, 2)
)
ORDER BY prd_type_id, year, month;

 

PAGE 243 - Updating Existing Cells:

SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL IGNORE NAV
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount)
RULES UPDATE (
    sales_amount[FOR month FROM 1 TO 3 INCREMENT 1, 2004] =
            ROUND(sales_amount[CURRENTV(), 2003] * 1.25, 2)
)
ORDER BY prd_type_id, year, month;