HIERARCHICAL QUERIES:
Queries that can be used to produce a tree of some kind such as a family tree or organizational chart.
Root Node: Top node in the tree.
Parent Node: One or nodes beneath it.
Child Node: Has a parent above it.
Leaf Node: No children.
Example-1:
SELECT employee_id, manager_id, first_name, last_name
FROM more_employees
START WITH employee_id=1
CONNECT BY PRIOR employee_id = manager_id;
Example-2:
SELECT LEVEL, employee_id, manager_id, first_name, last_name
FROM more_employees
START WITH employee_id=1
CONNECT BY PRIOR employee_id = manager_id
ORDER BY LEVEL;
Example-3:
SET PAGESIZE 999
COLUMN employee FORMAT A25
SELECT LEVEL, LPAD(' ', 2 * LEVEL -1) || first_name ||'
'|| last_name
FROM more_employees
START WITH employee_id=1
CONNECT BY PRIOR employee_id = manager_id;
Example-4: Start with Node other than ROOT
SELECT LEVEL, LPAD(' ', 2 * LEVEL -1) || first_name ||'
'|| last_name
FROM more_employees
START WITH last_name='Jones'
CONNECT BY PRIOR employee_id = manager_id;
Example-5: Using a Sub-Query.
SELECT LEVEL, LPAD(' ', 2 * LEVEL -1) || first_name ||' '||
last_name
FROM more_employees
START WITH employee_id=
(SELECT employee_id FROM more_employees WHERE first_name =
'Kevin' AND last_name = 'Black')
CONNECT BY PRIOR employee_id = manager_id;
Example-6: Traversing Upward through the tree.
SELECT LEVEL, LPAD(' ', 2 * LEVEL -1) || first_name ||'
'|| last_name
FROM more_employees
START WITH last_name='Jones'
CONNECT BY PRIOR manager_id = employee_id;
Example-7: Removing Nodes.
SELECT LEVEL, LPAD(' ', 2 * LEVEL -1) || first_name ||'
'|| last_name
FROM more_employees
WHERE last_name!='Johnson'
START WITH employee_id=1
CONNECT BY PRIOR employee_id = manager_id;
Example-8: Removing Branches.
SELECT LEVEL, LPAD(' ', 2 * LEVEL -1) || first_name ||'
'|| last_name
FROM more_employees
START WITH employee_id=1
CONNECT BY PRIOR employee_id = manager_id AND last_name!='Johnson';
Example-9: Using as a Sub-Query.
SELECT lvl, empname FROM(
SELECT LEVEL as lvl, LPAD(' ', 2 *
LEVEL -1) || first_name ||' '|| last_name As empname
FROM more_employees
START WITH employee_id=1
CONNECT BY PRIOR employee_id =
manager_id) inner
WHERE lvl <= 2
Using the Extended GROUP BY Clauses
ROLLUP: Generates sub-totals for each column in the group by and totals for the whole
CUBE: Generates sub-totals for all combinations of the values in the group by clause
GROUPING: Returns 1 if the columns value is generated by a ROLLUP or CUBE.
GROUPING_ID() Function: Can be used in a Having Clause to filter groups.
GROUP_ID() Function: Remove duplicate rows returned by GROUP BY clause.
Rollup and Grouping Examples
Example 1: Rollup
select gender, avg(sal) from emp group by ROLLUP(gender);
Output 1: Rollup
G AVG(SAL)
- ----------
F 2025
M 2282.14286
2153.57143Example 2: Rollup
select gender, deptno, avg(sal) from emp group by rollup(gender, deptno);
Output 2
G DEPTNO AVG(SAL)
- ---------- ----------
F 10 1300
F 20 2987.5
F 30 1725
F 2025
M 10 4225
M 20 1633.33333
M 30 1312.5
M 2282.14286
2153.57143
Example 3: Rollup and Grouping
SELECT DECODE(GROUPING(gender), 1, 'Overall ', gender) as gender, DECODE(GROUPING(deptno), 1, 'Sex Avg', deptno) as deptno, avg(sal)
FROM emp GROUP BY rollup(gender, deptno);Output 3: Rollup and Grouping
GENDER DEPTNO
AVG(SAL)
-------- -------------
----------
F
10
1300
F
20
2987.5
F
30
1725
F Sex Avg
2025
M
10
4225
M
20
1633.33333
M
30
1312.5
M Sex Avg
2282.14286
Overall Sex Avg 2153.57143
9 rows selected.
Cube and Grouping Handout CIS328
Example 4: Cube with one field
select gender, avg(sal) from emp group by cube(gender);
G AVG(SAL)
- ----------
F 2025
M 2282.14286
2153.57143Example 5: Cube with two fields
select gender, deptno, avg(sal) from emp group by cube(gender, deptno);
G DEPTNO AVG(SAL)
- ---------- ----------
F 10 1300
F 20 2987.5
F 30 1725
F 2025
M 10 4225
M 20 1633.33333
M 30 1312.5
M 2282.14286
10 3250
20 2175
30 1587.5
2153.57143
Example 6: Cube and Grouping
SELECT DECODE(GROUPING(gender), 1, 'All Sex', gender) As gender,
DECODE(GROUPING(deptno), 1, 'All Depts', deptno) as Dept, avg(sal)
FROM emp
GROUP BY CUBE(gender, deptno);Output 6
GENDER DEPT AVG(SAL)
------- --------------- ----------
F 10 1300
F 20 2987.5
F 30 1725
F All Depts 2025
M 10 4225
M 20 1633.33333
M 30 1312.5
M All Depts 2282.14286
All Sex 10 3250
All Sex 20 2175
All Sex 30 1587.5
All Sex All Depts 2153.57143
Example 7: GROUPING_ID() Function:
SELECT DECODE(GROUPING(gender), 1, 'All Sex', gender) As gender,
DECODE(GROUPING(deptno), 1, 'All Depts', deptno) as Dept, avg(sal)
FROM emp
GROUP BY CUBE(gender, deptno)
HAVING GROUPING_ID(gender, deptno) > 0;
Example 7: GROUPING_ID() Function:
SELECT DECODE(GROUPING(gender), 1, 'All Sex', gender) As
gender,
DECODE(GROUPING(deptno), 1, 'All Depts', deptno) as Dept, avg(sal)
FROM
emp
GROUP BY CUBE(gender, deptno)
HAVING GROUP_ID() = 0;
Using Analytic Functions:
Ranking functions: Calculate Ranks, percentiels, n-tiles, etc.
Inverse percentile functions: Enable you to calculate the value that corresponds to a percentile.
Window functions: Enable you to calculate cumulative and moving aggregates.
Reporting functions: Enable you to calculate things like market shares.
Lag and lead functions: Enable you to get a value in a row where that row is a certain number of rows away from the current now.
First and last functions: Enalbe you to get the first and last values in an ordered group.
Linear regression functions: Enable you to fit an ordinary-least-squares regression time to a set of number pairs.
Hypothetical rank and distribution functions: Enable you to calculate the rank and percentile that a new row would have if you inserted it into a table.
SELECT * FROM all_sales; -- Example table.
Ranking Functions: RANK(), DENSE_RANK(), CUME_DIST(), PERCENT_RANK(), NTILE(), ROW_NUMBER()
SELECT prd_type_id, SUM(amount),
RANK() OVER (ORDER BY SUM(amount)
DESC) AS rank,
DENSE_RANK() OVER (ORDER BY
SUM(amount) DESC) AS dense_rank
FROM all_sales
WHERE year=2003
AND amount IS NOT NULL
GROUP BY prd_type_id
ORDER BY prd_type_id;
SELECT prd_type_id, SUM(amount),
RANK() OVER (ORDER BY SUM(amount)
DESC) AS rank,
DENSE_RANK() OVER (ORDER BY
SUM(amount) DESC) AS dense_rank
FROM all_sales
WHERE year=2003
GROUP BY prd_type_id
ORDER BY prd_type_id;
SELECT prd_type_id, SUM(amount),
RANK() OVER (ORDER BY SUM(amount)
DESC NULLS LAST) AS rank,
DENSE_RANK() OVER (ORDER BY
SUM(amount) DESC NULLS LAST) AS dense_rank
FROM all_sales
WHERE year=2003
GROUP BY prd_type_id
ORDER BY prd_type_id;
PARTITION clause: Allows you to create sub-groups.
SELECT prd_type_id, month, SUM(amount),
RANK() OVER (PARTITION BY month
ORDER BY SUM(amount) DESC) AS rank
FROM all_sales
WHERE year=2003 AND amount IS NOT NULL
GROUP BY prd_type_id, month
ORDER BY prd_type_id, month;
You can also use the ROLLUP, CUBE, and GROUPING SETS functions:
SELECT prd_type_id, SUM(amount),
RANK() OVER (ORDER BY SUM(amount)
DESC) AS rank
FROM all_sales
WHERE year=2003 AND amount IS NOT NULL
GROUP BY ROLLUP(prd_type_id)
ORDER BY prd_type_id;
SELECT prd_type_id, emp_id, SUM(amount),
RANK() OVER (ORDER BY SUM(amount)
DESC) AS rank
FROM all_sales
WHERE year=2003 AND amount IS NOT NULL
GROUP BY CUBE(prd_type_id, emp_id)
ORDER BY prd_type_id, emp_id;
SELECT prd_type_id, emp_id, SUM(amount),
RANK() OVER (ORDER BY SUM(amount)
DESC) AS rank
FROM all_sales
WHERE year=2003 AND amount IS NOT NULL
GROUP BY GROUPING SETS(prd_type_id, emp_id)
ORDER BY prd_type_id, emp_id;
Using the CUME_DIST() and PERCENT_RANK() Functions:
CUME_DIST(): Calculate the position of a value relative to a group of values.
PERCENT_RANK(): Calculate the percent rank of a value relative to a group of values.
SELECT prd_type_id, SUM(amount), CUME_DIST() OVER (ORDER BY SUM(amount) DESC)
AS cume_dist,
PERCENT_RANK() OVER (ORDER BY SUM(amount) DESC) AS percent_rank
FROM all_sales
WHERE year=2003
GROUP BY prd_type_id
ORDER BY prd_type_id;
NTILE(n): Calculate n-tiles by n.
SELECT prd_type_id, SUM(amount), NTILE(2) OVER (ORDER BY SUM(amount) DESC) AS
ntile
FROM all_sales
WHERE year=2003 AND amount IS NOT NULL
GROUP BY prd_type_id
ORDER BY prd_type_id;
PRD_TYPE_ID SUM(AMOUNT) NTILE
----------- ----------- ----------
1 905081.84 1
2 186381.22 2
3 478270.91 1
4 402751.16 2
ROW_NUMBER(): Row number for each group.
SELECT prd_type_id, SUM(amount), ROW_NUMBER() OVER (ORDER BY SUM(amount) DESC) AS
row_number
FROM all_sales
WHERE year=2003
GROUP BY prd_type_id
ORDER BY prd_type_id;
Using the Inverse Percentile Functions PERCENTILE_CONT(n) and PERCENTILE_DISC(n):
SELECT PERCENTILE_CONT(0.6) WITHIN GROUP (ORDER BY SUM(amount) DESC)
AS percent_cont,
PERCENTILE_DISC(0.6) WITHIN GROUP (ORDER BY SUM(amount) DESC) AS percent_disc
FROM all_sales
WHERE year=2003
GROUP BY prd_type_id;
Using the Window Functions: Calculate cumulative values for moving ranges. Can be used with any aggregate function including: SUM, COUNT, AVG as well as the LAST_VALUE() and FIRST_VALUE functions.
SELECT month, SUM(amount) AS month_amount,
SUM(SUM(amount))
OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS
cumulative_amount
FROM all_sales
WHERE year = 2003
GROUP BY month
ORDER BY month;
MONTH MONTH_AMOUNT CUMULATIVE_AMOUNT
---------- ------------ -----------------
1 95525.55 95525.55
2 116671.6 212197.15
3 160307.92 372505.07
4 175998.8 548503.87
5 154349.44 702853.31
6 124951.36 827804.67
7 170296.16 998100.83
8 212735.68 1210836.51
9 199609.68 1410446.19
10 264480.79 1674926.98
11 160221.98 1835148.96
MONTH MONTH_AMOUNT CUMULATIVE_AMOUNT
---------- ------------ -----------------
12 137336.17 1972485.13
Notice the third column is cumulative of the second column.
SELECT month, SUM(amount) AS month_amount,
SUM(SUM(amount))
OVER (ORDER BY month ROWS UNBOUNDED PRECEDING) AS cumulative_amount
FROM all_sales
WHERE year = 2003
AND month BETWEEN 6 AND 12
GROUP BY month
ORDER BY month;
Performing a Moving Average:
SELECT month, SUM(amount) AS month_amount,
AVG(SUM(amount))
OVER (ORDER BY month ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS moving_average
FROM all_sales
WHERE year = 2003
GROUP BY month
ORDER BY month;
Performing a Centered Average:
SELECT month, SUM(amount) AS month_amount,
AVG(SUM(amount))
OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS center_average
FROM all_sales
WHERE year = 2003
GROUP BY month
ORDER BY month;
Getting First and Last Rows Using the FIRST_VALUE() and LAST_VALUE functions:
SELECT month, SUM(amount) AS month_amount,
FIRST_VALUE(SUM(amount)) OVER (ORDER
BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) As previous_month_amount,
LAST_VALUE(SUM(amount)) OVER (ORDER
BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) As next_month_amount
FROM all_sales
WHERE year = 2003
GROUP BY month
ORDER BY month;
SELECT month, SUM(amount) AS month_amount,
SUM(amount) / FIRST_VALUE(SUM(amount))
OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) As curr_div_prev,
SUM(amount) / LAST_VALUE(SUM(amount))
OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) As curr_div_next
FROM all_sales
WHERE year = 2003
GROUP BY month
ORDER BY month;
Page: 229 - USING The Reporting Functions: These are the standard aggregate functions such as SUM, AVG, MAX, MIN, VARIANCE, COUNT, STDDEV
SELECT month, prd_type_id,
SUM(SUM(amount)) OVER (PARTITION BY
month) AS total_month_amount,
SUM(SUM(amount)) OVER (PARTITION BY
prd_type_id) AS total_product_type_amount
FROM all_sales
WHERE year = 2003 AND month <= 3
GROUP BY month, prd_type_id
ORDER BY month, prd_type_id;
Page: 230 - Using the RATIO_TO_REPORT() Function:
SELECT month, prd_type_id, SUM(amount) AS prd_type_amount,
RATIO_TO_REPORT(SUM(amount))
OVER (PARTITION BY month) AS prd_type_ratio
FROM all_sales
WHERE year = 2003 AND month <= 3
GROUP BY month, prd_type_id
ORDER BY month, prd_type_id;
Page: 232 - Using the LAG() and LEAD() Function:
SELECT month, SUM(amount) AS month_amount,
LAG(SUM(amount), 1) OVER (ORDER BY
month) AS previous_month_amount,
LEAD(SUM(amount), 1) OVER (ORDER BY
month) AS next_month_amount
FROM all_sales
WHERE year = 2003
GROUP BY month
ORDER BY month;
Page: 232 - Using FIRST and LAST functions:
SELECT MIN(month) KEEP (DENSE_RANK FIRST ORDER BY SUM(amount)) AS
highest_sales_month,
MIN(month) KEEP (DENSE_RANK LAST ORDER BY SUM(amount)) AS lowest_sales_month
FROM all_sales
WHERE year = 2003
GROUP BY month
ORDER BY month;
PAGE 233 - Using the Linear Regression Functions:
SELECT prd_type_id,
REGR_AVGX(amount, month) AS avgx,
REGR_AVGY(amount, month) AS avgy,
REGR_COUNT(amount, month) AS count,
REGR_INTERCEPT(amount, month) AS inter,
REGR_R2(amount, month) AS r2,
REGR_SLOPE(amount, month) AS slope,
REGR_SXX(amount, month) AS sxx,
REGR_SXY(amount, month) AS sxy,
REGR_SXY(amount, month) AS sxy
FROM all_sales
WHERE year = 2003
GROUP BY prd_type_id
ORDER BY prd_type_id;
PAGE 234 - Using the Hypothetical Rank and Distribution Functions:
SELECT prd_type_id, SUM(amount),
RANK() OVER (ORDER BY SUM(amount) DESC) AS rank,
PERCENT_RANK() OVER (ORDER BY SUM(amount) DESC) AS percent_rank
FROM all_sales
WHERE year = 2003 AND amount IS NOT NULL
GROUP BY prd_type_id
ORDER BY prd_type_id;
SELECT RANK(500000) WITHIN GROUP (ORDER BY SUM(amount) DESC) AS rank,
PERCENT_RANK(500000) WITHIN GROUP (ORDER BY SUM(amount) DESC) AS percent_rank
FROM all_sales
WHERE year = 2003 AND amount IS NOT NULL
GROUP BY prd_type_id
ORDER BY prd_type_id;
PAGE 235 - Using the MODEL clause.
SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount) (
sales_amount[1,2004] = sales_amount[1,2003],
sales_amount[2,2004] = sales_amount[2,2003] +
sales_amount[3,2003],
sales_amount[3,2004] = ROUND(sales_amount[3,2003] * 1.25, 2)
)
ORDER BY prd_type_id, year, month;
PAGE 237 - Using Positional and Symbolic Notation to Access Cells:
SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount) (
sales_amount[month=1,year=2004] = sales_amount[month=1,year=2003],
sales_amount[month=2,year=2004] = sales_amount[month=2,year=2003]
+ sales_amount[month=3,year=2003],
sales_amount[month=3,year=2004] = ROUND(sales_amount[month=3,year=2003]
* 1.25, 2)
)
ORDER BY prd_type_id, year, month;
PAGE 238 - Access a Range of Cells Using BETWEEN and AND:
SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount) (
sales_amount[1, 2004] = ROUND(AVG(sales_amount)[month BETWEEN
1 AND 3, 2003], 2)
)
ORDER BY prd_type_id, year, month;
PAGE 238 - Access a Range of Cells Using ANY and IS ANY:
SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount) (
sales_amount[1, 2004] = ROUND(SUM(sales_amount)[ANY, year IS
ANY], 2)
)
ORDER BY prd_type_id, year, month;
PAGE 239 - Getting the Current Value of a Dimension Using CURRENTV():
SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount) (
sales_amount[1, 2004] = ROUND(sales_amount[CURRENTV(), 2003]
* 1.25, 2)
)
ORDER BY prd_type_id, year, month;
PAGE 240 - Access a Range of Cells Using a FOR Loop:
SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount) (
sales_amount[FOR month FROM 1 TO 3 INCREMENT 1, 2004] =
ROUND(sales_amount[CURRENTV(), 2003] * 1.25, 2)
)
ORDER BY prd_type_id, year, month;
PAGE 240 - Handling Null and Missing Values:
SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount) (
sales_amount[FOR month FROM 1 TO 3 INCREMENT 1, 2004] =
CASE WHEN sales_amount[CURRENTV(),
2003] IS PRESENT THEN
ROUND(sales_amount[CURRENTV(), 2003] * 1.25, 2)
ELSE
0
END
)
ORDER BY prd_type_id, year, month;
PAGE 242 - PRESENTV():
SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount) (
sales_amount[FOR month FROM 1 TO 3 INCREMENT 1, 2004] =
PRESENTV (sales_amount[CURRENTV(),
2003],
ROUND(sales_amount[CURRENTV(), 2003] * 1.25, 2),0)
)
ORDER BY prd_type_id, year, month;
PAGE 242 - PRESENTNNV():
SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount) (
sales_amount[FOR month FROM 1 TO 3 INCREMENT 1, 2004] =
PRESENTNNV (sales_amount[CURRENTV(),
2003],
ROUND(sales_amount[CURRENTV(), 2003] * 1.25, 2),0)
)
ORDER BY prd_type_id, year, month;
PAGE 243 - Using IGNORE NAV and KEEP NAV:
SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL IGNORE NAV
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount) (
sales_amount[FOR month FROM 1 TO 3 INCREMENT 1, 2004] =
ROUND(sales_amount[CURRENTV(), 2003] * 1.25, 2)
)
ORDER BY prd_type_id, year, month;
PAGE 243 - Updating Existing Cells:
SELECT prd_type_id, year, month, sales_amount
FROM all_sales
WHERE prd_type_id BETWEEN 1 AND 2
AND emp_id = 21
MODEL IGNORE NAV
PARTITION BY (prd_type_id)
DIMENSION BY (month, year)
MEASURES (amount sales_amount)
RULES UPDATE (
sales_amount[FOR month FROM 1 TO 3 INCREMENT 1, 2004] =
ROUND(sales_amount[CURRENTV(), 2003] * 1.25, 2)
)
ORDER BY prd_type_id, year, month;