Advanced SQL Functions
There are three types of SQL extensions that fall under the banner of "analytic functions", though the first could be said to provide "analytic functionality" rather than actually being analytic functions:
new grouping of resultsets through extensions to the GROUP BY clause (ROLLUP and CUBE); the new analytic functions themselves; and TOP-N analysis (largely enabled by the analytic functions).
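For comparison with the CUBE query in Example Two, here is a minimal ROLLUP sketch, assuming the standard SCOTT demo EMP table used throughout these notes. ROLLUP(deptno, job) produces a total per (deptno, job), a subtotal per deptno and one grand total, but no per-job totals across departments.

```sql
-- Assumes the SCOTT demo table EMP(deptno, job, sal, ...).
-- ROLLUP(deptno, job) returns three levels of rows:
--   (deptno, job) totals, (deptno) subtotals, and one grand total.
SELECT deptno,
       job,
       SUM(sal) AS total_sal
FROM   emp
GROUP  BY ROLLUP(deptno, job)
ORDER  BY deptno, job;
```

Because an ascending sort places NULLs last by default, each department's subtotal row (job shown as NULL) follows that department's detail rows, and the grand total comes last.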
Example Two: Sums the salaries by department and job, then adds sub-totals for each department (across all jobs), for each job type (across all departments), and a grand total for the entire salary column (similar to a break sum report in SQL*Plus). CUBE provides sub-totals for every combination of the columns listed in it.
select deptno, job, sum(sal)
from   emp
group  by cube(deptno, job);
Example Three: Using the GROUPING function to "label" the sub-total rows (i.e. to determine which rows represent rollups). GROUPING(column) returns 1 if the current row is an aggregated rollup row for that column (such as a sub-total row) or 0 if the row is an ordinary grouped row.
select decode(grouping(deptno), 1, 'all departments', deptno) as deptno,
       decode(grouping(job), 1, 'job total', job) as job,
       sum(sal)
from   emp
group  by cube(deptno, job);

select decode(grouping(deptno), 1, 'all departments', deptno) as deptno,
       decode(grouping(job), 1, 'job total', job) as job,
       sum(sal)
from   emp
group  by rollup(deptno, job);
-- with ROLLUP, grouping(deptno) is 1 only on the single grand-total row,
-- because ROLLUP produces no per-job totals across all departments
suriyanreddyb@ymail.com
analytic functions
There are 33 new analytic functions, though it is likely that most users will concentrate on a small number of these and very few will use the statistical capabilities. They are:
AVG, MIN, MAX, COUNT, SUM
LAG, LEAD
FIRST_VALUE, LAST_VALUE
ROW_NUMBER
RANK, DENSE_RANK, PERCENT_RANK
NTILE, CUME_DIST, RATIO_TO_REPORT
CORR
COVAR_POP
VARIANCE, VAR_POP, VAR_SAMP
STDDEV, STDDEV_POP, STDDEV_SAMP
REGR_COUNT, REGR_R2, REGR_AVGX, REGR_AVGY, REGR_SXX, REGR_SXY, REGR_SYY, REGR_SLOPE, REGR_INTERCEPT
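Three of the ranking and distribution functions in this list are not used in the later examples. A short sketch of them, assuming the standard SCOTT demo EMP table and ranking salaries across the whole company:

```sql
-- Assumes the SCOTT demo table EMP.
SELECT ename,
       sal,
       NTILE(4)       OVER (ORDER BY sal DESC) AS sal_quartile,   -- bucket number 1..4
       PERCENT_RANK() OVER (ORDER BY sal DESC) AS pct_rank,       -- (rank - 1) / (rows - 1)
       CUME_DIST()    OVER (ORDER BY sal DESC) AS cume_dist_sal   -- fraction of rows sorting at or before this one, ties included
FROM   emp
ORDER  BY sal DESC;
```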
Analytic functions work on the rows of a query's result set, after the joins, WHERE, GROUP BY and HAVING clauses have been processed. This means they can compute across rows the query has already produced, partitioning and ordering the resultset into groups, while still returning every row rather than collapsing rows the way GROUP BY does.
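A minimal sketch of that difference, assuming the standard SCOTT demo EMP table used throughout these notes: the aggregate query collapses each department to one row, while the analytic version keeps every employee and puts the department total beside each one.

```sql
-- Assumes the SCOTT demo table EMP.
-- Aggregate: one row per department.
SELECT deptno, SUM(sal) AS dept_sal
FROM   emp
GROUP  BY deptno;

-- Analytic: every employee row is kept, with its department total alongside.
SELECT deptno, ename, sal,
       SUM(sal) OVER (PARTITION BY deptno) AS dept_sal
FROM   emp;
```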
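Example Four: Calculate a running salary total for each department, accumulating in order of employee name (ordering by hiredate instead would give the running total as new employees were hired).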
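select ename, deptno, sal,
       sum(sal) over (partition by deptno order by ename nulls last) dept_running_total,
       -- row_number() numbers the employees 1, 2, 3, ... within each department
       row_number() over (partition by deptno order by ename asc nulls last) empno_running_total
from   emp;

select ename, deptno, sal,
       sum(sal) over (partition by deptno order by ename) running_total
from   emp;
/* gives the same result as the query above, because ASC already places NULLs last.
   Rows in a department that share the same ORDER BY value get the same running
   total, because the default window is RANGE (value-based), not ROWS. */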
select ename, deptno, sal,
       sum(sal) over (partition by deptno) running_total
from   emp;
/* without ORDER BY the window is the whole partition, so this is not a running
   total: every row shows its department's total salary */
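The behaviour of running totals on ties can be checked with a small sketch. The three rows below are hypothetical inline data built from DUAL (so no table is needed), chosen so that two employees share a hiredate; the query compares the default RANGE frame with an explicit ROWS frame.

```sql
-- Hypothetical data: B and C tie on the ORDER BY key (hiredate).
WITH t AS (
  SELECT 'A' AS ename, DATE '2020-01-01' AS hiredate, 100 AS sal FROM dual UNION ALL
  SELECT 'B',          DATE '2020-02-01',             200        FROM dual UNION ALL
  SELECT 'C',          DATE '2020-02-01',             300        FROM dual
)
SELECT ename, hiredate, sal,
       -- Default frame with ORDER BY: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
       -- Peers with the same hiredate are included, so B and C both show 600.
       SUM(sal) OVER (ORDER BY hiredate) AS range_total,
       -- ROWS counts physical rows, so B and C get different totals;
       -- which of them reaches 600 depends on the arbitrary order of the tie.
       SUM(sal) OVER (ORDER BY hiredate
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_total
FROM   t
ORDER  BY hiredate, ename;
```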
Analytic functions are invoked using the OVER() clause. This also enables Oracle to distinguish between aggregate functions and analytic functions that share the same name, such as AVG, MIN and MAX. There are three components to the OVER clause:
1. PARTITION clause, by which the resultset can be broken into groups, such as departments in the example above. Without this, the entire resultset is treated as a single partition;
2. ORDER BY clause, by which the resultset or partition can be ordered. This is optional for some analytic functions but mandatory for those that need to access rows on either side of the current row, such as LAG and LEAD; and
3. RANGE or ROWS clause (AKA windowing), by which the function can be made to include rows or values around the current row in its calculations. RANGE windows work on values and ROWS windows work on records, such as X rows on each side of the current row or all rows preceding the current row, within the current partition.
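A sketch that uses all three components at once, assuming the standard SCOTT demo EMP table: a three-row moving average of salary within each department, in hiring order.

```sql
-- Assumes the SCOTT demo table EMP.
-- PARTITION BY: restart the calculation for each department.
-- ORDER BY:     order rows by hiredate within each department.
-- ROWS window:  average the previous, current and next row only.
SELECT deptno, ename, hiredate, sal,
       ROUND(AVG(sal) OVER (PARTITION BY deptno
                            ORDER BY hiredate
                            ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING), 2) AS moving_avg_sal
FROM   emp
ORDER  BY deptno, hiredate;
```

At the first and last row of each department the window simply holds two rows instead of three.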
The PARTITION and ORDER BY clauses are demonstrated in Example Four above. The resultset was partitioned into the individual departments in the organization. Within each department, the data was ordered by ename using the default criteria (ASC and NULLS LAST). No RANGE or ROWS clause was added, which means the default window of RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW was used: all the preceding records in the current partition, plus the current row and any rows tied with it on ename, are included in the calculation for the current row. The easiest way to understand analytic functions and windowing is through examples that demonstrate each of the three components of the OVER() clause.
Example Five: Find the average salary by department and compare each employee's salary with the department average.
SELECT deptno
     , ename
     , sal
     , ROUND(average_sal_dept, 0) AS average_sal_dept
     , ROUND(sal - average_sal_dept, 0) AS sal_variance
FROM  (SELECT deptno
            , ename
            , sal
            , AVG(sal) OVER (PARTITION BY deptno) AS average_sal_dept
       FROM   emp);

-OR- (correlated scalar subqueries instead of an analytic function)

select deptno, ename, sal,
       round((select avg(sal) from emp where deptno = e.deptno)) avg_sal,
       sal - round((select avg(sal) from emp where deptno = e.deptno)) sal_variance
from   emp e;
Example Six: Determine the order in which employees joined their respective departments. Also include the employees who preceded and succeeded them. LAG() and LEAD() provide access to rows before and after the current row without a self-join.
SELECT deptno,
       ename,
       hiredate,
       LAG (ename, 1, NULL) OVER (PARTITION BY deptno ORDER BY hiredate ASC NULLS LAST) AS previous_employee_ename,
       LEAD (ename, 1, NULL) OVER (PARTITION BY deptno ORDER BY hiredate ASC NULLS LAST) AS next_employee_ename
FROM   emp
ORDER  BY deptno, hiredate;
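LAG and LEAD can also be used inside arithmetic. A small sketch, assuming the standard SCOTT demo EMP table, computes the number of days since the previous hire in the same department; the third LAG argument supplies a default instead of NULL for the first row.

```sql
-- Assumes the SCOTT demo table EMP; DATE minus DATE gives a number of days.
-- LAG(expr, offset, default): the default (the row's own hiredate) makes the
-- first hire in each department show 0 instead of NULL.
SELECT deptno, ename, hiredate,
       hiredate - LAG(hiredate, 1, hiredate)
                  OVER (PARTITION BY deptno ORDER BY hiredate) AS days_since_prev_hire
FROM   emp
ORDER  BY deptno, hiredate;
```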
Example Seven: Determine the proportion of each department's salary taken up by its individual employees:
SELECT deptno,
       ename,
       sal,
       dept_sal,
       ROUND(employees_dept_ratio * 100, 2) AS emps_proportion
FROM  (SELECT deptno,
              ename,
              sal,
              SUM(sal) OVER (PARTITION BY deptno) AS dept_sal,
              RATIO_TO_REPORT(sal) OVER (PARTITION BY deptno) AS employees_dept_ratio
       FROM   emp)
ORDER  BY deptno;
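RATIO_TO_REPORT is a shorthand for dividing each value by its partition total. The equivalent sketch below, assuming the standard SCOTT demo EMP table and that no department's salary total is zero, should give the same emps_proportion figures as Example Seven.

```sql
-- Assumes the SCOTT demo table EMP and non-zero department totals.
-- sal / SUM(sal) OVER (PARTITION BY deptno) = RATIO_TO_REPORT(sal) OVER (PARTITION BY deptno)
SELECT deptno, ename, sal,
       ROUND(sal / SUM(sal) OVER (PARTITION BY deptno) * 100, 2) AS emps_proportion
FROM   emp
ORDER  BY deptno;
```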
RANGE windowing:
Example Eight: Determine the first and last employee to be employed within 50 days of the current employee's hiredate.
-- OVER () with no ORDER BY treats the whole result set as one unordered window,
-- so every row shows the same first_ename and last_ename (which ones is not guaranteed)
select deptno, ename, hiredate,
       first_value(ename) over () first_ename,
       last_value(ename) over () last_ename
from   emp;

-- the right solution: a RANGE window of 50 days either side of each hiredate
select deptno, ename, hiredate,
       first_value(ename) over (order by hiredate range between 50 preceding and 50 following) first_emp,
       last_value(ename) over (order by hiredate range between 50 preceding and 50 following) last_emp
from   emp;
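The same RANGE window can feed any aggregate, not only FIRST_VALUE and LAST_VALUE. A sketch, assuming the standard SCOTT demo EMP table, counts how many employees were hired within 50 days of each employee; since hiredate is a DATE, the offset 50 is interpreted as 50 days.

```sql
-- Assumes the SCOTT demo table EMP. The count includes the current employee.
SELECT ename, hiredate,
       COUNT(*) OVER (ORDER BY hiredate
                      RANGE BETWEEN 50 PRECEDING AND 50 FOLLOWING) AS hired_within_50_days
FROM   emp
ORDER  BY hiredate;
```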
ROWS windowing:
Example Nine: Determine who was recruited two employees before and three after the current employee (note what happens when employees share hiredates).
-- LAST_VALUE over "rows 3 preceding" is not useful: that window ends at the
-- current row, so LAST_VALUE simply returns the current employee
select deptno, ename, hiredate,
       first_value(ename) over (order by hiredate rows 2 preceding) two_emps_back,
       last_value(ename) over (order by hiredate rows 3 preceding) three_emps_forward
from   emp;

-- FIRST_VALUE over "rows 3 preceding" looks three rows back, not forward
select deptno, ename, hiredate,
       first_value(ename) over (order by hiredate rows 2 preceding) two_emps_back,
       first_value(ename) over (order by hiredate rows 3 preceding) three_emps_back
from   emp;

-- the right solution: reversing the sort order turns "3 preceding" into three rows forward
select deptno, ename, hiredate,
       first_value(ename) over (order by hiredate rows 2 preceding) two_emps_back,
       first_value(ename) over (order by hiredate desc rows 3 preceding) three_emps_forward
from   emp;
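LAG and LEAD with an offset give a more direct way to ask the same question, as in this sketch assuming the standard SCOTT demo EMP table. One behavioural difference: near the start or end of the ordering they return NULL, whereas FIRST_VALUE over "rows 2 preceding" falls back to the first row that exists in the window. Employees sharing a hiredate are still ordered arbitrarily.

```sql
-- Assumes the SCOTT demo table EMP.
-- NULL is returned when fewer than 2 rows precede (or 3 rows follow) the current row.
SELECT deptno, ename, hiredate,
       LAG(ename, 2)  OVER (ORDER BY hiredate) AS two_emps_back,
       LEAD(ename, 3) OVER (ORDER BY hiredate) AS three_emps_forward
FROM   emp
ORDER  BY hiredate;
```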
top-n queries
Example Ten: Who were the first three recruits to our organization?
SELECT ROWNUM AS rank
     , ename
     , hiredate
FROM  (SELECT ename
            , hiredate
       FROM   emp
       ORDER  BY hiredate ASC NULLS LAST)
WHERE  ROWNUM <= 3;
There is ambiguity in the above question, especially when using the above methodology. For example, if five people were recruited on the same day, how would this question be answered? The in-line view would return the tied employees in no particular order, and the ROWNUM stopkey would stop returning rows after the third record, so which three of the five appear is arbitrary. Analytic functions can help remove this ambiguity. For the following example, I've updated five employees to have the earliest date.
Example Eleven: Who were the first three recruits to our organization?
SELECT hire_rank
     , ename
     , hiredate
FROM  (SELECT ename
            , hiredate
            , RANK() OVER (ORDER BY hiredate ASC NULLS LAST) AS hire_rank
       FROM   emp)
WHERE  hire_rank <= 3;

 HIRE_RANK ENAME      HIREDATE
---------- ---------- -----------
         1 SMITH      01-JAN-1951
         1 ALLEN      01-JAN-1951
         1 WARD       01-JAN-1951
         1 JONES      01-JAN-1951
         1 MARTIN     01-JAN-1951
Technically, this has not answered the actual question but has instead expanded it. In using the RANK() analytic function, the query has returned all the people who joined the organization on the same day, even though this is more than the three people asked for. Note the use of RANK() rather than the DENSE_RANK() function. The RANK() function skips ranking numbers, such that the sixth employee to be returned would be given a rank of 6. DENSE_RANK() would assign the sixth person a rank of 2 as this works on distinct values, rather than rows.
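Putting the three ranking functions side by side makes the difference visible. A sketch, assuming the SCOTT demo EMP table with the five tied hiredates described above:

```sql
-- Assumes the SCOTT demo table EMP.
SELECT ename, hiredate,
       ROW_NUMBER() OVER (ORDER BY hiredate) AS row_num,        -- 1, 2, 3, ...: ties broken arbitrarily
       RANK()       OVER (ORDER BY hiredate) AS hire_rank,      -- ties share a rank, then numbers are skipped
       DENSE_RANK() OVER (ORDER BY hiredate) AS dense_rank_no   -- ties share a rank, no numbers skipped
FROM   emp
ORDER  BY hiredate;
```

Filtering on row_num <= 3 in an outer query always returns exactly three rows, but which of the tied employees appear is arbitrary unless a tie-breaker (for example ename) is added to the ORDER BY.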
If the hiredates were all different, RANK() would return exactly three rows, ranked 1, 2 and 3.
Example Twelve: Determine the ranking of each employee's salary within their department and within the company as a whole.
SELECT deptno
     , ename
     , sal
     , DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal DESC NULLS LAST) AS dept_ranking
     , DENSE_RANK() OVER (ORDER BY sal DESC NULLS LAST) AS company_ranking
FROM   emp
ORDER  BY deptno;

    DEPTNO ENAME             SAL DEPT_RANKING COMPANY_RANKING
---------- ---------- ---------- ------------ ---------------
        10 KING             5000            1               1
        10 CLARK            2450            2               5
        10 MILLER           1300            3               8
        20 SCOTT            3000            1               2
        20 FORD             3000            1               2
        20 JONES            2975            2               3
        20 ADAMS            1100            3              10
        20 SMITH             800            4              12
        30 BLAKE            2850            1               4
        30 ALLEN            1600            2               6
        30 TURNER           1500            3               7
        30 WARD             1250            4               9
        30 MARTIN           1250            4               9
        30 JAMES             950            5              11
Note: The use of DENSE_RANK() as opposed to RANK() means that no rank numbers are skipped. For example, in department 20, SCOTT and FORD have the same salary so they share a dense_rank of 1, while JONES (next highest) has a dense_rank of 2. With RANK(), JONES would be ranked 3, as rank is relative to the number of rows, so the RANK() values for SCOTT, FORD and JONES would be 1, 1 and 3 respectively.
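A common follow-up is "the top two salaries in each department". Analytic functions cannot appear in a WHERE clause, so the ranking is computed in an in-line view and filtered outside. A sketch, assuming the standard SCOTT demo EMP table:

```sql
-- Assumes the SCOTT demo table EMP.
SELECT deptno, ename, sal, dept_ranking
FROM  (SELECT deptno, ename, sal,
              DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal DESC NULLS LAST) AS dept_ranking
       FROM   emp)
WHERE  dept_ranking <= 2
ORDER  BY deptno, dept_ranking;
```

With the data shown in Example Twelve, department 20 returns three rows (SCOTT, FORD and JONES) because of the tie at the top.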
==::Hierarchical Queries::==
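-- employee 7902 (FORD in the SCOTT demo data) and everyone below him; LEVEL 1 is FORD himself
select level, empno, ename, mgr from emp start with empno = 7902 connect by prior empno = mgr;
-- the whole hierarchy, starting from the employee with no manager
select level, empno, ename, mgr from emp start with mgr is null connect by prior empno = mgr;
-- each employee with their manager; the root row has no PRIOR row, so its manager part is empty
select ename || ' reports to ' || prior ename "Reporting Details of employee" from emp start with mgr is null connect by prior empno = mgr;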
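For displaying a hierarchy, Oracle also provides SYS_CONNECT_BY_PATH and ORDER SIBLINGS BY. A sketch, assuming the standard SCOTT demo EMP table: LPAD indents each name according to its LEVEL, SYS_CONNECT_BY_PATH shows the chain of names from the root, and ORDER SIBLINGS BY sorts employees under the same manager without breaking the tree order.

```sql
-- Assumes the SCOTT demo table EMP.
SELECT LEVEL,
       LPAD(' ', 2 * (LEVEL - 1)) || ename AS org_chart,   -- indentation grows with depth
       SYS_CONNECT_BY_PATH(ename, '/')     AS mgmt_path    -- e.g. /KING/JONES/FORD
FROM   emp
START  WITH mgr IS NULL
CONNECT BY PRIOR empno = mgr
ORDER  SIBLINGS BY ename;
```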