SQL Documents
SELECT * FROM employees
WHERE EXISTS (SELECT * FROM orders WHERE orders.employee_id = employees.employee_id);
SELECT * FROM employees
WHERE NOT EXISTS (SELECT * FROM orders WHERE orders.employee_id = employees.employee_id);
Problem
Solution
SELECT
     [ProductSubcategoryKey]
    ,[EnglishProductSubcategoryName]
    ,[ProductCategoryKey]
FROM [AdventureWorksDW2017].[dbo].[DimProductSubcategory]
WHERE [ProductCategoryKey] IN (1,2);
This query searches for all the product subcategories that belong
to the product categories Bikes and Components
(ProductCategoryKey 1 and 2).
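As a small sketch of what the IN list means: it is logically shorthand for a chain of equality comparisons combined with OR, so the rewritten query below should return the same rows as the IN version.

```sql
-- Equivalent formulation of IN (1,2) using OR.
-- Same table and columns as the query above; only the WHERE clause differs.
SELECT
     [ProductSubcategoryKey]
    ,[EnglishProductSubcategoryName]
    ,[ProductCategoryKey]
FROM [AdventureWorksDW2017].[dbo].[DimProductSubcategory]
WHERE [ProductCategoryKey] = 1
   OR [ProductCategoryKey] = 2;
```

The IN form is usually preferred once the list grows beyond two or three values, simply because it stays readable.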
You can also use the IN operator to search the values in the result
set of a subquery with the following SQL commands:
SELECT
     [ProductSubcategoryKey]
    ,[EnglishProductSubcategoryName]
    ,[ProductCategoryKey]
FROM [AdventureWorksDW2017].[dbo].[DimProductSubcategory]
WHERE [ProductCategoryKey] IN
(
    SELECT [ProductCategoryKey]
    FROM [dbo].[DimProductCategory]
    WHERE [EnglishProductCategoryName] = 'Bikes'
);
The EXISTS operator doesn't check for values, but instead checks
for the existence of rows. Typically, a subquery is used in
conjunction with EXISTS. It actually doesn't matter what the
subquery returns, as long as rows are returned.
The following query returns all rows from the DimProductSubcategory
table, because the inner subquery returns rows (it is not correlated
with the outer query at all).
SELECT
     [ProductSubcategoryKey]
    ,[EnglishProductSubcategoryName]
    ,[ProductCategoryKey]
FROM [AdventureWorksDW2017].[dbo].[DimProductSubcategory]
WHERE EXISTS (
    SELECT 1/0
    FROM [dbo].[DimProductCategory]
    WHERE [EnglishProductCategoryName] = 'Bikes'
);
As you might have noticed, the subquery has 1/0 in the SELECT
clause. In a normal query, this would return a divide by zero error,
but inside an EXISTS clause it's perfectly fine, since this division is
never calculated. This demonstrates that it's not important what the
subquery returns, as long as rows are returned.
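To make the same point from another angle, here is a minimal sketch, assuming the current database is AdventureWorksDW2017. The three statements differ only in the select list of the EXISTS subquery and should all return the same rows, because EXISTS only tests whether the subquery produces at least one row.

```sql
-- Assumption: the session is already using AdventureWorksDW2017.
-- The select list inside EXISTS is irrelevant; only row existence matters.

SELECT [ProductSubcategoryKey]
FROM [dbo].[DimProductSubcategory]
WHERE EXISTS (
    SELECT 1
    FROM [dbo].[DimProductCategory]
    WHERE [EnglishProductCategoryName] = 'Bikes'
);

SELECT [ProductSubcategoryKey]
FROM [dbo].[DimProductSubcategory]
WHERE EXISTS (
    SELECT *
    FROM [dbo].[DimProductCategory]
    WHERE [EnglishProductCategoryName] = 'Bikes'
);

SELECT [ProductSubcategoryKey]
FROM [dbo].[DimProductSubcategory]
WHERE EXISTS (
    SELECT NULL   -- even a NULL value counts as a returned row
    FROM [dbo].[DimProductCategory]
    WHERE [EnglishProductCategoryName] = 'Bikes'
);
```

Using SELECT 1 is a common convention because it signals to the reader that the returned value is not used.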
SELECT
     [EmployeeKey]
    ,[FirstName]
    ,[LastName]
    ,[Title]
FROM [AdventureWorksDW2017].[dbo].[DimEmployee] e
WHERE EXISTS (
    SELECT 1
    FROM dbo.[FactResellerSales] f
    WHERE e.[EmployeeKey] = f.[EmployeeKey]
);
In the WHERE clause inside the EXISTS subquery, we correlate the
employee key of the outer table – DimEmployee – with the
employee key of the inner table – FactResellerSales. If the
employee key exists in both tables, a row is returned and EXISTS
will return true. If an employee key is not found in
FactResellerSales, EXISTS returns false and the employee is
omitted from the results.
We can implement the same logic using the IN operator with the
following SQL statement:
SELECT
     [EmployeeKey]
    ,[FirstName]
    ,[LastName]
    ,[Title]
FROM [AdventureWorksDW2017].[dbo].[DimEmployee] e
WHERE [EmployeeKey] IN (
    SELECT [EmployeeKey]
    FROM dbo.[FactResellerSales] f
);
Both queries return the same result set, but is there an
underlying performance difference? Let's compare the execution
plans.
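One way to run this comparison yourself is to turn on I/O and timing statistics before executing both variants, and to enable the actual execution plan in your client (in SQL Server Management Studio, "Include Actual Execution Plan"). The sketch below assumes the current database is AdventureWorksDW2017; the reported numbers depend on your server and data, so none are shown here.

```sql
-- Assumption: the session is already using AdventureWorksDW2017.
-- Report logical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Variant 1: EXISTS with a correlated subquery
SELECT e.[EmployeeKey], e.[FirstName], e.[LastName], e.[Title]
FROM [dbo].[DimEmployee] e
WHERE EXISTS (
    SELECT 1
    FROM [dbo].[FactResellerSales] f
    WHERE e.[EmployeeKey] = f.[EmployeeKey]
);

-- Variant 2: IN with an uncorrelated subquery
SELECT e.[EmployeeKey], e.[FirstName], e.[LastName], e.[Title]
FROM [dbo].[DimEmployee] e
WHERE e.[EmployeeKey] IN (
    SELECT f.[EmployeeKey]
    FROM [dbo].[FactResellerSales] f
);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the logical reads per table in the Messages output, together with the two plans side by side, gives a reasonable first impression of whether the optimizer treats the two forms differently.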
Let’s illustrate the last point with an example. In the
AdventureWorks data warehouse, we have an Employee dimension.
Some employees manage a specific sales territory.
Now, it’s possible that a sales person also makes sales in other
territories. For example, Michael Blythe – responsible for the
Northeast region – has sold in 4 distinct regions.
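A query along the following lines could be used to check in how many distinct territories an employee has recorded sales. It is only a sketch: it assumes the current database is AdventureWorksDW2017 and that the employee is stored with FirstName 'Michael' and LastName 'Blythe'; the alias DistinctTerritories is made up for illustration.

```sql
-- Assumption: the session is already using AdventureWorksDW2017.
-- Count the distinct sales territories in which one employee has sales.
SELECT
     e.[EmployeeKey]
    ,e.[FirstName]
    ,e.[LastName]
    ,COUNT(DISTINCT f.[SalesTerritoryKey]) AS [DistinctTerritories]  -- hypothetical alias
FROM [dbo].[FactResellerSales] f
INNER JOIN [dbo].[DimEmployee] e
        ON f.[EmployeeKey] = e.[EmployeeKey]
WHERE e.[FirstName] = 'Michael'   -- assumed spelling of the name in DimEmployee
  AND e.[LastName]  = 'Blythe'
GROUP BY
     e.[EmployeeKey]
    ,e.[FirstName]
    ,e.[LastName];
```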
Let’s suppose we now only want to find the sales amounts for the
sales territory managers, but only for their own region. A possible
SQL query could be:
SELECT
     f.[EmployeeKey]
    ,f.[SalesTerritoryKey]
    ,SUM([SalesAmount])
FROM [dbo].[FactResellerSales] f
WHERE EXISTS
(
    SELECT 1
    FROM [dbo].[DimEmployee] e
    WHERE f.[EmployeeKey] = e.[EmployeeKey]
      AND f.[SalesTerritoryKey] = e.[SalesTerritoryKey]
      AND e.[SalesTerritoryKey] <> 11 -- the NA region
)
GROUP BY f.[EmployeeKey]
        ,f.[SalesTerritoryKey];
The NOT EXISTS counterpart of the earlier employee query returns
all employees who haven't made a sale. Logically, NOT IN and
NOT EXISTS are the same – meaning they return the same result
sets – as long as NULLs aren't involved. Is there a performance
difference? Again, both query plans are the same.
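As a sketch of the two forms being compared, here are the NOT EXISTS and NOT IN versions of the earlier employee query, followed by a small self-contained demonstration of the NULL caveat. The first two statements assume the current database is AdventureWorksDW2017; the last two use inline VALUES lists with made-up numbers and need no tables.

```sql
-- Employees without any row in FactResellerSales, using NOT EXISTS
SELECT e.[EmployeeKey], e.[FirstName], e.[LastName], e.[Title]
FROM [dbo].[DimEmployee] e
WHERE NOT EXISTS (
    SELECT 1
    FROM [dbo].[FactResellerSales] f
    WHERE e.[EmployeeKey] = f.[EmployeeKey]
);

-- The same logic using NOT IN
SELECT e.[EmployeeKey], e.[FirstName], e.[LastName], e.[Title]
FROM [dbo].[DimEmployee] e
WHERE e.[EmployeeKey] NOT IN (
    SELECT f.[EmployeeKey]
    FROM [dbo].[FactResellerSales] f
);

-- NULL caveat, illustrated with made-up values:
-- NOT IN returns no rows at all once the subquery yields a NULL,
-- because "n <> NULL" evaluates to UNKNOWN for every candidate row.
SELECT v.n
FROM (VALUES (1), (2), (3)) AS v(n)
WHERE v.n NOT IN (
    SELECT x.n FROM (VALUES (1), (NULL)) AS x(n)
);

-- NOT EXISTS is unaffected by the NULL and returns 2 and 3.
SELECT v.n
FROM (VALUES (1), (2), (3)) AS v(n)
WHERE NOT EXISTS (
    SELECT 1 FROM (VALUES (1), (NULL)) AS x(n)
    WHERE x.n = v.n
);
```

When the column in the subquery is nullable, NOT EXISTS is therefore the safer choice.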
A CTE is a named set of temporary results that is defined
within the context of a single SQL statement.
It allows complex queries to be broken down into smaller,
more manageable pieces, making them easier to read and
understand.
CTEs are typically used to simplify queries that involve
multiple joins or subqueries, or to perform recursive queries.
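The following sketch shows both uses mentioned above. It assumes the current database is AdventureWorksDW2017; the CTE names SalesPerEmployee and Numbers and the alias TotalSales are made up for illustration.

```sql
-- 1) A CTE that names an intermediate result, then joins to it.
WITH SalesPerEmployee AS (          -- hypothetical CTE name
    SELECT
         f.[EmployeeKey]
        ,SUM(f.[SalesAmount]) AS [TotalSales]
    FROM [dbo].[FactResellerSales] f
    GROUP BY f.[EmployeeKey]
)
SELECT
     e.[EmployeeKey]
    ,e.[FirstName]
    ,e.[LastName]
    ,s.[TotalSales]
FROM [dbo].[DimEmployee] e
INNER JOIN SalesPerEmployee s
        ON e.[EmployeeKey] = s.[EmployeeKey]
ORDER BY s.[TotalSales] DESC;

-- 2) A recursive CTE: an anchor row plus a member that refers to itself.
WITH Numbers AS (                   -- hypothetical CTE name
    SELECT 1 AS n                   -- anchor member
    UNION ALL
    SELECT n + 1                    -- recursive member
    FROM Numbers
    WHERE n < 5
)
SELECT n
FROM Numbers;                       -- returns 1 through 5
```

Note that a CTE exists only for the single statement that follows it, which is why each WITH clause above is immediately followed by its own SELECT.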
Feature   CTE                      View                    Subquery
Purpose   Temporary query result   Reusable query result   Part of a larger query