SQL Documents


3 longest movies and actors who played the role

List all movies in which Burt Temple played


If we need the film name
IN, NOT IN, EXISTS, and NOT EXISTS are used in SQL to filter
data based on certain conditions. Here are some guidelines on
when to use each of these operators:
• Use IN when you want to filter data based on a list of specific
values. For example, to select all the employees in the Marketing
or Sales department, you can use the IN operator as follows:

SELECT * FROM employees WHERE department IN ('Marketing', 'Sales');

• Use NOT IN when you want to exclude data based on a list of
specific values. For example, to select all employees who are in
neither the Marketing nor the Sales department, you can use the
NOT IN operator as follows:

SELECT * FROM employees WHERE department NOT IN ('Marketing', 'Sales');

• Use EXISTS when you want to check whether a subquery returns
any rows. For example, to select all the employees who have at
least one order in the orders table, you can use the EXISTS
operator as follows:

SELECT * FROM employees WHERE EXISTS (SELECT * FROM orders WHERE orders.employee_id =
employees.employee_id);

• Use NOT EXISTS when you want to check whether a subquery does
not return any rows. For example, to select all the employees who
do not have any orders in the orders table, you can use the
NOT EXISTS operator as follows:

SELECT * FROM employees WHERE NOT EXISTS (SELECT * FROM orders WHERE
orders.employee_id = employees.employee_id);
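
The four statements above can be tried together on a small data set. The following is a minimal T-SQL sketch, assuming two table variables, @employees and @orders, filled with made-up rows; the column names mirror the examples above, but the data is purely illustrative. Run it as a single batch, since table variables only live for the duration of the batch.

```sql
-- Hypothetical sample data; names and values are for illustration only.
DECLARE @employees TABLE (employee_id INT PRIMARY KEY, name VARCHAR(50), department VARCHAR(50));
DECLARE @orders    TABLE (order_id INT PRIMARY KEY, employee_id INT NOT NULL);

INSERT INTO @employees VALUES
    (1, 'Ann',  'Marketing'),
    (2, 'Bob',  'Sales'),
    (3, 'Cleo', 'Finance');

INSERT INTO @orders VALUES (10, 1), (11, 1), (12, 3);

-- IN: employees in Marketing or Sales (Ann, Bob).
SELECT name FROM @employees WHERE department IN ('Marketing', 'Sales');

-- NOT IN: employees in any other department (Cleo).
SELECT name FROM @employees WHERE department NOT IN ('Marketing', 'Sales');

-- EXISTS: employees with at least one order (Ann, Cleo).
SELECT e.name
FROM @employees AS e
WHERE EXISTS (SELECT 1 FROM @orders AS o WHERE o.employee_id = e.employee_id);

-- NOT EXISTS: employees without any order (Bob).
SELECT e.name
FROM @employees AS e
WHERE NOT EXISTS (SELECT 1 FROM @orders AS o WHERE o.employee_id = e.employee_id);
```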

Problem

Is there a difference between using the T-SQL IN operator or the
EXISTS operator in a WHERE clause to filter for specific values in
SQL queries and stored procedures? Is there a logical difference, a
performance difference or are they exactly the same? And what
about NOT IN and NOT EXISTS?

Solution

In this SQL tutorial we'll investigate whether there are any
differences between the EXISTS and the IN operator. A difference can
be logical, i.e. the operators behave differently under certain
circumstances, or performance-related, meaning that one operator is
faster than the other. We'll be using the AdventureWorks DW 2017
sample database on Microsoft SQL Server for our test queries.

SQL IN vs EXISTS Syntax

The IN operator is typically used to filter a column for a certain list
of values. For example, review this SELECT statement:

SELECT
[ProductSubcategoryKey]
,[EnglishProductSubcategoryName]
,[ProductCategoryKey]
FROM [AdventureWorksDW2017].[dbo].[DimProductSubcategory]
WHERE [ProductCategoryKey] IN (1,2);

This query searches for all the product subcategories which belong
to the product categories Bikes and Components
(ProductCategoryKey 1 and 2).
You can also use the IN operator to search the values in the result
set of a subquery with the following SQL commands:

SELECT
[ProductSubcategoryKey]
,[EnglishProductSubcategoryName]
,[ProductCategoryKey]
FROM [AdventureWorksDW2017].[dbo].[DimProductSubcategory]
WHERE [ProductCategoryKey] IN
(
SELECT [ProductCategoryKey]
FROM [dbo].[DimProductCategory]
WHERE [EnglishProductCategoryName] = 'Bikes'
);

This query returns all subcategories linked to the Bikes category.

The benefit of using a subquery is that the query becomes less
hard-coded; if the ProductCategoryKey changes for some reason, the
second query will still work, while the first query might suddenly
return incorrect results. It is important, though, that the subquery
returns exactly one column for the IN operator to work.

The EXISTS operator doesn't check for values, but instead checks
for the existence of rows. Typically, a subquery is used in
conjunction with EXISTS. It actually doesn't matter what the
subquery returns, as long as rows are returned.

The following query returns all rows from the DimProductSubcategory
table, because the inner subquery returns rows (which are not
related to the outer query at all).

SELECT
[ProductSubcategoryKey]
,[EnglishProductSubcategoryName]
,[ProductCategoryKey]
FROM [AdventureWorksDW2017].[dbo].[DimProductSubcategory]
WHERE EXISTS (
SELECT 1/0
FROM [dbo].[DimProductCategory]
WHERE [EnglishProductCategoryName] = 'Bikes'
);

As you might have noticed, the subquery has 1/0 in the SELECT
clause. In a normal query, this would return a divide by zero error,
but inside an EXISTS clause it's perfectly fine, since this division is
never calculated. This demonstrates that it's not important what the
subquery returns, as long as rows are returned.
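
A small sketch, assuming two made-up table variables named @category and @subcategory, shows that an uncorrelated EXISTS acts as an all-or-nothing switch: the outer query returns every row when the subquery finds anything, and no rows at all otherwise. The subquery selects NULL here to stress again that the selected value is irrelevant.

```sql
-- Hypothetical data for illustration.
DECLARE @category    TABLE (category_name VARCHAR(50));
DECLARE @subcategory TABLE (subcategory_name VARCHAR(50));

INSERT INTO @category VALUES ('Bikes'), ('Clothing');
INSERT INTO @subcategory VALUES ('Road Bikes'), ('Jerseys');

-- The subquery finds a row, so every subcategory is returned.
SELECT subcategory_name
FROM @subcategory
WHERE EXISTS (SELECT NULL FROM @category WHERE category_name = 'Bikes');

-- The subquery finds nothing, so no subcategory is returned.
SELECT subcategory_name
FROM @subcategory
WHERE EXISTS (SELECT NULL FROM @category WHERE category_name = 'Toys');
```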

To use EXISTS in a more meaningful way, you can use a correlated
subquery. In a correlated subquery, we pair values from the outer
query with values from the inner (sub)query. This effectively checks
if the value of the outer query exists in the table used in the inner
query. For example, if we want to return a list of all employees who
made a sale, we can write the following query:

SELECT
[EmployeeKey]
,[FirstName]
,[LastName]
,[Title]
FROM [AdventureWorksDW2017].[dbo].[DimEmployee] e
WHERE EXISTS (
SELECT 1
FROM dbo.[FactResellerSales] f
WHERE e.[EmployeeKey] = f.[EmployeeKey]
);

In the WHERE clause inside the EXISTS subquery, we correlate the
employee key of the outer table (DimEmployee) with the employee key
of the inner table (FactResellerSales). If the employee key exists in
both tables, a row is returned and EXISTS returns true. If an
employee key is not found in FactResellerSales, EXISTS returns false
and the employee is omitted from the results.

We can implement the same logic using the IN operator with the
following SQL statement:

SELECT
[EmployeeKey]
,[FirstName]
,[LastName]
,[Title]
FROM [AdventureWorksDW2017].[dbo].[DimEmployee] e
WHERE [EmployeeKey] IN (
SELECT [EmployeeKey]
FROM dbo.[FactResellerSales] f
);

Both queries return the same result set, but maybe there is an
underlying performance difference? Let's compare the execution
plans.
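
One way to make this comparison yourself is to run both queries with runtime statistics switched on. The sketch below uses the SQL Server session options SET STATISTICS IO, SET STATISTICS TIME and SET STATISTICS XML, and assumes the session is connected to the AdventureWorksDW2017 database. No output is shown, because reads and timings depend on the machine and the SQL Server version.

```sql
SET STATISTICS IO ON;   -- report logical and physical reads per table
SET STATISTICS TIME ON; -- report parse, compile and execution times
SET STATISTICS XML ON;  -- return the actual execution plan as XML

-- Variant 1: EXISTS with a correlated subquery.
SELECT e.[EmployeeKey]
FROM [dbo].[DimEmployee] AS e
WHERE EXISTS (
    SELECT 1
    FROM [dbo].[FactResellerSales] AS f
    WHERE f.[EmployeeKey] = e.[EmployeeKey]
);

-- Variant 2: IN with a subquery.
SELECT e.[EmployeeKey]
FROM [dbo].[DimEmployee] AS e
WHERE e.[EmployeeKey] IN (
    SELECT f.[EmployeeKey]
    FROM [dbo].[FactResellerSales] AS f
);

SET STATISTICS XML OFF;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

If both statements report the same reads and produce the same plan shape, the optimizer has treated them identically.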

An EXISTS subquery can also correlate on more than one column at a
time. Let's illustrate this with an example. In the AdventureWorks
data warehouse, we have an Employee dimension. Some employees manage
a specific sales territory.

Now, it's possible that a salesperson also makes sales in other
territories. For example, Michael Blythe, who is responsible for the
Northeast region, has sold in 4 distinct regions.

Let's suppose we now want to find the sales amounts for the sales
territory managers, but only for their own region. A possible SQL
query could be:

SELECT
f.[EmployeeKey]
,f.[SalesTerritoryKey]
,SUM([SalesAmount])
FROM [dbo].[FactResellerSales] f
WHERE EXISTS
(
SELECT 1
FROM [dbo].[DimEmployee] e
WHERE f.[EmployeeKey] = e.[EmployeeKey]
AND f.[SalesTerritoryKey] = e.[SalesTerritoryKey]
AND e.[SalesTerritoryKey] <> 11 -- the NA region
)
GROUP BY f.[EmployeeKey]
,f.[SalesTerritoryKey];

The result is as follows:
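
The effect of correlating on two columns can also be checked on a small, self-contained data set. The following sketch assumes two table variables, @employee and @sales, with made-up keys and amounts; it follows the same pattern as the query above but is not the AdventureWorks data.

```sql
-- Hypothetical data: employee 1 manages territory 1, employee 2 manages territory 2.
DECLARE @employee TABLE (employee_key INT, territory_key INT);
DECLARE @sales    TABLE (employee_key INT, territory_key INT, amount DECIMAL(10, 2));

INSERT INTO @employee VALUES (1, 1), (2, 2);
INSERT INTO @sales VALUES
    (1, 1, 100.00),  -- own territory
    (1, 2,  50.00),  -- another territory, filtered out
    (2, 2,  70.00),  -- own territory
    (2, 2,  30.00);  -- own territory

-- Only sales whose employee AND territory both match a manager row are kept.
SELECT s.employee_key, s.territory_key, SUM(s.amount) AS total_amount
FROM @sales AS s
WHERE EXISTS (
    SELECT 1
    FROM @employee AS e
    WHERE e.employee_key  = s.employee_key
      AND e.territory_key = s.territory_key
)
GROUP BY s.employee_key, s.territory_key;
-- Returns (1, 1, 100.00) and (2, 2, 100.00); the 50.00 sale is excluded.
```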

SQL Server NOT IN vs NOT EXISTS

By prefixing the operators with the NOT operator, we negate the
Boolean output of those operators. Using NOT IN for example will
return all rows with a value that cannot be found in a list.
There is one special case though: when NULL values come into the
picture. If a NULL value is present in the list, the result set of a
NOT IN query is empty! This means that NOT IN can return unexpected
results if a NULL value suddenly pops up in the result set of the
subquery. NOT EXISTS doesn't have this issue, since it doesn't matter
what is returned. If an empty result set is returned, NOT EXISTS will
negate this, meaning the current record isn't filtered out.
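
A NOT EXISTS query of this kind could look like the following sketch. It reuses the DimEmployee and FactResellerSales tables from the earlier examples and is an assumed reconstruction, not necessarily the original query.

```sql
-- Assumed reconstruction: employees with no matching row in FactResellerSales.
SELECT
     e.[EmployeeKey]
    ,e.[FirstName]
    ,e.[LastName]
    ,e.[Title]
FROM [AdventureWorksDW2017].[dbo].[DimEmployee] AS e
WHERE NOT EXISTS (
    SELECT 1
    FROM [dbo].[FactResellerSales] AS f
    WHERE f.[EmployeeKey] = e.[EmployeeKey]
);
```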

The query above returns all employees who haven't made a sale.
Logically, NOT IN and NOT EXISTS are the same, meaning they
return the same result sets, as long as NULLs aren't involved. Is
there a performance difference? Again, both query plans are the
same.
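
The NULL caveat described above is easiest to see in isolation. The sketch below assumes two made-up table variables, @employee and @sales, where one sales row has a NULL employee key, and it assumes the default ANSI_NULLS ON setting.

```sql
-- Hypothetical data: employee 3 has no sale; one sale row has a NULL employee key.
DECLARE @employee TABLE (employee_key INT PRIMARY KEY, name VARCHAR(50));
DECLARE @sales    TABLE (sale_id INT PRIMARY KEY, employee_key INT NULL);

INSERT INTO @employee VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cleo');
INSERT INTO @sales VALUES (100, 1), (101, 2), (102, NULL);

-- NOT IN: returns no rows, because comparing 3 with NULL evaluates to UNKNOWN.
SELECT e.name
FROM @employee AS e
WHERE e.employee_key NOT IN (SELECT s.employee_key FROM @sales AS s);

-- NOT EXISTS: returns Cleo, the only employee without a sale.
SELECT e.name
FROM @employee AS e
WHERE NOT EXISTS (SELECT 1 FROM @sales AS s WHERE s.employee_key = e.employee_key);

-- NOT IN matches NOT EXISTS again once NULLs are filtered out: returns Cleo.
SELECT e.name
FROM @employee AS e
WHERE e.employee_key NOT IN (
    SELECT s.employee_key FROM @sales AS s WHERE s.employee_key IS NOT NULL
);
```

Filtering NULLs inside the subquery, as in the last statement, is one way to keep NOT IN safe when the column is nullable.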

• A CTE (common table expression) is a named set of temporary
results that is defined within the context of a single SQL statement.
• It allows complex queries to be broken down into smaller,
more manageable pieces, making them easier to read and
understand.
• CTEs are typically used to simplify queries that involve
multiple joins or subqueries, or to perform recursive queries.
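
The points above can be sketched in T-SQL as follows. The example assumes a made-up table variable @sales and shows one non-recursive and one recursive CTE; the CTE names are illustrative. Note that in T-SQL the statement preceding WITH must end with a semicolon.

```sql
-- Hypothetical data for illustration.
DECLARE @sales TABLE (employee_key INT, amount DECIMAL(10, 2));
INSERT INTO @sales VALUES (1, 100.00), (1, 50.00), (2, 70.00);

-- Non-recursive CTE: name an intermediate result, then query it.
WITH sales_per_employee AS (
    SELECT employee_key, SUM(amount) AS total_amount
    FROM @sales
    GROUP BY employee_key
)
SELECT employee_key, total_amount
FROM sales_per_employee
WHERE total_amount > 100;   -- returns (1, 150.00)

-- Recursive CTE: generate the numbers 1 to 5.
WITH numbers AS (
    SELECT 1 AS n          -- anchor member
    UNION ALL
    SELECT n + 1           -- recursive member, refers to the CTE itself
    FROM numbers
    WHERE n < 5
)
SELECT n FROM numbers;      -- returns 1, 2, 3, 4, 5
```
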
Feature           CTE                          View                         Subquery

Purpose           Temporary query result       Reusable query result        Part of a larger query

Usage             Defined in a single SQL      Defined as a separate        Nested within a larger
                  statement                    database object              query

Syntax            Defined using the keyword    Defined using the CREATE     Defined within parentheses
                  WITH                         VIEW statement

Naming            Named within the query       Named as a separate          Not named
                                               database object

Performance       Generally less efficient     Generally more efficient,    Generally less efficient
                  due to lack of optimization  as they can be indexed       due to repeated calculations

Data persistence  Not persisted in the         Definition persisted in      Not persisted in the
                  database                     the database                 database

Functionality     Can be recursive             Can be indexed               Can return a scalar value,
                                                                            a list of values, or a set
                                                                            of rows

Readability       Can improve the readability  Can improve the readability  Can make queries more
                  of complex queries           of complex queries           complex and harder to read
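
To make the comparison concrete, the sketch below produces the same result three ways against the FactResellerSales table used earlier. It assumes a connection to the AdventureWorksDW2017 database; the view name dbo.vSalesPerEmployee is hypothetical, and GO is the batch separator of client tools such as SSMS and sqlcmd (CREATE VIEW must be the first statement in its batch).

```sql
-- 1) CTE: exists only for the statement that follows it.
WITH sales_per_employee AS (
    SELECT [EmployeeKey], SUM([SalesAmount]) AS TotalSales
    FROM [dbo].[FactResellerSales]
    GROUP BY [EmployeeKey]
)
SELECT [EmployeeKey], TotalSales
FROM sales_per_employee;

-- 2) Subquery (derived table): nested inside the outer query.
SELECT s.[EmployeeKey], s.TotalSales
FROM (
    SELECT [EmployeeKey], SUM([SalesAmount]) AS TotalSales
    FROM [dbo].[FactResellerSales]
    GROUP BY [EmployeeKey]
) AS s;
GO

-- 3) View: stored as a database object and reusable by later queries.
-- dbo.vSalesPerEmployee is a hypothetical name.
CREATE VIEW [dbo].[vSalesPerEmployee]
AS
SELECT [EmployeeKey], SUM([SalesAmount]) AS TotalSales
FROM [dbo].[FactResellerSales]
GROUP BY [EmployeeKey];
GO

SELECT [EmployeeKey], TotalSales
FROM [dbo].[vSalesPerEmployee];
```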
