What is a database table
A table is a structure that is used to store
data (values) in an organized way using rows and columns. Let's say you want to store data about countries. Below is example of a country table:
And below you can see where are colums (ID and Country), rows (8 rows) and values (2 columns x 8 rows = 16 values)
For more information visit what is a database
Take care
Emil
Purpose of tables in a database
The main purpose of tables is to
store data in an organized way that allows to achieve company objectives. One
database almost always contains multiple tables that represent enity (for
instance Customer, Product, Order) and they relate to each other, for instance
Customer buys a product and receive unique Order Number.
In our definition we mentioned to achieve company objectives. This is quite important because it affects how tables are
designed and there are different techniques depending on objective.
Purpose of table is not limited to
simply store data and it helps to achieve other objective like:
- Enforces uniqueness (for example one customer generally
(exception is data warehouse) should be recorded only once in a
customer table)
- Allow users to enter only valid values (using data
types and constraints)
- Make the data sharable with others (database is
generally on a server and can be used by multiple departments)
- Enforce referential integrity and that means that you
cannot have order without a customer.
- Quick access to data. Tables or more precisely indexes
(that are create for tables) allow quick access to data.
What is SQL
You might be wondering what is SQL. Let's start with explaining what it stands for. SQL stands for Structure Query Language and it is a language that allows you to communicate with databases. In previous lesson we explained what is a database and we explained that you store data in tables that consist of columns, rows and data. You can have multiple tables related to each other so you need a way to retrieve data (from one or multiple related tables), insert, update or delete data in tables.In order to do that you need language that your Database Management Software (DBMS) understand and can convert into actions. This language is SQL. To better understand what is SQL let's give you several simple examples of actions you might want to perform on your database.
Let's say you have been given a database and you have been given several tasks:
1) Show me all customers from our database
2) Insert new customers
3) Update customer information
4) Delete customers that requested to be permanently removed from your database
SQL is a very simple and easy to understand language as it is very similar to questions you need to answer or actions you need to perform so let me give you SQL example for the first 4 points:
1)
SELECT FirstName, Surname, DOB
FROM Customer
-- as you can it to retrieve field from a particular table we just need to use SELECT and provide fields names (separated using comma) and extract data FROM a table which is our case is customer so we added FROM Customer
2)
INSERT INTO Customer (FirstName, Surname, DOB)
VALUES ('Emil', 'Glownia','1983-08-19')
NOTE: single ' around values is needed to tell SQL that this is a string value as opposed to table name, field name, special keyword and so on. single quotes are also used for dates but are not used for numbers as number is not a string value.
-- second point asked us to insert new customer. This is simple with SQL you just type INSERT INTO Customer and specify field names and then put VALUES and provide the actual values. Strings are surrounded using single quotes. Isn't that simple?
3)
UPDATE Customer
SET Surname = 'Glownia'
WHERE FirstName = 'Katie' AND Surname = 'Sliwinska' AND DOB = '1986-08-20'
Note: We use CAPITAL letters for UPDATE, SET and AND operator. This is considered by many as best practice as it easier to read it and distinguish between tables names (Proper case) and user values.
-- third point is to update customer details. Katie got married and changed her surname to Glownia. In order to update the record using SQL we just need to type UPDATE Customer then use word SET provide field name we want to update (Surname) and provide value (='Glownia') next we need to identify unique customer record and we do that using WHERE and we specify firstname, surname and DOB.
4)
DELETE FROM Customer
WHERE FirstName = 'John' AND Surname = 'Smith' and DOB = '1980-01-01'
-- John doesn't want to have his records in our database and requested to be permanently removed. We do that using DELETE FROM Customer and specifying WHERE condition that identifies John's record.
NOTE: In real life you would normally use Primary Key (PK) or Foreign Key (FK) to identify unique records which we discussed in 'what is a database' page.
When you use SQL you have to obey certain rules. Below I show you SQL Clauses list with order of execution or in other words sequence of clauses that you should follow in order to successfuly execute SQL statement.
SELECT -- which you use to specify the fields or calculated fields you want to return
FROM -- which you use to specify table or tables you want to use and usually includes joins.
WHERE -- which you use to filter rows
GROUP BY -- used to grouped together values using field or expression you specified
HAVING -- filters group by results with aggregate function (doesn't work on row by row basis like WHERE)
ORDER BY -- used to specify results order
The above our basic SQL Clauses in order you should use them. You don't have to use all of them but there are certain rules you should also follow:
1) SELECT is mandatory
2) FROM is used in most cases (sometimes you can use only select with union without specific table)
3) WHERE you need SELECT & FROM to use WHERE (In SQL Server you can actually used WHERE with select without from but there is not much use of that)
4) GROUP BY -- You cannot used fields in SELECT that are not in group by unless you use aggregate function with them
5) HAVING -- working only if you use GROUP BY
6) ORDER BY -- You can use it at any time
Three reasons for normalization in database
Who is interested in Normalization?
Before we go into reasons for normalization let's make it very clear who is interested in normalization. This is important because there are different types of databases and two main kind of databases are:- Operational Databases (Online Transaction Processing - OLTP) -
These databases are used every day by employees or customers and generally
they are updated on row by row basis for example: An Employee
may retrieve one customer's details (one row) and update one customer
details (one row). These databases are generally used by large number of
users that may potentially want to update the same records. OLTP
databases are the most widely used databases but bear in mind that the reason
behind it, is it's purpose (operational activities) and therefore
they will be often many
purpose specific databases. OLTP Databases are often
critical from operations perspective (selling an air ticket) and are
available for 99.9%-100% of the time.
- Data Warehouses (Supports Online Analytical Processing
- OLAP) - This kind of database is actually the opposite of OLTP in
many aspects. Data Warehouses integrate data from multiple sources
(usually many OLTP databases but also excel files etc) into one centralized
location. During integration process (commonly known as ETL) the data
is cleansed (remove duplicates, typos etc) and business rules
are applied (reject/accept, calculations etc). Data Warehouses are usually
used by decision makers and analysts and they access set of rows (rarely
single row) and they typically ask questions like how much did the company
(or department) sale last month and the same time a year before. Most
companies should have only one data warehouse and the database is
critical for supporting tactical and strategic decisions (sometimes
operational)
NOTE: When we talk about Data warehouses we mean Kimball Data Warehousing methodology which is the most common one. There are different ones where our arguments may not be applicable.
IMPORTANT: Normalization supports OLTP characteristics and objectives and you should always bear this in mind because Normalization almost never is used in Data Warehouse databases. You may see normalized Data warehouses but that is only because someone was not aware of negative consequences of doing that.
Three reasons for Normalization
Below are three main reasons for normalization (from my point of view).
Reminder: These reasons are not applicable in Data Warehouses because they are handled differently and usage is different.
- Data Integrity
- Data in each table should contain unique
instances of that entity (table) with valid attribute (column) values.
That means in Customer Entity (table) you should have only one customer
(not the same customer with misspelled surname, or changed surname and so
on) also certain customer attributes should be valid for instance DOB
should not be in the future or sometimes it should be enforce rule of at
least 18 years. UK customer Post Code should also be valid and you can
enforce UK post code pattern (that get's complicated with multiple
countries)
- You can achieve that using table constraints
like: Primary Key, Unique Key, Check constraint, default constraint.
- Related tables should contain valid values.This
is also know as Referential Integrity which means that two related tables
should have related keys. In order words when we break it down to Parent
(Primary Key field) and Child table (Foreign Key) we should not have
Child table Foreign Key values that do not exist in Parent table.
- We can achieve that using Foreign Key
Constraints that will ensure you cannot delete (or insert) value into
Foreign Key field that does not exist as Primary Key field value.
- Controlled Data Manipulation
- In OLTP many users may want to access and modify
the same rows. After Normalization process you have many tables that
contain usually one values in one table (one instance in one table to be
more precise). This means that if you need to modify Product Category you
don't lock product information
and just lock one particular table which is used in many other tables
(using Foreign Key). This means users conflicts (trying to modify the
same row) is minimized to minimum.
- Security of data is also important and with many
tables you get greater control over data access.
- Performance
- If you break down table into multiple smaller
tables than that means you will save space (note: this is not really
a good argument in data warehouses due to different usage). When you
save space that means you get better performance because accessing
particular row requires less effort.
- In databases you often improve retrieval
performance by adding indexes and in OLTP choosing the right indexes in
particularly important as there are large amount of request that usually
affects individual rows or small set of rows and you want to ensure
extract of one or small set of rows is very fast but at the same time you
don't want to slow down updates, inserts and deletes and the more indexes
you have to slower updates/inserts/delete are therefore choosing the
right indexes is particularly important and usually have a big affect on
users experience.
Article Version: SQL Select Statement
In this tutorial I will explain what a SQL SELECT statement is, when can you use it and I will give code examples how to use it.A database stores data in tables. Data in tables is stored using columns and rows. You can retrieve data from tables using SQL SELECT Statement.
SELECT can be used to:
- Retrieve data from specific columns which exist in
one or more tables.
- Create calculated fields that contains specific
logic to product new results and return it to the user as extra column.
- Use Manipulation Functions (Text, Data and Time,
Numeric). For instance retrieve on the fly today’s date.
- Use aggregate functions to summary the rows. For
instance Sum sales for each country.
- Limit Results. For instance show only top 10
customers.
- Retrieve only distinct rows. For instance to get
only a list of values that are used in a specific field.
SQL Syntax and examples
Below is a very simple SQL SELECT syntax:SELECT FieldName1, FieldName2, FieldName3
FROM tableName
I would like to show you now a few simple examples using Select.My table is called PersonDetails and contains only three fields (PersonID, PersonFirstName, PersonSurname). Based on this table I will present you how to retrieve only the fields you need and retrieve ALL rows from the table which is 4 in this case.
First example retrieves data from PersonName field which is taken from PersonDetails table :
SELECT PersonName
FROM PersonDetails
See below screenshot with our code sample and retrieved data.
In next example I will use exactly the same table as before (table PersonDetails) but this time I will retrieve two fields (PersonName, PersonSurname).Retrieving multiple columns is exactly the same as what I showed you on the first example, the only change we need to make is put comma (,) after the first field in select and then put the second field name.
Below is the code we use to retrieve rows (data) from multiple columns:
SELECT PersonName, PersonSurname
FROM PerosnDetails
See below the outcome of our code
As you can see retrieving fields from one table using SQL is very simple. In the next example we will show you how to retrieve all fields from one table without specifying the columns.
Using table PersonDetails I will retrieving all columns. There are two ways to retrieve data from all columns first one is to put all fields names in SELECT but there is an easier way to do that. We can use special character * which retrieved all fields from the specified table(s).
Below is the code sample:SELECT *
FROM PerosnDetails
Let’s check the output. As you can see the special character * retrieved data from all columns.
SQL Tip: Retrieving all columns can be very convenient but use only if you want to see what is in the table. If your code is going to be part of some kind of application/reporting system make sure you specify the fields which will make your queries run faster, will keep network data traffic to minimum and it is usually easier to understand your code (which fields are involved).
You may have noticed that our results pane column header is exactly the same as the field name in our select and sometimes you will want to rename it. This is easy to do with SQL. You can as an alias which is very simple in use; you just have put as NewFieldName after the column you want to rename.
Let’s see it in action:
SELECT PersonName as Name FROM PerosnDetails
As you can see our retrieved field in called ‘Name’ in the results pane and not ‘PersonName’ anymore.
SQL Select calculated column
You can retrieve field values using select
but you can also use select to perform calculations and created calculated
fields and below I will give several examples.The simpliest form of calculated field (or calculated column) is below:
SELECT 2+2 AS CalculatedField
As you can see I provided two values and performed addition on them to get 4 as result. This is calculated field. Notice that I haven't used from as I don't need it for calculations.
In the following examples I will use calculated field as well but using more realistic scenarios.
SQL Select concatenate
Using Select you can concatenate values,
fields (columns) that are strings but also you can convert data types and
perform concatenation as well. See below examples:In the example above we:
--> Concatenated two strings (I have placed a space at the end of 'word1 ' so it is displayed in the results)
SELECT ‘word1 ’ + ‘word 2’ as SimpleStrings
--> Concatenated a string with a field (I have placed a space after 'My name is ' to separate the words)
SELECT ‘My name is ‘ + FirstName as MyNameIs
FROM Customer
--> Concatenated two fields (notice how the space is added between the fields)
SELECT FirstName + ‘ ‘ + Surname as FullName
FROM Customer
--> Concatenated a string with an integer. In this case I used CAST to convert the year field which is an integer into a string (nvarchar with 4 characters)
SELECT ‘Born in ‘ + CAST(YearOfBirth AS NVARCHAR(4)) as BornInYear
FROM Customer
--> Concatenated a string (this could be any data type) with NULL. Be careful here as any concatenation with NULL (or a field containing NULL) will give you a NULL result.
SELECT ‘Problem with null’ + NULL as ProblemWithNULL
FROM Customer
--> Concatenated a string with NULL and handled NULL using the ISNULL function. In this case we changed NULL to an empty string by using 2 apostraphes(‘’)
SELECT ‘I am not NULL’ + ISNULL(NULL,’’) as NULLHandled
FROM Customer
Article Version: SQL SELECT CASE WHEN THEN ELSE
In this SQL tutorial I give you an example of how to use SQL SELECT with CASE WHEN THEN ELSE and for the purpose of this tutorial I will use SQL Server 2012.Very often you want to perform IF statement in SELECT and replace one value with another. In SQL we cannot use IF in SELECT clause however we can achieve the same result using CASE WHEN THEN ELSE.
In the below example I will use a very simple example where I will divide 7 days into 3 parts: Monday to Wednesday, Thursday to FrIday and Saturday to Sunday. Normally I would use today's date with GETDATE() function or Date Field and use DATEPART function with DW (day of week) to determine day of week and then replace with with my text... so my input value would be something like that DATEPART(DW,GETDATE()) but for simplicity reasons I will just hard code my input value and put just number 7 which is 'Saturday'.
NOTE: Sunday = 1 and Saturday = 7
SELECT
CASE 7
WHEN 2 THEN 'Monday to Wednesday'
WHEN 3 THEN 'Monday to Wednesday'
WHEN 4 THEN 'Monday to Wednesday'
WHEN 5 THEN 'Thursday to Friday'
WHEN 6 THEN 'Thursday to Friday'
WHEN 7 THEN 'Saturday to Sunday'
WHEN 1 THEN 'Saturday to Sunday'
ELSE 'Unknown'
END AS PartOfWeek
Below is result of the above SQL select statement
SQL FROM Syntax
SQL FROM clause allows us to specify table name (or view) that we want to use to retrieve certain fields. The most basic syntax is below:SELECT *
FROM TableName
SQL Joins Examples
SQL JOINS Introduction
We are going to show you how to use Joins in sql and this knowledge might help you to show related rows between two tables (inner join) for instance Customer and (Customer) Orders or show related rows between two tables and additionally show data that exists in first table but does not have related data in second table (LEFT OUTER JOIN) for instance show ALL customers and order details but show also customers who haven't placed an order yet.
SQL INNER JOIN EXAMPLE
From first table dbo.DimCustomer I check
primary key name and find this field in the dbo.FactInternetSales.I can see that in the dbo.FactInernetSales CustomerKye Field exist as well.
In my script below I join dbo.DimCustomer with dbo.FactInternetSales On CustomerKey field.
You can notice that before each field name are aliases I put them to recognize which field come from which table.
Below you can see when customers ordered something.
SQL LEFT OUTER JOIN EXAMPLE
From first table dbo.DimCustomer I check
primary key name and find this field in the dbo.vwFactSalesOrder2005I can see that in the dbo.vwFactSalesOrder2005 CustomerKye field exist as well.
NOTE: We used FactSalesOrder 2005 to filter data this way we can show customers who haven't placed an order in specific year for instance 2005
In my script below I join dbo.DimCustomer with dbo.vwFactSalesOrdr2005 On CustomerKey field.
You can notice that before each field name are aliases I put them to recognize which field come from which table.
NOTE: I have used fso.CustomerKey is null to show only customers that haven't placed an order in 2005.
Below you can see all customers without orders placed in 2005.
SQL WHERE
WHERE is used to filter data that you extract based on the conditions your specify.
SQL
WHERE Syntax
Below is the basic example of SQL Where syntax: SELECT Field1, Field2
FROM TableName
WHERE FieldName = Value
SQL WHERE Example Below are examples of simple condition:
Data Type Rules
String = if you field is string data type then you will have to enlose your values using single quotes like that 'value'. If you have values with single quote like surname al'jar then you will have to use double single quote for the quote inside like that 'al''jar' Numeric = for numeric values we don't need quotes so we can just do something like that Year = 2011 Date = Dates require quotest and you need to use specific format which is YYYY-MM-dd so if you want order that have been places on 1st of October 2011 then you can use OrderDate = '2011-10-01' but BEWARE! Field can contain time so you may lose rows! It is usually good idea to use function in this case like that DATEDIFF(d,'2011-10-06 08:00:00','2011-10-06 16:00:00') = 0 you can notice that the time is different but I use datediff function to ignore time. Boolean = Which is basically TRUE and FALSE in sql server we use 0 = FALSE and 1 = TRUE NULL = this is special value that represent non-existend value and we can filter it using IS NULL for instance email is null. Beware that you can also have empty string which different to NULL and you filter empty string (for string data type) using the following example email = ''
SQL WHERE examples
SQL GROUP BY
SQL Group By Syntax
Below is group by syntax in the simpliest form:SELECT FieldName
FROM TableName
GROUP BY FieldName
SQL Group By Example
Group By allows us to group rows together and
additionally perform aggregate calculation which works on a group of rows. Let
me first give you a simple example of what group by does using one field. I
will use Customer table and group all customers by Country in other words I
want to show all unique values that exist in Country Field, that is how group
by works. See below example code with results (before and after)SELECT *
from [dbo].[CUSTOMER]
sql group by year
Prior knowledge: It might be easier for you to follow this tutorial if you have knowledge of Aggregate functions and Group by for more information visit SQL Group By.
We are going to show you how to use Year and Sum aggregate function with Group by this knowledge might help with questions like
how to get Total sales amount for each year. In the following example I will show you how you could write the query.
Below I use Year function with ShipDate field and display new column as SalesYear in SELECT then I use group by with Year on ShipDate field.
NOTE: As you can see we "aggregated" date to Year level using Year function and placed it both in Group by clause and select clause.
Once I have Sales Year I added Sum aggregate function to select with SalesAmount field to get Total Sales Amount.
sql group by month and year
In this step by step SQL Tutorial I will show you how to use SQL Group by clause with using Month, Year and aggregate functions. For the purpose of this tutorial I will use SQL Server 2012.Prior knowledge: It might be easier for you to follow this tutorial if you have knowledge of Aggregate functions and Group by for more information visit SQL Group by.
We are going to show you how to use Month and Sum aggregate function with Group by this knowledge might help with questions like
how to get Total sales amount for each Month . In the fallowing example I will show you how you could write the query.
Below I used Month function to get only month from ShipDate and new column is displayed as SalesMonth. In Group by I use month function with ShipDate field.
Once I have data group by month I Sum SalesAount field to get Total sales amout.
I add some filter to show only months from 2006.
sql group by count
Introduction to SQL Group By with Count
In this step by step SQL tutorial I will show you how to use Group by clause witch count aggregate function. For the purpose of this tutorial I will use SQL Server 2008 R2.Prior knowledge: It might be easier for you to follow this tutorial if you have knowledge of Group By clause
for more information visit SQL Group By tutorial
SQL Group By Count Rows
In our example we will use DimCustomer table
which contains 18494 rows.
Let's check
what we have in our table that we can Group and count. Gender seems to be a
good choice.
What we want to see is total customers grouped by Gender.
In our SQL statement I will put Gender field in Group By clause and in select, additionally I will put COUNT(*) as TotalGender that will perform count on the group value. See below code and results.
Our results shows that there are approximately the same amount both Female and Male customers and we were able to find out that by using Group by and Count function.
using-not-operator-t-sql
Not is used to negate a condition.
First sample shows Product Name, ProductNumber, ListPrice, SellEndDate
From Product table. I would like to see only those product which SellEndDate is not null.
Below you can see that my SellEndDate dysplays a lot of Null values .
In Were I filter the SellEndDate to dysplay all values where are not null.
how-to-use-IN-operator-in-t-sql
In this step by step T-sql tutorial I will show you how to use IN operator.
For the purpose of this tutorial I will use SQL Server Menagment Studio 2008.
Why use In operator ?
The IN operator we used to specify a rage of conditions, any of which can be mached.
The Select statement retrieves all Employees the IN operator is followed by a comma-delimited list of valid values and the list must be closed using parentheses.
First picture shows FirstName, LastName, and Year which I took from StartDate
using Year() function and dysplayed as StartYear.
Into Were statement U used StartDate converted to Year and to get
all employees from 2000 and 2001 I used In() operator.
SQL Server Wildcards
In this tutorial I will present you some samples witch wild cards.
In the sample below I used “%” wildcard to find any matching character.
In this case I want to find all missing data containing following text before ‘5@adventure-works.com’.
The sample below presents how to find Name containing Ball inside the percentage on both sides means that there can be missing text which will be finding by this wildcard.
Next wild card which I will present is “_” .
Underscore wildcard will much only one character.
The combination of “%” and “[]” gives us opportunity to fins all data ending on 1-9.
Next wildcards combination returns everything what starts from “a” to “c”.
SQL Server INTERSECT operator
INTERSECT INTRODUCTION
We are going to show you how to compare result
of two queries using INTERSECT statement and return only distinct (unique) rows
that exist in both queries.
To better understand our examples let's create
a task that is close to real-life.
Imagine it's Christmas time and you have been
asked to identify customers that might buy something this Christmas. These
customers will be send "a gift" that will encourage them to buy
something new. Budget is limited and you need to focus on the most promising
customers.
You decided that the best customers are the
ones that bought something in December last year and the year before. You just
need to write a SQL query to identify those customers.
In the following example I will show you
how you could solve your task.
SQL INTERSECT statement using SSMS 2012
Before we start let's briefly
discuss INTERSECT behaviour and rules:
- INTERSECT returns DISTINCT (unique) rows that exists in both queries
- You need the same number of fields; in the same order
- Data type for each matching field should be the same
In our example we have to queries:
One returns Customers (CustomerKey
field - we won't bother with joins to customer table) that bought something in
December 2007.
Second T-SQL query returns Customers
(CustomerKey field) that bought something in December 2006
Our task is to return Customers that
bought something both in December 2007 and December 2006. In order to do that
we will use INTERSECT.
See below example that returns 44
unique customers (CustomerKey) that are best candidates to receive a gift this
Christmas!
sql server except example
Prior knowledge:It might be easier for you to follow this tutorial if you have knowledge of SQL UNION Operator that generally most developer learn first and is very similar to EXCEPT operator in rules and structure. The only difference is end result.
How does EXCEPT work?
EXCEPT operator combines two queries that have the same number of fields and returns unique rows from first query that don't exist in second query.
EXCEPT operator has two basic rules:
- The number of columns and order must be the same in all queries.
- The data types must be the same.
EXCEPT EXAMPLE
To better understand EXCEPT operator let's
give you a real life example. In our building a data warehouse project
I was given a task to see if we are getting website visits to pages that don't
exist in our "master list". To do that I need to compare two
different sources of information and find any values that exist in first source
but don't exist in second source.Below is a PAGE table that contains all pages that we created. In theory when someone visit our website it should go to a page that we created and our task is to check if that is what is happening and if not find out why that is not the case.
Our next source is TrackingSystem view that display information about each visit including our website page name. This data is gathered automatically and we do not have full control over it so it is important to cross reference data and see if we have "good quality" data.
See below view Tracking System that shows Page Name which in theory should match our Page Table from previous example.
Now that I have found fields from two different sources that I need to compare; I can do EXCEPT operator to compare the result of this two queries and get unique values of pages that are recorded during each visit and don't exist in our Page table that should contains all pages.
Below is our query using EXCEPT operator and uses two different databases.
NOTE: We got an error! If you would like to know more about Collation confict please follow this link (we will upload the link soon).
Our new query is exactly the same but uses COLLATE to resolve collation conflict.
See below our query and results of EXCEPT operator.
As you can see we have a few page names that have been recorded and don't exist in our Page table. Looking at the data I can see that there are some special characters %5C that indicate that our internal links or external links that point to our pages are not fully correct.
Thanks to EXCEPT operator we were able to quickly find a list of page name exception and using this information we can create suitable actions.
SQL UNION Operator
Article Version: SQL UNION Operator
In this Step by step SQL Tutorial I will show you a simple example using SQL Union.Union is used to combine two queries together so they appear as one result set.
Below is a simple example where I combine one row with another row using union.
Below are several basic SQL UNION rules (not all):
- Union should have two or more SELECT statement.
- Queries columns must be listed in the same order.
- Column data types must be the same.
IMPORTANT: Union shows DISTINCT rows in the final result. This is not always desired behavior so if you need to return exactly the same number of rows from both queries than use UNION ALL
PERFORMANCE: UNION applies DISTINCT; that means it will be slower comparing to UNION ALL that doesn't do that however always ensure you get the expected result.
You can also visit SQL UNION ALL Multiple queries issues with last UNION
I hope this very simple example of SQL UNION will help you
VIDEO SQL Union All Basics
SQL UNION ALL multiple queries
Below is SQL that uses UNION ALL and is based on AdventureWorkdDW2012 database.USE AdventureWorksDW2012
SELECT EmployeeNationalIDAlternateKey, Gender
FROM DimEmployee AS de
WHERE EmployeeNationalIDAlternateKey = 112457891
UNION ALL
SELECT EmployeeNationalIDAlternateKey, GenderFROM DimEmployee AS de
WHERE EmployeeNationalIDAlternateKey = 309738752
and the result is below
I have used UNION ALL so I don't lose duplicated row (which is not really duplicated as has different surrogate key, those working on Data Warehouse project will now what I mean)
The issue I got is when I used another UNION. I tried to use add another row which would have some hard coded data for demo purposes so I added UNION (I forgot ALL) see below
UNION
SELECT 123, 'U'
So that should give me the result below right? WRONG.... as I have learned.
This will give this result
It seems that the last UNION is applied and overwrites UNION ALL used before... which is rather odd and therefore UNION means that DISTINCT is applied and duplicated rows are removed and we get 3 rows instead of 4 rows. This is was my typo I simply replaced last UNION with UNION ALL but it was very surprising and interesting for me see this.
No comments:
Post a Comment