This article will be a comprehensive guide for SQL unions and joins. A lot of time databases are designed in a way that joins and unions are essential. For convenience, data can be stored in several separate tables instead of having a big table. So, it is important to learn unions and joins to use them properly.
We will use PostgreSQL for this demonstration. This is a very popular SQL engine nowadays. But the same queries can work in many other mainstream SQL engines as well.
If you are reading this to learn, please type and try to run the queries yourself. That’s the only way to learn.
Let’s dive in.
Preparing the Data Tables
I will prepare some dummy data tables for this demonstration. These are the commands for creating three tables: games, makes, and gold.
CREATE TABLE public.games(country character varying(50) NOT NULL, games character varying(50) NOT NULL, year integer NOT NULL);
CREATE TABLE makes (country varchar (50) NOT NULL, clubs varchar (50) NOT NULL, games varchar (50) NOT NULL);
create table gold (player_name varchar (100) not null, years varchar (10) not null, games varchar (50) null);
The games table has three columns: country, games, and year. The makes table has three columns: country, clubs, and games. In the end, the gold table has three columns: player_name, years, and games.
We need to insert some data in the tables to move forward:
insert into gamesvalues("USA", "Baseball", 2012), ("China", "Wrestling", 2011), ("England", "Cricket", 2015), ("India", "Cricket", 2011), ("USA", "Football", 2018), ("Spain", "Football", 2014), ("China", "Basketball", 2019), ("Italy", "Football", 2017);
insert into makesvalues('USA', 'gk', 'Basketball'), ('China', 'kl', 'Wrestling'), ('Finland', 'ds', 'Football'), ('England', 'nj', 'Cricket'), ('Spain', 'lp', 'Football'), ('Norway', 'ko', 'Basketball'), ('India', 'kg', 'Wresling'), ('South Africa', 'ep', 'Cricket'), ('Nigeria', 'wm', 'Swimming');
insert into goldvalues('Jimmy', '2018', 'Football'), ('Danny', '2012', 'Baseball'), ('Indra', '2011', 'Cricket'), ('Yun', '2019', 'Basketball'), ('Mojo', '2017', 'Football'), ('David', '2015', 'Cricket');
Data insertion is done. Let’s have a look at the tables. Here is the games table:

Here is the makes table:

Here is the gold table:

The tables are ready. They may not look complete. But we will be able to learn using them and use the concept in our real-world projects.
Unions
Both games and makes tables have country columns. We want to see all the countries in one place:
select country from games union all select country from makes;

Here is the part of the result. As the result, one country comes several times that may be unnecessary. How to solve this?
There is another type of union called union distinct. It will only return the distinct values:
select country from games union distinct
select country from makes;

We can use multiple columns as well. In the next example, we will see country and games columns. So, it will show both the column from both the tables.
select country, games from games union all select country, games from makes;

The union distinct can be used here as well to get the distinct combination of country and games.
select country, games from games union distinct
select country, games from makes;

We have 14 rows of data in the results. You may find one country several times or one game several times. But there are 14 unique combinations of country and games available in both the tables.
Notice, in the games column we have ‘year’ as integer values and in the gold column we have ‘years’ column as varchar values. Can we use union on them?
We can but we need to make them the same data type first. Here I will cast the ‘years’ column in the gold table as an integer and then perform the union. We are also using an alias and making the ‘years’ column as ‘year’ to match
select year from games union distinct select cast(years as int) as year from gold;

Is it possible to use union on more than two tables?
Yes. Here is an example:
select games from games
union all
select games from makes
union all
select games from gold;
This is the part of the output.

We can use the distinct unions as well. Or, one ‘all’ one ‘distinct’:
select games from games
union all
select games from makes
union distinct
select games from gold;

It returns only the distinct games from all three tables. Wrestling came twice. But if you notice carefully the spelling is different. It is a mistake in the database. So, you need to be careful about the spelling in the tables.
Joins
Joins works differently. It joins the columns. We will work on different types of joins. First, see some simple default joins.
Both games and makes tables have country columns.
This is the default join using the country column:
select * from games join makes on games.country = makes.country;

Two tables are joined based on the common countries between them and all the other columns just followed because we did not specify which columns we want in the result. The same country column came twice that was unnecessary.
In the next example, I will specify the columns I want to return in the result. Here I am making a few changes to the last query.
We are using an alias or a short name for the table. For example, we will denote the ‘games’ table as g and the ‘makes’ table as m.
The columns we want from the games table will be specified as g.column_name. The column country will come from the games table. So, it will be mentioned as g.country.
In the last example, we also had two ‘games’ columns in the table. But they are not exactly the same. So, we want both of them. To make it clear we will use column aliases and name the columns of the games table as g_games and the games column from the makes table as m_games.
select g.country, g.year, g.games as g_games, m.clubs, m.games as m_games
from games g
join makes m on g.country = m.country;

Types of Joins
There are different types of joins. Let’s work on all the joins one by one.
Inner join
The first type I want to mention is an inner join. That actually is the default type. If you do not mention any type it will give you inner join. In our last two examples, we did not mention any type. So, it performed the inner join automatically.
We are performing inner join on the country column. So, it will return only the common countries between the two tables.
We can rewrite the last example as follows, only using inner join instead of ‘join’.
select g.country, g.year, g.games as g_games, m.clubs, m.games as m_games from games g inner join makes m on g.country = m.country;

Left Outer Join
We are using the same example again. This time left outer join will be used instead of the inner join. As we are joining on the country column, in the left outer join all the countries from the left side will be taken. Here games table is on the left side because it is mentioned first in this query. From the right side or the makes tables, it will only use the information available for the countries on the left side. If any information is not available in the makes table about any country in the games table, those places will show nulls. There are some null values in the result here:
select g.country, g.year, g.games as g_games, m.clubs, m.games as m_games from games g left outer join makes m on g.country = m.country;

What if we want to see the name of the clubs for all the games on the games table in the countries of the games table?
In that case, we will need to join based on both games and country columns. Let’s do that.
Left Outer Join on Multiple Columns
In this query, the left outer join is done using both country and games columns. This time, it will return the values from the right table (makes table) where both country and games values will be matched with the games table.
select g.country, g.year, g.games, m.clubs
from games g
left outer join makes m on g.country = m.country and
g.games = m.games;

We can also join more than two tables.
Joining More Than Two Tables
Now we want to include the player names who won gold in the years mentioned in the games in the last query. So, we have to also join the gold table with the games table on games and years and include the player name in the output.
As a reminder, the ‘year’ column in the gold table is stored as varchar. It needs to be cast as an integer to do the join.
select g.country, g.year, g.games, m.clubs, gd.player_name
from games g
left outer join makes m on g.country = m.country and
g.games = m.games
left outer join gold gd on g.games = gd.games and
g.year = cast(gd.years as int);

Now we have the clubs associated with the games in the countries mentioned in the games table and the player names who won the gold in all those years all in one table.
You can perform the joins with multiple columns and more than two tables in inner join, right outer join, and full outer join as well.
We will move quickly through the right outer join and full outer join.
Right Outer join
This time we will join games and makes table on countries again to perform the right outer join. So, the table that will mention later will get priority. We will put the ‘games’ table first and the ‘makes’ table later as we did before.
But in the output table, we will get all the countries from the makes table and the information from the left table only when it will match the right table country.
select m.country, g.year, g.games as g_games, m.clubs, m.games as m_games from games g right outer join makes m on g.country = m.country;

Because we joined only on the ‘country’, it took the corresponding m_games and clubs from that certain country. As USA and China come twice in the ‘games’ table the information in the ‘makes’ table joins twice with these two rows. So, it becomes 11 rows in total.
In this case, also it is more appropriate to join using both ‘country’ and ‘games’:
select m.country, g.year, g.games as g_games, m.clubs, m.games as m_games
from games g
right outer join makes m on g.country = m.country
and g.games = m.games;

Full Outer Join
We will perform a full join on the ‘country’ only and we will return both m.country and g.country because a full join will return the information of all the countries from both the columns:
select g.country, g.year, g.games as g_games, m.country, m.clubs, m.games as m_games
from games g
full outer join makes m on g.country = m.country
and g.games = m.games;

When we do full join on both country and games column, it will take all the combinations of the country and games from both the tables:
select g.country, g.year, g.games as g_games, m.country, m.clubs, m.games as m_games
from games g
full outer join makes m on g.country = m.country
and g.games = m.games;

These are all very common types of joins.
PostgreSQL and many other SQL engines also support using the keyword ‘using’ instead of ‘on’ in the ‘join’ statement:
select g.country, g.year, g.games as g_games, m.clubs, m.games as m_games from games g full outer join makes m using (country);
Please feel free to try it out.
Natural Joins
PostgreSQL and some other SQL engines like MySQL support natural joins where if you do not mention the join condition it will naturally ‘join’ using the common columns. Let’s do a natural join using ‘games’ and ‘makes’ table:
select g.country, g.year, m.clubs
from games g natural join makes m;

As you can see I did not mention any join condition. But still, it did an inner join on both country and games columns. Because both country and games columns are common between ‘games’ and ‘makes’ tables.
You will get exactly the same result using the natural inner join as follows:
select g.country, g.year, m.clubs
from games g natural inner join makes m;
You can also perform natural left outer join, natural right outer join, and natural full outer join.
Example of Natural Left Outer Join
select g.country, g.year, m.clubs
from games g natural left outer join makes m;
Output:

Example of Natural Right Outer Join
select g.country, g.year, m.clubs
from games g natural right outer join makes m;
Output:

Example of Natural Full Outer Join
select g.country, g.year, m.clubs
from games g natural full outer join makes m;

Some Other Types of Joins
There are a few other types of join that can be mentioned here. One is cross join.
Cross Join
Cross join is like dot products of two tables. Each row of the left table joins with each row of the right table. An example here.
select * from games cross join makes;

Here I am showing the part of the output. Because the output is so big.
I am taking all the columns from both the tables but you can specify certain columns as we did before.
You may think why would anyone do a cross join. It is not very helpful in this type of data table. But there are times when it can be useful.
All the above-mentioned joins are 92-style joins. But there are some other types of syntax, I would like to mention here.
There is also an 89-style join that only performs inner join. It uses the ‘where’ clause instead of ‘join’.
select g.country, g.year, g.games as g_games, m.country, m.clubs, m.games as m_games
from games g, makes m
where g.country = m.country;

Notice, it did exactly what an inner join would do.
Conclusion
These are all the unions and joins I wanted to talk about here. I hope it was helpful. Most of these joins can be performed in other SQL engines as well. Sometimes you may need a minor syntax change. You just need to check the documentation. Please feel free to try with whatever SQL engine you are using.
Please feel free to follow me on Twitter, and the Facebook page, and check out my YouTube channel.
#sql #DataAnalytics #DataScience #DataAnalysis