Today’s topic is missing values in a database. It is very common to get databases with some or lots of missing data in them. Missing data may make data retrieval incorrect. For example, when we use ‘=’, ‘>’ or ‘<’ conditions in a where clause to compare values with some value. How can we compare a value with a missing value or null value? Let’s look at this query:
Select * from fun.games where min_age>= 8 and min_age <=10;
In this query we are chacking if min_age is greater than or equal to 8 and less than or equal to 10. If min_age in some rows are null, then there is no way we can compare them. This is important for data scientists to understand the data properly and check if there are missing values and deal with them properly. Here I am talking about some ideas to deal with missing data or null values. There will be lots of examples and I used an Impala engine. But Hive, MySQL, PostgreSQL and many other SQL engines should also support this types of functionality.
IS NULL or IS NOT NULL
You cannot test for NULL using =, <, >. Because “select * from inventory where price = NULL” will return a null.
Or “select * from inventory where price != NULL” will also return a null. Any value compared to null will always yield null. Like:
3 = NULL will be NULL
3 != NULL will be NULL
3 < NULL will be NULL
NULL = NULL will be NULL
NULL != NULL will be NULL
So to deal with NULL, we can use IS NULL or IS NOT NULL to check if there is a null value. Here is an example of IS NULL. Following query statement in the picture below querying for the rows where price column has null values from the fun table of inventory database. The picture is showing the output. Only one row has NULL price.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut elit tellus, luctus nec ullamcorper mattis, pulvinar dapibus leo.
Here is an example where we can use IS NULL and IS NOT NULL in a little more complex query. Let’s retrieve the flight details of January 15, 2009, where all the arrival time is empty or null and all the departure time is available.
Select * from fly.flights where flights.year = 2009 and flights.day = 15 and flights.month = 1 and dep_time IS NOT NULL and arr_time IS NULL;

IS NULL or IS NOT NULL
We saw earlier how to find NULL values and also how to filter out NULL values with IS NULL and IS NOT NULL functions. Here we will see another tricky problem. In the picture below we can see the offices table from default database.

In this table state_province is NULL for Singapore because Singapore does not have any state or province. Now check what we get if we run a query to find the rows where state_province is not ‘Istanbul’.

This query didn’t bring Singapore. Though it’s value is not ‘Illnois’ but it’s value is NULL.
!= Null returns NULL. So, we didn’t get a correct result from this query. To solve this type of situation IS DISTINCT FROM is very useful.
These statements are equal:
Select * from default.offices where state_province != ‘Illinois’ or state_province IS NULL;
Select * from default.offices where state_province IS DISTINCT FROM ‘Illinois’;
Please see the pictures below.


frolep rotrem
3 Mar 2020My brother recommended I might like this web site. He was once entirely right. This post actually made my day. You cann't believe simply how so much time I had spent for this information! Thanks!