A Complete Cheatsheet for Numpy

What is Numpy?

Numpy is an open-source Python library that is essential for data scientists. Other essential libraries, such as Pandas and Scipy, are built on top of it. The Numpy documentation defines it like this:

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

I use this library every day, as do most data scientists who work in Python. It is fast, easy to use, and easy to understand. I don’t want to write too much about how and why it is so good, because you will see it for yourself as you read through this article.

My goal is to document the Numpy methods that are used every day.

As the title says, this is a Numpy cheatsheet. If you use the Numpy library, plan to use it in the future, or are trying to learn it, this page can be a great everyday resource.

These are the topics that will be covered here:

  1. Numpy Array Basics
  2. Repeat
  3. Mathematics
  4. Statistics
  5. Initializing Different Types of Arrays
  6. Rearranging or Reorganizing Arrays
  7. Indexing and Slicing of Array
  8. Adding Rows or Columns
  9. Append, Insert, Delete, and Sort
  10. Random
  11. File Import, Save, and Load

Let’s start!!

Numpy Array Basics

This section is 

I used a Jupyter Notebook for this whole exercise. First import Numpy.

import numpy as np

Make a Numpy array. To do that we need to pass a Python list.

input:

a = np.array([1,2,3])
a

output:

array([1, 2, 3])

Every element of the array ‘a’ is an integer. Now, make an array of floats:

input:

b = np.array([[9.0, 10.0, 6.0], [6.0,1.0,7.0]])
b

output:

array([[ 9., 10.,  6.],
       [ 6.,  1.,  7.]])

Let’s try to make an array with both ints and floats:

input:

np.array([1, 3.0, 0.004, -2])

output:

array([ 1.   ,  3.   ,  0.004, -2.   ])

Notice, Numpy automatically converted the integers into floats!
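
If you want to check which type Numpy picked, the dtype attribute shows it. Here is a small sketch; the names ints, floats, and mixed are mine, just for illustration. The exact integer type (int32 or int64) depends on the platform, so I only check its kind.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([9.0, 10.0, 6.0])
mixed = np.array([1, 3.0, 0.004, -2])

# dtype.kind is 'i' for signed integers and 'f' for floats
print(ints.dtype.kind)    # i
print(floats.dtype)       # float64
print(mixed.dtype)        # float64, the integers were converted to floats
```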

Find out the dimensions of array a and b:

input:

a.ndim

output:

1

input:

b.ndim

output:

2

Array ‘a’ is a one-dimensional array and array b is a two-dimensional array.

Now, find the shape of the array ‘a’ and ‘b’:

input:

a.shape

output:

(3,)

input:

b.shape

output:

(2, 3)

Array ‘a’ is a one-dimensional array. So, it has only one value in its shape. But array ‘b’ is a two-dimensional array. So, its shape is 2 x 3. That means it has 2 rows and 3 columns.

Find the length of the arrays:

input:

len(a)

output:

3

input:

len(b)

output:

2

Array ‘a’ has length 3 because it has 3 elements in it. Array ‘b’ is a two-dimensional array. So, the length of the array does not mean the number of elements in it. The length means the number of one-dimensional arrays in it or the number of rows in it. It has two rows. So, the length is 2.
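
If you need the total number of elements rather than the number of rows, the size attribute gives that. A quick sketch using the same array ‘b’ as above:

```python
import numpy as np

b = np.array([[9.0, 10.0, 6.0], [6.0, 1.0, 7.0]])

print(len(b))        # 2, the number of rows
print(b.shape[0])    # 2, the same as len(b)
print(b.size)        # 6, the total number of elements
```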

Repeat

There are a few different ways to repeat the elements of an array. If you want to repeat the whole array,

input:

np.array([2,4,6]*4)

output:

array([2, 4, 6, 2, 4, 6, 2, 4, 6, 2, 4, 6])

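Look, the array [2, 4, 6] was repeated 4 times. Note that the repetition here is done on the Python list ([2,4,6]*4) before it is passed to np.array.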
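
When the data is already a Numpy array, multiplying by 4 would multiply each element instead of repeating. In that case np.tile is one way to repeat the whole array. This is only a sketch; the names base and tiled are mine.

```python
import numpy as np

base = np.array([2, 4, 6])

tiled = np.tile(base, 4)   # repeat the whole array 4 times
print(tiled)               # [2 4 6 2 4 6 2 4 6 2 4 6]
print(base * 4)            # [ 8 16 24], element-wise multiplication instead
```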

Here is how to do the elementwise repetition,

input:

np.repeat([1,2,3], 3)

output:

array([1, 1, 1, 2, 2, 2, 3, 3, 3])

This time each element was repeated 3 times.

Let’s use this for a two-dimensional array,

input:

arr = np.array([[2, 4, 6]])
arr

output:

array([[2, 4, 6]])

Now, use repeat on it:

input:

np.repeat(arr,3,axis=0)

output:

array([[2, 4, 6],
       [2, 4, 6],
       [2, 4, 6]])

Here, we mentioned axis = 0. So, the repetition occurred in the axis-0 direction or rows direction.

input:

np.repeat(arr,3,axis=1)

output:

array([[2, 2, 2, 4, 4, 4, 6, 6, 6]])

Axis 1 indicates the direction of columns. So, repetition happens in the column’s direction.

Mathematics

In this section, I am going to show the mathematical operations. Most of the operations are self-explanatory. I will start with mathematical operations on one array.

input:

a = np.array([1,2,3,4])
a

output:

array([1, 2, 3, 4])

input:

a+2

output:

array([3, 4, 5, 6])

It adds 2 to each element of the array.

input:

a-2

output:

array([-1,  0,  1,  2])

Other arithmetic operations work the same way, such as:

input:

a/2

output:

array([0.5, 1. , 1.5, 2. ])

input:

a**2

output:

array([ 1,  4,  9, 16], dtype=int32)

Two asterisks mean exponents. Each element in ‘a’ is squared.

input:

np.sqrt(a)  #square root

output:

array([1.        , 1.41421356, 1.73205081, 2.        ])

We can also perform some trigonometric operations. The elements are treated as angles in radians:

input:

np.cos(a)

output:

array([ 0.54030231, -0.41614684, -0.9899925 , -0.65364362])

input:

np.sin(a)

output:

array([ 0.84147098,  0.90929743,  0.14112001, -0.7568025 ])

input:

np.tan(a)

output:

array([ 1.55740772, -2.18503986, -0.14254654,  1.15782128])
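
These functions read their input as radians. If your angles are in degrees, one option is to convert them first with np.deg2rad. A minimal sketch, with made-up angles:

```python
import numpy as np

degrees = np.array([0, 30, 90, 180])
radians = np.deg2rad(degrees)   # convert degrees to radians first

sines = np.sin(radians)
print(np.round(sines, 3))       # roughly 0, 0.5, 1 and 0
```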

Now see how we can do mathematical operations on two arrays or matrices. First, make one more array,

input:

b = np.array([3,4,5,6])
b

output:

array([3, 4, 5, 6])

Just as a reminder, our array ‘a’ looked like this:

array([1, 2, 3, 4])

Now, we have two arrays, a and b. Let’s do the same mathematical operations. Again, it’s simple and self-explanatory,

input:

a + b

output:

array([ 4,  6,  8, 10])

In the same way, you can do the following operations:

a - b
a*b
a/b
a**b

Another widely used operation is,

input:

a.dot(b)

output:

50

What is a.dot(b)? It’s element-wise multiplication and then addition like this,

1*3 + 2*4 + 3*5 + 4*6

where array ‘a’ is [1,2,3,4] and array b is [3,4,5,6].

You can write the syntax a bit differently as well,

np.dot(a, b)

This works the same. The output will be 50.
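
To convince yourself that the dot product is "multiply element-wise, then add", you can compare it with the manual calculation. The @ operator used below is another way to write the same product; this is just a sketch for checking.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([3, 4, 5, 6])

print(a.dot(b))         # 50
print(a @ b)            # 50, the @ operator gives the same result
print(np.sum(a * b))    # 50, multiply element-wise, then add
```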

We can use this dot process in multi-dimensional arrays. Let’s make two multi-dimensional arrays for that,

input:

c = np.array([[3, 5, 1], [6, 4, 9]])
c

output:

array([[3, 5, 1],
       [6, 4, 9]])

input:

d = np.array([[5,2], [7,9], [4, 3]])
d

output:

array([[5, 2],
       [7, 9],
       [4, 3]])

We are ready for a ‘dot’ operation on a multi-dimensional array,

input:

c.dot(d)

output:

array([[54, 54],
       [94, 75]])

When the inputs are 2D arrays, the ‘dot’ function behaves like matrix multiplication.

That means you can only perform ‘dot’ operation when the number of columns of the first array matches the number of rows in the second array.

If the first array is m x n, the second array should be n x p.

Matrix multiplication has another expression,

input:

np.matmul(c, d)

output:

array([[54, 54],
       [94, 75]])

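‘np.matmul’ also accepts one-dimensional arrays, but unlike ‘dot’ it does not accept scalars.

Remember, this multiplication rule does not apply to element-wise operations such as addition, subtraction, or division. To add, subtract, or divide one matrix by another, the arrays need to have the same shape, or shapes that Numpy can broadcast together (for example, a 2 x 3 array and a one-dimensional array of length 3).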
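
Arrays of the same shape always work for element-wise operations. Numpy also accepts some combinations of different shapes, which it calls broadcasting. Here is a small sketch of one case that works and one that does not; the names m, row, summed, and failed are mine.

```python
import numpy as np

m = np.array([[3, 5, 1], [6, 4, 9]])   # shape (2, 3)
row = np.array([10, 20, 30])           # shape (3,)

# The shapes are compatible, so 'row' is added to every row of 'm'
summed = m + row
print(summed.tolist())   # [[13, 25, 31], [16, 24, 39]]

# Shapes (2, 3) and (3, 2) are not compatible for element-wise addition
try:
    m + m.T
    failed = False
except ValueError:
    failed = True
print(failed)            # True
```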

Statistics

Numpy has basic statistical operations as well. Here are some examples.

Make a new array first.

input:

x = np.array([1,3,4,6,-3,-2])
x.sum()

output:

9

input:

x.max()

output:

6

input:

x.min()

output:

-3

input:

x.mean()

output:

1.5

input:

x.std()  #standard deviation

output:

3.2015621187164243

There are two other very useful functions which are not exactly statistical,

input:

x.argmin()

output:

4

input:

x.argmax()

output:

3

What are ‘argmin()’ and ‘argmax()’?

‘argmin()’ gives you the index of the minimum element of the array and ‘argmax()’ returns the index of the maximum value of the array.

The minimum element of the array ‘x’ is -3 and the maximum element of array ‘x’ is 6. Now check if their index matches the result.
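
One way to do that check is to use the returned index to look up the element. A short sketch (i_min and i_max are just names I picked):

```python
import numpy as np

x = np.array([1, 3, 4, 6, -3, -2])

i_min = x.argmin()
i_max = x.argmax()

print(i_min, x[i_min])   # 4 -3
print(i_max, x[i_max])   # 3 6
```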

Initializing Different Types of Arrays

There are so many different ways in Numpy to initialize an array. Here I am going to discuss some commonly used ways:

input:

np.arange(10)

output:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

This is the way to initialize a sequence of numbers. Notice it starts from zero and ends at 9. The upper limit is always excluded. Here the upper limit is 10, so it stops at 9.

We also can add a mathematical operation in it:

input:

np.arange(10)**2

output:

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81], dtype=int32)

In this case, we squared the output of arange(10), so we got the squares of 0 to 9 in the output array.

We can make an array of a sequence of numbers with a certain interval.

input:

np.arange(0, 15, 3)

output:

array([ 0,  3,  6,  9, 12])

Here, 0 is the lower limit, 15 is the upper limit and 3 is the interval.

There is another method that gives a sequence a bit differently:

input:

np.linspace(0, 3, 15)

output:

array([0.        , 0.21428571, 0.42857143, 0.64285714, 0.85714286,
       1.07142857, 1.28571429, 1.5       , 1.71428571, 1.92857143,
       2.14285714, 2.35714286, 2.57142857, 2.78571429, 3.        ])

Here, 0 is the lower limit, 3 is the upper limit and 15 is the number of elements. In this case, Numpy automatically generates 15 elements equally spaced from 0 to 3.
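
The difference between the two is easy to see side by side: arange takes a step and leaves out the upper limit, while linspace takes a count and includes the upper limit. A small sketch with numbers I chose for illustration:

```python
import numpy as np

by_step = np.arange(0, 3, 0.5)     # step of 0.5, the upper limit 3 is excluded
by_count = np.linspace(0, 3, 7)    # 7 points, the upper limit 3 is included

print(len(by_step), by_step[-1])     # 6 2.5
print(len(by_count), by_count[-1])   # 7 3.0
```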

There are a few other types of arrays:

input:

np.ones((3, 4))

output:

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

input:

np.zeros((2, 3))

output:

array([[0., 0., 0.],
       [0., 0., 0.]])

You can initialize a three-dimensional array of ones:

input:

np.ones((4,3,2), dtype='int32')

output:

array([[[1, 1],
        [1, 1],
        [1, 1]],

       [[1, 1],
        [1, 1],
        [1, 1]],

       [[1, 1],
        [1, 1],
        [1, 1]],

       [[1, 1],
        [1, 1],
        [1, 1]]])

Here, (4,3,2) means 4 two-dimensional arrays, each with 3 rows and 2 columns.

input:

np.full((2,2), 30)

output:

array([[30, 30],
       [30, 30]])

There is another method called full_like. It makes a new array with the same shape and data type as an existing array and fills it with the value you give:

input:

ar = np.array([[2,3], [4,5]])
ar

output:

array([[2, 3],
       [4, 5]])

input:

np.full_like(ar, 4)

output:

array([[4, 4],
       [4, 4]])
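
It is worth checking what full_like does to the array you pass in. In this sketch (the name filled is mine), the result copies the shape and type of ‘ar’, and ‘ar’ itself stays as it was:

```python
import numpy as np

ar = np.array([[2, 3], [4, 5]])

filled = np.full_like(ar, 4)

print(filled.shape == ar.shape)   # True
print(filled.dtype == ar.dtype)   # True
print(ar.tolist())                # [[2, 3], [4, 5]], the original is unchanged
```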

There is another type of matrix called the identity matrix:

input:

np.identity(5)

output:

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

It’s a five by five matrix where the diagonal elements are ones and all the other elements are zeros.

There is a similar function called ‘eye’. It takes the number of rows and the number of columns, so the matrix does not have to be square:

input:

np.eye(3,3)

output:

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

input:

np.eye(3,4)

output:

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.]])

The diagonal elements can be values other than one.

input:

a = np.array([2,4,5])
np.diag(a)

output:

array([[2, 0, 0],
       [0, 4, 0],
       [0, 0, 5]])
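
As a side note, np.diag works in the other direction too. When it receives a two-dimensional array, it returns the diagonal. A tiny sketch (the name m is mine):

```python
import numpy as np

m = np.diag([2, 4, 5])    # build a 3 x 3 matrix from the diagonal values
print(m.shape)            # (3, 3)
print(np.diag(m))         # [2 4 5], read the diagonal back
```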

Rearranging or Reorganizing Arrays

There are different ways to rearrange or reshape an array.

Let’s learn it by example. First make an array,

input:

x = np.arange(0, 45, 3)
x

output:

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42])

I explained the ‘arange’ function in the previous section. Let’s see how we can reshape it.

input:

x.reshape(3, 5)

output:

array([[ 0,  3,  6,  9, 12],
       [15, 18, 21, 24, 27],
       [30, 33, 36, 39, 42]])

We passed (3,5). So, we get a two-dimensional array that has 3 rows and 5 columns. Note that reshape returns a new array and leaves ‘x’ itself unchanged. To change the shape of ‘x’ in place, we could use:

x.resize(3,5)

What if we want to get back to the original one-dimensional array?

Here is the way,

input:

x.ravel()

output:

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42])

Look, we got back the original array!

Notice another thing: we changed the number of dimensions of the array. Array ‘x’ was a one-dimensional array. We made it a two-dimensional array by reshaping it.
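
The practical difference between reshape and resize is easy to miss, so here is a sketch that shows it. I use two separate arrays, x and y (y is my addition), so the two methods do not interfere with each other.

```python
import numpy as np

x = np.arange(0, 45, 3)
reshaped = x.reshape(3, 5)   # returns a new array object
print(x.shape)               # (15,), x itself is unchanged
print(reshaped.shape)        # (3, 5)

y = np.arange(0, 45, 3)
result = y.resize(3, 5)      # changes y in place
print(result)                # None, resize returns nothing
print(y.shape)               # (3, 5)
```

So with reshape you usually assign the result to a variable, and with resize you keep using the same variable.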

Now, make another array to understand it better. Here is another example.

input:

c = np.array([4,5,6])
c

output:

array([4, 5, 6])

This time I will use resize, just to practice with it. Reshape would give the same shape, but resize changes the array in place and returns nothing, so we display ‘c’ on a separate line.

input:

c.resize(3,1)
c

output:

array([[4],
       [5],
       [6]])

Look, we provided (3,1) as the parameters for resizing. So it made 3 rows and 1 column. It’s a 3×1 matrix. We can have a 1×3 matrix as well.

input:

c.resize(1,3)
c

output:

array([[4, 5, 6]])

Look, originally c was a one-dimensional array. Now it’s a two-dimensional array or matrix.

Do not think that you can reshape or resize only a one-dimensional array. You can do so with a higher-dimensional array as well.

Here I have some examples:

input:

x = np.array([[1,2,3,4], [5,6,7,8]])
x

output:

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

Now reshape this two-dimensional array,

input:

x.reshape(4,2)

output:

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

You can achieve the same shape using ‘resize’, as I mentioned before. There is another way,

input:

y = x.reshape(4, -1)
y

output:

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

Looks confusing? Imagine you have a huge array or dataset and, before reshaping, you know only one of the dimensions you want. In the reshape method, you pass that one dimension and use -1 for the other. That way Numpy itself will figure out the other dimension.

In the example above, I passed the first dimension that is 4. That means I am telling Numpy to make 4 rows. And I do not know the number of columns. So I just passed -1 there. So, it automatically knows to make 2 columns.

This is a very useful trick when we work with a big dataset or dataframe and we have to build machine learning algorithms.
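
Here is a short sketch of the -1 trick with a made-up array of 12 values. The last line, reshape(-1, 1), turns a one-dimensional array into a single column.

```python
import numpy as np

values = np.arange(12)

print(values.reshape(4, -1).shape)    # (4, 3), Numpy worked out 3 columns
print(values.reshape(-1, 6).shape)    # (2, 6), Numpy worked out 2 rows
print(values.reshape(-1, 1).shape)    # (12, 1), a single column
```

The known dimension has to divide the number of elements evenly; otherwise reshape raises an error.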

In all the examples above we saw how to reshape and change the dimensions.

There is one more way to alter the dimensions. The array ‘y’ above is a 4×2 matrix. Let’s turn it into a 2×4 matrix.

input:

y.T

output:

array([[1, 3, 5, 7],
       [2, 4, 6, 8]])

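This is called the transpose. Transposing an array or matrix swaps its rows and columns: a 2×3 matrix becomes 3×2, a 3×6 matrix becomes 6×3, and a 1×3 matrix becomes 3×1. Notice that the result is not the same as the original ‘x’, because the first row of ‘y.T’ is the first column of ‘y’.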
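
Transpose and reshape can give the same shape but arrange the elements differently. This sketch compares the two on the 4×2 array from above:

```python
import numpy as np

y = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])   # 4 x 2

print(y.T.shape)                  # (2, 4)
print(y.T.tolist())               # [[1, 3, 5, 7], [2, 4, 6, 8]]
print(y.reshape(2, 4).tolist())   # [[1, 2, 3, 4], [5, 6, 7, 8]]
```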

Indexing or Slicing

Indexing and slicing are very common, everyday tasks. Let’s work on a few examples:

input:

a = np.array([2,5,1,7,6,3,9,0,4])

input:

a[0]

output:

2

a[0] gives the first element of the array. In the same way, we can keep going with a[1], a[2], and so on up to the last element.

input:

a[3]

output:

7

We can cut a slice as well,

input:

a[1:5]

output:

array([5, 1, 7, 6])

This is the explanation. We passed [1:5]. So, the slice starts at index 1 and ends before index 5. Remember, the lower bound is included and the upper bound is excluded.
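
A quick way to remember the rule is that the slice has (upper bound minus lower bound) elements. A small sketch with the same array (the name piece is mine):

```python
import numpy as np

a = np.array([2, 5, 1, 7, 6, 3, 9, 0, 4])

piece = a[1:5]
print(piece)         # [5 1 7 6]
print(len(piece))    # 4, which is 5 - 1
print(a[5])          # 3, the element at the upper bound is not in the slice
```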

I am not going any deeper into slicing and indexing in this article, because I already wrote another article where I explained it in detail. Please check it out. It is important to learn it well.

Adding Rows or Columns

Numpy has a few different methods to add rows or columns. I will start with some stacking techniques. Here are some examples.

This time I will work with some lists instead of arrays. Numpy will automatically turn them into arrays while stacking.

Here are two lists:

x1 = [[2, 4, 3, 7], [2, 5, 3, 1]]
x2 = [1, 0, 9, 5]

Now stack them vertically.

input:

np.vstack([x1, x2])

output:

array([[2, 4, 3, 7],
       [2, 5, 3, 1],
       [1, 0, 9, 5]])

You can stack them as many times as you want.

input:

np.vstack([x1, x2, x2])

output:

array([[2, 4, 3, 7],
       [2, 5, 3, 1],
       [1, 0, 9, 5],
       [1, 0, 9, 5]])

Let’s do some stacking horizontally as well. We need arrays with the same number of rows.

‘x1’ has 2 rows. Make an array out of it.

input:

np.array(x1)

output:

array([[2, 4, 3, 7],
       [2, 5, 3, 1]])

Make another array ‘x3’.

input:

x3 = np.ones((2,3))
x3

output:

array([[1., 1., 1.],
       [1., 1., 1.]])

Time for horizontal stacking.

input:

np.hstack([x1, x3])

output:

array([[2., 4., 3., 7., 1., 1., 1.],
       [2., 5., 3., 1., 1., 1., 1.]])

Concatenate

Concatenation is another way of adding columns or rows. But as opposed to stacking, this time the two arrays need to have the same number of dimensions. Remember, when we did vertical stacking we had a two-dimensional and a one-dimensional list.

Here are my two lists for this example.

x1 = [[2, 4, 3, 7], [2, 5, 3, 1]]
x2 = [[1, 0, 9, 5]]

Concatenation operation,

input:

np.concatenate((x1, x2), axis=0)

output:

array([[2, 4, 3, 7],
       [2, 5, 3, 1],
       [1, 0, 9, 5]])

Now, concatenate horizontally. But we need two arrays with the same number of rows.

x3 = [[2,4], [7,5]]

Concatenate x1 and x3.

input:

np.concatenate((x1, x3), axis=1)

output:

array([[2, 4, 3, 7, 2, 4],
       [2, 5, 3, 1, 7, 5]])
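
To see the difference between stacking and concatenating, try both with a two-dimensional list and a one-dimensional list. This is a sketch; the names stacked, raised, and joined are mine.

```python
import numpy as np

x1 = [[2, 4, 3, 7], [2, 5, 3, 1]]   # two-dimensional
x2 = [1, 0, 9, 5]                   # one-dimensional

stacked = np.vstack([x1, x2])       # works, x2 is treated as one row
print(stacked.shape)                # (3, 4)

try:
    np.concatenate((x1, x2), axis=0)
    raised = False
except ValueError:
    raised = True
print(raised)                       # True, the dimensions do not match

# Wrapping x2 in another list makes it two-dimensional
joined = np.concatenate((x1, [x2]), axis=0)
print(joined.shape)                 # (3, 4)
```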

Append, Insert, Delete, and Sort

You probably know by name what these operations are about.

Append

input:

np.append([2,3], [[4,5], [1, 3]])

output:

array([2, 3, 4, 5, 1, 3])

input:

np.append([2, 3, 1], [[4, 5], [1,3]])

output:

array([2, 3, 1, 4, 5, 1, 3])

We did not mention any axis in those examples. So, by default, both inputs were flattened and the result is a one-dimensional array. Now, do an append operation in the vertical direction, along axis 0.

input:

np.append([[1,3,5], [4,3,6]], [[1,2,3]], axis=0)

output:

array([[1, 3, 5],
       [4, 3, 6],
       [1, 2, 3]])
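
The effect of the axis argument in append is clearest when the same inputs are used with and without it. A small sketch (the names first, second, flat, and rows are mine):

```python
import numpy as np

first = [[1, 3, 5], [4, 3, 6]]
second = [[1, 2, 3]]

flat = np.append(first, second)            # no axis: everything is flattened
rows = np.append(first, second, axis=0)    # axis 0: a new row is added

print(flat)          # [1 3 5 4 3 6 1 2 3]
print(flat.shape)    # (9,)
print(rows.shape)    # (3, 3)
```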

Insert

This time we will insert an element in a certain position. Start with a new array.

input:

a = np.array([[2, 2], [3, 4], [5, 6]])
a

output:

array([[2, 2],
       [3, 4],
       [5, 6]])

Insert element 5 at the beginning of the array.

input:

np.insert(a, 0, 5)

output:

array([5, 2, 2, 3, 4, 5, 6])

First, understand the input. In (a, 0, 5), a is the array, 0 is the position where we wanted the element to be inserted and 5 is the element that was to be inserted.

Notice how the insertion happened. First, the two-dimensional array ‘a’ got flattened into a one-dimensional array. Then 5 was added at index 0.

We can insert along the axis as well.

input:

np.insert(a, 0, 5, axis=1)

output:

array([[5, 2, 2],
       [5, 3, 4],
       [5, 5, 6]])

Look, a column of 5s was added in the result. The array ‘a’ itself is not changed, because insert returns a new array. We can add a row of 5s as well.

input:

np.insert(a, 0, 5, axis=0)

output:

array([[5, 5],
       [2, 2],
       [3, 4],
       [5, 6]])

Delete

I will make a new array as before.

input:

a = np.array([[1,3,2,6], [4,1,6,7], [9, 10, 6, 3]])
a

output:

array([[ 1,  3,  2,  6],
       [ 4,  1,  6,  7],
       [ 9, 10,  6,  3]])

input:

np.delete(a, [1, 2, 5])

output:

array([ 1,  6,  4,  6,  7,  9, 10,  6,  3])

Like the insert operation, the delete operation also flattens the array when no axis is given. In the input, [1,2,5] is the list of indexes that we wanted to delete. To see it clearly, let’s flatten the original array ‘a’.

input:

a.flatten()

output:

array([ 1,  3,  2,  6,  4,  1,  6,  7,  9, 10,  6,  3])

Now check, the elements of index 1, 2, and 5 were deleted.

Like insertion, we can delete a specific row or column.

Delete the column index 1.

input:

np.delete(a, 1, 1)

output:

array([[1, 2, 6],
       [4, 6, 7],
       [9, 6, 3]])

In the input (a, 1, 1), a is the array, 1 is the column’s index that we want to delete, and the last 1 is the axis. Passing 0 as the axis deletes the row at index 1 instead.

input:

np.delete(a, 1, 0)

output:

array([[ 1,  3,  2,  6],
       [ 9, 10,  6,  3]])

Sort

Here is the array ‘a’:

array([[ 1,  3,  2,  6],
       [ 4,  1,  6,  7],
       [ 9, 10,  6,  3]])

input:

np.sort(a)

output:

array([[ 1,  2,  3,  6],
       [ 1,  4,  6,  7],
       [ 3,  6,  9, 10]])

Look, each row was sorted. By default, np.sort sorts along the last axis, which is axis 1 for a two-dimensional array. We can specify the axis to sort along a different one.

input:

np.sort(a, axis=None)

output:

array([ 1,  1,  2,  3,  3,  4,  6,  6,  6,  7,  9, 10])

When the axis is None, it flattens the array and then sorts it. Now, sort along axis 0 and axis 1.

input:

np.sort(a, axis=0)

output:

array([[ 1,  1,  2,  3],
       [ 4,  3,  6,  6],
       [ 9, 10,  6,  7]])

input:

np.sort(a, axis=1)

output:

array([[ 1,  2,  3,  6],
       [ 1,  4,  6,  7],
       [ 3,  6,  9, 10]])
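
The output for axis=1 is the same as the output of np.sort(a) with no axis. This sketch checks that, and also shows that np.sort leaves the original array alone (the name default_sorted is mine):

```python
import numpy as np

a = np.array([[1, 3, 2, 6], [4, 1, 6, 7], [9, 10, 6, 3]])

default_sorted = np.sort(a)

print(np.array_equal(default_sorted, np.sort(a, axis=1)))    # True
print(np.array_equal(default_sorted, np.sort(a, axis=-1)))   # True
print(a[0])   # [1 3 2 6], np.sort returns a sorted copy, 'a' is unchanged
```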

Flip

It does exactly what it sounds like. It flips the order of the rows or the columns.

Here is an array.

input:

arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
arr

output:

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

Now, flip this array in the direction of axis 0 and 1.

input:

np.flip(arr, 0)

output:

array([[ 9, 10, 11, 12],
       [ 5,  6,  7,  8],
       [ 1,  2,  3,  4]])

input:

np.flip(arr, 1)

output:

array([[ 4,  3,  2,  1],
       [ 8,  7,  6,  5],
       [12, 11, 10,  9]])

Random

Numpy has excellent functionalities for random number generation. They are very useful in machine learning, research, or statistics. Here are some examples.

input:

np.random.rand()

output:

0.541670003513435

It generates a number between 0 and 1 (0 is included, 1 is not). We can get an array or matrix of random numbers like this.

input:

np.random.rand(3)

output:

array([0.6432591 , 0.78715203, 0.81071309])

input:

np.random.rand(2, 3)

output:

array([[0.91757316, 0.74438045, 0.85259742],
       [0.19826903, 0.84990728, 0.48328816]])

It does not have to be the numbers from 0 to 1. We can generate random integers.

input:

np.random.randint(25)

output:

20

It generated a random integer from 0 up to, but not including, 25. We can specify how many numbers we want to generate.

input:

np.random.randint(1, 100, 10)

output:

array([96, 44, 90, 13, 47, 16,  9, 46, 49, 20])

Here, we are asking Numpy to generate 10 integers from 1 up to, but not including, 100.

Now, generate a 3×3 matrix of integers in the same range.

input:

np.random.randint(1, 100, (3,3))

output:

array([[25, 80, 42],
       [95, 82, 66],
       [64, 95, 55]])
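
With randint, the lower limit can come up but the upper limit cannot. Here is a sketch that draws many numbers and checks the smallest and largest values. The seed value 0 and the sample count of 10000 are arbitrary choices of mine; the seed only makes the run repeatable.

```python
import numpy as np

np.random.seed(0)   # arbitrary seed, only to make the run repeatable
samples = np.random.randint(1, 100, 10000)

print(samples.min() >= 1)    # True
print(samples.max() <= 99)   # True, 100 is never generated
```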

Instead of a range, you can provide an array and ask Numpy to make a 3×3 matrix using the numbers from the array you provided.

input:

np.random.choice([1,2,3,4,5,6,7,8,9,10], size=(3,3))

output:

array([[ 7,  9,  2],
       [ 6,  4,  6],
       [ 3, 10,  6]])

Another useful functionality is ‘shuffle’. Let’s make a new array and shuffle it.

input:

a = np.array([3,6,3,1,0, 11])
np.random.shuffle(a)
a

output:

array([ 3,  0,  6,  3, 11,  1])

Look, we have the same elements, just rearranged after shuffling.
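
Keep in mind that shuffle changes the array itself and returns nothing. If you want to keep the original order as well, np.random.permutation gives back a shuffled copy instead. A short sketch (the names shuffled_copy and result are mine):

```python
import numpy as np

a = np.array([3, 6, 3, 1, 0, 11])

shuffled_copy = np.random.permutation(a)   # returns a new, shuffled array
print(a.tolist())                          # [3, 6, 3, 1, 0, 11], unchanged

result = np.random.shuffle(a)              # shuffles 'a' itself
print(result)                              # None
print(sorted(a.tolist()))                  # [0, 1, 3, 3, 6, 11], same elements
```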

Save, Load, and Import File

We can save the array ‘arr’ in a file.

input:

np.save('arrfile', arr)

Here, we are making a file named ‘arrfile’ to save the array ‘arr’. The file will be saved with an extension of ‘.npy’.

We can load that file to bring the array back for further use like this,

input:

np.load('arrfile.npy')

output:

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
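
Here is the whole save and load round trip as one sketch. I write into a temporary folder (my choice, so that no file is left behind) instead of the notebook folder.

```python
import os
import tempfile

import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'arrfile')   # no extension given
    np.save(path, arr)                       # Numpy adds the .npy extension
    saved_files = os.listdir(folder)
    loaded = np.load(path + '.npy')

print(saved_files)                   # ['arrfile.npy']
print(np.array_equal(arr, loaded))   # True
```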

We can import a CSV file or text file as an array using Numpy. I have a file named ‘Cartwheeldata.csv’ in the same folder as the Jupyter Notebook where I worked on these examples. Now, import that file here.

input:

filedata = np.genfromtxt('Cartwheeldata.csv', delimiter=',')
filedata=filedata.astype('int32')
filedata

output:

I am showing only a part of the array here because the file is big. So, here is the information on that file.

These types of arrays are very useful in machine learning.
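
If you do not have a CSV file at hand, you can try genfromtxt on text held in memory. The data below is made up and is not the content of ‘Cartwheeldata.csv’. The sketch also shows that a header row is read as nan unless you skip it with skip_header.

```python
from io import StringIO

import numpy as np

# Made-up data standing in for a real CSV file
text = "id,age,height\n1,56,62.0\n2,26,62.0\n3,33,66.0\n"

raw = np.genfromtxt(StringIO(text), delimiter=',')
print(raw.shape)                 # (4, 3), the header counts as a row
print(np.isnan(raw[0]).all())    # True, the header text could not be parsed

data = np.genfromtxt(StringIO(text), delimiter=',', skip_header=1)
print(data.shape)                # (3, 3)
print(data.astype('int32')[0])   # [ 1 56 62]
```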

Conclusion

These are all the Numpy functionalities I wanted to share in this article. Numpy is a big library with a lot of methods available, but these should be good enough for everyday use. If you think I missed any topic here or that more functionalities should be added, please let me know. I will add them.




