
NumPy Explained: The Heart of Data Analysis in Python

Introduction

Welcome to the world of NumPy, the backbone of numerical computing in Python. Whether you’re an aspiring data scientist, a machine learning enthusiast, or a seasoned programmer, NumPy is a library you’ll encounter time and time again. In this blog, we will embark on a journey to explore the versatility and efficiency that NumPy brings to the realm of data manipulation.

What is NumPy

NumPy, short for Numerical Python, is the fundamental library for scientific computing in Python. It was created by Travis Oliphant in 2005. At its core is the ndarray object: an array that can have any number of dimensions and whose elements all share a single data type.
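To make the "single data type" point concrete, here is a minimal sketch. The variable names are illustrative only, and the exact integer width NumPy picks depends on the platform, so the example checks only the kind of dtype.

```python
import numpy as np

# An all-integer input keeps an integer dtype.
ints = np.array([1, 2, 3])
print(ints.dtype.kind)   # i  (signed integer; the exact width varies by platform)

# Mixing integers and a float upcasts every element to float.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)       # float64
print(mixed.tolist())    # [1.0, 2.5, 3.0]
```

Because every element has the same type, NumPy can store the values side by side in memory instead of as separate Python objects.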

Why NumPy

NumPy is a library that provides support for large, multi-dimensional arrays and matrices, along with a vast collection of mathematical functions to operate on these arrays. It’s the backbone of many scientific and data analysis libraries in Python, including SciPy, pandas, and scikit-learn. Here are some compelling reasons to use it:

  1. Speed: NumPy operations are considerably faster than traditional Python loops because NumPy arrays are homogeneous and operations are executed in compiled C code, reducing interpreter overhead.
  2. Memory Efficiency: NumPy stores array elements in contiguous blocks of memory, without the per-object overhead of a Python list, which makes it well suited to large datasets.
  3. Broadcasting: NumPy allows for operations on arrays of different shapes and sizes through a mechanism called broadcasting. This feature simplifies complex operations and enhances code readability.
  4. Comprehensive Mathematical Functions: NumPy comes bundled with a wide range of mathematical functions for tasks such as statistics, linear algebra, and Fourier analysis.
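The points above can be sketched in a few lines. This is only an illustration with made-up names and values (`prices`, `grid`, `offsets`), not a benchmark.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])

# Vectorized arithmetic: one expression replaces an explicit Python loop.
with_tax = prices * 1.1
loop_result = [p * 1.1 for p in prices.tolist()]
print(np.allclose(with_tax, loop_result))  # True

# Broadcasting: a (2, 3) array plus a (3,) array.
# The 1-D array is applied to every row of the 2-D array.
grid = np.array([[1, 2, 3], [4, 5, 6]])
offsets = np.array([10, 20, 30])
shifted = grid + offsets
print(shifted.tolist())  # [[11, 22, 33], [14, 25, 36]]

# Built-in mathematical functions, here a simple mean.
print(prices.mean())  # 20.0
```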

Getting Started with NumPy

  • Importing NumPy

NumPy can be imported under its full name, numpy, but by convention it is aliased as np. The alias is not mandatory, but it is short and used almost everywhere.

python
import numpy as np

  • Creating NumPy Arrays

NumPy provides several simple ways to create arrays. The most fundamental are np.array(), np.zeros(), np.ones(), and np.arange(). Let’s illustrate their usage.

python
arr = np.array([1, 2, 3, 4, 5])

print(arr)
# Output: [1 2 3 4 5]

Yay, you have created your first NumPy array. This is a one-dimensional array. But you can create two, three, or n-dimensional arrays using NumPy.

python
zero_d_arr = np.array(4)  # 0-d array

one_d_arr = np.array([1, 2, 3, 4, 5])  # 1-d array

two_d_arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])  # 2-d array

three_d_arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])  # 3-d array

Let’s check other methods of creating an array:

np.zeros() and np.ones(): These functions are particularly useful for initializing arrays of a specific shape filled with zeros or ones. They take the shape of the array you want: a single integer (or a one-element list) for a 1-D array, or a tuple such as (2, 3) for more dimensions. The values are floats by default, which is why the output shows a trailing dot.

python
# Create an array of zeros with 3 elements
arr_zero = np.zeros([3])
print(arr_zero)  # Output: [0. 0. 0.]

# Create an array of ones with 4 elements
arr_one = np.ones([4])
print(arr_one)  # Output: [1. 1. 1. 1.]
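Both functions also accept a multi-dimensional shape and an optional dtype argument. A small sketch, with illustrative variable names:

```python
import numpy as np

# A 2-D array of zeros: 2 rows, 3 columns (floats by default).
zeros_2d = np.zeros((2, 3))
print(zeros_2d.shape)  # (2, 3)
print(zeros_2d.dtype)  # float64

# Ask for integers instead of the default floats.
ones_int = np.ones(4, dtype=int)
print(ones_int)        # [1 1 1 1]
```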

np.arange(): When we need to generate a sequence of numbers, np.arange() is our best tool. It gives precise control over the start, stop, and step size of the sequence. Note that the stop value itself is excluded, so to include 10 we pass 11 as the stop.

python
# Create an array from 0 to 10 (inclusive)
ages = np.arange(0, 11)
print(ages)
# Output: [ 0  1  2  3  4  5  6  7  8  9 10]

# Create an array from 0 to 20 (inclusive) with step of 2
counts = np.arange(0, 21, 2)
print(counts)
# Output: [ 0  2  4  6  8 10 12 14 16 18 20]
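Two more variations are worth a quick look: calling np.arange() with only a stop value, and counting down with a negative step. The names below are illustrative.

```python
import numpy as np

# With one argument, the sequence starts at 0 and stops before the given value.
first_five = np.arange(5)
print(first_five)  # [0 1 2 3 4]

# A negative step counts down; the stop value (0) is still excluded.
countdown = np.arange(5, 0, -1)
print(countdown)   # [5 4 3 2 1]
```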

  • Checking the Shape of the Array

We have seen how to create an array, but since NumPy can create n-dimensional arrays, is there any way to find the shape or dimension of the array? Yes, there is a way. You can count the square brackets from the start and you will get the answer. Just kidding! The function np.shape(array) returns the size of the array along each dimension, and the attribute array.ndim returns the number of dimensions. Let’s check the dimensions of the arrays created above.

python
print(zero_d_arr.ndim)
# Output: 0

print(one_d_arr.ndim)
# Output: 1

print(two_d_arr.ndim)
# Output: 2

print(three_d_arr.ndim)
# Output: 3

print(np.shape(zero_d_arr))
# Output: ()

print(np.shape(one_d_arr))
# Output: (5,)

print(np.shape(two_d_arr))
# Output: (2, 5)

print(np.shape(three_d_arr))
# Output: (2, 2, 3)

This shows how the outputs of np.shape() and the ndim attribute differ for each of the arrays.
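The same information is also available as attributes on the array itself. A short sketch using an illustrative array named `matrix`:

```python
import numpy as np

matrix = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

# array.shape gives the same tuple as np.shape(array).
print(matrix.shape)                      # (2, 5)
print(matrix.shape == np.shape(matrix))  # True

# ndim is simply the length of the shape tuple.
print(matrix.ndim == len(matrix.shape))  # True

# size is the total number of elements (2 * 5).
print(matrix.size)                       # 10
```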

Array Operations

  • Array Indexing

Array indexing simply means accessing the items in the array. You access an item by its index number, and remember that Python indexing starts from 0, not 1. Let’s take the arrays created above and do some indexing on them.

python
print(one_d_arr[1])
# The output will be 2

This is how we access the items in a one-dimensional array, but it is a little different when working with 2-D, 3-D, 4-D, or higher-dimensional arrays: we pass one index per dimension, separated by commas. Let’s try this on the other arrays we have.

python
print(two_d_arr[1, 3])
# Output is 9

print(three_d_arr[0, 1, 2])
# Output is 6

And that is how we access items in two- and three-dimensional arrays. Congratulations!
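A few more indexing patterns, sketched on an illustrative array named `grid`: negative indices count from the end, and passing fewer indices than dimensions returns a sub-array.

```python
import numpy as np

grid = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

# Negative indices count from the end: last row, last column.
print(grid[-1, -1])  # 10

# A single index on a 2-D array returns the whole row.
print(grid[0])       # [1 2 3 4 5]

# grid[1][3] and grid[1, 3] select the same item.
print(grid[1][3] == grid[1, 3])  # True
```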

  • Array Slicing

Array slicing in NumPy is a versatile tool that enables you to extract and manipulate data efficiently, whether you’re working with 1D or multi-dimensional arrays. A slice is written [start:end], where the element at end is excluded. We can also add a step when we need alternate elements or elements at a fixed interval, like this: [start:end:step].

Let’s try some slicing on a one-dimensional array.

python
arr = np.array(['Ray', 'John', 'James', 'Mike', 'Tracy'])

# Slice elements from index 1 to 4 (exclusive)
basic_slice = arr[1:4]
print(basic_slice)
# Output: ['John' 'James' 'Mike']

# Slice elements after 2nd element
after_slice = arr[2:]
print(after_slice)
# Output: ['James' 'Mike' 'Tracy']

# Slice elements before 4th element
before_slice = arr[:3]
print(before_slice)
# Output: ['Ray' 'John' 'James']

# Slicing with step
step_slice = arr[1:5:2]
print(step_slice)
# Output: ['John' 'Mike']

# Negative Slicing
negative_slice = arr[-3:-1]
print(negative_slice)
# Output: ['James' 'Mike']

We did slicing on the one-dimensional array; now let’s try some on a two-dimensional array, because slicing arrays with more than one dimension works basically the same way. In two or more dimensions, we pass an index or a slice for each dimension, separated by commas:

python
arr_2d = np.array([['Ray', 'John', 'James', 'Mike', 'Tracy'],
                   ['Nate', 'Jamie', 'Kenny', 'Randall', 'Sonny']])

# arr_2d[index, start:end]
print(arr_2d[1, 1:4])
# Output: ['Jamie' 'Kenny' 'Randall']

arr_3d = np.array([[['Ray', 'John', 'James'],
                    ['Tracy', 'Nate', 'Jamie']],
                   [['Kenny', 'Randall', 'Sonny'],
                    ['Sam', 'Casie', 'Lina']]])

# arr_3d[index, index, start:end]
print(arr_3d[1, 1, 1:3])
# Output: ['Casie' 'Lina']

In this first part of our NumPy exploration, we’ve covered the fundamentals, from creating NumPy arrays to performing basic operations. You’ve just scratched the surface of what this powerful library can do. But hold on to your seats, because in Part 2, we’ll dive deeper into NumPy techniques and unveil some of its hidden gems.

Are you ready to take your data analysis skills to the next level? Join us in Part 2, where we’ll uncover the full potential of NumPy!

In the meantime, feel free to leave your thoughts, questions, or feedback in the comments below.

Stay tuned for Part 2, coming soon!
