NumPy: a guide for beginners. Part 1

Original author: scipy.org
  • Translation
NumPy is an extension to the Python language that adds support for large multidimensional arrays and matrices, along with a large library of high-level mathematical functions for operating on these arrays.

The first part of the tutorial covers the basics of working with NumPy: creating arrays, their attributes, basic operations, element-wise application of functions, indexing, slicing and iteration. It also covers ways of changing an array's shape, combining several arrays into one and, conversely, splitting one array into several smaller ones. At the end, we discuss shallow and deep copies.

The basics


If you have not installed NumPy yet, you can get it here. The examples use Python 2.6.

The main object of NumPy is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers.

By "multidimensional" we mean that an array can have several dimensions, or axes. Since the word "dimension" is ambiguous, we will mostly use the words "axis" and "axes" instead. The number of axes is called the rank.

For example, the coordinates of a point in three-dimensional space, [1, 2, 1], form an array of rank 1: it has a single axis, and the length of that axis is 3. As another example, the array

[[ 1., 0., 0.],
 [ 0., 1., 2.]]


is an array of rank 2 (that is, a two-dimensional array). The length of the first dimension (axis) is 2, and the length of the second axis is 3. See the NumPy Glossary for more information.

The multidimensional array class is called ndarray. Note that it is not the same as the standard Python library class array.array, which handles only one-dimensional arrays. The most important attributes of ndarray objects are:

ndarray.ndim - the number of axes (dimensions) of the array. As mentioned above, in the Python world the number of dimensions is often called the rank.

ndarray.shape - the dimensions of the array. This is a tuple of integers giving the length of the array along each axis. For a matrix with n rows and m columns, shape is (n, m). The number of elements in the shape tuple equals the rank of the array, that is, ndim.

ndarray.size - the total number of elements in the array. It equals the product of the elements of shape.

ndarray.dtype - an object describing the type of the array elements. A dtype can be specified using standard Python types. In addition, NumPy provides many types of its own, for example: bool_, character, int_, int8, int16, int32, int64, float_, float16, float32, float64, complex_, complex64, object_.

ndarray.itemsize - the size of each array element in bytes. For example, for an array of elements of type float64, itemsize is 8 (= 64/8), and for float32 it is 4 (= 32/8).

ndarray.data - the buffer containing the actual elements of the array. We will usually not need this attribute, because we access array elements through indexing.

Example


Define the following array:
>>> from numpy import *
>>> a = arange(10).reshape(2,5)
>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])


We have just created an array object named a. The array a has several attributes, or properties. In Python, an attribute of an individual object is written as name_of_object.attribute. In our case:
  • a.shape is (2, 5)
  • a.ndim is 2 (which equals the length of a.shape)
  • a.size is 10
  • a.dtype.name is int32 (on many 64-bit platforms it is int64)
  • a.itemsize is 4, which means that an int32 element takes up 4 bytes of memory.

You can check all of these attributes by simply typing them interactively:
>>> a.shape
(2, 5)
>>> a.dtype.name
'int32'

Etc.
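The relationships between these attributes can be checked directly. This sketch, like the other added examples, assumes Python 3 and a recent NumPy imported as np, rather than the Python 2.6 and "from numpy import *" style of the sessions in this tutorial. The element type is fixed to int32 so that the result does not depend on the platform.

```python
import numpy as np

# Same array as above, but with the element type fixed explicitly
a = np.arange(10, dtype=np.int32).reshape(2, 5)

# ndim is the number of axes, i.e. the length of the shape tuple
assert a.ndim == len(a.shape) == 2
# size is the product of the lengths along all axes
assert a.size == int(np.prod(a.shape)) == 10
# itemsize is the element width in bytes: 32 bits / 8 = 4 bytes
assert a.itemsize == a.dtype.itemsize == 4
# nbytes is the total memory taken by the elements
assert a.nbytes == a.size * a.itemsize == 40
print(a.shape, a.ndim, a.size, a.dtype.name, a.itemsize)
```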

Creating Arrays


There are many ways to create an array. For example, you can create an array from ordinary Python lists or tuples with the array() function:
>>> from numpy import *
>>> a = array( [2,3,4] )
>>> a
array([2, 3, 4])
>>> type(a)
<type 'numpy.ndarray'>


The array() function transforms nested sequences into multidimensional arrays. The type of the resulting array is deduced from the types of the elements in the original sequences.
>>> b = array( [ (1.5,2,3), (4,5,6) ] ) # this becomes an array of floats
>>> b
array([[ 1.5, 2. , 3. ],
       [ 4. , 5. , 6. ]])
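The type deduction described above can be seen by building arrays from different kinds of elements. A minimal sketch, assuming Python 3 and NumPy imported as np; the variable names are only illustrative:

```python
import numpy as np

ints = np.array([1, 2, 3])                    # only integers -> an integer dtype
floats = np.array([(1.5, 2, 3), (4, 5, 6)])   # one float -> the whole array is float64
cplx = np.array([1, 2.5, 3j])                 # one complex element -> complex128

print(ints.dtype, floats.dtype, cplx.dtype)
```

The most general element type wins, so a single float or complex value is enough to change the type of the whole array.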


Once we have an array, we can take a look at its attributes:
>>> b.ndim # number of axes
2
>>> b.shape # dimensions
(2, 3)
>>> b.dtype # type (8-byte float)
dtype('float64')
>>> b.itemsize # size of one element of this type, in bytes
8


The type of the array can be explicitly specified at the time of creation:
>>> c = array( [ [1,2], [3,4] ], dtype=complex )
>>> c
array([[ 1.+0.j,  2.+0.j],
       [ 3.+0.j,  4.+0.j]])


A common mistake is to call array() with several numeric arguments instead of a single argument that is a list of numbers:

>>> a = array(1,2,3,4) # WRONG
>>> a = array([1,2,3,4]) # RIGHT 
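The wrong form does not silently produce an array; NumPy rejects it. The sketch below (Python 3, NumPy imported as np) catches the error; which exception type is raised is an assumption based on recent NumPy versions, so both TypeError and ValueError are caught. The helper name make_array_wrong is hypothetical.

```python
import numpy as np

def make_array_wrong():
    # Several positional numbers: NumPy does not read them as the elements
    return np.array(1, 2, 3, 4)

try:
    make_array_wrong()
    raised = False
except (TypeError, ValueError):
    # Assumption: recent NumPy versions raise TypeError here
    raised = True

right = np.array([1, 2, 3, 4])   # a single sequence argument is the correct form
print(raised, right)
```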


The array() function is not the only way to create arrays. Often the elements of an array are not known initially, but the array that will hold them is already needed. For this reason, NumPy provides several functions that create arrays with some initial content. The default type of the created array is float64.

The zeros() function creates an array of zeros, and the ones() function creates an array of ones:
>>> zeros( (3,4) ) # the argument specifies the shape of the array
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
>>> ones( (2,3,4), dtype=int16 ) # dtype can also be specified
array([[[ 1, 1, 1, 1],
        [ 1, 1, 1, 1],
        [ 1, 1, 1, 1]],
       [[ 1, 1, 1, 1],
        [ 1, 1, 1, 1],
        [ 1, 1, 1, 1]]], dtype=int16)


The empty() function creates an array without initializing it. Its initial content is arbitrary and depends on the state of memory at the time the array is created (that is, on whatever garbage happens to be stored there):
>>> empty( (2,3) )
array([[ 3.73603959e-262, 6.02658058e-154, 6.55490914e-260],
       [ 5.30498948e-313, 3.14673309e-307, 1.00000000e+000]])
>>> empty( (2,3) ) # the content may differ on each call
array([[ 3.14678735e-307, 6.02658058e-154, 6.55490914e-260],
       [ 5.30498948e-313, 3.73603967e-262, 8.70018275e-313]])
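Because empty() leaves garbage in memory, it is only safe when every element is written before it is read. A minimal sketch of that pattern, assuming Python 3 and NumPy imported as np; it also checks the float64 default type mentioned above:

```python
import numpy as np

n = 5
# Allocate without initialization: the contents are garbage until written
squares = np.empty(n)
assert squares.dtype == np.float64   # default type, as with zeros() and ones()
for i in range(n):
    squares[i] = i * i               # every element is written before it is read

zeros = np.zeros((3, 4))
ones16 = np.ones((2, 3, 4), dtype=np.int16)
print(squares, zeros.dtype, ones16.dtype)
```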


To create sequences of numbers, NumPy provides the arange() function, which is similar to range() but returns arrays instead of lists:
>>> arange( 10, 30, 5 )
array([10, 15, 20, 25])
>>> arange(  0, 2,  0.3 )
array([  0. ,  0.3,  0.6,  0.9, 1.2, 1.5, 1.8])


When arange() is used with floating-point arguments, it is generally not possible to predict how many elements will be produced, because of the limited precision of floating-point numbers. In such cases it is usually better to use linspace(), which takes the desired number of elements as an argument instead of a step:
>>> linspace(  0, 2, 9 ) # 9 numbers from 0 to 2
array([  0. ,  0.25,  0.5 ,  0.75, 1. , 1.25, 1.5 , 1.75, 2. ])
>>> x = linspace(  0, 2*pi, 100 ) # useful for evaluating a function at many points
>>> f = sin(x)
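The precision problem with a float step is easy to trigger. For floating-point arguments, the NumPy documentation gives the length of the arange() result as ceil((stop - start)/step), and rounding in that division can add an extra element. A sketch assuming Python 3, NumPy imported as np, and IEEE-754 double precision:

```python
import numpy as np

# (1.3 - 1) / 0.1 evaluates to slightly more than 3, so ceil() gives 4 elements
# and the last one is about 1.3, even though the stop value should be excluded
a = np.arange(1, 1.3, 0.1)
print(len(a), a)

# linspace: the number of points is explicit and the stop value is included
b = np.linspace(1, 1.3, 4)
print(len(b), b)

# Typical use: sample a function at many points
x = np.linspace(0, 2 * np.pi, 100)
f = np.sin(x)
```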


Printing Arrays


When you print an array, NumPy displays it in a way similar to nested lists, but with the following layout:
  • the last axis is printed from left to right,
  • the second-to-last axis is printed from top to bottom,
  • the remaining axes are also printed from top to bottom, with each slice separated from the next by an empty line.

One-dimensional arrays are then printed as rows, two-dimensional arrays as matrices, and three-dimensional arrays as lists of matrices.
>>> a = arange(6) # 1d array
>>> print a
[0 1 2 3 4 5]
>>>
>>> b = arange(12).reshape(4,3) # 2d array
>>> print b
[[ 0 1 2]
 [ 3 4 5]
 [ 6 7 8]
 [ 9 10 11]]
>>>
>>> c = arange(24).reshape(2,3,4) # 3d array
>>> print c
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]


If an array is too large to be printed, NumPy automatically skips its central part and prints only the corners:
>>> print arange(10000)
[ 0 1 2 ..., 9997 9998 9999]
>>>
>>> print arange(10000).reshape(100,100)
[[ 0 1 2 ..., 97 98 99]
 [ 100 101 102 ..., 197 198 199]
 [ 200 201 202 ..., 297 298 299]
 ...,
 [9700 9701 9702 ..., 9797 9798 9799]
 [9800 9801 9802 ..., 9897 9898 9899]
 [9900 9901 9902 ..., 9997 9998 9999]]


To disable this behavior and force NumPy to print the entire array, change the printing options with set_printoptions():
>>> import sys
>>> set_printoptions(threshold=sys.maxsize)
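Changing the options globally affects all later output. As an alternative, a sketch assuming Python 3 and NumPy 1.15 or later (where np.printoptions is available as a context manager) raises the threshold only inside a with block; the default threshold is 1000 elements:

```python
import sys
import numpy as np

big = np.arange(2000)            # more elements than the default threshold

summary = str(big)               # summarized: the middle is replaced by '...'
# Options are changed only inside the with block and restored afterwards
with np.printoptions(threshold=sys.maxsize):
    full = str(big)              # every element is printed

print('...' in summary, '...' in full)
```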


Basic operations


Arithmetic operators on arrays apply element-wise. A new array is created and filled with the result.
>>> a = array( [20,30,40,50] )
>>> b = arange( 4 )
>>> c = a-b
>>> c
array([20, 29, 38, 47])
>>> b**2
array([ 0, 1, 4, 9])
>>> 10*sin(a)
array([ 9.12945251, -9.88031624, 7.4511316 , -2.62374854])
>>> a<35
array([True, True, False, False], dtype=bool)
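"Element-wise" means that the operator is applied to each pair of corresponding elements. The sketch below (Python 3, NumPy imported as np) compares the vectorized result with an explicit Python loop over the same pairs:

```python
import numpy as np

a = np.array([20, 30, 40, 50])
b = np.arange(4)

vectorized = a - b
# The same result computed one pair of elements at a time
looped = np.array([x - y for x, y in zip(a, b)])

assert np.array_equal(vectorized, looped)
print(vectorized, b ** 2, a < 35)
```

The vectorized form does the loop in compiled code, which is why it is both shorter and much faster on large arrays.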


Unlike in many matrix languages, the product operator * works element-wise on NumPy arrays. The matrix product can be computed with the dot() function or by creating matrix objects, which will be covered later (in the second part of this guide).
>>> A = array( [[1,1],
... [ 0,1]] )
>>> B = array( [[2, 0],
... [3,4]] )
>>> A*B # element-wise product
array([[2, 0],
       [0, 4]])
>>> dot(A,B) # matrix product
array([[5, 4],
       [3, 4]])
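To make the difference concrete, the sketch below spells out what dot() computes for 2D arrays with a triple loop. It assumes Python 3.5+ and NumPy 1.10+, where the @ operator is also available; matmul_naive is a hypothetical teaching helper, not a NumPy function.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

def matmul_naive(X, Y):
    """Textbook matrix product: C[i, j] = sum over k of X[i, k] * Y[k, j]."""
    n, m = X.shape
    m2, p = Y.shape
    assert m == m2, "inner dimensions must agree"
    C = np.zeros((n, p), dtype=np.result_type(X, Y))
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i, j] += X[i, k] * Y[k, j]
    return C

elementwise = A * B
product = np.dot(A, B)
assert np.array_equal(product, A @ B)              # @ is the matrix product operator
assert np.array_equal(product, matmul_naive(A, B))
print(elementwise)
print(product)
```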


Some operations, such as += and *=, act in place: they modify an existing array instead of creating a new one.
>>> a = ones((2,3), dtype=int)
>>> b = random.random((2,3))
>>> a *= 3
>>> a
array([[3, 3, 3],
       [3, 3, 3]])
>>> b += a
>>> b
array([[ 3.69092703, 3.8324276 , 3.0114541 ],
       [ 3.18679111, 3.3039349 , 3.37600289]])
>>> a += b # b is converted to int (recent NumPy versions raise a casting error here instead)
>>> a
array([[6, 6, 6],
       [6, 6, 6]])
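The sketch below, assuming Python 3 and NumPy 1.10 or later, checks that an in-place operation keeps the same array object, and shows that recent versions refuse to cast a float result into an int array unless asked to. A constant array replaces the random one so that the result is reproducible.

```python
import numpy as np

a = np.ones((2, 3), dtype=int)
b = np.full((2, 3), 3.5)          # deterministic stand-in for the random array

before = id(a)
a *= 3                            # in place: the same object is modified
assert id(a) == before

try:
    a += b                        # float64 -> int is refused by the default casting rule
    refused = False
except TypeError:
    refused = True

# The truncating behavior can still be requested explicitly through the ufunc
np.add(a, b, out=a, casting="unsafe")
print(refused, a)
```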


When operating on arrays of different types, the type of the resulting array corresponds to the more general or more precise of the types involved (a behavior known as upcasting).
>>> a = ones(3, dtype=int32)
>>> b = linspace( 0,pi,3)
>>> b.dtype.name
'float64'
>>> c = a+b
>>> c
array([ 1. , 2.57079633, 4.14159265])
>>> c.dtype.name
'float64'
>>> d = exp(c*1j)
>>> d
array([ 0.54030231+0.84147098j, -0.84147098+0.54030231j,
       -0.54030231-0.84147098j])
>>> d.dtype.name
'complex128'
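NumPy exposes its upcasting rules through np.promote_types, so the result type can be checked without building the arrays first. A sketch assuming Python 3 and NumPy imported as np:

```python
import numpy as np

a = np.ones(3, dtype=np.int32)
b = np.linspace(0, np.pi, 3)
c = a + b                  # int32 + float64 -> float64
d = np.exp(c * 1j)         # multiplying by 1j promotes to complex128

# promote_types reports the type NumPy picks for a pair of dtypes
float_pair = np.promote_types(np.int32, np.float64)
complex_pair = np.promote_types(np.float64, np.complex64)
print(c.dtype, d.dtype, float_pair, complex_pair)
```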


Many unary operations, such as computing the sum of all the elements of an array, are implemented as methods of the ndarray class.
>>> a = random.random((2,3))
>>> a
array([[  0.6903007 ,  0.39168346,  0.16524769],
       [  0.48819875,  0.77188505,  0.94792155]])
>>> a.sum()
3.4552372100521485
>>> a.min()
0.16524768654743593
>>> a.max()
0.9479215542670073


By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However, by specifying the axis parameter you can apply an operation along the specified axis of an array:
>>> b = arange(12).reshape(3,4)
>>> b
array([[  0, 1, 2, 3],
       [ 4, 5, 6, 7],
       [ 8, 9, 10, 11]])
>>>
>>> b.sum(axis=0) # sum of each column
array([12, 15, 18, 21])
>>>
>>> b.min(axis=1) # min of each row
array([ 0, 4, 8])
>>>
>>> b.cumsum(axis=1) # cumulative sum along each row
array([[  0, 1, 3, 6],
       [ 4, 9, 15, 22],
       [ 8, 17, 27, 38]])
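A way to remember the axis argument is that it names the axis that gets collapsed. The sketch below (Python 3, NumPy imported as np) recomputes the reductions above with plain loops:

```python
import numpy as np

b = np.arange(12).reshape(3, 4)

# axis=0 collapses the rows: one result per column
col_sums = np.array([sum(b[i, j] for i in range(b.shape[0])) for j in range(b.shape[1])])
# axis=1 collapses the columns: one result per row
row_mins = np.array([min(row) for row in b])

assert np.array_equal(b.sum(axis=0), col_sums)
assert np.array_equal(b.min(axis=1), row_mins)
# Without axis, the reduction runs over all elements as if the array were flat
assert b.sum() == sum(b.ravel()) == 66
print(col_sums, row_mins)
```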


Universal functions


NumPy provides familiar mathematical functions such as sin, cos and exp. In NumPy these are called universal functions (ufunc). They get this name because they operate element-wise on arrays, producing an array as output.
>>> B = arange(3)
>>> B
array([ 0, 1, 2])
>>> exp(B)
array([ 1. , 2.71828183, 7.3890561 ])
>>> sqrt(B)
array([  0. , 1. , 1.41421356])
>>> C = array([2., -1., 4.])
>>> add(B, C)
array([ 2.,  0., 6.])
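A ufunc behaves like a scalar function mapped over every element. The sketch below (Python 3, NumPy imported as np) compares np.exp with math.exp applied element by element, and checks that the + operator on arrays gives the same result as the add ufunc:

```python
import math
import numpy as np

B = np.arange(3)
# Applying the scalar function to each element gives the same values
expected = np.array([math.exp(x) for x in B])
assert np.allclose(np.exp(B), expected)

C = np.array([2., -1., 4.])
# The + operator on arrays matches the add ufunc
assert np.array_equal(np.add(B, C), B + C)
# ufuncs also accept plain scalars
print(np.sqrt(B), np.add(B, C), np.sqrt(4.0))
```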


Indexes, Slices, Iterations


One-dimensional arrays can be indexed, sliced and iterated over much like lists and other Python sequences.
>>> a = arange(10)**3
>>> a
array([  0, 1, 8, 27, 64, 125, 216, 343, 512, 729])
>>> a[2]
8
>>> a[2:5]
array([ 8, 27, 64])
>>> a[:6:2] = -1000 # equivalent to a[0:6:2] = -1000: from the start to position 6 (exclusive), set every 2nd element to -1000
>>> a
array([-1000, 1, -1000, 27, -1000, 125, 216, 343, 512, 729])
>>> a[::-1] # a reversed
array([ 729, 512, 343, 216, 125, -1000, 27, -1000, 1, -1000])
>>> for i in a:
...     print i**(1/3.),
...
nan 1.0 nan 3.0 nan 5.0 6.0 7.0 8.0 9.0
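The nan values appear because a negative number raised to a fractional power has no real result in floating-point arithmetic. A sketch assuming Python 3 and NumPy 1.10 or later, which provides np.cbrt for real cube roots:

```python
import numpy as np

a = np.arange(10) ** 3
a[:6:2] = -1000                       # start:stop:step -> indices 0, 2, 4

with np.errstate(invalid="ignore"):   # silence the warning for negative ** fraction
    via_power = a ** (1 / 3.0)        # nan for the negative entries
real_roots = np.cbrt(a)               # real cube root, defined for negatives too

print(via_power)
print(real_roots)
```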


A multidimensional array can have one index per axis. These indices are given as a tuple separated by commas:
>>> def f(x,y):
...     return 10*x+y
...
>>> b = fromfunction(f,(5,4),dtype=int)
>>> b
array([[  0, 1, 2, 3],
       [10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 32, 33],
       [40, 41, 42, 43]])
>>> b[2,3]
23
>>> b[:,1] # the second column of b
array([ 1, 11, 21, 31, 41])
>>> b[1:3,:] # the second and third rows of b
array([[10, 11, 12, 13],
       [20, 21, 22, 23]])
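It helps to know how fromfunction() builds the table: it calls f once, passing whole arrays of row and column indices, so f must work element-wise on arrays. The sketch below (Python 3, NumPy imported as np) checks this against np.indices and against an element-by-element construction:

```python
import numpy as np

def f(x, y):
    return 10 * x + y

b = np.fromfunction(f, (5, 4), dtype=int)

# f receives index arrays, like the ones np.indices produces
rows, cols = np.indices((5, 4))
assert np.array_equal(b, f(rows, cols))

# The same table built element by element
assert np.array_equal(b, np.array([[f(i, j) for j in range(4)] for i in range(5)]))
print(b[2, 3], b[:, 1], b[1:3, :])
```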


When fewer indices are given than there are axes, the missing indices are treated as complete slices:
>>> b[-1] # the last row. Equivalent to b[-1,:]
array([40, 41, 42, 43])


The expression b[i] can be read as b[i, <as many ':' as needed>]. In NumPy, this can also be written with dots, as b[i, ...].

For example, if x is an array of rank 5 (that is, it has 5 axes), then
  • x[1, 2, ...] is equivalent to x[1, 2, :, :, :],
  • x[..., 3] is equivalent to x[:, :, :, :, 3], and
  • x[4, ..., 5, :] is equivalent to x[4, :, :, 5, :].

>>> c = array( [ [[  0, 1, 2], # 3d array
... [ 10, 12, 13]],
...
... [[100,101,102],
... [110,112,113]] ] )
>>> c.shape
(2, 2, 3)
>>> c[1,...] # same as c[1,:,:] or c[1]
array([[100, 101, 102],
       [110, 112, 113]])
>>> c[...,2] # same as c[:,:,2]
array([[ 2, 13],
       [102, 113]])
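The rank-5 equivalences listed above can be checked mechanically. A sketch assuming Python 3 and NumPy imported as np; the shape (5, 3, 4, 6, 7) is arbitrary, chosen only so that every index used in the list is valid:

```python
import numpy as np

x = np.arange(5 * 3 * 4 * 6 * 7).reshape(5, 3, 4, 6, 7)   # any rank-5 array will do

assert np.array_equal(x[1, 2, ...], x[1, 2, :, :, :])
assert np.array_equal(x[..., 3], x[:, :, :, :, 3])
assert np.array_equal(x[4, ..., 5, :], x[4, :, :, 5, :])
# Fewer indices than axes: the missing trailing ones act like ':'
assert np.array_equal(x[1], x[1, ...])
print(x[1, 2, ...].shape, x[..., 3].shape, x[4, ..., 5, :].shape)
```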


Iterating over a multidimensional array is done with respect to the first axis:
>>> for row in b:
...     print row
...
[0 1 2 3]
[10 11 12 13]
[20 21 22 23]
[30 31 32 33]
[40 41 42 43]


However, if you need to go through every element of the array as if it were one-dimensional, you can use the flat attribute, which is an iterator over all the elements of the array:
>>> for element in b.flat:
...     print element,
...
0 1 2 3 10 11 12 13 20 21 22 23 30 31 32 33 40 41 42 43
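The two iteration styles can be compared side by side. The sketch below (Python 3, NumPy imported as np) also uses np.ndenumerate, which yields each element together with its index tuple in the same order as flat:

```python
import numpy as np

b = np.fromfunction(lambda x, y: 10 * x + y, (5, 4), dtype=int)

rows = [row for row in b]          # iterating b yields its rows (first axis)
flat = [int(v) for v in b.flat]    # b.flat visits every element in C order
pairs = list(np.ndenumerate(b))    # (index tuple, value) pairs in the same order

assert len(rows) == 5 and rows[0].tolist() == [0, 1, 2, 3]
assert flat == b.ravel().tolist()
assert pairs[5][0] == (1, 1) and pairs[5][1] == 11
```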


Shape manipulation


As already mentioned, an array has a shape, given by the number of elements along each axis:
>>> a = floor(10*random.random((3,4)))
>>> a
array([[ 7., 5., 9., 3.],
       [ 7., 2., 7., 8.],
       [ 6., 8., 3., 2.]])
>>> a.shape
(3, 4)


The shape of an array can be changed with various commands:
>>> a.ravel() # flattens the array
array([ 7., 5., 9., 3., 7., 2., 7., 8., 6., 8., 3., 2.])
>>> a.shape = (6, 2)
>>> a.transpose()
array([[ 7., 9., 7., 7., 6., 3.],
       [ 5., 3., 2., 8., 8., 2.]])


The order of the elements produced by ravel() is normally "C-style": the rightmost index changes the fastest, so the element after a[0,0] is a[0,1]. When an array is reshaped, it is likewise treated as "C-style". NumPy normally creates arrays stored in this order, so ravel() usually does not need to copy its argument; however, if the array was made by slicing another array, a copy may be required. The functions ravel() and reshape() can also be told, through an optional argument, to use FORTRAN-style order, in which the leftmost index changes the fastest.
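In current NumPy the optional argument is called order, with "C" as the default and "F" for FORTRAN order. The sketch below (Python 3, NumPy imported as np) compares the two orders and uses np.shares_memory to show when ravel() returns a view and when it must copy:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)       # [[0, 1, 2], [3, 4, 5]]

c_order = a.ravel()                  # rightmost index changes fastest
f_order = a.ravel(order="F")         # leftmost index changes fastest
f_shaped = np.arange(6).reshape((2, 3), order="F")   # filled column by column

# a is stored in C order, so ravel() can return a view of the same memory
assert np.shares_memory(a, a.ravel())
# a.T is not C-contiguous, so a C-order ravel() has to copy
assert not np.shares_memory(a, a.T.ravel())
print(c_order, f_order)
print(f_shaped)
```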

The reshape() function returns its argument with a modified shape, whereas the resize() method modifies the array itself:
>>> a
array([[ 7., 5.],
       [ 9., 3.],
       [ 7., 2.],
       [ 7., 8.],
       [ 6., 8.],
       [ 3., 2.]])
>>> a.resize((2,6))
>>> a
array([[ 7., 5., 9., 3., 7., 2.],
       [ 7., 8., 6., 8., 3., 2.]])


If one dimension is given as -1 in a reshaping operation, it is calculated automatically from the other dimensions:
>>> a.reshape(3,-1)
array([[ 7., 5., 9., 3.],
       [ 7., 2., 7., 8.],
       [ 6., 8., 3., 2.]])
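The -1 placeholder only works when the total number of elements divides evenly. The sketch below (Python 3, NumPy imported as np) shows the inferred dimension, confirms that reshape() leaves the original array's shape alone, and shows the ValueError for an impossible shape:

```python
import numpy as np

a = np.arange(12.0)

b = a.reshape(3, -1)        # -1 is inferred: 12 / 3 = 4
assert b.shape == (3, 4)
assert a.shape == (12,)     # reshape() returned a new array object; a keeps its shape

try:
    a.reshape(5, -1)        # 12 is not divisible by 5, so no shape fits
    ok = True
except ValueError:
    ok = False
print(b.shape, ok)
```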


Stacking arrays together


Several arrays can be stacked together along different axes:
>>> a = floor(10*random.random((2,2)))
>>> a
array([[ 1., 1.],
       [ 5., 8.]])
>>> b = floor(10*random.random((2,2)))
>>> b
array([[ 3., 3.],
       [ 6.,  0.]])
>>> vstack((a,b))
array([[ 1., 1.],
       [ 5., 8.],
       [ 3., 3.],
       [ 6.,  0.]])
>>> hstack((a,b))
array([[ 1., 1., 3., 3.],
       [ 5., 8., 6.,  0.]])
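For 2D arrays, vstack() and hstack() are shorthand for joining along the first and second axis with np.concatenate. A sketch assuming Python 3 and NumPy imported as np, with fixed values in place of the random ones:

```python
import numpy as np

a = np.array([[1., 1.], [5., 8.]])
b = np.array([[3., 3.], [6., 0.]])

# vstack joins along axis 0 (rows), hstack along axis 1 (columns)
assert np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))
assert np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))
print(np.vstack((a, b)).shape, np.hstack((a, b)).shape)
```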


The column_stack() function stacks one-dimensional arrays as columns of a two-dimensional array. For two-dimensional arrays it gives the same result as hstack():
>>> column_stack((a,b))
array([[ 1., 1., 3., 3.],
       [ 5., 8., 6.,  0.]])
>>> a=array([4.,2.])
>>> b=array([2.,8.])
>>> a[:,newaxis] # this gives a 2D column vector
array([[ 4.],
       [ 2.]])
>>> column_stack((a[:,newaxis],b[:,newaxis]))
array([[ 4., 2.],
       [ 2., 8.]])
>>> vstack((a[:,newaxis],b[:,newaxis])) # the behavior of vstack is different
array([[ 4.],
       [ 2.],
       [ 2.],
       [ 8.]])
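newaxis is simply an alias for None, and indexing with it inserts an axis of length 1. The sketch below (Python 3, NumPy imported as np) shows that a[:, np.newaxis] matches reshape(-1, 1), and that for 1D inputs column_stack() matches hstack() on the column vectors:

```python
import numpy as np

a = np.array([4., 2.])
b = np.array([2., 8.])

col = a[:, np.newaxis]                 # shape (2,) -> (2, 1)
assert np.newaxis is None
assert np.array_equal(col, a.reshape(-1, 1))

both = np.column_stack((a, b))
assert np.array_equal(both, np.hstack((a[:, np.newaxis], b[:, np.newaxis])))
print(col.shape, both)
```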


Similarly, row_stack() stacks one-dimensional arrays as rows of a two-dimensional array. For arrays with more than two axes, hstack() stacks along the second axis, vstack() stacks along the first axis, and concatenate() accepts an optional argument giving the number of the axis along which the arrays should be joined.

In complex cases, r_[] and c_[] are useful for creating arrays by stacking numbers along one axis. They allow the use of range literals (":"):

>>> r_[1:4, 0,4]
array([1, 2, 3,  0, 4])
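A few more uses of these index tricks, sketched under the assumption of Python 3 and NumPy imported as np: c_ stacks 1D inputs as columns, and in r_ an imaginary step such as 5j means "this many points, stop included", like linspace():

```python
import numpy as np

r = np.r_[1:4, 0, 4]              # slices are expanded like arange: 1, 2, 3
c = np.c_[np.array([1, 2, 3]), np.array([4, 5, 6])]   # 1D inputs become columns
pts = np.r_[0:1:5j]               # 5 evenly spaced points from 0 to 1 inclusive
print(r, c, pts)
```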


Splitting one array into several smaller ones


With hsplit() you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to return, or by specifying the columns after which the division should occur:
>>> a = floor(10*random.random((2,12)))
>>> a
array([[ 8., 8., 3., 9.,  0., 4., 3.,  0.,  0., 6., 4., 4.],
       [  0., 3., 2., 9., 6.,  0., 4., 5., 7., 5., 1., 4.]])
>>> hsplit(a,3) # split a into 3 arrays
[array([[ 8., 8., 3., 9.],
       [  0., 3., 2., 9.]]), array([[  0., 4., 3.,  0.],
       [ 6.,  0., 4., 5.]]), array([[  0., 6., 4., 4.],
       [ 7., 5., 1., 4.]])]
>>> hsplit(a,(3,4)) # split a after the third and the fourth column
[array([[ 8., 8., 3.],
       [  0., 3., 2.]]), array([[ 9.],
       [ 9.]]), array([[  0., 4., 3.,  0.,  0., 6., 4., 4.],
       [ 6.,  0., 4., 5., 7., 5., 1., 4.]])]

The vsplit() function splits an array along its vertical axis, and array_split() lets you specify the axis along which to split.
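One practical difference is that split(), hsplit() and vsplit() require the pieces to be equal when given a number of sections, while array_split() accepts an uneven division. A sketch assuming Python 3 and NumPy imported as np:

```python
import numpy as np

x = np.arange(7)

parts = np.array_split(x, 3)     # uneven sizes are allowed: 3 + 2 + 2
try:
    np.split(x, 3)               # 7 elements cannot be divided into 3 equal parts
    equal_ok = True
except ValueError:
    equal_ok = False

m = np.arange(12).reshape(4, 3)
top, bottom = np.vsplit(m, 2)    # vsplit cuts along the first (vertical) axis
print([p.tolist() for p in parts], equal_ok, top.shape)
```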

Copies and Views


When operating on arrays, their data is sometimes copied into a new array and sometimes not. This is often a source of confusion for beginners. There are three cases:

No copies at all


Simple assignment copies neither the array object nor its data:
>>> a = arange(12)
>>> b = a # no new object is created
>>> b is a # a and b are two names for the same ndarray object
True
>>> b.shape = 3,4 # changes the shape of a
>>> a.shape
(3, 4)


Python passes mutable objects as references, so function calls make no copies either:
>>> def f(x):
...     print id(x)
...
>>> id(a)
148293216
>>> f(a)
148293216
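A consequence is that a function can modify the caller's array in place, while rebinding the parameter name inside the function does not affect the caller. A sketch assuming Python 3 and NumPy imported as np; scale_in_place and scale_rebind are hypothetical helper names:

```python
import numpy as np

def scale_in_place(x, factor):
    # x refers to the caller's array; no copy was made on the call
    x *= factor

def scale_rebind(x, factor):
    # x * factor creates a new array; rebinding x leaves the caller's array alone
    x = x * factor
    return x

a = np.arange(4)
scale_in_place(a, 10)
b = scale_rebind(a, 2)
print(a, b)
```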


View or shallow copy


Different array objects can share the same data. The view() method creates a new array object that looks at the same data.

>>> c = a.view()
>>> c is a
False
>>> c.base is a # c is a view of the data owned by a
True
>>> c.flags.owndata
False
>>>
>>> c.shape = 2,6 # the shape of a does not change
>>> a.shape
(3, 4)
>>> c[0,4] = 1234 # the data of a changes
>>> a
array([[  0, 1, 2, 3],
       [1234, 5, 6, 7],
       [ 8, 9, 10, 11]])


Slicing an array returns a view of it:
>>> s = a[:,1:3]
>>> s[:] = 10 # s[:] is a view of s. Note the difference between s = 10 and s[:] = 10
>>> a
array([[  0, 10, 10, 3],
       [1234, 10, 10, 7],
       [ 8, 10, 10, 11]])
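Whether two arrays share data can be tested with np.shares_memory (available since NumPy 1.11), which is more direct than inspecting base. A sketch assuming Python 3 and NumPy imported as np:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
s = a[:, 1:3]                   # basic slicing returns a view
v = a.view()
d = a.copy()

assert np.shares_memory(a, s)
assert np.shares_memory(a, v) and not v.flags.owndata
assert not np.shares_memory(a, d) and d.flags.owndata

s[:] = 10        # writes through the view into a
s = 10           # only rebinds the name s; a is untouched
print(a)
```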


Deep copy


The copy() method makes a complete copy of the array and its data:
>>> d = a.copy() # a new array object with new data is created
>>> d is a
False
>>> d.base is a # d shares nothing with a
False
>>> d[ 0, 0] = 9999
>>> a
array([[  0, 10, 10, 3],
       [1234, 10, 10, 7],
       [ 8, 10, 10, 11]])
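A common reason to make a deep copy is to keep a small piece of a large array without keeping the whole buffer alive: a slice is a view and holds a reference to the original data, while a copy owns its own memory. A sketch assuming Python 3 and NumPy imported as np:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
d = a.copy()
d[0, 0] = 9999
assert a[0, 0] == 0              # the original is unaffected

big = np.arange(1_000_000)
small = big[:100].copy()         # without copy(), small would be a view into big
del big                          # small does not keep the large buffer alive
assert small.base is None and small.flags.owndata
print(d[0, 0], a[0, 0], small.shape)
```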


Finally


So, in the first part we have covered the most important basic operations on arrays. As a complement to this part, I recommend a good cheat sheet. In the second part we will cover more specialized topics: indexing with arrays of indices or booleans, linear algebra and the matrix class, and various useful tricks.
