Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

NumPy

What is NumPy?

NumPy (Numerical Python) is a foundational library for scientific computing and data analysis in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these efficiently.

Why NumPy is Essential for This Course

Throughout this course, you’ll use NumPy extensively:

  • Week 2-13: Practical notebooks for data analysis

  • Materials science: Lattice vectors, atomic positions, energy calculations

  • Machine learning: Feature extraction, data preprocessing

  • High-throughput: Processing large datasets efficiently

Learning Objectives

By the end of this chapter, you should:

  • Create and manipulate NumPy arrays

  • Understand array indexing, slicing, and broadcasting

  • Perform numerical operations efficiently

  • Use NumPy’s vectorized operations for performance

Topics

  1. Array Creation and Shapes

  2. Array Indexing and Slicing

  3. Array Operations

  4. Broadcasting

  5. Linear Algebra with NumPy

  6. Reshaping and Data Manipulation

  7. File I/O

  8. Performance Considerations

  9. Materials Science Applications


Array Creation and Shapes

Creating Arrays

import numpy as np

# 1D array (vector)
energy_levels = np.array([0.5, 1.2, 2.5, 3.0])
print(f"1D array shape: {energy_levels.shape}")  # (4,)

# 2D array (matrix)
lattice_vectors = np.array([[5.43, 0, 0],
                               [0, 5.43, 0],
                               [0, 0, 5.43]])
print(f"2D array shape: {lattice_vectors.shape}")  # (3, 3)

# Scalar
temperature = 298.0  # Room temperature in Kelvin
print(f"Scalar: {temperature}, shape: {np.array(temperature).shape}")  # ()
1D array shape: (4,)
2D array shape: (3, 3)
Scalar: 298.0, shape: ()

Array Shapes

The shape attribute of an array returns a tuple (nrows, ncols) indicating the array’s dimensions.

# Check shapes
print(f"Energy levels shape: {energy_levels.shape}")
print(f"Energy levels ndim: {energy_levels.ndim}")  # 1
print(f"Lattice vectors ndim: {lattice_vectors.ndim}")  # 2
print(f"Temperature ndim: {np.array(temperature).ndim}")  # 0
Energy levels shape: (4,)
Energy levels ndim: 1
Lattice vectors ndim: 2
Temperature ndim: 0

Array Indexing and Slicing

Basic Indexing

# Create a 3x3 matrix
matrix = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]])

print("Matrix:")
print(matrix)
print()

# Access elements
print(f"matrix[0, 0]: {matrix[0, 0]}")  # 1
print(f"matrix[1, 0]: {matrix[1, 0]}")  # 4
print(f"matrix[2, 2]: {matrix[2, 2]}")  # 9
Matrix:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

matrix[0, 0]: 1
matrix[1, 0]: 4
matrix[2, 2]: 9

Slicing with Ranges

# Slice first row
print(f"First row: {matrix[0, :]}")  # [1, 2, 3]

# Slice last two rows
print(f"Last two rows:\n{matrix[-2:, :]}")  # rows 1 and 2

# Slice columns
print(f"First two columns: {matrix[:, :2]}")  # columns 0 and 1
First row: [1 2 3]
Last two rows:
[[4 5 6]
 [7 8 9]]
First two columns: [[1 2]
 [4 5]
 [7 8]]

Boolean Indexing

# Boolean indexing
grades = np.array([70, 85, 92, 78, 88, 65, 72, 95])
passing = grades >= 75
print(f"Passing grades: {grades[passing]}")
print(f"Grade count: {np.sum(passing)}")  # 5 passed
Passing grades: [85 92 78 88 95]
Grade count: 5

Applications in Materials Science

Example: Silicon diamond structure (8 atoms)

# Atomic positions - Nx3 array
# Example: Silicon diamond structure (8 atoms)
si_positions = np.array([
    [0.000, 0.000, 0.000],
    [0.125, 0.125, 0.125],
    [0.250, 0.250, 0.250],
    [0.375, 0.375, 0.375],
    [0.500, 0.500, 0.500],
    [0.625, 0.625, 0.625],
    [0.750, 0.750, 0.750],
    [0.875, 0.875, 0.875]
])

print(f"Silicon positions shape: {si_positions.shape}")  # (8, 3)
print(f"Si lattice constant: {si_positions[0, 0]}")  # 0.0
Silicon positions shape: (8, 3)
Si lattice constant: 0.0

Array Operations

Element-wise Operations

# Basic operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(f"a + b = {a + b}")  # [5, 7, 9]
print(f"b - a = {b - a}")  # [3, 3, 3]
print(f"a * b = {a * b}")  # [4, 10, 18]
print(f"a / b = {a / b}")  # [0.25, 0.4, 0.5]
print(f"a ** 2 = {a ** 2}")  # [1, 4, 9]
a + b = [5 7 9]
b - a = [3 3 3]
a * b = [ 4 10 18]
a / b = [0.25 0.4  0.5 ]
a ** 2 = [1 4 9]

Statistical Operations

# Statistics on array
data = np.array([23.5, 28.2, 27.1, 26.3, 24.9, 25.0,
               23.8, 24.2, 20.3, 22.7, 21.1, 20.5])

print(f"Mean: {np.mean(data):.2f}")
print(f"Standard deviation: {np.std(data):.2f}")
print(f"Min: {np.min(data):.2f}")
print(f"Max: {np.max(data):.2f}")
print(f"Sum: {np.sum(data):.2f}")
print(f"Median: {np.median(data):.2f}")
Mean: 23.97
Standard deviation: 2.43
Min: 20.30
Max: 28.20
Sum: 287.60
Median: 24.00

Materials science example: bulk modulus

# Materials science example: bulk modulus
bulk_moduli = np.array([100, 150, 200, 300, 250])
print(f"Mean bulk modulus: {np.mean(bulk_moduli):.2f} GPa")
print(f"Standard deviation: {np.std(bulk_moduli):.2f} GPa")
Mean bulk modulus: 200.00 GPa
Standard deviation: 70.71 GPa

Linear Algebra with NumPy

NumPy includes linear algebra operations essential for materials science calculations.

Matrix Multiplication and Dot Products

# Dot product: scalar result
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
dot_product = np.dot(a, b)
print(f"Dot product: {dot_product}")  # 1*4 + 2*5 + 3*6 = 32

# Matrix multiplication: @ operator
matrix1 = np.array([[1, 2],
                    [3, 4]])
matrix2 = np.array([[5, 6],
                    [7, 8]])
result = matrix1 @ matrix2
print(f"Matrix product:\n{result}")

# Lattice coordinate transformation
# Transform fractional to Cartesian coordinates
lattice_vectors = np.array([[5.43, 0.0, 0.0],      # a
                            [0.0, 5.43, 0.0],      # b
                            [0.0, 0.0, 5.43]])     # c

fractional_coords = np.array([[0.0, 0.0, 0.0],
                              [0.25, 0.25, 0.25],
                              [0.5, 0.0, 0.0]])

# Transform to Cartesian
cartesian_coords = fractional_coords @ lattice_vectors
print(f"Cartesian coordinates:\n{cartesian_coords}")
Dot product: 32
Matrix product:
[[19 22]
 [43 50]]
Cartesian coordinates:
[[0.     0.     0.    ]
 [1.3575 1.3575 1.3575]
 [2.715  0.     0.    ]]

Solving Linear Systems and Eigenvalues

# Solve linear system Ax = b
A = np.array([[3, 1],
              [1, 2]])
b = np.array([9, 8])

x = np.linalg.solve(A, b)
print(f"Solution to Ax = b: {x}")  # [2. 3.]

# Verify: A @ x should equal b
print(f"Verification: A @ x = {A @ x}")

# Eigenvalues and eigenvectors (important for vibrational modes, bandstructure)
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"\nEigenvalues: {eigenvalues}")
print(f"Eigenvectors:\n{eigenvectors}")
Solution to Ax = b: [2. 3.]
Verification: A @ x = [9. 8.]

Eigenvalues: [3.61803399 1.38196601]
Eigenvectors:
[[ 0.85065081 -0.52573111]
 [ 0.52573111  0.85065081]]

Broadcasting

What is Broadcasting?

NumPy’s broadcasting mechanism allows operations between arrays of different shapes without explicit loops.

Basic Broadcasting Example

# Example: Scale energy levels by different constants
energy_levels = np.array([[0.5, 1.0, 2.0],
                        [0.8, 1.2, 2.0],
                        [1.6, 1.4, 2.0]])

scaling_factors = np.array([0.8, 1.1, 1.2])

# Broadcasting: (3, 3) * (3,)
scaled_energies = energy_levels * scaling_factors
print(f"Scaled energies shape: {scaled_energies.shape}")  # (3, 3)
print(f"Scaled energies:\n{scaled_energies}")
Scaled energies shape: (3, 3)
Scaled energies:
[[0.4  1.1  2.4 ]
 [0.64 1.32 2.4 ]
 [1.28 1.54 2.4 ]]

Reshaping and Data Manipulation

Reshaping Arrays

# Reshape array
data = np.arange(12)  # [0, 1, 2, ..., 11]
print(f"Original shape: {data.shape}")

reshaped = data.reshape(3, 4)
print(f"Reshaped to (3, 4):\n{reshaped}")

reshaped_3d = data.reshape(2, 2, 3)
print(f"Reshaped to (2, 2, 3):\n{reshaped_3d}")

# Flatten (opposite of reshape)
flattened = reshaped.flatten()
print(f"Flattened: {flattened}")

# Transpose (useful for coordinate transformations)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(f"Original shape: {matrix.shape}")
print(f"Transposed shape: {matrix.T.shape}")
print(f"Transposed:\n{matrix.T}")
Original shape: (12,)
Reshaped to (3, 4):
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Reshaped to (2, 2, 3):
[[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]]
Flattened: [ 0  1  2  3  4  5  6  7  8  9 10 11]
Original shape: (2, 3)
Transposed shape: (3, 2)
Transposed:
[[1 4]
 [2 5]
 [3 6]]

Concatenating and Stacking Arrays

# Concatenate arrays
positions_frame1 = np.array([[1.0, 2.0, 3.0],
                             [4.0, 5.0, 6.0]])
positions_frame2 = np.array([[1.1, 2.1, 3.1],
                             [4.1, 5.1, 6.1]])

# Stack along a new axis (useful for trajectory data)
trajectory = np.stack([positions_frame1, positions_frame2])
print(f"Trajectory shape (2 frames, 2 atoms, 3 coords): {trajectory.shape}")
print(f"Trajectory:\n{trajectory}")

# Concatenate along axis
stacked = np.concatenate([positions_frame1, positions_frame2], axis=0)
print(f"\nConcatenated shape: {stacked.shape}")
Trajectory shape (2 frames, 2 atoms, 3 coords): (2, 2, 3)
Trajectory:
[[[1.  2.  3. ]
  [4.  5.  6. ]]

 [[1.1 2.1 3.1]
  [4.1 5.1 6.1]]]

Concatenated shape: (4, 3)

Performance Considerations

Vectorization

NumPy’s vectorized operations are implemented in C and optimized for performance.

import time

# BAD: Using loop for element-wise operations
a = np.random.rand(1000, 1000)

start = time.time()
result_loop = np.zeros_like(a)
for i in range(a.shape[0]):
    result_loop[i] = a[i] * 2
time_loop = time.time() - start

# GOOD: Vectorized operation
start = time.time()
result_vectorized = a * 2
time_vectorized = time.time() - start

print(f"Loop time: {time_loop:.4f} seconds")
print(f"Vectorized time: {time_vectorized:.4f} seconds")
print(f"Speedup: {time_loop/time_vectorized:.1f}x")
Loop time: 0.0044 seconds
Vectorized time: 0.0019 seconds
Speedup: 2.3x

File I/O with NumPy

Saving and Loading Arrays

# Save array to binary format (.npy) - preserves precision and is fast
energy_data = np.array([0.5, 1.2, 2.5, 3.0])
np.save('energy_data.npy', energy_data)

# Load array
loaded_energy = np.load('energy_data.npy')
print(f"Loaded energy data: {loaded_energy}")

# Save multiple arrays
positions = np.array([[1, 2, 3], [4, 5, 6]])
forces = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
np.savez('trajectory.npz', positions=positions, forces=forces)

# Load multiple arrays
data = np.load('trajectory.npz')
print(f"\nLoaded positions:\n{data['positions']}")
print(f"Loaded forces:\n{data['forces']}")

# Save/load text format (human-readable, larger files)
np.savetxt('positions_text.txt', positions, fmt='%.6f', delimiter=',')
loaded_text = np.loadtxt('positions_text.txt', delimiter=',')
print(f"\nLoaded from text:\n{loaded_text}")
Loaded energy data: [0.5 1.2 2.5 3. ]

Loaded positions:
[[1 2 3]
 [4 5 6]]
Loaded forces:
[[0.1 0.2 0.3]
 [0.4 0.5 0.6]]

Loaded from text:
[[1. 2. 3.]
 [4. 5. 6.]]

Memory Efficiency

# Create large array
large_array = np.ones((1000, 1000))  # 8MB

# BAD: Creating copy in memory (wasteful)
# large_array_copy = large_array.copy()

# GOOD: Use view or assignment
large_array_view = large_array[:, 100]  # Memory efficient
print(f"View shape: {large_array_view.shape}")
View shape: (1000,)

Key Takeaways

  1. Use np.array() instead of lists for numerical data

  2. Understand ndim: Check array dimensionality before operations

  3. Use slicing : instead of loops when possible

  4. Prefer vectorized operations: * and @ for element-wise operations

  5. Check data types: Use appropriate types to save memory

  6. Use np.sum(), np.mean(), etc. instead of Python’s built-in sum(), mean()

  7. Be aware of broadcasting: Understand shape compatibility rules


Materials Science Applications

Distance Calculations

# Calculate distances between atoms
atom1 = np.array([0.0, 0.0, 0.0])
atom2 = np.array([1.0, 2.0, 2.0])

# Euclidean distance
distance = np.linalg.norm(atom2 - atom1)
print(f"Distance between atoms: {distance:.4f}")

# Pairwise distances (useful for radial distribution functions)
positions = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [1.0, 1.0, 0.0]])

# Calculate all pairwise distances
diff = positions[:, None, :] - positions[None, :, :]
distances = np.linalg.norm(diff, axis=2)

print(f"\nPairwise distances:\n{distances}")
Distance between atoms: 3.0000

Pairwise distances:
[[0.         1.         1.         1.41421356]
 [1.         0.         1.41421356 1.        ]
 [1.         1.41421356 0.         1.        ]
 [1.41421356 1.         1.         0.        ]]

Angle Calculations

# Calculate angle between three atoms (bond angle)
atom_a = np.array([0.0, 0.0, 0.0])
atom_b = np.array([1.0, 0.0, 0.0])
atom_c = np.array([0.0, 1.0, 0.0])

# Vectors from center atom (b) to other atoms
vec_ba = atom_a - atom_b
vec_bc = atom_c - atom_b

# Angle using dot product
cos_angle = np.dot(vec_ba, vec_bc) / (np.linalg.norm(vec_ba) * np.linalg.norm(vec_bc))
angle_rad = np.arccos(cos_angle)
angle_deg = np.degrees(angle_rad)

print(f"Bond angle: {angle_deg:.2f}°")

# Dihedral angle calculation (more complex, useful for molecular conformations)
# This would involve four atoms - often implemented in ASE or other libraries
Bond angle: 45.00°