NumPy - Materials Informatics

What is NumPy?¶

NumPy (Numerical Python) is a foundational library for scientific computing and data analysis in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these efficiently.

Why NumPy is Essential for This Course¶

Throughout this course, you’ll use NumPy extensively:

Week 2-13: Practical notebooks for data analysis
Materials science: Lattice vectors, atomic positions, energy calculations
Machine learning: Feature extraction, data preprocessing
High-throughput: Processing large datasets efficiently

Learning Objectives¶

By the end of this chapter, you should:

Create and manipulate NumPy arrays
Understand array indexing, slicing, and broadcasting
Perform numerical operations efficiently
Use NumPy’s vectorized operations for performance

Topics¶

Array Creation and Shapes
Array Indexing and Slicing
Array Operations
Broadcasting
Linear Algebra with NumPy
Reshaping and Data Manipulation
File I/O
Performance Considerations
Materials Science Applications

Array Creation and Shapes¶

Creating Arrays¶

import numpy as np

# 1D array (vector)
energy_levels = np.array([0.5, 1.2, 2.5, 3.0])
print(f"1D array shape: {energy_levels.shape}")  # (4,)

# 2D array (matrix)
lattice_vectors = np.array([[5.43, 0, 0],
                               [0, 5.43, 0],
                               [0, 0, 5.43]])
print(f"2D array shape: {lattice_vectors.shape}")  # (3, 3)

# Scalar
temperature = 298.0  # Room temperature in Kelvin
print(f"Scalar: {temperature}, shape: {np.array(temperature).shape}")  # ()

1D array shape: (4,)
2D array shape: (3, 3)
Scalar: 298.0, shape: ()

Array Shapes¶

The shape attribute of an array returns a tuple (nrows, ncols) indicating the array’s dimensions.

# Check shapes
print(f"Energy levels shape: {energy_levels.shape}")
print(f"Energy levels ndim: {energy_levels.ndim}")  # 1
print(f"Lattice vectors ndim: {lattice_vectors.ndim}")  # 2
print(f"Temperature ndim: {np.array(temperature).ndim}")  # 0

Energy levels shape: (4,)
Energy levels ndim: 1
Lattice vectors ndim: 2
Temperature ndim: 0

Array Indexing and Slicing¶

Basic Indexing¶

# Create a 3x3 matrix
matrix = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]])

print("Matrix:")
print(matrix)
print()

# Access elements
print(f"matrix[0, 0]: {matrix[0, 0]}")  # 1
print(f"matrix[1, 0]: {matrix[1, 0]}")  # 4
print(f"matrix[2, 2]: {matrix[2, 2]}")  # 9

Matrix:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

matrix[0, 0]: 1
matrix[1, 0]: 4
matrix[2, 2]: 9

Slicing with Ranges¶

# Slice first row
print(f"First row: {matrix[0, :]}")  # [1, 2, 3]

# Slice last two rows
print(f"Last two rows:\n{matrix[-2:, :]}")  # rows 1 and 2

# Slice columns
print(f"First two columns: {matrix[:, :2]}")  # columns 0 and 1

First row: [1 2 3]
Last two rows:
[[4 5 6]
 [7 8 9]]
First two columns: [[1 2]
 [4 5]
 [7 8]]

Boolean Indexing¶

# Boolean indexing
grades = np.array([70, 85, 92, 78, 88, 65, 72, 95])
passing = grades >= 75
print(f"Passing grades: {grades[passing]}")
print(f"Grade count: {np.sum(passing)}")  # 5 passed

Passing grades: [85 92 78 88 95]
Grade count: 5

Applications in Materials Science¶

Example: Silicon diamond structure (8 atoms)

# Atomic positions - Nx3 array
# Example: Silicon diamond structure (8 atoms)
si_positions = np.array([
    [0.000, 0.000, 0.000],
    [0.125, 0.125, 0.125],
    [0.250, 0.250, 0.250],
    [0.375, 0.375, 0.375],
    [0.500, 0.500, 0.500],
    [0.625, 0.625, 0.625],
    [0.750, 0.750, 0.750],
    [0.875, 0.875, 0.875]
])

print(f"Silicon positions shape: {si_positions.shape}")  # (8, 3)
print(f"Si lattice constant: {si_positions[0, 0]}")  # 0.0

Silicon positions shape: (8, 3)
Si lattice constant: 0.0

Array Operations¶

Element-wise Operations¶

# Basic operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(f"a + b = {a + b}")  # [5, 7, 9]
print(f"b - a = {b - a}")  # [3, 3, 3]
print(f"a * b = {a * b}")  # [4, 10, 18]
print(f"a / b = {a / b}")  # [0.25, 0.4, 0.5]
print(f"a ** 2 = {a ** 2}")  # [1, 4, 9]

a + b = [5 7 9]
b - a = [3 3 3]
a * b = [ 4 10 18]
a / b = [0.25 0.4  0.5 ]
a ** 2 = [1 4 9]

Statistical Operations¶

# Statistics on array
data = np.array([23.5, 28.2, 27.1, 26.3, 24.9, 25.0,
               23.8, 24.2, 20.3, 22.7, 21.1, 20.5])

print(f"Mean: {np.mean(data):.2f}")
print(f"Standard deviation: {np.std(data):.2f}")
print(f"Min: {np.min(data):.2f}")
print(f"Max: {np.max(data):.2f}")
print(f"Sum: {np.sum(data):.2f}")
print(f"Median: {np.median(data):.2f}")

Mean: 23.97
Standard deviation: 2.43
Min: 20.30
Max: 28.20
Sum: 287.60
Median: 24.00

Materials science example: bulk modulus¶

# Materials science example: bulk modulus
bulk_moduli = np.array([100, 150, 200, 300, 250])
print(f"Mean bulk modulus: {np.mean(bulk_moduli):.2f} GPa")
print(f"Standard deviation: {np.std(bulk_moduli):.2f} GPa")

Mean bulk modulus: 200.00 GPa
Standard deviation: 70.71 GPa

Linear Algebra with NumPy¶

NumPy includes linear algebra operations essential for materials science calculations.

Matrix Multiplication and Dot Products¶

# Dot product: scalar result
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
dot_product = np.dot(a, b)
print(f"Dot product: {dot_product}")  # 1*4 + 2*5 + 3*6 = 32

# Matrix multiplication: @ operator
matrix1 = np.array([[1, 2],
                    [3, 4]])
matrix2 = np.array([[5, 6],
                    [7, 8]])
result = matrix1 @ matrix2
print(f"Matrix product:\n{result}")

# Lattice coordinate transformation
# Transform fractional to Cartesian coordinates
lattice_vectors = np.array([[5.43, 0.0, 0.0],      # a
                            [0.0, 5.43, 0.0],      # b
                            [0.0, 0.0, 5.43]])     # c

fractional_coords = np.array([[0.0, 0.0, 0.0],
                              [0.25, 0.25, 0.25],
                              [0.5, 0.0, 0.0]])

# Transform to Cartesian
cartesian_coords = fractional_coords @ lattice_vectors
print(f"Cartesian coordinates:\n{cartesian_coords}")

Dot product: 32
Matrix product:
[[19 22]
 [43 50]]
Cartesian coordinates:
[[0.     0.     0.    ]
 [1.3575 1.3575 1.3575]
 [2.715  0.     0.    ]]

Solving Linear Systems and Eigenvalues¶

# Solve linear system Ax = b
A = np.array([[3, 1],
              [1, 2]])
b = np.array([9, 8])

x = np.linalg.solve(A, b)
print(f"Solution to Ax = b: {x}")  # [2. 3.]

# Verify: A @ x should equal b
print(f"Verification: A @ x = {A @ x}")

# Eigenvalues and eigenvectors (important for vibrational modes, bandstructure)
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"\nEigenvalues: {eigenvalues}")
print(f"Eigenvectors:\n{eigenvectors}")

Solution to Ax = b: [2. 3.]
Verification: A @ x = [9. 8.]

Eigenvalues: [3.61803399 1.38196601]
Eigenvectors:
[[ 0.85065081 -0.52573111]
 [ 0.52573111  0.85065081]]

Broadcasting¶

What is Broadcasting?¶

NumPy’s broadcasting mechanism allows operations between arrays of different shapes without explicit loops.

Basic Broadcasting Example¶

# Example: Scale energy levels by different constants
energy_levels = np.array([[0.5, 1.0, 2.0],
                        [0.8, 1.2, 2.0],
                        [1.6, 1.4, 2.0]])

scaling_factors = np.array([0.8, 1.1, 1.2])

# Broadcasting: (3, 3) * (3,)
scaled_energies = energy_levels * scaling_factors
print(f"Scaled energies shape: {scaled_energies.shape}")  # (3, 3)
print(f"Scaled energies:\n{scaled_energies}")

Scaled energies shape: (3, 3)
Scaled energies:
[[0.4  1.1  2.4 ]
 [0.64 1.32 2.4 ]
 [1.28 1.54 2.4 ]]

Reshaping and Data Manipulation¶

Reshaping Arrays¶

# Reshape array
data = np.arange(12)  # [0, 1, 2, ..., 11]
print(f"Original shape: {data.shape}")

reshaped = data.reshape(3, 4)
print(f"Reshaped to (3, 4):\n{reshaped}")

reshaped_3d = data.reshape(2, 2, 3)
print(f"Reshaped to (2, 2, 3):\n{reshaped_3d}")

# Flatten (opposite of reshape)
flattened = reshaped.flatten()
print(f"Flattened: {flattened}")

# Transpose (useful for coordinate transformations)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(f"Original shape: {matrix.shape}")
print(f"Transposed shape: {matrix.T.shape}")
print(f"Transposed:\n{matrix.T}")

Original shape: (12,)
Reshaped to (3, 4):
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Reshaped to (2, 2, 3):
[[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]]
Flattened: [ 0  1  2  3  4  5  6  7  8  9 10 11]
Original shape: (2, 3)
Transposed shape: (3, 2)
Transposed:
[[1 4]
 [2 5]
 [3 6]]

Concatenating and Stacking Arrays¶

# Concatenate arrays
positions_frame1 = np.array([[1.0, 2.0, 3.0],
                             [4.0, 5.0, 6.0]])
positions_frame2 = np.array([[1.1, 2.1, 3.1],
                             [4.1, 5.1, 6.1]])

# Stack along a new axis (useful for trajectory data)
trajectory = np.stack([positions_frame1, positions_frame2])
print(f"Trajectory shape (2 frames, 2 atoms, 3 coords): {trajectory.shape}")
print(f"Trajectory:\n{trajectory}")

# Concatenate along axis
stacked = np.concatenate([positions_frame1, positions_frame2], axis=0)
print(f"\nConcatenated shape: {stacked.shape}")

Trajectory shape (2 frames, 2 atoms, 3 coords): (2, 2, 3)
Trajectory:
[[[1.  2.  3. ]
  [4.  5.  6. ]]

 [[1.1 2.1 3.1]
  [4.1 5.1 6.1]]]

Concatenated shape: (4, 3)

Performance Considerations¶

Vectorization¶

NumPy’s vectorized operations are implemented in C and optimized for performance.

import time

# BAD: Using loop for element-wise operations
a = np.random.rand(1000, 1000)

start = time.time()
result_loop = np.zeros_like(a)
for i in range(a.shape[0]):
    result_loop[i] = a[i] * 2
time_loop = time.time() - start

# GOOD: Vectorized operation
start = time.time()
result_vectorized = a * 2
time_vectorized = time.time() - start

print(f"Loop time: {time_loop:.4f} seconds")
print(f"Vectorized time: {time_vectorized:.4f} seconds")
print(f"Speedup: {time_loop/time_vectorized:.1f}x")

Loop time: 0.0044 seconds
Vectorized time: 0.0019 seconds
Speedup: 2.3x

File I/O with NumPy¶

Saving and Loading Arrays¶

# Save array to binary format (.npy) - preserves precision and is fast
energy_data = np.array([0.5, 1.2, 2.5, 3.0])
np.save('energy_data.npy', energy_data)

# Load array
loaded_energy = np.load('energy_data.npy')
print(f"Loaded energy data: {loaded_energy}")

# Save multiple arrays
positions = np.array([[1, 2, 3], [4, 5, 6]])
forces = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
np.savez('trajectory.npz', positions=positions, forces=forces)

# Load multiple arrays
data = np.load('trajectory.npz')
print(f"\nLoaded positions:\n{data['positions']}")
print(f"Loaded forces:\n{data['forces']}")

# Save/load text format (human-readable, larger files)
np.savetxt('positions_text.txt', positions, fmt='%.6f', delimiter=',')
loaded_text = np.loadtxt('positions_text.txt', delimiter=',')
print(f"\nLoaded from text:\n{loaded_text}")

Loaded energy data: [0.5 1.2 2.5 3. ]

Loaded positions:
[[1 2 3]
 [4 5 6]]
Loaded forces:
[[0.1 0.2 0.3]
 [0.4 0.5 0.6]]

Loaded from text:
[[1. 2. 3.]
 [4. 5. 6.]]

Memory Efficiency¶

# Create large array
large_array = np.ones((1000, 1000))  # 8MB

# BAD: Creating copy in memory (wasteful)
# large_array_copy = large_array.copy()

# GOOD: Use view or assignment
large_array_view = large_array[:, 100]  # Memory efficient
print(f"View shape: {large_array_view.shape}")

View shape: (1000,)

Key Takeaways¶

Use np.array() instead of lists for numerical data
Understand ndim: Check array dimensionality before operations
Use slicing : instead of loops when possible
Prefer vectorized operations: * and @ for element-wise operations
Check data types: Use appropriate types to save memory
Use np.sum(), np.mean(), etc. instead of Python’s built-in sum(), mean()
Be aware of broadcasting: Understand shape compatibility rules

Materials Science Applications¶

Distance Calculations¶

# Calculate distances between atoms
atom1 = np.array([0.0, 0.0, 0.0])
atom2 = np.array([1.0, 2.0, 2.0])

# Euclidean distance
distance = np.linalg.norm(atom2 - atom1)
print(f"Distance between atoms: {distance:.4f}")

# Pairwise distances (useful for radial distribution functions)
positions = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [1.0, 1.0, 0.0]])

# Calculate all pairwise distances
diff = positions[:, None, :] - positions[None, :, :]
distances = np.linalg.norm(diff, axis=2)

print(f"\nPairwise distances:\n{distances}")

Distance between atoms: 3.0000

Pairwise distances:
[[0.         1.         1.         1.41421356]
 [1.         0.         1.41421356 1.        ]
 [1.         1.41421356 0.         1.        ]
 [1.41421356 1.         1.         0.        ]]

Angle Calculations¶

# Calculate angle between three atoms (bond angle)
atom_a = np.array([0.0, 0.0, 0.0])
atom_b = np.array([1.0, 0.0, 0.0])
atom_c = np.array([0.0, 1.0, 0.0])

# Vectors from center atom (b) to other atoms
vec_ba = atom_a - atom_b
vec_bc = atom_c - atom_b

# Angle using dot product
cos_angle = np.dot(vec_ba, vec_bc) / (np.linalg.norm(vec_ba) * np.linalg.norm(vec_bc))
angle_rad = np.arccos(cos_angle)
angle_deg = np.degrees(angle_rad)

print(f"Bond angle: {angle_deg:.2f}°")

# Dihedral angle calculation (more complex, useful for molecular conformations)
# This would involve four atoms - often implemented in ASE or other libraries

Bond angle: 45.00°