
Introduction

In materials science and engineering, we’re constantly searching for better materials. Better might mean stronger, lighter, more conductive, more stable, cheaper to produce, or a combination of these properties. “Better” almost always translates to finding the optimal value of some property or combination of properties. This is where optimization comes in.

Optimization, in a mathematical sense, is the process of finding the input values to a function that result in either the minimum or maximum output value of that function. We call this function we’re trying to minimize or maximize the objective function (sometimes also called the cost function or energy function).

Think about it:

- Designing a stronger alloy? You are maximizing strength as a function of composition.
- Searching for the most stable crystal structure? You are minimizing energy with respect to the atomic positions.
- Synthesizing a purer compound? You are maximizing purity over the reaction conditions.
- Training a machine learning model to predict material properties? You are minimizing a loss function over the model's parameters.

In all these cases, we have a defined objective function (strength, energy, purity, loss, etc.) and a set of variables we can control (composition, atomic positions, reaction conditions, model’s parameters). Optimization is the systematic process of exploring the space of these variables to find the “best” possible outcome according to our objective function.

Basic Concepts and Overview

Before diving into specific algorithms, let’s define some key concepts:

Objective Function

Objective function $f(x) = x^2 + 2x + 1$. The minimum value of this function is 0 at $x = -1$.

This is the function we want to minimize or maximize. Here $x$ represents the input variables (often a vector). For example:

- The formation energy of a crystal as a function of its atomic positions.
- The strength of an alloy as a function of its composition.
- The loss of a machine learning model as a function of its parameters.
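As a minimal sketch (using SciPy's `minimize`, which this article has not yet introduced, and the quadratic from the figure above), a numerical optimizer can recover the minimum at $x = -1$:

```python
from scipy.optimize import minimize

# Objective function from the figure: f(x) = x^2 + 2x + 1 = (x + 1)^2
def objective(x):
    return x[0]**2 + 2 * x[0] + 1

# Search for the minimum starting from an arbitrary initial guess
result = minimize(objective, x0=[3.0])
print(result.x, result.fun)  # approximately [-1.0] and 0.0
```

With no bounds or constraints, `minimize` defaults to a gradient-based local method (BFGS), which is enough for this convex quadratic.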

Variables

These are the parameters we can adjust to influence the objective function. They can be:

- Continuous: drawn from a continuous range (e.g., composition fractions, temperature).
- Discrete: drawn from a finite set (e.g., choice of element, crystal structure type).

Constraints

The constraints of the optimization problem. Here, the feasible region is defined by the inequalities $x \leq 0.5$ and $x \geq -2.5$.

Often, we can't simply choose any value for our variables. Constraints define the allowed ranges or relationships between variables. Examples:

- Composition fractions must be non-negative and sum to 1.
- Processing temperatures must stay within what the equipment can reach.
- The structure must remain charge neutral.
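As a sketch (reusing the quadratic objective above, with the bounds from the figure), box constraints can be handed directly to SciPy's `minimize`:

```python
from scipy.optimize import minimize

def objective(x):
    return x[0]**2 + 2 * x[0] + 1

# Feasible region from the figure: -2.5 <= x <= 0.5
result = minimize(objective, x0=[0.0], bounds=[(-2.5, 0.5)])
print(result.x)  # approximately [-1.0]; the unconstrained minimum happens to be feasible
```

If the bounds excluded $x = -1$ (say $0.5 \le x \le 2.5$), the optimizer would instead return the boundary point $x = 0.5$.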

Local vs. Global Optima

Local and global minima of a function. The global minimum is the lowest point in the entire search space, while local minima are lower than nearby points.

Finding the global minimum is generally much harder than finding a local minimum. Many optimization algorithms can get “stuck” in local minima, failing to find the true global optimum.
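A short sketch (with an illustrative one-dimensional quartic, not taken from this article) shows how a local method's answer depends on the starting point, and how a global strategy such as SciPy's `basinhopping` can escape the shallower basin:

```python
from scipy.optimize import minimize, basinhopping

# Illustrative non-convex function: a local minimum near x = 2.1 and a
# deeper, global minimum near x = -2.35
def f(x):
    return 0.1 * x[0]**4 - x[0]**2 + 0.5 * x[0]

print(minimize(f, x0=[2.0]).x)   # gets stuck in the local minimum (~2.1)
print(minimize(f, x0=[-2.0]).x)  # happens to find the global minimum (~-2.35)

# Basin hopping repeatedly perturbs the current solution and re-minimizes,
# keeping the lowest minimum found; a large enough step size lets it hop basins
print(basinhopping(f, x0=[2.0], niter=100, stepsize=2.0).x)  # typically ~-2.35
```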

Convexity

Convex and non-convex functions. Convex functions have only one minimum, which is the global minimum. Non-convex functions have multiple local minima.

A function is convex if, for any two points within its domain, the line segment connecting those points lies entirely above or on the function’s curve. Mathematically:

$$f(\lambda x_1 + (1 - \lambda)x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)$$

for any $x_1$, $x_2$ in the domain and any $\lambda$ between 0 and 1.

Convex functions are “nice” because they have only one minimum, which is the global minimum. If a function is non-convex (has multiple bumps and valleys), finding the global minimum is much more challenging.
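As a quick numerical illustration (a sketch, not a proof), the defining inequality can be checked at random points for the convex quadratic $f(x) = x^2 + 2x + 1$ used earlier:

```python
import numpy as np

def f(x):
    return x**2 + 2 * x + 1  # convex: f''(x) = 2 > 0 everywhere

rng = np.random.default_rng(0)
for _ in range(5):
    x1, x2 = rng.uniform(-10, 10, size=2)
    lam = rng.uniform(0, 1)
    lhs = f(lam * x1 + (1 - lam) * x2)
    rhs = lam * f(x1) + (1 - lam) * f(x2)
    print(lhs <= rhs + 1e-12)  # True every time for a convex function
```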

Gradient

Left panel: the gradient of the 1D function $f(x) = (x+1)^2$, shown as arrows pointing in the opposite direction of steepest ascent. Right panel: the gradient of the 2D function $f(x_1, x_2) = (x_1-1)^2 + (x_2+1)^2$, shown as vectors pointing in the direction of steepest ascent.

The gradient, denoted by $\nabla f(x)$, is a vector that points in the direction of the steepest ascent of the function at a given point. Its components are the partial derivatives of the function with respect to each variable:

$$\nabla f(x) = \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right]$$

The negative gradient, $-\nabla f(x)$, points in the direction of steepest descent. This is fundamental to many optimization algorithms. The gradient can be computed using numerical differentiation or symbolic differentiation.
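For example, a central finite-difference sketch (a hypothetical `numerical_gradient` helper, applied to the 2D function from the figure above) approximates the gradient one component at a time:

```python
import numpy as np

# 2D function from the figure: f(x1, x2) = (x1 - 1)^2 + (x2 + 1)^2
def f(x):
    return (x[0] - 1)**2 + (x[1] + 1)**2

def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x with central finite differences."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

print(numerical_gradient(f, np.array([0.0, 0.0])))  # ~[-2.,  2.], matching [2(x1-1), 2(x2+1)]
```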

Hessian

The Hessian matrix is the square matrix of second-order partial derivatives of a scalar-valued function.

$$H(f) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$

The Hessian provides information about the local curvature of the function. If the Hessian is positive definite at a point, the function has a local minimum there. If it is negative definite, the function has a local maximum. If the Hessian has both positive and negative eigenvalues, the point is a saddle point. The Hessian can be computed using numerical differentiation or symbolic differentiation, but this can be computationally expensive for high-dimensional functions.
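The same finite-difference idea extends to second derivatives. The sketch below (a hypothetical `numerical_hessian` helper, applied to the 2D quadratic used above) builds the Hessian and inspects its eigenvalues; note that it needs on the order of $n^2$ function evaluations, which is exactly why Hessians get expensive in high dimensions:

```python
import numpy as np

def f(x):
    return (x[0] - 1)**2 + (x[1] + 1)**2

def numerical_hessian(f, x, h=1e-4):
    """Approximate the Hessian of f at x with central finite differences."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.zeros(n), np.zeros(n)
            ei[i], ej[j] = h, h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h**2)
    return H

H = numerical_hessian(f, np.array([1.0, -1.0]))
print(H)                     # approximately [[2, 0], [0, 2]]
print(np.linalg.eigvals(H))  # all eigenvalues positive -> local minimum
```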

Challenges

Optimizing materials presents significant challenges due to the inherent complexity of materials and the computational demands involved.

High Dimensionality

Material properties are influenced by numerous factors (composition, structure, processing), creating a vast, high-dimensional search space. This “curse of dimensionality” makes finding the global optimum exponentially harder.

Non-Convexity

The relationship between a material’s properties and its defining variables (the objective function) is often non-convex, exhibiting multiple local minima. Algorithms can get trapped in these local minima, mistaking them for the global optimum.

Computational Cost

Evaluating the objective function frequently requires computationally expensive simulations (e.g., DFT, MD) or experiments. This limits the number of evaluations, hindering thorough exploration of the search space.

Discrete and Continuous Variables

Optimization problems often involve both discrete variables (e.g., element choices, crystal structure type) and continuous variables (e.g., composition, temperature). This mixture requires specialized algorithms.

Noisy Objective Functions

Experimental measurements and computational simulations are subject to noise. This makes it harder to determine the true objective function value and can mislead optimization algorithms.

Physical and Chemical Constraints

Optimization must adhere to physical and chemical constraints (charge neutrality, phase stability, stoichiometry, manufacturability), limiting the search space.

Multi-Objective Optimization

Optimizing multiple properties simultaneously (e.g., strength and ductility) leads to a set of Pareto-optimal solutions (the Pareto front), where improving one objective worsens another. This adds complexity to decision-making. A common way to handle this is to scalarize the objectives into a weighted sum, $F(x) = w_1 f_1(x) + w_2 f_2(x)$.
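As a sketch (with two made-up one-dimensional quadratics standing in for competing material properties), sweeping the weights of the scalarized objective traces out different trade-off solutions:

```python
from scipy.optimize import minimize

# Two competing illustrative objectives: each prefers a different x
def f1(x):
    return (x[0] - 1)**2

def f2(x):
    return (x[0] + 1)**2

# Weighted-sum scalarization: F(x) = w1*f1(x) + w2*f2(x)
for w1 in [0.0, 0.25, 0.5, 0.75, 1.0]:
    w2 = 1.0 - w1
    res = minimize(lambda x: w1 * f1(x) + w2 * f2(x), x0=[0.0])
    print(f"w1={w1:.2f}  x*={res.x[0]:+.3f}  f1={f1(res.x):.3f}  f2={f2(res.x):.3f}")
```

Each weight choice gives a different compromise between $f_1$ and $f_2$; collecting these solutions approximates the Pareto front.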