Features and Descriptors

Raw materials data, such as a list of atomic species and their coordinates, is often not directly ingestible by standard ML algorithms, which typically require fixed-size numerical input vectors. The process of transforming raw data into a suitable numerical format is known as feature engineering or descriptor generation. The resulting numerical representations are called features or descriptors.

Each structure is thus represented as a vector of features, and these vectors are the input from which machine learning algorithms predict material properties or behaviors.
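As a minimal sketch of what "fixed-size numerical input" means in practice, the snippet below maps species lists of different lengths onto vectors of identical length by counting atomic fractions. The element ordering `ELEMENTS` and the function name `composition_vector` are illustrative assumptions, not part of any particular library.

```python
from collections import Counter

# Hypothetical fixed element ordering; a real project would cover every
# element that can occur in the dataset.
ELEMENTS = ["H", "C", "N", "O", "Si", "Ti"]


def composition_vector(species):
    """Map a variable-length list of species to a fixed-length vector of atomic fractions."""
    if not species:
        raise ValueError("species list is empty")
    counts = Counter(species)
    unknown = set(counts) - set(ELEMENTS)
    if unknown:
        raise ValueError(f"elements not in ELEMENTS: {sorted(unknown)}")
    total = len(species)
    return [counts[el] / total for el in ELEMENTS]


# Two structures with different numbers of atoms...
tio2_cell = ["Ti", "Ti", "O", "O", "O", "O"]
sio2_unit = ["Si", "O", "O"]

# ...produce feature vectors of the same length.
vec_tio2 = composition_vector(tio2_cell)
vec_sio2 = composition_vector(sio2_unit)
print(len(vec_tio2), len(vec_sio2))
print(vec_tio2)
print(vec_sio2)
```

This representation discards the coordinates entirely, so it cannot tell polymorphs apart; it only illustrates the step from raw data to a vector that a standard ML algorithm can consume.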

General Requirements for Descriptor/Feature Vectors in ML Tasks

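The design of effective features is a cornerstone of materials informatics and remains an active area of research. Good features should ideally be invariant to translation, rotation, and permutation of identical atoms; distinguish structures that differ in the property of interest; vary smoothly with atomic positions; and be compact and cheap to compute.

The choice of features is crucial, as it can significantly impact the performance of ML models.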
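One commonly cited requirement for structural descriptors is invariance: moving, rotating, or renumbering the atoms should not change the feature vector. The sketch below checks this for a deliberately simple fingerprint, the sorted list of interatomic distances. The geometry and the helper names (`sorted_distances`, `rotate_z`, `translate`) are illustrative assumptions.

```python
import math
import random


def sorted_distances(coords):
    """Sorted list of all pairwise distances: a simple structural fingerprint."""
    n = len(coords)
    dists = [math.dist(coords[i], coords[j])
             for i in range(n) for j in range(i + 1, n)]
    return sorted(dists)


def rotate_z(coords, angle):
    """Rotate all atoms about the z axis by `angle` (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


def translate(coords, shift):
    """Shift all atoms by the same vector."""
    return [(x + shift[0], y + shift[1], z + shift[2]) for x, y, z in coords]


# Toy three-atom geometry in angstrom (illustrative values only).
molecule = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

# Rotate, translate, then renumber the atoms.
transformed = translate(rotate_z(molecule, 0.7), (1.5, -2.0, 3.0))
random.Random(0).shuffle(transformed)

reference = sorted_distances(molecule)
candidate = sorted_distances(transformed)
invariant = all(math.isclose(a, b, abs_tol=1e-9)
                for a, b in zip(reference, candidate))
print(invariant)
```

The raw coordinate list itself would fail this check, which is why it is a poor feature vector. Note that sorted distances ignore atomic species and are not unique for every structure, so this fingerprint satisfies the invariance requirement without being a complete descriptor.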

Global Descriptors

Global descriptors encode properties of the structure as a whole, such as crystal system, space group, density, energy, formation energy, or band gap. Because they are derived from the entire crystal structure, each material is characterized by a single set of values, regardless of how many atoms it contains.
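To make the idea concrete, the following sketch builds a small global descriptor (atom count, volume per atom, and mass density) from lattice parameters and a species list. The mass table `ATOMIC_MASS` and the function names are assumptions made for illustration; the cell-volume formula is the standard expression for a general triclinic cell.

```python
import math

# (1 amu in grams) / (1 cubic angstrom in cm^3)
AMU_PER_A3_TO_G_PER_CM3 = 1.66053906660

# Small illustrative table of atomic masses in amu.
ATOMIC_MASS = {"Si": 28.0855, "O": 15.999, "Ti": 47.867}


def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Volume of a unit cell from lattice lengths (angstrom) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def global_descriptor(species, a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Return [number of atoms, volume per atom (A^3), density (g/cm^3)]."""
    volume = cell_volume(a, b, c, alpha, beta, gamma)
    mass = sum(ATOMIC_MASS[s] for s in species)
    density = mass * AMU_PER_A3_TO_G_PER_CM3 / volume
    return [len(species), volume / len(species), density]


# Conventional cubic cell of silicon: 8 atoms, a = 5.431 angstrom.
si_descriptor = global_descriptor(["Si"] * 8, 5.431, 5.431, 5.431)
print(si_descriptor)
```

For the silicon cell the computed density comes out close to the experimental value of about 2.33 g/cm³, a quick sanity check that the unit conversion is right.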

Local Descriptors

Local descriptors describe the environment around each atom in the structure and capture information about its bonding and coordination. They are the natural input for predicting local properties such as atomic forces or adsorption energies.

Typical local descriptors are parameters derived from the atomic environment around each atom, such as coordination numbers, bond lengths, and bond angles, or more sophisticated functions characterizing the neighbor distribution (e.g., radial distribution function moments, symmetry functions, and SOAP, which is discussed later).
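The simplest of these quantities can be sketched in a few lines. The example below computes, for every atom, a coordination number and the mean and shortest neighbor distance within a cutoff radius. It assumes an isolated cluster (no periodic boundary conditions), and the cutoff value, geometry, and function name `local_descriptors` are illustrative choices.

```python
import math


def local_descriptors(coords, cutoff):
    """Per-atom features: [coordination number, mean neighbor distance, shortest neighbor distance].

    Neighbors are all other atoms within `cutoff`; periodic images are NOT considered.
    """
    features = []
    for i, ri in enumerate(coords):
        dists = [math.dist(ri, rj) for j, rj in enumerate(coords) if j != i]
        neighbors = [d for d in dists if d <= cutoff]
        if neighbors:
            row = [len(neighbors), sum(neighbors) / len(neighbors), min(neighbors)]
        else:
            row = [0, 0.0, 0.0]  # isolated atom: no bonds within the cutoff
        features.append(row)
    return features


# Methane-like cluster: C at the origin, four H atoms at tetrahedral
# positions 1.09 angstrom away (illustrative geometry).
d = 1.09 / math.sqrt(3)
methane = [
    (0.0, 0.0, 0.0),   # C
    (d, d, d),         # H
    (d, -d, -d),       # H
    (-d, d, -d),       # H
    (-d, -d, d),       # H
]

per_atom = local_descriptors(methane, cutoff=1.3)
for row in per_atom:
    print(row)
```

Unlike a global descriptor, the output has one row per atom, so its size grows with the structure. Models that need a single vector per structure typically pool these rows (for example by averaging), while models for atomic forces use them directly.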

Codes

References
  1. Lourenço, M. P., Herrera, L. B., Hostaš, J., Calaminici, P., Köster, A. M., Tchagang, A., & Salahub, D. R. (2023). QMLMaterial─A Quantum Machine Learning Software for Material Design and Discovery. Journal of Chemical Theory and Computation, 19(17), 5999–6010. 10.1021/acs.jctc.3c00566