Graph
We have already briefly introduced the concept of graphs in the context of Atomistic Structure. In this section, we will delve deeper into the mathematical representation of graphs and their significance in materials science.
What is a Graph?¶
A graph is a mathematical structure consisting of nodes (or vertices) and edges (or links) that connect pairs of nodes. Graphs can be directed or undirected, weighted or unweighted, and can represent various types of relationships between entities.
A graph is mathematically defined as a pair , where:
- is a set of vertices or nodes, representing the objects or entities.
- is a set of edges, representing the connections or relationships between pairs of nodes. An edge connects node to node .
- is the global feature vector, which can represent properties of the entire graph.
Characteristics¶
Directed vs. Undirected: In an undirected graph, edges represent symmetric relationships; if node is connected to node , then is also connected to . The edge is the same as . This is common for representing atomic bonds or proximity where the relationship is inherently mutual. In a directed graph, edges have a direction, so is distinct from , representing asymmetric relationships (e.g., information flow, prerequisite dependencies).
Weighted vs. Unweighted: In an unweighted graph, edges simply indicate the presence or absence of a connection. In a weighted graph, each edge has an associated numerical weight , representing the strength, cost, or distance of the connection (e.g., the bond length or inverse distance between atoms).
Adjacency Matrix¶
While various data structures exist for storing graphs (like adjacency lists or incidence matrices), a common and conceptually important representation is the adjacency matrix. For a graph with nodes, the adjacency matrix is an square matrix where the entry indicates the connection between node and node .
- For an unweighted graph, if an edge exists between node and node , and otherwise.
- For a weighted graph, , the weight of the edge between and , if an edge exists, and (or sometimes ∞ for distance-based graphs) if no direct edge exists.
For an undirected graph, the adjacency matrix is symmetric (). For a directed graph, it may not be.
Example: Consider a simple undirected, unweighted graph with 4 nodes {1, 2, 3, 4} and edges {(1, 2), (1, 3), (2, 3), (3, 4)}. Its adjacency matrix would be:
Here, because there’s an edge between node 1 and node 2, while because there’s no direct edge between node 1 and node 4.
Features¶
Features are essential components of graphs, providing additional information about nodes and edges.
Node Features: Each node can have an associated feature vector , describing the properties of that node (e.g., element type, atomic coordinates).
Edge Features: Each edge can have an associated feature vector , describing the properties of the connection itself (e.g., bond type, distance).
Global Features: In some cases, graphs may also have global features that describe properties of the entire graph (e.g., total energy, symmetry).
Examples of Graphs¶
Graphs are ubiquitous in various fields, including computer science, social networks, biology, and materials science. Here are some examples of graphs:
- Social Networks: Perhaps the most intuitive example. Nodes represent individuals or entities (users, organizations), and edges represent relationships like friendship, followership, professional connections, or interactions (likes, comments). Node features might include user profiles, while edge features could represent the type or timestamp of an interaction. Analyzing these graphs helps understand community structure, influence propagation, and information dissemination.
- Images: While seemingly grid-like, images can be treated as graphs, especially for tasks like segmentation or scene understanding. Pixels or segmented regions (superpixels) can be nodes. Edges can connect adjacent pixels or regions, perhaps weighted by similarity in color or texture. This allows models to consider spatial relationships beyond simple convolution windows.
- Text and Natural Language Processing: Text can be represented as a graph where nodes are words, sentences, or documents. Edges can represent various relationships like co-occurrence, syntactic dependency (e.g., from parsing a sentence), semantic similarity, or citation links between documents. This graph structure helps capture context and relationships that linear sequence models might miss.
- Biological Networks: Graphs model interactions between biological entities. Nodes can be genes, proteins, or metabolites, and edges represent regulatory interactions, protein-protein interactions, or metabolic pathways.
- Knowledge Graphs: Nodes represent entities (people, places, concepts), and labeled edges represent the relationships between them (e.g., “Albert Einstein” -[born in]-> “Ulm”). These are widely used in search engines and question answering systems.
Molecular Graphs¶
The graph representation is exceptionally well-suited for materials. We can naturally model a material structure (like a molecule or a crystal) as a graph where:
Nodes () are atoms. Node features () can include the element type (e.g., represented as a one-hot vector or an embedding), atomic coordinates, partial charges, magnetic moments, etc.
Edges () represent chemical bonds or proximity. Often, an edge is defined between two atoms if the distance between them is less than a predefined cutoff radius. These are typically undirected. Edge features () can include the scalar distance between atoms and , or perhaps a vectorial representation of their separation, or features derived from bond angles involving neighboring atoms.
Global features () can represent properties of the entire structure, such as total energy, symmetry information, or other macroscopic properties.