Machine Learning of Slow Collective Variables and Enhanced Sampling via Spatial Techniques

Tuğçe Gökdemir¹, Jakub Rydzewski¹

Affiliations

  1. Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Toruń, Poland

Abstract

Understanding the long-time dynamics of complex physical processes depends on our ability to recognize patterns. To simplify the description of these processes, we often introduce a set of reaction coordinates, customarily referred to as collective variables (CVs). The quality of these CVs heavily impacts our comprehension of the dynamics, often influencing the estimates of thermodynamics and kinetics from atomistic simulations. Consequently, identifying CVs poses a fundamental challenge in chemical physics. Recently, significant progress has been made by leveraging the predictive ability of unsupervised machine learning techniques to determine CVs. Many of these techniques require temporal information to learn slow CVs that correspond to the long-timescale behavior of the studied process. Here, however, we specifically focus on techniques that can identify CVs corresponding to the slowest transitions between states without needing temporal trajectories as input, instead using the spatial characteristics of the data. We discuss the latest developments in this category of techniques and briefly outline potential directions for thermodynamics-informed spatial learning of slow CVs.

I. Introduction

Complex systems in chemical physics often exhibit dynamics with multiple temporal scales, characterized by infrequent transitions between long-lived metastable states that occur on timescales orders of magnitude slower than fast molecular motions[1][2][3]. This significant disparity in timescales is known as timescale separation. Understanding such physical processes depends on our ability to recognize patterns in molecular dynamics (MD) simulations. We typically simplify the dynamics by introducing a set of reaction coordinates, customarily referred to as order parameters or collective variables (CVs)[4], which are meant to describe it on the macroscopic level. Then, we can estimate a free-energy landscape in CV space, which, to a large extent, is responsible for the thermodynamics and kinetics of physical processes[5][6][7][8][9][10][11][12][13].

However, the determination of CVs has proved challenging even for simpler systems[14][15][16]. The most interesting properties of complex processes are often hidden in slow dynamics, to which fast variables are adiabatically constrained. Therefore, CVs should describe transitions between states that occur when crossing free-energy barriers significantly higher than the thermal energy ($k_\mathrm{B}T$). This picture is based on transition state theory and Kramers' theory for reaction dynamics, where the reactant and product states are separated by an energy barrier on which the transition state is located[17]. Processes such as protein folding[18][19], crystallization[20], nucleation[21], glass transitions[22][23], aqueous systems[24], catalysis[25], and molecular recognition[26][27][28][29] are only a few examples where these characteristics are present and that have frequently profited from such a reduced description.

Due to the rapid development of machine learning (ML) libraries[30][31], using neural networks has become relatively straightforward and readily available for applications in chemical physics and MD. Interestingly, a fundamental challenge in ML is to develop simple and interpretable representations for complex data[32][33][34][35][36][37], which closely resembles the task of developing CVs for dynamical systems. Consequently, ML methods have been employed to extract meaningful information from simulations due to their ability to recognize statistical patterns[38][39]. These techniques can be harnessed to devise algorithms for learning CVs hidden in data to explain the dynamics on the macroscopic scale. A variety of such data-driven methods has been developed at the intersection of statistical physics, MD, and ML. Many of them were recently reviewed[40][41][42][43][44][45][46][47][48][4][49][50][51][52].

Nonetheless, learning slow CVs remains a challenging task, presenting several difficulties. One key issue is that the quality of CVs is often significantly hampered by the inability to effectively capture longer timescales during standard simulations within a reasonable computing time. This is commonly known as the sampling problem in MD. As such, the construction of training datasets for ML techniques can be problematic as it cannot be known if every state is sufficiently sampled. Additionally, the scarcity of observations between states makes the representation of transition states in reduced space problematic. Enhanced sampling methods can partly alleviate the problem of poor statistics. However, they require correct data reweighting to obtain equilibrium characteristics, often tailored to a particular class of ML algorithms. These problems cause a circular dependency between sampling and learning that poses a major obstacle in developing these techniques.

In this brief review, we will focus on a specific class of ML methods for building slow CVs. Unlike other reviews that cover techniques using trajectories and their time-delayed versions as input to calculate kinetic quantities, such as correlation functions, directly, our priority will be on unsupervised techniques that do not rely on temporal characteristics; instead, they estimate kinetics indirectly by analyzing the thermodynamic properties of MD data. These methods aim to learn the reduced space of CVs by capturing spatial characteristics of simulation data encoded in configuration or reduced space, such as the proximity between samples, density estimates, and weights derived from enhanced sampling simulations. We will explore various techniques, including spectral methods such as diffusion maps and their extensions, as well as recently developed algorithms that leverage deep neural networks to learn slow CVs. Lastly, we will discuss potential avenues for future advancements in this field.

II. Background

A. Collective Variables

In statistical mechanics, we consider a system described by the microscopic coordinates $x = (x_1, \dots, x_n)$ whose dynamics at temperature $T$ evolves according to a potential energy function $U(x)$. This dynamics can be described by the following overdamped Langevin equation:

$$\mathrm{d}x = -\nabla U(x)\,\mathrm{d}t + \sqrt{2\beta^{-1}}\,\mathrm{d}w,$$

where $\beta = 1/k_\mathrm{B}T$ is the inverse temperature, $k_\mathrm{B}$ is the Boltzmann constant, and $\mathrm{d}w$ is a Brownian motion. The time evolution of the system results in a canonical equilibrium distribution given by the Boltzmann density:

$$p(x) = \frac{1}{Z}\, e^{-\beta U(x)},$$

where $Z = \int \mathrm{d}x\, e^{-\beta U(x)}$ is the partition function of the system[53]. We reduce the representation of the system by mapping it into a reduced space defined by a set of $d$ functions of the microscopic coordinates, commonly referred to as CVs:

$$z = f(x) \equiv \{f_k(x)\}_{k=1}^{d},$$

where $d \ll n$. The dynamics of the system in reduced space samples a marginal equilibrium density:

$$p(z) = \int \mathrm{d}x\, p(x)\, \delta(z - f(x))$$

that is defined by weighting each slice through configuration space $x$, denoted by the delta function $\delta(\cdot)$, with the Boltzmann factor $p(x) \propto e^{-\beta U(x)}$. The marginal probability $p(z)$ contains information about the free-energy landscape:

$$F(z) = -\frac{1}{\beta} \log p(z).$$

Even for simple systems, the free-energy landscape contains many stable states that are separated by barriers much larger than thermal energy, leading to significant timescale disparities in the dynamics.
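
As a simple numerical illustration (not part of the original formulation), the free-energy landscape can be estimated from equilibrium samples of a CV by histogramming $p(z)$ and taking $F(z) = -\beta^{-1}\log p(z)$; the Python sketch below uses placeholder data and illustrative parameter values:

```python
import numpy as np

# A rough sketch: estimate a 1D free-energy profile F(z) = -(1/beta) log p(z)
# by histogramming CV samples from an (assumed) equilibrium trajectory.
# The samples and units below are placeholders.
beta = 1.0 / (0.0083145 * 300.0)   # 1/(k_B T) in kJ/mol at 300 K

z = np.random.normal(size=10_000)  # stand-in for CV values from a trajectory

hist, edges = np.histogram(z, bins=100, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
mask = hist > 0                    # avoid log(0) in empty bins

F = -np.log(hist[mask]) / beta     # free energy up to an additive constant
F -= F.min()                       # shift the global minimum to zero
```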

As summarized in a review by Peters[54], a general requirement for optimal CVs is to preserve dynamical self-consistency: the dynamics projected onto the free-energy landscape should remain consistent with trajectories sampling configuration space. Breaking this down, we can list more specific characteristics that define optimal CVs:

  1. CVs must accurately recognize metastability; that is, distinguish between long-lived metastable states and identify transition states[55]. Accurate metastability recognition is often difficult to achieve due to the sampling problem[5][10].
  2. CVs need to model reduced dynamics as primarily corresponding to transitions on longer timescales, with the dynamics of fast variables being negligible[56]. Slow and fast variables should be unmixed in such a way that they induce a significant separation of timescales. Moreover, CVs should not be degenerate; each should describe a different slow mode.
  3. CVs should preferably be Markovian, so that the slow dynamics can be described as evolution in the free-energy landscape with configuration-dependent diffusion coefficients[57][58][59].
  4. CVs must be applicable in CV-based enhanced sampling methods (i.e., smooth and differentiable), such as umbrella sampling[60][61][62][63], metadynamics[64][65][66][67], or variationally enhanced sampling[68][69][70], to improve sampling in MD and drive it toward long-timescale processes.

B. Timescale Separation

To illustrate the problem of timescale separation, let us focus on the spectral theory of dynamical systems and reversible Markov processes[71]. Consider the forward Fokker–Planck equation for the time propagation of a probability distribution $p(x,t)$: $\partial p/\partial t = L p$, where $L$ is the generator of a Markov process. Through a change of variables, the Fokker–Planck equation can be solved[72]. The solution to this equation can be written in closed form as an eigenfunction expansion[73]:

$$p(x,t) = \varphi_0(x) + \sum_{k=1}^{\infty} a_k\, e^{-\mu_k t}\, \varphi_k(x),$$

where $a_k$ are expansion coefficients and $t$ is time. For $t \to \infty$, the solution of the forward equation converges to the equilibrium Boltzmann distribution $p(x)$.

Under general conditions, the generator of the diffusion process $L$ has a discrete spectrum of eigenvalues $\mu_k$ with corresponding eigenfunctions $\varphi_k(x)$. The zeroth eigenfunction is the equilibrium density $\varphi_0(x) \propto e^{-\beta U(x)}$ with the eigenvalue $\mu_0 = 0$. The eigenvalues are non-negative and sorted in increasing order:

$$\mu_0 = 0 < \mu_1 < \mu_2 < \cdots < \mu_\infty.$$

The dominant eigenvalues of the Markov generator are linked to the slowest relaxation timescales in the system: each eigenvalue can be matched with an effective timescale $t_k = 1/\mu_k$, and the corresponding terms in the expansion decay exponentially in time. In systems with timescale separation, only a few slow processes related to rare transitions between metastable states remain. As a result, the eigenspectrum of $L$ has a spectral gap, i.e., a large difference between the eigenvalues $\mu_{k+1}$ and $\mu_k$. This implies that the modes with eigenvalues much larger than $\mu_k$ can be neglected, as they correspond to rapid fluctuations within states and decay much faster, leaving effectively $k$ slow processes (Fig. 1).
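
To make the notion of a spectral gap concrete, the sketch below converts eigenvalues of the generator into effective timescales and locates the gap in $\lambda_k = e^{-\mu_k}$; the eigenvalues are synthetic stand-ins for a two-state system:

```python
import numpy as np

# Synthetic eigenvalues mu_k of the generator for a system with one slow
# process (a stand-in for the two-state potential of Fig. 1).
mu = np.array([0.0, 1e-3, 5.0, 7.0, 9.0])  # mu_0 = 0 is the equilibrium mode

timescales = np.empty_like(mu)
timescales[0] = np.inf                      # equilibrium never decays
timescales[1:] = 1.0 / mu[1:]               # t_k = 1/mu_k

lam = np.exp(-mu)                           # lambda_k = exp(-mu_k)
gaps = lam[:-1] - lam[1:]                   # differences between neighbors
k = np.argmax(gaps) + 1                     # here k = 2, matching two states
```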

Figure 1. Model potential with two metastable states whose long-time behavior can effectively be described by the slow variable $x_s$, with the fast variable $x_f$ responsible only for fluctuations within the states. The corresponding eigenspectrum of the diffusion generator, $\lambda_k = e^{-\mu_k}$, shows timescale separation, which is indicated by the spectral gap $\lambda_{k-1} \gg \lambda_k$, where $k=2$ is the number of states.

The spectral properties of reversible Markov processes can be related to the concept of metastability[74]. Although this relation can be understood intuitively, Gaveau and Schulman[75][76], drawing on the extensive work of Davies[77][78][79], developed a spectral definition of metastability. They formally showed that dominant and nearly degenerate eigenvalues are related to metastable timescales. This concept relies on the presence of the spectral gap. If an eigenvalue is nearly degenerate, the equilibrium distribution separates into metastable states with infrequent transitions between them. Conversely, eigenvalue degeneracy exists if the equilibrium density breaks into metastable states separated by a free-energy barrier much larger than thermal energy. The eigenfunctions related to the dominant eigenvalues are linked to distributions that remain stable longer than transient processes. Furthermore, sign changes in these eigenfunctions indicate transitions between metastable states. The theory is summarized in a monograph by Bovier and Den Hollander[80].

C. Enhanced Sampling

Acquiring an informative training dataset from unbiased MD trajectories is a crucial challenge. These trajectories need to spontaneously and repeatedly cross over all significant free-energy barriers in the system. However, metastability leads to kinetic entrapment in a single state, making transitions between metastable states rare. To alleviate this issue, enhanced sampling methods can be used to improve sampling efficiency[6][7][8][9][10][11].

Enhanced sampling methods that require CVs to improve sampling are based on employing a nonphysical bias potential. Among such methods are umbrella sampling, introduced by Torrie and Valleau[60], adiabatic biasing force[81], adiabatic free-energy dynamics[82], metadynamics, proposed by Laio and Parrinello[64] and improved to the well-tempered variant by Barducci et al.[65], mean-force dynamics[83], and variationally enhanced sampling[68]. Biasing the system can cause the probability distribution of CVs to deviate significantly from equilibrium, resulting in sampling according to a biased distribution:

$$p_V(z,t) \propto e^{-\beta[F(z)+V(z,t)]},$$

where $V(z,t)$ is a time-dependent bias potential. To calculate equilibrium properties, such as free-energy landscapes, the bias must be reverted during postprocessing. This is customarily done by reweighting, where each sample is associated with a statistical weight to counter the effect of biasing. In general, the weights are given by the likelihood ratio between the equilibrium and the biased probability distributions:

$$w(z,t) = \frac{p(z)}{p_V(z,t)}.$$

For methods using a quasi-stationary bias potential[60][68][84] (e.g., umbrella sampling), or when the simulation is converged and the bias does not change significantly, the weights are given as:

$$w(z) \propto \frac{e^{-\beta F(z)}}{e^{-\beta[F(z)+V(z)]}} = e^{\beta V(z)}.$$

In contrast, in metadynamics[65], the bias potential changes over time and requires accounting for a time-dependent offset[85]. Thus, the functional form of the weights may vary depending on the enhanced sampling method and the reweighting algorithm[86][87][84][88]. A summary of such methods was recently published by Kamenik et al.[89]
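
As a minimal illustration of reweighting under a quasi-stationary bias, the sketch below computes the weights $w \propto e^{\beta V(z)}$ and uses them in a reweighted average; the bias values and the observable are placeholders:

```python
import numpy as np

# Quasi-stationary bias: each sample carries a weight w = exp(beta * V(z)).
# The bias values and the observable below are placeholders.
beta = 1.0

V = np.random.uniform(0.0, 5.0, size=1000)  # bias potential at each sample
w = np.exp(beta * V)
w /= w.sum()                                # normalized importance weights

A = np.random.normal(size=1000)             # placeholder observable A(z)
A_eq = np.sum(w * A)                        # reweighted (equilibrium) average
```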

To efficiently sample and drive complex physical processes, high-quality CVs are required for biasing. However, learning CVs demands using exhaustively sampled data. This problem creates a challenging circular dependency, which is referred to as the “chicken-and-egg” problem[45]. Advances in the determination of CVs help address this problem and contribute to the development and implementation of enhanced sampling methods.

III. Spatial Learning

Due to recent extensive advancements in data-driven temporal methods[90][91][92][69][93][94][95], there are numerous reviews summarizing this topic[45][46][49][50][51][52]. In this work, however, we consider techniques that are "spatial," i.e., algorithms for learning slow CVs that do not need to exploit temporal information in MD simulations. We can describe spatial techniques as those that rely on pairwise relations between samples in the dataset (usually through a distance metric) instead of counting transitions within a specified lag time. The development of such techniques can be traced back to the work of Shi and Malik[96] on image segmentation and the classic Laplacian eigenmaps introduced by Belkin and Niyogi[97][98][99][100], and is closely related to spectral graph theory[101] and methods based on graphs, kernels, and random walks[102][103][104].

The primary difference between spatial and temporal techniques lies in how kinetics is estimated. Spatial techniques estimate kinetics indirectly by analyzing the thermodynamic characteristics of MD data, such as equilibrium probabilities, in contrast to temporal techniques. Additionally, in spatial techniques, we assume that MD data closely approximates overdamped Langevin dynamics (see Sec. II.A). For these reasons, we can refer to these methods as thermodynamics-informed learning.

A. Anisotropic Kernels

The core of most spatial learning methods involves establishing similarity between samples, typically through a distance metric and a kernel[105]. For example, Laplacian eigenmaps construct a Gaussian kernel to model relations between $N$ samples in a dataset $X = \{x_k\}_{k=1}^{N}$[97][98][99][100]:

$$G_\varepsilon(x_k, x_l) = \exp\left(-\|x_k - x_l\|^2/\varepsilon^2\right),$$

where ε>0 is a scale parameter. This kernel is then used to define a Laplacian matrix and parametrize reduced space using its eigenvectors. However, methods that use a Gaussian kernel, such as Laplacian eigenmaps, cannot be used to compute slow CVs as their construction implicitly assumes that data is distributed uniformly. As the equilibrium density is often far from uniform, Laplacian eigenmaps have not seen many applications for analyzing trajectories. However, they are often used as a baseline for developing more advanced techniques.

Based on Laplacian eigenmaps, Coifman et al.[106] developed the diffusion map algorithm that is especially suited for learning the reduced space of slow CVs. Diffusion maps use a density-preserving kernel for data sampled from any underlying probability distribution. For this, an anisotropic kernel is constructed on the dataset X[107]:

$$K(x_k, x_l) = \frac{G_\varepsilon(x_k, x_l)}{\rho^\alpha(x_k)\,\rho^\alpha(x_l)},$$

where $\varepsilon$ is a scale parameter, $\rho(x_k) = \sum_l G_\varepsilon(x_k, x_l)$ is a density estimate that allows us to include information about non-uniformly sampled data in the kernel, and $\alpha \in [0,1]$ is the anisotropic diffusion constant. Next, a Markov transition matrix is constructed by row-normalizing $K$:

$$M(x_k, x_l) = \frac{K(x_k, x_l)}{\sum_i K(x_k, x_i)}$$

to build a discrete Markov chain on the data:

$$m_{kl} = \Pr(x_{i+1} = x_l \mid x_i = x_k)$$

that expresses the transition probability between $x_k$ and $x_l$. Note that this construction does not depend on the physical time. The local scale parameter $\varepsilon$ plays an important role in determining the quality of slow CVs, as it defines the scale within which the relation between two samples contributes to the Markov transition matrix.

Depending on the anisotropic diffusion constant $\alpha$, several kernel normalizations are available, which change the long-time convergence of the Markov chain to a particular operator. This group of constructions is known as anisotropic diffusion maps[106][108][109][107][110]. For example, with $\alpha = 1/2$, the Markov chain approaches the time asymptotics of the system, describing the dynamics by Fokker–Planck anisotropic diffusion in the potential $U(x)$. As such, this normalization is commonly used to extract information from MD trajectories. Two other frequently considered values are $\alpha = 0$ and $1$. The former results in the classical normalized graph Laplacian, while the latter yields the Laplace–Beltrami operator with a uniform probability density[106][108][109][107][110].
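
A compact sketch of the anisotropic diffusion-map construction described above is given below; the dataset, scale parameter, and $\alpha$ are illustrative assumptions:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Anisotropic diffusion-map construction on an illustrative dataset X.
X = np.random.normal(size=(500, 3))
eps, alpha = 0.5, 0.5

G = np.exp(-cdist(X, X, "sqeuclidean") / eps**2)  # Gaussian kernel
rho = G.sum(axis=1)                               # density estimates

K = G / np.outer(rho**alpha, rho**alpha)          # anisotropic kernel
M = K / K.sum(axis=1, keepdims=True)              # row-stochastic Markov matrix
```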

Advances in the diffusion map algorithm and anisotropic Markovian kernels often involve kernels that capture more aspects of the data. For instance, self-tuning local kernels were introduced by Zelnik-Manor and Perona[111]. Following this, works by Rohrdanz et al.[112] and Zhang et al.[113][114] demonstrated that estimating the scale parameter as a configuration-dependent product $\varepsilon(x_k)\,\varepsilon(x_l)$, where each factor can be calculated as the distance between $x$ and its $n$-th nearest neighbor, improves the overall quality of slow CVs[40]. A more general method for computing the local scale parameters was later proposed by Berry et al.[115][116]

In works by Dsilva et al.[117][118] and Singer et al.[119], it was proposed to use a heterogeneous Gaussian kernel to improve the properties of the resulting CVs. Instead of the Euclidean distance, this kernel employs a Mahalanobis-like distance, which incorporates a covariance matrix. The implication is that the Mahalanobis kernel, by including the correlations in the dataset, can be used to remove the effect of observing the underlying space through a complex nonlinear function[119][117][118]:

$$G_\Sigma(x_k, x_l) = \exp\left(-d^2_\Sigma(x_k, x_l)/\varepsilon^2\right),$$

where the squared Mahalanobis distance is:

$$d^2_\Sigma(x_k, x_l) = (x_k - x_l)^\top (\Sigma_k + \Sigma_l)^\dagger (x_k - x_l).$$

The local covariance matrix $\Sigma_k$ can be estimated as a sample covariance matrix in the immediate neighborhood of configuration $x_k$[120][119][117], and $\dagger$ denotes a pseudo-inverse (as $\Sigma$ can be rank-deficient).
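
A minimal sketch of this Mahalanobis-like kernel for a single pair of samples is shown below, assuming the local covariances have been estimated beforehand (e.g., from each sample's nearest neighbors); all inputs are illustrative:

```python
import numpy as np

# Mahalanobis-like kernel for one pair of samples; the local covariances
# Sigma_k, Sigma_l are assumed to be estimated beforehand and are
# placeholders here.
def mahalanobis_kernel(xk, xl, Sk, Sl, eps):
    d = xk - xl
    C = np.linalg.pinv(Sk + Sl)   # pseudo-inverse handles rank deficiency
    return np.exp(-(d @ C @ d) / eps**2)

xk, xl = np.random.normal(size=3), np.random.normal(size=3)
Sk = Sl = np.eye(3)               # placeholder local covariances
val = mahalanobis_kernel(xk, xl, Sk, Sl, eps=0.5)
```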

Subsequently, Berry and Sauer[121] developed a generalization of diffusion maps to local kernels by introducing diffusion and drift terms in the distance metric, which must additionally be computed from the data[122][123]. It was shown by Berry et al.[124] that anisotropic kernels can be improved by including Takens' delay coordinates in datasets, especially when observations are scarce. Diffusion maps were also embedded in a framework for coarse-graining and clustering[125].

B. Reweighted Transitions

The concept of reweighting transition probabilities is crucial when using enhanced sampling algorithms to build the Markov transition matrix and, thus, CVs. A Markov chain constructed from biased data does not converge to the equilibrium density given by the Boltzmann distribution[126][127]. This bias affects the Markov chain and leads to incorrect density and geometric relations between samples, which can result in a reduced space that does not accurately represent the characteristics of the data. Reweighting pairwise probabilities removes the bias from the Markov matrix, yielding an unbiased Markov chain. While biased CVs can still be used to analyze, speed up, and drive the sampling of rare events[113][128][129], a reweighting algorithm becomes necessary when we seek to restore the equilibrium properties of the system and compute slow CVs.

The initial approach to learning unbiased CVs from enhanced sampling simulations with the diffusion map algorithm was proposed by Ferguson et al.[130], in which each configuration is weighted according to its importance in umbrella sampling simulations. A symmetric weighted Gaussian kernel was used by Zhang et al.[114] to learn CVs from multiple metadynamics simulations. Building on the local kernels introduced by Berry and Sauer[121], Banisch et al. and Trstanova et al. devised a general approach to reweighting transition probabilities based on target-measure reweighting[131][132]. This approach was later employed in works by Evans et al., where a diffusion map with the Mahalanobis distance is constructed in z space[131][132].

Zhang and Chen[133] derived an alternative reweighting technique, which Rydzewski et al.[127] later generalized to multiple algorithms employing Markov transition kernels. They demonstrated that the anisotropic diffusion kernel can be unbiased as:

$$K(x_k, x_l) = r_{kl}\,\frac{G_\varepsilon(x_k, x_l)}{\widetilde{\rho}^{\,\alpha}(x_k)\,\widetilde{\rho}^{\,\alpha}(x_l)},$$

where the transition reweighting factor $r_{kl} = \sqrt{w_k w_l}$ incorporates importance weights from enhanced sampling simulations and $\widetilde{\rho}$ are reweighted density estimates:

$$\widetilde{\rho}(x_k) = \sum_m w_m\, G_\varepsilon(x_k, x_m).$$

A detailed derivation with possible approximations is given by Rydzewski et al.[127] As explained in Sec. II.C, the form of the weights depends on the employed enhanced sampling and reweighting techniques[86][85][87][84][89].
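
Under the assumption that the reweighting factor takes the symmetric form $r_{kl} = \sqrt{w_k w_l}$ (as reconstructed above), a sketch of the reweighted kernel and the resulting unbiased Markov matrix could read:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Reweighted anisotropic kernel: weights w from an enhanced sampling run
# enter the pairwise factor r_kl and the density estimates. X and w are
# illustrative placeholders.
X = np.random.normal(size=(500, 3))
w = np.random.uniform(0.5, 2.0, size=500)
eps, alpha = 0.5, 0.5

G = np.exp(-cdist(X, X, "sqeuclidean") / eps**2)
rho = G @ w                                  # reweighted density estimates
r = np.sqrt(np.outer(w, w))                  # assumed form of r_kl

K = r * G / np.outer(rho**alpha, rho**alpha)
M = K / K.sum(axis=1, keepdims=True)         # unbiased Markov transition matrix
```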

Several approximate transition reweighting factors can be obtained depending on the scaling of the long-time asymptotics of the kernel with the constant α[127]. This kind of transition reweighting can be used for diffusion maps[127] and deep learning[133][126][127]. We refer to the review by Rydzewski et al.[51] for a detailed discussion.

This idea was recently explored by Rydzewski[134], who demonstrated that this form of transition reweighting in diffusion maps can be employed as a feature selection pipeline for further dimensionality reduction. This is done by leveraging the idea that a partial selection of variables should have an eigenspectrum similar to that of configuration space. This extension can provide an interpretable and explainable description by selecting physically important CVs for the given process[134].

For a more general approach to dynamical transition reweighting, not limited to unbiasing transition probabilities in spatial techniques, see the reviews by Chen and Chipot[50], which discusses many reweighting methods for temporal techniques, and by Keller and Bolhuis[135], where reweighting is examined from the perspective of Markov state models.

Figure 2. Learning CVs with spatial techniques. Diagram of a neural network showing the difference between reweighted stochastic embedding (RSE) and spectral map. RSE estimates transition matrices $M(x_k,x_l)$ and $Q(z_k,z_l)$ in the x and z spaces, respectively (as x space can consist of variables different from the microscopic coordinates, we denote it as features). Then, it uses the Kullback–Leibler (KL) divergence as a loss function to minimize differences between pairs of transition probabilities in x and z spaces. In contrast, spectral map constructs a transition matrix only in z space. Next, it performs an eigendecomposition of $Q$ to calculate the spectral gap between neighboring eigenvalues ($\Delta\lambda_{m-1,m}$, where $m$ is the number of states in z space) and maximizes it to improve timescale separation between slow and fast variables.

C. Eigendecomposition

In learning algorithms that use a few eigenvectors of the Markov transition matrix to span z space, a mapping into z space is obtained by solving an eigendecomposition problem:

$$M\psi_k = \lambda_k \psi_k,$$

where $\lambda_k$ and $\psi_k$ are the eigenvalues and corresponding eigenvectors of the Markov transition matrix $M$, respectively. As explained in Sec. II.B, as a result of the existence of the spectral gap between neighboring eigenvalues $\lambda_k$, slow CVs can be approximated by the following truncated mapping:

$$z = (\lambda_1\psi_1, \dots, \lambda_d\psi_d),$$

where $d$ is the dimension of z space. The eigenvalues of the Markov transition matrix $M$ are (sorted in non-ascending order):

$$\lambda_0 = 1 > \lambda_1 \geq \cdots \geq \lambda_N,$$

where the eigenvalue $\lambda_0$ corresponds to the equilibrium distribution of the Markov chain, given by the eigenvector $\psi_0$. The dominant eigenvalues are related to the slowest relaxation timescales in the system[74], while the fast eigenvalues contribute negligibly to slow CVs. In the case of anisotropic diffusion maps, the eigenvalues $\lambda_k$ are related to the eigenvalues of the Fokker–Planck generator $\mu_k$ by $\lambda_k = e^{-\mu_k}$.
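
For illustration, the truncated spectral mapping can be obtained by diagonalizing the Markov matrix from Sec. III.A and keeping the dominant non-trivial eigenpairs; the sketch below uses placeholder data:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Truncated spectral mapping: diagonalize a Markov matrix M (rebuilt here
# as in Sec. III.A on placeholder data) and keep d non-trivial eigenpairs.
X = np.random.normal(size=(500, 3))
eps, alpha, d = 0.5, 0.5, 2

G = np.exp(-cdist(X, X, "sqeuclidean") / eps**2)
rho = G.sum(axis=1)
K = G / np.outer(rho**alpha, rho**alpha)
M = K / K.sum(axis=1, keepdims=True)

evals, evecs = np.linalg.eig(M)          # M is row-stochastic, not symmetric
order = np.argsort(-evals.real)          # sort in non-ascending order
evals, evecs = evals.real[order], evecs.real[:, order]

z = evecs[:, 1:d + 1] * evals[1:d + 1]   # drop the trivial pair (1, psi_0)
```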

Several techniques use the mapping provided by diffusion maps as an initial guess to improve slow CVs iteratively. For instance, the eigenvectors of the Markov transition matrix $M$ can serve as a basis to approximate kinetic quantities such as the transfer operator. This approach was exploited in works by Boninsegna et al.[136], Noé and Clementi[137][138], and more recently by Thiede et al., using a Galerkin approximation[139].

Algorithms that use an eigendecomposition to construct z space require an out-of-sample extension to map samples outside of the dataset. Specifically, for diffusion maps, the Nyström extension[140][141], Laplacian pyramids[117], and geometric harmonics[142][143] interpolators have been used. A detailed analysis of out-of-sample algorithms was published by Bengio et al.[144]

D. Reweighted Stochastic Embedding

Reweighted stochastic embedding (RSE) is a recent framework for the parametric learning of slow CVs, introduced by Rydzewski et al.[127], which employs algorithms to construct unbiased Markov transition matrices with transition reweighting (Sec. III.B), allowing for the estimation of CVs from data collected in enhanced sampling simulations. Building on the work of van der Maaten, Hinton, and Roweis[145][146][147][148], RSE optimizes a loss function to learn the mapping to reduced space. Specifically, it projects samples into z space using a neural network while ensuring that the statistical distance between transition matrices estimated in configuration space and in z space is minimized (Fig. 2).

The first technique of this framework is stochastic kinetic embedding (StKE), proposed by Zhang and Chen[133]. StKE combines modeling a slow manifold with parametric dimensionality reduction, building upon the reweighted anisotropic diffusion kernel. As such, StKE can learn slow CVs from biased data sampled in enhanced sampling simulations. In addition, it uses an iterative procedure incorporating temperature-accelerated MD[149][150] to alleviate the circular dependency[133], allowing the algorithm to be used on the fly in atomistic simulations[133][49][127][151]. Subsequently, Rydzewski and Valsson introduced an RSE technique called multiscale reweighted stochastic embedding (MRSE)[126] that shares similarities with StKE[127][51]. The main difference between StKE and MRSE, as with many methods discussed in this review, boils down to the kernels used to estimate Markov transition matrices. In MRSE, the construction of unbiased transition probabilities from enhanced sampling simulations involves adaptively estimating a kernel in x space based on information-theoretic principles. In contrast, StKE employs a fixed anisotropic diffusion kernel, as used in diffusion maps (see Sec. III.A). This topic is discussed in detail in the review by Rydzewski et al.[51]

RSE builds transition matrices in both x and z spaces (Fig. 2). As with many neural network-based techniques for learning CVs, x space can comprise variables other than the microscopic coordinates, called features or descriptors. The transition matrix $M$ constructed in x space remains constant throughout learning, while the matrix $Q$ in z space is adjusted depending on a neural network that performs the dimensionality reduction, i.e., $f_w(x) = z$. Most generally, in RSE, a weighted Gaussian mixture is used to construct the transition matrix in x space[126]:

$$M(x_k, x_l) \propto \sum_{\varepsilon} \frac{w(x_l)}{\rho^{\alpha}(x_l)}\, G_\varepsilon(x_k, x_l),$$

where the sum goes over scale parameters. In z space, the transition matrix can be given, for example, by a t-distribution kernel[126]:

$$Q(z_k, z_l) \propto \left(1 + \|z_k - z_l\|^2\right)^{-1}.$$

RSE minimizes the Kullback–Leibler divergence[152], which can be interpreted as a "distance" between probability distributions, to learn CVs. Thus, after the training converges, the transition probabilities in the two spaces should be approximately equal. More details about these algorithms can be found in Refs. [127][126].
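
A minimal sketch of this objective is given below; the fixed matrix $M$ is a plain Gaussian-kernel stand-in for the reweighted Gaussian mixture, and all shapes and parameters are illustrative:

```python
import torch

# RSE-style objective: M is a fixed target transition matrix in feature (x)
# space (a placeholder for the reweighted Gaussian mixture of MRSE), and Q is
# recomputed in z space from a t-distribution kernel after each forward pass.
def markov_matrix(kernel):
    K = kernel * (1.0 - torch.eye(len(kernel)))  # exclude self-transitions
    return K / K.sum(dim=1, keepdim=True)

x = torch.randn(256, 10)                                 # placeholder features
M = markov_matrix(torch.exp(-torch.cdist(x, x).pow(2)))  # fixed target matrix

net = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 2))

z = net(x)
Q = markov_matrix((1.0 + torch.cdist(z, z).pow(2)).reciprocal())

eps = 1e-12                                              # avoid log(0)
loss = (M * ((M + eps).log() - (Q + eps).log())).sum()   # KL(M || Q)
loss.backward()
```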

Figure 3. Free-energy landscape of the FiP35 protein constructed from slow CVs learned with spectral map (right). The slow CVs discriminate between the folded state (FS) and the unfolded state (US), which are separated by the transition state (TS) near the free-energy barrier. The most important physical interactions in FiP35, which consists of two β-sheets, identified by spectral map are shown in blue (left). [Figure based on Rydzewski, "Spectral Map for Slow Collective Variables, Markovian Dynamics, and Transition State Ensembles," J. Chem. Theory Comput. (2024). Copyright 2024 Author, licensed under Creative Commons Attribution 4.0.]

E. Spectral Map

The first technique devised to maximize timescale separation to find CVs in complex systems was proposed by Tiwary and Berne[153] and subsequently expanded[154][155][156][157]. Their technique, called spectral gap optimization of order parameters (SGOOP), is based on constructing a transition matrix using the principles of the maximum caliber framework[158]. As opposed to the techniques reviewed here, SGOOP explicitly uses time information to construct slow CVs.

A recent unsupervised statistical learning technique for learning slow CVs that is also based on maximizing timescale separation is spectral map, developed by Rydzewski[159]. The reduced dynamics is modeled as overdamped Langevin diffusion in z space[160]. Spectral map proceeds by mapping the dynamics into z space using a neural network and constructing a Markov transition matrix by row-normalizing the anisotropic diffusion kernel (Sec. III.A), but estimated from data in z space:

$$Q(z_k, z_l) = \frac{K(z_k, z_l)}{\sum_m K(z_k, z_m)},$$

where $z = f_w(x)$ is given by the neural network with learnable parameters $w$. The transition matrix is then spectrally decomposed to estimate the degree of timescale separation from the spectral gap in its eigenspectrum. The spectral gap is used as a score function for the neural network and maximized during learning:

$$\Delta\lambda_{m-1,m}(Q) = \lambda_{m-1} - \lambda_m,$$

where $\lambda_k$ are the eigenvalues of $Q$ sorted in decreasing order and $m$ is the number of metastable states in the system. As the spectral gap is maximized, z space is adjusted accordingly by improving the parameters of the neural network. In the end, z space corresponds to slow CVs. A simplified diagram of spectral map and a comparison to RSE are given in Fig. 2.
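
A minimal sketch of this score function is shown below, with $\alpha = 1/2$ and placeholder data; note that gradients through the eigendecomposition are well-defined only for distinct eigenvalues:

```python
import torch

# Spectral-map-style score: build a row-stochastic matrix Q from an
# anisotropic Gaussian kernel in z space (alpha = 1/2) and maximize the
# gap between neighboring eigenvalues for a system with m states.
def spectral_gap(z, eps=1.0, m=2):
    G = torch.exp(-torch.cdist(z, z).pow(2) / eps**2)
    rho = G.sum(dim=1)
    K = G / torch.outer(rho, rho).sqrt()      # anisotropic kernel, alpha = 1/2
    Q = K / K.sum(dim=1, keepdim=True)
    lam = torch.linalg.eigvals(Q).real        # Q is not symmetric in general
    lam, _ = torch.sort(lam, descending=True)
    return lam[m - 1] - lam[m]                # spectral gap for m states

net = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 1))
x = torch.randn(256, 10)                      # illustrative features

loss = -spectral_gap(net(x))                  # maximize the gap
loss.backward()
```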

Rydzewski and Gökdemir[161] showed that maximizing timescale separation in spectral map results in dynamics in z space with Markovian characteristics. In their work, it was shown that it is possible to construct a high-quality Markov state model based on the learned slow CVs and to estimate kinetics accurately. In another work, Rydzewski showed that the framework can easily be extended to learning transition state ensembles[160] (Fig. 3), which is demanding for complex systems due to the scarcity of transitions between states[162][163][164].

Using the transition state ensemble to count transition paths[165], Rydzewski[160] showed that a slow CV learned by spectral map closely approaches a Markovian limit for overdamped Langevin dynamics[59]. Moreover, it was illustrated that spectral map can assess the quality of reduced representations built from commonly used physical descriptors by comparing their spectral gaps. It was also demonstrated that spectral map can construct interpretable reaction coordinates for protein folding with a linear model instead of a deep neural network, and that these coordinates are slower than the fraction of native contacts or the end-to-end distance[160].

F. Enhanced Sampling via Neural Networks

After the training procedure, a neural network representing CVs can be used for the purposes of enhanced sampling. To bias such a neural network, a biasing force must be applied in CV space. This force is equal to the negative derivative of the bias potential with respect to the CVs and can be evaluated using the chain rule:

$$F(x) = -\frac{\mathrm{d}V(z)}{\mathrm{d}z}\,\nabla_x f(x),$$

where the second term on the right-hand side is computed automatically through backpropagation. By accumulating the bias potential in CV space, the neural network can be used to push the system out of local minima. Such CVs, in the form of a neural network, can be integrated into several advanced MD simulation codes, such as PLUMED[166][167][168][169][170].
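
For illustration, the chain rule above is obtained automatically by backpropagating a bias potential through the network; in the sketch below, both the network and the harmonic bias are stand-ins:

```python
import torch

# The biasing force F(x) = -dV/dz * grad_x f(x), obtained via autograd.
# The network f and the harmonic bias V are illustrative stand-ins.
f = torch.nn.Sequential(torch.nn.Linear(9, 32), torch.nn.Tanh(),
                        torch.nn.Linear(32, 1))

x = torch.randn(9, requires_grad=True)  # microscopic coordinates (flattened)
z = f(x)                                # CV value

V = 0.5 * (z - 1.0).pow(2).sum()        # harmonic bias centered at z = 1
V.backward()                            # x.grad now holds dV/dz * grad_x f(x)

force = -x.grad                         # biasing force on the coordinates
```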

We want to underline that there may be additional requirements for slow CVs represented by a neural network (not limited to spatial techniques). An often overlooked issue that can harm the convergence of biasing methods is that the neural network may learn a function where $\nabla_x f(x) \approx 0$ in basins. According to Darve et al.[81], biasing a CV can be imagined in terms of an object that is pulled or pushed, where the CV has a "mass" attached to it that is related to the inverse of $\nabla_x f(x)$. Consequently, applying bias to neural networks with $\nabla_x f(x) \approx 0$ in energy minima may be inefficient due to the large mass and can lead to numerical stability issues in MD simulations.

IV. Summary

Overall, we think further research in spatial techniques will proceed by carefully incorporating more thermodynamic information into ML. Due to rapid developments in physics-informed algorithms, we expect that the primary effort will be directed toward solving the problem of constructing interpretable and explainable reaction coordinates for complex systems in chemical physics.

To address this issue, we can examine the theoretical progress in modeling slow dynamics in the context of timescale separation in CV space[171][172][173][174]. By investigating slow dynamics using overdamped Langevin dynamics in a free-energy landscape with configuration-dependent diffusion coefficients, we can propose a Markovian interpretation of the physical process. The diffusion tensors, which depend on the coordinates, are important for reduced dynamics and can impact the free-energy landscape by altering transition states and barrier heights[175][176][177][178][179]. To account for this in spatial techniques, we can incorporate information about them in anisotropic kernels and transition matrices. Additionally, analyzing spatial techniques from the perspective of spectral graph theory[101], especially the long-term behavior of Markov chains, the asymptotic rate of convergence to equilibrium, and mixing rates[180][181], can lead to improvements.

For spatial techniques to learn from enhanced sampling simulations, slow CVs should be computed using unbiased Markov matrices through a transition reweighting algorithm, such as those presented in the review, to capture equilibrium information accurately. It would be interesting to explore the relationship between the reweighting of Markov transition matrices and dynamical path reweighting, for example, based on the Girsanov theorem[182][183][184]. To improve sampling and drive it toward complex physical processes, spatial techniques can be extended with a general iterative learning-sampling framework where rounds of learning slow CVs (including reweighting) are followed by biasing using an enhanced sampling technique. Such iterative approaches have already been implemented using ML to learn from MD simulations[92][185][186][133][187][93][188][169][189].

Finally, we underline that apart from spatial techniques, many others can be used to study complex processes in the fields of chemical physics and MD[190][191][192][193][194][195][196][197][198][199][200][201][202][203][204][205][206][207][208][209][210][211][212]. A detailed introduction to such ML methods can be found in recent reviews[40][41][42][43][44][45][46][47][48][49][50][51][52]. We think, however, that recent results in spatial techniques for learning slow CVs are worthy of further development and could provide a valuable alternative to temporal techniques for understanding the physics of complex systems.

Statements and Declarations

Acknowledgements

The research was supported by the National Science Center in Poland (Sonata 2021/43/D/ST4/00920, "Statistical Learning of Slow Collective Variables from Atomistic Simulations"). J. R. acknowledges funding from the Ministry of Science and Higher Education in Poland. The authors acknowledge insightful feedback from Haochuan Chen, Luke Evans, Luigi Bonati, and Omar Valsson.

Conflicts of Interest

The authors have no conflicts to disclose.

Author Contributions

Tuğçe Gökdemir: Conceptualization (equal); Supervision (equal); Writing - original draft (equal); Writing - review & editing (equal). Jakub Rydzewski: Conceptualization (equal); Supervision (equal); Writing - original draft (equal); Writing - review & editing (equal).

Data Availability

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

References

  1. ^Chandler D. Introduction to Modern Statistical Mechanics. Oxford University Press, Oxford, UK; 1987.
  2. ^Lelièvre T, Rousset M, Stoltz G (2010). Free Energy Computations: A Mathematical Perspective. Imperial College Press.
  3. ^Frenkel D, Smit B. Understanding Molecular Simulation: From Algorithms to Applications. 3rd ed. Academic Press; 2023.
  4. abRogal J (2021). "Reaction Coordinates in Complex Systems – A Perspective". Eur. Phys. J. B. 94: 1–9. doi:10.1140/epjb/s10051-021-00233-5.
  5. abBolhuis PG, Chandler D, Dellago C, Geissler PL (2002). "Transition Path Sampling: Throwing Ropes over Rough Mountain Passes, in the Dark". Annu. Rev. Phys. Chem.. 53: 291–318. doi:10.1146/annurev.physchem.53.082301.113146.
  6. abAbrams C, Bussi G (2014). "Enhanced Sampling in Molecular Dynamics using Metadynamics, Replica-Exchange, and Temperature-Acceleration". Entropy. 16: 163–199. doi:10.3390/e16010163.
  7. abPietrucci F (2017). "Strategies for the Exploration of Free Energy Landscapes: Unity in Diversity and Challenges Ahead". Rev. Phys.. 2: 32–45. doi:10.1016/j.revip.2017.05.001.
  8. abValsson O, Tiwary P, Parrinello M (2016). "Enhancing Important Fluctuations: Rare Events and Metadynamics from a Conceptual Viewpoint". Annu. Rev. Phys. Chem.. 67: 159–184. doi:10.1146/annurev-physchem-040215-112229.
  9. abYang YI, Shao Q, Zhang J, Yang L, Gao YQ (2019). "Enhanced Sampling in Molecular Dynamics". J. Chem. Phys.. 151: 070902. doi:10.1063/1.5109531.
  10. abcBussi G, Laio A (2020). "Using Metadynamics to Explore Complex Free-Energy Landscapes". Nat. Rev. Phys.. 2: 200–212. doi:10.1038/s42254-020-0153-0.
  11. abHénin J, Lelièvre T, Shirts MR, Valsson O, Delemotte L (2022). "Enhanced Sampling Methods for Molecular Dynamics Simulations [Article v1.0]". Living Journal of Computational Molecular Science. 4: 1583. doi:10.33011/livecoms.4.1.1583.
  12. ^Chen H, Chipot C (2022). "Enhancing Sampling with Free-Energy Calculations". Curr. Opin. Struct. Biol.. 77: 102497. doi:10.1016/j.sbi.2022.102497.
  13. ^Ray D, Parrinello M (2023). "Kinetics from Metadynamics: Principles, Applications, and Outlook". J. Chem. Theory Comput.. 19: 5649–5670. doi:10.1021/acs.jctc.3c00660.
  14. ^Geissler PL, Dellago C, Chandler D (1999). "Kinetic Pathways of Ion Pair Dissociation in Water". J. Phys. Chem. B. 103: 3706–3710. doi:10.1021/jp984837g.
  15. ^Bolhuis PG, Dellago C, Chandler D (2000). "Reaction Coordinates of Biomolecular Isomerization". Proc. Natl. Acad. Sci. U.S.A.. 97: 5877–5882. doi:10.1073/pnas.100127697.
  16. ^Ma A, Dinner AR (2005). "Automatic Method for Identifying Reaction Coordinates in Complex Systems". J. Phys. Chem. B. 109: 6769–6779. doi:10.1021/jp045546c.
  17. ^Hänggi P, Talkner P, Borkovec M (1990). "Reaction-Rate Theory: Fifty Years after Kramers". Rev. Mod. Phys.. 62: 251. doi:10.1103/revmodphys.62.251.
  18. ^Shaw DE, Maragakis P, Lindorff-Larsen K, Piana S, Dror RO, Eastwood MP, Bank JA, Jumper JM, Salmon JK, Shan Y (2010). "Atomic-Level Characterization of the Structural Dynamics of Proteins". Science. 330: 341–346. doi:10.1126/science.1187409.
  19. ^Lindorff-Larsen K, Piana S, Dror RO, Shaw DE (2011). "How Fast-Folding Proteins Fold". Science. 334: 517–520. doi:10.1126/science.1208351.
  20. ^Neha, Tiwari V, Mondal S, Kumari N, Karmakar T (2023). "Collective Variables for Crystallization Simulations – from Early Developments to Recent Advances". ACS Omega. 8: 127–146. doi:10.1021/acsomega.2c06310.
  21. ^Beyerle ER, Zou Z, Tiwary P (2023). "Recent Advances in Describing and Driving Crystal Nucleation using Machine Learning and Artificial Intelligence". Curr. Opin. Solid State Mater. Sci.. 27: 101093. doi:10.1016/j.cossms.2023.101093.
  22. ^Berthier L, Biroli G (2011). "Theoretical Perspective on the Glass Transition and Amorphous Materials". Rev. Mod. Phys.. 83: 587. doi:10.1103/revmodphys.83.587.
  23. ^Hohenberg PC, Krekhov AP (2015). "An Introduction to the Ginzburg--Landau Theory of Phase Transitions and Nonequilibrium Patterns". Phys. Rep.. 572: 1–42. doi:10.1016/j.physrep.2015.01.001.
  24. ^Banerjee D, Azizi K, Egan CK, Donkor ED, Malosso C, Pino SD, Mirón GD, Stella M, Sormani G, Hozana GN, Monti M, Morzan UN, Rodriguez A, Cassone G, Jelic A, Scherlis D, Hassanali A (2024). "Aqueous Solution Chemistry in Silico and the Role of Data-Driven Approaches". Chem. Phys. Rev.. 5: 021308. doi:10.1063/5.0207567.
  25. ^Piccini G, Lee M-S, Yuk SF, Zhang D, Collinge G, Kollias L, Nguyen M-T, Glezakou V-A, Rousseau R (2022). "Ab Initio Molecular Dynamics with Enhanced Sampling in Heterogeneous Catalysis". Catal. Sci. Technol.. 12: 12–37. doi:10.1039/D1CY01329G.
  26. ^Baron R, McCammon JA (2013). "Molecular Recognition and Ligand Association". Annu. Rev. Phys. Chem.. 64: 151–175. doi:10.1146/annurev-physchem-040412-110047.
  27. ^Chipot C (2014). "Frontiers in Free-Energy Calculations of Biological Systems". Wiley Interdiscip. Rev. Comput. Mol. Sci.. 4: 71–89. doi:10.1002/wcms.1157.
  28. ^Rydzewski J, Nowak W (2017). "Ligand Diffusion in Proteins via Enhanced Sampling in Molecular Dynamics". Phys. Life Rev.. 22: 58–74. doi:10.1016/j.plrev.2017.03.003.
  29. ^Bernetti M, Masetti M, Rocchia W, Cavalli A (2019). "Kinetics of Drug Binding and Residence Time". Annu. Rev. Phys. Chem.. 70: 143–171. doi:10.1146/annurev-physchem-042018-052340.
  30. ^Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011). "Scikit-learn: Machine Learning in Python". J. Mach. Learn. Res.. 12: 2825–2830. https://www.jmlr.org/papers/v12/pedregosa11a.html.
  31. ^Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, et al. (2019). "PyTorch: An Imperative Style, High-Performance Deep Learning Library". Adv. Neural Inf. Process. Syst.. 32: 8026–8037. https://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.
  32. ^Hastie T, Tibshirani R, Friedman JH, Friedman JH. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer; 2009.
  33. ^Fukunaga K. Introduction to Statistical Pattern Recognition. 2nd ed. Elsevier; 2013.
  34. ^Bengio Y, Courville A, Vincent P (2013). "Representation Learning: A Review and New Perspectives". IEEE Trans. Pattern Anal. Mach. Intell.. 35: 1798–1828. doi:10.1109/tpami.2013.50.
  35. ^Xie J, Gao R, Nijkamp E, Zhu S-C, Wu YN (2020). "Representation Learning: A Statistical Perspective". Annu. Rev. Stat. Appl.. 7: 303–335. doi:10.1146/annurev-statistics-031219-041131.
  36. ^Karniadakis GE, Kevrekidis IG, Lu L, Perdikaris P, Wang S, Yang L (2021). "Physics-Informed Machine Learning". Nat. Rev. Phys.. 3: 422–440. doi:10.1038/s42254-021-00314-5.
  37. ^Meilă M, Zhang H (2024). "Manifold Learning: What, How, and Why". Annu. Rev. Stat. Appl.. 11: 393–417. doi:10.1146/annurev-statistics-040522-115238.
  38. ^Glielmo A, Husic BE, Rodriguez A, Clementi C, Noé F, Laio A (2021). "Unsupervised Learning Methods for Molecular Simulation Data". Chem. Rev.. 121: 9722–9758. doi:10.1021/acs.chemrev.0c01195.
  39. ^Brunton SL, Budišić M, Kaiser E, Kutz JN (2022). "Modern Koopman Theory for Dynamical Systems". SIAM Rev.. 64: 229–340. doi:10.1137/21m1401243.
  40. abcRohrdanz MA, Zheng W, Clementi C (2013). "Discovering Mountain Passes via Torchlight: Methods for the Definition of Reaction Coordinates and Pathways in Complex Macromolecular Reactions". Annu. Rev. Phys. Chem.. 64: 295–316. doi:10.1146/annurev-physchem-040412-110006.
  41. abLi W, Ma A (2014). "Recent Developments in Methods for Identifying Reaction Coordinates". Mol. Sim.. 40: 784–793. doi:10.1080/08927022.2014.907898.
  42. abPeters B (2016). "Reaction Coordinates and Mechanistic Hypothesis Tests". Annu. Rev. Phys. Chem.. 67: 669–690. doi:10.1146/annurev-physchem-040215-112215.
  43. abNoé F, Clementi C (2017). "Collective Variables for the Study of Long-Time Kinetics from Molecular Trajectories: Theory and Methods". Curr. Opin. Struct. Biol.. 43: 141–147. doi:10.1016/j.sbi.2017.02.006.
  44. abCeriotti M (2019). "Unsupervised Machine Learning in Atomistic Simulations, between Predictions and Understanding". J. Chem. Phys.. 150: 150901. doi:10.1063/1.5091842.
  45. abcdWang Y, Ribeiro JML, Tiwary P (2020). "Machine Learning Approaches for Analyzing and Enhancing Molecular Dynamics Simulations". Curr. Opin. Struct. Biol.. 61: 139–145. doi:10.1016/j.sbi.2019.12.016.
  46. abcWu H, Noé F (2020). "Variational Approach for Learning Markov Processes from Time Series Data". J. Nonlinear Sci.. 30: 23–66. doi:10.1007/s00332-019-09567-y.
  47. abKlus S, Nüske F, Koltai P, Wu H, Kevrekidis I, Schütte C, Noé F (2018). "Data-Driven Model Reduction and Transfer Operator Approximation". J. Nonlinear Sci.. 28: 985–1010. doi:10.1007/s00332-017-9437-7.
  48. abSidky H, Chen W, Ferguson AL (2020). "Machine Learning for Collective Variable Discovery and Enhanced Sampling in Biomolecular Simulation". Mol. Phys.. 118: e1737742. doi:10.1080/00268976.2020.1737742.
  49. abcdChen M (2021). "Collective Variable-Based Enhanced Sampling and Machine Learning". Eur. Phys. J. B. 94: 1–17. doi:10.1140/epjb/s10051-021-00220-w.
  50. abcdChen H, Chipot C (2023). "Chasing Collective Variables using Temporal Data-Driven Strategies". QRB Discovery. 4: e2. doi:10.1017/qrd.2022.23.
  51. abcdefRydzewski J, Chen M, Valsson O (2023). "Manifold Learning in Atomistic Simulations: A Conceptual Review". Mach. Learn.: Sci. Technol.. 4: 031001. doi:10.1088/2632-2153/ace81a.
  52. abcMehdi S, Smith Z, Herron L, Zou Z, Tiwary P (2024). "Enhanced Sampling with Machine Learning". Annu. Rev. Phys. Chem.. 75: 347–370. doi:10.1146/annurev-physchem-083122-125941.
  53. ^Zwanzig R. Nonequilibrium Statistical Mechanics. 1st ed. Oxford University Press; 2001. doi:10.1093/oso/9780195140187.001.0001.
  54. ^Peters B, Bolhuis PG, Mullen RG, Shea J-E (2013). "Reaction Coordinates, One-Dimensional Smoluchowski Equations, and a Test for Dynamical Self-Consistency". J. Chem. Phys.. 138: 054106. doi:10.1063/1.4775807.
  55. ^Hummer G (2004). "From Transition Paths to Transition States and Rate Coefficients". J. Chem. Phys.. 120: 516–523. doi:10.1063/1.1630572.
  56. ^Coifman RR, Kevrekidis IG, Lafon S, Maggioni M, Nadler B (2008). "Diffusion Maps, Reduction Coordinates, and Low Dimensional Representation of Stochastic Systems". Multiscale Model. Simul.. 7: 842–864. doi:10.1137/070696325.
  57. ^Berezhkovskii A, Szabo A (2005). "One-Dimensional Reaction Coordinates for Diffusive Activated Rate Processes in Many Dimensions". J. Chem. Phys.. 122: 014503. doi:10.1063/1.1818091.
  58. ^Berezhkovskii A, Szabo A (2011). "Time Scale Separation Leads to Position-Dependent Diffusion along a Slow Coordinate". J. Chem. Phys.. 135: 074108. doi:10.1063/1.3626215.
  59. abBerezhkovskii AM, Makarov DE (2018). "Single-Molecule Test for Markovianity of the Dynamics along a Reaction Coordinate". J. Phys. Chem. Lett.. 9: 2190–2195. doi:10.1021/acs.jpclett.8b00956.
  60. abcTorrie GM, Valleau JP (1977). "Nonphysical Sampling Distributions in Monte Carlo Free-Energy Estimation: Umbrella Sampling". J. Comp. Phys.. 23: 187–199. doi:10.1016/0021-9991(77)90121-8.
  61. ^Mezei M (1987). "Adaptive Umbrella Sampling: Self-Consistent Determination of the Non-Boltzmann Bias". J. Comput. Phys.. 68: 237–248. doi:10.1016/0021-9991(87)90054-4.
  62. ^Maragakis P, van der Vaart A, Karplus M (2009). "Gaussian-Mixture Umbrella Sampling". J. Phys. Chem. B. 113: 4664–4673. doi:10.1021/jp808381s.
  63. ^Kästner J (2011). "Umbrella Sampling". Wiley Interdiscip. Rev. Comput. Mol. Sci.. 1: 932–942. doi:10.1002/wcms.66.
  64. abLaio A, Parrinello M (2002). "Escaping Free-Energy Minima". Proc. Natl. Acad. Sci. U.S.A.. 99: 12562–12566. doi:10.1073/pnas.202427399.
  65. abcBarducci A, Bussi G, Parrinello M (2008). "Well-Tempered Metadynamics: A Smoothly Converging and Tunable Free-Energy Method". Phys. Rev. Lett.. 100: 020603. doi:10.1103/PhysRevLett.100.020603.
  66. ^Invernizzi M, Parrinello M (2020). "Rethinking Metadynamics: From Bias Potentials to Probability Distributions". J. Phys. Chem. Lett.. 11: 2731–2736. doi:10.1021/acs.jpclett.0c00497.
  67. ^Invernizzi M, Piaggi PM, Parrinello M (2020). "Unified Approach to Enhanced Sampling". Phys. Rev. X. 10: 041034. doi:10.1103/PhysRevX.10.041034.
  68. abcValsson O, Parrinello M (2014). "Variational Approach to Enhanced Sampling and Free Energy Calculations". Phys. Rev. Lett.. 113: 090601. doi:10.1103/PhysRevLett.113.090601.
  69. abYang YI, Parrinello M (2018). "Refining Collective Coordinates and Improving Free Energy Representation in Variational Enhanced Sampling". J. Chem. Theory Comput.. 14: 2889–2894. doi:10.1021/acs.jctc.8b00231.
  70. ^Bonati L, Zhang Y-Y, Parrinello M (2019). "Neural Networks-Based Variationally Enhanced Sampling". Proc. Natl. Acad. Sci. U.S.A.. 116: 17641–17647. doi:10.1073/pnas.1907975116.
  71. ^Roux B (2022). "Transition Rate Theory, Spectral Analysis, and Reactive Paths". J. Chem. Phys.. 156: 134111. doi:10.1063/5.0084209.
  72. ^Shuler KE (1959). "Relaxation Processes in Multistate Systems". Phys. Fluids. 2: 442–448. doi:10.1063/1.1724416.
  73. ^Risken H (1996). Fokker–Planck Equation. Springer.
  74. abBovier A, Eckhoff M, Gayrard V, Klein M (2002). "Metastability and Low Lying Spectra in Reversible Markov Chains". Commun. Math. Phys.. 228: 219–255. doi:10.1007/s002200200609.
  75. ^Gaveau B, Schulman LS (1996). "Master Equation based Formulation of Nonequilibrium Statistical Mechanics". J. Math. Phys.. 37: 3897–3932. doi:10.1063/1.531608.
  76. ^Gaveau B, Schulman LS (1998). "Theory of Nonequilibrium First-Order Phase Transitions for Stochastic Dynamics". J. Math. Phys.. 39: 1517–1533. doi:10.1063/1.532394.
  77. ^Davies EB (1983). "Spectral Properties of Metastable Markov Semigroups". J. Funct. Anal.. 52: 315–329.
  78. ^Davies EB (1982). "Metastable States of Symmetric Markov Semigroups I". Proc. London Math. Soc.. 3: 133–150.
  79. ^Davies EB (1982). "Metastable States of Symmetric Markov Semigroups II". J. London Math. Soc.. 2: 541–556.
  80. ^Bovier A, Den Hollander F. Metastability: A Potential-Theoretic Approach. Vol. 351. Springer; 2016.
  81. abDarve E, Pohorille A (2001). "Calculating Free Energies using Average Force". J. Chem. Phys.. 115: 9169. doi:10.1063/1.1410978.
  82. ^Rosso L, Mináry P, Zhu Z, Tuckerman ME (2002). "On the Use of the Adiabatic Molecular Dynamics Technique in the Calculation of Free Energy Profiles". J. Chem. Phys.. 116: 4389–4402. doi:10.1063/1.1448491.
  83. ^Morishita T, Itoh SG, Okumura H, Mikami M (2012). "Free-Energy Calculation via Mean-Force Dynamics using a Logarithmic Energy Landscape". Phys. Rev. E. 85: 066702. doi:10.1103/PhysRevE.85.066702.
  84. abcGiberti F, Cheng B, Tribello GA, Ceriotti M (2020). "Iterative Unbiasing of Quasi-Equilibrium Sampling". J. Chem. Theory Comput.. 16: 100–107. doi:10.1021/acs.jctc.9b00907.
  85. abTiwary P, Parrinello M (2015). "A Time-Independent Free Energy Estimator for Metadynamics". J. Phys. Chem. B. 119: 736–742. doi:10.1021/jp504920s.
  86. abBonomi M, Barducci A, Parrinello M (2009). "Reconstructing the Equilibrium Boltzmann Distribution from Well-Tempered Metadynamics". J. Comput. Chem.. 30: 1615–1621. doi:10.1002/jcc.21305.
  87. abSchäfer TM, Settanni G (2020). "Data Reweighting in Metadynamics Simulations". J. Chem. Theory Comput.. 16: 2042–2052. doi:10.1021/acs.jctc.9b00867.
  88. ^Linker SM, Weiß RG, Riniker S (2020). "Connecting Dynamic Reweighting Algorithms: Derivation of the Dynamic Reweighting Family Tree". J. Chem. Phys.. 153: 234106. doi:10.1063/5.0019687.
  89. abKamenik AS, Linker SM, Riniker S (2022). "Enhanced Sampling without Borders: On Global Biasing Functions and how to Reweight them". Phys. Chem. Chem. Phys.. 24: 1225–1236. doi:10.1039/D1CP04809K.
  90. ^Pérez-Hernández G, Paul F, Giorgino T, De Fabritiis G, Noé F (2013). "Identification of Slow Molecular Order Parameters for Markov Model Construction". J. Chem. Phys.. 139: 015102. doi:10.1063/1.4811489.
  91. ^Wehmeyer C, Noé F (2018). "Time-Lagged Autoencoders: Deep Learning of Slow Collective Variables for Molecular Kinetics". J. Chem. Phys.. 148: 241703. doi:10.1063/1.5011399.
  92. abMcCarty J, Parrinello M (2017). "A Variational Conformational Dynamics Approach to the Selection of Collective Variables in Metadynamics". J. Chem. Phys.. 147: 204109. doi:10.1063/1.4998598.
  93. abBonati L, Piccini G, Parrinello M (2021). "Deep Learning the Slow Modes for Rare Events Sampling". Proc. Natl. Acad. Sci. U.S.A.. 118: e2113533118. doi:10.1073/pnas.2113533118.
  94. ^Mardt A, Pasquali L, Wu H, Noé F (2018). "VAMPnets for Deep Learning of Molecular Kinetics". Nat. Commun.. 9: 5. doi:10.1038/s41467-017-02388-1.
  95. ^Chen W, Sidky H, Ferguson A (2019). "Nonlinear Discovery of Slow Molecular Modes using State-Free Reversible VAMPnets". J. Chem. Phys.. 150: 214114. doi:10.1063/1.5092521.
  96. ^Shi J, Malik J (2000). "Normalized Cuts and Image Segmentation". IEEE Trans. Pattern Anal. Mach. Intell.. 22: 888–905. doi:10.1109/cvpr.1997.609407.
  97. abBelkin M, Niyogi P (2001). "Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering". Adv. Neural Inf. Process. Syst.. 14: 585–591. doi:10.5555/2980539.2980616.
  98. abBelkin M, Niyogi P (2003). "Laplacian Eigenmaps for Dimensionality Reduction and Data Representation". Neural Comput.. 15: 1373–1396. doi:10.1162/089976603321780317.
  99. abBelkin M, Niyogi P (2004). "Semi-Supervised Learning on Riemannian Manifolds". Mach. Learn.. 56: 209–239. doi:10.1023/b:mach.0000033120.25363.1e.
  100. abBelkin M, Niyogi P (2008). "Towards a Theoretical Foundation for Laplacian-based Manifold Methods". J. Comput. Syst. Sci.. 74: 1289–1308. doi:10.1016/j.jcss.2007.08.006.
101. Chung FRK. Spectral Graph Theory. American Mathematical Society; 1997. (CBMS Regional Conference Series in Mathematics; vol. 92).
102. Schölkopf B, Smola A, Müller K-R (1998). "Nonlinear Component Analysis as a Kernel Eigenvalue Problem". Neural Comput.. 10: 1299–1319. doi:10.1162/089976698300017467.
103. Szummer M, Jaakkola T (2001). "Partially Labeled Classification with Markov Random Walks". Adv. Neural Inf. Process. Syst.. 14: 945–952. Available from: http://papers.neurips.cc/paper/1967-partially-labeled-classification-with-markov-random-walks.pdf.
104. Kondor RI, Lafferty JD (2002). "Diffusion Kernels on Graphs and Other Discrete Input Spaces". Proc. ICML. 315–322. Available from: https://www.ml.cmu.edu/research/dap-papers/kondor-diffusion-kernels.pdf.
105. Izenman AJ (2012). "Introduction to Manifold Learning". Wiley Interdiscip. Rev. Comput. Stat.. 4: 439–446. doi:10.1002/wics.1222.
106. Coifman RR, Lafon S, Lee AB, Maggioni M, Nadler B, Warner F, Zucker SW (2005). "Geometric Diffusions as a Tool for Harmonic Analysis and Structure Definition of Data: Diffusion Maps". Proc. Natl. Acad. Sci. U.S.A.. 102: 7426–7431. doi:10.1073/pnas.0500334102.
107. Nadler B, Lafon S, Coifman RR, Kevrekidis IG (2006). "Diffusion Maps, Spectral Clustering and Reaction Coordinates of Dynamical Systems". Appl. Comput. Harmon. Anal.. 21: 113–127. doi:10.1016/j.acha.2005.07.004.
108. Nadler B, Lafon S, Kevrekidis I, Coifman R (2006). "Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators". Adv. Neural Inf. Process. Syst.. 18: 955–962. Available from: https://proceedings.neurips.cc/paper/2005/file/2a0f97f81755e2878b264adf39cba68e-Paper.pdf.
109. Nadler B, Galun M (2006). "Fundamental Limitations of Spectral Clustering". Adv. Neural Inf. Process. Syst.. 19: 1017–1024. doi:10.7551/mitpress/7503.003.0132.
110. Coifman RR, Lafon S (2006). "Diffusion Maps". Appl. Comput. Harmon. Anal.. 21: 5–30. doi:10.1016/j.acha.2006.04.006.
111. Zelnik-Manor L, Perona P (2004). "Self-Tuning Spectral Clustering". Adv. Neural Inf. Process. Syst.. 17: 1601–1608. Available from: http://papers.neurips.cc/paper/2619-self-tuning-spectral-clustering.pdf.
112. Rohrdanz MA, Zheng W, Maggioni M, Clementi C (2011). "Determination of Reaction Coordinates via Locally Scaled Diffusion Map". J. Chem. Phys.. 134: 124116. doi:10.1063/1.3569857.
113. Zheng W, Rohrdanz MA, Clementi C (2013). "Rapid Exploration of Configuration Space with Diffusion-Map-Directed Molecular Dynamics". J. Phys. Chem. B. 117: 12769–12776. doi:10.1021/jp401911h.
114. Zheng W, Vargiu AV, Rohrdanz MA, Carloni P, Clementi C (2013). "Molecular Recognition of DNA by Ligands: Roughness and Complexity of the Free Energy Profile". J. Chem. Phys.. 139: 145102. doi:10.1063/1.4824106.
115. Berry T, Giannakis D, Harlim J (2015). "Nonparametric Forecasting of Low-Dimensional Dynamical Systems". Phys. Rev. E. 91: 032915. doi:10.1103/PhysRevE.91.032915.
116. Berry T, Harlim J (2016). "Variable Bandwidth Diffusion Kernels". Appl. Comput. Harmon. Anal.. 40: 68–96. doi:10.1016/j.acha.2015.01.001.
117. Dsilva CJ, Talmon R, Rabin N, Coifman RR, Kevrekidis IG (2013). "Nonlinear Intrinsic Variables and State Reconstruction in Multiscale Simulations". J. Chem. Phys.. 139: 184109. doi:10.1063/1.4828457.
118. Dsilva CJ, Talmon R, Gear CW, Coifman RR, Kevrekidis IG (2016). "Data-Driven Reduction for a Class of Multiscale Fast-Slow Stochastic Dynamical Systems". SIAM J. Appl. Dyn. Syst.. 15: 1327–1351. doi:10.1137/151004896.
119. Singer A, Erban R, Kevrekidis IG, Coifman RR (2009). "Detecting Intrinsic Slow Variables in Stochastic Dynamical Systems by Anisotropic Diffusion Maps". Proc. Natl. Acad. Sci. U.S.A.. 106: 16090–16095. doi:10.1073/pnas.0905547106.
120. Singer A, Coifman RR (2008). "Non-Linear Independent Component Analysis with Diffusion Maps". Appl. Comput. Harmon. Anal.. 25: 226–239. doi:10.1016/j.acha.2007.11.001.
121. Berry T, Sauer T (2016). "Local Kernels and the Geometric Structure of Data". Appl. Comput. Harmon. Anal.. 40: 439–469. doi:10.1016/j.acha.2015.03.002.
122. Mugnai ML, Elber R (2015). "Extracting the Diffusion Tensor from Molecular Dynamics Simulation with Milestoning". J. Chem. Phys.. 142: 014105. doi:10.1063/1.4904882.
123. Domingues TS, Coifman R, Haji-Akbari A (2024). "Estimating Position-Dependent and Anisotropic Diffusivity Tensors from Molecular Dynamics Trajectories: Existing Methods and Future Outlook". J. Chem. Theory Comput.. 20: 4427–4455. doi:10.1021/acs.jctc.4c00148.
124. Berry T, Cressman JR, Greguric-Ferencek Z, Sauer T (2013). "Time-Scale Separation from Diffusion-Mapped Delay Coordinates". SIAM J. Appl. Dyn. Syst.. 12: 618–649. doi:10.1137/12088183x.
125. Lafon S, Lee AB (2006). "Diffusion Maps and Coarse-Graining: A Unified Framework for Dimensionality Reduction, Graph Partitioning, and Data Set Parameterization". IEEE Trans. Pattern Anal. Mach. Intell.. 28: 1393–1403. doi:10.1109/tpami.2006.184.
126. Rydzewski J, Valsson O (2021). "Multiscale Reweighted Stochastic Embedding: Deep Learning of Collective Variables for Enhanced Sampling". J. Phys. Chem. A. 125: 6286–6302. doi:10.1021/acs.jpca.1c02869.
127. Rydzewski J, Chen M, Ghosh TK, Valsson O (2022). "Reweighted Manifold Learning of Collective Variables from Enhanced Sampling Simulations". J. Chem. Theory Comput.. 18: 7179–7192. doi:10.1021/acs.jctc.2c00873.
128. Rydzewski J, Nowak W (2016). "Machine Learning Based Dimensionality Reduction Facilitates Ligand Diffusion Paths Assessment: A Case of Cytochrome P450cam". J. Chem. Theory Comput.. 12: 2110–2120. doi:10.1021/acs.jctc.6b00212.
129. Chiavazzo E, Covino R, Coifman RR, Gear CW, Georgiou AS, Hummer G, Kevrekidis IG (2017). "Intrinsic Map Dynamics Exploration for Uncharted Effective Free-Energy Landscapes". Proc. Natl. Acad. Sci. U.S.A.. 114: E5494–E5503. doi:10.1073/pnas.1621481114.
130. Ferguson AL, Panagiotopoulos AZ, Debenedetti PG, Kevrekidis IG (2011). "Integrating Diffusion Maps with Umbrella Sampling: Application to Alanine Dipeptide". J. Chem. Phys.. 134: 135103. doi:10.1063/1.3574394.
131. Banisch R, Trstanova Z, Bittracher A, Klus S, Koltai P (2020). "Diffusion Maps Tailored to Arbitrary Non-Degenerate Itô Processes". Appl. Comput. Harmon. Anal.. 48: 242–265. doi:10.1016/j.acha.2018.05.001.
132. Trstanova Z, Leimkuhler B, Lelièvre T (2020). "Local and Global Perspectives on Diffusion Maps in the Analysis of Molecular Systems". Proc. Royal Soc. A. 476: 20190036. doi:10.1098/rspa.2019.0036.
133. Zhang J, Chen M (2018). "Unfolding Hidden Barriers by Active Enhanced Sampling". Phys. Rev. Lett.. 121: 010601. doi:10.1103/PhysRevLett.121.010601.
134. Rydzewski J (2023). "Selecting High-Dimensional Representations of Physical Systems by Reweighted Diffusion Maps". J. Phys. Chem. Lett.. 14: 2778–2783. doi:10.1021/acs.jpclett.3c00265.
135. Keller BG, Bolhuis PG (2024). "Dynamical Reweighting for Biased Rare Event Simulations". Annu. Rev. Phys. Chem.. 75: 137–162. doi:10.1146/annurev-physchem-083122-124538.
136. Boninsegna L, Gobbo G, Noé F, Clementi C (2015). "Investigating Molecular Kinetics by Variationally Optimized Diffusion Maps". J. Chem. Theory Comput.. 11: 5947–5960. doi:10.1021/acs.jctc.5b00749.
137. Noé F, Clementi C (2015). "Kinetic Distance and Kinetic Maps from Molecular Dynamics Simulation". J. Chem. Theory Comput.. 11: 5002–5011. doi:10.1021/acs.jctc.5b00553.
138. Noé F, Banisch R, Clementi C (2016). "Commute Maps: Separating Slowly Mixing Molecular Configurations for Kinetic Modeling". J. Chem. Theory Comput.. 12: 5620–5630. doi:10.1021/acs.jctc.6b00762.
139. Thiede EH, Giannakis D, Dinner AR, Weare J (2019). "Galerkin Approximation of Dynamical Quantities using Trajectory Data". J. Chem. Phys.. 150: 244111. doi:10.1063/1.5063730.
140. Fowlkes C, Belongie S, Chung F, Malik J (2004). "Spectral Grouping using the Nyström Method". IEEE Trans. Pattern Anal. Mach. Intell.. 26: 214–225. doi:10.1109/tpami.2004.1262185.
141. Long AW, Ferguson AL (2019). "Landmark Diffusion Maps (L-dMaps): Accelerated Manifold Learning Out-of-Sample Extension". Appl. Comput. Harmon. Anal.. 47: 190–211. doi:10.1016/j.acha.2017.08.004.
142. Coifman RR, Lafon S (2006). "Geometric Harmonics: A Novel Tool for Multiscale Out-of-Sample Extension of Empirical Functions". Appl. Comput. Harmon. Anal.. 21: 31–52. doi:10.1016/j.acha.2005.07.005.
143. Evangelou N, Dietrich F, Chiavazzo E, Lehmberg D, Meila M, Kevrekidis IG (2023). "Double Diffusion Maps and Their Latent Harmonics for Scientific Computations in Latent Space". J. Comput. Phys.. 485: 112072. doi:10.1016/j.jcp.2023.112072.
144. Bengio Y, Paiement JF, Vincent P, Delalleau O, Roux N, Ouimet M (2003). "Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering". Adv. Neural Inf. Process. Syst.. 16: 177–184. Available from: https://proceedings.neurips.cc/paper/2003/file/cf05968255451bdefe3c5bc64d550517-Paper.pdf.
145. Hinton GE, Roweis S (2002). "Stochastic Neighbor Embedding". Adv. Neural Inf. Process. Syst.. 15: 833–864. doi:10.5555/2968618.2968729.
146. van der Maaten L, Hinton G (2008). "Visualizing Data using t-SNE". J. Mach. Learn. Res.. 9: 2579–2605. Available from: http://www.jmlr.org/papers/v9/vandermaaten08a.html.
147. van der Maaten L (2009). "Learning a Parametric Embedding by Preserving Local Structure". J. Mach. Learn. Res.. 5: 384–391. Available from: http://proceedings.mlr.press/v5/maaten09a.html.
148. van der Maaten L, Postma E, van den Herik J (2009). "Dimensionality Reduction: A Comparative Review". J. Mach. Learn. Res.. 10: 66–71. Available from: https://lvdmaaten.github.io/publications/papers/TR_Dimensionality_Reduction_Review_2009.pdf.
149. Sorensen MR, Voter AF (2000). "Temperature-Accelerated Dynamics for Simulation of Infrequent Events". J. Chem. Phys.. 112: 9599–9606. doi:10.1063/1.481576.
150. Maragliano L, Vanden-Eijnden E (2006). "A Temperature Accelerated Method for Sampling Free Energy and Determining Reaction Pathways in Rare Events Simulations". Chem. Phys. Lett.. 426: 168–175. doi:10.1016/j.cplett.2006.05.062.
151. Liu Y, Ghosh TK, Lin G, Chen M (2024). "Unbiasing Enhanced Sampling on a High-Dimensional Free Energy Surface with a Deep Generative Model". J. Phys. Chem. Lett.. 15: 3938–3945. doi:10.1021/acs.jpclett.3c03515.
152. Kullback S, Leibler RA (1951). "On Information and Sufficiency". Ann. Math. Stat.. 22: 79–86. doi:10.1214/aoms/1177729694.
153. Tiwary P, Berne BJ (2016). "Spectral Gap Optimization of Order Parameters for Sampling Complex Molecular Systems". Proc. Natl. Acad. Sci. U.S.A.. 113: 2839. doi:10.1073/pnas.1600917113.
154. Tiwary P, Berne B (2017). "Predicting Reaction Coordinates in Energy Landscapes with Diffusion Anisotropy". J. Chem. Phys.. 147: 152701. doi:10.1063/1.4983727.
155. Pant S, Smith Z, Wang Y, Tajkhorshid E, Tiwary P (2020). "Confronting Pitfalls of AI-Augmented Molecular Dynamics using Statistical Physics". J. Chem. Phys.. 153: 234118. doi:10.1063/5.0030931.
156. Tsai S-T, Smith Z, Tiwary P (2021). "SGOOP-d: Estimating Kinetic Distances and Reaction Coordinate Dimensionality for Rare Event Systems from Biased/Unbiased Simulations". J. Chem. Theory Comput.. 17: 6757–6765. doi:10.1021/acs.jctc.1c00431.
157. Zou Z, Tsai S-T, Tiwary P (2021). "Toward Automated Sampling of Polymorph Nucleation and Free Energies with the SGOOP and Metadynamics". J. Phys. Chem. B. 125: 13049–13056. doi:10.1021/acs.jpcb.1c07595.
158. Ghosh K, Dixit PD, Agozzino L, Dill KA (2020). "The Maximum Caliber Variational Principle for Nonequilibria". Annu. Rev. Phys. Chem.. 71: 213–238. doi:10.1146/annurev-physchem-071119-040206.
159. Rydzewski J (2023). "Spectral Map: Embedding Slow Kinetics in Collective Variables". J. Phys. Chem. Lett.. 14: 5216–5220. doi:10.1021/acs.jpclett.3c01101.
160. Rydzewski J (2024). "Spectral Map for Slow Collective Variables, Markovian Dynamics, and Transition State Ensembles". J. Chem. Theory Comput.. 20: 7775–7784. doi:10.1021/acs.jctc.4c00428.
161. Rydzewski J, Gökdemir T (2024). "Learning Markovian Dynamics with Spectral Maps". J. Chem. Phys.. 160: 091102. doi:10.1063/5.0189241.
162. Hummer G, Szabo A (2015). "Optimal Dimensionality Reduction of Multistate Kinetic and Markov-State Models". J. Phys. Chem. B. 119: 9029–9037. doi:10.1021/jp508375q.
163. Martini L, Kells A, Covino R, Hummer G, Buchete N-V, Rosta E (2017). "Variational Identification of Markovian Transition States". Phys. Rev. X. 7: 031060. doi:10.1103/physrevx.7.031060.
164. Ray D, Trizio E, Parrinello M (2023). "Deep Learning Collective Variables from Transition Path Ensemble". J. Chem. Phys.. 158: 204102. doi:10.1063/5.0148872.
165. Best RB, Hummer G (2005). "Reaction Coordinates and Rates from Transition Paths". Proc. Natl. Acad. Sci. U.S.A.. 102: 6732–6737. doi:10.1073/pnas.0408098102.
166. Tribello GA, Bonomi M, Branduardi D, Camilloni C, Bussi G (2014). "plumed 2: New Feathers for an Old Bird". Comput. Phys. Commun.. 185: 604–613. doi:10.1016/j.cpc.2013.09.018.
167. plumed Consortium (2019). "Promoting Transparency and Reproducibility in Enhanced Molecular Simulations". Nat. Methods. 16: 670–673. doi:10.1038/s41592-019-0506-8.
168. Tribello GA, Bonomi M, Bussi G, Camilloni C, et al. (2024). "PLUMED Tutorials: A Collaborative, Community-Driven Learning Ecosystem". arXiv preprint arXiv:2412.03595. Available from: https://arxiv.org/abs/2412.03595.
169. Bonati L, Trizio E, Rizzi A, Parrinello M (2023). "A Unified Framework for Machine Learning Collective Variables for Enhanced Sampling Simulations: mlcolvar". J. Chem. Phys.. 159: 014801. doi:10.1063/5.0156343.
170. Trizio E, Rizzi A, Piaggi PM, Invernizzi M, Bonati L (2024). "Advanced Simulations with PLUMED: OPES and Machine Learning Collective Variables". arXiv preprint arXiv:2410.18019. Available from: https://arxiv.org/abs/2410.18019.
171. Maragliano L, Fischer A, Vanden-Eijnden E, Ciccotti G (2006). "String Method in Collective Variables: Minimum Free Energy Paths and Isocommittor Surfaces". J. Chem. Phys.. 125: 024106. doi:10.1063/1.2212942.
172. Lange OF, Grubmüller H (2006). "Collective Langevin Dynamics of Conformational Motions in Proteins". J. Chem. Phys.. 124: 214903. doi:10.1063/1.2199530.
173. Legoll F, Lelièvre T (2010). "Effective Dynamics using Conditional Expectations". Nonlinearity. 23: 2131. doi:10.1088/0951-7715/23/9/006.
174. Zhang W, Hartmann C, Schütte C (2016). "Effective Dynamics along Given Reaction Coordinates, and Reaction Rate Theory". Faraday Discuss.. 195: 365–394. doi:10.1039/c6fd00147e.
175. Rhee YM, Pande VS (2005). "One-Dimensional Reaction Coordinate and the Corresponding Potential of Mean Force from Commitment Probability Distribution". J. Phys. Chem. B. 109: 6780–6786. doi:10.1021/jp045544s.
176. Krivov SV, Karplus M (2008). "Diffusive Reaction Dynamics on Invariant Free Energy Profiles". Proc. Natl. Acad. Sci. U.S.A.. 105: 13841–13846. doi:10.1073/pnas.0800228105.
177. Best RB, Hummer G (2010). "Coordinate-Dependent Diffusion in Protein Folding". Proc. Natl. Acad. Sci. U.S.A.. 107: 1088–1093. doi:10.1073/pnas.0910390107.
178. Dietschreit JC, Diestler DJ, Hulm A, Ochsenfeld C, Gómez-Bombarelli R (2022). "From Free-Energy Profiles to Activation Free Energies". J. Chem. Phys.. 157: 084113. doi:10.1063/5.0102075.
179. Nakamura T (2024). "Derivation of the Invariant Free-Energy Landscape Based on Langevin Dynamics". Phys. Rev. Lett.. 132: 137101. doi:10.1103/physrevlett.132.137101.
180. Boyd S, Diaconis P, Xiao L (2004). "Fastest Mixing Markov Chain on a Graph". SIAM Rev.. 46: 667–689. doi:10.1137/s0036144503423264.
181. Boyd S (2006). "Convex Optimization of Graph Laplacian Eigenvalues". Proc. ICM. 3: 1311–1319. doi:10.4171/022-3/63.
182. Donati L, Hartmann C, Keller BG (2017). "Girsanov Reweighting for Path Ensembles and Markov State Models". J. Chem. Phys.. 146: 244112. doi:10.1063/1.4989474.
183. Kieninger S, Donati L, Keller BG (2020). "Dynamical Reweighting Methods for Markov Models". Curr. Opin. Struct. Biol.. 61: 124–131. doi:10.1016/j.sbi.2019.12.018.
184. Donati L, Weber M, Keller BG (2022). "A Review of Girsanov Reweighting and of Square Root Approximation for Building Molecular Markov State Models". J. Math. Phys.. 63: 123306. doi:10.1063/5.0127227.
185. Chen W, Ferguson AL (2018). "Molecular Enhanced Sampling with Autoencoders: On-the-fly Collective Variable Discovery and Accelerated Free Energy Landscape Exploration". J. Comput. Chem.. 39: 2079–2102. doi:10.1002/jcc.25520.
186. Ribeiro JML, Bravo P, Wang Y, Tiwary P (2018). "Reweighted Autoencoded Variational Bayes for Enhanced Sampling (RAVE)". J. Chem. Phys.. 149: 072301. doi:10.1063/1.5025487.
187. Brotzakis ZF, Parrinello M (2018). "Enhanced Sampling of Protein Conformational Transitions via Dynamically Optimized Collective Variables". J. Chem. Theory Comput.. 15: 1393–1398. doi:10.1021/acs.jctc.8b00827.
188. Mehdi S, Wang D, Pant S, Tiwary P (2022). "Accelerating All-Atom Simulations and Gaining Mechanistic Understanding of Biophysical Systems through State Predictive Information Bottleneck". J. Chem. Theory Comput.. 18: 3231–3238. doi:10.1021/acs.jctc.2c00058.
189. Shmilovich K, Ferguson AL (2023). "Girsanov Reweighting Enhanced Sampling Technique (GREST): On-the-Fly Data-Driven Discovery of and Enhanced Sampling in Slow Collective Variables". J. Phys. Chem. A. 127: 3497–3517. doi:10.1021/acs.jpca.3c00505.
190. Molgedey L, Schuster HG (1994). "Separation of a Mixture of Independent Signals using Time Delayed Correlations". Phys. Rev. Lett.. 72: 3634–3637. doi:10.1103/PhysRevLett.72.3634.
191. Wiskott L, Sejnowski TJ (2002). "Slow Feature Analysis: Unsupervised Learning of Invariances". Neural Comput.. 14: 715–770. doi:10.1162/089976602317318938.
192. Ceriotti M, Tribello GA, Parrinello M (2011). "Simplifying the Representation of Complex Free-Energy Landscapes using Sketch-Map". Proc. Natl. Acad. Sci. U.S.A.. 108: 13023–13028. doi:10.1073/pnas.1108486108.
193. Naritomi Y, Fuchigami S (2011). "Slow Dynamics in Protein Fluctuations Revealed by Time-Structure based Independent Component Analysis: the Case of Domain Motions". J. Chem. Phys.. 134: 065101. doi:10.1063/1.3554380.
194. Ceriotti M, Tribello GA, Parrinello M (2013). "Demonstrating the Transferability and the Descriptive Power of Sketch-Map". J. Chem. Theory Comput.. 9: 1521–1532. doi:10.1021/ct3010563.
195. Pérez-Hernández G, Noé F (2016). "Hierarchical Time-Lagged Independent Component Analysis: Computing Slow Modes and Reaction Coordinates for Large Molecular Systems". J. Chem. Theory Comput.. 12: 6118–6129. doi:10.1021/acs.jctc.6b00738.
196. McGibbon RT, Husic BE, Pande VS (2017). "Identification of Simple Reaction Coordinates from Complex Dynamics". J. Chem. Phys.. 146: 044109. doi:10.1063/1.4974306.
197. Sidky H, Chen W, Ferguson AL (2019). "High-Resolution Markov State Models for the Dynamics of Trp-Cage Miniprotein Constructed Over Slow Folding Modes Identified by State-Free Reversible VAMPnets". J. Phys. Chem. B. 123: 7999–8009. doi:10.1021/acs.jpcb.9b05578.
198. Li Q, Lin B, Ren W (2019). "Computing Committor Functions for the Study of Rare Events using Deep Learning". J. Chem. Phys.. 151: 054112. doi:10.1063/1.5110439.
199. Chen W, Sidky H, Ferguson AL (2019). "Capabilities and Limitations of Time-Lagged Autoencoders for Slow Mode Discovery in Dynamical Systems". J. Chem. Phys.. 151: 064123. doi:10.1063/1.5112048.
200. Tribello GA, Gasparotto P (2019). "Using Dimensionality Reduction to Analyze Protein Trajectories". Front. Mol. Biosci.. 6: 46. doi:10.3389/fmolb.2019.00046.
201. Wang Y, Ribeiro JML, Tiwary P (2019). "Past–Future Information Bottleneck for Sampling Molecular Reaction Coordinate Simultaneously with Thermodynamics and Kinetics". Nat. Commun.. 10: 3573. doi:10.1038/s41467-019-11405-4.
202. Zhang J, Yang YI, Noé F (2019). "Targeted Adversarial Learning Optimized Sampling". J. Phys. Chem. Lett.. 10: 5791–5797. doi:10.1021/acs.jpclett.9b02173.
203. Bonati L, Rizzi V, Parrinello M (2020). "Data-Driven Collective Variables for Enhanced Sampling". J. Phys. Chem. Lett.. 11: 2998–3004. doi:10.1021/acs.jpclett.0c00535.
204. Morishita T (2021). "Time-Dependent Principal Component Analysis: A Unified Approach to High-Dimensional Data Reduction using Adiabatic Dynamics". J. Chem. Phys.. 155: 134114. doi:10.1063/5.0061874.
205. Wang D, Tiwary P (2021). "State Predictive Information Bottleneck". J. Chem. Phys.. 154: 134111. doi:10.1063/5.0038198.
206. Belkacemi Z, Gkeka P, Lelièvre T, Stoltz G (2021). "Chasing Collective Variables using Autoencoders and Biased Trajectories". J. Chem. Theory Comput.. 18: 59–78. doi:10.1021/acs.jctc.1c00415.
207. Novelli P, Bonati L, Pontil M, Parrinello M (2022). "Characterizing Metastable States with the Help of Machine Learning". J. Chem. Theory Comput.. 18: 5195–5202. doi:10.1021/acs.jctc.2c00393.
208. Ketkaew R, Luber S (2022). "DeepCV: A Deep Learning Framework for Blind Search of Collective Variables in Expanded Configurational Space". J. Chem. Inf. Model.. 62: 6352–6364. doi:10.1021/acs.jcim.2c00883.
209. Sun L, Vandermause J, Batzner S, Xie Y, Clark D, Chen W, Kozinsky B (2022). "Multitask Machine Learning of Collective Variables for Enhanced Sampling of Rare Events". J. Chem. Theory Comput.. 18: 2341–2353. doi:10.1021/acs.jctc.1c00143.
210. Song P, Zhao C (2022). "Slow Down to go Better: A Survey on Slow Feature Analysis". IEEE Trans. Neural Netw. Learn. Syst.. 35: 3416–3436. doi:10.1109/TNNLS.2022.3201621.
211. Chen H, Roux B, Chipot C (2023). "Discovering Reaction Pathways, Slow Variables, and Committor Probabilities with Machine Learning". J. Chem. Theory Comput.. 19: 4414–4426. doi:10.1021/acs.jctc.3c00028.
212. Jung H, Covino R, Arjun A, Leitold C, Dellago C, Bolhuis PG, Hummer G (2023). "Machine-Guided Path Sampling to Discover Mechanisms of Molecular Self-Organization". Nat. Comput. Sci.. 3: 334–345. doi:10.1038/s43588-023-00428-z.
