CartesianNeural Calculator

In the CartesianNeural module, the Cartesian coordinates of the atoms in the system are taken as inputs and are not further transformed before being fed into the neural network. This calculator only works if the number and identities of atoms in the systems under study do not change; only the atomic positions may change. Because no transformation is applied to the coordinates, this calculator can in principle represent the potential energy surface exactly (given a large enough training set and network).
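To make the input representation concrete, the following is a small sketch (not part of the module itself) showing how such an input vector could be assembled for a water molecule built with ASE; the network simply receives the 3N flattened Cartesian coordinates.

from ase.build import molecule

# Build a small test system; a 3-atom image yields a 3 * 3 = 9-component input.
atoms = molecule('H2O')
input_vector = atoms.get_positions().ravel()
print(len(atoms), input_vector.shape)  # prints: 3 (9,)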

[Figure: cartesian-network.png]

Running CartesianNeural

Importing CartesianNeural

An ASE-compatible version of CartesianNeural is imported as

from neural.cartesian import CartesianNeural

Initializing CartesianNeural

The calculator can be initialized with a number of optional parameters; the default values are shown below.

calc = CartesianNeural(no_of_atoms=None,
                       json=None,
                       hiddenlayers=(5, 5),
                       activation='tanh',
                       label=None,
                       weights=None,
                       scalings=None,
                       extrapolate=True)

A detailed list of all keywords for the CartesianNeural calculator follows:

no_of_atoms : int (default: None)
    Number of atoms in the atomic system for which CartesianNeural will be set up.

json : str (default: None)
    Output JSON file, or input file from which to read all keywords if the JSON file already exists. In that case, the values assigned directly to all other keywords are ignored.

hiddenlayers : tuple (default: (5, 5))
    Architecture of the hidden layers of the neural network. Note that the number of nodes in the input layer is always three times the number of atoms in the system, and the number of nodes in the output layer is always one.

activation : str (default: 'tanh')
    Type of activation function: 'tanh' refers to the hyperbolic tangent function, and 'sigmoid' refers to the sigmoid function.

label : str (default: None)
    Name used as a prefix for all files.

weights : dict (default: None)
    Dictionary of arrays of weights connecting one layer to the next. Layers are indexed from 0, while the weight dictionary has keys starting from 1; there is one fewer weight array than there are layers. An example is weights = {1: [[1, 1], [2, 4], [4, 5], [6, 1]], 2: [[4], [6], [1]]}. Each array connects node i of the previous layer to node j of the current layer through index w[i, j]; it has n + 1 rows (i values), the last row being the bias, and m columns (j values). If weights are not given, they are generated randomly such that the arguments of the activation function fall in the range -1.5 to 1.5.

scalings : dict (default: None)
    Dictionary of slope and intercept parameters, used to move the final value out of the bounded range that the activation function would otherwise lock it into. For example, scalings = {'intercept': 1, 'slope': 2}.

extrapolate : bool (default: True)
    Whether to allow extrapolation. Extrapolation occurs when any Cartesian coordinate (x, y, z) of an atom lies in a region not covered by the scaling range.
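To make the weights and scalings layout concrete, here is a hypothetical sketch (the numerical values are made up, not trained parameters) for a two-atom system with hiddenlayers=(5, 5). The input layer then has 3 * 2 = 6 nodes, so the first weight array has 6 + 1 = 7 rows (the last row being the bias) and 5 columns, and so on through the network.

import numpy as np
from neural.cartesian import CartesianNeural

# Hypothetical, untrained parameters for a 2-atom system with hiddenlayers=(5, 5).
# Layer sizes: input 3*2 = 6, hidden 5, hidden 5, output 1, so there are three
# weight arrays, each with (nodes_in + 1) rows (last row = bias) and nodes_out columns.
weights = {1: np.random.uniform(-0.5, 0.5, size=(7, 5)),  # input -> first hidden
           2: np.random.uniform(-0.5, 0.5, size=(6, 5)),  # first hidden -> second hidden
           3: np.random.uniform(-0.5, 0.5, size=(6, 1))}  # second hidden -> output

# Scalings move the final value out of the bounded range of the activation function.
scalings = {'intercept': 1., 'slope': 2.}

calc = CartesianNeural(no_of_atoms=2,
                       hiddenlayers=(5, 5),
                       weights=weights,
                       scalings=scalings)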

Starting from old parameters

The parameters of the calculator are saved when training terminates so that the calculator can be re-established for future calculations. If the previously trained parameter file is named 'old.json', it can be passed to CartesianNeural so that those values are taken over, with something like this.

calc = CartesianNeural(json='old.json', label='new')

The label ('new') is used as a prefix for any output from use of this calculator. It is a good idea to have a different name for this label to avoid accidentally overwriting your carefully trained calculator's files!
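Once re-established, the calculator is used like any other ASE calculator. The following is a minimal hypothetical sketch; the structure file name is only a placeholder.

from ase.io import read
from neural.cartesian import CartesianNeural

# Re-establish the calculator from previously trained parameters.
calc = CartesianNeural(json='old.json', label='new')

# Attach it to an ASE Atoms object and evaluate the energy.
atoms = read('structure.traj')  # placeholder file name
atoms.set_calculator(calc)
print(atoms.get_potential_energy())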

Training calculator

Training a new calculator instance occurs with the train method, which is given a list of training images and a desired value of root-mean-square error (rmse) per image per atom, as below:

calc.train(images, goal=0.005)

Calling this method will generate output files where you can watch the progress. Note that this is in general a computationally-intensive process!

These parameters are described in more detail below:

images : list (no default)
    List of ASE atoms objects with positions, symbols, and energies in ASE form; this is the training set. Energies can be obtained from any reference, e.g. DFT calculations. The training data can be provided in three forms: a list (or list-like object) of ASE images, a path to an ASE trajectory file, or a path to an ASE db file.

goal : float (default: 0.005)
    Threshold rmse (eV) per image per atom at which the training is considered converged.

train_forces : bool (default: True)
    If True, forces are trained as well as energies.

overfitting_constraint : float (default: 0.)
    Constant used to suppress overfitting. A proper value for this constant is subtle and depends on the training set: a value that is too small may not cure the overfitting issue, whereas a value that is too large may cause over-smoothing.
In general, we plan to make the ASE database the recommended data format; however, this is still a work in progress on the ASE side.
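Putting the pieces together, a hypothetical end-to-end training run might look like the sketch below; the trajectory file name and the label are placeholders.

from ase.io import read
from neural.cartesian import CartesianNeural

# Training images with positions and reference energies (e.g. from DFT);
# the file name is a placeholder.
images = read('training.traj', index=':')

calc = CartesianNeural(no_of_atoms=len(images[0]),
                       hiddenlayers=(5, 5),
                       label='training-run')

# Train until the rmse per image per atom falls below 0.005 eV; forces are
# included in the fit by default.
calc.train(images, goal=0.005, train_forces=True, overfitting_constraint=0.)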

Parallel computing

As the number of training images or the number of atoms in each image increases, serial computing takes a long time, and resorting to parallel computing becomes inevitable. Two common approaches to parallel computing are to parallelize over multiple threads or over multiple processes. In the former approach, threads usually have access to the same memory area, which can lead to conflicts in case of improper synchronization. In the latter approach, a completely separate memory location is allocated to each spawned process, which avoids any possible interference between sub-tasks.

Currently, the CartesianNeural calculator takes the second approach via multiprocessing, a module in the standard library of Python 2.6 and above. For details about the multiprocessing module, the reader is directed to the official documentation. Only local concurrency (multi-computing over local cores) is possible in the current release of the code; remote concurrency (multi-computing over different machines) is planned for future releases. The code starts with serial computing, but when the neural network is fed forward and when the corresponding error function is propagated backward, multiple processes are spawned, each carrying out a sub-task on part of the images and putting its results into multiprocessing Queues. All the values in the Queues are then gathered at the end and passed to the next step.

The code automatically finds the number of available cores.
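The following is only an illustrative sketch of that scheme in plain multiprocessing, not the package's actual internals: the images are split over one process per available core, each process computes its partial result and places it in a Queue, and the parent process gathers the contributions.

import multiprocessing

def _partial_task(image_subset, queue):
    # Placeholder for feeding a subset of images through the network (and
    # back-propagating the error); here we just report the subset size.
    queue.put(len(image_subset))

def run_in_parallel(images):
    n_cores = multiprocessing.cpu_count()  # number of available cores
    queue = multiprocessing.Queue()
    chunks = [images[i::n_cores] for i in range(n_cores)]
    processes = [multiprocessing.Process(target=_partial_task, args=(chunk, queue))
                 for chunk in chunks]
    for p in processes:
        p.start()
    results = [queue.get() for _ in processes]  # gather the partial results
    for p in processes:
        p.join()
    return sum(results)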