
Elemental Embeddings

This repository contains a collection of elemental representation/embedding schemes. We provide the literature source for each representation as well as the data source from which the files were obtained. Some representations have been obtained from the following repositories:

Linear representations

For the linear/scalar representations, the Embedding class will load these representations as one-hot vectors whose components are ordered following the scale (e.g. the atomic representation is ordered by atomic number).
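As an illustration of what "one-hot vectors ordered following the scale" means, here is a minimal sketch. It is not the Embedding class itself: the function name one_hot_from_scale and the five-element toy scale are hypothetical, chosen only to show how the position of an element on the scale selects the non-zero component.

```python
import numpy as np


def one_hot_from_scale(ordered_elements):
    """Map each element to a one-hot vector whose position follows the scale.

    `ordered_elements` is a sequence of element symbols sorted by the scale
    (e.g. atomic number or modified Pettifor number). The i-th element on the
    scale gets a vector with a 1 in component i and 0 everywhere else.
    """
    n = len(ordered_elements)
    identity = np.eye(n, dtype=int)  # row i is the one-hot vector for position i
    return {el: identity[i] for i, el in enumerate(ordered_elements)}


# Hypothetical toy scale: only the first five elements, ordered by atomic number.
scale = ["H", "He", "Li", "Be", "B"]
vectors = one_hot_from_scale(scale)
print(vectors["Li"])  # [0 0 1 0 0]
```

With a real scale the vectors would have one component per element on that scale, so reordering the list (for example, using the modified Pettifor order instead of atomic number) changes which component each element activates.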

Modified Pettifor scale

The following paper describes the details of the modified Pettifor chemical scale: The optimal one-dimensional periodic table: a modified Pettifor chemical scale from data mining

Data source

Atomic numbers

We included atomic numbers as a linear representation (atomic), which generates one-hot vectors ordered by atomic number.

Vector representations

The following representations are all vector representations (some are local, some are distributed) and the Embedding class will load these representations as they are.
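To make "loaded as they are" concrete, the sketch below parses an element-to-vector mapping into a NumPy matrix without any one-hot encoding or rescaling. The JSON layout (a dictionary from element symbol to a list of numbers) and the helper name load_vector_embedding are assumptions for illustration only; they are not taken from the repository's files or from the Embedding class API.

```python
import json

import numpy as np


def load_vector_embedding(json_text):
    """Parse a JSON object mapping element symbols to feature lists.

    Assumption: the data look like {"H": [0.1, 0.2], "He": [0.3, 0.4]}.
    The vectors are kept exactly as stored; nothing is rescaled.
    """
    data = json.loads(json_text)
    symbols = list(data)  # keep the order in which elements appear
    # np.array raises ValueError if the lists have different lengths.
    matrix = np.array([data[s] for s in symbols], dtype=float)
    if matrix.ndim != 2:
        raise ValueError("each element must map to a list of numbers")
    return symbols, matrix


# Hypothetical two-element, three-dimensional embedding.
example = '{"H": [0.1, -0.4, 2.0], "O": [1.5, 0.0, -0.3]}'
symbols, matrix = load_vector_embedding(example)
print(symbols, matrix.shape)  # ['H', 'O'] (2, 3)
```

Each row of the resulting matrix is one element's vector, so local representations (e.g. magpie properties) and distributed ones (e.g. mat2vec) can be handled the same way once loaded.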

cgnf

The following paper describes the implementation of the composition graph neural fingerprint (cgnf) from the node embedding vectors of a pre-trained crystal graph convolutional neural network: Synthesizability of materials stoichiometry using semi-supervised learning

Data source

crystallm

The following paper describes the details behind the generative crystal structure model based on a large language model: Crystal Structure Generation with Autoregressive Large Language Modeling

magpie

The following paper describes the details of the Materials Agnostic Platform for Informatics and Exploration (Magpie) framework: A general-purpose machine learning framework for predicting properties of inorganic materials

The source code for Magpie can be found here.

Data source

The 22-dimensional embedding vector includes the following elemental properties:

  • Number (atomic number)
  • Mendeleev number
  • Atomic weight
  • Melting temperature
  • Group number
  • Period
  • Covalent radius
  • Electronegativity
  • Number of s, p, d, and f valence electrons (4 features)
  • Number of valence electrons
  • Number of unfilled s, p, d, and f orbitals (4 features)
  • Number of unfilled orbitals
  • GSvolume_pa (DFT volume per atom of the T = 0 K ground state from the OQMD)
  • GSbandgap (DFT band gap energy of the T = 0 K ground state from the OQMD)
  • GSmagmom (DFT magnetic moment of the T = 0 K ground state from the OQMD)
  • Space group number

  • magpie_sc is a scaled version of the magpie embeddings. Data source
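The exact scaling used to produce magpie_sc (and oliynyk_sc) is not described here. As an illustration only, the sketch below applies column-wise z-score standardization, one common way of putting features with very different units on a comparable scale; it is an assumption, not the confirmed procedure behind these files. The function name standardize_columns and the feature values are hypothetical.

```python
import numpy as np


def standardize_columns(matrix):
    """Scale each feature (column) to zero mean and unit standard deviation.

    Z-score standardization, shown as one common scaling choice; it is not
    confirmed to be the procedure used to build magpie_sc.
    Constant columns are mapped to zero instead of dividing by zero.
    """
    matrix = np.asarray(matrix, dtype=float)
    mean = matrix.mean(axis=0)
    std = matrix.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (matrix - mean) / std


# Hypothetical raw features: three elements (rows) by two properties (columns)
# whose magnitudes differ by orders of magnitude.
raw = [[1.0, 14.0], [12.0, 3800.0], [16.0, 54.0]]
scaled = standardize_columns(raw)
print(scaled.mean(axis=0), scaled.std(axis=0))
```

After scaling, every column has mean 0 and standard deviation 1 (up to floating-point error), so no single property dominates distance-based comparisons between element vectors.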

mat2vec

The following paper describes the implementation of mat2vec: Unsupervised word embeddings capture latent knowledge from materials science literature

Data source

matscholar

The following paper describes the natural language processing implementation of Materials Scholar (matscholar): Named Entity Recognition and Normalization Applied to Large-Scale Information Extraction from the Materials Science Literature

Data source

megnet

The following paper describes the details of the construction of the MatErials Graph Network (MEGNet): Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals. The 16-dimensional vectors are drawn from the learned atom embedding weights of a model trained to predict the formation energies of crystalline materials.

Data source

oliynyk

The following paper describes the details: High-Throughput Machine-Learning-Driven Synthesis of Full-Heusler Compounds

Data source

The 44 features of the embedding vector consist of the following properties:

  • Number
  • Atomic_Weight
  • Period
  • Group
  • Families
  • Metal
  • Nonmetal
  • Metalliod
  • Mendeleev_Number
  • l_quantum_number
  • Atomic_Radius
  • Miracle_Radius_[pm]
  • Covalent_Radius
  • Zunger_radii_sum
  • Ionic_radius
  • crystal_radius
  • Pauling_Electronegativity
  • MB_electonegativity
  • Gordy_electonegativity
  • Mulliken_EN
  • Allred-Rockow_electronegativity
  • Metallic_valence
  • Number_of_valence_electrons
  • Gilmor_number_of_valence_electron
  • valence_s
  • valence_p
  • valence_d
  • valence_f
  • Number_of_unfilled_s_valence_electrons
  • Number_of_unfilled_p_valence_electrons
  • Number_of_unfilled_d_valence_electrons
  • Number_of_unfilled_f_valence_electrons
  • Outer_shell_electrons
  • 1st_ionization_potential_(kJ/mol)
  • Polarizability(A^3)
  • Melting_point_(K)
  • Boiling_Point_(K)
  • Density_(g/mL)
  • Specific_heat_(J/g_K)_
  • Heat_of_fusion_(kJ/mol)_
  • Heat_of_vaporization_(kJ/mol)_
  • Thermal_conductivity_(W/(m_K))_
  • Heat_atomization(kJ/mol)
  • Cohesive_energy

  • oliynyk_sc is a scaled version of the oliynyk embeddings. Data source

random

This is a set of 200-dimensional vectors whose components are randomly generated.

The 118 200-dimensional vectors in random_200_new were generated using the following code:

import numpy as np

mu, sigma = 0, 1  # mean and standard deviation of the normal distribution
# 118 elements x 200 components, drawn from a seeded generator for reproducibility
s = np.random.default_rng(seed=42).normal(mu, sigma, (118, 200))
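Because the generator is seeded, the same call reproduces the same matrix. The short check below illustrates this; the variable names first and second are illustrative. One caveat, stated as general NumPy behaviour rather than anything specific to this repository: NumPy does not strictly guarantee identical Generator streams across all library versions, so exact regeneration is safest with the same NumPy version.

```python
import numpy as np

mu, sigma = 0, 1  # mean and standard deviation of the normal distribution

# Two generators created with the same seed produce identical draws,
# so the 118 x 200 matrix can be regenerated exactly.
first = np.random.default_rng(seed=42).normal(mu, sigma, (118, 200))
second = np.random.default_rng(seed=42).normal(mu, sigma, (118, 200))

print(first.shape)                    # (118, 200)
print(np.array_equal(first, second))  # True
```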

skipatom

The following paper describes the details: Distributed representations of atoms and materials for machine learning

Data source

xenonpy

The XenonPy embedding uses the 58 features that are commonly used in publications that use the XenonPy package. See the following publications:

  • Representation of materials by kernel mean embedding
  • Crystal structure prediction with machine learning-based element substitution