Elemental Embeddings
This repository contains a collection of elemental representation/embedding schemes. For each representation we provide the literature source and the data source from which the files were obtained. Some representations have been obtained from the following repositories:
Linear representations
For the linear/scalar representations, the Embedding class will load these representations as one-hot vectors where the vector components are ordered following the scale (i.e. the atomic representation is ordered by atomic numbers).
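As a minimal sketch of the idea, the snippet below builds one-hot vectors from an ordered scale. The helper `one_hot_from_scale` and the five-element toy scale are hypothetical and used only for illustration; this is not the Embedding class's actual implementation.

```python
import numpy as np


def one_hot_from_scale(ordered_elements):
    """Map each element to a one-hot vector whose hot index is its position on the scale.

    Hypothetical helper for illustration only.
    """
    identity = np.eye(len(ordered_elements), dtype=int)
    return {element: identity[i] for i, element in enumerate(ordered_elements)}


# Toy scale: the first five elements ordered by atomic number.
atomic_scale = ["H", "He", "Li", "Be", "B"]
vectors = one_hot_from_scale(atomic_scale)

print(vectors["Li"])  # [0 0 1 0 0]
```

Using a different ordering, such as a Pettifor-like scale, only changes which index is hot for each element; the vectors remain one-hot.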
Modified Pettifor scale
The following paper describes the details of the modified Pettifor chemical scale: The optimal one-dimensional periodic table: a modified Pettifor chemical scale from data mining
Atomic numbers
We included atomic as a linear representation to generate one-hot vectors corresponding to the atomic numbers.
Vector representations
The following representations are all vector representations (some are local, some are distributed) and the Embedding class will load these representations as they are.
cgnf
The following paper describes the implementation of the composition graph neural fingerprint (cgnf) from the node embedding vectors of a pre-trained crystal graph convolution neural network: Synthesizability of materials stoichiometry using semi-supervised learning
crystallm
The following paper describes the details behind the generative crystal structure model based on a large language model: Crystal Structure Generation with Autoregressive Large Language Modeling
magpie
The following paper describes the details of the Materials Agnostic Platform for Informatics and Exploration (Magpie) framework: A general-purpose machine learning framework for predicting properties of inorganic materials
The source code for Magpie can be found here
The 22-dimensional embedding vector includes the following elemental properties:
- Number
- Mendeleev number
- Atomic weight
- Melting temperature
- Group number
- Period
- Covalent Radius
- Electronegativity
- no. of s, p, d, f valence electrons (4 features)
- no. of valence electrons
- no. of unfilled s, p, d, f orbitals (4 features)
- no. of unfilled orbitals
- GSvolume_pa (DFT volume per atom of T=0K ground state from the OQMD)
- GSbandgap (DFT bandgap energy of T=0K ground state from the OQMD)
- GSmagmom (DFT magnetic moment of T=0K ground state from the OQMD)
- Space Group Number
magpie_sc is a scaled version of the magpie embeddings. Data source
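The text above does not say how the scaled variant was produced. The sketch below shows one common choice, per-feature standardization to zero mean and unit variance, purely as an assumption; the `standardize` helper and the toy values are hypothetical and are not taken from the repository.

```python
import numpy as np


def standardize(features):
    """Scale each column to zero mean and unit variance.

    This scaling choice is an assumption; the source does not state the method used.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (features - mean) / std


# Toy matrix: rows are elements, columns are two made-up properties on very different scales.
toy = np.array([[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]])
scaled = standardize(toy)
```

Scaling of this kind keeps properties with large numerical ranges (for example, melting temperature) from dominating properties with small ranges (for example, electronegativity) in distance-based comparisons.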
mat2vec
The following paper describes the implementation of mat2vec: Unsupervised word embeddings capture latent knowledge from materials science literature
matscholar
The following paper describes the natural language processing implementation of Materials Scholar (matscholar): Named Entity Recognition and Normalization Applied to Large-Scale Information Extraction from the Materials Science Literature
megnet
The following paper describes the details of the construction of the MatErials Graph Network (MEGNet): Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals. The 16-dimensional vectors are drawn from the atomic weights of a model trained to predict the formation energies of crystalline materials.
oliynyk
The following paper describes the details: High-Throughput Machine-Learning-Driven Synthesis of Full-Heusler Compounds
The 44 features of the embedding vector are formed of the following properties:
- Number
- Atomic_Weight
- Period
- Group
- Families
- Metal
- Nonmetal
- Metalliod
- Mendeleev_Number
- l_quantum_number
- Atomic_Radius
- Miracle_Radius_[pm]
- Covalent_Radius
- Zunger_radii_sum
- Ionic_radius
- crystal_radius
- Pauling_Electronegativity
- MB_electonegativity
- Gordy_electonegativity
- Mulliken_EN
- Allred-Rockow_electronegativity
- Metallic_valence
- Number_of_valence_electrons
- Gilmor_number_of_valence_electron
- valence_s
- valence_p
- valence_d
- valence_f
- Number_of_unfilled_s_valence_electrons
- Number_of_unfilled_p_valence_electrons
- Number_of_unfilled_d_valence_electrons
- Number_of_unfilled_f_valence_electrons
- Outer_shell_electrons
- 1st_ionization_potential_(kJ/mol)
- Polarizability(A^3)
- Melting_point_(K)
- Boiling_Point_(K)
- Density_(g/mL)
- Specific_heat_(J/g_K)_
- Heat_of_fusion_(kJ/mol)_
- Heat_of_vaporization_(kJ/mol)_
- Thermal_conductivity_(W/(m_K))_
- Heat_atomization(kJ/mol)
- Cohesive_energy
oliynyk_sc is a scaled version of the oliynyk embeddings. Data source
random
This is a set of 200-dimensional vectors whose components are randomly generated.
The 118 200-dimensional vectors in random_200_new were generated using the following code:
import numpy as np

mu, sigma = 0, 1  # mean and standard deviation
s = np.random.default_rng(seed=42).normal(mu, sigma, (118, 200))
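Because the generator is seeded, the same array is produced on every run with the same NumPy version. The check below is a small sketch of that property; the variable names `first` and `second` are illustrative only.

```python
import numpy as np

mu, sigma = 0, 1  # mean and standard deviation

# Two independent generators created with the same seed yield identical draws.
first = np.random.default_rng(seed=42).normal(mu, sigma, (118, 200))
second = np.random.default_rng(seed=42).normal(mu, sigma, (118, 200))

print(first.shape)                    # (118, 200)
print(np.array_equal(first, second))  # True
```

Each of the 118 rows is one 200-dimensional vector, one per element.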
skipatom
The following paper describes the details: Distributed representations of atoms and materials for machine learning
xenonpy
The XenonPy embedding uses the 58 features which are commonly used in publications that use the XenonPy package. See the following publications: