Skip to content

Simple usage

Here we will demonstrate how to use some of ElementEmbeddings's features. For full worked examples of using the package, please refer to the Jupyter notebooks in the examples section of the Github repo.

ElementEmbeddings

The Embedding class lies at the heart of the package. It handles elemental representation data and enables analysis and visualisation.

For simple usage, you can instantiate an Embedding object using one of the embeddings in the data directory. For this example, let's use the magpie elemental representation.

# Import the class
from elementembeddings.core import Embedding

# Load the magpie data
magpie = Embedding.load_data("magpie")

We can access some of the properties of the Embedding class. For example, we can find the dimensions of the elemental representation and the list of elements for which an embedding exists.

# Print out some of the properties of the ElementEmbeddings class
print(f"The magpie representation has embeddings of dimension {magpie.dim}")
print(
    f"The magpie representation contains these elements: \n {magpie.element_list}"
)  # prints out all the elements considered for this representation
print(
    f"The magpie representation contains these features: \n {magpie.feature_labels}"
)  # Prints out the feature labels of the chosen representation

# The magpie representation has embeddings of dimension 22
# The magpie representation contains these elements:
[
    "H",
    "He",
    "Li",
    "Be",
    "B",
    "C",
    "N",
    "O",
    "F",
    "Ne",
    "Na",
    "Mg",
    "Al",
    "Si",
    "P",
    "S",
    "Cl",
    "Ar",
    "K",
    "Ca",
    "Sc",
    "Ti",
    "V",
    "Cr",
    "Mn",
    "Fe",
    "Co",
    "Ni",
    "Cu",
    "Zn",
    "Ga",
    "Ge",
    "As",
    "Se",
    "Br",
    "Kr",
    "Rb",
    "Sr",
    "Y",
    "Zr",
    "Nb",
    "Mo",
    "Tc",
    "Ru",
    "Rh",
    "Pd",
    "Ag",
    "Cd",
    "In",
    "Sn",
    "Sb",
    "Te",
    "I",
    "Xe",
    "Cs",
    "Ba",
    "La",
    "Ce",
    "Pr",
    "Nd",
    "Pm",
    "Sm",
    "Eu",
    "Gd",
    "Tb",
    "Dy",
    "Ho",
    "Er",
    "Tm",
    "Yb",
    "Lu",
    "Hf",
    "Ta",
    "W",
    "Re",
    "Os",
    "Ir",
    "Pt",
    "Au",
    "Hg",
    "Tl",
    "Pb",
    "Bi",
    "Po",
    "At",
    "Rn",
    "Fr",
    "Ra",
    "Ac",
    "Th",
    "Pa",
    "U",
    "Np",
    "Pu",
    "Am",
    "Cm",
    "Bk",
]
# The magpie representation contains these features:
[
    "Number",
    "MendeleevNumber",
    "AtomicWeight",
    "MeltingT",
    "Column",
    "Row",
    "CovalentRadius",
    "Electronegativity",
    "NsValence",
    "NpValence",
    "NdValence",
    "NfValence",
    "NValence",
    "NsUnfilled",
    "NpUnfilled",
    "NdUnfilled",
    "NfUnfilled",
    "NUnfilled",
    "GSvolume_pa",
    "GSbandgap",
    "GSmagmom",
    "SpaceGroupNumber",
]

Plotting

We can quickly generate heatmaps of distance/similarity measures between the element vectors using heatmap_plotter and plot the representations in two dimensions using the dimension_plotter from the plotter module. Before we do that, we will standardise the embedding using the standardise method available to the Embedding class

from elementembeddings.plotter import heatmap_plotter, dimension_plotter
import matplotlib.pyplot as plt

magpie.standardise(inplace=True)  # Standardises the representation

fig, ax = plt.subplots(1, 1, figsize=(6, 6))
heatmap_params = {"vmin": -1, "vmax": 1}
heatmap_plotter(
    embedding=magpie,
    metric="cosine_similarity",
    show_axislabels=False,
    cmap="Blues_r",
    ax=ax,
    **heatmap_params
)
ax.set_title("Magpie cosine similarities")
fig.tight_layout()
fig.show()

Magpie cosine similarity heatmap

fig, ax = plt.subplots(1, 1, figsize=(6, 6))

reducer_params = {"n_neighbors": 30, "random_state": 42}
scatter_params = {"s": 100}

dimension_plotter(
    embedding=magpie,
    reducer="umap",
    n_components=2,
    ax=ax,
    adjusttext=True,
    reducer_params=reducer_params,
    scatter_params=scatter_params,
)
ax.set_title("Magpie UMAP (n_neighbours=30)")
ax.legend().remove()
handles, labels = ax1.get_legend_handles_labels()
fig.legend(handles, labels, bbox_to_anchor=(1.25, 0.5), loc="center right", ncol=1)

fig.tight_layout()
fig.show()

Magpie UMAP scatter plot

Compositions

The package can also be used to featurise compositions. Your data could be a list of formula strings or a pandas dataframe of the following format:

formula
CsPbI3
Fe2O3
NaCl
ZnS

The composition_featuriser function can be used to featurise the data. The compositions can be featurised using different representation schemes and different types of pooling through the embedding and stats arguments respectively.

from elementembeddings.composition import composition_featuriser

df_featurised = composition_featuriser(df, embedding="magpie", stats=["mean", "sum"])

df_featurised
formula mean_Number mean_MendeleevNumber mean_AtomicWeight mean_MeltingT mean_Column mean_Row mean_CovalentRadius mean_Electronegativity mean_NsValence mean_NpValence mean_NdValence mean_NfValence mean_NValence mean_NsUnfilled mean_NpUnfilled mean_NdUnfilled mean_NfUnfilled mean_NUnfilled mean_GSvolume_pa mean_GSbandgap mean_GSmagmom mean_SpaceGroupNumber sum_Number sum_MendeleevNumber sum_AtomicWeight sum_MeltingT sum_Column sum_Row sum_CovalentRadius sum_Electronegativity sum_NsValence sum_NpValence sum_NdValence sum_NfValence sum_NValence sum_NsUnfilled sum_NpUnfilled sum_NdUnfilled sum_NfUnfilled sum_NUnfilled sum_GSvolume_pa sum_GSbandgap sum_GSmagmom sum_SpaceGroupNumber
CsPbI3 59.2 74.8 144.16377238 412.55 13.2 5.4 161.39999999999998 2.22 1.8 3.4 8.0 2.8000000000000003 16.0 0.2 1.4 0.0 0.0 1.6 54.584 0.6372 0.0 129.20000000000002 296.0 374.0 720.8188619 2062.75 66.0 27.0 807.0 11.100000000000001 9.0 17.0 40.0 14.0 80.0 1.0 7.0 0.0 0.0 8.0 272.92 3.186 0.0 646.0
Fe2O3 15.2 74.19999999999999 31.937640000000002 757.2800000000001 12.8 2.8 92.4 2.7960000000000003 2.0 2.4 2.4000000000000004 0.0 6.8 0.0 1.2 1.6 0.0 2.8 9.755 0.0 0.8442651200000001 98.80000000000001 76.0 371.0 159.6882 3786.4 64.0 14.0 462.0 13.98 10.0 12.0 12.0 0.0 34.0 0.0 6.0 8.0 0.0 14.0 48.775000000000006 0.0 4.2213256 494.0
NaCl 14.0 48.0 29.221384640000004 271.235 9.0 3.0 134.0 2.045 1.5 2.5 0.0 0.0 4.0 0.5 0.5 0.0 0.0 1.0 26.87041666665 1.2465 0.0 146.5 28.0 96.0 58.44276928000001 542.47 18.0 6.0 268.0 4.09 3.0 5.0 0.0 0.0 8.0 1.0 1.0 0.0 0.0 2.0 53.7408333333 2.493 0.0 293.0
ZnS 23.0 78.5 48.7225 540.52 14.0 3.5 113.5 2.115 2.0 2.0 5.0 0.0 9.0 0.0 1.0 0.0 0.0 1.0 19.8734375 1.101 0.0 132.0 46.0 157.0 97.445 1081.04 28.0 7.0 227.0 4.23 4.0 4.0 10.0 0.0 18.0 0.0 2.0 0.0 0.0 2.0 39.746875 2.202 0.0 264.0

The returned dataframe contains the mean- and sum-pooled features of the magpie representation for the four formulas.