Using the composition module

In [1]:

import pandas as pd
from elementembeddings.composition import composition_featuriser
from elementembeddings.composition import CompositionalEmbedding
import numpy as np

np.set_printoptions(suppress=True)

The core class of the elementembeddings.composition module is the CompositionalEmbedding class. We can use this class to create objects which represent a composition together with an elemental representation. We can create an instance of this class as follows:

CsPbI3_magpie = CompositionalEmbedding(formula='CsPbI3', embedding='magpie')

In [2]:

CsPbI3_magpie = CompositionalEmbedding(formula="CsPbI3", embedding="magpie")

We can access the elemental embeddings of the individual elements in the composition from the el_matrix attribute.

>>> CsPbI3_magpie.el_matrix

In [3]:

# Print the individual element feature vectors
print(CsPbI3_magpie.el_matrix)

[[ 55.          5.        132.9054519 301.59        1.          6.
  244.          0.79        1.          0.          0.          0.
    1.          1.          0.          0.          0.          1.
  115.765       0.          0.        229.       ]
 [ 82.         81.        207.2       600.61       14.          6.
  146.          2.33        2.          2.         10.         14.
   28.          0.          4.          0.          0.          4.
   28.11        0.          0.        225.       ]
 [ 53.         96.        126.90447   386.85       17.          5.
  139.          2.66        2.          5.         10.          0.
   17.          0.          1.          0.          0.          1.
   43.015       1.062       0.         64.       ]]

The composition and fractional_composition attributes are dictionaries that map each element to its amount and to its atomic fraction, respectively.

In [4]:

# Print the composition and the fractional composition
print(CsPbI3_magpie.composition)
print(CsPbI3_magpie.fractional_composition)

defaultdict(<class 'float'>, {'Cs': 1.0, 'Pb': 1.0, 'I': 3.0})
{'Cs': 0.2, 'Pb': 0.2, 'I': 0.6}

Other attributes that can be accessed include the list of elements, the stoichiometry and normalised stoichiometry represented as vectors, and the total number of atoms.

In [5]:

# Print the list of elements
print(CsPbI3_magpie.element_list)
# Print the stoichiometric vector
print(CsPbI3_magpie.stoich_vector)

# Print the normalized stoichiometric vector
print(CsPbI3_magpie.norm_stoich_vector)

# Print the number of atoms
print(CsPbI3_magpie.num_atoms)

['Cs', 'Pb', 'I']
[1. 1. 3.]
[0.2 0.2 0.6]
5.0
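
These quantities all follow from the composition dictionary. As a minimal sketch in plain Python (independent of the library, whose internals may differ), the values printed above can be reproduced as follows. The variable names are chosen to mirror the attribute names and are otherwise only illustrative.

```python
# Composition as printed earlier: element -> number of atoms in the formula unit
composition = {"Cs": 1.0, "Pb": 1.0, "I": 3.0}

# The element list and stoichiometric vector share the dictionary's order
element_list = list(composition)
stoich_vector = [composition[el] for el in element_list]

# Dividing by the total number of atoms normalises the stoichiometry
num_atoms = sum(stoich_vector)
norm_stoich_vector = [amount / num_atoms for amount in stoich_vector]
fractional_composition = dict(zip(element_list, norm_stoich_vector))

print(element_list)
print(stoich_vector)
print(norm_stoich_vector)
print(num_atoms)
```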

We can create composition-based feature vectors using the feature_vector method.

>>> CsPbI3_magpie.feature_vector()

By default, this will return the weighted average of the elemental embeddings of the composition. This would have the same dimension as the individual elemental embeddings. We can also specify the type of feature vector we want to create by passing the stats argument.

>>> CsPbI3_magpie.feature_vector(stats=['mean', 'variance'])

This would return a feature vector which is the concatenation of the mean and variance of the elemental embeddings of the composition. This would have twice the dimension of the individual elemental embeddings. In general, the dimension of the feature vector is the product of the dimension of the elemental embeddings and the number of statistics requested.

The available statistics are:

mean
variance
minpool
maxpool
sum
range
harmonic_mean
geometric_mean
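
To make the statistics concrete, the following sketch recomputes five of them by hand for two of the Magpie features of CsPbI3 (atomic number and electronegativity, copied from the el_matrix output above). The formulas are inferred from the values printed in this notebook rather than taken from the library source, so treat them as an assumption: the mean and variance are weighted by the fractional composition, minpool and maxpool ignore the weights, and the sum is weighted by the number of atoms. The helper column and the variable names are hypothetical.

```python
# Two columns of el_matrix for CsPbI3, copied from the output above:
# atomic number and electronegativity for Cs, Pb and I.
el_features = {"Cs": [55.0, 0.79], "Pb": [82.0, 2.33], "I": [53.0, 2.66]}
stoich = {"Cs": 1.0, "Pb": 1.0, "I": 3.0}

num_atoms = sum(stoich.values())
weights = {el: n / num_atoms for el, n in stoich.items()}
n_features = 2


def column(j):
    """Values of feature j for every element in the composition."""
    return [el_features[el][j] for el in stoich]


# Assumed definitions, chosen to match the values printed in this notebook
mean = [
    sum(weights[el] * el_features[el][j] for el in stoich)
    for j in range(n_features)
]
variance = [
    sum(weights[el] * (el_features[el][j] - mean[j]) ** 2 for el in stoich)
    for j in range(n_features)
]
minpool = [min(column(j)) for j in range(n_features)]
maxpool = [max(column(j)) for j in range(n_features)]
total = [
    sum(stoich[el] * el_features[el][j] for el in stoich)
    for j in range(n_features)
]

# Concatenating the statistics gives n_features * n_stats entries
feature_vector = mean + variance + minpool + maxpool + total
print(len(feature_vector))
print([round(x, 5) for x in feature_vector])
```

With the full 22-dimensional Magpie vectors, the same concatenation of five statistics gives 22 × 5 = 110 entries.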

In [6]:

# Print the mean feature vector
print(CsPbI3_magpie.feature_vector(stats="mean"))

[ 59.2         74.8        144.16377238 412.55        13.2
   5.4        161.4          2.22         1.8          3.4
   8.           2.8         16.           0.2          1.4
   0.           0.           1.6         54.584        0.6372
   0.         129.2       ]

In [7]:

print(CompositionalEmbedding(formula="NaCl", embedding="magpie").feature_vector())

[ 14.          48.          29.22138464 271.235        9.
   3.         134.           2.045        1.5          2.5
   0.           0.           4.           0.5          0.5
   0.           0.           1.          26.87041667   1.2465
   0.         146.5       ]

In [8]:

# Print the feature vector for the mean, variance, minpool, maxpool, and sum
CsPbI3_magpie_cbfv = CsPbI3_magpie.feature_vector(
    stats=["mean", "variance", "minpool", "maxpool", "sum"]
)
print(f"The dimension of the feature vector is {CsPbI3_magpie_cbfv.shape[0]}")

print(CsPbI3_magpie_cbfv)

The dimension of the feature vector is 110
[  59.2          74.8         144.16377238  412.55         13.2
    5.4         161.4           2.22          1.8           3.4
    8.            2.8          16.            0.2           1.4
    0.            0.            1.6          54.584         0.6372
    0.          129.2         130.56       1251.76        998.7932657
 9932.03104      38.56          0.24       1713.04          0.52756
    0.16          4.24         16.           31.36         74.4
    0.16          1.84          0.            0.            1.44
  969.102544      0.27068256    0.         6378.16         53.
    5.          126.90447     301.59          1.            5.
  139.            0.79          1.            0.            0.
    0.            1.            0.            0.            0.
    0.            1.           28.11          0.            0.
   64.           82.           96.          207.2         600.61
   17.            6.          244.            2.66          2.
    5.           10.           14.           28.            1.
    4.            0.            0.            4.          115.765
    1.062         0.          229.          296.          374.
  720.8188619  2062.75         66.           27.          807.
   11.1           9.           17.           40.           14.
   80.            1.            7.            0.            0.
    8.          272.92          3.186         0.          646.        ]
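
The output above shows the statistics concatenated in the order they were requested, so each block can be recovered by slicing. Here is a small sketch, using a stand-in list in place of CsPbI3_magpie_cbfv so that it runs without the library. The names cbfv and blocks are only illustrative.

```python
stats = ["mean", "variance", "minpool", "maxpool", "sum"]
embedding_dim = 22  # length of each Magpie element vector in the output above

# Stand-in for CsPbI3_magpie_cbfv: any flat sequence of length 110 works here
cbfv = list(range(embedding_dim * len(stats)))

# Slice the flat vector into one block of embedding_dim values per statistic
blocks = {
    name: cbfv[i * embedding_dim : (i + 1) * embedding_dim]
    for i, name in enumerate(stats)
}

print({name: len(block) for name, block in blocks.items()})
```

The same slicing should work on the numpy array returned by feature_vector, since numpy arrays support this indexing.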

We can also featurise multiple formulas at once using the composition_featuriser function.

>>> composition_featuriser(["CsPbI3", "Fe2O3", "NaCl"], embedding='magpie')

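This will return a list of numpy arrays, one feature vector per composition, in the same order as the formulas in the input list. When a pandas DataFrame holding the formulas is passed instead, as in the second cell below, the result is a DataFrame with one named column per feature.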

In [9]:

formulas = ["CsPbI3", "Fe2O3", "NaCl"]

composition_featuriser(formulas, embedding="magpie", stats="mean")

Out[9]:

[array([ 59.2       ,  74.8       , 144.16377238, 412.55      ,
         13.2       ,   5.4       , 161.4       ,   2.22      ,
          1.8       ,   3.4       ,   8.        ,   2.8       ,
         16.        ,   0.2       ,   1.4       ,   0.        ,
          0.        ,   1.6       ,  54.584     ,   0.6372    ,
          0.        , 129.2       ]),
 array([ 15.2       ,  74.2       ,  31.93764   , 757.28      ,
         12.8       ,   2.8       ,  92.4       ,   2.796     ,
          2.        ,   2.4       ,   2.4       ,   0.        ,
          6.8       ,   0.        ,   1.2       ,   1.6       ,
          0.        ,   2.8       ,   9.755     ,   0.        ,
          0.84426512,  98.8       ]),
 array([ 14.        ,  48.        ,  29.22138464, 271.235     ,
          9.        ,   3.        , 134.        ,   2.045     ,
          1.5       ,   2.5       ,   0.        ,   0.        ,
          4.        ,   0.5       ,   0.5       ,   0.        ,
          0.        ,   1.        ,  26.87041667,   1.2465    ,
          0.        , 146.5       ])]

In [10]:

df = pd.DataFrame({"formula": formulas})
composition_featuriser(df, embedding="magpie", stats=["mean", "sum"])

Featurising compositions...

Computing feature vectors...

Out[10]:

	formula	mean_Number	mean_MendeleevNumber	mean_AtomicWeight	mean_MeltingT	mean_Column	mean_Row	mean_CovalentRadius	mean_Electronegativity	mean_NsValence	...	sum_NValence	sum_NsUnfilled	sum_NpUnfilled	sum_NdUnfilled	sum_NUnfilled	sum_GSvolume_pa	sum_GSbandgap	sum_GSmagmom	sum_SpaceGroupNumber
0	CsPbI3	59.2	74.8	144.163772	412.550	13.2	5.4	161.4	2.220	1.8	...	80.0	1.0	7.0	0.0	8.0	272.920000	3.186	0.000000	646.0
1	Fe2O3	15.2	74.2	31.937640	757.280	12.8	2.8	92.4	2.796	2.0	...	34.0	0.0	6.0	8.0	14.0	48.775000	0.000	4.221326	494.0
2	NaCl	14.0	48.0	29.221385	271.235	9.0	3.0	134.0	2.045	1.5	...	8.0	1.0	1.0	0.0	2.0	53.740833	2.493	0.000000	293.0

3 rows × 45 columns

We can also calculate the "distance" between two compositions using their feature vectors. This can be used to determine which compositions are more similar to each other.

In [11]:

# Compare CsPbI3 with three other compositions using mean-pooled vectors
for other in ["Fe2O3", "NaCl", "CsPbCl3"]:
    dist = CsPbI3_magpie.distance(other, distance_metric="euclidean", stats="mean")
    print(f"The euclidean distance between CsPbI3 and {other} is {dist:.2f}")

The euclidean distance between CsPbI3 and Fe2O3 is 375.77
The euclidean distance between CsPbI3 and NaCl is 194.94
The euclidean distance between CsPbI3 and CsPbCl3 is 144.39

Based on the mean-pooled feature vectors, CsPbI3 is closer to CsPbCl3 (144.39) than to NaCl (194.94) or Fe2O3 (375.77), so of the three compositions it is most similar to CsPbCl3.
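
The Euclidean distance itself is the square root of the summed squared differences between the two feature vectors. As a check that does not rely on the library, the sketch below applies the standard library function math.dist to the mean-pooled vectors for CsPbI3 and NaCl printed earlier in this notebook. The variable names are illustrative.

```python
import math

# Mean-pooled Magpie vectors copied from the outputs printed earlier
cspbi3 = [59.2, 74.8, 144.16377238, 412.55, 13.2, 5.4, 161.4, 2.22, 1.8, 3.4, 8.0,
          2.8, 16.0, 0.2, 1.4, 0.0, 0.0, 1.6, 54.584, 0.6372, 0.0, 129.2]
nacl = [14.0, 48.0, 29.22138464, 271.235, 9.0, 3.0, 134.0, 2.045, 1.5, 2.5, 0.0,
        0.0, 4.0, 0.5, 0.5, 0.0, 0.0, 1.0, 26.87041667, 1.2465, 0.0, 146.5]

# Euclidean distance: square root of the summed squared differences
d = math.dist(cspbi3, nacl)
print(f"{d:.2f}")
```

The result agrees with the value of 194.94 reported for CsPbI3 and NaCl. Note that features with large magnitudes, such as the melting temperature and atomic weight, dominate this unscaled distance.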