Using the composition module
import pandas as pd
from elementembeddings.composition import composition_featuriser
from elementembeddings.composition import CompositionalEmbedding
import numpy as np
np.set_printoptions(suppress=True)
The core class of the elementembeddings.composition module is the CompositionalEmbedding class.
We can use this class to create objects which represent a composition together with an elemental representation.
We can create an instance of this class as follows:
CsPbI3_magpie = CompositionalEmbedding(formula="CsPbI3", embedding="magpie")
We can access the elemental embeddings of the individual elements in the composition from the el_matrix attribute.
>>> CsPbI3_magpie.el_matrix
# Print the individual element feature vectors
print(CsPbI3_magpie.el_matrix)
[[ 55. 5. 132.9054519 301.59 1. 6. 244. 0.79 1. 0. 0. 0. 1. 1. 0. 0. 0. 1. 115.765 0. 0. 229. ]
 [ 82. 81. 207.2 600.61 14. 6. 146. 2.33 2. 2. 10. 14. 28. 0. 4. 0. 0. 4. 28.11 0. 0. 225. ]
 [ 53. 96. 126.90447 386.85 17. 5. 139. 2.66 2. 5. 10. 0. 17. 0. 1. 0. 0. 1. 43.015 1.062 0. 64. ]]
Two of the accessible properties are composition and fractional_composition, which are dictionaries mapping each element to its amount (absolute and normalised, respectively).
# Print the composition and the fractional composition
print(CsPbI3_magpie.composition)
print(CsPbI3_magpie.fractional_composition)
defaultdict(<class 'float'>, {'Cs': 1.0, 'Pb': 1.0, 'I': 3.0})
{'Cs': 0.2, 'Pb': 0.2, 'I': 0.6}
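To make the relationship between the two dictionaries concrete, here is a minimal sketch in plain Python. The helpers parse_formula and fractional are hypothetical names invented for this illustration and are not part of elementembeddings. The parser only handles simple formulas (element symbols followed by optional amounts), which is an assumption of the sketch.

```python
import re
from collections import defaultdict


def parse_formula(formula):
    """Hypothetical helper: parse a simple formula such as 'CsPbI3'.

    Only element symbols followed by optional amounts are supported
    (no brackets, hydrates, or charges).
    """
    composition = defaultdict(float)
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        composition[symbol] += float(amount) if amount else 1.0
    return composition


def fractional(composition):
    """Divide each amount by the total number of atoms."""
    total = sum(composition.values())
    return {element: amount / total for element, amount in composition.items()}


composition = parse_formula("CsPbI3")
fractional_composition = fractional(composition)
print(dict(composition))
print(fractional_composition)
```

For CsPbI3 the total is 5 atoms, so the fractions are 1/5, 1/5 and 3/5, in line with the output printed by the class.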
Other accessible properties and attributes include the list of elements, the stoichiometry and the normalised stoichiometry (each represented as a vector), and the number of atoms.
# Print the list of elements
print(CsPbI3_magpie.element_list)
# Print the stoichiometric vector
print(CsPbI3_magpie.stoich_vector)
# Print the normalized stoichiometric vector
print(CsPbI3_magpie.norm_stoich_vector)
# Print the number of atoms
print(CsPbI3_magpie.num_atoms)
['Cs', 'Pb', 'I']
[1. 1. 3.]
[0.2 0.2 0.6]
5.0
We can create composition-based feature vectors using the feature_vector method.
>>> CsPbI3_magpie.feature_vector()
By default, this will return the weighted average of the elemental embeddings of the composition. This would have the same dimension as the individual elemental embeddings.
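As a rough illustration of the weighted average, the sketch below recomputes it by hand in plain Python. It assumes the weights are the normalised stoichiometry, and it uses only the first three Magpie features (atomic number, Mendeleev number, atomic weight) copied from the el_matrix output for CsPbI3. It is not the library's implementation.

```python
# Rows copied from the el_matrix output, truncated to the first three features.
el_matrix = [
    [55.0, 5.0, 132.9054519],  # Cs
    [82.0, 81.0, 207.2],       # Pb
    [53.0, 96.0, 126.90447],   # I
]
# Normalised stoichiometry of CsPbI3 (1:1:3 out of 5 atoms).
norm_stoich = [0.2, 0.2, 0.6]

# Weighted average of each feature column.
mean_vector = [
    sum(weight * row[j] for weight, row in zip(norm_stoich, el_matrix))
    for j in range(len(el_matrix[0]))
]
print([round(value, 4) for value in mean_vector])
```

The three values come out as approximately 59.2, 74.8 and 144.1638, which agrees with the first three entries that feature_vector(stats="mean") reports for CsPbI3.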
We can also specify the type of feature vector we want to create by passing the stats argument.
>>> CsPbI3_magpie.feature_vector(stats=['mean', 'variance'])
This would return a feature vector which is the concatenation of the mean and variance of the elemental embeddings of the composition. This would have twice the dimension of the individual elemental embeddings. In general, the dimension of the feature vector is the product of the dimension of the elemental embeddings and the number of statistics requested.
The available statistics are:
mean
variance
minpool
maxpool
sum
range
harmonic_mean
geometric_mean
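The sketch below shows one way five of these statistics could be computed for a single feature, the atomic number, in CsPbI3. The formulas are inferred from the CsPbI3 outputs in this section, not taken from the library source, so treat them as assumptions: mean and variance weighted by the normalised stoichiometry, sum weighted by the raw stoichiometry, and minpool/maxpool taken over the elements present. range, harmonic_mean and geometric_mean are left out because their weighting is not evident from the outputs.

```python
# Atomic numbers of Cs, Pb and I (first column of el_matrix) and the
# stoichiometry of CsPbI3.
values = [55.0, 82.0, 53.0]
stoich = [1.0, 1.0, 3.0]
weights = [amount / sum(stoich) for amount in stoich]


def pool(stat):
    """Pool one feature over the elements (assumed formulas, see lead-in)."""
    mean = sum(w * v for w, v in zip(weights, values))
    if stat == "mean":
        return mean
    if stat == "variance":
        return sum(w * (v - mean) ** 2 for w, v in zip(weights, values))
    if stat == "minpool":
        return min(values)
    if stat == "maxpool":
        return max(values)
    if stat == "sum":
        return sum(s * v for s, v in zip(stoich, values))
    raise ValueError(f"statistic not covered by this sketch: {stat}")


stats = ["mean", "variance", "minpool", "maxpool", "sum"]
pooled = [pool(stat) for stat in stats]
print([round(value, 2) for value in pooled])
```

With one feature and five statistics the result has 1 × 5 = 5 entries (approximately 59.2, 130.56, 53, 82 and 296). With the 22 Magpie features, the same five statistics give 22 × 5 = 110 entries.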
# Print the mean feature vector
print(CsPbI3_magpie.feature_vector(stats="mean"))
[ 59.2 74.8 144.16377238 412.55 13.2 5.4 161.4 2.22 1.8 3.4 8. 2.8 16. 0.2 1.4 0. 0. 1.6 54.584 0.6372 0. 129.2 ]
print(CompositionalEmbedding(formula="NaCl", embedding="magpie").feature_vector())
[ 14. 48. 29.22138464 271.235 9. 3. 134. 2.045 1.5 2.5 0. 0. 4. 0.5 0.5 0. 0. 1. 26.87041667 1.2465 0. 146.5 ]
# Print the feature vector for the mean, variance, minpool, maxpool, and sum
CsPbI3_magpie_cbfv = CsPbI3_magpie.feature_vector(
    stats=["mean", "variance", "minpool", "maxpool", "sum"]
)
print(f"The dimension of the feature vector is {CsPbI3_magpie_cbfv.shape[0]}")
print(CsPbI3_magpie_cbfv)
The dimension of the feature vector is 110
[ 59.2 74.8 144.16377238 412.55 13.2 5.4 161.4 2.22 1.8 3.4 8. 2.8 16. 0.2 1.4 0. 0. 1.6 54.584 0.6372 0. 129.2
 130.56 1251.76 998.7932657 9932.03104 38.56 0.24 1713.04 0.52756 0.16 4.24 16. 31.36 74.4 0.16 1.84 0. 0. 1.44 969.102544 0.27068256 0. 6378.16
 53. 5. 126.90447 301.59 1. 5. 139. 0.79 1. 0. 0. 0. 1. 0. 0. 0. 0. 1. 28.11 0. 0. 64.
 82. 96. 207.2 600.61 17. 6. 244. 2.66 2. 5. 10. 14. 28. 1. 4. 0. 0. 4. 115.765 1.062 0. 229.
 296. 374. 720.8188619 2062.75 66. 27. 807. 11.1 9. 17. 40. 14. 80. 1. 7. 0. 0. 8. 272.92 3.186 0. 646. ]
We can also featurise multiple formulas at once using the composition_featuriser function.
>>> composition_featuriser(["CsPbI3", "Fe2O3", "NaCl"], embedding='magpie')
This will return the feature vectors of the compositions as numpy arrays. The order of the feature vectors will be the same as the order of the formulas in the input list.
formulas = ["CsPbI3", "Fe2O3", "NaCl"]
composition_featuriser(formulas, embedding="magpie", stats="mean")
[array([ 59.2 , 74.8 , 144.16377238, 412.55 , 13.2 , 5.4 , 161.4 , 2.22 , 1.8 , 3.4 , 8. , 2.8 , 16. , 0.2 , 1.4 , 0. , 0. , 1.6 , 54.584 , 0.6372 , 0. , 129.2 ]),
 array([ 15.2 , 74.2 , 31.93764 , 757.28 , 12.8 , 2.8 , 92.4 , 2.796 , 2. , 2.4 , 2.4 , 0. , 6.8 , 0. , 1.2 , 1.6 , 0. , 2.8 , 9.755 , 0. , 0.84426512, 98.8 ]),
 array([ 14. , 48. , 29.22138464, 271.235 , 9. , 3. , 134. , 2.045 , 1.5 , 2.5 , 0. , 0. , 4. , 0.5 , 0.5 , 0. , 0. , 1. , 26.87041667, 1.2465 , 0. , 146.5 ])]
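Conceptually, batch featurisation applies the same pooling to each formula in turn and keeps the input order. The sketch below illustrates this with a toy two-feature embedding (atomic number and periodic-table row) standing in for Magpie. The names toy_embedding, compositions and mean_features are hypothetical and exist only in this example. The compositions are written out by hand, not parsed.

```python
# Toy two-feature embedding: [atomic number, periodic-table row].
toy_embedding = {
    "Cs": [55.0, 6.0], "Pb": [82.0, 6.0], "I": [53.0, 5.0],
    "Fe": [26.0, 4.0], "O": [8.0, 2.0],
    "Na": [11.0, 3.0], "Cl": [17.0, 3.0],
}
# Hand-written compositions for the three formulas.
compositions = {
    "CsPbI3": {"Cs": 1, "Pb": 1, "I": 3},
    "Fe2O3": {"Fe": 2, "O": 3},
    "NaCl": {"Na": 1, "Cl": 1},
}


def mean_features(composition):
    """Stoichiometry-weighted mean of each toy feature."""
    total = sum(composition.values())
    n_features = len(next(iter(toy_embedding.values())))
    return [
        sum(amount / total * toy_embedding[element][j]
            for element, amount in composition.items())
        for j in range(n_features)
    ]


formulas = ["CsPbI3", "Fe2O3", "NaCl"]
# One feature vector per formula, in the same order as the input list.
features = [mean_features(compositions[formula]) for formula in formulas]
for formula, vector in zip(formulas, features):
    print(formula, [round(value, 2) for value in vector])
```

The resulting pairs (about 59.2 and 5.4 for CsPbI3, 15.2 and 2.8 for Fe2O3, 14 and 3 for NaCl) correspond to the atomic-number and row entries of the Magpie mean vectors returned by composition_featuriser.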
df = pd.DataFrame({"formula": formulas})
composition_featuriser(df, embedding="magpie", stats=["mean", "sum"])
 | formula | mean_Number | mean_MendeleevNumber | mean_AtomicWeight | mean_MeltingT | mean_Column | mean_Row | mean_CovalentRadius | mean_Electronegativity | mean_NsValence | ... | sum_NValence | sum_NsUnfilled | sum_NpUnfilled | sum_NdUnfilled | sum_NfUnfilled | sum_NUnfilled | sum_GSvolume_pa | sum_GSbandgap | sum_GSmagmom | sum_SpaceGroupNumber |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | CsPbI3 | 59.2 | 74.8 | 144.163772 | 412.550 | 13.2 | 5.4 | 161.4 | 2.220 | 1.8 | ... | 80.0 | 1.0 | 7.0 | 0.0 | 0.0 | 8.0 | 272.920000 | 3.186 | 0.000000 | 646.0 |
1 | Fe2O3 | 15.2 | 74.2 | 31.937640 | 757.280 | 12.8 | 2.8 | 92.4 | 2.796 | 2.0 | ... | 34.0 | 0.0 | 6.0 | 8.0 | 0.0 | 14.0 | 48.775000 | 0.000 | 4.221326 | 494.0 |
2 | NaCl | 14.0 | 48.0 | 29.221385 | 271.235 | 9.0 | 3.0 | 134.0 | 2.045 | 1.5 | ... | 8.0 | 1.0 | 1.0 | 0.0 | 0.0 | 2.0 | 53.740833 | 2.493 | 0.000000 | 293.0 |
3 rows × 45 columns
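The column count follows from the statistics requested. Judging from the table header, each column appears to be named by joining the statistic and the feature name with an underscore. The sketch below reproduces that pattern for three of the feature names visible in the header. The naming rule is an inference from the output, not something confirmed from the library source.

```python
stats = ["mean", "sum"]
# A few of the 22 Magpie feature names, as they appear in the table header.
feature_names = ["Number", "MendeleevNumber", "AtomicWeight"]

# All columns for the first statistic come first, then the next statistic.
columns = ["formula"] + [
    f"{stat}_{feature}" for stat in stats for feature in feature_names
]
print(columns)

# With all 22 features: one formula column plus 22 columns per statistic.
n_columns_full = 1 + len(stats) * 22
print(n_columns_full)
```

With two statistics and 22 features this gives 1 + 2 × 22 = 45 columns, which is the width of the DataFrame above.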
We can also calculate the "distance" between two compositions using their feature vectors. This can be used to determine which compositions are more similar to each other.
print(
    f"The euclidean distance between CsPbI3 and Fe2O3 is {CsPbI3_magpie.distance('Fe2O3', distance_metric='euclidean', stats='mean'):.2f}"
)
print(
    f"The euclidean distance between CsPbI3 and NaCl is {CsPbI3_magpie.distance('NaCl', distance_metric='euclidean', stats='mean'):.2f}"
)
print(
    f"The euclidean distance between CsPbI3 and CsPbCl3 is {CsPbI3_magpie.distance('CsPbCl3', distance_metric='euclidean', stats='mean'):.2f}"
)
The euclidean distance between CsPbI3 and Fe2O3 is 375.77
The euclidean distance between CsPbI3 and NaCl is 194.94
The euclidean distance between CsPbI3 and CsPbCl3 is 144.39
Based on the mean-pooled feature vectors, CsPbI3 is closer to CsPbCl3 (144.39) than to NaCl (194.94) or Fe2O3 (375.77), so CsPbI3 and CsPbCl3 are the most similar of the pairs compared.
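The Euclidean distances for Fe2O3 and NaCl can be checked by hand from the mean-pooled vectors printed earlier by composition_featuriser. The sketch below uses math.dist from the Python standard library (Python 3.8+) on those printed values. It is a plain recomputation and does not call the library's distance method. CsPbCl3 is omitted because its mean vector is not printed in this section.

```python
import math

# Mean-pooled Magpie vectors copied from the composition_featuriser output.
cspbi3 = [59.2, 74.8, 144.16377238, 412.55, 13.2, 5.4, 161.4, 2.22, 1.8, 3.4,
          8.0, 2.8, 16.0, 0.2, 1.4, 0.0, 0.0, 1.6, 54.584, 0.6372, 0.0, 129.2]
fe2o3 = [15.2, 74.2, 31.93764, 757.28, 12.8, 2.8, 92.4, 2.796, 2.0, 2.4,
         2.4, 0.0, 6.8, 0.0, 1.2, 1.6, 0.0, 2.8, 9.755, 0.0, 0.84426512, 98.8]
nacl = [14.0, 48.0, 29.22138464, 271.235, 9.0, 3.0, 134.0, 2.045, 1.5, 2.5,
        0.0, 0.0, 4.0, 0.5, 0.5, 0.0, 0.0, 1.0, 26.87041667, 1.2465, 0.0, 146.5]

# Euclidean distance: square root of the summed squared differences.
d_fe2o3 = math.dist(cspbi3, fe2o3)
d_nacl = math.dist(cspbi3, nacl)
print(f"CsPbI3 vs Fe2O3: {d_fe2o3:.2f}")
print(f"CsPbI3 vs NaCl: {d_nacl:.2f}")
```

Because the features are not rescaled, features with large numeric ranges such as the melting temperature and atomic weight dominate these distances.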