# Using the composition module

In [None]:
import pandas as pd
from elementembeddings.composition import composition_featuriser
from elementembeddings.composition import CompositionalEmbedding
import numpy as np

np.set_printoptions(suppress=True)

The core class of the `elementembeddings.composition` module is the `CompositionalEmbedding` class.
We can use this class to create objects which represent a composition and an elemental representation.
We can create an instance of this class as follows:

```python
CsPbI3_magpie = CompositionalEmbedding(formula='CsPbI3', embedding='magpie')
```


In [None]:
CsPbI3_magpie = CompositionalEmbedding(formula="CsPbI3", embedding="magpie")

We can access the elemental embeddings of the individual elements in the composition from the `el_matrix` attribute.
```python
>>> CsPbI3_magpie.el_matrix
```

In [None]:
# Print the individual element feature vectors
print(CsPbI3_magpie.el_matrix)

Two of the accessible properties are `composition` and `fractional_composition`, which are dictionaries of element: amount key-value pairs.

In [None]:
# Print the composition and the fractional composition
print(CsPbI3_magpie.composition)
print(CsPbI3_magpie.fractional_composition)

Other accessible properties and attributes include the list of elements, the stoichiometry represented as a vector (both raw and normalised), and the number of atoms.


In [None]:
# Print the list of elements
print(CsPbI3_magpie.element_list)
# Print the stoichiometric vector
print(CsPbI3_magpie.stoich_vector)

# Print the normalized stoichiometric vector
print(CsPbI3_magpie.norm_stoich_vector)

# Print the number of atoms
print(CsPbI3_magpie.num_atoms)
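To make the relationship between these quantities concrete, here is a minimal plain-Python sketch that starts from a composition dictionary for CsPbI3. It does not call the library; the variable names simply mirror the attributes printed above, and the element ordering is an assumption that may differ from what `CompositionalEmbedding` uses internally.

```python
# Illustrative only: composition of CsPbI3 as element -> atoms per formula unit.
composition = {"Cs": 1.0, "Pb": 1.0, "I": 3.0}

# Total number of atoms in one formula unit.
num_atoms = sum(composition.values())

# Fractional composition: each amount divided by the total.
fractional_composition = {el: amt / num_atoms for el, amt in composition.items()}

# Vector forms follow a fixed element order (assumed here: insertion order).
element_list = list(composition)
stoich_vector = [composition[el] for el in element_list]
norm_stoich_vector = [amt / num_atoms for amt in stoich_vector]

print(num_atoms)               # 5.0
print(fractional_composition)  # {'Cs': 0.2, 'Pb': 0.2, 'I': 0.6}
print(norm_stoich_vector)      # [0.2, 0.2, 0.6]
```

The normalised stoichiometric vector always sums to 1, which is what allows it to act as a set of weights when pooling element vectors.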

We can create composition-based feature vectors using the `feature_vector` method.
```python
>>> CsPbI3_magpie.feature_vector()
```
By default, this will return the weighted average of the elemental embeddings of the composition. This would have the same dimension as the individual elemental embeddings.
We can also specify the type of feature vector we want to create by passing the `stats` argument.
```python
>>> CsPbI3_magpie.feature_vector(stats=['mean', 'variance'])
```
This would return a feature vector which is the concatenation of the mean and variance of the elemental embeddings of the composition. This would have twice the dimension of the individual elemental embeddings. In general, the dimension of the feature vector is the product of the dimension of the elemental embeddings and the number of statistics requested.

The available statistics are:
- `mean`
- `variance`
- `minpool`
- `maxpool`
- `sum`
- `range`
- `harmonic_mean`
- `geometric_mean`
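
As a rough illustration of how pooled statistics are concatenated, the sketch below computes a composition-weighted mean and variance for two made-up two-dimensional element vectors. The element names, numbers, and helper functions are hypothetical, and weighting the variance by fractional composition is an assumption about the convention; the library's own implementation may differ in detail.

```python
# Hypothetical 2-dimensional element vectors (made-up values, not Magpie data).
el_vectors = {"A": [1.0, 10.0], "B": [3.0, 30.0]}
# Fractional composition used as weights; must sum to 1.
fractions = {"A": 0.25, "B": 0.75}


def weighted_mean(vectors, weights):
    """Composition-weighted average of the element vectors, per dimension."""
    dim = len(next(iter(vectors.values())))
    return [
        sum(weights[el] * vec[i] for el, vec in vectors.items())
        for i in range(dim)
    ]


def weighted_variance(vectors, weights):
    """Composition-weighted variance about the weighted mean, per dimension."""
    mean = weighted_mean(vectors, weights)
    return [
        sum(weights[el] * (vec[i] - mean[i]) ** 2 for el, vec in vectors.items())
        for i in range(len(mean))
    ]


# Concatenate the requested statistics in order: mean first, then variance.
feature = weighted_mean(el_vectors, fractions) + weighted_variance(el_vectors, fractions)

print(feature)       # [2.5, 25.0, 0.75, 75.0]
print(len(feature))  # 4 = 2 dimensions x 2 statistics
```

The length of the result is the embedding dimension multiplied by the number of statistics, matching the rule stated above.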



In [None]:
# Print the mean feature vector
print(CsPbI3_magpie.feature_vector(stats="mean"))

In [None]:
print(CompositionalEmbedding(formula="NaCl", embedding="magpie").feature_vector())

In [None]:
# Print the feature vector for the mean, variance, minpool, maxpool, and sum
CsPbI3_magpie_cbfv = CsPbI3_magpie.feature_vector(
    stats=["mean", "variance", "minpool", "maxpool", "sum"]
)
print(f"The dimension of the feature vector is {CsPbI3_magpie_cbfv.shape[0]}")

print(CsPbI3_magpie_cbfv)

We can also featurise multiple formulas at once using the `composition_featuriser` function.
```python
>>> composition_featuriser(["CsPbI3", "Fe2O3", "NaCl"], embedding='magpie')
```
This will return a `numpy` array of the feature vectors of the compositions. The order of the feature vectors will be the same as the order of the formulas in the input list.


In [None]:
formulas = ["CsPbI3", "Fe2O3", "NaCl"]

composition_featuriser(formulas, embedding="magpie", stats="mean")

In [None]:
df = pd.DataFrame({"formula": formulas})
composition_featuriser(df, embedding="magpie", stats=["mean", "sum"])

We can also calculate the "distance" between two compositions using their feature vectors. This can be used to determine which compositions are more similar to each other.
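
As a point of reference, the Euclidean metric is the square root of the summed squared differences between two vectors of equal length. The sketch below shows this on made-up vectors; `euclidean_distance` is a hypothetical helper written for illustration, not a function from `elementembeddings`.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Made-up feature vectors: `near` lies closer to `reference` than `far` does.
reference = [0.0, 0.0]
near = [3.0, 4.0]
far = [6.0, 8.0]

print(euclidean_distance(reference, near))  # 5.0
print(euclidean_distance(reference, far))   # 10.0
```

A smaller distance means the two feature vectors, and by extension the two compositions under the chosen embedding and statistics, are more alike.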

In [None]:
print(
    f"The Euclidean distance between CsPbI3 and Fe2O3 is {CsPbI3_magpie.distance('Fe2O3', distance_metric='euclidean', stats='mean'):.2f}"
)
print(
    f"The Euclidean distance between CsPbI3 and NaCl is {CsPbI3_magpie.distance('NaCl', distance_metric='euclidean', stats='mean'):.2f}"
)
print(
    f"The Euclidean distance between CsPbI3 and CsPbCl3 is {CsPbI3_magpie.distance('CsPbCl3', distance_metric='euclidean', stats='mean'):.2f}"
)

Based on the distances between the mean-pooled feature vectors, we can see that CsPbI3 is more similar to CsPbCl3 than to Fe2O3.