Session: 04-24-01: Advancing Composite Materials through Integrated Multiscale Modeling and Experimental Techniques
Paper Number: 145308
145308 - Unsupervised Classification of Perovskite Crystal Structures from XRD Data
Perovskite materials hold immense promise for applications in photovoltaics, optoelectronics, and photonics. However, accurately characterizing their crystal structures, which directly influence these properties, remains a challenge. Traditionally, analyzing these structures from X-ray diffraction (XRD) data relies on labor-intensive steps such as crystallographic study and phase identification. These methods can be computationally demanding and susceptible to human error. Accordingly, this work explores the potential of unsupervised machine learning (ML) for automated and efficient classification of perovskite compounds. By applying unsupervised ML algorithms to raw XRD data, we classify crystal structures directly, bypassing the need for complex traditional procedures. Unlike supervised learning approaches, which require labeled datasets, unsupervised ML algorithms such as K-means clustering can identify inherent patterns within the complex XRD data of perovskites. These patterns can reveal underlying crystallographic information without the need for pre-existing classifications. By analyzing the intrinsic relationships between features extracted from the XRD data (e.g., peak positions and intensities), K-means clustering can group similar perovskite structures together. This “on-the-fly” approach has the potential to streamline material characterization, enabling researchers to rapidly analyze large datasets, uncover hidden structural patterns, and gain insight into the structure-property relationships of perovskite materials.
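As an illustration of what "features extracted from the XRD data" could look like, the following minimal sketch pulls the positions and relative intensities of the strongest peaks from a single diffraction pattern. It is not the authors' feature set: the function name `peak_features`, its parameters, the simple local-maximum rule, and the synthetic three-peak pattern are all assumptions made for this example.

```python
import numpy as np


def peak_features(two_theta, intensity, n_peaks=3, min_rel_height=0.05):
    """Return [positions..., relative heights...] of the strongest peaks.

    Hypothetical helper: a peak is any sample that is strictly higher than
    both neighbours and at least `min_rel_height` of the global maximum.
    """
    two_theta = np.asarray(two_theta, dtype=float)
    rel = np.asarray(intensity, dtype=float)
    rel = rel / rel.max()  # scale so the strongest reflection equals 1.0

    inner = rel[1:-1]
    is_peak = (inner > rel[:-2]) & (inner > rel[2:]) & (inner >= min_rel_height)
    idx = np.flatnonzero(is_peak) + 1  # +1 restores indices into `rel`

    # Keep the n_peaks strongest maxima, then order them by angle.
    strongest = idx[np.argsort(rel[idx])[::-1][:n_peaks]]
    strongest.sort()

    # Zero-pad so every pattern yields a feature vector of the same length.
    positions = np.zeros(n_peaks)
    heights = np.zeros(n_peaks)
    positions[:strongest.size] = two_theta[strongest]
    heights[:strongest.size] = rel[strongest]
    return np.concatenate([positions, heights])


# Synthetic pattern (assumed): three Gaussian reflections on a 0.01 degree grid.
two_theta = np.linspace(10.0, 80.0, 7001)
pattern = np.zeros_like(two_theta)
for center, amplitude in [(22.0, 0.4), (31.5, 1.0), (45.0, 0.6)]:
    pattern += amplitude * np.exp(-0.5 * ((two_theta - center) / 0.2) ** 2)

features = peak_features(two_theta, pattern)
print(features)
```

Fixed-length vectors of this kind can be stacked row by row into a feature matrix and passed to a clustering algorithm. Real patterns would likely need noise handling and peak-width criteria that this sketch omits.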
The pymatgen library was used to extract data for perovskite compounds from the Materials Project repository, followed by initial preprocessing to prepare the data for machine learning models. The dataset comprised 4008 compounds with their respective XRD spectra. Next, we focused on feature engineering, extracting relevant information from the XRD data. We then performed K-means clustering on two datasets: the raw XRD data itself and the extracted features specific to perovskite materials. K-means is a well-known unsupervised machine-learning technique for identifying a dataset's classes without labels (in our case, the crystal structure class). We used the elbow method to identify the number of clusters for effectively grouping similar perovskite structures. We then employed the silhouette score to assess the effectiveness of our clustering methodology. Subsequently, we associated the identified clusters with crystal structures to reveal the correlation between the crystal structure and the assigned labels. Additionally, we computed accuracy metrics and the Matthews correlation coefficient. To visualize the clusters, we applied dimensionality reduction techniques, including Principal Component Analysis (PCA), to the XRD crystal structure data. Furthermore, we experimented with applying PCA before K-means clustering to investigate the impact of considering the entire dataset on the clustering outcome.
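The workflow described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic blobs from `make_blobs` standing in for the XRD matrix, the chosen cluster count is fixed by hand, and the majority-vote rule in the hypothetical helper `map_clusters_to_labels` is only one possible way to associate clusters with crystal structures.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score, matthews_corrcoef, silhouette_score

# Synthetic stand-in (assumed): rows are compounds, columns are XRD features.
# y_true plays the role of known crystal-structure classes, used only for scoring.
X, y_true = make_blobs(n_samples=300, centers=3, n_features=10,
                       cluster_std=2.0, random_state=0)

# Elbow method: record the within-cluster sum of squares for a range of k
# and look for the k after which the decrease flattens.
inertias = {}
for k in range(2, 8):
    inertias[k] = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_

k_chosen = 3  # assumed here; in practice read off the elbow curve
cluster_ids = KMeans(n_clusters=k_chosen, n_init=10, random_state=0).fit_predict(X)

# Silhouette score: label-free measure of cluster compactness and separation.
sil = silhouette_score(X, cluster_ids)


def map_clusters_to_labels(cluster_ids, y_true):
    """Assign each cluster the most frequent true class among its members."""
    mapping = {}
    for c in np.unique(cluster_ids):
        values, counts = np.unique(y_true[cluster_ids == c], return_counts=True)
        mapping[c] = values[np.argmax(counts)]
    return np.array([mapping[c] for c in cluster_ids])


# External validation against the known classes.
y_pred = map_clusters_to_labels(cluster_ids, y_true)
acc = accuracy_score(y_true, y_pred)
mcc = matthews_corrcoef(y_true, y_pred)

print(f"silhouette={sil:.2f}  accuracy={acc:.2f}  MCC={mcc:.2f}")
```

The silhouette score needs no labels, whereas accuracy and MCC require the known classes and therefore serve only to validate the clusters after the fact.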
By applying K-means clustering to the XRD dataset, we achieved an accuracy of 72%, accompanied by a silhouette score of 0.21 and a Matthews Correlation Coefficient (MCC) of 0.51. In contrast, applying PCA followed by K-means clustering resulted in a similar accuracy (72%), a silhouette score of 0.5, and an MCC of 0.5. The higher silhouette score suggests that dimensionality reduction with PCA can yield more compact, better-separated clusters in this specific case, while accuracy and MCC remain essentially unchanged.
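The comparison between clustering the full data and clustering after PCA can be sketched as follows. This is an illustrative example on synthetic data, assuming two principal components and three clusters. Neither value is stated in the abstract, and the helper `cluster_and_score` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

# Synthetic high-dimensional stand-in (assumed) for the XRD matrix.
X, _ = make_blobs(n_samples=300, centers=3, n_features=50,
                  cluster_std=3.0, random_state=0)


def cluster_and_score(data, n_clusters=3, seed=0):
    """Run K-means and return (labels, silhouette) in the space of `data`."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(data)
    return labels, silhouette_score(data, labels)


# Variant 1: K-means directly on the full feature space.
labels_raw, sil_raw = cluster_and_score(X)

# Variant 2: project onto the leading principal components, then cluster.
X_2d = PCA(n_components=2, random_state=0).fit_transform(X)
labels_pca, sil_pca = cluster_and_score(X_2d)

print(f"silhouette without PCA: {sil_raw:.2f}")
print(f"silhouette with PCA:    {sil_pca:.2f}")
```

Each silhouette score here is computed in the space where the clustering was run, so the two values are not strictly comparable: part of any increase after PCA comes from discarding low-variance directions. Label-based metrics such as accuracy and MCC do not depend on the embedding and give a fairer comparison.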
Further analysis of the feature dataset showed that K-means clustering followed by PCA yielded an accuracy of 70%, 69%, a silhouette score of 0.54, and an MCC of 0.412. Applying PCA before K-means clustering gave comparable values: an accuracy of 69%, a silhouette score of 0.5, and an MCC of 0.41. These outcomes underline the influence of feature selection and of the order of analytical steps on the accuracy of crystal structure identification in perovskite materials, indicating a potential pathway for optimizing unsupervised machine learning methodologies in materials science.
Presenting Author: Jayakumar Vandavasi Karunamurthy, Research and Development Centre, Dubai Electricity and Water Authority
Presenting Author Biography: Senior R&D Technologist - 4IR Area Lead & Head of IoT at Dubai Electricity & Water Authority - DEWA
Eng. Jayakumar Vandavasi Karunamurthy (Jayakumar) has been serving as Senior Principal Researcher and 4IR Area Lead since 2021. He earned a master's degree in engineering (applied electronics) from Anna University, India.
Eng. Jayakumar has more than 28 years of experience in fields including medical R&D, subsea ROVs, and IoT solution architecture. He has leveraged this experience and skill set to drive meaningful progress in DEWA's R&D journey towards Industry 4.0 readiness and the DEWA 2030 net-zero goal.
Authors:
Ansu Mathew, Research and Development Centre, Dubai Electricity and Water Authority
Ahmer A.B. Baloch, Research and Development Centre, Dubai Electricity and Water Authority
Alamin Mohammed Yakasai, Research and Development Centre, Dubai Electricity and Water Authority
Hemant Mittel, Research and Development Centre, Dubai Electricity and Water Authority
Vivian Alberts, Research and Development Centre, Dubai Electricity and Water Authority
Jayakumar Vandavasi Karunamurthy, Research and Development Centre, Dubai Electricity and Water Authority
Paper Type
Technical Paper Publication