
Enhancing Interpretable Anomaly Detection: Depth-based Extended Isolation Forest Feature Importance (DEIFFI)

Rahul Singh (1), Deepti Gupta (2)

  1. Computer Science and Engineering, UIET, Panjab University, Chandigarh, India.
  2. Computer Science and Engineering, UIET, Panjab University, Chandigarh, India.

Section: Research Paper, Product Type: Journal Paper
Volume-12, Issue-5, Page no. 59-67, May-2024

CrossRef-DOI: https://doi.org/10.26438/ijcse/v12i5.5967

Online published on May 31, 2024

Copyright © Rahul Singh, Deepti Gupta. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper

IEEE Style Citation: Rahul Singh, Deepti Gupta, “Enhancing Interpretable Anomaly Detection: Depth-based Extended Isolation Forest Feature Importance (DEIFFI),” International Journal of Computer Sciences and Engineering, Vol.12, Issue.5, pp.59-67, 2024.

MLA Style Citation: Rahul Singh, Deepti Gupta. "Enhancing Interpretable Anomaly Detection: Depth-based Extended Isolation Forest Feature Importance (DEIFFI)." International Journal of Computer Sciences and Engineering 12.5 (2024): 59-67.

APA Style Citation: Rahul Singh, Deepti Gupta (2024). Enhancing Interpretable Anomaly Detection: Depth-based Extended Isolation Forest Feature Importance (DEIFFI). International Journal of Computer Sciences and Engineering, 12(5), 59-67.

BibTex Style Citation:
@article{Singh_2024,
author = {Rahul Singh and Deepti Gupta},
title = {Enhancing Interpretable Anomaly Detection: Depth-based Extended Isolation Forest Feature Importance (DEIFFI)},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {5 2024},
volume = {12},
number = {5},
month = {5},
year = {2024},
issn = {2347-2693},
pages = {59--67},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=5693},
doi = {10.26438/ijcse/v12i5.5967},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v12i5.5967
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=5693
TI  - Enhancing Interpretable Anomaly Detection: Depth-based Extended Isolation Forest Feature Importance (DEIFFI)
T2  - International Journal of Computer Sciences and Engineering
AU  - Singh, Rahul
AU  - Gupta, Deepti
PY  - 2024
DA  - 2024/05/31
PB  - IJCSE, Indore, INDIA
SP  - 59
EP  - 67
IS  - 5
VL  - 12
SN  - 2347-2693
ER  -


Abstract

This research introduces Depth-based Extended Isolation Forest Feature Importance (DEIFFI), a novel approach that enhances the interpretability of the Extended Isolation Forest (EIF) algorithm in anomaly detection (AD). Anomaly detection is critical for identifying rare and significant deviations from the norm in data. However, understanding why an instance is classified as an anomaly remains a challenge. DEIFFI addresses this challenge by providing insights that let users of EIF-based AD conduct thorough root cause analysis. A noteworthy feature of DEIFFI is that it improves interpretability without imposing a heavy computational burden, which is crucial for real-world applications that require efficient AD, particularly those demanding real-time decision-making. On real and synthetic datasets, respectively, DEIFFI achieves an accuracy of 0.914 and 0.942, precision of 0.607 and 0.64, recall of 0.773 and 0.96, and an F1 score of 0.68 and 0.768. It thus provides interpretable insights alongside competitive performance metrics at low computational cost, supporting its suitability for real-time decision support. DEIFFI also contributes to AD by assisting in unsupervised feature selection. This dual capability highlights the practical utility of DEIFFI, improving EIF’s capabilities and extending its applicability across diverse AD scenarios.
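
The F1 scores reported in the abstract can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. The following is a minimal sketch of that arithmetic, not the authors' evaluation code; the function name f1_score and the dictionary layout are illustrative assumptions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the abstract; the dictionary structure is hypothetical.
reported = {
    "real": {"precision": 0.607, "recall": 0.773, "f1": 0.68},
    "synthetic": {"precision": 0.64, "recall": 0.96, "f1": 0.768},
}

for name, metrics in reported.items():
    computed = f1_score(metrics["precision"], metrics["recall"])
    print(f"{name}: computed F1 = {computed:.3f}, reported F1 = {metrics['f1']}")
```

For both datasets the computed value agrees with the reported F1 to three decimal places (0.680 and 0.768), so the stated precision, recall, and F1 figures are mutually consistent.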

Keywords / Index Terms

Anomaly Detection, Explainable Artificial Intelligence, Extended Isolation Forest, Feature Selection, Interpretability, Outlier Detection
