A-posteriori Analyses of Pattern Recognition Results

Langer, Horst; Falsaperla, Susanna; Hammer, Conny

Please use this identifier to cite or link to this item: http://hdl.handle.net/2122/13744

Authors:	Langer, Horst* Falsaperla, Susanna* Hammer, Conny*
Title:	A-posteriori Analyses of Pattern Recognition Results
Issue Date:	8-May-2020
Publisher:	EGU
URL:	https://meetingorganizer.copernicus.org/EGU2020/EGU2020-19751.html
DOI:	10.5194/egusphere-egu2020-19751
Keywords:	pattern recognition machine learning statistics data processing
Subject Classification:	05.06. Methods 05.01. Computational geophysics
Abstract:	Data-driven approaches applied to large and complex data sets are intriguing; however, their results must be reviewed critically. For example, a diagnostic tool may provide hints of a serious disease, or of anomalous conditions potentially indicating an impending natural hazard. The demand for a high rate of identified anomalies (true positives) comes together with the requirement of a low percentage of false positives. Indeed, a high rate of false positives can ruin the diagnostics. Receiver Operating Characteristic (ROC) curves allow us to find a reasonable compromise between the accuracy required of the diagnostics and robustness with respect to false alerts. In multiclass problems, success is commonly measured by the score, i.e. the fraction of patterns for which the calculated and target classifications match. A high score does not automatically mean that a method is truly effective. Its value becomes questionable when a random guess leads to a high score as well. The so-called “Kappa statistic” is an elegant way to assess the quality of a classification scheme. We present some case studies demonstrating how such a-posteriori analysis helps corroborate the results. Sometimes an approach does not lead to the desired success. In these cases, a sound a-posteriori analysis of the reasons for the failure often provides interesting insights into the problem. The causes may reside in an inappropriate definition of the targets, inadequate features, etc. Often the problems can be fixed just by adjusting some choices. Otherwise, a change of strategy may be necessary to achieve a more satisfying result. In the applications presented here, we highlight the pitfalls arising in particular from ill-defined targets and unsuitable feature selections. The validation of unsupervised learning is still a matter of debate. Some formal criteria (e.g. the Davies-Bouldin Index, the Silhouette Index or others) are available for centroid-based clustering, where a unique metric valid for all clusters can be defined. Difficulties arise when metrics are defined individually for each single cluster (for instance, Gaussian model clusters, adaptive criteria), as well as with schemes in which centroids are essentially meaningless. This is the case in density-based clustering. In all these cases, users are better off asking themselves whether a clustering is meaningful for the problem in physical terms. In our presentation we discuss the problem of choosing a suitable number of clusters in cases in which formal criteria are not applicable. We demonstrate how identifying groups of patterns helps identify elements that have a clear physical meaning, even when strict rules for assessing the clustering are not available.
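The abstract's point that a high score is questionable when a random guess scores highly as well can be illustrated with Cohen's kappa, one common form of the Kappa statistic. The sketch below is not taken from the authors' presentation: the function name `cohen_kappa`, the class labels and the 95/5 class split are assumptions chosen purely for illustration. It computes kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (the score) and p_e is the agreement expected by chance from the class frequencies.

```python
from collections import Counter


def cohen_kappa(targets, predictions):
    """Cohen's kappa: agreement between target and calculated classes,
    corrected for the agreement expected by chance."""
    if len(targets) != len(predictions) or len(targets) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(targets)
    # Observed agreement p_o: the plain classification score.
    observed = sum(t == p for t, p in zip(targets, predictions)) / n
    # Chance agreement p_e: product of the marginal class frequencies,
    # summed over all classes.
    target_counts = Counter(targets)
    predicted_counts = Counter(predictions)
    expected = sum(
        target_counts[c] * predicted_counts[c] for c in target_counts
    ) / (n * n)
    if expected == 1.0:
        # Degenerate case (a single class everywhere): kappa is undefined;
        # returning 0.0 is a convention assumed for this sketch.
        return 0.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical, strongly imbalanced data set: 95 background, 5 anomalies.
targets = ["background"] * 95 + ["anomaly"] * 5

# A "classifier" that always answers "background" scores 95 % ...
trivial = ["background"] * 100
trivial_score = sum(t == p for t, p in zip(targets, trivial)) / len(targets)
trivial_kappa = cohen_kappa(targets, trivial)  # ... yet kappa is 0

# A classifier that finds 4 of the 5 anomalies and raises 1 false alert.
useful = ["background"] * 94 + ["anomaly"] + ["anomaly"] * 4 + ["background"]
useful_kappa = cohen_kappa(targets, useful)

print(f"trivial: score={trivial_score:.2f}, kappa={trivial_kappa:.2f}")
print(f"useful:  kappa={useful_kappa:.2f}")
```

In this made-up example the trivial classifier reaches a score of 0.95 but a kappa of 0, because its agreement is no better than chance, whereas the second classifier is rewarded for the anomalies it actually identifies.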
Appears in Collections:	Conference materials

Files in This Item:

File	Description	Size	Format
EGU2020-19751-print.pdf	Abstract	292.38 kB	Adobe PDF
EGU2020-19751_presentation.pdf	Poster	1.17 MB	Adobe PDF
