Artworks and Articles Meet MAPPER and Persistent Homology
Presented by Alicia Ledesma Alonso and Hongyuan Zhang
Presentation design adapted from slidesgala.com https://slidesgala.com/sheldon/
Why TDA?
● Coordinate freeness
● Deformation invariance
● Compressed representations
Topology
● Topology and topological spaces
● Distance and metrics
● Simplicial complex
● Persistent homology
Pipeline
Raw data -> Cleaned/filtered data -> Analyze (Mapper, Persistent Homology)
What is persistent homology?
[Figure panels: Filtration example; Barcodes]
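To make the link between a filtration and its barcodes concrete, here is a minimal sketch that computes only the dimension-0 intervals of a Vietoris-Rips filtration with a union-find structure. The function name `h0_barcodes` and the three sample points are illustrative assumptions and do not come from the presentation; higher-dimensional intervals need a dedicated library.

```python
from itertools import combinations
from math import dist, inf


def h0_barcodes(points):
    """Return dimension-0 persistence intervals (birth, death).

    Every point is born at scale 0. As the scale grows, an edge appears
    when two points are within that distance; whenever an edge joins two
    separate components, one of them dies at that edge length.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the component representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by length: the order in which the
    # filtration adds them.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj            # merge the two components
            bars.append((0.0, length))  # one component dies here
    bars.append((0.0, inf))            # the last component never dies
    return bars


# Hypothetical sample: two nearby points and one far away.
sample = [(0, 0), (1, 0), (5, 0)]
print(h0_barcodes(sample))
```

Long bars correspond to clusters that stay separate over a wide range of scales, while short bars are usually read as noise.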
What is Mapper?
Ideally, we can recover the topological features of the original data cloud from the resulting simplicial complex.
Credit to: “A User’s Guide to Topological Data Analysis” by Elizabeth Munch
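The Mapper construction can be sketched in a few lines: apply a lens to the data, cover the lens range with overlapping intervals, cluster the points in each interval, and connect clusters that share points. The version below is a simplified sketch, assuming a one-dimensional lens and single-linkage clustering at a fixed threshold; the names `cover_intervals`, `single_linkage`, `mapper_graph` and all parameter defaults are hypothetical, and it only builds the 1-skeleton (graph) of the complex.

```python
from itertools import combinations
from math import dist


def cover_intervals(lo, hi, n_intervals, overlap):
    """Split [lo, hi] into equal intervals, each widened so that
    neighbours overlap by the given fraction of the interval length."""
    width = (hi - lo) / n_intervals
    pad = width * overlap / 2
    return [
        (lo + k * width - pad, lo + (k + 1) * width + pad)
        for k in range(n_intervals)
    ]


def single_linkage(indices, points, eps):
    """Group indices into connected components of the graph that links
    points at distance <= eps."""
    clusters = []
    unvisited = set(indices)
    while unvisited:
        stack = [unvisited.pop()]
        component = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unvisited if dist(points[i], points[j]) <= eps}
            unvisited -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, lens, n_intervals=4, overlap=0.5, eps=1.0):
    """Return (nodes, edges): nodes are clusters of point indices,
    edges join clusters that share at least one point."""
    values = [lens(p) for p in points]
    nodes = []
    for a, b in cover_intervals(min(values), max(values), n_intervals, overlap):
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage(members, points, eps))
    edges = {
        (s, t)
        for s, t in combinations(range(len(nodes)), 2)
        if nodes[s] & nodes[t]
    }
    return nodes, edges


# Hypothetical data: five points on a line, with the x-coordinate as lens.
line = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
nodes, edges = mapper_graph(line, lens=lambda p: p[0], n_intervals=2)
print(nodes, edges)
```

The lens, the number of intervals, the overlap, and the clustering threshold all change the resulting graph, which is why exploring several lenses is part of the workflow.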
arXiv
• arXiv Data
- arXiv online API and Amazon S3
• arXiv persistent homology
- Select random samples
- Identify persistent intervals
- Identify differences
• arXiv Mapper
- Color by academic categories
- Explore various lenses
- Compare
arXiv metric
How do we measure distance between two articles?
L. Carlsson, G. Carlsson, and M. Vejdemo-Johansson. Fibres of Failure: Classifying errors in predictive processes. arXiv e-prints, February 2018.
arXiv Persistent Homology - Dionysus
arXiv Color Function
Met
• Met Data
- Official MET GitHub
• Met persistent homology
- Select random samples
- Identify persistent intervals
- Identify differences
• Met Mapper
- Identify subgroups
- Select significant features
- Compare
Met Metric
Q: How do we measure distance between two artworks?
A: The data are of mixed type, so each type is measured with a different metric.
For categorical features -> Jaccard distance
For numerical features -> difference divided by max distance
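The mixed metric described above can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the slide does not say how the per-feature distances are combined, so the sketch assumes a plain average, and the field names `tags` and `year`, the value range, and the two sample records are hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |a & b| / |a | b|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(a | b)


def scaled_difference(x, y, max_range):
    """Absolute difference divided by the largest difference observed
    for that feature, so the result lies in [0, 1]."""
    if max_range == 0:
        return 0.0
    return abs(x - y) / max_range


def mixed_distance(u, v, categorical, numerical, ranges):
    """Average of the per-feature distances (the averaging step is an
    assumption; the slide only specifies the per-type metrics)."""
    parts = [jaccard_distance(u[k], v[k]) for k in categorical]
    parts += [scaled_difference(u[k], v[k], ranges[k]) for k in numerical]
    return sum(parts) / len(parts)


# Hypothetical records, not taken from the Met dataset.
work_a = {"tags": {"portrait", "men"}, "year": 1800}
work_b = {"tags": {"portrait", "women"}, "year": 1900}
d = mixed_distance(work_a, work_b, ["tags"], ["year"], {"year": 400})
print(round(d, 4))
```

Scaling each numerical difference by its maximum keeps every feature on the same [0, 1] scale as the Jaccard distance, so no single feature dominates the total.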
Met Mapper
Statistical Analysis
Model Comparison
Model 1: “Is Public Domain” ~ “Drawings and Prints”
Model 2: “Is Public Domain” ~ all variables
Model Accuracy Scores (using Python Sklearn score() method):
Model 1: 52.17%
Model 2: 73.83%
Mapper is effective in guiding feature selection!
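For scikit-learn classifiers, `score()` reports mean accuracy, that is, the fraction of samples whose predicted label matches the true label. The short sketch below spells out that computation in plain Python; the helper name `accuracy` and the toy labels are illustrative only and are unrelated to the Met results above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels: three of the four predictions are correct.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```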
Met Persistent Homology
[4, 5) is a relatively persistent interval for both groups in Dimension 1!
Persistent homology can help classification by comparing the number of persistent barcodes and the distributions of variables.
Thank you! Thank you to Professor Marcos Ortiz for his mentorship, Grinnell College and the NSF for providing funding, and the Department of Mathematics and Statistics of Grinnell College for providing this opportunity.