
Computational statistics and machine learning methods for the life sciences, with an application to COVID-19

Computational statistics and machine learning methods for the life sciences, with an application to COVID-19. Gonzalo E. Mena, May 20th, 2020. Data Science Initiative and Statistics Department, Harvard University.


  1. Computational statistics and machine learning methods for the life sciences, with an application to COVID-19. Gonzalo E. Mena, May 20th, 2020. Data Science Initiative and Statistics Department, Harvard University.

  2. Opening remarks (caveats) • Academic talk: an overview of applied work in the life sciences (neuroscience). • At the end: some methods for COVID-19. More questions than answers. I will show some very preliminary, uncalibrated data and analyses. • Goal: to encourage discussion and to motivate work in this area and the use of Bayesian models. • Spanglish

  3. World's current situation • Several large-scale imaging and stimulation technologies, to read and write neural activity.

  4. World's current situation • Several large-scale imaging and stimulation technologies, to read and write neural activity. • Consensus on relevance. Goal: to develop new experimental tools that will revolutionize our understanding of the brain.

  5. World's current situation • Several large-scale imaging and stimulation technologies, to read and write neural activity. • Consensus on relevance. Goal: to develop new experimental tools that will revolutionize our understanding of the brain. Major bottleneck: data analysis capabilities lag far behind high-throughput data collection rates (TBs/hour), so the potential of these technologies cannot be fully exploited.

  6. This talk: (Neural) Data Science + COVID-19 (at the end). Claim: the dialog between the life sciences (neuroscience) and Statistics/Mathematics/Computation is of mutual benefit. Here: Bayesian statistics.

  7. Large-scale Spike Sorting with Stimulation Artifacts

  8. Introduction. Overarching goal: stimulation and recording in large multi-electrode arrays (MEA) to read and write neural activity in order to achieve control. [Figure: recording setup; labels: lens, saline, retina, electrical stimulation, physiological recording; dimensions shown: 0.9 mm, 1.85 mm, 60 µm, 8-15 µm] • For control, we need to learn the stimulus → response map fast. • Large-scale, online data analysis: 512 electrodes at 20 kHz, ∼ 50 GB/hour. • Scientific and clinical significance: development of high-resolution retinal prostheses.
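
The data rate quoted on this slide can be sanity-checked with a few lines of arithmetic. The sketch below is only an illustration: the slide does not state the sample width, so `bytes_per_sample` is an assumption, and `data_rate_gb_per_hour` is a hypothetical helper name.

```python
# Back-of-the-envelope data rate for a multi-electrode array recording.
# bytes_per_sample is an ASSUMPTION: the slide does not give the sample width.
def data_rate_gb_per_hour(n_electrodes, sampling_hz, bytes_per_sample):
    """Raw data volume in GB (10^9 bytes) produced in one hour of recording."""
    bytes_per_hour = n_electrodes * sampling_hz * bytes_per_sample * 3600
    return bytes_per_hour / 1e9

# 512 electrodes sampled at 20 kHz, under two assumed sample widths.
rate_12bit = data_rate_gb_per_hour(512, 20_000, 1.5)  # 12-bit samples
rate_16bit = data_rate_gb_per_hour(512, 20_000, 2.0)  # 16-bit samples
print(rate_12bit, rate_16bit)
```

Under either assumed width the result is tens of GB per hour, the same order of magnitude as the ∼ 50 GB/hour on the slide.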

  9. Tailored activation. Goal: to generate artificial vision, elicit arbitrary patterns of neural activity with tailored stimuli.

  10. Tailored activation. Question: is it possible to activate only the colored neurons?

  11. Tailored activation. Easier question: is it possible to activate only neuron A?

  12. Tailored activation. Stimulating with a pulse of 0.5 µA on the electrode around the soma does not activate neuron A.

  13. Tailored activation. However, stimulating with 1.0 µA does activate the neuron.

  14. Tailored activation. Further, stimulating with 1.5 µA also activates nearby neuron B, through its axon.

  15. Tailored activation. Activation curves summarize the responsiveness of neurons. They are inferred from many stimuli of increasing strength.
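
As a rough illustration of what an activation curve is, the sketch below estimates the empirical spike probability at each stimulus amplitude and reads off a 50% threshold by linear interpolation. It is not the speaker's method: the trial outcomes are synthetic, and the names `activation_curve` and `threshold_50` are hypothetical.

```python
import numpy as np

def activation_curve(amplitudes, spiked):
    """Empirical spike probability at each distinct stimulus amplitude."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    spiked = np.asarray(spiked, dtype=float)
    levels = np.unique(amplitudes)  # sorted distinct amplitudes
    probs = np.array([spiked[amplitudes == a].mean() for a in levels])
    return levels, probs

def threshold_50(levels, probs):
    """Amplitude where the curve crosses 0.5, by linear interpolation.
    ASSUMES probs is strictly increasing (np.interp needs increasing x)."""
    return float(np.interp(0.5, probs, levels))

# Synthetic trials (made up for illustration): four trials per amplitude in µA,
# with 1 meaning the neuron spiked on that trial.
amplitudes = [0.5] * 4 + [1.0] * 4 + [1.5] * 4
spiked = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1]

levels, probs = activation_curve(amplitudes, spiked)
thr = threshold_50(levels, probs)
print(levels, probs, thr)
```

In practice the per-trial spike labels are exactly what is hard to obtain, because of the stimulation artifacts discussed next.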

  16. Stimulation artifacts. Major hurdle: electrical stimuli are sensed at the electrodes as artifacts, which hampers the identification of neural activity.

  17. Stimulation artifacts. Major hurdle: electrical stimuli are sensed at the electrodes as artifacts, which hampers the identification of neural activity.

  18. Stimulation artifacts. Major hurdle: electrical stimuli are sensed at the electrodes as artifacts, which hampers the identification of neural activity. • Artifacts are much larger than spikes and overlap temporally with them.

  19. Stimulation artifacts. Major hurdle: electrical stimuli are sensed at the electrodes as artifacts, which hampers the identification of neural activity. • Artifacts are much larger than spikes and overlap temporally with them. Current solutions break down: the analysis can take a human weeks, and it is not online.

  20. Stimulation artifacts. Problem: the data contain a nuisance parameter $A$: $Y = A + s + \epsilon$, with recorded traces $Y$, artifact $A$, neural activity $s$ and noise $\epsilon$. To infer $s$ we need to know $A$.

  21. Stimulation artifacts. Problem: the data contain a nuisance parameter $A$: $Y = A + s + \epsilon$, with recorded traces $Y$, artifact $A$, neural activity $s$ and noise $\epsilon$. To infer $s$ we need to know $A$. Solution: impose structure and prior knowledge on $A$, $s$ and $\epsilon$ so that $\hat{A}$ and $\hat{s}$ can be resolved.

  22. Neural activity structure • Spike sorting of spontaneous activity to identify neurons. • This provides us with templates (spikes, or action potential waveforms).

  23. The structure of stimulation artifacts • Properties are revealed by silencing neural activity.

  24. The structure of stimulation artifacts • Properties are revealed by silencing neural activity. • The artifact decays smoothly with distance from the stimulating electrode and has a peak in time. It increases with stimulus strength and does not change if the stimulus is the same.

  25. The structure of stimulation artifacts • Properties are revealed by silencing neural activity. • The artifact decays smoothly with distance from the stimulating electrode and has a peak in time. It increases with stimulus strength and does not change if the stimulus is the same. It is non-linear and non-stationary, but smooth and structured.

  26. Crafting a principled solution. Consider the model $Y = A + s + \epsilon$. • Data $Y = Y_{t,e,j,i}$ over time ($1 \le t \le T$), space (electrode, $1 \le e \le E$), strength ($1 \le j \le J$) and trial ($1 \le i \le I$) dimensions.

  27. Crafting a principled solution. Consider the model $Y = A + s + \epsilon$. • Data $Y = Y_{t,e,j,i}$ over time ($1 \le t \le T$), space (electrode, $1 \le e \le E$), strength ($1 \le j \le J$) and trial ($1 \le i \le I$) dimensions. Imposing structure: • Represent neural activity $s$ with Toeplitz matrices (shapes) and binary vectors (timing).
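
The Toeplitz representation can be sketched as follows: a Toeplitz matrix built from a template waveform, multiplied by a binary vector of spike onsets, gives the neural activity trace. This is a single-electrode toy under my own assumptions; the function name `template_toeplitz` and the waveform values are hypothetical.

```python
import numpy as np

def template_toeplitz(template, n_samples):
    """Lower-triangular Toeplitz matrix T with T[i, j] = template[i - j].
    T @ b places a copy of `template` at every index where binary b is 1
    (copies running past the end of the trace are truncated)."""
    template = np.asarray(template, dtype=float)
    T = np.zeros((n_samples, n_samples))
    for k, value in enumerate(template[:n_samples]):
        # The k-th sub-diagonal holds the k-th template sample.
        T += value * np.eye(n_samples, k=-k)
    return T

# Toy waveform (made up) and spike onsets at samples 2 and 5.
template = np.array([0.0, -1.0, 0.5])
b = np.zeros(8)
b[[2, 5]] = 1.0

T = template_toeplitz(template, b.size)
s = T @ b  # neural activity: shape (T) times timing (b)
print(s)
```

The product `T @ b` equals the convolution of `b` with the template truncated to the trace length, which is why the shape and the timing of the activity can be handled separately.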

  28. Crafting a principled solution. Consider the model $Y = A + s + \epsilon$. • Data $Y = Y_{t,e,j,i}$ over time ($1 \le t \le T$), space (electrode, $1 \le e \le E$), strength ($1 \le j \le J$) and trial ($1 \le i \le I$) dimensions. Imposing structure: • Represent neural activity $s$ with Toeplitz matrices (shapes) and binary vectors (timing). • Gaussian process (GP) to encode prior knowledge of the artifact, $A \sim GP(0, K_\theta)$, and to borrow strength.

  29. Crafting a principled solution. Consider the model $Y = A + s + \epsilon$. • Data $Y = Y_{t,e,j,i}$ over time ($1 \le t \le T$), space (electrode, $1 \le e \le E$), strength ($1 \le j \le J$) and trial ($1 \le i \le I$) dimensions. Imposing structure: • Represent neural activity $s$ with Toeplitz matrices (shapes) and binary vectors (timing). • Gaussian process (GP) to encode prior knowledge of the artifact, $A \sim GP(0, K_\theta)$, and to borrow strength. • Problem: with $n \approx 10^6$ artifact variables, $O(n^3)$ does not scale.

  30. Crafting a principled solution. Consider the model $Y = A + s + \epsilon$. • Data $Y = Y_{t,e,j,i}$ over time ($1 \le t \le T$), space (electrode, $1 \le e \le E$), strength ($1 \le j \le J$) and trial ($1 \le i \le I$) dimensions. Imposing structure: • Represent neural activity $s$ with Toeplitz matrices (shapes) and binary vectors (timing). • Gaussian process (GP) to encode prior knowledge of the artifact, $A \sim GP(0, K_\theta)$, and to borrow strength. • Problem: with $n \approx 10^6$ artifact variables, $O(n^3)$ does not scale. • Solution: Kronecker decomposition $K_{(\theta,\phi^2)} = \rho\, K_t \otimes K_e \otimes K_j + \phi^2 I$. • Each kernel must represent smoothness and non-stationarity.
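
To see why the Kronecker structure helps, the sketch below solves a linear system in $\rho\, K_t \otimes K_e \otimes K_j + \phi^2 I$ using only eigendecompositions of the three small factors, never forming the $n \times n$ matrix. It is an illustration under assumptions, not the talk's implementation: the squared-exponential kernel is a stationary stand-in (the slide asks for non-stationary kernels), and `rbf_kernel`, `kron_apply` and `kron_solve` are hypothetical names.

```python
import numpy as np

def rbf_kernel(n, length_scale):
    """Squared-exponential kernel on the grid 0..n-1 (stationary stand-in)."""
    x = np.arange(n, dtype=float)
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def kron_apply(mats, Y):
    """Apply (M_t kron M_e kron M_j) to the C-order vectorization of Y,
    returning the result as a (T, E, J) tensor."""
    Mt, Me, Mj = mats
    return np.einsum('ta,eb,jc,abc->tej', Mt, Me, Mj, Y)

def kron_solve(kernels, rho, phi2, Y):
    """Solve (rho * K_t kron K_e kron K_j + phi2 * I) x = vec(Y)
    from the eigendecompositions of the three factors."""
    eig = [np.linalg.eigh(K) for K in kernels]
    vals = [w for w, _ in eig]
    vecs = [Q for _, Q in eig]
    Z = kron_apply([Q.T for Q in vecs], Y)                 # rotate to eigenbasis
    lam = rho * np.einsum('t,e,j->tej', *vals) + phi2      # joint eigenvalues
    return kron_apply(vecs, Z / lam)                       # rotate back

# Small example, checked against the dense solve.
rng = np.random.default_rng(0)
T, E, J = 4, 3, 2
Y = rng.standard_normal((T, E, J))
kernels = [rbf_kernel(T, 2.0), rbf_kernel(E, 1.5), rbf_kernel(J, 1.0)]
x = kron_solve(kernels, rho=2.0, phi2=0.1, Y=Y)

K_dense = 2.0 * np.kron(np.kron(*kernels[:2]), kernels[2]) + 0.1 * np.eye(T * E * J)
x_dense = np.linalg.solve(K_dense, Y.ravel())
print(np.allclose(x.ravel(), x_dense))
```

The cubic cost now falls on the factors, $O(T^3 + E^3 + J^3)$, instead of $O(n^3)$ with $n = TEJ$.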

  31. Algorithm. Goal: obtain $\hat{A}$, $\hat{s}$ from the model $Y = A + s + \epsilon$, $A \sim GP(0, K_{\hat{\theta}})$. • Produce estimates in increasing order of $j$ (strength). • Rationale: at the lowest strengths $A$ is better behaved and easier to estimate. • The initial guess $\hat{A}^0_{j+1}$ is the extrapolation from $\hat{A}_{[1,j]}$. • Given $j$, alternate between maximizing $p(s_j \mid Y_j, \hat{A}_j, \hat{\theta})$ for $\hat{s}_j$ and maximizing $p(A_j \mid Y_j, \hat{s}_j, \hat{\theta})$ for $\hat{A}_j$. • $\hat{s}_{j,i}$ given $\hat{A}_j$: $s^n_{j,i} = T^n b^n_{j,i}$, where the $b^n_{j,i}$ are binary vectors; do greedy template matching, $\min_{b^n_{j,i}} \left\| (Y_{j,i} - \hat{A}_j) - \sum_n T^n b^n_{j,i} \right\|^2$. • $\hat{A}_j$ given $\hat{s}_j$ via filtering (posterior mean) of spike-subtracted traces.
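
The greedy template matching step can be sketched for a single electrode and a single trial as follows. This is a minimal sketch under my own assumptions (exhaustive search over neurons and onsets, a known artifact estimate, no noise), not the published implementation; `greedy_template_matching` and all waveform values are hypothetical.

```python
import numpy as np

def greedy_template_matching(residual, templates, max_spikes=10):
    """Greedily pick (neuron, onset) pairs that reduce
    ||residual - sum_n T^n b^n||^2, stopping when no placement helps."""
    r = np.asarray(residual, dtype=float).copy()
    templates = [np.asarray(w, dtype=float) for w in templates]
    spikes = []
    for _ in range(max_spikes):
        best = None  # (gain, neuron, onset)
        for n, w in enumerate(templates):
            for t0 in range(r.size - w.size + 1):
                seg = r[t0:t0 + w.size]
                # Drop in squared error from subtracting w at t0:
                # ||seg||^2 - ||seg - w||^2 = 2 seg.w - ||w||^2
                gain = 2.0 * (seg @ w) - (w @ w)
                if best is None or gain > best[0]:
                    best = (gain, n, t0)
        if best is None or best[0] <= 0:
            break
        _, n, t0 = best
        r[t0:t0 + templates[n].size] -= templates[n]
        spikes.append((n, t0))
    return spikes, r

# Toy trace: artifact estimate A_hat plus two spikes (all values made up).
templates = [np.array([0.0, -2.0, 1.0, 0.0]), np.array([0.0, 1.0, -3.0, 1.0])]
A_hat = 5.0 * np.exp(-np.arange(20) / 3.0)  # stand-in for the artifact estimate
Y = A_hat.copy()
Y[3:7] += templates[0]    # neuron 0 fires at sample 3
Y[10:14] += templates[1]  # neuron 1 fires at sample 10

spikes, leftover = greedy_template_matching(Y - A_hat, templates)
print(sorted(spikes))
```

In the alternating scheme on the slide, the spikes found this way would be subtracted from $Y_j$ and the artifact re-estimated from the spike-subtracted traces.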

  32. Example of sorting

  33. Large-scale automatic analysis. Gray dots indicate human judgement.

  34. Population results: 1,713,233 trials. • Accuracy greater than 99.5%, with agreement in latencies as well (Mena et al., PLOS Computational Biology, 2017). • Past: weeks → Now: ≈ 15 minutes. Compatible with online control experiments. • Enhanced capabilities of the technology.

  35. Probabilistic neural identity inference in C. elegans

  36. The relevance of C. elegans

  37. The relevance of C. elegans

  38. The relevance of C. elegans

  39. The relevance of C. elegans

  40. A data processing pipeline • Raw data: 5D point processes (space × time × color). • First step: finding neurons. • Second step: identifying neurons.

  41. Find neurons with the help of color. Brainbow (Lichtman and Sanes, 2008): stochastic coloring of neurons. Image: Tamily Weissman, 2008 Photomicrography competition.
