simsurv: A Package for Simulating Simple or Complex Survival Data

Sam Brilleman 1,2, Rory Wolfe 1,2, Margarita Moreno-Betancur 2,3,4, Michael J. Crowther 5

useR! Conference 2018, Brisbane, Australia, 10-13th July 2018

1 Monash University, Melbourne, Australia
2 Victorian Centre for Biostatistics (ViCBiostat)
3 Murdoch Children’s Research Institute, Melbourne, Australia
4 University of Melbourne, Melbourne, Australia
5 University of Leicester, Leicester, UK

Outline
• Background to survival analysis
• A general method for simulating event times
• Examples of using the ‘simsurv’ package
• Summary

What is survival analysis?
• The analysis of a variable that corresponds to the time from a defined baseline (e.g. diagnosis of a disease) until occurrence of an event of interest (e.g. heart failure).
• Also known as:
  • Time-to-event analysis
  • Duration analysis (economics)
  • Reliability analysis (engineering)
  • Event history analysis (sociology)
• The context for this talk will be health research
  • Each observational unit will be an “individual” (e.g. a patient)

Why simulate survival data?
• To evaluate the performance of new or existing statistical methods for survival analysis
• To calculate statistical power, e.g. in planning clinical trials
• To calculate uncertainty in model predictions, e.g. transition probabilities in multistate models
• …others?

Modelling survival data
• Let $T_i^*$ denote the “true” event time for individual $i$
• In practice, $T_i^*$ may not be observed due to right censoring, e.g. the study ending before an individual experiences the event
• It is possible to model $T_i^*$ directly, e.g. “accelerated failure time (AFT)” models
• But it is more common to model the rate of occurrence of the event (e.g. the “Cox” model)
• The hazard at time $t$ is defined as the instantaneous rate of occurrence for the event at time $t$:

$$h_i(t) = \lim_{\Delta t \to 0} \frac{P(t \le T_i^* < t + \Delta t \mid T_i^* > t)}{\Delta t}$$
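The limit in the hazard definition can be checked numerically. The sketch below is illustrative only: it assumes a Weibull survival function $S(t) = \exp(-\lambda t^{\gamma})$, and the names `lam`, `gamma` and `dt` are choices made for this example rather than anything taken from the talk. It compares a finite-difference approximation of the conditional probability with the closed-form Weibull hazard $\lambda \gamma t^{\gamma - 1}$.

```python
import math


def weibull_survival(t, lam, gamma):
    """Assumed parameterisation: S(t) = exp(-lam * t**gamma)."""
    return math.exp(-lam * t ** gamma)


def hazard_finite_difference(t, lam, gamma, dt=1e-6):
    """Approximate P(t <= T < t + dt | T > t) / dt for a small dt."""
    s_now = weibull_survival(t, lam, gamma)
    s_later = weibull_survival(t + dt, lam, gamma)
    # P(t <= T < t + dt | T > t) = (S(t) - S(t + dt)) / S(t)
    return (s_now - s_later) / (s_now * dt)


def hazard_closed_form(t, lam, gamma):
    """Weibull hazard implied by the assumed survival function."""
    return lam * gamma * t ** (gamma - 1)


approx = hazard_finite_difference(2.0, lam=0.1, gamma=1.5)
exact = hazard_closed_form(2.0, lam=0.1, gamma=1.5)
print(f"finite difference: {approx:.6f}, closed form: {exact:.6f}")
```

The two values should agree to several decimal places, and the gap shrinks further as `dt` decreases until floating-point rounding takes over.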

The hazard, cumulative hazard & survival
• Hazard (for individual $i$): $h_i(t)$
• Cumulative hazard: $H_i(t) = \int_{s=0}^{t} h_i(s) \, ds$
• Survival probability: $S_i(t) = P(T_i^* > t) = \exp(-H_i(t))$. This is the complement of the CDF for the distribution of event times
• The “probability integral transformation” tells us $1 - F_X(X) = U$, where $F_X(\cdot)$ is the CDF of a continuous random variable $X$, and $U$ is a uniform random variable on the range 0 to 1
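Both relationships on this slide can be illustrated with a few lines of code. The following is a rough sketch under assumptions of our own (a Weibull hazard with `lam=0.1` and `gamma=1.5`, a simple trapezoidal rule, and an exponential distribution with rate 0.5 for the second check); none of these choices come from the talk. It first integrates the hazard numerically and compares $\exp(-H(t))$ with the closed-form survival probability, then checks that $1 - F_X(X)$ behaves like a uniform variable.

```python
import math
import random


def hazard(t, lam=0.1, gamma=1.5):
    """Assumed Weibull hazard: lam * gamma * t**(gamma - 1)."""
    return lam * gamma * t ** (gamma - 1)


def cumulative_hazard(t, n_steps=10_000):
    """Trapezoidal approximation to the integral of hazard(s) over [0, t]."""
    step = t / n_steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for k in range(1, n_steps):
        total += hazard(k * step)
    return total * step


# Survival from the numerically integrated hazard vs. the closed form.
survival_numeric = math.exp(-cumulative_hazard(3.0))
survival_exact = math.exp(-0.1 * 3.0 ** 1.5)

# Probability integral transformation: for X ~ Exponential(rate),
# 1 - F_X(X) = exp(-rate * X) should be uniform on the range 0 to 1.
rate = 0.5
rng = random.Random(2018)
draws = [rng.expovariate(rate) for _ in range(20_000)]
u_values = [math.exp(-rate * x) for x in draws]
mean_u = sum(u_values) / len(u_values)

print(f"S(3) numeric: {survival_numeric:.6f}, exact: {survival_exact:.6f}")
print(f"mean of 1 - F_X(X): {mean_u:.3f} (a uniform variable has mean 0.5)")
```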

Cumulative hazard inversion
• The result from the previous slide tells us

$$\exp\left(-H_i(T_i^s)\right) = U_i \implies T_i^s = H_i^{-1}\left(-\log(U_i)\right)$$

where
  • $T_i^s$ is a randomly drawn (i.e. simulated) event time for individual $i$
  • $U_i$ is a random uniform variable on the range 0 to 1
  • $H_i(t) = \int_{s=0}^{t} h_i(s) \, ds$ is the cumulative hazard evaluated at time $t$
• Commonly known as the ‘cumulative hazard inversion method’ [1,2]
• Easy and efficient when $H_i(t)$ has a closed form and is invertible
• But for complex specifications of $h_i(t)$:
  • $H_i(t)$ may not have a closed form
  • $H_i(t)$ may not be invertible

[1] Leemis LM. Variate Generation for Accelerated Life and Proportional Hazards Models. Operations Research. 1987; 35(6): 892–894.
[2] Bender R et al. Generating survival times to simulate Cox proportional hazards models. Statistics in Medicine. 2005; 24(11): 1713–1723.
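A minimal sketch of cumulative hazard inversion follows. It is written in Python for illustration and is not the simsurv API. It assumes a Weibull cumulative hazard $H(t) = \lambda t^{\gamma}$, for which the inverse is available in closed form. The function `simulate_by_root_finding` is a hypothetical helper showing one possible numerical fallback (bisection) for cases where $H^{-1}$ cannot be written down; the search bound `upper` and tolerance `tol` are arbitrary choices, and the talk text above does not say which numerical method, if any, the package uses.

```python
import math
import random


def simulate_weibull(u, lam, gamma):
    """Closed-form inversion: H(t) = lam * t**gamma, so
    H^{-1}(y) = (y / lam)**(1 / gamma), evaluated at y = -log(u)."""
    return (-math.log(u) / lam) ** (1.0 / gamma)


def simulate_by_root_finding(u, cum_hazard, upper=1e3, tol=1e-10):
    """Hypothetical fallback: solve cum_hazard(t) = -log(u) by bisection.
    Assumes cum_hazard is increasing on [0, upper]."""
    target = -math.log(u)
    lo, hi = 0.0, upper
    if cum_hazard(hi) < target:
        return hi  # no root in the search interval; report the upper bound
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cum_hazard(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam, gamma = 0.1, 1.5
rng = random.Random(42)
# 1 - random() lies in (0, 1], which keeps log(u) finite.
uniforms = [1.0 - rng.random() for _ in range(20_000)]
times = [simulate_weibull(u, lam, gamma) for u in uniforms]

# The numerical route should reproduce the closed-form draw for the same u.
t_closed = simulate_weibull(0.5, lam, gamma)
t_numeric = simulate_by_root_finding(0.5, lambda t: lam * t ** gamma)

sample_mean = sum(times) / len(times)
theoretical_mean = lam ** (-1.0 / gamma) * math.gamma(1.0 + 1.0 / gamma)
print(f"closed form: {t_closed:.6f}, bisection: {t_numeric:.6f}")
print(f"sample mean: {sample_mean:.3f}, theoretical mean: {theoretical_mean:.3f}")
```

Passing the same uniform draw to both routes makes the comparison direct: any difference between the two simulated times reflects only the numerical tolerance of the root finder.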
