Using C++ to speed up R code Th eo Michelot School of Mathematics - PowerPoint PPT Presentation

Using C++ to speed up R code Th´ eo Michelot School of Mathematics and Statistics 2 May 2017

Introduction R is a great tool for data analysis: • easy to learn; • designed with statistics in mind; • thousands of packages. But the convenience and flexibity come at the cost of speed. C++ (or e.g. Fortran) is a compiled programming language: • more complex and rigorous syntax; • usually very fast. 2 / 26

Introduction # R n <- 0 for(i in 1:1 e8) # big loop n <- n + 1 Time difference of 28.70 secs // C++ int n = 0; for(int i=1; i <=1 E8; i++) // big loop n = n + 1; Time difference of 0.27 secs 3 / 26

Rcpp 4 / 26

Prehistory The base R function .Call() loads C++ functions, but is not very user-friendly. dyn.load("myfun.so") myfunR <- function(arg1 , arg2 , arg3) { .Call("myfunCpp", arg1 , arg2 , arg3) } # Now I can use myfunR as a normal R function where myfunCpp is defined in the file “myfun.cpp”. − → Rcpp takes care of all the annoying technical tweaks. 5 / 26

Rcpp Rcpp is an R package. It can be installed with: install.packages("Rcpp") Then: • Write code in C++, using a simplified syntax quite close to R; • Rcpp compiles the C++ code, and generates an R function for each C++ function; • The C++ functions can be called from R. Note: more than 800 R packages on CRAN use Rcpp under the hood. 6 / 26

New C++ types Each variable is defined with a type in C++, e.g. int , double , char ... It is tricky to assign a C++ type to R variables. Rcpp includes a few additional types: • NumericVector , CharacterVector , LogicalVector ... • NumericMatrix , CharacterMatrix , LogicalMatrix ... • List • DataFrame • ... 7 / 26

Rcpp example example.cpp : #include <Rcpp.h> using namespace Rcpp; // [[ Rcpp :: export ]] double matsum_rcpp ( NumericMatrix mat) { double res = 0; for(int i=0; i<mat.nrow (); i++) for(int j=0; j<mat.ncol (); j++) res = res + mat[i,j]; return res; } 8 / 26

Rcpp example library(Rcpp) sourceCpp("example.cpp") A <- matrix (1,3,3) s1 <- matsum_rcpp(A) B <- matrix (2,3,3) s2 <- matsum_rcpp(B) > s1 [1] 9 > s2 [1] 18 9 / 26

RcppArmadillo Armadillo is a C++ library ( ≈ package) for linear algebra, and the R package RcppArmadillo makes it available from Rcpp. Features include: • better vector and matrix types, arma::vec and arma::mat ; • very fast routines for linear algebra operations, e.g. matrix multiplication, matrix inverse, linear system solving... 10 / 26

RcppArmadillo example #include <RcppArmadillo .h> // [[ Rcpp :: depends( RcppArmadillo )]] using namespace Rcpp; // [[ Rcpp :: export ]] arma :: mat matprod_rcpp (arma :: mat mat1 , arma :: mat mat2) { if(mat1.n_cols != mat2.n_rows) stop(" Incompatible matrix dimensions."); return mat1*mat2; } 11 / 26

RcppArmadillo example library( RcppArmadillo ) sourceCpp("example.cpp") A <- matrix (1:9 ,3 ,3) B <- matrix (1:9 ,3 ,3) P <- matprod_rcpp(A,B) > P [,1] [,2] [,3] [1,] 30 66 102 [2,] 36 81 126 [3,] 42 96 150 > A%*%B [,1] [,2] [,3] [1,] 30 66 102 [2,] 36 81 126 [3,] 42 96 150 12 / 26

RcppArmadillo example A <- matrix (1:8 ,4 ,2) B <- matrix (1:9 ,3 ,3) > P <- matprod_rcpp(A,B) Error in matprod_rcpp(A, B) : Incompatible matrix dimensions . 13 / 26

Other functionalities • Common probability distributions can be used in C++ code written with Rcpp: R::dnorm , R::dgamma , R::dexp ... • An alternative to RcppArmadillo is RcppEigen. • Dirk Eddelbuettel 14 / 26

Template Model Builder (TMB) 15 / 26

Automatic differentiation Automatic differentation is a method of numerical evaluation of the gradient of a function. Idea: 1 Write the function as the combination of many basic functions, like additions, multiplications, exponentials... f = f 1 ◦ f 2 ◦ · · · ◦ f k 2 Differentiate each basic function f i . 3 Use the chain rule to obtain the derivative of f . − → Can come in very handy for numerical optimisation, e.g. maximum likelihood estimation. 16 / 26

Laplace approximation The Laplace approximation of a likelihood function consists in locally approximating it by a Gaussian probability function. 0.5 0.4 0.3 likelihood 0.2 0.1 0.0 0 1 2 3 4 5 6 parameter − → It’s very simple to differentiate the approximate likelihood. 17 / 26

TMB workflow 1 Write the likelihood function in C++ using templates. (Or the negative log-likelihood, typically.) 2 Compile the C++ code from R. 3 TMB returns an R object which contains • the likelihood function; • the gradient (function) of the likelihood. 4 Optimise the likelihood in R. Optimisers can often take the gradient function as an argument, to make things faster. 18 / 26

TMB example #include <TMB.hpp > template <class Type > Type objective_function <Type >:: operator () () { // data (input from R) DATA_VECTOR (x); // parameters (input from R) PARAMETER(mu); PARAMETER(sigma); Type f = -sum(dnorm(x,mu ,sigma ,true)); return f; } library(TMB) compile("nllk.cpp") dyn.load(dynlib("nllk")) # simulate data x <- rnorm(n=1e4 , mean=3, sd =10) # create likelihood object nllk <- MakeADFun(data=list(x=x), parameters=list(mu=0, sigma =1)) # estimate maximum likelihood fit <- nlminb(start=nllk $ par , objective=nllk $ fn , gradient=nllk $ gr , lower=c(-Inf ,0) , upper=c(Inf ,Inf)) 19 / 26

TMB example > fit $ par mu sigma 2.965458 9.947335 $ objective [1] 37162.43 20 / 26

Speed comparison 21 / 26

Normal distribution 10 6 samples simulated from N ( 3 , 10 2 ) − → Estimate µ and σ by MLE. Pure R Rcpp TMB ∆ t 2.61 secs 1.86 sec 1.84 sec ∆ t / min(∆ t ) 1.4 1.0 1.0 log( L max ) -3721981 -3721981 -3721981 22 / 26

Normal distribution > dnorm function (x, mean = 0, sd = 1, log = FALSE) .Call(C_dnorm , x, mean , sd , log) <bytecode: 0x27d9268 > <environment : namespace:stats > > sum function (... , na.rm = FALSE) .Primitive("sum") 23 / 26

Poisson hidden Markov model X t − 1 X t X t + 1 observations Markov process S t − 1 S t S t + 1 where X t ∼ Poisson ( λ S t ) , S t ∈ { 1 , 2 } Parameters to estimate: • λ 1 , λ 2 , • γ 21 = Pr( S t + 1 = 2 | S t = 1 ) , γ 12 = Pr( S t + 1 = 1 | S t = 2 ) . − → The likelihood is a matrix product with 2 N terms. 24 / 26

Poisson hidden Markov model 10 5 samples simulated from the 2-state Poisson HMM − → Estimate λ 1 , λ 2 , γ 12 and γ 21 by MLE. Pure R Rcpp TMB ∆ t 1.16 min 6.04 secs 1.21 sec ∆ t / min(∆ t ) 57.5 5.0 1.0 log( L max ) -290672.9 -290672.9 -291233.3 25 / 26

Thanks for your attention Eddelbuettel, D., and Francois, R. (2011). Rcpp: Seamless R and C++ Integration. Journal of Statistical Software , 40(8), 1-18. Kristensen, K., Nielsen, A., Berg, C.W., Skaug, H., and Bell, B.M. (2016). TMB: Automatic Differentiation and Laplace Approximation. Journal of Statistical Software , 70(5), 1-21.

Using C++ to speed up R code Th eo Michelot School of Mathematics - PowerPoint PPT Presentation

Using C++ to speed up R code Th eo Michelot School of Mathematics and Statistics 2 May 2017 Introduction R is a great tool for data analysis: easy to learn; designed with statistics in mind; thousands of packages. But the

Cedar Rapids RLR & Speed Des Moines RLR & Speed

Speed, speed, speed D. J. Bernstein University of Illinois at Chicago; Ruhr University Bochum

SPEED OF THOUGHT SPEED OF THOUGHT 120m/s SPEED OF THOUGHT COMMUNICATIVE The Artist is Absent:

Code Generation Machine code generation cs4713 1 Machine code generation machine Intermediate

{Sequential Code} {Sequential Code} {Sequential Code} {Sequential Code} {Sequential Code}

POWERED STARTUPS Speed@BDD Presentation July 2017 SPEED@BDD IN A NUTSHELL Speed@BDD is a

Speed Bump? http://www.skepticalscience.com/graphics.php?g=47 Speed Bump?

MCC Speed Management Policy Agenda Purpose of the Speed Management Policy Results of

Lab 9. Speed Control of a D.C. motor Sensing Motor Speed (Tachometer Frequency Method) Motor

10 years of Speed Tables Peter da Silva FlightAware What are Speed Tables? What are Speed

Speed, speed, speed $1000 TCR hashing competition D. J. Bernstein Crowley: I have a problem

80% of Code Red 2 Code Red 2 re-re- Code Red 1 and Code Red 2 Code Red 2 re- cleaned up

Selection Sort Section 10.2 Code for Selection Sort (cont.) Code for an Array Sort Code for an

in practice source code source code javac scalac groovyc jrubyc 0xCAFEBABE byte code

SOUTHWEST BCSW ENROLLMENT GROWTH FTES by ZIP CODE FTES by ZIP CODE 2013-14 FTES by ZIP CODE

INF5110 Compiler Construction Spring 2016 1 / 98 Outline 1. Intermediate code generation

Armadillo C++ linear algebra library Umair Muslim 341318 Overview Introduction

Tern http://ternjs.net 1 // Demo variable 2 3 var audience = "GOTO

L ECTURE 13: R EACTIVE CONTROLLERS B EHAVIOR -B ASED C ONTROL N AVIGATION & O BSTACLE AVOIDANCE

Programming of Interactive Systems Anastasia Bezerianos introduction.prog.is@gmail.com 1

Interfacing C++ code from R Sren Hjsgaard Department of Mathematical Sciences Aalborg

STEPHEN JARRETT, VP GAME DESIGN @ ARKADIUM STEPH PHEN N JARRETT VP P Game Desi sign gn NE

building software with ease kenneth.hoste@ugent.be HPC UGENT About HPC UGent: central

Semantics for Natural Languages Informatics 2A: Lecture 23 John Longley 7 November 2014 1 / 25

Using C++ to speed up R code Th eo Michelot School of Mathematics - PowerPoint PPT Presentation

Using C++ to speed up R code Th eo Michelot School of Mathematics and Statistics 2 May 2017 Introduction R is a great tool for data analysis: easy to learn; designed with statistics in mind; thousands of packages. But the

Cedar Rapids RLR &amp; Speed Des Moines RLR &amp; Speed

Speed, speed, speed D. J. Bernstein University of Illinois at Chicago; Ruhr University Bochum

SPEED OF THOUGHT SPEED OF THOUGHT 120m/s SPEED OF THOUGHT COMMUNICATIVE The Artist is Absent:

Code Generation Machine code generation cs4713 1 Machine code generation machine Intermediate

{Sequential Code} {Sequential Code} {Sequential Code} {Sequential Code} {Sequential Code}

POWERED STARTUPS Speed@BDD Presentation July 2017 SPEED@BDD IN A NUTSHELL Speed@BDD is a

Speed Bump? http://www.skepticalscience.com/graphics.php?g=47 Speed Bump?

MCC Speed Management Policy Agenda Purpose of the Speed Management Policy Results of

Lab 9. Speed Control of a D.C. motor Sensing Motor Speed (Tachometer Frequency Method) Motor

10 years of Speed Tables Peter da Silva FlightAware What are Speed Tables? What are Speed

Speed, speed, speed $1000 TCR hashing competition D. J. Bernstein Crowley: I have a problem

80% of Code Red 2 Code Red 2 re-re- Code Red 1 and Code Red 2 Code Red 2 re- cleaned up

Selection Sort Section 10.2 Code for Selection Sort (cont.) Code for an Array Sort Code for an

in practice source code source code javac scalac groovyc jrubyc 0xCAFEBABE byte code

SOUTHWEST BCSW ENROLLMENT GROWTH FTES by ZIP CODE FTES by ZIP CODE 2013-14 FTES by ZIP CODE

INF5110 Compiler Construction Spring 2016 1 / 98 Outline 1. Intermediate code generation

Armadillo C++ linear algebra library Umair Muslim 341318 Overview Introduction

Tern http://ternjs.net 1 // Demo variable 2 3 var audience = &quot;GOTO

L ECTURE 13: R EACTIVE CONTROLLERS B EHAVIOR -B ASED C ONTROL N AVIGATION &amp; O BSTACLE AVOIDANCE

Programming of Interactive Systems Anastasia Bezerianos introduction.prog.is@gmail.com 1

Interfacing C++ code from R Sren Hjsgaard Department of Mathematical Sciences Aalborg

STEPHEN JARRETT, VP GAME DESIGN @ ARKADIUM STEPH PHEN N JARRETT VP P Game Desi sign gn NE

building software with ease kenneth.hoste@ugent.be HPC UGENT About HPC UGent: central

Semantics for Natural Languages Informatics 2A: Lecture 23 John Longley 7 November 2014 1 / 25

Cedar Rapids RLR & Speed Des Moines RLR & Speed

Tern http://ternjs.net 1 // Demo variable 2 3 var audience = "GOTO

L ECTURE 13: R EACTIVE CONTROLLERS B EHAVIOR -B ASED C ONTROL N AVIGATION & O BSTACLE AVOIDANCE