health cloud project



  1. Health Cloud Project. Integrated Media Systems Center, University of Southern California. Dimitrios Stripelis, stripeli@usc.edu

  2. Purpose
     • Compute Machine Learning models from independent Spark clusters
     • Combine partial models to construct a unified ML model

  3. Framework Schematically
     • One main Portal for submitting requests
     • Independent Spark clusters, each residing on a remote hospital network

  4. Framework Operations
     • The user accesses the Portal (Server 1) and requests the construction of an ML model from each remote Spark cluster
     • Each cluster receives the request and computes the model through Spark MLlib
     • Once computation finishes, every model, along with its algorithm-specific auxiliary data, is returned to the Portal in JSON format for unification
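The slides do not show the JSON layout that a cluster returns, so the following is only a minimal sketch of what such a payload could look like. Every field name (`server`, `algorithm`, `model`, `auxiliary`) and the helper functions are hypothetical, chosen for illustration rather than taken from the Framework.

```python
import json


def build_model_payload(server, algorithm, model, auxiliary):
    """Bundle one cluster's model and auxiliary data into a JSON string.

    All field names here are hypothetical; the real Framework's schema
    is not given in the slides.
    """
    payload = {
        "server": server,
        "algorithm": algorithm,
        "model": model,
        "auxiliary": auxiliary,
    }
    return json.dumps(payload, sort_keys=True)


def parse_model_payload(text):
    """Decode a payload on the Portal side before unification."""
    return json.loads(text)


# Illustrative values only, not results from the presentation.
example = build_model_payload(
    server="server2",
    algorithm="LinearRegression",
    model={"weights": [0.5, -1.25], "intercept": 0.0},
    auxiliary={"training_rows": 1000},
)
restored = parse_model_payload(example)
```

Carrying a row count in the auxiliary data is one plausible reason to ship such data at all: the Portal would need it to weight each partial model during unification.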

  5. ML Algorithms
     Currently the Framework supports two principal algorithms:
     • Naive Bayes
     • Linear Regression with Stochastic Gradient Descent (SGD)
     Extensible to:
     • classification & regression: SVM, decision trees
     • collaborative filtering: alternating least squares (ALS)
     • clustering: k-means, Gaussian Mixture, Latent Dirichlet Allocation (LDA)
     • optimization: limited-memory BFGS (L-BFGS)

  6. Datasets
     We evaluated the Framework's efficiency on medical datasets available at the UCI Machine Learning Repository. The datasets were:
     • Single Photon Emission Computed Tomography (SPECT) images
     • Diabetes 130-US hospitals
     • Parkinsons Telemonitoring Data Set

  7. Implementation
     We developed the Health Cloud Framework's infrastructure on Microsoft Azure, using three D1-type servers.
     Portal Role
     We use the main Portal (Server 1) to submit the machine learning computation requests to each remote server (Servers 2 and 3) by passing the following arguments:
     1. Externally accessible hostname of each server
     2. Name of the Machine Learning algorithm to be computed
     3. Path to the training data file on the remote server
     4. Path to the testing data file on the remote server
     5. Algorithm-specific parameters for model computation

  8. Implementation
     • After the request has been submitted to the Framework, we initialize a Spark cluster on each server, i.e. a single Master and a single Worker on each machine, and we execute the jar file for the Machine Learning algorithm we need to compute (currently NaiveBayes or LinearRegression).
     • Synchronous Execution: once the model is computed, the JSON file is constructed and sent to the main Portal. We then terminate the Spark cluster on that server and proceed with the computation of the ML model on the second machine.
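The synchronous flow described above (start a cluster, run the job, collect the JSON, stop the cluster, move to the next server) can be sketched as a plain loop. This is an illustration under assumptions: `start_cluster`, `run_job`, and `stop_cluster` are hypothetical callables standing in for whatever the Framework actually does over the network, and the stubs below only record the order of events.

```python
def run_synchronously(requests, start_cluster, run_job, stop_cluster):
    """Process one server at a time, stopping its cluster before moving on.

    The three callables are hypothetical stand-ins for the Framework's
    real cluster start-up, jar execution, and shutdown steps.
    """
    results = []
    for request in requests:
        start_cluster(request["server"])
        try:
            # run_job is expected to return the JSON text for this model.
            results.append(run_job(request))
        finally:
            # Always shut the cluster down, even if the job failed.
            stop_cluster(request["server"])
    return results


# Stub implementations that only log what would happen.
events = []


def fake_start(server):
    events.append(("start", server))


def fake_run(request):
    events.append(("run", request["server"]))
    return '{"algorithm": "%s"}' % request["algorithm"]


def fake_stop(server):
    events.append(("stop", server))


demo_requests = [
    {"server": "server2", "algorithm": "NaiveBayes"},
    {"server": "server3", "algorithm": "LinearRegression"},
]
demo_results = run_synchronously(demo_requests, fake_start, fake_run, fake_stop)
```

The `try`/`finally` is a design choice of this sketch, not something the slides state: it keeps a failed job from leaving a cluster running on a remote server.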

  9. Significance
     • One of the main contributions of the Framework is that the computation of an ML algorithm can be configured separately on each server, using different training and testing datasets. We can experiment with the algorithm-specific parameters to optimize the requested results without transferring any data between the servers.
     • Furthermore, this implementation gives us the flexibility to combine the same or even different Machine Learning models, produced from dissimilar datasets and domains, into a unified model. This can in turn lead to a more generic ML model with almost the same accuracy as the initial models.
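The slides do not specify how partial models are combined. As one illustration of the idea, and purely as an assumption, the sketch below merges linear regression models trained on separate servers by averaging their coefficients, weighted by the number of training rows each server used. The dictionary keys (`weights`, `intercept`, `n`) are hypothetical, and only model parameters, never raw data, are exchanged.

```python
def combine_linear_models(partials):
    """Average linear-model coefficients, weighted by training-set size.

    This weighted averaging is an assumed combination rule, not the
    method used by the Framework. Each partial model is a dict with
    hypothetical keys: "weights", "intercept", and "n" (training rows).
    """
    total = sum(p["n"] for p in partials)
    if total <= 0:
        raise ValueError("partial models must report a positive row count")
    dim = len(partials[0]["weights"])
    if any(len(p["weights"]) != dim for p in partials):
        raise ValueError("all partial models must share the same features")
    weights = [
        sum(p["weights"][i] * p["n"] for p in partials) / total
        for i in range(dim)
    ]
    intercept = sum(p["intercept"] * p["n"] for p in partials) / total
    return {"weights": weights, "intercept": intercept, "n": total}


# Illustrative numbers only.
combined = combine_linear_models([
    {"weights": [1.0, 2.0], "intercept": 0.0, "n": 100},
    {"weights": [3.0, 4.0], "intercept": 1.0, "n": 300},
])
```

Weighting by row count lets a server with more training data influence the unified model proportionally; an unweighted mean would be the simpler alternative when row counts are unavailable.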

  10. Real Execution
      We call the following script from the Portal (Server 1) to compute the NaiveBayes and LinearRegression models, and we receive the resulting JSON files:

        ./ml_cluster_exec.sh \
          --server instance-trans2.cloudapp.net \
          --algorithm NaiveBayes \
          --training-file /u01/health_data/2servers_data/SPECT.train.part1.csv \
          --testing-file /u01/health_data/SPECT.test.csv \
          --parameters type=bernoulli smoothing=0.01 \
          --server instance-trans3.cloudapp.net \
          --algorithm LinearRegression \
          --training-file /u01/health_data/2servers_data/parkinsons_updrs.data.part1.csv \
          --testing-file /u01/health_data/2servers_data/parkinsons_updrs.data.test.csv \
          --parameters iterations=3 stepsize=3

      Server 2 - NaiveBayes parameters:
      • Type: Bernoulli (the dataset consisted of 0s and 1s)
      • Additive Smoothing: 0.01
      Server 3 - LinearRegression parameters:
      • Number of Iterations: 3
      • Step Size of Gradient Descent: 3.0
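The command above repeats the same group of options once per server. How `ml_cluster_exec.sh` parses them is not shown, so the following is only a sketch of one way to split such an argument list into per-server requests. The function `parse_requests` and the resulting dictionary layout are hypothetical.

```python
import shlex


def parse_requests(argv):
    """Group repeated option blocks into one dict per --server occurrence.

    A hypothetical parser for illustration; the real script's parsing
    logic is not given in the slides.
    """
    requests = []
    current = None
    key = None
    for token in argv:
        if token == "--server":
            # Each --server starts a new request block.
            current = {"parameters": {}}
            requests.append(current)
            key = "server"
        elif token.startswith("--"):
            if current is None:
                raise ValueError("option given before any --server")
            key = token[2:]
        elif key == "parameters":
            # --parameters takes any number of name=value pairs.
            name, _, value = token.partition("=")
            current["parameters"][name] = value
        elif key is not None:
            current[key] = token
        else:
            raise ValueError("unexpected argument: " + token)
    return requests


COMMAND = (
    "./ml_cluster_exec.sh "
    "--server instance-trans2.cloudapp.net --algorithm NaiveBayes "
    "--training-file /u01/health_data/2servers_data/SPECT.train.part1.csv "
    "--testing-file /u01/health_data/SPECT.test.csv "
    "--parameters type=bernoulli smoothing=0.01 "
    "--server instance-trans3.cloudapp.net --algorithm LinearRegression "
    "--training-file /u01/health_data/2servers_data/parkinsons_updrs.data.part1.csv "
    "--testing-file /u01/health_data/2servers_data/parkinsons_updrs.data.test.csv "
    "--parameters iterations=3 stepsize=3"
)
requests = parse_requests(shlex.split(COMMAND)[1:])
```

Parameter values are kept as strings here; converting `smoothing` or `stepsize` to numbers would be left to the code that knows each algorithm's expected types.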

  11. Future Work
      Development
      • Distribute requests and retrieve results asynchronously
      • Extend the Health Cloud Framework to support the full spectrum of Spark MLlib algorithms
      Research Oriented
      • Building on the current experimental features, continue exploring novel ML models by combining information derived from intermediate ones
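As a rough illustration of the asynchronous direction listed under Development, the requests could be dispatched to all servers at once and the JSON results gathered as they complete. This sketch uses Python's standard `concurrent.futures` thread pool; `run_job` is a hypothetical callable standing in for the remote model computation, and nothing here reflects an existing Framework feature.

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(requests, run_job, max_workers=4):
    """Submit every request at once and return results in request order.

    run_job is a hypothetical stand-in for the remote computation;
    the pool size of 4 is an arbitrary illustrative default.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves the order of the input requests.
        return list(pool.map(run_job, requests))


def fake_run(request):
    """Stub that pretends to compute a model and returns JSON text."""
    return '{"server": "%s"}' % request["server"]


async_results = run_concurrently(
    [{"server": "server2"}, {"server": "server3"}],
    fake_run,
)
```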
