How to Use HPC AI500




  1. How to Use HPC AI500
     Zihan Jiang, Xingwang Xiong, Tianshu Hao, and Jianfeng Zhan
     Institute of Computing Technology (ICT), Chinese Academy of Sciences
     http://www.benchcouncil.org/HPCAI500/index.html
     ASPLOS 2018, Williamsburg, VA, USA

  2. General Steps to Use HPC AI500
     - Current release
       - Version 1.0: http://www.benchcouncil.org/HPCAI500/index.html
       - Reference implementation on BenchHub: http://125.39.136.212:8090/hpc-ai500/
     - General steps to run the benchmarks
       - Download the reference implementation from BenchHub.
       - Prepare the dataset and environment according to README.md.
       - Run the scripts (training, evaluation, inference).
     HPC AI500 Bench 19

  3. Download from BenchHub
     - http://125.39.136.212:8090/hpc-ai500/
     - Component benchmarks:
       - http://125.39.136.212:8090/hpc-ai500/EWA (Extreme Weather Analysis)
     - Micro benchmarks:
       - CUDA version: http://125.39.136.212:8090/hpc-ai500/hpc-ai500-benchmark/tree/master/micro_benchmarks/CUDA_version
       - MKL version: http://125.39.136.212:8090/hpc-ai500/hpc-ai500-benchmark/tree/master/micro_benchmarks/MKL_version

  4. - Component Benchmark
       - Extreme weather analysis
     - Micro Benchmark

  5. Overview
     - Extreme weather poses a great challenge to human society. Understanding the life cycle of extreme weather, and even predicting its future trends, has become a significant scientific goal.
     - Achieving this goal requires accurately identifying weather patterns in massive volumes of climate data, in order to gain insight into climate change.
     [Figure: example weather patterns. Panels: Tropical Cyclone, Atmospheric River, Tropical Depression, Extratropical Cyclone]

  6. Overview
     - Deep learning is used as the data analysis tool to identify extreme weather patterns automatically, instead of if-else rules defined by human experts.
     - This is essentially an object detection task.
     [Figure: original weather images and labeled weather images]

  7. Dataset
     - Dataset intro: https://extremeweatherdataset.github.io/
     - Dataset download: the files are large (62 GB each). Obtain them from the following Globus endpoint:
       https://app.globus.org/file-manager?origin_id=89a33dca-e540-11e9-9bfc-0a19784404f4&origin_path=%2F
       You will need a Globus endpoint of your own for the transfer.
     - Features: 16 channels, high resolution (1152 × 768)

  8. Adopted Model
     - Faster-RCNN
     - ResNet-50 + FPN
       • See model-desc.log in the EWA repo on BenchHub for details.
       • http://125.39.136.212:8090/hpc-ai500/EWA

  9. Running Steps
     - Data preprocessing
           # h5 ⟹ JSON file in COCO format
           python hdf5_to_json.py -i ${HDF5_PATH} -o ${ANNO_DIR_PATH} -y ${year}
           # h5 ⟹ 16-channel TIFF images
           python hdf5_to_tif.py -i ${HDF5_PATH} -o ${TIFF_DIR_PATH} -y ${year}
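
To make the target of the first conversion step more concrete, here is a minimal sketch of a COCO-style detection annotation file built in Python. It is not the logic of hdf5_to_json.py: the helper `make_coco`, the category IDs, the exact set of fields, and the box values are assumptions for illustration. Only the class abbreviations (TD, TC, EC, AR), the 1152 × 768 resolution, and the example file name come from this deck.

```python
import json

# Class abbreviations as listed on the visualization slide of this deck.
CATEGORIES = ["TD", "TC", "EC", "AR"]


def make_coco(image_records):
    """Build a minimal COCO-style detection dict (hypothetical helper).

    image_records: list of (file_name, width, height, boxes), where each box
    is (category_name, x_min, y_min, box_width, box_height) in pixels.
    """
    coco = {
        "images": [],
        "annotations": [],
        # Category IDs starting at 1 are an assumption, not taken from the repo.
        "categories": [
            {"id": i + 1, "name": name} for i, name in enumerate(CATEGORIES)
        ],
    }
    name_to_id = {c["name"]: c["id"] for c in coco["categories"]}
    ann_id = 1
    for image_id, (file_name, width, height, boxes) in enumerate(image_records, 1):
        coco["images"].append(
            {"id": image_id, "file_name": file_name, "width": width, "height": height}
        )
        for name, x, y, w, h in boxes:
            coco["annotations"].append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": name_to_id[name],
                "bbox": [x, y, w, h],  # COCO boxes are [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco


# Hypothetical record: one 1152x768 image with a single made-up TC box.
example = make_coco(
    [("climo_1979_00101.tif", 1152, 768, [("TC", 100, 200, 64, 48)])]
)
print(json.dumps(example["annotations"][0]))
```

The point of the sketch is the layout: one `images` list, one `annotations` list that refers to images by ID, and one `categories` list.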

  10. Running Steps
      - Environment installation
            # build a docker image
            cd docker
            docker build -t climo .
            # start the container; --name lets the docker exec below refer to it as "climo"
            docker run --gpus all --ipc=host -p 2222:22 -d --name climo climo
            docker exec -it climo bash

  11. Running Steps
      - Training
            export PYTHONPATH="$(pwd)/src"
            mpirun -np 32 --hostfile "src/hostfile" -bind-to none -map-by slot \
                -x NCCL_DEBUG=INFO \
                -x LD_LIBRARY_PATH \
                -x NCCL_SOCKET_IFNAME=eth0 \
                --allow-run-as-root \
                python src/train.py --logdir /path/to/logdir/ \
                --config MODE_MASK=False MODE_FPN=True \
                DATA.BASEDIR=${DATA_DIR} TRAINER=horovod

  12. Running Steps
      - Inference
            python src/predict.py --predict /path/to/dataset/1979/climo_1979_00101.tif \
                --load train_log/${dir}/model-247500 \
                --config MODE_MASK=False MODE_FPN=True

  13. Running Steps
      - Get the time-to-accuracy
            # time_to_accuracy.sh
            export PYTHONPATH="$(pwd)/src"
            LOG_DIR=train_log/
            ACC_THRESHOLD=0.11
            python src/time_to_accuracy.py --logdir ${LOG_DIR} --acc_threshold ${ACC_THRESHOLD}
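
As an illustration of what a time-to-accuracy metric measures, the sketch below returns the elapsed training time at which accuracy first reaches a threshold. It is not the code of src/time_to_accuracy.py: the function, the record format, and the log values are hypothetical. Only the threshold 0.11 is taken from the script above.

```python
def time_to_accuracy(records, acc_threshold):
    """Return the elapsed seconds at which accuracy first reaches the threshold.

    records: iterable of (elapsed_seconds, accuracy) pairs, in any order.
    Returns None if the threshold is never reached.
    """
    for elapsed, acc in sorted(records):  # sort by elapsed time
        if acc >= acc_threshold:
            return elapsed
    return None


# Hypothetical evaluation log, not measured results.
log = [(600, 0.04), (1200, 0.09), (1800, 0.12), (2400, 0.15)]
print(time_to_accuracy(log, 0.11))  # 1800
```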

  14. Visualization
      - Run the HTTP server
            export PYTHONPATH="$(pwd)/src"
            python http-server.py --load train_log/model-167000 \
                --config MODE_MASK=False MODE_FPN=True
      - View the results in a browser: http://localhost:5000
      * The prediction result contains the predicted boxes and their confidence.
      * TD, TC, EC, and AR represent Tropical Depression, Tropical Cyclone, Extratropical Cyclone, and Atmospheric River, respectively.

  15. Other Metrics
      - Obtain other metrics from TensorBoard:
        - training loss
        - mAP (mean Average Precision)
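
mAP for object detection is conventionally built on intersection over union (IoU): a predicted box counts as a match only if its IoU with a ground-truth box exceeds some threshold. The sketch below shows that building block only, not the evaluation code of the EWA repo. The (x1, y1, x2, y2) box layout is an assumption.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes overlapping in a 5x5 corner: 25 / (100 + 100 - 25).
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```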

  16. The Impact of Batch Size
      [Figure: two panels, Ranks: 128 and Ranks: 32]

  17. Scaling Evaluation
      - Only 50% scaling efficiency.
      - Reason: the EWA workload uses Faster-RCNN for object detection. The sizes and numbers of objects differ from image to image, which leads to a different amount of computation in each rank.
      [Figure: throughput (samples/sec), y-axis 0 to 180; x-axis ticks 1, 8, 16, 32; series: Practical, Ideal]
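
Scaling efficiency is commonly defined as measured throughput divided by the ideal throughput under linear scaling. A minimal sketch under that assumption follows. The throughput numbers are hypothetical and chosen only to illustrate a 50% result; they are not read from the figure.

```python
def scaling_efficiency(throughput_1, throughput_n, n_ranks):
    """Measured throughput divided by ideal linear-scaling throughput."""
    ideal = throughput_1 * n_ranks  # perfect scaling from the 1-rank baseline
    return throughput_n / ideal


# Hypothetical throughputs in samples/sec, not the measured EWA results.
print(scaling_efficiency(5.0, 80.0, 32))  # 0.5
```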

  18. - Component Benchmark
        - Extreme weather analysis
      - Micro Benchmark

  19. Overview
      - Objective
        - Evaluate the upper-bound performance of the systems.
      - The micro benchmarks cover the significant DL operators found in the component workloads:
        - Convolution
        - Pooling
        - Fully-connected

  20. Example
      - CUDA version
      - The MKL version of the implementation is largely similar.
      - See the following link:
        • http://125.39.136.212:8090/hpc-ai500/hpc-ai500-benchmark/tree/master/micro_benchmarks

  21. Running Step
      - Environment installation
        - CUDA: 9.0
        - cuDNN: 7.1.4
        - Open MPI: 3.1.2
        - HDF5: 1.10.4

  22. Running Step
      - Convolution
        - Source code: cudnn_conv.cpp
        - Running script: run_conv.sh
        - Parameters:
          • Input data size: NCHW format
          • Filter size: OIHW format
          • Paddings
          • Strides
          • Dilations
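
The parameters above determine the output shape and the operation count of a convolution. The sketch below applies the standard output-size formula and counts each multiply-add as 2 FLOPs. It is independent of cudnn_conv.cpp: the function names and example shapes are hypothetical, and it assumes a single group with the same padding, stride, and dilation in both spatial dimensions.

```python
def conv_out_size(in_size, kernel, pad, stride, dilation):
    """Output length along one spatial dimension."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + 2 * pad - effective_kernel) // stride + 1


def conv_forward_flops(n, c, h, w, o, kh, kw, pad, stride, dilation):
    """Forward FLOPs for an NCHW input and an OIHW filter (I equals C)."""
    out_h = conv_out_size(h, kh, pad, stride, dilation)
    out_w = conv_out_size(w, kw, pad, stride, dilation)
    macs = n * o * out_h * out_w * c * kh * kw  # multiply-accumulates
    return 2 * macs


# Hypothetical shapes: 1x3x224x224 input, 64x3x7x7 filter, pad 3, stride 2.
print(conv_out_size(224, 7, 3, 2, 1))                           # 112
print(conv_forward_flops(1, 3, 224, 224, 64, 7, 7, 3, 2, 1))    # 236027904
```

Dividing such a FLOP count by the measured kernel time gives an achieved FLOPS figure for the operator.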

  23. Running Step
      - Pooling
        - Source code: cudnn_pooling_forward.cpp
        - Running script: run_pooling.sh
        - Parameters:
          • Input data size: NCHW
          • Filter size
          • Paddings
          • Strides
          • Pooling mode (0 for max pooling, 1 for average pooling)
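
For reference, the effect of the pooling parameters can be reproduced on a single 2-D channel in plain Python. This is a sketch for checking results by hand, not a port of cudnn_pooling_forward.cpp. It assumes a square window, a padding smaller than the window, and an average that counts only in-bounds elements; the benchmark's own treatment of padding in average mode may differ.

```python
def pool2d(image, window, stride, pad, mode):
    """Pool a 2-D list of numbers; mode 0 = max, mode 1 = average."""
    h, w = len(image), len(image[0])
    out_h = (h + 2 * pad - window) // stride + 1
    out_w = (w + 2 * pad - window) // stride + 1
    out = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            values = []
            for ky in range(window):
                for kx in range(window):
                    y = oy * stride - pad + ky
                    x = ox * stride - pad + kx
                    if 0 <= y < h and 0 <= x < w:  # skip padded positions
                        values.append(image[y][x])
            if mode == 0:
                row.append(max(values))
            else:
                row.append(sum(values) / len(values))
        out.append(row)
    return out


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(pool2d(image, 2, 2, 0, 0))  # [[6, 8], [14, 16]]
print(pool2d(image, 2, 2, 0, 1))  # [[3.5, 5.5], [11.5, 13.5]]
```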

  24. Running Step
      - Fully-connected
        - Source code: cudnn_fc_forward.cpp
        - Running script: run_fc.sh
        - Parameters:
          • Input data size: NCHW
          • Output channel
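
A fully-connected layer flattens each NCHW sample to C × H × W features and multiplies by a weight matrix, so its forward cost follows directly from the two parameters above. A minimal sketch, assuming 2 FLOPs per multiply-add and ignoring the bias term; the function name and example shapes are hypothetical:

```python
def fc_forward_flops(n, c, h, w, out_channels):
    """Forward FLOPs of a fully-connected layer on an NCHW input."""
    in_features = c * h * w  # each sample is flattened to a vector
    return 2 * n * in_features * out_channels


# Hypothetical shapes: batch 32, 512x7x7 features, 4096 outputs.
print(fc_forward_flops(32, 512, 7, 7, 4096))  # 6576668672
```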

  25. SPFLOPS

  26. Tensor Core Deep Learning FLOPS
      - NVIDIA Volta architecture
      - 64 FMA floating-point operations per cycle (per Tensor Core)
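
The 64 FMA per cycle figure translates into a theoretical peak once the number of Tensor Cores and the clock rate are known. The sketch below works through that arithmetic, counting each FMA as 2 FLOPs. The 640 Tensor Cores and 1530 MHz boost clock are the publicly quoted Tesla V100 (SXM2) figures and are an assumption here, since the slides do not name a specific GPU.

```python
def tensor_core_peak_flops(num_tensor_cores, clock_hz, fma_per_cycle=64):
    """Theoretical peak: cores x FMAs per cycle x 2 FLOPs per FMA x clock."""
    return num_tensor_cores * fma_per_cycle * 2 * clock_hz


# Assumed V100 figures (not from the slides): 640 Tensor Cores at 1.53 GHz.
peak = tensor_core_peak_flops(640, 1.53e9)
print(peak / 1e12)  # about 125.3 TFLOPS
```

Measured deep learning FLOPS are normally well below this bound, which is what the micro benchmarks are meant to quantify.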

  27. Deep Learning FLOPS
      - The speedup from enabling Tensor Cores

  28. Tensor Core limitations

