How to Use HPC AI500
Zihan Jiang, Xingwang Xiong, Tianshu Hao, and Jianfeng Zhan
Institute of Computing Technology (ICT), Chinese Academy of Sciences
http://www.benchcouncil.org/HPCAI500/index.html
ASPLOS 2018, Williamsburg, VA, USA

General Steps to Use HPC AI500
- Current release
  • Version 1.0: http://www.benchcouncil.org/HPCAI500/index.html
- Reference implementation on BenchHub
  • http://125.39.136.212:8090/hpc-ai500/
- General steps to run the benchmarks
  • Download the reference implementation from BenchHub
  • Prepare the dataset and environment according to README.md
  • Run the scripts (training, evaluation, inference)
HPC AI500 Bench 19

Download from BenchHub
- http://125.39.136.212:8090/hpc-ai500/
- Component benchmarks:
  • http://125.39.136.212:8090/hpc-ai500/EWA (Extreme Weather Analysis)
- Micro benchmarks:
  • CUDA version: http://125.39.136.212:8090/hpc-ai500/hpc-ai500-benchmark/tree/master/micro_benchmarks/CUDA_version
  • MKL version: http://125.39.136.212:8090/hpc-ai500/hpc-ai500-benchmark/tree/master/micro_benchmarks/MKL_version

- Component Benchmark
  • Extreme weather analysis
- Micro Benchmark

Overview
- Extreme weather poses a great challenge to human society. Understanding the life cycle of extreme weather, and even predicting its future trends, has become a significant scientific goal.
- Achieving this goal requires accurately identifying weather patterns through analysis of massive climate data, in order to gain insight into climate change.
[Figure: example images of a Tropical Cyclone, Atmospheric River, Tropical Depression, and Extratropical Cyclone]

Overview
- Deep learning is used as the data analysis tool to identify extreme weather patterns automatically, instead of if-else rules defined by human experts.
- This is essentially an object detection task.
[Figure: original weather images and labeled weather images]

Dataset
- Dataset intro:
  • https://extremeweatherdataset.github.io/
- Dataset download:
  • The files are large (62 GB each). Obtain them from the following Globus endpoint:
    https://app.globus.org/file-manager?origin_id=89a33dca-e540-11e9-9bfc-0a19784404f4&origin_path=%2F
  • You will need a Globus endpoint of your own for the transfer.
- Features: 16 channels, high resolution (1152 * 768)

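To give a sense of how heavy a 16-channel, 1152 * 768 input is, the following minimal sketch estimates the in-memory size of one dense sample. The 4-byte element size (float32) is an assumption; the slides do not state the storage type, and the function name is hypothetical.

```python
# Rough memory footprint of one EWA input sample.
CHANNELS = 16                # from the slide
WIDTH, HEIGHT = 1152, 768    # from the slide
BYTES_PER_VALUE = 4          # ASSUMPTION: values held as float32


def sample_bytes(channels=CHANNELS, width=WIDTH, height=HEIGHT,
                 bytes_per_value=BYTES_PER_VALUE):
    """Return the size in bytes of one dense multi-channel image."""
    return channels * width * height * bytes_per_value


if __name__ == "__main__":
    size = sample_bytes()
    print(f"{size} bytes = {size / 2**20:.1f} MiB per sample")
```

Under that assumption a single sample is tens of MiB, which is one reason data loading and batch size matter for this workload.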
Adopted Model
- Faster-RCNN
  • ResNet-50 + FPN
  • See model-desc.log in the EWA repo on BenchHub for details.
  • http://125.39.136.212:8090/hpc-ai500/EWA

Running Steps
- Data preprocessing
    # HDF5 => JSON annotation file in COCO format
    python hdf5_to_json.py -i ${HDF5_PATH} -o ${ANNO_DIR_PATH} -y ${year}
    # HDF5 => 16-channel TIFF images
    python hdf5_to_tif.py -i ${HDF5_PATH} -o ${TIFF_DIR_PATH} -y ${year}

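The first conversion targets the COCO annotation layout. As a minimal sketch of what such a JSON file contains, the snippet below builds one from a list of boxes. The helper name `boxes_to_coco`, the input tuple layout, and the category ids are hypothetical; the actual hdf5_to_json.py in the EWA repo may organize things differently.

```python
import json

# Class names come from the slides; the id order is an ASSUMPTION.
CATEGORIES = ["TD", "TC", "EC", "AR"]


def boxes_to_coco(file_name, width, height, boxes):
    """Build a COCO-style dict for a single image.

    boxes: list of (xmin, ymin, xmax, ymax, class_name) tuples
           (hypothetical input layout for this sketch).
    """
    annotations = []
    for ann_id, (xmin, ymin, xmax, ymax, name) in enumerate(boxes, start=1):
        w, h = xmax - xmin, ymax - ymin
        annotations.append({
            "id": ann_id,
            "image_id": 1,
            "category_id": CATEGORIES.index(name) + 1,
            "bbox": [xmin, ymin, w, h],  # COCO boxes are [x, y, width, height]
            "area": w * h,
            "iscrowd": 0,
        })
    return {
        "images": [{"id": 1, "file_name": file_name,
                    "width": width, "height": height}],
        "annotations": annotations,
        "categories": [{"id": i + 1, "name": n}
                       for i, n in enumerate(CATEGORIES)],
    }


if __name__ == "__main__":
    coco = boxes_to_coco("example.tif", 1152, 768,
                         [(100, 200, 180, 260, "TC")])
    print(json.dumps(coco, indent=2))
```

The key point is that COCO stores boxes as corner plus width and height, not as two corners.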
Running Steps
- Environment installation
    # Build the Docker image
    cd docker
    docker build -t climo .
    # Start a container named "climo" from the image, then open a shell in it
    docker run --gpus all --ipc=host -p 2222:22 -d --name climo climo
    docker exec -it climo bash

Running Steps
- Training
    export PYTHONPATH="$(pwd)/src"
    mpirun -np 32 --hostfile "src/hostfile" -bind-to none -map-by slot \
        -x NCCL_DEBUG=INFO \
        -x LD_LIBRARY_PATH \
        -x NCCL_SOCKET_IFNAME=eth0 \
        --allow-run-as-root \
        python src/train.py --logdir /path/to/logdir/ \
        --config MODE_MASK=False MODE_FPN=True \
        DATA.BASEDIR=${DATA_DIR} TRAINER=horovod

Running Steps
- Inference
    python src/predict.py --predict /path/to/dataset/1979/climo_1979_00101.tif \
        --load train_log/${dir}/model-247500 \
        --config MODE_MASK=False MODE_FPN=True

Running Steps
- Get the time-to-accuracy
    # time_to_accuracy.sh
    export PYTHONPATH="$(pwd)/src"
    LOG_DIR=train_log/
    ACC_THRESHOLD=0.11
    python src/time_to_accuracy.py --logdir ${LOG_DIR} --acc_threshold ${ACC_THRESHOLD}

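Time-to-accuracy is the elapsed training time at which the evaluated accuracy first reaches a threshold. The sketch below shows that idea on an in-memory list. It is not the repo's src/time_to_accuracy.py, which reads from the training log directory; the record layout and the sample values are made up for illustration.

```python
def time_to_accuracy(records, acc_threshold):
    """Return the elapsed time of the first record meeting the threshold.

    records: iterable of (elapsed_seconds, accuracy) pairs in
             chronological order (hypothetical layout for this sketch).
    Returns None if the threshold is never reached.
    """
    for elapsed, accuracy in records:
        if accuracy >= acc_threshold:
            return elapsed
    return None


if __name__ == "__main__":
    # Illustrative values only, not measured results.
    history = [(600, 0.02), (1200, 0.07), (1800, 0.12), (2400, 0.15)]
    print(time_to_accuracy(history, acc_threshold=0.11))
```

Returning None rather than the last timestamp keeps a run that never converges from being mistaken for a slow but successful one.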
Visualization
- Run the HTTP server
    export PYTHONPATH="$(pwd)/src"
    python http-server.py --load train_log/model-167000 \
        --config MODE_MASK=False MODE_FPN=True
- View the visualization in a browser
  • http://localhost:5000
* The prediction result contains the predicted boxes and their confidence.
* TD, TC, EC, and AR represent Tropical Depression, Tropical Cyclone, Extratropical Cyclone, and Atmospheric River, respectively.

Other Metrics
- Obtain other metrics from TensorBoard
  • Training loss
  • mAP (mean Average Precision)

The Impact of Batch Size
[Figure: two panels, Ranks: 128 and Ranks: 32]

Scaling Evaluation
- Only 50% scaling efficiency.
- Reason: the EWA workload uses Faster-RCNN for object detection. The sizes and numbers of objects differ from image to image, which leads to a different amount of computation in each rank.
[Figure: y-axis: throughput (samples/sec); x-axis ticks: 1, 8, 16, 32; series: Practical, Ideal]

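Scaling efficiency is commonly computed as measured throughput divided by the ideal (linear) throughput at the same scale. The sketch below uses that common definition, which the slides do not spell out, so treat it as an assumption. The throughput numbers are hypothetical, chosen only to reproduce a 50% result.

```python
def ideal_throughput(single_throughput, n):
    """Linear-scaling throughput for n workers."""
    return single_throughput * n


def scaling_efficiency(single_throughput, n, measured_throughput):
    """Measured throughput as a fraction of linear scaling."""
    return measured_throughput / ideal_throughput(single_throughput, n)


if __name__ == "__main__":
    # HYPOTHETICAL numbers (samples/sec), not read from the slide's plot.
    one_worker = 5.0
    at_32 = 80.0
    print(f"efficiency at 32: {scaling_efficiency(one_worker, 32, at_32):.0%}")
```

With load imbalance across ranks, every synchronous step waits for the slowest rank, which pushes the measured curve below the ideal line.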
- Component Benchmark
  • Extreme weather analysis
- Micro Benchmark

Overview
- Objective
  • Evaluate the upper-bound performance of the systems.
- Significant DL operators, selected from the component workloads
  • Convolution
  • Pooling
  • Fully-connected

Example
- CUDA version
  • The MKL version of the implementation is largely similar.
  • See the following link:
    http://125.39.136.212:8090/hpc-ai500/hpc-ai500-benchmark/tree/master/micro_benchmarks

Running Step
- Environment installation
  • CUDA: 9.0
  • cuDNN: 7.1.4
  • Open MPI: 3.1.2
  • HDF5: 1.10.4

Running Step
- Convolution
  • Source code: cudnn_conv.cpp
  • Running script: run_conv.sh
  • Parameters:
    - Input data size: NCHW format
    - Filter size: OIHW format
    - Paddings
    - Strides
    - Dilations

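The parameters above fully determine the output shape and the arithmetic work of a forward convolution. The sketch below applies the standard output-size formula and counts two floating-point operations per multiply-accumulate. It is independent of cudnn_conv.cpp; the function names are hypothetical, and the example shapes are illustrative rather than taken from run_conv.sh.

```python
def conv_output_size(in_size, kernel, padding, stride, dilation):
    """Standard output length of a convolution along one spatial axis."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + 2 * padding - effective_kernel) // stride + 1


def conv_forward_flops(input_nchw, filter_oihw, padding, stride, dilation):
    """Return (output NCHW shape, forward FLOPs) for a dense convolution.

    padding, stride, dilation: (height, width) pairs.
    Counts 2 FLOPs per multiply-accumulate; bias is ignored.
    """
    n, c, h, w = input_nchw
    o, i, kh, kw = filter_oihw
    if i != c:
        raise ValueError("filter input channels must match input channels")
    out_h = conv_output_size(h, kh, padding[0], stride[0], dilation[0])
    out_w = conv_output_size(w, kw, padding[1], stride[1], dilation[1])
    macs = n * o * out_h * out_w * i * kh * kw
    return (n, o, out_h, out_w), 2 * macs


if __name__ == "__main__":
    # Illustrative shapes: a 7x7, stride-2 convolution on a 224x224 input.
    shape, flops = conv_forward_flops(
        (1, 3, 224, 224), (64, 3, 7, 7), (3, 3), (2, 2), (1, 1))
    print(shape, flops)
```

Dividing this FLOP count by the measured kernel time gives the achieved FLOPS for one configuration.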
Running Step
- Pooling
  • Source code: cudnn_pooling_forward.cpp
  • Running script: run_pooling.sh
  • Parameters:
    - Input data size: NCHW
    - Filter size
    - Paddings
    - Strides
    - Pooling mode (0 for max pooling, 1 for average pooling)

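As a reference for what the two pooling modes compute, here is a minimal pure-Python sketch on a single 2D plane. It mirrors the 0/1 mode convention listed above but is otherwise an assumption-laden simplification: square windows, no padding, and hypothetical function names. It is not derived from cudnn_pooling_forward.cpp.

```python
def pool_output_size(in_size, kernel, stride, padding=0):
    """Standard output length of a pooling window along one axis."""
    return (in_size + 2 * padding - kernel) // stride + 1


def pool2d(x, kernel, stride, mode):
    """Pool a 2D list of numbers without padding.

    mode: 0 for max pooling, 1 for average pooling.
    """
    if mode not in (0, 1):
        raise ValueError("mode must be 0 (max) or 1 (average)")
    out_h = pool_output_size(len(x), kernel, stride)
    out_w = pool_output_size(len(x[0]), kernel, stride)
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Gather the kernel x kernel window anchored at (i, j).
            window = [x[i * stride + di][j * stride + dj]
                      for di in range(kernel) for dj in range(kernel)]
            row.append(max(window) if mode == 0
                       else sum(window) / len(window))
        out.append(row)
    return out


if __name__ == "__main__":
    plane = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(pool2d(plane, kernel=2, stride=2, mode=0))
    print(pool2d(plane, kernel=2, stride=2, mode=1))
```

A small reference like this is handy for checking a GPU kernel's output on tiny inputs before timing it on large ones.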
Running Step
- Fully-connected
  • Source code: cudnn_fc_forward.cpp
  • Running script: run_fc.sh
  • Parameters:
    - Input data size: NCHW
    - Output channel

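For a fully-connected layer, the NCHW input is typically flattened to N rows of C*H*W features and multiplied by a weight matrix with one column per output channel. The sketch below counts the forward FLOPs under that assumption (2 FLOPs per multiply-accumulate, bias ignored). The function name and example shape are hypothetical and not taken from run_fc.sh.

```python
def fc_forward_flops(input_nchw, out_channels):
    """Forward FLOPs of a dense layer applied to a flattened NCHW input."""
    n, c, h, w = input_nchw
    in_features = c * h * w
    # One multiply and one add per (sample, input feature, output channel).
    return 2 * n * in_features * out_channels


if __name__ == "__main__":
    # Illustrative shape only.
    print(fc_forward_flops((64, 512, 7, 7), out_channels=4096))
```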
SPFLOPS

Deep Learning FLOPS
- Tensor Core (NVIDIA Volta architecture)
  • 64 FMA floating-point operations per cycle

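The per-cycle figure translates into a theoretical peak once a Tensor Core count and clock rate are fixed. The sketch below works that arithmetic. The 640 Tensor Cores and 1530 MHz boost clock are assumptions taken from NVIDIA's published Tesla V100 specifications, not from these slides, and the function name is hypothetical.

```python
FMA_PER_CYCLE_PER_TENSOR_CORE = 64   # from the slide
FLOPS_PER_FMA = 2                    # one multiply plus one add


def tensor_core_peak_flops(num_tensor_cores, clock_hz):
    """Theoretical peak Tensor Core FLOPS for a GPU."""
    return (num_tensor_cores * FMA_PER_CYCLE_PER_TENSOR_CORE
            * FLOPS_PER_FMA * clock_hz)


if __name__ == "__main__":
    # ASSUMPTION: Tesla V100 figures (640 Tensor Cores, 1530 MHz boost).
    peak = tensor_core_peak_flops(640, 1_530_000_000)
    print(f"{peak / 1e12:.1f} TFLOPS")
```

This is an upper bound; measured micro benchmark results sit below it because of memory traffic and kernel launch overheads.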
Deep Learning FLOPS
- The speedup from enabling Tensor Cores

Tensor Core Limitations