Tools (e.g. for streaming DAQ, fast ML, automation/self-running DAQ, …)
Mia Liu, Nhan Tran, Fermilab, + input from many in the Fast ML and broader community!
DOE Basic Research Needs Study (Community Meeting for TDAQ)
In partnership with:
December 3rd, 2019
The dream TDAQ
• Powerful, sophisticated algorithms
• Training/updating on the fly
• Autonomous, self-calibrating
• Safe, with minimal down-time
• Analyze everything, no data loss
• Modular, multiple processing layers
Generic system
[Diagram: Detector/Accelerator feeds TDAQ-1 … TDAQ-N (reconstruct), which write data tiers 1 … N; offline systems perform analysis/alerts and self-calibration (re-train), feeding updates back to each TDAQ layer]
Specific systems
Specific systems
• Real-time controls, trigger, alerts
• Fixed-latency/clocked to transient/streaming events
• Wide range of detector scales and timelines (1 ns to 1 s)
Latency landscape
[Figure: latency axis from 1 ns to 1 s, with example systems placed along it: CMS trigger latency (~1 μs), DUNE DAQ, RF signal processing, LSST transient detection; data rates span ~1 PB/s on-detector to ~1 PB/day offline]
• CMS example: the L1 Trigger accepts 40 MHz (~1 MB/evt) and outputs 100 kHz; the High-Level Trigger reduces this to ~1 kHz for offline processing
• Massive data rates, on-detector low-latency processing
• Extreme environments: low-power, cryogenic, high-radiation
• Computing challenges: need to investigate how to integrate heterogeneous computing platforms (ASICs, FPGAs, …)
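The rate-reduction numbers in the CMS example above can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes ~1 MB/event at every stage, which is a simplification (event sizes actually differ between stages):

```python
# Back-of-the-envelope data-rate arithmetic for the CMS trigger chain.
# Assumes ~1 MB/event at every stage (a simplification).

EVENT_SIZE_MB = 1.0  # ~1 MB per event

stages = {
    "collisions (40 MHz)": 40e6,
    "L1 trigger output (100 kHz)": 100e3,
    "HLT output (~1 kHz)": 1e3,
}

for name, rate_hz in stages.items():
    tb_per_s = rate_hz * EVENT_SIZE_MB / 1e6  # MB/s -> TB/s
    print(f"{name}: {tb_per_s:g} TB/s")
# collisions: 40 TB/s, after L1: 0.1 TB/s, after HLT: 0.001 TB/s
```

Even after two trigger stages, the ~1 GB/s reaching offline storage explains the interest in parked datasets and streaming analysis later in this talk.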
On-detector sophisticated algorithms
[https://arxiv.org/abs/1804.06913], [fastmachinelearning.org/hls4ml]
ML in the hardware trigger
• All-FPGA design
• Flexible: many algorithm kernels for processing different architectures
• Application and adoption growing across the LHC and beyond!
• Growing interest with many on-going developments
  • CNNs, graphs, RNNs, auto-encoders, binary/ternary networks
  • Alternate HLS backends (Intel, Mentor, Cadence)
  • Co-processors, multi-FPGA
  • Intelligent ASICs
• > 5000-parameter fully connected network in 100 ns
• See Phil’s talk
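A key ingredient behind numbers like "5000 parameters in 100 ns" is that hls4ml maps trained networks onto fixed-point FPGA arithmetic. The effect of that quantization can be previewed in plain numpy. This is an illustrative sketch, not the hls4ml API: the layer sizes are arbitrary, and the 16-bit/6-integer-bit format echoes hls4ml's commonly used ap_fixed<16,6> precision:

```python
import numpy as np

def quantize(x, total_bits=16, int_bits=6):
    """Round to a signed fixed-point grid like ap_fixed<16,6>:
    6 integer bits (incl. sign) and 10 fractional bits."""
    frac_bits = total_bits - int_bits
    scale = 2.0 ** frac_bits
    lo = -(2.0 ** (int_bits - 1))
    hi = 2.0 ** (int_bits - 1) - 1.0 / scale
    return np.clip(np.round(x * scale) / scale, lo, hi)

rng = np.random.default_rng(0)
# A small fully connected ReLU layer: 16 inputs -> 8 outputs (sizes arbitrary)
w = rng.normal(0.0, 0.5, size=(16, 8))
b = rng.normal(0.0, 0.5, size=8)
x = rng.normal(0.0, 1.0, size=16)

y_float = np.maximum(x @ w + b, 0.0)                                  # reference
y_fixed = np.maximum(quantize(x) @ quantize(w) + quantize(b), 0.0)    # FPGA-like

print("max abs deviation:", np.abs(y_float - y_fixed).max())
```

Scanning such bit widths per layer (precision vs. resources) is exactly the co-design loop the hls4ml workflow is built around.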
hls4…ml…4asic?
Hardware acceleration with an emphasis on co-design and fast turnaround time
First project: autoencoder with MNIST benchmark (28 × 28 × 8-bits @ 40 MHz)
[Diagram: original data → Encoder (reprogrammable weights) → compressed data → high-speed drivers → Decoder → reconstructed data; rate: 40 MHz]
Enable edge compute, e.g. data compression:
• Efficient bandwidth usage
• Reduced power consumption (data transfer)
• Programmable and reconfigurable: reprogrammable weights
• Hardware–software co-design: algorithm-driven architectural approach
• Optimized mixed-signal/analog techniques: low power and low latency for extreme environments (ionizing radiation, deep cryogenic)
First tests of 1-layer design:
• Latency: 9 ns
• Power (FPGA, 28 nm) ~ 2.5 W
• Power (ASIC, 65 nm) ~ 40 mW
• Area = 0.5 mm × 0.5 mm
FNAL, NW, Columbia, work in progress
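The bandwidth argument for on-ASIC compression can be made concrete for the benchmark above: 28 × 28 × 8-bit frames at 40 MHz. The 64-byte latent size below is a hypothetical choice for illustration, not a number from the slide:

```python
# Bandwidth implications of on-detector autoencoder compression for the
# MNIST-style benchmark: 28 x 28 x 8-bit frames at 40 MHz.
# LATENT_BYTES is a hypothetical encoder output size, not from the slide.

FRAME_BYTES = 28 * 28   # 784 bytes per frame (8 bits/pixel)
RATE_HZ = 40e6          # one frame per 25 ns
LATENT_BYTES = 64       # hypothetical compressed representation

raw_gbps = FRAME_BYTES * RATE_HZ * 8 / 1e9         # ~250 Gb/s off-detector
compressed_gbps = LATENT_BYTES * RATE_HZ * 8 / 1e9 # ~20 Gb/s after encoder

print(f"raw:        {raw_gbps:.1f} Gb/s")
print(f"compressed: {compressed_gbps:.1f} Gb/s")
print(f"reduction:  {FRAME_BYTES / LATENT_BYTES:.2f}x")
```

This is why the encoder sits on-detector with reprogrammable weights: the links and their power budget scale with the compressed rate, not the raw one.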
Off-detector: heterogeneous computing
[Diagram: flexibility–efficiency spectrum from CPUs and GPUs to FPGAs and ASICs; CPU block shows control unit (CU), arithmetic logic unit (ALU), registers]
• Opportunities for deploying accelerated heterogeneous compute for real-time analysis
• How best to integrate into a given TDAQ workflow
  • ML/not ML
  • Service or direct connect
  • GPU, FPGA, ASIC
• Advances in heterogeneous computing driven by machine learning
• Proof-of-concept for ML with FPGAs as a service: https://arxiv.org/abs/1904.08986
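One reason the as-a-service model can work for real-time workflows is that per-request latency is hidden by keeping many requests in flight. A toy sketch of that pipelining, where the remote accelerator is simulated by a function with an invented 5 ms "network + inference" delay:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_inference(event_id, latency_s=0.005):
    """Stand-in for a request to an FPGA/GPU inference service;
    the 5 ms round-trip delay is an illustrative assumption."""
    time.sleep(latency_s)
    return event_id  # pretend this is the inference result

N = 20

# Serial client: each event waits for the previous request to return.
t0 = time.perf_counter()
serial = [remote_inference(i) for i in range(N)]
t_serial = time.perf_counter() - t0

# Pipelined client: many requests in flight, as a DAQ node would issue them.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=N) as pool:
    pipelined = list(pool.map(remote_inference, range(N)))
t_pipelined = time.perf_counter() - t0

print(f"serial:    {t_serial * 1e3:.0f} ms")    # ~N x 5 ms
print(f"pipelined: {t_pipelined * 1e3:.0f} ms") # ~5 ms plus overhead
```

Throughput then scales with the number of in-flight requests until the accelerator saturates, which is the trade studied in the FPGAs-as-a-service proof-of-concept cited above.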
Autonomous, self-calibrating detector
• In-situ training: FPGA/System-on-Chip
• Off-line training: CPU/heterogeneous fast-streaming computing
• Anomaly detection and weight updating
• Transient detection algorithms
• Reinforcement learning
• Neuromorphic algorithms (spiking)
Hardware: FPGA/System-on-Chip
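At its simplest, "anomaly detection and weight updating" can be sketched as an exponentially weighted running mean/variance whose parameters update in place as data streams through: in-distribution samples recalibrate the detector, out-of-distribution samples raise an alert. The decay constant and 5-sigma threshold below are arbitrary illustrative choices:

```python
import math
import random

class StreamingAnomalyDetector:
    """Exponentially weighted running mean/variance with in-place
    ('self-calibrating') updates. alpha and n_sigma are illustrative."""

    def __init__(self, alpha=0.01, n_sigma=5.0):
        self.alpha = alpha
        self.n_sigma = n_sigma
        self.mean = 0.0
        self.var = 1.0

    def update(self, x):
        """Flag x if it is anomalous; otherwise fold it into the stats."""
        sigma = math.sqrt(self.var)
        is_anomaly = abs(x - self.mean) > self.n_sigma * sigma
        if not is_anomaly:  # only calibrate on in-distribution samples
            delta = x - self.mean
            self.mean += self.alpha * delta
            self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        return is_anomaly

random.seed(1)
det = StreamingAnomalyDetector()
baseline = [det.update(random.gauss(0.0, 1.0)) for _ in range(5000)]
spike = det.update(50.0)  # an obvious transient
print("false-alarm rate:", sum(baseline) / len(baseline))
print("spike flagged:", spike)
```

A real detector-channel monitor would run many such estimators in FPGA/SoC fabric; the reinforcement-learning and spiking approaches on this slide replace the fixed threshold with a learned policy.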
Autonomous, self-tuning accelerator
• In-situ training: FPGA/System-on-Chip
• Off-line training: CPU/heterogeneous fast-streaming computing
• Anomaly detection and weight updating
• Transient detection algorithms
• Reinforcement learning
• Neuromorphic algorithms (spiking)
For accelerator applications, a constant tuning/feedback loop is required
Hardware: FPGA/System-on-Chip
Tools for the dream
New algorithms
• Powerful intelligent algorithms
• New types of algorithms beyond classification & regression
Electronics hardware and infrastructure
• FPGAs designed for ML and vice versa
• Opportunities for heterogeneous hardware (e.g. Versal)
• Push ML to the very front end (ML in ASICs, reconfigurable weights)
Systems designed for operations and control
• Autonomous, self-calibrating
• Automation for (a) detecting when conditions have changed and (b) deciding what actions to take
• Fast DAQ paths with deep buffers for monitoring individual channels; how to deal with different time scales?
• Training and recalibration: “offline system” (GPU, …) or small-scale in situ (ARM processor, in-FPGA)
• Analyze everything, no data loss
• Modular, portable, multiple processing layers
• Streaming fast analysis: accessible programming paradigms; SoC R&D
• Data storage: affordable, new/different storage technologies for persistent (parked) datasets
Extra