1. Modeling Resource-Coupled Computations
Mark Hereld
Computation Institute
Mathematics and Computer Science
Argonne Leadership Computing Facility
Argonne National Laboratory
University of Chicago

2. Roadmap
• issues and ideas
• models and measurements
• implications and work in progress


3. Issue
• Given increasingly massive (and complex) datasets…
• how to connect them to computational and display resources that support visualization and analysis?
• holistic approaches to allocating simulation, analysis, visualization, display, storage, and network resources
• create and exploit ways to optimally couple these resources in real time


4. Common sense
• Analysis engines must be co-located with simulation engines
• …or even, analysis code must be co-located with simulation code, i.e., in situ
• Display resources must be integrated locally with HPC resources
• In general, wide-area applications will become impossible…
• But, maybe the situation isn't so dire.


5. ideas
• Ideas
• Models
• Measurements
• Consequences
• Future



6. Mitigation
• More efficient I/O practices
  – Many (most) inefficiencies in R/W rates are amenable to better practices by the application developer
  – In addition to improvements in the performance of I/O libraries
• Better data management
  – Better data layout
• Better brute-force compression methods
  – Uncertainty aware; domain aware
• Leveraging limitations at the destination (see the sketch after this list)
  – Pixel real estate
  – Perceptual limitations (and features)
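To make the "pixel real estate" point concrete, here is a rough back-of-envelope comparison using the 4K uniform grid cube and 4K x 4K tiled display that appear in the wide-area experiment slide later in this deck; the byte sizes per voxel and per pixel are assumptions for illustration, not numbers from the deck, so small differences from the quoted 257 GB per time step are expected.

# Back-of-envelope: how much of a volume time step can a display actually show?
# Grid and display sizes come from the wide-area experiment slide; the 4-byte
# float voxel and 3-byte RGB pixel are assumptions for illustration.

voxels_per_step = 4096 ** 3            # 4K uniform grid cube
bytes_per_voxel = 4                    # single float variable (assumed)
step_bytes = voxels_per_step * bytes_per_voxel

pixels_on_display = 4096 * 4096        # 4K x 4K tiled display
bytes_per_pixel = 3                    # RGB framebuffer (assumed)
frame_bytes = pixels_on_display * bytes_per_pixel

print(f"raw time step : {step_bytes / 1e9:8.1f} GB")       # deck quotes 257 GB/step
print(f"rendered frame: {frame_bytes / 1e6:8.1f} MB")
print(f"reduction     : {step_bytes / frame_bytes:8.0f} x") # thousands-fold reduction

Even before any perceptual argument, shipping rendered pixels instead of raw voxels cuts the data that must cross the wide area by more than three orders of magnitude.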


7. Coupled Resources
• remote visualization: couple data and large computational resources to remote display hardware
• in situ analysis and visualization: merge simulation and analysis code on a single machine
• co-analysis: couple a simulation on a supercomputer to live analysis on a visualization and analysis platform


8. models
• Ideas
• Models
• Measurements
• Consequences
• Future



9. ALCF Network Architecture
[Diagram: 40K BGP compute nodes reach 640 BGP I/O nodes over the 10G MX tree network (10GE, 4.3 Tbps aggregate); a 5-stage CLOS Myrinet switch complex, handling 10GE<->MX conversion and MX<->MX switching, connects the I/O nodes (640 x 10G = 6.4 Tbps), Eureka (100 x 10G = 1 Tbps), and 128 file-server nodes (128 x 10G = 1.28 Tbps). Tbps = Terabits/sec.]
Theoretical max bandwidth, memory to memory (a transfer-time sketch follows this list):
• I/O nodes to Eureka = 1 Tbps (bi-directional = 2 Tbps)
• I/O nodes to file servers = 1.28 Tbps (bi-directional = 2.56 Tbps)
• Eureka to file servers = 1 Tbps (bi-directional = 2 Tbps)
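As a quick illustration of what these theoretical rates would mean in practice, the sketch below converts them into transfer times for a single FLASH checkpoint file (~42 GB, taken from the FLASH table later in the deck). It assumes the quoted memory-to-memory rates are actually achievable end to end, which is exactly what the internal network experiments later in the deck set out to test.

# Illustrative only: how long one FLASH checkpoint would take to move across
# each link at the theoretical (memory-to-memory) rates in the diagram above.

TBPS = 1e12 / 8          # one terabit per second, expressed in bytes/sec

links_tbps = {
    "I/O nodes -> Eureka":       1.00,
    "I/O nodes -> file servers": 1.28,
    "Eureka -> file servers":    1.00,
}

checkpoint_bytes = 42e9   # ~42 GB checkpoint file (from the FLASH table later)

for name, tbps in links_tbps.items():
    seconds = checkpoint_bytes / (tbps * TBPS)
    print(f"{name:27s}: {seconds:5.2f} s at {tbps} Tbps")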


10. Data Analytics Resource: Eureka
• Data analytics and visualization cluster at ALCF
• (2) head nodes, (100) compute nodes, each with (aggregate figures sketched after this list):
  – (2) NVIDIA Quadro FX5600 graphics cards
  – (2) Xeon E5405 2.00 GHz quad-core processors
  – 32 GB RAM: (8) 4-rank, 4 GB DIMMs
  – (1) Myricom 10G CX4 NIC
  – (2) 250 GB local disks; (1) system, (1) minimal scratch
  – 32 GFlops per server
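For context, the per-node specs above imply the following aggregate capability across the 100 compute nodes (head nodes excluded); this is simple arithmetic on the quoted numbers, not a figure from the deck.

# Aggregate Eureka capability implied by the per-node specs on this slide.

nodes = 100
gflops_per_node = 32        # quoted per-server peak
ram_gb_per_node = 32
gpus_per_node = 2

print(f"peak compute: {nodes * gflops_per_node / 1000:.1f} TFlops")
print(f"total RAM   : {nodes * ram_gb_per_node / 1000:.1f} TB")
print(f"GPUs        : {nodes * gpus_per_node}")

The roughly 3 TB of aggregate RAM is what makes the multi-time-step caching mentioned later plausible.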


11. Application
• FLASH
  – Multi-physics code: gravitation, nuclear chemistry, MHD
  – Laboratory to Universe
• Multiple (~20) simulations
  – 8 km resolution, 10K to 100K blocks each, (16 x 16 x 16) voxels per block (a size sketch follows this list)
  – 2 racks (8K cores) of ANL's Intrepid (BGP)
  – typical simulation is 10 runs of 12 hours each
• O(hour) per checkpoint cycle
  – 66% of time spent simulating
  – 33% of time spent in non-overlapping I/O
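For a rough sense of the data volumes behind these block counts, the sketch below computes the size of a single snapshot variable at the quoted block counts. The 4-byte (float) and 8-byte (double) values per voxel are assumptions for illustration; the measured ~8 GB checkpoint files on the following slides are roughly consistent with a few double-precision variables at about 100K blocks.

# Rough size of one FLASH snapshot variable under the block counts quoted above.
# Bytes per value is an assumption (4 for single precision, 8 for double).

voxels_per_block = 16 ** 3                    # (16 x 16 x 16) voxels per block

for blocks in (10_000, 100_000):
    for label, bytes_per_value in (("float", 4), ("double", 8)):
        gb = blocks * voxels_per_block * bytes_per_value / 1e9
        print(f"{blocks:7d} blocks, {label:6s}: {gb:6.2f} GB per variable")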


12. measurements
• Ideas
• Models
• Measurements
• Consequences
• Future



13. FLASH I/O for 1 run (12 hours)
• Total run time = 41557 secs (per-stream write rates are worked out in the sketch after this list)
  – I/O time during run = 14325 secs (34% of the time)
  – Circa March 2009
• Particle data:
  – 417 files (0.1 GB each) = 41.7 GB
  – Time spent writing = 9047 secs (22% of the run time)
• Plot files:
  – 104 files (2.5 GB each); total = 260 GB
  – Time spent writing = 3897 secs (9% of the run time)
• Checkpoint files:
  – 10 files (8 GB each); total = 80 GB
  – Time spent writing = 1144 secs (3% of the run time)
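The sizes and times above pin down the effective write rate of each output stream; the sketch below simply divides the quoted numbers. The particle stream comes out an order of magnitude slower per byte than the plot and checkpoint streams, which is the kind of application-level I/O inefficiency the mitigation slide earlier points at.

# Effective (non-overlapping) write rates implied by the measured run above.
# Sizes and times are taken directly from this slide.

streams = {
    #  name        total GB   write secs
    "particle":   (41.7,       9047),
    "plot":       (260.0,      3897),
    "checkpoint": (80.0,       1144),
}

for name, (gigabytes, seconds) in streams.items():
    rate = gigabytes / seconds * 1000       # MB/s
    print(f"{name:10s}: {rate:7.1f} MB/s ({gigabytes:.1f} GB in {seconds} s)")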


14. FLASH Supernova Explosion Project
• multiple (~20) simulations
  – 8 km resolution
  – 10K to 100K blocks each, (16 x 16 x 16) voxels per block
  – 2 racks (8K cores) of ANL's Intrepid (BGP)
  – typical simulation is 10 runs of 12 hours each
  – Circa November 2009

===========================================================
File Type     File Size    #files/Run   #files/Sim  Data Size
===========================================================
Particle      ~ 131 MB     ~ 500        5000        500 GB
Plot          ~  13 GB     40-90        800         10 TB
Checkpoint    ~  42 GB     5-10         100         4.2 TB
===========================================================
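Pure arithmetic on the table above gives the per-simulation total and an approximate project-wide data volume, assuming all ~20 simulations are of similar size.

# Aggregate data volume implied by the table above.

per_sim_tb = {
    "particle":   0.5,    # 500 GB
    "plot":       10.0,   # 10 TB
    "checkpoint": 4.2,    # 4.2 TB
}

sims = 20                                   # "multiple (~20) simulations"
per_sim_total = sum(per_sim_tb.values())    # ~14.7 TB per simulation
print(f"per simulation : {per_sim_total:.1f} TB")
print(f"~{sims} simulations: {sims * per_sim_total:.0f} TB")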

15. Internal Network Experiments
[Diagram: experimental data path from BGP compute nodes, over the tree network, through a BGP I/O node and the switch, to an analysis node.]


16. Toward middleware to facilitate co-analysis
[Diagram: BGP compute nodes]
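The deck does not show the middleware itself; purely as an illustration of the staging pattern this slide points toward (simulation output pushed to an analysis node rather than written straight to disk), here is a minimal sketch. The host name, port, function name, and message framing are all hypothetical; this is not the ALCF middleware.

# Minimal sketch of the co-analysis staging idea: a simulation-side hook ships
# each snapshot to an analysis node over a socket instead of (or in addition to)
# writing it to the parallel file system. All names here are hypothetical.

import socket
import struct

ANALYSIS_HOST = "eureka-login"   # hypothetical analysis-node address
ANALYSIS_PORT = 9999             # hypothetical port

def stage_to_analysis(step: int, field: bytes) -> None:
    """Send one time step's field data to the analysis node."""
    with socket.create_connection((ANALYSIS_HOST, ANALYSIS_PORT)) as sock:
        # Simple framing: time-step index and payload length, then raw bytes.
        sock.sendall(struct.pack("!iq", step, len(field)))
        sock.sendall(field)

# In the simulation loop, staging replaces a blocking write:
#   if step % output_interval == 0:
#       stage_to_analysis(step, field_as_bytes)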


17. consequences
• Ideas
• Models
• Measurements
• Consequences
• Future


18. Map Intrepid I/O to Eureka
• Speed up the application (a rough estimate of the gain follows this list)
  – Offload data organization and disk writes
• Free co-analysis
  – Produce several high-resolution movies
  – Data compression
  – Multi-time-step caching for window analysis
• Eureka is an accelerator and co-analysis engine at only 1-2% of the cost of Intrepid
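Combining this slide with the 34% non-overlapping I/O fraction measured for the FLASH run earlier gives a rough upper bound on the speedup from offloading, in the spirit of Amdahl's law. This is an illustrative estimate, not a measurement from the deck.

# Upper-bound estimate of the application speedup from offloading I/O to Eureka,
# using the 34% non-overlapping I/O fraction measured for one FLASH run.

io_fraction = 0.34            # measured: 14325 s of I/O in a 41557 s run

# If all non-overlapping I/O time were hidden by offloading:
speedup = 1.0 / (1.0 - io_fraction)
print(f"ideal speedup: {speedup:.2f}x")        # ~1.5x

# If only half of that I/O time can actually be hidden:
speedup_half = 1.0 / (1.0 - 0.5 * io_fraction)
print(f"half hidden  : {speedup_half:.2f}x")   # ~1.2x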


19. future
• Ideas
• Models
• Measurements
• Consequences
• Future



20. Works in Progress
• Footprints
  – System-level use pattern data collection
  – Booting up a mini-consortium of resource monitoring enthusiasts
• in situ
  – Papka parallel software rendering
  – Tom Peterka and Rob Ross scaling software rendering algorithms
  – HW-SW rendering comparison experiments
• Co-analysis
  – StarGate experiments
  – Intrepid <> Eureka communication experiments
  – FLASH test
• Remote Visualization
  – Pixel shipping experiments and frameworks


21. Eureka and Surveyor Rendering Times
[Charts: time (secs) vs. number of procs (1 to 1000) for Eureka and Surveyor at 256x256x256 and 512x512x512 volumes, plus Eureka at 1024x1024x1024 and 2048x2048x2048; each panel plots Full Frame Time, Render Time, Composite Network Time, Composite Render Time, and Sync State Time.]

22. Wide Area Experiments
[Diagram: Simulation -> DATA -> Visualization -> RESULTS -> Interactive Display, with CONTROL flowing back from the display]
• Simulation: 4K uniform grid cube; single variable, float; 257 GB per time step; 577 time steps; 150 TB total
• Visualization: RAW data; volume rendering; 4K x 4K pixels
• Interactive Display: large tiled display; navigation; manipulation
• DETAILS AND DEMO IN SDSU BOOTH

