Hierarchical Pointer Analysis for Distributed Programs



SLIDE 1

Hierarchical Pointer Analysis for Distributed Programs

Amir Kamil and Katherine Yelick
U.C. Berkeley
August 23, 2007

Hierarchical Pointer Analysis Amir Kamil

SLIDE 2

Background

SLIDE 3

Hierarchical Machines

  • Parallel machines often have hierarchical structure

[Figure: machine hierarchy with four levels]
  • Level 1: thread local
  • Level 2: node local
  • Level 3: cluster local
  • Level 4: grid world

SLIDE 4

Partitioned Global Address Space

  • Partitioned global address space (PGAS) languages provide the illusion of shared memory across the machine
  • Wide pointers are used to represent global addresses
    • Contain identifying information plus the physical address
    • Example: Process ID: 1, Address: 0xf9a0cb48
  • Narrow pointers can still be used for addresses in the local physical address space
    • Example: Address: 0xf9a0cb48
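The difference between wide and narrow pointers can be sketched in a few lines. This is only an illustration: the class names `WidePointer` and `NarrowPointer` and the helper `narrow` are hypothetical and do not come from the talk or from any PGAS runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NarrowPointer:
    """Hypothetical narrow pointer: an address in the local physical address space."""
    address: int


@dataclass(frozen=True)
class WidePointer:
    """Hypothetical wide pointer: identifying information plus a physical address."""
    process_id: int
    address: int


def narrow(ptr: WidePointer, my_process_id: int) -> NarrowPointer:
    """Drop the process ID; only valid when the pointer targets this process."""
    if ptr.process_id != my_process_id:
        raise ValueError("pointer does not refer to the local address space")
    return NarrowPointer(ptr.address)


# The example values shown on the slide.
wide = WidePointer(process_id=1, address=0xF9A0CB48)
local = narrow(wide, my_process_id=1)
```

The wide form carries one extra field, which is the space overhead that a locality analysis tries to avoid.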

SLIDE 5

The Problems

SLIDE 6

Three Problems

  • What data is private to a thread?
  • What data is local to the physical address space?
  • What possible race conditions can occur?

SLIDE 7

Data Privacy

  • Data is private if it cannot leak beyond its source thread
  • Useful to know which data is private for global garbage collection, monitor optimization, and other applications

SLIDE 8

Data Locality

  • Recall: global pointers are composed of identifying information and an address
    • Example: Process ID: 1, Address: 0xf9a0cb48
  • When dereferenced, the runtime system must perform a check to determine if the data is actually in the local physical address space
    • If local, then access directly
    • If not local, then perform communication
  • Thus, global pointers are more costly in both space and time, even if the actual data is local
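The dereference check described above might look roughly like the following. The `Runtime` class, its per-process dictionaries, and the `messages_sent` counter are illustrative assumptions; a real runtime would issue network communication in the remote case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WidePointer:
    """Hypothetical global pointer: process ID plus address."""
    process_id: int
    address: int


class Runtime:
    """Toy runtime: one dictionary of memory per process (an assumption)."""

    def __init__(self, my_process_id, memories):
        self.my_process_id = my_process_id
        self.memories = memories
        self.messages_sent = 0  # counts simulated communication

    def deref(self, ptr):
        # The locality check is paid on every dereference of a global pointer.
        if ptr.process_id == self.my_process_id:
            return self.memories[ptr.process_id][ptr.address]  # direct access
        self.messages_sent += 1  # stands in for a remote fetch
        return self.memories[ptr.process_id][ptr.address]


memories = {0: {0x10: "local value"}, 1: {0xF9A0CB48: "remote value"}}
runtime = Runtime(my_process_id=0, memories=memories)
local_result = runtime.deref(WidePointer(0, 0x10))
remote_result = runtime.deref(WidePointer(1, 0xF9A0CB48))
```

If an analysis proves a pointer is always local, both the check and the extra field can be dropped.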

SLIDE 9

Race Detection

  • Shared memory introduces the possibility of race conditions
    • Two threads access the same memory location
    • The accesses can be simultaneous (no intermediate synchronization)
    • At least one access is a write
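The three conditions can be written as a single predicate. The `Access` record and the `may_be_simultaneous` flag are hypothetical; in practice the flag would come from a concurrency analysis, which is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """Hypothetical record of one memory access."""
    thread: int
    location: int
    is_write: bool


def is_race(a: Access, b: Access, may_be_simultaneous: bool) -> bool:
    """True when all conditions for a race condition hold."""
    return (
        a.thread != b.thread               # two threads
        and a.location == b.location       # same memory location
        and may_be_simultaneous            # no intermediate synchronization
        and (a.is_write or b.is_write)     # at least one write
    )


write_0 = Access(thread=0, location=0x10, is_write=True)
read_1 = Access(thread=1, location=0x10, is_write=False)
read_2 = Access(thread=2, location=0x10, is_write=False)
```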

SLIDE 10

The Solution

SLIDE 11

Hierarchical Pointer Analysis

  • A pointer analysis that takes into account the machine hierarchy can answer the preceding questions
  • For each variable, we want to know not only from which allocation sites the data could have originated, but also from which threads

SLIDE 12

Related Work

  • Thread-aware pointer analysis has been done by others
    • Rugina and Rinard, Zhu and Hendren, Hicks, and others
    • None of them did it for hierarchical, distributed machines
  • Data privacy and locality detection previously done by Liblit, Aiken, and Yelick
    • Uses constraint propagation
    • Does not distinguish allocation sites

SLIDE 13

The Implementation

SLIDE 14

Titanium

  • Titanium is a single program, multiple data (SPMD) dialect of Java
    • All threads execute the same program text
  • Designed for distributed machines
  • Global address space: all threads can access all memory
  • At runtime, threads are grouped into processes
    • A thread shares a physical address space with some other, but not all, threads

SLIDE 15

Titanium Memory Hierarchy

  • Global memory is composed of a hierarchy

[Figure: a program contains processes, which contain threads; locations are labeled tlocal, plocal, or global]

  • Locations can be thread-local (tlocal), process-local (plocal), or potentially in another process (global)
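One way to picture the three levels is a function that returns the tightest qualifier for a location, given which thread is looking at it. The names `Qualifier`, `classify`, and the `process_of` mapping are assumptions made for this sketch.

```python
from enum import IntEnum


class Qualifier(IntEnum):
    """The three levels used in the talk, ordered from nearest to farthest."""
    TLOCAL = 1
    PLOCAL = 2
    GLOBAL = 3


def classify(viewer_thread: int, owner_thread: int, process_of: dict) -> Qualifier:
    """Tightest qualifier for a location owned by owner_thread, seen from viewer_thread."""
    if viewer_thread == owner_thread:
        return Qualifier.TLOCAL
    if process_of[viewer_thread] == process_of[owner_thread]:
        return Qualifier.PLOCAL
    return Qualifier.GLOBAL


# Hypothetical layout: threads 0 and 1 share process 0, thread 2 is in process 1.
process_of = {0: 0, 1: 0, 2: 1}
```

Because the levels are nested, a location that is tlocal is also plocal and global; the function returns only the narrowest level that applies.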

SLIDE 16

The Analysis

SLIDE 17

Approach

  • We define a small SPMD language based on Titanium
  • We produce a type system that accounts for the memory hierarchy
    • The analysis can handle an arbitrary number of levels, but we use three levels in this talk
  • We give an overview of the pointer analysis inference rules

SLIDE 18

Language Syntax

  • Types

$$\tau ::= \mathrm{int} \mid \mathrm{ref}_q\ \tau$$

  • Qualifiers

$$q ::= \mathrm{tlocal} \mid \mathrm{plocal} \mid \mathrm{global} \qquad (\mathrm{tlocal} \sqsubset \mathrm{plocal} \sqsubset \mathrm{global})$$

  • Expressions

$$\begin{aligned}
e ::=\ & \mathrm{new}_l\ \tau && \text{(allocation)} \\
\mid\ & \mathrm{transmit}\ e_1\ \mathrm{from}\ e_2 && \text{(communication)} \\
\mid\ & e_1 \leftarrow e_2 && \text{(dereferencing assignment)} \\
\mid\ & \mathrm{convert}(e, n) && \text{(type conversion)}
\end{aligned}$$
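The grammar can be modeled as a small set of data types. This is a sketch only: the class names are hypothetical, and `Var` and `IntLit` are additions assumed from the slide examples (such as `transmit y from 1`), since the grammar as shown lists only the four expression forms.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Qualifier(Enum):
    TLOCAL = "tlocal"
    PLOCAL = "plocal"
    GLOBAL = "global"


@dataclass(frozen=True)
class IntType:
    pass


@dataclass(frozen=True)
class RefType:
    qualifier: Qualifier
    target: "Type"


Type = Union[IntType, RefType]


@dataclass(frozen=True)
class Var:  # assumption: not in the grammar shown on the slide
    name: str


@dataclass(frozen=True)
class IntLit:  # assumption: not in the grammar shown on the slide
    value: int


@dataclass(frozen=True)
class New:  # new_l tau (allocation)
    label: str
    type: Type


@dataclass(frozen=True)
class Transmit:  # transmit e1 from e2 (communication)
    expr: "Expr"
    source: "Expr"


@dataclass(frozen=True)
class Assign:  # e1 <- e2 (dereferencing assignment)
    target: "Expr"
    value: "Expr"


@dataclass(frozen=True)
class Convert:  # convert(e, n) (type conversion)
    expr: "Expr"
    level: Qualifier


Expr = Union[Var, IntLit, New, Transmit, Assign, Convert]

example = Transmit(Var("y"), IntLit(1))
allocation = New("l", IntType())
```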

SLIDE 19

Type Rules – Allocation

  • The expression new_l τ allocates space of type τ in local memory and returns a reference to the location
    • The label l is unique for each allocation site and will be used by the pointer analysis
    • The resulting reference is qualified with tlocal, since it references thread-local memory

$$\Gamma \vdash \mathrm{new}_l\ \tau : \mathrm{ref}_{\mathrm{tlocal}}\ \tau$$

[Figure: on thread 0, new_l int produces a tlocal reference]

SLIDE 20

Type Rules – Communication

  • The expression transmit e1 from e2 evaluates e1 on the thread given by e2 and retrieves the result
    • If e1 has reference type, the result type must be widened to global
    • Statically do not know source thread, so must assume it can be any thread

$$\frac{\Gamma \vdash e_1 : \tau \qquad \Gamma \vdash e_2 : \mathrm{int}}{\Gamma \vdash \mathrm{transmit}\ e_1\ \mathrm{from}\ e_2 : \mathrm{expand}(\tau, \mathrm{global})}$$

$$\mathrm{expand}(\mathrm{ref}_q\ \tau, q') \equiv \mathrm{ref}_{\sqcup(q, q')}\ \tau \qquad \mathrm{expand}(\tau, q') \equiv \tau \ \text{otherwise}$$

Here ⊔(q, q') is the wider of the two qualifiers.

[Figure: thread 0 executes transmit y from 1; y is tlocal on thread 1 and the result is global on thread 0]
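A minimal sketch of `expand`, assuming that the combined qualifier on a reference is the wider of the two (the combining operator did not survive in this transcript, so that reading is an assumption). The type classes are hypothetical stand-ins for the slide's types.

```python
from dataclasses import dataclass
from enum import IntEnum


class Qualifier(IntEnum):
    TLOCAL = 1
    PLOCAL = 2
    GLOBAL = 3


@dataclass(frozen=True)
class IntType:
    pass


@dataclass(frozen=True)
class RefType:
    qualifier: Qualifier
    target: object  # IntType or RefType


def expand(t, q: Qualifier):
    """Widen the top-level qualifier of a reference type; leave other types alone."""
    if isinstance(t, RefType):
        # Assumption: the result qualifier is the wider of the two.
        return RefType(max(t.qualifier, q), t.target)
    return t


# Typing "transmit y from 1" where y : ref_tlocal int.
y_type = RefType(Qualifier.TLOCAL, IntType())
transmitted = expand(y_type, Qualifier.GLOBAL)
```

Only the outermost qualifier changes, which matches the two-case definition on the slide.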

SLIDE 21

Type Rules – Dereferencing Assignment

  • The expression e1 ← e2 puts the value of e2 into the location referenced by e1 (like *e1 = e2 in C)
    • Some assignments are unsound

$$\frac{\Gamma \vdash e_1 : \mathrm{ref}_q\ \tau \qquad \Gamma \vdash e_2 : \tau \qquad \mathrm{robust}(\tau, q)}{\Gamma \vdash e_1 \leftarrow e_2 : \mathrm{ref}_q\ \tau}$$

$$\mathrm{robust}(\mathrm{ref}_q\ \tau, q') \equiv \mathrm{false}\ \text{if } q \sqsubset q' \qquad \mathrm{robust}(\tau, q') \equiv \mathrm{true}\ \text{otherwise}$$

[Figure: threads 0 and 1 with variables y and z and references labeled tlocal and plocal]
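The `robust` check can be sketched as follows. The comparison between the two qualifiers is lost in this transcript; the sketch assumes it is "the stored reference is strictly narrower than the pointer being written through", which fits the remark that some assignments are unsound. The type classes are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Qualifier(IntEnum):
    TLOCAL = 1
    PLOCAL = 2
    GLOBAL = 3


@dataclass(frozen=True)
class IntType:
    pass


@dataclass(frozen=True)
class RefType:
    qualifier: Qualifier
    target: object  # IntType or RefType


def robust(value_type, pointer_qualifier: Qualifier) -> bool:
    """Whether a value of value_type may be stored through a pointer with this qualifier."""
    if isinstance(value_type, RefType):
        # Assumption: storing a narrower reference through a wider pointer is rejected,
        # because the destination may live on a thread where the reference is invalid.
        return not (value_type.qualifier < pointer_qualifier)
    return True


tlocal_ref = RefType(Qualifier.TLOCAL, IntType())
plocal_ref = RefType(Qualifier.PLOCAL, IntType())
```

For example, writing a tlocal reference through a plocal pointer could place it in memory owned by another thread, where "thread-local" would no longer be true.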

SLIDE 22

Type Rules – Type Conversion

  • The expression convert(e, q) is an assertion that e refers to data that is no further than q
    • Titanium code often checks if data is plocal and then casts to it before operating on it for efficiency

$$\frac{\Gamma \vdash e : \mathrm{ref}_{q'}\ \tau}{\Gamma \vdash \mathrm{convert}(e, q) : \mathrm{ref}_q\ \tau}$$

[Figure: on thread 0, x is a global reference]

SLIDE 23

Pointer Analysis

  • Since language is SPMD, analysis is only done for a single thread
    • We use thread 0 in our examples
  • Each expression has a points-to set of abstract locations that it can reference
  • Abstract locations also have points-to sets

SLIDE 24

Abstract Locations

  • Abstract locations consist of label and qualifier
    • A-loc (l, q) can refer to any concrete location allocated at label l that is at most distance q from thread 0

[Figure: threads 0 and 1 each execute new_l int, producing tlocal references; the a-locs (l, tlocal) and (l, plocal) are shown]
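An abstract location can be represented as a (label, qualifier) pair, with a helper that says whether it may stand for a given concrete location. The names `Qualifier` and `may_represent` and the idea of passing the concrete location's distance from thread 0 are assumptions made for this sketch.

```python
from enum import IntEnum


class Qualifier(IntEnum):
    TLOCAL = 1
    PLOCAL = 2
    GLOBAL = 3


def may_represent(aloc, label: str, distance: Qualifier) -> bool:
    """True if a-loc (l, q) can refer to a concrete location allocated at `label`
    whose distance from thread 0 is `distance`."""
    aloc_label, aloc_qualifier = aloc
    return aloc_label == label and distance <= aloc_qualifier


# new_l int executed on thread 0 and on thread 1 (same process, by assumption).
on_thread_0 = ("l", Qualifier.TLOCAL)   # distance of thread 0's allocation
on_thread_1 = ("l", Qualifier.PLOCAL)   # distance of thread 1's allocation
```

With this reading, (l, plocal) covers both allocations, while (l, tlocal) covers only the one made by thread 0.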

SLIDE 25

Pointer Analysis – Allocation and Communication

  • The inference rules for allocation and communication are similar to the type rules
  • An allocation new_l τ produces a new abstract location (l, tlocal)
  • The result of the expression transmit e1 from e2 is the set of a-locs resulting from e1 but with global qualifiers

e1 → {(l1, tlocal), (l2, plocal), (l3, global)}
transmit e1 from e2 → {(l1, global), (l2, global), (l3, global)}
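The two rules translate directly into set operations on points-to sets. The function names `allocate` and `transmit` are hypothetical; the example values are the ones on the slide.

```python
from enum import IntEnum


class Qualifier(IntEnum):
    TLOCAL = 1
    PLOCAL = 2
    GLOBAL = 3


def allocate(label: str) -> set:
    """Points-to set of new_l tau: the single a-loc (l, tlocal)."""
    return {(label, Qualifier.TLOCAL)}


def transmit(points_to: set) -> set:
    """Points-to set of transmit e1 from e2: e1's a-locs with global qualifiers."""
    return {(label, Qualifier.GLOBAL) for (label, _) in points_to}


e1 = {
    ("l1", Qualifier.TLOCAL),
    ("l2", Qualifier.PLOCAL),
    ("l3", Qualifier.GLOBAL),
}
result = transmit(e1)
```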

SLIDE 26

Pointer Analysis – Dereferencing Assignment

  • For assignment, must take into account actions of other threads

[Figure: threads 0, 1, and 2 each with variables x and y pointing to a-locs for l1 and l2]

x ← y, with x → {(l1, tlocal)} and y → {(l2, plocal)}, adds the points-to edges:
  (l1, tlocal) → (l2, plocal)
  (l1, plocal) → (l2, plocal)
  (l1, global) → (l2, global)
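The slide's example is consistent with the following rule of thumb, offered here only as an assumed generalization and not as the rule from the talk: for every level q at or above the target's qualifier, the a-loc (l1, q) gains an edge to l2 at the wider of q and the source's qualifier. The function name `assignment_edges` is hypothetical.

```python
from enum import IntEnum


class Qualifier(IntEnum):
    TLOCAL = 1
    PLOCAL = 2
    GLOBAL = 3


def assignment_edges(x_points_to: set, y_points_to: set) -> set:
    """Points-to edges added by x <- y, including the same statement run by other threads.

    Assumption: the pattern below is inferred from the single example on the slide.
    """
    edges = set()
    for (l1, q1) in x_points_to:
        for (l2, q2) in y_points_to:
            for q in Qualifier:
                if q >= q1:
                    edges.add(((l1, q), (l2, max(q, q2))))
    return edges


x = {("l1", Qualifier.TLOCAL)}
y = {("l2", Qualifier.PLOCAL)}
edges = assignment_edges(x, y)
```

The extra edges at plocal and global account for threads other than thread 0 executing the same assignment on their own copies of x and y.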

SLIDE 27

Pointer Analysis – Type Conversion

  • In the type conversion convert(e, q), the program is illegal if e evaluates to a location further than q
  • Thus, the result of the expression convert(e, q) is the set of a-locs resulting from e with the qualifiers reduced to at most q

e → {(l1, tlocal), (l2, plocal), (l3, global)}
convert(e, plocal) → {(l1, tlocal), (l2, plocal), (l3, plocal)}
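Reducing qualifiers to at most q amounts to taking the narrower of each a-loc's qualifier and q. The function name `convert` mirrors the expression; the representation of a-locs as (label, qualifier) pairs is an assumption of this sketch.

```python
from enum import IntEnum


class Qualifier(IntEnum):
    TLOCAL = 1
    PLOCAL = 2
    GLOBAL = 3


def convert(points_to: set, q: Qualifier) -> set:
    """Points-to set of convert(e, q): e's a-locs with qualifiers capped at q."""
    return {(label, min(q0, q)) for (label, q0) in points_to}


e = {
    ("l1", Qualifier.TLOCAL),
    ("l2", Qualifier.PLOCAL),
    ("l3", Qualifier.GLOBAL),
}
converted = convert(e, Qualifier.PLOCAL)
```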

SLIDE 28

Evaluation

SLIDE 29

Benchmarks

  • Five application benchmarks used to evaluate the pointer analysis

Benchmark   Line Count   Description
amr         7581         Adaptive mesh refinement suite
gas         8841         Hyperbolic solver for a gas dynamics problem
ft          1192         NAS Fourier transform benchmark
cg          1595         NAS conjugate gradient benchmark
mg          1952         NAS multigrid benchmark

SLIDE 30

Running Time

  • Determine actual cost of introducing multiple levels into the pointer analysis
    • Tests run on 2.4GHz Pentium 4 with 512MB RAM
  • Three analysis variants compared

Name   Description
PA1    Single-level pointer analysis
PA2    Two-level pointer analysis (thread-local and global)
PA3    Three-level pointer analysis

SLIDE 31

Running Time Results

[Figure: bar chart "Pointer Analysis Running Time". x-axis: Benchmark (amr, gas, ft, cg, mg); y-axis: Analysis Time (seconds), 0 to 4; series: PA1, PA2, PA3; a "Good" arrow marks the better direction]

SLIDE 32

Data Privacy Detection

  • In pointer analysis, an allocation site is private if only thread-local references to it are used
    • Thus, only two levels, thread-local and global, needed in the pointer analysis
  • Two types of analysis compared

Name   Description
SQI    Constraint-based analysis by Liblit, Aiken, and Yelick; does not distinguish allocation sites
PA2    Two-level pointer analysis (thread-local and global)
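Given the points-to sets computed by a two-level analysis, the privacy test reduces to a set difference. This is a sketch under assumptions: the function name `private_sites` and the way points-to sets are passed in are hypothetical.

```python
from enum import IntEnum


class Qualifier(IntEnum):
    TLOCAL = 1
    GLOBAL = 2  # two levels are enough for privacy detection


def private_sites(all_sites: set, points_to_sets: list) -> set:
    """Allocation sites for which only thread-local a-locs appear in any points-to set."""
    leaked = {
        label
        for points_to in points_to_sets
        for (label, qualifier) in points_to
        if qualifier != Qualifier.TLOCAL
    }
    return all_sites - leaked


# Hypothetical analysis results: site l2 is reached through a global reference.
sites = {"l1", "l2"}
results = [
    {("l1", Qualifier.TLOCAL)},
    {("l1", Qualifier.TLOCAL), ("l2", Qualifier.GLOBAL)},
]
private = private_sites(sites, results)
```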

SLIDE 33

Data Privacy Detection Results

[Figure: bar chart "Data Privacy Detection". x-axis: Benchmark (amr, gas, ft, cg, mg); y-axis: Percent Determined to be Thread-Private, 0 to 100; series: SQI, PA2; a "Good" arrow marks the better direction]

SLIDE 34

Data Locality Detection

  • Goal: statically determine which pointers must be process-local
  • Three analyses compared

Name   Description
LQI    Constraint-based analysis by Liblit and Aiken; does not distinguish allocation sites
PA2    Two-level pointer analysis (thread-local and global)
PA3    Three-level pointer analysis
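With a three-level analysis, a pointer can be treated as process-local when no a-loc in its points-to set is global. The function name `must_be_process_local` is hypothetical, and the treatment of an empty points-to set as local is a choice made for this sketch.

```python
from enum import IntEnum


class Qualifier(IntEnum):
    TLOCAL = 1
    PLOCAL = 2
    GLOBAL = 3


def must_be_process_local(points_to: set) -> bool:
    """True if every a-loc the pointer may reference is at most plocal."""
    return all(qualifier <= Qualifier.PLOCAL for (_, qualifier) in points_to)


local_pointer = {("l1", Qualifier.TLOCAL), ("l2", Qualifier.PLOCAL)}
remote_pointer = {("l1", Qualifier.TLOCAL), ("l3", Qualifier.GLOBAL)}
```

A two-level analysis has no plocal level, so under this sketch it could only prove process-locality for pointers whose targets are all thread-local.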

SLIDE 35

Data Locality Detection Results

[Figure: bar chart "Data Locality Detection". x-axis: Benchmark (amr, gas, ft, cg, mg); y-axis: Percent Determined to be Process-Local, 0 to 100; series: LQI, PA2, PA3; a "Good" arrow marks the better direction]

SLIDE 36

Race Detection

  • Pointer analysis used with an existing concurrency analysis to detect potential races at compile-time
  • Three analyses compared

Name         Description
concur       Concurrency analysis plus constraint-based data sharing analysis and type-based alias analysis
concur+PA1   Concurrency analysis plus single-level pointer analysis
concur+PA3   Concurrency analysis plus three-level pointer analysis

SLIDE 37

Race Detection Results

[Figure: bar chart "Static Races Detected". x-axis: Benchmark (amr, gas, ft, cg, mg); y-axis: Races (Logarithmic Scale), 10 to 100000; series: concur, concur+PA1, concur+PA3; a "Good" arrow marks the better direction]

Bar labels in extraction order (pairing with benchmark and series is not preserved): 11493, 3065, 4082, 2029, 793, 1514, 951, 446, 286, 517, 262, 207, 198, 67, 66

SLIDE 38

Conclusion

SLIDE 39

Conclusion

  • We developed a pointer analysis for hierarchical, distributed machines
  • The cost of introducing the memory hierarchy into the analysis is small
  • On the other hand, the payoff is large

SLIDE 40

Future Work

  • Scientific programs tend to use a lot of array-based data structures
    • Need array index analysis to properly analyze them
  • Implement a dynamic race detector
    • Use static results to minimize the program locations that need to be tracked

SLIDE 41

Questions