User also needs crypto to be secure.

Critical questions:

Which cryptographic systems fit the user's cost constraints?
⇒ optimize choice of cryptosystem + algorithm for each user device.

Which cryptographic systems can be broken by attackers?
⇒ optimize choice of attack algorithm + device for each cryptosystem.

Heavy interactions between high-level algorithms and low-level computer architecture.

Some examples of crypto failing:
• 2009 exploit of RSA-512 signatures in TI calculators (small public computation);
• 2010 exploit of ECDSA signatures in PlayStation 3 (trivial: a stupid Sony mistake);
• 2012 exploit of MD5-based signatures by Flame malware (somewhat larger computation).

Presumably many more examples are not known to the public.
Theory vs. experiment

Predictions made by theoretical physicists are often disputed, sometimes wrong.

Common sources of error:
• underlying models of physics;
• calculations from those models.

Experiments aren't perfect but
• catch many errors;
• resolve many disputes;
• provide raw data leading to new theories;
• build more confidence than theory alone can ever produce.
Is physics uniquely error-prone? Of course not. Every field of science:
• theoreticians make predictions regarding observable phenomena;
• experimental scientists measure those phenomena;
• we compare the results.

What if measurements are too expensive to carry out? Measurements start with scaled-down experiments and work up towards the scale of interest.
Algorithm analysis is another error-prone field of science. Theoreticians make predictions regarding algorithm performance. These predictions are often disputed, sometimes wrong.

Particularly error-prone: cryptanalytic extrapolations from an academic computation to a serious real-world attack.

We catch errors and resolve disputes by carrying out experiments: actually running these algorithms on the largest scale we can.
1980s security evaluation: the "QS" factorization algorithm costs 2^100 to break RSA-1024.

1990 Pollard: new "NFS".
1991 Adleman: NFS won't beat QS for RSA-1024.

Subsequent experiments ⇒ NFS is much faster; maybe 2^80?

Actual security of RSA-1024 is still a matter of dispute: e.g., 2009 Bos–Kaihara–Kleinjung–Lenstra–Montgomery oppose NIST's transition to RSA-2048.
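For context on where 2^100 and 2^80 come from: they match the standard L-notation heuristics for QS and NFS. A minimal sketch, assuming the usual cost formulas and dropping the o(1) terms, so the outputs are rough scalings rather than precise attack costs:

```python
# Heuristic costs of QS and NFS against RSA-1024 via L-notation,
# L_N[alpha, c] = exp((c + o(1)) (ln N)^alpha (ln ln N)^(1 - alpha));
# the o(1) is dropped here, so treat results as rough scalings.
from math import log

def log2_L(modulus_bits, alpha, c):
    lnN = modulus_bits * log(2)
    return c * lnN**alpha * log(lnN)**(1 - alpha) / log(2)

print("QS,  RSA-1024: ~2^%.0f" % log2_L(1024, 1/2, 1.0))            # ~2^98
print("NFS, RSA-1024: ~2^%.0f" % log2_L(1024, 1/3, (64/9)**(1/3)))  # ~2^87
```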
The attacker's supercomputer

Enough theory+experiment should reach consensus on the amount of computation required to break a system. But can the attacker perform this amount of computation?

Hypothesize attacker resources. This talk: $2 billion, 65MW. Alternative: millions of compromised Internet computers.

The interesting part: analyze optimal use of those resources.
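To give the hypothesized budget some scale, a back-of-envelope sketch; the per-chip cost, power, and throughput below are illustrative assumptions of mine, not figures from the talk:

```python
# Rough scale of a $2 billion, 65 MW attacker (the slide's hypothesis).
from math import log2

budget_usd = 2e9      # hypothesized budget (from the slide)
power_w    = 65e6     # hypothesized power  (from the slide)
chip_cost  = 2000.0   # assumed $/chip, boards and cooling amortized in
chip_power = 50.0     # assumed W/chip
ops_per_s  = 1e12     # assumed useful ops/second per chip

chips = min(budget_usd / chip_cost, power_w / chip_power)
print("chips: %.1e" % chips)                                   # ~1e6, money-limited
print("ops/year: ~2^%.0f" % log2(chips * ops_per_s * 3.15e7))  # ~2^85
```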
Communication vs. arithmetic

Bill Dally, 2013.06.17: "Communication takes more energy than arithmetic."

Stephen S. Pawlowski, 2013.06.18: "The majority of energy that we spend today is on transferring data."

Depends what you're doing! Computations fundamentally vary in amount of communication (distance and volume) and amount of arithmetic.
Some algorithms using n^2 data:

Square matrix-vector product: n^2 arithmetic.

FFT for input size n^2: n^2 lg n arithmetic.

Matrix-matrix product: typically n^3 arithmetic, without Strassen etc.

Integrals in quantum chemistry, many common iterations, graph algorithms, etc.: n^4 arithmetic, sometimes more.
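The same spread, tabulated as arithmetic per word of data (leading terms from the list above; a sketch, ignoring constant factors):

```python
# Arithmetic operations per word of data, for algorithms touching n^2 words.
from math import log2

n = 1024
data_words = n * n
for name, ops in [
    ("matrix-vector product ", n**2),            # ~1 op per word
    ("FFT on n^2 points     ", n**2 * log2(n)),  # ~lg n ops per word
    ("matrix-matrix product ", n**3),            # ~n ops per word (no Strassen)
    ("quantum-chem integrals", n**4),            # ~n^2 ops per word
]:
    print("%s %10.1f ops/word" % (name, ops / data_words))
```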
Chip area n^(2+ε) is enough to store all data for a size-n^2 FFT.

Chip area n^(2+ε) is also enough for n^2 parallel ALUs.

FFT takes time n^ε, thanks to parallelism? No! Routing the FFT data occupies area n^(2+ε) for time n^(1+ε).

1981 Brent–Kung: need n^(1+ε) even without wire delays.
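One way to see why parallelism can't push the time down to n^ε: an area-time argument. A sketch, assuming the commonly cited bisection-style bound A·T^2 = Ω(N^2) for an N-point FFT and dropping log factors; this conveys the flavor of the argument, not the exact Brent–Kung statement:

```latex
% N = n^2 FFT points on a chip of area A = n^{2+\epsilon}:
\[
  A\,T^{2} = \Omega(N^{2}) = \Omega(n^{4})
  \quad\Longrightarrow\quad
  T = \Omega\!\left(\sqrt{\,n^{4}/n^{2+\epsilon}\,}\right)
    = \Omega\!\left(n^{\,1-\epsilon/2}\right),
\]
% so time stays near n^1 no matter how many of the n^2 ALUs run in parallel.
```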
Chip area n^(2+ε) is enough to store several n×n matrices.

Routing the matrix product occupies area n^(2+ε) for time n^(1+ε). Typical n^3 arithmetic also occupies n^2 ALUs for time n^(1+ε).

Closer look at ε: the ALU cost dominates, although not by much.
>90% of the cost of typical supercomputers is spent on communication; <10% on ALUs.

Is Bluffdale built this way? No; NSA is not stupid.

Doubling the number of ALUs would cost <10% extra. Would ≈ double performance of matrix-matrix product and heavier-arith computations. NSA's computations have a mix of heavy arith and heavy comm.
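The economics of that last step, as a one-line calculation (the 90/10 split is the slide's figure; the rest is arithmetic):

```python
# Why "NSA is not stupid": perf/dollar of doubling ALUs on a
# communication-dominated machine.
comm_share, alu_share = 0.90, 0.10     # slide's cost split
new_cost = comm_share + 2 * alu_share  # doubling ALUs -> 1.10x total cost
speedup  = 2.0                         # ~2x on arith-heavy workloads
print("perf/dollar on arith-heavy jobs: %.2fx" % (speedup / new_cost))  # 1.82x
```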
GPUs have many ALUs but relatively little communication capacity: a few long wires to RAM.

Is Bluffdale built this way? No; NSA is not stupid.

Adding communication between adjacent ALUs would cost very little. Would drastically speed up matrix-matrix product and heavier-comm computations: FFT, sorting, etc.
Documentation tells me that Intel Xeon Phi has many ALUs and a few long wires to RAM, plus adjacent one-dimensional communication (ring bus).

Is Bluffdale built this way? No; NSA is not stupid.

Adding a two-dimensional grid would drastically speed up heavy-comm computations; e.g., 1977 Thompson–Kung.

Grid examples: MasPar; FPGAs. But FPGAs have other problems.
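The kind of algorithm that adjacent-ALU links enable (1977 Thompson–Kung is sorting on a 2D mesh): every step is a compare-exchange between neighbors, with no long wires. A minimal sketch on a linear array; the 2D-mesh versions get the step count down to O(√P) for P cells:

```python
# Odd-even transposition sort: P synchronous steps, each using only
# compare-exchanges between adjacent cells (cheap local communication).
def oddeven_sort(a):
    a = list(a)
    for step in range(len(a)):
        for i in range(step % 2, len(a) - 1, 2):  # these pairs act in parallel
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(oddeven_sort([5, 2, 9, 1, 7, 3, 8, 4]))  # [1, 2, 3, 4, 5, 7, 8, 9]
```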
Save even more time with a 3D arrangement of ALUs? e.g., 1983 Rosenberg. Huge engineering challenge.

2D allows easy scaling of energy input, heat output, up to very large chip area. 3D is hard to scale.

Some limited progress (most interesting: optics), presumably used by NSA. Progress often exaggerated: e.g., 4×16384×16384 is often called "3D".
Special vs. general purpose

Typical cryptanalytic arith: between 100× and 1000× better performance per transistor from ASICs than from mass-market CPUs, GPUs. Some exceptions, but overall ASICs bring massive speedup.

Only in cryptanalysis? No. Estimated ASIC improvement from a preliminary scan of other supercomputing arith problems: usually >10×, often >100×.
Frequent explanation: chips spend much of their area on decoding+scheduling instructions.
⇒ CPU/GPU designers reduce insn-handling cost by adding vector units: apply the same instruction to multiple data items.
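A toy illustration of that amortization (numpy standing in for vector hardware; my example, not the slides'):

```python
# One vector "instruction" applied to many data items: the decode/schedule
# overhead is paid once per vector op instead of once per scalar add.
import numpy as np

a = np.arange(8, dtype=np.int32)
b = np.full(8, 100, dtype=np.int32)
print(a + b)   # one vectorized add covering 8 elements
```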