Parallel Programming and Heterogeneous Computing: Feedback Assignment 2


  1. Parallel Programming and Heterogeneous Computing: Feedback Assignment 2
     Max Plauth, Sven Köhler, Felix Eberhardt, Lukas Wenzel, and Andreas Polze
     Operating Systems and Middleware Group

  2. Assignment 1: Covered Topics
     ■ General Concepts:
        □ Foster’s Method
        □ Amdahl’s Law
     ■ Shared Memory Parallelism with OpenMP:
        □ Task 2.1: Heat Map
        □ Task 2.2: IO-bound problem and reentrancy of legacy functions
        □ Task 2.3: Task-parallel workloads
        □ Task 2.4: Java Monitors
     ■ Hardware Effects:
        □ Efficient use of caching
     ParProg 2019, Feedback Assignment 2, Sven Köhler, Chart 2

  3. Parsum 1 ./heatmap

  4. Good Idea or Bad Idea?

         #ifdef WITH_OMP
         #pragma omp parallel for default(none) shared(heatmaps)
         #endif
         for (auto row = 1; row < height - 1; ++row) {
             for (auto col = 1; col < width - 1; ++col) {
                 /* ... */
             }
         }

     Neutral: No need to mask OpenMP pragmas (unless you call OpenMP functions). Just don't include -fopenmp in CFLAGS.
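
The remark on this slide can be tried out with a small sketch. A compiler invoked without -fopenmp ignores `#pragma omp` lines, so only calls into the OpenMP runtime need a guard, for which the standard `_OPENMP` macro is enough. The names `available_threads` and `relax_step` and the averaging stencil are illustrative assumptions, not the assignment's actual update rule.

```cpp
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>   // only needed for runtime functions such as omp_get_max_threads()
#endif

// Hypothetical helper: threads the runtime may use, or 1 when built without OpenMP.
int available_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Illustrative stencil step (an assumption, not the assignment's rule):
// every inner cell becomes the average of itself and its four direct neighbours.
std::vector<double> relax_step(const std::vector<double>& in, int width, int height) {
    std::vector<double> out(in);
    // No #ifdef around the pragma: without -fopenmp it is simply ignored.
    #pragma omp parallel for
    for (int row = 1; row < height - 1; ++row) {
        for (int col = 1; col < width - 1; ++col) {
            const std::size_t i = static_cast<std::size_t>(row) * width + col;
            out[i] = (in[i] + in[i - 1] + in[i + 1] + in[i - width] + in[i + width]) / 5.0;
        }
    }
    return out;
}
```

Building the same file once with and once without -fopenmp gives identical results; only the runtime differs.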

  5. Heat Map: And the winner was (A1) …
     ./heatmap 1000 1000 1000 random.csv (4 runs)
     [Bar chart, runtime in seconds per submission. Bar labels as shown: 3300**; 30,54; 28,27; 12,18; 12,41; 0,00*. Submissions: submission16003, submission16005, submission16002, submission15983, submission16006, submission16022.]

  6. Heat Map: And the winner is …
     ./heatmap 1000 1000 1000 random.csv
     [Bar chart, runtime in seconds per submission. Bar labels as shown: 12,259; 4,245; 2,432; 0,877; 0,484. Submissions: submission16245, submission16417, submission16405, submission16429, submission16427.]

  7. decrypt 2 ./decrypt

         user266;Osten3
         user906;Bahnhof

  8. Your Verification Data
     Dictionary: The 42 most common terms from Unit A
     Passwords:

         barbera:Gozsjkgq.2N62     SubstitionItIs
         gene:SqJwiPjc8z9OQ        speedup0
         grace:L3xIP64G5RVk6       NotSoLowLevelNoMore
         ian:MyIR7zQEkP3Mg         partitioning
         sheelagh:7CYgbT6A0xsM6    FishEyes
         richard:oGlayhJ1bTXuE     ArrogantHippy
         margarete:qRbG.QWxv9c.6   GuideMeToTheMoon
         elon:UFy0LW2XSNPVo        FlyVeryHigh
         satoshi:Hqw9N3HL38lAw     BurnAllYourPower

  9. Good Idea or Bad Idea?

         #pragma omp parallel for shared(result1, result2)
         for (int i = 0; i < tasks.size(); i++) {
             /* ... */
             if (result1.found && result2.found) continue;
             for (int j = 0; j < dictPasswords.size(); j++) {
                 auto password = dictPasswords[j];
                 struct crypt_data data;
                 data.initialized = 0;
                 /* ... */
             }
         }

     Bad: Only two results, and no synchronization on the shared result variables.
     Bad: Wide jumps through the dictionary data (dict >> tasks). Swap the loops for better locality.
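
One way to read the two remarks on this slide is sketched below: the parallel loop runs over the large dictionary, the small task list is scanned in the inner loop, and every task gets its own result slot that is written under a named critical section. `toy_hash`, `Task`, and `crack_all` are hypothetical names, and `toy_hash` is an unsalted stand-in for `crypt_r`. With real salted hashes, the `crypt_r` call would have to move into the inner loop.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, unsalted stand-in for the expensive crypt_r() call.
std::size_t toy_hash(const std::string& word) {
    return std::hash<std::string>{}(word);
}

struct Task {
    std::string user;
    std::size_t hash;       // value to match
    std::string password;   // filled in once cracked, empty otherwise
};

void crack_all(std::vector<Task>& tasks, const std::vector<std::string>& dict) {
    // Outer loop over the dictionary: with a static schedule each thread
    // walks one contiguous block of words instead of jumping through all of them.
    #pragma omp parallel for schedule(static)
    for (long j = 0; j < static_cast<long>(dict.size()); ++j) {
        const std::size_t h = toy_hash(dict[j]);
        for (std::size_t i = 0; i < tasks.size(); ++i) {
            if (tasks[i].hash == h) {
                // Matches are rare, so serializing the write costs little.
                #pragma omp critical(store_result)
                tasks[i].password = dict[j];
            }
        }
    }
}
```

Because each task owns one slot, the number of results is not limited to two, and the only shared write is protected.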

  10. Good Idea or Bad Idea?

          struct crypt_data data;
          /* ... */
          data.initialized = 0;
          {
              if (strcmp(crypt_r((password + "0").c_str(), salt, &data), hash) == 0) {
                  /* ... */
                  break;
              }
              if (strcmp(crypt_r((password + "1").c_str(), salt, &data), hash) == 0) {
                  /* ... */
                  break;
              }
              /* ... */
          }

      Bad: Loop unrolling only helps with tight loops.
      Bad: Potential overhead for string buffer allocation and free.
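
A rolled-up variant that addresses both remarks might look like the following sketch. It builds the candidate string once and overwrites only its last character, so no temporary string is allocated per attempt. `find_digit_suffix` is a hypothetical name, `matches` stands in for the `strcmp(crypt_r(...), hash) == 0` test, and the suffix range 0 to 9 is an assumption based on the two cases visible on the slide.

```cpp
#include <functional>
#include <string>

// Returns the matching digit (0-9), or -1 if no suffix matches.
// `matches` is a hypothetical predicate standing in for the crypt_r() comparison.
int find_digit_suffix(const std::string& password,
                      const std::function<bool(const std::string&)>& matches) {
    std::string candidate = password;
    candidate.push_back('0');        // buffer built once, reused for all candidates
    for (char digit = '0'; digit <= '9'; ++digit) {
        candidate.back() = digit;    // overwrite the last character in place
        if (matches(candidate)) {
            return digit - '0';
        }
    }
    return -1;
}
```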

  11. Good Idea or Bad Idea?

          #pragma omp parallel shared(db,dict)
          {
              #pragma omp master
              {
                  uint64_t last = 0;
                  for (uint64_t i = 0; i < db_size; i++) { /* iterate through entire db character by character */
                      if (db[i] == '\n') { /* if we are at a newline */
                          db[i] = '\0'; /* 0-terminate user entry */
                          #pragma omp task
                          crack_user(db+last);
                          last = i+1;
                      }
                  }
              }
              #pragma omp taskwait
          }

      Good: Start tasks while parsing input.
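
The pattern on this slide can be reduced to a self-contained sketch. It uses `single` instead of `master` and relies on the implicit barrier at the end of the `single` construct to wait for all tasks. This is one possible arrangement, not necessarily the submission's. `crack_user` only counts entries and characters here, and the two global counters are hypothetical placeholders for the real dictionary attack.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical result counters, updated atomically by the tasks.
std::size_t g_entries = 0;
std::size_t g_total_chars = 0;

// Placeholder for the per-user work; a real crack_user() would try the dictionary.
void crack_user(const char* entry) {
    const std::size_t len = std::strlen(entry);
    #pragma omp atomic
    g_total_chars += len;
    #pragma omp atomic
    ++g_entries;
}

// Splits `db` in place at newlines and spawns one task per completed entry.
void parse_and_dispatch(std::string& db) {
    #pragma omp parallel shared(db)
    {
        #pragma omp single
        {
            std::size_t last = 0;
            for (std::size_t i = 0; i < db.size(); ++i) {
                if (db[i] == '\n') {
                    db[i] = '\0';                        // terminate the entry before handing it out
                    const char* entry = db.data() + last;
                    #pragma omp task firstprivate(entry)
                    crack_user(entry);                   // other threads pick this up while parsing continues
                    last = i + 1;
                }
            }
        }   // implicit barrier: all tasks created above have finished here
    }
}
```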

  12. Good Idea or Bad Idea?

          while(dictFile >> word) {
              if (common_8_prefix(word, previousWord)) {
                  continue;
              }
              previousWord = word;
              words.emplace_back(word);
          }

      Good: Smart reduction of problem size. crypt(3) only operates on the first 8 chars of the input.
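
The slide does not show `common_8_prefix`. A possible definition, given here as an assumption, compares at most the first eight characters of both words:

```cpp
#include <string>

// Possible definition of the slide's common_8_prefix() (an assumption):
// true if both words agree in their first eight characters. Words shorter
// than eight characters only match a word of identical content.
bool common_8_prefix(const std::string& a, const std::string& b) {
    return a.compare(0, 8, b, 0, 8) == 0;
}
```

Since the loop on the slide only compares each word with its predecessor, this filter removes all redundant entries only if words sharing a prefix are adjacent in the file, for example in a sorted dictionary.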

  13. decrypt: And the winner is …
      ./decrypt taskCryptPw.txt taskCryptDict.txt
      [Bar chart, runtime in seconds per submission. Bar labels as shown: 3210,346; 1570,16; 662,597; 305,585; 274,29; 62,151; 51,305. Submissions: submission16401, submission16408, submission16413, submission16388, submission16428, submission16409, submission16404.]

  14. Hash Ordered Index 3 ./hoi

  15. How to MD5?
      • Provide own implementation
      • Use OpenSSL
      • Use Glibc

  16. Good Idea or Bad Idea?

          #pragma omp parallel shared(hashes) firstprivate(seed)
          {
              const uint_fast32_t thread = omp_get_thread_num();
              const uint_fast32_t threads = omp_get_num_threads();
              const uint_fast32_t from = thread*(blocks/threads);
              const uint_fast32_t to = (thread != threads-1) ? (thread+1)*(blocks/threads) : blocks;
              add(seed, from);
              for (unsigned int i = from; i < to; i++) {
                  __uint128_t v = md5(seed);
                  #pragma omp critical
                  hashes.push_back(v);
                  inc(seed);
              }
          }

      Neutral: Manual scheduling goes against the paradigm. Use a scheduling clause, if really needed.
      Bad: Use std::vector::reserve and index operations to get rid of the need for synchronization.
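
Both remarks on this slide can be combined in a short sketch. The vector is sized up front, every iteration writes only its own slot, and the work distribution is left to a `schedule` clause. Two assumptions apply: `toy_digest` is a hypothetical stand-in for `md5(seed)`, and the sketch calls the sizing constructor rather than `reserve`, because indexing is only valid for elements that already exist.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for md5(seed + i); any pure function of the index works.
std::uint64_t toy_digest(std::uint64_t seed, std::uint64_t i) {
    std::uint64_t x = seed + i;
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    return x;
}

std::vector<std::uint64_t> hash_blocks(std::uint64_t seed, std::size_t blocks) {
    std::vector<std::uint64_t> hashes(blocks);   // all slots exist before the loop starts
    // The schedule clause replaces the manual from/to computation.
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < static_cast<long long>(blocks); ++i) {
        // Each iteration writes a distinct element, so no critical section is needed.
        hashes[i] = toy_digest(seed, static_cast<std::uint64_t>(i));
    }
    return hashes;
}
```

Deriving the input from `seed + i` instead of incrementing a running seed keeps the iterations independent, which is what allows the loop to be parallelized without a per-thread `add(seed, from)` step.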

  17. Good Idea or Bad Idea?

          std::sort(hashes.begin(),hashes.end(),less);

      Bad: Serial by default. Better use task-parallelism with OpenMP. Since C++17: std::execution::parallel_policy.

          int max_query = *std::max_element(queries.begin(), queries.end());
          if (max_query > 0.8f * n)
              std::sort(hashes.begin(), hashes.end());
          else
              std::partial_sort(hashes.begin(), hashes.begin() + max_query + 1, hashes.end());
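
The second snippet on this slide can be wrapped into a self-contained function, sketched below. The 0.8 threshold is taken from the slide. The function name `sort_for_queries`, the element type, and the reading of `queries` as zero-based ranks are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sorts only as far as the largest queried rank requires.
// `queries` is assumed to hold zero-based ranks into the sorted order.
void sort_for_queries(std::vector<std::uint64_t>& hashes,
                      const std::vector<std::size_t>& queries) {
    if (hashes.empty() || queries.empty()) return;
    const std::size_t n = hashes.size();
    const std::size_t max_query = *std::max_element(queries.begin(), queries.end());
    if (max_query > 0.8 * n) {
        // Nearly everything is needed anyway: a full sort is cheaper.
        std::sort(hashes.begin(), hashes.end());
    } else {
        // Only the first max_query + 1 elements end up in sorted order.
        std::partial_sort(hashes.begin(), hashes.begin() + (max_query + 1), hashes.end());
    }
}
```

After the call, every rank up to the largest query can be read directly from the front of `hashes`. The order of the remaining elements is unspecified when the partial sort was taken.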

  18. Good Idea or Bad Idea?

          void qsort(unsigned char data[][MD5_DIGEST_LENGTH], unsigned int left, unsigned int right) {
              if (left < right) {
                  auto pivot = qpartition(data, left, right);
                  #pragma omp parallel sections
                  {
                      #pragma omp section
                      if (pivot > 0) {
                          qsort(data, left, pivot - 1);
                      }
                      #pragma omp section
                      if (pivot < right - 1) {
                          qsort(data, pivot + 1, right);
                      }
                  }
              }
          }

      Neutral: Good, but can be done faster with #pragma omp task (better task distribution, potentially higher data locality).
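
A task-based variant along the lines of this remark is sketched below. It sorts plain 64-bit values instead of the slide's MD5 digest arrays, uses a three-way partition built from `std::partition`, and falls back to `std::sort` below a cutoff of 1024 elements. All of these choices, and the names `quicksort_tasks` and `parallel_quicksort`, are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sorts the half-open range data[left, right) using OpenMP tasks.
void quicksort_tasks(std::uint64_t* data, std::ptrdiff_t left, std::ptrdiff_t right) {
    if (right - left < 2) return;
    if (right - left < 1024) {               // cutoff (assumed value): small ranges run serially
        std::sort(data + left, data + right);
        return;
    }
    const std::uint64_t pivot = data[left + (right - left) / 2];
    // Three-way split: [left, mid1) < pivot, [mid1, mid2) == pivot, [mid2, right) > pivot.
    std::uint64_t* mid1 = std::partition(data + left, data + right,
                                         [pivot](std::uint64_t v) { return v < pivot; });
    std::uint64_t* mid2 = std::partition(mid1, data + right,
                                         [pivot](std::uint64_t v) { return v == pivot; });
    #pragma omp task
    quicksort_tasks(data, left, mid1 - data);
    #pragma omp task
    quicksort_tasks(data, mid2 - data, right);
    #pragma omp taskwait                     // both halves are sorted when this call returns
}

void parallel_quicksort(std::vector<std::uint64_t>& v) {
    // One parallel region for the whole sort; a single thread seeds the task tree.
    #pragma omp parallel
    {
        #pragma omp single
        quicksort_tasks(v.data(), 0, static_cast<std::ptrdiff_t>(v.size()));
    }
}
```

Unlike nested `parallel sections`, this opens the thread team once and lets idle threads pick up whichever sub-range is ready.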

  19. Hint: Use Pipelining
      Use pipelining to reduce allocated memory size and reduce possible paging.

  20. HOI: And the winner is …
      ./hoi deadc0deba5e 268435456 0 32768 268435453
      [Bar chart, runtime in seconds per submission. Bar labels as shown: 459,178; 381,494; 69,869; 62,887; 17,566; 18,620. Submissions: submission16430, submission16421, submission16398, submission16419, submission16424, submission16410.]

  21. ^D end
