Welcome!
CS 744: SNOWFLAKE
Shivaram Venkataraman
Fall 2020
ADMINISTRIVIA
- Assignment 1 grades out! (Canvas; contact Saurabh)
- Assignment 2 by mid-week (will be posted)
- Midterm this week! Thursday.
  - The exam will be on Canvas; BB Collaborate for clarifications
  - Prepare: reading notes and lecture notes; compare various systems; try to solve previous years' exams for practice
  - Expect open-ended pros/cons questions
- Project Proposal Peer review
  - Each of you is assigned one other proposal to review
  - You can print it out or type on it; upload the PDF to Canvas
AEFIS FEEDBACK
Responses! 4 1 78
- How has your experience been reading papers?
- Are the lectures useful for learning?
- How are the discussion groups? Did you get to know students in the class? Would it help to have the same group each time?
- Anything else we could improve for the second half?
Applications: Machine Learning, SQL
- Scope: analytics (SQL, relational API, language integration)
- Snowflake: elastic execution of the operators, suited for the cloud (the alternative: run data engines like Spark on the cloud)
CLOUD COMPUTING STACK
- Machine Learning, SQL: Scope, AWS SageMaker, etc.
- Computational Engines: Elastic MapReduce (EMR), Spark
- VMs: EC2
- Scalable Storage Systems: Amazon S3; Google storage system
- Datacenter / enterprise
SNOWFLAKE: GOALS
- Software-as-a-Service: no need to download anything; browser based; on and off as required
- Elastic: use as many as necessary, when needed
- Highly Available: fault tolerance
- Semi-Structured Data: data that might not fit a relational schema; databases use a strict schema vs. JSON, XML
SNOWFLAKE DESIGN
- SaaS: multiple tenants / users
- Elastic, HA, semi-structured
- Elasticity: each user can have a different VW (virtual warehouse)
STORAGE VS COMPUTE

Shared Nothing
- Each VM couples CPU and disk: locality!
- Need to scale compute and disks together; utilization could be low
- Failures affect both compute and data

Multi Cluster, Shared Data
- Separate out storage and compute
- Share this data across jobs and users
- Manage / scale compute and disks independently
- Data is read over the network
STORAGE: HYBRID COLUMNAR

Table
Name    Age
Alice   32
Bob     22
Eve     24
Victor  27

Row-oriented         Hybrid Columnar
Alice,32,Bob,22      Alice,Bob,32,22
Eve,24,Victor,27     Eve,Victor,24,27

- Files are immutable; no indices
- A lot of queries touch only a few columns
- Range GET: avoid reading the entire file!
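The difference between the two layouts can be sketched in Python using the slide's four-row Name/Age table. This is only an illustration: the function names and the `rows_per_file` parameter are assumptions made for the example, not Snowflake's actual file format.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int]

# The example table from the slide.
ROWS: List[Row] = [("Alice", 32), ("Bob", 22), ("Eve", 24), ("Victor", 27)]


def row_oriented(rows: List[Row], rows_per_file: int) -> List[List[object]]:
    """Each file stores whole rows back to back, e.g. Alice,32,Bob,22."""
    files = []
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        files.append([value for row in chunk for value in row])
    return files


def hybrid_columnar(rows: List[Row], rows_per_file: int) -> List[Dict[str, list]]:
    """Each file holds a group of rows, laid out column by column,
    e.g. Alice,Bob,32,22."""
    files = []
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        files.append({
            "name": [name for name, _ in chunk],
            "age": [age for _, age in chunk],
        })
    return files


def read_column(files: List[Dict[str, list]], column: str) -> list:
    """Read one column, touching only that column's values in each file."""
    return [value for f in files for value in f[column]]
```

With the hybrid layout, a query over `age` reads one contiguous run of values per file, which is what makes a ranged read of part of the file sufficient.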
VIRTUAL WAREHOUSES
- Elasticity, Isolation
  - EC2 virtual machines launched for a particular user when a query is running
  - Only that user uses them, not another user; shut down when not needed
- Local caching, Stragglers
  - Files are immutable, so caching on local SSD is simple (cf. AFS / NFS)
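Because table files are immutable, a local cache never has to invalidate an entry; it only has to decide what to evict. A minimal sketch of that idea follows. The class name, the `fetch_remote` callback, and the LRU eviction policy are assumptions chosen for illustration, not details taken from the slides.

```python
from collections import OrderedDict
from typing import Callable


class LocalFileCache:
    """LRU cache of immutable files, standing in for a worker's local SSD."""

    def __init__(self, capacity: int, fetch_remote: Callable[[str], bytes]):
        self.capacity = capacity          # max number of cached files
        self.fetch_remote = fetch_remote  # hypothetical remote-storage read
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.remote_reads = 0

    def read(self, file_name: str) -> bytes:
        if file_name in self.entries:
            # Hit: immutable files cannot be stale, so just refresh recency.
            self.entries.move_to_end(file_name)
            return self.entries[file_name]
        # Miss: fetch from shared storage and cache the result.
        data = self.fetch_remote(file_name)
        self.remote_reads += 1
        self.entries[file_name] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return data
```

If a warehouse is shut down, its cache simply disappears; the shared storage still holds every file.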
CLOUD SERVICES
- Concurrency Control
  - How to handle updates from many users?
  - Snapshot Isolation: the reads of a query come from a consistent snapshot
- Pruning
  - Use table metadata (e.g., on name, age) to skip files that don't have relevant tuples
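Pruning can be sketched as a range-overlap check against per-file metadata. The sketch below assumes each file records the minimum and maximum `age` it contains; that per-file min/max summary, the `FileMeta` type, and the file names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileMeta:
    name: str      # hypothetical file name
    min_age: int   # smallest age stored in the file (assumed metadata)
    max_age: int   # largest age stored in the file (assumed metadata)


def prune(files: List[FileMeta], low: int, high: int) -> List[str]:
    """Return only the files whose [min_age, max_age] range can contain
    a tuple with low <= age <= high; all other files are skipped unread."""
    return [f.name for f in files if f.max_age >= low and f.min_age <= high]


# Metadata matching the slide's table: file f1 holds Alice (32) and Bob (22),
# file f2 holds Eve (24) and Victor (27).
FILES = [FileMeta("f1", 22, 32), FileMeta("f2", 24, 27)]
```

For a predicate such as `age BETWEEN 28 AND 40`, only `f1` needs to be read; `f2` is skipped from metadata alone.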
FAULT TOLERANCE
- Metadata: replicate
- Storage: replicate data across data centers
- Virtual warehouses: ephemeral, nothing is replicated. If there is a failure, retry (restart) the query.
SEMI STRUCTURED DATA

{
  first_name: “john”,
  last_name: “doe”,
  order_id: “1234”,
}
{
  first_name: “bucky”,
  last_name: “badger”,
  order_id: “52342”,
  order_date: “3/3/2020”,
}

- Extraction operation (is order_id an integer?)
- Flattening: can create rows out of arrays of JSON objects
- Infer types, Pruning within a file
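Flattening, type inference, and extraction can be sketched on the slide's two records. The helper names and the int-then-float inference rule are assumptions for this example; they are not Snowflake's actual inference procedure.

```python
import json
from typing import Any, Dict, List


def infer_type(value: str) -> Any:
    """Try to read a string as an int, then as a float; otherwise keep it."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def flatten(document: str) -> List[Dict[str, Any]]:
    """Turn a JSON array of objects into one row per object."""
    records = json.loads(document)
    return [
        {key: infer_type(val) if isinstance(val, str) else val
         for key, val in record.items()}
        for record in records
    ]


def extract(rows: List[Dict[str, Any]], key: str) -> List[Any]:
    """Extract one field as a column; rows lacking the field yield None."""
    return [row.get(key) for row in rows]


DOC = """[
  {"first_name": "john", "last_name": "doe", "order_id": "1234"},
  {"first_name": "bucky", "last_name": "badger", "order_id": "52342",
   "order_date": "3/3/2020"}
]"""
```

Here `order_id` is inferred as an integer even though it arrives as a string, while `order_date` stays a string and is simply missing from the first row.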
TIME TRAVEL?
- Multiple versions of table (MVCC)
- Undo accidental deletes: UNDROP command
- Cheap to clone / snapshot a table: copy on write
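One way to see why versions and clones are cheap: if files are immutable, a table version is just a list of file names, and a clone copies that list rather than the data. The sketch below illustrates this under that assumption; the `VersionedTable` class and its methods are hypothetical names for the example.

```python
from typing import List, Tuple


class VersionedTable:
    """A table as a sequence of versions; each version is an immutable
    tuple of file names, and old versions are kept for time travel."""

    def __init__(self, files: Tuple[str, ...] = ()):
        self.versions: List[Tuple[str, ...]] = [tuple(files)]

    def commit(self, add: Tuple[str, ...] = (), remove: Tuple[str, ...] = ()) -> int:
        """Create a new version; files are never modified in place."""
        current = self.versions[-1]
        new = tuple(f for f in current if f not in remove) + tuple(add)
        self.versions.append(new)
        return len(self.versions) - 1

    def files_at(self, version: int = -1) -> Tuple[str, ...]:
        """Read the table as of a given version (default: latest)."""
        return self.versions[version]

    def clone(self) -> "VersionedTable":
        """Copy on write: the clone shares file names, not file contents."""
        return VersionedTable(self.versions[-1])
```

Undoing an accidental delete then amounts to reading, or re-committing, the file list of an earlier version.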
SECURITY
- Hierarchical key management: you encrypt a child key by using a parent key; this limits what gets accessed
- Key rotation, re-keying: refresh the key being used
SUMMARY, TAKEAWAYS
Snowflake
- Cloud computing → Elastic data warehouse
- Key idea: Separation of compute and storage!
- Hybrid columnar storage format (read only the required ranges)
- Elastic compute with virtual warehouses
- Pruning, semi-structured optimizations, fault tolerant
DISCUSSION https://forms.gle/ZFosdUnizXYABAE86
We see how Snowflake leads to the design of an elastic data warehouse. If we were to similarly design an Elastic PyTorch for training how would the design look? What are some design trade-offs compared to existing PyTorch?
NEXT STEPS
- Next class: Midterm!
- AEFIS feedback
- Project proposal peer feedback assignments
DRF and task dependencies: DRF doesn't have a time dimension.
Does this work?
- DRF is an instantaneous fair scheme: with upstream and downstream tasks, the downstream tasks don't inherit the shares.
- RDDs: immutable vs. materialized
- FT (b) Improving: lineage is just the default; checkpoints can shorten the lineage (see 5.4 in the paper).
Mid-level aggregator
MapReduce: Sorting
- Compute buckets over the data; each machine (e.g., machine 1) fetches its bucket, so the fetched data ends up sorted.
DRF
- Sharing incentive: the allocation in the shared cluster is as good as having a small exclusive cluster.
- Task preferences: where to locate some tasks
[Figure: Disk, DRAM, and Flash compared for a workload on latency, capacity, bandwidth, and price per GB; legend colors blue, red, green.]
Assumption: process failures (MR may conflate these with entire machine failure)
- Map in progress / reduce in progress: the task may restart
- (a) Map output is already on disk: nothing to be done
- (b) Reduce task may restart