Space-efficient quantile selection

Input: a stream $a_1, \dots, a_n \in U$, where $U$ has an order (e.g. numerical data, or names with grades in alphabetic order). Multiple passes may be allowed.

Goal: return the median with minimum space and passes. More generally, answer quantile queries: select the element of rank $k$ (the $k$th largest).
Baselines (passes vs. space):
  1 pass, $O(n)$ space: store the input, then sort and select.
  Multiple passes: quickselect with a random pivot.
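Approximation: given a parameter $\varepsilon > 0$ and a rank $k$, return an element whose rank is in $k \pm \varepsilon n$.

Sampling: sample $b$ elements and return the median of the sample (for rank $k$, return the element of rank $(b/n) \cdot k$ in the sample).

Deterministic alternative: quantile summaries.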
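The sampling approach above can be sketched in Python as follows. This is only an illustration under stated assumptions: it materializes the input to draw $b$ uniform samples with replacement (so it is not itself a streaming algorithm), it takes rank to mean position in ascending sorted order, and the name `sample_rank_estimate` and the `seed` parameter are hypothetical.

```python
import random


def sample_rank_estimate(stream, k, b, seed=0):
    """Estimate the rank-k element from b uniform samples (with replacement).

    Assumption: rank 1 is the smallest element. The input is materialized
    here only to keep the sketch short.
    """
    rng = random.Random(seed)
    data = list(stream)
    n = len(data)
    sample = sorted(rng.choice(data) for _ in range(b))
    # Rank k in the stream corresponds to rank about (b / n) * k in the sample.
    idx = min(b - 1, max(0, round(k * b / n) - 1))
    return sample[idx]
```

A larger $b$ makes it more likely that the returned element has rank close to $k$; the guarantee is probabilistic, unlike the deterministic summaries.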
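Quantile summaries: space-efficient, mergeable, and able to answer $\varepsilon$-approximate quantile queries.

A summary $Q$ consists of $\ell$ elements $q_1 < q_2 < \dots < q_\ell$ of the stream $S$, along with intervals $I(q_i) = [\min I(q_i), \max I(q_i)]$ that track bounds on the rank: $\operatorname{rank}(q_i) \in I(q_i)$. Assume $\operatorname{rank}(q_1) = 1$ and $\operatorname{rank}(q_\ell) = n$.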
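Querying rank $k$: if $I(q_i) \subseteq [k - 2\varepsilon n,\ k + 2\varepsilon n]$ for some $q_i$, then return $q_i$. How do we ensure that such a $q_i$ exists?

Lemma: if every pair of consecutive intervals $I(q_i), I(q_{i+1})$ has total width at most $2\varepsilon n$, i.e. $\max I(q_{i+1}) - \min I(q_i) \le 2\varepsilon n$, then for every query $k$ the window $[k - 2\varepsilon n,\ k + 2\varepsilon n]$ contains an interval $I(q_i)$.

Proof: two cases.
Case 1: $k \in I(q_i)$ for some $i$. Then $I(q_i)$ contains $k$ and has width at most $2\varepsilon n$, so it lies inside the window; done.
Case 2: $k \notin I(q_i)$ for every $i$. The combined intervals $[\min I(q_i), \max I(q_{i+1})]$ cover $[1, n]$, so $k$ must lie inside one of them. That combined interval has width at most $2\varepsilon n$, so $I(q_i)$ lies inside the window.

Key invariant: any two consecutive intervals have total width at most $2\varepsilon n$. A summary satisfying this is an $\varepsilon$-APX quantile summary.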
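The query rule and the key invariant can be sketched as below. The representation is an assumption made for illustration: a summary is a list of `(value, rmin, rmax)` tuples sorted by value, and the function names `is_eps_summary` and `query` are hypothetical.

```python
def is_eps_summary(summary, eps, n):
    """Check the key invariant on a list of (value, rmin, rmax) tuples.

    Assumed conventions: the first entry has rmin == 1, the last has
    rmax == n, and consecutive intervals have total width <= 2 * eps * n.
    """
    if not summary or summary[0][1] != 1 or summary[-1][2] != n:
        return False
    bound = 2 * eps * n
    return all(nxt[2] - cur[1] <= bound
               for cur, nxt in zip(summary, summary[1:]))


def query(summary, k, eps, n):
    """Return a value whose rank interval lies in [k - 2*eps*n, k + 2*eps*n]."""
    for value, rmin, rmax in summary:
        if k - 2 * eps * n <= rmin and rmax <= k + 2 * eps * n:
            return value
    return None  # can only happen if the invariant is violated
```

For example, with exact ranks stored for five of eight stream elements, `query` returns the first stored element whose interval fits inside the window around $k$.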
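Merging: given two $\varepsilon$-APX quantile summaries $Q'$ and $Q''$ over two streams $S_1$ and $S_2$, we want an $\varepsilon$-APX summary $Q$ over the combined stream $S_1 \cup S_2$.

Take $Q = Q' \cup Q''$ in sorted order, and denote the intervals of $Q'$ by $I'(q'_i)$ and those of $Q''$ by $I''(q''_i)$.

Let $q_j \in Q$. What rank bounds should $I(q_j)$ store? The goal is to bound the rank of $q_j$ in $S_1 \cup S_2$. Suppose $q_j = q'_i$ comes from $Q'$; let $q''_s$ be the largest element of $Q''$ below $q_j$ and $q''_t$ the smallest element of $Q''$ above $q_j$. Set

$$\min I(q_j) = \min I'(q'_i) + \min I''(q''_s), \qquad \max I(q_j) = \max I'(q'_i) + \max I''(q''_t) - 1,$$

and symmetrically when $q_j$ comes from $Q''$.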
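A minimal Python sketch of this merge step follows. It assumes summaries are sorted lists of `(value, rmin, rmax)` tuples, that all values are distinct, that ranks count positions in ascending order, and that each summary contains the minimum and maximum of its stream (which is what handles the boundary cases where no smaller or larger element exists on the other side). The name `merge_summaries` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def merge_summaries(qa, qb):
    """Merge two summaries, each a sorted list of (value, rmin, rmax)."""

    def lift(own, other):
        other_vals = [v for v, _, _ in other]
        out = []
        for v, rmin, rmax in own:
            s = bisect_left(other_vals, v) - 1   # largest other value below v
            t = bisect_right(other_vals, v)      # smallest other value above v
            # Lower bound: everything up to other[s] is below v.
            new_min = rmin + (other[s][1] if s >= 0 else 0)
            if t < len(other):
                # Upper bound: everything below v is strictly below other[t].
                new_max = rmax + other[t][2] - 1
            else:
                # v exceeds the other stream's maximum (assumed stored last).
                new_max = rmax + (other[-1][2] if other else 0)
            out.append((v, new_min, new_max))
        return out

    return sorted(lift(qa, qb) + lift(qb, qa))
```

The merged summary keeps every element of both inputs, so its size is the sum of the two sizes.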
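Need to show that $Q$ is an $\varepsilon$-APX quantile summary, i.e. that the $2\varepsilon n$ width property holds for $Q$'s intervals. Take two consecutive intervals in $Q$. Two cases:
Case 1: the two elements come from different sets.
Case 2: the two elements come from the same set.

This shows that merging two $\varepsilon$-APX quantile summaries gives an $\varepsilon$-APX quantile summary of the combined stream, of size $|Q'| + |Q''|$.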
Pruning

Input: an $\varepsilon$-approximate quantile summary with too many points.
Goal: a sparser summary, with about $K$ points, that is still very good.

Claim: the resulting quantile summary is $(\varepsilon + O(1/K))$-APX.
Proof sketch: consider a query for a rank $k$ against the pruned summary.
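One way to realize pruning, sketched here as an assumption rather than as the procedure drawn in the lecture, is to query the existing summary at the evenly spaced ranks $n/K, 2n/K, \dots$ and keep only the answers, together with the first and last entries. The name `prune` and the tuple layout `(value, rmin, rmax)` are hypothetical.

```python
def prune(summary, n, K):
    """Keep at most K + 1 entries of a sorted (value, rmin, rmax) list.

    Assumed rule: for each target rank j * n / K, keep the entry whose
    interval fits in the smallest window centered on that target.
    """
    if len(summary) <= K + 1:
        return list(summary)
    kept = {0, len(summary) - 1}  # always keep the minimum and maximum
    for j in range(1, K):
        target = j * n / K

        def worst(i):
            _, rmin, rmax = summary[i]
            return max(abs(rmin - target), abs(rmax - target))

        kept.add(min(range(len(summary)), key=worst))
    return [summary[i] for i in sorted(kept)]
```

Kept entries retain their original rank intervals, so the bounds stay valid; only the spacing between stored points grows, by roughly $n/K$.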
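Recap: we can
  combine $\varepsilon$-APX quantile summaries to get an $\varepsilon$-APX quantile summary of the whole thing;
  sparsify an $\varepsilon$-APX quantile summary to $K$ points, giving an $(\varepsilon + O(1/K))$-APX quantile summary.
Remains to address: how to put it all together into one algorithm.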
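What if $\ell = n$, i.e. the summary stores every element? Then it is exact (a 0-APX summary).

Build a binary tree over the stream: the leaves are exact summaries, and each internal node merges its two children and prunes the result to $K$ points. Take $K = \frac{1}{\varepsilon} \log n$: each of the $\log n$ levels adds $O(1/K)$ error, so at the root we get $O(\varepsilon)$-approximate quantiles.

Space: only keep the summaries at the roots of subtrees that have not yet been merged (at most one per level), i.e. $O(\log n)$ summaries of $K$ points each.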
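The tree construction can be simulated in one pass with a binary-counter layout: one pending summary per level, merged and pruned whenever two blocks of equal size meet. The sketch below is self-contained and makes several assumptions for illustration: distinct stream values, ascending ranks, `(value, rmin, rmax)` tuples, a pruning rule that queries evenly spaced ranks, and the hypothetical names `merge`, `prune`, and `stream_summary`.

```python
from bisect import bisect_left, bisect_right


def merge(qa, qb):
    """Merge two sorted summaries of (value, rmin, rmax) tuples."""
    def lift(own, other):
        vals = [v for v, _, _ in other]
        out = []
        for v, rmin, rmax in own:
            s = bisect_left(vals, v) - 1
            t = bisect_right(vals, v)
            lo = rmin + (other[s][1] if s >= 0 else 0)
            if t < len(other):
                hi = rmax + other[t][2] - 1
            else:
                hi = rmax + (other[-1][2] if other else 0)
            out.append((v, lo, hi))
        return out
    return sorted(lift(qa, qb) + lift(qb, qa))


def prune(summary, n, K):
    """Keep at most K + 1 entries, always including the first and last."""
    if len(summary) <= K + 1:
        return list(summary)
    kept = {0, len(summary) - 1}
    for j in range(1, K):
        target = j * n / K
        kept.add(min(range(len(summary)),
                     key=lambda i: max(abs(summary[i][1] - target),
                                       abs(summary[i][2] - target))))
    return [summary[i] for i in sorted(kept)]


def stream_summary(stream, K):
    """One pass; levels[h] holds a summary of a block of 2**h items, or None."""
    levels = []
    for x in stream:
        carry, count, h = [(x, 1, 1)], 1, 0   # a leaf is an exact summary
        while h < len(levels) and levels[h] is not None:
            other, other_count = levels[h]
            count += other_count
            carry = prune(merge(other, carry), count, K)
            levels[h] = None
            h += 1
        if h == len(levels):
            levels.append(None)
        levels[h] = (carry, count)
    result, total = [], 0
    for entry in levels:                       # combine leftover blocks
        if entry is not None:
            result = merge(result, entry[0])
            total += entry[1]
    return result, total
```

At any time there is at most one pending summary per level, each with at most $K + 1$ entries, which is where the $O(K \log n)$ space bound comes from.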
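Theorem: there is a 1-pass, $O(\frac{1}{\varepsilon} \log^2 n)$-space deterministic streaming algorithm for $\varepsilon$-APX quantiles. Idea: mergeability + the dyadic intervals trick.

Slightly better: have the first level contain blocks of $1/\varepsilon$ points each, so the tree has only $\log(\varepsilon n)$ levels.

Theorem: there is a 1-pass, $O(\frac{1}{\varepsilon} \log^2(\varepsilon n))$-space deterministic streaming algorithm for $\varepsilon$-APX quantiles.

Even better: Greenwald-Khanna (2001) achieve $O(\frac{1}{\varepsilon} \log(\varepsilon n))$ space with a more sophisticated quantile summary (a merging trick on the intervals).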
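Finding the median (and other ranks) in $p$ passes

Fix $p = 2$ for simplicity. Goal: $O(\sqrt{n} \cdot \mathrm{polylog}(n))$ space.

Suppose we are querying rank $k$.
1st pass: build an $\varepsilon$-APX quantile summary for $\varepsilon \approx 1/\sqrt{n}$, which takes $\sqrt{n} \cdot \mathrm{polylog}(n)$ space. Query it at ranks $k - \sqrt{n}$ and $k + \sqrt{n}$ to get elements $a$ and $b$ with
$$\operatorname{rank}(a) \in [k - 2\sqrt{n},\ k], \qquad \operatorname{rank}(b) \in [k,\ k + 2\sqrt{n}].$$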
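The notes break off before the second pass. A natural way to finish, given here as an assumption and not as the lecture's own statement, is to count the elements below $a$ and store only the elements in $[a, b]$, of which there are $O(\sqrt{n})$; the answer is then read off from the sorted window. The name `second_pass_select` is hypothetical, and ranks are taken in ascending order.

```python
def second_pass_select(stream, a, b, k):
    """Return the exact rank-k element, given rank(a) <= k <= rank(b).

    Only elements in [a, b] are stored, so the space used is the number
    of stream elements falling in that range.
    """
    below = 0      # number of elements strictly smaller than a
    window = []    # elements x with a <= x <= b
    for x in stream:
        if x < a:
            below += 1
        elif x <= b:
            window.append(x)
    window.sort()
    idx = k - below - 1
    if not 0 <= idx < len(window):
        raise ValueError("rank k is not bracketed by a and b")
    return window[idx]
```

For instance, on a permutation of 1..64 with $a = 20$, $b = 40$ and $k = 32$, the pass counts 19 smaller elements, stores 21 elements, and returns the 13th of them.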