The Use of Search Engines for Massively Scalable Forensic Repositories
www.cybertapllc.com/
John H. Ricketson
jricketson@cybertapllc.com
jricketson@dejavutechnologies.com
+1-978-692-7229
Who Is cybertap?
• We provide a forensic platform for cyber investigations based on search engine technology
  – External Threats & Viruses
  – Hacking
  – Financial Fraud
  – Electronic Warfare
  – Security Event Analysis
• Markets
  – Law Enforcement
  – Cyber Security, for both Government and Commercial Enterprises
  – Banking / Trading
  – Electronic Commerce
  – Call Centers
Forensic Evidence
• Documents
  – Computer Forensics / eDiscovery / “data-at-rest”
  – Disk Scrubbing, Dead Boxes, etc.
  – Shared Repositories like Dropbox or SharePoint
• Archives
  – Email
  – Instant Messaging
• Web
  – Downloaded HTML pages
• Financial Information
  – Invoices, Credit cards, Private Information
  – Electronic Trades
• Log Files from Network Devices
• Cell Phone Call Records
• Real-Time Network Transactions
  – “data-in-motion”
  – Packet Captures
  – Network Streams
Forensic Data
• Archival in Nature
• Non-Transactional
• Contains Meta-data & Content & Extracted Intelligence
  – Meta-Data
    • Document Attributes
      – Author, Dates, Printers, Macros, Document Edits, File reference
    • Network Attributes
      – Addressing Endpoints, IDs, Domains, Protocol Headers
  – Content
    • Message Content
    • Body Content
    • Media Streams
  – Extracted Intelligence
    • Electronic Persona (epersona)
    • Geo-location
    • Correlations and links among heterogeneous data
Search Engines Provide
• Non-Transactional repository of archival data
• Meta-data descriptors for network and document attributes
  – Delineating meta-data from content in search queries is crucial
    • Show me all email documents from this.IP address to that.IP
    • Show me all IM messages from this.ID to that.ID containing “nitrate”
• The ability to search in free form any meta-data or content item
• Massively scalable forensic repositories
  – 10+ Billion documents representing Terabytes of data
• Sub-second search times
• Cross reference and correlation of all data with a single search
• Document parsing
• Leverage these FREE, rich-functionality, forensic repositories
  – http://lucene.apache.org/solr/
  – http://tika.apache.org/
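The "this.ID to that.ID containing nitrate" query above can be sketched as Solr request parameters, with the meta-data constraints kept apart from the free-text content terms. This is only an illustration: the field names `im_from`, `im_to`, and `content`, the core name `forensics`, and the localhost URL are assumptions, not names taken from the cybertap platform. Only the standard Solr parameters `q`, `fq`, and `rows` are used.

```python
from urllib.parse import urlencode


def _quote(value):
    """Wrap a value in double quotes, escaping backslashes and quotes for Solr."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_solr_query(metadata=None, content_terms=None, rows=10):
    """Build /select parameters that separate meta-data filters from content terms."""
    # Content part: search the (assumed) 'content' field, or match everything.
    if content_terms:
        q = " AND ".join(f"content:{_quote(t)}" for t in content_terms)
    else:
        q = "*:*"
    params = [("q", q)]
    # Each meta-data constraint becomes its own filter query (fq).
    for field, value in sorted((metadata or {}).items()):
        params.append(("fq", f"{field}:{_quote(value)}"))
    params.append(("rows", str(rows)))
    return params


# "Show me all IM messages from this.ID to that.ID containing 'nitrate'"
params = build_solr_query(
    metadata={"im_from": "this.ID", "im_to": "that.ID"},  # hypothetical fields
    content_terms=["nitrate"],
)
# Hypothetical host and core name.
url = "http://localhost:8983/solr/forensics/select?" + urlencode(params)
print(url)
```

Putting the meta-data constraints in `fq` rather than `q` is one possible design: filter queries do not affect relevance scoring, so ranking is driven by the content terms alone.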
Tika Provides
• Meta-data, MIME, Language & Content
  – HyperText Markup Language (html)
  – XML and derived formats
  – Microsoft Office document formats
  – OpenDocument Format
  – Portable Document Format (pdf)
  – Electronic Publication Format
  – Rich Text Format (rtf)
  – Compression and packaging formats (zip, tar, etc.)
  – Text formats
  – Audio formats
  – Image formats
  – Video formats
  – Java class files and archives
  – The mbox format
• Identifies documents by content ONLY
  – File extension ≠ Content
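To illustrate what "identifies documents by content only" means, here is a minimal sketch of magic-byte sniffing. It is not Tika's implementation, which uses a much larger signature database and deeper container inspection. The short signature table and the `sniff_mime` helper are illustrative assumptions.

```python
# A tiny, illustrative signature table (leading bytes -> MIME type).
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    # ZIP containers; OOXML and OpenDocument files also start this way,
    # so a real detector has to look inside the archive.
    (b"PK\x03\x04", "application/zip"),
]


def sniff_mime(data: bytes) -> str:
    """Guess a MIME type from the leading bytes, ignoring any file name."""
    for magic, mime in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return mime
    return "application/octet-stream"


# A file named "holiday.jpg" whose bytes are actually a PDF:
file_name = "holiday.jpg"
file_bytes = b"%PDF-1.4\n%minimal example\n"
detected = sniff_mime(file_bytes)
print(file_name, "->", detected)
```

The file name never enters the decision, so renaming a document to hide its type does not change the detected type.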
Apache Solr & Tika Open Source Projects
Documents → Tika parser → XML (Meta-data, Content) → Solr indexer → Index
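The XML hand-off between the parser and the indexer can be sketched as follows. The example assumes the meta-data and content have already been extracted (for instance by Tika) and only shows how they might be wrapped in Solr's XML update format (`<add><doc><field name="...">`). The field names `author`, `content_type`, and `content` and the helper `to_solr_add_xml` are hypothetical, and a real schema would define its own fields.

```python
import xml.etree.ElementTree as ET


def to_solr_add_xml(doc_id, metadata, content):
    """Wrap extracted meta-data and content in a Solr <add><doc> message."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    def field(name, value):
        el = ET.SubElement(doc, "field", {"name": name})
        el.text = str(value)

    field("id", doc_id)
    # Meta-data fields stay separate from the content field,
    # so queries can target either one.
    for name, value in sorted(metadata.items()):
        field(name, value)
    field("content", content)
    return ET.tostring(add, encoding="unicode")


xml_message = to_solr_add_xml(
    "doc-1",
    {"author": "J. Doe", "content_type": "application/pdf"},
    "Quarterly invoice <draft>",
)
print(xml_message)
```

A message like this would typically be POSTed to the core's update handler. ElementTree escapes characters such as `<` in the content, which keeps the message well-formed.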
Demonstration of Document Searches
• Importing Documents
• Importing E-mail
• Searching both Meta-data & Content
Real-Time Network Packet Ingestion
Packets → Decompile → Documents → Tika parser → XML (Meta-data, Content) → Solr indexer → Index
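As a rough illustration of the first step of the "Decompile" stage, the sketch below pulls network meta-data (addressing endpoints, protocol) out of a raw IPv4 header and separates it from the payload. It covers only a fixed 20-byte IPv4 header view. Stream reassembly and application-protocol decoding, which a real ingestion pipeline needs, are out of scope, and the function name and output keys are assumptions.

```python
import socket
import struct

PROTOCOL_NAMES = {1: "icmp", 6: "tcp", 17: "udp"}


def parse_ipv4_header(packet: bytes) -> dict:
    """Split a raw IPv4 packet into header meta-data and payload bytes."""
    if len(packet) < 20:
        raise ValueError("too short for an IPv4 header")
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (version_ihl & 0x0F) * 4  # IHL is counted in 32-bit words
    return {
        "src_ip": socket.inet_ntoa(src),
        "dst_ip": socket.inet_ntoa(dst),
        "protocol": PROTOCOL_NAMES.get(protocol, str(protocol)),
        "ttl": ttl,
        "payload": packet[header_len:total_length],
    }


# Build a synthetic packet: 20-byte header plus a 5-byte payload.
sample_packet = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 25, 0, 0, 64, 6, 0,
    socket.inet_aton("192.0.2.1"),
    socket.inet_aton("198.51.100.7"),
) + b"hello"
record = parse_ipv4_header(sample_packet)
print(record)
```

The resulting dictionary has the same meta-data/content split as a parsed document, so it could feed the same indexing step.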
Open Document Extraction & Enrichment
• VoIP telephone call processing – wav
  – Including Speech-to-Text, Voice Identification, Voice Recognition
• Video processing – video, flash
  – Including OCR, Speech-to-Text, closed captioning, and multi-frame analysis.
• Image processing – jpeg, pdf, gif, etc.
  – Including OCR, facial recognition, flesh tone detection, etc.
• Natural Language Processing – text
  – Including language translation, text entity extraction, proper name disambiguation, summarize conversations or identify people by writing style.
• Automated Zero-Day Malware / Virus Detection
• Steganographic detection
• Decryption
• ePersona Background Checks
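One way to picture the enrichment stage is as a set of steps selected by content type, each adding fields to the document before it is indexed. The sketch below is hypothetical throughout: the registry, the step names, and the document keys are invented for illustration. The only step that does real work is a simple regular expression for e-mail addresses, standing in for text entity extraction; speech-to-text and OCR are represented by placeholders that merely record which step would run.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_text_entities(doc):
    """Stand-in for entity extraction: collect e-mail addresses from text."""
    text = doc.get("content", "")
    return {"email_addresses": sorted(set(EMAIL_RE.findall(text)))}


def placeholder(step_name):
    """Return a step that only records that step_name would be applied."""
    def step(doc):
        return {"pending_steps": doc.get("pending_steps", []) + [step_name]}
    return step


# Hypothetical registry: MIME type prefix -> enrichment steps.
ENRICHERS = {
    "text/": [extract_text_entities],
    "audio/": [placeholder("speech_to_text")],
    "image/": [placeholder("ocr")],
    "video/": [placeholder("ocr"), placeholder("speech_to_text")],
}


def enrich(doc):
    """Run every step registered for the document's content type."""
    enriched = dict(doc)
    for prefix, steps in ENRICHERS.items():
        if doc.get("content_type", "").startswith(prefix):
            for step in steps:
                enriched.update(step(enriched))
    return enriched


text_doc = enrich({
    "content_type": "text/plain",
    "content": "mail bob@example.org and alice@example.com, then bob@example.org",
})
video_doc = enrich({"content_type": "video/mp4"})
print(text_doc["email_addresses"], video_doc["pending_steps"])
```

Because each step only returns new fields, additional extractors can be registered without changing the documents that earlier steps produced.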
Demonstration of Packet Searches
• Importing Packets
• Searching both Meta-data & Content
• IMAP - HTTP - VoIP - Facebook
• Reconstruction
• Content Extraction
• Electronic Persona Identification (epersona)