Workflow-Oriented Cyberinfrastructure for Sensor Data Analytics
Arcot Rajasekar 1, John Orcutt 2, Frank Vernon 2
1 University of North Carolina, Chapel Hill
2 University of California, San Diego
Four Kinds of Big Data

Archetypal Science Projects
• LHC, SKA, LSST
• Business/Industry: Genomics, Finance
• Government: NASA, NOAA, DOE

Crowd-Sourced Social Media
• Facebook, Twitter
• Personal Recommenders: Yelp, Angie, Groupon
• Web Commerce: Amazon, Ebay

Long-tail Science Projects
• Small orgs, RDM
• Hobbies, Citizen Science/Arts
• Government: Internal and unpublished

Sensor Streams, Internet of Things
• Appliances, Homes
• Smart Cities: Energy grids, Transportation
• Health: Biosensors, ER, OR

Characterization

               Archetypal   Crowd-Sourced   Long-tail   Sensor Streams
Volume         High         High            High        High
Velocity       High         Bursty          Low         High
Variety        Low          High            High        High
Veracity       High         Mixed           Low         Mixed
Value          High         Ephemeral       Unknown     Huge
Findability    High         High            None        None
Availability   High         Short-term      None        Low
Sensor Data
What is a sensor?
A sensor acquires a physical parameter and converts it into a signal suitable for processing (e.g. optical, electrical, mechanical).
Sensor data have some peculiar properties:
• Highly distributed network
• Time-related
  – Continuous
    • Concept of infinite stream
• Volume – small to large packets
• Velocity – slow mostly
• Variety – Disparate
• Fusion is important
• Metadata is important
• Sensor Concentrators
Sensors & DFC
• Multiple partners use sensor data
  – Marine, Seismic & Environment Science (SciON)
  – Hydrology (Hydroshare)
  – Engineering (Smart Cities)
  – Cognitive Science (TDLC)
  – Biology (CyShare)
• DFC development activities:
  – Access to sensor data
    • Access control, Authentication, …
  – Export to Standard formats
  – Archiving of sensor data
    • Reuse & Repurpose
  – Integrated Metadata & Discovery
  – Integrate into Tools & Workflows
  – Playback: Synchronized
Antelope Real Time System
• Concentrator
  – Used by multiple projects
  – High performance Object Ring Buffer (ORB)
  – Multiple types of sensor
  – Stream processing
  – Network of ORBs
  – Used by UCSD SIO
BRTT.COM
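The ring-buffer idea behind a sensor concentrator can be illustrated with a minimal Python sketch. This is not the Antelope ORB implementation: the class name `RingBuffer`, its methods, and the `Packet` fields are hypothetical and chosen only for illustration. It assumes a fixed-capacity buffer in which new packets overwrite the oldest ones and readers ask for the next packet after a given packet id.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    pktid: int      # increasing id assigned by the buffer (assumption)
    srcname: str    # stream name, e.g. "TA_M04C/MGENC/EP40"
    time: float     # packet timestamp in seconds
    payload: bytes  # opaque packet body


class RingBuffer:
    """Fixed-capacity packet buffer; the oldest packets are overwritten."""

    def __init__(self, capacity: int):
        self._packets = deque(maxlen=capacity)
        self._next_id = 0

    def put(self, srcname: str, time: float, payload: bytes) -> int:
        """Append a packet and return the id assigned to it."""
        pkt = Packet(self._next_id, srcname, time, payload)
        self._packets.append(pkt)  # drops the oldest packet when full
        self._next_id += 1
        return pkt.pktid

    def oldest(self):
        """Id of the oldest packet still held, or None if empty."""
        return self._packets[0].pktid if self._packets else None

    def reap(self, after_pktid: int):
        """Return the first packet with id greater than after_pktid, or None."""
        for pkt in self._packets:
            if pkt.pktid > after_pktid:
                return pkt
        return None


if __name__ == "__main__":
    orb = RingBuffer(capacity=3)
    for i in range(5):
        orb.put("TA_M04C/MGENC/EP40", float(i), b"sample")
    print(orb.oldest())        # packets 0 and 1 have been overwritten
    print(orb.reap(2).pktid)   # next packet after id 2
```

A reader that falls more than `capacity` packets behind loses data in this sketch, which is one reason to archive streams into files.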
DFC & Antelope
• Loosely-coupled federation
• Connection through Microservices
• Can define MSO for each orb stream
• Can be added to Workflows
• Provide access without burdening ARTS Administrators
• Implementation:
  – Reap Sensor Streams
  – Convert Formats
  – Store Streams as Files
  – Access Packets from Files
  – Push Files as Streams
  – Use Rules to Archive
[Figure: ORBs, Antelope Module Microservices, DFC Platform (iRules & Workflows), Clients]
DFC Antelope Microservices
• Single Packet Microservices
  – msiAntelopeGet – get a packet
  – msiAntelopePut – put a packet
• Connection Microservices
  – msiOrbOpen
  – msiOrbClose
• Packet Low-level Access (read, write)
  – msiOrbGet – get current packet
  – msiOrbReap – get next packet
  – msiOrbReapTimeout
  – msiOrbPut – push a packet
  – msiOrbTell – redirect to an orb
• Packet Manipulation Microservices
  – msiOrbUnstuffPkt
  – msiFreeUnstuffPkt
  – msiOrbDecodePkt
  – msiOrbStuffPkt
  – msiOrbEncodePkt
• Stream-level Microservices
  – msiOrbSelect – select streams
  – msiOrbReject – reject streams
  – msiOrbPosition – position read pointer by packetid
  – msiOrbSeek – position read pointer by skipping packet
  – msiOrbAfter – seek with time
• ARTS Heartbeat Microservices
  – msiOrbStat
  – msiOrbPing
• Other Helpers
  – convertExec – format conversion
  – readLine
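The stream-level select and reject operations can be pictured as a filter over stream names. The sketch below is a hedged illustration only: it assumes that both criteria are regular expressions matched against the full source name, and the function `make_stream_filter` is hypothetical, not part of the DFC or Antelope APIs.

```python
import re


def make_stream_filter(select=None, reject=None):
    """Build a predicate over stream names.

    Assumption: a name is accepted when it matches `select` (if given)
    and does not match `reject` (if given).
    """
    select_re = re.compile(select) if select else None
    reject_re = re.compile(reject) if reject else None

    def accepts(srcname: str) -> bool:
        if select_re and not select_re.fullmatch(srcname):
            return False
        if reject_re and reject_re.fullmatch(srcname):
            return False
        return True

    return accepts


if __name__ == "__main__":
    accepts = make_stream_filter(select=r"TA_.*/MGENC/.*", reject=r".*/SM100")
    for name in ("TA_M04C/MGENC/EP40", "TA_J01E/MGENC/SM100", "DFC_UNC/ch/T1"):
        print(name, accepts(name))
```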
Reaping Rules

Reap and Convert

    antelopRule{
      # Get Packet: open the orb, select the stream, reap and decode one packet
      msiOrbOpen(*orbHost, *orbParam, *orbId);
      msiOrbSelect(*orbId, *Sensor, *sresOut);
      msiOrbReap(*orbId, *pktId, *srcName, *oTime, *pktOut, *nBytes, *resOut);
      msiOrbDecodePkt(*orbId, *modeIn, *srcName, *oTime, *pktOut, *nBytes, *decodeBufInOut);
      msiOrbClose(*orbId);
      # Store Packet: append the decoded packet to a per-sensor file
      *SColl = *Coll ++ "/" ++ *Sensor;
      *SFile = *SColl ++ "/" ++ "waveform.data";
      msiCollCreate(*SColl, "1", *STAT_1);
      openForAppendOrCreate(*SFile, *Resc, *D_FD);
      msiDataObjWrite(*D_FD, *decodeBufInOut, *WR_LN);
      msiDataObjClose(*D_FD, *STAT_2);
    }
    # First alternative: open the existing file and seek to its end
    openForAppendOrCreate(*SFile, *Resc, *D_FD) {
      *SObj = "objPath=" ++ *SFile ++ "++++openFlags=O_RDWR";
      msiDataObjOpen(*SObj, *D_FD);
      msiDataObjLseek(*D_FD, *Offset, *Loc, *Status1);
    }
    # Second alternative: create the file if it cannot be opened
    openForAppendOrCreate(*SFile, *Resc, *D_FD) {
      msiDataObjCreate(*SFile, *Resc, *D_FD);
    }
    input *Coll="/rajaanf/home/rods/newsenstest", *Resc="destRescName=anfdemoResc++++forceFlag=", *Sensor="TA_J01E/MGENC/SM100", *orbHost="anfexport.ucsd.edu:cascadia", *orbParam="", *modeIn=2, *Offset="0", *Loc="SEEK_END"
    output *pktId, *srcName, *oTime, *nBytes, *pktOut, *decodeBufInOut, ruleExecOut

Continuous Reaper

    antelopRule{
      # Start after 30 seconds and repeat every 10 minutes
      delay("<PLUSET>30s</PLUSET><EF>10m</EF>") {
        msiAddKeyVal(*KVP, "selectCriteria", *pktSelectInfo);
        msiAntelopeGet(*pktSelectInfo, *firstPktId, *lastPktId, *NumOfPkts, *outBufParam);
        # Store the reaped packets in a file named after the packet id range
        *SColl = *Coll ++ "/" ++ *Sensor;
        *SFile = *SColl ++ "/" ++ "*firstPktId" ++ "_" ++ "*lastPktId" ++ ".data";
        msiCollCreate(*SColl, "1", *STAT_1);
        msiDataObjCreate(*SFile, *Resc, *D_FD);
        msiDataObjWrite(*D_FD, *outBufParam, *WR_LN);
        msiDataObjClose(*D_FD, *STAT_2);
        # Attach the selection criteria and packet range as metadata
        msiAddKeyVal(*KVP, "firstPktId", "*firstPktId");
        msiAddKeyVal(*KVP, "lastPktId", "*lastPktId");
        msiAddKeyVal(*KVP, "numOfPkts", "*NumOfPkts");
        msiAssociateKeyValuePairsToObj(*KVP, *SFile, "-d");
      }
      writeLine("stdout", "Delayed Rule Launched");
    }
    input *pktSelectInfo="<ORBHOST>anfexport.ucsd.edu:cascadia</ORBHOST><ORBSELECT>TA_M04C/MGENC/EP40</ORBSELECT><ORBWHICH>ORBOLDEST</ORBWHICH><ORBNUMOFPKTS>8</ORBNUMOFPKTS><ORBNUMBULKREADS>4</ORBNUMBULKREADS>", *Resc="destRescName=anfdemoResc++++forceFlag=", *Coll="/rajaanf/home/rods/SensorData", *Sensor="TA/M04C/MGENC/EP40"
    output ruleExecOut
Ingest Rules

Interactive Packet Ingestion

    antelopRule{
      # Push a single packet, with a payload supplied at the prompt, into the orb
      msiAntelopePut(*orbName, *srcName, *timeStamp, *pktPayLoad);
    }
    input *orbName="anfdevl.ucsd.edu:demo", *srcName="DFC_UNC/ch/T1", *timeStamp="", *pktPayLoad=$"test 3 string"
    output ruleExecOut

Orb2Orb: Reaped Packet Ingestion

    # Get an MGENC packet from cascadia and put it in demo;
    # also write it to a file for comparison
    antelopRule{
      # Get the packet and write it into a file
      msiAntelopeGet(*pktSelectInfo, *firstPktId, *lastPktId, *NumOfPkts, *outBufParam);
      *SColl = *Coll ++ "/" ++ *Sensor;
      *SFile = *SColl ++ "/" ++ "*firstPktId" ++ "_" ++ "*lastPktId" ++ ".data";
      msiCollCreate(*SColl, "1", *STAT_1);
      msiDataObjCreate(*SFile, *Resc, *D_FD);
      msiDataObjWrite(*D_FD, *outBufParam, *WR_LN);
      msiDataObjClose(*D_FD, *STAT_2);
      # Write to orb
      msiAntelopePut(*orbName, *srcName, *timeStamp, *outBufParam);
    }
    input *pktSelectInfo="<ORBHOST>anfexport.ucsd.edu:cascadia</ORBHOST><ORBSELECT>TA_J01E/MGENC/SM1</ORBSELECT><ORBWHICH>ORBOLDEST</ORBWHICH><ORBNUMOFPKTS>1</ORBNUMOFPKTS><ORBNUMBULKREADS>1</ORBNUMBULKREADS><ORBPRESENTATION>ONEPKT</ORBPRESENTATION>", *Resc="destRescName=anfdemoResc++++forceFlag=", *Coll="/rajaanf/home/rods/SensorData", *Sensor="TA_J01E_MGENC_SM1", *orbName="anfdevl.ucsd.edu:demo", *srcName="DFC_UNC/MGENC/T1", *timeStamp=""
    output *outBufParam, *firstPktId, *lastPktId, *NumOfPkts, ruleExecOut
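The rules store each reaped chunk in a per-sensor collection, in a file named after its first and last packet ids, and the continuous reaper also attaches the selection criteria and packet range as key-value metadata. The Python sketch below mirrors only that naming and metadata scheme; the function `archive_target` is hypothetical and does not talk to iRODS or to an orb.

```python
from pathlib import PurePosixPath


def archive_target(coll, sensor, first_pktid, last_pktid, num_pkts, select_criteria):
    """Return (data object path, metadata dict) for one reaped chunk.

    Mirrors the rule text: <coll>/<sensor>/<firstPktId>_<lastPktId>.data,
    with the same metadata keys that the continuous reaper attaches.
    """
    sensor_coll = PurePosixPath(coll) / sensor
    data_path = sensor_coll / f"{first_pktid}_{last_pktid}.data"
    metadata = {
        "selectCriteria": select_criteria,
        "firstPktId": str(first_pktid),
        "lastPktId": str(last_pktid),
        "numOfPkts": str(num_pkts),
    }
    return str(data_path), metadata


if __name__ == "__main__":
    # Packet ids 100..107 are made-up example values.
    path, meta = archive_target(
        "/rajaanf/home/rods/SensorData", "TA/M04C/MGENC/EP40",
        100, 107, 8, "<ORBSELECT>TA_M04C/MGENC/EP40</ORBSELECT>",
    )
    print(path)
    print(meta)
```

Because a sensor name containing "/" is appended to the collection path, it yields nested collections, whereas a name such as "TA_J01E_MGENC_SM1" yields a single one.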
Sensor Data in DFC
• Sensor streams are stored as files in DFC:
  – Raw Orb format – buffer
  – CDL format – Common Data form Language, a human-readable text representation of netCDF data
  – NC format: NetCDF Format
    • NetCDF 4 – version 4
    • HDF5 compatible
    • Use 'ncgen' for conversion
  – JSON – human-readable format
• Multi-type sensors reaped
  – Seismic Sensor
    • 3 sensor measurements per packet
    • North, East, Vertical Movements
  – Pressure Sensor
    • 2 sensor measurements per packet
    • Barometric Pressure, Infrasound
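To make the CDL and JSON formats concrete, here is a minimal Python sketch that renders one seismic packet, with its three components, as CDL text and as JSON. The dataset name, the dimension name `sample`, the variable names, and the integer sample type are assumptions made for illustration; the layout that DFC actually writes is not shown in the slides.

```python
import json


def seismic_packet_to_cdl(name, north, east, vertical):
    """Render three equal-length component arrays as CDL text."""
    n = len(north)
    if len(east) != n or len(vertical) != n:
        raise ValueError("all three components must have the same length")

    def values(xs):
        return ", ".join(str(x) for x in xs)

    lines = [
        f"netcdf {name} {{",
        "dimensions:",
        f"    sample = {n} ;",
        "variables:",
        "    int north(sample) ;",
        "    int east(sample) ;",
        "    int vertical(sample) ;",
        "data:",
        f"    north = {values(north)} ;",
        f"    east = {values(east)} ;",
        f"    vertical = {values(vertical)} ;",
        "}",
    ]
    return "\n".join(lines) + "\n"


def seismic_packet_to_json(name, north, east, vertical):
    """Render the same packet as a JSON document."""
    return json.dumps(
        {"name": name, "north": north, "east": east, "vertical": vertical}
    )


if __name__ == "__main__":
    # Sample values are made up for the example.
    print(seismic_packet_to_cdl("pkt_example", [1, 2], [3, 4], [5, 6]))
    print(seismic_packet_to_json("pkt_example", [1, 2], [3, 4], [5, 6]))
```

Saved to a file such as `packet.cdl`, text of this shape is the kind of input that `ncgen -o packet.nc packet.cdl` converts to a binary netCDF file.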