New directions in Globus: Collections, responsive storage, and safe data
Ian Foster, The University of Chicago and Argonne National Laboratory
Breaking down walls to yuge data sharing and analysis
Thesis: We enhance data sharing and analysis by eliminating barriers to navigation and flow
Notable barriers to data flow and navigation
• Moving data rapidly, securely, and reliably from lab to lab
• Accessing data at remote locations
• Controlling who can access data
• Tracking what data is where
• Discovering available data within a rapidly growing haystack
• Computing at large scale, including on distributed data
• Complying with rules on sensitive human data
• Data lifecycle for large and distributed data
Cloud: Outsourcing and automation
• Software as a service: SaaS (web & mobile apps)
• Platform as a service: PaaS
• Infrastructure as a service: IaaS
Cloud: Outsourcing and automation
• Software as a service: SaaS (web & mobile apps) (callout: SaaS for science)
• Platform as a service: PaaS
• Infrastructure as a service: IaaS
Transfer
1. Researcher initiates transfer request; or requested automatically by script, science gateway
2. Globus transfers files reliably, securely
Share
3. Researcher selects files to share, selects user or group, and sets access permissions
4. Globus controls access to shared files on existing storage; no need to move files to cloud storage!
5. Collaborator logs in to access shared files; no local account needed; download via Globus
Publish
6. Researcher assembles data set; attaches metadata (Dublin core, domain-specific)
7. Curator reviews and approves; data set published on campus or other system
Discover
8. Peers, collaborators search and discover datasets; transfer and share using Globus
• Only Web browser required
• Use any storage system
• Access using any credential
[Diagram labels: Compute facility, Sequencing center, Personal Computer, Publication repository]
How Globus adds value…
• Ease of use, consistent user interface across systems
• “Fire-and-forget” reliable file transfer
• Low-overhead external collaboration
• Secure access, multi-tier security model
• Maximized wide area network throughput
• Rapid deployment via standard packages
• Highly automatable: CLI, RESTful API
Globus has the best numbers!
• 4 major services
• 190 PB transferred
• 50,000 registered users
• 25 billion files processed
• 13 national labs
• 10,000 active users
• 99.9% uptime
• 10,000 active endpoints
• 35+ institutional subscribers
• 130 federated campus identities
• 1 PB largest single transfer to date
• 3 months longest continuously managed transfer
Cloud: Outsourcing and automation
• Software as a service: SaaS (web & mobile apps)
• Platform as a service: PaaS (callout: PaaS for science)
• Infrastructure as a service: IaaS
Prototypical research data portal
• Move portal storage into Science DMZ, with Globus endpoint
• Leave portal web server behind firewall
• Globus handles security and data heavy lifting
[Diagram labels: Desktop, Browser, User’s Endpoint (optional), Firewall, Portal Web Server (Client), Other Services, Science DMZ, Portal Endpoint, Other Endpoints, Globus Cloud (Globus Web Widgets, Globus Transfer, Globus Auth); connections labeled HTTPS, REST, GridFTP]
• Integrate Globus for data downloads
• Shared endpoint with subfolder per request
• Single sign-on via streamlined account provisioning
Advanced Photon Source Francesco De Carlo
[Diagram: RDP admin; RDP endpoint with directory /~/ …]
# (1) Create directory to be shared
# tc is a globus_sdk.TransferClient
share_path = '/~/' + shared_dir + '/'
tc.operation_mkdir(host_id, path=share_path)
# (2) Create the shared endpoint on that directory
shared_ep_data = {
    'DATA_TYPE': 'shared_endpoint',
    'host_endpoint': host_id,
    'host_path': share_path,
    'display_name': 'RDP ' + shared_dir,
    'description': 'RDP shared endpoint'
}
r = tc.create_shared_endpoint(shared_ep_data)
share_id = r['id']
# (3) Copy data into the shared endpoint
from globus_sdk import TransferData

tc.endpoint_autoactivate(share_id)
tdata = TransferData(tc, host_id, share_id,
                     label='RDP copy to share',
                     sync_level='checksum')
tdata.add_item(source_path, '/~/', recursive=True)
r = tc.submit_transfer(tdata)
tc.task_wait(r['task_id'], timeout=1000, polling_interval=10)
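Step (3) discards the return value of tc.task_wait. In the Globus Python SDK that call returns True once the task has stopped running and False if the timeout expired first, so a portal would usually want to check it before sharing the data. The following is a minimal sketch, not part of the original slides: wait_for_task and TransferError are hypothetical names, and tc is assumed to behave like a globus_sdk.TransferClient.

```python
class TransferError(RuntimeError):
    """Hypothetical error type: the copy timed out or did not succeed."""


def wait_for_task(tc, task_id, timeout=1000, polling_interval=10):
    """Wait for a transfer task and return its final status string.

    Assumes `tc` offers task_wait() and get_task() like
    globus_sdk.TransferClient does.
    """
    finished = tc.task_wait(task_id, timeout=timeout,
                            polling_interval=polling_interval)
    if not finished:
        # task_wait() returned False: the timeout expired first.
        raise TransferError(
            "task %s still running after %d s" % (task_id, timeout))

    # A finished task may still have failed, so inspect its status.
    status = tc.get_task(task_id)["status"]
    if status != "SUCCEEDED":
        raise TransferError("task %s ended as %s" % (task_id, status))
    return status
```

In step (3) this would replace the bare tc.task_wait(...) call, for example wait_for_task(tc, r['task_id']).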
# (4) Set access control to enable access by user
# ac is a globus_sdk.AuthClient
r = ac.get_identities(ids=user_id)
email = r['identities'][0]['email']
rule_data = {
    'DATA_TYPE': 'access',
    'principal_type': 'identity',  # To whom is access granted?
    'principal': user_id,          # In this case, an individual user
    'path': '/',                   # Path to which access is granted
    'permissions': 'r',            # Grant read-only access
    'notify_email': email,         # Email invite to this address
    'notify_message':              # Include this message in email
        'The data that you requested from RDP is available.'
}
tc.add_endpoint_acl_rule(share_id, rule_data)
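The access rule in step (4) is a plain dictionary, so it can be assembled and checked before it is sent to the service. The following is a minimal sketch under stated assumptions: build_acl_rule is a hypothetical helper, not an SDK function, and its checks assume that the Transfer API accepts the permissions 'r' and 'rw' and expects a directory path that begins and ends with '/'.

```python
VALID_PERMISSIONS = ("r", "rw")  # assumed set of accepted permission strings


def build_acl_rule(user_id, email=None, path="/", permissions="r",
                   message=None):
    """Return an access-rule dict shaped like rule_data in step (4)."""
    if permissions not in VALID_PERMISSIONS:
        raise ValueError("permissions must be 'r' or 'rw'")
    if not (path.startswith("/") and path.endswith("/")):
        # Assumption: rule paths name directories, with both slashes.
        raise ValueError("path must begin and end with '/'")

    rule = {
        "DATA_TYPE": "access",
        "principal_type": "identity",  # grant to an individual user
        "principal": user_id,
        "path": path,
        "permissions": permissions,
    }
    # The notification fields are only added when supplied.
    if email is not None:
        rule["notify_email"] = email
    if message is not None:
        rule["notify_message"] = message
    return rule
```

The result can be passed to tc.add_endpoint_acl_rule(share_id, rule) exactly as rule_data is in step (4).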
# (5) Ultimately, delete the shared endpoint
tc.delete_endpoint(share_id)
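For short-lived shares, such as a batch job or a test, steps (1), (2), and (5) can be tied together so that the shared endpoint is deleted even if something fails in between. The following is a minimal sketch, not the portal's actual implementation: temporary_share is a hypothetical name, and tc is assumed to behave like a globus_sdk.TransferClient. A real portal would more likely keep the share alive for days and delete it from a separate cleanup job.

```python
from contextlib import contextmanager


@contextmanager
def temporary_share(tc, host_id, shared_dir):
    """Create a shared endpoint on /~/<shared_dir>/ and delete it on exit."""
    share_path = "/~/" + shared_dir + "/"

    # (1) Create the directory to be shared.
    tc.operation_mkdir(host_id, path=share_path)

    # (2) Create the shared endpoint on that directory.
    r = tc.create_shared_endpoint({
        "DATA_TYPE": "shared_endpoint",
        "host_endpoint": host_id,
        "host_path": share_path,
        "display_name": "RDP " + shared_dir,
        "description": "RDP shared endpoint",
    })
    share_id = r["id"]

    try:
        # Steps (3) and (4) run in the caller's with-block.
        yield share_id
    finally:
        # (5) Delete the shared endpoint; the directory itself is left
        # in place, as in the original steps.
        tc.delete_endpoint(share_id)
```

Usage would look like `with temporary_share(tc, host_id, shared_dir) as share_id:` followed by the copy and access-rule steps inside the block.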
What’s coming soon: Richer endpoints
HTTPS access to endpoints
• Enhanced use of research storage:
  • Asynchronous, bulk transfer: GridFTP
  • Synchronous remote access: HTTPS
• Enhanced Globus web app
  • Browser-based upload/download
  • Inline file viewer
• Integration with clients, web apps
Collections
• Groupings of files that are to be treated as logical units
• Can be named and described
Data search
• Automated metadata harvesting
  • From Globus endpoints
  • Event-driven extraction/synthesis
• Rich search capabilities
  • Free text, faceted, boosted
Thank you to our sponsors
U.S. DEPARTMENT OF ENERGY
And the Globus team at the University of Chicago and Argonne, in particular: Rachana Ananthakrishnan, Ben Blaiszik, Kyle Chard, Raj Kettimuthu, Ravi Madduri, Brigitte Raumann, Steve Tuecke, Vas Vasiliadis
We have constructed a new global-scale data fabric that accelerates discovery by streamlining scientific data sharing and analysis
• Globus-enabled storage systems enable robust, secure access
• Globus cloud services implement transfer, sharing, publication, discovery, and other capabilities
We are now working to extend this fabric to:
• Enable distributed computation as well as data movement
• Use distributed computation to map data without movement
• Work with sensitive data