Repositories and content addressable storage

Repositories and content addressable storage
A data repository needs to (among other things)
● Make sure data remains safe and uncorrupted
● Make sure data remains available
● Keep the previous version whenever data is changed
Solutions are available, but...
● Links to data break: how to make sure that once a link is created it never breaks?
○ Who keeps track of what is where?
● What if two files have different names but the same content (duplication)?
● Dealing with unexpected events
Many solutions have used centralized systems
● Single point of failure, single entity in control
● What about doing all of the above at scale? Big data etc.

Repositories and content addressable storage
Possible solution: distributed and content addressed storage
Distributed = a resource is controlled by many. No single place, person, server, or entity has full control
Location addressed = things can be found based on a known location
● C:\Photos\vacation.jpg
● The identifier changes when the file is moved or renamed, even though the content doesn’t: C:\Pictures\Vacation\waterslide.jpg
Content addressed = things can be found based on their content
● Create a digital fingerprint of vacation.jpg based on its content.
● The fingerprint stays the same no matter where the file physically resides
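The difference between the two addressing schemes can be sketched in a few lines of Python. This is a minimal illustration, assuming SHA-256 as the fingerprint function (the slides do not name one); the `fingerprint` helper and the byte string standing in for the photo are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a hex SHA-256 digest of the content (assumed fingerprint function)."""
    return hashlib.sha256(content).hexdigest()


# Hypothetical stand-in for the bytes of the photo; only the paths come from the slide.
photo = b"pretend these bytes are a JPEG image"

# The same content stored under two different location-based identifiers.
locations = {
    r"C:\Photos\vacation.jpg": photo,
    r"C:\Pictures\Vacation\waterslide.jpg": photo,
}

fingerprints = {path: fingerprint(data) for path, data in locations.items()}

# The paths differ, but the content-based identifier is identical.
same_fingerprint = len(set(fingerprints.values())) == 1

# Changing even one byte of the content yields a different fingerprint.
changed_fingerprint = fingerprint(photo + b"!") != fingerprint(photo)
```

Here `same_fingerprint` and `changed_fingerprint` both evaluate to `True`: the identifier follows the content, not the location.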

Repositories and content addressable storage
Content addressable storage
● Fingerprint (hash) always stays the same = uniquely identify, de-duplicate
Distributed content addressable storage
● Decentralizes the table that keeps track of where the raw data associated with the fingerprints physically resides
○ Uses many participants, each having equal responsibility
● No single point of failure, e.g., no single entity controls the lookup table
● Links can stay around as long as the network exists.
● Can use the resources of participants to keep safe copies of the data and use their bandwidth to speed up transfers
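De-duplication falls out of content addressing almost for free, as the following toy store suggests. It is a single-process sketch, not a distributed system: the `ContentStore` class and its method names are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self):
        # Maps fingerprint (hex digest) -> raw bytes.
        self._blobs = {}

    def put(self, content: bytes) -> str:
        """Store content under its own hash and return that hash."""
        key = hashlib.sha256(content).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self._blobs.setdefault(key, content)
        return key

    def get(self, key: str) -> bytes:
        """Fetch content and verify it still matches its fingerprint."""
        content = self._blobs[key]
        if hashlib.sha256(content).hexdigest() != key:
            raise ValueError("stored content does not match its fingerprint")
        return content

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
key_a = store.put(b"same bytes")       # first copy
key_b = store.put(b"same bytes")       # duplicate under a different "name"
key_c = store.put(b"different bytes")  # genuinely new content
```

After these calls `key_a == key_b` and the store holds two blobs, not three. The check in `get` also shows why corruption is detectable: the key doubles as a checksum.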

IPFS
IPFS is a content addressed distributed storage protocol
● A single file system that is spread out on many computers (nodes)

IPFS and repositories
Generally:
● IPFS is a protocol rather than a service
● The nodes form a distributed file system based on P2P technology (e.g., DHT for lookups)
● Versioning and de-duplication are fundamentally part of it
● Files are broken down into blocks.
○ Each block has a hash.
○ Blocks are linked in a tree-like structure.
Some interesting properties:
● Can build services on top of it (client, server)
● Can access IPFS content via standard HTTP using gateways (see figure) or FUSE.
● Objects can be “pinned” so they aren’t garbage collected and always stay local
● Possibility to create a private IPFS network (via modification of the bootstrap list). Easy, quick
“IPFS is actually more similar to a single bittorrent swarm exchanging git objects.”
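The idea of splitting files into hashed blocks linked in a tree can be sketched as follows. This is a simplified illustration, not IPFS's actual chunker or identifier format: the 4-byte block size, the pairwise tree layout, and the helper names are all assumptions chosen to keep the example small.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def root_hash(data: bytes, block_size: int = 4) -> str:
    """Hash each block, then hash pairs of hashes upward until one root remains."""
    level = [sha256_hex(block) for block in split_into_blocks(data, block_size)]
    if not level:
        return sha256_hex(b"")
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            # A parent's hash is derived from the hashes of its (one or two) children.
            children = level[i:i + 2]
            parents.append(sha256_hex("".join(children).encode("ascii")))
        level = parents
    return level[0]


original = b"abcdefghijkl"
edited = b"abcdefghijkX"  # only the last block differs

blocks_original = split_into_blocks(original, 4)
blocks_edited = split_into_blocks(edited, 4)

# Blocks that are byte-identical in both versions need to be stored only once.
shared_blocks = sum(a == b for a, b in zip(blocks_original, blocks_edited))
```

Two of the three blocks are shared between the versions, while `root_hash(original)` and `root_hash(edited)` differ. That is the sense in which versioning and de-duplication come with the structure: a new version gets a new root, but unchanged blocks are reused.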

IPFS Gateway

What IPFS isn’t
A cloud storage service, backup protocol.
● Can’t upload stuff and disconnect
Files must remain available by “pinning” them.
● Unpinned files get deleted after some time
● Who will pin files in addition to the owner?
○ Other interested parties?
A blockchain-based system
Blockchain = immutable, publicly available & verifiable record of transactions
Can work with blockchain
● Incentives for providing node resources
● “Mining” a cryptocurrency for reward
● Storing data in a blockchain is inefficient.
○ Combine the two: store transactions in the blockchain, data in IPFS.
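Why pinning matters for availability can be shown with a toy model of a single node. This is a sketch under stated assumptions, not the IPFS implementation: the `Node` class, its `add` and `collect_garbage` methods, and the sample contents are hypothetical, and garbage collection is triggered explicitly here rather than after some time.

```python
import hashlib


class Node:
    """Toy node with a local block cache and a pin set (hypothetical API)."""

    def __init__(self):
        self.blocks = {}     # hash -> content held locally
        self.pinned = set()  # hashes protected from garbage collection

    def add(self, content: bytes, pin: bool = False) -> str:
        key = hashlib.sha256(content).hexdigest()
        self.blocks[key] = content
        if pin:
            self.pinned.add(key)
        return key

    def collect_garbage(self) -> int:
        """Drop every unpinned block and return how many were removed."""
        unpinned = [key for key in self.blocks if key not in self.pinned]
        for key in unpinned:
            del self.blocks[key]
        return len(unpinned)


node = Node()
kept = node.add(b"dataset v1", pin=True)  # the owner pins what must stay available
dropped = node.add(b"cached page")        # merely cached, never pinned
removed = node.collect_garbage()
```

After collection only the pinned block remains on the node. If no node in the network pins a given hash, nothing guarantees the content stays retrievable, which is why the question of who else will pin a file matters for repositories.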
