Decentralized storage has long been proclaimed as the backbone for Web 3.0, the next-gen Internet, but its capabilities and general accessibility for the masses have been consistently limited by the lack of incentive structures. These are vital for providing users with guarantees that data will be kept available and delivered promptly when needed, something that is otherwise very difficult, if not outright impossible, to promise in a truly decentralized environment.

Since the emergence of blockchain, various projects have begun exploring its application as an incentive layer for decentralized storage markets, with several noteworthy storage solutions beginning to fight centralized hosts for market dominance across a number of application scenarios.

In this article, we’ll introduce the main concepts of decentralized data storage and consider its potential applications, using them to compare Neo’s native solution, NeoFS, with other related and well-known projects in the blockchain industry—Sia, Filecoin, and Swarm.

Why use decentralized data storage?

Before diving further into comparisons between the various solutions, we should lend context by considering the advantages and challenges involved with decentralized storage solutions, in contrast to their centralized counterparts.

The simplicity and low maintenance of cloud storage resulted in a mass migration of data to centralized servers, both for casual users and businesses with large storage needs. Due to economies of scale, this has led to the emergence of massive data silos, predominantly owned and operated by tech giants such as Amazon, Microsoft, IBM, and Google.

Though competition between corporations ensures that users are provided with a number of service providers to choose from, the nature of the services themselves often attracts concern over the potential for censorship or misuse of private data. The shift to cloud storage also creates an increased opportunity for data theft; in its 2019 report, the EU’s lead privacy regulator reported a 71% increase in valid data breaches compared to 2018.

Annual data breaches and exposed records in the US from 2005-2019 (Source: Statista.com)

Decentralized storage networks aim to disrupt the existing cloud market in several ways. Firstly, most of these networks operate on free market principles with open participation. This means that anyone can participate in the network, and rather than risking a single point of failure, data is replicated to multiple nodes across the distributed network.

Blockchain integration also enables the natural inclusion of public-key cryptography. Data is usually encrypted before being stored with a host, decipherable only by its rightful owner and any parties the owner has chosen to share with. This process can make these services more resistant to censorship and manipulation, and can render any data lost in a breach useless to an attacker.

Further, the integration with blockchain technology provides access to an incentive layer, which can be used to reward good behavior or punish malicious activity. This allows these platforms to make use of the global and borderless nature of cryptocurrency by serving a global market.

Finally, decentralized storage networks make a strong case as cost efficient solutions. Unlike the vast overheads attributed to the operation of data silos, decentralized networks make use of the unused storage capacity found in end user devices all around the world. Users are incentivized to contribute storage to the network, ideally resulting in an abundance of supply that can drive down prices long-term.

On the downside, the technical complexity of decentralized storage introduces significant problems that must be addressed; otherwise, it may not be possible to provide end users with an experience comparable to that offered by existing centralized services.

Notable challenges for these networks to face include raw scalability, the construction of the incentive infrastructure in order to sustain an open marketplace, and ensuring data integrity across a dynamic, globally-distributed network.

Decentralized storage projects

Despite their commonalities, decentralized storage solutions come in many different shapes and sizes, often with unique priorities and target markets. Let’s take a second to introduce the four projects under discussion today:

NeoFS

NeoFS originated as a proposal for an integrated, distributed storage protocol in the original Neo whitepaper. The inclusion of a native data storage network was intended to provide further ability for applications to decentralize, achieved by granting them the ability to store, retrieve, and manage off-chain data.

The final design for the system was created by Neo St Petersburg Competence Center (Neo SPCC), a Russian R&D team formed to support the Neo ecosystem and develop a true decentralized cloud platform.

NeoFS sources storage space from any user, with commodity or enterprise-grade hardware, who can rent out unused HDD/SSD capacity to the network in return for GAS. Data placement is calculated deterministically through the use of rendezvous hashing over a network multigraph. This removes the need for unnecessary metadata transfer between nodes, making the solution extremely scalable despite the distributed environment.
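
To illustrate why deterministic placement avoids metadata exchange, here is a minimal sketch of plain rendezvous (highest random weight) hashing. It is not NeoFS's implementation: it ignores the network multigraph entirely, and the node names, the `score` and `place` functions, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def score(node_id: str, object_id: str) -> int:
    """Pseudo-random weight for a (node, object) pair; identical on every machine."""
    digest = hashlib.sha256(f"{node_id}|{object_id}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def place(object_id: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Pick the `replicas` nodes with the highest weight for this object."""
    ranked = sorted(nodes, key=lambda node: score(node, object_id), reverse=True)
    return ranked[:replicas]


# Hypothetical node names, for illustration only.
nodes = [f"node-{i}" for i in range(10)]
chosen = place("holiday-photos.tar", nodes)
```

Anyone who knows the node list and the object identifier computes the same answer, so no lookup table has to be shared. Removing a node that was not chosen for an object leaves that object's placement unchanged, which keeps data movement low as the network changes.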

The solution also introduces an interactive zero-knowledge proof protocol, which is based on homomorphic hashing to preserve data integrity across the network asynchronously. This protocol provides consistent auditing; a node that fails the audit is not paid, which discourages nodes from trying to game the network by deleting data.
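
The article does not say which homomorphic hash NeoFS uses, so the following is only a toy illustration of the property such a hash offers: the hash of two concatenated chunks can be computed from the chunk hashes alone. The sketch maps each bit to a 2x2 matrix and multiplies them modulo a prime; the modulus `P`, the generator matrices in `GEN`, and all function names are assumptions chosen for the example and are not secure parameters.

```python
P = 2_147_483_647  # toy prime modulus (2**31 - 1); not a secure choice

# One generator matrix per bit value (illustrative assumption).
GEN = {0: ((1, 1), (0, 1)), 1: ((1, 0), (1, 1))}
IDENTITY = ((1, 0), (0, 1))


def mat_mul(a, b):
    """Multiply two 2x2 matrices modulo P."""
    return (
        ((a[0][0] * b[0][0] + a[0][1] * b[1][0]) % P,
         (a[0][0] * b[0][1] + a[0][1] * b[1][1]) % P),
        ((a[1][0] * b[0][0] + a[1][1] * b[1][0]) % P,
         (a[1][0] * b[0][1] + a[1][1] * b[1][1]) % P),
    )


def bits(data: bytes):
    """Yield the bits of `data`, most significant bit first."""
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def homomorphic_hash(data: bytes):
    """Hash = ordered product of the generator matrices selected by each bit."""
    result = IDENTITY
    for bit in bits(data):
        result = mat_mul(result, GEN[bit])
    return result


first, second = b"first chunk", b"second chunk"
combined = mat_mul(homomorphic_hash(first), homomorphic_hash(second))
```

Because matrix multiplication is associative, `combined` equals `homomorphic_hash(first + second)`. An auditor holding only hashes can therefore check that the pieces reported by different nodes are consistent with the hash of the whole object, without transferring the data itself.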

Sia

Conceived in 2013, Sia was designed to disrupt cloud storage, using blockchain to create an open marketplace for users to purchase or rent out unused data storage capacity.

Participants agree on storage terms such as capacity and duration requirements, which are settled as cryptographic service level agreements known as file contracts. These are processed and completed automatically on the Sia blockchain.

Since its first stable release in June 2016, Sia has made consistent progress in both the development of its software and the growth of its storage network. With an initial focus on archival applications, the network currently stores over 800 TB of data and has over 300 active hosts distributed around the globe, according to siastats.info.

Filecoin

Despite still being in an experimental stage, Filecoin arguably brings the most pedigree to the comparison. The initiative was launched by Protocol Labs, creators of the popular InterPlanetary File System (IPFS), a peer-to-peer storage network launched in 2015 in an attempt to revolutionize the way data is distributed on the Internet.

Unlike Hypertext Transfer Protocol (HTTP), the current web standard where data is requested from a specific location at which it is hosted, IPFS requests specify only the cryptographic hash of the data. This can be thought of as a fingerprint-like identifier.

Known as content-addressing, this technique allows users to request a desired piece of content directly by its identifier, and be connected to any nodes that can serve it. However, this does require that at least one node makes the effort to store that particular piece of content.
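
As a rough illustration of content-addressing, the sketch below stores blocks under the hash of their own bytes. It is a simplification rather than how IPFS works internally: real IPFS identifiers use a multihash-based encoding and large files are split into linked blocks, whereas this example uses a bare SHA-256 hex digest. The `ContentStore` class is hypothetical.

```python
import hashlib


class ContentStore:
    """Toy in-memory block store keyed by the hash of each block's content."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The identifier is derived from the data, not from where it is stored.
        content_id = hashlib.sha256(data).hexdigest()
        self._blocks[content_id] = data
        return content_id

    def get(self, content_id: str) -> bytes:
        data = self._blocks[content_id]  # raises KeyError if no node stored it
        # The requester can verify the bytes against the identifier it asked for.
        if hashlib.sha256(data).hexdigest() != content_id:
            raise ValueError("content does not match its identifier")
        return data


store = ContentStore()
page_id = store.put(b"<h1>hello</h1>")
page = store.get(page_id)
```

The same bytes always produce the same identifier, so any node holding the block can serve it and the requester can check the result without trusting that node. The `KeyError` path mirrors the weakness described above: if nobody stores the block, the identifier alone cannot bring it back.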

Since the release of IPFS, this has proven to be its key weakness—there needs to be an incentive for a node to pin content and guarantee its availability. This is where Filecoin comes into play, designed as the blockchain-based incentive layer for IPFS. Though yet to officially launch, in an April roadmap update the team reported crossing over 5 PB of proven storage on its TestNet. According to the same source, Filecoin is expected to launch its MainNet between July and August, 2020.

Swarm

Designed as the base layer for Ethereum’s Web3 stack, Swarm is a decentralized storage and communication infrastructure with a censorship- and DDoS-resistant design. The network takes the form of a distributed, content-addressed chunk store, making use of Ethereum’s devp2p network layer for peer discovery and communication. Ethereum smart contracts are also used to build out its incentive infrastructure, the Swarm Accounting Protocol (SWAP), which charges nodes for requesting resources and rewards them for serving resources.

Through SWAP and its address-key based retrieval protocol, Swarm intends to create an environment where nodes looking to maximize their profitability will naturally cache and store content that they can serve frequently. At the time of writing, Swarm’s incentivization scheme is only available for opt-in on the TestNet, with most data being stored altruistically at this time.

The low-level differences in approach taken by each project can make it difficult to compare them directly. To get around this issue, we’ll divide the potential use cases into three main groups and compare how each project serves the application scenario.

Backup and archival

The simplest application of distributed storage is also one of the most common today. Cloud storage has become widely popular with individuals and businesses alike, providing a resource for those looking to retain important files long-term or add redundancy to their setup in case of a disaster.

To compete with today’s cloud services, decentralized storage providers intending to meet this application scenario will typically prioritize raw capacity over performance or bandwidth. This makes it an ideal use case for open, distributed storage networks where unused space from any device can be sold on the open market, driving down prices through an abundance of supply.

Since NeoFS, Sia, Filecoin, and Swarm all provide similar infrastructure to support open data storage markets, they are all well positioned to benefit from this type of use case. Likewise, each of the projects provides a data replication mechanism to ensure persistence, countering the element of uncertainty inherent in decentralized networks. Each protocol adopts a similar approach, using erasure coding or similar techniques to split up data, then distributing copies to multiple nodes on the network for redundancy.
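
To give a feel for how erasure coding trades extra storage for resilience, here is a minimal sketch using a single XOR parity shard. It is the simplest possible scheme and survives the loss of only one shard; it is not the code used by any of the four projects, and the `encode` and `decode` helpers are hypothetical names for this example.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int = 2) -> list:
    """Split data into k equal shards and append one XOR parity shard."""
    shard_len = max(1, -(-len(data) // k))  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def decode(shards: list, original_len: int) -> bytes:
    """Rebuild the data when at most one shard (marked None) is missing."""
    missing = [i for i, shard in enumerate(shards) if shard is None]
    if len(missing) > 1:
        raise ValueError("a single parity shard can repair only one loss")
    shards = list(shards)
    if missing:
        present = [shard for shard in shards if shard is not None]
        # XOR of all surviving shards equals the missing one.
        shards[missing[0]] = reduce(xor_bytes, present)
    return b"".join(shards[:-1])[:original_len]


data = b"decentralized storage"
shards = encode(data, k=3)      # 3 data shards + 1 parity shard
shards[1] = None                # simulate a host going offline
restored = decode(shards, len(data))
```

Here four shards are stored on four hosts at roughly 1.33 times the original size, compared with 2 times for a full second copy. Codes that tolerate several simultaneous losses follow the same idea with more parity shards.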

To guarantee that data is kept accessible by the nodes, incentives for good behavior and periodic data integrity checks (such as Filecoin’s proposed Proof of Replication or NeoFS’s zero-knowledge data validation mechanism) make it cheaper to follow the rules than it is to attempt to trick the network.
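
The link between integrity checks and payment can be sketched with a much simpler scheme than either mechanism named above. In this example, which is neither Filecoin's Proof of Replication nor NeoFS's zero-knowledge audit, the data owner precomputes a few single-use challenges before uploading, and a host is paid only when it answers one correctly. The `StorageNode` class, the `audit_and_pay` function, and the fee value are assumptions for illustration.

```python
import hashlib
import hmac
import os


def proof(nonce: bytes, data: bytes) -> bytes:
    """Answer to a challenge; computing it requires the full data."""
    return hashlib.sha256(nonce + data).digest()


def prepare_challenges(data: bytes, count: int) -> list:
    """Owner precomputes (nonce, expected answer) pairs before uploading."""
    challenges = []
    for _ in range(count):
        nonce = os.urandom(16)
        challenges.append((nonce, proof(nonce, data)))
    return challenges


class StorageNode:
    """Hypothetical host that answers challenges from whatever it still holds."""

    def __init__(self, data: bytes):
        self.data = data

    def respond(self, nonce: bytes) -> bytes:
        return proof(nonce, self.data)


def audit_and_pay(node: StorageNode, challenge, fee: int) -> int:
    """Return the payout for this period: the fee if the audit passes, else 0."""
    nonce, expected = challenge
    passed = hmac.compare_digest(node.respond(nonce), expected)
    return fee if passed else 0


data = b"quarterly-backup" * 64
challenges = prepare_challenges(data, count=2)
honest = StorageNode(data)
cheater = StorageNode(data[:100])  # quietly dropped most of the file
honest_payout = audit_and_pay(honest, challenges[0], fee=5)
cheater_payout = audit_and_pay(cheater, challenges[1], fee=5)
```

Because the nonce is unpredictable, a host cannot store a ready-made answer in place of the data, so deleting the file means forfeiting every future payout. The drawback of this toy version is that the owner must keep a supply of unused challenges, which is one reason real networks use more elaborate proofs.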

The NeoFS zero-knowledge data auditing game to discover corrupt or malicious nodes (Source: NeoFS Website)

Of the mentioned networks, Sia currently maintains dominance over the backup and archival use case, being the only one of the discussed protocols with an active MainNet. This position is reinforced by another unique strength of Sia: its seed-based file recovery service, which allows users to create a snapshot of current files stored on the network.

This seed allows users to recover these files at any time and from any location, making it a powerful tool. However, the team notes that the solution is not quite “fire and forget,” as users must ensure that file contracts remain active in order to restore files. This means the user must manually access Sia every few weeks to renew contracts, or set aside an allowance for auto-renewal.

Web hosting & content distribution

Sometimes the goal isn’t primarily to retain data for a long time, but is instead to quickly serve data to users. This is a typical scenario for a web frontend, and also applies to frequently accessed content such as music, video, or game streams.

To meet the needs of these users and compete with data centers, storage providers for these applications may prioritize performance (e.g., SSD over HDD) and high bandwidth volume. Further, placing content as close to users as possible helps reduce the time taken for a retrieval request to be fulfilled.

When it comes to optimizing data placement, Ethereum’s Swarm presents a particularly elegant solution. Through its node synchronization and caching mechanism, Swarm is designed to act as an “autoscaling elastic cloud,” where the growth in popularity of a particular piece of content will increase the number of nearby nodes caching the chunk, in turn helping to optimize the routing for end users by reducing the average number of hops for any given request.

The end result is a distributed system that naturally configures itself for rapid distribution, making it a powerful contender for this class of application scenarios, especially for services that already integrate with Ethereum in other ways.

Example of how caching can optimize content delivery time for end users (Source: Incapsula)

Filecoin also adopts an incentive mechanism for ensuring fast content delivery. This is achieved by separating the network into storage and retrieval markets, helping differentiate between types of storage providers—those offering high capacity, and those focused on fast retrieval. This helps users select providers that can more closely match their needs, providing the infrastructure for IPFS to be used as a responsive CDN/web hosting service.

Despite its strong position in the data archival market, serving data has historically been a weakness of Sia, one that the team aimed to address in February with the introduction of Skynet, a Layer 2 network for Sia.

Skynet operates through Skynet portals, usually privately-used modified Sia nodes, and Skynet Webportals. Webportals are publicly accessible servers equipped with a web UI, allowing users to upload or access Skynet content without requiring any additional software, similar to the protocol gateways of NeoFS.

Leveraging the Sia backend for storage, Skynet is intended to add file sharing and content distribution facilities for end users, using the improved latency from the portals to deliver requests in a timely manner.

Both of these approaches can also be seen in NeoFS. Like Filecoin, the open marketplace allows for customer self-optimization when it comes to initial data placement. Storage nodes on NeoFS can define their geographic locations, storage type, capacity, and prices when joining the network, giving important information for renters to take into consideration when placing data. For example, a company may want to specifically target US-based storage nodes in order to help deliver data more quickly to US users.
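
The kind of self-optimization described here can be pictured as a filter over the attributes that nodes advertise. The sketch below is not the NeoFS placement policy syntax or API; the `NodeInfo` fields, the `select_nodes` function, and the sample nodes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeInfo:
    """Attributes a storage node might advertise when joining the network."""
    node_id: str
    country: str
    storage_type: str  # e.g. "SSD" or "HDD"
    capacity_gb: int
    price_per_gb: float


def select_nodes(nodes, country, storage_type=None, min_capacity_gb=0, count=3):
    """Return the cheapest nodes that match the renter's requirements."""
    candidates = [
        node for node in nodes
        if node.country == country
        and (storage_type is None or node.storage_type == storage_type)
        and node.capacity_gb >= min_capacity_gb
    ]
    return sorted(candidates, key=lambda node: node.price_per_gb)[:count]


# Made-up nodes for illustration.
network = [
    NodeInfo("a", "US", "SSD", 500, 0.020),
    NodeInfo("b", "US", "HDD", 4000, 0.005),
    NodeInfo("c", "DE", "SSD", 1000, 0.018),
    NodeInfo("d", "US", "SSD", 250, 0.015),
]
us_ssd = select_nodes(network, country="US", storage_type="SSD", count=2)
```

With these sample values, `us_ssd` contains nodes `d` and `a`, cheapest first. A renter serving US users from fast storage would accept the higher SSD price, while an archival user could drop the `storage_type` filter and take the cheaper HDD node instead.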

Neo SPCC has also created a CDN service to further improve performance and latency. Similar to Skynet, content stored on NeoFS can be requested via an independent CDN network layer, using caching and geo-locating to optimize data delivery. In the future, network participants will be able to use or host their own NeoFS.CDN Edges, providing healthy endpoints to serve content as quickly as possible, even if the data is stored in another region entirely.

An example of the NeoFS.CDN in action can be seen in the Send.NeoFS service. Currently available for public testing, users may upload files with a specified lifetime to the NeoFS TestNet and share them via link to demo the service.

Though it will likely take further development before any of the discussed networks can reliably challenge today’s cloud services in terms of consistent delivery time and cost, each has demonstrated that this is almost certainly a question of “when,” not “if.”

Interact with off-chain data through smart contracts

The creation of smart contract platforms provided developers with an environment designed for trustless execution of business logic. However, this only covers one facet of a particular application. Users may often be required to access a service through a centralized web frontend, re-introducing trust requirements, and projects may operate or rent their own servers to handle any additional data required by the app.

Though some of these use cases can be met by protocols designed with content distribution in mind, as noted in the previous section, blockchain applications still frequently depend on off-chain data stores, bringing with them concerns regarding trust. To allow the creation of truly decentralized applications, distributed off-chain storage can be used both to host user interfaces and to replace these centralized backends.

In an ideal scenario, a developer could write code into the smart contract that requests a piece of data from an off-chain source, acts on the data, and then either persists the changes or moves on. The problem is that smart contracts are inherently limited to the instruction and data set provided by the execution environment.

Storage operations will be passed through the Neo3 built-in oracle service to allow smart contracts to interact with decentralized off-chain storage (Source: Neo SPCC)

Put simply, if the virtual machine does not have the ability to access off-chain services, then data must instead be stored on-chain in order to be accessed by a contract. This in turn means that the data needs to be deployed alongside the contract, or accumulated over time by aggregating it through transactions.

Getting data on-chain in this way provides thorough redundancy, but this is accompanied by an extreme price per byte that is unlikely to meet the needs of many developers, particularly as larger and larger sets of data are required.

Though Sia and Filecoin have both expressed interest in making off-chain data directly accessible to smart contracts through cross-chain bridges, proper integration with a blockchain VM or oracle service is necessary before contracts can be designed to interoperate with the storage networks.

NeoFS is the first of the known decentralized storage networks to attempt to provide this functionality. Following completion of Neo’s native oracle service, contracts running through NeoVM will be able to request and even manipulate data stored off-chain on NeoFS. This can provide a functional alternative to centralized backends while avoiding the high costs and often unnecessary redundancy associated with on-chain storage.

This integration makes NeoFS the only known decentralized storage system able to meet this application scenario, facilitating the creation of what may become the first truly decentralized applications—where both the backend database and user-facing frontend of an app can be served in a decentralized manner.

Summary

In summary, the recent convergence of blockchain technology and distributed storage has resulted in the creation of several promising decentralized storage networks, each with the incentive layers required to ensure their accessibility and viability. These networks represent a disruptive force in the cloud sector, though their use across the full range of potential application scenarios is still experimental at this time.

Moving forward, Neo SPCC also intends to further improve its tech stack through the addition of Neo Lambda, a service for decentralized serverless computing. Just as NeoFS allows applications to move their data storage needs off-chain, Neo Lambda intends to allow them to offload heavy computation, furthering the team’s goal of creating a true decentralized cloud platform.