Arweave and the idea of the permaweb

This post looks at the pros and cons of building a permaweb. There are some great reasons to have an authenticated copy of everything ever published to the internet, but is that really how the internet is supposed to work?

One of the defining qualities of a blockchain is that it is immutable. To be immutable means that data recorded on a blockchain can’t be changed or altered. It’s a fancy way of saying a blockchain is a permanent record.

There are a lot of reasons why having a permanent record is important. Censorship resistance is a big one. If something is immutable, it can’t be changed or altered by people or organizations in power.

Immutability is part of what helps people trust while operating with trustless systems. This might sound like a mind-bender, but another core attribute of blockchain-based transactions is that you don’t need to trust the other side of a transaction because you can verify all of the relevant information beforehand.

So part of trustless transactions is that we have faith in the underlying reporting and accuracy of the blockchain. We can trust the transaction record because it's immutable and transactions can’t be fabricated.

This is all a build-up before we start talking about the idea of a permaweb and the move to extend immutability to other facets of onchain activities. What I mean specifically is the idea of permanently archiving text-based posts (such as blogs and internet writing like what you are reading right now) onchain.

Archiving text posts has become the default for web3 publications. The preferred method (so far the only method that I’m aware of) is to archive published posts on a blockchain called Arweave.

A clip of the Arweave home page

The mission of Arweave is to build out a permaweb by archiving and making accessible all of the information published on the internet. Data stored on the permaweb isn’t limited to text-based posts. Other stuff can be stored there too, like decentralized apps and other digital file types such as video and images.

But for the most part, what we are talking about in what follows is the idea of saving text-based posts to the permaweb as a default. Does it make sense that every time you hit publish, the thing you are publishing also gets recorded to the permaweb?

On one hand, this sounds like a good idea that fits with the ethos of immutability and is in support of building a censorship-resistant internet. A permanent immutable archive of internet writing can become a library of sorts — a cornerstone of our collective knowledge.

At the same time, there is something that feels at odds about the idea of the permaweb and the fluid and iterative nature of the internet. I think there is a strong case to be made that not every blog post or update published on the internet should be permanently saved or archived.

While I am a proponent of censorship-resistant tech and an advocate for an open internet, I’m also struggling with the idea that we need a permaweb or that everything needs to become part of the permaweb.

Let’s dive in.

What is Arweave? And what is the permaweb?

Arweave is a blockchain that is designed to store data. Most blockchain designs link each new block only to the block immediately before it. Arweave’s consensus mechanism also links new blocks to older blocks. This setup ensures that miners validate and track data across multiple blocks, which has the practical effect of making data both permanent and retrievable.
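To make the linking idea concrete, here is a minimal sketch of a chain where every block commits to the previous block and to one older block. It is a toy model, not Arweave’s actual consensus code: the names `Block`, `pick_recall_height`, `append_block`, and `is_valid` are all hypothetical, and the rule for choosing the older block is an assumption made purely for illustration.

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Block:
    height: int
    data: str
    prev_hash: str      # hash of the block directly before this one
    recall_height: int  # height of an older block this block also links to (-1 for the first block)
    recall_hash: str    # hash of that older block

    @property
    def hash(self) -> str:
        fields = (self.height, self.data, self.prev_hash, self.recall_height, self.recall_hash)
        return sha256_hex("|".join(str(f) for f in fields))


def pick_recall_height(prev_hash: str, height: int) -> int:
    # Assumption for this sketch: derive the older block's height from the previous hash.
    return int(prev_hash, 16) % height


def append_block(chain: list, data: str) -> None:
    if not chain:
        chain.append(Block(0, data, "", -1, ""))
        return
    prev = chain[-1]
    height = len(chain)
    recall = chain[pick_recall_height(prev.hash, height)]
    chain.append(Block(height, data, prev.hash, recall.height, recall.hash))


def is_valid(chain: list) -> bool:
    # Every block after the first must match both the previous block and its older linked block.
    for i in range(1, len(chain)):
        block, prev = chain[i], chain[i - 1]
        if block.height != i or block.prev_hash != prev.hash:
            return False
        if block.recall_height != pick_recall_height(prev.hash, i):
            return False
        if block.recall_hash != chain[block.recall_height].hash:
            return False
    return True


chain = []
for post in ["first post", "second post", "third post", "fourth post"]:
    append_block(chain, post)
print(is_valid(chain))  # True

# Quietly editing an old post breaks the links that later blocks hold to it.
old = chain[1]
chain[1] = Block(old.height, "edited post", old.prev_hash, old.recall_height, old.recall_hash)
print(is_valid(chain))  # False
```

Because validating a new block means looking at an older block as well as the latest one, whoever adds blocks in this toy model has to keep the old data around, which is the intuition behind data being both permanent and retrievable.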

Arweave is designed specifically to be a blockchain for data. As mentioned earlier, the goal of the blockchain is to create a permaweb or a permanent internet-based archive of information.

Here’s an example of a piece of content I published on HackerNoon last year:

An image of a text post saved to the permaweb via Arweave.

The idea behind the permaweb is to have a purpose-driven tech stack that allows for archiving different kinds of content. The process of publishing to the permaweb is pretty streamlined, and in the case of the HackerNoon piece, or on popular web3 publishing platforms like Paragraph, it just happens via an integration.

The creator of the piece doesn’t decide whether the work should be archived forever. It just happens. And so the biggest issue is not with the underlying tech, which is useful, but the assumption that we need to save everything, forever.

The main components of data stored on the permaweb are similar to the components you might see if you engage in a blockchain transaction. The transaction number is like a wallet address and makes it possible to find the content again in the future using a blockchain explorer tool.

Other metadata associated with a permaweb post include a timestamp, block height (location on a blockchain), tags (notes on the file type), a signature (used to authenticate the contents of the media), a “from” field (where the media originated), and the cost associated with adding the media to the permaweb (a negligible amount, which makes this whole thing work).
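The fields described above can be pictured as a simple record. The sketch below is only an illustration: the `ArchivedPost` class, its field names, and the sample values are hypothetical and are not Arweave’s real transaction format. A plain SHA-256 digest also stands in for the signature here, whereas a real signature additionally proves who published the content.

```python
import hashlib
from dataclasses import dataclass


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


@dataclass(frozen=True)
class ArchivedPost:
    tx_id: str           # the transaction number used to look the post up again
    timestamp: int       # when the post was archived (Unix seconds)
    block_height: int    # location on the blockchain
    tags: dict           # notes on the file type
    content_digest: str  # stand-in for the signature that authenticates the contents
    sender: str          # the "from" field: where the media originated
    fee: float           # cost of adding the media to the permaweb


def matches_archive(record: ArchivedPost, content: bytes) -> bool:
    # A copy is authentic only if it hashes to the digest stored with the record.
    return digest(content) == record.content_digest


body = b"Arweave and the idea of the permaweb"
record = ArchivedPost(
    tx_id="example-transaction-id",    # placeholder, not a real transaction
    timestamp=1700000000,              # placeholder value
    block_height=1000000,              # placeholder value
    tags={"Content-Type": "text/html"},
    content_digest=digest(body),
    sender="example-wallet-address",   # placeholder, not a real wallet
    fee=0.0001,                        # placeholder value
)

print(matches_archive(record, body))         # True
print(matches_archive(record, body + b"!"))  # False
```

The second check shows why the signature field matters: even a one-character change to the post no longer matches what was archived.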

An image of a data statistics dashboard from an Arweave blockchain explorer.

What does this all mean?

The idea of creating a post about Arweave and the permaweb has been on my to-do list for a long time. One of the main reasons is that I was hoping it would force me to reach a firm conclusion about the necessity of the permaweb or, more specifically, about the idea that the permaweb should be the default for the future of publishing and web3 media.

I finally decided to write this post now because I thought that it would be a good companion to the post I wrote earlier this week about web3 publishing.

Moving words onchain with Paragraph
The media business needs a new model. Will onchain publishing be it?

Read more about web3 publishing.

Here’s the breakdown of costs and benefits to build a permaweb as far as I understand them:

Let’s start with the benefits:

Censorship resistant

An immutable ledger is like a system of accountability. The need for a technology that captures and archives publishing on the internet is becoming increasingly important as we face the rise of weaponized misinformation.

A permaweb gives us a default or a standard and can provide a baseline to build other kinds of information and data services that rely on an authenticated source of information. We don’t need to live in a post-truth world, we just need better tools to help us understand what is authentic and what is not.

Verifiable chain of events or point of origin

Newspapers and local news organizations served many functions. Their most important role was acting as a “newspaper of record,” a credible and authoritative source of information.

As newspapers and news-gathering organizations decline, the need for a publication of record or some kind of place to find the source of a piece of information is vital to a well-functioning society. To me, it makes sense to create the future “paper of record” on a globally distributed, open blockchain like Arweave.

A new kind of library that makes content indexable and findable well into the future

Conceptually, the idea of the permaweb leads to the creation of a massive, global library of information. Right now, searching the budding permaweb is difficult and cumbersome. It has the same problems that early blockchain wallets and addresses had: the information is mostly accessible only if you have a long transaction number that you can input into a blockchain explorer.

Over time, it’s not hard to imagine that the permaweb becomes easier to search and use and that people will be able to find information based on several filters or that there is a specific search engine built for the sole purpose of surfacing things in the archive.

Eventually, the long string of numbers and letters used to represent each piece of content on the permaweb will become some kind of human-readable address, like a content title and brief description. But right now, none of that exists. The library is still very much in the early days.

Here are some of the costs of making the permaweb a publishing default:

Should everything on the internet be permanent?

One of the greatest attributes of the internet is that it is kind of like a journal that is constantly being updated. New work is derived from old work. New information becomes trends, trends become memes, and in some way, the constant chewing and digesting of the global information network helps us comprehend the information landscape around us.

While it might make sense to archive or have a permanent record of certain kinds of information — like the output of professional news organizations or from independent reporters — do we need to store most of the stuff that is published with the intent that it will be consumed and instantly forgotten? Shouldn’t there be space on the internet for the constant revisions and creative mess that makes it so valuable, and so interesting?

Creates a flat information architecture

Right now, only the best content becomes enduring on the internet. The best stuff rises to the surface because people link to it, or refer to it, or build upon it. In reality, that kind of content, the kind we refer back to and understand as the beginning of something, is rare. Very little content is durable. Most of it serves its purpose of grabbing fleeting attention and then fades into the background.

Where do we go from here?

I haven’t even mentioned how permanently storing the world’s ideas and output will play in the age of generative AI. Rather than breaking the tie between the arguments for and against the permaweb, I think the rise of AI only makes these decisions harder.

On one hand, it seems like having a verifiable creator or being able to source a point of origin for content on the internet will be especially useful as the world becomes more awash with copycat and generic ramblings from a robot.

On the other hand, having some kind of archive of what people are thinking and creating also feels like a giant honeypot or the perfect machine learning training ground.

Maybe there are permissions on access to an archive like Arweave, but having a permissioned system gets us back to square one. If we are going to have a permaweb, it should be open. An open system, especially an open information system, can be tremendously valuable. It’s just not clear yet who, or what, will benefit the most from something like that.

As the internet ages and matures, the idea of digital posterity becomes more important. Platforms will come and go, links will break, and websites will disappear, but if we had an archive and a way to store data onchain permanently, we would not risk losing the origin stories of the internet, or the origin stories of anything, ever again.