Storage of versioned websites for browsing on Safe Network

Related to the proposal for a Safe NRS/DNS, I’ve now split out the proposal for storing the websites themselves to this new topic, to allow separate discussions of each proposal.


Goals for Website Publishing

The proposal below provides for website publishing with the following properties.

  • published websites will:

    • provide versioned, local-first, concurrent editing (based on the Safe RegisterCRDT)
    • be referred to using "safe://domain" (in conjunction with a Safe NRS/DNS)
    • use conventional human readable paths in URLs (e.g. “/index.html”, “/images/logo.jpg” etc)
    • incorporate a map to enable a client (e.g. web browser) to map path based URLs to xor addresses
  • developers of static websites:

    • can use existing web frameworks and tools, with hot reloading etc.
    • can keep the URLs within the website itself human readable
    • can use http(s):// within embedded URLs during development, which will be converted to safe:// on publication
    • can use an app (perhaps a new subcommand of the Safe CLI) to create a name for the website and upload/update the files of the website

Local Development

Website development can use the same website development tools and frameworks used currently to generate static websites, such as Svelte, Eleventy, Jekyll etc.

There may be a need to add some configuration information to the project to control publishing (such as the NRS name for the website) but this will not interfere with existing website development tools or workflow.

Publishing

Publishing will be performed by building the website file tree ready for deployment as normal, followed by publishing to Safe Network using a custom tool (such as the Safe CLI with new subcommands). Publishing involves uploading the files, and creating or updating the structures which will enable a client to retrieve and display the website. The publishing process is as follows:

  1. Generate the website ready for uploading, as normal, as a tree of static HTML and related files (e.g. images).
  2. Create a Website structure by uploading each file to Safe Network and recording the relative URL of the file and its xor address in a map.
  3. Add website metadata such as author, publication date etc. (perhaps based on settings in the project’s ‘package.json’ file).
  4. Serialise the Website structure (map and metadata) and upload it to Safe Network.
  5. Create (if not already created) an NRS register (based on a RegisterCRDT) whose address is determined by the human readable NRS name of the site (e.g. by taking a hash of the name).
  6. Insert the xor address of the Website structure into the NRS register and sync that to the network.

Resolving A Web URL

Resolving a web URL involves two stages:

  1. using the ‘domain+subdomain’ to obtain the address of the structure that holds the content of the website
  2. using the content structure to locate the data referred to by the path (e.g. a file of HTML stored at an xor address)

Step 1 is outside the scope of this proposal.

Resolving the path to content could be done in different ways and will depend on how the data referred to is stored and what metadata is available to locate it. For example, if xor addresses were used in the URLs, locating the data would be relatively straightforward, but those URLs would be difficult for humans to understand.

On the other hand, using the conventional path and original filenames from the development tooling requires an extra step to map those, using some metadata, to the xor location of the content on Safe Network.

This proposal favours the second method as a way to achieve the goals listed at the start.

In more detail, the client uses the NRS name to retrieve the corresponding NRS register. From the register it can obtain the address of the latest Website structure (or an earlier version if required). It retrieves this structure from the network and uses it to resolve the human readable URL to an xor address, from which it can load the file. Any parameters applied to the URL will be made available to the page through a scripting API compatible with existing web conventions.

Similarly for URLs contained within the HTML files of the website when loading other pages, images, following links etc.

Website Metadata

In addition to metadata for the website which would reside in the Website structure, it may be desirable to store metadata for each file that makes up the website, such as Content-Type (cf. HTTP headers), creation date, modification date, author and so on. This could be entirely optional and supported by using a WebFile data structure to hold the datamap pointer and metadata as a collection of text variables.

This could be implemented by adding metadata to the entries held in the Website map of paths to files.

Some of this metadata could be encoded within the XOR address of the file, in the way proposed in PR #337.

If the standard file datamap were to support metadata that might be used instead, but I don’t think that’s likely to be the case.

Why Diverge from Safe Folders API?

This design diverges from the Safe Folders API being created for syncing a tree of files with the network. The reason is to support the design goals set out at the beginning, in particular versioning of the tree without embedding xor addresses in the HTML URLs. That is important because it:

  • enables use of existing web tools and frameworks without modifications
  • improves readability of deployed HTML for debugging websites and web apps
  • simplifies merging concurrent edits
  • simplifies retrieval of different historical versions of the website

As I understand it, embedding of xor addresses in the HTML will be required in order to retrieve historical versions under the Safe Folders API, because it uses one register per directory in the tree, each with its own independent history. The versions of each directory can be retrieved, but there’s no way to know which version of each directory corresponds to the versions of all the other directories.

The implementation proposed earlier uses a single register and maintains a versioned map of file URLs to xor addresses for the whole file tree (or website).

Note that the xor addresses will be the same for both approaches so there’s a degree of compatibility, but I don’t think it will be easy or advantageous to upload the files using the Safe Folders API and then generate the mapping needed above. If that is in fact possible it is worth exploring so that in addition to the design goals, a published website would also be accessible via other applications using the Safe Folders API.

An alternative folders API implementation that supported versioning of the whole tree, such as one using a TreeCRDT, might also simplify access to published websites as if browsing a filesystem.

Related documents

These are I believe the latest relevant documents but I haven’t reviewed them, and I’m not sure how current or applicable they are now.

  • November 2018 - Naming System

    • forum discussion: [RFC 052] Public Name System: Resolution and RDF
    • RFC #052: RDF for the Public Name Resolution System
  • August 2019 - XOR-URL encoding

    • PR #337: Define our own content-id encoding format for XOR-URLs
    • forum post: XOR-URL RFC discussion

I haven’t looked into working with registers yet, but my hot take is that the concept of a website should be aligned with files / folders API. A (static) website is just a collection of files after all.

For example, being able to retrieve a file by its name from the CLI seems similarly important to doing the same via a web browser. (Edit: FUSE/ file system mounts would also need the file names too)

It feels like sn_client should be able to resolve these URLs internally, then retrieve based on XOR URLs to maximise caching benefits. Integrations could then just choose to use XOR or NRS naming.

IIRC, when I was looking at the IMIM blog, I used the old NRS implementation to retrieve a map of XOR URLs, which represented the website. When something changed, a new map was created and NRS was updated to point to it. The blog pages themselves still used XOR URLs though. This meant one lookup, and lots of caching was possible. It also meant viewing an old version of the site was straightforward too.

I suppose the question is whether resolving a site vs each file is desirable. The latter may make sense with today’s tooling, but it may not be optimal or specific enough in a Safe-oriented world. It can also be a source of 404s and inconsistent content, which we aim to avoid. Is exposing XOR URLs actually desirable in this context?


I wonder if a tool to take a static site and convert it to a ‘safe network ready’ static site would be handy?

Such a tool could take a static site folder as input, then output an XORed folder. Obviously, dependencies would need to be XORed first, then the files that depend on them second.

We would then get the benefits of XOR (caching, immutability, etc), but the ease of use of a traditional static site generator.

Ofc, we would still need NRS to map to the website root (i.e. index page), but the rest would be straightforward for the publisher. For site consumers, the experience would be the same, other than the location bar looking XOR like.

Edit: probably something for another thread, but it is worth considering in context of NRS.