Maximum AppendOnlyData size

bzee · October 21, 2019, 12:55pm

As the AppendOnlyData (AOD) is a continuation of the old MutableData, is the size limit for an AOD also 1 MiB? And is the number of entries also still limited to 100?

If so, does this mean a website cannot have more than 100 versions? If the AOD has reached its size limit, what happens? Is this AOD to be abandoned? Is there a protocol to find a new AOD?

happybeing · October 21, 2019, 12:58pm

The limit for MD was raised to 1000 entries.

The design can’t limit AOD because of the issue you raise, but I’ll leave @Maidsafe to confirm that. It will be nice when we have an explainer about these with pictures to highlight the behaviour and conceptual structure!

lionel.faber · October 21, 2019, 1:01pm

There are no limits on the number of entries for Mutable / AppendOnly data.
The size is not limited to 1 MB either.

bzee · October 21, 2019, 1:13pm

Thanks for answering, @lionel.faber!

Is this just how it’s currently implemented or is this how it is envisioned to be?

I’m not an expert on the technical network fundamentals, but it does leave me with a few questions. Perhaps you could expand a bit more on the reasoning behind this choice to deviate from the earlier-set limits.
If an AOD/MD grows larger than 1 MiB, will this not strain vaults? As I’ve always understood it, Immutable Data is split into chunks for various reasons (performance, reliability, etc.). Wouldn’t an AOD of, say, 1 GiB put one of these characteristics in danger?

lionel.faber · October 22, 2019, 6:49am

It’s already implemented in this way @bzee

Your question is a valid one. The reasoning behind removing this limit was that even with the limit in place vaults could still be under strain / spam. Let me explain a bit:

The size limit for mutable data was introduced so that a single section could not be bombarded with large amounts of data. But even with this size check in place, it’s technically still possible to bombard a section with loads of data. All you need to do is keep putting 1 MB of mutable data at sequential addresses. Since the section that holds mutable data is determined by the prefix of the address, it’s quite likely that all that data will accumulate in a single section, and there you have it: a spam attack.
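To make the prefix argument concrete, here is a minimal sketch, not the vault's actual routing code. It assumes addresses are 256-bit integers and that a section is identified by the leading bits of the address; the 8-bit prefix length and the helper names `section_prefix` and `sections_hit` are hypothetical, chosen only for illustration.

```python
from collections import Counter

ADDRESS_BITS = 256  # assumption: addresses are 256 bits wide
PREFIX_BITS = 8     # hypothetical prefix length; real sections vary


def section_prefix(address: int, prefix_bits: int = PREFIX_BITS) -> str:
    """Return the leading `prefix_bits` of `address` as a bit string."""
    return format(address >> (ADDRESS_BITS - prefix_bits), f"0{prefix_bits}b")


def sections_hit(addresses):
    """Count how many addresses fall under each section prefix."""
    return Counter(section_prefix(a) for a in addresses)


# 1000 sequential addresses starting somewhere inside section 10110101.
start = 0b10110101 << (ADDRESS_BITS - PREFIX_BITS)
sequential = [start + i for i in range(1000)]
hits = sections_hit(sequential)
```

Because incrementing an address only changes its low-order bits, every one of the 1000 sequential addresses keeps the same prefix, so `hits` contains a single section holding all of them.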

Since scenarios like this will need to be properly handled anyway, and the size limit on mutable data was restricting developers to a certain extent, the size limit was removed.

So your next question is probably: what happens when I try to PUT a huge MD / AD now?

It depends on your connectivity to the shared vault. Safe Client Libs has the timeout set to 180 seconds, so if your request goes through and a response is sent within that timeframe you’ll get a success; otherwise you’ll get a RequestTimeout error.

But on the bright side, you can still keep adding entries to an MD / AD if the size exceeds 1 MB, no problem. Retrieving some of these entries will still be possible, as the payload will not be too large. IIRC, during testing with the phase 1 vault, we were able to fetch up to ~300 MB in a single payload. If your mutable data entries hold more data than that, you can still code your application to fetch entries in batches. A pagination of sorts.
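A rough sketch of that batching idea follows. It is not the Safe Client Libs API: `fetch_in_batches` and the `fetch_range(start, end)` callback are hypothetical names, and the in-memory `fake_fetch_range` only stands in for whatever call an application would use to request a range of entries.

```python
from typing import Callable, Iterator, List


def fetch_in_batches(
    fetch_range: Callable[[int, int], List[bytes]],
    total_entries: int,
    batch_size: int,
) -> Iterator[List[bytes]]:
    """Yield entries in consecutive batches of at most `batch_size`.

    `fetch_range(start, end)` is a hypothetical callback returning the
    entries with indices in [start, end).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, total_entries, batch_size):
        end = min(start + batch_size, total_entries)
        yield fetch_range(start, end)


# In-memory stand-in for the network, for illustration only.
entries = [f"entry-{i}".encode() for i in range(10)]


def fake_fetch_range(start: int, end: int) -> List[bytes]:
    return entries[start:end]


batches = list(fetch_in_batches(fake_fetch_range, len(entries), 4))
```

Each request then carries at most `batch_size` entries, so the payload per request stays bounded however large the data grows.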

JPL · October 22, 2019, 11:22am

Going back to @bzee’s example here, what if the file is larger than the average vault’s spare capacity? At what point will it be split?

lionel.faber · October 24, 2019, 7:59am

Good point @JPL
While there could be multiple potential ways of handling this, I’m afraid I can’t give you a concrete answer here. But this is certainly something we will be handling in the vaults from home phase.

Vaults phase 1 was purely to show the working of the new data types and their potential use-cases in applications. And it does just that with the showcasing of the perpetual web etc.

With a solid amount of work going into routing, vaults phase 2 is to demonstrate how requests go via consensus using the newly introduced PARSEC. It will be initially a single section and then go on to be multiple sections. Vaults from home will come in after this.

digipl · October 25, 2019, 9:03am

An AOD will always have the same XOR address and, at least with the current design, will never split. With MDs, and now with their variants, the problem of generating unbalanced sections has always existed.

happybeing · October 25, 2019, 9:50am

I preface this with: I have no idea how this works, but…

While the AOD always has the same XOR address, that doesn’t mean all the data it holds is at that address, especially now that we have the Perpetual Web, because really AOD is just a more complex type of Immutable Data.

So just as an immutable file isn’t all in one place once it is split into chunks, I would expect AOD to be the same.

Aha! You may say: but what about MD? Now there you may have a point, although I can still imagine the MD being like a read/write data map, so that once it gets over a certain size it can be broken up and spread around. However, I think the need for it to be writeable and deletable makes this more tricky, so who knows.

digipl · October 25, 2019, 10:24am

Although all published data types are immutable, the AOD family is, in my opinion, a derivative of MD, not IM, as long as all the data and all its history are concentrated at the same XorName. And with the current design, whatever its size, it always stays concentrated in the same section.

github.com

lionel1704/rfcs/blob/af3a93800faf40cd3c1d6f63fbdc5b3ee9d8891b/text/0054-published-and-unpublished-data/0054-published-and-unpublished-data.md

# Published and Unpublished DataType

- Status: proposed
- Type: enhancement
- Related components: safe_client_libs, safe_vault
- Start Date: 16-05-2019
- Discussion: https://safenetforum.org/t/rfc-54-published-and-unpublished-datatype/28620

## Summary

This document describes how to enhance the data types to allow the network to store Unpublished data via the `MutableData` type, or Unpublished or Published data via the `AppendOnlyData` type, and when these data types shall be used.

## Conventions
- The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](http://tools.ietf.org/html/rfc2119).

## Motivation

### Published Data

Published data refers to the content that is published (made available) for everyone. For example: websites, blogs, or research papers which anyone can fetch from the network and read without requiring any permission. For such public content, it becomes important to retain a history of changes. History MUST not be allowed to be tampered with and the published data MUST remain forever.

rob · October 25, 2019, 12:14pm

I was under the impression that they would still be split once they reached around 1 MB.

Not sure what method they were going to use for the addresses of the parts.