Network Health Metrics

mav · January 30, 2018, 12:37am

I’ve been unable to comment on a lot of the recent activity in this thread but it’s really fascinating.

I agree.

It’s hard to reason about the security model without a clear understanding of safecoin and how it will affect vault behaviour / incentives. Will it be better to have many small vaults, a few large ones, or somewhere in between? We can’t know until the safecoin algorithm is clearly defined.

The design of security and incentives seems to come down to how the ‘health’ of the network is defined and measured. I don’t feel the definition of health is clear (at least not to me), and thus the design work here and elsewhere is not as well directed as it could be. The existing work is a very interesting exploration and learning exercise, but eventually it must become directed and measurable.

I’ve been working on a spreadsheet of measurable metrics for network health (see this spreadsheet, comments welcome). Maybe it’s not so helpful, I don’t know, but hopefully it can clarify what the targets are rather than some abstract idea of ‘optimum’.

Network Health is defined as ‘the ability to securely store and deliver data’.

These metrics indicate how well or poorly the network maintains ‘the ability to securely store and deliver data’.

Health is an ongoing assessment to correlate real-world actions with desired consequences.

GET performance
PUT performance
Section Prefix Length Similarity
Chunk Distribution Variation
New Vault Rejection Rate
New Vault Acceptance Rate
Unexpected Departure Rate
Expected Departure Rate
Relocation Rate
Vault Age Distribution
Section Farm Rate Similarity
Section Vault Count Similarity
Age Distribution of Unexpected Departures
Safecoin Allocation Rate
Total Vault Count

Eventually the goal is to have a monitoring program that can measure these aspects of the network and alert if poor health is detected and maybe trigger human (ie vault operator) intervention.
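A minimal sketch of what such a monitoring check might look like, assuming each metric is reduced to a single number and compared against an acceptable range. The metric keys, the `Threshold` type, and the range values are hypothetical and chosen only for illustration; none of them come from the spreadsheet or the vault software.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Acceptable range for one metric (hypothetical values, not network defaults)."""
    low: float
    high: float


def check_health(readings, thresholds):
    """Return one alert message per reading that falls outside its range.

    Readings without a configured threshold are ignored.
    """
    alerts = []
    for name, value in readings.items():
        limit = thresholds.get(name)
        if limit is None:
            continue
        if not (limit.low <= value <= limit.high):
            alerts.append(f"{name}={value} outside [{limit.low}, {limit.high}]")
    return alerts


# Assumed ranges, for illustration only.
thresholds = {
    "unexpected_departure_rate": Threshold(0.0, 0.05),
    "new_vault_rejection_rate": Threshold(0.0, 0.5),
}

# Assumed sample readings; total_vault_count has no threshold and is skipped.
readings = {
    "unexpected_departure_rate": 0.12,
    "new_vault_rejection_rate": 0.2,
    "total_vault_count": 1000,
}

alerts = check_health(readings, thresholds)
for message in alerts:
    print(message)
```

Whether an alert goes to a human operator or to an automated response is a separate decision; the check itself is the same either way.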

Qi_Ma · January 30, 2018, 3:13am

Hi @mav: a really useful document of network health metrics.
Just wondering, are Rate Of New Clients and Total Stagnant Clients more about popularity than health?

Also, I would prefer not to have a monitoring program that triggers human intervention when poor health is detected.
Instead, could the vaults themselves monitor the metrics and adjust themselves to recover on their own?

mav · January 30, 2018, 9:58pm

Yes, ‘popularity’ is a better description of these than ‘health’. But I do consider a network that was once popular but now unpopular to indicate poor health. It can’t be directly addressed by vault operators but I’d still consider it a useful indicator.

Totally agree. I think having a ‘vault commander’ automation program will be valuable to vault operators. The operator can configure it to start / stop / adjust vaults in an optimum way for the preferences of the operator (tuned for maximum profit or network stability or efficiency etc).

But the configuration of the commander is managed by a human so I still call it ‘human intervention’.

For example, safecoin price is a measure of network health, because it contributes to the viability and profitability of vaults for the operator. So the operator may have part of their vault management strategy as automated trading on an exchange to stabilise the price keeping their vaults viable and thus contributing to network stability.

Very large operators may have a strategy to ‘create a futures market’ or ‘legalise safecoin’ or other very complex high-level activities, which ultimately ends up assisting the future health of the network and thus future profits.

I think the scope of the spreadsheet is most useful if constrained to operations that can be initiated by the vault software.

Qi_Ma · January 31, 2018, 4:16am

Popularity could be affected by many factors, such as marketing, or even a single super client that provides a service to other users, so that it looks like just one client is connected to the network.
That’s why I prefer not to make it a health metric that could affect network behaviour.

The configuration (or the algorithm to generate a dynamic config) will hardly change once the vault executable is published, i.e. the way the health metrics affect network behaviour will remain the same during runtime.
The phrase ‘human intervention’ gives the impression that humans can affect network behaviour during runtime, and I prefer to avoid this.

Hmmm… not sure about this large operators idea. It might be good, but it could also give the impression that the network is in the hands of some super-guy and can be manipulated in any way that benefits him.

mav · February 5, 2018, 4:44am

Number Of Clients is used in RFC-0012 Safecoin Implementation - Establishing StoreCost

StoreCost = FR * NC / GROUP_SIZE

Where NC is “the total number of client (NC) accounts (active, i.e. have stored data, possibly paid)”
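As a rough illustration of how the client count feeds into the formula quoted above, here is a small sketch. It takes FR to be the farming rate from the RFC, and the numeric values for FR, NC and GROUP_SIZE are made up for the example; they are not real network parameters.

```python
def store_cost(farm_rate, client_count, group_size):
    """StoreCost = FR * NC / GROUP_SIZE, as quoted from RFC-0012."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return farm_rate * client_count / group_size


# Hypothetical inputs, for illustration only.
FARM_RATE = 0.5      # assumed FR
GROUP_SIZE = 8       # assumed group size

cost_1000_clients = store_cost(FARM_RATE, 1000, GROUP_SIZE)
cost_2000_clients = store_cost(FARM_RATE, 2000, GROUP_SIZE)

print(cost_1000_clients)  # 62.5
print(cost_2000_clients)  # 125.0
```

With FR and GROUP_SIZE held fixed, the cost scales linearly with NC, so doubling the number of active client accounts doubles the store cost.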

Since number of clients is directly used in the calculation of store cost, it seems like it should form a health metric and is not just a matter of popularity. Does this sound correct?

Qi_Ma · February 5, 2018, 1:25pm

My understanding here is that network health and safecoin farming (store cost) don’t need to be fully coupled, and number of clients is one such case.
There could be metrics that affect only one of them and not the other.

jlpell · February 6, 2018, 2:09pm

If these are the objective outputs to maximize/minimize it would also be nice if you/we could identify all of the inputs in the form of vault settings which determine the performance output. These could be used in simulations along with a multi-objective genetic algorithm (NSGA-II) in order to optimize network performance and identify good defaults. Taking it a step further, there may also be a way for the nodes to operate the genetic algorithm in real-time and share information in order to constantly adapt/optimize themselves and thus dynamically optimize the whole network.
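To make the multi-objective idea a little more concrete, here is a minimal sketch of the Pareto-dominance comparison that methods such as NSGA-II are built on. This is not NSGA-II itself (there is no population, crossover, mutation or crowding distance), and the candidate names and objective values are invented for the example. Both objectives are assumed to be minimised.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    All objectives are treated as 'lower is better'.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    }


# Hypothetical vault configurations scored on two assumed objectives:
# (GET latency in ms, unexpected departure rate).
candidates = {
    "config_a": (120.0, 0.02),
    "config_b": (100.0, 0.05),
    "config_c": (150.0, 0.06),
}

front = pareto_front(candidates)
print(sorted(front))  # ['config_a', 'config_b']
```

Here `config_c` is worse than `config_a` on both objectives and is dropped, while `config_a` and `config_b` each win on one objective, so neither can be called better without a preference between the two.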

mav · February 6, 2018, 9:26pm

I think many of these metrics do not lead to specific targets, just indicators. Sort of like how you don’t aim to minimize bacteria in humans but you do need to monitor for sudden changes.

So maybe these inputs should be aiming at three categories of output: maximize / minimize / stabilize
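A small sketch of how the three categories might be represented, with a simple check for the ‘stabilize’ case that flags a sudden change relative to recent history. The goal assigned to each metric, the z-score approach, and the limit of 3.0 are all assumptions made for illustration, not something defined in the spreadsheet.

```python
from enum import Enum
from statistics import mean, pstdev


class Goal(Enum):
    MAXIMIZE = "maximize"
    MINIMIZE = "minimize"
    STABILIZE = "stabilize"


# Assumed assignments, for illustration only.
GOALS = {
    "get_performance": Goal.MAXIMIZE,
    "unexpected_departure_rate": Goal.MINIMIZE,
    "total_vault_count": Goal.STABILIZE,
}


def sudden_change(history, latest, z_limit=3.0):
    """True if latest is more than z_limit standard deviations from the history mean."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A perfectly flat history: any difference counts as a change.
        return latest != mu
    return abs(latest - mu) / sigma > z_limit


# Hypothetical recent vault counts.
history = [100, 102, 98, 101, 99]

print(sudden_change(history, 101))  # False
print(sudden_change(history, 120))  # True
```

A ‘stabilize’ metric like this has no target value; the monitor only cares whether the latest reading is out of line with what came before.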