Any probs with Alpha 2 (Core error: Routing client error -> Requested data not found)

happybeing · August 23, 2018, 2:22pm

I’m getting the error Core error: Routing client error -> Requested data not found when trying to access the _public container using this._safeJs.auth.getContainer('_public')

This code was working and I’ve not made any significant changes, so I just want to check if there are any recent problems with the alpha 2 network.

I’m using Peruse-v0.6.0-linux-x64 on Ubuntu - but as I said this was all working a few days ago. I’ve tried it on two different accounts.

Viv · August 23, 2018, 9:29pm

Ah think we may indeed have an issue here @happybeing

Just ran a quick test and found 16 nodes offline (none of them proxy nodes). I remember @StephenC confirming yesterday that all nodes were fine. Just running an invariant check right now to confirm the network state from the routing point of view.

I was able to log in to my account too. However, a section or so might still have data loss if enough nodes went offline together ofc (if only parsec was there for alpha-2). So we could still be having partial data loss even if the network recovered (that is, if the invariant check passes).

We were also looking at https://blog.digitalocean.com/a-message-about-l1tf/ from DigitalOcean recently, which could have played a part if some network inactivity occurred. That could make sense, as some offline node logs end with:

D 18-08-22 17:16:16.792881 [routing::states::node node.rs:3761] Node(395310..(0011)) Lost all routing connections.
D 18-08-22 17:16:16.792989 [routing::state_machine state_machine.rs:418] State::Node(395310..(0011)) Terminating state machine

Gonna take some time to get some details unfortunately. Will keep this thread updated ofc.

Viv · August 23, 2018, 11:51pm

Quick Update:

Had a poke about and there seems to be some networking weirdness that went on with the droplets for sure (not confirmed what yet though). This seems to have triggered some node loss, leading to large scale data replication, which triggered further throttle control from DO itself … leading to more node loss.

Could get the routing network invariant restored, however not sure if all data is back available yet (some options exist to try and recover that too). Would wanna be sure about what caused this from DO in the first place though, so that it doesn’t just repeat itself. Hopefully should have some more updates tomorrow.

dugcampbell · August 24, 2018, 4:06pm

Hi @happybeing, just to let you know, there’s an update on this that’s just been posted on the main forum (https://safenetforum.org/t/please-read-digital-ocean-maintenance-issue/25098)

happybeing · September 4, 2018, 12:25pm

@Viv I’m getting the same error again with code that was working on Sunday:

safeApp.auth.getContainer('_public') gives:
Core error: Routing client error -> Requested data not found

I suspect this is my code because whm seems to work fine, but am a bit baffled as to how I could be messing up such a simple thing (even after reverting to code that was working), so just want to check that there are no known problems with the network again?

After seeing this on my test account I created another (with no data uploaded) but see the same error consistently on both accounts. I’ve also rebooted my machine in between and I get this error every time in my app, but never when I list the folder in whm.

DGeddes · September 4, 2018, 1:09pm

Stephen’s running some tests now… we’ll report back!

DG

DGeddes · September 4, 2018, 1:23pm

Quick update. We’ve run:

- a check to ensure all nodes are online
- the invariant check scripts

and all are coming back clear, so there doesn’t seem to be any problem with Alpha2 network nodes.

So something else is up…

happybeing · September 4, 2018, 3:19pm

I’m mystified by this one (nothing new there) so have filed an issue:

DGeddes · September 4, 2018, 3:27pm

I’ve asked the front-end gurus to give us a hand.
DG

bochaco · September 4, 2018, 7:16pm

@happybeing, have you tried rolling back your changes to the point where it was working before? I see you say they were unrelated changes, so this would just be to confirm that. I see this is all part of a larger flow in your lib; can you provide a simplified code sample which can be used to reproduce the issue?

happybeing · September 4, 2018, 7:50pm

Yes I’ve rolled back the changes to Sunday in both repos (see tag wip-getcontainerbug-goodcode) even though none of them seem able to affect this area. This wasn’t happening as late as yesterday evening, so I’m as sure as I can be that it isn’t due to my changes to the code. This is why I wondered if it was problems with the network again - especially as the symptom was identical to the first time.

Also, I haven’t touched the auth code at all in months. I was making superficial changes to safe-containers.js in safenetworkjs.

I put that test getContainer('_public') call into bin.js to check before it gets to any of the code I’ve been working on.

Don’t you think the error itself is infeasible? How can a request to access _public result in that error? I really don’t know how to debug this.

The error looks suspicious to me which is why I’ve asked for help. I could create a simplified example, but I expect it will work - how could it not? So I think whatever the issue is that I’m triggering, it is worth finding out what is actually happening to cause that particular error message, because I think it is either a bug in the API, or reporting the wrong error, regardless of how I’m causing it to happen (if indeed I am - as I say, reverting to the earlier code hasn’t changed this).

I don’t know how to debug that so if you can’t help it will delay me while I learn how. If you can’t help debug, can you maybe try reproducing it?

bochaco · September 4, 2018, 8:01pm

I’ll try to reproduce it and get back to you.

happybeing · September 4, 2018, 9:19pm

Thanks. FYI I just tried cloning both repos and checking out the development branch (latest commit) to make sure I hadn’t accidentally corrupted anything (e.g. in the node_modules) but the problem is still there.

bochaco · September 4, 2018, 10:13pm

@happybeing, I was able to reproduce it. It’s just that you are getting an authorisation URI from Peruse against the live (alpha2) network, since you are running it with --live, but you are then actually trying to connect to the mock network, since you are running safenetwork-fuse with NODE_ENV=dev in this command: DEBUG=safe-fuse*,safenetworkjs* NODE_ENV=dev node --inspect-brk bin.js

I just tried removing NODE_ENV=dev from that command and the error is gone. After that I sometimes get the Utf8("invalid utf-8: corrupt contents") error, but that’s been solved in safe-app-nodejs v0.9.1 (you’d need to upgrade). After retrying I don’t see any other error, but I’m not sure what happens then; it seems the debugger loses the connection.
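The mismatch described above (a live auth URI used by a client running with NODE_ENV=dev) could be caught at startup with a small guard. This is only a minimal sketch: `networkModeFromEnv` and `assertNetworkMode` are hypothetical helper names, not part of safe-app-nodejs, and the sketch assumes, as reported in this thread, that NODE_ENV=dev is what selects the mock network.

```javascript
// Hypothetical startup guard; not part of any SAFE library.
// Assumption (from this thread): NODE_ENV=dev selects the mock network,
// anything else means the live network.
function networkModeFromEnv(env) {
  return env.NODE_ENV === 'dev' ? 'mock' : 'live';
}

// Throws if the environment would select a different network
// than the one the caller expects to be using.
function assertNetworkMode(expected, env = process.env) {
  const actual = networkModeFromEnv(env);
  if (actual !== expected) {
    throw new Error(
      `Network mode mismatch: expected ${expected} but NODE_ENV=${env.NODE_ENV} selects ${actual}`
    );
  }
  return actual;
}
```

Calling something like `assertNetworkMode('live')` near the top of bin.js, before any auth request is made, would turn a stray NODE_ENV=dev into an explicit error instead of a confusing "Requested data not found".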

happybeing · September 5, 2018, 8:22am

Ah you are a life saver Gabriel, thanks so much. I’ve been wracking my brain trying to think what could have changed. I can see how that happened: usually I grab that command from the shell history but after a reboot I will have copied and pasted it from my notes.

Phew and yipeee! I can get back to work. Thanks again.

happybeing · September 5, 2018, 9:19am

Sometimes you need a second pair of eyes - it might have taken me days to spot this. Which made me think.

I could certainly do this again, so will make notes to help me spot it, and others developing standalone (desktop) apps will likely do this too. So I wonder if it would be worth detecting and flagging it: could the auth URI include a flag (mock) that is checked in the client library when the URI is used?

I’m guessing we haven’t seen this before because NODE_ENV was the only way to select mock/live, but with CLI also an option it will be easy for desktop app devs to do this.

What do you think - worth me creating a feature request?

bochaco · September 5, 2018, 12:29pm

It could be done, as you said: if the safe_authenticator lib can encode a bit/flag in the URI when it detects it’s working with mock, then the safe_app lib can decode it and throw a specific error if the flag doesn’t match what it expects (set when it shouldn’t be, or missing when it should be). This would need to be done in the safe_client_libs, so we’d need @nbaksalyar’s point of view?
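A rough sketch of that idea, purely for illustration: the real auth URI format is defined by safe_client_libs and is not reproduced here. The `mock=` query marker and the functions `tagAuthUri` and `checkAuthUri` are made up for this example.

```javascript
// Illustrative only. The 'mock=0|1' marker is an invented encoding,
// not the actual safe_authenticator / safe_app URI format.

// Authenticator side: append a marker saying which network issued the URI.
function tagAuthUri(uri, isMock) {
  const sep = uri.includes('?') ? '&' : '?';
  return `${uri}${sep}mock=${isMock ? 1 : 0}`;
}

// App side: compare the marker with the network the client is built/run for.
function checkAuthUri(uri, clientIsMock) {
  const match = /[?&]mock=([01])(?:&|$)/.exec(uri);
  if (!match) return; // untagged URI: nothing to check
  const uriIsMock = match[1] === '1';
  if (uriIsMock !== clientIsMock) {
    const from = uriIsMock ? 'mock' : 'live';
    const to = clientIsMock ? 'mock' : 'live';
    throw new Error(
      `Auth URI was issued for the ${from} network but this client is using ${to}`
    );
  }
}
```

With a check like this, the situation in this thread (URI from a --live browser, client running against mock) would fail with a message naming both networks, rather than a generic routing error.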

nbaksalyar · September 11, 2018, 1:58pm

Sorry for a late reply on this.

@happybeing, this is a good suggestion, thank you! I created a Jira task for it and it should be implemented as a part of a new Client Libs version release.
