Scalable content indexes

drehb · August 18, 2017, 2:11pm

Continuing the discussion from here (https://safenetforum.org/t/safe-tube-release/15690/32), what are your thoughts on building scalable content indexes? If we have an MD that holds a list of pointers to other content, MDs need to be linked together once we reach the maximum number of entries for an MD (at most 100, minus the entries required for linkage). Users need to download this entire content index to search for what they are looking for.
As suggested by @happybeing in the linked thread, various data structures could be used to do this linkage of MDs. Using a linked list would probably be easiest, but then all of the MDs would need to be downloaded serially. Using a tree would allow for parallelization of MD downloads. Would this be able to scale for a large application like YouTube though? Would we need to worry about concurrency issues when multiple users are trying to add content at the same time? Any other thoughts on how to accomplish this type of thing?
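To make the list-versus-tree trade-off concrete, here is a minimal sketch, not a real SAFE API. It assumes the 100-entry limit mentioned above and uses a plain dict as a stand-in for the network; `store`, `fetch`, `build_tree` and `download_index` are hypothetical names made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_ENTRIES = 100  # assumed per-MD entry limit, taken from the post above

# Hypothetical in-memory stand-in for the network: MD name -> MD contents.
store = {}

def fetch(name):
    """Stand-in for a network GET of one MD."""
    return store[name]

def build_tree(items, fanout=MAX_ENTRIES):
    """Pack a non-empty list of items into leaf MDs, then link them under
    parent MDs, level by level, until a single root remains."""
    level = []
    for i in range(0, len(items), fanout):
        name = f"leaf-{i // fanout}"
        store[name] = {"kind": "leaf", "entries": items[i:i + fanout]}
        level.append(name)
    depth = 0
    while len(level) > 1:
        depth += 1
        parents = []
        for i in range(0, len(level), fanout):
            name = f"node-{depth}-{i // fanout}"
            store[name] = {"kind": "node", "entries": level[i:i + fanout]}
            parents.append(name)
        level = parents
    return level[0]

def download_index(root):
    """Fetch the tree level by level; each level is one parallel round."""
    items, rounds, frontier = [], 0, [root]
    with ThreadPoolExecutor(max_workers=16) as pool:
        while frontier:
            rounds += 1
            next_frontier = []
            # pool.map keeps the input order, so items come back in order.
            for md in pool.map(fetch, frontier):
                if md["kind"] == "leaf":
                    items.extend(md["entries"])
                else:
                    next_frontier.extend(md["entries"])
            frontier = next_frontier
    return items, rounds

videos = [f"video-{n}" for n in range(25_000)]
root = build_tree(videos)
items, rounds = download_index(root)
print(len(items), "entries in", rounds, "parallel rounds")
```

In this sketch the whole index still has to be downloaded, but the number of sequential round trips grows with the depth of the tree rather than with the number of MDs. A linked list holding the same 25,000 entries would need at least 250 fetches, one after another.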

@Joseph_Meagher

Vort · August 18, 2017, 2:35pm

It seems to me that the most important problem here is security.
With a single object, the developer can create it before the service launches.
This makes it possible to prevent users from deleting data, for example.
But who will create the extended parts of the object in the case of a list, tree, etc.?
If this task has to be performed manually, the developer will get tired of it very quickly.
Allowing anyone to extend the object (via client-side JS) will let any user break the structure of the combined object.
Is there any solution to this problem? Or am I missing something and this is not a problem?
(sorry for my bad English)

happybeing · August 18, 2017, 2:48pm

I didn’t envisage the developer pre-creating these structures, always the app being run by the user. Although you may be right that there needs to be a reference MD created in the first place if this is to be a shared index with a single starting point. My question is should that always be the case? I can imagine there are alternatives to that, but we are just not used to thinking in a decentralised way so it’s less easy to come up with them. I’ll come back to this in a moment.

So with a single shared ‘index’ you are right that this creates potential difficulties. I think they can be dealt with but I have not got a solution in my pocket. My thoughts are that we can allow anyone (i.e. every user running the app) to build an index that can be found by others (also users running the app) which can be incorporated in (i.e. linked to from) each user’s own index, or not if it isn’t useful, or is spam, or unwanted, or doesn’t work etc. I’m interested in solutions to this but have not given them any thought yet.

Going back to the first point - how can this be done in an even more decentralised way? I think we can begin without a central index at all, but where each user’s videos appear in their own index. Then we can think about how users can share individual videos, play lists, or their entire index etc. A user receiving a share can incorporate all or part into their own index, either through a link or a copy.
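As a rough illustration of the "link or copy" idea, here is a minimal sketch in which each user owns an index and can incorporate someone else's either by reference or as a snapshot. Everything here is an assumption made for the example: the `UserIndex` class, its method names and the `safe://...` addresses are hypothetical, and plain Python objects stand in for whatever would live on the network.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # compare by identity so that link cycles are harmless
class UserIndex:
    owner: str
    entries: dict = field(default_factory=dict)  # title -> content address
    links: list = field(default_factory=list)    # indexes followed by reference

    def publish(self, title, address):
        self.entries[title] = address

    def link(self, other):
        """Incorporate another index by reference: later changes stay visible."""
        if other is not self and other not in self.links:
            self.links.append(other)

    def copy_from(self, other, titles=None):
        """Incorporate a snapshot of all, or only the named part, of another index."""
        for title, address in other.entries.items():
            if titles is None or title in titles:
                self.entries.setdefault(title, address)

    def search(self, term, _seen=None):
        """Search own entries, then linked indexes, skipping ones already visited."""
        seen = _seen if _seen is not None else set()
        if id(self) in seen:
            return {}
        seen.add(id(self))
        hits = {t: a for t, a in self.entries.items() if term.lower() in t.lower()}
        for other in self.links:
            for t, a in other.search(term, seen).items():
                hits.setdefault(t, a)
        return hits

alice, bob, carol = UserIndex("alice"), UserIndex("bob"), UserIndex("carol")
alice.publish("Cat video", "safe://alice/cat")
bob.publish("Dog video", "safe://bob/dog")
bob.publish("Cat tricks", "safe://bob/tricks")

carol.link(alice)                           # follow Alice's index by reference
carol.copy_from(bob, titles={"Dog video"})  # copy one entry from Bob
alice.publish("Cat nap", "safe://alice/nap")  # published after the link was made
alice.link(carol)                           # a cycle, handled by the _seen guard

cat_hits = carol.search("cat")
dog_hits = carol.search("dog")
```

In this sketch a link keeps tracking the other user's index (Carol sees "Cat nap" even though it was published later), while a copy is a fixed snapshot that the copier fully controls. Ignoring an index that turns out to be spam would simply mean removing the link.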

I realise this raises more questions but hope it gives some ideas as to how we can begin to think about these kinds of apps.

Vort · August 18, 2017, 3:21pm

This idea looks more like a social network.
But not every service needs to be a social network.
Let’s take YouTube as an example.
A user can open the website and start watching videos immediately.
Without invites, registrations, shares and likes.
It’s a useful feature.
And it is worth trying not to lose it.

drehb · August 18, 2017, 3:25pm

I also wondered about how to cache these indexes to avoid having to re-download them on every visit or site refresh. With an index built with MDs, it seems you wouldn’t have a way to know if the index you downloaded on your previous visit was still current, or if any entries had been modified.

happybeing · August 18, 2017, 6:23pm

Caching recently accessed videos, or parts of an index such as recent searches would be easy to do using browser based storage, or the local file system. Modern browsers include a local database, so you can use that too.

I can imagine partial caching being used to provide quick results for any search immediately based on the cache, but with searching of the network being done in the background and providing more results (adding or refreshing the cache results) if the user is patient!
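A minimal sketch of that cache-first pattern, under stated assumptions: the cache is a plain dict keyed by search term, and `fetch_network` is a hypothetical stand-in for the slow network search. Neither is a real SAFE or browser API.

```python
from concurrent.futures import ThreadPoolExecutor

def search(term, cache, fetch_network, pool):
    """Yield cached hits straight away, then any extra hits from the network."""
    pending = pool.submit(fetch_network, term)  # slow lookup runs in the background
    seen = set()
    for title in cache.get(term, []):
        seen.add(title)
        yield ("cache", title)
    fresh = list(pending.result())              # wait only once the cache is used up
    for title in fresh:
        if title not in seen:
            seen.add(title)
            yield ("network", title)
    cache[term] = fresh                         # refresh the cache for the next visit

def slow_network_search(term):
    # Hypothetical network result for the example.
    return ["Cat video", "Cat nap"]

cache = {"cat": ["Cat video"]}
with ThreadPoolExecutor(max_workers=1) as pool:
    results = list(search("cat", cache, slow_network_search, pool))
```

A UI could render each result as it is yielded, so the cached hits appear immediately and the network hits are appended when they arrive.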

Existing web services already operate in a similar way - trying to return some results quickly but adding to them over time if the user waits. So this kind of UX could work well.

Regarding @Vort’s social network analogy, I say well it can be both or either. Google and Facebook are examples where the backend is probably not that different, but the front end appears to be. I see no reason why a decentralised set of user owned indexes can’t be searched and appear as if there was one big central index. I’m not saying this would be identical to YouTube, but I think it could be just as useful, just as good a user experience, or probably better because there will no doubt be more innovation arising from two things:

- far more developers able to build massively scalable apps on a shoestring budget
- things that can be done on SAFEnetwork that can’t be done on the current platforms (but don’t ask me what they are yet - check out David’s upcoming blogs for ideas though)

Vort · August 19, 2017, 4:49am

I don’t see how it is possible to make such a decentralised solution without support from the SAFE network core.
A new user can’t get information from nowhere.
He must download an object which links him to the first set of users.
And this object will be the “single point of failure”.

happybeing · August 19, 2017, 8:56am

Only if it is designed like that. There are many ways to tackle this issue - it’s similar to: how do nodes connect to SAFEnetwork without a central server. Or indeed how does any p2p network bootstrap. So maybe look for ideas in those areas.
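One idea borrowed from p2p bootstrapping, sketched here purely as an assumption rather than as how SAFEnetwork does it: give the app several possible entry points and remember the peers it discovers, so that no single object has to stay available. `bootstrap`, `seeds`, `peer_cache` and the dict-based `network` are hypothetical names for the example.

```python
def bootstrap(seeds, fetch, peer_cache):
    """Try previously discovered peers first, then the shipped seeds.
    Return the first entry point that resolves, remembering its peers."""
    candidates = list(peer_cache) + [s for s in seeds if s not in peer_cache]
    for address in candidates:
        try:
            index = fetch(address)
        except LookupError:      # unreachable or deleted entry point: try the next
            continue
        for peer in index.get("peers", []):
            if peer not in peer_cache:
                peer_cache.append(peer)
        return address, index
    raise LookupError("no entry point reachable")

# Hypothetical network: address -> index object listing further peers.
network = {
    "seed-b": {"peers": ["user-1", "user-2"]},
    "user-1": {"peers": ["user-3"]},
}
seeds = ["seed-a", "seed-b"]     # "seed-a" is unreachable in this example
peer_cache = []

first, _ = bootstrap(seeds, network.__getitem__, peer_cache)

del network["seed-b"]            # the original seed disappears later on
second, _ = bootstrap(seeds, network.__getitem__, peer_cache)
```

In this sketch the second visit succeeds even though every shipped seed is gone, because the app enters through a peer it learned about earlier.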

Vort · August 19, 2017, 9:20am

A double bootstrap looks redundant.
It would be better to have a query API available that reuses the information the network already has, instead of building a network on top of the network.

happybeing · August 19, 2017, 12:46pm

I’m not sure how you can build that on top of the network bootstrap code, but if you can figure out how, that would be neat indeed.