WebCache - Transparently Cache the Current Web in Safe

Hey Safe family! I’m creating a proxy server that runs locally and caches static data from the current web into the Safe network. The purpose is to provide a transition point for both users and developers where familiar workflows and APIs can still be used but with a Safe backend. This allows Safe to prove its value without significant interruption to current user/developer experience.

For example, to download a YouTube video currently located at http://youtube.com/video1.mp4, the local proxy server would do the following:

  1. Detect request for static data, found to be http://youtube.com/video1.mp4
  2. Search for file @ safe://WebCache/http://youtube.com/video1.mp4 where “WebCache” is the name of the container holding a cache of all static data on the current web and “http://youtube.com/video1.mp4” is the name of the cached file
  3. Fail to find a Safe file @ safe://WebCache/http://youtube.com/video1.mp4
  4. Download video1.mp4 from http://youtube.com/video1.mp4
  5. Upload video1.mp4 to Safe @ safe://WebCache/http://youtube.com/video1.mp4
  6. Send requesting client video1.mp4 via its socket

Note: steps 5 and 6 would be done in parallel.

Upon a subsequent request for http://youtube.com/video1.mp4:

  1. Detect request for static data, found to be http://youtube.com/video1.mp4
  2. Search for file @ safe://WebCache/http://youtube.com/video1.mp4
  3. Find Safe file @ safe://WebCache/http://youtube.com/video1.mp4
  4. Download video1.mp4 from safe://WebCache/http://youtube.com/video1.mp4
  5. Send requesting client video1.mp4 via its socket
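The two flows above (cache miss, then cache hit) can be sketched roughly as follows. This is only an illustration, not the actual proxy: `InMemorySafe`, `CachingProxy`, `handle_get` and `flush` are hypothetical names, a plain dictionary stands in for the Safe network, and the web download is an injected function so that nothing here touches a real network. The background upload models the note that steps 5 and 6 run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

CACHE_ROOT = "safe://WebCache/"  # container name taken from the post


class InMemorySafe:
    """Hypothetical stand-in for the Safe network: a dict keyed by safe:// URL."""

    def __init__(self):
        self._files = {}

    def get(self, safe_url):
        return self._files.get(safe_url)

    def put(self, safe_url, data):
        self._files[safe_url] = data


class CachingProxy:
    """Hypothetical proxy core: look in Safe first, fall back to the web."""

    def __init__(self, safe, fetch_from_web):
        self.safe = safe
        self.fetch_from_web = fetch_from_web  # callable: http URL -> bytes
        self._executor = ThreadPoolExecutor(max_workers=4)
        self._pending = []

    def handle_get(self, http_url):
        safe_url = CACHE_ROOT + http_url      # step 2: derive the cache location
        cached = self.safe.get(safe_url)
        if cached is not None:                # hit: serve straight from Safe
            return cached, "safe"
        data = self.fetch_from_web(http_url)  # miss, step 4: download from the web
        # Step 5 runs in the background so that step 6 (returning the
        # data to the client) does not wait for the upload to finish.
        self._pending.append(self._executor.submit(self.safe.put, safe_url, data))
        return data, "web"

    def flush(self):
        """Block until all background uploads have completed."""
        for future in self._pending:
            future.result()
        self._pending.clear()
```

For example, with a fake downloader, the first request is answered from the web and the second from the cache:

```python
proxy = CachingProxy(InMemorySafe(), lambda url: b"video bytes")
print(proxy.handle_get("http://youtube.com/video1.mp4")[1])  # served from the web
proxy.flush()
print(proxy.handle_get("http://youtube.com/video1.mp4")[1])  # served from Safe
```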

Thus far I’m researching a local proxy server in C# with https://github.com/justcoding121/Titanium-Web-Proxy. I’m suuuper excited about this project, so if you want to contribute in any way (design, development, testing, etc) please contact me directly or in this thread. Let’s change the world!


I like it. I’m not entirely sure I’m fully following, but I am curious: could people on the SAFE network take advantage of this cached static data? If you have a clear net web app written in, say, JS that is using this as a SAFE backend, then that app could also be published to SAFE and use that same data, no? Just a thought.

Absolutely, they would simply point to the same data. If you’re writing a Safe native app and want to use a file that you know was cached from http://youtube.com/TheCoolVideo.mp4, then your app would simply point to safe://WebCache/http://youtube.com/TheCoolVideo.mp4. Does that make sense?


Indeed! I will have to think about how I want to formulate them but I do have some questions bubbling up in my head. Would love to hear as many details or docs as you’re willing to share on this though.

Just a quick one. All these web caches will be public, correct? But to know the specific NRS name, you would need to know what is being cached and from which app, so you would know where to find it and how best to use it. Is that right, from what I’m picking up from your last comment?

By the looks of it you are naming the NRS after what you are caching and just have safe://WebCache/ before it which is really nice.

I’m just wondering if there is a way to make it a two way street.

I just have this half-baked concept in my head, but I don’t think it works. Say you have a podcast web app on the legacy web that uses this proxy server, and then a clone of this web app on the SAFE network that could publish what has been consumed on the legacy web side. This would be really neat. But it wouldn’t be a really useful app until a lot of data had been consumed, and some form of indexing would have to be implemented to be able to search this content on the SAFE side. Am I right here?

This will be really useful to introduce folks to SAFE as is, very cool.

Yes, the singular web cache, as there only needs to be one, would be public.

The NRS would follow the convention [the http URL] => safe://WebCache/[the http URL].

As far as I know, there is no need to know about the app on the current web. Just as on the current web, a URL points to a specific file, or at least a snapshot of it. So two apps on the current web can reference the same URL.


This may be a silly question, but why would the podcast web app need to be re-written? When the code in the browser tries to retrieve data it will already be going through the proxy and thus automatically retrieve data cached in Safe.

Indexing for search would be valuable in some cases, but for the most part, whenever a static file is retrieved, the proxy knows exactly what to look for in Safe due to the convention of where we cached it ([the http URL] => safe://WebCache/[the http URL]).

If thousands of people are running the proxy, then the cache is being passively populated on every GET on each of those instances. The cache would be populated rather quickly.

Well, I don’t think it would need to be rewritten, because any old app can be published to SAFE; it just has to use the SAFE APIs. But as far as searching goes, (I think) some form of indexing would have to be implemented for a user of the app to search content.

I see what you’re saying and you’re probably right.

This is true.

Also, a brand new traditional app could still use a traditional API by targeting a convention URL.
Let’s say when the proxy is installed we also include an extremely basic web server that simply forwards requests to Safe. We do this by adding a local DNS entry that maps http://safecache to http://localhost:8888. The local forwarding server simply takes a request and retrieves it from Safe using a similar convention: [http://safecache/TheFileWeWant] => safe://SafeProxy/[TheFileWeWant]. This way devs can use the Safe network without using its APIs directly. Users would get the local forwarding server automatically installed with the proxy installer, so they would be none the wiser.
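As a rough sketch of the forwarding convention described above, the URL rewrite itself is tiny. The function name `to_safe_url` is made up for illustration; the hostname `safecache` and the container `safe://SafeProxy/` are the ones from the post. Actually fetching the file from Safe is left out, since that depends on the Safe client API.

```python
from urllib.parse import urlsplit

FORWARD_HOST = "safecache"            # hostname from the post
SAFE_CONTAINER = "safe://SafeProxy/"  # container name from the post


def to_safe_url(request_url):
    """Map http://safecache/<file> to safe://SafeProxy/<file> (hypothetical helper)."""
    parts = urlsplit(request_url)
    if parts.hostname != FORWARD_HOST:
        raise ValueError(f"not a {FORWARD_HOST} request: {request_url}")
    # Drop the leading slash so the path sits directly under the container.
    return SAFE_CONTAINER + parts.path.lstrip("/")
```

With this, `to_safe_url("http://safecache/TheFileWeWant")` gives `safe://SafeProxy/TheFileWeWant`, and a request for any other host is rejected rather than silently forwarded.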


I love it man. Just connect to the SAFE proxy server and call it a day.


Perhaps use Rust? Maybe this could be helpful? https://gist.github.com/Plecra/d95f170bc8f42ed80158f3dcc19bcc9a/c6feec657343b05123ba58d56b784846497a64bc#file-proxy-rs-L53

A bit off topic, and I love this idea, by the way. But could a malicious bot spam the entire network by caching the same video repeatedly, just by changing the name of the video, like safe://WebCache/video1, safe://WebCache/video2… safe://WebCache/video100000000000, forever? I’ve always wondered about the concept of only having one version of any data needed on the network. Yet all someone would need to do is put the same video up with a different name to make it look original, or different, when in fact it is just a junk copy. Am I wrong? Is there a way around this kind of malicious spamming? Thanks

Changing the name does not change the chunks being stored. It is the chunks that de-duplication occurs for. Caches are caching the chunk, not the whole file.

But to spam the network, all you need to do is write a 1MB file that has 4 bytes changing (e.g. a 4-byte integer counting up to 4 billion).
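To make the two points above concrete, here is a simplified model of content-addressed chunk storage. It is an assumption-laden sketch: the real network uses its own chunking and self-encryption, whereas this uses plain SHA-256 over fixed 1 MB slices, and `ChunkStore`, `put_file` and `spam_payload` are hypothetical names.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MB, an assumed chunk size for illustration only


class ChunkStore:
    """Toy content-addressed store: chunks are keyed by their SHA-256 digest."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes
        self.files = {}   # file name -> ordered list of chunk digests

    def put_file(self, name, data):
        digests = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # An identical chunk is stored only once, whatever the file is called.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.files[name] = digests


def spam_payload(counter, size=CHUNK_SIZE):
    """A chunk-sized payload whose first 4 bytes are a counter, the rest zeros."""
    return counter.to_bytes(4, "big") + bytes(size - 4)
```

Storing the same bytes as `video1` and `video2` adds two file names but no new chunks. Storing `spam_payload(1)` and `spam_payload(2)` adds a new chunk each time, because the 4 changed bytes change the digest, and that is the case where the upload cost matters.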

Now, doing this costs safecoin. A person would go broke before they destroyed the network with spam.

For caches, the chunk has to have the correct hash in order to be valid. Thus, if a node changes the chunks it is caching, that copy will be rejected.
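The validity check described here can be sketched in a few lines. This assumes, purely for illustration, that a chunk's address is the SHA-256 digest of its content; the function names are hypothetical and the network's actual addressing scheme may differ.

```python
import hashlib


def chunk_address(chunk):
    """Assumed addressing scheme: the address is the SHA-256 of the content."""
    return hashlib.sha256(chunk).hexdigest()


def accept_cached_chunk(expected_address, chunk):
    """Accept a cached copy only if its content still hashes to its address."""
    return chunk_address(chunk) == expected_address
```

A requester that asked for address `chunk_address(original)` would accept `original` from any cache, but would reject a modified copy, because the modified bytes hash to a different address.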


I see, yes thanks, right, it costs safecoin.
