JAMS Technical Discussion

BryanB · February 18, 2018, 7:31am

Good thought, and don’t worry about interjecting. This is a pretty open thread and your comment fits well.

So the history itself might not be a service as we’ve been discussing it, but making one’s history public could allow it to be mined by recommendation services on behalf of the user or other users who share similar tastes.

In fact, the same could be said for your history or a carefully crafted playlist. Fun fact about JAMS: History is just another playlist. So generally, make playlists publicly available for mining (ideally at the user’s request and not by default).

I’ll keep that in mind and add it to notes for future work.

BryanB · February 18, 2018, 7:40am

I didn’t mean to imply that. I meant to imply that as JAMS uses the data now, it is a list of XORs without context. The context is assumed by JAMS. If you loaded a list of XORs that weren’t songs … it’d get dumb about it and treat them as songs anyway.

I agree that an abstract data structure would be nice. Perhaps a UUID (or a subset of UUID to reduce bytes) to identify data types could be employed? Maybe build a registry so humans can look it up and understand what UUID is reserved for what data types? Potentially include a version as well, though honestly, UUIDs are large enough to have a new one for each new version.

For example, there might be a UUID for Type: Song, then another UUID for Type: Indexed List. Each item in the Indexed List is a tuple of (UUID, XOR). Now JAMS can load a playlist (which is an Indexed List) and check that each XOR corresponds to a song type that JAMS understands how to parse by checking its corresponding UUID. JAMS no longer would assume each XOR is a song, and should not-a-song be encountered, it could easily decide this isn’t a valid playlist for JAMS.
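A minimal sketch of that check, assuming a hypothetical registry and made-up UUID values. None of these names or identifiers come from JAMS or the SAFE API; they are only here to illustrate the (UUID, XOR) tuple idea.

```typescript
// Hypothetical type identifiers; a real registry would publish the actual values.
const TYPE_SONG = "7c1f0d1e-6f0a-4d5b-9a53-0b1c2d3e4f50";
const TYPE_INDEXED_LIST = "2a9e4b7c-3d21-4f68-8c0e-5f6a7b8c9d01";

// Human-readable registry so developers can look up what each UUID is reserved for.
const TYPE_REGISTRY: { [typeId: string]: string | undefined } = {
  [TYPE_SONG]: "Song",
  [TYPE_INDEXED_LIST]: "Indexed List",
};

// Each entry pairs a type UUID with the XOR address of the data it describes.
type Entry = [typeId: string, xorAddress: string];

interface IndexedList {
  typeId: string;
  items: Entry[];
}

// Returns true only if this is an Indexed List whose entries are all songs.
function isValidPlaylist(list: IndexedList): boolean {
  if (list.typeId !== TYPE_INDEXED_LIST) return false;
  return list.items.every(([typeId]) => typeId === TYPE_SONG);
}

// Names a type for error messages, with a fallback for unregistered UUIDs.
function describeType(typeId: string): string {
  return TYPE_REGISTRY[typeId] ?? "unknown type";
}
```

With something like this, a list containing a single not-a-song entry is rejected up front instead of being treated as songs. Swapping the second tuple element for an MD url would not change the check.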

Copy/Paste and replace “XOR” with “MD url”, or have UUIDs which refer to XORs and others which refer to MD urls, then the second part of the tuples might contain XOR or MD url. Shrug. UUIDs are huge.

EDIT: I’m responding to comments in order, and I see these ideas are being hashed out in a number of ways after I already wrote this response here. This response still seems valid for quite a few different comments from @joshuaef @happybeing and @bochaco.

BryanB · February 18, 2018, 7:47am

@bochaco

I looked in so many places for what the type tag means and what values to use. All the documentation had examples with arbitrary numbers (and no description of what the field is used for at all), so I began using the arbitrary numbers from the examples.

Is there going to be a list of accepted tags chosen by the community somewhere? Like a wiki or a registry? I’m not sure having every developer wing it is a good idea.

Is there any network advantage to using type tags instead of baking it into the (higher level) data of the MD or XOR? Why would someone use a type tag, other than because it’s there?

digipl · February 18, 2018, 9:00am

Possibly off topic, but have you considered adding a lossless audio format? For any audiophile it is a very important option.

The most common lossless format is FLAC, and there are JS libraries for it like https://github.com/audiocogs/flac.js

drehb · February 18, 2018, 2:37pm

Also Ogg Vorbis support
(The FLOSS version of mp3)

happybeing · February 18, 2018, 2:45pm

Not sure how applicable this is in this context, but in case it is: the semantic Web provides a universal mechanism for identifying data types and for creating, finding, and re-using vocabularies, based on URIs rather than UUIDs. When apps use those, there’s a greater chance of cross-app data sharing, data discovery, etc. I don’t know a lot about this yet, but if it is applicable here I think it might be worth investigating.

digipl · February 18, 2018, 5:49pm

I modified an HTML5 audio test page to work on Safe, and both Peruse 0.4.1 and SafeBrowser 0.9.0 pass all the tests, including ogg, opus and flac.

safe://test.digipl/

BryanB · February 19, 2018, 8:16pm

@drehb @digipl

We’re currently using the built-in AUDIO tag. The browser handles which audio formats are supported. However, I’m fairly sure the Safe Browser is built using FFMPEG which should support FLAC and OGG. Try it out!
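For anyone who wants to check this from script rather than by ear, here is a rough sketch using the standard `canPlayType` method of the audio element. The `FormatProbe` interface, the `CANDIDATES` list, and the `playableFormats` helper are names made up for this example, not anything from JAMS.

```typescript
// Minimal shape of an <audio> element: only the method this sketch needs.
interface FormatProbe {
  canPlayType(mimeType: string): string;
}

// Candidate formats and the MIME strings used to probe for them.
const CANDIDATES: { label: string; mime: string }[] = [
  { label: "MP3", mime: "audio/mpeg" },
  { label: "FLAC", mime: "audio/flac" },
  { label: "Ogg Vorbis", mime: 'audio/ogg; codecs="vorbis"' },
  { label: "Ogg Opus", mime: 'audio/ogg; codecs="opus"' },
];

// canPlayType answers "", "maybe", or "probably"; an empty string means unsupported.
function playableFormats(probe: FormatProbe): string[] {
  return CANDIDATES
    .filter((c) => probe.canPlayType(c.mime) !== "")
    .map((c) => c.label);
}

// In a browser: playableFormats(document.createElement("audio"))
```

The answer is only a hint from the browser ("maybe" really does mean maybe), so actually playing a test file remains the more reliable check.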

I’m uncertain about the streaming browser and how the streaming will integrate with the built-in audio support. That’s a different issue for a different day. I’m also uncertain if the Peruse browser is built with the same audio parsing libraries, but I’ll test that out soon enough.

EDIT:

One of these days I’ll read all replies before posting responses. Today was not that day. Thanks for the thorough testing! Also glad to hear it’s working with Peruse, I hadn’t gotten that far yet!

DBL EDIT:

I only have limited metadata parsing support. I can read ID3 tags off MP3s and, I think, whatever metadata AAC files carry. This is from an open source JavaScript library. It doesn’t support metadata from other audio containers like OGG. So if you upload a song in a container like OGG, I strongly suspect you’ll have to fill in the artist, song name, and album yourself. I haven’t found many open source JavaScript libraries to parse audio files as it is, so I was happy we got the two most common formats covered.
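Roughly, the fallback could look like the sketch below. It assumes the parser hands back whatever fields it found (possibly none) and the upload form supplies the rest; `SongInfo`, `missingFields`, and `completeInfo` are illustrative names, not the actual JAMS code.

```typescript
interface SongInfo {
  artist: string;
  title: string;
  album: string;
}

const FIELDS: (keyof SongInfo)[] = ["artist", "title", "album"];

// Fields the user still has to type in because the parser found nothing for them.
function missingFields(parsed: Partial<SongInfo>): (keyof SongInfo)[] {
  return FIELDS.filter((field) => !parsed[field]);
}

// Parsed tags win; manual input fills the gaps. Returns null if anything is still missing.
function completeInfo(
  parsed: Partial<SongInfo>,
  manual: Partial<SongInfo>
): SongInfo | null {
  const artist = parsed.artist || manual.artist;
  const title = parsed.title || manual.title;
  const album = parsed.album || manual.album;
  if (!artist || !title || !album) return null;
  return { artist, title, album };
}
```

For an OGG upload the parsed object would simply be empty, so all three fields come from the form.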

BryanB · February 19, 2018, 8:23pm

I appreciate the sentiment, but semantic web is very pie-in-the-sky, and I’m effectively trying to engineer something today, right now. If we can find concrete, solid practices that are proven today, great! If not, then I’ll wing it or use other practices (like UUIDs, which are tested and proven and in common use). This project is more about getting it out there than getting it perfect.

I am both an engineer and an academic (which is why I appreciate the note), but as a lone developer on this major project built atop bleeding edge software with a lot of quirks, I am strictly an engineer or the project would never move forward.

happybeing · February 19, 2018, 9:52pm

I’m not sure why you are so skeptical about the semantic Web. It exists and is ready for anyone to use. I’m not sure if it is applicable to JAMS, but there’s very little overhead in finding out if there is a suitable RDF type, which I’d be happy to help with. Adopting that rather than a UUID that is known only to your application would, I think, be trivial. Adopting RDF would be more work, so of course you may not feel it worthwhile, but there are potential advantages for applications, there are lots of libraries ready to use, and there is a community of developers to help anyone trying to use them.

LinkedData vocabularies exist ready for use and are being adopted widely. There’s a decent and growing set of tools in use and many in development. Add that to LDP and you have something much more than a single application with application specific data, and interoperability with a growing ecosystem.

I get that you don’t want to be diverted, so no problem with that. Just want to correct the impressions about the state of LinkedData / RDF, and to let you know that I’ll help where I can.

EDIT: I did a quick search…

This gives a basic intro to using RDF for audio publishing:

http://musicontology.com/docs/getting-started.html

digipl · February 19, 2018, 10:21pm

Did you check this?

Or this?

BryanB · February 19, 2018, 11:42pm

The word “ontology” is what makes me skeptical. Great for academic studies and planning out solutions, but I’ve not yet seen RDF implemented by anyone trying to accomplish anything. It’s usually an academic research tool, which is valuable, but not for getting stuff done. Almost every RDF I’ve seen people promote gets set aside for a W3C or IEEE RFC that solves the problem in a more compact and elegant way. My experience with RDF is that it is usually too abstract to be of practical use, and any implementation details effectively make it your own version separate from the “specification.” Things might have changed since I last looked though. Again, I’m trying to get the app done, not research abstract ontologies (which is a fun pursuit, just not for this task).

I’m glad to see really good documentation here. I don’t see it as a widely adopted standard though. If you see this being used in the wild and see how it applies to JAMS interoperating with those wild applications, I’ll consider overhauling all my data storage at that time.

It wouldn’t necessarily be restricted to JAMS, especially if the UUID scheme is well documented and shared. Just like the RDF you pointed out. Adoption is what matters. Given all else equal, I’d rather get it done quickly whipping together a UUID solution than spend weeks trying to grok the ontology of the RDF. Can always go back and change it later.

BryanB · February 19, 2018, 11:47pm

@digipl

Thanks for doing some research.

Borewit’s music-metadata is for Node, which is a different JavaScript runtime from the browser. It’s meant for server-side JS running under Node. We’re using client-side JS running in the browser.

Most of what I found on github was for Node.

Tmont’s audio-metadata is something I will bookmark and look at more closely. Based on my quick look, I think it only supports OGG and ID3 (and I already have ID3 support). The addition of OGG is certainly something.

Nigel · February 20, 2018, 3:56am

This is interesting, Mark. Bryan had coincidentally mentioned MusicBrainz not too far back when I was throwing spaghetti at the wall. That discussion of a JAMS DB had kind of deflated but was reborn in this thread. A free streaming music database is something I ultimately want to pursue in due time, and it’s clear MusicBrainz benefited from this approach internally.

I see @BryanB’s point as well but what I find exciting is that we are basically building apps for a new internet and we have an opportunity to make the changes we want to see in the future decentralized internet. I think semantic data and linked data have finally found a chance to succeed with SAFE but it is up to early developers to set the standards.

Nigel · February 20, 2018, 4:14am

Looks quite promising! Nice find @digipl

digipl · February 20, 2018, 8:52am

As I understand it, FLAC and Opus use the Vorbis tag header type, so by adding Tmont’s library you can cover these two formats too.
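For reference, the Vorbis comment layout itself is simple: a length-prefixed vendor string, a count, then length-prefixed "KEY=value" entries, with all lengths as 32-bit little-endian integers. The sketch below parses just that body; finding the body inside a FLAC or Opus file is left out, and `parseVorbisComments` is a name made up for this example rather than part of any library mentioned here.

```typescript
// Parses the body of a Vorbis comment block into a map of upper-cased
// field names to their values (a field may appear more than once).
function parseVorbisComments(body: Uint8Array): { [key: string]: string[] } {
  const view = new DataView(body.buffer, body.byteOffset, body.byteLength);
  const utf8 = new TextDecoder("utf-8");
  let pos = 0;

  // Reads one length-prefixed UTF-8 string and advances the cursor.
  function readString(): string {
    const length = view.getUint32(pos, true); // true = little-endian
    pos += 4;
    const text = utf8.decode(body.subarray(pos, pos + length));
    pos += length;
    return text;
  }

  readString(); // vendor string, not needed here
  const count = view.getUint32(pos, true);
  pos += 4;

  const tags: { [key: string]: string[] } = {};
  for (let i = 0; i < count; i++) {
    const entry = readString();
    const eq = entry.indexOf("=");
    if (eq < 0) continue; // skip malformed entries
    const key = entry.slice(0, eq).toUpperCase(); // field names are case-insensitive
    (tags[key] = tags[key] || []).push(entry.slice(eq + 1));
  }
  return tags;
}
```

Fields such as ARTIST, TITLE, and ALBUM would map onto the artist, song name, and album that otherwise have to be typed in by hand.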

happybeing · February 20, 2018, 9:26am

I do get @BryanB’s points too. As a developer I too like to focus and not have distractions from that, and particularly don’t want to be diverted into researching once I’ve got something ready to implement. I can imagine myself responding exactly the same way, which is why I’m also offering to help if I can. Part of me is happy not to be diverted from what I’m working on into looking at this with you guys! I want to press ahead with my stuff too!

@Nigel, I’m glad you also see the potential for something new here, a better Web with a new ecosystem of sharable mashable data.

@BryanB I don’t want to get into trying to convince you. Just to say I think you don’t have an accurate picture. I don’t either, and I wish I had more time to research Solid to improve that. But two points: 1) this stuff will stay in academia unless people like us get out there and do stuff with it (although sitting in the Solid chat for the last few months I also know that there’s a lot of real world stuff going on, from adoption by government departments, to the Solid team building real world apps for ordinary users - chat, meeting setup, payment, pubsub etc). 2) I’m skeptical that once you’ve made the design choice to have your own format, that will easily be redone later in RDF. In theory that’s true, but in reality things tend to get set in stone by a number of factors, not least having users with data in the old format.

I do understand and respect your decision though, so I’m happy to leave you to get on with your preferred approach, and I’m still a full-on supporter of JAMS as a product and you as a developer. JAMS is already creating a buzz and you’ve already got a great start. Best of luck with it, and if at some point you do want to look at Solid and RDF I’ll be happy to discuss and help out.

Nigel · February 20, 2018, 3:00pm

Absolutely interested, Mark; it just isn’t falling within the milestones we’ve just worked up, which will bring about JAMS Demo ver2. I believe we have the time to pull everything off in the end, but currently we’re just adding a couple of missing features, optimizing with a code cleanup, and then remapping the service with a revamped demo version with streaming for the community to try.

After all that is where the bigger plans fit in.