- cross-posted to:
- opensource@programming.dev
cross-posted from: https://lemmy.ca/post/37011397
The popular open-source VLC video player was demonstrated on the floor of CES 2025 with automatic AI subtitling and translation, generated locally and offline in real time. Parent organization VideoLAN shared a video on Tuesday in which president Jean-Baptiste Kempf shows off the new feature, which uses open-source AI models to generate subtitles for videos in several languages.
This might be one of the few times I’ve seen AI being useful and not just slapped on something for marketing purposes.
And not to do evil shit
But the topping contains potassium benzoate.
As long as the models are open source I have no complaints.
And the data stays local.
The nice thing is, now at least this can be used with live TV from other countries and languages.
Say you want to watch Japanese TV or Korean channels without bothering with downloading, searching, and syncing subtitles.
I prefer watching Mexican football announcers, and it would be nice to know what they’re saying. Though that might actually detract from the experience.
GOOOOOOAAAAAAAAALLLLLLLLLL
Just fill up the whole screen with this.
The opposing team has scored.
This sounds like a great thing for deaf people and just in general, but I don’t think AI will ever replace anime fansub makers who have no problem throwing a wall of text on screen for a split second just to explain an obscure untranslatable pun.
Bless those subbers. I love those walls of text.
Translator’s note: keikaku means plan
They are like the * in any Terry Pratchett (GNU) novel: sometimes a funny joke can have a little more spice added to make it even funnier.
It’s unlikely to even replace good subtitles, fan or not. It’s just a nice thing to have for a lot of content though.
I have family members who can’t really understand spoken English because it’s a bit fast, and can’t follow English subtitles either, because, again, they go by too fast for them.
Sometimes you download a movie and all the Estonian subtitles are for an older release and they desynchronize. Sometimes you can barely even find synchronized English subtitles, so even that doesn’t work.
This seems like a godsend, honestly.
Funnily enough, of all the streaming services, I’m again going to have to commend Apple TV+ here. Their shit has Estonian subtitles. Netflix, Prime, etc, do not. Meaning if I’m watching with a family member who doesn’t understand English well, I’ll watch Apple TV+ with a subscription, and everything else is going to be pirated for subtitles. So I don’t bother subscribing anymore. We’re a tiny country, but for some reason Apple of all companies has chosen to acknowledge us. Meanwhile, I was setting up an Xbox for someone a few years ago, and Estonia just… straight up doesn’t exist. I’m not talking about language support - you literally couldn’t pick it as your LOCATION.
For all their faults, Apple knows accessibility. Good job Timmy.
Finally, some good fucking AI
I was just thinking, this is exactly what AI should be used for. Pattern recognition, full stop.
Yup, and if it isn’t perfect that is ok as long as it is close enough.
Like getting name spellings wrong or mixing homophones is fine because it isn’t trying to be factually accurate.
Problem is that now people will say they don’t need to create accurate subtitles because VLC is doing the job for them.
Accessibility might suffer from that, because all subtitles are now just “good enough”.
Regular old live broadcast closed captioning is pretty much ‘good enough’ and that is the standard I’m comparing to.
Actual subtitles created ahead of time should be perfect because they have the time to double check.
Or they can get OK ones with this tool and fix the errors. Might save a lot of time.
Honestly though? If your audio is even half decent you’ll get like 95% accuracy. Considering a lot of media just wouldn’t have anything, that is a pretty fair trade-off to me.
I have a feeling that if you care enough about subtitles you’re going to look for good ones, instead of using “ok” ai subs.
I imagine it would be not-exactly-simple-but-not-complicated to add a “threshold” feature. If the AI is less than X% certain, it can request human clarification.
Edit: Derp. I forgot about the “real time” part. Still, as others have said, even a single botched word would still work well enough with context.
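For what it’s worth, the hooks for that idea already exist if you play with the upstream Python package: every Whisper segment comes back with confidence-ish numbers you could threshold on. A rough sketch (batch, not real time, and the cutoff values below are made-up placeholders, not anything VLC uses):

```python
import whisper  # pip install openai-whisper

model = whisper.load_model("base")
result = model.transcribe("episode.mkv")

# each segment carries avg_logprob / no_speech_prob; these cutoffs are guesses
for seg in result["segments"]:
    uncertain = seg["avg_logprob"] < -1.0 or seg["no_speech_prob"] > 0.5
    flag = " [CHECK ME]" if uncertain else ""
    print(f"{seg['start']:8.2f} -> {seg['end']:8.2f}{flag} {seg['text'].strip()}")
```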
That defeats the purpose of doing it in real time as it would introduce a delay.
Derp. You’re right, I’ve added an edit to my comment.
I’d like to see this fix the most annoying part about subtitles: timing. Find a transcript/any subs on the Internet and have the AI align them with the audio properly.
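A crude version of that is doable today with the openai-whisper and pysrt packages. This sketch assumes the first cue in the .srt corresponds to the first line Whisper actually hears (a real aligner would fuzzy-match the text instead of trusting cue one):

```python
import whisper  # pip install openai-whisper
import pysrt    # pip install pysrt

model = whisper.load_model("base")
segments = model.transcribe("movie.mkv")["segments"]

subs = pysrt.open("movie.srt")

# offset between when speech actually starts and when the subs claim it does
spoken_start = segments[0]["start"]            # seconds
subbed_start = subs[0].start.ordinal / 1000.0  # milliseconds -> seconds
subs.shift(seconds=spoken_start - subbed_start)
subs.save("movie.synced.srt", encoding="utf-8")
```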
Yeah, it’s pretty wonderful to see how far auto-generated transcription/captioning has come over the last couple of years. A wonderful victory for many communities with various disabilities.
Finally some good AI fucking 🤭
What’s important is that this is running on your machine locally, offline, without any cloud services. It runs directly inside the executable.
YES, thank you JB
Justin Bieber?
JB stands for Jean-Baptiste, who is the main maintainer of VLC.
❤️
Amazing. I can finally find out exactly what that nurse is yelling about while she gets railed by the local basketball team.
Will it be possible to export these AI subs?
Now I want some AR glasses that display subtitles above someone’s head when they talk à la Cyberpunk that also auto-translates. Of course, it has to be done entirely locally.
As VLC is open source, can we expect this technology to also be available for, say, Jellyfin, so that I can once and for all have subtitles done right?
Edit: I think it’s great that VLC has this, but it sounds like something many other apps could benefit from.
It’s already available for anyone to use. https://github.com/openai/whisper
They’re using OpenAI’s Whisper model for this: https://code.videolan.org/videolan/vlc/-/merge_requests/5155
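If you want to try it before VLC ships the feature, the Python package already does the whole pipeline offline. A minimal sketch (file names are placeholders; task="translate" is the option that translates the speech into English):

```python
import whisper  # pip install openai-whisper

model = whisper.load_model("small")
# task="translate" transcribes *and* translates the audio into English
result = model.transcribe("video.mp4", task="translate")

def srt_time(t: float) -> str:
    h, rem = divmod(int(t), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02}:{m:02}:{s:02},{int(t * 1000) % 1000:03}"

# write the segments out as a plain .srt next to the video
with open("video.srt", "w", encoding="utf-8") as f:
    for i, seg in enumerate(result["segments"], start=1):
        f.write(f"{i}\n{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n"
                f"{seg['text'].strip()}\n\n")
```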
Have there been any estimated minimum system requirements for this yet, since it runs locally?
It’s actually using whisper.cpp
From the README:
Memory usage:

| Model  | Disk    | Mem      |
| ------ | ------- | -------- |
| tiny   | 75 MiB  | ~273 MB  |
| base   | 142 MiB | ~388 MB  |
| small  | 466 MiB | ~852 MB  |
| medium | 1.5 GiB | ~2.1 GB  |
| large  | 2.9 GiB | ~3.9 GiB |

Those are the model sizes.
Oh wow, those are pretty tiny memory requirements for a decent modern system! That’s actually very impressive! :D
Many people can probably even run this on older media servers or even just a plain NAS! That’s awesome! :D
Crunchyroll is currently using AI subtitles. It’s obvious because when someone says “mothra. Funky…” it captions “mother fucker”.
That explains why their subtitles have seemed worse to me lately. Every now and then I see something obviously wrong and wonder how it got by anyone who looked at it. Now I know why. No one looked at it.
My wife and I love laughing at the dumbass mistakes it makes.
Some character’s name is Asura Halls?
Instead of “That’s Asura Halls!” you get “That asshole!”
But if I were actually hearing impaired, I’d be really pissed that I’m being treated as second class even though Sony still took my money like everyone else.
Malevolent Kitchen Intensifies
I hope it’s available for Stash App. I wanna know what these JAV girls are saying.
( ͡° ͜ʖ ͡°)
Ooooh I like this
The technology is nowhere near being good though. On synthetic tests, on the data it was trained and tweaked on, maybe, I don’t know.
I co-run an event where we invite speakers from all over the world, and we have tried every way to generate subtitles; all of them perform at the level of YouTube’s auto-generated ones. It’s better than nothing, but you can’t really rely on it.
Is your goal to rely on it, or to have it as a backup?
For my purpose of having a backup, nearly anything will be better than nothing.
When you do live streaming there is no time for a backup; it either works or it doesn’t. Better than nothing, that’s for sure, but also maybe only marginally better than whatever we had 10 years ago.
You haven’t even been able to test it yet and you’re calling it nowhere near good 🤦🏻
Like, how would you know?!
Relax, they didn’t write a new way of doing magic, they integrated a solution from the market.
I don’t know what the new BMW car they introduce this year is capable of, but I know for a fact it can’t fly.
When are we getting AMD’s FSR upscaling and frame-gen? Also, wouldn’t it make more sense for subtitles to use the Jellyfin approach?
I have an AMD card: add VLC as a game in the drivers, and you can turn on AFMF (frame gen).
If it doesn’t work you could just turn it on system-wide in the display settings of the Adrenalin software (gear icon upper right, then Display/Gaming).
I think it requires at least a 6000 series GPU, however.
If you have a Samsung TV or other modern smart TV connected to a laptop, you can also turn on frame-gen using Auto Motion Plus, set to Custom.
Judder Reduction 10 is double frames, so 24 FPS -> 48.
I’m using Linux. There is probably a way to get a similar outcome, but an integrated solution like the one VLC and AMD once mentioned would be better.
Gamescope, probably. It has launch options to force FSR, iirc.
And yet they turned down having thumbnails for seeking because it would be too resource intensive. 😐
I mean, it would. For example Jellyfin implements it, but it does so by extracting the pictures ahead of time and saving them. It takes days to do this for my library.
Yeah, I do this for Plex as well, and Stash. I think if the file already exists in the directory, VLC should use it. It’s up to you to generate them. That is exactly how cover art for albums worked in VLC for a decade before they added the feature to pull cover art on the fly.
I get what you are saying, but I don’t think there is any standardized format for these trickplay images. The same images from Plex would likely not be usable in Jellyfin without converting the metadata (e.g. which time in the video an image belongs to). So VLC probably has no good way to understand trickplay images not made by VLC.
Video decoding is resource intensive. We’re used to it, and we have hardware acceleration for some of it, but spewing around 52 million pixels every second (1920 × 1080 × 25 fps ≈ 51.8 million) from a highly compressed data source is not cheap. I’m not sure how the two compare, but small models like this are not that costly to run if you don’t factor their creation in.
All they’d need to do is generate thumbnails at a set interval when the video loads. Make that interval adjustable. It might take a few extra seconds to load a video, so make it off by default if they’re worried about the performance hit.
There are other desktop video players that make this work.
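For scale, the extraction itself is nearly a one-liner around ffmpeg. A sketch (interval and size are arbitrary choices here, not anything VLC or Jellyfin actually use):

```python
import os
import subprocess

def make_seek_thumbnails(video: str, out_dir: str, interval_s: int = 10) -> None:
    # one downscaled frame every interval_s seconds; needs ffmpeg on PATH
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", video,
         "-vf", f"fps=1/{interval_s},scale=320:-1",  # 1 frame per interval, 320px wide
         f"{out_dir}/thumb_%04d.jpg"],
        check=True,
    )

make_seek_thumbnails("movie.mkv", "thumbs", interval_s=10)
```

The catch is that this still decodes the whole file once, which is why Jellyfin and friends do it ahead of time rather than at seek time.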
It is useful for internet streams though, not really for local or LAN video.
When does this get released? I really want to try it
Haven’t watched the video yet, but it makes a lot of sense that you could train an AI on already-subtitled movies and their audio. There are times when official subtitles paraphrase the speech to make it easier to read quickly, so I wonder how that would work. There’s also just a lot of voice recognition everywhere nowadays, so maybe that’s all they need?