WhisperX — Automated Transcripts w/ Timestamps and Speaker Tagging

dgdft@lemmy.world · 2 days ago

dgdft@lemmy.world · edit-2 15 hours ago

You should be able to get decent results if you pipe your tracks through demucs first to isolate the vocals.

Vanilla whisper will probably be better than whisperX for that use case though.

Depending on how esoteric your music library is, you can also build a lyrics DB with beets: https://beets.readthedocs.io/en/stable/plugins/lyrics.html

irmadlad@lemmy.world · 15 hours ago

I use UVR for vocal isolation. It just works, but that shouldn’t be a problem. I’ll check it out. At the worst, I’ll learn something.