Skip to content
Snippets Groups Projects
Commit e6f2c097 authored by Jan Hartig's avatar Jan Hartig
Browse files

Switch to WhisperX backend

parent 731f8575
Branches
No related tags found
1 merge request!15Switch to WhisperX backend
......@@ -2,7 +2,7 @@
1. Webserver takes and validates user submitted files
2. Cron job scans files and enqueues new jobs on cluster
3. Job gets processed on the cluster using [whisper-webvtt-transcriber](https://gitlab1.ptb.de/janhartig/whisper-webvtt-transcriber)
3. Job gets processed on the cluster using [WhisperX](https://github.com/m-bain/whisperX)
4. Mailservice scans job folders for completed jobs and:
- Sends processed files to users
- Optional: Notifies admins on processing errors
......@@ -24,7 +24,7 @@ job_uuid:
Preprocessed input file. Contains only audio data to conserve disk space.
### video_language.txt
Contains the video language tag used for processing with [whisper-webvtt-transcriber](https://gitlab1.ptb.de/janhartig/whisper-webvtt-transcriber).
Contains the video language tag used for processing with [WhisperX](https://github.com/m-bain/whisperX).
Is used by the cronjob script (step 3).
### metadata.json
......
......@@ -40,8 +40,8 @@ de = "Sprache"
en = "Language"
[ language.helptext ]
de = "Die gesprochene Sprache der Aufnahme.<br>Bei mehrsprachigen Aufnahmen wählen Sie die Häufigste."
en = "Spoken language of recording.<br>For multi-language recordings choose the most frequent."
de = "Die gesprochene Sprache der Aufnahme.<br>Bei mehrsprachigen Aufnahmen wählen Sie die automatische Erkennung."
en = "Spoken language of recording.<br>For multi-language recordings choose auto-detect."
[ language.choose ]
de = "Wählen..."
......@@ -63,6 +63,14 @@ en = "French"
de = "Spanisch"
en = "Spanish"
[ language.options.it ]
de = "Italienisch"
en = "Italian"
[ language.options.auto ]
de = "Automatisch"
en = "Auto-detect"
[ language.feedback.required ]
de = "Bitte wählen Sie die Sprache des Videos."
en = "Please select language of recording."
......
......@@ -82,7 +82,7 @@ def main(end):
.name
)
with open(Path(job).joinpath("subtitles.vtt")) as f:
with open(Path(job).joinpath("audio.vtt")) as f:
msg.add_attachment(f.read(), filename=filename)
s.send_message(msg)
......
av~=13.0.0
Flask~=3.0.0
Flask-WTF~=1.2.1
wtforms[email]~=3.1.2
whitenoise~=6.7.0
av~=14.0.0
Flask~=3.1.0
Flask-WTF~=1.2.2
wtforms[email]~=3.2.1
whitenoise~=6.9.0
requests~=2.32.3
\ No newline at end of file
......@@ -48,7 +48,7 @@ def upload(language: str):
audio_stream = [stream for stream in container.streams if stream.type == "audio"][0]
with av.open(path.join(folder_path, "audio.mkv"), "w") as out:
out_stream = out.add_stream(template=audio_stream)
out_stream = out.add_stream_from_template(audio_stream)
for packet in container.demux(audio_stream):
# Skip the "flushing" packets that `demux` generates.
......@@ -60,10 +60,12 @@ def upload(language: str):
out.mux(packet)
video_language = "None" if form.language.data == "auto" else form.language.data
metadata = {
"email": form.email.data,
"language": language,
"video_language": form.language.data,
"video_language": video_language,
"filename": file.filename,
}
......@@ -71,7 +73,7 @@ def upload(language: str):
json.dump(metadata, f)
with open(path.join(folder_path, "video_language.txt"), "w") as f:
f.write("{}".format(form.language.data))
f.write(video_language)
open(path.join(folder_path, "new"), "wb").close()
......
......@@ -63,7 +63,8 @@
<br>
<p class="mb-1">{{ config["LOCALISATIONS"]["contact"]["text"][request.language] }}:</p>
<p class="mb-1">{{ config["CONTACT"]["ORG"] }} <a class="link-secondary" href="mailto:{{ config["CONTACT"]["MAIL"] }}">{{ config["CONTACT"]["NAME"] }}</a></p>
<p class="font-monospace"><a class="link-secondary" target="_blank" referrerpolicy="no-referrer" href="https://gitlab1.ptb.de/janhartig/whisper-webvtt-transcriber">whisper-webvtt-transcriber</a></p>
<p class="font-monospace"><a class="link-secondary" target="_blank" referrerpolicy="no-referrer" href="https://github.com/m-bain/whisperX">WhisperX</a><br>
<a class="link-secondary" target="_blank" referrerpolicy="no-referrer" href="https://gitlab1.ptb.de/janhartig/ptb-subtitler">ptb-subtitle-service</a></p>
</div>
</footer>
<script src="{{ url_for('static', filename='js/bootstrap.bundle.min.js') }}"></script>
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment