8 Commits

Author SHA1 Message Date
himeditator
6bff978b88 feat(engine): replace the resampling library; add punctuation restoration models to SOSV
- Replace the samplerate library with resampy for higher resampling quality
- Add Chinese and English punctuation restoration models to Sherpa-ONNX SenseVoice
2025-09-06 23:15:33 +08:00
himeditator
eba2c5ca45 feat(engine): refactor the caption engine; add the Sherpa-ONNX SenseVoice speech recognition model
- Refactor the caption engine so audio capture runs on a dedicated thread
- Refactor the classes in audio2text and adjust their runtime logic
- Update the main function to support the Sosv model
- Change the AudioStream class to default to a 16000 Hz sample rate
2025-09-06 20:49:46 +08:00
himeditator
2b7ce06f04 feat(translation): add user-interface components for the non-realtime translation feature 2025-09-04 23:41:22 +08:00
himeditator
14987cbfc5 feat(vosk): add non-realtime translation for the Vosk model (#14)
- Add Ollama LLM translation and (non-realtime) Google translation, supporting multiple languages
- Add non-realtime translation to the Vosk engine
- Add and adjust interfaces for the new translation features
- Update the Electron build configuration so builds for different platforms no longer require editing the build file
2025-09-02 23:19:53 +08:00
himeditator
56fdc348f8 fix(engine): fix failure to force-kill the caption engine when its status is not running
- Merge the kill and forceKill methods of the CaptionEngine class and drop the early return after the status warning
- Update the macOS compatibility notes in the README and add a configuration link
2025-08-30 20:57:26 +08:00
Chen Janai
f42458124e Merge pull request #17 from xuemian168/main
feat(engine): add a startup timeout and support for force-terminating the engine
2025-08-28 12:25:33 +08:00
himeditator
2352bcee5d feat(engine): polish minor issues in the startup-timeout feature
- Update the interface documentation
- Shorten i18n text so it does not overflow its labels
- Fix the force-stop button not responding to clicks
2025-08-28 12:22:19 +08:00
xuemian
051a497f3a feat(engine): add a startup timeout and support for force-terminating the engine
- Add a 'control.engine.forceKill' event handler in ControlWindow to allow force-terminating the engine.
- Implement a startup-timeout mechanism in CaptionEngine: if the engine does not start in time, it is automatically force-stopped and an error message is sent.
- Update the i18n files with messages related to the startup timeout.
- Add a startup-timeout input to the EngineControl component so users can set the timeout.
- Update the related type definitions to support the new startup-timeout configuration.
2025-08-28 10:24:08 +10:00
38 changed files with 877 additions and 274 deletions

View File

@@ -49,7 +49,7 @@
| 操作系统版本 | 处理器架构 | 获取系统音频输入 | 获取系统音频输出 |
| ------------------ | ---------- | ---------------- | ---------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅需要额外配置 | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ [需要额外配置](./docs/user-manual/zh.md#macos-获取系统音频输出) | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
@@ -188,15 +188,3 @@ npm run build:mac
# For Linux
npm run build:linux
```
注意,根据不同的平台需要修改项目根目录下 `electron-builder.yml` 文件中的配置内容:
```yml
extraResources:
# For Windows
- from: ./engine/dist/main.exe
to: ./engine/main.exe
# For macOS and Linux
# - from: ./engine/dist/main
# to: ./engine/main
```

View File

@@ -49,7 +49,7 @@ The software has been adapted for Windows, macOS, and Linux platforms. The teste
| OS Version | Architecture | System Audio Input | System Audio Output |
| ------------------ | ------------ | ------------------ | ------------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ Additional config required | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ [Additional config required](./docs/user-manual/en.md#capturing-system-audio-output-on-macos) | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
@@ -188,15 +188,3 @@ npm run build:mac
# For Linux
npm run build:linux
```
Note: You need to modify the configuration content in the `electron-builder.yml` file in the project root directory according to different platforms:
```yml
extraResources:
# For Windows
- from: ./engine/dist/main.exe
to: ./engine/main.exe
# For macOS and Linux
# - from: ./engine/dist/main
# to: ./engine/main
```

View File

@@ -49,7 +49,7 @@
| OS バージョン | アーキテクチャ | システムオーディオ入力 | システムオーディオ出力 |
| ------------------ | ------------ | ------------------ | ------------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ 追加設定が必要 | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ [追加設定が必要](./docs/user-manual/ja.md#macos-でのシステムオーディオ出力の取得方法) | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
@@ -188,15 +188,3 @@ npm run build:mac
# Linux 用
npm run build:linux
```
注意: プラットフォームに応じて、プロジェクトルートディレクトリにある `electron-builder.yml` ファイルの設定内容を変更する必要があります:
```yml
extraResources:
# Windows 用
- from: ./engine/dist/main.exe
to: ./engine/main.exe
# macOS と Linux 用
# - from: ./engine/dist/main
# to: ./engine/main
```

View File

@@ -153,4 +153,18 @@
### Improvements
- Polish several components of the software UI
- Clearer log output
- Clearer log output
## v0.8.0
2025-09-??
### New Features
- Caption engine shutdown timeout: the engine is closed automatically if it does not start within the configured time, and it can also be closed while it is still starting
- Non-realtime translation: translate with a local Ollama model or via the Google Translate API
### Improvements
- Labels carrying extra information now use the theme color
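The startup-timeout flow described above is implemented in the CaptionEngine diff further down; a condensed sketch of the idea, with `kill`, `getStatus`, `onTimeout`, and `timeoutSeconds` as illustrative stand-ins for the real methods and the `startTimeoutSeconds` config field:
```ts
type EngineStatus = 'stopped' | 'starting' | 'running' | 'starting-timeout'

// Minimal sketch of the startup watchdog; names are stand-ins, not the real API.
function watchStartup(
  kill: () => void,
  getStatus: () => EngineStatus,
  onTimeout: (msg: string) => void,
  timeoutSeconds: number
): NodeJS.Timeout {
  // If the engine still reports 'starting' when the timer fires,
  // force-kill it and surface an error message to the control window.
  return setTimeout(() => {
    if (getStatus() === 'starting') {
      onTimeout(`Engine start timeout after ${timeoutSeconds} seconds`)
      kill()
    }
  }, timeoutSeconds * 1000)
}
// The real code clears this timer in connect() once the engine is reachable.
```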

View File

@@ -58,6 +58,18 @@ The Electron main process sends data to the Python process over a TCP Socket. The data
Caption data converted from the audio stream captured on the Python side.
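For reference, the shape of a `caption` message, with the field list inferred from the recognizers in this diff (sosv.py and vosk.py populate exactly these fields):
```ts
// Inferred shape of a `caption` message; not an official schema.
interface CaptionMessage {
  command: 'caption'
  index: number       // caption sequence number (cur_id on the Python side)
  text: string        // recognized (and punctuated) text
  time_s: string      // start time, formatted 'HH:MM:SS.mmm'
  time_t: string      // time of the latest update
  translation: string // filled in later by a `translation` message
}
```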
### `translation`
```js
{
command: "translation",
time_s: string,
translation: string
}
```
The translation of recognized speech; the corresponding caption can be identified by its start time.
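Since translation runs on a worker thread, a `translation` message can arrive after newer captions have been appended, so the receiver matches by `time_s` rather than by position. A minimal sketch mirroring `updateCaptionTranslation` in the Electron diff below:
```ts
interface CaptionEntry { time_s: string; text: string; translation: string }

// Pair a `translation` message with its caption by start time, scanning from
// the end because the matching caption is almost always a recent one.
function applyTranslation(
  log: CaptionEntry[],
  msg: { command: 'translation'; time_s: string; translation: string }
): void {
  for (let i = log.length - 1; i >= 0; i--) {
    if (log[i].time_s === msg.time_s) {
      log[i].translation = msg.translation
      break
    }
  }
}
```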
### `print`
```js
@@ -67,7 +79,7 @@ Caption data converted from the audio stream captured on the Python side.
}
```
Prints content from the Python side.
Prints content from the Python side; not recorded in the log.
### `info`
@@ -78,7 +90,18 @@ Caption data converted from the audio stream captured on the Python side.
}
```
An informational message printed by the Python side; unlike `print`, it is meant to draw more attention from the Electron side
An informational message printed by the Python side; recorded in the log.
### `warn`
```js
{
command: "warn",
content: string
}
```
A warning message printed by the Python side; recorded in the log.
### `error`
@@ -89,7 +112,7 @@ An informational message printed by the Python side; unlike `print`, it is meant to draw the Electron
}
```
An error message printed by the Python side; it needs to be shown in a popup on the frontend.
An error message printed by the Python side; it is shown in a popup on the frontend.
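Taken together, the Electron side routes these messages by their `command` field, as `handleEngineData` does in the diff below. A sketch with `Log` and `showErrorPopup` standing in for the real logger and `controlWindow.sendErrorMessage`:
```ts
declare const Log: {
  info(...args: unknown[]): void
  warn(...args: unknown[]): void
  error(...args: unknown[]): void
}
declare function showErrorPopup(message: string): void

function dispatchEngineMessage(data: { command: string; content?: string }): void {
  switch (data.command) {
    case 'print': // echoed to the console only, not recorded in the log
      console.log(data.content)
      break
    case 'info':
      Log.info('Engine Info:', data.content)
      break
    case 'warn':
      Log.warn('Engine Warn:', data.content)
      break
    case 'error': // recorded in the log and surfaced as a frontend popup
      Log.error('Engine Error:', data.content)
      showErrorPopup(data.content ?? '')
      break
  }
}
```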
### `usage`

View File

@@ -182,6 +182,16 @@
**Data type:** None
### `control.engine.forceKill`
**Description:** Force-kill a caption engine whose startup has timed out
**Sender:** Frontend control window
**Receiver:** Backend ControlWindow instance
**Data type:** None
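Both ends of this channel appear later in the diff; in short (the `captionEngine` import path is illustrative):
```ts
import { ipcMain } from 'electron'
import { captionEngine } from './engine' // illustrative import path

// Main process: ControlWindow registers the handler and force-kills the engine.
ipcMain.on('control.engine.forceKill', () => {
  captionEngine.kill()
})

// Renderer: the force-stop confirm dialog fires the event, e.g.
//   window.electron.ipcRenderer.send('control.engine.forceKill')
```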
### `caption.windowHeight.change`
**Description:** The caption window height has changed

View File

@@ -1,5 +1,5 @@
appId: com.himeditator.autocaption
productName: auto-caption
productName: Auto Caption
directories:
buildResources: build
files:
@@ -13,13 +13,15 @@ files:
- '!engine/*'
- '!docs/*'
- '!assets/*'
- '!.repomap/*'
- '!.virtualme/*'
extraResources:
# For Windows
- from: ./engine/dist/main.exe
to: ./engine/main.exe
# For macOS and Linux
# - from: ./engine/dist/main
# to: ./engine/main
- from: ./engine/dist/main
to: ./engine/main
win:
executableName: auto-caption
icon: build/icon.png

View File

@@ -1,3 +1,3 @@
from dashscope.common.error import InvalidParameter
from .gummy import GummyRecognizer
from .vosk import VoskRecognizer
from .vosk import VoskRecognizer
from .sosv import SosvRecognizer

View File

@@ -5,9 +5,10 @@ from dashscope.audio.asr import (
TranslationRecognizerRealtime
)
import dashscope
from dashscope.common.error import InvalidParameter
from datetime import datetime
from utils import stdout_cmd, stdout_obj, stderr
from utils import stdout_cmd, stdout_obj, stdout_err
from utils import shared_data
class Callback(TranslationRecognizerCallback):
"""
@@ -90,9 +91,23 @@ class GummyRecognizer:
"""启动 Gummy 引擎"""
self.translator.start()
def send_audio_frame(self, data):
"""发送音频帧,擎将自动识别将识别结果输出到标准输出中"""
self.translator.send_audio_frame(data)
def translate(self):
"""持续读取共享数据中的音频帧,并进行语音识别将识别结果输出到标准输出中"""
global shared_data
restart_count = 0
while shared_data.status == 'running':
chunk = shared_data.chunk_queue.get()
try:
self.translator.send_audio_frame(chunk)
except InvalidParameter as e:
restart_count += 1
if restart_count > 5:
stdout_err(str(e))
shared_data.status = "kill"
stdout_cmd('kill')
break
else:
stdout_cmd('info', f'Gummy engine stopped, restart attempt: {restart_count}...')
def stop(self):
"""停止 Gummy 引擎"""

176
engine/audio2text/sosv.py Normal file
View File

@@ -0,0 +1,176 @@
"""
Sherpa-ONNX SenseVoice Model
This code file references the following:
https://github.com/k2-fsa/sherpa-onnx/blob/master/python-api-examples/simulate-streaming-sense-voice-microphone.py
"""
import time
from datetime import datetime
import sherpa_onnx
import threading
import numpy as np
from utils import shared_data
from utils import stdout_cmd, stdout_obj
from utils import google_translate, ollama_translate
class SosvRecognizer:
"""
Use the non-streaming SenseVoice model to process streaming audio data and write JSON strings readable by the Auto Caption app to stdout
Init parameters:
model_path: path to the Sherpa-ONNX SenseVoice recognition model (the Silero VAD model is loaded from the same directory)
source: source language for recognition (auto, zh, en, ja, ko, yue)
target: target translation language
trans_model: translation model name
ollama_name: Ollama model name
def __init__(self, model_path: str, source: str, target: str | None, trans_model: str, ollama_name: str):
if model_path.startswith('"'):
model_path = model_path[1:]
if model_path.endswith('"'):
model_path = model_path[:-1]
self.model_path = model_path
self.ext = ""
if self.model_path[-4:] == "int8":
self.ext = ".int8"
self.source = source
self.target = target
if trans_model == 'google':
self.trans_func = google_translate
else:
self.trans_func = ollama_translate
self.ollama_name = ollama_name
self.time_str = ''
self.cur_id = 0
self.prev_content = ''
def start(self):
"""启动 Sense Voice 模型"""
self.recognizer = sherpa_onnx.OfflineRecognizer.from_sense_voice(
model=f"{self.model_path}/sensevoice/model{self.ext}.onnx",
tokens=f"{self.model_path}/sensevoice/tokens.txt",
language=self.source,
num_threads = 2,
)
vad_config = sherpa_onnx.VadModelConfig()
vad_config.silero_vad.model = f"{self.model_path}/silero_vad.onnx"
vad_config.silero_vad.threshold = 0.5
vad_config.silero_vad.min_silence_duration = 0.1
vad_config.silero_vad.min_speech_duration = 0.25
vad_config.silero_vad.max_speech_duration = 8
vad_config.sample_rate = 16000
self.window_size = vad_config.silero_vad.window_size
self.vad = sherpa_onnx.VoiceActivityDetector(vad_config, buffer_size_in_seconds=100)
if self.source == 'en':
model_config = sherpa_onnx.OnlinePunctuationModelConfig(
cnn_bilstm=f"{self.model_path}/punct-en/model{self.ext}.onnx",
bpe_vocab=f"{self.model_path}/punct-en/bpe.vocab"
)
punct_config = sherpa_onnx.OnlinePunctuationConfig(
model_config=model_config,
)
self.punct = sherpa_onnx.OnlinePunctuation(punct_config)
else:
punct_config = sherpa_onnx.OfflinePunctuationConfig(
model=sherpa_onnx.OfflinePunctuationModelConfig(
ct_transformer=f"{self.model_path}/punct/model{self.ext}.onnx"
),
)
self.punct = sherpa_onnx.OfflinePunctuation(punct_config)
self.buffer = []
self.offset = 0
self.started = False
self.started_time = .0
self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
stdout_cmd('info', 'Sherpa-ONNX SenseVoice recognizer started.')
def send_audio_frame(self, data: bytes):
"""
Send an audio frame to the SOSV engine; the engine recognizes it automatically and writes the results to stdout
Args:
data: audio frame data; the sample rate must be 16000 Hz
"""
caption = {}
caption['command'] = 'caption'
caption['translation'] = ''
data_np = np.frombuffer(data, dtype=np.int16).astype(np.float32)
self.buffer = np.concatenate([self.buffer, data_np])
while self.offset + self.window_size < len(self.buffer):
self.vad.accept_waveform(self.buffer[self.offset: self.offset + self.window_size])
if not self.started and self.vad.is_speech_detected():
self.started = True
self.started_time = time.time()
self.offset += self.window_size
if not self.started:
if len(self.buffer) > 10 * self.window_size:
self.offset -= len(self.buffer) - 10 * self.window_size
self.buffer = self.buffer[-10 * self.window_size:]
if self.started and time.time() - self.started_time > 0.2:
stream = self.recognizer.create_stream()
stream.accept_waveform(16000, self.buffer)
self.recognizer.decode_stream(stream)
text = stream.result.text.strip()
if text and self.prev_content != text:
caption['index'] = self.cur_id
caption['text'] = text
caption['time_s'] = self.time_str
caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
self.prev_content = text
stdout_obj(caption)
self.started_time = time.time()
while not self.vad.empty():
stream = self.recognizer.create_stream()
stream.accept_waveform(16000, self.vad.front.samples)
self.vad.pop()
self.recognizer.decode_stream(stream)
text = stream.result.text.strip()
if self.source == 'en':
text_with_punct = self.punct.add_punctuation_with_case(text)
else:
text_with_punct = self.punct.add_punctuation(text)
caption['index'] = self.cur_id
caption['text'] = text_with_punct
caption['time_s'] = self.time_str
caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
if text:
stdout_obj(caption)
if self.target:
th = threading.Thread(
target=self.trans_func,
args=(self.ollama_name, self.target, caption['text'], self.time_str),
daemon=True
)
th.start()
self.cur_id += 1
self.prev_content = ''
self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
self.buffer = []
self.offset = 0
self.started = False
self.started_time = .0
def translate(self):
"""持续读取共享数据中的音频帧,并进行语音识别,将识别结果输出到标准输出中"""
global shared_data
while shared_data.status == 'running':
chunk = shared_data.chunk_queue.get()
self.send_audio_frame(chunk)
def stop(self):
"""停止 Sense Voice 模型"""
stdout_cmd('info', 'Shepra ONNX Sense Voice recognizer closed.')

View File

@@ -1,8 +1,11 @@
import json
import threading
import time
from datetime import datetime
from vosk import Model, KaldiRecognizer, SetLogLevel
from utils import stdout_cmd, stdout_obj
from utils import shared_data
from utils import stdout_cmd, stdout_obj, google_translate, ollama_translate
class VoskRecognizer:
@@ -11,14 +14,23 @@ class VoskRecognizer:
Init parameters:
model_path: path to the Vosk recognition model
target: target translation language
trans_model: translation model name
ollama_name: Ollama model name
"""
def __init__(self, model_path: str):
def __init__(self, model_path: str, target: str | None, trans_model: str, ollama_name: str):
SetLogLevel(-1)
if model_path.startswith('"'):
model_path = model_path[1:]
if model_path.endswith('"'):
model_path = model_path[:-1]
self.model_path = model_path
self.target = target
if trans_model == 'google':
self.trans_func = google_translate
else:
self.trans_func = ollama_translate
self.ollama_name = ollama_name
self.time_str = ''
self.cur_id = 0
self.prev_content = ''
@@ -48,7 +60,16 @@ class VoskRecognizer:
caption['time_s'] = self.time_str
caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
self.prev_content = ''
if content == '': return
self.cur_id += 1
if self.target:
th = threading.Thread(
target=self.trans_func,
args=(self.ollama_name, self.target, caption['text'], self.time_str),
daemon=True
)
th.start()
else:
content = json.loads(self.recognizer.PartialResult()).get('partial', '')
if content == '' or content == self.prev_content:
@@ -63,6 +84,13 @@ class VoskRecognizer:
stdout_obj(caption)
def translate(self):
"""持续读取共享数据中的音频帧,并进行语音识别,将识别结果输出到标准输出中"""
global shared_data
while shared_data.status == 'running':
chunk = shared_data.chunk_queue.get()
self.send_audio_frame(chunk)
def stop(self):
"""停止 Vosk 引擎"""
stdout_cmd('info', 'Vosk recognizer closed.')

View File

@@ -1,90 +1,153 @@
import wave
import argparse
from utils import stdout_cmd, stdout_err
from utils import thread_data, start_server
import threading
from utils import stdout, stdout_cmd
from utils import shared_data, start_server
from utils import merge_chunk_channels, resample_chunk_mono
from audio2text import InvalidParameter, GummyRecognizer
from audio2text import GummyRecognizer
from audio2text import VoskRecognizer
from audio2text import SosvRecognizer
from sysaudio import AudioStream
def audio_recording(stream: AudioStream, resample: bool, save = False, path = ''):
global shared_data
stream.open_stream()
wf = None
if save:
if path != '':
path += '/'
wf = wave.open(f'{path}record.wav', 'wb')
wf.setnchannels(stream.CHANNELS)
wf.setsampwidth(stream.SAMP_WIDTH)
wf.setframerate(stream.CHUNK_RATE)
while shared_data.status == 'running':
raw_chunk = stream.read_chunk()
if save: wf.writeframes(raw_chunk) # type: ignore
if raw_chunk is None: continue
if resample:
chunk = resample_chunk_mono(raw_chunk, stream.CHANNELS, stream.RATE, 16000)
else:
chunk = merge_chunk_channels(raw_chunk, stream.CHANNELS)
shared_data.chunk_queue.put(chunk)
if save: wf.close() # type: ignore
stream.close_stream_signal()
def main_gummy(s: str, t: str, a: int, c: int, k: str):
global thread_data
"""
Parameters:
s: Source language
t: Target language
a: Audio source: 0 for output, 1 for input
c: Chunk number in 1 second
k: Aliyun Bailian API key
"""
stream = AudioStream(a, c)
if t == 'none':
engine = GummyRecognizer(stream.RATE, s, None, k)
else:
engine = GummyRecognizer(stream.RATE, s, t, k)
stream.open_stream()
engine.start()
chunk_mono = bytes()
restart_count = 0
while thread_data.status == "running":
try:
chunk = stream.read_chunk()
if chunk is None: continue
chunk_mono = merge_chunk_channels(chunk, stream.CHANNELS)
try:
engine.send_audio_frame(chunk_mono)
except InvalidParameter as e:
restart_count += 1
if restart_count > 5:
stdout_err(str(e))
thread_data.status = "kill"
stdout_cmd('kill')
break
else:
stdout_cmd('info', f'Gummy engine stopped, restart attempt: {restart_count}...')
except KeyboardInterrupt:
break
engine.send_audio_frame(chunk_mono)
stream.close_stream()
stream_thread = threading.Thread(
target=audio_recording,
args=(stream, False),
daemon=True
)
stream_thread.start()
try:
engine.translate()
except KeyboardInterrupt:
stdout("Keyboard interrupt detected. Exiting...")
engine.stop()
def main_vosk(a: int, c: int, m: str):
global thread_data
def main_vosk(a: int, c: int, vosk: str, t: str, tm: str, omn: str):
"""
Parameters:
a: Audio source: 0 for output, 1 for input
c: Chunk number in 1 second
vosk: Vosk model path
t: Target language
tm: Translation model type, ollama or google
omn: Ollama model name
"""
stream = AudioStream(a, c)
engine = VoskRecognizer(m)
if t == 'none':
engine = VoskRecognizer(vosk, None, tm, omn)
else:
engine = VoskRecognizer(vosk, t, tm, omn)
stream.open_stream()
engine.start()
stream_thread = threading.Thread(
target=audio_recording,
args=(stream, True),
daemon=True
)
stream_thread.start()
try:
engine.translate()
except KeyboardInterrupt:
stdout("Keyboard interrupt detected. Exiting...")
engine.stop()
while thread_data.status == "running":
try:
chunk = stream.read_chunk()
if chunk is None: continue
chunk_mono = resample_chunk_mono(chunk, stream.CHANNELS, stream.RATE, 16000)
engine.send_audio_frame(chunk_mono)
except KeyboardInterrupt:
break
stream.close_stream()
def main_sosv(a: int, c: int, sosv: str, s: str, t: str, tm: str, omn: str):
"""
Parameters:
a: Audio source: 0 for output, 1 for input
c: Chunk number in 1 second
sosv: Sherpa-ONNX SenseVoice model path
s: Source language
t: Target language
tm: Translation model type, ollama or google
omn: Ollama model name
"""
stream = AudioStream(a, c)
if t == 'none':
engine = SosvRecognizer(sosv, s, None, tm, omn)
else:
engine = SosvRecognizer(sosv, s, t, tm, omn)
engine.start()
stream_thread = threading.Thread(
target=audio_recording,
args=(stream, True),
daemon=True
)
stream_thread.start()
try:
engine.translate()
except KeyboardInterrupt:
stdout("Keyboard interrupt detected. Exiting...")
engine.stop()
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Convert system audio stream to text')
# both
# all
parser.add_argument('-e', '--caption_engine', default='gummy', help='Caption engine: gummy, vosk or sosv')
parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output, 1 for input')
parser.add_argument('-c', '--chunk_rate', default=10, help='Number of audio stream chunks collected per second')
parser.add_argument('-p', '--port', default=8080, help='The port to run the server on, 0 for no server')
parser.add_argument('-p', '--port', default=0, help='The port to run the server on, 0 for no server')
parser.add_argument('-t', '--target_language', default='zh', help='Target language code, "none" for no translation')
# gummy and sosv
parser.add_argument('-s', '--source_language', default='auto', help='Source language code')
# gummy only
parser.add_argument('-s', '--source_language', default='en', help='Source language code')
parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
# vosk and sosv
parser.add_argument('-tm', '--translation_model', default='ollama', help='Model for translation: ollama or google')
parser.add_argument('-omn', '--ollama_name', default='', help='Ollama model name for translation')
# vosk only
parser.add_argument('-m', '--model_path', default='', help='The path to the vosk model.')
parser.add_argument('-vosk', '--vosk_model', default='', help='The path to the vosk model.')
# sosv only
parser.add_argument('-sosv', '--sosv_model', default=None, help='The SenseVoice model path')
args = parser.parse_args()
if int(args.port) == 0:
thread_data.status = "running"
shared_data.status = "running"
else:
start_server(int(args.port))
if args.caption_engine == 'gummy':
main_gummy(
args.source_language,
@@ -97,10 +160,23 @@ if __name__ == "__main__":
main_vosk(
int(args.audio_type),
int(args.chunk_rate),
args.model_path
args.vosk_model,
args.target_language,
args.translation_model,
args.ollama_name
)
elif args.caption_engine == 'sosv':
main_sosv(
int(args.audio_type),
int(args.chunk_rate),
args.sosv_model,
args.source_language,
args.target_language,
args.translation_model,
args.ollama_name
)
else:
raise ValueError('Invalid caption engine specified.')
if thread_data.status == "kill":
if shared_data.status == "kill":
stdout_cmd('kill')

View File

@@ -1,7 +1,10 @@
dashscope
numpy
samplerate
resampy
vosk
pyinstaller
pyaudio; sys_platform == 'darwin'
pyaudiowpatch; sys_platform == 'win32'
googletrans
ollama
sherpa_onnx

View File

@@ -37,14 +37,13 @@ class AudioStream:
self.FORMAT = pyaudio.paInt16
self.SAMP_WIDTH = pyaudio.get_sample_size(self.FORMAT)
self.CHANNELS = int(self.device["maxInputChannels"])
self.RATE = int(self.device["defaultSampleRate"])
self.CHUNK = self.RATE // chunk_rate
self.DEFAULT_RATE = int(self.device["defaultSampleRate"])
self.CHUNK_RATE = chunk_rate
def reset_chunk_size(self, chunk_size: int):
"""
Reset the audio chunk size.
"""
self.CHUNK = chunk_size
self.RATE = 16000
self.CHUNK = self.RATE // self.CHUNK_RATE
self.open_stream()
self.close_stream()
def get_info(self):
dev_info = f"""
@@ -72,16 +71,27 @@ class AudioStream:
Open and return the system audio output stream.
"""
if self.stream: return self.stream
self.stream = self.mic.open(
format = self.FORMAT,
channels = int(self.CHANNELS),
rate = self.RATE,
input = True,
input_device_index = int(self.INDEX)
)
try:
self.stream = self.mic.open(
format = self.FORMAT,
channels = int(self.CHANNELS),
rate = self.RATE,
input = True,
input_device_index = int(self.INDEX)
)
except OSError:
self.RATE = self.DEFAULT_RATE
self.CHUNK = self.RATE // self.CHUNK_RATE
self.stream = self.mic.open(
format = self.FORMAT,
channels = int(self.CHANNELS),
rate = self.RATE,
input = True,
input_device_index = int(self.INDEX)
)
return self.stream
def read_chunk(self):
def read_chunk(self) -> bytes | None:
"""
Read audio data.
"""

View File

@@ -55,15 +55,10 @@ class AudioStream:
self.FORMAT = 16
self.SAMP_WIDTH = 2
self.CHANNELS = 2
self.RATE = 48000
self.RATE = 16000
self.CHUNK_RATE = chunk_rate
self.CHUNK = self.RATE // chunk_rate
def reset_chunk_size(self, chunk_size: int):
"""
Reset the audio chunk size.
"""
self.CHUNK = chunk_size
def get_info(self):
dev_info = f"""
音频捕获进程:
@@ -84,7 +79,7 @@ class AudioStream:
Start the audio capture process.
"""
self.process = subprocess.Popen(
["parec", "-d", self.source, "--format=s16le", "--rate=48000", "--channels=2"],
["parec", "-d", self.source, "--format=s16le", "--rate=16000", "--channels=2"],
stdout=subprocess.PIPE
)

View File

@@ -61,14 +61,13 @@ class AudioStream:
self.FORMAT = pyaudio.paInt16
self.SAMP_WIDTH = pyaudio.get_sample_size(self.FORMAT)
self.CHANNELS = int(self.device["maxInputChannels"])
self.RATE = int(self.device["defaultSampleRate"])
self.CHUNK = self.RATE // chunk_rate
self.DEFAULT_RATE = int(self.device["defaultSampleRate"])
self.CHUNK_RATE = chunk_rate
def reset_chunk_size(self, chunk_size: int):
"""
Reset the audio chunk size.
"""
self.CHUNK = chunk_size
self.RATE = 16000
self.CHUNK = self.RATE // self.CHUNK_RATE
self.open_stream()
self.close_stream()
def get_info(self):
dev_info = f"""
@@ -96,13 +95,24 @@ class AudioStream:
Open and return the system audio output stream.
"""
if self.stream: return self.stream
self.stream = self.mic.open(
format = self.FORMAT,
channels = self.CHANNELS,
rate = self.RATE,
input = True,
input_device_index = self.INDEX
)
try:
self.stream = self.mic.open(
format = self.FORMAT,
channels = self.CHANNELS,
rate = self.RATE,
input = True,
input_device_index = self.INDEX
)
except OSError:
self.RATE = self.DEFAULT_RATE
self.CHUNK = self.RATE // self.CHUNK_RATE
self.stream = self.mic.open(
format = self.FORMAT,
channels = self.CHANNELS,
rate = self.RATE,
input = True,
input_device_index = self.INDEX
)
return self.stream
def read_chunk(self) -> bytes | None:

View File

@@ -1,9 +1,5 @@
from .audioprcs import (
merge_chunk_channels,
resample_chunk_mono,
resample_chunk_mono_np,
resample_mono_chunk
)
from .audioprcs import merge_chunk_channels, resample_chunk_mono
from .sysout import stdout, stdout_err, stdout_cmd, stdout_obj, stderr
from .thdata import thread_data
from .server import start_server
from .shared import shared_data
from .server import start_server
from .translation import ollama_translate, google_translate

View File

@@ -1,4 +1,4 @@
import samplerate
import resampy
import numpy as np
import numpy.core.multiarray # do not remove
@@ -24,16 +24,15 @@ def merge_chunk_channels(chunk: bytes, channels: int) -> bytes:
return chunk_mono.tobytes()
def resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes:
def resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int) -> bytes:
"""
Convert the current multi-channel audio chunk to mono, then resample it
Convert the current multi-channel audio chunk to mono and resample it
Args:
chunk: multi-channel audio chunk
channels: number of channels
orig_sr: original sample rate
target_sr: target sample rate
mode: resampling mode, one of 'sinc_best' | 'sinc_medium' | 'sinc_fastest' | 'zero_order_hold' | 'linear'
Return:
mono audio chunk
@@ -49,60 +48,17 @@ def resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: in
# (length,)
chunk_mono = np.mean(chunk_np.astype(np.float32), axis=1)
ratio = target_sr / orig_sr
chunk_mono_r = samplerate.resample(chunk_mono, ratio, converter_type=mode)
if orig_sr == target_sr:
return chunk_mono.astype(np.int16).tobytes()
chunk_mono_r = resampy.resample(chunk_mono, orig_sr, target_sr)
chunk_mono_r = np.round(chunk_mono_r).astype(np.int16)
return chunk_mono_r.tobytes()
def resample_chunk_mono_np(chunk: bytes, channels: int, orig_sr: int, target_sr: int, mode="sinc_best", dtype=np.float32) -> np.ndarray:
"""
Convert the current multi-channel audio chunk to mono, then resample it and return a NumPy array
Args:
chunk: multi-channel audio chunk
channels: number of channels
orig_sr: original sample rate
target_sr: target sample rate
mode: resampling mode, one of 'sinc_best' | 'sinc_medium' | 'sinc_fastest' | 'zero_order_hold' | 'linear'
dtype: dtype of the returned NumPy array
Return:
mono audio chunk
"""
if channels == 1:
chunk_mono = np.frombuffer(chunk, dtype=np.int16)
chunk_mono = chunk_mono.astype(np.float32)
real_len = round(chunk_mono.shape[0] * target_sr / orig_sr)
if(chunk_mono_r.shape[0] != real_len):
print(chunk_mono_r.shape[0], real_len)
if(chunk_mono_r.shape[0] > real_len):
chunk_mono_r = chunk_mono_r[:real_len]
else:
# (length * channels,)
chunk_np = np.frombuffer(chunk, dtype=np.int16)
# (length, channels)
chunk_np = chunk_np.reshape(-1, channels)
# (length,)
chunk_mono = np.mean(chunk_np.astype(np.float32), axis=1)
ratio = target_sr / orig_sr
chunk_mono_r = samplerate.resample(chunk_mono, ratio, converter_type=mode)
chunk_mono_r = chunk_mono_r.astype(dtype)
return chunk_mono_r
def resample_mono_chunk(chunk: bytes, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes:
"""
Resample the current mono audio chunk
Args:
chunk: mono audio chunk
orig_sr: original sample rate
target_sr: target sample rate
mode: resampling mode, one of 'sinc_best' | 'sinc_medium' | 'sinc_fastest' | 'zero_order_hold' | 'linear'
Return:
mono audio chunk
"""
chunk_np = np.frombuffer(chunk, dtype=np.int16)
chunk_np = chunk_np.astype(np.float32)
ratio = target_sr / orig_sr
chunk_r = samplerate.resample(chunk_np, ratio, converter_type=mode)
chunk_r = np.round(chunk_r).astype(np.int16)
return chunk_r.tobytes()
while chunk_mono_r.shape[0] < real_len:
chunk_mono_r = np.append(chunk_mono_r, chunk_mono_r[-1])
return chunk_mono_r.tobytes()

View File

@@ -1,12 +1,12 @@
import socket
import threading
import json
from utils import thread_data, stdout_cmd, stderr
from utils import shared_data, stdout_cmd, stderr
def handle_client(client_socket):
global thread_data
while thread_data.status == 'running':
global shared_data
while shared_data.status == 'running':
try:
data = client_socket.recv(4096).decode('utf-8')
if not data:
@@ -14,13 +14,13 @@ def handle_client(client_socket):
data = json.loads(data)
if data['command'] == 'stop':
thread_data.status = 'stop'
shared_data.status = 'stop'
break
except Exception as e:
stderr(f'Communication error: {e}')
break
thread_data.status = 'stop'
shared_data.status = 'stop'
client_socket.close()

8
engine/utils/shared.py Normal file
View File

@@ -0,0 +1,8 @@
import queue
class SharedData:
def __init__(self):
self.status = "running"
self.chunk_queue = queue.Queue()
shared_data = SharedData()

View File

@@ -1,5 +0,0 @@
class ThreadData:
def __init__(self):
self.status = "running"
thread_data = ThreadData()

View File

@@ -0,0 +1,49 @@
from ollama import chat
from ollama import ChatResponse
import asyncio
from googletrans import Translator
from .sysout import stdout_cmd, stdout_obj
lang_map = {
'en': 'English',
'es': 'Spanish',
'fr': 'French',
'de': 'German',
'it': 'Italian',
'ru': 'Russian',
'ja': 'Japanese',
'ko': 'Korean',
'zh': 'Chinese',
'zh-cn': 'Chinese'
}
def ollama_translate(model: str, target: str, text: str, time_s: str):
response: ChatResponse = chat(
model=model,
messages=[
{"role": "system", "content": f"/no_think Translate the following content into {lang_map[target]}, and do not output any additional information."},
{"role": "user", "content": text}
]
)
content = response.message.content or ""
if content.startswith('<think>'):
index = content.find('</think>')
if index != -1:
content = content[index+8:]
stdout_obj({
"command": "translation",
"time_s": time_s,
"translation": content.strip()
})
def google_translate(model: str, target: str, text: str, time_s: str):
translator = Translator()
try:
res = asyncio.run(translator.translate(text, dest=target))
stdout_obj({
"command": "translation",
"time_s": time_s,
"translation": res.text
})
except Exception as e:
stdout_cmd("warn", f"Google translation request failed, please check your network connection...")

View File

@@ -159,6 +159,10 @@ class ControlWindow {
captionEngine.stop()
})
ipcMain.on('control.engine.forceKill', () => {
captionEngine.kill()
})
ipcMain.on('control.captionLog.clear', () => {
allConfig.captionLog.splice(0)
})

View File

@@ -4,5 +4,6 @@ export default {
"engine.start.error": "Caption engine failed to start: ",
"engine.output.parse.error": "Unable to parse caption engine output as a JSON object: ",
"engine.error": "Caption engine error: ",
"engine.shutdown.error": "Failed to shut down the caption engine process: "
"engine.shutdown.error": "Failed to shut down the caption engine process: ",
"engine.start.timeout": "Caption engine startup timeout, automatically force stopped"
}

View File

@@ -4,5 +4,6 @@ export default {
"engine.start.error": "字幕エンジンの起動に失敗しました: ",
"engine.output.parse.error": "字幕エンジンの出力を JSON オブジェクトとして解析できませんでした: ",
"engine.error": "字幕エンジンエラー: ",
"engine.shutdown.error": "字幕エンジンプロセスの終了に失敗しました: "
"engine.shutdown.error": "字幕エンジンプロセスの終了に失敗しました: ",
"engine.start.timeout": "字幕エンジンの起動がタイムアウトしました。自動的に強制停止しました"
}

View File

@@ -4,5 +4,6 @@ export default {
"engine.start.error": "字幕引擎启动失败:",
"engine.output.parse.error": "字幕引擎输出内容无法解析为 JSON 对象:",
"engine.error": "字幕引擎错误:",
"engine.shutdown.error": "字幕引擎进程关闭失败:"
"engine.shutdown.error": "字幕引擎进程关闭失败:",
"engine.start.timeout": "字幕引擎启动超时,已自动强制停止"
}

View File

@@ -6,6 +6,8 @@ export interface Controls {
engineEnabled: boolean,
sourceLang: string,
targetLang: string,
transModel: string,
ollamaName: string,
engine: string,
audio: 0 | 1,
translation: boolean,
@@ -13,7 +15,8 @@ export interface Controls {
modelPath: string,
customized: boolean,
customizedApp: string,
customizedCommand: string
customizedCommand: string,
startTimeoutSeconds: number
}
export interface Styles {

View File

@@ -7,6 +7,11 @@ import { app, BrowserWindow } from 'electron'
import * as path from 'path'
import * as fs from 'fs'
interface CaptionTranslation {
time_s: string,
translation: string
}
const defaultStyles: Styles = {
lineBreak: 1,
fontFamily: 'sans-serif',
@@ -31,6 +36,8 @@ const defaultStyles: Styles = {
const defaultControls: Controls = {
sourceLang: 'en',
targetLang: 'zh',
transModel: 'ollama',
ollamaName: '',
engine: 'gummy',
audio: 0,
engineEnabled: false,
@@ -39,7 +46,8 @@ const defaultControls: Controls = {
translation: true,
customized: false,
customizedApp: '',
customizedCommand: ''
customizedCommand: '',
startTimeoutSeconds: 30
};
@@ -157,12 +165,28 @@ class AllConfig {
}
}
public sendCaptionLog(window: BrowserWindow, command: 'add' | 'upd' | 'set') {
public updateCaptionTranslation(trans: CaptionTranslation){
for(let i = this.captionLog.length - 1; i >= 0; i--){
if(this.captionLog[i].time_s === trans.time_s){
this.captionLog[i].translation = trans.translation
for(const window of BrowserWindow.getAllWindows()){
this.sendCaptionLog(window, 'upd', i)
}
break
}
}
}
public sendCaptionLog(
window: BrowserWindow,
command: 'add' | 'upd' | 'set',
index: number | undefined = undefined
) {
if(command === 'add'){
window.webContents.send(`both.captionLog.add`, this.captionLog[this.captionLog.length - 1])
window.webContents.send(`both.captionLog.add`, this.captionLog.at(-1))
}
else if(command === 'upd'){
window.webContents.send(`both.captionLog.upd`, this.captionLog[this.captionLog.length - 1])
if(index !== undefined) window.webContents.send(`both.captionLog.upd`, this.captionLog[index])
else window.webContents.send(`both.captionLog.upd`, this.captionLog.at(-1))
}
else if(command === 'set'){
window.webContents.send(`both.captionLog.set`, this.captionLog)

View File

@@ -14,8 +14,9 @@ export class CaptionEngine {
process: any | undefined
client: net.Socket | undefined
port: number = 8080
status: 'running' | 'starting' | 'stopping' | 'stopped' = 'stopped'
status: 'running' | 'starting' | 'stopping' | 'stopped' | 'starting-timeout' = 'stopped'
timerID: NodeJS.Timeout | undefined
startTimeoutID: NodeJS.Timeout | undefined
private getApp(): boolean {
if (allConfig.controls.customized) {
@@ -66,22 +67,23 @@ export class CaptionEngine {
this.command.push('-a', allConfig.controls.audio ? '1' : '0')
this.port = Math.floor(Math.random() * (65535 - 1024 + 1)) + 1024
this.command.push('-p', this.port.toString())
this.command.push(
'-t', allConfig.controls.translation ?
allConfig.controls.targetLang : 'none'
)
if(allConfig.controls.engine === 'gummy') {
this.command.push('-e', 'gummy')
this.command.push('-s', allConfig.controls.sourceLang)
this.command.push(
'-t', allConfig.controls.translation ?
allConfig.controls.targetLang : 'none'
)
if(allConfig.controls.API_KEY) {
this.command.push('-k', allConfig.controls.API_KEY)
}
}
else if(allConfig.controls.engine === 'vosk'){
this.command.push('-e', 'vosk')
this.command.push('-m', `"${allConfig.controls.modelPath}"`)
this.command.push('-vosk', `"${allConfig.controls.modelPath}"`)
this.command.push('-tm', allConfig.controls.transModel)
this.command.push('-omn', allConfig.controls.ollamaName)
}
}
Log.info('Engine Path:', this.appPath)
@@ -96,6 +98,10 @@ export class CaptionEngine {
public connect() {
if(this.client) { Log.warn('Client already exists, ignoring...') }
if (this.startTimeoutID) {
clearTimeout(this.startTimeoutID)
this.startTimeoutID = undefined
}
this.client = net.createConnection({ port: this.port }, () => {
Log.info('Connected to caption engine server');
});
@@ -130,6 +136,16 @@ export class CaptionEngine {
this.process = spawn(this.appPath, this.command)
this.status = 'starting'
Log.info('Caption Engine Starting, PID:', this.process.pid)
const timeoutMs = allConfig.controls.startTimeoutSeconds * 1000
this.startTimeoutID = setTimeout(() => {
if (this.status === 'starting') {
Log.warn(`Engine start timeout after ${allConfig.controls.startTimeoutSeconds} seconds, forcing kill...`)
this.status = 'starting-timeout'
controlWindow.sendErrorMessage(i18n('engine.start.timeout'))
this.kill()
}
}, timeoutMs)
this.process.stdout.on('data', (data: any) => {
const lines = data.toString().split('\n')
@@ -165,6 +181,10 @@ export class CaptionEngine {
}
this.status = 'stopped'
clearInterval(this.timerID)
if (this.startTimeoutID) {
clearTimeout(this.startTimeoutID)
this.startTimeoutID = undefined
}
Log.info(`Engine exited with code ${code}`)
});
}
@@ -172,7 +192,6 @@ export class CaptionEngine {
public stop() {
if(this.status !== 'running'){
Log.warn('Trying to stop engine which is not running, current status:', this.status)
return
}
this.sendCommand('stop')
if(this.client){
@@ -192,19 +211,29 @@ export class CaptionEngine {
if(this.status !== 'running'){
Log.warn('Trying to kill engine which is not running, current status:', this.status)
}
Log.warn('Trying to kill engine process, PID:', this.process.pid)
Log.warn('Killing engine process, PID:', this.process.pid)
if (this.startTimeoutID) {
clearTimeout(this.startTimeoutID)
this.startTimeoutID = undefined
}
if(this.client){
this.client.destroy()
this.client = undefined
}
if (this.process.pid) {
let cmd = `kill ${this.process.pid}`;
let cmd = `kill -9 ${this.process.pid}`;
if (process.platform === "win32") {
cmd = `taskkill /pid ${this.process.pid} /t /f`
}
exec(cmd)
exec(cmd, (error) => {
if (error) {
Log.error('Failed to kill process:', error)
} else {
Log.info('Process killed successfully')
}
})
}
this.status = 'stopping'
}
}
@@ -221,12 +250,18 @@ function handleEngineData(data: any) {
else if(data.command === 'caption') {
allConfig.updateCaptionLog(data);
}
else if(data.command === 'translation') {
allConfig.updateCaptionTranslation(data);
}
else if(data.command === 'print') {
Log.info('Engine Print:', data.content)
console.log(data.content)
}
else if(data.command === 'info') {
Log.info('Engine Info:', data.content)
}
else if(data.command === 'warn') {
Log.warn('Engine Warn:', data.content)
}
else if(data.command === 'error') {
Log.error('Engine Error:', data.content)
controlWindow.sendErrorMessage(/*i18n('engine.error') +*/ data.content)

View File

@@ -5,9 +5,18 @@
<a @click="applyChange">{{ $t('engine.applyChange') }}</a> |
<a @click="cancelChange">{{ $t('engine.cancelChange') }}</a>
</template>
<div class="input-item">
<span class="input-label">{{ $t('engine.captionEngine') }}</span>
<a-select
class="input-area"
v-model:value="currentEngine"
:options="captionEngine"
></a-select>
</div>
<div class="input-item">
<span class="input-label">{{ $t('engine.sourceLang') }}</span>
<a-select
:disabled="currentEngine === 'vosk'"
class="input-area"
v-model:value="currentSourceLang"
:options="langList"
@@ -16,20 +25,33 @@
<div class="input-item">
<span class="input-label">{{ $t('engine.transLang') }}</span>
<a-select
:disabled="currentEngine === 'vosk'"
class="input-area"
v-model:value="currentTargetLang"
:options="langList.filter((item) => item.value !== 'auto')"
></a-select>
</div>
<div class="input-item">
<span class="input-label">{{ $t('engine.captionEngine') }}</span>
<div class="input-item" v-if="transModel">
<span class="input-label">{{ $t('engine.transModel') }}</span>
<a-select
class="input-area"
v-model:value="currentEngine"
:options="captionEngine"
v-model:value="currentTransModel"
:options="transModel"
></a-select>
</div>
<div class="input-item" v-if="transModel && currentTransModel === 'ollama'">
<a-popover placement="right">
<template #content>
<p class="label-hover-info">{{ $t('engine.ollamaNote') }}</p>
</template>
<span class="input-label info-label"
:style="{color: uiColor}"
>{{ $t('engine.ollama') }}</span>
</a-popover>
<a-input
class="input-area"
v-model:value="currentOllamaName"
></a-input>
</div>
<div class="input-item">
<span class="input-label">{{ $t('engine.audioType') }}</span>
<a-select
@@ -80,11 +102,13 @@
<a-card size="small" :title="$t('engine.showMore')" v-show="showMore" style="margin-top:10px;">
<div class="input-item">
<a-popover>
<a-popover placement="right">
<template #content>
<p class="label-hover-info">{{ $t('engine.apikeyInfo') }}</p>
</template>
<span class="input-label info-label">{{ $t('engine.apikey') }}</span>
<span class="input-label info-label"
:style="{color: uiColor}"
>{{ $t('engine.apikey') }}</span>
</a-popover>
<a-input
class="input-area"
@@ -93,14 +117,17 @@
/>
</div>
<div class="input-item">
<a-popover>
<a-popover placement="right">
<template #content>
<p class="label-hover-info">{{ $t('engine.modelPathInfo') }}</p>
</template>
<span class="input-label info-label">{{ $t('engine.modelPath') }}</span>
<span class="input-label info-label"
:style="{color: uiColor}"
>{{ $t('engine.modelPath') }}</span>
</a-popover>
<span
class="input-folder"
:style="{color: uiColor}"
@click="selectFolderPath"
><span><FolderOpenOutlined /></span></span>
<a-input
@@ -109,18 +136,37 @@
v-model:value="currentModelPath"
/>
</div>
<div class="input-item">
<a-popover placement="right">
<template #content>
<p class="label-hover-info">{{ $t('engine.startTimeoutInfo') }}</p>
</template>
<span
class="input-label info-label"
:style="{color: uiColor, verticalAlign: 'middle'}"
>{{ $t('engine.startTimeout') }}</span>
</a-popover>
<a-input-number
class="input-area"
v-model:value="currentStartTimeoutSeconds"
:min="10"
:max="120"
:step="5"
:addon-after="$t('engine.seconds')"
/>
</div>
</a-card>
</a-card>
<div style="height: 20px;"></div>
</template>
<script setup lang="ts">
import { ref, computed, watch } from 'vue'
import { ref, computed, watch, h } from 'vue'
import { storeToRefs } from 'pinia'
import { useGeneralSettingStore } from '@renderer/stores/generalSetting'
import { useEngineControlStore } from '@renderer/stores/engineControl'
import { notification } from 'ant-design-vue'
import { FolderOpenOutlined ,InfoCircleOutlined } from '@ant-design/icons-vue';
import { ExclamationCircleOutlined, FolderOpenOutlined ,InfoCircleOutlined } from '@ant-design/icons-vue';
import { useI18n } from 'vue-i18n'
const { t } = useI18n()
@@ -129,16 +175,22 @@ const showMore = ref(false)
const engineControl = useEngineControlStore()
const { captionEngine, audioType, changeSignal } = storeToRefs(engineControl)
const generalSetting = useGeneralSettingStore()
const { uiColor } = storeToRefs(generalSetting)
const currentSourceLang = ref('auto')
const currentTargetLang = ref('zh')
const currentEngine = ref<string>('gummy')
const currentAudio = ref<0 | 1>(0)
const currentTranslation = ref<boolean>(false)
const currentTranslation = ref<boolean>(true)
const currentTransModel = ref('ollama')
const currentOllamaName = ref('')
const currentAPI_KEY = ref<string>('')
const currentModelPath = ref<string>('')
const currentCustomized = ref<boolean>(false)
const currentCustomizedApp = ref('')
const currentCustomizedCommand = ref('')
const currentStartTimeoutSeconds = ref<number>(30)
const langList = computed(() => {
for(let item of captionEngine.value){
@@ -149,9 +201,33 @@ const langList = computed(() => {
return []
})
const transModel = computed(() => {
for(let item of captionEngine.value){
if(item.value === currentEngine.value) {
return item.transModel
}
}
return []
})
function applyChange(){
if(
currentTranslation.value && transModel.value &&
currentTransModel.value === 'ollama' && !currentOllamaName.value.trim()
) {
notification.open({
message: t('noti.ollamaNameNull'),
description: t('noti.ollamaNameNullNote'),
duration: null,
icon: () => h(ExclamationCircleOutlined, { style: 'color: #ff4d4f' })
})
return
}
engineControl.sourceLang = currentSourceLang.value
engineControl.targetLang = currentTargetLang.value
engineControl.transModel = currentTransModel.value
engineControl.ollamaName = currentOllamaName.value
engineControl.engine = currentEngine.value
engineControl.audio = currentAudio.value
engineControl.translation = currentTranslation.value
@@ -160,6 +236,7 @@ function applyChange(){
engineControl.customized = currentCustomized.value
engineControl.customizedApp = currentCustomizedApp.value
engineControl.customizedCommand = currentCustomizedCommand.value
engineControl.startTimeoutSeconds = currentStartTimeoutSeconds.value
engineControl.sendControlsChange()
@@ -173,6 +250,8 @@ function applyChange(){
function cancelChange(){
currentSourceLang.value = engineControl.sourceLang
currentTargetLang.value = engineControl.targetLang
currentTransModel.value = engineControl.transModel
currentOllamaName.value = engineControl.ollamaName
currentEngine.value = engineControl.engine
currentAudio.value = engineControl.audio
currentTranslation.value = engineControl.translation
@@ -181,6 +260,7 @@ function cancelChange(){
currentCustomized.value = engineControl.customized
currentCustomizedApp.value = engineControl.customizedApp
currentCustomizedCommand.value = engineControl.customizedCommand
currentStartTimeoutSeconds.value = engineControl.startTimeoutSeconds
}
function selectFolderPath() {
@@ -200,7 +280,10 @@ watch(changeSignal, (val) => {
watch(currentEngine, (val) => {
if(val == 'vosk'){
currentSourceLang.value = 'auto'
currentTargetLang.value = ''
currentTargetLang.value = useGeneralSettingStore().uiLanguage
if(currentTargetLang.value === 'zh') {
currentTargetLang.value = 'zh-cn'
}
}
else if(val == 'gummy'){
currentSourceLang.value = 'auto'
@@ -218,8 +301,8 @@ watch(currentEngine, (val) => {
}
.info-label {
color: #1677ff;
cursor: pointer;
font-style: italic;
}
.input-folder {
@@ -230,20 +313,12 @@ watch(currentEngine, (val) => {
transition: all 0.25s;
}
.input-folder>span {
padding: 0 2px;
border: 2px solid #1677ff;
color: #1677ff;
border-radius: 30%;
}
.input-folder:hover {
transform: scale(1.1);
}
.customize-note {
padding: 10px 10px 0;
color: red;
max-width: min(40vw, 480px);
}
</style>

View File

@@ -67,11 +67,26 @@
@click="openCaptionWindow"
>{{ $t('status.openCaption') }}</a-button>
<a-button
v-if="!isStarting"
class="control-button"
:loading="pending && !engineEnabled"
:disabled="pending || engineEnabled"
@click="startEngine"
>{{ $t('status.startEngine') }}</a-button>
<a-popconfirm
v-if="isStarting"
:title="$t('status.forceKillConfirm')"
:ok-text="$t('status.confirm')"
:cancel-text="$t('status.cancel')"
@confirm="forceKillEngine"
>
<a-button
danger
class="control-button"
type="primary"
:icon="h(LoadingOutlined)"
>{{ $t('status.forceKillStarting') }}</a-button>
</a-popconfirm>
<a-button
danger class="control-button"
:loading="pending && engineEnabled"
@@ -128,15 +143,16 @@
<script setup lang="ts">
import { EngineInfo } from '@renderer/types'
import { ref, watch } from 'vue'
import { ref, watch, h } from 'vue'
import { storeToRefs } from 'pinia'
import { useCaptionLogStore } from '@renderer/stores/captionLog'
import { useSoftwareLogStore } from '@renderer/stores/softwareLog'
import { useEngineControlStore } from '@renderer/stores/engineControl'
import { GithubOutlined, InfoCircleOutlined } from '@ant-design/icons-vue'
import { GithubOutlined, InfoCircleOutlined, LoadingOutlined } from '@ant-design/icons-vue'
const showAbout = ref(false)
const pending = ref(false)
const isStarting = ref(false)
const captionLog = useCaptionLogStore()
const { captionData } = storeToRefs(captionLog)
@@ -158,8 +174,11 @@ function openCaptionWindow() {
function startEngine() {
pending.value = true
isStarting.value = true
if(engineControl.engine === 'vosk' && engineControl.modelPath.trim() === '') {
engineControl.emptyModelPathErr()
pending.value = false
isStarting.value = false
return
}
window.electron.ipcRenderer.send('control.engine.start')
@@ -170,6 +189,12 @@ function stopEngine() {
window.electron.ipcRenderer.send('control.engine.stop')
}
function forceKillEngine() {
pending.value = true
isStarting.value = false
window.electron.ipcRenderer.send('control.engine.forceKill')
}
function getEngineInfo() {
window.electron.ipcRenderer.invoke('control.engine.info').then((data: EngineInfo) => {
pid.value = data.pid
@@ -181,12 +206,16 @@ function getEngineInfo() {
})
}
watch(engineEnabled, () => {
watch(engineEnabled, (enabled) => {
pending.value = false
if (enabled) {
isStarting.value = false
}
})
watch(errorSignal, () => {
pending.value = false
isStarting.value = false
errorSignal.value = false
})
</script>

View File

@@ -21,6 +21,19 @@ export const engines = {
label: '本地 - Vosk',
languages: [
{ value: 'auto', label: '需要自行配置模型' },
{ value: 'en', label: '英语' },
{ value: 'zh-cn', label: '中文' },
{ value: 'ja', label: '日语' },
{ value: 'ko', label: '韩语' },
{ value: 'de', label: '德语' },
{ value: 'fr', label: '法语' },
{ value: 'ru', label: '俄语' },
{ value: 'es', label: '西班牙语' },
{ value: 'it', label: '意大利语' },
],
transModel: [
{ value: 'ollama', label: 'Ollama 本地模型' },
{ value: 'google', label: 'Google API 调用' },
]
}
],
@@ -46,6 +59,19 @@ export const engines = {
label: 'Local - Vosk',
languages: [
{ value: 'auto', label: 'Model needs to be configured manually' },
{ value: 'en', label: 'English' },
{ value: 'zh-cn', label: 'Chinese' },
{ value: 'ja', label: 'Japanese' },
{ value: 'ko', label: 'Korean' },
{ value: 'de', label: 'German' },
{ value: 'fr', label: 'French' },
{ value: 'ru', label: 'Russian' },
{ value: 'es', label: 'Spanish' },
{ value: 'it', label: 'Italian' },
],
transModel: [
{ value: 'ollama', label: 'Ollama Local Model' },
{ value: 'google', label: 'Google API Call' },
]
}
],
@@ -71,8 +97,20 @@ export const engines = {
label: 'ローカル - Vosk',
languages: [
{ value: 'auto', label: 'モデルを手動で設定する必要があります' },
{ value: 'en', label: '英語' },
{ value: 'zh-cn', label: '中国語' },
{ value: 'ja', label: '日本語' },
{ value: 'ko', label: '韓国語' },
{ value: 'de', label: 'ドイツ語' },
{ value: 'fr', label: 'フランス語' },
{ value: 'ru', label: 'ロシア語' },
{ value: 'es', label: 'スペイン語' },
{ value: 'it', label: 'イタリア語' },
],
transModel: [
{ value: 'ollama', label: 'Ollama ローカルモデル' },
{ value: 'google', label: 'Google API 呼び出し' },
]
}
]
}

View File

@@ -27,7 +27,10 @@ export default {
"engineChange": "Cpation Engine Configuration Changed",
"changeInfo": "If the caption engine is already running, you need to restart it for the changes to take effect.",
"styleChange": "Caption Style Changed",
"styleInfo": "Caption style changes have been saved and applied."
"styleInfo": "Caption style changes have been saved and applied.",
"engineStartTimeout": "Caption engine startup timeout, automatically force stopped",
"ollamaNameNull": "'Ollama' Field is Empty",
"ollamaNameNullNote": "When selecting Ollama model as the translation model, the 'Ollama' field cannot be empty and must be filled with the name of a locally configured Ollama model."
},
general: {
"title": "General Settings",
@@ -46,6 +49,9 @@ export default {
"cancelChange": "Cancel Changes",
"sourceLang": "Source",
"transLang": "Translation",
"transModel": "Model",
"ollama": "Ollama",
"ollamaNote": "To use for translation, the name of the local Ollama model that will call the service on the default port. It is recommended to use a non-inference model with less than 1B parameters.",
"captionEngine": "Engine",
"audioType": "Audio Type",
"systemOutput": "System Audio Output (Speaker)",
@@ -54,8 +60,11 @@ export default {
"showMore": "More Settings",
"apikey": "API KEY",
"modelPath": "Model Path",
"startTimeout": "Timeout",
"seconds": "seconds",
"apikeyInfo": "API KEY required for the Gummy subtitle engine, which needs to be obtained from the Alibaba Cloud Bailing platform. For more details, see the project user manual.",
"modelPathInfo": "The folder path of the model required by the Vosk subtitle engine. You need to download the required model to your local machine in advance. For more details, see the project user manual.",
"startTimeoutInfo": "Caption engine startup timeout duration. Engine will be forcefully stopped if startup exceeds this time. Recommended range: 10-120 seconds.",
"customEngine": "Custom Engine",
custom: {
"title": "Custom Caption Engine",
@@ -112,6 +121,11 @@ export default {
"startEngine": "Start Caption Engine",
"restartEngine": "Restart Caption Engine",
"stopEngine": "Stop Caption Engine",
"forceKill": "Force Stop",
"forceKillStarting": "Starting Engine... (Force Stop)",
"forceKillConfirm": "Are you sure you want to force stop the caption engine? This will terminate the process immediately.",
"confirm": "Confirm",
"cancel": "Cancel",
about: {
"title": "About This Project",
"proj": "Auto Caption Project",

View File

@@ -27,7 +27,10 @@ export default {
"engineChange": "字幕エンジンの設定が変更されました",
"changeInfo": "字幕エンジンがすでに起動している場合、変更を有効にするには再起動が必要です。",
"styleChange": "字幕のスタイルが変更されました",
"styleInfo": "字幕のスタイル変更が保存され、適用されました"
"styleInfo": "字幕のスタイル変更が保存され、適用されました",
"engineStartTimeout": "字幕エンジンの起動がタイムアウトしました。自動的に強制停止しました",
"ollamaNameNull": "Ollama フィールドが空です",
"ollamaNameNullNote": "Ollama モデルを翻訳モデルとして選択する場合、Ollama フィールドは空にできません。ローカルで設定された Ollama モデルの名前を入力してください。"
},
general: {
"title": "一般設定",
@@ -46,6 +49,9 @@ export default {
"cancelChange": "変更をキャンセル",
"sourceLang": "ソース言語",
"transLang": "翻訳言語",
"transModel": "翻訳モデル",
"ollama": "Ollama",
"ollamaNote": "翻訳に使用する、デフォルトポートでサービスを呼び出すローカルOllamaモデルの名前。1B 未満のパラメータを持つ非推論モデルの使用を推奨します。",
"captionEngine": "エンジン",
"audioType": "オーディオ",
"systemOutput": "システムオーディオ出力(スピーカー)",
@@ -54,8 +60,11 @@ export default {
"showMore": "詳細設定",
"apikey": "API KEY",
"modelPath": "モデルパス",
"startTimeout": "時間制限",
"seconds": "秒",
"apikeyInfo": "Gummy 字幕エンジンに必要な API KEY は、アリババクラウド百煉プラットフォームから取得する必要があります。詳細情報はプロジェクトのユーザーマニュアルをご覧ください。",
"modelPathInfo": "Vosk 字幕エンジンに必要なモデルのフォルダパスです。必要なモデルを事前にローカルマシンにダウンロードする必要があります。詳細情報はプロジェクトのユーザーマニュアルをご覧ください。",
"startTimeoutInfo": "字幕エンジンの起動タイムアウト時間です。この時間を超えると自動的に強制停止されます。10-120秒の範囲で設定することを推奨します。",
"customEngine": "カスタムエンジン",
custom: {
"title": "カスタムキャプションエンジン",
@@ -112,6 +121,11 @@ export default {
"startEngine": "字幕エンジンを開始",
"restartEngine": "字幕エンジンを再起動",
"stopEngine": "字幕エンジンを停止",
"forceKill": "強制停止",
"forceKillStarting": "エンジン起動中... (強制停止)",
"forceKillConfirm": "字幕エンジンを強制停止しますか?プロセスが直ちに終了されます。",
"confirm": "確認",
"cancel": "キャンセル",
about: {
"title": "このプロジェクトについて",
"proj": "Auto Caption プロジェクト",

View File

@@ -27,7 +27,10 @@ export default {
"engineChange": "字幕引擎配置已更改",
"changeInfo": "如果字幕引擎已经启动,需要重启字幕引擎修改才会生效",
"styleChange": "字幕样式已修改",
"styleInfo": "字幕样式修改已经保存并生效"
"styleInfo": "字幕样式修改已经保存并生效",
"engineStartTimeout": "字幕引擎启动超时,已自动强制停止",
"ollamaNameNull": "Ollama 字段为空",
"ollamaNameNullNote": "选择 Ollama 模型作为翻译模型时Ollama 字段不能为空,需要填写本地已经配置好的 Ollama 模型的名称。"
},
general: {
"title": "通用设置",
@@ -46,6 +49,9 @@ export default {
"cancelChange": "取消更改",
"sourceLang": "源语言",
"transLang": "翻译语言",
"transModel": "翻译模型",
"ollama": "Ollama",
"ollamaNote": "要使用的进行翻译的本地 Ollama 模型的名称,将调用默认端口的服务,建议使用参数量小于 1B 的非推理模型。",
"captionEngine": "字幕引擎",
"audioType": "音频类型",
"systemOutput": "系统音频输出(扬声器)",
@@ -54,8 +60,11 @@ export default {
"showMore": "更多设置",
"apikey": "API KEY",
"modelPath": "模型路径",
"startTimeout": "启动超时",
"seconds": "秒",
"apikeyInfo": "Gummy 字幕引擎需要的 API KEY需要在阿里云百炼平台获取。详细信息见项目用户手册。",
"modelPathInfo": "Vosk 字幕引擎需要的模型的文件夹路径,需要提前下载需要的模型到本地。信息详情见项目用户手册。",
"startTimeoutInfo": "字幕引擎启动超时时间,超过此时间将自动强制停止。建议设置为 10-120 秒之间。",
"customEngine": "自定义引擎",
custom: {
"title": "自定义字幕引擎",
@@ -112,6 +121,11 @@ export default {
"startEngine": "启动字幕引擎",
"restartEngine": "重启字幕引擎",
"stopEngine": "关闭字幕引擎",
"forceKill": "强行停止",
"forceKillStarting": "正在启动引擎... (强行停止)",
"forceKillConfirm": "确定要强行停止字幕引擎吗?这将立即终止进程。",
"confirm": "确定",
"cancel": "取消",
about: {
"title": "关于本项目",
"proj": "Auto Caption 项目",

View File

@@ -15,7 +15,12 @@ export const useCaptionLogStore = defineStore('captionLog', () => {
})
window.electron.ipcRenderer.on('both.captionLog.upd', (_, log) => {
captionData.value.splice(captionData.value.length - 1, 1, log)
for(let i = captionData.value.length - 1; i >= 0; i--) {
if(captionData.value[i].time_s === log.time_s){
captionData.value.splice(i, 1, log)
break
}
}
})
window.electron.ipcRenderer.on('both.captionLog.set', (_, logs) => {

View File

@@ -19,6 +19,8 @@ export const useEngineControlStore = defineStore('engineControl', () => {
const engineEnabled = ref(false)
const sourceLang = ref<string>('en')
const targetLang = ref<string>('zh')
const transModel = ref<string>('ollama')
const ollamaName = ref<string>('')
const engine = ref<string>('gummy')
const audio = ref<0 | 1>(0)
const translation = ref<boolean>(true)
@@ -27,6 +29,7 @@ export const useEngineControlStore = defineStore('engineControl', () => {
const customized = ref<boolean>(false)
const customizedApp = ref<string>('')
const customizedCommand = ref<string>('')
const startTimeoutSeconds = ref<number>(30)
const changeSignal = ref<boolean>(false)
const errorSignal = ref<boolean>(false)
@@ -36,6 +39,8 @@ export const useEngineControlStore = defineStore('engineControl', () => {
engineEnabled: engineEnabled.value,
sourceLang: sourceLang.value,
targetLang: targetLang.value,
transModel: transModel.value,
ollamaName: ollamaName.value,
engine: engine.value,
audio: audio.value,
translation: translation.value,
@@ -43,7 +48,8 @@ export const useEngineControlStore = defineStore('engineControl', () => {
modelPath: modelPath.value,
customized: customized.value,
customizedApp: customizedApp.value,
customizedCommand: customizedCommand.value
customizedCommand: customizedCommand.value,
startTimeoutSeconds: startTimeoutSeconds.value
}
window.electron.ipcRenderer.send('control.controls.change', controls)
}
@@ -66,6 +72,8 @@ export const useEngineControlStore = defineStore('engineControl', () => {
}
sourceLang.value = controls.sourceLang
targetLang.value = controls.targetLang
transModel.value = controls.transModel
ollamaName.value = controls.ollamaName
engine.value = controls.engine
audio.value = controls.audio
engineEnabled.value = controls.engineEnabled
@@ -75,6 +83,7 @@ export const useEngineControlStore = defineStore('engineControl', () => {
customized.value = controls.customized
customizedApp.value = controls.customizedApp
customizedCommand.value = controls.customizedCommand
startTimeoutSeconds.value = controls.startTimeoutSeconds
changeSignal.value = true
}
@@ -129,6 +138,8 @@ export const useEngineControlStore = defineStore('engineControl', () => {
engineEnabled, // whether the caption engine is enabled
sourceLang, // source language
targetLang, // target language
transModel, // translation model
ollamaName, // Ollama model
engine, // caption engine
audio, // selected audio source
translation, // whether translation is enabled
@@ -137,6 +148,7 @@ export const useEngineControlStore = defineStore('engineControl', () => {
customized, // whether a custom caption engine is used
customizedApp, // custom caption engine executable
customizedCommand, // custom caption engine command
startTimeoutSeconds, // startup timeout (seconds)
setControls, // apply engine configuration
sendControlsChange, // send the latest controls to the backend
emptyModelPathErr, // warn when the model path is empty

View File

@@ -6,6 +6,8 @@ export interface Controls {
engineEnabled: boolean,
sourceLang: string,
targetLang: string,
transModel: string,
ollamaName: string,
engine: string,
audio: 0 | 1,
translation: boolean,
@@ -13,7 +15,8 @@ export interface Controls {
modelPath: string,
customized: boolean,
customizedApp: string,
customizedCommand: string
customizedCommand: string,
startTimeoutSeconds: number
}
export interface Styles {