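feat(engine): replace the resampling library; add punctuation restoration models to SOSV

- Replace the samplerate library with resampy to improve resampling quality
- Add Chinese and English punctuation restoration models to Sherpa-ONNX SenseVoice

feat(engine): refactor the caption engine; add the Sherpa-ONNX SenseVoice speech recognition model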
2026-03-13 17:47:34 +08:00 · 2025-09-06 23:15:33 +08:00 · 2025-09-06 20:49:46 +08:00 · 2025-09-04 23:41:22 +08:00 · 2025-09-02 23:19:53 +08:00 · 2025-08-30 20:57:26 +08:00
33 changed files with 739 additions and 288 deletions
--- a/README.md
+++ b/README.md
@@ -49,7 +49,7 @@
 | 操作系统版本        | 处理器架构 | 获取系统音频输入 | 获取系统音频输出 |
 | ------------------ | ---------- | ---------------- | ---------------- |
 | Windows 11 24H2    | x64        | ✅               | ✅                |
-| macOS Sequoia 15.5 | arm64      | ✅需要额外配置     | ✅                |
+| macOS Sequoia 15.5 | arm64      | ✅ [需要额外配置](./docs/user-manual/zh.md#macos-获取系统音频输出)     | ✅                |
 | Ubuntu 24.04.2     | x64        | ✅               | ✅                |
 | Kali Linux 2022.3  | x64        | ✅               | ✅                |
 | Kylin Server V10 SP3 | x64 | ✅ | ✅ |
@@ -188,15 +188,3 @@ npm run build:mac
 # For Linux
 npm run build:linux
 ```
-
-注意，根据不同的平台需要修改项目根目录下 `electron-builder.yml` 文件中的配置内容：
-
-```yml
-extraResources:
-  # For Windows
-  - from: ./engine/dist/main.exe
-    to: ./engine/main.exe
-  # For macOS and Linux
-  # - from: ./engine/dist/main
-  #   to: ./engine/main
-```
--- a/README_en.md
+++ b/README_en.md
@@ -49,7 +49,7 @@ The software has been adapted for Windows, macOS, and Linux platforms. The teste
 | OS Version         | Architecture | System Audio Input | System Audio Output |
 | ------------------ | ------------ | ------------------ | ------------------- |
 | Windows 11 24H2    | x64          | ✅                 | ✅                   |
-| macOS Sequoia 15.5 | arm64        | ✅ Additional config required | ✅        |
+| macOS Sequoia 15.5 | arm64        | ✅ [Additional config required](./docs/user-manual/en.md#capturing-system-audio-output-on-macos) | ✅        |
 | Ubuntu 24.04.2     | x64          | ✅                 | ✅                   |
 | Kali Linux 2022.3  | x64          | ✅                 | ✅                   |
 | Kylin Server V10 SP3 | x64 | ✅ | ✅ |
@@ -188,15 +188,3 @@ npm run build:mac
 # For Linux
 npm run build:linux
 ```
-
-Note: You need to modify the configuration content in the `electron-builder.yml` file in the project root directory according to different platforms:
-
-```yml
-extraResources:
-  # For Windows
-  - from: ./engine/dist/main.exe
-    to: ./engine/main.exe
-  # For macOS and Linux
-  # - from: ./engine/dist/main
-  #   to: ./engine/main
-```
--- a/README_ja.md
+++ b/README_ja.md
@@ -49,7 +49,7 @@
 | OS バージョン | アーキテクチャ | システムオーディオ入力 | システムオーディオ出力 |
 | ------------------ | ------------ | ------------------ | ------------------- |
 | Windows 11 24H2    | x64          | ✅                 | ✅                   |
-| macOS Sequoia 15.5 | arm64        | ✅ 追加設定が必要    | ✅                   |
+| macOS Sequoia 15.5 | arm64        | ✅ [追加設定が必要](./docs/user-manual/ja.md#macos-でのシステムオーディオ出力の取得方法)    | ✅                   |
 | Ubuntu 24.04.2     | x64          | ✅                 | ✅                   |
 | Kali Linux 2022.3  | x64          | ✅                 | ✅                   |
 | Kylin Server V10 SP3 | x64 | ✅ | ✅ |
@@ -188,15 +188,3 @@ npm run build:mac
 # Linux 用
 npm run build:linux
 ```
-
-注意: プラットフォームに応じて、プロジェクトルートディレクトリにある `electron-builder.yml` ファイルの設定内容を変更する必要があります:
-
-```yml
-extraResources:
-  # Windows 用
-  - from: ./engine/dist/main.exe
-    to: ./engine/main.exe
-  # macOS と Linux 用
-  # - from: ./engine/dist/main
-  #   to: ./engine/main
-```
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -153,4 +153,18 @@
 ### 优化体验

 - 优化软件用户界面的部分组件
- 更清晰的日志输出
+- 更清晰的日志输出
+
+
+## v0.8.0
+
+2025-09-??
+
+### 新增功能
+
+- 字幕引擎添加超时关闭功能：如果在规定时间字幕引擎没有启动成功会自动关闭、在字幕引擎启动过程中也可选择关闭字幕引擎
+- 添加非实时翻译功能：支持调用 Ollama 本地模型进行翻译、支持调用 Google 翻译 API 进行翻译
+
+### 优化体验
+
+- 带有额外信息的标签颜色改为与主题色一致
--- a/docs/api-docs/caption-engine.md
+++ b/docs/api-docs/caption-engine.md
@@ -58,6 +58,18 @@ Electron 主进程通过 TCP Socket 向 Python 进程发送数据。发送的数

 Python 端监听到的音频流转换为的字幕数据。

+### `translation`
+
+```js
+{
+  command: "translation",
+  time_s: string,
+  translation: string
+}
+```
+
+语音识别的内容的翻译，可以根据起始时间确定对应的字幕。
+
 ### `print`

 ```js
@@ -67,7 +79,7 @@ Python 端监听到的音频流转换为的字幕数据。
 }
 ```

-输出 Python 端打印的内容。
+输出 Python 端打印的内容，不计入日志。

 ### `info`

@@ -78,7 +90,18 @@ Python 端监听到的音频流转换为的字幕数据。
 }
 ```

-Python 端打印的提示信息，比起 `print`，该信息更希望 Electron 端的关注。
+Python 端打印的提示信息，会计入日志。
+
+### `warn`
+
+```js
+{
+  command: "warn",
+  content: string
+}
+```
+
+Python 端打印的警告信息，会计入日志。

 ### `error`

@@ -89,7 +112,7 @@ Python 端打印的提示信息，比起 `print`，该信息更希望 Electron
 }
 ```

-Python 端打印的错误信息，该错误信息需要在前端弹窗显示。
+Python 端打印的错误信息，该错误信息会在前端弹窗显示。

 ### `usage`

--- a/electron-builder.yml
+++ b/electron-builder.yml
@@ -1,5 +1,5 @@
 appId: com.himeditator.autocaption
-productName: auto-caption
+productName: Auto Caption
 directories:
  buildResources: build
 files:
@@ -13,13 +13,15 @@ files:
  - '!engine/*'
  - '!docs/*'
  - '!assets/*'
+  - '!.repomap/*'
+  - '!.virtualme/*'
 extraResources:
  # For Windows
  - from: ./engine/dist/main.exe
    to: ./engine/main.exe
  # For macOS and Linux
-  # - from: ./engine/dist/main
-  #   to: ./engine/main
+  - from: ./engine/dist/main
+    to: ./engine/main
 win:
  executableName: auto-caption
  icon: build/icon.png
--- a/engine/audio2text/init.py
+++ b/engine/audio2text/init.py
@@ -1,3 +1,3 @@
-from dashscope.common.error import InvalidParameter
 from .gummy import GummyRecognizer
-from .vosk import VoskRecognizer
+from .vosk import VoskRecognizer
+from .sosv import SosvRecognizer
--- a/engine/audio2text/gummy.py
+++ b/engine/audio2text/gummy.py
@@ -5,9 +5,10 @@ from dashscope.audio.asr import (
    TranslationRecognizerRealtime
 )
 import dashscope
+from dashscope.common.error import InvalidParameter
 from datetime import datetime
-from utils import stdout_cmd, stdout_obj, stderr
-
+from utils import stdout_cmd, stdout_obj, stdout_err
+from utils import shared_data

 class Callback(TranslationRecognizerCallback):
    """
@@ -90,9 +91,23 @@ class GummyRecognizer:
        """启动 Gummy 引擎"""
        self.translator.start()

-    def send_audio_frame(self, data):
-        """发送音频帧，擎将自动识别并将识别结果输出到标准输出中"""
-        self.translator.send_audio_frame(data)
+    def translate(self):
+        """持续读取共享数据中的音频帧，并进行语音识别，将识别结果输出到标准输出中"""
+        global shared_data
+        restart_count = 0
+        while shared_data.status == 'running':
+            chunk = shared_data.chunk_queue.get()
+            try:
+                self.translator.send_audio_frame(chunk)
+            except InvalidParameter as e:
+                restart_count += 1
+                if restart_count > 5:
+                    stdout_err(str(e))
+                    shared_data.status = "kill"
+                    stdout_cmd('kill')
+                    break
+                else:
+                    stdout_cmd('info', f'Gummy engine stopped, restart attempt: {restart_count}...')

    def stop(self):
        """停止 Gummy 引擎"""
--- a/engine/audio2text/sosv.py
+++ b/engine/audio2text/sosv.py
@@ -0,0 +1,176 @@
+"""
+Sherpa-ONNX SenseVoice Model
+
+This code file references the following:
+
+https://github.com/k2-fsa/sherpa-onnx/blob/master/python-api-examples/simulate-streaming-sense-voice-microphone.py
+"""
+
+import time
+from datetime import datetime
+import sherpa_onnx
+import threading
+import numpy as np
+
+from utils import shared_data
+from utils import stdout_cmd, stdout_obj
+from utils import google_translate, ollama_translate
+
+
+class SosvRecognizer:
+    """
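+    Feeds streaming audio to the non-streaming SenseVoice model and writes JSON strings readable by Auto Caption to standard output.
+
+    Init parameters:
+        model_path: directory holding the Sherpa-ONNX SenseVoice models (the Silero VAD and punctuation models are also loaded from it)
+        source: source language (auto, zh, en, ja, ko, yue)
+        target: translation target language, or None to disable translation
+        trans_model: translation model type ('google'; any other value selects Ollama)
+        ollama_name: Ollama model name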
+    """
+    def __init__(self, model_path: str, source: str, target: str | None, trans_model: str, ollama_name: str):
+        if model_path.startswith('"'):
+            model_path = model_path[1:]
+        if model_path.endswith('"'):
+            model_path = model_path[:-1]
+        self.model_path = model_path
+        self.ext = ""
+        if self.model_path[-4:] == "int8":
+            self.ext = ".int8"
+        self.source = source
+        self.target = target
+        if trans_model == 'google':
+            self.trans_func = google_translate
+        else:
+            self.trans_func = ollama_translate
+        self.ollama_name = ollama_name
+        self.time_str = ''
+        self.cur_id = 0
+        self.prev_content = ''
+
+    def start(self):
+        """Start the SenseVoice recognizer, the VAD and the punctuation model"""
+        self.recognizer = sherpa_onnx.OfflineRecognizer.from_sense_voice(
+            model=f"{self.model_path}/sensevoice/model{self.ext}.onnx",
+            tokens=f"{self.model_path}/sensevoice/tokens.txt",
+            language=self.source,
+            num_threads = 2,
+        )
+        
+        vad_config = sherpa_onnx.VadModelConfig()
+        vad_config.silero_vad.model = f"{self.model_path}/silero_vad.onnx"
+        vad_config.silero_vad.threshold = 0.5
+        vad_config.silero_vad.min_silence_duration = 0.1
+        vad_config.silero_vad.min_speech_duration = 0.25
+        vad_config.silero_vad.max_speech_duration = 8
+        vad_config.sample_rate = 16000
+        self.window_size = vad_config.silero_vad.window_size
+        self.vad = sherpa_onnx.VoiceActivityDetector(vad_config, buffer_size_in_seconds=100)
+
+        if self.source == 'en':
+            model_config = sherpa_onnx.OnlinePunctuationModelConfig(
+                cnn_bilstm=f"{self.model_path}/punct-en/model{self.ext}.onnx",
+                bpe_vocab=f"{self.model_path}/punct-en/bpe.vocab"
+            )
+            punct_config = sherpa_onnx.OnlinePunctuationConfig(
+                model_config=model_config,
+            )
+            self.punct = sherpa_onnx.OnlinePunctuation(punct_config)
+        else:
+            punct_config = sherpa_onnx.OfflinePunctuationConfig(
+                model=sherpa_onnx.OfflinePunctuationModelConfig(
+                    ct_transformer=f"{self.model_path}/punct/model{self.ext}.onnx"
+                ),
+            )
+            self.punct = sherpa_onnx.OfflinePunctuation(punct_config)
+
+        self.buffer = []
+        self.offset = 0
+        self.started = False
+        self.started_time = .0
+        self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+        stdout_cmd('info', 'Sherpa-ONNX SenseVoice recognizer started.')
+
+    def send_audio_frame(self, data: bytes):
+        """
+        Send an audio frame to the SOSV engine; the engine recognizes it and writes the result to standard output
+
+        Args:
+            data: audio frame data; the sample rate must be 16000 Hz
+        """
+        caption = {}
+        caption['command'] = 'caption'
+        caption['translation'] = ''
+
+        data_np = np.frombuffer(data, dtype=np.int16).astype(np.float32)
+        self.buffer = np.concatenate([self.buffer, data_np])
+        while self.offset + self.window_size < len(self.buffer):
+            self.vad.accept_waveform(self.buffer[self.offset: self.offset + self.window_size])
+            if not self.started and self.vad.is_speech_detected():
+                self.started = True
+                self.started_time = time.time()
+            self.offset += self.window_size
+
+        if not self.started:
+            if len(self.buffer) > 10 * self.window_size:
+                self.offset -= len(self.buffer) - 10 * self.window_size
+                self.buffer = self.buffer[-10 * self.window_size:]
+
+        if self.started and time.time() - self.started_time > 0.2:
+            stream = self.recognizer.create_stream()
+            stream.accept_waveform(16000, self.buffer)
+            self.recognizer.decode_stream(stream)
+            text = stream.result.text.strip()
+            if text and self.prev_content != text:
+                caption['index'] = self.cur_id
+                caption['text'] = text
+                caption['time_s'] = self.time_str
+                caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+                self.prev_content = text
+                stdout_obj(caption)
+            self.started_time = time.time()
+        
+        while not self.vad.empty():
+            stream = self.recognizer.create_stream()
+            stream.accept_waveform(16000, self.vad.front.samples)
+            self.vad.pop()
+            self.recognizer.decode_stream(stream)
+            text = stream.result.text.strip()
+
+            if self.source == 'en':
+                text_with_punct = self.punct.add_punctuation_with_case(text)
+            else:
+                text_with_punct = self.punct.add_punctuation(text)
+
+            caption['index'] = self.cur_id
+            caption['text'] = text_with_punct
+            caption['time_s'] = self.time_str
+            caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+            if text:
+                stdout_obj(caption)
+                if self.target:
+                    th = threading.Thread(
+                        target=self.trans_func,
+                        args=(self.ollama_name, self.target, caption['text'], self.time_str),
+                        daemon=True
+                    )
+                    th.start()    
+                self.cur_id += 1
+            self.prev_content = ''
+            self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+            self.buffer = []
+            self.offset = 0
+            self.started = False
+            self.started_time = .0
+
+    def translate(self):
+        """Keep reading audio frames from the shared data, run speech recognition, and write the results to standard output"""
+        global shared_data
+        while shared_data.status == 'running':
+            chunk = shared_data.chunk_queue.get()
+            self.send_audio_frame(chunk)
+    
+    def stop(self):
+        """Stop the SenseVoice model"""
+        stdout_cmd('info', 'Sherpa-ONNX SenseVoice recognizer closed.')
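Note on the `translation` command: `SosvRecognizer` passes `self.time_str` to the translation thread, and docs/api-docs/caption-engine.md says a translation can be matched to its caption by start time. A minimal consumer-side sketch of that matching follows. It assumes the engine emits one JSON object per line; `apply_message` and the `captions` dictionary are hypothetical names for illustration and are not part of this commit (the real consumer is the Electron main process).

```python
import json


def apply_message(captions: dict[str, dict], line: str) -> None:
    """Update `captions` (keyed by time_s) from one JSON line of engine output.

    Hypothetical helper: the key choice follows the API doc, which says the
    start time identifies the caption a translation belongs to.
    """
    msg = json.loads(line)
    command = msg.get("command")
    if command == "caption":
        # A later caption with the same time_s replaces the partial result.
        captions[msg["time_s"]] = {
            "text": msg["text"],
            "translation": msg.get("translation", ""),
        }
    elif command == "translation":
        entry = captions.get(msg["time_s"])
        if entry is not None:  # ignore translations for unknown captions
            entry["translation"] = msg["translation"]
```

Because translations arrive from a background thread, they may show up after newer captions; keying by `time_s` rather than by arrival order keeps the pairing stable.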
--- a/engine/audio2text/vosk.py
+++ b/engine/audio2text/vosk.py
@@ -1,8 +1,11 @@
 import json
+import threading
+import time
 from datetime import datetime

 from vosk import Model, KaldiRecognizer, SetLogLevel
-from utils import stdout_cmd, stdout_obj
+from utils import shared_data
+from utils import stdout_cmd, stdout_obj, google_translate, ollama_translate


 class VoskRecognizer:
@@ -11,14 +14,23 @@ class VoskRecognizer:

    初始化参数：
        model_path: Vosk 识别模型路径
+        target: 翻译目标语言
+        trans_model: 翻译模型名称
+        ollama_name: Ollama 模型名称
    """
-    def __init__(self, model_path: str):
+    def __init__(self, model_path: str, target: str | None, trans_model: str, ollama_name: str):
        SetLogLevel(-1)
        if model_path.startswith('"'):
            model_path = model_path[1:]
        if model_path.endswith('"'):
            model_path = model_path[:-1]
        self.model_path = model_path
+        self.target = target
+        if trans_model == 'google':
+            self.trans_func = google_translate
+        else:
+            self.trans_func = ollama_translate
+        self.ollama_name = ollama_name
        self.time_str = ''
        self.cur_id = 0
        self.prev_content = ''
@@ -48,7 +60,16 @@ class VoskRecognizer:
            caption['time_s'] = self.time_str
            caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
            self.prev_content = ''
+            if content == '': return
            self.cur_id += 1
+            
+            if self.target:
+                th = threading.Thread(
+                    target=self.trans_func,
+                    args=(self.ollama_name, self.target, caption['text'], self.time_str),
+                    daemon=True
+                )
+                th.start()
        else:
            content = json.loads(self.recognizer.PartialResult()).get('partial', '')
            if content == '' or content == self.prev_content:
@@ -63,6 +84,13 @@ class VoskRecognizer:
        
        stdout_obj(caption)

+    def translate(self):
+        """持续读取共享数据中的音频帧，并进行语音识别，将识别结果输出到标准输出中"""
+        global shared_data
+        while shared_data.status == 'running':
+            chunk = shared_data.chunk_queue.get()
+            self.send_audio_frame(chunk)
+
    def stop(self):
        """停止 Vosk 引擎"""
        stdout_cmd('info', 'Vosk recognizer closed.')
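Note on the new threading model: each recognizer's `translate()` now consumes chunks that `audio_recording` puts on `shared_data.chunk_queue`. In the diff, `chunk_queue.get()` blocks without a timeout, so a status change is only noticed once another chunk arrives. The sketch below shows the same producer/consumer shape with a timeout and a sentinel. `SharedData`, the `None` sentinel and the 0.5 s timeout are assumptions for illustration, not code from this commit.

```python
import queue
import threading


class SharedData:
    """Hypothetical stand-in for utils.shared_data."""

    def __init__(self) -> None:
        self.status = "running"
        self.chunk_queue: queue.Queue = queue.Queue()


def producer(shared: SharedData, chunks) -> None:
    """Play the role of audio_recording: push chunks while running."""
    for chunk in chunks:
        if shared.status != "running":
            break
        shared.chunk_queue.put(chunk)
    shared.chunk_queue.put(None)  # sentinel: no more audio (assumption)


def consumer(shared: SharedData, handle) -> None:
    """Play the role of translate(): pull chunks and hand them to the engine."""
    while shared.status == "running":
        try:
            chunk = shared.chunk_queue.get(timeout=0.5)
        except queue.Empty:
            continue  # re-check status instead of blocking forever
        if chunk is None:
            break
        handle(chunk)
```

A usage example would start `producer` in a daemon thread, as `main_vosk` does with `audio_recording`, and call `consumer` on the main thread.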
--- a/engine/main.py
+++ b/engine/main.py
@@ -1,90 +1,153 @@
+import wave
 import argparse
-from utils import stdout_cmd, stdout_err
-from utils import thread_data, start_server
+import threading
+from utils import stdout, stdout_cmd
+from utils import shared_data, start_server
 from utils import merge_chunk_channels, resample_chunk_mono
-from audio2text import InvalidParameter, GummyRecognizer
+from audio2text import GummyRecognizer
 from audio2text import VoskRecognizer
+from audio2text import SosvRecognizer
 from sysaudio import AudioStream


+def audio_recording(stream: AudioStream, resample: bool, save = False, path = ''):
+    global shared_data
+    stream.open_stream()
+    wf = None
+    if save:
+        if path != '':
+            path += '/'
+        wf = wave.open(f'{path}record.wav', 'wb')
+        wf.setnchannels(stream.CHANNELS)
+        wf.setsampwidth(stream.SAMP_WIDTH)
+        wf.setframerate(stream.RATE)
+    while shared_data.status == 'running':
+        raw_chunk = stream.read_chunk()
+        if raw_chunk is None: continue
+        if save: wf.writeframes(raw_chunk) # type: ignore
+        if resample:
+            chunk = resample_chunk_mono(raw_chunk, stream.CHANNELS, stream.RATE, 16000)
+        else:
+            chunk = merge_chunk_channels(raw_chunk, stream.CHANNELS)
+        shared_data.chunk_queue.put(chunk)
+    if save: wf.close() # type: ignore
+    stream.close_stream_signal()
+
+
 def main_gummy(s: str, t: str, a: int, c: int, k: str):
-    global thread_data
+    """
+    Parameters:
+        s: Source language
+        t: Target language
+        k: Aliyun Bailian API key
+    """
    stream = AudioStream(a, c)
    if t == 'none':
        engine = GummyRecognizer(stream.RATE, s, None, k)
    else:
        engine = GummyRecognizer(stream.RATE, s, t, k)

-    stream.open_stream()
    engine.start()
-    chunk_mono = bytes()
-
-    restart_count = 0
-    while thread_data.status == "running":
-        try:
-            chunk = stream.read_chunk()
-            if chunk is None: continue
-            chunk_mono = merge_chunk_channels(chunk, stream.CHANNELS)
-            try:
-                engine.send_audio_frame(chunk_mono)
-            except InvalidParameter as e:
-                restart_count += 1
-                if restart_count > 5:
-                    stdout_err(str(e))
-                    thread_data.status = "kill"
-                    stdout_cmd('kill')
-                    break
-                else:
-                    stdout_cmd('info', f'Gummy engine stopped, restart attempt: {restart_count}...')
-        except KeyboardInterrupt:
-            break
-
-    engine.send_audio_frame(chunk_mono)
-    stream.close_stream()
+    stream_thread = threading.Thread(
+        target=audio_recording,
+        args=(stream, False),
+        daemon=True
+    )
+    stream_thread.start()
+    try:
+        engine.translate()
+    except KeyboardInterrupt:
+        stdout("Keyboard interrupt detected. Exiting...")
    engine.stop()


-def main_vosk(a: int, c: int, m: str):
-    global thread_data
+def main_vosk(a: int, c: int, vosk: str, t: str, tm: str, omn: str):
+    """
+    Parameters:
+        a: Audio source: 0 for output, 1 for input
+        c: Chunk number in 1 second
+        vosk: Vosk model path
+        t: Target language
+        tm: Translation model type, ollama or google
+        omn: Ollama model name
+    """
    stream = AudioStream(a, c)
-    engine = VoskRecognizer(m)
+    if t == 'none':
+        engine = VoskRecognizer(vosk, None, tm, omn)
+    else:
+        engine = VoskRecognizer(vosk, t, tm, omn)

-    stream.open_stream()
    engine.start()
+    stream_thread = threading.Thread(
+        target=audio_recording,
+        args=(stream, True),
+        daemon=True
+    )
+    stream_thread.start()
+    try:
+        engine.translate()
+    except KeyboardInterrupt:
+        stdout("Keyboard interrupt detected. Exiting...")
+    engine.stop()

-    while thread_data.status == "running":
-        try:
-            chunk = stream.read_chunk()
-            if chunk is None: continue
-            chunk_mono = resample_chunk_mono(chunk, stream.CHANNELS, stream.RATE, 16000)
-            engine.send_audio_frame(chunk_mono)
-        except KeyboardInterrupt:
-            break

-    stream.close_stream()
+def main_sosv(a: int, c: int, sosv: str, s: str, t: str, tm: str, omn: str):
+    """
+    Parameters:
+        a: Audio source: 0 for output, 1 for input
+        c: Chunk number in 1 second
+        sosv: Sherpa-ONNX SenseVoice model path
+        s: Source language
+        t: Target language
+        tm: Translation model type, ollama or google
+        omn: Ollama model name
+    """
+    stream = AudioStream(a, c)
+    if t == 'none':
+        engine = SosvRecognizer(sosv, s, None, tm, omn)
+    else:
+        engine = SosvRecognizer(sosv, s, t, tm, omn)
+
+    engine.start()
+    stream_thread = threading.Thread(
+        target=audio_recording,
+        args=(stream, True),
+        daemon=True
+    )
+    stream_thread.start()
+    try:
+        engine.translate()
+    except KeyboardInterrupt:
+        stdout("Keyboard interrupt detected. Exiting...")
    engine.stop()


 if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Convert system audio stream to text')
-    # both
+    # all
     parser.add_argument('-e', '--caption_engine', default='gummy', help='Caption engine: gummy, vosk or sosv')
    parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output, 1 for input')
    parser.add_argument('-c', '--chunk_rate', default=10, help='Number of audio stream chunks collected per second')
-    parser.add_argument('-p', '--port', default=8080, help='The port to run the server on, 0 for no server')
+    parser.add_argument('-p', '--port', default=0, help='The port to run the server on, 0 for no server')
+    parser.add_argument('-t', '--target_language', default='zh', help='Target language code, "none" for no translation')
+    # gummy and sosv
+    parser.add_argument('-s', '--source_language', default='auto', help='Source language code')
    # gummy only
-    parser.add_argument('-s', '--source_language', default='en', help='Source language code')
-    parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
    parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
+    # vosk and sosv
+    parser.add_argument('-tm', '--translation_model', default='ollama', help='Model for translation: ollama or google')
+    parser.add_argument('-omn', '--ollama_name', default='', help='Ollama model name for translation')
    # vosk only
-    parser.add_argument('-m', '--model_path', default='', help='The path to the vosk model.')
+    parser.add_argument('-vosk', '--vosk_model', default='', help='The path to the vosk model.')
+    # sosv only
+    parser.add_argument('-sosv', '--sosv_model', default=None, help='The SenseVoice model path')

    args = parser.parse_args()
    if int(args.port) == 0:
-        thread_data.status = "running"
+        shared_data.status = "running"
    else:
        start_server(int(args.port))
-
+    
    if args.caption_engine == 'gummy':
        main_gummy(
            args.source_language,
@@ -97,10 +160,23 @@ if __name__ == "__main__":
        main_vosk(
            int(args.audio_type),
            int(args.chunk_rate),
-            args.model_path
+            args.vosk_model,
+            args.target_language,
+            args.translation_model,
+            args.ollama_name
+        )
+    elif args.caption_engine == 'sosv':
+        main_sosv(
+            int(args.audio_type),
+            int(args.chunk_rate),
+            args.sosv_model,
+            args.source_language,
+            args.target_language,
+            args.translation_model,
+            args.ollama_name
        )
    else:
        raise ValueError('Invalid caption engine specified.')
    
-    if thread_data.status == "kill":
+    if shared_data.status == "kill":
        stdout_cmd('kill')
--- a/engine/requirements.txt
+++ b/engine/requirements.txt
@@ -1,7 +1,10 @@
 dashscope
 numpy
-samplerate
+resampy
 vosk
 pyinstaller
 pyaudio; sys_platform == 'darwin'
 pyaudiowpatch; sys_platform == 'win32'
+googletrans
+ollama
+sherpa_onnx
--- a/engine/sysaudio/darwin.py
+++ b/engine/sysaudio/darwin.py
@@ -37,14 +37,13 @@ class AudioStream:
        self.FORMAT = pyaudio.paInt16
        self.SAMP_WIDTH = pyaudio.get_sample_size(self.FORMAT)
        self.CHANNELS = int(self.device["maxInputChannels"])
-        self.RATE = int(self.device["defaultSampleRate"])
-        self.CHUNK = self.RATE // chunk_rate
+        self.DEFAULT_RATE = int(self.device["defaultSampleRate"])
+        self.CHUNK_RATE = chunk_rate

-    def reset_chunk_size(self, chunk_size: int):
-        """
-        重新设置音频块大小
-        """
-        self.CHUNK = chunk_size
+        self.RATE = 16000
+        self.CHUNK = self.RATE // self.CHUNK_RATE
+        self.open_stream()
+        self.close_stream()

    def get_info(self):
        dev_info = f"""
@@ -72,16 +71,27 @@ class AudioStream:
        打开并返回系统音频输出流
        """
        if self.stream: return self.stream
-        self.stream = self.mic.open(
-            format = self.FORMAT,
-            channels = int(self.CHANNELS),
-            rate = self.RATE,
-            input = True,
-            input_device_index = int(self.INDEX)
-        )
+        try:
+            self.stream = self.mic.open(
+                format = self.FORMAT,
+                channels = int(self.CHANNELS),
+                rate = self.RATE,
+                input = True,
+                input_device_index = int(self.INDEX)
+            )
+        except OSError:
+            self.RATE = self.DEFAULT_RATE
+            self.CHUNK = self.RATE // self.CHUNK_RATE
+            self.stream = self.mic.open(
+                format = self.FORMAT,
+                channels = int(self.CHANNELS),
+                rate = self.RATE,
+                input = True,
+                input_device_index = int(self.INDEX)
+            )
        return self.stream

-    def read_chunk(self):
+    def read_chunk(self) -> bytes | None:
        """
        读取音频数据
        """
--- a/engine/sysaudio/linux.py
+++ b/engine/sysaudio/linux.py
@@ -55,15 +55,10 @@ class AudioStream:
        self.FORMAT = 16
        self.SAMP_WIDTH = 2
        self.CHANNELS = 2
-        self.RATE = 48000
+        self.RATE = 16000
+        self.CHUNK_RATE = chunk_rate
        self.CHUNK = self.RATE // chunk_rate

-    def reset_chunk_size(self, chunk_size: int):
-        """
-        重新设置音频块大小
-        """
-        self.CHUNK = chunk_size
-
    def get_info(self):
        dev_info = f"""
        音频捕获进程：
@@ -84,7 +79,7 @@ class AudioStream:
        启动音频捕获进程
        """
        self.process = subprocess.Popen(
-            ["parec", "-d", self.source, "--format=s16le", "--rate=48000", "--channels=2"],
+            ["parec", "-d", self.source, "--format=s16le", "--rate=16000", "--channels=2"],
            stdout=subprocess.PIPE
        )

--- a/engine/sysaudio/win.py
+++ b/engine/sysaudio/win.py
@@ -61,14 +61,13 @@ class AudioStream:
        self.FORMAT = pyaudio.paInt16
        self.SAMP_WIDTH = pyaudio.get_sample_size(self.FORMAT)
        self.CHANNELS = int(self.device["maxInputChannels"])
-        self.RATE = int(self.device["defaultSampleRate"])
-        self.CHUNK = self.RATE // chunk_rate
+        self.DEFAULT_RATE = int(self.device["defaultSampleRate"])
+        self.CHUNK_RATE = chunk_rate

-    def reset_chunk_size(self, chunk_size: int):
-        """
-        重新设置音频块大小
-        """
-        self.CHUNK = chunk_size
+        self.RATE = 16000
+        self.CHUNK = self.RATE // self.CHUNK_RATE
+        self.open_stream()
+        self.close_stream()

    def get_info(self):
        dev_info = f"""
@@ -96,13 +95,24 @@ class AudioStream:
        打开并返回系统音频输出流
        """
        if self.stream: return self.stream
-        self.stream = self.mic.open(
-            format = self.FORMAT,
-            channels = self.CHANNELS,
-            rate = self.RATE,
-            input = True,
-            input_device_index = self.INDEX
-        )
+        try: 
+            self.stream = self.mic.open(
+                format = self.FORMAT,
+                channels = self.CHANNELS,
+                rate = self.RATE,
+                input = True,
+                input_device_index = self.INDEX
+            )
+        except OSError:
+            self.RATE = self.DEFAULT_RATE
+            self.CHUNK = self.RATE // self.CHUNK_RATE
+            self.stream = self.mic.open(
+                format = self.FORMAT,
+                channels = self.CHANNELS,
+                rate = self.RATE,
+                input = True,
+                input_device_index = self.INDEX
+            )
        return self.stream

    def read_chunk(self) -> bytes | None:
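Note on the sample-rate fallback: on Windows and macOS, `open_stream` now tries 16000 Hz first and falls back to the device default rate on `OSError`, recomputing `CHUNK` for the new rate. The pattern can be sketched without PyAudio as below. `open_with_fallback` and the `open_fn` callback are hypothetical names, and the sketch assumes the opener signals an unsupported rate by raising `OSError`, as the diff does.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def open_with_fallback(
    open_fn: Callable[[int], T],
    preferred_rate: int,
    default_rate: int,
) -> tuple[T, int]:
    """Try the preferred rate, then the device default.

    Returns the opened stream together with the rate actually used, so the
    caller can recompute its chunk size (rate // chunks_per_second).
    """
    try:
        return open_fn(preferred_rate), preferred_rate
    except OSError:
        # The device rejected the preferred rate; a second failure propagates.
        return open_fn(default_rate), default_rate
```

Returning the effective rate makes the later resampling decision explicit: resampling is only needed when the returned rate differs from 16000.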
--- a/engine/utils/init.py
+++ b/engine/utils/init.py
@@ -1,9 +1,5 @@
-from .audioprcs import (
-    merge_chunk_channels,
-    resample_chunk_mono,
-    resample_chunk_mono_np,
-    resample_mono_chunk
-)
+from .audioprcs import merge_chunk_channels, resample_chunk_mono
 from .sysout import stdout, stdout_err, stdout_cmd, stdout_obj, stderr
-from .thdata import thread_data
-from .server import start_server
+from .shared import shared_data
+from .server import start_server
+from .translation import ollama_translate, google_translate
--- a/engine/utils/audioprcs.py
+++ b/engine/utils/audioprcs.py
@@ -1,4 +1,4 @@
-import samplerate
+import resampy
 import numpy as np
 import numpy.core.multiarray # do not remove

@@ -24,16 +24,15 @@ def merge_chunk_channels(chunk: bytes, channels: int) -> bytes:
    return chunk_mono.tobytes()


-def resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes:
+def resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int) -> bytes:
    """
-    将当前多通道音频数据块转换成单通道音频数据块，然后进行重采样
+    将当前多通道音频数据块转换成单通道音频数据块，并进行重采样

    Args:
        chunk: 多通道音频数据块
        channels: 通道数
        orig_sr: 原始采样率
        target_sr: 目标采样率
-        mode: 重采样模式，可选：'sinc_best' | 'sinc_medium' | 'sinc_fastest' | 'zero_order_hold' | 'linear'

    Return:
        单通道音频数据块
@@ -49,60 +48,17 @@ def resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: in
        # (length,)
        chunk_mono = np.mean(chunk_np.astype(np.float32), axis=1)

-    ratio = target_sr / orig_sr
-    chunk_mono_r = samplerate.resample(chunk_mono, ratio, converter_type=mode)
+    if orig_sr == target_sr:
+        return chunk_mono.astype(np.int16).tobytes()
+    
+    chunk_mono_r = resampy.resample(chunk_mono, orig_sr, target_sr)
    chunk_mono_r = np.round(chunk_mono_r).astype(np.int16)
-    return chunk_mono_r.tobytes()
-
-
-def resample_chunk_mono_np(chunk: bytes, channels: int, orig_sr: int, target_sr: int, mode="sinc_best", dtype=np.float32) -> np.ndarray:
-    """
-    将当前多通道音频数据块转换成单通道音频数据块，然后进行重采样，返回 Numpy 数组
-
-    Args:
-        chunk: 多通道音频数据块
-        channels: 通道数
-        orig_sr: 原始采样率
-        target_sr: 目标采样率
-        mode: 重采样模式，可选：'sinc_best' | 'sinc_medium' | 'sinc_fastest' | 'zero_order_hold' | 'linear'
-        dtype: 返回 Numpy 数组的数据类型
-
-    Return:
-        单通道音频数据块
-    """
-    if channels == 1:
-        chunk_mono = np.frombuffer(chunk, dtype=np.int16)
-        chunk_mono = chunk_mono.astype(np.float32)
+    real_len = round(chunk_mono.shape[0] * target_sr / orig_sr)
+    if(chunk_mono_r.shape[0] != real_len):
+        print(chunk_mono_r.shape[0], real_len)
+    if(chunk_mono_r.shape[0] > real_len):
+        chunk_mono_r = chunk_mono_r[:real_len]
    else:
-        # (length * channels,)
-        chunk_np = np.frombuffer(chunk, dtype=np.int16)
-        # (length, channels)
-        chunk_np = chunk_np.reshape(-1, channels)
-        # (length,)
-        chunk_mono = np.mean(chunk_np.astype(np.float32), axis=1)
-
-    ratio = target_sr / orig_sr
-    chunk_mono_r = samplerate.resample(chunk_mono, ratio, converter_type=mode)
-    chunk_mono_r = chunk_mono_r.astype(dtype)
-    return chunk_mono_r
-
-
-def resample_mono_chunk(chunk: bytes, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes:
-    """
-    Resample the current mono audio chunk
-
-    Args:
-        chunk: mono audio chunk
-        orig_sr: original sample rate
-        target_sr: target sample rate
-        mode: resampling mode, one of: 'sinc_best' | 'sinc_medium' | 'sinc_fastest' | 'zero_order_hold' | 'linear'
-
-    Return:
-        mono audio chunk
-    """
-    chunk_np = np.frombuffer(chunk, dtype=np.int16)
-    chunk_np = chunk_np.astype(np.float32)
-    ratio = target_sr / orig_sr
-    chunk_r =  samplerate.resample(chunk_np, ratio, converter_type=mode)
-    chunk_r = np.round(chunk_r).astype(np.int16)
-    return chunk_r.tobytes()
+        while chunk_mono_r.shape[0] < real_len:
+            chunk_mono_r = np.append(chunk_mono_r, chunk_mono_r[-1])
+    return chunk_mono_r.tobytes()
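Review note on the length fix-up in `resample_chunk_mono` above: the new code truncates the resampled chunk or appends its last sample in a loop until it reaches `round(len * target_sr / orig_sr)`. A minimal sketch of the same idea with a single `np.pad` call follows. `fit_length` and `to_int16` are hypothetical helper names, not part of the commit, and the clipping step is an added assumption to avoid int16 wrap-around on filter overshoot.

```python
import numpy as np


def fit_length(samples: np.ndarray, target_len: int) -> np.ndarray:
    """Truncate or edge-pad a 1-D array to exactly target_len samples."""
    if samples.shape[0] >= target_len:
        return samples[:target_len]
    if samples.shape[0] == 0:
        # Edge padding needs at least one sample; fall back to silence.
        return np.zeros(target_len, dtype=samples.dtype)
    # mode="edge" repeats the last sample, like the np.append loop in the diff.
    return np.pad(samples, (0, target_len - samples.shape[0]), mode="edge")


def to_int16(samples: np.ndarray) -> np.ndarray:
    """Round float samples and clip to the int16 range before casting."""
    info = np.iinfo(np.int16)
    return np.clip(np.round(samples), info.min, info.max).astype(np.int16)
```

With helpers like these, the tail of the function could reduce to computing `real_len` once and returning `fit_length(to_int16(resampled), real_len).tobytes()`, which would also make the debug `print` unnecessary.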
--- a/engine/utils/server.py
+++ b/engine/utils/server.py
@@ -1,13 +1,12 @@
 import socket
 import threading
 import json
-# import time
-from utils import thread_data, stdout_cmd, stderr
+from utils import shared_data, stdout_cmd, stderr


 def handle_client(client_socket):
-    global thread_data
-    while thread_data.status == 'running':
+    global shared_data
+    while shared_data.status == 'running':
        try:
            data = client_socket.recv(4096).decode('utf-8')
            if not data:
@@ -15,13 +14,13 @@ def handle_client(client_socket):
            data = json.loads(data)

            if data['command'] == 'stop':
-                thread_data.status = 'stop'
+                shared_data.status = 'stop'
                break
        except Exception as e:
            stderr(f'Communication error: {e}')
            break
    
-    thread_data.status = 'stop'
+    shared_data.status = 'stop'
    client_socket.close()


@@ -34,7 +33,6 @@ def start_server(port: int):
        stderr(str(e))
        stdout_cmd('kill')
        return
-    # time.sleep(20)
    stdout_cmd('connect')

    client, addr = server.accept()
--- a/engine/utils/shared.py
+++ b/engine/utils/shared.py
@@ -0,0 +1,8 @@
+import queue
+
+class SharedData:
+    def __init__(self):
+        self.status = "running"
+        self.chunk_queue = queue.Queue()
+
+shared_data = SharedData()
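Review note on `SharedData`: the new `chunk_queue` suggests that a capture thread hands audio chunks to the recognition loop, with `status` acting as the shared stop flag. The sketch below shows one way such a hand-off could look. `producer`, `consumer` and `run_demo` are hypothetical names used only for illustration, and the real engine threads may be organized differently.

```python
import queue
import threading


class SharedData:
    """Same shape as the class added in engine/utils/shared.py."""

    def __init__(self):
        self.status = "running"
        self.chunk_queue = queue.Queue()


def producer(shared: SharedData, chunks: list) -> None:
    """Stand-in for the audio capture thread: enqueue chunks, then signal stop."""
    for chunk in chunks:
        if shared.status != "running":
            break
        shared.chunk_queue.put(chunk)
    shared.status = "stop"


def consumer(shared: SharedData, out: list) -> None:
    """Stand-in for the recognition loop: drain the queue until stopped and empty."""
    while shared.status == "running" or not shared.chunk_queue.empty():
        try:
            # A timeout keeps the loop responsive to the stop flag.
            out.append(shared.chunk_queue.get(timeout=0.1))
        except queue.Empty:
            continue


def run_demo(chunks: list) -> list:
    shared = SharedData()
    received: list = []
    workers = [
        threading.Thread(target=consumer, args=(shared, received)),
        threading.Thread(target=producer, args=(shared, chunks)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return received
```

Using `get(timeout=...)` rather than a blocking `get()` means the consumer notices `status = 'stop'` even when no more audio arrives.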
--- a/engine/utils/thdata.py
+++ b/engine/utils/thdata.py
@@ -1,5 +0,0 @@
-class ThreadData:
-    def __init__(self):
-        self.status = "running"
-
-thread_data = ThreadData()
--- a/engine/utils/translation.py
+++ b/engine/utils/translation.py
@@ -0,0 +1,49 @@
+from ollama import chat
+from ollama import ChatResponse
+import asyncio
+from googletrans import Translator
+from .sysout import stdout_cmd, stdout_obj
+
+lang_map = {
+    'en': 'English',
+    'es': 'Spanish',
+    'fr': 'French',
+    'de': 'German',
+    'it': 'Italian',
+    'ru': 'Russian',
+    'ja': 'Japanese',
+    'ko': 'Korean',
+    'zh': 'Chinese',
+    'zh-cn': 'Chinese'
+}
+
+def ollama_translate(model: str, target: str, text: str, time_s: str):
+    response: ChatResponse = chat(
+        model=model,
+        messages=[
+            {"role": "system", "content": f"/no_think Translate the following content into {lang_map[target]}, and do not output any additional information."},
+            {"role": "user", "content": text}
+        ]
+    )
+    content = response.message.content or ""
+    if content.startswith('<think>'):
+        index = content.find('</think>')
+        if index != -1:
+            content = content[index+8:]
+    stdout_obj({
+        "command": "translation",
+        "time_s": time_s,
+        "translation": content.strip()
+    })
+
+def google_translate(model: str, target: str, text: str, time_s: str):
+    translator = Translator()
+    try:
+        res = asyncio.run(translator.translate(text, dest=target))
+        stdout_obj({
+            "command": "translation",
+            "time_s": time_s,
+            "translation": res.text
+        })
+    except Exception as e:
+        stdout_cmd("warn", f"Google translation request failed, please check your network connection...")
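Review note on `ollama_translate`: the reasoning-block cleanup relies on the literal `8`, which is the length of `'</think>'`. A small sketch that isolates this step and derives the offset from the tag itself is shown below. `strip_think_prefix` is a hypothetical helper name, not something the commit defines.

```python
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def strip_think_prefix(content: str) -> str:
    """Drop a leading <think>...</think> block, then trim whitespace."""
    if content.startswith(THINK_OPEN):
        end = content.find(THINK_CLOSE)
        if end != -1:
            # Skip past the closing tag instead of hard-coding its length.
            content = content[end + len(THINK_CLOSE):]
    return content.strip()
```

As in the original code, an unterminated `<think>` block is left untouched, so the caption still receives some text rather than an empty translation.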
--- a/src/main/ControlWindow.ts
+++ b/src/main/ControlWindow.ts
@@ -160,7 +160,7 @@ class ControlWindow {
    })

    ipcMain.on('control.engine.forceKill', () => {
-      captionEngine.forceKill()
+      captionEngine.kill()
    })

    ipcMain.on('control.captionLog.clear', () => {
--- a/src/main/types/index.ts
+++ b/src/main/types/index.ts
@@ -6,6 +6,8 @@ export interface Controls {
  engineEnabled: boolean,
  sourceLang: string,
  targetLang: string,
+  transModel: string,
+  ollamaName: string,
  engine: string,
  audio: 0 | 1,
  translation: boolean,
--- a/src/main/utils/AllConfig.ts
+++ b/src/main/utils/AllConfig.ts
@@ -7,6 +7,11 @@ import { app, BrowserWindow } from 'electron'
 import * as path from 'path'
 import * as fs from 'fs'

+interface CaptionTranslation {
+  time_s: string,
+  translation: string
+}
+
 const defaultStyles: Styles = {
  lineBreak: 1,
  fontFamily: 'sans-serif',
@@ -31,6 +36,8 @@ const defaultStyles: Styles = {
 const defaultControls: Controls = {
  sourceLang: 'en',
  targetLang: 'zh',
+  transModel: 'ollama',
+  ollamaName: '',
  engine: 'gummy',
  audio: 0,
  engineEnabled: false,
@@ -158,12 +165,28 @@ class AllConfig {
    }
  }

-  public sendCaptionLog(window: BrowserWindow, command: 'add' | 'upd' | 'set') {
+  public updateCaptionTranslation(trans: CaptionTranslation){
+    for(let i = this.captionLog.length - 1; i >= 0; i--){
+      if(this.captionLog[i].time_s === trans.time_s){
+        this.captionLog[i].translation = trans.translation
+        for(const window of BrowserWindow.getAllWindows()){
+          this.sendCaptionLog(window, 'upd', i)
+        }
+        break
+      }
+    }
+  }
+  public sendCaptionLog(
+    window: BrowserWindow,
+    command: 'add' | 'upd' | 'set',
+    index: number | undefined = undefined
+  ) {
    if(command === 'add'){
-      window.webContents.send(`both.captionLog.add`, this.captionLog[this.captionLog.length - 1])
+      window.webContents.send(`both.captionLog.add`, this.captionLog.at(-1))
    }
    else if(command === 'upd'){
-      window.webContents.send(`both.captionLog.upd`, this.captionLog[this.captionLog.length - 1])
+      if(index !== undefined) window.webContents.send(`both.captionLog.upd`, this.captionLog[index])
+      else window.webContents.send(`both.captionLog.upd`, this.captionLog.at(-1))
    }
    else if(command === 'set'){
      window.webContents.send(`both.captionLog.set`, this.captionLog)
--- a/src/main/utils/CaptionEngine.ts
+++ b/src/main/utils/CaptionEngine.ts
@@ -67,22 +67,23 @@ export class CaptionEngine {
      this.command.push('-a', allConfig.controls.audio ? '1' : '0')
      this.port = Math.floor(Math.random() * (65535 - 1024 + 1)) + 1024
      this.command.push('-p', this.port.toString())
+      this.command.push(
+        '-t', allConfig.controls.translation ?
+        allConfig.controls.targetLang : 'none'
+      )

      if(allConfig.controls.engine === 'gummy') {
        this.command.push('-e', 'gummy')
        this.command.push('-s', allConfig.controls.sourceLang)
-        this.command.push(
-          '-t', allConfig.controls.translation ?
-          allConfig.controls.targetLang : 'none'
-        )
        if(allConfig.controls.API_KEY) {
          this.command.push('-k', allConfig.controls.API_KEY)
        }
      }
      else if(allConfig.controls.engine === 'vosk'){
        this.command.push('-e', 'vosk')
-        
-        this.command.push('-m', `"${allConfig.controls.modelPath}"`)        
+        this.command.push('-vosk', `"${allConfig.controls.modelPath}"`)
+        this.command.push('-tm', allConfig.controls.transModel)
+        this.command.push('-omn', allConfig.controls.ollamaName)
      }
    }
    Log.info('Engine Path:', this.appPath)
@@ -97,7 +98,6 @@ export class CaptionEngine {

  public connect() {
    if(this.client) { Log.warn('Client already exists, ignoring...') }
-    // Clear the start timeout timer
    if (this.startTimeoutID) {
      clearTimeout(this.startTimeoutID)
      this.startTimeoutID = undefined
@@ -137,14 +137,13 @@ export class CaptionEngine {
    this.status = 'starting'
    Log.info('Caption Engine Starting, PID:', this.process.pid)

-    // Set up the start timeout mechanism
    const timeoutMs = allConfig.controls.startTimeoutSeconds * 1000
    this.startTimeoutID = setTimeout(() => {
      if (this.status === 'starting') {
        Log.warn(`Engine start timeout after ${allConfig.controls.startTimeoutSeconds} seconds, forcing kill...`)
        this.status = 'starting-timeout'
        controlWindow.sendErrorMessage(i18n('engine.start.timeout'))
-        this.forceKill()
+        this.kill()
      }
    }, timeoutMs)
    
@@ -182,7 +181,6 @@ export class CaptionEngine {
      }
      this.status = 'stopped'
      clearInterval(this.timerID)
-      // Clean up the start timeout timer
      if (this.startTimeoutID) {
        clearTimeout(this.startTimeoutID)
        this.startTimeoutID = undefined
@@ -194,7 +192,6 @@ export class CaptionEngine {
  public stop() {
    if(this.status !== 'running'){
      Log.warn('Trying to stop engine which is not running, current status:', this.status)
-      return
    }
    this.sendCommand('stop')
    if(this.client){
@@ -210,27 +207,12 @@ export class CaptionEngine {
  }

  public kill(){
+    if(!this.process || !this.process.pid) return
    if(this.status !== 'running'){
      Log.warn('Trying to kill engine which is not running, current status:', this.status)
-      return
    }
-    this.sendCommand('stop')
-    if(this.client){
-      this.client.destroy()
-      this.client = undefined
-    }
-    this.status = 'stopping'
-    this.timerID = setTimeout(() => {
-      if(this.status !== 'stopping') return
-      Log.warn('Engine process still not stopped, trying to kill...')
-      this.forceKill()
-    }, 4000);
-  }
+    Log.warn('Killing engine process, PID:', this.process.pid)

-  public forceKill(){
-    if(!this.process || !this.process.pid) return
-    Log.warn('Force killing engine process, PID:', this.process.pid)
-    // Clean up the start timeout timer
    if (this.startTimeoutID) {
      clearTimeout(this.startTimeoutID)
      this.startTimeoutID = undefined
@@ -246,13 +228,12 @@ export class CaptionEngine {
      }
      exec(cmd, (error) => {
        if (error) {
-          Log.error('Failed to force kill process:', error)
+          Log.error('Failed to kill process:', error)
        } else {
-          Log.info('Process force killed successfully')
+          Log.info('Process killed successfully')
        }
      })
    }
-    this.status = 'stopping'
  }
 }

@@ -269,12 +250,18 @@ function handleEngineData(data: any) {
  else if(data.command === 'caption') {
    allConfig.updateCaptionLog(data);
  }
+  else if(data.command === 'translation') {
+    allConfig.updateCaptionTranslation(data);
+  }
  else if(data.command === 'print') {
-    Log.info('Engine Print:', data.content)
+    console.log(data.content)
  }
  else if(data.command === 'info') {
    Log.info('Engine Info:', data.content)
  }
+  else if(data.command === 'warn') {
+    Log.warn('Engine Warn:', data.content)
+  }
  else if(data.command === 'error') {
    Log.error('Engine Error:', data.content)
    controlWindow.sendErrorMessage(/*i18n('engine.error') +*/ data.content)
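Review note on the command line built in the first hunk of this file: `-t` is now passed for every engine, and the Vosk branch adds `-vosk`, `-tm` and `-omn`. The engine's `main.py` is not part of this section, so the following is only a sketch of how the Python side might parse these flags with `argparse`. The function name, `dest` names and defaults are assumptions. The model path is wrapped in double quotes by the caller, so the sketch strips them in case they reach the engine verbatim (this depends on how the process is spawned).

```python
import argparse


def parse_engine_args(argv: list) -> argparse.Namespace:
    """Hypothetical parser for the flags pushed by CaptionEngine.ts."""
    parser = argparse.ArgumentParser(description="caption engine (sketch)")
    parser.add_argument("-e", dest="engine", default="gummy")
    parser.add_argument("-a", dest="audio", type=int, choices=[0, 1], default=0)
    parser.add_argument("-p", dest="port", type=int, default=0)
    parser.add_argument("-s", dest="source_lang", default="auto")
    parser.add_argument("-t", dest="target_lang", default="none")
    parser.add_argument("-k", dest="api_key", default="")
    parser.add_argument("-vosk", dest="vosk_model", default="")
    parser.add_argument("-tm", dest="trans_model", default="ollama")
    parser.add_argument("-omn", dest="ollama_name", default="")
    args = parser.parse_args(argv)
    # Remove the literal quotes that the caller wraps around the model path.
    args.vosk_model = args.vosk_model.strip('"')
    return args
```

For example, `parse_engine_args(["-e", "vosk", "-t", "zh-cn", "-vosk", '"/models/vosk"', "-tm", "google"])` would yield `target_lang == "zh-cn"` and `vosk_model == "/models/vosk"`; the path here is a placeholder.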
--- a/src/renderer/src/components/EngineControl.vue
+++ b/src/renderer/src/components/EngineControl.vue
@@ -5,9 +5,18 @@
      <a @click="applyChange">{{ $t('engine.applyChange') }}</a> |
      <a @click="cancelChange">{{ $t('engine.cancelChange') }}</a>
    </template>
+    <div class="input-item">
+      <span class="input-label">{{ $t('engine.captionEngine') }}</span>
+      <a-select
+        class="input-area"
+        v-model:value="currentEngine"
+        :options="captionEngine"
+      ></a-select>
+    </div>
    <div class="input-item">
      <span class="input-label">{{ $t('engine.sourceLang') }}</span>
      <a-select
+        :disabled="currentEngine === 'vosk'"
        class="input-area"
        v-model:value="currentSourceLang"
        :options="langList"
@@ -16,20 +25,33 @@
    <div class="input-item">
      <span class="input-label">{{ $t('engine.transLang') }}</span>
      <a-select
-        :disabled="currentEngine === 'vosk'"
        class="input-area"
        v-model:value="currentTargetLang"
        :options="langList.filter((item) => item.value !== 'auto')"
      ></a-select>
    </div>
-    <div class="input-item">
-      <span class="input-label">{{ $t('engine.captionEngine') }}</span>
+    <div class="input-item" v-if="transModel">
+      <span class="input-label">{{ $t('engine.transModel') }}</span>
      <a-select
        class="input-area"
-        v-model:value="currentEngine"
-        :options="captionEngine"
+        v-model:value="currentTransModel"
+        :options="transModel"
      ></a-select>
    </div>
+    <div class="input-item" v-if="transModel && currentTransModel === 'ollama'">
+      <a-popover placement="right">
+        <template #content>
+          <p class="label-hover-info">{{ $t('engine.ollamaNote') }}</p>
+        </template>
+        <span class="input-label info-label"
+          :style="{color: uiColor}"
+        >{{ $t('engine.ollama') }}</span>
+      </a-popover>
+      <a-input
+        class="input-area"
+        v-model:value="currentOllamaName"
+      ></a-input>
+    </div>
    <div class="input-item">
      <span class="input-label">{{ $t('engine.audioType') }}</span>
      <a-select
@@ -80,11 +102,13 @@

    <a-card size="small" :title="$t('engine.showMore')" v-show="showMore" style="margin-top:10px;">
      <div class="input-item">
-        <a-popover>
+        <a-popover placement="right">
          <template #content>
            <p class="label-hover-info">{{ $t('engine.apikeyInfo') }}</p>
          </template>
-          <span class="input-label info-label">{{ $t('engine.apikey') }}</span>
+          <span class="input-label info-label"
+            :style="{color: uiColor}"
+          >{{ $t('engine.apikey') }}</span>
        </a-popover>
        <a-input
          class="input-area"
@@ -93,14 +117,17 @@
        />
      </div>
      <div class="input-item">
-        <a-popover>
+        <a-popover placement="right">
          <template #content>
            <p class="label-hover-info">{{ $t('engine.modelPathInfo') }}</p>
          </template>
-          <span class="input-label info-label">{{ $t('engine.modelPath') }}</span>
+          <span class="input-label info-label"
+            :style="{color: uiColor}"
+          >{{ $t('engine.modelPath') }}</span>
        </a-popover>
        <span
          class="input-folder"
+          :style="{color: uiColor}"
          @click="selectFolderPath"
        ><span><FolderOpenOutlined /></span></span>
        <a-input
@@ -110,13 +137,13 @@
        />
      </div>
      <div class="input-item">
-        <a-popover>
+        <a-popover placement="right">
          <template #content>
            <p class="label-hover-info">{{ $t('engine.startTimeoutInfo') }}</p>
          </template>
          <span
            class="input-label info-label"
-            style="vertical-align: middle;"
+            :style="{color: uiColor, verticalAlign: 'middle'}"
          >{{ $t('engine.startTimeout') }}</span>
        </a-popover>
        <a-input-number
@@ -134,12 +161,12 @@
 </template>

 <script setup lang="ts">
-import { ref, computed, watch } from 'vue'
+import { ref, computed, watch, h } from 'vue'
 import { storeToRefs } from 'pinia'
 import { useGeneralSettingStore } from '@renderer/stores/generalSetting'
 import { useEngineControlStore } from '@renderer/stores/engineControl'
 import { notification } from 'ant-design-vue'
-import { FolderOpenOutlined ,InfoCircleOutlined } from '@ant-design/icons-vue';
+import { ExclamationCircleOutlined, FolderOpenOutlined ,InfoCircleOutlined } from '@ant-design/icons-vue';
 import { useI18n } from 'vue-i18n'

 const { t } = useI18n()
@@ -148,11 +175,16 @@ const showMore = ref(false)
 const engineControl = useEngineControlStore()
 const { captionEngine, audioType, changeSignal } = storeToRefs(engineControl)

+const generalSetting = useGeneralSettingStore()
+const { uiColor } = storeToRefs(generalSetting)
+
 const currentSourceLang = ref('auto')
 const currentTargetLang = ref('zh')
 const currentEngine = ref<string>('gummy')
 const currentAudio = ref<0 | 1>(0)
-const currentTranslation = ref<boolean>(false)
+const currentTranslation = ref<boolean>(true)
+const currentTransModel = ref('ollama')
+const currentOllamaName = ref('')
 const currentAPI_KEY = ref<string>('')
 const currentModelPath = ref<string>('')
 const currentCustomized = ref<boolean>(false)
@@ -169,9 +201,33 @@ const langList = computed(() => {
  return []
 })

+const transModel = computed(() => {
+  for(let item of captionEngine.value){
+    if(item.value === currentEngine.value) {
+      return item.transModel
+    }
+  }
+  return []
+})
+
 function applyChange(){
+  if(
+    currentTranslation.value && transModel.value &&
+    currentTransModel.value === 'ollama' && !currentOllamaName.value.trim()
+  ) {
+    notification.open({
+      message: t('noti.ollamaNameNull'),
+      description: t('noti.ollamaNameNullNote'),
+      duration: null,
+      icon: () => h(ExclamationCircleOutlined, { style: 'color: #ff4d4f' })
+    })
+    return
+  }
+
  engineControl.sourceLang = currentSourceLang.value
  engineControl.targetLang = currentTargetLang.value
+  engineControl.transModel = currentTransModel.value
+  engineControl.ollamaName = currentOllamaName.value
  engineControl.engine = currentEngine.value
  engineControl.audio = currentAudio.value
  engineControl.translation = currentTranslation.value
@@ -194,6 +250,8 @@ function applyChange(){
 function cancelChange(){
  currentSourceLang.value = engineControl.sourceLang
  currentTargetLang.value = engineControl.targetLang
+  currentTransModel.value = engineControl.transModel
+  currentOllamaName.value = engineControl.ollamaName
  currentEngine.value = engineControl.engine
  currentAudio.value = engineControl.audio
  currentTranslation.value = engineControl.translation
@@ -222,7 +280,10 @@ watch(changeSignal, (val) => {
 watch(currentEngine, (val) => {
  if(val == 'vosk'){
    currentSourceLang.value = 'auto'
-    currentTargetLang.value = ''
+    currentTargetLang.value = useGeneralSettingStore().uiLanguage
+    if(currentTargetLang.value === 'zh') {
+      currentTargetLang.value = 'zh-cn'
+    }
  }
  else if(val == 'gummy'){
    currentSourceLang.value = 'auto'
@@ -240,8 +301,8 @@ watch(currentEngine, (val) => {
 }

 .info-label {
-  color: #1677ff;
  cursor: pointer;
+  font-style: italic;
 }

 .input-folder {
@@ -252,20 +313,12 @@ watch(currentEngine, (val) => {
  transition: all 0.25s;
 }

-.input-folder>span {
-  padding: 0 2px;
-  border: 2px solid #1677ff;
-  color: #1677ff;
-  border-radius: 30%;
-}
-
 .input-folder:hover {
  transform: scale(1.1);
 }

 .customize-note {
  padding: 10px 10px 0;
-  color: red;
  max-width: min(40vw, 480px);
 }
 </style>
--- a/src/renderer/src/i18n/config/engine.ts
+++ b/src/renderer/src/i18n/config/engine.ts
@@ -21,6 +21,19 @@ export const engines = {
      label: '本地 -  Vosk',
      languages: [
        { value: 'auto', label: '需要自行配置模型' },
+        { value: 'en', label: '英语' },
+        { value: 'zh-cn', label: '中文' },
+        { value: 'ja', label: '日语' },
+        { value: 'ko', label: '韩语' },
+        { value: 'de', label: '德语' },
+        { value: 'fr', label: '法语' },
+        { value: 'ru', label: '俄语' },
+        { value: 'es', label: '西班牙语' },
+        { value: 'it', label: '意大利语' },
+      ],
+      transModel: [
+        { value: 'ollama', label: 'Ollama 本地模型' },
+        { value: 'google', label: 'Google API 调用' },
      ]
    }
  ],
@@ -46,6 +59,19 @@ export const engines = {
      label: 'Local - Vosk',
      languages: [
        { value: 'auto', label: 'Model needs to be configured manually' },
+        { value: 'en', label: 'English' },
+        { value: 'zh-cn', label: 'Chinese' },
+        { value: 'ja', label: 'Japanese' },
+        { value: 'ko', label: 'Korean' },
+        { value: 'de', label: 'German' },
+        { value: 'fr', label: 'French' },
+        { value: 'ru', label: 'Russian' },
+        { value: 'es', label: 'Spanish' },
+        { value: 'it', label: 'Italian' },
+      ],
+      transModel: [
+        { value: 'ollama', label: 'Ollama Local Model' },
+        { value: 'google', label: 'Google API Call' },
      ]
    }
  ],
@@ -71,8 +97,20 @@ export const engines = {
      label: 'ローカル - Vosk',
      languages: [
        { value: 'auto', label: 'モデルを手動で設定する必要があります' },
+        { value: 'en', label: '英語' },
+        { value: 'zh-cn', label: '中国語' },
+        { value: 'ja', label: '日本語' },
+        { value: 'ko', label: '韓国語' },
+        { value: 'de', label: 'ドイツ語' },
+        { value: 'fr', label: 'フランス語' },
+        { value: 'ru', label: 'ロシア語' },
+        { value: 'es', label: 'スペイン語' },
+        { value: 'it', label: 'イタリア語' },
+      ],
+      transModel: [
+        { value: 'ollama', label: 'Ollama ローカルモデル' },
+        { value: 'google', label: 'Google API 呼び出し' },
      ]
    }
  ]
 }
-
--- a/src/renderer/src/i18n/lang/en.ts
+++ b/src/renderer/src/i18n/lang/en.ts
@@ -28,7 +28,9 @@ export default {
    "changeInfo": "If the caption engine is already running, you need to restart it for the changes to take effect.",
    "styleChange": "Caption Style Changed",
    "styleInfo": "Caption style changes have been saved and applied.",
-    "engineStartTimeout": "Caption engine startup timeout, automatically force stopped"
+    "engineStartTimeout": "Caption engine startup timeout, automatically force stopped",
+    "ollamaNameNull": "'Ollama' Field is Empty",
+    "ollamaNameNullNote": "When selecting Ollama model as the translation model, the 'Ollama' field cannot be empty and must be filled with the name of a locally configured Ollama model."
  },
  general: {
    "title": "General Settings",
@@ -47,6 +49,9 @@ export default {
    "cancelChange": "Cancel Changes",
    "sourceLang": "Source",
    "transLang": "Translation",
+    "transModel": "Model",
+    "ollama": "Ollama",
+    "ollamaNote": "Name of the local Ollama model to use for translation; the Ollama service on the default port will be called. A non-reasoning model with fewer than 1B parameters is recommended.",
    "captionEngine": "Engine",
    "audioType": "Audio Type",
    "systemOutput": "System Audio Output (Speaker)",
--- a/src/renderer/src/i18n/lang/ja.ts
+++ b/src/renderer/src/i18n/lang/ja.ts
@@ -28,7 +28,9 @@ export default {
    "changeInfo": "字幕エンジンがすでに起動している場合、変更を有効にするには再起動が必要です。",
    "styleChange": "字幕のスタイルが変更されました",
    "styleInfo": "字幕のスタイル変更が保存され、適用されました",
-    "engineStartTimeout": "字幕エンジンの起動がタイムアウトしました。自動的に強制停止しました"
+    "engineStartTimeout": "字幕エンジンの起動がタイムアウトしました。自動的に強制停止しました",
+    "ollamaNameNull": "Ollama フィールドが空です",
+    "ollamaNameNullNote": "Ollama モデルを翻訳モデルとして選択する場合、Ollama フィールドは空にできません。ローカルで設定された Ollama モデルの名前を入力してください。"
  },
  general: {
    "title": "一般設定",
@@ -47,6 +49,9 @@ export default {
    "cancelChange": "変更をキャンセル",
    "sourceLang": "ソース言語",
    "transLang": "翻訳言語",
+    "transModel": "翻訳モデル",
+    "ollama": "Ollama",
+    "ollamaNote": "翻訳に使用する、デフォルトポートでサービスを呼び出すローカルOllamaモデルの名前。1B 未満のパラメータを持つ非推論モデルの使用を推奨します。",
    "captionEngine": "エンジン",
    "audioType": "オーディオ",
    "systemOutput": "システムオーディオ出力（スピーカー）",
--- a/src/renderer/src/i18n/lang/zh.ts
+++ b/src/renderer/src/i18n/lang/zh.ts
@@ -28,7 +28,9 @@ export default {
    "changeInfo": "如果字幕引擎已经启动，需要重启字幕引擎修改才会生效",
    "styleChange": "字幕样式已修改",
    "styleInfo": "字幕样式修改已经保存并生效",
-    "engineStartTimeout": "字幕引擎启动超时，已自动强制停止"
+    "engineStartTimeout": "字幕引擎启动超时，已自动强制停止",
+    "ollamaNameNull": "Ollama 字段为空",
+    "ollamaNameNullNote": "选择 Ollama 模型作为翻译模型时，Ollama 字段不能为空，需要填写本地已经配置好的 Ollama 模型的名称。"
  },
  general: {
    "title": "通用设置",
@@ -47,6 +49,9 @@ export default {
    "cancelChange": "取消更改",
    "sourceLang": "源语言",
    "transLang": "翻译语言",
+    "transModel": "翻译模型",
+    "ollama": "Ollama",
+    "ollamaNote": "要使用的进行翻译的本地 Ollama 模型的名称，将调用默认端口的服务，建议使用参数量小于 1B 的非推理模型。",
    "captionEngine": "字幕引擎",
    "audioType": "音频类型",
    "systemOutput": "系统音频输出（扬声器）",
--- a/src/renderer/src/stores/captionLog.ts
+++ b/src/renderer/src/stores/captionLog.ts
@@ -15,7 +15,12 @@ export const useCaptionLogStore = defineStore('captionLog', () => {
  })

  window.electron.ipcRenderer.on('both.captionLog.upd', (_, log) => {
-    captionData.value.splice(captionData.value.length - 1, 1, log)
+    for(let i = captionData.value.length - 1; i >= 0; i--) {
+      if(captionData.value[i].time_s === log.time_s){
+        captionData.value.splice(i, 1, log)
+        break
+      }
+    }
  })

  window.electron.ipcRenderer.on('both.captionLog.set', (_, logs) => {
--- a/src/renderer/src/stores/engineControl.ts
+++ b/src/renderer/src/stores/engineControl.ts
@@ -19,6 +19,8 @@ export const useEngineControlStore = defineStore('engineControl', () => {
  const engineEnabled = ref(false)
  const sourceLang = ref<string>('en')
  const targetLang = ref<string>('zh')
+  const transModel = ref<string>('ollama')
+  const ollamaName = ref<string>('')
  const engine = ref<string>('gummy')
  const audio = ref<0 | 1>(0)
  const translation = ref<boolean>(true)
@@ -37,6 +39,8 @@ export const useEngineControlStore = defineStore('engineControl', () => {
      engineEnabled: engineEnabled.value,
      sourceLang: sourceLang.value,
      targetLang: targetLang.value,
+      transModel: transModel.value,
+      ollamaName: ollamaName.value,
      engine: engine.value,
      audio: audio.value,
      translation: translation.value,
@@ -68,6 +72,8 @@ export const useEngineControlStore = defineStore('engineControl', () => {
    }
    sourceLang.value = controls.sourceLang
    targetLang.value = controls.targetLang
+    transModel.value = controls.transModel
+    ollamaName.value = controls.ollamaName
    engine.value = controls.engine
    audio.value = controls.audio
    engineEnabled.value = controls.engineEnabled
@@ -132,6 +138,8 @@ export const useEngineControlStore = defineStore('engineControl', () => {
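    engineEnabled,      // whether the caption engine is enabled
    sourceLang,         // source language
    targetLang,         // target language
+    transModel,         // translation model
+    ollamaName,        // Ollama model name
    engine,             // caption engine
    audio,              // selected audio source
    translation,        // whether translation is enabled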
--- a/src/renderer/src/types/index.ts
+++ b/src/renderer/src/types/index.ts
@@ -6,6 +6,8 @@ export interface Controls {
  engineEnabled: boolean,
  sourceLang: string,
  targetLang: string,
+  transModel: string,
+  ollamaName: string,
  engine: string,
  audio: 0 | 1,
  translation: boolean,
Author	SHA1	Message	Date
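himeditator	6bff978b88	feat(engine): replace the resampling model and add a punctuation restoration model for SOSV - Replace the samplerate library with resampy to improve resampling quality - Add Chinese and English punctuation restoration models to Sherpa-ONNX SenseVoice	2025-09-06 23:15:33 +08:00
himeditator	eba2c5ca45	feat(engine): refactor the caption engine and add the Sherpa-ONNX SenseVoice speech recognition model - Refactor the caption engine so that audio capture runs on a new thread - Refactor the classes in audio2text and adjust the run logic - Update the main function to support the Sosv model - Change the AudioStream class to default to a 16000 Hz sample rate	2025-09-06 20:49:46 +08:00
himeditator	2b7ce06f04	feat(translation): add UI components for the non-real-time translation feature	2025-09-04 23:41:22 +08:00
himeditator	14987cbfc5	feat(vosk): add non-real-time translation for the Vosk model (#14) - Add Ollama LLM translation and Google Translate (non-real-time), supporting multiple languages - Add non-real-time translation to the Vosk engine - Add and modify interfaces for the new translation feature - Modify the Electron build configuration so that builds on different platforms no longer require editing the build file	2025-09-02 23:19:53 +08:00
himeditator	56fdc348f8	fix(engine): fix force-closing the caption engine failing when the engine status is not running - Merged the kill and forceKill methods in the CaptionEngine class and removed the early return in the status warning - Updated the macOS compatibility notes in the README and added a configuration link	2025-08-30 20:57:26 +08:00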