feat(engine): improve caption engine communication and control logic; improve window info display

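- Improve error handling and engine restart logic
- Add forced termination of the caption engine
- Adjust where notifications and error prompts are displayed
- Increase log timestamp precision to milliseconds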
feat(engine): refactor the caption engine and implement WebSocket communication
2026-02-24 19:04:43 +08:00 · 2025-07-28 21:44:49 +08:00 · 2025-07-28 15:49:52 +08:00
22 changed files with 368 additions and 320 deletions
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -114,3 +114,22 @@

 - Fix a bug where the custom caption engine could not be invoked
 - Fix a bug where the custom caption engine's arguments had no effect
+
+## v0.6.0
+
+2025-07-xx
+
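+### New features
+
+- Add caption log sorting: caption records can be shown in ascending or descending order
+
+### Experience improvements
+
+- Swap the positions of window notifications and error popups so that messages no longer cover the controls
+
+### Project improvements
+
+- Refactor the caption engine to improve its extensibility and readability
+- Merge the Gummy and Vosk engines into a single executable to reduce the app size
+- Add WebSocket communication between the caption engine and the main program so that the caption engine can no longer become an orphan process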
+
--- a/docs/TODO.md
+++ b/docs/TODO.md
@@ -16,10 +16,11 @@
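 - [x] Caption records can be exported in srt format *2025/07/14*
 - [x] The caption engine's system resource usage can be retrieved *2025/07/15*
 - [x] Add an option to sort caption records in descending time order *2025/07/26*
+- [x] Refactor the caption engine *2025/07/28*

 ## To do

-- [ ] Refactor the caption engine
+- [ ] Improve front-end notification messages
 - [ ] Verify / add a caption engine based on sherpa-onnx

 ## Future plans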
--- a/docs/api-docs/caption-engine.md
+++ b/docs/api-docs/caption-engine.md
@@ -1,17 +1,63 @@
 # caption engine api-doc

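-This document mainly covers the communication conventions between the Electron main process and the caption engine process.
+This document describes the communication conventions between the caption engine and the Electron main process.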

 ## How it works

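-The Python process in this project sends data to the Electron main process via standard output.
+The Python process in this project sends data to the Electron main process via standard output. The content of the Python process's standard output (`sys.stdout`) is always line-oriented text. Every line can be parsed as a JSON object, and every JSON object has a `command` field.

-The content of the Python process's standard output (`sys.stdout`) is always line-oriented text. Every line can be parsed as a JSON object, and every JSON object has a `command` field.
+The Electron main process sends data to the Python process over WebSocket. Every message is an object serialized to a string, and the object always has the following shape: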

-## Output conventions
+```js
+{
+  command: string,
+  content: string
+}
+```
+
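+## Standard output conventions
+
+> Data flow: caption engine process => Electron main process

 When a JSON object's `command` field takes one of the following values, it has the corresponding meaning: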

+### `connect`
+
+```js
+{
+  command: "connect",
+  content: ""
+}
+```
+
+The caption engine's WebSocket service is ready. This command tells the Electron main process to connect to it.
+
+### `kill`
+
+```js
+{
+  command: "kill",
+  content: ""
+}
+```
+
+Tells the Electron main process to forcibly terminate the caption engine process.
+
+### `caption`
+
+```js
+{
+  command: "caption",
+  index: number,
+  time_s: string,
+  time_t: string,
+  text: string,
+  translation: string
+}
+```
+
+Caption data transcribed from the audio stream captured on the Python side.
+
 ### `print`

 ```js
@@ -45,18 +91,12 @@ Informational message printed by the Python side; compared with `print`, this message is more intended for Electron

 Printed by the Gummy caption engine when it exits, reporting billing usage.

+## WebSocket

-### `caption`
+> Data flow: Electron main process => caption engine process

-```js
-{
-  command: "caption",
-  index: number,
-  time_s: string,
-  time_t: string,
-  text: string,
-  translation: string
-}
-```
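+When a JSON object's `command` field takes one of the following values, it has the corresponding meaning:

-Caption data transcribed from the audio stream captured on the Python side.
+### `stop`
+
+Tells the current caption engine to stop listening and end its task.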
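This is a minimal Python sketch of the two message shapes documented above. It assumes UTF-8 JSON with exactly one object per line on stdout, and a `{command, content}` object in the other direction. `emit` and `parse_command` are hypothetical helpers for illustration only; they are not the engine's actual `stdout_cmd` / `stdout_obj` utilities.

```python
import json
import sys
from typing import Any


def emit(command: str, **fields: Any) -> str:
    """Write one engine -> Electron message as a single JSON line on stdout."""
    # Hypothetical helper: every message carries a `command` field.
    line = json.dumps({"command": command, **fields}, ensure_ascii=False)
    sys.stdout.write(line + "\n")
    sys.stdout.flush()  # flush so the Electron side sees the line immediately
    return line


def parse_command(raw: str) -> tuple[str, str]:
    """Decode one Electron -> engine message of the form {command, content}."""
    obj = json.loads(raw)
    return obj["command"], obj.get("content", "")
```

For example, `emit("kill", content="")` writes the `kill` message, and `parse_command('{"command": "stop", "content": ""}')` returns `("stop", "")`.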
--- a/electron-builder.yml
+++ b/electron-builder.yml
@@ -11,20 +11,15 @@ files:
  - '!{.env,.env.*,.npmrc,pnpm-lock.yaml}'
  - '!{tsconfig.json,tsconfig.node.json,tsconfig.web.json}'
  - '!engine/*'
-  - '!engine-test/*'
  - '!docs/*'
  - '!assets/*'
 extraResources:
  # For Windows
-  - from: ./engine/dist/main-gummy.exe
-    to: ./engine/main-gummy.exe
-  - from: ./engine/dist/main-vosk.exe
-    to: ./engine/main-vosk.exe
+  - from: ./engine/dist/main.exe
+    to: ./engine/main.exe
  # For macOS and Linux
-  # - from: ./engine/dist/main-gummy
-  #   to: ./engine/main-gummy
-  # - from: ./engine/dist/main-vosk
-  #   to: ./engine/main-vosk
+  # - from: ./engine/dist/main
+  #   to: ./engine/main
 win:
  executableName: auto-caption
  icon: build/icon.png
--- a/engine/audio2text/__init__.py
+++ b/engine/audio2text/__init__.py
@@ -1,2 +1,3 @@
 from dashscope.common.error import InvalidParameter
-from .gummy import GummyTranslator
+from .gummy import GummyRecognizer
+from .vosk import VoskRecognizer
--- a/engine/audio2text/gummy.py
+++ b/engine/audio2text/gummy.py
@@ -6,7 +6,7 @@ from dashscope.audio.asr import (
 )
 import dashscope
 from datetime import datetime
-from utils import stdout_cmd, stdout_obj
+from utils import stdout_cmd, stdout_obj, stderr


 class Callback(TranslationRecognizerCallback):
@@ -62,7 +62,7 @@ class Callback(TranslationRecognizerCallback):
            stdout_obj(caption)


-class GummyTranslator:
+class GummyRecognizer:
    """
    Processes streamed audio data with the Gummy engine and writes JSON strings readable by Auto Caption to standard output

@@ -70,6 +70,7 @@ class GummyTranslator:
        rate: audio sample rate
        source: source language code string (zh, en, ja, etc.)
        target: target language code string (zh, en, ja, etc.)
+        api_key: API key for the Alibaba Cloud Bailian platform
    """
    def __init__(self, rate: int, source: str, target: str | None, api_key: str | None):
        if api_key:
@@ -95,4 +96,7 @@ class GummyTranslator:

    def stop(self):
        """Stop the Gummy engine"""
-        self.translator.stop()
+        try:
+            self.translator.stop()
+        except Exception:
+            return
--- a/engine/audio2text/vosk.py
+++ b/engine/audio2text/vosk.py
@@ -2,7 +2,8 @@ import json
 from datetime import datetime

 from vosk import Model, KaldiRecognizer, SetLogLevel
-from utils import stdout_obj
+from utils import stdout_cmd, stdout_obj
+

 class VoskRecognizer:
    """
@@ -11,7 +12,7 @@ class VoskRecognizer:
    Initialization parameters:
        model_path: path to the Vosk recognition model
    """
-    def __int__(self, model_path: str):
+    def __init__(self, model_path: str):
        SetLogLevel(-1)
        if model_path.startswith('"'):
            model_path = model_path[1:]
@@ -24,7 +25,11 @@ class VoskRecognizer:

        self.model = Model(self.model_path)
        self.recognizer = KaldiRecognizer(self.model, 16000)
-    
+
+    def start(self):
+        """Start the Vosk engine"""
+        stdout_cmd('info', 'Vosk recognizer started.')
+
    def send_audio_frame(self, data: bytes):
        """
        Send an audio frame to the Vosk engine; the engine recognizes it automatically and writes the result to standard output
@@ -57,3 +62,7 @@ class VoskRecognizer:
            self.prev_content = content
        
        stdout_obj(caption)
+
+    def stop(self):
+        """Stop the Vosk engine"""
+        stdout_cmd('info', 'Vosk recognizer closed.')
--- a/engine/main-gummy.py
+++ b/engine/main-gummy.py
@@ -1,49 +0,0 @@
-import sys
-import argparse
-from sysaudio import AudioStream
-from utils import merge_chunk_channels
-from audio2text import InvalidParameter, GummyTranslator
-
-
-def convert_audio_to_text(s_lang, t_lang, audio_type, chunk_rate, api_key):
-    stream = AudioStream(audio_type, chunk_rate)
-
-    if t_lang == 'none':
-        gummy = GummyTranslator(stream.RATE, s_lang, None, api_key)
-    else:
-        gummy = GummyTranslator(stream.RATE, s_lang, t_lang, api_key)
-
-    stream.open_stream()
-    gummy.start()
-
-    while True:
-        try:
-            chunk = stream.read_chunk()
-            if chunk is None: continue
-            chunk_mono = merge_chunk_channels(chunk, stream.CHANNELS)
-            try:
-                gummy.send_audio_frame(chunk_mono)
-            except InvalidParameter:
-                gummy.start()
-                gummy.send_audio_frame(chunk_mono)
-        except KeyboardInterrupt:
-            stream.close_stream()
-            gummy.stop()
-            break
-
-
-if __name__ == "__main__":
-    parser = argparse.ArgumentParser(description='Convert system audio stream to text')
-    parser.add_argument('-s', '--source_language', default='en', help='Source language code')
-    parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
-    parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output audio stream, 1 for input audio stream')
-    parser.add_argument('-c', '--chunk_rate', default=20, help='The number of audio stream chunks collected per second.')
-    parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
-    args = parser.parse_args()
-    convert_audio_to_text(
-        args.source_language,
-        args.target_language,
-        int(args.audio_type),
-        int(args.chunk_rate),
-        args.api_key
-    )
--- a/engine/main-gummy.spec
+++ b/engine/main-gummy.spec
@@ -1,39 +0,0 @@
-# -*- mode: python ; coding: utf-8 -*-
-
-
-a = Analysis(
-    ['main-gummy.py'],
-    pathex=[],
-    binaries=[],
-    datas=[],
-    hiddenimports=[],
-    hookspath=[],
-    hooksconfig={},
-    runtime_hooks=[],
-    excludes=[],
-    noarchive=False,
-    optimize=0,
-)
-pyz = PYZ(a.pure)
-
-exe = EXE(
-    pyz,
-    a.scripts,
-    a.binaries,
-    a.datas,
-    [],
-    name='main-gummy',
-    debug=False,
-    bootloader_ignore_signals=False,
-    strip=False,
-    upx=True,
-    upx_exclude=[],
-    runtime_tmpdir=None,
-    console=True,
-    disable_windowed_traceback=False,
-    argv_emulation=False,
-    target_arch=None,
-    codesign_identity=None,
-    entitlements_file=None,
-    onefile=True,
-)
--- a/engine/main-vosk.py
+++ b/engine/main-vosk.py
@@ -1,77 +0,0 @@
-import sys
-import json
-import argparse
-from datetime import datetime
-import numpy.core.multiarray
-
-from sysaudio import AudioStream
-from vosk import Model, KaldiRecognizer, SetLogLevel
-from utils import resample_chunk_mono
-
-SetLogLevel(-1)
-
-def convert_audio_to_text(audio_type, chunk_rate, model_path):
-    sys.stdout.reconfigure(line_buffering=True) # type: ignore
-
-    if model_path.startswith('"'):
-        model_path = model_path[1:]
-    if model_path.endswith('"'):
-        model_path = model_path[:-1]
-
-    model = Model(model_path)
-    recognizer = KaldiRecognizer(model, 16000)
-
-    stream = AudioStream(audio_type, chunk_rate)
-    stream.open_stream()
-
-    time_str = ''
-    cur_id = 0
-    prev_content = ''
-
-    while True:
-        chunk = stream.read_chunk()
-        if chunk is None: continue
-        chunk_mono = resample_chunk_mono(chunk, stream.CHANNELS, stream.RATE, 16000)
-
-        caption = {}
-        if recognizer.AcceptWaveform(chunk_mono):
-            content = json.loads(recognizer.Result()).get('text', '')
-            caption['index'] = cur_id
-            caption['text'] = content
-            caption['time_s'] = time_str
-            caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
-            caption['translation'] = ''
-            prev_content = ''
-            cur_id += 1
-        else:
-            content = json.loads(recognizer.PartialResult()).get('partial', '')
-            if content == '' or content == prev_content:
-                continue
-            if prev_content == '':
-                time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
-            caption['command'] = 'caption'
-            caption['index'] = cur_id
-            caption['text'] = content
-            caption['time_s'] = time_str
-            caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
-            caption['translation'] = ''
-            prev_content = content
-        try:
-            json_str = json.dumps(caption) + '\n'
-            sys.stdout.write(json_str)
-            sys.stdout.flush()
-        except Exception as e:
-            print(e)
-
-
-if __name__ == "__main__":
-    parser = argparse.ArgumentParser(description='Convert system audio stream to text')
-    parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output audio stream, 1 for input audio stream')
-    parser.add_argument('-c', '--chunk_rate', default=20, help='The number of audio stream chunks collected per second.')
-    parser.add_argument('-m', '--model_path', default='', help='The path to the vosk model.')
-    args = parser.parse_args()
-    convert_audio_to_text(
-        int(args.audio_type),
-        int(args.chunk_rate),
-        args.model_path
-    )
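The Vosk loop above turns partial and final recognition results into caption records. Partial results reuse the current `index` and start time, and a final result closes the sentence and advances the index. The following is a hypothetical sketch of that indexing scheme. `CaptionTracker` and `now_str` are illustrative names, and the sketch takes plain strings instead of calling `KaldiRecognizer`, so it runs without Vosk installed. It also tags every record with `"command": "caption"`, as the stdout convention requires.

```python
from datetime import datetime
from typing import Optional


def now_str() -> str:
    """Wall-clock time as HH:MM:SS.mmm, the time format used by the engine."""
    return datetime.now().strftime("%H:%M:%S.%f")[:-3]


class CaptionTracker:
    """Hypothetical sketch: map partial/final ASR text to caption records."""

    def __init__(self) -> None:
        self.index = 0    # id of the sentence currently being recognized
        self.time_s = ""  # start time of the current sentence
        self.prev = ""    # last partial text, used to drop duplicates

    def _record(self, text: str) -> dict:
        return {"command": "caption", "index": self.index, "text": text,
                "time_s": self.time_s, "time_t": now_str(), "translation": ""}

    def partial(self, text: str) -> Optional[dict]:
        if text == "" or text == self.prev:
            return None  # nothing new to display
        if self.prev == "":
            self.time_s = now_str()  # first partial of a sentence marks its start
        self.prev = text
        return self._record(text)

    def final(self, text: str) -> dict:
        record = self._record(text)  # same index as the partials it replaces
        self.prev = ""
        self.index += 1  # the next sentence gets a fresh index
        return record
```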
--- a/engine/main.py
+++ b/engine/main.py
@@ -1,10 +1,65 @@
 import argparse
+from utils import stdout_cmd, stderr
+from utils import thread_data, start_server
+from utils import merge_chunk_channels, resample_chunk_mono
+from audio2text import InvalidParameter, GummyRecognizer
+from audio2text import VoskRecognizer
+from sysaudio import AudioStream

-def gummy_engine(s, t, a, c, k):
-    pass

-def vosk_engine(a, c, m):
-    pass
+def main_gummy(s: str, t: str, a: int, c: int, k: str):
+    global thread_data
+    stream = AudioStream(a, c)
+    if t == 'none':
+        engine = GummyRecognizer(stream.RATE, s, None, k)
+    else:
+        engine = GummyRecognizer(stream.RATE, s, t, k)
+
+    stream.open_stream()
+    engine.start()
+
+    restart_count = 0
+    while thread_data.status == "running":
+        try:
+            chunk = stream.read_chunk()
+            if chunk is None: continue
+            chunk_mono = merge_chunk_channels(chunk, stream.CHANNELS)
+            try:
+                engine.send_audio_frame(chunk_mono)
+            except InvalidParameter as e:
+                restart_count += 1
+                if restart_count > 8:
+                    stderr(str(e))
+                    thread_data.status = "kill"
+                    break
+                else:
+                    stdout_cmd('info', f'Gummy engine stopped, trying to restart #{restart_count}')
+        except KeyboardInterrupt:
+            break
+
+    stream.close_stream()
+    engine.stop()
+
+def main_vosk(a: int, c: int, m: str):
+    global thread_data
+    stream = AudioStream(a, c)
+    engine = VoskRecognizer(m)
+
+    stream.open_stream()
+    engine.start()
+
+    while thread_data.status == "running":
+        try:
+            chunk = stream.read_chunk()
+            if chunk is None: continue
+            chunk_mono = resample_chunk_mono(chunk, stream.CHANNELS, stream.RATE, 16000)
+            engine.send_audio_frame(chunk_mono)
+        except KeyboardInterrupt:
+            break
+
+    stream.close_stream()
+    engine.stop()
+

 if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Convert system audio stream to text')
@@ -12,15 +67,22 @@ if __name__ == "__main__":
    parser.add_argument('-e', '--caption_engine', default='gummy', help='Caption engine: gummy or vosk')
    parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output, 1 for input')
    parser.add_argument('-c', '--chunk_rate', default=20, help='Number of audio stream chunks collected per second')
+    parser.add_argument('-p', '--port', default=7070, help='The port to run the server on, 0 for no server')
    # gummy
    parser.add_argument('-s', '--source_language', default='en', help='Source language code')
    parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
    parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
    # vosk
    parser.add_argument('-m', '--model_path', default='', help='The path to the vosk model.')
-    args = parser.parse_args()
+
+    args = parser.parse_args()
+    if int(args.port) == 0:
+        thread_data.status = "running"
+    else:
+        start_server(int(args.port))
+
    if args.caption_engine == 'gummy':
-        gummy_engine(
+        main_gummy(
            args.source_language,
            args.target_language,
            int(args.audio_type),
@@ -28,10 +90,13 @@ if __name__ == "__main__":
            args.api_key
        )
    elif args.caption_engine == 'vosk':
-        vosk_engine(
+        main_vosk(
            int(args.audio_type),
            int(args.chunk_rate),
            args.model_path
        )
    else:
-        raise ValueError('Invalid caption engine specified.')
+        raise ValueError('Invalid caption engine specified.')
+    
+    if thread_data.status == "kill":
+        stdout_cmd('kill')
--- a/engine/main-vosk.spec
+++ b/engine/main-vosk.spec
@@ -9,7 +9,7 @@ else:
    vosk_path = str(Path('./subenv/lib/python3.12/site-packages/vosk').resolve())

 a = Analysis(
-    ['main-vosk.py'],
+    ['main.py'],
    pathex=[],
    binaries=[],
    datas=[(vosk_path, 'vosk')],
@@ -30,7 +30,7 @@ exe = EXE(
    a.binaries,
    a.datas,
    [],
-    name='main-vosk',
+    name='main',
    debug=False,
    bootloader_ignore_signals=False,
    strip=False,
--- a/engine/utils/__init__.py
+++ b/engine/utils/__init__.py
@@ -1,2 +1,4 @@
-from .process import merge_chunk_channels, resample_chunk_mono, resample_mono_chunk
-from .sysout import stdout, stdout_cmd, stdout_obj, stderr
+from .audioprcs import merge_chunk_channels, resample_chunk_mono, resample_mono_chunk
+from .sysout import stdout, stdout_cmd, stdout_obj, stderr
+from .thdata import thread_data
+from .server import start_server
--- a/engine/utils/audioprcs.py
+++ b/engine/utils/audioprcs.py
@@ -1,6 +1,6 @@
 import samplerate
 import numpy as np
-
+import numpy.core.multiarray # do not remove

 def merge_chunk_channels(chunk: bytes, channels: int) -> bytes:
    """
@@ -13,6 +13,7 @@ def merge_chunk_channels(chunk: bytes, channels: int) -> bytes:
    Returns:
        Mono audio data chunk
    """
+    if channels == 1: return chunk
    # (length * channels,)
    chunk_np = np.frombuffer(chunk, dtype=np.int16)
    # (length, channels)
@@ -37,13 +38,17 @@ def resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: in
    Return:
        单通道音频数据块
    """
-    # (length * channels,)
-    chunk_np = np.frombuffer(chunk, dtype=np.int16)
-    # (length, channels)
-    chunk_np = chunk_np.reshape(-1, channels)
-    # (length,)
-    chunk_mono_f = np.mean(chunk_np.astype(np.float32), axis=1)
-    chunk_mono = chunk_mono_f.astype(np.int16)
+    if channels == 1:
+        chunk_mono = np.frombuffer(chunk, dtype=np.int16)
+    else:
+        # (length * channels,)
+        chunk_np = np.frombuffer(chunk, dtype=np.int16)
+        # (length, channels)
+        chunk_np = chunk_np.reshape(-1, channels)
+        # (length,)
+        chunk_mono_f = np.mean(chunk_np.astype(np.float32), axis=1)
+        chunk_mono = chunk_mono_f.astype(np.int16)
+
    ratio = target_sr / orig_sr
    chunk_mono_r = samplerate.resample(chunk_mono, ratio, converter_type=mode)
    chunk_mono_r = np.round(chunk_mono_r).astype(np.int16)
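Both helpers above begin by averaging interleaved 16-bit channels down to mono. The numpy code reshapes the samples to `(frames, channels)` and takes the mean along axis 1. The following is a dependency-free sketch of the same step using the standard `array` module. It assumes native-endian signed 16-bit PCM and whole frames (that is, `len(chunk)` is a multiple of `2 * channels`). `downmix_int16` is a hypothetical name. It truncates toward zero, which is what `astype(np.int16)` does after `np.mean`.

```python
from array import array


def downmix_int16(chunk: bytes, channels: int) -> bytes:
    """Average interleaved int16 PCM frames ([L0, R0, L1, R1, ...]) to mono."""
    if channels == 1:
        return chunk  # already mono, same shortcut as merge_chunk_channels
    samples = array("h")  # signed 16-bit, native byte order like np.int16
    samples.frombytes(chunk)
    mono = array("h", (
        # int() truncates toward zero, like astype(np.int16) after np.mean
        int(sum(samples[i:i + channels]) / channels)
        for i in range(0, len(samples), channels)
    ))
    return mono.tobytes()
```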
--- a/engine/utils/server.py
+++ b/engine/utils/server.py
@@ -0,0 +1,36 @@
+import socket
+import threading
+import json
+from utils import thread_data, stdout_cmd, stderr
+
+
+def handle_client(client_socket):
+    global thread_data
+    while thread_data.status == 'running':
+        try:
+            data = client_socket.recv(4096).decode('utf-8')
+            if not data:
+                break
+            data = json.loads(data)
+
+            if data['command'] == 'stop':
+                thread_data.status = 'stop'
+                break
+        except Exception as e:
+            stderr(f'Communication error: {e}')
+            break
+    
+    thread_data.status = 'stop'
+    client_socket.close()
+
+
+def start_server(port: int):
+    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+    server.bind(('localhost', port))
+    server.listen(1)
+    stdout_cmd('connect')
+
+    client, addr = server.accept()
+    client_handler = threading.Thread(target=handle_client, args=(client,))
+    client_handler.daemon = True
+    client_handler.start()
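Despite the WebSocket naming in the docs and changelog, this server and the Electron client (`net.createConnection` in `CaptionEngine.ts`) exchange raw bytes over a plain TCP socket, with no WebSocket handshake or framing. Because TCP is a byte stream, a single `recv(4096)` may return half a message or several messages, and `json.loads` on that buffer would then fail.

The following sketch frames commands on newlines instead. It assumes the sender appends `'\n'` to every message, which `sendCommand` does not currently do, since it writes `JSON.stringify(...)` on its own. `EngineState`, `serve_commands`, and `start_command_thread` are hypothetical names.

```python
import json
import socket
import threading


class EngineState:
    """Hypothetical stand-in for utils.thdata.ThreadData."""

    def __init__(self) -> None:
        self.status = "running"


def serve_commands(conn: socket.socket, state: EngineState) -> None:
    """Read newline-delimited JSON commands until `stop` or EOF."""
    try:
        with conn, conn.makefile("r", encoding="utf-8") as reader:
            # Iterating the text reader yields whole lines (plus any tail at EOF),
            # regardless of how the bytes were split across recv() calls.
            for line in reader:
                line = line.strip()
                if not line:
                    continue
                if json.loads(line).get("command") == "stop":
                    break
    finally:
        # A `stop` command, a closed connection, or a bad message all end the engine loop.
        state.status = "stop"


def start_command_thread(conn: socket.socket, state: EngineState) -> threading.Thread:
    """Serve commands on a daemon thread, as start_server does with handle_client."""
    thread = threading.Thread(target=serve_commands, args=(conn, state), daemon=True)
    thread.start()
    return thread
```

A `stop` message split across two writes is still recognized, because the reader waits for the newline before parsing.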
--- a/engine/utils/thdata.py
+++ b/engine/utils/thdata.py
@@ -0,0 +1,5 @@
+class ThreadData:
+    def __init__(self):
+        self.status = "running"
+
+thread_data = ThreadData()
--- a/src/main/utils/CaptionEngine.ts
+++ b/src/main/utils/CaptionEngine.ts
@@ -1,7 +1,8 @@
-import { spawn, exec } from 'child_process'
+import { exec, spawn } from 'child_process'
 import { app } from 'electron'
 import { is } from '@electron-toolkit/utils'
 import path from 'path'
+import net from 'net'
 import { controlWindow } from '../ControlWindow'
 import { allConfig } from './AllConfig'
 import { i18n } from '../i18n'
@@ -11,22 +12,22 @@ export class CaptionEngine {
  appPath: string = ''
  command: string[] = []
  process: any | undefined
-  processStatus: 'running' | 'stopping' | 'stopped' = 'stopped'
+  client: net.Socket | undefined
+  status: 'running' | 'starting' | 'stopping' | 'stopped' = 'stopped'

  private getApp(): boolean {
-    if (allConfig.controls.customized && allConfig.controls.customizedApp) {
-      Log.info('Using customized engine')
+    if (allConfig.controls.customized) {
+      Log.info('Using customized caption engine')
      this.appPath = allConfig.controls.customizedApp
      this.command = allConfig.controls.customizedCommand.split(' ')
    }
-    else if (allConfig.controls.engine === 'gummy') {
-      allConfig.controls.customized = false
-      if(!allConfig.controls.API_KEY && !process.env.DASHSCOPE_API_KEY) {
+    else {
+      if(allConfig.controls.engine === 'gummy' && 
+        !allConfig.controls.API_KEY && !process.env.DASHSCOPE_API_KEY
+      ) {
        controlWindow.sendErrorMessage(i18n('gummy.key.missing'))
        return false
      }
-      let gummyName = 'main-gummy'
-      if (process.platform === 'win32') { gummyName += '.exe' }
      this.command = []
      if (is.dev) {
        this.appPath = path.join(
@@ -34,70 +35,44 @@ export class CaptionEngine {
          'subenv', 'Scripts', 'python.exe'
        )
        this.command.push(path.join(
-          app.getAppPath(), 'engine', 'main-gummy.py'
+          app.getAppPath(), 'engine', 'main.py'
        ))
+        // this.appPath = path.join(app.getAppPath(), 'engine', 'dist', 'main.exe')
      }
      else {
-        this.appPath = path.join(
-          process.resourcesPath, 'engine', gummyName
+        this.appPath = path.join(process.resourcesPath, 'engine', process.platform === 'win32' ? 'main.exe' : 'main')
+      }
+
+      if(allConfig.controls.engine === 'gummy') {
+        this.command.push('-e', 'gummy')
+        this.command.push('-s', allConfig.controls.sourceLang)
+        this.command.push(
+          '-t', allConfig.controls.translation ?
+          allConfig.controls.targetLang : 'none'
        )
+        this.command.push('-a', allConfig.controls.audio ? '1' : '0')
+        if(allConfig.controls.API_KEY) {
+          this.command.push('-k', allConfig.controls.API_KEY)
+        }
      }
-      this.command.push('-s', allConfig.controls.sourceLang)
-      this.command.push(
-        '-t', allConfig.controls.translation ?
-        allConfig.controls.targetLang : 'none'
-      )
-      this.command.push('-a', allConfig.controls.audio ? '1' : '0')
-      if(allConfig.controls.API_KEY) {
-        this.command.push('-k', allConfig.controls.API_KEY)
+      else if(allConfig.controls.engine === 'vosk'){
+        this.command.push('-e', 'vosk')
+        this.command.push('-a', allConfig.controls.audio ? '1' : '0')
+        this.command.push('-m', `"${allConfig.controls.modelPath}"`)        
      }
    }
-    else if(allConfig.controls.engine === 'vosk'){
-      allConfig.controls.customized = false
-      let voskName = 'main-vosk'
-      if (process.platform === 'win32') { voskName += '.exe' }
-      this.command = []
-      if (is.dev) {
-        this.appPath = path.join(
-          app.getAppPath(), 'engine',
-          'subenv', 'Scripts', 'python.exe'
-        )
-        this.command.push(path.join(
-          app.getAppPath(), 'engine', 'main-vosk.py'
-        ))
-      }
-      else {
-        this.appPath = path.join(
-          process.resourcesPath, 'engine', voskName
-        )
-      }
-      this.command.push('-a', allConfig.controls.audio ? '1' : '0')
-      this.command.push('-m', `"${allConfig.controls.modelPath}"`)
-    }
    Log.info('Engine Path:', this.appPath)
    Log.info('Engine Command:', this.command)
    return true
  }

-  public start() {
-    if (this.processStatus !== 'stopped') {
-      Log.warn('Caption engine status is not stopped, cannot start')
-      return
-    }
-    if(!this.getApp()){ return }
-
-    try {
-      this.process = spawn(this.appPath, this.command)
-    }
-    catch (e) {
-      controlWindow.sendErrorMessage(i18n('engine.start.error') + e)
-      Log.error('Error starting engine:', e)
-      return
-    }
-
-    this.processStatus = 'running'
-    Log.info('Caption Engine Started, PID:', this.process.pid)
-
+  public connect() {
+    if(this.client) { Log.warn('Client already exists, ignoring...'); return }
+    Log.info('Connecting to caption engine server...');
+    this.client = net.createConnection({ port: 7070 }, () => {
+      Log.info('Connected to caption engine server');
+    });
+    this.status = 'running'
    allConfig.controls.engineEnabled = true
    if(controlWindow.window){
      allConfig.sendControls(controlWindow.window)
@@ -106,9 +81,31 @@ export class CaptionEngine {
        this.process.pid
      )
    }
+  }

+  public sendCommand(command: string, content: string = "") {
+    if(this.client === undefined) {
+      Log.error('Client not initialized yet')
+      return
+    }
+    const data = JSON.stringify({command, content})
+    this.client.write(data);
+    Log.info(`Send data to python server: ${data}`);
+  }
+
+  public start() {
+    if (this.status !== 'stopped') {
+      Log.warn('Caption engine is not stopped, current status:', this.status)
+      return
+    }
+    if(!this.getApp()){ return }
+
+    this.process = spawn(this.appPath, this.command)
+    this.status = 'starting'
+    Log.info('Caption Engine Starting, PID:', this.process.pid)
+    
    this.process.stdout.on('data', (data: any) => {
-      const lines = data.toString().split('\n');
+      const lines = data.toString().split('\n')
      lines.forEach((line: string) => {
        if (line.trim()) {
          try {
@@ -123,66 +120,87 @@ export class CaptionEngine {
    });

    this.process.stderr.on('data', (data: any) => {
-      if(this.processStatus === 'stopping') return
-      controlWindow.sendErrorMessage(i18n('engine.error') + data)
-      Log.error(`Engine Error: ${data}`);
+      const lines = data.toString().split('\n')
+      lines.forEach((line: string) => {
+        if(line.trim()){
+          controlWindow.sendErrorMessage(/*i18n('engine.error') +*/ line)
+          Log.error('Engine Error:', line)
+        }
+      })
    });

    this.process.on('close', (code: any) => {
      this.process = undefined;
+      this.client = undefined
      allConfig.controls.engineEnabled = false
      if(controlWindow.window){
        allConfig.sendControls(controlWindow.window)
        controlWindow.window.webContents.send('control.engine.stopped')
      }
-      this.processStatus = 'stopped'
+      this.status = 'stopped'
      Log.info(`Engine exited with code ${code}`)
    });
  }

  public stop() {
-    if(this.processStatus !== 'running') return
+    if(this.status !== 'running'){
+      Log.warn('Engine is not running, current status:', this.status)
+      return
+    }
+    this.sendCommand('stop')
+    if(this.client){
+      this.client.end() // flush the pending 'stop' command, then close the connection
+      this.client = undefined
+    }
+    this.status = 'stopping'
+    Log.info('Caption engine process stopping...')
+  }
+
+  public kill(){
+    if(this.status !== 'running'){
+      Log.warn('Engine is not running, current status:', this.status)
+      return
+    }
    if (this.process.pid) {
-      Log.info('Trying to stop process, PID:', this.process.pid)
+      Log.warn('Trying to kill engine process, PID:', this.process.pid)
+      if(this.client){
+        this.client.destroy()
+        this.client = undefined
+      }
      let cmd = `kill ${this.process.pid}`;
      if (process.platform === "win32") {
        cmd = `taskkill /pid ${this.process.pid} /t /f`
      }
-      exec(cmd, (error) => {
-        if (error) {
-          controlWindow.sendErrorMessage(i18n('engine.shutdown.error') + error)
-          Log.error(`Failed to kill process: ${error}`)
-        }
-      })
+      exec(cmd)
    }
-    else {
-      this.process = undefined;
-      allConfig.controls.engineEnabled = false
-      if(controlWindow.window){
-        allConfig.sendControls(controlWindow.window)
-        controlWindow.window.webContents.send('control.engine.stopped')
-      }
-      this.processStatus = 'stopped'
-      Log.info('Process PID undefined, caption engine process stopped')
-      return
-    }
-    this.processStatus = 'stopping'
-    Log.info('Caption engine process stopping')
+    this.status = 'stopping'
  }
 }

 function handleEngineData(data: any) {
-  if(data.command === 'caption') {
+  if(data.command === 'connect'){
+    captionEngine.connect()
+  }
+  else if(data.command === 'kill') {
+    if(captionEngine.status !== 'stopped') {
+      Log.warn('Error occurred, trying to kill caption engine...')
+      captionEngine.kill()
+    }
+  }
+  else if(data.command === 'caption') {
    allConfig.updateCaptionLog(data);
  }
  else if(data.command === 'print') {
-    Log.info('Engine print:', data.content)
+    Log.info('Engine Print:', data.content)
  }
  else if(data.command === 'info') {
-    Log.info('Engine info:', data.content)
+    Log.info('Engine Info:', data.content)
  }
  else if(data.command === 'usage') {
-    Log.info('Caption engine usage: ', data.content)
+    Log.info('Gummy Engine Usage: ', data.content)
+  }
+  else {
+    Log.warn('Unknown command:', data)
  }
 }
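The `stdout.on('data')` handler above splits each chunk on `'\n'` independently. If a JSON line arrives across two chunks, each fragment is passed to `JSON.parse` on its own and neither parses. A common remedy is to keep the trailing partial line until its newline arrives. The following sketch shows that idea in Python, to match the other examples here; the Electron side would apply the same logic in TypeScript. `LineBuffer` is a hypothetical name.

```python
import json
from typing import Any


class LineBuffer:
    """Hypothetical sketch: turn arbitrary stdout chunks into parsed JSON lines."""

    def __init__(self) -> None:
        self._pending = ""  # trailing text that has not seen its newline yet

    def feed(self, chunk: str) -> list[dict[str, Any]]:
        self._pending += chunk
        # Everything before the last newline is complete; the rest waits for more data.
        *complete, self._pending = self._pending.split("\n")
        # json.loads tolerates the '\r' left over from '\r\n' line endings.
        return [json.loads(line) for line in complete if line.strip()]
```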

--- a/src/main/utils/Log.ts
+++ b/src/main/utils/Log.ts
@@ -3,7 +3,8 @@ function getTimeString() {
  const HH = String(now.getHours()).padStart(2, '0')
  const MM = String(now.getMinutes()).padStart(2, '0')
  const SS = String(now.getSeconds()).padStart(2, '0')
-  return `${HH}:${MM}:${SS}`
+  const MS = String(now.getMilliseconds()).padStart(3, '0')
+  return `${HH}:${MM}:${SS}.${MS}`
 }

 export class Log {
@@ -12,10 +13,10 @@ export class Log {
  }

  static warn(...msg: any[]){
-    console.log(`[WARN ${getTimeString()}]`, ...msg)
+    console.warn(`[WARN ${getTimeString()}]`, ...msg)
  }

  static error(...msg: any[]){
-    console.log(`[ERROR ${getTimeString()}]`, ...msg)
+    console.error(`[ERROR ${getTimeString()}]`, ...msg)
  }
 }
--- a/src/renderer/src/components/CaptionStyle.vue
+++ b/src/renderer/src/components/CaptionStyle.vue
@@ -282,7 +282,8 @@ function applyStyle(){

  captionStyle.sendStylesChange();

-    notification.open({
+  notification.open({
+    placement: 'topLeft',
    message: t('noti.styleChange'),
    description: t('noti.styleInfo')
  });
--- a/src/renderer/src/components/EngineControl.vue
+++ b/src/renderer/src/components/EngineControl.vue
@@ -164,6 +164,7 @@ function applyChange(){
  engineControl.sendControlsChange()

  notification.open({
+    placement: 'topLeft',
    message: t('noti.engineChange'),
    description: t('noti.changeInfo')
  });
--- a/src/renderer/src/components/EngineStatus.vue
+++ b/src/renderer/src/components/EngineStatus.vue
@@ -4,7 +4,7 @@
      <a-col :span="6">
        <a-statistic
          :title="$t('status.engine')"
-          :value="(customized && customizedApp)?$t('status.customized'):engine"
+          :value="customized?$t('status.customized'):engine"
        />
      </a-col>
      <a-popover :title="$t('status.engineStatus')">
@@ -61,12 +61,14 @@
    >{{ $t('status.openCaption') }}</a-button>
    <a-button
      class="control-button"
-      :disabled="engineEnabled"
+      :loading="pending && !engineEnabled"
+      :disabled="pending || engineEnabled"
      @click="startEngine"
    >{{ $t('status.startEngine') }}</a-button>
    <a-button
     danger class="control-button"
-     :disabled="!engineEnabled"
+     :loading="pending && engineEnabled"
+     :disabled="pending || !engineEnabled"
     @click="stopEngine"
    >{{ $t('status.stopEngine') }}</a-button>
  </div>
@@ -119,18 +121,19 @@

 <script setup lang="ts">
 import { EngineInfo } from '@renderer/types'
-import { ref } from 'vue'
+import { ref, watch } from 'vue'
 import { storeToRefs } from 'pinia'
 import { useCaptionLogStore } from '@renderer/stores/captionLog'
 import { useEngineControlStore } from '@renderer/stores/engineControl'
 import { GithubOutlined, InfoCircleOutlined } from '@ant-design/icons-vue';

 const showAbout = ref(false)
+const pending = ref(false)

 const captionLog = useCaptionLogStore()
 const { captionData } = storeToRefs(captionLog)
 const engineControl = useEngineControlStore()
-const { engineEnabled, engine, customized, customizedApp } = storeToRefs(engineControl)
+const { engineEnabled, engine, customized } = storeToRefs(engineControl)

 const pid = ref(0)
 const ppid = ref(0)
@@ -143,6 +146,7 @@ function openCaptionWindow() {
 }

 function startEngine() {
+  pending.value = true
  if(engineControl.engine === 'vosk' && engineControl.modelPath.trim() === '') {
    engineControl.emptyModelPathErr()
    return
@@ -151,6 +155,7 @@ function startEngine() {
 }

 function stopEngine() {
+  pending.value = true
  window.electron.ipcRenderer.send('control.engine.stop')
 }

@@ -164,6 +169,9 @@ function getEngineInfo() {
  })
 }

+watch(engineEnabled, () => {
+  pending.value = false
+})
 </script>

 <style scoped>
--- a/src/renderer/src/stores/engineControl.ts
+++ b/src/renderer/src/stores/engineControl.ts
@@ -64,6 +64,7 @@ export const useEngineControlStore = defineStore('engineControl', () => {

  function emptyModelPathErr() {
    notification.open({
+      placement: 'topLeft',
      message: t('noti.empty'),
      description: t('noti.emptyInfo')
    });
@@ -80,15 +81,17 @@ export const useEngineControlStore = defineStore('engineControl', () => {
      (translation.value ? `${t('noti.tLang')}${targetLang.value}` : '');
    const str1 = `${t('noti.custom')}${customizedApp.value}${t('noti.args')}${customizedCommand.value}`;
    notification.open({
+      placement: 'topLeft',
      message: t('noti.started'),
      description:
-        ((customized.value && customizedApp.value) ? str1 : str0) +
+        (customized.value ? str1 : str0) +
        `${t('noti.pidInfo')}${args}`
    });
  })

  window.electron.ipcRenderer.on('control.engine.stopped', () => {
    notification.open({
+      placement: 'topLeft',
      message: t('noti.stopped'),
      description: t('noti.stoppedInfo')
    });
@@ -99,7 +102,6 @@ export const useEngineControlStore = defineStore('engineControl', () => {
      message: t('noti.error'),
      description: message,
      duration: null,
-      placement: 'topLeft',
      icon: () => h(ExclamationCircleOutlined, { style: 'color: #ff4d4f' })
    });
  })