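feat(engine): improve the caption engine output format and prepare to merge the two caption engines

- Refactor the caption engine code
- Prepare to merge the two caption engines
refactor(engine): refactor the caption engine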
2026-02-26 21:54:43 +08:00 · 2025-07-27 17:15:12 +08:00 · 2025-07-26 23:37:24 +08:00 · 2025-07-26 21:29:16 +08:00 · 2025-07-20 00:32:57 +08:00
43 changed files with 523 additions and 784 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -5,8 +5,8 @@ out
 .eslintcache
 *.log*
 __pycache__
-subenv
-caption-engine/build
-caption-engine/models
-output.wav
 .venv
+subenv
+engine/build
+engine/models
+engine/notebook
--- a/.vscode/settings.json
+++ b/.vscode/settings.json
@@ -9,6 +9,6 @@
    "editor.defaultFormatter": "esbenp.prettier-vscode"
  },
  "python.analysis.extraPaths": [
-    "./caption-engine"
+    "./engine"
  ]
 }
--- a/README.md
+++ b/README.md
@@ -122,10 +122,10 @@ npm install

 ### 构建字幕引擎

-首先进入 `caption-engine` 文件夹，执行如下指令创建虚拟环境：
+首先进入 `engine` 文件夹，执行如下指令创建虚拟环境：

 ```bash
-# in ./caption-engine folder
+# in ./engine folder
 python -m venv subenv
 # or
 python3 -m venv subenv
@@ -173,7 +173,7 @@ vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
 vosk_path = str(Path('./subenv/lib/python3.x/site-packages/vosk').resolve())
 ```

-此时项目构建完成，在进入 `caption-engine/dist` 文件夹可见对应的可执行文件。即可进行后续操作。
+此时项目构建完成，进入 `engine/dist` 文件夹即可看到对应的可执行文件，之后即可进行后续操作。

 ### 运行项目

@@ -183,8 +183,6 @@ npm run dev

 ### 构建项目

-注意目前软件只在 Windows 和 macOS 平台上进行了构建和测试，无法保证软件在 Linux 平台下的正确性。
-
 ```bash
 # For windows
 npm run build:win
@@ -199,13 +197,13 @@ npm run build:linux
 ```yml
 extraResources:
  # For Windows
-  - from: ./caption-engine/dist/main-gummy.exe
-    to: ./caption-engine/main-gummy.exe
-  - from: ./caption-engine/dist/main-vosk.exe
-    to: ./caption-engine/main-vosk.exe
+  - from: ./engine/dist/main-gummy.exe
+    to: ./engine/main-gummy.exe
+  - from: ./engine/dist/main-vosk.exe
+    to: ./engine/main-vosk.exe
  # For macOS and Linux
-  # - from: ./caption-engine/dist/main-gummy
-  #   to: ./caption-engine/main-gummy
-  # - from: ./caption-engine/dist/main-vosk
-  #   to: ./caption-engine/main-vosk
+  # - from: ./engine/dist/main-gummy
+  #   to: ./engine/main-gummy
+  # - from: ./engine/dist/main-vosk
+  #   to: ./engine/main-vosk
 ```
--- a/README_en.md
+++ b/README_en.md
@@ -122,10 +122,10 @@ npm install

 ### Build Subtitle Engine

-First enter the `caption-engine` folder and execute the following commands to create a virtual environment:
+First enter the `engine` folder and execute the following commands to create a virtual environment:

 ```bash
-# in ./caption-engine folder
+# in ./engine folder
 python -m venv subenv
 # or
 python3 -m venv subenv
@@ -173,7 +173,7 @@ vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
 vosk_path = str(Path('./subenv/lib/python3.x/site-packages/vosk').resolve())
 ```

-After the build completes, you can find the executable file in the `caption-engine/dist` folder. Then proceed with subsequent operations.
+After the build completes, you can find the executable file in the `engine/dist` folder. Then proceed with subsequent operations.

 ### Run Project

@@ -183,8 +183,6 @@ npm run dev

 ### Build Project

-Note: Currently the software has only been built and tested on Windows and macOS platforms. Correct operation on Linux platform is not guaranteed.
-
 ```bash
 # For windows
 npm run build:win
@@ -199,13 +197,13 @@ Note: You need to modify the configuration content in the `electron-builder.yml`
 ```yml
 extraResources:
  # For Windows
-  - from: ./caption-engine/dist/main-gummy.exe
-    to: ./caption-engine/main-gummy.exe
-  - from: ./caption-engine/dist/main-vosk.exe
-    to: ./caption-engine/main-vosk.exe
+  - from: ./engine/dist/main-gummy.exe
+    to: ./engine/main-gummy.exe
+  - from: ./engine/dist/main-vosk.exe
+    to: ./engine/main-vosk.exe
  # For macOS and Linux
-  # - from: ./caption-engine/dist/main-gummy
-  #   to: ./caption-engine/main-gummy
-  # - from: ./caption-engine/dist/main-vosk
-  #   to: ./caption-engine/main-vosk
+  # - from: ./engine/dist/main-gummy
+  #   to: ./engine/main-gummy
+  # - from: ./engine/dist/main-vosk
+  #   to: ./engine/main-vosk
 ```
--- a/README_ja.md
+++ b/README_ja.md
@@ -122,10 +122,10 @@ npm install

 ### 字幕エンジンの構築

-まず `caption-engine` フォルダに入り、以下のコマンドを実行して仮想環境を作成します：
+まず `engine` フォルダに入り、以下のコマンドを実行して仮想環境を作成します：

 ```bash
-# ./caption-engine フォルダ内
+# ./engine フォルダ内
 python -m venv subenv
 # または
 python3 -m venv subenv
@@ -173,7 +173,7 @@ vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
 vosk_path = str(Path('./subenv/lib/python3.x/site-packages/vosk').resolve())
 ```

-これでプロジェクトのビルドが完了し、`caption-engine/dist` フォルダ内に対応する実行可能ファイルが確認できます。その後、次の操作に進むことができます。
+これでプロジェクトのビルドが完了し、`engine/dist` フォルダ内に対応する実行可能ファイルが確認できます。その後、次の操作に進むことができます。

 ### プロジェクト実行

@@ -183,8 +183,6 @@ npm run dev

 ### プロジェクト構築

-現在、ソフトウェアは Windows と macOS プラットフォームでのみ構築とテストが行われており、Linux プラットフォームでの正しい動作は保証できません。
-
 ```bash
 # Windows 用
 npm run build:win
@@ -199,13 +197,13 @@ npm run build:linux
 ```yml
 extraResources:
  # Windows用
-  - from: ./caption-engine/dist/main-gummy.exe
-    to: ./caption-engine/main-gummy.exe
-  - from: ./caption-engine/dist/main-vosk.exe
-    to: ./caption-engine/main-vosk.exe
+  - from: ./engine/dist/main-gummy.exe
+    to: ./engine/main-gummy.exe
+  - from: ./engine/dist/main-vosk.exe
+    to: ./engine/main-vosk.exe
  # macOSとLinux用
-  # - from: ./caption-engine/dist/main-gummy
-  #   to: ./caption-engine/main-gummy
-  # - from: ./caption-engine/dist/main-vosk
-  #   to: ./caption-engine/main-vosk
+  # - from: ./engine/dist/main-gummy
+  #   to: ./engine/main-gummy
+  # - from: ./engine/dist/main-vosk
+  #   to: ./engine/main-vosk
 ```
--- a/caption-engine/audioprcs/__init__.py
+++ b/caption-engine/audioprcs/__init__.py
@@ -1 +0,0 @@
-from .process import mergeChunkChannels, resampleRawChunk, resampleMonoChunk
--- a/caption-engine/sysaudio/__init__.py
+++ b/caption-engine/sysaudio/__init__.py
--- a/docs/TODO.md
+++ b/docs/TODO.md
@@ -15,10 +15,12 @@
 - [x] 可以调整字幕时间轴 *2025/07/14*
 - [x] 可以导出 srt 格式的字幕记录 *2025/07/14*
 - [x] 可以获取字幕引擎的系统资源消耗情况 *2025/07/15*
+- [x] 添加字幕记录按时间降序排列选择 *2025/07/26*

 ## 待完成

- [ ] 探索更多的语音转文字模型
+- [ ] 重构字幕引擎
+- [ ] 验证 / 添加基于 sherpa-onnx 的字幕引擎

 ## 后续计划

--- a/docs/api-docs/caption-engine.md
+++ b/docs/api-docs/caption-engine.md
@@ -0,0 +1,62 @@
+# caption engine api-doc
+
+本文档主要介绍 Electron 主进程和字幕引擎进程的通信约定。
+
+## 原理说明
+
+本项目的 Python 进程通过标准输出向 Electron 主进程发送数据。
+
+Python 进程标准输出 (`sys.stdout`) 的内容一定为一行一行的字符串。且每行字符串均可以解释为一个 JSON 对象。每个 JSON 对象一定有 `command` 参数。
+
+## 输出约定
+
+当 JSON 对象的 `command` 参数为下列值时，表示的含义如下：
+
+### `print`
+
+```js
+{
+  command: "print",
+  content: string
+}
+```
+
+输出 Python 端打印的内容。
+
+### `info`
+
+```js
+{
+  command: "info",
+  content: string
+}
+```
+
+Python 端打印的提示信息，相比 `print`，该信息更需要 Electron 端关注。
+
+### `usage`
+
+```js
+{
+  command: "usage",
+  content: string
+}
+```
+
+Gummy 字幕引擎结束时打印计费消耗信息。
+
+
+### `caption`
+
+```js
+{
+  command: "caption",
+  index: number,
+  time_s: string,
+  time_t: string,
+  text: string,
+  translation: string
+}
+```
+
+Python 端将监听到的音频流转换得到的字幕数据。
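
Review note: the line protocol defined in `docs/api-docs/caption-engine.md` (one JSON object per stdout line, each carrying a `command` field) could be sketched in Python roughly as follows. The diff imports `stdout_cmd` and `stdout_obj` from `utils`, but their bodies are not part of this section, so the implementations here are assumptions modeled on the removed `send_to_node` method. `parse_line` is a hypothetical reader-side check added only for illustration.

```python
import json
import sys
from typing import Any


def stdout_obj(obj: dict[str, Any]) -> None:
    """Assumed behavior: write one JSON object per line, then flush."""
    sys.stdout.write(json.dumps(obj) + "\n")
    sys.stdout.flush()


def stdout_cmd(command: str, content: str = "") -> None:
    """Assumed behavior: emit a {command, content} message such as print/info/usage."""
    stdout_obj({"command": command, "content": content})


def parse_line(line: str) -> dict[str, Any]:
    """Hypothetical reader-side check: each line must be a JSON object with `command`."""
    obj = json.loads(line)
    if not isinstance(obj, dict) or "command" not in obj:
        raise ValueError("expected a JSON object with a 'command' field")
    return obj
```

Flushing after every line keeps the parent process from waiting on a partially filled buffer, which is the same reason the engine manual calls `sys.stdout.reconfigure(line_buffering=True)`.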
--- a/docs/engine-manual/en.md
+++ b/docs/engine-manual/en.md
@@ -2,6 +2,8 @@

 Corresponding Version: v0.5.1

+**Note: Due to limited personal resources, the English and Japanese documentation files for this project (except for the README document) will no longer be maintained. The content of this document may not be consistent with the latest version of the project. If you are willing to help with translation, please submit relevant Pull Requests.**
+
 ![](../../assets/media/structure_en.png)

 ## Introduction to the Caption Engine
@@ -20,7 +22,7 @@ Generally, the captured audio stream data consists of short audio chunks, and th

 The acquired audio stream may need preprocessing before being converted to text. For instance, Alibaba Cloud's Gummy model can only recognize single-channel audio streams, while the collected audio streams are typically dual-channel, thus requiring conversion from dual-channel to single-channel. Channel conversion can be achieved using methods in the NumPy library.

-You can directly use the audio acquisition (`caption-engine/sysaudio`) and audio processing (`caption-engine/audioprcs`) modules I have developed.
+You can directly use the audio acquisition (`engine/sysaudio`) and audio processing (`engine/audioprcs`) modules I have developed.

 ### Audio to Text Conversion

@@ -105,10 +107,10 @@ export interface CaptionItem {
 If using Python, you can refer to the following method to pass data to the main program:

 ```python
-# caption-engine\main-gummy.py
+# engine\main-gummy.py
 sys.stdout.reconfigure(line_buffering=True)

-# caption-engine\audio2text\gummy.py
+# engine\audio2text\gummy.py
 ...
    def send_to_node(self, data):
        """
@@ -198,4 +200,4 @@ With a working caption engine, specify its path and runtime parameters in the ca

 ## Reference Code

-The `main-gummy.py` file under the `caption-engine` folder in this project serves as the entry point for the default caption engine. The `src\main\utils\engine.ts` file contains the server-side code for acquiring and processing data from the caption engine. You can read and understand the implementation details and the complete execution process of the caption engine as needed.
+The `main-gummy.py` file under the `engine` folder in this project serves as the entry point for the default caption engine. The `src\main\utils\engine.ts` file contains the server-side code for acquiring and processing data from the caption engine. You can read and understand the implementation details and the complete execution process of the caption engine as needed.
--- a/docs/engine-manual/ja.md
+++ b/docs/engine-manual/ja.md
@@ -4,6 +4,8 @@

 この文書は大規模モデルを使用して翻訳されていますので、内容に正確でない部分があるかもしれません。

+**注意：個人のリソースが限られているため、このプロジェクトの英語および日本語のドキュメント（README ドキュメントを除く）のメンテナンスは行われません。このドキュメントの内容は最新版のプロジェクトと一致しない場合があります。翻訳のお手伝いをしていただける場合は、関連するプルリクエストを提出してください。**
+
 ![](../../assets/media/structure_ja.png)

 ## 字幕エンジンの紹介
@@ -22,7 +24,7 @@

 取得した音声ストリームは、テキストに変換する前に前処理が必要な場合があります。例えば、アリババクラウドのGummyモデルは単一チャンネルの音声ストリームしか認識できませんが、収集された音声ストリームは通常二重チャンネルであるため、二重チャンネルの音声ストリームを単一チャンネルに変換する必要があります。チャンネル数の変換はNumPyライブラリのメソッドを使って行うことができます。

-あなたは私によって開発された音声の取得（`caption-engine/sysaudio`）と音声の処理（`caption-engine/audioprcs`）モジュールを直接使用することができます。
+あなたは私によって開発された音声の取得（`engine/sysaudio`）と音声の処理（`engine/audioprcs`）モジュールを直接使用することができます。

 ### 音声からテキストへの変換

@@ -107,10 +109,10 @@ export interface CaptionItem {
 Python言語を使用する場合、以下の方法でデータをメインプログラムに渡すことができます：

 ```python
-# caption-engine\main-gummy.py
+# engine\main-gummy.py
 sys.stdout.reconfigure(line_buffering=True)

-# caption-engine\audio2text\gummy.py
+# engine\audio2text\gummy.py
 ...
    def send_to_node(self, data):
        """
@@ -198,4 +200,4 @@ python main-gummy.py -s ja -t zh -a 0 -c 10 -k <your-api-key>

 ## 参考コード

-本プロジェクトの`caption-engine`フォルダにある`main-gummy.py`ファイルはデフォルトの字幕エンジンのエントリーコードです。`src\main\utils\engine.ts`はサーバー側で字幕エンジンのデータを取得・処理するコードです。必要に応じて字幕エンジンの実装詳細と完全な実行プロセスを理解するために参照してください。
+本プロジェクトの`engine`フォルダにある`main-gummy.py`ファイルはデフォルトの字幕エンジンのエントリーコードです。`src\main\utils\engine.ts`はサーバー側で字幕エンジンのデータを取得・処理するコードです。必要に応じて字幕エンジンの実装詳細と完全な実行プロセスを理解するために参照してください。
--- a/docs/engine-manual/zh.md
+++ b/docs/engine-manual/zh.md
@@ -20,7 +20,7 @@

 获取到的音频流在转文字之前可能需要进行预处理。比如阿里云的 Gummy 模型只能识别单通道的音频流，而收集的音频流一般是双通道的，因此要将双通道音频流转换为单通道。通道数的转换可以使用 NumPy 库中的方法实现。

-你可以直接使用我开发好的音频获取（`caption-engine/sysaudio`）和音频处理（`caption-engine/audioprcs`）模块。
+你可以直接使用我开发好的音频获取（`engine/sysaudio`）和音频处理（`engine/audioprcs`）模块。

 ### 音频转文字

@@ -105,10 +105,10 @@ export interface CaptionItem {
 如果使用 python 语言，可以参考以下方式将数据传递给主程序：

 ```python
-# caption-engine\main-gummy.py
+# engine\main-gummy.py
 sys.stdout.reconfigure(line_buffering=True)

-# caption-engine\audio2text\gummy.py
+# engine\audio2text\gummy.py
 ...
    def send_to_node(self, data):
        """
@@ -198,4 +198,4 @@ python main-gummy.py -s ja -t zh -a 0 -c 10 -k <your-api-key>

 ## 参考代码

-本项目 `caption-engine` 文件夹下的 `main-gummy.py` 文件为默认字幕引擎的入口代码。`src\main\utils\engine.ts` 为服务端获取字幕引擎数据和进行处理的代码。可以根据需要阅读了解字幕引擎的实现细节和完整运行过程。
+本项目 `engine` 文件夹下的 `main-gummy.py` 文件为默认字幕引擎的入口代码。`src\main\utils\engine.ts` 为服务端获取字幕引擎数据和进行处理的代码。可以根据需要阅读了解字幕引擎的实现细节和完整运行过程。
--- a/docs/user-manual/en.md
+++ b/docs/user-manual/en.md
@@ -2,6 +2,8 @@

 Corresponding Version: v0.5.1

+**Note: Due to limited personal resources, the English and Japanese documentation files for this project (except for the README document) will no longer be maintained. The content of this document may not be consistent with the latest version of the project. If you are willing to help with translation, please submit relevant Pull Requests.**
+
 ## Software Introduction

 Auto Caption is a cross-platform caption display software that can real-time capture system audio input (recording) or output (playback) streaming data and use an audio-to-text model to generate captions for the corresponding audio. The default caption engine provided by the software (using Alibaba Cloud Gummy model) supports recognition and translation in nine languages (Chinese, English, Japanese, Korean, German, French, Russian, Spanish, Italian).
--- a/docs/user-manual/ja.md
+++ b/docs/user-manual/ja.md
@@ -4,6 +4,8 @@

 この文書は大規模モデルを使用して翻訳されていますので、内容に正確でない部分があるかもしれません。

+**注意：個人のリソースが限られているため、このプロジェクトの英語および日本語のドキュメント（README ドキュメントを除く）のメンテナンスは行われません。このドキュメントの内容は最新版のプロジェクトと一致しない場合があります。翻訳のお手伝いをしていただける場合は、関連するプルリクエストを提出してください。**
+
 ## ソフトウェアの概要

 Auto Caption は、クロスプラットフォームの字幕表示ソフトウェアで、システムの音声入力（録音）または出力（音声再生）のストリーミングデータをリアルタイムで取得し、音声からテキストに変換するモデルを利用して対応する音声の字幕を生成します。このソフトウェアが提供するデフォルトの字幕エンジン（アリババクラウド Gummy モデルを使用）は、9つの言語（中国語、英語、日本語、韓国語、ドイツ語、フランス語、ロシア語、スペイン語、イタリア語）の認識と翻訳をサポートしています。
--- a/electron-builder.yml
+++ b/electron-builder.yml
@@ -10,21 +10,21 @@ files:
  - '!{LICENSE,README.md,README_en.md,README_ja.md}'
  - '!{.env,.env.*,.npmrc,pnpm-lock.yaml}'
  - '!{tsconfig.json,tsconfig.node.json,tsconfig.web.json}'
-  - '!caption-engine/*'
+  - '!engine/*'
  - '!engine-test/*'
  - '!docs/*'
  - '!assets/*'
 extraResources:
  # For Windows
-  - from: ./caption-engine/dist/main-gummy.exe
-    to: ./caption-engine/main-gummy.exe
-  - from: ./caption-engine/dist/main-vosk.exe
-    to: ./caption-engine/main-vosk.exe
+  - from: ./engine/dist/main-gummy.exe
+    to: ./engine/main-gummy.exe
+  - from: ./engine/dist/main-vosk.exe
+    to: ./engine/main-vosk.exe
  # For macOS and Linux
-  # - from: ./caption-engine/dist/main-gummy
-  #   to: ./caption-engine/main-gummy
-  # - from: ./caption-engine/dist/main-vosk
-  #   to: ./caption-engine/main-vosk
+  # - from: ./engine/dist/main-gummy
+  #   to: ./engine/main-gummy
+  # - from: ./engine/dist/main-vosk
+  #   to: ./engine/main-vosk
 win:
  executableName: auto-caption
  icon: build/icon.png
--- a/engine-test/gummy.ipynb
+++ b/engine-test/gummy.ipynb
@@ -1,221 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from dashscope.audio.asr import * # type: ignore\n",
-    "import pyaudiowpatch as pyaudio\n",
-    "import numpy as np\n",
-    "\n",
-    "\n",
-    "def getDefaultSpeakers(mic: pyaudio.PyAudio, info = True):\n",
-    "    \"\"\"\n",
-    "    获取默认的系统音频输出的回环设备\n",
-    "    Args:\n",
-    "        mic (pyaudio.PyAudio): pyaudio对象\n",
-    "        info (bool, optional): 是否打印设备信息. Defaults to True.\n",
-    "\n",
-    "    Returns:\n",
-    "        dict: 统音频输出的回环设备\n",
-    "    \"\"\"\n",
-    "    try:\n",
-    "        WASAPI_info = mic.get_host_api_info_by_type(pyaudio.paWASAPI)\n",
-    "    except OSError:\n",
-    "        print(\"Looks like WASAPI is not available on the system. Exiting...\")\n",
-    "        exit()\n",
-    "\n",
-    "    default_speaker = mic.get_device_info_by_index(WASAPI_info[\"defaultOutputDevice\"])\n",
-    "    if(info): print(\"wasapi_info:\\n\", WASAPI_info, \"\\n\")\n",
-    "    if(info): print(\"default_speaker:\\n\", default_speaker, \"\\n\")\n",
-    "\n",
-    "    if not default_speaker[\"isLoopbackDevice\"]:\n",
-    "        for loopback in mic.get_loopback_device_info_generator():\n",
-    "            if default_speaker[\"name\"] in loopback[\"name\"]:\n",
-    "                default_speaker = loopback\n",
-    "                if(info): print(\"Using loopback device:\\n\", default_speaker, \"\\n\")\n",
-    "                break\n",
-    "        else:\n",
-    "            print(\"Default loopback output device not found.\")\n",
-    "            print(\"Run `python -m pyaudiowpatch` to check available devices.\")\n",
-    "            print(\"Exiting...\")\n",
-    "            exit()\n",
-    "            \n",
-    "    if(info): print(f\"Recording Device: #{default_speaker['index']} {default_speaker['name']}\")\n",
-    "    return default_speaker\n",
-    "\n",
-    "\n",
-    "class Callback(TranslationRecognizerCallback):\n",
-    "    \"\"\"\n",
-    "    语音大模型流式传输回调对象\n",
-    "    \"\"\"\n",
-    "    def __init__(self):\n",
-    "        super().__init__()\n",
-    "        self.usage = 0\n",
-    "        self.sentences = []\n",
-    "        self.translations = []\n",
-    "    \n",
-    "    def on_open(self) -> None:\n",
-    "        print(\"\\n流式翻译开始...\\n\")\n",
-    "\n",
-    "    def on_close(self) -> None:\n",
-    "        print(f\"\\nTokens消耗：{self.usage}\")\n",
-    "        print(f\"流式翻译结束...\\n\")\n",
-    "        for i in range(len(self.sentences)):\n",
-    "            print(f\"\\n{self.sentences[i]}\\n{self.translations[i]}\\n\")\n",
-    "\n",
-    "    def on_event(\n",
-    "        self,\n",
-    "        request_id,\n",
-    "        transcription_result: TranscriptionResult,\n",
-    "        translation_result: TranslationResult,\n",
-    "        usage\n",
-    "    ) -> None:\n",
-    "        if transcription_result is not None:\n",
-    "            id = transcription_result.sentence_id\n",
-    "            text = transcription_result.text\n",
-    "            if transcription_result.stash is not None:\n",
-    "                stash = transcription_result.stash.text\n",
-    "            else:\n",
-    "                stash = \"\"\n",
-    "            print(f\"#{id}: {text}{stash}\")\n",
-    "            if usage: self.sentences.append(text)\n",
-    "        \n",
-    "        if translation_result is not None:\n",
-    "            lang = translation_result.get_language_list()[0]\n",
-    "            text = translation_result.get_translation(lang).text\n",
-    "            if translation_result.get_translation(lang).stash is not None:\n",
-    "                stash = translation_result.get_translation(lang).stash.text\n",
-    "            else:\n",
-    "                stash = \"\"\n",
-    "            print(f\"#{lang}: {text}{stash}\")\n",
-    "            if usage: self.translations.append(text)\n",
-    "        \n",
-    "        if usage: self.usage += usage['duration']"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\n",
-      "采样输入设备：\n",
-      "    - 序号：26\n",
-      "    - 名称：耳机 (HUAWEI FreeLace 活力版) [Loopback]\n",
-      "    - 最大输入通道数：2\n",
-      "    - 默认低输入延迟：0.003s\n",
-      "    - 默认高输入延迟：0.01s\n",
-      "    - 默认采样率：48000.0Hz\n",
-      "    - 是否回环设备：True\n",
-      "\n",
-      "音频样本块大小：4800\n",
-      "样本位宽：2\n",
-      "音频数据格式：8\n",
-      "音频通道数：2\n",
-      "音频采样率：48000\n",
-      "\n"
-     ]
-    }
-   ],
-   "source": [
-    "mic = pyaudio.PyAudio()\n",
-    "default_speaker = getDefaultSpeakers(mic, False)\n",
-    "\n",
-    "SAMP_WIDTH = pyaudio.get_sample_size(pyaudio.paInt16)\n",
-    "FORMAT = pyaudio.paInt16\n",
-    "CHANNELS = default_speaker[\"maxInputChannels\"]\n",
-    "RATE = int(default_speaker[\"defaultSampleRate\"])\n",
-    "CHUNK = RATE // 10\n",
-    "INDEX = default_speaker[\"index\"]\n",
-    "\n",
-    "dev_info = f\"\"\"\n",
-    "采样输入设备：\n",
-    "    - 序号：{default_speaker['index']}\n",
-    "    - 名称：{default_speaker['name']}\n",
-    "    - 最大输入通道数：{default_speaker['maxInputChannels']}\n",
-    "    - 默认低输入延迟：{default_speaker['defaultLowInputLatency']}s\n",
-    "    - 默认高输入延迟：{default_speaker['defaultHighInputLatency']}s\n",
-    "    - 默认采样率：{default_speaker['defaultSampleRate']}Hz\n",
-    "    - 是否回环设备：{default_speaker['isLoopbackDevice']}\n",
-    "\n",
-    "音频样本块大小：{CHUNK}\n",
-    "样本位宽：{SAMP_WIDTH}\n",
-    "音频数据格式：{FORMAT}\n",
-    "音频通道数：{CHANNELS}\n",
-    "音频采样率：{RATE}\n",
-    "\"\"\"\n",
-    "print(dev_info)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "RECORD_SECONDS = 20 # 监听时长(s)\n",
-    "\n",
-    "stream = mic.open(\n",
-    "    format = FORMAT,\n",
-    "    channels = CHANNELS,\n",
-    "    rate = RATE,\n",
-    "    input = True,\n",
-    "    input_device_index = INDEX\n",
-    ")\n",
-    "translator = TranslationRecognizerRealtime(\n",
-    "    model = \"gummy-realtime-v1\",\n",
-    "    format = \"pcm\",\n",
-    "    sample_rate = RATE,\n",
-    "    transcription_enabled = True,\n",
-    "    translation_enabled = True,\n",
-    "    source_language = \"ja\",\n",
-    "    translation_target_languages = [\"zh\"],\n",
-    "    callback = Callback()\n",
-    ")\n",
-    "translator.start()\n",
-    "\n",
-    "for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):\n",
-    "    data = stream.read(CHUNK)\n",
-    "    data_np = np.frombuffer(data, dtype=np.int16)\n",
-    "    data_np_r = data_np.reshape(-1, CHANNELS)\n",
-    "    print(data_np_r.shape)\n",
-    "    mono_data = np.mean(data_np_r.astype(np.float32), axis=1)\n",
-    "    mono_data = mono_data.astype(np.int16)\n",
-    "    mono_data_bytes = mono_data.tobytes()\n",
-    "    translator.send_audio_frame(mono_data_bytes)\n",
-    "\n",
-    "translator.stop()\n",
-    "stream.stop_stream()\n",
-    "stream.close()"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "mystd",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.10.12"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
--- a/engine-test/resample.ipynb
+++ b/engine-test/resample.ipynb
@@ -1,189 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "id": "1e12f3ef",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\n",
-      "        采样输入设备：\n",
-      "            - 设备类型：音频输出\n",
-      "            - 序号：0\n",
-      "            - 名称：BlackHole 2ch\n",
-      "            - 最大输入通道数：2\n",
-      "            - 默认低输入延迟：0.01s\n",
-      "            - 默认高输入延迟：0.1s\n",
-      "            - 默认采样率：48000.0Hz\n",
-      "\n",
-      "        音频样本块大小：2400\n",
-      "        样本位宽：2\n",
-      "        采样格式：8\n",
-      "        音频通道数：2\n",
-      "        音频采样率：48000\n",
-      "        \n"
-     ]
-    }
-   ],
-   "source": [
-    "import sys\n",
-    "import os\n",
-    "import wave\n",
-    "\n",
-    "current_dir = os.getcwd() \n",
-    "sys.path.append(os.path.join(current_dir, '../caption-engine'))\n",
-    "\n",
-    "from sysaudio.darwin import AudioStream\n",
-    "from audioprcs import resampleRawChunk, mergeChunkChannels\n",
-    "\n",
-    "stream = AudioStream(0)\n",
-    "stream.printInfo()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "id": "a72914f4",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Recording...\n",
-      "Done\n"
-     ]
-    }
-   ],
-   "source": [
-    "\"\"\"获取系统音频输出5秒，然后保存为wav文件\"\"\"\n",
-    "\n",
-    "with wave.open('output.wav', 'wb') as wf:\n",
-    "    wf.setnchannels(stream.CHANNELS)\n",
-    "    wf.setsampwidth(stream.SAMP_WIDTH)\n",
-    "    wf.setframerate(stream.RATE)\n",
-    "    stream.openStream()\n",
-    "\n",
-    "    print('Recording...')\n",
-    "\n",
-    "    for _ in range(0, 100):\n",
-    "        chunk = stream.read_chunk()\n",
-    "        if isinstance(chunk, bytes):\n",
-    "            wf.writeframes(chunk)\n",
-    "        else:\n",
-    "            raise Exception('Error: chunk is not bytes')\n",
-    "        \n",
-    "    stream.closeStream()    \n",
-    "    print('Done')"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "id": "a6e8a098",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Recording...\n",
-      "Done\n"
-     ]
-    }
-   ],
-   "source": [
-    "\"\"\"获取系统音频输入，转换为单通道音频，持续5秒，然后保存为wav文件\"\"\"\n",
-    "\n",
-    "with wave.open('output.wav', 'wb') as wf:\n",
-    "    wf.setnchannels(1)\n",
-    "    wf.setsampwidth(stream.SAMP_WIDTH)\n",
-    "    wf.setframerate(stream.RATE)\n",
-    "    stream.openStream()\n",
-    "\n",
-    "    print('Recording...')\n",
-    "\n",
-    "    for _ in range(0, 100):\n",
-    "        chunk = mergeChunkChannels(\n",
-    "            stream.read_chunk(),\n",
-    "            stream.CHANNELS\n",
-    "        )\n",
-    "        if isinstance(chunk, bytes):\n",
-    "            wf.writeframes(chunk)\n",
-    "        else:\n",
-    "            raise Exception('Error: chunk is not bytes')\n",
-    "        \n",
-    "    stream.closeStream()    \n",
-    "    print('Done')"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "id": "aaca1465",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Recording...\n",
-      "Done\n"
-     ]
-    }
-   ],
-   "source": [
-    "\"\"\"获取系统音频输入，转换为单通道音频并重采样到16000Hz，持续5秒，然后保存为wav文件\"\"\"\n",
-    "\n",
-    "with wave.open('output.wav', 'wb') as wf:\n",
-    "    wf.setnchannels(1)\n",
-    "    wf.setsampwidth(stream.SAMP_WIDTH)\n",
-    "    wf.setframerate(16000)\n",
-    "    stream.openStream()\n",
-    "\n",
-    "    print('Recording...')\n",
-    "\n",
-    "    for _ in range(0, 100):\n",
-    "        chunk = resampleRawChunk(\n",
-    "            stream.read_chunk(),\n",
-    "            stream.CHANNELS,\n",
-    "            stream.RATE,\n",
-    "            16000,\n",
-    "            mode=\"sinc_best\"\n",
-    "        )\n",
-    "        if isinstance(chunk, bytes):\n",
-    "            wf.writeframes(chunk)\n",
-    "        else:\n",
-    "            raise Exception('Error: chunk is not bytes')\n",
-    "        \n",
-    "    stream.closeStream()    \n",
-    "    print('Done')"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": ".venv",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.9.6"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
--- a/engine-test/vosk.ipynb
+++ b/engine-test/vosk.ipynb
@@ -1,124 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "id": "6fb12704",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "d:\\Projects\\auto-caption\\caption-engine\\subenv\\Lib\\site-packages\\vosk\\__init__.py\n"
-     ]
-    }
-   ],
-   "source": [
-    "import vosk\n",
-    "print(vosk.__file__)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "id": "63a06f5c",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\n",
-      "        采样设备：\n",
-      "            - 设备类型：音频输入\n",
-      "            - 序号：1\n",
-      "            - 名称：麦克风阵列 (Realtek(R) Audio)\n",
-      "            - 最大输入通道数：2\n",
-      "            - 默认低输入延迟：0.09s\n",
-      "            - 默认高输入延迟：0.18s\n",
-      "            - 默认采样率：44100.0Hz\n",
-      "            - 是否回环设备：False\n",
-      "\n",
-      "        音频样本块大小：2205\n",
-      "        样本位宽：2\n",
-      "        采样格式：8\n",
-      "        音频通道数：2\n",
-      "        音频采样率：44100\n",
-      "        \n"
-     ]
-    }
-   ],
-   "source": [
-    "import sys\n",
-    "import os\n",
-    "import json\n",
-    "from vosk import Model, KaldiRecognizer\n",
-    "\n",
-    "current_dir = os.getcwd() \n",
-    "sys.path.append(os.path.join(current_dir, '../caption-engine'))\n",
-    "\n",
-    "from sysaudio.win import AudioStream\n",
-    "from audioprcs import resampleRawChunk, mergeChunkChannels\n",
-    "\n",
-    "stream = AudioStream(1)\n",
-    "stream.printInfo()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 12,
-   "id": "5d5a0afa",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "model = Model(os.path.join(\n",
-    "    current_dir,\n",
-    "    '../caption-engine/models/vosk-model-small-cn-0.22'\n",
-    "))\n",
-    "recognizer = KaldiRecognizer(model, 16000)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "7e9d1530",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "stream.openStream()\n",
-    "\n",
-    "for i in range(200):\n",
-    "    chunk = stream.read_chunk()\n",
-    "    chunk_mono = resampleRawChunk(chunk, stream.CHANNELS, stream.RATE, 16000)\n",
-    "    if recognizer.AcceptWaveform(chunk_mono):\n",
-    "        result = json.loads(recognizer.Result())\n",
-    "        print(\"acc:\", result.get(\"text\", \"\"))\n",
-    "    else:\n",
-    "        partial = json.loads(recognizer.PartialResult())\n",
-    "        print(\"else:\", partial.get(\"partial\", \"\"))"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "subenv",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.12.1"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
--- a/caption-engine/audio2text/__init__.py
+++ b/caption-engine/audio2text/__init__.py
--- a/caption-engine/audio2text/gummy.py
+++ b/caption-engine/audio2text/gummy.py
@@ -6,8 +6,8 @@ from dashscope.audio.asr import (
 )
 import dashscope
 from datetime import datetime
-import json
-import sys
+from utils import stdout_cmd, stdout_obj
+

 class Callback(TranslationRecognizerCallback):
    """
@@ -15,17 +15,20 @@ class Callback(TranslationRecognizerCallback):
    """
    def __init__(self):
        super().__init__()
+        self.index = 0
        self.usage = 0
        self.cur_id = -1
        self.time_str = ''

    def on_open(self) -> None:
-        # print("on_open")
-        pass
+        self.usage = 0
+        self.cur_id = -1
+        self.time_str = ''
+        stdout_cmd('info', 'Gummy translator started.')

    def on_close(self) -> None:
-        # print("on_close")
-        pass
+        stdout_cmd('info', 'Gummy translator closed.')
+        stdout_cmd('usage', str(self.usage))

    def on_event(
        self,
@@ -35,17 +38,17 @@ class Callback(TranslationRecognizerCallback):
        usage
    ) -> None:
        caption = {}
+
        if transcription_result is not None:
-            caption['index'] = transcription_result.sentence_id
-            caption['text'] = transcription_result.text
-            if caption['index'] != self.cur_id:
-                self.cur_id = caption['index']
-                cur_time = datetime.now().strftime('%H:%M:%S.%f')[:-3]
-                caption['time_s'] = cur_time
-                self.time_str = cur_time
-            else:
-                caption['time_s'] = self.time_str
+            if self.cur_id != transcription_result.sentence_id:
+                self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+                self.cur_id = transcription_result.sentence_id
+                self.index += 1
+            caption['command'] = 'caption'
+            caption['index'] = self.index
+            caption['time_s'] = self.time_str
            caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+            caption['text'] = transcription_result.text
            caption['translation'] = ""

        if translation_result is not None:
@@ -55,19 +58,9 @@ class Callback(TranslationRecognizerCallback):
        if usage:
            self.usage += usage['duration']

-        # print(caption)
-        self.send_to_node(caption)
+        if 'text' in caption:
+            stdout_obj(caption)

-    def send_to_node(self, data):
-        """
-        将数据发送到 Node.js 进程
-        """
-        try:
-            json_data = json.dumps(data) + '\n'
-            sys.stdout.write(json_data)
-            sys.stdout.flush()
-        except Exception as e:
-            print(f"Error sending data to Node.js: {e}", file=sys.stderr)

 class GummyTranslator:
    """
@@ -78,7 +71,7 @@ class GummyTranslator:
        source: 源语言代码字符串（zh, en, ja 等）
        target: 目标语言代码字符串（zh, en, ja 等）
    """
-    def __init__(self, rate, source, target, api_key):
+    def __init__(self, rate: int, source: str, target: str | None, api_key: str | None):
        if api_key:
            dashscope.api_key = api_key
        self.translator = TranslationRecognizerRealtime(
@@ -97,7 +90,7 @@ class GummyTranslator:
        self.translator.start()

    def send_audio_frame(self, data):
-        """发送音频帧"""
+        """Send an audio frame; the engine recognizes it and writes the result to standard output."""
        self.translator.send_audio_frame(data)

    def stop(self):
--- a/engine/audio2text/vosk.py
+++ b/engine/audio2text/vosk.py
@@ -0,0 +1,59 @@
+import json
+from datetime import datetime
+
+from vosk import Model, KaldiRecognizer, SetLogLevel
+from utils import stdout_obj
+
+class VoskRecognizer:
+    """
+    Stream audio data through the Vosk engine and write JSON strings readable by Auto Caption to standard output.
+
+    Initialization parameters:
+        model_path: path to the Vosk recognition model
+    """
+    def __init__(self, model_path: str):
+        SetLogLevel(-1)
+        if model_path.startswith('"'):
+            model_path = model_path[1:]
+        if model_path.endswith('"'):
+            model_path = model_path[:-1]
+        self.model_path = model_path
+        self.time_str = ''
+        self.cur_id = 0
+        self.prev_content = ''
+
+        self.model = Model(self.model_path)
+        self.recognizer = KaldiRecognizer(self.model, 16000)
+
+    def send_audio_frame(self, data: bytes):
+        """
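+        Send an audio frame to the Vosk engine; the engine recognizes it and writes the result to standard output.
+
+        Args:
+            data: audio frame data; the sample rate must be 16000 Hz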
+        """
+        caption = {}
+        caption['command'] = 'caption'
+        caption['translation'] = ''
+
+        if self.recognizer.AcceptWaveform(data):
+            content = json.loads(self.recognizer.Result()).get('text', '')
+            caption['index'] = self.cur_id
+            caption['text'] = content
+            caption['time_s'] = self.time_str
+            caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+            self.prev_content = ''
+            self.cur_id += 1
+        else:
+            content = json.loads(self.recognizer.PartialResult()).get('partial', '')
+            if content == '' or content == self.prev_content:
+                return
+            if self.prev_content == '':
+                self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+            caption['index'] = self.cur_id
+            caption['text'] = content
+            caption['time_s'] = self.time_str
+            caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+            self.prev_content = content
+
+        stdout_obj(caption)
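The partial/final bookkeeping in `send_audio_frame` can be hard to follow inside a diff. Below is a minimal sketch of the same index logic with the Vosk calls and timestamps left out. `CaptionTracker` and `update` are hypothetical names used only for illustration, and the `is_final` flag stands in for the return value of `AcceptWaveform`.

```python
class CaptionTracker:
    """Hypothetical helper mirroring the cur_id / prev_content bookkeeping."""

    def __init__(self):
        self.cur_id = 0
        self.prev_content = ""

    def update(self, is_final: bool, content: str):
        """Return (index, text) to emit, or None when nothing new should be sent."""
        if is_final:
            # A final result closes the current sentence and advances the index.
            emitted = (self.cur_id, content)
            self.prev_content = ""
            self.cur_id += 1
            return emitted
        # Empty or repeated partial results are dropped.
        if content == "" or content == self.prev_content:
            return None
        self.prev_content = content
        return (self.cur_id, content)


tracker = CaptionTracker()
events = [
    (False, "hello"),
    (False, "hello"),
    (False, "hello world"),
    (True, "hello world"),
    (False, "next"),
]
emitted = [tracker.update(is_final, text) for is_final, text in events]
```

Under these assumptions, every partial result of one sentence shares an index, so a consumer can overwrite the previous entry until the index changes.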
--- a/caption-engine/main-gummy.py
+++ b/caption-engine/main-gummy.py
@@ -1,21 +1,11 @@
 import sys
 import argparse
-
-if sys.platform == 'win32':
-    from sysaudio.win import AudioStream
-elif sys.platform == 'darwin':
-    from sysaudio.darwin import AudioStream
-elif sys.platform == 'linux':
-    from sysaudio.linux import AudioStream
-else:
-    raise NotImplementedError(f"Unsupported platform: {sys.platform}")
-
-from audioprcs import mergeChunkChannels
+from sysaudio import AudioStream
+from utils import merge_chunk_channels
 from audio2text import InvalidParameter, GummyTranslator


 def convert_audio_to_text(s_lang, t_lang, audio_type, chunk_rate, api_key):
-    sys.stdout.reconfigure(line_buffering=True) # type: ignore
    stream = AudioStream(audio_type, chunk_rate)

    if t_lang == 'none':
@@ -23,20 +13,21 @@ def convert_audio_to_text(s_lang, t_lang, audio_type, chunk_rate, api_key):
    else:
        gummy = GummyTranslator(stream.RATE, s_lang, t_lang, api_key)

-    stream.openStream()
+    stream.open_stream()
    gummy.start()

    while True:
        try:
            chunk = stream.read_chunk()
-            chunk_mono = mergeChunkChannels(chunk, stream.CHANNELS)
+            if chunk is None: continue
+            chunk_mono = merge_chunk_channels(chunk, stream.CHANNELS)
            try:
                gummy.send_audio_frame(chunk_mono)
            except InvalidParameter:
                gummy.start()
                gummy.send_audio_frame(chunk_mono)
        except KeyboardInterrupt:
-            stream.closeStream()
+            stream.close_stream()
            gummy.stop()
            break

--- a/caption-engine/main-gummy.spec
+++ b/caption-engine/main-gummy.spec
--- a/caption-engine/main-vosk.py
+++ b/caption-engine/main-vosk.py
@@ -4,17 +4,9 @@ import argparse
 from datetime import datetime
 import numpy.core.multiarray

-if sys.platform == 'win32':
-    from sysaudio.win import AudioStream
-elif sys.platform == 'darwin':
-    from sysaudio.darwin import AudioStream
-elif sys.platform == 'linux':
-    from sysaudio.linux import AudioStream
-else:
-    raise NotImplementedError(f"Unsupported platform: {sys.platform}")
-
+from sysaudio import AudioStream
 from vosk import Model, KaldiRecognizer, SetLogLevel
-from audioprcs import resampleRawChunk
+from utils import resample_chunk_mono

 SetLogLevel(-1)

@@ -30,7 +22,7 @@ def convert_audio_to_text(audio_type, chunk_rate, model_path):
    recognizer = KaldiRecognizer(model, 16000)

    stream = AudioStream(audio_type, chunk_rate)
-    stream.openStream()
+    stream.open_stream()

    time_str = ''
    cur_id = 0
@@ -38,7 +30,8 @@ def convert_audio_to_text(audio_type, chunk_rate, model_path):

    while True:
        chunk = stream.read_chunk()
-        chunk_mono = resampleRawChunk(chunk, stream.CHANNELS, stream.RATE, 16000)
+        if chunk is None: continue
+        chunk_mono = resample_chunk_mono(chunk, stream.CHANNELS, stream.RATE, 16000)

        caption = {}
        if recognizer.AcceptWaveform(chunk_mono):
@@ -56,6 +49,7 @@ def convert_audio_to_text(audio_type, chunk_rate, model_path):
                continue
            if prev_content == '':
                time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+            caption['command'] = 'caption'
            caption['index'] = cur_id
            caption['text'] = content
            caption['time_s'] = time_str
--- a/caption-engine/main-vosk.spec
+++ b/caption-engine/main-vosk.spec
--- a/engine/main.py
+++ b/engine/main.py
@@ -0,0 +1,37 @@
+import argparse
+
+def gummy_engine(s, t, a, c, k):
+    pass
+
+def vosk_engine(a, c, m):
+    pass
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description='Convert system audio stream to text')
+    # both
+    parser.add_argument('-e', '--caption_engine', default='gummy', help='Caption engine: gummy or vosk')
+    parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output, 1 for input')
+    parser.add_argument('-c', '--chunk_rate', default=20, help='Number of audio stream chunks collected per second')
+    # gummy
+    parser.add_argument('-s', '--source_language', default='en', help='Source language code')
+    parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
+    parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
+    # vosk
+    parser.add_argument('-m', '--model_path', default='', help='The path to the vosk model.')
+    args = parser.parse_args()
+    if args.caption_engine == 'gummy':
+        gummy_engine(
+            args.source_language,
+            args.target_language,
+            int(args.audio_type),
+            int(args.chunk_rate),
+            args.api_key
+        )
+    elif args.caption_engine == 'vosk':
+        vosk_engine(
+            int(args.audio_type),
+            int(args.chunk_rate),
+            args.model_path
+        )
+    else:
+        raise ValueError('Invalid caption engine specified.')
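The entry point above converts `audio_type` and `chunk_rate` with `int(...)` at the call site and rejects unknown engines by hand. As a possible alternative, and only a sketch rather than what this commit does, `argparse` can perform both checks through its `type` and `choices` parameters. `build_parser` is a hypothetical name, and only a subset of the options is reproduced.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that validates the engine name and integer options itself."""
    parser = argparse.ArgumentParser(description="Convert system audio stream to text")
    parser.add_argument("-e", "--caption_engine", default="gummy", choices=["gummy", "vosk"])
    parser.add_argument("-a", "--audio_type", type=int, default=0, choices=[0, 1])
    parser.add_argument("-c", "--chunk_rate", type=int, default=20)
    parser.add_argument("-m", "--model_path", default="")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["-e", "vosk", "-a", "1", "-c", "10"])
```

With this shape an invalid engine name makes `argparse` exit with a usage message instead of reaching the final `else` branch.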
--- a/caption-engine/requirements_darwin.txt
+++ b/caption-engine/requirements_darwin.txt
--- a/caption-engine/requirements_linux.txt
+++ b/caption-engine/requirements_linux.txt
--- a/caption-engine/requirements_win.txt
+++ b/caption-engine/requirements_win.txt
@@ -1,7 +1,6 @@
 dashscope
 numpy
 samplerate
-PyAudio
 PyAudioWPatch
 vosk
 pyinstaller
--- a/engine/sysaudio/init.py
+++ b/engine/sysaudio/init.py
@@ -0,0 +1,10 @@
+import sys
+
+if sys.platform == "win32":
+    from .win import AudioStream
+elif sys.platform == "darwin":
+    from .darwin import AudioStream
+elif sys.platform == "linux":
+    from .linux import AudioStream
+else:
+    raise NotImplementedError(f"Unsupported platform: {sys.platform}")
--- a/caption-engine/sysaudio/darwin.py
+++ b/caption-engine/sysaudio/darwin.py
@@ -1,11 +1,24 @@
 """获取 MacOS 系统音频输入/输出流"""

 import pyaudio
+from textwrap import dedent
+
+
+def get_blackhole_device(mic: pyaudio.PyAudio):
+    """
+    Return the first audio device whose name contains "BlackHole".
+    """
+    device_count = mic.get_device_count()
+    for i in range(device_count):
+        dev_info = mic.get_device_info_by_index(i)
+        if 'blackhole' in str(dev_info["name"]).lower():
+            return dev_info
+    raise Exception("The device containing BlackHole was not found.")


 class AudioStream:
    """
-    获取系统音频流（支持 BlackHole 作为系统音频输出捕获）
+    获取系统音频流（如果要捕获输出音频，仅支持 BlackHole 作为系统音频输出捕获）

    初始化参数：
        audio_type: 0-系统音频输出流（需配合 BlackHole），1-系统音频输入流
@@ -15,46 +28,40 @@ class AudioStream:
        self.audio_type = audio_type
        self.mic = pyaudio.PyAudio()
        if self.audio_type == 0:
-            self.device = self.getOutputDeviceInfo()
+            self.device = get_blackhole_device(self.mic)
        else:
            self.device = self.mic.get_default_input_device_info()
+        self.stop_signal = False
        self.stream = None
-        self.SAMP_WIDTH = pyaudio.get_sample_size(pyaudio.paInt16)
+        self.INDEX = self.device["index"]
        self.FORMAT = pyaudio.paInt16
-        self.CHANNELS = self.device["maxInputChannels"]
+        self.SAMP_WIDTH = pyaudio.get_sample_size(self.FORMAT)
+        self.CHANNELS = int(self.device["maxInputChannels"])
        self.RATE = int(self.device["defaultSampleRate"])
        self.CHUNK = self.RATE // chunk_rate
-        self.INDEX = self.device["index"]

-    def getOutputDeviceInfo(self):
-        """查找指定关键词的输入设备"""
-        device_count = self.mic.get_device_count()
-        for i in range(device_count):
-            dev_info = self.mic.get_device_info_by_index(i)
-            if 'blackhole' in dev_info["name"].lower():    
-                return dev_info
-        raise Exception("The device containing BlackHole was not found.")
-
-    def printInfo(self):
+    def get_info(self):
        dev_info = f"""
-        采样输入设备：
+        采样设备：
            - 设备类型：{ "音频输出" if self.audio_type == 0 else "音频输入" }
-            - 序号：{self.device['index']}
-            - 名称：{self.device['name']}
+            - 设备序号：{self.device['index']}
+            - 设备名称：{self.device['name']}
            - 最大输入通道数：{self.device['maxInputChannels']}
            - 默认低输入延迟：{self.device['defaultLowInputLatency']}s
            - 默认高输入延迟：{self.device['defaultHighInputLatency']}s
            - 默认采样率：{self.device['defaultSampleRate']}Hz
+            - 是否回环设备：{self.device.get('isLoopbackDevice', False)}

-        音频样本块大小：{self.CHUNK}
+        设备序号：{self.INDEX}
+        样本格式：{self.FORMAT}
        样本位宽：{self.SAMP_WIDTH}
-        采样格式：{self.FORMAT}
-        音频通道数：{self.CHANNELS}
-        音频采样率：{self.RATE}
+        样本通道数：{self.CHANNELS}
+        样本采样率：{self.RATE}
+        样本块大小：{self.CHUNK}
        """
-        print(dev_info)
+        return dedent(dev_info).strip()

-    def openStream(self):
+    def open_stream(self):
        """
        打开并返回系统音频输出流
        """
@@ -72,14 +79,24 @@ class AudioStream:
        """
        读取音频数据
        """
+        if self.stop_signal:
+            self.close_stream()
+            return None
        if not self.stream: return None
        return self.stream.read(self.CHUNK, exception_on_overflow=False)

-    def closeStream(self):
+    def close_stream_signal(self):
        """
-        关闭系统音频输出流
+        Request closing the system audio input stream in a thread-safe way; the stream may not close immediately.
        """
-        if self.stream is None: return
-        self.stream.stop_stream()
-        self.stream.close()
-        self.stream = None
+        self.stop_signal = True
+
+    def close_stream(self):
+        """
+        Close the system audio input stream immediately.
+        """
+        if self.stream is not None:
+            self.stream.stop_stream()
+            self.stream.close()
+            self.stream = None
+        self.stop_signal = False
--- a/caption-engine/sysaudio/linux.py
+++ b/caption-engine/sysaudio/linux.py
@@ -1,8 +1,10 @@
 """获取 Linux 系统音频输入流"""

 import subprocess
+from textwrap import dedent

-def findMonitorSource():
+
+def find_monitor_source():
    result = subprocess.run(
        ["pactl", "list", "short", "sources"],
        stdout=subprocess.PIPE, text=True
@@ -16,7 +18,8 @@ def findMonitorSource():

    raise RuntimeError("System output monitor device not found")

-def findInputSource():
+
+def find_input_source():
    result = subprocess.run(
        ["pactl", "list", "short", "sources"],
        stdout=subprocess.PIPE, text=True
@@ -28,8 +31,10 @@ def findInputSource():
        name = parts[1]
        if ".monitor" not in name:
            return name
+
    raise RuntimeError("Microphone input device not found")

+
 class AudioStream:
    """
    获取系统音频流
@@ -42,34 +47,33 @@ class AudioStream:
        self.audio_type = audio_type

        if self.audio_type == 0:
-            self.source = findMonitorSource()
+            self.source = find_monitor_source()
        else:
-            self.source = findInputSource()
-
+            self.source = find_input_source()
+        self.stop_signal = False
        self.process = None
-
-        self.SAMP_WIDTH = 2
        self.FORMAT = 16
+        self.SAMP_WIDTH = 2
        self.CHANNELS = 2
        self.RATE = 48000
        self.CHUNK = self.RATE // chunk_rate

-    def printInfo(self):
+    def get_info(self):
        dev_info = f"""
        音频捕获进程：
            - 捕获类型：{"音频输出" if self.audio_type == 0 else "音频输入"}
            - 设备源：{self.source}
-            - 捕获进程PID：{self.process.pid if self.process else "None"}
+            - 捕获进程 PID：{self.process.pid if self.process else "None"}

-        音频样本块大小：{self.CHUNK}
+        样本格式：{self.FORMAT}
        样本位宽：{self.SAMP_WIDTH}
-        采样格式：{self.FORMAT}
-        音频通道数：{self.CHANNELS}
-        音频采样率：{self.RATE}
+        样本通道数：{self.CHANNELS}
+        样本采样率：{self.RATE}
+        样本块大小：{self.CHUNK}
        """
-        print(dev_info)
+        return dedent(dev_info).strip()

-    def openStream(self):
+    def open_stream(self):
        """
        启动音频捕获进程
        """
@@ -82,13 +86,23 @@ class AudioStream:
        """
        读取音频数据
        """
-        if self.process:
+        if self.stop_signal:
+            self.close_stream()
+            return None
+        if self.process and self.process.stdout:
            return self.process.stdout.read(self.CHUNK)
        return None

-    def closeStream(self):
+    def close_stream_signal(self):
+        """
+        Request closing the system audio input stream in a thread-safe way; the stream may not close immediately.
+        """
+        self.stop_signal = True
+
+    def close_stream(self):
        """
        关闭系统音频捕获进程
        """
        if self.process:
            self.process.terminate()
+        self.stop_signal = False
--- a/caption-engine/sysaudio/win.py
+++ b/caption-engine/sysaudio/win.py
@@ -1,14 +1,15 @@
 """获取 Windows 系统音频输入/输出流"""

 import pyaudiowpatch as pyaudio
+from textwrap import dedent


-def getDefaultLoopbackDevice(mic: pyaudio.PyAudio, info = True)->dict:
+def get_default_loopback_device(mic: pyaudio.PyAudio, info = True)->dict:
    """
    获取默认的系统音频输出的回环设备
    Args:
-        mic (pyaudio.PyAudio): pyaudio对象
-        info (bool, optional): 是否打印设备信息
+        mic: pyaudio对象
+        info: 是否打印设备信息

    Returns:
        dict: 系统音频输出的回环设备
@@ -51,38 +52,40 @@ class AudioStream:
        self.audio_type = audio_type
        self.mic = pyaudio.PyAudio()
        if self.audio_type == 0:
-            self.device = getDefaultLoopbackDevice(self.mic, False)
+            self.device = get_default_loopback_device(self.mic, False)
        else:
            self.device = self.mic.get_default_input_device_info()
+        self.stop_signal = False
        self.stream = None
-        self.SAMP_WIDTH = pyaudio.get_sample_size(pyaudio.paInt16)
+        self.INDEX = self.device["index"]
        self.FORMAT = pyaudio.paInt16
+        self.SAMP_WIDTH = pyaudio.get_sample_size(self.FORMAT)
        self.CHANNELS = int(self.device["maxInputChannels"])
        self.RATE = int(self.device["defaultSampleRate"])
        self.CHUNK = self.RATE // chunk_rate
-        self.INDEX = self.device["index"]

-    def printInfo(self):
+    def get_info(self):
        dev_info = f"""
        采样设备：
            - 设备类型：{ "音频输出" if self.audio_type == 0 else "音频输入" }
-            - 序号：{self.device['index']}
-            - 名称：{self.device['name']}
+            - 设备序号：{self.device['index']}
+            - 设备名称：{self.device['name']}
            - 最大输入通道数：{self.device['maxInputChannels']}
            - 默认低输入延迟：{self.device['defaultLowInputLatency']}s
            - 默认高输入延迟：{self.device['defaultHighInputLatency']}s
            - 默认采样率：{self.device['defaultSampleRate']}Hz
            - 是否回环设备：{self.device['isLoopbackDevice']}

-        音频样本块大小：{self.CHUNK}
+        设备序号：{self.INDEX}
+        样本格式：{self.FORMAT}
        样本位宽：{self.SAMP_WIDTH}
-        采样格式：{self.FORMAT}
-        音频通道数：{self.CHANNELS}
-        音频采样率：{self.RATE}
+        样本通道数：{self.CHANNELS}
+        样本采样率：{self.RATE}
+        样本块大小：{self.CHUNK}
        """
-        print(dev_info)
+        return dedent(dev_info).strip()

-    def openStream(self):
+    def open_stream(self):
        """
        打开并返回系统音频输出流
        """
@@ -96,18 +99,28 @@ class AudioStream:
        )
        return self.stream

-    def read_chunk(self):
+    def read_chunk(self) -> bytes | None:
        """
        读取音频数据
        """
+        if self.stop_signal:
+            self.close_stream()
+            return None
        if not self.stream: return None
        return self.stream.read(self.CHUNK, exception_on_overflow=False)

-    def closeStream(self):
+    def close_stream_signal(self):
        """
-        关闭系统音频输出流
+        Request closing the system audio input stream in a thread-safe way; the stream may not close immediately.
        """
-        if self.stream is None: return
-        self.stream.stop_stream()
-        self.stream.close()
-        self.stream = None
+        self.stop_signal = True
+
+    def close_stream(self):
+        """
+        Close the system audio input stream.
+        """
+        if self.stream is not None:
+            self.stream.stop_stream()
+            self.stream.close()
+            self.stream = None
+        self.stop_signal = False
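The `stop_signal` flag added to all three `AudioStream` classes implements a cooperative shutdown: another thread only sets the flag, and the thread that calls `read_chunk` performs the actual close. Below is a minimal sketch of that pattern, assuming a stand-in `FakeStream` class (hypothetical, no audio device involved) and a reader that exits on `None`, unlike the `continue` used in the main loops.

```python
import threading


class FakeStream:
    """Stand-in for AudioStream; only the stop-signal logic is modeled."""

    def __init__(self):
        self.stop_signal = False
        self.opened = True

    def read_chunk(self):
        # The reading thread is the only one that actually closes the stream.
        if self.stop_signal:
            self.close_stream()
            return None
        if not self.opened:
            return None
        return b"\x00\x00"

    def close_stream_signal(self):
        # Called from another thread: it only flips a flag.
        self.stop_signal = True

    def close_stream(self):
        self.opened = False
        self.stop_signal = False


def reader(stream: FakeStream, done: threading.Event) -> None:
    # Keep reading until the stream reports that it has been closed.
    while stream.read_chunk() is not None:
        pass
    done.set()


stream = FakeStream()
done = threading.Event()
worker = threading.Thread(target=reader, args=(stream, done))
worker.start()
stream.close_stream_signal()
worker.join(timeout=5)
```

The signalling thread never touches the underlying stream object, which is presumably why the docstring warns that the close may not happen immediately.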
--- a/engine/utils/init.py
+++ b/engine/utils/init.py
@@ -0,0 +1,2 @@
+from .process import merge_chunk_channels, resample_chunk_mono, resample_mono_chunk
+from .sysout import stdout, stdout_cmd, stdout_obj, stderr
--- a/caption-engine/audioprcs/process.py
+++ b/caption-engine/audioprcs/process.py
@@ -1,16 +1,17 @@
 import samplerate
 import numpy as np

-def mergeChunkChannels(chunk, channels):
+
+def merge_chunk_channels(chunk: bytes, channels: int) -> bytes:
    """
    将当前多通道音频数据块转换为单通道音频数据块

    Args:
-        chunk: (bytes)多通道音频数据块
+        chunk: 多通道音频数据块
        channels: 通道数

    Returns:
-        (bytes)单通道音频数据块
+        单通道音频数据块
    """
    # (length * channels,)
    chunk_np = np.frombuffer(chunk, dtype=np.int16)
@@ -22,19 +23,19 @@ def mergeChunkChannels(chunk, channels):
    return chunk_mono.tobytes()


-def resampleRawChunk(chunk, channels, orig_sr, target_sr, mode="sinc_best"):
+def resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes:
    """
    将当前多通道音频数据块转换成单通道音频数据块，然后进行重采样

    Args:
-        chunk: (bytes)多通道音频数据块
+        chunk: 多通道音频数据块
        channels: 通道数
        orig_sr: 原始采样率
        target_sr: 目标采样率
        mode: 重采样模式，可选：'sinc_best' | 'sinc_medium' | 'sinc_fastest' | 'zero_order_hold' | 'linear'

    Return:
-        (bytes)单通道音频数据块
+        单通道音频数据块
    """
    # (length * channels,)
    chunk_np = np.frombuffer(chunk, dtype=np.int16)
@@ -44,22 +45,23 @@ def resampleRawChunk(chunk, channels, orig_sr, target_sr, mode="sinc_best"):
    chunk_mono_f = np.mean(chunk_np.astype(np.float32), axis=1)
    chunk_mono = chunk_mono_f.astype(np.int16)
    ratio = target_sr / orig_sr
-    chunk_mono_r =  samplerate.resample(chunk_mono, ratio, converter_type=mode)
+    chunk_mono_r = samplerate.resample(chunk_mono, ratio, converter_type=mode)
    chunk_mono_r = np.round(chunk_mono_r).astype(np.int16)
    return chunk_mono_r.tobytes()

-def resampleMonoChunk(chunk, orig_sr, target_sr, mode="sinc_best"):
+
+def resample_mono_chunk(chunk: bytes, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes:
    """
    将当前单通道音频块进行重采样

    Args:
-        chunk: (bytes)单通道音频数据块
+        chunk: 单通道音频数据块
        orig_sr: 原始采样率
        target_sr: 目标采样率
        mode: 重采样模式，可选：'sinc_best' | 'sinc_medium' | 'sinc_fastest' | 'zero_order_hold' | 'linear'

    Return:
-        (bytes)单通道音频数据块
+        单通道音频数据块
    """
    chunk_np = np.frombuffer(chunk, dtype=np.int16)
    ratio = target_sr / orig_sr
--- a/engine/utils/sysout.py
+++ b/engine/utils/sysout.py
@@ -0,0 +1,18 @@
+import sys
+import json
+
+def stdout(text: str):
+    stdout_cmd("print", text)
+
+def stdout_cmd(command: str, content = ""):
+    msg = { "command": command, "content": content }
+    sys.stdout.write(json.dumps(msg) + "\n")
+    sys.stdout.flush()
+
+def stdout_obj(obj):
+    sys.stdout.write(json.dumps(obj) + "\n")
+    sys.stdout.flush()
+
+def stderr(text: str):
+    sys.stderr.write(text + "\n")
+    sys.stderr.flush()
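The helpers in `sysout.py` amount to a newline-delimited JSON protocol on standard output: one object per line, routed by its `command` field. Below is a minimal sketch of a consumer for that protocol, assuming only the commands visible in this diff (`caption`, `print`, `info`, `usage`). `encode_cmd` and `dispatch_line` are hypothetical names, not part of the project.

```python
import json


def encode_cmd(command: str, content: str = "") -> str:
    """Build one protocol line shaped like the output of stdout_cmd, without writing it."""
    return json.dumps({"command": command, "content": content}) + "\n"


def dispatch_line(line: str):
    """Parse one line of engine output and classify it by its 'command' field."""
    line = line.strip()
    if not line:
        return None
    try:
        msg = json.loads(line)
    except json.JSONDecodeError:
        return ("error", line)
    if not isinstance(msg, dict):
        return ("error", line)
    command = msg.get("command")
    if command == "caption":
        # Caption objects carry their fields at the top level.
        return ("caption", msg.get("index"), msg.get("text", ""))
    if command in ("print", "info", "usage"):
        # Command messages carry a single 'content' payload.
        return (command, msg.get("content", ""))
    return ("unknown", msg)
```

Because each message is flushed as a complete line, a reader can split on newlines and parse each piece independently, as the sketch does.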
--- a/package-lock.json
+++ b/package-lock.json
@@ -22,6 +22,7 @@
        "@electron-toolkit/eslint-config-ts": "^3.0.0",
        "@electron-toolkit/tsconfig": "^1.0.1",
        "@types/node": "^22.14.1",
+        "@types/pidusage": "^2.0.5",
        "@vitejs/plugin-vue": "^5.2.3",
        "electron": "^35.1.5",
        "electron-builder": "^25.1.8",
@@ -2296,6 +2297,13 @@
        "undici-types": "~6.21.0"
      }
    },
+    "node_modules/@types/pidusage": {
+      "version": "2.0.5",
+      "resolved": "https://registry.npmmirror.com/@types/pidusage/-/pidusage-2.0.5.tgz",
+      "integrity": "sha512-MIiyZI4/MK9UGUXWt0jJcCZhVw7YdhBuTOuqP/BjuLDLZ2PmmViMIQgZiWxtaMicQfAz/kMrZ5T7PKxFSkTeUA==",
+      "dev": true,
+      "license": "MIT"
+    },
    "node_modules/@types/plist": {
      "version": "3.0.5",
      "resolved": "https://registry.npmmirror.com/@types/plist/-/plist-3.0.5.tgz",
--- a/package.json
+++ b/package.json
@@ -13,7 +13,7 @@
    "typecheck:web": "vue-tsc --noEmit -p tsconfig.web.json --composite false",
    "typecheck": "npm run typecheck:node && npm run typecheck:web",
    "start": "electron-vite preview",
-    "dev": "electron-vite dev",
+    "dev": "chcp 65001 && electron-vite dev",
    "build": "npm run typecheck && electron-vite build",
    "postinstall": "electron-builder install-app-deps",
    "build:unpack": "npm run build && electron-builder --dir",
@@ -35,6 +35,7 @@
    "@electron-toolkit/eslint-config-ts": "^3.0.0",
    "@electron-toolkit/tsconfig": "^1.0.1",
    "@types/node": "^22.14.1",
+    "@types/pidusage": "^2.0.5",
    "@vitejs/plugin-vue": "^5.2.3",
    "electron": "^35.1.5",
    "electron-builder": "^25.1.8",
--- a/src/main/utils/AllConfig.ts
+++ b/src/main/utils/AllConfig.ts
@@ -2,6 +2,7 @@ import {
  UILanguage, UITheme, Styles, Controls,
  CaptionItem, FullConfig
 } from '../types'
+import { Log } from './Log'
 import { app, BrowserWindow } from 'electron'
 import * as path from 'path'
 import * as fs from 'fs'
@@ -48,6 +49,7 @@ class AllConfig {
  uiTheme: UITheme = 'system';
  styles: Styles = {...defaultStyles};
  controls: Controls = {...defaultControls};
+  lastLogIndex: number = -1;
  captionLog: CaptionItem[] = [];

  constructor() {}
@@ -61,7 +63,7 @@ class AllConfig {
      if(config.leftBarWidth) this.leftBarWidth = config.leftBarWidth
      if(config.styles) this.setStyles(config.styles)
      if(config.controls) this.setControls(config.controls)
-      console.log('[INFO] Read Config from:', configPath)
+      Log.info('Read Config from:', configPath)
    }
  }

@@ -75,7 +77,7 @@ class AllConfig {
    }
    const configPath = path.join(app.getPath('userData'), 'config.json')
    fs.writeFileSync(configPath, JSON.stringify(config, null, 2))
-    console.log('[INFO] Write Config to:', configPath)
+    Log.info('Write Config to:', configPath)
  }

  public getFullConfig(): FullConfig {
@@ -96,7 +98,7 @@ class AllConfig {
        this.styles[key] = args[key]
      }
    }
-    console.log('[INFO] Set Styles:', this.styles)
+    Log.info('Set Styles:', this.styles)
  }

  public resetStyles() {
@@ -105,7 +107,7 @@ class AllConfig {

  public sendStyles(window: BrowserWindow) {
    window.webContents.send('both.styles.set', this.styles)
-    console.log(`[INFO] Send Styles to #${window.id}:`, this.styles)
+    Log.info(`Send Styles to #${window.id}:`, this.styles)
  }

  public setControls(args: Object) {
@@ -116,27 +118,28 @@ class AllConfig {
      }
    }
    this.controls.engineEnabled = engineEnabled
-    console.log('[INFO] Set Controls:', this.controls)
+    Log.info('Set Controls:', this.controls)
  }

  public sendControls(window: BrowserWindow) {
    window.webContents.send('control.controls.set', this.controls)
-    console.log(`[INFO] Send Controls to #${window.id}:`, this.controls)
+    Log.info(`Send Controls to #${window.id}:`, this.controls)
  }

  public updateCaptionLog(log: CaptionItem) {
    let command: 'add' | 'upd' = 'add'
    if(
      this.captionLog.length &&
-      this.captionLog[this.captionLog.length - 1].index === log.index &&
-      this.captionLog[this.captionLog.length - 1].time_s === log.time_s
+      this.lastLogIndex === log.index
    ) {
      this.captionLog.splice(this.captionLog.length - 1, 1, log)
      command = 'upd'
    }
    else {
      this.captionLog.push(log)
+      this.lastLogIndex = log.index
    }
+    this.captionLog[this.captionLog.length - 1].index = this.captionLog.length
    for(const window of BrowserWindow.getAllWindows()){
      this.sendCaptionLog(window, command)
    }
--- a/src/main/utils/CaptionEngine.ts
+++ b/src/main/utils/CaptionEngine.ts
@@ -5,6 +5,7 @@ import path from 'path'
 import { controlWindow } from '../ControlWindow'
 import { allConfig } from './AllConfig'
 import { i18n } from '../i18n'
+import { Log } from './Log'

 export class CaptionEngine {
  appPath: string = ''
@@ -14,7 +15,7 @@ export class CaptionEngine {

  private getApp(): boolean {
    if (allConfig.controls.customized && allConfig.controls.customizedApp) {
-      console.log('[INFO] Using customized engine')
+      Log.info('Using customized engine')
      this.appPath = allConfig.controls.customizedApp
      this.command = allConfig.controls.customizedCommand.split(' ')
    }
@@ -25,21 +26,22 @@ export class CaptionEngine {
        return false
      }
      let gummyName = 'main-gummy'
-      if (process.platform === 'win32') {
-        gummyName += '.exe'
-      }
+      if (process.platform === 'win32') { gummyName += '.exe' }
+      this.command = []
      if (is.dev) {
        this.appPath = path.join(
-          app.getAppPath(),
-          'caption-engine', 'dist', gummyName
+          app.getAppPath(), 'engine',
+          'subenv', 'Scripts', 'python.exe'
        )
+        this.command.push(path.join(
+          app.getAppPath(), 'engine', 'main-gummy.py'
+        ))
      }
      else {
        this.appPath = path.join(
-          process.resourcesPath, 'caption-engine', gummyName
+          process.resourcesPath, 'engine', gummyName
        )
      }
-      this.command = []
      this.command.push('-s', allConfig.controls.sourceLang)
      this.command.push(
        '-t', allConfig.controls.translation ?
@@ -53,31 +55,33 @@ export class CaptionEngine {
    else if(allConfig.controls.engine === 'vosk'){
      allConfig.controls.customized = false
      let voskName = 'main-vosk'
-      if (process.platform === 'win32') {
-        voskName += '.exe'
-      }
+      if (process.platform === 'win32') { voskName += '.exe' }
+      this.command = []
      if (is.dev) {
        this.appPath = path.join(
-          app.getAppPath(),
-          'caption-engine', 'dist', voskName
+          app.getAppPath(), 'engine',
+          'subenv', 'Scripts', 'python.exe'
        )
+        this.command.push(path.join(
+          app.getAppPath(), 'engine', 'main-vosk.py'
+        ))
      }
      else {
        this.appPath = path.join(
-          process.resourcesPath, 'caption-engine', voskName
+          process.resourcesPath, 'engine', voskName
        )
      }
-      this.command = []
      this.command.push('-a', allConfig.controls.audio ? '1' : '0')
      this.command.push('-m', `"${allConfig.controls.modelPath}"`)
    }
-    console.log('[INFO] Engine Path:', this.appPath)
-    console.log('[INFO] Engine Command:', this.command)
+    Log.info('Engine Path:', this.appPath)
+    Log.info('Engine Command:', this.command)
    return true
  }

  public start() {
    if (this.processStatus !== 'stopped') {
+      Log.warn('Caption engine status is not stopped, cannot start')
      return
    }
    if(!this.getApp()){ return }
@@ -87,12 +91,12 @@ export class CaptionEngine {
    }
    catch (e) {
      controlWindow.sendErrorMessage(i18n('engine.start.error') + e)
-      console.error('[ERROR] Error starting subprocess:', e)
+      Log.error('Error starting engine:', e)
      return
    }

    this.processStatus = 'running'
-    console.log('[INFO] Caption Engine Started, PID:', this.process.pid)
+    Log.info('Caption Engine Started, PID:', this.process.pid)

    allConfig.controls.engineEnabled = true
    if(controlWindow.window){
@@ -108,27 +112,23 @@ export class CaptionEngine {
      lines.forEach((line: string) => {
        if (line.trim()) {
          try {
-            const caption = JSON.parse(line);
-            if(caption.index === undefined) {
-              console.log('[INFO] Engine Bad Output:', caption);
-            }
-            else allConfig.updateCaptionLog(caption);
+            const data_obj = JSON.parse(line)
+            handleEngineData(data_obj)
          } catch (e) {
            controlWindow.sendErrorMessage(i18n('engine.output.parse.error') + e)
-            console.error('[ERROR] Error parsing JSON:', e);
+            Log.error('Error parsing JSON:', e)
          }
        }
      });
    });

-    this.process.stderr.on('data', (data) => {
+    this.process.stderr.on('data', (data: any) => {
      if(this.processStatus === 'stopping') return
      controlWindow.sendErrorMessage(i18n('engine.error') + data)
-      console.error(`[ERROR] Subprocess Error: ${data}`);
+      Log.error(`Engine Error: ${data}`);
    });

    this.process.on('close', (code: any) => {
-      console.log(`[INFO] Subprocess exited with code ${code}`);
      this.process = undefined;
      allConfig.controls.engineEnabled = false
      if(controlWindow.window){
@@ -136,14 +136,14 @@ export class CaptionEngine {
        controlWindow.window.webContents.send('control.engine.stopped')
      }
      this.processStatus = 'stopped'
-      console.log('[INFO] Caption engine process stopped')
+      Log.info(`Engine exited with code ${code}`)
    });
  }

  public stop() {
    if(this.processStatus !== 'running') return
    if (this.process.pid) {
-      console.log('[INFO] Trying to stop process, PID:', this.process.pid)
+      Log.info('Trying to stop process, PID:', this.process.pid)
      let cmd = `kill ${this.process.pid}`;
      if (process.platform === "win32") {
        cmd = `taskkill /pid ${this.process.pid} /t /f`
@@ -151,7 +151,7 @@ export class CaptionEngine {
      exec(cmd, (error) => {
        if (error) {
          controlWindow.sendErrorMessage(i18n('engine.shutdown.error') + error)
-          console.error(`[ERROR] Failed to kill process: ${error}`)
+          Log.error(`Failed to kill process: ${error}`)
        }
      })
    }
@@ -163,11 +163,26 @@ export class CaptionEngine {
        controlWindow.window.webContents.send('control.engine.stopped')
      }
      this.processStatus = 'stopped'
-      console.log('[INFO] Process PID undefined, caption engine process stopped')
+      Log.info('Process PID undefined, caption engine process stopped')
      return
    }
    this.processStatus = 'stopping'
-    console.log('[INFO] Caption engine process stopping')
+    Log.info('Caption engine process stopping')
+  }
+}
+
+function handleEngineData(data: any) {
+  if(data.command === 'caption') {
+    allConfig.updateCaptionLog(data);
+  }
+  else if(data.command === 'print') {
+    Log.info('Engine print:', data.content)
+  }
+  else if(data.command === 'info') {
+    Log.info('Engine info:', data.content)
+  }
+  else if(data.command === 'usage') {
+    Log.info('Caption engine usage: ', data.content)
  }
 }
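
Review note: the new `handleEngineData` dispatches on a `command` field, so the engine's stdout is effectively a stream of JSON objects, one per line. The sketch below shows one way to validate that shape before dispatching. It is a minimal illustration, not the project's implementation. The names `EngineMessage`, `parseEngineOutput`, and `route` are hypothetical, and splitting on `\n` is an assumption, since the splitting code is outside this hunk.

```typescript
// Hypothetical message shape: only `command` and `content` are visible in the diff.
interface EngineMessage {
  command: string
  content?: unknown
  [key: string]: unknown
}

interface ParseResult {
  messages: EngineMessage[]
  badLines: string[]
}

// Split a stdout chunk into lines (assumed newline-delimited) and keep only
// lines that parse as JSON objects carrying a string `command` field.
function parseEngineOutput(chunk: string): ParseResult {
  const result: ParseResult = { messages: [], badLines: [] }
  for (const line of chunk.split('\n')) {
    const trimmed = line.trim()
    if (!trimmed) continue
    try {
      const parsed: unknown = JSON.parse(trimmed)
      if (
        typeof parsed === 'object' &&
        parsed !== null &&
        typeof (parsed as EngineMessage).command === 'string'
      ) {
        result.messages.push(parsed as EngineMessage)
      } else {
        result.badLines.push(trimmed)
      }
    } catch {
      result.badLines.push(trimmed)
    }
  }
  return result
}

// Mirror the four commands handled in the diff; anything else is reported
// as 'unknown' instead of being dropped silently.
function route(msg: EngineMessage): 'caption-log' | 'log' | 'unknown' {
  switch (msg.command) {
    case 'caption':
      return 'caption-log'
    case 'print':
    case 'info':
    case 'usage':
      return 'log'
    default:
      return 'unknown'
  }
}
```

An explicit `unknown` branch would make it easier to spot a mismatch between the engine's output format and the main process while the two engines are being merged.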

--- a/src/main/utils/Log.ts
+++ b/src/main/utils/Log.ts
@@ -0,0 +1,21 @@
+function getTimeString() {
+  const now = new Date()
+  const HH = String(now.getHours()).padStart(2, '0')
+  const MM = String(now.getMinutes()).padStart(2, '0')
+  const SS = String(now.getSeconds()).padStart(2, '0')
+  return `${HH}:${MM}:${SS}`
+}
+
+export class Log {
+  static info(...msg: any[]){
+    console.log(`[INFO ${getTimeString()}]`, ...msg)
+  }
+
+  static warn(...msg: any[]){
+    console.log(`[WARN ${getTimeString()}]`, ...msg)
+  }
+
+  static error(...msg: any[]){
+    console.log(`[ERROR ${getTimeString()}]`, ...msg)
+  }
+}
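
Review note: in the new `Log` class all three levels write through `console.log`, whereas the replaced calls used `console.error` for errors. If keeping errors on stderr matters, a variant along these lines could route each level to the matching console method. This is a sketch under that assumption only. `LeveledLog`, `timeString`, and `prefix` are hypothetical names that do not appear in the commit.

```typescript
type Level = 'INFO' | 'WARN' | 'ERROR'

// Format a Date as HH:MM:SS in local time, zero-padded.
function timeString(now: Date): string {
  const pad = (n: number): string => String(n).padStart(2, '0')
  return `${pad(now.getHours())}:${pad(now.getMinutes())}:${pad(now.getSeconds())}`
}

// Build the same "[LEVEL HH:MM:SS]" prefix the diff uses.
// `now` is injectable so the formatting can be tested deterministically.
function prefix(level: Level, now: Date = new Date()): string {
  return `[${level} ${timeString(now)}]`
}

// Route each level to the console method of matching severity.
const LeveledLog = {
  info: (...msg: unknown[]): void => console.log(prefix('INFO'), ...msg),
  warn: (...msg: unknown[]): void => console.warn(prefix('WARN'), ...msg),
  error: (...msg: unknown[]): void => console.error(prefix('ERROR'), ...msg)
}
```

Usage would stay the same as in the diff, for example `LeveledLog.error('Error parsing JSON:', e)`.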
--- a/src/renderer/src/components/CaptionLog.vue
+++ b/src/renderer/src/components/CaptionLog.vue
@@ -136,6 +136,7 @@ import { useCaptionLogStore } from '@renderer/stores/captionLog'
 import { message } from 'ant-design-vue'
 import { useI18n } from 'vue-i18n'
 import * as tc from '../utils/timeCalc'
+import { CaptionItem } from '../types'

 const { t } = useI18n()

@@ -154,10 +155,9 @@ const baseMS = ref<number>(0)

 const pagination = ref({
  current: 1,
-  pageSize: 10,
+  pageSize: 20,
  showSizeChanger: true,
-  pageSizeOptions: ['10', '20', '50'],
-  showTotal: (total: number) => `Total: ${total}`,
+  pageSizeOptions: ['10', '20', '50', '100'],
  onChange: (page: number, pageSize: number) => {
    pagination.value.current = page
    pagination.value.pageSize = pageSize
@@ -174,12 +174,23 @@ const columns = [
    dataIndex: 'index',
    key: 'index',
    width: 80,
+    sorter: (a: CaptionItem, b: CaptionItem) => {
+      if(a.index <= b.index) return -1
+      return 1
+    },
+    sortDirections: ['descend'],
+    defaultSortOrder: 'descend',
  },
  {
    title: 'time',
    dataIndex: 'time',
    key: 'time',
    width: 160,
+    sorter: (a: CaptionItem, b: CaptionItem) => {
+      if(a.time_s <= b.time_s) return -1
+      return 1
+    },
+    sortDirections: ['descend', 'ascend'],
  },
  {
    title: 'content',
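
Review note: the two `sorter` callbacks added above return only `-1` or `1`, so two rows with equal keys are never reported as equal. A comparator that returns `0` for ties is the more conventional contract for sort callbacks. The sketch below illustrates that shape. `CaptionRow` is a hypothetical stand-in for `CaptionItem`, and treating `time_s` as a zero-padded time string that orders correctly under string comparison is an assumption.

```typescript
// Hypothetical row type; the real CaptionItem lives in '../types'.
interface CaptionRow {
  index: number
  time_s: string // assumed zero-padded, e.g. "00:00:01.000"
}

// Numeric comparator: negative, zero, or positive.
const byIndex = (a: CaptionRow, b: CaptionRow): number => a.index - b.index

// String comparator with an explicit tie case.
const byTime = (a: CaptionRow, b: CaptionRow): number => {
  if (a.time_s < b.time_s) return -1
  if (a.time_s > b.time_s) return 1
  return 0
}
```

Either function could be passed as a column's `sorter` in place of the inline callbacks.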
--- a/src/renderer/src/components/CaptionStyle.vue
+++ b/src/renderer/src/components/CaptionStyle.vue
@@ -37,7 +37,7 @@
      <a-input
        class="input-area"
        type="range"
-        min="0" max="64"
+        min="0" max="72"
        v-model:value="currentFontSize"
      />
      <div class="input-item-value">{{ currentFontSize }}px</div>
@@ -114,7 +114,7 @@
          <a-input
            class="input-area"
            type="range"
-            min="0" max="64"
+            min="0" max="72"
            v-model:value="currentTransFontSize"
          />
          <div class="input-item-value">{{ currentTransFontSize }}px</div>
@@ -159,7 +159,7 @@
          <a-input
            class="input-area"
            type="range"
-            min="0" max="10"
+            min="0" max="12"
            v-model:value="currentBlur"
          />
          <div class="input-item-value">{{ currentBlur }}px</div>
Author	SHA1	Message	Date
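himeditator	b658ef5440	feat(engine): optimize the caption engine output format, prepare to merge the two caption engines - refactor caption engine related code - prepare to merge the two caption engines	2025-07-27 17:15:12 +08:00
himeditator	3792eb88b6	refactor(engine): refactor the caption engine - update the GummyTranslator class and optimize the caption generation logic - remove the audioprcs module and move audio processing into the utils module - refactor the sysaudio module to improve the flexibility and stability of audio stream management - update TODO.md, marking sorting caption records by time in descending order as done - update the docs to state that, due to limited resources, the English and Japanese docs will no longer be maintained	2025-07-26 23:37:24 +08:00
himeditator	8e575a9ba3	refactor(engine): rename the caption engine folder, add a descending-order option to the caption records - the caption record table can be sorted by time in descending order - rename caption-engine to engine - update the paths of related files and folders - update the related content in the README and TODO docs - update the Electron build configuration	2025-07-26 21:29:16 +08:00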
himeditator	697488ce84	docs: update README, add TODO	2025-07-20 00:32:57 +08:00
				`@@ -1 +0,0 @@`
				`from .process import mergeChunkChannels, resampleRawChunk, resampleMonoChunk`