release v0.5.0

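- Updated the release notes and user manual
- Improved the interface display and features
- Filter incomplete captions output by the Gummy caption engine
feat(engine): add resource usage monitoring for the caption engine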
2026-02-04 12:24:42 +08:00 · 2025-07-15 18:48:16 +08:00 · 2025-07-15 13:52:10 +08:00 · 2025-07-14 20:07:22 +08:00 · 2025-07-13 23:28:40 +08:00 · 2025-07-11 13:25:52 +08:00
55 changed files with 1617 additions and 318 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -7,5 +7,6 @@ out
 __pycache__
 subenv
 caption-engine/build
+caption-engine/models
 output.wav
-.venv
+.venv
--- a/.npmrc
+++ b/.npmrc
@@ -1,2 +1,2 @@
-# electron_mirror=https://npmmirror.com/mirrors/electron/
-# electron_builder_binaries_mirror=https://npmmirror.com/mirrors/electron-builder-binaries/
+electron_mirror=https://npmmirror.com/mirrors/electron/
+electron_builder_binaries_mirror=https://npmmirror.com/mirrors/electron-builder-binaries/
--- a/README.md
+++ b/README.md
@@ -2,17 +2,23 @@
    <img src="./build/icon.png" width="100px" height="100px"/>
    <h1 align="center">auto-caption</h1>
    <p>Auto Caption 是一个跨平台的实时字幕显示软件。</p>
-    <img src="https://img.shields.io/badge/version-0.3.0-blue">
-    <img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange">
-    <img src="https://img.shields.io/github/languages/top/HiMeditator/auto-caption?color=royalblue">
-    <img src="https://img.shields.io/github/repo-size/HiMeditator/auto-caption?color=green">
-    <img src="https://visitor-badge.laobi.icu/badge?page_id=himeditator.github.io">
+    <p>
+      <a href="https://github.com/HiMeditator/auto-caption/releases">
+        <img src="https://img.shields.io/badge/release-0.5.0-blue">
+      </a>
+      <a href="https://github.com/HiMeditator/auto-caption/issues">
+        <img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange">
+      </a>
+      <img src="https://img.shields.io/github/languages/top/HiMeditator/auto-caption?color=royalblue">
+      <img src="https://img.shields.io/github/repo-size/HiMeditator/auto-caption?color=green">
+      <img src="https://img.shields.io/github/stars/HiMeditator/auto-caption?style=social">
+    </p>
    <p>
        | <b>简体中文</b>
        | <a href="./README_en.md">English</a>
        | <a href="./README_ja.md">日本語</a> |
    </p>
-    <p><i>v0.3.0版本已经发布。预计将添加本地字幕引擎的v1.0.0版本仍正在开发中...</i></p>
+    <p><i>v0.5.0 版本已经发布。<b>目前 Vosk 本地字幕引擎效果较差，且不含翻译</b>，更优秀的字幕引擎正在尝试开发中...</i></p>
 </div>

 ![](./assets/media/main_zh.png)
@@ -31,18 +37,32 @@

 ## 📖 基本使用

-目前提供了 Windows 和 macOS 平台的可安装版本。如果要使用默认的 Gummy 字幕引擎，首先需要获取阿里云百炼平台的 API KEY，然后将 API KEY 添加到软件设置中或者配置到环境变量中（仅 Windows 平台支持读取环境变量中的 API KEY），这样才能正常使用该模型。
+软件已经适配了 Windows、macOS 和 Linux 平台。测试过的平台信息如下：

-![](./assets/media/api_zh.png)
+| 操作系统版本        | 处理器架构 | 获取系统音频输入 | 获取系统音频输出 |
+| ------------------ | ---------- | ---------------- | ---------------- |
+| Windows 11 24H2    | x64        | ✅               | ✅                |
+| macOS Sequoia 15.5 | arm64      | ✅               | ✅ 需要额外配置    |
+| Ubuntu 24.04.2     | x64        | ✅               | ✅                |
+| Kali Linux 2022.3  | x64        | ✅               | ✅                |

-**国际版的阿里云服务并没有提供 Gummy 模型，因此目前非中国用户无法使用默认字幕引擎。我正在开发新的本地字幕引擎，以确保所有用户都有默认字幕引擎可以使用。**
+macOS 平台和 Linux 平台获取系统音频输出需要进行额外设置，详见[Auto Caption 用户手册](./docs/user-manual/zh.md)。

-相关教程：
+> 国际版的阿里云服务并没有提供 Gummy 模型，因此目前非中国用户无法使用 Gummy 字幕引擎。
+
+如果要使用默认的 Gummy 字幕引擎（使用云端模型进行语音识别和翻译），首先需要获取阿里云百炼平台的 API KEY，然后将 API KEY 添加到软件设置中或者配置到环境变量中（仅 Windows 平台支持读取环境变量中的 API KEY），这样才能正常使用该模型。相关教程：

 - [获取 API KEY](https://help.aliyun.com/zh/model-studio/get-api-key)
 - [将 API Key 配置到环境变量](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)

-如果你想了解字幕引擎的工作原理，或者你想开发自己的字幕引擎，请参考[字幕引擎说明文档](./docs/engine-manual/zh.md)。
+> Vosk 模型的识别效果较差，请谨慎使用。
+
+如果要使用 Vosk 本地字幕引擎，首先需要在 [Vosk Models](https://alphacephei.com/vosk/models) 页面下载你需要的模型，并将模型解压到本地，并将模型文件夹的路径添加到软件的设置中。目前 Vosk 字幕引擎还不支持翻译字幕内容。
+
+![](./assets/media/vosk_zh.png)
+
+**如果你觉得上述字幕引擎不能满足你的需求，而且你会 Python，那么你可以考虑开发自己的字幕引擎。详细说明请参考[字幕引擎说明文档](./docs/engine-manual/zh.md)。**
+
 ## ✨ 特性

 - 跨平台、多界面语言支持
@@ -52,13 +72,9 @@
 - 字幕记录展示与导出
 - 生成音频输出或麦克风输入的字幕

-说明：
- Windows 和 macOS 平台支持生成音频输出和麦克风输入的字幕，但是 **macOS 平台获取系统音频输出需要进行设置，详见[Auto Caption 用户手册](./docs/user-manual/zh.md)**
- Linux 平台目前无法获取系统音频输出，仅支持生成麦克风输入的字幕
-
 ## ⚙️ 自带字幕引擎说明

-目前软件自带 1 个字幕引擎，正在规划 2 个新的引擎。它们的详细信息如下。
+目前软件自带 2 个字幕引擎，正在规划 1 个新的引擎。它们的详细信息如下。

 ### Gummy 字幕引擎（云端）

@@ -87,7 +103,7 @@ $$

 ### Vosk 字幕引擎（本地）

-预计基于 [vosk-api](https://github.com/alphacep/vosk-api) 进行开发，正在实验中。
+基于 [vosk-api](https://github.com/alphacep/vosk-api) 开发。目前只支持生成音频对应的原文，不支持生成翻译内容。

 ### FunASR 字幕引擎（本地）

@@ -123,18 +139,37 @@ subenv/Scripts/activate
 source subenv/bin/activate
 ```

-然后安装依赖（注意如果是 Linux 或 macOS 环境，需要注释掉 `requirements.txt` 中的 `PyAudioWPatch`，该模块仅适用于 Windows 环境）。
-
-> 这一步可能会报错，一般是因为构建失败，需要根据报错信息安装对应的构建工具包。
+然后安装依赖（这一步可能会报错，一般是因为构建失败，需要根据报错信息安装对应的工具包）：

 ```bash
-pip install -r requirements.txt
+# Windows
+pip install -r requirements_win.txt
+# macOS
+pip install -r requirements_darwin.txt
+# Linux
+pip install -r requirements_linux.txt
+```
+
+如果在 Linux 系统上安装 samplerate 模块报错，可以尝试使用以下命令单独安装：
+
+```bash
+pip install samplerate --only-binary=:all:
 ```

 然后使用 `pyinstaller` 构建项目：

 ```bash
-pyinstaller --onefile main-gummy.py
+pyinstaller ./main-gummy.spec
+pyinstaller ./main-vosk.spec
+```
+
+注意 `main-vosk.spec` 文件中 `vosk` 库的路径可能不正确，需要根据实际状况配置。
+
+```
+# Windows
+vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
+# Linux or macOS
+vosk_path = str(Path('./subenv/lib/python3.x/site-packages/vosk').resolve())
 ```

 此时项目构建完成，在进入 `caption-engine/dist` 文件夹可见对应的可执行文件。即可进行后续操作。
@@ -144,6 +179,7 @@ pyinstaller --onefile main-gummy.py
 ```bash
 npm run dev
 ```
+
 ### 构建项目

 注意目前软件只在 Windows 和 macOS 平台上进行了构建和测试，无法保证软件在 Linux 平台下的正确性。
@@ -156,3 +192,19 @@ npm run build:mac
 # For Linux
 npm run build:linux
 ```
+
+注意，根据不同的平台需要修改项目根目录下 `electron-builder.yml` 文件中的配置内容：
+
+```yml
+extraResources:
+  # For Windows
+  - from: ./caption-engine/dist/main-gummy.exe
+    to: ./caption-engine/main-gummy.exe
+  - from: ./caption-engine/dist/main-vosk.exe
+    to: ./caption-engine/main-vosk.exe
+  # For macOS and Linux
+  # - from: ./caption-engine/dist/main-gummy
+  #   to: ./caption-engine/main-gummy
+  # - from: ./caption-engine/dist/main-vosk
+  #   to: ./caption-engine/main-vosk
+```
--- a/README_en.md
+++ b/README_en.md
@@ -2,17 +2,23 @@
    <img src="./build/icon.png" width="100px" height="100px"/>
    <h1 align="center">auto-caption</h1>
    <p>Auto Caption is a cross-platform real-time caption display software.</p>
-    <img src="https://img.shields.io/badge/version-0.3.0-blue">
-    <img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange">
-    <img src="https://img.shields.io/github/languages/top/HiMeditator/auto-caption?color=royalblue">
-    <img src="https://img.shields.io/github/repo-size/HiMeditator/auto-caption?color=green">
-    <img src="https://visitor-badge.laobi.icu/badge?page_id=himeditator.github.io">
+    <p>
+      <a href="https://github.com/HiMeditator/auto-caption/releases">
+        <img src="https://img.shields.io/badge/release-0.5.0-blue">
+      </a>
+      <a href="https://github.com/HiMeditator/auto-caption/issues">
+        <img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange">
+      </a>
+      <img src="https://img.shields.io/github/languages/top/HiMeditator/auto-caption?color=royalblue">
+      <img src="https://img.shields.io/github/repo-size/HiMeditator/auto-caption?color=green">
+      <img src="https://img.shields.io/github/stars/HiMeditator/auto-caption?style=social">
+    </p>
    <p>
        | <a href="./README.md">简体中文</a>
        | <b>English</b>
        | <a href="./README_ja.md">日本語</a> |
    </p>
-    <p><i>Version v0.3.0 has been released. Version v1.0.0, which is expected to add a local caption engine, is still under development...</i></p>
+    <p><i>Version v0.5.0 has been released. <b>The current Vosk local caption engine performs poorly and does not include translation</b>. A better caption engine is under development...</i></p>
 </div>

 ![](./assets/media/main_en.png)
@@ -31,18 +37,31 @@

 ## 📖 Basic Usage

-Currently, installable versions are provided for Windows and macOS platforms. To use the default Gummy caption engine, you first need to obtain an API KEY from Alibaba Cloud Bailian platform, then add the API KEY to the software settings or configure it in environment variables (only Windows platform supports reading API KEY from environment variables) to enable normal usage of this model.
+The software has been adapted for Windows, macOS, and Linux platforms. The tested platform information is as follows:

-![](./assets/media/api_en.png)
+| OS Version         | Architecture | System Audio Input | System Audio Output |
+| ------------------ | ------------ | ------------------ | ------------------- |
+| Windows 11 24H2    | x64          | ✅                 | ✅                   |
+| macOS Sequoia 15.5 | arm64        | ✅                 | ✅ Additional config required |
+| Ubuntu 24.04.2     | x64          | ✅                 | ✅                   |
+| Kali Linux 2022.3  | x64          | ✅                 | ✅                   |

-**The international version of Alibaba Cloud services does not provide the Gummy model, so currently non-Chinese users cannot use the default caption engine. I'm developing a new local caption engine to ensure all users have a default caption engine available.**
+Additional configuration is required to capture system audio output on macOS and Linux platforms. See [Auto Caption User Manual](./docs/user-manual/en.md) for details.

-Related tutorials:
+> The international version of Alibaba Cloud services does not provide the Gummy model, so non-Chinese users currently cannot use the Gummy caption engine.

- [Obtain API KEY (Chinese)](https://help.aliyun.com/zh/model-studio/get-api-key)
- [Configure API Key in Environment Variables (Chinese)](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
+To use the default Gummy caption engine (which uses cloud-based models for speech recognition and translation), you first need to obtain an API KEY from the Alibaba Cloud Bailian platform. Then add the API KEY to the software settings or configure it in environment variables (only Windows platform supports reading API KEY from environment variables) to properly use this model. Related tutorials:

-If you want to understand how the caption engine works, or if you want to develop your own caption engine, please refer to [Caption Engine Documentation](./docs/engine-manual/en.md).
+- [Obtaining API KEY (Chinese)](https://help.aliyun.com/zh/model-studio/get-api-key)
+- [Configuring API Key through Environment Variables (Chinese)](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
+
+> The recognition performance of Vosk models is suboptimal; please use with caution.
+
+To use the Vosk local caption engine, download the model you need from the [Vosk Models](https://alphacephei.com/vosk/models) page, extract it locally, and add the path of the model folder to the software settings. The Vosk caption engine does not currently support caption translation.
+
+![](./assets/media/vosk_en.png)
+
+**If you find the above caption engines don't meet your needs and you know Python, you may consider developing your own caption engine. For detailed instructions, please refer to the [Caption Engine Documentation](./docs/engine-manual/en.md).**

 ## ✨ Features

@@ -53,13 +72,9 @@ If you want to understand how the caption engine works, or if you want to develo
 - Caption recording display and export
 - Generate captions for audio output or microphone input

-Notes:
- Windows and macOS platforms support generating captions for both audio output and microphone input, but **macOS requires additional setup to capture system audio output. See [Auto Caption User Manual](./docs/user-manual/en.md) for details.**
- Linux platform currently cannot capture system audio output, only supports generating subtitles for microphone input.
-
 ## ⚙️ Built-in Subtitle Engines

-Currently, the software comes with 1 subtitle engine, with 2 new engines planned. Details are as follows.
+Currently, the software comes with 2 subtitle engines, with 1 new engine planned. Details are as follows.

 ### Gummy Subtitle Engine (Cloud)

@@ -88,7 +103,7 @@ The engine only uploads data when receiving audio streams, so the actual upload

 ### Vosk Subtitle Engine (Local)

-Planned to be developed based on [vosk-api](https://github.com/alphacep/vosk-api), currently in experimentation.
+Built on [vosk-api](https://github.com/alphacep/vosk-api). It currently generates only the source-language text of the audio and does not produce translations.

 ### FunASR Subtitle Engine (Local)

@@ -124,18 +139,37 @@ subenv/Scripts/activate
 source subenv/bin/activate
 ```

-Then install dependencies (note: for Linux or macOS environments, you need to comment out `PyAudioWPatch` in `requirements.txt`, as this module is only for Windows environments).
-
-> This step may report errors, usually due to build failures. You need to install corresponding build tools based on the error messages.
+Then install dependencies. This step may fail, usually because a package fails to build; in that case, install the tool packages named in the error messages:

 ```bash
-pip install -r requirements.txt
+# Windows
+pip install -r requirements_win.txt
+# macOS
+pip install -r requirements_darwin.txt
+# Linux
+pip install -r requirements_linux.txt
+```
+
+If you encounter errors when installing the `samplerate` module on Linux systems, you can try installing it separately with this command:
+
+```bash
+pip install samplerate --only-binary=:all:
 ```

 Then use `pyinstaller` to build the project:

 ```bash
-pyinstaller --onefile main-gummy.py
+pyinstaller ./main-gummy.spec
+pyinstaller ./main-vosk.spec
+```
+
+Note that the path to the `vosk` library in `main-vosk.spec` may not match your environment; adjust it to the location where `vosk` is actually installed.
+
+```
+# Windows
+vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
+# Linux or macOS
+vosk_path = str(Path('./subenv/lib/python3.x/site-packages/vosk').resolve())
 ```

 After the build completes, you can find the executable file in the `caption-engine/dist` folder. Then proceed with subsequent operations.
@@ -158,3 +192,19 @@ npm run build:mac
 # For Linux
 npm run build:linux
 ```
+
+Note: You need to modify the configuration content in the `electron-builder.yml` file in the project root directory according to different platforms:
+
+```yml
+extraResources:
+  # For Windows
+  - from: ./caption-engine/dist/main-gummy.exe
+    to: ./caption-engine/main-gummy.exe
+  - from: ./caption-engine/dist/main-vosk.exe
+    to: ./caption-engine/main-vosk.exe
+  # For macOS and Linux
+  # - from: ./caption-engine/dist/main-gummy
+  #   to: ./caption-engine/main-gummy
+  # - from: ./caption-engine/dist/main-vosk
+  #   to: ./caption-engine/main-vosk
+```
--- a/README_ja.md
+++ b/README_ja.md
@@ -2,17 +2,23 @@
    <img src="./build/icon.png" width="100px" height="100px"/>
    <h1 align="center">auto-caption</h1>
    <p>Auto Caption はクロスプラットフォームのリアルタイム字幕表示ソフトウェアです。</p>
-    <img src="https://img.shields.io/badge/version-0.3.0-blue">
-    <img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange">
-    <img src="https://img.shields.io/github/languages/top/HiMeditator/auto-caption?color=royalblue">
-    <img src="https://img.shields.io/github/repo-size/HiMeditator/auto-caption?color=green">
-    <img src="https://visitor-badge.laobi.icu/badge?page_id=himeditator.github.io">
+    <p>
+      <a href="https://github.com/HiMeditator/auto-caption/releases">
+        <img src="https://img.shields.io/badge/release-0.5.0-blue">
+      </a>
+      <a href="https://github.com/HiMeditator/auto-caption/issues">
+        <img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange">
+      </a>
+      <img src="https://img.shields.io/github/languages/top/HiMeditator/auto-caption?color=royalblue">
+      <img src="https://img.shields.io/github/repo-size/HiMeditator/auto-caption?color=green">
+      <img src="https://img.shields.io/github/stars/HiMeditator/auto-caption?style=social">
+    </p>
    <p>
        | <a href="./README.md">简体中文</a>
        | <a href="./README_en.md">English</a>
        | <b>日本語</b> |
    </p>
-    <p><i>v0.3.0 バージョンがリリースされました。ローカル字幕エンジンを追加予定の v1.0.0 バージョンを現在開発中...</i></p>
+    <p><i>バージョン v0.5.0 がリリースされました。<b>現在の Vosk ローカル字幕エンジンは性能が低く、翻訳機能も含まれていません</b>。より優れた字幕エンジンを開発中です...</i></p>
 </div>

 ![](./assets/media/main_ja.png)
@@ -29,18 +35,33 @@

 [プロジェクト API ドキュメント（中国語）](./docs/api-docs/electron-ipc.md)

-現在、Windows と macOS プラットフォーム向けのインストール可能なバージョンを提供しています。デフォルトの Gummy 字幕エンジンを使用するには、まず Alibaba Cloud Bailian プラットフォームから API KEY を取得し、その API KEY をソフトウェア設定に追加するか、環境変数に設定する必要があります（Windows プラットフォームのみ環境変数からの API KEY 読み取りをサポートしています）。
+## 📖 基本使い方

-![](./assets/media/api_ja.png)
+このソフトウェアはWindows、macOS、Linuxプラットフォームに対応しています。テスト済みのプラットフォーム情報は以下の通りです：

-**国際版の Alibaba Cloud サービスには Gummy モデルが提供されていないため、現在中国以外のユーザーはデフォルトの字幕エンジンを使用できません。すべてのユーザーがデフォルトの字幕エンジンを使用できるように、新しいローカル字幕エンジンを開発中です。**
+| OS バージョン | アーキテクチャ | システムオーディオ入力 | システムオーディオ出力 |
+| ------------------ | ------------ | ------------------ | ------------------- |
+| Windows 11 24H2    | x64          | ✅                 | ✅                   |
+| macOS Sequoia 15.5 | arm64        | ✅                 | ✅ 追加設定が必要     |
+| Ubuntu 24.04.2     | x64          | ✅                 | ✅                   |
+| Kali Linux 2022.3  | x64          | ✅                 | ✅                   |

-関連チュートリアル：
+macOSおよびLinuxプラットフォームでシステムオーディオ出力を取得するには追加設定が必要です。詳細は[Auto Captionユーザーマニュアル](./docs/user-manual/ja.md)をご覧ください。
+
+> 阿里雲の国際版サービスでは Gummy モデルを提供していないため、現在中国以外のユーザーは Gummy 字幕エンジンを使用できません。
+
+デフォルトの Gummy 字幕エンジン（クラウドベースのモデルを使用した音声認識と翻訳）を使用するには、まず阿里雲百煉プラットフォームから API KEY を取得する必要があります。その後、API KEY をソフトウェア設定に追加するか、環境変数に設定します（Windows プラットフォームのみ環境変数からの API KEY 読み取りをサポート）。関連チュートリアル：

 - [API KEY の取得（中国語）](https://help.aliyun.com/zh/model-studio/get-api-key)
- [環境変数への API Key 設定（中国語）](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
+- [環境変数を通じて API Key を設定（中国語）](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)

-字幕エンジンの動作原理を理解したい場合、または独自の字幕エンジンを開発したい場合は、[字幕エンジン説明ドキュメント](./docs/engine-manual/ja.md)を参照してください。
+> Vosk モデルの認識精度は低いため、注意してご使用ください。
+
+Vosk ローカル字幕エンジンを使用するには、まず [Vosk Models](https://alphacephei.com/vosk/models) ページから必要なモデルをダウンロードし、ローカルに解凍した後、モデルフォルダのパスをソフトウェア設定に追加してください。現在、Vosk 字幕エンジンは字幕の翻訳をサポートしていません。
+
+![](./assets/media/vosk_ja.png)
+
+**上記の字幕エンジンがご要望を満たさず、かつ Python の知識をお持ちの場合、独自の字幕エンジンを開発することも可能です。詳細な説明は[字幕エンジン説明書](./docs/engine-manual/ja.md)をご参照ください。**

 ## ✨ 特徴

@@ -51,13 +72,9 @@
 - 字幕記録の表示とエクスポート
 - オーディオ出力またはマイク入力からの字幕生成

-注記：
- Windows と macOS プラットフォームはオーディオ出力とマイク入力の両方からの字幕生成をサポートしていますが、**macOS プラットフォームでシステムオーディオ出力を取得するには設定が必要です。詳細は[Auto Caption ユーザーマニュアル](./docs/user-manual/ja.md)をご覧ください。**
- Linux プラットフォームは現在システムオーディオ出力を取得できず、マイク入力からの字幕生成のみをサポートしています。
-
 ## ⚙️ 字幕エンジン説明

-現在ソフトウェアには1つの字幕エンジンが組み込まれており、2つの新しいエンジンを計画中です。詳細は以下の通りです。
+現在ソフトウェアには2つの字幕エンジンが組み込まれており、1つの新しいエンジンを計画中です。詳細は以下の通りです。

 ### Gummy 字幕エンジン（クラウド）

@@ -86,7 +103,7 @@ $$

 ### Vosk字幕エンジン（ローカル）

-[vosk-api](https://github.com/alphacep/vosk-api) をベースに開発予定で、現在実験中です。
+[vosk-api](https://github.com/alphacep/vosk-api) をベースに開発されています。現在は音声に対応する原文の生成のみをサポートしており、翻訳コンテンツはサポートしていません。

 ### FunASR字幕エンジン（ローカル）

@@ -122,18 +139,37 @@ subenv/Scripts/activate
 source subenv/bin/activate
 ```

-その後、依存関係をインストールします（Linux または macOS 環境の場合、`requirements.txt` 内の `PyAudioWPatch` をコメントアウトする必要があります。このモジュールは Windows 環境専用です）。
-
-> このステップでエラーが発生する場合があります。一般的にはビルド失敗が原因で、エラーメッセージに基づいて対応するビルドツールパッケージをインストールする必要があります。
+次に依存関係をインストールします（このステップは失敗する可能性があります、通常はビルド失敗が原因です - エラーメッセージに基づいて対応するツールパッケージをインストールする必要があります）：

 ```bash
-pip install -r requirements.txt
+# Windows
+pip install -r requirements_win.txt
+# macOS
+pip install -r requirements_darwin.txt
+# Linux
+pip install -r requirements_linux.txt
+```
+
+Linuxシステムで`samplerate`モジュールのインストールに問題が発生した場合、以下のコマンドで個別にインストールを試すことができます：
+
+```bash
+pip install samplerate --only-binary=:all:
 ```

 その後、`pyinstaller` を使用してプロジェクトをビルドします：

 ```bash
-pyinstaller --onefile main-gummy.py
+pyinstaller ./main-gummy.spec
+pyinstaller ./main-vosk.spec
+```
+
+`main-vosk.spec` ファイル内の `vosk` ライブラリのパスが正しくない可能性があるため、実際の状況に応じて設定する必要があります。
+
+```
+# Windows
+vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
+# LinuxまたはmacOS
+vosk_path = str(Path('./subenv/lib/python3.x/site-packages/vosk').resolve())
 ```

 これでプロジェクトのビルドが完了し、`caption-engine/dist` フォルダ内に対応する実行可能ファイルが確認できます。その後、次の操作に進むことができます。
@@ -156,3 +192,19 @@ npm run build:mac
 # Linux 用
 npm run build:linux
 ```
+
+注意: プラットフォームに応じて、プロジェクトルートディレクトリにある `electron-builder.yml` ファイルの設定内容を変更する必要があります:
+
+```yml
+extraResources:
+  # Windows用
+  - from: ./caption-engine/dist/main-gummy.exe
+    to: ./caption-engine/main-gummy.exe
+  - from: ./caption-engine/dist/main-vosk.exe
+    to: ./caption-engine/main-vosk.exe
+  # macOSとLinux用
+  # - from: ./caption-engine/dist/main-gummy
+  #   to: ./caption-engine/main-gummy
+  # - from: ./caption-engine/dist/main-vosk
+  #   to: ./caption-engine/main-vosk
+```
--- a/assets/media/api_en.png
+++ b/assets/media/api_en.png
--- a/assets/media/api_ja.png
+++ b/assets/media/api_ja.png
--- a/assets/media/api_zh.png
+++ b/assets/media/api_zh.png
--- a/assets/media/main_en.png
+++ b/assets/media/main_en.png
--- a/assets/media/main_ja.png
+++ b/assets/media/main_ja.png
--- a/assets/media/main_zh.png
+++ b/assets/media/main_zh.png
--- a/assets/media/vosk_en.png
+++ b/assets/media/vosk_en.png
--- a/assets/media/vosk_ja.png
+++ b/assets/media/vosk_ja.png
--- a/assets/media/vosk_zh.png
+++ b/assets/media/vosk_zh.png
--- a/caption-engine/audioprcs/__init__.py
+++ b/caption-engine/audioprcs/__init__.py
@@ -1 +1 @@
-from .process import mergeChunkChannels, resampleRawChunk
+from .process import mergeChunkChannels, resampleRawChunk, resampleMonoChunk
--- a/caption-engine/audioprcs/process.py
+++ b/caption-engine/audioprcs/process.py
@@ -47,3 +47,22 @@ def resampleRawChunk(chunk, channels, orig_sr, target_sr, mode="sinc_best"):
    chunk_mono_r =  samplerate.resample(chunk_mono, ratio, converter_type=mode)
    chunk_mono_r = np.round(chunk_mono_r).astype(np.int16)
    return chunk_mono_r.tobytes()
+
+def resampleMonoChunk(chunk, orig_sr, target_sr, mode="sinc_best"):
+    """
+    Resample a single-channel (mono) audio chunk.
+
+    Args:
+        chunk: (bytes) mono audio chunk of 16-bit integer samples
+        orig_sr: original sample rate
+        target_sr: target sample rate
+        mode: resampling mode, one of 'sinc_best' | 'sinc_medium' | 'sinc_fastest' | 'zero_order_hold' | 'linear'
+
+    Returns:
+        (bytes) resampled mono audio chunk
+    """
+    chunk_np = np.frombuffer(chunk, dtype=np.int16)
+    ratio = target_sr / orig_sr
+    chunk_r = samplerate.resample(chunk_np, ratio, converter_type=mode)
+    chunk_r = np.round(chunk_r).astype(np.int16)
+    return chunk_r.tobytes()
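
Review note: a minimal sketch of what the `ratio = target_sr / orig_sr` step in `resampleMonoChunk` does to the chunk length. It deliberately swaps the libsamplerate converters for plain linear interpolation so that it runs with the standard library only. `resample_mono_linear` is a hypothetical name that does not exist in the repository, and its output will not match `samplerate.resample` sample for sample.

```python
import array


def resample_mono_linear(chunk: bytes, orig_sr: int, target_sr: int) -> bytes:
    """Resample 16-bit mono PCM by linear interpolation (illustrative stand-in)."""
    samples = array.array("h")  # native-endian signed 16-bit, like np.int16
    samples.frombytes(chunk)
    if len(samples) == 0:
        return b""

    # Same ratio as in resampleMonoChunk: > 1 upsamples, < 1 downsamples.
    ratio = target_sr / orig_sr
    out_len = max(1, round(len(samples) * ratio))

    out = array.array("h")
    last = len(samples) - 1
    for i in range(out_len):
        pos = i / ratio                 # position in the input signal
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        value = samples[left] * (1.0 - frac) + samples[right] * frac
        out.append(int(round(value)))
    return out.tobytes()
```

Going from 48000 Hz to 16000 Hz, for instance, gives a ratio of 1/3, so a 480-sample chunk comes out as roughly 160 samples.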
--- a/caption-engine/main-vosk.py
+++ b/caption-engine/main-vosk.py
@@ -0,0 +1,83 @@
+import sys
+import json
+import argparse
+from datetime import datetime
+import numpy.core.multiarray
+
+if sys.platform == 'win32':
+    from sysaudio.win import AudioStream
+elif sys.platform == 'darwin':
+    from sysaudio.darwin import AudioStream
+elif sys.platform == 'linux':
+    from sysaudio.linux import AudioStream
+else:
+    raise NotImplementedError(f"Unsupported platform: {sys.platform}")
+
+from vosk import Model, KaldiRecognizer, SetLogLevel
+from audioprcs import resampleRawChunk
+
+SetLogLevel(-1)
+
+def convert_audio_to_text(audio_type, chunk_rate, model_path):
+    sys.stdout.reconfigure(line_buffering=True) # type: ignore
+
+    if model_path.startswith('"'):
+        model_path = model_path[1:]
+    if model_path.endswith('"'):
+        model_path = model_path[:-1]
+
+    model = Model(model_path)
+    recognizer = KaldiRecognizer(model, 16000)
+
+    stream = AudioStream(audio_type, chunk_rate)
+    stream.openStream()
+
+    time_str = ''
+    cur_id = 0
+    prev_content = ''
+
+    while True:
+        chunk = stream.read_chunk()
+        chunk_mono = resampleRawChunk(chunk, stream.CHANNELS, stream.RATE, 16000)
+
+        caption = {}
+        if recognizer.AcceptWaveform(chunk_mono):
+            content = json.loads(recognizer.Result()).get('text', '')
+            caption['index'] = cur_id
+            caption['text'] = content
+            caption['time_s'] = time_str
+            caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+            caption['translation'] = ''
+            prev_content = ''
+            cur_id += 1
+        else:
+            content = json.loads(recognizer.PartialResult()).get('partial', '')
+            if content == '' or content == prev_content:
+                continue
+            if prev_content == '':
+                time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+            caption['index'] = cur_id
+            caption['text'] = content
+            caption['time_s'] = time_str
+            caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+            caption['translation'] = ''
+            prev_content = content
+        try:
+            json_str = json.dumps(caption) + '\n'
+            sys.stdout.write(json_str)
+            sys.stdout.flush()
+        except Exception as e:
+            print(e, file=sys.stderr)  # keep stdout clean for the JSON caption stream
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description='Convert system audio stream to text')
+    parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output audio stream, 1 for input audio stream')
+    parser.add_argument('-c', '--chunk_rate', default=20, help='The number of audio stream chunks collected per second.')
+    parser.add_argument('-m', '--model_path', default='', help='The path to the vosk model.')
+    args = parser.parse_args()
+    convert_audio_to_text(
+        int(args.audio_type),
+        int(args.chunk_rate),
+        args.model_path
+    )
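
Review note: the loop in `main-vosk.py` writes one JSON object per line to stdout, and partial results reuse the same `index` until a final result arrives. A consumer could merge such lines as sketched below. This is only an illustration under that reading of the code; `apply_caption_line` is a hypothetical helper, not the actual Electron-side reader.

```python
import json


def apply_caption_line(captions: dict, line: str) -> dict:
    """Merge one stdout line from the engine into a dict keyed by caption index."""
    line = line.strip()
    if not line:
        return captions
    try:
        caption = json.loads(line)
    except json.JSONDecodeError:
        return captions  # ignore anything on stdout that is not JSON
    if not isinstance(caption, dict) or "index" not in caption:
        return captions  # defensive: skip objects without an index
    # A later line with the same index replaces the earlier partial text.
    captions[caption["index"]] = caption
    return captions
```

Because the key is `index`, a partial caption is overwritten in place by later partials and by the final result, which mirrors how `cur_id` only advances after `recognizer.AcceptWaveform` returns true.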
--- a/caption-engine/main-vosk.spec
+++ b/caption-engine/main-vosk.spec
@@ -0,0 +1,46 @@
+# -*- mode: python ; coding: utf-8 -*-
+
+from pathlib import Path
+import sys
+
+if sys.platform == 'win32':
+    vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
+else:
+    vosk_path = str(Path('./subenv/lib/python3.12/site-packages/vosk').resolve())
+
+a = Analysis(
+    ['main-vosk.py'],
+    pathex=[],
+    binaries=[],
+    datas=[(vosk_path, 'vosk')],
+    hiddenimports=[],
+    hookspath=[],
+    hooksconfig={},
+    runtime_hooks=[],
+    excludes=[],
+    noarchive=False,
+    optimize=0,
+)
+
+pyz = PYZ(a.pure)
+
+exe = EXE(
+    pyz,
+    a.scripts,
+    a.binaries,
+    a.datas,
+    [],
+    name='main-vosk',
+    debug=False,
+    bootloader_ignore_signals=False,
+    strip=False,
+    upx=True,
+    upx_exclude=[],
+    runtime_tmpdir=None,
+    console=True,
+    disable_windowed_traceback=False,
+    argv_emulation=False,
+    target_arch=None,
+    codesign_identity=None,
+    entitlements_file=None,
+)
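
Review note: the spec file hardcodes `python3.12` in the non-Windows path, which the README changes in this commit flag as something to adjust by hand. One possible alternative is to ask the running interpreter for its site-packages directory. The sketch assumes `pyinstaller` is launched from the same virtual environment that has `vosk` installed; `find_vosk_path` is a hypothetical helper, not part of the repository.

```python
import sysconfig
from pathlib import Path


def find_vosk_path() -> str:
    """Return the expected vosk package directory for the current interpreter."""
    # "platlib" is the site-packages directory for platform-specific packages;
    # in a virtual environment it is normally the same directory as "purelib".
    site_packages = Path(sysconfig.get_paths()["platlib"])
    return str((site_packages / "vosk").resolve())
```

In the spec file this would stand in for the `if sys.platform == 'win32'` branch, for example `vosk_path = find_vosk_path()`. The function does not check that the directory exists.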
--- a/caption-engine/requirements_darwin.txt
+++ b/caption-engine/requirements_darwin.txt
@@ -2,5 +2,5 @@ dashscope
 numpy
 samplerate
 PyAudio
-PyAudioWPatch # Windows only
+vosk
 pyinstaller
--- a/caption-engine/requirements_linux.txt
+++ b/caption-engine/requirements_linux.txt
@@ -0,0 +1,5 @@
+dashscope
+numpy
+vosk
+pyinstaller
+samplerate # pip install samplerate --only-binary=:all:
--- a/caption-engine/requirements_win.txt
+++ b/caption-engine/requirements_win.txt
@@ -0,0 +1,7 @@
+dashscope
+numpy
+samplerate
+PyAudio
+PyAudioWPatch
+vosk
+pyinstaller
--- a/caption-engine/sysaudio/linux.py
+++ b/caption-engine/sysaudio/linux.py
@@ -1,7 +1,34 @@
 """获取 Linux 系统音频输入流"""

-import pyaudio
+import subprocess

+def findMonitorSource():
+    result = subprocess.run(
+        ["pactl", "list", "short", "sources"],
+        stdout=subprocess.PIPE, text=True
+    )
+    lines = result.stdout.splitlines()
+
+    for line in lines:
+        parts = line.split('\t')
+        if len(parts) >= 2 and ".monitor" in parts[1]:
+            return parts[1]
+
+    raise RuntimeError("System output monitor device not found")
+
+def findInputSource():
+    result = subprocess.run(
+        ["pactl", "list", "short", "sources"],
+        stdout=subprocess.PIPE, text=True
+    )
+    lines = result.stdout.splitlines()
+
+    for line in lines:
+        parts = line.split('\t')
+        # Skip malformed lines, as findMonitorSource does
+        if len(parts) >= 2 and ".monitor" not in parts[1]:
+            return parts[1]
+    raise RuntimeError("Microphone input device not found")

 class AudioStream:
    """
@@ -13,26 +40,26 @@ class AudioStream:
    """
    def __init__(self, audio_type=1,  chunk_rate=20):
        self.audio_type = audio_type
-        self.mic = pyaudio.PyAudio()
-        self.device = self.mic.get_default_input_device_info()
-        self.stream = None
-        self.SAMP_WIDTH = pyaudio.get_sample_size(pyaudio.paInt16)
-        self.FORMAT = pyaudio.paInt16
-        self.CHANNELS = self.device["maxInputChannels"]
-        self.RATE = int(self.device["defaultSampleRate"])
+
+        if self.audio_type == 0:
+            self.source = findMonitorSource()
+        else:
+            self.source = findInputSource()
+
+        self.process = None
+
+        self.SAMP_WIDTH = 2
+        self.FORMAT = 16
+        self.CHANNELS = 2
+        self.RATE = 48000
        self.CHUNK = self.RATE // chunk_rate
-        self.INDEX = self.device["index"]

    def printInfo(self):
        dev_info = f"""
-        采样输入设备：
-            - 设备类型：{ "音频输入（Linux平台目前仅支持该项）" }
-            - 序号：{self.device['index']}
-            - 名称：{self.device['name']}
-            - 最大输入通道数：{self.device['maxInputChannels']}
-            - 默认低输入延迟：{self.device['defaultLowInputLatency']}s
-            - 默认高输入延迟：{self.device['defaultHighInputLatency']}s
-            - 默认采样率：{self.device['defaultSampleRate']}Hz
+        音频捕获进程：
+            - 捕获类型：{"音频输出" if self.audio_type == 0 else "音频输入"}
+            - 设备源：{self.source}
+            - 捕获进程PID：{self.process.pid if self.process else "None"}

        音频样本块大小：{self.CHUNK}
        样本位宽：{self.SAMP_WIDTH}
@@ -44,30 +71,24 @@ class AudioStream:

    def openStream(self):
        """
-        打开并返回系统音频输出流
+        启动音频捕获进程
        """
-        if self.stream: return self.stream
-        self.stream = self.mic.open(
-            format = self.FORMAT,
-            channels = int(self.CHANNELS),
-            rate = self.RATE,
-            input = True,
-            input_device_index = int(self.INDEX)
+        self.process = subprocess.Popen(
+            ["parec", "-d", self.source, "--format=s16le", "--rate=48000", "--channels=2"],
+            stdout=subprocess.PIPE
        )
-        return self.stream

    def read_chunk(self):
        """
        读取音频数据
        """
-        if not self.stream: return None
-        return self.stream.read(self.CHUNK)
+        if self.process:
+            return self.process.stdout.read(self.CHUNK * self.CHANNELS * self.SAMP_WIDTH)  # CHUNK is in frames, read() takes bytes
+        return None

    def closeStream(self):
        """
-        关闭系统音频输出流
+        关闭系统音频捕获进程
        """
-        if self.stream is None: return
-        self.stream.stop_stream()
-        self.stream.close()
-        self.stream = None
+        if self.process:
+            self.process.terminate()
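
Review note: `findMonitorSource` and `findInputSource` both parse the tab-separated output of `pactl list short sources`. The sketch below shows the same selection rule on a captured string, so it can be tried without PulseAudio installed. `pick_sources` is a hypothetical helper and the device names in the usage note are made up for illustration.

```python
from typing import Optional, Tuple


def pick_sources(pactl_output: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (monitor_source, input_source) from `pactl list short sources` text."""
    monitor = None
    mic = None
    for line in pactl_output.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # blank or malformed line
        name = parts[1]
        if ".monitor" in name:
            monitor = monitor or name  # first monitor source wins
        else:
            mic = mic or name          # first non-monitor source wins
    return monitor, mic
```

With input such as `"0\talsa_output.example.monitor\t...\n1\talsa_input.example\t..."`, the first element is the monitor source used for `audio_type == 0` and the second is the source used for microphone input.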
--- a/caption-engine/sysaudio/win.py
+++ b/caption-engine/sysaudio/win.py
@@ -57,7 +57,7 @@ class AudioStream:
        self.stream = None
        self.SAMP_WIDTH = pyaudio.get_sample_size(pyaudio.paInt16)
        self.FORMAT = pyaudio.paInt16
-        self.CHANNELS = self.device["maxInputChannels"]
+        self.CHANNELS = int(self.device["maxInputChannels"])
        self.RATE = int(self.device["defaultSampleRate"])
        self.CHUNK = self.RATE // chunk_rate
        self.INDEX = self.device["index"]
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -72,3 +72,36 @@
 ### 修复bug

 - 修复使用系统主题时暗色系统载入为亮色的问题
+
+## v0.4.0
+
+2025-07-11
+
+添加了 Vosk 本地字幕引擎，更新了项目文档，继续优化使用体验。
+
+### 新增功能
+
+- 添加了基于 Vosk 的字幕引擎， **当前 Vosk 字幕引擎暂不支持翻译**
+- 更新用户界面，增加 Vosk 引擎选项和模型路径设置
+
+### 优化体验
+
+- 字幕窗口右上角图标的颜色改为和字幕原文字体颜色一致
+
+## v0.5.0
+
+2025-07-15
+
+为软件本体添加了更多功能、适配了 Linux。
+
+### 新增功能
+
+- 适配了 Linux 平台
+- 新增修改字幕时间功能，可调整字幕时间
+- 支持导出 srt 格式的字幕记录
+- 支持显示字幕引擎状态（pid、ppid、CPU占用率、内存占用、运行时间）
+
+### 优化体验
+
+- 调整字幕窗口右上角图标为竖向排布
+- 过滤 Gummy 字幕引擎输出的不完整字幕
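Illustration for the SRT export mentioned in the v0.5.0 changelog entry above. The project's actual export code is not part of this hunk, so this is only a sketch of the SRT layout (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line). The function names and the `(start, end, text)` tuple shape are assumptions made for the example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(captions) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(captions, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Blocks are separated by one blank line.
    return "\n".join(blocks)


example = to_srt([(0, 1.25, "hello"), (1.25, 3.0, "world")])
```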
--- a/docs/TODO.md
+++ b/docs/TODO.md
@@ -9,12 +9,21 @@
 - [x] 添加复制字幕到剪贴板功能 *2025/07/08*
 - [x] 适配 macOS 平台 *2025/07/08*
 - [x] 添加字幕文字描边 *2025/07/09*
+- [x] 添加基于 Vosk 的字幕引擎 *2025/07/09*
+- [x] 适配 Linux 平台 *2025/07/13*
+- [x] 字幕窗口右上角图标改为竖向排布 *2025/07/14*
+- [x] 可以调整字幕时间轴 *2025/07/14*
+- [x] 可以导出 srt 格式的字幕记录 *2025/07/14*
+- [x] 可以获取字幕引擎的系统资源消耗情况 *2025/07/15*

 ## 待完成

- [ ] 添加本地字幕引擎
-  - [ ] 添加基于 Vosk 的字幕引擎
-  - [ ] 验证 / 添加基于 FunASR 的字幕引擎
+- [ ] 探索更多的语音转文字模型
+
+## 后续计划
+
+- [ ] 添加 Ollama 模型用于本地字幕引擎的翻译
+- [ ] 验证 / 添加基于 FunASR 的字幕引擎
 - [ ] 减小软件不必要的体积

 ## 遥远的未来
--- a/docs/api-docs/electron-ipc.md
+++ b/docs/api-docs/electron-ipc.md
@@ -44,6 +44,32 @@
 - 发送：无数据
 - 接收：`string`

+### `control.folder.select`
+
+**介绍：** 打开文件夹选择器，并将用户选择的文件夹路径返回给前端
+
+**发起方：** 前端控制窗口
+
+**接收方：** 后端控制窗口实例
+
+**数据类型：**
+
+- 发送：无数据
+- 接收：`string`
+
+### `control.engine.info`
+
+**介绍：** 获取字幕引擎的资源消耗情况
+
+**发起方：** 前端控制窗口
+
+**接收方：** 后端控制窗口实例
+
+**数据类型：**
+
+- 发送：无数据
+- 接收：`EngineInfo`
+
 ## 前端 ==> 后端

 ### `control.uiLanguage.change`
--- a/docs/engine-manual/en.md
+++ b/docs/engine-manual/en.md
@@ -1,6 +1,6 @@
 # Caption Engine Documentation

-Corresponding Version: v0.3.0
+Corresponding Version: v0.5.0

 ![](../../assets/media/structure_en.png)

@@ -80,6 +80,10 @@ def convert_audio_to_text(s_lang, t_lang, audio_type, chunk_rate, api_key):
            break
 ```

+### Caption Translation
+
+Some speech-to-text models don't provide translation functionality, requiring an additional translation module. This part can use either cloud-based translation APIs or local translation models.
+
 ### Data Transmission

 After obtaining the text of the current audio stream, it needs to be transmitted to the main program. The caption engine process passes the caption data to the Electron main process through standard output.
@@ -147,6 +151,51 @@ Data receiver code is as follows:
 ...
 ```

+## Usage of Caption Engine
+
+### Command Line Parameter Specification
+
+The custom caption engine settings are specified via command line parameters. Common required parameters are as follows:
+
+```python
+import argparse
+
+...
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description='Convert system audio stream to text')
+    parser.add_argument('-s', '--source_language', default='en', help='Source language code')
+    parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
+    parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output audio stream, 1 for input audio stream')
+    parser.add_argument('-c', '--chunk_rate', default=20, help='The number of audio stream chunks collected per second.')
+    parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
+    args = parser.parse_args()
+    convert_audio_to_text(
+        args.source_language,
+        args.target_language,
+        int(args.audio_type),
+        int(args.chunk_rate),
+        args.api_key
+    )
+```
+
+For example, to specify Japanese as source language, Chinese as target language, capture system audio output, and collect 0.1s audio chunks, use the following command:
+
+```bash
+python main-gummy.py -s ja -t zh -a 0 -c 10 -k <your-api-key>
+```
+
+### Packaging
+
+After development and testing, package the caption engine into an executable file using `pyinstaller`. If errors occur, check for missing dependencies.
+
+### Execution
+
+With a working caption engine, specify its path and runtime parameters in the caption software window to launch it.
+
+![](../img/02_en.png)
+
+
 ## Reference Code

-The `main-gummy.py` file under the `caption-engine` folder in this project serves as the entry point for the default caption engine. The `src\main\utils\engine.ts` file contains the server-side code for acquiring and processing data from the caption engine. You can read and understand the implementation details and the complete execution process of the caption engine as needed.
+The `main-gummy.py` file under the `caption-engine` folder in this project serves as the entry point for the default caption engine. The `src\main\utils\engine.ts` file contains the server-side code for acquiring and processing data from the caption engine. You can read and understand the implementation details and the complete execution process of the caption engine as needed.
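Sketch for the data-transmission rule described in the engine manual above: each caption should reach the Electron main process as one complete JSON line, so the writer flushes after every caption. The helper name `emit_caption` and the `text` / `translation` keys are hypothetical; the real `CaptionItem` fields are defined elsewhere in the manual and are not shown in this hunk.

```python
import io
import json
import sys


def emit_caption(caption: dict, stream=None) -> None:
    """Write one caption as a single JSON line and flush it immediately."""
    if stream is None:
        stream = sys.stdout
    # ensure_ascii=False keeps CJK text readable; json.dumps emits no raw
    # newlines, so each caption occupies exactly one output line.
    stream.write(json.dumps(caption, ensure_ascii=False) + "\n")
    stream.flush()


# Demonstration against an in-memory stream instead of the real stdout.
demo_stream = io.StringIO()
emit_caption({"text": "こんにちは", "translation": "你好"}, demo_stream)
demo_line = demo_stream.getvalue()
```

Flushing per caption keeps the receiver's `split('\n')` loop from seeing half a JSON object in most cases, although a pipe can still deliver a long line in several pieces.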
--- a/docs/engine-manual/ja.md
+++ b/docs/engine-manual/ja.md
@@ -1,6 +1,6 @@
 # 字幕エンジンの説明文書

-対応バージョン：v0.3.0
+対応バージョン：v0.5.0

 この文書は大規模モデルを使用して翻訳されていますので、内容に正確でない部分があるかもしれません。

@@ -82,6 +82,10 @@ def convert_audio_to_text(s_lang, t_lang, audio_type, chunk_rate, api_key):
            break
 ```

+### 字幕翻訳
+
+音声認識モデルによっては翻訳機能を提供していないため、別途翻訳モジュールを追加する必要があります。この部分にはクラウドベースの翻訳APIを使用することも、ローカルの翻訳モデルを使用することも可能です。
+
 ### データの伝送

 現在の音声ストリームのテキストを得たら、それをメインプログラムに渡す必要があります。字幕エンジンプロセスは標準出力を通じて電子メール主プロセスに字幕データを渡します。
@@ -121,4 +125,77 @@ sys.stdout.reconfigure(line_buffering=True)
 ...
 ```

-データ受信側のコードは
+データ受信側のコードは以下の通りです：
+
+```typescript
+// src\main\utils\engine.ts
+...
+    this.process.stdout.on('data', (data) => {
+      const lines = data.toString().split('\n');
+      lines.forEach((line: string) => {
+        if (line.trim()) {
+          try {
+            const caption = JSON.parse(line);
+            addCaptionLog(caption);
+          } catch (e) {
+            controlWindow.sendErrorMessage('字幕エンジンの出力をJSONオブジェクトとして解析できません:' + e)
+            console.error('[ERROR] JSON解析エラー:', e);
+          }
+        }
+      });
+    });
+
+    this.process.stderr.on('data', (data) => {
+      controlWindow.sendErrorMessage('字幕エンジンエラー:' + data)
+      console.error(`[ERROR] サブプロセスエラー: ${data}`);
+    });
+...
+```
+
+## 字幕エンジンの使用方法
+
+### コマンドライン引数の指定
+
+カスタム字幕エンジンの設定はコマンドライン引数で指定します。主な必要なパラメータは以下の通りです：
+
+```python
+import argparse
+
+...
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description='システムのオーディオストリームをテキストに変換')
+    parser.add_argument('-s', '--source_language', default='en', help='ソース言語コード')
+    parser.add_argument('-t', '--target_language', default='zh', help='ターゲット言語コード')
+    parser.add_argument('-a', '--audio_type', default=0, help='オーディオストリームソース: 0は出力音声、1は入力音声')
+    parser.add_argument('-c', '--chunk_rate', default=20, help='1秒間に収集するオーディオチャンク数')
+    parser.add_argument('-k', '--api_key', default='', help='GummyモデルのAPIキー')
+    args = parser.parse_args()
+    convert_audio_to_text(
+        args.source_language,
+        args.target_language,
+        int(args.audio_type),
+        int(args.chunk_rate),
+        args.api_key
+    )
+```
+
+例：原文を日本語、翻訳を中国語に指定し、システム音声出力を取得、0.1秒のオーディオデータを収集する場合：
+
+```bash
+python main-gummy.py -s ja -t zh -a 0 -c 10 -k <your-api-key>
+```
+
+### パッケージ化
+
+開発とテスト完了後、`pyinstaller`を使用して実行可能ファイルにパッケージ化します。エラーが発生した場合、依存ライブラリの不足を確認してください。
+
+### 実行
+
+利用可能な字幕エンジンが準備できたら、字幕ソフトウェアのウィンドウでエンジンのパスと実行パラメータを指定して起動します。
+
+![](../img/02_ja.png)
+
+## 参考コード
+
+本プロジェクトの`caption-engine`フォルダにある`main-gummy.py`ファイルはデフォルトの字幕エンジンのエントリーコードです。`src\main\utils\engine.ts`はサーバー側で字幕エンジンのデータを取得・処理するコードです。必要に応じて字幕エンジンの実装詳細と完全な実行プロセスを理解するために参照してください。
--- a/docs/engine-manual/zh.md
+++ b/docs/engine-manual/zh.md
@@ -1,6 +1,6 @@
 # 字幕引擎说明文档

-对应版本：v0.3.0
+对应版本：v0.5.0

 ![](../../assets/media/structure_zh.png)

@@ -32,7 +32,7 @@
 import sys
 import argparse

-# 引入系统音频获取勒
+# 引入系统音频获取类
 if sys.platform == 'win32':
    from sysaudio.win import AudioStream
 elif sys.platform == 'darwin':
@@ -80,6 +80,10 @@ def convert_audio_to_text(s_lang, t_lang, audio_type, chunk_rate, api_key):
            break
 ```

+### 字幕翻译
+
+有的语音转文字模型并不提供翻译，需要再添加一个翻译模块。这部分可以使用云端翻译 API 也可以使用本地翻译模型。
+
 ### 数据传递

 在获取到当前音频流的文字后，需要将文字传递给主程序。字幕引擎进程通过标准输出将字幕数据传递给 electron 主进程。
@@ -96,7 +100,7 @@ export interface CaptionItem {
 }
 ```

-**注意必须确保咱们一起每输出一次字幕 JSON 数据就得刷新缓冲区，确保 electron 主进程每次接收到的字符串都可以被解释为 JSON 对象。**
+**注意必须确保每输出一次字幕 JSON 数据就得刷新缓冲区，确保 electron 主进程每次接收到的字符串都可以被解释为 JSON 对象。**

 如果使用 python 语言，可以参考以下方式将数据传递给主程序：

@@ -147,6 +151,51 @@ sys.stdout.reconfigure(line_buffering=True)
 ...
 ```

+## 字幕引擎的使用
+
+### 命令行参数的指定
+
+自定义字幕引擎的设置通过命令行参数指定，因此需要设置好字幕引擎的参数，常见的需要的参数如下：
+
+```python
+import argparse
+
+...
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description='Convert system audio stream to text')
+    parser.add_argument('-s', '--source_language', default='en', help='Source language code')
+    parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
+    parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output audio stream, 1 for input audio stream')
+    parser.add_argument('-c', '--chunk_rate', default=20, help='The number of audio stream chunks collected per second.')
+    parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
+    args = parser.parse_args()
+    convert_audio_to_text(
+        args.source_language,
+        args.target_language,
+        int(args.audio_type),
+        int(args.chunk_rate),
+        args.api_key
+    )
+```
+
+比如对应上面的字幕引擎，我想指定原文为日语，翻译为中文，获取系统音频输出的字幕，每次截取 0.1s 的音频数据，那么命令行参数如下：
+
+```bash
+python main-gummy.py -s ja -t zh -a 0 -c 10 -k <your-api-key>
+```
+
+### 打包
+
+在完成字幕引擎的开发和测试后，需要将字幕引擎打包成可执行文件。一般使用 `pyinstaller` 进行打包。如果打包好的字幕引擎文件执行报错，可能是打包漏掉了某些依赖库，请检查是否缺少了依赖库。
+
+### 运行
+
+有了可以使用的字幕引擎，就可以在字幕软件窗口中通过指定字幕引擎的路径和字幕引擎的运行指令（参数）来启动字幕引擎了。
+
+![](../img/02_zh.png)
+
+
 ## 参考代码

 本项目 `caption-engine` 文件夹下的 `main-gummy.py` 文件为默认字幕引擎的入口代码。`src\main\utils\engine.ts` 为服务端获取字幕引擎数据和进行处理的代码。可以根据需要阅读了解字幕引擎的实现细节和完整运行过程。
--- a/docs/img/02_en.png
+++ b/docs/img/02_en.png
--- a/docs/img/02_ja.png
+++ b/docs/img/02_ja.png
--- a/docs/img/02_zh.png
+++ b/docs/img/02_zh.png
--- a/docs/user-manual/en.md
+++ b/docs/user-manual/en.md
@@ -1,39 +1,50 @@
 # Auto Caption User Manual

-Corresponding Version: v0.3.0
+Corresponding Version: v0.5.0

 ## Software Introduction

 Auto Caption is a cross-platform caption display software that can real-time capture system audio input (recording) or output (playback) streaming data and use an audio-to-text model to generate captions for the corresponding audio. The default caption engine provided by the software (using Alibaba Cloud Gummy model) supports recognition and translation in nine languages (Chinese, English, Japanese, Korean, German, French, Russian, Spanish, Italian).

-Currently, the default caption engine of the software only has full functionality on Windows and macOS platforms. Additional configuration is required to capture system audio output on macOS.
+The default caption engine currently has full functionality on Windows, macOS, and Linux platforms. Additional configuration is required to capture system audio output on macOS.

-On Linux platforms, it can only generate captions for audio input (microphone), and currently does not support generating captions for audio output (playback).
+The following operating system versions have been tested and confirmed to work. Normal operation on other OS versions is not guaranteed.
+
+| OS Version         | Architecture | Audio Input Capture | Audio Output Capture |
+| ------------------ | ------------ | ------------------- | -------------------- |
+| Windows 11 24H2    | x64          | ✅                   | ✅                    |
+| macOS Sequoia 15.5 | arm64        | ✅                   | ✅ Additional config required |
+| Ubuntu 24.04.2     | x64          | ✅                   | ✅                    |
+| Kali Linux 2022.3  | x64          | ✅                   | ✅                    |

 ![](../../assets/media/main_en.png)

 ### Software Limitations

-To use the default caption service, you need to obtain an API KEY from Alibaba Cloud.
+To use the Gummy caption engine, you need to obtain an API KEY from Alibaba Cloud.

 Additional configuration is required to capture audio output on macOS platform.

 The software is built using Electron, so the software size is inevitably large.

-## Software Usage
+## Preparation for Using Gummy Engine

-### Preparing the Alibaba Cloud Model Studio API KEY
+To use the default caption engine provided by the software (Alibaba Cloud Gummy), you need to obtain an API KEY from the Alibaba Cloud Bailian platform. Then add the API KEY to the software settings or set it in an environment variable (reading the API KEY from environment variables is supported only on Windows).

-To use the default caption engine (Alibaba Cloud Gummy), you need to obtain an API KEY from the Alibaba Cloud Model Studio and configure it in your local environment variables.
+**The international version of Alibaba Cloud services does not provide the Gummy model, so non-Chinese users currently cannot use the default caption engine.**

-**The international version of Alibaba Cloud does not provide the Gummy model, so non-Chinese users currently cannot use the default caption engine. I am trying to develop a new local caption engine to ensure that all users have access to a default caption engine.**
+Alibaba Cloud provides detailed tutorials for these steps:

-Alibaba Cloud provides detailed tutorials for this:
+- [Obtaining API KEY (Chinese)](https://help.aliyun.com/zh/model-studio/get-api-key)
+- [Configuring API Key through Environment Variables (Chinese)](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)

- [Obtain API KEY (Chinese)](https://help.aliyun.com/zh/model-studio/get-api-key)
- [Configure API Key in Environment Variables (Chinese)](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
+## Preparation for Using Vosk Engine

-### Capturing System Audio Output on macOS
+To use the Vosk local caption engine, first download the model you need from the [Vosk Models](https://alphacephei.com/vosk/models) page. Then extract the downloaded model package locally and add the path of the extracted model folder to the software settings. Currently, the Vosk caption engine does not support caption translation.
+
+![](../../assets/media/vosk_en.png)
+
+## Capturing System Audio Output on macOS

 > Based on the [Setup Multi-Output Device](https://github.com/ExistentialAudio/BlackHole/wiki/Multi-Output-Device) tutorial

@@ -57,6 +68,32 @@ Once BlackHole is confirmed installed, in the `Audio MIDI Setup` page, click the

 Now the caption engine can capture system audio output and generate captions.

+## Capturing System Audio Output on Linux
+
+First execute in the terminal:
+
+```bash
+pactl list short sources
+```
+
+If you see output similar to the following, no additional configuration is needed:
+
+```bash
+220     alsa_output.pci-0000_02_02.0.3.analog-stereo.monitor    PipeWire        s16le 2ch 48000Hz       SUSPENDED
+221     alsa_input.pci-0000_02_02.0.3.analog-stereo     PipeWire        s16le 2ch 48000Hz       SUSPENDED
+```
+
+Otherwise, install `pulseaudio` and `pavucontrol` using the following commands:
+
+```bash
+# For Debian/Ubuntu etc.
+sudo apt install pulseaudio pavucontrol
+# For CentOS etc.
+sudo yum install pulseaudio pavucontrol
+```
+
+## Software Usage
+
 ### Modifying Settings

 Caption settings can be divided into three categories: general settings, caption engine settings, and caption style settings. Note that changes to general settings take effect immediately. For the other two categories, after making changes, you need to click the "Apply" option in the upper right corner of the corresponding settings module for the changes to take effect. If you click "Cancel Changes," the current modifications will not be saved and will revert to the previous state.
@@ -73,13 +110,13 @@ The following image shows the caption display window, which displays the latest

 ### Exporting Caption Records

-In the caption control window, you can see the records of all collected captions. Click the "Export Caption Records" button to export the caption records as a JSON file.
+In the caption control window, you can see the records of all collected captions. Click the "Export Log" button to export the caption records as a JSON or SRT file.

 ## Caption Engine

-The so-called caption engine is actually a subprocess that real-time captures system audio input (recording) or output (playback) streaming data and uses an audio-to-text model to generate captions for the corresponding audio. The generated captions are output as JSON data converted to strings and returned to the main program. The main program reads the caption data, processes it, and displays it in the window.
+The so-called caption engine is essentially a subprogram that captures real-time streaming data from system audio input (recording) or output (playback), and invokes speech-to-text models to generate corresponding captions. The generated captions are converted into JSON-formatted strings and passed to the main program through standard output. The main program reads the caption data, processes it, and displays it in the window.

-The software provides a default caption engine. If you need other caption engines, you can call them by enabling the custom engine option (other engines need to be developed specifically for this software). The engine path is the path to the custom caption engine on your computer, and the engine command is the runtime parameters for the custom caption engine, which need to be filled out according to the rules of the specific caption engine.
+The software provides two default caption engines. If you need other caption engines, you can invoke them by enabling the custom engine option (other engines need to be specifically developed for this software). The engine path refers to the location of the custom caption engine on your computer, while the engine command represents the runtime parameters of the custom caption engine, which should be configured according to the rules of that particular caption engine.

 ![](../img/02_en.png)

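Sketch for the Linux check described in the English user manual above, where `pactl list short sources` is expected to list a `.monitor` source when system audio output can be captured. The function names are hypothetical, and the parsing assumes the whitespace-separated short format shown in the manual's sample output.

```python
import subprocess
from typing import List


def find_monitor_sources(pactl_output: str) -> List[str]:
    """Return source names ending in '.monitor' from `pactl list short sources` output."""
    names = []
    for line in pactl_output.splitlines():
        fields = line.split()
        # In the short format the second column is the source name.
        if len(fields) >= 2 and fields[1].endswith(".monitor"):
            names.append(fields[1])
    return names


def list_monitor_sources() -> List[str]:
    """Run pactl and return the available monitor sources (requires pactl to be installed)."""
    result = subprocess.run(
        ["pactl", "list", "short", "sources"],
        capture_output=True, text=True, check=True,
    )
    return find_monitor_sources(result.stdout)


# Sample taken from the manual's example output.
SAMPLE = (
    "220\talsa_output.pci-0000_02_02.0.3.analog-stereo.monitor\tPipeWire\ts16le 2ch 48000Hz\tSUSPENDED\n"
    "221\talsa_input.pci-0000_02_02.0.3.analog-stereo\tPipeWire\ts16le 2ch 48000Hz\tSUSPENDED\n"
)
monitors = find_monitor_sources(SAMPLE)
```

An empty result would correspond to the "otherwise" branch in the manual, where `pulseaudio` and `pavucontrol` need to be installed.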
--- a/docs/user-manual/ja.md
+++ b/docs/user-manual/ja.md
@@ -1,6 +1,6 @@
 # Auto Caption ユーザーマニュアル

-対応バージョン：v0.3.0
+対応バージョン：v0.5.0

 この文書は大規模モデルを使用して翻訳されていますので、内容に正確でない部分があるかもしれません。

@@ -8,34 +8,45 @@

 Auto Caption は、クロスプラットフォームの字幕表示ソフトウェアで、システムの音声入力（録音）または出力（音声再生）のストリーミングデータをリアルタイムで取得し、音声からテキストに変換するモデルを利用して対応する音声の字幕を生成します。このソフトウェアが提供するデフォルトの字幕エンジン（アリババクラウド Gummy モデルを使用）は、9つの言語（中国語、英語、日本語、韓国語、ドイツ語、フランス語、ロシア語、スペイン語、イタリア語）の認識と翻訳をサポートしています。

-現在、ソフトウェアのデフォルト字幕エンジンは Windows と macOS プラットフォームでのみ完全な機能を有しています。macOS でシステムオーディオ出力を取得するには追加の設定が必要です。
+現在のデフォルト字幕エンジンは Windows、macOS、Linux プラットフォームで完全な機能を有しています。macOSでシステムのオーディオ出力を取得するには追加設定が必要です。

-Linux プラットフォームでは、オーディオ入力（マイク）からの字幕生成のみ可能で、現在オーディオ出力（再生音）からの字幕生成はサポートしていません。
+以下のオペレーティングシステムバージョンで正常動作を確認しています。記載以外の OS での正常動作は保証できません。
+
+| OS バージョン        | アーキテクチャ | オーディオ入力取得 | オーディオ出力取得 |
+| ------------------- | ------------- | ------------------ | ------------------ |
+| Windows 11 24H2     | x64           | ✅                  | ✅                  |
+| macOS Sequoia 15.5  | arm64         | ✅                  | ✅ 追加設定が必要      |
+| Ubuntu 24.04.2      | x64           | ✅                  | ✅                  |
+| Kali Linux 2022.3   | x64           | ✅                  | ✅                  |

 ![](../../assets/media/main_ja.png)

 ### ソフトウェアの欠点

-デフォルトの字幕サービスを使用するには、アリババクラウドの API KEY を取得する必要があります。
+Gummy 字幕エンジンを使用するには、アリババクラウドの API KEY を取得する必要があります。

 macOS プラットフォームでオーディオ出力を取得するには追加の設定が必要です。

 ソフトウェアは Electron で構築されているため、そのサイズは避けられないほど大きいです。

-## ソフトウェアの使用方法
+## Gummyエンジン使用前の準備

-### 百炼プラットフォームの API KEY の準備
+ソフトウェアが提供するデフォルトの字幕エンジン（Alibaba Cloud Gummy）を使用するには、Alibaba Cloud百煉プラットフォームからAPI KEYを取得する必要があります。その後、API KEYをソフトウェア設定に追加するか、環境変数に設定します（Windowsプラットフォームのみ環境変数からのAPI KEY読み取りをサポート）。

-ソフトウェアが提供するデフォルトの字幕エンジン（アリババクラウド Gummy）を使用するには、アリババクラウド百炼プラットフォームから API KEY を取得し、ローカル環境変数に設定する必要があります。
+**Alibaba Cloudの国際版サービスではGummyモデルを提供していないため、現在中国以外のユーザーはデフォルトの字幕エンジンを使用できません。**

-**アリババクラウドの国際版には Gummy モデルが提供されていないため、中国以外のユーザーは現在、デフォルトの字幕エンジンを使用できません。すべてのユーザーが利用できるように、新しいローカルの字幕エンジンを開発中です。**
-
-アリババクラウドは詳細なチュートリアルを提供していますので、以下のリンクを参照してください：
+この部分についてAlibaba Cloudは詳細なチュートリアルを提供しており、以下を参照できます：

 - [API KEY の取得（中国語）](https://help.aliyun.com/zh/model-studio/get-api-key)
- [環境変数を通じて API Key を設定する（中国語）](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
+- [環境変数を通じて API Key を設定（中国語）](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)

-### macOS でのシステムオーディオ出力の取得方法
+## Voskエンジン使用前の準備
+
+Voskローカル字幕エンジンを使用するには、まず[Vosk Models](https://alphacephei.com/vosk/models)ページから必要なモデルをダウンロードしてください。その後、ダウンロードしたモデルパッケージをローカルに解凍し、対応するモデルフォルダのパスをソフトウェア設定に追加します。現在、Vosk字幕エンジンは字幕の翻訳をサポートしていません。
+
+![](../../assets/media/vosk_ja.png)
+
+## macOS でのシステムオーディオ出力の取得方法

 > [マルチ出力デバイスの設定](https://github.com/ExistentialAudio/BlackHole/wiki/Multi-Output-Device) チュートリアルに基づいて作成

@@ -60,6 +71,32 @@ BlackHoleのインストールが確認できたら、`オーディオ MIDI 設

 これで字幕エンジンがシステムオーディオ出力をキャプチャし、字幕を生成できるようになります。

+## Linux でシステムオーディオ出力を取得する
+
+まずターミナルで以下を実行してください:
+
+```bash
+pactl list short sources
+```
+
+以下のような出力が確認できれば追加設定は不要です:
+
+```bash
+220     alsa_output.pci-0000_02_02.0.3.analog-stereo.monitor    PipeWire        s16le 2ch 48000Hz       SUSPENDED
+221     alsa_input.pci-0000_02_02.0.3.analog-stereo     PipeWire        s16le 2ch 48000Hz       SUSPENDED
+```
+
+それ以外の場合は、以下のコマンドで`pulseaudio`と`pavucontrol`をインストールしてください:
+
+```bash
+# Debian/Ubuntu系の場合
+sudo apt install pulseaudio pavucontrol
+# CentOS系の場合
+sudo yum install pulseaudio pavucontrol
+```
+
+## ソフトウェアの使い方
+
 ### 設定の変更

 字幕の設定は3つのカテゴリーに分かれます：一般的な設定、字幕エンジンの設定、字幕スタイルの設定。注意すべき点として、一般的な設定の変更は即座に適用されます。しかし、他の2つの設定については、変更後に該当する設定モジュール右上の「適用」オプションをクリックすることで初めて変更が有効になります。「変更を取り消す」を選択すると、現在の変更は保存されず、前回の状態に戻ります。
@@ -76,13 +113,13 @@ BlackHoleのインストールが確認できたら、`オーディオ MIDI 設

 ### 字幕記録のエクスポート

-字幕制御ウィンドウでは、現在収集されたすべての字幕の記録を見ることができます。「字幕記録をエクスポート」ボタンをクリックすると、字幕記録をJSONファイルとしてエクスポートできます。
+「エクスポート」ボタンをクリックすると、字幕記録を JSON または SRT ファイル形式で出力できます。

 ## 字幕エンジン

-字幕エンジンとは、実際にはサブプログラムであり、システムの音声入力（録音）または出力（音声再生）のストリーミングデータをリアルタイムで取得し、音声からテキストに変換するモデルを利用して対応する音声の字幕を生成します。生成された字幕はIPC経由で文字列に変換されたJSONデータとして出力され、メインプログラムに返されます。メインプログラムは字幕データを読み取り、処理してウィンドウ上に表示します。
+字幕エンジンとは、システムのオーディオ入力（録音）または出力（再生音）のストリーミングデータをリアルタイムで取得し、音声テキスト変換モデルを呼び出して対応する字幕を生成するサブプログラムです。生成された字幕は JSON 形式の文字列に変換され、標準出力を通じてメインプログラムに渡されます。メインプログラムは字幕データを読み取り、処理した後、ウィンドウに表示します。

-ソフトウェアはデフォルトの字幕エンジンを提供しており、他の字幕エンジンが必要な場合は、カスタムエンジンオプションを開いて他の字幕エンジンを呼び出すことができます（他のエンジンはこのソフトウェアに対して開発する必要があります）。エンジンパスは、あなたのコンピュータ上のカスタム字幕エンジンのパスであり、エンジンコマンドはカスタム字幕エンジンの実行パラメータです。これらの部分は、その字幕エンジンの規則に従って記入する必要があります。
+ソフトウェアには2つのデフォルトの字幕エンジンが用意されています。他の字幕エンジンが必要な場合、カスタムエンジンオプションを有効にすることで呼び出すことができます（他のエンジンはこのソフトウェア向けに特別に開発する必要があります）。エンジンパスはコンピュータ上のカスタム字幕エンジンの場所を指し、エンジンコマンドはカスタム字幕エンジンの実行パラメータを表します。これらは該当する字幕エンジンの規則に従って設定する必要があります。

 ![](../img/02_ja.png)

--- a/docs/user-manual/zh.md
+++ b/docs/user-manual/zh.md
@@ -1,42 +1,50 @@
 # Auto Caption 用户手册

-对应版本：v0.3.0
+对应版本：v0.5.0

 ## 软件简介

 Auto Caption 是一个跨平台的字幕显示软件，能够实时获取系统音频输入（录音）或输出（播放声音）的流式数据，并调用音频转文字的模型生成对应音频的字幕。软件提供的默认字幕引擎（使用阿里云 Gummy 模型）支持九种语言（中、英、日、韩、德、法、俄、西、意）的识别与翻译。

-目前软件默认字幕引擎只有在 Windows 和 macOS 平台下才拥有完整功能，在 macOS 要获取系统音频输出需要额外配置。
+目前软件默认字幕引擎在 Windows、macOS 和 Linux 平台下均拥有完整功能，在 macOS 要获取系统音频输出需要额外配置。

-在 Linux 平台下只能生成音频输入（麦克风）的字幕，暂不支持音频输出（播放声音）的字幕生成。
+测试过可正常运行的操作系统信息如下，软件不能保证在非下列版本的操作系统上正常运行。
+
+| 操作系统版本        | 处理器架构 | 获取系统音频输入 | 获取系统音频输出 |
+| ------------------ | ---------- | ---------------- | ---------------- |
+| Windows 11 24H2    | x64        | ✅                | ✅                |
+| macOS Sequoia 15.5 | arm64      | ✅                | ✅需要额外配置    |
+| Ubuntu 24.04.2     | x64        | ✅                | ✅                |
+| Kali Linux 2022.3  | x64        | ✅                | ✅                |

 ![](../../assets/media/main_zh.png)

 ### 软件缺点

-要使用默认字幕服务需要获取阿里云的 API KEY。
+要使用默认的 Gummy 字幕引擎需要获取阿里云的 API KEY。

 在 macOS 平台获取音频输出需要额外配置。

 软件使用 Electron 构建，因此软件体积不可避免的较大。

-## 软件使用
-
-### 准备阿里云百炼平台 API KEY
+## Gummy 引擎使用前准备

 要使用软件提供的默认字幕引擎（阿里云 Gummy），需要从阿里云百炼平台获取 API KEY，然后将 API KEY 添加到软件设置中或者配置到环境变量中（仅 Windows 平台支持读取环境变量中的 API KEY）。

-![](../../assets/media/api_zh.png)
-
-**国际版的阿里云服务并没有提供 Gummy 模型，因此目前非中国用户无法使用默认字幕引擎。我正在开发新的本地字幕引擎，以确保所有用户都有默认字幕引擎可以使用。**
+**国际版的阿里云服务并没有提供 Gummy 模型，因此目前非中国用户无法使用默认字幕引擎。**

 这部分阿里云提供了详细的教程，可参考：

 - [获取 API KEY](https://help.aliyun.com/zh/model-studio/get-api-key)
-
 - [将 API Key 配置到环境变量](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)

-### macOS 获取系统音频输出
+## Vosk 引擎使用前准备
+
+如果要使用 Vosk 本地字幕引擎，首先需要在 [Vosk Models](https://alphacephei.com/vosk/models) 页面下载你需要的模型。然后将下载的模型安装包解压到本地，并将对应的模型文件夹的路径添加到软件的设置中。目前 Vosk 字幕引擎还不支持翻译字幕内容。
+
+![](../../assets/media/vosk_zh.png)
+
+## macOS 获取系统音频输出

 > 基于 [Setup Multi-Output Device](https://github.com/ExistentialAudio/BlackHole/wiki/Multi-Output-Device) 教程编写

@@ -60,6 +68,32 @@ brew install blackhole-64ch

 现在字幕引擎就能捕获系统的音频输出并生成字幕了。

+## Linux 获取系统音频输出
+
+首先在控制台执行：
+
+```bash
+pactl list short sources
+```
+
+如果有以下类似的输出内容则无需额外配置：
+
+```bash
+220     alsa_output.pci-0000_02_02.0.3.analog-stereo.monitor    PipeWire        s16le 2ch 48000Hz       SUSPENDED
+221     alsa_input.pci-0000_02_02.0.3.analog-stereo     PipeWire        s16le 2ch 48000Hz       SUSPENDED
+```
+
+否则，执行以下命令安装 `pulseaudio` 和 `pavucontrol`：
+
+```bash
+# Debian or Ubuntu, etc.
+sudo apt install pulseaudio pavucontrol
+# CentOS, etc.
+sudo yum install pulseaudio pavucontrol
+```
+
+## 软件使用
+
 ### 修改设置

 字幕设置可以分为三类：通用设置、字幕引擎设置、字幕样式设置。需要注意的是，修改通用设置是立即生效的。但是对于其他两类设置，修改后需要点击对应设置模块右上角的“应用”选项，更改才会真正生效。如果点击“取消更改”那么当前修改将不会被保存，而是回退到上次修改的状态。
@@ -76,13 +110,13 @@ brew install blackhole-64ch

 ### 字幕记录的导出

-在字幕控制窗口中可以看到当前收集的所有字幕的记录，点击“导出字幕记录”按钮，即可将字幕记录导出为 JSON 文件。
+在字幕控制窗口中可以看到当前收集的所有字幕的记录，点击“导出字幕”按钮，即可将字幕记录导出为 JSON 或 SRT 文件。

 ## 字幕引擎

-所谓的字幕引擎实际上是一个子程序，它会实时获取系统音频输入（录音）或输出（播放声音）的流式数据，并调用音频转文字的模型生成对应音频的字幕。生成的字幕通过 IPC 输出为转换为字符串的 JSON 数据，并返回给主程序。主程序读取字幕数据，处理后显示在窗口上。
+所谓的字幕引擎实际上是一个子程序，它会实时获取系统音频输入（录音）或输出（播放声音）的流式数据，并调用音频转文字的模型生成对应音频的字幕。生成的字幕会被转换为 JSON 格式的字符串，并通过标准输出传递给主程序。主程序读取字幕数据，处理后显示在窗口上。

-软件提供了一个默认的字幕引擎，如果你需要其他的字幕引擎，可以通过打开自定义引擎选项来调用其他字幕引擎（其他引擎需要针对该软件进行开发）。其中引擎路径是自定义字幕引擎在你的电脑上的路径，引擎指令是自定义字幕引擎的运行参数，这部分需要按该字幕引擎的规则进行填写。
+软件提供了两个默认的字幕引擎，如果你需要其他的字幕引擎，可以通过打开自定义引擎选项来调用其他字幕引擎（其他引擎需要针对该软件进行开发）。其中引擎路径是自定义字幕引擎在你的电脑上的路径，引擎指令是自定义字幕引擎的运行参数，这部分需要按该字幕引擎的规则进行填写。

 ![](../img/02_zh.png)

--- a/electron-builder.yml
+++ b/electron-builder.yml
@@ -6,17 +6,28 @@ files:
  - '!**/.vscode/*'
  - '!src/*'
  - '!electron.vite.config.{js,ts,mjs,cjs}'
-  - '!{.eslintcache,eslint.config.mjs,.prettierignore,.prettierrc.yaml,dev-app-update.yml,CHANGELOG.md,README.md}'
+  - '!{.eslintcache,eslint.config.mjs,.prettierignore,.prettierrc.yaml,dev-app-update.yml,CHANGELOG.md}'
+  - '!{LICENSE,README.md,README_en.md,README_ja.md}'
  - '!{.env,.env.*,.npmrc,pnpm-lock.yaml}'
  - '!{tsconfig.json,tsconfig.node.json,tsconfig.web.json}'
+  - '!caption-engine/*'
+  - '!engine-test/*'
+  - '!docs/*'
+  - '!assets/*'
 extraResources:
-  from: ./caption-engine/dist/main-gummy
-  to: ./caption-engine/main-gummy
-asarUnpack:
-  - resources/**
+  # For Windows
+  - from: ./caption-engine/dist/main-gummy.exe
+    to: ./caption-engine/main-gummy.exe
+  - from: ./caption-engine/dist/main-vosk.exe
+    to: ./caption-engine/main-vosk.exe
+  # For macOS and Linux
+  # - from: ./caption-engine/dist/main-gummy
+  #   to: ./caption-engine/main-gummy
+  # - from: ./caption-engine/dist/main-vosk
+  #   to: ./caption-engine/main-vosk
 win:
  executableName: auto-caption
-  icon: resources/icon.png
+  icon: build/icon.png
 nsis:
  artifactName: ${name}-${version}-setup.${ext}
  shortcutName: ${productName}
--- a/engine-test/vosk.ipynb
+++ b/engine-test/vosk.ipynb
@@ -0,0 +1,124 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "6fb12704",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "d:\\Projects\\auto-caption\\caption-engine\\subenv\\Lib\\site-packages\\vosk\\__init__.py\n"
+     ]
+    }
+   ],
+   "source": [
+    "import vosk\n",
+    "print(vosk.__file__)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "63a06f5c",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "        采样设备：\n",
+      "            - 设备类型：音频输入\n",
+      "            - 序号：1\n",
+      "            - 名称：麦克风阵列 (Realtek(R) Audio)\n",
+      "            - 最大输入通道数：2\n",
+      "            - 默认低输入延迟：0.09s\n",
+      "            - 默认高输入延迟：0.18s\n",
+      "            - 默认采样率：44100.0Hz\n",
+      "            - 是否回环设备：False\n",
+      "\n",
+      "        音频样本块大小：2205\n",
+      "        样本位宽：2\n",
+      "        采样格式：8\n",
+      "        音频通道数：2\n",
+      "        音频采样率：44100\n",
+      "        \n"
+     ]
+    }
+   ],
+   "source": [
+    "import sys\n",
+    "import os\n",
+    "import json\n",
+    "from vosk import Model, KaldiRecognizer\n",
+    "\n",
+    "current_dir = os.getcwd() \n",
+    "sys.path.append(os.path.join(current_dir, '../caption-engine'))\n",
+    "\n",
+    "from sysaudio.win import AudioStream\n",
+    "from audioprcs import resampleRawChunk, mergeChunkChannels\n",
+    "\n",
+    "stream = AudioStream(1)\n",
+    "stream.printInfo()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "5d5a0afa",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model = Model(os.path.join(\n",
+    "    current_dir,\n",
+    "    '../caption-engine/models/vosk-model-small-cn-0.22'\n",
+    "))\n",
+    "recognizer = KaldiRecognizer(model, 16000)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7e9d1530",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "stream.openStream()\n",
+    "\n",
+    "for i in range(200):\n",
+    "    chunk = stream.read_chunk()\n",
+    "    chunk_mono = resampleRawChunk(chunk, stream.CHANNELS, stream.RATE, 16000)\n",
+    "    if recognizer.AcceptWaveform(chunk_mono):\n",
+    "        result = json.loads(recognizer.Result())\n",
+    "        print(\"acc:\", result.get(\"text\", \"\"))\n",
+    "    else:\n",
+    "        partial = json.loads(recognizer.PartialResult())\n",
+    "        print(\"else:\", partial.get(\"partial\", \"\"))"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "subenv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.1"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
--- a/package-lock.json
+++ b/package-lock.json
@@ -1,17 +1,18 @@
 {
  "name": "auto-caption",
-  "version": "0.3.0",
+  "version": "0.5.0",
  "lockfileVersion": 3,
  "requires": true,
  "packages": {
    "": {
      "name": "auto-caption",
-      "version": "0.3.0",
+      "version": "0.5.0",
      "hasInstallScript": true,
      "dependencies": {
        "@electron-toolkit/preload": "^3.0.1",
        "@electron-toolkit/utils": "^4.0.0",
        "ant-design-vue": "^4.2.6",
+        "pidusage": "^4.0.1",
        "pinia": "^3.0.2",
        "vue-i18n": "^11.1.9",
        "vue-router": "^4.5.1"
@@ -7742,6 +7743,18 @@
        "url": "https://github.com/sponsors/jonschlinkert"
      }
    },
+    "node_modules/pidusage": {
+      "version": "4.0.1",
+      "resolved": "https://registry.npmjs.org/pidusage/-/pidusage-4.0.1.tgz",
+      "integrity": "sha512-yCH2dtLHfEBnzlHUJymR/Z1nN2ePG3m392Mv8TFlTP1B0xkpMQNHAnfkY0n2tAi6ceKO6YWhxYfZ96V4vVkh/g==",
+      "license": "MIT",
+      "dependencies": {
+        "safe-buffer": "^5.2.1"
+      },
+      "engines": {
+        "node": ">=18"
+      }
+    },
    "node_modules/pinia": {
      "version": "3.0.2",
      "resolved": "https://registry.npmmirror.com/pinia/-/pinia-3.0.2.tgz",
@@ -8292,7 +8305,6 @@
      "version": "5.2.1",
      "resolved": "https://registry.npmmirror.com/safe-buffer/-/safe-buffer-5.2.1.tgz",
      "integrity": "sha512-rp3So07KcdmmKbGvgaNxQSJr7bGVSVk5S9Eq1F+ppbRo70+YeaDxkw5Dd8NPN+GD6bjnYm2VuPuCXmpuYvmCXQ==",
-      "dev": true,
      "funding": [
        {
          "type": "github",
--- a/package.json
+++ b/package.json
@@ -1,7 +1,7 @@
 {
  "name": "auto-caption",
  "productName": "Auto Caption",
-  "version": "0.3.0",
+  "version": "0.5.0",
  "description": "A cross-platform subtitle display software.",
  "main": "./out/main/index.js",
  "author": "himeditator",
@@ -25,6 +25,7 @@
    "@electron-toolkit/preload": "^3.0.1",
    "@electron-toolkit/utils": "^4.0.0",
    "ant-design-vue": "^4.2.6",
+    "pidusage": "^4.0.1",
    "pinia": "^3.0.2",
    "vue-i18n": "^11.1.9",
    "vue-router": "^4.5.1"
--- a/src/main/ControlWindow.ts
+++ b/src/main/ControlWindow.ts
@@ -1,5 +1,7 @@
-import { shell, BrowserWindow, ipcMain, nativeTheme } from 'electron'
+import { shell, BrowserWindow, ipcMain, nativeTheme, dialog } from 'electron'
 import path from 'path'
+import { EngineInfo } from './types'
+import pidusage from 'pidusage'
 import { is } from '@electron-toolkit/utils'
 import icon from '../../build/icon.png?asset'
 import { captionWindow } from './CaptionWindow'
@@ -72,6 +74,29 @@ class ControlWindow {
      return allConfig.uiTheme
    })

+    ipcMain.handle('control.folder.select', async () => {
+      const result = await dialog.showOpenDialog({
+        properties: ['openDirectory']
+      });
+
+      if (result.canceled) return "";
+      return result.filePaths[0];
+    })
+
+    ipcMain.handle('control.engine.info', async () => {
+      const info: EngineInfo = {
+        pid: 0, ppid: 0, cpu: 0, mem: 0, elapsed: 0
+      }
+      if(captionEngine.processStatus !== 'running') return info
+      const stats = await pidusage(captionEngine.process.pid)
+      info.pid = stats.pid
+      info.ppid = stats.ppid
+      info.cpu = stats.cpu
+      info.mem = stats.memory
+      info.elapsed = stats.elapsed
+      return info
+    })
+
    ipcMain.on('control.uiLanguage.change', (_, args) => {
      allConfig.uiLanguage = args
      if(captionWindow.window){
--- a/src/main/types/index.ts
+++ b/src/main/types/index.ts
@@ -6,10 +6,11 @@ export interface Controls {
  engineEnabled: boolean,
  sourceLang: string,
  targetLang: string,
-  engine: 'gummy',
+  engine: string,
  audio: 0 | 1,
  translation: boolean,
  API_KEY: string,
+  modelPath: string,
  customized: boolean,
  customizedApp: string,
  customizedCommand: string
@@ -53,3 +54,11 @@ export interface FullConfig {
  controls: Controls,
  captionLog: CaptionItem[]
 }
+
+export interface EngineInfo {
+  pid: number,
+  ppid: number,
+  cpu: number,
+  mem: number,
+  elapsed: number
+}
--- a/src/main/utils/AllConfig.ts
+++ b/src/main/utils/AllConfig.ts
@@ -34,6 +34,7 @@ const defaultControls: Controls = {
  audio: 0,
  engineEnabled: false,
  API_KEY: '',
+  modelPath: '',
  translation: true,
  customized: false,
  customizedApp: '',
@@ -59,7 +60,6 @@ class AllConfig {
      if(config.uiTheme) this.uiTheme = config.uiTheme
      if(config.leftBarWidth) this.leftBarWidth = config.leftBarWidth
      if(config.styles) this.setStyles(config.styles)
-      if(process.platform !== 'win32' && process.platform !== 'darwin') config.controls.audio = 1
      if(config.controls) this.setControls(config.controls)
      console.log('[INFO] Read Config from:', configPath)
    }
--- a/src/main/utils/CaptionEngine.ts
+++ b/src/main/utils/CaptionEngine.ts
@@ -13,26 +13,20 @@ export class CaptionEngine {
  processStatus: 'running' | 'stopping' | 'stopped' = 'stopped'

  private getApp(): boolean {
+    if (!allConfig.controls.customizedApp) allConfig.controls.customized = false
    if (allConfig.controls.customized && allConfig.controls.customizedApp) {
      this.appPath = allConfig.controls.customizedApp
      this.command = [allConfig.controls.customizedCommand]
+      allConfig.controls.customized = true
    }
    else if (allConfig.controls.engine === 'gummy') {
-      allConfig.controls.customized = false
      if(!allConfig.controls.API_KEY && !process.env.DASHSCOPE_API_KEY) {
        controlWindow.sendErrorMessage(i18n('gummy.key.missing'))
        return false
      }
-      let gummyName = ''
+      let gummyName = 'main-gummy'
      if (process.platform === 'win32') {
-        gummyName = 'main-gummy.exe'
-      }
-      else if (process.platform === 'darwin' || process.platform === 'linux') {
-        gummyName = 'main-gummy'
-      }
-      else {
-        controlWindow.sendErrorMessage(i18n('platform.unsupported') + process.platform)
-        throw new Error(i18n('platform.unsupported'))
+        gummyName += '.exe'
      }
      if (is.dev) {
        this.appPath = path.join(
@@ -55,10 +49,29 @@ export class CaptionEngine {
      if(allConfig.controls.API_KEY) {
        this.command.push('-k', allConfig.controls.API_KEY)
      }
-
-      console.log('[INFO] Engine Path:', this.appPath)
-      console.log('[INFO] Engine Command:', this.command)
    }
+    else if(allConfig.controls.engine === 'vosk'){
+      let voskName = 'main-vosk'
+      if (process.platform === 'win32') {
+        voskName += '.exe'
+      }
+      if (is.dev) {
+        this.appPath = path.join(
+          app.getAppPath(),
+          'caption-engine', 'dist', voskName
+        )
+      }
+      else {
+        this.appPath = path.join(
+          process.resourcesPath, 'caption-engine', voskName
+        )
+      }
+      this.command = []
+      this.command.push('-a', allConfig.controls.audio ? '1' : '0')
+      this.command.push('-m', `"${allConfig.controls.modelPath}"`)
+    }
+    console.log('[INFO] Engine Path:', this.appPath)
+    console.log('[INFO] Engine Command:', this.command)
    return true
  }

@@ -95,7 +108,10 @@ export class CaptionEngine {
        if (line.trim()) {
          try {
            const caption = JSON.parse(line);
-            allConfig.updateCaptionLog(caption);
+            if(caption.index === undefined) {
+              console.log('[INFO] Engine Bad Output:', caption);
+            }
+            else allConfig.updateCaptionLog(caption);
          } catch (e) {
            controlWindow.sendErrorMessage(i18n('engine.output.parse.error') + e)
            console.error('[ERROR] Error parsing JSON:', e);
@@ -105,6 +121,7 @@ export class CaptionEngine {
    });

    this.process.stderr.on('data', (data) => {
+      if(this.processStatus === 'stopping') return
      controlWindow.sendErrorMessage(i18n('engine.error') + data)
      console.error(`[ERROR] Subprocess Error: ${data}`);
    });
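The stdout handler in this hunk now ignores engine output that parses as JSON but carries no `index` field. Below is a minimal sketch of that filter as a pure function. `parseCaptionLine` and the reduced `CaptionItem` shape are hypothetical and are not part of the repository; unlike the real handler, the sketch returns `null` on a parse error instead of reporting it to the control window.

```typescript
// Hypothetical, simplified caption shape; the project's real CaptionItem may differ.
interface CaptionItem {
  index: number
  text: string
}

// Returns the parsed caption, or null when the line is blank,
// is not valid JSON, or has no `index` field (incomplete engine output).
function parseCaptionLine(line: string): CaptionItem | null {
  if (!line.trim()) return null
  try {
    const parsed = JSON.parse(line)
    if (parsed === null || typeof parsed !== 'object' || parsed.index === undefined) {
      return null
    }
    return parsed as CaptionItem
  } catch {
    // The real handler reports parse errors; this sketch simply drops the line.
    return null
  }
}
```

A caller would pass each stdout line through this helper and update the caption log only when the result is not `null`.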
--- a/src/renderer/src/components/CaptionLog.vue
+++ b/src/renderer/src/components/CaptionLog.vue
@@ -4,46 +4,109 @@
      <a-app class="caption-title">
        <span style="margin-right: 30px;">{{ $t('log.title') }}</span>
      </a-app>
-      <a-button
-        type="primary"
-        style="margin-right: 20px;"
-        @click="exportCaptions"
-        :disabled="captionData.length === 0"
-      >{{ $t('log.export') }}</a-button>
-
-    <a-popover :title="$t('log.copyOptions')">
-      <template #content>
-        <div class="input-item">
-          <span class="input-label">{{ $t('log.addIndex') }}</span>
-          <a-switch v-model:checked="showIndex" />
-          <span class="input-label">{{ $t('log.copyTime') }}</span>
-          <a-switch v-model:checked="copyTime" />
-        </div>
-        <div class="input-item">
-          <span class="input-label">{{ $t('log.copyContent') }}</span>
-          <a-radio-group v-model:value="copyOption">
-            <a-radio-button value="both">{{ $t('log.both') }}</a-radio-button>
-            <a-radio-button value="source">{{ $t('log.source') }}</a-radio-button>
-            <a-radio-button value="target">{{ $t('log.translation') }}</a-radio-button>
-          </a-radio-group>
-        </div>
-      </template>
-      <a-button
-        style="margin-right: 20px;"
-        @click="copyCaptions"
-        :disabled="captionData.length === 0"
-      >{{ $t('log.copy') }}</a-button>
+      <a-popover :title="$t('log.baseTime')">
+        <template #content>
+          <div class="base-time">
+            <div class="base-time-container">
+              <a-input
+                type="number" min="0"
+                v-model:value="baseHH"
+              ></a-input>
+              <span class="base-time-label">{{ $t('log.hour') }}</span>
+            </div>
+          </div><span style="margin: 0 4px;">:</span>
+          <div class="base-time">
+            <div class="base-time-container">
+              <a-input
+                type="number" min="0" max="59"
+                v-model:value="baseMM"
+              ></a-input>
+              <span class="base-time-label">{{ $t('log.min') }}</span>
+            </div>
+          </div><span style="margin: 0 4px;">:</span>
+          <div class="base-time">
+            <div class="base-time-container">
+              <a-input
+                type="number" min="0" max="59"
+                v-model:value="baseSS"
+              ></a-input>
+              <span class="base-time-label">{{ $t('log.sec') }}</span>
+            </div>
+          </div><span style="margin: 0 4px;">.</span>
+          <div class="base-time">
+            <div class="base-time-container">
+              <a-input
+                type="number" min="0" max="999"
+                v-model:value="baseMS"
+              ></a-input>
+              <span class="base-time-label">{{ $t('log.ms') }}</span>
+            </div>
+          </div>
+        </template>
+        <a-button
+          type="primary"
+          style="margin-right: 20px;"
+          @click="changeBaseTime"
+          :disabled="captionData.length === 0"
+        >{{ $t('log.changeTime') }}</a-button>
+      </a-popover>
+      <a-popover :title="$t('log.exportOptions')">
+        <template #content>
+          <div class="input-item">
+            <span class="input-label">{{ $t('log.exportFormat') }}</span>
+            <a-radio-group v-model:value="exportFormat">
+              <a-radio-button value="srt">.srt</a-radio-button>
+              <a-radio-button value="json">.json</a-radio-button>
+            </a-radio-group>
+          </div>
+          <div class="input-item">
+            <span class="input-label">{{ $t('log.exportContent') }}</span>
+            <a-radio-group v-model:value="contentOption">
+              <a-radio-button value="both">{{ $t('log.both') }}</a-radio-button>
+              <a-radio-button value="source">{{ $t('log.source') }}</a-radio-button>
+              <a-radio-button value="target">{{ $t('log.translation') }}</a-radio-button>
+            </a-radio-group>
+          </div>
+        </template>
+        <a-button
+          style="margin-right: 20px;"
+          @click="exportCaptions"
+          :disabled="captionData.length === 0"
+        >{{ $t('log.export') }}</a-button>
+      </a-popover>
+      <a-popover :title="$t('log.copyOptions')">
+        <template #content>
+          <div class="input-item">
+            <span class="input-label">{{ $t('log.addIndex') }}</span>
+            <a-switch v-model:checked="showIndex" />
+            <span class="input-label">{{ $t('log.copyTime') }}</span>
+            <a-switch v-model:checked="copyTime" />
+          </div>
+          <div class="input-item">
+            <span class="input-label">{{ $t('log.copyContent') }}</span>
+            <a-radio-group v-model:value="contentOption">
+              <a-radio-button value="both">{{ $t('log.both') }}</a-radio-button>
+              <a-radio-button value="source">{{ $t('log.source') }}</a-radio-button>
+              <a-radio-button value="target">{{ $t('log.translation') }}</a-radio-button>
+            </a-radio-group>
+          </div>
+        </template>
+        <a-button
+          style="margin-right: 20px;"
+          @click="copyCaptions"
+        >{{ $t('log.copy') }}</a-button>
      </a-popover>
-
      <a-button
        danger
        @click="clearCaptions"
      >{{ $t('log.clear') }}</a-button>
    </div>
+
    <a-table
      :columns="columns"
      :data-source="captionData"
      v-model:pagination="pagination"
+      style="margin-top: 10px;"
    >
      <template #bodyCell="{ column, record }">
        <template v-if="column.key === 'index'">
@@ -72,14 +135,22 @@ import { storeToRefs } from 'pinia'
 import { useCaptionLogStore } from '@renderer/stores/captionLog'
 import { message } from 'ant-design-vue'
 import { useI18n } from 'vue-i18n'
+import * as tc from '../utils/timeCalc'
+
 const { t } = useI18n()

 const captionLog = useCaptionLogStore()
 const { captionData } = storeToRefs(captionLog)

+const exportFormat = ref('srt')
 const showIndex = ref(true)
 const copyTime = ref(true)
-const copyOption = ref('both')
+const contentOption = ref('both')
+
+const baseHH = ref<number>(0)
+const baseMM = ref<number>(0)
+const baseSS = ref<number>(0)
+const baseMS = ref<number>(0)

 const pagination = ref({
  current: 1,
@@ -117,28 +188,68 @@ const columns = [
  },
 ]

+function changeBaseTime() {
+  if(baseHH.value < 0) baseHH.value = 0
+  if(baseMM.value < 0) baseMM.value = 0
+  if(baseMM.value > 59) baseMM.value = 59
+  if(baseSS.value < 0) baseSS.value = 0
+  if(baseSS.value > 59) baseSS.value = 59
+  if(baseMS.value < 0) baseMS.value = 0
+  if(baseMS.value > 999) baseMS.value = 999
+  const newBase: tc.Time = {
+    hh: Number(baseHH.value),
+    mm: Number(baseMM.value),
+    ss: Number(baseSS.value),
+    ms: Number(baseMS.value)
+  }
+  const oldBase = tc.getTimeFromStr(captionData.value[0].time_s)
+  const deltaMs = tc.getMsFromTime(newBase) - tc.getMsFromTime(oldBase)
+  for(let i = 0; i < captionData.value.length; i++){
+    captionData.value[i].time_s =
+      tc.getNewTimeStr(captionData.value[i].time_s, deltaMs)
+    captionData.value[i].time_t =
+      tc.getNewTimeStr(captionData.value[i].time_t, deltaMs)
+  }
+}
+
 function exportCaptions() {
-  const jsonData = JSON.stringify(captionData.value, null, 2)
-  const blob = new Blob([jsonData], { type: 'application/json' })
+  const exportData = getExportData()
+  const blob = new Blob([exportData], {
+    type: exportFormat.value === 'json' ? 'application/json' : 'text/plain'
+  })
  const url = URL.createObjectURL(blob)
  const a = document.createElement('a')
  a.href = url
  const timestamp = new Date().toISOString().replace(/[:.]/g, '-')
-  a.download = `captions-${timestamp}.json`
+  a.download = `captions-${timestamp}.${exportFormat.value}`
  document.body.appendChild(a)
  a.click()
  document.body.removeChild(a)
  URL.revokeObjectURL(url)
 }

+function getExportData() {
+  if(exportFormat.value === 'json') return JSON.stringify(captionData.value, null, 2)
+  let content = ''
+  for(let i = 0; i < captionData.value.length; i++){
+    const item = captionData.value[i]
+    content += `${i+1}\n`
+    content += `${item.time_s} --> ${item.time_t}\n`.replace(/\./g, ',')
+    if(contentOption.value === 'both') content += `${item.text}\n${item.translation}\n\n`
+    else if(contentOption.value === 'source') content += `${item.text}\n\n`
+    else content += `${item.translation}\n\n`
+  }
+  return content
+}
+
 function copyCaptions() {
  let content = ''
  for(let i = 0; i < captionData.value.length; i++){
    const item = captionData.value[i]
    if(showIndex.value) content += `${i+1}\n`
    if(copyTime.value) content += `${item.time_s} --> ${item.time_t}\n`.replace(/\./g, ',')
-    if(copyOption.value === 'both') content += `${item.text}\n${item.translation}\n\n`
-    else if(copyOption.value === 'source') content += `${item.text}\n\n`
+    if(contentOption.value === 'both') content += `${item.text}\n${item.translation}\n\n`
+    else if(contentOption.value === 'source') content += `${item.text}\n\n`
    else content += `${item.translation}\n\n`
  }
  navigator.clipboard.writeText(content)
@@ -166,6 +277,23 @@ function clearCaptions() {
  margin-bottom: 10px;
 }

+.base-time {
+  width: 64px;
+  display: inline-block;
+}
+
+.base-time-container {
+  display: flex;
+  flex-direction: column;
+  align-items: center;
+  gap: 4px;
+}
+
+.base-time-label {
+  font-size: 12px;
+  color: var(--tag-color);
+}
+
 .time-cell {
  display: flex;
  flex-direction: column;
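`changeBaseTime()` relies on three helpers from `../utils/timeCalc` whose implementation is not shown in this part of the diff. The sketch below is one plausible way to write them, assuming timestamps are strings of the form `HH:MM:SS.mmm`. The function names and the `Time` fields come from the calls above; the bodies, the clamping at zero, and the timestamp format are assumptions.

```typescript
// Assumed shape of tc.Time, matching the fields used in changeBaseTime().
interface Time { hh: number; mm: number; ss: number; ms: number }

// Parse "HH:MM:SS.mmm" (assumed format) into its components.
function getTimeFromStr(str: string): Time {
  const [hms, frac = '0'] = str.split('.')
  const [hh, mm, ss] = hms.split(':').map(Number)
  // Normalize the fractional part to exactly three digits (milliseconds).
  const ms = Number(frac.padEnd(3, '0').slice(0, 3))
  return { hh, mm, ss, ms }
}

// Convert a Time to a total number of milliseconds.
function getMsFromTime(t: Time): number {
  return ((t.hh * 60 + t.mm) * 60 + t.ss) * 1000 + t.ms
}

// Shift a timestamp string by deltaMs, clamping at zero (an assumption).
function getNewTimeStr(str: string, deltaMs: number): string {
  const total = Math.max(0, getMsFromTime(getTimeFromStr(str)) + deltaMs)
  const pad = (n: number, width: number) => String(n).padStart(width, '0')
  const hh = Math.floor(total / 3_600_000)
  const mm = Math.floor(total / 60_000) % 60
  const ss = Math.floor(total / 1000) % 60
  const ms = total % 1000
  return `${pad(hh, 2)}:${pad(mm, 2)}:${pad(ss, 2)}.${pad(ms, 3)}`
}
```

With helpers like these, the base-time change reduces to computing one offset (new base minus the first caption's start time) and applying it to every `time_s` and `time_t`.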
--- a/src/renderer/src/components/CaptionStyle.vue
+++ b/src/renderer/src/components/CaptionStyle.vue
@@ -257,6 +257,7 @@ function useSameStyle(){
  currentTransFontFamily.value = currentFontFamily.value;
  currentTransFontSize.value = currentFontSize.value;
  currentTransFontColor.value = currentFontColor.value;
+  currentTransFontWeight.value = currentFontWeight.value;
 }

 function applyStyle(){
@@ -334,13 +335,12 @@ watch(changeSignal, (val) => {
 }

 .preview-container {
-  line-height: 2em;
  width: 60%;
  text-align: center;
  position: absolute;
-  padding: 20px;
+  padding: 10px;
  border-radius: 10px;
-  left: 50%;
+  left: 64%;
  transform: translateX(-50%);
  bottom: 20px;
 }
@@ -348,7 +348,7 @@ watch(changeSignal, (val) => {
 .preview-container p {
  text-align: center;
  margin: 0;
-  line-height: 1.5em;
+  line-height: 1.6em;
 }

 .left-ellipsis {
--- a/src/renderer/src/components/EngineControl.vue
+++ b/src/renderer/src/components/EngineControl.vue
@@ -16,6 +16,7 @@
    <div class="input-item">
      <span class="input-label">{{ $t('engine.transLang') }}</span>
      <a-select
+        :disabled="currentEngine === 'vosk'"
        class="input-area"
        v-model:value="currentTargetLang"
        :options="langList.filter((item) => item.value !== 'auto')"
@@ -32,7 +33,6 @@
    <div class="input-item">
      <span class="input-label">{{ $t('engine.audioType') }}</span>
      <a-select
-        :disabled="platform !== 'win32' && platform !== 'darwin'"
        class="input-area"
        v-model:value="currentAudio"
        :options="audioType"
@@ -47,15 +47,38 @@
        <a-switch v-model:checked="showMore" />
      </div>
    </div>
-    <a-card size="small" :title="$t('engine.custom.title')" v-show="showMore">
+
+    <a-card size="small" :title="$t('engine.showMore')" v-show="showMore">
      <div class="input-item">
-        <span class="input-label">{{ $t('engine.apikey') }}</span>
+        <a-popover>
+          <template #content>
+            <p class="label-hover-info">{{ $t('engine.apikeyInfo') }}</p>
+          </template>
+          <span class="input-label info-label">{{ $t('engine.apikey') }}</span>
+        </a-popover>
        <a-input
          class="input-area"
          type="password"
          v-model:value="currentAPI_KEY"
        />
      </div>
+      <div class="input-item">
+        <a-popover>
+          <template #content>
+            <p class="label-hover-info">{{ $t('engine.modelPathInfo') }}</p>
+          </template>
+          <span class="input-label info-label">{{ $t('engine.modelPath') }}</span>
+        </a-popover>
+        <span
+          class="input-folder"
+          @click="selectFolderPath"
+        ><span><FolderOpenOutlined /></span></span>
+        <a-input
+          class="input-area"
+          style="width:calc(100% - 140px);"
+          v-model:value="currentModelPath"
+        />
+      </div>
      <div class="input-item">
        <span style="margin-right:5px;">{{ $t('engine.customEngine') }}</span>
        <a-switch v-model:checked="currentCustomized" />
@@ -85,9 +108,8 @@
            ></a-input>
          </div>
        </a-card>
-      </div>      
+      </div>
    </a-card>
-
  </a-card>
  <div style="height: 20px;"></div>
 </template>
@@ -95,23 +117,25 @@
 <script setup lang="ts">
 import { ref, computed, watch } from 'vue'
 import { storeToRefs } from 'pinia'
+import { useGeneralSettingStore } from '@renderer/stores/generalSetting'
 import { useEngineControlStore } from '@renderer/stores/engineControl'
 import { notification } from 'ant-design-vue'
-import { InfoCircleOutlined } from '@ant-design/icons-vue';
+import { FolderOpenOutlined, InfoCircleOutlined } from '@ant-design/icons-vue';
 import { useI18n } from 'vue-i18n'

 const { t } = useI18n()
 const showMore = ref(false)

 const engineControl = useEngineControlStore()
-const { platform, captionEngine, audioType, changeSignal } = storeToRefs(engineControl)
+const { captionEngine, audioType, changeSignal } = storeToRefs(engineControl)

 const currentSourceLang = ref('auto')
 const currentTargetLang = ref('zh')
-const currentEngine = ref<'gummy'>('gummy')
+const currentEngine = ref<string>('gummy')
 const currentAudio = ref<0 | 1>(0)
 const currentTranslation = ref<boolean>(false)
 const currentAPI_KEY = ref<string>('')
+const currentModelPath = ref<string>('')
 const currentCustomized = ref<boolean>(false)
 const currentCustomizedApp = ref('')
 const currentCustomizedCommand = ref('')
@@ -132,6 +156,7 @@ function applyChange(){
  engineControl.audio = currentAudio.value
  engineControl.translation = currentTranslation.value
  engineControl.API_KEY = currentAPI_KEY.value
+  engineControl.modelPath = currentModelPath.value
  engineControl.customized = currentCustomized.value
  engineControl.customizedApp = currentCustomizedApp.value
  engineControl.customizedCommand = currentCustomizedCommand.value
@@ -151,22 +176,70 @@ function cancelChange(){
  currentAudio.value = engineControl.audio
  currentTranslation.value = engineControl.translation
  currentAPI_KEY.value = engineControl.API_KEY
+  currentModelPath.value = engineControl.modelPath
  currentCustomized.value = engineControl.customized
  currentCustomizedApp.value = engineControl.customizedApp
  currentCustomizedCommand.value = engineControl.customizedCommand
 }

+function selectFolderPath() {
+  window.electron.ipcRenderer.invoke('control.folder.select').then((folderPath) => {
+    if(!folderPath) return
+    currentModelPath.value = folderPath
+  })
+}
+
 watch(changeSignal, (val) => {
  if(val == true) {
    cancelChange();
    engineControl.changeSignal = false;
  }
 })
+
+watch(currentEngine, (val) => {
+  if(val === 'vosk'){
+    currentSourceLang.value = 'auto'
+    currentTargetLang.value = ''
+  }
+  else if(val === 'gummy'){
+    currentSourceLang.value = 'auto'
+    currentTargetLang.value = useGeneralSettingStore().uiLanguage
+  }
+})
 </script>

 <style scoped>
@import url(../assets/input.css);

+.label-hover-info {
+  margin-top: 10px;
+  max-width: min(36vw, 380px);
+}
+
+.info-label {
+  color: #1677ff;
+  cursor: pointer;
+}
+
+.input-folder {
+  display:inline-block;
+  width: 40px;
+  font-size:1.38em;
+  cursor: pointer;
+  transition: all 0.25s;
+}
+
+.input-folder>span {
+  padding: 0 2px;
+  border: 2px solid #1677ff;
+  color: #1677ff;
+  border-radius: 30%;
+}
+
+.input-folder:hover {
+  transform: scale(1.1);
+}
+
 .customize-note {
  padding: 10px 10px 0;
  color: red;
--- a/src/renderer/src/components/EngineStatus.vue
+++ b/src/renderer/src/components/EngineStatus.vue
@@ -7,12 +7,42 @@
          :value="(customized && customizedApp)?$t('status.customized'):engine"
        />
      </a-col>
-      <a-col :span="6">
-        <a-statistic
-          :title="$t('status.status')"
-          :value="engineEnabled?$t('status.started'):$t('status.stopped')"
-        />
-      </a-col>
+      <a-popover :title="$t('status.engineStatus')">
+        <template #content>
+          <a-row class="engine-status">
+            <a-col :flex="1" :title="$t('status.pid')" style="cursor:pointer;">
+              <div class="engine-status-title">pid</div>
+              <div>{{ pid }}</div>
+            </a-col>
+            <a-col :flex="1" :title="$t('status.ppid')" style="cursor:pointer;">
+              <div class="engine-status-title">ppid</div>
+              <div>{{ ppid }}</div>
+            </a-col>
+            <a-col :flex="1" :title="$t('status.cpu')" style="cursor:pointer;">
+              <div class="engine-status-title">cpu</div>
+              <div>{{ cpu.toFixed(1) }}%</div>
+            </a-col>
+            <a-col :flex="1" :title="$t('status.mem')" style="cursor:pointer;">
+              <div class="engine-status-title">mem</div>
+              <div>{{ (mem/1024/1024).toFixed(2) }}MB</div>
+            </a-col>
+            <a-col :flex="1" :title="$t('status.elapsed')" style="cursor:pointer;">
+              <div class="engine-status-title">elapsed</div>
+              <div>{{ (elapsed/1000).toFixed(0) }}s</div>
+            </a-col>
+          </a-row>
+        </template>
+        <a-col :span="6" @mouseenter="getEngineInfo" style="cursor: pointer;">
+          <a-statistic
+            :title="$t('status.status')"
+            :value="engineEnabled?$t('status.started'):$t('status.stopped')"
+          >
+            <template #suffix v-if="engineEnabled">
+              <InfoCircleOutlined style="font-size:18px;color:#1677ff"/>
+            </template>
+          </a-statistic>
+        </a-col>
+      </a-popover>
      <a-col :span="6">
        <a-statistic :title="$t('status.logNumber')" :value="captionData.length" />
      </a-col>
@@ -47,7 +77,7 @@
      <p class="about-desc">{{ $t('status.about.desc') }}</p>
      <a-divider />
      <div class="about-info">
-        <p><b>{{ $t('status.about.version') }}</b><a-tag color="green">v0.3.0</a-tag></p>
+        <p><b>{{ $t('status.about.version') }}</b><a-tag color="green">v0.5.0</a-tag></p>
        <p>
          <b>{{ $t('status.about.author') }}</b>
          <a
@@ -88,11 +118,12 @@
 </template>

 <script setup lang="ts">
+import { EngineInfo } from '@renderer/types'
 import { ref } from 'vue'
 import { storeToRefs } from 'pinia'
 import { useCaptionLogStore } from '@renderer/stores/captionLog'
 import { useEngineControlStore } from '@renderer/stores/engineControl'
-import { GithubOutlined } from '@ant-design/icons-vue';
+import { GithubOutlined, InfoCircleOutlined } from '@ant-design/icons-vue';

 const showAbout = ref(false)

@@ -101,20 +132,53 @@ const { captionData } = storeToRefs(captionLog)
 const engineControl = useEngineControlStore()
 const { engineEnabled, engine, customized, customizedApp } = storeToRefs(engineControl)

+const pid = ref(0)
+const ppid = ref(0)
+const cpu = ref(0)
+const mem = ref(0)
+const elapsed = ref(0)
+
 function openCaptionWindow() {
  window.electron.ipcRenderer.send('control.captionWindow.activate')
 }

 function startEngine() {
+  if(engineControl.engine === 'vosk' && engineControl.modelPath.trim() === '') {
+    engineControl.emptyModelPathErr()
+    return
+  }
  window.electron.ipcRenderer.send('control.engine.start')
 }

 function stopEngine() {
  window.electron.ipcRenderer.send('control.engine.stop')
 }
+
+function getEngineInfo() {
+  window.electron.ipcRenderer.invoke('control.engine.info').then((data: EngineInfo) => {
+    pid.value = data.pid
+    ppid.value = data.ppid
+    cpu.value = data.cpu
+    mem.value = data.mem
+    elapsed.value = data.elapsed
+  })
+}
+
 </script>

 <style scoped>
+.engine-status {
+  width: max(420px, 36vw);
+  display: flex;
+  align-items: center;
+  padding: 5px 10px;
+}
+
+.engine-status-title {
+  font-size: 12px;
+  color: var(--tag-color);
+}
+
 .about-tag {
  color: var(--tag-color);
  margin-bottom: 16px;
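The status popover converts the raw values returned by `control.engine.info` inline in the template. The same conversions are sketched below as a standalone function, assuming `mem` is in bytes and `elapsed` is in milliseconds, which is what the template's divisors imply. `formatEngineInfo` is a hypothetical helper, not part of the commit.

```typescript
// Mirrors the EngineInfo interface added in this commit.
interface EngineInfo {
  pid: number
  ppid: number
  cpu: number
  mem: number
  elapsed: number
}

// Hypothetical helper: formats the raw values the same way the template does.
function formatEngineInfo(info: EngineInfo): { cpu: string; mem: string; elapsed: string } {
  return {
    cpu: `${info.cpu.toFixed(1)}%`,                    // percentage, one decimal place
    mem: `${(info.mem / 1024 / 1024).toFixed(2)}MB`,   // bytes -> MiB, two decimals
    elapsed: `${(info.elapsed / 1000).toFixed(0)}s`    // milliseconds -> whole seconds
  }
}
```

Keeping the formatting in one function would make the units easy to check in isolation, at the cost of one more helper to maintain.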
--- a/src/renderer/src/i18n/config/engine.ts
+++ b/src/renderer/src/i18n/config/engine.ts
@@ -16,6 +16,13 @@ export const engines = {
        { value: 'it', label: '意大利语' },
      ]
    },
+    {
+      value: 'vosk',
+      label: '本地 - Vosk',
+      languages: [
+        { value: 'auto', label: '需要自行配置模型' },
+      ]
+    }
  ],
  en: [
    {
@@ -34,6 +41,13 @@ export const engines = {
        { value: 'it', label: 'Italian' },
      ]
    },
+    {
+      value: 'vosk',
+      label: 'Local - Vosk',
+      languages: [
+        { value: 'auto', label: 'Model needs to be configured manually' },
+      ]
+    }
  ],
  ja: [
    {
@@ -52,6 +66,13 @@ export const engines = {
        { value: 'it', label: 'イタリア語' },
      ]
    },
+    {
+      value: 'vosk',
+      label: 'ローカル - Vosk',
+      languages: [
+        { value: 'auto', label: 'モデルを手動で設定する必要があります' },
+      ]
+    }
  ]
 }

--- a/src/renderer/src/i18n/lang/en.ts
+++ b/src/renderer/src/i18n/lang/en.ts
@@ -17,6 +17,8 @@ export default {
    "custom": "Type: Custom engine, engine path: ",
    "args": ", command arguments: ",
    "pidInfo": ", caption engine process PID: ",
+    "empty": "Model Path is Empty",
+    "emptyInfo": "The Vosk model path is empty. Please set it under More Settings in the caption engine settings.",
    "stopped": "Caption Engine Stopped",
    "stoppedInfo": "The caption engine has stopped. You can click the 'Start Caption Engine' button to restart it.",
    "error": "An error occurred",
@@ -48,6 +50,9 @@ export default {
    "enableTranslation": "Translation",
    "showMore": "More Settings",
    "apikey": "API KEY",
+    "modelPath": "Model Path",
+    "apikeyInfo": "API KEY required by the Gummy caption engine. It must be obtained from the Alibaba Cloud Bailian platform. For more details, see the project user manual.",
+    "modelPathInfo": "Folder path of the model required by the Vosk caption engine. Download the model to your local machine in advance. For more details, see the project user manual.",
    "customEngine": "Custom Engine",
    custom: {
      "title": "Custom Caption Engine",
@@ -86,6 +91,12 @@ export default {
  },
  status: {
    "engine": "Caption Engine",
+    "engineStatus": "Caption Engine Status",
+    "pid": "Process ID",
+    "ppid": "Parent Process ID",
+    "cpu": "CPU Usage",
+    "mem": "Memory Usage",
+    "elapsed": "Running Time",
    "customized": "Customized",
    "status": "Engine Status",
    "started": "Started",
@@ -105,21 +116,30 @@ export default {
      "projLink": "Project Link",
      "manual": "User Manual",
      "engineDoc": "Caption Engine Manual",
-      "date": "July 9, 2026"
+      "date": "July 15, 2025"
    }
  },
  log: {
    "title": "Caption Log",
-    "copy": "Copy to Clipboard",
+    "changeTime": "Modify Time",
+    "baseTime": "First Caption Start Time",
+    "hour": "Hour",
+    "min": "Minute",
+    "sec": "Second",
+    "ms": "Millisecond",
+    "export": "Export Log",
+    "copy": "Copy Log",
+    "exportOptions": "Export Options",
+    "exportFormat": "Format",
+    "exportContent": "Content",
    "copyOptions": "Copy Options",
    "addIndex": "Add Index",
    "copyTime": "Copy Time",
    "copyContent": "Content",
-    "both": "Original and Translation",
-    "source": "Original Only",
-    "translation": "Translation Only",
+    "both": "Both",
+    "source": "Original",
+    "translation": "Translation",
    "copySuccess": "Subtitle copied to clipboard",
-    "export": "Export Caption Log",
-    "clear": "Clear Caption Log"
+    "clear": "Clear Log"
  }
 }
--- a/src/renderer/src/i18n/lang/ja.ts
+++ b/src/renderer/src/i18n/lang/ja.ts
@@ -17,6 +17,8 @@ export default {
    "custom": "タイプ：カスタムエンジン、エンジンパス：",
    "args": "、コマンド引数：",
    "pidInfo": "、字幕エンジンプロセス PID：",
+    "empty": "モデルパスが空です",
+    "emptyInfo": "Vosk モデルのパスが空です。字幕エンジン設定の追加設定で Vosk モデルのパスを設定してください。",
    "stopped": "字幕エンジンが停止しました",
    "stoppedInfo": "字幕エンジンが停止しました。再起動するには「字幕エンジンを開始」ボタンをクリックしてください。",
    "error": "エラーが発生しました",
@@ -48,6 +50,9 @@ export default {
    "enableTranslation": "翻訳",
    "showMore": "詳細設定",
    "apikey": "API KEY",
+    "modelPath": "モデルパス",
+    "apikeyInfo": "Gummy 字幕エンジンに必要な API KEY は、アリババクラウド百煉プラットフォームから取得する必要があります。詳細情報はプロジェクトのユーザーマニュアルをご覧ください。",
+    "modelPathInfo": "Vosk 字幕エンジンに必要なモデルのフォルダパスです。必要なモデルを事前にローカルマシンにダウンロードする必要があります。詳細情報はプロジェクトのユーザーマニュアルをご覧ください。",
    "customEngine": "カスタムエンジン",
    custom: {
      "title": "カスタムキャプションエンジン",
@@ -86,6 +91,12 @@ export default {
  },
  status: {
    "engine": "字幕エンジン",
+    "engineStatus": "字幕エンジンの状態",
+    "pid": "プロセス ID",
+    "ppid": "親プロセス ID",
+    "cpu": "CPU 使用率", 
+    "mem": "メモリ使用量",
+    "elapsed": "稼働時間",
    "customized": "カスタマイズ済み",
    "status": "エンジン状態",
    "started": "開始済み",
@@ -105,21 +116,30 @@ export default {
      "projLink": "プロジェクトリンク",
      "manual": "ユーザーマニュアル",
      "engineDoc": "字幕エンジンマニュアル",
-      "date": "2025 年 7 月 9 日"
+      "date": "2025 年 7 月 15 日"
    }
  },
  log: {
    "title": "字幕ログ",
-    "copy": "クリップボードにコピー",
+    "changeTime": "時間を変更",
+    "baseTime": "最初の字幕開始時間",
+    "hour": "時",
+    "min": "分",
+    "sec": "秒",
+    "ms": "ミリ秒",
+    "export": "エクスポート",
+    "copy": "ログをコピー",
+    "exportOptions": "エクスポートオプション",
+    "exportFormat": "形式",
+    "exportContent": "内容",
    "copyOptions": "コピー設定",
    "addIndex": "順序番号",
    "copyTime": "時間",
    "copyContent": "内容",
-    "both": "原文と翻訳",
-    "source": "原文のみ",
-    "translation": "翻訳のみ",
+    "both": "すべて",
+    "source": "原文",
+    "translation": "翻訳",
    "copySuccess": "字幕がクリップボードにコピーされました",
-    "export": "エクスポート",
-    "clear": "字幕ログをクリア"
+    "clear": "ログをクリア"
  }
 }
--- a/src/renderer/src/i18n/lang/zh.ts
+++ b/src/renderer/src/i18n/lang/zh.ts
@@ -17,6 +17,8 @@ export default {
    "custom": "类型：自定义引擎，引擎路径：",
    "args": "，命令参数：",
    "pidInfo": "，字幕引擎进程 PID：",
+    "empty": "模型路径为空",
+    "emptyInfo": "Vosk 模型路径为空，请在字幕引擎设置的更多设置中设置 Vosk 模型的路径。",
    "stopped": "字幕引擎停止",
    "stoppedInfo": "字幕引擎已经停止，可点击“启动字幕引擎”按钮重新启动",
    "error": "发生错误",
@@ -48,6 +50,9 @@ export default {
    "enableTranslation": "启用翻译",
    "showMore": "更多设置",
    "apikey": "API KEY",
+    "modelPath": "模型路径",
+    "apikeyInfo": "Gummy 字幕引擎需要的 API KEY，需要在阿里云百炼平台获取。详细信息见项目用户手册。",
+    "modelPathInfo": "Vosk 字幕引擎需要的模型的文件夹路径，需要提前下载需要的模型到本地。信息详情见项目用户手册。",
    "customEngine": "自定义引擎",
    custom: {
      "title": "自定义字幕引擎",
@@ -86,6 +91,12 @@ export default {
  },
  status: {
    "engine": "字幕引擎",
+    "engineStatus": "字幕引擎状态",
+    "pid": "进程ID",
+    "ppid": "父进程ID",
+    "cpu": "CPU使用率",
+    "mem": "内存使用量",
+    "elapsed": "运行时间",
    "customized": "自定义",
    "status": "引擎状态",
    "started": "已启动",
@@ -105,21 +116,30 @@ export default {
      "projLink": "项目链接",
      "manual": "用户手册",
      "engineDoc": "字幕引擎手册",
-      "date": "2025 年 7 月 9 日"
+      "date": "2025 年 7 月 15 日"
    }
  },
  log: {
    "title": "字幕记录",
-    "export": "导出字幕记录",
-    "copy": "复制到剪贴板",
+    "changeTime": "修改时间",
+    "baseTime": "首条字幕起始时间",
+    "hour": "时",
+    "min": "分",
+    "sec": "秒",
+    "ms": "毫秒",
+    "export": "导出字幕",
+    "copy": "复制内容",
+    "exportOptions": "导出选项",
+    "exportFormat": "导出格式",
+    "exportContent": "导出内容",
    "copyOptions": "复制选项",
    "addIndex": "添加序号",
    "copyTime": "复制时间",
    "copyContent": "复制内容",
-    "both": "原文与翻译",
-    "source": "仅原文",
-    "translation": "仅翻译",
+    "both": "全部",
+    "source": "原文",
+    "translation": "翻译",
    "copySuccess": "字幕已复制到剪贴板",
-    "clear": "清空字幕记录"
+    "clear": "清空记录"
  }
 }
--- a/src/renderer/src/stores/engineControl.ts
+++ b/src/renderer/src/stores/engineControl.ts
@@ -1,4 +1,4 @@
-import { ref, watch } from 'vue'
+import { ref } from 'vue'
 import { defineStore } from 'pinia'

 import { notification } from 'ant-design-vue'
@@ -16,13 +16,14 @@ export const useEngineControlStore = defineStore('engineControl', () => {

  const captionEngine = ref(engines[useGeneralSettingStore().uiLanguage])
  const audioType = ref(audioTypes[useGeneralSettingStore().uiLanguage])
-  const API_KEY = ref<string>('')
  const engineEnabled = ref(false)
  const sourceLang = ref<string>('en')
  const targetLang = ref<string>('zh')
-  const engine = ref<'gummy'>('gummy')
+  const engine = ref<string>('gummy')
  const audio = ref<0 | 1>(0)
  const translation = ref<boolean>(true)
+  const API_KEY = ref<string>('')
+  const modelPath = ref<string>('')
  const customized = ref<boolean>(false)
  const customizedApp = ref<string>('')
  const customizedCommand = ref<string>('')
@@ -38,6 +39,7 @@ export const useEngineControlStore = defineStore('engineControl', () => {
      audio: audio.value,
      translation: translation.value,
      API_KEY: API_KEY.value,
+      modelPath: modelPath.value,
      customized: customized.value,
      customizedApp: customizedApp.value,
      customizedCommand: customizedCommand.value
@@ -53,12 +55,20 @@ export const useEngineControlStore = defineStore('engineControl', () => {
    engineEnabled.value = controls.engineEnabled
    translation.value = controls.translation
    API_KEY.value = controls.API_KEY
+    modelPath.value = controls.modelPath
    customized.value = controls.customized
    customizedApp.value = controls.customizedApp
    customizedCommand.value = controls.customizedCommand
    changeSignal.value = true
  }

+  function emptyModelPathErr() {
+    notification.open({
+      message: t('noti.empty'),
+      description: t('noti.emptyInfo')
+    });
+  }
+
  window.electron.ipcRenderer.on('control.controls.set', (_, controls: Controls) => {
    setControls(controls)
  })
@@ -94,15 +104,9 @@ export const useEngineControlStore = defineStore('engineControl', () => {
    });
  })

-  watch(platform, (newValue) => {
-    if(newValue !== 'win32' && newValue !== 'darwin') {
-      audio.value = 1
-    }
-  })
-
  return {
    platform,           // system platform
-    captionEngine,      // caption engine
+    captionEngine,      // caption engine list
    audioType,          // audio type
    engineEnabled,      // whether the caption engine is enabled
    sourceLang,         // source language
@@ -111,11 +115,13 @@ export const useEngineControlStore = defineStore('engineControl', () => {
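    audio,              // selected audio
    translation,        // whether translation is enabled
    API_KEY,            // API KEY
+    modelPath,          // vosk model path
    customized,         // whether to use a custom caption engine
    customizedApp,      // application for the custom caption engine
    customizedCommand,  // command for the custom caption engine
    setControls,        // set the engine configuration
    sendControlsChange, // send the latest control message to the backend
+    emptyModelPathErr,  // show a warning when the model path is empty
    changeSignal,       // configuration change signal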
  }
 })
--- a/src/renderer/src/types/index.ts
+++ b/src/renderer/src/types/index.ts
@@ -6,10 +6,11 @@ export interface Controls {
  engineEnabled: boolean,
  sourceLang: string,
  targetLang: string,
-  engine: 'gummy',
+  engine: string,
  audio: 0 | 1,
  translation: boolean,
  API_KEY: string,
+  modelPath: string,
  customized: boolean,
  customizedApp: string,
  customizedCommand: string
@@ -53,3 +54,11 @@ export interface FullConfig {
  controls: Controls,
  captionLog: CaptionItem[]
 }
+
+export interface EngineInfo {
+  pid: number,
+  ppid: number,
+  cpu: number,
+  mem: number,
+  elapsed: number
+}
--- a/src/renderer/src/utils/timeCalc.ts
+++ b/src/renderer/src/utils/timeCalc.ts
@@ -0,0 +1,42 @@
+export interface Time {
+  hh: number;
+  mm: number;
+  ss: number;
+  ms: number;
+}
+
+export function getTimeFromStr(time: string): Time {
+  const arr = time.split(":");
+  const hh = parseInt(arr[0]);
+  const mm = parseInt(arr[1]);
+  const ss = parseInt(arr[2].split(".")[0]);
+  const ms = parseInt(arr[2].split(".")[1]);
+  return { hh, mm, ss, ms };
+}
+
+export function getStrFromTime(time: Time): string {
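+  return `${String(time.hh).padStart(2, '0')}:${String(time.mm).padStart(2, '0')}:${String(time.ss).padStart(2, '0')}.${String(time.ms).padStart(3, '0')}`;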
+}
+
+export function getMsFromTime(time: Time): number {
+  return (
+    time.hh * 3600000 +
+    time.mm * 60000 +
+    time.ss * 1000 +
+    time.ms
+  );
+}
+
+export function getTimeFromMs(milliseconds: number): Time {
+  const hh = Math.floor(milliseconds / 3600000);
+  const mm = Math.floor((milliseconds % 3600000) / 60000);
+  const ss = Math.floor((milliseconds % 60000) / 1000);
+  const ms = milliseconds % 1000;
+  return { hh, mm, ss, ms };
+}
+
+export function getNewTimeStr(timeStr: string, Ms: number): string {
+  const timeMs = getMsFromTime(getTimeFromStr(timeStr));
+  const newTimeMs = timeMs + Ms;
+  return getStrFromTime(getTimeFromMs(newTimeMs));
+}
--- a/src/renderer/src/views/CaptionPage.vue
+++ b/src/renderer/src/views/CaptionPage.vue
@@ -1,24 +1,11 @@
 <template>
  <div
-  class="caption-page"
-  ref="caption"
-  :style="{
-    backgroundColor: captionStyle.backgroundRGBA
-  }"
+    class="caption-page"
+    ref="caption"
+    :style="{
+      backgroundColor: captionStyle.backgroundRGBA
+    }"
  >
-    <div class="title-bar">
-      <div class="drag-area">&nbsp;</div>
-      <div class="option-item" @click="pinCaptionWindow">
-        <PushpinFilled v-if="pinned" />
-        <PushpinOutlined v-else />
-      </div>
-      <div class="option-item" @click="openControlWindow">
-        <SettingOutlined />
-      </div>
-      <div class="option-item" @click="closeCaptionWindow">
-        <CloseOutlined />
-      </div>
-    </div>
    <div
      class="caption-container"
      :style="{
@@ -46,6 +33,20 @@
        <span v-else>{{ $t('example.translation') }}</span>
      </p>
    </div>
+
+    <div class="title-bar" :style="{color: captionStyle.fontColor}">
+      <div class="option-item" @click="closeCaptionWindow">
+        <CloseOutlined />
+      </div>
+      <div class="option-item" @click="openControlWindow">
+        <SettingOutlined />
+      </div>
+      <div class="option-item" @click="pinCaptionWindow">
+        <PushpinFilled v-if="pinned" />
+        <PushpinOutlined v-else />
+      </div>
+      <div class="drag-area"></div>
+    </div>
  </div>
 </template>

@@ -97,38 +98,21 @@ function closeCaptionWindow() {
  border-radius: 8px;
  box-sizing: border-box;
  border: 1px solid #3333;
-}
-
-.title-bar {
  display: flex;
-  align-items: center;
-}
-
-.drag-area {
-  padding: 5px;
-  flex-grow: 1;
-  -webkit-app-region: drag;
-}
-
-.option-item {
-  display: inline-block;
-  padding: 5px 10px;
-  cursor: pointer;
-}
-
-.option-item:hover {
-  background-color: #2221;
 }

 .caption-container {
+  display: inline-block;
+  width: calc(100% - 32px);
  -webkit-app-region: drag;
+  padding-top: 10px;
+  padding-bottom: 10px;
 }

 .caption-container p {
  text-align: center;
  margin: 0;
-  line-height: 1.5em;
-  padding: 0 10px 10px 10px;
+  line-height: 1.6em;
 }

 .left-ellipsis {
@@ -142,4 +126,30 @@ function closeCaptionWindow() {
  direction: ltr;
  display: inline-block;
 }
+
+.title-bar {
+  width: 32px;
+  display: flex;
+  flex-direction: column;
+  vertical-align: top;
+}
+
+.option-item {
+  width: 32px;
+  height: 32px;
+  display: flex;
+  justify-content: center;
+  align-items: center;
+  cursor: pointer;
+}
+
+.option-item:hover {
+  background-color: #2221;
+}
+
+.drag-area {
+  display: inline-flex;
+  flex-grow: 1;
+  -webkit-app-region: drag;
+}
 </style>
Author	SHA1	Message	Date
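himeditator	25b6ad5ed2	release v0.5.0 - Updated the release notes and user manual - Improved the UI display and functionality - Filter incomplete captions output by the Gummy caption engine	2025-07-15 18:48:16 +08:00
himeditator mac	760c01d79e	feat(engine): Add caption engine resource usage monitoring - Show the engine status in the control window, including PID, PPID, CPU usage, memory usage, and running time - Improve caption log export and copy, with support for choosing the exported content type	2025-07-15 13:52:10 +08:00
himeditator	a0a0a2e66d	feat(caption): Adjust the caption window and add caption timeline editing (#8) - Add caption time modification - Add export format options for the caption log, supporting srt and json - Arrange the icons in the top-right corner of the caption window vertically	2025-07-14 20:07:22 +08:00
himeditator	665c47d24f	feat(linux): Support Linux system audio output - Added support for Linux system audio output - Updated the platform compatibility information in the README and user manual - Modified the AudioStream class to support the Linux platform	2025-07-13 23:28:40 +08:00
himeditator	7f8766b13e	docs(engine-manual): Update the caption engine development docs - Added detailed instructions for specifying command-line arguments - Added steps for packaging and running a caption engine - Fixed some errors and typos in the docs	2025-07-11 13:25:52 +08:00
himeditator	6920957152	Merge branch 'dev-v0.4.0-vosk'	2025-07-11 02:32:33 +08:00
himeditator	604f8becc9	fix: Add build instructions and fix the vosk prompt logic - Improve the engine startup logic in the EngineStatus component, adding a check for the vosk engine - Add macOS screenshots to README.md, README_en.md, and README_ja.md	2025-07-11 02:31:10 +08:00
Chen Janai	0af5bab75d	Merge pull request #7 from HiMeditator/dev-v0.4.0-vosk Release v0.4.0 with Vosk Caption Engine	2025-07-11 01:36:08 +08:00
himeditator	0b8b823b2e	release v0.4.0 - Update the README and user manual with usage instructions for the Vosk engine - Modify the build configuration to support packaging the Vosk engine - Bump the version to 0.4.0 in preparation for releasing the new features	2025-07-11 01:33:04 +08:00
himeditator	d354a6fefa	feat(engine): Improve Vosk caption engine support - Implement folder selection for choosing the Vosk model path - Add a model path selection button and related hints to the EngineControl component - Add an empty model path check and error notification to the EngineStatus component	2025-07-10 11:22:39 +08:00
himeditator	1c29fd5adc	feat(engine): Add Vosk local offline engine support - Add Vosk engine configuration and recognition logic - Update the UI with a Vosk engine option and model path setting - Update dependencies, adding the vosk library	2025-07-09 19:53:30 +08:00
himeditator	f97b885411	release v0.3.0 - Update the visitor badge page_id in the README to the correct project path - Modify the extraResources configuration in electron-builder.yml	2025-07-09 02:34:15 +08:00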
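
The caption timeline editing mentioned in commit a0a0a2e66d relies on the string/millisecond round trip in `src/renderer/src/utils/timeCalc.ts`. The following is a minimal, self-contained sketch of that idea, not the project's actual code: the names `toMs`, `fromMs`, and `shiftTime` are hypothetical, timestamps are assumed to have the shape `HH:MM:SS.mmm`, and clamping negative results to zero is an assumption made only for this example.

```typescript
// Hypothetical helpers illustrating the timestamp shift; names are not from the repository.

// Parse "HH:MM:SS.mmm" into total milliseconds.
function toMs(timeStr: string): number {
  const [hh = "0", mm = "0", rest = "0"] = timeStr.split(":");
  const [ss = "0", ms = "0"] = rest.split(".");
  return (
    parseInt(hh, 10) * 3600000 +
    parseInt(mm, 10) * 60000 +
    parseInt(ss, 10) * 1000 +
    parseInt(ms, 10)
  );
}

// Format total milliseconds back to a zero-padded "HH:MM:SS.mmm" string.
function fromMs(total: number): string {
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  const hh = Math.floor(total / 3600000);
  const mm = Math.floor((total % 3600000) / 60000);
  const ss = Math.floor((total % 60000) / 1000);
  const ms = total % 1000;
  return `${pad(hh, 2)}:${pad(mm, 2)}:${pad(ss, 2)}.${pad(ms, 3)}`;
}

// Shift a timestamp by deltaMs; negative results are clamped to zero (an assumption).
function shiftTime(timeStr: string, deltaMs: number): string {
  return fromMs(Math.max(0, toMs(timeStr) + deltaMs));
}

console.log(shiftTime("00:00:59.900", 250));
```

Zero-padding on output keeps the formatted string parseable by the same parser, so a timestamp can be shifted repeatedly without changing its shape.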