auto-caption

Auto Caption is cross-platform software for displaying real-time captions.

| 简体中文 | English | 日本語 |

Version v0.5.1 has been released. The current Vosk local caption engine performs poorly and does not include translation. A better caption engine is under development...

📥 Download

GitHub Releases

📚 Documentation

Auto Caption User Manual

Caption Engine Documentation

Project API Documentation (Chinese)

📖 Basic Usage

The software runs on Windows, macOS, and Linux. It has been tested on the following platforms:

| OS Version | Architecture | System Audio Input | System Audio Output |
| --- | --- | --- | --- |
| Windows 11 24H2 | x64 | | |
| macOS Sequoia 15.5 | arm64 | | Additional config required |
| Ubuntu 24.04.2 | x64 | | |
| Kali Linux 2022.3 | x64 | | |
| Kylin Server V10 SP3 | x64 | | |

Additional configuration is required to capture system audio output on macOS and Linux. See the Auto Caption User Manual for details.

The international version of Alibaba Cloud services does not provide the Gummy model, so non-Chinese users currently cannot use the Gummy caption engine.

To use the default Gummy caption engine (which uses cloud-based models for speech recognition and translation), first obtain an API KEY from the Alibaba Cloud Bailian platform. Then either add the API KEY in the software settings or set it in an environment variable (reading the API KEY from environment variables is supported only on Windows). Related tutorials:

The recognition performance of Vosk models is suboptimal; please use with caution.

To use the Vosk local caption engine, first download the model you need from the Vosk Models page, extract it locally, and add the path of the extracted model folder to the software settings. Currently, the Vosk caption engine does not support translated captions.
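To check that a downloaded model works before pointing the software at it, a short standalone script can help. The following is a minimal sketch, not the project's engine code: `transcribe_wav` and `extract_text` are hypothetical helper names, and it assumes the `vosk` Python package is installed and that the input is a 16-bit mono PCM WAV file.

```python
import json
import wave


def extract_text(result_json: str) -> str:
    """Return the "text" field of a Vosk JSON result string ("" if absent)."""
    return json.loads(result_json).get("text", "")


def transcribe_wav(model_dir: str, wav_path: str) -> str:
    """Transcribe a 16-bit mono PCM WAV file with the model in model_dir."""
    # Imported lazily so extract_text can be used without vosk installed.
    from vosk import KaldiRecognizer, Model

    model = Model(model_dir)  # path of the extracted model folder
    pieces = []
    with wave.open(wav_path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected a 16-bit mono PCM WAV file")
        rec = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when an utterance is complete.
            if rec.AcceptWaveform(data):
                pieces.append(extract_text(rec.Result()))
        pieces.append(extract_text(rec.FinalResult()))
    return " ".join(p for p in pieces if p)
```

Calling `transcribe_wav("path/to/model", "sample.wav")` (both paths are placeholders) returns the recognized text as one string.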

If you find the above caption engines don't meet your needs and you know Python, you may consider developing your own caption engine. For detailed instructions, please refer to the Caption Engine Documentation.

Features

  • Cross-platform, multi-language UI support
  • Rich caption style settings
  • Flexible caption engine selection
  • Multi-language recognition and translation
  • Caption recording display and export
  • Generate captions for audio output or microphone input

⚙️ Built-in Subtitle Engines

Currently, the software comes with 2 subtitle engines, with 1 new engine planned. Details are as follows.

Gummy Subtitle Engine (Cloud)

Built on Tongyi Lab's Gummy Speech Translation Model, which is a cloud model called through the Alibaba Cloud Bailian API.

Model Parameters:

  • Supported audio sample rate: 16kHz and above
  • Audio sample depth: 16bit
  • Supported audio channels: Mono
  • Recognizable languages: Chinese, English, Japanese, Korean, German, French, Russian, Italian, Spanish
  • Supported translations:
    • Chinese → English, Japanese, Korean
    • English → Chinese, Japanese, Korean
    • Japanese, Korean, German, French, Russian, Italian, Spanish → Chinese or English
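The translation rules in the list above can be read as a lookup table from source language to allowed target languages. The sketch below encodes them that way; `SUPPORTED_TRANSLATIONS` and `can_translate` are hypothetical names used for illustration, and plain language names are used because the identifiers the engine actually expects are not given here.

```python
# Source language -> set of target languages, taken from the list above.
SUPPORTED_TRANSLATIONS = {
    "Chinese": {"English", "Japanese", "Korean"},
    "English": {"Chinese", "Japanese", "Korean"},
}
# The remaining recognizable languages translate only to Chinese or English.
for _lang in ("Japanese", "Korean", "German", "French",
              "Russian", "Italian", "Spanish"):
    SUPPORTED_TRANSLATIONS[_lang] = {"Chinese", "English"}


def can_translate(source: str, target: str) -> bool:
    """Return True if the source -> target pair appears in the table."""
    return target in SUPPORTED_TRANSLATIONS.get(source, set())


print(can_translate("German", "English"))   # True
print(can_translate("German", "Japanese"))  # False
```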

Network Traffic Consumption:

The subtitle engine samples at the native sample rate (assumed here to be 48kHz) with 16bit sample depth and a single mono channel, so the upload rate is approximately (taking 1 KB = 1024 bytes):

$$
48000\ \text{samples/second} \times 2\ \text{bytes/sample} \times 1\ \text{channel} = 96000\ \text{bytes/second} = 93.75\ \text{KB/s}
$$

The engine only uploads data when receiving audio streams, so the actual upload rate may be lower. The return traffic consumption of model results is small and not considered here.
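The same arithmetic can be repeated for other sample rates. This is a small sketch with a hypothetical helper name, `upload_rate_kb_per_s`, that treats 1 KB as 1024 bytes to match the figure above.

```python
def upload_rate_kb_per_s(sample_rate_hz: int = 48000,
                         bytes_per_sample: int = 2,
                         channels: int = 1) -> float:
    """Upper bound on the raw PCM upload rate, in KB/s (1 KB = 1024 bytes)."""
    return sample_rate_hz * bytes_per_sample * channels / 1024


print(upload_rate_kb_per_s())       # 93.75
print(upload_rate_kb_per_s(16000))  # 31.25
```

At the 16kHz minimum sample rate listed in the model parameters, the raw stream would need about a third of the bandwidth.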

Vosk Subtitle Engine (Local)

Built on vosk-api. It currently only transcribes audio into text in the original language and does not support translation.

FunASR Subtitle Engine (Local)

Planned to be built on FunASR if feasible. Feasibility has not yet been researched or verified.

🚀 Project Setup

Install Dependencies

npm install

Build Subtitle Engine

First enter the engine folder and execute the following commands to create a virtual environment:

# in ./engine folder
python -m venv subenv
# or
python3 -m venv subenv

Then activate the virtual environment:

# Windows
subenv/Scripts/activate
# Linux or macOS
source subenv/bin/activate

Then install dependencies. This step may fail, usually because a package fails to build; in that case, install the missing build tools indicated by the error messages and retry:

# Windows
pip install -r requirements_win.txt
# macOS
pip install -r requirements_darwin.txt
# Linux
pip install -r requirements_linux.txt
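If the install step is scripted, the requirements file can be picked from the running platform. A minimal sketch, assuming the three file names shown above; `requirements_file` is a hypothetical helper and is not part of the repository.

```python
import sys


def requirements_file(platform: str = sys.platform) -> str:
    """Map a sys.platform value to the matching requirements file name."""
    if platform.startswith("win"):
        return "requirements_win.txt"
    if platform == "darwin":
        return "requirements_darwin.txt"
    if platform.startswith("linux"):
        return "requirements_linux.txt"
    raise ValueError(f"no requirements file for platform: {platform}")


# Print the install command for the current platform.
print(f"pip install -r {requirements_file()}")
```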

If you encounter errors when installing the samplerate module on Linux systems, you can try installing it separately with this command:

pip install samplerate --only-binary=:all:

Then use pyinstaller to build the project:

pyinstaller ./main-gummy.spec
pyinstaller ./main-vosk.spec

Note that the path to the vosk library in main-vosk.spec may not match your environment. Adjust it to the actual location of the installed vosk package:

# Windows
vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
# Linux or macOS
vosk_path = str(Path('./subenv/lib/python3.x/site-packages/vosk').resolve())
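One way to avoid hardcoding the site-packages path is to ask Python where the package is installed. The sketch below uses the standard library's `importlib.util.find_spec`; `package_dir` is a hypothetical helper, and whether it fits the project's spec file is an assumption. It must run inside the activated virtual environment to find the copy of vosk installed there.

```python
from importlib.util import find_spec
from pathlib import Path


def package_dir(name: str) -> str:
    """Return the absolute directory of an installed top-level package."""
    spec = find_spec(name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"package not found: {name}")
    # spec.origin points at the package's __init__.py file.
    return str(Path(spec.origin).parent.resolve())


# In the spec file this could replace the hardcoded path, e.g.:
# vosk_path = package_dir("vosk")
print(package_dir("json"))  # a stdlib package, used here only as a demo
```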

After the build completes, you can find the executable file in the engine/dist folder. Then proceed with subsequent operations.

Run Project

npm run dev

Build Project

# For Windows
npm run build:win
# For macOS
npm run build:mac
# For Linux
npm run build:linux

Note: Before building, edit the extraResources section of electron-builder.yml in the project root directory so that it matches the target platform:

extraResources:
  # For Windows
  - from: ./engine/dist/main-gummy.exe
    to: ./engine/main-gummy.exe
  - from: ./engine/dist/main-vosk.exe
    to: ./engine/main-vosk.exe
  # For macOS and Linux
  # - from: ./engine/dist/main-gummy
  #   to: ./engine/main-gummy
  # - from: ./engine/dist/main-vosk
  #   to: ./engine/main-vosk
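The only difference between the two variants is the .exe suffix on Windows. The following sketch generates the entries for a given target; `engine_resources` is a hypothetical helper, and the platform names `win`, `mac`, and `linux` are borrowed from the npm build scripts above as an assumption.

```python
def engine_resources(platform: str,
                     engines=("main-gummy", "main-vosk")) -> list[dict]:
    """Build the extraResources entries for one target platform."""
    if platform not in ("win", "mac", "linux"):
        raise ValueError(f"unknown platform: {platform}")
    # Only Windows executables carry a file extension.
    suffix = ".exe" if platform == "win" else ""
    return [
        {"from": f"./engine/dist/{name}{suffix}",
         "to": f"./engine/{name}{suffix}"}
        for name in engines
    ]


for entry in engine_resources("linux"):
    print(f"  - from: {entry['from']}\n    to: {entry['to']}")
```

The printed lines can be pasted under extraResources in place of the Windows entries.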