auto-caption/README_en.md
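himeditator 082eb8579b docs(README): update the built-in caption engine description (#4)
- Added a detailed description of the built-in caption engines to README.md, README_en.md, and README_ja.md
- Gave the caption window a higher always-on-top priority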
2025-07-07 22:54:30 +08:00

auto-caption

Auto Caption is cross-platform software for displaying real-time captions.

| Chinese | English | Japanese |

Version v0.2.0 has been released. Version v1.0.0, which is expected to add a local caption engine, is under development...

📥 Download

GitHub Releases

Auto Caption User Manual

Caption Engine Explanation Document

Project API Documentation (Chinese)

📖 Basic Usage

Currently, only a Windows installer is provided. To use the default Gummy caption engine, you must first obtain an API key from Alibaba Cloud Model Studio and configure it as an environment variable. The engine needs this key to call the model.
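Before launching the software, it can help to confirm that the key is actually visible to new processes. The following is a minimal sketch of such a check. It assumes the variable is named `DASHSCOPE_API_KEY`, which is the name used by Alibaba Cloud's DashScope SDK; this README does not state the name, so confirm it in the user manual. The helper `api_key_configured` is hypothetical and not part of the project.

```python
import os

# Assumption: DASHSCOPE_API_KEY is the variable name the Gummy engine reads.
# This README does not name the variable; check the user manual.
API_KEY_VAR = "DASHSCOPE_API_KEY"


def api_key_configured(env=None):
    """Return True if a non-empty API key is present in the given environment."""
    env = os.environ if env is None else env
    return bool(env.get(API_KEY_VAR, "").strip())


if __name__ == "__main__":
    if api_key_configured():
        print(f"{API_KEY_VAR} is set.")
    else:
        print(f"{API_KEY_VAR} is missing or empty.")
```

On Windows, an environment variable set through the system settings is only picked up by processes started afterwards, so run the check from a new terminal.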

The international version of Alibaba Cloud does not provide the Gummy model, so non-Chinese users currently cannot use the default caption engine. I am working on a new local caption engine so that all users have access to a default caption engine.

Relevant tutorials:

If you want to understand how the caption engine works or if you want to develop your own caption engine, please refer to the Caption Engine Explanation Document.

Features

  • Multi-language interface support
  • Rich caption style settings
  • Flexible caption engine selection
  • Multi-language recognition and translation
  • Caption record display and export
  • Generate captions for audio output and microphone input

Notes:

  • The Windows platform supports generating captions for both audio output and microphone input.
  • The Linux platform currently only supports generating captions for microphone input.
  • The macOS platform is not yet supported.

⚙️ Caption Engine Description

Currently, the software ships with one caption engine, and two more are planned. The details of these engines are as follows.

Gummy Caption Engine (Cloud-based)

This engine is built on the Gummy speech translation large model from Tongyi Lab. The model runs in the cloud and is invoked through the API provided by Alibaba Cloud Model Studio (Bailian).

Detailed model parameters:

  • Supported audio sampling rates: 16 kHz and above
  • Audio bit depth: 16-bit
  • Supported audio channels: Mono
  • Recognizable languages: Chinese, English, Japanese, Korean, German, French, Russian, Italian, Spanish
  • Supported translations:
    • Chinese → English, Japanese, Korean
    • English → Chinese, Japanese, Korean
    • Japanese, Korean, German, French, Russian, Italian, Spanish → Chinese or English
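The translation rules above can be captured in a small lookup table, which is handy when validating a user's language selection before starting the engine. This is an illustrative sketch only: the dictionary layout and the helper `can_translate` are hypothetical, and the language names are simply the ones listed above rather than the codes the API may expect.

```python
# Source languages that can only be translated into Chinese or English.
OTHER_SOURCES = ["Japanese", "Korean", "German", "French",
                 "Russian", "Italian", "Spanish"]

# Map each recognizable source language to its supported target languages.
SUPPORTED_TRANSLATIONS = {
    "Chinese": {"English", "Japanese", "Korean"},
    "English": {"Chinese", "Japanese", "Korean"},
}
for source in OTHER_SOURCES:
    SUPPORTED_TRANSLATIONS[source] = {"Chinese", "English"}


def can_translate(source, target):
    """Return True if the listed rules allow translating source into target."""
    return target in SUPPORTED_TRANSLATIONS.get(source, set())


if __name__ == "__main__":
    print(can_translate("Chinese", "Korean"))   # True
    print(can_translate("German", "Japanese"))  # False
```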

Network Traffic Consumption:

The caption engine samples audio at the native sampling rate (assumed here to be 48 kHz), with a 16-bit sample depth and a single channel, so the upload rate is approximately:

$$
48000\,\text{samples/second} \times 2\,\text{bytes/sample} \times 1\,\text{channel} = 96000\,\text{bytes/s} = 93.75\,\text{KB/s}
$$

where 1 KB = 1024 bytes.

The traffic consumption for returning the model results is relatively small and can be disregarded.
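The estimate above is easy to reproduce for other sampling rates. The sketch below is plain arithmetic on the figures given in this section (16-bit samples, mono) and takes 1 KB as 1024 bytes; the function name `upload_rate_bytes` is hypothetical.

```python
def upload_rate_bytes(sample_rate_hz, bytes_per_sample=2, channels=1):
    """Raw PCM upload rate in bytes per second."""
    return sample_rate_hz * bytes_per_sample * channels


if __name__ == "__main__":
    rate = upload_rate_bytes(48000)
    print(rate / 1024)             # 93.75 (KB/s)
    print(rate * 3600 / 1024**2)   # MB uploaded per hour of captioning
```

At 48 kHz this works out to roughly 330 MB per hour of continuous captioning, which is worth keeping in mind on a metered connection.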

Vosk Caption Engine (Local)

Planned to be built on vosk-api; currently in the experimental stage.

FunASR Caption Engine (Local)

If feasible, this engine will be built on FunASR. Research and feasibility verification have not yet been conducted.

🚀 Project Execution

Install Dependencies

npm install

Build Caption Engine

First, navigate to the caption-engine folder and execute the following command to create a virtual environment:

python -m venv subenv

Then activate the virtual environment:

# Windows
subenv\Scripts\activate
# Linux
source subenv/bin/activate

Next, install the dependencies (note that if you are in a Linux environment, you should comment out PyAudioWPatch in requirements.txt, as this module is only applicable to the Windows environment):

pip install -r requirements.txt
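The Linux adjustment to requirements.txt described above can also be scripted instead of edited by hand. The following is a minimal sketch, not part of the project: the helpers `comment_out_package` and `patch_requirements` are hypothetical, and the matching is a simple prefix check on each line.

```python
from pathlib import Path


def comment_out_package(lines, package="PyAudioWPatch"):
    """Return the lines with any active requirement for `package` commented out."""
    result = []
    for line in lines:
        # Lines that are already commented start with '#', so they never match.
        if line.strip().lower().startswith(package.lower()):
            result.append("# " + line)
        else:
            result.append(line)
    return result


def patch_requirements(path="requirements.txt"):
    """Rewrite the requirements file in place with the package commented out."""
    file = Path(path)
    lines = file.read_text(encoding="utf-8").splitlines()
    file.write_text("\n".join(comment_out_package(lines)) + "\n", encoding="utf-8")
```

On Linux, calling `patch_requirements()` from the caption-engine folder before running pip would apply the change. Running it twice is harmless, because a line that is already commented out is left alone.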

Then build the project using pyinstaller:

pyinstaller --onefile main-gummy.py

At this point, the caption engine is built. You can find the executable file in the caption-engine/dist folder.

Run the Project

npm run dev

Build the Project

Note that the software is not yet adapted for macOS. Build on Windows or Linux; Windows is recommended because it supports the full feature set.

# For Windows
npm run build:win
# For macOS, not available yet
npm run build:mac
# For Linux
npm run build:linux
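If the build is driven from a script, the right npm command can be picked from the current platform. This is a hedged sketch: the npm script names come from the commands above, while the helper `build_command` and the decision to reject macOS outright are assumptions made for illustration.

```python
import sys

# npm scripts for the platforms this README describes as buildable.
BUILD_SCRIPTS = {
    "win32": "build:win",
    "linux": "build:linux",
}


def build_command(platform=None):
    """Return the npm command for the given sys.platform value."""
    platform = sys.platform if platform is None else platform
    script = BUILD_SCRIPTS.get(platform)
    if script is None:
        raise RuntimeError(f"No supported build target for platform {platform!r}")
    return ["npm", "run", script]


if __name__ == "__main__":
    try:
        print(" ".join(build_command()))
    except RuntimeError as error:
        print(error)
```

The returned list can be passed to `subprocess.run` from the project root if you want the script to start the build itself.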