auto-caption

Auto Caption is a cross-platform application that displays subtitles in real time.

| 简体中文 | English |

⚠️ Attention

The software's interface is currently available only in Chinese. An English version has not been implemented yet.

📥 Download

GitHub Releases

📚 User Manual

Auto Caption User Manual (Chinese)

Caption Engine Documentation (Chinese)

Basic Usage

Currently, only a Windows installer is provided. The default Gummy subtitle engine requires an API KEY from Alibaba Cloud's Bailian platform, configured as an environment variable, before the model can be used. Related tutorials: Get API KEY, Configure API Key through Environment Variables.

Developers can also create their own subtitle engine. For instructions, see the Caption Engine Documentation (Chinese).

Features

  • Rich subtitle style settings
  • Flexible subtitle engine selection
  • Multi-language recognition and translation
  • Subtitle record display and export
  • Generate subtitles for audio output and microphone input

Note: The Windows platform supports generating subtitles for both audio output and microphone input, while the Linux platform only supports generating subtitles for microphone input.

🚀 Project Execution

Install Dependencies

npm install

Build Subtitle Engine

Background

If you are a developer and want to develop a custom subtitle engine, please refer to the Caption Engine Documentation (Chinese).

A subtitle engine is a subprocess that captures streaming audio in real time from the system's audio input (recording) or audio output (playback) and calls a speech-to-text model to generate subtitles for that audio. The generated subtitles are serialized as JSON strings and returned to the main program via IPC. The main program reads the subtitle data, processes it, and displays it in the window.
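As a rough illustration of this data flow, the sketch below shows how a Python subprocess could serialize one caption as a JSON string and write it to a stream for the parent process to read. Using standard output as the IPC channel, the helper names `encode_caption` and `emit_caption`, and the field names `text` and `translation` are all assumptions made for illustration. The project's actual message format is described in the Caption Engine Documentation.

```python
import json
import sys


def encode_caption(text: str, translation: str = "") -> str:
    """Serialize one caption as a single-line JSON string (hypothetical schema)."""
    payload = {"text": text, "translation": translation}
    # ensure_ascii=False keeps non-ASCII subtitle text readable in the output.
    return json.dumps(payload, ensure_ascii=False)


def emit_caption(text: str, translation: str = "", stream=sys.stdout) -> None:
    """Write one JSON line and flush so the reader sees it without buffering delay."""
    stream.write(encode_caption(text, translation) + "\n")
    stream.flush()


if __name__ == "__main__":
    # Assumed usage: the parent process reads stdout line by line and parses each line as JSON.
    emit_caption("hello world", "bonjour le monde")
```

Writing one JSON object per line keeps parsing on the receiving side simple, and flushing after every caption matters because a piped stdout is block-buffered by default.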

By default, the project uses the Alibaba Cloud Gummy model, which only works once you have obtained an API KEY from Alibaba Cloud's Bailian platform and configured it as an environment variable. Related tutorials: Get API KEY, Configure API Key through Environment Variables.
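A minimal sketch of how an engine script might verify that the key is configured before starting. The variable name `DASHSCOPE_API_KEY` is an assumption based on the name the DashScope SDK conventionally reads, so confirm it against the tutorials above. `get_api_key` is a hypothetical helper, not part of this project.

```python
import os


def get_api_key(env=os.environ, var_name: str = "DASHSCOPE_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error if it is missing."""
    # var_name is an assumed default; adjust it if the tutorials specify another name.
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; configure it as an environment variable "
            "before starting the subtitle engine."
        )
    return key
```

Failing early with an explicit message is usually easier to diagnose than an authentication error raised later by the model call.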

The Gummy subtitle engine in this project is a Python subprocess, packaged into an executable file with pyinstaller. The code that launches the subtitle engine subprocess is in the src\main\utils\engine.ts file.

First, enter the python-subprocess folder and execute the following command to create a virtual environment:

python -m venv subenv

Then activate the virtual environment:

# Windows
subenv\Scripts\activate
# Linux
source subenv/bin/activate

Then install the dependencies. On Linux, first comment out PyAudioWPatch in requirements.txt, because that module only works on Windows:

pip install -r requirements.txt
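If you prefer not to edit requirements.txt by hand, the Linux step can be scripted. The sketch below comments out Windows-only packages when the current platform is not Windows. `filter_requirements` is a hypothetical helper, and it assumes each requirement sits on its own line beginning with the package name.

```python
import re
import sys


def filter_requirements(lines, platform=sys.platform, windows_only=("pyaudiowpatch",)):
    """Return the requirement lines, commenting out Windows-only packages elsewhere."""
    if platform.startswith("win"):
        return list(lines)
    result = []
    for line in lines:
        # Take the package name only, dropping version specifiers, extras and markers.
        name = re.split(r"[<>=!~;\[\s]", line.strip(), maxsplit=1)[0].lower()
        result.append("# " + line if name in windows_only else line)
    return result


if __name__ == "__main__":
    # Assumed usage: run from the python-subprocess folder, then install from the new file.
    with open("requirements.txt", encoding="utf-8") as src:
        filtered = filter_requirements(src.read().splitlines())
    with open("requirements-filtered.txt", "w", encoding="utf-8") as dst:
        dst.write("\n".join(filtered) + "\n")
```

The filtered file can then be installed with pip install -r requirements-filtered.txt, leaving the original requirements.txt unchanged.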

Then build the subtitle engine using pyinstaller:

pyinstaller --onefile main-gummy.py

At this point, the subtitle engine is built. The executable file is in the python-subprocess/dist folder, and you can proceed to the next steps.
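To confirm the build output before moving on, a small check like the following may help. It relies on PyInstaller's --onefile behavior of naming the executable after the script (main-gummy, with an .exe suffix on Windows). `engine_executable` is a hypothetical helper, and the relative path assumes you run it from the repository root.

```python
import sys
from pathlib import Path


def engine_executable(dist_dir="python-subprocess/dist", platform=sys.platform) -> Path:
    """Return the path where the --onefile build of main-gummy.py is expected."""
    name = "main-gummy.exe" if platform.startswith("win") else "main-gummy"
    return Path(dist_dir) / name


if __name__ == "__main__":
    exe = engine_executable()
    status = "found" if exe.is_file() else "missing"
    print(f"{exe}: {status}")
```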

Run the Project

npm run dev

Build the Project

Please note that the software is not yet compatible with macOS. Build on Windows or Linux. Windows is recommended because it supports the full feature set.

# For Windows
npm run build:win
# For macOS
npm run build:mac
# For Linux
npm run build:linux