mirror of https://github.com/HiMeditator/auto-caption.git synced 2026-05-13 06:57:30 +08:00


himeditator 606f9b480b release v0.3.0

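- Added caption settings for font weight, text shadow, and more
- Updated the related documentation with descriptions of the new features
- Fixed a bug in loading colors for the system theme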

2025-07-09 01:33:21 +08:00

.vscode

refactor(caption-engine): restructure the caption engine code

2025-07-07 22:54:30 +08:00

assets

release v0.3.0

2025-07-09 01:33:21 +08:00

build

build: adapt for macOS, update icon assets, and bump the project version

2025-07-08 13:27:44 +08:00

caption-engine

feat(gummy): support adding an API KEY via settings

2025-07-08 21:05:43 +08:00

docs

release v0.3.0

2025-07-09 01:33:21 +08:00

engine-test

feat(sysaudio): support capturing system audio streams on macOS

2025-07-08 17:04:15 +08:00

src

release v0.3.0

2025-07-09 01:33:21 +08:00

.editorconfig

refactor(caption-engine): restructure the caption engine code

2025-07-07 22:54:30 +08:00

.gitignore

feat(sysaudio): support capturing system audio streams on macOS

2025-07-08 17:04:15 +08:00

.npmrc

refactor: restructure the project backend

2025-07-01 21:50:33 +08:00

.prettierignore

init repo

2025-05-11 21:41:22 +08:00

.prettierrc.yaml

init repo

2025-05-11 21:41:22 +08:00

electron-builder.yml

feat(gummy): support adding an API KEY via settings

2025-07-08 21:05:43 +08:00

electron.vite.config.ts

init repo

2025-05-11 21:41:22 +08:00

eslint.config.mjs

feat: implement simple captions

2025-05-11 23:50:31 +08:00

LICENSE

feat: update README and add a feature to clear the caption history

2025-06-22 00:17:43 +08:00

package-lock.json

feat(gummy): support adding an API KEY via settings

2025-07-08 21:05:43 +08:00

package.json

feat(gummy): support adding an API KEY via settings

2025-07-08 21:05:43 +08:00

README_en.md

release v0.3.0

2025-07-09 01:33:21 +08:00

README_ja.md

release v0.3.0

2025-07-09 01:33:21 +08:00

README.md

release v0.3.0

2025-07-09 01:33:21 +08:00

tsconfig.json

init repo

2025-05-11 21:41:22 +08:00

tsconfig.node.json

init repo

2025-05-11 21:41:22 +08:00

tsconfig.web.json

init repo

2025-05-11 21:41:22 +08:00

README_en.md

auto-caption

Auto Caption is a cross-platform real-time caption display software.

| 简体中文 | English | 日本語 |

Version v0.3.0 has been released. Version v1.0.0, which is expected to add a local caption engine, is still under development...

📥 Download

GitHub Releases

📚 Documentation

Auto Caption User Manual

Caption Engine Documentation

Project API Documentation (Chinese)

📖 Basic Usage

Installable builds are currently provided for Windows and macOS. To use the default Gummy caption engine, first obtain an API KEY from the Alibaba Cloud Bailian platform, then either enter it in the software settings or set it in an environment variable (reading the API KEY from environment variables is supported only on Windows).
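The key lookup described above could look roughly like the following minimal sketch. The function `resolve_api_key` is hypothetical, and so is the assumption that a key entered in settings takes priority over the environment variable. `DASHSCOPE_API_KEY` is the variable the DashScope SDK reads, but this README does not say which variable Auto Caption itself reads.

```python
import os
from typing import Mapping, Optional

# Assumption: DASHSCOPE_API_KEY is the DashScope SDK's variable; the name
# Auto Caption actually reads is not stated in the README.
ENV_VAR = "DASHSCOPE_API_KEY"


def resolve_api_key(settings_key: Optional[str], platform: str,
                    environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: return the API key to use, or None if none is set."""
    if settings_key:  # assumed: a key entered in settings wins
        return settings_key
    if platform == "win32":  # per the README, env-var lookup is Windows-only
        return environ.get(ENV_VAR) or None
    return None
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.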

The international edition of Alibaba Cloud does not offer the Gummy model, so non-Chinese users currently cannot use the default caption engine. I am developing a new local caption engine so that every user has a default caption engine available.

✨ Features

Cross-platform, multi-language UI support
Rich caption style settings
Flexible caption engine selection
Multi-language recognition and translation
Caption recording display and export
Generate captions for audio output or microphone input

Notes:

On Windows and macOS, captions can be generated for both audio output and microphone input, but capturing system audio output on macOS requires additional setup. See the Auto Caption User Manual for details.
On Linux, system audio output cannot currently be captured, so captions can only be generated for microphone input.
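The platform notes above amount to a small lookup from operating system to available audio sources. The following sketch encodes them; the function name and the source labels `"output"` and `"mic"` are hypothetical and are not identifiers from the project.

```python
import sys


def available_sources(platform: str = sys.platform) -> list[str]:
    """Hypothetical helper: audio sources captions can be generated for."""
    if platform in ("win32", "darwin"):
        # macOS additionally needs extra setup to capture system output
        return ["output", "mic"]
    if platform.startswith("linux"):
        # System audio output capture is not yet supported on Linux
        return ["mic"]
    raise ValueError(f"unsupported platform: {platform!r}")
```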

⚙️ Built-in Caption Engines

The software currently ships with one caption engine, and two more are planned. Details follow.

Gummy Caption Engine (Cloud)

Built on Tongyi Lab's Gummy speech translation model. The engine calls this cloud model through the Alibaba Cloud Bailian API.

Model Parameters:

Supported audio sample rate: 16 kHz or higher
Audio sample depth: 16-bit
Supported audio channels: Mono
Recognizable languages: Chinese, English, Japanese, Korean, German, French, Russian, Italian, Spanish
Supported translations:
- Chinese → English, Japanese, Korean
- English → Chinese, Japanese, Korean
- Japanese, Korean, German, French, Russian, Italian, Spanish → Chinese or English
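The model parameters above can be written down as data and checked before audio is sent. This is a sketch only: the two-letter language codes are labels chosen for illustration and may not match what the Gummy API expects, and `check_audio_format` and `can_translate` are hypothetical helpers.

```python
# Illustrative language labels, not necessarily the API's own codes
RECOGNIZABLE = {"zh", "en", "ja", "ko", "de", "fr", "ru", "it", "es"}

# Source language -> supported translation targets, as listed above
TRANSLATIONS = {
    "zh": {"en", "ja", "ko"},
    "en": {"zh", "ja", "ko"},
    **{src: {"zh", "en"} for src in ("ja", "ko", "de", "fr", "ru", "it", "es")},
}


def check_audio_format(sample_rate: int, sample_width_bytes: int, channels: int) -> list[str]:
    """Return a list of problems; an empty list means the format fits the model."""
    problems = []
    if sample_rate < 16000:
        problems.append("sample rate must be at least 16 kHz")
    if sample_width_bytes != 2:
        problems.append("sample depth must be 16-bit")
    if channels != 1:
        problems.append("audio must be mono")
    return problems


def can_translate(source: str, target: str) -> bool:
    """True if the source -> target pair appears in the supported list."""
    return target in TRANSLATIONS.get(source, set())
```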

Network Traffic Consumption:

The caption engine samples at the device's native sample rate (assumed here to be 48 kHz), with 16-bit depth and a single (mono) channel, so the upload rate is approximately:


$$
48000\ \text{samples/second} \times 2\ \text{bytes/sample} \times 1\ \text{channel} = 96000\ \text{bytes/second} = 93.75\ \text{KB/s} \quad (1\ \text{KB} = 1024\ \text{bytes})
$$

The engine uploads data only while it is receiving an audio stream, so the actual upload rate may be lower. Download traffic for the returned model results is small and is ignored here.
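The same arithmetic as a small function makes it easy to try other sample rates or estimate an hour of continuous use. It is a sketch of the raw PCM calculation only, using 1 KB = 1024 bytes as the 93.75 KB/s figure implies. It ignores protocol overhead and assumes audio flows the whole time, so it gives an upper bound.

```python
def upload_rate_kb_per_s(sample_rate: int, sample_width_bytes: int = 2, channels: int = 1) -> float:
    """Raw PCM upload rate in KB/s, with 1 KB = 1024 bytes."""
    bytes_per_second = sample_rate * sample_width_bytes * channels
    return bytes_per_second / 1024


def upload_mb_per_hour(sample_rate: int, sample_width_bytes: int = 2, channels: int = 1) -> float:
    """Upload volume for one hour of continuous audio, in MB (1 MB = 1024 KB)."""
    return upload_rate_kb_per_s(sample_rate, sample_width_bytes, channels) * 3600 / 1024


print(upload_rate_kb_per_s(48000))          # 93.75
print(upload_rate_kb_per_s(16000))          # 31.25
print(round(upload_mb_per_hour(48000), 1))  # 329.6
```

At the model's 16 kHz minimum, the raw upload rate would be one third of the 48 kHz figure.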

Vosk Caption Engine (Local)

Planned to be built on vosk-api; currently experimental.

FunASR Caption Engine (Local)

Will be built on FunASR if feasible. Its feasibility has not yet been investigated.

🚀 Project Setup

Install Dependencies

npm install

Build the Caption Engine

First, enter the caption-engine folder and run the following commands to create a virtual environment:

# in ./caption-engine folder
python -m venv subenv
# or
python3 -m venv subenv

Then activate the virtual environment:

# Windows
subenv\Scripts\activate
# Linux or macOS
source subenv/bin/activate
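To confirm that the activation worked before installing anything, one option is to compare `sys.prefix` with `sys.base_prefix`. Inside a venv, `sys.prefix` points at the environment while `sys.base_prefix` still points at the interpreter it was created from. The script below is a small sketch; the file name you save it under is up to you.

```python
import sys


def in_virtualenv() -> bool:
    # In a venv, sys.prefix is the environment directory and
    # sys.base_prefix is the base interpreter's installation
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```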

Then install the dependencies. On Linux or macOS, first comment out PyAudioWPatch in requirements.txt, because this module is Windows-only.

This step may fail, usually because a package cannot be built. If so, install the build tools named in the error messages and try again.

pip install -r requirements.txt

Then use pyinstaller to build the project:

pyinstaller --onefile main-gummy.py

After the build completes, the executable is in the caption-engine/dist folder. You can then continue with the steps below.
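PyInstaller names a `--onefile` build after the script, so the command above should produce `dist/main-gummy.exe` on Windows and `dist/main-gummy` on macOS and Linux. The following is a small sketch for checking that the file exists before running the app. `engine_executable` is a hypothetical helper, and it assumes the script is run from the project root.

```python
import sys
from pathlib import Path


def engine_executable(engine_dir: Path, script: str = "main-gummy.py",
                      platform: str = sys.platform) -> Path:
    """Expected path of the PyInstaller --onefile output for `script`."""
    name = Path(script).stem  # PyInstaller names the executable after the script
    if platform == "win32":
        name += ".exe"
    return engine_dir / "dist" / name


if __name__ == "__main__":
    exe = engine_executable(Path("caption-engine"))
    print(exe, "exists" if exe.exists() else "is missing; build the caption engine first")
```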

Run Project

npm run dev

Build Project

Note: the software has so far been built and tested only on Windows and macOS, so correct operation on Linux is not guaranteed.

# For Windows
npm run build:win
# For macOS
npm run build:mac
# For Linux
npm run build:linux
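If the build is driven from a script, the choice among the three npm scripts above can be made from `sys.platform`. This is a sketch; `build_command` and `BUILD_SCRIPTS` are hypothetical names. On Windows, npm is installed as a `.cmd` shim, so the sketch calls `npm.cmd` explicitly so that `subprocess` can find it without `shell=True`.

```python
import sys

# npm scripts listed in this README, keyed by sys.platform value
BUILD_SCRIPTS = {"win32": "build:win", "darwin": "build:mac", "linux": "build:linux"}


def build_command(platform: str = sys.platform) -> list[str]:
    """Return the npm command line that builds the app for `platform`."""
    key = "linux" if platform.startswith("linux") else platform
    if key not in BUILD_SCRIPTS:
        raise ValueError(f"no build script for platform {platform!r}")
    npm = "npm.cmd" if platform == "win32" else "npm"  # npm is a .cmd shim on Windows
    return [npm, "run", BUILD_SCRIPTS[key]]
```

Usage: `subprocess.run(build_command(), check=True)` from the project root runs the matching build.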

Languages

TypeScript 43.1%

Vue 30.8%

Python 25.2%

JavaScript 0.4%

CSS 0.3%

Other 0.2%