11 Commits

Author SHA1 Message Date
himeditator
b658ef5440 feat(engine): optimize caption engine output format; prepare to merge the two caption engines
- Refactor caption-engine-related code
- Prepare to merge the two caption engines
2025-07-27 17:15:12 +08:00
himeditator
3792eb88b6 refactor(engine): refactor the caption engine
- Update the GummyTranslator class and optimize caption generation logic
- Remove the audioprcs module; audio processing functionality moved to the utils module
- Refactor the sysaudio module to make audio stream management more flexible and stable
- Update TODO.md, completing the feature of sorting caption records in descending time order
- Update the docs to note that the English and Japanese documentation will no longer be maintained due to limited resources
2025-07-26 23:37:24 +08:00
himeditator
8e575a9ba3 refactor(engine): rename the caption engine folder; add a descending-order option for caption records
- The caption record table can now be sorted in descending time order
- Rename caption-engine to engine
- Update the paths of related files and folders
- Update related content in the README and TODO documents
- Update the Electron build configuration
2025-07-26 21:29:16 +08:00
himeditator
697488ce84 docs: update README, add TODO 2025-07-20 00:32:57 +08:00
himeditator
f7d2df938d fix(engine): fix issues with custom caption engines 2025-07-17 20:52:27 +08:00
himeditator
5513c7e84c docs(compatibility): add Kylin OS support and update docs 2025-07-16 20:55:03 +08:00
himeditator
25b6ad5ed2 release v0.5.0
- Update the release notes and user manual
- Improve the UI display and features
- Filter incomplete captions output by the Gummy caption engine
2025-07-15 18:48:16 +08:00
himeditator mac
760c01d79e feat(engine): add caption engine resource usage monitoring
- Show engine status in the control window, including PID, PPID, CPU usage, memory usage, and uptime
- Improve caption record export and copy, allowing the exported content type to be selected
2025-07-15 13:52:10 +08:00
himeditator
a0a0a2e66d feat(caption): adjust the caption window; add caption timeline editing (#8)
- Add the ability to edit caption timestamps
- Add export types for caption records, supporting srt and json formats
- Arrange the icons in the caption window's top-right corner vertically
2025-07-14 20:07:22 +08:00
himeditator
665c47d24f feat(linux): support Linux system audio output
- Add support for capturing Linux system audio output
- Update platform compatibility information in the README and user manual
- Modify the AudioStream class to support the Linux platform
2025-07-13 23:28:40 +08:00
himeditator
7f8766b13e docs(engine-manual): update the caption engine development docs
- Add detailed notes on command-line argument specification
- Add steps for packaging and running a caption engine
- Fix some errors and typos in the docs
2025-07-11 13:25:52 +08:00
71 changed files with 1475 additions and 1092 deletions

.gitignore vendored
View File

@@ -5,8 +5,8 @@ out
.eslintcache
*.log*
__pycache__
subenv
caption-engine/build
caption-engine/models
output.wav
.venv
subenv
engine/build
engine/models
engine/notebook

.npmrc
View File

@@ -1,2 +1,2 @@
# electron_mirror=https://npmmirror.com/mirrors/electron/
# electron_builder_binaries_mirror=https://npmmirror.com/mirrors/electron-builder-binaries/
electron_mirror=https://npmmirror.com/mirrors/electron/
electron_builder_binaries_mirror=https://npmmirror.com/mirrors/electron-builder-binaries/

View File

@@ -9,6 +9,6 @@
"editor.defaultFormatter": "esbenp.prettier-vscode"
},
"python.analysis.extraPaths": [
"./caption-engine"
"./engine"
]
}

View File

@@ -4,7 +4,7 @@
<p>Auto Caption is a cross-platform real-time caption display software.</p>
<p>
<a href="https://github.com/HiMeditator/auto-caption/releases">
<img src="https://img.shields.io/badge/release-0.4.0-blue">
<img src="https://img.shields.io/badge/release-0.5.1-blue">
</a>
<a href="https://github.com/HiMeditator/auto-caption/issues">
<img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange">
@@ -18,13 +18,11 @@
| <a href="./README_en.md">English</a>
| <a href="./README_ja.md">日本語</a> |
</p>
<p><i>Version v0.4.0, which includes the Vosk local caption engine, has been released. <b>The local caption engine currently does not include translation</b>; the local translation module is still under development...</i></p>
<p><i>Version v0.5.1 has been released. <b>The current Vosk local caption engine performs poorly and does not include translation</b>; a better caption engine is under development...</i></p>
</div>
![](./assets/media/main_zh.png)
![](./assets/media/main_mac_zh.png)
## 📥 Download
[GitHub Releases](https://github.com/HiMeditator/auto-caption/releases)
@@ -39,7 +37,17 @@
## 📖 Basic Usage
Installable versions are currently provided for the Windows and macOS platforms.
The software has been adapted for the Windows, macOS, and Linux platforms. Tested platform information is as follows:
| OS Version | Architecture | System Audio Input | System Audio Output |
| ------------------ | ---------- | ---------------- | ---------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ Additional config required | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
Capturing system audio output on macOS and Linux requires additional setup; see the [Auto Caption User Manual](./docs/user-manual/zh.md) for details.
> The international edition of Alibaba Cloud services does not provide the Gummy model, so users outside China currently cannot use the Gummy caption engine.
@@ -54,7 +62,6 @@
![](./assets/media/vosk_zh.png)
**If the caption engines above do not meet your needs and you know Python, you can consider developing your own caption engine. See the [caption engine documentation](./docs/engine-manual/zh.md) for details.**
## ✨ Features
@@ -66,10 +73,6 @@
- Caption record display and export
- Generate captions from audio output or microphone input
Notes:
- Windows and macOS support generating captions from both audio output and microphone input, but **capturing system audio output on macOS requires setup; see the [Auto Caption User Manual](./docs/user-manual/zh.md) for details**
- Linux currently cannot capture system audio output and only supports generating captions from microphone input
## ⚙️ Built-in Caption Engines
The software currently ships with 2 caption engines, with 1 new engine planned. Their details are as follows.
@@ -119,10 +122,10 @@ npm install
### Build the Caption Engine
First enter the `caption-engine` folder and run the following commands to create a virtual environment:
First enter the `engine` folder and run the following commands to create a virtual environment:
```bash
# in ./caption-engine folder
# in ./engine folder
python -m venv subenv
# or
python3 -m venv subenv
@@ -137,12 +140,21 @@ subenv/Scripts/activate
source subenv/bin/activate
```
Then install the dependencies (note: on Linux or macOS, comment out `PyAudioWPatch` in `requirements.txt`; this module is Windows-only).
> This step may fail, usually because a build step failed; install the corresponding build tool packages based on the error messages.
Then install the dependencies (this step may fail, usually because a build step failed; install the corresponding tool packages based on the error messages):
```bash
pip install -r requirements.txt
# Windows
pip install -r requirements_win.txt
# macOS
pip install -r requirements_darwin.txt
# Linux
pip install -r requirements_linux.txt
```
If installing the samplerate module fails on a Linux system, you can try installing it separately with the following command:
```bash
pip install samplerate --only-binary=:all:
```
Then build the project with `pyinstaller`:
@@ -152,7 +164,7 @@ pyinstaller ./main-gummy.spec
pyinstaller ./main-vosk.spec
```
Note that the path of the `vsok` library in the `main-vosk.spec` file may be incorrect and needs to be configured for your actual setup.
Note that the path of the `vosk` library in the `main-vosk.spec` file may be incorrect and needs to be configured for your actual setup.
```
# Windows
@@ -161,16 +173,15 @@ vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
vosk_path = str(Path('./subenv/lib/python3.x/site-packages/vosk').resolve())
```
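The two `.spec` path variants above differ only in the venv's site-packages layout. A small sketch (assuming the venv is named `subenv`, as in this guide) shows how the right `vosk_path` could be chosen programmatically rather than edited by hand:

```python
import sys
from pathlib import Path

# Sketch: pick the vosk site-packages path per platform instead of
# hard-coding it in main-vosk.spec (assumes a venv named "subenv").
if sys.platform == 'win32':
    vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
else:
    ver = f"python{sys.version_info.major}.{sys.version_info.minor}"
    vosk_path = str(Path(f'./subenv/lib/{ver}/site-packages/vosk').resolve())

print(vosk_path)
```

`Path.resolve()` works even if the folder does not exist yet, so this can run before the venv is populated.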
The project build is now complete; the corresponding executable can be found in the `caption-engine/dist` folder, and you can proceed with the next steps.
The project build is now complete; the corresponding executable can be found in the `engine/dist` folder, and you can proceed with the next steps.
### Run the Project
```bash
npm run dev
```
### Build the Project
Note: the software has currently only been built and tested on Windows and macOS; correct operation on Linux is not guaranteed.
### Build the Project
```bash
# For windows
@@ -186,13 +197,13 @@ npm run build:linux
```yml
extraResources:
# For Windows
- from: ./caption-engine/dist/main-gummy.exe
to: ./caption-engine/main-gummy.exe
- from: ./caption-engine/dist/main-vosk.exe
to: ./caption-engine/main-vosk.exe
- from: ./engine/dist/main-gummy.exe
to: ./engine/main-gummy.exe
- from: ./engine/dist/main-vosk.exe
to: ./engine/main-vosk.exe
# For macOS and Linux
# - from: ./caption-engine/dist/main-gummy
# to: ./caption-engine/main-gummy
# - from: ./caption-engine/dist/main-vosk
# to: ./caption-engine/main-vosk
# - from: ./engine/dist/main-gummy
# to: ./engine/main-gummy
# - from: ./engine/dist/main-vosk
# to: ./engine/main-vosk
```

View File

@@ -4,7 +4,7 @@
<p>Auto Caption is a cross-platform real-time caption display software.</p>
<p>
<a href="https://github.com/HiMeditator/auto-caption/releases">
<img src="https://img.shields.io/badge/release-0.4.0-blue">
<img src="https://img.shields.io/badge/release-0.5.1-blue">
</a>
<a href="https://github.com/HiMeditator/auto-caption/issues">
<img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange">
@@ -18,13 +18,11 @@
| <b>English</b>
| <a href="./README_ja.md">日本語</a> |
</p>
<p><i>The v0.4.0 version with Vosk local caption engine has been released. <b>Currently the local caption engine does not include translation</b>, the local translation module is still under development...</i></p>
<p><i>Version v0.5.1 has been released. <b>The current Vosk local caption engine performs poorly and does not include translation</b>. A better caption engine is under development...</i></p>
</div>
![](./assets/media/main_en.png)
![](./assets/media/main_mac_en.png)
## 📥 Download
[GitHub Releases](https://github.com/HiMeditator/auto-caption/releases)
@@ -39,7 +37,17 @@
## 📖 Basic Usage
Currently, installable versions are available for Windows and macOS platforms.
The software has been adapted for Windows, macOS, and Linux platforms. The tested platform information is as follows:
| OS Version | Architecture | System Audio Input | System Audio Output |
| ------------------ | ------------ | ------------------ | ------------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ Additional config required | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
Additional configuration is required to capture system audio output on macOS and Linux platforms. See [Auto Caption User Manual](./docs/user-manual/en.md) for details.
> The international version of Alibaba Cloud services does not provide the Gummy model, so non-Chinese users currently cannot use the Gummy caption engine.
@@ -65,10 +73,6 @@ To use the Vosk local caption engine, first download your required model from [V
- Caption recording display and export
- Generate captions for audio output or microphone input
Notes:
- Windows and macOS platforms support generating captions for both audio output and microphone input, but **macOS requires additional setup to capture system audio output. See [Auto Caption User Manual](./docs/user-manual/en.md) for details.**
- Linux platform currently cannot capture system audio output, only supports generating subtitles for microphone input.
## ⚙️ Built-in Subtitle Engines
Currently, the software comes with 2 subtitle engines, with 1 new engine planned. Details are as follows.
@@ -118,10 +122,10 @@ npm install
### Build Subtitle Engine
First enter the `caption-engine` folder and execute the following commands to create a virtual environment:
First enter the `engine` folder and execute the following commands to create a virtual environment:
```bash
# in ./caption-engine folder
# in ./engine folder
python -m venv subenv
# or
python3 -m venv subenv
@@ -136,12 +140,21 @@ subenv/Scripts/activate
source subenv/bin/activate
```
Then install dependencies (note: for Linux or macOS environments, you need to comment out `PyAudioWPatch` in `requirements.txt`, as this module is only for Windows environments).
> This step may report errors, usually due to build failures. You need to install corresponding build tools based on the error messages.
Then install dependencies (this step may fail, usually because a build step failed; install the corresponding tool packages based on the error messages):
```bash
pip install -r requirements.txt
# Windows
pip install -r requirements_win.txt
# macOS
pip install -r requirements_darwin.txt
# Linux
pip install -r requirements_linux.txt
```
If you encounter errors when installing the `samplerate` module on Linux systems, you can try installing it separately with this command:
```bash
pip install samplerate --only-binary=:all:
```
Then use `pyinstaller` to build the project:
@@ -160,7 +173,7 @@ vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
vosk_path = str(Path('./subenv/lib/python3.x/site-packages/vosk').resolve())
```
After the build completes, you can find the executable file in the `caption-engine/dist` folder. Then proceed with subsequent operations.
After the build completes, you can find the executable file in the `engine/dist` folder. Then proceed with subsequent operations.
### Run Project
@@ -170,8 +183,6 @@ npm run dev
### Build Project
Note: Currently the software has only been built and tested on Windows and macOS platforms. Correct operation on the Linux platform is not guaranteed.
```bash
# For windows
npm run build:win
@@ -186,13 +197,13 @@ Note: You need to modify the configuration content in the `electron-builder.yml`
```yml
extraResources:
# For Windows
- from: ./caption-engine/dist/main-gummy.exe
to: ./caption-engine/main-gummy.exe
- from: ./caption-engine/dist/main-vosk.exe
to: ./caption-engine/main-vosk.exe
- from: ./engine/dist/main-gummy.exe
to: ./engine/main-gummy.exe
- from: ./engine/dist/main-vosk.exe
to: ./engine/main-vosk.exe
# For macOS and Linux
# - from: ./caption-engine/dist/main-gummy
# to: ./caption-engine/main-gummy
# - from: ./caption-engine/dist/main-vosk
# to: ./caption-engine/main-vosk
# - from: ./engine/dist/main-gummy
# to: ./engine/main-gummy
# - from: ./engine/dist/main-vosk
# to: ./engine/main-vosk
```

View File

@@ -4,7 +4,7 @@
<p>Auto Caption is a cross-platform real-time caption display software.</p>
<p>
<a href="https://github.com/HiMeditator/auto-caption/releases">
<img src="https://img.shields.io/badge/release-0.4.0-blue">
<img src="https://img.shields.io/badge/release-0.5.1-blue">
</a>
<a href="https://github.com/HiMeditator/auto-caption/issues">
<img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange">
@@ -18,13 +18,11 @@
| <a href="./README_en.md">English</a>
| <b>日本語</b> |
</p>
<p><i>Version v0.4.0, which includes the Vosk local caption engine, has been released. <b>The local caption engine currently does not include translation</b>; the local translation module is still under development...</i></p>
<p><i>Version v0.5.1 has been released. <b>The current Vosk local caption engine performs poorly and does not include translation</b>; a better caption engine is under development...</i></p>
</div>
![](./assets/media/main_ja.png)
![](./assets/media/main_mac_ja.png)
## 📥 Download
[GitHub Releases](https://github.com/HiMeditator/auto-caption/releases)
@@ -39,7 +37,17 @@
## 📖 Basic Usage
Installable versions are currently provided for the Windows and macOS platforms.
The software has been adapted for the Windows, macOS, and Linux platforms. Tested platform information is as follows:
| OS Version | Architecture | System Audio Input | System Audio Output |
| ------------------ | ------------ | ------------------ | ------------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ Additional config required | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
Capturing system audio output on macOS and Linux requires additional setup; see the [Auto Caption User Manual](./docs/user-manual/ja.md) for details.
> The international edition of Alibaba Cloud services does not provide the Gummy model, so users outside China currently cannot use the Gummy caption engine.
@@ -65,10 +73,6 @@ To use the Vosk local caption engine, first download the model you need from [Vosk Models](
- Caption record display and export
- Generate captions from audio output or microphone input
Notes:
- Windows and macOS support generating captions from both audio output and microphone input, but **capturing system audio output on macOS requires setup; see the [Auto Caption User Manual](./docs/user-manual/ja.md) for details.**
- Linux currently cannot capture system audio output and only supports generating captions from microphone input.
## ⚙️ Built-in Caption Engines
The software currently ships with 2 caption engines, with 1 new engine planned. Their details are as follows.
@@ -118,10 +122,10 @@ npm install
### Build the Caption Engine
First enter the `caption-engine` folder and run the following commands to create a virtual environment:
First enter the `engine` folder and run the following commands to create a virtual environment:
```bash
# in ./caption-engine folder
# in ./engine folder
python -m venv subenv
# or
python3 -m venv subenv
@@ -136,12 +140,21 @@ subenv/Scripts/activate
source subenv/bin/activate
```
Then install the dependencies (on Linux or macOS, comment out `PyAudioWPatch` in `requirements.txt`; this module is Windows-only).
> This step may fail, usually because a build step failed; install the corresponding build tool packages based on the error messages.
Then install the dependencies (this step may fail, usually because a build step failed; install the corresponding tool packages based on the error messages):
```bash
pip install -r requirements.txt
# Windows
pip install -r requirements_win.txt
# macOS
pip install -r requirements_darwin.txt
# Linux
pip install -r requirements_linux.txt
```
If installing the `samplerate` module fails on a Linux system, you can try installing it separately with the following command:
```bash
pip install samplerate --only-binary=:all:
```
Then build the project with `pyinstaller`:
@@ -160,7 +173,7 @@ vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
vosk_path = str(Path('./subenv/lib/python3.x/site-packages/vosk').resolve())
```
The project build is now complete; the corresponding executable can be found in the `caption-engine/dist` folder, and you can proceed with the next steps.
The project build is now complete; the corresponding executable can be found in the `engine/dist` folder, and you can proceed with the next steps.
### Run the Project
@@ -170,8 +183,6 @@ npm run dev
### Build the Project
Note: the software has currently only been built and tested on Windows and macOS; correct operation on Linux is not guaranteed.
```bash
# For Windows
npm run build:win
@@ -186,13 +197,13 @@ npm run build:linux
```yml
extraResources:
# For Windows
- from: ./caption-engine/dist/main-gummy.exe
to: ./caption-engine/main-gummy.exe
- from: ./caption-engine/dist/main-vosk.exe
to: ./caption-engine/main-vosk.exe
- from: ./engine/dist/main-gummy.exe
to: ./engine/main-gummy.exe
- from: ./engine/dist/main-vosk.exe
to: ./engine/main-vosk.exe
# For macOS and Linux
# - from: ./caption-engine/dist/main-gummy
# to: ./caption-engine/main-gummy
# - from: ./caption-engine/dist/main-vosk
# to: ./caption-engine/main-vosk
# - from: ./engine/dist/main-gummy
# to: ./engine/main-gummy
# - from: ./engine/dist/main-vosk
# to: ./engine/main-vosk
```

Binary file not shown.

Before

Width:  |  Height:  |  Size: 462 KiB

After

Width:  |  Height:  |  Size: 367 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 477 KiB

After

Width:  |  Height:  |  Size: 382 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 2.7 MiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 2.7 MiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 2.8 MiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 468 KiB

After

Width:  |  Height:  |  Size: 372 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 321 KiB

After

Width:  |  Height:  |  Size: 323 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 324 KiB

After

Width:  |  Height:  |  Size: 324 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 323 KiB

After

Width:  |  Height:  |  Size: 324 KiB

Binary file not shown.

View File

@@ -1 +0,0 @@
from .process import mergeChunkChannels, resampleRawChunk, resampleMonoChunk

View File

@@ -1,73 +0,0 @@
"""Capture the Linux system audio input stream"""
import pyaudio
class AudioStream:
"""
Capture the system audio stream
Init parameters:
audio_type: 0 - system audio output stream (unsupported, has no effect); 1 - system audio input stream (default)
chunk_rate: number of audio chunks captured per second (default: 20)
"""
def __init__(self, audio_type=1, chunk_rate=20):
self.audio_type = audio_type
self.mic = pyaudio.PyAudio()
self.device = self.mic.get_default_input_device_info()
self.stream = None
self.SAMP_WIDTH = pyaudio.get_sample_size(pyaudio.paInt16)
self.FORMAT = pyaudio.paInt16
self.CHANNELS = self.device["maxInputChannels"]
self.RATE = int(self.device["defaultSampleRate"])
self.CHUNK = self.RATE // chunk_rate
self.INDEX = self.device["index"]
def printInfo(self):
dev_info = f"""
Sampling input device:
- Device type: { "audio input (currently the only option supported on Linux)" }
- Index: {self.device['index']}
- Name: {self.device['name']}
- Max input channels: {self.device['maxInputChannels']}
- Default low input latency: {self.device['defaultLowInputLatency']}s
- Default high input latency: {self.device['defaultHighInputLatency']}s
- Default sample rate: {self.device['defaultSampleRate']}Hz
Audio chunk size: {self.CHUNK}
Sample width: {self.SAMP_WIDTH}
Sample format: {self.FORMAT}
Channels: {self.CHANNELS}
Sample rate: {self.RATE}
"""
print(dev_info)
def openStream(self):
"""
Open and return the system audio input stream
"""
if self.stream: return self.stream
self.stream = self.mic.open(
format = self.FORMAT,
channels = int(self.CHANNELS),
rate = self.RATE,
input = True,
input_device_index = int(self.INDEX)
)
return self.stream
def read_chunk(self):
"""
Read one chunk of audio data
"""
if not self.stream: return None
return self.stream.read(self.CHUNK)
def closeStream(self):
"""
Close the system audio stream
"""
if self.stream is None: return
self.stream.stop_stream()
self.stream.close()
self.stream = None

View File

@@ -87,3 +87,30 @@
### Improvements
- Change the color of the icons in the caption window's top-right corner to match the caption source-text font color
## v0.5.0
2025-07-15
Added more features to the app itself and adapted it for Linux.
### New Features
- Adapted for the Linux platform
- Added caption timestamp editing, allowing caption times to be adjusted
- Support exporting caption records in srt format
- Support displaying caption engine status (pid, ppid, CPU usage, memory usage, uptime)
### Improvements
- Arrange the icons in the caption window's top-right corner vertically
- Filter incomplete captions output by the Gummy caption engine
## v0.5.1
2025-07-17
### Bug Fixes
- Fixed a bug where custom caption engines could not be invoked
- Fixed a bug where custom caption engine parameters did not take effect

View File

@@ -10,12 +10,22 @@
- [x] Adapt to the macOS platform *2025/07/08*
- [x] Add caption text stroke *2025/07/09*
- [x] Add a Vosk-based caption engine *2025/07/09*
- [x] Adapt to the Linux platform *2025/07/13*
- [x] Arrange the caption window's top-right icons vertically *2025/07/14*
- [x] Allow adjusting the caption timeline *2025/07/14*
- [x] Allow exporting caption records in srt format *2025/07/14*
- [x] Show the caption engine's system resource usage *2025/07/15*
- [x] Add a descending-by-time sort option for caption records *2025/07/26*
## To Do
- [ ] Refactor the caption engine
- [ ] Evaluate / add a sherpa-onnx-based caption engine
## Future Plans
- [ ] Add Ollama models for local caption engine translation
- [ ] Add a local caption engine
- [ ] Evaluate / add a FunASR-based caption engine
- [ ] Evaluate / add a FunASR-based caption engine
- [ ] Reduce unnecessary app size
## The Distant Future

View File

@@ -0,0 +1,62 @@
# caption engine api-doc
This document mainly describes the communication conventions between the Electron main process and the caption engine process.
## How It Works
The project's Python process sends data to the Electron main process via standard output.
The content of the Python process's standard output (`sys.stdout`) must be line-delimited strings, and each line must be parseable as a JSON object. Every JSON object must have a `command` field.
## Output Conventions
When a JSON object's `command` field is one of the following values, it has the corresponding meaning:
### `print`
```js
{
command: "print",
content: string
}
```
Outputs content printed by the Python side.
### `info`
```js
{
command: "info",
content: string
}
```
A notice printed by the Python side; compared with `print`, this message deserves more attention from the Electron side.
### `usage`
```js
{
command: "usage",
content: string
}
```
Billing/usage information printed when the Gummy caption engine exits.
### `caption`
```js
{
command: "caption",
index: number,
time_s: string,
time_t: string,
text: string,
translation: string
}
```
Caption data converted from the audio stream captured by the Python side.
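The protocol above can be exercised with a minimal sender sketch. The field values here are placeholders, not real engine output; the point is one JSON object per line, each with a `command` field:

```python
import json
import sys

# Line buffering keeps each message parseable by the Electron main
# process as soon as it is written.
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(line_buffering=True)

def send(command: str, **fields) -> str:
    """Emit one protocol message as a single JSON line on stdout."""
    line = json.dumps({"command": command, **fields}, ensure_ascii=False)
    print(line)
    return line

send("info", content="engine started")
send("caption", index=0, time_s="00:00:01.000", time_t="00:00:03.200",
     text="hello world", translation="placeholder translation")
```

On the receiving side, each stdout line can then be fed directly to `JSON.parse` in the Electron main process.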

View File

@@ -57,6 +57,19 @@
- Send: no data
- Receive: `string`
### `control.engine.info`
**Description:** Get the caption engine's resource usage
**Sender:** frontend control window
**Receiver:** backend control window instance
**Data types:**
- Send: no data
- Receive: `EngineInfo`
## Frontend ==> Backend
### `control.uiLanguage.change`

View File

@@ -1,6 +1,8 @@
# Caption Engine Documentation
Corresponding Version: v0.4.0
Corresponding Version: v0.5.1
**Note: Due to limited personal resources, the English and Japanese documentation files for this project (except for the README document) will no longer be maintained. The content of this document may not be consistent with the latest version of the project. If you are willing to help with translation, please submit relevant Pull Requests.**
![](../../assets/media/structure_en.png)
@@ -20,7 +22,7 @@ Generally, the captured audio stream data consists of short audio chunks, and th
The acquired audio stream may need preprocessing before being converted to text. For instance, Alibaba Cloud's Gummy model can only recognize single-channel audio streams, while the collected audio streams are typically dual-channel, thus requiring conversion from dual-channel to single-channel. Channel conversion can be achieved using methods in the NumPy library.
You can directly use the audio acquisition (`caption-engine/sysaudio`) and audio processing (`caption-engine/audioprcs`) modules I have developed.
You can directly use the audio acquisition (`engine/sysaudio`) and audio processing (`engine/audioprcs`) modules I have developed.
### Audio to Text Conversion
@@ -105,10 +107,10 @@ export interface CaptionItem {
If using Python, you can refer to the following method to pass data to the main program:
```python
# caption-engine\main-gummy.py
# engine\main-gummy.py
sys.stdout.reconfigure(line_buffering=True)
# caption-engine\audio2text\gummy.py
# engine\audio2text\gummy.py
...
def send_to_node(self, data):
"""
@@ -151,6 +153,51 @@ Data receiver code is as follows:
...
```
## Usage of Caption Engine
### Command Line Parameter Specification
The custom caption engine settings are specified via command line parameters. Common required parameters are as follows:
```python
import argparse
...
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Convert system audio stream to text')
parser.add_argument('-s', '--source_language', default='en', help='Source language code')
parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output audio stream, 1 for input audio stream')
parser.add_argument('-c', '--chunk_rate', default=20, help='The number of audio stream chunks collected per second.')
parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
args = parser.parse_args()
convert_audio_to_text(
args.source_language,
args.target_language,
int(args.audio_type),
int(args.chunk_rate),
args.api_key
)
```
For example, to specify Japanese as source language, Chinese as target language, capture system audio output, and collect 0.1s audio chunks, use the following command:
```bash
python main-gummy.py -s ja -t zh -a 0 -c 10 -k <your-api-key>
```
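As a quick sanity check of the chunk arithmetic: with `-c 10`, each chunk covers 0.1 s of audio. A sketch, assuming an example device sample rate of 48000 Hz and following the engine's `CHUNK = RATE // chunk_rate` convention:

```python
# chunk_rate = chunks per second, so one CHUNK covers 1/chunk_rate seconds
RATE = 48000       # assumed device sample rate (varies per device)
chunk_rate = 10    # from the -c 10 flag above
CHUNK = RATE // chunk_rate

print(CHUNK)         # samples per chunk
print(CHUNK / RATE)  # seconds of audio per chunk
```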
### Packaging
After development and testing, package the caption engine into an executable file using `pyinstaller`. If errors occur, check for missing dependencies.
### Execution
With a working caption engine, specify its path and runtime parameters in the caption software window to launch it.
![](../img/02_en.png)
## Reference Code
The `main-gummy.py` file under the `caption-engine` folder in this project serves as the entry point for the default caption engine. The `src\main\utils\engine.ts` file contains the server-side code for acquiring and processing data from the caption engine. You can read and understand the implementation details and the complete execution process of the caption engine as needed.
The `main-gummy.py` file under the `engine` folder in this project serves as the entry point for the default caption engine. The `src\main\utils\engine.ts` file contains the server-side code for acquiring and processing data from the caption engine. You can read and understand the implementation details and the complete execution process of the caption engine as needed.

View File

@@ -1,9 +1,11 @@
# Caption Engine Documentation
Corresponding version: v0.4.0
Corresponding version: v0.5.1
This document was translated with a large language model, so some of its content may be inaccurate.
**Note: Due to limited personal resources, the English and Japanese documentation for this project (except the README) will no longer be maintained. The content of this document may not match the latest version of the project. If you are willing to help with translation, please submit a Pull Request.**
![](../../assets/media/structure_ja.png)
## Caption Engine Introduction
@@ -22,7 +24,7 @@
The acquired audio stream may need preprocessing before being converted to text. For example, Alibaba Cloud's Gummy model can only recognize single-channel audio streams, while captured audio streams are usually dual-channel, so the dual-channel stream must be converted to single-channel. Channel conversion can be done with NumPy methods.
You can directly use the audio capture (`caption-engine/sysaudio`) and audio processing (`caption-engine/audioprcs`) modules I developed.
You can directly use the audio capture (`engine/sysaudio`) and audio processing (`engine/audioprcs`) modules I developed.
### Audio-to-Text Conversion
@@ -107,10 +109,10 @@ export interface CaptionItem {
If you use Python, you can pass data to the main program in the following way:
```python
# caption-engine\main-gummy.py
# engine\main-gummy.py
sys.stdout.reconfigure(line_buffering=True)
# caption-engine\audio2text\gummy.py
# engine\audio2text\gummy.py
...
def send_to_node(self, data):
"""
@@ -125,4 +127,77 @@ sys.stdout.reconfigure(line_buffering=True)
...
```
Data receiver code:
Data receiver code:
```typescript
// src\main\utils\engine.ts
...
this.process.stdout.on('data', (data) => {
const lines = data.toString().split('\n');
lines.forEach((line: string) => {
if (line.trim()) {
try {
const caption = JSON.parse(line);
addCaptionLog(caption);
} catch (e) {
controlWindow.sendErrorMessage('Failed to parse caption engine output as a JSON object: ' + e)
console.error('[ERROR] JSON parse error:', e);
}
}
});
});
this.process.stderr.on('data', (data) => {
controlWindow.sendErrorMessage('Caption engine error: ' + data)
console.error(`[ERROR] Subprocess error: ${data}`);
});
...
```
## Using the Caption Engine
### Command-Line Argument Specification
Custom caption engine settings are specified via command-line arguments. The commonly required parameters are as follows:
```python
import argparse
...
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Convert the system audio stream to text')
parser.add_argument('-s', '--source_language', default='en', help='Source language code')
parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output audio stream, 1 for input audio stream')
parser.add_argument('-c', '--chunk_rate', default=20, help='Number of audio stream chunks collected per second')
parser.add_argument('-k', '--api_key', default='', help='API KEY for the Gummy model')
args = parser.parse_args()
convert_audio_to_text(
args.source_language,
args.target_language,
int(args.audio_type),
int(args.chunk_rate),
args.api_key
)
```
For example, to specify Japanese as the source language, Chinese as the translation, capture system audio output, and collect 0.1 s audio chunks:
```bash
python main-gummy.py -s ja -t zh -a 0 -c 10 -k <your-api-key>
```
### Packaging
After development and testing are complete, package the engine into an executable with `pyinstaller`. If errors occur, check for missing dependency libraries.
### Execution
Once a working caption engine is ready, launch it from the caption software window by specifying the engine's path and runtime parameters.
![](../img/02_ja.png)
## Reference Code
The `main-gummy.py` file under the project's `engine` folder is the entry-point code of the default caption engine. `src\main\utils\engine.ts` is the server-side code that receives and processes caption engine data. Read it as needed to understand the engine's implementation details and full execution flow.

View File

@@ -1,6 +1,6 @@
# Caption Engine Documentation
Corresponding version: v0.4.0
Corresponding version: v0.5.1
![](../../assets/media/structure_zh.png)
@@ -20,7 +20,7 @@
The acquired audio stream may need preprocessing before being converted to text. For example, Alibaba Cloud's Gummy model can only recognize single-channel audio streams, while captured audio streams are usually dual-channel, so the dual-channel stream must be converted to single-channel. Channel conversion can be done with NumPy methods.
You can directly use the audio capture (`caption-engine/sysaudio`) and audio processing (`caption-engine/audioprcs`) modules I developed.
You can directly use the audio capture (`engine/sysaudio`) and audio processing (`engine/audioprcs`) modules I developed.
### Audio-to-Text Conversion
@@ -32,7 +32,7 @@
import sys
import argparse
# Import system audio capture
# Import system audio capture
if sys.platform == 'win32':
from sysaudio.win import AudioStream
elif sys.platform == 'darwin':
@@ -100,15 +100,15 @@ export interface CaptionItem {
}
```
**Note: we must all together ensure the buffer is flushed after every caption JSON output, so that each string the electron main process receives can be parsed as a JSON object.**
**Note: you must ensure the buffer is flushed after every caption JSON output, so that each string the electron main process receives can be parsed as a JSON object.**
If you use python, you can pass data to the main program in the following way:
```python
# caption-engine\main-gummy.py
# engine\main-gummy.py
sys.stdout.reconfigure(line_buffering=True)
# caption-engine\audio2text\gummy.py
# engine\audio2text\gummy.py
...
def send_to_node(self, data):
"""
@@ -151,6 +151,51 @@ sys.stdout.reconfigure(line_buffering=True)
...
```
## Using the Caption Engine
### Command-Line Argument Specification
Custom caption engine settings are specified via command-line arguments, so the engine's parameters need to be set up; the commonly required parameters are as follows:
```python
import argparse
...
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Convert system audio stream to text')
parser.add_argument('-s', '--source_language', default='en', help='Source language code')
parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output audio stream, 1 for input audio stream')
parser.add_argument('-c', '--chunk_rate', default=20, help='The number of audio stream chunks collected per second.')
parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
args = parser.parse_args()
convert_audio_to_text(
args.source_language,
args.target_language,
int(args.audio_type),
int(args.chunk_rate),
args.api_key
)
```
For example, for the caption engine above, to specify Japanese as the source language, Chinese as the translation, generate captions from system audio output, and capture 0.1 s of audio at a time, the command-line arguments are:
```bash
python main-gummy.py -s ja -t zh -a 0 -c 10 -k <your-api-key>
```
### Packaging
After finishing development and testing of the caption engine, package it into an executable, generally with `pyinstaller`. If the packaged engine errors when executed, the packaging may have missed some dependency libraries; check whether any are missing.
### Running
With a working caption engine, you can launch it from the caption software window by specifying the engine's path and its run command (arguments).
![](../img/02_zh.png)
## Reference Code
The `main-gummy.py` file under the project's `caption-engine` folder is the entry-point code of the default caption engine. `src\main\utils\engine.ts` is the server-side code that receives and processes caption engine data. Read it as needed to understand the engine's implementation details and full execution flow.
The `main-gummy.py` file under the project's `engine` folder is the entry-point code of the default caption engine. `src\main\utils\engine.ts` is the server-side code that receives and processes caption engine data. Read it as needed to understand the engine's implementation details and full execution flow.

Binary file not shown.

Before

Width:  |  Height:  |  Size: 26 KiB

After

Width:  |  Height:  |  Size: 57 KiB

View File

@@ -1,14 +1,24 @@
# Auto Caption User Manual
Corresponding Version: v0.4.0
Corresponding Version: v0.5.1
**Note: Due to limited personal resources, the English and Japanese documentation files for this project (except for the README document) will no longer be maintained. The content of this document may not be consistent with the latest version of the project. If you are willing to help with translation, please submit relevant Pull Requests.**
## Software Introduction
Auto Caption is a cross-platform caption display software that can real-time capture system audio input (recording) or output (playback) streaming data and use an audio-to-text model to generate captions for the corresponding audio. The default caption engine provided by the software (using Alibaba Cloud Gummy model) supports recognition and translation in nine languages (Chinese, English, Japanese, Korean, German, French, Russian, Spanish, Italian).
Currently, the default caption engine of the software only has full functionality on Windows and macOS platforms. Additional configuration is required to capture system audio output on macOS.
The default caption engine currently has full functionality on Windows, macOS, and Linux platforms. Additional configuration is required to capture system audio output on macOS.
On Linux platforms, it can only generate captions for audio input (microphone), and currently does not support generating captions for audio output (playback).
The following operating system versions have been tested and confirmed to work properly. The software cannot guarantee normal operation on untested OS versions.
| OS Version | Architecture | Audio Input Capture | Audio Output Capture |
| ------------------ | ------------ | ------------------- | -------------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ Additional config required | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
![](../../assets/media/main_en.png)
@@ -61,6 +71,30 @@ Once BlackHole is confirmed installed, in the `Audio MIDI Setup` page, click the
Now the caption engine can capture system audio output and generate captions.
## Getting System Audio Output on Linux
First execute in the terminal:
```bash
pactl list short sources
```
If you see output similar to the following, no additional configuration is needed:
```bash
220 alsa_output.pci-0000_02_02.0.3.analog-stereo.monitor PipeWire s16le 2ch 48000Hz SUSPENDED
221 alsa_input.pci-0000_02_02.0.3.analog-stereo PipeWire s16le 2ch 48000Hz SUSPENDED
```
Otherwise, install `pulseaudio` and `pavucontrol` using the following commands:
```bash
# For Debian/Ubuntu etc.
sudo apt install pulseaudio pavucontrol
# For CentOS etc.
sudo yum install pulseaudio pavucontrol
```
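If you want to verify the result from a script, the same check can be automated. The sketch below only parses the text layout shown in the sample output above (the column order is assumed from that sample):

```python
def find_monitor_sources(pactl_output: str) -> list[str]:
    """Pick out monitor sources from `pactl list short sources` text.

    Assumes the whitespace-separated layout shown above: index, name,
    driver, sample format, channels, rate, state. A source name ending in
    `.monitor` is the loopback of an output sink, i.e. system playback.
    """
    names = []
    for line in pactl_output.strip().splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1].endswith('.monitor'):
            names.append(parts[1])
    return names

sample = (
    "220 alsa_output.pci-0000_02_02.0.3.analog-stereo.monitor PipeWire s16le 2ch 48000Hz SUSPENDED\n"
    "221 alsa_input.pci-0000_02_02.0.3.analog-stereo PipeWire s16le 2ch 48000Hz SUSPENDED"
)
print(find_monitor_sources(sample))
```

A non-empty result means a monitor source exists and the caption engine can record system playback without further setup.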
## Software Usage
### Modifying Settings
@@ -79,7 +113,7 @@ The following image shows the caption display window, which displays the latest
### Exporting Caption Records
In the caption control window, you can see the records of all collected captions. Click the "Export Log" button to export the caption records as a JSON or SRT file.
## Caption Engine

View File

@@ -1,16 +1,26 @@
# Auto Caption ユーザーマニュアル
対応バージョンv0.4.0
対応バージョンv0.5.1
この文書は大規模モデルを使用して翻訳されていますので、内容に正確でない部分があるかもしれません。
**注意個人のリソースが限られているため、このプロジェクトの英語および日本語のドキュメントREADME ドキュメントを除く)のメンテナンスは行われません。このドキュメントの内容は最新版のプロジェクトと一致しない場合があります。翻訳のお手伝いをしていただける場合は、関連するプルリクエストを提出してください。**
## ソフトウェアの概要
Auto Caption は、クロスプラットフォームの字幕表示ソフトウェアで、システムの音声入力(録音)または出力(音声再生)のストリーミングデータをリアルタイムで取得し、音声からテキストに変換するモデルを利用して対応する音声の字幕を生成します。このソフトウェアが提供するデフォルトの字幕エンジン(アリババクラウド Gummy モデルを使用は、9つの言語中国語、英語、日本語、韓国語、ドイツ語、フランス語、ロシア語、スペイン語、イタリア語の認識と翻訳をサポートしています。
現在のデフォルト字幕エンジンは Windows、macOS、Linux プラットフォームで完全な機能を有しています。macOS でシステムオーディオ出力を取得するには追加設定が必要です。
以下のオペレーティングシステムバージョンで正常動作を確認しています。記載以外の OS での正常動作は保証できません。
| OS バージョン | アーキテクチャ | オーディオ入力取得 | オーディオ出力取得 |
| ------------------- | ------------- | ------------------ | ------------------ |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ 追加設定が必要 | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
![](../../assets/media/main_ja.png)
@@ -64,6 +74,30 @@ BlackHoleのインストールが確認できたら、`オーディオ MIDI 設
これで字幕エンジンがシステムオーディオ出力をキャプチャし、字幕を生成できるようになります。
## Linux でシステムオーディオ出力を取得する
まずターミナルで以下を実行してください:
```bash
pactl list short sources
```
以下のような出力が確認できれば追加設定は不要です:
```bash
220 alsa_output.pci-0000_02_02.0.3.analog-stereo.monitor PipeWire s16le 2ch 48000Hz SUSPENDED
221 alsa_input.pci-0000_02_02.0.3.analog-stereo PipeWire s16le 2ch 48000Hz SUSPENDED
```
それ以外の場合は、以下のコマンドで `pulseaudio` と `pavucontrol` をインストールしてください:
```bash
# Debian/Ubuntu系の場合
sudo apt install pulseaudio pavucontrol
# CentOS系の場合
sudo yum install pulseaudio pavucontrol
```
## ソフトウェアの使い方
### 設定の変更
@@ -82,7 +116,7 @@ BlackHoleのインストールが確認できたら、`オーディオ MIDI 設
### 字幕記録のエクスポート
字幕制御ウィンドウでは、現在収集されたすべての字幕の記録を見ることができます。「エクスポート」ボタンをクリックすると、字幕記録を JSON または SRT ファイル形式で出力できます。
## 字幕エンジン

View File

@@ -1,14 +1,22 @@
# Auto Caption 用户手册
对应版本v0.4.0
对应版本v0.5.1
## 软件简介
Auto Caption 是一个跨平台的字幕显示软件,能够实时获取系统音频输入(录音)或输出(播放声音)的流式数据,并调用音频转文字的模型生成对应音频的字幕。软件提供的默认字幕引擎(使用阿里云 Gummy 模型)支持九种语言(中、英、日、韩、德、法、俄、西、意)的识别与翻译。
目前软件默认字幕引擎在 Windows、macOS 和 Linux 平台下拥有完整功能,在 macOS 要获取系统音频输出需要额外配置。
测试过可正常运行的操作系统信息如下,软件不能保证在非下列版本的操作系统上正常运行
| 操作系统版本 | 处理器架构 | 获取系统音频输入 | 获取系统音频输出 |
| ------------------ | ---------- | ---------------- | ---------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅需要额外配置 | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
![](../../assets/media/main_zh.png)
@@ -29,7 +37,6 @@ Auto Caption 是一个跨平台的字幕显示软件,能够实时获取系统
这部分阿里云提供了详细的教程,可参考:
- [获取 API KEY](https://help.aliyun.com/zh/model-studio/get-api-key)
- [将 API Key 配置到环境变量](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
## Vosk 引擎使用前准备
@@ -62,6 +69,30 @@ brew install blackhole-64ch
现在字幕引擎就能捕获系统的音频输出并生成字幕了。
## Linux 获取系统音频输出
首先在控制台执行:
```bash
pactl list short sources
```
如果有以下类似的输出内容则无需额外配置:
```bash
220 alsa_output.pci-0000_02_02.0.3.analog-stereo.monitor PipeWire s16le 2ch 48000Hz SUSPENDED
221 alsa_input.pci-0000_02_02.0.3.analog-stereo PipeWire s16le 2ch 48000Hz SUSPENDED
```
否则,执行以下命令安装 `pulseaudio` 和 `pavucontrol`:
```bash
# Debian or Ubuntu, etc.
sudo apt install pulseaudio pavucontrol
# CentOS, etc.
sudo yum install pulseaudio pavucontrol
```
## 软件使用
### 修改设置
@@ -80,7 +111,7 @@ brew install blackhole-64ch
### 字幕记录的导出
在字幕控制窗口中可以看到当前收集的所有字幕的记录,点击“导出字幕”按钮,即可将字幕记录导出为 JSON 或 SRT 文件。
## 字幕引擎

View File

@@ -10,21 +10,21 @@ files:
- '!{LICENSE,README.md,README_en.md,README_ja.md}'
- '!{.env,.env.*,.npmrc,pnpm-lock.yaml}'
- '!{tsconfig.json,tsconfig.node.json,tsconfig.web.json}'
- '!caption-engine/*'
- '!engine/*'
- '!engine-test/*'
- '!docs/*'
- '!assets/*'
extraResources:
# For Windows
- from: ./caption-engine/dist/main-gummy.exe
to: ./caption-engine/main-gummy.exe
- from: ./caption-engine/dist/main-vosk.exe
to: ./caption-engine/main-vosk.exe
- from: ./engine/dist/main-gummy.exe
to: ./engine/main-gummy.exe
- from: ./engine/dist/main-vosk.exe
to: ./engine/main-vosk.exe
# For macOS and Linux
# - from: ./caption-engine/dist/main-gummy
# to: ./caption-engine/main-gummy
# - from: ./caption-engine/dist/main-vosk
# to: ./caption-engine/main-vosk
# - from: ./engine/dist/main-gummy
# to: ./engine/main-gummy
# - from: ./engine/dist/main-vosk
# to: ./engine/main-vosk
win:
executableName: auto-caption
icon: build/icon.png

View File

@@ -1,221 +0,0 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dashscope.audio.asr import * # type: ignore\n",
"import pyaudiowpatch as pyaudio\n",
"import numpy as np\n",
"\n",
"\n",
"def getDefaultSpeakers(mic: pyaudio.PyAudio, info = True):\n",
" \"\"\"\n",
" 获取默认的系统音频输出的回环设备\n",
" Args:\n",
" mic (pyaudio.PyAudio): pyaudio对象\n",
" info (bool, optional): 是否打印设备信息. Defaults to True.\n",
"\n",
" Returns:\n",
" dict: 统音频输出的回环设备\n",
" \"\"\"\n",
" try:\n",
" WASAPI_info = mic.get_host_api_info_by_type(pyaudio.paWASAPI)\n",
" except OSError:\n",
" print(\"Looks like WASAPI is not available on the system. Exiting...\")\n",
" exit()\n",
"\n",
" default_speaker = mic.get_device_info_by_index(WASAPI_info[\"defaultOutputDevice\"])\n",
" if(info): print(\"wasapi_info:\\n\", WASAPI_info, \"\\n\")\n",
" if(info): print(\"default_speaker:\\n\", default_speaker, \"\\n\")\n",
"\n",
" if not default_speaker[\"isLoopbackDevice\"]:\n",
" for loopback in mic.get_loopback_device_info_generator():\n",
" if default_speaker[\"name\"] in loopback[\"name\"]:\n",
" default_speaker = loopback\n",
" if(info): print(\"Using loopback device:\\n\", default_speaker, \"\\n\")\n",
" break\n",
" else:\n",
" print(\"Default loopback output device not found.\")\n",
" print(\"Run `python -m pyaudiowpatch` to check available devices.\")\n",
" print(\"Exiting...\")\n",
" exit()\n",
" \n",
" if(info): print(f\"Recording Device: #{default_speaker['index']} {default_speaker['name']}\")\n",
" return default_speaker\n",
"\n",
"\n",
"class Callback(TranslationRecognizerCallback):\n",
" \"\"\"\n",
" 语音大模型流式传输回调对象\n",
" \"\"\"\n",
" def __init__(self):\n",
" super().__init__()\n",
" self.usage = 0\n",
" self.sentences = []\n",
" self.translations = []\n",
" \n",
" def on_open(self) -> None:\n",
" print(\"\\n流式翻译开始...\\n\")\n",
"\n",
" def on_close(self) -> None:\n",
" print(f\"\\nTokens消耗{self.usage}\")\n",
" print(f\"流式翻译结束...\\n\")\n",
" for i in range(len(self.sentences)):\n",
" print(f\"\\n{self.sentences[i]}\\n{self.translations[i]}\\n\")\n",
"\n",
" def on_event(\n",
" self,\n",
" request_id,\n",
" transcription_result: TranscriptionResult,\n",
" translation_result: TranslationResult,\n",
" usage\n",
" ) -> None:\n",
" if transcription_result is not None:\n",
" id = transcription_result.sentence_id\n",
" text = transcription_result.text\n",
" if transcription_result.stash is not None:\n",
" stash = transcription_result.stash.text\n",
" else:\n",
" stash = \"\"\n",
" print(f\"#{id}: {text}{stash}\")\n",
" if usage: self.sentences.append(text)\n",
" \n",
" if translation_result is not None:\n",
" lang = translation_result.get_language_list()[0]\n",
" text = translation_result.get_translation(lang).text\n",
" if translation_result.get_translation(lang).stash is not None:\n",
" stash = translation_result.get_translation(lang).stash.text\n",
" else:\n",
" stash = \"\"\n",
" print(f\"#{lang}: {text}{stash}\")\n",
" if usage: self.translations.append(text)\n",
" \n",
" if usage: self.usage += usage['duration']"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"采样输入设备:\n",
" - 序号26\n",
" - 名称:耳机 (HUAWEI FreeLace 活力版) [Loopback]\n",
" - 最大输入通道数2\n",
" - 默认低输入延迟0.003s\n",
" - 默认高输入延迟0.01s\n",
" - 默认采样率48000.0Hz\n",
" - 是否回环设备True\n",
"\n",
"音频样本块大小4800\n",
"样本位宽2\n",
"音频数据格式8\n",
"音频通道数2\n",
"音频采样率48000\n",
"\n"
]
}
],
"source": [
"mic = pyaudio.PyAudio()\n",
"default_speaker = getDefaultSpeakers(mic, False)\n",
"\n",
"SAMP_WIDTH = pyaudio.get_sample_size(pyaudio.paInt16)\n",
"FORMAT = pyaudio.paInt16\n",
"CHANNELS = default_speaker[\"maxInputChannels\"]\n",
"RATE = int(default_speaker[\"defaultSampleRate\"])\n",
"CHUNK = RATE // 10\n",
"INDEX = default_speaker[\"index\"]\n",
"\n",
"dev_info = f\"\"\"\n",
"采样输入设备:\n",
" - 序号:{default_speaker['index']}\n",
" - 名称:{default_speaker['name']}\n",
" - 最大输入通道数:{default_speaker['maxInputChannels']}\n",
" - 默认低输入延迟:{default_speaker['defaultLowInputLatency']}s\n",
" - 默认高输入延迟:{default_speaker['defaultHighInputLatency']}s\n",
" - 默认采样率:{default_speaker['defaultSampleRate']}Hz\n",
" - 是否回环设备:{default_speaker['isLoopbackDevice']}\n",
"\n",
"音频样本块大小:{CHUNK}\n",
"样本位宽:{SAMP_WIDTH}\n",
"音频数据格式:{FORMAT}\n",
"音频通道数:{CHANNELS}\n",
"音频采样率:{RATE}\n",
"\"\"\"\n",
"print(dev_info)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"RECORD_SECONDS = 20 # 监听时长(s)\n",
"\n",
"stream = mic.open(\n",
" format = FORMAT,\n",
" channels = CHANNELS,\n",
" rate = RATE,\n",
" input = True,\n",
" input_device_index = INDEX\n",
")\n",
"translator = TranslationRecognizerRealtime(\n",
" model = \"gummy-realtime-v1\",\n",
" format = \"pcm\",\n",
" sample_rate = RATE,\n",
" transcription_enabled = True,\n",
" translation_enabled = True,\n",
" source_language = \"ja\",\n",
" translation_target_languages = [\"zh\"],\n",
" callback = Callback()\n",
")\n",
"translator.start()\n",
"\n",
"for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):\n",
" data = stream.read(CHUNK)\n",
" data_np = np.frombuffer(data, dtype=np.int16)\n",
" data_np_r = data_np.reshape(-1, CHANNELS)\n",
" print(data_np_r.shape)\n",
" mono_data = np.mean(data_np_r.astype(np.float32), axis=1)\n",
" mono_data = mono_data.astype(np.int16)\n",
" mono_data_bytes = mono_data.tobytes()\n",
" translator.send_audio_frame(mono_data_bytes)\n",
"\n",
"translator.stop()\n",
"stream.stop_stream()\n",
"stream.close()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "mystd",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,189 +0,0 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 7,
"id": "1e12f3ef",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" 采样输入设备:\n",
" - 设备类型:音频输出\n",
" - 序号0\n",
" - 名称BlackHole 2ch\n",
" - 最大输入通道数2\n",
" - 默认低输入延迟0.01s\n",
" - 默认高输入延迟0.1s\n",
" - 默认采样率48000.0Hz\n",
"\n",
" 音频样本块大小2400\n",
" 样本位宽2\n",
" 采样格式8\n",
" 音频通道数2\n",
" 音频采样率48000\n",
" \n"
]
}
],
"source": [
"import sys\n",
"import os\n",
"import wave\n",
"\n",
"current_dir = os.getcwd() \n",
"sys.path.append(os.path.join(current_dir, '../caption-engine'))\n",
"\n",
"from sysaudio.darwin import AudioStream\n",
"from audioprcs import resampleRawChunk, mergeChunkChannels\n",
"\n",
"stream = AudioStream(0)\n",
"stream.printInfo()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "a72914f4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Recording...\n",
"Done\n"
]
}
],
"source": [
"\"\"\"获取系统音频输出5秒然后保存为wav文件\"\"\"\n",
"\n",
"with wave.open('output.wav', 'wb') as wf:\n",
" wf.setnchannels(stream.CHANNELS)\n",
" wf.setsampwidth(stream.SAMP_WIDTH)\n",
" wf.setframerate(stream.RATE)\n",
" stream.openStream()\n",
"\n",
" print('Recording...')\n",
"\n",
" for _ in range(0, 100):\n",
" chunk = stream.read_chunk()\n",
" if isinstance(chunk, bytes):\n",
" wf.writeframes(chunk)\n",
" else:\n",
" raise Exception('Error: chunk is not bytes')\n",
" \n",
" stream.closeStream() \n",
" print('Done')"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "a6e8a098",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Recording...\n",
"Done\n"
]
}
],
"source": [
"\"\"\"获取系统音频输入转换为单通道音频持续5秒然后保存为wav文件\"\"\"\n",
"\n",
"with wave.open('output.wav', 'wb') as wf:\n",
" wf.setnchannels(1)\n",
" wf.setsampwidth(stream.SAMP_WIDTH)\n",
" wf.setframerate(stream.RATE)\n",
" stream.openStream()\n",
"\n",
" print('Recording...')\n",
"\n",
" for _ in range(0, 100):\n",
" chunk = mergeChunkChannels(\n",
" stream.read_chunk(),\n",
" stream.CHANNELS\n",
" )\n",
" if isinstance(chunk, bytes):\n",
" wf.writeframes(chunk)\n",
" else:\n",
" raise Exception('Error: chunk is not bytes')\n",
" \n",
" stream.closeStream() \n",
" print('Done')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "aaca1465",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Recording...\n",
"Done\n"
]
}
],
"source": [
"\"\"\"获取系统音频输入转换为单通道音频并重采样到16000Hz持续5秒然后保存为wav文件\"\"\"\n",
"\n",
"with wave.open('output.wav', 'wb') as wf:\n",
" wf.setnchannels(1)\n",
" wf.setsampwidth(stream.SAMP_WIDTH)\n",
" wf.setframerate(16000)\n",
" stream.openStream()\n",
"\n",
" print('Recording...')\n",
"\n",
" for _ in range(0, 100):\n",
" chunk = resampleRawChunk(\n",
" stream.read_chunk(),\n",
" stream.CHANNELS,\n",
" stream.RATE,\n",
" 16000,\n",
" mode=\"sinc_best\"\n",
" )\n",
" if isinstance(chunk, bytes):\n",
" wf.writeframes(chunk)\n",
" else:\n",
" raise Exception('Error: chunk is not bytes')\n",
" \n",
" stream.closeStream() \n",
" print('Done')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,64 +0,0 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "440d4a07",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"d:\\Projects\\auto-caption\\caption-engine\\subenv\\Lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n",
"None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.\n"
]
},
{
"ename": "ImportError",
"evalue": "\nMarianTokenizer requires the SentencePiece library but it was not found in your environment. Check out the instructions on the\ninstallation page of its repo: https://github.com/google/sentencepiece#installation and follow the ones\nthat match your environment. Please note that you may need to restart your runtime after installation.\n",
"output_type": "error",
"traceback": [
"\u001b[31m---------------------------------------------------------------------------\u001b[39m",
"\u001b[31mImportError\u001b[39m Traceback (most recent call last)",
"\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[1]\u001b[39m\u001b[32m, line 3\u001b[39m\n\u001b[32m 1\u001b[39m \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mtransformers\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mimport\u001b[39;00m MarianMTModel, MarianTokenizer\n\u001b[32m----> \u001b[39m\u001b[32m3\u001b[39m tokenizer = \u001b[43mMarianTokenizer\u001b[49m\u001b[43m.\u001b[49m\u001b[43mfrom_pretrained\u001b[49m(\u001b[33m\"\u001b[39m\u001b[33mHelsinki-NLP/opus-mt-en-zh\u001b[39m\u001b[33m\"\u001b[39m)\n\u001b[32m 4\u001b[39m model = MarianMTModel.from_pretrained(\u001b[33m\"\u001b[39m\u001b[33mHelsinki-NLP/opus-mt-en-zh\u001b[39m\u001b[33m\"\u001b[39m)\n\u001b[32m 6\u001b[39m tokenizer.save_pretrained(\u001b[33m\"\u001b[39m\u001b[33m./model_en_zh\u001b[39m\u001b[33m\"\u001b[39m)\n",
"\u001b[36mFile \u001b[39m\u001b[32md:\\Projects\\auto-caption\\caption-engine\\subenv\\Lib\\site-packages\\transformers\\utils\\import_utils.py:1994\u001b[39m, in \u001b[36mDummyObject.__getattribute__\u001b[39m\u001b[34m(cls, key)\u001b[39m\n\u001b[32m 1992\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m (key.startswith(\u001b[33m\"\u001b[39m\u001b[33m_\u001b[39m\u001b[33m\"\u001b[39m) \u001b[38;5;129;01mand\u001b[39;00m key != \u001b[33m\"\u001b[39m\u001b[33m_from_config\u001b[39m\u001b[33m\"\u001b[39m) \u001b[38;5;129;01mor\u001b[39;00m key == \u001b[33m\"\u001b[39m\u001b[33mis_dummy\u001b[39m\u001b[33m\"\u001b[39m \u001b[38;5;129;01mor\u001b[39;00m key == \u001b[33m\"\u001b[39m\u001b[33mmro\u001b[39m\u001b[33m\"\u001b[39m \u001b[38;5;129;01mor\u001b[39;00m key == \u001b[33m\"\u001b[39m\u001b[33mcall\u001b[39m\u001b[33m\"\u001b[39m:\n\u001b[32m 1993\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28msuper\u001b[39m().\u001b[34m__getattribute__\u001b[39m(key)\n\u001b[32m-> \u001b[39m\u001b[32m1994\u001b[39m \u001b[43mrequires_backends\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mcls\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mcls\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_backends\u001b[49m\u001b[43m)\u001b[49m\n",
"\u001b[36mFile \u001b[39m\u001b[32md:\\Projects\\auto-caption\\caption-engine\\subenv\\Lib\\site-packages\\transformers\\utils\\import_utils.py:1980\u001b[39m, in \u001b[36mrequires_backends\u001b[39m\u001b[34m(obj, backends)\u001b[39m\n\u001b[32m 1977\u001b[39m failed.append(msg.format(name))\n\u001b[32m 1979\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m failed:\n\u001b[32m-> \u001b[39m\u001b[32m1980\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mImportError\u001b[39;00m(\u001b[33m\"\u001b[39m\u001b[33m\"\u001b[39m.join(failed))\n",
"\u001b[31mImportError\u001b[39m: \nMarianTokenizer requires the SentencePiece library but it was not found in your environment. Check out the instructions on the\ninstallation page of its repo: https://github.com/google/sentencepiece#installation and follow the ones\nthat match your environment. Please note that you may need to restart your runtime after installation.\n"
]
}
],
"source": [
"from transformers import MarianMTModel, MarianTokenizer\n",
"\n",
"tokenizer = MarianTokenizer.from_pretrained(\"Helsinki-NLP/opus-mt-en-zh\")\n",
"model = MarianMTModel.from_pretrained(\"Helsinki-NLP/opus-mt-en-zh\")\n",
"\n",
"tokenizer.save_pretrained(\"./model_en_zh\")\n",
"model.save_pretrained(\"./model_en_zh\")\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "subenv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,124 +0,0 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "6fb12704",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"d:\\Projects\\auto-caption\\caption-engine\\subenv\\Lib\\site-packages\\vosk\\__init__.py\n"
]
}
],
"source": [
"import vosk\n",
"print(vosk.__file__)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "63a06f5c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" 采样设备:\n",
" - 设备类型:音频输入\n",
" - 序号1\n",
" - 名称:麦克风阵列 (Realtek(R) Audio)\n",
" - 最大输入通道数2\n",
" - 默认低输入延迟0.09s\n",
" - 默认高输入延迟0.18s\n",
" - 默认采样率44100.0Hz\n",
" - 是否回环设备False\n",
"\n",
" 音频样本块大小2205\n",
" 样本位宽2\n",
" 采样格式8\n",
" 音频通道数2\n",
" 音频采样率44100\n",
" \n"
]
}
],
"source": [
"import sys\n",
"import os\n",
"import json\n",
"from vosk import Model, KaldiRecognizer\n",
"\n",
"current_dir = os.getcwd() \n",
"sys.path.append(os.path.join(current_dir, '../caption-engine'))\n",
"\n",
"from sysaudio.win import AudioStream\n",
"from audioprcs import resampleRawChunk, mergeChunkChannels\n",
"\n",
"stream = AudioStream(1)\n",
"stream.printInfo()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "5d5a0afa",
"metadata": {},
"outputs": [],
"source": [
"model = Model(os.path.join(\n",
" current_dir,\n",
" '../caption-engine/models/vosk-model-small-cn-0.22'\n",
"))\n",
"recognizer = KaldiRecognizer(model, 16000)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7e9d1530",
"metadata": {},
"outputs": [],
"source": [
"stream.openStream()\n",
"\n",
"for i in range(200):\n",
" chunk = stream.read_chunk()\n",
" chunk_mono = resampleRawChunk(chunk, stream.CHANNELS, stream.RATE, 16000)\n",
" if recognizer.AcceptWaveform(chunk_mono):\n",
" result = json.loads(recognizer.Result())\n",
" print(\"acc:\", result.get(\"text\", \"\"))\n",
" else:\n",
" partial = json.loads(recognizer.PartialResult())\n",
" print(\"else:\", partial.get(\"partial\", \"\"))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "subenv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -6,8 +6,8 @@ from dashscope.audio.asr import (
)
import dashscope
from datetime import datetime
import json
import sys
from utils import stdout_cmd, stdout_obj
class Callback(TranslationRecognizerCallback):
"""
@@ -15,17 +15,20 @@ class Callback(TranslationRecognizerCallback):
"""
def __init__(self):
super().__init__()
self.index = 0
self.usage = 0
self.cur_id = -1
self.time_str = ''
def on_open(self) -> None:
# print("on_open")
pass
self.usage = 0
self.cur_id = -1
self.time_str = ''
stdout_cmd('info', 'Gummy translator started.')
def on_close(self) -> None:
# print("on_close")
pass
stdout_cmd('info', 'Gummy translator closed.')
stdout_cmd('usage', str(self.usage))
def on_event(
self,
@@ -35,17 +38,17 @@ class Callback(TranslationRecognizerCallback):
usage
) -> None:
caption = {}
if transcription_result is not None:
caption['index'] = transcription_result.sentence_id
caption['text'] = transcription_result.text
if caption['index'] != self.cur_id:
self.cur_id = caption['index']
cur_time = datetime.now().strftime('%H:%M:%S.%f')[:-3]
caption['time_s'] = cur_time
self.time_str = cur_time
else:
caption['time_s'] = self.time_str
if self.cur_id != transcription_result.sentence_id:
self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
self.cur_id = transcription_result.sentence_id
self.index += 1
caption['command'] = 'caption'
caption['index'] = self.index
caption['time_s'] = self.time_str
caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
caption['text'] = transcription_result.text
caption['translation'] = ""
if translation_result is not None:
@@ -55,19 +58,9 @@ class Callback(TranslationRecognizerCallback):
if usage:
self.usage += usage['duration']
# print(caption)
self.send_to_node(caption)
if 'text' in caption:
stdout_obj(caption)
def send_to_node(self, data):
"""
将数据发送到 Node.js 进程
"""
try:
json_data = json.dumps(data) + '\n'
sys.stdout.write(json_data)
sys.stdout.flush()
except Exception as e:
print(f"Error sending data to Node.js: {e}", file=sys.stderr)
class GummyTranslator:
"""
@@ -78,7 +71,7 @@ class GummyTranslator:
source: 源语言代码字符串zh, en, ja
target: 目标语言代码字符串zh, en, ja
"""
def __init__(self, rate, source, target, api_key):
def __init__(self, rate: int, source: str, target: str | None, api_key: str | None):
if api_key:
dashscope.api_key = api_key
self.translator = TranslationRecognizerRealtime(
@@ -97,7 +90,7 @@ class GummyTranslator:
self.translator.start()
def send_audio_frame(self, data):
"""发送音频帧"""
"""发送音频帧,引擎将自动识别并将识别结果输出到标准输出中"""
self.translator.send_audio_frame(data)
def stop(self):
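The diff above routes output through `stdout_cmd`/`stdout_obj` from the new `utils` module, whose implementation is not shown in this hunk. A minimal sketch of the newline-delimited JSON writer that `stdout_obj` appears to be (field names copied from the diff; the actual helper may differ):

```python
import json
import sys
from datetime import datetime

def stdout_obj(obj: dict) -> None:
    # One JSON object per line (NDJSON), flushed immediately so the
    # Electron parent process can consume captions as they arrive.
    sys.stdout.write(json.dumps(obj, ensure_ascii=False) + '\n')
    sys.stdout.flush()

# Caption payload using the field names from the diff.
now = datetime.now().strftime('%H:%M:%S.%f')[:-3]  # millisecond precision
stdout_obj({
    'command': 'caption',
    'index': 1,
    'time_s': now,      # sentence start time
    'time_t': now,      # last-update time
    'text': 'hello world',
    'translation': '',
})
```

Keeping the protocol one-object-per-line means the Node.js side can split on newlines instead of parsing a streaming JSON document.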

59
engine/audio2text/vosk.py Normal file
View File

@@ -0,0 +1,59 @@
import json
from datetime import datetime
from vosk import Model, KaldiRecognizer, SetLogLevel
from utils import stdout_obj
class VoskRecognizer:
"""
使用 Vosk 引擎流式处理音频数据,并在标准输出中输出 Auto Caption 软件可读取的 JSON 字符串数据
初始化参数:
model_path: Vosk 识别模型路径
"""
def __init__(self, model_path: str):
SetLogLevel(-1)
if model_path.startswith('"'):
model_path = model_path[1:]
if model_path.endswith('"'):
model_path = model_path[:-1]
self.model_path = model_path
self.time_str = ''
self.cur_id = 0
self.prev_content = ''
self.model = Model(self.model_path)
self.recognizer = KaldiRecognizer(self.model, 16000)
def send_audio_frame(self, data: bytes):
"""
发送音频帧给 Vosk 引擎,引擎将自动识别并将识别结果输出到标准输出中
Args:
data: 音频帧数据,采样率必须为 16000Hz
"""
caption = {}
caption['command'] = 'caption'
caption['translation'] = ''
if self.recognizer.AcceptWaveform(data):
content = json.loads(self.recognizer.Result()).get('text', '')
caption['index'] = self.cur_id
caption['text'] = content
caption['time_s'] = self.time_str
caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
self.prev_content = ''
self.cur_id += 1
else:
content = json.loads(self.recognizer.PartialResult()).get('partial', '')
if content == '' or content == self.prev_content:
return
if self.prev_content == '':
self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
caption['index'] = self.cur_id
caption['text'] = content
caption['time_s'] = self.time_str
caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
self.prev_content = content
stdout_obj(caption)
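The partial-result handling in `send_audio_frame` above (suppress empty and repeated partials, advance the sentence id only on a final result) can be isolated as a small state machine. A behavioral sketch with the timestamps left out:

```python
class PartialDedup:
    """Mirrors the filtering in VoskRecognizer.send_audio_frame:
    empty or repeated partial results are suppressed, and the sentence
    index advances only when a final result arrives."""

    def __init__(self) -> None:
        self.prev = ''   # last partial emitted for the current sentence
        self.cur_id = 0  # current sentence index

    def feed(self, text: str, final: bool):
        if final:
            caption = {'index': self.cur_id, 'text': text, 'final': True}
            self.prev = ''       # reset for the next sentence
            self.cur_id += 1
            return caption
        if text == '' or text == self.prev:
            return None          # nothing new to report
        self.prev = text
        return {'index': self.cur_id, 'text': text, 'final': False}
```

Because Vosk re-emits the same partial text many times per second, this deduplication is what keeps the caption window from flickering with identical updates.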

View File

@@ -1,21 +1,11 @@
import sys
import argparse
if sys.platform == 'win32':
from sysaudio.win import AudioStream
elif sys.platform == 'darwin':
from sysaudio.darwin import AudioStream
elif sys.platform == 'linux':
from sysaudio.linux import AudioStream
else:
raise NotImplementedError(f"Unsupported platform: {sys.platform}")
from audioprcs import mergeChunkChannels
from sysaudio import AudioStream
from utils import merge_chunk_channels
from audio2text import InvalidParameter, GummyTranslator
def convert_audio_to_text(s_lang, t_lang, audio_type, chunk_rate, api_key):
sys.stdout.reconfigure(line_buffering=True) # type: ignore
stream = AudioStream(audio_type, chunk_rate)
if t_lang == 'none':
@@ -23,20 +13,21 @@ def convert_audio_to_text(s_lang, t_lang, audio_type, chunk_rate, api_key):
else:
gummy = GummyTranslator(stream.RATE, s_lang, t_lang, api_key)
stream.openStream()
stream.open_stream()
gummy.start()
while True:
try:
chunk = stream.read_chunk()
chunk_mono = mergeChunkChannels(chunk, stream.CHANNELS)
if chunk is None: continue
chunk_mono = merge_chunk_channels(chunk, stream.CHANNELS)
try:
gummy.send_audio_frame(chunk_mono)
except InvalidParameter:
gummy.start()
gummy.send_audio_frame(chunk_mono)
except KeyboardInterrupt:
stream.closeStream()
stream.close_stream()
gummy.stop()
break

View File

@@ -35,4 +35,5 @@ exe = EXE(
target_arch=None,
codesign_identity=None,
entitlements_file=None,
onefile=True,
)

View File

@@ -4,17 +4,9 @@ import argparse
from datetime import datetime
import numpy.core.multiarray
if sys.platform == 'win32':
from sysaudio.win import AudioStream
elif sys.platform == 'darwin':
from sysaudio.darwin import AudioStream
elif sys.platform == 'linux':
from sysaudio.linux import AudioStream
else:
raise NotImplementedError(f"Unsupported platform: {sys.platform}")
from sysaudio import AudioStream
from vosk import Model, KaldiRecognizer, SetLogLevel
from audioprcs import resampleRawChunk
from utils import resample_chunk_mono
SetLogLevel(-1)
@@ -30,7 +22,7 @@ def convert_audio_to_text(audio_type, chunk_rate, model_path):
recognizer = KaldiRecognizer(model, 16000)
stream = AudioStream(audio_type, chunk_rate)
stream.openStream()
stream.open_stream()
time_str = ''
cur_id = 0
@@ -38,7 +30,8 @@ def convert_audio_to_text(audio_type, chunk_rate, model_path):
while True:
chunk = stream.read_chunk()
chunk_mono = resampleRawChunk(chunk, stream.CHANNELS, stream.RATE, 16000)
if chunk is None: continue
chunk_mono = resample_chunk_mono(chunk, stream.CHANNELS, stream.RATE, 16000)
caption = {}
if recognizer.AcceptWaveform(chunk_mono):
@@ -56,6 +49,7 @@ def convert_audio_to_text(audio_type, chunk_rate, model_path):
continue
if prev_content == '':
time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
caption['command'] = 'caption'
caption['index'] = cur_id
caption['text'] = content
caption['time_s'] = time_str
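`resample_chunk_mono` comes from the new `utils` module and is not shown in this diff. Assuming it down-mixes interleaved int16 PCM to mono and resamples to the target rate, a dependency-free stand-in looks like this (the real engine uses the `samplerate` package; `np.interp` is only a linear-interpolation substitute for the sketch):

```python
import numpy as np

def resample_chunk_mono(chunk: bytes, channels: int, src_rate: int, dst_rate: int) -> bytes:
    """Down-mix interleaved int16 PCM to mono, then resample to dst_rate."""
    frames = np.frombuffer(chunk, dtype=np.int16).reshape(-1, channels)
    mono = frames.astype(np.float32).mean(axis=1)        # average the channels
    n_out = int(round(len(mono) * dst_rate / src_rate))  # target frame count
    x_src = np.linspace(0.0, 1.0, num=len(mono), endpoint=False)
    x_dst = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(x_dst, x_src, mono).astype(np.int16).tobytes()
```

The 16000 Hz target matters here: the `KaldiRecognizer` above is constructed with a 16 kHz sample rate, so every chunk must be converted before `AcceptWaveform` is called.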

View File

@@ -1,8 +1,12 @@
# -*- mode: python ; coding: utf-8 -*-
from pathlib import Path
import sys
vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
if sys.platform == 'win32':
vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
else:
vosk_path = str(Path('./subenv/lib/python3.12/site-packages/vosk').resolve())
a = Analysis(
['main-vosk.py'],
@@ -39,4 +43,5 @@ exe = EXE(
target_arch=None,
codesign_identity=None,
entitlements_file=None,
onefile=True,
)

37
engine/main.py Normal file
View File

@@ -0,0 +1,37 @@
import argparse
def gummy_engine(s, t, a, c, k):
pass
def vosk_engine(a, c, m):
pass
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Convert system audio stream to text')
# both
parser.add_argument('-e', '--caption_engine', default='gummy', help='Caption engine: gummy or vosk')
parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output, 1 for input')
parser.add_argument('-c', '--chunk_rate', default=20, help='Number of audio stream chunks collected per second')
# gummy
parser.add_argument('-s', '--source_language', default='en', help='Source language code')
parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
# vosk
parser.add_argument('-m', '--model_path', default='', help='The path to the vosk model.')
args = parser.parse_args()
if args.caption_engine == 'gummy':
gummy_engine(
args.source_language,
args.target_language,
int(args.audio_type),
int(args.chunk_rate),
args.api_key
)
elif args.caption_engine == 'vosk':
vosk_engine(
int(args.audio_type),
int(args.chunk_rate),
args.model_path
)
else:
raise ValueError('Invalid caption engine specified.')
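The flag handling above can be sanity-checked without audio hardware. The parser below mirrors the arguments from `main.py`; as a side note, passing `type=int` to `add_argument` would let argparse perform the casts that `main.py` currently does by hand with `int(...)`:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirror of the arguments defined in engine/main.py.
    parser = argparse.ArgumentParser(description='Convert system audio stream to text')
    parser.add_argument('-e', '--caption_engine', default='gummy')
    parser.add_argument('-a', '--audio_type', type=int, default=0)
    parser.add_argument('-c', '--chunk_rate', type=int, default=20)
    parser.add_argument('-s', '--source_language', default='en')
    parser.add_argument('-t', '--target_language', default='zh')
    parser.add_argument('-k', '--api_key', default='')
    parser.add_argument('-m', '--model_path', default='')
    return parser

# Example invocation for the Vosk engine (model path is illustrative).
args = build_parser().parse_args(
    ['-e', 'vosk', '-a', '1', '-m', './models/vosk-model-small-cn-0.22']
)
print(args.caption_engine, args.audio_type, args.model_path)
```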

View File

@@ -2,6 +2,5 @@ dashscope
numpy
samplerate
PyAudio
PyAudioWPatch # Windows only
vosk
pyinstaller

View File

@@ -0,0 +1,5 @@
dashscope
numpy
vosk
pyinstaller
samplerate # pip install samplerate --only-binary=:all:

View File

@@ -0,0 +1,6 @@
dashscope
numpy
samplerate
PyAudioWPatch
vosk
pyinstaller

View File

@@ -0,0 +1,10 @@
import sys
if sys.platform == "win32":
from .win import AudioStream
elif sys.platform == "darwin":
from .darwin import AudioStream
elif sys.platform == "linux":
from .linux import AudioStream
else:
raise NotImplementedError(f"Unsupported platform: {sys.platform}")

View File

@@ -1,11 +1,24 @@
"""获取 MacOS 系统音频输入/输出流"""
import pyaudio
from textwrap import dedent
def get_blackhole_device(mic: pyaudio.PyAudio):
"""
Find the BlackHole device
"""
device_count = mic.get_device_count()
for i in range(device_count):
dev_info = mic.get_device_info_by_index(i)
if 'blackhole' in str(dev_info["name"]).lower():
return dev_info
raise Exception("The device containing BlackHole was not found.")
class AudioStream:
"""
Capture the system audio stream; BlackHole is supported for capturing the system audio output
Capture the system audio stream; to capture output audio, BlackHole is supported as the system audio output capture
Init parameters:
audio_type: 0 - system audio output stream (requires BlackHole); 1 - system audio input stream
@@ -15,46 +28,40 @@ class AudioStream:
self.audio_type = audio_type
self.mic = pyaudio.PyAudio()
if self.audio_type == 0:
self.device = self.getOutputDeviceInfo()
self.device = get_blackhole_device(self.mic)
else:
self.device = self.mic.get_default_input_device_info()
self.stop_signal = False
self.stream = None
self.SAMP_WIDTH = pyaudio.get_sample_size(pyaudio.paInt16)
self.INDEX = self.device["index"]
self.FORMAT = pyaudio.paInt16
self.CHANNELS = self.device["maxInputChannels"]
self.SAMP_WIDTH = pyaudio.get_sample_size(self.FORMAT)
self.CHANNELS = int(self.device["maxInputChannels"])
self.RATE = int(self.device["defaultSampleRate"])
self.CHUNK = self.RATE // chunk_rate
self.INDEX = self.device["index"]
def getOutputDeviceInfo(self):
"""查找指定关键词的输入设备"""
device_count = self.mic.get_device_count()
for i in range(device_count):
dev_info = self.mic.get_device_info_by_index(i)
if 'blackhole' in dev_info["name"].lower():
return dev_info
raise Exception("The device containing BlackHole was not found.")
def printInfo(self):
def get_info(self):
dev_info = f"""
Sampling input device:
Sampling device:
- Device type: { "audio output" if self.audio_type == 0 else "audio input" }
- Index: {self.device['index']}
- Name: {self.device['name']}
- Device index: {self.device['index']}
- Device name: {self.device['name']}
- Max input channels: {self.device['maxInputChannels']}
- Default low input latency: {self.device['defaultLowInputLatency']}s
- Default high input latency: {self.device['defaultHighInputLatency']}s
- Default sample rate: {self.device['defaultSampleRate']}Hz
- Loopback device: {self.device['isLoopbackDevice']}
Audio chunk size: {self.CHUNK}
Device index: {self.INDEX}
Sample format: {self.FORMAT}
Sample width: {self.SAMP_WIDTH}
Sampling format: {self.FORMAT}
Audio channel count: {self.CHANNELS}
Audio sample rate: {self.RATE}
Sample channel count: {self.CHANNELS}
Sample rate: {self.RATE}
Chunk size: {self.CHUNK}
"""
print(dev_info)
return dedent(dev_info).strip()
def openStream(self):
def open_stream(self):
"""
Open and return the system audio output stream
"""
@@ -72,14 +79,24 @@ class AudioStream:
"""
Read audio data
"""
if self.stop_signal:
self.close_stream()
return None
if not self.stream: return None
return self.stream.read(self.CHUNK, exception_on_overflow=False)
def closeStream(self):
def close_stream_signal(self):
"""
Close the system audio input stream
Thread-safe request to close the system audio input stream; it may not close immediately
"""
if self.stream is None: return
self.stream.stop_stream()
self.stream.close()
self.stream = None
self.stop_signal = True
def close_stream(self):
"""
Immediately close the system audio input stream
"""
if self.stream is not None:
self.stream.stop_stream()
self.stream.close()
self.stream = None
self.stop_signal = False

108
engine/sysaudio/linux.py Normal file

@@ -0,0 +1,108 @@
"""Capture Linux system audio streams"""
import subprocess
from textwrap import dedent
def find_monitor_source():
result = subprocess.run(
["pactl", "list", "short", "sources"],
stdout=subprocess.PIPE, text=True
)
lines = result.stdout.splitlines()
for line in lines:
parts = line.split('\t')
if len(parts) >= 2 and ".monitor" in parts[1]:
return parts[1]
raise RuntimeError("System output monitor device not found")
def find_input_source():
result = subprocess.run(
["pactl", "list", "short", "sources"],
stdout=subprocess.PIPE, text=True
)
lines = result.stdout.splitlines()
for line in lines:
parts = line.split('\t')
if len(parts) < 2: continue
name = parts[1]
if ".monitor" not in name:
return name
raise RuntimeError("Microphone input device not found")
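Both finders above parse `pactl list short sources`, whose tab-separated second column is the source name; a self-contained sketch with a hypothetical two-line output (real source names vary by machine):

```python
def pick_source(pactl_output: str, want_monitor: bool) -> str:
    """Return the first .monitor source (system output) or the first non-monitor source (mic)."""
    for line in pactl_output.splitlines():
        parts = line.split('\t')
        if len(parts) < 2:
            continue
        name = parts[1]
        if ('.monitor' in name) == want_monitor:
            return name
    raise RuntimeError('No matching source found')

# Hypothetical `pactl list short sources` output for illustration only
sample = (
    '0\talsa_output.analog-stereo.monitor\tmodule-alsa-card.c\n'
    '1\talsa_input.analog-stereo\tmodule-alsa-card.c'
)
```

The `.monitor` suffix is how PulseAudio exposes a loopback of each output sink, which is why its presence distinguishes output capture from microphone capture.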
class AudioStream:
"""
Capture the system audio stream
Init parameters:
audio_type: 0 - system audio output stream (ignored where unsupported); 1 - system audio input stream (default)
chunk_rate: number of audio chunks collected per second (default 20)
"""
def __init__(self, audio_type=1, chunk_rate=20):
self.audio_type = audio_type
if self.audio_type == 0:
self.source = find_monitor_source()
else:
self.source = find_input_source()
self.stop_signal = False
self.process = None
self.FORMAT = 16
self.SAMP_WIDTH = 2
self.CHANNELS = 2
self.RATE = 48000
self.CHUNK = self.RATE // chunk_rate
def get_info(self):
dev_info = f"""
Audio capture process:
- Capture type: {"audio output" if self.audio_type == 0 else "audio input"}
- Device source: {self.source}
- Capture process PID: {self.process.pid if self.process else "None"}
Sample format: {self.FORMAT}
Sample width: {self.SAMP_WIDTH}
Sample channel count: {self.CHANNELS}
Sample rate: {self.RATE}
Chunk size: {self.CHUNK}
"""
return dedent(dev_info).strip()
def open_stream(self):
"""
Start the audio capture process
"""
self.process = subprocess.Popen(
["parec", "-d", self.source, "--format=s16le", "--rate=48000", "--channels=2"],
stdout=subprocess.PIPE
)
def read_chunk(self):
"""
Read audio data
"""
if self.stop_signal:
self.close_stream()
return None
if self.process and self.process.stdout:
# parec emits raw bytes, so one chunk of CHUNK frames is CHUNK * channels * sample-width bytes
return self.process.stdout.read(self.CHUNK * self.CHANNELS * self.SAMP_WIDTH)
return None
def close_stream_signal(self):
"""
Thread-safe request to close the system audio input stream; it may not close immediately
"""
self.stop_signal = True
def close_stream(self):
"""
Terminate the system audio capture process
"""
if self.process:
self.process.terminate()
self.stop_signal = False
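One unit mismatch to watch for here: `CHUNK` counts frames, but `parec` delivers raw bytes, so a chunk of audio corresponds to `frames * channels * sample_width` bytes:

```python
# Stream parameters used by the Linux backend (s16le, stereo, 48 kHz)
RATE, CHUNK_RATE = 48000, 20
CHANNELS, SAMP_WIDTH = 2, 2

frames_per_chunk = RATE // CHUNK_RATE                       # frames per read
bytes_per_chunk = frames_per_chunk * CHANNELS * SAMP_WIDTH  # raw bytes parec must deliver
```

Reading only `CHUNK` bytes from the pipe would yield a chunk four times shorter (in time) than the PyAudio backends produce.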

View File

@@ -1,14 +1,15 @@
"""Capture Windows system audio input/output streams"""
import pyaudiowpatch as pyaudio
from textwrap import dedent
def getDefaultLoopbackDevice(mic: pyaudio.PyAudio, info = True)->dict:
def get_default_loopback_device(mic: pyaudio.PyAudio, info = True)->dict:
"""
Get the default loopback device of the system audio output
Args:
mic (pyaudio.PyAudio): the PyAudio instance
info (bool, optional): whether to print device info
mic: the PyAudio instance
info: whether to print device info
Returns:
dict: the loopback device of the system audio output
@@ -51,38 +52,40 @@ class AudioStream:
self.audio_type = audio_type
self.mic = pyaudio.PyAudio()
if self.audio_type == 0:
self.device = getDefaultLoopbackDevice(self.mic, False)
self.device = get_default_loopback_device(self.mic, False)
else:
self.device = self.mic.get_default_input_device_info()
self.stop_signal = False
self.stream = None
self.SAMP_WIDTH = pyaudio.get_sample_size(pyaudio.paInt16)
self.INDEX = self.device["index"]
self.FORMAT = pyaudio.paInt16
self.SAMP_WIDTH = pyaudio.get_sample_size(self.FORMAT)
self.CHANNELS = int(self.device["maxInputChannels"])
self.RATE = int(self.device["defaultSampleRate"])
self.CHUNK = self.RATE // chunk_rate
self.INDEX = self.device["index"]
def printInfo(self):
def get_info(self):
dev_info = f"""
Sampling device:
- Device type: { "audio output" if self.audio_type == 0 else "audio input" }
- Index: {self.device['index']}
- Name: {self.device['name']}
- Device index: {self.device['index']}
- Device name: {self.device['name']}
- Max input channels: {self.device['maxInputChannels']}
- Default low input latency: {self.device['defaultLowInputLatency']}s
- Default high input latency: {self.device['defaultHighInputLatency']}s
- Default sample rate: {self.device['defaultSampleRate']}Hz
- Loopback device: {self.device['isLoopbackDevice']}
Audio chunk size: {self.CHUNK}
Device index: {self.INDEX}
Sample format: {self.FORMAT}
Sample width: {self.SAMP_WIDTH}
Sampling format: {self.FORMAT}
Audio channel count: {self.CHANNELS}
Audio sample rate: {self.RATE}
Sample channel count: {self.CHANNELS}
Sample rate: {self.RATE}
Chunk size: {self.CHUNK}
"""
print(dev_info)
return dedent(dev_info).strip()
def openStream(self):
def open_stream(self):
"""
Open and return the system audio output stream
"""
@@ -96,18 +99,28 @@ class AudioStream:
)
return self.stream
def read_chunk(self):
def read_chunk(self) -> bytes | None:
"""
Read audio data
"""
if self.stop_signal:
self.close_stream()
return None
if not self.stream: return None
return self.stream.read(self.CHUNK, exception_on_overflow=False)
def closeStream(self):
def close_stream_signal(self):
"""
Close the system audio input stream
Thread-safe request to close the system audio input stream; it may not close immediately
"""
if self.stream is None: return
self.stream.stop_stream()
self.stream.close()
self.stream = None
self.stop_signal = True
def close_stream(self):
"""
Close the system audio input stream
"""
if self.stream is not None:
self.stream.stop_stream()
self.stream.close()
self.stream = None
self.stop_signal = False

2
engine/utils/__init__.py Normal file

@@ -0,0 +1,2 @@
from .process import merge_chunk_channels, resample_chunk_mono, resample_mono_chunk
from .sysout import stdout, stdout_cmd, stdout_obj, stderr

View File

@@ -1,16 +1,17 @@
import samplerate
import numpy as np
def mergeChunkChannels(chunk, channels):
def merge_chunk_channels(chunk: bytes, channels: int) -> bytes:
"""
Convert a multi-channel audio chunk into a mono audio chunk
Args:
chunk: (bytes) multi-channel audio chunk
chunk: multi-channel audio chunk
channels: channel count
Returns:
(bytes) mono audio chunk
mono audio chunk
"""
# (length * channels,)
chunk_np = np.frombuffer(chunk, dtype=np.int16)
@@ -22,19 +23,19 @@ def mergeChunkChannels(chunk, channels):
return chunk_mono.tobytes()
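The downmix above averages the interleaved int16 samples of each frame; a NumPy-free sketch of the same idea (using truncating integer division instead of NumPy's float mean, so results may differ by one LSB for negative sums):

```python
from array import array

def merge_int16_channels(chunk: bytes, channels: int) -> bytes:
    """Downmix interleaved int16 frames to mono by averaging each frame."""
    samples = array('h')          # 'h' = signed 16-bit
    samples.frombytes(chunk)
    mono = array('h', (
        sum(samples[i:i + channels]) // channels
        for i in range(0, len(samples), channels)
    ))
    return mono.tobytes()

stereo = array('h', [100, 300, -50, 50]).tobytes()  # two stereo frames
mono = merge_int16_channels(stereo, 2)              # averages to 200 and 0
```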
def resampleRawChunk(chunk, channels, orig_sr, target_sr, mode="sinc_best"):
def resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes:
"""
Convert a multi-channel audio chunk into a mono chunk, then resample it
Args:
chunk: (bytes) multi-channel audio chunk
chunk: multi-channel audio chunk
channels: channel count
orig_sr: original sample rate
target_sr: target sample rate
mode: resampling mode, one of 'sinc_best' | 'sinc_medium' | 'sinc_fastest' | 'zero_order_hold' | 'linear'
Return:
(bytes) mono audio chunk
mono audio chunk
"""
# (length * channels,)
chunk_np = np.frombuffer(chunk, dtype=np.int16)
@@ -44,22 +45,23 @@ def resampleRawChunk(chunk, channels, orig_sr, target_sr, mode="sinc_best"):
chunk_mono_f = np.mean(chunk_np.astype(np.float32), axis=1)
chunk_mono = chunk_mono_f.astype(np.int16)
ratio = target_sr / orig_sr
chunk_mono_r = samplerate.resample(chunk_mono, ratio, converter_type=mode)
chunk_mono_r = samplerate.resample(chunk_mono, ratio, converter_type=mode)
chunk_mono_r = np.round(chunk_mono_r).astype(np.int16)
return chunk_mono_r.tobytes()
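The real converters delegate to libsamplerate via the `samplerate` package; purely to illustrate the ratio bookkeeping (no filtering, so not a substitute for the sinc modes), a zero-order-hold sketch:

```python
def resample_zoh(samples: list, orig_sr: int, target_sr: int) -> list:
    """Zero-order-hold resampling: repeat or drop samples per the rate ratio."""
    ratio = target_sr / orig_sr
    out_len = int(len(samples) * ratio)
    # Each output index maps back to the nearest earlier source sample
    return [samples[min(int(i / ratio), len(samples) - 1)] for i in range(out_len)]

halved = resample_zoh([1, 2, 3, 4], 48000, 24000)  # drop every other sample
doubled = resample_zoh([1, 2], 16000, 32000)       # repeat each sample
```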
def resampleMonoChunk(chunk, orig_sr, target_sr, mode="sinc_best"):
def resample_mono_chunk(chunk: bytes, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes:
"""
Resample a mono audio chunk
Args:
chunk: (bytes) mono audio chunk
chunk: mono audio chunk
orig_sr: original sample rate
target_sr: target sample rate
mode: resampling mode, one of 'sinc_best' | 'sinc_medium' | 'sinc_fastest' | 'zero_order_hold' | 'linear'
Return:
(bytes) mono audio chunk
mono audio chunk
"""
chunk_np = np.frombuffer(chunk, dtype=np.int16)
ratio = target_sr / orig_sr

18
engine/utils/sysout.py Normal file

@@ -0,0 +1,18 @@
import sys
import json
def stdout(text: str):
stdout_cmd("print", text)
def stdout_cmd(command: str, content = ""):
msg = { "command": command, "content": content }
sys.stdout.write(json.dumps(msg) + "\n")
sys.stdout.flush()
def stdout_obj(obj):
sys.stdout.write(json.dumps(obj) + "\n")
sys.stdout.flush()
def stderr(text: str):
sys.stderr.write(text + "\n")
sys.stderr.flush()
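The Electron main process splits the engine's stdout on newlines and JSON-parses each line, so every message must be a single flushed JSON line; a self-contained sketch of the producer side, writing to a StringIO stand-in for stdout:

```python
import io
import json

def stdout_cmd(command: str, content="", out=None):
    """Emit one JSON object per line, as the Electron side expects."""
    line = json.dumps({"command": command, "content": content})
    out.write(line + "\n")
    out.flush()  # without the flush, messages can sit in the pipe buffer

buf = io.StringIO()
stdout_cmd("print", "engine ready", out=buf)
msg = json.loads(buf.getvalue().strip())
```

Embedding newlines inside `content` would break the line-based framing, which is one reason the payloads go through `json.dumps` (it escapes them).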

26
package-lock.json generated

@@ -1,17 +1,18 @@
{
"name": "auto-caption",
"version": "0.4.0",
"version": "0.5.1",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "auto-caption",
"version": "0.4.0",
"version": "0.5.1",
"hasInstallScript": true,
"dependencies": {
"@electron-toolkit/preload": "^3.0.1",
"@electron-toolkit/utils": "^4.0.0",
"ant-design-vue": "^4.2.6",
"pidusage": "^4.0.1",
"pinia": "^3.0.2",
"vue-i18n": "^11.1.9",
"vue-router": "^4.5.1"
@@ -21,6 +22,7 @@
"@electron-toolkit/eslint-config-ts": "^3.0.0",
"@electron-toolkit/tsconfig": "^1.0.1",
"@types/node": "^22.14.1",
"@types/pidusage": "^2.0.5",
"@vitejs/plugin-vue": "^5.2.3",
"electron": "^35.1.5",
"electron-builder": "^25.1.8",
@@ -2295,6 +2297,13 @@
"undici-types": "~6.21.0"
}
},
"node_modules/@types/pidusage": {
"version": "2.0.5",
"resolved": "https://registry.npmmirror.com/@types/pidusage/-/pidusage-2.0.5.tgz",
"integrity": "sha512-MIiyZI4/MK9UGUXWt0jJcCZhVw7YdhBuTOuqP/BjuLDLZ2PmmViMIQgZiWxtaMicQfAz/kMrZ5T7PKxFSkTeUA==",
"dev": true,
"license": "MIT"
},
"node_modules/@types/plist": {
"version": "3.0.5",
"resolved": "https://registry.npmmirror.com/@types/plist/-/plist-3.0.5.tgz",
@@ -7742,6 +7751,18 @@
"url": "https://github.com/sponsors/jonschlinkert"
}
},
"node_modules/pidusage": {
"version": "4.0.1",
"resolved": "https://registry.npmjs.org/pidusage/-/pidusage-4.0.1.tgz",
"integrity": "sha512-yCH2dtLHfEBnzlHUJymR/Z1nN2ePG3m392Mv8TFlTP1B0xkpMQNHAnfkY0n2tAi6ceKO6YWhxYfZ96V4vVkh/g==",
"license": "MIT",
"dependencies": {
"safe-buffer": "^5.2.1"
},
"engines": {
"node": ">=18"
}
},
"node_modules/pinia": {
"version": "3.0.2",
"resolved": "https://registry.npmmirror.com/pinia/-/pinia-3.0.2.tgz",
@@ -8292,7 +8313,6 @@
"version": "5.2.1",
"resolved": "https://registry.npmmirror.com/safe-buffer/-/safe-buffer-5.2.1.tgz",
"integrity": "sha512-rp3So07KcdmmKbGvgaNxQSJr7bGVSVk5S9Eq1F+ppbRo70+YeaDxkw5Dd8NPN+GD6bjnYm2VuPuCXmpuYvmCXQ==",
"dev": true,
"funding": [
{
"type": "github",

View File

@@ -1,7 +1,7 @@
{
"name": "auto-caption",
"productName": "Auto Caption",
"version": "0.4.0",
"version": "0.5.1",
"description": "A cross-platform subtitle display software.",
"main": "./out/main/index.js",
"author": "himeditator",
@@ -13,7 +13,7 @@
"typecheck:web": "vue-tsc --noEmit -p tsconfig.web.json --composite false",
"typecheck": "npm run typecheck:node && npm run typecheck:web",
"start": "electron-vite preview",
"dev": "electron-vite dev",
"dev": "chcp 65001 && electron-vite dev",
"build": "npm run typecheck && electron-vite build",
"postinstall": "electron-builder install-app-deps",
"build:unpack": "npm run build && electron-builder --dir",
@@ -25,6 +25,7 @@
"@electron-toolkit/preload": "^3.0.1",
"@electron-toolkit/utils": "^4.0.0",
"ant-design-vue": "^4.2.6",
"pidusage": "^4.0.1",
"pinia": "^3.0.2",
"vue-i18n": "^11.1.9",
"vue-router": "^4.5.1"
@@ -34,6 +35,7 @@
"@electron-toolkit/eslint-config-ts": "^3.0.0",
"@electron-toolkit/tsconfig": "^1.0.1",
"@types/node": "^22.14.1",
"@types/pidusage": "^2.0.5",
"@vitejs/plugin-vue": "^5.2.3",
"electron": "^35.1.5",
"electron-builder": "^25.1.8",

View File

@@ -1,5 +1,7 @@
import { shell, BrowserWindow, ipcMain, nativeTheme, dialog } from 'electron'
import path from 'path'
import { EngineInfo } from './types'
import pidusage from 'pidusage'
import { is } from '@electron-toolkit/utils'
import icon from '../../build/icon.png?asset'
import { captionWindow } from './CaptionWindow'
@@ -81,6 +83,20 @@ class ControlWindow {
return result.filePaths[0];
})
ipcMain.handle('control.engine.info', async () => {
const info: EngineInfo = {
pid: 0, ppid: 0, cpu: 0, mem: 0, elapsed: 0
}
if(captionEngine.processStatus !== 'running') return info
const stats = await pidusage(captionEngine.process.pid)
info.pid = stats.pid
info.ppid = stats.ppid
info.cpu = stats.cpu
info.mem = stats.memory
info.elapsed = stats.elapsed
return info
})
ipcMain.on('control.uiLanguage.change', (_, args) => {
allConfig.uiLanguage = args
if(captionWindow.window){

View File

@@ -54,3 +54,11 @@ export interface FullConfig {
controls: Controls,
captionLog: CaptionItem[]
}
export interface EngineInfo {
pid: number,
ppid: number,
cpu: number,
mem: number,
elapsed: number
}

View File

@@ -2,6 +2,7 @@ import {
UILanguage, UITheme, Styles, Controls,
CaptionItem, FullConfig
} from '../types'
import { Log } from './Log'
import { app, BrowserWindow } from 'electron'
import * as path from 'path'
import * as fs from 'fs'
@@ -48,6 +49,7 @@ class AllConfig {
uiTheme: UITheme = 'system';
styles: Styles = {...defaultStyles};
controls: Controls = {...defaultControls};
lastLogIndex: number = -1;
captionLog: CaptionItem[] = [];
constructor() {}
@@ -60,9 +62,8 @@ class AllConfig {
if(config.uiTheme) this.uiTheme = config.uiTheme
if(config.leftBarWidth) this.leftBarWidth = config.leftBarWidth
if(config.styles) this.setStyles(config.styles)
if(process.platform !== 'win32' && process.platform !== 'darwin') config.controls.audio = 1
if(config.controls) this.setControls(config.controls)
console.log('[INFO] Read Config from:', configPath)
Log.info('Read Config from:', configPath)
}
}
@@ -76,7 +77,7 @@ class AllConfig {
}
const configPath = path.join(app.getPath('userData'), 'config.json')
fs.writeFileSync(configPath, JSON.stringify(config, null, 2))
console.log('[INFO] Write Config to:', configPath)
Log.info('Write Config to:', configPath)
}
public getFullConfig(): FullConfig {
@@ -97,7 +98,7 @@ class AllConfig {
this.styles[key] = args[key]
}
}
console.log('[INFO] Set Styles:', this.styles)
Log.info('Set Styles:', this.styles)
}
public resetStyles() {
@@ -106,7 +107,7 @@ class AllConfig {
public sendStyles(window: BrowserWindow) {
window.webContents.send('both.styles.set', this.styles)
console.log(`[INFO] Send Styles to #${window.id}:`, this.styles)
Log.info(`Send Styles to #${window.id}:`, this.styles)
}
public setControls(args: Object) {
@@ -117,27 +118,28 @@ class AllConfig {
}
}
this.controls.engineEnabled = engineEnabled
console.log('[INFO] Set Controls:', this.controls)
Log.info('Set Controls:', this.controls)
}
public sendControls(window: BrowserWindow) {
window.webContents.send('control.controls.set', this.controls)
console.log(`[INFO] Send Controls to #${window.id}:`, this.controls)
Log.info(`Send Controls to #${window.id}:`, this.controls)
}
public updateCaptionLog(log: CaptionItem) {
let command: 'add' | 'upd' = 'add'
if(
this.captionLog.length &&
this.captionLog[this.captionLog.length - 1].index === log.index &&
this.captionLog[this.captionLog.length - 1].time_s === log.time_s
this.lastLogIndex === log.index
) {
this.captionLog.splice(this.captionLog.length - 1, 1, log)
command = 'upd'
}
else {
this.captionLog.push(log)
this.lastLogIndex = log.index
}
this.captionLog[this.captionLog.length - 1].index = this.captionLog.length
for(const window of BrowserWindow.getAllWindows()){
this.sendCaptionLog(window, command)
}
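A Python sketch of `updateCaptionLog`'s add-or-update rule (the real method also renumbers `index` to the log length and broadcasts `'add'`/`'upd'` to every window; both are omitted here):

```python
def update_caption_log(log, last_index, item):
    """Replace the tail entry when the engine re-emits the same caption index, else append."""
    if log and last_index == item["index"]:
        log[-1] = item            # in-place refinement of the current caption
        return "upd", last_index
    log.append(item)              # a new caption has started
    return "add", item["index"]

log = []
cmd1, last = update_caption_log(log, -1, {"index": 0, "text": "Hel"})
cmd2, last = update_caption_log(log, last, {"index": 0, "text": "Hello"})
cmd3, last = update_caption_log(log, last, {"index": 1, "text": "world"})
```

Tracking `lastLogIndex` separately (rather than comparing `time_s` as before) lets a caption keep updating even when the engine revises its timestamp.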

View File

@@ -5,6 +5,7 @@ import path from 'path'
import { controlWindow } from '../ControlWindow'
import { allConfig } from './AllConfig'
import { i18n } from '../i18n'
import { Log } from './Log'
export class CaptionEngine {
appPath: string = ''
@@ -13,33 +14,34 @@ export class CaptionEngine {
processStatus: 'running' | 'stopping' | 'stopped' = 'stopped'
private getApp(): boolean {
allConfig.controls.customized = false
if (allConfig.controls.customized && allConfig.controls.customizedApp) {
Log.info('Using customized engine')
this.appPath = allConfig.controls.customizedApp
this.command = [allConfig.controls.customizedCommand]
allConfig.controls.customized = true
this.command = allConfig.controls.customizedCommand.split(' ')
}
else if (allConfig.controls.engine === 'gummy') {
allConfig.controls.customized = false
if(!allConfig.controls.API_KEY && !process.env.DASHSCOPE_API_KEY) {
controlWindow.sendErrorMessage(i18n('gummy.key.missing'))
return false
}
let gummyName = 'main-gummy'
if (process.platform === 'win32') {
gummyName += '.exe'
}
if (process.platform === 'win32') { gummyName += '.exe' }
this.command = []
if (is.dev) {
this.appPath = path.join(
app.getAppPath(),
'caption-engine', 'dist', gummyName
app.getAppPath(), 'engine',
'subenv', 'Scripts', 'python.exe'
)
this.command.push(path.join(
app.getAppPath(), 'engine', 'main-gummy.py'
))
}
else {
this.appPath = path.join(
process.resourcesPath, 'caption-engine', gummyName
process.resourcesPath, 'engine', gummyName
)
}
this.command = []
this.command.push('-s', allConfig.controls.sourceLang)
this.command.push(
'-t', allConfig.controls.translation ?
@@ -51,32 +53,35 @@ export class CaptionEngine {
}
}
else if(allConfig.controls.engine === 'vosk'){
allConfig.controls.customized = false
let voskName = 'main-vosk'
if (process.platform === 'win32') {
voskName += '.exe'
}
if (process.platform === 'win32') { voskName += '.exe' }
this.command = []
if (is.dev) {
this.appPath = path.join(
app.getAppPath(),
'caption-engine', 'dist', voskName
app.getAppPath(), 'engine',
'subenv', 'Scripts', 'python.exe'
)
this.command.push(path.join(
app.getAppPath(), 'engine', 'main-vosk.py'
))
}
else {
this.appPath = path.join(
process.resourcesPath, 'caption-engine', voskName
process.resourcesPath, 'engine', voskName
)
}
this.command = []
this.command.push('-a', allConfig.controls.audio ? '1' : '0')
this.command.push('-m', `"${allConfig.controls.modelPath}"`)
}
console.log('[INFO] Engine Path:', this.appPath)
console.log('[INFO] Engine Command:', this.command)
Log.info('Engine Path:', this.appPath)
Log.info('Engine Command:', this.command)
return true
}
public start() {
if (this.processStatus !== 'stopped') {
Log.warn('Caption engine status is not stopped, cannot start')
return
}
if(!this.getApp()){ return }
@@ -86,12 +91,12 @@ export class CaptionEngine {
}
catch (e) {
controlWindow.sendErrorMessage(i18n('engine.start.error') + e)
console.error('[ERROR] Error starting subprocess:', e)
Log.error('Error starting engine:', e)
return
}
this.processStatus = 'running'
console.log('[INFO] Caption Engine Started, PID:', this.process.pid)
Log.info('Caption Engine Started, PID:', this.process.pid)
allConfig.controls.engineEnabled = true
if(controlWindow.window){
@@ -107,23 +112,23 @@ export class CaptionEngine {
lines.forEach((line: string) => {
if (line.trim()) {
try {
const caption = JSON.parse(line);
allConfig.updateCaptionLog(caption);
const data_obj = JSON.parse(line)
handleEngineData(data_obj)
} catch (e) {
controlWindow.sendErrorMessage(i18n('engine.output.parse.error') + e)
console.error('[ERROR] Error parsing JSON:', e);
Log.error('Error parsing JSON:', e)
}
}
});
});
this.process.stderr.on('data', (data) => {
this.process.stderr.on('data', (data: any) => {
if(this.processStatus === 'stopping') return
controlWindow.sendErrorMessage(i18n('engine.error') + data)
console.error(`[ERROR] Subprocess Error: ${data}`);
Log.error(`Engine Error: ${data}`);
});
this.process.on('close', (code: any) => {
console.log(`[INFO] Subprocess exited with code ${code}`);
this.process = undefined;
allConfig.controls.engineEnabled = false
if(controlWindow.window){
@@ -131,14 +136,14 @@ export class CaptionEngine {
controlWindow.window.webContents.send('control.engine.stopped')
}
this.processStatus = 'stopped'
console.log('[INFO] Caption engine process stopped')
Log.info(`Engine exited with code ${code}`)
});
}
public stop() {
if(this.processStatus !== 'running') return
if (this.process.pid) {
console.log('[INFO] Trying to stop process, PID:', this.process.pid)
Log.info('Trying to stop process, PID:', this.process.pid)
let cmd = `kill ${this.process.pid}`;
if (process.platform === "win32") {
cmd = `taskkill /pid ${this.process.pid} /t /f`
@@ -146,7 +151,7 @@ export class CaptionEngine {
exec(cmd, (error) => {
if (error) {
controlWindow.sendErrorMessage(i18n('engine.shutdown.error') + error)
console.error(`[ERROR] Failed to kill process: ${error}`)
Log.error(`Failed to kill process: ${error}`)
}
})
}
@@ -158,11 +163,26 @@ export class CaptionEngine {
controlWindow.window.webContents.send('control.engine.stopped')
}
this.processStatus = 'stopped'
console.log('[INFO] Process PID undefined, caption engine process stopped')
Log.info('Process PID undefined, caption engine process stopped')
return
}
this.processStatus = 'stopping'
console.log('[INFO] Caption engine process stopping')
Log.info('Caption engine process stopping')
}
}
function handleEngineData(data: any) {
if(data.command === 'caption') {
allConfig.updateCaptionLog(data);
}
else if(data.command === 'print') {
Log.info('Engine print:', data.content)
}
else if(data.command === 'info') {
Log.info('Engine info:', data.content)
}
else if(data.command === 'usage') {
Log.info('Caption engine usage: ', data.content)
}
}
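The dispatch in `handleEngineData` can be sketched in Python; the `captions` and `logs` sinks stand in for `allConfig.updateCaptionLog` and `Log.info`:

```python
def handle_engine_data(data: dict, captions: list, logs: list):
    """Route one parsed stdout line from the engine by its 'command' field."""
    command = data.get("command")
    if command == "caption":
        captions.append(data)  # feeds the caption log / renderer windows
    elif command in ("print", "info", "usage"):
        logs.append(f"[engine {command}] {data.get('content', '')}")
    else:
        logs.append(f"[engine] unknown command: {command}")

captions, logs = [], []
handle_engine_data({"command": "caption", "index": 0, "text": "hi"}, captions, logs)
handle_engine_data({"command": "print", "content": "ready"}, captions, logs)
```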

21
src/main/utils/Log.ts Normal file

@@ -0,0 +1,21 @@
function getTimeString() {
const now = new Date()
const HH = String(now.getHours()).padStart(2, '0')
const MM = String(now.getMinutes()).padStart(2, '0')
const SS = String(now.getSeconds()).padStart(2, '0')
return `${HH}:${MM}:${SS}`
}
export class Log {
static info(...msg: any[]){
console.log(`[INFO ${getTimeString()}]`, ...msg)
}
static warn(...msg: any[]){
console.log(`[WARN ${getTimeString()}]`, ...msg)
}
static error(...msg: any[]){
console.log(`[ERROR ${getTimeString()}]`, ...msg)
}
}

View File

@@ -4,46 +4,109 @@
<a-app class="caption-title">
<span style="margin-right: 30px;">{{ $t('log.title') }}</span>
</a-app>
<a-button
type="primary"
style="margin-right: 20px;"
@click="exportCaptions"
:disabled="captionData.length === 0"
>{{ $t('log.export') }}</a-button>
<a-popover :title="$t('log.copyOptions')">
<template #content>
<div class="input-item">
<span class="input-label">{{ $t('log.addIndex') }}</span>
<a-switch v-model:checked="showIndex" />
<span class="input-label">{{ $t('log.copyTime') }}</span>
<a-switch v-model:checked="copyTime" />
</div>
<div class="input-item">
<span class="input-label">{{ $t('log.copyContent') }}</span>
<a-radio-group v-model:value="copyOption">
<a-radio-button value="both">{{ $t('log.both') }}</a-radio-button>
<a-radio-button value="source">{{ $t('log.source') }}</a-radio-button>
<a-radio-button value="target">{{ $t('log.translation') }}</a-radio-button>
</a-radio-group>
</div>
</template>
<a-button
style="margin-right: 20px;"
@click="copyCaptions"
:disabled="captionData.length === 0"
>{{ $t('log.copy') }}</a-button>
<a-popover :title="$t('log.baseTime')">
<template #content>
<div class="base-time">
<div class="base-time-container">
<a-input
type="number" min="0"
v-model:value="baseHH"
></a-input>
<span class="base-time-label">{{ $t('log.hour') }}</span>
</div>
</div><span style="margin: 0 4px;">:</span>
<div class="base-time">
<div class="base-time-container">
<a-input
type="number" min="0" max="59"
v-model:value="baseMM"
></a-input>
<span class="base-time-label">{{ $t('log.min') }}</span>
</div>
</div><span style="margin: 0 4px;">:</span>
<div class="base-time">
<div class="base-time-container">
<a-input
type="number" min="0" max="59"
v-model:value="baseSS"
></a-input>
<span class="base-time-label">{{ $t('log.sec') }}</span>
</div>
</div><span style="margin: 0 4px;">.</span>
<div class="base-time">
<div class="base-time-container">
<a-input
type="number" min="0" max="999"
v-model:value="baseMS"
></a-input>
<span class="base-time-label">{{ $t('log.ms') }}</span>
</div>
</div>
</template>
<a-button
type="primary"
style="margin-right: 20px;"
@click="changeBaseTime"
:disabled="captionData.length === 0"
>{{ $t('log.changeTime') }}</a-button>
</a-popover>
<a-popover :title="$t('log.exportOptions')">
<template #content>
<div class="input-item">
<span class="input-label">{{ $t('log.exportFormat') }}</span>
<a-radio-group v-model:value="exportFormat">
<a-radio-button value="srt">.srt</a-radio-button>
<a-radio-button value="json">.json</a-radio-button>
</a-radio-group>
</div>
<div class="input-item">
<span class="input-label">{{ $t('log.exportContent') }}</span>
<a-radio-group v-model:value="contentOption">
<a-radio-button value="both">{{ $t('log.both') }}</a-radio-button>
<a-radio-button value="source">{{ $t('log.source') }}</a-radio-button>
<a-radio-button value="target">{{ $t('log.translation') }}</a-radio-button>
</a-radio-group>
</div>
</template>
<a-button
style="margin-right: 20px;"
@click="exportCaptions"
:disabled="captionData.length === 0"
>{{ $t('log.export') }}</a-button>
</a-popover>
<a-popover :title="$t('log.copyOptions')">
<template #content>
<div class="input-item">
<span class="input-label">{{ $t('log.addIndex') }}</span>
<a-switch v-model:checked="showIndex" />
<span class="input-label">{{ $t('log.copyTime') }}</span>
<a-switch v-model:checked="copyTime" />
</div>
<div class="input-item">
<span class="input-label">{{ $t('log.copyContent') }}</span>
<a-radio-group v-model:value="contentOption">
<a-radio-button value="both">{{ $t('log.both') }}</a-radio-button>
<a-radio-button value="source">{{ $t('log.source') }}</a-radio-button>
<a-radio-button value="target">{{ $t('log.translation') }}</a-radio-button>
</a-radio-group>
</div>
</template>
<a-button
style="margin-right: 20px;"
@click="copyCaptions"
>{{ $t('log.copy') }}</a-button>
</a-popover>
<a-button
danger
@click="clearCaptions"
>{{ $t('log.clear') }}</a-button>
</div>
<a-table
:columns="columns"
:data-source="captionData"
v-model:pagination="pagination"
style="margin-top: 10px;"
>
<template #bodyCell="{ column, record }">
<template v-if="column.key === 'index'">
@@ -72,21 +135,29 @@ import { storeToRefs } from 'pinia'
import { useCaptionLogStore } from '@renderer/stores/captionLog'
import { message } from 'ant-design-vue'
import { useI18n } from 'vue-i18n'
import * as tc from '../utils/timeCalc'
import { CaptionItem } from '../types'
const { t } = useI18n()
const captionLog = useCaptionLogStore()
const { captionData } = storeToRefs(captionLog)
const exportFormat = ref('srt')
const showIndex = ref(true)
const copyTime = ref(true)
const copyOption = ref('both')
const contentOption = ref('both')
const baseHH = ref<number>(0)
const baseMM = ref<number>(0)
const baseSS = ref<number>(0)
const baseMS = ref<number>(0)
const pagination = ref({
current: 1,
pageSize: 10,
pageSize: 20,
showSizeChanger: true,
pageSizeOptions: ['10', '20', '50'],
showTotal: (total: number) => `Total: ${total}`,
pageSizeOptions: ['10', '20', '50', '100'],
onChange: (page: number, pageSize: number) => {
pagination.value.current = page
pagination.value.pageSize = pageSize
@@ -103,12 +174,23 @@ const columns = [
dataIndex: 'index',
key: 'index',
width: 80,
sorter: (a: CaptionItem, b: CaptionItem) => {
if(a.index <= b.index) return -1
return 1
},
sortDirections: ['descend'],
defaultSortOrder: 'descend',
},
{
title: 'time',
dataIndex: 'time',
key: 'time',
width: 160,
sorter: (a: CaptionItem, b: CaptionItem) => {
if(a.time_s <= b.time_s) return -1
return 1
},
sortDirections: ['descend', 'ascend'],
},
{
title: 'content',
@@ -117,28 +199,68 @@ const columns = [
},
]
function changeBaseTime() {
if(baseHH.value < 0) baseHH.value = 0
if(baseMM.value < 0) baseMM.value = 0
if(baseMM.value > 59) baseMM.value = 59
if(baseSS.value < 0) baseSS.value = 0
if(baseSS.value > 59) baseSS.value = 59
if(baseMS.value < 0) baseMS.value = 0
if(baseMS.value > 999) baseMS.value = 999
const newBase: tc.Time = {
hh: Number(baseHH.value),
mm: Number(baseMM.value),
ss: Number(baseSS.value),
ms: Number(baseMS.value)
}
const oldBase = tc.getTimeFromStr(captionData.value[0].time_s)
const deltaMs = tc.getMsFromTime(newBase) - tc.getMsFromTime(oldBase)
for(let i = 0; i < captionData.value.length; i++){
captionData.value[i].time_s =
tc.getNewTimeStr(captionData.value[i].time_s, deltaMs)
captionData.value[i].time_t =
tc.getNewTimeStr(captionData.value[i].time_t, deltaMs)
}
}
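The `tc` helpers (`getTimeFromStr`, `getMsFromTime`, `getNewTimeStr`) are not shown in this diff, so the sketch below is an assumption about their semantics: parse `HH:MM:SS.mmm`, take the millisecond delta between the new and old base, and shift every timestamp by it:

```python
def parse_ms(t: str) -> int:
    """'HH:MM:SS.mmm' -> total milliseconds."""
    hms, ms = t.split('.')
    hh, mm, ss = hms.split(':')
    return ((int(hh) * 60 + int(mm)) * 60 + int(ss)) * 1000 + int(ms)

def shift(t: str, delta_ms: int) -> str:
    """Apply a millisecond offset and reformat as 'HH:MM:SS.mmm'."""
    total = parse_ms(t) + delta_ms
    ms = total % 1000
    total //= 1000
    ss, mm, hh = total % 60, (total // 60) % 60, total // 3600
    return f"{hh:02d}:{mm:02d}:{ss:02d}.{ms:03d}"

# Delta between a chosen new base and the first caption's old base
delta = parse_ms("01:00:00.000") - parse_ms("00:00:01.500")
```

Applying one shared `delta` to both `time_s` and `time_t` preserves every caption's duration and relative spacing.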
function exportCaptions() {
const jsonData = JSON.stringify(captionData.value, null, 2)
const blob = new Blob([jsonData], { type: 'application/json' })
const exportData = getExportData()
const blob = new Blob([exportData], {
type: exportFormat.value === 'json' ? 'application/json' : 'text/plain'
})
const url = URL.createObjectURL(blob)
const a = document.createElement('a')
a.href = url
const timestamp = new Date().toISOString().replace(/[:.]/g, '-')
a.download = `captions-${timestamp}.json`
a.download = `captions-${timestamp}.${exportFormat.value}`
document.body.appendChild(a)
a.click()
document.body.removeChild(a)
URL.revokeObjectURL(url)
}
function getExportData() {
if(exportFormat.value === 'json') return JSON.stringify(captionData.value, null, 2)
let content = ''
for(let i = 0; i < captionData.value.length; i++){
const item = captionData.value[i]
content += `${i+1}\n`
content += `${item.time_s} --> ${item.time_t}\n`.replace(/\./g, ',')
if(contentOption.value === 'both') content += `${item.text}\n${item.translation}\n\n`
else if(contentOption.value === 'source') content += `${item.text}\n\n`
else content += `${item.translation}\n\n`
}
return content
}
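The SRT branch above emits one cue per caption, converting the `.` millisecond separator to the `,` that SRT requires; a minimal Python sketch of the cue format:

```python
def srt_cue(i: int, start: str, end: str, lines: list) -> str:
    """One SRT cue: index, 'HH:MM:SS,mmm --> HH:MM:SS,mmm', then the text lines."""
    timing = f"{start} --> {end}".replace('.', ',')
    return f"{i}\n{timing}\n" + "\n".join(lines) + "\n\n"

cue = srt_cue(1, "00:00:01.000", "00:00:02.500", ["Hello", "你好"])
```

Cues are 1-indexed and separated by a blank line, which is why each block ends with two newlines.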
function copyCaptions() {
let content = ''
for(let i = 0; i < captionData.value.length; i++){
const item = captionData.value[i]
if(showIndex.value) content += `${i+1}\n`
if(copyTime.value) content += `${item.time_s} --> ${item.time_t}\n`.replace(/\./g, ',')
if(copyOption.value === 'both') content += `${item.text}\n${item.translation}\n\n`
else if(copyOption.value === 'source') content += `${item.text}\n\n`
if(contentOption.value === 'both') content += `${item.text}\n${item.translation}\n\n`
else if(contentOption.value === 'source') content += `${item.text}\n\n`
else content += `${item.translation}\n\n`
}
navigator.clipboard.writeText(content)
@@ -166,6 +288,23 @@ function clearCaptions() {
margin-bottom: 10px;
}
.base-time {
width: 64px;
display: inline-block;
}
.base-time-container {
display: flex;
flex-direction: column;
align-items: center;
gap: 4px;
}
.base-time-label {
font-size: 12px;
color: var(--tag-color);
}
.time-cell {
display: flex;
flex-direction: column;

View File

@@ -37,7 +37,7 @@
<a-input
class="input-area"
type="range"
min="0" max="64"
min="0" max="72"
v-model:value="currentFontSize"
/>
<div class="input-item-value">{{ currentFontSize }}px</div>
@@ -114,7 +114,7 @@
<a-input
class="input-area"
type="range"
min="0" max="64"
min="0" max="72"
v-model:value="currentTransFontSize"
/>
<div class="input-item-value">{{ currentTransFontSize }}px</div>
@@ -159,7 +159,7 @@
<a-input
class="input-area"
type="range"
min="0" max="10"
min="0" max="12"
v-model:value="currentBlur"
/>
<div class="input-item-value">{{ currentBlur }}px</div>
@@ -335,13 +335,12 @@ watch(changeSignal, (val) => {
}
.preview-container {
line-height: 2em;
width: 60%;
text-align: center;
position: absolute;
padding: 20px;
padding: 10px;
border-radius: 10px;
left: 50%;
left: 64%;
transform: translateX(-50%);
bottom: 20px;
}
@@ -349,7 +348,7 @@ watch(changeSignal, (val) => {
.preview-container p {
text-align: center;
margin: 0;
line-height: 1.5em;
line-height: 1.6em;
}
.left-ellipsis {

View File

@@ -33,7 +33,6 @@
<div class="input-item">
<span class="input-label">{{ $t('engine.audioType') }}</span>
<a-select
:disabled="platform !== 'win32' && platform !== 'darwin'"
class="input-area"
v-model:value="currentAudio"
:options="audioType"
@@ -128,7 +127,7 @@ const { t } = useI18n()
const showMore = ref(false)
const engineControl = useEngineControlStore()
const { platform, captionEngine, audioType, changeSignal } = storeToRefs(engineControl)
const { captionEngine, audioType, changeSignal } = storeToRefs(engineControl)
const currentSourceLang = ref('auto')
const currentTargetLang = ref('zh')
@@ -231,7 +230,7 @@ watch(currentEngine, (val) => {
}
.input-folder>span {
padding: 0 4px;
padding: 0 2px;
border: 2px solid #1677ff;
color: #1677ff;
border-radius: 30%;

View File

@@ -7,12 +7,42 @@
:value="(customized && customizedApp)?$t('status.customized'):engine"
/>
</a-col>
<a-col :span="6">
<a-statistic
:title="$t('status.status')"
:value="engineEnabled?$t('status.started'):$t('status.stopped')"
/>
</a-col>
<a-popover :title="$t('status.engineStatus')">
<template #content>
<a-row class="engine-status">
<a-col :flex="1" :title="$t('status.pid')" style="cursor:pointer;">
<div class="engine-status-title">pid</div>
<div>{{ pid }}</div>
</a-col>
<a-col :flex="1" :title="$t('status.ppid')" style="cursor:pointer;">
<div class="engine-status-title">ppid</div>
<div>{{ ppid }}</div>
</a-col>
<a-col :flex="1" :title="$t('status.cpu')" style="cursor:pointer;">
<div class="engine-status-title">cpu</div>
<div>{{ cpu.toFixed(1) }}%</div>
</a-col>
<a-col :flex="1" :title="$t('status.mem')" style="cursor:pointer;">
<div class="engine-status-title">mem</div>
<div>{{ (mem/1024/1024).toFixed(2) }}MB</div>
</a-col>
<a-col :flex="1" :title="$t('status.elapsed')" style="cursor:pointer;">
<div class="engine-status-title">elapsed</div>
<div>{{ (elapsed/1000).toFixed(0) }}s</div>
</a-col>
</a-row>
</template>
<a-col :span="6" @mouseenter="getEngineInfo" style="cursor: pointer;">
<a-statistic
:title="$t('status.status')"
:value="engineEnabled?$t('status.started'):$t('status.stopped')"
>
<template #suffix v-if="engineEnabled">
<InfoCircleOutlined style="font-size:18px;color:#1677ff"/>
</template>
</a-statistic>
</a-col>
</a-popover>
<a-col :span="6">
<a-statistic :title="$t('status.logNumber')" :value="captionData.length" />
</a-col>
@@ -47,7 +77,7 @@
<p class="about-desc">{{ $t('status.about.desc') }}</p>
<a-divider />
<div class="about-info">
<p><b>{{ $t('status.about.version') }}</b><a-tag color="green">v0.4.0</a-tag></p>
<p><b>{{ $t('status.about.version') }}</b><a-tag color="green">v0.5.1</a-tag></p>
<p>
<b>{{ $t('status.about.author') }}</b>
<a
@@ -88,11 +118,12 @@
</template>
<script setup lang="ts">
import { EngineInfo } from '@renderer/types'
import { ref } from 'vue'
import { storeToRefs } from 'pinia'
import { useCaptionLogStore } from '@renderer/stores/captionLog'
import { useEngineControlStore } from '@renderer/stores/engineControl'
import { GithubOutlined } from '@ant-design/icons-vue';
import { GithubOutlined, InfoCircleOutlined } from '@ant-design/icons-vue';
const showAbout = ref(false)
@@ -101,12 +132,17 @@ const { captionData } = storeToRefs(captionLog)
const engineControl = useEngineControlStore()
const { engineEnabled, engine, customized, customizedApp } = storeToRefs(engineControl)
const pid = ref(0)
const ppid = ref(0)
const cpu = ref(0)
const mem = ref(0)
const elapsed = ref(0)
function openCaptionWindow() {
window.electron.ipcRenderer.send('control.captionWindow.activate')
}
function startEngine() {
if(engineControl.engine === 'vosk' && engineControl.modelPath.trim() === '') {
engineControl.emptyModelPathErr()
return
@@ -117,9 +153,32 @@ function startEngine() {
function stopEngine() {
window.electron.ipcRenderer.send('control.engine.stop')
}
function getEngineInfo() {
window.electron.ipcRenderer.invoke('control.engine.info').then((data: EngineInfo) => {
pid.value = data.pid
ppid.value = data.ppid
cpu.value = data.cpu
mem.value = data.mem
elapsed.value = data.elapsed
})
}
</script>
<style scoped>
.engine-status {
width: max(420px, 36vw);
display: flex;
align-items: center;
padding: 5px 10px;
}
.engine-status-title {
font-size: 12px;
color: var(--tag-color);
}
.about-tag {
color: var(--tag-color);
margin-bottom: 16px;

View File

@@ -91,6 +91,12 @@ export default {
},
status: {
"engine": "Caption Engine",
"engineStatus": "Caption Engine Status",
"pid": "Process ID",
"ppid": "Parent Process ID",
"cpu": "CPU Usage",
"mem": "Memory Usage",
"elapsed": "Running Time",
"customized": "Customized",
"status": "Engine Status",
"started": "Started",
@@ -110,21 +116,30 @@ export default {
"projLink": "Project Link",
"manual": "User Manual",
"engineDoc": "Caption Engine Manual",
"date": "July 11, 2026"
"date": "July 17, 2025"
}
},
log: {
"title": "Caption Log",
"copy": "Copy to Clipboard",
"changeTime": "Modify Time",
"baseTime": "First Caption Start Time",
"hour": "Hour",
"min": "Minute",
"sec": "Second",
"ms": "Millisecond",
"export": "Export Log",
"copy": "Copy Log",
"exportOptions": "Export Options",
"exportFormat": "Format",
"exportContent": "Content",
"copyOptions": "Copy Options",
"addIndex": "Add Index",
"copyTime": "Copy Time",
"copyContent": "Content",
"both": "Original and Translation",
"source": "Original Only",
"translation": "Translation Only",
"both": "Both",
"source": "Original",
"translation": "Translation",
"copySuccess": "Subtitle copied to clipboard",
"export": "Export Caption Log",
"clear": "Clear Caption Log"
"clear": "Clear Log"
}
}

View File

@@ -91,6 +91,12 @@ export default {
},
status: {
"engine": "字幕エンジン",
"engineStatus": "字幕エンジンの状態",
"pid": "プロセス ID",
"ppid": "親プロセス ID",
"cpu": "CPU 使用率",
"mem": "メモリ使用量",
"elapsed": "稼働時間",
"customized": "カスタマイズ済み",
"status": "エンジン状態",
"started": "開始済み",
@@ -110,21 +116,30 @@ export default {
"projLink": "プロジェクトリンク",
"manual": "ユーザーマニュアル",
"engineDoc": "字幕エンジンマニュアル",
"date": "2025 年 7 月 11 日"
"date": "2025 年 7 月 17 日"
}
},
log: {
"title": "字幕ログ",
"copy": "クリップボードにコピー",
"changeTime": "時間を変更",
"baseTime": "最初の字幕開始時間",
"hour": "時",
"min": "分",
"sec": "秒",
"ms": "ミリ秒",
"export": "エクスポート",
"copy": "ログをコピー",
"exportOptions": "エクスポートオプション",
"exportFormat": "形式",
"exportContent": "内容",
"copyOptions": "コピー設定",
"addIndex": "順序番号",
"copyTime": "時間",
"copyContent": "内容",
"both": "原文と翻訳",
"source": "原文のみ",
"translation": "翻訳のみ",
"both": "すべて",
"source": "原文",
"translation": "翻訳",
"copySuccess": "字幕がクリップボードにコピーされました",
"export": "エクスポート",
"clear": "字幕ログをクリア"
"clear": "ログをクリア"
}
}

View File

@@ -91,6 +91,12 @@ export default {
},
status: {
"engine": "字幕引擎",
"engineStatus": "字幕引擎状态",
"pid": "进程ID",
"ppid": "父进程ID",
"cpu": "CPU使用率",
"mem": "内存使用量",
"elapsed": "运行时间",
"customized": "自定义",
"status": "引擎状态",
"started": "已启动",
@@ -110,21 +116,30 @@ export default {
"projLink": "项目链接",
"manual": "用户手册",
"engineDoc": "字幕引擎手册",
"date": "2025 年 7 月 11 日"
"date": "2025 年 7 月 17 日"
}
},
log: {
"title": "字幕记录",
"export": "导出字幕记录",
"copy": "复制到剪贴板",
"changeTime": "修改时间",
"baseTime": "首条字幕起始时间",
"hour": "时",
"min": "分",
"sec": "秒",
"ms": "毫秒",
"export": "导出字幕",
"copy": "复制内容",
"exportOptions": "导出选项",
"exportFormat": "导出格式",
"exportContent": "导出内容",
"copyOptions": "复制选项",
"addIndex": "添加序号",
"copyTime": "复制时间",
"copyContent": "复制内容",
"both": "原文与翻译",
"source": "原文",
"translation": "翻译",
"both": "全部",
"source": "原文",
"translation": "翻译",
"copySuccess": "字幕已复制到剪贴板",
"clear": "清空字幕记录"
"clear": "清空记录"
}
}

View File

@@ -1,4 +1,4 @@
import { ref, watch } from 'vue'
import { ref } from 'vue'
import { defineStore } from 'pinia'
import { notification } from 'ant-design-vue'
@@ -104,12 +104,6 @@ export const useEngineControlStore = defineStore('engineControl', () => {
});
})
watch(platform, (newValue) => {
if(newValue !== 'win32' && newValue !== 'darwin') {
audio.value = 1
}
})
return {
platform, // system platform
captionEngine, // list of caption engines

View File

@@ -54,3 +54,11 @@ export interface FullConfig {
controls: Controls,
captionLog: CaptionItem[]
}
export interface EngineInfo {
pid: number,
ppid: number,
cpu: number,
mem: number,
elapsed: number
}

View File

@@ -0,0 +1,42 @@
export interface Time {
hh: number;
mm: number;
ss: number;
ms: number;
}
export function getTimeFromStr(time: string): Time {
const arr = time.split(":");
const hh = parseInt(arr[0]);
const mm = parseInt(arr[1]);
const ss = parseInt(arr[2].split(".")[0]);
const ms = parseInt(arr[2].split(".")[1]);
return { hh, mm, ss, ms };
}
export function getStrFromTime(time: Time): string {
const pad = (n: number, len: number) => String(n).padStart(len, "0");
return `${pad(time.hh, 2)}:${pad(time.mm, 2)}:${pad(time.ss, 2)}.${pad(time.ms, 3)}`;
}
export function getMsFromTime(time: Time): number {
return (
time.hh * 3600000 +
time.mm * 60000 +
time.ss * 1000 +
time.ms
);
}
export function getTimeFromMs(milliseconds: number): Time {
const hh = Math.floor(milliseconds / 3600000);
const mm = Math.floor((milliseconds % 3600000) / 60000);
const ss = Math.floor((milliseconds % 60000) / 1000);
const ms = milliseconds % 1000;
return { hh, mm, ss, ms };
}
export function getNewTimeStr(timeStr: string, deltaMs: number): string {
const timeMs = getMsFromTime(getTimeFromStr(timeStr));
const newTimeMs = Math.max(0, timeMs + deltaMs);
return getStrFromTime(getTimeFromMs(newTimeMs));
}
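The millisecond arithmetic above can be sanity-checked in isolation. The functions are re-declared here so the sketch runs on its own; the example values are illustrative.

```typescript
// Standalone check of the time <-> millisecond conversions used above.
interface Time { hh: number; mm: number; ss: number; ms: number; }

function getMsFromTime(t: Time): number {
  return t.hh * 3600000 + t.mm * 60000 + t.ss * 1000 + t.ms;
}

function getTimeFromMs(milliseconds: number): Time {
  const hh = Math.floor(milliseconds / 3600000);
  const mm = Math.floor((milliseconds % 3600000) / 60000);
  const ss = Math.floor((milliseconds % 60000) / 1000);
  const ms = milliseconds % 1000;
  return { hh, mm, ss, ms };
}

// Shifting 01:02:03.004 by +2500 ms lands on 01:02:05.504.
const shifted = getTimeFromMs(getMsFromTime({ hh: 1, mm: 2, ss: 3, ms: 4 }) + 2500);
// shifted = { hh: 1, mm: 2, ss: 5, ms: 504 }
```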

View File

@@ -1,24 +1,11 @@
<template>
<div
class="caption-page"
ref="caption"
:style="{
backgroundColor: captionStyle.backgroundRGBA
}"
class="caption-page"
ref="caption"
:style="{
backgroundColor: captionStyle.backgroundRGBA
}"
>
<div class="title-bar" :style="{color: captionStyle.fontColor}">
<div class="drag-area">&nbsp;</div>
<div class="option-item" @click="pinCaptionWindow">
<PushpinFilled v-if="pinned" />
<PushpinOutlined v-else />
</div>
<div class="option-item" @click="openControlWindow">
<SettingOutlined />
</div>
<div class="option-item" @click="closeCaptionWindow">
<CloseOutlined />
</div>
</div>
<div
class="caption-container"
:style="{
@@ -46,6 +33,20 @@
<span v-else>{{ $t('example.translation') }}</span>
</p>
</div>
<div class="title-bar" :style="{color: captionStyle.fontColor}">
<div class="option-item" @click="closeCaptionWindow">
<CloseOutlined />
</div>
<div class="option-item" @click="openControlWindow">
<SettingOutlined />
</div>
<div class="option-item" @click="pinCaptionWindow">
<PushpinFilled v-if="pinned" />
<PushpinOutlined v-else />
</div>
<div class="drag-area"></div>
</div>
</div>
</template>
@@ -97,38 +98,21 @@ function closeCaptionWindow() {
border-radius: 8px;
box-sizing: border-box;
border: 1px solid #3333;
}
.title-bar {
display: flex;
align-items: center;
}
.drag-area {
padding: 5px;
flex-grow: 1;
-webkit-app-region: drag;
}
.option-item {
display: inline-block;
padding: 5px 10px;
cursor: pointer;
}
.option-item:hover {
background-color: #2221;
}
.caption-container {
display: inline-block;
width: calc(100% - 32px);
-webkit-app-region: drag;
padding-top: 10px;
padding-bottom: 10px;
}
.caption-container p {
text-align: center;
margin: 0;
line-height: 1.5em;
padding: 0 10px 10px 10px;
line-height: 1.6em;
}
.left-ellipsis {
@@ -142,4 +126,30 @@ function closeCaptionWindow() {
direction: ltr;
display: inline-block;
}
.title-bar {
width: 32px;
display: flex;
flex-direction: column;
vertical-align: top;
}
.option-item {
width: 32px;
height: 32px;
display: flex;
justify-content: center;
align-items: center;
cursor: pointer;
}
.option-item:hover {
background-color: #2221;
}
.drag-area {
display: inline-flex;
flex-grow: 1;
-webkit-app-region: drag;
}
</style>