8 Commits

Author SHA1 Message Date
himeditator
e4f937e6b6 feat(engine): improve caption engine communication and control logic; improve window info display
- improve error handling and engine restart logic
- add a force-kill option for the caption engine
- adjust where notifications and error messages are displayed
- increase log timestamp precision to milliseconds
2025-07-28 21:44:49 +08:00
himeditator
cd9f3a847d feat(engine): refactor the caption engine and implement WebSocket communication
- refactored the Gummy and Vosk caption engines for better extensibility and readability
- merged the Gummy and Vosk engines into a single executable
- implemented WebSocket communication between the caption engine and the main process, eliminating orphan processes
2025-07-28 15:49:52 +08:00
himeditator
b658ef5440 feat(engine): improve caption engine output format; prepare to merge the two caption engines
- refactored caption-engine-related code
- prepared for merging the two caption engines
2025-07-27 17:15:12 +08:00
himeditator
3792eb88b6 refactor(engine): refactor the caption engine
- updated the GummyTranslator class and improved caption generation logic
- removed the audioprcs module; audio processing moved to the utils module
- refactored the sysaudio module for more flexible and stable audio stream management
- updated TODO.md: completed sorting caption records in descending time order
- updated docs to note that the English and Japanese documentation will no longer be maintained due to limited resources
2025-07-26 23:37:24 +08:00
himeditator
8e575a9ba3 refactor(engine): rename the caption engine folder; add descending-order option for caption records
- the caption record table can now be sorted in descending time order
- renamed caption-engine to engine
- updated related file and folder paths
- updated related content in the README and TODO documents
- updated the Electron build configuration
2025-07-26 21:29:16 +08:00
himeditator
697488ce84 docs: update README, add TODO 2025-07-20 00:32:57 +08:00
himeditator
f7d2df938d fix(engine): fix issues with custom caption engines 2025-07-17 20:52:27 +08:00
himeditator
5513c7e84c docs(compatibility): add Kylin OS support; update docs 2025-07-16 20:55:03 +08:00
59 changed files with 854 additions and 1049 deletions

.gitignore vendored
View File

@@ -5,8 +5,8 @@ out
 .eslintcache
 *.log*
 __pycache__
-subenv
-caption-engine/build
-caption-engine/models
-output.wav
 .venv
+subenv
+engine/build
+engine/models
+engine/notebook

View File

@@ -9,6 +9,6 @@
     "editor.defaultFormatter": "esbenp.prettier-vscode"
   },
   "python.analysis.extraPaths": [
-    "./caption-engine"
+    "./engine"
   ]
 }

View File

@@ -4,7 +4,7 @@
 <p>Auto Caption 是一个跨平台的实时字幕显示软件。</p>
 <p>
   <a href="https://github.com/HiMeditator/auto-caption/releases">
-    <img src="https://img.shields.io/badge/release-0.5.0-blue">
+    <img src="https://img.shields.io/badge/release-0.5.1-blue">
   </a>
   <a href="https://github.com/HiMeditator/auto-caption/issues">
     <img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange">
@@ -18,7 +18,7 @@
   | <a href="./README_en.md">English</a>
   | <a href="./README_ja.md">日本語</a> |
 </p>
-<p><i>v0.5.0 版本已经发布。<b>目前 Vosk 本地字幕引擎效果较差,且不含翻译</b>,更优秀的字幕引擎正在尝试开发中...</i></p>
+<p><i>v0.5.1 版本已经发布。<b>目前 Vosk 本地字幕引擎效果较差,且不含翻译</b>,更优秀的字幕引擎正在尝试开发中...</i></p>
 </div>

 ![](./assets/media/main_zh.png)
@@ -45,6 +45,7 @@
 | macOS Sequoia 15.5 | arm64 | ✅需要额外配置 | ✅ |
 | Ubuntu 24.04.2 | x64 | ✅ | ✅ |
 | Kali Linux 2022.3 | x64 | ✅ | ✅ |
+| Kylin Server V10 SP3 | x64 | ✅ | ✅ |

 macOS 平台和 Linux 平台获取系统音频输出需要进行额外设置,详见[Auto Caption 用户手册](./docs/user-manual/zh.md)。
@@ -121,10 +122,10 @@ npm install
 ### 构建字幕引擎

-首先进入 `caption-engine` 文件夹,执行如下指令创建虚拟环境:
+首先进入 `engine` 文件夹,执行如下指令创建虚拟环境:

 ```bash
-# in ./caption-engine folder
+# in ./engine folder
 python -m venv subenv
 # or
 python3 -m venv subenv
@@ -172,7 +173,7 @@ vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
 vosk_path = str(Path('./subenv/lib/python3.x/site-packages/vosk').resolve())
 ```

-此时项目构建完成,在进入 `caption-engine/dist` 文件夹可见对应的可执行文件。即可进行后续操作。
+此时项目构建完成,在进入 `engine/dist` 文件夹可见对应的可执行文件。即可进行后续操作。

 ### 运行项目
@@ -182,8 +183,6 @@ npm run dev
 ### 构建项目

-注意目前软件只在 Windows 和 macOS 平台上进行了构建和测试,无法保证软件在 Linux 平台下的正确性。
-
 ```bash
 # For windows
 npm run build:win
@@ -198,13 +197,13 @@ npm run build:linux
 ```yml
 extraResources:
   # For Windows
-  - from: ./caption-engine/dist/main-gummy.exe
-    to: ./caption-engine/main-gummy.exe
-  - from: ./caption-engine/dist/main-vosk.exe
-    to: ./caption-engine/main-vosk.exe
+  - from: ./engine/dist/main-gummy.exe
+    to: ./engine/main-gummy.exe
+  - from: ./engine/dist/main-vosk.exe
+    to: ./engine/main-vosk.exe
   # For macOS and Linux
-  # - from: ./caption-engine/dist/main-gummy
-  #   to: ./caption-engine/main-gummy
-  # - from: ./caption-engine/dist/main-vosk
-  #   to: ./caption-engine/main-vosk
+  # - from: ./engine/dist/main-gummy
+  #   to: ./engine/main-gummy
+  # - from: ./engine/dist/main-vosk
+  #   to: ./engine/main-vosk
 ```

View File

@@ -4,7 +4,7 @@
 <p>Auto Caption is a cross-platform real-time caption display software.</p>
 <p>
   <a href="https://github.com/HiMeditator/auto-caption/releases">
-    <img src="https://img.shields.io/badge/release-0.5.0-blue">
+    <img src="https://img.shields.io/badge/release-0.5.1-blue">
   </a>
   <a href="https://github.com/HiMeditator/auto-caption/issues">
     <img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange">
@@ -18,7 +18,7 @@
   | <b>English</b>
   | <a href="./README_ja.md">日本語</a> |
 </p>
-<p><i>Version v0.5.0 has been released. <b>The current Vosk local caption engine performs poorly and does not include translation</b>. A better caption engine is under development...</i></p>
+<p><i>Version v0.5.1 has been released. <b>The current Vosk local caption engine performs poorly and does not include translation</b>. A better caption engine is under development...</i></p>
 </div>

 ![](./assets/media/main_en.png)
@@ -45,6 +45,7 @@ The software has been adapted for Windows, macOS, and Linux platforms. The teste
 | macOS Sequoia 15.5 | arm64 | ✅ Additional config required | ✅ |
 | Ubuntu 24.04.2 | x64 | ✅ | ✅ |
 | Kali Linux 2022.3 | x64 | ✅ | ✅ |
+| Kylin Server V10 SP3 | x64 | ✅ | ✅ |

 Additional configuration is required to capture system audio output on macOS and Linux platforms. See [Auto Caption User Manual](./docs/user-manual/en.md) for details.
@@ -121,10 +122,10 @@ npm install
 ### Build Subtitle Engine

-First enter the `caption-engine` folder and execute the following commands to create a virtual environment:
+First enter the `engine` folder and execute the following commands to create a virtual environment:

 ```bash
-# in ./caption-engine folder
+# in ./engine folder
 python -m venv subenv
 # or
 python3 -m venv subenv
@@ -172,7 +173,7 @@ vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
 vosk_path = str(Path('./subenv/lib/python3.x/site-packages/vosk').resolve())
 ```

-After the build completes, you can find the executable file in the `caption-engine/dist` folder. Then proceed with subsequent operations.
+After the build completes, you can find the executable file in the `engine/dist` folder. Then proceed with subsequent operations.

 ### Run Project
@@ -182,8 +183,6 @@ npm run dev
 ### Build Project

-Note: Currently the software has only been built and tested on Windows and macOS platforms. Correct operation on Linux platform is not guaranteed.
-
 ```bash
 # For windows
 npm run build:win
@@ -198,13 +197,13 @@ Note: You need to modify the configuration content in the `electron-builder.yml`
 ```yml
 extraResources:
   # For Windows
-  - from: ./caption-engine/dist/main-gummy.exe
-    to: ./caption-engine/main-gummy.exe
-  - from: ./caption-engine/dist/main-vosk.exe
-    to: ./caption-engine/main-vosk.exe
+  - from: ./engine/dist/main-gummy.exe
+    to: ./engine/main-gummy.exe
+  - from: ./engine/dist/main-vosk.exe
+    to: ./engine/main-vosk.exe
   # For macOS and Linux
-  # - from: ./caption-engine/dist/main-gummy
-  #   to: ./caption-engine/main-gummy
-  # - from: ./caption-engine/dist/main-vosk
-  #   to: ./caption-engine/main-vosk
+  # - from: ./engine/dist/main-gummy
+  #   to: ./engine/main-gummy
+  # - from: ./engine/dist/main-vosk
+  #   to: ./engine/main-vosk
 ```

View File

@@ -4,7 +4,7 @@
 <p>Auto Caption はクロスプラットフォームのリアルタイム字幕表示ソフトウェアです。</p>
 <p>
   <a href="https://github.com/HiMeditator/auto-caption/releases">
-    <img src="https://img.shields.io/badge/release-0.5.0-blue">
+    <img src="https://img.shields.io/badge/release-0.5.1-blue">
   </a>
   <a href="https://github.com/HiMeditator/auto-caption/issues">
     <img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange">
@@ -18,7 +18,7 @@
   | <a href="./README_en.md">English</a>
   | <b>日本語</b> |
 </p>
-<p><i>バージョン v0.5.0 がリリースされました。<b>現在の Vosk ローカル字幕エンジンは性能が低く、翻訳機能も含まれていません</b>。より優れた字幕エンジンを開発中です...</i></p>
+<p><i>バージョン v0.5.1 がリリースされました。<b>現在の Vosk ローカル字幕エンジンは性能が低く、翻訳機能も含まれていません</b>。より優れた字幕エンジンを開発中です...</i></p>
 </div>

 ![](./assets/media/main_ja.png)
@@ -45,6 +45,7 @@
 | macOS Sequoia 15.5 | arm64 | ✅ 追加設定が必要 | ✅ |
 | Ubuntu 24.04.2 | x64 | ✅ | ✅ |
 | Kali Linux 2022.3 | x64 | ✅ | ✅ |
+| Kylin Server V10 SP3 | x64 | ✅ | ✅ |

 macOSおよびLinuxプラットフォームでシステムオーディオ出力を取得するには追加設定が必要です。詳細は[Auto Captionユーザーマニュアル](./docs/user-manual/ja.md)をご覧ください。
@@ -121,10 +122,10 @@ npm install
 ### 字幕エンジンの構築

-まず `caption-engine` フォルダに入り、以下のコマンドを実行して仮想環境を作成します:
+まず `engine` フォルダに入り、以下のコマンドを実行して仮想環境を作成します:

 ```bash
-# ./caption-engine フォルダ内
+# ./engine フォルダ内
 python -m venv subenv
 # または
 python3 -m venv subenv
@@ -172,7 +173,7 @@ vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
 vosk_path = str(Path('./subenv/lib/python3.x/site-packages/vosk').resolve())
 ```

-これでプロジェクトのビルドが完了し、`caption-engine/dist` フォルダ内に対応する実行可能ファイルが確認できます。その後、次の操作に進むことができます。
+これでプロジェクトのビルドが完了し、`engine/dist` フォルダ内に対応する実行可能ファイルが確認できます。その後、次の操作に進むことができます。

 ### プロジェクト実行
@@ -182,8 +183,6 @@ npm run dev
 ### プロジェクト構築

-現在、ソフトウェアは Windows と macOS プラットフォームでのみ構築とテストが行われており、Linux プラットフォームでの正しい動作は保証できません。
-
 ```bash
 # Windows 用
 npm run build:win
@@ -198,13 +197,13 @@ npm run build:linux
 ```yml
 extraResources:
   # Windows用
-  - from: ./caption-engine/dist/main-gummy.exe
-    to: ./caption-engine/main-gummy.exe
-  - from: ./caption-engine/dist/main-vosk.exe
-    to: ./caption-engine/main-vosk.exe
+  - from: ./engine/dist/main-gummy.exe
+    to: ./engine/main-gummy.exe
+  - from: ./engine/dist/main-vosk.exe
+    to: ./engine/main-vosk.exe
   # macOSとLinux用
-  # - from: ./caption-engine/dist/main-gummy
-  #   to: ./caption-engine/main-gummy
-  # - from: ./caption-engine/dist/main-vosk
-  #   to: ./caption-engine/main-vosk
+  # - from: ./engine/dist/main-gummy
+  #   to: ./engine/main-gummy
+  # - from: ./engine/dist/main-vosk
+  #   to: ./engine/main-vosk
 ```

Binary file not shown.

Before

Width:  |  Height:  |  Size: 321 KiB

After

Width:  |  Height:  |  Size: 323 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 324 KiB

After

Width:  |  Height:  |  Size: 324 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 323 KiB

After

Width:  |  Height:  |  Size: 324 KiB

Binary file not shown.

View File

@@ -1,2 +0,0 @@
from dashscope.common.error import InvalidParameter
from .gummy import GummyTranslator

View File

@@ -1 +0,0 @@
from .process import mergeChunkChannels, resampleRawChunk, resampleMonoChunk

View File

@@ -1,58 +0,0 @@
import sys
import argparse
if sys.platform == 'win32':
from sysaudio.win import AudioStream
elif sys.platform == 'darwin':
from sysaudio.darwin import AudioStream
elif sys.platform == 'linux':
from sysaudio.linux import AudioStream
else:
raise NotImplementedError(f"Unsupported platform: {sys.platform}")
from audioprcs import mergeChunkChannels
from audio2text import InvalidParameter, GummyTranslator
def convert_audio_to_text(s_lang, t_lang, audio_type, chunk_rate, api_key):
sys.stdout.reconfigure(line_buffering=True) # type: ignore
stream = AudioStream(audio_type, chunk_rate)
if t_lang == 'none':
gummy = GummyTranslator(stream.RATE, s_lang, None, api_key)
else:
gummy = GummyTranslator(stream.RATE, s_lang, t_lang, api_key)
stream.openStream()
gummy.start()
while True:
try:
chunk = stream.read_chunk()
chunk_mono = mergeChunkChannels(chunk, stream.CHANNELS)
try:
gummy.send_audio_frame(chunk_mono)
except InvalidParameter:
gummy.start()
gummy.send_audio_frame(chunk_mono)
except KeyboardInterrupt:
stream.closeStream()
gummy.stop()
break
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Convert system audio stream to text')
parser.add_argument('-s', '--source_language', default='en', help='Source language code')
parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output audio stream, 1 for input audio stream')
parser.add_argument('-c', '--chunk_rate', default=20, help='The number of audio stream chunks collected per second.')
parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
args = parser.parse_args()
convert_audio_to_text(
args.source_language,
args.target_language,
int(args.audio_type),
int(args.chunk_rate),
args.api_key
)

View File

@@ -1,38 +0,0 @@
# -*- mode: python ; coding: utf-8 -*-

a = Analysis(
    ['main-gummy.py'],
    pathex=[],
    binaries=[],
    datas=[],
    hiddenimports=[],
    hookspath=[],
    hooksconfig={},
    runtime_hooks=[],
    excludes=[],
    noarchive=False,
    optimize=0,
)
pyz = PYZ(a.pure)

exe = EXE(
    pyz,
    a.scripts,
    a.binaries,
    a.datas,
    [],
    name='main-gummy',
    debug=False,
    bootloader_ignore_signals=False,
    strip=False,
    upx=True,
    upx_exclude=[],
    runtime_tmpdir=None,
    console=True,
    disable_windowed_traceback=False,
    argv_emulation=False,
    target_arch=None,
    codesign_identity=None,
    entitlements_file=None,
)

View File

@@ -1,83 +0,0 @@
import sys
import json
import argparse
from datetime import datetime
import numpy.core.multiarray

if sys.platform == 'win32':
    from sysaudio.win import AudioStream
elif sys.platform == 'darwin':
    from sysaudio.darwin import AudioStream
elif sys.platform == 'linux':
    from sysaudio.linux import AudioStream
else:
    raise NotImplementedError(f"Unsupported platform: {sys.platform}")

from vosk import Model, KaldiRecognizer, SetLogLevel
from audioprcs import resampleRawChunk

SetLogLevel(-1)

def convert_audio_to_text(audio_type, chunk_rate, model_path):
    sys.stdout.reconfigure(line_buffering=True)  # type: ignore
    if model_path.startswith('"'):
        model_path = model_path[1:]
    if model_path.endswith('"'):
        model_path = model_path[:-1]
    model = Model(model_path)
    recognizer = KaldiRecognizer(model, 16000)
    stream = AudioStream(audio_type, chunk_rate)
    stream.openStream()
    time_str = ''
    cur_id = 0
    prev_content = ''
    while True:
        chunk = stream.read_chunk()
        chunk_mono = resampleRawChunk(chunk, stream.CHANNELS, stream.RATE, 16000)
        caption = {}
        if recognizer.AcceptWaveform(chunk_mono):
            content = json.loads(recognizer.Result()).get('text', '')
            caption['index'] = cur_id
            caption['text'] = content
            caption['time_s'] = time_str
            caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
            caption['translation'] = ''
            prev_content = ''
            cur_id += 1
        else:
            content = json.loads(recognizer.PartialResult()).get('partial', '')
            if content == '' or content == prev_content:
                continue
            if prev_content == '':
                time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
            caption['index'] = cur_id
            caption['text'] = content
            caption['time_s'] = time_str
            caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
            caption['translation'] = ''
            prev_content = content
        try:
            json_str = json.dumps(caption) + '\n'
            sys.stdout.write(json_str)
            sys.stdout.flush()
        except Exception as e:
            print(e)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Convert system audio stream to text')
    parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output audio stream, 1 for input audio stream')
    parser.add_argument('-c', '--chunk_rate', default=20, help='The number of audio stream chunks collected per second.')
    parser.add_argument('-m', '--model_path', default='', help='The path to the vosk model.')
    args = parser.parse_args()
    convert_audio_to_text(
        int(args.audio_type),
        int(args.chunk_rate),
        args.model_path
    )

View File

@@ -105,3 +105,31 @@
 - 调整字幕窗口右上角图标为竖向排布
 - 过滤 Gummy 字幕引擎输出的不完整字幕
+
+## v0.5.1
+
+2025-07-17
+
+### 修复 bug
+
+- 修复无法调用自定义字幕引擎的 bug
+- 修复自定义字幕引擎的参数失效 bug
+
+## v0.6.0
+
+2025-07-xx
+
+### 新增功能
+
+- 新增字幕记录排序功能,可选择字幕记录正序或倒序显示
+
+### 优化体验
+
+- 交换窗口界面信息和错误提示弹窗的位置,防止提示信息挡住操作
+
+### 项目优化
+
+- 重构字幕引擎,提升字幕引擎代码的可扩展性和可读性
+- 合并 Gummy 和 Vosk 引擎为单个可执行文件,减小软件体积
+- 字幕引擎和主程序添加 WebSocket 通信,完全避免字幕引擎成为孤儿进程

View File

@@ -15,10 +15,13 @@
 - [x] 可以调整字幕时间轴 *2025/07/14*
 - [x] 可以导出 srt 格式的字幕记录 *2025/07/14*
 - [x] 可以获取字幕引擎的系统资源消耗情况 *2025/07/15*
+- [x] 添加字幕记录按时间降序排列选择 *2025/07/26*
+- [x] 重构字幕引擎 *2025/07/28*

 ## 待完成

-- [ ] 探索更多的语音转文字模型
+- [ ] 优化前端界面提示消息
+- [ ] 验证 / 添加基于 sherpa-onnx 的字幕引擎

 ## 后续计划

View File

@@ -0,0 +1,102 @@
# caption engine api-doc

本文档主要介绍字幕引擎和 Electron 主进程的通信约定。

## 原理说明

本项目的 Python 进程通过标准输出向 Electron 主进程发送数据。Python 进程标准输出(`sys.stdout`)的内容一定为一行一行的字符串,且每行字符串均可以解析为一个 JSON 对象。每个 JSON 对象一定有 `command` 参数。

Electron 主进程通过 WebSocket 向 Python 进程发送数据。发送的数据均是转化为字符串的对象,对象格式一定为:

```js
{
    command: string,
    content: string
}
```

## 标准输出约定

> 数据传递方向:字幕引擎进程 => Electron 主进程

当 JSON 对象的 `command` 参数为下列值时,表示对应的含义:

### `connect`

```js
{
    command: "connect",
    content: ""
}
```

字幕引擎 WebSocket 服务已经准备好,命令 Electron 主进程连接字幕引擎 WebSocket 服务。

### `kill`

```js
{
    command: "kill",
    content: ""
}
```

命令 Electron 主进程强制结束字幕引擎进程。

### `caption`

```js
{
    command: "caption",
    index: number,
    time_s: string,
    time_t: string,
    text: string,
    translation: string
}
```

Python 端监听到的音频流转换为的字幕数据。

### `print`

```js
{
    command: "print",
    content: string
}
```

输出 Python 端打印的内容。

### `info`

```js
{
    command: "info",
    content: string
}
```

Python 端打印的提示信息,比起 `print`,该信息更希望得到 Electron 端的关注。

### `usage`

```js
{
    command: "usage",
    content: string
}
```

Gummy 字幕引擎结束时打印计费消耗信息。

## WebSocket

> 数据传递方向:Electron 主进程 => 字幕引擎进程

当 JSON 对象的 `command` 参数为下列值时,表示对应的含义:

### `stop`

命令当前字幕引擎停止监听并结束任务。
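To make the stdout convention above concrete, here is a minimal, hypothetical sketch of the engine side. It is not the project's actual implementation; the free function `send_to_node` is modeled loosely on the method of the same name mentioned in the engine docs, and it simply serializes one protocol message per line:

```python
import json
import sys

def send_to_node(command: str, **fields) -> str:
    """Emit one protocol message as a single JSON line on stdout and return it."""
    message = {"command": command, **fields}
    line = json.dumps(message, ensure_ascii=False)
    sys.stdout.write(line + "\n")
    sys.stdout.flush()  # line buffering matters: Electron reads one JSON object per line
    return line

# Tell the Electron main process the WebSocket server is ready to accept a connection.
send_to_node("connect", content="")

# Push one caption item (field names follow the `caption` command above).
send_to_node(
    "caption",
    index=0,
    time_s="12:00:00.000",
    time_t="12:00:01.500",
    text="hello world",
    translation="你好,世界",
)
```

Messages in the other direction (such as `{"command": "stop"}`) would arrive over the WebSocket connection rather than stdin.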

View File

@@ -1,6 +1,8 @@
 # Caption Engine Documentation

-Corresponding Version: v0.5.0
+Corresponding Version: v0.5.1
+
+**Note: Due to limited personal resources, the English and Japanese documentation files for this project (except for the README document) will no longer be maintained. The content of this document may not be consistent with the latest version of the project. If you are willing to help with translation, please submit relevant Pull Requests.**

 ![](../../assets/media/structure_en.png)
@@ -20,7 +22,7 @@ Generally, the captured audio stream data consists of short audio chunks, and th
 The acquired audio stream may need preprocessing before being converted to text. For instance, Alibaba Cloud's Gummy model can only recognize single-channel audio streams, while the collected audio streams are typically dual-channel, thus requiring conversion from dual-channel to single-channel. Channel conversion can be achieved using methods in the NumPy library.

-You can directly use the audio acquisition (`caption-engine/sysaudio`) and audio processing (`caption-engine/audioprcs`) modules I have developed.
+You can directly use the audio acquisition (`engine/sysaudio`) and audio processing (`engine/audioprcs`) modules I have developed.

 ### Audio to Text Conversion
@@ -105,10 +107,10 @@ export interface CaptionItem {
 If using Python, you can refer to the following method to pass data to the main program:

 ```python
-# caption-engine\main-gummy.py
+# engine\main-gummy.py
 sys.stdout.reconfigure(line_buffering=True)

-# caption-engine\audio2text\gummy.py
+# engine\audio2text\gummy.py
 ...
 def send_to_node(self, data):
     """
@@ -198,4 +200,4 @@ With a working caption engine, specify its path and runtime parameters in the ca
 ## Reference Code

-The `main-gummy.py` file under the `caption-engine` folder in this project serves as the entry point for the default caption engine. The `src\main\utils\engine.ts` file contains the server-side code for acquiring and processing data from the caption engine. You can read and understand the implementation details and the complete execution process of the caption engine as needed.
+The `main-gummy.py` file under the `engine` folder in this project serves as the entry point for the default caption engine. The `src\main\utils\engine.ts` file contains the server-side code for acquiring and processing data from the caption engine. You can read and understand the implementation details and the complete execution process of the caption engine as needed.

View File

@@ -1,9 +1,11 @@
 # 字幕エンジンの説明文書

-対応バージョン:v0.5.0
+対応バージョン:v0.5.1

 この文書は大規模モデルを使用して翻訳されていますので、内容に正確でない部分があるかもしれません。

+**注意:個人のリソースが限られているため、このプロジェクトの英語および日本語のドキュメント(README ドキュメントを除く)のメンテナンスは行われません。このドキュメントの内容は最新版のプロジェクトと一致しない場合があります。翻訳のお手伝いをしていただける場合は、関連するプルリクエストを提出してください。**
+
 ![](../../assets/media/structure_ja.png)

 ## 字幕エンジンの紹介
@@ -22,7 +24,7 @@
 取得した音声ストリームは、テキストに変換する前に前処理が必要な場合があります。例えば、アリババクラウドのGummyモデルは単一チャンネルの音声ストリームしか認識できませんが、収集された音声ストリームは通常二重チャンネルであるため、二重チャンネルの音声ストリームを単一チャンネルに変換する必要があります。チャンネル数の変換はNumPyライブラリのメソッドを使って行うことができます。

-あなたは私によって開発された音声の取得(`caption-engine/sysaudio`)と音声の処理(`caption-engine/audioprcs`)モジュールを直接使用することができます。
+あなたは私によって開発された音声の取得(`engine/sysaudio`)と音声の処理(`engine/audioprcs`)モジュールを直接使用することができます。

 ### 音声からテキストへの変換
@@ -107,10 +109,10 @@ export interface CaptionItem {
 Python言語を使用する場合、以下の方法でデータをメインプログラムに渡すことができます:

 ```python
-# caption-engine\main-gummy.py
+# engine\main-gummy.py
 sys.stdout.reconfigure(line_buffering=True)

-# caption-engine\audio2text\gummy.py
+# engine\audio2text\gummy.py
 ...
 def send_to_node(self, data):
     """
@@ -198,4 +200,4 @@ python main-gummy.py -s ja -t zh -a 0 -c 10 -k <your-api-key>
 ## 参考コード

-本プロジェクトの`caption-engine`フォルダにある`main-gummy.py`ファイルはデフォルトの字幕エンジンのエントリーコードです。`src\main\utils\engine.ts`はサーバー側で字幕エンジンのデータを取得・処理するコードです。必要に応じて字幕エンジンの実装詳細と完全な実行プロセスを理解するために参照してください。
+本プロジェクトの`engine`フォルダにある`main-gummy.py`ファイルはデフォルトの字幕エンジンのエントリーコードです。`src\main\utils\engine.ts`はサーバー側で字幕エンジンのデータを取得・処理するコードです。必要に応じて字幕エンジンの実装詳細と完全な実行プロセスを理解するために参照してください。

View File

@@ -1,6 +1,6 @@
 # 字幕引擎说明文档

-对应版本:v0.5.0
+对应版本:v0.5.1

 ![](../../assets/media/structure_zh.png)
@@ -20,7 +20,7 @@
 获取到的音频流在转文字之前可能需要进行预处理。比如阿里云的 Gummy 模型只能识别单通道的音频流,而收集的音频流一般是双通道的,因此要将双通道音频流转换为单通道。通道数的转换可以使用 NumPy 库中的方法实现。

-你可以直接使用我开发好的音频获取(`caption-engine/sysaudio`)和音频处理(`caption-engine/audioprcs`)模块。
+你可以直接使用我开发好的音频获取(`engine/sysaudio`)和音频处理(`engine/audioprcs`)模块。

 ### 音频转文字
@@ -105,10 +105,10 @@ export interface CaptionItem {
 如果使用 python 语言,可以参考以下方式将数据传递给主程序:

 ```python
-# caption-engine\main-gummy.py
+# engine\main-gummy.py
 sys.stdout.reconfigure(line_buffering=True)

-# caption-engine\audio2text\gummy.py
+# engine\audio2text\gummy.py
 ...
 def send_to_node(self, data):
     """
@@ -198,4 +198,4 @@ python main-gummy.py -s ja -t zh -a 0 -c 10 -k <your-api-key>
 ## 参考代码

-本项目 `caption-engine` 文件夹下的 `main-gummy.py` 文件为默认字幕引擎的入口代码。`src\main\utils\engine.ts` 为服务端获取字幕引擎数据和进行处理的代码。可以根据需要阅读了解字幕引擎的实现细节和完整运行过程。
+本项目 `engine` 文件夹下的 `main-gummy.py` 文件为默认字幕引擎的入口代码。`src\main\utils\engine.ts` 为服务端获取字幕引擎数据和进行处理的代码。可以根据需要阅读了解字幕引擎的实现细节和完整运行过程。

Binary file not shown.

Before

Width:  |  Height:  |  Size: 26 KiB

After

Width:  |  Height:  |  Size: 57 KiB

View File

@@ -1,6 +1,8 @@
 # Auto Caption User Manual

-Corresponding Version: v0.5.0
+Corresponding Version: v0.5.1
+
+**Note: Due to limited personal resources, the English and Japanese documentation files for this project (except for the README document) will no longer be maintained. The content of this document may not be consistent with the latest version of the project. If you are willing to help with translation, please submit relevant Pull Requests.**

 ## Software Introduction
@@ -16,6 +18,7 @@ The following operating system versions have been tested and confirmed to work p
 | macOS Sequoia 15.5 | arm64 | ✅ Additional config required | ✅ |
 | Ubuntu 24.04.2 | x64 | ✅ | ✅ |
 | Kali Linux 2022.3 | x64 | ✅ | ✅ |
+| Kylin Server V10 SP3 | x64 | ✅ | ✅ |

 ![](../../assets/media/main_en.png)

View File

@@ -1,9 +1,11 @@
 # Auto Caption ユーザーマニュアル

-対応バージョン:v0.5.0
+対応バージョン:v0.5.1

 この文書は大規模モデルを使用して翻訳されていますので、内容に正確でない部分があるかもしれません。

+**注意:個人のリソースが限られているため、このプロジェクトの英語および日本語のドキュメント(README ドキュメントを除く)のメンテナンスは行われません。このドキュメントの内容は最新版のプロジェクトと一致しない場合があります。翻訳のお手伝いをしていただける場合は、関連するプルリクエストを提出してください。**
+
 ## ソフトウェアの概要

 Auto Caption は、クロスプラットフォームの字幕表示ソフトウェアで、システムの音声入力(録音)または出力(音声再生)のストリーミングデータをリアルタイムで取得し、音声からテキストに変換するモデルを利用して対応する音声の字幕を生成します。このソフトウェアが提供するデフォルトの字幕エンジン(アリババクラウド Gummy モデルを使用)は、9つの言語(中国語、英語、日本語、韓国語、ドイツ語、フランス語、ロシア語、スペイン語、イタリア語)の認識と翻訳をサポートしています。
@@ -18,6 +20,7 @@ Auto Caption は、クロスプラットフォームの字幕表示ソフトウ
 | macOS Sequoia 15.5 | arm64 | ✅ 追加設定が必要 | ✅ |
 | Ubuntu 24.04.2 | x64 | ✅ | ✅ |
 | Kali Linux 2022.3 | x64 | ✅ | ✅ |
+| Kylin Server V10 SP3 | x64 | ✅ | ✅ |

 ![](../../assets/media/main_ja.png)

View File

@@ -1,6 +1,6 @@
 # Auto Caption 用户手册

-对应版本:v0.5.0
+对应版本:v0.5.1

 ## 软件简介
@@ -16,6 +16,7 @@ Auto Caption 是一个跨平台的字幕显示软件,能够实时获取系统
 | macOS Sequoia 15.5 | arm64 | ✅需要额外配置 | ✅ |
 | Ubuntu 24.04.2 | x64 | ✅ | ✅ |
 | Kali Linux 2022.3 | x64 | ✅ | ✅ |
+| Kylin Server V10 SP3 | x64 | ✅ | ✅ |

 ![](../../assets/media/main_zh.png)

View File

@@ -10,21 +10,16 @@ files:
   - '!{LICENSE,README.md,README_en.md,README_ja.md}'
   - '!{.env,.env.*,.npmrc,pnpm-lock.yaml}'
   - '!{tsconfig.json,tsconfig.node.json,tsconfig.web.json}'
-  - '!caption-engine/*'
+  - '!engine/*'
+  - '!engine-test/*'
   - '!docs/*'
   - '!assets/*'
 extraResources:
   # For Windows
-  - from: ./caption-engine/dist/main-gummy.exe
-    to: ./caption-engine/main-gummy.exe
-  - from: ./caption-engine/dist/main-vosk.exe
-    to: ./caption-engine/main-vosk.exe
+  - from: ./engine/dist/main.exe
+    to: ./engine/main.exe
   # For macOS and Linux
-  # - from: ./caption-engine/dist/main-gummy
-  #   to: ./caption-engine/main-gummy
-  # - from: ./caption-engine/dist/main-vosk
-  #   to: ./caption-engine/main-vosk
+  # - from: ./engine/dist/main
+  #   to: ./engine/main
 win:
   executableName: auto-caption
   icon: build/icon.png

View File

@@ -1,221 +0,0 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dashscope.audio.asr import * # type: ignore\n",
"import pyaudiowpatch as pyaudio\n",
"import numpy as np\n",
"\n",
"\n",
"def getDefaultSpeakers(mic: pyaudio.PyAudio, info = True):\n",
" \"\"\"\n",
" 获取默认的系统音频输出的回环设备\n",
" Args:\n",
" mic (pyaudio.PyAudio): pyaudio对象\n",
" info (bool, optional): 是否打印设备信息. Defaults to True.\n",
"\n",
" Returns:\n",
" dict: 统音频输出的回环设备\n",
" \"\"\"\n",
" try:\n",
" WASAPI_info = mic.get_host_api_info_by_type(pyaudio.paWASAPI)\n",
" except OSError:\n",
" print(\"Looks like WASAPI is not available on the system. Exiting...\")\n",
" exit()\n",
"\n",
" default_speaker = mic.get_device_info_by_index(WASAPI_info[\"defaultOutputDevice\"])\n",
" if(info): print(\"wasapi_info:\\n\", WASAPI_info, \"\\n\")\n",
" if(info): print(\"default_speaker:\\n\", default_speaker, \"\\n\")\n",
"\n",
" if not default_speaker[\"isLoopbackDevice\"]:\n",
" for loopback in mic.get_loopback_device_info_generator():\n",
" if default_speaker[\"name\"] in loopback[\"name\"]:\n",
" default_speaker = loopback\n",
" if(info): print(\"Using loopback device:\\n\", default_speaker, \"\\n\")\n",
" break\n",
" else:\n",
" print(\"Default loopback output device not found.\")\n",
" print(\"Run `python -m pyaudiowpatch` to check available devices.\")\n",
" print(\"Exiting...\")\n",
" exit()\n",
" \n",
" if(info): print(f\"Recording Device: #{default_speaker['index']} {default_speaker['name']}\")\n",
" return default_speaker\n",
"\n",
"\n",
"class Callback(TranslationRecognizerCallback):\n",
" \"\"\"\n",
" 语音大模型流式传输回调对象\n",
" \"\"\"\n",
" def __init__(self):\n",
" super().__init__()\n",
" self.usage = 0\n",
" self.sentences = []\n",
" self.translations = []\n",
" \n",
" def on_open(self) -> None:\n",
" print(\"\\n流式翻译开始...\\n\")\n",
"\n",
" def on_close(self) -> None:\n",
" print(f\"\\nTokens消耗{self.usage}\")\n",
" print(f\"流式翻译结束...\\n\")\n",
" for i in range(len(self.sentences)):\n",
" print(f\"\\n{self.sentences[i]}\\n{self.translations[i]}\\n\")\n",
"\n",
" def on_event(\n",
" self,\n",
" request_id,\n",
" transcription_result: TranscriptionResult,\n",
" translation_result: TranslationResult,\n",
" usage\n",
" ) -> None:\n",
" if transcription_result is not None:\n",
" id = transcription_result.sentence_id\n",
" text = transcription_result.text\n",
" if transcription_result.stash is not None:\n",
" stash = transcription_result.stash.text\n",
" else:\n",
" stash = \"\"\n",
" print(f\"#{id}: {text}{stash}\")\n",
" if usage: self.sentences.append(text)\n",
" \n",
" if translation_result is not None:\n",
" lang = translation_result.get_language_list()[0]\n",
" text = translation_result.get_translation(lang).text\n",
" if translation_result.get_translation(lang).stash is not None:\n",
" stash = translation_result.get_translation(lang).stash.text\n",
" else:\n",
" stash = \"\"\n",
" print(f\"#{lang}: {text}{stash}\")\n",
" if usage: self.translations.append(text)\n",
" \n",
" if usage: self.usage += usage['duration']"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"采样输入设备:\n",
" - 序号26\n",
" - 名称:耳机 (HUAWEI FreeLace 活力版) [Loopback]\n",
" - 最大输入通道数2\n",
" - 默认低输入延迟0.003s\n",
" - 默认高输入延迟0.01s\n",
" - 默认采样率48000.0Hz\n",
" - 是否回环设备True\n",
"\n",
"音频样本块大小4800\n",
"样本位宽2\n",
"音频数据格式8\n",
"音频通道数2\n",
"音频采样率48000\n",
"\n"
]
}
],
"source": [
"mic = pyaudio.PyAudio()\n",
"default_speaker = getDefaultSpeakers(mic, False)\n",
"\n",
"SAMP_WIDTH = pyaudio.get_sample_size(pyaudio.paInt16)\n",
"FORMAT = pyaudio.paInt16\n",
"CHANNELS = default_speaker[\"maxInputChannels\"]\n",
"RATE = int(default_speaker[\"defaultSampleRate\"])\n",
"CHUNK = RATE // 10\n",
"INDEX = default_speaker[\"index\"]\n",
"\n",
"dev_info = f\"\"\"\n",
"采样输入设备:\n",
" - 序号:{default_speaker['index']}\n",
" - 名称:{default_speaker['name']}\n",
" - 最大输入通道数:{default_speaker['maxInputChannels']}\n",
" - 默认低输入延迟:{default_speaker['defaultLowInputLatency']}s\n",
" - 默认高输入延迟:{default_speaker['defaultHighInputLatency']}s\n",
" - 默认采样率:{default_speaker['defaultSampleRate']}Hz\n",
" - 是否回环设备:{default_speaker['isLoopbackDevice']}\n",
"\n",
"音频样本块大小:{CHUNK}\n",
"样本位宽:{SAMP_WIDTH}\n",
"音频数据格式:{FORMAT}\n",
"音频通道数:{CHANNELS}\n",
"音频采样率:{RATE}\n",
"\"\"\"\n",
"print(dev_info)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"RECORD_SECONDS = 20 # 监听时长(s)\n",
"\n",
"stream = mic.open(\n",
" format = FORMAT,\n",
" channels = CHANNELS,\n",
" rate = RATE,\n",
" input = True,\n",
" input_device_index = INDEX\n",
")\n",
"translator = TranslationRecognizerRealtime(\n",
" model = \"gummy-realtime-v1\",\n",
" format = \"pcm\",\n",
" sample_rate = RATE,\n",
" transcription_enabled = True,\n",
" translation_enabled = True,\n",
" source_language = \"ja\",\n",
" translation_target_languages = [\"zh\"],\n",
" callback = Callback()\n",
")\n",
"translator.start()\n",
"\n",
"for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):\n",
" data = stream.read(CHUNK)\n",
" data_np = np.frombuffer(data, dtype=np.int16)\n",
" data_np_r = data_np.reshape(-1, CHANNELS)\n",
" print(data_np_r.shape)\n",
" mono_data = np.mean(data_np_r.astype(np.float32), axis=1)\n",
" mono_data = mono_data.astype(np.int16)\n",
" mono_data_bytes = mono_data.tobytes()\n",
" translator.send_audio_frame(mono_data_bytes)\n",
"\n",
"translator.stop()\n",
"stream.stop_stream()\n",
"stream.close()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "mystd",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,189 +0,0 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 7,
"id": "1e12f3ef",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" 采样输入设备:\n",
" - 设备类型:音频输出\n",
" - 序号0\n",
" - 名称BlackHole 2ch\n",
" - 最大输入通道数2\n",
" - 默认低输入延迟0.01s\n",
" - 默认高输入延迟0.1s\n",
" - 默认采样率48000.0Hz\n",
"\n",
" 音频样本块大小2400\n",
" 样本位宽2\n",
" 采样格式8\n",
" 音频通道数2\n",
" 音频采样率48000\n",
" \n"
]
}
],
"source": [
"import sys\n",
"import os\n",
"import wave\n",
"\n",
"current_dir = os.getcwd() \n",
"sys.path.append(os.path.join(current_dir, '../caption-engine'))\n",
"\n",
"from sysaudio.darwin import AudioStream\n",
"from audioprcs import resampleRawChunk, mergeChunkChannels\n",
"\n",
"stream = AudioStream(0)\n",
"stream.printInfo()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "a72914f4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Recording...\n",
"Done\n"
]
}
],
"source": [
"\"\"\"获取系统音频输出5秒然后保存为wav文件\"\"\"\n",
"\n",
"with wave.open('output.wav', 'wb') as wf:\n",
" wf.setnchannels(stream.CHANNELS)\n",
" wf.setsampwidth(stream.SAMP_WIDTH)\n",
" wf.setframerate(stream.RATE)\n",
" stream.openStream()\n",
"\n",
" print('Recording...')\n",
"\n",
" for _ in range(0, 100):\n",
" chunk = stream.read_chunk()\n",
" if isinstance(chunk, bytes):\n",
" wf.writeframes(chunk)\n",
" else:\n",
" raise Exception('Error: chunk is not bytes')\n",
" \n",
" stream.closeStream() \n",
" print('Done')"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "a6e8a098",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Recording...\n",
"Done\n"
]
}
],
"source": [
"\"\"\"获取系统音频输入转换为单通道音频持续5秒然后保存为wav文件\"\"\"\n",
"\n",
"with wave.open('output.wav', 'wb') as wf:\n",
" wf.setnchannels(1)\n",
" wf.setsampwidth(stream.SAMP_WIDTH)\n",
" wf.setframerate(stream.RATE)\n",
" stream.openStream()\n",
"\n",
" print('Recording...')\n",
"\n",
" for _ in range(0, 100):\n",
" chunk = mergeChunkChannels(\n",
" stream.read_chunk(),\n",
" stream.CHANNELS\n",
" )\n",
" if isinstance(chunk, bytes):\n",
" wf.writeframes(chunk)\n",
" else:\n",
" raise Exception('Error: chunk is not bytes')\n",
" \n",
" stream.closeStream() \n",
" print('Done')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "aaca1465",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Recording...\n",
"Done\n"
]
}
],
"source": [
"\"\"\"获取系统音频输入转换为单通道音频并重采样到16000Hz持续5秒然后保存为wav文件\"\"\"\n",
"\n",
"with wave.open('output.wav', 'wb') as wf:\n",
" wf.setnchannels(1)\n",
" wf.setsampwidth(stream.SAMP_WIDTH)\n",
" wf.setframerate(16000)\n",
" stream.openStream()\n",
"\n",
" print('Recording...')\n",
"\n",
" for _ in range(0, 100):\n",
" chunk = resampleRawChunk(\n",
" stream.read_chunk(),\n",
" stream.CHANNELS,\n",
" stream.RATE,\n",
" 16000,\n",
" mode=\"sinc_best\"\n",
" )\n",
" if isinstance(chunk, bytes):\n",
" wf.writeframes(chunk)\n",
" else:\n",
" raise Exception('Error: chunk is not bytes')\n",
" \n",
" stream.closeStream() \n",
" print('Done')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,124 +0,0 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "6fb12704",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"d:\\Projects\\auto-caption\\caption-engine\\subenv\\Lib\\site-packages\\vosk\\__init__.py\n"
]
}
],
"source": [
"import vosk\n",
"print(vosk.__file__)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "63a06f5c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" 采样设备:\n",
" - 设备类型:音频输入\n",
" - 序号1\n",
" - 名称:麦克风阵列 (Realtek(R) Audio)\n",
" - 最大输入通道数2\n",
" - 默认低输入延迟0.09s\n",
" - 默认高输入延迟0.18s\n",
" - 默认采样率44100.0Hz\n",
" - 是否回环设备False\n",
"\n",
" 音频样本块大小2205\n",
" 样本位宽2\n",
" 采样格式8\n",
" 音频通道数2\n",
" 音频采样率44100\n",
" \n"
]
}
],
"source": [
"import sys\n",
"import os\n",
"import json\n",
"from vosk import Model, KaldiRecognizer\n",
"\n",
"current_dir = os.getcwd() \n",
"sys.path.append(os.path.join(current_dir, '../caption-engine'))\n",
"\n",
"from sysaudio.win import AudioStream\n",
"from audioprcs import resampleRawChunk, mergeChunkChannels\n",
"\n",
"stream = AudioStream(1)\n",
"stream.printInfo()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "5d5a0afa",
"metadata": {},
"outputs": [],
"source": [
"model = Model(os.path.join(\n",
" current_dir,\n",
" '../caption-engine/models/vosk-model-small-cn-0.22'\n",
"))\n",
"recognizer = KaldiRecognizer(model, 16000)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7e9d1530",
"metadata": {},
"outputs": [],
"source": [
"stream.openStream()\n",
"\n",
"for i in range(200):\n",
" chunk = stream.read_chunk()\n",
" chunk_mono = resampleRawChunk(chunk, stream.CHANNELS, stream.RATE, 16000)\n",
" if recognizer.AcceptWaveform(chunk_mono):\n",
" result = json.loads(recognizer.Result())\n",
" print(\"acc:\", result.get(\"text\", \"\"))\n",
" else:\n",
" partial = json.loads(recognizer.PartialResult())\n",
" print(\"else:\", partial.get(\"partial\", \"\"))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "subenv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,3 @@
from dashscope.common.error import InvalidParameter
from .gummy import GummyRecognizer
from .vosk import VoskRecognizer

View File

@@ -6,8 +6,8 @@ from dashscope.audio.asr import (
 )
 import dashscope
 from datetime import datetime
-import json
-import sys
+from utils import stdout_cmd, stdout_obj, stderr

 class Callback(TranslationRecognizerCallback):
     """
@@ -15,17 +15,20 @@ class Callback(TranslationRecognizerCallback):
     """
     def __init__(self):
         super().__init__()
+        self.index = 0
         self.usage = 0
         self.cur_id = -1
         self.time_str = ''

     def on_open(self) -> None:
-        # print("on_open")
-        pass
+        self.usage = 0
+        self.cur_id = -1
+        self.time_str = ''
+        stdout_cmd('info', 'Gummy translator started.')

     def on_close(self) -> None:
-        # print("on_close")
-        pass
+        stdout_cmd('info', 'Gummy translator closed.')
+        stdout_cmd('usage', str(self.usage))

     def on_event(
         self,
@@ -35,17 +38,17 @@ class Callback(TranslationRecognizerCallback):
         usage
     ) -> None:
         caption = {}
         if transcription_result is not None:
-            caption['index'] = transcription_result.sentence_id
-            caption['text'] = transcription_result.text
-            if caption['index'] != self.cur_id:
-                self.cur_id = caption['index']
-                cur_time = datetime.now().strftime('%H:%M:%S.%f')[:-3]
-                caption['time_s'] = cur_time
-                self.time_str = cur_time
-            else:
-                caption['time_s'] = self.time_str
+            if self.cur_id != transcription_result.sentence_id:
+                self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+                self.cur_id = transcription_result.sentence_id
+                self.index += 1
+            caption['command'] = 'caption'
+            caption['index'] = self.index
+            caption['time_s'] = self.time_str
             caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+            caption['text'] = transcription_result.text
         caption['translation'] = ""
         if translation_result is not None:
@@ -55,21 +58,11 @@ class Callback(TranslationRecognizerCallback):
         if usage:
             self.usage += usage['duration']
-        # print(caption)
-        self.send_to_node(caption)
-
-    def send_to_node(self, data):
-        """
-        将数据发送到 Node.js 进程
-        """
-        try:
-            json_data = json.dumps(data) + '\n'
-            sys.stdout.write(json_data)
-            sys.stdout.flush()
-        except Exception as e:
-            print(f"Error sending data to Node.js: {e}", file=sys.stderr)
+        if 'text' in caption:
+            stdout_obj(caption)

-class GummyTranslator:
+class GummyRecognizer:
     """
     使用 Gummy 引擎流式处理的音频数据,并在标准输出中输出与 Auto Caption 软件可读取的 JSON 字符串数据
@@ -77,8 +70,9 @@ class GummyTranslator:
         rate: 音频采样率
         source: 源语言代码字符串(zh, en, ja)
         target: 目标语言代码字符串(zh, en, ja)
+        api_key: 阿里云百炼平台 API KEY
     """
-    def __init__(self, rate, source, target, api_key):
+    def __init__(self, rate: int, source: str, target: str | None, api_key: str | None):
         if api_key:
             dashscope.api_key = api_key
         self.translator = TranslationRecognizerRealtime(
@@ -97,9 +91,12 @@ class GummyTranslator:
         self.translator.start()

     def send_audio_frame(self, data):
-        """发送音频帧"""
+        """发送音频帧,引擎将自动识别并将识别结果输出到标准输出中"""
         self.translator.send_audio_frame(data)

     def stop(self):
         """停止 Gummy 引擎"""
-        self.translator.stop()
+        try:
+            self.translator.stop()
+        except Exception:
+            return
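Each `stdout_obj(caption)` call emits one JSON object per line on stdout, which the host process parses. A minimal consumer-side sketch of that line protocol (field names come from the diff; the values here are invented for illustration):

```python
import json

# One caption line as Callback.on_event would emit it via stdout_obj.
line = json.dumps({
    "command": "caption",
    "index": 1,
    "time_s": "21:44:49.000",   # start of the sentence
    "time_t": "21:44:50.250",   # time of this (partial) update
    "text": "hello world",
    "translation": "你好,世界",
})

# Consumer side: parse line-delimited JSON and dispatch on "command".
msg = json.loads(line)
assert msg["command"] == "caption"
assert msg["index"] == 1
```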

68
engine/audio2text/vosk.py Normal file
View File

@@ -0,0 +1,68 @@
import json
from datetime import datetime
from vosk import Model, KaldiRecognizer, SetLogLevel
from utils import stdout_cmd, stdout_obj
class VoskRecognizer:
"""
使用 Vosk 引擎流式处理的音频数据,并在标准输出中输出与 Auto Caption 软件可读取的 JSON 字符串数据
初始化参数:
model_path: Vosk 识别模型路径
"""
def __init__(self, model_path: str):
SetLogLevel(-1)
if model_path.startswith('"'):
model_path = model_path[1:]
if model_path.endswith('"'):
model_path = model_path[:-1]
self.model_path = model_path
self.time_str = ''
self.cur_id = 0
self.prev_content = ''
self.model = Model(self.model_path)
self.recognizer = KaldiRecognizer(self.model, 16000)
def start(self):
"""启动 Vosk 引擎"""
stdout_cmd('info', 'Vosk recognizer started.')
def send_audio_frame(self, data: bytes):
"""
发送音频帧给 Vosk 引擎,引擎将自动识别并将识别结果输出到标准输出中
Args:
data: 音频帧数据,采样率必须为 16000Hz
"""
caption = {}
caption['command'] = 'caption'
caption['translation'] = ''
if self.recognizer.AcceptWaveform(data):
content = json.loads(self.recognizer.Result()).get('text', '')
caption['index'] = self.cur_id
caption['text'] = content
caption['time_s'] = self.time_str
caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
self.prev_content = ''
self.cur_id += 1
else:
content = json.loads(self.recognizer.PartialResult()).get('partial', '')
if content == '' or content == self.prev_content:
return
if self.prev_content == '':
self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
caption['index'] = self.cur_id
caption['text'] = content
caption['time_s'] = self.time_str
caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
self.prev_content = content
stdout_obj(caption)
def stop(self):
"""停止 Vosk 引擎"""
stdout_cmd('info', 'Vosk recognizer closed.')
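The guard in `send_audio_frame` that suppresses empty and repeated partial results can be isolated as a tiny pure function (a sketch only; running `VoskRecognizer` itself requires a downloaded Vosk model):

```python
def should_emit(partial: str, prev: str) -> bool:
    # Mirrors the partial-result guard above: skip empty partials and
    # partials identical to the previously emitted one.
    return partial != "" and partial != prev

assert should_emit("he", "") is True      # first partial of a sentence
assert should_emit("he", "he") is False   # recognizer repeated itself
assert should_emit("", "he") is False     # nothing recognized yet
```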

102
engine/main.py Normal file
View File

@@ -0,0 +1,102 @@
import argparse
from utils import stdout_cmd, stderr
from utils import thread_data, start_server
from utils import merge_chunk_channels, resample_chunk_mono
from audio2text import InvalidParameter, GummyRecognizer
from audio2text import VoskRecognizer
from sysaudio import AudioStream
def main_gummy(s: str, t: str, a: int, c: int, k: str):
global thread_data
stream = AudioStream(a, c)
if t == 'none':
engine = GummyRecognizer(stream.RATE, s, None, k)
else:
engine = GummyRecognizer(stream.RATE, s, t, k)
stream.open_stream()
engine.start()
restart_count = 0
while thread_data.status == "running":
try:
chunk = stream.read_chunk()
if chunk is None: continue
chunk_mono = merge_chunk_channels(chunk, stream.CHANNELS)
try:
engine.send_audio_frame(chunk_mono)
except InvalidParameter as e:
restart_count += 1
if restart_count > 8:
stderr(str(e))
thread_data.status = "kill"
break
else:
stdout_cmd('info', f'Gummy engine stopped, trying to restart #{restart_count}')
except KeyboardInterrupt:
break
stream.close_stream()
engine.stop()
def main_vosk(a: int, c: int, m: str):
global thread_data
stream = AudioStream(a, c)
engine = VoskRecognizer(m)
stream.open_stream()
engine.start()
while thread_data.status == "running":
try:
chunk = stream.read_chunk()
if chunk is None: continue
chunk_mono = resample_chunk_mono(chunk, stream.CHANNELS, stream.RATE, 16000)
engine.send_audio_frame(chunk_mono)
except KeyboardInterrupt:
break
stream.close_stream()
engine.stop()
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Convert system audio stream to text')
# both
parser.add_argument('-e', '--caption_engine', default='gummy', help='Caption engine: gummy or vosk')
parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output, 1 for input')
parser.add_argument('-c', '--chunk_rate', default=20, help='Number of audio stream chunks collected per second')
parser.add_argument('-p', '--port', default=7070, help='The port to run the server on, 0 for no server')
# gummy
parser.add_argument('-s', '--source_language', default='en', help='Source language code')
parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
# vosk
parser.add_argument('-m', '--model_path', default='', help='The path to the vosk model.')
args = parser.parse_args()
if int(args.port) == 0:
thread_data.status = "running"
else:
start_server(int(args.port))
if args.caption_engine == 'gummy':
main_gummy(
args.source_language,
args.target_language,
int(args.audio_type),
int(args.chunk_rate),
args.api_key
)
elif args.caption_engine == 'vosk':
main_vosk(
int(args.audio_type),
int(args.chunk_rate),
args.model_path
)
else:
raise ValueError('Invalid caption engine specified.')
if thread_data.status == "kill":
stdout_cmd('kill')
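For reference, a host process would spawn this CLI roughly as follows. This is a hedged sketch: the script and model paths are assumptions, but the flags match the argparse definition above, and each stdout line is one JSON message.

```python
import json
import subprocess
import sys

# Assumed paths; a packaged build would point at the PyInstaller executable.
argv = [
    sys.executable, "engine/main.py",
    "-e", "vosk",   # caption engine: gummy or vosk
    "-a", "0",      # 0 = capture system audio output, 1 = microphone
    "-c", "20",     # 20 audio chunks per second
    "-p", "0",      # no control server; stop with Ctrl+C
    "-m", "engine/models/vosk-model-small-cn-0.22",
]

def run_engine():
    # Read line-delimited JSON: caption / info / usage / kill messages.
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
    assert proc.stdout is not None
    for line in proc.stdout:
        msg = json.loads(line)
        if msg.get("command") == "caption":
            print(msg["text"])

assert argv[argv.index("-e") + 1] == "vosk"
```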

View File

@@ -9,7 +9,7 @@ else:
     vosk_path = str(Path('./subenv/lib/python3.12/site-packages/vosk').resolve())

 a = Analysis(
-    ['main-vosk.py'],
+    ['main.py'],
     pathex=[],
     binaries=[],
     datas=[(vosk_path, 'vosk')],
@@ -30,7 +30,7 @@ exe = EXE(
     a.binaries,
     a.datas,
     [],
-    name='main-vosk',
+    name='main',
     debug=False,
     bootloader_ignore_signals=False,
     strip=False,
@@ -43,4 +43,5 @@ exe = EXE(
     target_arch=None,
     codesign_identity=None,
     entitlements_file=None,
+    onefile=True,
 )

View File

@@ -1,7 +1,6 @@
 dashscope
 numpy
 samplerate
-PyAudio
 PyAudioWPatch
 vosk
 pyinstaller

View File

@@ -0,0 +1,10 @@
import sys
if sys.platform == "win32":
from .win import AudioStream
elif sys.platform == "darwin":
from .darwin import AudioStream
elif sys.platform == "linux":
from .linux import AudioStream
else:
raise NotImplementedError(f"Unsupported platform: {sys.platform}")

View File

@@ -1,11 +1,24 @@
 """获取 MacOS 系统音频输入/输出流"""
 import pyaudio
+from textwrap import dedent
+
+
+def get_blackhole_device(mic: pyaudio.PyAudio):
+    """
+    获取 BlackHole 设备
+    """
+    device_count = mic.get_device_count()
+    for i in range(device_count):
+        dev_info = mic.get_device_info_by_index(i)
+        if 'blackhole' in str(dev_info["name"]).lower():
+            return dev_info
+    raise Exception("The device containing BlackHole was not found.")
+

 class AudioStream:
     """
-    获取系统音频流,支持 BlackHole 作为系统音频输出捕获
+    获取系统音频流,如果要捕获输出音频,支持 BlackHole 作为系统音频输出捕获

     初始化参数:
     audio_type: 0-系统音频输出流(需配合 BlackHole),1-系统音频输入流
@@ -15,46 +28,40 @@ class AudioStream:
         self.audio_type = audio_type
         self.mic = pyaudio.PyAudio()
         if self.audio_type == 0:
-            self.device = self.getOutputDeviceInfo()
+            self.device = get_blackhole_device(self.mic)
         else:
             self.device = self.mic.get_default_input_device_info()
+        self.stop_signal = False
         self.stream = None
-        self.SAMP_WIDTH = pyaudio.get_sample_size(pyaudio.paInt16)
+        self.INDEX = self.device["index"]
         self.FORMAT = pyaudio.paInt16
-        self.CHANNELS = self.device["maxInputChannels"]
+        self.SAMP_WIDTH = pyaudio.get_sample_size(self.FORMAT)
+        self.CHANNELS = int(self.device["maxInputChannels"])
         self.RATE = int(self.device["defaultSampleRate"])
         self.CHUNK = self.RATE // chunk_rate
-        self.INDEX = self.device["index"]

-    def getOutputDeviceInfo(self):
-        """查找指定关键词的输入设备"""
-        device_count = self.mic.get_device_count()
-        for i in range(device_count):
-            dev_info = self.mic.get_device_info_by_index(i)
-            if 'blackhole' in dev_info["name"].lower():
-                return dev_info
-        raise Exception("The device containing BlackHole was not found.")
-
-    def printInfo(self):
+    def get_info(self):
         dev_info = f"""
-        采样输入设备:
+        采样设备:
         - 设备类型:{ "音频输出" if self.audio_type == 0 else "音频输入" }
-        - 序号:{self.device['index']}
-        - 名称:{self.device['name']}
+        - 设备序号:{self.device['index']}
+        - 设备名称:{self.device['name']}
         - 最大输入通道数:{self.device['maxInputChannels']}
         - 默认低输入延迟:{self.device['defaultLowInputLatency']}s
         - 默认高输入延迟:{self.device['defaultHighInputLatency']}s
         - 默认采样率:{self.device['defaultSampleRate']}Hz
-        - 是否回环设备:{self.device['isLoopbackDevice']}

-        音频样本块大小:{self.CHUNK}
+        设备序号:{self.INDEX}
+        样本格式:{self.FORMAT}
         样本位宽:{self.SAMP_WIDTH}
-        采样格式:{self.FORMAT}
-        音频通道数:{self.CHANNELS}
-        音频采样率:{self.RATE}
+        样本通道数:{self.CHANNELS}
+        样本采样率:{self.RATE}
+        样本块大小:{self.CHUNK}
         """
-        print(dev_info)
+        return dedent(dev_info).strip()

-    def openStream(self):
+    def open_stream(self):
         """
         打开并返回系统音频输出流
         """
@@ -72,14 +79,24 @@ class AudioStream:
         """
         读取音频数据
         """
+        if self.stop_signal:
+            self.close_stream()
+            return None
         if not self.stream: return None
         return self.stream.read(self.CHUNK, exception_on_overflow=False)

-    def closeStream(self):
+    def close_stream_signal(self):
         """
-        关闭系统音频输入流
+        线程安全的关闭系统音频输入流,不一定会立即关闭
         """
-        if self.stream is None: return
-        self.stream.stop_stream()
-        self.stream.close()
-        self.stream = None
+        self.stop_signal = True
+
+    def close_stream(self):
+        """
+        立即关闭系统音频输入流
+        """
+        if self.stream is not None:
+            self.stream.stop_stream()
+            self.stream.close()
+            self.stream = None
+        self.stop_signal = False

View File

@@ -1,8 +1,10 @@
 """获取 Linux 系统音频输入流"""
 import subprocess
+from textwrap import dedent

-def findMonitorSource():
+
+def find_monitor_source():
     result = subprocess.run(
         ["pactl", "list", "short", "sources"],
         stdout=subprocess.PIPE, text=True
@@ -16,7 +18,8 @@ def findMonitorSource():
     raise RuntimeError("System output monitor device not found")

-def findInputSource():
+
+def find_input_source():
     result = subprocess.run(
         ["pactl", "list", "short", "sources"],
         stdout=subprocess.PIPE, text=True
@@ -28,8 +31,10 @@ def findInputSource():
         name = parts[1]
         if ".monitor" not in name:
             return name
     raise RuntimeError("Microphone input device not found")

+
 class AudioStream:
     """
     获取系统音频流
@@ -42,34 +47,33 @@ class AudioStream:
         self.audio_type = audio_type
         if self.audio_type == 0:
-            self.source = findMonitorSource()
+            self.source = find_monitor_source()
         else:
-            self.source = findInputSource()
+            self.source = find_input_source()
+        self.stop_signal = False
         self.process = None
-        self.SAMP_WIDTH = 2
         self.FORMAT = 16
+        self.SAMP_WIDTH = 2
         self.CHANNELS = 2
         self.RATE = 48000
         self.CHUNK = self.RATE // chunk_rate

-    def printInfo(self):
+    def get_info(self):
         dev_info = f"""
         音频捕获进程:
         - 捕获类型:{"音频输出" if self.audio_type == 0 else "音频输入"}
         - 设备源:{self.source}
-        - 捕获进程PID:{self.process.pid if self.process else "None"}
+        - 捕获进程 PID:{self.process.pid if self.process else "None"}

-        音频样本块大小:{self.CHUNK}
+        样本格式:{self.FORMAT}
         样本位宽:{self.SAMP_WIDTH}
-        采样格式:{self.FORMAT}
-        音频通道数:{self.CHANNELS}
-        音频采样率:{self.RATE}
+        样本通道数:{self.CHANNELS}
+        样本采样率:{self.RATE}
+        样本块大小:{self.CHUNK}
         """
         print(dev_info)

-    def openStream(self):
+    def open_stream(self):
         """
         启动音频捕获进程
         """
@@ -82,13 +86,23 @@ class AudioStream:
         """
         读取音频数据
         """
-        if self.process:
+        if self.stop_signal:
+            self.close_stream()
+            return None
+        if self.process and self.process.stdout:
             return self.process.stdout.read(self.CHUNK)
         return None

-    def closeStream(self):
+    def close_stream_signal(self):
+        """
+        线程安全的关闭系统音频输入流,不一定会立即关闭
+        """
+        self.stop_signal = True
+
+    def close_stream(self):
         """
         关闭系统音频捕获进程
         """
         if self.process:
             self.process.terminate()
+        self.stop_signal = False

View File

@@ -1,14 +1,15 @@
 """获取 Windows 系统音频输入/输出流"""
 import pyaudiowpatch as pyaudio
+from textwrap import dedent

-def getDefaultLoopbackDevice(mic: pyaudio.PyAudio, info = True)->dict:
+
+def get_default_loopback_device(mic: pyaudio.PyAudio, info = True)->dict:
     """
     获取默认的系统音频输出的回环设备

     Args:
-        mic (pyaudio.PyAudio): pyaudio对象
-        info (bool, optional): 是否打印设备信息
+        mic: pyaudio对象
+        info: 是否打印设备信息

     Returns:
         dict: 系统音频输出的回环设备
@@ -51,38 +52,40 @@ class AudioStream:
         self.audio_type = audio_type
         self.mic = pyaudio.PyAudio()
         if self.audio_type == 0:
-            self.device = getDefaultLoopbackDevice(self.mic, False)
+            self.device = get_default_loopback_device(self.mic, False)
         else:
             self.device = self.mic.get_default_input_device_info()
+        self.stop_signal = False
         self.stream = None
-        self.SAMP_WIDTH = pyaudio.get_sample_size(pyaudio.paInt16)
+        self.INDEX = self.device["index"]
         self.FORMAT = pyaudio.paInt16
+        self.SAMP_WIDTH = pyaudio.get_sample_size(self.FORMAT)
         self.CHANNELS = int(self.device["maxInputChannels"])
         self.RATE = int(self.device["defaultSampleRate"])
         self.CHUNK = self.RATE // chunk_rate
-        self.INDEX = self.device["index"]

-    def printInfo(self):
+    def get_info(self):
         dev_info = f"""
         采样设备:
         - 设备类型:{ "音频输出" if self.audio_type == 0 else "音频输入" }
-        - 序号:{self.device['index']}
-        - 名称:{self.device['name']}
+        - 设备序号:{self.device['index']}
+        - 设备名称:{self.device['name']}
         - 最大输入通道数:{self.device['maxInputChannels']}
         - 默认低输入延迟:{self.device['defaultLowInputLatency']}s
         - 默认高输入延迟:{self.device['defaultHighInputLatency']}s
         - 默认采样率:{self.device['defaultSampleRate']}Hz
         - 是否回环设备:{self.device['isLoopbackDevice']}

-        音频样本块大小:{self.CHUNK}
+        设备序号:{self.INDEX}
+        样本格式:{self.FORMAT}
         样本位宽:{self.SAMP_WIDTH}
-        采样格式:{self.FORMAT}
-        音频通道数:{self.CHANNELS}
-        音频采样率:{self.RATE}
+        样本通道数:{self.CHANNELS}
+        样本采样率:{self.RATE}
+        样本块大小:{self.CHUNK}
         """
-        print(dev_info)
+        return dedent(dev_info).strip()

-    def openStream(self):
+    def open_stream(self):
         """
         打开并返回系统音频输出流
         """
@@ -96,18 +99,28 @@ class AudioStream:
         )
         return self.stream

-    def read_chunk(self):
+    def read_chunk(self) -> bytes | None:
         """
         读取音频数据
         """
+        if self.stop_signal:
+            self.close_stream()
+            return None
         if not self.stream: return None
         return self.stream.read(self.CHUNK, exception_on_overflow=False)

-    def closeStream(self):
+    def close_stream_signal(self):
         """
-        关闭系统音频输入流
+        线程安全的关闭系统音频输入流,不一定会立即关闭
         """
-        if self.stream is None: return
-        self.stream.stop_stream()
-        self.stream.close()
-        self.stream = None
+        self.stop_signal = True
+
+    def close_stream(self):
+        """
+        关闭系统音频输入流
+        """
+        if self.stream is not None:
+            self.stream.stop_stream()
+            self.stream.close()
+            self.stream = None
+        self.stop_signal = False
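All three platform classes share the same shutdown handshake: only the capture thread really closes the stream (inside `read_chunk`), while other threads merely raise `stop_signal` via `close_stream_signal`. A stripped-down model of that pattern, with no real audio involved:

```python
import threading
import time

class FakeStream:
    """Minimal model of the stop_signal handshake used by AudioStream."""
    def __init__(self):
        self.stop_signal = False
        self.open = True

    def read_chunk(self):
        if self.stop_signal:
            self.close_stream()
            return None
        time.sleep(0.001)           # stand-in for blocking on real audio
        return b"\x00\x00"

    def close_stream_signal(self):  # thread-safe: just set the flag
        self.stop_signal = True

    def close_stream(self):         # called from the reader thread only
        self.open = False
        self.stop_signal = False

stream = FakeStream()

def reader():
    while stream.read_chunk() is not None:
        pass

t = threading.Thread(target=reader)
t.start()
time.sleep(0.02)
stream.close_stream_signal()        # request shutdown from another thread
t.join(timeout=2)
assert stream.open is False
```

Letting the reader thread perform the actual close avoids tearing down the PortAudio stream while a blocking `read` is in flight.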

4
engine/utils/__init__.py Normal file
View File

@@ -0,0 +1,4 @@
from .audioprcs import merge_chunk_channels, resample_chunk_mono, resample_mono_chunk
from .sysout import stdout, stdout_cmd, stdout_obj, stderr
from .thdata import thread_data
from .server import start_server

View File

@@ -1,17 +1,19 @@
import samplerate import samplerate
import numpy as np import numpy as np
+import numpy.core.multiarray # do not remove

-def mergeChunkChannels(chunk, channels):
+def merge_chunk_channels(chunk: bytes, channels: int) -> bytes:
     """
     Convert the current multi-channel audio chunk into a mono audio chunk
     Args:
-        chunk: (bytes) multi-channel audio chunk
+        chunk: multi-channel audio chunk
         channels: number of channels
     Returns:
-        (bytes) mono audio chunk
+        mono audio chunk
     """
+    if channels == 1: return chunk
     # (length * channels,)
     chunk_np = np.frombuffer(chunk, dtype=np.int16)
     # (length, channels)
@@ -22,44 +24,49 @@ def mergeChunkChannels(chunk, channels):
     return chunk_mono.tobytes()

-def resampleRawChunk(chunk, channels, orig_sr, target_sr, mode="sinc_best"):
+def resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes:
     """
     Convert the current multi-channel audio chunk to mono, then resample it
     Args:
-        chunk: (bytes) multi-channel audio chunk
+        chunk: multi-channel audio chunk
         channels: number of channels
         orig_sr: original sample rate
        target_sr: target sample rate
        mode: resampling mode, one of 'sinc_best' | 'sinc_medium' | 'sinc_fastest' | 'zero_order_hold' | 'linear'
     Return:
-        (bytes) mono audio chunk
+        mono audio chunk
     """
-    # (length * channels,)
-    chunk_np = np.frombuffer(chunk, dtype=np.int16)
-    # (length, channels)
-    chunk_np = chunk_np.reshape(-1, channels)
-    # (length,)
-    chunk_mono_f = np.mean(chunk_np.astype(np.float32), axis=1)
-    chunk_mono = chunk_mono_f.astype(np.int16)
+    if channels == 1:
+        chunk_mono = chunk
+    else:
+        # (length * channels,)
+        chunk_np = np.frombuffer(chunk, dtype=np.int16)
+        # (length, channels)
+        chunk_np = chunk_np.reshape(-1, channels)
+        # (length,)
+        chunk_mono_f = np.mean(chunk_np.astype(np.float32), axis=1)
+        chunk_mono = chunk_mono_f.astype(np.int16)
     ratio = target_sr / orig_sr
     chunk_mono_r = samplerate.resample(chunk_mono, ratio, converter_type=mode)
     chunk_mono_r = np.round(chunk_mono_r).astype(np.int16)
     return chunk_mono_r.tobytes()

-def resampleMonoChunk(chunk, orig_sr, target_sr, mode="sinc_best"):
+def resample_mono_chunk(chunk: bytes, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes:
     """
     Resample the current mono audio chunk
     Args:
-        chunk: (bytes) mono audio chunk
+        chunk: mono audio chunk
         orig_sr: original sample rate
         target_sr: target sample rate
         mode: resampling mode, one of 'sinc_best' | 'sinc_medium' | 'sinc_fastest' | 'zero_order_hold' | 'linear'
     Return:
-        (bytes) mono audio chunk
+        mono audio chunk
     """
     chunk_np = np.frombuffer(chunk, dtype=np.int16)
     ratio = target_sr / orig_sr
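The merge step above averages the interleaved int16 samples of each frame. A dependency-free sketch of the same idea (a hypothetical pure-stdlib helper, not code from this commit; the project's version uses NumPy, and floor division here can differ by one from NumPy's float-mean-then-cast on negative sums):

```python
import array

def merge_chunk_channels_py(chunk: bytes, channels: int) -> bytes:
    """Average interleaved int16 frames down to mono (pure-stdlib sketch)."""
    if channels == 1:
        return chunk
    samples = array.array('h')  # 'h' = signed 16-bit
    samples.frombytes(chunk)
    mono = array.array('h', (
        sum(samples[i:i + channels]) // channels
        for i in range(0, len(samples), channels)
    ))
    return mono.tobytes()

# Two stereo frames: (100, 300) and (-50, 50)
stereo = array.array('h', [100, 300, -50, 50]).tobytes()
print(array.array('h', merge_chunk_channels_py(stereo, 2)).tolist())
```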

engine/utils/server.py (new file)

@@ -0,0 +1,36 @@
import socket
import threading
import json
from utils import thread_data, stdout_cmd, stderr
def handle_client(client_socket):
global thread_data
while thread_data.status == 'running':
try:
data = client_socket.recv(4096).decode('utf-8')
if not data:
break
data = json.loads(data)
if data['command'] == 'stop':
thread_data.status = 'stop'
break
except Exception as e:
stderr(f'Communication error: {e}')
break
thread_data.status = 'stop'
client_socket.close()
def start_server(port: int):
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('localhost', port))
server.listen(1)
stdout_cmd('connect')
client, addr = server.accept()
client_handler = threading.Thread(target=handle_client, args=(client,))
client_handler.daemon = True
client_handler.start()
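The server above expects one JSON object per `send()` on its control socket. A minimal sketch of what the client side sends (hypothetical helpers; port 7070 matches the Electron code later in this commit, while `start_server` binds whatever port it is given):

```python
import json
import socket

def build_command(command: str, content: str = "") -> bytes:
    """Encode one control message: a single JSON object per send()."""
    return json.dumps({"command": command, "content": content}).encode("utf-8")

def send_stop(port: int = 7070) -> None:
    """Connect to the engine's control socket and ask it to stop."""
    with socket.create_connection(("localhost", port)) as sock:
        sock.sendall(build_command("stop"))

print(build_command("stop"))
```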

engine/utils/sysout.py (new file)

@@ -0,0 +1,18 @@
import sys
import json
def stdout(text: str):
stdout_cmd("print", text)
def stdout_cmd(command: str, content = ""):
msg = { "command": command, "content": content }
sys.stdout.write(json.dumps(msg) + "\n")
sys.stdout.flush()
def stdout_obj(obj):
sys.stdout.write(json.dumps(obj) + "\n")
sys.stdout.flush()
def stderr(text: str):
sys.stderr.write(text + "\n")
sys.stderr.flush()
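These helpers frame every engine-to-main message as one JSON object per stdout line. The consuming side therefore splits on newlines and skips blanks; a small sketch of that parse loop (an illustration of the framing, not the project's code):

```python
import json

def parse_engine_lines(raw: str) -> list:
    """Split an stdout chunk into JSON messages, skipping blank lines."""
    messages = []
    for line in raw.split("\n"):
        if line.strip():
            messages.append(json.loads(line))
    return messages

chunk = '{"command": "print", "content": "hi"}\n{"command": "info", "content": "ready"}\n'
print(parse_engine_lines(chunk))
```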

engine/utils/thdata.py (new file)

@@ -0,0 +1,5 @@
class ThreadData:
def __init__(self):
self.status = "running"
thread_data = ThreadData()

package-lock.json (generated)

@@ -1,12 +1,12 @@
 {
   "name": "auto-caption",
-  "version": "0.5.0",
+  "version": "0.5.1",
   "lockfileVersion": 3,
   "requires": true,
   "packages": {
     "": {
       "name": "auto-caption",
-      "version": "0.5.0",
+      "version": "0.5.1",
       "hasInstallScript": true,
       "dependencies": {
         "@electron-toolkit/preload": "^3.0.1",
@@ -22,6 +22,7 @@
         "@electron-toolkit/eslint-config-ts": "^3.0.0",
         "@electron-toolkit/tsconfig": "^1.0.1",
         "@types/node": "^22.14.1",
+        "@types/pidusage": "^2.0.5",
         "@vitejs/plugin-vue": "^5.2.3",
         "electron": "^35.1.5",
         "electron-builder": "^25.1.8",
@@ -2296,6 +2297,13 @@
         "undici-types": "~6.21.0"
       }
     },
+    "node_modules/@types/pidusage": {
+      "version": "2.0.5",
+      "resolved": "https://registry.npmmirror.com/@types/pidusage/-/pidusage-2.0.5.tgz",
+      "integrity": "sha512-MIiyZI4/MK9UGUXWt0jJcCZhVw7YdhBuTOuqP/BjuLDLZ2PmmViMIQgZiWxtaMicQfAz/kMrZ5T7PKxFSkTeUA==",
+      "dev": true,
+      "license": "MIT"
+    },
     "node_modules/@types/plist": {
       "version": "3.0.5",
       "resolved": "https://registry.npmmirror.com/@types/plist/-/plist-3.0.5.tgz",

package.json

@@ -1,7 +1,7 @@
 {
   "name": "auto-caption",
   "productName": "Auto Caption",
-  "version": "0.5.0",
+  "version": "0.5.1",
   "description": "A cross-platform subtitle display software.",
   "main": "./out/main/index.js",
   "author": "himeditator",
@@ -13,7 +13,7 @@
     "typecheck:web": "vue-tsc --noEmit -p tsconfig.web.json --composite false",
     "typecheck": "npm run typecheck:node && npm run typecheck:web",
     "start": "electron-vite preview",
-    "dev": "electron-vite dev",
+    "dev": "chcp 65001 && electron-vite dev",
     "build": "npm run typecheck && electron-vite build",
     "postinstall": "electron-builder install-app-deps",
     "build:unpack": "npm run build && electron-builder --dir",
@@ -35,6 +35,7 @@
     "@electron-toolkit/eslint-config-ts": "^3.0.0",
     "@electron-toolkit/tsconfig": "^1.0.1",
     "@types/node": "^22.14.1",
+    "@types/pidusage": "^2.0.5",
     "@vitejs/plugin-vue": "^5.2.3",
     "electron": "^35.1.5",
     "electron-builder": "^25.1.8",


@@ -2,6 +2,7 @@ import {
   UILanguage, UITheme, Styles, Controls,
   CaptionItem, FullConfig
 } from '../types'
+import { Log } from './Log'
 import { app, BrowserWindow } from 'electron'
 import * as path from 'path'
 import * as fs from 'fs'
@@ -48,6 +49,7 @@ class AllConfig {
   uiTheme: UITheme = 'system';
   styles: Styles = {...defaultStyles};
   controls: Controls = {...defaultControls};
+  lastLogIndex: number = -1;
   captionLog: CaptionItem[] = [];

   constructor() {}
@@ -61,7 +63,7 @@
       if(config.leftBarWidth) this.leftBarWidth = config.leftBarWidth
       if(config.styles) this.setStyles(config.styles)
       if(config.controls) this.setControls(config.controls)
-      console.log('[INFO] Read Config from:', configPath)
+      Log.info('Read Config from:', configPath)
     }
   }
@@ -75,7 +77,7 @@
     }
     const configPath = path.join(app.getPath('userData'), 'config.json')
     fs.writeFileSync(configPath, JSON.stringify(config, null, 2))
-    console.log('[INFO] Write Config to:', configPath)
+    Log.info('Write Config to:', configPath)
   }

   public getFullConfig(): FullConfig {
@@ -96,7 +98,7 @@
         this.styles[key] = args[key]
       }
     }
-    console.log('[INFO] Set Styles:', this.styles)
+    Log.info('Set Styles:', this.styles)
   }

   public resetStyles() {
@@ -105,7 +107,7 @@
   public sendStyles(window: BrowserWindow) {
     window.webContents.send('both.styles.set', this.styles)
-    console.log(`[INFO] Send Styles to #${window.id}:`, this.styles)
+    Log.info(`Send Styles to #${window.id}:`, this.styles)
   }

   public setControls(args: Object) {
@@ -116,27 +118,28 @@
       }
     }
     this.controls.engineEnabled = engineEnabled
-    console.log('[INFO] Set Controls:', this.controls)
+    Log.info('Set Controls:', this.controls)
   }

   public sendControls(window: BrowserWindow) {
     window.webContents.send('control.controls.set', this.controls)
-    console.log(`[INFO] Send Controls to #${window.id}:`, this.controls)
+    Log.info(`Send Controls to #${window.id}:`, this.controls)
   }

   public updateCaptionLog(log: CaptionItem) {
     let command: 'add' | 'upd' = 'add'
     if(
       this.captionLog.length &&
-      this.captionLog[this.captionLog.length - 1].index === log.index &&
-      this.captionLog[this.captionLog.length - 1].time_s === log.time_s
+      this.lastLogIndex === log.index
     ) {
       this.captionLog.splice(this.captionLog.length - 1, 1, log)
       command = 'upd'
     }
     else {
       this.captionLog.push(log)
+      this.lastLogIndex = log.index
     }
-    this.captionLog[this.captionLog.length - 1].index = this.captionLog.length
     for(const window of BrowserWindow.getAllWindows()){
       this.sendCaptionLog(window, command)
     }
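The `lastLogIndex` bookkeeping above implements a simple upsert: when the engine re-emits a caption with the same index (a refinement of an in-progress sentence), the last entry is replaced; otherwise a new entry is appended. A hypothetical Python sketch of the same rule:

```python
def update_caption_log(log_list: list, last_index: int, item: dict):
    """Append a new caption, or replace the last one when the engine
    re-emits the same index. Returns the command that would be sent to
    renderer windows ('add' or 'upd') and the new last index.
    (Illustrative sketch of the logic, not the project's code.)"""
    if log_list and last_index == item["index"]:
        log_list[-1] = item
        return "upd", last_index
    log_list.append(item)
    return "add", item["index"]

logs = []
cmd1, last = update_caption_log(logs, -1, {"index": 0, "text": "hel"})
cmd2, last = update_caption_log(logs, last, {"index": 0, "text": "hello"})
cmd3, last = update_caption_log(logs, last, {"index": 1, "text": "world"})
print(cmd1, cmd2, cmd3, len(logs))
```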


@@ -1,98 +1,78 @@
-import { spawn, exec } from 'child_process'
+import { exec, spawn } from 'child_process'
 import { app } from 'electron'
 import { is } from '@electron-toolkit/utils'
 import path from 'path'
+import net from 'net'
 import { controlWindow } from '../ControlWindow'
 import { allConfig } from './AllConfig'
 import { i18n } from '../i18n'
+import { Log } from './Log'

 export class CaptionEngine {
   appPath: string = ''
   command: string[] = []
   process: any | undefined
-  processStatus: 'running' | 'stopping' | 'stopped' = 'stopped'
+  client: net.Socket | undefined
+  status: 'running' | 'starting' | 'stopping' | 'stopped' = 'stopped'

   private getApp(): boolean {
-    allConfig.controls.customized = false
-    if (allConfig.controls.customized && allConfig.controls.customizedApp) {
+    if (allConfig.controls.customized) {
+      Log.info('Using customized caption engine')
       this.appPath = allConfig.controls.customizedApp
-      this.command = [allConfig.controls.customizedCommand]
-      allConfig.controls.customized = true
+      this.command = allConfig.controls.customizedCommand.split(' ')
     }
-    else if (allConfig.controls.engine === 'gummy') {
-      if(!allConfig.controls.API_KEY && !process.env.DASHSCOPE_API_KEY) {
+    else {
+      if(allConfig.controls.engine === 'gummy' &&
+        !allConfig.controls.API_KEY && !process.env.DASHSCOPE_API_KEY
+      ) {
         controlWindow.sendErrorMessage(i18n('gummy.key.missing'))
         return false
       }
-      let gummyName = 'main-gummy'
-      if (process.platform === 'win32') {
-        gummyName += '.exe'
-      }
+      this.command = []
       if (is.dev) {
         this.appPath = path.join(
-          app.getAppPath(),
-          'caption-engine', 'dist', gummyName
+          app.getAppPath(), 'engine',
+          'subenv', 'Scripts', 'python.exe'
         )
+        this.command.push(path.join(
+          app.getAppPath(), 'engine', 'main.py'
+        ))
+        // this.appPath = path.join(app.getAppPath(), 'engine', 'dist', 'main.exe')
       }
       else {
-        this.appPath = path.join(
-          process.resourcesPath, 'caption-engine', gummyName
-        )
+        this.appPath = path.join(process.resourcesPath, 'engine', 'main.exe')
       }
-      this.command = []
-      this.command.push('-s', allConfig.controls.sourceLang)
-      this.command.push(
-        '-t', allConfig.controls.translation ?
-        allConfig.controls.targetLang : 'none'
-      )
-      this.command.push('-a', allConfig.controls.audio ? '1' : '0')
-      if(allConfig.controls.API_KEY) {
-        this.command.push('-k', allConfig.controls.API_KEY)
+      if(allConfig.controls.engine === 'gummy') {
+        this.command.push('-e', 'gummy')
+        this.command.push('-s', allConfig.controls.sourceLang)
+        this.command.push(
+          '-t', allConfig.controls.translation ?
+          allConfig.controls.targetLang : 'none'
+        )
+        this.command.push('-a', allConfig.controls.audio ? '1' : '0')
+        if(allConfig.controls.API_KEY) {
+          this.command.push('-k', allConfig.controls.API_KEY)
+        }
+      }
+      else if(allConfig.controls.engine === 'vosk'){
+        this.command.push('-e', 'vosk')
+        this.command.push('-a', allConfig.controls.audio ? '1' : '0')
+        this.command.push('-m', `"${allConfig.controls.modelPath}"`)
       }
     }
-    else if(allConfig.controls.engine === 'vosk'){
-      let voskName = 'main-vosk'
-      if (process.platform === 'win32') {
-        voskName += '.exe'
-      }
-      if (is.dev) {
-        this.appPath = path.join(
-          app.getAppPath(),
-          'caption-engine', 'dist', voskName
-        )
-      }
-      else {
-        this.appPath = path.join(
-          process.resourcesPath, 'caption-engine', voskName
-        )
-      }
-      this.command = []
-      this.command.push('-a', allConfig.controls.audio ? '1' : '0')
-      this.command.push('-m', `"${allConfig.controls.modelPath}"`)
-    }
-    console.log('[INFO] Engine Path:', this.appPath)
-    console.log('[INFO] Engine Command:', this.command)
+    Log.info('Engine Path:', this.appPath)
+    Log.info('Engine Command:', this.command)
     return true
   }

-  public start() {
-    if (this.processStatus !== 'stopped') {
-      return
-    }
-    if(!this.getApp()){ return }
-    try {
-      this.process = spawn(this.appPath, this.command)
-    }
-    catch (e) {
-      controlWindow.sendErrorMessage(i18n('engine.start.error') + e)
-      console.error('[ERROR] Error starting subprocess:', e)
-      return
-    }
-    this.processStatus = 'running'
-    console.log('[INFO] Caption Engine Started, PID:', this.process.pid)
+  public connect() {
+    if(this.client) { Log.warn('Client already exists, ignoring...') }
+    Log.info('Connecting to caption engine server...');
+    this.client = net.createConnection({ port: 7070 }, () => {
+      Log.info('Connected to caption engine server');
+    });
+    this.status = 'running'
     allConfig.controls.engineEnabled = true
     if(controlWindow.window){
       allConfig.sendControls(controlWindow.window)
@@ -101,72 +81,126 @@ export class CaptionEngine {
         this.process.pid
       )
     }
+  }
+
+  public sendCommand(command: string, content: string = "") {
+    if(this.client === undefined) {
+      Log.error('Client not initialized yet')
+      return
+    }
+    const data = JSON.stringify({command, content})
+    this.client.write(data);
+    Log.info(`Send data to python server: ${data}`);
+  }
+
+  public start() {
+    if (this.status !== 'stopped') {
+      Log.warn('Caption engine is not stopped, current status:', this.status)
+      return
+    }
+    if(!this.getApp()){ return }
+    this.process = spawn(this.appPath, this.command)
+    this.status = 'starting'
+    Log.info('Caption Engine Starting, PID:', this.process.pid)
     this.process.stdout.on('data', (data: any) => {
-      const lines = data.toString().split('\n');
+      const lines = data.toString().split('\n')
       lines.forEach((line: string) => {
         if (line.trim()) {
           try {
-            const caption = JSON.parse(line);
-            if(caption.index === undefined) {
-              console.log('[INFO] Engine Bad Output:', caption);
-            }
-            else allConfig.updateCaptionLog(caption);
+            const data_obj = JSON.parse(line)
+            handleEngineData(data_obj)
           } catch (e) {
             controlWindow.sendErrorMessage(i18n('engine.output.parse.error') + e)
-            console.error('[ERROR] Error parsing JSON:', e);
+            Log.error('Error parsing JSON:', e)
           }
         }
       });
     });
-    this.process.stderr.on('data', (data) => {
-      if(this.processStatus === 'stopping') return
-      controlWindow.sendErrorMessage(i18n('engine.error') + data)
-      console.error(`[ERROR] Subprocess Error: ${data}`);
+    this.process.stderr.on('data', (data: any) => {
+      const lines = data.toString().split('\n')
+      lines.forEach((line: string) => {
+        if(line.trim()){
+          controlWindow.sendErrorMessage(/*i18n('engine.error') +*/ line)
+          console.error(line)
+        }
+      })
     });
     this.process.on('close', (code: any) => {
-      console.log(`[INFO] Subprocess exited with code ${code}`);
       this.process = undefined;
+      this.client = undefined
       allConfig.controls.engineEnabled = false
       if(controlWindow.window){
         allConfig.sendControls(controlWindow.window)
         controlWindow.window.webContents.send('control.engine.stopped')
       }
-      this.processStatus = 'stopped'
-      console.log('[INFO] Caption engine process stopped')
+      this.status = 'stopped'
+      Log.info(`Engine exited with code ${code}`)
     });
   }

   public stop() {
-    if(this.processStatus !== 'running') return
+    if(this.status !== 'running'){
+      Log.warn('Engine is not running, current status:', this.status)
+      return
+    }
+    this.sendCommand('stop')
+    if(this.client){
+      this.client.destroy()
+      this.client = undefined
+    }
+    this.status = 'stopping'
+    Log.info('Caption engine process stopping...')
+  }
+
+  public kill(){
+    if(this.status !== 'running'){
+      Log.warn('Engine is not running, current status:', this.status)
+      return
+    }
     if (this.process.pid) {
-      console.log('[INFO] Trying to stop process, PID:', this.process.pid)
+      Log.warn('Trying to kill engine process, PID:', this.process.pid)
+      if(this.client){
+        this.client.destroy()
+        this.client = undefined
+      }
       let cmd = `kill ${this.process.pid}`;
       if (process.platform === "win32") {
         cmd = `taskkill /pid ${this.process.pid} /t /f`
       }
-      exec(cmd, (error) => {
-        if (error) {
-          controlWindow.sendErrorMessage(i18n('engine.shutdown.error') + error)
-          console.error(`[ERROR] Failed to kill process: ${error}`)
-        }
-      })
+      exec(cmd)
     }
-    else {
-      this.process = undefined;
-      allConfig.controls.engineEnabled = false
-      if(controlWindow.window){
-        allConfig.sendControls(controlWindow.window)
-        controlWindow.window.webContents.send('control.engine.stopped')
-      }
-      this.processStatus = 'stopped'
-      console.log('[INFO] Process PID undefined, caption engine process stopped')
-      return
-    }
-    this.processStatus = 'stopping'
-    console.log('[INFO] Caption engine process stopping')
+    this.status = 'stopping'
   }
 }
+
+function handleEngineData(data: any) {
+  if(data.command === 'connect'){
+    captionEngine.connect()
+  }
+  else if(data.command === 'kill') {
+    if(captionEngine.status !== 'stopped') {
+      Log.warn('Error occurred, trying to kill Gummy engine...')
+      captionEngine.kill()
+    }
+  }
+  else if(data.command === 'caption') {
+    allConfig.updateCaptionLog(data);
+  }
+  else if(data.command === 'print') {
+    Log.info('Engine Print:', data.content)
+  }
+  else if(data.command === 'info') {
+    Log.info('Engine Info:', data.content)
+  }
+  else if(data.command === 'usage') {
+    Log.info('Gummy Engine Usage: ', data.content)
+  }
+  else {
+    Log.warn('Unknown command:', data)
+  }
+}
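The stdout protocol handled here is a small command set (`connect`, `kill`, `caption`, `print`, `info`, `usage`). A minimal Python sketch of the same dispatch shape, using a hypothetical callback table (simplified: the real handler passes the whole caption object through, not just `content`):

```python
def handle_engine_data(data: dict, actions: dict) -> str:
    """Dispatch one engine message to a callback keyed by its command.
    Returns the matched command name, or 'unknown'."""
    command = data.get("command")
    if command in actions:
        actions[command](data.get("content"))
        return command
    return "unknown"

seen = []
actions = {
    "caption": lambda c: seen.append(("caption", c)),
    "print":   lambda c: seen.append(("print", c)),
}
print(handle_engine_data({"command": "print", "content": "hi"}, actions))
print(handle_engine_data({"command": "usage", "content": {}}, actions))
```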

src/main/utils/Log.ts (new file)

@@ -0,0 +1,22 @@
function getTimeString() {
const now = new Date()
const HH = String(now.getHours()).padStart(2, '0')
const MM = String(now.getMinutes()).padStart(2, '0')
const SS = String(now.getSeconds()).padStart(2, '0')
const MS = String(now.getMilliseconds()).padStart(3, '0')
return `${HH}:${MM}:${SS}.${MS}`
}
export class Log {
static info(...msg: any[]){
console.log(`[INFO ${getTimeString()}]`, ...msg)
}
static warn(...msg: any[]){
console.warn(`[WARN ${getTimeString()}]`, ...msg)
}
static error(...msg: any[]){
console.error(`[ERROR ${getTimeString()}]`, ...msg)
}
}
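`getTimeString()` zero-pads each field so every log line carries a fixed-width, millisecond-precision timestamp (`HH:MM:SS.mmm`). The same format in Python, for comparison (an illustrative sketch, not part of this commit):

```python
from datetime import datetime

def time_string(now: datetime) -> str:
    """Millisecond-precision timestamp in the same HH:MM:SS.mmm shape
    as getTimeString() above."""
    return now.strftime("%H:%M:%S.") + f"{now.microsecond // 1000:03d}"

print(time_string(datetime(2025, 7, 28, 21, 44, 49, 123456)))
```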


@@ -136,6 +136,7 @@ import { useCaptionLogStore } from '@renderer/stores/captionLog'
 import { message } from 'ant-design-vue'
 import { useI18n } from 'vue-i18n'
 import * as tc from '../utils/timeCalc'
+import { CaptionItem } from '../types'

 const { t } = useI18n()
@@ -154,10 +155,9 @@ const baseMS = ref<number>(0)
 const pagination = ref({
   current: 1,
-  pageSize: 10,
+  pageSize: 20,
   showSizeChanger: true,
-  pageSizeOptions: ['10', '20', '50'],
-  showTotal: (total: number) => `Total: ${total}`,
+  pageSizeOptions: ['10', '20', '50', '100'],
   onChange: (page: number, pageSize: number) => {
     pagination.value.current = page
     pagination.value.pageSize = pageSize
@@ -174,12 +174,23 @@ const columns = [
     dataIndex: 'index',
     key: 'index',
     width: 80,
+    sorter: (a: CaptionItem, b: CaptionItem) => {
+      if(a.index <= b.index) return -1
+      return 1
+    },
+    sortDirections: ['descend'],
+    defaultSortOrder: 'descend',
   },
   {
     title: 'time',
     dataIndex: 'time',
     key: 'time',
     width: 160,
+    sorter: (a: CaptionItem, b: CaptionItem) => {
+      if(a.time_s <= b.time_s) return -1
+      return 1
+    },
+    sortDirections: ['descend', 'ascend'],
   },
   {
     title: 'content',


@@ -37,7 +37,7 @@
         <a-input
           class="input-area"
           type="range"
-          min="0" max="64"
+          min="0" max="72"
           v-model:value="currentFontSize"
         />
         <div class="input-item-value">{{ currentFontSize }}px</div>
@@ -114,7 +114,7 @@
         <a-input
           class="input-area"
           type="range"
-          min="0" max="64"
+          min="0" max="72"
           v-model:value="currentTransFontSize"
         />
         <div class="input-item-value">{{ currentTransFontSize }}px</div>
@@ -159,7 +159,7 @@
         <a-input
           class="input-area"
           type="range"
-          min="0" max="10"
+          min="0" max="12"
           v-model:value="currentBlur"
         />
         <div class="input-item-value">{{ currentBlur }}px</div>
@@ -282,7 +282,8 @@ function applyStyle(){
   captionStyle.sendStylesChange();
   notification.open({
+    placement: 'topLeft',
     message: t('noti.styleChange'),
     description: t('noti.styleInfo')
   });


@@ -164,6 +164,7 @@ function applyChange(){
   engineControl.sendControlsChange()
   notification.open({
+    placement: 'topLeft',
     message: t('noti.engineChange'),
     description: t('noti.changeInfo')
   });


@@ -4,10 +4,10 @@
   <a-col :span="6">
     <a-statistic
       :title="$t('status.engine')"
-      :value="(customized && customizedApp)?$t('status.customized'):engine"
+      :value="customized?$t('status.customized'):engine"
     />
   </a-col>
   <a-popover :title="$t('status.engineStatus')">
     <template #content>
       <a-row class="engine-status">
         <a-col :flex="1" :title="$t('status.pid')" style="cursor:pointer;">
@@ -41,8 +41,8 @@
           <InfoCircleOutlined style="font-size:18px;color:#1677ff"/>
         </template>
       </a-statistic>
     </a-col>
   </a-popover>
   <a-col :span="6">
     <a-statistic :title="$t('status.logNumber')" :value="captionData.length" />
   </a-col>
@@ -61,12 +61,14 @@
     >{{ $t('status.openCaption') }}</a-button>
     <a-button
       class="control-button"
-      :disabled="engineEnabled"
+      :loading="pending && !engineEnabled"
+      :disabled="pending || engineEnabled"
       @click="startEngine"
     >{{ $t('status.startEngine') }}</a-button>
     <a-button
       danger class="control-button"
-      :disabled="!engineEnabled"
+      :loading="pending && engineEnabled"
+      :disabled="pending || !engineEnabled"
       @click="stopEngine"
     >{{ $t('status.stopEngine') }}</a-button>
   </div>
@@ -77,7 +79,7 @@
   <p class="about-desc">{{ $t('status.about.desc') }}</p>
   <a-divider />
   <div class="about-info">
-    <p><b>{{ $t('status.about.version') }}</b><a-tag color="green">v0.5.0</a-tag></p>
+    <p><b>{{ $t('status.about.version') }}</b><a-tag color="green">v0.5.1</a-tag></p>
     <p>
       <b>{{ $t('status.about.author') }}</b>
       <a
@@ -119,18 +121,19 @@
 <script setup lang="ts">
 import { EngineInfo } from '@renderer/types'
-import { ref } from 'vue'
+import { ref, watch } from 'vue'
 import { storeToRefs } from 'pinia'
 import { useCaptionLogStore } from '@renderer/stores/captionLog'
 import { useEngineControlStore } from '@renderer/stores/engineControl'
 import { GithubOutlined, InfoCircleOutlined } from '@ant-design/icons-vue';

 const showAbout = ref(false)
+const pending = ref(false)
 const captionLog = useCaptionLogStore()
 const { captionData } = storeToRefs(captionLog)
 const engineControl = useEngineControlStore()
-const { engineEnabled, engine, customized, customizedApp } = storeToRefs(engineControl)
+const { engineEnabled, engine, customized } = storeToRefs(engineControl)

 const pid = ref(0)
 const ppid = ref(0)
@@ -143,6 +146,7 @@ function openCaptionWindow() {
 }

 function startEngine() {
+  pending.value = true
   if(engineControl.engine === 'vosk' && engineControl.modelPath.trim() === '') {
     engineControl.emptyModelPathErr()
     return
@@ -151,6 +155,7 @@ function startEngine() {
 }

 function stopEngine() {
+  pending.value = true
   window.electron.ipcRenderer.send('control.engine.stop')
 }
@@ -164,6 +169,9 @@ function getEngineInfo() {
   })
 }

+watch(engineEnabled, () => {
+  pending.value = false
+})
 </script>

 <style scoped>


@@ -93,7 +93,7 @@ export default {
       "engine": "Caption Engine",
       "engineStatus": "Caption Engine Status",
       "pid": "Process ID",
       "ppid": "Parent Process ID",
       "cpu": "CPU Usage",
       "mem": "Memory Usage",
       "elapsed": "Running Time",
@@ -116,7 +116,7 @@
       "projLink": "Project Link",
       "manual": "User Manual",
       "engineDoc": "Caption Engine Manual",
-      "date": "July 15, 2025"
+      "date": "July 17, 2025"
     }
   },
   log: {


@@ -94,7 +94,7 @@ export default {
       "engineStatus": "字幕エンジンの状態",
       "pid": "プロセス ID",
       "ppid": "親プロセス ID",
       "cpu": "CPU 使用率",
       "mem": "メモリ使用量",
       "elapsed": "稼働時間",
       "customized": "カスタマイズ済み",
@@ -116,7 +116,7 @@
       "projLink": "プロジェクトリンク",
       "manual": "ユーザーマニュアル",
       "engineDoc": "字幕エンジンマニュアル",
-      "date": "2025 年 7 月 15 日"
+      "date": "2025 年 7 月 17 日"
     }
   },
   log: {
log: { log: {

View File

@@ -116,7 +116,7 @@ export default {
       "projLink": "项目链接",
       "manual": "用户手册",
       "engineDoc": "字幕引擎手册",
-      "date": "2025 年 7 月 15 日"
+      "date": "2025 年 7 月 17 日"
     }
   },
   log: {


@@ -64,6 +64,7 @@ export const useEngineControlStore = defineStore('engineControl', () => {
   function emptyModelPathErr() {
     notification.open({
+      placement: 'topLeft',
       message: t('noti.empty'),
       description: t('noti.emptyInfo')
     });
@@ -80,15 +81,17 @@
       (translation.value ? `${t('noti.tLang')}${targetLang.value}` : '');
     const str1 = `${t('noti.custom')}${customizedApp.value}${t('noti.args')}${customizedCommand.value}`;
     notification.open({
+      placement: 'topLeft',
       message: t('noti.started'),
       description:
-        ((customized.value && customizedApp.value) ? str1 : str0) +
+        (customized.value ? str1 : str0) +
         `${t('noti.pidInfo')}${args}`
     });
   })
   window.electron.ipcRenderer.on('control.engine.stopped', () => {
     notification.open({
+      placement: 'topLeft',
       message: t('noti.stopped'),
       description: t('noti.stoppedInfo')
     });
@@ -99,7 +102,6 @@
       message: t('noti.error'),
       description: message,
       duration: null,
-      placement: 'topLeft',
       icon: () => h(ExclamationCircleOutlined, { style: 'color: #ff4d4f' })
     });
   })
}) })