52 Commits

Author SHA1 Message Date
himeditator
36636d0caa feat(engine): add caption window width memory and improve caption engine shutdown logic
- Add a captionWindowWidth property to persist the caption window width
- Rework the stop and kill methods in CaptionEngine to improve engine shutdown
- Update the README, adding the list of planned candidate models
2025-08-02 15:40:13 +08:00
himeditator mac
a7a60da260 fix(engine): adapt caption engine startup paths and the audio resampling function 2025-07-30 00:16:54 +08:00
himeditator
1b7ff33656 feat(docs): update project documentation and images 2025-07-29 23:20:15 +08:00
himeditator mac
d5d692188e feat(engine): improve the caption engine and program robustness
- Improve the server startup flow, adding exception handling
- Generate the WebSocket port numbers of the main program and the caption engine randomly
2025-07-29 19:37:03 +08:00
himeditator
e4f937e6b6 feat(engine): improve caption engine communication and control logic, improve window info display
- Improve error handling and engine restart logic
- Add force-kill support for the caption engine
- Adjust where notifications and error messages are displayed
- Increase log timestamp precision to milliseconds
2025-07-28 21:44:49 +08:00
himeditator
cd9f3a847d feat(engine): refactor the caption engine and implement WebSocket communication
- Refactored the Gummy and Vosk caption engine code for better extensibility and readability
- Merged the Gummy and Vosk engines into a single executable
- Implemented WebSocket communication between the caption engine and the main program, avoiding orphan processes
2025-07-28 15:49:52 +08:00
himeditator
b658ef5440 feat(engine): improve the caption engine output format, prepare to merge the two caption engines
- Refactor caption-engine-related code
- Prepare to merge the two caption engines
2025-07-27 17:15:12 +08:00
himeditator
3792eb88b6 refactor(engine): refactor the caption engine
- Update the GummyTranslator class and improve caption generation logic
- Remove the audioprcs module, moving audio processing into the utils module
- Refactor the sysaudio module for more flexible and stable audio stream management
- Update TODO.md: sorting caption records by time in descending order is done
- Update the docs, noting that the English and Japanese docs will no longer be maintained due to limited resources
2025-07-26 23:37:24 +08:00
himeditator
8e575a9ba3 refactor(engine): rename the caption engine folder, add descending order for caption records
- The caption record table can be sorted by time in descending order
- Rename caption-engine to engine
- Update the affected file and folder paths
- Update the related content in the README and TODO docs
- Update the Electron build configuration
2025-07-26 21:29:16 +08:00
himeditator
697488ce84 docs: update README, add TODO 2025-07-20 00:32:57 +08:00
himeditator
f7d2df938d fix(engine): fix issues with custom caption engines 2025-07-17 20:52:27 +08:00
himeditator
5513c7e84c docs(compatibility): add Kylin OS support, update docs 2025-07-16 20:55:03 +08:00
himeditator
25b6ad5ed2 release v0.5.0
- Updated the release notes and user manual
- Improved the UI display and features
- Filter incomplete captions emitted by the Gummy caption engine
2025-07-15 18:48:16 +08:00
himeditator mac
760c01d79e feat(engine): add caption engine resource usage monitoring
- Show engine status in the control window, including PID, PPID, CPU usage, memory usage, and uptime
- Improve caption record export and copy, supporting selectable export content types
2025-07-15 13:52:10 +08:00
himeditator
a0a0a2e66d feat(caption): adjust the caption window, add caption timeline editing (#8)
- Add the ability to edit caption timestamps
- Add caption record export types, supporting srt and json formats
- Arrange the icons in the top-right corner of the caption window vertically
2025-07-14 20:07:22 +08:00
himeditator
665c47d24f feat(linux): support Linux system audio output
- Added support for Linux system audio output
- Updated the platform compatibility information in the README and user manual
- Modified the AudioStream class to support the Linux platform
2025-07-13 23:28:40 +08:00
himeditator
7f8766b13e docs(engine-manual): update the caption engine development docs
- Added a detailed description of the command-line arguments
- Added steps for packaging and running the caption engine
- Fixed some errors and typos in the docs
2025-07-11 13:25:52 +08:00
himeditator
6920957152 Merge branch 'dev-v0.4.0-vosk' 2025-07-11 02:32:33 +08:00
himeditator
604f8becc9 fix: add build instructions, fix the vosk prompt logic
- Improve the engine startup logic in the EngineStatus component, adding a check for the vosk engine
- Add macOS screenshots to README.md, README_en.md, and README_ja.md
2025-07-11 02:31:10 +08:00
Chen Janai
0af5bab75d Merge pull request #7 from HiMeditator/dev-v0.4.0-vosk
Release v0.4.0 with Vosk Caption Engine
2025-07-11 01:36:08 +08:00
himeditator
0b8b823b2e release v0.4.0
- Update the README and user manual, adding usage instructions for the Vosk engine
- Modify the build configuration to package the Vosk engine
- Bump the version to 0.4.0 in preparation for the new feature release
2025-07-11 01:33:04 +08:00
himeditator
d354a6fefa feat(engine): improve Vosk caption engine support
- Implement folder selection for choosing the Vosk model path
- Add a model path selection button and related hints to the EngineControl component
- Add an empty-model-path check and error message to the EngineStatus component
2025-07-10 11:22:39 +08:00
himeditator
1c29fd5adc feat(engine): add Vosk local offline engine support
- Add Vosk engine configuration and recognition logic
- Update the UI, adding a Vosk engine option and a model path setting
- Update dependencies, adding the vosk library
2025-07-09 19:53:30 +08:00
himeditator
f97b885411 release v0.3.0
- Update the visitor badge page_id in the README to the correct project path
- Modify the extraResources configuration in electron-builder.yml
2025-07-09 02:34:15 +08:00
himeditator
606f9b480b release v0.3.0
- Add caption font weight, text shadow, and other style settings
- Update the related docs, describing the new features
- Fix a bug when loading colors from the system theme
2025-07-09 01:33:21 +08:00
Chen Janai
546beb3112 Merge pull request #6 from HiMeditator/mac-adaption
Mac Adaption
2025-07-08 22:46:51 +08:00
himeditator mac
3c9138f115 feat(docs): update docs, add a macOS platform adaptation guide 2025-07-08 22:44:11 +08:00
himeditator mac
cbbaaa95a3 feat(gummy): support setting the API KEY in the app settings
- Update main-gummy.py to accept an API KEY argument
- Modify electron-builder.yml to adjust the Gummy executable path
2025-07-08 21:05:43 +08:00
himeditator mac
7e953db6bd feat(sysaudio): support macOS system audio stream capture
- Add darwin.py, implementing macOS audio stream capture
- Modify main-gummy.py to support the macOS platform
- Update AllConfig and CaptionEngine for the new platform
2025-07-08 17:04:15 +08:00
himeditator mac
65da30f83d build: adapt to macOS, update icon assets and bump the project version
- Remove the old icon assets, replacing them with new icons
- Bump the project version to 0.2.1
- Update the environment setup instructions in the README, adding macOS support
2025-07-08 13:27:44 +08:00
himeditator
1965bbfee7 feat(docs): fix a bug when copying only the original text, update TODO.md 2025-07-08 01:44:38 +08:00
himeditator
8ac1c99c63 feat(log): add caption record copying (#3)
- Increase record time precision to milliseconds
- Add copy-to-clipboard support in the caption record component
- Offer multiple copy options, including whether to add numbering, whether to copy timestamps, and which content to copy
2025-07-08 01:33:48 +08:00
himeditator
082eb8579b docs(README): update the built-in caption engine description (#4)
- Added detailed descriptions of the built-in caption engines to README.md, README_en.md, and README_ja.md
- Give the caption window higher always-on-top priority
2025-07-07 22:54:30 +08:00
himeditator
0696651f04 feat(audio): refactor the audio processing module; audio stream resampling tested successfully 2025-07-07 22:54:30 +08:00
himeditator
f2aa075e65 refactor(caption-engine): restructure the caption engine code
- Refactor the GummyTranslator class, adding start and stop methods
- Improve the AudioStream class, adding a method for reading audio data
- Update main-gummy.py to use the new GummyTranslator and AudioStream interfaces
- Update the docs and the TODO list
2025-07-07 22:54:30 +08:00
himeditator
213426dace release v0.2.0
- Update and add documentation
- Add new images
- Improve documentation structure and content
2025-07-07 22:54:30 +08:00
himeditator
50ea9c5e4c refactor(caption): restructure the caption engine, fix an idle-engine error (#2)
- Fix the error thrown by the gummy caption engine after being idle for a long time
- Rename the python-subprocess folder to caption-engine
- Remove unused prototype code
2025-07-07 22:53:35 +08:00
himeditator
22cfb75d2c feat(renderer): add long-caption hiding (#1)
- Fix the display colors of some dark-theme content
- Add the ability to hide long caption content
- Improve the caption style preview, dynamically showing the latest caption content
2025-07-07 22:52:49 +08:00
himeditator
f29e15cde5 feat(theme): add dark theme support
- Add a dark theme option and automatic system theme adaptation
- Adjust some styles for the dark theme
2025-07-05 00:54:12 +08:00
himeditator
14e7a7bce4 feat: fully implement multi-language support, improve the user experience
- Finish translating the remaining multi-language content
- Refactor configuration management; the frontend now loads configuration faster
- Add stricter state constraints to the caption engine to prevent zombie processes
2025-07-04 22:27:43 +08:00
himeditator
0b279dedbf docs(api): change some communication interfaces, update the API docs
- Redefine the naming rules and semantics of communication commands
- Change several communication interfaces between the frontend and the backend
- Add internationalization for model information
2025-07-04 18:38:56 +08:00
himeditator
0a10068b38 feat(i18n): implement frontend internationalization
- Add English, Japanese, and Chinese translation files
- Add a language switcher
- Update the text content of components to support internationalization
2025-07-03 23:29:10 +08:00
himeditator
d608bf59c7 feat(i18n): add backend internationalization support, improve the frontend UI
- Add and implement internationalization support in the backend
- Introduce the vue-i18n module on the frontend (i18n logic not yet wired up)
- Improve UI styles, unifying input and label styles
2025-07-03 20:36:09 +08:00
himeditator
3dcba07b6e refactor(renderer): refactor the project frontend
- Split the CaptionData and ControlPage components
- Renamed some pages and variables
- Refactored and improved state management, adding new stores
2025-07-02 20:56:21 +08:00
himeditator
e77779b72a refactor: refactor the project backend
- Remove the registry mirror configuration from .npmrc
- Remove unused dependencies from package.json
- Heavily refactor the backend code
2025-07-01 21:50:33 +08:00
himeditator
e30124cb87 fix: fix style loading issues, tweak the docs 2025-06-26 23:04:39 +08:00
himeditator
301c691f04 feat(ControlPage): add project information display 2025-06-26 21:53:42 +08:00
himeditator
4ff1346b6d docs: update the user manual and caption engine doc links 2025-06-26 21:33:45 +08:00
himeditator
b28799b03f feat: add config saving and loading, add docs
- Add configuration data file saving and loading
- Add an option to reset caption styles to defaults
- Add the user manual
- Add the caption engine documentation
2025-06-26 21:29:06 +08:00
himeditator
147e328d8c refactor(main): refactor caption engine startup and error handling logic
- Changed the caption engine startup conditions, adding checks for environment variables and custom applications
- Improved the error handling mechanism, sending error messages through the control window
- Added error notifications to the frontend
2025-06-26 18:59:53 +08:00
himeditator
c086725d98 feat(renderer): tweak some page hints, add an English README 2025-06-23 20:23:03 +08:00
Chen Janai
fae8b32edf Update README.md 2025-06-22 16:21:38 +08:00
109 changed files with 6102 additions and 2410 deletions

.editorconfig

@@ -6,4 +6,10 @@ indent_style = space
indent_size = 2
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
[*.py]
indent_size = 4
[*.ipynb]
indent_size = 4

.gitignore

@@ -5,5 +5,8 @@ out
.eslintcache
*.log*
__pycache__
.venv
subenv
python-subprocess/build
engine/build
engine/models
engine/notebook

.vscode/settings.json

@@ -7,5 +7,8 @@
},
"[json]": {
"editor.defaultFormatter": "esbenp.prettier-vscode"
}
},
"python.analysis.extraPaths": [
"./engine"
]
}

README.md

@@ -1,37 +1,123 @@
<div align="center" >
<img src="./resources/icon.png" width="100px" height="100px"/>
<img src="./build/icon.png" width="100px" height="100px"/>
<h1 align="center">auto-caption</h1>
<p>Auto Caption 是一个跨平台的视频播放和字幕显示软件。</p>
<b>项目初版已经开发完毕。</b>
<p>Auto Caption 是一个跨平台的实时字幕显示软件。</p>
<p>
<a href="https://github.com/HiMeditator/auto-caption/releases"><img src="https://img.shields.io/badge/release-0.6.0-blue"></a>
<a href="https://github.com/HiMeditator/auto-caption/issues"><img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange"></a>
<img src="https://img.shields.io/github/languages/top/HiMeditator/auto-caption?color=royalblue">
<img src="https://img.shields.io/github/repo-size/HiMeditator/auto-caption?color=green">
<img src="https://img.shields.io/github/stars/HiMeditator/auto-caption?style=social">
</p>
<p>
| <b>简体中文</b>
| <a href="./README_en.md">English</a>
| <a href="./README_ja.md">日本語</a> |
</p>
<p><i>v0.6.0 版本已经发布,对字幕引擎代码进行了大重构,提升了代码的可扩展性。更多的字幕引擎正在尝试开发中...</i></p>
</div>
![](./assets/01.png)
![](./assets/media/main_zh.png)
## 📥 下载
[GitHub Releases](https://github.com/HiMeditator/auto-caption/releases)
## 📚 用户手册
## 📚 相关文档
暂无
[Auto Caption 用户手册](./docs/user-manual/zh.md)
### 基本使用
[字幕引擎说明文档](./docs/engine-manual/zh.md)
目前仅提供 Windows 平台的可安装版本。如果使用默认的 Gummy 字幕引擎,需要获取阿里云百炼平台的 API KEY 并配置到环境变量中才能正常使用该模型。相关教程:[获取API KEY](https://help.aliyun.com/zh/model-studio/get-api-key)、[将API Key配置到环境变量](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
[项目 API 文档](./docs/api-docs/)
[更新日志](./docs/CHANGELOG.md)
## 📖 基本使用
软件已经适配了 Windows、macOS 和 Linux 平台。测试过的平台信息如下:
| 操作系统版本 | 处理器架构 | 获取系统音频输入 | 获取系统音频输出 |
| ------------------ | ---------- | ---------------- | ---------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅需要额外配置 | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
macOS 平台和 Linux 平台获取系统音频输出需要进行额外设置,详见[Auto Caption 用户手册](./docs/user-manual/zh.md)。
> 国际版的阿里云服务并没有提供 Gummy 模型,因此目前非中国用户无法使用 Gummy 字幕引擎。
如果要使用默认的 Gummy 字幕引擎(使用云端模型进行语音识别和翻译),首先需要获取阿里云百炼平台的 API KEY然后将 API KEY 添加到软件设置中或者配置到环境变量中(仅 Windows 平台支持读取环境变量中的 API KEY这样才能正常使用该模型。相关教程
- [获取 API KEY](https://help.aliyun.com/zh/model-studio/get-api-key)
- [将 API Key 配置到环境变量](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
> Vosk 模型的识别效果较差,请谨慎使用。
如果要使用 Vosk 本地字幕引擎,首先需要在 [Vosk Models](https://alphacephei.com/vosk/models) 页面下载所需的模型,将模型解压到本地,并将模型文件夹的路径添加到软件设置中。目前 Vosk 字幕引擎还不支持翻译字幕内容。
![](./assets/media/vosk_zh.png)
**如果你觉得上述字幕引擎不能满足你的需求,而且你会 Python那么你可以考虑开发自己的字幕引擎。详细说明请参考[字幕引擎说明文档](./docs/engine-manual/zh.md)。**
对于开发者,可以自己创建新的字幕引擎。具体通信规范请参考源代码。
## ✨ 特性
- 跨平台、多界面语言支持
- 丰富的字幕样式设置
- 灵活的字幕引擎选择
- 多语言识别与翻译
- 字幕记录展示与导出
- 生成音频输出或麦克风输入的字幕
## ⚙️ 自带字幕引擎说明
目前软件自带 2 个字幕引擎,正在规划新的引擎。它们的详细信息如下。
### Gummy 字幕引擎(云端)
基于通义实验室的[Gummy语音翻译大模型](https://help.aliyun.com/zh/model-studio/gummy-speech-recognition-translation/)开发,通过[阿里云百炼](https://bailian.console.aliyun.com)的 API 调用该云端模型。
**模型详细参数:**
- 音频采样率支持16kHz 及以上
- 音频采样位数16bit
- 音频通道数支持:单通道
- 可识别语言:中文、英文、日语、韩语、德语、法语、俄语、意大利语、西班牙语
- 支持的翻译:
- 中文 → 英文、日语、韩语
- 英文 → 中文、日语、韩语
- 日语、韩语、德语、法语、俄语、意大利语、西班牙语 → 中文或英文
**网络流量消耗:**
字幕引擎使用原生采样率(假设为 48kHz进行采样样本位深为 16bit上传音频为单通道因此上传速率约为
$$
48000\ \text{samples/second} \times 2\ \text{bytes/sample} \times 1\ \text{channel} = 93.75\ \text{KB/s}
$$
而且引擎只有获取到音频流的时候才会上传数据,因此实际上传速率可能更小。模型结果回传流量消耗较小,没有纳入考虑。
### Vosk 字幕引擎(本地)
基于 [vosk-api](https://github.com/alphacep/vosk-api) 开发。目前只支持生成音频对应的原文,不支持生成翻译内容。
### 新规划字幕引擎
以下为备选模型,将根据模型效果和集成难易程度选择。
- [faster-whisper](https://github.com/SYSTRAN/faster-whisper)
- [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx)
- [SenseVoice](https://github.com/FunAudioLLM/SenseVoice)
- [FunASR](https://github.com/modelscope/FunASR)
说明Windows 平台支持生成音频输出和麦克风输入的字幕Linux 平台仅支持生成麦克风输入的字幕。
## 🚀 项目运行
![](./assets/media/structure_zh.png)
### 安装依赖
```bash
@@ -40,18 +126,13 @@ npm install
### 构建字幕引擎
> #### 背景介绍
>
> 所谓的字幕引擎实际上是一个子程序,它会实时获取系统音频输入(录音)或输出(播放声音)的流式数据,并调用音频转文字的模型生成对应音频的字幕。生成的字幕转换为字符串形式的 JSON 数据,通过 IPC 返回给主程序。主程序读取字幕数据,处理后显示在窗口上。
>
> 目前项目默认使用[阿里云 Gummy 模型](https://help.aliyun.com/zh/model-studio/gummy-speech-recognition-translation/),需要获取阿里云百炼平台的 API KEY 并配置到环境变量中才能正常使用该模型,相关教程:[获取API KEY](https://help.aliyun.com/zh/model-studio/get-api-key)、[将API Key配置到环境变量](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)。
>
> 本项目的 gummy 字幕引擎是一个 python 子程序,通过 pyinstaller 打包为可执行文件。 运行字幕引擎子程序的代码在 `src\main\utils\engine.ts` 文件中。
首先进入 `python-subprocess` 文件夹,执行如下指令创建虚拟环境:
首先进入 `engine` 文件夹,执行如下指令创建虚拟环境(需要 Python 3.10 及以上的运行环境,建议使用 Python 3.12
```bash
# in ./engine folder
python -m venv subenv
# or
python3 -m venv subenv
```
然后激活虚拟环境:
@@ -59,32 +140,51 @@ python -m venv subenv
```bash
# Windows
subenv/Scripts/activate
# Linux
# Linux or macOS
source subenv/bin/activate
```
然后安装依赖(注意如果是 Linux 环境,需要注释掉 `requirements.txt` 中的 `PyAudioWPatch`,该模块仅适用于 Windows 环境):
然后安装依赖(这一步在 macOS 和 Linux 上可能会报错,一般是因为构建失败,需要根据报错信息进行处理):
```bash
pip install -r requirements.txt
# Windows
pip install -r requirements_win.txt
# macOS
pip install -r requirements_darwin.txt
# Linux
pip install -r requirements_linux.txt
```
如果在 Linux 系统上安装 `samplerate` 模块报错,可以尝试使用以下命令单独安装:
```bash
pip install samplerate --only-binary=:all:
```
然后使用 `pyinstaller` 构建项目:
```bash
pyinstaller --onefile main-gummy.py
pyinstaller ./main.spec
```
此时项目构建完成,进入 `python-subprocess/dist` 文件夹可见对应的可执行文件,即可进行后续操作。
注意:`main.spec` 文件中 `vosk` 库的路径可能不正确,需要根据实际情况配置(与 Python 环境的版本相关):
```
# Windows
vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
# Linux or macOS
vosk_path = str(Path('./subenv/lib/python3.x/site-packages/vosk').resolve())
```
此时项目构建完成,进入 `engine/dist` 文件夹可见对应的可执行文件,即可进行后续操作。
### 运行项目
```bash
npm run dev
```
### 构建项目
注意:目前软件没有适配 macOS 平台,请使用 Windows 或 Linux 系统进行构建。
### 构建项目
```bash
# For Windows
@@ -93,4 +193,16 @@ npm run build:win
npm run build:mac
# For Linux
npm run build:linux
```
注意,根据不同的平台需要修改项目根目录下 `electron-builder.yml` 文件中的配置内容:
```yml
extraResources:
# For Windows
- from: ./engine/dist/main.exe
to: ./engine/main.exe
# For macOS and Linux
# - from: ./engine/dist/main
# to: ./engine/main
```

README_en.md (new file)

@@ -0,0 +1,207 @@
<div align="center" >
<img src="./build/icon.png" width="100px" height="100px"/>
<h1 align="center">auto-caption</h1>
<p>Auto Caption is a cross-platform real-time caption display software.</p>
<p>
<a href="https://github.com/HiMeditator/auto-caption/releases"><img src="https://img.shields.io/badge/release-0.6.0-blue"></a>
<a href="https://github.com/HiMeditator/auto-caption/issues"><img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange"></a>
<img src="https://img.shields.io/github/languages/top/HiMeditator/auto-caption?color=royalblue">
<img src="https://img.shields.io/github/repo-size/HiMeditator/auto-caption?color=green">
<img src="https://img.shields.io/github/stars/HiMeditator/auto-caption?style=social">
</p>
<p>
| <a href="./README.md">简体中文</a>
| <b>English</b>
| <a href="./README_ja.md">日本語</a> |
</p>
<p><i>Version 0.6.0 has been released, featuring a major refactor of the subtitle engine code to improve code extensibility. More subtitle engines are being developed...</i></p>
</div>
![](./assets/media/main_en.png)
## 📥 Download
[GitHub Releases](https://github.com/HiMeditator/auto-caption/releases)
## 📚 Documentation
[Auto Caption User Manual](./docs/user-manual/en.md)
[Caption Engine Documentation](./docs/engine-manual/en.md)
[Project API Documentation (Chinese)](./docs/api-docs/)
[Changelog](./docs/CHANGELOG.md)
## 📖 Basic Usage
The software has been adapted for Windows, macOS, and Linux platforms. The tested platform information is as follows:
| OS Version | Architecture | System Audio Input | System Audio Output |
| ------------------ | ------------ | ------------------ | ------------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ Additional config required | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
Additional configuration is required to capture system audio output on macOS and Linux platforms. See [Auto Caption User Manual](./docs/user-manual/en.md) for details.
> The international version of Alibaba Cloud services does not provide the Gummy model, so non-Chinese users currently cannot use the Gummy caption engine.
To use the default Gummy caption engine (which uses cloud-based models for speech recognition and translation), you first need to obtain an API KEY from the Alibaba Cloud Bailian platform. Then add the API KEY to the software settings or configure it in environment variables (only Windows platform supports reading API KEY from environment variables) to properly use this model. Related tutorials:
- [Obtaining API KEY (Chinese)](https://help.aliyun.com/zh/model-studio/get-api-key)
- [Configuring API Key through Environment Variables (Chinese)](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
> The recognition performance of Vosk models is suboptimal, please use with caution.
To use the Vosk local caption engine, first download your required model from [Vosk Models](https://alphacephei.com/vosk/models) page, extract the model locally, and add the model folder path to the software settings. Currently, the Vosk caption engine does not support translated captions.
![](./assets/media/vosk_en.png)
**If you find the above caption engines don't meet your needs and you know Python, you may consider developing your own caption engine. For detailed instructions, please refer to the [Caption Engine Documentation](./docs/engine-manual/en.md).**
## ✨ Features
- Cross-platform, multi-language UI support
- Rich caption style settings
- Flexible caption engine selection
- Multi-language recognition and translation
- Caption recording display and export
- Generate captions for audio output or microphone input
## ⚙️ Built-in Subtitle Engines
Currently, the software comes with 2 subtitle engines, with new engines under development. Their detailed information is as follows.
### Gummy Subtitle Engine (Cloud)
Developed based on Tongyi Lab's [Gummy Speech Translation Model](https://help.aliyun.com/zh/model-studio/gummy-speech-recognition-translation/), using [Alibaba Cloud Bailian](https://bailian.console.aliyun.com) API to call this cloud model.
**Model Parameters:**
- Supported audio sample rate: 16kHz and above
- Audio sample depth: 16bit
- Supported audio channels: Mono
- Recognizable languages: Chinese, English, Japanese, Korean, German, French, Russian, Italian, Spanish
- Supported translations:
- Chinese → English, Japanese, Korean
- English → Chinese, Japanese, Korean
- Japanese, Korean, German, French, Russian, Italian, Spanish → Chinese or English
**Network Traffic Consumption:**
The subtitle engine uses native sample rate (assumed to be 48kHz) for sampling, with 16bit sample depth and mono channel, so the upload rate is approximately:
$$
48000\ \text{samples/second} \times 2\ \text{bytes/sample} \times 1\ \text{channel} = 93.75\ \text{KB/s}
$$
The engine only uploads data when receiving audio streams, so the actual upload rate may be lower. The return traffic consumption of model results is small and not considered here.
### Vosk Subtitle Engine (Local)
Developed based on [vosk-api](https://github.com/alphacep/vosk-api). Currently only supports generating original text from audio, does not support translation content.
### Planned New Subtitle Engines
The following are candidate models that will be selected based on model performance and ease of integration.
- [faster-whisper](https://github.com/SYSTRAN/faster-whisper)
- [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx)
- [SenseVoice](https://github.com/FunAudioLLM/SenseVoice)
- [FunASR](https://github.com/modelscope/FunASR)
## 🚀 Project Setup
![](./assets/media/structure_en.png)
### Install Dependencies
```bash
npm install
```
### Build Subtitle Engine
First enter the `engine` folder and execute the following commands to create a virtual environment (requires Python 3.10 or higher, with Python 3.12 recommended):
```bash
# in ./engine folder
python -m venv subenv
# or
python3 -m venv subenv
```
Then activate the virtual environment:
```bash
# Windows
subenv/Scripts/activate
# Linux or macOS
source subenv/bin/activate
```
Then install dependencies (this step might result in errors on macOS and Linux, usually due to build failures, and you need to handle them based on the error messages):
```bash
# Windows
pip install -r requirements_win.txt
# macOS
pip install -r requirements_darwin.txt
# Linux
pip install -r requirements_linux.txt
```
If you encounter errors when installing the `samplerate` module on Linux systems, you can try installing it separately with this command:
```bash
pip install samplerate --only-binary=:all:
```
Then use `pyinstaller` to build the project:
```bash
pyinstaller ./main.spec
```
Note that the path to the `vosk` library in `main.spec` might be incorrect and needs to be configured to match your environment (it depends on the Python version):
```
# Windows
vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
# Linux or macOS
vosk_path = str(Path('./subenv/lib/python3.x/site-packages/vosk').resolve())
```
After the build completes, you can find the executable file in the `engine/dist` folder. Then proceed with subsequent operations.
### Run Project
```bash
npm run dev
```
### Build Project
```bash
# For Windows
npm run build:win
# For macOS
npm run build:mac
# For Linux
npm run build:linux
```
Note: You need to modify the configuration content in the `electron-builder.yml` file in the project root directory according to different platforms:
```yml
extraResources:
# For Windows
- from: ./engine/dist/main.exe
to: ./engine/main.exe
# For macOS and Linux
# - from: ./engine/dist/main
# to: ./engine/main
```

README_ja.md (new file)

@@ -0,0 +1,207 @@
<div align="center" >
<img src="./build/icon.png" width="100px" height="100px"/>
<h1 align="center">auto-caption</h1>
<p>Auto Caption はクロスプラットフォームのリアルタイム字幕表示ソフトウェアです。</p>
<p>
<a href="https://github.com/HiMeditator/auto-caption/releases"><img src="https://img.shields.io/badge/release-0.6.0-blue"></a>
<a href="https://github.com/HiMeditator/auto-caption/issues"><img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange"></a>
<img src="https://img.shields.io/github/languages/top/HiMeditator/auto-caption?color=royalblue">
<img src="https://img.shields.io/github/repo-size/HiMeditator/auto-caption?color=green">
<img src="https://img.shields.io/github/stars/HiMeditator/auto-caption?style=social">
</p>
<p>
| <a href="./README.md">简体中文</a>
| <a href="./README_en.md">English</a>
| <b>日本語</b> |
</p>
<p><i>v0.6.0 バージョンがリリースされ、字幕エンジンコードが大規模にリファクタリングされ、コードの拡張性が向上しました。より多くの字幕エンジンの開発が試みられています...</i></p>
</div>
![](./assets/media/main_ja.png)
## 📥 ダウンロード
[GitHub Releases](https://github.com/HiMeditator/auto-caption/releases)
## 📚 関連ドキュメント
[Auto Caption ユーザーマニュアル](./docs/user-manual/ja.md)
[字幕エンジン説明ドキュメント](./docs/engine-manual/ja.md)
[プロジェクト API ドキュメント(中国語)](./docs/api-docs/)
[更新履歴](./docs/CHANGELOG.md)
## 📖 基本使い方
このソフトウェアはWindows、macOS、Linuxプラットフォームに対応しています。テスト済みのプラットフォーム情報は以下の通りです
| OS バージョン | アーキテクチャ | システムオーディオ入力 | システムオーディオ出力 |
| ------------------ | ------------ | ------------------ | ------------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ 追加設定が必要 | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
macOSおよびLinuxプラットフォームでシステムオーディオ出力を取得するには追加設定が必要です。詳細は[Auto Captionユーザーマニュアル](./docs/user-manual/ja.md)をご覧ください。
> 阿里雲の国際版サービスでは Gummy モデルを提供していないため、現在中国以外のユーザーは Gummy 字幕エンジンを使用できません。
デフォルトの Gummy 字幕エンジン(クラウドベースのモデルを使用した音声認識と翻訳)を使用するには、まず阿里雲百煉プラットフォームから API KEY を取得する必要があります。その後、API KEY をソフトウェア設定に追加するか、環境変数に設定しますWindows プラットフォームのみ環境変数からの API KEY 読み取りをサポート)。関連チュートリアル:
- [API KEY の取得(中国語)](https://help.aliyun.com/zh/model-studio/get-api-key)
- [環境変数を通じて API Key を設定(中国語)](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
> Vosk モデルの認識精度は低いため、注意してご使用ください。
Vosk ローカル字幕エンジンを使用するには、まず [Vosk Models](https://alphacephei.com/vosk/models) ページから必要なモデルをダウンロードし、ローカルに解凍した後、モデルフォルダのパスをソフトウェア設定に追加してください。現在、Vosk 字幕エンジンは字幕の翻訳をサポートしていません。
![](./assets/media/vosk_ja.png)
**上記の字幕エンジンがご要望を満たさず、かつ Python の知識をお持ちの場合、独自の字幕エンジンを開発することも可能です。詳細な説明は[字幕エンジン説明書](./docs/engine-manual/ja.md)をご参照ください。**
## ✨ 特徴
- クロスプラットフォーム、多言語 UI サポート
- 豊富な字幕スタイル設定
- 柔軟な字幕エンジン選択
- 多言語認識と翻訳
- 字幕記録の表示とエクスポート
- オーディオ出力またはマイク入力からの字幕生成
## ⚙️ 字幕エンジン説明
現在、ソフトウェアには2つの字幕エンジンが搭載されており、新しいエンジンが計画されています。それらの詳細情報は以下の通りです。
### Gummy 字幕エンジン(クラウド)
Tongyi Lab の [Gummy 音声翻訳大規模モデル](https://help.aliyun.com/zh/model-studio/gummy-speech-recognition-translation/)をベースに開発され、[Alibaba Cloud Bailian](https://bailian.console.aliyun.com) の APIを使用してこのクラウドモデルを呼び出します。
**モデル詳細パラメータ:**
- サポートするオーディオサンプルレート16kHz 以上
- オーディオサンプルビット深度16bit
- サポートするオーディオチャンネル:モノラル
- 認識可能な言語:中国語、英語、日本語、韓国語、ドイツ語、フランス語、ロシア語、イタリア語、スペイン語
- サポートする翻訳:
- 中国語 → 英語、日本語、韓国語
- 英語 → 中国語、日本語、韓国語
- 日本語、韓国語、ドイツ語、フランス語、ロシア語、イタリア語、スペイン語 → 中国語または英語
**ネットワークトラフィック消費量:**
字幕エンジンはネイティブサンプルレート48kHz と仮定)でサンプリングを行い、サンプルビット深度は 16bit、アップロードオーディオはモノラルチャンネルのため、アップロードレートは約
$$
48000\ \text{samples/second} \times 2\ \text{bytes/sample} \times 1\ \text{channel} = 93.75\ \text{KB/s}
$$
また、エンジンはオーディオストリームを取得したときのみデータをアップロードするため、実際のアップロードレートはさらに小さくなる可能性があります。モデル結果の返信トラフィック消費量は小さく、ここでは考慮していません。
### Vosk 字幕エンジン(ローカル)
[vosk-api](https://github.com/alphacep/vosk-api) をベースに開発されています。現在は音声に対応する原文の生成のみをサポートしており、翻訳コンテンツはサポートしていません。
### 新規計画字幕エンジン
以下は候補モデルであり、モデルの性能と統合の容易さに基づいて選択されます。
- [faster-whisper](https://github.com/SYSTRAN/faster-whisper)
- [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx)
- [SenseVoice](https://github.com/FunAudioLLM/SenseVoice)
- [FunASR](https://github.com/modelscope/FunASR)
## 🚀 プロジェクト実行
![](./assets/media/structure_ja.png)
### 依存関係のインストール
```bash
npm install
```
### 字幕エンジンの構築
まず `engine` フォルダに入り、以下のコマンドを実行して仮想環境を作成しますPython 3.10 以上が必要で、Python 3.12 が推奨されます):
```bash
# ./engine フォルダ内
python -m venv subenv
# または
python3 -m venv subenv
```
次に仮想環境をアクティブにします:
```bash
# Windows
subenv/Scripts/activate
# Linux または macOS
source subenv/bin/activate
```
次に依存関係をインストールします(このステップでは macOS と Linux でエラーが発生する可能性があります。通常はビルド失敗によるもので、エラーメッセージに基づいて対処する必要があります):
```bash
# Windows
pip install -r requirements_win.txt
# macOS
pip install -r requirements_darwin.txt
# Linux
pip install -r requirements_linux.txt
```
Linux システムで `samplerate` モジュールのインストールに問題が発生した場合、以下のコマンドで個別にインストールを試すことができます:
```bash
pip install samplerate --only-binary=:all:
```
その後、`pyinstaller` を使用してプロジェクトをビルドします:
```bash
pyinstaller ./main.spec
```
`main.spec` ファイル内の `vosk` ライブラリのパスが正しくない可能性があるため、実際の状況Python 環境のバージョンに関連)に応じて設定する必要があります。
```
# Windows
vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
# Linux または macOS
vosk_path = str(Path('./subenv/lib/python3.x/site-packages/vosk').resolve())
```
これでプロジェクトのビルドが完了し、`engine/dist` フォルダ内に対応する実行可能ファイルが確認できます。その後、次の操作に進むことができます。
### プロジェクト実行
```bash
npm run dev
```
### プロジェクト構築
```bash
# Windows 用
npm run build:win
# macOS 用
npm run build:mac
# Linux 用
npm run build:linux
```
注意: プラットフォームに応じて、プロジェクトルートディレクトリにある `electron-builder.yml` ファイルの設定内容を変更する必要があります:
```yml
extraResources:
# Windows 用
- from: ./engine/dist/main.exe
to: ./engine/main.exe
# macOS と Linux 用
# - from: ./engine/dist/main
# to: ./engine/main
```

(removed binary image; was 311 KiB)
assets/media/main_en.png (binary, new file; 370 KiB)
assets/media/main_ja.png (binary, new file; 387 KiB)
assets/media/main_zh.png (binary, new file; 396 KiB)
assets/media/structure_en.png (binary, new file; 323 KiB)
assets/media/structure_ja.png (binary, new file; 324 KiB)
assets/media/structure_zh.png (binary, new file; 324 KiB)
assets/media/vosk_en.png (binary, new file; 68 KiB)
assets/media/vosk_ja.png (binary, new file; 76 KiB)
assets/media/vosk_zh.png (binary, new file; 74 KiB)
assets/structure.pptx (binary, new file)


@@ -5,7 +5,9 @@
The following icons are used under CC BY 4.0 license:
- icon.png
- icon.svg
- icon.icns
Source:
- https://icon-icons.com/en/pack/Duetone/2064

build/entitlements.mac.plist (new file)

@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>com.apple.security.cs.allow-jit</key>
<true/>
<key>com.apple.security.cs.allow-unsigned-executable-memory</key>
<true/>
<key>com.apple.security.cs.allow-dyld-environment-variables</key>
<true/>
</dict>
</plist>

build/icon.icns (binary, new file)
build/icon.png (binary, new file; 36 KiB)

build/icon.svg (new file)

@@ -0,0 +1 @@
<svg id="Layer_1" data-name="Layer 1" xmlns="http://www.w3.org/2000/svg" viewBox="6 6 52 52"><defs><style>.cls-1{fill:#a8d2f0;}.cls-2{fill:#389ad6;}.cls-3,.cls-4{fill:none;}.cls-4{stroke:#295183;stroke-linecap:round;stroke-linejoin:round;stroke-width:2px;}.cls-5{fill:#295183;}</style></defs><title>weather, forecast, direction, compass</title><path class="cls-1" d="M25.56,17.37c-.87,6.45-1.73,22.73,10.26,29.37A1.77,1.77,0,0,1,35.15,50C27.56,51,15,50,13.05,33.13a1.9,1.9,0,0,1,0-.21c0-1.24.11-13.46,10.07-17.41A1.77,1.77,0,0,1,25.56,17.37Z"/><path class="cls-2" d="M30.32,35l1,4.45a3.2,3.2,0,0,0-.22.72c-.1.46-.19.92-.29,1.38-.13.68-.39,1.49-1.06,1.67s-1.32-.44-1.55-1.11S28,40.72,27.84,40s-.76-1.33-1.45-1.26c-.34,0-.62.27-1,.32-.78.16-.31-1.79-.46-2.13a1.67,1.67,0,0,0-1.08-.82c-.91-.27-3.85-.37-3.06-2.07a1.68,1.68,0,0,1,1.07-.76,9.87,9.87,0,0,1,1.4-.32,3.94,3.94,0,0,0,1.26-.32l4.44,1,1.07.23Z"/><path class="cls-2" d="M30.32,28.31l-.24,1.07L29,29.62,27.26,30a1.83,1.83,0,0,0,.52-.8A6,6,0,0,0,28,28c0-.26.07-.5.12-.74a1.26,1.26,0,0,1,.1-.29Z"/><path class="cls-2" d="M34.62,29.37l0-.2.69-.43a2.66,2.66,0,0,1-.38.7Z"/><line class="cls-3" x1="33.74" y1="37.87" x2="33.45" y2="39.16"/><path class="cls-2" d="M37,35.79A4.71,4.71,0,0,1,36,36a7.51,7.51,0,0,0-1,.17,2.43,2.43,0,0,0-.37.13,2,2,0,0,0-.62.47l.4-1.78.23-1.07,1.07-.23Z"/><polyline class="cls-4" points="32 20.86 30.47 27.68 30.17 28.99 29.95 29.95 28.99 30.17 27.42 30.52 26.41 30.75 25.24 31.01 20.86 32 25 32.93 28.99 33.83 29.95 34.04 30.17 35.01 31.07 39.01 32 43.14 32.99 38.75 33.25 37.59 33.47 36.6 33.83 35 34.04 34.04 35 33.83 36.27 33.54 43.14 32 35.01 30.17 34.28 30.01 34.04 29.95 34 29.77 33.83 28.99 33.38 26.98"/><polygon class="cls-4" points="30.17 28.99 29.95 29.95 28.99 30.17 28.09 28.74 26.98 26.98 28.29 27.81 30.17 28.99"/><polygon class="cls-4" points="30.17 35.01 26.98 37.02 28.99 33.83 29.95 34.04 30.17 35.01"/><polygon class="cls-4" points="37.02 37.02 35.26 35.91 33.83 35 34.04 34.04 35 33.83 36.2 35.72 37.02 37.02"/><polygon class="cls-4" points="37.02 26.98 35.01 30.17 34.28 30.01 34.04 29.95 34 29.77 33.83 28.99 37.02 26.98"/><path class="cls-4" d="M38.42,14.13A19.08,19.08,0,1,1,32,13a19.19,19.19,0,0,1,2,.11"/><circle class="cls-5" cx="32.03" cy="16.99" r="1"/><circle class="cls-5" cx="47.01" cy="32.03" r="1"/><circle class="cls-5" cx="31.97" cy="47.01" r="1"/><circle class="cls-5" cx="16.99" cy="31.97" r="1"/></svg>


docs/CHANGELOG.md (new file)

@@ -0,0 +1,147 @@
## v0.0.1
2025-06-22
发布第一版软件。
## v0.1.0
2025-06-26
### 新增功能
- 添加错误通知
- 添加默认引擎的环境变量检查
- 添加配置数据文件保存和载入
- 添加字幕样式恢复默认的选项
- 添加项目关于信息
### 新增文档
- 添加用户说明文档
- 添加字幕引擎说明文档
## v0.2.0
2025-07-05
对项目进行了重构,修复了 bug添加了新功能。本版本为正式版。
### 新增功能
- 添加长字幕内容隐藏功能 (#1)
- 添加多界面语言支持(中文、英语、日语)
- 添加暗色主题
### 提升体验
- 优化界面布局
- 添加更多可保存和载入的配置项
- 为字幕引擎添加更严格的状态限制,防止出现僵尸进程
### 修复 bug
- 修复字幕引擎长时间空置后报错的问题 (#2)
### 新增文档
- 新增日语说明文档
- 新增英语、日语字幕引擎说明文档和用户手册
- 新增 electron ipc api 文档
## v0.3.0
2025-07-09
对字幕引擎代码进行了重构,软件适配了 macOS 平台,添加了新功能。
### 新增功能
- 添加软件内设置 API KEY 的功能
- 添加字幕字体粗细和文本阴影的设置
- 添加复制字幕记录到剪贴板的功能 (#3)
### 优化体验
- 字幕时间记录精确到毫秒
- 更详细的说明文档(添加字幕引擎规格说明、用户文档和字幕引擎文档更新) (#4)
- 适配 macOS 平台
- 字幕窗口有了更大的顶置优先级
- 预览窗口可以实时显示最新的字幕内容
### 修复 bug
- 修复使用系统主题时暗色系统载入为亮色的问题
## v0.4.0
2025-07-11
添加了 Vosk 本地字幕引擎,更新了项目文档,继续优化使用体验。
### 新增功能
- 添加了基于 Vosk 的字幕引擎, **当前 Vosk 字幕引擎暂不支持翻译**
- 更新用户界面,增加 Vosk 引擎选项和模型路径设置
### 优化体验
- 字幕窗口右上角图标的颜色改为和字幕原文字体颜色一致
## v0.5.0
2025-07-15
为软件本体添加了更多功能、适配了 Linux。
### 新增功能
- 适配了 Linux 平台
- 新增修改字幕时间功能,可调整字幕时间
- 支持导出 srt 格式的字幕记录
- 支持显示字幕引擎状态pid、ppid、CPU 占用率、内存占用、运行时间)
### 优化体验
- 调整字幕窗口右上角图标为竖向排布
- 过滤 Gummy 字幕引擎输出的不完整字幕
## v0.5.1
2025-07-17
### 修复 bug
- 修复无法调用自定义字幕引擎的 bug
- 修复自定义字幕引擎的参数失效 bug
## v0.6.0
2025-07-29
### 新增功能
- 新增字幕记录排序功能,可选择字幕记录正序或倒序显示
### 优化体验
- 减小了软件安装包的体积
- 微调字幕引擎设置界面布局
- 交换窗口界面信息弹窗和错误弹窗的位置,防止提示信息挡住操作
- 提高程序健壮性,完全避免字幕引擎进程成为孤儿进程
- 修改字幕引擎文档,添加更详细的开发说明
### 项目优化
- 重构字幕引擎,提升字幕引擎代码的可扩展性和可读性
- 合并 Gummy 和 Vosk 引擎为单个可执行文件
- 字幕引擎和主程序添加 Socket 通信,完全避免字幕引擎成为孤儿进程
## v0.7.0
2025-08-xx
### 新增功能
- 添加字幕窗口宽度记忆,重新打开时与上次字幕窗口宽度一致
- 若尝试关闭字幕引擎 4s 后引擎仍未退出,则强制关闭字幕引擎

docs/TODO.md (new file)

@@ -0,0 +1,34 @@
## 已完成
- [x] 添加英语和日语语言支持 *2025/07/04*
- [x] 添加暗色主题 *2025/07/04*
- [x] 优化长字幕显示效果 *2025/07/05*
- [x] 修复字幕引擎空置报错的问题 *2025/07/05*
- [x] 增强字幕窗口顶置优先级 *2025/07/07*
- [x] 添加对自带字幕引擎的详细规格说明 *2025/07/07*
- [x] 添加复制字幕到剪贴板功能 *2025/07/08*
- [x] 适配 macOS 平台 *2025/07/08*
- [x] 添加字幕文字描边 *2025/07/09*
- [x] 添加基于 Vosk 的字幕引擎 *2025/07/09*
- [x] 适配 Linux 平台 *2025/07/13*
- [x] 字幕窗口右上角图标改为竖向排布 *2025/07/14*
- [x] 可以调整字幕时间轴 *2025/07/14*
- [x] 可以导出 srt 格式的字幕记录 *2025/07/14*
- [x] 可以获取字幕引擎的系统资源消耗情况 *2025/07/15*
- [x] 添加字幕记录按时间降序排列选择 *2025/07/26*
- [x] 重构字幕引擎 *2025/07/28*
- [x] 优化前端界面提示消息 *2025/07/29*
## 待完成
- [ ] 验证 / 添加基于 sherpa-onnx 的字幕引擎
## 后续计划
- [ ] 添加 Ollama 模型用于本地字幕引擎的翻译
- [ ] 验证 / 添加基于 FunASR 的字幕引擎
- [ ] 减小软件不必要的体积
## 遥远的未来
- [ ] 使用 Tauri 框架重新开发

docs/api-docs/caption-engine.md (new file)

@@ -0,0 +1,109 @@
# caption engine api-doc
本文档主要介绍字幕引擎进程和 Electron 主进程的通信约定。
## 原理说明
本项目的 Python 进程通过标准输出向 Electron 主进程发送数据。Python 进程标准输出 (`sys.stdout`) 的内容一定为一行一行的字符串。且每行字符串均可以解释为一个 JSON 对象。每个 JSON 对象一定有 `command` 参数。
Electron 主进程通过 TCP Socket 向 Python 进程发送数据。发送的数据均是转化为字符串的对象,对象格式一定为:
```js
{
command: string,
content: string
}
```
## 标准输出约定
> 数据传递方向:字幕引擎进程 => Electron 主进程
当 JSON 对象的 `command` 参数为下列值时,表示的对应的含义:
### `connect`
```js
{
command: "connect",
content: ""
}
```
字幕引擎 TCP Socket 服务已经准备好,命令 Electron 主进程连接字幕引擎 Socket 服务。
### `kill`
```js
{
command: "connect",
content: ""
}
```
命令 Electron 主进程强制结束字幕引擎进程。
### `caption`
```js
{
command: "caption",
index: number,
time_s: string,
time_t: string,
text: string,
translation: string
}
```
Python 端监听到的音频流转换为的字幕数据。
### `print`
```js
{
command: "print",
content: string
}
```
输出 Python 端打印的内容。
### `info`
```js
{
command: "info",
content: string
}
```
Python 端打印的提示信息;比起 `print`,该信息更需要得到 Electron 端的关注。
### `usage`
```js
{
command: "usage",
content: string
}
```
Gummy 字幕引擎结束时打印计费消耗信息。
## TCP Socket
> 数据传递方向Electron 主进程 => 字幕引擎进程
当 JSON 对象的 `command` 参数为下列值时,表示的对应的含义:
### `stop`
```js
{
command: "stop",
content: ""
}
```
命令当前字幕引擎停止监听并结束任务。
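下面是该协议引擎侧的一个最小示意(假设 Socket 消息按行分隔;实际项目的 `stdout_obj` 等辅助函数由引擎的 utils 模块提供,此处为自包含的简化实现):
```python
import json
import socket
import sys
import threading
import time

def stdout_obj(obj: dict) -> None:
    # 每行一个 JSON 对象,输出后立即刷新缓冲区,
    # 保证 Electron 主进程能按行解析
    sys.stdout.write(json.dumps(obj, ensure_ascii=False) + "\n")
    sys.stdout.flush()

status = {"value": "running"}

def serve(port: int) -> None:
    # 监听 Electron 主进程的 TCP 连接,收到 stop 命令后修改状态
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    stdout_obj({"command": "connect", "content": ""})  # 通知主进程可以连接
    conn, _ = srv.accept()
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        for line in reader:
            msg = json.loads(line)
            if msg.get("command") == "stop":
                status["value"] = "stop"
                break
    srv.close()

threading.Thread(target=serve, args=(8080,), daemon=True).start()
while status["value"] == "running":
    time.sleep(0.05)  # 实际引擎在此处读取音频块并输出 caption 数据
```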


@@ -0,0 +1,315 @@
# electron ipc api-doc
本文档主要记录主进程和渲染进程的通信约定。
## 命名方式
本项目渲染进程包含两个:字幕窗口和控制窗口,主进程需要分别和两者进行通信。通信命令的命名规则如下:
1. 命令一般由三个关键字组成,由点号隔开。
2. 第一个关键字表示通信发送目标:
- `control` 表示控制窗口类实例(后端)或控制窗口(前端)
- `caption` 表示字幕窗口类实例(后端)或字幕窗口(前端)
- `both` 表示上述对象都有可能成为目标
3. 第二个关键字表示需要修改的对象 / 发生改变的对象,采用小驼峰命名
4. 第三个关键字一般是动词,表示通信发生时对应动作 / 需要进行的操作
根据上面的描述可以看出通信命令一般有两种语义,一种表示要求执行的操作,另一种表示当前发生的事件。
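例如,`control.engine.start` 表示要求后端控制窗口实例启动字幕引擎(要求执行的操作);而 `control.engine.started` 表示字幕引擎已经启动成功(当前发生的事件)。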
## 前端 <=> 后端
### `both.window.mounted`
**介绍:** 前端窗口挂载完毕,请求最新的配置数据
**发起方:** 前端
**接收方:** 后端
**数据类型:**
- 发送:无数据
- 接收:`FullConfig`
### `control.nativeTheme.get`
**介绍:** 前端获取系统当前的主题
**发起方:** 前端控制窗口
**接收方:** 后端控制窗口实例
**数据类型:**
- 发送:无数据
- 接收:`string`
### `control.folder.select`
**介绍:** 打开文件夹选择器,并将用户选择的文件夹路径返回给前端
**发起方:** 前端控制窗口
**接收方:** 后端控制窗口实例
**数据类型:**
- 发送:无数据
- 接收:`string`
### `control.engine.info`
**介绍:** 获取字幕引擎的资源消耗情况
**发起方:** 前端控制窗口
**接收方:** 后端控制窗口实例
**数据类型:**
- 发送:无数据
- 接收:`EngineInfo`
## 前端 ==> 后端
### `control.uiLanguage.change`
**介绍:** 前端修改界面语言,将修改同步给后端
**发起方:** 前端控制窗口
**接收方:** 后端控制窗口实例
**数据类型:** `UILanguage`
### `control.uiTheme.change`
**介绍:** 前端修改界面主题,将修改同步给后端
**发起方:** 前端控制窗口
**接收方:** 后端控制窗口实例
**数据类型:** `UITheme`
### `control.leftBarWidth.change`
**介绍:** 前端修改边栏宽度,将修改同步给后端
**发起方:** 前端控制窗口
**接收方:** 后端控制窗口实例
**数据类型:** `number`
### `control.captionLog.clear`
**介绍:** 清空字幕记录
**发起方:** 前端控制窗口
**接收方:** 后端控制窗口实例
**数据类型:** 无数据
### `control.styles.change`
**介绍:** 前端修改字幕样式,将修改同步给后端
**发起方:** 前端控制窗口
**接收方:** 后端控制窗口实例
**数据类型:** `Styles`
### `control.styles.reset`
**介绍:** 将字幕样式恢复为默认
**发起方:** 前端控制窗口
**接收方:** 后端控制窗口实例
**数据类型:** 无数据
### `control.controls.change`
**介绍:** 前端修改了字幕引擎配置,将最新配置发送给后端
**发起方:** 前端控制窗口
**接收方:** 后端控制窗口实例
**数据类型:** `Controls`
### `control.captionWindow.activate`
**介绍:** 激活字幕窗口
**发起方:** 前端控制窗口
**接收方:** 后端控制窗口实例
**数据类型:** 无数据
### `control.engine.start`
**介绍:** 启动字幕引擎
**发起方:** 前端控制窗口
**接收方:** 后端控制窗口实例
**数据类型:** 无数据
### `control.engine.stop`
**介绍:** 关闭字幕引擎
**发起方:** 前端控制窗口
**接收方:** 后端控制窗口实例
**数据类型:** 无数据
### `caption.windowHeight.change`
**介绍:** 字幕窗口高度发生改变
**发起方:** 前端字幕窗口
**接收方:** 后端字幕窗口实例
**数据类型:** `number`
### `caption.pin.set`
**介绍:** 是否将窗口置顶
**发起方:** 前端字幕窗口
**接收方:** 后端字幕窗口实例
**数据类型:** `boolean`
### `caption.controlWindow.activate`
**介绍:** 激活控制窗口
**发起方:** 前端字幕窗口
**接收方:** 后端字幕窗口实例
**数据类型:** 无数据
### `caption.window.close`
**介绍:** 关闭字幕窗口
**发起方:** 前端字幕窗口
**接收方:** 后端字幕窗口实例
**数据类型:** 无数据
## 后端 ==> 前端
### `control.uiLanguage.set`
**介绍:** 后端将最新界面语言发送给前端,前端进行设置
**发起方:** 后端
**接收方:** 字幕窗口
**数据类型:** `UILanguage`
### `control.nativeTheme.change`
**介绍:** 系统主题发生改变
**发起方:** 后端
**接收方:** 前端控制窗口
**数据类型:** `string`
### `control.engine.started`
**介绍:** 引擎启动成功,参数为引擎的进程 ID
**发起方:** 后端
**接收方:** 前端控制窗口
**数据类型:** `number`
### `control.engine.stopped`
**介绍:** 引擎关闭
**发起方:** 后端
**接收方:** 前端控制窗口
**数据类型:** 无数据
### `control.error.occurred`
**介绍:** 发送错误
**发起方:** 后端
**接收方:** 前端控制窗口
**数据类型:** `string`
### `control.controls.set`
**介绍:** 后端将最新字幕引擎配置发送给前端,前端进行设置
**发起方:** 后端
**接收方:** 前端控制窗口
**数据类型:** `Controls`
### `both.styles.set`
**介绍:** 后端将最新字幕样式发送给前端,前端进行设置
**发起方:** 后端
**接收方:** 前端
**数据类型:** `Styles`
### `both.captionLog.add`
**介绍:** 添加一条新的字幕数据
**发起方:** 后端
**接收方:** 前端
**数据类型:** `CaptionItem`
### `both.captionLog.upd`
**介绍:** 更新最后一条字幕数据
**发起方:** 后端
**接收方:** 前端
**数据类型:** `CaptionItem`
### `both.captionLog.set`
**介绍:** 设置全部的字幕数据
**发起方:** 后端
**接收方:** 前端
**数据类型:** `CaptionItem[]`

docs/engine-manual/en.md (new file)

@@ -0,0 +1,199 @@
# Caption Engine Documentation
Corresponding Version: v0.6.0
![](../../assets/media/structure_en.png)
## Introduction to the Caption Engine
The so-called caption engine is essentially a subprogram that continuously captures real-time streaming data from the system's audio input (microphone) or output (speakers) and invokes an audio-to-text model to generate corresponding captions for the audio. The generated captions are converted into JSON-formatted string data and passed to the main program via standard output (ensuring the string can be correctly interpreted as a JSON object by the main program). The main program reads and interprets the caption data, processes it, and displays it in the window.
**The communication standard between the caption engine process and the Electron main process is: [caption engine api-doc](../api-docs/caption-engine.md).**
## Workflow
The communication flow between the main process and the caption engine:
### Starting the Engine
- **Main Process**: Uses `child_process.spawn()` to launch the caption engine process.
- **Caption Engine Process**: Creates a TCP Socket server thread. After creation, it outputs a JSON object string via standard output, containing a `command` field with the value `connect`.
- **Main Process**: Monitors the standard output of the caption engine process, attempts to split it line by line, parses it into a JSON object, and checks if the `command` field value is `connect`. If so, it connects to the TCP Socket server.
### Caption Recognition
- **Caption Engine Process**: The main thread monitors system audio output, sends audio data chunks to the caption engine for parsing, and outputs the parsed caption data object strings via standard output.
- **Main Process**: Continues to monitor the standard output of the caption engine and performs different operations based on the `command` field of the parsed object.
### Closing the Engine
- **Main Process**: When the user closes the caption engine via the frontend, the main process sends a JSON object string with the `command` field set to `stop` to the caption engine process via Socket communication.
- **Caption Engine Process**: Receives the object string, parses it, and if the `command` field is `stop`, sets the global variable `thread_data.status` to `stop`.
- **Caption Engine Process**: The main thread's loop for monitoring system audio output ends when `thread_data.status` is not `running`, releases resources, and terminates.
- **Main Process**: Detects the termination of the caption engine process, performs corresponding cleanup, and provides feedback to the frontend.
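Putting the three phases together, the engine's top-level loop looks roughly like the sketch below. It reuses the `start_server`, `stdout_cmd`, `thread_data`, and `AudioStream` helpers documented in the following sections; the recognizer is a stand-in for a real audio-to-text class, and the fixed port is an assumption (the real engine receives it via the `-p` command-line argument):
```python
from sysaudio import AudioStream
from utils import start_server, stdout_cmd, thread_data

class NoopRecognizer:
    """Stand-in for a real audio-to-text class (see gummy.py / vosk.py)."""
    def start(self): pass
    def send_audio_frame(self, data: bytes): pass
    def stop(self): pass

port = 8080  # assumption: the real engine gets this from the -p argument

start_server(port)                # socket thread; a "stop" message flips thread_data.status
stdout_cmd("connect", str(port))  # tell the main process the server is ready

stream = AudioStream(audio_type=0, chunk_rate=20)
stream.open_stream()
recognizer = NoopRecognizer()
recognizer.start()

while thread_data.status == "running":
    chunk = stream.read_chunk()
    recognizer.send_audio_frame(chunk)  # a real class emits caption JSON here

recognizer.stop()
stream.close_stream()  # the main process detects the exit and cleans up
```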
## Implemented Features
The following features are already implemented and can be reused directly.
### Standard Output
Supports printing general information, commands, and error messages.
Example:
```python
from utils import stdout, stdout_cmd, stdout_obj, stderr
stdout("Hello") # {"command": "print", "content": "Hello"}\n
stdout_cmd("connect", "8080") # {"command": "connect", "content": "8080"}\n
stdout_obj({"command": "print", "content": "Hello"})
stderr("Error Info")
```
### Creating a Socket Service
This Socket service listens on a specified port, parses content sent by the Electron main program, and may modify the value of `thread_data.status`.
Example:
```python
from utils import start_server
from utils import thread_data
port = 8080
start_server(port)
while thread_data.status == 'running':
# do something
pass
```
### Audio Capture
The `AudioStream` class captures audio data and is cross-platform, supporting Windows, Linux, and macOS. Its initialization includes two parameters:
- `audio_type`: The type of audio to capture. `0` for system output audio (speakers), `1` for system input audio (microphone).
- `chunk_rate`: The frequency of audio data capture, i.e., the number of audio chunks captured per second.
The class includes three methods:
- `open_stream()`: Starts audio capture.
- `read_chunk() -> bytes`: Reads an audio chunk.
- `close_stream()`: Stops audio capture.
Example:
```python
from sysaudio import AudioStream
audio_type = 0
chunk_rate = 20
stream = AudioStream(audio_type, chunk_rate)
stream.open_stream()
while True:
data = stream.read_chunk()
# do something with data
pass
stream.close_stream()
```
### Audio Processing
The captured audio stream may require preprocessing before conversion to text. Typically, multi-channel audio needs to be converted to mono, and resampling may be necessary. This project provides three audio processing functions:
- `merge_chunk_channels(chunk: bytes, channels: int) -> bytes`: Converts a multi-channel audio chunk to mono.
- `resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes`: Converts a multi-channel audio chunk to mono and resamples it.
- `resample_mono_chunk(chunk: bytes, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes`: Resamples a mono audio chunk.
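For example, a 48 kHz stereo capture can be down-mixed and resampled to the 16 kHz mono stream a speech model typically expects. This is a sketch: the 48 kHz/stereo figures are assumptions about the capture device, and the function is assumed to be importable from the engine's `utils` module, where the audio processing code lives.
```python
from utils import resample_chunk_mono

CAPTURE_RATE = 48000  # assumed native sample rate of the audio capture
TARGET_RATE = 16000   # assumed rate expected by the speech model
CHANNELS = 2          # assumed stereo capture

def prepare_chunk(chunk: bytes) -> bytes:
    # Down-mix to mono and resample in a single call.
    return resample_chunk_mono(chunk, CHANNELS, CAPTURE_RATE, TARGET_RATE)
```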
## Features to Be Implemented in the Caption Engine
### Audio-to-Text Conversion
After obtaining a suitable audio stream, it needs to be converted to text. Typically, various models (cloud-based or local) are used for this purpose. Choose the appropriate model based on requirements.
This part is recommended to be encapsulated as a class with three methods:
- `start(self)`: Starts the model.
- `send_audio_frame(self, data: bytes)`: Processes the current audio chunk data. **The generated caption data is sent to the Electron main process via standard output.**
- `stop(self)`: Stops the model.
Complete caption engine examples:
- [gummy.py](../../engine/audio2text/gummy.py)
- [vosk.py](../../engine/audio2text/vosk.py)
### Caption Translation
Some speech-to-text models do not provide translation. If needed, a translation module must be added.
### Sending Caption Data
After obtaining the text for the current audio stream, it must be sent to the main program. The caption engine process passes caption data to the Electron main process via standard output.
The content must be a JSON string, with the JSON object including the following parameters:
```typescript
export interface CaptionItem {
command: "caption",
index: number, // Caption sequence number
time_s: string, // Start time of the current caption
time_t: string, // End time of the current caption
text: string, // Caption content
translation: string // Caption translation
}
```
**Note: Ensure the buffer is flushed after each JSON output to guarantee the Electron main process receives a string that can be parsed as a JSON object.**
It is recommended to use the project's `stdout_obj` function for sending.
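For example, one caption record could be sent like this (a sketch; the timestamp format shown is an assumption, and `stdout_obj` is assumed to flush after each line, as the note above requires):
```python
from utils import stdout_obj

stdout_obj({
    "command": "caption",
    "index": 1,
    "time_s": "2025-07-28 21:44:49.120",  # assumed timestamp format
    "time_t": "2025-07-28 21:44:51.480",
    "text": "hello world",
    "translation": "你好,世界",
})
```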
### Command-Line Parameter Specification
Custom caption engine settings are provided via command-line arguments. The current project uses the following parameters:
```python
import argparse
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Convert system audio stream to text')
# Common parameters
parser.add_argument('-e', '--caption_engine', default='gummy', help='Caption engine: gummy or vosk')
parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output, 1 for input')
parser.add_argument('-c', '--chunk_rate', default=20, help='Number of audio stream chunks collected per second')
parser.add_argument('-p', '--port', default=8080, help='The port to run the server on, 0 for no server')
# Gummy-specific parameters
parser.add_argument('-s', '--source_language', default='en', help='Source language code')
parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
# Vosk-specific parameters
parser.add_argument('-m', '--model_path', default='', help='The path to the vosk model.')
```
For example, to use the Gummy model with Japanese as the source language, Chinese as the target language, and system audio output captions with 0.1s audio chunks, the command-line arguments would be:
```bash
python main.py -e gummy -s ja -t zh -a 0 -c 10 -k <dashscope-api-key>
```
## Additional Notes
### Communication Standards
[caption engine api-doc](../api-docs/caption-engine.md)
### Program Entry
[main.py](../../engine/main.py)
### Development Recommendations
Apart from audio-to-text conversion, it is recommended to reuse the existing code. In this case, the following additions are needed:
- `engine/audio2text/`: Add a new audio-to-text class (file-level).
- `engine/main.py`: Add new parameter settings and workflow functions (refer to `main_gummy` and `main_vosk` functions).
### Packaging
After development and testing, the caption engine must be packaged into an executable. Typically, `pyinstaller` is used. If the packaged executable reports errors, check for missing dependencies.
### Execution
With a functional caption engine, it can be launched in the caption software window by specifying the engine's path and runtime arguments.
![](../img/02_en.png)

docs/engine-manual/ja.md (new file)

@@ -0,0 +1,201 @@
# 字幕エンジン説明ドキュメント
対応バージョンv0.6.0
この文書は大規模モデルを使用して翻訳されていますので、内容に正確でない部分があるかもしれません。
![](../../assets/media/structure_ja.png)
## 字幕エンジン紹介
字幕エンジンとは、システムのオーディオ入力マイクまたは出力スピーカーのストリーミングデータをリアルタイムで取得し、音声を文字に変換するモデルを呼び出して対応する字幕を生成するサブプログラムです。生成された字幕はJSON形式の文字列データに変換され、標準出力を介してメインプログラムに渡されますメインプログラムが受け取る文字列が正しくJSONオブジェクトとして解釈できる必要があります。メインプログラムは字幕データを読み取り、解釈して処理した後、ウィンドウに表示します。
**字幕エンジンプロセスとElectronメインプロセス間の通信は、[caption engine api-doc](../api-docs/caption-engine.md)に準拠しています。**
## 実行フロー
メインプロセスと字幕エンジンの通信フロー:
### エンジンの起動
- メインプロセス:`child_process.spawn()`を使用して字幕エンジンプロセスを起動
- 字幕エンジンプロセスTCP Socketサーバースレッドを作成し、作成後に標準出力にJSONオブジェクトを文字列化して出力。このオブジェクトには`command`フィールドが含まれ、値は`connect`
- メインプロセス字幕エンジンプロセスの標準出力を監視し、標準出力を行ごとに分割してJSONオブジェクトとして解析し、オブジェクトの`command`フィールドの値が`connect`かどうかを判断。`connect`の場合はTCP Socketサーバーに接続
### 字幕認識
- 字幕エンジンプロセス:メインスレッドでシステムオーディオ出力を監視し、オーディオデータブロックを字幕エンジンに送信して解析。字幕エンジンはオーディオデータブロックを解析し、標準出力を介して解析された字幕データオブジェクト文字列を送信
- メインプロセス:字幕エンジンの標準出力を引き続き監視し、解析されたオブジェクトの`command`フィールドに基づいて異なる操作を実行
### エンジンの停止
- メインプロセスユーザーがフロントエンドで字幕エンジンを停止する操作を実行すると、メインプロセスはSocket通信を介して字幕エンジンプロセスに`command`フィールドが`stop`のオブジェクト文字列を送信
- 字幕エンジンプロセスメインプロセスから送信されたオブジェクト文字列を受信し、オブジェクトとして解析。オブジェクトの`command`フィールドが`stop`の場合、グローバル変数`thread_data.status`の値を`stop`に設定
- 字幕エンジンプロセス:メインスレッドでシステムオーディオ出力をループ監視し、`thread_data.status`の値が`running`でない場合、ループを終了し、リソースを解放して実行を終了
- メインプロセス:字幕エンジンプロセスの終了を検出した場合、対応する処理を実行し、フロントエンドにフィードバック
## プロジェクトで実装済みの機能
以下の機能はすでに実装されており、直接再利用できます。
### 標準出力
通常情報、コマンド、エラー情報を出力できます。
サンプル:
```python
from utils import stdout, stdout_cmd, stdout_obj, stderr
stdout("Hello") # {"command": "print", "content": "Hello"}\n
stdout_cmd("connect", "8080") # {"command": "connect", "content": "8080"}\n
stdout_obj({"command": "print", "content": "Hello"})
stderr("Error Info")
```
### Socketサービスの作成
このSocketサービスは指定されたポートを監視し、Electronメインプログラムから送信された内容を解析し、`thread_data.status`の値を変更する可能性があります。
サンプル:
```python
from utils import start_server
from utils import thread_data
port = 8080
start_server(port)
while thread_data.status == 'running':
# 何か処理
pass
```
### オーディオ取得
`AudioStream`クラスはオーディオデータを取得するために使用され、Windows、Linux、macOSでクロスプラットフォームで実装されています。このクラスの初期化には2つのパラメータが含まれます
- `audio_type`取得するオーディオのタイプ。0はシステム出力オーディオスピーカー、1はシステム入力オーディオマイク
- `chunk_rate`オーディオデータの取得頻度。1秒あたりに取得するオーディオブロックの数
このクラスには3つのメソッドがあります
- `open_stream()`:オーディオ取得を開始
- `read_chunk() -> bytes`1つのオーディオブロックを読み取り
- `close_stream()`:オーディオ取得を閉じる
サンプル:
```python
from sysaudio import AudioStream
audio_type = 0
chunk_rate = 20
stream = AudioStream(audio_type, chunk_rate)
stream.open_stream()
while True:
data = stream.read_chunk()
# データで何か処理
pass
stream.close_stream()
```
### オーディオ処理
取得したオーディオストリームは、文字に変換する前に前処理が必要な場合があります。一般的に、マルチチャンネルオーディオをシングルチャンネルオーディオに変換し、リサンプリングが必要な場合もあります。このプロジェクトでは、3つのオーディオ処理関数を提供しています
- `merge_chunk_channels(chunk: bytes, channels: int) -> bytes`:マルチチャンネルオーディオブロックをシングルチャンネルオーディオブロックに変換
- `resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes`:現在のマルチチャンネルオーディオデータブロックをシングルチャンネルオーディオデータブロックに変換し、リサンプリングを実行
- `resample_mono_chunk(chunk: bytes, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes`:現在のシングルチャンネルオーディオブロックをリサンプリング
## 字幕エンジンで実装が必要な機能
### オーディオから文字への変換
適切なオーディオストリームを取得した後、オーディオストリームを文字に変換する必要があります。一般的に、さまざまなモデル(クラウドまたはローカル)を使用してオーディオストリームを文字に変換します。要件に応じて適切なモデルを選択する必要があります。
この部分はクラスとしてカプセル化することをお勧めします。以下の3つのメソッドを実装する必要があります
- `start(self)`:モデルを起動
- `send_audio_frame(self, data: bytes)`:現在のオーディオブロックデータを処理し、**生成された字幕データを標準出力を介してElectronメインプロセスに送信**
- `stop(self)`:モデルを停止
完全な字幕エンジンの実例:
- [gummy.py](../../engine/audio2text/gummy.py)
- [vosk.py](../../engine/audio2text/vosk.py)
### 字幕翻訳
一部の音声文字変換モデルは翻訳を提供していません。必要がある場合、翻訳モジュールを追加する必要があります。
### 字幕データの送信
現在のオーディオストリームのテキストを取得した後、そのテキストをメインプログラムに送信する必要があります。字幕エンジンプロセスは標準出力を介して字幕データをElectronメインプロセスに渡します。
送信する内容はJSON文字列でなければなりません。JSONオブジェクトには以下のパラメータを含める必要があります
```typescript
export interface CaptionItem {
command: "caption",
index: number, // 字幕のシーケンス番号
time_s: string, // 現在の字幕の開始時間
time_t: string, // 現在の字幕の終了時間
text: string, // 字幕の内容
translation: string // 字幕の翻訳
}
```
**JSONデータを出力するたびにバッファをフラッシュし、electronメインプロセスが受信する文字列が常にJSONオブジェクトとして解釈できるようにする必要があります。**
プロジェクトで既に実装されている`stdout_obj`関数を使用して送信することをお勧めします。
### コマンドラインパラメータの指定
カスタム字幕エンジンの設定はコマンドラインパラメータで指定するため、字幕エンジンのパラメータを設定する必要があります。このプロジェクトで現在使用されているパラメータは以下のとおりです:
```python
import argparse
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='システムオーディオストリームをテキストに変換')
# 共通
parser.add_argument('-e', '--caption_engine', default='gummy', help='字幕エンジン: gummyまたはvosk')
parser.add_argument('-a', '--audio_type', default=0, help='オーディオストリームソース: 0は出力、1は入力')
parser.add_argument('-c', '--chunk_rate', default=20, help='1秒あたりに収集するオーディオストリームブロックの数')
parser.add_argument('-p', '--port', default=8080, help='サーバーを実行するポート、0はサーバーなし')
# gummy専用
parser.add_argument('-s', '--source_language', default='en', help='ソース言語コード')
parser.add_argument('-t', '--target_language', default='zh', help='ターゲット言語コード')
parser.add_argument('-k', '--api_key', default='', help='GummyモデルのAPI KEY')
# vosk専用
parser.add_argument('-m', '--model_path', default='', help='voskモデルのパス')
```
たとえば、このプロジェクトの字幕エンジンでGummyモデルを使用し、原文を日本語、翻訳を中国語に指定し、システムオーディオ出力の字幕を取得し、毎回0.1秒のオーディオデータをキャプチャする場合、コマンドラインパラメータは以下のようになります:
```bash
python main.py -e gummy -s ja -t zh -a 0 -c 10 -k <dashscope-api-key>
```
## その他
### 通信規格
[caption engine api-doc](../api-docs/caption-engine.md)
### プログラムエントリ
[main.py](../../engine/main.py)
### 開発の推奨事項
オーディオから文字への変換以外は、このプロジェクトのコードを直接再利用することをお勧めします。その場合、追加する必要がある内容は:
- `engine/audio2text/`:新しいオーディオから文字への変換クラスを追加(ファイルレベル)
- `engine/main.py`:新しいパラメータ設定とプロセス関数を追加(`main_gummy`関数と`main_vosk`関数を参照)
### パッケージ化
字幕エンジンの開発とテストが完了した後、字幕エンジンを実行可能ファイルにパッケージ化する必要があります。一般的に`pyinstaller`を使用してパッケージ化します。パッケージ化された字幕エンジンファイルの実行でエラーが発生した場合、依存ライブラリが不足している可能性があります。不足している依存ライブラリを確認してください。
### 実行
使用可能な字幕エンジンを取得したら、字幕ソフトウェアウィンドウで字幕エンジンのパスと字幕エンジンの実行コマンド(パラメータ)を指定して字幕エンジンを起動できます。
![](../img/02_ja.png)

docs/engine-manual/zh.md (new file)

@@ -0,0 +1,200 @@
# 字幕引擎说明文档
对应版本v0.6.0
![](../../assets/media/structure_zh.png)
## 字幕引擎介绍
所谓的字幕引擎实际上是一个子程序,它会实时获取系统音频输入(麦克风)或输出(扬声器)的流式数据,并调用音频转文字的模型生成对应音频的字幕。生成的字幕转换为 JSON 格式的字符串数据,并通过标准输出传递给主程序(需要保证主程序读取到的字符串可以被正确解释为 JSON 对象)。主程序读取并解释字幕数据,处理后显示在窗口上。
**字幕引擎进程和 Electron 主进程之间的通信遵循的标准为:[caption engine api-doc](../api-docs/caption-engine.md)。**
## 运行流程
主进程和字幕引擎通信的流程:
### 启动引擎
- 主进程:使用 `child_process.spawn()` 启动字幕引擎进程
- 字幕引擎进程:创建 TCP Socket 服务器线程,创建后在标准输出中输出转化为字符串的 JSON 对象,该对象中包含 `command` 字段,值为 `connect`
- 主进程:监听字幕引擎进程的标准输出,尝试将标准输出按行分割,解析为 JSON 对象,并判断对象的 `command` 字段值是否为 `connect`,如果是则连接 TCP Socket 服务器
### 字幕识别
- 字幕引擎进程:在主线程监听系统音频输出,并将音频数据块发送给字幕引擎解析,字幕引擎解析音频数据块,通过标准输出发送解析的字幕数据对象字符串
- 主进程:继续监听字幕引擎的标准输出,并根据解析的对象的 `command` 字段采取不同的操作
### 关闭引擎
- 主进程:当用户在前端操作关闭字幕引擎时,主进程通过 Socket 通信给字幕引擎进程发送 `command` 字段为 `stop` 的对象字符串
- 字幕引擎进程:接收主进程发送的对象字符串,将字符串解析为对象,如果对象的 `command` 字段为 `stop`,则将全局变量 `thread_data.status` 的值设置为 `stop`
- 字幕引擎进程:主线程循环监听系统音频输出,当 `thread_data.status` 的值不为 `running` 时,则结束循环,释放资源,结束运行
- 主进程:如果检测到字幕引擎进程结束,进行相应处理,并向前端反馈
## 项目已经实现的功能
以下功能已经实现,可以直接复用。
### 标准输出
可以输出普通信息、命令和错误信息。
样例:
```python
from utils import stdout, stdout_cmd, stdout_obj, stderr
stdout("Hello") # {"command": "print", "content": "Hello"}\n
stdout_cmd("connect", "8080") # {"command": "connect", "content": "8080"}\n
stdout_obj({"command": "print", "content": "Hello"})
stderr("Error Info")
```
### 创建 Socket 服务
该 Socket 服务会监听指定端口,会解析 Electron 主程序发送的内容,并可能改变 `thread_data.status` 的值。
样例:
```python
from utils import start_server
from utils import thread_data
port = 8080
start_server(port)
while thread_data.status == 'running':
# do something
pass
```
### 音频获取
`AudioStream` 类用于获取音频数据,实现是跨平台的,支持 Windows、Linux 和 macOS。该类初始化包含两个参数
- `audio_type`: 获取音频类型0 表示系统输出音频扬声器1 表示系统输入音频(麦克风)
- `chunk_rate`: 音频数据获取频率,每秒音频获取的音频块的数量
该类包含三个方法:
- `open_stream()`: 开启音频获取
- `read_chunk() -> bytes`: 读取一个音频块
- `close_stream()`: 关闭音频获取
样例:
```python
from sysaudio import AudioStream
audio_type = 0
chunk_rate = 20
stream = AudioStream(audio_type, chunk_rate)
stream.open_stream()
while True:
data = stream.read_chunk()
# do something with data
pass
stream.close_stream()
```
### 音频处理
获取到的音频流在转文字之前可能需要进行预处理。一般需要将多通道音频转换为单通道音频,还可能需要进行重采样。本项目提供了三个音频处理函数:
- `merge_chunk_channels(chunk: bytes, channels: int) -> bytes`:将多通道音频块转换为单通道音频块
- `resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes`:将当前多通道音频数据块转换成单通道音频数据块,然后进行重采样
- `resample_mono_chunk(chunk: bytes, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes`:将当前单通道音频块进行重采样
## 字幕引擎需要实现的功能
### 音频转文字
在得到了合适的音频流后,就需要将音频流转换为文字。一般使用各种模型(云端或本地)来实现音频流转文字,需要根据需求选择合适的模型。
这部分建议封装为一个类,需要实现三个方法:
- `start(self)`:启动模型
- `send_audio_frame(self, data: bytes)`:处理当前音频块数据,**生成的字幕数据通过标准输出发送给 Electron 主进程**
- `stop(self)`:停止模型
完整的字幕引擎实例如下:
- [gummy.py](../../engine/audio2text/gummy.py)
- [vosk.py](../../engine/audio2text/vosk.py)
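一个满足上述接口的最小骨架如下(示意代码:其中模型调用部分为假设,需替换为实际的识别逻辑;时间格式也仅为示例):
```python
from utils import stdout_obj

class MyRecognizer:
    """自定义音频转文字类的最小骨架(完整实现参考 gummy.py / vosk.py"""
    def __init__(self):
        self.index = 0

    def start(self):
        # 在这里加载 / 连接实际的语音识别模型
        pass

    def send_audio_frame(self, data: bytes):
        # 假设:调用实际模型,得到当前音频块对应的文本
        text = ""
        self.index += 1
        stdout_obj({
            "command": "caption",
            "index": self.index,
            "time_s": "00:00:00.000",  # 示例格式,实际应记录真实时间
            "time_t": "00:00:00.000",
            "text": text,
            "translation": ""
        })

    def stop(self):
        # 释放模型资源
        pass
```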
### 字幕翻译
有的语音转文字模型并不提供翻译,如果有需求,需要再添加一个翻译模块。
### 字幕数据发送
在获取到当前音频流的文字后,需要将文字发送给主程序。字幕引擎进程通过标准输出将字幕数据传递给 Electron 主进程。
传递的内容必须是 JSON 字符串,其中 JSON 对象需要包含的参数如下:
```typescript
export interface CaptionItem {
command: "caption",
index: number, // 字幕序号
time_s: string, // 当前字幕开始时间
time_t: string, // 当前字幕结束时间
text: string, // 字幕内容
translation: string // 字幕翻译
}
```
**注意必须确保每输出一次字幕 JSON 数据就得刷新缓冲区,确保 electron 主进程每次接收到的字符串都可以被解释为 JSON 对象。**
建议使用项目已经实现的 `stdout_obj` 函数来发送。
### 命令行参数的指定
自定义字幕引擎的设置提供命令行参数指定,因此需要设置好字幕引擎的参数,本项目目前用到的参数如下:
```python
import argparse
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Convert system audio stream to text')
# both
parser.add_argument('-e', '--caption_engine', default='gummy', help='Caption engine: gummy or vosk')
parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output, 1 for input')
parser.add_argument('-c', '--chunk_rate', default=20, help='Number of audio stream chunks collected per second')
parser.add_argument('-p', '--port', default=8080, help='The port to run the server on, 0 for no server')
# gummy only
parser.add_argument('-s', '--source_language', default='en', help='Source language code')
parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
# vosk only
parser.add_argument('-m', '--model_path', default='', help='The path to the vosk model.')
```
比如对于本项目的字幕引擎,我想使用 Gummy 模型,指定原文为日语,翻译为中文,获取系统音频输出的字幕,每次截取 0.1s 的音频数据,那么命令行参数如下:
```bash
python main.py -e gummy -s ja -t zh -a 0 -c 10 -k <dashscope-api-key>
```
## 其他
### 通信规范
[caption engine api-doc](../api-docs/caption-engine.md)
### 程序入口
[main.py](../../engine/main.py)
### 开发建议
除音频转文字外,其他建议直接复用本项目代码。如果这样,那么需要添加的内容为:
- `engine/audio2text/`:添加新的音频转文字类(文件级别)
- `engine/main.py`:添加新参数设置、流程函数(参考 `main_gummy` 函数和 `main_vosk` 函数)
### 打包
在完成字幕引擎的开发和测试后,需要将字幕引擎打包成可执行文件,一般使用 `pyinstaller` 进行打包(例如 `pyinstaller main.spec`,具体以项目中的 spec 文件为准)。如果打包好的可执行文件运行报错,可能是打包时漏掉了某些依赖库,请检查并补全缺失的依赖。
### 运行
有了可以使用的字幕引擎,就可以在字幕软件窗口中通过指定字幕引擎的路径和字幕引擎的运行指令(参数)来启动字幕引擎了。
![](../img/02_zh.png)

BIN
docs/img/01.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 57 KiB

BIN
docs/img/02_en.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 105 KiB

BIN
docs/img/02_ja.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 132 KiB

BIN
docs/img/02_zh.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 111 KiB

BIN
docs/img/03.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 152 KiB

BIN
docs/img/04.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 172 KiB

BIN
docs/img/05.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 26 KiB

128
docs/user-manual/en.md Normal file
View File

@@ -0,0 +1,128 @@
# Auto Caption User Manual
Corresponding Version: v0.6.0
**Note: Due to limited personal resources, the English and Japanese documentation files for this project (except for the README document) will no longer be maintained. The content of this document may not be consistent with the latest version of the project. If you are willing to help with translation, please submit relevant Pull Requests.**
## Software Introduction
Auto Caption is a cross-platform caption display software that can real-time capture system audio input (recording) or output (playback) streaming data and use an audio-to-text model to generate captions for the corresponding audio. The default caption engine provided by the software (using Alibaba Cloud Gummy model) supports recognition and translation in nine languages (Chinese, English, Japanese, Korean, German, French, Russian, Spanish, Italian).
The default caption engine currently has full functionality on Windows, macOS, and Linux platforms. Additional configuration is required to capture system audio output on macOS.
The following operating system versions have been tested and confirmed to work properly. The software cannot guarantee normal operation on untested OS versions.
| OS Version | Architecture | Audio Input Capture | Audio Output Capture |
| ------------------ | ------------ | ------------------- | -------------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ Additional config required | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
![](../../assets/media/main_en.png)
### Software Limitations
To use the Gummy caption engine, you need to obtain an API KEY from Alibaba Cloud.
Additional configuration is required to capture audio output on macOS platform.
The software is built with Electron, so its package size is inevitably large.
## Preparation for Using Gummy Engine
To use the default caption engine provided by the software (Alibaba Cloud Gummy), you need to obtain an API KEY from the Alibaba Cloud Bailian platform. Then add the API KEY to the software settings or configure it in environment variables (only Windows platform supports reading API KEY from environment variables).
**The international version of Alibaba Cloud services does not provide the Gummy model, so non-Chinese users currently cannot use the default caption engine.**
Alibaba Cloud provides detailed tutorials for this part, which can be referenced:
- [Obtaining API KEY (Chinese)](https://help.aliyun.com/zh/model-studio/get-api-key)
- [Configuring API Key through Environment Variables (Chinese)](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
## Preparation for Using Vosk Engine
To use the Vosk local caption engine, first download your required model from the [Vosk Models](https://alphacephei.com/vosk/models) page. Then extract the downloaded model package locally and add the corresponding model folder path to the software settings. Currently, the Vosk caption engine does not support translated caption content.
![](../../assets/media/vosk_en.png)
## Capturing System Audio Output on macOS
> Based on the [Setup Multi-Output Device](https://github.com/ExistentialAudio/BlackHole/wiki/Multi-Output-Device) tutorial
The caption engine cannot directly capture system audio output on macOS platform and requires additional driver installation. The current caption engine uses [BlackHole](https://github.com/ExistentialAudio/BlackHole). First open Terminal and execute one of the following commands (recommended to choose the first one):
```bash
brew install blackhole-2ch
brew install blackhole-16ch
brew install blackhole-64ch
```
![](../img/03.png)
After installation completes, open `Audio MIDI Setup` (searchable via `cmd + space`). Check if BlackHole appears in the device list - if not, restart your computer.
![](../img/04.png)
Once BlackHole is confirmed installed, in the `Audio MIDI Setup` page, click the plus (+) button at bottom left and select "Create Multi-Output Device". Include both BlackHole and your desired audio output destination in the outputs. Finally, set this multi-output device as your default audio output device.
![](../img/05.png)
Now the caption engine can capture system audio output and generate captions.
## Getting System Audio Output on Linux
First execute in the terminal:
```bash
pactl list short sources
```
If you see output similar to the following, no additional configuration is needed:
```bash
220 alsa_output.pci-0000_02_02.0.3.analog-stereo.monitor PipeWire s16le 2ch 48000Hz SUSPENDED
221 alsa_input.pci-0000_02_02.0.3.analog-stereo PipeWire s16le 2ch 48000Hz SUSPENDED
```
Otherwise, install `pulseaudio` and `pavucontrol` using the following commands:
```bash
# For Debian/Ubuntu etc.
sudo apt install pulseaudio pavucontrol
# For CentOS etc.
sudo yum install pulseaudio pavucontrol
```
## Software Usage
### Modifying Settings
Caption settings can be divided into three categories: general settings, caption engine settings, and caption style settings. Note that changes to general settings take effect immediately. For the other two categories, after making changes, you need to click the "Apply" option in the upper right corner of the corresponding settings module for the changes to take effect. If you click "Cancel Changes," the current modifications will not be saved and will revert to the previous state.
### Starting and Stopping Captions
After completing all configurations, click the "Start Caption Engine" button on the interface to start the captions. If you need a separate caption display window, click the "Open Caption Window" button to activate the independent caption display window. To pause caption recognition, click the "Stop Caption Engine" button.
### Adjusting the Caption Display Window
The following image shows the caption display window, which displays the latest captions in real-time. The three buttons in the upper right corner of the window have the following functions: pin the window to the front, open the caption control window, and close the caption display window. The width of the window can be adjusted by moving the mouse to the left or right edge of the window and dragging the mouse.
![](../img/01.png)
### Exporting Caption Records
In the caption control window, you can see the records of all collected captions. Click the "Export Log" button to export the caption records as a JSON or SRT file.
## Caption Engine
The so-called caption engine is essentially a subprogram that captures real-time streaming data from system audio input (recording) or output (playback), and invokes speech-to-text models to generate corresponding captions. The generated captions are converted into JSON-formatted strings and passed to the main program through standard output. The main program reads the caption data, processes it, and displays it in the window.
The software provides two default caption engines. If you need other caption engines, you can invoke them by enabling the custom engine option (other engines need to be specifically developed for this software). The engine path refers to the location of the custom caption engine on your computer, while the engine command represents the runtime parameters of the custom caption engine, which should be configured according to the rules of that particular caption engine.
![](../img/02_en.png)
Note that when using a custom caption engine, all previous caption engine settings will be ineffective, and the configuration of the custom caption engine is entirely done through the engine command.
If you are a developer and want to develop a custom caption engine, please refer to the [Caption Engine Explanation Document](../engine-manual/en.md).
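For orientation, a custom engine is at its core just a program that prints one JSON object per line to standard output. A minimal Python sketch follows (illustration only; a real engine must also implement the connect/stop handshake described in the engine documentation):
```python
# Minimal sketch of a custom caption engine: emit one JSON caption per line.
import json
import sys
import time

for i in range(3):
    caption = {
        "command": "caption",
        "index": i,
        "time_s": "00:00:00.000",
        "time_t": "00:00:01.000",
        "text": f"example caption {i}",
        "translation": ""
    }
    sys.stdout.write(json.dumps(caption) + "\n")
    sys.stdout.flush()  # flush so the main process can parse each line as JSON
    time.sleep(1)
```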

131
docs/user-manual/ja.md Normal file
View File

@@ -0,0 +1,131 @@
# Auto Caption ユーザーマニュアル
対応バージョン:v0.6.0
この文書は大規模モデルを使用して翻訳されていますので、内容に正確でない部分があるかもしれません。
**注意:個人のリソースが限られているため、このプロジェクトの英語および日本語のドキュメント(README ドキュメントを除く)のメンテナンスは行われません。このドキュメントの内容は最新版のプロジェクトと一致しない場合があります。翻訳のお手伝いをしていただける場合は、関連するプルリクエストを提出してください。**
## ソフトウェアの概要
Auto Caption は、クロスプラットフォームの字幕表示ソフトウェアで、システムの音声入力(録音)または出力(音声再生)のストリーミングデータをリアルタイムで取得し、音声からテキストに変換するモデルを利用して対応する音声の字幕を生成します。このソフトウェアが提供するデフォルトの字幕エンジン(アリババクラウド Gummy モデルを使用)は、9つの言語(中国語、英語、日本語、韓国語、ドイツ語、フランス語、ロシア語、スペイン語、イタリア語)の認識と翻訳をサポートしています。
現在のデフォルト字幕エンジンは Windows、macOS、Linux プラットフォームで完全な機能を有しています。macOSでシステムのオーディオ出力を取得するには追加設定が必要です。
以下のオペレーティングシステムバージョンで正常動作を確認しています。記載以外の OS での正常動作は保証できません。
| OS バージョン | アーキテクチャ | オーディオ入力取得 | オーディオ出力取得 |
| ------------------- | ------------- | ------------------ | ------------------ |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ 追加設定が必要 | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
![](../../assets/media/main_ja.png)
### ソフトウェアの欠点
Gummy 字幕エンジンを使用するには、アリババクラウドの API KEY を取得する必要があります。
macOS プラットフォームでオーディオ出力を取得するには追加の設定が必要です。
ソフトウェアは Electron で構築されているため、そのサイズは避けられないほど大きいです。
## Gummyエンジン使用前の準備
ソフトウェアが提供するデフォルトの字幕エンジン(Alibaba Cloud Gummy)を使用するには、Alibaba Cloud 百煉プラットフォームから API KEY を取得する必要があります。その後、API KEY をソフトウェア設定に追加するか、環境変数に設定します(Windows プラットフォームのみ環境変数からの API KEY 読み取りをサポート)。
**Alibaba Cloudの国際版サービスではGummyモデルを提供していないため、現在中国以外のユーザーはデフォルトの字幕エンジンを使用できません。**
この部分についてAlibaba Cloudは詳細なチュートリアルを提供しており、以下を参照できます
- [API KEY の取得(中国語)](https://help.aliyun.com/zh/model-studio/get-api-key)
- [環境変数を通じて API Key を設定(中国語)](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
## Voskエンジン使用前の準備
Voskローカル字幕エンジンを使用するには、まず[Vosk Models](https://alphacephei.com/vosk/models)ページから必要なモデルをダウンロードしてください。その後、ダウンロードしたモデルパッケージをローカルに解凍し、対応するモデルフォルダのパスをソフトウェア設定に追加します。現在、Vosk字幕エンジンは字幕の翻訳をサポートしていません。
![](../../assets/media/vosk_ja.png)
## macOS でのシステムオーディオ出力の取得方法
> [マルチ出力デバイスの設定](https://github.com/ExistentialAudio/BlackHole/wiki/Multi-Output-Device) チュートリアルに基づいて作成
字幕エンジンは macOS プラットフォームで直接システムオーディオ出力を取得できず、追加のドライバーインストールが必要です。現在の字幕エンジンでは [BlackHole](https://github.com/ExistentialAudio/BlackHole) を使用しています。まずターミナルを開き、以下のいずれかのコマンドを実行してください(最初のオプションを推奨します):
```bash
brew install blackhole-2ch
brew install blackhole-16ch
brew install blackhole-64ch
```
![](../img/03.png)
インストール完了後、`オーディオ MIDI 設定`(`cmd + space` で検索可能)を開きます。デバイスリストに BlackHole が表示されているか確認してください。表示されていない場合はコンピュータを再起動してください。
![](../img/04.png)
BlackHoleのインストールが確認できたら、`オーディオ MIDI 設定`ページで左下のプラス(+)ボタンをクリックし、「マルチ出力デバイスを作成」を選択します。出力に BlackHole と希望するオーディオ出力先の両方を含めてください。最後に、このマルチ出力デバイスをデフォルトのオーディオ出力デバイスに設定します。
![](../img/05.png)
これで字幕エンジンがシステムオーディオ出力をキャプチャし、字幕を生成できるようになります。
## Linux でシステムオーディオ出力を取得する
まずターミナルで以下を実行してください:
```bash
pactl list short sources
```
以下のような出力が確認できれば追加設定は不要です:
```bash
220 alsa_output.pci-0000_02_02.0.3.analog-stereo.monitor PipeWire s16le 2ch 48000Hz SUSPENDED
221 alsa_input.pci-0000_02_02.0.3.analog-stereo PipeWire s16le 2ch 48000Hz SUSPENDED
```
それ以外の場合は、以下のコマンドで `pulseaudio` と `pavucontrol` をインストールしてください:
```bash
# Debian/Ubuntu系の場合
sudo apt install pulseaudio pavucontrol
# CentOS系の場合
sudo yum install pulseaudio pavucontrol
```
## ソフトウェアの使い方
### 設定の変更
字幕の設定は3つのカテゴリーに分かれます:一般的な設定、字幕エンジンの設定、字幕スタイルの設定。注意すべき点として、一般的な設定の変更は即座に適用されます。しかし、他の2つの設定については、変更後に該当する設定モジュール右上の「適用」オプションをクリックすることで初めて変更が有効になります。「変更を取り消す」を選択すると、現在の変更は保存されず、前回の状態に戻ります。
### 字幕の開始と停止
すべての設定を完了したら、インターフェースの「字幕エンジンを開始」ボタンをクリックして字幕を開始できます。独立した字幕表示ウィンドウが必要な場合は、インターフェースの「字幕ウィンドウを開く」ボタンをクリックして独立した字幕表示ウィンドウをアクティブ化します。字幕認識を一時停止する必要がある場合は、「字幕エンジンを停止」ボタンをクリックします。
### 字幕表示ウィンドウの調整
下の図は字幕表示ウィンドウです。このウィンドウは現在の最新の字幕をリアルタイムで表示します。ウィンドウの右上にある3つのボタンの機能はそれぞれ次の通りです:ウィンドウを最前面に固定する、字幕制御ウィンドウを開く、字幕表示ウィンドウを閉じる。このウィンドウの幅は調整可能です。マウスをウィンドウの左右の端に移動し、ドラッグして幅を調整します。
![](../img/01.png)
### 字幕記録のエクスポート
「エクスポート」ボタンをクリックすると、字幕記録を JSON または SRT ファイル形式で出力できます。
## 字幕エンジン
字幕エンジンとは、システムのオーディオ入力(録音)または出力(再生音)のストリーミングデータをリアルタイムで取得し、音声テキスト変換モデルを呼び出して対応する字幕を生成するサブプログラムです。生成された字幕は JSON 形式の文字列に変換され、標準出力を通じてメインプログラムに渡されます。メインプログラムは字幕データを読み取り、処理した後、ウィンドウに表示します。
ソフトウェアには2つのデフォルトの字幕エンジンが用意されています。他の字幕エンジンが必要な場合、カスタムエンジンオプションを有効にすることで呼び出すことができます(他のエンジンはこのソフトウェア向けに特別に開発する必要があります)。エンジンパスはコンピュータ上のカスタム字幕エンジンの場所を指し、エンジンコマンドはカスタム字幕エンジンの実行パラメータを表します。これらは該当する字幕エンジンの規則に従って設定する必要があります。
![](../img/02_ja.png)
カスタム字幕エンジンを使用する場合、前の字幕エンジンの設定はすべて無効になります。カスタム字幕エンジンの設定は完全にエンジンコマンドによって行われます。
開発者の方で、カスタム字幕エンジンを開発したい場合は、[字幕エンジン説明文書](../engine-manual/ja.md)をご覧ください。

126
docs/user-manual/zh.md Normal file
View File

@@ -0,0 +1,126 @@
# Auto Caption 用户手册
对应版本:v0.6.0
## 软件简介
Auto Caption 是一个跨平台的字幕显示软件,能够实时获取系统音频输入(录音)或输出(播放声音)的流式数据,并调用音频转文字的模型生成对应音频的字幕。软件提供的默认字幕引擎(使用阿里云 Gummy 模型)支持九种语言(中、英、日、韩、德、法、俄、西、意)的识别与翻译。
目前软件默认字幕引擎在 Windows、macOS 和 Linux 平台下均拥有完整功能,在 macOS 上获取系统音频输出需要额外配置。
测试过可正常运行的操作系统信息如下,软件不能保证在非下列版本的操作系统上正常运行。
| 操作系统版本 | 处理器架构 | 获取系统音频输入 | 获取系统音频输出 |
| ------------------ | ---------- | ---------------- | ---------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅需要额外配置 | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
![](../../assets/media/main_zh.png)
### 软件缺点
要使用默认的 Gummy 字幕引擎需要获取阿里云的 API KEY。
在 macOS 平台获取音频输出需要额外配置。
软件使用 Electron 构建,因此软件体积不可避免的较大。
## Gummy 引擎使用前准备
要使用软件提供的默认字幕引擎(阿里云 Gummy),需要从阿里云百炼平台获取 API KEY,然后将 API KEY 添加到软件设置中或者配置到环境变量中(仅 Windows 平台支持读取环境变量中的 API KEY)。
**国际版的阿里云服务并没有提供 Gummy 模型,因此目前非中国用户无法使用默认字幕引擎。**
这部分阿里云提供了详细的教程,可参考:
- [获取 API KEY](https://help.aliyun.com/zh/model-studio/get-api-key)
- [将 API Key 配置到环境变量](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
## Vosk 引擎使用前准备
如果要使用 Vosk 本地字幕引擎,首先需要在 [Vosk Models](https://alphacephei.com/vosk/models) 页面下载你需要的模型。然后将下载的模型安装包解压到本地,并将对应的模型文件夹的路径添加到软件的设置中。目前 Vosk 字幕引擎还不支持翻译字幕内容。
![](../../assets/media/vosk_zh.png)
## macOS 获取系统音频输出
> 基于 [Setup Multi-Output Device](https://github.com/ExistentialAudio/BlackHole/wiki/Multi-Output-Device) 教程编写
字幕引擎无法在 macOS 平台直接获取系统的音频输出,需要安装额外的驱动。目前字幕引擎采用的是 [BlackHole](https://github.com/ExistentialAudio/BlackHole)。首先打开终端,执行以下命令中的其中一个(建议选择第一个):
```bash
brew install blackhole-2ch
brew install blackhole-16ch
brew install blackhole-64ch
```
![](../img/03.png)
安装完成后打开 `音频 MIDI 设置`(`cmd + space` 打开搜索,可以搜索到)。观察设备列表中是否有 BlackHole 设备,如果没有需要重启电脑。
![](../img/04.png)
在确定安装好 BlackHole 设备后,在 `音频 MIDI 设置` 页面,点击左下角的加号,选择“创建多输出设备”。在输出中包含 BlackHole 和你想要的音频输出目标。最后将该多输出设备设置为默认音频输出设备。
![](../img/05.png)
现在字幕引擎就能捕获系统的音频输出并生成字幕了。
## Linux 获取系统音频输出
首先在控制台执行:
```bash
pactl list short sources
```
如果有以下类似的输出内容则无需额外配置:
```bash
220 alsa_output.pci-0000_02_02.0.3.analog-stereo.monitor PipeWire s16le 2ch 48000Hz SUSPENDED
221 alsa_input.pci-0000_02_02.0.3.analog-stereo PipeWire s16le 2ch 48000Hz SUSPENDED
```
否则,执行以下命令安装 `pulseaudio``pavucontrol`
```bash
# Debian or Ubuntu, etc.
sudo apt install pulseaudio pavucontrol
# CentOS, etc.
sudo yum install pulseaudio pavucontrol
```
## 软件使用
### 修改设置
字幕设置可以分为三类:通用设置、字幕引擎设置、字幕样式设置。需要注意的是,修改通用设置是立即生效的。但是对于其他两类设置,修改后需要点击对应设置模块右上角的“应用”选项,更改才会真正生效。如果点击“取消更改”那么当前修改将不会被保存,而是回退到上次修改的状态。
### 启动和关闭字幕
在修改完全部配置后,点击界面的“启动字幕引擎”按钮,即可启动字幕。如果需要独立的字幕展示窗口,单击界面的“打开字幕窗口”按钮即可激活独立的字幕展示窗口。如果需要暂停字幕识别,单击界面的“关闭字幕引擎”按钮即可。
### 调整字幕展示窗口
如下图为字幕展示窗口,该窗口实时展示当前最新字幕。窗口右上角三个按钮的功能分别是:将窗口固定在最前面、打开字幕控制窗口、关闭字幕展示窗口。该窗口宽度可以调整,将鼠标移动至窗口的左右边缘,拖动鼠标即可调整宽度。
![](../img/01.png)
### 字幕记录的导出
在字幕控制窗口中可以看到当前收集的所有字幕的记录,点击“导出字幕”按钮,即可将字幕记录导出为 JSON 或 SRT 文件。
## 字幕引擎
所谓的字幕引擎实际上是一个子程序,它会实时获取系统音频输入(录音)或输出(播放声音)的流式数据,并调用音频转文字的模型生成对应音频的字幕。生成的字幕转换为 JSON 格式的字符串数据,并通过标准输出传递给主程序。主程序读取字幕数据,处理后显示在窗口上。
软件提供了两个默认的字幕引擎,如果你需要其他的字幕引擎,可以通过打开自定义引擎选项来调用其他字幕引擎(其他引擎需要针对该软件进行开发)。其中引擎路径是自定义字幕引擎在你的电脑上的路径,引擎指令是自定义字幕引擎的运行参数,这部分需要按该字幕引擎的规则进行填写。
![](../img/02_zh.png)
注意使用自定义字幕引擎时,前面的字幕引擎的设置将全部不起作用,自定义字幕引擎的配置完全通过引擎指令进行配置。
如果你是开发者,想开发自定义字幕引擎,请查看[字幕引擎说明文档](../engine-manual/zh.md)。

View File

@@ -6,17 +6,23 @@ files:
- '!**/.vscode/*'
- '!src/*'
- '!electron.vite.config.{js,ts,mjs,cjs}'
- '!{.eslintcache,eslint.config.mjs,.prettierignore,.prettierrc.yaml,dev-app-update.yml,CHANGELOG.md,README.md}'
- '!{.eslintcache,eslint.config.mjs,.prettierignore,.prettierrc.yaml,dev-app-update.yml,CHANGELOG.md}'
- '!{LICENSE,README.md,README_en.md,README_ja.md}'
- '!{.env,.env.*,.npmrc,pnpm-lock.yaml}'
- '!{tsconfig.json,tsconfig.node.json,tsconfig.web.json}'
- '!engine/*'
- '!docs/*'
- '!assets/*'
extraResources:
from: ./python-subprocess/dist/main-gummy.exe
to: ./python-subprocess/dist/main-gummy.exe
asarUnpack:
- resources/**
# For Windows
- from: ./engine/dist/main.exe
to: ./engine/main.exe
# For macOS and Linux
# - from: ./engine/dist/main
# to: ./engine/main
win:
executableName: auto-caption
icon: resources/icon.png
icon: build/icon.png
nsis:
artifactName: ${name}-${version}-setup.${ext}
shortcutName: ${productName}

View File

@@ -0,0 +1,3 @@
from dashscope.common.error import InvalidParameter
from .gummy import GummyRecognizer
from .vosk import VoskRecognizer

102
engine/audio2text/gummy.py Normal file
View File

@@ -0,0 +1,102 @@
from dashscope.audio.asr import (
TranslationRecognizerCallback,
TranscriptionResult,
TranslationResult,
TranslationRecognizerRealtime
)
import dashscope
from datetime import datetime
from utils import stdout_cmd, stdout_obj, stderr
class Callback(TranslationRecognizerCallback):
"""
语音大模型流式传输回调对象
"""
def __init__(self):
super().__init__()
self.index = 0
self.usage = 0
self.cur_id = -1
self.time_str = ''
def on_open(self) -> None:
self.usage = 0
self.cur_id = -1
self.time_str = ''
stdout_cmd('info', 'Gummy translator started.')
def on_close(self) -> None:
stdout_cmd('info', 'Gummy translator closed.')
stdout_cmd('usage', str(self.usage))
def on_event(
self,
request_id,
transcription_result: TranscriptionResult,
translation_result: TranslationResult,
usage
) -> None:
caption = {}
if transcription_result is not None:
if self.cur_id != transcription_result.sentence_id:
self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
self.cur_id = transcription_result.sentence_id
self.index += 1
caption['command'] = 'caption'
caption['index'] = self.index
caption['time_s'] = self.time_str
caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
caption['text'] = transcription_result.text
caption['translation'] = ""
if translation_result is not None:
lang = translation_result.get_language_list()[0]
caption['translation'] = translation_result.get_translation(lang).text
if usage:
self.usage += usage['duration']
if 'text' in caption:
stdout_obj(caption)
class GummyRecognizer:
"""
使用 Gummy 引擎流式处理音频数据,并在标准输出中输出 Auto Caption 软件可读取的 JSON 字符串数据
初始化参数:
rate: 音频采样率
source: 源语言代码字符串zh, en, ja 等)
target: 目标语言代码字符串zh, en, ja 等)
api_key: 阿里云百炼平台 API KEY
"""
def __init__(self, rate: int, source: str, target: str | None, api_key: str | None):
if api_key:
dashscope.api_key = api_key
self.translator = TranslationRecognizerRealtime(
model = "gummy-realtime-v1",
format = "pcm",
sample_rate = rate,
transcription_enabled = True,
translation_enabled = (target is not None),
source_language = source,
translation_target_languages = [target],
callback = Callback()
)
def start(self):
"""启动 Gummy 引擎"""
self.translator.start()
def send_audio_frame(self, data):
"""发送音频帧,擎将自动识别并将识别结果输出到标准输出中"""
self.translator.send_audio_frame(data)
def stop(self):
"""停止 Gummy 引擎"""
try:
self.translator.stop()
except Exception:
return

68
engine/audio2text/vosk.py Normal file
View File

@@ -0,0 +1,68 @@
import json
from datetime import datetime
from vosk import Model, KaldiRecognizer, SetLogLevel
from utils import stdout_cmd, stdout_obj
class VoskRecognizer:
"""
使用 Vosk 引擎流式处理音频数据,并在标准输出中输出 Auto Caption 软件可读取的 JSON 字符串数据
初始化参数:
model_path: Vosk 识别模型路径
"""
def __init__(self, model_path: str):
SetLogLevel(-1)
if model_path.startswith('"'):
model_path = model_path[1:]
if model_path.endswith('"'):
model_path = model_path[:-1]
self.model_path = model_path
self.time_str = ''
self.cur_id = 0
self.prev_content = ''
self.model = Model(self.model_path)
self.recognizer = KaldiRecognizer(self.model, 16000)
def start(self):
"""启动 Vosk 引擎"""
stdout_cmd('info', 'Vosk recognizer started.')
def send_audio_frame(self, data: bytes):
"""
发送音频帧给 Vosk 引擎,引擎将自动识别并将识别结果输出到标准输出中
Args:
data: 音频帧数据,采样率必须为 16000Hz
"""
caption = {}
caption['command'] = 'caption'
caption['translation'] = ''
if self.recognizer.AcceptWaveform(data):
content = json.loads(self.recognizer.Result()).get('text', '')
caption['index'] = self.cur_id
caption['text'] = content
caption['time_s'] = self.time_str
caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
self.prev_content = ''
self.cur_id += 1
else:
content = json.loads(self.recognizer.PartialResult()).get('partial', '')
if content == '' or content == self.prev_content:
return
if self.prev_content == '':
self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
caption['index'] = self.cur_id
caption['text'] = content
caption['time_s'] = self.time_str
caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
self.prev_content = content
stdout_obj(caption)
def stop(self):
"""停止 Vosk 引擎"""
stdout_cmd('info', 'Vosk recognizer closed.')

103
engine/main.py Normal file
View File

@@ -0,0 +1,103 @@
import argparse
from utils import stdout_cmd, stderr
from utils import thread_data, start_server
from utils import merge_chunk_channels, resample_chunk_mono
from audio2text import InvalidParameter, GummyRecognizer
from audio2text import VoskRecognizer
from sysaudio import AudioStream
def main_gummy(s: str, t: str, a: int, c: int, k: str):
global thread_data
stream = AudioStream(a, c)
if t == 'none':
engine = GummyRecognizer(stream.RATE, s, None, k)
else:
engine = GummyRecognizer(stream.RATE, s, t, k)
stream.open_stream()
engine.start()
restart_count = 0
while thread_data.status == "running":
try:
chunk = stream.read_chunk()
if chunk is None: continue
chunk_mono = merge_chunk_channels(chunk, stream.CHANNELS)
try:
engine.send_audio_frame(chunk_mono)
except InvalidParameter as e:
restart_count += 1
if restart_count > 8:
stderr(str(e))
thread_data.status = "kill"
break
else:
stdout_cmd('info', f'Gummy engine stopped, trying to restart #{restart_count}')
except KeyboardInterrupt:
break
stream.close_stream()
engine.stop()
def main_vosk(a: int, c: int, m: str):
global thread_data
stream = AudioStream(a, c)
engine = VoskRecognizer(m)
stream.open_stream()
engine.start()
while thread_data.status == "running":
try:
chunk = stream.read_chunk()
if chunk is None: continue
chunk_mono = resample_chunk_mono(chunk, stream.CHANNELS, stream.RATE, 16000)
engine.send_audio_frame(chunk_mono)
except KeyboardInterrupt:
break
stream.close_stream()
engine.stop()
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Convert system audio stream to text')
# both
parser.add_argument('-e', '--caption_engine', default='gummy', help='Caption engine: gummy or vosk')
parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output, 1 for input')
parser.add_argument('-c', '--chunk_rate', default=20, help='Number of audio stream chunks collected per second')
parser.add_argument('-p', '--port', default=8080, help='The port to run the server on, 0 for no server')
# gummy only
parser.add_argument('-s', '--source_language', default='en', help='Source language code')
parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
# vosk only
parser.add_argument('-m', '--model_path', default='', help='The path to the vosk model.')
args = parser.parse_args()
if int(args.port) == 0:
thread_data.status = "running"
else:
start_server(int(args.port))
if args.caption_engine == 'gummy':
main_gummy(
args.source_language,
args.target_language,
int(args.audio_type),
int(args.chunk_rate),
args.api_key
)
elif args.caption_engine == 'vosk':
main_vosk(
int(args.audio_type),
int(args.chunk_rate),
args.model_path
)
else:
raise ValueError('Invalid caption engine specified.')
if thread_data.status == "kill":
stdout_cmd('kill')

View File

@@ -1,11 +1,18 @@
# -*- mode: python ; coding: utf-8 -*-
from pathlib import Path
import sys
if sys.platform == 'win32':
vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
else:
vosk_path = str(Path('./subenv/lib/python3.12/site-packages/vosk').resolve())
a = Analysis(
['main-gummy.py'],
['main.py'],
pathex=[],
binaries=[],
datas=[],
datas=[(vosk_path, 'vosk')],
hiddenimports=[],
hookspath=[],
hooksconfig={},
@@ -14,6 +21,7 @@ a = Analysis(
noarchive=False,
optimize=0,
)
pyz = PYZ(a.pure)
exe = EXE(
@@ -22,7 +30,7 @@ exe = EXE(
a.binaries,
a.datas,
[],
name='main-gummy',
name='main',
debug=False,
bootloader_ignore_signals=False,
strip=False,
@@ -35,4 +43,5 @@ exe = EXE(
target_arch=None,
codesign_identity=None,
entitlements_file=None,
onefile=True,
)

View File

@@ -0,0 +1,6 @@
dashscope
numpy
samplerate
PyAudio
vosk
pyinstaller

View File

@@ -0,0 +1,5 @@
dashscope
numpy
vosk
pyinstaller
samplerate # pip install samplerate --only-binary=:all:

View File

@@ -0,0 +1,6 @@
dashscope
numpy
samplerate
PyAudioWPatch
vosk
pyinstaller

View File

@@ -0,0 +1,10 @@
import sys
if sys.platform == "win32":
from .win import AudioStream
elif sys.platform == "darwin":
from .darwin import AudioStream
elif sys.platform == "linux":
from .linux import AudioStream
else:
raise NotImplementedError(f"Unsupported platform: {sys.platform}")

102
engine/sysaudio/darwin.py Normal file
View File

@@ -0,0 +1,102 @@
"""获取 MacOS 系统音频输入/输出流"""
import pyaudio
from textwrap import dedent
def get_blackhole_device(mic: pyaudio.PyAudio):
"""
获取 BlackHole 设备
"""
device_count = mic.get_device_count()
for i in range(device_count):
dev_info = mic.get_device_info_by_index(i)
if 'blackhole' in str(dev_info["name"]).lower():
return dev_info
raise Exception("The device containing BlackHole was not found.")
class AudioStream:
"""
获取系统音频流(如果要捕获输出音频,仅支持 BlackHole 作为系统音频输出捕获)
初始化参数:
audio_type: 0-系统音频输出流(需配合 BlackHole1-系统音频输入流
chunk_rate: 每秒采集音频块的数量默认为20
"""
def __init__(self, audio_type=0, chunk_rate=20):
self.audio_type = audio_type
self.mic = pyaudio.PyAudio()
if self.audio_type == 0:
self.device = get_blackhole_device(self.mic)
else:
self.device = self.mic.get_default_input_device_info()
self.stop_signal = False
self.stream = None
self.INDEX = self.device["index"]
self.FORMAT = pyaudio.paInt16
self.SAMP_WIDTH = pyaudio.get_sample_size(self.FORMAT)
self.CHANNELS = int(self.device["maxInputChannels"])
self.RATE = int(self.device["defaultSampleRate"])
self.CHUNK = self.RATE // chunk_rate
def get_info(self):
dev_info = f"""
采样设备:
- 设备类型:{ "音频输出" if self.audio_type == 0 else "音频输入" }
- 设备序号:{self.device['index']}
- 设备名称:{self.device['name']}
- 最大输入通道数:{self.device['maxInputChannels']}
- 默认低输入延迟:{self.device['defaultLowInputLatency']}s
- 默认高输入延迟:{self.device['defaultHighInputLatency']}s
- 默认采样率:{self.device['defaultSampleRate']}Hz
- 是否回环设备:{self.device.get('isLoopbackDevice', False)}
设备序号:{self.INDEX}
样本格式:{self.FORMAT}
样本位宽:{self.SAMP_WIDTH}
样本通道数:{self.CHANNELS}
样本采样率:{self.RATE}
样本块大小:{self.CHUNK}
"""
return dedent(dev_info).strip()
def open_stream(self):
"""
打开并返回系统音频输出流
"""
if self.stream: return self.stream
self.stream = self.mic.open(
format = self.FORMAT,
channels = int(self.CHANNELS),
rate = self.RATE,
input = True,
input_device_index = int(self.INDEX)
)
return self.stream
def read_chunk(self):
"""
读取音频数据
"""
if self.stop_signal:
self.close_stream()
return None
if not self.stream: return None
return self.stream.read(self.CHUNK, exception_on_overflow=False)
def close_stream_signal(self):
"""
线程安全的关闭系统音频输入流,不一定会立即关闭
"""
self.stop_signal = True
def close_stream(self):
"""
立即关闭系统音频输入流
"""
if self.stream is not None:
self.stream.stop_stream()
self.stream.close()
self.stream = None
self.stop_signal = False

108
engine/sysaudio/linux.py Normal file
View File

@@ -0,0 +1,108 @@
"""获取 Linux 系统音频输入流"""
import subprocess
from textwrap import dedent
def find_monitor_source():
result = subprocess.run(
["pactl", "list", "short", "sources"],
stdout=subprocess.PIPE, text=True
)
lines = result.stdout.splitlines()
for line in lines:
parts = line.split('\t')
if len(parts) >= 2 and ".monitor" in parts[1]:
return parts[1]
raise RuntimeError("System output monitor device not found")
def find_input_source():
result = subprocess.run(
["pactl", "list", "short", "sources"],
stdout=subprocess.PIPE, text=True
)
lines = result.stdout.splitlines()
for line in lines:
parts = line.split('\t')
# 与 find_monitor_source 一致,先检查字段数量,避免空行导致 IndexError
if len(parts) >= 2 and ".monitor" not in parts[1]:
return parts[1]
raise RuntimeError("Microphone input device not found")
class AudioStream:
"""
获取系统音频流
初始化参数:
audio_type: 0-系统音频输出流,1-系统音频输入流(默认)
chunk_rate: 每秒采集音频块的数量默认为20
"""
def __init__(self, audio_type=1, chunk_rate=20):
self.audio_type = audio_type
if self.audio_type == 0:
self.source = find_monitor_source()
else:
self.source = find_input_source()
self.stop_signal = False
self.process = None
self.FORMAT = 16
self.SAMP_WIDTH = 2
self.CHANNELS = 2
self.RATE = 48000
self.CHUNK = self.RATE // chunk_rate
def get_info(self):
dev_info = f"""
音频捕获进程:
- 捕获类型:{"音频输出" if self.audio_type == 0 else "音频输入"}
- 设备源:{self.source}
- 捕获进程 PID:{self.process.pid if self.process else "None"}
样本格式:{self.FORMAT}
样本位宽:{self.SAMP_WIDTH}
样本通道数:{self.CHANNELS}
样本采样率:{self.RATE}
样本块大小:{self.CHUNK}
"""
return dedent(dev_info).strip()
def open_stream(self):
"""
启动音频捕获进程
"""
self.process = subprocess.Popen(
["parec", "-d", self.source, "--format=s16le", "--rate=48000", "--channels=2"],
stdout=subprocess.PIPE
)
def read_chunk(self):
"""
读取音频数据
"""
if self.stop_signal:
self.close_stream()
return None
if self.process and self.process.stdout:
return self.process.stdout.read(self.CHUNK)
return None
def close_stream_signal(self):
"""
线程安全的关闭系统音频输入流,不一定会立即关闭
"""
self.stop_signal = True
def close_stream(self):
"""
关闭系统音频捕获进程
"""
if self.process:
self.process.terminate()
self.stop_signal = False

View File

@@ -1,15 +1,15 @@
"""获取 Windows 系统音频输出流"""
"""获取 Windows 系统音频输入/输出流"""
import pyaudiowpatch as pyaudio
import numpy as np
from textwrap import dedent
def getDefaultLoopbackDevice(mic: pyaudio.PyAudio, info = True)->dict:
def get_default_loopback_device(mic: pyaudio.PyAudio, info = True)->dict:
"""
获取默认的系统音频输出的回环设备
Args:
mic (pyaudio.PyAudio): pyaudio对象
info (bool, optional): 是否打印设备信息
mic: pyaudio对象
info: 是否打印设备信息
Returns:
dict: 系统音频输出的回环设备
@@ -35,75 +35,57 @@ def getDefaultLoopbackDevice(mic: pyaudio.PyAudio, info = True)->dict:
print("Run `python -m pyaudiowpatch` to check available devices.")
print("Exiting...")
exit()
if(info): print(f"Output Stream Device: #{default_speaker['index']} {default_speaker['name']}")
return default_speaker
def mergeStreamChannels(data, channels):
"""
将当前多通道流数据合并为单通道流数据
Args:
data: 多通道数据
channels: 通道数
Returns:
mono_data_bytes: 单通道数据
"""
# (length * channels,)
data_np = np.frombuffer(data, dtype=np.int16)
# (length, channels)
data_np_r = data_np.reshape(-1, channels)
# (length,)
mono_data = np.mean(data_np_r.astype(np.float32), axis=1)
mono_data = mono_data.astype(np.int16)
mono_data_bytes = mono_data.tobytes()
return mono_data_bytes
class AudioStream:
"""
获取系统音频流
参数
audio_type: 默认0-系统音频输出流1-系统音频输入流
初始化参数
audio_type: 0-系统音频输出流默认1-系统音频输入流
chunk_rate: 每秒采集音频块的数量默认为20
"""
def __init__(self, audio_type=0):
def __init__(self, audio_type=0, chunk_rate=20):
self.audio_type = audio_type
self.mic = pyaudio.PyAudio()
if self.audio_type == 0:
self.device = getDefaultLoopbackDevice(self.mic, False)
self.device = get_default_loopback_device(self.mic, False)
else:
self.device = self.mic.get_default_input_device_info()
self.stop_signal = False
self.stream = None
self.SAMP_WIDTH = pyaudio.get_sample_size(pyaudio.paInt16)
self.FORMAT = pyaudio.paInt16
self.CHANNELS = self.device["maxInputChannels"]
self.RATE = int(self.device["defaultSampleRate"])
self.CHUNK = self.RATE // 20
self.INDEX = self.device["index"]
self.FORMAT = pyaudio.paInt16
self.SAMP_WIDTH = pyaudio.get_sample_size(self.FORMAT)
self.CHANNELS = int(self.device["maxInputChannels"])
self.RATE = int(self.device["defaultSampleRate"])
self.CHUNK = self.RATE // chunk_rate
def printInfo(self):
def get_info(self):
dev_info = f"""
采样设备
- 设备类型{ "音频输" if self.audio_type == 0 else "音频输" }
- 序号{self.device['index']}
- 名称{self.device['name']}
- 设备类型{ "音频输" if self.audio_type == 0 else "音频输" }
- 设备序号{self.device['index']}
- 设备名称{self.device['name']}
- 最大输入通道数{self.device['maxInputChannels']}
- 默认低输入延迟{self.device['defaultLowInputLatency']}s
- 默认高输入延迟{self.device['defaultHighInputLatency']}s
- 默认采样率{self.device['defaultSampleRate']}Hz
- 是否回环设备{self.device['isLoopbackDevice']}
音频样本块大小{self.CHUNK}
设备序号{self.INDEX}
样本格式{self.FORMAT}
样本位宽{self.SAMP_WIDTH}
音频数据格式{self.FORMAT}
音频通道数{self.CHANNELS}
音频采样率{self.RATE}
样本通道数{self.CHANNELS}
样本采样率{self.RATE}
样本块大小{self.CHUNK}
"""
print(dev_info)
return dedent(dev_info).strip()
def openStream(self):
def open_stream(self):
"""
打开并返回系统音频输出流
"""
@@ -116,12 +98,29 @@ class AudioStream:
input_device_index = self.INDEX
)
return self.stream
def closeStream(self):
def read_chunk(self) -> bytes | None:
"""
关闭系统音频输出流
读取音频数据
"""
if self.stream is None: return
self.stream.stop_stream()
self.stream.close()
self.stream = None
if self.stop_signal:
self.close_stream()
return None
if not self.stream: return None
return self.stream.read(self.CHUNK, exception_on_overflow=False)
def close_stream_signal(self):
"""
线程安全的关闭系统音频输入流不一定会立即关闭
"""
self.stop_signal = True
def close_stream(self):
"""
关闭系统音频输入流
"""
if self.stream is not None:
self.stream.stop_stream()
self.stream.close()
self.stream = None
self.stop_signal = False

4
engine/utils/__init__.py Normal file
View File

@@ -0,0 +1,4 @@
from .audioprcs import merge_chunk_channels, resample_chunk_mono, resample_mono_chunk
from .sysout import stdout, stdout_cmd, stdout_obj, stderr
from .thdata import thread_data
from .server import start_server

76
engine/utils/audioprcs.py Normal file
View File

@@ -0,0 +1,76 @@
import samplerate
import numpy as np
import numpy.core.multiarray # do not remove
def merge_chunk_channels(chunk: bytes, channels: int) -> bytes:
"""
将当前多通道音频数据块转换为单通道音频数据块
Args:
chunk: 多通道音频数据块
channels: 通道数
Returns:
单通道音频数据块
"""
if channels == 1: return chunk
# (length * channels,)
chunk_np = np.frombuffer(chunk, dtype=np.int16)
# (length, channels)
chunk_np = chunk_np.reshape(-1, channels)
# (length,)
chunk_mono_f = np.mean(chunk_np.astype(np.float32), axis=1)
chunk_mono = np.round(chunk_mono_f).astype(np.int16)
return chunk_mono.tobytes()
def resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes:
"""
将当前多通道音频数据块转换成单通道音频数据块,然后进行重采样
Args:
chunk: 多通道音频数据块
channels: 通道数
orig_sr: 原始采样率
target_sr: 目标采样率
mode: 重采样模式,可选:'sinc_best' | 'sinc_medium' | 'sinc_fastest' | 'zero_order_hold' | 'linear'
Return:
单通道音频数据块
"""
if channels == 1:
chunk_mono = np.frombuffer(chunk, dtype=np.int16)
chunk_mono = chunk_mono.astype(np.float32)
else:
# (length * channels,)
chunk_np = np.frombuffer(chunk, dtype=np.int16)
# (length, channels)
chunk_np = chunk_np.reshape(-1, channels)
# (length,)
chunk_mono = np.mean(chunk_np.astype(np.float32), axis=1)
ratio = target_sr / orig_sr
chunk_mono_r = samplerate.resample(chunk_mono, ratio, converter_type=mode)
chunk_mono_r = np.round(chunk_mono_r).astype(np.int16)
return chunk_mono_r.tobytes()
def resample_mono_chunk(chunk: bytes, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes:
"""
将当前单通道音频块进行重采样
Args:
chunk: 单通道音频数据块
orig_sr: 原始采样率
target_sr: 目标采样率
mode: 重采样模式,可选:'sinc_best' | 'sinc_medium' | 'sinc_fastest' | 'zero_order_hold' | 'linear'
Return:
单通道音频数据块
"""
chunk_np = np.frombuffer(chunk, dtype=np.int16)
chunk_np = chunk_np.astype(np.float32)
ratio = target_sr / orig_sr
chunk_r = samplerate.resample(chunk_np, ratio, converter_type=mode)
chunk_r = np.round(chunk_r).astype(np.int16)
return chunk_r.tobytes()

41
engine/utils/server.py Normal file
View File

@@ -0,0 +1,41 @@
import socket
import threading
import json
from utils import thread_data, stdout_cmd, stderr
def handle_client(client_socket):
global thread_data
while thread_data.status == 'running':
try:
data = client_socket.recv(4096).decode('utf-8')
if not data:
break
data = json.loads(data)
if data['command'] == 'stop':
thread_data.status = 'stop'
break
except Exception as e:
stderr(f'Communication error: {e}')
break
thread_data.status = 'stop'
client_socket.close()
def start_server(port: int):
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
server.bind(('localhost', port))
server.listen(1)
except Exception as e:
stderr(str(e))
stdout_cmd('kill')
return
stdout_cmd('connect')
client, addr = server.accept()
client_handler = threading.Thread(target=handle_client, args=(client,))
client_handler.daemon = True
client_handler.start()

18
engine/utils/sysout.py Normal file
View File

@@ -0,0 +1,18 @@
import sys
import json
def stdout(text: str):
stdout_cmd("print", text)
def stdout_cmd(command: str, content = ""):
msg = { "command": command, "content": content }
sys.stdout.write(json.dumps(msg) + "\n")
sys.stdout.flush()
def stdout_obj(obj):
sys.stdout.write(json.dumps(obj) + "\n")
sys.stdout.flush()
def stderr(text: str):
sys.stderr.write(text + "\n")
sys.stderr.flush()

5
engine/utils/thdata.py Normal file
View File

@@ -0,0 +1,5 @@
class ThreadData:
def __init__(self):
self.status = "running"
thread_data = ThreadData()

1013
package-lock.json generated

File diff suppressed because it is too large Load Diff

View File

@@ -1,6 +1,7 @@
{
"name": "auto-caption",
"version": "0.0.1",
"productName": "Auto Caption",
"version": "0.6.0",
"description": "A cross-platform subtitle display software.",
"main": "./out/main/index.js",
"author": "himeditator",
@@ -24,16 +25,17 @@
"@electron-toolkit/preload": "^3.0.1",
"@electron-toolkit/utils": "^4.0.0",
"ant-design-vue": "^4.2.6",
"pidusage": "^4.0.1",
"pinia": "^3.0.2",
"vue-router": "^4.5.1",
"ws": "^8.18.2"
"vue-i18n": "^11.1.9",
"vue-router": "^4.5.1"
},
"devDependencies": {
"@electron-toolkit/eslint-config-prettier": "3.0.0",
"@electron-toolkit/eslint-config-ts": "^3.0.0",
"@electron-toolkit/tsconfig": "^1.0.1",
"@types/node": "^22.14.1",
"@types/ws": "^8.18.1",
"@types/pidusage": "^2.0.5",
"@vitejs/plugin-vue": "^5.2.3",
"electron": "^35.1.5",
"electron-builder": "^25.1.8",

View File

@@ -1,221 +0,0 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from dashscope.audio.asr import *\n",
"import pyaudiowpatch as pyaudio\n",
"import numpy as np\n",
"\n",
"\n",
"def getDefaultSpeakers(mic: pyaudio.PyAudio, info = True):\n",
" \"\"\"\n",
" 获取默认的系统音频输出的回环设备\n",
" Args:\n",
" mic (pyaudio.PyAudio): pyaudio对象\n",
" info (bool, optional): 是否打印设备信息. Defaults to True.\n",
"\n",
" Returns:\n",
" dict: 统音频输出的回环设备\n",
" \"\"\"\n",
" try:\n",
" WASAPI_info = mic.get_host_api_info_by_type(pyaudio.paWASAPI)\n",
" except OSError:\n",
" print(\"Looks like WASAPI is not available on the system. Exiting...\")\n",
" exit()\n",
"\n",
" default_speaker = mic.get_device_info_by_index(WASAPI_info[\"defaultOutputDevice\"])\n",
" if(info): print(\"wasapi_info:\\n\", WASAPI_info, \"\\n\")\n",
" if(info): print(\"default_speaker:\\n\", default_speaker, \"\\n\")\n",
"\n",
" if not default_speaker[\"isLoopbackDevice\"]:\n",
" for loopback in mic.get_loopback_device_info_generator():\n",
" if default_speaker[\"name\"] in loopback[\"name\"]:\n",
" default_speaker = loopback\n",
" if(info): print(\"Using loopback device:\\n\", default_speaker, \"\\n\")\n",
" break\n",
" else:\n",
" print(\"Default loopback output device not found.\")\n",
" print(\"Run `python -m pyaudiowpatch` to check available devices.\")\n",
" print(\"Exiting...\")\n",
" exit()\n",
" \n",
" if(info): print(f\"Recording Device: #{default_speaker['index']} {default_speaker['name']}\")\n",
" return default_speaker\n",
"\n",
"\n",
"class Callback(TranslationRecognizerCallback):\n",
" \"\"\"\n",
" 语音大模型流式传输回调对象\n",
" \"\"\"\n",
" def __init__(self):\n",
" super().__init__()\n",
" self.usage = 0\n",
" self.sentences = []\n",
" self.translations = []\n",
" \n",
" def on_open(self) -> None:\n",
" print(\"\\n流式翻译开始...\\n\")\n",
"\n",
" def on_close(self) -> None:\n",
" print(f\"\\nTokens消耗{self.usage}\")\n",
" print(f\"流式翻译结束...\\n\")\n",
" for i in range(len(self.sentences)):\n",
" print(f\"\\n{self.sentences[i]}\\n{self.translations[i]}\\n\")\n",
"\n",
" def on_event(\n",
" self,\n",
" request_id,\n",
" transcription_result: TranscriptionResult,\n",
" translation_result: TranslationResult,\n",
" usage\n",
" ) -> None:\n",
" if transcription_result is not None:\n",
" id = transcription_result.sentence_id\n",
" text = transcription_result.text\n",
" if transcription_result.stash is not None:\n",
" stash = transcription_result.stash.text\n",
" else:\n",
" stash = \"\"\n",
" print(f\"#{id}: {text}{stash}\")\n",
" if usage: self.sentences.append(text)\n",
" \n",
" if translation_result is not None:\n",
" lang = translation_result.get_language_list()[0]\n",
" text = translation_result.get_translation(lang).text\n",
" if translation_result.get_translation(lang).stash is not None:\n",
" stash = translation_result.get_translation(lang).stash.text\n",
" else:\n",
" stash = \"\"\n",
" print(f\"#{lang}: {text}{stash}\")\n",
" if usage: self.translations.append(text)\n",
" \n",
" if usage: self.usage += usage['duration']"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"采样输入设备:\n",
" - 序号37\n",
" - 名称:耳机 (HUAWEI FreeLace 活力版) [Loopback]\n",
" - 最大输入通道数2\n",
" - 默认低输入延迟0.003s\n",
" - 默认高输入延迟0.01s\n",
" - 默认采样率44100.0Hz\n",
" - 是否回环设备True\n",
"\n",
"音频样本块大小4410\n",
"样本位宽2\n",
"音频数据格式8\n",
"音频通道数2\n",
"音频采样率44100\n",
"\n"
]
}
],
"source": [
"mic = pyaudio.PyAudio()\n",
"default_speaker = getDefaultSpeakers(mic, False)\n",
"\n",
"SAMP_WIDTH = pyaudio.get_sample_size(pyaudio.paInt16)\n",
"FORMAT = pyaudio.paInt16\n",
"CHANNELS = default_speaker[\"maxInputChannels\"]\n",
"RATE = int(default_speaker[\"defaultSampleRate\"])\n",
"CHUNK = RATE // 10\n",
"INDEX = default_speaker[\"index\"]\n",
"\n",
"dev_info = f\"\"\"\n",
"采样输入设备:\n",
" - 序号:{default_speaker['index']}\n",
" - 名称:{default_speaker['name']}\n",
" - 最大输入通道数:{default_speaker['maxInputChannels']}\n",
" - 默认低输入延迟:{default_speaker['defaultLowInputLatency']}s\n",
" - 默认高输入延迟:{default_speaker['defaultHighInputLatency']}s\n",
" - 默认采样率:{default_speaker['defaultSampleRate']}Hz\n",
" - 是否回环设备:{default_speaker['isLoopbackDevice']}\n",
"\n",
"音频样本块大小:{CHUNK}\n",
"样本位宽:{SAMP_WIDTH}\n",
"音频数据格式:{FORMAT}\n",
"音频通道数:{CHANNELS}\n",
"音频采样率:{RATE}\n",
"\"\"\"\n",
"print(dev_info)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"RECORD_SECONDS = 20 # 监听时长(s)\n",
"\n",
"stream = mic.open(\n",
" format = FORMAT,\n",
" channels = CHANNELS,\n",
" rate = RATE,\n",
" input = True,\n",
" input_device_index = INDEX\n",
")\n",
"translator = TranslationRecognizerRealtime(\n",
" model = \"gummy-realtime-v1\",\n",
" format = \"pcm\",\n",
" sample_rate = RATE,\n",
" transcription_enabled = True,\n",
" translation_enabled = True,\n",
" source_language = \"ja\",\n",
" translation_target_languages = [\"zh\"],\n",
" callback = Callback()\n",
")\n",
"translator.start()\n",
"\n",
"for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):\n",
" data = stream.read(CHUNK)\n",
" data_np = np.frombuffer(data, dtype=np.int16)\n",
" data_np_r = data_np.reshape(-1, CHANNELS)\n",
" print(data_np_r.shape)\n",
" mono_data = np.mean(data_np_r.astype(np.float32), axis=1)\n",
" mono_data = mono_data.astype(np.int16)\n",
" mono_data_bytes = mono_data.tobytes()\n",
" translator.send_audio_frame(mono_data_bytes)\n",
"\n",
"translator.stop()\n",
"stream.stop_stream()\n",
"stream.close()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "mystd",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,4 +0,0 @@
numpy
dashscope
pyaudio
pyaudiowpatch

View File

@@ -1,80 +0,0 @@
from dashscope.audio.asr import (
TranslationRecognizerCallback,
TranscriptionResult,
TranslationResult,
TranslationRecognizerRealtime
)
from datetime import datetime
import json
import sys
class Callback(TranslationRecognizerCallback):
"""
语音大模型流式传输回调对象
"""
def __init__(self):
super().__init__()
self.usage = 0
self.cur_id = -1
self.time_str = ''
def on_open(self) -> None:
pass
def on_close(self) -> None:
pass
def on_event(
self,
request_id,
transcription_result: TranscriptionResult,
translation_result: TranslationResult,
usage
) -> None:
caption = {}
if transcription_result is not None:
caption['index'] = transcription_result.sentence_id
caption['text'] = transcription_result.text
if caption['index'] != self.cur_id:
self.cur_id = caption['index']
cur_time = datetime.now().strftime('%H:%M:%S')
caption['time_s'] = cur_time
self.time_str = cur_time
else:
caption['time_s'] = self.time_str
caption['time_t'] = datetime.now().strftime('%H:%M:%S')
caption['translation'] = ""
if translation_result is not None:
lang = translation_result.get_language_list()[0]
caption['translation'] = translation_result.get_translation(lang).text
if usage:
self.usage += usage['duration']
# print(caption)
self.send_to_node(caption)
def send_to_node(self, data):
"""
将数据发送到 Node.js 进程
"""
try:
json_data = json.dumps(data) + '\n'
sys.stdout.write(json_data)
sys.stdout.flush()
except Exception as e:
print(f"Error sending data to Node.js: {e}", file=sys.stderr)
class GummyTranslator:
def __init__(self, rate, source, target):
self.translator = TranslationRecognizerRealtime(
model = "gummy-realtime-v1",
format = "pcm",
sample_rate = rate,
transcription_enabled = True,
translation_enabled = (target is not None),
source_language = source,
translation_target_languages = [target],
callback = Callback()
)

View File

@@ -1,48 +0,0 @@
import sys
if sys.platform == 'win32':
from sysaudio.win import AudioStream, mergeStreamChannels
elif sys.platform == 'linux':
from sysaudio.linux import AudioStream, mergeStreamChannels
else:
raise NotImplementedError(f"Unsupported platform: {sys.platform}")
from audio2text.gummy import GummyTranslator
import sys
import argparse
def convert_audio_to_text(s_lang, t_lang, audio_type):
sys.stdout.reconfigure(line_buffering=True)
stream = AudioStream(audio_type)
stream.openStream()
if t_lang == 'none':
gummy = GummyTranslator(stream.RATE, s_lang, None)
else:
gummy = GummyTranslator(stream.RATE, s_lang, t_lang)
gummy.translator.start()
while True:
try:
if not stream.stream: continue
data = stream.stream.read(stream.CHUNK)
data = mergeStreamChannels(data, stream.CHANNELS)
gummy.translator.send_audio_frame(data)
except KeyboardInterrupt:
stream.closeStream()
gummy.translator.stop()
break
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Convert system audio stream to text')
parser.add_argument('-s', '--source_language', default='en', help='Source language code')
parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
parser.add_argument('-a', '--audio_type', default='0', help='Audio stream source: 0 for output audio stream, 1 for input audio stream')
args = parser.parse_args()
convert_audio_to_text(
args.source_language,
args.target_language,
0 if args.audio_type == '0' else 1
)

Binary file not shown.

View File

@@ -1,79 +0,0 @@
import pyaudio
import numpy as np
def mergeStreamChannels(data, channels):
"""
将当前多通道流数据合并为单通道流数据
Args:
data: 多通道数据
channels: 通道数
Returns:
mono_data_bytes: 单通道数据
"""
# (length * channels,)
data_np = np.frombuffer(data, dtype=np.int16)
# (length, channels)
data_np_r = data_np.reshape(-1, channels)
# (length,)
mono_data = np.mean(data_np_r.astype(np.float32), axis=1)
mono_data = mono_data.astype(np.int16)
mono_data_bytes = mono_data.tobytes()
return mono_data_bytes
class AudioStream:
def __init__(self, audio_type=1):
self.audio_type = audio_type
self.mic = pyaudio.PyAudio()
self.device = self.mic.get_default_input_device_info()
self.stream = None
self.SAMP_WIDTH = pyaudio.get_sample_size(pyaudio.paInt16)
self.FORMAT = pyaudio.paInt16
self.CHANNELS = self.device["maxInputChannels"]
self.RATE = int(self.device["defaultSampleRate"])
self.CHUNK = self.RATE // 20
self.INDEX = self.device["index"]
def printInfo(self):
dev_info = f"""
采样输入设备:
- 设备类型:{ "音频输入Linux平台目前仅支持该项" }
- 序号:{self.device['index']}
- 名称:{self.device['name']}
- 最大输入通道数:{self.device['maxInputChannels']}
- 默认低输入延迟:{self.device['defaultLowInputLatency']}s
- 默认高输入延迟:{self.device['defaultHighInputLatency']}s
- 默认采样率:{self.device['defaultSampleRate']}Hz
音频样本块大小:{self.CHUNK}
样本位宽:{self.SAMP_WIDTH}
音频数据格式:{self.FORMAT}
音频通道数:{self.CHANNELS}
音频采样率:{self.RATE}
"""
print(dev_info)
def openStream(self):
"""
打开并返回系统音频输出流
"""
if self.stream: return self.stream
self.stream = self.mic.open(
format = self.FORMAT,
channels = self.CHANNELS,
rate = self.RATE,
input = True,
input_device_index = self.INDEX
)
return self.stream
def closeStream(self):
"""
关闭系统音频输出流
"""
if self.stream is None: return
self.stream.stop_stream()
self.stream.close()
self.stream = None

Binary file not shown.

Before

Width:  |  Height:  |  Size: 25 KiB

View File

@@ -1,43 +1,42 @@
import { shell, BrowserWindow, ipcMain } from 'electron'
import path from 'path'
import { is } from '@electron-toolkit/utils'
import icon from '../../resources/icon.png?asset'
import { controlWindow } from './control'
import { sendStyles, sendCaptionLog } from './utils/config'
import icon from '../../build/icon.png?asset'
import { controlWindow } from './ControlWindow'
import { allConfig } from './utils/AllConfig'
class CaptionWindow {
class CaptionWindow {
window: BrowserWindow | undefined;
public createWindow(): void {
public createWindow(): void {
this.window = new BrowserWindow({
icon: icon,
width: 900,
width: allConfig.captionWindowWidth,
height: 100,
minWidth: 480,
show: false,
frame: false,
transparent: true,
alwaysOnTop: true,
center: true,
autoHideMenuBar: true,
...(process.platform === 'linux' ? { icon } : {}),
webPreferences: {
preload: path.join(__dirname, '../preload/index.js'),
sandbox: false
}
})
setTimeout(() => {
if (this.window) {
sendStyles(this.window);
sendCaptionLog(this.window, 'set');
}
}, 1000);
this.window.setAlwaysOnTop(true, 'screen-saver')
this.window.on('ready-to-show', () => {
this.window?.show()
})
this.window.on('close', () => {
if(this.window) {
allConfig.captionWindowWidth = this.window?.getBounds().width;
}
})
this.window.on('closed', () => {
this.window = undefined
})
@@ -46,7 +45,7 @@ class CaptionWindow {
shell.openExternal(details.url)
return { action: 'deny' }
})
if (is.dev && process.env['ELECTRON_RENDERER_URL']) {
this.window.loadURL(`${process.env['ELECTRON_RENDERER_URL']}/#/caption`)
} else {
@@ -57,7 +56,6 @@ class CaptionWindow {
}
public handleMessage() {
// 字幕窗口请求创建控制窗口
ipcMain.on('caption.controlWindow.activate', () => {
if(!controlWindow.window){
controlWindow.createWindow()
@@ -66,22 +64,23 @@ class CaptionWindow {
controlWindow.window.show()
}
})
// 字幕窗口高度发生变化
ipcMain.on('caption.windowHeight.change', (_, height) => {
if(this.window){
this.window.setSize(this.window.getSize()[0], height)
this.window.setSize(this.window.getSize()[0], height)
}
})
// 关闭字幕窗口
ipcMain.on('caption.window.close', () => {
if(this.window){
this.window.close()
}
})
// 是否固定在最前面
ipcMain.on('caption.pin.set', (_, pinned) => {
if(this.window){
this.window.setAlwaysOnTop(pinned)
if(pinned) this.window.setAlwaysOnTop(true, 'screen-saver')
else this.window.setAlwaysOnTop(false)
}
})
}

164
src/main/ControlWindow.ts Normal file
View File

@@ -0,0 +1,164 @@
import { shell, BrowserWindow, ipcMain, nativeTheme, dialog } from 'electron'
import path from 'path'
import { EngineInfo } from './types'
import pidusage from 'pidusage'
import { is } from '@electron-toolkit/utils'
import icon from '../../build/icon.png?asset'
import { captionWindow } from './CaptionWindow'
import { allConfig } from './utils/AllConfig'
import { captionEngine } from './utils/CaptionEngine'
class ControlWindow {
window: BrowserWindow | undefined;
public createWindow(): void {
this.window = new BrowserWindow({
icon: icon,
width: 1200,
height: 800,
minWidth: 750,
minHeight: 500,
show: false,
center: true,
autoHideMenuBar: true,
webPreferences: {
preload: path.join(__dirname, '../preload/index.js'),
sandbox: false
}
})
allConfig.readConfig()
this.window.on('ready-to-show', () => {
this.window?.show()
})
this.window.on('closed', () => {
this.window = undefined
allConfig.writeConfig()
})
this.window.webContents.setWindowOpenHandler((details) => {
shell.openExternal(details.url)
return { action: 'deny' }
})
if (is.dev && process.env['ELECTRON_RENDERER_URL']) {
this.window.loadURL(process.env['ELECTRON_RENDERER_URL'])
} else {
this.window.loadFile(path.join(__dirname, '../renderer/index.html'))
}
}
public handleMessage() {
nativeTheme.on('updated', () => {
if(allConfig.uiTheme === 'system'){
if(nativeTheme.shouldUseDarkColors && this.window){
this.window.webContents.send('control.nativeTheme.change', 'dark')
}
else if(!nativeTheme.shouldUseDarkColors && this.window){
this.window.webContents.send('control.nativeTheme.change', 'light')
}
}
})
ipcMain.handle('both.window.mounted', () => {
return allConfig.getFullConfig()
})
ipcMain.handle('control.nativeTheme.get', () => {
if(allConfig.uiTheme === 'system'){
if(nativeTheme.shouldUseDarkColors) return 'dark'
return 'light'
}
return allConfig.uiTheme
})
ipcMain.handle('control.folder.select', async () => {
const result = await dialog.showOpenDialog({
properties: ['openDirectory']
});
if (result.canceled) return "";
return result.filePaths[0];
})
ipcMain.handle('control.engine.info', async () => {
const info: EngineInfo = {
pid: 0, ppid: 0, port: 0, cpu: 0, mem: 0, elapsed: 0
}
if(captionEngine.status !== 'running') return info
const stats = await pidusage(captionEngine.process.pid)
info.pid = stats.pid
info.ppid = stats.ppid
info.port = captionEngine.port
info.cpu = stats.cpu
info.mem = stats.memory
info.elapsed = stats.elapsed
return info
})
ipcMain.on('control.uiLanguage.change', (_, args) => {
allConfig.uiLanguage = args
if(captionWindow.window){
captionWindow.window.webContents.send('control.uiLanguage.set', args)
}
})
ipcMain.on('control.uiTheme.change', (_, args) => {
allConfig.uiTheme = args
})
ipcMain.on('control.leftBarWidth.change', (_, args) => {
allConfig.leftBarWidth = args
})
ipcMain.on('control.styles.change', (_, args) => {
allConfig.setStyles(args)
if(captionWindow.window){
allConfig.sendStyles(captionWindow.window)
}
})
ipcMain.on('control.styles.reset', () => {
allConfig.resetStyles()
if(this.window){
allConfig.sendStyles(this.window)
}
if(captionWindow.window){
allConfig.sendStyles(captionWindow.window)
}
})
ipcMain.on('control.captionWindow.activate', () => {
if(!captionWindow.window){
captionWindow.createWindow()
}
else {
captionWindow.window.show()
}
})
ipcMain.on('control.controls.change', (_, args) => {
allConfig.setControls(args)
})
ipcMain.on('control.engine.start', () => {
captionEngine.start()
})
ipcMain.on('control.engine.stop', () => {
captionEngine.stop()
})
ipcMain.on('control.captionLog.clear', () => {
allConfig.captionLog.splice(0)
})
}
public sendErrorMessage(message: string) {
this.window?.webContents.send('control.error.occurred', message)
}
}
export const controlWindow = new ControlWindow()
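For context, these handlers are consumed from the renderer through the preload bridge. A minimal sketch of the round trip (illustrative call sites; the channel names are the ones registered above):

// Renderer side: poll the engine stats exposed by 'control.engine.info'
const info = await window.electron.ipcRenderer.invoke('control.engine.info')
// -> { pid, ppid, port, cpu, mem, elapsed }; all zeros while the engine is not running

// Renderer side: ask the main process for a model folder
const folder = await window.electron.ipcRenderer.invoke('control.folder.select')
// -> selected directory path, or "" if the dialog was canceled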


@@ -1,109 +0,0 @@
import { shell, BrowserWindow, ipcMain } from 'electron'
import path from 'path'
import { is } from '@electron-toolkit/utils'
import icon from '../../resources/icon.png?asset'
import { captionWindow } from './caption'
import {
captionEngine,
captionLog,
controls,
setStyles,
sendStyles,
sendCaptionLog,
setControls,
sendControls
} from './utils/config'
class ControlWindow {
window: BrowserWindow | undefined;
public createWindow(): void {
this.window = new BrowserWindow({
icon: icon,
width: 1200,
height: 800,
minWidth: 900,
minHeight: 600,
show: false,
center: true,
autoHideMenuBar: true,
...(process.platform === 'linux' ? { icon } : {}),
webPreferences: {
preload: path.join(__dirname, '../preload/index.js'),
sandbox: false
}
})
setTimeout(() => {
if (this.window) {
sendStyles(this.window) // send the initial styles
sendCaptionLog(this.window, 'set') // send the current caption log
sendControls(this.window) // send the caption engine controls
}
}, 1000);
this.window.on('ready-to-show', () => {
this.window?.show()
})
this.window.on('closed', () => {
this.window = undefined
})
this.window.webContents.setWindowOpenHandler((details) => {
shell.openExternal(details.url)
return { action: 'deny' }
})
if (is.dev && process.env['ELECTRON_RENDERER_URL']) {
this.window.loadURL(process.env['ELECTRON_RENDERER_URL'])
} else {
this.window.loadFile(path.join(__dirname, '../renderer/index.html'))
}
}
public handleMessage() {
// Style update from the control window
ipcMain.on('control.style.change', (_, args) => {
setStyles(args)
if(captionWindow.window){
sendStyles(captionWindow.window)
}
})
// Create or show the caption window at the control window's request
ipcMain.on('control.captionWindow.activate', () => {
if(!captionWindow.window){
captionWindow.createWindow()
}
else {
captionWindow.window.show()
}
})
// Update the caption engine control settings
ipcMain.on('control.control.change', (_, args) => {
setControls(args)
})
// Start the caption engine
ipcMain.on('control.engine.start', () => {
if(controls.engineEnabled){
this.window?.webContents.send('control.engine.already')
}
else {
captionEngine.start()
this.window?.webContents.send('control.engine.started')
}
})
// Stop the caption engine
ipcMain.on('control.engine.stop', () => {
captionEngine.stop()
this.window?.webContents.send('control.engine.stopped')
})
// Clear the caption log
ipcMain.on('control.caption.clear', () => {
captionLog.splice(0)
})
}
}
export const controlWindow = new ControlWindow()

src/main/i18n/index.ts (new file, 11 lines)

@@ -0,0 +1,11 @@
import zh from './lang/zh'
import en from './lang/en'
import ja from './lang/ja'
import { allConfig } from '../utils/AllConfig'
export function i18n(key: string): string{
if(allConfig.uiLanguage === 'zh') return zh[key] || key
else if(allConfig.uiLanguage === 'en') return en[key] || key
else if(allConfig.uiLanguage === 'ja') return ja[key] || key
else return key
}
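A quick illustration of the lookup (hypothetical call site): the function falls back to the key itself when the key is missing from the active table or the language is unrecognized:

import { i18n } from '../i18n'

// with allConfig.uiLanguage === 'en' (keys defined in lang/en.ts below):
i18n('engine.error')   // -> "Caption engine error: "
i18n('no.such.key')    // -> "no.such.key" (fallback to the key)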

src/main/i18n/lang/en.ts (new file, 8 lines)

@@ -0,0 +1,8 @@
export default {
"gummy.key.missing": "API KEY is not set, and the DASHSCOPE_API_KEY environment variable is not detected. To use the gummy engine, you need to obtain an API KEY from the Alibaba Cloud Bailian platform and add it to the settings or configure it in the local environment variables.",
"platform.unsupported": "Unsupported platform: ",
"engine.start.error": "Caption engine failed to start: ",
"engine.output.parse.error": "Unable to parse caption engine output as a JSON object: ",
"engine.error": "Caption engine error: ",
"engine.shutdown.error": "Failed to shut down the caption engine process: "
}

src/main/i18n/lang/ja.ts (new file, 8 lines)

@@ -0,0 +1,8 @@
export default {
"gummy.key.missing": "API KEY が設定されておらず、DASHSCOPE_API_KEY 環境変数も検出されていません。Gummy エンジンを使用するには、Alibaba Cloud Bailian プラットフォームから API KEY を取得し、設定に追加するか、ローカルの環境変数に設定する必要があります。",
"platform.unsupported": "サポートされていないプラットフォーム: ",
"engine.start.error": "字幕エンジンの起動に失敗しました: ",
"engine.output.parse.error": "字幕エンジンの出力を JSON オブジェクトとして解析できませんでした: ",
"engine.error": "字幕エンジンエラー: ",
"engine.shutdown.error": "字幕エンジンプロセスの終了に失敗しました: "
}

src/main/i18n/lang/zh.ts (new file, 8 lines)

@@ -0,0 +1,8 @@
export default {
"gummy.key.missing": "没有设置 API KEY也没有检测到 DASHSCOPE_API_KEY 环境变量。如果要使用 gummy 引擎,需要在阿里云百炼平台获取 API KEY并在添加到设置中或者配置到本机环境变量。",
"platform.unsupported": "不支持的平台:",
"engine.start.error": "字幕引擎启动失败:",
"engine.output.parse.error": "字幕引擎输出内容无法解析为 JSON 对象:",
"engine.error": "字幕引擎错误:",
"engine.shutdown.error": "字幕引擎进程关闭失败:"
}


@@ -1,8 +1,9 @@
import { app, BrowserWindow } from 'electron'
import { electronApp, optimizer } from '@electron-toolkit/utils'
import { controlWindow } from './control'
import { captionWindow } from './caption'
import { captionEngine } from './utils/config'
import { controlWindow } from './ControlWindow'
import { captionWindow } from './CaptionWindow'
import { allConfig } from './utils/AllConfig'
import { captionEngine } from './utils/CaptionEngine'
app.whenReady().then(() => {
electronApp.setAppUserModelId('com.himeditator.autocaption')
@@ -23,8 +24,9 @@ app.whenReady().then(() => {
})
})
app.on('will-quit', async () => {
captionEngine.stop()
app.on('will-quit', async () => {
captionEngine.kill()
allConfig.writeConfig()
});
app.on('window-all-closed', () => {


@@ -1,13 +1,40 @@
export type UILanguage = "zh" | "en" | "ja"
export type UITheme = "light" | "dark" | "system"
export interface Controls {
engineEnabled: boolean,
sourceLang: string,
targetLang: string,
engine: string,
audio: 0 | 1,
translation: boolean,
API_KEY: string,
modelPath: string,
customized: boolean,
customizedApp: string,
customizedCommand: string
}
export interface Styles {
lineBreak: number,
fontFamily: string,
fontSize: number,
fontColor: string,
fontWeight: number,
background: string,
opacity: number,
showPreview: boolean,
transDisplay: boolean,
transFontFamily: string,
transFontSize: number,
transFontColor: string
transFontColor: string,
transFontWeight: number,
textShadow: boolean,
offsetX: number,
offsetY: number,
blur: number,
textShadowColor: string
}
export interface CaptionItem {
@@ -18,14 +45,21 @@ export interface CaptionItem {
translation: string
}
export interface Controls {
engineEnabled: boolean,
sourceLang: string,
targetLang: string,
engine: string,
audio: 0 | 1,
translation: boolean,
customized: boolean,
customizedApp: string,
customizedCommand: string
export interface FullConfig {
platform: string,
uiLanguage: UILanguage,
uiTheme: UITheme,
leftBarWidth: number,
styles: Styles,
controls: Controls,
captionLog: CaptionItem[]
}
export interface EngineInfo {
pid: number,
ppid: number,
port:number,
cpu: number,
mem: number,
elapsed: number
}

src/main/utils/AllConfig.ts (new file, 166 lines)

@@ -0,0 +1,166 @@
import {
UILanguage, UITheme, Styles, Controls,
CaptionItem, FullConfig
} from '../types'
import { Log } from './Log'
import { app, BrowserWindow } from 'electron'
import * as path from 'path'
import * as fs from 'fs'
const defaultStyles: Styles = {
lineBreak: 1,
fontFamily: 'sans-serif',
fontSize: 24,
fontColor: '#000000',
fontWeight: 4,
background: '#dbe2ef',
opacity: 80,
showPreview: true,
transDisplay: true,
transFontFamily: 'sans-serif',
transFontSize: 24,
transFontColor: '#000000',
transFontWeight: 4,
textShadow: false,
offsetX: 2,
offsetY: 2,
blur: 0,
textShadowColor: '#ffffff'
};
const defaultControls: Controls = {
sourceLang: 'en',
targetLang: 'zh',
engine: 'gummy',
audio: 0,
engineEnabled: false,
API_KEY: '',
modelPath: '',
translation: true,
customized: false,
customizedApp: '',
customizedCommand: ''
};
class AllConfig {
captionWindowWidth: number = 900;
uiLanguage: UILanguage = 'zh';
leftBarWidth: number = 8;
uiTheme: UITheme = 'system';
styles: Styles = {...defaultStyles};
controls: Controls = {...defaultControls};
lastLogIndex: number = -1;
captionLog: CaptionItem[] = [];
constructor() {}
public readConfig() {
const configPath = path.join(app.getPath('userData'), 'config.json')
if(fs.existsSync(configPath)){
const config = JSON.parse(fs.readFileSync(configPath, 'utf-8'))
if(config.captionWindowWidth) this.captionWindowWidth = config.captionWindowWidth
if(config.uiLanguage) this.uiLanguage = config.uiLanguage
if(config.uiTheme) this.uiTheme = config.uiTheme
if(config.leftBarWidth) this.leftBarWidth = config.leftBarWidth
if(config.styles) this.setStyles(config.styles)
if(config.controls) this.setControls(config.controls)
Log.info('Read Config from:', configPath)
}
}
public writeConfig() {
const config = {
captionWindowWidth: this.captionWindowWidth,
uiLanguage: this.uiLanguage,
uiTheme: this.uiTheme,
leftBarWidth: this.leftBarWidth,
controls: this.controls,
styles: this.styles
}
const configPath = path.join(app.getPath('userData'), 'config.json')
fs.writeFileSync(configPath, JSON.stringify(config, null, 2))
Log.info('Write Config to:', configPath)
}
public getFullConfig(): FullConfig {
return {
platform: process.platform,
uiLanguage: this.uiLanguage,
uiTheme: this.uiTheme,
leftBarWidth: this.leftBarWidth,
styles: this.styles,
controls: this.controls,
captionLog: this.captionLog
}
}
public setStyles(args: Object) {
for(let key in this.styles) {
if(key in args) {
this.styles[key] = args[key]
}
}
Log.info('Set Styles:', this.styles)
}
public resetStyles() {
this.setStyles(defaultStyles)
}
public sendStyles(window: BrowserWindow) {
window.webContents.send('both.styles.set', this.styles)
Log.info(`Send Styles to #${window.id}:`, this.styles)
}
public setControls(args: Object) {
const engineEnabled = this.controls.engineEnabled
for(let key in this.controls){
if(key in args) {
this.controls[key] = args[key]
}
}
this.controls.engineEnabled = engineEnabled
Log.info('Set Controls:', this.controls)
}
public sendControls(window: BrowserWindow, info = true) {
window.webContents.send('control.controls.set', this.controls)
if(info) Log.info(`Send Controls to #${window.id}:`, this.controls)
}
public updateCaptionLog(log: CaptionItem) {
let command: 'add' | 'upd' = 'add'
if(
this.captionLog.length &&
this.lastLogIndex === log.index
) {
this.captionLog.splice(this.captionLog.length - 1, 1, log)
command = 'upd'
}
else {
this.captionLog.push(log)
this.lastLogIndex = log.index
}
this.captionLog[this.captionLog.length - 1].index = this.captionLog.length
for(const window of BrowserWindow.getAllWindows()){
this.sendCaptionLog(window, command)
}
}
public sendCaptionLog(window: BrowserWindow, command: 'add' | 'upd' | 'set') {
if(command === 'add'){
window.webContents.send(`both.captionLog.add`, this.captionLog[this.captionLog.length - 1])
}
else if(command === 'upd'){
window.webContents.send(`both.captionLog.upd`, this.captionLog[this.captionLog.length - 1])
}
else if(command === 'set'){
window.webContents.send(`both.captionLog.set`, this.captionLog)
}
}
}
export const allConfig = new AllConfig()
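For reference, writeConfig persists roughly the following JSON to config.json under app.getPath('userData'); the values shown here are the defaults defined above, with controls and styles abbreviated:

// config.json (illustrative)
// {
//   "captionWindowWidth": 900,
//   "uiLanguage": "zh",
//   "uiTheme": "system",
//   "leftBarWidth": 8,
//   "controls": { "engine": "gummy", "sourceLang": "en", "targetLang": "zh", ... },
//   "styles": { "fontFamily": "sans-serif", "fontSize": 24, ... }
// }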


@@ -0,0 +1,236 @@
import { exec, spawn } from 'child_process'
import { app } from 'electron'
import { is } from '@electron-toolkit/utils'
import path from 'path'
import net from 'net'
import { controlWindow } from '../ControlWindow'
import { allConfig } from './AllConfig'
import { i18n } from '../i18n'
import { Log } from './Log'
export class CaptionEngine {
appPath: string = ''
command: string[] = []
process: any | undefined
client: net.Socket | undefined
port: number = 8080
status: 'running' | 'starting' | 'stopping' | 'stopped' = 'stopped'
timerID: NodeJS.Timeout | undefined
private getApp(): boolean {
if (allConfig.controls.customized) {
Log.info('Using customized caption engine')
this.appPath = allConfig.controls.customizedApp
this.command = allConfig.controls.customizedCommand.split(' ')
this.port = Math.floor(Math.random() * (65535 - 1024 + 1)) + 1024
this.command.push('-p', this.port.toString())
}
else {
if(allConfig.controls.engine === 'gummy' &&
!allConfig.controls.API_KEY && !process.env.DASHSCOPE_API_KEY
) {
controlWindow.sendErrorMessage(i18n('gummy.key.missing'))
return false
}
this.command = []
if (is.dev) {
if(process.platform === "win32") {
this.appPath = path.join(
app.getAppPath(), 'engine',
'subenv', 'Scripts', 'python.exe'
)
this.command.push(path.join(
app.getAppPath(), 'engine', 'main.py'
))
// this.appPath = path.join(app.getAppPath(), 'engine', 'dist', 'main.exe')
}
else {
this.appPath = path.join(
app.getAppPath(), 'engine',
'subenv', 'bin', 'python3'
)
this.command.push(path.join(
app.getAppPath(), 'engine', 'main.py'
))
}
}
else {
if(process.platform === 'win32') {
this.appPath = path.join(process.resourcesPath, 'engine', 'main.exe')
}
else {
this.appPath = path.join(process.resourcesPath, 'engine', 'main')
}
}
this.command.push('-a', allConfig.controls.audio ? '1' : '0')
this.port = Math.floor(Math.random() * (65535 - 1024 + 1)) + 1024
this.command.push('-p', this.port.toString())
if(allConfig.controls.engine === 'gummy') {
this.command.push('-e', 'gummy')
this.command.push('-s', allConfig.controls.sourceLang)
this.command.push(
'-t', allConfig.controls.translation ?
allConfig.controls.targetLang : 'none'
)
if(allConfig.controls.API_KEY) {
this.command.push('-k', allConfig.controls.API_KEY)
}
}
else if(allConfig.controls.engine === 'vosk'){
this.command.push('-e', 'vosk')
this.command.push('-m', `"${allConfig.controls.modelPath}"`)
}
}
Log.info('Engine Path:', this.appPath)
Log.info('Engine Command:', this.command)
return true
}
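// Put together, getApp() yields an invocation along these lines (illustrative:
// dev mode on Linux, gummy engine, a randomly chosen port such as 51234):
//   engine/subenv/bin/python3 engine/main.py -a 0 -p 51234 -e gummy -s en -t zh
// With an API key configured, '-k <API_KEY>' is appended; with vosk, the
// engine-specific part is '-e vosk -m "<modelPath>"' instead.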
public connect() {
Log.info('Connecting to caption engine server...')
if(this.client) { Log.warn('Client already exists, ignoring...') }
this.client = net.createConnection({ port: this.port }, () => {
Log.info('Connected to caption engine server');
});
this.status = 'running'
allConfig.controls.engineEnabled = true
if(controlWindow.window){
allConfig.sendControls(controlWindow.window, false)
controlWindow.window.webContents.send(
'control.engine.started',
this.process.pid
)
}
}
public sendCommand(command: string, content: string = "") {
if(this.client === undefined) {
Log.error('Client not initialized yet')
return
}
const data = JSON.stringify({command, content})
this.client.write(data);
Log.info(`Send data to python server: ${data}`);
}
public start() {
if (this.status !== 'stopped') {
Log.warn('Caption engine is not stopped, current status:', this.status)
return
}
if(!this.getApp()){ return }
this.process = spawn(this.appPath, this.command)
this.status = 'starting'
Log.info('Caption Engine Starting, PID:', this.process.pid)
this.process.stdout.on('data', (data: any) => {
const lines = data.toString().split('\n')
lines.forEach((line: string) => {
if (line.trim()) {
try {
const data_obj = JSON.parse(line)
handleEngineData(data_obj)
} catch (e) {
controlWindow.sendErrorMessage(i18n('engine.output.parse.error') + e)
Log.error('Error parsing JSON:', e)
}
}
});
});
this.process.stderr.on('data', (data: any) => {
const lines = data.toString().split('\n')
lines.forEach((line: string) => {
if(line.trim()){
controlWindow.sendErrorMessage(/*i18n('engine.error') +*/ line)
console.error(line)
}
})
});
this.process.on('close', (code: any) => {
this.process = undefined;
this.client = undefined
allConfig.controls.engineEnabled = false
if(controlWindow.window){
allConfig.sendControls(controlWindow.window, false)
controlWindow.window.webContents.send('control.engine.stopped')
}
this.status = 'stopped'
clearInterval(this.timerID)
Log.info(`Engine exited with code ${code}`)
});
}
public stop() {
if(this.status !== 'running'){
Log.warn('Trying to stop engine which is not running, current status:', this.status)
return
}
this.sendCommand('stop')
if(this.client){
this.client.destroy()
this.client = undefined
}
this.status = 'stopping'
Log.info('Caption engine process stopping...')
this.timerID = setTimeout(() => {
if(this.status !== 'stopping') return
Log.warn('Engine process still not stopped, trying to kill...')
this.kill()
}, 4000);
}
public kill(){
if(!this.process || !this.process.pid) return
if(this.status !== 'running'){
Log.warn('Trying to kill engine which is not running, current status:', this.status)
}
Log.warn('Trying to kill engine process, PID:', this.process.pid)
if(this.client){
this.client.destroy()
this.client = undefined
}
if (this.process.pid) {
let cmd = `kill ${this.process.pid}`;
if (process.platform === "win32") {
cmd = `taskkill /pid ${this.process.pid} /t /f`
}
exec(cmd)
}
this.status = 'stopping'
}
}
function handleEngineData(data: any) {
if(data.command === 'connect'){
captionEngine.connect()
}
else if(data.command === 'kill') {
if(captionEngine.status !== 'stopped') {
Log.warn('Error occurred, trying to kill caption engine...')
captionEngine.kill()
}
}
else if(data.command === 'caption') {
allConfig.updateCaptionLog(data);
}
else if(data.command === 'print') {
Log.info('Engine Print:', data.content)
}
else if(data.command === 'info') {
Log.info('Engine Info:', data.content)
}
else if(data.command === 'usage') {
Log.info('Engine Usage: ', data.content)
}
else {
Log.warn('Unknown command:', data)
}
}
export const captionEngine = new CaptionEngine()
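Taken together, this file defines a small two-channel protocol: the engine process prints one JSON object per line on stdout, and receives JSON commands over the TCP port passed via -p. A sketch of both directions, assuming the message shapes handled by handleEngineData and sendCommand above (field values are illustrative):

// engine -> main process, newline-delimited JSON on stdout:
//   {"command": "connect"}    // engine ready; main opens the TCP connection
//   {"command": "caption", "index": 3, "time_s": "00:00:01.200",
//    "time_t": "00:00:03.800", "text": "...", "translation": "..."}
//   {"command": "print", "content": "..."}   // likewise "info" / "usage"
// main process -> engine, JSON over the TCP socket (see sendCommand):
//   {"command": "stop", "content": ""}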

src/main/utils/Log.ts (new file, 22 lines)

@@ -0,0 +1,22 @@
function getTimeString() {
const now = new Date()
const HH = String(now.getHours()).padStart(2, '0')
const MM = String(now.getMinutes()).padStart(2, '0')
const SS = String(now.getSeconds()).padStart(2, '0')
const MS = String(now.getMilliseconds()).padStart(3, '0')
return `${HH}:${MM}:${SS}.${MS}`
}
export class Log {
static info(...msg: any[]){
console.log(`[INFO ${getTimeString()}]`, ...msg)
}
static warn(...msg: any[]){
console.warn(`[WARN ${getTimeString()}]`, ...msg)
}
static error(...msg: any[]){
console.error(`[ERROR ${getTimeString()}]`, ...msg)
}
}
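Example output (illustrative arguments and timestamp):

// Log.info('Engine Path:', '/opt/app/engine/main')
// -> [INFO 21:44:49.123] Engine Path: /opt/app/engine/main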


@@ -1,89 +0,0 @@
import { Styles, CaptionItem, Controls } from '../types'
import { BrowserWindow } from 'electron'
import { CaptionEngine } from './engine'
export const captionEngine = new CaptionEngine()
export const styles: Styles = {
fontFamily: 'sans-serif',
fontSize: 24,
fontColor: '#000000',
background: '#dbe2ef',
opacity: 80,
transDisplay: true,
transFontFamily: 'sans-serif',
transFontSize: 24,
transFontColor: '#000000'
}
export const captionLog: CaptionItem[] = []
export const controls: Controls = {
sourceLang: 'en',
targetLang: 'zh',
engine: 'gummy',
audio: 0,
engineEnabled: false,
translation: true,
customized: false,
customizedApp: '',
customizedCommand: ''
}
export let engineRunning: boolean = false
export function setStyles(args: any) {
styles.fontFamily = args.fontFamily
styles.fontSize = args.fontSize
styles.fontColor = args.fontColor
styles.background = args.background
styles.opacity = args.opacity
styles.transDisplay = args.transDisplay
styles.transFontFamily = args.transFontFamily
styles.transFontSize = args.transFontSize
styles.transFontColor = args.transFontColor
console.log('[INFO] Set Styles:', styles)
}
export function sendStyles(window: BrowserWindow) {
window.webContents.send('caption.style.set', styles)
console.log(`[INFO] Send Styles to #${window.id}:`, styles)
}
export function sendCaptionLog(window: BrowserWindow, command: string) {
if(command === 'add'){
window.webContents.send(`both.log.add`, captionLog[captionLog.length - 1])
}
else if(command === 'set'){
window.webContents.send(`both.log.${command}`, captionLog)
}
}
export function addCaptionLog(log: CaptionItem) {
if(captionLog.length && captionLog[captionLog.length - 1].index === log.index) {
captionLog.splice(captionLog.length - 1, 1, log)
}
else {
captionLog.push(log)
}
for(const window of BrowserWindow.getAllWindows()){
sendCaptionLog(window, 'add')
}
}
export function setControls(args: any) {
controls.sourceLang = args.sourceLang
controls.targetLang = args.targetLang
controls.engine = args.engine
controls.audio = args.audio
controls.translation = args.translation
controls.customized = args.customized
controls.customizedApp = args.customizedApp
controls.customizedCommand = args.customizedCommand
console.log('[INFO] Set Controls:', controls)
}
export function sendControls(window: BrowserWindow) {
window.webContents.send('control.control.set', controls)
console.log(`[INFO] Send Controls to #${window.id}:`, controls)
}


@@ -1,103 +0,0 @@
import { spawn, exec } from 'child_process'
import { app } from 'electron'
import { is } from '@electron-toolkit/utils'
import path from 'path'
import { addCaptionLog, controls } from './config'
export class CaptionEngine {
appPath: string = ''
command: string[] = []
process: any | undefined
private getApp() {
if(controls.customized && controls.customizedApp){
this.appPath = controls.customizedApp
this.command = [ controls.customizedCommand ]
}
else if(controls.engine === 'gummy'){
let gummyName = ''
if(process.platform === 'win32'){
gummyName = 'main-gummy.exe'
}
else if(process.platform === 'linux'){
gummyName = 'main-gummy'
}
else{
throw new Error('Unsupported platform')
}
if(is.dev){
this.appPath = path.join(
app.getAppPath(),
'python-subprocess', 'dist', gummyName
)
}
else{
this.appPath = path.join(
process.resourcesPath,
'python-subprocess', 'dist', gummyName
)
}
this.command = []
this.command.push('-s', controls.sourceLang)
this.command.push('-t', controls.translation ? controls.targetLang : 'none')
this.command.push('-a', controls.audio ? '1' : '0')
console.log('[INFO] engine', this.appPath)
console.log('[INFO] engine command',this.command)
}
}
public start() {
if (this.process) {
this.stop();
}
this.getApp()
this.process = spawn(this.appPath, this.command)
controls.engineEnabled = true
console.log('[INFO] Caption Engine Started: ', {
appPath: this.appPath,
command: this.command
})
this.process.stdout.on('data', (data) => {
const lines = data.toString().split('\n');
lines.forEach( (line: string) => {
if (line.trim()) {
try {
const caption = JSON.parse(line);
addCaptionLog(caption);
} catch (e) {
console.error('Error parsing JSON:', e);
}
}
});
});
this.process.stderr.on('data', (data) => {
console.error(`Python Error: ${data}`);
});
this.process.on('close', (code: any) => {
console.log(`Python process exited with code ${code}`);
this.process = undefined;
});
}
public stop() {
if (this.process) {
if (process.platform === "win32" && this.process.pid) {
exec(`taskkill /pid ${this.process.pid} /t /f`, (error) => {
if (error) {
console.error(`Failed to kill process: ${error}`);
}
});
} else {
this.process.kill('SIGKILL');
}
this.process = undefined;
controls.engineEnabled = false;
console.log('[INFO] Caption engine process stopped');
}
}
}


@@ -3,4 +3,22 @@
</template>
<script setup lang="ts">
import { onMounted } from 'vue'
import { FullConfig } from './types'
import { useCaptionLogStore } from './stores/captionLog'
import { useCaptionStyleStore } from './stores/captionStyle'
import { useEngineControlStore } from './stores/engineControl'
import { useGeneralSettingStore } from './stores/generalSetting'
onMounted(() => {
window.electron.ipcRenderer.invoke('both.window.mounted').then((data: FullConfig) => {
useGeneralSettingStore().uiLanguage = data.uiLanguage
useGeneralSettingStore().uiTheme = data.uiTheme
useGeneralSettingStore().leftBarWidth = data.leftBarWidth
useCaptionStyleStore().setStyles(data.styles)
useEngineControlStore().platform = data.platform
useEngineControlStore().setControls(data.controls)
useCaptionLogStore().captionData = data.captionLog
})
})
</script>


@@ -0,0 +1,29 @@
.input-item {
margin: 10px 0;
}
.input-label {
display: inline-block;
width: 80px;
text-align: right;
margin-right: 10px;
}
.switch-label {
display: inline-block;
min-width: 80px;
text-align: right;
margin-right: 10px;
}
.input-area {
width: calc(100% - 100px);
min-width: 100px;
}
.input-item-value {
width: 80px;
text-align: right;
font-size: 12px;
color: var(--tag-color)
}


@@ -0,0 +1,12 @@
:root {
--control-background: #fff;
--tag-color: rgba(0, 0, 0, 0.45);
--icon-color: rgba(0, 0, 0, 0.88);
}
body {
margin: 0;
padding: 0;
height: 100vh;
overflow: hidden;
}


@@ -1,6 +0,0 @@
body {
margin: 0;
padding: 0;
height: 100vh;
overflow: hidden;
}


@@ -1,165 +0,0 @@
<template>
<div style="height: 20px;"></div>
<a-card size="small" title="字幕控制">
<template #extra>
<a @click="applyChange">更改设置</a> |
<a @click="cancelChange">取消更改</a>
</template>
<div class="control-item">
<span class="control-label">源语言</span>
<a-select
class="control-input"
v-model:value="currentSourceLang"
:options="langList"
></a-select>
</div>
<div class="control-item">
<span class="control-label">翻译语言</span>
<a-select
class="control-input"
v-model:value="currentTargetLang"
:options="langList.filter((item) => item.value !== 'auto')"
></a-select>
</div>
<div class="control-item">
<span class="control-label">字幕引擎</span>
<a-select
class="control-input"
v-model:value="currentEngine"
:options="captionEngine"
></a-select>
</div>
<div class="control-item">
<span class="control-label">音频选择</span>
<a-select
class="control-input"
v-model:value="currentAudio"
:options="audioType"
></a-select>
</div>
<div class="control-item">
<span class="control-label">启用翻译</span>
<a-switch v-model:checked="currentTranslation" />
<span class="control-label">自定义引擎</span>
<a-switch v-model:checked="currentCustomized" />
</div>
<div v-show="currentCustomized">
<a-card size="small" title="自定义字幕引擎">
<p class="customize-note">说明允许用户使用自定义字幕引擎提供字幕提供的引擎要能通过 <code>child_process.spawn()</code> 进行启动且需要通过 IPC 与项目 node.js 后端进行通信具体通信接口见后端实现</p>
<div class="control-item">
<span class="control-label">引擎路径</span>
<a-input
class="control-input"
v-model:value="currentCustomizedApp"
></a-input>
</div>
<div class="control-item">
<span class="control-label">引擎指令</span>
<a-input
class="control-input"
v-model:value="currentCustomizedCommand"
></a-input>
</div>
</a-card>
</div>
</a-card>
<div style="height: 20px;"></div>
</template>
<script setup lang="ts">
import { ref, computed, watch } from 'vue'
import { storeToRefs } from 'pinia'
import { useCaptionControlStore } from '@renderer/stores/captionControl'
import { notification } from 'ant-design-vue'
const captionControl = useCaptionControlStore()
const { captionEngine, audioType, changeSignal } = storeToRefs(captionControl)
const currentSourceLang = ref('auto')
const currentTargetLang = ref('zh')
const currentEngine = ref('gummy')
const currentAudio = ref<0 | 1>(0)
const currentTranslation = ref<boolean>(false)
const currentCustomized = ref<boolean>(false)
const currentCustomizedApp = ref('')
const currentCustomizedCommand = ref('')
const langList = computed(() => {
for(let item of captionEngine.value){
if(item.value === currentEngine.value) {
return item.languages
}
}
return []
})
function applyChange(){
captionControl.sourceLang = currentSourceLang.value
captionControl.targetLang = currentTargetLang.value
captionControl.engine = currentEngine.value
captionControl.audio = currentAudio.value
captionControl.translation = currentTranslation.value
captionControl.customized = currentCustomized.value
captionControl.customizedApp = currentCustomizedApp.value
captionControl.customizedCommand = currentCustomizedCommand.value
captionControl.sendControlChange()
notification.open({
message: '字幕控制已更改',
description: '如果字幕引擎已经启动,需要关闭后重启才会生效'
});
}
function cancelChange(){
currentSourceLang.value = captionControl.sourceLang
currentTargetLang.value = captionControl.targetLang
currentEngine.value = captionControl.engine
currentAudio.value = captionControl.audio
currentTranslation.value = captionControl.translation
currentCustomized.value = captionControl.customized
currentCustomizedApp.value = captionControl.customizedApp
currentCustomizedCommand.value = captionControl.customizedCommand
}
watch(changeSignal, (val) => {
if(val == true) {
cancelChange();
captionControl.changeSignal = false;
}
})
</script>
<style scoped>
.control-item {
margin: 10px 0;
}
.control-label {
display: inline-block;
width: 80px;
text-align: right;
margin-right: 10px;
}
.customize-note {
padding: 0 20px;
color: red;
font-size: 12px;
}
.control-input {
width: calc(100% - 100px);
min-width: 100px;
}
.control-item-value {
width: 80px;
text-align: right;
font-size: 12px;
color: #666
}
</style>


@@ -1,202 +0,0 @@
<template>
<div class="caption-stat">
<a-row>
<a-col :span="6">
<a-statistic title="字幕引擎" :value="engine" />
</a-col>
<a-col :span="6">
<a-statistic title="字幕引擎状态" :value="engineEnabled?'已启动':'未启动'" />
</a-col>
<a-col :span="6">
<a-statistic title="已记录字幕" :value="captionData.length" />
</a-col>
</a-row>
</div>
<div class="caption-control">
<a-button
type="primary"
class="control-button"
@click="openCaptionWindow"
>打开字幕窗口</a-button>
<a-button
class="control-button"
@click="captionControl.startEngine"
>启动字幕引擎</a-button>
<a-button
danger class="control-button"
@click="captionControl.stopEngine"
>关闭字幕引擎</a-button>
</div>
<div class="caption-list">
<div class="caption-title">
<span style="margin-right: 30px;">字幕记录</span>
<a-button
type="primary"
style="margin-right: 20px;"
@click="exportCaptions"
:disabled="captionData.length === 0"
>
导出字幕记录
</a-button>
<a-button
danger
@click="clearCaptions"
>
清空字幕记录
</a-button>
</div>
<a-table
:columns="columns"
:data-source="captionData"
v-model:pagination="pagination"
>
<template #bodyCell="{ column, record }">
<template v-if="column.key === 'index'">
{{ record.index }}
</template>
<template v-if="column.key === 'time'">
<div class="time-cell">
<div class="time-start">{{ record.time_s }}</div>
<div class="time-end">{{ record.time_t }}</div>
</div>
</template>
<template v-if="column.key === 'content'">
<div class="caption-content">
<div class="caption-text">{{ record.text }}</div>
<div class="caption-translation">{{ record.translation }}</div>
</div>
</template>
</template>
</a-table>
</div>
</template>
<script setup lang="ts">
import { ref } from 'vue'
import { storeToRefs } from 'pinia'
import { useCaptionLogStore } from '@renderer/stores/captionLog'
import { useCaptionControlStore } from '@renderer/stores/captionControl'
const captionLog = useCaptionLogStore()
const { captionData } = storeToRefs(captionLog)
const captionControl = useCaptionControlStore()
const { engineEnabled, engine } = storeToRefs(captionControl)
const pagination = ref({
current: 1,
pageSize: 10,
showSizeChanger: true,
pageSizeOptions: ['10', '20', '50'],
showTotal: (total: number) => `${total} 条记录`,
onChange: (page: number, pageSize: number) => {
pagination.value.current = page
pagination.value.pageSize = pageSize
},
onShowSizeChange: (current: number, size: number) => {
pagination.value.current = current
pagination.value.pageSize = size
}
})
const columns = [
{
title: '序号',
dataIndex: 'index',
key: 'index',
width: 80,
},
{
title: '时间',
dataIndex: 'time',
key: 'time',
width: 160,
},
{
title: '字幕内容',
dataIndex: 'content',
key: 'content',
},
]
function openCaptionWindow() {
window.electron.ipcRenderer.send('control.captionWindow.activate')
}
function exportCaptions() {
const jsonData = JSON.stringify(captionData.value, null, 2)
const blob = new Blob([jsonData], { type: 'application/json' })
const url = URL.createObjectURL(blob)
const a = document.createElement('a')
a.href = url
const timestamp = new Date().toISOString().replace(/[:.]/g, '-')
a.download = `captions-${timestamp}.json`
document.body.appendChild(a)
a.click()
document.body.removeChild(a)
URL.revokeObjectURL(url)
}
function clearCaptions() {
captionLog.clear()
}
</script>
<style scoped>
.caption-control {
display: flex;
flex-wrap: wrap;
justify-content: center;
margin: 30px;
}
.control-button {
height: 40px;
margin: 20px;
font-size: 16px;
}
.caption-list {
background: #fff;
padding: 20px;
border-radius: 8px;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
}
.caption-title {
font-size: 24px;
font-weight: bold;
margin-bottom: 10px;
}
.time-cell {
display: flex;
flex-direction: column;
gap: 4px;
font-size: 14px;
}
.time-start {
color: #1677ff;
}
.time-end {
color: #ff4d4f;
}
.caption-content {
padding: 8px 0;
}
.caption-text {
font-size: 16px;
color: #333;
margin-bottom: 4px;
}
.caption-translation {
font-size: 14px;
color: #666;
padding-left: 16px;
border-left: 3px solid #1890ff;
}
</style>


@@ -0,0 +1,337 @@
<template>
<div class="caption-list">
<div>
<a-app class="caption-title">
<span style="margin-right: 30px;">{{ $t('log.title') }}</span>
</a-app>
<a-popover :title="$t('log.baseTime')">
<template #content>
<div class="base-time">
<div class="base-time-container">
<a-input
type="number" min="0"
v-model:value="baseHH"
></a-input>
<span class="base-time-label">{{ $t('log.hour') }}</span>
</div>
</div><span style="margin: 0 4px;">:</span>
<div class="base-time">
<div class="base-time-container">
<a-input
type="number" min="0" max="59"
v-model:value="baseMM"
></a-input>
<span class="base-time-label">{{ $t('log.min') }}</span>
</div>
</div><span style="margin: 0 4px;">:</span>
<div class="base-time">
<div class="base-time-container">
<a-input
type="number" min="0" max="59"
v-model:value="baseSS"
></a-input>
<span class="base-time-label">{{ $t('log.sec') }}</span>
</div>
</div><span style="margin: 0 4px;">.</span>
<div class="base-time">
<div class="base-time-container">
<a-input
type="number" min="0" max="999"
v-model:value="baseMS"
></a-input>
<span class="base-time-label">{{ $t('log.ms') }}</span>
</div>
</div>
</template>
<a-button
type="primary"
style="margin-right: 20px;"
@click="changeBaseTime"
:disabled="captionData.length === 0"
>{{ $t('log.changeTime') }}</a-button>
</a-popover>
<a-popover :title="$t('log.exportOptions')">
<template #content>
<div class="input-item">
<span class="input-label">{{ $t('log.exportFormat') }}</span>
<a-radio-group v-model:value="exportFormat">
<a-radio-button value="srt">.srt</a-radio-button>
<a-radio-button value="json">.json</a-radio-button>
</a-radio-group>
</div>
<div class="input-item">
<span class="input-label">{{ $t('log.exportContent') }}</span>
<a-radio-group v-model:value="contentOption">
<a-radio-button value="both">{{ $t('log.both') }}</a-radio-button>
<a-radio-button value="source">{{ $t('log.source') }}</a-radio-button>
<a-radio-button value="target">{{ $t('log.translation') }}</a-radio-button>
</a-radio-group>
</div>
</template>
<a-button
style="margin-right: 20px;"
@click="exportCaptions"
:disabled="captionData.length === 0"
>{{ $t('log.export') }}</a-button>
</a-popover>
<a-popover :title="$t('log.copyOptions')">
<template #content>
<div class="input-item">
<span class="input-label">{{ $t('log.addIndex') }}</span>
<a-switch v-model:checked="showIndex" />
<span class="input-label">{{ $t('log.copyTime') }}</span>
<a-switch v-model:checked="copyTime" />
</div>
<div class="input-item">
<span class="input-label">{{ $t('log.copyContent') }}</span>
<a-radio-group v-model:value="contentOption">
<a-radio-button value="both">{{ $t('log.both') }}</a-radio-button>
<a-radio-button value="source">{{ $t('log.source') }}</a-radio-button>
<a-radio-button value="target">{{ $t('log.translation') }}</a-radio-button>
</a-radio-group>
</div>
</template>
<a-button
style="margin-right: 20px;"
@click="copyCaptions"
>{{ $t('log.copy') }}</a-button>
</a-popover>
<a-button
danger
@click="clearCaptions"
>{{ $t('log.clear') }}</a-button>
</div>
<a-table
:columns="columns"
:data-source="captionData"
v-model:pagination="pagination"
style="margin-top: 10px;"
>
<template #bodyCell="{ column, record }">
<template v-if="column.key === 'index'">
{{ record.index }}
</template>
<template v-if="column.key === 'time'">
<div class="time-cell">
<div class="time-start">{{ record.time_s }}</div>
<div class="time-end">{{ record.time_t }}</div>
</div>
</template>
<template v-if="column.key === 'content'">
<div class="caption-content">
<div class="caption-text">{{ record.text }}</div>
<div class="caption-translation">{{ record.translation }}</div>
</div>
</template>
</template>
</a-table>
</div>
</template>
<script setup lang="ts">
import { ref } from 'vue'
import { storeToRefs } from 'pinia'
import { useCaptionLogStore } from '@renderer/stores/captionLog'
import { message } from 'ant-design-vue'
import { useI18n } from 'vue-i18n'
import * as tc from '../utils/timeCalc'
import { CaptionItem } from '../types'
const { t } = useI18n()
const captionLog = useCaptionLogStore()
const { captionData } = storeToRefs(captionLog)
const exportFormat = ref('srt')
const showIndex = ref(true)
const copyTime = ref(true)
const contentOption = ref('both')
const baseHH = ref<number>(0)
const baseMM = ref<number>(0)
const baseSS = ref<number>(0)
const baseMS = ref<number>(0)
const pagination = ref({
current: 1,
pageSize: 20,
showSizeChanger: true,
pageSizeOptions: ['10', '20', '50', '100'],
onChange: (page: number, pageSize: number) => {
pagination.value.current = page
pagination.value.pageSize = pageSize
},
onShowSizeChange: (current: number, size: number) => {
pagination.value.current = current
pagination.value.pageSize = size
}
})
const columns = [
{
title: 'index',
dataIndex: 'index',
key: 'index',
width: 80,
sorter: (a: CaptionItem, b: CaptionItem) => {
if(a.index <= b.index) return -1
return 1
},
sortDirections: ['descend'],
defaultSortOrder: 'descend',
},
{
title: 'time',
dataIndex: 'time',
key: 'time',
width: 160,
sorter: (a: CaptionItem, b: CaptionItem) => {
if(a.time_s <= b.time_s) return -1
return 1
},
sortDirections: ['descend', 'ascend'],
},
{
title: 'content',
dataIndex: 'content',
key: 'content',
},
]
function changeBaseTime() {
if(baseHH.value < 0) baseHH.value = 0
if(baseMM.value < 0) baseMM.value = 0
if(baseMM.value > 59) baseMM.value = 59
if(baseSS.value < 0) baseSS.value = 0
if(baseSS.value > 59) baseSS.value = 59
if(baseMS.value < 0) baseMS.value = 0
if(baseMS.value > 999) baseMS.value = 999
const newBase: tc.Time = {
hh: Number(baseHH.value),
mm: Number(baseMM.value),
ss: Number(baseSS.value),
ms: Number(baseMS.value)
}
const oldBase = tc.getTimeFromStr(captionData.value[0].time_s)
const deltaMs = tc.getMsFromTime(newBase) - tc.getMsFromTime(oldBase)
for(let i = 0; i < captionData.value.length; i++){
captionData.value[i].time_s =
tc.getNewTimeStr(captionData.value[i].time_s, deltaMs)
captionData.value[i].time_t =
tc.getNewTimeStr(captionData.value[i].time_t, deltaMs)
}
}
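// Illustrative: if the first caption starts at 00:00:01.000 and the user enters
// 00:10:00.000 above, deltaMs = 600000 - 1000 = 599000, which is then added to
// every time_s / time_t (behavior of the timeCalc helpers assumed from their names).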
function exportCaptions() {
const exportData = getExportData()
const blob = new Blob([exportData], {
type: exportFormat.value === 'json' ? 'application/json' : 'text/plain'
})
const url = URL.createObjectURL(blob)
const a = document.createElement('a')
a.href = url
const timestamp = new Date().toISOString().replace(/[:.]/g, '-')
a.download = `captions-${timestamp}.${exportFormat.value}`
document.body.appendChild(a)
a.click()
document.body.removeChild(a)
URL.revokeObjectURL(url)
}
function getExportData() {
if(exportFormat.value === 'json') return JSON.stringify(captionData.value, null, 2)
let content = ''
for(let i = 0; i < captionData.value.length; i++){
const item = captionData.value[i]
content += `${i+1}\n`
content += `${item.time_s} --> ${item.time_t}\n`.replace(/\./g, ',')
if(contentOption.value === 'both') content += `${item.text}\n${item.translation}\n\n`
else if(contentOption.value === 'source') content += `${item.text}\n\n`
else content += `${item.translation}\n\n`
}
return content
}
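// Illustrative .srt block produced for contentOption === 'both'
// (note the '.' -> ',' timestamp rewrite above):
//   1
//   00:00:01,200 --> 00:00:03,800
//   This is the source text
//   这是对应的翻译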
function copyCaptions() {
let content = ''
for(let i = 0; i < captionData.value.length; i++){
const item = captionData.value[i]
if(showIndex.value) content += `${i+1}\n`
if(copyTime.value) content += `${item.time_s} --> ${item.time_t}\n`.replace(/\./g, ',')
if(contentOption.value === 'both') content += `${item.text}\n${item.translation}\n\n`
else if(contentOption.value === 'source') content += `${item.text}\n\n`
else content += `${item.translation}\n\n`
}
navigator.clipboard.writeText(content)
message.success(t('log.copySuccess'))
}
function clearCaptions() {
captionLog.clear()
}
</script>
<style scoped>
@import url(../assets/input.css);
.caption-list {
padding: 20px;
border-radius: 8px;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
}
.caption-title {
display: inline-block;
font-size: 24px;
font-weight: bold;
margin-bottom: 10px;
}
.base-time {
width: 64px;
display: inline-block;
}
.base-time-container {
display: flex;
flex-direction: column;
align-items: center;
gap: 4px;
}
.base-time-label {
font-size: 12px;
color: var(--tag-color);
}
.time-cell {
display: flex;
flex-direction: column;
gap: 4px;
font-size: 14px;
}
.time-start {
color: #1677ff;
}
.time-end {
color: #ff4d4f;
}
.caption-content {
padding: 8px 0;
}
.caption-text {
font-size: 16px;
margin-bottom: 4px;
}
.caption-translation {
font-size: 14px;
padding-left: 16px;
border-left: 3px solid #1890ff;
}
</style>


@@ -1,123 +1,214 @@
<template>
<a-card size="small" title="字幕样式设置">
<a-card size="small" :title="$t('style.title')">
<template #extra>
<a @click="applyStyle">应用样式</a> |
<a @click="resetStyle">取消更改</a>
<a @click="applyStyle">{{ $t('style.applyStyle') }}</a> |
<a @click="backStyle">{{ $t('style.cancelChange') }}</a> |
<a @click="resetStyle">{{ $t('style.resetStyle') }}</a>
</template>
<div class="style-item">
<span class="style-label">字体族</span>
<a-input
class="style-input"
v-model:value="currentFontFamily"
/>
<div class="input-item">
<span class="input-label">{{ $t('style.longCaption') }}</span>
<a-select
class="input-area"
v-model:value="currentLineBreak"
:options="captionStyle.iBreakOptions"
></a-select>
</div>
<div class="style-item">
<span class="style-label">字体颜色</span>
<div class="input-item">
<span class="input-label">{{ $t('style.fontFamily') }}</span>
<a-input
class="style-input"
class="input-area"
v-model:value="currentFontFamily"
/>
</div>
<div class="input-item">
<span class="input-label">{{ $t('style.fontColor') }}</span>
<a-input
class="input-area"
type="color"
v-model:value="currentFontColor"
/>
<div class="style-item-value">{{ currentFontColor }}</div>
<div class="input-item-value">{{ currentFontColor }}</div>
</div>
<div class="style-item">
<span class="style-label">字体大小</span>
<div class="input-item">
<span class="input-label">{{ $t('style.fontSize') }}</span>
<a-input
class="style-input"
class="input-area"
type="range"
min="0" max="64"
min="0" max="72"
v-model:value="currentFontSize"
/>
<div class="style-item-value">{{ currentFontSize }}px</div>
/>
<div class="input-item-value">{{ currentFontSize }}px</div>
</div>
<div class="style-item">
<span class="style-label">背景颜色</span>
<div class="input-item">
<span class="input-label">{{ $t('style.fontWeight') }}</span>
<a-input
class="style-input"
class="input-area"
type="range"
min="1" max="9"
v-model:value="currentFontWeight"
/>
<div class="input-item-value">{{ currentFontWeight*100 }}</div>
</div>
<div class="input-item">
<span class="input-label">{{ $t('style.background') }}</span>
<a-input
class="input-area"
type="color"
v-model:value="currentBackground"
/>
<div class="style-item-value">{{ currentBackground }}</div>
<div class="input-item-value">{{ currentBackground }}</div>
</div>
<div class="style-item">
<span class="style-label">背景透明度</span>
<div class="input-item">
<span class="input-label">{{ $t('style.opacity') }}</span>
<a-input
class="style-input"
class="input-area"
type="range"
min="0"
max="100"
v-model:value="currentOpacity"
/>
<div class="style-item-value">{{ currentOpacity }}</div>
<div class="input-item-value">{{ currentOpacity }}%</div>
</div>
<div class="style-item">
<span class="style-label">显示预览</span>
<a-switch v-model:checked="displayPreview" />
<span class="style-label">显示翻译</span>
<a-switch v-model:checked="currentTransDisplay" />
<div class="input-item">
<span class="input-label">{{ $t('style.preview') }}</span>
<a-switch v-model:checked="currentPreview" />
<span style="display:inline-block;width:10px;"></span>
<div style="display: inline-block;">
<span class="switch-label">{{ $t('style.translation') }}</span>
<a-switch v-model:checked="currentTransDisplay" />
</div>
<span style="display:inline-block;width:10px;"></span>
<div style="display: inline-block;">
<span class="switch-label">{{ $t('style.textShadow') }}</span>
<a-switch v-model:checked="currentTextShadow" />
</div>
</div>
<div v-show="currentTransDisplay">
<a-card size="small" title="翻译样式设置">
<a-card size="small" :title="$t('style.trans.title')">
<template #extra>
<a @click="useSameStyle">使用相同样式</a>
<a @click="useSameStyle">{{ $t('style.trans.useSame') }}</a>
</template>
<div class="style-item">
<span class="style-label">翻译字体</span>
<div class="input-item">
<span class="input-label">{{ $t('style.fontFamily') }}</span>
<a-input
class="style-input"
class="input-area"
v-model:value="currentTransFontFamily"
/>
</div>
<div class="style-item">
<span class="style-label">翻译颜色</span>
<div class="input-item">
<span class="input-label">{{ $t('style.fontColor') }}</span>
<a-input
class="style-input"
class="input-area"
type="color"
v-model:value="currentTransFontColor"
/>
<div class="style-item-value">{{ currentTransFontColor }}</div>
<div class="input-item-value">{{ currentTransFontColor }}</div>
</div>
<div class="style-item">
<span class="style-label">翻译大小</span>
<div class="input-item">
<span class="input-label">{{ $t('style.fontSize') }}</span>
<a-input
class="style-input"
class="input-area"
type="range"
min="0" max="64"
min="0" max="72"
v-model:value="currentTransFontSize"
/>
<div class="style-item-value">{{ currentTransFontSize }}px</div>
/>
<div class="input-item-value">{{ currentTransFontSize }}px</div>
</div>
<div class="input-item">
<span class="input-label">{{ $t('style.fontWeight') }}</span>
<a-input
class="input-area"
type="range"
min="1" max="9"
v-model:value="currentTransFontWeight"
/>
<div class="input-item-value">{{ currentTransFontWeight*100 }}</div>
</div>
</a-card>
</div>
<div v-show="currentTextShadow" style="margin-top:10px;">
<a-card size="small" :title="$t('style.shadow.title')">
<div class="input-item">
<span class="input-label">{{ $t('style.shadow.offsetX') }}</span>
<a-input
class="input-area"
type="range"
min="-10" max="10"
v-model:value="currentOffsetX"
/>
<div class="input-item-value">{{ currentOffsetX }}px</div>
</div>
<div class="input-item">
<span class="input-label">{{ $t('style.shadow.offsetY') }}</span>
<a-input
class="input-area"
type="range"
min="-10" max="10"
v-model:value="currentOffsetY"
/>
<div class="input-item-value">{{ currentOffsetY }}px</div>
</div>
<div class="input-item">
<span class="input-label">{{ $t('style.shadow.blur') }}</span>
<a-input
class="input-area"
type="range"
min="0" max="12"
v-model:value="currentBlur"
/>
<div class="input-item-value">{{ currentBlur }}px</div>
</div>
<div class="input-item">
<span class="input-label">{{ $t('style.shadow.color') }}</span>
<a-input
class="input-area"
type="color"
v-model:value="currentTextShadowColor"
/>
<div class="input-item-value">{{ currentTextShadowColor }}</div>
</div>
</a-card>
</div>
</a-card>
<Teleport to="body">
<div
v-if="displayPreview"
v-if="currentPreview"
class="preview-container"
:style="{
backgroundColor: addOpicityToColor(currentBackground, currentOpacity)
backgroundColor: addOpicityToColor(currentBackground, currentOpacity),
textShadow: currentTextShadow ? `${currentOffsetX}px ${currentOffsetY}px ${currentBlur}px ${currentTextShadowColor}` : 'none'
}"
>
<p class="preview-caption"
<p :class="[currentLineBreak?'':'left-ellipsis']"
:style="{
fontFamily: currentFontFamily,
fontSize: currentFontSize + 'px',
color: currentFontColor
}">
{{ "This is a preview of subtitle styles." }}
color: currentFontColor,
fontWeight: currentFontWeight * 100
}">
<span v-if="captionData.length">{{ captionData[captionData.length-1].text }}</span>
<span v-else>{{ $t('example.original') }}</span>
</p>
<p class="preview-translation" v-if="currentTransDisplay"
<p :class="[currentLineBreak?'':'left-ellipsis']"
v-if="currentTransDisplay"
:style="{
fontFamily: currentTransFontFamily,
fontSize: currentTransFontSize + 'px',
color: currentTransFontColor
color: currentTransFontColor,
fontWeight: currentTransFontWeight * 100
}"
>这是字幕样式预览(翻译)</p>
</div>
>
<span v-if="captionData.length">{{ captionData[captionData.length-1].translation }}</span>
<span v-else>{{ $t('example.translation') }}</span>
</p>
</div>
</Teleport>
</template>
@@ -126,20 +217,35 @@
import { ref, watch } from 'vue'
import { useCaptionStyleStore } from '@renderer/stores/captionStyle'
import { storeToRefs } from 'pinia'
import { notification } from 'ant-design-vue'
import { useI18n } from 'vue-i18n'
import { useCaptionLogStore } from '@renderer/stores/captionLog';
const captionLog = useCaptionLogStore();
const { captionData } = storeToRefs(captionLog);
const { t } = useI18n()
const captionStyle = useCaptionStyleStore()
const { changeSignal } = storeToRefs(captionStyle)
const currentLineBreak = ref<number>(0)
const currentFontFamily = ref<string>('sans-serif')
const currentFontSize = ref<number>(24)
const currentFontColor = ref<string>('#000000')
const currentFontWeight = ref<number>(4)
const currentBackground = ref<string>('#dbe2ef')
const currentOpacity = ref<number>(50)
const currentPreview = ref<boolean>(true)
const currentTransDisplay = ref<boolean>(true)
const currentTransFontFamily = ref<string>('sans-serif')
const currentTransFontSize = ref<number>(24)
const currentTransFontColor = ref<string>('#000000')
const displayPreview = ref<boolean>(true)
const currentTransFontWeight = ref<number>(4)
const currentTextShadow = ref<boolean>(false)
const currentOffsetX = ref<number>(2)
const currentOffsetY = ref<number>(2)
const currentBlur = ref<number>(0)
const currentTextShadowColor = ref<string>('#ffffff')
function addOpicityToColor(color: string, opicity: number) {
const opicityValue = Math.round(opicity * 255 / 100);
@@ -151,87 +257,110 @@ function useSameStyle(){
currentTransFontFamily.value = currentFontFamily.value;
currentTransFontSize.value = currentFontSize.value;
currentTransFontColor.value = currentFontColor.value;
currentTransFontWeight.value = currentFontWeight.value;
}
function applyStyle(){
captionStyle.lineBreak = currentLineBreak.value;
captionStyle.fontFamily = currentFontFamily.value;
captionStyle.fontSize = currentFontSize.value;
captionStyle.fontColor = currentFontColor.value;
captionStyle.fontWeight = currentFontWeight.value;
captionStyle.background = currentBackground.value;
captionStyle.opacity = currentOpacity.value;
captionStyle.showPreview = currentPreview.value;
captionStyle.transDisplay = currentTransDisplay.value;
captionStyle.transFontFamily = currentTransFontFamily.value;
captionStyle.transFontSize = currentTransFontSize.value;
captionStyle.transFontColor = currentTransFontColor.value;
captionStyle.transFontWeight = currentTransFontWeight.value;
captionStyle.textShadow = currentTextShadow.value;
captionStyle.offsetX = currentOffsetX.value;
captionStyle.offsetY = currentOffsetY.value;
captionStyle.blur = currentBlur.value;
captionStyle.textShadowColor = currentTextShadowColor.value;
captionStyle.sendStyleChange();
captionStyle.sendStylesChange();
notification.open({
placement: 'topLeft',
message: t('noti.styleChange'),
description: t('noti.styleInfo')
});
}
function resetStyle(){
function backStyle(){
currentLineBreak.value = captionStyle.lineBreak;
currentFontFamily.value = captionStyle.fontFamily;
currentFontSize.value = captionStyle.fontSize;
currentFontColor.value = captionStyle.fontColor;
currentFontWeight.value = captionStyle.fontWeight;
currentBackground.value = captionStyle.background;
currentOpacity.value = captionStyle.opacity;
currentPreview.value = captionStyle.showPreview;
currentTransDisplay.value = captionStyle.transDisplay;
currentTransFontFamily.value = captionStyle.transFontFamily;
currentTransFontSize.value = captionStyle.transFontSize;
currentTransFontColor.value = captionStyle.transFontColor;
currentTransFontWeight.value = captionStyle.transFontWeight;
currentTextShadow.value = captionStyle.textShadow;
currentOffsetX.value = captionStyle.offsetX;
currentOffsetY.value = captionStyle.offsetY;
currentBlur.value = captionStyle.blur;
currentTextShadowColor.value = captionStyle.textShadowColor;
}
function resetStyle() {
captionStyle.sendStylesReset();
}
watch(changeSignal, (val) => {
if(val == true) {
resetStyle();
backStyle();
captionStyle.changeSignal = false;
}
})
</script>
<style scoped>
.caption-button {
display: flex;
justify-content: center;
@import url(../assets/input.css);
.general-note {
padding: 10px 10px 0;
max-width: min(36vw, 400px);
}
.style-item {
margin: 10px 0;
}
.style-label {
display: inline-block;
width: 80px;
text-align: right;
margin-right: 10px;
}
.style-input {
width: calc(100% - 100px);
min-width: 100px;
}
.style-item-value {
width: 80px;
text-align: right;
font-size: 12px;
color: #666
.hover-label {
color: #1668dc;
cursor: pointer;
font-weight: bold;
}
.preview-container {
line-height: 2em;
width: 60%;
text-align: center;
position: absolute;
padding: 20px;
padding: 10px;
border-radius: 10px;
left: 50%;
left: 64%;
transform: translateX(-50%);
bottom: 20px;
}
.preview-container p {
text-align: center;
margin: 0;
line-height: 1.5em;
line-height: 1.6em;
}
</style>
.left-ellipsis {
white-space: nowrap;
overflow: hidden;
direction: rtl;
text-align: left;
}
.left-ellipsis > span {
direction: ltr;
display: inline-block;
}
</style>


@@ -0,0 +1,249 @@
<template>
<div style="height: 20px;"></div>
<a-card size="small" :title="$t('engine.title')">
<template #extra>
<a @click="applyChange">{{ $t('engine.applyChange') }}</a> |
<a @click="cancelChange">{{ $t('engine.cancelChange') }}</a>
</template>
<div class="input-item">
<span class="input-label">{{ $t('engine.sourceLang') }}</span>
<a-select
class="input-area"
v-model:value="currentSourceLang"
:options="langList"
></a-select>
</div>
<div class="input-item">
<span class="input-label">{{ $t('engine.transLang') }}</span>
<a-select
:disabled="currentEngine === 'vosk'"
class="input-area"
v-model:value="currentTargetLang"
:options="langList.filter((item) => item.value !== 'auto')"
></a-select>
</div>
<div class="input-item">
<span class="input-label">{{ $t('engine.captionEngine') }}</span>
<a-select
class="input-area"
v-model:value="currentEngine"
:options="captionEngine"
></a-select>
</div>
<div class="input-item">
<span class="input-label">{{ $t('engine.audioType') }}</span>
<a-select
class="input-area"
v-model:value="currentAudio"
:options="audioType"
></a-select>
</div>
<div class="input-item">
<span class="input-label">{{ $t('engine.enableTranslation') }}</span>
<a-switch v-model:checked="currentTranslation" />
<span style="display:inline-block;width:10px;"></span>
<div style="display: inline-block;">
<span class="switch-label">{{ $t('engine.customEngine') }}</span>
<a-switch v-model:checked="currentCustomized" />
</div>
<span style="display:inline-block;width:10px;"></span>
<div style="display: inline-block;">
<span class="switch-label">{{ $t('engine.showMore') }}</span>
<a-switch v-model:checked="showMore" />
</div>
</div>
<a-card size="small" :title="$t('engine.custom.title')" v-show="currentCustomized">
<template #extra>
<a-popover>
<template #content>
<p class="customize-note">{{ $t('engine.custom.note') }}</p>
</template>
<a><InfoCircleOutlined />{{ $t('engine.custom.attention') }}</a>
</a-popover>
</template>
<div class="input-item">
<span class="input-label">{{ $t('engine.custom.app') }}</span>
<a-input
class="input-area"
v-model:value="currentCustomizedApp"
></a-input>
</div>
<div class="input-item">
<span class="input-label">{{ $t('engine.custom.command') }}</span>
<a-input
class="input-area"
v-model:value="currentCustomizedCommand"
></a-input>
</div>
</a-card>
<a-card size="small" :title="$t('engine.showMore')" v-show="showMore" style="margin-top:10px;">
<div class="input-item">
<a-popover>
<template #content>
<p class="label-hover-info">{{ $t('engine.apikeyInfo') }}</p>
</template>
<span class="input-label info-label">{{ $t('engine.apikey') }}</span>
</a-popover>
<a-input
class="input-area"
type="password"
v-model:value="currentAPI_KEY"
/>
</div>
<div class="input-item">
<a-popover>
<template #content>
<p class="label-hover-info">{{ $t('engine.modelPathInfo') }}</p>
</template>
<span class="input-label info-label">{{ $t('engine.modelPath') }}</span>
</a-popover>
<span
class="input-folder"
@click="selectFolderPath"
><span><FolderOpenOutlined /></span></span>
<a-input
class="input-area"
style="width:calc(100% - 140px);"
v-model:value="currentModelPath"
/>
</div>
</a-card>
</a-card>
<div style="height: 20px;"></div>
</template>
<script setup lang="ts">
import { ref, computed, watch } from 'vue'
import { storeToRefs } from 'pinia'
import { useGeneralSettingStore } from '@renderer/stores/generalSetting'
import { useEngineControlStore } from '@renderer/stores/engineControl'
import { notification } from 'ant-design-vue'
import { FolderOpenOutlined, InfoCircleOutlined } from '@ant-design/icons-vue';
import { useI18n } from 'vue-i18n'
const { t } = useI18n()
const showMore = ref(false)
const engineControl = useEngineControlStore()
const { captionEngine, audioType, changeSignal } = storeToRefs(engineControl)
const currentSourceLang = ref('auto')
const currentTargetLang = ref('zh')
const currentEngine = ref<string>('gummy')
const currentAudio = ref<0 | 1>(0)
const currentTranslation = ref<boolean>(false)
const currentAPI_KEY = ref<string>('')
const currentModelPath = ref<string>('')
const currentCustomized = ref<boolean>(false)
const currentCustomizedApp = ref('')
const currentCustomizedCommand = ref('')
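// The selectable languages are derived from whichever engine entry is
// currently chosen in the captionEngine list.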
const langList = computed(() => {
for(let item of captionEngine.value){
if(item.value === currentEngine.value) {
return item.languages
}
}
return []
})
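// Commit the staged form values to the Pinia store, broadcast the
// change via sendControlsChange(), and remind the user that a running
// engine must be restarted for the new settings to take effect.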
function applyChange(){
engineControl.sourceLang = currentSourceLang.value
engineControl.targetLang = currentTargetLang.value
engineControl.engine = currentEngine.value
engineControl.audio = currentAudio.value
engineControl.translation = currentTranslation.value
engineControl.API_KEY = currentAPI_KEY.value
engineControl.modelPath = currentModelPath.value
engineControl.customized = currentCustomized.value
engineControl.customizedApp = currentCustomizedApp.value
engineControl.customizedCommand = currentCustomizedCommand.value
engineControl.sendControlsChange()
notification.open({
placement: 'topLeft',
message: t('noti.engineChange'),
description: t('noti.changeInfo')
});
}
function cancelChange(){
currentSourceLang.value = engineControl.sourceLang
currentTargetLang.value = engineControl.targetLang
currentEngine.value = engineControl.engine
currentAudio.value = engineControl.audio
currentTranslation.value = engineControl.translation
currentAPI_KEY.value = engineControl.API_KEY
currentModelPath.value = engineControl.modelPath
currentCustomized.value = engineControl.customized
currentCustomizedApp.value = engineControl.customizedApp
currentCustomizedCommand.value = engineControl.customizedCommand
}
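// Ask the main process for a native folder picker; a falsy result
// means the dialog was cancelled.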
function selectFolderPath() {
window.electron.ipcRenderer.invoke('control.folder.select').then((folderPath) => {
if(!folderPath) return
currentModelPath.value = folderPath
})
}
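// changeSignal is raised when the store's settings change outside this
// form; re-sync the local fields and clear the flag.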
watch(changeSignal, (val) => {
if(val) {
cancelChange();
engineControl.changeSignal = false;
}
})
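// Changing the engine resets the language pair: Vosk does recognition
// only (the model determines the language), while Gummy defaults the
// translation target to the current UI language.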
watch(currentEngine, (val) => {
if(val === 'vosk'){
currentSourceLang.value = 'auto'
currentTargetLang.value = ''
}
else if(val === 'gummy'){
currentSourceLang.value = 'auto'
currentTargetLang.value = useGeneralSettingStore().uiLanguage
}
})
</script>
<style scoped>
@import url(../assets/input.css);
.label-hover-info {
margin-top: 10px;
max-width: min(36vw, 380px);
}
.info-label {
color: #1677ff;
cursor: pointer;
}
.input-folder {
display:inline-block;
width: 40px;
font-size:1.38em;
cursor: pointer;
transition: all 0.25s;
}
.input-folder>span {
padding: 0 2px;
border: 2px solid #1677ff;
color: #1677ff;
border-radius: 30%;
}
.input-folder:hover {
transform: scale(1.1);
}
.customize-note {
padding: 10px 10px 0;
color: red;
max-width: min(40vw, 480px);
}
</style>


@@ -0,0 +1,259 @@
<template>
<div class="caption-stat">
<a-row>
<a-col :span="6">
<a-statistic
:title="$t('status.engine')"
:value="customized?$t('status.customized'):engine"
/>
</a-col>
<a-popover :title="$t('status.engineStatus')">
<template #content>
<a-row class="engine-status">
<a-col :flex="1" :title="$t('status.pid')" style="cursor:pointer;">
<div class="engine-status-title">pid</div>
<div>{{ pid }}</div>
</a-col>
<a-col :flex="1" :title="$t('status.ppid')" style="cursor:pointer;">
<div class="engine-status-title">ppid</div>
<div>{{ ppid }}</div>
</a-col>
<a-col :flex="1" :title="$t('status.port')" style="cursor:pointer;">
<div class="engine-status-title">port</div>
<div>{{ port }}</div>
</a-col>
<a-col :flex="1" :title="$t('status.cpu')" style="cursor:pointer;">
<div class="engine-status-title">cpu</div>
<div>{{ cpu.toFixed(1) }}%</div>
</a-col>
<a-col :flex="1" :title="$t('status.mem')" style="cursor:pointer;">
<div class="engine-status-title">mem</div>
<div>{{ (mem/1024/1024).toFixed(2) }}MB</div>
</a-col>
<a-col :flex="1" :title="$t('status.elapsed')" style="cursor:pointer;">
<div class="engine-status-title">elapsed</div>
<div>{{ (elapsed/1000).toFixed(0) }}s</div>
</a-col>
</a-row>
</template>
<a-col :span="6" @mouseenter="getEngineInfo" style="cursor: pointer;">
<a-statistic
:title="$t('status.status')"
:value="engineEnabled?$t('status.started'):$t('status.stopped')"
>
<template #suffix v-if="engineEnabled">
<InfoCircleOutlined style="font-size:18px;color:#1677ff"/>
</template>
</a-statistic>
</a-col>
</a-popover>
<a-col :span="6">
<a-statistic :title="$t('status.logNumber')" :value="captionData.length" />
</a-col>
<a-col :span="6">
<div class="about-tag">{{ $t('status.aboutProj') }}</div>
<GithubOutlined class="proj-info" @click="showAbout = true"/>
</a-col>
</a-row>
</div>
<div class="caption-control">
<a-button
type="primary"
class="control-button"
@click="openCaptionWindow"
>{{ $t('status.openCaption') }}</a-button>
<a-button
class="control-button"
:loading="pending && !engineEnabled"
:disabled="pending || engineEnabled"
@click="startEngine"
>{{ $t('status.startEngine') }}</a-button>
<a-button
danger class="control-button"
:loading="pending && engineEnabled"
:disabled="pending || !engineEnabled"
@click="stopEngine"
>{{ $t('status.stopEngine') }}</a-button>
</div>
<a-modal v-model:open="showAbout" :title="$t('status.about.title')" :footer="null">
<div class="about-modal-content">
<h2 class="about-title">{{ $t('status.about.proj') }}</h2>
<p class="about-desc">{{ $t('status.about.desc') }}</p>
<a-divider />
<div class="about-info">
<p><b>{{ $t('status.about.version') }}</b><a-tag color="green">v0.6.0</a-tag></p>
<p>
<b>{{ $t('status.about.author') }}</b>
<a
href="https://github.com/HiMeditator"
target="_blank"
>
<a-tag color="blue">HiMeditator</a-tag>
</a>
</p>
<p>
<b>{{ $t('status.about.projLink') }}</b>
<a href="https://github.com/HiMeditator/auto-caption" target="_blank">
<a-tag color="blue">GitHub | auto-caption</a-tag>
</a>
</p>
<p>
<b>{{ $t('status.about.manual') }}</b>
<a
:href="`https://github.com/HiMeditator/auto-caption/tree/main/docs/user-manual/${$t('lang')}.md`"
target="_blank"
>
<a-tag color="blue">GitHub | user-manual/{{ $t('lang') }}.md</a-tag>
</a>
</p>
<p>
<b>{{ $t('status.about.engineDoc') }}</b>
<a
:href="`https://github.com/HiMeditator/auto-caption/tree/main/docs/engine-manual/${$t('lang')}.md`"
target="_blank"
>
<a-tag color="blue">GitHub | engine-manual/{{ $t('lang') }}.md</a-tag>
</a>
</p>
</div>
<div class="about-date">{{ $t('status.about.date') }}</div>
</div>
</a-modal>
</template>
<script setup lang="ts">
import { EngineInfo } from '@renderer/types'
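// EngineInfo (from @renderer/types) is assumed to carry the numeric
// pid/ppid/port/cpu/mem/elapsed fields read in getEngineInfo() below.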
import { ref, watch } from 'vue'
import { storeToRefs } from 'pinia'
import { useCaptionLogStore } from '@renderer/stores/captionLog'
import { useEngineControlStore } from '@renderer/stores/engineControl'
import { GithubOutlined, InfoCircleOutlined } from '@ant-design/icons-vue';
const showAbout = ref(false)
const pending = ref(false)
const captionLog = useCaptionLogStore()
const { captionData } = storeToRefs(captionLog)
const engineControl = useEngineControlStore()
const { engineEnabled, engine, customized, errorSignal } = storeToRefs(engineControl)
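// Live statistics of the engine process, refreshed on demand by
// getEngineInfo().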
const pid = ref(0)
const ppid = ref(0)
const port = ref(0)
const cpu = ref(0)
const mem = ref(0)
const elapsed = ref(0)
function openCaptionWindow() {
window.electron.ipcRenderer.send('control.captionWindow.activate')
}
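// Vosk cannot start without a local model path, so fail fast with the
// store's error notification instead of contacting the main process.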
function startEngine() {
if(engineControl.engine === 'vosk' && engineControl.modelPath.trim() === '') {
engineControl.emptyModelPathErr()
return
}
pending.value = true
window.electron.ipcRenderer.send('control.engine.start')
}
function stopEngine() {
pending.value = true
window.electron.ipcRenderer.send('control.engine.stop')
}
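// Called on hover of the status card (@mouseenter above); pulls the
// current process statistics from the main process over IPC.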
function getEngineInfo() {
window.electron.ipcRenderer.invoke('control.engine.info').then((data: EngineInfo) => {
pid.value = data.pid
ppid.value = data.ppid
port.value = data.port
cpu.value = data.cpu
mem.value = data.mem
elapsed.value = data.elapsed
})
}
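// Clear the loading state once the engine toggles or reports an error.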
watch(engineEnabled, () => {
pending.value = false
})
watch(errorSignal, () => {
pending.value = false
errorSignal.value = false
})
</script>
<style scoped>
.engine-status {
width: max(420px, 36vw);
display: flex;
align-items: center;
padding: 5px 10px;
}
.engine-status-title {
font-size: 12px;
color: var(--tag-color);
}
.about-tag {
color: var(--tag-color);
margin-bottom: 16px;
}
.proj-info {
display: inline-block;
font-size: 24px;
cursor: pointer;
color: var(--icon-color);
}
.about-modal-content {
text-align: center;
padding: 8px 0 0 0;
}
.about-title {
font-size: 1.5em;
font-weight: bold;
margin-bottom: 0.2em;
}
.about-desc {
color: #666;
margin-bottom: 0.5em;
}
.about-info {
text-align: left;
display: inline-block;
margin: 0 auto;
font-size: 1em;
}
.about-info b {
margin-right: 1em;
}
.about-date {
margin-top: 1.5em;
color: #aaa;
font-size: 0.95em;
text-align: right;
}
.caption-control {
display: flex;
flex-wrap: wrap;
justify-content: center;
margin: 30px;
}
.control-button {
height: 40px;
margin: 20px;
font-size: 16px;
}
</style>


@@ -0,0 +1,63 @@
<template>
<a-card size="small" :title="$t('general.title')">
<template #extra>
<a-popover>
<template #content>
<p class="general-note">{{ $t('general.note') }}</p>
</template>
<a><InfoCircleOutlined /></a>
</a-popover>
</template>
<div>
<div class="input-item">
<span class="input-label">{{ $t('general.uiLanguage') }}</span>
<a-radio-group v-model:value="uiLanguage">
<a-radio-button value="zh">中文</a-radio-button>
<a-radio-button value="en">English</a-radio-button>
<a-radio-button value="ja">日本語</a-radio-button>
</a-radio-group>
</div>
<div class="input-item">
<span class="input-label">{{ $t('general.theme') }}</span>
<a-radio-group v-model:value="uiTheme">
<a-radio-button value="system">{{ $t('general.system') }}</a-radio-button>
<a-radio-button value="light">{{ $t('general.light') }}</a-radio-button>
<a-radio-button value="dark">{{ $t('general.dark') }}</a-radio-button>
</a-radio-group>
</div>
<div class="input-item">
<span class="input-label">{{ $t('general.barWidth') }}</span>
<a-input
type="range" class="span-input"
min="6" max="12" v-model:value="leftBarWidth"
/>
<div class="input-item-value">{{ (leftBarWidth * 100 / 24).toFixed(0) }}%</div>
</div>
</div>
</a-card>
</template>
<script setup lang="ts">
import { storeToRefs } from 'pinia'
import { useGeneralSettingStore } from '@renderer/stores/generalSetting'
import { InfoCircleOutlined } from '@ant-design/icons-vue';
const generalSettingStore = useGeneralSettingStore()
const { uiLanguage, uiTheme, leftBarWidth } = storeToRefs(generalSettingStore)
</script>
<style scoped>
@import url(../assets/input.css);
.span-input {
width: 100px;
}
.general-note {
padding: 10px 10px 0;
max-width: min(36vw, 400px);
}
</style>


@@ -0,0 +1,32 @@
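// Locale-keyed options for the audio-source selector:
// 0 = system audio output (speaker), 1 = system audio input (microphone).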
export const audioTypes = {
zh: [
{
value: 0,
label: '系统音频输出(扬声器)'
},
{
value: 1,
label: '系统音频输入(麦克风)'
}
],
en: [
{
value: 0,
label: 'System Audio Output (Speaker)'
},
{
value: 1,
label: 'System Audio Input (Microphone)'
}
],
ja: [
{
value: 0,
label: 'システム音声出力(スピーカー)'
},
{
value: 1,
label: 'システム音声入力(マイク)'
}
]
}


@@ -0,0 +1,78 @@
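// Locale-keyed catalogue of caption engines; each entry lists the
// source languages it accepts ('auto' means auto-detect for Gummy,
// or model-defined for Vosk).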
export const engines = {
zh: [
{
value: 'gummy',
label: '云端 - 阿里云 - Gummy',
languages: [
{ value: 'auto', label: '自动检测' },
{ value: 'en', label: '英语' },
{ value: 'zh', label: '中文' },
{ value: 'ja', label: '日语' },
{ value: 'ko', label: '韩语' },
{ value: 'de', label: '德语' },
{ value: 'fr', label: '法语' },
{ value: 'ru', label: '俄语' },
{ value: 'es', label: '西班牙语' },
{ value: 'it', label: '意大利语' },
]
},
{
value: 'vosk',
label: '本地 - Vosk',
languages: [
{ value: 'auto', label: '需要自行配置模型' },
]
}
],
en: [
{
value: 'gummy',
label: 'Cloud - Alibaba Cloud - Gummy',
languages: [
{ value: 'auto', label: 'Auto Detect' },
{ value: 'en', label: 'English' },
{ value: 'zh', label: 'Chinese' },
{ value: 'ja', label: 'Japanese' },
{ value: 'ko', label: 'Korean' },
{ value: 'de', label: 'German' },
{ value: 'fr', label: 'French' },
{ value: 'ru', label: 'Russian' },
{ value: 'es', label: 'Spanish' },
{ value: 'it', label: 'Italian' },
]
},
{
value: 'vosk',
label: 'Local - Vosk',
languages: [
{ value: 'auto', label: 'Model needs to be configured manually' },
]
}
],
ja: [
{
value: 'gummy',
label: 'クラウド - アリババクラウド - Gummy',
languages: [
{ value: 'auto', label: '自動検出' },
{ value: 'en', label: '英語' },
{ value: 'zh', label: '中国語' },
{ value: 'ja', label: '日本語' },
{ value: 'ko', label: '韓国語' },
{ value: 'de', label: 'ドイツ語' },
{ value: 'fr', label: 'フランス語' },
{ value: 'ru', label: 'ロシア語' },
{ value: 'es', label: 'スペイン語' },
{ value: 'it', label: 'イタリア語' },
]
},
{
value: 'vosk',
label: 'ローカル - Vosk',
languages: [
{ value: 'auto', label: 'モデルを手動で設定する必要があります' },
]
}
]
}


@@ -0,0 +1,32 @@
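// Locale-keyed options for long captions: 1 wraps onto extra lines
// (the window may grow taller), 0 truncates overflowing text.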
export const breakOptions = {
zh: [
{
value: 1,
label: '换行(可能造成字幕窗口高度增加)'
},
{
value: 0,
label: '不换行(省略掉超出字幕窗口宽度的内容)'
}
],
en: [
{
value: 1,
label: 'Wrap (may increase caption window height)'
},
{
value: 0,
label: 'Do not wrap (truncate content that exceeds caption window width)'
}
],
ja: [
{
value: 1,
label: '改行する(字幕ウィンドウの高さが増える可能性があります)'
},
{
value: 0,
label: '改行しない(字幕ウィンドウの幅を超える内容は省略します)'
}
]
}


@@ -0,0 +1,10 @@
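// Ant Design Vue theme presets: light keeps the default algorithm,
// dark opts into the built-in darkAlgorithm.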
import { theme } from 'ant-design-vue';
export const antDesignTheme = {
light: {
token: {}
},
dark: {
algorithm: theme.darkAlgorithm,
}
}


@@ -0,0 +1,20 @@
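// vue-i18n in Composition API mode (legacy: false); 'zh' is the
// initial locale, and the locale-specific config tables below are
// re-exported alongside it.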
import { createI18n } from 'vue-i18n';
import zh from './lang/zh';
import en from './lang/en';
import ja from './lang/ja';
export const i18n = createI18n({
legacy: false,
locale: 'zh',
messages: {
zh,
en,
ja
}
});
export * from './config/engine'
export * from './config/audio'
export * from './config/theme'
export * from './config/linebreak'


@@ -0,0 +1,148 @@
export default {
lang: "en",
example: {
"original": "这是字幕样式预览。",
"translation": "(Translation) This is a preview of caption styles."
},
noti: {
"restarted": "Caption Engine Restarted Successfully",
"started": "Caption Engine Started Successfully",
"sLang": "Source language: ",
"trans": ", translation: ",
"engine": ", caption engine: ",
"audio": ", audio type: ",
"sysout": "system audio output (speaker)",
"sysin": "system audio input (microphone)",
"tLang": ", target language: ",
"custom": "Type: Custom engine, engine path: ",
"args": ", command arguments: ",
"pidInfo": ", caption engine process PID: ",
"empty": "Model Path is Empty",
"emptyInfo": "The Vosk model path is empty. Please set the Vosk model path in the additional settings of the subtitle engine settings.",
"stopped": "Caption Engine Stopped",
"stoppedInfo": "The caption engine has stopped. You can click the 'Start Caption Engine' button to restart it.",
"error": "An error occurred",
"engineError": "The subtitle engine encountered an error and requested a forced exit.",
"socketError": "The Socket connection between the main program and the caption engine failed",
"engineChange": "Cpation Engine Configuration Changed",
"changeInfo": "If the caption engine is already running, you need to restart it for the changes to take effect.",
"styleChange": "Caption Style Changed",
"styleInfo": "Caption style changes have been saved and applied."
},
general: {
"title": "General Settings",
"uiLanguage": "Language",
"barWidth": "Width",
"note": "General Settings take effect immediately. Please note that changes to the Caption Engine Settings and Caption Style Settings will only take effect after clicking Apply.",
"theme": "Theme",
"light": "light",
"dark": "dark",
"system": "system"
},
engine: {
"title": "Caption Engine Settings",
"applyChange": "Apply Changes",
"cancelChange": "Cancel Changes",
"sourceLang": "Source",
"transLang": "Translation",
"captionEngine": "Engine",
"audioType": "Audio Type",
"systemOutput": "System Audio Output (Speaker)",
"systemInput": "System Audio Input (Microphone)",
"enableTranslation": "Translation",
"showMore": "More Settings",
"apikey": "API KEY",
"modelPath": "Model Path",
"apikeyInfo": "API KEY required for the Gummy subtitle engine, which needs to be obtained from the Alibaba Cloud Bailing platform. For more details, see the project user manual.",
"modelPathInfo": "The folder path of the model required by the Vosk subtitle engine. You need to download the required model to your local machine in advance. For more details, see the project user manual.",
"customEngine": "Custom Engine",
custom: {
"title": "Custom Caption Engine",
"attention": "Attention",
"note": "Note: Allows users to provide captions using a custom engine. The provided engine should be able to start via the command line and can specify parameters through command-line instructions. The engine needs to communicate with the node.js backend using standard output. For more information, refer to the project's documentation.",
"app": "Engine Path",
"command": "Command"
}
},
style: {
"title": "Caption Style Settings",
"applyStyle": "Apply",
"cancelChange": "Cancel",
"resetStyle": "Reset",
"longCaption": "LongCaption",
"fontFamily": "Font Family",
"fontColor": "Font Color",
"fontSize": "Font Size",
"fontWeight": "Font Weight",
"background": "Background",
"opacity": "Opacity",
"preview": "Preview",
"translation": "Show Translation",
trans: {
"title": "Translation Style Settings",
"useSame": "Use Original Style"
},
"textShadow": "Text Shadow",
shadow: {
"title": "Text Shadow Settings",
"offsetX": "Offset X",
"offsetY": "Offset Y",
"blur": "Blur",
"color": "Color"
}
},
status: {
"engine": "Caption Engine",
"engineStatus": "Caption Engine Status",
"pid": "Process ID",
"ppid": "Parent Process ID",
"cpu": "CPU Usage",
"port": "Socket Port Number",
"mem": "Memory Usage",
"elapsed": "Running Time",
"customized": "Customized",
"status": "Engine Status",
"started": "Started",
"stopped": "Not Started",
"logNumber": "Caption Count",
"aboutProj": "About Project",
"openCaption": "Open Caption Window",
"startEngine": "Start Caption Engine",
"restartEngine": "Restart Caption Engine",
"stopEngine": "Stop Caption Engine",
about: {
"title": "About This Project",
"proj": "Auto Caption Project",
"desc": "A cross-platform real-time caption display software supporting multiple languages.",
"version": "Software Version",
"author": "Project Author",
"projLink": "Project Link",
"manual": "User Manual",
"engineDoc": "Caption Engine Manual",
"date": "July 30, 2025"
}
},
log: {
"title": "Caption Log",
"changeTime": "Modify Time",
"baseTime": "First Caption Start Time",
"hour": "Hour",
"min": "Minute",
"sec": "Second",
"ms": "Millisecond",
"export": "Export Log",
"copy": "Copy Log",
"exportOptions": "Export Options",
"exportFormat": "Format",
"exportContent": "Content",
"copyOptions": "Copy Options",
"addIndex": "Add Index",
"copyTime": "Copy Time",
"copyContent": "Content",
"both": "Both",
"source": "Original",
"translation": "Translation",
"copySuccess": "Subtitle copied to clipboard",
"clear": "Clear Log"
}
}


@@ -0,0 +1,148 @@
export default {
lang: "ja",
example: {
"original": "这是字幕样式预览。",
"translation": "(翻訳)これは字幕のスタイルのプレビューです。"
},
noti: {
"restarted": "字幕エンジンが再起動しました",
"started": "字幕エンジンを開始しました",
"sLang": "ソース言語:",
"trans": "、翻訳する:",
"engine": "、字幕エンジン:",
"audio": "、オーディオタイプ:",
"sysout": "システムオーディオ出力(スピーカー)",
"sysin": "システムオーディオ入力(マイク)",
"tLang": "、翻訳先の言語:",
"custom": "タイプ:カスタムエンジン、エンジンパス:",
"args": "、コマンド引数:",
"pidInfo": "、字幕エンジンプロセス PID",
"empty": "モデルパスが空です",
"emptyInfo": "Vosk モデルのパスが空です。字幕エンジン設定の追加設定で Vosk モデルのパスを設定してください。",
"stopped": "字幕エンジンが停止しました",
"stoppedInfo": "字幕エンジンが停止しました。再起動するには「字幕エンジンを開始」ボタンをクリックしてください。",
"error": "エラーが発生しました",
"engineError": "字幕エンジンにエラーが発生し、強制終了が要求されました。",
"socketError": "メインプログラムと字幕エンジンの Socket 接続に失敗しました",
"engineChange": "字幕エンジンの設定が変更されました",
"changeInfo": "字幕エンジンがすでに起動している場合、変更を有効にするには再起動が必要です。",
"styleChange": "字幕のスタイルが変更されました",
"styleInfo": "字幕のスタイル変更が保存され、適用されました"
},
general: {
"title": "一般設定",
"uiLanguage": "言語設定",
"barWidth": "左側の幅",
"note": "一般設定はすぐに有効になります。字幕エンジンの設定と字幕スタイルの設定を変更した場合は、適用ボタンをクリックしてから有効になりますのでご注意ください。",
"theme": "テーマ",
"light": "明るい",
"dark": "暗い",
"system": "システム"
},
engine: {
"title": "字幕エンジン設定",
"applyChange": "変更を適用",
"cancelChange": "変更をキャンセル",
"sourceLang": "ソース言語",
"transLang": "翻訳言語",
"captionEngine": "エンジン",
"audioType": "オーディオ",
"systemOutput": "システムオーディオ出力(スピーカー)",
"systemInput": "システムオーディオ入力(マイク)",
"enableTranslation": "翻訳",
"showMore": "詳細設定",
"apikey": "API KEY",
"modelPath": "モデルパス",
"apikeyInfo": "Gummy 字幕エンジンに必要な API KEY は、アリババクラウド百煉プラットフォームから取得する必要があります。詳細情報はプロジェクトのユーザーマニュアルをご覧ください。",
"modelPathInfo": "Vosk 字幕エンジンに必要なモデルのフォルダパスです。必要なモデルを事前にローカルマシンにダウンロードする必要があります。詳細情報はプロジェクトのユーザーマニュアルをご覧ください。",
"customEngine": "カスタムエンジン",
custom: {
"title": "カスタムキャプションエンジン",
"attention": "注意事項",
"note": "注意:ユーザーがカスタムエンジンを使用して字幕を提供できるようにします。提供するエンジンは、コマンドラインから起動でき、パラメータをコマンドラインの指示で指定できる必要があります。エンジンは、標準出力を使用して node.js バックエンドと通信する必要があります。詳細については、プロジェクトドキュメントを参照してください。",
"app": "パス",
"command": "コマンド"
}
},
style: {
"title": "字幕スタイル設定",
"applyStyle": "適用",
"cancelChange": "キャンセル",
"resetStyle": "リセット",
"longCaption": "長い字幕",
"fontFamily": "フォント",
"fontColor": "カラー",
"fontSize": "サイズ",
"fontWeight": "文字の太さ",
"background": "背景色",
"opacity": "不透明度",
"preview": "プレビュー",
"translation": "翻訳表示",
trans: {
"title": "翻訳スタイル設定",
"useSame": "原文のスタイルを使用"
},
"textShadow": "文字影",
shadow: {
"title": "テキストの影設定",
"offsetX": "Offset X",
"offsetY": "Offset Y",
"blur": "ぼかし半径",
"color": "影の色"
}
},
status: {
"engine": "字幕エンジン",
"engineStatus": "字幕エンジンの状態",
"pid": "プロセス ID",
"ppid": "親プロセス ID",
"port": "Socket ポート番号",
"cpu": "CPU 使用率",
"mem": "メモリ使用量",
"elapsed": "稼働時間",
"customized": "カスタマイズ済み",
"status": "エンジン状態",
"started": "開始済み",
"stopped": "未開始",
"logNumber": "字幕数",
"aboutProj": "プロジェクト情報",
"openCaption": "字幕ウィンドウを開く",
"startEngine": "字幕エンジンを開始",
"restartEngine": "字幕エンジンを再起動",
"stopEngine": "字幕エンジンを停止",
about: {
"title": "このプロジェクトについて",
"proj": "Auto Caption プロジェクト",
"desc": "複数の言語をサポートするクロスプラットフォームのリアルタイム字幕表示ソフトウェア。",
"version": "ソフトウェアバージョン",
"author": "プロジェクト作者",
"projLink": "プロジェクトリンク",
"manual": "ユーザーマニュアル",
"engineDoc": "字幕エンジンマニュアル",
"date": "2025 年 7 月 30 日"
}
},
log: {
"title": "字幕ログ",
"changeTime": "時間を変更",
"baseTime": "最初の字幕開始時間",
"hour": "時",
"min": "分",
"sec": "秒",
"ms": "ミリ秒",
"export": "エクスポート",
"copy": "ログをコピー",
"exportOptions": "エクスポートオプション",
"exportFormat": "形式",
"exportContent": "内容",
"copyOptions": "コピー設定",
"addIndex": "順序番号",
"copyTime": "時間",
"copyContent": "内容",
"both": "すべて",
"source": "原文",
"translation": "翻訳",
"copySuccess": "字幕がクリップボードにコピーされました",
"clear": "ログをクリア"
}
}


@@ -0,0 +1,148 @@
export default {
lang: "zh",
example: {
"original": "This is a preview of caption styles. ",
"translation": "(翻译)这是字幕样式预览。"
},
noti: {
"restarted": "字幕引擎重启成功",
"started": "字幕引擎启动成功",
"sLang": "源语言:",
"trans": ",是否翻译:",
"engine": ",字幕引擎:",
"audio": ",音频类型:",
"sysout": "系统音频输出(扬声器)",
"sysin": "系统音频输入(麦克风)",
"tLang": ",翻译语言:",
"custom": "类型:自定义引擎,引擎路径:",
"args": ",命令参数:",
"pidInfo": ",字幕引擎进程 PID",
"empty": "模型路径为空",
"emptyInfo": "Vosk 模型模型路径为空,请在字幕引擎设置的更多设置中设置 Vosk 模型的路径。",
"stopped": "字幕引擎停止",
"stoppedInfo": "字幕引擎已经停止,可点击“启动字幕引擎”按钮重新启动",
"error": "发生错误",
"engineError": "字幕引擎发生错误并请求强制退出",
"socketError": "主程序与字幕引擎的 Socket 连接未成功",
"engineChange": "字幕引擎配置已更改",
"changeInfo": "如果字幕引擎已经启动,需要重启字幕引擎修改才会生效",
"styleChange": "字幕样式已修改",
"styleInfo": "字幕样式修改已经保存并生效"
},
general: {
"title": "通用设置",
"uiLanguage": "界面语言",
"barWidth": "左侧宽度",
"note": "通用设置修改后立即生效。注意字幕引擎设置和字幕样式的设置修改后需要点击应用后才会生效。",
"theme": "主题",
"light": "浅色",
"dark": "深色",
"system": "系统"
},
engine: {
"title": "字幕引擎设置",
"applyChange": "应用更改",
"cancelChange": "取消更改",
"sourceLang": "源语言",
"transLang": "翻译语言",
"captionEngine": "字幕引擎",
"audioType": "音频类型",
"systemOutput": "系统音频输出(扬声器)",
"systemInput": "系统音频输入(麦克风)",
"enableTranslation": "启用翻译",
"showMore": "更多设置",
"apikey": "API KEY",
"modelPath": "模型路径",
"apikeyInfo": "Gummy 字幕引擎需要的 API KEY需要在阿里云百炼平台获取。详细信息见项目用户手册。",
"modelPathInfo": "Vosk 字幕引擎需要的模型的文件夹路径,需要提前下载需要的模型到本地。信息详情见项目用户手册。",
"customEngine": "自定义引擎",
custom: {
"title": "自定义字幕引擎",
"attention": "注意事项",
"note": "说明:允许用户使用自定义引擎提供字幕。提供的引擎要能通过命令行启动,且可以提供命令行指令来指定参数。引擎需要使用标准输出与软件 node.js 后端进行通信。详细信息参考项目文档。",
"app": "引擎路径",
"command": "引擎指令"
}
},
style: {
"title": "字幕样式设置",
"applyStyle": "应用样式",
"cancelChange": "取消更改",
"resetStyle": "恢复默认",
"longCaption": "长字幕",
"fontFamily": "字体族",
"fontColor": "字体颜色",
"fontSize": "字体大小",
"fontWeight": "字体粗细",
"background": "背景颜色",
"opacity": "不透明度",
"preview": "显示预览",
"translation": "显示翻译",
trans: {
"title": "翻译样式设置",
"useSame": "使用原文样式"
},
"textShadow": "文本阴影",
shadow: {
"title": "文本阴影设置",
"offsetX": "X轴偏移",
"offsetY": "Y轴偏移",
"blur": "模糊半径",
"color": "阴影颜色"
}
},
status: {
"engine": "字幕引擎",
"engineStatus": "字幕引擎状态",
"pid": "进程ID",
"ppid": "父进程ID",
"port": "Socket 端口号",
"cpu": "CPU使用率",
"mem": "内存使用量",
"elapsed": "运行时间",
"customized": "自定义",
"status": "引擎状态",
"started": "已启动",
"stopped": "未启动",
"logNumber": "字幕数量",
"aboutProj": "项目关于",
"openCaption": "打开字幕窗口",
"startEngine": "启动字幕引擎",
"restartEngine": "重启字幕引擎",
"stopEngine": "关闭字幕引擎",
about: {
"title": "关于本项目",
"proj": "Auto Caption 项目",
"desc": "一个跨平台的支持多种语言的实时字幕显示软件。",
"version": "软件版本",
"author": "项目作者",
"projLink": "项目链接",
"manual": "用户手册",
"engineDoc": "字幕引擎手册",
"date": "2025 年 7 月 30 日"
}
},
log: {
"title": "字幕记录",
"changeTime": "修改时间",
"baseTime": "首条字幕起始时间",
"hour": "时",
"min": "分",
"sec": "秒",
"ms": "毫秒",
"export": "导出字幕",
"copy": "复制内容",
"exportOptions": "导出选项",
"exportFormat": "导出格式",
"exportContent": "导出内容",
"copyOptions": "复制选项",
"addIndex": "添加序号",
"copyTime": "复制时间",
"copyContent": "复制内容",
"both": "全部",
"source": "原文",
"translation": "翻译",
"copySuccess": "字幕已复制到剪贴板",
"clear": "清空记录"
}
}


@@ -1,14 +1,17 @@
import './assets/reset.css'
import './assets/main.css'
import { createApp } from 'vue'
import { createPinia } from 'pinia'
import App from './App.vue'
import router from './router'
import { i18n } from './i18n'
import Antd from 'ant-design-vue';
import 'ant-design-vue/dist/reset.css';
const app = createApp(App)
app.use(createPinia())
app.use(router)
app.use(i18n)
app.use(Antd)
app.mount('#app')

Some files were not shown because too many files have changed in this diff.