Compare commits

78 Commits

SHA1: 564954a834, aed15af386, 4f9d33abc1, 0dc70d491e, 086ea90a5f, 3324b630d1, 0825e48902, 383e582a2d,
e6a65f8362, 77726753bb, 4b47e50d9e, 4494b2c68b, 4abd6d0808, 6bff978b88, eba2c5ca45, 2b7ce06f04,
14987cbfc5, 56fdc348f8, f42458124e, 2352bcee5d, 051a497f3a, 34362fea3d, 771f7ad002, 01936d5f12,
1c0bf1f9c4, 38b4b15cec, 64ea2f0daf, a7a60da260, 1b7ff33656, d5d692188e, e4f937e6b6, cd9f3a847d,
b658ef5440, 3792eb88b6, 8e575a9ba3, 697488ce84, f7d2df938d, 5513c7e84c, 25b6ad5ed2, 760c01d79e,
a0a0a2e66d, 665c47d24f, 7f8766b13e, 6920957152, 604f8becc9, 0af5bab75d, 0b8b823b2e, d354a6fefa,
1c29fd5adc, f97b885411, 606f9b480b, 546beb3112, 3c9138f115, cbbaaa95a3, 7e953db6bd, 65da30f83d,
1965bbfee7, 8ac1c99c63, 082eb8579b, 0696651f04, f2aa075e65, 213426dace, 50ea9c5e4c, 22cfb75d2c,
f29e15cde5, 14e7a7bce4, 0b279dedbf, 0a10068b38, d608bf59c7, 3dcba07b6e, e77779b72a, e30124cb87,
301c691f04, 4ff1346b6d, b28799b03f, 147e328d8c, c086725d98, fae8b32edf
.editorconfig

@@ -6,4 +6,10 @@ indent_style = space
indent_size = 2
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true

[*.py]
indent_size = 4

[*.ipynb]
indent_size = 4
.gitignore (vendored, 14 changes)
@@ -5,5 +5,15 @@ out
.eslintcache
*.log*
__pycache__
subenv
python-subprocess/build
.venv
test.py

engine/build
engine/portaudio
engine/pyinstaller_cache
engine/models
engine/notebook
# engine/main.spec

.repomap
.virtualme
.vscode/settings.json (vendored, 5 changes)
@@ -7,5 +7,8 @@
},
"[json]": {
"editor.defaultFormatter": "esbenp.prettier-vscode"
}
},
"python.analysis.extraPaths": [
"./engine"
]
}
README.md (242 changes)
@@ -1,37 +1,205 @@
<div align="center" >
<img src="./resources/icon.png" width="100px" height="100px"/>
<img src="./build/icon.png" width="100px" height="100px"/>
<h1 align="center">auto-caption</h1>
<p>Auto Caption is a cross-platform video playback and subtitle display application.</p>
<b>The initial version of the project is complete.</b>
<p>Auto Caption is a cross-platform real-time caption display application.</p>
<p>
<a href="https://github.com/HiMeditator/auto-caption/releases"><img src="https://img.shields.io/badge/release-1.1.1-blue"></a>
<a href="https://github.com/HiMeditator/auto-caption/issues"><img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange"></a>
<img src="https://img.shields.io/github/languages/top/HiMeditator/auto-caption?color=royalblue">
<img src="https://img.shields.io/github/repo-size/HiMeditator/auto-caption?color=green">
<img src="https://img.shields.io/github/stars/HiMeditator/auto-caption?style=social">
</p>
<p>
| <b>简体中文</b>
| <a href="./README_en.md">English</a>
| <a href="./README_ja.md">日本語</a> |
</p>
<p><i>v1.1.1 has been released, adding the GLM-ASR cloud caption model and OpenAI-compatible model translation...</i></p>
</div>

![caption demo](./assets/01.png)
![main_zh](./assets/media/main_zh.png)
## 📥 Download

[GitHub Releases](https://github.com/HiMeditator/auto-caption/releases)

Software download: [GitHub Releases](https://github.com/HiMeditator/auto-caption/releases)

## 📚 User Manual

Vosk model download: [Vosk Models](https://alphacephei.com/vosk/models)

None yet.

SOSV model download: [Sherpa-ONNX SenseVoice Model](https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model)

### Basic Usage

## 📚 Documentation

Currently only an installable build for Windows is provided. If you use the default Gummy caption engine, you need to obtain an API KEY for the Alibaba Cloud Bailian platform and configure it in your environment variables before the model can be used. Tutorials: [Get an API KEY](https://help.aliyun.com/zh/model-studio/get-api-key), [Configure the API Key in environment variables](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables).

[Auto Caption User Manual](./docs/user-manual/zh.md)

[Caption Engine Documentation](./docs/engine-manual/zh.md)

[Changelog](./docs/CHANGELOG.md)

## 👁️🗨️ Preview

https://github.com/user-attachments/assets/9c188d78-9520-4397-bacf-4c8fdcc54874

Developers can create their own caption engines. For the communication specification, refer to the source code.
## ✨ Features

- Rich caption style settings
- Flexible caption engine selection
- Multi-language recognition and translation
- Caption record display and export
- Generates captions from audio output and microphone input
- Generates captions from audio output or microphone input
- Supports local Ollama models, cloud OpenAI-compatible models, or the cloud Google Translate API for translation
- Cross-platform (Windows, macOS, Linux) and multi-language interface (Chinese, English, Japanese) support
- Rich caption style settings (font, font size, font weight, font color, background color, etc.)
- Flexible caption engine selection (Alibaba Cloud Gummy cloud model, GLM-ASR cloud model, local Vosk model, local SOSV model, or develop your own)
- Multi-language recognition and translation (see "⚙️ Built-in Caption Engines" below)
- Caption record display and export (supports `.srt` and `.json` formats)

Note: Windows supports generating captions from both audio output and microphone input; Linux only supports generating captions from microphone input.
## 📖 Basic Usage

> ⚠️ Note: Currently only the Windows build is kept up to date; the last release for other platforms remains at v1.0.0.

The software has been adapted for Windows, macOS, and Linux. Tested mainstream platforms:

| OS Version | Architecture | System Audio Input | System Audio Output |
| ------------------ | ---------- | ---------------- | ---------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ [Additional configuration required](./docs/user-manual/zh.md#macos-获取系统音频输出) | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |

Capturing system audio output on macOS and Linux requires additional setup; see the [Auto Caption User Manual](./docs/user-manual/zh.md).

After downloading the software, choose the model that fits your needs, then configure it.

| | Accuracy | Real-time | Deployment | Supported Languages | Translation | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| [Gummy](https://help.aliyun.com/zh/model-studio/gummy-speech-recognition-translation) | Very good 😊 | Very good 😊 | Cloud / Alibaba Cloud | 10 | Built-in translation | Paid, 0.54 CNY/hour |
| [glm-asr-2512](https://docs.bigmodel.cn/cn/guide/models/sound-and-video/glm-asr-2512) | Very good 😊 | Poor 😞 | Cloud / Zhipu AI | 4 | Requires extra configuration | Paid, about 0.72 CNY/hour |
| [Vosk](https://alphacephei.com/vosk) | Poor 😞 | Very good 😊 | Local / CPU | Over 30 | Requires extra configuration | Supports very many languages |
| [SOSV](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html) | Average 😐 | Average 😐 | Local / CPU | 5 | Requires extra configuration | Only one model |
| Self-developed | 🤔 | 🤔 | Custom | Custom | Custom | Develop your own in Python following the [documentation](./docs/engine-manual/zh.md) |

If you choose a model other than Gummy, you also need to configure your own translation model.
### Configuring a Translation Model

![config_zh](./assets/media/config_zh.png)

> Note: Translation is not real-time. The translation model is only called after each sentence has been fully recognized.

#### Ollama Local Model

> Note: Models with too many parameters cause high resource consumption and translation latency. Models under 1B parameters are recommended, e.g. `qwen2.5:0.5b`, `qwen3:0.6b`.

Before using this option, make sure [Ollama](https://ollama.com/) is installed locally and the required large language model has been downloaded. Simply add the model name to the `Model Name` field in the settings and leave the `Base URL` field empty.

#### OpenAI-Compatible Model

If the local Ollama model's translation quality is not good enough, or you do not want to install Ollama locally, you can use a cloud OpenAI-compatible model.

Here are the `Base URL`s of some model providers:
- OpenAI: https://api.openai.com/v1
- DeepSeek: https://api.deepseek.com
- Alibaba Cloud: https://dashscope.aliyuncs.com/compatible-mode/v1

The API Key must be obtained from the corresponding model provider.

#### Google Translate API

> Note: The Google Translate API is unavailable in regions without access to the international internet.

No configuration needed; just an internet connection.
### Using the Gummy Model

> The international edition of Alibaba Cloud does not appear to offer the Gummy model, so non-Chinese users may currently be unable to use the Gummy caption engine.

To use the default Gummy caption engine (a cloud model for speech recognition and translation), first obtain an API KEY for the Alibaba Cloud Bailian platform, then add it to the software settings (under the caption engine's advanced settings) or configure it in your environment variables (only Windows supports reading the API KEY from environment variables). Tutorials:

- [Get an API KEY](https://help.aliyun.com/zh/model-studio/get-api-key)
- [Configure the API Key in environment variables](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)

### Using the GLM-ASR Model

Before use, obtain an API KEY for the Zhipu AI platform and add it to the software settings.

For obtaining an API KEY, see: [Quick Start](https://docs.bigmodel.cn/cn/guide/start/quick-start).

### Using the Vosk Model

> The Vosk model's recognition quality is poor; use it with caution.

To use the Vosk local caption engine, first download the model you need from the [Vosk Models](https://alphacephei.com/vosk/models) page, unzip it locally, and add the model folder's path to the software settings.

![vosk](./assets/media/engine_zh.png)

### Using the SOSV Model

The SOSV model is used the same way as Vosk. Download: https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model
## ⌨️ Using from the Terminal

The software has a modular design with two parts: the main application and the caption engine. The main application invokes the caption engine from its graphical interface. The core audio capture and speech recognition features are implemented in the caption engine, which can be used on its own, independently of the main application.

The caption engine is developed in Python and packaged into an executable with PyInstaller, so there are two ways to use it:

1. Run the caption engine's source code in a Python environment with the required libraries installed
2. Run the packaged caption engine executable from a terminal

For runtime parameters and detailed usage, see the [User Manual](./docs/user-manual/zh.md#单独使用字幕引擎).

```bash
python main.py \
-e gummy \
-k sk-******************************** \
-a 0 \
-d 1 \
-s en \
-t zh
```

![terminal](./assets/media/terminal.png)
## ⚙️ Built-in Caption Engines

The software currently ships with 4 caption engines. Their details follow.

### Gummy Caption Engine (Cloud)

Developed on top of Tongyi Lab's [Gummy speech translation model](https://help.aliyun.com/zh/model-studio/gummy-speech-recognition-translation/), calling the cloud model through the [Alibaba Cloud Bailian](https://bailian.console.aliyun.com) API.

**Model parameters:**

- Supported audio sample rate: 16 kHz and above
- Audio sample depth: 16-bit
- Supported audio channels: mono
- Recognizable languages: Chinese, English, Japanese, Korean, German, French, Russian, Italian, Spanish
- Supported translations:
  - Chinese → English, Japanese, Korean
  - English → Chinese, Japanese, Korean
  - Japanese, Korean, German, French, Russian, Italian, Spanish → Chinese or English

**Network traffic consumption:**

The caption engine samples at the native sample rate (assume 48 kHz) with a 16-bit sample depth and uploads mono audio, so the upload rate is approximately:

$$
48000\ \text{samples/s} \times 2\ \text{bytes/sample} \times 1\ \text{channel} = 96000\ \text{bytes/s} \approx 93.75\ \text{KiB/s}
$$

Moreover, the engine only uploads data while it is receiving an audio stream, so the actual upload rate may be lower. The traffic for returned model results is small and is not counted.

### GLM-ASR Caption Engine (Cloud)

https://docs.bigmodel.cn/cn/guide/models/sound-and-video/glm-asr-2512

### Vosk Caption Engine (Local)

Developed on top of [vosk-api](https://github.com/alphacep/vosk-api). Its advantage is the very wide choice of language models (over 30 languages); its drawbacks are relatively poor recognition quality and output without punctuation.

### SOSV Caption Engine (Local)

[SOSV](https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model) is a bundled package based mainly on [Sherpa-ONNX SenseVoice](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html), with an endpoint-detection model and a punctuation-restoration model added. It can recognize English, Chinese, Japanese, Korean, and Cantonese.

## 🚀 Running the Project

![structure_zh](./assets/media/structure_zh.png)

### Install Dependencies

```bash
npm install
```
@@ -40,30 +208,26 @@ npm install

### Build the Caption Engine

> #### Background
>
> The caption engine is in fact a subprocess that captures streaming data from the system audio input (recording) or output (playback) in real time and calls an audio-to-text model to generate captions for that audio. Generated captions are emitted over IPC as JSON serialized to strings and returned to the main process. The main process reads the caption data, processes it, and displays it in the window.
>
> The project currently defaults to the [Alibaba Cloud Gummy model](https://help.aliyun.com/zh/model-studio/gummy-speech-recognition-translation/); you need to obtain an API KEY for the Alibaba Cloud Bailian platform and configure it in your environment variables before the model can be used. Tutorials: [Get an API KEY](https://help.aliyun.com/zh/model-studio/get-api-key), [Configure the API Key in environment variables](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables).
>
> This project's gummy caption engine is a Python subprocess packaged into an executable with pyinstaller. The code that launches the engine subprocess is in `src\main\utils\engine.ts`.

First enter the `python-subprocess` folder and run the following to create a virtual environment:
First enter the `engine` folder and run the following to create a virtual environment (requires Python 3.10 or later; Python 3.12 is recommended):

```bash
python -m venv subenv
cd ./engine
# in ./engine folder
python -m venv .venv
# or
python3 -m venv .venv
```

Then activate the virtual environment:

```bash
# Windows
subenv/Scripts/activate
# Linux
source subenv/bin/activate
.venv/Scripts/activate
# Linux or macOS
source .venv/bin/activate
```

Then install the dependencies (on Linux, comment out `PyAudioWPatch` in `requirements.txt`; that module is Windows-only):
Then install the dependencies (this step may fail on macOS and Linux, usually because a build step fails; resolve it based on the error messages):

```bash
pip install -r requirements.txt
```

@@ -72,19 +236,27 @@ pip install -r requirements.txt

Then build the project with `pyinstaller`:

```bash
pyinstaller --onefile main-gummy.py
pyinstaller ./main.spec
```

The build is now complete; the executable can be found in the `python-subprocess/dist` folder. You can then proceed with the next steps.
Note that the path to the `vosk` library in `main.spec` may be incorrect and must be configured for your setup (it depends on the Python environment's version):

```
# Windows
vosk_path = str(Path('./.venv/Lib/site-packages/vosk').resolve())
# Linux or macOS
vosk_path = str(Path('./.venv/lib/python3.x/site-packages/vosk').resolve())
```

The build is now complete; the executable can be found in the `engine/dist` folder. You can then proceed with the next steps.

### Run the Project

```bash
npm run dev
```

### Build the Project

Note: the software is not currently adapted for macOS; build on Windows or Linux.

### Build the Project

@@ -93,4 +265,4 @@

```bash
# For windows
npm run build:win
npm run build:mac
# For Linux
npm run build:linux
```
README_en.md (new file, 268 changes)

@@ -0,0 +1,268 @@
<div align="center" >
<img src="./build/icon.png" width="100px" height="100px"/>
<h1 align="center">auto-caption</h1>
<p>Auto Caption is a cross-platform real-time caption display software.</p>
<p>
<a href="https://github.com/HiMeditator/auto-caption/releases"><img src="https://img.shields.io/badge/release-1.1.1-blue"></a>
<a href="https://github.com/HiMeditator/auto-caption/issues"><img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange"></a>
<img src="https://img.shields.io/github/languages/top/HiMeditator/auto-caption?color=royalblue">
<img src="https://img.shields.io/github/repo-size/HiMeditator/auto-caption?color=green">
<img src="https://img.shields.io/github/stars/HiMeditator/auto-caption?style=social">
</p>
<p>
| <a href="./README.md">简体中文</a>
| <b>English</b>
| <a href="./README_ja.md">日本語</a> |
</p>
<p><i>v1.1.1 has been released, adding the GLM-ASR cloud caption model and OpenAI compatible model translation...</i></p>
</div>

![main_en](./assets/media/main_en.png)

## 📥 Download

Software Download: [GitHub Releases](https://github.com/HiMeditator/auto-caption/releases)

Vosk Model Download: [Vosk Models](https://alphacephei.com/vosk/models)

SOSV Model Download: [Sherpa-ONNX SenseVoice Model](https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model)

## 📚 Documentation

[Auto Caption User Manual](./docs/user-manual/en.md)

[Caption Engine Documentation](./docs/engine-manual/en.md)

[Changelog](./docs/CHANGELOG.md)

## 👁️🗨️ Preview

https://github.com/user-attachments/assets/9c188d78-9520-4397-bacf-4c8fdcc54874

## ✨ Features

- Generate captions from audio output or microphone input
- Supports calling local Ollama models, cloud-based OpenAI compatible models, or the cloud-based Google Translate API for translation
- Cross-platform (Windows, macOS, Linux) and multi-language interface (Chinese, English, Japanese) support
- Rich caption style settings (font, font size, font weight, font color, background color, etc.)
- Flexible caption engine selection (Aliyun Gummy cloud model, GLM-ASR cloud model, local Vosk model, local SOSV model, or develop your own)
- Multi-language recognition and translation (see "⚙️ Built-in Subtitle Engines" below)
- Subtitle record display and export (supports exporting `.srt` and `.json` formats)

## 📖 Basic Usage

> ⚠️ Note: Currently, only the latest version of the software on Windows is maintained; the last versions for other platforms remain at v1.0.0.

The software has been adapted for Windows, macOS, and Linux. The tested platforms are as follows:

| OS Version | Architecture | System Audio Input | System Audio Output |
| ------------------ | ------------ | ------------------ | ------------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ [Additional config required](./docs/user-manual/en.md#capturing-system-audio-output-on-macos) | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |

Additional configuration is required to capture system audio output on macOS and Linux. See the [Auto Caption User Manual](./docs/user-manual/en.md) for details.

After downloading the software, select the model that fits your needs, then configure it.

| | Accuracy | Real-time | Deployment Type | Supported Languages | Translation | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| [Gummy](https://help.aliyun.com/zh/model-studio/gummy-speech-recognition-translation) | Very good 😊 | Very good 😊 | Cloud / Alibaba Cloud | 10 languages | Built-in translation | Paid, 0.54 CNY/hour |
| [glm-asr-2512](https://docs.bigmodel.cn/cn/guide/models/sound-and-video/glm-asr-2512) | Very good 😊 | Poor 😞 | Cloud / Zhipu AI | 4 languages | Requires additional configuration | Paid, approximately 0.72 CNY/hour |
| [Vosk](https://alphacephei.com/vosk) | Poor 😞 | Very good 😊 | Local / CPU | Over 30 languages | Requires additional configuration | Supports many languages |
| [SOSV](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html) | Average 😐 | Average 😐 | Local / CPU | 5 languages | Requires additional configuration | Only one model |
| Self-developed | 🤔 | 🤔 | Custom | Custom | Custom | Develop your own in Python according to the [documentation](./docs/engine-manual/en.md) |

If you choose a model other than Gummy, you also need to configure your own translation model.

### Configuring Translation Models

![config_en](./assets/media/config_en.png)

> Note: Translation is not real-time. The translation model is only called after each sentence has been fully recognized.

#### Ollama Local Model

> Note: Models with too many parameters lead to high resource consumption and translation delays. Models with fewer than 1B parameters are recommended, such as `qwen2.5:0.5b` and `qwen3:0.6b`.

Before using this model, confirm that [Ollama](https://ollama.com/) is installed on your machine and that you have downloaded the required large language model. Simply add the name of the model you want to call to the `Model Name` field in the settings, and ensure the `Base URL` field is empty.

#### OpenAI Compatible Model

If the local Ollama model's translation quality is not good enough, or you don't want to install Ollama locally, you can use cloud-based OpenAI compatible models.

Here are some model provider `Base URL`s:
- OpenAI: https://api.openai.com/v1
- DeepSeek: https://api.deepseek.com
- Alibaba Cloud: https://dashscope.aliyuncs.com/compatible-mode/v1

The API Key needs to be obtained from the corresponding model provider.
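For reference, this is roughly what a chat-completion request against such an endpoint looks like. This is a hedged sketch using the `openai` Python package; the base URL, key, and model name below are placeholders for your provider's values (Ollama also exposes the same API shape at `http://localhost:11434/v1`):

```python
from openai import OpenAI

# Placeholder credentials and model name; substitute your provider's values.
client = OpenAI(base_url="https://api.deepseek.com", api_key="sk-...")

resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Translate into Chinese: Hello, world."}],
)
print(resp.choices[0].message.content)
```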
#### Google Translate API

> Note: The Google Translate API is not available in some regions.

No configuration required; just connect to the internet.

### Using Gummy Model

> The international version of Alibaba Cloud services does not seem to provide the Gummy model, so non-Chinese users may not be able to use the Gummy caption engine at present.

To use the default Gummy caption engine (using cloud models for speech recognition and translation), you first need to obtain an API KEY from the Alibaba Cloud Bailian platform, then add the API KEY to the software settings or configure it in the environment variables (only Windows supports reading the API KEY from environment variables). Related tutorials:

- [Get API KEY](https://help.aliyun.com/zh/model-studio/get-api-key)
- [Configure API Key through Environment Variables](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)

### Using the GLM-ASR Model

Before using it, you need to obtain an API KEY from the Zhipu AI platform and add it to the software settings.

For API KEY acquisition, see: [Quick Start](https://docs.bigmodel.cn/en/guide/start/quick-start).

### Using Vosk Model

> The recognition quality of the Vosk model is poor; use it with caution.

To use the Vosk local caption engine, first download the model you need from the [Vosk Models](https://alphacephei.com/vosk/models) page, unzip it locally, and add the path of the model folder to the software settings.

![vosk](./assets/media/engine_en.png)

### Using SOSV Model

The SOSV model is used the same way as Vosk. Download address: https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model

## ⌨️ Using in Terminal

The software adopts a modular design with two parts: the main application and the caption engine. The main application calls the caption engine through a graphical interface. Audio acquisition and speech recognition are implemented in the caption engine, which can be used independently of the main application.

The caption engine is developed in Python and packaged into executables via PyInstaller, so there are two ways to use it:

1. Use the source code of the project's caption engine and run it in a Python environment with the required libraries installed
2. Run the packaged caption engine executable from the terminal

For runtime parameters and detailed usage instructions, please refer to the [User Manual](./docs/user-manual/en.md#using-caption-engine-standalone).

```bash
python main.py \
-e gummy \
-k sk-******************************** \
-a 0 \
-d 1 \
-s en \
-t zh
```

![terminal](./assets/media/terminal.png)

## ⚙️ Built-in Subtitle Engines

Currently, the software comes with 4 caption engines, with new engines under development. Their details are as follows.

### Gummy Subtitle Engine (Cloud)

Developed based on Tongyi Lab's [Gummy Speech Translation Model](https://help.aliyun.com/zh/model-studio/gummy-speech-recognition-translation/), using the [Alibaba Cloud Bailian](https://bailian.console.aliyun.com) API to call this cloud model.

**Model Parameters:**

- Supported audio sample rate: 16 kHz and above
- Audio sample depth: 16-bit
- Supported audio channels: mono
- Recognizable languages: Chinese, English, Japanese, Korean, German, French, Russian, Italian, Spanish
- Supported translations:
  - Chinese → English, Japanese, Korean
  - English → Chinese, Japanese, Korean
  - Japanese, Korean, German, French, Russian, Italian, Spanish → Chinese or English

**Network Traffic Consumption:**

The caption engine samples at the native sample rate (assumed to be 48 kHz) with 16-bit sample depth and a mono channel, so the upload rate is approximately:

$$
48000\ \text{samples/s} \times 2\ \text{bytes/sample} \times 1\ \text{channel} = 96000\ \text{bytes/s} \approx 93.75\ \text{KiB/s}
$$

The engine only uploads data while receiving audio streams, so the actual upload rate may be lower. The return traffic for model results is small and not considered here.
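The arithmetic is easy to check. A minimal sketch in plain Python (no project code assumed):

```python
# Upload-rate estimate for raw PCM: sample_rate * bytes_per_sample * channels.
sample_rate = 48_000    # native sample rate assumed above
bytes_per_sample = 2    # 16-bit samples
channels = 1            # mono upload

bytes_per_second = sample_rate * bytes_per_sample * channels
print(bytes_per_second)         # 96000 bytes/s
print(bytes_per_second / 1024)  # 93.75, the KiB/s figure quoted above
```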
### GLM-ASR Caption Engine (Cloud)

https://docs.bigmodel.cn/en/guide/models/sound-and-video/glm-asr-2512

### Vosk Subtitle Engine (Local)

Developed based on [vosk-api](https://github.com/alphacep/vosk-api). The advantage of this caption engine is the wide choice of language models (over 30 languages); the disadvantages are relatively poor recognition quality and output without punctuation.

### SOSV Subtitle Engine (Local)

[SOSV](https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model) is an integrated package, mainly based on [Sherpa-ONNX SenseVoice](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html), with an endpoint detection model and a punctuation restoration model added. The languages it can recognize are: English, Chinese, Japanese, Korean, and Cantonese.

## 🚀 Project Setup

![structure_en](./assets/media/structure_en.png)

### Install Dependencies

```bash
npm install
```

### Build Subtitle Engine

First enter the `engine` folder and execute the following commands to create a virtual environment (requires Python 3.10 or higher; Python 3.12 is recommended):

```bash
cd ./engine
# in ./engine folder
python -m venv .venv
# or
python3 -m venv .venv
```

Then activate the virtual environment:

```bash
# Windows
.venv/Scripts/activate
# Linux or macOS
source .venv/bin/activate
```

Then install dependencies (this step might fail on macOS and Linux, usually due to build failures; handle them based on the error messages):

```bash
pip install -r requirements.txt
```

Then use `pyinstaller` to build the project:

```bash
pyinstaller ./main.spec
```

Note that the path to the `vosk` library in `main.spec` might be incorrect and needs to be configured for your setup (it depends on the version of the Python environment).

```
# Windows
vosk_path = str(Path('./.venv/Lib/site-packages/vosk').resolve())
# Linux or macOS
vosk_path = str(Path('./.venv/lib/python3.x/site-packages/vosk').resolve())
```

After the build completes, you can find the executable in the `engine/dist` folder. Then proceed with the subsequent steps.

### Run Project

```bash
npm run dev
```

### Build Project

```bash
# For windows
npm run build:win
# For macOS
npm run build:mac
# For Linux
npm run build:linux
```
README_ja.md (new file, 269 changes)

@@ -0,0 +1,269 @@
<div align="center" >
<img src="./build/icon.png" width="100px" height="100px"/>
<h1 align="center">auto-caption</h1>
<p>Auto Caption is a cross-platform real-time caption display application.</p>
<p>
<a href="https://github.com/HiMeditator/auto-caption/releases"><img src="https://img.shields.io/badge/release-1.1.1-blue"></a>
<a href="https://github.com/HiMeditator/auto-caption/issues"><img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange"></a>
<img src="https://img.shields.io/github/languages/top/HiMeditator/auto-caption?color=royalblue">
<img src="https://img.shields.io/github/repo-size/HiMeditator/auto-caption?color=green">
<img src="https://img.shields.io/github/stars/HiMeditator/auto-caption?style=social">
</p>
<p>
| <a href="./README.md">简体中文</a>
| <a href="./README_en.md">English</a>
| <b>日本語</b> |
</p>
<p><i>v1.1.1 has been released, adding the GLM-ASR cloud caption model and OpenAI-compatible model translation...</i></p>
</div>

![main_ja](./assets/media/main_ja.png)

## 📥 Download

Software download: [GitHub Releases](https://github.com/HiMeditator/auto-caption/releases)

Vosk model download: [Vosk Models](https://alphacephei.com/vosk/models)

SOSV model download: [Sherpa-ONNX SenseVoice Model](https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model)

## 📚 Documentation

[Auto Caption User Manual](./docs/user-manual/ja.md)

[Caption Engine Documentation](./docs/engine-manual/ja.md)

[Changelog](./docs/CHANGELOG.md)

## 👁️🗨️ Preview

https://github.com/user-attachments/assets/9c188d78-9520-4397-bacf-4c8fdcc54874

## ✨ Features

- Generate captions from audio output or microphone input
- Supports local Ollama models, cloud OpenAI-compatible models, or the cloud Google Translate API for translation
- Cross-platform (Windows, macOS, Linux) and multi-language interface (Chinese, English, Japanese) support
- Rich caption style settings (font, font size, font weight, font color, background color, etc.)
- Flexible caption engine selection (Alibaba Cloud Gummy cloud model, GLM-ASR cloud model, local Vosk model, local SOSV model, or develop your own)
- Multi-language recognition and translation (see "⚙️ Caption Engines" below)
- Caption record display and export (supports `.srt` and `.json` formats)

## 📖 Basic Usage

> ⚠️ Note: Currently only the Windows build is kept up to date; the last release for other platforms remains at v1.0.0.

The software supports Windows, macOS, and Linux. Tested platforms:

| OS Version | Architecture | System Audio Input | System Audio Output |
| ------------------ | ------------ | ------------------ | ------------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ [Additional configuration required](./docs/user-manual/ja.md#macos-でのシステムオーディオ出力の取得方法) | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |

Additional configuration is required to capture system audio output on macOS and Linux; see the [Auto Caption User Manual](./docs/user-manual/ja.md).

After downloading the software, choose the model that fits your needs, then configure it.

| | Accuracy | Real-time | Deployment | Supported Languages | Translation | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| [Gummy](https://help.aliyun.com/zh/model-studio/gummy-speech-recognition-translation) | Very good 😊 | Very good 😊 | Cloud / Alibaba Cloud | 10 | Built-in translation | Paid, 0.54 CNY/hour |
| [glm-asr-2512](https://docs.bigmodel.cn/cn/guide/models/sound-and-video/glm-asr-2512) | Very good 😊 | Poor 😞 | Cloud / Zhipu AI | 4 | Requires extra configuration | Paid, about 0.72 CNY/hour |
| [Vosk](https://alphacephei.com/vosk) | Poor 😞 | Very good 😊 | Local / CPU | Over 30 | Requires extra configuration | Supports many languages |
| [SOSV](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html) | Average 😐 | Average 😐 | Local / CPU | 5 | Requires extra configuration | Only one model |
| Self-developed | 🤔 | 🤔 | Custom | Custom | Custom | Develop your own in Python following the [documentation](./docs/engine-manual/ja.md) |

If you choose a model other than Gummy, you also need to configure your own translation model.

### Configuring a Translation Model

![config_ja](./assets/media/config_ja.png)

> Note: Translation is not real-time. The translation model is only called after each sentence has been fully recognized.

#### Ollama Local Model

> Note: Models with too many parameters cause high resource consumption and translation latency. Models under 1B parameters are recommended, e.g. `qwen2.5:0.5b`, `qwen3:0.6b`.

Before using this option, make sure [Ollama](https://ollama.com/) is installed locally and the required large language model has been downloaded. Enter the model name in the `Model Name` field in the settings and make sure the `Base URL` field is empty.

#### OpenAI-Compatible Model

If the local Ollama model's translation quality is not good enough, or you do not want to install Ollama locally, you can use a cloud OpenAI-compatible model.

`Base URL`s of some model providers:
- OpenAI: https://api.openai.com/v1
- DeepSeek: https://api.deepseek.com
- Alibaba Cloud: https://dashscope.aliyuncs.com/compatible-mode/v1

The API Key must be obtained from the corresponding model provider.

#### Google Translate API

> Note: The Google Translate API is unavailable in some regions.

No configuration needed; just an internet connection.

### Using the Gummy Model

> The international edition of Alibaba Cloud does not offer the Gummy model, so non-Chinese users may currently be unable to use the Gummy caption engine.

To use the default Gummy caption engine (a cloud model for speech recognition and translation), first obtain an API KEY for the Alibaba Cloud Bailian platform and add it to the software settings or configure it in your environment variables (only Windows supports reading the API KEY from environment variables). Tutorials:

- [Get an API KEY](https://help.aliyun.com/zh/model-studio/get-api-key)
- [Configure the API Key in environment variables](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)

### Using the GLM-ASR Model

Before use, obtain an API key for the Zhipu AI platform and add it to the software settings.

For obtaining an API key, see: [Quick Start](https://docs.bigmodel.cn/ja/guide/start/quick-start).

### Using the Vosk Model

> The Vosk model's recognition quality is poor; use it with caution.

To use the Vosk local caption engine, first download the model you need from the [Vosk Models](https://alphacephei.com/vosk/models) page, unzip it locally, and add the model folder's path to the software settings.

![vosk](./assets/media/engine_ja.png)

### Using the SOSV Model

The SOSV model is used the same way as Vosk. Download: https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model

## ⌨️ Using from the Terminal

The software has a modular design with two parts: the main application and the caption engine. The main application invokes the caption engine from its graphical interface. The core audio capture and speech recognition features are implemented in the caption engine, which can be used on its own, independently of the main application.

The caption engine is developed in Python and packaged into an executable with PyInstaller, so there are two ways to use it:

1. Run the caption engine's source code in a Python environment with the required libraries installed
2. Run the packaged caption engine executable from a terminal

For runtime parameters and detailed usage, see the [User Manual](./docs/user-manual/en.md#using-caption-engine-standalone).

```bash
python main.py \
-e gummy \
-k sk-******************************** \
-a 0 \
-d 1 \
-s en \
-t zh
```
![terminal](./assets/media/terminal.png)

## ⚙️ Caption Engines

The software currently ships with 4 caption engines, with new engines planned. Their details follow.

### Gummy Caption Engine (Cloud)

Developed on top of Tongyi Lab's [Gummy speech translation model](https://help.aliyun.com/zh/model-studio/gummy-speech-recognition-translation/), calling the cloud model through the [Alibaba Cloud Bailian](https://bailian.console.aliyun.com) API.

**Model parameters:**

- Supported audio sample rate: 16 kHz and above
- Audio sample depth: 16-bit
- Supported audio channels: mono
- Recognizable languages: Chinese, English, Japanese, Korean, German, French, Russian, Italian, Spanish
- Supported translations:
  - Chinese → English, Japanese, Korean
  - English → Chinese, Japanese, Korean
  - Japanese, Korean, German, French, Russian, Italian, Spanish → Chinese or English

**Network traffic consumption:**

The caption engine samples at the native sample rate (assume 48 kHz) with a 16-bit sample depth and uploads mono audio, so the upload rate is approximately:

$$
48000\ \text{samples/s} \times 2\ \text{bytes/sample} \times 1\ \text{channel} = 96000\ \text{bytes/s} \approx 93.75\ \text{KiB/s}
$$

The engine only uploads data while it is receiving an audio stream, so the actual upload rate may be lower. The traffic for returned model results is small and is not counted.

### GLM-ASR Caption Engine (Cloud)

https://docs.bigmodel.cn/ja/guide/models/sound-and-video/glm-asr-2512

### Vosk Caption Engine (Local)

Developed on top of [vosk-api](https://github.com/alphacep/vosk-api). Its advantage is the very wide choice of language models (over 30 languages); its drawbacks are relatively poor recognition quality and output without punctuation.

### SOSV Caption Engine (Local)

[SOSV](https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model) is a bundled package based mainly on [Sherpa-ONNX SenseVoice](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html), with an endpoint-detection model and a punctuation-restoration model added. It can recognize English, Chinese, Japanese, Korean, and Cantonese.

## 🚀 Running the Project

![structure_ja](./assets/media/structure_ja.png)

### Install Dependencies

```bash
npm install
```

### Build the Caption Engine

First enter the `engine` folder and run the following to create a virtual environment (requires Python 3.10 or later; Python 3.12 is recommended):

```bash
cd ./engine
# in the ./engine folder
python -m venv .venv
# or
python3 -m venv .venv
```

Then activate the virtual environment:

```bash
# Windows
.venv/Scripts/activate
# Linux or macOS
source .venv/bin/activate
```

Then install the dependencies (this step may fail on macOS and Linux, usually because a build step fails; resolve it based on the error messages):

```bash
pip install -r requirements.txt
```

Then build the project with `pyinstaller`:

```bash
pyinstaller ./main.spec
```

The path to the `vosk` library in `main.spec` may be incorrect and must be configured for your setup (it depends on the Python environment's version):

```
# Windows
vosk_path = str(Path('./.venv/Lib/site-packages/vosk').resolve())
# Linux or macOS
vosk_path = str(Path('./.venv/lib/python3.x/site-packages/vosk').resolve())
```

The build is now complete; the executable can be found in the `engine/dist` folder. You can then proceed with the next steps.

### Run the Project

```bash
npm run dev
```

### Build the Project

```bash
# For Windows
npm run build:win
# For macOS
npm run build:mac
# For Linux
npm run build:linux
```
BIN assets/01.png (removed; before: 311 KiB)
BIN assets/media/config_en.png (new file; 52 KiB)
BIN assets/media/config_ja.png (new file; 52 KiB)
BIN assets/media/config_zh.png (new file; 54 KiB)
BIN assets/media/engine_en.png (new file; 79 KiB)
BIN assets/media/engine_ja.png (new file; 82 KiB)
BIN assets/media/engine_zh.png (new file; 87 KiB)
BIN assets/media/main_en.png (new file; 476 KiB)
BIN assets/media/main_ja.png (new file; 488 KiB)
BIN assets/media/main_zh.png (new file; 486 KiB)
BIN assets/media/structure_en.png (new file; 323 KiB)
BIN assets/media/structure_ja.png (new file; 324 KiB)
BIN assets/media/structure_zh.png (new file; 324 KiB)
BIN assets/structure.pptx (new file)
@@ -5,7 +5,9 @@
The following icons are used under CC BY 4.0 license:

- icon.png
- icon.svg
- icon.icns

Source:

- https://icon-icons.com/en/pack/Duetone/2064
build/entitlements.mac.plist (new file, 16 changes)

@@ -0,0 +1,16 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>com.apple.security.cs.allow-jit</key>
<true/>
<key>com.apple.security.cs.allow-unsigned-executable-memory</key>
<true/>
<key>com.apple.security.cs.allow-dyld-environment-variables</key>
<true/>
<key>com.apple.security.cs.disable-library-validation</key>
<true/>
<key>com.apple.security.device.audio-input</key>
<true/>
</dict>
</plist>
BIN build/icon.icns (new file)
BIN build/icon.png (new file; 36 KiB)
build/icon.svg (new file, 1 change; 2.4 KiB)

@@ -0,0 +1 @@
<svg id="Layer_1" data-name="Layer 1" xmlns="http://www.w3.org/2000/svg" viewBox="6 6 52 52"><defs><style>.cls-1{fill:#a8d2f0;}.cls-2{fill:#389ad6;}.cls-3,.cls-4{fill:none;}.cls-4{stroke:#295183;stroke-linecap:round;stroke-linejoin:round;stroke-width:2px;}.cls-5{fill:#295183;}</style></defs><title>weather, forecast, direction, compass</title><path class="cls-1" d="M25.56,17.37c-.87,6.45-1.73,22.73,10.26,29.37A1.77,1.77,0,0,1,35.15,50C27.56,51,15,50,13.05,33.13a1.9,1.9,0,0,1,0-.21c0-1.24.11-13.46,10.07-17.41A1.77,1.77,0,0,1,25.56,17.37Z"/><path class="cls-2" d="M30.32,35l1,4.45a3.2,3.2,0,0,0-.22.72c-.1.46-.19.92-.29,1.38-.13.68-.39,1.49-1.06,1.67s-1.32-.44-1.55-1.11S28,40.72,27.84,40s-.76-1.33-1.45-1.26c-.34,0-.62.27-1,.32-.78.16-.31-1.79-.46-2.13a1.67,1.67,0,0,0-1.08-.82c-.91-.27-3.85-.37-3.06-2.07a1.68,1.68,0,0,1,1.07-.76,9.87,9.87,0,0,1,1.4-.32,3.94,3.94,0,0,0,1.26-.32l4.44,1,1.07.23Z"/><path class="cls-2" d="M30.32,28.31l-.24,1.07L29,29.62,27.26,30a1.83,1.83,0,0,0,.52-.8A6,6,0,0,0,28,28c0-.26.07-.5.12-.74a1.26,1.26,0,0,1,.1-.29Z"/><path class="cls-2" d="M34.62,29.37l0-.2.69-.43a2.66,2.66,0,0,1-.38.7Z"/><line class="cls-3" x1="33.74" y1="37.87" x2="33.45" y2="39.16"/><path class="cls-2" d="M37,35.79A4.71,4.71,0,0,1,36,36a7.51,7.51,0,0,0-1,.17,2.43,2.43,0,0,0-.37.13,2,2,0,0,0-.62.47l.4-1.78.23-1.07,1.07-.23Z"/><polyline class="cls-4" points="32 20.86 30.47 27.68 30.17 28.99 29.95 29.95 28.99 30.17 27.42 30.52 26.41 30.75 25.24 31.01 20.86 32 25 32.93 28.99 33.83 29.95 34.04 30.17 35.01 31.07 39.01 32 43.14 32.99 38.75 33.25 37.59 33.47 36.6 33.83 35 34.04 34.04 35 33.83 36.27 33.54 43.14 32 35.01 30.17 34.28 30.01 34.04 29.95 34 29.77 33.83 28.99 33.38 26.98"/><polygon class="cls-4" points="30.17 28.99 29.95 29.95 28.99 30.17 28.09 28.74 26.98 26.98 28.29 27.81 30.17 28.99"/><polygon class="cls-4" points="30.17 35.01 26.98 37.02 28.99 33.83 29.95 34.04 30.17 35.01"/><polygon class="cls-4" points="37.02 37.02 35.26 35.91 33.83 35 34.04 34.04 35 33.83 36.2 35.72 37.02 37.02"/><polygon class="cls-4" points="37.02 26.98 35.01 30.17 34.28 30.01 34.04 29.95 34 29.77 33.83 28.99 37.02 26.98"/><path class="cls-4" d="M38.42,14.13A19.08,19.08,0,1,1,32,13a19.19,19.19,0,0,1,2,.11"/><circle class="cls-5" cx="32.03" cy="16.99" r="1"/><circle class="cls-5" cx="47.01" cy="32.03" r="1"/><circle class="cls-5" cx="31.97" cy="47.01" r="1"/><circle class="cls-5" cx="16.99" cy="31.97" r="1"/></svg>
docs/CHANGELOG.md (new file, 190 changes)

@@ -0,0 +1,190 @@
## v0.0.1

2025-06-22

First release.

## v0.1.0

2025-06-26

### New Features

- Added error notifications
- Added environment-variable checks for the default engine
- Added saving and loading of configuration data files
- Added an option to reset caption styles to defaults
- Added project "about" information

### New Documentation

- Added the user manual
- Added the caption engine documentation

## v0.2.0

2025-07-05

Refactored the project, fixed bugs, and added new features. This is a stable release.

### New Features

- Added hiding of overlong caption content (#1)
- Added multi-language interface support (Chinese, English, Japanese)
- Added a dark theme

### Improvements

- Improved the interface layout
- Added more configuration items that can be saved and loaded
- Added stricter state constraints to the caption engine to prevent zombie processes

### Bug Fixes

- Fixed the error raised after the caption engine sat idle for a long time (#2)

### New Documentation

- Added Japanese documentation
- Added English and Japanese caption engine documentation and user manuals
- Added the electron ipc api documentation

## v0.3.0

2025-07-09

Refactored the caption engine code, adapted the software to macOS, and added new features.

### New Features

- Added in-app API KEY configuration
- Added caption font-weight and text-shadow settings
- Added copying caption records to the clipboard (#3)

### Improvements

- Caption timestamps are now accurate to the millisecond
- More detailed documentation (added the caption engine specification; updated the user and engine docs) (#4)
- Adapted to macOS
- The caption window now has a higher always-on-top priority
- The preview window can display the latest caption content in real time

### Bug Fixes

- Fixed a dark system theme loading as light when following the system theme

## v0.4.0

2025-07-11

Added the local Vosk caption engine, updated the project documentation, and continued polishing the experience.

### New Features

- Added a Vosk-based caption engine; **the Vosk caption engine does not support translation yet**
- Updated the UI with a Vosk engine option and a model-path setting

### Improvements

- The icons in the caption window's top-right corner now match the caption text color

## v0.5.0

2025-07-15

Added more features to the main application and adapted it to Linux.

### New Features

- Adapted to Linux
- Added caption-time editing, allowing caption timestamps to be adjusted
- Added exporting caption records in srt format
- Added caption engine status display (pid, ppid, CPU usage, memory usage, uptime)

### Improvements

- The icons in the caption window's top-right corner are now arranged vertically
- Incomplete captions output by the Gummy caption engine are filtered out

## v0.5.1

2025-07-17

### Bug Fixes

- Fixed being unable to invoke a custom caption engine
- Fixed custom caption engine arguments having no effect

## v0.6.0

2025-07-29

### New Features

- Added caption record sorting; records can be shown in ascending or descending order

### Improvements

- Reduced the installer size
- Fine-tuned the caption engine settings layout
- Swapped the positions of the window's info and error popups so messages no longer block controls
- Improved robustness, fully preventing caption engine processes from becoming orphan processes
- Revised the caption engine documentation with more detailed development notes

### Project Improvements

- Refactored the caption engine to improve the extensibility and readability of its code
- Merged the Gummy and Vosk engines into a single executable
- Added Socket communication between the caption engine and the main process, fully preventing the engine from becoming an orphan process

## v0.7.0

2025-08-20

### New Features

- The caption window now remembers its width and reopens at the previous width
- The caption engine is force-killed if it has not shut down 4 s after a close attempt
- Added a copy-latest-captions option; users can copy only the most recent 1 to 3 captions (#13)
- Added theme color settings with six colors: blue, green, orange, purple, pink, and dark/light
- Added a log view showing log records output by the software's caption engine

### Improvements

- Polished several UI components
- Clearer log output

## v1.0.0

2025-09-08

### New Features

- Added a startup timeout to the caption engine: it shuts down automatically if it fails to start within the allotted time, and it can be closed while starting
- Added non-real-time translation: supports local Ollama models and the Google Translate API for translation
- Added a new model: the SOSV model, which recognizes English, Chinese, Japanese, Korean, and Cantonese
- Added recording: audio recognized by the caption engine can be saved as a .wav file
- Added multi-line captions; users can set how many caption lines the caption window displays

### Improvements

- Improved the placement of several notification messages
- Replaced the resampling model, improving audio resampling quality
- Labels carrying extra information now use the theme color

## v1.1.0

### New Features

- Added a GLM-ASR-based caption engine
- Added OpenAI-API-compatible models as a new translation option

## v1.1.1

### Improvements

- Removed the caption window's always-on-top toggle; the caption window is now always on top
- Replaced the always-on-top toggle with a click-through option; a solid pin icon means click-through is enabled
docs/TODO.md (new file, 30 changes)

@@ -0,0 +1,30 @@
## Done

- [x] Add English and Japanese language support *2025/07/04*
- [x] Add a dark theme *2025/07/04*
- [x] Improve long-caption display *2025/07/05*
- [x] Fix the error raised when the caption engine sits idle *2025/07/05*
- [x] Strengthen the caption window's always-on-top priority *2025/07/07*
- [x] Add detailed specifications for the built-in caption engines *2025/07/07*
- [x] Add copying captions to the clipboard *2025/07/08*
- [x] Adapt to macOS *2025/07/08*
- [x] Add caption text outline *2025/07/09*
- [x] Add a Vosk-based caption engine *2025/07/09*
- [x] Adapt to Linux *2025/07/13*
- [x] Arrange the caption window's top-right icons vertically *2025/07/14*
- [x] Allow adjusting the caption timeline *2025/07/14*
- [x] Allow exporting caption records in srt format *2025/07/14*
- [x] Expose the caption engine's system resource consumption *2025/07/15*
- [x] Add descending-by-time sorting for caption records *2025/07/26*
- [x] Refactor the caption engine *2025/07/28*
- [x] Improve frontend notification messages *2025/07/29*
- [x] Allow copying only the most recent caption records *2025/08/18*
- [x] Add color theme settings *2025/08/18*
- [x] Add log display to the frontend *2025/08/19*
- [x] Add Ollama models for local caption engine translation *2025/09/04*
- [x] Validate / add a sherpa-onnx-based caption engine *2025/09/06*
- [x] Add the GLM-ASR model *2026/01/10*

## TODO

None for now.
docs/api-docs/caption-engine.md (new file, 144 changes)

@@ -0,0 +1,144 @@
# caption engine api-doc

This document describes the communication contract between the caption engine and the Electron main process.

## How It Works

The project's Python process sends data to the Electron main process through standard output. The Python process's standard output (`sys.stdout`) always consists of line-delimited strings, and every line can be parsed as a JSON object. Every JSON object has a `command` field.

The Electron main process sends data to the Python process over a TCP Socket. The data sent is always an object serialized to a string, with the format:

```js
{
  command: string,
  content: string
}
```
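As an illustration of this contract, here is a minimal sketch of the Python side. The helper name mirrors the `stdout_cmd` helper described in the engine manual; the project's actual implementation may differ:

```python
import json
import sys

def stdout_cmd(command: str, content: str = "") -> None:
    # Each stdout line is one complete JSON object with a "command" field,
    # so the Electron main process can split on newlines and JSON-parse each line.
    sys.stdout.write(json.dumps({"command": command, "content": content}) + "\n")
    sys.stdout.flush()

# Announce that the TCP Socket server is ready (here: on port 8080).
stdout_cmd("connect", "8080")
```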
## Standard Output Contract

> Data direction: caption engine process => Electron main process

When the JSON object's `command` field has one of the following values, it means:

### `connect`

```js
{
  command: "connect",
  content: ""
}
```

The caption engine's TCP Socket server is ready; this tells the Electron main process to connect to it.

### `kill`

```js
{
  command: "kill",
  content: ""
}
```

Tells the Electron main process to force-kill the caption engine process.

### `caption`

```js
{
  command: "caption",
  index: number,
  time_s: string,
  time_t: string,
  text: string,
  translation: string
}
```

Caption data converted from the audio stream captured on the Python side.

### `translation`

```js
{
  command: "translation",
  time_s: string,
  text: string,
  translation: string
}
```

Translation of recognized speech; the matching caption can be identified by its start time.

### `print`

```js
{
  command: "print",
  content: string
}
```

Content printed by the Python side; not written to the log.

### `info`

```js
{
  command: "info",
  content: string
}
```

An informational message from the Python side; written to the log.

### `warn`

```js
{
  command: "warn",
  content: string
}
```

A warning from the Python side; written to the log.

### `error`

```js
{
  command: "error",
  content: string
}
```

An error message from the Python side; shown in a frontend popup.

### `usage`

```js
{
  command: "usage",
  content: string
}
```

Billing and usage information printed when the Gummy caption engine shuts down.

## TCP Socket

> Data direction: Electron main process => caption engine process

When the JSON object's `command` field has one of the following values, it means:

### `stop`

```js
{
  command: "stop",
  content: ""
}
```

Tells the current caption engine to stop listening and finish its task.
docs/api-docs/electron-ipc.md (new file, 345 changes)

@@ -0,0 +1,345 @@
# electron ipc api-doc

This document records the communication contract between the main process and the renderer processes.

## Naming Convention

This project has two renderer processes, the caption window and the control window, and the main process communicates with each of them. Command names follow these rules:

1. A command usually consists of three keywords separated by dots.
2. The first keyword indicates the target of the communication:
   - `control` refers to the control-window class instance (backend) or the control window (frontend)
   - `caption` refers to the caption-window class instance (backend) or the caption window (frontend)
   - `both` means any of the above may be the target
3. The second keyword names the object being modified or the object that changed, in lower camel case
4. The third keyword is usually a verb describing the action taken when the communication occurs or the operation to perform

As this implies, commands carry one of two meanings: an operation to perform, or an event that has just occurred.

## Frontend <=> Backend

### `both.window.mounted`

**Description:** A frontend window has finished mounting and requests the latest configuration data

**Sender:** frontend

**Receiver:** backend

**Data types:**

- Sent: none
- Received: `FullConfig`

### `control.nativeTheme.get`

**Description:** The frontend fetches the current system theme

**Sender:** frontend control window

**Receiver:** backend control-window instance

**Data types:**

- Sent: none
- Received: `string`

### `control.folder.select`

**Description:** Opens a folder picker and returns the folder path selected by the user to the frontend

**Sender:** frontend control window

**Receiver:** backend control-window instance

**Data types:**

- Sent: none
- Received: `string`

### `control.engine.info`

**Description:** Fetches the caption engine's resource consumption

**Sender:** frontend control window

**Receiver:** backend control-window instance

**Data types:**

- Sent: none
- Received: `EngineInfo`

## Frontend ==> Backend

### `control.uiLanguage.change`

**Description:** The frontend changed the UI language; the change is synced to the backend

**Sender:** frontend control window

**Receiver:** backend control-window instance

**Data type:** `UILanguage`

### `control.uiTheme.change`

**Description:** The frontend changed the UI theme; the change is synced to the backend

**Sender:** frontend control window

**Receiver:** backend control-window instance

**Data type:** `UITheme`

### `control.uiColor.change`

**Description:** The frontend changed the UI theme color; the change is synced to the backend

**Sender:** frontend control window

**Receiver:** backend control-window instance

**Data type:** `string`

### `control.leftBarWidth.change`

**Description:** The frontend changed the sidebar width; the change is synced to the backend

**Sender:** frontend control window

**Receiver:** backend control-window instance

**Data type:** `number`

### `control.captionLog.clear`

**Description:** Clears the caption records

**Sender:** frontend control window

**Receiver:** backend control-window instance

**Data type:** none

### `control.styles.change`

**Description:** The frontend changed the caption styles; the change is synced to the backend

**Sender:** frontend control window

**Receiver:** backend control-window instance

**Data type:** `Styles`

### `control.styles.reset`

**Description:** Resets the caption styles to their defaults

**Sender:** frontend control window

**Receiver:** backend control-window instance

**Data type:** none

### `control.controls.change`

**Description:** The frontend changed the caption engine configuration and sends the latest configuration to the backend

**Sender:** frontend control window

**Receiver:** backend control-window instance

**Data type:** `Controls`

### `control.captionWindow.activate`

**Description:** Activates the caption window

**Sender:** frontend control window

**Receiver:** backend control-window instance

**Data type:** none

### `control.engine.start`

**Description:** Starts the caption engine

**Sender:** frontend control window

**Receiver:** backend control-window instance

**Data type:** none

### `control.engine.stop`

**Description:** Stops the caption engine

**Sender:** frontend control window

**Receiver:** backend control-window instance

**Data type:** none

### `control.engine.forceKill`

**Description:** Force-kills a caption engine whose startup has timed out

**Sender:** frontend control window

**Receiver:** backend control-window instance

**Data type:** none

### `caption.windowHeight.change`

**Description:** The caption window's width changed

**Sender:** frontend caption window

**Receiver:** backend caption-window instance

**Data type:** `number`

### `caption.mouseEvents.ignore`

**Description:** Whether to enable mouse click-through

**Sender:** frontend caption window

**Receiver:** backend caption-window instance

**Data type:** `boolean`

### `caption.controlWindow.activate`

**Description:** Activates the control window

**Sender:** frontend caption window

**Receiver:** backend caption-window instance

**Data type:** none

### `caption.window.close`

**Description:** Closes the caption window

**Sender:** frontend caption window

**Receiver:** backend caption-window instance

**Data type:** none

## Backend ==> Frontend

### `control.uiLanguage.set`

**Description:** The backend sends the latest UI language to the frontend, which applies it

**Sender:** backend

**Receiver:** caption window

**Data type:** `UILanguage`

### `control.nativeTheme.change`

**Description:** The system theme changed

**Sender:** backend

**Receiver:** frontend control window

**Data type:** `string`

### `control.engine.started`

**Description:** The engine started successfully; the payload is the engine's process ID

**Sender:** backend

**Receiver:** frontend control window

**Data type:** `number`

### `control.engine.stopped`

**Description:** The engine stopped

**Sender:** backend

**Receiver:** frontend control window

**Data type:** none

### `control.error.occurred`

**Description:** Sends an error

**Sender:** backend

**Receiver:** frontend control window

**Data type:** `string`

### `control.controls.set`

**Description:** The backend sends the latest caption engine configuration to the frontend, which applies it

**Sender:** backend

**Receiver:** frontend control window

**Data type:** `Controls`

### `control.softwareLog.add`

**Description:** Adds a new log entry

**Sender:** backend

**Receiver:** frontend control window

**Data type:** `SoftwareLog`

### `both.styles.set`

**Description:** The backend sends the latest caption styles to the frontend, which applies them

**Sender:** backend

**Receiver:** frontend

**Data type:** `Styles`

### `both.captionLog.add`

**Description:** Adds a new caption record

**Sender:** backend

**Receiver:** frontend

**Data type:** `CaptionItem`

### `both.captionLog.upd`

**Description:** Updates the last caption record

**Sender:** backend

**Receiver:** frontend

**Data type:** `CaptionItem`

### `both.captionLog.set`

**Description:** Sets all caption data

**Sender:** backend

**Receiver:** frontend

**Data type:** `CaptionItem[]`
docs/engine-manual/en.md (new file, 231 changes)

@@ -0,0 +1,231 @@
# Caption Engine Documentation
|
||||
|
||||
Corresponding version: v1.0.0
|
||||
|
||||

|
||||
|
||||
## Introduction to the Caption Engine
|
||||
|
||||
The so-called caption engine is actually a subprocess that captures streaming data from system audio input (microphone) or output (speaker) in real-time, and invokes an audio-to-text model to generate captions for the corresponding audio. The generated captions are converted into JSON-formatted string data and transmitted to the main program through standard output (ensuring that the string received by the main program can be correctly interpreted as a JSON object). The main program reads and interprets the caption data, processes it, and displays it in the window.
|
||||
|
||||
**Communication between the caption engine process and Electron main process follows the standard: [caption engine api-doc](../api-docs/caption-engine.md).**
|
||||
|
||||
## Execution Flow
|
||||
|
||||
Process of communication between main process and caption engine:
|
||||
|
||||
### Starting the Engine
|
||||
|
||||
- Electron main process: Use `child_process.spawn()` to start the caption engine process
|
||||
- Caption engine process: Create a TCP Socket server thread, after creation output a JSON object converted to string via standard output, containing the `command` field with value `connect`
|
||||
- Main process: Listen to the caption engine process's standard output, try to split the standard output by lines, parse it into a JSON object, and check if the object's `command` field value is `connect`. If so, connect to the TCP Socket server
|
||||
|
||||
### Caption Recognition
|
||||
|
||||
- Caption engine process: Create a new thread to monitor system audio output, put acquired audio data chunks into a shared queue (`shared_data.chunk_queue`). The caption engine continuously reads audio data chunks from the shared queue and parses them. The caption engine may also create a new thread to perform translation operations. Finally, the caption engine sends parsed caption data object strings through standard output
|
||||
- Electron main process: Continuously listen to the caption engine's standard output and take different actions based on the parsed object's `command` field
|
||||
|
||||
### Stopping the Engine
|
||||
|
||||
- Electron main process: When the user operates to close the caption engine in the frontend, the main process sends an object string with `command` field set to `stop` to the caption engine process through Socket communication
|
||||
- Caption engine process: Receive the command string sent by the Electron main process and parse it into an object. If the object's `command` field is `stop`, set the global variable `shared_data.status` to `stop`
|
||||
- Caption engine process: The main thread loops while capturing system audio; once `shared_data.status` is no longer `running`, it exits the loop, releases resources, and terminates
|
||||
- Electron main process: If the caption engine process termination is detected, perform corresponding processing and provide feedback to the frontend
|
||||
|
||||
## Implemented Features
|
||||
|
||||
The following features have been implemented and can be directly reused.
|
||||
|
||||
### Standard Output
|
||||
|
||||
Can output regular information, commands, and error messages.
|
||||
|
||||
Examples:
|
||||
|
||||
```python
|
||||
from utils import stdout, stdout_cmd, stdout_obj, stderr
|
||||
# {"command": "print", "content": "Hello"}\n
|
||||
stdout("Hello")
|
||||
# {"command": "connect", "content": "8080"}\n
|
||||
stdout_cmd("connect", "8080")
|
||||
# {"command": "print", "content": "print"}\n
|
||||
stdout_obj({"command": "print", "content": "print"})
|
||||
# sys.stderr.write("Error Info" + "\n")
|
||||
stderr("Error Info")
|
||||
```
|
||||
|
||||
### Creating Socket Service
|
||||
|
||||
This Socket service listens on a specified port, parses content sent by the Electron main program, and may change the value of `shared_data.status`.
|
||||
|
||||
Example:
|
||||
|
||||
```python
|
||||
from utils import start_server
|
||||
from utils import shared_data
|
||||
port = 8080
|
||||
start_server(port)
|
||||
while shared_data.status == 'running':
|
||||
# do something
|
||||
pass
|
||||
```
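
For intuition, the sketch below shows what a helper like `start_server` might do internally, assuming newline-delimited JSON commands; the project's actual implementation may differ:

```python
# Hedged sketch of a stop-command listener; SharedData stands in for the
# project's shared_data object and is an assumption for illustration.
import json
import socket
import threading

class SharedData:
    status = "running"

shared_data = SharedData()

def start_server_sketch(port: int):
    def serve():
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.bind(("127.0.0.1", port))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn, conn.makefile("r", encoding="utf-8") as lines:
            for line in lines:
                msg = json.loads(line)
                if msg.get("command") == "stop":
                    shared_data.status = "stop"  # signal the main loop to exit
                    break
        srv.close()
    threading.Thread(target=serve, daemon=True).start()
```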
|
||||
|
||||
### Audio Acquisition
|
||||
|
||||
The `AudioStream` class is used to acquire audio data, with cross-platform implementation supporting Windows, Linux, and macOS. The class initialization includes two parameters:
|
||||
|
||||
- `audio_type`: Audio acquisition type, 0 for system output audio (speaker), 1 for system input audio (microphone)
|
||||
- `chunk_rate`: Audio data acquisition frequency, number of audio chunks acquired per second, default is 10
|
||||
|
||||
The class contains four methods:
|
||||
|
||||
- `open_stream()`: Start audio acquisition
|
||||
- `read_chunk() -> bytes`: Read an audio chunk
|
||||
- `close_stream()`: Close audio acquisition
|
||||
- `close_stream_signal()`: Thread-safe closing of system audio input stream
|
||||
|
||||
Example:
|
||||
|
||||
```python
|
||||
from sysaudio import AudioStream
|
||||
audio_type = 0
|
||||
chunk_rate = 20
|
||||
stream = AudioStream(audio_type, chunk_rate)
|
||||
stream.open_stream()
|
||||
while True:
|
||||
data = stream.read_chunk()
|
||||
# do something with data
|
||||
pass
|
||||
stream.close_stream()
|
||||
```
|
||||
|
||||
### Audio Processing
|
||||
|
||||
Before converting audio streams to text, preprocessing may be required. Usually, multi-channel audio needs to be converted to single-channel audio, and resampling may also be needed. This project provides two audio processing functions:
|
||||
|
||||
- `merge_chunk_channels(chunk: bytes, channels: int) -> bytes`: Convert multi-channel audio chunks to single-channel audio chunks
|
||||
- `resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int) -> bytes`: Convert current multi-channel audio data chunks to single-channel audio data chunks, then perform resampling
|
||||
|
||||
Example:
|
||||
|
||||
```python
|
||||
from sysaudio import AudioStream
|
||||
from utils import resample_chunk_mono
|
||||
stream = AudioStream(1)
|
||||
while True:
|
||||
raw_chunk = stream.read_chunk()
|
||||
chunk = resample_chunk_mono(raw_chunk, stream.CHANNELS, stream.RATE, 16000)
|
||||
# do something with chunk
|
||||
```
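
For intuition, a channel merge can be sketched with NumPy as shown below, assuming interleaved 16-bit PCM (s16le) input; the project's actual `merge_chunk_channels` may be implemented differently:

```python
import numpy as np

def merge_chunk_channels_sketch(chunk: bytes, channels: int) -> bytes:
    # Assumes the chunk holds interleaved 16-bit PCM frames.
    samples = np.frombuffer(chunk, dtype=np.int16)
    frames = samples.reshape(-1, channels).astype(np.int32)  # (n_frames, channels)
    mono = frames.mean(axis=1).astype(np.int16)              # average the channels
    return mono.tobytes()
```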
|
||||
|
||||
## Features to be Implemented by the Caption Engine
|
||||
|
||||
### Audio to Text Conversion
|
||||
|
||||
After obtaining suitable audio streams, the audio stream needs to be converted to text. Generally, various models (cloud or local) are used to implement audio-to-text conversion. Appropriate models should be selected according to requirements.
|
||||
|
||||
It is recommended to encapsulate this as a class implementing four methods; a hedged skeleton follows the list:
|
||||
|
||||
- `start(self)`: Start the model
|
||||
- `send_audio_frame(self, data: bytes)`: Process current audio chunk data, **generated caption data is sent to Electron main process through standard output**
|
||||
- `translate(self)`: Continuously retrieve data chunks from `shared_data.chunk_queue` and call `send_audio_frame` method to process data chunks
|
||||
- `stop(self)`: Stop the model
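
The skeleton below is illustrative only; the model calls are placeholders, and the caption fields mirror the `CaptionItem` format described later in this document:

```python
from utils import shared_data, stdout_obj

class MyRecognizer:
    """Skeleton only; replace the placeholder bodies with real model calls."""
    def start(self):
        pass  # load or connect to the speech-to-text model

    def send_audio_frame(self, data: bytes):
        # Feed one chunk to the model; when a caption is ready, emit it
        # to the Electron main process through standard output.
        stdout_obj({
            "command": "caption",
            "index": 0,
            "time_s": "00:00:00.000",
            "time_t": "00:00:00.500",
            "text": "...",
            "translation": ""
        })

    def translate(self):
        # Drain the shared chunk queue while the engine is running.
        while shared_data.status == 'running':
            chunk = shared_data.chunk_queue.get()
            self.send_audio_frame(chunk)

    def stop(self):
        pass  # release model resources
```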
|
||||
|
||||
Complete caption engine examples:
|
||||
|
||||
- [gummy.py](../../engine/audio2text/gummy.py)
|
||||
- [vosk.py](../../engine/audio2text/vosk.py)
|
||||
- [sosv.py](../../engine/audio2text/sosv.py)
|
||||
|
||||
### Caption Translation
|
||||
|
||||
Some speech-to-text models do not provide translation. If translation is needed, either add a separate translation module or use the project's built-in translation helpers.
|
||||
|
||||
Example:
|
||||
|
||||
```python
|
||||
from utils import google_translate, ollama_translate
|
||||
text = "This is a translation test."
|
||||
google_translate("", "en", text, "time_s")
|
||||
ollama_translate("qwen3:0.6b", "en", text, "time_s")
|
||||
```
|
||||
|
||||
### Caption Data Transmission
|
||||
|
||||
After obtaining the text from the current audio stream, the text needs to be sent to the main program. The caption engine process transmits caption data to the Electron main process through standard output.
|
||||
|
||||
The transmitted content must be a JSON string, where the JSON object needs to contain the following parameters:
|
||||
|
||||
```typescript
|
||||
export interface CaptionItem {
|
||||
command: "caption",
|
||||
index: number, // Caption sequence number
|
||||
time_s: string, // Current caption start time
|
||||
time_t: string, // Current caption end time
|
||||
text: string, // Caption content
|
||||
translation: string // Caption translation
|
||||
}
|
||||
```
|
||||
|
||||
**Note that the buffer must be flushed after each caption JSON data output to ensure that the Electron main process receives strings that can be interpreted as JSON objects each time.** It is recommended to use the project's existing `stdout_obj` function for transmission.
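
For reference, a flushing writer can be as small as the sketch below; the project's actual `stdout_obj` may differ in detail:

```python
import json
import sys

def stdout_obj_sketch(obj: dict):
    # One JSON object per line, flushed immediately so the Electron main
    # process always receives a complete, parseable line.
    sys.stdout.write(json.dumps(obj, ensure_ascii=False) + "\n")
    sys.stdout.flush()
```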
|
||||
|
||||
### Command Line Parameter Specification
|
||||
|
||||
Custom caption engines are configured entirely through command-line parameters, so the engine's parameters must be defined carefully. The parameters currently used by this project are as follows:
|
||||
|
||||
```python
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser(description='Convert system audio stream to text')
|
||||
# all
|
||||
parser.add_argument('-e', '--caption_engine', default='gummy', help='Caption engine: gummy or vosk')
|
||||
parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output, 1 for input')
|
||||
parser.add_argument('-c', '--chunk_rate', default=10, help='Number of audio stream chunks collected per second')
|
||||
parser.add_argument('-p', '--port', default=0, help='The port to run the server on, 0 for no server')
|
||||
parser.add_argument('-t', '--target_language', default='zh', help='Target language code, "none" for no translation')
|
||||
parser.add_argument('-r', '--record', default=0, help='Whether to record the audio, 0 for no recording, 1 for recording')
|
||||
parser.add_argument('-rp', '--record_path', default='', help='Path to save the recorded audio')
|
||||
# gummy and sosv
|
||||
parser.add_argument('-s', '--source_language', default='auto', help='Source language code')
|
||||
# gummy only
|
||||
parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
|
||||
# vosk and sosv
|
||||
parser.add_argument('-tm', '--translation_model', default='ollama', help='Model for translation: ollama or google')
|
||||
parser.add_argument('-omn', '--ollama_name', default='', help='Ollama model name for translation')
|
||||
# vosk only
|
||||
parser.add_argument('-vosk', '--vosk_model', default='', help='The path to the vosk model.')
|
||||
# sosv only
|
||||
parser.add_argument('-sosv', '--sosv_model', default=None, help='The SenseVoice model path')
|
||||
```
|
||||
|
||||
For example, for this project's caption engine, if I want to use the Gummy model, specify the original text as Japanese, translate to Chinese, and capture captions from system audio output, with 0.1s audio data segments each time, the command line parameters would be:
|
||||
|
||||
```bash
|
||||
python main.py -e gummy -s ja -t zh -a 0 -c 10 -k <dashscope-api-key>
|
||||
```
|
||||
|
||||
## Additional Notes
|
||||
|
||||
### Communication Standards
|
||||
|
||||
[caption engine api-doc](../api-docs/caption-engine.md)
|
||||
|
||||
### Program Entry
|
||||
|
||||
[main.py](../../engine/main.py)
|
||||
|
||||
### Development Recommendations
|
||||
|
||||
For everything except audio-to-text conversion and translation (audio acquisition, audio resampling, and communication with the main process), it is recommended to reuse this project's code directly. In that case, the content that needs to be added is:
|
||||
|
||||
- `engine/audio2text/`: Add a new audio-to-text class (file-level).
|
||||
- `engine/main.py`: Add new parameter settings and workflow functions (refer to `main_gummy` and `main_vosk` functions).
|
||||
|
||||
### Packaging
|
||||
|
||||
After development and testing, the caption engine must be packaged into an executable. Typically, `pyinstaller` is used. If the packaged executable reports errors, check for missing dependencies.
|
||||
|
||||
### Execution
|
||||
|
||||
With a functional caption engine, it can be launched in the caption software window by specifying the engine's path and runtime arguments.
|
||||
|
||||

|
||||
203
docs/engine-manual/ja.md
Normal file
@@ -0,0 +1,203 @@
|
||||
# 字幕エンジン説明ドキュメント
|
||||
|
||||
## 注意:このドキュメントはメンテナンスが行われていないため、記載されている情報は古くなっています。最新の情報については、[中国語版](./zh.md)または[英語版](./en.md)のドキュメントをご参照ください。
|
||||
|
||||
対応バージョン:v0.6.0
|
||||
|
||||
この文書は大規模モデルを使用して翻訳されていますので、内容に正確でない部分があるかもしれません。
|
||||
|
||||

|
||||
|
||||
## 字幕エンジン紹介
|
||||
|
||||
字幕エンジンとは、システムのオーディオ入力(マイク)または出力(スピーカー)のストリーミングデータをリアルタイムで取得し、音声を文字に変換するモデルを呼び出して対応する字幕を生成するサブプログラムです。生成された字幕はJSON形式の文字列データに変換され、標準出力を介してメインプログラムに渡されます(メインプログラムが受け取る文字列が正しくJSONオブジェクトとして解釈できる必要があります)。メインプログラムは字幕データを読み取り、解釈して処理した後、ウィンドウに表示します。
|
||||
|
||||
**字幕エンジンプロセスとElectronメインプロセス間の通信は、[caption engine api-doc](../api-docs/caption-engine.md)に準拠しています。**
|
||||
|
||||
## 実行フロー
|
||||
|
||||
メインプロセスと字幕エンジンの通信フロー:
|
||||
|
||||
### エンジンの起動
|
||||
|
||||
- メインプロセス:`child_process.spawn()`を使用して字幕エンジンプロセスを起動
|
||||
- 字幕エンジンプロセス:TCP Socketサーバースレッドを作成し、作成後に標準出力にJSONオブジェクトを文字列化して出力。このオブジェクトには`command`フィールドが含まれ、値は`connect`
|
||||
- メインプロセス:字幕エンジンプロセスの標準出力を監視し、標準出力を行ごとに分割してJSONオブジェクトとして解析し、オブジェクトの`command`フィールドの値が`connect`かどうかを判断。`connect`の場合はTCP Socketサーバーに接続
|
||||
|
||||
### 字幕認識
|
||||
|
||||
- 字幕エンジンプロセス:メインスレッドでシステムオーディオ出力を監視し、オーディオデータブロックを字幕エンジンに送信して解析。字幕エンジンはオーディオデータブロックを解析し、標準出力を介して解析された字幕データオブジェクト文字列を送信
|
||||
- メインプロセス:字幕エンジンの標準出力を引き続き監視し、解析されたオブジェクトの`command`フィールドに基づいて異なる操作を実行
|
||||
|
||||
### エンジンの停止
|
||||
|
||||
- メインプロセス:ユーザーがフロントエンドで字幕エンジンを停止する操作を実行すると、メインプロセスはSocket通信を介して字幕エンジンプロセスに`command`フィールドが`stop`のオブジェクト文字列を送信
|
||||
- 字幕エンジンプロセス:メインエンジンプロセスから送信された字幕データオブジェクト文字列を受信し、文字列をオブジェクトとして解析。オブジェクトの`command`フィールドが`stop`の場合、グローバル変数`thread_data.status`の値を`stop`に設定
|
||||
- 字幕エンジンプロセス:メインスレッドでシステムオーディオ出力をループ監視し、`thread_data.status`の値が`running`でない場合、ループを終了し、リソースを解放して実行を終了
|
||||
- メインプロセス:字幕エンジンプロセスの終了を検出した場合、対応する処理を実行し、フロントエンドにフィードバック
|
||||
|
||||
## プロジェクトで実装済みの機能
|
||||
|
||||
以下の機能はすでに実装されており、直接再利用できます。
|
||||
|
||||
### 標準出力
|
||||
|
||||
通常情報、コマンド、エラー情報を出力できます。
|
||||
|
||||
サンプル:
|
||||
|
||||
```python
|
||||
from utils import stdout, stdout_cmd, stdout_obj, stderr
|
||||
stdout("Hello") # {"command": "print", "content": "Hello"}\n
|
||||
stdout_cmd("connect", "8080") # {"command": "connect", "content": "8080"}\n
|
||||
stdout_obj({"command": "print", "content": "Hello"})
|
||||
stderr("Error Info")
|
||||
```
|
||||
|
||||
### Socketサービスの作成
|
||||
|
||||
このSocketサービスは指定されたポートを監視し、Electronメインプログラムから送信された内容を解析し、`thread_data.status`の値を変更する可能性があります。
|
||||
|
||||
サンプル:
|
||||
|
||||
```python
|
||||
from utils import start_server
|
||||
from utils import thread_data
|
||||
port = 8080
|
||||
start_server(port)
|
||||
while thread_data.status == 'running':
|
||||
# 何か処理
|
||||
pass
|
||||
```
|
||||
|
||||
### オーディオ取得
|
||||
|
||||
`AudioStream`クラスはオーディオデータを取得するために使用され、Windows、Linux、macOSでクロスプラットフォームで実装されています。このクラスの初期化には2つのパラメータが含まれます:
|
||||
|
||||
- `audio_type`:取得するオーディオのタイプ。0はシステム出力オーディオ(スピーカー)、1はシステム入力オーディオ(マイク)
|
||||
- `chunk_rate`:オーディオデータの取得頻度。1秒あたりに取得するオーディオブロックの数
|
||||
|
||||
このクラスには3つのメソッドがあります:
|
||||
|
||||
- `open_stream()`:オーディオ取得を開始
|
||||
- `read_chunk() -> bytes`:1つのオーディオブロックを読み取り
|
||||
- `close_stream()`:オーディオ取得を閉じる
|
||||
|
||||
サンプル:
|
||||
|
||||
```python
|
||||
from sysaudio import AudioStream
|
||||
audio_type = 0
|
||||
chunk_rate = 20
|
||||
stream = AudioStream(audio_type, chunk_rate)
|
||||
stream.open_stream()
|
||||
while True:
|
||||
data = stream.read_chunk()
|
||||
# データで何か処理
|
||||
pass
|
||||
stream.close_stream()
|
||||
```
|
||||
|
||||
### オーディオ処理
|
||||
|
||||
取得したオーディオストリームは、文字に変換する前に前処理が必要な場合があります。一般的に、マルチチャンネルオーディオをシングルチャンネルオーディオに変換し、リサンプリングが必要な場合もあります。このプロジェクトでは、3つのオーディオ処理関数を提供しています:
|
||||
|
||||
- `merge_chunk_channels(chunk: bytes, channels: int) -> bytes`:マルチチャンネルオーディオブロックをシングルチャンネルオーディオブロックに変換
|
||||
- `resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes`:現在のマルチチャンネルオーディオデータブロックをシングルチャンネルオーディオデータブロックに変換し、リサンプリングを実行
|
||||
- `resample_mono_chunk(chunk: bytes, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes`:現在のシングルチャンネルオーディオブロックをリサンプリング
|
||||
|
||||
## 字幕エンジンで実装が必要な機能
|
||||
|
||||
### オーディオから文字への変換
|
||||
|
||||
適切なオーディオストリームを取得した後、オーディオストリームを文字に変換する必要があります。一般的に、さまざまなモデル(クラウドまたはローカル)を使用してオーディオストリームを文字に変換します。要件に応じて適切なモデルを選択する必要があります。
|
||||
|
||||
この部分はクラスとしてカプセル化することをお勧めします。以下の3つのメソッドを実装する必要があります:
|
||||
|
||||
- `start(self)`:モデルを起動
|
||||
- `send_audio_frame(self, data: bytes)`:現在のオーディオブロックデータを処理し、**生成された字幕データを標準出力を介してElectronメインプロセスに送信**
|
||||
- `stop(self)`:モデルを停止
|
||||
|
||||
完全な字幕エンジンの実例:
|
||||
|
||||
- [gummy.py](../../engine/audio2text/gummy.py)
|
||||
- [vosk.py](../../engine/audio2text/vosk.py)
|
||||
|
||||
### 字幕翻訳
|
||||
|
||||
一部の音声文字変換モデルは翻訳を提供していません。必要がある場合、翻訳モジュールを追加する必要があります。
|
||||
|
||||
### 字幕データの送信
|
||||
|
||||
現在のオーディオストリームのテキストを取得した後、そのテキストをメインプログラムに送信する必要があります。字幕エンジンプロセスは標準出力を介して字幕データをElectronメインプロセスに渡します。
|
||||
|
||||
送信する内容はJSON文字列でなければなりません。JSONオブジェクトには以下のパラメータを含める必要があります:
|
||||
|
||||
```typescript
|
||||
export interface CaptionItem {
|
||||
command: "caption",
|
||||
index: number, // 字幕のシーケンス番号
|
||||
time_s: string, // 現在の字幕の開始時間
|
||||
time_t: string, // 現在の字幕の終了時間
|
||||
text: string, // 字幕の内容
|
||||
translation: string // 字幕の翻訳
|
||||
}
|
||||
```
|
||||
|
||||
**JSONデータを出力するたびにバッファをフラッシュし、electronメインプロセスが受信する文字列が常にJSONオブジェクトとして解釈できるようにする必要があります。**
|
||||
|
||||
プロジェクトで既に実装されている`stdout_obj`関数を使用して送信することをお勧めします。
|
||||
|
||||
### コマンドラインパラメータの指定
|
||||
|
||||
カスタム字幕エンジンの設定はコマンドラインパラメータで指定するため、字幕エンジンのパラメータを設定する必要があります。このプロジェクトで現在使用されているパラメータは以下のとおりです:
|
||||
|
||||
```python
|
||||
import argparse
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser(description='システムオーディオストリームをテキストに変換')
|
||||
# 共通
|
||||
parser.add_argument('-e', '--caption_engine', default='gummy', help='字幕エンジン: gummyまたはvosk')
|
||||
parser.add_argument('-a', '--audio_type', default=0, help='オーディオストリームソース: 0は出力、1は入力')
|
||||
parser.add_argument('-c', '--chunk_rate', default=10, help='1秒あたりに収集するオーディオストリームブロックの数')
|
||||
parser.add_argument('-p', '--port', default=8080, help='サーバーを実行するポート、0はサーバーなし')
|
||||
# gummy専用
|
||||
parser.add_argument('-s', '--source_language', default='en', help='ソース言語コード')
|
||||
parser.add_argument('-t', '--target_language', default='zh', help='ターゲット言語コード')
|
||||
parser.add_argument('-k', '--api_key', default='', help='GummyモデルのAPI KEY')
|
||||
# vosk専用
|
||||
parser.add_argument('-m', '--model_path', default='', help='voskモデルのパス')
|
||||
```
|
||||
|
||||
たとえば、このプロジェクトの字幕エンジンでGummyモデルを使用し、原文を日本語、翻訳を中国語に指定し、システムオーディオ出力の字幕を取得し、毎回0.1秒のオーディオデータをキャプチャする場合、コマンドラインパラメータは以下のようになります:
|
||||
|
||||
```bash
|
||||
python main.py -e gummy -s ja -t zh -a 0 -c 10 -k <dashscope-api-key>
|
||||
```
|
||||
|
||||
## その他
|
||||
|
||||
### 通信規格
|
||||
|
||||
[caption engine api-doc](../api-docs/caption-engine.md)
|
||||
|
||||
### プログラムエントリ
|
||||
|
||||
[main.py](../../engine/main.py)
|
||||
|
||||
### 開発の推奨事項
|
||||
|
||||
オーディオから文字への変換以外は、このプロジェクトのコードを直接再利用することをお勧めします。その場合、追加する必要がある内容は:
|
||||
|
||||
- `engine/audio2text/`:新しいオーディオから文字への変換クラスを追加(ファイルレベル)
|
||||
- `engine/main.py`:新しいパラメータ設定とプロセス関数を追加(`main_gummy`関数と`main_vosk`関数を参照)
|
||||
|
||||
### パッケージ化
|
||||
|
||||
字幕エンジンの開発とテストが完了した後、字幕エンジンを実行可能ファイルにパッケージ化する必要があります。一般的に`pyinstaller`を使用してパッケージ化します。パッケージ化された字幕エンジンファイルの実行でエラーが発生した場合、依存ライブラリが不足している可能性があります。不足している依存ライブラリを確認してください。
|
||||
|
||||
### 実行
|
||||
|
||||
使用可能な字幕エンジンを取得したら、字幕ソフトウェアウィンドウで字幕エンジンのパスと字幕エンジンの実行コマンド(パラメータ)を指定して字幕エンジンを起動できます。
|
||||
|
||||

|
||||
231
docs/engine-manual/zh.md
Normal file
@@ -0,0 +1,231 @@
|
||||
# 字幕引擎说明文档
|
||||
|
||||
对应版本:v1.0.0
|
||||
|
||||

|
||||
|
||||
## 字幕引擎介绍
|
||||
|
||||
所谓的字幕引擎实际上是一个子程序,它会实时获取系统音频输入(麦克风)或输出(扬声器)的流式数据,并调用音频转文字的模型生成对应音频的字幕。生成的字幕转换为 JSON 格式的字符串数据,并通过标准输出传递给主程序(需要保证主程序读取到的字符串可以被正确解释为 JSON 对象)。主程序读取并解释字幕数据,处理后显示在窗口上。
|
||||
|
||||
**字幕引擎进程和 Electron 主进程之间的通信遵循的标准为:[caption engine api-doc](../api-docs/caption-engine.md)。**
|
||||
|
||||
## 运行流程
|
||||
|
||||
主进程和字幕引擎通信的流程:
|
||||
|
||||
### 启动引擎
|
||||
|
||||
- Electron 主进程:使用 `child_process.spawn()` 启动字幕引擎进程
|
||||
- 字幕引擎进程:创建 TCP Socket 服务器线程,创建后在标准输出中输出转化为字符串的 JSON 对象,该对象中包含 `command` 字段,值为 `connect`
|
||||
- 主进程:监听字幕引擎进程的标准输出,尝试将标准输出按行分割,解析为 JSON 对象,并判断对象的 `command` 字段值是否为 `connect`,如果是则连接 TCP Socket 服务器
|
||||
|
||||
### 字幕识别
|
||||
|
||||
- 字幕引擎进程:新建线程监听系统音频输出,将获取的音频数据块放入共享队列中(`shared_data.chunk_queue`)。字幕引擎不断读取共享队列中的音频数据块并解析。字幕引擎还可能新建线程执行翻译操作。最后字幕引擎通过标准输出发送解析的字幕数据对象字符串
|
||||
- Electron 主进程:持续监听字幕引擎的标准输出,并根据解析的对象的 `command` 字段采取不同的操作
|
||||
|
||||
### 关闭引擎
|
||||
|
||||
- Electron 主进程:当用户在前端操作关闭字幕引擎时,主进程通过 Socket 通信给字幕引擎进程发送 `command` 字段为 `stop` 的对象字符串
|
||||
- 字幕引擎进程:接收主引擎进程发送的字幕数据对象字符串,将字符串解析为对象,如果对象的 `command` 字段为 `stop`,则将全局变量 `shared_data.status` 的值设置为 `stop`
|
||||
- 字幕引擎进程:主线程循环监听系统音频输出,当 `shared_data.status` 的值不为 `running` 时,则结束循环,释放资源,结束运行
|
||||
- Electron 主进程:如果检测到字幕引擎进程结束,进行相应处理,并向前端反馈
|
||||
|
||||
## 项目已经实现的功能
|
||||
|
||||
以下功能已经实现,可以直接复用。
|
||||
|
||||
### 标准输出
|
||||
|
||||
可以输出普通信息、命令和错误信息。
|
||||
|
||||
样例:
|
||||
|
||||
```python
|
||||
from utils import stdout, stdout_cmd, stdout_obj, stderr
|
||||
# {"command": "print", "content": "Hello"}\n
|
||||
stdout("Hello")
|
||||
# {"command": "connect", "content": "8080"}\n
|
||||
stdout_cmd("connect", "8080")
|
||||
# {"command": "print", "content": "print"}\n
|
||||
stdout_obj({"command": "print", "content": "print"})
|
||||
# sys.stderr.write("Error Info" + "\n")
|
||||
stderr("Error Info")
|
||||
```
|
||||
|
||||
### 创建 Socket 服务
|
||||
|
||||
该 Socket 服务会监听指定端口,会解析 Electron 主程序发送的内容,并可能改变 `shared_data.status` 的值。
|
||||
|
||||
样例:
|
||||
|
||||
```python
|
||||
from utils import start_server
|
||||
from utils import shared_data
|
||||
port = 8080
|
||||
start_server(port)
|
||||
while shared_data.status == 'running':
|
||||
# do something
|
||||
pass
|
||||
```
|
||||
|
||||
### 音频获取
|
||||
|
||||
`AudioStream` 类用于获取音频数据,实现是跨平台的,支持 Windows、Linux 和 macOS。该类初始化包含两个参数:
|
||||
|
||||
- `audio_type`: 获取音频类型,0 表示系统输出音频(扬声器),1 表示系统输入音频(麦克风)
|
||||
- `chunk_rate`: 音频数据获取频率,每秒音频获取的音频块的数量,默认为 10
|
||||
|
||||
该类包含四个方法:
|
||||
|
||||
- `open_stream()`: 开启音频获取
|
||||
- `read_chunk() -> bytes`: 读取一个音频块
|
||||
- `close_stream()`: 关闭音频获取
|
||||
- `close_stream_signal()`: 线程安全地关闭系统音频输入流
|
||||
|
||||
样例:
|
||||
|
||||
```python
|
||||
from sysaudio import AudioStream
|
||||
audio_type = 0
|
||||
chunk_rate = 20
|
||||
stream = AudioStream(audio_type, chunk_rate)
|
||||
stream.open_stream()
|
||||
while True:
|
||||
data = stream.read_chunk()
|
||||
# do something with data
|
||||
pass
|
||||
stream.close_stream()
|
||||
```
|
||||
|
||||
### 音频处理
|
||||
|
||||
获取到的音频流在转文字之前可能需要进行预处理。一般需要将多通道音频转换为单通道音频,还可能需要进行重采样。本项目提供了两个音频处理函数:
|
||||
|
||||
- `merge_chunk_channels(chunk: bytes, channels: int) -> bytes`: 将多通道音频块转换为单通道音频块
|
||||
- `resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int) -> bytes`:将当前多通道音频数据块转换成单通道音频数据块,然后进行重采样
|
||||
|
||||
样例:
|
||||
|
||||
```python
|
||||
from sysaudio import AudioStream
|
||||
from utils import resample_chunk_mono
|
||||
stream = AudioStream(1)
|
||||
while True:
|
||||
raw_chunk = stream.read_chunk()
|
||||
chunk = resample_chunk_mono(raw_chunk, stream.CHANNELS, stream.RATE, 16000)
|
||||
# do something with chunk
|
||||
```
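
为了便于理解,下面用 NumPy 给出通道合并的一个示意实现(假设输入为交错的 16 位 PCM 数据),项目中 `merge_chunk_channels` 的实际实现可能有所不同:

```python
import numpy as np

def merge_chunk_channels_sketch(chunk: bytes, channels: int) -> bytes:
    # 假设数据块为交错的 16 位 PCM 帧
    samples = np.frombuffer(chunk, dtype=np.int16)
    frames = samples.reshape(-1, channels).astype(np.int32)  # (帧数, 通道数)
    mono = frames.mean(axis=1).astype(np.int16)              # 对各通道取平均
    return mono.tobytes()
```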
|
||||
|
||||
## 字幕引擎需要实现的功能
|
||||
|
||||
### 音频转文字
|
||||
|
||||
在得到了合适的音频流后,需要将音频流转换为文字了。一般使用各种模型(云端或本地)来实现音频流转文字。需要根据需求选择合适的模型。
|
||||
|
||||
这部分建议封装为一个类,需要实现四个方法(列表后附一个简化骨架示意):
|
||||
|
||||
- `start(self)`:启动模型
|
||||
- `send_audio_frame(self, data: bytes)`:处理当前音频块数据,**生成的字幕数据通过标准输出发送给 Electron 主进程**
|
||||
- `translate(self)`:持续从 `shared_data.chunk_queue` 中取出数据块,并调用 `send_audio_frame` 方法处理数据块
|
||||
- `stop(self)`:停止模型
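
下面给出该类的一个简化骨架示意(模型调用为占位,字幕字段对应后文的 `CaptionItem` 格式):

```python
from utils import shared_data, stdout_obj

class MyRecognizer:
    """仅为骨架示意,占位部分需替换为真实的模型调用"""
    def start(self):
        pass  # 加载或连接语音转文字模型

    def send_audio_frame(self, data: bytes):
        # 将一个音频块送入模型;识别出字幕后通过标准输出发送给 Electron 主进程
        stdout_obj({
            "command": "caption",
            "index": 0,
            "time_s": "00:00:00.000",
            "time_t": "00:00:00.500",
            "text": "...",
            "translation": ""
        })

    def translate(self):
        # 引擎运行期间持续从共享队列中取出数据块
        while shared_data.status == 'running':
            chunk = shared_data.chunk_queue.get()
            self.send_audio_frame(chunk)

    def stop(self):
        pass  # 释放模型资源
```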
|
||||
|
||||
完整的字幕引擎实例如下:
|
||||
|
||||
- [gummy.py](../../engine/audio2text/gummy.py)
|
||||
- [vosk.py](../../engine/audio2text/vosk.py)
|
||||
- [sosv.py](../../engine/audio2text/sosv.py)
|
||||
|
||||
### 字幕翻译
|
||||
|
||||
有的语音转文字模型并不提供翻译,如果有需求,需要再添加一个翻译模块,也可以使用自带的翻译模块。
|
||||
|
||||
样例:
|
||||
|
||||
```python
|
||||
from utils import google_translate, ollama_translate
|
||||
text = "这是一个翻译测试。"
|
||||
google_translate("", "en", text, "time_s")
|
||||
ollama_translate("qwen3:0.6b", "en", text, "time_s")
|
||||
```
|
||||
|
||||
### 字幕数据发送
|
||||
|
||||
在获取到当前音频流的文字后,需要将文字发送给主程序。字幕引擎进程通过标准输出将字幕数据传递给 Electron 主进程。
|
||||
|
||||
传递的内容必须是 JSON 字符串,其中 JSON 对象需要包含的参数如下:
|
||||
|
||||
```typescript
|
||||
export interface CaptionItem {
|
||||
command: "caption",
|
||||
index: number, // 字幕序号
|
||||
time_s: string, // 当前字幕开始时间
|
||||
time_t: string, // 当前字幕结束时间
|
||||
text: string, // 字幕内容
|
||||
translation: string // 字幕翻译
|
||||
}
|
||||
```
|
||||
|
||||
**注意必须确保每输出一次字幕 JSON 数据就得刷新缓冲区,确保 electron 主进程每次接收到的字符串都可以被解释为 JSON 对象。** 建议使用项目已经实现的 `stdout_obj` 函数来发送。
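
作为参考,一个带刷新的输出函数可以简化为如下示意,项目中 `stdout_obj` 的实际实现可能略有不同:

```python
import json
import sys

def stdout_obj_sketch(obj: dict):
    # 每行输出一个 JSON 对象并立即刷新缓冲区,
    # 保证 Electron 主进程每次收到的都是完整可解析的一行
    sys.stdout.write(json.dumps(obj, ensure_ascii=False) + "\n")
    sys.stdout.flush()
```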
|
||||
|
||||
### 命令行参数的指定
|
||||
|
||||
自定义字幕引擎的设置提供命令行参数指定,因此需要设置好字幕引擎的参数,本项目目前用到的参数如下:
|
||||
|
||||
```python
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser(description='Convert system audio stream to text')
|
||||
# all
|
||||
parser.add_argument('-e', '--caption_engine', default='gummy', help='Caption engine: gummy or vosk')
|
||||
parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output, 1 for input')
|
||||
parser.add_argument('-c', '--chunk_rate', default=10, help='Number of audio stream chunks collected per second')
|
||||
parser.add_argument('-p', '--port', default=0, help='The port to run the server on, 0 for no server')
|
||||
parser.add_argument('-t', '--target_language', default='zh', help='Target language code, "none" for no translation')
|
||||
parser.add_argument('-r', '--record', default=0, help='Whether to record the audio, 0 for no recording, 1 for recording')
|
||||
parser.add_argument('-rp', '--record_path', default='', help='Path to save the recorded audio')
|
||||
# gummy and sosv
|
||||
parser.add_argument('-s', '--source_language', default='auto', help='Source language code')
|
||||
# gummy only
|
||||
parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
|
||||
# vosk and sosv
|
||||
parser.add_argument('-tm', '--translation_model', default='ollama', help='Model for translation: ollama or google')
|
||||
parser.add_argument('-omn', '--ollama_name', default='', help='Ollama model name for translation')
|
||||
# vosk only
|
||||
parser.add_argument('-vosk', '--vosk_model', default='', help='The path to the vosk model.')
|
||||
# sosv only
|
||||
parser.add_argument('-sosv', '--sosv_model', default=None, help='The SenseVoice model path')
|
||||
```
|
||||
|
||||
比如对于本项目的字幕引擎,我想使用 Gummy 模型,指定原文为日语,翻译为中文,获取系统音频输出的字幕,每次截取 0.1s 的音频数据,那么命令行参数如下:
|
||||
|
||||
```bash
|
||||
python main.py -e gummy -s ja -t zh -a 0 -c 10 -k <dashscope-api-key>
|
||||
```
|
||||
|
||||
## 其他
|
||||
|
||||
### 通信规范
|
||||
|
||||
[caption engine api-doc](../api-docs/caption-engine.md)
|
||||
|
||||
### 程序入口
|
||||
|
||||
[main.py](../../engine/main.py)
|
||||
|
||||
### 开发建议
|
||||
|
||||
除音频转文字和翻译外,其他(音频获取、音频重采样、与主进程通信)建议直接复用本项目代码。如果这样,那么需要添加的内容为:
|
||||
|
||||
- `engine/audio2text/`:添加新的音频转文字类(文件级别)
|
||||
- `engine/main.py`:添加新参数设置、流程函数(参考 `main_gummy` 函数和 `main_vosk` 函数)
|
||||
|
||||
### 打包
|
||||
|
||||
在完成字幕引擎的开发和测试后,需要将字幕引擎打包成可执行文件。一般使用 `pyinstaller` 进行打包。如果打包好的字幕引擎文件执行报错,可能是打包漏掉了某些依赖库,请检查是否缺少了依赖库。
|
||||
|
||||
### 运行
|
||||
|
||||
有了可以使用的字幕引擎,就可以在字幕软件窗口中通过指定字幕引擎的路径和字幕引擎的运行指令(参数)来启动字幕引擎了。
|
||||
|
||||

|
||||
BIN
docs/img/01.png
Normal file
|
After Width: | Height: | Size: 57 KiB |
BIN
docs/img/02_en.png
Normal file
|
After Width: | Height: | Size: 105 KiB |
BIN
docs/img/02_ja.png
Normal file
|
After Width: | Height: | Size: 132 KiB |
BIN
docs/img/02_zh.png
Normal file
|
After Width: | Height: | Size: 111 KiB |
BIN
docs/img/03.png
Normal file
|
After Width: | Height: | Size: 152 KiB |
BIN
docs/img/04.png
Normal file
|
After Width: | Height: | Size: 172 KiB |
BIN
docs/img/05.png
Normal file
|
After Width: | Height: | Size: 26 KiB |
BIN
docs/img/06.png
Normal file
|
After Width: | Height: | Size: 148 KiB |
BIN
docs/img/07.png
Normal file
|
After Width: | Height: | Size: 94 KiB |
333
docs/user-manual/en.md
Normal file
@@ -0,0 +1,333 @@
|
||||
# Auto Caption User Manual
|
||||
|
||||
Corresponding Version: v1.1.1
|
||||
|
||||
**Note: Due to limited personal resources, the English and Japanese documentation files for this project (except for the README document) will no longer be maintained. The content of this document may not be consistent with the latest version of the project. If you are willing to help with translation, please submit relevant Pull Requests.**
|
||||
|
||||
## Software Introduction
|
||||
|
||||
Auto Caption is a cross-platform caption display software that can real-time capture system audio input (recording) or output (playback) streaming data and use an audio-to-text model to generate captions for the corresponding audio. The default caption engine provided by the software (using Alibaba Cloud Gummy model) supports recognition and translation in nine languages (Chinese, English, Japanese, Korean, German, French, Russian, Spanish, Italian).
|
||||
|
||||
The default caption engine currently has full functionality on Windows, macOS, and Linux platforms. Additional configuration is required to capture system audio output on macOS.
|
||||
|
||||
The following operating system versions have been tested and confirmed to work properly. The software cannot guarantee normal operation on untested OS versions.
|
||||
|
||||
| OS Version | Architecture | Audio Input Capture | Audio Output Capture |
|
||||
| ------------------ | ------------ | ------------------- | -------------------- |
|
||||
| Windows 11 24H2 | x64 | ✅ | ✅ |
|
||||
| macOS Sequoia 15.5 | arm64 | ✅ Additional config required | ✅ |
|
||||
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
|
||||
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
|
||||
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
|
||||
|
||||

|
||||
|
||||
### Software Limitations
|
||||
|
||||
To use the Gummy caption engine, you need to obtain an API KEY from Alibaba Cloud.
|
||||
|
||||
Additional configuration is required to capture audio output on macOS platform.
|
||||
|
||||
The software is built using Electron, so the software size is inevitably large.
|
||||
|
||||
## Preparation for Using Gummy Engine
|
||||
|
||||
To use the default caption engine provided by the software (Alibaba Cloud Gummy), you need to obtain an API KEY from the Alibaba Cloud Bailian platform. Then add the API KEY to the software settings or configure it in environment variables (only Windows platform supports reading API KEY from environment variables).
|
||||
|
||||
**The international version of Alibaba Cloud services does not provide the Gummy model, so non-Chinese users currently cannot use the default caption engine.**
|
||||
|
||||
Alibaba Cloud provides detailed tutorials for this part, which can be referenced:
|
||||
|
||||
- [Obtaining API KEY (Chinese)](https://help.aliyun.com/zh/model-studio/get-api-key)
|
||||
- [Configuring API Key through Environment Variables (Chinese)](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
|
||||
|
||||
|
||||
## Preparation for GLM Engine
|
||||
|
||||
You need to obtain an API KEY first, refer to: [Quick Start](https://docs.bigmodel.cn/en/guide/start/quick-start).
|
||||
|
||||
## Preparation for Using Vosk Engine
|
||||
|
||||
To use the Vosk local caption engine, first download your required model from the [Vosk Models](https://alphacephei.com/vosk/models) page. Then extract the downloaded model package locally and add the corresponding model folder path to the software settings.
|
||||
|
||||

|
||||
|
||||
## Using SOSV Model
|
||||
|
||||
The way to use the SOSV model is the same as Vosk. The download address is as follows: https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model
|
||||
|
||||
## Capturing System Audio Output on macOS
|
||||
|
||||
> Based on the [Setup Multi-Output Device](https://github.com/ExistentialAudio/BlackHole/wiki/Multi-Output-Device) tutorial
|
||||
|
||||
The caption engine cannot directly capture system audio output on macOS platform and requires additional driver installation. The current caption engine uses [BlackHole](https://github.com/ExistentialAudio/BlackHole). First open Terminal and execute one of the following commands (recommended to choose the first one):
|
||||
|
||||
```bash
|
||||
brew install blackhole-2ch
|
||||
brew install blackhole-16ch
|
||||
brew install blackhole-64ch
|
||||
```
|
||||
|
||||

|
||||
|
||||
After installation completes, open `Audio MIDI Setup` (searchable via `cmd + space`). Check whether BlackHole appears in the device list; if it does not, restart your computer.
|
||||
|
||||

|
||||
|
||||
Once BlackHole is confirmed installed, in the `Audio MIDI Setup` page, click the plus (+) button at bottom left and select "Create Multi-Output Device". Include both BlackHole and your desired audio output destination in the outputs. Finally, set this multi-output device as your default audio output device.
|
||||
|
||||

|
||||
|
||||
Now the caption engine can capture system audio output and generate captions.
|
||||
|
||||
## Getting System Audio Output on Linux
|
||||
|
||||
First execute in the terminal:
|
||||
|
||||
```bash
|
||||
pactl list short sources
|
||||
```
|
||||
|
||||
If you see output similar to the following, no additional configuration is needed:
|
||||
|
||||
```bash
|
||||
220 alsa_output.pci-0000_02_02.0.3.analog-stereo.monitor PipeWire s16le 2ch 48000Hz SUSPENDED
|
||||
221 alsa_input.pci-0000_02_02.0.3.analog-stereo PipeWire s16le 2ch 48000Hz SUSPENDED
|
||||
```
|
||||
|
||||
Otherwise, install `pulseaudio` and `pavucontrol` using the following commands:
|
||||
|
||||
```bash
|
||||
# For Debian/Ubuntu etc.
|
||||
sudo apt install pulseaudio pavucontrol
|
||||
# For CentOS etc.
|
||||
sudo yum install pulseaudio pavucontrol
|
||||
```
|
||||
|
||||
## Software Usage
|
||||
|
||||
### Modifying Settings
|
||||
|
||||
Caption settings can be divided into three categories: general settings, caption engine settings, and caption style settings. Note that changes to general settings take effect immediately. For the other two categories, after making changes, you need to click the "Apply" option in the upper right corner of the corresponding settings module for the changes to take effect. If you click "Cancel Changes," the current modifications will not be saved and will revert to the previous state.
|
||||
|
||||
### Starting and Stopping Captions
|
||||
|
||||
After completing all configurations, click the "Start Caption Engine" button on the interface to start the captions. If you need a separate caption display window, click the "Open Caption Window" button to activate the independent caption display window. To pause caption recognition, click the "Stop Caption Engine" button.
|
||||
|
||||
### Adjusting the Caption Display Window
|
||||
|
||||
The following image shows the caption display window, which displays the latest captions in real-time. The functions of the three buttons in the upper right corner of the window are: to close the caption display window, to open the caption control window, and to enable mouse pass-through. The width of the window can be adjusted by moving the mouse to the left or right edge of the window and dragging the mouse.
|
||||
|
||||

|
||||
|
||||
### Exporting Caption Records
|
||||
|
||||
In the caption control window, you can see the records of all collected captions. Click the "Export Log" button to export the caption records as a JSON or SRT file.
|
||||
|
||||
## Caption Engine
|
||||
|
||||
The so-called caption engine is essentially a subprogram that captures real-time streaming data from system audio input (recording) or output (playback), and invokes speech-to-text models to generate corresponding captions. The generated captions are converted into JSON-formatted strings and passed to the main program through standard output. The main program reads the caption data, processes it, and displays it in the window.
|
||||
|
||||
The software provides two default caption engines. If you need other caption engines, you can invoke them by enabling the custom engine option (other engines need to be specifically developed for this software). The engine path refers to the location of the custom caption engine on your computer, while the engine command represents the runtime parameters of the custom caption engine, which should be configured according to the rules of that particular caption engine.
|
||||
|
||||

|
||||
|
||||
Note that when using a custom caption engine, all previous caption engine settings will be ineffective, and the configuration of the custom caption engine is entirely done through the engine command.
|
||||
|
||||
If you are a developer and want to develop a custom caption engine, please refer to the [Caption Engine Explanation Document](../engine-manual/en.md).
|
||||
|
||||
## Using Caption Engine Standalone
|
||||
|
||||
### Runtime Parameter Description
|
||||
|
||||
> The following content assumes users have some knowledge of running programs via terminal.
|
||||
|
||||
The complete set of runtime parameters available for the caption engine is shown below:
|
||||
|
||||

|
||||
|
||||
However, when used standalone, some parameters may not need to be used or should not be modified.
|
||||
|
||||
The following parameter descriptions only include necessary parameters.
|
||||
|
||||
#### `-e , --caption_engine`
|
||||
|
||||
The caption engine model to select; currently four options are available: `gummy`, `glm`, `vosk`, `sosv`.
|
||||
|
||||
The default value is `gummy`.
|
||||
|
||||
This applies to all models.
|
||||
|
||||
#### `-a, --audio_type`
|
||||
|
||||
The audio type to recognize, where `0` represents system audio output and `1` represents microphone audio input.
|
||||
|
||||
The default value is `0`.
|
||||
|
||||
This applies to all models.
|
||||
|
||||
#### `-d, --display_caption`
|
||||
|
||||
Whether to display captions in the console, `0` means do not display, `1` means display.
|
||||
|
||||
The default value is `0`, but it's recommended to choose `1` when using only the caption engine.
|
||||
|
||||
This applies to all models.
|
||||
|
||||
#### `-t, --target_language`
|
||||
|
||||
> Note that Vosk and SOSV models have poor sentence segmentation, which can make translated content difficult to understand. It's not recommended to use translation with these two models.
|
||||
|
||||
Target language for translation. All models support the following translation languages:
|
||||
|
||||
- `none` No translation
|
||||
- `zh` Simplified Chinese
|
||||
- `en` English
|
||||
- `ja` Japanese
|
||||
- `ko` Korean
|
||||
|
||||
Additionally, `vosk` and `sosv` models also support the following translations:
|
||||
|
||||
- `de` German
|
||||
- `fr` French
|
||||
- `ru` Russian
|
||||
- `es` Spanish
|
||||
- `it` Italian
|
||||
|
||||
The default value is `none`.
|
||||
|
||||
This applies to all models.
|
||||
|
||||
#### `-s, --source_language`
|
||||
|
||||
Source language for recognition. Default value is `auto`, meaning no specific source language.
|
||||
|
||||
Specifying the source language can improve recognition accuracy to some extent. You can specify the source language using the language codes above.
|
||||
|
||||
This applies to Gummy, GLM and SOSV models.
|
||||
|
||||
The Gummy model can use all the languages mentioned above, plus Cantonese (`yue`).
|
||||
|
||||
The GLM model supports specifying the following languages: English, Chinese, Japanese, Korean.
|
||||
|
||||
The SOSV model supports specifying the following languages: English, Chinese, Japanese, Korean, and Cantonese.
|
||||
|
||||
#### `-k, --api_key`
|
||||
|
||||
Specify the Alibaba Cloud API KEY required for the `Gummy` model.
|
||||
|
||||
Default value is empty.
|
||||
|
||||
This only applies to the Gummy model.
|
||||
|
||||
#### `-gkey, --glm_api_key`
|
||||
|
||||
Specifies the API KEY required for the `glm` model. The default value is empty.
|
||||
|
||||
#### `-gmodel, --glm_model`
|
||||
|
||||
Specifies the model name to be used for the `glm` model. The default value is `glm-asr-2512`.
|
||||
|
||||
#### `-gurl, --glm_url`
|
||||
|
||||
Specifies the API URL required for the `glm` model. The default value is: `https://open.bigmodel.cn/api/paas/v4/audio/transcriptions`.
|
||||
|
||||
#### `-tm, --translation_model`
|
||||
|
||||
Specify the translation method for Vosk and SOSV models. Default is `ollama`.
|
||||
|
||||
Supported values are:
|
||||
|
||||
- `ollama` Use local Ollama model for translation. Users need to install Ollama software and corresponding models
|
||||
- `google` Use Google Translate API for translation. No additional configuration needed, but requires network access to Google
|
||||
|
||||
This only applies to Vosk and SOSV models.
|
||||
|
||||
#### `-omn, --ollama_name`
|
||||
|
||||
Specifies the name of the translation model to be used, which can be either a local Ollama model or a cloud model compatible with the OpenAI API. If the Base URL field is not filled in, the local Ollama service will be called by default; otherwise, the API service at the specified address will be invoked via the Python OpenAI library.
|
||||
|
||||
If using an Ollama model, it is recommended to use a model with fewer than 1B parameters, such as `qwen2.5:0.5b` or `qwen3:0.6b`. The corresponding model must be downloaded in Ollama for normal use.
|
||||
|
||||
The default value is empty and applies to models other than Gummy.
|
||||
|
||||
#### `-ourl, --ollama_url`
|
||||
|
||||
The base request URL for calling the OpenAI API. If left blank, the local Ollama model on the default port will be called.
|
||||
|
||||
The default value is empty and applies to models other than Gummy.
|
||||
|
||||
#### `-okey, --ollama_api_key`
|
||||
|
||||
Specifies the API KEY for calling OpenAI-compatible models.
|
||||
|
||||
The default value is empty and applies to models other than Gummy.
|
||||
|
||||
#### `-vosk, --vosk_model`
|
||||
|
||||
Specify the path to the local folder of the Vosk model to call. Default value is empty.
|
||||
|
||||
This only applies to the Vosk model.
|
||||
|
||||
#### `-sosv, --sosv_model`
|
||||
|
||||
Specify the path to the local folder of the SOSV model to call. Default value is empty.
|
||||
|
||||
This only applies to the SOSV model.
|
||||
|
||||
### Running Caption Engine Using Source Code
|
||||
|
||||
> The following content assumes users who use this method have knowledge of Python environment configuration and usage.
|
||||
|
||||
First, download the project source code locally. The caption engine source code is located in the `engine` directory of the project. Then configure the Python environment, where the project dependencies are listed in the `requirements.txt` file in the `engine` directory.
|
||||
|
||||
After configuration, enter the `engine` directory and execute commands to run the caption engine.
|
||||
|
||||
For example, to use the Gummy model, specify audio type as system audio output, source language as English, and target language as Chinese, execute the following command:
|
||||
|
||||
> Note: For better visualization, the commands below are written on multiple lines. If execution fails, try removing backslashes and executing as a single line command.
|
||||
|
||||
```bash
|
||||
python main.py \
|
||||
-e gummy \
|
||||
-k sk-******************************** \
|
||||
-a 0 \
|
||||
-d 1 \
|
||||
-s en \
|
||||
-t zh
|
||||
```
|
||||
|
||||
To specify the Vosk model, audio type as system audio output, translate to English, and use Ollama `qwen3:0.6b` model for translation:
|
||||
|
||||
```bash
|
||||
python main.py \
|
||||
-e vosk \
|
||||
-vosk D:\Projects\auto-caption\engine\models\vosk-model-small-cn-0.22 \
|
||||
-a 0 \
|
||||
-d 1 \
|
||||
  -t en
|
||||
```
|
||||
|
||||
To specify the SOSV model, audio type as microphone, automatically select source language, and no translation:
|
||||
|
||||
```bash
|
||||
python main.py \
|
||||
-e sosv \
|
||||
-sosv D:\\Projects\\auto-caption\\engine\\models\\sosv-int8 \
|
||||
-a 1 \
|
||||
-d 1 \
|
||||
-s auto \
|
||||
-t none
|
||||
```
|
||||
|
||||
Running result using the Gummy model is shown below:
|
||||
|
||||

|
||||
|
||||
### Running Subtitle Engine Executable File
|
||||
|
||||
First, download the executable file for your platform from [GitHub Releases](https://github.com/HiMeditator/auto-caption/releases/tag/engine) (currently only Windows and Linux platform executable files are provided).
|
||||
|
||||
Then open a terminal in the directory containing the caption engine executable file and execute commands to run the caption engine.
|
||||
|
||||
Simply replace `python main.py` in the above commands with the executable file name (for example: `engine-win.exe -e gummy -a 0 -d 1 -s en -t zh -k <dashscope-api-key>`).
|
||||
137
docs/user-manual/ja.md
Normal file
@@ -0,0 +1,137 @@
|
||||
# Auto Caption ユーザーマニュアル
|
||||
|
||||
対応バージョン:v1.1.1
|
||||
|
||||
この文書は大規模モデルを使用して翻訳されていますので、内容に正確でない部分があるかもしれません。
|
||||
|
||||
## ソフトウェアの概要
|
||||
|
||||
Auto Caption は、クロスプラットフォームの字幕表示ソフトウェアで、システムの音声入力(録音)または出力(音声再生)のストリーミングデータをリアルタイムで取得し、音声からテキストに変換するモデルを利用して対応する音声の字幕を生成します。このソフトウェアが提供するデフォルトの字幕エンジン(アリババクラウド Gummy モデルを使用)は、9つの言語(中国語、英語、日本語、韓国語、ドイツ語、フランス語、ロシア語、スペイン語、イタリア語)の認識と翻訳をサポートしています。
|
||||
|
||||
現在のデフォルト字幕エンジンは Windows、macOS、Linux プラットフォームで完全な機能を有しています。macOSでシステムのオーディオ出力を取得するには追加設定が必要です。
|
||||
|
||||
以下のオペレーティングシステムバージョンで正常動作を確認しています。記載以外の OS での正常動作は保証できません。
|
||||
|
||||
| OS バージョン | アーキテクチャ | オーディオ入力取得 | オーディオ出力取得 |
|
||||
| ------------------- | ------------- | ------------------ | ------------------ |
|
||||
| Windows 11 24H2 | x64 | ✅ | ✅ |
|
||||
| macOS Sequoia 15.5 | arm64 | ✅ 追加設定が必要 | ✅ |
|
||||
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
|
||||
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
|
||||
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
|
||||
|
||||

|
||||
|
||||
### ソフトウェアの欠点
|
||||
|
||||
Gummy 字幕エンジンを使用するには、アリババクラウドの API KEY を取得する必要があります。
|
||||
|
||||
macOS プラットフォームでオーディオ出力を取得するには追加の設定が必要です。
|
||||
|
||||
ソフトウェアは Electron で構築されているため、そのサイズは避けられないほど大きいです。
|
||||
|
||||
## Gummyエンジン使用前の準備
|
||||
|
||||
ソフトウェアが提供するデフォルトの字幕エンジン(Alibaba Cloud Gummy)を使用するには、Alibaba Cloud百煉プラットフォームからAPI KEYを取得する必要があります。その後、API KEYをソフトウェア設定に追加するか、環境変数に設定します(Windowsプラットフォームのみ環境変数からのAPI KEY読み取りをサポート)。
|
||||
|
||||
**Alibaba Cloudの国際版サービスではGummyモデルを提供していないため、現在中国以外のユーザーはデフォルトの字幕エンジンを使用できません。**
|
||||
|
||||
この部分についてAlibaba Cloudは詳細なチュートリアルを提供しており、以下を参照できます:
|
||||
|
||||
- [API KEY の取得(中国語)](https://help.aliyun.com/zh/model-studio/get-api-key)
|
||||
- [環境変数を通じて API Key を設定(中国語)](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
|
||||
|
||||
## GLM エンジン使用前の準備
|
||||
|
||||
まずAPI KEYを取得する必要があります。参考:[クイックスタート](https://docs.bigmodel.cn/en/guide/start/quick-start)。
|
||||
|
||||
## Voskエンジン使用前の準備
|
||||
|
||||
Voskローカル字幕エンジンを使用するには、まず[Vosk Models](https://alphacephei.com/vosk/models)ページから必要なモデルをダウンロードしてください。その後、ダウンロードしたモデルパッケージをローカルに解凍し、対応するモデルフォルダのパスをソフトウェア設定に追加します。
|
||||
|
||||

|
||||
|
||||
## SOSVモデルの使用
|
||||
|
||||
SOSVモデルの使用方法はVoskと同じで、ダウンロードアドレスは以下の通りです:https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model
|
||||
|
||||
## macOS でのシステムオーディオ出力の取得方法
|
||||
|
||||
> [マルチ出力デバイスの設定](https://github.com/ExistentialAudio/BlackHole/wiki/Multi-Output-Device) チュートリアルに基づいて作成
|
||||
|
||||
|
||||
字幕エンジンは macOS プラットフォームで直接システムオーディオ出力を取得できず、追加のドライバーインストールが必要です。現在の字幕エンジンでは [BlackHole](https://github.com/ExistentialAudio/BlackHole) を使用しています。まずターミナルを開き、以下のいずれかのコマンドを実行してください(最初のオプションを推奨します):
|
||||
|
||||
```bash
|
||||
brew install blackhole-2ch
|
||||
brew install blackhole-16ch
|
||||
brew install blackhole-64ch
|
||||
```
|
||||
|
||||

|
||||
|
||||
インストール完了後、`オーディオMIDI設定`(`cmd + space`で検索可能)を開きます。デバイスリストにBlackHoleが表示されているか確認してください - 表示されていない場合はコンピュータを再起動してください。
|
||||
|
||||

|
||||
|
||||
BlackHoleのインストールが確認できたら、`オーディオ MIDI 設定`ページで左下のプラス(+)ボタンをクリックし、「マルチ出力デバイスを作成」を選択します。出力に BlackHole と希望するオーディオ出力先の両方を含めてください。最後に、このマルチ出力デバイスをデフォルトのオーディオ出力デバイスに設定します。
|
||||
|
||||

|
||||
|
||||
これで字幕エンジンがシステムオーディオ出力をキャプチャし、字幕を生成できるようになります。
|
||||
|
||||
## Linux でシステムオーディオ出力を取得する
|
||||
|
||||
まずターミナルで以下を実行してください:
|
||||
|
||||
```bash
|
||||
pactl list short sources
|
||||
```
|
||||
|
||||
以下のような出力が確認できれば追加設定は不要です:
|
||||
|
||||
```bash
|
||||
220 alsa_output.pci-0000_02_02.0.3.analog-stereo.monitor PipeWire s16le 2ch 48000Hz SUSPENDED
|
||||
221 alsa_input.pci-0000_02_02.0.3.analog-stereo PipeWire s16le 2ch 48000Hz SUSPENDED
|
||||
```
|
||||
|
||||
それ以外の場合は、以下のコマンドで`pulseaudio`と`pavucontrol`をインストールしてください:
|
||||
|
||||
```bash
|
||||
# Debian/Ubuntu系の場合
|
||||
sudo apt install pulseaudio pavucontrol
|
||||
# CentOS系の場合
|
||||
sudo yum install pulseaudio pavucontrol
|
||||
```
|
||||
|
||||
## ソフトウェアの使い方
|
||||
|
||||
### 設定の変更
|
||||
|
||||
字幕の設定は3つのカテゴリーに分かれます:一般的な設定、字幕エンジンの設定、字幕スタイルの設定。注意すべき点として、一般的な設定の変更は即座に適用されます。しかし、他の2つの設定については、変更後に該当する設定モジュール右上の「適用」オプションをクリックすることで初めて変更が有効になります。「変更を取り消す」を選択すると、現在の変更は保存されず、前回の状態に戻ります。
|
||||
|
||||
### 字幕の開始と停止
|
||||
|
||||
すべての設定を完了したら、インターフェースの「字幕エンジンを開始」ボタンをクリックして字幕を開始できます。独立した字幕表示ウィンドウが必要な場合は、インターフェースの「字幕ウィンドウを開く」ボタンをクリックして独立した字幕表示ウィンドウをアクティブ化します。字幕認識を一時停止する必要がある場合は、「字幕エンジンを停止」ボタンをクリックします。
|
||||
|
||||
### 字幕表示ウィンドウの調整
|
||||
|
||||
下の図は字幕表示ウィンドウです。このウィンドウは現在の最新の字幕をリアルタイムで表示します。ウィンドウの右上にある3つのボタンの機能は、それぞれ字幕表示ウィンドウを閉じる、字幕制御ウィンドウを開く、マウス透過を有効化することです。このウィンドウの幅は調整可能です。マウスをウィンドウの左右の端に移動し、ドラッグして幅を調整します。
|
||||
|
||||

|
||||
|
||||
### 字幕記録のエクスポート
|
||||
|
||||
「エクスポート」ボタンをクリックすると、字幕記録を JSON または SRT ファイル形式で出力できます。
|
||||
|
||||
## 字幕エンジン
|
||||
|
||||
字幕エンジンとは、システムのオーディオ入力(録音)または出力(再生音)のストリーミングデータをリアルタイムで取得し、音声テキスト変換モデルを呼び出して対応する字幕を生成するサブプログラムです。生成された字幕は JSON 形式の文字列に変換され、標準出力を通じてメインプログラムに渡されます。メインプログラムは字幕データを読み取り、処理した後、ウィンドウに表示します。
|
||||
|
||||
ソフトウェアには2つのデフォルトの字幕エンジンが用意されています。他の字幕エンジンが必要な場合、カスタムエンジンオプションを有効にすることで呼び出すことができます(他のエンジンはこのソフトウェア向けに特別に開発する必要があります)。エンジンパスはコンピュータ上のカスタム字幕エンジンの場所を指し、エンジンコマンドはカスタム字幕エンジンの実行パラメータを表します。これらは該当する字幕エンジンの規則に従って設定する必要があります。
|
||||
|
||||

|
||||
|
||||
カスタム字幕エンジンを使用する場合、前の字幕エンジンの設定はすべて無効になります。カスタム字幕エンジンの設定は完全にエンジンコマンドによって行われます。
|
||||
|
||||
開発者の方で、カスタム字幕エンジンを開発したい場合は、[字幕エンジン説明文書](../engine-manual/ja.md)をご覧ください。
|
||||
330
docs/user-manual/zh.md
Normal file
@@ -0,0 +1,330 @@
|
||||
# Auto Caption 用户手册
|
||||
|
||||
对应版本:v1.1.1
|
||||
|
||||
## 软件简介
|
||||
|
||||
Auto Caption 是一个跨平台的字幕显示软件,能够实时获取系统音频输入(录音)或输出(播放声音)的流式数据,并调用音频转文字的模型生成对应音频的字幕。软件提供的默认字幕引擎(使用阿里云 Gummy 模型)支持九种语言(中、英、日、韩、德、法、俄、西、意)的识别与翻译。
|
||||
|
||||
目前软件默认字幕引擎在 Windows、 macOS 和 Linux 平台下均拥有完整功能,在 macOS 要获取系统音频输出需要额外配置。
|
||||
|
||||
测试过可正常运行的操作系统信息如下,软件不能保证在非下列版本的操作系统上正常运行。
|
||||
|
||||
| 操作系统版本 | 处理器架构 | 获取系统音频输入 | 获取系统音频输出 |
|
||||
| ------------------ | ---------- | ---------------- | ---------------- |
|
||||
| Windows 11 24H2 | x64 | ✅ | ✅ |
|
||||
| macOS Sequoia 15.5 | arm64 | ✅需要额外配置 | ✅ |
|
||||
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
|
||||
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
|
||||
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
|
||||
|
||||

|
||||
|
||||
### 软件缺点
|
||||
|
||||
要使用默认的 Gummy 字幕引擎需要获取阿里云的 API KEY。
|
||||
|
||||
在 macOS 平台获取音频输出需要额外配置。
|
||||
|
||||
软件使用 Electron 构建,因此软件体积不可避免的较大。
|
||||
|
||||
## Gummy 引擎使用前准备
|
||||
|
||||
要使用软件提供的默认字幕引擎(阿里云 Gummy),需要从阿里云百炼平台获取 API KEY,然后将 API KEY 添加到软件设置中或者配置到环境变量中(仅 Windows 平台支持读取环境变量中的 API KEY)。
|
||||
|
||||
**国际版的阿里云服务并没有提供 Gummy 模型,因此目前非中国用户无法使用默认字幕引擎。**
|
||||
|
||||
这部分阿里云提供了详细的教程,可参考:
|
||||
|
||||
- [获取 API KEY](https://help.aliyun.com/zh/model-studio/get-api-key)
|
||||
- [将 API Key 配置到环境变量](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
|
||||
|
||||
## GLM 引擎使用前准备
|
||||
|
||||
需要先获取 API KEY,参考:[Quick Start](https://docs.bigmodel.cn/en/guide/start/quick-start)。
|
||||
|
||||
## Vosk 引擎使用前准备
|
||||
|
||||
如果要使用 Vosk 本地字幕引擎,首先需要在 [Vosk Models](https://alphacephei.com/vosk/models) 页面下载你需要的模型。然后将下载的模型安装包解压到本地,并将对应的模型文件夹的路径添加到软件的设置中。
|
||||
|
||||

|
||||
|
||||
## 使用 SOSV 模型
|
||||
|
||||
使用 SOSV 模型的方式和 Vosk 一样,下载地址如下:https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model
|
||||
|
||||
## macOS 获取系统音频输出
|
||||
|
||||
> 基于 [Setup Multi-Output Device](https://github.com/ExistentialAudio/BlackHole/wiki/Multi-Output-Device) 教程编写
|
||||
|
||||
字幕引擎无法在 macOS 平台直接获取系统的音频输出,需要安装额外的驱动。目前字幕引擎采用的是 [BlackHole](https://github.com/ExistentialAudio/BlackHole)。首先打开终端,执行以下命令中的其中一个(建议选择第一个):
|
||||
|
||||
```bash
|
||||
brew install blackhole-2ch
|
||||
brew install blackhole-16ch
|
||||
brew install blackhole-64ch
|
||||
```
|
||||
|
||||

|
||||
|
||||
安装完成后打开 `音频 MIDI 设置`(`cmd + space` 打开搜索,可以搜索到)。观察设备列表中是否有 BlackHole 设备,如果没有需要重启电脑。
|
||||
|
||||

|
||||
|
||||
在确定安装好 BlackHole 设备后,在 `音频 MIDI 设置` 页面,点击左下角的加号,选择“创建多输出设备”。在输出中包含 BlackHole 和你想要的音频输出目标。最后将该多输出设备设置为默认音频输出设备。
|
||||
|
||||

|
||||
|
||||
现在字幕引擎就能捕获系统的音频输出并生成字幕了。
|
||||
|
||||
## Linux 获取系统音频输出
|
||||
|
||||
首先在控制台执行:
|
||||
|
||||
```bash
|
||||
pactl list short sources
|
||||
```
|
||||
|
||||
如果有以下类似的输出内容则无需额外配置:
|
||||
|
||||
```bash
|
||||
220 alsa_output.pci-0000_02_02.0.3.analog-stereo.monitor PipeWire s16le 2ch 48000Hz SUSPENDED
|
||||
221 alsa_input.pci-0000_02_02.0.3.analog-stereo PipeWire s16le 2ch 48000Hz SUSPENDED
|
||||
```
|
||||
|
||||
否则,执行以下命令安装 `pulseaudio` 和 `pavucontrol`:
|
||||
|
||||
```bash
|
||||
# Debian or Ubuntu, etc.
|
||||
sudo apt install pulseaudio pavucontrol
|
||||
# CentOS, etc.
|
||||
sudo yum install pulseaudio pavucontrol
|
||||
```
|
||||
|
||||
## 软件使用
|
||||
|
||||
### 修改设置
|
||||
|
||||
字幕设置可以分为三类:通用设置、字幕引擎设置、字幕样式设置。需要注意的是,修改通用设置是立即生效的。但是对于其他两类设置,修改后需要点击对应设置模块右上角的“应用”选项,更改才会真正生效。如果点击“取消更改”那么当前修改将不会被保存,而是回退到上次修改的状态。
|
||||
|
||||
### 启动和关闭字幕
|
||||
|
||||
在修改完全部配置后,点击界面的“启动字幕引擎”按钮,即可启动字幕。如果需要独立的字幕展示窗口,单击界面的“打开字幕窗口”按钮即可激活独立的字幕展示窗口。如果需要暂停字幕识别,单击界面的“关闭字幕引擎”按钮即可。
|
||||
|
||||
### 调整字幕展示窗口
|
||||
|
||||
如下图为字幕展示窗口,该窗口实时展示当前最新字幕。窗口右上角三个按钮的功能分别是:关闭字幕展示窗口、打开字幕控制窗口、启用鼠标穿透。该窗口宽度可以调整,将鼠标移动至窗口的左右边缘,拖动鼠标即可调整宽度。
|
||||
|
||||

|
||||
|
||||
### 字幕记录的导出
|
||||
|
||||
在字幕控制窗口中可以看到当前收集的所有字幕的记录,点击“导出字幕”按钮,即可将字幕记录导出为 JSON 或 SRT 文件。
|
||||
|
||||
## 字幕引擎
|
||||
|
||||
所谓的字幕引擎实际上是一个子程序,它会实时获取系统音频输入(录音)或输出(播放声音)的流式数据,并调用音频转文字的模型生成对应音频的字幕。生成的字幕通过转换为字符串的 JSON 数据,并通过标准输出传递给主程序。主程序读取字幕数据,处理后显示在窗口上。
|
||||
|
||||
软件提供了两个默认的字幕引擎,如果你需要其他的字幕引擎,可以通过打开自定义引擎选项来调用其他字幕引擎(其他引擎需要针对该软件进行开发)。其中引擎路径是自定义字幕引擎在你的电脑上的路径,引擎指令是自定义字幕引擎的运行参数,这部分需要按该字幕引擎的规则进行填写。
|
||||
|
||||

|
||||
|
||||
注意使用自定义字幕引擎时,前面的字幕引擎的设置将全部不起作用,自定义字幕引擎的配置完全通过引擎指令进行配置。
|
||||
|
||||
如果你是开发者,想开发自定义字幕引擎,请查看[字幕引擎说明文档](../engine-manual/zh.md)。
|
||||
|
||||
## 单独使用字幕引擎
|
||||
|
||||
### 运行参数说明
|
||||
|
||||
> 以下内容默认用户对使用终端运行程序有一定了解。
|
||||
|
||||
字幕引擎可以使用的完整运行参数如下:
|
||||
|
||||

|
||||
|
||||
而在单独使用时其中某些参数并不需要使用,或者不适合进行修改。
|
||||
|
||||
下面的运行参数说明仅包含必要的参数。
|
||||
|
||||
#### `-e , --caption_engine`
|
||||
|
||||
需要选择的字幕引擎模型,目前有四个可用,分别为:`gummy, glm, vosk, sosv`。
|
||||
|
||||
该项的默认值为 `gummy`。
|
||||
|
||||
该项适用于所有模型。
|
||||
|
||||
#### `-a, --audio_type`
|
||||
|
||||
需要识别的音频类型,其中 `0` 表示系统音频输出,`1` 表示麦克风音频输入。
|
||||
|
||||
该项的默认值为 `0`。
|
||||
|
||||
该项适用于所有模型。
|
||||
|
||||
#### `-d, --display_caption`
|
||||
|
||||
是否在控制台显示字幕,`0` 表示不显示,`1` 表示显示。
|
||||
|
||||
该项默认值为 `0`,只使用字幕引擎的话建议选 `1`。
|
||||
|
||||
该项适用于所有模型。
|
||||
|
||||
#### `-t, --target_language`
|
||||
|
||||
> 其中 Vosk 和 SOSV 模型分句效果较差,会导致翻译内容难以理解,不太建议这两个模型使用翻译。
|
||||
|
||||
需要翻译成的目标语言,所有模型都支持的翻译语言如下:
|
||||
|
||||
- `none` 不进行翻译
|
||||
- `zh` 简体中文
|
||||
- `en` 英语
|
||||
- `ja` 日语
|
||||
- `ko` 韩语
|
||||
|
||||
除此之外 `vosk` 和 `sosv` 模型还支持如下翻译:
|
||||
|
||||
- `de` 德语
|
||||
- `fr` 法语
|
||||
- `ru` 俄语
|
||||
- `es` 西班牙语
|
||||
- `it` 意大利语
|
||||
|
||||
该项的默认值为 `none`。
|
||||
|
||||
该项适用于所有模型。
|
||||
|
||||
#### `-s, --source_language`
|
||||
|
||||
需要识别的语言的源语言,默认值为 `auto`,表示不指定源语言。
|
||||
|
||||
但是指定源语言能在一定程度上提高识别准确率,可以使用上面的语言代码指定源语言。
|
||||
|
||||
该项适用于 Gummy、GLM 和 SOSV 模型。
|
||||
|
||||
其中 Gummy 模型可以使用上述全部语言,再加上粤语(`yue`)。
|
||||
|
||||
GLM 模型支持指定的语言有:英语、中文、日语、韩语。
|
||||
|
||||
SOSV 模型支持指定的语言有:英语、中文、日语、韩语、粤语。
|
||||
|
||||
#### `-k, --api_key`
|
||||
|
||||
指定 `Gummy` 模型需要使用的阿里云 API KEY。
|
||||
|
||||
该项默认值为空。
|
||||
|
||||
该项仅适用于 Gummy 模型。
|
||||
|
||||
#### `-gkey, --glm_api_key`
|
||||
|
||||
指定 `glm` 模型需要使用的 API KEY,默认为空。
|
||||
|
||||
#### `-gmodel, --glm_model`
|
||||
|
||||
指定 `glm` 模型需要使用的模型名称,默认为 `glm-asr-2512`。
|
||||
|
||||
#### `-gurl, --glm_url`
|
||||
|
||||
指定 `glm` 模型需要使用的 API URL,默认值为:`https://open.bigmodel.cn/api/paas/v4/audio/transcriptions`。
|
||||
|
||||
#### `-tm, --translation_model`
|
||||
|
||||
指定 Vosk 和 SOSV 模型的翻译方式,默认为 `ollama`。
|
||||
|
||||
该项支持的值有:
|
||||
|
||||
- `ollama` 使用本地 Ollama 模型进行翻译,需要用户安装 Ollama 软件和对应的模型
|
||||
- `google` 使用 Google 翻译 API 进行翻译,无需额外配置,但是需要有能访问 Google 的网络
|
||||
|
||||
该项仅适用于 Vosk 和 SOSV 模型。
|
||||
|
||||
#### `-omn, --ollama_name`
|
||||
|
||||
指定要使用的翻译模型名称,可以是 Ollama 本地模型,也可以是 OpenAI API 兼容的云端模型。若未填写 Base URL 字段,则默认调用本地 Ollama 服务,否则会通过 Python OpenAI 库调用该地址指向的 API 服务。
|
||||
|
||||
如果使用 Ollama 模型,建议使用参数量小于 1B 的模型,比如: `qwen2.5:0.5b`, `qwen3:0.6b`。需要在 Ollama 中下载了对应的模型才能正常使用。
|
||||
|
||||
默认值为空,适用于除了 Gummy 外的其他模型。
|
||||
|
||||
#### `-ourl, --ollama_url`
|
||||
|
||||
调用 OpenAI API 的基础请求地址,如果不填写则调用本地默认端口的 Ollama 模型。
|
||||
|
||||
默认值为空,适用于除了 Gummy 外的其他模型。
|
||||
|
||||
#### `-okey, --ollama_api_key`
|
||||
|
||||
指定调用 OpenAI 兼容模型的 API KEY。
|
||||
|
||||
默认值为空,适用于除了 Gummy 外的其他模型。
|
||||
|
||||
#### `-vosk, --vosk_model`
|
||||
|
||||
指定需要调用的 Vosk 模型的本地文件夹的路径。该项默认值为空。
|
||||
|
||||
该项仅适用于 Vosk 模型。
|
||||
|
||||
#### `-sosv, --sosv_model`
|
||||
|
||||
指定需要调用的 SOSV 模型的本地文件夹的路径。该项默认值为空。
|
||||
|
||||
该项仅适用于 SOSV 模型。
|
||||
|
||||
### 使用源代码运行字幕引擎
|
||||
|
||||
> 以下内容默认使用该方式的用户对 Python 环境配置和使用有所了解。
|
||||
|
||||
首先下载项目源代码到本地,其中字幕引擎源代码在项目的 `engine` 目录下。然后配置 Python 环境,其中项目依赖的 Python 包在 `engine` 目录下 `requirements.txt` 文件中。
|
||||
|
||||
配置好后进入 `engine` 目录,执行命令运行字幕引擎。
|
||||
|
||||
比如要使用 Gummy 模型,指定音频类型为系统音频输出,源语言为英语,翻译语言为中文,执行的命令如下:
|
||||
|
||||
> 注意:为了更直观,下面的命令写在了多行,如果执行失败,尝试去掉反斜杠,并改换单行命令执行。
|
||||
|
||||
```bash
|
||||
python main.py \
|
||||
-e gummy \
|
||||
-k sk-******************************** \
|
||||
-a 0 \
|
||||
-d 1 \
|
||||
-s en \
|
||||
-t zh
|
||||
```
|
||||
|
||||
To use the Vosk model, capture the system audio output, translate into English, and translate with the Ollama `qwen3:0.6b` model (the `-omn` flag selects the translation model described above):

```bash
python main.py \
    -e vosk \
    -vosk D:\Projects\auto-caption\engine\models\vosk-model-small-cn-0.22 \
    -omn qwen3:0.6b \
    -a 0 \
    -d 1 \
    -t en
```

To use the SOSV model, capture the microphone input, auto-detect the source language, and skip translation:

```bash
python main.py \
    -e sosv \
    -sosv D:\Projects\auto-caption\engine\models\sosv-int8 \
    -a 1 \
    -d 1 \
    -s auto \
    -t none
```

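The GLM model follows the same pattern. A minimal sketch, assuming the system audio output is captured and translated into Chinese (the API key is a placeholder; `-gmodel` and `-gurl` fall back to their defaults when omitted):

```bash
python main.py \
    -e glm \
    -gkey <your-glm-api-key> \
    -a 0 \
    -d 1 \
    -t zh
```
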
Running the Gummy model looks like this:

![gummy-engine](https://raw.githubusercontent.com/HiMeditator/auto-caption/main/assets/media/engine_running.gif)

### Running the Caption Engine Executable

First download the executable for your platform from the [GitHub Release](https://github.com/HiMeditator/auto-caption/releases/tag/engine) page (currently only Windows and Linux caption-engine executables are provided).

Then open a terminal in the directory containing the executable and run the caption engine from there.

Simply replace `python main.py` in the commands above with the executable name (for example `engine-win.exe`).

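For example, on Windows the Gummy command from the previous section becomes the following (the API key is a placeholder):

```bash
.\engine-win.exe -e gummy -k <your-api-key> -a 0 -d 1 -s en -t zh
```
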
@@ -1,22 +1,30 @@
 appId: com.himeditator.autocaption
-productName: auto-caption
+productName: Auto Caption
 directories:
   buildResources: build
 files:
   - '!**/.vscode/*'
   - '!src/*'
   - '!electron.vite.config.{js,ts,mjs,cjs}'
-  - '!{.eslintcache,eslint.config.mjs,.prettierignore,.prettierrc.yaml,dev-app-update.yml,CHANGELOG.md,README.md}'
+  - '!{.eslintcache,eslint.config.mjs,.prettierignore,.prettierrc.yaml,dev-app-update.yml,CHANGELOG.md}'
+  - '!{LICENSE,README.md,README_en.md,README_ja.md}'
   - '!{.env,.env.*,.npmrc,pnpm-lock.yaml}'
   - '!{tsconfig.json,tsconfig.node.json,tsconfig.web.json}'
+  - '!engine/*'
+  - '!docs/*'
+  - '!assets/*'
+  - '!.repomap/*'
+  - '!.virtualme/*'
 extraResources:
-  from: ./python-subprocess/dist/main-gummy.exe
-  to: ./python-subprocess/dist/main-gummy.exe
-asarUnpack:
-  - resources/**
+  # For Windows
+  - from: ./engine/dist/main.exe
+    to: ./engine/main.exe
+  # For macOS and Linux
+  - from: ./engine/dist/main
+    to: ./engine/main
 win:
   executableName: auto-caption
-icon: resources/icon.png
+icon: build/icon.png
 nsis:
   artifactName: ${name}-${version}-setup.${ext}
   shortcutName: ${productName}

engine/audio2text/__init__.py (new file)
@@ -0,0 +1,4 @@
from .gummy import GummyRecognizer
from .vosk import VoskRecognizer
from .sosv import SosvRecognizer
from .glm import GlmRecognizer

engine/audio2text/glm.py (new file)
@@ -0,0 +1,163 @@
import threading
import io
import wave
import audioop  # standard library up to Python 3.12; removed in Python 3.13
import requests
from datetime import datetime

from utils import shared_data
from utils import stdout_cmd, stdout_obj, google_translate, ollama_translate


class GlmRecognizer:
    """
    Processes audio data with the GLM-ASR engine and writes JSON strings
    readable by the Auto Caption app to standard output.

    Init parameters:
        url: GLM-ASR API URL
        model: GLM-ASR model name
        api_key: GLM-ASR API key
        source: source language
        target: target language
        trans_model: translation model name
        ollama_name: Ollama model name
        ollama_url: OpenAI-compatible base URL (optional)
        ollama_api_key: API key for the OpenAI-compatible endpoint (optional)
    """
    def __init__(self, url: str, model: str, api_key: str, source: str, target: str | None, trans_model: str, ollama_name: str, ollama_url: str = '', ollama_api_key: str = ''):
        self.url = url
        self.model = model
        self.api_key = api_key
        self.source = source
        self.target = target
        if trans_model == 'google':
            self.trans_func = google_translate
        else:
            self.trans_func = ollama_translate
        self.ollama_name = ollama_name
        self.ollama_url = ollama_url
        self.ollama_api_key = ollama_api_key

        self.audio_buffer = []
        self.is_speech = False
        self.silence_frames = 0
        self.speech_start_time = None
        self.time_str = ''
        self.cur_id = 0

        # VAD settings (assuming 16 kHz, 16-bit audio and a chunk size around 1024)
        # 16-bit = 2 bytes per sample.
        # The RMS threshold needs tuning; 500 is a conservative guess for silence.
        self.threshold = 500
        self.silence_limit = 15  # frames (approx. 0.5-1 s depending on chunk size)
        self.min_speech_frames = 10  # frames

    def start(self):
        """Start the GLM engine"""
        stdout_cmd('info', 'GLM-ASR recognizer started.')

    def stop(self):
        """Stop the GLM engine"""
        stdout_cmd('info', 'GLM-ASR recognizer stopped.')

    def process_audio(self, chunk):
        # chunk is bytes (int16)
        rms = audioop.rms(chunk, 2)

        if rms > self.threshold:
            if not self.is_speech:
                self.is_speech = True
                self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
                self.audio_buffer = []
            self.audio_buffer.append(chunk)
            self.silence_frames = 0
        else:
            if self.is_speech:
                self.audio_buffer.append(chunk)
                self.silence_frames += 1
                if self.silence_frames > self.silence_limit:
                    # Speech ended
                    if len(self.audio_buffer) > self.min_speech_frames:
                        self.recognize(self.audio_buffer, self.time_str)
                    self.is_speech = False
                    self.audio_buffer = []
                    self.silence_frames = 0

    def recognize(self, audio_frames, time_s):
        audio_bytes = b''.join(audio_frames)

        wav_io = io.BytesIO()
        with wave.open(wav_io, 'wb') as wav_file:
            wav_file.setnchannels(1)
            wav_file.setsampwidth(2)
            wav_file.setframerate(16000)
            wav_file.writeframes(audio_bytes)
        wav_io.seek(0)

        threading.Thread(
            target=self._do_request,
            args=(wav_io.read(), time_s, self.cur_id)
        ).start()
        self.cur_id += 1

    def _do_request(self, audio_content, time_s, index):
        try:
            files = {
                'file': ('audio.wav', audio_content, 'audio/wav')
            }
            data = {
                'model': self.model,
                'stream': 'false'
            }
            headers = {
                'Authorization': f'Bearer {self.api_key}'
            }

            response = requests.post(self.url, headers=headers, data=data, files=files, timeout=15)

            if response.status_code == 200:
                res_json = response.json()
                text = res_json.get('text', '')
                if text:
                    self.output_caption(text, time_s, index)
            else:
                try:
                    err_msg = response.json()
                    stdout_cmd('error', f"GLM API Error: {err_msg}")
                except Exception:
                    stdout_cmd('error', f"GLM API Error: {response.text}")

        except Exception as e:
            stdout_cmd('error', f"GLM Request Failed: {str(e)}")

    def output_caption(self, text, time_s, index):
        caption = {
            'command': 'caption',
            'index': index,
            'time_s': time_s,
            'time_t': datetime.now().strftime('%H:%M:%S.%f')[:-3],
            'text': text,
            'translation': ''
        }

        if self.target:
            if self.trans_func == ollama_translate:
                th = threading.Thread(
                    target=self.trans_func,
                    args=(self.ollama_name, self.target, caption['text'], time_s, self.ollama_url, self.ollama_api_key),
                    daemon=True
                )
            else:
                th = threading.Thread(
                    target=self.trans_func,
                    args=(self.ollama_name, self.target, caption['text'], time_s),
                    daemon=True
                )
            th.start()

        stdout_obj(caption)

    def translate(self):
        global shared_data
        while shared_data.status == 'running':
            chunk = shared_data.chunk_queue.get()
            self.process_audio(chunk)

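A quick sanity check on the VAD settings above: with the default `--chunk_rate` of 10, each chunk holds 100 ms of audio, so `silence_limit = 15` closes a segment after roughly 1.5 s of silence and `min_speech_frames = 10` drops segments shorter than about 1 s. The energy gate itself reduces to the minimal sketch below (the threshold of 500 is the code's own rough guess and needs per-setup tuning):

```python
import audioop  # standard library up to Python 3.12; removed in Python 3.13

def is_speech(chunk: bytes, threshold: int = 500) -> bool:
    # RMS energy of a 16-bit PCM chunk; width=2 bytes per sample
    return audioop.rms(chunk, 2) > threshold
```
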
engine/audio2text/gummy.py (new file)
@@ -0,0 +1,117 @@
from dashscope.audio.asr import (
    TranslationRecognizerCallback,
    TranscriptionResult,
    TranslationResult,
    TranslationRecognizerRealtime
)
import dashscope
from dashscope.common.error import InvalidParameter
from datetime import datetime
from utils import stdout_cmd, stdout_obj, stdout_err
from utils import shared_data


class Callback(TranslationRecognizerCallback):
    """
    Streaming callback object for the speech model
    """
    def __init__(self):
        super().__init__()
        self.index = 0
        self.usage = 0
        self.cur_id = -1
        self.time_str = ''

    def on_open(self) -> None:
        self.usage = 0
        self.cur_id = -1
        self.time_str = ''
        stdout_cmd('info', 'Gummy translator started.')

    def on_close(self) -> None:
        stdout_cmd('info', 'Gummy translator closed.')
        stdout_cmd('usage', str(self.usage))

    def on_event(
        self,
        request_id,
        transcription_result: TranscriptionResult,
        translation_result: TranslationResult,
        usage
    ) -> None:
        caption = {}

        if transcription_result is not None:
            if self.cur_id != transcription_result.sentence_id:
                self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
                self.cur_id = transcription_result.sentence_id
                self.index += 1
            caption['command'] = 'caption'
            caption['index'] = self.index
            caption['time_s'] = self.time_str
            caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
            caption['text'] = transcription_result.text
            caption['translation'] = ""

        if translation_result is not None:
            lang = translation_result.get_language_list()[0]
            caption['translation'] = translation_result.get_translation(lang).text

        if usage:
            self.usage += usage['duration']

        if 'text' in caption:
            stdout_obj(caption)


class GummyRecognizer:
    """
    Streams audio data through the Gummy engine and writes JSON strings
    readable by the Auto Caption app to standard output.

    Init parameters:
        rate: audio sample rate
        source: source language code (zh, en, ja, ...)
        target: target language code (zh, en, ja, ...)
        api_key: Alibaba Cloud Model Studio (Bailian) API key
    """
    def __init__(self, rate: int, source: str, target: str | None, api_key: str | None):
        if api_key:
            dashscope.api_key = api_key
        self.translator = TranslationRecognizerRealtime(
            model = "gummy-realtime-v1",
            format = "pcm",
            sample_rate = rate,
            transcription_enabled = True,
            translation_enabled = (target is not None),
            source_language = source,
            translation_target_languages = [target],
            callback = Callback()
        )

    def start(self):
        """Start the Gummy engine"""
        self.translator.start()

    def translate(self):
        """Keep reading audio frames from the shared data, run speech recognition, and write results to standard output"""
        global shared_data
        restart_count = 0
        while shared_data.status == 'running':
            chunk = shared_data.chunk_queue.get()
            try:
                self.translator.send_audio_frame(chunk)
            except InvalidParameter as e:
                restart_count += 1
                if restart_count > 5:
                    stdout_err(str(e))
                    shared_data.status = "kill"
                    stdout_cmd('kill')
                    break
                else:
                    stdout_cmd('info', f'Gummy engine stopped, restart attempt: {restart_count}...')

    def stop(self):
        """Stop the Gummy engine"""
        try:
            self.translator.stop()
        except Exception:
            return

engine/audio2text/sosv.py (new file)
@@ -0,0 +1,178 @@
"""
Sherpa-ONNX SenseVoice Model

This code file references the following:

https://github.com/k2-fsa/sherpa-onnx/blob/master/python-api-examples/simulate-streaming-sense-voice-microphone.py
"""

import time
from datetime import datetime
import sherpa_onnx
import threading
import numpy as np

from utils import shared_data
from utils import stdout_cmd, stdout_obj
from utils import google_translate, ollama_translate


class SosvRecognizer:
    """
    Processes streaming audio with the non-streaming SenseVoice model and writes
    JSON strings readable by the Auto Caption app to standard output.

    Init parameters:
        model_path: path to the Sherpa-ONNX SenseVoice model folder (also holds the Silero VAD model)
        source: source language to recognize (auto, zh, en, ja, ko, yue)
        target: translation target language
        trans_model: translation model name
        ollama_name: Ollama model name
        ollama_url: OpenAI-compatible base URL (optional)
        ollama_api_key: API key for the OpenAI-compatible endpoint (optional)
    """
    def __init__(self, model_path: str, source: str, target: str | None, trans_model: str, ollama_name: str, ollama_url: str = '', ollama_api_key: str = ''):
        if model_path.startswith('"'):
            model_path = model_path[1:]
        if model_path.endswith('"'):
            model_path = model_path[:-1]
        self.model_path = model_path
        self.ext = ""
        if self.model_path[-4:] == "int8":
            self.ext = ".int8"
        self.source = source
        self.target = target
        if trans_model == 'google':
            self.trans_func = google_translate
        else:
            self.trans_func = ollama_translate
        self.ollama_name = ollama_name
        self.ollama_url = ollama_url
        self.ollama_api_key = ollama_api_key
        self.time_str = ''
        self.cur_id = 0
        self.prev_content = ''

    def start(self):
        """Start the SenseVoice model"""
        self.recognizer = sherpa_onnx.OfflineRecognizer.from_sense_voice(
            model=f"{self.model_path}/sensevoice/model{self.ext}.onnx",
            tokens=f"{self.model_path}/sensevoice/tokens.txt",
            language=self.source,
            num_threads = 2,
        )

        vad_config = sherpa_onnx.VadModelConfig()
        vad_config.silero_vad.model = f"{self.model_path}/silero_vad.onnx"
        vad_config.silero_vad.threshold = 0.5
        vad_config.silero_vad.min_silence_duration = 0.1
        vad_config.silero_vad.min_speech_duration = 0.25
        vad_config.silero_vad.max_speech_duration = 5
        vad_config.sample_rate = 16000
        self.window_size = vad_config.silero_vad.window_size
        self.vad = sherpa_onnx.VoiceActivityDetector(vad_config, buffer_size_in_seconds=100)

        if self.source == 'en':
            model_config = sherpa_onnx.OnlinePunctuationModelConfig(
                cnn_bilstm=f"{self.model_path}/punct-en/model{self.ext}.onnx",
                bpe_vocab=f"{self.model_path}/punct-en/bpe.vocab"
            )
            punct_config = sherpa_onnx.OnlinePunctuationConfig(
                model_config=model_config,
            )
            self.punct = sherpa_onnx.OnlinePunctuation(punct_config)
        else:
            punct_config = sherpa_onnx.OfflinePunctuationConfig(
                model=sherpa_onnx.OfflinePunctuationModelConfig(
                    ct_transformer=f"{self.model_path}/punct/model{self.ext}.onnx"
                ),
            )
            self.punct = sherpa_onnx.OfflinePunctuation(punct_config)

        self.buffer = []
        self.offset = 0
        self.started = False
        self.started_time = .0
        self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
        stdout_cmd('info', 'Sherpa-ONNX SenseVoice recognizer started.')

    def send_audio_frame(self, data: bytes):
        """
        Send an audio frame to the SOSV engine; the engine recognizes it
        automatically and writes the result to standard output.

        Args:
            data: audio frame data; the sample rate must be 16000 Hz
        """
        caption = {}
        caption['command'] = 'caption'
        caption['translation'] = ''

        data_np = np.frombuffer(data, dtype=np.int16).astype(np.float32)
        self.buffer = np.concatenate([self.buffer, data_np])
        while self.offset + self.window_size < len(self.buffer):
            self.vad.accept_waveform(self.buffer[self.offset: self.offset + self.window_size])
            if not self.started and self.vad.is_speech_detected():
                self.started = True
                self.started_time = time.time()
            self.offset += self.window_size

        if not self.started:
            if len(self.buffer) > 10 * self.window_size:
                self.offset -= len(self.buffer) - 10 * self.window_size
                self.buffer = self.buffer[-10 * self.window_size:]

        # Speculative decode of the partial utterance every 0.2 s
        if self.started and time.time() - self.started_time > 0.2:
            stream = self.recognizer.create_stream()
            stream.accept_waveform(16000, self.buffer)
            self.recognizer.decode_stream(stream)
            text = stream.result.text.strip()
            if text and self.prev_content != text:
                caption['index'] = self.cur_id
                caption['text'] = text
                caption['time_s'] = self.time_str
                caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
                self.prev_content = text
                stdout_obj(caption)
            self.started_time = time.time()

        # Final decode of segments the VAD has closed
        while not self.vad.empty():
            stream = self.recognizer.create_stream()
            stream.accept_waveform(16000, self.vad.front.samples)
            self.vad.pop()
            self.recognizer.decode_stream(stream)
            text = stream.result.text.strip()

            if self.source == 'en':
                text_with_punct = self.punct.add_punctuation_with_case(text)
            else:
                text_with_punct = self.punct.add_punctuation(text)

            caption['index'] = self.cur_id
            caption['text'] = text_with_punct
            caption['time_s'] = self.time_str
            caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
            if text:
                stdout_obj(caption)
                if self.target:
                    th = threading.Thread(
                        target=self.trans_func,
                        args=(self.ollama_name, self.target, caption['text'], self.time_str, self.ollama_url, self.ollama_api_key),
                        daemon=True
                    )
                    th.start()
            self.cur_id += 1
            self.prev_content = ''
            self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
            self.buffer = []
            self.offset = 0
            self.started = False
            self.started_time = .0

    def translate(self):
        """Keep reading audio frames from the shared data, run speech recognition, and write results to standard output"""
        global shared_data
        while shared_data.status == 'running':
            chunk = shared_data.chunk_queue.get()
            self.send_audio_frame(chunk)

    def stop(self):
        """Stop the SenseVoice model"""
        stdout_cmd('info', 'Sherpa-ONNX SenseVoice recognizer closed.')

engine/audio2text/vosk.py (new file)
@@ -0,0 +1,98 @@
import json
import threading
from datetime import datetime

from vosk import Model, KaldiRecognizer, SetLogLevel
from utils import shared_data
from utils import stdout_cmd, stdout_obj, google_translate, ollama_translate


class VoskRecognizer:
    """
    Streams audio data through the Vosk engine and writes JSON strings
    readable by the Auto Caption app to standard output.

    Init parameters:
        model_path: path to the Vosk recognition model
        target: translation target language
        trans_model: translation model name
        ollama_name: Ollama model name
        ollama_url: OpenAI-compatible base URL (optional)
        ollama_api_key: API key for the OpenAI-compatible endpoint (optional)
    """
    def __init__(self, model_path: str, target: str | None, trans_model: str, ollama_name: str, ollama_url: str = '', ollama_api_key: str = ''):
        SetLogLevel(-1)
        if model_path.startswith('"'):
            model_path = model_path[1:]
        if model_path.endswith('"'):
            model_path = model_path[:-1]
        self.model_path = model_path
        self.target = target
        if trans_model == 'google':
            self.trans_func = google_translate
        else:
            self.trans_func = ollama_translate
        self.ollama_name = ollama_name
        self.ollama_url = ollama_url
        self.ollama_api_key = ollama_api_key
        self.time_str = ''
        self.cur_id = 0
        self.prev_content = ''

        self.model = Model(self.model_path)
        self.recognizer = KaldiRecognizer(self.model, 16000)

    def start(self):
        """Start the Vosk engine"""
        stdout_cmd('info', 'Vosk recognizer started.')

    def send_audio_frame(self, data: bytes):
        """
        Send an audio frame to the Vosk engine; the engine recognizes it
        automatically and writes the result to standard output.

        Args:
            data: audio frame data; the sample rate must be 16000 Hz
        """
        caption = {}
        caption['command'] = 'caption'
        caption['translation'] = ''

        if self.recognizer.AcceptWaveform(data):
            content = json.loads(self.recognizer.Result()).get('text', '')
            caption['index'] = self.cur_id
            caption['text'] = content
            caption['time_s'] = self.time_str
            caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
            self.prev_content = ''
            if content == '': return
            self.cur_id += 1

            if self.target:
                th = threading.Thread(
                    target=self.trans_func,
                    args=(self.ollama_name, self.target, caption['text'], self.time_str, self.ollama_url, self.ollama_api_key),
                    daemon=True
                )
                th.start()
        else:
            content = json.loads(self.recognizer.PartialResult()).get('partial', '')
            if content == '' or content == self.prev_content:
                return
            if self.prev_content == '':
                self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
            caption['index'] = self.cur_id
            caption['text'] = content
            caption['time_s'] = self.time_str
            caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
            self.prev_content = content

        stdout_obj(caption)

    def translate(self):
        """Keep reading audio frames from the shared data, run speech recognition, and write results to standard output"""
        global shared_data
        while shared_data.status == 'running':
            chunk = shared_data.chunk_queue.get()
            self.send_audio_frame(chunk)

    def stop(self):
        """Stop the Vosk engine"""
        stdout_cmd('info', 'Vosk recognizer closed.')

engine/main.py (new file)
@@ -0,0 +1,279 @@
import wave
import argparse
import threading
import datetime
from utils import stdout, stdout_cmd, change_caption_display
from utils import shared_data, start_server
from utils import merge_chunk_channels, resample_chunk_mono
from audio2text import GummyRecognizer
from audio2text import VoskRecognizer
from audio2text import SosvRecognizer
from audio2text import GlmRecognizer
from sysaudio import AudioStream


def audio_recording(stream: AudioStream, resample: bool, record = False, path = ''):
    global shared_data
    stream.open_stream()
    wf = None
    full_name = ''
    if record:
        if path != '':
            if path.startswith('"') and path.endswith('"'):
                path = path[1:-1]
            if path[-1] != '/':
                path += '/'
        cur_dt = datetime.datetime.now()
        name = cur_dt.strftime("audio-%Y-%m-%dT%H-%M-%S")
        full_name = f'{path}{name}.wav'
        wf = wave.open(full_name, 'wb')
        wf.setnchannels(stream.CHANNELS)
        wf.setsampwidth(stream.SAMP_WIDTH)
        wf.setframerate(stream.RATE)
    stdout_cmd("info", "Audio recording...")
    while shared_data.status == 'running':
        raw_chunk = stream.read_chunk()
        # Skip empty reads before writing or converting the chunk
        if raw_chunk is None: continue
        if record: wf.writeframes(raw_chunk) # type: ignore
        if resample:
            chunk = resample_chunk_mono(raw_chunk, stream.CHANNELS, stream.RATE, 16000)
        else:
            chunk = merge_chunk_channels(raw_chunk, stream.CHANNELS)
        shared_data.chunk_queue.put(chunk)
    if record:
        stdout_cmd("info", f"Audio saved to {full_name}")
        wf.close() # type: ignore
    stream.close_stream_signal()


def main_gummy(s: str, t: str, a: int, c: int, k: str, r: bool, rp: str):
    """
    Parameters:
        s: Source language
        t: Target language
        a: Audio source: 0 for output, 1 for input
        c: Chunk number in 1 second
        k: Aliyun Bailian API key
        r: Whether to record the audio
        rp: Path to save the recorded audio
    """
    stream = AudioStream(a, c)
    if t == 'none':
        engine = GummyRecognizer(stream.RATE, s, None, k)
    else:
        engine = GummyRecognizer(stream.RATE, s, t, k)

    engine.start()
    stream_thread = threading.Thread(
        target=audio_recording,
        args=(stream, False, r, rp),
        daemon=True
    )
    stream_thread.start()
    try:
        engine.translate()
    except KeyboardInterrupt:
        stdout("Keyboard interrupt detected. Exiting...")
    engine.stop()


def main_vosk(a: int, c: int, vosk: str, t: str, tm: str, omn: str, ourl: str, okey: str, r: bool, rp: str):
    """
    Parameters:
        a: Audio source: 0 for output, 1 for input
        c: Chunk number in 1 second
        vosk: Vosk model path
        t: Target language
        tm: Translation model type, ollama or google
        omn: Ollama model name
        ourl: Ollama Base URL
        okey: Ollama API Key
        r: Whether to record the audio
        rp: Path to save the recorded audio
    """
    stream = AudioStream(a, c)
    if t == 'none':
        engine = VoskRecognizer(vosk, None, tm, omn, ourl, okey)
    else:
        engine = VoskRecognizer(vosk, t, tm, omn, ourl, okey)

    engine.start()
    stream_thread = threading.Thread(
        target=audio_recording,
        args=(stream, True, r, rp),
        daemon=True
    )
    stream_thread.start()
    try:
        engine.translate()
    except KeyboardInterrupt:
        stdout("Keyboard interrupt detected. Exiting...")
    engine.stop()


def main_sosv(a: int, c: int, sosv: str, s: str, t: str, tm: str, omn: str, ourl: str, okey: str, r: bool, rp: str):
    """
    Parameters:
        a: Audio source: 0 for output, 1 for input
        c: Chunk number in 1 second
        sosv: Sherpa-ONNX SenseVoice model path
        s: Source language
        t: Target language
        tm: Translation model type, ollama or google
        omn: Ollama model name
        ourl: Ollama API URL
        okey: Ollama API Key
        r: Whether to record the audio
        rp: Path to save the recorded audio
    """
    stream = AudioStream(a, c)
    if t == 'none':
        engine = SosvRecognizer(sosv, s, None, tm, omn, ourl, okey)
    else:
        engine = SosvRecognizer(sosv, s, t, tm, omn, ourl, okey)

    engine.start()
    stream_thread = threading.Thread(
        target=audio_recording,
        args=(stream, True, r, rp),
        daemon=True
    )
    stream_thread.start()
    try:
        engine.translate()
    except KeyboardInterrupt:
        stdout("Keyboard interrupt detected. Exiting...")
    engine.stop()


def main_glm(a: int, c: int, url: str, model: str, key: str, s: str, t: str, tm: str, omn: str, ourl: str, okey: str, r: bool, rp: str):
    """
    Parameters:
        a: Audio source
        c: Chunk rate
        url: GLM API URL
        model: GLM Model Name
        key: GLM API Key
        s: Source language
        t: Target language
        tm: Translation model
        omn: Ollama model name
        ourl: Ollama API URL
        okey: Ollama API Key
        r: Record
        rp: Record path
    """
    stream = AudioStream(a, c)
    if t == 'none':
        engine = GlmRecognizer(url, model, key, s, None, tm, omn, ourl, okey)
    else:
        engine = GlmRecognizer(url, model, key, s, t, tm, omn, ourl, okey)

    engine.start()
    stream_thread = threading.Thread(
        target=audio_recording,
        args=(stream, True, r, rp),
        daemon=True
    )
    stream_thread.start()
    try:
        engine.translate()
    except KeyboardInterrupt:
        stdout("Keyboard interrupt detected. Exiting...")
    engine.stop()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Convert system audio stream to text')
    # all
    parser.add_argument('-e', '--caption_engine', default='gummy', help='Caption engine: gummy, glm, vosk or sosv')
    parser.add_argument('-a', '--audio_type', type=int, default=0, help='Audio stream source: 0 for output, 1 for input')
    parser.add_argument('-c', '--chunk_rate', type=int, default=10, help='Number of audio stream chunks collected per second')
    parser.add_argument('-p', '--port', type=int, default=0, help='The port to run the server on, 0 for no server')
    parser.add_argument('-d', '--display_caption', type=int, default=0, help='Display caption on terminal, 0 for no display, 1 for display')
    parser.add_argument('-t', '--target_language', default='none', help='Target language code, "none" for no translation')
    parser.add_argument('-r', '--record', type=int, default=0, help='Whether to record the audio, 0 for no recording, 1 for recording')
    parser.add_argument('-rp', '--record_path', default='', help='Path to save the recorded audio')
    # gummy and sosv and glm
    parser.add_argument('-s', '--source_language', default='auto', help='Source language code')
    # gummy only
    parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
    # vosk and sosv
    parser.add_argument('-tm', '--translation_model', default='ollama', help='Model for translation: ollama or google')
    parser.add_argument('-omn', '--ollama_name', default='', help='Ollama model name for translation')
    parser.add_argument('-ourl', '--ollama_url', default='', help='Ollama API URL')
    parser.add_argument('-okey', '--ollama_api_key', default='', help='Ollama API Key')
    # vosk only
    parser.add_argument('-vosk', '--vosk_model', default='', help='The path to the vosk model.')
    # sosv only
    parser.add_argument('-sosv', '--sosv_model', default=None, help='The SenseVoice model path')
    # glm only
    parser.add_argument('-gurl', '--glm_url', default='https://open.bigmodel.cn/api/paas/v4/audio/transcriptions', help='GLM API URL')
    parser.add_argument('-gmodel', '--glm_model', default='glm-asr-2512', help='GLM Model Name')
    parser.add_argument('-gkey', '--glm_api_key', default='', help='GLM API Key')

    args = parser.parse_args()

    if args.port != 0:
        threading.Thread(target=start_server, args=(args.port,), daemon=True).start()

    # display_caption is parsed as int, so compare against 1, not '1'
    if args.display_caption == 1:
        change_caption_display(True)

    if args.caption_engine == 'gummy':
        main_gummy(
            args.source_language,
            args.target_language,
            int(args.audio_type),
            int(args.chunk_rate),
            args.api_key,
            bool(int(args.record)),
            args.record_path
        )
    elif args.caption_engine == 'vosk':
        main_vosk(
            int(args.audio_type),
            int(args.chunk_rate),
            args.vosk_model,
            args.target_language,
            args.translation_model,
            args.ollama_name,
            args.ollama_url,
            args.ollama_api_key,
            bool(int(args.record)),
            args.record_path
        )
    elif args.caption_engine == 'sosv':
        main_sosv(
            int(args.audio_type),
            int(args.chunk_rate),
            args.sosv_model,
            args.source_language,
            args.target_language,
            args.translation_model,
            args.ollama_name,
            args.ollama_url,
            args.ollama_api_key,
            bool(int(args.record)),
            args.record_path
        )
    elif args.caption_engine == 'glm':
        main_glm(
            int(args.audio_type),
            int(args.chunk_rate),
            args.glm_url,
            args.glm_model,
            args.glm_api_key,
            args.source_language,
            args.target_language,
            args.translation_model,
            args.ollama_name,
            args.ollama_url,
            args.ollama_api_key,
            bool(int(args.record)),
            args.record_path
        )
    else:
        raise ValueError('Invalid caption engine specified.')

    if shared_data.status == "kill":
        stdout_cmd('kill')

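`main.py` also accepts a few flags that the option list earlier does not cover: `-c/--chunk_rate` sets how many audio chunks are captured per second, `-p/--port` starts a small TCP control server, and `-r/--record` together with `-rp/--record_path` saves the captured audio to a WAV file. A minimal sketch combining them (the paths are placeholders):

```bash
python main.py -e vosk -vosk <vosk-model-path> -d 1 \
    -p 8765 -r 1 -rp <output-directory>
```
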
@@ -1,11 +1,23 @@
 # -*- mode: python ; coding: utf-8 -*-
 
+from pathlib import Path
+import sys
+
+if sys.platform == 'win32':
+    vosk_path = str(Path('./.venv/Lib/site-packages/vosk').resolve())
+else:
+    venv_lib = Path('./.venv/lib')
+    python_dirs = list(venv_lib.glob('python*'))
+    if python_dirs:
+        vosk_path = str((python_dirs[0] / 'site-packages' / 'vosk').resolve())
+    else:
+        vosk_path = str(Path('./.venv/lib/python3.12/site-packages/vosk').resolve())
 
 a = Analysis(
-    ['main-gummy.py'],
+    ['main.py'],
     pathex=[],
     binaries=[],
-    datas=[],
+    datas=[(vosk_path, 'vosk')],
     hiddenimports=[],
     hookspath=[],
     hooksconfig={},
@@ -14,6 +26,7 @@ a = Analysis(
     noarchive=False,
+    optimize=0,
 )
 
 pyz = PYZ(a.pure)
 
 exe = EXE(
@@ -22,7 +35,7 @@ exe = EXE(
     a.binaries,
     a.datas,
     [],
-    name='main-gummy',
+    name='main',
     debug=False,
     bootloader_ignore_signals=False,
     strip=False,
@@ -35,4 +48,5 @@ exe = EXE(
     target_arch=None,
     codesign_identity=None,
     entitlements_file=None,
+    onefile=True,
 )

engine/requirements.txt (new file)
@@ -0,0 +1,12 @@
dashscope
numpy
resampy
vosk
pyinstaller
pyaudio; sys_platform == 'darwin'
pyaudiowpatch; sys_platform == 'win32'
googletrans
ollama
sherpa_onnx
requests
openai

engine/sysaudio/__init__.py (new file)
@@ -0,0 +1,10 @@
import sys

if sys.platform == "win32":
    from .win import AudioStream
elif sys.platform == "darwin":
    from .darwin import AudioStream
elif sys.platform == "linux":
    from .linux import AudioStream
else:
    raise NotImplementedError(f"Unsupported platform: {sys.platform}")

engine/sysaudio/darwin.py (new file)
@@ -0,0 +1,118 @@
"""Capture the macOS system audio input/output stream"""

import pyaudio
from textwrap import dedent


def get_blackhole_device(mic: pyaudio.PyAudio):
    """
    Find the BlackHole device
    """
    device_count = mic.get_device_count()
    for i in range(device_count):
        dev_info = mic.get_device_info_by_index(i)
        if 'blackhole' in str(dev_info["name"]).lower():
            return dev_info
    raise Exception("The device containing BlackHole was not found.")


class AudioStream:
    """
    Capture the system audio stream (capturing output audio is only supported
    through BlackHole).

    Init parameters:
        audio_type: 0 - system audio output stream (requires BlackHole), 1 - system audio input stream
        chunk_rate: number of audio chunks captured per second, default 10
    """
    def __init__(self, audio_type=0, chunk_rate=10):
        self.audio_type = audio_type
        self.mic = pyaudio.PyAudio()
        if self.audio_type == 0:
            self.device = get_blackhole_device(self.mic)
        else:
            self.device = self.mic.get_default_input_device_info()
        self.stop_signal = False
        self.stream = None
        self.INDEX = self.device["index"]
        self.FORMAT = pyaudio.paInt16
        self.SAMP_WIDTH = pyaudio.get_sample_size(self.FORMAT)
        self.CHANNELS = int(self.device["maxInputChannels"])
        self.DEFAULT_RATE = int(self.device["defaultSampleRate"])
        self.CHUNK_RATE = chunk_rate

        self.RATE = 16000
        self.CHUNK = self.RATE // self.CHUNK_RATE
        self.open_stream()
        self.close_stream()

    def get_info(self):
        dev_info = f"""
        Sampling device:
        - Type: { "audio output" if self.audio_type == 0 else "audio input" }
        - Index: {self.device['index']}
        - Name: {self.device['name']}
        - Max input channels: {self.device['maxInputChannels']}
        - Default low input latency: {self.device['defaultLowInputLatency']}s
        - Default high input latency: {self.device['defaultHighInputLatency']}s
        - Default sample rate: {self.device['defaultSampleRate']}Hz
        - Loopback device: {self.device['isLoopbackDevice']}

        Device index: {self.INDEX}
        Sample format: {self.FORMAT}
        Sample width: {self.SAMP_WIDTH}
        Channels: {self.CHANNELS}
        Sample rate: {self.RATE}
        Chunk size: {self.CHUNK}
        """
        return dedent(dev_info).strip()

    def open_stream(self):
        """
        Open and return the system audio stream
        """
        if self.stream: return self.stream
        try:
            self.stream = self.mic.open(
                format = self.FORMAT,
                channels = int(self.CHANNELS),
                rate = self.RATE,
                input = True,
                input_device_index = int(self.INDEX)
            )
        except OSError:
            # Fall back to the device's default sample rate if 16 kHz is unsupported
            self.RATE = self.DEFAULT_RATE
            self.CHUNK = self.RATE // self.CHUNK_RATE
            self.stream = self.mic.open(
                format = self.FORMAT,
                channels = int(self.CHANNELS),
                rate = self.RATE,
                input = True,
                input_device_index = int(self.INDEX)
            )
        return self.stream

    def read_chunk(self) -> bytes | None:
        """
        Read one chunk of audio data
        """
        if self.stop_signal:
            self.close_stream()
            return None
        if not self.stream: return None
        return self.stream.read(self.CHUNK, exception_on_overflow=False)

    def close_stream_signal(self):
        """
        Thread-safe request to close the system audio stream; it may not close immediately
        """
        self.stop_signal = True

    def close_stream(self):
        """
        Close the system audio stream immediately
        """
        if self.stream is not None:
            self.stream.stop_stream()
            self.stream.close()
            self.stream = None
        self.stop_signal = False

engine/sysaudio/linux.py (new file)
@@ -0,0 +1,109 @@
"""Capture the Linux system audio stream"""

import subprocess
from textwrap import dedent


def find_monitor_source():
    result = subprocess.run(
        ["pactl", "list", "short", "sources"],
        stdout=subprocess.PIPE, text=True
    )
    lines = result.stdout.splitlines()

    for line in lines:
        parts = line.split('\t')
        if len(parts) >= 2 and ".monitor" in parts[1]:
            return parts[1]

    raise RuntimeError("System output monitor device not found")


def find_input_source():
    result = subprocess.run(
        ["pactl", "list", "short", "sources"],
        stdout=subprocess.PIPE, text=True
    )
    lines = result.stdout.splitlines()

    for line in lines:
        parts = line.split('\t')
        if len(parts) >= 2 and ".monitor" not in parts[1]:
            return parts[1]

    raise RuntimeError("Microphone input device not found")


class AudioStream:
    """
    Capture the system audio stream

    Init parameters:
        audio_type: 0 - system audio output stream (via the PulseAudio monitor source), 1 - system audio input stream (default)
        chunk_rate: number of audio chunks captured per second, default 10
    """
    def __init__(self, audio_type=1, chunk_rate=10):
        self.audio_type = audio_type

        if self.audio_type == 0:
            self.source = find_monitor_source()
        else:
            self.source = find_input_source()
        self.stop_signal = False
        self.process = None
        self.FORMAT = 16
        self.SAMP_WIDTH = 2
        self.CHANNELS = 2
        self.RATE = 16000
        self.CHUNK_RATE = chunk_rate
        self.CHUNK = self.RATE // chunk_rate

    def get_info(self):
        dev_info = f"""
        Audio capture process:
        - Capture type: { "audio output" if self.audio_type == 0 else "audio input" }
        - Source device: {self.source}
        - Capture process PID: {self.process.pid if self.process else "None"}

        Sample format: {self.FORMAT}
        Sample width: {self.SAMP_WIDTH}
        Channels: {self.CHANNELS}
        Sample rate: {self.RATE}
        Chunk size: {self.CHUNK}
        """
        return dedent(dev_info).strip()

    def open_stream(self):
        """
        Start the audio capture process
        """
        self.process = subprocess.Popen(
            ["parec", "-d", self.source, "--format=s16le", "--rate=16000", "--channels=2"],
            stdout=subprocess.PIPE
        )

    def read_chunk(self):
        """
        Read audio data (CHUNK bytes from the capture process's stdout pipe)
        """
        if self.stop_signal:
            self.close_stream()
            return None
        if self.process and self.process.stdout:
            return self.process.stdout.read(self.CHUNK)
        return None

    def close_stream_signal(self):
        """
        Thread-safe request to close the system audio stream; it may not close immediately
        """
        self.stop_signal = True

    def close_stream(self):
        """
        Terminate the system audio capture process
        """
        if self.process:
            self.process.terminate()
        self.stop_signal = False

engine/sysaudio/win.py (new file)
@@ -0,0 +1,142 @@
"""Capture the Windows system audio input/output stream"""

import pyaudiowpatch as pyaudio
from textwrap import dedent


def get_default_loopback_device(mic: pyaudio.PyAudio, info = True) -> dict:
    """
    Get the loopback device for the default system audio output.

    Args:
        mic: PyAudio object
        info: whether to print device information

    Returns:
        dict: loopback device of the system audio output
    """
    try:
        WASAPI_info = mic.get_host_api_info_by_type(pyaudio.paWASAPI)
    except OSError:
        print("Looks like WASAPI is not available on the system. Exiting...")
        exit()

    default_speaker = mic.get_device_info_by_index(WASAPI_info["defaultOutputDevice"])
    if info: print("wasapi_info:\n", WASAPI_info, "\n")
    if info: print("default_speaker:\n", default_speaker, "\n")

    if not default_speaker["isLoopbackDevice"]:
        for loopback in mic.get_loopback_device_info_generator():
            if default_speaker["name"] in loopback["name"]:
                default_speaker = loopback
                if info: print("Using loopback device:\n", default_speaker, "\n")
                break
        else:
            print("Default loopback output device not found.")
            print("Run `python -m pyaudiowpatch` to check available devices.")
            print("Exiting...")
            exit()

    if info: print(f"Output Stream Device: #{default_speaker['index']} {default_speaker['name']}")
    return default_speaker


class AudioStream:
    """
    Capture the system audio stream

    Init parameters:
        audio_type: 0 - system audio output stream (default), 1 - system audio input stream
        chunk_rate: number of audio chunks captured per second, default 10
    """
    def __init__(self, audio_type=0, chunk_rate=10, chunk_size=-1):
        self.audio_type = audio_type
        self.mic = pyaudio.PyAudio()
        if self.audio_type == 0:
            self.device = get_default_loopback_device(self.mic, False)
        else:
            self.device = self.mic.get_default_input_device_info()
        self.stop_signal = False
        self.stream = None
        self.INDEX = self.device["index"]
        self.FORMAT = pyaudio.paInt16
        self.SAMP_WIDTH = pyaudio.get_sample_size(self.FORMAT)
        self.CHANNELS = int(self.device["maxInputChannels"])
        self.DEFAULT_RATE = int(self.device["defaultSampleRate"])
        self.CHUNK_RATE = chunk_rate

        self.RATE = 16000
        self.CHUNK = self.RATE // self.CHUNK_RATE
        self.open_stream()
        self.close_stream()

    def get_info(self):
        dev_info = f"""
        Sampling device:
        - Type: { "audio output" if self.audio_type == 0 else "audio input" }
        - Index: {self.device['index']}
        - Name: {self.device['name']}
        - Max input channels: {self.device['maxInputChannels']}
        - Default low input latency: {self.device['defaultLowInputLatency']}s
        - Default high input latency: {self.device['defaultHighInputLatency']}s
        - Default sample rate: {self.device['defaultSampleRate']}Hz
        - Loopback device: {self.device['isLoopbackDevice']}

        Device index: {self.INDEX}
        Sample format: {self.FORMAT}
        Sample width: {self.SAMP_WIDTH}
        Channels: {self.CHANNELS}
        Sample rate: {self.RATE}
        Chunk size: {self.CHUNK}
        """
        return dedent(dev_info).strip()

    def open_stream(self):
        """
        Open and return the system audio stream
        """
        if self.stream: return self.stream
        try:
            self.stream = self.mic.open(
                format = self.FORMAT,
                channels = self.CHANNELS,
                rate = self.RATE,
                input = True,
                input_device_index = self.INDEX
            )
        except OSError:
            # Fall back to the device's default sample rate if 16 kHz is unsupported
            self.RATE = self.DEFAULT_RATE
            self.CHUNK = self.RATE // self.CHUNK_RATE
            self.stream = self.mic.open(
                format = self.FORMAT,
                channels = self.CHANNELS,
                rate = self.RATE,
                input = True,
                input_device_index = self.INDEX
            )
        return self.stream

    def read_chunk(self) -> bytes | None:
        """
        Read one chunk of audio data
        """
        if self.stop_signal:
            self.close_stream()
            return None
        if not self.stream: return None
        return self.stream.read(self.CHUNK, exception_on_overflow=False)

    def close_stream_signal(self):
        """
        Thread-safe request to close the system audio stream; it may not close immediately
        """
        self.stop_signal = True

    def close_stream(self):
        """
        Close the system audio stream immediately
        """
        if self.stream is not None:
            self.stream.stop_stream()
            self.stream.close()
            self.stream = None
        self.stop_signal = False

engine/utils/__init__.py (new file)
@@ -0,0 +1,6 @@
from .audioprcs import merge_chunk_channels, resample_chunk_mono
from .sysout import stdout, stdout_err, stdout_cmd, stdout_obj, stderr
from .sysout import change_caption_display
from .shared import shared_data
from .server import start_server
from .translation import ollama_translate, google_translate

engine/utils/audioprcs.py (new file)
@@ -0,0 +1,64 @@
import sys
import resampy
import numpy as np
import numpy.core.multiarray  # do not remove (needed by PyInstaller builds)


def merge_chunk_channels(chunk: bytes, channels: int) -> bytes:
    """
    Convert a multi-channel audio chunk into a mono audio chunk

    Args:
        chunk: multi-channel audio chunk
        channels: number of channels

    Returns:
        mono audio chunk
    """
    if channels == 1: return chunk
    # (length * channels,)
    chunk_np = np.frombuffer(chunk, dtype=np.int16)
    # (length, channels)
    chunk_np = chunk_np.reshape(-1, channels)
    # (length,)
    chunk_mono_f = np.mean(chunk_np.astype(np.float32), axis=1)
    chunk_mono = np.round(chunk_mono_f).astype(np.int16)
    return chunk_mono.tobytes()


def resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int) -> bytes:
    """
    Convert a multi-channel audio chunk into a mono chunk and resample it

    Args:
        chunk: multi-channel audio chunk
        channels: number of channels
        orig_sr: original sample rate
        target_sr: target sample rate

    Returns:
        mono audio chunk
    """
    if channels == 1:
        chunk_mono = np.frombuffer(chunk, dtype=np.int16)
        chunk_mono = chunk_mono.astype(np.float32)
    else:
        # (length * channels,)
        chunk_np = np.frombuffer(chunk, dtype=np.int16)
        # (length, channels)
        chunk_np = chunk_np.reshape(-1, channels)
        # (length,)
        chunk_mono = np.mean(chunk_np.astype(np.float32), axis=1)

    if orig_sr == target_sr:
        return chunk_mono.astype(np.int16).tobytes()

    chunk_mono_r = resampy.resample(chunk_mono, orig_sr, target_sr)
    chunk_mono_r = np.round(chunk_mono_r).astype(np.int16)
    # Pad or trim so the output length matches the expected resampling ratio
    real_len = round(chunk_mono.shape[0] * target_sr / orig_sr)
    if chunk_mono_r.shape[0] != real_len:
        # Log to stderr: stdout is reserved for the JSON caption protocol
        print(chunk_mono_r.shape[0], real_len, file=sys.stderr)
    if chunk_mono_r.shape[0] > real_len:
        chunk_mono_r = chunk_mono_r[:real_len]
    else:
        while chunk_mono_r.shape[0] < real_len:
            chunk_mono_r = np.append(chunk_mono_r, chunk_mono_r[-1])
    return chunk_mono_r.tobytes()

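A small worked example for the two helpers above, assuming one 100 ms chunk of 48 kHz stereo int16 audio: 4800 frames times 2 channels times 2 bytes is 19200 bytes in, and resampling to 16 kHz yields 1600 mono samples (3200 bytes) out:

```python
import numpy as np
from utils import merge_chunk_channels, resample_chunk_mono

chunk = np.zeros(4800 * 2, dtype=np.int16).tobytes()  # 100 ms of 48 kHz stereo silence
mono = merge_chunk_channels(chunk, channels=2)
assert len(mono) == 4800 * 2    # 4800 mono samples, 2 bytes each
mono_16k = resample_chunk_mono(chunk, channels=2, orig_sr=48000, target_sr=16000)
assert len(mono_16k) == 1600 * 2    # 1600 samples at 16 kHz, 2 bytes each
```
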
engine/utils/server.py (new file)
@@ -0,0 +1,41 @@
import socket
import threading
import json
from utils import shared_data, stdout_cmd, stderr


def handle_client(client_socket):
    global shared_data
    while shared_data.status == 'running':
        try:
            data = client_socket.recv(4096).decode('utf-8')
            if not data:
                break
            data = json.loads(data)

            if data['command'] == 'stop':
                shared_data.status = 'stop'
                break
        except Exception as e:
            stderr(f'Communication error: {e}')
            break

    shared_data.status = 'stop'
    client_socket.close()


def start_server(port: int):
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind(('localhost', port))
        server.listen(1)
    except Exception as e:
        stderr(str(e))
        stdout_cmd('kill')
        return
    stdout_cmd('connect')

    client, addr = server.accept()
    client_handler = threading.Thread(target=handle_client, args=(client,))
    client_handler.daemon = True
    client_handler.start()

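The only message the control server currently understands is `{"command": "stop"}`. A minimal client sketch, assuming the engine was started with `-p 8765`:

```python
import socket
import json

# Ask a running engine to stop; the server sets shared_data.status to 'stop'
with socket.create_connection(('localhost', 8765)) as sock:
    sock.sendall(json.dumps({'command': 'stop'}).encode('utf-8'))
```
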
engine/utils/shared.py (new file)
@@ -0,0 +1,8 @@
import queue

class SharedData:
    def __init__(self):
        self.status = "running"
        self.chunk_queue = queue.Queue()

shared_data = SharedData()

engine/utils/sysout.py (new file)
@@ -0,0 +1,61 @@
import sys
import json
import sherpa_onnx

display_caption = False
caption_index = -1
display = sherpa_onnx.Display()

def stdout(text: str):
    stdout_cmd("print", text)

def stdout_err(text: str):
    stdout_cmd("error", text)

def stdout_cmd(command: str, content = ""):
    msg = { "command": command, "content": content }
    sys.stdout.write(json.dumps(msg) + "\n")
    sys.stdout.flush()

def change_caption_display(val: bool):
    global display_caption
    display_caption = val

def caption_display(obj):
    global display_caption
    global caption_index
    global display

    if caption_index >= 0 and caption_index != int(obj['index']):
        display.finalize_current_sentence()
    caption_index = int(obj['index'])
    full_text = f"{obj['text']}\n{obj['translation']}"
    if obj['translation']:
        full_text += "\n"
    display.update_text(full_text)
    display.display()

def translation_display(obj):
    global display

    full_text = f"{obj['text']}\n{obj['translation']}"
    if obj['translation']:
        full_text += "\n"
    display.update_text(full_text)
    display.display()
    display.finalize_current_sentence()

def stdout_obj(obj):
    global display_caption
    if obj['command'] == 'caption' and display_caption:
        caption_display(obj)
        return
    if obj['command'] == 'translation' and display_caption:
        translation_display(obj)
        return
    sys.stdout.write(json.dumps(obj) + "\n")
    sys.stdout.flush()

def stderr(text: str):
    sys.stderr.write(text + "\n")
    sys.stderr.flush()

engine/utils/translation.py (new file)
@@ -0,0 +1,83 @@
from ollama import chat, Client
from ollama import ChatResponse
try:
    from openai import OpenAI
except ImportError:
    OpenAI = None
import asyncio
from googletrans import Translator
from .sysout import stdout_cmd, stdout_obj

lang_map = {
    'en': 'English',
    'es': 'Spanish',
    'fr': 'French',
    'de': 'German',
    'it': 'Italian',
    'ru': 'Russian',
    'ja': 'Japanese',
    'ko': 'Korean',
    'zh': 'Chinese',
    'zh-cn': 'Chinese'
}

def ollama_translate(model: str, target: str, text: str, time_s: str, url: str = '', key: str = ''):
    content = ""
    try:
        if url:
            if OpenAI:
                # OpenAI-compatible endpoint via the openai library
                client = OpenAI(base_url=url, api_key=key if key else "ollama")
                openai_response = client.chat.completions.create(
                    model=model,
                    messages=[
                        {"role": "system", "content": f"/no_think Translate the following content into {lang_map[target]}, and do not output any additional information."},
                        {"role": "user", "content": text}
                    ]
                )
                content = openai_response.choices[0].message.content or ""
            else:
                # Fall back to the Ollama client pointed at the given host
                client = Client(host=url)
                response: ChatResponse = client.chat(
                    model=model,
                    messages=[
                        {"role": "system", "content": f"/no_think Translate the following content into {lang_map[target]}, and do not output any additional information."},
                        {"role": "user", "content": text}
                    ]
                )
                content = response.message.content or ""
        else:
            # Local Ollama service on the default port
            response: ChatResponse = chat(
                model=model,
                messages=[
                    {"role": "system", "content": f"/no_think Translate the following content into {lang_map[target]}, and do not output any additional information."},
                    {"role": "user", "content": text}
                ]
            )
            content = response.message.content or ""
    except Exception as e:
        stdout_cmd("warn", f"Translation failed: {str(e)}")
        return

    # Strip the reasoning block that some models emit despite /no_think
    if content.startswith('<think>'):
        index = content.find('</think>')
        if index != -1:
            content = content[index+8:]
    stdout_obj({
        "command": "translation",
        "time_s": time_s,
        "text": text,
        "translation": content.strip()
    })

def google_translate(model: str, target: str, text: str, time_s: str, url: str = '', key: str = ''):
    # url and key are unused; they keep the signature compatible with
    # ollama_translate so callers can pass either function the same arguments
    translator = Translator()
    try:
        res = asyncio.run(translator.translate(text, dest=target))
        stdout_obj({
            "command": "translation",
            "time_s": time_s,
            "text": text,
            "translation": res.text
        })
    except Exception:
        stdout_cmd("warn", "Google translation request failed, please check your network connection...")

package-lock.json (generated, 1080 changed lines not shown)
package.json
@@ -1,6 +1,7 @@
 {
   "name": "auto-caption",
-  "version": "0.0.1",
+  "productName": "Auto Caption",
+  "version": "1.1.1",
   "description": "A cross-platform subtitle display software.",
   "main": "./out/main/index.js",
   "author": "himeditator",
@@ -24,16 +25,17 @@
     "@electron-toolkit/preload": "^3.0.1",
     "@electron-toolkit/utils": "^4.0.0",
     "ant-design-vue": "^4.2.6",
+    "pidusage": "^4.0.1",
     "pinia": "^3.0.2",
-    "vue-router": "^4.5.1",
-    "ws": "^8.18.2"
+    "vue-i18n": "^11.1.9",
+    "vue-router": "^4.5.1"
   },
   "devDependencies": {
     "@electron-toolkit/eslint-config-prettier": "3.0.0",
     "@electron-toolkit/eslint-config-ts": "^3.0.0",
     "@electron-toolkit/tsconfig": "^1.0.1",
     "@types/node": "^22.14.1",
-    "@types/ws": "^8.18.1",
+    "@types/pidusage": "^2.0.5",
     "@vitejs/plugin-vue": "^5.2.3",
     "electron": "^35.1.5",
     "electron-builder": "^25.1.8",

@@ -1,221 +0,0 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dashscope.audio.asr import *\n",
    "import pyaudiowpatch as pyaudio\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def getDefaultSpeakers(mic: pyaudio.PyAudio, info = True):\n",
    "    \"\"\"\n",
    "    Get the default loopback device for the system audio output\n",
    "    Args:\n",
    "        mic (pyaudio.PyAudio): the PyAudio instance\n",
    "        info (bool, optional): whether to print device info. Defaults to True.\n",
    "\n",
    "    Returns:\n",
    "        dict: the loopback device for the system audio output\n",
    "    \"\"\"\n",
    "    try:\n",
    "        WASAPI_info = mic.get_host_api_info_by_type(pyaudio.paWASAPI)\n",
    "    except OSError:\n",
    "        print(\"Looks like WASAPI is not available on the system. Exiting...\")\n",
    "        exit()\n",
    "\n",
    "    default_speaker = mic.get_device_info_by_index(WASAPI_info[\"defaultOutputDevice\"])\n",
    "    if(info): print(\"wasapi_info:\\n\", WASAPI_info, \"\\n\")\n",
    "    if(info): print(\"default_speaker:\\n\", default_speaker, \"\\n\")\n",
    "\n",
    "    if not default_speaker[\"isLoopbackDevice\"]:\n",
    "        for loopback in mic.get_loopback_device_info_generator():\n",
    "            if default_speaker[\"name\"] in loopback[\"name\"]:\n",
    "                default_speaker = loopback\n",
    "                if(info): print(\"Using loopback device:\\n\", default_speaker, \"\\n\")\n",
    "                break\n",
    "        else:\n",
    "            print(\"Default loopback output device not found.\")\n",
    "            print(\"Run `python -m pyaudiowpatch` to check available devices.\")\n",
    "            print(\"Exiting...\")\n",
    "            exit()\n",
    "\n",
    "    if(info): print(f\"Recording Device: #{default_speaker['index']} {default_speaker['name']}\")\n",
    "    return default_speaker\n",
    "\n",
    "\n",
    "class Callback(TranslationRecognizerCallback):\n",
    "    \"\"\"\n",
    "    Callback object for streaming results from the speech model\n",
    "    \"\"\"\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        self.usage = 0\n",
    "        self.sentences = []\n",
    "        self.translations = []\n",
    "\n",
    "    def on_open(self) -> None:\n",
    "        print(\"\\nStreaming translation started...\\n\")\n",
    "\n",
    "    def on_close(self) -> None:\n",
    "        print(f\"\\nToken usage: {self.usage}\")\n",
    "        print(\"Streaming translation finished...\\n\")\n",
    "        for i in range(len(self.sentences)):\n",
    "            print(f\"\\n{self.sentences[i]}\\n{self.translations[i]}\\n\")\n",
    "\n",
    "    def on_event(\n",
    "        self,\n",
    "        request_id,\n",
    "        transcription_result: TranscriptionResult,\n",
    "        translation_result: TranslationResult,\n",
    "        usage\n",
    "    ) -> None:\n",
    "        if transcription_result is not None:\n",
    "            id = transcription_result.sentence_id\n",
    "            text = transcription_result.text\n",
    "            if transcription_result.stash is not None:\n",
    "                stash = transcription_result.stash.text\n",
    "            else:\n",
    "                stash = \"\"\n",
    "            print(f\"#{id}: {text}{stash}\")\n",
    "            if usage: self.sentences.append(text)\n",
    "\n",
    "        if translation_result is not None:\n",
    "            lang = translation_result.get_language_list()[0]\n",
    "            text = translation_result.get_translation(lang).text\n",
    "            if translation_result.get_translation(lang).stash is not None:\n",
    "                stash = translation_result.get_translation(lang).stash.text\n",
    "            else:\n",
    "                stash = \"\"\n",
    "            print(f\"#{lang}: {text}{stash}\")\n",
    "            if usage: self.translations.append(text)\n",
    "\n",
    "        if usage: self.usage += usage['duration']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Sampling input device:\n",
      " - Index: 37\n",
      " - Name: 耳机 (HUAWEI FreeLace 活力版) [Loopback]\n",
      " - Max input channels: 2\n",
      " - Default low input latency: 0.003s\n",
      " - Default high input latency: 0.01s\n",
      " - Default sample rate: 44100.0Hz\n",
      " - Loopback device: True\n",
      "\n",
      "Audio chunk size: 4410\n",
      "Sample width: 2\n",
      "Sample format: 8\n",
      "Channels: 2\n",
      "Sample rate: 44100\n",
      "\n"
     ]
    }
   ],
   "source": [
    "mic = pyaudio.PyAudio()\n",
    "default_speaker = getDefaultSpeakers(mic, False)\n",
    "\n",
    "SAMP_WIDTH = pyaudio.get_sample_size(pyaudio.paInt16)\n",
    "FORMAT = pyaudio.paInt16\n",
    "CHANNELS = default_speaker[\"maxInputChannels\"]\n",
    "RATE = int(default_speaker[\"defaultSampleRate\"])\n",
    "CHUNK = RATE // 10\n",
    "INDEX = default_speaker[\"index\"]\n",
    "\n",
    "dev_info = f\"\"\"\n",
    "Sampling input device:\n",
    " - Index: {default_speaker['index']}\n",
    " - Name: {default_speaker['name']}\n",
    " - Max input channels: {default_speaker['maxInputChannels']}\n",
    " - Default low input latency: {default_speaker['defaultLowInputLatency']}s\n",
    " - Default high input latency: {default_speaker['defaultHighInputLatency']}s\n",
    " - Default sample rate: {default_speaker['defaultSampleRate']}Hz\n",
    " - Loopback device: {default_speaker['isLoopbackDevice']}\n",
    "\n",
    "Audio chunk size: {CHUNK}\n",
    "Sample width: {SAMP_WIDTH}\n",
    "Sample format: {FORMAT}\n",
    "Channels: {CHANNELS}\n",
    "Sample rate: {RATE}\n",
    "\"\"\"\n",
    "print(dev_info)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "RECORD_SECONDS = 20  # listening duration (s)\n",
    "\n",
    "stream = mic.open(\n",
    "    format = FORMAT,\n",
    "    channels = CHANNELS,\n",
    "    rate = RATE,\n",
    "    input = True,\n",
    "    input_device_index = INDEX\n",
    ")\n",
    "translator = TranslationRecognizerRealtime(\n",
    "    model = \"gummy-realtime-v1\",\n",
    "    format = \"pcm\",\n",
    "    sample_rate = RATE,\n",
    "    transcription_enabled = True,\n",
    "    translation_enabled = True,\n",
    "    source_language = \"ja\",\n",
    "    translation_target_languages = [\"zh\"],\n",
    "    callback = Callback()\n",
    ")\n",
    "translator.start()\n",
    "\n",
    "for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):\n",
    "    data = stream.read(CHUNK)\n",
    "    data_np = np.frombuffer(data, dtype=np.int16)\n",
    "    data_np_r = data_np.reshape(-1, CHANNELS)\n",
    "    print(data_np_r.shape)\n",
    "    mono_data = np.mean(data_np_r.astype(np.float32), axis=1)\n",
    "    mono_data = mono_data.astype(np.int16)\n",
    "    mono_data_bytes = mono_data.tobytes()\n",
    "    translator.send_audio_frame(mono_data_bytes)\n",
    "\n",
    "translator.stop()\n",
    "stream.stop_stream()\n",
    "stream.close()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "mystd",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
@@ -1,4 +0,0 @@
numpy
dashscope
pyaudio
pyaudiowpatch
@@ -1,80 +0,0 @@
from dashscope.audio.asr import (
    TranslationRecognizerCallback,
    TranscriptionResult,
    TranslationResult,
    TranslationRecognizerRealtime
)
from datetime import datetime
import json
import sys

class Callback(TranslationRecognizerCallback):
    """
    Callback object for streaming results from the speech model
    """
    def __init__(self):
        super().__init__()
        self.usage = 0
        self.cur_id = -1
        self.time_str = ''

    def on_open(self) -> None:
        pass

    def on_close(self) -> None:
        pass

    def on_event(
        self,
        request_id,
        transcription_result: TranscriptionResult,
        translation_result: TranslationResult,
        usage
    ) -> None:
        caption = {}
        if transcription_result is not None:
            caption['index'] = transcription_result.sentence_id
            caption['text'] = transcription_result.text
            if caption['index'] != self.cur_id:
                self.cur_id = caption['index']
                cur_time = datetime.now().strftime('%H:%M:%S')
                caption['time_s'] = cur_time
                self.time_str = cur_time
            else:
                caption['time_s'] = self.time_str
            caption['time_t'] = datetime.now().strftime('%H:%M:%S')
            caption['translation'] = ""

        if translation_result is not None:
            lang = translation_result.get_language_list()[0]
            caption['translation'] = translation_result.get_translation(lang).text

        if usage:
            self.usage += usage['duration']

        # print(caption)
        self.send_to_node(caption)

    def send_to_node(self, data):
        """
        Send data to the Node.js process
        """
        try:
            json_data = json.dumps(data) + '\n'
            sys.stdout.write(json_data)
            sys.stdout.flush()
        except Exception as e:
            print(f"Error sending data to Node.js: {e}", file=sys.stderr)


class GummyTranslator:
    def __init__(self, rate, source, target):
        self.translator = TranslationRecognizerRealtime(
            model = "gummy-realtime-v1",
            format = "pcm",
            sample_rate = rate,
            transcription_enabled = True,
            translation_enabled = (target is not None),
            source_language = source,
            translation_target_languages = [target],
            callback = Callback()
        )
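Each caption from `send_to_node` reaches the Electron main process as one JSON object per stdout line (newline-delimited JSON). A minimal sketch of the receiving side, assuming Node's built-in `readline` module; the spawn arguments here are illustrative, not the project's actual invocation:

import { spawn } from 'child_process'
import readline from 'readline'

// Hypothetical consumer: one JSON caption object arrives per stdout line.
const engine = spawn('python', ['main.py', '-s', 'en', '-t', 'zh'])
const rl = readline.createInterface({ input: engine.stdout })
rl.on('line', (line) => {
  if (!line.trim()) return
  try {
    const caption = JSON.parse(line) // { index, text, time_s, time_t, translation }
    console.log(caption.index, caption.text)
  } catch {
    // Non-JSON output (stray debug prints) is ignored here.
  }
})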
@@ -1,48 +0,0 @@
import sys
import argparse

if sys.platform == 'win32':
    from sysaudio.win import AudioStream, mergeStreamChannels
elif sys.platform == 'linux':
    from sysaudio.linux import AudioStream, mergeStreamChannels
else:
    raise NotImplementedError(f"Unsupported platform: {sys.platform}")

from audio2text.gummy import GummyTranslator


def convert_audio_to_text(s_lang, t_lang, audio_type):
    sys.stdout.reconfigure(line_buffering=True)
    stream = AudioStream(audio_type)
    stream.openStream()

    if t_lang == 'none':
        gummy = GummyTranslator(stream.RATE, s_lang, None)
    else:
        gummy = GummyTranslator(stream.RATE, s_lang, t_lang)
    gummy.translator.start()

    while True:
        try:
            if not stream.stream: continue
            data = stream.stream.read(stream.CHUNK)
            data = mergeStreamChannels(data, stream.CHANNELS)
            gummy.translator.send_audio_frame(data)
        except KeyboardInterrupt:
            stream.closeStream()
            gummy.translator.stop()
            break


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Convert system audio stream to text')
    parser.add_argument('-s', '--source_language', default='en', help='Source language code')
    parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
    parser.add_argument('-a', '--audio_type', default='0', help='Audio stream source: 0 for output audio stream, 1 for input audio stream')
    args = parser.parse_args()
    convert_audio_to_text(
        args.source_language,
        args.target_language,
        0 if args.audio_type == '0' else 1
    )
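Given the flags defined above, a manual run would look like `python main.py -s ja -t zh -a 0` — transcribe Japanese from the system output (loopback) stream and translate it to Chinese — while `-t none` disables translation entirely. This invocation is illustrative, inferred from the argparse defaults rather than documented usage.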
@@ -1,79 +0,0 @@
import pyaudio
import numpy as np

def mergeStreamChannels(data, channels):
    """
    Merge the current multi-channel stream data into single-channel data

    Args:
        data: multi-channel data
        channels: number of channels

    Returns:
        mono_data_bytes: single-channel data
    """
    # (length * channels,)
    data_np = np.frombuffer(data, dtype=np.int16)
    # (length, channels)
    data_np_r = data_np.reshape(-1, channels)
    # (length,)
    mono_data = np.mean(data_np_r.astype(np.float32), axis=1)
    mono_data = mono_data.astype(np.int16)
    mono_data_bytes = mono_data.tobytes()
    return mono_data_bytes


class AudioStream:
    def __init__(self, audio_type=1):
        self.audio_type = audio_type
        self.mic = pyaudio.PyAudio()
        self.device = self.mic.get_default_input_device_info()
        self.stream = None
        self.SAMP_WIDTH = pyaudio.get_sample_size(pyaudio.paInt16)
        self.FORMAT = pyaudio.paInt16
        self.CHANNELS = self.device["maxInputChannels"]
        self.RATE = int(self.device["defaultSampleRate"])
        self.CHUNK = self.RATE // 20
        self.INDEX = self.device["index"]

    def printInfo(self):
        dev_info = f"""
Sampling input device:
 - Device type: { "audio input (the only option on Linux for now)" }
 - Index: {self.device['index']}
 - Name: {self.device['name']}
 - Max input channels: {self.device['maxInputChannels']}
 - Default low input latency: {self.device['defaultLowInputLatency']}s
 - Default high input latency: {self.device['defaultHighInputLatency']}s
 - Default sample rate: {self.device['defaultSampleRate']}Hz

Audio chunk size: {self.CHUNK}
Sample width: {self.SAMP_WIDTH}
Sample format: {self.FORMAT}
Channels: {self.CHANNELS}
Sample rate: {self.RATE}
"""
        print(dev_info)

    def openStream(self):
        """
        Open and return the system audio stream
        """
        if self.stream: return self.stream
        self.stream = self.mic.open(
            format = self.FORMAT,
            channels = self.CHANNELS,
            rate = self.RATE,
            input = True,
            input_device_index = self.INDEX
        )
        return self.stream

    def closeStream(self):
        """
        Close the system audio stream
        """
        if self.stream is None: return
        self.stream.stop_stream()
        self.stream.close()
        self.stream = None
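The shape comments above are the whole trick: interleaved int16 samples are viewed as (frames, channels), averaged per frame in float32 so the sum cannot overflow int16, then cast back. The same downmix in a TypeScript sketch, for comparison only — the project does this in Python with NumPy:

// Downmix interleaved 16-bit PCM to mono by averaging the channels of each frame.
function mergeStreamChannels(data: Buffer, channels: number): Buffer {
  const samples = new Int16Array(data.buffer, data.byteOffset, data.byteLength / 2)
  const frames = samples.length / channels
  const mono = new Int16Array(frames)
  for (let f = 0; f < frames; f++) {
    let sum = 0 // plain number accumulator, so no int16 overflow
    for (let c = 0; c < channels; c++) sum += samples[f * channels + c]
    mono[f] = Math.round(sum / channels)
  }
  return Buffer.from(mono.buffer)
}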
@@ -1,127 +0,0 @@
"""Capture the Windows system audio output stream"""

import pyaudiowpatch as pyaudio
import numpy as np


def getDefaultLoopbackDevice(mic: pyaudio.PyAudio, info = True) -> dict:
    """
    Get the default loopback device for the system audio output
    Args:
        mic (pyaudio.PyAudio): the PyAudio instance
        info (bool, optional): whether to print device info

    Returns:
        dict: the loopback device for the system audio output
    """
    try:
        WASAPI_info = mic.get_host_api_info_by_type(pyaudio.paWASAPI)
    except OSError:
        print("Looks like WASAPI is not available on the system. Exiting...")
        exit()

    default_speaker = mic.get_device_info_by_index(WASAPI_info["defaultOutputDevice"])
    if(info): print("wasapi_info:\n", WASAPI_info, "\n")
    if(info): print("default_speaker:\n", default_speaker, "\n")

    if not default_speaker["isLoopbackDevice"]:
        for loopback in mic.get_loopback_device_info_generator():
            if default_speaker["name"] in loopback["name"]:
                default_speaker = loopback
                if(info): print("Using loopback device:\n", default_speaker, "\n")
                break
        else:
            print("Default loopback output device not found.")
            print("Run `python -m pyaudiowpatch` to check available devices.")
            print("Exiting...")
            exit()

    if(info): print(f"Output Stream Device: #{default_speaker['index']} {default_speaker['name']}")
    return default_speaker


def mergeStreamChannels(data, channels):
    """
    Merge the current multi-channel stream data into single-channel data

    Args:
        data: multi-channel data
        channels: number of channels

    Returns:
        mono_data_bytes: single-channel data
    """
    # (length * channels,)
    data_np = np.frombuffer(data, dtype=np.int16)
    # (length, channels)
    data_np_r = data_np.reshape(-1, channels)
    # (length,)
    mono_data = np.mean(data_np_r.astype(np.float32), axis=1)
    mono_data = mono_data.astype(np.int16)
    mono_data_bytes = mono_data.tobytes()
    return mono_data_bytes


class AudioStream:
    """
    Acquire the system audio stream

    Args:
        audio_type: 0 - system audio output stream (default), 1 - system audio input stream
    """
    def __init__(self, audio_type=0):
        self.audio_type = audio_type
        self.mic = pyaudio.PyAudio()
        if self.audio_type == 0:
            self.device = getDefaultLoopbackDevice(self.mic, False)
        else:
            self.device = self.mic.get_default_input_device_info()
        self.stream = None
        self.SAMP_WIDTH = pyaudio.get_sample_size(pyaudio.paInt16)
        self.FORMAT = pyaudio.paInt16
        self.CHANNELS = self.device["maxInputChannels"]
        self.RATE = int(self.device["defaultSampleRate"])
        self.CHUNK = self.RATE // 20
        self.INDEX = self.device["index"]

    def printInfo(self):
        dev_info = f"""
Sampling device:
 - Device type: { "audio output" if self.audio_type == 0 else "audio input" }
 - Index: {self.device['index']}
 - Name: {self.device['name']}
 - Max input channels: {self.device['maxInputChannels']}
 - Default low input latency: {self.device['defaultLowInputLatency']}s
 - Default high input latency: {self.device['defaultHighInputLatency']}s
 - Default sample rate: {self.device['defaultSampleRate']}Hz
 - Loopback device: {self.device['isLoopbackDevice']}

Audio chunk size: {self.CHUNK}
Sample width: {self.SAMP_WIDTH}
Sample format: {self.FORMAT}
Channels: {self.CHANNELS}
Sample rate: {self.RATE}
"""
        print(dev_info)

    def openStream(self):
        """
        Open and return the system audio stream
        """
        if self.stream: return self.stream
        self.stream = self.mic.open(
            format = self.FORMAT,
            channels = self.CHANNELS,
            rate = self.RATE,
            input = True,
            input_device_index = self.INDEX
        )
        return self.stream

    def closeStream(self):
        """
        Close the system audio stream
        """
        if self.stream is None: return
        self.stream.stop_stream()
        self.stream.close()
        self.stream = None
[deleted image: 25 KiB]
@@ -1,43 +1,42 @@
 import { shell, BrowserWindow, ipcMain } from 'electron'
 import path from 'path'
 import { is } from '@electron-toolkit/utils'
-import icon from '../../resources/icon.png?asset'
-import { controlWindow } from './control'
-import { sendStyles, sendCaptionLog } from './utils/config'
+import icon from '../../build/icon.png?asset'
+import { controlWindow } from './ControlWindow'
+import { allConfig } from './utils/AllConfig'

 class CaptionWindow {
   window: BrowserWindow | undefined;

   public createWindow(): void {
     this.window = new BrowserWindow({
       icon: icon,
-      width: 900,
+      width: allConfig.captionWindowWidth,
       height: 100,
       minWidth: 480,
       show: false,
       frame: false,
       transparent: true,
       alwaysOnTop: true,
       center: true,
       autoHideMenuBar: true,
       ...(process.platform === 'linux' ? { icon } : {}),
       webPreferences: {
         preload: path.join(__dirname, '../preload/index.js'),
         sandbox: false
       }
     })

-    setTimeout(() => {
-      if (this.window) {
-        sendStyles(this.window);
-        sendCaptionLog(this.window, 'set');
-      }
-    }, 1000);
-
     this.window.setAlwaysOnTop(true, 'screen-saver')

     this.window.on('ready-to-show', () => {
       this.window?.show()
     })

+    this.window.on('close', () => {
+      if(this.window) {
+        allConfig.captionWindowWidth = this.window?.getBounds().width;
+      }
+    })
+
     this.window.on('closed', () => {
       this.window = undefined
     })
@@ -46,7 +45,7 @@ class CaptionWindow {
       shell.openExternal(details.url)
       return { action: 'deny' }
     })

     if (is.dev && process.env['ELECTRON_RENDERER_URL']) {
       this.window.loadURL(`${process.env['ELECTRON_RENDERER_URL']}/#/caption`)
     } else {
@@ -57,7 +56,6 @@ class CaptionWindow {
   }

   public handleMessage() {
-    // The caption window requests creation of the control window
     ipcMain.on('caption.controlWindow.activate', () => {
       if(!controlWindow.window){
         controlWindow.createWindow()
@@ -66,22 +64,22 @@ class CaptionWindow {
         controlWindow.window.show()
       }
     })
-    // The caption window height has changed
+
     ipcMain.on('caption.windowHeight.change', (_, height) => {
       if(this.window){
         this.window.setSize(this.window.getSize()[0], height)
       }
     })
-    // Close the caption window
+
     ipcMain.on('caption.window.close', () => {
       if(this.window){
         this.window.close()
       }
     })
-    // Whether to pin the window on top
-    ipcMain.on('caption.pin.set', (_, pinned) => {
+
+    ipcMain.on('caption.mouseEvents.ignore', (_, ignore: boolean) => {
       if(this.window){
-        this.window.setAlwaysOnTop(pinned)
+        this.window.setIgnoreMouseEvents(ignore, { forward: ignore })
       }
     })
   }
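The new `caption.mouseEvents.ignore` channel makes the transparent overlay click-through: with `forward: ignore` set to true, Electron stops the window from intercepting clicks but keeps forwarding mouse-move events, so the renderer can still detect hover and re-enable interaction. A minimal sketch of a renderer counterpart, assuming a hypothetical `caption-bar` element — the project's actual component is not shown in this diff:

// Renderer sketch: let clicks pass through except while hovering the caption bar.
const bar = document.getElementById('caption-bar')! // illustrative element id
bar.addEventListener('mouseenter', () => {
  window.electron.ipcRenderer.send('caption.mouseEvents.ignore', false)
})
bar.addEventListener('mouseleave', () => {
  window.electron.ipcRenderer.send('caption.mouseEvents.ignore', true)
})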
src/main/ControlWindow.ts (new file, 176 lines)
@@ -0,0 +1,176 @@
import { shell, BrowserWindow, ipcMain, nativeTheme, dialog } from 'electron'
import path from 'path'
import { EngineInfo } from './types'
import pidusage from 'pidusage'
import { is } from '@electron-toolkit/utils'
import icon from '../../build/icon.png?asset'
import { captionWindow } from './CaptionWindow'
import { allConfig } from './utils/AllConfig'
import { captionEngine } from './utils/CaptionEngine'
import { Log } from './utils/Log'

class ControlWindow {
  mounted: boolean = false;
  window: BrowserWindow | undefined;

  public createWindow(): void {
    this.window = new BrowserWindow({
      icon: icon,
      width: 1200,
      height: 800,
      minWidth: 750,
      minHeight: 500,
      show: false,
      center: true,
      autoHideMenuBar: true,
      webPreferences: {
        preload: path.join(__dirname, '../preload/index.js'),
        sandbox: false
      }
    })

    allConfig.readConfig()

    this.window.on('ready-to-show', () => {
      this.window?.show()
    })

    this.window.on('closed', () => {
      this.mounted = false
      this.window = undefined
      allConfig.writeConfig()
    })

    this.window.webContents.setWindowOpenHandler((details) => {
      shell.openExternal(details.url)
      return { action: 'deny' }
    })

    if (is.dev && process.env['ELECTRON_RENDERER_URL']) {
      this.window.loadURL(process.env['ELECTRON_RENDERER_URL'])
    } else {
      this.window.loadFile(path.join(__dirname, '../renderer/index.html'))
    }
  }

  public handleMessage() {
    nativeTheme.on('updated', () => {
      if(allConfig.uiTheme === 'system'){
        if(nativeTheme.shouldUseDarkColors && this.window){
          this.window.webContents.send('control.nativeTheme.change', 'dark')
        }
        else if(!nativeTheme.shouldUseDarkColors && this.window){
          this.window.webContents.send('control.nativeTheme.change', 'light')
        }
      }
    })

    ipcMain.handle('both.window.mounted', () => {
      this.mounted = true
      return allConfig.getFullConfig(Log.getAndClearLogQueue())
    })

    ipcMain.handle('control.nativeTheme.get', () => {
      if(allConfig.uiTheme === 'system'){
        if(nativeTheme.shouldUseDarkColors) return 'dark'
        return 'light'
      }
      return allConfig.uiTheme
    })

    ipcMain.handle('control.folder.select', async () => {
      const result = await dialog.showOpenDialog({
        properties: ['openDirectory']
      });

      if (result.canceled) return "";
      return result.filePaths[0];
    })

    ipcMain.handle('control.engine.info', async () => {
      const info: EngineInfo = {
        pid: 0, ppid: 0, port: 0, cpu: 0, mem: 0, elapsed: 0
      }
      if(captionEngine.status !== 'running') return info
      const stats = await pidusage(captionEngine.process.pid)
      info.pid = stats.pid
      info.ppid = stats.ppid
      info.port = captionEngine.port
      info.cpu = stats.cpu
      info.mem = stats.memory
      info.elapsed = stats.elapsed
      return info
    })

    ipcMain.on('control.uiLanguage.change', (_, args) => {
      allConfig.uiLanguage = args
      if(captionWindow.window){
        captionWindow.window.webContents.send('control.uiLanguage.set', args)
      }
    })

    ipcMain.on('control.uiTheme.change', (_, args) => {
      allConfig.uiTheme = args
    })

    ipcMain.on('control.uiColor.change', (_, args) => {
      allConfig.uiColor = args
    })

    ipcMain.on('control.leftBarWidth.change', (_, args) => {
      allConfig.leftBarWidth = args
    })

    ipcMain.on('control.styles.change', (_, args) => {
      allConfig.setStyles(args)
      if(captionWindow.window){
        allConfig.sendStyles(captionWindow.window)
      }
    })

    ipcMain.on('control.styles.reset', () => {
      allConfig.resetStyles()
      if(this.window){
        allConfig.sendStyles(this.window)
      }
      if(captionWindow.window){
        allConfig.sendStyles(captionWindow.window)
      }
    })

    ipcMain.on('control.captionWindow.activate', () => {
      if(!captionWindow.window){
        captionWindow.createWindow()
      }
      else {
        captionWindow.window.show()
      }
    })

    ipcMain.on('control.controls.change', (_, args) => {
      allConfig.setControls(args)
    })

    ipcMain.on('control.engine.start', () => {
      captionEngine.start()
    })

    ipcMain.on('control.engine.stop', () => {
      captionEngine.stop()
    })

    ipcMain.on('control.engine.forceKill', () => {
      captionEngine.kill()
    })

    ipcMain.on('control.captionLog.clear', () => {
      allConfig.captionLog.splice(0)
    })
  }

  public sendErrorMessage(message: string) {
    this.window?.webContents.send('control.error.occurred', message)
  }
}

export const controlWindow = new ControlWindow()
@@ -1,109 +0,0 @@
|
||||
import { shell, BrowserWindow, ipcMain } from 'electron'
|
||||
import path from 'path'
|
||||
import { is } from '@electron-toolkit/utils'
|
||||
import icon from '../../resources/icon.png?asset'
|
||||
import { captionWindow } from './caption'
|
||||
import {
|
||||
captionEngine,
|
||||
captionLog,
|
||||
controls,
|
||||
setStyles,
|
||||
sendStyles,
|
||||
sendCaptionLog,
|
||||
setControls,
|
||||
sendControls
|
||||
} from './utils/config'
|
||||
|
||||
class ControlWindow {
|
||||
window: BrowserWindow | undefined;
|
||||
|
||||
public createWindow(): void {
|
||||
this.window = new BrowserWindow({
|
||||
icon: icon,
|
||||
width: 1200,
|
||||
height: 800,
|
||||
minWidth: 900,
|
||||
minHeight: 600,
|
||||
show: false,
|
||||
center: true,
|
||||
autoHideMenuBar: true,
|
||||
...(process.platform === 'linux' ? { icon } : {}),
|
||||
webPreferences: {
|
||||
preload: path.join(__dirname, '../preload/index.js'),
|
||||
sandbox: false
|
||||
}
|
||||
})
|
||||
|
||||
setTimeout(() => {
|
||||
if (this.window) {
|
||||
sendStyles(this.window) // 配置初始样式
|
||||
sendCaptionLog(this.window, 'set') // 配置当前字幕记录
|
||||
sendControls(this.window) // 配置字幕引擎配置
|
||||
}
|
||||
}, 1000);
|
||||
|
||||
|
||||
this.window.on('ready-to-show', () => {
|
||||
this.window?.show()
|
||||
})
|
||||
|
||||
this.window.on('closed', () => {
|
||||
this.window = undefined
|
||||
})
|
||||
|
||||
this.window.webContents.setWindowOpenHandler((details) => {
|
||||
shell.openExternal(details.url)
|
||||
return { action: 'deny' }
|
||||
})
|
||||
|
||||
if (is.dev && process.env['ELECTRON_RENDERER_URL']) {
|
||||
this.window.loadURL(process.env['ELECTRON_RENDERER_URL'])
|
||||
} else {
|
||||
this.window.loadFile(path.join(__dirname, '../renderer/index.html'))
|
||||
}
|
||||
}
|
||||
|
||||
public handleMessage() {
|
||||
// 控制窗口样式更新
|
||||
ipcMain.on('control.style.change', (_, args) => {
|
||||
setStyles(args)
|
||||
if(captionWindow.window){
|
||||
sendStyles(captionWindow.window)
|
||||
}
|
||||
})
|
||||
// 控制窗口请求创建字幕窗口
|
||||
ipcMain.on('control.captionWindow.activate', () => {
|
||||
if(!captionWindow.window){
|
||||
captionWindow.createWindow()
|
||||
}
|
||||
else {
|
||||
captionWindow.window.show()
|
||||
}
|
||||
})
|
||||
// 字幕引擎控制配置更新并启动引擎
|
||||
ipcMain.on('control.control.change', (_, args) => {
|
||||
setControls(args)
|
||||
})
|
||||
// 启动字幕引擎
|
||||
ipcMain.on('control.engine.start', () => {
|
||||
if(controls.engineEnabled){
|
||||
this.window?.webContents.send('control.engine.already')
|
||||
}
|
||||
else {
|
||||
captionEngine.start()
|
||||
this.window?.webContents.send('control.engine.started')
|
||||
}
|
||||
})
|
||||
// 停止字幕引擎
|
||||
ipcMain.on('control.engine.stop', () => {
|
||||
captionEngine.stop()
|
||||
this.window?.webContents.send('control.engine.stopped')
|
||||
})
|
||||
// 清空字幕记录
|
||||
ipcMain.on('control.caption.clear', () => {
|
||||
captionLog.splice(0)
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
export const controlWindow = new ControlWindow()
|
||||
src/main/i18n/index.ts (new file, 11 lines)
@@ -0,0 +1,11 @@
import zh from './lang/zh'
import en from './lang/en'
import ja from './lang/ja'
import { allConfig } from '../utils/AllConfig'

export function i18n(key: string): string{
  if(allConfig.uiLanguage === 'zh') return zh[key] || key
  else if(allConfig.uiLanguage === 'en') return en[key] || key
  else if(allConfig.uiLanguage === 'ja') return ja[key] || key
  else return key
}
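The lookup falls back to returning the key itself when a translation is missing, so callers can concatenate freely without null checks. A typical call site, taken verbatim from the engine code further down in this diff: `controlWindow.sendErrorMessage(i18n('gummy.key.missing'))`.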
src/main/i18n/lang/en.ts (new file, 9 lines)
@@ -0,0 +1,9 @@
export default {
  "gummy.key.missing": "API KEY is not set, and the DASHSCOPE_API_KEY environment variable is not detected. To use the gummy engine, you need to obtain an API KEY from the Alibaba Cloud Bailian platform and add it to the settings or configure it in the local environment variables.",
  "platform.unsupported": "Unsupported platform: ",
  "engine.start.error": "Caption engine failed to start: ",
  "engine.output.parse.error": "Unable to parse caption engine output as a JSON object: ",
  "engine.error": "Caption engine error: ",
  "engine.shutdown.error": "Failed to shut down the caption engine process: ",
  "engine.start.timeout": "Caption engine startup timeout, automatically force stopped"
}
src/main/i18n/lang/ja.ts (new file, 9 lines)
@@ -0,0 +1,9 @@
export default {
  "gummy.key.missing": "API KEY が設定されておらず、DASHSCOPE_API_KEY 環境変数も検出されていません。Gummy エンジンを使用するには、Alibaba Cloud Bailian プラットフォームから API KEY を取得し、設定に追加するか、ローカルの環境変数に設定する必要があります。",
  "platform.unsupported": "サポートされていないプラットフォーム: ",
  "engine.start.error": "字幕エンジンの起動に失敗しました: ",
  "engine.output.parse.error": "字幕エンジンの出力を JSON オブジェクトとして解析できませんでした: ",
  "engine.error": "字幕エンジンエラー: ",
  "engine.shutdown.error": "字幕エンジンプロセスの終了に失敗しました: ",
  "engine.start.timeout": "字幕エンジンの起動がタイムアウトしました。自動的に強制停止しました"
}
src/main/i18n/lang/zh.ts (new file, 9 lines)
@@ -0,0 +1,9 @@
export default {
  "gummy.key.missing": "没有设置 API KEY,也没有检测到 DASHSCOPE_API_KEY 环境变量。如果要使用 gummy 引擎,需要在阿里云百炼平台获取 API KEY,并在添加到设置中或者配置到本机环境变量。",
  "platform.unsupported": "不支持的平台:",
  "engine.start.error": "字幕引擎启动失败:",
  "engine.output.parse.error": "字幕引擎输出内容无法解析为 JSON 对象:",
  "engine.error": "字幕引擎错误:",
  "engine.shutdown.error": "字幕引擎进程关闭失败:",
  "engine.start.timeout": "字幕引擎启动超时,已自动强制停止"
}
@@ -1,8 +1,9 @@
 import { app, BrowserWindow } from 'electron'
 import { electronApp, optimizer } from '@electron-toolkit/utils'
-import { controlWindow } from './control'
-import { captionWindow } from './caption'
-import { captionEngine } from './utils/config'
+import { controlWindow } from './ControlWindow'
+import { captionWindow } from './CaptionWindow'
+import { allConfig } from './utils/AllConfig'
+import { captionEngine } from './utils/CaptionEngine'

 app.whenReady().then(() => {
   electronApp.setAppUserModelId('com.himeditator.autocaption')
@@ -23,8 +24,9 @@ app.whenReady().then(() => {
   })
 })

 app.on('will-quit', async () => {
-  captionEngine.stop()
+  captionEngine.kill()
+  allConfig.writeConfig()
 });

 app.on('window-all-closed', () => {
@@ -1,13 +1,52 @@
+export type UILanguage = "zh" | "en" | "ja"
+
+export type UITheme = "light" | "dark" | "system"
+
+export interface Controls {
+  engineEnabled: boolean,
+  sourceLang: string,
+  targetLang: string,
+  transModel: string,
+  ollamaName: string,
+  ollamaUrl: string,
+  ollamaApiKey: string,
+  engine: string,
+  audio: 0 | 1,
+  translation: boolean,
+  recording: boolean,
+  API_KEY: string,
+  voskModelPath: string,
+  sosvModelPath: string,
+  glmUrl: string,
+  glmModel: string,
+  glmApiKey: string,
+  recordingPath: string,
+  customized: boolean,
+  customizedApp: string,
+  customizedCommand: string,
+  startTimeoutSeconds: number
+}
+
 export interface Styles {
+  lineNumber: number,
+  lineBreak: number,
   fontFamily: string,
   fontSize: number,
   fontColor: string,
+  fontWeight: number,
   background: string,
   opacity: number,
+  showPreview: boolean,
   transDisplay: boolean,
   transFontFamily: string,
   transFontSize: number,
-  transFontColor: string
+  transFontColor: string,
+  transFontWeight: number,
+  textShadow: boolean,
+  offsetX: number,
+  offsetY: number,
+  blur: number,
+  textShadowColor: string
 }

 export interface CaptionItem {
@@ -18,14 +57,30 @@ export interface CaptionItem {
   translation: string
 }

-export interface Controls {
-  engineEnabled: boolean,
-  sourceLang: string,
-  targetLang: string,
-  engine: string,
-  audio: 0 | 1,
-  translation: boolean,
-  customized: boolean,
-  customizedApp: string,
-  customizedCommand: string
-}
+export interface SoftwareLogItem {
+  type: "INFO" | "WARN" | "ERROR",
+  index: number,
+  time: string,
+  text: string
+}

+export interface FullConfig {
+  platform: string,
+  uiLanguage: UILanguage,
+  uiTheme: UITheme,
+  uiColor: string,
+  leftBarWidth: number,
+  styles: Styles,
+  controls: Controls,
+  captionLog: CaptionItem[],
+  softwareLog: SoftwareLogItem[]
+}
+
+export interface EngineInfo {
+  pid: number,
+  ppid: number,
+  port: number,
+  cpu: number,
+  mem: number,
+  elapsed: number
+}
src/main/utils/AllConfig.ts (new file, 211 lines)
@@ -0,0 +1,211 @@
import {
  UILanguage, UITheme, Styles, Controls,
  CaptionItem, FullConfig, SoftwareLogItem
} from '../types'
import { Log } from './Log'
import { app, BrowserWindow } from 'electron'
import { passwordMaskingForObject } from './UtilsFunc'
import * as path from 'path'
import * as fs from 'fs'
import * as os from 'os'

interface CaptionTranslation {
  time_s: string,
  translation: string
}

function getDesktopPath() {
  const homeDir = os.homedir()
  return path.join(homeDir, 'Desktop')
}

const defaultStyles: Styles = {
  lineNumber: 1,
  lineBreak: 1,
  fontFamily: 'sans-serif',
  fontSize: 24,
  fontColor: '#000000',
  fontWeight: 4,
  background: '#dbe2ef',
  opacity: 80,
  showPreview: true,
  transDisplay: true,
  transFontFamily: 'sans-serif',
  transFontSize: 24,
  transFontColor: '#000000',
  transFontWeight: 4,
  textShadow: false,
  offsetX: 2,
  offsetY: 2,
  blur: 0,
  textShadowColor: '#ffffff'
};

const defaultControls: Controls = {
  sourceLang: 'en',
  targetLang: 'zh',
  transModel: 'ollama',
  ollamaName: 'qwen2.5:0.5b',
  ollamaUrl: 'http://localhost:11434',
  ollamaApiKey: '',
  engine: 'gummy',
  audio: 0,
  engineEnabled: false,
  API_KEY: '',
  voskModelPath: '',
  sosvModelPath: '',
  glmUrl: 'https://open.bigmodel.cn/api/paas/v4/audio/transcriptions',
  glmModel: 'glm-asr-2512',
  glmApiKey: '',
  recordingPath: getDesktopPath(),
  translation: true,
  recording: false,
  customized: false,
  customizedApp: '',
  customizedCommand: '',
  startTimeoutSeconds: 30
};


class AllConfig {
  captionWindowWidth: number = 900;

  uiLanguage: UILanguage = 'zh';
  leftBarWidth: number = 8;
  uiTheme: UITheme = 'system';
  uiColor: string = '#1677ff';
  styles: Styles = {...defaultStyles};
  controls: Controls = {...defaultControls};

  lastLogIndex: number = -1;
  captionLog: CaptionItem[] = [];

  constructor() {}

  public readConfig() {
    const configPath = path.join(app.getPath('userData'), 'config.json')
    if(fs.existsSync(configPath)){
      Log.info('Read Config from:', configPath)
      const config = JSON.parse(fs.readFileSync(configPath, 'utf-8'))
      if(config.captionWindowWidth) this.captionWindowWidth = config.captionWindowWidth
      if(config.uiLanguage) this.uiLanguage = config.uiLanguage
      if(config.uiTheme) this.uiTheme = config.uiTheme
      if(config.uiColor) this.uiColor = config.uiColor
      if(config.leftBarWidth) this.leftBarWidth = config.leftBarWidth
      if(config.styles) this.setStyles(config.styles)
      if(config.controls) this.setControls(config.controls)
    }
  }

  public writeConfig() {
    const config = {
      captionWindowWidth: this.captionWindowWidth,
      uiLanguage: this.uiLanguage,
      uiTheme: this.uiTheme,
      uiColor: this.uiColor,
      leftBarWidth: this.leftBarWidth,
      controls: this.controls,
      styles: this.styles
    }
    const configPath = path.join(app.getPath('userData'), 'config.json')
    fs.writeFileSync(configPath, JSON.stringify(config, null, 2))
    Log.info('Write Config to:', configPath)
  }

  public getFullConfig(softwareLog: SoftwareLogItem[]): FullConfig {
    return {
      platform: process.platform,
      uiLanguage: this.uiLanguage,
      uiTheme: this.uiTheme,
      uiColor: this.uiColor,
      leftBarWidth: this.leftBarWidth,
      styles: this.styles,
      controls: this.controls,
      captionLog: this.captionLog,
      softwareLog: softwareLog
    }
  }

  public setStyles(args: Object) {
    for(let key in this.styles) {
      if(key in args) {
        this.styles[key] = args[key]
      }
    }
    Log.info('Set Styles:', this.styles)
  }

  public resetStyles() {
    this.setStyles(defaultStyles)
  }

  public sendStyles(window: BrowserWindow) {
    window.webContents.send('both.styles.set', this.styles)
    Log.info(`Send Styles to #${window.id}:`, this.styles)
  }

  public setControls(args: Object) {
    const engineEnabled = this.controls.engineEnabled
    for(let key in this.controls){
      if(key in args) {
        this.controls[key] = args[key]
      }
    }
    this.controls.engineEnabled = engineEnabled
    Log.info('Set Controls:', passwordMaskingForObject(this.controls))
  }

  public sendControls(window: BrowserWindow, info = true) {
    window.webContents.send('control.controls.set', this.controls)
    if(info) Log.info(`Send Controls to #${window.id}:`, this.controls)
  }

  public updateCaptionLog(log: CaptionItem) {
    let command: 'add' | 'upd' = 'add'
    if(
      this.captionLog.length &&
      this.lastLogIndex === log.index
    ) {
      this.captionLog.splice(this.captionLog.length - 1, 1, log)
      command = 'upd'
    }
    else {
      this.captionLog.push(log)
      this.lastLogIndex = log.index
    }
    this.captionLog[this.captionLog.length - 1].index = this.captionLog.length
    for(const window of BrowserWindow.getAllWindows()){
      this.sendCaptionLog(window, command)
    }
  }

  public updateCaptionTranslation(trans: CaptionTranslation){
    for(let i = this.captionLog.length - 1; i >= 0; i--){
      if(this.captionLog[i].time_s === trans.time_s){
        this.captionLog[i].translation = trans.translation
        for(const window of BrowserWindow.getAllWindows()){
          this.sendCaptionLog(window, 'upd', i)
        }
        break
      }
    }
  }

  public sendCaptionLog(
    window: BrowserWindow,
    command: 'add' | 'upd' | 'set',
    index: number | undefined = undefined
  ) {
    if(command === 'add'){
      window.webContents.send(`both.captionLog.add`, this.captionLog.at(-1))
    }
    else if(command === 'upd'){
      if(index !== undefined) window.webContents.send(`both.captionLog.upd`, this.captionLog[index])
      else window.webContents.send(`both.captionLog.upd`, this.captionLog.at(-1))
    }
    else if(command === 'set'){
      window.webContents.send(`both.captionLog.set`, this.captionLog)
    }
  }
}

export const allConfig = new AllConfig()
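`updateCaptionLog` is effectively an upsert keyed on the engine's sentence index: while a sentence is still being recognized the engine re-emits it with the same index, so the last entry is replaced and broadcast as 'upd'; a new index appends and broadcasts 'add'. The entry's `index` field is then rewritten to its position in the log. A toy trace with illustrative values (not real engine output):

// Hypothetical trace of the upsert behavior:
allConfig.updateCaptionLog({ index: 7, text: 'Hel',   time_s: '10:00:01', time_t: '10:00:01', translation: '' }) // add
allConfig.updateCaptionLog({ index: 7, text: 'Hello', time_s: '10:00:01', time_t: '10:00:02', translation: '' }) // upd (same engine index)
allConfig.updateCaptionLog({ index: 8, text: 'World', time_s: '10:00:03', time_t: '10:00:03', translation: '' }) // add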
src/main/utils/CaptionEngine.ts (new file, 300 lines)
@@ -0,0 +1,300 @@
import { exec, spawn } from 'child_process'
import { app } from 'electron'
import { is } from '@electron-toolkit/utils'
import * as path from 'path'
import * as net from 'net'
import { controlWindow } from '../ControlWindow'
import { allConfig } from './AllConfig'
import { i18n } from '../i18n'
import { Log } from './Log'
import { passwordMaskingForList } from './UtilsFunc'

export class CaptionEngine {
  appPath: string = ''
  command: string[] = []
  process: any | undefined
  client: net.Socket | undefined
  port: number = 8080
  status: 'running' | 'starting' | 'stopping' | 'stopped' | 'starting-timeout' = 'stopped'
  timerID: NodeJS.Timeout | undefined
  startTimeoutID: NodeJS.Timeout | undefined

  private getApp(): boolean {
    if (allConfig.controls.customized) {
      Log.info('Using customized caption engine')
      this.appPath = allConfig.controls.customizedApp
      this.command = allConfig.controls.customizedCommand.split(' ')
      this.port = Math.floor(Math.random() * (65535 - 1024 + 1)) + 1024
      this.command.push('-p', this.port.toString())
    }
    else {
      if(allConfig.controls.engine === 'gummy' &&
        !allConfig.controls.API_KEY && !process.env.DASHSCOPE_API_KEY
      ) {
        controlWindow.sendErrorMessage(i18n('gummy.key.missing'))
        return false
      }
      this.command = []
      if (is.dev) {
        if(process.platform === "win32") {
          this.appPath = path.join(
            app.getAppPath(), 'engine',
            '.venv', 'Scripts', 'python.exe'
          )
          this.command.push(path.join(
            app.getAppPath(), 'engine', 'main.py'
          ))
          // this.appPath = path.join(app.getAppPath(), 'engine', 'dist', 'main.exe')
        }
        else {
          this.appPath = path.join(
            app.getAppPath(), 'engine',
            '.venv', 'bin', 'python3'
          )
          this.command.push(path.join(
            app.getAppPath(), 'engine', 'main.py'
          ))
        }
      }
      else {
        if(process.platform === 'win32') {
          this.appPath = path.join(process.resourcesPath, 'engine', 'main.exe')
        }
        else {
          this.appPath = path.join(process.resourcesPath, 'engine', 'main', 'main')
        }
      }
      this.command.push('-a', allConfig.controls.audio ? '1' : '0')
      if(allConfig.controls.recording) {
        this.command.push('-r', '1')
        this.command.push('-rp', `"${allConfig.controls.recordingPath}"`)
      }
      this.port = Math.floor(Math.random() * (65535 - 1024 + 1)) + 1024
      this.command.push('-p', this.port.toString())
      this.command.push(
        '-t', allConfig.controls.translation ?
        allConfig.controls.targetLang : 'none'
      )

      if(allConfig.controls.engine === 'gummy') {
        this.command.push('-e', 'gummy')
        this.command.push('-s', allConfig.controls.sourceLang)
        if(allConfig.controls.API_KEY) {
          this.command.push('-k', allConfig.controls.API_KEY)
        }
      }
      else if(allConfig.controls.engine === 'vosk'){
        this.command.push('-e', 'vosk')
        this.command.push('-vosk', `"${allConfig.controls.voskModelPath}"`)
        this.command.push('-tm', allConfig.controls.transModel)
        this.command.push('-omn', allConfig.controls.ollamaName)
        if(allConfig.controls.ollamaUrl) this.command.push('-ourl', allConfig.controls.ollamaUrl)
        if(allConfig.controls.ollamaApiKey) this.command.push('-okey', allConfig.controls.ollamaApiKey)
      }
      else if(allConfig.controls.engine === 'sosv'){
        this.command.push('-e', 'sosv')
        this.command.push('-s', allConfig.controls.sourceLang)
        this.command.push('-sosv', `"${allConfig.controls.sosvModelPath}"`)
        this.command.push('-tm', allConfig.controls.transModel)
        this.command.push('-omn', allConfig.controls.ollamaName)
        if(allConfig.controls.ollamaUrl) this.command.push('-ourl', allConfig.controls.ollamaUrl)
        if(allConfig.controls.ollamaApiKey) this.command.push('-okey', allConfig.controls.ollamaApiKey)
      }
      else if(allConfig.controls.engine === 'glm'){
        this.command.push('-e', 'glm')
        this.command.push('-s', allConfig.controls.sourceLang)
        this.command.push('-gurl', allConfig.controls.glmUrl)
        this.command.push('-gmodel', allConfig.controls.glmModel)
        if(allConfig.controls.glmApiKey) {
          this.command.push('-gkey', allConfig.controls.glmApiKey)
        }
        this.command.push('-tm', allConfig.controls.transModel)
        this.command.push('-omn', allConfig.controls.ollamaName)
        if(allConfig.controls.ollamaUrl) this.command.push('-ourl', allConfig.controls.ollamaUrl)
        if(allConfig.controls.ollamaApiKey) this.command.push('-okey', allConfig.controls.ollamaApiKey)
      }
    }
    Log.info('Engine Path:', this.appPath)
    Log.info('Engine Command:', passwordMaskingForList(this.command))
    return true
  }

  public connect() {
    if(this.client) { Log.warn('Client already exists, ignoring...') }
    if (this.startTimeoutID) {
      clearTimeout(this.startTimeoutID)
      this.startTimeoutID = undefined
    }
    this.client = net.createConnection({ port: this.port }, () => {
      Log.info('Connected to caption engine server');
    });
    this.status = 'running'
    allConfig.controls.engineEnabled = true
    if(controlWindow.window){
      allConfig.sendControls(controlWindow.window, false)
      controlWindow.window.webContents.send(
        'control.engine.started',
        this.process.pid
      )
    }
  }

  public sendCommand(command: string, content: string = "") {
    if(this.client === undefined) {
      Log.error('Client not initialized yet')
      return
    }
    const data = JSON.stringify({command, content})
    this.client.write(data);
    Log.info(`Send data to python server: ${data}`);
  }

  public start() {
    if (this.status !== 'stopped') {
      Log.warn('Caption engine is not stopped, current status:', this.status)
      return
    }
    if(!this.getApp()){ return }

    this.process = spawn(this.appPath, this.command)
    this.status = 'starting'
    Log.info('Caption Engine Starting, PID:', this.process.pid)

    const timeoutMs = allConfig.controls.startTimeoutSeconds * 1000
    this.startTimeoutID = setTimeout(() => {
      if (this.status === 'starting') {
        Log.warn(`Engine start timeout after ${allConfig.controls.startTimeoutSeconds} seconds, forcing kill...`)
        this.status = 'starting-timeout'
        controlWindow.sendErrorMessage(i18n('engine.start.timeout'))
        this.kill()
      }
    }, timeoutMs)

    this.process.stdout.on('data', (data: any) => {
      const lines = data.toString().split('\n')
      lines.forEach((line: string) => {
        if (line.trim()) {
          try {
            const data_obj = JSON.parse(line)
            handleEngineData(data_obj)
          } catch (e) {
            // controlWindow.sendErrorMessage(i18n('engine.output.parse.error') + e)
            Log.error('Error parsing JSON:', e)
          }
        }
      });
    });

    this.process.stderr.on('data', (data: any) => {
      const lines = data.toString().split('\n')
      lines.forEach((line: string) => {
        if(line.trim()){
          Log.error(line)
        }
      })
    });

    this.process.on('close', (code: any) => {
      this.process = undefined;
      this.client = undefined
      allConfig.controls.engineEnabled = false
      if(controlWindow.window){
        allConfig.sendControls(controlWindow.window, false)
        controlWindow.window.webContents.send('control.engine.stopped')
      }
      this.status = 'stopped'
      clearInterval(this.timerID)
      if (this.startTimeoutID) {
        clearTimeout(this.startTimeoutID)
        this.startTimeoutID = undefined
      }
      Log.info(`Engine exited with code ${code}`)
    });
  }

  public stop() {
    if(this.status !== 'running'){
      Log.warn('Trying to stop engine which is not running, current status:', this.status)
    }
    this.sendCommand('stop')
    if(this.client){
      this.client.destroy()
      this.client = undefined
    }
    this.status = 'stopping'
    this.timerID = setTimeout(() => {
      if(this.status !== 'stopping') return
      Log.warn('Engine process still not stopped, trying to kill...')
      this.kill()
    }, 4000);
  }

  public kill(){
    if(!this.process || !this.process.pid) return
    if(this.status !== 'running'){
      Log.warn('Trying to kill engine which is not running, current status:', this.status)
    }
    Log.warn('Killing engine process, PID:', this.process.pid)

    if (this.startTimeoutID) {
      clearTimeout(this.startTimeoutID)
      this.startTimeoutID = undefined
    }
    if(this.client){
      this.client.destroy()
      this.client = undefined
    }
    if (this.process.pid) {
      let cmd = `kill -9 ${this.process.pid}`;
      if (process.platform === "win32") {
        cmd = `taskkill /pid ${this.process.pid} /t /f`
      }
      exec(cmd, (error) => {
        if (error) {
          Log.error('Failed to kill process:', error)
        } else {
          Log.info('Process killed successfully')
        }
      })
    }
  }
}

function handleEngineData(data: any) {
  if(data.command === 'connect'){
    captionEngine.connect()
  }
  else if(data.command === 'kill') {
    if(captionEngine.status !== 'stopped') {
      Log.warn('Error occurred, trying to kill caption engine...')
      captionEngine.kill()
    }
  }
  else if(data.command === 'caption') {
    allConfig.updateCaptionLog(data);
  }
  else if(data.command === 'translation') {
    allConfig.updateCaptionTranslation(data);
  }
  else if(data.command === 'print') {
    console.log(data.content)
  }
  else if(data.command === 'info') {
    Log.info('Engine Info:', data.content)
  }
  else if(data.command === 'warn') {
    Log.warn('Engine Warn:', data.content)
  }
  else if(data.command === 'error') {
    Log.error('Engine Error:', data.content)
    controlWindow.sendErrorMessage(/*i18n('engine.error') +*/ data.content)
  }
  else if(data.command === 'usage') {
    Log.info('Engine Token Usage: ', data.content)
  }
  else {
    Log.warn('Unknown command:', data)
  }
}

export const captionEngine = new CaptionEngine()
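Two details of the startup handshake are easy to miss. `Math.floor(Math.random() * (65535 - 1024 + 1)) + 1024` draws a port uniformly from [1024, 65535] — all 64512 non-privileged ports — and passes it to the engine via `-p`. The engine then opens a TCP server on that port and prints a `connect` command on stdout; `handleEngineData` routes that to `captionEngine.connect()`, which is the only place `status` flips from 'starting' to 'running' and the start timeout is cancelled. If the engine never prints `connect` within `startTimeoutSeconds`, the timeout fires and the process is force-killed.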
src/main/utils/Log.ts (new file, 58 lines)
@@ -0,0 +1,58 @@
import { controlWindow } from "../ControlWindow"
import { type SoftwareLogItem } from "../types"

let logIndex = 0
const logQueue: SoftwareLogItem[] = []

function getTimeString() {
  const now = new Date()
  const HH = String(now.getHours()).padStart(2, '0')
  const MM = String(now.getMinutes()).padStart(2, '0')
  const SS = String(now.getSeconds()).padStart(2, '0')
  const MS = String(now.getMilliseconds()).padStart(3, '0')
  return `${HH}:${MM}:${SS}.${MS}`
}

export class Log {
  static getAndClearLogQueue() {
    const copiedQueue = structuredClone(logQueue)
    logQueue.length = 0
    return copiedQueue
  }

  static handleLog(logType: "INFO" | "WARN" | "ERROR", ...msg: any[]) {
    const timeStr = getTimeString()
    const logPre = `[${logType} ${timeStr}]`
    let logStr = ""
    for(let i = 0; i < msg.length; i++) {
      logStr += i ? " " : ""
      if(typeof msg[i] === "string") logStr += msg[i]
      else logStr += JSON.stringify(msg[i], undefined, 2)
    }
    console.log(logPre, logStr)
    const logItem: SoftwareLogItem = {
      type: logType,
      index: ++logIndex,
      time: timeStr,
      text: logStr
    }
    if(controlWindow.mounted && controlWindow.window) {
      controlWindow.window.webContents.send('control.softwareLog.add', logItem)
    }
    else {
      logQueue.push(logItem)
    }
  }

  static info(...msg: any[]){
    this.handleLog("INFO", ...msg)
  }

  static warn(...msg: any[]){
    this.handleLog("WARN", ...msg)
  }

  static error(...msg: any[]){
    this.handleLog("ERROR", ...msg)
  }
}
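For example, `Log.warn('Engine process still not stopped, trying to kill...')` prints `[WARN 14:03:07.215] Engine process still not stopped, trying to kill...` (timestamp illustrative) and either pushes the item to the control window over `control.softwareLog.add` or, if the window has not mounted yet, queues it until the renderer's `both.window.mounted` invoke drains the queue via `Log.getAndClearLogQueue()`.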
src/main/utils/UtilsFunc.ts (new file, 24 lines)
@@ -0,0 +1,24 @@
function passwordMasking(pwd: string) {
  return pwd.replace(/./g, '*')
}

export function passwordMaskingForList(args: string[]) {
  const maskedArgs = [...args]
  for(let i = 1; i < maskedArgs.length; i++) {
    if(maskedArgs[i-1] === '-k' || maskedArgs[i-1] === '-okey' || maskedArgs[i-1] === '-gkey') {
      maskedArgs[i] = passwordMasking(maskedArgs[i])
    }
  }
  return maskedArgs
}

export function passwordMaskingForObject(args: Record<string, any>) {
  const maskedArgs = {...args}
  for(const key in maskedArgs) {
    const lKey = key.toLowerCase()
    if(lKey.includes('api') && lKey.includes('key')) {
      maskedArgs[key] = passwordMasking(maskedArgs[key])
    }
  }
  return maskedArgs
}
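For instance (illustrative values), `passwordMaskingForList(['-e', 'gummy', '-k', 'sk-abc123'])` yields `['-e', 'gummy', '-k', '*********']`, and `passwordMaskingForObject` masks any property whose lower-cased name contains both 'api' and 'key' (`API_KEY`, `ollamaApiKey`, `glmApiKey`). This is why the `Log.info` calls in `AllConfig.setControls` and `CaptionEngine.getApp` never write credentials to the software log.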
@@ -1,89 +0,0 @@
import { Styles, CaptionItem, Controls } from '../types'
import { BrowserWindow } from 'electron'
import { CaptionEngine } from './engine'

export const captionEngine = new CaptionEngine()

export const styles: Styles = {
  fontFamily: 'sans-serif',
  fontSize: 24,
  fontColor: '#000000',
  background: '#dbe2ef',
  opacity: 80,
  transDisplay: true,
  transFontFamily: 'sans-serif',
  transFontSize: 24,
  transFontColor: '#000000'
}

export const captionLog: CaptionItem[] = []

export const controls: Controls = {
  sourceLang: 'en',
  targetLang: 'zh',
  engine: 'gummy',
  audio: 0,
  engineEnabled: false,
  translation: true,
  customized: false,
  customizedApp: '',
  customizedCommand: ''
}

export let engineRunning: boolean = false

export function setStyles(args: any) {
  styles.fontFamily = args.fontFamily
  styles.fontSize = args.fontSize
  styles.fontColor = args.fontColor
  styles.background = args.background
  styles.opacity = args.opacity
  styles.transDisplay = args.transDisplay
  styles.transFontFamily = args.transFontFamily
  styles.transFontSize = args.transFontSize
  styles.transFontColor = args.transFontColor
  console.log('[INFO] Set Styles:', styles)
}

export function sendStyles(window: BrowserWindow) {
  window.webContents.send('caption.style.set', styles)
  console.log(`[INFO] Send Styles to #${window.id}:`, styles)
}

export function sendCaptionLog(window: BrowserWindow, command: string) {
  if(command === 'add'){
    window.webContents.send(`both.log.add`, captionLog[captionLog.length - 1])
  }
  else if(command === 'set'){
    window.webContents.send(`both.log.${command}`, captionLog)
  }
}

export function addCaptionLog(log: CaptionItem) {
  if(captionLog.length && captionLog[captionLog.length - 1].index === log.index) {
    captionLog.splice(captionLog.length - 1, 1, log)
  }
  else {
    captionLog.push(log)
  }
  for(const window of BrowserWindow.getAllWindows()){
    sendCaptionLog(window, 'add')
  }
}

export function setControls(args: any) {
  controls.sourceLang = args.sourceLang
  controls.targetLang = args.targetLang
  controls.engine = args.engine
  controls.audio = args.audio
  controls.translation = args.translation
  controls.customized = args.customized
  controls.customizedApp = args.customizedApp
  controls.customizedCommand = args.customizedCommand
  console.log('[INFO] Set Controls:', controls)
}

export function sendControls(window: BrowserWindow) {
  window.webContents.send('control.control.set', controls)
  console.log(`[INFO] Send Controls to #${window.id}:`, controls)
}
@@ -1,103 +0,0 @@
import { spawn, exec } from 'child_process'
import { app } from 'electron'
import { is } from '@electron-toolkit/utils'
import path from 'path'
import { addCaptionLog, controls } from './config'

export class CaptionEngine {
  appPath: string = ''
  command: string[] = []
  process: any | undefined

  private getApp() {
    if(controls.customized && controls.customizedApp){
      this.appPath = controls.customizedApp
      this.command = [ controls.customizedCommand ]
    }
    else if(controls.engine === 'gummy'){
      let gummyName = ''
      if(process.platform === 'win32'){
        gummyName = 'main-gummy.exe'
      }
      else if(process.platform === 'linux'){
        gummyName = 'main-gummy'
      }
      else{
        throw new Error('Unsupported platform')
      }
      if(is.dev){
        this.appPath = path.join(
          app.getAppPath(),
          'python-subprocess', 'dist', gummyName
        )
      }
      else{
        this.appPath = path.join(
          process.resourcesPath,
          'python-subprocess', 'dist', gummyName
        )
      }
      this.command = []
      this.command.push('-s', controls.sourceLang)
      this.command.push('-t', controls.translation ? controls.targetLang : 'none')
      this.command.push('-a', controls.audio ? '1' : '0')

      console.log('[INFO] engine', this.appPath)
      console.log('[INFO] engine command', this.command)
    }
  }

  public start() {
    if (this.process) {
      this.stop();
    }
    this.getApp()
    this.process = spawn(this.appPath, this.command)
    controls.engineEnabled = true

    console.log('[INFO] Caption Engine Started: ', {
      appPath: this.appPath,
      command: this.command
    })

    this.process.stdout.on('data', (data) => {
      const lines = data.toString().split('\n');
      lines.forEach( (line: string) => {
        if (line.trim()) {
          try {
            const caption = JSON.parse(line);
            addCaptionLog(caption);
          } catch (e) {
            console.error('Error parsing JSON:', e);
          }
        }
      });
    });

    this.process.stderr.on('data', (data) => {
      console.error(`Python Error: ${data}`);
    });

    this.process.on('close', (code: any) => {
      console.log(`Python process exited with code ${code}`);
      this.process = undefined;
    });
  }

  public stop() {
    if (this.process) {
      if (process.platform === "win32" && this.process.pid) {
        exec(`taskkill /pid ${this.process.pid} /t /f`, (error) => {
          if (error) {
            console.error(`Failed to kill process: ${error}`);
          }
        });
      } else {
        this.process.kill('SIGKILL');
      }
      this.process = undefined;
      controls.engineEnabled = false;
      console.log('[INFO] Caption engine process stopped');
    }
  }
}
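
The deleted CaptionEngine above also documents the engine protocol: the spawned process writes one JSON caption object per line to stdout, and the main process splits on '\n' and JSON.parse()s every non-empty line into addCaptionLog. A custom engine that honors this contract can therefore be very small; a hedged sketch (illustrative only, field names mirroring the CaptionItem shape used in this diff):

// toy-engine.ts — a stand-in caption engine emitting one JSON line per second.
// Point the custom engine path setting at a node wrapper around this file;
// a real engine would derive times and text from captured audio instead.
let index = 0
setInterval(() => {
  const caption = {
    index: index++,
    time_s: '00:00:00.000',
    time_t: '00:00:01.000',
    text: `caption number ${index}`,
    translation: ''
  }
  // One object per line; blank lines are ignored by the reader.
  process.stdout.write(JSON.stringify(caption) + '\n')
}, 1000)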
@@ -2,7 +2,7 @@
<html>
  <head>
    <meta charset="UTF-8" />
    <title>Auto Caption</title>
    <title>Auto Caption v1.1.1</title>
    <!-- https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP -->
    <meta
      http-equiv="Content-Security-Policy"
@@ -3,4 +3,25 @@
</template>

<script setup lang="ts">
import { onMounted } from 'vue'
import { FullConfig } from './types'
import { useCaptionLogStore } from './stores/captionLog'
import { useSoftwareLogStore } from './stores/softwareLog'
import { useCaptionStyleStore } from './stores/captionStyle'
import { useEngineControlStore } from './stores/engineControl'
import { useGeneralSettingStore } from './stores/generalSetting'

onMounted(() => {
  window.electron.ipcRenderer.invoke('both.window.mounted').then((data: FullConfig) => {
    useGeneralSettingStore().uiLanguage = data.uiLanguage
    useGeneralSettingStore().uiTheme = data.uiTheme
    useGeneralSettingStore().uiColor = data.uiColor
    useGeneralSettingStore().leftBarWidth = data.leftBarWidth
    useCaptionStyleStore().setStyles(data.styles)
    useEngineControlStore().platform = data.platform
    useEngineControlStore().setControls(data.controls)
    useCaptionLogStore().captionData = data.captionLog
    useSoftwareLogStore().softwareLogs = data.softwareLog
  })
})
</script>
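
The invoke above implies a matching handler on the main-process side that answers 'both.window.mounted' with one FullConfig object. That handler is not part of this diff; a plausible sketch, with the persisted values (uiLanguage, styles, controls, and so on) assumed to already live in the main process:

import { ipcMain } from 'electron'

// Assumed handler shape: the reply must carry exactly the fields the
// renderer destructures above (uiLanguage, uiTheme, uiColor, leftBarWidth,
// platform, styles, controls, captionLog, softwareLog).
ipcMain.handle('both.window.mounted', () => ({
  uiLanguage, uiTheme, uiColor, leftBarWidth,
  platform: process.platform,
  styles, controls,
  captionLog, softwareLog
}))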
src/renderer/src/assets/input.css (new file, 30 lines)
@@ -0,0 +1,30 @@
.input-item {
  margin: 10px 0;
}

.input-label {
  display: inline-block;
  width: 80px;
  text-align: right;
  margin-right: 10px;
}

.switch-label {
  display: inline-block;
  min-width: 80px;
  text-align: right;
  margin-right: 10px;
}

.input-area {
  display: inline-block;
  width: calc(100% - 100px);
  min-width: 100px;
}

.input-item-value {
  width: 80px;
  text-align: right;
  font-size: 12px;
  color: var(--tag-color)
}
src/renderer/src/assets/main.css (new file, 12 lines)
@@ -0,0 +1,12 @@
:root {
  --control-background: #fff;
  --tag-color: rgba(0, 0, 0, 0.45);
  --icon-color: rgba(0, 0, 0, 0.88);
}

body {
  margin: 0;
  padding: 0;
  height: 100vh;
  overflow: hidden;
}
@@ -1,6 +0,0 @@
body {
  margin: 0;
  padding: 0;
  height: 100vh;
  overflow: hidden;
}
@@ -1,165 +0,0 @@
<template>
  <div style="height: 20px;"></div>
  <a-card size="small" title="字幕控制">
    <template #extra>
      <a @click="applyChange">更改设置</a> |
      <a @click="cancelChange">取消更改</a>
    </template>
    <div class="control-item">
      <span class="control-label">源语言</span>
      <a-select
        class="control-input"
        v-model:value="currentSourceLang"
        :options="langList"
      ></a-select>
    </div>
    <div class="control-item">
      <span class="control-label">翻译语言</span>
      <a-select
        class="control-input"
        v-model:value="currentTargetLang"
        :options="langList.filter((item) => item.value !== 'auto')"
      ></a-select>
    </div>
    <div class="control-item">
      <span class="control-label">字幕引擎</span>
      <a-select
        class="control-input"
        v-model:value="currentEngine"
        :options="captionEngine"
      ></a-select>
    </div>
    <div class="control-item">
      <span class="control-label">音频选择</span>
      <a-select
        class="control-input"
        v-model:value="currentAudio"
        :options="audioType"
      ></a-select>
    </div>
    <div class="control-item">
      <span class="control-label">启用翻译</span>
      <a-switch v-model:checked="currentTranslation" />
      <span class="control-label">自定义引擎</span>
      <a-switch v-model:checked="currentCustomized" />
    </div>
    <div v-show="currentCustomized">
      <a-card size="small" title="自定义字幕引擎">
        <p class="customize-note">说明:允许用户使用自定义字幕引擎提供字幕。提供的引擎要能通过 <code>child_process.spawn()</code> 进行启动,且需要通过 IPC 与项目 node.js 后端进行通信。具体通信接口见后端实现。</p>
        <div class="control-item">
          <span class="control-label">引擎路径</span>
          <a-input
            class="control-input"
            v-model:value="currentCustomizedApp"
          ></a-input>
        </div>
        <div class="control-item">
          <span class="control-label">引擎指令</span>
          <a-input
            class="control-input"
            v-model:value="currentCustomizedCommand"
          ></a-input>
        </div>
      </a-card>
    </div>
  </a-card>
  <div style="height: 20px;"></div>
</template>

<script setup lang="ts">
import { ref, computed, watch } from 'vue'
import { storeToRefs } from 'pinia'
import { useCaptionControlStore } from '@renderer/stores/captionControl'
import { notification } from 'ant-design-vue'

const captionControl = useCaptionControlStore()
const { captionEngine, audioType, changeSignal } = storeToRefs(captionControl)

const currentSourceLang = ref('auto')
const currentTargetLang = ref('zh')
const currentEngine = ref('gummy')
const currentAudio = ref<0 | 1>(0)
const currentTranslation = ref<boolean>(false)

const currentCustomized = ref<boolean>(false)
const currentCustomizedApp = ref('')
const currentCustomizedCommand = ref('')

const langList = computed(() => {
  for(let item of captionEngine.value){
    if(item.value === currentEngine.value) {
      return item.languages
    }
  }
  return []
})

function applyChange(){
  captionControl.sourceLang = currentSourceLang.value
  captionControl.targetLang = currentTargetLang.value
  captionControl.engine = currentEngine.value
  captionControl.audio = currentAudio.value
  captionControl.translation = currentTranslation.value

  captionControl.customized = currentCustomized.value
  captionControl.customizedApp = currentCustomizedApp.value
  captionControl.customizedCommand = currentCustomizedCommand.value

  captionControl.sendControlChange()

  notification.open({
    message: '字幕控制已更改',
    description: '如果字幕引擎已经启动,需要关闭后重启才会生效'
  });
}

function cancelChange(){
  currentSourceLang.value = captionControl.sourceLang
  currentTargetLang.value = captionControl.targetLang
  currentEngine.value = captionControl.engine
  currentAudio.value = captionControl.audio
  currentTranslation.value = captionControl.translation

  currentCustomized.value = captionControl.customized
  currentCustomizedApp.value = captionControl.customizedApp
  currentCustomizedCommand.value = captionControl.customizedCommand
}

watch(changeSignal, (val) => {
  if(val == true) {
    cancelChange();
    captionControl.changeSignal = false;
  }
})
</script>

<style scoped>
.control-item {
  margin: 10px 0;
}

.control-label {
  display: inline-block;
  width: 80px;
  text-align: right;
  margin-right: 10px;
}

.customize-note {
  padding: 0 20px;
  color: red;
  font-size: 12px;
}

.control-input {
  width: calc(100% - 100px);
  min-width: 100px;
}

.control-item-value {
  width: 80px;
  text-align: right;
  font-size: 12px;
  color: #666
}
</style>
@@ -1,202 +0,0 @@
<template>
  <div class="caption-stat">
    <a-row>
      <a-col :span="6">
        <a-statistic title="字幕引擎" :value="engine" />
      </a-col>
      <a-col :span="6">
        <a-statistic title="字幕引擎状态" :value="engineEnabled?'已启动':'未启动'" />
      </a-col>
      <a-col :span="6">
        <a-statistic title="已记录字幕" :value="captionData.length" />
      </a-col>
    </a-row>
  </div>

  <div class="caption-control">
    <a-button
      type="primary"
      class="control-button"
      @click="openCaptionWindow"
    >打开字幕窗口</a-button>
    <a-button
      class="control-button"
      @click="captionControl.startEngine"
    >启动字幕引擎</a-button>
    <a-button
      danger class="control-button"
      @click="captionControl.stopEngine"
    >关闭字幕引擎</a-button>
  </div>

  <div class="caption-list">
    <div class="caption-title">
      <span style="margin-right: 30px;">字幕记录</span>
      <a-button
        type="primary"
        style="margin-right: 20px;"
        @click="exportCaptions"
        :disabled="captionData.length === 0"
      >
        导出字幕记录
      </a-button>
      <a-button
        danger
        @click="clearCaptions"
      >
        清空字幕记录
      </a-button>
    </div>
    <a-table
      :columns="columns"
      :data-source="captionData"
      v-model:pagination="pagination"
    >
      <template #bodyCell="{ column, record }">
        <template v-if="column.key === 'index'">
          {{ record.index }}
        </template>
        <template v-if="column.key === 'time'">
          <div class="time-cell">
            <div class="time-start">{{ record.time_s }}</div>
            <div class="time-end">{{ record.time_t }}</div>
          </div>
        </template>
        <template v-if="column.key === 'content'">
          <div class="caption-content">
            <div class="caption-text">{{ record.text }}</div>
            <div class="caption-translation">{{ record.translation }}</div>
          </div>
        </template>
      </template>
    </a-table>
  </div>
</template>

<script setup lang="ts">
import { ref } from 'vue'
import { storeToRefs } from 'pinia'
import { useCaptionLogStore } from '@renderer/stores/captionLog'
import { useCaptionControlStore } from '@renderer/stores/captionControl'
const captionLog = useCaptionLogStore()
const { captionData } = storeToRefs(captionLog)
const captionControl = useCaptionControlStore()
const { engineEnabled, engine } = storeToRefs(captionControl)
const pagination = ref({
  current: 1,
  pageSize: 10,
  showSizeChanger: true,
  pageSizeOptions: ['10', '20', '50'],
  showTotal: (total: number) => `共 ${total} 条记录`,
  onChange: (page: number, pageSize: number) => {
    pagination.value.current = page
    pagination.value.pageSize = pageSize
  },
  onShowSizeChange: (current: number, size: number) => {
    pagination.value.current = current
    pagination.value.pageSize = size
  }
})

const columns = [
  {
    title: '序号',
    dataIndex: 'index',
    key: 'index',
    width: 80,
  },
  {
    title: '时间',
    dataIndex: 'time',
    key: 'time',
    width: 160,
  },
  {
    title: '字幕内容',
    dataIndex: 'content',
    key: 'content',
  },
]

function openCaptionWindow() {
  window.electron.ipcRenderer.send('control.captionWindow.activate')
}

function exportCaptions() {
  const jsonData = JSON.stringify(captionData.value, null, 2)
  const blob = new Blob([jsonData], { type: 'application/json' })
  const url = URL.createObjectURL(blob)
  const a = document.createElement('a')
  a.href = url
  const timestamp = new Date().toISOString().replace(/[:.]/g, '-')
  a.download = `captions-${timestamp}.json`
  document.body.appendChild(a)
  a.click()
  document.body.removeChild(a)
  URL.revokeObjectURL(url)
}

function clearCaptions() {
  captionLog.clear()
}
</script>

<style scoped>
.caption-control {
  display: flex;
  flex-wrap: wrap;
  justify-content: center;
  margin: 30px;
}

.control-button {
  height: 40px;
  margin: 20px;
  font-size: 16px;
}

.caption-list {
  background: #fff;
  padding: 20px;
  border-radius: 8px;
  box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
}

.caption-title {
  font-size: 24px;
  font-weight: bold;
  margin-bottom: 10px;
}

.time-cell {
  display: flex;
  flex-direction: column;
  gap: 4px;
  font-size: 14px;
}

.time-start {
  color: #1677ff;
}

.time-end {
  color: #ff4d4f;
}

.caption-content {
  padding: 8px 0;
}

.caption-text {
  font-size: 16px;
  color: #333;
  margin-bottom: 4px;
}

.caption-translation {
  font-size: 14px;
  color: #666;
  padding-left: 16px;
  border-left: 3px solid #1890ff;
}
</style>
src/renderer/src/components/CaptionLog.vue (new file, 362 lines)
@@ -0,0 +1,362 @@
<template>
  <div>
    <div class="caption-title">
      <span style="margin-right: 30px;">{{ $t('log.title') }}</span>
    </div>
    <a-popover :title="$t('log.baseTime')">
      <template #content>
        <div class="base-time">
          <div class="base-time-container">
            <a-input
              type="number" min="0"
              v-model:value="baseHH"
            ></a-input>
            <span class="base-time-label">{{ $t('log.hour') }}</span>
          </div>
        </div><span style="margin: 0 4px;">:</span>
        <div class="base-time">
          <div class="base-time-container">
            <a-input
              type="number" min="0" max="59"
              v-model:value="baseMM"
            ></a-input>
            <span class="base-time-label">{{ $t('log.min') }}</span>
          </div>
        </div><span style="margin: 0 4px;">:</span>
        <div class="base-time">
          <div class="base-time-container">
            <a-input
              type="number" min="0" max="59"
              v-model:value="baseSS"
            ></a-input>
            <span class="base-time-label">{{ $t('log.sec') }}</span>
          </div>
        </div><span style="margin: 0 4px;">.</span>
        <div class="base-time">
          <div class="base-time-container">
            <a-input
              type="number" min="0" max="999"
              v-model:value="baseMS"
            ></a-input>
            <span class="base-time-label">{{ $t('log.ms') }}</span>
          </div>
        </div>
      </template>
      <a-button
        type="primary"
        style="margin-right: 20px;"
        @click="changeBaseTime"
        :disabled="captionData.length === 0"
      >{{ $t('log.changeTime') }}</a-button>
    </a-popover>
    <a-popover :title="$t('log.exportOptions')">
      <template #content>
        <div class="input-item">
          <span class="input-label">{{ $t('log.exportFormat') }}</span>
          <a-radio-group v-model:value="exportFormat">
            <a-radio-button value="srt"><code>.srt</code></a-radio-button>
            <a-radio-button value="json"><code>.json</code></a-radio-button>
          </a-radio-group>
        </div>
        <div class="input-item">
          <span class="input-label">{{ $t('log.exportContent') }}</span>
          <a-radio-group v-model:value="contentOption">
            <a-radio-button value="both">{{ $t('log.both') }}</a-radio-button>
            <a-radio-button value="source">{{ $t('log.source') }}</a-radio-button>
            <a-radio-button value="target">{{ $t('log.translation') }}</a-radio-button>
          </a-radio-group>
        </div>
      </template>
      <a-button
        style="margin-right: 20px;"
        @click="exportCaptions"
        :disabled="captionData.length === 0"
      >{{ $t('log.export') }}</a-button>
    </a-popover>
    <a-popover :title="$t('log.copyOptions')">
      <template #content>
        <div class="input-item">
          <span class="input-label">{{ $t('log.addIndex') }}</span>
          <a-switch v-model:checked="showIndex" />
          <span class="input-label">{{ $t('log.copyTime') }}</span>
          <a-switch v-model:checked="copyTime" />
        </div>
        <div class="input-item">
          <span class="input-label">{{ $t('log.copyContent') }}</span>
          <a-radio-group v-model:value="contentOption">
            <a-radio-button value="both">{{ $t('log.both') }}</a-radio-button>
            <a-radio-button value="source">{{ $t('log.source') }}</a-radio-button>
            <a-radio-button value="target">{{ $t('log.translation') }}</a-radio-button>
          </a-radio-group>
        </div>
        <div class="input-item">
          <span class="input-label">{{ $t('log.copyNum') }}</span>
          <a-radio-group v-model:value="copyNum">
            <a-radio-button :value="0"><code>[:]</code></a-radio-button>
            <a-radio-button :value="1"><code>[-1:]</code></a-radio-button>
            <a-radio-button :value="2"><code>[-2:]</code></a-radio-button>
            <a-radio-button :value="3"><code>[-3:]</code></a-radio-button>
          </a-radio-group>
        </div>
      </template>
      <a-button
        style="margin-right: 20px;"
        @click="copyCaptions"
      >{{ $t('log.copy') }}</a-button>
    </a-popover>
    <a-button
      danger
      @click="clearCaptions"
    >{{ $t('log.clear') }}</a-button>
  </div>

  <a-table
    :columns="columns"
    :data-source="captionData"
    v-model:pagination="pagination"
    style="margin-top: 10px;"
  >
    <template #bodyCell="{ column, record }">
      <template v-if="column.key === 'index'">
        {{ record.index }}
      </template>
      <template v-if="column.key === 'time'">
        <div class="time-cell">
          <code class="time-start"
            :style="`color: ${uiColor}`"
          >{{ record.time_s }}</code>
          <code class="time-end">{{ record.time_t }}</code>
        </div>
      </template>
      <template v-if="column.key === 'content'">
        <div class="caption-content">
          <div class="caption-text">{{ record.text }}</div>
          <div
            class="caption-translation"
            :style="`border-left: 3px solid ${uiColor};`"
          >{{ record.translation }}</div>
        </div>
      </template>
    </template>
  </a-table>
</template>

<script setup lang="ts">
import { ref } from 'vue'
import { storeToRefs } from 'pinia'
import { useCaptionLogStore } from '@renderer/stores/captionLog'
import { useGeneralSettingStore } from '@renderer/stores/generalSetting'
import { message } from 'ant-design-vue'
import { useI18n } from 'vue-i18n'
import * as tc from '../utils/timeCalc'
import { CaptionItem } from '../types'

const { t } = useI18n()

const captionLog = useCaptionLogStore()
const { captionData } = storeToRefs(captionLog)

const generalSetting = useGeneralSettingStore()
const { uiColor } = storeToRefs(generalSetting)

const exportFormat = ref('srt')
const showIndex = ref(true)
const copyTime = ref(true)
const contentOption = ref('both')
const copyNum = ref(0)

const baseHH = ref<number>(0)
const baseMM = ref<number>(0)
const baseSS = ref<number>(0)
const baseMS = ref<number>(0)

const pagination = ref({
  current: 1,
  pageSize: 20,
  showSizeChanger: true,
  pageSizeOptions: ['10', '20', '50', '100'],
  onChange: (page: number, pageSize: number) => {
    pagination.value.current = page
    pagination.value.pageSize = pageSize
  },
  onShowSizeChange: (current: number, size: number) => {
    pagination.value.current = current
    pagination.value.pageSize = size
  }
})

const columns = [
  {
    title: 'index',
    dataIndex: 'index',
    key: 'index',
    width: 80,
    sorter: (a: CaptionItem, b: CaptionItem) => {
      if(a.index <= b.index) return -1
      return 1
    },
    sortDirections: ['descend'],
    defaultSortOrder: 'descend',
  },
  {
    title: 'time',
    dataIndex: 'time',
    key: 'time',
    width: 150,
    sorter: (a: CaptionItem, b: CaptionItem) => {
      if(a.time_s <= b.time_s) return -1
      return 1
    },
    sortDirections: ['descend', 'ascend'],
  },
  {
    title: 'content',
    dataIndex: 'content',
    key: 'content',
  },
]

function changeBaseTime() {
  if(baseHH.value < 0) baseHH.value = 0
  if(baseMM.value < 0) baseMM.value = 0
  if(baseMM.value > 59) baseMM.value = 59
  if(baseSS.value < 0) baseSS.value = 0
  if(baseSS.value > 59) baseSS.value = 59
  if(baseMS.value < 0) baseMS.value = 0
  if(baseMS.value > 999) baseMS.value = 999
  const newBase: tc.Time = {
    hh: Number(baseHH.value),
    mm: Number(baseMM.value),
    ss: Number(baseSS.value),
    ms: Number(baseMS.value)
  }
  const oldBase = tc.getTimeFromStr(captionData.value[0].time_s)
  const deltaMs = tc.getMsFromTime(newBase) - tc.getMsFromTime(oldBase)
  for(let i = 0; i < captionData.value.length; i++){
    captionData.value[i].time_s =
      tc.getNewTimeStr(captionData.value[i].time_s, deltaMs)
    captionData.value[i].time_t =
      tc.getNewTimeStr(captionData.value[i].time_t, deltaMs)
  }
}

function exportCaptions() {
  const exportData = getExportData()
  const blob = new Blob([exportData], {
    type: exportFormat.value === 'json' ? 'application/json' : 'text/plain'
  })
  const url = URL.createObjectURL(blob)
  const a = document.createElement('a')
  a.href = url
  const timestamp = new Date().toISOString().replace(/[:.]/g, '-')
  a.download = `captions-${timestamp}.${exportFormat.value}`
  document.body.appendChild(a)
  a.click()
  document.body.removeChild(a)
  URL.revokeObjectURL(url)
}

function getExportData() {
  if(exportFormat.value === 'json') return JSON.stringify(captionData.value, null, 2)
  let content = ''
  for(let i = 0; i < captionData.value.length; i++){
    const item = captionData.value[i]
    content += `${i+1}\n`
    content += `${item.time_s} --> ${item.time_t}\n`.replace(/\./g, ',')
    if(contentOption.value === 'both') content += `${item.text}\n${item.translation}\n\n`
    else if(contentOption.value === 'source') content += `${item.text}\n\n`
    else content += `${item.translation}\n\n`
  }
  return content
}

function copyCaptions() {
  let content = ''
  let start = 0
  if(copyNum.value > 0) {
    start = captionData.value.length - copyNum.value
    if(start < 0) start = 0
  }
  for(let i = start; i < captionData.value.length; i++){
    const item = captionData.value[i]
    if(showIndex.value) content += `${i+1}\n`
    if(copyTime.value) content += `${item.time_s} --> ${item.time_t}\n`.replace(/\./g, ',')
    if(contentOption.value === 'both') content += `${item.text}\n${item.translation}\n\n`
    else if(contentOption.value === 'source') content += `${item.text}\n\n`
    else content += `${item.translation}\n\n`
  }
  navigator.clipboard.writeText(content)
  message.success(t('log.copySuccess'))
}

function clearCaptions() {
  captionLog.clear()
}
</script>

<style scoped>
@import url(../assets/input.css);

.caption-list {
  padding: 20px;
  border-radius: 8px;
  box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
}

.caption-title {
  color: var(--icon-color);
  display: inline-block;
  font-size: 24px;
  font-weight: bold;
  margin: 10px 0;
}

.base-time {
  width: 64px;
  display: inline-block;
}

.base-time-container {
  display: flex;
  flex-direction: column;
  align-items: center;
  gap: 4px;
}

.base-time-label {
  font-size: 12px;
  color: var(--tag-color);
}

.time-cell {
  display: flex;
  flex-direction: column;
  gap: 4px;
  font-size: 14px;
}

.time-start {
  display: block;
  font-weight: bold;
}

.time-end {
  display: block;
  font-weight: bold;
  color: #ff4d4f;
}

.caption-content {
  padding: 8px 0;
}

.caption-text {
  font-size: 16px;
  margin-bottom: 4px;
}

.caption-translation {
  font-size: 14px;
  padding-left: 16px;
}
</style>
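
For reference, the .srt branch of getExportData above produces standard SubRip blocks: a running counter, a time range, then one or two content lines depending on contentOption. Since time_s/time_t use '.' before the milliseconds, the replace(/\./g, ',') converts them to SRT's comma form. Sample output with made-up values and the 'both' option:

1
00:00:01,250 --> 00:00:02,800
Hello world.
你好,世界。

2
00:00:03,000 --> 00:00:04,100
Next caption.
下一条字幕。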
@@ -1,123 +1,245 @@
<template>
  <a-card size="small" title="字幕样式设置">
  <a-card size="small" :title="$t('style.title')">
    <template #extra>
      <a @click="applyStyle">应用样式</a> |
      <a @click="resetStyle">取消更改</a>
      <a @click="applyStyle">{{ $t('style.applyStyle') }}</a> |
      <a @click="backStyle">{{ $t('style.cancelChange') }}</a> |
      <a @click="resetStyle">{{ $t('style.resetStyle') }}</a>
    </template>
    <div class="style-item">
      <span class="style-label">字体族</span>
      <a-input
        class="style-input"
        v-model:value="currentFontFamily"
      />

    <div class="input-item">
      <span class="input-label">{{ $t('style.lineNumber') }}</span>
      <a-radio-group v-model:value="currentLineNumber">
        <a-radio-button :value="1">1</a-radio-button>
        <a-radio-button :value="2">2</a-radio-button>
        <a-radio-button :value="3">3</a-radio-button>
        <a-radio-button :value="4">4</a-radio-button>
      </a-radio-group>
    </div>

    <div class="input-item">
      <span class="input-label">{{ $t('style.longCaption') }}</span>
      <a-select
        class="input-area"
        v-model:value="currentLineBreak"
        :options="captionStyle.iBreakOptions"
      ></a-select>
    </div>
    <div class="style-item">
      <span class="style-label">字体颜色</span>

    <div class="input-item">
      <span class="input-label">{{ $t('style.fontFamily') }}</span>
      <a-input
        class="style-input"
        class="input-area"
        v-model:value="currentFontFamily"
      />
    </div>

    <div class="input-item">
      <span class="input-label">{{ $t('style.fontColor') }}</span>
      <a-input
        class="input-area"
        type="color"
        v-model:value="currentFontColor"
      />
      <div class="style-item-value">{{ currentFontColor }}</div>
      <div class="input-item-value">{{ currentFontColor }}</div>
    </div>
    <div class="style-item">
      <span class="style-label">字体大小</span>
      <a-input
        class="style-input"
        type="range"
        min="0" max="64"
    <div class="input-item">
      <span class="input-label">{{ $t('style.fontSize') }}</span>
      <a-slider
        class="input-area"
        :min="0" :max="72"
        v-model:value="currentFontSize"
      />
      <div class="style-item-value">{{ currentFontSize }}px</div>
      />
      <div class="input-item-value">{{ currentFontSize }}px</div>
    </div>
    <div class="style-item">
      <span class="style-label">背景颜色</span>
    <div class="input-item">
      <span class="input-label">{{ $t('style.fontWeight') }}</span>
      <a-slider
        class="input-area"
        :min="1" :max="9"
        v-model:value="currentFontWeight"
      />
      <div class="input-item-value">{{ currentFontWeight*100 }}</div>
    </div>
    <div class="input-item">
      <span class="input-label">{{ $t('style.background') }}</span>
      <a-input
        class="style-input"
        class="input-area"
        type="color"
        v-model:value="currentBackground"
      />
      <div class="style-item-value">{{ currentBackground }}</div>
      <div class="input-item-value">{{ currentBackground }}</div>
    </div>
    <div class="style-item">
      <span class="style-label">背景透明度</span>
      <a-input
        class="style-input"
        type="range"
        min="0"
        max="100"
    <div class="input-item">
      <span class="input-label">{{ $t('style.opacity') }}</span>
      <a-slider
        class="input-area"
        :min="0"
        :max="100"
        v-model:value="currentOpacity"
      />
      <div class="style-item-value">{{ currentOpacity }}</div>
      <div class="input-item-value">{{ currentOpacity }}%</div>
    </div>

    <div class="style-item">
      <span class="style-label">显示预览</span>
      <a-switch v-model:checked="displayPreview" />
      <span class="style-label">显示翻译</span>
      <a-switch v-model:checked="currentTransDisplay" />

    <div class="input-item">
      <span class="input-label">{{ $t('style.preview') }}</span>
      <a-switch v-model:checked="currentPreview" />
      <span style="display:inline-block;width:10px;"></span>
      <div style="display: inline-block;">
        <span class="switch-label">{{ $t('style.translation') }}</span>
        <a-switch v-model:checked="currentTransDisplay" />
      </div>
      <span style="display:inline-block;width:10px;"></span>
      <div style="display: inline-block;">
        <span class="switch-label">{{ $t('style.textShadow') }}</span>
        <a-switch v-model:checked="currentTextShadow" />
      </div>
    </div>

    <div v-show="currentTransDisplay">
      <a-card size="small" title="翻译样式设置">
      <a-card size="small" :title="$t('style.trans.title')">
        <template #extra>
          <a @click="useSameStyle">使用相同样式</a>
          <a @click="useSameStyle">{{ $t('style.trans.useSame') }}</a>
        </template>
        <div class="style-item">
          <span class="style-label">翻译字体</span>
        <div class="input-item">
          <span class="input-label">{{ $t('style.fontFamily') }}</span>
          <a-input
            class="style-input"
            class="input-area"
            v-model:value="currentTransFontFamily"
          />
          />
        </div>
        <div class="style-item">
          <span class="style-label">翻译颜色</span>
        <div class="input-item">
          <span class="input-label">{{ $t('style.fontColor') }}</span>
          <a-input
            class="style-input"
            class="input-area"
            type="color"
            v-model:value="currentTransFontColor"
          />
          <div class="style-item-value">{{ currentTransFontColor }}</div>
          <div class="input-item-value">{{ currentTransFontColor }}</div>
        </div>
        <div class="style-item">
          <span class="style-label">翻译大小</span>
          <a-input
            class="style-input"
            type="range"
            min="0" max="64"
        <div class="input-item">
          <span class="input-label">{{ $t('style.fontSize') }}</span>
          <a-slider
            class="input-area"
            :min="0" :max="72"
            v-model:value="currentTransFontSize"
          />
          <div class="style-item-value">{{ currentTransFontSize }}px</div>
          />
          <div class="input-item-value">{{ currentTransFontSize }}px</div>
        </div>
        <div class="input-item">
          <span class="input-label">{{ $t('style.fontWeight') }}</span>
          <a-slider
            class="input-area"
            :min="1" :max="9"
            v-model:value="currentTransFontWeight"
          />
          <div class="input-item-value">{{ currentTransFontWeight*100 }}</div>
        </div>
      </a-card>
    </div>

    <div v-show="currentTextShadow" style="margin-top:10px;">
      <a-card size="small" :title="$t('style.shadow.title')">
        <div class="input-item">
          <span class="input-label">{{ $t('style.shadow.offsetX') }}</span>
          <a-slider
            class="input-area"
            :min="-10" :max="10"
            v-model:value="currentOffsetX"
          />
          <div class="input-item-value">{{ currentOffsetX }}px</div>
        </div>
        <div class="input-item">
          <span class="input-label">{{ $t('style.shadow.offsetY') }}</span>
          <a-slider
            class="input-area"
            :min="-10" :max="10"
            v-model:value="currentOffsetY"
          />
          <div class="input-item-value">{{ currentOffsetY }}px</div>
        </div>
        <div class="input-item">
          <span class="input-label">{{ $t('style.shadow.blur') }}</span>
          <a-slider
            class="input-area"
            :min="0" :max="12"
            v-model:value="currentBlur"
          />
          <div class="input-item-value">{{ currentBlur }}px</div>
        </div>
        <div class="input-item">
          <span class="input-label">{{ $t('style.shadow.color') }}</span>
          <a-input
            class="input-area"
            type="color"
            v-model:value="currentTextShadowColor"
          />
          <div class="input-item-value">{{ currentTextShadowColor }}</div>
        </div>
      </a-card>
    </div>
  </a-card>

  <Teleport to="body">
    <div
      v-if="displayPreview"
      v-if="currentPreview"
      class="preview-container"
      :style="{
        backgroundColor: addOpicityToColor(currentBackground, currentOpacity)
        backgroundColor: addOpicityToColor(currentBackground, currentOpacity),
        textShadow: currentTextShadow ? `${currentOffsetX}px ${currentOffsetY}px ${currentBlur}px ${currentTextShadowColor}` : 'none'
      }"
    >
      <p class="preview-caption"
        :style="{
          fontFamily: currentFontFamily,
          fontSize: currentFontSize + 'px',
          color: currentFontColor
        }">
        {{ "This is a preview of subtitle styles." }}
      </p>
      <p class="preview-translation" v-if="currentTransDisplay"
        :style="{
          fontFamily: currentTransFontFamily,
          fontSize: currentTransFontSize + 'px',
          color: currentTransFontColor
        }"
      >这是字幕样式预览(翻译)</p>
    </div>
    <template v-if="captionData.length">
      <template
        v-for="val in revArr[Math.min(currentLineNumber, captionData.length)]"
        :key="captionData[captionData.length - val].time_s"
      >
        <p :class="[currentLineBreak?'':'left-ellipsis']"
          :style="{
            fontFamily: currentFontFamily,
            fontSize: currentFontSize + 'px',
            color: currentFontColor,
            fontWeight: currentFontWeight * 100
          }">
          <span>{{ captionData[captionData.length - val].text }}</span>
        </p>
        <p :class="[currentLineBreak?'':'left-ellipsis']"
          v-if="currentTransDisplay && captionData[captionData.length - val].translation"
          :style="{
            fontFamily: currentTransFontFamily,
            fontSize: currentTransFontSize + 'px',
            color: currentTransFontColor,
            fontWeight: currentTransFontWeight * 100
          }"
        >
          <span>{{ captionData[captionData.length - val].translation }}</span>
        </p>
      </template>
    </template>
    <template v-else>
      <template v-for="val in currentLineNumber" :key="val">
        <p :class="[currentLineBreak?'':'left-ellipsis']"
          :style="{
            fontFamily: currentFontFamily,
            fontSize: currentFontSize + 'px',
            color: currentFontColor,
            fontWeight: currentFontWeight * 100
          }">
          <span>{{ $t('example.original') }}</span>
        </p>
        <p :class="[currentLineBreak?'':'left-ellipsis']"
          v-if="currentTransDisplay"
          :style="{
            fontFamily: currentTransFontFamily,
            fontSize: currentTransFontSize + 'px',
            color: currentTransFontColor,
            fontWeight: currentTransFontWeight * 100
          }"
        >
          <span>{{ $t('example.translation') }}</span>
        </p>
      </template>
    </template>
    </div>
  </Teleport>

</template>
@@ -126,20 +248,44 @@
import { ref, watch } from 'vue'
import { useCaptionStyleStore } from '@renderer/stores/captionStyle'
import { storeToRefs } from 'pinia'
import { notification } from 'ant-design-vue'
import { useI18n } from 'vue-i18n'
import { useCaptionLogStore } from '@renderer/stores/captionLog';

const revArr = {
  1: [1],
  2: [2, 1],
  3: [3, 2, 1],
  4: [4, 3, 2, 1],
}

const captionLog = useCaptionLogStore();
const { captionData } = storeToRefs(captionLog);

const { t } = useI18n()

const captionStyle = useCaptionStyleStore()
const { changeSignal } = storeToRefs(captionStyle)

const currentLineNumber = ref<number>(1)
const currentLineBreak = ref<number>(0)
const currentFontFamily = ref<string>('sans-serif')
const currentFontSize = ref<number>(24)
const currentFontColor = ref<string>('#000000')
const currentFontWeight = ref<number>(4)
const currentBackground = ref<string>('#dbe2ef')
const currentOpacity = ref<number>(50)
const currentPreview = ref<boolean>(true)
const currentTransDisplay = ref<boolean>(true)
const currentTransFontFamily = ref<string>('sans-serif')
const currentTransFontSize = ref<number>(24)
const currentTransFontColor = ref<string>('#000000')
const displayPreview = ref<boolean>(true)
const currentTransFontWeight = ref<number>(4)
const currentTextShadow = ref<boolean>(false)
const currentOffsetX = ref<number>(2)
const currentOffsetY = ref<number>(2)
const currentBlur = ref<number>(0)
const currentTextShadowColor = ref<string>('#ffffff')

function addOpicityToColor(color: string, opicity: number) {
  const opicityValue = Math.round(opicity * 255 / 100);
@@ -151,87 +297,112 @@ function useSameStyle(){
  currentTransFontFamily.value = currentFontFamily.value;
  currentTransFontSize.value = currentFontSize.value;
  currentTransFontColor.value = currentFontColor.value;
  currentTransFontWeight.value = currentFontWeight.value;
}

function applyStyle(){
function applyStyle(){
  captionStyle.lineNumber = currentLineNumber.value;
  captionStyle.lineBreak = currentLineBreak.value;
  captionStyle.fontFamily = currentFontFamily.value;
  captionStyle.fontSize = currentFontSize.value;
  captionStyle.fontColor = currentFontColor.value;
  captionStyle.fontWeight = currentFontWeight.value;
  captionStyle.background = currentBackground.value;
  captionStyle.opacity = currentOpacity.value;

  captionStyle.showPreview = currentPreview.value;
  captionStyle.transDisplay = currentTransDisplay.value;
  captionStyle.transFontFamily = currentTransFontFamily.value;
  captionStyle.transFontSize = currentTransFontSize.value;
  captionStyle.transFontColor = currentTransFontColor.value;
  captionStyle.transFontWeight = currentTransFontWeight.value;
  captionStyle.textShadow = currentTextShadow.value;
  captionStyle.offsetX = currentOffsetX.value;
  captionStyle.offsetY = currentOffsetY.value;
  captionStyle.blur = currentBlur.value;
  captionStyle.textShadowColor = currentTextShadowColor.value;

  captionStyle.sendStyleChange();
  captionStyle.sendStylesChange();

  notification.open({
    placement: 'topLeft',
    message: t('noti.styleChange'),
    description: t('noti.styleInfo')
  });
}

function resetStyle(){
function backStyle(){
  currentLineNumber.value = captionStyle.lineNumber;
  currentLineBreak.value = captionStyle.lineBreak;
  currentFontFamily.value = captionStyle.fontFamily;
  currentFontSize.value = captionStyle.fontSize;
  currentFontColor.value = captionStyle.fontColor;
  currentFontWeight.value = captionStyle.fontWeight;
  currentBackground.value = captionStyle.background;
  currentOpacity.value = captionStyle.opacity;

  currentPreview.value = captionStyle.showPreview;
  currentTransDisplay.value = captionStyle.transDisplay;
  currentTransFontFamily.value = captionStyle.transFontFamily;
  currentTransFontSize.value = captionStyle.transFontSize;
  currentTransFontColor.value = captionStyle.transFontColor;
  currentTransFontWeight.value = captionStyle.transFontWeight;
  currentTextShadow.value = captionStyle.textShadow;
  currentOffsetX.value = captionStyle.offsetX;
  currentOffsetY.value = captionStyle.offsetY;
  currentBlur.value = captionStyle.blur;
  currentTextShadowColor.value = captionStyle.textShadowColor;
}

function resetStyle() {
  captionStyle.sendStylesReset();
}

watch(changeSignal, (val) => {
  if(val == true) {
    resetStyle();
  if(val === true) {
    backStyle();
    captionStyle.changeSignal = false;
  }
})
</script>

<style scoped>
.caption-button {
  display: flex;
  justify-content: center;
@import url(../assets/input.css);
.general-note {
  padding: 10px 10px 0;
  max-width: min(36vw, 400px);
}

.style-item {
  margin: 10px 0;
}

.style-label {
  display: inline-block;
  width: 80px;
  text-align: right;
  margin-right: 10px;
}

.style-input {
  width: calc(100% - 100px);
  min-width: 100px;
}

.style-item-value {
  width: 80px;
  text-align: right;
  font-size: 12px;
  color: #666
.hover-label {
  color: #1668dc;
  cursor: pointer;
  font-weight: bold;
}

.preview-container {
  line-height: 2em;
  width: 60%;
  text-align: center;
  position: absolute;
  padding: 20px;
  padding: 10px;
  border-radius: 10px;
  left: 50%;
  left: 64%;
  transform: translateX(-50%);
  bottom: 20px;
}

.preview-container p {
  text-align: center;
  margin: 0;
  line-height: 1.5em;
  line-height: 1.6em;
}
</style>

.left-ellipsis {
  white-space: nowrap;
  overflow: hidden;
  direction: rtl;
  text-align: left;
}

.left-ellipsis > span {
  direction: ltr;
  display: inline-block;
}
</style>
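
The second hunk above cuts off inside addOpicityToColor right after opicityValue is computed, so the rest of its body is not shown in this diff. Judging from the call site (backgroundColor: addOpicityToColor(currentBackground, currentOpacity)), it turns a '#rrggbb' color plus a 0-100 opacity into an 8-digit hex color; a hypothetical completion under that assumption:

// Reconstruction for illustration only: just the first line of the body is
// confirmed by the diff, the return statement is assumed.
function addOpicityToColor(color: string, opicity: number) {
  const opicityValue = Math.round(opicity * 255 / 100); // 0-100 -> 0-255
  return color + opicityValue.toString(16).padStart(2, '0'); // '#rrggbb' -> '#rrggbbaa'
}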
src/renderer/src/components/EngineControl.vue (new file, 479 lines)
@@ -0,0 +1,479 @@
<template>
  <div style="height: 20px;"></div>
  <a-card size="small" :title="$t('engine.title')">
    <template #extra>
      <a @click="applyChange">{{ $t('engine.applyChange') }}</a> |
      <a @click="cancelChange">{{ $t('engine.cancelChange') }}</a>
    </template>
    <div class="input-item">
      <span class="input-label">{{ $t('engine.captionEngine') }}</span>
      <a-select
        class="input-area"
        v-model:value="currentEngine"
        :options="captionEngine"
      ></a-select>
    </div>
    <div class="input-item">
      <span class="input-label">{{ $t('engine.sourceLang') }}</span>
      <a-select
        :disabled="currentEngine === 'vosk'"
        class="input-area"
        v-model:value="currentSourceLang"
        :options="sLangList"
      ></a-select>
    </div>
    <div class="input-item">
      <span class="input-label">{{ $t('engine.transLang') }}</span>
      <a-select
        class="input-area"
        v-model:value="currentTargetLang"
        :options="tLangList"
      ></a-select>
    </div>
    <div class="input-item" v-if="transModel">
      <span class="input-label">{{ $t('engine.transModel') }}</span>
      <a-select
        class="input-area"
        v-model:value="currentTransModel"
        :options="transModel"
      ></a-select>
    </div>
    <div class="input-item" v-if="transModel && currentTransModel === 'ollama'">
      <a-popover placement="right">
        <template #content>
          <p class="label-hover-info">{{ $t('engine.modelNameNote') }}</p>
        </template>
        <span class="input-label info-label"
          :style="{color: uiColor}"
        >{{ $t('engine.modelName') }}</span>
      </a-popover>
      <a-input
        class="input-area"
        v-model:value="currentOllamaName"
      ></a-input>
    </div>
    <div class="input-item" v-if="transModel && currentTransModel === 'ollama'">
      <a-popover placement="right">
        <template #content>
          <p class="label-hover-info">{{ $t('engine.baseURL') }}</p>
        </template>
        <span class="input-label info-label"
          :style="{color: uiColor}"
        >Base URL</span>
      </a-popover>
      <a-input
        class="input-area"
        v-model:value="currentOllamaUrl"
        placeholder="http://localhost:11434"
      ></a-input>
    </div>
    <div class="input-item" v-if="transModel && currentTransModel === 'ollama'">
      <a-popover placement="right">
        <template #content>
          <p class="label-hover-info">{{ $t('engine.apiKey') }}</p>
        </template>
        <span class="input-label info-label"
          :style="{color: uiColor}"
        >API Key</span>
      </a-popover>
      <a-input
        class="input-area"
        type="password"
        v-model:value="currentOllamaApiKey"
      />
    </div>
    <div class="input-item" v-if="currentEngine === 'glm'">
      <span class="input-label">GLM API URL</span>
      <a-input
        class="input-area"
        v-model:value="currentGlmUrl"
        placeholder="https://open.bigmodel.cn/api/paas/v4/audio/transcriptions"
      ></a-input>
    </div>
    <div class="input-item" v-if="currentEngine === 'glm'">
      <span class="input-label">GLM Model Name</span>
      <a-input
        class="input-area"
        v-model:value="currentGlmModel"
        placeholder="glm-asr-2512"
      ></a-input>
    </div>
    <div class="input-item">
      <span class="input-label">{{ $t('engine.audioType') }}</span>
      <a-select
        class="input-area"
        v-model:value="currentAudio"
        :options="audioType"
      ></a-select>
    </div>
    <div class="input-item">
      <span class="input-label">{{ $t('engine.enableTranslation') }}</span>
      <a-switch v-model:checked="currentTranslation" />
      <span style="display:inline-block;width:10px;"></span>
      <div style="display: inline-block;">
        <span class="switch-label">{{ $t('engine.enableRecording') }}</span>
        <a-switch v-model:checked="currentRecording" />
      </div>
    </div>
    <div class="input-item">
      <span class="input-label">{{ $t('engine.customEngine') }}</span>
      <a-switch v-model:checked="currentCustomized" />
      <span style="display:inline-block;width:10px;"></span>
      <div style="display: inline-block;">
        <span class="switch-label">{{ $t('engine.showMore') }}</span>
        <a-switch v-model:checked="showMore" />
      </div>
    </div>

    <a-card size="small" :title="$t('engine.custom.title')" v-show="currentCustomized">
      <template #extra>
        <a-popover>
          <template #content>
            <p class="customize-note">{{ $t('engine.custom.note') }}</p>
          </template>
          <a><InfoCircleOutlined />{{ $t('engine.custom.attention') }}</a>
        </a-popover>
      </template>
      <div class="input-item">
        <span class="input-label">{{ $t('engine.custom.app') }}</span>
        <a-input
          class="input-area"
          v-model:value="currentCustomizedApp"
        ></a-input>
      </div>
      <div class="input-item">
        <span class="input-label">{{ $t('engine.custom.command') }}</span>
        <a-input
          class="input-area"
          v-model:value="currentCustomizedCommand"
        ></a-input>
      </div>
    </a-card>

    <a-card size="small" :title="$t('engine.showMore')" v-show="showMore" style="margin-top:10px;">
      <div class="input-item">
        <a-popover placement="right">
          <template #content>
            <p class="label-hover-info">{{ $t('engine.apikeyInfo') }}</p>
            <p><a href="https://bailian.console.aliyun.com" target="_blank">
              https://bailian.console.aliyun.com
            </a></p>
          </template>
          <span class="input-label info-label"
            :style="{color: uiColor}"
          >ALI {{ $t('engine.apikey') }}</span>
        </a-popover>
        <a-input
          class="input-area"
          type="password"
          v-model:value="currentAPI_KEY"
        />
      </div>
      <div class="input-item">
        <a-popover placement="right">
          <template #content>
            <p class="label-hover-info">{{ $t('engine.glmApikeyInfo') }}</p>
            <p><a href="https://open.bigmodel.cn/" target="_blank">
              https://open.bigmodel.cn
            </a></p>
          </template>
          <span class="input-label info-label"
            :style="{color: uiColor}"
          >GLM {{ $t('engine.apikey') }}</span>
        </a-popover>
        <a-input
          class="input-area"
          type="password"
          v-model:value="currentGlmApiKey"
        />
      </div>
      <div class="input-item">
        <a-popover placement="right">
          <template #content>
            <p class="label-hover-info">{{ $t('engine.voskModelPathInfo') }}</p>
            <p class="label-hover-info">
              <a href="https://alphacephei.com/vosk/models" target="_blank">Vosk {{ $t('engine.modelDownload') }}</a>
            </p>
          </template>
          <span class="input-label info-label"
            :style="{color: uiColor}"
          >{{ $t('engine.voskModelPath') }}</span>
        </a-popover>
        <span
          class="input-folder"
          :style="{color: uiColor}"
          @click="selectFolderPath('vosk')"
        ><span><FolderOpenOutlined /></span></span>
        <a-input
          class="input-area"
          style="width:calc(100% - 140px);"
          v-model:value="currentVoskModelPath"
        />
      </div>
      <div class="input-item">
        <a-popover placement="right">
          <template #content>
            <p class="label-hover-info">{{ $t('engine.sosvModelPathInfo') }}</p>
            <p class="label-hover-info">
              <a href="https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model" target="_blank">SOSV {{ $t('engine.modelDownload') }}</a>
            </p>
          </template>
          <span class="input-label info-label"
            :style="{color: uiColor}"
          >{{ $t('engine.sosvModelPath') }}</span>
        </a-popover>
        <span
          class="input-folder"
          :style="{color: uiColor}"
          @click="selectFolderPath('sosv')"
        ><span><FolderOpenOutlined /></span></span>
        <a-input
          class="input-area"
          style="width:calc(100% - 140px);"
          v-model:value="currentSosvModelPath"
        />
      </div>
      <div class="input-item">
        <a-popover placement="right">
          <template #content>
            <p class="label-hover-info">{{ $t('engine.recordingPathInfo') }}</p>
          </template>
          <span class="input-label info-label"
            :style="{color: uiColor}"
          >{{ $t('engine.recordingPath') }}</span>
        </a-popover>
        <span
          class="input-folder"
          :style="{color: uiColor}"
          @click="selectFolderPath('rec')"
        ><span><FolderOpenOutlined /></span></span>
        <a-input
          class="input-area"
          style="width:calc(100% - 140px);"
          v-model:value="currentRecordingPath"
        />
      </div>
      <div class="input-item">
        <a-popover placement="right">
          <template #content>
            <p class="label-hover-info">{{ $t('engine.startTimeoutInfo') }}</p>
          </template>
          <span
            class="input-label info-label"
            :style="{color: uiColor, verticalAlign: 'middle'}"
          >{{ $t('engine.startTimeout') }}</span>
        </a-popover>
        <a-input-number
          class="input-area"
          v-model:value="currentStartTimeoutSeconds"
          :min="10"
          :max="120"
          :step="5"
          :addon-after="$t('engine.seconds')"
        />
      </div>
    </a-card>
  </a-card>
  <div style="height: 20px;"></div>
</template>

<script setup lang="ts">
import { ref, computed, watch, h } from 'vue'
import { storeToRefs } from 'pinia'
import { useGeneralSettingStore } from '@renderer/stores/generalSetting'
import { useEngineControlStore } from '@renderer/stores/engineControl'
import { notification } from 'ant-design-vue'
import { ExclamationCircleOutlined, FolderOpenOutlined, InfoCircleOutlined } from '@ant-design/icons-vue';
import { useI18n } from 'vue-i18n'

const { t } = useI18n()
const showMore = ref(false)

const engineControl = useEngineControlStore()
const { captionEngine, audioType, changeSignal } = storeToRefs(engineControl)

const generalSetting = useGeneralSettingStore()
const { uiColor } = storeToRefs(generalSetting)

const currentSourceLang = ref('auto')
const currentTargetLang = ref('zh')
const currentEngine = ref<string>('gummy')
const currentAudio = ref<0 | 1>(0)
const currentTranslation = ref<boolean>(true)
const currentRecording = ref<boolean>(false)
const currentTransModel = ref('ollama')
const currentOllamaName = ref('')
const currentOllamaUrl = ref('')
const currentOllamaApiKey = ref('')
const currentAPI_KEY = ref<string>('')
const currentVoskModelPath = ref<string>('')
const currentSosvModelPath = ref<string>('')
const currentGlmUrl = ref<string>('')
const currentGlmModel = ref<string>('')
const currentGlmApiKey = ref<string>('')
const currentRecordingPath = ref<string>('')
const currentCustomized = ref<boolean>(false)
const currentCustomizedApp = ref('')
const currentCustomizedCommand = ref('')
const currentStartTimeoutSeconds = ref<number>(30)

const sLangList = computed(() => {
  for(let item of captionEngine.value){
    if(item.value === currentEngine.value) {
      return item.languages.filter(item => item.type <= 0)
    }
  }
  return []
})

const tLangList = computed(() => {
  for(let item of captionEngine.value){
    if(item.value === currentEngine.value) {
      return item.languages.filter(item => item.type >= 0)
    }
  }
  return []
})

const transModel = computed(() => {
  for(let item of captionEngine.value){
    if(item.value === currentEngine.value) {
      return item.transModel
    }
  }
  return []
})

function applyChange(){
  if(
    currentTranslation.value && transModel.value &&
    currentTransModel.value === 'ollama' && !currentOllamaName.value.trim()
  ) {
    notification.open({
      message: t('noti.ollamaNameNull'),
      description: t('noti.ollamaNameNullNote'),
      duration: null,
      icon: () => h(ExclamationCircleOutlined, { style: 'color: #ff4d4f' })
    })
    return
  }

  engineControl.sourceLang = currentSourceLang.value
  engineControl.targetLang = currentTargetLang.value
  engineControl.transModel = currentTransModel.value
  engineControl.ollamaName = currentOllamaName.value
  engineControl.engine = currentEngine.value
  engineControl.ollamaUrl = currentOllamaUrl.value ?? "http://localhost:11434"
  engineControl.ollamaApiKey = currentOllamaApiKey.value
  engineControl.audio = currentAudio.value
  engineControl.translation = currentTranslation.value
  engineControl.recording = currentRecording.value
  engineControl.API_KEY = currentAPI_KEY.value
  engineControl.voskModelPath = currentVoskModelPath.value
  engineControl.sosvModelPath = currentSosvModelPath.value
  engineControl.glmUrl = currentGlmUrl.value ?? "https://open.bigmodel.cn/api/paas/v4/audio/transcriptions"
  engineControl.glmModel = currentGlmModel.value ?? "glm-asr-2512"
  engineControl.glmApiKey = currentGlmApiKey.value
  engineControl.recordingPath = currentRecordingPath.value
  engineControl.customized = currentCustomized.value
  engineControl.customizedApp = currentCustomizedApp.value
  engineControl.customizedCommand = currentCustomizedCommand.value
  engineControl.startTimeoutSeconds = currentStartTimeoutSeconds.value

  engineControl.sendControlsChange()

  notification.open({
    placement: 'topLeft',
    message: t('noti.engineChange'),
    description: t('noti.changeInfo')
  });
}

function cancelChange(){
  currentSourceLang.value = engineControl.sourceLang
  currentTargetLang.value = engineControl.targetLang
  currentTransModel.value = engineControl.transModel
  currentOllamaName.value = engineControl.ollamaName
  currentOllamaUrl.value = engineControl.ollamaUrl
  currentOllamaApiKey.value = engineControl.ollamaApiKey
  currentEngine.value = engineControl.engine
  currentAudio.value = engineControl.audio
  currentTranslation.value = engineControl.translation
  currentRecording.value = engineControl.recording
  currentAPI_KEY.value = engineControl.API_KEY
  currentVoskModelPath.value = engineControl.voskModelPath
  currentSosvModelPath.value = engineControl.sosvModelPath
  currentGlmUrl.value = engineControl.glmUrl
  currentGlmModel.value = engineControl.glmModel
  currentGlmApiKey.value = engineControl.glmApiKey
  currentRecordingPath.value = engineControl.recordingPath
  currentCustomized.value = engineControl.customized
  currentCustomizedApp.value = engineControl.customizedApp
  currentCustomizedCommand.value = engineControl.customizedCommand
  currentStartTimeoutSeconds.value = engineControl.startTimeoutSeconds
}

function selectFolderPath(type: 'vosk' | 'sosv' | 'rec') {
  window.electron.ipcRenderer.invoke('control.folder.select').then((folderPath) => {
    if(!folderPath) return
    if(type == 'vosk')
      currentVoskModelPath.value = folderPath
    else if(type == 'sosv')
      currentSosvModelPath.value = folderPath
    else if(type == 'rec')
|
||||
currentRecordingPath.value = folderPath
|
||||
})
|
||||
}
|
||||
|
||||
watch(changeSignal, (val) => {
|
||||
if(val == true) {
|
||||
cancelChange();
|
||||
engineControl.changeSignal = false;
|
||||
}
|
||||
})
|
||||
|
||||
watch(currentEngine, (val) => {
|
||||
if(val == 'vosk'){
|
||||
currentSourceLang.value = 'auto'
|
||||
currentTargetLang.value = useGeneralSettingStore().uiLanguage
|
||||
if(currentTargetLang.value === 'zh') {
|
||||
currentTargetLang.value = 'zh-cn'
|
||||
}
|
||||
}
|
||||
else{
|
||||
currentSourceLang.value = 'auto'
|
||||
currentTargetLang.value = useGeneralSettingStore().uiLanguage
|
||||
}
|
||||
})
|
||||
</script>
|
||||
|
||||
<style scoped>
|
||||
@import url(../assets/input.css);
|
||||
|
||||
.label-hover-info {
|
||||
margin-top: 10px;
|
||||
max-width: min(36vw, 380px);
|
||||
}
|
||||
|
||||
.info-label {
|
||||
cursor: pointer;
|
||||
font-style: italic;
|
||||
}
|
||||
|
||||
.input-folder {
|
||||
display:inline-block;
|
||||
width: 40px;
|
||||
font-size:1.38em;
|
||||
cursor: pointer;
|
||||
transition: all 0.25s;
|
||||
}
|
||||
|
||||
.input-folder:hover {
|
||||
transform: scale(1.1);
|
||||
}
|
||||
|
||||
.customize-note {
|
||||
padding: 10px 10px 0;
|
||||
max-width: min(40vw, 480px);
|
||||
}
|
||||
</style>
|
||||
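For orientation, here is a minimal sketch of the settings object that `applyChange()` assembles before `sendControlsChange()` pushes it to the engine-control store. The field names and value types are read directly off the assignments above; the store itself is defined elsewhere in the repo, so treat this interface as an illustrative assumption rather than the project's actual type:

```ts
// Hypothetical type, inferred from the applyChange() assignments above.
// The real store in @renderer/stores/engineControl may differ.
interface EngineControls {
  engine: string              // 'gummy' | 'vosk' | 'sosv' | ...
  sourceLang: string          // 'auto' or a language code
  targetLang: string
  audio: 0 | 1                // which audio source to capture (assumed meaning)
  translation: boolean
  transModel: string          // e.g. 'ollama'
  ollamaName: string          // model name, required when transModel is 'ollama'
  ollamaUrl: string           // falls back to http://localhost:11434 when empty
  ollamaApiKey: string
  API_KEY: string             // cloud engine credential
  voskModelPath: string       // local model folder for the Vosk engine
  sosvModelPath: string       // local model folder for the SOSV engine
  glmUrl: string
  glmModel: string
  glmApiKey: string
  recording: boolean
  recordingPath: string
  customized: boolean         // run a user-supplied engine instead
  customizedApp: string
  customizedCommand: string
  startTimeoutSeconds: number // 10–120 s, matching the input bounds above
}
```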
300
src/renderer/src/components/EngineStatus.vue
Normal file
@@ -0,0 +1,300 @@
<template>
  <div class="caption-stat">
    <a-row>
      <a-col :span="5">
        <a-statistic
          :title="$t('status.engine')"
          :value="customized ? $t('status.customized') : engine"
        />
      </a-col>
      <a-popover :title="$t('status.engineStatus')">
        <template #content>
          <a-row class="engine-status">
            <a-col :flex="1" :title="$t('status.pid')" style="cursor:pointer;">
              <div class="engine-status-title">pid</div>
              <div>{{ pid }}</div>
            </a-col>
            <a-col :flex="1" :title="$t('status.ppid')" style="cursor:pointer;">
              <div class="engine-status-title">ppid</div>
              <div>{{ ppid }}</div>
            </a-col>
            <a-col :flex="1" :title="$t('status.port')" style="cursor:pointer;">
              <div class="engine-status-title">port</div>
              <div>{{ port }}</div>
            </a-col>
            <a-col :flex="1" :title="$t('status.cpu')" style="cursor:pointer;">
              <div class="engine-status-title">cpu</div>
              <div>{{ cpu.toFixed(1) }}%</div>
            </a-col>
            <a-col :flex="1" :title="$t('status.mem')" style="cursor:pointer;">
              <div class="engine-status-title">mem</div>
              <div>{{ (mem / 1024 / 1024).toFixed(2) }}MB</div>
            </a-col>
            <a-col :flex="1" :title="$t('status.elapsed')" style="cursor:pointer;">
              <div class="engine-status-title">elapsed</div>
              <div>{{ (elapsed / 1000).toFixed(0) }}s</div>
            </a-col>
          </a-row>
        </template>
        <a-col :span="5" @mouseenter="getEngineInfo" style="cursor: pointer;">
          <a-statistic
            :title="$t('status.status')"
            :value="engineEnabled ? $t('status.started') : $t('status.stopped')"
          >
            <template #suffix v-if="engineEnabled">
              <InfoCircleOutlined style="font-size:18px;color:#1677ff"/>
            </template>
          </a-statistic>
        </a-col>
      </a-popover>
      <a-col :span="5">
        <a-statistic :title="$t('status.logNumber')" :value="captionData.length" />
      </a-col>
      <a-col :span="5">
        <a-statistic :title="$t('status.logNumber2')" :value="softwareLogs.length" />
      </a-col>
      <a-col :span="4">
        <div class="about-tag">{{ $t('status.aboutProj') }}</div>
        <GithubOutlined class="proj-info" @click="showAbout = true"/>
      </a-col>
    </a-row>
  </div>

  <div class="caption-control">
    <a-button
      type="primary"
      class="control-button"
      @click="openCaptionWindow"
    >{{ $t('status.openCaption') }}</a-button>
    <a-button
      v-if="!isStarting"
      class="control-button"
      :loading="pending && !engineEnabled"
      :disabled="pending || engineEnabled"
      @click="startEngine"
    >{{ $t('status.startEngine') }}</a-button>
    <a-popconfirm
      v-if="isStarting"
      :title="$t('status.forceKillConfirm')"
      :ok-text="$t('status.confirm')"
      :cancel-text="$t('status.cancel')"
      @confirm="forceKillEngine"
    >
      <a-button
        danger
        class="control-button"
        type="primary"
        :icon="h(LoadingOutlined)"
      >{{ $t('status.forceKillStarting') }}</a-button>
    </a-popconfirm>
    <a-button
      danger class="control-button"
      :loading="pending && engineEnabled"
      :disabled="pending || !engineEnabled"
      @click="stopEngine"
    >{{ $t('status.stopEngine') }}</a-button>
  </div>

  <a-modal v-model:open="showAbout" :title="$t('status.about.title')" :footer="null">
    <div class="about-modal-content">
      <h2 class="about-title">{{ $t('status.about.proj') }}</h2>
      <p class="about-desc">{{ $t('status.about.desc') }}</p>
      <a-divider />
      <div class="about-info">
        <p><b>{{ $t('status.about.version') }}</b><a-tag color="green">v1.1.1</a-tag></p>
        <p>
          <b>{{ $t('status.about.author') }}</b>
          <a
            href="https://github.com/HiMeditator"
            target="_blank"
          >
            <a-tag color="blue">HiMeditator</a-tag>
          </a>
        </p>
        <p>
          <b>{{ $t('status.about.projLink') }}</b>
          <a href="https://github.com/HiMeditator/auto-caption" target="_blank">
            <a-tag color="blue">GitHub | auto-caption</a-tag>
          </a>
        </p>
        <p>
          <b>{{ $t('status.about.manual') }}</b>
          <a
            :href="`https://github.com/HiMeditator/auto-caption/tree/main/docs/user-manual/${$t('lang')}.md`"
            target="_blank"
          >
            <a-tag color="blue">GitHub | user-manual/{{ $t('lang') }}.md</a-tag>
          </a>
        </p>
        <p>
          <b>{{ $t('status.about.engineDoc') }}</b>
          <a
            :href="`https://github.com/HiMeditator/auto-caption/tree/main/docs/engine-manual/${$t('lang')}.md`"
            target="_blank"
          >
            <a-tag color="blue">GitHub | engine-manual/{{ $t('lang') }}.md</a-tag>
          </a>
        </p>
      </div>
      <div class="about-date">{{ $t('status.about.date') }}</div>
    </div>
  </a-modal>
</template>


<script setup lang="ts">
import { EngineInfo } from '@renderer/types'
import { ref, watch, h } from 'vue'
import { storeToRefs } from 'pinia'
import { useCaptionLogStore } from '@renderer/stores/captionLog'
import { useSoftwareLogStore } from '@renderer/stores/softwareLog'
import { useEngineControlStore } from '@renderer/stores/engineControl'
import { GithubOutlined, InfoCircleOutlined, LoadingOutlined } from '@ant-design/icons-vue'

const showAbout = ref(false)
const pending = ref(false)
const isStarting = ref(false)

const captionLog = useCaptionLogStore()
const { captionData } = storeToRefs(captionLog)
const softwareLog = useSoftwareLogStore()
const { softwareLogs } = storeToRefs(softwareLog)
const engineControl = useEngineControlStore()
const { engineEnabled, engine, customized, errorSignal } = storeToRefs(engineControl)

const pid = ref(0)
const ppid = ref(0)
const port = ref(0)
const cpu = ref(0)
const mem = ref(0)
const elapsed = ref(0)

function openCaptionWindow() {
  window.electron.ipcRenderer.send('control.captionWindow.activate')
}

function startEngine() {
  pending.value = true
  isStarting.value = true
  if(
    (engineControl.engine === 'vosk' && engineControl.voskModelPath.trim() === '') ||
    (engineControl.engine === 'sosv' && engineControl.sosvModelPath.trim() === '')
  ) {
    engineControl.emptyModelPathErr()
    pending.value = false
    isStarting.value = false
    return
  }
  window.electron.ipcRenderer.send('control.engine.start')
}

function stopEngine() {
  pending.value = true
  window.electron.ipcRenderer.send('control.engine.stop')
}

function forceKillEngine() {
  pending.value = true
  isStarting.value = false
  window.electron.ipcRenderer.send('control.engine.forceKill')
}

function getEngineInfo() {
  window.electron.ipcRenderer.invoke('control.engine.info').then((data: EngineInfo) => {
    pid.value = data.pid
    ppid.value = data.ppid
    port.value = data.port
    cpu.value = data.cpu
    mem.value = data.mem
    elapsed.value = data.elapsed
  })
}

watch(engineEnabled, (enabled) => {
  pending.value = false
  if (enabled) {
    isStarting.value = false
  }
})

watch(errorSignal, () => {
  pending.value = false
  isStarting.value = false
  errorSignal.value = false
})
</script>


<style scoped>
.engine-status {
  width: max(420px, 36vw);
  display: flex;
  align-items: center;
  padding: 5px 10px;
}

.engine-status-title {
  font-size: 12px;
  color: var(--tag-color);
}

.about-tag {
  color: var(--tag-color);
  margin-bottom: 16px;
}

.proj-info {
  display: inline-block;
  font-size: 24px;
  cursor: pointer;
  color: var(--icon-color);
}

.about-modal-content {
  text-align: center;
  padding: 8px 0 0 0;
}

.about-title {
  font-size: 1.5em;
  font-weight: bold;
  margin-bottom: 0.2em;
}

.about-desc {
  color: #666;
  margin-bottom: 0.5em;
}

.about-info {
  text-align: left;
  display: inline-block;
  margin: 0 auto;
  font-size: 1em;
}

.about-info b {
  margin-right: 1em;
}

.about-date {
  margin-top: 1.5em;
  color: #aaa;
  font-size: 0.95em;
  text-align: right;
}

.caption-control {
  display: flex;
  flex-wrap: wrap;
  justify-content: center;
  margin: 30px;
}

.control-button {
  height: 40px;
  margin: 20px;
  font-size: 16px;
}
</style>
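`getEngineInfo()` above fetches process statistics over the `control.engine.info` channel each time the status popover is hovered, rather than polling. A plausible main-process counterpart is sketched below; the channel name and `EngineInfo` fields come from the renderer code, but the use of the `pidusage` package and the `engineProcess`/`enginePort` variables are illustrative assumptions, not necessarily how this project implements the handler:

```ts
// Hypothetical main-process handler for 'control.engine.info'.
import { ipcMain } from 'electron'
import pidusage from 'pidusage'
import type { ChildProcess } from 'child_process'

let engineProcess: ChildProcess | null = null  // set when the engine is spawned
let enginePort = 0                             // set when the engine reports its port

ipcMain.handle('control.engine.info', async () => {
  if (!engineProcess || engineProcess.pid === undefined) {
    return { pid: 0, ppid: 0, port: 0, cpu: 0, mem: 0, elapsed: 0 }
  }
  // pidusage reports cpu as a percentage, memory in bytes and elapsed in ms,
  // which matches the popover's formatting above (toFixed(1)%, MB, seconds)
  const stats = await pidusage(engineProcess.pid)
  return {
    pid: stats.pid,
    ppid: stats.ppid,
    port: enginePort,
    cpu: stats.cpu,
    mem: stats.memory,
    elapsed: stats.elapsed
  }
})
```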
112
src/renderer/src/components/GeneralSetting.vue
Normal file
@@ -0,0 +1,112 @@
<template>
  <a-card size="small" :title="$t('general.title')">
    <template #extra>
      <a-popover>
        <template #content>
          <p class="general-note">{{ $t('general.note') }}</p>
        </template>
        <a><InfoCircleOutlined /></a>
      </a-popover>
    </template>

    <div>
      <div class="input-item">
        <span class="input-label">{{ $t('general.uiLanguage') }}</span>
        <a-radio-group v-model:value="uiLanguage">
          <a-radio-button value="zh">中文</a-radio-button>
          <a-radio-button value="en">English</a-radio-button>
          <a-radio-button value="ja">日本語</a-radio-button>
        </a-radio-group>
      </div>

      <div class="input-item">
        <span class="input-label">{{ $t('general.theme') }}</span>
        <a-radio-group v-model:value="uiTheme">
          <a-radio-button value="system">{{ $t('general.system') }}</a-radio-button>
          <a-radio-button value="light">{{ $t('general.light') }}</a-radio-button>
          <a-radio-button value="dark">{{ $t('general.dark') }}</a-radio-button>
        </a-radio-group>
      </div>

      <div class="input-item">
        <span class="input-label">{{ $t('general.color') }}</span>
        <a-radio-group v-model:value="uiColor">
          <template v-for="color in colorList" :key="color">
            <a-radio-button
              :value="color"
              :style="{backgroundColor: color}"
            >
              <CheckOutlined style="color: white;" v-if="color === uiColor" />
              <span v-else> </span>
            </a-radio-button>
          </template>
        </a-radio-group>
      </div>

      <div class="input-item">
        <span class="input-label">{{ $t('general.barWidth') }}</span>
        <a-slider class="span-input"
          :min="6" :max="12" v-model:value="leftBarWidth"
        />
        <div class="input-item-value">{{ (leftBarWidth * 100 / 24).toFixed(0) }}%</div>
      </div>
    </div>
  </a-card>
</template>


<script setup lang="ts">
import { ref, watch } from 'vue'
import { storeToRefs } from 'pinia'
import { useGeneralSettingStore } from '@renderer/stores/generalSetting'
import { InfoCircleOutlined, CheckOutlined } from '@ant-design/icons-vue'

const generalSettingStore = useGeneralSettingStore()
const { uiLanguage, realTheme, uiTheme, uiColor, leftBarWidth } = storeToRefs(generalSettingStore)

const colorListLight = [
  '#1677ff',
  '#00b96b',
  '#fa8c16',
  '#9254de',
  '#eb2f96',
  '#000000'
]

const colorListDark = [
  '#1677ff',
  '#00b96b',
  '#fa8c16',
  '#9254de',
  '#eb2f96',
  '#b9d7ea'
]

// Initialize from the theme currently in effect; the watcher alone would
// leave the light palette showing when the app starts in dark mode
const colorList = ref(realTheme.value === 'dark' ? colorListDark : colorListLight)

// Swap the accent palette when the effective theme changes; only the last
// swatch (black vs. light blue) differs between the two lists
watch(realTheme, (val) => {
  colorList.value = val === 'dark' ? colorListDark : colorListLight
})
</script>


<style scoped>
@import url(../assets/input.css);

.span-input {
  display: inline-block;
  width: 100px;
  margin: 0;
}

.general-note {
  padding: 10px 10px 0;
  max-width: min(36vw, 400px);
}
</style>
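GeneralSetting.vue binds straight to refs from the general-setting store and keys the accent palette on `realTheme` rather than `uiTheme`, so the "system" option can track the OS preference. A minimal sketch of such a store follows; the field names match the component above, while the `matchMedia`-based resolution of 'system' into a concrete theme is an assumption about the implementation:

```ts
// Hypothetical sketch of the general-setting store consumed above.
import { defineStore } from 'pinia'
import { ref, computed } from 'vue'

export const useGeneralSettingStore = defineStore('generalSetting', () => {
  const uiLanguage = ref<'zh' | 'en' | 'ja'>('zh')
  const uiTheme = ref<'system' | 'light' | 'dark'>('system')
  const uiColor = ref('#1677ff')
  const leftBarWidth = ref(8)  // columns out of the 24-column grid

  // Track the OS preference so 'system' resolves to a concrete theme
  const media = window.matchMedia('(prefers-color-scheme: dark)')
  const prefersDark = ref(media.matches)
  media.addEventListener('change', e => { prefersDark.value = e.matches })

  // The theme actually in effect; GeneralSetting.vue watches this to
  // swap between the light and dark accent-color palettes
  const realTheme = computed<'light' | 'dark'>(() =>
    uiTheme.value === 'system'
      ? (prefersDark.value ? 'dark' : 'light')
      : uiTheme.value
  )

  return { uiLanguage, uiTheme, uiColor, leftBarWidth, realTheme }
})
```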