1 Commit

Author: himeditator
Commit: 36636d0caa
Date: 2025-08-02 15:40:13 +08:00

feat(engine): add caption window width memory and improve caption engine shutdown logic
- Add a captionWindowWidth property to persist the caption window width
- Rework the stop and kill methods in CaptionEngine to improve engine shutdown
- Update the README and add the list of candidate models

86 changed files with 975 additions and 3648 deletions

.gitignore vendored (9 changes)
View File

@@ -6,14 +6,7 @@ out
*.log*
__pycache__
.venv
test.py
subenv
engine/build
engine/portaudio
engine/pyinstaller_cache
engine/models
engine/notebook
# engine/main.spec
.repomap
.virtualme

README.md (184 changes)
View File

@@ -3,7 +3,7 @@
<h1 align="center">auto-caption</h1>
<p>Auto Caption 是一个跨平台的实时字幕显示软件。</p>
<p>
<a href="https://github.com/HiMeditator/auto-caption/releases"><img src="https://img.shields.io/badge/release-1.1.0-blue"></a>
<a href="https://github.com/HiMeditator/auto-caption/releases"><img src="https://img.shields.io/badge/release-0.6.0-blue"></a>
<a href="https://github.com/HiMeditator/auto-caption/issues"><img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange"></a>
<img src="https://img.shields.io/github/languages/top/HiMeditator/auto-caption?color=royalblue">
<img src="https://img.shields.io/github/repo-size/HiMeditator/auto-caption?color=green">
@@ -14,18 +14,14 @@
| <a href="./README_en.md">English</a>
| <a href="./README_ja.md">日本語</a> |
</p>
<p><i>v1.1.0 版本已经发布,新增 GLM-ASR 云端字幕模型和 OpenAI 兼容模型翻译...</i></p>
<p><i>v0.6.0 版本已经发布,对字幕引擎代码进行了大重构,提升了代码的可扩展性。更多的字幕引擎正在尝试开发中...</i></p>
</div>
![](./assets/media/main_zh.png)
## 📥 下载
软件下载:[GitHub Releases](https://github.com/HiMeditator/auto-caption/releases)
Vosk 模型下载:[Vosk Models](https://alphacephei.com/vosk/models)
SOSV 模型下载:[Sherpa-ONNX SenseVoice Model](https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model)
[GitHub Releases](https://github.com/HiMeditator/auto-caption/releases)
## 📚 相关文档
@@ -33,130 +29,51 @@ SOSV 模型下载:[ Shepra-ONNX SenseVoice Model](https://github.com/HiMeditat
[字幕引擎说明文档](./docs/engine-manual/zh.md)
[项目 API 文档](./docs/api-docs/)
[更新日志](./docs/CHANGELOG.md)
## 👁️‍🗨️ 预览
https://github.com/user-attachments/assets/9c188d78-9520-4397-bacf-4c8fdcc54874
## ✨ 特性
- 生成音频输出或麦克风输入的字幕
- 支持调用本地 Ollama 模型、云端 OpenAI 兼容模型、或云端 Google 翻译 API 进行翻译
- 跨平台Windows、macOS、Linux、多界面语言中文、英语、日语支持
- 丰富的字幕样式设置(字体、字体大小、字体粗细、字体颜色、背景颜色等)
- 灵活的字幕引擎选择(阿里云 Gummy 云端模型、GLM-ASR 云端模型、本地 Vosk 模型、本地 SOSV 模型、还可以自己开发模型)
- 多语言识别与翻译(见下文“⚙️ 自带字幕引擎说明”)
- 字幕记录展示与导出(支持导出 `.srt` 和 `.json` 格式)
## 📖 基本使用
> ⚠️ 注意:目前只维护了 Windows 平台的软件的最新版本,其他平台的最后版本停留在 v1.0.0。
软件已经适配了 Windows、macOS 和 Linux 平台。测试过的主流平台信息如下:
软件已经适配了 Windows、macOS 和 Linux 平台。测试过的平台信息如下:
| 操作系统版本 | 处理器架构 | 获取系统音频输入 | 获取系统音频输出 |
| ------------------ | ---------- | ---------------- | ---------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ [需要额外配置](./docs/user-manual/zh.md#macos-获取系统音频输出) | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅需要额外配置 | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
macOS 平台和 Linux 平台获取系统音频输出需要进行额外设置,详见 [Auto Caption 用户手册](./docs/user-manual/zh.md)。
macOS 平台和 Linux 平台获取系统音频输出需要进行额外设置,详见[Auto Caption 用户手册](./docs/user-manual/zh.md)。
下载软件后,需要根据自己的需求选择对应的模型,然后配置模型
> 国际版的阿里云服务并没有提供 Gummy 模型,因此目前非中国用户无法使用 Gummy 字幕引擎
| | 准确率 | 实时性 | 部署类型 | 支持语言 | 翻译 | 备注 |
| ------------------------------------------------------------ | -------- | ------------- | ---------- | ---------- | ---------------------------------------------------------- | ---------------------------------------------------------- |
| [Gummy](https://help.aliyun.com/zh/model-studio/gummy-speech-recognition-translation) | 很好😊 | 很好😊 | 云端 / 阿里云 | 10 种 | 自带翻译 | 收费0.54CNY / 小时 |
| [glm-asr-2512](https://docs.bigmodel.cn/cn/guide/models/sound-and-video/glm-asr-2512) | 很好😊 | 较差😞 | 云端 / 智谱 AI | 4 种 | 需额外配置 | 收费,约 0.72CNY / 小时 |
| [Vosk](https://alphacephei.com/vosk) | 较差😞 | 很好😊 | 本地 / CPU | 超过 30 种 | 需额外配置 | 支持的语言非常多 |
| [SOSV](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html) | 一般😐 | 一般😐 | 本地 / CPU | 5 种 | 需额外配置 | 仅有一个模型 |
| 自己开发 | 🤔 | 🤔 | 自定义 | 自定义 | 自定义 | 根据[文档](./docs/engine-manual/zh.md)使用 Python 自己开发 |
如果你选择的不是 Gummy 模型,你还需要配置自己的翻译模型。
### 配置翻译模型
![](./assets/media/engine_zh.png)
> 注意:翻译不是实时的,翻译模型只会在每句话识别完成后再调用。
#### Ollama 本地模型
> 注意:使用参数量过大的模型会导致资源消耗和翻译延迟较大。建议使用参数量小于 1B 的模型,比如: `qwen2.5:0.5b`, `qwen3:0.6b`。
使用该模型之前你需要确定本机安装了 [Ollama](https://ollama.com/) 软件,并已经下载了需要的大语言模型。只需要将需要调用的大模型名称添加到设置中的 `模型名称` 字段中,并保证 `Base URL` 字段为空。
#### OpenAI 兼容模型
如果觉得本地 Ollama 模型的翻译效果不佳,或者不想在本地安装 Ollama 模型,那么可以使用云端的 OpenAI 兼容模型。
以下是一些模型提供商的 `Base URL`
- OpenAI: https://api.openai.com/v1
- DeepSeekhttps://api.deepseek.com
- 阿里云https://dashscope.aliyuncs.com/compatible-mode/v1
API Key 需要在对应的模型提供商处获取。
#### Google 翻译 API
> 注意Google 翻译 API 在无法访问国际网络的地区无法使用。
无需任何配置,联网即可使用。
### 使用 Gummy 模型
> 国际版的阿里云服务似乎并没有提供 Gummy 模型,因此目前非中国用户可能无法使用 Gummy 字幕引擎。
如果要使用默认的 Gummy 字幕引擎(使用云端模型进行语音识别和翻译),首先需要获取阿里云百炼平台的 API KEY然后将 API KEY 添加到软件设置中(在字幕引擎设置的更多设置中)或者配置到环境变量中(仅 Windows 平台支持读取环境变量中的 API KEY这样才能正常使用该模型。相关教程:
如果要使用默认的 Gummy 字幕引擎(使用云端模型进行语音识别和翻译),首先需要获取阿里云百炼平台的 API KEY然后将 API KEY 添加到软件设置中或者配置到环境变量中(仅 Windows 平台支持读取环境变量中的 API KEY这样才能正常使用该模型。相关教程:
- [获取 API KEY](https://help.aliyun.com/zh/model-studio/get-api-key)
- [将 API Key 配置到环境变量](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
### 使用 GLM-ASR 模型
使用前需要获取智谱 AI 平台的 API KEY并添加到软件设置中。
API KEY 获取相关链接:[快速开始](https://docs.bigmodel.cn/cn/guide/start/quick-start)。
### 使用 Vosk 模型
> Vosk 模型的识别效果较差,请谨慎使用。
如果要使用 Vosk 本地字幕引擎,首先需要在 [Vosk Models](https://alphacephei.com/vosk/models) 页面下载你需要的模型,并将模型解压到本地,并将模型文件夹的路径添加到软件的设置中。
如果要使用 Vosk 本地字幕引擎,首先需要在 [Vosk Models](https://alphacephei.com/vosk/models) 页面下载你需要的模型,并将模型解压到本地,并将模型文件夹的路径添加到软件的设置中。目前 Vosk 字幕引擎还不支持翻译字幕内容。
![](./assets/media/config_zh.png)
![](./assets/media/vosk_zh.png)
### 使用 SOSV 模型
**如果你觉得上述字幕引擎不能满足你的需求,而且你会 Python那么你可以考虑开发自己的字幕引擎。详细说明请参考[字幕引擎说明文档](./docs/engine-manual/zh.md)。**
使用 SOSV 模型的方式和 Vosk 一样下载地址如下https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model
## ✨ 特性
## ⌨️ 在终端中使用
软件采用模块化设计,可分为软件主体和字幕引擎两部分,软件主体通过图形界面调用字幕引擎。核心的音频获取和音频识别功能都在字幕引擎中实现,而字幕引擎是可以脱离软件主体单独使用的。
字幕引擎使用 Python 开发,通过 PyInstaller 打包为可执行文件。因此字幕引擎有两种使用方式:
1. 使用项目字幕引擎部分的源代码,使用安装了对应库的 Python 环境进行运行
2. 使用打包好的字幕引擎的可执行文件,通过终端运行
运行参数和详细使用介绍请参考[用户手册](./docs/user-manual/zh.md#单独使用字幕引擎)。
```bash
python main.py \
-e gummy \
-k sk-******************************** \
-a 0 \
-d 1 \
-s en \
-t zh
```
![](./docs/img/07.png)
- 跨平台、多界面语言支持
- 丰富的字幕样式设置
- 灵活的字幕引擎选择
- 多语言识别与翻译
- 字幕记录展示与导出
- 生成音频输出或麦克风输入的字幕
## ⚙️ 自带字幕引擎说明
目前软件自带 4 个字幕引擎。它们的详细信息如下。
目前软件自带 2 个字幕引擎,正在规划新的引擎。它们的详细信息如下。
### Gummy 字幕引擎(云端)
@@ -183,18 +100,19 @@ $$
而且引擎只会获取到音频流的时候才会上传数据,因此实际上传速率可能更小。模型结果回传流量消耗较小,没有纳入考虑。
### GLM-ASR 字幕引擎(云端)
https://docs.bigmodel.cn/cn/guide/models/sound-and-video/glm-asr-2512
### Vosk 字幕引擎(本地)
基于 [vosk-api](https://github.com/alphacep/vosk-api) 开发。该字幕引擎的优点是可选的语言模型非常多(超过 30 种),缺点是识别效果比较差,且生成内容没有标点符号
基于 [vosk-api](https://github.com/alphacep/vosk-api) 开发。目前只支持生成音频对应的原文,不支持生成翻译内容
### 新规划字幕引擎
### SOSV 字幕引擎(本地)
以下为备选模型,将根据模型效果和集成难易程度选择。
- [faster-whisper](https://github.com/SYSTRAN/faster-whisper)
- [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx)
- [SenseVoice](https://github.com/FunAudioLLM/SenseVoice)
- [FunASR](https://github.com/modelscope/FunASR)
[SOSV](https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model) 是一个整合包,该整合包主要基于 [Sherpa-ONNX SenseVoice](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html),并添加了端点检测模型和标点恢复模型。该模型支持识别的语言有:英语、中文、日语、韩语、粤语。
## 🚀 项目运行
@@ -211,26 +129,36 @@ npm install
首先进入 `engine` 文件夹,执行如下指令创建虚拟环境(需要使用大于等于 Python 3.10 的 Python 运行环境,建议使用 Python 3.12:
```bash
cd ./engine
# in ./engine folder
python -m venv .venv
python -m venv subenv
# or
python3 -m venv .venv
python3 -m venv subenv
```
然后激活虚拟环境:
```bash
# Windows
.venv/Scripts/activate
subenv/Scripts/activate
# Linux or macOS
source .venv/bin/activate
source subenv/bin/activate
```
然后安装依赖(这一步在 macOS 和 Linux 可能会报错,一般是因为构建失败,需要根据报错信息进行处理):
```bash
pip install -r requirements.txt
# Windows
pip install -r requirements_win.txt
# macOS
pip install -r requirements_darwin.txt
# Linux
pip install -r requirements_linux.txt
```
如果在 Linux 系统上安装 `samplerate` 模块报错,可以尝试使用以下命令单独安装:
```bash
pip install samplerate --only-binary=:all:
```
然后使用 `pyinstaller` 构建项目:
@@ -243,12 +171,12 @@ pyinstaller ./main.spec
```
# Windows
vosk_path = str(Path('./.venv/Lib/site-packages/vosk').resolve())
vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
# Linux or macOS
vosk_path = str(Path('./.venv/lib/python3.x/site-packages/vosk').resolve())
vosk_path = str(Path('./subenv/lib/python3.x/site-packages/vosk').resolve())
```
此时项目构建完成,进入 `engine/dist` 文件夹可见对应的可执行文件。即可进行后续操作。
### 运行项目
@@ -266,3 +194,15 @@ npm run build:mac
# For Linux
npm run build:linux
```
注意,根据不同的平台需要修改项目根目录下 `electron-builder.yml` 文件中的配置内容:
```yml
extraResources:
# For Windows
- from: ./engine/dist/main.exe
to: ./engine/main.exe
# For macOS and Linux
# - from: ./engine/dist/main
# to: ./engine/main
```

View File

@@ -3,7 +3,7 @@
<h1 align="center">auto-caption</h1>
<p>Auto Caption is a cross-platform real-time caption display software.</p>
<p>
<a href="https://github.com/HiMeditator/auto-caption/releases"><img src="https://img.shields.io/badge/release-1.1.0-blue"></a>
<a href="https://github.com/HiMeditator/auto-caption/releases"><img src="https://img.shields.io/badge/release-0.6.0-blue"></a>
<a href="https://github.com/HiMeditator/auto-caption/issues"><img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange"></a>
<img src="https://img.shields.io/github/languages/top/HiMeditator/auto-caption?color=royalblue">
<img src="https://img.shields.io/github/repo-size/HiMeditator/auto-caption?color=green">
@@ -14,18 +14,14 @@
| <b>English</b>
| <a href="./README_ja.md">日本語</a> |
</p>
<p><i>v1.1.0 has been released, adding the GLM-ASR cloud caption model and OpenAI compatible model translation...</i></p>
<p><i>Version 0.6.0 has been released, featuring a major refactor of the subtitle engine code to improve code extensibility. More subtitle engines are being developed...</i></p>
</div>
![](./assets/media/main_en.png)
## 📥 Download
Software Download: [GitHub Releases](https://github.com/HiMeditator/auto-caption/releases)
Vosk Model Download: [Vosk Models](https://alphacephei.com/vosk/models)
SOSV Model Download: [Sherpa-ONNX SenseVoice Model](https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model)
[GitHub Releases](https://github.com/HiMeditator/auto-caption/releases)
## 📚 Documentation
@@ -33,131 +29,51 @@ SOSV Model Download: [Shepra-ONNX SenseVoice Model](https://github.com/HiMeditat
[Caption Engine Documentation](./docs/engine-manual/en.md)
[Project API Documentation (Chinese)](./docs/api-docs/)
[Changelog](./docs/CHANGELOG.md)
## 👁️‍🗨️ Preview
https://github.com/user-attachments/assets/9c188d78-9520-4397-bacf-4c8fdcc54874
## ✨ Features
- Generate captions from audio output or microphone input
- Supports calling local Ollama models, cloud-based OpenAI compatible models, or cloud-based Google Translate API for translation
- Cross-platform (Windows, macOS, Linux) and multi-language interface (Chinese, English, Japanese) support
- Rich caption style settings (font, font size, font weight, font color, background color, etc.)
- Flexible caption engine selection (Aliyun Gummy cloud model, GLM-ASR cloud model, local Vosk model, local SOSV model, or you can develop your own model)
- Multi-language recognition and translation (see below "⚙️ Built-in Subtitle Engines")
- Subtitle record display and export (supports exporting `.srt` and `.json` formats)
## 📖 Basic Usage
> ⚠️ Note: Currently, only the latest version of the software on Windows platform is maintained, while the last versions for other platforms remain at v1.0.0.
The software has been adapted for Windows, macOS, and Linux platforms. The tested platform information is as follows:
| OS Version | Architecture | System Audio Input | System Audio Output |
| ------------------ | ------------ | ------------------ | ------------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ [Additional config required](./docs/user-manual/en.md#capturing-system-audio-output-on-macos) | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ Additional config required | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
Additional configuration is required to capture system audio output on macOS and Linux platforms. See [Auto Caption User Manual](./docs/user-manual/en.md) for details.
> The international version of Alibaba Cloud services does not provide the Gummy model, so non-Chinese users currently cannot use the Gummy caption engine.
After downloading the software, you need to select the corresponding model according to your needs and then configure the model.
To use the default Gummy caption engine (which uses cloud-based models for speech recognition and translation), you first need to obtain an API KEY from the Alibaba Cloud Bailian platform. Then add the API KEY to the software settings or configure it in environment variables (only Windows platform supports reading API KEY from environment variables) to properly use this model. Related tutorials:
| | Accuracy | Real-time | Deployment Type | Supported Languages | Translation | Notes |
| ------------------------------------------------------------ | -------- | --------- | --------------- | ------------------- | ----------- | ----- |
| [Gummy](https://help.aliyun.com/zh/model-studio/gummy-speech-recognition-translation) | Very good 😊 | Very good 😊 | Cloud / Alibaba Cloud | 10 languages | Built-in translation | Paid, 0.54 CNY/hour |
| [glm-asr-2512](https://docs.bigmodel.cn/cn/guide/models/sound-and-video/glm-asr-2512) | Very good 😊 | Poor 😞 | Cloud / Zhipu AI | 4 languages | Requires additional configuration | Paid, approximately 0.72 CNY/hour |
| [Vosk](https://alphacephei.com/vosk) | Poor 😞 | Very good 😊 | Local / CPU | Over 30 languages | Requires additional configuration | Supports many languages |
| [SOSV](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html) | Average 😐 | Average 😐 | Local / CPU | 5 languages | Requires additional configuration | Only one model |
| Self-developed | 🤔 | 🤔 | Custom | Custom | Custom | Develop your own using Python according to the [documentation](./docs/engine-manual/en.md) |
- [Obtaining API KEY (Chinese)](https://help.aliyun.com/zh/model-studio/get-api-key)
- [Configuring API Key through Environment Variables (Chinese)](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
If you choose a model other than Gummy, you also need to configure your own translation model.
> The recognition performance of Vosk models is suboptimal, please use with caution.
### Configuring Translation Models
To use the Vosk local caption engine, first download your required model from [Vosk Models](https://alphacephei.com/vosk/models) page, extract the model locally, and add the model folder path to the software settings. Currently, the Vosk caption engine does not support translated captions.
![](./assets/media/engine_en.png)
![](./assets/media/vosk_en.png)
> Note: Translation is not real-time. The translation model is only called after each sentence recognition is completed.
**If you find the above caption engines don't meet your needs and you know Python, you may consider developing your own caption engine. For detailed instructions, please refer to the [Caption Engine Documentation](./docs/engine-manual/en.md).**
#### Ollama Local Model
## ✨ Features
> Note: Using models with too many parameters will lead to high resource consumption and translation delays. It is recommended to use models with less than 1B parameters, such as: `qwen2.5:0.5b`, `qwen3:0.6b`.
Before using this model, you need to confirm that the [Ollama](https://ollama.com/) software is installed on your local machine and that you have downloaded the required large language model. Simply add the name of the large model you want to call to the `Model Name` field in the settings, and ensure that the `Base URL` field is empty.
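For reference, the following is a minimal sketch of the kind of request a local Ollama translation call boils down to. It assumes Ollama is serving its default REST API on `localhost:11434`; the model name (`qwen3:0.6b` here) corresponds to the value entered in the `Model Name` field, and the prompt wording is illustrative only, not the app's actual internals.

```python
import json
import urllib.request

def ollama_translate_sketch(model: str, target_lang: str, text: str) -> str:
    """Ask a local Ollama model to translate one recognized sentence (illustrative sketch)."""
    payload = {
        "model": model,  # e.g. "qwen3:0.6b", the same value as the Model Name field
        "prompt": f"Translate the following text into {target_lang}. "
                  f"Reply with the translation only.\n\n{text}",
        "stream": False,  # request a single JSON response instead of a token stream
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # Ollama's default local endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()

# Example: translate one English caption line into Chinese
# print(ollama_translate_sketch("qwen3:0.6b", "Chinese", "This is a translation test."))
```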
#### OpenAI Compatible Model
If you feel the translation effect of the local Ollama model is not good enough, or don't want to install the Ollama model locally, then you can use cloud-based OpenAI compatible models.
Here are some model provider `Base URL`s:
- OpenAI: https://api.openai.com/v1
- DeepSeek: https://api.deepseek.com
- Alibaba Cloud: https://dashscope.aliyuncs.com/compatible-mode/v1
The API Key needs to be obtained from the corresponding model provider.
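As a rough illustration (not the app's actual internals), any OpenAI-compatible endpoint can be driven with the official `openai` Python client by pointing `base_url` at one of the providers above; the model name and prompt below are placeholders.

```python
from openai import OpenAI

# Base URL and API key correspond to the fields of the same name in the app's settings.
client = OpenAI(
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # any OpenAI-compatible provider
    api_key="sk-...",                                              # key issued by that provider
)

def translate(text: str, target_lang: str = "Chinese", model: str = "qwen-turbo") -> str:
    """Translate one finished sentence; called once per recognized sentence, not per audio chunk."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": f"Translate the user's text into {target_lang}. Output only the translation."},
            {"role": "user", "content": text},
        ],
    )
    return (resp.choices[0].message.content or "").strip()
```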
#### Google Translate API
> Note: Google Translate API is not available in some regions.
No configuration required, just connect to the internet to use.
### Using Gummy Model
> The international version of Alibaba Cloud services does not seem to provide the Gummy model, so non-Chinese users may not be able to use the Gummy caption engine at present.
To use the default Gummy caption engine (using cloud models for speech recognition and translation), you first need to obtain an API KEY from Alibaba Cloud Bailian platform, then add the API KEY to the software settings or configure it in the environment variables (only Windows platform supports reading API KEY from environment variables), so that the model can be used normally. Related tutorials:
- [Get API KEY](https://help.aliyun.com/zh/model-studio/get-api-key)
- [Configure API Key through Environment Variables](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
### Using the GLM-ASR Model
Before using it, you need to obtain an API KEY from the Zhipu AI platform and add it to the software settings.
For API KEY acquisition, see: [Quick Start](https://docs.bigmodel.cn/en/guide/start/quick-start).
### Using Vosk Model
> The recognition effect of the Vosk model is poor, please use it with caution.
To use the Vosk local caption engine, first download the model you need from the [Vosk Models](https://alphacephei.com/vosk/models) page, unzip the model locally, and add the path of the model folder to the software settings.
![](./assets/media/config_en.png)
### Using SOSV Model
The way to use the SOSV model is the same as Vosk. The download address is as follows: https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model
## ⌨️ Using in Terminal
The software adopts a modular design and can be divided into two parts: the main software body and the caption engine; the main program calls the caption engine through a graphical interface. The core audio capture and speech recognition functions are implemented in the caption engine, which can be used independently of the main software.
The caption engine is developed in Python and packaged as an executable file via PyInstaller. Therefore, there are two ways to use the caption engine:
1. Use the source code of the project's caption engine part and run it with a Python environment that has the required libraries installed
2. Run the packaged executable file of the caption engine through the terminal
For runtime parameters and detailed usage instructions, please refer to the [User Manual](./docs/user-manual/en.md#using-caption-engine-standalone).
```bash
python main.py \
-e gummy \
-k sk-******************************** \
-a 0 \
-d 1 \
-s en \
-t zh
```
![](./docs/img/07.png)
- Cross-platform, multi-language UI support
- Rich caption style settings
- Flexible caption engine selection
- Multi-language recognition and translation
- Caption recording display and export
- Generate captions for audio output or microphone input
## ⚙️ Built-in Subtitle Engines
Currently, the software comes with 4 caption engines, with new engines under development. Their detailed information is as follows.
Currently, the software comes with 2 subtitle engines, with new engines under development. Their detailed information is as follows.
### Gummy Subtitle Engine (Cloud)
@@ -176,7 +92,7 @@ Developed based on Tongyi Lab's [Gummy Speech Translation Model](https://help.al
**Network Traffic Consumption:**
The caption engine uses native sample rate (assumed to be 48kHz) for sampling, with 16bit sample depth and mono channel, so the upload rate is approximately:
The subtitle engine uses native sample rate (assumed to be 48kHz) for sampling, with 16bit sample depth and mono channel, so the upload rate is approximately:
$$
48000\ \text{samples/second} \times 2\ \text{bytes/sample} \times 1\ \text{channel} = 93.75\ \text{KB/s}
@@ -184,17 +100,18 @@ $$
The engine only uploads data when receiving audio streams, so the actual upload rate may be lower. The return traffic consumption of model results is small and not considered here.
### GLM-ASR Caption Engine (Cloud)
https://docs.bigmodel.cn/en/guide/models/sound-and-video/glm-asr-2512
### Vosk Subtitle Engine (Local)
Developed based on [vosk-api](https://github.com/alphacep/vosk-api). The advantage of this caption engine is that there are many optional language models (over 30 languages), but the disadvantage is that the recognition effect is relatively poor, and the generated content has no punctuation.
Developed based on [vosk-api](https://github.com/alphacep/vosk-api). Currently only supports generating original text from audio, does not support translation content.
### SOSV Subtitle Engine (Local)
### Planned New Subtitle Engines
[SOSV](https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model) is an integrated package, mainly based on [Sherpa-ONNX SenseVoice](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html), with an added endpoint detection model and a punctuation restoration model. The languages supported by this model for recognition are: English, Chinese, Japanese, Korean, and Cantonese.
The following are candidate models that will be selected based on model performance and ease of integration.
- [faster-whisper](https://github.com/SYSTRAN/faster-whisper)
- [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx)
- [SenseVoice](https://github.com/FunAudioLLM/SenseVoice)
- [FunASR](https://github.com/modelscope/FunASR)
## 🚀 Project Setup
@@ -211,26 +128,36 @@ npm install
First enter the `engine` folder and execute the following commands to create a virtual environment (requires Python 3.10 or higher, with Python 3.12 recommended):
```bash
cd ./engine
# in ./engine folder
python -m venv .venv
python -m venv subenv
# or
python3 -m venv .venv
python3 -m venv subenv
```
Then activate the virtual environment:
```bash
# Windows
.venv/Scripts/activate
subenv/Scripts/activate
# Linux or macOS
source .venv/bin/activate
source subenv/bin/activate
```
Then install dependencies (this step might result in errors on macOS and Linux, usually due to build failures, and you need to handle them based on the error messages):
```bash
pip install -r requirements.txt
# Windows
pip install -r requirements_win.txt
# macOS
pip install -r requirements_darwin.txt
# Linux
pip install -r requirements_linux.txt
```
If you encounter errors when installing the `samplerate` module on Linux systems, you can try installing it separately with this command:
```bash
pip install samplerate --only-binary=:all:
```
Then use `pyinstaller` to build the project:
@@ -243,9 +170,9 @@ Note that the path to the `vosk` library in `main-vosk.spec` might be incorrect
```
# Windows
vosk_path = str(Path('./.venv/Lib/site-packages/vosk').resolve())
vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
# Linux or macOS
vosk_path = str(Path('./.venv/lib/python3.x/site-packages/vosk').resolve())
vosk_path = str(Path('./subenv/lib/python3.x/site-packages/vosk').resolve())
```
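If you prefer not to hardcode the site-packages layout, a possible alternative inside the spec file is to locate the installed package directly. This is only a sketch and assumes the spec is executed with the virtual environment's Python (which is the case when `pyinstaller` is run from inside that environment):

```python
from pathlib import Path
import vosk  # resolves to the copy installed in the active (sub)env

# Works on Windows, Linux and macOS without spelling out Lib/ vs lib/python3.x/
vosk_path = str(Path(vosk.__file__).resolve().parent)
```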
After the build completes, you can find the executable file in the `engine/dist` folder. Then proceed with subsequent operations.
@@ -266,3 +193,15 @@ npm run build:mac
# For Linux
npm run build:linux
```
Note: You need to modify the configuration content in the `electron-builder.yml` file in the project root directory according to different platforms:
```yml
extraResources:
# For Windows
- from: ./engine/dist/main.exe
to: ./engine/main.exe
# For macOS and Linux
# - from: ./engine/dist/main
# to: ./engine/main
```

View File

@@ -3,7 +3,7 @@
<h1 align="center">auto-caption</h1>
<p>Auto Caption はクロスプラットフォームのリアルタイム字幕表示ソフトウェアです。</p>
<p>
<a href="https://github.com/HiMeditator/auto-caption/releases"><img src="https://img.shields.io/badge/release-1.1.0-blue"></a>
<a href="https://github.com/HiMeditator/auto-caption/releases"><img src="https://img.shields.io/badge/release-0.6.0-blue"></a>
<a href="https://github.com/HiMeditator/auto-caption/issues"><img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange"></a>
<img src="https://img.shields.io/github/languages/top/HiMeditator/auto-caption?color=royalblue">
<img src="https://img.shields.io/github/repo-size/HiMeditator/auto-caption?color=green">
@@ -14,18 +14,14 @@
| <a href="./README_en.md">English</a>
| <b>日本語</b> |
</p>
<p><i>v1.1.0 バージョンがリリースされました。GLM-ASR クラウド字幕モデルと OpenAI 互換モデル翻訳が追加されました...</i></p>
<p><i>v0.6.0 バージョンがリリースされ、字幕エンジンコードが大規模にリファクタリングされ、コードの拡張性が向上しました。より多くの字幕エンジンの開発が試みられています...</i></p>
</div>
![](./assets/media/main_ja.png)
## 📥 ダウンロード
ソフトウェアダウンロード: [GitHub Releases](https://github.com/HiMeditator/auto-caption/releases)
Vosk モデルダウンロード: [Vosk Models](https://alphacephei.com/vosk/models)
SOSV モデルダウンロード: [Sherpa-ONNX SenseVoice Model](https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model)
[GitHub Releases](https://github.com/HiMeditator/auto-caption/releases)
## 📚 関連ドキュメント
@@ -33,132 +29,51 @@ SOSV モデルダウンロード: [Shepra-ONNX SenseVoice Model](https://github.
[字幕エンジン説明ドキュメント](./docs/engine-manual/ja.md)
[プロジェクト API ドキュメント(中国語)](./docs/api-docs/)
[更新履歴](./docs/CHANGELOG.md)
## 👁️‍🗨️ プレビュー
https://github.com/user-attachments/assets/9c188d78-9520-4397-bacf-4c8fdcc54874
## ✨ 特徴
- 音声出力またはマイク入力からの字幕生成
- ローカルのOllamaモデル、クラウド上のOpenAI互換モデル、またはクラウド上のGoogle翻訳APIを呼び出して翻訳を行うことをサポートしています
- クロスプラットフォームWindows、macOS、Linux、多言語インターフェース中国語、英語、日本語対応
- 豊富な字幕スタイル設定(フォント、フォントサイズ、フォント太さ、フォント色、背景色など)
- 柔軟な字幕エンジン選択(阿里云Gummyクラウドモデル、GLM-ASRクラウドモデル、ローカルVoskモデル、ローカルSOSVモデル、または独自にモデルを開発可能)
- 多言語認識と翻訳(下記「⚙️ 字幕エンジン説明」参照)
- 字幕記録表示とエクスポート(`.srt` および `.json` 形式のエクスポートに対応)
## 📖 基本使い方
> ⚠️ 注意現在、Windowsプラットフォームのソフトウェアの最新バージョンのみがメンテナンスされており、他のプラットフォームの最終バージョンはv1.0.0のままです。
このソフトウェアは Windows、macOS、Linux プラットフォームに対応しています。テスト済みのプラットフォーム情報は以下の通りです:
このソフトウェアはWindows、macOS、Linuxプラットフォームに対応しています。テスト済みのプラットフォーム情報は以下の通りです
| OS バージョン | アーキテクチャ | システムオーディオ入力 | システムオーディオ出力 |
| ------------------ | ------------ | ------------------ | ------------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ [追加設定が必要](./docs/user-manual/ja.md#macos-でのシステムオーディオ出力の取得方法) | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ 追加設定が必要 | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ | ✅ |
| Kali Linux 2022.3 | x64 | ✅ | ✅ |
| Kylin Server V10 SP3 | x64 | ✅ | ✅ |
macOS および Linux プラットフォームでシステムオーディオ出力を取得するには追加設定が必要です。詳細は[Auto Captionユーザーマニュアル](./docs/user-manual/ja.md)をご覧ください。
macOSおよびLinuxプラットフォームでシステムオーディオ出力を取得するには追加設定が必要です。詳細は[Auto Captionユーザーマニュアル](./docs/user-manual/ja.md)をご覧ください。
ソフトウェアをダウンロードした後、自分のニーズに応じて対応するモデルを選択し、モデルを設定する必要があります
> 阿里雲の国際版サービスでは Gummy モデルを提供していないため、現在中国以外のユーザーは Gummy 字幕エンジンを使用できません
| | 正確性 | 実時間性 | デプロイタイプ | 対応言語 | 翻訳 | 備考 |
| ------------------------------------------------------------ | -------- | --------- | -------------- | -------- | ---- | ---- |
| [Gummy](https://help.aliyun.com/zh/model-studio/gummy-speech-recognition-translation) | とても良い😊 | とても良い😊 | クラウド / アリババクラウド | 10言語 | 内蔵翻訳 | 有料、0.54元/時間 |
| [glm-asr-2512](https://docs.bigmodel.cn/cn/guide/models/sound-and-video/glm-asr-2512) | とても良い😊 | 悪い😞 | クラウド / Zhipu AI | 4言語 | 追加設定が必要 | 有料、約0.72元/時間 |
| [Vosk](https://alphacephei.com/vosk) | 悪い😞 | とても良い😊 | ローカル / CPU | 30言語以上 | 追加設定が必要 | 多くの言語に対応 |
| [SOSV](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html) | 普通😐 | 普通😐 | ローカル / CPU | 5言語 | 追加設定が必要 | 1つのモデルのみ |
| 自分で開発 | 🤔 | 🤔 | カスタム | カスタム | カスタム | [ドキュメント](./docs/engine-manual/ja.md)に従ってPythonを使用して自分で開発 |
デフォルトの Gummy 字幕エンジン(クラウドベースのモデルを使用した音声認識と翻訳)を使用するには、まず阿里雲百煉プラットフォームから API KEY を取得する必要があります。その後、API KEY をソフトウェア設定に追加するか、環境変数に設定しますWindows プラットフォームのみ環境変数からの API KEY 読み取りをサポート)。関連チュートリアル:
Gummyモデル以外を選択した場合、独自の翻訳モデルを設定する必要があります。
- [API KEY の取得(中国語)](https://help.aliyun.com/zh/model-studio/get-api-key)
- [環境変数を通じて API Key を設定(中国語)](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
### 翻訳モデルの設定
> Vosk モデルの認識精度は低いため、注意してご使用ください。
![](./assets/media/engine_ja.png)
Vosk ローカル字幕エンジンを使用するには、まず [Vosk Models](https://alphacephei.com/vosk/models) ページから必要なモデルをダウンロードし、ローカルに解凍した後、モデルフォルダのパスをソフトウェア設定に追加してください。現在、Vosk 字幕エンジンは字幕の翻訳をサポートしていません。
> 注意:翻訳はリアルタイムではありません。翻訳モデルは各文の認識が完了した後にのみ呼び出されます。
![](./assets/media/vosk_ja.png)
#### Ollama ローカルモデル
**上記の字幕エンジンがご要望を満たさず、かつ Python の知識をお持ちの場合、独自の字幕エンジンを開発することも可能です。詳細な説明は[字幕エンジン説明書](./docs/engine-manual/ja.md)をご参照ください。**
> 注意パラメータ数が多すぎるモデルを使用すると、リソース消費と翻訳遅延が大きくなります。1B未満のパラメータ数のモデルを使用することを推奨します。例`qwen2.5:0.5b`、`qwen3:0.6b`。
## ✨ 特徴
このモデルを使用する前に、ローカルマシンに[Ollama](https://ollama.com/)ソフトウェアがインストールされており、必要な大規模言語モデルをダウンロード済みであることを確認してください。設定で呼び出す必要がある大規模モデル名を「モデル名」フィールドに入力し、「Base URL」フィールドが空であることを確認してください。
#### OpenAI互換モデル
ローカルのOllamaモデルの翻訳効果が良くないと感じる場合や、ローカルにOllamaモデルをインストールしたくない場合は、クラウド上のOpenAI互換モデルを使用できます。
いくつかのモデルプロバイダの「Base URL」
- OpenAI: https://api.openai.com/v1
- DeepSeek: https://api.deepseek.com
- アリババクラウド: https://dashscope.aliyuncs.com/compatible-mode/v1
API Keyは対応するモデルプロバイダから取得する必要があります。
#### Google翻訳API
> 注意Google翻訳APIは一部の地域では使用できません。
設定不要で、ネット接続があれば使用できます。
### Gummyモデルの使用
> 阿里云の国際版サービスにはGummyモデルが提供されていないため、現在中国以外のユーザーはGummy字幕エンジンを使用できない可能性があります。
デフォルトのGummy字幕エンジン(クラウドモデルを使用した音声認識と翻訳)を使用するには、まず阿里云百煉プラットフォームのAPI KEYを取得し、API KEYをソフトウェア設定に追加するか環境変数に設定する必要がありますWindowsプラットフォームのみ環境変数からのAPI KEY読み取りをサポート。関連チュートリアル:
- [API KEYの取得](https://help.aliyun.com/zh/model-studio/get-api-key)
- [環境変数へのAPI Keyの設定](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
### GLM-ASR モデルの使用
使用前に、Zhipu AI プラットフォームから API キーを取得し、それをソフトウェアの設定に追加する必要があります。
API キーの取得についてはこちらをご覧ください:[クイックスタート](https://docs.bigmodel.cn/ja/guide/start/quick-start)。
### Voskモデルの使用
> Voskモデルの認識効果は不良のため、注意して使用してください。
Voskローカル字幕エンジンを使用するには、まず[Vosk Models](https://alphacephei.com/vosk/models)ページから必要なモデルをダウンロードし、ローカルにモデルを解凍し、モデルフォルダのパスをソフトウェア設定に追加してください。
![](./assets/media/config_ja.png)
### SOSVモデルの使用
SOSVモデルの使用方法はVoskと同じで、ダウンロードアドレスは以下の通りですhttps://github.com/HiMeditator/auto-caption/releases/tag/sosv-model
## ⌨️ ターミナルでの使用
ソフトウェアはモジュール化設計を採用しており、ソフトウェア本体と字幕エンジンの2つの部分に分けることができます。ソフトウェア本体はグラフィカルインターフェースを通じて字幕エンジンを呼び出します。コアとなる音声取得および音声認識機能はすべて字幕エンジンに実装されており、字幕エンジンはソフトウェア本体から独立して単独で使用できます。
字幕エンジンはPythonを使用して開発され、PyInstallerによって実行可能ファイルとしてパッケージ化されます。したがって、字幕エンジンの使用方法は以下の2つがあります
1. プロジェクトの字幕エンジン部分のソースコードを使用し、必要なライブラリがインストールされたPython環境で実行する
2. パッケージ化された字幕エンジンの実行可能ファイルをターミナルから実行する
実行引数および詳細な使用方法については、[User Manual](./docs/user-manual/en.md#using-caption-engine-standalone)をご参照ください。
```bash
python main.py \
-e gummy \
-k sk-******************************** \
-a 0 \
-d 1 \
-s en \
-t zh
```
![](./docs/img/07.png)
- クロスプラットフォーム、多言語 UI サポート
- 豊富な字幕スタイル設定
- 柔軟な字幕エンジン選択
- 多言語認識と翻訳
- 字幕記録の表示とエクスポート
- オーディオ出力またはマイク入力からの字幕生成
## ⚙️ 字幕エンジン説明
現在、ソフトウェアには4つの字幕エンジンが搭載されており、新しいエンジンが計画されています。それらの詳細情報は以下の通りです。
現在、ソフトウェアには2つの字幕エンジンが搭載されており、新しいエンジンが計画されています。それらの詳細情報は以下の通りです。
### Gummy 字幕エンジン(クラウド)
@@ -185,17 +100,18 @@ $$
また、エンジンはオーディオストリームを取得したときのみデータをアップロードするため、実際のアップロードレートはさらに小さくなる可能性があります。モデル結果の返信トラフィック消費量は小さく、ここでは考慮していません。
### GLM-ASR 字幕エンジン(クラウド)
https://docs.bigmodel.cn/ja/guide/models/sound-and-video/glm-asr-2512
### Vosk字幕エンジンローカル
[vosk-api](https://github.com/alphacep/vosk-api)をベースに開発。この字幕エンジンの利点は選択可能な言語モデルが非常に多く30言語以上、欠点は認識効果が比較的悪く、生成内容に句読点がないことです
[vosk-api](https://github.com/alphacep/vosk-api) をベースに開発されています。現在は音声に対応する原文の生成のみをサポートしており、翻訳コンテンツはサポートしていません
### SOSV 字幕エンジン(ローカル)
### 新規計画字幕エンジン
[SOSV](https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model)は統合パッケージで、主に[Sherpa-ONNX SenseVoice](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html)をベースにし、エンドポイント検出モデルと句読点復元モデルを追加しています。このモデルが認識をサポートする言語は:英語、中国語、日本語、韓国語、広東語です。
以下は候補モデルであり、モデルの性能と統合の容易さに基づいて選択されます。
- [faster-whisper](https://github.com/SYSTRAN/faster-whisper)
- [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx)
- [SenseVoice](https://github.com/FunAudioLLM/SenseVoice)
- [FunASR](https://github.com/modelscope/FunASR)
## 🚀 プロジェクト実行
@@ -212,26 +128,36 @@ npm install
まず `engine` フォルダに入り、以下のコマンドを実行して仮想環境を作成しますPython 3.10 以上が必要で、Python 3.12 が推奨されます):
```bash
cd ./engine
# ./engine フォルダ内
python -m venv .venv
python -m venv subenv
# または
python3 -m venv .venv
python3 -m venv subenv
```
次に仮想環境をアクティブにします:
```bash
# Windows
.venv/Scripts/activate
subenv/Scripts/activate
# Linux または macOS
source .venv/bin/activate
source subenv/bin/activate
```
次に依存関係をインストールします(このステップでは macOS と Linux でエラーが発生する可能性があります。通常はビルド失敗によるもので、エラーメッセージに基づいて対処する必要があります):
```bash
pip install -r requirements.txt
# Windows
pip install -r requirements_win.txt
# macOS
pip install -r requirements_darwin.txt
# Linux
pip install -r requirements_linux.txt
```
Linux システムで `samplerate` モジュールのインストールに問題が発生した場合、以下のコマンドで個別にインストールを試すことができます:
```bash
pip install samplerate --only-binary=:all:
```
その後、`pyinstaller` を使用してプロジェクトをビルドします:
@@ -244,9 +170,9 @@ pyinstaller ./main.spec
```
# Windows
vosk_path = str(Path('./.venv/Lib/site-packages/vosk').resolve())
vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
# Linux または macOS
vosk_path = str(Path('./.venv/lib/python3.x/site-packages/vosk').resolve())
vosk_path = str(Path('./subenv/lib/python3.x/site-packages/vosk').resolve())
```
これでプロジェクトのビルドが完了し、`engine/dist` フォルダ内に対応する実行可能ファイルが確認できます。その後、次の操作に進むことができます。
@@ -267,3 +193,15 @@ npm run build:mac
# Linux 用
npm run build:linux
```
注意: プラットフォームに応じて、プロジェクトルートディレクトリにある `electron-builder.yml` ファイルの設定内容を変更する必要があります:
```yml
extraResources:
# Windows 用
- from: ./engine/dist/main.exe
to: ./engine/main.exe
# macOS と Linux 用
# - from: ./engine/dist/main
# to: ./engine/main
```

(Binary image assets under assets/media/ are listed without previews by the diff viewer: six images appear with only a "Before" size (52 KiB, 52 KiB, 54 KiB, 79 KiB, 82 KiB, 87 KiB), three images change size (476→370 KiB, 488→387 KiB, 486→396 KiB), and three new files are added: assets/media/vosk_en.png (68 KiB), assets/media/vosk_ja.png (76 KiB), assets/media/vosk_zh.png (74 KiB).)

View File

@@ -8,9 +8,5 @@
<true/>
<key>com.apple.security.cs.allow-dyld-environment-variables</key>
<true/>
<key>com.apple.security.cs.disable-library-validation</key>
<true/>
<key>com.apple.security.device.audio-input</key>
<true/>
</dict>
</plist>
</plist>

View File

@@ -137,46 +137,11 @@
- 合并 Gummy 和 Vosk 引擎为单个可执行文件
- 字幕引擎和主程序添加 Socket 通信,完全避免字幕引擎成为孤儿进程
## v0.7.0
2025-08-20
2025-08-xx
### 新增功能
- 添加字幕窗口宽度记忆,重新打开时与上次字幕窗口宽度一致
- 在尝试关闭字幕引擎 4s 后字幕引擎仍未关闭,则强制关闭字幕引擎
- 添加复制最新字幕选项用户可以选择只复制最近1~3条字幕 (#13)
- 添加主题颜色设置,支持六种颜色:蓝色、绿色、橙色、紫色、粉色、暗色/明色
- 添加日志记录显示:可以查看软件的字幕引擎输出的日志记录
### 优化体验
- 优化软件用户界面的部分组件
- 更清晰的日志输出
## v1.0.0
2025-09-08
### 新增功能
- 字幕引擎添加超时关闭功能:如果在规定时间字幕引擎没有启动成功会自动关闭;在字幕引擎启动过程中可选择关闭字幕引擎
- 添加非实时翻译功能:支持调用 Ollama 本地模型进行翻译;支持调用 Google 翻译 API 进行翻译
- 添加新的翻译模型:添加 SOSV 模型,支持识别英语、中文、日语、韩语、粤语
- 添加录音功能:可以将字幕引擎识别的音频保存为 .wav 文件
- 添加多行字幕功能,用户可以设置字幕窗口显示的字幕的行数
### 优化体验
- 优化部分提示信息显示位置
- 替换重采样模型,提高音频重采样质量
- 带有额外信息的标签颜色改为与主题色一致
## v1.1.0
### 新增功能
- 添加基于 GLM-ASR 的字幕引擎
- 添加 OpenAI API 兼容模型作为新的翻译模型
- 在尝试关闭字幕引擎 4s 后字幕引擎仍未关闭,则强制关闭字幕引擎

View File

@@ -18,13 +18,17 @@
- [x] 添加字幕记录按时间降序排列选择 *2025/07/26*
- [x] 重构字幕引擎 *2025/07/28*
- [x] 优化前端界面提示消息 *2025/07/29*
- [x] 复制字幕记录可选择只复制最近的字幕记录 *2025/08/18*
- [x] 添加颜色主题设置 *2025/08/18*
- [x] 前端页面添加日志内容展示 *2025/08/19*
- [x] 添加 Ollama 模型用于本地字幕引擎的翻译 *2025/09/04*
- [x] 验证 / 添加基于 sherpa-onnx 的字幕引擎 *2025/09/06*
- [x] 添加 GLM-ASR 模型 *2026/01/10*
## TODO
## 待完成
暂无
- [ ] 验证 / 添加基于 sherpa-onnx 的字幕引擎
## 后续计划
- [ ] 添加 Ollama 模型用于本地字幕引擎的翻译
- [ ] 验证 / 添加基于 FunASR 的字幕引擎
- [ ] 减小软件不必要的体积
## 遥远的未来
- [ ] 使用 Tauri 框架重新开发

View File

@@ -58,19 +58,6 @@ Electron 主进程通过 TCP Socket 向 Python 进程发送数据。发送的数
Python 端监听到的音频流转换为的字幕数据。
### `translation`
```js
{
command: "translation",
time_s: string,
text: string,
translation: string
}
```
语音识别的内容的翻译,可以根据起始时间确定对应的字幕。
### `print`
```js
@@ -80,7 +67,7 @@ Python 端监听到的音频流转换为的字幕数据。
}
```
输出 Python 端打印的内容,不计入日志
输出 Python 端打印的内容。
### `info`
@@ -91,29 +78,7 @@ Python 端监听到的音频流转换为的字幕数据。
}
```
Python 端打印的提示信息,会计入日志
### `warn`
```js
{
command: "warn",
content: string
}
```
Python 端打印的警告信息,会计入日志。
### `error`
```js
{
command: "error",
content: string
}
```
Python 端打印的错误信息,该错误信息会在前端弹窗显示。
Python 端打印的提示信息,比起 `print`,该信息更希望 Electron 端的关注
### `usage`

View File

@@ -84,7 +84,7 @@
### `control.uiTheme.change`
**介绍:** 前端修改界面主题,将修改同步给后端
**发起方:** 前端控制窗口
@@ -92,16 +92,6 @@
**数据类型:** `UITheme`
### `control.uiColor.change`
**介绍:** 前端修改界面主题颜色,将修改同步给后端
**发起方:** 前端控制窗口
**接收方:** 后端控制窗口实例
**数据类型:** `string`
### `control.leftBarWidth.change`
**介绍:** 前端修改边栏宽度,将修改同步给后端
@@ -182,16 +172,6 @@
**数据类型:** 无数据
### `control.engine.forceKill`
**介绍:** 强制关闭启动超时的字幕引擎
**发起方:** 前端控制窗口
**接收方:** 后端控制窗口实例
**数据类型:** 无数据
### `caption.windowHeight.change`
**介绍:** 字幕窗口宽度发生改变
@@ -294,16 +274,6 @@
**数据类型:** `Controls`
### `control.softwareLog.add`
**介绍:** 添加一条新的日志数据
**发起方:** 后端
**接收方:** 前端控制窗口
**数据类型:** `SoftwareLog`
### `both.styles.set`
**介绍:** 后端将最新字幕样式发送给前端,前端进行设置

View File

@@ -1,207 +1,175 @@
# Caption Engine Documentation
# Caption Engine Documentation
Corresponding version: v1.0.0
Corresponding Version: v0.6.0
![](../../assets/media/structure_en.png)
![](../../assets/media/structure_en.png)
## Introduction to the Caption Engine
## Introduction to the Caption Engine
The so-called caption engine is actually a subprocess that captures streaming data from system audio input (microphone) or output (speaker) in real-time, and invokes an audio-to-text model to generate captions for the corresponding audio. The generated captions are converted into JSON-formatted string data and transmitted to the main program through standard output (ensuring that the string received by the main program can be correctly interpreted as a JSON object). The main program reads and interprets the caption data, processes it, and displays it in the window.
The so-called caption engine is essentially a subprogram that continuously captures real-time streaming data from the system's audio input (microphone) or output (speakers) and invokes an audio-to-text model to generate corresponding captions for the audio. The generated captions are converted into JSON-formatted string data and passed to the main program via standard output (ensuring the string can be correctly interpreted as a JSON object by the main program). The main program reads and interprets the caption data, processes it, and displays it in the window.
**Communication between the caption engine process and Electron main process follows the standard: [caption engine api-doc](../api-docs/caption-engine.md).**
**The communication standard between the caption engine process and the Electron main process is: [caption engine api-doc](../api-docs/caption-engine.md).**
## Execution Flow
## Workflow
Process of communication between main process and caption engine:
The communication flow between the main process and the caption engine:
### Starting the Engine
### Starting the Engine
- Electron main process: Use `child_process.spawn()` to start the caption engine process
- Caption engine process: Create a TCP Socket server thread, after creation output a JSON object converted to string via standard output, containing the `command` field with value `connect`
- Main process: Listen to the caption engine process's standard output, try to split the standard output by lines, parse it into a JSON object, and check if the object's `command` field value is `connect`. If so, connect to the TCP Socket server
- **Main Process**: Uses `child_process.spawn()` to launch the caption engine process.
- **Caption Engine Process**: Creates a TCP Socket server thread. After creation, it outputs a JSON object string via standard output, containing a `command` field with the value `connect`.
- **Main Process**: Monitors the standard output of the caption engine process, attempts to split it line by line, parses it into a JSON object, and checks if the `command` field value is `connect`. If so, it connects to the TCP Socket server.
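Seen from the engine's side, the handshake just described can be reduced to the following sketch (a minimal illustration, not the project's exact code): open the socket server, then announce its port through the `connect` command on stdout so the main process knows where to connect.

```python
import json
import socket
import sys
import threading

def announce_socket_server(port: int = 8080) -> None:
    """Open a TCP server, then tell the Electron main process which port to connect to."""
    server = socket.create_server(("127.0.0.1", port))
    # A real engine would accept the connection and parse incoming command objects in this thread.
    threading.Thread(target=server.accept, daemon=True).start()
    # One JSON object per line on stdout, flushed so the parent process sees it immediately.
    sys.stdout.write(json.dumps({"command": "connect", "content": str(port)}) + "\n")
    sys.stdout.flush()
```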
### Caption Recognition
### Caption Recognition
- Caption engine process: Create a new thread to monitor system audio output, put acquired audio data chunks into a shared queue (`shared_data.chunk_queue`). The caption engine continuously reads audio data chunks from the shared queue and parses them. The caption engine may also create a new thread to perform translation operations. Finally, the caption engine sends parsed caption data object strings through standard output
- Electron main process: Continuously listen to the caption engine's standard output and take different actions based on the parsed object's `command` field
- **Caption Engine Process**: The main thread monitors system audio output, sends audio data chunks to the caption engine for parsing, and outputs the parsed caption data object strings via standard output.
- **Main Process**: Continues to monitor the standard output of the caption engine and performs different operations based on the `command` field of the parsed object.
### Stopping the Engine
### Closing the Engine
- Electron main process: When the user operates to close the caption engine in the frontend, the main process sends an object string with `command` field set to `stop` to the caption engine process through Socket communication
- Caption engine process: Receive the caption data object string sent by the main engine process, parse the string into an object. If the object's `command` field is `stop`, set the value of global variable `shared_data.status` to `stop`
- Caption engine process: Main thread continuously monitors system audio output, when `thread_data.status` value is not `running`, end the loop, release resources, and terminate execution
- Electron main process: If the caption engine process termination is detected, perform corresponding processing and provide feedback to the frontend
- **Main Process**: When the user closes the caption engine via the frontend, the main process sends a JSON object string with the `command` field set to `stop` to the caption engine process via Socket communication.
- **Caption Engine Process**: Receives the object string, parses it, and if the `command` field is `stop`, sets the global variable `thread_data.status` to `stop`.
- **Caption Engine Process**: The main thread's loop for monitoring system audio output ends when `thread_data.status` is not `running`, releases resources, and terminates.
- **Main Process**: Detects the termination of the caption engine process, performs corresponding cleanup, and provides feedback to the frontend.
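The shutdown path described above, reduced to a sketch (the names follow this document; the real implementation lives in the engine's socket server and main loop):

```python
import json

def on_socket_message(line: str, thread_data) -> None:
    """Socket-server side: a `stop` command flips the shared status flag."""
    message = json.loads(line)
    if message.get("command") == "stop":
        thread_data.status = "stop"

def capture_loop(stream, engine, thread_data) -> None:
    """Main thread: keep feeding audio to the engine only while the status stays 'running'."""
    while thread_data.status == "running":
        engine.send_audio_frame(stream.read_chunk())
    stream.close_stream()  # status changed: release the audio stream and let the process exit
    engine.stop()
```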
## Implemented Features
## Implemented Features
The following features have been implemented and can be directly reused.
The following features are already implemented and can be reused directly.
### Standard Output
### Standard Output
Can output regular information, commands, and error messages.
Supports printing general information, commands, and error messages.
Examples:
Example:
```python
from utils import stdout, stdout_cmd, stdout_obj, stderr
# {"command": "print", "content": "Hello"}\n
stdout("Hello")
# {"command": "connect", "content": "8080"}\n
stdout_cmd("connect", "8080")
# {"command": "print", "content": "print"}\n
stdout_obj({"command": "print", "content": "print"})
# sys.stderr.write("Error Info" + "\n")
stderr("Error Info")
```
```python
from utils import stdout, stdout_cmd, stdout_obj, stderr
stdout("Hello") # {"command": "print", "content": "Hello"}\n
stdout_cmd("connect", "8080") # {"command": "connect", "content": "8080"}\n
stdout_obj({"command": "print", "content": "Hello"})
stderr("Error Info")
```
### Creating Socket Service
### Creating a Socket Service
This Socket service listens on a specified port, parses content sent by the Electron main program, and may change the value of `shared_data.status`.
This Socket service listens on a specified port, parses content sent by the Electron main program, and may modify the value of `thread_data.status`.
Example:
Example:
```python
from utils import start_server
from utils import shared_data
port = 8080
start_server(port)
while shared_data.status == 'running':
# do something
pass
```
```python
from utils import start_server
from utils import thread_data
port = 8080
start_server(port)
while thread_data.status == 'running':
# do something
pass
```
### Audio Acquisition
### Audio Capture
The `AudioStream` class is used to acquire audio data, with cross-platform implementation supporting Windows, Linux, and macOS. The class initialization includes two parameters:
The `AudioStream` class captures audio data and is cross-platform, supporting Windows, Linux, and macOS. Its initialization includes two parameters:
- `audio_type`: Audio acquisition type, 0 for system output audio (speaker), 1 for system input audio (microphone)
- `chunk_rate`: Audio data acquisition frequency, number of audio chunks acquired per second, default is 10
- `audio_type`: The type of audio to capture. `0` for system output audio (speakers), `1` for system input audio (microphone).
- `chunk_rate`: The frequency of audio data capture, i.e., the number of audio chunks captured per second.
The class contains four methods:
The class includes three methods:
- `open_stream()`: Start audio acquisition
- `read_chunk() -> bytes`: Read an audio chunk
- `close_stream()`: Close audio acquisition
- `close_stream_signal()`: Thread-safe closing of system audio input stream
- `open_stream()`: Starts audio capture.
- `read_chunk() -> bytes`: Reads an audio chunk.
- `close_stream()`: Stops audio capture.
Example:
Example:
```python
from sysaudio import AudioStream
audio_type = 0
chunk_rate = 20
stream = AudioStream(audio_type, chunk_rate)
stream.open_stream()
while True:
data = stream.read_chunk()
# do something with data
pass
stream.close_stream()
```
```python
from sysaudio import AudioStream
audio_type = 0
chunk_rate = 20
stream = AudioStream(audio_type, chunk_rate)
stream.open_stream()
while True:
data = stream.read_chunk()
# do something with data
pass
stream.close_stream()
```
### Audio Processing
### Audio Processing
Before converting audio streams to text, preprocessing may be required. Usually, multi-channel audio needs to be converted to single-channel audio, and resampling may also be needed. This project provides two audio processing functions:
The captured audio stream may require preprocessing before conversion to text. Typically, multi-channel audio needs to be converted to mono, and resampling may be necessary. This project provides three audio processing functions:
- `merge_chunk_channels(chunk: bytes, channels: int) -> bytes`: Convert multi-channel audio chunks to single-channel audio chunks
- `resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int) -> bytes`: Convert current multi-channel audio data chunks to single-channel audio data chunks, then perform resampling
- `merge_chunk_channels(chunk: bytes, channels: int) -> bytes`: Converts a multi-channel audio chunk to mono.
- `resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes`: Converts a multi-channel audio chunk to mono and resamples it.
- `resample_mono_chunk(chunk: bytes, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes`: Resamples a mono audio chunk.
Example:
## Features to Be Implemented in the Caption Engine
```python
from sysaudio import AudioStream
from utils import resample_chunk_mono
stream = AudioStream(1)
while True:
raw_chunk = stream.read_chunk()
chunk = resample_chunk_mono(raw_chunk, stream.CHANNELS, stream.RATE, 16000)
# do something with chunk
```
### Audio-to-Text Conversion
## Features to be Implemented by the Caption Engine
After obtaining a suitable audio stream, it needs to be converted to text. Typically, various models (cloud-based or local) are used for this purpose. Choose the appropriate model based on requirements.
### Audio to Text Conversion
This part is recommended to be encapsulated as a class with three methods:
After obtaining suitable audio streams, the audio stream needs to be converted to text. Generally, various models (cloud or local) are used to implement audio-to-text conversion. Appropriate models should be selected according to requirements.
- `start(self)`: Starts the model.
- `send_audio_frame(self, data: bytes)`: Processes the current audio chunk data. **The generated caption data is sent to the Electron main process via standard output.**
- `stop(self)`: Stops the model.
It is recommended to encapsulate this as a class, implementing four methods:
Complete caption engine examples:
- `start(self)`: Start the model
- `send_audio_frame(self, data: bytes)`: Process current audio chunk data, **generated caption data is sent to Electron main process through standard output**
- `translate(self)`: Continuously retrieve data chunks from `shared_data.chunk_queue` and call `send_audio_frame` method to process data chunks
- `stop(self)`: Stop the model
- [gummy.py](../../engine/audio2text/gummy.py)
- [vosk.py](../../engine/audio2text/vosk.py)
Complete caption engine examples:
### Caption Translation
- [gummy.py](../../engine/audio2text/gummy.py)
- [vosk.py](../../engine/audio2text/vosk.py)
- [sosv.py](../../engine/audio2text/sosv.py)
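For orientation, here is a skeleton of such a class following the start / send_audio_frame / stop contract described above. The field names and the `stdout_obj` helper come from this document; everything marked as a placeholder is hypothetical and would be replaced by real model calls (see gummy.py, vosk.py and sosv.py for complete implementations).

```python
import datetime

from utils import stdout_obj  # project helper: one JSON object per line on stdout, flushed


class MyRecognizer:
    """Skeleton audio-to-text engine following the start / send_audio_frame / stop contract."""

    def __init__(self, source_language: str = "auto"):
        self.source_language = source_language
        self.index = 0

    def start(self):
        # Placeholder: load a local model or open a connection to a cloud ASR service here.
        pass

    def send_audio_frame(self, data: bytes):
        # Placeholder: feed one 16-bit mono PCM chunk to the model. Whenever the model
        # finalizes a sentence, emit it as a caption object for the Electron main process.
        text = self._recognize(data)
        if not text:
            return
        now = datetime.datetime.now().strftime("%H:%M:%S.%f")[:-3]
        self.index += 1
        stdout_obj({
            "command": "caption",
            "index": self.index,
            "time_s": now,
            "time_t": now,
            "text": text,
            "translation": "",  # fill in if the model or an extra translator provides one
        })

    def stop(self):
        # Placeholder: release model or network resources here.
        pass

    def _recognize(self, data: bytes) -> str:
        # Placeholder for the actual model call; returns finalized text or an empty string.
        return ""
```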
Some speech-to-text models do not provide translation. If needed, a translation module must be added.
### Caption Translation
### Sending Caption Data
Some speech-to-text models do not provide translation. If needed, an additional translation module needs to be added, or built-in translation modules can be used.
After obtaining the text for the current audio stream, it must be sent to the main program. The caption engine process passes caption data to the Electron main process via standard output.
Example:
The content must be a JSON string, with the JSON object including the following parameters:
```python
from utils import google_translate, ollama_translate
text = "This is a translation test."
google_translate("", "en", text, "time_s")
ollama_translate("qwen3:0.6b", "en", text, "time_s")
```
```typescript
export interface CaptionItem {
command: "caption",
index: number, // Caption sequence number
time_s: string, // Start time of the current caption
time_t: string, // End time of the current caption
text: string, // Caption content
translation: string // Caption translation
}
```
### Caption Data Transmission
**Note: Ensure the buffer is flushed after each JSON output to guarantee the Electron main process receives a string that can be parsed as a JSON object.**
After obtaining the text from the current audio stream, the text needs to be sent to the main program. The caption engine process transmits caption data to the Electron main process through standard output.
It is recommended to use the project's `stdout_obj` function for sending.
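Concretely, a single caption item can be emitted like this (a sketch; `stdout_obj` is assumed to perform the JSON serialization and the flush described above):

```python
from utils import stdout_obj

caption = {
    "command": "caption",
    "index": 1,
    "time_s": "00:00:01.200",
    "time_t": "00:00:03.480",
    "text": "This is a translation test.",
    "translation": "这是一个翻译测试。",
}
stdout_obj(caption)  # serialized to one JSON line on stdout and flushed immediately
```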
The transmitted content must be a JSON string, where the JSON object needs to contain the following parameters:
### Command-Line Parameter Specification
```typescript
export interface CaptionItem {
command: "caption",
index: number, // Caption sequence number
time_s: string, // Current caption start time
time_t: string, // Current caption end time
text: string, // Caption content
translation: string // Caption translation
}
```
Custom caption engine settings are provided via command-line arguments. The current project uses the following parameters:
**Note that the buffer must be flushed after each caption JSON data output to ensure that the Electron main process receives strings that can be interpreted as JSON objects each time.** It is recommended to use the project's existing `stdout_obj` function for transmission.
```python
import argparse
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Convert system audio stream to text')
# Common parameters
parser.add_argument('-e', '--caption_engine', default='gummy', help='Caption engine: gummy or vosk')
parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output, 1 for input')
parser.add_argument('-c', '--chunk_rate', default=20, help='Number of audio stream chunks collected per second')
parser.add_argument('-p', '--port', default=8080, help='The port to run the server on, 0 for no server')
# Gummy-specific parameters
parser.add_argument('-s', '--source_language', default='en', help='Source language code')
parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
# Vosk-specific parameters
parser.add_argument('-m', '--model_path', default='', help='The path to the vosk model.')
```
### Command Line Parameter Specification
For example, to use the Gummy model with Japanese as the source language, Chinese as the target language, and system audio output captions with 0.1s audio chunks, the command-line arguments would be:
Custom caption engine settings provide command line parameter specification, so the caption engine parameters need to be set properly. Currently used parameters in this project are as follows:
```python
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Convert system audio stream to text')
# all
parser.add_argument('-e', '--caption_engine', default='gummy', help='Caption engine: gummy or vosk')
parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output, 1 for input')
parser.add_argument('-c', '--chunk_rate', default=10, help='Number of audio stream chunks collected per second')
parser.add_argument('-p', '--port', default=0, help='The port to run the server on, 0 for no server')
parser.add_argument('-t', '--target_language', default='zh', help='Target language code, "none" for no translation')
parser.add_argument('-r', '--record', default=0, help='Whether to record the audio, 0 for no recording, 1 for recording')
parser.add_argument('-rp', '--record_path', default='', help='Path to save the recorded audio')
# gummy and sosv
parser.add_argument('-s', '--source_language', default='auto', help='Source language code')
# gummy only
parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
# vosk and sosv
parser.add_argument('-tm', '--translation_model', default='ollama', help='Model for translation: ollama or google')
parser.add_argument('-omn', '--ollama_name', default='', help='Ollama model name for translation')
# vosk only
parser.add_argument('-vosk', '--vosk_model', default='', help='The path to the vosk model.')
# sosv only
parser.add_argument('-sosv', '--sosv_model', default=None, help='The SenseVoice model path')
```
For example, for this project's caption engine, if I want to use the Gummy model, specify the original text as Japanese, translate to Chinese, and capture captions from system audio output, with 0.1s audio data segments each time, the command line parameters would be:
```bash
python main.py -e gummy -s ja -t zh -a 0 -c 10 -k <dashscope-api-key>
```
```bash
python main.py -e gummy -s ja -t zh -a 0 -c 10 -k <dashscope-api-key>
```
## Additional Notes
@@ -215,7 +183,7 @@ python main.py -e gummy -s ja -t zh -a 0 -c 10 -k <dashscope-api-key>
### Development Recommendations
Except for audio-to-text conversion and translation, other components (audio acquisition, audio resampling, and communication with the main process) are recommended to directly reuse the project's code. If this approach is taken, the content that needs to be added includes:
Apart from audio-to-text conversion, it is recommended to reuse the existing code. In this case, the following additions are needed:
- `engine/audio2text/`: Add a new audio-to-text class (file-level).
- `engine/main.py`: Add new parameter settings and workflow functions (refer to `main_gummy` and `main_vosk` functions).
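For the second point, a rough sketch of what such a workflow function could look like. `AudioStream`, `resample_chunk_mono` and `thread_data` are the project pieces described in this document; `MyRecognizer` and its module path are hypothetical names for the class you would add under `engine/audio2text/`.

```python
from sysaudio import AudioStream
from utils import resample_chunk_mono, thread_data  # thread_data.status is set to "stop" by the socket server
from audio2text.my_engine import MyRecognizer       # hypothetical module added under engine/audio2text/


def main_custom(args):
    """Hypothetical workflow function for a new engine, mirroring main_gummy / main_vosk."""
    stream = AudioStream(int(args.audio_type), int(args.chunk_rate))
    engine = MyRecognizer()
    stream.open_stream()
    engine.start()
    while thread_data.status == "running":
        raw = stream.read_chunk()
        # Convert to 16 kHz mono before handing the chunk to the recognizer.
        chunk = resample_chunk_mono(raw, stream.CHANNELS, stream.RATE, 16000)
        engine.send_audio_frame(chunk)  # emits caption JSON on stdout as text is finalized
    stream.close_stream()
    engine.stop()
```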

View File

@@ -1,7 +1,5 @@
# 字幕エンジン説明ドキュメント
## 注意:このドキュメントはメンテナンスが行われていないため、記載されている情報は古くなっています。最新の情報については、[中国語版](./zh.md)または[英語版](./en.md)のドキュメントをご参照ください。
対応バージョンv0.6.0
この文書は大規模モデルを使用して翻訳されていますので、内容に正確でない部分があるかもしれません。
@@ -159,7 +157,7 @@ if __name__ == "__main__":
# 共通
parser.add_argument('-e', '--caption_engine', default='gummy', help='字幕エンジン: gummyまたはvosk')
parser.add_argument('-a', '--audio_type', default=0, help='オーディオストリームソース: 0は出力、1は入力')
parser.add_argument('-c', '--chunk_rate', default=10, help='1秒あたりに収集するオーディオストリームブロックの数')
parser.add_argument('-c', '--chunk_rate', default=20, help='1秒あたりに収集するオーディオストリームブロックの数')
parser.add_argument('-p', '--port', default=8080, help='サーバーを実行するポート、0はサーバーなし')
# gummy専用
parser.add_argument('-s', '--source_language', default='en', help='ソース言語コード')

View File

@@ -1,6 +1,6 @@
# 字幕引擎说明文档
对应版本v1.0.0
对应版本v0.6.0
![](../../assets/media/structure_zh.png)
@@ -16,21 +16,22 @@
### 启动引擎
- Electron 主进程:使用 `child_process.spawn()` 启动字幕引擎进程
- 主进程:使用 `child_process.spawn()` 启动字幕引擎进程
- 字幕引擎进程:创建 TCP Socket 服务器线程,创建后在标准输出中输出转化为字符串的 JSON 对象,该对象中包含 `command` 字段,值为 `connect`
- 主进程:监听字幕引擎进程的标准输出,尝试将标准输出按行分割,解析为 JSON 对象,并判断对象的 `command` 字段值是否为 `connect`,如果是则连接 TCP Socket 服务器
### 字幕识别
- 字幕引擎进程:新建线程监听系统音频输出,将获取的音频数据块放入共享队列中(`shared_data.chunk_queue`)。字幕引擎不断读取共享队列中的音频数据块并解析字幕引擎还可能新建线程执行翻译操作。最后字幕引擎通过标准输出发送解析的字幕数据对象字符串
- Electron 主进程:持续监听字幕引擎的标准输出,并根据解析的对象的 `command` 字段采取不同的操作
- 字幕引擎进程:在主线程监听系统音频输出,将音频数据块发送给字幕引擎解析字幕引擎解析音频数据块,通过标准输出发送解析的字幕数据对象字符串
- 主进程:持续监听字幕引擎的标准输出,并根据解析的对象的 `command` 字段采取不同的操作
### 关闭引擎
- Electron 主进程:当用户在前端操作关闭字幕引擎时,主进程通过 Socket 通信给字幕引擎进程发送 `command` 字段为 `stop` 的对象字符串
- 字幕引擎进程:接收 Electron 主进程发送的指令对象字符串,将字符串解析为对象,如果对象的 `command` 字段为 `stop`,则将全局变量 `shared_data.status` 的值设置为 `stop`
- 主进程:当用户在前端操作关闭字幕引擎时,主进程通过 Socket 通信给字幕引擎进程发送 `command` 字段为 `stop` 的对象字符串
- 字幕引擎进程:接收主进程发送的指令对象字符串,将字符串解析为对象,如果对象的 `command` 字段为 `stop`,则将全局变量 `thread_data.status` 的值设置为 `stop`
- 字幕引擎进程:主线程循环监听系统音频输出,当 `thread_data.status` 的值不为 `running` 时,则结束循环,释放资源,结束运行
- Electron 主进程:如果检测到字幕引擎进程结束,进行相应处理,并向前端反馈
- 主进程:如果检测到字幕引擎进程结束,进行相应处理,并向前端反馈
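下面是一个按上述流程组织的引擎侧最小示意,仅展示握手与主循环的结构,省略了音频采集与具体识别逻辑,其中端口号等取值仅为示例(所用到的工具函数见下文介绍):
```python
import threading
from utils import stdout_cmd
from utils import shared_data, start_server

PORT = 8080  # 示例端口,实际由主进程通过命令行参数传入

# 新建线程创建 TCP Socket 服务器,用于接收主进程发送的 stop 指令
threading.Thread(target=start_server, args=(PORT,), daemon=True).start()
# 通过标准输出告知主进程可以连接(本项目中这一步也可能由 start_server 内部完成)
stdout_cmd("connect", str(PORT))

while shared_data.status == "running":
    # 从共享队列取出音频块并送入识别模型,识别结果用 stdout_obj 发送
    chunk = shared_data.chunk_queue.get()
    # engine.send_audio_frame(chunk)

# 主进程发送 {"command": "stop"} 后 status 被置为 "stop",循环退出,在此释放资源
```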
## 项目已经实现的功能
@@ -44,25 +45,21 @@
```python
from utils import stdout, stdout_cmd, stdout_obj, stderr
# {"command": "print", "content": "Hello"}\n
stdout("Hello")
# {"command": "connect", "content": "8080"}\n
stdout_cmd("connect", "8080")
# {"command": "print", "content": "print"}\n
stdout_obj({"command": "print", "content": "print"})
# sys.stderr.write("Error Info" + "\n")
stdout("Hello") # {"command": "print", "content": "Hello"}\n
stdout_cmd("connect", "8080") # {"command": "connect", "content": "8080"}\n
stdout_obj({"command": "print", "content": "Hello"})
stderr("Error Info")
```
### 创建 Socket 服务
该 Socket 服务会监听指定端口,会解析 Electron 主程序发送的内容,并可能改变 `shared_data.status` 的值。
该 Socket 服务会监听指定端口,会解析 Electron 主程序发送的内容,并可能改变 `thread_data.status` 的值。
样例:
```python
from utils import start_server
from utils import shared_data
from utils import thread_data
port = 8080
start_server(port)
while thread_data.status == 'running':
    pass  # 主循环;收到 stop 指令后 status 变为 'stop',循环退出
```
@@ -75,14 +72,13 @@ while thread_data == 'running':
`AudioStream` 类用于获取音频数据,实现是跨平台的,支持 Windows、Linux 和 macOS。该类初始化包含两个参数
- `audio_type`: 获取音频类型0 表示系统输出音频扬声器1 表示系统输入音频(麦克风)
- `chunk_rate`: 音频数据获取频率,每秒音频获取的音频块的数量,默认为 10
- `chunk_rate`: 音频数据获取频率,每秒音频获取的音频块的数量
该类包含以下方法:
- `open_stream()`: 开启音频获取
- `read_chunk() -> bytes`: 读取一个音频块
- `close_stream()`: 关闭音频获取
- `close_stream_signal()` 线程安全的关闭系统音频输入流
样例:
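下面是一个最小的使用示意(省略了异常处理,实际项目中采集通常在独立线程中循环进行):
```python
from sysaudio import AudioStream

stream = AudioStream(0, 10)   # 0 表示系统音频输出,每秒采集 10 个音频块
stream.open_stream()
for _ in range(100):
    chunk = stream.read_chunk()
    if chunk is None:
        continue
    # 在此处理 chunk(原始多通道音频块)
stream.close_stream()
```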
@@ -101,22 +97,11 @@ stream.close_stream()
### 音频处理
获取到的音频流在转文字之前可能需要进行预处理。一般需要将多通道音频转换为单通道音频,还可能需要进行重采样。本项目提供了以下音频处理函数:
- `merge_chunk_channels(chunk: bytes, channels: int) -> bytes` 将多通道音频块转换为单通道音频块
- `resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int) -> bytes`:将当前多通道音频数据块转换成单通道音频数据块,然后进行重采样
样例:
```python
from sysaudio import AudioStream
from utils import resample_chunk_mono

stream = AudioStream(1)
stream.open_stream()
while True:
    raw_chunk = stream.read_chunk()
    if raw_chunk is None:
        continue
    chunk = resample_chunk_mono(raw_chunk, stream.CHANNELS, stream.RATE, 16000)
    # do something with chunk
```
- `resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes`:将当前多通道音频数据块转换成单通道音频数据块,然后进行重采样
- `resample_mono_chunk(chunk: bytes, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes`:将当前单通道音频块进行重采样
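下面是一个调用这些函数的最小示意(假设输入为 16 位 PCM 音频,目标采样率为 16000 Hz):
```python
from sysaudio import AudioStream
from utils import merge_chunk_channels, resample_mono_chunk

stream = AudioStream(1)          # 1 表示麦克风输入
stream.open_stream()
raw_chunk = stream.read_chunk()
if raw_chunk is not None:
    # 先合并为单通道,再重采样到 16000 Hz
    mono = merge_chunk_channels(raw_chunk, stream.CHANNELS)
    chunk_16k = resample_mono_chunk(mono, stream.RATE, 16000)
    # chunk_16k 即可送入语音识别模型
stream.close_stream()
```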
## 字幕引擎需要实现的功能
@@ -124,31 +109,20 @@ while True:
在得到了合适的音频流后,需要将音频流转换为文字了。一般使用各种模型(云端或本地)来实现音频流转文字。需要根据需求选择合适的模型。
这部分建议封装为一个类,需要实现以下方法:
- `start(self)`:启动模型
- `send_audio_frame(self, data: bytes)`:处理当前音频块数据,**生成的字幕数据通过标准输出发送给 Electron 主进程**
- `translate(self)`:持续从 `shared_data.chunk_queue` 中取出数据块,并调用 `send_audio_frame` 方法处理数据块
- `stop(self)`:停止模型
完整的字幕引擎实例如下:
- [gummy.py](../../engine/audio2text/gummy.py)
- [vosk.py](../../engine/audio2text/vosk.py)
- [sosv.py](../../engine/audio2text/sosv.py)
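下面是一个仅包含接口结构的最小骨架示意,省略了具体识别逻辑,字幕对象的字段含义见下文“字幕数据发送”一节:
```python
from datetime import datetime
from utils import shared_data, stdout_obj

class MyRecognizer:
    def __init__(self):
        self.cur_id = 0

    def start(self):
        """加载并初始化模型"""
        pass

    def send_audio_frame(self, data: bytes):
        """处理一个音频块,识别出文字后通过标准输出发送字幕对象"""
        text = ""  # TODO: 在此调用实际的语音识别模型
        if text:
            now = datetime.now().strftime('%H:%M:%S.%f')[:-3]
            stdout_obj({
                "command": "caption",
                "index": self.cur_id,
                "time_s": now,
                "time_t": now,
                "text": text,
                "translation": ""
            })
            self.cur_id += 1

    def translate(self):
        """持续从共享队列中取出音频块并处理"""
        while shared_data.status == 'running':
            chunk = shared_data.chunk_queue.get()
            self.send_audio_frame(chunk)

    def stop(self):
        """释放模型资源"""
        pass
```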
### 字幕翻译
有的语音转文字模型并不提供翻译,如果有需求,需要再添加一个翻译模块,也可以使用自带的翻译模块。
样例:
```python
from utils import google_translate, ollama_translate

text = "这是一个翻译测试。"
# 参数依次为:模型名(Google 翻译不使用该参数)、目标语言、原文、字幕开始时间
# 翻译结果会以 {"command": "translation", ...} 对象的形式通过标准输出发送
google_translate("", "en", text, "time_s")
ollama_translate("qwen3:0.6b", "en", text, "time_s")
```
有的语音转文字模型并不提供翻译,如果有需求,需要再添加一个翻译模块。
### 字幕数据发送
@@ -159,42 +133,37 @@ ollama_translate("qwen3:0.6b", "en", text, "time_s")
```typescript
export interface CaptionItem {
command: "caption",
index: number, // 字幕序号
time_s: string, // 当前字幕开始时间
time_t: string, // 当前字幕结束时间
text: string, // 字幕内容
translation: string // 字幕翻译
}
```
**注意:必须确保每输出一次字幕 JSON 数据就刷新一次缓冲区,确保 Electron 主进程每次接收到的字符串都可以被解析为 JSON 对象。** 建议使用项目已经实现的 `stdout_obj` 函数来发送。
### 命令行参数的指定
自定义字幕引擎的设置通过命令行参数指定,因此需要为字幕引擎设置好相应的命令行参数。本项目目前用到的参数如下:
```python
import argparse
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Convert system audio stream to text')
# all
# both
parser.add_argument('-e', '--caption_engine', default='gummy', help='Caption engine: gummy or vosk')
parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output, 1 for input')
parser.add_argument('-c', '--chunk_rate', default=10, help='Number of audio stream chunks collected per second')
parser.add_argument('-p', '--port', default=0, help='The port to run the server on, 0 for no server')
parser.add_argument('-t', '--target_language', default='zh', help='Target language code, "none" for no translation')
parser.add_argument('-r', '--record', default=0, help='Whether to record the audio, 0 for no recording, 1 for recording')
parser.add_argument('-rp', '--record_path', default='', help='Path to save the recorded audio')
# gummy and sosv
parser.add_argument('-s', '--source_language', default='auto', help='Source language code')
parser.add_argument('-c', '--chunk_rate', default=20, help='Number of audio stream chunks collected per second')
parser.add_argument('-p', '--port', default=8080, help='The port to run the server on, 0 for no server')
# gummy only
parser.add_argument('-s', '--source_language', default='en', help='Source language code')
parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
# vosk and sosv
parser.add_argument('-tm', '--translation_model', default='ollama', help='Model for translation: ollama or google')
parser.add_argument('-omn', '--ollama_name', default='', help='Ollama model name for translation')
# vosk only
parser.add_argument('-vosk', '--vosk_model', default='', help='The path to the vosk model.')
# sosv only
parser.add_argument('-sosv', '--sosv_model', default=None, help='The SenseVoice model path')
parser.add_argument('-m', '--model_path', default='', help='The path to the vosk model.')
```
比如对于本项目的字幕引擎,若要使用 Gummy 模型,指定原文为日语、翻译为中文,获取系统音频输出的字幕,且每次截取 0.1s 的音频数据,那么命令行参数如下:
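```bash
python main.py -e gummy -s ja -t zh -a 0 -c 10 -k <dashscope-api-key>
```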
@@ -215,7 +184,7 @@ python main.py -e gummy -s ja -t zh -a 0 -c 10 -k <dashscope-api-key>
### 开发建议
除音频转文字和翻译外,其他(音频获取、音频重采样、与主进程通信)建议直接复用本项目代码。如果这样,那么需要添加的内容为:
除音频转文字外,其他建议直接复用本项目代码。如果这样,那么需要添加的内容为:
- `engine/audio2text/`:添加新的音频转文字类(文件级别)
- `engine/main.py`:添加新参数设置、流程函数(参考 `main_gummy` 函数和 `main_vosk` 函数)

Binary file not shown (before: 148 KiB)

Binary file not shown (before: 94 KiB)

View File

@@ -1,6 +1,6 @@
# Auto Caption User Manual
Corresponding Version: v1.1.0
Corresponding Version: v0.6.0
**Note: Due to limited personal resources, the English and Japanese documentation files for this project (except for the README document) will no longer be maintained. The content of this document may not be consistent with the latest version of the project. If you are willing to help with translation, please submit relevant Pull Requests.**
@@ -41,20 +41,11 @@ Alibaba Cloud provides detailed tutorials for this part, which can be referenced
- [Obtaining API KEY (Chinese)](https://help.aliyun.com/zh/model-studio/get-api-key)
- [Configuring API Key through Environment Variables (Chinese)](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
## Preparation for GLM Engine
You need to obtain an API KEY first, refer to: [Quick Start](https://docs.bigmodel.cn/en/guide/start/quick-start).
## Preparation for Using Vosk Engine
To use the Vosk local caption engine, first download your required model from the [Vosk Models](https://alphacephei.com/vosk/models) page. Then extract the downloaded model package locally and add the corresponding model folder path to the software settings.
To use the Vosk local caption engine, first download your required model from the [Vosk Models](https://alphacephei.com/vosk/models) page. Then extract the downloaded model package locally and add the corresponding model folder path to the software settings. Currently, the Vosk caption engine does not support translated caption content.
![](../../assets/media/config_en.png)
## Using SOSV Model
The way to use the SOSV model is the same as Vosk. The download address is as follows: https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model
![](../../assets/media/vosk_en.png)
## Capturing System Audio Output on macOS
@@ -135,199 +126,3 @@ The software provides two default caption engines. If you need other caption eng
Note that when using a custom caption engine, all previous caption engine settings will be ineffective, and the configuration of the custom caption engine is entirely done through the engine command.
If you are a developer and want to develop a custom caption engine, please refer to the [Caption Engine Explanation Document](../engine-manual/en.md).
## Using Caption Engine Standalone
### Runtime Parameter Description
> The following content assumes users have some knowledge of running programs via terminal.
The complete set of runtime parameters available for the caption engine is shown below:
![](../img/06.png)
However, when used standalone, some parameters may not need to be used or should not be modified.
The following parameter descriptions only include necessary parameters.
#### `-e , --caption_engine`
The caption engine model to select; currently four options are available: `gummy, glm, vosk, sosv`.
The default value is `gummy`.
This applies to all models.
#### `-a, --audio_type`
The audio type to recognize, where `0` represents system audio output and `1` represents microphone audio input.
The default value is `0`.
This applies to all models.
#### `-d, --display_caption`
Whether to display captions in the console, `0` means do not display, `1` means display.
The default value is `0`, but it's recommended to choose `1` when using only the caption engine.
This applies to all models.
#### `-t, --target_language`
> Note that Vosk and SOSV models have poor sentence segmentation, which can make translated content difficult to understand. It's not recommended to use translation with these two models.
Target language for translation. All models support the following translation languages:
- `none` No translation
- `zh` Simplified Chinese
- `en` English
- `ja` Japanese
- `ko` Korean
Additionally, `vosk` and `sosv` models also support the following translations:
- `de` German
- `fr` French
- `ru` Russian
- `es` Spanish
- `it` Italian
The default value is `none`.
This applies to all models.
#### `-s, --source_language`
Source language for recognition. Default value is `auto`, meaning no specific source language.
Specifying the source language can improve recognition accuracy to some extent. You can specify the source language using the language codes above.
This applies to Gummy, GLM and SOSV models.
The Gummy model can use all the languages mentioned above, plus Cantonese (`yue`).
The GLM model supports specifying the following languages: English, Chinese, Japanese, Korean.
The SOSV model supports specifying the following languages: English, Chinese, Japanese, Korean, and Cantonese.
#### `-k, --api_key`
Specify the Alibaba Cloud API KEY required for the `Gummy` model.
Default value is empty.
This only applies to the Gummy model.
#### `-gkey, --glm_api_key`
Specifies the API KEY required for the `glm` model. The default value is empty.
#### `-gmodel, --glm_model`
Specifies the model name to be used for the `glm` model. The default value is `glm-asr-2512`.
#### `-gurl, --glm_url`
Specifies the API URL required for the `glm` model. The default value is: `https://open.bigmodel.cn/api/paas/v4/audio/transcriptions`.
#### `-tm, --translation_model`
Specify the translation method for Vosk and SOSV models. Default is `ollama`.
Supported values are:
- `ollama` Use local Ollama model for translation. Users need to install Ollama software and corresponding models
- `google` Use Google Translate API for translation. No additional configuration needed, but requires network access to Google
This only applies to Vosk and SOSV models.
#### `-omn, --ollama_name`
Specifies the name of the translation model to be used, which can be either a local Ollama model or a cloud model compatible with the OpenAI API. If the Base URL field is not filled in, the local Ollama service will be called by default; otherwise, the API service at the specified address will be invoked via the Python OpenAI library.
If using an Ollama model, it is recommended to use a model with fewer than 1B parameters, such as `qwen2.5:0.5b` or `qwen3:0.6b`. The corresponding model must be downloaded in Ollama for normal use.
The default value is empty and applies to models other than Gummy.
#### `-ourl, --ollama_url`
The base request URL for calling the OpenAI API. If left blank, the local Ollama model on the default port will be called.
The default value is empty and applies to models other than Gummy.
#### `-okey, --ollama_api_key`
Specifies the API KEY for calling OpenAI-compatible models.
The default value is empty and applies to models other than Gummy.
#### `-vosk, --vosk_model`
Specify the path to the local folder of the Vosk model to call. Default value is empty.
This only applies to the Vosk model.
#### `-sosv, --sosv_model`
Specify the path to the local folder of the SOSV model to call. Default value is empty.
This only applies to the SOSV model.
### Running Caption Engine Using Source Code
> The following content assumes users who use this method have knowledge of Python environment configuration and usage.
First, download the project source code locally. The caption engine source code is located in the `engine` directory of the project. Then configure the Python environment, where the project dependencies are listed in the `requirements.txt` file in the `engine` directory.
After configuration, enter the `engine` directory and execute commands to run the caption engine.
For example, to use the Gummy model, specify audio type as system audio output, source language as English, and target language as Chinese, execute the following command:
> Note: For better visualization, the commands below are written on multiple lines. If execution fails, try removing backslashes and executing as a single line command.
```bash
python main.py \
-e gummy \
-k sk-******************************** \
-a 0 \
-d 1 \
-s en \
-t zh
```
To specify the Vosk model, audio type as system audio output, translate to English, and use Ollama `qwen3:0.6b` model for translation:
```bash
python main.py \
-e vosk \
-vosk D:\Projects\auto-caption\engine\models\vosk-model-small-cn-0.22 \
-a 0 \
-d 1 \
-t en \
-omn qwen3:0.6b
```
To specify the SOSV model, audio type as microphone, automatically select source language, and no translation:
```bash
python main.py \
-e sosv \
-sosv D:\\Projects\\auto-caption\\engine\\models\\sosv-int8 \
-a 1 \
-d 1 \
-s auto \
-t none
```
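The GLM engine can be run in a similar way. A minimal example (assuming an API KEY has already been obtained, and keeping the default model name and API URL) is:
```bash
python main.py \
-e glm \
-gkey <glm-api-key> \
-a 0 \
-d 1 \
-s zh \
-t none
```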
Running result using the Gummy model is shown below:
![](../img/07.png)
### Running the Caption Engine Executable File
First, download the executable file for your platform from [GitHub Releases](https://github.com/HiMeditator/auto-caption/releases/tag/engine) (currently only Windows and Linux platform executable files are provided).
Then open a terminal in the directory containing the caption engine executable file and execute commands to run the caption engine.
Simply replace `python main.py` in the above commands with the executable file name (for example: `engine-win.exe`).

View File

@@ -1,9 +1,11 @@
# Auto Caption ユーザーマニュアル
対応バージョンv1.1.0
対応バージョンv0.6.0
この文書は大規模モデルを使用して翻訳されていますので、内容に正確でない部分があるかもしれません。
**注意個人のリソースが限られているため、このプロジェクトの英語および日本語のドキュメントREADME ドキュメントを除く)のメンテナンスは行われません。このドキュメントの内容は最新版のプロジェクトと一致しない場合があります。翻訳のお手伝いをしていただける場合は、関連するプルリクエストを提出してください。**
## ソフトウェアの概要
Auto Caption は、クロスプラットフォームの字幕表示ソフトウェアで、システムの音声入力(録音)または出力(音声再生)のストリーミングデータをリアルタイムで取得し、音声からテキストに変換するモデルを利用して対応する音声の字幕を生成します。このソフトウェアが提供するデフォルトの字幕エンジン(アリババクラウド Gummy モデルを使用は、9つの言語中国語、英語、日本語、韓国語、ドイツ語、フランス語、ロシア語、スペイン語、イタリア語の認識と翻訳をサポートしています。
@@ -41,19 +43,11 @@ macOS プラットフォームでオーディオ出力を取得するには追
- [API KEY の取得(中国語)](https://help.aliyun.com/zh/model-studio/get-api-key)
- [環境変数を通じて API Key を設定(中国語)](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
## GLM エンジン使用前の準備
まずAPI KEYを取得する必要があります。参考[クイックスタート](https://docs.bigmodel.cn/en/guide/start/quick-start)。
## Voskエンジン使用前の準備
Voskローカル字幕エンジンを使用するには、まず[Vosk Models](https://alphacephei.com/vosk/models)ページから必要なモデルをダウンロードしてください。その後、ダウンロードしたモデルパッケージをローカルに解凍し、対応するモデルフォルダのパスをソフトウェア設定に追加します。
Voskローカル字幕エンジンを使用するには、まず[Vosk Models](https://alphacephei.com/vosk/models)ページから必要なモデルをダウンロードしてください。その後、ダウンロードしたモデルパッケージをローカルに解凍し、対応するモデルフォルダのパスをソフトウェア設定に追加します。現在、Vosk字幕エンジンは字幕の翻訳をサポートしていません。
![](../../assets/media/config_ja.png)
## SOSVモデルの使用
SOSVモデルの使用方法はVoskと同じで、ダウンロードアドレスは以下の通りですhttps://github.com/HiMeditator/auto-caption/releases/tag/sosv-model
![](../../assets/media/vosk_ja.png)
## macOS でのシステムオーディオ出力の取得方法

View File

@@ -1,6 +1,6 @@
# Auto Caption 用户手册
对应版本v1.1.0
对应版本v0.6.0
## 软件简介
@@ -39,19 +39,11 @@ Auto Caption 是一个跨平台的字幕显示软件,能够实时获取系统
- [获取 API KEY](https://help.aliyun.com/zh/model-studio/get-api-key)
- [将 API Key 配置到环境变量](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
## GLM 引擎使用前准备
需要先获取 API KEY参考[Quick Start](https://docs.bigmodel.cn/en/guide/start/quick-start)。
## Vosk 引擎使用前准备
如果要使用 Vosk 本地字幕引擎,首先需要在 [Vosk Models](https://alphacephei.com/vosk/models) 页面下载你需要的模型。然后将下载的模型安装包解压到本地,并将对应的模型文件夹的路径添加到软件的设置中。
如果要使用 Vosk 本地字幕引擎,首先需要在 [Vosk Models](https://alphacephei.com/vosk/models) 页面下载你需要的模型。然后将下载的模型安装包解压到本地,并将对应的模型文件夹的路径添加到软件的设置中。目前 Vosk 字幕引擎还不支持翻译字幕内容。
![](../../assets/media/config_zh.png)
## 使用 SOSV 模型
使用 SOSV 模型的方式和 Vosk 一样下载地址如下https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model
![](../../assets/media/vosk_zh.png)
## macOS 获取系统音频输出
@@ -132,199 +124,3 @@ sudo yum install pulseaudio pavucontrol
注意使用自定义字幕引擎时,前面的字幕引擎的设置将全部不起作用,自定义字幕引擎的配置完全通过引擎指令进行配置。
如果你是开发者,想开发自定义字幕引擎,请查看[字幕引擎说明文档](../engine-manual/zh.md)。
## 单独使用字幕引擎
### 运行参数说明
> 以下内容默认用户对使用终端运行程序有一定了解。
字幕引擎可以使用的完整运行参数如下:
![](../img/06.png)
而在单独使用时其中某些参数并不需要使用,或者不适合进行修改。
下面的运行参数说明仅包含必要的参数。
#### `-e , --caption_engine`
需要选择的字幕引擎模型,目前有四个可用,分别为:`gummy, glm, vosk, sosv`
该项的默认值为 `gummy`
该项适用于所有模型。
#### `-a, --audio_type`
需要识别的音频类型,其中 `0` 表示系统音频输出,`1` 表示麦克风音频输入。
该项的默认值为 `0`
该项适用于所有模型。
#### `-d, --display_caption`
是否在控制台显示字幕,`0` 表示不显示,`1` 表示显示。
该项默认值为 `0`,只使用字幕引擎的话建议选 `1`
该项适用于所有模型。
#### `-t, --target_language`
> 其中 Vosk 和 SOSV 模型分句效果较差,会导致翻译内容难以理解,不太建议这两个模型使用翻译。
需要翻译成的目标语言,所有模型都支持的翻译语言如下:
- `none` 不进行翻译
- `zh` 简体中文
- `en` 英语
- `ja` 日语
- `ko` 韩语
除此之外 `vosk``sosv` 模型还支持如下翻译:
- `de` 德语
- `fr` 法语
- `ru` 俄语
- `es` 西班牙语
- `it` 意大利语
该项的默认值为 `none`
该项适用于所有模型。
#### `-s, --source_language`
需要识别的语言的源语言,默认值为 `auto`,表示不指定源语言。
但是指定源语言能在一定程度上提高识别准确率,可以使用上面的语言代码指定源语言。
该项适用于 Gummy、GLM 和 SOSV 模型。
其中 Gummy 模型可以使用上述全部的语言,再加上粤语(`yue`)。
GLM 模型支持指定的语言有:英语、中文、日语、韩语。
SOSV 模型支持指定的语言有:英语、中文、日语、韩语、粤语。
#### `-k, --api_key`
指定 `Gummy` 模型需要使用的阿里云 API KEY。
该项默认值为空。
该项仅适用于 Gummy 模型。
#### `-gkey, --glm_api_key`
指定 `glm` 模型需要使用的 API KEY默认为空。
#### `-gmodel, --glm_model`
指定 `glm` 模型需要使用的模型名称,默认为 `glm-asr-2512`
#### `-gurl, --glm_url`
指定 `glm` 模型需要使用的 API URL默认值为`https://open.bigmodel.cn/api/paas/v4/audio/transcriptions`
#### `-tm, --translation_model`
指定 Vosk 和 SOSV 模型的翻译方式,默认为 `ollama`
该项支持的值有:
- `ollama` 使用本地 Ollama 模型进行翻译,需要用户安装 Ollama 软件和对应的模型
- `google` 使用 Google 翻译 API 进行翻译,无需额外配置,但是需要有能访问 Google 的网络
该项仅适用于 Vosk 和 SOSV 模型。
#### `-omn, --ollama_name`
指定要使用的翻译模型名称,可以是 Ollama 本地模型,也可以是 OpenAI API 兼容的云端模型。若未填写 Base URL 字段,则默认调用本地 Ollama 服务,否则会通过 Python OpenAI 库调用该地址指向的 API 服务。
如果使用 Ollama 模型,建议使用参数量小于 1B 的模型,比如: `qwen2.5:0.5b`, `qwen3:0.6b`。需要在 Ollama 中下载了对应的模型才能正常使用。
默认值为空,适用于除了 Gummy 外的其他模型。
#### `-ourl, --ollama_url`
调用 OpenAI API 的基础请求地址,如果不填写则调用本地默认端口的 Ollama 模型。
默认值为空,适用于除了 Gummy 外的其他模型。
#### `-okey, --ollama_api_key`
指定调用 OpenAI 兼容模型的 API KEY。
默认值为空,适用于除了 Gummy 外的其他模型。
#### `-vosk, --vosk_model`
指定需要调用的 Vosk 模型的本地文件夹的路径。该项默认值为空。
该项仅适用于 Vosk 模型。
#### `-sosv, --sosv_model`
指定需要调用的 SOSV 模型的本地文件夹的路径。该项默认值为空。
该项仅适用于 SOSV 模型。
### 使用源代码运行字幕引擎
> 以下内容默认使用该方式的用户对 Python 环境配置和使用有所了解。
首先下载项目源代码到本地,其中字幕引擎源代码在项目的 `engine` 目录下。然后配置 Python 环境,其中项目依赖的 Python 包在 `engine` 目录下 `requirements.txt` 文件中。
配置好后进入 `engine` 目录,执行命令运行字幕引擎。
比如要使用 Gummy 模型,指定音频类型为系统音频输出,源语言为英语,翻译语言为中文,执行的命令如下:
> 注意:为了更直观,下面的命令写在了多行,如果执行失败,尝试去掉反斜杠,并改换单行命令执行。
```bash
python main.py \
-e gummy \
-k sk-******************************** \
-a 0 \
-d 1 \
-s en \
-t zh
```
指定 Vosk 模型,指定音频类型为系统音频输出,翻译语言为英语,使用 Ollama `qwen3:0.6b` 模型进行翻译:
```bash
python main.py \
-e vosk \
-vosk D:\Projects\auto-caption\engine\models\vosk-model-small-cn-0.22 \
-a 0 \
-d 1 \
-t en \
-omn qwen3:0.6b
```
指定 SOSV 模型,指定音频类型为麦克风,自动选择源语言,不翻译,执行的命令如下:
```bash
python main.py \
-e sosv \
-sosv D:\\Projects\\auto-caption\\engine\\models\\sosv-int8 \
-a 1 \
-d 1 \
-s auto \
-t none
```
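指定 GLM 引擎的用法类似,一个最小示例(假设已获取 API KEY,并使用默认的模型名与 API URL)如下:
```bash
python main.py \
-e glm \
-gkey <glm-api-key> \
-a 0 \
-d 1 \
-s zh \
-t none
```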
使用 Gummy 模型的运行效果如下:
![](../img/07.png)
### 运行字幕引擎可执行文件
首先在 [GitHub Release](https://github.com/HiMeditator/auto-caption/releases/tag/engine) 中下载对应平台的可执行文件(目前仅提供 Windows 和 Linux 平台的字幕引擎可执行文件)。
然后在字幕引擎可执行文件所在目录打开终端,执行命令运行字幕引擎。
只需要将上述指令中的 `python main.py` 替换为可执行文件名称即可(比如:`engine-win.exe`)。

View File

@@ -1,5 +1,5 @@
appId: com.himeditator.autocaption
productName: Auto Caption
productName: auto-caption
directories:
buildResources: build
files:
@@ -13,15 +13,13 @@ files:
- '!engine/*'
- '!docs/*'
- '!assets/*'
- '!.repomap/*'
- '!.virtualme/*'
extraResources:
# For Windows
- from: ./engine/dist/main.exe
to: ./engine/main.exe
# For macOS and Linux
- from: ./engine/dist/main
to: ./engine/main
# - from: ./engine/dist/main
# to: ./engine/main
win:
executableName: auto-caption
icon: build/icon.png

View File

@@ -1,4 +1,3 @@
from dashscope.common.error import InvalidParameter
from .gummy import GummyRecognizer
from .vosk import VoskRecognizer
from .sosv import SosvRecognizer
from .glm import GlmRecognizer
from .vosk import VoskRecognizer

View File

@@ -1,163 +0,0 @@
import threading
import io
import wave
import struct
import math
import audioop
import requests
from datetime import datetime
from utils import shared_data
from utils import stdout_cmd, stdout_obj, google_translate, ollama_translate
class GlmRecognizer:
"""
使用 GLM-ASR 引擎处理音频数据,并在标准输出中输出 Auto Caption 软件可读取的 JSON 字符串数据
初始化参数:
url: GLM-ASR API URL
model: GLM-ASR 模型名称
api_key: GLM-ASR API Key
source: 源语言
target: 目标语言
trans_model: 翻译模型名称
ollama_name: Ollama 模型名称
"""
def __init__(self, url: str, model: str, api_key: str, source: str, target: str | None, trans_model: str, ollama_name: str, ollama_url: str = '', ollama_api_key: str = ''):
self.url = url
self.model = model
self.api_key = api_key
self.source = source
self.target = target
if trans_model == 'google':
self.trans_func = google_translate
else:
self.trans_func = ollama_translate
self.ollama_name = ollama_name
self.ollama_url = ollama_url
self.ollama_api_key = ollama_api_key
self.audio_buffer = []
self.is_speech = False
self.silence_frames = 0
self.speech_start_time = None
self.time_str = ''
self.cur_id = 0
# VAD settings (假设 16k 16bit, chunk size 1024 or similar)
# 16bit = 2 bytes per sample.
# RMS threshold needs tuning. 500 is a conservative guess for silence.
self.threshold = 500
self.silence_limit = 15 # frames (approx 0.5-1s depending on chunk size)
self.min_speech_frames = 10 # frames
def start(self):
"""启动 GLM 引擎"""
stdout_cmd('info', 'GLM-ASR recognizer started.')
def stop(self):
"""停止 GLM 引擎"""
stdout_cmd('info', 'GLM-ASR recognizer stopped.')
def process_audio(self, chunk):
# chunk is bytes (int16)
rms = audioop.rms(chunk, 2)
if rms > self.threshold:
if not self.is_speech:
self.is_speech = True
self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
self.audio_buffer = []
self.audio_buffer.append(chunk)
self.silence_frames = 0
else:
if self.is_speech:
self.audio_buffer.append(chunk)
self.silence_frames += 1
if self.silence_frames > self.silence_limit:
# Speech ended
if len(self.audio_buffer) > self.min_speech_frames:
self.recognize(self.audio_buffer, self.time_str)
self.is_speech = False
self.audio_buffer = []
self.silence_frames = 0
def recognize(self, audio_frames, time_s):
audio_bytes = b''.join(audio_frames)
wav_io = io.BytesIO()
with wave.open(wav_io, 'wb') as wav_file:
wav_file.setnchannels(1)
wav_file.setsampwidth(2)
wav_file.setframerate(16000)
wav_file.writeframes(audio_bytes)
wav_io.seek(0)
threading.Thread(
target=self._do_request,
args=(wav_io.read(), time_s, self.cur_id)
).start()
self.cur_id += 1
def _do_request(self, audio_content, time_s, index):
try:
files = {
'file': ('audio.wav', audio_content, 'audio/wav')
}
data = {
'model': self.model,
'stream': 'false'
}
headers = {
'Authorization': f'Bearer {self.api_key}'
}
response = requests.post(self.url, headers=headers, data=data, files=files, timeout=15)
if response.status_code == 200:
res_json = response.json()
text = res_json.get('text', '')
if text:
self.output_caption(text, time_s, index)
else:
try:
err_msg = response.json()
stdout_cmd('error', f"GLM API Error: {err_msg}")
except:
stdout_cmd('error', f"GLM API Error: {response.text}")
except Exception as e:
stdout_cmd('error', f"GLM Request Failed: {str(e)}")
def output_caption(self, text, time_s, index):
caption = {
'command': 'caption',
'index': index,
'time_s': time_s,
'time_t': datetime.now().strftime('%H:%M:%S.%f')[:-3],
'text': text,
'translation': ''
}
if self.target:
if self.trans_func == ollama_translate:
th = threading.Thread(
target=self.trans_func,
args=(self.ollama_name, self.target, caption['text'], time_s, self.ollama_url, self.ollama_api_key),
daemon=True
)
else:
th = threading.Thread(
target=self.trans_func,
args=(self.ollama_name, self.target, caption['text'], time_s),
daemon=True
)
th.start()
stdout_obj(caption)
def translate(self):
global shared_data
while shared_data.status == 'running':
chunk = shared_data.chunk_queue.get()
self.process_audio(chunk)

View File

@@ -5,10 +5,9 @@ from dashscope.audio.asr import (
TranslationRecognizerRealtime
)
import dashscope
from dashscope.common.error import InvalidParameter
from datetime import datetime
from utils import stdout_cmd, stdout_obj, stdout_err
from utils import shared_data
from utils import stdout_cmd, stdout_obj, stderr
class Callback(TranslationRecognizerCallback):
"""
@@ -91,23 +90,9 @@ class GummyRecognizer:
"""启动 Gummy 引擎"""
self.translator.start()
def translate(self):
"""持续读取共享数据中的音频帧,并进行语音识别将识别结果输出到标准输出中"""
global shared_data
restart_count = 0
while shared_data.status == 'running':
chunk = shared_data.chunk_queue.get()
try:
self.translator.send_audio_frame(chunk)
except InvalidParameter as e:
restart_count += 1
if restart_count > 5:
stdout_err(str(e))
shared_data.status = "kill"
stdout_cmd('kill')
break
else:
stdout_cmd('info', f'Gummy engine stopped, restart attempt: {restart_count}...')
def send_audio_frame(self, data):
"""发送音频帧,擎将自动识别将识别结果输出到标准输出中"""
self.translator.send_audio_frame(data)
def stop(self):
"""停止 Gummy 引擎"""

View File

@@ -1,178 +0,0 @@
"""
Sherpa-ONNX SenseVoice Model
This code file references the following:
https://github.com/k2-fsa/sherpa-onnx/blob/master/python-api-examples/simulate-streaming-sense-voice-microphone.py
"""
import time
from datetime import datetime
import sherpa_onnx
import threading
import numpy as np
from utils import shared_data
from utils import stdout_cmd, stdout_obj
from utils import google_translate, ollama_translate
class SosvRecognizer:
"""
使用 Sense Voice 非流式模型处理流式音频数据,并在标准输出中输出 Auto Caption 软件可读取的 JSON 字符串数据
初始化参数:
model_path: Sherpa-ONNX SenseVoice 识别模型路径
vad_model: Silero VAD 模型路径
source: 识别源语言(auto, zh, en, ja, ko, yue)
target: 翻译目标语言
trans_model: 翻译模型名称
ollama_name: Ollama 模型名称
"""
def __init__(self, model_path: str, source: str, target: str | None, trans_model: str, ollama_name: str, ollama_url: str = '', ollama_api_key: str = ''):
if model_path.startswith('"'):
model_path = model_path[1:]
if model_path.endswith('"'):
model_path = model_path[:-1]
self.model_path = model_path
self.ext = ""
if self.model_path[-4:] == "int8":
self.ext = ".int8"
self.source = source
self.target = target
if trans_model == 'google':
self.trans_func = google_translate
else:
self.trans_func = ollama_translate
self.ollama_name = ollama_name
self.ollama_url = ollama_url
self.ollama_api_key = ollama_api_key
self.time_str = ''
self.cur_id = 0
self.prev_content = ''
def start(self):
"""启动 Sense Voice 模型"""
self.recognizer = sherpa_onnx.OfflineRecognizer.from_sense_voice(
model=f"{self.model_path}/sensevoice/model{self.ext}.onnx",
tokens=f"{self.model_path}/sensevoice/tokens.txt",
language=self.source,
num_threads = 2,
)
vad_config = sherpa_onnx.VadModelConfig()
vad_config.silero_vad.model = f"{self.model_path}/silero_vad.onnx"
vad_config.silero_vad.threshold = 0.5
vad_config.silero_vad.min_silence_duration = 0.1
vad_config.silero_vad.min_speech_duration = 0.25
vad_config.silero_vad.max_speech_duration = 5
vad_config.sample_rate = 16000
self.window_size = vad_config.silero_vad.window_size
self.vad = sherpa_onnx.VoiceActivityDetector(vad_config, buffer_size_in_seconds=100)
if self.source == 'en':
model_config = sherpa_onnx.OnlinePunctuationModelConfig(
cnn_bilstm=f"{self.model_path}/punct-en/model{self.ext}.onnx",
bpe_vocab=f"{self.model_path}/punct-en/bpe.vocab"
)
punct_config = sherpa_onnx.OnlinePunctuationConfig(
model_config=model_config,
)
self.punct = sherpa_onnx.OnlinePunctuation(punct_config)
else:
punct_config = sherpa_onnx.OfflinePunctuationConfig(
model=sherpa_onnx.OfflinePunctuationModelConfig(
ct_transformer=f"{self.model_path}/punct/model{self.ext}.onnx"
),
)
self.punct = sherpa_onnx.OfflinePunctuation(punct_config)
self.buffer = []
self.offset = 0
self.started = False
self.started_time = .0
self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
stdout_cmd('info', 'Sherpa-ONNX SenseVoice recognizer started.')
def send_audio_frame(self, data: bytes):
"""
发送音频帧给 SOSV 引擎,引擎将自动识别并将识别结果输出到标准输出中
Args:
data: 音频帧数据,采样率必须为 16000Hz
"""
caption = {}
caption['command'] = 'caption'
caption['translation'] = ''
data_np = np.frombuffer(data, dtype=np.int16).astype(np.float32)
self.buffer = np.concatenate([self.buffer, data_np])
while self.offset + self.window_size < len(self.buffer):
self.vad.accept_waveform(self.buffer[self.offset: self.offset + self.window_size])
if not self.started and self.vad.is_speech_detected():
self.started = True
self.started_time = time.time()
self.offset += self.window_size
if not self.started:
if len(self.buffer) > 10 * self.window_size:
self.offset -= len(self.buffer) - 10 * self.window_size
self.buffer = self.buffer[-10 * self.window_size:]
if self.started and time.time() - self.started_time > 0.2:
stream = self.recognizer.create_stream()
stream.accept_waveform(16000, self.buffer)
self.recognizer.decode_stream(stream)
text = stream.result.text.strip()
if text and self.prev_content != text:
caption['index'] = self.cur_id
caption['text'] = text
caption['time_s'] = self.time_str
caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
self.prev_content = text
stdout_obj(caption)
self.started_time = time.time()
while not self.vad.empty():
stream = self.recognizer.create_stream()
stream.accept_waveform(16000, self.vad.front.samples)
self.vad.pop()
self.recognizer.decode_stream(stream)
text = stream.result.text.strip()
if self.source == 'en':
text_with_punct = self.punct.add_punctuation_with_case(text)
else:
text_with_punct = self.punct.add_punctuation(text)
caption['index'] = self.cur_id
caption['text'] = text_with_punct
caption['time_s'] = self.time_str
caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
if text:
stdout_obj(caption)
if self.target:
th = threading.Thread(
target=self.trans_func,
args=(self.ollama_name, self.target, caption['text'], self.time_str, self.ollama_url, self.ollama_api_key),
daemon=True
)
th.start()
self.cur_id += 1
self.prev_content = ''
self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
self.buffer = []
self.offset = 0
self.started = False
self.started_time = .0
def translate(self):
"""持续读取共享数据中的音频帧,并进行语音识别,将识别结果输出到标准输出中"""
global shared_data
while shared_data.status == 'running':
chunk = shared_data.chunk_queue.get()
self.send_audio_frame(chunk)
def stop(self):
"""停止 Sense Voice 模型"""
stdout_cmd('info', 'Sherpa-ONNX SenseVoice recognizer closed.')

View File

@@ -1,11 +1,8 @@
import json
import threading
import time
from datetime import datetime
from vosk import Model, KaldiRecognizer, SetLogLevel
from utils import shared_data
from utils import stdout_cmd, stdout_obj, google_translate, ollama_translate
from utils import stdout_cmd, stdout_obj
class VoskRecognizer:
@@ -14,25 +11,14 @@ class VoskRecognizer:
初始化参数:
model_path: Vosk 识别模型路径
target: 翻译目标语言
trans_model: 翻译模型名称
ollama_name: Ollama 模型名称
"""
def __init__(self, model_path: str, target: str | None, trans_model: str, ollama_name: str, ollama_url: str = '', ollama_api_key: str = ''):
def __init__(self, model_path: str):
SetLogLevel(-1)
if model_path.startswith('"'):
model_path = model_path[1:]
if model_path.endswith('"'):
model_path = model_path[:-1]
self.model_path = model_path
self.target = target
if trans_model == 'google':
self.trans_func = google_translate
else:
self.trans_func = ollama_translate
self.ollama_name = ollama_name
self.ollama_url = ollama_url
self.ollama_api_key = ollama_api_key
self.time_str = ''
self.cur_id = 0
self.prev_content = ''
@@ -62,16 +48,7 @@ class VoskRecognizer:
caption['time_s'] = self.time_str
caption['time_t'] = datetime.now().strftime('%H:%M:%S.%f')[:-3]
self.prev_content = ''
if content == '': return
self.cur_id += 1
if self.target:
th = threading.Thread(
target=self.trans_func,
args=(self.ollama_name, self.target, caption['text'], self.time_str, self.ollama_url, self.ollama_api_key),
daemon=True
)
th.start()
else:
content = json.loads(self.recognizer.PartialResult()).get('partial', '')
if content == '' or content == self.prev_content:
@@ -86,13 +63,6 @@ class VoskRecognizer:
stdout_obj(caption)
def translate(self):
"""持续读取共享数据中的音频帧,并进行语音识别,将识别结果输出到标准输出中"""
global shared_data
while shared_data.status == 'running':
chunk = shared_data.chunk_queue.get()
self.send_audio_frame(chunk)
def stop(self):
"""停止 Vosk 引擎"""
stdout_cmd('info', 'Vosk recognizer closed.')

View File

@@ -1,223 +1,86 @@
import wave
import argparse
import threading
import datetime
from utils import stdout, stdout_cmd, change_caption_display
from utils import shared_data, start_server
from utils import stdout_cmd, stderr
from utils import thread_data, start_server
from utils import merge_chunk_channels, resample_chunk_mono
from audio2text import GummyRecognizer
from audio2text import InvalidParameter, GummyRecognizer
from audio2text import VoskRecognizer
from audio2text import SosvRecognizer
from audio2text import GlmRecognizer
from sysaudio import AudioStream
def audio_recording(stream: AudioStream, resample: bool, record = False, path = ''):
global shared_data
stream.open_stream()
wf = None
full_name = ''
if record:
if path != '':
if path.startswith('"') and path.endswith('"'):
path = path[1:-1]
if path[-1] != '/':
path += '/'
cur_dt = datetime.datetime.now()
name = cur_dt.strftime("audio-%Y-%m-%dT%H-%M-%S")
full_name = f'{path}{name}.wav'
wf = wave.open(full_name, 'wb')
wf.setnchannels(stream.CHANNELS)
wf.setsampwidth(stream.SAMP_WIDTH)
wf.setframerate(stream.RATE)
stdout_cmd("info", "Audio recording...")
while shared_data.status == 'running':
raw_chunk = stream.read_chunk()
if raw_chunk is None: continue
if record: wf.writeframes(raw_chunk) # type: ignore
if resample:
chunk = resample_chunk_mono(raw_chunk, stream.CHANNELS, stream.RATE, 16000)
else:
chunk = merge_chunk_channels(raw_chunk, stream.CHANNELS)
shared_data.chunk_queue.put(chunk)
if record:
stdout_cmd("info", f"Audio saved to {full_name}")
wf.close() # type: ignore
stream.close_stream_signal()
def main_gummy(s: str, t: str, a: int, c: int, k: str, r: bool, rp: str):
"""
Parameters:
s: Source language
t: Target language
k: Aliyun Bailian API key
r: Whether to record the audio
rp: Path to save the recorded audio
"""
def main_gummy(s: str, t: str, a: int, c: int, k: str):
global thread_data
stream = AudioStream(a, c)
if t == 'none':
engine = GummyRecognizer(stream.RATE, s, None, k)
else:
engine = GummyRecognizer(stream.RATE, s, t, k)
stream.open_stream()
engine.start()
stream_thread = threading.Thread(
target=audio_recording,
args=(stream, False, r, rp),
daemon=True
)
stream_thread.start()
try:
engine.translate()
except KeyboardInterrupt:
stdout("Keyboard interrupt detected. Exiting...")
restart_count = 0
while thread_data.status == "running":
try:
chunk = stream.read_chunk()
if chunk is None: continue
chunk_mono = merge_chunk_channels(chunk, stream.CHANNELS)
try:
engine.send_audio_frame(chunk_mono)
except InvalidParameter as e:
restart_count += 1
if restart_count > 8:
stderr(str(e))
thread_data.status = "kill"
break
else:
stdout_cmd('info', f'Gummy engine stopped, trying to restart #{restart_count}')
except KeyboardInterrupt:
break
stream.close_stream()
engine.stop()
def main_vosk(a: int, c: int, vosk: str, t: str, tm: str, omn: str, ourl: str, okey: str, r: bool, rp: str):
"""
Parameters:
a: Audio source: 0 for output, 1 for input
c: Chunk number in 1 second
vosk: Vosk model path
t: Target language
tm: Translation model type, ollama or google
omn: Ollama model name
ourl: Ollama Base URL
okey: Ollama API Key
r: Whether to record the audio
rp: Path to save the recorded audio
"""
def main_vosk(a: int, c: int, m: str):
global thread_data
stream = AudioStream(a, c)
if t == 'none':
engine = VoskRecognizer(vosk, None, tm, omn, ourl, okey)
else:
engine = VoskRecognizer(vosk, t, tm, omn, ourl, okey)
engine = VoskRecognizer(m)
stream.open_stream()
engine.start()
stream_thread = threading.Thread(
target=audio_recording,
args=(stream, True, r, rp),
daemon=True
)
stream_thread.start()
try:
engine.translate()
except KeyboardInterrupt:
stdout("Keyboard interrupt detected. Exiting...")
while thread_data.status == "running":
try:
chunk = stream.read_chunk()
if chunk is None: continue
chunk_mono = resample_chunk_mono(chunk, stream.CHANNELS, stream.RATE, 16000)
engine.send_audio_frame(chunk_mono)
except KeyboardInterrupt:
break
stream.close_stream()
engine.stop()
def main_sosv(a: int, c: int, sosv: str, s: str, t: str, tm: str, omn: str, ourl: str, okey: str, r: bool, rp: str):
"""
Parameters:
a: Audio source: 0 for output, 1 for input
c: Chunk number in 1 second
sosv: Sherpa-ONNX SenseVoice model path
s: Source language
t: Target language
tm: Translation model type, ollama or google
omn: Ollama model name
ourl: Ollama API URL
okey: Ollama API Key
r: Whether to record the audio
rp: Path to save the recorded audio
"""
stream = AudioStream(a, c)
if t == 'none':
engine = SosvRecognizer(sosv, s, None, tm, omn, ourl, okey)
else:
engine = SosvRecognizer(sosv, s, t, tm, omn, ourl, okey)
engine.start()
stream_thread = threading.Thread(
target=audio_recording,
args=(stream, True, r, rp),
daemon=True
)
stream_thread.start()
try:
engine.translate()
except KeyboardInterrupt:
stdout("Keyboard interrupt detected. Exiting...")
engine.stop()
def main_glm(a: int, c: int, url: str, model: str, key: str, s: str, t: str, tm: str, omn: str, ourl: str, okey: str, r: bool, rp: str):
"""
Parameters:
a: Audio source
c: Chunk rate
url: GLM API URL
model: GLM Model Name
key: GLM API Key
s: Source language
t: Target language
tm: Translation model
omn: Ollama model name
ourl: Ollama API URL
okey: Ollama API Key
r: Record
rp: Record path
"""
stream = AudioStream(a, c)
if t == 'none':
engine = GlmRecognizer(url, model, key, s, None, tm, omn, ourl, okey)
else:
engine = GlmRecognizer(url, model, key, s, t, tm, omn, ourl, okey)
engine.start()
stream_thread = threading.Thread(
target=audio_recording,
args=(stream, True, r, rp),
daemon=True
)
stream_thread.start()
try:
engine.translate()
except KeyboardInterrupt:
stdout("Keyboard interrupt detected. Exiting...")
engine.stop()
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Convert system audio stream to text')
# all
parser.add_argument('-e', '--caption_engine', default='gummy', help='Caption engine: gummy, glm, vosk or sosv')
parser.add_argument('-a', '--audio_type', type=int, default=0, help='Audio stream source: 0 for output, 1 for input')
parser.add_argument('-c', '--chunk_rate', type=int, default=10, help='Number of audio stream chunks collected per second')
parser.add_argument('-p', '--port', type=int, default=0, help='The port to run the server on, 0 for no server')
parser.add_argument('-d', '--display_caption', type=int, default=0, help='Display caption on terminal, 0 for no display, 1 for display')
parser.add_argument('-t', '--target_language', default='none', help='Target language code, "none" for no translation')
parser.add_argument('-r', '--record', type=int, default=0, help='Whether to record the audio, 0 for no recording, 1 for recording')
parser.add_argument('-rp', '--record_path', default='', help='Path to save the recorded audio')
# gummy and sosv and glm
parser.add_argument('-s', '--source_language', default='auto', help='Source language code')
# both
parser.add_argument('-e', '--caption_engine', default='gummy', help='Caption engine: gummy or vosk')
parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output, 1 for input')
parser.add_argument('-c', '--chunk_rate', default=20, help='Number of audio stream chunks collected per second')
parser.add_argument('-p', '--port', default=8080, help='The port to run the server on, 0 for no server')
# gummy only
parser.add_argument('-s', '--source_language', default='en', help='Source language code')
parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
# vosk and sosv
parser.add_argument('-tm', '--translation_model', default='ollama', help='Model for translation: ollama or google')
parser.add_argument('-omn', '--ollama_name', default='', help='Ollama model name for translation')
parser.add_argument('-ourl', '--ollama_url', default='', help='Ollama API URL')
parser.add_argument('-okey', '--ollama_api_key', default='', help='Ollama API Key')
# vosk only
parser.add_argument('-vosk', '--vosk_model', default='', help='The path to the vosk model.')
# sosv only
parser.add_argument('-sosv', '--sosv_model', default=None, help='The SenseVoice model path')
# glm only
parser.add_argument('-gurl', '--glm_url', default='https://open.bigmodel.cn/api/paas/v4/audio/transcriptions', help='GLM API URL')
parser.add_argument('-gmodel', '--glm_model', default='glm-asr-2512', help='GLM Model Name')
parser.add_argument('-gkey', '--glm_api_key', default='', help='GLM API Key')
parser.add_argument('-m', '--model_path', default='', help='The path to the vosk model.')
args = parser.parse_args()
if args.port != 0:
threading.Thread(target=start_server, args=(args.port,), daemon=True).start()
if args.display_caption == 1:
change_caption_display(True)
args = parser.parse_args()
if int(args.port) == 0:
thread_data.status = "running"
else:
start_server(int(args.port))
if args.caption_engine == 'gummy':
main_gummy(
@@ -225,55 +88,16 @@ if __name__ == "__main__":
args.target_language,
int(args.audio_type),
int(args.chunk_rate),
args.api_key,
bool(int(args.record)),
args.record_path
args.api_key
)
elif args.caption_engine == 'vosk':
main_vosk(
int(args.audio_type),
int(args.chunk_rate),
args.vosk_model,
args.target_language,
args.translation_model,
args.ollama_name,
args.ollama_url,
args.ollama_api_key,
bool(int(args.record)),
args.record_path
)
elif args.caption_engine == 'sosv':
main_sosv(
int(args.audio_type),
int(args.chunk_rate),
args.sosv_model,
args.source_language,
args.target_language,
args.translation_model,
args.ollama_name,
args.ollama_url,
args.ollama_api_key,
bool(int(args.record)),
args.record_path
)
elif args.caption_engine == 'glm':
main_glm(
int(args.audio_type),
int(args.chunk_rate),
args.glm_url,
args.glm_model,
args.glm_api_key,
args.source_language,
args.target_language,
args.translation_model,
args.ollama_name,
args.ollama_url,
args.ollama_api_key,
bool(int(args.record)),
args.record_path
args.model_path
)
else:
raise ValueError('Invalid caption engine specified.')
if shared_data.status == "kill":
if thread_data.status == "kill":
stdout_cmd('kill')

View File

@@ -4,14 +4,9 @@ from pathlib import Path
import sys
if sys.platform == 'win32':
vosk_path = str(Path('./.venv/Lib/site-packages/vosk').resolve())
vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
else:
venv_lib = Path('./.venv/lib')
python_dirs = list(venv_lib.glob('python*'))
if python_dirs:
vosk_path = str((python_dirs[0] / 'site-packages' / 'vosk').resolve())
else:
vosk_path = str(Path('./.venv/lib/python3.12/site-packages/vosk').resolve())
vosk_path = str(Path('./subenv/lib/python3.12/site-packages/vosk').resolve())
a = Analysis(
['main.py'],

View File

@@ -1,12 +0,0 @@
dashscope
numpy
resampy
vosk
pyinstaller
pyaudio; sys_platform == 'darwin'
pyaudiowpatch; sys_platform == 'win32'
googletrans
ollama
sherpa_onnx
requests
openai

View File

@@ -0,0 +1,6 @@
dashscope
numpy
samplerate
PyAudio
vosk
pyinstaller

View File

@@ -0,0 +1,5 @@
dashscope
numpy
vosk
pyinstaller
samplerate # pip install samplerate --only-binary=:all:

View File

@@ -0,0 +1,6 @@
dashscope
numpy
samplerate
PyAudioWPatch
vosk
pyinstaller

View File

@@ -22,9 +22,9 @@ class AudioStream:
初始化参数:
audio_type: 0-系统音频输出流(需配合 BlackHole1-系统音频输入流
chunk_rate: 每秒采集音频块的数量,默认为10
chunk_rate: 每秒采集音频块的数量,默认为20
"""
def __init__(self, audio_type=0, chunk_rate=10):
def __init__(self, audio_type=0, chunk_rate=20):
self.audio_type = audio_type
self.mic = pyaudio.PyAudio()
if self.audio_type == 0:
@@ -37,13 +37,8 @@ class AudioStream:
self.FORMAT = pyaudio.paInt16
self.SAMP_WIDTH = pyaudio.get_sample_size(self.FORMAT)
self.CHANNELS = int(self.device["maxInputChannels"])
self.DEFAULT_RATE = int(self.device["defaultSampleRate"])
self.CHUNK_RATE = chunk_rate
self.RATE = 16000
self.CHUNK = self.RATE // self.CHUNK_RATE
self.open_stream()
self.close_stream()
self.RATE = int(self.device["defaultSampleRate"])
self.CHUNK = self.RATE // chunk_rate
def get_info(self):
dev_info = f"""
@@ -71,27 +66,16 @@ class AudioStream:
打开并返回系统音频输出流
"""
if self.stream: return self.stream
try:
self.stream = self.mic.open(
format = self.FORMAT,
channels = int(self.CHANNELS),
rate = self.RATE,
input = True,
input_device_index = int(self.INDEX)
)
except OSError:
self.RATE = self.DEFAULT_RATE
self.CHUNK = self.RATE // self.CHUNK_RATE
self.stream = self.mic.open(
format = self.FORMAT,
channels = int(self.CHANNELS),
rate = self.RATE,
input = True,
input_device_index = int(self.INDEX)
)
self.stream = self.mic.open(
format = self.FORMAT,
channels = int(self.CHANNELS),
rate = self.RATE,
input = True,
input_device_index = int(self.INDEX)
)
return self.stream
def read_chunk(self) -> bytes | None:
def read_chunk(self):
"""
读取音频数据
"""

View File

@@ -41,9 +41,9 @@ class AudioStream:
初始化参数:
audio_type: 0-系统音频输出流不支持不会生效1-系统音频输入流(默认)
chunk_rate: 每秒采集音频块的数量,默认为10
chunk_rate: 每秒采集音频块的数量,默认为20
"""
def __init__(self, audio_type=1, chunk_rate=10):
def __init__(self, audio_type=1, chunk_rate=20):
self.audio_type = audio_type
if self.audio_type == 0:
@@ -55,8 +55,7 @@ class AudioStream:
self.FORMAT = 16
self.SAMP_WIDTH = 2
self.CHANNELS = 2
self.RATE = 16000
self.CHUNK_RATE = chunk_rate
self.RATE = 48000
self.CHUNK = self.RATE // chunk_rate
def get_info(self):
@@ -79,7 +78,7 @@ class AudioStream:
启动音频捕获进程
"""
self.process = subprocess.Popen(
["parec", "-d", self.source, "--format=s16le", "--rate=16000", "--channels=2"],
["parec", "-d", self.source, "--format=s16le", "--rate=48000", "--channels=2"],
stdout=subprocess.PIPE
)

View File

@@ -46,9 +46,9 @@ class AudioStream:
初始化参数:
audio_type: 0-系统音频输出流默认1-系统音频输入流
chunk_rate: 每秒采集音频块的数量,默认为10
chunk_rate: 每秒采集音频块的数量,默认为20
"""
def __init__(self, audio_type=0, chunk_rate=10, chunk_size=-1):
def __init__(self, audio_type=0, chunk_rate=20):
self.audio_type = audio_type
self.mic = pyaudio.PyAudio()
if self.audio_type == 0:
@@ -61,13 +61,8 @@ class AudioStream:
self.FORMAT = pyaudio.paInt16
self.SAMP_WIDTH = pyaudio.get_sample_size(self.FORMAT)
self.CHANNELS = int(self.device["maxInputChannels"])
self.DEFAULT_RATE = int(self.device["defaultSampleRate"])
self.CHUNK_RATE = chunk_rate
self.RATE = 16000
self.CHUNK = self.RATE // self.CHUNK_RATE
self.open_stream()
self.close_stream()
self.RATE = int(self.device["defaultSampleRate"])
self.CHUNK = self.RATE // chunk_rate
def get_info(self):
dev_info = f"""
@@ -95,24 +90,13 @@ class AudioStream:
打开并返回系统音频输出流
"""
if self.stream: return self.stream
try:
self.stream = self.mic.open(
format = self.FORMAT,
channels = self.CHANNELS,
rate = self.RATE,
input = True,
input_device_index = self.INDEX
)
except OSError:
self.RATE = self.DEFAULT_RATE
self.CHUNK = self.RATE // self.CHUNK_RATE
self.stream = self.mic.open(
format = self.FORMAT,
channels = self.CHANNELS,
rate = self.RATE,
input = True,
input_device_index = self.INDEX
)
self.stream = self.mic.open(
format = self.FORMAT,
channels = self.CHANNELS,
rate = self.RATE,
input = True,
input_device_index = self.INDEX
)
return self.stream
def read_chunk(self) -> bytes | None:

View File

@@ -1,6 +1,4 @@
from .audioprcs import merge_chunk_channels, resample_chunk_mono
from .sysout import stdout, stdout_err, stdout_cmd, stdout_obj, stderr
from .sysout import change_caption_display
from .shared import shared_data
from .server import start_server
from .translation import ollama_translate, google_translate
from .audioprcs import merge_chunk_channels, resample_chunk_mono, resample_mono_chunk
from .sysout import stdout, stdout_cmd, stdout_obj, stderr
from .thdata import thread_data
from .server import start_server

View File

@@ -1,4 +1,4 @@
import resampy
import samplerate
import numpy as np
import numpy.core.multiarray # do not remove
@@ -24,15 +24,16 @@ def merge_chunk_channels(chunk: bytes, channels: int) -> bytes:
return chunk_mono.tobytes()
def resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int) -> bytes:
def resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes:
"""
将当前多通道音频数据块转换成单通道音频数据块,进行重采样
将当前多通道音频数据块转换成单通道音频数据块,然后进行重采样
Args:
chunk: 多通道音频数据块
channels: 通道数
orig_sr: 原始采样率
target_sr: 目标采样率
mode: 重采样模式,可选:'sinc_best' | 'sinc_medium' | 'sinc_fastest' | 'zero_order_hold' | 'linear'
Return:
单通道音频数据块
@@ -48,17 +49,28 @@ def resample_chunk_mono(chunk: bytes, channels: int, orig_sr: int, target_sr: in
# (length,)
chunk_mono = np.mean(chunk_np.astype(np.float32), axis=1)
if orig_sr == target_sr:
return chunk_mono.astype(np.int16).tobytes()
chunk_mono_r = resampy.resample(chunk_mono, orig_sr, target_sr)
ratio = target_sr / orig_sr
chunk_mono_r = samplerate.resample(chunk_mono, ratio, converter_type=mode)
chunk_mono_r = np.round(chunk_mono_r).astype(np.int16)
real_len = round(chunk_mono.shape[0] * target_sr / orig_sr)
if(chunk_mono_r.shape[0] != real_len):
print(chunk_mono_r.shape[0], real_len)
if(chunk_mono_r.shape[0] > real_len):
chunk_mono_r = chunk_mono_r[:real_len]
else:
while chunk_mono_r.shape[0] < real_len:
chunk_mono_r = np.append(chunk_mono_r, chunk_mono_r[-1])
return chunk_mono_r.tobytes()
def resample_mono_chunk(chunk: bytes, orig_sr: int, target_sr: int, mode="sinc_best") -> bytes:
"""
将当前单通道音频块进行重采样
Args:
chunk: 单通道音频数据块
orig_sr: 原始采样率
target_sr: 目标采样率
mode: 重采样模式,可选:'sinc_best' | 'sinc_medium' | 'sinc_fastest' | 'zero_order_hold' | 'linear'
Return:
单通道音频数据块
"""
chunk_np = np.frombuffer(chunk, dtype=np.int16)
chunk_np = chunk_np.astype(np.float32)
ratio = target_sr / orig_sr
chunk_r = samplerate.resample(chunk_np, ratio, converter_type=mode)
chunk_r = np.round(chunk_r).astype(np.int16)
return chunk_r.tobytes()

View File

@@ -1,12 +1,12 @@
import socket
import threading
import json
from utils import shared_data, stdout_cmd, stderr
from utils import thread_data, stdout_cmd, stderr
def handle_client(client_socket):
global shared_data
while shared_data.status == 'running':
global thread_data
while thread_data.status == 'running':
try:
data = client_socket.recv(4096).decode('utf-8')
if not data:
@@ -14,13 +14,13 @@ def handle_client(client_socket):
data = json.loads(data)
if data['command'] == 'stop':
shared_data.status = 'stop'
thread_data.status = 'stop'
break
except Exception as e:
stderr(f'Communication error: {e}')
break
shared_data.status = 'stop'
thread_data.status = 'stop'
client_socket.close()

View File

@@ -1,8 +0,0 @@
import queue
class SharedData:
def __init__(self):
self.status = "running"
self.chunk_queue = queue.Queue()
shared_data = SharedData()

View File

@@ -1,58 +1,15 @@
import sys
import json
import sherpa_onnx
display_caption = False
caption_index = -1
display = sherpa_onnx.Display()
def stdout(text: str):
stdout_cmd("print", text)
def stdout_err(text: str):
stdout_cmd("error", text)
def stdout_cmd(command: str, content = ""):
msg = { "command": command, "content": content }
sys.stdout.write(json.dumps(msg) + "\n")
sys.stdout.flush()
def change_caption_display(val: bool):
global display_caption
display_caption = val
def caption_display(obj):
global display_caption
global caption_index
global display
if caption_index >=0 and caption_index != int(obj['index']):
display.finalize_current_sentence()
caption_index = int(obj['index'])
full_text = f"{obj['text']}\n{obj['translation']}"
if obj['translation']:
full_text += "\n"
display.update_text(full_text)
display.display()
def translation_display(obj):
global original_caption
global display
full_text = f"{obj['text']}\n{obj['translation']}"
if obj['translation']:
full_text += "\n"
display.update_text(full_text)
display.display()
display.finalize_current_sentence()
def stdout_obj(obj):
global display_caption
if obj['command'] == 'caption' and display_caption:
caption_display(obj)
return
if obj['command'] == 'translation' and display_caption:
translation_display(obj)
return
sys.stdout.write(json.dumps(obj) + "\n")
sys.stdout.flush()

5
engine/utils/thdata.py Normal file
View File

@@ -0,0 +1,5 @@
class ThreadData:
def __init__(self):
self.status = "running"
thread_data = ThreadData()

View File

@@ -1,83 +0,0 @@
from ollama import chat, Client
from ollama import ChatResponse
try:
from openai import OpenAI
except ImportError:
OpenAI = None
import asyncio
from googletrans import Translator
from .sysout import stdout_cmd, stdout_obj
lang_map = {
'en': 'English',
'es': 'Spanish',
'fr': 'French',
'de': 'German',
'it': 'Italian',
'ru': 'Russian',
'ja': 'Japanese',
'ko': 'Korean',
'zh': 'Chinese',
'zh-cn': 'Chinese'
}
def ollama_translate(model: str, target: str, text: str, time_s: str, url: str = '', key: str = ''):
content = ""
try:
if url:
if OpenAI:
client = OpenAI(base_url=url, api_key=key if key else "ollama")
openai_response = client.chat.completions.create(
model=model,
messages=[
{"role": "system", "content": f"/no_think Translate the following content into {lang_map[target]}, and do not output any additional information."},
{"role": "user", "content": text}
]
)
content = openai_response.choices[0].message.content or ""
else:
client = Client(host=url)
response: ChatResponse = client.chat(
model=model,
messages=[
{"role": "system", "content": f"/no_think Translate the following content into {lang_map[target]}, and do not output any additional information."},
{"role": "user", "content": text}
]
)
content = response.message.content or ""
else:
response: ChatResponse = chat(
model=model,
messages=[
{"role": "system", "content": f"/no_think Translate the following content into {lang_map[target]}, and do not output any additional information."},
{"role": "user", "content": text}
]
)
content = response.message.content or ""
except Exception as e:
stdout_cmd("warn", f"Translation failed: {str(e)}")
return
if content.startswith('<think>'):
index = content.find('</think>')
if index != -1:
content = content[index+8:]
stdout_obj({
"command": "translation",
"time_s": time_s,
"text": text,
"translation": content.strip()
})
def google_translate(model: str, target: str, text: str, time_s: str):
translator = Translator()
try:
res = asyncio.run(translator.translate(text, dest=target))
stdout_obj({
"command": "translation",
"time_s": time_s,
"text": text,
"translation": res.text
})
except Exception as e:
stdout_cmd("warn", f"Google translation request failed, please check your network connection...")

package-lock.json (generated, 71 lines changed)
View File

@@ -1,12 +1,12 @@
{
"name": "auto-caption",
"version": "1.0.0",
"version": "0.6.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "auto-caption",
"version": "1.0.0",
"version": "0.6.0",
"hasInstallScript": true,
"dependencies": {
"@electron-toolkit/preload": "^3.0.1",
@@ -110,7 +110,6 @@
"integrity": "sha512-IaaGWsQqfsQWVLqMn9OB92MNN7zukfVA4s7KKAI0KfrrDsZ0yhi5uV4baBuLuN7n3vsZpwP8asPPcVwApxvjBQ==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"@ampproject/remapping": "^2.2.0",
"@babel/code-frame": "^7.27.1",
@@ -2275,7 +2274,6 @@
"resolved": "https://registry.npmmirror.com/@types/node/-/node-22.15.17.tgz",
"integrity": "sha512-wIX2aSZL5FE+MR0JlvF87BNVrtFWf6AE6rxSE9X7OwnVvoyCQjpzSRJ+M87se/4QCkCiebQAqrJ0y6fwIyi7nw==",
"license": "MIT",
"peer": true,
"dependencies": {
"undici-types": "~6.21.0"
}
@@ -2362,7 +2360,6 @@
"integrity": "sha512-B2MdzyWxCE2+SqiZHAjPphft+/2x2FlO9YBx7eKE1BCb+rqBlQdhtAEhzIEdozHd55DXPmxBdpMygFJjfjjA9A==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"@typescript-eslint/scope-manager": "8.32.0",
"@typescript-eslint/types": "8.32.0",
@@ -2794,7 +2791,6 @@
"integrity": "sha512-NZyJarBfL7nWwIq+FDL6Zp/yHEhePMNnnJ0y3qfieCrmNvYct8uvtiV41UvlSe6apAfk0fY1FbWx+NwfmpvtTg==",
"dev": true,
"license": "MIT",
"peer": true,
"bin": {
"acorn": "bin/acorn"
},
@@ -2855,7 +2851,6 @@
"integrity": "sha512-j3fVLgvTo527anyYyJOGTYJbG+vnnQYvE0m5mmkc1TK+nxAppkCLMIL0aZ4dblVCNoGShhm+kzE4ZUykBoMg4g==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"fast-deep-equal": "^3.1.1",
"fast-json-stable-stringify": "^2.0.0",
@@ -3069,6 +3064,7 @@
"integrity": "sha512-+25nxyyznAXF7Nef3y0EbBeqmGZgeN/BxHX29Rs39djAfaFalmQ89SE6CWyDCHzGL0yt/ycBtNOmGTW0FyGWNw==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"archiver-utils": "^2.1.0",
"async": "^3.2.4",
@@ -3088,6 +3084,7 @@
"integrity": "sha512-bEL/yUb/fNNiNTuUz979Z0Yg5L+LzLxGJz8x79lYmR54fmTIb6ob/hNQgkQnIUDWIFjZVQwl9Xs356I6BAMHfw==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"glob": "^7.1.4",
"graceful-fs": "^4.2.0",
@@ -3110,6 +3107,7 @@
"integrity": "sha512-8p0AUk4XODgIewSi0l8Epjs+EVnWiK7NoDIEGU0HhE7+ZyY8D1IMY7odu5lRrFXGg71L15KG8QrPmum45RTtdA==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"core-util-is": "~1.0.0",
"inherits": "~2.0.3",
@@ -3125,7 +3123,8 @@
"resolved": "https://registry.npmmirror.com/safe-buffer/-/safe-buffer-5.1.2.tgz",
"integrity": "sha512-Gd2UZBJDkXlY7GbJxfsE8/nvKkUEU1G38c1siN6QP6a9PT9MmHB8GnpscSmMJSoF8LOIrt8ud/wPtojys4G6+g==",
"dev": true,
"license": "MIT"
"license": "MIT",
"peer": true
},
"node_modules/archiver-utils/node_modules/string_decoder": {
"version": "1.1.1",
@@ -3133,6 +3132,7 @@
"integrity": "sha512-n/ShnvDi6FHbbVfviro+WojiFzv+s8MPMHBczVePfUpDJLwoLT0ht1l4YwBCbi8pJAveEEdnkHyPyTP/mzRfwg==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"safe-buffer": "~5.1.0"
}
@@ -3351,7 +3351,6 @@
}
],
"license": "MIT",
"peer": true,
"dependencies": {
"caniuse-lite": "^1.0.30001716",
"electron-to-chromium": "^1.5.149",
@@ -3849,6 +3848,7 @@
"integrity": "sha512-D3uMHtGc/fcO1Gt1/L7i1e33VOvD4A9hfQLP+6ewd+BvG/gQ84Yh4oftEhAdjSMgBgwGL+jsppT7JYNpo6MHHg==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"buffer-crc32": "^0.2.13",
"crc32-stream": "^4.0.2",
@@ -3994,6 +3994,7 @@
"integrity": "sha512-ROmzCKrTnOwybPcJApAA6WBWij23HVfGVNKqqrZpuyZOHqK2CwHSvpGuyt/UNNvaIjEd8X5IFGp4Mh+Ie1IHJQ==",
"dev": true,
"license": "Apache-2.0",
"peer": true,
"bin": {
"crc32": "bin/crc32.njs"
},
@@ -4007,6 +4008,7 @@
"integrity": "sha512-NT7w2JVU7DFroFdYkeq8cywxrgjPHWkdX1wjpRQXPX5Asews3tA+Ght6lddQO5Mkumffp3X7GEqku3epj2toIw==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"crc-32": "^1.2.0",
"readable-stream": "^3.4.0"
@@ -4246,7 +4248,6 @@
"integrity": "sha512-NoXo6Liy2heSklTI5OIZbCgXC1RzrDQsZkeEwXhdOro3FT1VBOvbubvscdPnjVuQ4AMwwv61oaH96AbiYg9EnQ==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"app-builder-lib": "25.1.8",
"builder-util": "25.1.7",
@@ -4409,7 +4410,6 @@
"integrity": "sha512-6dLslJrQYB1qvqVPYRv1PhAA/uytC66nUeiTcq2JXiBzrmTWCHppqtGUjZhvnSRVatBCT5/SFdizdzcBiEiYUg==",
"hasInstallScript": true,
"license": "MIT",
"peer": true,
"dependencies": {
"@electron/get": "^2.0.0",
"@types/node": "^22.7.7",
@@ -4454,6 +4454,7 @@
"integrity": "sha512-2ntkJ+9+0GFP6nAISiMabKt6eqBB0kX1QqHNWFWAXgi0VULKGisM46luRFpIBiU3u/TDmhZMM8tzvo2Abn3ayg==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"app-builder-lib": "25.1.8",
"archiver": "^5.3.1",
@@ -4467,6 +4468,7 @@
"integrity": "sha512-oRXApq54ETRj4eMiFzGnHWGy+zo5raudjuxN0b8H7s/RU2oW0Wvsx9O0ACRN/kRq9E8Vu/ReskGB5o3ji+FzHQ==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"graceful-fs": "^4.2.0",
"jsonfile": "^6.0.1",
@@ -4482,6 +4484,7 @@
"integrity": "sha512-5dgndWOriYSm5cnYaJNhalLNDKOqFwyDB/rr1E9ZsGciGvKPs8R2xYGCacuf3z6K1YKDz182fd+fY3cn3pMqXQ==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"universalify": "^2.0.0"
},
@@ -4495,6 +4498,7 @@
"integrity": "sha512-gptHNQghINnc/vTGIk0SOFGFNXw7JVrlRUtConJRlvaw6DuX0wO5Jeko9sWrMBhh+PsYAZ7oXAiOnf/UKogyiw==",
"dev": true,
"license": "MIT",
"peer": true,
"engines": {
"node": ">= 10.0.0"
}
@@ -4809,7 +4813,6 @@
"integrity": "sha512-LSehfdpgMeWcTZkWZVIJl+tkZ2nuSkyyB9C27MZqFWXuph7DvaowgcTvKqxvpLW1JZIk8PN7hFY3Rj9LQ7m7lg==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"@eslint-community/eslint-utils": "^4.2.0",
"@eslint-community/regexpp": "^4.12.1",
@@ -4871,7 +4874,6 @@
"integrity": "sha512-zc1UmCpNltmVY34vuLRV61r1K27sWuX39E+uyUnY8xS2Bex88VV9cugG+UZbRSRGtGyFboj+D8JODyme1plMpw==",
"dev": true,
"license": "MIT",
"peer": true,
"bin": {
"eslint-config-prettier": "bin/cli.js"
},
@@ -5349,7 +5351,8 @@
"resolved": "https://registry.npmmirror.com/fs-constants/-/fs-constants-1.0.0.tgz",
"integrity": "sha512-y6OAwoSIf7FyjMIv94u+b5rdheZEjzR63GTyZJm5qh4Bi+2YgwLCcI/fPFZkL5PSixOt6ZNKm+w+Hfp/Bciwow==",
"dev": true,
"license": "MIT"
"license": "MIT",
"peer": true
},
"node_modules/fs-extra": {
"version": "8.1.0",
@@ -6105,7 +6108,8 @@
"resolved": "https://registry.npmmirror.com/isarray/-/isarray-1.0.0.tgz",
"integrity": "sha512-VLghIWNM6ELQzo7zwmcg0NmTVyWKYjvIeM83yjp0wRDTmUnrM678fQbcKBo6n2CJEF0szoG//ytg+TKla89ALQ==",
"dev": true,
"license": "MIT"
"license": "MIT",
"peer": true
},
"node_modules/isbinaryfile": {
"version": "5.0.4",
@@ -6296,6 +6300,7 @@
"integrity": "sha512-b94GiNHQNy6JNTrt5w6zNyffMrNkXZb3KTkCZJb2V1xaEGCk093vkZ2jk3tpaeP33/OiXC+WvK9AxUebnf5nbw==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"readable-stream": "^2.0.5"
},
@@ -6309,6 +6314,7 @@
"integrity": "sha512-8p0AUk4XODgIewSi0l8Epjs+EVnWiK7NoDIEGU0HhE7+ZyY8D1IMY7odu5lRrFXGg71L15KG8QrPmum45RTtdA==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"core-util-is": "~1.0.0",
"inherits": "~2.0.3",
@@ -6324,7 +6330,8 @@
"resolved": "https://registry.npmmirror.com/safe-buffer/-/safe-buffer-5.1.2.tgz",
"integrity": "sha512-Gd2UZBJDkXlY7GbJxfsE8/nvKkUEU1G38c1siN6QP6a9PT9MmHB8GnpscSmMJSoF8LOIrt8ud/wPtojys4G6+g==",
"dev": true,
"license": "MIT"
"license": "MIT",
"peer": true
},
"node_modules/lazystream/node_modules/string_decoder": {
"version": "1.1.1",
@@ -6332,6 +6339,7 @@
"integrity": "sha512-n/ShnvDi6FHbbVfviro+WojiFzv+s8MPMHBczVePfUpDJLwoLT0ht1l4YwBCbi8pJAveEEdnkHyPyTP/mzRfwg==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"safe-buffer": "~5.1.0"
}
@@ -6383,28 +6391,32 @@
"resolved": "https://registry.npmmirror.com/lodash.defaults/-/lodash.defaults-4.2.0.tgz",
"integrity": "sha512-qjxPLHd3r5DnsdGacqOMU6pb/avJzdh9tFX2ymgoZE27BmjXrNy/y4LoaiTeAb+O3gL8AfpJGtqfX/ae2leYYQ==",
"dev": true,
"license": "MIT"
"license": "MIT",
"peer": true
},
"node_modules/lodash.difference": {
"version": "4.5.0",
"resolved": "https://registry.npmmirror.com/lodash.difference/-/lodash.difference-4.5.0.tgz",
"integrity": "sha512-dS2j+W26TQ7taQBGN8Lbbq04ssV3emRw4NY58WErlTO29pIqS0HmoT5aJ9+TUQ1N3G+JOZSji4eugsWwGp9yPA==",
"dev": true,
"license": "MIT"
"license": "MIT",
"peer": true
},
"node_modules/lodash.flatten": {
"version": "4.4.0",
"resolved": "https://registry.npmmirror.com/lodash.flatten/-/lodash.flatten-4.4.0.tgz",
"integrity": "sha512-C5N2Z3DgnnKr0LOpv/hKCgKdb7ZZwafIrsesve6lmzvZIRZRGaZ/l6Q8+2W7NaT+ZwO3fFlSCzCzrDCFdJfZ4g==",
"dev": true,
"license": "MIT"
"license": "MIT",
"peer": true
},
"node_modules/lodash.isplainobject": {
"version": "4.0.6",
"resolved": "https://registry.npmmirror.com/lodash.isplainobject/-/lodash.isplainobject-4.0.6.tgz",
"integrity": "sha512-oSXzaWypCMHkPC3NvBEaPHf0KsA5mvPrOPgQWDsbg8n7orZ290M0BmC/jgRZ4vcJ6DTAhjrsSYgdsW/F+MFOBA==",
"dev": true,
"license": "MIT"
"license": "MIT",
"peer": true
},
"node_modules/lodash.merge": {
"version": "4.6.2",
@@ -6418,7 +6430,8 @@
"resolved": "https://registry.npmmirror.com/lodash.union/-/lodash.union-4.6.0.tgz",
"integrity": "sha512-c4pB2CdGrGdjMKYLA+XiRDO7Y0PRQbm/Gzg8qMj+QH+pFVAoTp5sBpO0odL3FjoPCGjK96p6qsP+yQoiLoOBcw==",
"dev": true,
"license": "MIT"
"license": "MIT",
"peer": true
},
"node_modules/log-symbols": {
"version": "4.1.0",
@@ -6971,6 +6984,7 @@
"integrity": "sha512-6eZs5Ls3WtCisHWp9S2GUy8dqkpGi4BVSz3GaqiE6ezub0512ESztXUwUB6C6IKbQkY2Pnb/mD4WYojCRwcwLA==",
"dev": true,
"license": "MIT",
"peer": true,
"engines": {
"node": ">=0.10.0"
}
@@ -7394,7 +7408,6 @@
"integrity": "sha512-QQtaxnoDJeAkDvDKWCLiwIXkTgRhwYDEQCghU9Z6q03iyek/rxRh/2lC3HB7P8sWT2xC/y5JDctPLBIGzHKbhw==",
"dev": true,
"license": "MIT",
"peer": true,
"bin": {
"prettier": "bin/prettier.cjs"
},
@@ -7423,7 +7436,8 @@
"resolved": "https://registry.npmmirror.com/process-nextick-args/-/process-nextick-args-2.0.1.tgz",
"integrity": "sha512-3ouUOpQhtgrbOa17J7+uxOTpITYWaGP7/AhoR3+A+/1e9skrzelGi/dXzEYyvbxubEF6Wn2ypscTKiKJFFn1ag==",
"dev": true,
"license": "MIT"
"license": "MIT",
"peer": true
},
"node_modules/progress": {
"version": "2.0.3",
@@ -7542,6 +7556,7 @@
"integrity": "sha512-v05I2k7xN8zXvPD9N+z/uhXPaj0sUFCe2rcWZIpBsqxfP7xXFQ0tipAd/wjj1YxWyWtUS5IDJpOG82JKt2EAVA==",
"dev": true,
"license": "Apache-2.0",
"peer": true,
"dependencies": {
"minimatch": "^5.1.0"
}
@@ -7552,6 +7567,7 @@
"integrity": "sha512-lKwV/1brpG6mBUFHtb7NUmtABCb2WZZmm2wNiOA5hAb8VdCS4B3dtMWyvcoViccwAW/COERjXLt0zP1zXUN26g==",
"dev": true,
"license": "ISC",
"peer": true,
"dependencies": {
"brace-expansion": "^2.0.1"
},
@@ -8219,6 +8235,7 @@
"integrity": "sha512-ujeqbceABgwMZxEJnk2HDY2DlnUZ+9oEcb1KzTVfYHio0UE6dG71n60d8D2I4qNvleWrrXpmjpt7vZeF1LnMZQ==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"bl": "^4.0.3",
"end-of-stream": "^1.4.1",
@@ -8343,7 +8360,6 @@
"integrity": "sha512-M7BAV6Rlcy5u+m6oPhAPFgJTzAioX/6B0DxyvDlo9l8+T3nLKbrczg2WLUyzd45L8RqfUMyGPzekbMvX2Ldkwg==",
"dev": true,
"license": "MIT",
"peer": true,
"engines": {
"node": ">=12"
},
@@ -8446,7 +8462,6 @@
"integrity": "sha512-p1diW6TqL9L07nNxvRMM7hMMw4c5XOo/1ibL4aAIGmSAt9slTE1Xgw5KWuof2uTOvCg9BY7ZRi+GaF+7sfgPeQ==",
"devOptional": true,
"license": "Apache-2.0",
"peer": true,
"bin": {
"tsc": "bin/tsc",
"tsserver": "bin/tsserver"
@@ -8596,7 +8611,6 @@
"integrity": "sha512-cZn6NDFE7wdTpINgs++ZJ4N49W2vRp8LCKrn3Ob1kYNtOo21vfDoaV5GzBfLU4MovSAB8uNRm4jgzVQZ+mBzPQ==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"esbuild": "^0.25.0",
"fdir": "^6.4.4",
@@ -8687,7 +8701,6 @@
"integrity": "sha512-M7BAV6Rlcy5u+m6oPhAPFgJTzAioX/6B0DxyvDlo9l8+T3nLKbrczg2WLUyzd45L8RqfUMyGPzekbMvX2Ldkwg==",
"dev": true,
"license": "MIT",
"peer": true,
"engines": {
"node": ">=12"
},
@@ -8707,7 +8720,6 @@
"resolved": "https://registry.npmmirror.com/vue/-/vue-3.5.13.tgz",
"integrity": "sha512-wmeiSMxkZCSc+PM2w2VRsOYAZC8GdipNFRTsLSfodVqI9mbejKeXEGr8SckuLnrQPGe3oJN5c3K0vpoU9q/wCQ==",
"license": "MIT",
"peer": true,
"dependencies": {
"@vue/compiler-dom": "3.5.13",
"@vue/compiler-sfc": "3.5.13",
@@ -8730,7 +8742,6 @@
"integrity": "sha512-dbCBnd2e02dYWsXoqX5yKUZlOt+ExIpq7hmHKPb5ZqKcjf++Eo0hMseFTZMLKThrUk61m+Uv6A2YSBve6ZvuDQ==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"debug": "^4.4.0",
"eslint-scope": "^8.2.0",
@@ -9035,6 +9046,7 @@
"integrity": "sha512-9qv4rlDiopXg4E69k+vMHjNN63YFMe9sZMrdlvKnCjlCRWeCBswPPMPUfx+ipsAWq1LXHe70RcbaHdJJpS6hyQ==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"archiver-utils": "^3.0.4",
"compress-commons": "^4.1.2",
@@ -9050,6 +9062,7 @@
"integrity": "sha512-KVgf4XQVrTjhyWmx6cte4RxonPLR9onExufI1jhvw/MQ4BB6IsZD5gT8Lq+u/+pRkWna/6JoHpiQioaqFP5Rzw==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"glob": "^7.2.3",
"graceful-fs": "^4.2.0",

View File

@@ -1,7 +1,7 @@
{
"name": "auto-caption",
"productName": "Auto Caption",
"version": "1.1.0",
"version": "0.6.0",
"description": "A cross-platform subtitle display software.",
"main": "./out/main/index.js",
"author": "himeditator",

View File

@@ -7,10 +7,8 @@ import icon from '../../build/icon.png?asset'
import { captionWindow } from './CaptionWindow'
import { allConfig } from './utils/AllConfig'
import { captionEngine } from './utils/CaptionEngine'
import { Log } from './utils/Log'
class ControlWindow {
mounted: boolean = false;
window: BrowserWindow | undefined;
public createWindow(): void {
@@ -36,7 +34,6 @@ class ControlWindow {
})
this.window.on('closed', () => {
this.mounted = false
this.window = undefined
allConfig.writeConfig()
})
@@ -66,8 +63,7 @@ class ControlWindow {
})
ipcMain.handle('both.window.mounted', () => {
this.mounted = true
return allConfig.getFullConfig(Log.getAndClearLogQueue())
return allConfig.getFullConfig()
})
ipcMain.handle('control.nativeTheme.get', () => {
@@ -113,10 +109,6 @@ class ControlWindow {
allConfig.uiTheme = args
})
ipcMain.on('control.uiColor.change', (_, args) => {
allConfig.uiColor = args
})
ipcMain.on('control.leftBarWidth.change', (_, args) => {
allConfig.leftBarWidth = args
})
@@ -159,10 +151,6 @@ class ControlWindow {
captionEngine.stop()
})
ipcMain.on('control.engine.forceKill', () => {
captionEngine.kill()
})
ipcMain.on('control.captionLog.clear', () => {
allConfig.captionLog.splice(0)
})

View File

@@ -4,6 +4,5 @@ export default {
"engine.start.error": "Caption engine failed to start: ",
"engine.output.parse.error": "Unable to parse caption engine output as a JSON object: ",
"engine.error": "Caption engine error: ",
"engine.shutdown.error": "Failed to shut down the caption engine process: ",
"engine.start.timeout": "Caption engine startup timeout, automatically force stopped"
"engine.shutdown.error": "Failed to shut down the caption engine process: "
}

View File

@@ -4,6 +4,5 @@ export default {
"engine.start.error": "字幕エンジンの起動に失敗しました: ",
"engine.output.parse.error": "字幕エンジンの出力を JSON オブジェクトとして解析できませんでした: ",
"engine.error": "字幕エンジンエラー: ",
"engine.shutdown.error": "字幕エンジンプロセスの終了に失敗しました: ",
"engine.start.timeout": "字幕エンジンの起動がタイムアウトしました。自動的に強制停止しました"
"engine.shutdown.error": "字幕エンジンプロセスの終了に失敗しました: "
}

View File

@@ -4,6 +4,5 @@ export default {
"engine.start.error": "字幕引擎启动失败:",
"engine.output.parse.error": "字幕引擎输出内容无法解析为 JSON 对象:",
"engine.error": "字幕引擎错误:",
"engine.shutdown.error": "字幕引擎进程关闭失败:",
"engine.start.timeout": "字幕引擎启动超时,已自动强制停止"
"engine.shutdown.error": "字幕引擎进程关闭失败:"
}

View File

@@ -6,29 +6,17 @@ export interface Controls {
engineEnabled: boolean,
sourceLang: string,
targetLang: string,
transModel: string,
ollamaName: string,
ollamaUrl: string,
ollamaApiKey: string,
engine: string,
audio: 0 | 1,
translation: boolean,
recording: boolean,
API_KEY: string,
voskModelPath: string,
sosvModelPath: string,
glmUrl: string,
glmModel: string,
glmApiKey: string,
recordingPath: string,
modelPath: string,
customized: boolean,
customizedApp: string,
customizedCommand: string,
startTimeoutSeconds: number
customizedCommand: string
}
export interface Styles {
lineNumber: number,
lineBreak: number,
fontFamily: string,
fontSize: number,
@@ -57,23 +45,14 @@ export interface CaptionItem {
translation: string
}
export interface SoftwareLogItem {
type: "INFO" | "WARN" | "ERROR",
index: number,
time: string,
text: string
}
export interface FullConfig {
platform: string,
uiLanguage: UILanguage,
uiTheme: UITheme,
uiColor: string,
leftBarWidth: number,
styles: Styles,
controls: Controls,
captionLog: CaptionItem[],
softwareLog: SoftwareLogItem[]
captionLog: CaptionItem[]
}
export interface EngineInfo {

View File

@@ -1,26 +1,13 @@
import {
UILanguage, UITheme, Styles, Controls,
CaptionItem, FullConfig, SoftwareLogItem
CaptionItem, FullConfig
} from '../types'
import { Log } from './Log'
import { app, BrowserWindow } from 'electron'
import { passwordMaskingForObject } from './UtilsFunc'
import * as path from 'path'
import * as fs from 'fs'
import * as os from 'os'
interface CaptionTranslation {
time_s: string,
translation: string
}
function getDesktopPath() {
const homeDir = os.homedir()
return path.join(homeDir, 'Desktop')
}
const defaultStyles: Styles = {
lineNumber: 1,
lineBreak: 1,
fontFamily: 'sans-serif',
fontSize: 24,
@@ -44,26 +31,15 @@ const defaultStyles: Styles = {
const defaultControls: Controls = {
sourceLang: 'en',
targetLang: 'zh',
transModel: 'ollama',
ollamaName: 'qwen2.5:0.5b',
ollamaUrl: 'http://localhost:11434',
ollamaApiKey: '',
engine: 'gummy',
audio: 0,
engineEnabled: false,
API_KEY: '',
voskModelPath: '',
sosvModelPath: '',
glmUrl: 'https://open.bigmodel.cn/api/paas/v4/audio/transcriptions',
glmModel: 'glm-asr-2512',
glmApiKey: '',
recordingPath: getDesktopPath(),
modelPath: '',
translation: true,
recording: false,
customized: false,
customizedApp: '',
customizedCommand: '',
startTimeoutSeconds: 30
customizedCommand: ''
};
@@ -73,7 +49,6 @@ class AllConfig {
uiLanguage: UILanguage = 'zh';
leftBarWidth: number = 8;
uiTheme: UITheme = 'system';
uiColor: string = '#1677ff';
styles: Styles = {...defaultStyles};
controls: Controls = {...defaultControls};
@@ -85,15 +60,14 @@ class AllConfig {
public readConfig() {
const configPath = path.join(app.getPath('userData'), 'config.json')
if(fs.existsSync(configPath)){
Log.info('Read Config from:', configPath)
const config = JSON.parse(fs.readFileSync(configPath, 'utf-8'))
if(config.captionWindowWidth) this.captionWindowWidth = config.captionWindowWidth
if(config.uiLanguage) this.uiLanguage = config.uiLanguage
if(config.uiTheme) this.uiTheme = config.uiTheme
if(config.uiColor) this.uiColor = config.uiColor
if(config.leftBarWidth) this.leftBarWidth = config.leftBarWidth
if(config.styles) this.setStyles(config.styles)
if(config.controls) this.setControls(config.controls)
Log.info('Read Config from:', configPath)
}
}
@@ -102,7 +76,6 @@ class AllConfig {
captionWindowWidth: this.captionWindowWidth,
uiLanguage: this.uiLanguage,
uiTheme: this.uiTheme,
uiColor: this.uiColor,
leftBarWidth: this.leftBarWidth,
controls: this.controls,
styles: this.styles
@@ -112,17 +85,15 @@ class AllConfig {
Log.info('Write Config to:', configPath)
}
public getFullConfig(softwareLog: SoftwareLogItem[]): FullConfig {
public getFullConfig(): FullConfig {
return {
platform: process.platform,
uiLanguage: this.uiLanguage,
uiTheme: this.uiTheme,
uiColor: this.uiColor,
leftBarWidth: this.leftBarWidth,
styles: this.styles,
controls: this.controls,
captionLog: this.captionLog,
softwareLog: softwareLog
captionLog: this.captionLog
}
}
@@ -152,7 +123,7 @@ class AllConfig {
}
}
this.controls.engineEnabled = engineEnabled
Log.info('Set Controls:', passwordMaskingForObject(this.controls))
Log.info('Set Controls:', this.controls)
}
public sendControls(window: BrowserWindow, info = true) {
@@ -179,28 +150,12 @@ class AllConfig {
}
}
public updateCaptionTranslation(trans: CaptionTranslation){
for(let i = this.captionLog.length - 1; i >= 0; i--){
if(this.captionLog[i].time_s === trans.time_s){
this.captionLog[i].translation = trans.translation
for(const window of BrowserWindow.getAllWindows()){
this.sendCaptionLog(window, 'upd', i)
}
break
}
}
}
public sendCaptionLog(
window: BrowserWindow,
command: 'add' | 'upd' | 'set',
index: number | undefined = undefined
) {
public sendCaptionLog(window: BrowserWindow, command: 'add' | 'upd' | 'set') {
if(command === 'add'){
window.webContents.send(`both.captionLog.add`, this.captionLog.at(-1))
window.webContents.send(`both.captionLog.add`, this.captionLog[this.captionLog.length - 1])
}
else if(command === 'upd'){
if(index !== undefined) window.webContents.send(`both.captionLog.upd`, this.captionLog[index])
else window.webContents.send(`both.captionLog.upd`, this.captionLog.at(-1))
window.webContents.send(`both.captionLog.upd`, this.captionLog[this.captionLog.length - 1])
}
else if(command === 'set'){
window.webContents.send(`both.captionLog.set`, this.captionLog)

View File

@@ -1,13 +1,12 @@
import { exec, spawn } from 'child_process'
import { app } from 'electron'
import { is } from '@electron-toolkit/utils'
import * as path from 'path'
import * as net from 'net'
import path from 'path'
import net from 'net'
import { controlWindow } from '../ControlWindow'
import { allConfig } from './AllConfig'
import { i18n } from '../i18n'
import { Log } from './Log'
import { passwordMaskingForList } from './UtilsFunc'
export class CaptionEngine {
appPath: string = ''
@@ -15,9 +14,8 @@ export class CaptionEngine {
process: any | undefined
client: net.Socket | undefined
port: number = 8080
status: 'running' | 'starting' | 'stopping' | 'stopped' | 'starting-timeout' = 'stopped'
status: 'running' | 'starting' | 'stopping' | 'stopped' = 'stopped'
timerID: NodeJS.Timeout | undefined
startTimeoutID: NodeJS.Timeout | undefined
private getApp(): boolean {
if (allConfig.controls.customized) {
@@ -39,7 +37,7 @@ export class CaptionEngine {
if(process.platform === "win32") {
this.appPath = path.join(
app.getAppPath(), 'engine',
'.venv', 'Scripts', 'python.exe'
'subenv', 'Scripts', 'python.exe'
)
this.command.push(path.join(
app.getAppPath(), 'engine', 'main.py'
@@ -49,7 +47,7 @@ export class CaptionEngine {
else {
this.appPath = path.join(
app.getAppPath(), 'engine',
'.venv', 'bin', 'python3'
'subenv', 'bin', 'python3'
)
this.command.push(path.join(
app.getAppPath(), 'engine', 'main.py'
@@ -61,70 +59,39 @@ export class CaptionEngine {
this.appPath = path.join(process.resourcesPath, 'engine', 'main.exe')
}
else {
this.appPath = path.join(process.resourcesPath, 'engine', 'main', 'main')
this.appPath = path.join(process.resourcesPath, 'engine', 'main')
}
}
this.command.push('-a', allConfig.controls.audio ? '1' : '0')
if(allConfig.controls.recording) {
this.command.push('-r', '1')
this.command.push('-rp', `"${allConfig.controls.recordingPath}"`)
}
this.port = Math.floor(Math.random() * (65535 - 1024 + 1)) + 1024
this.command.push('-p', this.port.toString())
this.command.push(
'-t', allConfig.controls.translation ?
allConfig.controls.targetLang : 'none'
)
if(allConfig.controls.engine === 'gummy') {
this.command.push('-e', 'gummy')
this.command.push('-s', allConfig.controls.sourceLang)
this.command.push(
'-t', allConfig.controls.translation ?
allConfig.controls.targetLang : 'none'
)
if(allConfig.controls.API_KEY) {
this.command.push('-k', allConfig.controls.API_KEY)
}
}
else if(allConfig.controls.engine === 'vosk'){
this.command.push('-e', 'vosk')
this.command.push('-vosk', `"${allConfig.controls.voskModelPath}"`)
this.command.push('-tm', allConfig.controls.transModel)
this.command.push('-omn', allConfig.controls.ollamaName)
if(allConfig.controls.ollamaUrl) this.command.push('-ourl', allConfig.controls.ollamaUrl)
if(allConfig.controls.ollamaApiKey) this.command.push('-okey', allConfig.controls.ollamaApiKey)
}
else if(allConfig.controls.engine === 'sosv'){
this.command.push('-e', 'sosv')
this.command.push('-s', allConfig.controls.sourceLang)
this.command.push('-sosv', `"${allConfig.controls.sosvModelPath}"`)
this.command.push('-tm', allConfig.controls.transModel)
this.command.push('-omn', allConfig.controls.ollamaName)
if(allConfig.controls.ollamaUrl) this.command.push('-ourl', allConfig.controls.ollamaUrl)
if(allConfig.controls.ollamaApiKey) this.command.push('-okey', allConfig.controls.ollamaApiKey)
}
else if(allConfig.controls.engine === 'glm'){
this.command.push('-e', 'glm')
this.command.push('-s', allConfig.controls.sourceLang)
this.command.push('-gurl', allConfig.controls.glmUrl)
this.command.push('-gmodel', allConfig.controls.glmModel)
if(allConfig.controls.glmApiKey) {
this.command.push('-gkey', allConfig.controls.glmApiKey)
}
this.command.push('-tm', allConfig.controls.transModel)
this.command.push('-omn', allConfig.controls.ollamaName)
if(allConfig.controls.ollamaUrl) this.command.push('-ourl', allConfig.controls.ollamaUrl)
if(allConfig.controls.ollamaApiKey) this.command.push('-okey', allConfig.controls.ollamaApiKey)
this.command.push('-m', `"${allConfig.controls.modelPath}"`)
}
}
Log.info('Engine Path:', this.appPath)
Log.info('Engine Command:', passwordMaskingForList(this.command))
Log.info('Engine Command:', this.command)
return true
}
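Put together, getApp() launches the Python engine with a flat flag list. A hedged illustration of one possible invocation for the Gummy engine, written here as a Python list for readability (values are placeholders; the exact flag set and order depend on the settings assembled above):

# roughly what this.command ends up holding before spawn(appPath, command)
command = [
    "engine/main.py",   # dev mode; a bundled main executable in production builds
    "-a", "0",          # 0 = system audio output, 1 = microphone input
    "-p", "52731",      # random localhost port for the stop-command socket
    "-t", "zh",         # target language, or "none" when translation is off
    "-e", "gummy",      # selected caption engine
    "-s", "en",         # source language
    "-k", "********",   # DashScope API key, pushed only when configured and masked in logs
]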
public connect() {
Log.info('Connecting to caption engine server...')
if(this.client) { Log.warn('Client already exists, ignoring...') }
if (this.startTimeoutID) {
clearTimeout(this.startTimeoutID)
this.startTimeoutID = undefined
}
this.client = net.createConnection({ port: this.port }, () => {
Log.info('Connected to caption engine server');
});
@@ -159,16 +126,6 @@ export class CaptionEngine {
this.process = spawn(this.appPath, this.command)
this.status = 'starting'
Log.info('Caption Engine Starting, PID:', this.process.pid)
const timeoutMs = allConfig.controls.startTimeoutSeconds * 1000
this.startTimeoutID = setTimeout(() => {
if (this.status === 'starting') {
Log.warn(`Engine start timeout after ${allConfig.controls.startTimeoutSeconds} seconds, forcing kill...`)
this.status = 'starting-timeout'
controlWindow.sendErrorMessage(i18n('engine.start.timeout'))
this.kill()
}
}, timeoutMs)
this.process.stdout.on('data', (data: any) => {
const lines = data.toString().split('\n')
@@ -178,7 +135,7 @@ export class CaptionEngine {
const data_obj = JSON.parse(line)
handleEngineData(data_obj)
} catch (e) {
// controlWindow.sendErrorMessage(i18n('engine.output.parse.error') + e)
controlWindow.sendErrorMessage(i18n('engine.output.parse.error') + e)
Log.error('Error parsing JSON:', e)
}
}
@@ -189,7 +146,8 @@ export class CaptionEngine {
const lines = data.toString().split('\n')
lines.forEach((line: string) => {
if(line.trim()){
Log.error(line)
controlWindow.sendErrorMessage(/*i18n('engine.error') +*/ line)
console.error(line)
}
})
});
@@ -204,10 +162,6 @@ export class CaptionEngine {
}
this.status = 'stopped'
clearInterval(this.timerID)
if (this.startTimeoutID) {
clearTimeout(this.startTimeoutID)
this.startTimeoutID = undefined
}
Log.info(`Engine exited with code ${code}`)
});
}
@@ -215,6 +169,7 @@ export class CaptionEngine {
public stop() {
if(this.status !== 'running'){
Log.warn('Trying to stop engine which is not running, current status:', this.status)
return
}
this.sendCommand('stop')
if(this.client){
@@ -222,6 +177,7 @@ export class CaptionEngine {
this.client = undefined
}
this.status = 'stopping'
Log.info('Caption engine process stopping...')
this.timerID = setTimeout(() => {
if(this.status !== 'stopping') return
Log.warn('Engine process still not stopped, trying to kill...')
@@ -234,29 +190,19 @@ export class CaptionEngine {
if(this.status !== 'running'){
Log.warn('Trying to kill engine which is not running, current status:', this.status)
}
Log.warn('Killing engine process, PID:', this.process.pid)
if (this.startTimeoutID) {
clearTimeout(this.startTimeoutID)
this.startTimeoutID = undefined
}
Log.warn('Trying to kill engine process, PID:', this.process.pid)
if(this.client){
this.client.destroy()
this.client = undefined
}
if (this.process.pid) {
let cmd = `kill -9 ${this.process.pid}`;
let cmd = `kill ${this.process.pid}`;
if (process.platform === "win32") {
cmd = `taskkill /pid ${this.process.pid} /t /f`
}
exec(cmd, (error) => {
if (error) {
Log.error('Failed to kill process:', error)
} else {
Log.info('Process killed successfully')
}
})
exec(cmd)
}
this.status = 'stopping'
}
}
@@ -273,24 +219,14 @@ function handleEngineData(data: any) {
else if(data.command === 'caption') {
allConfig.updateCaptionLog(data);
}
else if(data.command === 'translation') {
allConfig.updateCaptionTranslation(data);
}
else if(data.command === 'print') {
console.log(data.content)
Log.info('Engine Print:', data.content)
}
else if(data.command === 'info') {
Log.info('Engine Info:', data.content)
}
else if(data.command === 'warn') {
Log.warn('Engine Warn:', data.content)
}
else if(data.command === 'error') {
Log.error('Engine Error:', data.content)
controlWindow.sendErrorMessage(/*i18n('engine.error') +*/ data.content)
}
else if(data.command === 'usage') {
Log.info('Engine Token Usage: ', data.content)
Log.info('Engine Usage: ', data.content)
}
else {
Log.warn('Unknown command:', data)

View File

@@ -1,9 +1,3 @@
import { controlWindow } from "../ControlWindow"
import { type SoftwareLogItem } from "../types"
let logIndex = 0
const logQueue: SoftwareLogItem[] = []
function getTimeString() {
const now = new Date()
const HH = String(now.getHours()).padStart(2, '0')
@@ -14,45 +8,15 @@ function getTimeString() {
}
export class Log {
static getAndClearLogQueue() {
const copiedQueue = structuredClone(logQueue)
logQueue.length = 0
return copiedQueue
}
static handleLog(logType: "INFO" | "WARN" | "ERROR", ...msg: any[]) {
const timeStr = getTimeString()
const logPre = `[${logType} ${timeStr}]`
let logStr = ""
for(let i = 0; i < msg.length; i++) {
logStr += i ? " " : ""
if(typeof msg[i] === "string") logStr += msg[i]
else logStr += JSON.stringify(msg[i], undefined, 2)
}
console.log(logPre, logStr)
const logItem: SoftwareLogItem = {
type: logType,
index: ++logIndex,
time: timeStr,
text: logStr
}
if(controlWindow.mounted && controlWindow.window) {
controlWindow.window.webContents.send('control.softwareLog.add', logItem)
}
else {
logQueue.push(logItem)
}
}
static info(...msg: any[]){
this.handleLog("INFO", ...msg)
console.log(`[INFO ${getTimeString()}]`, ...msg)
}
static warn(...msg: any[]){
this.handleLog("WARN", ...msg)
console.warn(`[WARN ${getTimeString()}]`, ...msg)
}
static error(...msg: any[]){
this.handleLog("ERROR", ...msg)
console.error(`[ERROR ${getTimeString()}]`, ...msg)
}
}

View File

@@ -1,24 +0,0 @@
function passwordMasking(pwd: string) {
return pwd.replace(/./g, '*')
}
export function passwordMaskingForList(args: string[]) {
const maskedArgs = [...args]
for(let i = 1; i < maskedArgs.length; i++) {
if(maskedArgs[i-1] === '-k' || maskedArgs[i-1] === '-okey' || maskedArgs[i-1] === '-gkey') {
maskedArgs[i] = passwordMasking(maskedArgs[i])
}
}
return maskedArgs
}
export function passwordMaskingForObject(args: Record<string, any>) {
const maskedArgs = {...args}
for(const key in maskedArgs) {
const lKey = key.toLowerCase()
if(lKey.includes('api') && lKey.includes('key')) {
maskedArgs[key] = passwordMasking(maskedArgs[key])
}
}
return maskedArgs
}

View File

@@ -2,7 +2,7 @@
<html>
<head>
<meta charset="UTF-8" />
<title>Auto Caption v1.1.0</title>
<title>Auto Caption</title>
<!-- https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP -->
<meta
http-equiv="Content-Security-Policy"

View File

@@ -6,7 +6,6 @@
import { onMounted } from 'vue'
import { FullConfig } from './types'
import { useCaptionLogStore } from './stores/captionLog'
import { useSoftwareLogStore } from './stores/softwareLog'
import { useCaptionStyleStore } from './stores/captionStyle'
import { useEngineControlStore } from './stores/engineControl'
import { useGeneralSettingStore } from './stores/generalSetting'
@@ -15,13 +14,11 @@ onMounted(() => {
window.electron.ipcRenderer.invoke('both.window.mounted').then((data: FullConfig) => {
useGeneralSettingStore().uiLanguage = data.uiLanguage
useGeneralSettingStore().uiTheme = data.uiTheme
useGeneralSettingStore().uiColor = data.uiColor
useGeneralSettingStore().leftBarWidth = data.leftBarWidth
useCaptionStyleStore().setStyles(data.styles)
useEngineControlStore().platform = data.platform
useEngineControlStore().setControls(data.controls)
useCaptionLogStore().captionData = data.captionLog
useSoftwareLogStore().softwareLogs = data.softwareLog
})
})
</script>

View File

@@ -17,7 +17,6 @@
}
.input-area {
display: inline-block;
width: calc(100% - 100px);
min-width: 100px;
}

View File

@@ -1,151 +1,138 @@
<template>
<div>
<div class="caption-title">
<span style="margin-right: 30px;">{{ $t('log.title') }}</span>
<div class="caption-list">
<div>
<a-app class="caption-title">
<span style="margin-right: 30px;">{{ $t('log.title') }}</span>
</a-app>
<a-popover :title="$t('log.baseTime')">
<template #content>
<div class="base-time">
<div class="base-time-container">
<a-input
type="number" min="0"
v-model:value="baseHH"
></a-input>
<span class="base-time-label">{{ $t('log.hour') }}</span>
</div>
</div><span style="margin: 0 4px;">:</span>
<div class="base-time">
<div class="base-time-container">
<a-input
type="number" min="0" max="59"
v-model:value="baseMM"
></a-input>
<span class="base-time-label">{{ $t('log.min') }}</span>
</div>
</div><span style="margin: 0 4px;">:</span>
<div class="base-time">
<div class="base-time-container">
<a-input
type="number" min="0" max="59"
v-model:value="baseSS"
></a-input>
<span class="base-time-label">{{ $t('log.sec') }}</span>
</div>
</div><span style="margin: 0 4px;">.</span>
<div class="base-time">
<div class="base-time-container">
<a-input
type="number" min="0" max="999"
v-model:value="baseMS"
></a-input>
<span class="base-time-label">{{ $t('log.ms') }}</span>
</div>
</div>
</template>
<a-button
type="primary"
style="margin-right: 20px;"
@click="changeBaseTime"
:disabled="captionData.length === 0"
>{{ $t('log.changeTime') }}</a-button>
</a-popover>
<a-popover :title="$t('log.exportOptions')">
<template #content>
<div class="input-item">
<span class="input-label">{{ $t('log.exportFormat') }}</span>
<a-radio-group v-model:value="exportFormat">
<a-radio-button value="srt">.srt</a-radio-button>
<a-radio-button value="json">.json</a-radio-button>
</a-radio-group>
</div>
<div class="input-item">
<span class="input-label">{{ $t('log.exportContent') }}</span>
<a-radio-group v-model:value="contentOption">
<a-radio-button value="both">{{ $t('log.both') }}</a-radio-button>
<a-radio-button value="source">{{ $t('log.source') }}</a-radio-button>
<a-radio-button value="target">{{ $t('log.translation') }}</a-radio-button>
</a-radio-group>
</div>
</template>
<a-button
style="margin-right: 20px;"
@click="exportCaptions"
:disabled="captionData.length === 0"
>{{ $t('log.export') }}</a-button>
</a-popover>
<a-popover :title="$t('log.copyOptions')">
<template #content>
<div class="input-item">
<span class="input-label">{{ $t('log.addIndex') }}</span>
<a-switch v-model:checked="showIndex" />
<span class="input-label">{{ $t('log.copyTime') }}</span>
<a-switch v-model:checked="copyTime" />
</div>
<div class="input-item">
<span class="input-label">{{ $t('log.copyContent') }}</span>
<a-radio-group v-model:value="contentOption">
<a-radio-button value="both">{{ $t('log.both') }}</a-radio-button>
<a-radio-button value="source">{{ $t('log.source') }}</a-radio-button>
<a-radio-button value="target">{{ $t('log.translation') }}</a-radio-button>
</a-radio-group>
</div>
</template>
<a-button
style="margin-right: 20px;"
@click="copyCaptions"
>{{ $t('log.copy') }}</a-button>
</a-popover>
<a-button
danger
@click="clearCaptions"
>{{ $t('log.clear') }}</a-button>
</div>
<a-popover :title="$t('log.baseTime')">
<template #content>
<div class="base-time">
<div class="base-time-container">
<a-input
type="number" min="0"
v-model:value="baseHH"
></a-input>
<span class="base-time-label">{{ $t('log.hour') }}</span>
</div>
</div><span style="margin: 0 4px;">:</span>
<div class="base-time">
<div class="base-time-container">
<a-input
type="number" min="0" max="59"
v-model:value="baseMM"
></a-input>
<span class="base-time-label">{{ $t('log.min') }}</span>
</div>
</div><span style="margin: 0 4px;">:</span>
<div class="base-time">
<div class="base-time-container">
<a-input
type="number" min="0" max="59"
v-model:value="baseSS"
></a-input>
<span class="base-time-label">{{ $t('log.sec') }}</span>
</div>
</div><span style="margin: 0 4px;">.</span>
<div class="base-time">
<div class="base-time-container">
<a-input
type="number" min="0" max="999"
v-model:value="baseMS"
></a-input>
<span class="base-time-label">{{ $t('log.ms') }}</span>
</div>
</div>
</template>
<a-button
type="primary"
style="margin-right: 20px;"
@click="changeBaseTime"
:disabled="captionData.length === 0"
>{{ $t('log.changeTime') }}</a-button>
</a-popover>
<a-popover :title="$t('log.exportOptions')">
<template #content>
<div class="input-item">
<span class="input-label">{{ $t('log.exportFormat') }}</span>
<a-radio-group v-model:value="exportFormat">
<a-radio-button value="srt"><code>.srt</code></a-radio-button>
<a-radio-button value="json"><code>.json</code></a-radio-button>
</a-radio-group>
</div>
<div class="input-item">
<span class="input-label">{{ $t('log.exportContent') }}</span>
<a-radio-group v-model:value="contentOption">
<a-radio-button value="both">{{ $t('log.both') }}</a-radio-button>
<a-radio-button value="source">{{ $t('log.source') }}</a-radio-button>
<a-radio-button value="target">{{ $t('log.translation') }}</a-radio-button>
</a-radio-group>
</div>
</template>
<a-button
style="margin-right: 20px;"
@click="exportCaptions"
:disabled="captionData.length === 0"
>{{ $t('log.export') }}</a-button>
</a-popover>
<a-popover :title="$t('log.copyOptions')">
<template #content>
<div class="input-item">
<span class="input-label">{{ $t('log.addIndex') }}</span>
<a-switch v-model:checked="showIndex" />
<span class="input-label">{{ $t('log.copyTime') }}</span>
<a-switch v-model:checked="copyTime" />
</div>
<div class="input-item">
<span class="input-label">{{ $t('log.copyContent') }}</span>
<a-radio-group v-model:value="contentOption">
<a-radio-button value="both">{{ $t('log.both') }}</a-radio-button>
<a-radio-button value="source">{{ $t('log.source') }}</a-radio-button>
<a-radio-button value="target">{{ $t('log.translation') }}</a-radio-button>
</a-radio-group>
</div>
<div class="input-item">
<span class="input-label">{{ $t('log.copyNum') }}</span>
<a-radio-group v-model:value="copyNum">
<a-radio-button :value="0"><code>[:]</code></a-radio-button>
<a-radio-button :value="1"><code>[-1:]</code></a-radio-button>
<a-radio-button :value="2"><code>[-2:]</code></a-radio-button>
<a-radio-button :value="3"><code>[-3:]</code></a-radio-button>
</a-radio-group>
</div>
</template>
<a-button
style="margin-right: 20px;"
@click="copyCaptions"
>{{ $t('log.copy') }}</a-button>
</a-popover>
<a-button
danger
@click="clearCaptions"
>{{ $t('log.clear') }}</a-button>
</div>
<a-table
:columns="columns"
:data-source="captionData"
v-model:pagination="pagination"
style="margin-top: 10px;"
>
<template #bodyCell="{ column, record }">
<template v-if="column.key === 'index'">
{{ record.index }}
<a-table
:columns="columns"
:data-source="captionData"
v-model:pagination="pagination"
style="margin-top: 10px;"
>
<template #bodyCell="{ column, record }">
<template v-if="column.key === 'index'">
{{ record.index }}
</template>
<template v-if="column.key === 'time'">
<div class="time-cell">
<div class="time-start">{{ record.time_s }}</div>
<div class="time-end">{{ record.time_t }}</div>
</div>
</template>
<template v-if="column.key === 'content'">
<div class="caption-content">
<div class="caption-text">{{ record.text }}</div>
<div class="caption-translation">{{ record.translation }}</div>
</div>
</template>
</template>
<template v-if="column.key === 'time'">
<div class="time-cell">
<code class="time-start"
:style="`color: ${uiColor}`"
>{{ record.time_s }}</code>
<code class="time-end">{{ record.time_t }}</code>
</div>
</template>
<template v-if="column.key === 'content'">
<div class="caption-content">
<div class="caption-text">{{ record.text }}</div>
<div
class="caption-translation"
:style="`border-left: 3px solid ${uiColor};`"
>{{ record.translation }}</div>
</div>
</template>
</template>
</a-table>
</a-table>
</div>
</template>
<script setup lang="ts">
import { ref } from 'vue'
import { storeToRefs } from 'pinia'
import { useCaptionLogStore } from '@renderer/stores/captionLog'
import { useGeneralSettingStore } from '@renderer/stores/generalSetting'
import { message } from 'ant-design-vue'
import { useI18n } from 'vue-i18n'
import * as tc from '../utils/timeCalc'
@@ -156,14 +143,10 @@ const { t } = useI18n()
const captionLog = useCaptionLogStore()
const { captionData } = storeToRefs(captionLog)
const generalSetting = useGeneralSettingStore()
const { uiColor } = storeToRefs(generalSetting)
const exportFormat = ref('srt')
const showIndex = ref(true)
const copyTime = ref(true)
const contentOption = ref('both')
const copyNum = ref(0)
const baseHH = ref<number>(0)
const baseMM = ref<number>(0)
@@ -202,7 +185,7 @@ const columns = [
title: 'time',
dataIndex: 'time',
key: 'time',
width: 150,
width: 160,
sorter: (a: CaptionItem, b: CaptionItem) => {
if(a.time_s <= b.time_s) return -1
return 1
@@ -272,12 +255,7 @@ function getExportData() {
function copyCaptions() {
let content = ''
let start = 0
if(copyNum.value > 0) {
start = captionData.value.length - copyNum.value
if(start < 0) start = 0
}
for(let i = start; i < captionData.value.length; i++){
for(let i = 0; i < captionData.value.length; i++){
const item = captionData.value[i]
if(showIndex.value) content += `${i+1}\n`
if(copyTime.value) content += `${item.time_s} --> ${item.time_t}\n`.replace(/\./g, ',')
@@ -304,11 +282,10 @@ function clearCaptions() {
}
.caption-title {
color: var(--icon-color);
display: inline-block;
font-size: 24px;
font-weight: bold;
margin: 10px 0;
margin-bottom: 10px;
}
.base-time {
@@ -336,13 +313,10 @@ function clearCaptions() {
}
.time-start {
display: block;
font-weight: bold;
color: #1677ff;
}
.time-end {
display: block;
font-weight: bold;
color: #ff4d4f;
}
@@ -358,5 +332,6 @@ function clearCaptions() {
.caption-translation {
font-size: 14px;
padding-left: 16px;
border-left: 3px solid #1890ff;
}
</style>

View File

@@ -6,16 +6,6 @@
<a @click="resetStyle">{{ $t('style.resetStyle') }}</a>
</template>
<div class="input-item">
<span class="input-label">{{ $t('style.lineNumber') }}</span>
<a-radio-group v-model:value="currentLineNumber">
<a-radio-button :value="1">1</a-radio-button>
<a-radio-button :value="2">2</a-radio-button>
<a-radio-button :value="3">3</a-radio-button>
<a-radio-button :value="4">4</a-radio-button>
</a-radio-group>
</div>
<div class="input-item">
<span class="input-label">{{ $t('style.longCaption') }}</span>
<a-select
@@ -44,18 +34,20 @@
</div>
<div class="input-item">
<span class="input-label">{{ $t('style.fontSize') }}</span>
<a-slider
<a-input
class="input-area"
:min="0" :max="72"
type="range"
min="0" max="72"
v-model:value="currentFontSize"
/>
<div class="input-item-value">{{ currentFontSize }}px</div>
</div>
<div class="input-item">
<span class="input-label">{{ $t('style.fontWeight') }}</span>
<a-slider
<a-input
class="input-area"
:min="1" :max="9"
type="range"
min="1" max="9"
v-model:value="currentFontWeight"
/>
<div class="input-item-value">{{ currentFontWeight*100 }}</div>
@@ -71,10 +63,11 @@
</div>
<div class="input-item">
<span class="input-label">{{ $t('style.opacity') }}</span>
<a-slider
<a-input
class="input-area"
:min="0"
:max="100"
type="range"
min="0"
max="100"
v-model:value="currentOpacity"
/>
<div class="input-item-value">{{ currentOpacity }}%</div>
@@ -118,18 +111,20 @@
</div>
<div class="input-item">
<span class="input-label">{{ $t('style.fontSize') }}</span>
<a-slider
<a-input
class="input-area"
:min="0" :max="72"
type="range"
min="0" max="72"
v-model:value="currentTransFontSize"
/>
<div class="input-item-value">{{ currentTransFontSize }}px</div>
</div>
<div class="input-item">
<span class="input-label">{{ $t('style.fontWeight') }}</span>
<a-slider
<a-input
class="input-area"
:min="1" :max="9"
type="range"
min="1" max="9"
v-model:value="currentTransFontWeight"
/>
<div class="input-item-value">{{ currentTransFontWeight*100 }}</div>
@@ -141,27 +136,30 @@
<a-card size="small" :title="$t('style.shadow.title')">
<div class="input-item">
<span class="input-label">{{ $t('style.shadow.offsetX') }}</span>
<a-slider
<a-input
class="input-area"
:min="-10" :max="10"
type="range"
min="-10" max="10"
v-model:value="currentOffsetX"
/>
<div class="input-item-value">{{ currentOffsetX }}px</div>
</div>
<div class="input-item">
<span class="input-label">{{ $t('style.shadow.offsetY') }}</span>
<a-slider
<a-input
class="input-area"
:min="-10" :max="10"
type="range"
min="-10" max="10"
v-model:value="currentOffsetY"
/>
<div class="input-item-value">{{ currentOffsetY }}px</div>
</div>
<div class="input-item">
<span class="input-label">{{ $t('style.shadow.blur') }}</span>
<a-slider
<a-input
class="input-area"
:min="0" :max="12"
type="range"
min="0" max="12"
v-model:value="currentBlur"
/>
<div class="input-item-value">{{ currentBlur }}px</div>
@@ -188,57 +186,28 @@
textShadow: currentTextShadow ? `${currentOffsetX}px ${currentOffsetY}px ${currentBlur}px ${currentTextShadowColor}` : 'none'
}"
>
<template v-if="captionData.length">
<template
v-for="val in revArr[Math.min(currentLineNumber, captionData.length)]"
:key="captionData[captionData.length - val].time_s"
>
<p :class="[currentLineBreak?'':'left-ellipsis']"
:style="{
fontFamily: currentFontFamily,
fontSize: currentFontSize + 'px',
color: currentFontColor,
fontWeight: currentFontWeight * 100
}">
<span>{{ captionData[captionData.length - val].text }}</span>
</p>
<p :class="[currentLineBreak?'':'left-ellipsis']"
v-if="currentTransDisplay && captionData[captionData.length - val].translation"
:style="{
fontFamily: currentTransFontFamily,
fontSize: currentTransFontSize + 'px',
color: currentTransFontColor,
fontWeight: currentTransFontWeight * 100
}"
>
<span>{{ captionData[captionData.length - val].translation }}</span>
</p>
</template>
</template>
<template v-else>
<template v-for="val in currentLineNumber" :key="val">
<p :class="[currentLineBreak?'':'left-ellipsis']"
:style="{
fontFamily: currentFontFamily,
fontSize: currentFontSize + 'px',
color: currentFontColor,
fontWeight: currentFontWeight * 100
}">
<span>{{ $t('example.original') }}</span>
</p>
<p :class="[currentLineBreak?'':'left-ellipsis']"
v-if="currentTransDisplay"
:style="{
fontFamily: currentTransFontFamily,
fontSize: currentTransFontSize + 'px',
color: currentTransFontColor,
fontWeight: currentTransFontWeight * 100
}"
>
<span>{{ $t('example.translation') }}</span>
</p>
</template>
</template>
<p :class="[currentLineBreak?'':'left-ellipsis']"
:style="{
fontFamily: currentFontFamily,
fontSize: currentFontSize + 'px',
color: currentFontColor,
fontWeight: currentFontWeight * 100
}">
<span v-if="captionData.length">{{ captionData[captionData.length-1].text }}</span>
<span v-else>{{ $t('example.original') }}</span>
</p>
<p :class="[currentLineBreak?'':'left-ellipsis']"
v-if="currentTransDisplay"
:style="{
fontFamily: currentTransFontFamily,
fontSize: currentTransFontSize + 'px',
color: currentTransFontColor,
fontWeight: currentTransFontWeight * 100
}"
>
<span v-if="captionData.length">{{ captionData[captionData.length-1].translation }}</span>
<span v-else>{{ $t('example.translation') }}</span>
</p>
</div>
</Teleport>
@@ -251,14 +220,6 @@ import { storeToRefs } from 'pinia'
import { notification } from 'ant-design-vue'
import { useI18n } from 'vue-i18n'
import { useCaptionLogStore } from '@renderer/stores/captionLog';
const revArr = {
1: [1],
2: [2, 1],
3: [3, 2, 1],
4: [4, 3, 2, 1],
}
const captionLog = useCaptionLogStore();
const { captionData } = storeToRefs(captionLog);
@@ -267,7 +228,6 @@ const { t } = useI18n()
const captionStyle = useCaptionStyleStore()
const { changeSignal } = storeToRefs(captionStyle)
const currentLineNumber = ref<number>(1)
const currentLineBreak = ref<number>(0)
const currentFontFamily = ref<string>('sans-serif')
const currentFontSize = ref<number>(24)
@@ -301,7 +261,6 @@ function useSameStyle(){
}
function applyStyle(){
captionStyle.lineNumber = currentLineNumber.value;
captionStyle.lineBreak = currentLineBreak.value;
captionStyle.fontFamily = currentFontFamily.value;
captionStyle.fontSize = currentFontSize.value;
@@ -331,7 +290,6 @@ function applyStyle(){
}
function backStyle(){
currentLineNumber.value = captionStyle.lineNumber;
currentLineBreak.value = captionStyle.lineBreak;
currentFontFamily.value = captionStyle.fontFamily;
currentFontSize.value = captionStyle.fontSize;
@@ -357,7 +315,7 @@ function resetStyle() {
}
watch(changeSignal, (val) => {
if(val === true) {
if(val == true) {
backStyle();
captionStyle.changeSignal = false;
}

View File

@@ -5,6 +5,23 @@
<a @click="applyChange">{{ $t('engine.applyChange') }}</a> |
<a @click="cancelChange">{{ $t('engine.cancelChange') }}</a>
</template>
<div class="input-item">
<span class="input-label">{{ $t('engine.sourceLang') }}</span>
<a-select
class="input-area"
v-model:value="currentSourceLang"
:options="langList"
></a-select>
</div>
<div class="input-item">
<span class="input-label">{{ $t('engine.transLang') }}</span>
<a-select
:disabled="currentEngine === 'vosk'"
class="input-area"
v-model:value="currentTargetLang"
:options="langList.filter((item) => item.value !== 'auto')"
></a-select>
</div>
<div class="input-item">
<span class="input-label">{{ $t('engine.captionEngine') }}</span>
<a-select
@@ -13,91 +30,6 @@
:options="captionEngine"
></a-select>
</div>
<div class="input-item">
<span class="input-label">{{ $t('engine.sourceLang') }}</span>
<a-select
:disabled="currentEngine === 'vosk'"
class="input-area"
v-model:value="currentSourceLang"
:options="sLangList"
></a-select>
</div>
<div class="input-item">
<span class="input-label">{{ $t('engine.transLang') }}</span>
<a-select
class="input-area"
v-model:value="currentTargetLang"
:options="tLangList"
></a-select>
</div>
<div class="input-item" v-if="transModel">
<span class="input-label">{{ $t('engine.transModel') }}</span>
<a-select
class="input-area"
v-model:value="currentTransModel"
:options="transModel"
></a-select>
</div>
<div class="input-item" v-if="transModel && currentTransModel === 'ollama'">
<a-popover placement="right">
<template #content>
<p class="label-hover-info">{{ $t('engine.modelNameNote') }}</p>
</template>
<span class="input-label info-label"
:style="{color: uiColor}"
>{{ $t('engine.modelName') }}</span>
</a-popover>
<a-input
class="input-area"
v-model:value="currentOllamaName"
></a-input>
</div>
<div class="input-item" v-if="transModel && currentTransModel === 'ollama'">
<a-popover placement="right">
<template #content>
<p class="label-hover-info">{{ $t('engine.baseURL') }}</p>
</template>
<span class="input-label info-label"
:style="{color: uiColor}"
>Base URL</span>
</a-popover>
<a-input
class="input-area"
v-model:value="currentOllamaUrl"
placeholder="http://localhost:11434"
></a-input>
</div>
<div class="input-item" v-if="transModel && currentTransModel === 'ollama'">
<a-popover placement="right">
<template #content>
<p class="label-hover-info">{{ $t('engine.apiKey') }}</p>
</template>
<span class="input-label info-label"
:style="{color: uiColor}"
>API Key</span>
</a-popover>
<a-input
class="input-area"
type="password"
v-model:value="currentOllamaApiKey"
/>
</div>
<div class="input-item" v-if="currentEngine === 'glm'">
<span class="input-label">GLM API URL</span>
<a-input
class="input-area"
v-model:value="currentGlmUrl"
placeholder="https://open.bigmodel.cn/api/paas/v4/audio/transcriptions"
></a-input>
</div>
<div class="input-item" v-if="currentEngine === 'glm'">
<span class="input-label">GLM Model Name</span>
<a-input
class="input-area"
v-model:value="currentGlmModel"
placeholder="glm-asr-2512"
></a-input>
</div>
<div class="input-item">
<span class="input-label">{{ $t('engine.audioType') }}</span>
<a-select
@@ -111,13 +43,9 @@
<a-switch v-model:checked="currentTranslation" />
<span style="display:inline-block;width:10px;"></span>
<div style="display: inline-block;">
<span class="switch-label">{{ $t('engine.enableRecording') }}</span>
<a-switch v-model:checked="currentRecording" />
<span class="switch-label">{{ $t('engine.customEngine') }}</span>
<a-switch v-model:checked="currentCustomized" />
</div>
</div>
<div class="input-item">
<span class="input-label">{{ $t('engine.customEngine') }}</span>
<a-switch v-model:checked="currentCustomized" />
<span style="display:inline-block;width:10px;"></span>
<div style="display: inline-block;">
<span class="switch-label">{{ $t('engine.showMore') }}</span>
@@ -152,16 +80,11 @@
<a-card size="small" :title="$t('engine.showMore')" v-show="showMore" style="margin-top:10px;">
<div class="input-item">
<a-popover placement="right">
<a-popover>
<template #content>
<p class="label-hover-info">{{ $t('engine.apikeyInfo') }}</p>
<p><a href="https://bailian.console.aliyun.com" target="_blank">
https://bailian.console.aliyun.com
</a></p>
</template>
<span class="input-label info-label"
:style="{color: uiColor}"
>ALI {{ $t('engine.apikey') }}</span>
<span class="input-label info-label">{{ $t('engine.apikey') }}</span>
</a-popover>
<a-input
class="input-area"
@@ -170,106 +93,20 @@
/>
</div>
<div class="input-item">
<a-popover placement="right">
<a-popover>
<template #content>
<p class="label-hover-info">{{ $t('engine.glmApikeyInfo') }}</p>
<p><a href="https://open.bigmodel.cn/" target="_blank">
https://open.bigmodel.cn
</a></p>
<p class="label-hover-info">{{ $t('engine.modelPathInfo') }}</p>
</template>
<span class="input-label info-label"
:style="{color: uiColor}"
>GLM {{ $t('engine.apikey') }}</span>
</a-popover>
<a-input
class="input-area"
type="password"
v-model:value="currentGlmApiKey"
/>
</div>
<div class="input-item">
<a-popover placement="right">
<template #content>
<p class="label-hover-info">{{ $t('engine.voskModelPathInfo') }}</p>
<p class="label-hover-info">
<a href="https://alphacephei.com/vosk/models" target="_blank">Vosk {{ $t('engine.modelDownload') }}</a>
</p>
</template>
<span class="input-label info-label"
:style="{color: uiColor}"
>{{ $t('engine.voskModelPath') }}</span>
<span class="input-label info-label">{{ $t('engine.modelPath') }}</span>
</a-popover>
<span
class="input-folder"
:style="{color: uiColor}"
@click="selectFolderPath('vosk')"
@click="selectFolderPath"
><span><FolderOpenOutlined /></span></span>
<a-input
class="input-area"
style="width:calc(100% - 140px);"
v-model:value="currentVoskModelPath"
/>
</div>
<div class="input-item">
<a-popover placement="right">
<template #content>
<p class="label-hover-info">{{ $t('engine.sosvModelPathInfo') }}</p>
<p class="label-hover-info">
<a href="https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model" target="_blank">SOSV {{ $t('engine.modelDownload') }}</a>
</p>
</template>
<span class="input-label info-label"
:style="{color: uiColor}"
>{{ $t('engine.sosvModelPath') }}</span>
</a-popover>
<span
class="input-folder"
:style="{color: uiColor}"
@click="selectFolderPath('sosv')"
><span><FolderOpenOutlined /></span></span>
<a-input
class="input-area"
style="width:calc(100% - 140px);"
v-model:value="currentSosvModelPath"
/>
</div>
<div class="input-item">
<a-popover placement="right">
<template #content>
<p class="label-hover-info">{{ $t('engine.recordingPathInfo') }}</p>
</template>
<span class="input-label info-label"
:style="{color: uiColor}"
>{{ $t('engine.recordingPath') }}</span>
</a-popover>
<span
class="input-folder"
:style="{color: uiColor}"
@click="selectFolderPath('rec')"
><span><FolderOpenOutlined /></span></span>
<a-input
class="input-area"
style="width:calc(100% - 140px);"
v-model:value="currentRecordingPath"
/>
</div>
<div class="input-item">
<a-popover placement="right">
<template #content>
<p class="label-hover-info">{{ $t('engine.startTimeoutInfo') }}</p>
</template>
<span
class="input-label info-label"
:style="{color: uiColor, verticalAlign: 'middle'}"
>{{ $t('engine.startTimeout') }}</span>
</a-popover>
<a-input-number
class="input-area"
v-model:value="currentStartTimeoutSeconds"
:min="10"
:max="120"
:step="5"
:addon-after="$t('engine.seconds')"
v-model:value="currentModelPath"
/>
</div>
</a-card>
@@ -278,12 +115,12 @@
</template>
<script setup lang="ts">
import { ref, computed, watch, h } from 'vue'
import { ref, computed, watch } from 'vue'
import { storeToRefs } from 'pinia'
import { useGeneralSettingStore } from '@renderer/stores/generalSetting'
import { useEngineControlStore } from '@renderer/stores/engineControl'
import { notification } from 'ant-design-vue'
import { ExclamationCircleOutlined, FolderOpenOutlined ,InfoCircleOutlined } from '@ant-design/icons-vue';
import { FolderOpenOutlined ,InfoCircleOutlined } from '@ant-design/icons-vue';
import { useI18n } from 'vue-i18n'
const { t } = useI18n()
@@ -292,93 +129,37 @@ const showMore = ref(false)
const engineControl = useEngineControlStore()
const { captionEngine, audioType, changeSignal } = storeToRefs(engineControl)
const generalSetting = useGeneralSettingStore()
const { uiColor } = storeToRefs(generalSetting)
const currentSourceLang = ref('auto')
const currentTargetLang = ref('zh')
const currentEngine = ref<string>('gummy')
const currentAudio = ref<0 | 1>(0)
const currentTranslation = ref<boolean>(true)
const currentRecording = ref<boolean>(false)
const currentTransModel = ref('ollama')
const currentOllamaName = ref('')
const currentOllamaUrl = ref('')
const currentOllamaApiKey = ref('')
const currentTranslation = ref<boolean>(false)
const currentAPI_KEY = ref<string>('')
const currentVoskModelPath = ref<string>('')
const currentSosvModelPath = ref<string>('')
const currentGlmUrl = ref<string>('')
const currentGlmModel = ref<string>('')
const currentGlmApiKey = ref<string>('')
const currentRecordingPath = ref<string>('')
const currentModelPath = ref<string>('')
const currentCustomized = ref<boolean>(false)
const currentCustomizedApp = ref('')
const currentCustomizedCommand = ref('')
const currentStartTimeoutSeconds = ref<number>(30)
const sLangList = computed(() => {
const langList = computed(() => {
for(let item of captionEngine.value){
if(item.value === currentEngine.value) {
return item.languages.filter(item => item.type <= 0)
}
}
return []
})
const tLangList = computed(() => {
for(let item of captionEngine.value){
if(item.value === currentEngine.value) {
return item.languages.filter(item => item.type >= 0)
}
}
return []
})
const transModel = computed(() => {
for(let item of captionEngine.value){
if(item.value === currentEngine.value) {
return item.transModel
return item.languages
}
}
return []
})
function applyChange(){
if(
currentTranslation.value && transModel.value &&
currentTransModel.value === 'ollama' && !currentOllamaName.value.trim()
) {
notification.open({
message: t('noti.ollamaNameNull'),
description: t('noti.ollamaNameNullNote'),
duration: null,
icon: () => h(ExclamationCircleOutlined, { style: 'color: #ff4d4f' })
})
return
}
engineControl.sourceLang = currentSourceLang.value
engineControl.targetLang = currentTargetLang.value
engineControl.transModel = currentTransModel.value
engineControl.ollamaName = currentOllamaName.value
engineControl.engine = currentEngine.value
engineControl.ollamaUrl = currentOllamaUrl.value ?? "http://localhost:11434"
engineControl.ollamaApiKey = currentOllamaApiKey.value
engineControl.audio = currentAudio.value
engineControl.translation = currentTranslation.value
engineControl.recording = currentRecording.value
engineControl.API_KEY = currentAPI_KEY.value
engineControl.voskModelPath = currentVoskModelPath.value
engineControl.sosvModelPath = currentSosvModelPath.value
engineControl.glmUrl = currentGlmUrl.value ?? "https://open.bigmodel.cn/api/paas/v4/audio/transcriptions"
engineControl.glmModel = currentGlmModel.value ?? "glm-asr-2512"
engineControl.glmApiKey = currentGlmApiKey.value
engineControl.recordingPath = currentRecordingPath.value
engineControl.modelPath = currentModelPath.value
engineControl.customized = currentCustomized.value
engineControl.customizedApp = currentCustomizedApp.value
engineControl.customizedCommand = currentCustomizedCommand.value
engineControl.startTimeoutSeconds = currentStartTimeoutSeconds.value
engineControl.sendControlsChange()
@@ -392,36 +173,20 @@ function applyChange(){
function cancelChange(){
currentSourceLang.value = engineControl.sourceLang
currentTargetLang.value = engineControl.targetLang
currentTransModel.value = engineControl.transModel
currentOllamaName.value = engineControl.ollamaName
currentOllamaUrl.value = engineControl.ollamaUrl
currentOllamaApiKey.value = engineControl.ollamaApiKey
currentEngine.value = engineControl.engine
currentAudio.value = engineControl.audio
currentTranslation.value = engineControl.translation
currentRecording.value = engineControl.recording
currentAPI_KEY.value = engineControl.API_KEY
currentVoskModelPath.value = engineControl.voskModelPath
currentSosvModelPath.value = engineControl.sosvModelPath
currentGlmUrl.value = engineControl.glmUrl
currentGlmModel.value = engineControl.glmModel
currentGlmApiKey.value = engineControl.glmApiKey
currentRecordingPath.value = engineControl.recordingPath
currentModelPath.value = engineControl.modelPath
currentCustomized.value = engineControl.customized
currentCustomizedApp.value = engineControl.customizedApp
currentCustomizedCommand.value = engineControl.customizedCommand
currentStartTimeoutSeconds.value = engineControl.startTimeoutSeconds
}
function selectFolderPath(type: 'vosk' | 'sosv' | 'rec') {
function selectFolderPath() {
window.electron.ipcRenderer.invoke('control.folder.select').then((folderPath) => {
if(!folderPath) return
if(type == 'vosk')
currentVoskModelPath.value = folderPath
else if(type == 'sosv')
currentSosvModelPath.value = folderPath
else if(type == 'rec')
currentRecordingPath.value = folderPath
currentModelPath.value = folderPath
})
}
@@ -435,12 +200,9 @@ watch(changeSignal, (val) => {
watch(currentEngine, (val) => {
if(val == 'vosk'){
currentSourceLang.value = 'auto'
currentTargetLang.value = useGeneralSettingStore().uiLanguage
if(currentTargetLang.value === 'zh') {
currentTargetLang.value = 'zh-cn'
}
currentTargetLang.value = ''
}
else{
else if(val == 'gummy'){
currentSourceLang.value = 'auto'
currentTargetLang.value = useGeneralSettingStore().uiLanguage
}
@@ -456,8 +218,8 @@ watch(currentEngine, (val) => {
}
.info-label {
color: #1677ff;
cursor: pointer;
font-style: italic;
}
.input-folder {
@@ -468,12 +230,20 @@ watch(currentEngine, (val) => {
transition: all 0.25s;
}
.input-folder>span {
padding: 0 2px;
border: 2px solid #1677ff;
color: #1677ff;
border-radius: 30%;
}
.input-folder:hover {
transform: scale(1.1);
}
.customize-note {
padding: 10px 10px 0;
color: red;
max-width: min(40vw, 480px);
}
</style>

View File

@@ -1,7 +1,7 @@
<template>
<div class="caption-stat">
<a-row>
<a-col :span="5">
<a-col :span="6">
<a-statistic
:title="$t('status.engine')"
:value="customized?$t('status.customized'):engine"
@@ -36,7 +36,7 @@
</a-col>
</a-row>
</template>
<a-col :span="5" @mouseenter="getEngineInfo" style="cursor: pointer;">
<a-col :span="6" @mouseenter="getEngineInfo" style="cursor: pointer;">
<a-statistic
:title="$t('status.status')"
:value="engineEnabled?$t('status.started'):$t('status.stopped')"
@@ -47,13 +47,10 @@
</a-statistic>
</a-col>
</a-popover>
<a-col :span="5">
<a-col :span="6">
<a-statistic :title="$t('status.logNumber')" :value="captionData.length" />
</a-col>
<a-col :span="5">
<a-statistic :title="$t('status.logNumber2')" :value="softwareLogs.length" />
</a-col>
<a-col :span="4">
<a-col :span="6">
<div class="about-tag">{{ $t('status.aboutProj') }}</div>
<GithubOutlined class="proj-info" @click="showAbout = true"/>
</a-col>
@@ -67,26 +64,11 @@
@click="openCaptionWindow"
>{{ $t('status.openCaption') }}</a-button>
<a-button
v-if="!isStarting"
class="control-button"
:loading="pending && !engineEnabled"
:disabled="pending || engineEnabled"
@click="startEngine"
>{{ $t('status.startEngine') }}</a-button>
<a-popconfirm
v-if="isStarting"
:title="$t('status.forceKillConfirm')"
:ok-text="$t('status.confirm')"
:cancel-text="$t('status.cancel')"
@confirm="forceKillEngine"
>
<a-button
danger
class="control-button"
type="primary"
:icon="h(LoadingOutlined)"
>{{ $t('status.forceKillStarting') }}</a-button>
</a-popconfirm>
<a-button
danger class="control-button"
:loading="pending && engineEnabled"
@@ -101,7 +83,7 @@
<p class="about-desc">{{ $t('status.about.desc') }}</p>
<a-divider />
<div class="about-info">
<p><b>{{ $t('status.about.version') }}</b><a-tag color="green">v1.1.0</a-tag></p>
<p><b>{{ $t('status.about.version') }}</b><a-tag color="green">v0.6.0</a-tag></p>
<p>
<b>{{ $t('status.about.author') }}</b>
<a
@@ -143,21 +125,17 @@
<script setup lang="ts">
import { EngineInfo } from '@renderer/types'
import { ref, watch, h } from 'vue'
import { ref, watch } from 'vue'
import { storeToRefs } from 'pinia'
import { useCaptionLogStore } from '@renderer/stores/captionLog'
import { useSoftwareLogStore } from '@renderer/stores/softwareLog'
import { useEngineControlStore } from '@renderer/stores/engineControl'
import { GithubOutlined, InfoCircleOutlined, LoadingOutlined } from '@ant-design/icons-vue'
import { GithubOutlined, InfoCircleOutlined } from '@ant-design/icons-vue';
const showAbout = ref(false)
const pending = ref(false)
const isStarting = ref(false)
const captionLog = useCaptionLogStore()
const { captionData } = storeToRefs(captionLog)
const softwareLog = useSoftwareLogStore()
const { softwareLogs } = storeToRefs(softwareLog)
const engineControl = useEngineControlStore()
const { engineEnabled, engine, customized, errorSignal } = storeToRefs(engineControl)
@@ -174,17 +152,8 @@ function openCaptionWindow() {
function startEngine() {
pending.value = true
isStarting.value = true
if(engineControl.engine === 'vosk' && engineControl.voskModelPath.trim() === '') {
if(engineControl.engine === 'vosk' && engineControl.modelPath.trim() === '') {
engineControl.emptyModelPathErr()
pending.value = false
isStarting.value = false
return
}
if(engineControl.engine === 'sosv' && engineControl.sosvModelPath.trim() === '') {
engineControl.emptyModelPathErr()
pending.value = false
isStarting.value = false
return
}
window.electron.ipcRenderer.send('control.engine.start')
@@ -195,12 +164,6 @@ function stopEngine() {
window.electron.ipcRenderer.send('control.engine.stop')
}
function forceKillEngine() {
pending.value = true
isStarting.value = false
window.electron.ipcRenderer.send('control.engine.forceKill')
}
function getEngineInfo() {
window.electron.ipcRenderer.invoke('control.engine.info').then((data: EngineInfo) => {
pid.value = data.pid
@@ -212,16 +175,12 @@ function getEngineInfo() {
})
}
watch(engineEnabled, (enabled) => {
watch(engineEnabled, () => {
pending.value = false
if (enabled) {
isStarting.value = false
}
})
watch(errorSignal, () => {
pending.value = false
isStarting.value = false
errorSignal.value = false
})
</script>

View File

@@ -28,24 +28,11 @@
</a-radio-group>
</div>
<div class="input-item">
<span class="input-label">{{ $t('general.color') }}</span>
<a-radio-group v-model:value="uiColor">
<template v-for="color in colorList" :key="color">
<a-radio-button :value="color"
:style="{backgroundColor: color}"
>
<CheckOutlined style="color: white;" v-if="color === uiColor" />
<span v-else>&nbsp;</span>
</a-radio-button>
</template>
</a-radio-group>
</div>
<div class="input-item">
<span class="input-label">{{ $t('general.barWidth') }}</span>
<a-slider class="span-input"
:min="6" :max="12" v-model:value="leftBarWidth"
<a-input
type="range" class="span-input"
min="6" max="12" v-model:value="leftBarWidth"
/>
<div class="input-item-value">{{ (leftBarWidth * 100 / 24).toFixed(0) }}%</div>
</div>
@@ -54,55 +41,19 @@
</template>
<script setup lang="ts">
import { ref, watch } from 'vue'
import { storeToRefs } from 'pinia'
import { useGeneralSettingStore } from '@renderer/stores/generalSetting'
import { InfoCircleOutlined, CheckOutlined } from '@ant-design/icons-vue'
import { InfoCircleOutlined } from '@ant-design/icons-vue';
const generalSettingStore = useGeneralSettingStore()
const { uiLanguage, realTheme, uiTheme, uiColor, leftBarWidth } = storeToRefs(generalSettingStore)
const colorListLight = [
'#1677ff',
'#00b96b',
'#fa8c16',
'#9254de',
'#eb2f96',
'#000000'
]
const colorListDark = [
'#1677ff',
'#00b96b',
'#fa8c16',
'#9254de',
'#eb2f96',
'#b9d7ea'
]
const colorList = ref(colorListLight)
watch(realTheme, (val) => {
if(val === 'dark') {
colorList.value = colorListDark
} else {
colorList.value = colorListLight
}
console.log(val)
})
watch(uiTheme, (val) => {
console.log(val)
})
const { uiLanguage, uiTheme, leftBarWidth } = storeToRefs(generalSettingStore)
</script>
<style scoped>
@import url(../assets/input.css);
.span-input {
display: inline-block;
width: 100px;
margin: 0;
}
.general-note {

View File

@@ -1,115 +0,0 @@
<template>
<div>
<div class="log-title">
<span style="margin-right: 30px;">{{ $t('log.title2') }}</span>
</div>
<a-button
danger
@click="softwareLog.clear()"
>{{ $t('log.clear') }}</a-button>
</div>
<a-table
:columns="columns"
:data-source="softwareLogs"
v-model:pagination="pagination"
style="margin-top: 10px;"
>
<template #bodyCell="{ column, record }">
<template v-if="column.key === 'index'">
{{ record.index }}
</template>
<template v-if="column.key === 'type'">
<code :class="record.type">{{ record.type }}</code>
</template>
<template v-if="column.key === 'time'">
<code>{{ record.time }}</code>
</template>
<template v-if="column.key === 'content'">
<code>{{ record.text }}</code>
</template>
</template>
</a-table>
</template>
<script setup lang="ts">
import { ref } from 'vue'
import { storeToRefs } from 'pinia'
import { useSoftwareLogStore } from '@renderer/stores/softwareLog'
import { type SoftwareLogItem } from '../types'
const softwareLog = useSoftwareLogStore()
const { softwareLogs } = storeToRefs(softwareLog)
const pagination = ref({
current: 1,
pageSize: 20,
showSizeChanger: true,
pageSizeOptions: ['10', '20', '50', '100'],
onChange: (page: number, pageSize: number) => {
pagination.value.current = page
pagination.value.pageSize = pageSize
},
onShowSizeChange: (current: number, size: number) => {
pagination.value.current = current
pagination.value.pageSize = size
}
})
const columns = [
{
title: 'index',
dataIndex: 'index',
key: 'index',
width: 80,
sorter: (a: SoftwareLogItem, b: SoftwareLogItem) => {
if(a.index <= b.index) return -1
return 1
},
sortDirections: ['descend'],
defaultSortOrder: 'descend',
},
{
title: 'type',
dataIndex: 'type',
key: 'type',
width: 80,
sorter: (a: SoftwareLogItem, b: SoftwareLogItem) => {
if(a.type <= b.type) return -1
return 1
},
},
{
title: 'time',
dataIndex: 'time',
key: 'time',
width: 135,
sortDirections: ['descend'],
},
{
title: 'content',
dataIndex: 'content',
key: 'content',
},
]
</script>
<style scoped>
.log-title {
color: var(--icon-color);
display: inline-block;
font-size: 24px;
font-weight: bold;
margin: 10px 0;
}
.WARN {
color: #ff7c05;
font-weight: bold;
}
.ERROR {
color: #ff0000;
font-weight: bold;
}
</style>

View File

@@ -1,228 +1,78 @@
// type: -1 仅为源语言0 两者皆可, 1 仅为翻译语言
export const engines = {
zh: [
{
value: 'gummy',
label: '云端 / 阿里云 / Gummy',
label: '云端 - 阿里云 - Gummy',
languages: [
{ value: 'auto', type: -1, label: '自动检测' },
{ value: 'en', type: 0, label: '英语' },
{ value: 'zh', type: 0, label: '中文' },
{ value: 'ja', type: 0, label: '日语' },
{ value: 'ko', type: 0, label: '韩语' },
{ value: 'de', type: -1, label: '德语' },
{ value: 'fr', type: -1, label: '法语' },
{ value: 'ru', type: -1, label: '俄语' },
{ value: 'es', type: -1, label: '西班牙语' },
{ value: 'it', type: -1, label: '意大利语' },
{ value: 'yue', type: -1, label: '粤语' },
{ value: 'auto', label: '自动检测' },
{ value: 'en', label: '英语' },
{ value: 'zh', label: '中文' },
{ value: 'ja', label: '日语' },
{ value: 'ko', label: '韩语' },
{ value: 'de', label: '德语' },
{ value: 'fr', label: '法语' },
{ value: 'ru', label: '俄语' },
{ value: 'es', label: '西班牙语' },
{ value: 'it', label: '意大利语' },
]
},
{
value: 'vosk',
label: '本地 / Vosk',
label: '本地 - Vosk',
languages: [
{ value: 'auto', type: -1, label: '需要自行配置模型' },
{ value: 'en', type: 1, label: '英语' },
{ value: 'zh-cn', type: 1, label: '中文' },
{ value: 'ja', type: 1, label: '日语' },
{ value: 'ko', type: 1, label: '韩语' },
{ value: 'de', type: 1, label: '德语' },
{ value: 'fr', type: 1, label: '法语' },
{ value: 'ru', type: 1, label: '俄语' },
{ value: 'es', type: 1, label: '西班牙语' },
{ value: 'it', type: 1, label: '意大利语' },
],
transModel: [
{ value: 'ollama', label: 'Ollama 模型或 OpenAI 兼容模型' },
{ value: 'google', label: 'Google API 调用' },
]
},
{
value: 'sosv',
label: '本地 / SOSV',
languages: [
{ value: 'auto', type: -1, label: '自动检测' },
{ value: 'en', type: 0, label: '英语' },
{ value: 'zh', type: 0, label: '中文' },
{ value: 'ja', type: 0, label: '日语' },
{ value: 'ko', type: 0, label: '韩语' },
{ value: 'yue', type: -1, label: '粤语' },
{ value: 'de', type: 1, label: '德语' },
{ value: 'fr', type: 1, label: '法语' },
{ value: 'ru', type: 1, label: '俄语' },
{ value: 'es', type: 1, label: '西班牙语' },
{ value: 'it', type: 1, label: '意大利语' },
],
transModel: [
{ value: 'ollama', label: 'Ollama 模型或 OpenAI 兼容模型' },
{ value: 'google', label: 'Google API 调用' },
]
},
{
value: 'glm',
label: '云端 / 智谱AI / GLM-ASR',
languages: [
{ value: 'auto', type: -1, label: '自动检测' },
{ value: 'en', type: 0, label: '英语' },
{ value: 'zh', type: 0, label: '中文' },
{ value: 'ja', type: 0, label: '日语' },
{ value: 'ko', type: 0, label: '韩语' },
],
transModel: [
{ value: 'ollama', label: 'Ollama 模型或 OpenAI 兼容模型' },
{ value: 'google', label: 'Google API 调用' },
{ value: 'auto', label: '需要自行配置模型' },
]
}
],
en: [
{
value: 'gummy',
label: 'Cloud / Alibaba Cloud / Gummy',
label: 'Cloud - Alibaba Cloud - Gummy',
languages: [
{ value: 'auto', type: -1, label: 'Auto Detect' },
{ value: 'en', type: 0, label: 'English' },
{ value: 'zh', type: 0, label: 'Chinese' },
{ value: 'ja', type: 0, label: 'Japanese' },
{ value: 'ko', type: 0, label: 'Korean' },
{ value: 'de', type: -1, label: 'German' },
{ value: 'fr', type: -1, label: 'French' },
{ value: 'ru', type: -1, label: 'Russian' },
{ value: 'es', type: -1, label: 'Spanish' },
{ value: 'it', type: -1, label: 'Italian' },
{ value: 'yue', type: -1, label: 'Cantonese' },
{ value: 'auto', label: 'Auto Detect' },
{ value: 'en', label: 'English' },
{ value: 'zh', label: 'Chinese' },
{ value: 'ja', label: 'Japanese' },
{ value: 'ko', label: 'Korean' },
{ value: 'de', label: 'German' },
{ value: 'fr', label: 'French' },
{ value: 'ru', label: 'Russian' },
{ value: 'es', label: 'Spanish' },
{ value: 'it', label: 'Italian' },
]
},
{
value: 'vosk',
label: 'Local / Vosk',
label: 'Local - Vosk',
languages: [
{ value: 'auto', type: -1, label: 'Model needs to be configured manually' },
{ value: 'en', type: 1, label: 'English' },
{ value: 'zh-cn', type: 1, label: 'Chinese' },
{ value: 'ja', type: 1, label: 'Japanese' },
{ value: 'ko', type: 1, label: 'Korean' },
{ value: 'de', type: 1, label: 'German' },
{ value: 'fr', type: 1, label: 'French' },
{ value: 'ru', type: 1, label: 'Russian' },
{ value: 'es', type: 1, label: 'Spanish' },
{ value: 'it', type: 1, label: 'Italian' },
],
transModel: [
{ value: 'ollama', label: 'Ollama Model or OpenAI-compatible Model' },
{ value: 'google', label: 'Google API Call' },
]
},
{
value: 'sosv',
label: 'Local / SOSV',
languages: [
{ value: 'auto', type: -1, label: 'Auto Detect' },
{ value: 'en', type: 0, label: 'English' },
{ value: 'zh-cn', type: 0, label: 'Chinese' },
{ value: 'ja', type: 0, label: 'Japanese' },
{ value: 'ko', type: 0, label: 'Korean' },
{ value: 'yue', type: -1, label: 'Cantonese' },
{ value: 'de', type: 1, label: 'German' },
{ value: 'fr', type: 1, label: 'French' },
{ value: 'ru', type: 1, label: 'Russian' },
{ value: 'es', type: 1, label: 'Spanish' },
{ value: 'it', type: 1, label: 'Italian' },
],
transModel: [
{ value: 'ollama', label: 'Ollama Model or OpenAI-compatible Model' },
{ value: 'google', label: 'Google API Call' },
]
},
{
value: 'glm',
label: 'Cloud / Zhipu AI / GLM-ASR',
languages: [
{ value: 'auto', type: -1, label: 'Auto Detect' },
{ value: 'en', type: 0, label: 'English' },
{ value: 'zh', type: 0, label: 'Chinese' },
{ value: 'ja', type: 0, label: 'Japanese' },
{ value: 'ko', type: 0, label: 'Korean' },
],
transModel: [
{ value: 'ollama', label: 'Ollama Model or OpenAI-compatible Model' },
{ value: 'google', label: 'Google API Call' },
{ value: 'auto', label: 'Model needs to be configured manually' },
]
}
],
ja: [
{
value: 'gummy',
label: 'クラウド / アリババクラウド / Gummy',
label: 'クラウド - アリババクラウド - Gummy',
languages: [
{ value: 'auto', type: -1, label: '自動検出' },
{ value: 'en', type: 0, label: '英語' },
{ value: 'zh', type: 0, label: '中国語' },
{ value: 'ja', type: 0, label: '日本語' },
{ value: 'ko', type: 0, label: '韓国語' },
{ value: 'de', type: -1, label: 'ドイツ語' },
{ value: 'fr', type: -1, label: 'フランス語' },
{ value: 'ru', type: -1, label: 'ロシア語' },
{ value: 'es', type: -1, label: 'スペイン語' },
{ value: 'it', type: -1, label: 'イタリア語' },
{ value: 'yue', type: -1, label: '広東語' },
{ value: 'auto', label: '自動検出' },
{ value: 'en', label: '英語' },
{ value: 'zh', label: '中国語' },
{ value: 'ja', label: '日本語' },
{ value: 'ko', label: '韓国語' },
{ value: 'de', label: 'ドイツ語' },
{ value: 'fr', label: 'フランス語' },
{ value: 'ru', label: 'ロシア語' },
{ value: 'es', label: 'スペイン語' },
{ value: 'it', label: 'イタリア語' },
]
},
{
value: 'vosk',
label: 'ローカル / Vosk',
label: 'ローカル - Vosk',
languages: [
{ value: 'auto', type: -1, label: 'モデルを手動で設定する必要があります' },
{ value: 'en', type: 1, label: '英語' },
{ value: 'zh-cn', type: 1, label: '中国語' },
{ value: 'ja', type: 1, label: '日本語' },
{ value: 'ko', type: 1, label: '韓国語' },
{ value: 'de', type: 1, label: 'ドイツ語' },
{ value: 'fr', type: 1, label: 'フランス語' },
{ value: 'ru', type: 1, label: 'ロシア語' },
{ value: 'es', type: 1, label: 'スペイン語' },
{ value: 'it', type: 1, label: 'イタリア語' },
],
transModel: [
{ value: 'ollama', label: 'Ollama モデルまたは OpenAI 互換モデル' },
{ value: 'google', label: 'Google API 呼び出し' },
]
},
{
value: 'sosv',
label: 'ローカル / SOSV',
languages: [
{ value: 'auto', type: -1, label: '自動検出' },
{ value: 'en', type: 0, label: '英語' },
{ value: 'zh-cn', type: 0, label: '中国語' },
{ value: 'ja', type: 0, label: '日本語' },
{ value: 'ko', type: 0, label: '韓国語' },
{ value: 'yue', type: -1, label: '広東語' },
{ value: 'de', type: 1, label: 'ドイツ語' },
{ value: 'fr', type: 1, label: 'フランス語' },
{ value: 'ru', type: 1, label: 'ロシア語' },
{ value: 'es', type: 1, label: 'スペイン語' },
{ value: 'it', type: 1, label: 'イタリア語' },
],
transModel: [
{ value: 'ollama', label: 'Ollama モデルまたは OpenAI 互換モデル' },
{ value: 'google', label: 'Google API 呼び出し' },
]
},
{
value: 'glm',
label: 'クラウド / 智譜AI / GLM-ASR',
languages: [
{ value: 'auto', type: -1, label: '自動検出' },
{ value: 'en', type: 0, label: '英語' },
{ value: 'zh', type: 0, label: '中国語' },
{ value: 'ja', type: 0, label: '日本語' },
{ value: 'ko', type: 0, label: '韓国語' },
],
transModel: [
{ value: 'ollama', label: 'Ollama モデルまたは OpenAI 互換モデル' },
{ value: 'google', label: 'Google API 呼び出し' },
{ value: 'auto', label: 'モデルを手動で設定する必要があります' },
]
}
]
}

View File

@@ -1,41 +0,0 @@
import { h } from 'vue';
import { OrderedListOutlined, FileTextOutlined } from '@ant-design/icons-vue'
export const logMenu = {
zh: [
{
key: 'captionLog',
icon: () => h(OrderedListOutlined),
label: '字幕记录',
},
{
key: 'projLog',
icon: () => h(FileTextOutlined),
label: '日志记录',
},
],
en: [
{
key: 'captionLog',
icon: () => h(OrderedListOutlined),
label: 'Caption Log',
},
{
key: 'projLog',
icon: () => h(FileTextOutlined),
label: 'Software Log',
},
],
ja: [
{
key: 'captionLog',
icon: () => h(OrderedListOutlined),
label: '字幕記録',
},
{
key: 'projLog',
icon: () => h(FileTextOutlined),
label: 'ログ記録',
},
]
}

View File

@@ -1,29 +1,10 @@
import { theme } from 'ant-design-vue';
let isLight = true
let themeColor = '#1677ff'
export function setThemeColor(color: string) {
themeColor = color
}
export function getTheme(curIsLight?: boolean) {
const lightTheme = {
token: {
colorPrimary: themeColor,
colorInfo: themeColor
}
}
const darkTheme = {
export const antDesignTheme = {
light: {
token: {}
},
dark: {
algorithm: theme.darkAlgorithm,
token: {
colorPrimary: themeColor,
colorInfo: themeColor
}
}
if(curIsLight !== undefined){
isLight = curIsLight
}
return isLight ? lightTheme : darkTheme
}
}

View File

@@ -18,4 +18,3 @@ export * from './config/engine'
export * from './config/audio'
export * from './config/theme'
export * from './config/linebreak'
export * from './config/logMenu'

View File

@@ -18,19 +18,16 @@ export default {
"args": ", command arguments: ",
"pidInfo": ", caption engine process PID: ",
"empty": "Model Path is Empty",
"emptyInfo": "The model path for the selected local model is empty. Please download the corresponding model first, and then configure the model path in [Caption Engine Settings > More Settings].",
"emptyInfo": "The Vosk model path is empty. Please set the Vosk model path in the additional settings of the subtitle engine settings.",
"stopped": "Caption Engine Stopped",
"stoppedInfo": "The caption engine has stopped. You can click the 'Start Caption Engine' button to restart it.",
"error": "An error occurred",
"engineError": "The caption engine encountered an error and requested a forced exit.",
"engineError": "The subtitle engine encountered an error and requested a forced exit.",
"socketError": "The Socket connection between the main program and the caption engine failed",
"engineChange": "Cpation Engine Configuration Changed",
"changeInfo": "If the caption engine is already running, you need to restart it for the changes to take effect.",
"styleChange": "Caption Style Changed",
"styleInfo": "Caption style changes have been saved and applied.",
"engineStartTimeout": "Caption engine startup timeout, automatically force stopped",
"ollamaNameNull": "'Ollama' Field is Empty",
"ollamaNameNullNote": "When selecting Ollama model as the translation model, the 'Ollama' field cannot be empty and must be filled with the name of a locally configured Ollama model."
"styleInfo": "Caption style changes have been saved and applied."
},
general: {
"title": "General Settings",
@@ -40,8 +37,7 @@ export default {
"theme": "Theme",
"light": "light",
"dark": "dark",
"system": "system",
"color": "Color"
"system": "system"
},
engine: {
"title": "Caption Engine Settings",
@@ -49,33 +45,17 @@ export default {
"cancelChange": "Cancel Changes",
"sourceLang": "Source",
"transLang": "Translation",
"transModel": "Model",
"modelName": "Model Name",
"modelNameNote": "Please enter the translation model name you wish to use, which can be either a local Ollama model or an OpenAI API compatible cloud model. If the Base URL field is left blank, the local Ollama service will be called by default; otherwise, the API service at the specified address will be called via the Python OpenAI library.",
"baseURL": "The base request URL for calling OpenAI API. If left empty, the local default port Ollama model will be used.",
"apiKey": "The API KEY required for the model corresponding to OpenAI API.",
"captionEngine": "Engine",
"audioType": "Audio Type",
"systemOutput": "System Audio Output (Speaker)",
"systemInput": "System Audio Input (Microphone)",
"enableTranslation": "Translation",
"enableRecording": "Enable Recording",
"showMore": "More Settings",
"apikey": "API KEY",
"voskModelPath": "Vosk Path",
"sosvModelPath": "SOSV Path",
"recordingPath": "Save Path",
"startTimeout": "Timeout",
"seconds": "seconds",
"apikeyInfo": "API KEY required for the Gummy caption engine, which needs to be obtained from the Alibaba Cloud Bailing platform. For more details, see the project user manual.",
"glmApikeyInfo": "API KEY required for GLM caption engine, which needs to be obtained from the Zhipu AI platform.",
"voskModelPathInfo": "The folder path of the model required by the Vosk caption engine. You need to download the required model to your local machine in advance. For more details, see the project user manual.",
"sosvModelPathInfo": "The folder path of the model required by the SOSV caption engine. You need to download the required model to your local machine in advance. For more details, see the project user manual.",
"recordingPathInfo": "The path to save recording files, requiring a folder path. The software will automatically name the recording file and save it as .wav file.",
"modelDownload": "Model Download Link",
"startTimeoutInfo": "Caption engine startup timeout duration. Engine will be forcefully stopped if startup exceeds this time. Recommended range: 10-120 seconds.",
"customEngine": "Custom Eng",
"modelPath": "Model Path",
"apikeyInfo": "API KEY required for the Gummy subtitle engine, which needs to be obtained from the Alibaba Cloud Bailing platform. For more details, see the project user manual.",
"modelPathInfo": "The folder path of the model required by the Vosk subtitle engine. You need to download the required model to your local machine in advance. For more details, see the project user manual.",
"customEngine": "Custom Engine",
custom: {
"title": "Custom Caption Engine",
"attention": "Attention",
@@ -88,7 +68,6 @@ export default {
"title": "Caption Style Settings",
"applyStyle": "Apply",
"cancelChange": "Cancel",
"lineNumber": "CaptionLines",
"resetStyle": "Reset",
"longCaption": "LongCaption",
"fontFamily": "Font Family",
@@ -126,17 +105,11 @@ export default {
"started": "Started",
"stopped": "Not Started",
"logNumber": "Caption Count",
"logNumber2": "Log Count",
"aboutProj": "About Project",
"openCaption": "Open Caption Window",
"startEngine": "Start Caption Engine",
"restartEngine": "Restart Caption Engine",
"stopEngine": "Stop Caption Engine",
"forceKill": "Force Stop",
"forceKillStarting": "Starting Engine... (Force Stop)",
"forceKillConfirm": "Are you sure you want to force stop the caption engine? This will terminate the process immediately.",
"confirm": "Confirm",
"cancel": "Cancel",
about: {
"title": "About This Project",
"proj": "Auto Caption Project",
@@ -146,7 +119,7 @@ export default {
"projLink": "Project Link",
"manual": "User Manual",
"engineDoc": "Caption Engine Manual",
"date": "January 10th, 2026"
"date": "July 30, 2025"
}
},
log: {
@@ -169,10 +142,7 @@ export default {
"both": "Both",
"source": "Original",
"translation": "Translation",
"copyNum": "Copy Count",
"all": "All",
"copySuccess": "Subtitle copied to clipboard",
"clear": "Clear Log",
"title2": "Software Log"
"clear": "Clear Log"
}
}

View File

@@ -18,7 +18,7 @@ export default {
"args": "、コマンド引数:",
"pidInfo": "、字幕エンジンプロセス PID",
"empty": "モデルパスが空です",
"emptyInfo": "選択されたローカルモデルに対応するモデルパスが空です。まず対応するモデルをダウンロードし、次に【字幕エンジン設定 > 詳細設定】でモデルのパスを設定してください。",
"emptyInfo": "Vosk モデルパスが空です。字幕エンジン設定の追加設定で Vosk モデルのパスを設定してください。",
"stopped": "字幕エンジンが停止しました",
"stoppedInfo": "字幕エンジンが停止しました。再起動するには「字幕エンジンを開始」ボタンをクリックしてください。",
"error": "エラーが発生しました",
@@ -27,10 +27,7 @@ export default {
"engineChange": "字幕エンジンの設定が変更されました",
"changeInfo": "字幕エンジンがすでに起動している場合、変更を有効にするには再起動が必要です。",
"styleChange": "字幕のスタイルが変更されました",
"styleInfo": "字幕のスタイル変更が保存され、適用されました",
"engineStartTimeout": "字幕エンジンの起動がタイムアウトしました。自動的に強制停止しました",
"ollamaNameNull": "Ollama フィールドが空です",
"ollamaNameNullNote": "Ollama モデルを翻訳モデルとして選択する場合、Ollama フィールドは空にできません。ローカルで設定された Ollama モデルの名前を入力してください。"
"styleInfo": "字幕のスタイル変更が保存され、適用されました"
},
general: {
"title": "一般設定",
@@ -40,8 +37,7 @@ export default {
"theme": "テーマ",
"light": "明るい",
"dark": "暗い",
"system": "システム",
"color": "カラー"
"system": "システム"
},
engine: {
"title": "字幕エンジン設定",
@@ -49,32 +45,17 @@ export default {
"cancelChange": "変更をキャンセル",
"sourceLang": "ソース言語",
"transLang": "翻訳言語",
"transModel": "翻訳モデル",
"modelName": "モデル名",
"modelNameNote": "使用する翻訳モデル名を入力してください。Ollama のローカルモデルでも OpenAI API 互換のクラウドモデルでも可能です。Base URL フィールドが未入力の場合、デフォルトでローカルの Ollama サービスが呼び出され、それ以外の場合は Python OpenAI ライブラリ経由で指定されたアドレスの API サービスが呼び出されます。",
"baseURL": "OpenAI API を呼び出すための基本リクエスト URL です。未記入の場合、ローカルのデフォルトポートの Ollama モデルが呼び出されます。",
"apiKey": "OpenAI API に対応するモデルを使用するために必要な API キーです。",
"captionEngine": "エンジン",
"audioType": "オーディオ",
"systemOutput": "システムオーディオ出力(スピーカー)",
"systemInput": "システムオーディオ入力(マイク)",
"enableTranslation": "翻訳",
"enableRecording": "録音",
"showMore": "詳細設定",
"apikey": "API KEY",
"voskModelPath": "Voskパス",
"sosvModelPath": "SOSVパス",
"recordingPath": "保存パス",
"startTimeout": "時間制限",
"seconds": "秒",
"modelPath": "モデルパス",
"apikeyInfo": "Gummy 字幕エンジンに必要な API KEY は、アリババクラウド百煉プラットフォームから取得する必要があります。詳細情報はプロジェクトのユーザーマニュアルをご覧ください。",
"glmApikeyInfo": "GLM 字幕エンジンに必要な API KEY で、智譜 AI プラットフォームから取得する必要があります。",
"voskModelPathInfo": "Vosk 字幕エンジンに必要なモデルのフォルダパスです。必要なモデルを事前にローカルマシンにダウンロードする必要があります。詳細情報はプロジェクトのユーザーマニュアルをご覧ください。",
"sosvModelPathInfo": "SOSV 字幕エンジンに必要なモデルのフォルダパスです。必要なモデルを事前にローカルマシンにダウンロードする必要があります。詳細情報はプロジェクトのユーザーマニュアルをご覧ください。",
"recordingPathInfo": "録音ファイルの保存パスで、フォルダパスを指定する必要があります。ソフトウェアが自動的に録音ファイルに名前を付けて .wav ファイルとして保存します。",
"modelDownload": "モデルダウンロードリンク",
"startTimeoutInfo": "字幕エンジンの起動タイムアウト時間です。この時間を超えると自動的に強制停止されます。10-120秒の範囲で設定することを推奨します。",
"customEngine": "カスタム",
"modelPathInfo": "Vosk 字幕エンジンに必要なモデルのフォルダパスです。必要なモデルを事前にローカルマシンにダウンロードする必要があります。詳細情報はプロジェクトのユーザーマニュアルをご覧ください。",
"customEngine": "カスタムエンジン",
custom: {
"title": "カスタムキャプションエンジン",
"attention": "注意事項",
@@ -88,7 +69,6 @@ export default {
"applyStyle": "適用",
"cancelChange": "キャンセル",
"resetStyle": "リセット",
"lineNumber": "字幕行数",
"longCaption": "長い字幕",
"fontFamily": "フォント",
"fontColor": "カラー",
@@ -120,22 +100,16 @@ export default {
"cpu": "CPU 使用率",
"mem": "メモリ使用量",
"elapsed": "稼働時間",
"customized": "カスタ",
"customized": "カスタマイズ済み",
"status": "エンジン状態",
"started": "開始済み",
"stopped": "未開始",
"logNumber": "字幕数",
"logNumber2": "ログ数",
"aboutProj": "プロジェクト情報",
"openCaption": "字幕ウィンドウを開く",
"startEngine": "字幕エンジンを開始",
"restartEngine": "字幕エンジンを再起動",
"stopEngine": "字幕エンジンを停止",
"forceKill": "強制停止",
"forceKillStarting": "エンジン起動中... (強制停止)",
"forceKillConfirm": "字幕エンジンを強制停止しますか?プロセスが直ちに終了されます。",
"confirm": "確認",
"cancel": "キャンセル",
about: {
"title": "このプロジェクトについて",
"proj": "Auto Caption プロジェクト",
@@ -145,11 +119,11 @@ export default {
"projLink": "プロジェクトリンク",
"manual": "ユーザーマニュアル",
"engineDoc": "字幕エンジンマニュアル",
"date": "2026110 日"
"date": "2025730 日"
}
},
log: {
"title": "字幕記録",
"title": "字幕ログ",
"changeTime": "時間を変更",
"baseTime": "最初の字幕開始時間",
"hour": "時",
@@ -157,7 +131,7 @@ export default {
"sec": "秒",
"ms": "ミリ秒",
"export": "エクスポート",
"copy": "記録をコピー",
"copy": "ログをコピー",
"exportOptions": "エクスポートオプション",
"exportFormat": "形式",
"exportContent": "内容",
@@ -168,10 +142,7 @@ export default {
"both": "すべて",
"source": "原文",
"translation": "翻訳",
"copyNum": "コピー数",
"all": "すべて",
"copySuccess": "字幕がクリップボードにコピーされました",
"clear": "記録をクリア",
"title2": "ログ記録"
"clear": "ログをクリア"
}
}

View File

@@ -18,7 +18,7 @@ export default {
"args": ",命令参数:",
"pidInfo": ",字幕引擎进程 PID",
"empty": "模型路径为空",
"emptyInfo": "选择的本地模型对应的模型路径为空。请先下载对应模型,然后在【字幕引擎设置 > 更多设置】中配置对应模型的路径。",
"emptyInfo": "Vosk 模型模型路径为空,请在字幕引擎设置更多设置中设置 Vosk 模型的路径。",
"stopped": "字幕引擎停止",
"stoppedInfo": "字幕引擎已经停止,可点击“启动字幕引擎”按钮重新启动",
"error": "发生错误",
@@ -27,10 +27,7 @@ export default {
"engineChange": "字幕引擎配置已更改",
"changeInfo": "如果字幕引擎已经启动,需要重启字幕引擎修改才会生效",
"styleChange": "字幕样式已修改",
"styleInfo": "字幕样式修改已经保存并生效",
"engineStartTimeout": "字幕引擎启动超时,已自动强制停止",
"ollamaNameNull": "Ollama 字段为空",
"ollamaNameNullNote": "选择 Ollama 模型作为翻译模型时Ollama 字段不能为空,需要填写本地已经配置好的 Ollama 模型的名称。"
"styleInfo": "字幕样式修改已经保存并生效"
},
general: {
"title": "通用设置",
@@ -40,8 +37,7 @@ export default {
"theme": "主题",
"light": "浅色",
"dark": "深色",
"system": "系统",
"color": "颜色"
"system": "系统"
},
engine: {
"title": "字幕引擎设置",
@@ -49,31 +45,16 @@ export default {
"cancelChange": "取消更改",
"sourceLang": "源语言",
"transLang": "翻译语言",
"transModel": "翻译模型",
"modelName": "模型名称",
"modelNameNote": "请输入要使用的翻译模型名称,可以是 Ollama 本地模型,也可以是 OpenAI API 兼容的云端模型。若未填写 Base URL 字段,则默认调用本地 Ollama 服务,否则会通过 Python OpenAI 库调用该地址指向的 API 服务。",
"baseURL": "调用 OpenAI API 的基础请求地址,如果不填写则调用本地默认端口的 Ollama 模型。",
"apiKey": "调用 OpenAI API 对应的模型需要使用的 API KEY。",
"captionEngine": "字幕引擎",
"audioType": "音频类型",
"systemOutput": "系统音频输出(扬声器)",
"systemInput": "系统音频输入(麦克风)",
"enableTranslation": "启用翻译",
"enableRecording": "启用录制",
"showMore": "更多设置",
"apikey": "API KEY",
"voskModelPath": "Vosk路径",
"sosvModelPath": "SOSV路径",
"recordingPath": "保存路径",
"startTimeout": "启动超时",
"seconds": "秒",
"modelPath": "模型路径",
"apikeyInfo": "Gummy 字幕引擎需要的 API KEY需要在阿里云百炼平台获取。详细信息见项目用户手册。",
"glmApikeyInfo": "GLM 字幕引擎需要的 API KEY需要在智谱 AI 平台获取。",
"voskModelPathInfo": "Vosk 字幕引擎需要的模型的文件夹路径,需要提前下载需要的模型到本地。信息详情见项目用户手册。",
"sosvModelPathInfo": "SOSV 字幕引擎需要的模型的文件夹路径,需要提前下载需要的模型到本地。信息详情见项目用户手册。",
"recordingPathInfo": "录音文件保存路径,需要提供文件夹路径。软件会自动命名录音文件并保存为 .wav 文件。",
"modelDownload": "模型下载地址",
"startTimeoutInfo": "字幕引擎启动超时时间,超过此时间将自动强制停止。建议设置为 10-120 秒之间。",
"modelPathInfo": "Vosk 字幕引擎需要的模型的文件夹路径,需要提前下载需要的模型到本地。信息详情见项目用户手册。",
"customEngine": "自定义引擎",
custom: {
"title": "自定义字幕引擎",
@@ -88,7 +69,6 @@ export default {
"applyStyle": "应用样式",
"cancelChange": "取消更改",
"resetStyle": "恢复默认",
"lineNumber": "字幕行数",
"longCaption": "长字幕",
"fontFamily": "字体族",
"fontColor": "字体颜色",
@@ -125,17 +105,11 @@ export default {
"started": "已启动",
"stopped": "未启动",
"logNumber": "字幕数量",
"logNumber2": "日志数量",
"aboutProj": "项目关于",
"openCaption": "打开字幕窗口",
"startEngine": "启动字幕引擎",
"restartEngine": "重启字幕引擎",
"stopEngine": "关闭字幕引擎",
"forceKill": "强行停止",
"forceKillStarting": "正在启动引擎... (强行停止)",
"forceKillConfirm": "确定要强行停止字幕引擎吗?这将立即终止进程。",
"confirm": "确定",
"cancel": "取消",
about: {
"title": "关于本项目",
"proj": "Auto Caption 项目",
@@ -145,7 +119,7 @@ export default {
"projLink": "项目链接",
"manual": "用户手册",
"engineDoc": "字幕引擎手册",
"date": "2026110 日"
"date": "2025730 日"
}
},
log: {
@@ -168,10 +142,7 @@ export default {
"both": "全部",
"source": "原文",
"translation": "翻译",
"copyNum": "复制数量",
"all": "全部",
"copySuccess": "字幕已复制到剪贴板",
"clear": "清空记录",
"title2": "日志记录"
"clear": "清空记录"
}
}

View File

@@ -15,12 +15,7 @@ export const useCaptionLogStore = defineStore('captionLog', () => {
})
window.electron.ipcRenderer.on('both.captionLog.upd', (_, log) => {
for(let i = captionData.value.length - 1; i >= 0; i--) {
if(captionData.value[i].time_s === log.time_s){
captionData.value.splice(i, 1, log)
break
}
}
captionData.value.splice(captionData.value.length - 1, 1, log)
})
window.electron.ipcRenderer.on('both.captionLog.set', (_, logs) => {

View File

@@ -4,7 +4,6 @@ import { Styles } from '@renderer/types'
import { breakOptions } from '@renderer/i18n'
export const useCaptionStyleStore = defineStore('captionStyle', () => {
const lineNumber = ref<number>(1)
const lineBreak = ref<number>(1)
const fontFamily = ref<string>('sans-serif')
const fontSize = ref<number>(24)
@@ -39,7 +38,6 @@ export const useCaptionStyleStore = defineStore('captionStyle', () => {
function sendStylesChange() {
const styles: Styles = {
lineNumber: lineNumber.value,
lineBreak: lineBreak.value,
fontFamily: fontFamily.value,
fontSize: fontSize.value,
@@ -67,7 +65,6 @@ export const useCaptionStyleStore = defineStore('captionStyle', () => {
}
function setStyles(args: Styles){
lineNumber.value = args.lineNumber
lineBreak.value = args.lineBreak
fontFamily.value = args.fontFamily
fontSize.value = args.fontSize
@@ -94,7 +91,6 @@ export const useCaptionStyleStore = defineStore('captionStyle', () => {
})
return {
lineNumber, // 显示字幕行数
lineBreak, // 换行方式
fontFamily, // 字体族
fontSize, // 字体大小

View File

@@ -19,25 +19,14 @@ export const useEngineControlStore = defineStore('engineControl', () => {
const engineEnabled = ref(false)
const sourceLang = ref<string>('en')
const targetLang = ref<string>('zh')
const transModel = ref<string>('ollama')
const ollamaName = ref<string>('')
const ollamaUrl = ref<string>('')
const ollamaApiKey = ref<string>('')
const engine = ref<string>('gummy')
const audio = ref<0 | 1>(0)
const translation = ref<boolean>(true)
const recording = ref<boolean>(false)
const API_KEY = ref<string>('')
const voskModelPath = ref<string>('')
const sosvModelPath = ref<string>('')
const glmUrl = ref<string>('https://open.bigmodel.cn/api/paas/v4/audio/transcriptions')
const glmModel = ref<string>('glm-asr-2512')
const glmApiKey = ref<string>('')
const recordingPath = ref<string>('')
const modelPath = ref<string>('')
const customized = ref<boolean>(false)
const customizedApp = ref<string>('')
const customizedCommand = ref<string>('')
const startTimeoutSeconds = ref<number>(30)
const changeSignal = ref<boolean>(false)
const errorSignal = ref<boolean>(false)
@@ -47,25 +36,14 @@ export const useEngineControlStore = defineStore('engineControl', () => {
engineEnabled: engineEnabled.value,
sourceLang: sourceLang.value,
targetLang: targetLang.value,
transModel: transModel.value,
ollamaName: ollamaName.value,
ollamaUrl: ollamaUrl.value,
ollamaApiKey: ollamaApiKey.value,
engine: engine.value,
audio: audio.value,
translation: translation.value,
recording: recording.value,
API_KEY: API_KEY.value,
voskModelPath: voskModelPath.value,
sosvModelPath: sosvModelPath.value,
glmUrl: glmUrl.value,
glmModel: glmModel.value,
glmApiKey: glmApiKey.value,
recordingPath: recordingPath.value,
modelPath: modelPath.value,
customized: customized.value,
customizedApp: customizedApp.value,
customizedCommand: customizedCommand.value,
startTimeoutSeconds: startTimeoutSeconds.value
customizedCommand: customizedCommand.value
}
window.electron.ipcRenderer.send('control.controls.change', controls)
}
@@ -88,35 +66,23 @@ export const useEngineControlStore = defineStore('engineControl', () => {
}
sourceLang.value = controls.sourceLang
targetLang.value = controls.targetLang
transModel.value = controls.transModel
ollamaName.value = controls.ollamaName
ollamaUrl.value = controls.ollamaUrl
ollamaApiKey.value = controls.ollamaApiKey
engine.value = controls.engine
audio.value = controls.audio
engineEnabled.value = controls.engineEnabled
translation.value = controls.translation
recording.value = controls.recording
API_KEY.value = controls.API_KEY
voskModelPath.value = controls.voskModelPath
sosvModelPath.value = controls.sosvModelPath
glmUrl.value = controls.glmUrl || 'https://open.bigmodel.cn/api/paas/v4/audio/transcriptions'
glmModel.value = controls.glmModel || 'glm-asr-2512'
glmApiKey.value = controls.glmApiKey
recordingPath.value = controls.recordingPath
modelPath.value = controls.modelPath
customized.value = controls.customized
customizedApp.value = controls.customizedApp
customizedCommand.value = controls.customizedCommand
startTimeoutSeconds.value = controls.startTimeoutSeconds
changeSignal.value = true
}
function emptyModelPathErr() {
notification.open({
placement: 'topLeft',
message: t('noti.empty'),
description: t('noti.emptyInfo'),
duration: null,
icon: () => h(ExclamationCircleOutlined, { style: 'color: #ff4d4f' })
description: t('noti.emptyInfo')
});
}
@@ -163,25 +129,14 @@ export const useEngineControlStore = defineStore('engineControl', () => {
engineEnabled, // 字幕引擎是否启用
sourceLang, // 源语言
targetLang, // 目标语言
transModel, // 翻译模型
ollamaName, // Ollama 模型
ollamaUrl,
ollamaApiKey,
engine, // 字幕引擎
audio, // 选择音频
translation, // 是否启用翻译
recording, // 是否启用录音
API_KEY, // API KEY
voskModelPath, // vosk 模型路径
sosvModelPath, // sosv 模型路径
glmUrl, // GLM API URL
glmModel, // GLM 模型名称
glmApiKey, // GLM API Key
recordingPath, // 录音保存路径
modelPath, // vosk 模型路径
customized, // 是否使用自定义字幕引擎
customizedApp, // 自定义字幕引擎的应用程序
customizedCommand, // 自定义字幕引擎的命令
startTimeoutSeconds, // 启动超时时间(秒)
setControls, // 设置引擎配置
sendControlsChange, // 发送最新控制消息到后端
emptyModelPathErr, // 模型路径为空时显示警告

View File

@@ -3,35 +3,20 @@ import { defineStore } from 'pinia'
import { i18n } from '../i18n'
import type { UILanguage, UITheme } from '../types'
import { engines, audioTypes, breakOptions, setThemeColor, getTheme } from '../i18n'
import { engines, audioTypes, antDesignTheme, breakOptions } from '../i18n'
import { useEngineControlStore } from './engineControl'
import { useCaptionStyleStore } from './captionStyle'
type RealTheme = 'light' | 'dark'
export const useGeneralSettingStore = defineStore('generalSetting', () => {
const uiLanguage = ref<UILanguage>('zh')
const realTheme = ref<RealTheme>('light')
const uiTheme = ref<UITheme>('system')
const uiColor = ref<string>('#1677ff')
const leftBarWidth = ref<number>(8)
const antdTheme = ref<Object>(getTheme())
function handleThemeChange(newTheme: RealTheme) {
realTheme.value = newTheme
if(newTheme === 'dark' && uiColor.value === '#000000') {
uiColor.value = '#b9d7ea'
}
if(newTheme === 'light' && uiColor.value === '#b9d7ea') {
uiColor.value = '#000000'
}
}
const antdTheme = ref<Object>(antDesignTheme['light'])
window.electron.ipcRenderer.invoke('control.nativeTheme.get').then((theme) => {
if(theme === 'light') setLightTheme()
else if(theme === 'dark') setDarkTheme()
handleThemeChange(theme)
})
watch(uiLanguage, (newValue) => {
@@ -45,48 +30,30 @@ export const useGeneralSettingStore = defineStore('generalSetting', () => {
watch(uiTheme, (newValue) => {
window.electron.ipcRenderer.send('control.uiTheme.change', newValue)
if(newValue === 'system'){
window.electron.ipcRenderer.invoke('control.nativeTheme.get').then((theme: RealTheme) => {
window.electron.ipcRenderer.invoke('control.nativeTheme.get').then((theme) => {
if(theme === 'light') setLightTheme()
else if(theme === 'dark') setDarkTheme()
handleThemeChange(theme)
})
}
else if(newValue === 'light'){
setLightTheme()
handleThemeChange('light')
}
else if(newValue === 'dark') {
setDarkTheme()
handleThemeChange('dark')
}
})
watch(uiColor, (newValue) => {
setThemeColor(newValue)
antdTheme.value = getTheme()
window.electron.ipcRenderer.send('control.uiColor.change', newValue)
else if(newValue === 'light') setLightTheme()
else if(newValue === 'dark') setDarkTheme()
})
watch(leftBarWidth, (newValue) => {
window.electron.ipcRenderer.send('control.leftBarWidth.change', newValue)
})
watch(realTheme, (newValue) => {
console.log('realTheme', newValue)
})
window.electron.ipcRenderer.on('control.uiLanguage.set', (_, args: UILanguage) => {
uiLanguage.value = args
})
window.electron.ipcRenderer.on('control.nativeTheme.change', (_, args: RealTheme) => {
window.electron.ipcRenderer.on('control.nativeTheme.change', (_, args) => {
if(args === 'light') setLightTheme()
else if(args === 'dark') setDarkTheme()
handleThemeChange(args)
})
function setLightTheme(){
antdTheme.value = getTheme(true)
antdTheme.value = antDesignTheme.light
const root = document.documentElement
root.style.setProperty('--control-background', '#fff')
root.style.setProperty('--tag-color', 'rgba(0, 0, 0, 0.45)')
@@ -94,7 +61,7 @@ export const useGeneralSettingStore = defineStore('generalSetting', () => {
}
function setDarkTheme(){
antdTheme.value = getTheme(false)
antdTheme.value = antDesignTheme.dark
const root = document.documentElement
root.style.setProperty('--control-background', '#000')
root.style.setProperty('--tag-color', 'rgba(255, 255, 255, 0.45)')
@@ -103,9 +70,7 @@ export const useGeneralSettingStore = defineStore('generalSetting', () => {
return {
uiLanguage,
realTheme,
uiTheme,
uiColor,
leftBarWidth,
antdTheme
}

View File

@@ -1,21 +0,0 @@
import { ref } from 'vue'
import { defineStore } from 'pinia'
import { type SoftwareLogItem } from '../types'
export const useSoftwareLogStore = defineStore('softwareLog', () => {
const softwareLogs = ref<SoftwareLogItem[]>([])
function clear() {
softwareLogs.value = []
}
window.electron.ipcRenderer.on('control.softwareLog.add', (_, log) => {
softwareLogs.value.push(log)
console.log(log)
})
return {
softwareLogs,
clear
}
})

View File

@@ -6,29 +6,17 @@ export interface Controls {
engineEnabled: boolean,
sourceLang: string,
targetLang: string,
transModel: string,
ollamaName: string,
ollamaUrl: string,
ollamaApiKey: string,
engine: string,
audio: 0 | 1,
translation: boolean,
recording: boolean,
API_KEY: string,
voskModelPath: string,
sosvModelPath: string,
glmUrl: string,
glmModel: string,
glmApiKey: string,
recordingPath: string,
modelPath: string,
customized: boolean,
customizedApp: string,
customizedCommand: string,
startTimeoutSeconds: number
customizedCommand: string
}
export interface Styles {
lineNumber: number,
lineBreak: number,
fontFamily: string,
fontSize: number,
@@ -57,23 +45,14 @@ export interface CaptionItem {
translation: string
}
export interface SoftwareLogItem {
type: "INFO" | "WARN" | "ERROR",
index: number,
time: string,
text: string
}
export interface FullConfig {
platform: string,
uiLanguage: UILanguage,
uiTheme: UITheme,
uiColor: string,
leftBarWidth: number,
styles: Styles,
controls: Controls,
captionLog: CaptionItem[],
softwareLog: SoftwareLogItem[]
captionLog: CaptionItem[]
}
export interface EngineInfo {

View File

@@ -12,53 +12,26 @@
textShadow: captionStyle.textShadow ? `${captionStyle.offsetX}px ${captionStyle.offsetY}px ${captionStyle.blur}px ${captionStyle.textShadowColor}` : 'none'
}"
>
<template v-if="captionData.length">
<template
v-for="val in revArr[Math.min(captionStyle.lineNumber, captionData.length)]"
:key="captionData[captionData.length - val].time_s"
>
<p :class="[captionStyle.lineBreak?'':'left-ellipsis']" :style="{
fontFamily: captionStyle.fontFamily,
fontSize: captionStyle.fontSize + 'px',
color: captionStyle.fontColor,
fontWeight: captionStyle.fontWeight * 100
}">
<span>{{ captionData[captionData.length - val].text }}</span>
</p>
<p :class="[captionStyle.lineBreak?'':'left-ellipsis']"
v-if="captionStyle.transDisplay && captionData[captionData.length - val].translation"
:style="{
fontFamily: captionStyle.transFontFamily,
fontSize: captionStyle.transFontSize + 'px',
color: captionStyle.transFontColor,
fontWeight: captionStyle.transFontWeight * 100
}">
<span>{{ captionData[captionData.length - val].translation }}</span>
</p>
</template>
</template>
<template v-else>
<template v-for="val in captionStyle.lineNumber" :key="val">
<p :class="[captionStyle.lineBreak?'':'left-ellipsis']" :style="{
fontFamily: captionStyle.fontFamily,
fontSize: captionStyle.fontSize + 'px',
color: captionStyle.fontColor,
fontWeight: captionStyle.fontWeight * 100
}">
<span>{{ $t('example.original') }}</span>
</p>
<p :class="[captionStyle.lineBreak?'':'left-ellipsis']"
v-if="captionStyle.transDisplay"
:style="{
fontFamily: captionStyle.transFontFamily,
fontSize: captionStyle.transFontSize + 'px',
color: captionStyle.transFontColor,
fontWeight: captionStyle.transFontWeight * 100
}">
<span>{{ $t('example.translation') }}</span>
</p>
</template>
</template>
<p :class="[captionStyle.lineBreak?'':'left-ellipsis']" :style="{
fontFamily: captionStyle.fontFamily,
fontSize: captionStyle.fontSize + 'px',
color: captionStyle.fontColor,
fontWeight: captionStyle.fontWeight * 100
}">
<span v-if="captionData.length">{{ captionData[captionData.length-1].text }}</span>
<span v-else>{{ $t('example.original') }}</span>
</p>
<p :class="[captionStyle.lineBreak?'':'left-ellipsis']"
v-if="captionStyle.transDisplay"
:style="{
fontFamily: captionStyle.transFontFamily,
fontSize: captionStyle.transFontSize + 'px',
color: captionStyle.transFontColor,
fontWeight: captionStyle.transFontWeight * 100
}">
<span v-if="captionData.length">{{ captionData[captionData.length-1].translation }}</span>
<span v-else>{{ $t('example.translation') }}</span>
</p>
</div>
<div class="title-bar" :style="{color: captionStyle.fontColor}">
@@ -83,14 +56,6 @@ import { ref, onMounted } from 'vue';
import { useCaptionStyleStore } from '@renderer/stores/captionStyle';
import { useCaptionLogStore } from '@renderer/stores/captionLog';
import { storeToRefs } from 'pinia';
const revArr = {
1: [1],
2: [2, 1],
3: [3, 2, 1],
4: [4, 3, 2, 1],
}
const captionStyle = useCaptionStyleStore();
const captionLog = useCaptionLogStore();
const { captionData } = storeToRefs(captionLog);

View File

@@ -11,13 +11,7 @@
<a-col :span="24 - leftBarWidth">
<div class="caption-data">
<EngineStatus />
<div class="log-container">
<a-menu v-model:selectedKeys="current" mode="horizontal" :items="items" />
<div style="padding: 16px;">
<CaptionLog v-if="current[0] === 'captionLog'" />
<SoftwareLog v-else />
</div>
</div>
<CaptionLog />
</div>
</a-col>
</a-row>
@@ -30,22 +24,11 @@ import CaptionStyle from '../components/CaptionStyle.vue'
import EngineControl from '../components/EngineControl.vue'
import EngineStatus from '@renderer/components/EngineStatus.vue'
import CaptionLog from '../components/CaptionLog.vue'
import SoftwareLog from '@renderer/components/SoftwareLog.vue'
import { storeToRefs } from 'pinia'
import { useGeneralSettingStore } from '@renderer/stores/generalSetting'
import { ref, watch } from 'vue'
import { MenuProps } from 'ant-design-vue'
import { logMenu } from '@renderer/i18n'
const generalSettingStore = useGeneralSettingStore()
const { leftBarWidth, antdTheme, uiLanguage } = storeToRefs(generalSettingStore)
const current = ref<string[]>(['captionLog'])
const items = ref<MenuProps['items']>(logMenu[uiLanguage.value])
watch(uiLanguage, (val) => {
items.value = logMenu[val]
})
const { leftBarWidth, antdTheme } = storeToRefs(generalSettingStore)
</script>
<style scoped>
@@ -70,10 +53,4 @@ watch(uiLanguage, (val) => {
.caption-data::-webkit-scrollbar {
display: none;
}
.log-container {
padding: 20px 10px;
border-radius: 8px;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
}
</style>