Compare commits

7 commits

| Author | SHA1 | Date |
|---|---|---|
| | 564954a834 | |
| | aed15af386 | |
| | 4f9d33abc1 | |
| | 0dc70d491e | |
| | 086ea90a5f | |
| | 3324b630d1 | |
| | 0825e48902 | |
5  .gitignore (vendored)

@@ -7,8 +7,13 @@ out
 __pycache__
 .venv
 test.py
 
 engine/build
 engine/portaudio
 engine/pyinstaller_cache
+engine/models
+engine/notebook
+# engine/main.spec
 
+.repomap
+.virtualme
 
68  README.md

@@ -3,7 +3,7 @@
 <h1 align="center">auto-caption</h1>
 <p>Auto Caption 是一个跨平台的实时字幕显示软件。</p>
 <p>
-  <a href="https://github.com/HiMeditator/auto-caption/releases"><img src="https://img.shields.io/badge/release-1.0.0-blue"></a>
+  <a href="https://github.com/HiMeditator/auto-caption/releases"><img src="https://img.shields.io/badge/release-1.1.1-blue"></a>
   <a href="https://github.com/HiMeditator/auto-caption/issues"><img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange"></a>
   <img src="https://img.shields.io/github/languages/top/HiMeditator/auto-caption?color=royalblue">
   <img src="https://img.shields.io/github/repo-size/HiMeditator/auto-caption?color=green">

@@ -14,7 +14,7 @@
   | <a href="./README_en.md">English</a>
   | <a href="./README_ja.md">日本語</a> |
 </p>
-<p><i>v1.0.0 版本已经发布,新增 SOSV 本地字幕模型。当前功能已经基本完整,暂无继续开发计划...</i></p>
+<p><i>v1.1.1 版本已经发布,新增 GLM-ASR 云端字幕模型和 OpenAI 兼容模型翻译...</i></p>
 </div>
 
 

@@ -35,18 +35,24 @@ SOSV 模型下载:[ Shepra-ONNX SenseVoice Model](https://github.com/HiMeditat
 [更新日志](./docs/CHANGELOG.md)
 
 ## 👁️🗨️ 预览
 
 https://github.com/user-attachments/assets/9c188d78-9520-4397-bacf-4c8fdcc54874
 
 ## ✨ 特性
 
 - 生成音频输出或麦克风输入的字幕
-- 支持调用本地 Ollama 模型或云端 Google 翻译 API 进行翻译
+- 支持调用本地 Ollama 模型、云端 OpenAI 兼容模型、或云端 Google 翻译 API 进行翻译
 - 跨平台(Windows、macOS、Linux)、多界面语言(中文、英语、日语)支持
 - 丰富的字幕样式设置(字体、字体大小、字体粗细、字体颜色、背景颜色等)
-- 灵活的字幕引擎选择(阿里云 Gummy 云端模型、本地 Vosk 模型、本地 SOSV 模型、还可以自己开发模型)
+- 灵活的字幕引擎选择(阿里云 Gummy 云端模型、GLM-ASR 云端模型、本地 Vosk 模型、本地 SOSV 模型、还可以自己开发模型)
 - 多语言识别与翻译(见下文“⚙️ 自带字幕引擎说明”)
 - 字幕记录展示与导出(支持导出 `.srt` 和 `.json` 格式)
 
 ## 📖 基本使用
 
 > ⚠️ 注意:目前只维护了 Windows 平台的软件的最新版本,其他平台的最后版本停留在 v1.0.0。
 
 软件已经适配了 Windows、macOS 和 Linux 平台。测试过的主流平台信息如下:
 
 | 操作系统版本 | 处理器架构 | 获取系统音频输入 | 获取系统音频输出 |

@@ -59,14 +65,15 @@ macOS 平台和 Linux 平台获取系统音频输出需要进行额外设置,
 下载软件后,需要根据自己的需求选择对应的模型,然后配置模型。
 
-| | 识别效果 | 部署类型 | 支持语言 | 翻译 | 备注 |
-| --- | --- | --- | --- | --- | --- |
-| [Gummy](https://help.aliyun.com/zh/model-studio/gummy-speech-recognition-translation) | 很好😊 | 云端 / 阿里云 | 10 种 | 自带翻译 | 收费,0.54CNY / 小时 |
-| [Vosk](https://alphacephei.com/vosk) | 较差😞 | 本地 / CPU | 超过 30 种 | 需额外配置 | 支持的语言非常多 |
-| [SOSV](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html) | 一般😐 | 本地 / CPU | 5 种 | 需额外配置 | 仅有一个模型 |
-| 自己开发 | 🤔 | 自定义 | 自定义 | 自定义 | 根据[文档](./docs/engine-manual/zh.md)使用 Python 自己开发 |
+| | 准确率 | 实时性 | 部署类型 | 支持语言 | 翻译 | 备注 |
+| --- | --- | --- | --- | --- | --- | --- |
+| [Gummy](https://help.aliyun.com/zh/model-studio/gummy-speech-recognition-translation) | 很好😊 | 很好😊 | 云端 / 阿里云 | 10 种 | 自带翻译 | 收费,0.54CNY / 小时 |
+| [glm-asr-2512](https://docs.bigmodel.cn/cn/guide/models/sound-and-video/glm-asr-2512) | 很好😊 | 较差😞 | 云端 / 智谱 AI | 4 种 | 需额外配置 | 收费,约 0.72CNY / 小时 |
+| [Vosk](https://alphacephei.com/vosk) | 较差😞 | 很好😊 | 本地 / CPU | 超过 30 种 | 需额外配置 | 支持的语言非常多 |
+| [SOSV](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html) | 一般😐 | 一般😐 | 本地 / CPU | 5 种 | 需额外配置 | 仅有一个模型 |
+| 自己开发 | 🤔 | 🤔 | 自定义 | 自定义 | 自定义 | 根据[文档](./docs/engine-manual/zh.md)使用 Python 自己开发 |
 
-如果你选择使用 Vosk 或 SOSV 模型,你还需要配置自己的翻译模型。
+如果你选择的不是 Gummy 模型,你还需要配置自己的翻译模型。
 
 ### 配置翻译模型

@@ -78,11 +85,22 @@ macOS 平台和 Linux 平台获取系统音频输出需要进行额外设置,
 
 > 注意:使用参数量过大的模型会导致资源消耗和翻译延迟较大。建议使用参数量小于 1B 的模型,比如: `qwen2.5:0.5b`, `qwen3:0.6b`。
 
-使用该模型之前你需要确定本机安装了 [Ollama](https://ollama.com/) 软件,并已经下载了需要的大语言模型。只需要将需要调用的大模型名称添加到设置中的 `Ollama` 字段中。
+使用该模型之前你需要确定本机安装了 [Ollama](https://ollama.com/) 软件,并已经下载了需要的大语言模型。只需要将需要调用的大模型名称添加到设置中的 `模型名称` 字段中,并保证 `Base URL` 字段为空。
+
+#### OpenAI 兼容模型
+
+如果觉得本地 Ollama 模型的翻译效果不佳,或者不想在本地安装 Ollama 模型,那么可以使用云端的 OpenAI 兼容模型。
+
+以下是一些模型提供商的 `Base URL`:
+- OpenAI: https://api.openai.com/v1
+- DeepSeek:https://api.deepseek.com
+- 阿里云:https://dashscope.aliyuncs.com/compatible-mode/v1
+
+API Key 需要在对应的模型提供商处获取。
 
 #### Google 翻译 API
 
-> 注意:Google 翻译 API 在部分地区无法使用。
+> 注意:Google 翻译 API 在无法访问国际网络的地区无法使用。
 
 无需任何配置,联网即可使用。

@@ -90,11 +108,17 @@ macOS 平台和 Linux 平台获取系统音频输出需要进行额外设置,
 
 > 国际版的阿里云服务似乎并没有提供 Gummy 模型,因此目前非中国用户可能无法使用 Gummy 字幕引擎。
 
-如果要使用默认的 Gummy 字幕引擎(使用云端模型进行语音识别和翻译),首先需要获取阿里云百炼平台的 API KEY,然后将 API KEY 添加到软件设置中或者配置到环境变量中(仅 Windows 平台支持读取环境变量中的 API KEY),这样才能正常使用该模型。相关教程:
+如果要使用默认的 Gummy 字幕引擎(使用云端模型进行语音识别和翻译),首先需要获取阿里云百炼平台的 API KEY,然后将 API KEY 添加到软件设置中(在字幕引擎设置的更多设置中)或者配置到环境变量中(仅 Windows 平台支持读取环境变量中的 API KEY),这样才能正常使用该模型。相关教程:
 
 - [获取 API KEY](https://help.aliyun.com/zh/model-studio/get-api-key)
 - [将 API Key 配置到环境变量](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
 
+### 使用 GLM-ASR 模型
+
+使用前需要获取智谱 AI 平台的 API KEY,并添加到软件设置中。
+
+API KEY 获取相关链接:[快速开始](https://docs.bigmodel.cn/cn/guide/start/quick-start)。
+
 ### 使用 Vosk 模型
 
 > Vosk 模型的识别效果较差,请谨慎使用。

@@ -132,7 +156,7 @@ python main.py \
 
 ## ⚙️ 自带字幕引擎说明
 
-目前软件自带 3 个字幕引擎,正在规划新的引擎。它们的详细信息如下。
+目前软件自带 4 个字幕引擎。它们的详细信息如下。
 
 ### Gummy 字幕引擎(云端)

@@ -159,6 +183,10 @@ $$
 
 而且引擎只会获取到音频流的时候才会上传数据,因此实际上传速率可能更小。模型结果回传流量消耗较小,没有纳入考虑。
 
+### GLM-ASR 字幕引擎(云端)
+
+https://docs.bigmodel.cn/cn/guide/models/sound-and-video/glm-asr-2512
+
 ### Vosk 字幕引擎(本地)
 
 基于 [vosk-api](https://github.com/alphacep/vosk-api) 开发。该字幕引擎的优点是可选的语言模型非常多(超过 30 种),缺点是识别效果比较差,且生成内容没有标点符号。

@@ -168,16 +196,6 @@ $$
 
 [SOSV](https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model) 是一个整合包,该整合包主要基于 [Shepra-ONNX SenseVoice](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html),并添加了端点检测模型和标点恢复模型。该模型支持识别的语言有:英语、中文、日语、韩语、粤语。
 
-### 新规划字幕引擎
-
-以下为备选模型,将根据模型效果和集成难易程度选择。
-
-- [faster-whisper](https://github.com/SYSTRAN/faster-whisper)
-- [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx)
-- [SenseVoice](https://github.com/FunAudioLLM/SenseVoice)
-- [FunASR](https://github.com/modelscope/FunASR)
-- [WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit)
 
 ## 🚀 项目运行
 
 
64  README_en.md

@@ -3,7 +3,7 @@
 <h1 align="center">auto-caption</h1>
 <p>Auto Caption is a cross-platform real-time caption display software.</p>
 <p>
-  <a href="https://github.com/HiMeditator/auto-caption/releases"><img src="https://img.shields.io/badge/release-1.0.0-blue"></a>
+  <a href="https://github.com/HiMeditator/auto-caption/releases"><img src="https://img.shields.io/badge/release-1.1.1-blue"></a>
   <a href="https://github.com/HiMeditator/auto-caption/issues"><img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange"></a>
   <img src="https://img.shields.io/github/languages/top/HiMeditator/auto-caption?color=royalblue">
   <img src="https://img.shields.io/github/repo-size/HiMeditator/auto-caption?color=green">

@@ -14,7 +14,7 @@
   | <b>English</b>
   | <a href="./README_ja.md">日本語</a> |
 </p>
-<p><i>Version 1.0.0 has been released, with the addition of the SOSV local caption model. The current features are basically complete, and there are no further development plans...</i></p>
+<p><i>v1.1.1 has been released, adding the GLM-ASR cloud caption model and OpenAI compatible model translation...</i></p>
 </div>
 
 

@@ -35,18 +35,24 @@ SOSV Model Download: [Shepra-ONNX SenseVoice Model](https://github.com/HiMeditat
 [Changelog](./docs/CHANGELOG.md)
 
 ## 👁️🗨️ Preview
 
 https://github.com/user-attachments/assets/9c188d78-9520-4397-bacf-4c8fdcc54874
 
 ## ✨ Features
 
 - Generate captions from audio output or microphone input
-- Supports translation by calling local Ollama models or cloud-based Google Translate API
+- Supports calling local Ollama models, cloud-based OpenAI compatible models, or cloud-based Google Translate API for translation
 - Cross-platform (Windows, macOS, Linux) and multi-language interface (Chinese, English, Japanese) support
 - Rich caption style settings (font, font size, font weight, font color, background color, etc.)
-- Flexible caption engine selection (Alibaba Cloud Gummy cloud model, local Vosk model, local SOSV model, or you can develop your own model)
+- Flexible caption engine selection (Aliyun Gummy cloud model, GLM-ASR cloud model, local Vosk model, local SOSV model, or you can develop your own model)
 - Multi-language recognition and translation (see below "⚙️ Built-in Subtitle Engines")
 - Subtitle record display and export (supports exporting `.srt` and `.json` formats)
 
 ## 📖 Basic Usage
 
 > ⚠️ Note: Currently, only the latest version of the software on Windows platform is maintained, while the last versions for other platforms remain at v1.0.0.
 
 The software has been adapted for Windows, macOS, and Linux platforms. The tested platform information is as follows:
 
 | OS Version | Architecture | System Audio Input | System Audio Output |

@@ -60,14 +66,15 @@ Additional configuration is required to capture system audio output on macOS and
 After downloading the software, you need to select the corresponding model according to your needs and then configure the model.
 
-| | Recognition Quality | Deployment Type | Supported Languages | Translation | Notes |
-| --- | --- | --- | --- | --- | --- |
-| [Gummy](https://help.aliyun.com/zh/model-studio/gummy-speech-recognition-translation) | Excellent 😊 | Alibaba Cloud | 10 languages | Built-in | Paid, 0.54 CNY/hour |
-| [Vosk](https://alphacephei.com/vosk) | Poor 😞 | Local / CPU | Over 30 languages | Requires setup | Supports many languages |
-| [SOSV](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html) | Fair 😐 | Local / CPU | 5 languages | Requires setup | Only one model available |
-| DIY Development | 🤔 | Custom | Custom | Custom | Develop your own using Python according to [documentation](./docs/engine-manual/zh.md) |
+| | Accuracy | Real-time | Deployment Type | Supported Languages | Translation | Notes |
+| --- | --- | --- | --- | --- | --- | --- |
+| [Gummy](https://help.aliyun.com/zh/model-studio/gummy-speech-recognition-translation) | Very good 😊 | Very good 😊 | Cloud / Alibaba Cloud | 10 languages | Built-in translation | Paid, 0.54 CNY/hour |
+| [glm-asr-2512](https://docs.bigmodel.cn/cn/guide/models/sound-and-video/glm-asr-2512) | Very good 😊 | Poor 😞 | Cloud / Zhipu AI | 4 languages | Requires additional configuration | Paid, approximately 0.72 CNY/hour |
+| [Vosk](https://alphacephei.com/vosk) | Poor 😞 | Very good 😊 | Local / CPU | Over 30 languages | Requires additional configuration | Supports many languages |
+| [SOSV](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html) | Average 😐 | Average 😐 | Local / CPU | 5 languages | Requires additional configuration | Only one model |
+| Self-developed | 🤔 | 🤔 | Custom | Custom | Custom | Develop your own using Python according to the [documentation](./docs/engine-manual/en.md) |
 
-If you choose to use the Vosk or SOSV model, you also need to configure your own translation model.
+If you choose a model other than Gummy, you also need to configure your own translation model.
 
 ### Configuring Translation Models

@@ -79,7 +86,18 @@ If you choose to use the Vosk or SOSV model, you also need to configure your own
 
 > Note: Using models with too many parameters will lead to high resource consumption and translation delays. It is recommended to use models with less than 1B parameters, such as: `qwen2.5:0.5b`, `qwen3:0.6b`.
 
-Before using this model, you need to ensure that [Ollama](https://ollama.com/) software is installed on your machine and the required large language model has been downloaded. Simply add the name of the large model you want to call to the `Ollama` field in the settings.
+Before using this model, you need to confirm that the [Ollama](https://ollama.com/) software is installed on your local machine and that you have downloaded the required large language model. Simply add the name of the large model you want to call to the `Model Name` field in the settings, and ensure that the `Base URL` field is empty.
+
+#### OpenAI Compatible Model
+
+If you feel the translation effect of the local Ollama model is not good enough, or don't want to install the Ollama model locally, then you can use cloud-based OpenAI compatible models.
+
+Here are some model provider `Base URL`s:
+- OpenAI: https://api.openai.com/v1
+- DeepSeek: https://api.deepseek.com
+- Alibaba Cloud: https://dashscope.aliyuncs.com/compatible-mode/v1
+
+The API Key needs to be obtained from the corresponding model provider.
 
 #### Google Translate API
 

@@ -96,6 +114,12 @@ To use the default Gummy caption engine (using cloud models for speech recogniti
 - [Get API KEY](https://help.aliyun.com/zh/model-studio/get-api-key)
 - [Configure API Key through Environment Variables](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
 
+### Using the GLM-ASR Model
+
+Before using it, you need to obtain an API KEY from the Zhipu AI platform and add it to the software settings.
+
+For API KEY acquisition, see: [Quick Start](https://docs.bigmodel.cn/en/guide/start/quick-start).
+
 ### Using Vosk Model
 
 > The recognition effect of the Vosk model is poor, please use it with caution.

@@ -133,7 +157,7 @@ python main.py \
 
 ## ⚙️ Built-in Subtitle Engines
 
-Currently, the software comes with 3 caption engines, with new engines under development. Their detailed information is as follows.
+Currently, the software comes with 4 caption engines, with new engines under development. Their detailed information is as follows.
 
 ### Gummy Subtitle Engine (Cloud)

@@ -160,6 +184,10 @@ $$
 
 The engine only uploads data when receiving audio streams, so the actual upload rate may be lower. The return traffic consumption of model results is small and not considered here.
 
+### GLM-ASR Caption Engine (Cloud)
+
+https://docs.bigmodel.cn/en/guide/models/sound-and-video/glm-asr-2512
+
 ### Vosk Subtitle Engine (Local)
 
 Developed based on [vosk-api](https://github.com/alphacep/vosk-api). The advantage of this caption engine is that there are many optional language models (over 30 languages), but the disadvantage is that the recognition effect is relatively poor, and the generated content has no punctuation.

@@ -168,16 +196,6 @@ Developed based on [vosk-api](https://github.com/alphacep/vosk-api). The advanta
 
 [SOSV](https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model) is an integrated package, mainly based on [Shepra-ONNX SenseVoice](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html), with added endpoint detection model and punctuation restoration model. The languages supported by this model for recognition are: English, Chinese, Japanese, Korean, and Cantonese.
 
-### Planned New Subtitle Engines
-
-The following are candidate models that will be selected based on model performance and ease of integration.
-
-- [faster-whisper](https://github.com/SYSTRAN/faster-whisper)
-- [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx)
-- [SenseVoice](https://github.com/FunAudioLLM/SenseVoice)
-- [FunASR](https://github.com/modelscope/FunASR)
-- [WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit)
 
 ## 🚀 Project Setup
 
 
64  README_ja.md

@@ -3,7 +3,7 @@
 <h1 align="center">auto-caption</h1>
 <p>Auto Caption はクロスプラットフォームのリアルタイム字幕表示ソフトウェアです。</p>
 <p>
-  <a href="https://github.com/HiMeditator/auto-caption/releases"><img src="https://img.shields.io/badge/release-1.0.0-blue"></a>
+  <a href="https://github.com/HiMeditator/auto-caption/releases"><img src="https://img.shields.io/badge/release-1.1.1-blue"></a>
   <a href="https://github.com/HiMeditator/auto-caption/issues"><img src="https://img.shields.io/github/issues/HiMeditator/auto-caption?color=orange"></a>
   <img src="https://img.shields.io/github/languages/top/HiMeditator/auto-caption?color=royalblue">
   <img src="https://img.shields.io/github/repo-size/HiMeditator/auto-caption?color=green">

@@ -14,7 +14,7 @@
   | <a href="./README_en.md">English</a>
   | <b>日本語</b> |
 </p>
-<p><i>v1.0.0 バージョンがリリースされ、SOSV ローカル字幕モデルが追加されました。現在の機能は基本的に完了しており、今後の開発計画はありません...</i></p>
+<p><i>v1.1.1 バージョンがリリースされました。GLM-ASR クラウド字幕モデルと OpenAI 互換モデル翻訳が追加されました...</i></p>
 </div>
 
 

@@ -35,18 +35,24 @@ SOSV モデルダウンロード: [Shepra-ONNX SenseVoice Model](https://github.
 [更新履歴](./docs/CHANGELOG.md)
 
 ## 👁️🗨️ プレビュー
 
 https://github.com/user-attachments/assets/9c188d78-9520-4397-bacf-4c8fdcc54874
 
 ## ✨ 特徴
 
 - 音声出力またはマイク入力からの字幕生成
-- ローカルのOllamaモデルまたはクラウドベースのGoogle翻訳APIを呼び出して翻訳をサポート
+- ローカルのOllamaモデル、クラウド上のOpenAI互換モデル、またはクラウド上のGoogle翻訳APIを呼び出して翻訳を行うことをサポートしています
 - クロスプラットフォーム(Windows、macOS、Linux)、多言語インターフェース(中国語、英語、日本語)対応
 - 豊富な字幕スタイル設定(フォント、フォントサイズ、フォント太さ、フォント色、背景色など)
-- 柔軟な字幕エンジン選択(阿里云Gummyクラウドモデル、ローカルVoskモデル、ローカルSOSVモデル、または独自にモデルを開発可能)
+- 柔軟な字幕エンジン選択(阿里云Gummyクラウドモデル、GLM-ASRクラウドモデル、ローカルVoskモデル、ローカルSOSVモデル、または独自にモデルを開発可能)
 - 多言語認識と翻訳(下記「⚙️ 字幕エンジン説明」参照)
 - 字幕記録表示とエクスポート(`.srt` および `.json` 形式のエクスポートに対応)
 
 ## 📖 基本使い方
 
 > ⚠️ 注意:現在、Windowsプラットフォームのソフトウェアの最新バージョンのみがメンテナンスされており、他のプラットフォームの最終バージョンはv1.0.0のままです。
 
 このソフトウェアは Windows、macOS、Linux プラットフォームに対応しています。テスト済みのプラットフォーム情報は以下の通りです:
 
 | OS バージョン | アーキテクチャ | システムオーディオ入力 | システムオーディオ出力 |

@@ -61,14 +67,15 @@ macOS および Linux プラットフォームでシステムオーディオ出
 ソフトウェアをダウンロードした後、自分のニーズに応じて対応するモデルを選択し、モデルを設定する必要があります。
 
-| | 認識効果 | デプロイタイプ | 対応言語 | 翻訳 | 備考 |
-| --- | --- | --- | --- | --- | --- |
-| [Gummy](https://help.aliyun.com/zh/model-studio/gummy-speech-recognition-translation) | 良好😊 | クラウド / 阿里云 | 10種 | 内蔵翻訳 | 有料、0.54CNY / 時間 |
-| [Vosk](https://alphacephei.com/vosk) | 不良😞 | ローカル / CPU | 30種以上 | 追加設定必要 | 対応言語が非常に多い |
-| [SOSV](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html) | 一般😐 | ローカル / CPU | 5種 | 追加設定必要 | モデルは一つのみ |
-| 自前開発 | 🤔 | カスタム | カスタム | カスタム | [ドキュメント](./docs/engine-manual/zh.md)に従ってPythonで自前開発 |
+| | 正確性 | 実時間性 | デプロイタイプ | 対応言語 | 翻訳 | 備考 |
+| --- | --- | --- | --- | --- | --- | --- |
+| [Gummy](https://help.aliyun.com/zh/model-studio/gummy-speech-recognition-translation) | とても良い😊 | とても良い😊 | クラウド / アリババクラウド | 10言語 | 内蔵翻訳 | 有料、0.54元/時間 |
+| [glm-asr-2512](https://docs.bigmodel.cn/cn/guide/models/sound-and-video/glm-asr-2512) | とても良い😊 | 悪い😞 | クラウド / Zhipu AI | 4言語 | 追加設定が必要 | 有料、約0.72元/時間 |
+| [Vosk](https://alphacephei.com/vosk) | 悪い😞 | とても良い😊 | ローカル / CPU | 30言語以上 | 追加設定が必要 | 多くの言語に対応 |
+| [SOSV](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html) | 普通😐 | 普通😐 | ローカル / CPU | 5言語 | 追加設定が必要 | 1つのモデルのみ |
+| 自分で開発 | 🤔 | 🤔 | カスタム | カスタム | カスタム | [ドキュメント](./docs/engine-manual/ja.md)に従ってPythonを使用して自分で開発 |
 
-VoskまたはSOSVモデルを使用する場合、独自の翻訳モデルも設定する必要があります。
+Gummyモデル以外を選択した場合、独自の翻訳モデルを設定する必要があります。
 
 ### 翻訳モデルの設定

@@ -80,7 +87,18 @@ VoskまたはSOSVモデルを使用する場合、独自の翻訳モデルも設
 
 > 注意:パラメータ数が多すぎるモデルを使用すると、リソース消費と翻訳遅延が大きくなります。1B未満のパラメータ数のモデルを使用することを推奨します。例:`qwen2.5:0.5b`、`qwen3:0.6b`。
 
-このモデルを使用する前に、ローカルマシンに[Ollama](https://ollama.com/)ソフトウェアがインストールされ、必要な大規模言語モデルがダウンロードされていることを確認してください。必要な大規模モデル名を設定の`Ollama`フィールドに追加するだけでOKです。
+このモデルを使用する前に、ローカルマシンに[Ollama](https://ollama.com/)ソフトウェアがインストールされており、必要な大規模言語モデルをダウンロード済みであることを確認してください。設定で呼び出す必要がある大規模モデル名を「モデル名」フィールドに入力し、「Base URL」フィールドが空であることを確認してください。
+
+#### OpenAI互換モデル
+
+ローカルのOllamaモデルの翻訳効果が良くないと感じる場合や、ローカルにOllamaモデルをインストールしたくない場合は、クラウド上のOpenAI互換モデルを使用できます。
+
+いくつかのモデルプロバイダの「Base URL」:
+- OpenAI: https://api.openai.com/v1
+- DeepSeek: https://api.deepseek.com
+- アリババクラウド: https://dashscope.aliyuncs.com/compatible-mode/v1
+
+API Keyは対応するモデルプロバイダから取得する必要があります。
 
 #### Google翻訳API
 

@@ -97,6 +115,12 @@ VoskまたはSOSVモデルを使用する場合、独自の翻訳モデルも設
 - [API KEYの取得](https://help.aliyun.com/zh/model-studio/get-api-key)
 - [環境変数へのAPI Keyの設定](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
 
+### GLM-ASR モデルの使用
+
+使用前に、Zhipu AI プラットフォームから API キーを取得し、それをソフトウェアの設定に追加する必要があります。
+
+API キーの取得についてはこちらをご覧ください:[クイックスタート](https://docs.bigmodel.cn/ja/guide/start/quick-start)。
+
 ### Voskモデルの使用
 
 > Voskモデルの認識効果は不良のため、注意して使用してください。

@@ -134,7 +158,7 @@ python main.py \
 
 ## ⚙️ 字幕エンジン説明
 
-現在、ソフトウェアには3つの字幕エンジンが搭載されており、新しいエンジンが計画されています。それらの詳細情報は以下の通りです。
+現在、ソフトウェアには4つの字幕エンジンが搭載されており、新しいエンジンが計画されています。それらの詳細情報は以下の通りです。
 
 ### Gummy 字幕エンジン(クラウド)

@@ -161,6 +185,10 @@ $$
 
 また、エンジンはオーディオストリームを取得したときのみデータをアップロードするため、実際のアップロードレートはさらに小さくなる可能性があります。モデル結果の返信トラフィック消費量は小さく、ここでは考慮していません。
 
+### GLM-ASR 字幕エンジン(クラウド)
+
+https://docs.bigmodel.cn/ja/guide/models/sound-and-video/glm-asr-2512
+
 ### Vosk字幕エンジン(ローカル)
 
 [vosk-api](https://github.com/alphacep/vosk-api)をベースに開発。この字幕エンジンの利点は選択可能な言語モデルが非常に多く(30言語以上)、欠点は認識効果が比較的悪く、生成内容に句読点がないことです。

@@ -169,16 +197,6 @@ $$
 
 [SOSV](https://github.com/HiMeditator/auto-caption/releases/tag/sosv-model)は統合パッケージで、主に[Shepra-ONNX SenseVoice](https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html)をベースにし、エンドポイント検出モデルと句読点復元モデルを追加しています。このモデルが認識をサポートする言語は:英語、中国語、日本語、韓国語、広東語です。
 
-### 新規計画字幕エンジン
-
-以下は候補モデルであり、モデルの性能と統合の容易さに基づいて選択されます。
-
-- [faster-whisper](https://github.com/SYSTRAN/faster-whisper)
-- [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx)
-- [SenseVoice](https://github.com/FunAudioLLM/SenseVoice)
-- [FunASR](https://github.com/modelscope/FunASR)
-- [WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit)
 
 ## 🚀 プロジェクト実行
 
 
BIN  9 image assets changed (binary; only sizes recorded):
 68 KiB → 52 KiB, 69 KiB → 52 KiB, 72 KiB → 54 KiB, 60 KiB → 79 KiB, 62 KiB → 82 KiB, 81 KiB → 87 KiB, 404 KiB → 476 KiB, 417 KiB → 488 KiB, 417 KiB → 486 KiB
(macOS entitlements plist)

@@ -8,5 +8,9 @@
     <true/>
     <key>com.apple.security.cs.allow-dyld-environment-variables</key>
     <true/>
+    <key>com.apple.security.cs.disable-library-validation</key>
+    <true/>
+    <key>com.apple.security.device.audio-input</key>
+    <true/>
 </dict>
 </plist>
docs/CHANGELOG.md

@@ -172,4 +172,19 @@
 - 优化部分提示信息显示位置
 - 替换重采样模型,提高音频重采样质量
 - 带有额外信息的标签颜色改为与主题色一致
+
+## v1.1.0
+
+### 新增功能
+
+- 添加基于 GLM-ASR 的字幕引擎
+- 添加 OpenAI API 兼容模型作为新的翻译模型
+
+## v1.1.1
+
+### 优化体验
+
+- 取消字幕窗口的顶置选项,字幕窗口将始终处于顶置状态
+- 将字幕窗口顶置选项改为鼠标穿透选项,当图钉图标为实心时,表示启用鼠标穿透
15  docs/TODO.md

@@ -23,17 +23,8 @@
 - [x] 前端页面添加日志内容展示 *2025/08/19*
 - [x] 添加 Ollama 模型用于本地字幕引擎的翻译 *2025/09/04*
 - [x] 验证 / 添加基于 sherpa-onnx 的字幕引擎 *2025/09/06*
+- [x] 添加 GLM-ASR 模型 *2026/01/10*
 
-## 待完成
+## TODO
 
-- [ ] 调研更多的云端模型(火山、OpenAI、Google等)
-- [ ] 验证 / 添加基于 sherpa-onnx 的字幕引擎
-
-## 后续计划
-
-- [ ] 验证 / 添加基于 FunASR 的字幕引擎
-- [ ] 减小软件不必要的体积
-
-## 遥远的未来
-
-- [ ] 使用 Tauri 框架重新开发
+暂无
@@ -202,9 +202,9 @@
 
 **数据类型:** `number`
 
-### `caption.pin.set`
+### `caption.mouseEvents.ignore`
 
-**介绍:** 是否将窗口置顶
+**介绍:** 是否设置鼠标穿透
 
 **发起方:** 前端字幕窗口
BIN  docs/img/06.png  118 KiB → 148 KiB
@@ -1,6 +1,6 @@
 # Auto Caption User Manual
 
-Corresponding Version: v1.0.0
+Corresponding Version: v1.1.1
 
 **Note: Due to limited personal resources, the English and Japanese documentation files for this project (except for the README document) will no longer be maintained. The content of this document may not be consistent with the latest version of the project. If you are willing to help with translation, please submit relevant Pull Requests.**
 

@@ -41,6 +41,11 @@ Alibaba Cloud provides detailed tutorials for this part, which can be referenced
 - [Obtaining API KEY (Chinese)](https://help.aliyun.com/zh/model-studio/get-api-key)
 - [Configuring API Key through Environment Variables (Chinese)](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
 
+
+## Preparation for GLM Engine
+
+You need to obtain an API KEY first, refer to: [Quick Start](https://docs.bigmodel.cn/en/guide/start/quick-start).
+
 ## Preparation for Using Vosk Engine
 
 To use the Vosk local caption engine, first download your required model from the [Vosk Models](https://alphacephei.com/vosk/models) page. Then extract the downloaded model package locally and add the corresponding model folder path to the software settings.

@@ -111,7 +116,7 @@ After completing all configurations, click the "Start Caption Engine" button on
 
 ### Adjusting the Caption Display Window
 
-The following image shows the caption display window, which displays the latest captions in real-time. The three buttons in the upper right corner of the window have the following functions: pin the window to the front, open the caption control window, and close the caption display window. The width of the window can be adjusted by moving the mouse to the left or right edge of the window and dragging the mouse.
+The following image shows the caption display window, which displays the latest captions in real-time. The functions of the three buttons in the upper right corner of the window are: to close the caption display window, to open the caption control window, and to enable mouse pass-through. The width of the window can be adjusted by moving the mouse to the left or right edge of the window and dragging the mouse.
 
 

@@ -147,7 +152,7 @@ The following parameter descriptions only include necessary parameters.
 
 #### `-e , --caption_engine`
 
-The caption engine model to select, currently three options are available: `gummy, vosk, sosv`.
+The caption engine model to select, currently four options are available: `gummy, glm, vosk, sosv`.
 
 The default value is `gummy`.

@@ -199,10 +204,12 @@ Source language for recognition. Default value is `auto`, meaning no specific so
 
 Specifying the source language can improve recognition accuracy to some extent. You can specify the source language using the language codes above.
 
-This only applies to Gummy and SOSV models.
+This applies to Gummy, GLM and SOSV models.
 
 The Gummy model can use all the languages mentioned above, plus Cantonese (`yue`).
 
+The GLM model supports specifying the following languages: English, Chinese, Japanese, Korean.
+
 The SOSV model supports specifying the following languages: English, Chinese, Japanese, Korean, and Cantonese.
 
 #### `-k, --api_key`

@@ -213,6 +220,18 @@ Default value is empty.
 
 This only applies to the Gummy model.
 
+#### `-gkey, --glm_api_key`
+
+Specifies the API KEY required for the `glm` model. The default value is empty.
+
+#### `-gmodel, --glm_model`
+
+Specifies the model name to be used for the `glm` model. The default value is `glm-asr-2512`.
+
+#### `-gurl, --glm_url`
+
+Specifies the API URL required for the `glm` model. The default value is: `https://open.bigmodel.cn/api/paas/v4/audio/transcriptions`.
+
 #### `-tm, --translation_model`
 
 Specify the translation method for Vosk and SOSV models. Default is `ollama`.

@@ -226,13 +245,23 @@ This only applies to Vosk and SOSV models.
 
 #### `-omn, --ollama_name`
 
-Specify the Ollama model to call for translation. Default value is empty.
+Specifies the name of the translation model to be used, which can be either a local Ollama model or a cloud model compatible with the OpenAI API. If the Base URL field is not filled in, the local Ollama service will be called by default; otherwise, the API service at the specified address will be invoked via the Python OpenAI library.
 
-It's recommended to use models with less than 1B parameters, such as: `qwen2.5:0.5b`, `qwen3:0.6b`.
+If using an Ollama model, it is recommended to use a model with fewer than 1B parameters, such as `qwen2.5:0.5b` or `qwen3:0.6b`. The corresponding model must be downloaded in Ollama for normal use.
 
-Users need to download the corresponding model in Ollama to use it properly.
+The default value is empty and applies to models other than Gummy.
 
-This only applies to Vosk and SOSV models.
+#### `-ourl, --ollama_url`
+
+The base request URL for calling the OpenAI API. If left blank, the local Ollama model on the default port will be called.
+
+The default value is empty and applies to models other than Gummy.
+
+#### `-okey, --ollama_api_key`
+
+Specifies the API KEY for calling OpenAI-compatible models.
+
+The default value is empty and applies to models other than Gummy.
+
 #### `-vosk, --vosk_model`
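Taken together, an engine invocation that exercises the new `glm` engine plus an OpenAI-compatible translation endpoint might look like the following. This is an illustrative sketch only: the API keys are placeholders and `deepseek-chat` is just an example model name; the flag names themselves come from the `engine/main.py` changes in this comparison.

```bash
python main.py \
    -e glm -gkey YOUR_GLM_API_KEY \
    -t zh-cn \
    -tm ollama -omn deepseek-chat \
    -ourl https://api.deepseek.com -okey YOUR_TRANSLATION_API_KEY
```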
@@ -1,6 +1,6 @@
 # Auto Caption ユーザーマニュアル
 
-対応バージョン:v1.0.0
+対応バージョン:v1.1.1
 
 この文書は大規模モデルを使用して翻訳されていますので、内容に正確でない部分があるかもしれません。
 

@@ -41,6 +41,10 @@ macOS プラットフォームでオーディオ出力を取得するには追
 - [API KEY の取得(中国語)](https://help.aliyun.com/zh/model-studio/get-api-key)
 - [環境変数を通じて API Key を設定(中国語)](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
 
+## GLM エンジン使用前の準備
+
+まずAPI KEYを取得する必要があります。参考:[クイックスタート](https://docs.bigmodel.cn/en/guide/start/quick-start)。
+
 ## Voskエンジン使用前の準備
 
 Voskローカル字幕エンジンを使用するには、まず[Vosk Models](https://alphacephei.com/vosk/models)ページから必要なモデルをダウンロードしてください。その後、ダウンロードしたモデルパッケージをローカルに解凍し、対応するモデルフォルダのパスをソフトウェア設定に追加します。

@@ -112,7 +116,7 @@ sudo yum install pulseaudio pavucontrol
 
 ### 字幕表示ウィンドウの調整
 
-下の図は字幕表示ウィンドウです。このウィンドウは現在の最新の字幕をリアルタイムで表示します。ウィンドウの右上にある3つのボタンの機能はそれぞれ次の通りです:ウィンドウを最前面に固定する、字幕制御ウィンドウを開く、字幕表示ウィンドウを閉じる。このウィンドウの幅は調整可能です。マウスをウィンドウの左右の端に移動し、ドラッグして幅を調整します。
+下の図は字幕表示ウィンドウです。このウィンドウは現在の最新の字幕をリアルタイムで表示します。ウィンドウの右上にある3つのボタンの機能は、それぞれ字幕表示ウィンドウを閉じる、字幕制御ウィンドウを開く、マウス透過を有効化することです。このウィンドウの幅は調整可能です。マウスをウィンドウの左右の端に移動し、ドラッグして幅を調整します。
 
 
@@ -1,6 +1,6 @@
 # Auto Caption 用户手册
 
-对应版本:v1.0.0
+对应版本:v1.1.1
 
 ## 软件简介
 

@@ -39,6 +39,10 @@ Auto Caption 是一个跨平台的字幕显示软件,能够实时获取系统
 - [获取 API KEY](https://help.aliyun.com/zh/model-studio/get-api-key)
 - [将 API Key 配置到环境变量](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
 
+## GLM 引擎使用前准备
+
+需要先获取 API KEY,参考:[Quick Start](https://docs.bigmodel.cn/en/guide/start/quick-start)。
+
 ## Vosk 引擎使用前准备
 
 如果要使用 Vosk 本地字幕引擎,首先需要在 [Vosk Models](https://alphacephei.com/vosk/models) 页面下载你需要的模型。然后将下载的模型安装包解压到本地,并将对应的模型文件夹的路径添加到软件的设置中。

@@ -109,7 +113,7 @@ sudo yum install pulseaudio pavucontrol
 
 ### 调整字幕展示窗口
 
-如下图为字幕展示窗口,该窗口实时展示当前最新字幕。窗口右上角三个按钮的功能分别是:将窗口固定在最前面、打开字幕控制窗口、关闭字幕展示窗口。该窗口宽度可以调整,将鼠标移动至窗口的左右边缘,拖动鼠标即可调整宽度。
+如下图为字幕展示窗口,该窗口实时展示当前最新字幕。窗口右上角三个按钮的功能分别是:关闭字幕展示窗口、打开字幕控制窗口、启用鼠标穿透。该窗口宽度可以调整,将鼠标移动至窗口的左右边缘,拖动鼠标即可调整宽度。
 
 

@@ -145,7 +149,7 @@ sudo yum install pulseaudio pavucontrol
 
 #### `-e , --caption_engine`
 
-需要选择的字幕引擎模型,目前有三个可用,分别为:`gummy, vosk, sosv`。
+需要选择的字幕引擎模型,目前有四个可用,分别为:`gummy, glm, vosk, sosv`。
 
 该项的默认值为 `gummy`。

@@ -197,11 +201,13 @@ sudo yum install pulseaudio pavucontrol
 
 但是指定源语言能在一定程度上提高识别准确率,可以使用上面的语言代码指定源语言。
 
-该项仅适用于 Gummy 和 SOSV 模型。
+该项适用于 Gummy、GLM 和 SOSV 模型。
 
 其中 Gummy 模型可以使用上述全部的语言,再加上粤语(`yue`)。
 
-而 SOSV 模型支持指定的语言有:英语、中文、日语、韩语、粤语。
+GLM 模型支持指定的语言有:英语、中文、日语、韩语。
+
+SOSV 模型支持指定的语言有:英语、中文、日语、韩语、粤语。
 
 #### `-k, --api_key`

@@ -211,6 +217,18 @@ sudo yum install pulseaudio pavucontrol
 
 该项仅适用于 Gummy 模型。
 
+#### `-gkey, --glm_api_key`
+
+指定 `glm` 模型需要使用的 API KEY,默认为空。
+
+#### `-gmodel, --glm_model`
+
+指定 `glm` 模型需要使用的模型名称,默认为 `glm-asr-2512`。
+
+#### `-gurl, --glm_url`
+
+指定 `glm` 模型需要使用的 API URL,默认值为:`https://open.bigmodel.cn/api/paas/v4/audio/transcriptions`。
+
 #### `-tm, --translation_model`
 
 指定 Vosk 和 SOSV 模型的翻译方式,默认为 `ollama`。

@@ -224,13 +242,23 @@ sudo yum install pulseaudio pavucontrol
 
 #### `-omn, --ollama_name`
 
-指定需要调用进行翻译的 Ollama 模型。该项默认值为空。
+指定要使用的翻译模型名称,可以是 Ollama 本地模型,也可以是 OpenAI API 兼容的云端模型。若未填写 Base URL 字段,则默认调用本地 Ollama 服务,否则会通过 Python OpenAI 库调用该地址指向的 API 服务。
 
-建议使用参数量小于 1B 的模型,比如: `qwen2.5:0.5b`, `qwen3:0.6b`。
+如果使用 Ollama 模型,建议使用参数量小于 1B 的模型,比如: `qwen2.5:0.5b`, `qwen3:0.6b`。需要在 Ollama 中下载了对应的模型才能正常使用。
 
-用户需要在 Ollama 中下载了对应的模型才能正常使用。
+默认值为空,适用于除了 Gummy 外的其他模型。
 
-该项仅适用于 Vosk 和 SOSV 模型。
+#### `-ourl, --ollama_url`
+
+调用 OpenAI API 的基础请求地址,如果不填写则调用本地默认端口的 Ollama 模型。
+
+默认值为空,适用于除了 Gummy 外的其他模型。
+
+#### `-okey, --ollama_api_key`
+
+指定调用 OpenAI 兼容模型的 API KEY。
+
+默认值为空,适用于除了 Gummy 外的其他模型。
+
 #### `-vosk, --vosk_model`
engine/audio2text/__init__.py

@@ -1,3 +1,4 @@
 from .gummy import GummyRecognizer
 from .vosk import VoskRecognizer
 from .sosv import SosvRecognizer
+from .glm import GlmRecognizer
163  engine/audio2text/glm.py (new file)

@@ -0,0 +1,163 @@
import threading
import io
import wave
import struct
import math
import audioop
import requests
from datetime import datetime

from utils import shared_data
from utils import stdout_cmd, stdout_obj, google_translate, ollama_translate


class GlmRecognizer:
    """
    使用 GLM-ASR 引擎处理音频数据,并在标准输出中输出 Auto Caption 软件可读取的 JSON 字符串数据

    初始化参数:
        url: GLM-ASR API URL
        model: GLM-ASR 模型名称
        api_key: GLM-ASR API Key
        source: 源语言
        target: 目标语言
        trans_model: 翻译模型名称
        ollama_name: Ollama 模型名称
    """
    def __init__(self, url: str, model: str, api_key: str, source: str, target: str | None, trans_model: str, ollama_name: str, ollama_url: str = '', ollama_api_key: str = ''):
        self.url = url
        self.model = model
        self.api_key = api_key
        self.source = source
        self.target = target
        if trans_model == 'google':
            self.trans_func = google_translate
        else:
            self.trans_func = ollama_translate
        self.ollama_name = ollama_name
        self.ollama_url = ollama_url
        self.ollama_api_key = ollama_api_key

        self.audio_buffer = []
        self.is_speech = False
        self.silence_frames = 0
        self.speech_start_time = None
        self.time_str = ''
        self.cur_id = 0

        # VAD settings (假设 16k 16bit, chunk size 1024 or similar)
        # 16bit = 2 bytes per sample.
        # RMS threshold needs tuning. 500 is a conservative guess for silence.
        self.threshold = 500
        self.silence_limit = 15      # frames (approx 0.5-1s depending on chunk size)
        self.min_speech_frames = 10  # frames

    def start(self):
        """启动 GLM 引擎"""
        stdout_cmd('info', 'GLM-ASR recognizer started.')

    def stop(self):
        """停止 GLM 引擎"""
        stdout_cmd('info', 'GLM-ASR recognizer stopped.')

    def process_audio(self, chunk):
        # chunk is bytes (int16)
        rms = audioop.rms(chunk, 2)

        if rms > self.threshold:
            if not self.is_speech:
                self.is_speech = True
                self.time_str = datetime.now().strftime('%H:%M:%S.%f')[:-3]
                self.audio_buffer = []
            self.audio_buffer.append(chunk)
            self.silence_frames = 0
        else:
            if self.is_speech:
                self.audio_buffer.append(chunk)
                self.silence_frames += 1
                if self.silence_frames > self.silence_limit:
                    # Speech ended
                    if len(self.audio_buffer) > self.min_speech_frames:
                        self.recognize(self.audio_buffer, self.time_str)
                    self.is_speech = False
                    self.audio_buffer = []
                    self.silence_frames = 0

    def recognize(self, audio_frames, time_s):
        audio_bytes = b''.join(audio_frames)

        wav_io = io.BytesIO()
        with wave.open(wav_io, 'wb') as wav_file:
            wav_file.setnchannels(1)
            wav_file.setsampwidth(2)
            wav_file.setframerate(16000)
            wav_file.writeframes(audio_bytes)
        wav_io.seek(0)

        threading.Thread(
            target=self._do_request,
            args=(wav_io.read(), time_s, self.cur_id)
        ).start()
        self.cur_id += 1

    def _do_request(self, audio_content, time_s, index):
        try:
            files = {
                'file': ('audio.wav', audio_content, 'audio/wav')
            }
            data = {
                'model': self.model,
                'stream': 'false'
            }
            headers = {
                'Authorization': f'Bearer {self.api_key}'
            }

            response = requests.post(self.url, headers=headers, data=data, files=files, timeout=15)

            if response.status_code == 200:
                res_json = response.json()
                text = res_json.get('text', '')
                if text:
                    self.output_caption(text, time_s, index)
            else:
                try:
                    err_msg = response.json()
                    stdout_cmd('error', f"GLM API Error: {err_msg}")
                except:
                    stdout_cmd('error', f"GLM API Error: {response.text}")

        except Exception as e:
            stdout_cmd('error', f"GLM Request Failed: {str(e)}")

    def output_caption(self, text, time_s, index):
        caption = {
            'command': 'caption',
            'index': index,
            'time_s': time_s,
            'time_t': datetime.now().strftime('%H:%M:%S.%f')[:-3],
            'text': text,
            'translation': ''
        }

        if self.target:
            if self.trans_func == ollama_translate:
                th = threading.Thread(
                    target=self.trans_func,
                    args=(self.ollama_name, self.target, caption['text'], time_s, self.ollama_url, self.ollama_api_key),
                    daemon=True
                )
            else:
                th = threading.Thread(
                    target=self.trans_func,
                    args=(self.ollama_name, self.target, caption['text'], time_s),
                    daemon=True
                )
            th.start()

        stdout_obj(caption)

    def translate(self):
        global shared_data
        while shared_data.status == 'running':
            chunk = shared_data.chunk_queue.get()
            self.process_audio(chunk)
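A minimal driving sketch for the new recognizer (illustrative only, not part of the diff): it assumes 16 kHz mono 16-bit PCM input and uses placeholder values for the key and input file; in the application itself, `main_glm` in `engine/main.py` wires the recognizer to the shared audio queue instead.

```python
# Illustrative sketch: feed raw 16 kHz mono int16 PCM into the VAD loop.
# Run from the engine/ directory; 'speech.raw' and the API key are placeholders.
from audio2text import GlmRecognizer

rec = GlmRecognizer(
    url='https://open.bigmodel.cn/api/paas/v4/audio/transcriptions',
    model='glm-asr-2512',
    api_key='YOUR_GLM_API_KEY',
    source='auto',
    target=None,           # None disables translation
    trans_model='ollama',
    ollama_name='',
)
rec.start()
with open('speech.raw', 'rb') as f:
    while chunk := f.read(3200):    # 3200 bytes = 0.1 s of 16 kHz 16-bit mono
        rec.process_audio(chunk)    # buffers speech; POSTs each utterance on silence
rec.stop()
```

As the code's own comments note, the RMS threshold (500) and the silence limit (15 frames) depend on chunk size and microphone level, so they may need tuning per setup.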
engine/audio2text/sosv.py

@@ -29,7 +29,7 @@ class SosvRecognizer:
         trans_model: 翻译模型名称
         ollama_name: Ollama 模型名称
     """
-    def __init__(self, model_path: str, source: str, target: str | None, trans_model: str, ollama_name: str):
+    def __init__(self, model_path: str, source: str, target: str | None, trans_model: str, ollama_name: str, ollama_url: str = '', ollama_api_key: str = ''):
         if model_path.startswith('"'):
             model_path = model_path[1:]
         if model_path.endswith('"'):

@@ -45,6 +45,8 @@ class SosvRecognizer:
         else:
             self.trans_func = ollama_translate
         self.ollama_name = ollama_name
+        self.ollama_url = ollama_url
+        self.ollama_api_key = ollama_api_key
         self.time_str = ''
         self.cur_id = 0
         self.prev_content = ''

@@ -152,7 +154,7 @@ class SosvRecognizer:
         if self.target:
             th = threading.Thread(
                 target=self.trans_func,
-                args=(self.ollama_name, self.target, caption['text'], self.time_str),
+                args=(self.ollama_name, self.target, caption['text'], self.time_str, self.ollama_url, self.ollama_api_key),
                 daemon=True
             )
             th.start()
engine/audio2text/vosk.py

@@ -18,7 +18,7 @@ class VoskRecognizer:
         trans_model: 翻译模型名称
         ollama_name: Ollama 模型名称
     """
-    def __init__(self, model_path: str, target: str | None, trans_model: str, ollama_name: str):
+    def __init__(self, model_path: str, target: str | None, trans_model: str, ollama_name: str, ollama_url: str = '', ollama_api_key: str = ''):
         SetLogLevel(-1)
         if model_path.startswith('"'):
             model_path = model_path[1:]

@@ -31,6 +31,8 @@ class VoskRecognizer:
         else:
             self.trans_func = ollama_translate
         self.ollama_name = ollama_name
+        self.ollama_url = ollama_url
+        self.ollama_api_key = ollama_api_key
         self.time_str = ''
         self.cur_id = 0
         self.prev_content = ''

@@ -66,7 +68,7 @@ class VoskRecognizer:
         if self.target:
             th = threading.Thread(
                 target=self.trans_func,
-                args=(self.ollama_name, self.target, caption['text'], self.time_str),
+                args=(self.ollama_name, self.target, caption['text'], self.time_str, self.ollama_url, self.ollama_api_key),
                 daemon=True
             )
             th.start()
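Because the new endpoint and key are trailing optional parameters, existing call sites keep working unchanged. An illustrative construction under the new signature (the model path, model name, and key are placeholders, not values from this diff):

```python
# Hypothetical values; only the signature comes from this diff.
engine = VoskRecognizer(
    'models/vosk-model-small-cn-0.22',  # placeholder local model path
    'en',                               # target language for translation
    'ollama',                           # translation backend selector
    'deepseek-chat',                    # translation model name
    'https://api.deepseek.com',         # non-empty URL -> OpenAI-compatible client
    'YOUR_API_KEY',
)
```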
113  engine/main.py

@@ -8,6 +8,7 @@ from utils import merge_chunk_channels, resample_chunk_mono
 from audio2text import GummyRecognizer
 from audio2text import VoskRecognizer
 from audio2text import SosvRecognizer
+from audio2text import GlmRecognizer
 from sysaudio import AudioStream
 
 

@@ -74,7 +75,7 @@ def main_gummy(s: str, t: str, a: int, c: int, k: str, r: bool, rp: str):
         engine.stop()
 
 
-def main_vosk(a: int, c: int, vosk: str, t: str, tm: str, omn: str, r: bool, rp: str):
+def main_vosk(a: int, c: int, vosk: str, t: str, tm: str, omn: str, ourl: str, okey: str, r: bool, rp: str):
     """
     Parameters:
         a: Audio source: 0 for output, 1 for input

@@ -83,14 +84,16 @@ def main_vosk(a: int, c: int, vosk: str, t: str, tm: str, omn: str, r: bool, rp:
         t: Target language
         tm: Translation model type, ollama or google
         omn: Ollama model name
+        ourl: Ollama Base URL
+        okey: Ollama API Key
         r: Whether to record the audio
         rp: Path to save the recorded audio
     """
     stream = AudioStream(a, c)
     if t == 'none':
-        engine = VoskRecognizer(vosk, None, tm, omn)
+        engine = VoskRecognizer(vosk, None, tm, omn, ourl, okey)
     else:
-        engine = VoskRecognizer(vosk, t, tm, omn)
+        engine = VoskRecognizer(vosk, t, tm, omn, ourl, okey)
 
     engine.start()
     stream_thread = threading.Thread(

@@ -106,7 +109,7 @@ def main_vosk(a: int, c: int, vosk: str, t: str, tm: str, omn: str, r: bool, rp:
         engine.stop()
 
 
-def main_sosv(a: int, c: int, sosv: str, s: str, t: str, tm: str, omn: str, r: bool, rp: str):
+def main_sosv(a: int, c: int, sosv: str, s: str, t: str, tm: str, omn: str, ourl: str, okey: str, r: bool, rp: str):
     """
     Parameters:
         a: Audio source: 0 for output, 1 for input

@@ -116,14 +119,16 @@ def main_sosv(a: int, c: int, sosv: str, s: str, t: str, tm: str, omn: str, r: b
         t: Target language
         tm: Translation model type, ollama or google
         omn: Ollama model name
+        ourl: Ollama API URL
+        okey: Ollama API Key
         r: Whether to record the audio
         rp: Path to save the recorded audio
     """
     stream = AudioStream(a, c)
     if t == 'none':
-        engine = SosvRecognizer(sosv, s, None, tm, omn)
+        engine = SosvRecognizer(sosv, s, None, tm, omn, ourl, okey)
     else:
-        engine = SosvRecognizer(sosv, s, t, tm, omn)
+        engine = SosvRecognizer(sosv, s, t, tm, omn, ourl, okey)
 
     engine.start()
     stream_thread = threading.Thread(

@@ -139,38 +144,80 @@ def main_sosv(a: int, c: int, sosv: str, s: str, t: str, tm: str, omn: str, r: b
         engine.stop()
 
 
+def main_glm(a: int, c: int, url: str, model: str, key: str, s: str, t: str, tm: str, omn: str, ourl: str, okey: str, r: bool, rp: str):
+    """
+    Parameters:
+        a: Audio source
+        c: Chunk rate
+        url: GLM API URL
+        model: GLM Model Name
+        key: GLM API Key
+        s: Source language
+        t: Target language
+        tm: Translation model
+        omn: Ollama model name
+        ourl: Ollama API URL
+        okey: Ollama API Key
+        r: Record
+        rp: Record path
+    """
+    stream = AudioStream(a, c)
+    if t == 'none':
+        engine = GlmRecognizer(url, model, key, s, None, tm, omn, ourl, okey)
+    else:
+        engine = GlmRecognizer(url, model, key, s, t, tm, omn, ourl, okey)
+
+    engine.start()
+    stream_thread = threading.Thread(
+        target=audio_recording,
+        args=(stream, True, r, rp),
+        daemon=True
+    )
+    stream_thread.start()
+    try:
+        engine.translate()
+    except KeyboardInterrupt:
+        stdout("Keyboard interrupt detected. Exiting...")
+        engine.stop()
+
+
 if __name__ == "__main__":
     parser = argparse.ArgumentParser(description='Convert system audio stream to text')
     # all
-    parser.add_argument('-e', '--caption_engine', default='gummy', help='Caption engine: gummy or vosk or sosv')
-    parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output, 1 for input')
-    parser.add_argument('-c', '--chunk_rate', default=10, help='Number of audio stream chunks collected per second')
-    parser.add_argument('-p', '--port', default=0, help='The port to run the server on, 0 for no server')
-    parser.add_argument('-d', '--display_caption', default=0, help='Display caption on terminal, 0 for no display, 1 for display')
+    parser.add_argument('-e', '--caption_engine', default='gummy', help='Caption engine: gummy, glm, vosk or sosv')
+    parser.add_argument('-a', '--audio_type', type=int, default=0, help='Audio stream source: 0 for output, 1 for input')
+    parser.add_argument('-c', '--chunk_rate', type=int, default=10, help='Number of audio stream chunks collected per second')
+    parser.add_argument('-p', '--port', type=int, default=0, help='The port to run the server on, 0 for no server')
+    parser.add_argument('-d', '--display_caption', type=int, default=0, help='Display caption on terminal, 0 for no display, 1 for display')
     parser.add_argument('-t', '--target_language', default='none', help='Target language code, "none" for no translation')
-    parser.add_argument('-r', '--record', default=0, help='Whether to record the audio, 0 for no recording, 1 for recording')
+    parser.add_argument('-r', '--record', type=int, default=0, help='Whether to record the audio, 0 for no recording, 1 for recording')
     parser.add_argument('-rp', '--record_path', default='', help='Path to save the recorded audio')
-    # gummy and sosv
+    # gummy and sosv and glm
     parser.add_argument('-s', '--source_language', default='auto', help='Source language code')
     # gummy only
     parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
     # vosk and sosv
     parser.add_argument('-tm', '--translation_model', default='ollama', help='Model for translation: ollama or google')
     parser.add_argument('-omn', '--ollama_name', default='', help='Ollama model name for translation')
+    parser.add_argument('-ourl', '--ollama_url', default='', help='Ollama API URL')
+    parser.add_argument('-okey', '--ollama_api_key', default='', help='Ollama API Key')
     # vosk only
     parser.add_argument('-vosk', '--vosk_model', default='', help='The path to the vosk model.')
     # sosv only
     parser.add_argument('-sosv', '--sosv_model', default=None, help='The SenseVoice model path')
+    # glm only
+    parser.add_argument('-gurl', '--glm_url', default='https://open.bigmodel.cn/api/paas/v4/audio/transcriptions', help='GLM API URL')
+    parser.add_argument('-gmodel', '--glm_model', default='glm-asr-2512', help='GLM Model Name')
+    parser.add_argument('-gkey', '--glm_api_key', default='', help='GLM API Key')
 
     args = parser.parse_args()
-    if int(args.port) == 0:
-        shared_data.status = "running"
-    else:
-        start_server(int(args.port))
-
-    if int(args.display_caption) != 0:
+
+    if args.port != 0:
+        threading.Thread(target=start_server, args=(args.port,), daemon=True).start()
 
+    if args.display_caption == '1':
         change_caption_display(True)
         print("Caption will be displayed on terminal")
 
     if args.caption_engine == 'gummy':
         main_gummy(

@@ -179,7 +226,7 @@
             int(args.audio_type),
             int(args.chunk_rate),
             args.api_key,
-            True if int(args.record) == 1 else False,
+            bool(int(args.record)),
             args.record_path
         )
     elif args.caption_engine == 'vosk':

@@ -190,7 +237,9 @@
             args.target_language,
             args.translation_model,
             args.ollama_name,
-            True if int(args.record) == 1 else False,
+            args.ollama_url,
+            args.ollama_api_key,
+            bool(int(args.record)),
             args.record_path
         )
     elif args.caption_engine == 'sosv':

@@ -202,7 +251,25 @@
             args.target_language,
             args.translation_model,
             args.ollama_name,
-            True if int(args.record) == 1 else False,
+            args.ollama_url,
+            args.ollama_api_key,
+            bool(int(args.record)),
             args.record_path
         )
+    elif args.caption_engine == 'glm':
+        main_glm(
+            int(args.audio_type),
+            int(args.chunk_rate),
+            args.glm_url,
+            args.glm_model,
+            args.glm_api_key,
+            args.source_language,
+            args.target_language,
+            args.translation_model,
+            args.ollama_name,
+            args.ollama_url,
+            args.ollama_api_key,
+            bool(int(args.record)),
+            args.record_path
+        )
     else:
engine/main.spec

@@ -6,7 +6,12 @@ import sys
 if sys.platform == 'win32':
     vosk_path = str(Path('./.venv/Lib/site-packages/vosk').resolve())
 else:
-    vosk_path = str(Path('./.venv/lib/python3.12/site-packages/vosk').resolve())
+    venv_lib = Path('./.venv/lib')
+    python_dirs = list(venv_lib.glob('python*'))
+    if python_dirs:
+        vosk_path = str((python_dirs[0] / 'site-packages' / 'vosk').resolve())
+    else:
+        vosk_path = str(Path('./.venv/lib/python3.12/site-packages/vosk').resolve())
 
 a = Analysis(
     ['main.py'],
engine/requirements.txt

@@ -7,4 +7,6 @@ pyaudio; sys_platform == 'darwin'
 pyaudiowpatch; sys_platform == 'win32'
 googletrans
 ollama
 sherpa_onnx
+requests
+openai
engine/utils/sysout.py

@@ -47,7 +47,6 @@ def translation_display(obj):
 
 def stdout_obj(obj):
     global display_caption
-    print(obj['command'], display_caption)
     if obj['command'] == 'caption' and display_caption:
         caption_display(obj)
         return
@@ -1,5 +1,9 @@
|
||||
from ollama import chat
|
||||
from ollama import chat, Client
|
||||
from ollama import ChatResponse
|
||||
try:
|
||||
from openai import OpenAI
|
||||
except ImportError:
|
||||
OpenAI = None
|
||||
import asyncio
|
||||
from googletrans import Translator
|
||||
from .sysout import stdout_cmd, stdout_obj
|
||||
@@ -17,15 +21,43 @@ lang_map = {
    'zh-cn': 'Chinese'
}

def ollama_translate(model: str, target: str, text: str, time_s: str):
    response: ChatResponse = chat(
        model=model,
        messages=[
            {"role": "system", "content": f"/no_think Translate the following content into {lang_map[target]}, and do not output any additional information."},
            {"role": "user", "content": text}
        ]
    )
    content = response.message.content or ""
def ollama_translate(model: str, target: str, text: str, time_s: str, url: str = '', key: str = ''):
    content = ""
    try:
        if url:
            if OpenAI:
                client = OpenAI(base_url=url, api_key=key if key else "ollama")
                openai_response = client.chat.completions.create(
                    model=model,
                    messages=[
                        {"role": "system", "content": f"/no_think Translate the following content into {lang_map[target]}, and do not output any additional information."},
                        {"role": "user", "content": text}
                    ]
                )
                content = openai_response.choices[0].message.content or ""
            else:
                client = Client(host=url)
                response: ChatResponse = client.chat(
                    model=model,
                    messages=[
                        {"role": "system", "content": f"/no_think Translate the following content into {lang_map[target]}, and do not output any additional information."},
                        {"role": "user", "content": text}
                    ]
                )
                content = response.message.content or ""
        else:
            response: ChatResponse = chat(
                model=model,
                messages=[
                    {"role": "system", "content": f"/no_think Translate the following content into {lang_map[target]}, and do not output any additional information."},
                    {"role": "user", "content": text}
                ]
            )
            content = response.message.content or ""
    except Exception as e:
        stdout_cmd("warn", f"Translation failed: {str(e)}")
        return

    if content.startswith('<think>'):
        index = content.find('</think>')
        if index != -1:
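The new signature selects the translation backend at call time: with no url, the local Ollama service on the default port is used via chat(); with a url and the openai package importable, the OpenAI-compatible endpoint is called; with a url but no openai package, the Ollama Client targets that host. A hedged usage sketch — the model names, addresses, and key below are placeholder examples, not values from this diff:

# Local Ollama on the default port (no url given):
ollama_translate('qwen2.5:0.5b', 'zh', 'Hello, world.', '00:00:01')

# OpenAI-compatible cloud endpoint (taken when the openai package imports):
ollama_translate('glm-4-flash', 'zh', 'Hello, world.', '00:00:01',
                 url='https://open.bigmodel.cn/api/paas/v4', key='sk-...')

# Remote Ollama server (fallback when openai is not installed):
ollama_translate('qwen2.5:0.5b', 'zh', 'Hello, world.', '00:00:01',
                 url='http://192.168.0.10:11434')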
67 package-lock.json generated
@@ -110,6 +110,7 @@
      "integrity": "sha512-IaaGWsQqfsQWVLqMn9OB92MNN7zukfVA4s7KKAI0KfrrDsZ0yhi5uV4baBuLuN7n3vsZpwP8asPPcVwApxvjBQ==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "@ampproject/remapping": "^2.2.0",
        "@babel/code-frame": "^7.27.1",
@@ -2274,6 +2275,7 @@
      "resolved": "https://registry.npmmirror.com/@types/node/-/node-22.15.17.tgz",
      "integrity": "sha512-wIX2aSZL5FE+MR0JlvF87BNVrtFWf6AE6rxSE9X7OwnVvoyCQjpzSRJ+M87se/4QCkCiebQAqrJ0y6fwIyi7nw==",
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "undici-types": "~6.21.0"
      }
@@ -2360,6 +2362,7 @@
      "integrity": "sha512-B2MdzyWxCE2+SqiZHAjPphft+/2x2FlO9YBx7eKE1BCb+rqBlQdhtAEhzIEdozHd55DXPmxBdpMygFJjfjjA9A==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "@typescript-eslint/scope-manager": "8.32.0",
        "@typescript-eslint/types": "8.32.0",
@@ -2791,6 +2794,7 @@
      "integrity": "sha512-NZyJarBfL7nWwIq+FDL6Zp/yHEhePMNnnJ0y3qfieCrmNvYct8uvtiV41UvlSe6apAfk0fY1FbWx+NwfmpvtTg==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "bin": {
        "acorn": "bin/acorn"
      },
@@ -2851,6 +2855,7 @@
      "integrity": "sha512-j3fVLgvTo527anyYyJOGTYJbG+vnnQYvE0m5mmkc1TK+nxAppkCLMIL0aZ4dblVCNoGShhm+kzE4ZUykBoMg4g==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "fast-deep-equal": "^3.1.1",
        "fast-json-stable-stringify": "^2.0.0",
@@ -3064,7 +3069,6 @@
      "integrity": "sha512-+25nxyyznAXF7Nef3y0EbBeqmGZgeN/BxHX29Rs39djAfaFalmQ89SE6CWyDCHzGL0yt/ycBtNOmGTW0FyGWNw==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "archiver-utils": "^2.1.0",
        "async": "^3.2.4",
@@ -3084,7 +3088,6 @@
      "integrity": "sha512-bEL/yUb/fNNiNTuUz979Z0Yg5L+LzLxGJz8x79lYmR54fmTIb6ob/hNQgkQnIUDWIFjZVQwl9Xs356I6BAMHfw==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "glob": "^7.1.4",
        "graceful-fs": "^4.2.0",
@@ -3107,7 +3110,6 @@
      "integrity": "sha512-8p0AUk4XODgIewSi0l8Epjs+EVnWiK7NoDIEGU0HhE7+ZyY8D1IMY7odu5lRrFXGg71L15KG8QrPmum45RTtdA==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "core-util-is": "~1.0.0",
        "inherits": "~2.0.3",
@@ -3123,8 +3125,7 @@
      "resolved": "https://registry.npmmirror.com/safe-buffer/-/safe-buffer-5.1.2.tgz",
      "integrity": "sha512-Gd2UZBJDkXlY7GbJxfsE8/nvKkUEU1G38c1siN6QP6a9PT9MmHB8GnpscSmMJSoF8LOIrt8ud/wPtojys4G6+g==",
      "dev": true,
      "license": "MIT",
      "peer": true
      "license": "MIT"
    },
    "node_modules/archiver-utils/node_modules/string_decoder": {
      "version": "1.1.1",
@@ -3132,7 +3133,6 @@
      "integrity": "sha512-n/ShnvDi6FHbbVfviro+WojiFzv+s8MPMHBczVePfUpDJLwoLT0ht1l4YwBCbi8pJAveEEdnkHyPyTP/mzRfwg==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "safe-buffer": "~5.1.0"
      }
@@ -3351,6 +3351,7 @@
        }
      ],
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "caniuse-lite": "^1.0.30001716",
        "electron-to-chromium": "^1.5.149",
@@ -3848,7 +3849,6 @@
      "integrity": "sha512-D3uMHtGc/fcO1Gt1/L7i1e33VOvD4A9hfQLP+6ewd+BvG/gQ84Yh4oftEhAdjSMgBgwGL+jsppT7JYNpo6MHHg==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "buffer-crc32": "^0.2.13",
        "crc32-stream": "^4.0.2",
@@ -3994,7 +3994,6 @@
      "integrity": "sha512-ROmzCKrTnOwybPcJApAA6WBWij23HVfGVNKqqrZpuyZOHqK2CwHSvpGuyt/UNNvaIjEd8X5IFGp4Mh+Ie1IHJQ==",
      "dev": true,
      "license": "Apache-2.0",
      "peer": true,
      "bin": {
        "crc32": "bin/crc32.njs"
      },
@@ -4008,7 +4007,6 @@
      "integrity": "sha512-NT7w2JVU7DFroFdYkeq8cywxrgjPHWkdX1wjpRQXPX5Asews3tA+Ght6lddQO5Mkumffp3X7GEqku3epj2toIw==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "crc-32": "^1.2.0",
        "readable-stream": "^3.4.0"
@@ -4248,6 +4246,7 @@
      "integrity": "sha512-NoXo6Liy2heSklTI5OIZbCgXC1RzrDQsZkeEwXhdOro3FT1VBOvbubvscdPnjVuQ4AMwwv61oaH96AbiYg9EnQ==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "app-builder-lib": "25.1.8",
        "builder-util": "25.1.7",
@@ -4410,6 +4409,7 @@
      "integrity": "sha512-6dLslJrQYB1qvqVPYRv1PhAA/uytC66nUeiTcq2JXiBzrmTWCHppqtGUjZhvnSRVatBCT5/SFdizdzcBiEiYUg==",
      "hasInstallScript": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "@electron/get": "^2.0.0",
        "@types/node": "^22.7.7",
@@ -4454,7 +4454,6 @@
      "integrity": "sha512-2ntkJ+9+0GFP6nAISiMabKt6eqBB0kX1QqHNWFWAXgi0VULKGisM46luRFpIBiU3u/TDmhZMM8tzvo2Abn3ayg==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "app-builder-lib": "25.1.8",
        "archiver": "^5.3.1",
@@ -4468,7 +4467,6 @@
      "integrity": "sha512-oRXApq54ETRj4eMiFzGnHWGy+zo5raudjuxN0b8H7s/RU2oW0Wvsx9O0ACRN/kRq9E8Vu/ReskGB5o3ji+FzHQ==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "graceful-fs": "^4.2.0",
        "jsonfile": "^6.0.1",
@@ -4484,7 +4482,6 @@
      "integrity": "sha512-5dgndWOriYSm5cnYaJNhalLNDKOqFwyDB/rr1E9ZsGciGvKPs8R2xYGCacuf3z6K1YKDz182fd+fY3cn3pMqXQ==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "universalify": "^2.0.0"
      },
@@ -4498,7 +4495,6 @@
      "integrity": "sha512-gptHNQghINnc/vTGIk0SOFGFNXw7JVrlRUtConJRlvaw6DuX0wO5Jeko9sWrMBhh+PsYAZ7oXAiOnf/UKogyiw==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "engines": {
        "node": ">= 10.0.0"
      }
@@ -4813,6 +4809,7 @@
      "integrity": "sha512-LSehfdpgMeWcTZkWZVIJl+tkZ2nuSkyyB9C27MZqFWXuph7DvaowgcTvKqxvpLW1JZIk8PN7hFY3Rj9LQ7m7lg==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "@eslint-community/eslint-utils": "^4.2.0",
        "@eslint-community/regexpp": "^4.12.1",
@@ -4874,6 +4871,7 @@
      "integrity": "sha512-zc1UmCpNltmVY34vuLRV61r1K27sWuX39E+uyUnY8xS2Bex88VV9cugG+UZbRSRGtGyFboj+D8JODyme1plMpw==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "bin": {
        "eslint-config-prettier": "bin/cli.js"
      },
@@ -5351,8 +5349,7 @@
      "resolved": "https://registry.npmmirror.com/fs-constants/-/fs-constants-1.0.0.tgz",
      "integrity": "sha512-y6OAwoSIf7FyjMIv94u+b5rdheZEjzR63GTyZJm5qh4Bi+2YgwLCcI/fPFZkL5PSixOt6ZNKm+w+Hfp/Bciwow==",
      "dev": true,
      "license": "MIT",
      "peer": true
      "license": "MIT"
    },
    "node_modules/fs-extra": {
      "version": "8.1.0",
@@ -6108,8 +6105,7 @@
      "resolved": "https://registry.npmmirror.com/isarray/-/isarray-1.0.0.tgz",
      "integrity": "sha512-VLghIWNM6ELQzo7zwmcg0NmTVyWKYjvIeM83yjp0wRDTmUnrM678fQbcKBo6n2CJEF0szoG//ytg+TKla89ALQ==",
      "dev": true,
      "license": "MIT",
      "peer": true
      "license": "MIT"
    },
    "node_modules/isbinaryfile": {
      "version": "5.0.4",
@@ -6300,7 +6296,6 @@
      "integrity": "sha512-b94GiNHQNy6JNTrt5w6zNyffMrNkXZb3KTkCZJb2V1xaEGCk093vkZ2jk3tpaeP33/OiXC+WvK9AxUebnf5nbw==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "readable-stream": "^2.0.5"
      },
@@ -6314,7 +6309,6 @@
      "integrity": "sha512-8p0AUk4XODgIewSi0l8Epjs+EVnWiK7NoDIEGU0HhE7+ZyY8D1IMY7odu5lRrFXGg71L15KG8QrPmum45RTtdA==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "core-util-is": "~1.0.0",
        "inherits": "~2.0.3",
@@ -6330,8 +6324,7 @@
      "resolved": "https://registry.npmmirror.com/safe-buffer/-/safe-buffer-5.1.2.tgz",
      "integrity": "sha512-Gd2UZBJDkXlY7GbJxfsE8/nvKkUEU1G38c1siN6QP6a9PT9MmHB8GnpscSmMJSoF8LOIrt8ud/wPtojys4G6+g==",
      "dev": true,
      "license": "MIT",
      "peer": true
      "license": "MIT"
    },
    "node_modules/lazystream/node_modules/string_decoder": {
      "version": "1.1.1",
@@ -6339,7 +6332,6 @@
      "integrity": "sha512-n/ShnvDi6FHbbVfviro+WojiFzv+s8MPMHBczVePfUpDJLwoLT0ht1l4YwBCbi8pJAveEEdnkHyPyTP/mzRfwg==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "safe-buffer": "~5.1.0"
      }
@@ -6391,32 +6383,28 @@
      "resolved": "https://registry.npmmirror.com/lodash.defaults/-/lodash.defaults-4.2.0.tgz",
      "integrity": "sha512-qjxPLHd3r5DnsdGacqOMU6pb/avJzdh9tFX2ymgoZE27BmjXrNy/y4LoaiTeAb+O3gL8AfpJGtqfX/ae2leYYQ==",
      "dev": true,
      "license": "MIT",
      "peer": true
      "license": "MIT"
    },
    "node_modules/lodash.difference": {
      "version": "4.5.0",
      "resolved": "https://registry.npmmirror.com/lodash.difference/-/lodash.difference-4.5.0.tgz",
      "integrity": "sha512-dS2j+W26TQ7taQBGN8Lbbq04ssV3emRw4NY58WErlTO29pIqS0HmoT5aJ9+TUQ1N3G+JOZSji4eugsWwGp9yPA==",
      "dev": true,
      "license": "MIT",
      "peer": true
      "license": "MIT"
    },
    "node_modules/lodash.flatten": {
      "version": "4.4.0",
      "resolved": "https://registry.npmmirror.com/lodash.flatten/-/lodash.flatten-4.4.0.tgz",
      "integrity": "sha512-C5N2Z3DgnnKr0LOpv/hKCgKdb7ZZwafIrsesve6lmzvZIRZRGaZ/l6Q8+2W7NaT+ZwO3fFlSCzCzrDCFdJfZ4g==",
      "dev": true,
      "license": "MIT",
      "peer": true
      "license": "MIT"
    },
    "node_modules/lodash.isplainobject": {
      "version": "4.0.6",
      "resolved": "https://registry.npmmirror.com/lodash.isplainobject/-/lodash.isplainobject-4.0.6.tgz",
      "integrity": "sha512-oSXzaWypCMHkPC3NvBEaPHf0KsA5mvPrOPgQWDsbg8n7orZ290M0BmC/jgRZ4vcJ6DTAhjrsSYgdsW/F+MFOBA==",
      "dev": true,
      "license": "MIT",
      "peer": true
      "license": "MIT"
    },
    "node_modules/lodash.merge": {
      "version": "4.6.2",
@@ -6430,8 +6418,7 @@
      "resolved": "https://registry.npmmirror.com/lodash.union/-/lodash.union-4.6.0.tgz",
      "integrity": "sha512-c4pB2CdGrGdjMKYLA+XiRDO7Y0PRQbm/Gzg8qMj+QH+pFVAoTp5sBpO0odL3FjoPCGjK96p6qsP+yQoiLoOBcw==",
      "dev": true,
      "license": "MIT",
      "peer": true
      "license": "MIT"
    },
    "node_modules/log-symbols": {
      "version": "4.1.0",
@@ -6984,7 +6971,6 @@
      "integrity": "sha512-6eZs5Ls3WtCisHWp9S2GUy8dqkpGi4BVSz3GaqiE6ezub0512ESztXUwUB6C6IKbQkY2Pnb/mD4WYojCRwcwLA==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "engines": {
        "node": ">=0.10.0"
      }
@@ -7408,6 +7394,7 @@
      "integrity": "sha512-QQtaxnoDJeAkDvDKWCLiwIXkTgRhwYDEQCghU9Z6q03iyek/rxRh/2lC3HB7P8sWT2xC/y5JDctPLBIGzHKbhw==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "bin": {
        "prettier": "bin/prettier.cjs"
      },
@@ -7436,8 +7423,7 @@
      "resolved": "https://registry.npmmirror.com/process-nextick-args/-/process-nextick-args-2.0.1.tgz",
      "integrity": "sha512-3ouUOpQhtgrbOa17J7+uxOTpITYWaGP7/AhoR3+A+/1e9skrzelGi/dXzEYyvbxubEF6Wn2ypscTKiKJFFn1ag==",
      "dev": true,
      "license": "MIT",
      "peer": true
      "license": "MIT"
    },
    "node_modules/progress": {
      "version": "2.0.3",
@@ -7556,7 +7542,6 @@
      "integrity": "sha512-v05I2k7xN8zXvPD9N+z/uhXPaj0sUFCe2rcWZIpBsqxfP7xXFQ0tipAd/wjj1YxWyWtUS5IDJpOG82JKt2EAVA==",
      "dev": true,
      "license": "Apache-2.0",
      "peer": true,
      "dependencies": {
        "minimatch": "^5.1.0"
      }
@@ -7567,7 +7552,6 @@
      "integrity": "sha512-lKwV/1brpG6mBUFHtb7NUmtABCb2WZZmm2wNiOA5hAb8VdCS4B3dtMWyvcoViccwAW/COERjXLt0zP1zXUN26g==",
      "dev": true,
      "license": "ISC",
      "peer": true,
      "dependencies": {
        "brace-expansion": "^2.0.1"
      },
@@ -8235,7 +8219,6 @@
      "integrity": "sha512-ujeqbceABgwMZxEJnk2HDY2DlnUZ+9oEcb1KzTVfYHio0UE6dG71n60d8D2I4qNvleWrrXpmjpt7vZeF1LnMZQ==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "bl": "^4.0.3",
        "end-of-stream": "^1.4.1",
@@ -8360,6 +8343,7 @@
      "integrity": "sha512-M7BAV6Rlcy5u+m6oPhAPFgJTzAioX/6B0DxyvDlo9l8+T3nLKbrczg2WLUyzd45L8RqfUMyGPzekbMvX2Ldkwg==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "engines": {
        "node": ">=12"
      },
@@ -8462,6 +8446,7 @@
      "integrity": "sha512-p1diW6TqL9L07nNxvRMM7hMMw4c5XOo/1ibL4aAIGmSAt9slTE1Xgw5KWuof2uTOvCg9BY7ZRi+GaF+7sfgPeQ==",
      "devOptional": true,
      "license": "Apache-2.0",
      "peer": true,
      "bin": {
        "tsc": "bin/tsc",
        "tsserver": "bin/tsserver"
@@ -8611,6 +8596,7 @@
      "integrity": "sha512-cZn6NDFE7wdTpINgs++ZJ4N49W2vRp8LCKrn3Ob1kYNtOo21vfDoaV5GzBfLU4MovSAB8uNRm4jgzVQZ+mBzPQ==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "esbuild": "^0.25.0",
        "fdir": "^6.4.4",
@@ -8701,6 +8687,7 @@
      "integrity": "sha512-M7BAV6Rlcy5u+m6oPhAPFgJTzAioX/6B0DxyvDlo9l8+T3nLKbrczg2WLUyzd45L8RqfUMyGPzekbMvX2Ldkwg==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "engines": {
        "node": ">=12"
      },
@@ -8720,6 +8707,7 @@
      "resolved": "https://registry.npmmirror.com/vue/-/vue-3.5.13.tgz",
      "integrity": "sha512-wmeiSMxkZCSc+PM2w2VRsOYAZC8GdipNFRTsLSfodVqI9mbejKeXEGr8SckuLnrQPGe3oJN5c3K0vpoU9q/wCQ==",
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "@vue/compiler-dom": "3.5.13",
        "@vue/compiler-sfc": "3.5.13",
@@ -8742,6 +8730,7 @@
      "integrity": "sha512-dbCBnd2e02dYWsXoqX5yKUZlOt+ExIpq7hmHKPb5ZqKcjf++Eo0hMseFTZMLKThrUk61m+Uv6A2YSBve6ZvuDQ==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "debug": "^4.4.0",
        "eslint-scope": "^8.2.0",
@@ -9046,7 +9035,6 @@
      "integrity": "sha512-9qv4rlDiopXg4E69k+vMHjNN63YFMe9sZMrdlvKnCjlCRWeCBswPPMPUfx+ipsAWq1LXHe70RcbaHdJJpS6hyQ==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "archiver-utils": "^3.0.4",
        "compress-commons": "^4.1.2",
@@ -9062,7 +9050,6 @@
      "integrity": "sha512-KVgf4XQVrTjhyWmx6cte4RxonPLR9onExufI1jhvw/MQ4BB6IsZD5gT8Lq+u/+pRkWna/6JoHpiQioaqFP5Rzw==",
      "dev": true,
      "license": "MIT",
      "peer": true,
      "dependencies": {
        "glob": "^7.2.3",
        "graceful-fs": "^4.2.0",
@@ -1,7 +1,7 @@
{
  "name": "auto-caption",
  "productName": "Auto Caption",
  "version": "1.0.0",
  "version": "1.1.1",
  "description": "A cross-platform subtitle display software.",
  "main": "./out/main/index.js",
  "author": "himeditator",
@@ -77,10 +77,9 @@ class CaptionWindow {
      }
    })

    ipcMain.on('caption.pin.set', (_, pinned) => {
    ipcMain.on('caption.mouseEvents.ignore', (_, ignore: boolean) => {
      if(this.window){
        if(pinned) this.window.setAlwaysOnTop(true, 'screen-saver')
        else this.window.setAlwaysOnTop(false)
        this.window.setIgnoreMouseEvents(ignore, { forward: ignore })
      }
    })
  }
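Taken together with the Caption.vue changes at the end of this diff, "pinning" the caption window now means click-through rather than always-on-top: when pinned, the window ignores mouse events and forwards them to whatever sits underneath, and the renderer temporarily re-enables mouse events while the cursor is over the title bar.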
@@ -8,6 +8,8 @@ export interface Controls {
  targetLang: string,
  transModel: string,
  ollamaName: string,
  ollamaUrl: string,
  ollamaApiKey: string,
  engine: string,
  audio: 0 | 1,
  translation: boolean,
@@ -15,6 +17,9 @@ export interface Controls {
  API_KEY: string,
  voskModelPath: string,
  sosvModelPath: string,
  glmUrl: string,
  glmModel: string,
  glmApiKey: string,
  recordingPath: string,
  customized: boolean,
  customizedApp: string,
@@ -4,9 +4,10 @@ import {
} from '../types'
import { Log } from './Log'
import { app, BrowserWindow } from 'electron'
import { passwordMaskingForObject } from './UtilsFunc'
import * as path from 'path'
import * as fs from 'fs'
import os from 'os'
import * as os from 'os'

interface CaptionTranslation {
  time_s: string,
@@ -44,13 +45,18 @@ const defaultControls: Controls = {
  sourceLang: 'en',
  targetLang: 'zh',
  transModel: 'ollama',
  ollamaName: '',
  ollamaName: 'qwen2.5:0.5b',
  ollamaUrl: 'http://localhost:11434',
  ollamaApiKey: '',
  engine: 'gummy',
  audio: 0,
  engineEnabled: false,
  API_KEY: '',
  voskModelPath: '',
  sosvModelPath: '',
  glmUrl: 'https://open.bigmodel.cn/api/paas/v4/audio/transcriptions',
  glmModel: 'glm-asr-2512',
  glmApiKey: '',
  recordingPath: getDesktopPath(),
  translation: true,
  recording: false,
@@ -146,9 +152,7 @@ class AllConfig {
      }
    }
    this.controls.engineEnabled = engineEnabled
    let _controls = {...this.controls}
    _controls.API_KEY = _controls.API_KEY.replace(/./g, '*')
    Log.info('Set Controls:', _controls)
    Log.info('Set Controls:', passwordMaskingForObject(this.controls))
  }

  public sendControls(window: BrowserWindow, info = true) {
@@ -1,12 +1,13 @@
import { exec, spawn } from 'child_process'
import { app } from 'electron'
import { is } from '@electron-toolkit/utils'
import path from 'path'
import net from 'net'
import * as path from 'path'
import * as net from 'net'
import { controlWindow } from '../ControlWindow'
import { allConfig } from './AllConfig'
import { i18n } from '../i18n'
import { Log } from './Log'
import { passwordMaskingForList } from './UtilsFunc'

export class CaptionEngine {
  appPath: string = ''
@@ -60,7 +61,7 @@ export class CaptionEngine {
      this.appPath = path.join(process.resourcesPath, 'engine', 'main.exe')
    }
    else {
      this.appPath = path.join(process.resourcesPath, 'engine', 'main')
      this.appPath = path.join(process.resourcesPath, 'engine', 'main', 'main')
    }
  }
  this.command.push('-a', allConfig.controls.audio ? '1' : '0')
@@ -87,6 +88,8 @@ export class CaptionEngine {
    this.command.push('-vosk', `"${allConfig.controls.voskModelPath}"`)
    this.command.push('-tm', allConfig.controls.transModel)
    this.command.push('-omn', allConfig.controls.ollamaName)
    if(allConfig.controls.ollamaUrl) this.command.push('-ourl', allConfig.controls.ollamaUrl)
    if(allConfig.controls.ollamaApiKey) this.command.push('-okey', allConfig.controls.ollamaApiKey)
  }
  else if(allConfig.controls.engine === 'sosv'){
    this.command.push('-e', 'sosv')
@@ -94,15 +97,25 @@ export class CaptionEngine {
    this.command.push('-sosv', `"${allConfig.controls.sosvModelPath}"`)
    this.command.push('-tm', allConfig.controls.transModel)
    this.command.push('-omn', allConfig.controls.ollamaName)
    if(allConfig.controls.ollamaUrl) this.command.push('-ourl', allConfig.controls.ollamaUrl)
    if(allConfig.controls.ollamaApiKey) this.command.push('-okey', allConfig.controls.ollamaApiKey)
  }
  else if(allConfig.controls.engine === 'glm'){
    this.command.push('-e', 'glm')
    this.command.push('-s', allConfig.controls.sourceLang)
    this.command.push('-gurl', allConfig.controls.glmUrl)
    this.command.push('-gmodel', allConfig.controls.glmModel)
    if(allConfig.controls.glmApiKey) {
      this.command.push('-gkey', allConfig.controls.glmApiKey)
    }
    this.command.push('-tm', allConfig.controls.transModel)
    this.command.push('-omn', allConfig.controls.ollamaName)
    if(allConfig.controls.ollamaUrl) this.command.push('-ourl', allConfig.controls.ollamaUrl)
    if(allConfig.controls.ollamaApiKey) this.command.push('-okey', allConfig.controls.ollamaApiKey)
  }
}
Log.info('Engine Path:', this.appPath)
if(this.command.length > 2 && this.command.at(-2) === '-k') {
  const _command = [...this.command]
  _command[_command.length -1] = _command[_command.length -1].replace(/./g, '*')
  Log.info('Engine Command:', _command)
}
else Log.info('Engine Command:', this.command)
Log.info('Engine Command:', passwordMaskingForList(this.command))
return true
}

@@ -165,7 +178,7 @@ export class CaptionEngine {
      const data_obj = JSON.parse(line)
      handleEngineData(data_obj)
    } catch (e) {
      controlWindow.sendErrorMessage(i18n('engine.output.parse.error') + e)
      // controlWindow.sendErrorMessage(i18n('engine.output.parse.error') + e)
      Log.error('Error parsing JSON:', e)
    }
  }
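One detail worth noting from the parse hunk above: the launcher treats every stdout line from the engine as one JSON object, and a malformed line is now only logged rather than surfacing an error dialog. On the Python side this protocol is easy to honor; a minimal sketch follows — the payload keys beyond "command" are an assumption here, modeled on the stdout_cmd/stdout_obj helpers seen earlier in this diff.

import json

# Emit one JSON object per stdout line, flushed so the Electron side can
# parse it immediately; the "message" field is illustrative.
def stdout_cmd(command: str, message: str = "") -> None:
    print(json.dumps({"command": command, "message": message}), flush=True)

stdout_cmd("warn", "Translation failed: connection refused")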
24 src/main/utils/UtilsFunc.ts Normal file
@@ -0,0 +1,24 @@
function passwordMasking(pwd: string) {
  return pwd.replace(/./g, '*')
}

export function passwordMaskingForList(args: string[]) {
  const maskedArgs = [...args]
  for(let i = 1; i < maskedArgs.length; i++) {
    if(maskedArgs[i-1] === '-k' || maskedArgs[i-1] === '-okey' || maskedArgs[i-1] === '-gkey') {
      maskedArgs[i] = passwordMasking(maskedArgs[i])
    }
  }
  return maskedArgs
}

export function passwordMaskingForObject(args: Record<string, any>) {
  const maskedArgs = {...args}
  for(const key in maskedArgs) {
    const lKey = key.toLowerCase()
    if(lKey.includes('api') && lKey.includes('key')) {
      maskedArgs[key] = passwordMasking(maskedArgs[key])
    }
  }
  return maskedArgs
}
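With these helpers, an engine command such as ['-e', 'gummy', '-k', 'secret'] is logged as ['-e', 'gummy', '-k', '******'], and any Controls field whose lower-cased name contains both "api" and "key" (API_KEY, ollamaApiKey, glmApiKey) is masked the same way before reaching the log — replacing the ad-hoc masking previously inlined in AllConfig and CaptionEngine.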
@@ -2,7 +2,7 @@
<html>
  <head>
    <meta charset="UTF-8" />
    <title>Auto Caption v1.0.0</title>
    <title>Auto Caption v1.1.1</title>
    <!-- https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP -->
    <meta
      http-equiv="Content-Security-Policy"
@@ -41,17 +41,63 @@
<div class="input-item" v-if="transModel && currentTransModel === 'ollama'">
  <a-popover placement="right">
    <template #content>
      <p class="label-hover-info">{{ $t('engine.ollamaNote') }}</p>
      <p class="label-hover-info">{{ $t('engine.modelNameNote') }}</p>
    </template>
    <span class="input-label info-label"
      :style="{color: uiColor}"
    >{{ $t('engine.ollama') }}</span>
    >{{ $t('engine.modelName') }}</span>
  </a-popover>
  <a-input
    class="input-area"
    v-model:value="currentOllamaName"
  ></a-input>
</div>
<div class="input-item" v-if="transModel && currentTransModel === 'ollama'">
  <a-popover placement="right">
    <template #content>
      <p class="label-hover-info">{{ $t('engine.baseURL') }}</p>
    </template>
    <span class="input-label info-label"
      :style="{color: uiColor}"
    >Base URL</span>
  </a-popover>
  <a-input
    class="input-area"
    v-model:value="currentOllamaUrl"
    placeholder="http://localhost:11434"
  ></a-input>
</div>
<div class="input-item" v-if="transModel && currentTransModel === 'ollama'">
  <a-popover placement="right">
    <template #content>
      <p class="label-hover-info">{{ $t('engine.apiKey') }}</p>
    </template>
    <span class="input-label info-label"
      :style="{color: uiColor}"
    >API Key</span>
  </a-popover>
  <a-input
    class="input-area"
    type="password"
    v-model:value="currentOllamaApiKey"
  />
</div>
<div class="input-item" v-if="currentEngine === 'glm'">
  <span class="input-label">GLM API URL</span>
  <a-input
    class="input-area"
    v-model:value="currentGlmUrl"
    placeholder="https://open.bigmodel.cn/api/paas/v4/audio/transcriptions"
  ></a-input>
</div>
<div class="input-item" v-if="currentEngine === 'glm'">
  <span class="input-label">GLM Model Name</span>
  <a-input
    class="input-area"
    v-model:value="currentGlmModel"
    placeholder="glm-asr-2512"
  ></a-input>
</div>
<div class="input-item">
  <span class="input-label">{{ $t('engine.audioType') }}</span>
  <a-select
@@ -115,7 +161,7 @@
  </template>
  <span class="input-label info-label"
    :style="{color: uiColor}"
  >{{ $t('engine.apikey') }}</span>
  >ALI {{ $t('engine.apikey') }}</span>
</a-popover>
<a-input
  class="input-area"
@@ -123,6 +169,24 @@
  v-model:value="currentAPI_KEY"
/>
</div>
<div class="input-item">
  <a-popover placement="right">
    <template #content>
      <p class="label-hover-info">{{ $t('engine.glmApikeyInfo') }}</p>
      <p><a href="https://open.bigmodel.cn/" target="_blank">
        https://open.bigmodel.cn
      </a></p>
    </template>
    <span class="input-label info-label"
      :style="{color: uiColor}"
    >GLM {{ $t('engine.apikey') }}</span>
  </a-popover>
  <a-input
    class="input-area"
    type="password"
    v-model:value="currentGlmApiKey"
  />
</div>
<div class="input-item">
  <a-popover placement="right">
    <template #content>
@@ -239,9 +303,14 @@ const currentTranslation = ref<boolean>(true)
const currentRecording = ref<boolean>(false)
const currentTransModel = ref('ollama')
const currentOllamaName = ref('')
const currentOllamaUrl = ref('')
const currentOllamaApiKey = ref('')
const currentAPI_KEY = ref<string>('')
const currentVoskModelPath = ref<string>('')
const currentSosvModelPath = ref<string>('')
const currentGlmUrl = ref<string>('')
const currentGlmModel = ref<string>('')
const currentGlmApiKey = ref<string>('')
const currentRecordingPath = ref<string>('')
const currentCustomized = ref<boolean>(false)
const currentCustomizedApp = ref('')
@@ -294,12 +363,17 @@ function applyChange(){
  engineControl.transModel = currentTransModel.value
  engineControl.ollamaName = currentOllamaName.value
  engineControl.engine = currentEngine.value
  engineControl.ollamaUrl = currentOllamaUrl.value ?? "http://localhost:11434"
  engineControl.ollamaApiKey = currentOllamaApiKey.value
  engineControl.audio = currentAudio.value
  engineControl.translation = currentTranslation.value
  engineControl.recording = currentRecording.value
  engineControl.API_KEY = currentAPI_KEY.value
  engineControl.voskModelPath = currentVoskModelPath.value
  engineControl.sosvModelPath = currentSosvModelPath.value
  engineControl.glmUrl = currentGlmUrl.value ?? "https://open.bigmodel.cn/api/paas/v4/audio/transcriptions"
  engineControl.glmModel = currentGlmModel.value ?? "glm-asr-2512"
  engineControl.glmApiKey = currentGlmApiKey.value
  engineControl.recordingPath = currentRecordingPath.value
  engineControl.customized = currentCustomized.value
  engineControl.customizedApp = currentCustomizedApp.value
@@ -320,6 +394,8 @@ function cancelChange(){
  currentTargetLang.value = engineControl.targetLang
  currentTransModel.value = engineControl.transModel
  currentOllamaName.value = engineControl.ollamaName
  currentOllamaUrl.value = engineControl.ollamaUrl
  currentOllamaApiKey.value = engineControl.ollamaApiKey
  currentEngine.value = engineControl.engine
  currentAudio.value = engineControl.audio
  currentTranslation.value = engineControl.translation
@@ -327,6 +403,9 @@ function cancelChange(){
  currentAPI_KEY.value = engineControl.API_KEY
  currentVoskModelPath.value = engineControl.voskModelPath
  currentSosvModelPath.value = engineControl.sosvModelPath
  currentGlmUrl.value = engineControl.glmUrl
  currentGlmModel.value = engineControl.glmModel
  currentGlmApiKey.value = engineControl.glmApiKey
  currentRecordingPath.value = engineControl.recordingPath
  currentCustomized.value = engineControl.customized
  currentCustomizedApp.value = engineControl.customizedApp
@@ -101,7 +101,7 @@
<p class="about-desc">{{ $t('status.about.desc') }}</p>
<a-divider />
<div class="about-info">
  <p><b>{{ $t('status.about.version') }}</b><a-tag color="green">v1.0.0</a-tag></p>
  <p><b>{{ $t('status.about.version') }}</b><a-tag color="green">v1.1.1</a-tag></p>
  <p>
    <b>{{ $t('status.about.author') }}</b>
    <a
@@ -34,7 +34,7 @@ export const engines = {
      { value: 'it', type: 1, label: '意大利语' },
    ],
    transModel: [
      { value: 'ollama', label: 'Ollama 本地模型' },
      { value: 'ollama', label: 'Ollama 模型或 OpenAI 兼容模型' },
      { value: 'google', label: 'Google API 调用' },
    ]
  },
@@ -55,7 +55,22 @@ export const engines = {
      { value: 'it', type: 1, label: '意大利语' },
    ],
    transModel: [
      { value: 'ollama', label: 'Ollama 本地模型' },
      { value: 'ollama', label: 'Ollama 模型或 OpenAI 兼容模型' },
      { value: 'google', label: 'Google API 调用' },
    ]
  },
  {
    value: 'glm',
    label: '云端 / 智谱AI / GLM-ASR',
    languages: [
      { value: 'auto', type: -1, label: '自动检测' },
      { value: 'en', type: 0, label: '英语' },
      { value: 'zh', type: 0, label: '中文' },
      { value: 'ja', type: 0, label: '日语' },
      { value: 'ko', type: 0, label: '韩语' },
    ],
    transModel: [
      { value: 'ollama', label: 'Ollama 模型或 OpenAI 兼容模型' },
      { value: 'google', label: 'Google API 调用' },
    ]
  }
@@ -94,7 +109,7 @@ export const engines = {
      { value: 'it', type: 1, label: 'Italian' },
    ],
    transModel: [
      { value: 'ollama', label: 'Ollama Local Model' },
      { value: 'ollama', label: 'Ollama Model or OpenAI-compatible Model' },
      { value: 'google', label: 'Google API Call' },
    ]
  },
@@ -115,7 +130,22 @@ export const engines = {
      { value: 'it', type: 1, label: 'Italian' },
    ],
    transModel: [
      { value: 'ollama', label: 'Ollama Local Model' },
      { value: 'ollama', label: 'Ollama Model or OpenAI-compatible Model' },
      { value: 'google', label: 'Google API Call' },
    ]
  },
  {
    value: 'glm',
    label: 'Cloud / Zhipu AI / GLM-ASR',
    languages: [
      { value: 'auto', type: -1, label: 'Auto Detect' },
      { value: 'en', type: 0, label: 'English' },
      { value: 'zh', type: 0, label: 'Chinese' },
      { value: 'ja', type: 0, label: 'Japanese' },
      { value: 'ko', type: 0, label: 'Korean' },
    ],
    transModel: [
      { value: 'ollama', label: 'Ollama Model or OpenAI-compatible Model' },
      { value: 'google', label: 'Google API Call' },
    ]
  }
@@ -154,7 +184,7 @@ export const engines = {
      { value: 'it', type: 1, label: 'イタリア語' },
    ],
    transModel: [
      { value: 'ollama', label: 'Ollama ローカルモデル' },
      { value: 'ollama', label: 'Ollama モデルまたは OpenAI 互換モデル' },
      { value: 'google', label: 'Google API 呼び出し' },
    ]
  },
@@ -175,7 +205,22 @@ export const engines = {
      { value: 'it', type: 1, label: 'イタリア語' },
    ],
    transModel: [
      { value: 'ollama', label: 'Ollama ローカルモデル' },
      { value: 'ollama', label: 'Ollama モデルまたは OpenAI 互換モデル' },
      { value: 'google', label: 'Google API 呼び出し' },
    ]
  },
  {
    value: 'glm',
    label: 'クラウド / 智譜AI / GLM-ASR',
    languages: [
      { value: 'auto', type: -1, label: '自動検出' },
      { value: 'en', type: 0, label: '英語' },
      { value: 'zh', type: 0, label: '中国語' },
      { value: 'ja', type: 0, label: '日本語' },
      { value: 'ko', type: 0, label: '韓国語' },
    ],
    transModel: [
      { value: 'ollama', label: 'Ollama モデルまたは OpenAI 互換モデル' },
      { value: 'google', label: 'Google API 呼び出し' },
    ]
  }
@@ -22,7 +22,7 @@ export default {
    "stopped": "Caption Engine Stopped",
    "stoppedInfo": "The caption engine has stopped. You can click the 'Start Caption Engine' button to restart it.",
    "error": "An error occurred",
    "engineError": "The subtitle engine encountered an error and requested a forced exit.",
    "engineError": "The caption engine encountered an error and requested a forced exit.",
    "socketError": "The Socket connection between the main program and the caption engine failed",
    "engineChange": "Caption Engine Configuration Changed",
    "changeInfo": "If the caption engine is already running, you need to restart it for the changes to take effect.",
@@ -50,8 +50,10 @@ export default {
    "sourceLang": "Source",
    "transLang": "Translation",
    "transModel": "Model",
    "ollama": "Ollama",
    "ollamaNote": "To use for translation, the name of the local Ollama model that will call the service on the default port. It is recommended to use a non-inference model with less than 1B parameters.",
    "modelName": "Model Name",
    "modelNameNote": "Please enter the translation model name you wish to use, which can be either a local Ollama model or an OpenAI API compatible cloud model. If the Base URL field is left blank, the local Ollama service will be called by default; otherwise, the API service at the specified address will be called via the Python OpenAI library.",
    "baseURL": "The base request URL for calling the OpenAI API. If left empty, the local default-port Ollama model will be used.",
    "apiKey": "The API KEY required for the model corresponding to the OpenAI API.",
    "captionEngine": "Engine",
    "audioType": "Audio Type",
    "systemOutput": "System Audio Output (Speaker)",
@@ -65,9 +67,10 @@ export default {
    "recordingPath": "Save Path",
    "startTimeout": "Timeout",
    "seconds": "seconds",
    "apikeyInfo": "API KEY required for the Gummy subtitle engine, which needs to be obtained from the Alibaba Cloud Bailing platform. For more details, see the project user manual.",
    "voskModelPathInfo": "The folder path of the model required by the Vosk subtitle engine. You need to download the required model to your local machine in advance. For more details, see the project user manual.",
    "sosvModelPathInfo": "The folder path of the model required by the SOSV subtitle engine. You need to download the required model to your local machine in advance. For more details, see the project user manual.",
    "apikeyInfo": "API KEY required for the Gummy caption engine, which needs to be obtained from the Alibaba Cloud Bailing platform. For more details, see the project user manual.",
    "glmApikeyInfo": "API KEY required for the GLM caption engine, which needs to be obtained from the Zhipu AI platform.",
    "voskModelPathInfo": "The folder path of the model required by the Vosk caption engine. You need to download the required model to your local machine in advance. For more details, see the project user manual.",
    "sosvModelPathInfo": "The folder path of the model required by the SOSV caption engine. You need to download the required model to your local machine in advance. For more details, see the project user manual.",
    "recordingPathInfo": "The path to save recording files, requiring a folder path. The software will automatically name the recording file and save it as a .wav file.",
    "modelDownload": "Model Download Link",
    "startTimeoutInfo": "Caption engine startup timeout duration. The engine will be forcefully stopped if startup exceeds this time. Recommended range: 10-120 seconds.",
@@ -143,7 +146,7 @@ export default {
    "projLink": "Project Link",
    "manual": "User Manual",
    "engineDoc": "Caption Engine Manual",
    "date": "September 8th, 2025"
    "date": "January 31, 2026"
  }
},
log: {
@@ -50,8 +50,10 @@ export default {
    "sourceLang": "ソース言語",
    "transLang": "翻訳言語",
    "transModel": "翻訳モデル",
    "ollama": "Ollama",
    "ollamaNote": "翻訳に使用する、デフォルトポートでサービスを呼び出すローカルOllamaモデルの名前。1B 未満のパラメータを持つ非推論モデルの使用を推奨します。",
    "modelName": "モデル名",
    "modelNameNote": "使用する翻訳モデル名を入力してください。Ollama のローカルモデルでも OpenAI API 互換のクラウドモデルでも可能です。Base URL フィールドが未入力の場合、デフォルトでローカルの Ollama サービスが呼び出され、それ以外の場合は Python OpenAI ライブラリ経由で指定されたアドレスの API サービスが呼び出されます。",
    "baseURL": "OpenAI API を呼び出すための基本リクエスト URL です。未記入の場合、ローカルのデフォルトポートの Ollama モデルが呼び出されます。",
    "apiKey": "OpenAI API に対応するモデルを使用するために必要な API キーです。",
    "captionEngine": "エンジン",
    "audioType": "オーディオ",
    "systemOutput": "システムオーディオ出力(スピーカー)",
@@ -66,6 +68,7 @@ export default {
    "startTimeout": "時間制限",
    "seconds": "秒",
    "apikeyInfo": "Gummy 字幕エンジンに必要な API KEY は、アリババクラウド百煉プラットフォームから取得する必要があります。詳細情報はプロジェクトのユーザーマニュアルをご覧ください。",
    "glmApikeyInfo": "GLM 字幕エンジンに必要な API KEY で、智譜 AI プラットフォームから取得する必要があります。",
    "voskModelPathInfo": "Vosk 字幕エンジンに必要なモデルのフォルダパスです。必要なモデルを事前にローカルマシンにダウンロードする必要があります。詳細情報はプロジェクトのユーザーマニュアルをご覧ください。",
    "sosvModelPathInfo": "SOSV 字幕エンジンに必要なモデルのフォルダパスです。必要なモデルを事前にローカルマシンにダウンロードする必要があります。詳細情報はプロジェクトのユーザーマニュアルをご覧ください。",
    "recordingPathInfo": "録音ファイルの保存パスで、フォルダパスを指定する必要があります。ソフトウェアが自動的に録音ファイルに名前を付けて .wav ファイルとして保存します。",
@@ -142,7 +145,7 @@ export default {
    "projLink": "プロジェクトリンク",
    "manual": "ユーザーマニュアル",
    "engineDoc": "字幕エンジンマニュアル",
    "date": "2025 年 9 月 8 日"
    "date": "2026 年 1 月 31 日"
  }
},
log: {
@@ -50,8 +50,10 @@ export default {
    "sourceLang": "源语言",
    "transLang": "翻译语言",
    "transModel": "翻译模型",
    "ollama": "Ollama",
    "ollamaNote": "要使用的进行翻译的本地 Ollama 模型的名称,将调用默认端口的服务,建议使用参数量小于 1B 的非推理模型。",
    "modelName": "模型名称",
    "modelNameNote": "请输入要使用的翻译模型名称,可以是 Ollama 本地模型,也可以是 OpenAI API 兼容的云端模型。若未填写 Base URL 字段,则默认调用本地 Ollama 服务,否则会通过 Python OpenAI 库调用该地址指向的 API 服务。",
    "baseURL": "调用 OpenAI API 的基础请求地址,如果不填写则调用本地默认端口的 Ollama 模型。",
    "apiKey": "调用 OpenAI API 对应的模型需要使用的 API KEY。",
    "captionEngine": "字幕引擎",
    "audioType": "音频类型",
    "systemOutput": "系统音频输出(扬声器)",
@@ -66,6 +68,7 @@ export default {
    "startTimeout": "启动超时",
    "seconds": "秒",
    "apikeyInfo": "Gummy 字幕引擎需要的 API KEY,需要在阿里云百炼平台获取。详细信息见项目用户手册。",
    "glmApikeyInfo": "GLM 字幕引擎需要的 API KEY,需要在智谱 AI 平台获取。",
    "voskModelPathInfo": "Vosk 字幕引擎需要的模型的文件夹路径,需要提前下载需要的模型到本地。信息详情见项目用户手册。",
    "sosvModelPathInfo": "SOSV 字幕引擎需要的模型的文件夹路径,需要提前下载需要的模型到本地。信息详情见项目用户手册。",
    "recordingPathInfo": "录音文件保存路径,需要提供文件夹路径。软件会自动命名录音文件并保存为 .wav 文件。",
@@ -142,7 +145,7 @@ export default {
    "projLink": "项目链接",
    "manual": "用户手册",
    "engineDoc": "字幕引擎手册",
    "date": "2025 年 9 月 8 日"
    "date": "2026 年 1 月 31 日"
  }
},
log: {
@@ -21,6 +21,8 @@ export const useEngineControlStore = defineStore('engineControl', () => {
  const targetLang = ref<string>('zh')
  const transModel = ref<string>('ollama')
  const ollamaName = ref<string>('')
  const ollamaUrl = ref<string>('')
  const ollamaApiKey = ref<string>('')
  const engine = ref<string>('gummy')
  const audio = ref<0 | 1>(0)
  const translation = ref<boolean>(true)
@@ -28,6 +30,9 @@ export const useEngineControlStore = defineStore('engineControl', () => {
  const API_KEY = ref<string>('')
  const voskModelPath = ref<string>('')
  const sosvModelPath = ref<string>('')
  const glmUrl = ref<string>('https://open.bigmodel.cn/api/paas/v4/audio/transcriptions')
  const glmModel = ref<string>('glm-asr-2512')
  const glmApiKey = ref<string>('')
  const recordingPath = ref<string>('')
  const customized = ref<boolean>(false)
  const customizedApp = ref<string>('')
@@ -44,6 +49,8 @@ export const useEngineControlStore = defineStore('engineControl', () => {
    targetLang: targetLang.value,
    transModel: transModel.value,
    ollamaName: ollamaName.value,
    ollamaUrl: ollamaUrl.value,
    ollamaApiKey: ollamaApiKey.value,
    engine: engine.value,
    audio: audio.value,
    translation: translation.value,
@@ -51,6 +58,9 @@ export const useEngineControlStore = defineStore('engineControl', () => {
    API_KEY: API_KEY.value,
    voskModelPath: voskModelPath.value,
    sosvModelPath: sosvModelPath.value,
    glmUrl: glmUrl.value,
    glmModel: glmModel.value,
    glmApiKey: glmApiKey.value,
    recordingPath: recordingPath.value,
    customized: customized.value,
    customizedApp: customizedApp.value,
@@ -80,6 +90,8 @@ export const useEngineControlStore = defineStore('engineControl', () => {
  targetLang.value = controls.targetLang
  transModel.value = controls.transModel
  ollamaName.value = controls.ollamaName
  ollamaUrl.value = controls.ollamaUrl
  ollamaApiKey.value = controls.ollamaApiKey
  engine.value = controls.engine
  audio.value = controls.audio
  engineEnabled.value = controls.engineEnabled
@@ -88,6 +100,9 @@ export const useEngineControlStore = defineStore('engineControl', () => {
  API_KEY.value = controls.API_KEY
  voskModelPath.value = controls.voskModelPath
  sosvModelPath.value = controls.sosvModelPath
  glmUrl.value = controls.glmUrl || 'https://open.bigmodel.cn/api/paas/v4/audio/transcriptions'
  glmModel.value = controls.glmModel || 'glm-asr-2512'
  glmApiKey.value = controls.glmApiKey
  recordingPath.value = controls.recordingPath
  customized.value = controls.customized
  customizedApp.value = controls.customizedApp
@@ -150,6 +165,8 @@ export const useEngineControlStore = defineStore('engineControl', () => {
    targetLang, // target language
    transModel, // translation model
    ollamaName, // Ollama model name
    ollamaUrl,
    ollamaApiKey,
    engine, // caption engine
    audio, // audio source
    translation, // whether translation is enabled
@@ -157,6 +174,9 @@ export const useEngineControlStore = defineStore('engineControl', () => {
    API_KEY, // API KEY
    voskModelPath, // Vosk model path
    sosvModelPath, // SOSV model path
    glmUrl, // GLM API URL
    glmModel, // GLM model name
    glmApiKey, // GLM API Key
    recordingPath, // recording save path
    customized, // whether a custom caption engine is used
    customizedApp, // application of the custom caption engine
@@ -8,6 +8,8 @@ export interface Controls {
  targetLang: string,
  transModel: string,
  ollamaName: string,
  ollamaUrl: string,
  ollamaApiKey: string,
  engine: string,
  audio: 0 | 1,
  translation: boolean,
@@ -15,6 +17,9 @@ export interface Controls {
  API_KEY: string,
  voskModelPath: string,
  sosvModelPath: string,
  glmUrl: string,
  glmModel: string,
  glmApiKey: string,
  recordingPath: string,
  customized: boolean,
  customizedApp: string,
|
||||
</template>
|
||||
</div>
|
||||
|
||||
<div class="title-bar" :style="{color: captionStyle.fontColor}">
|
||||
<div class="title-bar"
|
||||
:style="{color: captionStyle.fontColor}"
|
||||
@mouseenter="onTitleBarEnter()"
|
||||
@mouseleave="onTitleBarLeave()"
|
||||
>
|
||||
<div class="option-item" @click="closeCaptionWindow">
|
||||
<CloseOutlined />
|
||||
</div>
|
||||
@@ -96,7 +100,7 @@ const captionLog = useCaptionLogStore();
|
||||
const { captionData } = storeToRefs(captionLog);
|
||||
const caption = ref();
|
||||
const windowHeight = ref(100);
|
||||
const pinned = ref(true);
|
||||
const pinned = ref(false);
|
||||
|
||||
onMounted(() => {
|
||||
const resizeObserver = new ResizeObserver(entries => {
|
||||
@@ -114,7 +118,7 @@ onMounted(() => {
|
||||
|
||||
function pinCaptionWindow() {
|
||||
pinned.value = !pinned.value;
|
||||
window.electron.ipcRenderer.send('caption.pin.set', pinned.value)
|
||||
window.electron.ipcRenderer.send('caption.mouseEvents.ignore', pinned.value)
|
||||
}
|
||||
|
||||
function openControlWindow() {
|
||||
@@ -124,6 +128,18 @@ function openControlWindow() {
|
||||
function closeCaptionWindow() {
|
||||
window.electron.ipcRenderer.send('caption.window.close')
|
||||
}
|
||||
|
||||
function onTitleBarEnter() {
|
||||
if(pinned.value) {
|
||||
window.electron.ipcRenderer.send('caption.mouseEvents.ignore', false)
|
||||
}
|
||||
}
|
||||
|
||||
function onTitleBarLeave() {
|
||||
if(pinned.value) {
|
||||
window.electron.ipcRenderer.send('caption.mouseEvents.ignore', true)
|
||||
}
|
||||
}
|
||||
</script>
|
||||
|
||||
<style scoped>
|
||||
|
||||