5 Commits

Author SHA1 Message Date
himeditator
a0a0a2e66d feat(caption): adjust caption window, add caption timeline editing (#8)
- Added the ability to modify caption timestamps
- Added export formats for caption records: srt and json
- Rearranged the icons in the top-right corner of the caption window vertically
2025-07-14 20:07:22 +08:00
himeditator
665c47d24f feat(linux): support system audio output on Linux
- Added support for capturing system audio output on Linux
- Updated platform compatibility information in the README and user manual
- Modified the AudioStream class to support the Linux platform
2025-07-13 23:28:40 +08:00
himeditator
7f8766b13e docs(engine-manual): update the caption engine development docs
- Added a detailed description of command-line parameter specification
- Added steps for packaging and running a caption engine
- Fixed several errors and typos in the docs
2025-07-11 13:25:52 +08:00
himeditator
6920957152 Merge branch 'dev-v0.4.0-vosk' 2025-07-11 02:32:33 +08:00
Chen Janai
0af5bab75d Merge pull request #7 from HiMeditator/dev-v0.4.0-vosk
Release v0.4.0 with Vosk Caption Engine
2025-07-11 01:36:08 +08:00
29 changed files with 662 additions and 224 deletions

4
.npmrc
View File

@@ -1,2 +1,2 @@
# electron_mirror=https://npmmirror.com/mirrors/electron/
# electron_builder_binaries_mirror=https://npmmirror.com/mirrors/electron-builder-binaries/
electron_mirror=https://npmmirror.com/mirrors/electron/
electron_builder_binaries_mirror=https://npmmirror.com/mirrors/electron-builder-binaries/

View File

@@ -39,7 +39,15 @@
## 📖 基本使用
目前提供了 Windows、macOS 平台的可安装版本。
软件已经适配了 Windows、macOS 和 Linux 平台。测试过的平台信息如下:
| 操作系统版本 | 处理器架构 | 获取系统音频输入 | 获取系统音频输出 |
| ------------------ | ---------- | ---------------- | ---------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ 需要额外配置 | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ 需要额外配置 | ✅ |
macOS 平台和 Linux 平台获取系统音频输出需要进行额外设置,详见[Auto Caption 用户手册](./docs/user-manual/zh.md)。
> 国际版的阿里云服务并没有提供 Gummy 模型,因此目前非中国用户无法使用 Gummy 字幕引擎。
@@ -54,7 +62,6 @@
![](./assets/media/vosk_zh.png)
**如果你觉得上述字幕引擎不能满足你的需求,而且你会 Python,那么你可以考虑开发自己的字幕引擎。详细说明请参考[字幕引擎说明文档](./docs/engine-manual/zh.md)。**
## ✨ 特性
@@ -66,10 +73,6 @@
- 字幕记录展示与导出
- 生成音频输出或麦克风输入的字幕
说明:
- Windows 和 macOS 平台支持生成音频输出和麦克风输入的字幕,但是 **macOS 平台获取系统音频输出需要进行设置,详见[Auto Caption 用户手册](./docs/user-manual/zh.md)**
- Linux 平台目前无法获取系统音频输出,仅支持生成麦克风输入的字幕
## ⚙️ 自带字幕引擎说明
目前软件自带 2 个字幕引擎,正在规划 1 个新的引擎。它们的详细信息如下。
@@ -137,12 +140,21 @@ subenv/Scripts/activate
source subenv/bin/activate
```
然后安装依赖(注意如果是 Linux 或 macOS 环境,需要注释掉 `requirements.txt` 中的 `PyAudioWPatch`,该模块仅适用于 Windows 环境)。
> 这一步可能会报错,一般是因为构建失败,需要根据报错信息安装对应的构建工具包。
然后安装依赖(这一步可能会报错,一般是因为构建失败,需要根据报错信息安装对应的工具包):
```bash
pip install -r requirements.txt
# Windows
pip install -r requirements_win.txt
# macOS
pip install -r requirements_darwin.txt
# Linux
pip install -r requirements_linux.txt
```
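The platform-specific requirements files above can also be selected programmatically. A minimal sketch (the helper name is ours; file names are the ones shipped in the repo):

```python
import sys

# Map sys.platform values to the matching requirements file in caption-engine/
REQUIREMENTS = {
    "win32": "requirements_win.txt",
    "darwin": "requirements_darwin.txt",
    "linux": "requirements_linux.txt",
}

def requirements_file(platform: str = sys.platform) -> str:
    """Return the requirements file for the given platform."""
    try:
        return REQUIREMENTS[platform]
    except KeyError:
        raise RuntimeError(f"Unsupported platform: {platform}")
```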
如果在 Linux 系统上安装 samplerate 模块报错,可以尝试使用以下命令单独安装:
```bash
pip install samplerate --only-binary=:all:
```
然后使用 `pyinstaller` 构建项目:
@@ -152,7 +164,7 @@ pyinstaller ./main-gummy.spec
pyinstaller ./main-vosk.spec
```
注意 `main-vosk.spec` 文件中 `vsok` 库的路径可能不正确,需要根据实际状况配置。
注意 `main-vosk.spec` 文件中 `vosk` 库的路径可能不正确,需要根据实际状况配置。
```
# Windows
@@ -168,6 +180,7 @@ vosk_path = str(Path('./subenv/lib/python3.x/site-packages/vosk').resolve())
```bash
npm run dev
```
### 构建项目
注意目前软件只在 Windows 和 macOS 平台上进行了构建和测试,无法保证软件在 Linux 平台下的正确性。

View File

@@ -39,7 +39,15 @@
## 📖 Basic Usage
Currently, installable versions are available for Windows and macOS platforms.
The software has been adapted for Windows, macOS, and Linux platforms. The tested platform information is as follows:
| OS Version | Architecture | System Audio Input | System Audio Output |
| ------------------ | ------------ | ------------------ | ------------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ Additional config required | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ Additional config required | ✅ |
Additional configuration is required to capture system audio output on macOS and Linux platforms. See [Auto Caption User Manual](./docs/user-manual/en.md) for details.
> The international version of Alibaba Cloud services does not provide the Gummy model, so non-Chinese users currently cannot use the Gummy caption engine.
@@ -65,10 +73,6 @@ To use the Vosk local caption engine, first download your required model from [V
- Caption recording display and export
- Generate captions for audio output or microphone input
Notes:
- Windows and macOS platforms support generating captions for both audio output and microphone input, but **macOS requires additional setup to capture system audio output. See [Auto Caption User Manual](./docs/user-manual/en.md) for details.**
- Linux platform currently cannot capture system audio output, only supports generating subtitles for microphone input.
## ⚙️ Built-in Subtitle Engines
Currently, the software comes with 2 subtitle engines, with 1 new engine planned. Details are as follows.
@@ -136,12 +140,21 @@ subenv/Scripts/activate
source subenv/bin/activate
```
Then install dependencies (note: for Linux or macOS environments, you need to comment out `PyAudioWPatch` in `requirements.txt`, as this module is only for Windows environments).
> This step may report errors, usually due to build failures. You need to install corresponding build tools based on the error messages.
Then install dependencies (this step may fail, usually due to build errors; install the corresponding build tool packages based on the error messages):
```bash
pip install -r requirements.txt
# Windows
pip install -r requirements_win.txt
# macOS
pip install -r requirements_darwin.txt
# Linux
pip install -r requirements_linux.txt
```
If you encounter errors when installing the `samplerate` module on Linux systems, you can try installing it separately with this command:
```bash
pip install samplerate --only-binary=:all:
```
Then use `pyinstaller` to build the project:

View File

@@ -39,7 +39,15 @@
## 📖 基本使い方
現在、Windows、macOS プラットフォーム向けのインストール可能なバージョンを提供しています。
このソフトウェアは Windows、macOS、Linux プラットフォームに対応しています。テスト済みのプラットフォーム情報は以下の通りです:
| OS バージョン | アーキテクチャ | システムオーディオ入力 | システムオーディオ出力 |
| ------------------ | ------------ | ------------------ | ------------------- |
| Windows 11 24H2 | x64 | ✅ | ✅ |
| macOS Sequoia 15.5 | arm64 | ✅ 追加設定が必要 | ✅ |
| Ubuntu 24.04.2 | x64 | ✅ 追加設定が必要 | ✅ |
macOSおよびLinuxプラットフォームでシステムオーディオ出力を取得するには追加設定が必要です。詳細は[Auto Captionユーザーマニュアル](./docs/user-manual/ja.md)をご覧ください。
> 阿里雲の国際版サービスでは Gummy モデルを提供していないため、現在中国以外のユーザーは Gummy 字幕エンジンを使用できません。
@@ -65,10 +73,6 @@ Vosk ローカル字幕エンジンを使用するには、まず [Vosk Models](
- 字幕記録の表示とエクスポート
- オーディオ出力またはマイク入力からの字幕生成
注記:
- Windows と macOS プラットフォームはオーディオ出力とマイク入力の両方からの字幕生成をサポートしていますが、**macOS プラットフォームでシステムオーディオ出力を取得するには設定が必要です。詳細は[Auto Caption ユーザーマニュアル](./docs/user-manual/ja.md)をご覧ください。**
- Linux プラットフォームは現在システムオーディオ出力を取得できず、マイク入力からの字幕生成のみをサポートしています。
## ⚙️ 字幕エンジン説明
現在ソフトウェアには2つの字幕エンジンが組み込まれており、1つの新しいエンジンを計画中です。詳細は以下の通りです。
@@ -136,12 +140,21 @@ subenv/Scripts/activate
source subenv/bin/activate
```
その後、依存関係をインストールします(Linux または macOS 環境の場合、`requirements.txt` 内の `PyAudioWPatch` をコメントアウトする必要があります。このモジュールは Windows 環境専用です)。
> このステップでエラーが発生する場合があります。一般的にはビルド失敗が原因で、エラーメッセージに基づいて対応するビルドツールパッケージをインストールする必要があります。
次に依存関係をインストールします(このステップは失敗する可能性があります。通常はビルド失敗が原因なので、エラーメッセージに基づいて対応するツールパッケージをインストールしてください):
```bash
pip install -r requirements.txt
# Windows
pip install -r requirements_win.txt
# macOS
pip install -r requirements_darwin.txt
# Linux
pip install -r requirements_linux.txt
```
Linuxシステムで`samplerate`モジュールのインストールに問題が発生した場合、以下のコマンドで個別にインストールを試すことができます:
```bash
pip install samplerate --only-binary=:all:
```
その後、`pyinstaller` を使用してプロジェクトをビルドします:

View File

@@ -1,8 +1,12 @@
# -*- mode: python ; coding: utf-8 -*-
from pathlib import Path
import sys
vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
if sys.platform == 'win32':
vosk_path = str(Path('./subenv/Lib/site-packages/vosk').resolve())
else:
vosk_path = str(Path('./subenv/lib/python3.12/site-packages/vosk').resolve())
a = Analysis(
['main-vosk.py'],
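The spec above hard-codes `python3.x` in the non-Windows path, which must be edited by hand. A hedged sketch that derives the version from the running interpreter instead (helper name is ours, not part of the project):

```python
import sys
from pathlib import Path

def vosk_site_packages(venv: str = "subenv") -> Path:
    """Build the expected vosk path inside the virtualenv without
    hard-coding the Python minor version."""
    if sys.platform == "win32":
        return Path(venv) / "Lib" / "site-packages" / "vosk"
    # e.g. "python3.12" on CPython 3.12
    pyver = f"python{sys.version_info.major}.{sys.version_info.minor}"
    return Path(venv) / "lib" / pyver / "site-packages" / "vosk"
```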

View File

@@ -2,6 +2,5 @@ dashscope
numpy
samplerate
PyAudio
PyAudioWPatch # Windows only
vosk
pyinstaller

View File

@@ -0,0 +1,5 @@
dashscope
numpy
vosk
pyinstaller
samplerate # pip install samplerate --only-binary=:all:

View File

@@ -0,0 +1,7 @@
dashscope
numpy
samplerate
PyAudio
PyAudioWPatch
vosk
pyinstaller

View File

@@ -1,7 +1,34 @@
"""获取 Linux 系统音频输入流"""
import pyaudio
import subprocess
def findMonitorSource():
result = subprocess.run(
["pactl", "list", "short", "sources"],
stdout=subprocess.PIPE, text=True
)
lines = result.stdout.splitlines()
for line in lines:
parts = line.split('\t')
if len(parts) >= 2 and ".monitor" in parts[1]:
return parts[1]
raise RuntimeError("System output monitor device not found")
def findInputSource():
result = subprocess.run(
["pactl", "list", "short", "sources"],
stdout=subprocess.PIPE, text=True
)
lines = result.stdout.splitlines()
for line in lines:
parts = line.split('\t')
name = parts[1]
if ".monitor" not in name:
return name
raise RuntimeError("Microphone input device not found")
class AudioStream:
"""
@@ -13,26 +40,26 @@ class AudioStream:
"""
def __init__(self, audio_type=1, chunk_rate=20):
self.audio_type = audio_type
self.mic = pyaudio.PyAudio()
self.device = self.mic.get_default_input_device_info()
self.stream = None
self.SAMP_WIDTH = pyaudio.get_sample_size(pyaudio.paInt16)
self.FORMAT = pyaudio.paInt16
self.CHANNELS = self.device["maxInputChannels"]
self.RATE = int(self.device["defaultSampleRate"])
if self.audio_type == 0:
self.source = findMonitorSource()
else:
self.source = findInputSource()
self.process = None
self.SAMP_WIDTH = 2
self.FORMAT = 16
self.CHANNELS = 2
self.RATE = 48000
self.CHUNK = self.RATE // chunk_rate
self.INDEX = self.device["index"]
def printInfo(self):
dev_info = f"""
采样输入设备
- 设备类型:{ "音频输入(Linux 平台目前仅支持该项)" }
- 序号:{self.device['index']}
- 名称:{self.device['name']}
- 最大输入通道数:{self.device['maxInputChannels']}
- 默认低输入延迟:{self.device['defaultLowInputLatency']}s
- 默认高输入延迟:{self.device['defaultHighInputLatency']}s
- 默认采样率:{self.device['defaultSampleRate']}Hz
音频捕获进程
- 捕获类型:{"音频输出" if self.audio_type == 0 else "音频输入"}
- 设备源:{self.source}
- 捕获进程 PID:{self.process.pid if self.process else "None"}
音频样本块大小:{self.CHUNK}
样本位宽:{self.SAMP_WIDTH}
@@ -44,30 +71,24 @@ class AudioStream:
def openStream(self):
"""
打开并返回系统音频输出流
启动音频捕获进程
"""
if self.stream: return self.stream
self.stream = self.mic.open(
format = self.FORMAT,
channels = int(self.CHANNELS),
rate = self.RATE,
input = True,
input_device_index = int(self.INDEX)
self.process = subprocess.Popen(
["parec", "-d", self.source, "--format=s16le", "--rate=48000", "--channels=2"],
stdout=subprocess.PIPE
)
return self.stream
def read_chunk(self):
"""
读取音频数据
"""
if not self.stream: return None
return self.stream.read(self.CHUNK)
if self.process:
return self.process.stdout.read(self.CHUNK)
return None
def closeStream(self):
"""
关闭系统音频输出流
关闭系统音频捕获进程
"""
if self.stream is None: return
self.stream.stop_stream()
self.stream.close()
self.stream = None
if self.process:
self.process.terminate()
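The monitor-source lookup above shells out to `pactl` and then scans the tab-separated output for a `.monitor` device. The parsing step can be isolated and exercised against a captured `pactl list short sources` dump (the sample below mirrors the output shown in the user manual):

```python
def find_monitor_source(pactl_output: str) -> str:
    """Return the first '.monitor' source name from
    `pactl list short sources` output."""
    for line in pactl_output.splitlines():
        parts = line.split("\t")
        # column 1 holds the source name; monitors carry a ".monitor" suffix
        if len(parts) >= 2 and ".monitor" in parts[1]:
            return parts[1]
    raise RuntimeError("System output monitor device not found")

SAMPLE = (
    "220\talsa_output.pci-0000_02_02.0.3.analog-stereo.monitor\tPipeWire\ts16le 2ch 48000Hz\tSUSPENDED\n"
    "221\talsa_input.pci-0000_02_02.0.3.analog-stereo\tPipeWire\ts16le 2ch 48000Hz\tSUSPENDED"
)
```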

View File

@@ -10,12 +10,19 @@
- [x] 适配 macOS 平台 *2025/07/08*
- [x] 添加字幕文字描边 *2025/07/09*
- [x] 添加基于 Vosk 的字幕引擎 *2025/07/09*
- [x] 适配 Linux 平台 *2025/07/13*
- [x] 字幕窗口右上角图标改为竖向排布 *2025/07/14*
- [x] 可以调整字幕时间轴 *2025/07/14*
- [x] 可以导出 srt 格式的字幕记录 *2025/07/14*
## 待完成
- [ ] 可以获取字幕引擎的系统资源消耗情况
## 后续计划
- [ ] 添加 Ollama 模型用于本地字幕引擎的翻译
- [ ] 添加本地字幕引擎
- [ ] 验证 / 添加基于 FunASR 的字幕引擎
- [ ] 减小软件不必要的体积
## 遥远的未来

View File

@@ -151,6 +151,51 @@ Data receiver code is as follows:
...
```
## Usage of Caption Engine
### Command Line Parameter Specification
The custom caption engine settings are specified via command line parameters. Common required parameters are as follows:
```python
import argparse
...
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Convert system audio stream to text')
parser.add_argument('-s', '--source_language', default='en', help='Source language code')
parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output audio stream, 1 for input audio stream')
parser.add_argument('-c', '--chunk_rate', default=20, help='The number of audio stream chunks collected per second.')
parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
args = parser.parse_args()
convert_audio_to_text(
args.source_language,
args.target_language,
int(args.audio_type),
int(args.chunk_rate),
args.api_key
)
```
For example, to specify Japanese as source language, Chinese as target language, capture system audio output, and collect 0.1s audio chunks, use the following command:
```bash
python main-gummy.py -s ja -t zh -a 0 -c 10 -k <your-api-key>
```
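On the output side, the manual requires each caption JSON line to be flushed immediately so the Electron main process can parse every received string as a JSON object. A minimal emitter sketch (field names are illustrative, not the engine's exact schema):

```python
import json
import sys

def emit_caption(index: int, text: str, translation: str,
                 time_s: str = "00:00:00.000",
                 time_t: str = "00:00:00.000") -> None:
    """Write one caption as a single JSON line and flush immediately."""
    caption = {
        "index": index,
        "time_s": time_s,
        "time_t": time_t,
        "text": text,
        "translation": translation,
    }
    sys.stdout.write(json.dumps(caption, ensure_ascii=False) + "\n")
    sys.stdout.flush()  # one flush per caption so the receiver gets whole lines
```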
### Packaging
After development and testing, package the caption engine into an executable file using `pyinstaller`. If errors occur, check for missing dependencies.
### Execution
With a working caption engine, specify its path and runtime parameters in the caption software window to launch it.
![](../img/02_en.png)
## Reference Code
The `main-gummy.py` file under the `caption-engine` folder in this project serves as the entry point for the default caption engine. The `src\main\utils\engine.ts` file contains the server-side code for acquiring and processing data from the caption engine. You can read and understand the implementation details and the complete execution process of the caption engine as needed.

View File

@@ -125,4 +125,77 @@ sys.stdout.reconfigure(line_buffering=True)
...
```
データ受信側のコード
```typescript
// src\main\utils\engine.ts
...
this.process.stdout.on('data', (data) => {
const lines = data.toString().split('\n');
lines.forEach((line: string) => {
if (line.trim()) {
try {
const caption = JSON.parse(line);
addCaptionLog(caption);
} catch (e) {
controlWindow.sendErrorMessage('字幕エンジンの出力をJSONオブジェクトとして解析できません:' + e)
console.error('[ERROR] JSON解析エラー:', e);
}
}
});
});
this.process.stderr.on('data', (data) => {
controlWindow.sendErrorMessage('字幕エンジンエラー:' + data)
console.error(`[ERROR] サブプロセスエラー: ${data}`);
});
...
```
## 字幕エンジンの使用方法
### コマンドライン引数の指定
カスタム字幕エンジンの設定はコマンドライン引数で指定します。主な必要なパラメータは以下の通りです:
```python
import argparse
...
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='システムのオーディオストリームをテキストに変換')
parser.add_argument('-s', '--source_language', default='en', help='ソース言語コード')
parser.add_argument('-t', '--target_language', default='zh', help='ターゲット言語コード')
parser.add_argument('-a', '--audio_type', default=0, help='オーディオストリームソース: 0は出力音声、1は入力音声')
parser.add_argument('-c', '--chunk_rate', default=20, help='1秒間に収集するオーディオチャンク数')
parser.add_argument('-k', '--api_key', default='', help='GummyモデルのAPIキー')
args = parser.parse_args()
convert_audio_to_text(
args.source_language,
args.target_language,
int(args.audio_type),
int(args.chunk_rate),
args.api_key
)
```
原文を日本語、翻訳を中国語に指定し、システム音声出力を取得、0.1秒のオーディオデータを収集する場合:
```bash
python main-gummy.py -s ja -t zh -a 0 -c 10 -k <your-api-key>
```
### パッケージ化
開発とテスト完了後、`pyinstaller`を使用して実行可能ファイルにパッケージ化します。エラーが発生した場合、依存ライブラリの不足を確認してください。
### 実行
利用可能な字幕エンジンが準備できたら、字幕ソフトウェアのウィンドウでエンジンのパスと実行パラメータを指定して起動します。
![](../img/02_ja.png)
## 参考コード
本プロジェクトの`caption-engine`フォルダにある`main-gummy.py`ファイルはデフォルトの字幕エンジンのエントリーコードです。`src\main\utils\engine.ts`はサーバー側で字幕エンジンのデータを取得・処理するコードです。必要に応じて字幕エンジンの実装詳細と完全な実行プロセスを理解するために参照してください。

View File

@@ -32,7 +32,7 @@
import sys
import argparse
# 引入系统音频获取
if sys.platform == 'win32':
from sysaudio.win import AudioStream
elif sys.platform == 'darwin':
@@ -100,7 +100,7 @@ export interface CaptionItem {
}
```
**注意必须确保咱们一起每输出一次字幕 JSON 数据就得刷新缓冲区,确保 electron 主进程每次接收到的字符串都可以被解释为 JSON 对象。**
**注意必须确保每输出一次字幕 JSON 数据就得刷新缓冲区,确保 electron 主进程每次接收到的字符串都可以被解释为 JSON 对象。**
如果使用 python 语言,可以参考以下方式将数据传递给主程序:
@@ -151,6 +151,51 @@ sys.stdout.reconfigure(line_buffering=True)
...
```
## 字幕引擎的使用
### 命令行参数的指定
自定义字幕引擎的设置提供命令行参数指定,因此需要设置好字幕引擎的参数,常见的需要的参数如下:
```python
import argparse
...
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Convert system audio stream to text')
parser.add_argument('-s', '--source_language', default='en', help='Source language code')
parser.add_argument('-t', '--target_language', default='zh', help='Target language code')
parser.add_argument('-a', '--audio_type', default=0, help='Audio stream source: 0 for output audio stream, 1 for input audio stream')
parser.add_argument('-c', '--chunk_rate', default=20, help='The number of audio stream chunks collected per second.')
parser.add_argument('-k', '--api_key', default='', help='API KEY for Gummy model')
args = parser.parse_args()
convert_audio_to_text(
args.source_language,
args.target_language,
int(args.audio_type),
int(args.chunk_rate),
args.api_key
)
```
比如对应上面的字幕引擎,我想指定原文为日语,翻译为中文,获取系统音频输出的字幕,每次截取 0.1s 的音频数据,那么命令行参数如下:
```bash
python main-gummy.py -s ja -t zh -a 0 -c 10 -k <your-api-key>
```
### 打包
在完成字幕引擎的开发和测试后,需要将字幕引擎打包成可执行文件。一般使用 `pyinstaller` 进行打包。如果打包好的字幕引擎文件执行报错,可能是打包漏掉了某些依赖库,请检查是否缺少了依赖库。
### 运行
有了可以使用的字幕引擎,就可以在字幕软件窗口中通过指定字幕引擎的路径和字幕引擎的运行指令(参数)来启动字幕引擎了。
![](../img/02_zh.png)
## 参考代码
本项目 `caption-engine` 文件夹下的 `main-gummy.py` 文件为默认字幕引擎的入口代码。`src\main\utils\engine.ts` 为服务端获取字幕引擎数据和进行处理的代码。可以根据需要阅读了解字幕引擎的实现细节和完整运行过程。

View File

@@ -61,6 +61,30 @@ Once BlackHole is confirmed installed, in the `Audio MIDI Setup` page, click the
Now the caption engine can capture system audio output and generate captions.
## Getting System Audio Output on Linux
Execute the following commands to install `pulseaudio` and `pavucontrol`:
```bash
# For Debian or Ubuntu, etc.
sudo apt install pulseaudio pavucontrol
# For CentOS, etc.
sudo yum install pulseaudio pavucontrol
```
Then execute:
```bash
pactl list short sources
```
If you see output similar to the following, the configuration was successful:
```bash
220 alsa_output.pci-0000_02_02.0.3.analog-stereo.monitor PipeWire s16le 2ch 48000Hz SUSPENDED
221 alsa_input.pci-0000_02_02.0.3.analog-stereo PipeWire s16le 2ch 48000Hz SUSPENDED
```
## Software Usage
### Modifying Settings

View File

@@ -64,6 +64,30 @@ BlackHoleのインストールが確認できたら、`オーディオ MIDI 設
これで字幕エンジンがシステムオーディオ出力をキャプチャし、字幕を生成できるようになります。
## Linux でシステムオーディオ出力を取得する
以下のコマンドを実行して `pulseaudio` と `pavucontrol` をインストールします:
```bash
# Debian や Ubuntu など
sudo apt install pulseaudio pavucontrol
# CentOS など
sudo yum install pulseaudio pavucontrol
```
次に実行:
```bash
pactl list short sources
```
以下のような出力があれば設定は成功です:
```bash
220 alsa_output.pci-0000_02_02.0.3.analog-stereo.monitor PipeWire s16le 2ch 48000Hz SUSPENDED
221 alsa_input.pci-0000_02_02.0.3.analog-stereo PipeWire s16le 2ch 48000Hz SUSPENDED
```
## ソフトウェアの使い方
### 設定の変更

View File

@@ -29,7 +29,6 @@ Auto Caption 是一个跨平台的字幕显示软件,能够实时获取系统
这部分阿里云提供了详细的教程,可参考:
- [获取 API KEY](https://help.aliyun.com/zh/model-studio/get-api-key)
- [将 API Key 配置到环境变量](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
## Vosk 引擎使用前准备
@@ -62,6 +61,30 @@ brew install blackhole-64ch
现在字幕引擎就能捕获系统的音频输出并生成字幕了。
## Linux 获取系统音频输出
执行以下命令安装 `pulseaudio` 和 `pavucontrol`:
```bash
# Debian or Ubuntu, etc.
sudo apt install pulseaudio pavucontrol
# CentOS, etc.
sudo yum install pulseaudio pavucontrol
```
然后执行:
```bash
pactl list short sources
```
如果有以下类似的输出内容则配置成功:
```bash
220 alsa_output.pci-0000_02_02.0.3.analog-stereo.monitor PipeWire s16le 2ch 48000Hz SUSPENDED
221 alsa_input.pci-0000_02_02.0.3.analog-stereo PipeWire s16le 2ch 48000Hz SUSPENDED
```
## 软件使用
### 修改设置

View File

@@ -1,64 +0,0 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "440d4a07",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"d:\\Projects\\auto-caption\\caption-engine\\subenv\\Lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n",
"None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.\n"
]
},
{
"ename": "ImportError",
"evalue": "\nMarianTokenizer requires the SentencePiece library but it was not found in your environment. Check out the instructions on the\ninstallation page of its repo: https://github.com/google/sentencepiece#installation and follow the ones\nthat match your environment. Please note that you may need to restart your runtime after installation.\n",
"output_type": "error",
"traceback": [
"\u001b[31m---------------------------------------------------------------------------\u001b[39m",
"\u001b[31mImportError\u001b[39m Traceback (most recent call last)",
"\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[1]\u001b[39m\u001b[32m, line 3\u001b[39m\n\u001b[32m 1\u001b[39m \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mtransformers\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mimport\u001b[39;00m MarianMTModel, MarianTokenizer\n\u001b[32m----> \u001b[39m\u001b[32m3\u001b[39m tokenizer = \u001b[43mMarianTokenizer\u001b[49m\u001b[43m.\u001b[49m\u001b[43mfrom_pretrained\u001b[49m(\u001b[33m\"\u001b[39m\u001b[33mHelsinki-NLP/opus-mt-en-zh\u001b[39m\u001b[33m\"\u001b[39m)\n\u001b[32m 4\u001b[39m model = MarianMTModel.from_pretrained(\u001b[33m\"\u001b[39m\u001b[33mHelsinki-NLP/opus-mt-en-zh\u001b[39m\u001b[33m\"\u001b[39m)\n\u001b[32m 6\u001b[39m tokenizer.save_pretrained(\u001b[33m\"\u001b[39m\u001b[33m./model_en_zh\u001b[39m\u001b[33m\"\u001b[39m)\n",
"\u001b[36mFile \u001b[39m\u001b[32md:\\Projects\\auto-caption\\caption-engine\\subenv\\Lib\\site-packages\\transformers\\utils\\import_utils.py:1994\u001b[39m, in \u001b[36mDummyObject.__getattribute__\u001b[39m\u001b[34m(cls, key)\u001b[39m\n\u001b[32m 1992\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m (key.startswith(\u001b[33m\"\u001b[39m\u001b[33m_\u001b[39m\u001b[33m\"\u001b[39m) \u001b[38;5;129;01mand\u001b[39;00m key != \u001b[33m\"\u001b[39m\u001b[33m_from_config\u001b[39m\u001b[33m\"\u001b[39m) \u001b[38;5;129;01mor\u001b[39;00m key == \u001b[33m\"\u001b[39m\u001b[33mis_dummy\u001b[39m\u001b[33m\"\u001b[39m \u001b[38;5;129;01mor\u001b[39;00m key == \u001b[33m\"\u001b[39m\u001b[33mmro\u001b[39m\u001b[33m\"\u001b[39m \u001b[38;5;129;01mor\u001b[39;00m key == \u001b[33m\"\u001b[39m\u001b[33mcall\u001b[39m\u001b[33m\"\u001b[39m:\n\u001b[32m 1993\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28msuper\u001b[39m().\u001b[34m__getattribute__\u001b[39m(key)\n\u001b[32m-> \u001b[39m\u001b[32m1994\u001b[39m \u001b[43mrequires_backends\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mcls\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mcls\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_backends\u001b[49m\u001b[43m)\u001b[49m\n",
"\u001b[36mFile \u001b[39m\u001b[32md:\\Projects\\auto-caption\\caption-engine\\subenv\\Lib\\site-packages\\transformers\\utils\\import_utils.py:1980\u001b[39m, in \u001b[36mrequires_backends\u001b[39m\u001b[34m(obj, backends)\u001b[39m\n\u001b[32m 1977\u001b[39m failed.append(msg.format(name))\n\u001b[32m 1979\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m failed:\n\u001b[32m-> \u001b[39m\u001b[32m1980\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mImportError\u001b[39;00m(\u001b[33m\"\u001b[39m\u001b[33m\"\u001b[39m.join(failed))\n",
"\u001b[31mImportError\u001b[39m: \nMarianTokenizer requires the SentencePiece library but it was not found in your environment. Check out the instructions on the\ninstallation page of its repo: https://github.com/google/sentencepiece#installation and follow the ones\nthat match your environment. Please note that you may need to restart your runtime after installation.\n"
]
}
],
"source": [
"from transformers import MarianMTModel, MarianTokenizer\n",
"\n",
"tokenizer = MarianTokenizer.from_pretrained(\"Helsinki-NLP/opus-mt-en-zh\")\n",
"model = MarianMTModel.from_pretrained(\"Helsinki-NLP/opus-mt-en-zh\")\n",
"\n",
"tokenizer.save_pretrained(\"./model_en_zh\")\n",
"model.save_pretrained(\"./model_en_zh\")\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "subenv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -60,7 +60,6 @@ class AllConfig {
if(config.uiTheme) this.uiTheme = config.uiTheme
if(config.leftBarWidth) this.leftBarWidth = config.leftBarWidth
if(config.styles) this.setStyles(config.styles)
if(process.platform !== 'win32' && process.platform !== 'darwin') config.controls.audio = 1
if(config.controls) this.setControls(config.controls)
console.log('[INFO] Read Config from:', configPath)
}

View File

@@ -118,6 +118,7 @@ export class CaptionEngine {
});
this.process.stderr.on('data', (data) => {
if(this.processStatus === 'stopping') return
controlWindow.sendErrorMessage(i18n('engine.error') + data)
console.error(`[ERROR] Subprocess Error: ${data}`);
});

View File

@@ -4,46 +4,102 @@
<a-app class="caption-title">
<span style="margin-right: 30px;">{{ $t('log.title') }}</span>
</a-app>
<a-button
type="primary"
style="margin-right: 20px;"
@click="exportCaptions"
:disabled="captionData.length === 0"
>{{ $t('log.export') }}</a-button>
<a-popover :title="$t('log.copyOptions')">
<template #content>
<div class="input-item">
<span class="input-label">{{ $t('log.addIndex') }}</span>
<a-switch v-model:checked="showIndex" />
<span class="input-label">{{ $t('log.copyTime') }}</span>
<a-switch v-model:checked="copyTime" />
</div>
<div class="input-item">
<span class="input-label">{{ $t('log.copyContent') }}</span>
<a-radio-group v-model:value="copyOption">
<a-radio-button value="both">{{ $t('log.both') }}</a-radio-button>
<a-radio-button value="source">{{ $t('log.source') }}</a-radio-button>
<a-radio-button value="target">{{ $t('log.translation') }}</a-radio-button>
</a-radio-group>
</div>
</template>
<a-button
style="margin-right: 20px;"
@click="copyCaptions"
:disabled="captionData.length === 0"
>{{ $t('log.copy') }}</a-button>
<a-popover :title="$t('log.baseTime')">
<template #content>
<div class="base-time">
<div class="base-time-container">
<a-input
type="number" min="0"
v-model:value="baseHH"
></a-input>
<span class="base-time-label">{{ $t('log.hour') }}</span>
</div>
</div><span style="margin: 0 4px;">:</span>
<div class="base-time">
<div class="base-time-container">
<a-input
type="number" min="0" max="59"
v-model:value="baseMM"
></a-input>
<span class="base-time-label">{{ $t('log.min') }}</span>
</div>
</div><span style="margin: 0 4px;">:</span>
<div class="base-time">
<div class="base-time-container">
<a-input
type="number" min="0" max="59"
v-model:value="baseSS"
></a-input>
<span class="base-time-label">{{ $t('log.sec') }}</span>
</div>
</div><span style="margin: 0 4px;">.</span>
<div class="base-time">
<div class="base-time-container">
<a-input
type="number" min="0" max="999"
v-model:value="baseMS"
></a-input>
<span class="base-time-label">{{ $t('log.ms') }}</span>
</div>
</div>
</template>
<a-button
type="primary"
style="margin-right: 20px;"
@click="changeBaseTime"
:disabled="captionData.length === 0"
>{{ $t('log.changeTime') }}</a-button>
</a-popover>
<a-popover :title="$t('log.exportOptions')">
<template #content>
<div class="input-item">
<span class="input-label">{{ $t('log.exportFormat') }}</span>
<a-radio-group v-model:value="exportFormat">
<a-radio-button value="srt">.srt</a-radio-button>
<a-radio-button value="json">.json</a-radio-button>
</a-radio-group>
</div>
</template>
<a-button
style="margin-right: 20px;"
@click="exportCaptions"
:disabled="captionData.length === 0"
>{{ $t('log.export') }}</a-button>
</a-popover>
<a-popover :title="$t('log.copyOptions')">
<template #content>
<div class="input-item">
<span class="input-label">{{ $t('log.addIndex') }}</span>
<a-switch v-model:checked="showIndex" />
<span class="input-label">{{ $t('log.copyTime') }}</span>
<a-switch v-model:checked="copyTime" />
</div>
<div class="input-item">
<span class="input-label">{{ $t('log.copyContent') }}</span>
<a-radio-group v-model:value="copyOption">
<a-radio-button value="both">{{ $t('log.both') }}</a-radio-button>
<a-radio-button value="source">{{ $t('log.source') }}</a-radio-button>
<a-radio-button value="target">{{ $t('log.translation') }}</a-radio-button>
</a-radio-group>
</div>
</template>
<a-button
style="margin-right: 20px;"
@click="copyCaptions"
:disabled="captionData.length === 0"
>{{ $t('log.copy') }}</a-button>
</a-popover>
<a-button
danger
@click="clearCaptions"
>{{ $t('log.clear') }}</a-button>
</div>
<a-table
:columns="columns"
:data-source="captionData"
v-model:pagination="pagination"
style="margin-top: 10px;"
>
<template #bodyCell="{ column, record }">
<template v-if="column.key === 'index'">
@@ -72,15 +128,23 @@ import { storeToRefs } from 'pinia'
import { useCaptionLogStore } from '@renderer/stores/captionLog'
import { message } from 'ant-design-vue'
import { useI18n } from 'vue-i18n'
import * as tc from '../utils/timeCalc'
const { t } = useI18n()
const captionLog = useCaptionLogStore()
const { captionData } = storeToRefs(captionLog)
const exportFormat = ref('srt')
const showIndex = ref(true)
const copyTime = ref(true)
const copyOption = ref('both')
const baseHH = ref<number>(0)
const baseMM = ref<number>(0)
const baseSS = ref<number>(0)
const baseMS = ref<number>(0)
const pagination = ref({
current: 1,
pageSize: 10,
@@ -117,20 +181,58 @@ const columns = [
},
]
function changeBaseTime() {
if(baseHH.value < 0) baseHH.value = 0
if(baseMM.value < 0) baseMM.value = 0
if(baseMM.value > 59) baseMM.value = 59
if(baseSS.value < 0) baseSS.value = 0
if(baseSS.value > 59) baseSS.value = 59
if(baseMS.value < 0) baseMS.value = 0
if(baseMS.value > 999) baseMS.value = 999
const newBase: tc.Time = {
hh: Number(baseHH.value),
mm: Number(baseMM.value),
ss: Number(baseSS.value),
ms: Number(baseMS.value)
}
const oldBase = tc.getTimeFromStr(captionData.value[0].time_s)
const deltaMs = tc.getMsFromTime(newBase) - tc.getMsFromTime(oldBase)
for(let i = 0; i < captionData.value.length; i++){
captionData.value[i].time_s =
tc.getNewTimeStr(captionData.value[i].time_s, deltaMs)
captionData.value[i].time_t =
tc.getNewTimeStr(captionData.value[i].time_t, deltaMs)
}
}
function exportCaptions() {
const jsonData = JSON.stringify(captionData.value, null, 2)
const blob = new Blob([jsonData], { type: 'application/json' })
const exportData = getExportData()
const blob = new Blob([exportData], {
type: exportFormat.value === 'json' ? 'application/json' : 'text/plain'
})
const url = URL.createObjectURL(blob)
const a = document.createElement('a')
a.href = url
const timestamp = new Date().toISOString().replace(/[:.]/g, '-')
a.download = `captions-${timestamp}.json`
a.download = `captions-${timestamp}.${exportFormat.value}`
document.body.appendChild(a)
a.click()
document.body.removeChild(a)
URL.revokeObjectURL(url)
}
function getExportData() {
if(exportFormat.value === 'json') return JSON.stringify(captionData.value, null, 2)
let content = ''
for(let i = 0; i < captionData.value.length; i++){
const item = captionData.value[i]
content += `${i+1}\n`
content += `${item.time_s} --> ${item.time_t}\n`.replace(/\./g, ',')
content += `${item.text}\n${item.translation}\n\n`
}
return content
}
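The SRT branch of `getExportData()` above can be sketched in Python for reference. It assumes, as the renderer does, that each record carries `time_s`/`time_t` strings shaped like `HH:MM:SS.mmm` plus `text` and `translation` fields:

```python
def to_srt(captions: list) -> str:
    """Render caption records as SRT text, mirroring getExportData()."""
    blocks = []
    for i, item in enumerate(captions, start=1):
        # SRT uses a comma as the decimal separator in timestamps
        times = f"{item['time_s']} --> {item['time_t']}".replace(".", ",")
        blocks.append(f"{i}\n{times}\n{item['text']}\n{item['translation']}\n")
    return "\n".join(blocks)
```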
function copyCaptions() {
let content = ''
for(let i = 0; i < captionData.value.length; i++){
@@ -166,6 +268,23 @@ function clearCaptions() {
margin-bottom: 10px;
}
.base-time {
width: 64px;
display: inline-block;
}
.base-time-container {
display: flex;
flex-direction: column;
align-items: center;
gap: 4px;
}
.base-time-label {
font-size: 12px;
color: var(--tag-color);
}
.time-cell {
display: flex;
flex-direction: column;

View File

@@ -335,13 +335,12 @@ watch(changeSignal, (val) => {
}
.preview-container {
line-height: 2em;
width: 60%;
text-align: center;
position: absolute;
padding: 20px;
padding: 10px;
border-radius: 10px;
left: 50%;
left: 64%;
transform: translateX(-50%);
bottom: 20px;
}
@@ -349,7 +348,7 @@ watch(changeSignal, (val) => {
.preview-container p {
text-align: center;
margin: 0;
line-height: 1.5em;
line-height: 1.6em;
}
.left-ellipsis {

View File

@@ -33,7 +33,6 @@
<div class="input-item">
<span class="input-label">{{ $t('engine.audioType') }}</span>
<a-select
:disabled="platform !== 'win32' && platform !== 'darwin'"
class="input-area"
v-model:value="currentAudio"
:options="audioType"

View File

@@ -106,7 +106,6 @@ function openCaptionWindow() {
}
function startEngine() {
console.log(`@@${engineControl.modelPath}##`)
if(engineControl.engine === 'vosk' && engineControl.modelPath.trim() === '') {
engineControl.emptyModelPathErr()
return

View File

@@ -115,7 +115,16 @@ export default {
},
log: {
"title": "Caption Log",
"changeTime": "Modify Caption Time",
"baseTime": "First Caption Start Time",
"hour": "Hour",
"min": "Minute",
"sec": "Second",
"ms": "Millisecond",
"export": "Export Caption Log",
"copy": "Copy to Clipboard",
"exportOptions": "Export Options",
"exportFormat": "Format",
"copyOptions": "Copy Options",
"addIndex": "Add Index",
"copyTime": "Copy Time",
@@ -124,7 +133,6 @@ export default {
"source": "Original Only",
"translation": "Translation Only",
"copySuccess": "Subtitle copied to clipboard",
"export": "Export Caption Log",
"clear": "Clear Caption Log"
}
}

View File

@@ -115,7 +115,16 @@ export default {
},
log: {
"title": "字幕ログ",
"changeTime": "字幕時間を変更",
"baseTime": "最初の字幕開始時間",
"hour": "時",
"min": "分",
"sec": "秒",
"ms": "ミリ秒",
"export": "エクスポート",
"copy": "クリップボードにコピー",
"exportOptions": "エクスポートオプション",
"exportFormat": "形式",
"copyOptions": "コピー設定",
"addIndex": "順序番号",
"copyTime": "時間",
@@ -124,7 +133,6 @@ export default {
"source": "原文のみ",
"translation": "翻訳のみ",
"copySuccess": "字幕がクリップボードにコピーされました",
"export": "エクスポート",
"clear": "字幕ログをクリア"
}
}

View File

@@ -115,8 +115,16 @@ export default {
},
log: {
"title": "字幕记录",
"changeTime": "修改字幕时间",
"baseTime": "首条字幕起始时间",
"hour": "时",
"min": "分",
"sec": "秒",
"ms": "毫秒",
"export": "导出字幕记录",
"copy": "复制到剪贴板",
"exportOptions": "导出选项",
"exportFormat": "导出格式",
"copyOptions": "复制选项",
"addIndex": "添加序号",
"copyTime": "复制时间",

View File

@@ -104,12 +104,6 @@ export const useEngineControlStore = defineStore('engineControl', () => {
});
})
watch(platform, (newValue) => {
if(newValue !== 'win32' && newValue !== 'darwin') {
audio.value = 1
}
})
return {
platform, // 系统平台
captionEngine, // 字幕引擎列表

View File

@@ -0,0 +1,42 @@
export interface Time {
hh: number;
mm: number;
ss: number;
ms: number;
}
export function getTimeFromStr(time: string): Time {
const arr = time.split(":");
const hh = parseInt(arr[0]);
const mm = parseInt(arr[1]);
const ss = parseInt(arr[2].split(".")[0]);
    const ms = parseInt(arr[2].split(".")[1] ?? "0");
return { hh, mm, ss, ms };
}
export function getStrFromTime(time: Time): string {
    return `${String(time.hh).padStart(2, "0")}:${String(time.mm).padStart(2, "0")}:${String(time.ss).padStart(2, "0")}.${String(time.ms).padStart(3, "0")}`;
}
export function getMsFromTime(time: Time): number {
return (
time.hh * 3600000 +
time.mm * 60000 +
time.ss * 1000 +
time.ms
);
}
export function getTimeFromMs(milliseconds: number): Time {
const hh = Math.floor(milliseconds / 3600000);
const mm = Math.floor((milliseconds % 3600000) / 60000);
const ss = Math.floor((milliseconds % 60000) / 1000);
const ms = milliseconds % 1000;
return { hh, mm, ss, ms };
}
export function getNewTimeStr(timeStr: string, deltaMs: number): string {
    const timeMs = getMsFromTime(getTimeFromStr(timeStr));
    const newTimeMs = Math.max(0, timeMs + deltaMs);
    return getStrFromTime(getTimeFromMs(newTimeMs));
}
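The helpers in this file compose into `getNewTimeStr`, which is what the caption-log view uses to shift every timestamp by a delta. A standalone usage sketch (the functions are re-implemented here so the snippet runs on its own; it assumes zero-padded `HH:MM:SS.mmm` output and clamps negative results to zero):

```typescript
// Self-contained sketch of the time helpers above (zero padding and clamping assumed).
interface Time { hh: number; mm: number; ss: number; ms: number }

function getTimeFromStr(time: string): Time {
  const arr = time.split(":")
  return {
    hh: parseInt(arr[0]),
    mm: parseInt(arr[1]),
    ss: parseInt(arr[2].split(".")[0]),
    ms: parseInt(arr[2].split(".")[1] ?? "0")
  }
}

function getMsFromTime(t: Time): number {
  return t.hh * 3600000 + t.mm * 60000 + t.ss * 1000 + t.ms
}

function getTimeFromMs(milliseconds: number): Time {
  return {
    hh: Math.floor(milliseconds / 3600000),
    mm: Math.floor((milliseconds % 3600000) / 60000),
    ss: Math.floor((milliseconds % 60000) / 1000),
    ms: milliseconds % 1000
  }
}

function getStrFromTime(t: Time): string {
  const pad = (n: number, w: number) => String(n).padStart(w, "0")
  return `${pad(t.hh, 2)}:${pad(t.mm, 2)}:${pad(t.ss, 2)}.${pad(t.ms, 3)}`
}

function getNewTimeStr(timeStr: string, deltaMs: number): string {
  const timeMs = getMsFromTime(getTimeFromStr(timeStr))
  return getStrFromTime(getTimeFromMs(Math.max(0, timeMs + deltaMs)))
}

// Shift a caption start time forward by 1.5 s
const shifted = getNewTimeStr("00:00:01.250", 1500)
```

Clamping to zero keeps a negative delta (shifting captions earlier) from producing an invalid negative timecode in exported SRT files.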

View File

@@ -1,24 +1,11 @@
<template>
<div
class="caption-page"
ref="caption"
:style="{
backgroundColor: captionStyle.backgroundRGBA
}"
class="caption-page"
ref="caption"
:style="{
backgroundColor: captionStyle.backgroundRGBA
}"
>
<div class="title-bar" :style="{color: captionStyle.fontColor}">
<div class="drag-area">&nbsp;</div>
<div class="option-item" @click="pinCaptionWindow">
<PushpinFilled v-if="pinned" />
<PushpinOutlined v-else />
</div>
<div class="option-item" @click="openControlWindow">
<SettingOutlined />
</div>
<div class="option-item" @click="closeCaptionWindow">
<CloseOutlined />
</div>
</div>
<div
class="caption-container"
:style="{
@@ -46,6 +33,20 @@
<span v-else>{{ $t('example.translation') }}</span>
</p>
</div>
<div class="title-bar" :style="{color: captionStyle.fontColor}">
<div class="option-item" @click="closeCaptionWindow">
<CloseOutlined />
</div>
<div class="option-item" @click="openControlWindow">
<SettingOutlined />
</div>
<div class="option-item" @click="pinCaptionWindow">
<PushpinFilled v-if="pinned" />
<PushpinOutlined v-else />
</div>
<div class="drag-area"></div>
</div>
</div>
</template>
@@ -97,38 +98,21 @@ function closeCaptionWindow() {
border-radius: 8px;
box-sizing: border-box;
border: 1px solid #3333;
}
.title-bar {
display: flex;
align-items: center;
}
.drag-area {
padding: 5px;
flex-grow: 1;
-webkit-app-region: drag;
}
.option-item {
display: inline-block;
padding: 5px 10px;
cursor: pointer;
}
.option-item:hover {
background-color: #2221;
}
.caption-container {
display: inline-block;
width: calc(100% - 32px);
-webkit-app-region: drag;
padding-top: 10px;
padding-bottom: 10px;
}
.caption-container p {
text-align: center;
margin: 0;
line-height: 1.5em;
padding: 0 10px 10px 10px;
line-height: 1.6em;
}
.left-ellipsis {
@@ -142,4 +126,30 @@ function closeCaptionWindow() {
direction: ltr;
display: inline-block;
}
.title-bar {
width: 32px;
display: flex;
flex-direction: column;
vertical-align: top;
}
.option-item {
width: 32px;
height: 32px;
display: flex;
justify-content: center;
align-items: center;
cursor: pointer;
}
.option-item:hover {
background-color: #2221;
}
.drag-area {
display: inline-flex;
flex-grow: 1;
-webkit-app-region: drag;
}
</style>