mirror of
https://github.com/HiMeditator/auto-caption.git
synced 2026-02-15 04:14:46 +08:00
docs(readme): update documentation and add a terminal usage guide
BIN docs/img/06.png (normal file)
Binary file not shown. After: Size: 118 KiB
BIN docs/img/07.png (normal file)
Binary file not shown. After: Size: 94 KiB
@@ -130,3 +130,175 @@ The software provides two default caption engines. If you need other caption eng
Note that when a custom caption engine is used, none of the caption engine settings above take effect; the custom caption engine is configured entirely through the engine command.

If you are a developer and want to build a custom caption engine, see the [Caption Engine Explanation Document](../engine-manual/en.md).

## Using Caption Engine Standalone

### Runtime Parameter Description

> The following content assumes you are familiar with running programs from a terminal.

The complete set of runtime parameters accepted by the caption engine is shown below:

![](../img/06.png)

When the engine is used standalone, however, some of these parameters are unnecessary or should be left unchanged.

The descriptions below cover only the parameters needed for standalone use.

#### `-e, --caption_engine`

The caption engine model to use. Three options are currently available: `gummy`, `vosk`, and `sosv`.

The default value is `gummy`.

This applies to all models.

#### `-a, --audio_type`

The audio type to recognize: `0` for system audio output, `1` for microphone audio input.

The default value is `0`.

This applies to all models.

#### `-d, --display_caption`

Whether to display captions in the console: `0` hides them, `1` displays them.

The default value is `0`, but `1` is recommended when running the caption engine on its own.

This applies to all models.

#### `-t, --target_language`

> Note that the Vosk and SOSV models segment sentences poorly, which can make translated content hard to understand. Using translation with these two models is not recommended.

The target language for translation. All models support the following values:

- `none` No translation
- `zh` Simplified Chinese
- `en` English
- `ja` Japanese
- `ko` Korean

In addition, the `vosk` and `sosv` models support the following target languages:

- `de` German
- `fr` French
- `ru` Russian
- `es` Spanish
- `it` Italian

The default value is `none`.

This applies to all models.

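
The support rules above can be captured in a small lookup. The following is a minimal Python sketch, not code from the project: the names `COMMON_TARGETS`, `EXTRA_TARGETS`, and `is_supported_target` are hypothetical, and only the engine names and language codes come from the lists above.

```python
# Illustrative only: these names are hypothetical and not part of the project.
COMMON_TARGETS = {"none", "zh", "en", "ja", "ko"}  # documented for all models
EXTRA_TARGETS = {"de", "fr", "ru", "es", "it"}     # documented for vosk and sosv only
ENGINES = {"gummy", "vosk", "sosv"}


def is_supported_target(engine: str, target: str) -> bool:
    """Return True if `target` is a documented target language for `engine`."""
    if engine not in ENGINES:
        raise ValueError(f"unknown caption engine: {engine}")
    if target in COMMON_TARGETS:
        return True
    return engine in {"vosk", "sosv"} and target in EXTRA_TARGETS


print(is_supported_target("gummy", "de"))  # False
print(is_supported_target("vosk", "de"))   # True
```

A check like this could run before the engine is launched, so that an unsupported combination is reported early rather than at translation time.
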
#### `-s, --source_language`

The source language to recognize. The default value is `auto`, meaning no source language is specified.

Specifying the source language can improve recognition accuracy to some extent. Use the language codes listed above to specify it.

This only applies to the Gummy and SOSV models.

The Gummy model accepts all of the languages listed above, plus Cantonese (`yue`).

The SOSV model accepts the following source languages: English, Chinese, Japanese, Korean, and Cantonese.

#### `-k, --api_key`

The Alibaba Cloud API key required by the Gummy model.

The default value is empty.

This only applies to the Gummy model.

#### `-tm, --translation_model`

The translation method used by the Vosk and SOSV models. The default value is `ollama`.

Supported values are:

- `ollama` Use a local Ollama model for translation. Ollama and the corresponding model must be installed.
- `google` Use the Google Translate API for translation. No additional configuration is needed, but network access to Google is required.

This only applies to the Vosk and SOSV models.

#### `-omn, --ollama_name`

The Ollama model to call for translation. The default value is empty.

Models with fewer than 1B parameters are recommended, such as `qwen2.5:0.5b` or `qwen3:0.6b`.

The model must already be downloaded in Ollama before it can be used.

This only applies to the Vosk and SOSV models.

#### `-vosk, --vosk_model`

The path to the local folder containing the Vosk model. The default value is empty.

This only applies to the Vosk model.

#### `-sosv, --sosv_model`

The path to the local folder containing the SOSV model. The default value is empty.

This only applies to the SOSV model.

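
The parameters described above map naturally onto Python's standard `argparse` module. The sketch below is a hedged illustration of how such a parser could be declared. It is not the project's actual `main.py`: the function name `build_parser` is hypothetical, the integer types for `-a` and `-d` are assumptions, and only the parameters documented in this section are covered.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags and defaults documented above."""
    p = argparse.ArgumentParser(description="caption engine (illustrative sketch)")
    p.add_argument("-e", "--caption_engine", default="gummy",
                   choices=["gummy", "vosk", "sosv"])
    # Assumption: 0/1 values are parsed as integers.
    p.add_argument("-a", "--audio_type", type=int, default=0, choices=[0, 1])
    p.add_argument("-d", "--display_caption", type=int, default=0, choices=[0, 1])
    p.add_argument("-t", "--target_language", default="none")
    p.add_argument("-s", "--source_language", default="auto")
    p.add_argument("-k", "--api_key", default="")
    p.add_argument("-tm", "--translation_model", default="ollama",
                   choices=["ollama", "google"])
    p.add_argument("-omn", "--ollama_name", default="")
    p.add_argument("-vosk", "--vosk_model", default="")
    p.add_argument("-sosv", "--sosv_model", default="")
    return p


args = build_parser().parse_args(["-e", "sosv", "-a", "1", "-d", "1"])
print(args.caption_engine, args.audio_type, args.target_language)  # sosv 1 none
```

Options that are not passed keep the documented defaults, which is why `target_language` prints as `none` here.
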
### Running Caption Engine Using Source Code

> The following content assumes you are familiar with configuring and using a Python environment.

First, download the project source code. The caption engine source code is in the project's `engine` directory. Then set up a Python environment; the dependencies are listed in `requirements.txt` in the `engine` directory.

Once the environment is configured, change into the `engine` directory and run the caption engine from there.

> Note: For readability, the commands below are split across multiple lines. If execution fails, remove the backslashes and run each command as a single line.

For example, to use the Gummy model with system audio output as the audio type, English as the source language, and Chinese as the target language, run:

```bash
python main.py \
    -e gummy \
    -k sk-******************************** \
    -a 0 \
    -d 1 \
    -s en \
    -t zh
```

To use the Vosk model with system audio output as the audio type, translating into English with the Ollama `qwen3:0.6b` model:

```bash
python main.py \
    -e vosk \
    -vosk D:\\Projects\\auto-caption\\engine\\models\\vosk-model-small-cn-0.22 \
    -a 0 \
    -d 1 \
    -t en \
    -tm ollama \
    -omn qwen3:0.6b
```

To use the SOSV model with the microphone as the audio type, automatic source language selection, and no translation:

```bash
python main.py \
    -e sosv \
    -sosv D:\\Projects\\auto-caption\\engine\\models\\sosv-int8 \
    -a 1 \
    -d 1 \
    -s auto \
    -t none
```

The result of running with the Gummy model is shown below:

![](../img/07.png)

### Running the Caption Engine Executable

First, download the executable for your platform from [GitHub Releases](https://github.com/HiMeditator/auto-caption/releases/tag/engine) (executables are currently provided only for Windows and Linux).

Then open a terminal in the directory containing the caption engine executable and run the caption engine from there.

The commands are the same as above, with `python main.py` replaced by the executable file name (for example, `engine-win.exe`).

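
Because line continuation differs between shells, one way to sidestep the backslash issue noted earlier is to build the argument list in Python and launch the engine with `subprocess`. The following is a minimal sketch under stated assumptions, not part of the project: `build_command` and `run_engine` are hypothetical helpers, the flags are the ones documented above, and the model path is only a placeholder.

```python
import subprocess
import sys


def build_command(options, launcher=None):
    """Return an argv list such as [python, 'main.py', '-e', 'sosv', ...].

    `launcher` defaults to the current Python interpreter running main.py;
    pass e.g. ['engine-win.exe'] to run a downloaded executable instead.
    """
    cmd = list(launcher) if launcher else [sys.executable, "main.py"]
    for flag, value in options.items():
        cmd += [flag, str(value)]
    return cmd


def run_engine(options, launcher=None):
    """Launch the engine and wait for it to exit (run from the engine's directory)."""
    return subprocess.run(build_command(options, launcher), check=True)


# Same settings as the SOSV example; the model path is only a placeholder.
sosv_options = {
    "-e": "sosv",
    "-sosv": r"D:\Projects\auto-caption\engine\models\sosv-int8",
    "-a": 1,
    "-d": 1,
    "-s": "auto",
    "-t": "none",
}
print(build_command(sosv_options, launcher=["engine-win.exe"]))
```

Passing the arguments as a list means no shell quoting or escaping is involved, so the Windows path can be written with single backslashes in a raw string.
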
@@ -128,3 +128,175 @@ sudo yum install pulseaudio pavucontrol
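Note that when a custom caption engine is used, none of the caption engine settings above take effect; the custom caption engine is configured entirely through the engine command.

If you are a developer and want to build a custom caption engine, see the [Caption Engine Explanation Document](../engine-manual/zh.md).
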
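## Using Caption Engine Standalone

### Runtime Parameter Description

> The following content assumes you are familiar with running programs from a terminal.

The complete set of runtime parameters accepted by the caption engine is shown below:

![](../img/06.png)

When the engine is used standalone, however, some of these parameters are unnecessary or should be left unchanged.

The descriptions below cover only the parameters needed for standalone use.
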
#### `-e, --caption_engine`

The caption engine model to use. Three options are currently available: `gummy`, `vosk`, and `sosv`.

The default value is `gummy`.

This applies to all models.

#### `-a, --audio_type`

The audio type to recognize: `0` for system audio output, `1` for microphone audio input.

The default value is `0`.

This applies to all models.

#### `-d, --display_caption`

Whether to display captions in the console: `0` hides them, `1` displays them.

The default value is `0`, but `1` is recommended when running the caption engine on its own.

This applies to all models.

#### `-t, --target_language`

> Note that the Vosk and SOSV models segment sentences poorly, which can make translated content hard to understand. Using translation with these two models is not recommended.

The target language for translation. All models support the following values:

- `none` No translation
- `zh` Simplified Chinese
- `en` English
- `ja` Japanese
- `ko` Korean

In addition, the `vosk` and `sosv` models support the following target languages:

- `de` German
- `fr` French
- `ru` Russian
- `es` Spanish
- `it` Italian

The default value is `none`.

This applies to all models.

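#### `-s, --source_language`

The source language to recognize. The default value is `auto`, meaning no source language is specified.

Specifying the source language can improve recognition accuracy to some extent. Use the language codes listed above to specify it.

This only applies to the Gummy and SOSV models.

The Gummy model accepts all of the languages listed above, plus Cantonese (`yue`).

The SOSV model accepts the following source languages: English, Chinese, Japanese, Korean, and Cantonese.
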
#### `-k, --api_key`

The Alibaba Cloud API key required by the Gummy model.

The default value is empty.

This only applies to the Gummy model.

#### `-tm, --translation_model`

The translation method used by the Vosk and SOSV models. The default value is `ollama`.

Supported values are:

- `ollama` Use a local Ollama model for translation. Ollama and the corresponding model must be installed.
- `google` Use the Google Translate API for translation. No additional configuration is needed, but network access to Google is required.

This only applies to the Vosk and SOSV models.

#### `-omn, --ollama_name`

The Ollama model to call for translation. The default value is empty.

Models with fewer than 1B parameters are recommended, such as `qwen2.5:0.5b` or `qwen3:0.6b`.

The model must already be downloaded in Ollama before it can be used.

This only applies to the Vosk and SOSV models.

#### `-vosk, --vosk_model`

The path to the local folder containing the Vosk model. The default value is empty.

This only applies to the Vosk model.

#### `-sosv, --sosv_model`

The path to the local folder containing the SOSV model. The default value is empty.

This only applies to the SOSV model.

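### Running Caption Engine Using Source Code

> The following content assumes you are familiar with configuring and using a Python environment.

First, download the project source code. The caption engine source code is in the project's `engine` directory. Then set up a Python environment; the required Python packages are listed in `requirements.txt` in the `engine` directory.

Once the environment is configured, change into the `engine` directory and run the caption engine from there.

> Note: For readability, the commands below are split across multiple lines. If execution fails, remove the backslashes and run each command as a single line.

For example, to use the Gummy model with system audio output as the audio type, English as the source language, and Chinese as the target language, run:
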
```bash
python main.py \
    -e gummy \
    -k sk-******************************** \
    -a 0 \
    -d 1 \
    -s en \
    -t zh
```

To use the Vosk model with system audio output as the audio type, translating into English with the Ollama `qwen3:0.6b` model:

```bash
python main.py \
    -e vosk \
    -vosk D:\\Projects\\auto-caption\\engine\\models\\vosk-model-small-cn-0.22 \
    -a 0 \
    -d 1 \
    -t en \
    -tm ollama \
    -omn qwen3:0.6b
```

To use the SOSV model with the microphone as the audio type, automatic source language selection, and no translation, run:

```bash
python main.py \
    -e sosv \
    -sosv D:\\Projects\\auto-caption\\engine\\models\\sosv-int8 \
    -a 1 \
    -d 1 \
    -s auto \
    -t none
```

The result of running with the Gummy model is shown below:

![](../img/07.png)

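### Running the Caption Engine Executable

First, download the executable for your platform from [GitHub Release](https://github.com/HiMeditator/auto-caption/releases/tag/engine) (caption engine executables are currently provided only for Windows and Linux).

Then open a terminal in the directory containing the caption engine executable and run the caption engine from there.

The commands are the same as above, with `python main.py` replaced by the executable file name (for example, `engine-win.exe`).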