Mirror of https://github.com/HiMeditator/auto-caption.git, synced 2026-03-07 03:47:29 +08:00

docs(readme): update the description and add a terminal usage guide
Note that when a custom caption engine is used, all of the caption engine settings described above are ignored; the custom engine is configured entirely through its engine command.

If you are a developer and want to build a custom caption engine, please refer to the [Caption Engine Explanation Document](../engine-manual/en.md).
## Using the Caption Engine Standalone

### Runtime Parameter Description
> The following content assumes some familiarity with running programs from a terminal.

The complete set of runtime parameters accepted by the caption engine is shown below:

|
||||
|
||||
However, when used standalone, some parameters may not need to be used or should not be modified.
|
||||
|
||||
The following parameter descriptions only include necessary parameters.
|
||||
|
||||
#### `-e, --caption_engine`

The caption engine (model) to use. Three options are currently available: `gummy`, `vosk`, `sosv`.

The default value is `gummy`.

This parameter applies to all models.
#### `-a, --audio_type`

The audio type to recognize: `0` selects system audio output and `1` selects microphone audio input.

The default value is `0`.

This parameter applies to all models.
#### `-d, --display_caption`

Whether to print captions to the console: `0` disables printing, `1` enables it.

The default value is `0`, but `1` is recommended when running the caption engine standalone.

This parameter applies to all models.
#### `-t, --target_language`

> Note that the Vosk and SOSV models have poor sentence segmentation, which can make translated content difficult to understand. Using translation with these two models is not recommended.

Target language for translation. All models support the following target languages:

- `none` No translation
- `zh` Simplified Chinese
- `en` English
- `ja` Japanese
- `ko` Korean

Additionally, the `vosk` and `sosv` models support these target languages:

- `de` German
- `fr` French
- `ru` Russian
- `es` Spanish
- `it` Italian

The default value is `none`.

This parameter applies to all models.
#### `-s, --source_language`

Source language for recognition. The default value is `auto`, meaning the source language is detected automatically.

Specifying the source language explicitly can improve recognition accuracy. Use the language codes listed above.

This parameter applies only to the Gummy and SOSV models.

The Gummy model accepts all of the languages listed above, plus Cantonese (`yue`).

The SOSV model accepts the following source languages: English, Chinese, Japanese, Korean, and Cantonese.
#### `-k, --api_key`

Specify the Alibaba Cloud API key required by the `Gummy` model.

The default value is empty.

This parameter applies only to the Gummy model.
#### `-tm, --translation_model`

Specify the translation method for the Vosk and SOSV models. The default is `ollama`.

Supported values:

- `ollama` Translate with a local Ollama model. Requires the Ollama software and the chosen model to be installed.
- `google` Translate with the Google Translate API. No additional configuration is needed, but network access to Google is required.

This parameter applies only to the Vosk and SOSV models.
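For example, to have the Vosk engine translate into English via Google Translate instead of Ollama, an invocation might look like this (a sketch; `<path-to-vosk-model>` is a placeholder for your local Vosk model folder):

```shell
# Vosk recognition with Google Translate as the translation backend;
# replace <path-to-vosk-model> with your own local model folder
python main.py -e vosk -vosk <path-to-vosk-model> -d 1 -t en -tm google
```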
#### `-omn, --ollama_name`

Specify the Ollama model to use for translation. The default value is empty.

Models with fewer than 1B parameters are recommended, such as `qwen2.5:0.5b` or `qwen3:0.6b`.

The corresponding model must be downloaded in Ollama before it can be used.

This parameter applies only to the Vosk and SOSV models.
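If you have not downloaded the model yet, it can be fetched with the standard Ollama CLI (assuming Ollama is installed and running):

```shell
# download the translation model, then confirm it is available locally
ollama pull qwen3:0.6b
ollama list
```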
#### `-vosk, --vosk_model`

Specify the path to the local folder containing the Vosk model. The default value is empty.

This parameter applies only to the Vosk model.
#### `-sosv, --sosv_model`

Specify the path to the local folder containing the SOSV model. The default value is empty.

This parameter applies only to the SOSV model.
### Running the Caption Engine from Source Code

> The following content assumes that users of this method are familiar with configuring and using Python environments.

First, download the project source code. The caption engine source is located in the `engine` directory of the project. Then configure a Python environment; the project dependencies are listed in `engine/requirements.txt`.
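The setup steps above might look like this in practice (a sketch assuming `git` and Python 3 are on your `PATH`; adapt the activation line to your shell):

```shell
# fetch the source and install the engine's dependencies in a virtual environment
git clone https://github.com/HiMeditator/auto-caption.git
cd auto-caption/engine
python -m venv .venv
source .venv/bin/activate    # on Windows: .venv\Scripts\activate
pip install -r requirements.txt
```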
After configuration, enter the `engine` directory and run the caption engine from there.

For example, to use the Gummy model with system audio output, English as the source language, and Chinese as the target language, execute the following command:

> Note: For readability, the commands below are split across multiple lines. If execution fails, try removing the backslashes and running the command on a single line.
```bash
python main.py \
-e gummy \
-k sk-******************************** \
-a 0 \
-d 1 \
-s en \
-t zh
```

To use the Vosk model with system audio output, translation to English, and the Ollama `qwen3:0.6b` model for translation:
```bash
python main.py \
-e vosk \
-vosk D:\Projects\auto-caption\engine\models\vosk-model-small-cn-0.22 \
-a 0 \
-d 1 \
-t en \
-omn qwen3:0.6b
```

To use the SOSV model with microphone input, automatic source language detection, and no translation:
```bash
python main.py \
-e sosv \
-sosv D:\Projects\auto-caption\engine\models\sosv-int8 \
-a 1 \
-d 1 \
-s auto \
-t none
```

The result of running the Gummy model is shown below:


### Running the Caption Engine Executable

First, download the executable for your platform from [GitHub Releases](https://github.com/HiMeditator/auto-caption/releases/tag/engine) (currently, executables are provided only for Windows and Linux).

Then open a terminal in the directory containing the caption engine executable and run it from there.

Simply replace `python main.py` in the commands above with the executable's file name (for example, `engine-win.exe`).
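For instance, on Windows the Gummy command from the source-code section would become the following (a sketch assuming the executable is named `engine-win.exe`; substitute your own API key for the masked value):

```shell
engine-win.exe -e gummy -k sk-******************************** -a 0 -d 1 -s en -t zh
```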