release v0.2.0

- Update and add documentation
- Add new images
- Improve documentation structure and content
2026-05-06 17:17:31 +08:00 · 2025-07-05 17:11:25 +08:00
parent 50ea9c5e4c
commit 213426dace
32 changed files with 609 additions and 93 deletions
--- a/docs/user-manual/en.md
+++ b/docs/user-manual/en.md
@@ -0,0 +1,60 @@
+# Auto Caption User Manual
+
+Corresponding Version: v0.2.0
+
+## Software Introduction
+
+Auto Caption is a cross-platform caption display application. It captures streaming data from the system's audio input (recording) or output (playback) in real time and uses an audio-to-text model to generate captions for that audio. The default caption engine provided by the software (which uses the Alibaba Cloud Gummy model) supports recognition and translation in nine languages (Chinese, English, Japanese, Korean, German, French, Russian, Spanish, Italian).
+
+Currently, the default caption engine is fully functional only on Windows. On Linux, it can generate captions only for audio input (microphone); generating captions for audio output (playback) is not yet supported.
+
+![](../../assets/media/main_en.png)
+
+### Software Limitations
+
+To use the default caption service, you need to obtain an API KEY from Alibaba Cloud.
+
+The software is built with Electron, so its size is inevitably large.
+
+## Software Usage
+
+### Preparing the Alibaba Cloud Model Studio API KEY
+
+To use the default caption engine (Alibaba Cloud Gummy), you need to obtain an API KEY from Alibaba Cloud Model Studio and configure it in your local environment variables.
+
+**The international version of Alibaba Cloud does not provide the Gummy model, so users outside China currently cannot use the default caption engine. I am developing a new local caption engine so that all users have a default caption engine available.**
+
+Alibaba Cloud provides detailed tutorials for this:
+
+- [Obtain API KEY (Chinese)](https://help.aliyun.com/zh/model-studio/get-api-key)
+- [Configure API Key in Environment Variables (Chinese)](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
+
+### Modifying Settings
+
+Caption settings fall into three categories: general settings, caption engine settings, and caption style settings. Changes to general settings take effect immediately. For the other two categories, you need to click the "Apply" option in the upper right corner of the corresponding settings module before the changes take effect. If you click "Cancel Changes," the current modifications are discarded and the settings revert to their previous state.
+
+### Starting and Stopping Captions
+
+After completing all configurations, click the "Start Caption Engine" button on the interface to start the captions. If you need a separate caption display window, click the "Open Caption Window" button to activate the independent caption display window. To pause caption recognition, click the "Stop Caption Engine" button.
+
+### Adjusting the Caption Display Window
+
+The following image shows the caption display window, which displays the latest captions in real time. The three buttons in the upper right corner of the window do the following, in order: pin the window on top, open the caption control window, and close the caption display window. To adjust the width of the window, move the mouse to its left or right edge and drag.
+
+![](../img/01.png)
+
+### Exporting Caption Records
+
+In the caption control window, you can see the records of all collected captions. Click the "Export Caption Records" button to export the caption records as a JSON file.
+
+## Caption Engine
+
+The caption engine is a subprocess that captures streaming data from the system's audio input (recording) or output (playback) in real time and uses an audio-to-text model to generate captions for that audio. The generated captions are output via IPC as JSON data converted to strings and returned to the main program. The main program reads the caption data, processes it, and displays it in the window.
+
+The software provides a default caption engine. If you need a different caption engine, you can call it by enabling the custom engine option (other engines need to be developed specifically for this software). The engine path is the path to the custom caption engine on your computer, and the engine command contains the runtime parameters for the custom caption engine; fill these in according to the rules of that specific caption engine.
+
+![](../img/02_en.png)
+
+Note that when a custom caption engine is in use, none of the caption engine settings described above take effect; the custom caption engine is configured entirely through the engine command.
+
+If you are a developer and want to develop a custom caption engine, please refer to the [Caption Engine Explanation Document](../engine-manual/en.md).
--- a/docs/user-manual/ja.md
+++ b/docs/user-manual/ja.md
@@ -0,0 +1,62 @@
+# Auto Caption ユーザーマニュアル
+
+対応バージョン：v0.2.0
+
+この文書は大規模モデルを使用して翻訳されていますので、内容に正確でない部分があるかもしれません。
+
+## ソフトウェアの概要
+
+Auto Caption は、クロスプラットフォームの字幕表示ソフトウェアで、システムの音声入力（録音）または出力（音声再生）のストリーミングデータをリアルタイムで取得し、音声からテキストに変換するモデルを利用して対応する音声の字幕を生成します。このソフトウェアが提供するデフォルトの字幕エンジン（アリババクラウド Gummy モデルを使用）は、9つの言語（中国語、英語、日本語、韓国語、ドイツ語、フランス語、ロシア語、スペイン語、イタリア語）の認識と翻訳をサポートしています。
+
+現在、デフォルトの字幕エンジンは Windows プラットフォームでのみ完全な機能を利用できます。Linux プラットフォームでは、音声入力（マイク）からの字幕生成のみがサポートされており、音声出力（音声再生）からの字幕生成はまだサポートされていません。
+
+![](../../assets/media/main_ja.png)
+
+### ソフトウェアの欠点
+
+デフォルトの字幕サービスを使用するには、アリババクラウドの API KEY を取得する必要があります。
+
+ソフトウェアは Electron で構築されているため、そのサイズは避けられないほど大きいです。
+
+## ソフトウェアの使用方法
+
+### アリババクラウド百炼プラットフォームの API KEY の準備
+
+ソフトウェアが提供するデフォルトの字幕エンジン（アリババクラウド Gummy）を使用するには、アリババクラウド百炼プラットフォームから API KEY を取得し、ローカル環境変数に設定する必要があります。
+
+**アリババクラウドの国際版には Gummy モデルが提供されていないため、中国以外のユーザーは現在、デフォルトの字幕エンジンを使用できません。すべてのユーザーが利用できるように、新しいローカルの字幕エンジンを開発中です。**
+
+アリババクラウドは詳細なチュートリアルを提供していますので、以下のリンクを参照してください：
+
+- [API KEY の取得（中国語）](https://help.aliyun.com/zh/model-studio/get-api-key)
+- [環境変数を通じて API Key を設定する（中国語）](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
+
+### 設定の変更
+
+字幕の設定は3つのカテゴリーに分かれます：一般的な設定、字幕エンジンの設定、字幕スタイルの設定。注意すべき点として、一般的な設定の変更は即座に適用されます。しかし、他の2つの設定については、変更後に該当する設定モジュール右上の「適用」オプションをクリックすることで初めて変更が有効になります。「変更を取り消す」を選択すると、現在の変更は保存されず、前回の状態に戻ります。
+
+### 字幕の開始と停止
+
+すべての設定を完了したら、インターフェースの「字幕エンジンを開始」ボタンをクリックして字幕を開始できます。独立した字幕表示ウィンドウが必要な場合は、インターフェースの「字幕ウィンドウを開く」ボタンをクリックして独立した字幕表示ウィンドウをアクティブ化します。字幕認識を一時停止する必要がある場合は、「字幕エンジンを停止」ボタンをクリックします。
+
+### 字幕表示ウィンドウの調整
+
+下の図は字幕表示ウィンドウです。このウィンドウは現在の最新の字幕をリアルタイムで表示します。ウィンドウの右上にある3つのボタンの機能はそれぞれ次の通りです：ウィンドウを最前面に固定する、字幕制御ウィンドウを開く、字幕表示ウィンドウを閉じる。このウィンドウの幅は調整可能です。マウスをウィンドウの左右の端に移動し、ドラッグして幅を調整します。
+
+![](../img/01.png)
+
+### 字幕記録のエクスポート
+
+字幕制御ウィンドウでは、現在収集されたすべての字幕の記録を見ることができます。「字幕記録をエクスポート」ボタンをクリックすると、字幕記録をJSONファイルとしてエクスポートできます。
+
+## 字幕エンジン
+
+字幕エンジンとは、実際にはサブプログラムであり、システムの音声入力（録音）または出力（音声再生）のストリーミングデータをリアルタイムで取得し、音声からテキストに変換するモデルを利用して対応する音声の字幕を生成します。生成された字幕はIPC経由で文字列に変換されたJSONデータとして出力され、メインプログラムに返されます。メインプログラムは字幕データを読み取り、処理してウィンドウ上に表示します。
+
+ソフトウェアはデフォルトの字幕エンジンを提供しており、他の字幕エンジンが必要な場合は、カスタムエンジンオプションを開いて他の字幕エンジンを呼び出すことができます（他のエンジンはこのソフトウェアに対して開発する必要があります）。エンジンパスは、あなたのコンピュータ上のカスタム字幕エンジンのパスであり、エンジンコマンドはカスタム字幕エンジンの実行パラメータです。これらの部分は、その字幕エンジンの規則に従って記入する必要があります。
+
+![](../img/02_ja.png)
+
+カスタム字幕エンジンを使用する場合、前の字幕エンジンの設定はすべて無効になります。カスタム字幕エンジンの設定は完全にエンジンコマンドによって行われます。
+
+開発者の方で、カスタム字幕エンジンを開発したい場合は、[字幕エンジン説明文書](../engine-manual/ja.md)をご覧ください。
--- a/docs/user-manual/zh.md
+++ b/docs/user-manual/zh.md
@@ -0,0 +1,61 @@
+# Auto Caption 用户手册
+
+对应版本：v0.2.0
+
+## 软件简介
+
+Auto Caption 是一个跨平台的字幕显示软件，能够实时获取系统音频输入（录音）或输出（播放声音）的流式数据，并调用音频转文字的模型生成对应音频的字幕。软件提供的默认字幕引擎（使用阿里云 Gummy 模型）支持九种语言（中、英、日、韩、德、法、俄、西、意）的识别与翻译。
+
+目前软件默认字幕引擎只有在 Windows 平台下才拥有完整功能。在 Linux 平台下只能生成音频输入（麦克风）的字幕，暂不支持音频输出（播放声音）的字幕生成。
+
+![](../../assets/media/main_zh.png)
+
+### 软件缺点
+
+要使用默认字幕服务需要获取阿里云的 API KEY。
+
+软件使用 Electron 构建，因此软件体积不可避免的较大。
+
+## 软件使用
+
+### 准备阿里云百炼平台 API KEY
+
+要使用软件提供的默认字幕引擎（阿里云 Gummy），需要从阿里云百炼平台获取 API KEY 并在本机环境变量中配置。
+
+**国际版的阿里云服务并没有提供 Gummy 模型，因此目前非中国用户无法使用默认字幕引擎。我正在开发新的本地字幕引擎，以确保所有用户都有默认字幕引擎可以使用。**
+
+这部分阿里云提供了详细的教程，可参考：
+
+- [获取 API KEY](https://help.aliyun.com/zh/model-studio/get-api-key)
+
+- [将 API Key 配置到环境变量](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)
+
+### 修改设置
+
+字幕设置可以分为三类：通用设置、字幕引擎设置、字幕样式设置。需要注意的是，修改通用设置是立即生效的。但是对于其他两类设置，修改后需要点击对应设置模块右上角的“应用”选项，更改才会真正生效。如果点击“取消更改”那么当前修改将不会被保存，而是回退到上次修改的状态。
+
+### 启动和关闭字幕
+
+在修改完全部配置后，点击界面的“启动字幕引擎”按钮，即可启动字幕。如果需要独立的字幕展示窗口，单击界面的“打开字幕窗口”按钮即可激活独立的字幕展示窗口。如果需要暂停字幕识别，单击界面的“关闭字幕引擎”按钮即可。
+
+### 调整字幕展示窗口
+
+如下图为字幕展示窗口，该窗口实时展示当前最新字幕。窗口右上角三个按钮的功能分别是：将窗口固定在最前面、打开字幕控制窗口、关闭字幕展示窗口。该窗口宽度可以调整，将鼠标移动至窗口的左右边缘，拖动鼠标即可调整宽度。
+
+![](../img/01.png)
+
+### 字幕记录的导出
+
+在字幕控制窗口中可以看到当前收集的所有字幕的记录，点击“导出字幕记录”按钮，即可将字幕记录导出为 JSON 文件。
+
+## 字幕引擎
+
+所谓的字幕引擎实际上是一个子程序，它会实时获取系统音频输入（录音）或输出（播放声音）的流式数据，并调用音频转文字的模型生成对应音频的字幕。生成的字幕通过 IPC 输出为转换为字符串的 JSON 数据，并返回给主程序。主程序读取字幕数据，处理后显示在窗口上。
+
+软件提供了一个默认的字幕引擎，如果你需要其他的字幕引擎，可以通过打开自定义引擎选项来调用其他字幕引擎（其他引擎需要针对该软件进行开发）。其中引擎路径是自定义字幕引擎在你的电脑上的路径，引擎指令是自定义字幕引擎的运行参数，这部分需要按该字幕引擎的规则进行填写。
+
+![](../img/02_zh.png)
+
+注意使用自定义字幕引擎时，前面的字幕引擎的设置将全部不起作用，自定义字幕引擎的配置完全通过引擎指令进行配置。
+
+如果你是开发者，想开发自定义字幕引擎，请查看[字幕引擎说明文档](../engine-manual/zh.md)。