feat(docs): update docs and add macOS platform setup guide

2026-05-04 07:47:33 +08:00 · 2025-07-08 22:44:11 +08:00
parent cbbaaa95a3
commit 3c9138f115
15 changed files with 463 additions and 244 deletions
--- a/docs/user-manual/en.md
+++ b/docs/user-manual/en.md
@@ -1,12 +1,14 @@
 # Auto Caption User Manual

-Corresponding Version: v0.2.0
+Corresponding Version: v0.3.0

 ## Software Introduction

 Auto Caption is a cross-platform caption display software that can real-time capture system audio input (recording) or output (playback) streaming data and use an audio-to-text model to generate captions for the corresponding audio. The default caption engine provided by the software (using Alibaba Cloud Gummy model) supports recognition and translation in nine languages (Chinese, English, Japanese, Korean, German, French, Russian, Spanish, Italian).

-Currently, the default caption engine only has full functionality on the Windows platform. On the Linux platform, it can only generate captions for audio input (microphone) and does not support generating captions for audio output (playback).
+Currently, the software's default caption engine is fully functional only on Windows and macOS. On macOS, capturing system audio output requires additional configuration.
+
+On Linux, it can only generate captions for audio input (microphone); generating captions for audio output (playback) is not yet supported.

 ![](../../assets/media/main_en.png)

@@ -14,6 +16,8 @@ Currently, the default caption engine only has full functionality on the Windows

 To use the default caption service, you need to obtain an API KEY from Alibaba Cloud.

+Capturing audio output on macOS requires additional configuration.
+
 The software is built with Electron, so its installation size is inevitably large.

 ## Software Usage
@@ -29,6 +33,22 @@ Alibaba Cloud provides detailed tutorials for this:
 - [Obtain API KEY (Chinese)](https://help.aliyun.com/zh/model-studio/get-api-key)
 - [Configure API Key in Environment Variables (Chinese)](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)

+### Capturing System Audio Output on macOS
+
+The caption engine cannot capture system audio output on macOS directly; an additional virtual audio driver must be installed. The current caption engine uses [BlackHole](https://github.com/ExistentialAudio/BlackHole). First, open Terminal and run one of the following commands (the first is recommended). The `brew` command requires [Homebrew](https://brew.sh).
+
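+```bash
+# Recommended: 2-channel driver
+brew install blackhole-2ch
+# Alternatives (install only one, and only if you need more channels):
+# brew install blackhole-16ch
+# brew install blackhole-64ch
+```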
+
+After installation completes, open `Audio MIDI Setup` (you can find it with Spotlight via `Cmd + Space`). Check whether BlackHole appears in the device list; if it does not, restart your computer.
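+
+If you prefer the command line, the same check can be sketched in Terminal. This is a minimal sketch and not part of Auto Caption: `has_blackhole` is a hypothetical helper, and the snippet assumes that `system_profiler SPAudioDataType` (the macOS audio device report) lists BlackHole by name once the driver is loaded.
+
+```bash
+# Hypothetical helper (not part of Auto Caption): succeeds if the
+# device listing read from stdin mentions BlackHole (case-insensitive).
+has_blackhole() {
+  grep -qi 'blackhole'
+}
+
+# macOS only: query the audio device report and look for BlackHole.
+# Assumes system_profiler lists installed audio devices by name.
+if system_profiler SPAudioDataType 2>/dev/null | has_blackhole; then
+  echo "BlackHole found"
+else
+  echo "BlackHole not found; try restarting your computer"
+fi
+```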
+
+Once BlackHole is installed, click the plus (+) button at the bottom left of the `Audio MIDI Setup` window and select "Create Multi-Output Device". In the new device's output list, enable both BlackHole and the device you actually listen through (for example, your speakers or headphones). Finally, set this multi-output device as your default audio output device.
+
+Now the caption engine can capture system audio output and generate captions.
+
 ### Modifying Settings

 Caption settings can be divided into three categories: general settings, caption engine settings, and caption style settings. Note that changes to general settings take effect immediately. For the other two categories, after making changes, you need to click the "Apply" option in the upper right corner of the corresponding settings module for the changes to take effect. If you click "Cancel Changes," the current modifications will not be saved and will revert to the previous state.
--- a/docs/user-manual/ja.md
+++ b/docs/user-manual/ja.md
@@ -1,6 +1,6 @@
 # Auto Caption ユーザーマニュアル

-対応バージョン：v0.2.0
+対応バージョン：v0.3.0

 この文書は大規模モデルを使用して翻訳されていますので、内容に正確でない部分があるかもしれません。

@@ -8,7 +8,9 @@

 Auto Caption は、クロスプラットフォームの字幕表示ソフトウェアで、システムの音声入力（録音）または出力（音声再生）のストリーミングデータをリアルタイムで取得し、音声からテキストに変換するモデルを利用して対応する音声の字幕を生成します。このソフトウェアが提供するデフォルトの字幕エンジン（アリババクラウド Gummy モデルを使用）は、9つの言語（中国語、英語、日本語、韓国語、ドイツ語、フランス語、ロシア語、スペイン語、イタリア語）の認識と翻訳をサポートしています。

-現在、デフォルトの字幕エンジンは Windows プラットフォームでのみ完全な機能を利用できます。Linux プラットフォームでは、音声入力（マイク）からの字幕生成のみがサポートされており、音声出力（音声再生）からの字幕生成はまだサポートされていません。
+現在、ソフトウェアのデフォルト字幕エンジンは Windows と macOS プラットフォームでのみ完全な機能を有しています。macOS でシステムオーディオ出力を取得するには追加の設定が必要です。
+
+Linux プラットフォームでは、オーディオ入力（マイク）からの字幕生成のみ可能で、現在オーディオ出力（再生音）からの字幕生成はサポートしていません。

 ![](../../assets/media/main_ja.png)

@@ -16,11 +18,13 @@ Auto Caption は、クロスプラットフォームの字幕表示ソフトウ

 デフォルトの字幕サービスを使用するには、アリババクラウドの API KEY を取得する必要があります。

+macOS プラットフォームでオーディオ出力を取得するには追加の設定が必要です。
+
 ソフトウェアは Electron で構築されているため、サイズが大きくなることは避けられません。

 ## ソフトウェアの使用方法

-### アリババクラウド百炼プラットフォームの API KEY の準備
+### 百炼プラットフォームの API KEY の準備

 ソフトウェアが提供するデフォルトの字幕エンジン（アリババクラウド Gummy）を使用するには、アリババクラウド百炼プラットフォームから API KEY を取得し、ローカル環境変数に設定する必要があります。

@@ -31,6 +35,22 @@ Auto Caption は、クロスプラットフォームの字幕表示ソフトウ
 - [API KEY の取得（中国語）](https://help.aliyun.com/zh/model-studio/get-api-key)
 - [環境変数を通じて API Key を設定する（中国語）](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)

+### macOS でのシステムオーディオ出力の取得方法
+
+字幕エンジンは macOS プラットフォームで直接システムオーディオ出力を取得できず、追加のドライバーインストールが必要です。現在の字幕エンジンでは [BlackHole](https://github.com/ExistentialAudio/BlackHole) を使用しています。まずターミナルを開き、以下のいずれかのコマンドを実行してください（最初のオプションを推奨します）：
+
+```bash
+# 推奨：2 チャンネル版
+brew install blackhole-2ch
+# 他のバージョン（より多くのチャンネルが必要な場合のみ、いずれか 1 つをインストール）：
+# brew install blackhole-16ch
+# brew install blackhole-64ch
+```
+
+インストール完了後、`オーディオ MIDI 設定`（`cmd + space` で検索できます）を開きます。デバイスリストに BlackHole が表示されているか確認し、表示されていない場合はコンピュータを再起動してください。
+
+BlackHole のインストールが確認できたら、`オーディオ MIDI 設定` ページで左下のプラス（+）ボタンをクリックし、「マルチ出力デバイスを作成」を選択します。出力に BlackHole と希望するオーディオ出力先の両方を含めてください。最後に、このマルチ出力デバイスをデフォルトのオーディオ出力デバイスに設定します。
+
+これで字幕エンジンがシステムオーディオ出力をキャプチャし、字幕を生成できるようになります。
+
 ### 設定の変更

 字幕の設定は3つのカテゴリーに分かれます：一般的な設定、字幕エンジンの設定、字幕スタイルの設定。注意すべき点として、一般的な設定の変更は即座に適用されます。しかし、他の2つの設定については、変更後に該当する設定モジュール右上の「適用」オプションをクリックすることで初めて変更が有効になります。「変更を取り消す」を選択すると、現在の変更は保存されず、前回の状態に戻ります。
--- a/docs/user-manual/zh.md
+++ b/docs/user-manual/zh.md
@@ -1,12 +1,14 @@
 # Auto Caption 用户手册

-对应版本：v0.2.0
+对应版本：v0.3.0

 ## 软件简介

 Auto Caption 是一个跨平台的字幕显示软件，能够实时获取系统音频输入（录音）或输出（播放声音）的流式数据，并调用音频转文字的模型生成对应音频的字幕。软件提供的默认字幕引擎（使用阿里云 Gummy 模型）支持九种语言（中、英、日、韩、德、法、俄、西、意）的识别与翻译。

-目前软件默认字幕引擎只有在 Windows 平台下才拥有完整功能。在 Linux 平台下只能生成音频输入（麦克风）的字幕，暂不支持音频输出（播放声音）的字幕生成。
+目前软件默认字幕引擎只有在 Windows 和 macOS 平台下才拥有完整功能，在 macOS 要获取系统音频输出需要额外配置。
+
+在 Linux 平台下只能生成音频输入（麦克风）的字幕，暂不支持音频输出（播放声音）的字幕生成。

 ![](../../assets/media/main_zh.png)

@@ -14,13 +16,17 @@ Auto Caption 是一个跨平台的字幕显示软件，能够实时获取系统

 要使用默认字幕服务需要获取阿里云的 API KEY。

+在 macOS 平台获取音频输出需要额外配置。
+
 软件使用 Electron 构建，因此软件体积不可避免地较大。

 ## 软件使用

 ### 准备阿里云百炼平台 API KEY

-要使用软件提供的默认字幕引擎（阿里云 Gummy），需要从阿里云百炼平台获取 API KEY 并在本机环境变量中配置。
+要使用软件提供的默认字幕引擎（阿里云 Gummy），需要从阿里云百炼平台获取 API KEY，然后将 API KEY 添加到软件设置中或者配置到环境变量中（仅 Windows 平台支持读取环境变量中的 API KEY）。
+
+![](../../assets/media/api_zh.png)

 **国际版的阿里云服务并没有提供 Gummy 模型，因此目前非中国用户无法使用默认字幕引擎。我正在开发新的本地字幕引擎，以确保所有用户都有默认字幕引擎可以使用。**

@@ -30,6 +36,22 @@ Auto Caption 是一个跨平台的字幕显示软件，能够实时获取系统

 - [将 API Key 配置到环境变量](https://help.aliyun.com/zh/model-studio/configure-api-key-through-environment-variables)

+### macOS 获取系统音频输出
+
+字幕引擎无法在 macOS 平台直接获取系统的音频输出，需要安装额外的驱动。目前字幕引擎采用的是 [BlackHole](https://github.com/ExistentialAudio/BlackHole)。首先打开终端，执行以下命令之一（建议选择第一个）：
+
+```bash
+# 推荐：2 声道版本
+brew install blackhole-2ch
+# 其他版本（仅在需要更多声道时选择其一安装）：
+# brew install blackhole-16ch
+# brew install blackhole-64ch
+```
+
+安装完成后打开 `音频 MIDI 设置`（按 `cmd + space` 打开聚焦搜索即可找到）。检查设备列表中是否有 BlackHole 设备，如果没有，请重启电脑。
+
+在确定安装好 BlackHole 设备后，在 `音频 MIDI 设置` 页面，点击左下角的加号，选择“创建多输出设备”。在输出中包含 BlackHole 和你想要的音频输出目标。最后将该多输出设备设置为默认音频输出设备。
+
+现在字幕引擎就能捕获系统的音频输出并生成字幕了。
+
 ### 修改设置

 字幕设置可以分为三类：通用设置、字幕引擎设置、字幕样式设置。需要注意的是，修改通用设置是立即生效的。但是对于其他两类设置，修改后需要点击对应设置模块右上角的“应用”选项，更改才会真正生效。如果点击“取消更改”那么当前修改将不会被保存，而是回退到上次修改的状态。