feat(engine): 重构字幕引擎,新增 Sherpa-ONNX SenseVoice 语音识别模型

- 重构字幕引擎,将音频采集改为在新线程上进行
- 重构 audio2text 中的类,调整运行逻辑
- 更新 main 函数,添加对 Sosv 模型的支持
- 修改 AudioStream 类,默认使用 16000Hz 采样率
This commit is contained in:
himeditator
2025-09-06 20:49:46 +08:00
parent 2b7ce06f04
commit eba2c5ca45
14 changed files with 377 additions and 112 deletions

View File

@@ -55,15 +55,10 @@ class AudioStream:
self.FORMAT = 16
self.SAMP_WIDTH = 2
self.CHANNELS = 2
self.RATE = 48000
self.RATE = 16000
self.CHUNK_RATE = chunk_rate
self.CHUNK = self.RATE // chunk_rate
def reset_chunk_size(self, chunk_size: int):
"""
重新设置音频块大小
"""
self.CHUNK = chunk_size
def get_info(self):
dev_info = f"""
音频捕获进程:
@@ -84,7 +79,7 @@ class AudioStream:
启动音频捕获进程
"""
self.process = subprocess.Popen(
["parec", "-d", self.source, "--format=s16le", "--rate=48000", "--channels=2"],
["parec", "-d", self.source, "--format=s16le", "--rate=16000", "--channels=2"],
stdout=subprocess.PIPE
)