speech-to-text

Transcribe audio to text with Whisper models via inference.sh CLI. Models: Fast Whisper Large V3, Whisper V3 Large. Capabilities: transcription, translation, multi-language, timestamps. Use for: meeting transcription, subtitles, podcast transcripts, voice notes. Triggers: speech to text, transcription, whisper, audio to text, transcribe audio, voice to text, stt, automatic transcription, subtitles generation, transcribe meeting, audio transcription, whisper ai

Author: admin | Source: ClawHub

Speech-to-Text

Transcribe audio to text via inference.sh CLI.

Quick Start

bash
# Install the CLI and log in
curl -fsSL https://cli.inference.sh | sh && infsh login

# Transcribe an audio file by URL
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://audio.mp3"}'

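Install note: The install script only detects your OS/architecture, downloads the matching binary from dist.inference.sh, and verifies its SHA-256 checksum. It needs no elevated permissions and starts no background processes. Manual install and verification are also available.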
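The manual verification mentioned in the install note can be sketched as a generic SHA-256 comparison. This is a minimal sketch, not the installer's actual logic: the helper name `verify_sha256` is hypothetical, and where the expected digest comes from is an assumption (take it from wherever the project publishes it).

```shell
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# Usage: verify_sha256 FILE EXPECTED_HEX_DIGEST
# Succeeds (exit status 0) only when the two digests match.
verify_sha256() {
  file=$1
  # Normalize the expected digest to lowercase hex before comparing.
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$file" | awk '{print $1}')
  else
    # Fallback for systems that provide shasum instead of sha256sum.
    actual=$(shasum -a 256 "$file" | awk '{print $1}')
  fi
  [ "$actual" = "$expected" ]
}
```

For example, `verify_sha256 ./downloaded-binary "$EXPECTED_SHA256" && echo "checksum OK"`, where both the file name and the variable are placeholders.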

Available Models

Model	App ID	Best For
Fast Whisper V3	infsh/fast-whisper-large-v3	Fast transcription
Whisper V3 Large	infsh/whisper-v3-large	Highest accuracy

Examples

Basic Transcription

bash
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://meeting.mp3"}'

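With Timestamps

bash
# Save a sample input file to edit
infsh app sample infsh/fast-whisper-large-v3 --save input.json

# Edit input.json so that it contains:
# {
#   "audio_url": "https://podcast.mp3",
#   "timestamps": true
# }

# Run the app with the edited input file
infsh app run infsh/fast-whisper-large-v3 --input input.json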
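When the audio URL comes from a variable, the input file can be generated instead of edited by hand. The following is a minimal sketch, assuming the two fields shown in this example (`audio_url`, `timestamps`) are all that is needed. The helper name `write_stt_input` is hypothetical, and the URL is assumed to contain no double quotes or backslashes.

```shell
# Hypothetical helper: write an input file with an audio URL and a
# timestamps flag. Usage: write_stt_input URL true|false OUTPUT_FILE
write_stt_input() {
  url=$1
  timestamps=$2
  out=$3
  # Only the JSON booleans true and false are accepted.
  case $timestamps in
    true|false) ;;
    *) echo "timestamps must be true or false" >&2; return 1 ;;
  esac
  # Refuse characters that would need JSON escaping instead of guessing.
  case $url in
    *\"*|*\\*) echo "URL must not contain quotes or backslashes" >&2; return 1 ;;
  esac
  printf '{\n  "audio_url": "%s",\n  "timestamps": %s\n}\n' "$url" "$timestamps" > "$out"
}
```

Calling `write_stt_input https://podcast.mp3 true input.json` produces a file that can then be passed with `--input input.json`.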

Translation (to English)

bash
infsh app run infsh/whisper-v3-large --input '{
  "audio_url": "https://french-audio.mp3",
  "task": "translate"
}'

From Video

bash
# Extract audio from the video first
infsh app run infsh/video-audio-extractor --input '{"video_url": "https://video.mp4"}' > audio.json

# Transcribe the extracted audio
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "<audio-url-from-previous-step>"}'

Workflow: Video Subtitles

bash
# 1. Transcribe the video's audio
infsh app run infsh/fast-whisper-large-v3 --input '{
  "audio_url": "https://video.mp4",
  "timestamps": true
}' > transcript.json

# 2. Use the transcript to generate captions
infsh app run infsh/caption-videos --input '{
  "video_url": "https://video.mp4",
  "captions": "<transcript-from-step-1>"
}'
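If a standalone subtitle file is wanted rather than burned-in captions, timestamps have to be written in a fixed textual form. As an illustration, the SubRip (SRT) format writes times as `HH:MM:SS,mmm`. The sketch below converts a time in seconds to that form and prints one cue. It assumes each timestamped segment provides a start time and an end time in seconds plus its text; the source does not spell out the segment fields, so treat that as an assumption. The helper names are hypothetical.

```shell
# Convert seconds (possibly fractional) to an SRT timestamp HH:MM:SS,mmm.
srt_time() {
  awk -v t="$1" 'BEGIN {
    ms = int(t * 1000 + 0.5)            # total milliseconds, rounded
    h = int(ms / 3600000); ms -= h * 3600000
    m = int(ms / 60000);   ms -= m * 60000
    s = int(ms / 1000);    ms -= s * 1000
    printf "%02d:%02d:%02d,%03d\n", h, m, s, ms
  }'
}

# Print one SRT cue: index, time range, text, then a blank line.
# Usage: srt_cue INDEX START_SECONDS END_SECONDS TEXT
srt_cue() {
  printf '%s\n%s --> %s\n%s\n\n' "$1" "$(srt_time "$2")" "$(srt_time "$3")" "$4"
}
```

Calling `srt_cue` once per segment, with a running index, and redirecting the output to a `.srt` file yields a subtitle file most video players accept.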

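Supported Languages

Whisper supports 99+ languages, including:
English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, and many more.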

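Use Cases

- Meetings: Transcribe recordings
- Podcasts: Generate transcripts
- Subtitles: Create captions for videos
- Voice Notes: Convert to searchable text
- Interviews: Transcribe for research
- Accessibility: Make audio content accessible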

Output Format

Returns JSON with:

- text: Full transcription
- segments: Timestamped segments (if requested)
- language: Detected language
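The fields listed above can be read from a saved result with a JSON tool. This is a minimal sketch, assuming `jq` is installed and that the result was saved to a file (for example `transcript.json`) with `text`, `segments`, and `language` at the top level. The helper names are hypothetical, and the exact nesting of the CLI output is not confirmed by this page.

```shell
# Print the full transcription text from a saved result file.
transcript_text() {
  jq -r '.text' "$1"
}

# Print the detected language from a saved result file.
transcript_language() {
  jq -r '.language' "$1"
}

# Print how many timestamped segments the result contains (0 if absent).
transcript_segment_count() {
  jq -r '(.segments // []) | length' "$1"
}
```

For example, `transcript_text transcript.json > transcript.txt` saves only the plain text for searching or archiving.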

Related Skills

bash
# Full platform skill (150+ apps)
npx skills add inference-sh/skills@inference-sh

# Text-to-speech (the reverse direction)
npx skills add inference-sh/skills@text-to-speech

# Video generation (add captions)
npx skills add inference-sh/skills@ai-video-generation

# AI avatars (lip-sync with transcripts)
npx skills add inference-sh/skills@ai-avatar-video

Browse all audio apps: infsh app list --category audio

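Documentation

- Running Apps - How to run apps via CLI
- Audio Transcription Example - Complete transcription guide
- Apps Overview - Understanding the app ecosystem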
