This model is deprecated. Use LFM2.5-Audio-1.5B for improved ASR, TTS, and CPU-friendly inference.
LFM2-Audio-1.5B was the original fully interleaved audio/text model. It has been superseded by LFM2.5-Audio-1.5B, which features a custom LFM-based audio detokenizer and improved performance.
Specifications
| Property | Value |
|---|---|
| Parameters | 1.5B (1.2B LM + 115M audio encoder) |
| Context Length | 32K tokens |
| Audio Output | 24kHz (Mimi codec) |
| Supported Language | English |
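As a rough back-of-envelope (not from the official docs), the parameter counts above imply the following weight-only memory footprint. This ignores activations, KV cache, and runtime overhead:

```python
# Illustrative weight-only memory estimate from the parameter counts above.
# Ignores activations, KV cache, and runtime overhead.
params = 1.2e9 + 115e6  # 1.2B language model + 115M audio encoder
for bytes_per_param, label in [(2, "fp16/bf16"), (1, "int8")]:
    print(f"{label}: ~{params * bytes_per_param / 1e9:.2f} GB")
```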
Quick Start
Install:

```shell
pip install liquid-audio
pip install "liquid-audio[demo]"              # optional, for demo dependencies
pip install flash-attn --no-build-isolation   # optional, for FlashAttention 2
```
Gradio Demo:

```shell
liquid-audio-demo
# Starts a web server on http://localhost:7860/
```
Multi-Turn Chat:

```python
import torch
import torchaudio
from liquid_audio import LFM2AudioModel, LFM2AudioProcessor, ChatState

# Load the processor and model
HF_REPO = "LiquidAI/LFM2-Audio-1.5B"
processor = LFM2AudioProcessor.from_pretrained(HF_REPO).eval()
model = LFM2AudioModel.from_pretrained(HF_REPO).eval()

# Set up the chat
chat = ChatState(processor)
chat.new_turn("system")
chat.add_text("Respond with interleaved text and audio.")
chat.end_turn()

chat.new_turn("user")
wav, sampling_rate = torchaudio.load("question.wav")
chat.add_audio(wav, sampling_rate)
chat.end_turn()

chat.new_turn("assistant")

# Generate interleaved text and audio tokens
text_out, audio_out = [], []
for t in model.generate_interleaved(
    **chat, max_new_tokens=512, audio_temperature=1.0, audio_top_k=4
):
    if t.numel() == 1:  # single-element tensor -> text token
        print(processor.text.decode(t), end="", flush=True)
        text_out.append(t)
    else:               # multi-element tensor -> audio frame
        audio_out.append(t)

# Stack the audio frames (dropping the last one), detokenize with the
# Mimi codec (which returns audio at 24 kHz), and save
mimi_codes = torch.stack(audio_out[:-1], 1).unsqueeze(0)
with torch.no_grad():
    waveform = processor.mimi.decode(mimi_codes)[0]
torchaudio.save("answer.wav", waveform.cpu(), 24_000)
```
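The branch on `t.numel()` above relies on a convention of the streaming API: text tokens arrive one id at a time, while each audio step yields a vector of Mimi codebook indices. A minimal pure-Python sketch of that demultiplexing, using synthetic lists in place of tensors (the 8-entry frames are an illustrative assumption, not the model's actual codebook count):

```python
def demux(stream):
    """Route interleaved tokens: 1-element items are text, longer ones audio frames."""
    text_out, audio_out = [], []
    for t in stream:
        (text_out if len(t) == 1 else audio_out).append(t)
    return text_out, audio_out

# Synthetic stream: two text ids interleaved with two 8-entry audio frames.
stream = [[17], [0] * 8, [42], [1] * 8]
text, audio = demux(stream)
print(len(text), len(audio))  # -> 2 2
```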
GGUF Runners
Setup:

```shell
export CKPT=/path/to/LFM2-Audio-1.5B-GGUF
export INPUT_WAV=/path/to/input.wav
export OUTPUT_WAV=/path/to/output.wav
```
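Optionally, a quick sanity check (not part of the official setup) that the three GGUF files the commands below expect are actually present under `$CKPT`:

```shell
check_gguf() {
  # Fail fast if any of the three expected GGUF files is missing from "$1".
  for f in "$1/LFM2-Audio-1.5B-Q8_0.gguf" \
           "$1/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf" \
           "$1/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf"; do
    [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
  done
  echo "all GGUF files present"
}
check_gguf "$CKPT" || true
```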
ASR (Audio to Text):

```shell
./llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf \
  --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf \
  -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf \
  -sys "Perform ASR." --audio $INPUT_WAV
```
TTS (Text to Audio):

```shell
./llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf \
  --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf \
  -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf \
  -sys "Perform TTS." \
  -p "What is this obsession people have with books?" \
  --output $OUTPUT_WAV
```
Interleaved Mode:

```shell
./llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf \
  --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf \
  -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf \
  -sys "Respond with interleaved text and audio." \
  --audio $INPUT_WAV --output $OUTPUT_WAV
```
Runners are available for macos-arm64, ubuntu-arm64, ubuntu-x64, and android-arm64.
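A host can be matched to one of those runner names via `uname`; the mapping below is my own sketch, not official guidance (android-arm64 is omitted because Android also reports `Linux`/`aarch64` through `uname`):

```shell
# Map this host to one of the published runner targets listed above.
runner_target() {
  case "$(uname -s)-$(uname -m)" in
    Darwin-arm64)  echo macos-arm64 ;;
    Linux-aarch64) echo ubuntu-arm64 ;;
    Linux-x86_64)  echo ubuntu-x64 ;;
    *)             echo unknown ;;
  esac
}
runner_target
```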