
LFM2-Audio-1.5B-GGUF

This example demonstrates the LFM2-Audio-1.5B audio model.

Hugging Face model: LiquidAI/LFM2-Audio-1.5B.

The model supports the following modes:

- ASR: input audio.wav, output text
- TTS: input text, output audio.wav
- interleaved: input text or audio.wav, output text and audio.wav

GGUFs

There are 3 GGUFs in total for this model: the main model, the audio encoder (mmproj-audioencoder), and the audio decoder (audiodecoder).

Set $CKPT to the directory containing the downloaded GGUFs, $INPUT_WAV to the input WAV file, and $OUTPUT_WAV to the path where generated audio should be written.

export CKPT=/data/playground/checkpoints/LFM2-Audio-1.5B-GGUF
export INPUT_WAV=/tmp/input.wav
export OUTPUT_WAV=/tmp/output.wav

(cd $CKPT && ls *.gguf)
audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf  LFM2-Audio-1.5B-Q8_0.gguf  mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf
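Before running anything, it can help to confirm that all three files are in place. A minimal sketch, assuming bash and the Q8_0 file names listed above; `check_ggufs` is a hypothetical helper name, not something shipped with the runners:

```shell
# Hypothetical helper: check that a directory contains the three Q8_0 GGUFs
# listed above. Prints each missing file to stderr; returns 1 if any is missing.
check_ggufs() {
  local dir="$1" missing=0 f
  for f in \
    LFM2-Audio-1.5B-Q8_0.gguf \
    mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf \
    audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_ggufs "$CKPT" && echo "all GGUFs found"`.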

Optionally, float16 GGUFs can be downloaded and used by replacing Q8_0 with F16 in the file names.
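To switch between Q8_0 and F16 without editing every command, the three paths can be derived from a single suffix. A sketch, assuming the F16 files follow exactly the same naming pattern as the Q8_0 ones; `set_gguf_paths`, `MODEL`, `MMPROJ` and `VOCODER` are hypothetical names:

```shell
# Hypothetical helper: derive the three GGUF paths from a checkpoint directory
# and a quantization suffix (Q8_0 by default, or F16).
set_gguf_paths() {
  local ckpt="$1" quant="${2:-Q8_0}"
  MODEL="$ckpt/LFM2-Audio-1.5B-${quant}.gguf"
  MMPROJ="$ckpt/mmproj-audioencoder-LFM2-Audio-1.5B-${quant}.gguf"
  VOCODER="$ckpt/audiodecoder-LFM2-Audio-1.5B-${quant}.gguf"
}
```

After `set_gguf_paths "$CKPT" F16`, the commands can use `-m "$MODEL" --mmproj "$MMPROJ" -mv "$VOCODER"` in place of the hard-coded Q8_0 paths.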

Binaries

The runners folder contains prebuilt runners for android-arm64, macos-arm64, ubuntu-arm64, and ubuntu-x64.

runners
├── android-arm64
│   └── lfm2-audio-android-arm64.zip
├── macos-arm64
│   └── lfm2-audio-macos-arm64.zip
├── ubuntu-arm64
│   └── lfm2-audio-ubuntu-arm64.zip
└── ubuntu-x64
    └── lfm2-audio-ubuntu-x64.zip

Each package contains llama-lfm2-audio and llama-mtmd-cli binaries.
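On a desktop or server host, the right package can be picked from the output of `uname`. A sketch, assuming `uname -s`/`uname -m` report `Darwin`/`arm64` on Apple Silicon and `Linux` with `aarch64` or `x86_64` on Ubuntu; Android is not detected and has to be chosen explicitly. `runner_platform` is a hypothetical name:

```shell
# Hypothetical helper: map OS/arch (defaults: uname -s / uname -m) to a
# runner folder name. Android is not detected; choose android-arm64 manually.
runner_platform() {
  local os="${1:-$(uname -s)}" arch="${2:-$(uname -m)}"
  case "$os/$arch" in
    Darwin/arm64)                echo macos-arm64 ;;
    Linux/aarch64 | Linux/arm64) echo ubuntu-arm64 ;;
    Linux/x86_64)                echo ubuntu-x64 ;;
    *) echo "no prebuilt runner for $os/$arch" >&2; return 1 ;;
  esac
}
```

For example, `p=$(runner_platform) && unzip "runners/$p/lfm2-audio-$p.zip"`. The commands in this README refer to the extracted binaries as `lfm2-audio-<platform>/...`; it is assumed here that the zip unpacks to that directory.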

Run using `llama-lfm2-audio`

There are 3 supported modes:

- ASR
- TTS
- interleaved

The mode is selected by the system prompt (-sys). The accepted system prompts are restricted: the binary checks the prompt and raises an error if it is not supported.
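Since the mode is chosen purely by the system prompt, a script can keep the three prompt strings used in this README's examples in one place. A sketch; `mode_sys_prompt` and the mode names `asr`, `tts` and `interleaved` are hypothetical, and it assumes the binary accepts the prompts exactly as written in the examples:

```shell
# Hypothetical helper: print the -sys string for a mode, using the prompts
# from the examples in this README; unknown modes are rejected.
mode_sys_prompt() {
  case "$1" in
    asr)         echo "Perform ASR." ;;
    tts)         echo "Perform TTS." ;;
    interleaved) echo "Respond with interleaved text and audio." ;;
    *) echo "unknown mode: $1 (expected asr, tts or interleaved)" >&2; return 1 ;;
  esac
}
```

A command can then pass `-sys "$(mode_sys_prompt tts)"`, so a typo in the prompt string fails in the script rather than in the binary.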

ASR

ASR requires -sys "Perform ASR." and an input WAV file passed with --audio. The transcription is printed to the console.

lfm2-audio-<platform>/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Perform ASR." --audio $INPUT_WAV
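To keep the transcription as well as seeing it on the console, the ASR command can be wrapped and piped through `tee`. A sketch, assuming `BIN` (hypothetical) holds the path to `llama-lfm2-audio` for your platform and `CKPT` is set as above. Whether the console output contains only the transcript or also log lines is not specified here, so the saved file may need filtering:

```shell
# Hypothetical wrapper around the ASR command above. BIN and CKPT must be set.
# Usage: asr INPUT.wav TRANSCRIPT.txt
# Note: without `set -o pipefail`, a failure of BIN is masked by tee's status.
asr() {
  local wav="$1" txt="$2"
  "$BIN" -m "$CKPT/LFM2-Audio-1.5B-Q8_0.gguf" \
    --mmproj "$CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf" \
    -mv "$CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf" \
    -sys "Perform ASR." --audio "$wav" | tee "$txt"
}
```

For example, after `export BIN=lfm2-audio-ubuntu-x64/llama-lfm2-audio`, run `asr "$INPUT_WAV" /tmp/transcript.txt`.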

TTS

TTS requires -sys "Perform TTS.", the input text passed with -p (for example, -p "What is this obsession people have with books?"), and an output path passed with --output. The generated audio is saved to that file.

lfm2-audio-<platform>/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Perform TTS." -p "What is this obsession people have with books?" --output $OUTPUT_WAV
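For longer texts, one option is to synthesize each line of a text file separately. A sketch, assuming `BIN` (hypothetical) holds the path to `llama-lfm2-audio` and `CKPT` is set as above; `tts_file` and the `line_N.wav` naming are also hypothetical. The binary's stdin is redirected from `/dev/null` so it cannot consume the lines the loop is reading:

```shell
# Hypothetical helper: synthesize each non-empty line of a text file to
# OUT_DIR/line_N.wav with the TTS command above. BIN and CKPT must be set.
tts_file() {
  local in_txt="$1" out_dir="$2" n=0 line
  mkdir -p "$out_dir"
  # `|| [ -n "$line" ]` also handles a last line without a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    if [ -z "$line" ]; then continue; fi
    n=$((n + 1))
    # </dev/null keeps the binary from reading the loop's input file
    "$BIN" -m "$CKPT/LFM2-Audio-1.5B-Q8_0.gguf" \
      --mmproj "$CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf" \
      -mv "$CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf" \
      -sys "Perform TTS." -p "$line" --output "$out_dir/line_$n.wav" </dev/null
  done < "$in_txt"
}
```

Each line becomes one separate TTS run, so there is no context carried between lines.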

Interleaved

Interleaved mode produces both text and audio as output and accepts either text or audio as input.

lfm2-audio-<platform>/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Respond with interleaved text and audio." --audio $INPUT_WAV --output $OUTPUT_WAV
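The example above feeds audio; since interleaved mode also accepts text, a wrapper can choose the input flag. A sketch, assuming text input is passed with -p as in the TTS example (this README does not show an interleaved text-input command). `interleaved`, `BIN`, and the rule "an existing file is sent as audio" are hypothetical choices:

```shell
# Hypothetical wrapper for interleaved mode. If INPUT is an existing file it is
# sent as audio (--audio); otherwise it is sent as a text prompt (-p).
# BIN and CKPT must be set. Usage: interleaved INPUT OUTPUT.wav
interleaved() {
  local input="$1" out_wav="$2" in_flag=-p
  if [ -f "$input" ]; then in_flag=--audio; fi
  "$BIN" -m "$CKPT/LFM2-Audio-1.5B-Q8_0.gguf" \
    --mmproj "$CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf" \
    -mv "$CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf" \
    -sys "Respond with interleaved text and audio." \
    "$in_flag" "$input" --output "$out_wav"
}
```

For example, `interleaved "$INPUT_WAV" "$OUTPUT_WAV"` or `interleaved "Tell me a joke" "$OUTPUT_WAV"`.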

Run ASR using `llama-mtmd-cli`

llama-mtmd-cli is included in each runner package; alternatively, build it following the standard llama.cpp build procedure.

lfm2-audio-<platform>/llama-mtmd-cli -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -p "<__media__>" -sys "Perform ASR." --audio $INPUT_WAV
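To transcribe a directory of recordings with `llama-mtmd-cli`, the command can run once per file. A sketch; `MTMD_BIN` and `transcribe_dir` are hypothetical, and each transcript is written next to its WAV as `<name>.txt`. The `<__media__>` prompt stays in quotes so the shell does not treat `<` as a redirection:

```shell
# Hypothetical helper: run the llama-mtmd-cli ASR command above on every .wav
# in DIR, writing stdout to a .txt file with the same base name.
# MTMD_BIN (path to llama-mtmd-cli) and CKPT must be set.
transcribe_dir() {
  local dir="$1" wav
  for wav in "$dir"/*.wav; do
    [ -e "$wav" ] || continue  # glob matched nothing
    "$MTMD_BIN" -m "$CKPT/LFM2-Audio-1.5B-Q8_0.gguf" \
      --mmproj "$CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf" \
      -p "<__media__>" -sys "Perform ASR." --audio "$wav" \
      </dev/null >"${wav%.wav}.txt"
  done
}
```

Only stdout is captured; anything the binary writes to stderr still goes to the console.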

Debug

For reproducible results, add --temp 0 to the command.
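One way to check that --temp 0 gives repeatable output is to run the same command twice and compare what it prints. A sketch; `check_repeatable` is a hypothetical helper, and it compares only stdout (stderr is discarded), so it suits ASR rather than TTS, whose result is written to a file:

```shell
# Hypothetical helper: run a command twice; return 0 if both runs print the same
# stdout, 1 if they differ, 2 if either run fails.
check_repeatable() {
  local first second
  first=$("$@" 2>/dev/null) || return 2
  second=$("$@" 2>/dev/null) || return 2
  [ "$first" = "$second" ]
}
```

For example, pass the full ASR command line with `--temp 0` appended as the arguments, and follow it with `&& echo identical`.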


GGUF

Model size: 0.3B params

Architecture: this model cannot be used as an LLM; use it via --model-vocoder in TTS examples.



Model tree for LiquidAI/LFM2-Audio-1.5B-GGUF

Base model: LiquidAI/LFM2-1.2B
Finetuned: LiquidAI/LFM2-Audio-1.5B
Quantized: this model

Collection including LiquidAI/LFM2-Audio-1.5B-GGUF: 🔈 LFM2-Audio, an end-to-end audio foundation model designed for low latency and real-time conversations.