LFM2-Audio-1.5B-GGUF
This example demonstrates the LFM2-Audio-1.5B audio model.
Link to HF: LiquidAI/LFM2-Audio-1.5B.
The model supports the following modes:
- ASR:
  - input audio.wav, output text
- TTS:
  - input text, output audio.wav
- interleaved:
  - input text or audio.wav, output text and audio.wav
GGUFS
There are 3 GGUFs in total for this model.
Set $CKPT to the path of the directory containing the downloaded GGUFs.
Set $INPUT_WAV to the path of the input wav file.
Set $OUTPUT_WAV to the path where the output wav file should be written.
export CKPT=/data/playground/checkpoints/LFM2-Audio-1.5B-GGUF
export INPUT_WAV=/tmp/input.wav
export OUTPUT_WAV=/tmp/output.wav
(cd $CKPT && ls *.gguf)
audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf LFM2-Audio-1.5B-Q8_0.gguf mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf
Optionally, float16 ggufs can be downloaded and used by replacing Q8_0 with F16.
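The file names above differ only in the quantization suffix, so the three paths can be derived from $CKPT and a suffix. The following is a minimal sketch, not part of the release: the function names lfm2_gguf_paths and lfm2_check_ggufs are hypothetical, and the sketch assumes the F16 files follow exactly the same naming pattern as the Q8_0 files.

```shell
# Hypothetical helper: print the three GGUF paths for one quantization.
# $1 is the checkpoint directory, $2 is the suffix (default Q8_0, or F16).
lfm2_gguf_paths() {
    ckpt_dir=$1
    quant=${2:-Q8_0}
    printf '%s\n' \
        "$ckpt_dir/LFM2-Audio-1.5B-$quant.gguf" \
        "$ckpt_dir/mmproj-audioencoder-LFM2-Audio-1.5B-$quant.gguf" \
        "$ckpt_dir/audiodecoder-LFM2-Audio-1.5B-$quant.gguf"
}

# Hypothetical helper: report every expected file that is missing.
# Returns 0 only when all three files exist.
lfm2_check_ggufs() {
    lfm2_gguf_paths "$1" "${2:-Q8_0}" | {
        missing=0
        while IFS= read -r f; do
            if [ ! -f "$f" ]; then
                echo "missing: $f" >&2
                missing=1
            fi
        done
        [ "$missing" -eq 0 ]
    }
}
```

For example, `lfm2_check_ggufs "$CKPT" F16` would list any float16 file that has not been downloaded yet.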
Binaries
The runners folder contains runners for android-arm64, macos-arm64, ubuntu-arm64, and ubuntu-x64.
runners
├── android-arm64
│   └── lfm2-audio-android-arm64.zip
├── macos-arm64
│   └── lfm2-audio-macos-arm64.zip
├── ubuntu-arm64
│   └── lfm2-audio-ubuntu-arm64.zip
└── ubuntu-x64
    └── lfm2-audio-ubuntu-x64.zip
Each package contains llama-lfm2-audio and llama-mtmd-cli binaries.
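Picking the right package can be scripted from the output of uname. The sketch below is illustrative only: the function name lfm2_runner_platform is hypothetical, and the mapping from uname values to runner names is an assumption. Android also reports Linux with an aarch64 CPU, so android-arm64 has to be chosen explicitly rather than detected this way.

```shell
# Hypothetical helper: map OS and CPU names (as printed by `uname -s` and
# `uname -m`) to one of the runner directory names listed above.
# Both arguments are optional and default to the current machine.
lfm2_runner_platform() {
    os=${1:-$(uname -s)}
    arch=${2:-$(uname -m)}
    case "$os/$arch" in
        Linux/x86_64)              echo "ubuntu-x64" ;;
        Linux/aarch64|Linux/arm64) echo "ubuntu-arm64" ;;
        Darwin/arm64)              echo "macos-arm64" ;;
        *)
            echo "unsupported platform: $os/$arch" >&2
            return 1
            ;;
    esac
}

# Possible usage (not executed here):
#   platform=$(lfm2_runner_platform) &&
#       unzip "runners/$platform/lfm2-audio-$platform.zip"
```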
Run using llama-lfm2-audio
There are 3 supported modes
- ASR
- TTS
- interleaved
The mode is defined by the system prompt. The system prompt is subject to limitations; the binary checks them and raises an error if they are violated.
ASR
ASR requires -sys "Perform ASR." and --audio audio.wav for input. It prints the text to the console.
lfm2-audio-<platform>/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Perform ASR." --audio $INPUT_WAV
TTS
TTS requires -sys "Perform TTS.", -p "What is this obsession people have with books?" for input, and --output output.wav for output. It will save audio to output.wav.
lfm2-audio-<platform>/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Perform TTS." -p "What is this obsession people have with books?" --output $OUTPUT_WAV
Interleaved
Interleaved mode produces both text and audio as output, and can consume either text or audio as input.
lfm2-audio-<platform>/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Respond with interleaved text and audio." --audio $INPUT_WAV --output $OUTPUT_WAV
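The three invocations above share the same model arguments and differ in the system prompt and the input/output flags. Since the binary rejects unexpected system prompts, a wrapper script may want to look the prompt up instead of retyping it. A minimal sketch follows; the function name lfm2_system_prompt and the mode keywords asr, tts, and interleaved are hypothetical conveniences, while the prompt strings are copied from the commands above.

```shell
# Hypothetical helper: print the system prompt that selects a given mode.
# The keywords asr / tts / interleaved are illustrative, not CLI options.
lfm2_system_prompt() {
    case "$1" in
        asr)         echo "Perform ASR." ;;
        tts)         echo "Perform TTS." ;;
        interleaved) echo "Respond with interleaved text and audio." ;;
        *)
            echo "unknown mode: $1" >&2
            return 1
            ;;
    esac
}

# Possible usage (not executed here), with the same model arguments as above:
#   ... -sys "$(lfm2_system_prompt asr)" --audio "$INPUT_WAV"
```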
Run ASR using llama-mtmd-cli
Build llama-mtmd-cli following the standard build procedure.
lfm2-audio-<platform>/llama-mtmd-cli -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -p "<__media__>" -sys "Perform ASR." --audio $INPUT_WAV
Debug
For reproducible results, set --temp 0.
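One way to confirm that a run is reproducible is to execute the same command twice and compare what it prints. The sketch below is a generic check, not a feature of the binaries: the function name lfm2_is_reproducible is hypothetical, and it assumes the text of interest is written to standard output.

```shell
# Hypothetical helper: run the given command twice and compare stdout.
# Returns 0 if both runs print the same text, 1 if they differ,
# and 2 if either run exits with an error.
lfm2_is_reproducible() {
    first=$("$@") || return 2
    second=$("$@") || return 2
    [ "$first" = "$second" ]
}

# Possible usage (not executed here): pass the full ASR command shown
# earlier, with --temp 0 appended, as the arguments of the function.
```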