Model configuration

#1 opened by anjulRajendraSharma

Hi, can you please share the model configuration used for pre-training and fine-tuning?

Owner

Hi, you can check this draft PR: https://github.com/k2-fsa/icefall/pull/1745

Owner

Pruned RNN-T fine-tuning on LibriSpeech 960h:

./hubert/finetune.py \
  --world-size 8 \
  --num-epochs 222 \
  --start-epoch 1 \
  --use-fp16 0 \
  --exp-dir hubert/exp_finetune \
  --pretrained-dir download/hubert/hubert_large_ll60k.pt \
  --full-libri 1 \
  --max-duration 80 \
  --accum-grad 1 \
  --do-normalize 1 \
  --encoder-layers 24 \
  --encoder-embed-dim 1024 \
  --encoder-ffn-embed-dim 4096 \
  --encoder-attention-heads 16 \
  --final-dim 768 \
  --layer-norm-first 1 \
  --untie-final-proj 1 \
  --extractor-mode "layer_norm" \
  --mask-prob 0.50 \
  --mask-channel-prob 0.25 \
  --mask-channel-length 64 \
  --encoder-layerdrop 0.1 \
  --activation-dropout 0.1 \
  --feature-grad-mult 0.1 \
  --base-lr 0.001 \
  --lr-epochs 10.5

Owner

Pruned RNN-T decoding:

for ((epoch=2; epoch<=19; epoch+=1)); do
  for ((avg=1; avg<=$epoch-1; avg+=1)); do
    ./hubert/decode.py \
      --epoch $epoch \
      --avg $avg \
      --exp-dir ./hubert/exp_finetune \
      --max-duration 1000 \
      --decoding-method greedy_search \
      --do-normalize 1 \
      --encoder-layers 24 \
      --encoder-embed-dim 1024 \
      --encoder-ffn-embed-dim 4096 \
      --encoder-attention-heads 16 \
      --final-dim 768 \
      --layer-norm-first 1 \
      --untie-final-proj 1 \
      --extractor-mode "layer_norm"
  done
done

The HuBERT Large pre-trained model (hubert_large_ll60k.pt) is downloaded from fairseq.

Thank you @yfyeung for your quick response.
I assume you used the same configuration for the large Zipformer pre-training model too.

Do Zipformer pre-training and fine-tuning support streaming?
I found that a causal mode is implemented in Zipformer, but it does not seem to be exposed for use.

Owner

No. The streaming Zipformer (--causal 1) differs from the non-streaming Zipformer (--causal 0) in its model architecture, so a model pre-trained with --causal 0 cannot simply be used as a streaming model.

Thank you @yfyeung for the clarification.

anjulRajendraSharma changed discussion status to closed

Hi @yfyeung, the large pre-training and fine-tuning configuration above doesn't match the current script.

I tried changing the parameter names as below:
--encoder-layers -> --num-encoder-layers
--encoder-embed-dim -> --encoder-dim
--encoder-ffn-embed-dim -> --feedforward-dim
--encoder-attention-heads -> --num-heads
--final-dim -> ??

zipformer/pretrain.py accepts these parameters but yields 3,702,317,981 parameters, which differs from the 318M of your model.
Can you please help me replicate your model architecture? It would be a great help if you could provide the parameters for the latest script.

Owner

Hi, here is the 318M config. It was trained on 32 V100 32GB GPUs for about 2–3 weeks. Due to limited compute resources at the time, it didn’t reach as many training steps as the original HuBERT paper (400k) or the CMU replication version (800k).

Make sure to use the code in the PR (https://github.com/k2-fsa/icefall/pull/1745) instead of the code on master.

torchrun \
  --nproc_per_node $num_gpus \
  --nnodes $num_nodes \
  --node_rank $node_rank \
  --master_addr $master_addr \
  --master_port $master_port \
  zipformer/pretrain.py \
    --use-multi-node 1 \
    --master-port $master_port \
    --num-epochs 20 \
    --start-epoch 1 \
    --use-fp16 1 \
    --exp-dir zipformer/exp_pretrain \
    --max-duration 350 \
    --quadratic-duration 1024 \
    --accum-grad 1 \
    --do-normalize 1 \
    --mask-prob 0.8 \
    --dropout-input 0.0 \
    --dropout-features 0.0 \
    --feature-grad-mult 1.0 \
    --num-encoder-layers 2,2,4,5,4,2 \
    --feedforward-dim 768,1536,2048,3072,2048,1536 \
    --encoder-dim 256,512,768,1024,768,512 \
    --encoder-unmasked-dim 256,256,256,320,256,256 \
    --base-lr 0.045
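
As a quick sanity check on the parameter count, here is a minimal sketch (the helper below is generic and not part of the recipe): count the trainable parameters of whatever model object zipformer/pretrain.py builds with the flags above; it should land near 318 million rather than 3.7 billion.

import torch

def num_params(model: torch.nn.Module) -> int:
    # Count trainable parameters only.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Example with a stand-in module; in practice, pass the Zipformer model
# constructed by zipformer/pretrain.py with the configuration above.
print(num_params(torch.nn.Linear(1024, 4096)))  # 4198400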

Personally, I found that Zipformer scales with diminishing returns. Compared to Standard Zipformer Large (~150M), the 318M model sacrifices a lot in efficiency without showing significant performance improvements.

Owner

After completing pre-training, you can export the averaged model using the following script, with --avg set to roughly 3/4 of the trained epochs:

./zipformer/generate_averaged_model.py \
  --exp-dir k2ssl-librilight-zipformer-large/exp \
  --epoch xxx \
  --avg xxx \
  --num-encoder-layers 2,2,4,5,4,2 \
  --feedforward-dim 768,1536,2048,3072,2048,1536 \
  --encoder-dim 256,512,768,1024,768,512 \
  --encoder-unmasked-dim 256,256,256,320,256,256
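
For reference, checkpoint averaging is essentially an element-wise mean over the saved epoch checkpoints. Below is a minimal sketch of the idea; the epoch numbers, paths, and the "model" key are assumptions, and the real generate_averaged_model.py handles this more carefully.

import torch

# Hypothetical epoch checkpoints; adjust to the epochs you actually trained.
paths = [f"zipformer/exp_pretrain/epoch-{e}.pt" for e in (17, 18, 19, 20)]
states = [torch.load(p, map_location="cpu")["model"] for p in paths]

# Element-wise mean of every parameter tensor across the checkpoints.
avg = {
    k: torch.stack([s[k].float() for s in states]).mean(dim=0)
    for k in states[0]
}
torch.save({"model": avg}, "zipformer/exp_pretrain/pretrained-avg.pt")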

Thank you @yfyeung :)

Hi @yfyeung,

I'm trying to train a multilingual large SSL model with the k2 librispeech recipe using your 318M-parameter config.
I made one small change to the recipe: I trained 3000 k-means clusters instead of 500.

I'm stuck on an error at the very first compute_mask_indices stage of Zipformer pre-training:

File "/icefall/egs/librispeech/SSL/zipformer/hubert_ce.py", line 423, in forward
x, mask_indices = self.apply_mask(features, padding_mask, target_list)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/icefall/egs/librispeech/SSL/zipformer/hubert_ce.py", line 319, in apply_mask
mask_indices = compute_mask_indices(
^^^^^^^^^^^^^^^^^^^^^
File "/icefall/egs/librispeech/SSL/zipformer/hubert_ce.py", line 191, in compute_mask_indices
raise ValueError(
ValueError: the entire sequence is masked. sz=8; mask_idc[mask_idc]; index=None

I tried reducing --mask-prob down to 0.1, but no luck.

Can you please provide some guidance on fixing this issue?
Thank you in advance.

Hi @yfyeung, the training is running fine now; the problem was with audio loading caused by a bash sox conversion.
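
For anyone hitting the same error: it means a sampled example was so short (sz=8 frames) that masking covered the entire sequence, which is consistent with truncated or corrupt audio rather than with the masking hyperparameters. Below is a quick, hypothetical check for pathologically short cuts in the training manifest; the path and the 1-second threshold are assumptions.

from lhotse import CutSet

# Hypothetical manifest path; point this at the cuts used for pre-training.
cuts = CutSet.from_file("data/manifests/train_cuts.jsonl.gz")

# List cuts shorter than one second; with a mask span of around 10 frames
# (fairseq HuBERT's default --mask-length), such cuts can end up fully masked.
for cut in cuts.filter(lambda c: c.duration < 1.0):
    print(cut.id, cut.duration)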

Owner

OK. Glad to hear that.
