Model configuration
Hi, can you please share the model configuration used for pre-training and fine-tuning?
pruned RNN-T fine-tune, LS 960
./hubert/finetune.py \
  --world-size 8 \
  --num-epochs 222 \
  --start-epoch 1 \
  --use-fp16 0 \
  --exp-dir hubert/exp_finetune \
  --pretrained-dir download/hubert/hubert_large_ll60k.pt \
  --full-libri 1 \
  --max-duration 80 \
  --accum-grad 1 \
  --do-normalize 1 \
  --encoder-layers 24 \
  --encoder-embed-dim 1024 \
  --encoder-ffn-embed-dim 4096 \
  --encoder-attention-heads 16 \
  --final-dim 768 \
  --layer-norm-first 1 \
  --untie-final-proj 1 \
  --extractor-mode "layer_norm" \
  --mask-prob 0.50 \
  --mask-channel-prob 0.25 \
  --mask-channel-length 64 \
  --encoder-layerdrop 0.1 \
  --activation-dropout 0.1 \
  --feature-grad-mult 0.1 \
  --base-lr 0.001 \
  --lr-epochs 10.5
pruned RNN-T decode
for ((epoch=2; epoch<=19; epoch+=1)); do
  for ((avg=1; avg<=$epoch-1; avg+=1)); do
    ./hubert/decode.py \
      --epoch $epoch \
      --avg $avg \
      --exp-dir ./hubert/exp_finetune \
      --max-duration 1000 \
      --decoding-method greedy_search \
      --do-normalize 1 \
      --encoder-layers 24 \
      --encoder-embed-dim 1024 \
      --encoder-ffn-embed-dim 4096 \
      --encoder-attention-heads 16 \
      --final-dim 768 \
      --layer-norm-first 1 \
      --untie-final-proj 1 \
      --extractor-mode "layer_norm"
  done
done
The HuBERT Large pretrained model is downloaded from fairseq.
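For reference, a minimal way to fetch it into the path expected by --pretrained-dir above; the URL is the usual fairseq release link, so please verify it against the fairseq HuBERT README before relying on it:
mkdir -p download/hubert
# URL assumed from the fairseq HuBERT release page; double-check before use.
wget -O download/hubert/hubert_large_ll60k.pt \
  https://dl.fbaipublicfiles.com/hubert/hubert_large_ll60k.pt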
Thank you @yfyeung for your quick response.
I assume you used the same configuration for the large Zipformer pre-training model too.
Do Zipformer pre-training and fine-tuning support streaming?
I found that causal mode is implemented in Zipformer, but is it not exposed for use?
No, the streaming Zipformer (--causal 1) differs from the non-streaming Zipformer (--causal 0) in terms of its model architecture.
Hi @yfyeung, the above large pre-training and fine-tuning configuration doesn't fit the current script,
so I tried changing the parameter names as below:
--encoder-layers -> --num-encoder-layers
--encoder-embed-dim -> --encoder-dim
--encoder-ffn-embed-dim -> --feedforward-dim
--encoder-attention-heads -> --num-heads
--final-dim -> ??
zipformer/pretrain.py accepts the above parameters but produces a model with 3,702,317,981 parameters, which differs from the 318M in your model.
Can you please help me replicate your model architecture? It would be a great help if you could provide the parameters for the latest script.
Hi, here is the 318M config. It was trained on 32 V100 32GB GPUs for about 2–3 weeks. Due to limited compute resources at the time, it didn’t reach as many training steps as the original HuBERT paper (400k) or the CMU replication version (800k).
Make sure to use the code in the PR https://github.com/k2-fsa/icefall/pull/1745, instead of the code in master.
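One way to check out that PR locally (a standard GitHub pull-request fetch; the local branch name is just an example, and origin is assumed to point at k2-fsa/icefall):
git fetch origin pull/1745/head:pr-1745  # fetch the PR ref into a local branch
git checkout pr-1745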
torchrun \
  --nproc_per_node $num_gpus \
  --nnodes $num_nodes \
  --node_rank $node_rank \
  --master_addr $master_addr \
  --master_port $master_port \
  zipformer/pretrain.py \
    --use-multi-node 1 \
    --master-port $master_port \
    --num-epochs 20 \
    --start-epoch 1 \
    --use-fp16 1 \
    --exp-dir zipformer/exp_pretrain \
    --max-duration 350 \
    --quadratic-duration 1024 \
    --accum-grad 1 \
    --do-normalize 1 \
    --mask-prob 0.8 \
    --dropout-input 0.0 \
    --dropout-features 0.0 \
    --feature-grad-mult 1.0 \
    --num-encoder-layers 2,2,4,5,4,2 \
    --feedforward-dim 768,1536,2048,3072,2048,1536 \
    --encoder-dim 256,512,768,1024,768,512 \
    --encoder-unmasked-dim 256,256,256,320,256,256 \
    --base-lr 0.045
Personally, I found that Zipformer scales with diminishing returns. Compared to Standard Zipformer Large (~150M), the 318M model sacrifices a lot in efficiency without showing significant performance improvements.
After completing pre-training, you can export the averaged model using the following script (for --avg, ~3/4 epochs):
./zipformer/generate_averaged_model.py \
  --exp-dir k2ssl-librilight-zipformer-large/exp \
  --epoch xxx \
  --avg xxx \
  --num-encoder-layers 2,2,4,5,4,2 \
  --feedforward-dim 768,1536,2048,3072,2048,1536 \
  --encoder-dim 256,512,768,1024,768,512 \
  --encoder-unmasked-dim 256,256,256,320,256,256
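If you want to sanity-check that the architecture matches (~318M parameters), you can count the parameters stored in a checkpoint. This is only a rough sketch: the checkpoint path is hypothetical, and it assumes the state dict is stored under the "model" key as in icefall checkpoints:
python3 - <<'EOF'
import torch

# Hypothetical path; point this at one of your saved/averaged checkpoints.
ckpt = torch.load("zipformer/exp_pretrain/epoch-20.pt",
                  map_location="cpu", weights_only=False)
state = ckpt.get("model", ckpt)  # fall back to a raw state dict if there is no "model" key
num_params = sum(v.numel() for v in state.values() if torch.is_tensor(v))
print(f"{num_params / 1e6:.1f}M parameters")
EOF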
Hi @yfyeung,
I'm trying to train a multilingual large SSL model with the k2 librispeech recipe using your config (318M parameters).
I made a small change to the recipe: I trained 3000 k-means clusters instead of 500.
I'm stuck with an error at the very first compute_mask_indices stage of Zipformer pre-training:
File "/icefall/egs/librispeech/SSL/zipformer/hubert_ce.py", line 423, in forward
x, mask_indices = self.apply_mask(features, padding_mask, target_list)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/icefall/egs/librispeech/SSL/zipformer/hubert_ce.py", line 319, in apply_mask
mask_indices = compute_mask_indices(
^^^^^^^^^^^^^^^^^^^^^
File "/icefall/egs/librispeech/SSL/zipformer/hubert_ce.py", line 191, in compute_mask_indices
raise ValueError(
ValueError: the entire sequence is masked. sz=8; mask_idc[mask_idc]; index=None
I tried reducing --mask-prob down to 0.1, but no luck.
Can you please provide some guidance on fixing this issue?
Thank you in advance.
Hi @yfyeung, the training is running fine now; there was an issue with audio loading in the bash sox conversion.
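In case it helps anyone who hits the same "the entire sequence is masked" error: sz=8 means an utterance with only 8 frames reached the masking stage, so a single mask span can cover the whole sequence no matter how low --mask-prob is set; here that traced back to broken audio from the sox conversion. A rough sketch for scanning the cuts manifest for suspiciously short cuts (the manifest path is hypothetical; it assumes an lhotse cuts manifest):
python3 - <<'EOF'
from lhotse import load_manifest_lazy

# Hypothetical manifest path; adjust to your data directory.
cuts = load_manifest_lazy("data/fbank/librispeech_cuts_train.jsonl.gz")
short = [c.id for c in cuts if c.duration < 1.0]  # flag cuts under 1 second
print(f"{len(short)} cuts shorter than 1 second")
print(short[:10])
EOF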
OK. Glad to hear that.