Use default attention implementation with option to override
#2 by nvidia-oliver-holworthy - opened
Enables specifying attn_implementation when loading the model, including sdpa.
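For illustration, a minimal sketch of how a caller might use this, assuming the model is loaded through transformers' AutoModel.from_pretrained, which accepts an attn_implementation keyword (for example "eager", "sdpa", or "flash_attention_2"). The helper names build_load_kwargs and load_model are hypothetical and not part of this PR; the model id is left to the caller.

```python
from typing import Any, Dict, Optional


def build_load_kwargs(attn_implementation: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for from_pretrained.

    None means "do not override", so the default attention implementation
    is used. Any other value is forwarded unchanged.
    """
    kwargs: Dict[str, Any] = {}
    if attn_implementation is not None:
        kwargs["attn_implementation"] = attn_implementation
    return kwargs


def load_model(model_id: str, attn_implementation: Optional[str] = None):
    """Hypothetical wrapper: load a model, optionally overriding attention."""
    # Imported lazily so build_load_kwargs works without transformers installed.
    from transformers import AutoModel

    return AutoModel.from_pretrained(
        model_id, **build_load_kwargs(attn_implementation)
    )
```

Calling load_model(model_id) keeps the default, while load_model(model_id, attn_implementation="sdpa") requests PyTorch's scaled dot product attention. Whether trust_remote_code or other arguments are also needed depends on the model and is not shown here.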
Thank you!