lucidrains | 5cf8384c56 | add a vit with decorrelation auxiliary losses for mha and feedforwards, applied right after the prenorm - this is in line with a paper from the Netherlands, but without extra parameters or their manual sgd update scheme | 2025-10-28 12:17:32 -07:00
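A minimal sketch of what such a decorrelation auxiliary loss could look like. It assumes the loss penalizes the squared off-diagonal entries of the feature covariance of the prenormed tokens. The names `decorrelation_loss`, `total_loss` and `aux_weight`, the mean-squared reduction, and the use of NumPy instead of PyTorch are all illustrative assumptions, not the code in this commit.

```python
import numpy as np


def decorrelation_loss(x: np.ndarray) -> float:
    """Mean squared off-diagonal feature covariance (hypothetical helper).

    x: activations of shape (..., dim), e.g. (batch, seq, dim) tokens taken
    right after a prenorm; all leading axes are treated as samples.
    """
    x = x.reshape(-1, x.shape[-1])
    n, d = x.shape
    if d < 2:
        return 0.0  # a single feature has no cross-correlations to penalize
    centered = x - x.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / n          # (dim, dim) feature covariance
    off_diag = cov - np.diag(np.diag(cov))   # drop the per-feature variances
    return float((off_diag ** 2).sum() / (d * (d - 1)))


def total_loss(task_loss: float, prenormed_inputs, aux_weight: float = 0.1) -> float:
    # hypothetical: one decorrelation term per prenorm output (attention and
    # feedforward inputs), scaled by an assumed weight and added to the task loss
    return task_loss + aux_weight * sum(decorrelation_loss(h) for h in prenormed_inputs)


# Usage: uncorrelated features give zero loss, perfectly correlated ones do not.
uncorrelated = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
correlated = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
print(decorrelation_loss(uncorrelated))  # 0.0
print(decorrelation_loss(correlated))    # ~0.444 (= 4/9)
```

Because the loss only reads activations, it adds no parameters. Gradients from the auxiliary term flow back through the normal optimizer step rather than through a separate, manually applied update rule.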