lucidrains
dd6462d19b
release small navit perf
1.16.3
2025-12-06 04:57:12 -08:00
Amit Moryossef
a1ee1daa1a
optimize NaViT with SDPA and vectorized forward pass (#353)
- Replace manual attention with F.scaled_dot_product_attention
- Use repeat_interleave instead of meshgrid for position computation
- Build image_ids efficiently with repeat_interleave instead of F.pad
- Remove unused Rearrange import
~56% speedup (91ms -> 58ms on 512 variable-sized images)
Numerically equivalent (max diff ~5e-4, within flash attention tolerance)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-06 04:56:40 -08:00
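A minimal sketch of the two changes described in the entry above, using hypothetical shapes and names rather than the actual NaViT code:

```python
# Sketch of the two optimizations (hypothetical shapes, not the NaViT source)
import torch
import torch.nn.functional as F

q = k = v = torch.randn(2, 8, 64, 32)      # (batch, heads, seq, dim_head)

# before: manual attention
scale = q.shape[-1] ** -0.5
attn = (q @ k.transpose(-2, -1) * scale).softmax(dim = -1)
out_manual = attn @ v

# after: fused kernel, which may dispatch to flash attention -
# hence the ~5e-4 tolerance noted in the commit
out_sdpa = F.scaled_dot_product_attention(q, k, v)
assert torch.allclose(out_manual, out_sdpa, atol = 1e-4)

# before: meshgrid to enumerate (h, w) positions for one image's patches
h, w = 3, 5
hs, ws = torch.meshgrid(torch.arange(h), torch.arange(w), indexing = 'ij')
pos_meshgrid = torch.stack((hs.flatten(), ws.flatten()), dim = -1)

# after: the same flattened positions via repeat_interleave / repeat
pos_vectorized = torch.stack((
    torch.arange(h).repeat_interleave(w),
    torch.arange(w).repeat(h)
), dim = -1)
assert torch.equal(pos_meshgrid, pos_vectorized)
```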
lucidrains
3cff5e547a
address https://github.com/lucidrains/vit-pytorch/issues/352
1.16.2
2025-12-02 05:21:52 -08:00
lucidrains
fdaf7f92b9
fix positional embed for mean pool case and cleanup
2025-11-27 17:01:47 -08:00
lucidrains
0ebd4edab9
address https://github.com/lucidrains/vit-pytorch/issues/351
1.16.0
2025-11-27 06:07:43 -08:00
lucidrains
aa49c2783a
VAAT should have two ears
1.15.7
2025-11-22 08:32:23 -08:00
lucidrains
6aa0374313
register tokens for the AST in VAAT
1.15.6
2025-11-22 08:12:01 -08:00
lucidrains
b35a97de05
improvise a variant of VAT with audio cortex before fully generalizing it
1.15.5
2025-11-22 07:51:19 -08:00
lucidrains
1374b93145
the paper claims finetuning everything was better, but allow for freezing the visual cortex anyway, which is what PI proposes
1.15.4
2025-11-09 10:59:55 -08:00
lucidrains
4386742cd1
an option to return zero for decorr aux loss if insufficient samples
1.15.3
2025-11-09 10:08:06 -08:00
lucidrains
5cf8384c56
add a vit with decorrelation auxiliary losses for mha and feedforwards, right after prenorm - this is in line with a paper from the netherlands, but without extra parameters or their manual sgd update scheme
2025-10-28 12:17:32 -07:00
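A hedged sketch of what such a decorrelation auxiliary loss can look like: penalize the off-diagonal entries of the feature covariance, and (per the 1.15.3 entry above) return zero when there are too few samples. The repo's exact formulation may differ.

```python
import torch

def decorr_aux_loss(feats):                      # feats: (num_samples, dim)
    n = feats.shape[0]
    if n < 2:                                    # option: zero loss if insufficient samples
        return feats.new_zeros(())
    feats = feats - feats.mean(dim = 0)          # center each feature dimension
    cov = (feats.t() @ feats) / (n - 1)          # sample covariance
    off_diag = cov - torch.diag(cov.diagonal())  # keep only cross-feature terms
    return off_diag.pow(2).mean()

aux = decorr_aux_loss(torch.randn(512, 64))      # weight this and add to the main loss
```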
lucidrains
f7d59cecb5
some register tokens cannot hurt for VAT
1.14.5
2025-10-24 14:00:38 -07:00
lucidrains
a583cb5988
last tweak to vat
1.14.4
2025-10-23 12:21:09 -07:00
lucidrains
25871013f5
forgot task conditioning for vat
1.14.2
2025-10-23 10:55:16 -07:00
lucidrains
e66862bcd5
add VAT from iclr 2026, which claims SOTA on libero using a relatively simple scheme (#350)
1.14.1
2025-10-23 10:23:53 -07:00
lucidrains
39fd9ac8be
for n-dimensional vit, have a method for fetching muon friendly parameters
1.12.5
2025-10-13 12:07:48 -07:00
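"Muon friendly" generally means matrix-shaped: Muon orthogonalizes 2-D weight updates, so those parameters get split out while the rest (biases, norm gains, embeddings) go to a fallback optimizer such as AdamW. A hedged sketch of such a partition; the actual method's rule may differ.

```python
import torch.nn as nn

def muon_friendly_parameters(model: nn.Module):
    muon_params, other_params = [], []
    for name, param in model.named_parameters():
        # route 2-D weight matrices (excluding embeddings) to Muon,
        # everything else to the fallback optimizer
        if param.ndim == 2 and 'embed' not in name:
            muon_params.append(param)
        else:
            other_params.append(param)
    return muon_params, other_params
```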
lucidrains
3becf087bb
have a language model address https://github.com/lucidrains/vit-pytorch/issues/348
2025-09-25 06:21:13 -07:00
lucidrains
f6bc14c81d
able to return embed from vit-nd-rotary
1.12.2
2025-09-23 07:21:34 -07:00
lucidrains
845c844b3b
add a vit nd with rotary nd, from Jerry Xiong at UIUC
1.12.1
2025-09-21 10:45:42 -07:00
lucidrains
5f2bc0c796
with assistance from claude (yes it did the einops equation building here), generalize to n-dimensions
1.12.0
2025-09-21 06:22:43 -07:00
lucidrains
35bf273037
1.11.7
1.11.7
2025-08-17 18:07:42 -07:00
Baraa sameeh
1123063a5e
Make all CCT regularization parameters user-configurable. (#346)
2025-08-17 18:07:25 -07:00
lucidrains
f8bec5ede2
able to project the image embedding before applying time positional embedding for accept video wrapper
1.11.6
2025-08-13 10:15:18 -07:00
lucidrains
297e7d00a2
handle channel first for accept video wrapper
1.11.5
2025-08-03 08:29:40 -07:00
lucidrains
29ac8e143c
fix when video time seq len less than max time seq len for video acceptor
1.11.4
2025-07-27 09:00:56 -07:00
lucidrains
e05cd6d8b8
some models only return embeddings with some kwarg on forward
1.11.3
2025-07-27 08:46:43 -07:00
lucidrains
b46233c3d6
need to be able to invoke with eval no grad
1.11.2
2025-07-27 08:25:58 -07:00
lucidrains
68e13a3c7d
bit more flexible
1.11.1
2025-07-27 08:14:48 -07:00
lucidrains
b22dc0ecd2
add a wrapper for accepting video and processing the images individually, optionally able to add time positional embeddings - for use in two robotics work
1.11.0
2025-07-27 08:05:48 -07:00
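A hedged sketch of the wrapper's idea: fold time into the batch, run the image model once per frame, then add a learned time positional embedding, truncated when the video is shorter than the maximum (per the 1.11.4 entry above). Names and shapes are illustrative, not the repo's actual API.

```python
import torch
import torch.nn as nn
from einops import rearrange

class AcceptVideoSketch(nn.Module):
    def __init__(self, image_model, dim, max_time_seq_len):
        super().__init__()
        self.image_model = image_model                          # maps (n, c, h, w) -> (n, dim)
        self.time_pos_emb = nn.Parameter(torch.zeros(max_time_seq_len, dim))

    def forward(self, video):                                   # (batch, time, channels, height, width)
        batch, time = video.shape[:2]
        images = rearrange(video, 'b t c h w -> (b t) c h w')   # process frames individually
        embeds = self.image_model(images)
        embeds = rearrange(embeds, '(b t) d -> b t d', b = batch)
        return embeds + self.time_pos_emb[:time]                # truncate to the video's length
```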
lucidrains
db05a141a6
add the proposed jumbo vit from Fuller et al. of Carleton University
2025-03-05 10:50:34 -08:00
lucidrains
9f49a31977
1.9.2
1.9.2
2025-01-19 05:53:11 -08:00
JacobLinCool
ab63fc9cc8
remove duplicated qkv computation in na_vit_nested_tensor_3d.py (#341)
2025-01-19 05:52:46 -08:00
Phil Wang
c3018d1433
1.9.1
1.9.1
2025-01-04 07:55:49 -08:00
Kale Kundert
b7ed6bad28
add option to set frame padding for 3D CCT (#339)
2025-01-04 07:55:27 -08:00
lucidrains
e7cba9ba6d
add a simple vit flavor for a new bytedance paper that proposes to break out of the traditional one residual stream architecture - "hyper-connections"
2024-12-20 17:43:50 -08:00
lucidrains
56373c0cbd
make value residual learned
1.8.9
2024-11-24 08:21:28 -08:00
lucidrains
24196a3e8a
allow for qk norm to be turned off for na vit nested tensor
1.8.8
2024-11-20 10:59:22 -08:00
Phil Wang
f6d7287b6b
readme
2024-11-19 08:20:38 -08:00
lucidrains
d47c57e32f
fix tests
2024-11-10 09:43:54 -08:00
lucidrains
0449865786
update minimum version for nested tensor of NaViT
2024-11-10 09:37:48 -08:00
lucidrains
6693d47d0b
update comment for navit 3d
2024-11-07 20:02:07 -08:00
Phil Wang
141239ca86
fix value residual
1.8.7
2024-10-31 06:48:24 -07:00
lucidrains
0b5c9b4559
add value residual based simple vit
1.8.6
2024-10-28 09:19:00 -07:00
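A hedged sketch of the value residual idea behind this variant: every attention layer mixes its values with the first layer's values, here with a learned per-head mix (per the 1.8.9 "make value residual learned" entry above). Illustrative only; the repo's module differs in detail.

```python
import torch
import torch.nn as nn

class ValueResidualMix(nn.Module):
    def __init__(self, heads):
        super().__init__()
        self.mix_logit = nn.Parameter(torch.zeros(heads, 1, 1))  # learned, per head

    def forward(self, values, first_values):     # both (batch, heads, seq, dim_head)
        mix = self.mix_logit.sigmoid()
        return values * mix + first_values * (1. - mix)
```

The first attention layer caches its values; each deeper layer passes its own values and the cached ones through this mix before attending.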
lucidrains
e300cdd7dc
fix multiheaded qk rmsnorm in nViT
1.8.5
2024-10-10 19:15:17 -07:00
Phil Wang
36ddc7a6ba
go all the way with the normalized vit, fix some scales
1.8.4
2024-10-10 10:42:37 -07:00
Phil Wang
1d1a63fc5c
cite for hypersphere vit adapted from ngpt
2024-10-10 10:15:04 -07:00
Phil Wang
74b62009f8
go for multi-headed rmsnorm for the qknorm on hypersphere vit
1.8.2
2024-10-10 08:09:58 -07:00
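A hedged sketch of multi-headed RMSNorm applied to queries and keys (qk norm): normalize over dim_head with a separate learned gain per head. Illustrative; nViT's actual handling also involves its hypersphere scaling.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiheadRMSNorm(nn.Module):
    def __init__(self, heads, dim_head):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(heads, 1, dim_head))  # per-head gain

    def forward(self, x):                        # (batch, heads, seq, dim_head)
        # l2-normalize then rescale by sqrt(dim_head), i.e. RMSNorm per head
        return F.normalize(x, dim = -1) * self.gamma * (x.shape[-1] ** 0.5)

q = MultiheadRMSNorm(heads = 8, dim_head = 64)(torch.randn(2, 8, 128, 64))
```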
Phil Wang
f50d7d1436
add a hypersphere vit, adapted from https://arxiv.org/abs/2410.01131
1.8.1
2024-10-09 07:32:25 -07:00
lucidrains
82f2fa751d
address https://github.com/lucidrains/vit-pytorch/issues/330
1.7.14
2024-10-04 07:01:48 -07:00
lucidrains
fcb9501cdd
add register tokens to the nested tensor 3d na vit example for researcher
1.7.12
2024-08-28 12:21:31 -07:00