lucidrains
5cf8384c56
add a vit with decorrelation auxiliary losses for mha and feedforwards, right after prenorm - this is in line with a paper from the netherlands, but without extra parameters or their manual sgd update scheme
2025-10-28 12:17:32 -07:00
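The commit above mentions decorrelation auxiliary losses applied after prenorm. As a rough illustration of the general idea (a generic penalty on off-diagonal covariance between feature dimensions — not necessarily the paper's exact formulation, and `decorrelation_loss` is a hypothetical name, not an identifier from the repo):

```python
def decorrelation_loss(features):
    # features: a batch as a list of samples, each a list of d activations.
    # Penalize squared off-diagonal entries of the (biased) covariance
    # matrix, encouraging feature dimensions to be decorrelated.
    n = len(features)
    d = len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    loss = 0.0
    for i in range(d):
        for j in range(d):
            if i == j:
                continue  # only off-diagonal covariance is penalized
            cov_ij = sum(r[i] * r[j] for r in centered) / n
            loss += cov_ij ** 2
    return loss
```

In a real model this would be computed on activations (here, right after prenorm, per the commit) and added to the task loss with a small weight.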
lucidrains
e66862bcd5
add VAT from iclr 2026, which claims SOTA on libero using a relatively simple scheme ( #350 )
2025-10-23 10:23:53 -07:00
lucidrains
845c844b3b
add a vit nd with rotary nd, from Jerry Xiong at UIUC
2025-09-21 10:45:42 -07:00
lucidrains
db05a141a6
add the proposed jumbo vit from Fuller et al. of Carleton University
2025-03-05 10:50:34 -08:00
lucidrains
e7cba9ba6d
add a simple vit flavor for a new bytedance paper that proposes to break out of the traditional one residual stream architecture - "hyper-connections"
2024-12-20 17:43:50 -08:00
Phil Wang
f6d7287b6b
readme
2024-11-19 08:20:38 -08:00
lucidrains
0b5c9b4559
add value residual based simple vit
2024-10-28 09:19:00 -07:00
Phil Wang
1d1a63fc5c
cite for hypersphere vit adapted from ngpt
2024-10-10 10:15:04 -07:00
Phil Wang
f50d7d1436
add a hypersphere vit, adapted from https://arxiv.org/abs/2410.01131
2024-10-09 07:32:25 -07:00
roydenwa
9d43e4d0bb
Add ViViT variant with factorized self-attention ( #327 )
...
* Add FactorizedTransformer
* Add variant param and check in fwd method
* Check if variant is implemented
* Describe new ViViT variant
2024-08-21 19:23:38 -07:00
lucidrains
73199ab486
Nested navit ( #325 )
...
add a variant of NaViT using nested tensors
2024-08-20 15:12:29 -07:00
Phil Wang
dfc8df6713
add the u-vit implementation with simple vit + register tokens
2024-08-07 08:45:57 -07:00
Phil Wang
bd72b58355
add lookup vit, cite, document later
2024-07-19 09:48:58 -07:00
Phil Wang
efb94608ea
readme
2023-10-19 09:38:35 -07:00
Phil Wang
1616288e30
add xcit ( #284 )
...
* add xcit
* use Rearrange layers
* give cross correlation transformer a final norm at end
* document
2023-10-13 09:15:13 -07:00
Jason Chou
9e1e824385
Update README.md ( #283 )
...
`patch_size` is size of patches, not number of patches
2023-10-09 11:33:56 -07:00
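The README fix above is worth spelling out: `patch_size` is the side length of each square patch, and the number of patches follows from it. A minimal sketch (the helper name is hypothetical, not from the repo):

```python
def num_patches(image_size, patch_size):
    # patch_size is the side length of each square patch,
    # NOT the number of patches along a side
    assert image_size % patch_size == 0, "image must divide evenly into patches"
    return (image_size // patch_size) ** 2
```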
lucidrains
a36546df23
add simple vit with register tokens example, cite
2023-10-01 08:11:40 -07:00
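Register tokens, per the commit above, are extra learnable tokens added to the patch sequence and discarded before the output head. A toy sketch of the bookkeeping only (real register tokens are learned vectors, and function names here are hypothetical):

```python
def with_registers(patch_tokens, register_tokens):
    # append the register tokens (learnable, in a real model) to the
    # patch sequence before it enters the transformer
    return patch_tokens + register_tokens

def without_registers(tokens, num_registers):
    # drop the register tokens again before pooling / classification;
    # they exist only to absorb global computation during attention
    return tokens[:len(tokens) - num_registers]
```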
Phil Wang
6e2393de95
wrap up NaViT
2023-07-25 10:38:55 -07:00
Phil Wang
17675e0de4
add constant token dropout for NaViT
2023-07-24 14:14:36 -07:00
Phil Wang
598cffab53
release NaViT
2023-07-24 13:55:54 -07:00
Phil Wang
23820bc54a
begin work on NaViT ( #273 )
...
finish core idea of NaViT
2023-07-24 13:54:02 -07:00
Phil Wang
c59843d7b8
add a version of simple vit using flash attention
2023-03-18 09:41:39 -07:00
lucidrains
9a8e509b27
separate a simple vit from mp3, so that simple vit can be used after being pretrained
2023-03-07 19:31:10 -08:00
Srikumar Sastry
4218556acd
Add Masked Position Prediction ( #260 )
...
* Create mp3.py
* Implementation: Position Prediction as an Effective Pretraining Strategy
* Added description for Masked Position Prediction

* MP3 image added
2023-03-07 14:28:40 -08:00
Phil Wang
f621c2b041
typo
2023-03-04 20:30:02 -08:00
Phil Wang
bdaf2d1491
adopt dual patchnorm paper for as many vits as applicable, release 1.0.0
2023-02-03 08:11:29 -08:00
Phil Wang
89e1996c8b
add vit with patch dropout, fully embrace structured dropout as multiple papers are now corroborating each other
2022-12-02 11:28:11 -08:00
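The structured dropout the commit above embraces is patch dropout: during training, keep only a random subset of patch tokens; at inference, keep them all. A minimal dependency-free sketch of the token selection (the function name and keep-at-least-one behavior are my assumptions, not the repo's exact implementation):

```python
import random

def patch_dropout(tokens, keep_prob=0.5, training=True):
    # structured dropout over patch tokens: during training, keep a
    # random subset of patches; at eval time, keep every patch
    if not training or keep_prob >= 1.0:
        return tokens
    num_keep = max(1, int(len(tokens) * keep_prob))
    kept_indices = sorted(random.sample(range(len(tokens)), num_keep))
    return [tokens[i] for i in kept_indices]
```

In a real ViT this operates on the token dimension of a batched tensor, typically after positional embeddings are added.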
Phil Wang
cb6d749821
add a 3d version of cct, addressing https://github.com/lucidrains/vit-pytorch/issues/238 (release 0.38.1)
2022-10-29 11:35:06 -07:00
Phil Wang
13fabf901e
add vivit
2022-10-24 09:34:04 -07:00
Ryan Russell
c0eb4c0150
Improving Readability ( #220 )
...
Signed-off-by: Ryan Russell <git@ryanrussell.org>
2022-10-17 10:42:45 -07:00
Phil Wang
b4853d39c2
add the 3d simple vit
2022-10-16 20:45:30 -07:00
Phil Wang
29fbf0aff4
begin extending some of the architectures over to 3d, starting with basic ViT
2022-10-16 15:31:59 -07:00
Phil Wang
4b8f5bc900
add link to Flax translation by @conceptofmind
2022-07-27 08:58:18 -07:00
Phil Wang
4e62e5f05e
make extractor flexible for layers that output multiple tensors, show CrossViT example
2022-06-19 08:11:41 -07:00
Phil Wang
b3e90a2652
add simple vit, from https://arxiv.org/abs/2205.01580
2022-05-03 20:24:14 -07:00
Phil Wang
4ef72fc4dc
add EsViT, by popular request, an alternative to Dino that is compatible with efficient ViTs by accounting for a regional self-supervised loss
2022-05-03 10:29:29 -07:00
Zhengzhong Tu
c2aab05ebf
fix bibtex typo ( #212 )
2022-04-06 22:15:05 -07:00
Phil Wang
2d4089c88e
link to maxvit in readme
2022-04-06 16:24:12 -07:00
Phil Wang
c7bb5fc43f
maxvit intent to build ( #211 )
...
complete hybrid mbconv + block / grid efficient self attention MaxViT
2022-04-06 16:12:17 -07:00
Phil Wang
d65a742efe
intent to build ( #210 )
...
complete SepViT, from bytedance AI labs
2022-03-31 14:30:23 -07:00
Phil Wang
df656fe7c7
complete learnable memory ViT, for efficient fine-tuning, which potentially plays into continual learning
2022-03-31 09:51:12 -07:00
Phil Wang
6d7298d8ad
link to tensorflow2 translation by @taki0112
2022-03-28 09:05:34 -07:00
Phil Wang
9cd56ff29b
CCT allow for rectangular images
2022-03-26 14:02:49 -07:00
Phil Wang
2aae406ce8
add proposed parallel vit from facebook ai for exploration purposes
2022-03-23 10:42:35 -07:00
Phil Wang
d27721a85a
add scalable vit, from bytedance AI
2022-03-22 17:02:47 -07:00
Phil Wang
6db20debb4
add patch merger
2022-03-01 16:50:17 -08:00
Phil Wang
126d204ff2
fix block repeats in readme example for Nest
2022-01-22 21:32:53 -08:00
Phil Wang
891b92eb74
readme
2021-12-28 16:00:00 -08:00
Phil Wang
70ba532599
add ViT for small datasets https://arxiv.org/abs/2112.13492
2021-12-28 10:58:21 -08:00
Phil Wang
2c368d1d4e
add extractor wrapper
2021-12-21 11:11:39 -08:00