Phil Wang | 4ef72fc4dc | 2022-05-03 10:29:29 -07:00
add EsViT, by popular request: an alternative to DINO that is compatible with efficient ViTs by accounting for a regional self-supervised loss
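A hedged sketch of the region-level idea, with all names and shapes illustrative rather than the repo's actual API: on top of the usual view-level DINO-style loss, each student region token is matched to its most similar teacher region token, and a cross-entropy between their projected distributions is added.

```python
import torch
import torch.nn.functional as F

def regional_loss(student_regions, teacher_regions, temp_student = 0.1, temp_teacher = 0.04):
    # region tokens after the projector: (batch, num_regions, dim) - illustrative shapes
    sim = F.normalize(student_regions, dim = -1) @ F.normalize(teacher_regions, dim = -1).transpose(-1, -2)
    match = sim.argmax(dim = -1)  # index of the most similar teacher region per student region

    matched = torch.gather(teacher_regions, 1, match.unsqueeze(-1).expand_as(student_regions))

    teacher_probs = (matched / temp_teacher).softmax(dim = -1).detach()  # no gradient through the teacher
    student_logprobs = (student_regions / temp_student).log_softmax(dim = -1)
    return -(teacher_probs * student_logprobs).sum(dim = -1).mean()
```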
Phil Wang | 81661e3966 | 2022-04-06 16:43:06 -07:00
fix mbconv residual block

Phil Wang | 13f8e123bb | 2022-04-06 16:34:40 -07:00
fix maxvit - need feedforwards after attention
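For context, a minimal sketch of the pattern this fix restores: every attention layer is followed by its own feedforward, each under a pre-norm residual (dims and hyperparameters are illustrative).

```python
import torch
from torch import nn

class TransformerBlock(nn.Module):
    def __init__(self, dim, heads = 8, mlp_mult = 4, dropout = 0.):
        super().__init__()
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, dropout = dropout, batch_first = True)
        self.ff_norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(
            nn.Linear(dim, dim * mlp_mult),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(dim * mlp_mult, dim)
        )

    def forward(self, x):
        normed = self.attn_norm(x)
        x = self.attn(normed, normed, normed)[0] + x   # attention + residual
        x = self.ff(self.ff_norm(x)) + x               # the feedforward that must follow
        return x
```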
Phil Wang | c7bb5fc43f | 2022-04-06 16:12:17 -07:00
maxvit intent to build (#211)
complete MaxViT, a hybrid of MBConv and block / grid efficient self-attention
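The two attention layouts alternated in MaxViT, sketched with einops (the window size w is an assumed value): block attention mixes tokens within non-overlapping windows, while grid attention mixes tokens that share a position across all windows.

```python
import torch
from einops import rearrange

x = torch.randn(2, 64, 32, 32)  # (batch, channels, height, width)
w = 8                           # assumed window size

block = rearrange(x, 'b d (h w1) (w w2) -> (b h w) (w1 w2) d', w1 = w, w2 = w)  # local windows
grid  = rearrange(x, 'b d (w1 h) (w2 w) -> (b h w) (w1 w2) d', w1 = w, w2 = w)  # dilated grid
# self-attention over dim 1 of either tensor gives block or grid attention respectively
```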
Phil Wang | d93cd84ccd | 2022-03-31 15:22:24 -07:00
let windowed tokens exchange information across heads a la talking heads prior to pointwise attention in sep-vit
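A hedged sketch of the cross-head mixing added here, in the spirit of talking-heads attention; the module name is illustrative. A learned 1x1 convolution over the head dimension lets the windowed tokens exchange information across heads before the pointwise attention runs.

```python
import torch
from torch import nn

class CrossHeadMixing(nn.Module):
    def __init__(self, heads):
        super().__init__()
        self.mix = nn.Conv2d(heads, heads, 1, bias = False)  # 1x1 conv acts across heads only

    def forward(self, x):
        # x: (batch, heads, num_window_tokens, dim_head)
        return self.mix(x)

mix = CrossHeadMixing(heads = 8)
out = mix(torch.randn(2, 8, 64, 32))  # same shape, heads have now exchanged information
```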
Phil Wang | 5d4c798949 | 2022-03-31 14:35:11 -07:00
cleanup sepvit

Phil Wang | d65a742efe | 2022-03-31 14:30:23 -07:00
intent to build (#210)
complete SepViT, from bytedance AI labs

Phil Wang | 8c54e01492 | 2022-03-31 13:25:21 -07:00
do not layernorm on last transformer block for scalable vit, as there is already one in mlp head
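Why the trailing norm goes away: in a pre-norm ViT the MLP head already begins with a LayerNorm, so normalizing again after the last transformer block would stack two LayerNorms back to back with nothing in between. A minimal illustration:

```python
from torch import nn

dim, num_classes = 256, 1000

mlp_head = nn.Sequential(
    nn.LayerNorm(dim),          # this norm makes a final post-transformer norm redundant
    nn.Linear(dim, num_classes)
)
```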
Phil Wang | df656fe7c7 | 2022-03-31 09:51:12 -07:00
complete learnable memory ViT, for efficient fine-tuning; it may also play into continual learning
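The idea, sketched with illustrative names: the pretrained ViT stays frozen, and a handful of learnable "memory" tokens are concatenated to the token sequence, so fine-tuning trains only the memory tokens and the task head.

```python
import torch
from torch import nn

class MemoryTokens(nn.Module):
    def __init__(self, dim, num_memory_tokens = 4):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(num_memory_tokens, dim) * 0.02)

    def forward(self, x):
        # x: (batch, num_tokens, dim) -> (batch, num_tokens + num_memory_tokens, dim)
        mem = self.memory.unsqueeze(0).expand(x.shape[0], -1, -1)
        return torch.cat((x, mem), dim = 1)
```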
Phil Wang | 4e6a42a0ca | 2022-03-30 10:50:57 -07:00
correct need for post-attention dropout
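The correction in miniature: dropout belongs on the attention output projection as well, not only on the attention matrix (dims are illustrative).

```python
from torch import nn

dim = 256
to_out = nn.Sequential(
    nn.Linear(dim, dim),
    nn.Dropout(0.1)   # post-attention dropout on the output projection
)
```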
Phil Wang | 9cd56ff29b | 2022-03-26 14:02:49 -07:00
CCT allow for rectangular images

Phil Wang | 2aae406ce8 | 2022-03-23 10:42:35 -07:00
add proposed parallel vit from facebook ai for exploration purposes
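A sketch of the parallel block being explored: instead of running attention then feedforward sequentially, several branches read the same input and their outputs are summed into a single residual update (the stand-in branches below are illustrative).

```python
import torch
from torch import nn

class ParallelBlock(nn.Module):
    def __init__(self, branches):
        super().__init__()
        self.branches = nn.ModuleList(branches)

    def forward(self, x):
        # all branches read the same input; their outputs form one residual update
        return x + sum(branch(x) for branch in self.branches)

dim = 256
block = ParallelBlock([
    nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim)),  # stand-in for an attention branch
    nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim)),  # stand-in for a feedforward branch
])
out = block(torch.randn(2, 64, dim))
```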
Phil Wang | c2b2db2a54 | 2022-03-22 17:37:59 -07:00
fix window size of None in scalable vit for rectangular images
Phil Wang | 719048d1bd | 2022-03-22 17:19:58 -07:00
some better defaults for scalable vit

Phil Wang | d27721a85a | 2022-03-22 17:02:47 -07:00
add scalable vit, from bytedance AI

Phil Wang | 6db20debb4 | 2022-03-01 16:50:17 -08:00
add patch merger
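A minimal sketch of a patch merger: a fixed set of learned queries attends over the incoming tokens, so any number of input tokens is merged down to num_output_tokens.

```python
import torch
from torch import nn

class PatchMerger(nn.Module):
    def __init__(self, dim, num_output_tokens):
        super().__init__()
        self.scale = dim ** -0.5
        self.norm = nn.LayerNorm(dim)
        self.queries = nn.Parameter(torch.randn(num_output_tokens, dim))

    def forward(self, x):
        x = self.norm(x)                                   # x: (batch, n, dim)
        attn = (self.queries @ x.transpose(-1, -2)) * self.scale
        return attn.softmax(dim = -1) @ x                  # (batch, num_output_tokens, dim)

merger = PatchMerger(dim = 256, num_output_tokens = 8)
out = merger(torch.randn(2, 64, 256))  # 64 tokens merged down to 8
```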
Phil Wang | 1bae5d3cc5 | 2022-01-31 08:55:31 -08:00
allow for rectangular images for efficient adapter

Phil Wang | 25b384297d | 2022-01-28 17:49:58 -08:00
return None from extractor if no attention layers

Phil Wang | 64a07f50e6 | 2022-01-24 17:24:41 -08:00
epsilon should be inside square root
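The numerical point behind this one-character fix, in miniature: the epsilon must sit under the square root, since the gradient of sqrt blows up as its argument approaches zero.

```python
import torch

x = torch.zeros(4, requires_grad = True)
var = x.var()

stable   = torch.sqrt(var + 1e-5)  # finite value and gradient even at var = 0
unstable = torch.sqrt(var) + 1e-5  # d/dvar sqrt(var) -> infinity as var -> 0
```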
Phil Wang | c1528acd46 | 2022-01-22 13:17:30 -08:00
fix feature maps in Nest, thanks to @MarkYangjiayi

Phil Wang | 1cc0f182a6 | 2022-01-06 13:14:41 -08:00
decoder positional embedding needs to be reapplied https://twitter.com/giffmana/status/1479195631587631104
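The gist of the linked thread, sketched with illustrative shapes: an MAE-style decoder must add positional embeddings to the full, unshuffled token set, mask tokens included, since the mask tokens are identical copies and otherwise carry no position.

```python
import torch

batch, num_patches, dim = 2, 16, 64
decoder_pos_emb = torch.randn(num_patches, dim)

decoder_tokens = torch.randn(batch, num_patches, dim)  # unshuffled: encoded tokens + mask tokens
decoder_tokens = decoder_tokens + decoder_pos_emb      # reapply positions before decoding
```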
Phil Wang | 0082301f9e | 2022-01-03 12:56:25 -08:00
build @jrounds suggestion

chinhsuanwu | f2414b2c1b | 2021-12-30 05:52:23 +08:00
Update MobileViT

Phil Wang | 70ba532599 | 2021-12-28 10:58:21 -08:00
add ViT for small datasets https://arxiv.org/abs/2112.13492

Phil Wang | e52ac41955 | 2021-12-25 12:31:21 -08:00
allow extractor to only return embeddings, to ready vision transformers for use in x-clip

Phil Wang | 2c368d1d4e | 2021-12-21 11:11:39 -08:00
add extractor wrapper
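A hedged sketch of what such a wrapper can do, not the repo's exact API: register forward hooks on chosen submodules and hand back their outputs alongside the model's prediction.

```python
import torch
from torch import nn

class Extractor(nn.Module):
    def __init__(self, model, layers):
        super().__init__()
        self.model = model
        self.outputs = []
        for layer in layers:
            # collect each hooked layer's output on every forward pass
            layer.register_forward_hook(lambda _, __, out: self.outputs.append(out))

    def forward(self, x):
        self.outputs.clear()
        pred = self.model(x)
        return pred, list(self.outputs)
```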
Phil Wang | b983bbee39 | 2021-12-21 10:22:59 -08:00
release MobileViT, from @murufeng

murufeng | 89d3a04b3f | 2021-12-21 20:48:34 +08:00
Add files via upload

Phil Wang | 365b4d931e | 2021-12-03 19:52:40 -08:00
add adaptive token sampling paper

Phil Wang | b45c1356a1 | 2021-11-22 22:53:02 -08:00
cleanup

Phil Wang | ff44d97cb0 | 2021-11-22 18:08:49 -08:00
make initial channels customizable for PiT

Phil Wang | b69b5af34f | 2021-11-22 17:39:36 -08:00
dynamic positional bias for crossformer, the more efficient way as described in the appendix of the paper
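Dynamic position bias in miniature (layer sizes are illustrative): a small MLP maps relative coordinates to per-head biases, so the same parameters serve any window size. The more efficient variant feeds the MLP only the unique relative offsets rather than every token pair.

```python
import torch
from torch import nn

heads, window = 4, 7
mlp = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, heads)
)

# all unique relative (dy, dx) offsets for a window x window grid
coords = torch.stack(torch.meshgrid(
    torch.arange(-(window - 1), window),
    torch.arange(-(window - 1), window),
    indexing = 'ij'
), dim = -1).float().reshape(-1, 2)

bias = mlp(coords)  # ((2w-1)^2, heads), gathered per token pair at attention time
```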
Phil Wang | 36e32b70fb | 2021-11-22 17:10:53 -08:00
complete and release crossformer

Phil Wang | 768e47441e | 2021-11-22 16:21:55 -08:00
crossformer without dynamic position bias

Phil Wang | 6665fc6cd1 | 2021-11-22 12:42:24 -08:00
cleanup region vit

Phil Wang | 9f8c60651d | 2021-11-22 10:19:48 -08:00
clearer mae

Phil Wang | 5ae555750f | 2021-11-21 15:50:19 -08:00
add SimMIM
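The gist of SimMIM, sketched: a random subset of patch tokens is replaced with a learned mask token, the encoder runs as usual, a linear head predicts raw pixels, and an L1 loss is taken on the masked patches only (the helper below is illustrative).

```python
import torch
import torch.nn.functional as F

def simmim_loss(pred_pixels, target_pixels, mask):
    # pred/target: (batch, num_patches, pixels_per_patch), mask: (batch, num_patches) bool
    loss = F.l1_loss(pred_pixels, target_pixels, reduction = 'none')
    return loss[mask].mean()  # average only over the masked patches
```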
Phil Wang | dc57c75478 | 2021-11-14 12:24:48 -08:00
cleanup

Phil Wang | e8f6d72033 | 2021-11-12 20:08:48 -08:00
release masked autoencoder
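The masked-autoencoder recipe in a few lines (ratios and shapes are illustrative): keep a small random subset of patches, encode only those, then decode the full set with mask tokens filling the gaps and regress the pixels of the masked patches.

```python
import torch

batch, num_patches, keep = 2, 16, 4  # keep 25% of patches visible

rand = torch.rand(batch, num_patches).argsort(dim = -1)   # a random permutation per sample
visible_idx, masked_idx = rand[:, :keep], rand[:, keep:]

# the encoder runs only on tokens gathered at visible_idx; the decoder then sees the
# encoded tokens plus identical mask tokens at masked_idx, and the reconstruction
# loss is computed on the masked patches alone
```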
Phil Wang | cb1729af28 | 2021-11-07 17:18:59 -08:00
more efficient feedforward for regionvit

Phil Wang | 06d375351e | 2021-11-07 09:47:28 -08:00
add RegionViT paper

Phil Wang | f196d1ec5b | 2021-10-05 09:23:44 -07:00
move freqs in RvT to linspace

Yonghye Kwon | 24ac8350bf | 2021-08-30 18:25:03 +09:00
remove unused package

Yonghye Kwon | ca3cef9de0 | 2021-08-30 18:05:16 +09:00
Cleanup Attention Class

Phil Wang | 73ed562ce4 | 2021-08-21 09:03:42 -07:00
Merge pull request #147 from developer0hye/patch-4
Make T2T process any scale image

Yonghye Kwon | ca0bdca192 | 2021-08-21 22:35:26 +09:00
Make model process any scale image
Related to #145
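Why arbitrary sizes need care in T2T: each soft split is an unfold, whose output length follows standard convolution arithmetic, so padding has to be chosen to cover every input size (the kernel/stride/padding values below are illustrative).

```python
import math

def unfold_output_size(size, kernel, stride, padding):
    # standard convolution arithmetic for one spatial dimension
    return math.floor((size + 2 * padding - kernel) / stride) + 1

# e.g. kernel 7, stride 4, padding 3 maps 224 -> 56, and also handles 225, 226, ...
assert unfold_output_size(224, 7, 4, 3) == 56
```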
Yonghye Kwon | 1c70271778 | 2021-08-21 22:25:46 +09:00
Support image with width and height less than the image_size
Related to #145

Yonghye Kwon | 946815164a | 2021-08-20 13:44:57 +09:00
Remove unused package

Phil Wang | aeed3381c1 | 2021-08-19 08:22:55 -07:00
use hardswish for levit

Phil Wang | 3f754956fb | 2021-08-14 08:06:23 -07:00
remove last transformer layer in t2t