| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Phil Wang | aa9ed249a3 | add knowledge distillation with distillation tokens, in light of new finding from facebook ai | 2020-12-24 10:39:15 -08:00 |
| Phil Wang | ea0924ec96 | update readme | 2020-12-23 19:06:48 -08:00 |
| Phil Wang | 24339644ca | offer a way to use mean pooling of last layer | 2020-12-23 17:23:58 -08:00 |
| Phil Wang | b786029e18 | fix the dimension per head to be independent of dim and heads, to make sure users do not have it be too small to learn anything | 2020-12-17 07:43:52 -08:00 |
| Phil Wang | a656a213e6 | update diagram | 2020-12-04 12:26:28 -08:00 |
| Long M. Lưu | 3f50dd72cf | Update README.md | 2020-11-21 18:37:03 +07:00 |
| Phil Wang | 4f84ad7a64 | authors are now known | 2020-11-03 14:28:20 -08:00 |
| Phil Wang | c74bc781f0 | cite | 2020-11-03 11:59:05 -08:00 |
| Phil Wang | c1043ab00c | update readme | 2020-10-26 19:01:03 -07:00 |
| Phil Wang | 5b5d98a3a7 | dropouts are more specific and aggressive in the paper, thanks for letting me know @hila-chefer | 2020-10-14 09:22:16 -07:00 |
| Phil Wang | 0b2b3fc20c | add dropouts | 2020-10-13 13:11:59 -07:00 |
| Phil Wang | b298031c17 | write up example for using efficient transformers | 2020-10-07 19:15:21 -07:00 |
| Phil Wang | f7123720c3 | add masking | 2020-10-07 11:21:03 -07:00 |
| Phil Wang | 8fb261ca66 | fix a bug and add suggestion for BYOL pre-training | 2020-10-04 14:55:29 -07:00 |
| Phil Wang | 112ba5c476 | update with link to Yannic's video | 2020-10-04 13:53:47 -07:00 |
| Phil Wang | f899226d4f | add diagram | 2020-10-04 12:47:08 -07:00 |
| Phil Wang | ee8088b3ea | first commit | 2020-10-04 12:35:01 -07:00 |
| Phil Wang | ea03db32f0 | Update README.md | 2020-10-03 15:49:27 -07:00 |
| Phil Wang | 30362d50dc | Update README.md | 2020-10-03 15:49:02 -07:00 |
| Phil Wang | efb40e0b01 | Initial commit | 2020-10-03 15:47:26 -07:00 |