mirror of
https://github.com/lucidrains/vit-pytorch.git
synced 2025-12-30 08:02:29 +00:00
additional diagram
@@ -497,6 +497,8 @@ pred = model(img) # (1, 1000)
<img src="./images/crossformer.png" width="400px"></img>
<img src="./images/crossformer2.png" width="400px"></img>
This <a href="https://arxiv.org/abs/2108.00154">paper</a> beats PVT and Swin using alternating local and global attention. The global attention is done across the windowing dimension for reduced complexity, much like the scheme used for axial attention.
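The local/global split described above amounts to two ways of grouping patches before attention: short-distance groups take adjacent windows, while long-distance groups take patches at a fixed stride across the whole feature map, axial-attention style. Below is a minimal, hedged sketch of just the grouping step in plain PyTorch; the function names are ours for illustration and are not the repository's actual API.

```python
import torch

# Sketch of the two patch groupings behind alternating local/global attention
# (names are illustrative, not vit-pytorch's API).

def short_distance_groups(x, window):
    # x: (batch, height, width, dim) -> (batch * num_windows, window * window, dim)
    # adjacent window x window patches attend to each other (local attention)
    b, h, w, d = x.shape
    x = x.reshape(b, h // window, window, w // window, window, d)
    x = x.permute(0, 1, 3, 2, 4, 5)
    return x.reshape(-1, window * window, d)

def long_distance_groups(x, interval):
    # patches sampled at a fixed stride form a group, so every group spans the
    # whole feature map (global attention) at reduced cost
    b, h, w, d = x.shape
    x = x.reshape(b, h // interval, interval, w // interval, interval, d)
    x = x.permute(0, 2, 4, 1, 3, 5)
    return x.reshape(-1, (h // interval) * (w // interval), d)

x = torch.randn(1, 8, 8, 32)
print(short_distance_groups(x, 4).shape)  # (4, 16, 32) - 4 local windows of 16 tokens
print(long_distance_groups(x, 2).shape)   # (4, 16, 32) - 4 strided global groups
```

Both groupings keep attention cost at the group size rather than the full sequence length; alternating them lets information mix locally and globally across layers.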
They also have a cross-scale embedding layer, which they show to be a generic layer that can improve all vision transformers. A dynamic relative positional bias was also formulated to allow the network to generalize to images of greater resolution.
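The dynamic bias idea is that, instead of a fixed learned table indexed by relative offset (which is tied to one window size), a small MLP maps each relative coordinate to a per-head bias, so a new resolution just means new coordinate inputs. A hedged sketch under that assumption; the class name and layer sizes are ours, not the repository's:

```python
import torch
from torch import nn

class DynamicPositionBias(nn.Module):
    # illustrative sketch: an MLP from 2D relative offsets to per-head biases,
    # so the bias is defined for any window size, not just the one trained on
    def __init__(self, dim, heads):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2, dim),
            nn.ReLU(),
            nn.Linear(dim, heads),
        )

    def forward(self, window_size):
        # coordinates of all tokens in a window_size x window_size window
        pos = torch.stack(torch.meshgrid(
            torch.arange(window_size), torch.arange(window_size),
            indexing='ij'), dim=-1).reshape(-1, 2).float()   # (N, 2)
        rel = pos[:, None, :] - pos[None, :, :]              # (N, N, 2) relative offsets
        bias = self.mlp(rel)                                 # (N, N, heads)
        return bias.permute(2, 0, 1)                         # (heads, N, N), added to attention logits

bias_fn = DynamicPositionBias(dim=32, heads=4)
print(bias_fn(7).shape)  # (4, 49, 49)
print(bias_fn(9).shape)  # (4, 81, 81) - larger window, no retraining of a table
```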
BIN
images/crossformer2.png
Normal file