additional diagram

This commit is contained in:
Phil Wang
2021-11-22 14:05:39 -08:00
parent 6665fc6cd1
commit de0b8ba189
2 changed files with 2 additions and 0 deletions


@@ -497,6 +497,8 @@ pred = model(img) # (1, 1000)
<img src="./images/crossformer.png" width="400px"></img>
<img src="./images/crossformer2.png" width="400px"></img>
This <a href="https://arxiv.org/abs/2108.00154">paper</a> beats PVT and Swin using alternating local and global attention. The global attention is done across the windowing dimension for reduced complexity, much like the scheme used for axial attention.
They also propose a cross-scale embedding layer, which they show to be a generic layer that can improve all vision transformers. Dynamic relative positional bias was also formulated to allow the network to generalize to images of greater resolution.
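
The surrounding README context (`pred = model(img) # (1, 1000)`) suggests this section is accompanied by a usage snippet. A minimal sketch is below, assuming the repo exposes a `CrossFormer` class under `vit_pytorch.crossformer` with per-stage dimension, depth, and window-size parameters; the exact argument names follow the repo's usual pattern and should be checked against the module itself.

```python
import torch
from vit_pytorch.crossformer import CrossFormer  # assumed module path

model = CrossFormer(
    num_classes = 1000,
    dim = (64, 128, 256, 512),          # feature dimension per stage (cross-scale embedding)
    depth = (2, 2, 8, 2),               # transformer blocks per stage
    global_window_size = (8, 4, 2, 1),  # window size for long-distance (global) attention per stage
    local_window_size = 7               # window size for short-distance (local) attention
)

img = torch.randn(1, 3, 224, 224)
pred = model(img)  # (1, 1000)
```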

BIN images/crossformer2.png (new file, 237 KiB; binary file not shown)