I have found a neural network architecture that I want to replicate in Keras, but it is implemented in PyTorch. I am having trouble replicating it, especially when it comes to the Atten