I have been trying to implement a Snapchat-like edit text on an image. What I did so far is implement a UILabel in the center of the UIImageView and I added 3 gestures to this U
The issue is that your code takes the current transform and applies another transform on top of it based on the current "movement". The changes therefore accumulate (compound, really) on every update as you move during a single gesture.
Keep instance variables for rotation, scale, and movement, and update the relevant one in each of your gesture recognizers' actions. You'll also need to store the value of each at the beginning of its gesture, so you can apply the gesture's delta to that initial value. Then create the transform from scratch using those three variables. The transform creation should be factored out into a separate function, since you're going to use it several times.
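Here is a minimal sketch of that approach in Swift, not a drop-in replacement for your code. The names `LabelTransformState`, `LabelGestureHandler`, and the `handle…` methods are hypothetical. The sketch assumes the three recognizers are attached to the label with this handler as their target, and that you do not reset the recognizers' `scale`, `rotation`, or translation during a gesture, so each value stays relative to the start of that gesture.

```swift
import UIKit

/// Hypothetical helper: the three independent values that describe the label.
struct LabelTransformState {
    var translation: CGPoint = .zero
    var scale: CGFloat = 1
    var rotation: CGFloat = 0   // radians

    /// Builds the transform from identity every time, so nothing compounds.
    /// Points are scaled first, then rotated, then translated, which keeps
    /// the translation expressed in the superview's coordinates.
    func makeTransform() -> CGAffineTransform {
        CGAffineTransform.identity
            .translatedBy(x: translation.x, y: translation.y)
            .rotated(by: rotation)
            .scaledBy(x: scale, y: scale)
    }
}

/// Hypothetical target object for the pan, pinch, and rotation recognizers.
final class LabelGestureHandler: NSObject {
    private let label: UILabel
    private var state = LabelTransformState()

    // Snapshots of each value, taken when the corresponding gesture begins.
    private var initialTranslation: CGPoint = .zero
    private var initialScale: CGFloat = 1
    private var initialRotation: CGFloat = 0

    init(label: UILabel) {
        self.label = label
    }

    @objc func handlePan(_ recognizer: UIPanGestureRecognizer) {
        if recognizer.state == .began { initialTranslation = state.translation }
        // translation(in:) is relative to the start of the gesture.
        let delta = recognizer.translation(in: label.superview)
        state.translation = CGPoint(x: initialTranslation.x + delta.x,
                                    y: initialTranslation.y + delta.y)
        label.transform = state.makeTransform()
    }

    @objc func handlePinch(_ recognizer: UIPinchGestureRecognizer) {
        if recognizer.state == .began { initialScale = state.scale }
        // scale is a factor relative to the start of the gesture.
        state.scale = initialScale * recognizer.scale
        label.transform = state.makeTransform()
    }

    @objc func handleRotation(_ recognizer: UIRotationGestureRecognizer) {
        if recognizer.state == .began { initialRotation = state.rotation }
        // rotation is an angle in radians relative to the start of the gesture.
        state.rotation = initialRotation + recognizer.rotation
        label.transform = state.makeTransform()
    }
}
```

Because every handler assigns `label.transform` from `makeTransform()` rather than modifying the existing transform, an update during a gesture replaces the previous result instead of stacking on top of it.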