As others have said, the specifications are a good introduction. They are technical in nature and worded for precision, but they are some of the best specifications I've seen for any protocol. That is especially true of the latest RFCs (6120 and 6121), which clarify some of the grey areas in the originals (RFCs 3920 and 3921).
For example, you mention wanting to know the definition of a stanza; it's explained, with examples, in RFC 6120 section 8.
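To give a rough feel for what section 8 describes: a stanza is a first-level XML element sent over the stream, and there are three kinds: `<message/>`, `<presence/>` and `<iq/>`. They share common attributes such as `to`, `from`, `id` and `type`. The Python sketch below builds and classifies such elements with the standard library. It is only an illustration. The helper names `make_message` and `stanza_kind` are hypothetical. The sketch also leaves out the surrounding stream, the `jabber:client` default namespace and `xml:lang`, so read the RFC for the real rules.

```python
import xml.etree.ElementTree as ET

# The three stanza kinds defined in RFC 6120 section 8.
STANZA_KINDS = {"message", "presence", "iq"}


def make_message(from_jid, to_jid, body, stanza_id, msg_type="chat"):
    """Build a minimal <message/> stanza as a string (illustrative only;
    no stream wrapper, namespace or xml:lang handling)."""
    msg = ET.Element("message", {
        "from": from_jid,
        "to": to_jid,
        "id": stanza_id,
        "type": msg_type,
    })
    # The human-readable text of a chat message lives in a <body/> child.
    ET.SubElement(msg, "body").text = body
    return ET.tostring(msg, encoding="unicode")


def stanza_kind(xml_text):
    """Return 'message', 'presence' or 'iq' for a stanza, else None."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as '{jabber:client}message';
    # strip the namespace part before comparing.
    tag = root.tag.rsplit("}", 1)[-1]
    return tag if tag in STANZA_KINDS else None


if __name__ == "__main__":
    s = make_message("juliet@example.com/balcony", "romeo@example.net",
                     "Art thou not Romeo?", "ju2ba41c")
    print(s)
    print(stanza_kind(s))
```

In a real client, a library handles this serialization and the stream itself. The point here is only the shape of the element that the RFC calls a stanza.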
If you have any feedback on how the specifications could be made clearer, say so on the XMPP mailing list, where all feedback is considered for the next drafts of the specifications.
If the specifications are really too much for you (I appreciate that some people like more pictures than I do), do consider the book, in paper or digital form. It's designed as an easy introduction to both the core specifications and the most common extensions, and it was written by people who help develop and implement them.