In my program, it gets MP4 video in, and I want it to output a MP3 (without any server-side stuff.) Since Android (and my app) needs to run on many different hardware config
MPEG_4 Part 14 (.mp4 file extension) is a container format. In other words, this specifies how multiple media streams can be packaged together. Processing container formats is much less computationally expensive than - for example - compressing or decompressing video. I would be surprised if it turned out to be too computationally expensive to read through an .mp4 file and extract an audio stream on a cell phone ARM processor.
I haven't seen any immediately suitable Java libraries either. It probably wouldn't be too hard to build your own library. Parsing container formats is much simpler than decompressing video. And you do have the libavformat implementation in ffmpeg as a reference. The MPEG4 Part 14 standards can be found here:
http://webstore.iec.ch/preview/info_isoiec14496-14%7Bed1.0%7Den.pdf
and here:
http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html