Question
I'm currently having problems making my audio and video streams stay synced.
These are the AVCodecContexts I'm using:
For Video:
AVCodec* videoCodec = ffmpeg.avcodec_find_encoder(AVCodecID.AV_CODEC_ID_H264);
AVCodecContext* videoCodecContext = ffmpeg.avcodec_alloc_context3(videoCodec);
videoCodecContext->bit_rate = 400000;
videoCodecContext->width = 1280;
videoCodecContext->height = 720;
videoCodecContext->gop_size = 12;
videoCodecContext->max_b_frames = 1;
videoCodecContext->pix_fmt = videoCodec->pix_fmts[0];
videoCodecContext->codec_id = videoCodec->id;
videoCodecContext->codec_type = videoCodec->type;
videoCodecContext->time_base = new AVRational
{
    num = 1,
    den = 30
};
For Audio:
AVCodec* audioCodec = ffmpeg.avcodec_find_encoder(AVCodecID.AV_CODEC_ID_AAC);
AVCodecContext* audioCodecContext = ffmpeg.avcodec_alloc_context3(audioCodec);
audioCodecContext->bit_rate = 1280000;
audioCodecContext->sample_rate = 48000;
audioCodecContext->channels = 2;
audioCodecContext->channel_layout = ffmpeg.AV_CH_LAYOUT_STEREO;
audioCodecContext->frame_size = 1024;
audioCodecContext->sample_fmt = audioCodec->sample_fmts[0];
audioCodecContext->profile = ffmpeg.FF_PROFILE_AAC_LOW;
audioCodecContext->codec_id = audioCodec->id;
audioCodecContext->codec_type = audioCodec->type;
When writing the video frames, I set the PTS as follows:
outputFrame->pts = frameIndex; // The current index of the image frame being written
I then encode the frame using avcodec_encode_video2(). After this, I call the following to set up the timestamps:
ffmpeg.av_packet_rescale_ts(&packet, videoCodecContext->time_base, videoStream->time_base);
This plays perfectly.
However, when I do the same for audio, the video plays in slow motion: the audio plays first, and then the video carries on afterwards with no sound.
I cannot find an example anywhere of how to set pts/dts values for video/audio in an MP4 file. Any examples or help would be great!
Also, I'm writing the video frames first, after which (once they are all written) I write the audio. I've updated this question with the adjusted values suggested in the comments.
I've uploaded a test video to show my results here: http://www.filedropper.com/test_124
Answer 1:
PS: Check out this article/tutorial on A/V Sync with FFmpeg. It might help you if the below doesn't.
1) Regarding the video & audio timestamps...
Rather than using the current frameIndex as the timestamp and rescaling it later, skip the rescale if possible.
The alternative would then be to make sure PTS values (in outputFrame->pts) are created correctly in the first place by using the video's frames-per-second (FPS). To do this...
For each Video frame : outputFrame->pts = (frameIndex * 1000) / FPS;
(For a 30 FPS video, the first frame has time 0 and by frame 30 the "clock" has reached 1 second. 1000 / 30 gives each video frame a presentation interval of 33.333 ms, so when frameIndex is 30 we get 33.333 x 30 = 1000 ms, i.e. 1 second, confirming 30 frames per second. Multiply before dividing so that integer division does not truncate 1000 / 30 to 33.)
For each Audio frame : outputFrame->pts = (frameIndex * 1024 * 1000) / 48000;
(A 48 kHz AAC frame has a duration of 21.333 ms, so the timestamp increases by that amount per frame. The formula is (1024 PCM samples / SampleRate) x 1000 ms per second, multiplied by the frame index. Written as (1024 / 48000) * 1000 with integers, the division would truncate to 0.)
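The two formulas above can be sketched in plain C#, with no FFmpeg calls. This is only an illustration: it assumes pts is expressed in milliseconds (a time base of 1/1000), which the answer implies but does not state, and the class and method names are hypothetical.

```csharp
public static class MillisecondTimestamps
{
    // Presentation time in ms of a video frame, ASSUMING a 1/1000 time base.
    // Multiplying first keeps integer division from truncating 1000 / fps.
    public static long VideoPtsMs(long frameIndex, int fps)
    {
        return frameIndex * 1000 / fps;
    }

    // Presentation time in ms of an audio frame that holds
    // samplesPerFrame samples per channel (1024 for AAC in this thread).
    public static long AudioPtsMs(long frameIndex, int samplesPerFrame, int sampleRate)
    {
        return frameIndex * samplesPerFrame * 1000 / sampleRate;
    }
}
```

With these helpers, video frame 30 at 30 FPS and audio frame 375 at 48 kHz land on whole seconds (1000 ms and 8000 ms), which is a quick way to check that both streams share one clock.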
2) Regarding the audio settings...
Bit-rate : audioCodecContext->bit_rate = 64000; seems low if your sample_rate is 48000 Hz (and, I assume, your bit depth is 16 bits per sample?).
Try either 96000 or 128000 as lowest starting values.
Frame Size :
int AVCodecContext::frame_size means "Number of samples per channel in an audio frame".
Given the above quote from the docs, note that an AAC frame carries the data for both L/R channels, and each frame holds 1024 PCM samples per channel.
Instead of audioCodecContext->frame_size = 88200; you could try audioCodecContext->frame_size = 1024;
Profile :
I noticed you've used MAIN for the AAC profile. I'm used to seeing Low Complexity in videos. I tried a few random MP4 files from various sources on my HDD and could not find one using the "Main" profile. As a last resort, testing "Low Complexity" won't hurt.
Try using audioCodecContext->profile = ffmpeg.FF_PROFILE_AAC_LOW;
PS: Check this for a possible AAC issue (depending on your FFmpeg version).
Answer 2:
Solved the problem. I've added a new function that sets the video/audio packet timestamps after the frame's PTS has been set.
Video is just the usual increment (+1 for each frame), whereas audio is done as follows:
outputFrame->pts = ffmpeg.av_rescale_q(m_audioFrameSampleIncrement, new AVRational { num = 1, den = 48000 }, m_audioCodecContext->time_base);
m_audioFrameSampleIncrement += outputFrame->nb_samples;
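The running sample counter above can be illustrated without FFmpeg. The sketch below is a simplified stand-in for the av_rescale_q call, not FFmpeg's implementation: it rounds to nearest for non-negative values and ignores overflow. The class name and constructor parameters are hypothetical.

```csharp
public sealed class AudioPtsCounter
{
    private readonly int _sampleRate;
    private readonly int _timeBaseNum;
    private readonly int _timeBaseDen;
    private long _samplesWritten; // total samples per channel emitted so far

    public AudioPtsCounter(int sampleRate, int timeBaseNum, int timeBaseDen)
    {
        _sampleRate = sampleRate;
        _timeBaseNum = timeBaseNum;
        _timeBaseDen = timeBaseDen;
    }

    // Returns the pts of the next frame in the target time base,
    // then advances the counter by that frame's sample count.
    public long Next(int nbSamples)
    {
        // samples * (1 / sampleRate) / (num / den)
        //   = samples * den / (sampleRate * num)
        long numerator = _samplesWritten * _timeBaseDen;
        long denominator = (long)_sampleRate * _timeBaseNum;
        long pts = (numerator + denominator / 2) / denominator; // round to nearest
        _samplesWritten += nbSamples;
        return pts;
    }
}
```

When the target time base is 1/48000, the pts is simply the number of samples written before the frame (0, 1024, 2048, ...), so no rounding error accumulates across frames.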
After the frame is encoded, I call my new function:
private static void SetPacketProperties(ref AVPacket packet, AVCodecContext* codecContext, AVStream* stream)
{
    packet.pts = ffmpeg.av_rescale_q_rnd(packet.pts, codecContext->time_base, stream->time_base, AVRounding.AV_ROUND_NEAR_INF | AVRounding.AV_ROUND_PASS_MINMAX);
    packet.dts = ffmpeg.av_rescale_q_rnd(packet.dts, codecContext->time_base, stream->time_base, AVRounding.AV_ROUND_NEAR_INF | AVRounding.AV_ROUND_PASS_MINMAX);
    packet.duration = (int)ffmpeg.av_rescale_q(packet.duration, codecContext->time_base, stream->time_base);
    packet.stream_index = stream->index;
}
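To make the rounding flags concrete, here is a rough plain-C# sketch of what rescaling a timestamp from the codec time base to the stream time base amounts to. It is an approximation written for illustration, not FFmpeg's code: it assumes positive numerators and denominators and does not guard against overflow. The class and method names are hypothetical.

```csharp
using System;

public static class TimestampRescale
{
    // Converts value from time base srcNum/srcDen to dstNum/dstDen.
    public static long Rescale(long value, int srcNum, int srcDen, int dstNum, int dstDen)
    {
        // Like AV_ROUND_PASS_MINMAX: pass the sentinel values through unchanged
        // (AV_NOPTS_VALUE is INT64_MIN in FFmpeg).
        if (value == long.MinValue || value == long.MaxValue)
        {
            return value;
        }

        long numerator = (long)srcNum * dstDen;
        long denominator = (long)srcDen * dstNum;

        // Like AV_ROUND_NEAR_INF: round to nearest, halfway cases away from zero.
        long scaled = Math.Abs(value) * numerator;
        long rounded = (scaled + denominator / 2) / denominator;
        return value < 0 ? -rounded : rounded;
    }
}
```

For example, if the muxer happened to choose a stream time base of 1/15360 (a made-up value for this example), video frame 30 in the 1/30 codec time base would map to 15360, which is exactly one second in both time bases.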
Source: https://stackoverflow.com/questions/38198052/sync-audio-video-in-mp4-using-autogen-ffmpeg-library