I'm attempting to write a JPEG encoder and am stumbling over the algorithms that gather the appropriate Y, Cb, and Cr color components to pass to the method performing the transform.
As I understand it, the most common subsampling variants are set up as follows (I could be way off here):
- 4:4:4 - An MCU block of 8x8 pixels with Y, Cb, and Cr represented in each pixel.
- 4:2:2 - An MCU block of 16x8 pixels with Y in each pixel and Cb, Cr every two pixels
- 4:2:0 - An MCU block of 16x16 pixels with Y every two pixels and Cb, Cr every four
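To make my reading of these layouts concrete, here is a minimal sketch of the MCU geometry as I understand it. It is not code from the library: `McuLayout` and its members are hypothetical names, and it assumes the usual baseline convention where luma carries sampling factors (H, V) of 1x1, 2x1, or 2x2 and both chroma components are 1x1. It also assumes that, within one interleaved MCU, the Y blocks come first (left to right, top to bottom), followed by one Cb block and one Cr block.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of the library.
public static class McuLayout
{
    // Assumed luma sampling factors (H, V). Chroma is assumed to be 1x1 in every case.
    public static (int H, int V) LumaFactors(string subsampling)
    {
        switch (subsampling)
        {
            case "4:4:4": return (1, 1);
            case "4:2:2": return (2, 1);
            case "4:2:0": return (2, 2);
            default: throw new ArgumentException("Unsupported subsampling: " + subsampling);
        }
    }

    // MCU size in pixels: 8 * H wide and 8 * V high.
    public static (int Width, int Height) McuSize(string subsampling)
    {
        var (h, v) = LumaFactors(subsampling);
        return (8 * h, 8 * v);
    }

    // Lists the 8x8 blocks of one MCU in the assumed interleaved order.
    // Each Y entry carries the pixel offset of its block inside the MCU.
    public static List<string> BlockOrder(string subsampling)
    {
        var (h, v) = LumaFactors(subsampling);
        var blocks = new List<string>();

        for (int by = 0; by < v; by++)
        {
            for (int bx = 0; bx < h; bx++)
            {
                blocks.Add($"Y@({bx * 8},{by * 8})");
            }
        }

        // One block each for the (downsampled) chroma components.
        blocks.Add("Cb");
        blocks.Add("Cr");
        return blocks;
    }
}
```

Under those assumptions, `McuLayout.BlockOrder("4:2:0")` yields four Y blocks followed by Cb and Cr, so six 8x8 blocks per 16x16 MCU, while 4:4:4 yields three blocks per 8x8 MCU.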
The most explicit description of the layout I have found so far is described here
What I don't understand is how to gather those components in the correct order to pass as an 8x8 block for transforming and quantizing.
Would someone be able to write an example (pseudocode would be fine, C# even better) of how to group the bytes for the transform?
Here is the current, incorrect code I am running:
    /// <summary>
    /// Writes the Scan header structure
    /// </summary>
    /// <param name="image">The image to encode from.</param>
    /// <param name="writer">The writer to write to the stream.</param>
    private void WriteStartOfScan(ImageBase image, EndianBinaryWriter writer)
    {
        // Marker
        writer.Write(new[] { JpegConstants.Markers.XFF, JpegConstants.Markers.SOS });

        // Length (high byte, low byte), must be 6 + 2 * (number of components in scan)
        writer.Write((short)0xc); // 12

        byte[] sos =
        {
            3, // Number of components in a scan, usually 1 or 3
            1, // Component Id Y
            0, // DC/AC Huffman table
            2, // Component Id Cb
            0x11, // DC/AC Huffman table
            3, // Component Id Cr
            0x11, // DC/AC Huffman table
            0, // Ss - Start of spectral selection.
            0x3f, // Se - End of spectral selection.
            0 // Ah + Al (Successive approximation bit position high + low)
        };

        writer.Write(sos);

        // Compress and write the pixels
        // Buffers for each Y'Cb Cr component
        float[] yU = new float[64];
        float[] cbU = new float[64];
        float[] crU = new float[64];

        // The discrete cosine values for each component.
        int[] dcValues = new int[3];

        // TODO: Why null?
        this.huffmanTable = new HuffmanTable(null);

        // TODO: Color output is incorrect after this point.
        // I think I've got my looping all wrong.
        // For each row
        for (int y = 0; y < image.Height; y += 8)
        {
            // For each column
            for (int x = 0; x < image.Width; x += 8)
            {
                // Convert the 8x8 array to YCbCr
                this.RgbToYcbCr(image, yU, cbU, crU, x, y);

                // For each component
                this.CompressPixels(yU, 0, writer, dcValues);
                this.CompressPixels(cbU, 1, writer, dcValues);
                this.CompressPixels(crU, 2, writer, dcValues);
            }
        }

        this.huffmanTable.FlushBuffer(writer);
    }

    /// <summary>
    /// Converts the pixel block from the RGBA colorspace to YCbCr.
    /// </summary>
    /// <param name="image"></param>
    /// <param name="yComponant">The container to house the Y' luma component within the block.</param>
    /// <param name="cbComponant">The container to house the Cb chroma component within the block.</param>
    /// <param name="crComponant">The container to house the Cr chroma component within the block.</param>
    /// <param name="x">The x-position within the image.</param>
    /// <param name="y">The y-position within the image.</param>
    private void RgbToYcbCr(ImageBase image, float[] yComponant, float[] cbComponant, float[] crComponant, int x, int y)
    {
        int height = image.Height;
        int width = image.Width;

        for (int a = 0; a < 8; a++)
        {
            // Complete with the remaining right and bottom edge pixels.
            int py = y + a;
            if (py >= height)
            {
                py = height - 1;
            }

            for (int b = 0; b < 8; b++)
            {
                int px = x + b;
                if (px >= width)
                {
                    px = width - 1;
                }

                YCbCr color = image[px, py];
                int index = a * 8 + b;
                yComponant[index] = color.Y;
                cbComponant[index] = color.Cb;
                crComponant[index] = color.Cr;
            }
        }
    }

    /// <summary>
    /// Compresses and encodes the pixels.
    /// </summary>
    /// <param name="componantValues">The current color component values within the image block.</param>
    /// <param name="componantIndex">The component index.</param>
    /// <param name="writer">The writer.</param>
    /// <param name="dcValues">The discrete cosine values for each component</param>
    private void CompressPixels(float[] componantValues, int componantIndex, EndianBinaryWriter writer, int[] dcValues)
    {
        // TODO: This should be an option.
        byte[] horizontalFactors = JpegConstants.ChromaFourTwoZeroHorizontal;
        byte[] verticalFactors = JpegConstants.ChromaFourTwoZeroVertical;
        byte[] quantizationTableNumber = { 0, 1, 1 };
        int[] dcTableNumber = { 0, 1, 1 };
        int[] acTableNumber = { 0, 1, 1 };

        for (int y = 0; y < verticalFactors[componantIndex]; y++)
        {
            for (int x = 0; x < horizontalFactors[componantIndex]; x++)
            {
                // TODO: This can probably be combined reducing the array allocation.
                float[] dct = this.fdct.FastFDCT(componantValues);
                int[] quantizedDct = this.fdct.QuantizeBlock(dct, quantizationTableNumber[componantIndex]);
                this.huffmanTable.HuffmanBlockEncoder(writer, quantizedDct, dcValues[componantIndex], dcTableNumber[componantIndex], acTableNumber[componantIndex]);
                dcValues[componantIndex] = quantizedDct[0];
            }
        }
    }
This code is part of an open-source library I am writing on GitHub.