Delphi: Fast JPEG encoding and decoding with libjpeg-turbo
Once, while profiling a library for remote desktop monitoring, I discovered that JPEG encoding and decoding consumed a large share of the time and resources. After studying third-party ways to speed this up, I decided to use libjpeg-turbo.
Under the cut: a lot of Delphi code and a description of the pitfalls of using the library.
What is all this for?
The standard jpeg.pas unit is a wrapper over libjpeg. libjpeg-turbo was designed as a drop-in replacement for libjpeg, so it has a compatible API while being much faster.
The linked comparison covers libjpeg vs libjpeg-turbo vs Intel IPP. In a nutshell, libjpeg-turbo is about 3 times faster than libjpeg and on par with Intel IPP, while being free.
Project delphi-jpeg-turbo
Before reinventing the wheel, I searched Google and stumbled upon the delphi-jpeg-turbo project. The project is certainly useful, but its implementation did not suit me:
- It is implemented as a descendant of TBitmap, while I use TFastDIB so that the code works properly in multiple threads;
- The header is outdated: the author decodes to RGB and then converts the result pixel by pixel into the BGR layout that Windows GDI requires. This takes a lot of time, although libjpeg-turbo can decode straight to BGR;
- No in-memory loading: libjpeg-turbo has the functions jpeg_mem_src and jpeg_mem_dest, which encode to and decode from a buffer directly, without a pile of intermediate code.
The project itself did not fit, but its headers turned out to be a very useful basis for my own implementation.
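To make the second point of the list concrete, here is a minimal, self-contained sketch of the kind of extra pass an RGB-only decoder forces on the caller. It is not taken from delphi-jpeg-turbo: SwapRgbToBgr and the sample data are made up for illustration, and the sketch assumes tightly packed 3-byte pixels. Requesting BGR output from libjpeg-turbo makes this whole loop unnecessary.

```pascal
program SwapRgbToBgrDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils; // TBytes

// Hypothetical helper: swaps the R and B bytes of every 3-byte pixel in place.
// This is one extra pass over the whole image after decoding.
procedure SwapRgbToBgr(var Pixels: TBytes);
var
  I: Integer;
  Tmp: Byte;
begin
  I := 0;
  while I + 2 < Length(Pixels) do
  begin
    Tmp := Pixels[I];
    Pixels[I] := Pixels[I + 2];
    Pixels[I + 2] := Tmp;
    Inc(I, 3);
  end;
end;

var
  Demo: TBytes;
  I: Integer;
begin
  // Two RGB pixels: (1, 2, 3) and (4, 5, 6)
  SetLength(Demo, 6);
  for I := 0 to 5 do
    Demo[I] := I + 1;

  SwapRgbToBgr(Demo);

  // Prints: 3 2 1 6 5 4
  WriteLn(Demo[0], ' ', Demo[1], ' ', Demo[2], ' ',
    Demo[3], ' ', Demo[4], ' ', Demo[5]);
end.
```

For a full-screen frame this loop touches every byte of the bitmap once more, which is why letting the decoder write BGR directly is the cheaper option.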
What I ended up with
I will not describe the libjpeg-turbo API in detail, since I did not study it deeply myself and, after what I saw, I hope I will not have to dig any further. The source code of the Jpeg.pas unit shipped with Delphi, which uses libjpeg with a compatible API, helped a lot in learning the API. If there is something wrong with my implementation, please correct me.
So, here's what I got:
suJpegTurboUnit.pas
unit suJpegTurboUnit;

interface

uses
  Windows, SysUtils,
  FastDIB;

type
  // Callback invoked once the buffer holding the encoded JPEG is ready
  TOnEncodedJpegBuffer = reference to procedure(ABuffer: Pointer; ABufferSize: LongWord);

// Decodes a JPEG image from a memory buffer
function DecodeJpegTurbo(ABuffer: Pointer; ABufferLen: Integer; HQ: Boolean = True): TFastDIB;

// Encodes an image to JPEG
procedure EncodeJpegTurbo(Source: TFastDIB; Quality: Integer; OnEncodedBuffer: TOnEncodedJpegBuffer);

implementation

uses
  suJpegTurboHeadersUnit,
  suJpegTurboMemDestUnit;

var
  _LibInitialized: LongBool = False;

// Fatal error handler
procedure ErrorExit(cinfo: j_common_ptr); cdecl;
var
  Msg: AnsiString;
begin
  // Fetch the error text
  SetLength(Msg, JMSG_LENGTH_MAX);
  cinfo^.err^.format_message(cinfo, PAnsiChar(Msg));
  // Pass Msg as PAnsiChar to cut off the garbage after the #0 terminator
  raise Exception.CreateFmt('JPEG error #%d (%s)',
    [cinfo^.err^.msg_code, PAnsiChar(Msg)]);
end;

// Handler for printing the error text on screen; left empty so nothing is shown
procedure OutputMessage(cinfo: j_common_ptr); cdecl;
begin
end;

// Initializes the libjpeg-turbo library
procedure InitLib;
begin
  if _LibInitialized then
    Exit;
  // Try to initialize
  if not init_libJPEG then
    raise Exception.Create('initialization of libJPEG failed.');
  // Set the flag saying that the library is initialized
  if InterlockedCompareExchange(Integer(_LibInitialized), Integer(True), Integer(False)) = Integer(True) then
    // The flag was already set, so another thread initialized the library first;
    // unload our own instance
    quit_libJPEG;
end;

// Decoding
function DecodeJpegTurbo(ABuffer: Pointer; ABufferLen: Integer; HQ: Boolean): TFastDIB;
var
  Loop: Integer;
  JpegErr: jpeg_error_mgr;
  Jpeg: jpeg_decompress_struct;
begin
  // Initialize the library
  InitLib;
  FillChar(Jpeg, SizeOf(Jpeg), 0);
  FillChar(JpegErr, SizeOf(JpegErr), 0);
  // Install the default error handler. libjpeg expects this to happen before
  // the decompressor is created, in case the creation step itself fails
  Jpeg.err := jpeg_std_error(@JpegErr);
  // Override the methods of the default handler. By default, any libjpeg error
  // terminates the application and shows the error in a MessageBox
  JpegErr.error_exit := ErrorExit;
  JpegErr.output_message := OutputMessage;
  // Create the decompressor structure
  jpeg_create_decompress(@Jpeg);
  try
    jpeg_mem_src(@Jpeg, ABuffer, ABufferLen);
    // Read the headers to learn the width and height of the image
    jpeg_read_header(@Jpeg, False);
    // We want BGR pixels on output
    Jpeg.out_color_space := JCS_EXT_BGR;
    // Set the scaling to 1:1
    Jpeg.scale_num := 1;
    Jpeg.scale_denom := 1;
    // Speed or good quality
    if HQ then
    begin
      Jpeg.do_block_smoothing := 1;
      Jpeg.do_fancy_upsampling := 1;
      Jpeg.dct_method := JDCT_ISLOW;
    end
    else
    begin
      Jpeg.do_block_smoothing := 0;
      Jpeg.do_fancy_upsampling := 0;
      Jpeg.dct_method := JDCT_IFAST;
    end;
    // Decode the image
    jpeg_start_decompress(@Jpeg);
    try
      Result := TFastDIB.Create(Jpeg.output_width, Jpeg.output_height, 24);
      try
        // Read the scanlines; the DIB is stored bottom-up, hence the reversed index
        for Loop := 0 to Jpeg.output_height - 1 do
          jpeg_read_scanlines(@Jpeg, @Result.Scanlines[Result.Height - 1 - Loop], 1);
      except
        FreeAndNil(Result);
        raise;
      end;
    finally
      // Finish decoding
      jpeg_finish_decompress(@Jpeg);
    end;
  finally
    // Destroy the objects we no longer need
    jpeg_destroy_decompress(@Jpeg);
  end;
end;

// Encoding an image
procedure EncodeJpegTurbo(Source: TFastDIB; Quality: Integer; OnEncodedBuffer: TOnEncodedJpegBuffer);
var
  ScanLine: JSAMPROW;
  CompressedBuff: Pointer;
  CompressedSize: LongWord;
  JpegErr: jpeg_error_mgr;
  Jpeg: jpeg_compress_struct;
begin
  // Initialize the library
  InitLib;
  FillChar(Jpeg, SizeOf(Jpeg), 0);
  FillChar(JpegErr, SizeOf(JpegErr), 0);
  // Install the default error handler. libjpeg expects this to happen before
  // the compressor is created, in case the creation step itself fails
  Jpeg.err := jpeg_std_error(@JpegErr);
  // Override the methods of the default handler. By default, any libjpeg error
  // terminates the application and shows the error in a MessageBox
  JpegErr.error_exit := ErrorExit;
  JpegErr.output_message := OutputMessage;
  // Create the compressor structure
  jpeg_create_compress(@Jpeg);
  try
    CompressedSize := 0;
    CompressedBuff := nil;
    // Use our own jpeg_mem_dest because of memory leaks with the standard one
    suJpegTurboMemDestUnit.jpeg_mem_dest(@Jpeg, @CompressedBuff, @CompressedSize);
    try
      Jpeg.image_width := Source.Width;
      Jpeg.image_height := Source.Height;
      Jpeg.input_components := Source.Info.Header.BitCount div 8;
      Jpeg.in_color_space := JCS_EXT_BGR;
      // Set the defaults
      jpeg_set_defaults(@Jpeg);
      // Compression quality
      jpeg_set_quality(@Jpeg, Quality, True);
      // Start encoding the image
      jpeg_start_compress(@Jpeg, True);
      try
        // The DIB is stored bottom-up, hence the reversed scanline index
        while Jpeg.next_scanline < Jpeg.image_height do
        begin
          ScanLine := JSAMPROW(Source.Scanlines[Jpeg.image_height - Jpeg.next_scanline - 1]);
          jpeg_write_scanlines(@Jpeg, @ScanLine, 1);
        end;
      finally
        // Finish encoding
        jpeg_finish_compress(@Jpeg);
      end;
      // Hand the buffer over to the caller
      if Assigned(OnEncodedBuffer) then
        OnEncodedBuffer(CompressedBuff, CompressedSize);
    finally
      // Free the memory
      FreeMemory(CompressedBuff);
    end;
  finally
    // Destroy the objects we no longer need
    jpeg_destroy_compress(@Jpeg);
  end;
end;

initialization

finalization
  // Unload the library if it was loaded
  if _LibInitialized then
    quit_libJPEG;

end.
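For orientation, here is a hedged sketch of how the two routines from the listing above might be called. It only compiles next to the article's units and FastDIB; the file names, the quality value 80 and the Run procedure are illustrative assumptions, not part of the original sources.

```pascal
program JpegTurboUsageSketch;
{$APPTYPE CONSOLE}

uses
  Classes, SysUtils,
  FastDIB,          // TFastDIB, as used by the unit above
  suJpegTurboUnit;  // DecodeJpegTurbo / EncodeJpegTurbo from the listing above

// Hypothetical driver: decode a JPEG file from memory and re-encode it.
procedure Run;
var
  Src: TMemoryStream;
  Dst: TFileStream;
  Dib: TFastDIB;
begin
  Src := TMemoryStream.Create;
  try
    Src.LoadFromFile('input.jpg'); // assumed file name
    // HQ is left at its default (True)
    Dib := DecodeJpegTurbo(Src.Memory, Integer(Src.Size));
    try
      Dst := TFileStream.Create('output.jpg', fmCreate); // assumed file name
      try
        EncodeJpegTurbo(Dib, 80,
          procedure(ABuffer: Pointer; ABufferSize: LongWord)
          begin
            // The encoder frees the buffer right after this callback returns,
            // so the data has to be copied or written out here
            Dst.WriteBuffer(ABuffer^, ABufferSize);
          end);
      finally
        Dst.Free;
      end;
    finally
      Dib.Free;
    end;
  finally
    Src.Free;
  end;
end;

begin
  Run;
end.
```

The callback design means the caller never owns the compressed buffer: EncodeJpegTurbo allocates it, passes it to the callback and releases it in its own finally block.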
At the time of writing, there was a problem with the jpeg_mem_dest function. As it turned out, it allocates memory internally with the allocator from msvcrt.dll, so the memory has to be freed manually with the matching deallocation function from the same msvcrt.dll.
This did not suit me, because I use a libjpeg-turbo build in which msvcrt is linked statically and the deallocation function is not exported. I had to write my own implementation of jpeg_mem_dest, which uses the standard Delphi memory allocator:
suJpegTurboMemDestUnit.pas
{
  An analogue of jpeg_mem_dest from JpegTurbo.
  The standard implementation requires the memory to be released after encoding
  with the RTL function free, which is not exported and which we cannot reach.
  So I had to write an analogue that allocates and frees memory with
  GetMemory/FreeMemory.
  Accordingly, when jpeg_mem_dest from this unit is used, the memory holding
  the image must be freed with the standard FreeMemory.
  The code is ported to Delphi from jdatadst.c
}
unit suJpegTurboMemDestUnit;

interface

uses
  suJpegTurboHeadersUnit;

procedure jpeg_mem_dest(cinfo: j_compress_ptr; outbuffer: PPointer; outsize: PLongWord);

implementation

const
  OUTPUT_BUF_SIZE = 4096; // choose an efficiently fwrite'able size

type
  my_mem_destination_mgr = record
    pub: jpeg_destination_mgr; // public fields
    outbuffer: PPointer;       // target buffer
    outsize: PLongWord;
    newbuffer: Pointer;        // newly allocated buffer
    buffer: JOCTET_ptr;        // start of buffer
    bufsize: LongWord;
  end;
  my_mem_dest_ptr = ^my_mem_destination_mgr;

// Initialize destination --- called by jpeg_start_compress
// before any data is actually written.
procedure init_mem_destination(cinfo: j_compress_ptr); cdecl;
begin
  // no work necessary here
end;

{
  Empty the output buffer --- called whenever buffer fills up.
  In typical applications, this should write the entire output buffer
  (ignoring the current state of next_output_byte & free_in_buffer),
  reset the pointer & count to the start of the buffer, and return TRUE
  indicating that the buffer has been dumped.
  In applications that need to be able to suspend compression due to output
  overrun, a FALSE return indicates that the buffer cannot be emptied now.
  In this situation, the compressor will return to its caller (possibly with
  an indication that it has not accepted all the supplied scanlines). The
  application should resume compression after it has made more room in the
  output buffer. Note that there are substantial restrictions on the use of
  suspension --- see the documentation.
  When suspending, the compressor will back up to a convenient restart point
  (typically the start of the current MCU). next_output_byte & free_in_buffer
  indicate where the restart point will be if the current call returns FALSE.
  Data beyond this point will be regenerated after resumption, so do not
  write it out when emptying the buffer externally.
}
function empty_mem_output_buffer(cinfo: j_compress_ptr): Boolean; cdecl;
var
  nextsize: LongWord;
  dest: my_mem_dest_ptr;
  nextbuffer: JOCTET_ptr;
begin
  dest := my_mem_dest_ptr(cinfo^.dest);
  // Try to allocate new buffer with double size
  nextsize := dest^.bufsize * 2;
  nextbuffer := GetMemory(nextsize);
  if nextbuffer = nil then
    ERREXIT1(j_common_ptr(cinfo), JERR_OUT_OF_MEMORY, 10);
  Move(dest^.buffer^, nextbuffer^, dest^.bufsize);
  if dest^.newbuffer <> nil then
    FreeMemory(dest^.newbuffer);
  dest^.newbuffer := nextbuffer;
  dest^.pub.next_output_byte := JOCTET_ptr(PByte(nextbuffer) + dest^.bufsize);
  dest^.pub.free_in_buffer := dest^.bufsize;
  dest^.buffer := nextbuffer;
  dest^.bufsize := nextsize;
  Result := True;
end;

// Terminate destination: report the final buffer and the number of bytes written
procedure term_mem_destination(cinfo: j_compress_ptr); cdecl;
var
  dest: my_mem_dest_ptr;
begin
  dest := my_mem_dest_ptr(cinfo^.dest);
  dest^.outbuffer^ := dest^.buffer;
  dest^.outsize^ := dest^.bufsize - dest^.pub.free_in_buffer;
end;

{
  Prepare for output to a memory buffer.
  The caller may supply an own initial buffer with appropriate size.
  Otherwise, or when the actual data output exceeds the given size,
  the library adapts the buffer size as necessary.
  The standard library functions GetMemory/FreeMemory are used for allocating
  larger memory, so the buffer is available to the application after
  finishing compression, and then the application is responsible for
  freeing the requested memory.
}
procedure jpeg_mem_dest(cinfo: j_compress_ptr; outbuffer: PPointer; outsize: PLongWord);
var
  dest: my_mem_dest_ptr;
begin
  if (outbuffer = nil) or (outsize = nil) then
    ERREXIT(j_common_ptr(cinfo), JERR_BUFFER_SIZE);
  if (cinfo^.dest = nil) then // first time for this JPEG object?
    cinfo^.dest := cinfo^.mem.alloc_small(j_common_ptr(cinfo), JPOOL_PERMANENT,
      SizeOf(my_mem_destination_mgr));
  dest := my_mem_dest_ptr(cinfo^.dest);
  dest^.pub.init_destination := init_mem_destination;
  dest^.pub.empty_output_buffer := empty_mem_output_buffer;
  dest^.pub.term_destination := term_mem_destination;
  dest^.outbuffer := outbuffer;
  dest^.outsize := outsize;
  dest^.newbuffer := nil;
  if (outbuffer^ = nil) or (outsize^ = 0) then
  begin
    // Allocate initial buffer
    outbuffer^ := GetMemory(OUTPUT_BUF_SIZE);
    dest^.newbuffer := outbuffer^;
    if dest^.newbuffer = nil then
      ERREXIT1(j_common_ptr(cinfo), JERR_OUT_OF_MEMORY, 10);
    outsize^ := OUTPUT_BUF_SIZE;
  end;
  dest^.buffer := outbuffer^;
  dest^.pub.next_output_byte := dest^.buffer;
  dest^.bufsize := outsize^;
  dest^.pub.free_in_buffer := dest^.bufsize;
end;

end.
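The growth strategy used by empty_mem_output_buffer (allocate a buffer twice as large, copy the old contents, free the old buffer) can be tried in isolation. The following is a simplified, self-contained sketch with no libjpeg involved; TGrowBuffer, InitBuffer and AppendByte are hypothetical names, and allocation failures are not handled, unlike in the unit above.

```pascal
program GrowBufferDemo;
{$APPTYPE CONSOLE}

type
  // Hypothetical stand-in for the destination manager's buffer fields
  TGrowBuffer = record
    Data: PByte;        // start of the buffer
    Capacity: LongWord; // allocated bytes
    Used: LongWord;     // bytes written so far
  end;

procedure InitBuffer(var B: TGrowBuffer; InitialCapacity: LongWord);
begin
  B.Data := GetMemory(InitialCapacity);
  B.Capacity := InitialCapacity;
  B.Used := 0;
end;

// Appends one byte, doubling the buffer when it is full.
// Note: GetMemory returns nil on failure; this sketch does not check for it.
procedure AppendByte(var B: TGrowBuffer; Value: Byte);
var
  Next: PByte;
begin
  if B.Used = B.Capacity then
  begin
    Next := GetMemory(B.Capacity * 2);
    Move(B.Data^, Next^, B.Used); // keep what was written so far
    FreeMemory(B.Data);           // same allocator that produced the block
    B.Data := Next;
    B.Capacity := B.Capacity * 2;
  end;
  (B.Data + B.Used)^ := Value;
  Inc(B.Used);
end;

var
  B: TGrowBuffer;
  I: Integer;
begin
  InitBuffer(B, 4096);
  for I := 0 to 9999 do
    AppendByte(B, Byte(I and $FF));

  // Prints: 10000 bytes used, capacity 16384
  WriteLn(B.Used, ' bytes used, capacity ', B.Capacity);

  FreeMemory(B.Data);
end.
```

Because every block comes from GetMemory, the final buffer can be released with FreeMemory by whoever ends up holding it, which is exactly the property the replacement jpeg_mem_dest is after.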
Finally, the headers. They live in the suJpegTurboHeadersUnit.pas unit and are taken from delphi-jpeg-turbo with a couple of improvements:
suJpegTurboHeadersUnit.pas
{ Known color spaces. }
J_COLOR_SPACE = (
  JCS_UNKNOWN,   { error/unspecified }
  JCS_GRAYSCALE, //* monochrome */
  JCS_RGB,       //* red/green/blue as specified by the RGB_RED, RGB_GREEN,
                 // RGB_BLUE, and RGB_PIXELSIZE macros */
  JCS_YCbCr,     //* Y/Cb/Cr (also known as YUV) */
  JCS_CMYK,      //* C/M/Y/K */
  JCS_YCCK,      //* Y/Cb/Cr/K */
  JCS_EXT_RGB,   //* red/green/blue */
  JCS_EXT_RGBX,  //* red/green/blue/x */
  JCS_EXT_BGR,   //* blue/green/red */
  JCS_EXT_BGRX,  //* blue/green/red/x */
  JCS_EXT_XBGR,  //* x/blue/green/red */
  JCS_EXT_XRGB,  //* x/red/green/blue */
  // When out_color_space is set to JCS_EXT_RGBX, JCS_EXT_BGRX,
  // JCS_EXT_XBGR, or JCS_EXT_XRGB during decompression, the X byte is
  // undefined, and in order to ensure the best performance,
  // libjpeg-turbo can set that byte to whatever value it wishes. Use
  // the following colorspace constants to ensure that the X byte is set
  // to 0xFF, so that it can be interpreted as an opaque alpha
  // channel.
  JCS_EXT_RGBA,  //* red/green/blue/alpha */
  JCS_EXT_BGRA,  //* blue/green/red/alpha */
  JCS_EXT_ABGR,  //* alpha/blue/green/red */
  JCS_EXT_ARGB   //* alpha/red/green/blue */
);
...
{ Standard data source and destination managers: stdio streams. }
{ Caller is responsible for opening the file before and closing after. }
// jpeg_stdio_dest: procedure(cinfo: j_compress_ptr; FILE * outfile); cdecl;
// jpeg_stdio_src: procedure(cinfo: j_decompress_ptr; FILE * infile); cdecl;
jpeg_mem_src: procedure(cinfo: j_decompress_ptr; inbuffer: Pointer; insize: LongWord); cdecl;
// The destination manager belongs to the compressor, so cinfo is a j_compress_ptr,
// and outbuffer is a pointer to the buffer pointer (unsigned char ** in C)
jpeg_mem_dest: procedure(cinfo: j_compress_ptr; outbuffer: PPointer; outsize: PLongWord); cdecl;
...
function init_libJPEG(): Boolean;
...
@jpeg_mem_src := GetProcAddress(libJPEG_Handle, 'jpeg_mem_src');
@jpeg_mem_dest := GetProcAddress(libJPEG_Handle, 'jpeg_mem_dest');
...
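const
  // Version 6b. Declared as a true constant: {$DEFINE} cannot carry a value,
  // while the {$IF Declared(...)} checks below need an identifier they can compare
  JPEG_LIB_VERSION = 62;

type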
J_MESSAGE_CODE = (
JMSG_NOMESSAGE,
{$IF Declared(JPEG_LIB_VERSION) and (JPEG_LIB_VERSION < 70)}
JERR_ARITH_NOTIMPL,
{$IFEND}
JERR_BAD_ALIGN_TYPE,
JERR_BAD_ALLOC_CHUNK,
JERR_BAD_BUFFER_MODE,
JERR_BAD_COMPONENT_ID,
{$IF Declared(JPEG_LIB_VERSION) and (JPEG_LIB_VERSION >= 70)}
JERR_BAD_CROP_SPEC,
{$IFEND}
JERR_BAD_DCT_COEF,
JERR_BAD_DCTSIZE,
{$IF Declared(JPEG_LIB_VERSION) and (JPEG_LIB_VERSION >= 70)}
JERR_BAD_DROP_SAMPLING,
{$IFEND}
JERR_BAD_HUFF_TABLE,
JERR_BAD_IN_COLORSPACE,
JERR_BAD_J_COLORSPACE,
JERR_BAD_LENGTH,
JERR_BAD_LIB_VERSION,
JERR_BAD_MCU_SIZE,
JERR_BAD_POOL_ID,
JERR_BAD_PRECISION,
JERR_BAD_PROGRESSION,
JERR_BAD_PROG_SCRIPT,
JERR_BAD_SAMPLING,
JERR_BAD_SCAN_SCRIPT,
JERR_BAD_STATE,
JERR_BAD_STRUCT_SIZE,
JERR_BAD_VIRTUAL_ACCESS,
JERR_BUFFER_SIZE,
JERR_CANT_SUSPEND,
JERR_CCIR601_NOTIMPL,
JERR_COMPONENT_COUNT,
JERR_CONVERSION_NOTIMPL,
JERR_DAC_INDEX,
JERR_DAC_VALUE,
JERR_DHT_INDEX,
JERR_DQT_INDEX,
JERR_EMPTY_IMAGE,
JERR_EMS_READ,
JERR_EMS_WRITE,
JERR_EOI_EXPECTED,
JERR_FILE_READ,
JERR_FILE_WRITE,
JERR_FRACT_SAMPLE_NOTIMPL,
JERR_HUFF_CLEN_OVERFLOW,
JERR_HUFF_MISSING_CODE,
JERR_IMAGE_TOO_BIG,
JERR_INPUT_EMPTY,
JERR_INPUT_EOF,
JERR_MISMATCHED_QUANT_TABLE,
JERR_MISSING_DATA,
JERR_MODE_CHANGE,
JERR_NOTIMPL,
JERR_NOT_COMPILED,
{$IF Declared(JPEG_LIB_VERSION) and (JPEG_LIB_VERSION >= 70)}
JERR_NO_ARITH_TABLE,
{$IFEND}
JERR_NO_BACKING_STORE,
JERR_NO_HUFF_TABLE,
JERR_NO_IMAGE,
JERR_NO_QUANT_TABLE,
JERR_NO_SOI,
JERR_OUT_OF_MEMORY,
JERR_QUANT_COMPONENTS,
JERR_QUANT_FEW_COLORS,
JERR_QUANT_MANY_COLORS,
JERR_SOF_DUPLICATE,
JERR_SOF_NO_SOS,
JERR_SOF_UNSUPPORTED,
JERR_SOI_DUPLICATE,
JERR_SOS_NO_SOF,
JERR_TFILE_CREATE,
JERR_TFILE_READ,
JERR_TFILE_SEEK,
JERR_TFILE_WRITE,
JERR_TOO_LITTLE_DATA,
JERR_UNKNOWN_MARKER,
JERR_VIRTUAL_BUG,
JERR_WIDTH_OVERFLOW,
JERR_XMS_READ,
JERR_XMS_WRITE
);
procedure ERREXIT(cinfo: j_common_ptr; code: J_MESSAGE_CODE);
procedure ERREXIT1(cinfo: j_common_ptr; code: J_MESSAGE_CODE; p1: Integer);
...
// Macros from jerror.h
// Fatal errors (print message and exit)
procedure ERREXIT(cinfo: j_common_ptr; code: J_MESSAGE_CODE);
begin
  cinfo^.err^.msg_code := Ord(code);
  cinfo^.err^.error_exit(j_common_ptr(cinfo));
end;

procedure ERREXIT1(cinfo: j_common_ptr; code: J_MESSAGE_CODE; p1: Integer);
begin
  cinfo^.err^.msg_code := Ord(code);
  cinfo^.err^.msg_parm.i[0] := p1;
  cinfo^.err^.error_exit(j_common_ptr(cinfo));
end;
The code was written and tested on Delphi 2010.
All sources can be downloaded here