lunedì 21 luglio 2008

Data compression with ZLib

As you read on my previous post for store images on ggredit file I write the raw Pixbuf buffer using Base64 encoding. However for big images this method cause performance to fall down. If I load an image that take 3Mb on memory, when I store it on the ggredit file it take 4Mb, this cause a slowly file reading/writing (using libXml2) and also a generic slow because ggredit store this buffer n-times (for undo/redo). My idea is to use data compression.



My first approach was to write my own compression algorithm. I start to implement the lzw algorithm, this algorithm fascinate me. But when I read about some patents on this algorithm (LZW Patents), sicerly I don't understand this long story, so for avoid any problem I try to search an other way.
I have a look to Inkscape (my favourite gtkmm project) references and I find ZLib.
ZLib is a C library that implement the Deflate algorithm (originally used by PKZip, many many years ago, I remember its use at school time).
So I start to write a tiny C++ wrapper.
On ZLib each function refer to a z_stream object, and we can use a z_stream object for inflate or for deflate using the initialization functions: deflateInit for create a compression stream or inflateInit for create a decompression stream. If the z_stream object was initialized using deflateInit we need to call deflateEnd when the compression operation is complete, if the z_stream object was initialized using inflateInit. we need to call inflateEnd when the decompression operation is complete.
This is my ZStream architecture (very simple):



The ZStreamBase class is the z_stream wrapper.

/*!
* @brief Base class for Zlib streams
*/
class ZStreamBase
{
public:
/*!
* @brief Buffert type
*/
typedef unsigned char char_t;

/*!
* @brief Constructor
*/
ZStreamBase();

/*!
* @brief Destructor
*/
virtual ~ZStreamBase();

/*!
* @brief Byte available in input
*
* @return true if not data are available in input
*/
bool isInEmpty() const { return z_stream_.avail_in == 0; };

/*!
* @briet Byte available in output
*
* @return true if not data are available in output
*/
bool isOutEmpty() const { return z_stream_.avail_out == 0; };

/*!
* @brief Stream input complete on inflate
*
* @return true if the stream input is compltete on inflate
*/
bool isStreamEnd() const { return stream_end_; }

/*!
* @brief Write a buffer to the ZLib stream
*
* @param buf pointer to the buffer
* @param buf_size buffer size
*/
void write(char_t* buf, size_t buf_size);

/*!
* @brief Read a buffer from the ZLib stream
*
* @param buf pointer to the buffer
* @param buf_size buffer size
*/
virtual void read(char_t* buf, size_t buf_size) = 0;

/*!
* @brief Read all ZLib output stream
*
* Use the read function for read each data chunk until
* the output buffer is empty, isOutEmpty return true.
*
* @param buf return a new allocated buffer
* @param buf_size return the size of the allocated buffer
*/
void readAll(char_t** buf, size_t& buf_size);

/*!
* @brief Mark as complete the write operation
*/
void flush() { flush_ = true; };

protected:
z_stream z_stream_;
bool flush_;
bool stream_end_;
};



Initialization


The two inherited classes ZDeflateStream and ZInflateStream initialize the z_stream object for compress and for decompress data.
The deflate stream require the compression level when we initialize the structure. I use an optional argument equal to Z_DEFAULT_COMPRESSION for default.
ZDeflateStream::ZDeflateStream(int compression_level/*=Z_DEFAULT_COMPRESSION*/) :
compression_level_( compression_level )
{
int ret = deflateInit (&z_stream_, compression_level_);

if (ret != Z_OK)
throw ZError ("Can not initialize deflate stream");
}

ZDeflateStream::~ZDeflateStream()
{
deflateEnd (&z_stream_);
}


The inflate stream does not need to specify the compression level.

ZInflateStream::ZInflateStream()
{
z_stream_.avail_in = 0;
z_stream_.next_in = Z_NULL;

int ret = inflateInit (&z_stream_);

if (ret != Z_OK)
throw ZError ("Can not initialize inflate stream");
}

ZInflateStream::~ZInflateStream()
{
inflateEnd (&z_stream_);
}


Read and write data


For read and write data we have to use the next_in, avail_in and next_out, avail_out members of the z_stream structure.
We can write data setting the next_in to the pointer to the buffer we want to write and put on avail_in the buffer size. On this way we can write either compressed or decompressed data, so I implement the write method in the base class.

void ZStreamBase::write(char_t* buf, size_t buf_size)
{
z_stream_.avail_in = buf_size;
z_stream_.next_in = reinterpret_cast< Bytef* >(buf);
}



The read operation instead depend on the type of stream, on deflate stream we have to call deflate() function, on inflate stream we have to call inflate() function. Those two functions read the buffer in next_in and write compressed or decompressed data on the next_out buffer.
On deflate stream we read compressed data, we have to pass to the deflate() function an argument that tell if the input stream is complete. I use the boolean attribute flush_ that is set to true when the flush() method is called.
void ZDeflateStream::read(char_t* buf, size_t buf_size)
{
z_stream_.avail_out = buf_size;
z_stream_.next_out = reinterpret_cast< Bytef* >(buf);

int ret = deflate (&z_stream_, flush_ ? Z_FINISH : Z_NO_FLUSH);

if (ret == Z_STREAM_ERROR)
throw ZError( "Error on deflate" );
}


When we read from an inflate stream we read decompressed data.

void ZInflateStream::read(char_t* buf, size_t buf_size)
{
z_stream_.avail_out = buf_size;
z_stream_.next_out = reinterpret_cast< Bytef* >(buf);

int ret = inflate (&z_stream_, Z_NO_FLUSH);

switch (ret)
{
case Z_STREAM_ERROR:
throw ZError( "The stream structure was inconsistent" );
break;
case Z_NEED_DICT:
throw ZError( "A preset dictionary is needed" );
break;
case Z_DATA_ERROR:
throw ZError( "Input data was corrupted" );
break;
case Z_MEM_ERROR:
throw ZError( "Not enough memory" );
break;
case Z_STREAM_END:
stream_end_ = true;
break;
}
}


The ZStreamBase base class implement also a readAll method that read all data from the input buffer and return an allocated buffer, on this way we have not to take care of the buffer size.
The std::stringstream class help us to allocate the temporary buffer.

const size_t ChunkSize = 16384;
.
.
.
void ZStreamBase::readAll(char_t** buf, size_t& buf_size)
{
std::stringstream out_stream;

char_t out_buf[ChunkSize];

do
{
read (out_buf, ChunkSize);

size_t out_size = ChunkSize - z_stream_.avail_out;
out_stream.write (reinterpret_cast< const char* >(out_buf), out_size);

if (out_stream.bad())
throw ZError( "Error writing temporary stream" );

} while (isOutEmpty());

buf_size = out_stream.str().size();
*buf = new char_t[buf_size];
out_stream.read (reinterpret_cast< std::stringstream::char_type* >(*buf), buf_size);;
}



I use this technique on ggredit, and the performance now are immproved. LibXml2 have to read less data, so the file loading (and parsing) now is fast. Also the copy of the buffer is fast, and this is strange because the complexity is increased.
My first method was encode the buffer using Base64 for generate a string from a source object, decode the Base64 string and create a second buffer on the destination object. Now I compress the source buffer, encode the compressed data using Base64 for generate a string from a source object, decode the Base64 string for obtain compressed buffer, decompress the buffer and finally I get the destination object.

So sometimes the size make the difference.

4 commenti:

  1. Good job.

    You cold also use the faster (with less compression ratio) methods such as LZO. It seems the best for in-memory compression.

    RispondiElimina
  2. Davidson garment is a quintessential Harley abercrombie Outlet davidson item, especially if you feature ones Harley decked abercrombie and fitch over. Featuring the whole distinctive line of leather abercrombie sale jackets, t-shirts, buckles, hats, belts, boots abercrombie & fitch and helmets, feeling of guilt reason this is abercrombie not to accessorize to max.A decent buy Harley enthusiast will abercrombie uk go in closet and grab a Harley t-shirt, leather jacket and boots. But, abercrombie london does your closet contain present accessory considering all abercrombie and Fitch Polo of? The Harley helmet. With out them your road abercrombie Polos trip could end in disaster.

    RispondiElimina
  3. Doing the same thing for thomas sabo diamonds is going to cost you thomas sabo sale an arm and a leg as the colored diamonds thomas sabo jewellery are hard to come by. thomas sabo charms The deposit and jewelry boxes you have, thomas sabo online as well as desk and cabinets, vehicle doors and windows, sabo jewellery home and store front doors can all get ruined. thomas sabo charms sale Using a skilled locksmith you'll cheap thomas sabo charms be able to have the assurance that you get the best work that will be done right and guaranteed.All to often, discount thomas sabo charms instead of finding an automotive locksmith, thomas sabo charms clearance Washington DC residents try to take matters in to their own hands.

    RispondiElimina
  4. Diese süße und fröhliche Kollektion thomas sabo charm umfasst Bands in den Formen der Diademe und Kronen mit thomas sabo anhänger Steinen, Burgen, Zauberstäbe, thomas sabo charms reduziert Glas Pantoffeln und Prinzessinnen selbst! thomas sabo charms günstig Und sie kommen in einer Vielzahl von noblen thomas sabo anhänger günstig und girly Farben: rosa, weiß, lila, thomas sabo anhänger billig blau und gold. Alle Mädchen wollen in einigen thomas sabo 2010 zu investieren und dann kann es losgehen dass Handelssystem und thomas sabo reduziert Sie erhalten alle Prinzessinnen euch sein! Und für Sie gibt es die thomas sabo shop Jungs Silly Bandz-Sea Creatures-Bereich.

    RispondiElimina