I'm trying to take 21 bytes of data that uniquely identify a trade and store them in a 16-byte char
array. I'm having trouble coming up with the right algorithm for this.
The trade ID which I'm trying to compress consists of 2 fields:
- 18 characters, each an ASCII character in the range 0x20 to 0x7E, inclusive (32-126)
- A 3-character numeric string "000" to "999"
So a C++ class that would encompass this data looks like this:
class ID
{
public:
    char trade_num_[18];
    char broker_[3];
};
This data needs to be stored in a 16-char data structure, which looks like this:
class Compressed
{
public:
    char sku_[16];
};
I tried to take advantage of the fact that, since the characters in trade_num_
are only 0-127, there is one unused bit in each character. Similarly, 999 in binary is 1111100111, which is only 10 bits -- 6 bits short of a 2-byte word. But when I work out how much I can squeeze this down, the smallest I can make it is 17 bytes (18 × 7 = 126 bits, plus 10 bits, is 136 bits, which rounds up to 17 bytes); one byte too big.
Any ideas?
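For concreteness, here is a sketch of the 7-bit packing attempt described above (the helper names are my own, not from any spec). It makes the byte count explicit: 18 × 7 + 10 = 136 bits, which is exactly 17 bytes.

```cpp
#include <cassert>
#include <vector>

// Append the low 'nbits' bits of 'value' to 'out', most significant bit first.
void put_bits(std::vector<unsigned char>& out, int& bitpos, unsigned value, int nbits)
{
    for (int b = nbits - 1; b >= 0; --b) {
        if (bitpos % 8 == 0) out.push_back(0);
        if ((value >> b) & 1) out.back() |= 1u << (7 - bitpos % 8);
        ++bitpos;
    }
}

// Pack eighteen 7-bit characters plus one 10-bit broker number: 136 bits total.
std::vector<unsigned char> pack7(const char trade_num[18], unsigned broker)
{
    std::vector<unsigned char> out;
    int bitpos = 0;
    for (int i = 0; i < 18; ++i)
        put_bits(out, bitpos, (unsigned char)trade_num[i], 7);
    put_bits(out, bitpos, broker, 10);
    return out; // 136 bits -> 17 bytes, one byte over the target
}
```

No matter how the bits are arranged, this approach cannot do better than 17 bytes, because it assigns 7 bits per character regardless of how much of the 128-value range is actually used.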
By the way, trade_num_ is a misnomer. It can contain letters and other characters. That's what the spec says.
EDIT: Sorry for the confusion. The trade_num_ field is indeed 18 bytes and not 16. After I posted this thread my internet connection died and I could not get back to this thread until just now.
EDIT2: I think it is safe to make an assumption about the dataset. For the trade_num_ field, we can assume that the non-printable ASCII characters 0-31 will not be present, nor will ASCII codes 126 (~) or 127. All the others might be present, including upper- and lower-case letters, numbers and punctuation. This leaves a total of 94 characters in the set that trade_num_ will be drawn from: ASCII codes 32 through 125, inclusive.
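With this restriction the ID carries 18 · log2(94) + log2(1000) ≈ 127.95 bits of information, so it just fits in 128 bits. A sketch of the resulting mixed-radix packing follows -- the function names are my own, and it relies on the unsigned __int128 extension available in GCC and Clang, not standard C++:

```cpp
#include <cstring>

// Treat trade_num_ as an 18-digit base-94 number (ASCII 32..125 -> 0..93) and
// broker_ as one base-1000 digit. The combined value is < 94^18 * 1000 < 2^128,
// so it fits in 16 bytes. unsigned __int128 is a GCC/Clang extension.
void pack(const char trade_num[18], const char broker[3], unsigned char out[16])
{
    unsigned __int128 v = 0;
    for (int i = 0; i < 18; ++i)
        v = v * 94 + (unsigned char)(trade_num[i] - 32);
    unsigned b = (broker[0] - '0') * 100 + (broker[1] - '0') * 10 + (broker[2] - '0');
    v = v * 1000 + b;
    for (int i = 15; i >= 0; --i) { // store big-endian
        out[i] = (unsigned char)(v & 0xFF);
        v >>= 8;
    }
}

void unpack(const unsigned char in[16], char trade_num[18], char broker[3])
{
    unsigned __int128 v = 0;
    for (int i = 0; i < 16; ++i)
        v = (v << 8) | in[i];
    unsigned b = (unsigned)(v % 1000);
    v /= 1000;
    broker[0] = (char)('0' + b / 100);
    broker[1] = (char)('0' + (b / 10) % 10);
    broker[2] = (char)('0' + b % 10);
    for (int i = 17; i >= 0; --i) { // peel digits off least significant first
        trade_num[i] = (char)(32 + (unsigned)(v % 94));
        v /= 94;
    }
}
```

The encoding round-trips exactly: unpack(pack(id)) recovers the original 21 bytes, at the cost of multi-precision arithmetic instead of simple bit shifts.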