A very old routine I wrote on an 8-bit system, simple in both design and implementation, is the 3:4 encoder and, later, the 7:8 encoder.

First and foremost: it was written for PETSCII, not ASCII, although a simple translation table makes it usable with ASCII (see later).

I made the 3:4 first, which in essence was for text that required no more than 64 different symbols. The text itself was for a demo scrolling message.
I noticed that compression did not do particularly well on the text, so I gave it some thought and realised that I was wasting 2 bits per byte. That meant I could roll the bits from 4 bytes into 3 bytes, which was far better than the compression results available at the time.

Later on, I needed to use upper and lower case text for a magazine. Obviously 64 symbols was not enough, but 128 was. Thus the 7:8 encoder was made by removing the 8th bit from all characters, then rolling 1 bit from the last character into that position in each of the other 7.

The naming is pretty self-explanatory: 3:4 means 4 bytes become 3, and 7:8 means 8 bytes become 7.
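
The ratios translate directly into output sizes. As a rough sketch (the helper names and the policy of padding a short final group up to a full group are my own assumptions, not part of the original routine), the packed size could be worked out like this:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes needed to store `symbols` characters when every
// group of `group_in` input bytes is packed into `group_out` output bytes.
// Assumption: a final partial group is padded up to a full group.
std::size_t packed_size(std::size_t symbols, std::size_t group_in, std::size_t group_out)
{
    assert(group_in > 0 && group_out <= group_in);
    std::size_t groups = (symbols + group_in - 1) / group_in; // round up
    return groups * group_out;
}

// 3:4 packing: every 4 six-bit symbols take 3 bytes.
std::size_t packed_size_34(std::size_t symbols) { return packed_size(symbols, 4, 3); }

// 7:8 packing: every 8 seven-bit symbols take 7 bytes.
std::size_t packed_size_78(std::size_t symbols) { return packed_size(symbols, 8, 7); }
```

For example, a 1000-character text would take 750 bytes in 3:4 form and 875 bytes in 7:8 form.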

A very interesting side effect is that there are numerous positions the borrowed bits could be shifted to, so the choice of layout can also add some obfuscation if desired.
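
To illustrate one such alternative placement, here is a sketch of a 3:4 variant that keeps each symbol in the top 6 bits and puts the borrowed pair in the low 2 bits. The struct and function names are hypothetical and this is only one of many possible layouts, not the one I originally used:

```cpp
#include <cassert>
#include <cstdint>

struct Symbols4 { std::uint8_t s1, s2, s3, s4; }; // four 6-bit symbols (0..63)
struct Packed3  { std::uint8_t b1, b2, b3; };     // three packed bytes

// Variant layout: symbol in bits 7..2, borrowed pair from s4 in bits 1..0.
Packed3 encode34_low(Symbols4 s)
{
    assert(s.s1 < 64 && s.s2 < 64 && s.s3 < 64 && s.s4 < 64);
    Packed3 p;
    p.b1 = static_cast<std::uint8_t>((s.s1 << 2) | ((s.s4 >> 4) & 0x03));
    p.b2 = static_cast<std::uint8_t>((s.s2 << 2) | ((s.s4 >> 2) & 0x03));
    p.b3 = static_cast<std::uint8_t>((s.s3 << 2) | (s.s4 & 0x03));
    return p;
}

// Reverse of encode34_low: shift the symbols back down and rebuild s4
// from the three low pairs.
Symbols4 decode34_low(Packed3 p)
{
    Symbols4 s;
    s.s1 = static_cast<std::uint8_t>(p.b1 >> 2);
    s.s2 = static_cast<std::uint8_t>(p.b2 >> 2);
    s.s3 = static_cast<std::uint8_t>(p.b3 >> 2);
    s.s4 = static_cast<std::uint8_t>(((p.b1 & 0x03) << 4) | ((p.b2 & 0x03) << 2) | (p.b3 & 0x03));
    return s;
}
```

Data packed with one layout will not decode correctly with another, which is where the obfuscation comes from.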

I later posted this idea on Blackcode, where some people saw the potential and made their own variants. The main issue with using it for ASCII is the way upper and lower case are encoded in ASCII. One person (whose name I have honestly forgotten) wrote a simple translation table, whereby the symbol 'a' would be shifted to hex value 0x01, 'b' to 0x02 etc., and lesser-used symbols such as 0x0a and 0x0d would be moved up to 0x3e and 0x3f. This is down to the creator: you can optimise the table for various things, or simply use it for obfuscation.
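
A translation table along those lines might look like the sketch below. Only the 'a' = 0x01, 'b' = 0x02, 0x0a = 0x3e and 0x0d = 0x3f placements come from the description above; the rest of the alphabet, the folding of upper case to lower case, and the names kAlphabet, to_code and from_code are my own assumptions for illustration:

```cpp
#include <cassert>
#include <cstring>

// Assumed 64-symbol alphabet: index = 6-bit code.
// 0x00 = space, 0x01..0x1a = a..z, 0x1b..0x24 = digits,
// then punctuation, with 0x3e = '\n' (0x0a) and 0x3f = '\r' (0x0d).
constexpr char kAlphabet[] =
    " abcdefghijklmnopqrstuvwxyz0123456789.,!?'\"-:;()/+*=&%#@<>[]_$\n\r";
static_assert(sizeof(kAlphabet) == 65, "alphabet must hold exactly 64 symbols");

// Map an ASCII character to its 6-bit code.
// Assumption: upper case is folded to lower case and unknown
// characters fall back to code 0 (space).
unsigned char to_code(char c)
{
    if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
    const void* hit = (c != '\0') ? std::memchr(kAlphabet, c, 64) : nullptr;
    if (!hit) return 0;
    return static_cast<unsigned char>(static_cast<const char*>(hit) - kAlphabet);
}

// Map a 6-bit code back to its ASCII character.
char from_code(unsigned char code)
{
    assert(code < 64);
    return kAlphabet[code];
}
```

Every symbol would be passed through to_code before packing and through from_code after unpacking. Reordering the alphabet string changes the mapping without touching the packing code.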

The first step is to decide where you want to place the bits. The following table shows a typical 4 byte breakdown into bits:

First digit = Byte, second digit = Bit
11 12 13 14 15 16 17 18
21 22 23 24 25 26 27 28
31 32 33 34 35 36 37 38
41 42 43 44 45 46 47 48

As we remove the 7th & 8th bits, we only focus on bits 1-6. This is the simplest way of making a 3:4 encoder:

First digit = Byte, second digit = Bit
41 42 11 12 13 14 15 16
43 44 21 22 23 24 25 26
45 46 31 32 33 34 35 36

The 6 bits we keep from the 4th symbol are split into three 2-bit pairs, and each pair is shifted into the two free bits at the start of one of the other bytes. This method is probably the easiest to understand, and it is fast enough to be executed at run time even on an 8-bit computer.

I wrote a very simple C++ routine for this:

/* 34 Encoder/decoder method & routine (C) by The_Original_Sin */

#include <stdio.h>

typedef unsigned char BYTE;

BYTE o1 = 0, o2 = 0, o3 = 0, o4 = 0; // output bytes
BYTE v1 = 0, v2 = 0, v3 = 0, v4 = 0; // input buffers

void encode(BYTE v1, BYTE v2, BYTE v3, BYTE v4)
{
    // Remove the 7th & 8th bits, and roll in a 2-bit pair from the 4th byte
    o1 = ((v1 & 0x3f) | ((v4 & 0x30) << 2)); // top pair of v4
    o2 = ((v2 & 0x3f) | ((v4 & 0x0c) << 4)); // middle pair of v4
    o3 = ((v3 & 0x3f) | ((v4 & 0x03) << 6)); // low pair of v4
}

void decode(BYTE v1, BYTE v2, BYTE v3)
{
    // Decoding the first 3 bytes is simply removing the 7th & 8th bits
    o1 = (v1 & 0x3f);
    o2 = (v2 & 0x3f);
    o3 = (v3 & 0x3f);
    // 4th byte is decoded by rolling the 2-bit pairs out from the others
    o4 = ((v1 & 0xc0) >> 2) |
         ((v2 & 0xc0) >> 4) |
         ((v3 & 0xc0) >> 6);
}

int main()
{
    encode(0x08, 0x05, 0x20, 0x21);                     // encode these bytes
    printf("%02x %02x %02x\n", o1, o2, o3);             // show encoded bytes
    decode(o1, o2, o3);                                 // decode bytes
    printf("%02x %02x %02x %02x\n", o1, o2, o3, o4);    // show the output
    return 0;
}


To use it in your own routine, you merely need the 4 input buffers (defined v1-4), the 4 output bytes (defined o1-4), and the encode and decode routines. I would like to note, though, that it is NOT for use by US Citizens (by breaking those terms, you bear sole responsibility for any actions I may take as a result).
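
For whole buffers rather than single groups, the same idea can be wrapped in a loop. This is only a sketch: the names pack34 and unpack34, the use of std::vector, and padding a short final group with zero symbols are my assumptions. It takes the borrowed pairs from masks 0x30, 0x0c and 0x03 of every 4th symbol:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack 6-bit symbols (values 0..63) four at a time into three bytes.
// Assumption: a short final group is padded with zero symbols.
std::vector<std::uint8_t> pack34(const std::vector<std::uint8_t>& in)
{
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < in.size(); i += 4) {
        std::uint8_t v[4] = {0, 0, 0, 0};
        for (std::size_t k = 0; k < 4 && i + k < in.size(); ++k) {
            assert(in[i + k] < 64); // only 6-bit symbols fit
            v[k] = in[i + k];
        }
        out.push_back(static_cast<std::uint8_t>(v[0] | ((v[3] & 0x30) << 2)));
        out.push_back(static_cast<std::uint8_t>(v[1] | ((v[3] & 0x0c) << 4)));
        out.push_back(static_cast<std::uint8_t>(v[2] | ((v[3] & 0x03) << 6)));
    }
    return out;
}

// Unpack three bytes at a time back into four 6-bit symbols.
std::vector<std::uint8_t> unpack34(const std::vector<std::uint8_t>& in)
{
    assert(in.size() % 3 == 0); // packed data always comes in groups of 3
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i + 2 < in.size(); i += 3) {
        out.push_back(static_cast<std::uint8_t>(in[i] & 0x3f));
        out.push_back(static_cast<std::uint8_t>(in[i + 1] & 0x3f));
        out.push_back(static_cast<std::uint8_t>(in[i + 2] & 0x3f));
        out.push_back(static_cast<std::uint8_t>(((in[i] & 0xc0) >> 2) |
                                                ((in[i + 1] & 0xc0) >> 4) |
                                                ((in[i + 2] & 0xc0) >> 6)));
    }
    return out;
}
```

Because of the padding, the unpacked length is always a multiple of 4, so the original length would have to be stored separately or a terminator symbol reserved.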

Once you understand the simple concept of the 3:4 coder, the 7:8 coder is less daunting.

First digit = Byte, second digit = Bit
11 12 13 14 15 16 17 18
21 22 23 24 25 26 27 28
31 32 33 34 35 36 37 38
41 42 43 44 45 46 47 48
51 52 53 54 55 56 57 58
61 62 63 64 65 66 67 68
71 72 73 74 75 76 77 78
81 82 83 84 85 86 87 88

We remove the 8th bits, keeping only bits 1-7, then roll a single bit from the 8th byte into each of the other 7 bytes:

First digit = Byte, second digit = Bit
81 11 12 13 14 15 16 17
82 21 22 23 24 25 26 27
83 31 32 33 34 35 36 37
84 41 42 43 44 45 46 47
85 51 52 53 54 55 56 57
86 61 62 63 64 65 66 67
87 71 72 73 74 75 76 77


The C++ code for this would look something like:

/* 78 Encoder/decoder method & routine (C) by The_Original_Sin */

#include <stdio.h>

typedef unsigned char BYTE;

BYTE o1 = 0, o2 = 0, o3 = 0, o4 = 0, o5 = 0, o6 = 0, o7 = 0, o8 = 0; // output bytes
BYTE v1 = 0, v2 = 0, v3 = 0, v4 = 0, v5 = 0, v6 = 0, v7 = 0, v8 = 0; // input buffers

void encode(BYTE v1, BYTE v2, BYTE v3, BYTE v4, BYTE v5, BYTE v6, BYTE v7, BYTE v8)
{
    // Remove the 8th bit, and roll in a single bit from the 8th byte
    o1 = ((v1 & 0x7f) | ((v8 & 0x40) << 1));
    o2 = ((v2 & 0x7f) | ((v8 & 0x20) << 2));
    o3 = ((v3 & 0x7f) | ((v8 & 0x10) << 3));
    o4 = ((v4 & 0x7f) | ((v8 & 0x08) << 4));
    o5 = ((v5 & 0x7f) | ((v8 & 0x04) << 5));
    o6 = ((v6 & 0x7f) | ((v8 & 0x02) << 6));
    o7 = ((v7 & 0x7f) | ((v8 & 0x01) << 7));
}

void decode(BYTE v1, BYTE v2, BYTE v3, BYTE v4, BYTE v5, BYTE v6, BYTE v7)
{
    // Decoding the first 7 bytes is simply removing the 8th bit
    o1 = (v1 & 0x7f);
    o2 = (v2 & 0x7f);
    o3 = (v3 & 0x7f);
    o4 = (v4 & 0x7f);
    o5 = (v5 & 0x7f);
    o6 = (v6 & 0x7f);
    o7 = (v7 & 0x7f);
    // 8th byte is decoded by rolling the top bits out from the others
    o8 = ((v1 & 0x80) >> 1) |
         ((v2 & 0x80) >> 2) |
         ((v3 & 0x80) >> 3) |
         ((v4 & 0x80) >> 4) |
         ((v5 & 0x80) >> 5) |
         ((v6 & 0x80) >> 6) |
         ((v7 & 0x80) >> 7);
}

int main()
{
    // show the input bytes, the encoded bytes, then the decoded output
    printf("%02x %02x %02x %02x %02x %02x %02x %02x\n", 0x08, 0x05, 0x20, 0x21, 0x08, 0x05, 0x20, 0x21);
    encode(0x08, 0x05, 0x20, 0x21, 0x08, 0x05, 0x20, 0x21);
    printf("%02x %02x %02x %02x %02x %02x %02x\n", o1, o2, o3, o4, o5, o6, o7);
    decode(o1, o2, o3, o4, o5, o6, o7);
    printf("%02x %02x %02x %02x %02x %02x %02x %02x\n", o1, o2, o3, o4, o5, o6, o7, o8);
    return 0;
}


Once again, to use in your own code you only need the buffers (v1-8), output (o1-8) and encode & decode subroutines.
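
The seven near-identical lines in each routine follow one pattern, so they can also be written as a loop. The following is a sketch only, with hypothetical names pack78 and unpack78 and std::array standing in for the separate variables; byte k (counting from 0) carries bit 6-k of the 8th symbol in its top bit:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pack eight 7-bit symbols (values 0..127) into seven bytes.
std::array<std::uint8_t, 7> pack78(const std::array<std::uint8_t, 8>& v)
{
    assert(v[7] < 128);
    std::array<std::uint8_t, 7> out{};
    for (std::size_t k = 0; k < 7; ++k) {
        assert(v[k] < 128); // only 7-bit symbols fit
        // Take one bit of the 8th symbol, highest of the seven first.
        std::uint8_t carried = static_cast<std::uint8_t>((v[7] >> (6 - k)) & 1);
        out[k] = static_cast<std::uint8_t>(v[k] | (carried << 7));
    }
    return out;
}

// Unpack seven bytes back into eight 7-bit symbols.
std::array<std::uint8_t, 8> unpack78(const std::array<std::uint8_t, 7>& p)
{
    std::array<std::uint8_t, 8> out{};
    for (std::size_t k = 0; k < 7; ++k) {
        out[k] = static_cast<std::uint8_t>(p[k] & 0x7f);
        // Move the top bit of byte k back to bit 6-k of the 8th symbol.
        out[7] = static_cast<std::uint8_t>(out[7] | (((p[k] >> 7) & 1) << (6 - k)));
    }
    return out;
}
```

On an 8-bit machine the unrolled form is likely to be faster, so the loop is mainly a convenience for modern compilers.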

All in all, a VERY simple routine to implement, with little runtime overhead. It could easily be used in anything from HTTP-like routines to a "save" feature for text-based applications. The argument for "standardised text" has long gone out the window thanks to Microsoft, Apple and the like, and this routine would save a LOT of space: 25% with 3:4 and 12.5% with 7:8.