A Generic Base Codec Implemented in C++ (Hex (Base16) / Base32 / Base64)

Date: 2020-02-28 16:02:16

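Hex (hexadecimal), Base32 and Base64 encoding turn raw data into a printable string. All three work on the same principle: a fixed number of bits of the raw data is mapped to one character from a specific character space.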

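Hex: also called Base16. Every 4 bits are encoded as one character from "0123456789abcdef" or "0123456789ABCDEF". It is case-insensitive: the letters may be emitted in either case, and a decoder should accept both uppercase and lowercase hex strings.
Base32: every 5 bits are encoded as one character from "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"; case-sensitive.
Base64: every 6 bits are encoded as one character from "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; case-sensitive.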

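Padding: Base32 and Base64 produce output in whole blocks (8 characters for every 5 input bytes in Base32, 4 characters for every 3 input bytes in Base64). When the input does not fill the last block, that is, when its bit count is not a multiple of 40 or 24 respectively, the output is padded up to a full block. The standard padding character is '='.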
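The padding rule can be checked with a little arithmetic. The following is a minimal sketch, not part of the original BaseCodec class; the names EncodedSize and ComputeEncodedSize are hypothetical and chosen only for illustration. It assumes between 1 and 8 bits per character.

```cpp
#include <cassert>

// Size of an encoded string, in characters.
struct EncodedSize {
    int total;    // data characters plus padding characters
    int padding;  // number of trailing padding characters
};

// Computes the encoded size of `byteCount` input bytes when every
// `bitsPerChar` bits become one character (4 = Hex, 5 = Base32, 6 = Base64).
EncodedSize ComputeEncodedSize(int byteCount, int bitsPerChar) {
    assert(byteCount >= 0 && bitsPerChar >= 1 && bitsPerChar <= 8);

    // Block size in bits: the smallest multiple of 8 that is also a
    // multiple of bitsPerChar (40 for Base32, 24 for Base64, 8 for Hex).
    int blockBits = 8;
    while (blockBits % bitsPerChar != 0) blockBits += 8;

    const int blockChars = blockBits / bitsPerChar;  // 8 for Base32, 4 for Base64
    const int totalBits  = byteCount * 8;
    const int blocks     = (totalBits + blockBits - 1) / blockBits;      // round up
    const int dataChars  = (totalBits + bitsPerChar - 1) / bitsPerChar;  // round up
    const int total      = blocks * blockChars;
    return {total, total - dataChars};
}
```

For instance, one byte in Base64 gives 2 data characters plus 2 padding characters, while four bytes in Base32 give 7 data characters plus 1 padding character.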

This article implements a generic Base codec in C++, in a class named BaseCodec. The complete code is posted at the end of the article.

First, we extract and define the factors that determine a Base encoding:

    char mPaddingChar;
    bool mIgnoreCase;
    int mPartialBits;
    std::string mBase;

Their meanings are:

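mPaddingChar: the padding character
mIgnoreCase: whether case is ignored
mPartialBits: the group size, i.e. how many bits are encoded as one character
mBase: the character space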

These four parameters are passed in through the BaseCodec constructor:

    BaseCodec::BaseCodec(const std::string& pBase, char paddingChar, int partialBits, bool ignoreCase)
        : mPaddingChar(paddingChar)   // initialized in member declaration order
        , mIgnoreCase(ignoreCase)
        , mPartialBits(partialBits)
        , mBase(pBase)
    {
    }

With these four parameters in place, encoding and decoding can begin.

Next, we define the three public interfaces of the class: encoding, decoding, and character format checking.

    public:
        /**
         * Encoder.
         */
        std::string Encode(const std::string& input);

        /**
         * Decoder.
         *
         * Throws std::exception if the input length is invalid or the
         * input contains an invalid character.
         */
        std::string Decode(const std::string& input);

        /**
         * Format check.
         *
         * A valid character is one contained in the pBase passed to the
         * constructor (case-insensitive).
         */
        bool Check(const std::string& data);

With the constructor and the three public interfaces above, we can define the codecs we need. The following ones are defined here:

    public:
        /** Binary codec */
        static BaseCodec Binary;
        /** Hex codec (lowercase) */
        static BaseCodec Hex;
        /** Hex codec (uppercase) */
        static BaseCodec Hexu;
        /** Base32 codec */
        static BaseCodec Base32;
        /** Base64 codec */
        static BaseCodec Base64;

Each codec is declared as a public static member of BaseCodec so that it is easy to reference. They are defined as follows:

    static const char CHAR_BASE_2[]   = "01";
    static const char CHAR_BASE_16[]  = "0123456789abcdef";
    static const char CHAR_BASE_16u[] = "0123456789ABCDEF";
    static const char CHAR_BASE_32[]  = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
    static const char CHAR_BASE_64[]  = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    static const char CHAR_PADDING    = '=';

    BaseCodec BaseCodec::Binary(CHAR_BASE_2, CHAR_PADDING, 1, false);
    BaseCodec BaseCodec::Hex(CHAR_BASE_16, CHAR_PADDING, 4, true);
    BaseCodec BaseCodec::Hexu(CHAR_BASE_16u, CHAR_PADDING, 4, true);
    BaseCodec BaseCodec::Base32(CHAR_BASE_32, CHAR_PADDING, 5, false);
    BaseCodec BaseCodec::Base64(CHAR_BASE_64, CHAR_PADDING, 6, false);

The codecs can then be referenced in the following form:

    BaseCodec::Base64.Encode
    BaseCodec::Base64.Decode
    BaseCodec::Base64.Check

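Of the codecs defined above, BaseCodec::Hex and BaseCodec::Hexu are the hexadecimal codecs: the former encodes to lowercase letters, the latter to uppercase. Both decode uppercase and lowercase input alike, because they are declared with ignoreCase set to true.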

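BaseCodec::Binary is a binary codec. For example, binary-encoding the string "1234" writes out the ASCII codes 0x31 0x32 0x33 0x34 bit by bit, giving "00110001001100100011001100110100".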

With the codecs above as examples, it is easy to define other similar codecs, or to define a private codec simply by changing the character space.

For example, we can define a Base8 codec, in which every 3 bits are encoded as one character from the character space "01234567".

    static const char CHAR_BASE_8[] = "01234567";
    BaseCodec Octal(CHAR_BASE_8, CHAR_PADDING, 3, false);

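Encoding and decoding: the most important part is, of course, the implementation of the encode and decode functions. Their parameters use C-style types so that the code can easily be converted to C when needed; the public interfaces shown earlier wrap them with C++ and std::string. When output is null, each function only returns the required length; otherwise the output buffer must hold the returned length plus one byte for the terminating '\0'. The functions are as follows: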

    #include <cstring>    // strlen
    #include <stdexcept>  // std::runtime_error

    int BaseCodec::Encode(const unsigned char *input, char *output, int length)
    {
        if (length <= 0 || input == nullptr)
            return 0;

        int partialBits = mPartialBits;             // bits encoded per output character
        const char* pBase = mBase.c_str();          // character space
        int nlcm = lcm(8, partialBits);             // block size in bits: lcm of 8 and partialBits
        int nn = length * 8 / nlcm;                 // number of complete blocks
        int retLength = nn * (nlcm / partialBits);  // encoded length of the complete blocks
        int padCount = 0;                           // number of padding characters

        // Leftover bits mean a final, padded block
        nn = length * 8 % nlcm;                     // leftover bits
        if (nn != 0) {
            padCount = (nlcm - nn) / partialBits;
            retLength += nlcm / partialBits;        // final block: leftover characters plus padding
        }
        if (output == nullptr)
            return retLength;                       // size query only

        unsigned char current;                          // scratch value for one character index
        unsigned char mask = (1 << partialBits) - 1;    // mask for one group of bits
        int codedOffset = 0;                            // write offset into output
        int leftBits = 0;                               // number of bits carried over from the previous byte
        unsigned char left_temp = 0;                    // the carried-over bits
        for (int i = 0; i < length; i++)
        {
            int currentBits;                            // bits of this byte not yet processed
            if (leftBits == 0) {
                currentBits = 8;
            } else {
                currentBits = 8 + leftBits - partialBits;   // bits left after completing the carried group
                unsigned char mask_need = (1 << (partialBits - leftBits)) - 1;  // bits needed to complete the carried group
                mask_need <<= currentBits;              // shift up to select the high bits
                // Complete the group carried over from the previous byte
                current = (left_temp << (partialBits - leftBits)) | ((input[i] & mask_need) >> currentBits);
                current &= mask;
                output[codedOffset++] = pBase[(int)current];
            }
            int times = currentBits / partialBits;      // whole groups available in the rest of this byte
            unsigned char mask_can_be_coded = (1 << (partialBits * times)) - 1;  // mask of the bits that can be encoded now
            mask_can_be_coded <<= (currentBits - partialBits * times);
            unsigned char temp_to_coded = (input[i] & mask_can_be_coded);
            leftBits = currentBits - partialBits * times;
            unsigned char mask_temp_left = (1 << (currentBits - partialBits * times)) - 1;  // mask of the bits carried forward
            left_temp = (input[i] & mask_temp_left);
            // Encode the whole groups in this byte
            for (int m = 0; m < times; m++) {
                current = temp_to_coded >> (currentBits - (m + 1) * partialBits);
                current &= mask;
                output[codedOffset++] = pBase[(int)current];
            }
        }
        // Trailing bits: left-align them and zero-fill the rest of the group
        if (leftBits != 0) {
            current = (left_temp << (partialBits - leftBits));
            current &= mask;
            output[codedOffset++] = pBase[(int)current];
        }
        // Padding
        for (int i = 0; i < padCount; i++)
            output[codedOffset++] = mPaddingChar;
        output[codedOffset] = '\0';
        return retLength;
    }

    int BaseCodec::Decode(const char *input, unsigned char *output)
    {
        // Check for null before calling strlen
        if (input == nullptr)
            return 0;
        int length = (int)strlen(input);
        if (length == 0)
            return 0;

        int partialBits = mPartialBits;             // bits encoded per character
        const char* pBase = mBase.c_str();          // character space
        int nlcm = lcm(8, partialBits);             // block size in bits
        if ((length % (nlcm / partialBits)) != 0)
            throw std::runtime_error("input length error");  // std::exception(const char*) is MSVC-only

        int padCount = 0;                           // trailing padding characters in the last block
        for (int i = length - 1; i > length - nlcm / partialBits; i--) {
            if (input[i] == mPaddingChar)
                padCount++;
            else
                break;
        }
        // Every 8 bits of non-padding characters give one byte; leftover
        // zero-fill bits are dropped. (The original used leftPadChar - 1 for
        // the last block, which is only correct for Base64.)
        int retLength = (length - padCount) * partialBits / 8;
        if (output == nullptr)
            return retLength;                       // size query only

        unsigned char mask = (1 << partialBits) - 1;    // mask for one group of bits
        int codedOffset = 0;                            // write offset into output
        int leftBits = 0;                               // number of bits not yet written out
        unsigned char left_temp = 0;                    // the bits not yet written out
        int index = 0;
        for (int i = 0; i < (length - padCount); i++)
        {
            index = GetCharPos(pBase, input[i], mIgnoreCase);
            if (index == -1)
                throw std::runtime_error("has invalid char");
            unsigned char temp = index & mask;
            if (leftBits != 0)
            {
                leftBits += partialBits;
                if (leftBits >= 8)
                {
                    // A full byte is available: emit it and keep the remainder
                    leftBits -= 8;
                    output[codedOffset++] = (left_temp << (partialBits - leftBits)) | (temp >> leftBits);
                    unsigned char mask_left = (1 << leftBits) - 1;
                    left_temp = temp & mask_left;
                } else {
                    left_temp = (left_temp << partialBits) | temp;
                }
            } else {
                leftBits = partialBits;
                left_temp = temp;
            }
        }
        output[codedOffset] = '\0';
        return retLength;
    }
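Because the C-style functions return the required length when output is null, a std::string wrapper can call them twice: once to size the buffer and once to fill it. The sketch below illustrates that pattern only. HexEncodeRaw is a hypothetical stand-in that follows the same contract (lowercase hex only), and HexEncodeToString is likewise an illustrative name; neither is taken from the original class.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in with the same contract as the C-style Encode:
// returns the encoded length, and writes the text plus a trailing '\0'
// only when `output` is non-null.
int HexEncodeRaw(const unsigned char* input, char* output, int length) {
    if (input == nullptr || length <= 0) return 0;
    const int encodedLength = length * 2;
    if (output == nullptr) return encodedLength;  // size query only
    static const char kDigits[] = "0123456789abcdef";
    for (int i = 0; i < length; ++i) {
        output[2 * i]     = kDigits[input[i] >> 4];    // high 4 bits
        output[2 * i + 1] = kDigits[input[i] & 0x0F];  // low 4 bits
    }
    output[encodedLength] = '\0';
    return encodedLength;
}

// std::string wrapper: query the size, allocate, then encode.
std::string HexEncodeToString(const std::string& input) {
    const unsigned char* bytes =
        reinterpret_cast<const unsigned char*>(input.data());
    const int length = static_cast<int>(input.size());

    const int encodedLength = HexEncodeRaw(bytes, nullptr, length);
    if (encodedLength == 0) return std::string();

    std::vector<char> buffer(encodedLength + 1);  // +1 for the '\0'
    const int written = HexEncodeRaw(bytes, buffer.data(), length);
    assert(written == encodedLength);
    (void)written;  // silence unused-variable warnings when NDEBUG is set
    return std::string(buffer.data(), encodedLength);
}
```

A decoding wrapper could presumably follow the same two-call shape, converting thrown exceptions or error codes as the caller prefers.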

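Two points matter most in encoding and decoding. The first is computing the output length and the number of padding characters. The second is that, when processing bytes one after another, the bits left over from the previous byte must be consumed together with the current byte.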
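The carry-over of leftover bits can also be expressed with a bit accumulator, which is a different formulation from the per-byte masking used in the article's functions. The sketch below is illustrative only: AccumulatorEncode, AccumulatorDecode, kBase32 and kBase64 are hypothetical names, the lookup is always case-sensitive, and the decoder assumes well-formed input length rather than validating it.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

const std::string kBase32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
const std::string kBase64 =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encodes `input` with `bitsPerChar` bits per character and pads the
// result with `pad` up to a whole block.
std::string AccumulatorEncode(const std::string& input,
                              const std::string& alphabet,
                              int bitsPerChar, char pad = '=') {
    assert(bitsPerChar >= 1 && bitsPerChar <= 8);
    assert(alphabet.size() == (std::size_t(1) << bitsPerChar));
    const unsigned mask = (1u << bitsPerChar) - 1;
    std::string out;
    unsigned acc = 0;  // pending bits, right-aligned
    int pending = 0;   // number of valid bits in acc
    for (unsigned char byte : input) {
        acc = (acc << 8) | byte;
        pending += 8;
        while (pending >= bitsPerChar) {  // emit every complete group
            pending -= bitsPerChar;
            out += alphabet[(acc >> pending) & mask];
        }
        acc &= (1u << pending) - 1;       // keep only the leftover bits
    }
    if (pending > 0)  // left-align the leftover bits, zero-filling the rest
        out += alphabet[(acc << (bitsPerChar - pending)) & mask];

    int blockBits = 8;  // smallest multiple of 8 divisible by bitsPerChar
    while (blockBits % bitsPerChar != 0) blockBits += 8;
    const std::size_t blockChars = blockBits / bitsPerChar;
    while (out.size() % blockChars != 0) out += pad;
    return out;
}

// Decodes text produced by AccumulatorEncode; stops at the first pad.
std::string AccumulatorDecode(const std::string& text,
                              const std::string& alphabet,
                              int bitsPerChar, char pad = '=') {
    assert(bitsPerChar >= 1 && bitsPerChar <= 8);
    std::string out;
    unsigned acc = 0;
    int pending = 0;
    for (char c : text) {
        if (c == pad) break;  // padding ends the data
        const std::size_t index = alphabet.find(c);
        if (index == std::string::npos)
            throw std::invalid_argument("invalid character");
        acc = (acc << bitsPerChar) | static_cast<unsigned>(index);
        pending += bitsPerChar;
        if (pending >= 8) {   // a full byte is available
            pending -= 8;
            out += static_cast<char>((acc >> pending) & 0xFF);
            acc &= (1u << pending) - 1;
        }
    }
    return out;  // leftover zero-fill bits are dropped
}
```

The accumulator never holds more than 15 bits, since fewer than 8 bits are pending before each new byte or character is shifted in.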

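The encode and decode functions also rely on two helper functions: one computes the least common multiple, and the other finds the position of a character in a string.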
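The article does not show these two helpers, so the following is only a guess at what they might look like. The names lcm and GetCharPos and the argument order match the calls in the functions above; the bodies, and the extra gcd helper, are assumptions.

```cpp
#include <cassert>
#include <cctype>

// Greatest common divisor by Euclid's algorithm (assumed helper).
int gcd(int a, int b) {
    while (b != 0) {
        const int r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// Least common multiple of two positive integers.
int lcm(int a, int b) {
    assert(a > 0 && b > 0);
    return a / gcd(a, b) * b;  // divide first to keep the product small
}

// Returns the index of `c` in the NUL-terminated alphabet `base`,
// or -1 if it is not present. With ignoreCase set, letters compare
// equal regardless of case.
int GetCharPos(const char* base, char c, bool ignoreCase) {
    if (base == nullptr) return -1;
    for (int i = 0; base[i] != '\0'; ++i) {
        if (base[i] == c) return i;
        if (ignoreCase &&
            std::tolower(static_cast<unsigned char>(base[i])) ==
                std::tolower(static_cast<unsigned char>(c)))
            return i;
    }
    return -1;
}
```

With ignoreCase set, both the lowercase and the uppercase hex alphabets accept input in either case, which is the behavior described for BaseCodec::Hex and BaseCodec::Hexu.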

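Two issues remain to be considered: endianness and exceptions. This example was written in Visual Studio on Windows, where std::exception provides a non-standard constructor that accepts a message; on other toolchains a standard type such as std::runtime_error or std::invalid_argument is needed to carry the message. Endianness has not been considered either. Both issues may need attention on other platforms.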
