How to convert UTF-8 byte[] to string?
How to convert UTF-8 byte[] to string?
Question
I have a byte[]
array that is loaded from a file that I happen to known contains UTF-8.
In some debugging code, I need to convert it to a string. Is there a one liner that will do this?
Under the covers it should be just an allocation and a memcopy, so even if it is not implemented, it should be possible.
Read more… Read less…
There're at least four different ways doing this conversion.
Encoding's GetString
, but you won't be able to get the original bytes back if those bytes have non-ASCII characters.BitConverter.ToString
The output is a "-" delimited string, but there's no .NET built-in method to convert the string back to byte array.Convert.ToBase64String
You can easily convert the output string back to byte array by usingConvert.FromBase64String
.
Note: The output string could contain '+', '/' and '='. If you want to use the string in a URL, you need to explicitly encode it.HttpServerUtility.UrlTokenEncode
You can easily convert the output string back to byte array by usingHttpServerUtility.UrlTokenDecode
. The output string is already URL friendly! The downside is it needsSystem.Web
assembly if your project is not a web project.
A full example:
byte[] bytes = { 130, 200, 234, 23 }; // A byte array contains non-ASCII (or non-readable) characters
string s1 = Encoding.UTF8.GetString(bytes); // ���
byte[] decBytes1 = Encoding.UTF8.GetBytes(s1); // decBytes1.Length == 10 !!
// decBytes1 not same as bytes
// Using UTF-8 or other Encoding object will get similar results
string s2 = BitConverter.ToString(bytes); // 82-C8-EA-17
String[] tempAry = s2.Split('-');
byte[] decBytes2 = new byte[tempAry.Length];
for (int i = 0; i < tempAry.Length; i++)
decBytes2[i] = Convert.ToByte(tempAry[i], 16);
// decBytes2 same as bytes
string s3 = Convert.ToBase64String(bytes); // gsjqFw==
byte[] decByte3 = Convert.FromBase64String(s3);
// decByte3 same as bytes
string s4 = HttpServerUtility.UrlTokenEncode(bytes); // gsjqFw2
byte[] decBytes4 = HttpServerUtility.UrlTokenDecode(s4);
// decBytes4 same as bytes
A general solution to convert from byte array to string when you don't know the encoding:
static string BytesToStringConverted(byte[] bytes)
{
using (var stream = new MemoryStream(bytes))
{
using (var streamReader = new StreamReader(stream))
{
return streamReader.ReadToEnd();
}
}
}
Definition:
public static string ConvertByteToString(this byte[] source)
{
return source != null ? System.Text.Encoding.UTF8.GetString(source) : null;
}
Using:
string result = input.ConvertByteToString();
Converting a byte[]
to a string
seems simple but any kind of encoding is likely to mess up the output string. This little function just works without any unexpected results:
private string ToString(byte[] bytes)
{
string response = string.Empty;
foreach (byte b in bytes)
response += (Char)b;
return response;
}
Using (byte)b.ToString("x2")
, Outputs b4b5dfe475e58b67
public static class Ext {
public static string ToHexString(this byte[] hex)
{
if (hex == null) return null;
if (hex.Length == 0) return string.Empty;
var s = new StringBuilder();
foreach (byte b in hex) {
s.Append(b.ToString("x2"));
}
return s.ToString();
}
public static byte[] ToHexBytes(this string hex)
{
if (hex == null) return null;
if (hex.Length == 0) return new byte[0];
int l = hex.Length / 2;
var b = new byte[l];
for (int i = 0; i < l; ++i) {
b[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
}
return b;
}
public static bool EqualsTo(this byte[] bytes, byte[] bytesToCompare)
{
if (bytes == null && bytesToCompare == null) return true; // ?
if (bytes == null || bytesToCompare == null) return false;
if (object.ReferenceEquals(bytes, bytesToCompare)) return true;
if (bytes.Length != bytesToCompare.Length) return false;
for (int i = 0; i < bytes.Length; ++i) {
if (bytes[i] != bytesToCompare[i]) return false;
}
return true;
}
}