C# GetString method from Encoding's Unicode class

  • C#
  • Thread starter Silicon Waffle
  • Tags: Class, Method
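In summary, the original code does not compile, and the correct way to initialize a byte array from character literals is to cast each element: byte[] bytes={(byte)'z', (byte)'\0',(byte)'\0',(byte)'\0',(byte)'ý',...}. Because the data uses 4 bytes per character, it should be decoded with Encoding.UTF32 rather than Encoding.Unicode.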
  • #1
Silicon Waffle
I have a byte array of size 64, for example, in which each character takes 4 bytes.
I decode it using
Code:
byte[] bytes={'z','\0','\0','\0','ý','\0','\0','\0','ó','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0',...};
string s=Encoding.Unicode.GetString(bytes);

Amazingly, after the code is executed, s="zýó";

But if my byte array contains a string such as "劉三好" (the array would then hold the numbers representing these Chinese characters), executing the code line above gives me that string too. How does that work?
 
  • #2
Check out the Unicode tables:

http://en.wikipedia.org/wiki/Unicode

Unicode characters in the lowest plane (the Basic Multilingual Plane) need two bytes, so with 4 bytes per character I'd expect two non-zero bytes followed by two zero bytes... Also, it looks like your string is stored least significant byte first, i.e. in little-endian order.
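To see the layout described above, here is a small sketch (C# 9+ top-level statements assumed) that encodes "劉三好" with Encoding.UTF32, which is little-endian in .NET, and prints each 4-byte group. Each of these characters comes out as two non-zero bytes followed by two zero bytes, low byte first. The variable names are illustrative only.

```csharp
using System;
using System.Text;

// Three CJK characters from the Basic Multilingual Plane:
// U+5289 ('劉'), U+4E09 ('三'), U+597D ('好').
string text = "劉三好";

// Encoding.UTF32 is UTF-32 little-endian; GetBytes writes no byte order mark.
byte[] utf32 = Encoding.UTF32.GetBytes(text);

// Print each 4-byte group in hex, e.g. "89-52-00-00" for U+5289.
for (int i = 0; i < utf32.Length; i += 4)
{
    Console.WriteLine(BitConverter.ToString(utf32, i, 4));
}
```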
 
  • #3
Silicon Waffle said:
I have a byte array (each character consumes 4 bytes) of size 64 for example
A byte is an 8-bit unsigned integer (see https://msdn.microsoft.com/en-us/library/system.byte(v=vs.100).aspx). The characters you show in your byte array are one byte each, not 4 bytes. Also, 4 bytes isn't 64 bits, it's 32 bits.
 
  • #4
This code does not compile: C# has no implicit conversion from char to byte, so each element needs an explicit cast. The correct way to initialize the byte array is
Code:
byte[] bytes={(byte)'z', (byte)'\0', (byte)'\0', (byte)'\0', (byte)'ý', ...};
string s=Encoding.Unicode.GetString(bytes);
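Such a call returns s="z\0ý\0......", not what you describe: Encoding.Unicode is UTF-16, which reads two bytes per character, so every pair of zero padding bytes becomes a '\0' character.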
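A quick way to check this is the sketch below (C# 9+ top-level statements assumed). It builds only the first 12 bytes of the array from the question, with the casts, and decodes them with Encoding.Unicode; the remaining zero padding is left out for brevity.

```csharp
using System;
using System.Text;

// The question's layout: one character per 4 bytes, low byte first, rest zero.
// Casts are needed on the char literals because C# has no implicit char -> byte conversion.
byte[] bytes =
{
    (byte)'z', 0, 0, 0,
    (byte)'ý', 0, 0, 0,
    (byte)'ó', 0, 0, 0,
};

// Encoding.Unicode is UTF-16 little-endian: every 2 bytes become one char,
// so each pair of zero bytes turns into a '\0' between the letters.
string s = Encoding.Unicode.GetString(bytes);

Console.WriteLine(s.Length);           // 6
Console.WriteLine(s == "z\0ý\0ó\0");   // True
```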
I suppose the bytes come from a file. If the file is UTF-32 (not "Unicode", i.e. UTF-16), you should use
Code:
string s=Encoding.UTF32.GetString(bytes);
Note that this string ends with trailing '\0' characters, which probably need to be trimmed...
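A sketch of the UTF-32 route, assuming the 64-byte buffer holds UTF-32 little-endian text padded with zeros. The buffer below is a hypothetical stand-in for the file data, and TrimEnd('\0') is just one way to do the cleanup mentioned above (C# 9+ top-level statements assumed).

```csharp
using System;
using System.Text;

// Hypothetical 64-byte buffer: "zýó" as UTF-32 LE, followed by zero padding.
byte[] bytes = new byte[64];
bytes[0] = (byte)'z';
bytes[4] = (byte)'ý';
bytes[8] = (byte)'ó';

// Each 4-byte group is one code point; the padding decodes to '\0' characters.
string raw = Encoding.UTF32.GetString(bytes);
Console.WriteLine(raw.Length);   // 16

// Strip the trailing NUL characters left by the padding.
string s = raw.TrimEnd('\0');
Console.WriteLine(s == "zýó");   // True

// The same call handles characters beyond Latin-1, such as "劉三好".
byte[] cjk = Encoding.UTF32.GetBytes("劉三好");
Console.WriteLine(Encoding.UTF32.GetString(cjk) == "劉三好"); // True
```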
 

Related to C# GetString method from Encoding's Unicode class

1. What is the C# GetString method from Encoding's Unicode class?

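GetString is a method of the System.Text.Encoding class that decodes a sequence of bytes into a string. Encoding.Unicode is a static property that returns the UTF-16 little-endian encoding, so Encoding.Unicode.GetString(bytes) treats every two bytes as one UTF-16 code unit, low byte first.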
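A minimal sketch of the basic call (C# 9+ top-level statements assumed). The bytes are hand-written UTF-16 little-endian for "Hi"; decoding the same bytes with Encoding.BigEndianUnicode is included only to show that byte order changes the result.

```csharp
using System;
using System.Text;

// UTF-16 little-endian bytes for "Hi": 'H' = U+0048, 'i' = U+0069, low byte first.
byte[] bytes = { 0x48, 0x00, 0x69, 0x00 };

string s = Encoding.Unicode.GetString(bytes);
Console.WriteLine(s); // Hi

// Read as big-endian UTF-16, the same bytes become U+4800 and U+6900 instead.
string big = Encoding.BigEndianUnicode.GetString(bytes);
Console.WriteLine(big == "\u4800\u6900"); // True
```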

2. How do I use the C# GetString method?

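Encoding is an abstract class, so instead of constructing it directly you obtain an instance, either through a static property such as Encoding.Unicode, Encoding.UTF8 or Encoding.UTF32, by name with Encoding.GetEncoding, or by constructing a concrete subclass such as UnicodeEncoding. You then call GetString on that instance, passing the byte array. The encoding is determined by the instance you call it on, not by an argument; the overload GetString(bytes, index, count) decodes only part of the array.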
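The sketch below shows three ways to get a UTF-16 little-endian Encoding and uses the index/count overload to decode part of an array (C# 9+ top-level statements assumed; the sample string "abcdef" and the variable names are illustrative).

```csharp
using System;
using System.Text;

// Three ways to get a UTF-16 little-endian Encoding instance.
Encoding utf16 = Encoding.Unicode;                                               // static property
Encoding utf16ByName = Encoding.GetEncoding("utf-16");                           // lookup by name
Encoding viaCtor = new UnicodeEncoding(bigEndian: false, byteOrderMark: false);  // concrete subclass

// "abcdef" as UTF-16 LE: 12 bytes, 2 per character (GetBytes writes no BOM).
byte[] bytes = utf16.GetBytes("abcdef");

// Decode only part of the array: 4 bytes starting at byte index 4 -> "cd".
string part = utf16.GetString(bytes, 4, 4);
Console.WriteLine(part);                          // cd
Console.WriteLine(utf16ByName.GetString(bytes));  // abcdef
Console.WriteLine(viaCtor.GetString(bytes));      // abcdef
```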

3. What is the difference between Encoding.UTF8 and Encoding.Unicode in the C# GetString method?

Encoding.UTF8 is UTF-8, a variable-width encoding that uses 1 to 4 bytes per code point. Encoding.Unicode is UTF-16 in little-endian byte order, which uses 2 bytes for characters in the Basic Multilingual Plane and a 4-byte surrogate pair for characters outside it, so it is also variable-width. Both can represent every Unicode character; the choice depends on which encoding the bytes were actually written in, not on which characters you need.
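To make the size difference concrete, this sketch compares byte counts for the thread's Chinese string and for one character outside the BMP (U+1F600, chosen only as an example). It assumes C# 9+ top-level statements.

```csharp
using System;
using System.Text;

string cjk = "劉三好";          // three BMP characters
string emoji = "\U0001F600";    // U+1F600, outside the BMP

// UTF-8: 3 bytes per CJK character, 4 bytes for U+1F600.
Console.WriteLine(Encoding.UTF8.GetByteCount(cjk));       // 9
Console.WriteLine(Encoding.UTF8.GetByteCount(emoji));     // 4

// UTF-16: 2 bytes per BMP character; U+1F600 needs a surrogate pair (4 bytes).
Console.WriteLine(Encoding.Unicode.GetByteCount(cjk));    // 6
Console.WriteLine(Encoding.Unicode.GetByteCount(emoji));  // 4
Console.WriteLine(emoji.Length);                          // 2 (two UTF-16 code units)

// UTF-16 round-trips the non-BMP character without loss.
byte[] utf16 = Encoding.Unicode.GetBytes(emoji);
Console.WriteLine(Encoding.Unicode.GetString(utf16) == emoji); // True
```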

4. Can I convert a string to a byte array using the C# GetString method?

No. GetString only decodes bytes into a string. The reverse conversion, from a string to a byte array, is done by calling GetBytes on the same Encoding object, for example Encoding.Unicode.GetBytes(s).
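A minimal round-trip sketch pairing GetBytes with GetString on the same Encoding (C# 9+ top-level statements assumed). The hex output shows the UTF-16 little-endian bytes of "劉三好".

```csharp
using System;
using System.Text;

string original = "劉三好";

// String -> bytes: GetBytes on the chosen Encoding (UTF-16 LE here).
byte[] bytes = Encoding.Unicode.GetBytes(original);
Console.WriteLine(BitConverter.ToString(bytes)); // 89-52-09-4E-7D-59

// Bytes -> string: GetString on the same Encoding.
string roundTrip = Encoding.Unicode.GetString(bytes);
Console.WriteLine(roundTrip == original); // True
```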

5. Are there any limitations to using the C# GetString method from Encoding's Unicode class?

The main limitation is that GetString does not detect the encoding: it assumes the bytes were produced with the encoding of the instance you call it on. If they were encoded differently (as in this thread, where 4-byte-per-character data was decoded with the 2-byte UTF-16 Encoding.Unicode), the resulting string will be wrong. Make sure the same encoding is used in both directions when converting between strings and byte arrays.
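A short sketch of the mismatch problem (C# 9+ top-level statements assumed): bytes written as UTF-8 are decoded once with the matching encoding and once, incorrectly, as UTF-16. The exact characters produced by the wrong decode are not the point; they simply do not match the original text.

```csharp
using System;
using System.Text;

// Encode with UTF-8 (5 bytes for "hello").
byte[] utf8Bytes = Encoding.UTF8.GetBytes("hello");

// Correct: decode with the same encoding.
string right = Encoding.UTF8.GetString(utf8Bytes);

// Wrong: UTF-16 pairs up the ASCII bytes into unrelated code units.
string wrong = Encoding.Unicode.GetString(utf8Bytes);

Console.WriteLine(right == "hello"); // True
Console.WriteLine(wrong == "hello"); // False
```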
