Advertisement
Advertisement


Encode String to UTF-8


Question

I have a String with a "ñ" character and I have some problems with it. I need to encode this String to UTF-8 encoding. I have tried it by this way, but it doesn't work:

byte ptext[] = myString.getBytes();
String value = new String(ptext, "UTF-8");

How do I encode that string to utf-8?

2016/08/08
1
191
8/8/2016 5:34:46 PM

Accepted Answer

String objects in Java use the UTF-16 encoding that can't be modified.

The only thing that can have a different encoding is a byte[]. So if you need UTF-8 data, then you need a byte[]. If you have a String that contains unexpected data, then the problem is at some earlier place that incorrectly converted some binary data to a String (i.e. it was using the wrong encoding).

2017/10/22
140
10/22/2017 10:49:12 PM


In Java7 you can use:

import static java.nio.charset.StandardCharsets.*;

byte[] ptext = myString.getBytes(ISO_8859_1); 
String value = new String(ptext, UTF_8); 

This has the advantage over getBytes(String) that it does not declare throws UnsupportedEncodingException.

If you're using an older Java version you can declare the charset constants yourself:

import java.nio.charset.Charset;

public class StandardCharsets {
    public static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");
    public static final Charset UTF_8 = Charset.forName("UTF-8");
    //....
}
2017/04/03

Use byte[] ptext = String.getBytes("UTF-8"); instead of getBytes(). getBytes() uses so-called "default encoding", which may not be UTF-8.

2011/04/20

A Java String is internally always encoded in UTF-16 - but you really should think about it like this: an encoding is a way to translate between Strings and bytes.

So if you have an encoding problem, by the time you have String, it's too late to fix. You need to fix the place where you create that String from a file, DB or network connection.

2011/04/20

You can try this way.

byte ptext[] = myString.getBytes("ISO-8859-1"); 
String value = new String(ptext, "UTF-8"); 
2011/04/20

In a moment I went through this problem and managed to solve it in the following way

first i need to import

import java.nio.charset.Charset;

Then i had to declare a constant to use UTF-8 and ISO-8859-1

private static final Charset UTF_8 = Charset.forName("UTF-8");
private static final Charset ISO = Charset.forName("ISO-8859-1");

Then I could use it in the following way:

String textwithaccent="Thís ís a text with accent";
String textwithletter="Ñandú";

text1 = new String(textwithaccent.getBytes(ISO), UTF_8);
text2 = new String(textwithletter.getBytes(ISO),UTF_8);
2018/04/09

Source: https://stackoverflow.com/questions/5729806
Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Email: [email protected]