Converting a byte array to String seems easy, but what is hard is doing it correctly. Many programmers make the mistake of ignoring character encoding whenever bytes are converted into a String or char, or vice versa. As programmers, we all know that computers only understand binary data, i.e. 0 and 1. Everything we see and use, e.g. images, text files, movies, or any other multimedia, is stored in the form of bytes, but what matters more is the process of encoding or decoding bytes to characters. Data conversion is an important topic in any programming interview, and because of the trickiness of character encoding, this question is one of the most popular String interview questions in Java interviews. While reading a String from an input source, e.g. XML files, HTTP requests, network ports, or databases, you must pay attention to the character encoding (e.g. UTF-8, UTF-16, or ISO 8859-1) it was encoded with. If you do not use the same character encoding while converting bytes to String, you will end up with a corrupt String which may contain totally wrong values. You might have seen ? or square boxes after converting byte[] to String; those appear because your current character encoding does not support those values, so it just shows some garbage.
I tried to understand why programmers make character encoding mistakes so often, and my little research and own experience suggest two reasons: first, not dealing enough with internationalization and character encodings, and second, ASCII characters are supported by almost all popular encoding schemes and have the same values in all of them. Since we mostly deal with encodings like UTF-8 and Cp1252 (also known as Windows-1252), which display ASCII characters (mostly letters and digits) correctly, the problem stays hidden even when the wrong encoding scheme is used. The real issue comes when your text contains special characters, e.g. 'é', which is often used in French names. If your platform's character encoding doesn't recognize that character, you will see either a different character or some garbage, and sadly, until you get your hands burned, you are unlikely to be careful with character encoding. In Java, things are a little more tricky because many IO classes, e.g. InputStreamReader, use the platform's character encoding by default. This means that if you run your program on a different machine, you will likely get different output because of the different character encoding used on that machine. In this article, we will learn how to convert byte[] to String in Java, both by using the JDK API and with the help of Guava and Apache Commons.
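To see why the mistake stays hidden, here is a minimal sketch that encodes the same text with UTF-8 and with windows-1252 (Cp1252) and compares the bytes. The class and method names are illustrative, and it assumes a JDK that ships windows-1252; OpenJDK-based JDKs do, although it is not one of the charsets every Java platform is required to support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiLooksSafe {

    // Encodes the text with both charsets and reports whether the byte sequences match.
    static boolean sameBytes(String text, Charset a, Charset b) {
        return Arrays.equals(text.getBytes(a), text.getBytes(b));
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");

        // Plain ASCII text: every character is one identical byte in both charsets
        System.out.println(sameBytes("Montrouge, France", StandardCharsets.UTF_8, cp1252)); // true

        // U+00E9 is one byte (0xE9) in Cp1252 but two bytes (0xC3 0xA9) in UTF-8
        System.out.println(sameBytes("Cr\u00e9dit Agricole", StandardCharsets.UTF_8, cp1252)); // false
    }
}
```

Because the ASCII case compares equal, a program that decodes with the wrong charset keeps working until the first accented character shows up.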
How to convert byte[] to String in Java
There are multiple ways to convert a byte array to String in Java: you can either use methods from the JDK, or you can use free, open source APIs like Apache Commons and Google Guava. These APIs provide at least two sets of methods to create a String from a byte array: one which uses the default platform encoding, and another which takes a character encoding. You should always use the latter; don't rely on the platform encoding. I know it could be the same, or you might not have faced any problem so far, but it's better to be safe than sorry. As I pointed out in my last post about printing a byte array as a Hex String, it's also one of the best practices to specify the character encoding while converting bytes to characters in any programming language. It is also possible that your byte array contains non-printable ASCII characters. Let's first see the JDK's way of converting byte[] to String:
1) You can use the constructor of String which takes a byte array and a character encoding
String str = new String(bytes, "UTF-8");
This is the right way to convert bytes to String, provided you know for sure that the bytes are encoded in the character encoding you are using.
2) If you are reading a byte array from any text file, e.g. an XML document, HTML file or binary file, you can use the Apache Commons IO library to convert the FileInputStream to a String directly. This method also buffers the input internally, so there is no need to wrap it in another BufferedInputStream.
String fromStream = IOUtils.toString(fileInputStream, "UTF-8");
In order to correctly convert those byte arrays into String, you must first discover the correct character encoding by reading metadata, e.g. Content-Type, <?xml encoding="…"> etc., depending on the format/protocol of the data you are reading. This is one of the reasons I recommend using XML parsers, e.g. SAX or DOM parsers, to read XML files: they take care of character encoding by themselves.
Some programmers also recommend using a Charset instead of a String to specify the character encoding, e.g. StandardCharsets.UTF_8 instead of "UTF-8", mainly because the Charset overloads do not throw the checked UnsupportedEncodingException that the String-name overloads declare. There are six standard Charset implementations guaranteed to be supported by every implementation of the Java platform, and you can use them instead of specifying the encoding scheme as a String. In short, always prefer StandardCharsets.UTF_8 over "UTF-8", as shown below:
String str = IOUtils.toString(fis, StandardCharsets.UTF_8);
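The difference between the two String constructors is easiest to see side by side. This is a small JDK-only sketch (Java 7 or later for StandardCharsets; class and method names are illustrative) that decodes the same UTF-8 bytes with the Charset overload and with the String-name overload.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class BytesToStringDemo {

    // Charset overload: no checked exception to handle.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // String-name overload: the compiler forces us to handle UnsupportedEncodingException,
    // even though "UTF-8" is always supported.
    static String decodeByName(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "Cré" in UTF-8: 'C', 'r', then 0xC3 0xA9 for U+00E9
        byte[] bytes = {0x43, 0x72, (byte) 0xC3, (byte) 0xA9};
        System.out.println(decodeUtf8(bytes).equals("Cr\u00e9"));   // true
        System.out.println(decodeByName(bytes).equals("Cr\u00e9")); // true
    }
}
```

Both produce the same String; the Charset version simply removes the impossible-in-practice error path.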
Other standard charsets supported by the Java platform are:
- StandardCharsets.ISO_8859_1
- StandardCharsets.US_ASCII
- StandardCharsets.UTF_16
- StandardCharsets.UTF_16BE
- StandardCharsets.UTF_16LE
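The three UTF-16 constants differ only in byte order and byte-order mark, which is easy to overlook. The following sketch (the class name and hex helper are illustrative) encodes the single character U+00E9 ('é') with each standard charset and prints the bytes in hex.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16Variants {

    // Formats bytes as space-separated two-digit hex, e.g. "FE FF 00 E9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    static String encodeHex(String s, Charset cs) {
        return hex(s.getBytes(cs));
    }

    public static void main(String[] args) {
        String e = "\u00e9";
        System.out.println("UTF_16     " + encodeHex(e, StandardCharsets.UTF_16));     // FE FF 00 E9 (BOM + big-endian)
        System.out.println("UTF_16BE   " + encodeHex(e, StandardCharsets.UTF_16BE));   // 00 E9
        System.out.println("UTF_16LE   " + encodeHex(e, StandardCharsets.UTF_16LE));   // E9 00
        System.out.println("ISO_8859_1 " + encodeHex(e, StandardCharsets.ISO_8859_1)); // E9
        System.out.println("US_ASCII   " + encodeHex(e, StandardCharsets.US_ASCII));   // 3F, i.e. '?' for unmappable
        System.out.println("UTF_8      " + encodeHex(e, StandardCharsets.UTF_8));      // C3 A9
    }
}
```

Note that getBytes(Charset) silently replaces characters the charset cannot represent, which is why US_ASCII yields '?' rather than an exception.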
If you are reading bytes from an input stream, you can also check my earlier post about 5 ways to convert InputStream to String in Java for details.
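When the bytes come from a stream, the same rule applies to InputStreamReader: its one-argument constructor falls back to the platform default, so pass the charset explicitly. A minimal JDK-only sketch (Java 7 or later; the class and method names are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamToString {

    // Reads all characters from the stream, decoding with the given charset
    // instead of the platform default. Closing the reader also closes the stream.
    static String read(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "Soci\u00e9t\u00e9 G\u00e9n\u00e9rale".getBytes(StandardCharsets.UTF_8);
        String s = read(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        System.out.println(s.length()); // 16 characters, decoded from 20 bytes
    }
}
```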
Original XML
Here is our sample XML snippet to demonstrate the issues with using the default character encoding. This file contains the letter 'é', which is not displayed correctly in Eclipse because its default character encoding is Cp1252.
<?xml version="1.0" encoding="UTF-8"?>
<banks>
    <bank>
        <name>Industrial & Commercial Bank of China </name>
        <headquarters> Beijing , China</headquarters>
    </bank>
    <bank>
        <name>Crédit Agricole SA</name>
        <headquarters>Montrouge, France</headquarters>
    </bank>
    <bank>
        <name>Société Générale</name>
        <headquarters>Paris, Île-de-France, France</headquarters>
    </bank>
</banks>
And this is what happens when you convert a byte array to String without specifying a character encoding, e.g.:
String str = new String(filedata);
This will use the platform's default character encoding, which is Cp1252 in this case because we are running this program in the Eclipse IDE. You can see that the letter 'é' is not displayed correctly.
<?xml version="1.0" encoding="UTF-8"?>
<banks>
    <bank>
        <name>Industrial & Commercial Bank of China </name>
        <headquarters> Beijing , China</headquarters>
    </bank>
    <bank>
        <name>CrÃ©dit Agricole SA</name>
        <headquarters>Montrouge, France</headquarters>
    </bank>
    <bank>
        <name>SociÃ©tÃ© GÃ©nÃ©rale</name>
        <headquarters>Paris, ÃŽle-de-France, France</headquarters>
    </bank>
</banks>
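The garbling can be reproduced on any machine, regardless of its default encoding, by decoding UTF-8 bytes with Cp1252 explicitly. This sketch assumes the windows-1252 charset is available (it ships with OpenJDK-based JDKs); the class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes text as UTF-8, then decodes those bytes with a different charset,
    // which is what new String(filedata) does when the platform default is Cp1252.
    static String decodeWithWrongCharset(String text, Charset wrong) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrong);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String garbled = decodeWithWrongCharset("Cr\u00e9dit Agricole SA", cp1252);
        // The two UTF-8 bytes 0xC3 0xA9 become U+00C3 and U+00A9 in Cp1252
        System.out.println(garbled.equals("Cr\u00c3\u00a9dit Agricole SA")); // true
    }
}
```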
To fix this, specify the character encoding while creating the String from the byte array, e.g.
String str = new String(filedata, "UTF-8");
By the way, let me make it clear that even though I have read the XML file using an InputStream here, it's not a good practice; in fact, it's a bad practice. You should always use proper XML parsers for reading XML documents. If you don't know how, please check this tutorial. Since this example is mainly meant to show you why character encoding matters, I have chosen an example which was easily available and looks more practical.
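For comparison, here is a minimal sketch using the JDK's built-in DOM parser (javax.xml.parsers). The raw bytes are handed to the parser, which determines the encoding from the XML declaration or byte-order mark on its own; the method name firstBankName is illustrative. One caveat for the sample file above: a conforming parser rejects the bare & in "Industrial & Commercial", which would have to be written as &amp;.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParseXmlBytes {

    // Parses raw XML bytes; no charset is passed, the parser detects it itself.
    static String firstBankName(byte[] xmlBytes) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
        return doc.getElementsByTagName("name").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<banks><bank><name>Cr\u00e9dit Agricole SA</name></bank></banks>";
        String name = firstBankName(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(name.equals("Cr\u00e9dit Agricole SA")); // true
    }
}
```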
Java Program to Convert a Byte Array to String in Java
Here is our sample program to show why relying on the default character encoding is a bad idea and why you must specify a character encoding while converting a byte array to String in Java. In this program, we are using the Apache Commons IOUtils class to read a file directly into a byte array. Note that IOUtils.toByteArray() does not close the stream it reads from, so the program opens the FileInputStream in a try-with-resources block to avoid leaking file descriptors. How you create the String from that array is the key: if you provide the correct character encoding, you will get the correct output; otherwise you get nearly correct but wrong output.

import java.io.FileInputStream;
import java.io.IOException;
import org.apache.commons.io.IOUtils;

/**
 * Java Program to convert byte array to String. In this example, we first
 * read an XML file with character encoding "UTF-8" into a byte array and then
 * create a String from it. When you don't specify a character encoding, Java uses
 * the platform's default encoding, which may not match if the file is an XML
 * document coming from another system, an email, or a plain text file fetched
 * from an HTTP server etc. You must first discover the correct character encoding
 * and then use it while converting the byte array to String.
 *
 * @author Javin Paul
 */
public class ByteArrayToString {

    public static void main(String args[]) throws IOException {
        System.out.println("Platform Encoding : " + System.getProperty("file.encoding"));

        // try-with-resources closes the stream; IOUtils.toByteArray() leaves it open
        try (FileInputStream fis = new FileInputStream("info.xml")) {
            // Using Apache Commons IOUtils to read file into byte array
            byte[] filedata = IOUtils.toByteArray(fis);

            // Decode with the file's actual encoding, not the platform default
            String str = new String(filedata, "UTF-8");
            System.out.println(str);
        }
    }
}

Output :
Platform Encoding : Cp1252
<?xml version="1.0" encoding="UTF-8"?>
<banks>
    <bank>
        <name>Industrial & Commercial Bank of China </name>
        <headquarters> Beijing , China</headquarters>
    </bank>
    <bank>
        <name>Crédit Agricole SA</name>
        <headquarters>Montrouge, France</headquarters>
    </bank>
    <bank>
        <name>Société Générale</name>
        <headquarters>Paris, Île-de-France, France</headquarters>
    </bank>
</banks>
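If you would rather not depend on Commons IO, the same program can be sketched with the JDK alone (Java 7 or later). Files.readAllBytes opens and closes the file itself; the file name info.xml comes from the example above, while the class and method names are illustrative.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ByteArrayToStringJdk {

    // Reads the whole file into a byte array, then decodes it with the charset
    // the caller says the file uses.
    static String readFile(Path path, Charset charset) throws IOException {
        byte[] filedata = Files.readAllBytes(path);
        return new String(filedata, charset);
    }

    public static void main(String[] args) throws IOException {
        // Charset.defaultCharset() is what new String(filedata) would silently use
        System.out.println("Platform Encoding : " + Charset.defaultCharset());
        System.out.println(readFile(Paths.get("info.xml"), StandardCharsets.UTF_8));
    }
}
```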
Things to Remember and Best Practices
Always remember, specifying the character encoding while converting a byte array to String is not just a best practice but a must. You should always do it, irrespective of the programming language. By the way, take note of the following things, which will help you avoid a couple of nasty issues:
- Use the character encoding from the source, e.g. Content-Type in HTML files, or <?xml encoding="…">.
- Use XML parsers to parse XML files instead of finding the character encoding and reading it via an InputStream; some things are best left for demo code only.
- Prefer Charset constants, e.g. StandardCharsets.UTF_16, instead of the String "UTF-16"
- Never rely on platform's default encoding scheme
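Applying the first rule to HTTP data means pulling the charset parameter out of the Content-Type value. Below is a minimal sketch of a hypothetical helper, charsetOf, that assumes a simple header value without quoted semicolons and uses a caller-supplied fallback when no usable charset is named; in real code your HTTP client library may already do this parsing for you.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentTypeCharset {

    // Hypothetical helper: returns the charset named in a Content-Type value such as
    // "text/html; charset=ISO-8859-1", or the fallback if none is present or usable.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8).name()); // ISO-8859-1
        System.out.println(charsetOf("application/xml", StandardCharsets.UTF_8).name());              // UTF-8
    }
}
```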
These rules should also be applied when you convert character data to bytes, e.g. converting a String to a byte array using the String.getBytes() method. The no-argument version uses the platform's default character encoding; instead, you should use the overloaded version which takes a character encoding.
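The encoding side can be sketched the same way: encode with getBytes(Charset) and decode with the same Charset. The class and method names here are illustrative; the example also shows that a charset which cannot represent a character makes the round trip lossy.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringToBytes {

    // Encodes and decodes with the same explicit charset, never the platform default.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String name = "Soci\u00e9t\u00e9 G\u00e9n\u00e9rale";
        System.out.println(name.getBytes(StandardCharsets.UTF_8).length);      // 20 (each U+00E9 takes 2 bytes)
        System.out.println(name.getBytes(StandardCharsets.ISO_8859_1).length); // 16
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name)); // true
        // US-ASCII cannot represent U+00E9, so it is replaced by '?' and the text is lost
        System.out.println(roundTrip(name, StandardCharsets.US_ASCII).equals(name)); // false
    }
}
```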
That's all on how to convert a byte array to String in Java. As you can see, the Java API, especially the java.lang.String class, provides methods and constructors that take a byte[] and return a String (or vice versa), but by default they rely on the platform's character encoding, which may not be correct if the byte array was created from XML files, HTTP request data or network protocols. You should always get the correct encoding from the source itself. If you'd like to read more about what every programmer should know about String, you can check out this article.