
How to Write Logographic Characters to Files Using UTF-8 Encoding

Background

This article demonstrates how to write logographic characters such as Kanji (Japan), Hanja (Korea), and Hanzi (China) to files using UTF-8 encoding. In a B2B system, it is not uncommon to read and write files in different languages. For languages that use logographic characters, the simplest solution is to perform the reading and writing in UTF-8 encoding.

Hardware Environment

n/a

Software Environment

  • Windows 7 Professional SP1
  • Eclipse – Kepler Release
  • Java 1.7 (1.7.0_67 – Windows x86)

First things first: Eclipse

In this environment, Eclipse does not use UTF-8 as its default file encoding, so logographic characters are displayed as “????”. To avoid this, change the Eclipse file encoding to UTF-8, as shown in the image below.

[Image: 01-read0utf8-file]
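
To see where the “????” comes from, here is a minimal sketch of what happens when Kanji pass through an encoding that cannot represent them. ISO-8859-1 is used only as a stand-in for a non-UTF-8 default; the actual Eclipse default depends on the platform. The class and method names are made up for this illustration, and the Kanji are written as Unicode escapes so the example does not depend on how the source file itself is saved.

```java
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {

    // Three Kanji ("Nihongo", the word for the Japanese language), as Unicode escapes.
    public static final String KANJI = "\u65E5\u672C\u8A9E";

    // Encodes to ISO-8859-1 and decodes again.
    // String.getBytes(Charset) replaces every unmappable character with '?'.
    public static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Same round trip through UTF-8, which can represent every Unicode character.
    public static String roundTripUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Kanji are lost: the result is the plain ASCII string "???".
        System.out.println("ISO-8859-1 gives ???: " + roundTripLatin1(KANJI).equals("???"));
        // The Kanji survive unchanged.
        System.out.println("UTF-8 preserves text: " + roundTripUtf8(KANJI).equals(KANJI));
    }
}
```

Both lines print true. The same loss happens whenever text is saved or displayed with an encoding that has no mapping for the characters, which is why the encoding setting matters.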

The Code to Write UTF-8 Encoded Data to Files
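
The following is a minimal sketch of one way to write UTF-8 encoded text with the Java 7 NIO API. It is not the code from the GitHub repository linked at the end of this article. The class name Utf8WriteSketch, the method writeUtf8, and the sample strings are assumptions made for illustration; only the output file name comes from this article.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8WriteSketch {

    // Writes each line to the file as UTF-8, creating or overwriting the file.
    public static void writeUtf8(Path file, String... lines) throws IOException {
        // Passing the charset explicitly avoids relying on the platform default.
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("the_count_of_monte_cristo.txt");
        writeUtf8(file,
                "\u65E5\u672C\u8A9E",  // Kanji sample (hypothetical content)
                "\u6F22\u5B57",        // Hanja sample (hypothetical content)
                "\u6C49\u5B57");       // Hanzi sample (hypothetical content)
    }
}
```

The key design choice is naming StandardCharsets.UTF_8 at the point where the writer is created. A writer created without a charset, such as java.io.FileWriter on Java 7, uses the platform default encoding, so it can silently produce a file that is not UTF-8.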

Sample Output

The code writes the Japanese characters to a file named the_count_of_monte_cristo.txt.
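
One way to confirm that a file really contains the intended characters, without depending on how an editor or console renders them, is to read it back as UTF-8 and compare the result with the original text. The sketch below assumes a hypothetical class Utf8ReadBackSketch and uses a temporary file with made-up sample strings rather than the file named above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class Utf8ReadBackSketch {

    // Writes the lines as UTF-8, then reads the same file back as UTF-8.
    public static List<String> roundTrip(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8);
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Sample Kanji, Hanja, and Hanzi strings (hypothetical content).
        List<String> original = Arrays.asList(
                "\u65E5\u672C\u8A9E", "\u6F22\u5B57", "\u6C49\u5B57");

        Path file = Files.createTempFile("utf8-readback", ".txt");
        try {
            List<String> readBack = roundTrip(file, original);
            // Compare the strings instead of printing them, because the
            // console itself may not be able to display the characters.
            System.out.println("Round trip OK: " + original.equals(readBack));
        } finally {
            Files.delete(file);
        }
    }
}
```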

Get the Code from GitHub

https://github.com/Turreta/File-I-O-in-Java/blob/master/src/com/turreta/io/file/UTF8FileWriter.java

Karl San Gabriel

Java and Enterprise Technologies Expert