
How to Write Logographic Characters to Files Using UTF-8 Encoding

Background

This article demonstrates how to write logographic characters such as Kanji (Japan), Hanja (Korea), and Hanzi (China) to files using UTF-8 encoding. B2B systems often need to read and write files in several languages. For languages that use logographic characters, the simplest approach is to read and write every file as UTF-8, which can encode all of these scripts.

Hardware Environment

n/a

Software Environment

  • Windows 7 Professional SP1
  • Eclipse – Kepler Release
  • Java 1.7 (1.7.0_67 – Windows x86)

First Things First: Eclipse

On Windows, Eclipse typically defaults to the platform encoding (for example, Cp1252) rather than UTF-8. With that setting, logographic characters are displayed as “????”. To avoid this, change the Eclipse text file encoding to UTF-8 (Window > Preferences > General > Workspace > Text file encoding), as shown in the image below.

[Image: Eclipse text file encoding set to UTF-8]
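
The platform default also matters outside the editor: on Java 7, APIs that take no explicit charset fall back to the JVM default. As a rough check, the sketch below prints that default. The class name DefaultCharsetCheck is hypothetical, and the values printed depend on the machine and on how the program is launched.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, shown only to illustrate the platform default.
class DefaultCharsetCheck {

    // Returns true when the JVM's default charset is already UTF-8.
    static boolean defaultIsUtf8() {
        return StandardCharsets.UTF_8.equals(Charset.defaultCharset());
    }

    public static void main(String[] args) {
        // The output varies by operating system, locale, and launch settings.
        System.out.println("Default charset:  " + Charset.defaultCharset());
        System.out.println("file.encoding:    " + System.getProperty("file.encoding"));
        System.out.println("Default is UTF-8: " + defaultIsUtf8());
    }
}
```

If the default is not UTF-8, passing the charset explicitly to every reader and writer is the safer habit.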

Code to Write UTF-8 Encoded Data to Files
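
The following is a minimal sketch of one way to do this on Java 7, not the listing from the GitHub repository linked at the end of this article. The class name Utf8WriteSketch, the method writeLines, and the sample strings are assumptions made for illustration. The strings are written as Unicode escapes for the Japanese words 日本語 and 漢字, so the source compiles the same way whatever encoding the editor uses.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical class name; not the UTF8FileWriter class from the repository.
class Utf8WriteSketch {

    // Writes each line to the file as UTF-8, creating or truncating the file.
    static void writeLines(Path path, List<String> lines) throws IOException {
        // Passing the charset explicitly avoids the platform default (e.g. Cp1252).
        try (BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample text is an assumption, not taken from the original listing.
        List<String> lines = Arrays.asList(
                "\u65E5\u672C\u8A9E",   // "Japanese language" in Kanji
                "\u6F22\u5B57");        // "Kanji" written in Kanji
        Path path = Paths.get("the_count_of_monte_cristo.txt");
        writeLines(path, lines);
        System.out.println("Wrote " + Files.size(path) + " bytes to " + path);
    }
}
```

Hanja or Hanzi strings can be passed to writeLines in the same way, since UTF-8 can encode all of these characters.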

Sample Output

The code writes Japanese characters to a file named the_count_of_monte_cristo.txt.
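
A console or text viewer may itself use the wrong encoding, so the output is worth checking at the byte level. The sketch below is one possible check, assuming the output file exists in the working directory. The class name Utf8FileCheck and its helper methods are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class for inspecting a file that should contain UTF-8 text.
class Utf8FileCheck {

    // Decodes the whole file as UTF-8, ignoring the platform default charset.
    static String readUtf8(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated uppercase hex pairs, e.g. "E6 97 A5".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // Uses the first argument as the file name, or the article's file name if none is given.
        Path path = Paths.get(args.length > 0 ? args[0] : "the_count_of_monte_cristo.txt");
        System.out.println("Bytes: " + toHex(Files.readAllBytes(path)));
        // This line only displays correctly if the console itself can show the characters.
        System.out.println("Text:  " + readUtf8(path));
    }
}
```

Each of the logographic characters used in this article occupies three bytes in UTF-8. For example, the Kanji 日 (U+65E5) is stored as E6 97 A5.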

Get the Code from GitHub

https://github.com/Turreta/File-I-O-in-Java/blob/master/src/com/turreta/io/file/UTF8FileWriter.java
