How to Read UTF-8 Encoded Files

Background

This article demonstrates how to read UTF-8 encoded files in Java when they contain logographic characters such as Kanji (Japanese), Hanja (Korean), and Hanzi (Chinese).

Hardware Environment

n/a

Software Environment

  • Windows 7 Professional SP1
  • Eclipse – Kepler Release
  • Java 1.7 (1.7.0_67 – Windows x86)

First Things First – for Eclipse

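By default, Eclipse on Windows does not use UTF-8 as its text file encoding. It inherits the platform default (typically Cp1252 on Western-locale installations), which cannot represent logographic characters, so they are displayed as “????” instead. To avoid this, change the Eclipse text file encoding to UTF-8 (for example, under Window > Preferences > General > Workspace), as shown in the image below.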

[Image 01-read0utf8-file: Eclipse file encoding set to UTF-8]
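
To see where the “????” comes from, here is a small sketch that is not part of the original code. It assumes the non-UTF-8 encoding is windows-1252 (Cp1252), and the class and method names are made up for illustration. It encodes two Kanji characters with a charset and decodes them again: windows-1252 has no mapping for them, so each one is replaced by “?”, while UTF-8 preserves them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name is an assumption for illustration only.
class EncodingMismatchDemo {

    // Encodes 'text' with 'charset', then decodes the bytes with the same charset.
    // Characters the charset cannot represent are replaced by its
    // replacement byte, which is '?' for windows-1252.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Two Kanji characters written as Unicode escapes, so this source
        // file compiles the same way regardless of its own encoding.
        String kanji = "\u6F22\u5B57";

        // Assumption: windows-1252 stands in for a typical Western Windows default.
        Charset western = Charset.forName("windows-1252");

        System.out.println("Platform default charset: " + Charset.defaultCharset());
        System.out.println("windows-1252 gives ??: " + roundTrip(kanji, western).equals("??"));
        System.out.println("UTF-8 keeps the text: " + roundTrip(kanji, StandardCharsets.UTF_8).equals(kanji));
    }
}
```

Both comparisons print true. The first line of output depends on the machine and shows the platform default that Eclipse inherits unless the setting is changed.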

The Code to Read UTF-8 Encoded Files
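
The following is a minimal sketch of one way to read a UTF-8 encoded file line by line, assuming Java 7 or later. The class name Utf8FileReaderSketch, the method readLines, and the file name sample-utf8.txt are hypothetical, and the sketch is not necessarily identical to the code in the GitHub repository. The key point is to pass the charset explicitly to InputStreamReader rather than relying on the platform default.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, chosen for this sketch only.
class Utf8FileReaderSketch {

    // Reads every line of the file at 'path', decoding its bytes as UTF-8.
    public static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources (Java 7+) closes the reader even if reading fails.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // "sample-utf8.txt" is an assumed default; pass a real path as the first argument.
        String path = args.length > 0 ? args[0] : "sample-utf8.txt";
        for (String line : readLines(path)) {
            System.out.println(line);
        }
    }
}
```

Whether the printed lines appear correctly still depends on the console encoding, which is why the Eclipse setting above matters.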

Sample Output

[Image 02-read0utf8-file_wm: sample output]

Get the Code from GitHub

https://github.com/Turreta/File-I-O-in-Java/blob/master/src/com/turreta/io/file/UTF8FileReader.java

Karl San Gabriel

Java and Enterprise Technologies Expert