How to Read UTF-8 Encoded Files

Background

This article demonstrates how to read, in Java, UTF-8 encoded files that contain logographic characters such as Kanji (Japan), Hanja (Korea), and Hanzi (China).

Hardware Environment

n/a

Software Environment

  • Windows 7 Professional SP1
  • Eclipse – Kepler Release
  • Java 1.7 (1.7.0_67 – Windows x86)

First Things First: Eclipse Settings

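By default, Eclipse on Windows uses the platform encoding (typically Cp1252 on Western-locale systems) rather than UTF-8 for text files and console output. With that setting, logographic characters are displayed as “????”. To avoid this, change the Eclipse text file encoding to UTF-8 (Window > Preferences > General > Workspace > Text file encoding), as shown in the image below.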

01-read0utf8-file
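
The question marks are what Java produces when text is encoded with a charset that cannot represent the characters. The following is a minimal sketch of that effect, independent of Eclipse. The class name QuestionMarkDemo and the helper roundTrip are hypothetical names chosen for illustration, and the sample text is written with Unicode escapes so the result does not depend on the encoding of the source file itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original article's code.
class QuestionMarkDemo {

    // Encodes the text with the given charset, then decodes the bytes with the same charset.
    // String.getBytes(Charset) substitutes the charset's replacement byte ('?' for US-ASCII)
    // for every character the charset cannot represent.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Three Kanji characters (U+65E5, U+672C, U+8A9E) written as escapes.
        String kanji = "\u65E5\u672C\u8A9E";

        // The charset the JVM falls back to when none is specified.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // US-ASCII cannot represent the characters, so each one becomes '?'.
        System.out.println("US-ASCII loses the text: "
                + roundTrip(kanji, StandardCharsets.US_ASCII).equals("???"));

        // UTF-8 can represent them, so the text survives unchanged.
        System.out.println("UTF-8 keeps the text: "
                + roundTrip(kanji, StandardCharsets.UTF_8).equals(kanji));
    }
}
```

The same substitution happens whenever output passes through a charset that lacks the characters, which is why an explicit UTF-8 setting matters both in the IDE and in the code.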

The Code to Read UTF-8 Encoded Files
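
One way to read such a file on Java 7 is to open it with an explicit UTF-8 charset instead of relying on the platform default. The sketch below is an illustration under stated assumptions, not necessarily the listing in the GitHub repository linked further down: the class name Utf8ReadSketch and the method readLines are hypothetical, and main writes its own temporary sample file so the example runs without any external input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch class, not the author's UTF8FileReader.
class Utf8ReadSketch {

    // Reads every line of the file, decoding the bytes explicitly as UTF-8
    // rather than with the platform default charset.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<String>();
        // try-with-resources (Java 7) closes the reader even if reading fails.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path file;
        if (args.length > 0) {
            // Read the file named on the command line.
            file = Paths.get(args[0]);
        } else {
            // No argument given: create a small UTF-8 sample file (an assumption for the demo).
            // The text uses Unicode escapes so it does not depend on the source file encoding.
            file = Files.createTempFile("utf8-sample", ".txt");
            String sample = "\u6F22\u5B57\n\u65E5\u672C\u8A9E\n";
            Files.write(file, sample.getBytes(StandardCharsets.UTF_8));
        }

        for (String line : readLines(file)) {
            System.out.println(line);
        }
    }
}
```

Whether the printed lines appear correctly still depends on the console encoding, which is why the Eclipse setting described earlier needs to be UTF-8 as well.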

Sample Output

02-read0utf8-file_wm

Get the Code from GitHub

https://github.com/Turreta/File-I-O-in-Java/blob/master/src/com/turreta/io/file/UTF8FileReader.java
