Background
This article demonstrates how to read UTF-8 encoded files that contain logographic characters such as Kanji (from Japan), Hanja (from Korea), and Hanzi (from China).
Hardware Environment
n/a
Software Environment
- Windows 7 Professional SP1
- Eclipse – Kepler Release
- Java 1.7 (1.7.0_67 – Windows x86)
First things first: Eclipse
By default, Eclipse on Windows saves and displays text files using the platform encoding (typically Cp1252) rather than UTF-8. With that setting, logographic characters are displayed as “????”. To avoid this, change the Eclipse text file encoding to UTF-8 under Window > Preferences > General > Workspace > Text file encoding, as shown in the image below.
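The “????” effect can be reproduced outside Eclipse. The following is a minimal sketch, not part of the original project: the class name EncodingMismatchDemo and the method roundTrip are hypothetical, and ISO-8859-1 is assumed here as a stand-in for a single-byte legacy encoding such as Cp1252. It shows that characters a charset cannot represent are replaced with "?" when encoded, while UTF-8 preserves them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original project.
public class EncodingMismatchDemo {

    // Encodes the text to bytes with the given charset, then decodes
    // those bytes back to a String with the same charset.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Two Kanji characters (U+6F22, U+5B57), written as Unicode escapes
        // so this source file compiles regardless of its own encoding.
        String kanji = "\u6f22\u5b57";

        // Assumption: ISO-8859-1 stands in for a legacy single-byte encoding.
        // It cannot represent Kanji, so each character is replaced with '?'.
        String legacy = roundTrip(kanji, StandardCharsets.ISO_8859_1);
        System.out.println("Legacy round trip: " + legacy);

        // UTF-8 can represent every Unicode character, so nothing is lost.
        String utf8 = roundTrip(kanji, StandardCharsets.UTF_8);
        System.out.println("UTF-8 round trip preserved text: " + utf8.equals(kanji));
    }
}
```

The replacement happens at encoding time, so once text has been saved in the wrong encoding the original characters cannot be recovered by switching the setting afterwards.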
The Code to Read UTF-8 Encoded Files
/*
 * Copyright (C) 2014 www.turreta.com
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.turreta.io.file;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class UTF8FileReader {

    public void read(File file) throws IOException {
        // try-with-resources (Java 7+) closes the reader automatically.
        // The original finally block called in.close() even when opening
        // the file had failed and in was still null.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String str;
            // Decode the file as UTF-8 and print it line by line
            while ((str = in.readLine()) != null) {
                System.out.println("Processing line: " + str);
            }
        }
    }
}
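To try the same reading approach without preparing an input file by hand, here is a minimal self-contained sketch. It is not from the original repository: the class Utf8RoundTripDemo and its methods readLines and writeSample are hypothetical names. It writes a small UTF-8 file to a temporary location, then reads it back through the same FileInputStream and InputStreamReader chain, returning the lines instead of printing them.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the original project.
public class Utf8RoundTripDemo {

    // Reads every line of the file, decoding its bytes as UTF-8.
    public static List<String> readLines(File file) throws IOException {
        List<String> lines = new ArrayList<String>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Creates a temporary file containing the given text encoded as UTF-8.
    public static File writeSample(String text) throws IOException {
        File file = File.createTempFile("utf8-sample", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(), text.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    public static void main(String[] args) throws IOException {
        // One line each of Kanji, Hanja (in Hangul), and simplified Hanzi,
        // written as Unicode escapes so the source encoding does not matter.
        String sample = "\u6f22\u5b57\n\ud55c\uc790\n\u6c49\u5b57\n";
        List<String> lines = readLines(writeSample(sample));

        // Print counts rather than the characters themselves, so the result
        // does not depend on the console encoding.
        System.out.println("Lines read: " + lines.size());
        for (String line : lines) {
            System.out.println("Characters in line: " + line.codePointCount(0, line.length()));
        }
    }
}
```

Passing the charset explicitly on both the writing and the reading side is the important part; relying on the platform default on either side is what typically produces garbled text.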
Sample Output
Get the Code from GitHub
https://github.com/Turreta/File-I-O-in-Java/blob/master/src/com/turreta/io/file/UTF8FileReader.java