Background
This article demonstrates how to write logographic characters, such as Kanji (Japan), Hanja (Korea), and Hanzi (China), to files using UTF-8 encoding. B2B systems commonly read and write files in different languages. For languages that use logographic characters, the simplest approach is to do all reading and writing in UTF-8.
Hardware Environment
n/a
Software Environment
- Windows 7 Professional SP1
- Eclipse – Kepler Release
- Java 1.7 (1.7.0_67 – Windows x86)
First Things First: Eclipse
By default, Eclipse on Windows uses the platform encoding (typically Cp1252) for text files rather than UTF-8. With that setting, logographic characters are displayed as “????”. To avoid this, change the Eclipse text file encoding to UTF-8 (Window > Preferences > General > Workspace > Text file encoding) as shown in the image below.
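The “????” effect is not specific to Eclipse. It appears whenever text is pushed through a charset that cannot represent it. The following is a minimal sketch of that behaviour in plain Java; the class name `QuestionMarkDemo` and the helper `roundTrip` are made up for illustration, and the sample text is written with Unicode escapes so the source compiles regardless of the editor's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original article's code.
class QuestionMarkDemo {

    // Encodes the text to bytes and decodes it again with the same charset.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as escapes.
        String text = "\u65E5\u672C\u8A9E";

        // On Windows this is often windows-1252 rather than UTF-8.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // ISO-8859-1 cannot represent these characters, so each becomes '?'.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1).equals("???"));

        // UTF-8 can represent them, so the text survives unchanged.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));
    }
}
```

Both comparisons print `true`: the first because every unmappable character is replaced by a question mark, the second because UTF-8 preserves the text.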
The Code to Write UTF-8 Encoded Data to Files
/*
 * Copyright (C) 2014 www.turreta.com
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.turreta.io.file;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class UTF8FileWriter {

    /**
     * Writes the given text to the file using UTF-8 encoding,
     * replacing any existing content.
     */
    public void write(File file, String text) throws IOException {
        // try-with-resources (Java 7) flushes and closes the writer, and
        // avoids a NullPointerException if the file cannot be opened.
        try (Writer out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8))) {
            out.append(text);
        }
    }

    public static void main(String... args) {
        String stringToWrite = "すべての人間の知恵は、これらの2つの単語に含まれている - 待って、ホープ";
        File file = new File("the_count_of_monte_cristo.txt");
        UTF8FileWriter utf8Writer = new UTF8FileWriter();
        try {
            utf8Writer.write(file, stringToWrite);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
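As an alternative to wrapping a FileOutputStream by hand, the java.nio.file API available since Java 7 can open a UTF-8 writer in one call. This is only a sketch of that option, not the article's original approach; the class name `Utf8NioWriter` and the file name `nio_utf8_sample.txt` are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical alternative to UTF8FileWriter, using java.nio.file.
class Utf8NioWriter {

    // Creates or truncates the file and writes the text as UTF-8.
    static void write(Path path, String text) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("nio_utf8_sample.txt"); // illustrative file name
        // Three Japanese characters, written as escapes so the source file's
        // own encoding does not matter.
        write(path, "\u65E5\u672C\u8A9E");
        // Each of these characters takes 3 bytes in UTF-8.
        System.out.println(Files.size(path) + " bytes written");
    }
}
```

Passing the charset explicitly in both versions is the important part: the output no longer depends on the platform default encoding.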
Sample Output
The code writes the Japanese text to a file named the_count_of_monte_cristo.txt.
Get the Code from GitHub
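One way to confirm that a file really contains UTF-8 is to read it back with an explicit UTF-8 decoder and compare the result with the original string. The sketch below assumes nothing about the article's repository: the class `Utf8ReadBack` and its `read` helper are hypothetical, and it writes its own small temporary file so that it runs on its own.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical verification helper, not part of the original article's code.
class Utf8ReadBack {

    // Reads the whole file into a String, decoding the bytes as UTF-8.
    static String read(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new BufferedReader(new InputStreamReader(
                new FileInputStream(file), StandardCharsets.UTF_8))) {
            char[] buffer = new char[1024];
            int count;
            while ((count = in.read(buffer)) != -1) {
                sb.append(buffer, 0, count);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "\u65E5\u672C\u8A9E"; // sample text as escapes
        File file = File.createTempFile("utf8_read_back", ".txt");
        file.deleteOnExit();

        // Write the sample as UTF-8, then read it back the same way.
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(original);
        }
        System.out.println("Round trip intact: " + original.equals(read(file)));
    }
}
```

The same `read` helper could be pointed at the_count_of_monte_cristo.txt after running the writer above.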
https://github.com/Turreta/File-I-O-in-Java/blob/master/src/com/turreta/io/file/UTF8FileWriter.java