Next Spaceship

Let's drive it into the future

From Shift-jis to UTF-8

Lately I’ve grown tired of converting each file in my project from Shift-JIS to UTF-8 by hand, so I wanted a script to do the job for me automatically. After searching the Internet, I found that it’s an easy job in Python.

Python (version 2.6 at least) ships with a standard library module called codecs, and all we have to do is use it to read and write files in different encodings.
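
As a minimal sketch of that idea, the core of the conversion is a decode followed by an encode. The function names `sjis_bytes_to_utf8` and `convert_file` below are made up for illustration and are not part of the codecs module; only `codecs.decode` and `codecs.encode` are library calls.

```python
import codecs


def sjis_bytes_to_utf8(raw, incoding="shift_jis", outcoding="utf-8"):
    """Decode raw bytes from incoding and re-encode them as outcoding.

    Raises UnicodeDecodeError if raw is not valid in incoding.
    """
    text = codecs.decode(raw, incoding)     # bytes -> Unicode text
    return codecs.encode(text, outcoding)   # Unicode text -> bytes


def convert_file(infile, outfile):
    """Read infile as Shift-JIS bytes and write the UTF-8 result to outfile."""
    with open(infile, "rb") as fin:
        raw = fin.read()
    # Convert before opening outfile, so a decoding error leaves it untouched.
    converted = sjis_bytes_to_utf8(raw)
    with open(outfile, "wb") as fout:
        fout.write(converted)
```

Working on bytes in binary mode keeps line endings exactly as they were, and plain ASCII content passes through unchanged because ASCII bytes mean the same thing in both encodings.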

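Here is the code. Starting from the folder that contains the script and including its sub-folders, it converts every file with one of the listed extensions (.h, .m, .mm, .cpp, .inl, .def, .txt) from Shift-JIS to UTF-8: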

Transforming Shift-Jis To UTF-8
#!/usr/bin/python
import sys, os, codecs, re

#Created by Liang Sun on March 5, 2011

#Convert all matching files in this script's folder (and its sub-folders)
# from shift-jis encoding to utf-8 encoding

errList = []

def transcode(infile, outfile, incoding = "shift-jis", outcoding = "utf-8"):
    print "infile = " + infile
    print "outfile = " + outfile
    fin = codecs.open(infile, "rb", incoding)
    try:
        # Decode everything up front so a bad file fails before outfile is opened
        text = fin.read()
    except UnicodeDecodeError:
        errList.append(outfile)
        print "!!!" + outfile + " is not encoded in shift-jis."
        # Not shift-jis: copy the bytes across unchanged
        fi = open(infile, "rb")
        fo = open(outfile, "wb")
        fo.write(fi.read())
        fo.close()
        fi.close()
        return
    finally:
        fin.close()
    fout = codecs.open(outfile, "wb", outcoding)
    fout.write(text)
    fout.close()
path = os.path.abspath(os.path.dirname(sys.argv[0]))
print "Current Path: " + path
for dirpath, dirs, files in os.walk(path):
    for filename in files:
        # Escape the dot so that only a real file extension matches
        if re.search(r"\.(h|m|mm|cpp|inl|def|txt)$", filename):
            print "----" + filename + "...."
            # Back up the original bytes in binary mode so nothing is altered
            fi = open(os.path.join(dirpath, filename), 'rb')
            fo = open(os.path.join(dirpath, filename + ".bk"), 'wb')
            fo.write(fi.read())
            fo.close()
            fi.close()
            transcode(os.path.join(dirpath, filename + ".bk"),
                      os.path.join(dirpath, filename))
            print "Done."
            os.remove(os.path.join(dirpath, filename + ".bk"))
if errList:
    print "--------------------------------------------------------"
    print "These files are not encoded in shift-jis:"
    for err in errList:
        print "\t" + err
    print "--------------------------------------------------------"
else:
    print
    print "All files have been translated successfully."
print
print "Created for you by Liang Sun on March, 5, 2011."
raw_input()
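
Before running a script that rewrites files in place, it can help to list what it would touch. The following is a rough sketch of just the file-selection step, using the same extension list as the script; `has_source_extension` and `find_source_files` are hypothetical helper names introduced here, not part of the original script.

```python
import os
import re

# Same extension list as the script. The dot is escaped so that it matches
# only a literal "." in front of the extension.
SOURCE_FILE = re.compile(r"\.(h|m|mm|cpp|inl|def|txt)$")


def has_source_extension(filename):
    """Return True if filename ends in one of the listed extensions."""
    return SOURCE_FILE.search(filename) is not None


def find_source_files(root):
    """Yield the full path of every matching file under root, recursively."""
    for dirpath, dirs, files in os.walk(root):
        for filename in files:
            if has_source_extension(filename):
                yield os.path.join(dirpath, filename)
```

Looping over `find_source_files(".")` and printing each path gives a dry run that modifies nothing.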
