Japanese text libraries are not working #348

Open
weiqiyang opened this issue Nov 21, 2018 · 5 comments

Comments

@weiqiyang

I was trying to convert Japanese kana to romaji. setq and print work fine with kana, but the romanji function in both kana_sjis.l and kana_euc.l does not read the input kana properly.

The following is a sample session using kana_sjis.l:

1.irteusgl$ (load "lib/llib/kana_sjis.l")
t
2.irteusgl$ (romanji "わたしは123まついです。abcひゅうるいちぇんぐふぁつぉでゅ")
"123abc"

I assume the code itself is correct, so the problem may be a mismatch with my terminal's character encoding. Are there any extra settings I need to make before using these libraries?

@YoheiKakiuchi
Member

If you use Emacs,
use M-x set-buffer-process-coding-system to change the encoding used for process input/output.
In the Ubuntu terminal, you can use the menu Terminal(T) -> Set Character Encoding(C) (端末(T) -> 文字コードの設定(C) in the Japanese UI).

However, the actual cause of this issue is that lib/llib/kana_sjis.l is saved in UTF-8 encoding.
So you need to convert the file to Shift_JIS.

FYI, kana_euc.l should likewise be converted (to EUC-JP).

@YoheiKakiuchi
Member

You can check the actual bytes of a string like this:

SJIS

(setq a "ほげ")
(map cons #'(lambda (c) (format nil "0x~X" c)) a)
 => ("0x82" "0xd9" "0x82" "0xb0")

UTF-8

(setq a "ほげ")
(map cons #'(lambda (c) (format nil "0x~X" c)) a)
 => ("0xe3" "0x81" "0xbb" "0xe3" "0x81" "0x92")
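To check the same bytes outside EusLisp, for example to see what a file or terminal should contain, a rough shell equivalent is sketched here. It is only a sketch: show_bytes is a hypothetical helper name, and it assumes an iconv that accepts the SHIFT_JIS and EUC-JP encoding names (glibc or GNU libiconv) plus POSIX od and xargs. It also prints the EUC-JP bytes, which is the form kana_euc.l works with.

```shell
#!/bin/sh
# Hypothetical helper: print the bytes of a string after converting it
# from UTF-8 (this script's own encoding) to the iconv encoding in $1,
# similar to the EusLisp (map cons ...) check above.
show_bytes() {
  # $1 = target encoding name for iconv, $2 = text to convert
  printf '%s' "$2" | iconv -f UTF-8 -t "$1" | od -An -tx1 | xargs
}

for enc in SHIFT_JIS EUC-JP UTF-8; do
  printf '%s: %s\n' "$enc" "$(show_bytes "$enc" 'ほげ')"
done
# SHIFT_JIS: 82 d9 82 b0
# EUC-JP: a4 db a4 b2
# UTF-8: e3 81 bb e3 81 92
```

The Shift_JIS and UTF-8 lines match the EusLisp output above, so a mismatch between what the terminal sends and what the library file contains shows up directly in the byte values.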

@weiqiyang
Author

Thanks! Changing the source file encoding solved my problem.

I made a backup copy of the original file and then converted its encoding:

iconv -f utf-8 -t euc-jp kana_euc.l.bak -o kana_euc.l
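The same backup-then-convert step can be wrapped in a small shell function so it can be applied to both library files. This is a sketch under assumptions: reencode is a hypothetical helper name, the files are assumed to be UTF-8 to begin with, and it skips any file that already has a .bak copy so the UTF-8 original is never overwritten by a second run.

```shell
#!/bin/sh
set -e

# Hypothetical helper: copy FILE to FILE.bak, then rewrite FILE in the
# iconv encoding given as $2 (e.g. EUC-JP or SHIFT_JIS).
reencode() {
  file=$1
  enc=$2
  if [ -e "$file.bak" ]; then
    # A backup already exists, so the file was probably converted before.
    echo "skipping $file: $file.bak already exists" >&2
    return 0
  fi
  cp "$file" "$file.bak"
  # Convert into a temporary file first so a failed conversion
  # leaves FILE untouched.
  iconv -f UTF-8 -t "$enc" "$file.bak" > "$file.tmp"
  mv "$file.tmp" "$file"
}
```

Run from the top of the source tree, reencode lib/llib/kana_euc.l EUC-JP and reencode lib/llib/kana_sjis.l SHIFT_JIS would convert both files. Writing to a temporary file avoids the case where a shell redirection truncates the target before iconv has succeeded.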

The terminal character encoding also needs to be changed to match.
Here is the result:

1.irteusgl$ load "lib/llib/kana_euc.l"
t
2.irteusgl$ (romanji "わたしは123まついです。abcひゅうるいちぇんぐふぁつぉでゅ")
"watashiha123matsuidesu.abchyuuruichenngufatsodyu"

But since most of us use UTF-8 most of the time, and so does the source code on GitHub,
maybe it is time for us to have something like a kana_utf.l?

@k-okada
Member

k-okada commented Nov 23, 2018 via email

@weiqiyang
Author

> That's a good idea. Can you create a PR for this?

Yes, I will.
