how to extract characters from a Korean string in VBA

白昼怎懂夜的黑 提交于 2019-12-04 14:43:13

I think what you are looking for is a Byte Array Dim aByte() as byte aByte="한글" should give you the two unicode values for each character in the string

Disclaimer: I know little about Access or VBA, but what you're having is a generic Unicode problem, it's not specific to those tools. I retagged your question to add tags related to this issue.

Access is doing the right thing by returning 한, it is indeed the first character of that two-character string. What you want here is the canonical decomposition of this hangul in its constituent jamos, also known as Normalization Form D (NFD), for “decomposed”. The NFD form is ᄒ ‌ᅡ ‌ᆫ, of which the first character is what you want.

Note also that as per your example, you seem to want a function to return the equivalent hangul (ㅎ) for the jamo (ᄒ) – there really are two different code points because they represent different semantic units (a full-fledged hangul syllable, or a part of a hangul). There is no pre-defined mapping from the former to the latter, you could write a small function to that effect, as the number of jamos is limited to a few dozens (the real work is done in the first function, NFD).

Adding to Arthur's excellent answer, I want to point out that extracting jamo from hangeul syllables is very straightforward from the standard. While the solution isn't specific to Excel or Access (it's a Python module), it only involves arithmetic expressions so it should be easily translated to other languages. The formulas, as can be seen, are identical to those in page 109 of the standard. The decomposition is returned as a tuple of integers encoded strings, which can be easily verified to correspond to the Hangul Jamo Code Chart.

# -*- encoding: utf-8 -*-

SBase = 0xAC00
LBase = 0x1100
VBase = 0x1161
TBase = 0x11A7
SCount = 11172
LCount = 19
VCount = 21
TCount = 28
NCount = VCount * TCount


def decompose(syllable):
    global SBase, LBase, VBase, TBase, SCount, LCount, VCount, TCount, NCount

    S = ord(syllable)
    SIndex = S - SBase
    L = LBase + SIndex / NCount
    V = VBase + (SIndex % NCount) / TCount
    T = TBase + SIndex % TCount

    if T == TBase:
        result = (L,V)
    else:
        result = (L,V,T)

    return tuple(map(unichr, result))

if __name__ == '__main__':
    test_values = u'항가있닭넓짧'

    for syllable in test_values:
        print syllable, ':',
        for s in decompose(syllable): print s,
        print

This is the output in my console:

항 : ᄒ ᅡ ᆼ
가 : ᄀ ᅡ
있 : ᄋ ᅵ ᆻ
닭 : ᄃ ᅡ ᆰ
넓 : ᄂ ᅥ ᆲ
짧 : ᄍ ᅡ ᆲ

I assume you got what you needed, but it seems rather convoluted. I don't know anything about this, but recently did some investigating of handling Unicode, and looked into all the string Byte functions, such as LeftB(), RightB(), InputB(), InStrB(), LenB(), AscB(), ChrB() and MidB(), and there's also StrConv(), which has a vbUnicode argument. These are all functions that I'd think would be used in any double-byte context, but then, I don't work in that environment so might be missing something very important.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!