Revision History | |
---|---|---
Revision V6.3-011 | 20 December 2019 |
Revision V6.3-010 | 31 October 2019 |
Revision V6.3-006 | 26 October 2018 |
Revision V6.3-001 | 20 March 2017 |
Revision V6.1-000 | 28 August 2014 | In “Using the %GBLDEF Utility”, added changes for global spanning regions.
This chapter describes GT.M facilities for applications that use characters encoded in units other than eight-bit bytes (octets). Before using UTF-8 features, ensure that your system has the infrastructure needed for the languages you wish to support installed and configured, including International Components for Unicode (ICU / libicu), UTF-8 locale(s), and terminal emulators with appropriate fonts. This chapter addresses two specific issues: defining alternative collation sequences, and defining unique patterns for use with the pattern match operator.
Alternative collation sequences (alternative orderings of strings) can be defined for global and local variable subscripts. They can be established for specified globals or for an entire database. The alternative sequences are defined by a series of routines in an executable file pointed to by an environment variable. Because the collation sequence is implemented by a user-supplied program, virtually any collation policy may be implemented. Detailed information on establishing alternative collation sequences and defining the environment variable is provided in “Collation Sequence Definitions”.
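The idea of a user-supplied collation policy can be illustrated with a minimal Python sketch. It is not GT.M's routine interface (that is covered in “Collation Sequence Definitions”); the names `xform`, `xback`, and `collate` are hypothetical, and the policy chosen here (lowercase ASCII letters sort before uppercase) is only an example. The sketch assumes that a collation can be expressed as a reversible transformation of each subscript into a key whose plain byte order gives the desired sequence.

```python
import string

# Hypothetical collation policy: swap ASCII upper and lower case so that
# lowercase letters sort before uppercase ones under plain byte comparison.
_UPPER = string.ascii_uppercase.encode()
_LOWER = string.ascii_lowercase.encode()
_SWAP = bytes.maketrans(_UPPER + _LOWER, _LOWER + _UPPER)


def xform(subscript: bytes) -> bytes:
    """Transform a subscript into its collation key."""
    return subscript.translate(_SWAP)


def xback(key: bytes) -> bytes:
    """Recover the original subscript from a key (the swap is its own inverse)."""
    return key.translate(_SWAP)


def collate(subscripts):
    """Order subscripts by the byte order of their transformed keys."""
    return sorted(subscripts, key=xform)


names = [b"Zebra", b"apple", b"Apple", b"zebra"]
print(sorted(names))   # plain byte order: uppercase first
print(collate(names))  # alternative order: lowercase first
```

The transformation must be reversible so that the original subscript can always be reconstructed from the stored key, which is why the sketch pairs `xform` with `xback`.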
M defines pattern classes that serve as arguments to the pattern match operator. GT.M supports user definition of additional pattern classes as well as redefinition of the standard pattern classes. Specific patterns are defined in a text file that is pointed to by an environment variable. Pattern classes may be redefined dynamically. The details of defining these pattern classes and the environment variable are described in the section called “Matching Alternative Patterns”.
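As a rough illustration of what adding a pattern class means, the following Python sketch models pattern classes as a table from a pattern code to a set of characters. It does not reproduce GT.M's pattern file format or its matcher; the table contents are an ASCII-only assumption, the code `V` is a hypothetical user-defined class, and `compile_pattern` and `matches` are names invented for this example. A pattern such as `1U.L` is represented as a list of `(min, max, codes)` atoms.

```python
import re

# Assumed ASCII-only model of a few standard pattern codes.
PATTERN_CLASSES = {
    "A": "A-Za-z",  # alphabetic
    "U": "A-Z",     # uppercase
    "L": "a-z",     # lowercase
    "N": "0-9",     # numeric
}


def compile_pattern(atoms, classes=PATTERN_CLASSES):
    """Build a regex from (min, max, codes) atoms; max=None means unbounded."""
    parts = []
    for lo, hi, codes in atoms:
        char_set = "".join(classes[code] for code in codes)
        upper = "" if hi is None else str(hi)
        parts.append(f"[{char_set}]{{{lo},{upper}}}")
    return re.compile("".join(parts))


def matches(value, atoms, classes=PATTERN_CLASSES):
    """Return True when the whole value matches the pattern."""
    return compile_pattern(atoms, classes).fullmatch(value) is not None


# Equivalent of the pattern 1U.L: one uppercase, then any number of lowercase.
capitalized = [(1, 1, "U"), (0, None, "L")]
print(matches("Hello", capitalized))
print(matches("hello", capitalized))

# Hypothetical user-defined class V (vowels) added to a copy of the table.
custom = dict(PATTERN_CLASSES, V="AEIOUaeiou")
print(matches("ae", [(2, 2, "V")], custom))
```

Because the classes live in a table rather than in the matcher itself, adding or redefining a class only changes data, which mirrors the way the text describes pattern definitions being kept in a separate file.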
For some languages (such as Chinese), the ordering of strings according to Unicode® code-points (character values) may not be the linguistically or culturally correct ordering. Supporting applications in such languages requires development of collation modules. GT.M natively supports M collation, but does not include pre-built collation modules for any specific natural language. Therefore, applications that use characters in Unicode may need to implement their own collation functions. For more information on developing a collation module for Unicode, refer to “Implementing an Alternative Collation Sequence for Unicode® characters”.
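The gap between code-point order and a linguistically expected order can be seen in a small Python sketch. It uses accented Latin text rather than Chinese to keep the expected result easy to verify, and the `collation_key` function is a hypothetical simplification (strip combining accents, fold case, break ties on the original string), not a full implementation of the Unicode Collation Algorithm and not a GT.M collation module.

```python
import unicodedata


def collation_key(text):
    """Simplified key: compare base letters first, original string second."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), text)


words = ["zebra", "\u00e9clair", "apple"]  # "\u00e9" is a precomposed e-acute

# Code-point order places the accented word after "zebra" (U+00E9 > U+007A).
print(sorted(words))

# The simplified key places it between "apple" and "zebra".
print(sorted(words, key=collation_key))
```

A production collation module for a specific language would typically rely on language-specific rules (for example through ICU) rather than on accent stripping alone.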