Each alternative collation sequence requires a set of four user-created routines: gtm_ac_xform_1 (or gtm_ac_xform), gtm_ac_xback_1 (or gtm_ac_xback), gtm_ac_version, and gtm_ac_verify. GT.M and these routines exchange the original and transformed strings through parameters of type gtm_descriptor or gtm32_descriptor. The include file gtm_descript.h, located in the GT.M distribution directory, defines gtm_descriptor (used with gtm_ac_xform and gtm_ac_xback) as:

typedef struct
{
    short len;
    short type;
    void *val;
} gtm_descriptor;
Note

On 64-bit UNIX platforms, gtm_descriptor may grow by up to eight (8) additional bytes as a result of compiler padding to meet platform alignment requirements.

gtm_descript.h defines gtm32_descriptor (used with gtm_ac_xform_1 and gtm_ac_xback_1) as:

typedef struct
{
    unsigned int len;
    unsigned int type;
    void *val;
} gtm32_descriptor;

where len is the length of the data, type is set to DSC_K_DTYPE_T (indicating that this is an M string), and val points to the text of the string.

The interface to each routine is described below.

The gtm_ac_xform_1 or gtm_ac_xform routine transforms subscripts to the alternative collation sequence.

The syntax of gtm_ac_xform is:

#include "gtm_descript.h"
long gtm_ac_xform(gtm_descriptor *in, int level, gtm_descriptor *out, int *outlen)

If the application uses subscripted lvns longer than 32,767 bytes (but less than 1,048,576 bytes), the alternative collation library must contain the gtm_ac_xform_1 and gtm_ac_xback_1 routines. Otherwise, the alternative collation library can contain gtm_ac_xform and gtm_ac_xback.

The syntax of gtm_ac_xform_1 is:

#include "gtm_descript.h"
int gtm_ac_xform_1(gtm32_descriptor* in, int level, gtm32_descriptor* out, int* outlen);

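The input arguments for gtm_ac_xform and gtm_ac_xform_1 are:

in: a descriptor (gtm_descriptor for gtm_ac_xform, gtm32_descriptor for gtm_ac_xform_1) containing the subscript to transform.

level: an integer giving the subscript level.

The output arguments are:

return value: a status code (long for gtm_ac_xform, int for gtm_ac_xform_1); zero indicates that the transformation succeeded and a non-zero value indicates that it failed.

out: a descriptor whose val points to an output buffer of len bytes; on return, the buffer contains the transformed key.

outlen: an int, passed by reference, giving the actual length of the output key.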

Example:

#include "gtm_descript.h"
#define MYAPP_SUBSC2LONG 12345678
static unsigned char xform_table[256] =
{
  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
 64, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93,
 95, 97, 99,101,103,105,107,109,111,113,115,117,118,119,120,121,
122, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94,
 96, 98,100,102,104,106,108,110,112,114,116,123,124,125,126,127,
128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,
144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,
160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,
176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,
192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,
208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,
224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,
240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255
};
long
gtm_ac_xform(gtm_descriptor *in,   /* the input string */
             int level,            /* the subscript level */
             gtm_descriptor *out,  /* the output buffer */
             int *outlen)          /* the length of the output string */
{
  int n;
  unsigned char *cp, *cout;
/* Ensure space in the output buffer for the string. */
  n = in->len;
  if (n > out->len)
    return MYAPP_SUBSC2LONG;
/* There is space; copy the string, transforming as necessary. */
  cp = (unsigned char *)in->val;    /* Address of first byte of input string */
  cout = (unsigned char *)out->val; /* Address of first byte of output buffer */
  while (n-- > 0)
    *cout++ = xform_table[*cp++];
  *outlen = in->len;
  return 0;
}

The gtm_ac_xback_1 or gtm_ac_xback routine returns transformed keys to their original subscripts. The syntax of gtm_ac_xback is:

#include "gtm_descript.h"
long gtm_ac_xback(gtm_descriptor *in, int level, gtm_descriptor *out, int *outlen)

The arguments of gtm_ac_xback are identical to those of gtm_ac_xform.

The syntax of gtm_ac_xback_1 is:

#include "gtm_descript.h"
long gtm_ac_xback_1(gtm32_descriptor *src, int level, gtm32_descriptor *dst, int *dstlen)

The arguments of gtm_ac_xback_1 are identical to those of gtm_ac_xform_1.

Example:

#include "gtm_descript.h"
#define MYAPP_SUBSC2LONG 12345678
static unsigned char inverse_table[256] =
{
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
64, 65, 97, 66, 98, 67, 99, 68,100, 69,101, 70,102, 71,103, 72,
104, 73,105, 74,106, 75,107, 76,108, 77,109, 78,110, 79,111, 80,
112, 81,113, 82,114, 83,115, 84,116, 85,117, 86,118, 87,119, 88,
120, 89,121, 90,122, 91, 92, 93, 94, 95, 96,123,124,125,126,127,
128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,
144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,
160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,
176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,
192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,
208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,
224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,
240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255
};
long
gtm_ac_xback(gtm_descriptor *in,   /* the input string */
             int level,            /* the subscript level */
             gtm_descriptor *out,  /* the output buffer */
             int *outlen)          /* the length of the output string */
{
  int n;
  unsigned char *cp, *cout;
/* Ensure space in the output buffer for the string. */
  n = in->len;
  if (n > out->len)
    return MYAPP_SUBSC2LONG;
/* There is enough space; copy the string, transforming as necessary. */
  cp = (unsigned char *)in->val;    /* Address of first byte of input string */
  cout = (unsigned char *)out->val; /* Address of first byte of output buffer */
  while (n-- > 0)
    *cout++ = inverse_table[*cp++];
  *outlen = in->len;
  return 0;
}

Two user-defined version control routines provide a safety mechanism to guard against a collation routine being used on the wrong global, or an attempt being made to modify a collation routine for an existing global. Either of these situations could cause incorrect collation or damage to subscripts.

When a global is assigned an alternative collation sequence, GT.M invokes a user-supplied routine (gtm_ac_version) that returns a numeric version identifier for the set of collation routines, and stores that identifier with the global. The first time a process accesses the global, GT.M determines the assigned collation sequence and then invokes another user-supplied routine (gtm_ac_verify), which checks the collation sequence and version identifier stored with the global against those of the current set of collation routines.

When you write the code that matches the type and version, you can decide whether to modify the version identifier and whether to allow support of globals created using a previous version of the routine.

Use the %GBLDEF utility to get, set, or kill the collation sequence of a global variable mapped by the current global directory. %GBLDEF cannot modify the collation sequence for either a global containing data or a global whose subscripts span multiple regions. To change the collation sequence for a global variable that contains data, extract the data, KILL the variable, change the collation sequence, and reload the data. Use GDE to modify the collation sequence of a global variable that spans regions.

For more information, refer to “%GBLDEF” in the Utilities chapter of this manual.

To examine the collation characteristics currently assigned to a global, use the extrinsic entry point:

get^%GBLDEF(gname[,reg])
Note

get^%GBLDEF(gname) returns global-specific characteristics, which can differ from the collation characteristics that MUPIP CREATE defined for the database file from the settings in the global directory.

The DSE DUMP -FILEHEADER command displays the region collation whenever it is other than M standard collation.

Example:

GTM>Write $$get^%GBLDEF("^G")
1,3,1

This example returns the collation sequence information currently assigned to the global ^G.

This example creates an alternative collation sequence that collates upper and lower case alphabetic characters so that the set of keys "du Pont," "Friendly," "le Blanc," and "Madrid" collates as:

du Pont
Friendly
le Blanc
Madrid

This is in contrast to the standard M collation, which orders them as:

Friendly
Madrid
du Pont
le Blanc

Important

No claim of copyright is made with respect to the code used in this example. Please do not use the code as-is in a production environment.

Ensure that you have a correctly configured GT.M installation and correctly configured environment variables, with the appropriate directories and files in place.

Seasoned GT.M users may want to download polish.c used in this example and proceed directly to Step 5 for compiling and linking instructions. First-time users may want to start from Step 1.

  1. Create a new file called polish.c and add the following code:

    #include <stdio.h>
    #include "gtm_descript.h"
    #define COLLATION_TABLE_SIZE     256
    #define MYAPPS_SUBSC2LONG        12345678
    #define SUCCESS     0
    #define FAILURE     1                
    #define VERSION     0
    static unsigned char xform_table[COLLATION_TABLE_SIZE] =
              {
              0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
              16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
              32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
              48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
              64, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93,
              95, 97, 99,101,103,105,107,109,111,113,115,117,118,119,120,121,
              122, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94,
              96, 98,100,102,104,106,108,110,112,114,116,123,124,125,126,127,
              128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,
              144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,
              160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,
              176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,
              192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,
              208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,
              224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,
              240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255
              };
    static unsigned char inverse_table[COLLATION_TABLE_SIZE] =
              {
              0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
              16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
              32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
              48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
              64, 65, 97, 66, 98, 67, 99, 68,100, 69,101, 70,102, 71,103, 72,
              104, 73,105, 74,106, 75,107, 76,108, 77,109, 78,110, 79,111, 80,
              112, 81,113, 82,114, 83,115, 84,116, 85,117, 86,118, 87,119, 88,
              120, 89,121, 90,122, 91, 92, 93, 94, 95, 96,123,124,125,126,127,
              128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,
              144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,
              160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,
              176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,
              192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,
              208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,
              224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,
              240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255
              };

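    Each element of xform_table gives the collation value for the byte at that index; for example, the table maps A to 65, a to 66, B to 67, and b to 68, so that each lower case letter sorts immediately after its upper case counterpart. inverse_table reverses that mapping, so inverse_table[xform_table[c]] equals c for every byte c.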

  2. Add the following code for the gtm_ac_xform transformation routine:

    long gtm_ac_xform(gtm_descriptor *src, int level, gtm_descriptor *dst, int *dstlen)
    {
        int n;
        unsigned char *cp, *cpout;

        n = src->len;
        if (n > dst->len)
            return MYAPPS_SUBSC2LONG;
        cp = (unsigned char *)src->val;
        cpout = (unsigned char *)dst->val;
        while (n-- > 0)
            *cpout++ = xform_table[*cp++];
        /* The length is returned through *dstlen, so no NUL terminator is
           written; one could overrun a buffer of exactly src->len bytes. */
        *dstlen = src->len;
    #ifdef DEBUG
        fprintf(stderr, "\nInput = \n");
        for (n = 0; n < *dstlen; n++)
            fprintf(stderr, " %d ", ((unsigned char *)src->val)[n]);
        fprintf(stderr, "\nOutput = \n");
        for (n = 0; n < *dstlen; n++)
            fprintf(stderr, " %d ", ((unsigned char *)dst->val)[n]);
    #endif
        return SUCCESS;
    }
  3. Add the following code for the gtm_ac_xback reverse transformation routine:

    long gtm_ac_xback(gtm_descriptor *src, int level, gtm_descriptor *dst, int *dstlen)
    {
        int n;
        unsigned char *cp, *cpout;

        n = src->len;
        if (n > dst->len)
            return MYAPPS_SUBSC2LONG;
        cp = (unsigned char *)src->val;
        cpout = (unsigned char *)dst->val;
        while (n-- > 0)
            *cpout++ = inverse_table[*cp++];
        /* No NUL terminator: the length is returned through *dstlen. */
        *dstlen = src->len;
    #ifdef DEBUG
        fprintf(stderr, "Input = %.*s, Output = %.*s\n",
                src->len, (char *)src->val, *dstlen, (char *)dst->val);
    #endif
        return SUCCESS;
    }

  4. Add code for the version identifier routine (gtm_ac_version) and the verification routine (gtm_ac_verify):

    int gtm_ac_version(void)
    {
        return VERSION;
    }

    int gtm_ac_verify(unsigned char type, unsigned char ver)
    {
        return !(ver == VERSION);
    }

  5. Save and compile polish.c. On x86 GNU/Linux (64-bit Ubuntu 10.10), execute a command like the following:

    gcc -c -fPIC polish.c -I$gtm_dist
    Note

    The -I$gtm_dist option adds the GT.M distribution directory, which contains gtm_descript.h, to the compiler's include path. The -fPIC option produces position-independent code suitable for a shared library.

  6. Create a new shared library or add the above routines to an existing one. The following command adds these alternative sequence routines to a shared library called altcoll.so on x86 GNU/Linux (64-bit Ubuntu 10.10).

    gcc -o altcoll.so -shared polish.o
  7. Set the environment variable gtm_collate_1 to the full path of altcoll.so.

  8. At the GTM> prompt, execute the following command:

    GTM>Write $SELECT($$set^%GBLDEF("^G",0,1):"OK",1:"FAILED")
    OK

    This deletes the global variable ^G, then sets ^G to collation sequence number 1, with numeric subscripts collating before strings.

  9. Assign the following values to ^G:

    GTM>Set ^G("du Pont")=1
    GTM>Set ^G("Friendly")=1
    GTM>Set ^G("le Blanc")=1
    GTM>Set ^G("Madrid")=1
  10. Notice how the subscripts of ^G are ordered according to the alternative collation sequence:

    GTM>ZWRite ^G
    ^G("du Pont")=1
    ^G("Friendly")=1
    ^G("le Blanc")=1
    ^G("Madrid")=1

This example creates an alternative collation sequence that collates strings in the reverse of the standard M order. This is in contrast to the standard M collation, which collates them in ascending order.

Important

No claim of copyright is made with respect to the code used in this example. Please do not use the code as-is in a production environment.

Ensure that you have a correctly configured GT.M installation and correctly configured environment variables, with the appropriate directories and files in place.

  1. Download col_reverse_32.c from http://tinco.pair.com/bhaskar/gtm/doc/books/pg/UNIX_manual/col_reverse_32.c. It contains code for the transformation routine (gtm_ac_xform_1), the reverse transformation routine (gtm_ac_xback_1), and the version control routines (gtm_ac_version and gtm_ac_verify).

  2. Save and compile col_reverse_32.c. On x86 GNU/Linux (64-bit Ubuntu 10.10), execute a command like the following:

    gcc -c -fPIC col_reverse_32.c -I$gtm_dist
    Note

    The -I$gtm_dist option adds the GT.M distribution directory, which contains the GT.M header files, to the compiler's include path. The -fPIC option produces position-independent code suitable for a shared library.

  3. Create a new shared library or add the routines to an existing one. The following command adds these alternative sequence routines to a shared library called revcol.so on x86 GNU/Linux (64-bit Ubuntu 10.10).

    gcc -o revcol.so -shared col_reverse_32.o
  4. Set the environment variable gtm_collate_2 to point to the location of revcol.so. To set the local variable collation to this alternative collation sequence, set the environment variable gtm_local_collate to 2.

  5. At the GTM> prompt, execute the following command:

    GTM>Write $SELECT($$set^%GBLDEF("^E",0,2):"OK",1:"FAILED")
    OK
  6. Assign the following values to ^E:

    GTM>Set ^E("du Pont")=1
    GTM>Set ^E("Friendly")=1
    GTM>Set ^E("le Blanc")=1
    GTM>Set ^E("Madrid")=1
  7. Notice how the subscripts of ^E sort in reverse order:

    GTM>zwrite ^E
    ^E("le Blanc")=1
    ^E("du Pont")=1
    ^E("Madrid")=1
    ^E("Friendly")=1