Returns a properly encoded string from a sequence of bytes.
$ZSUB[STR] (expr ,intexpr1 [,intexpr2])
The first expression is the byte string from which $ZSUBSTR() derives the character sequence.
The second expression is the starting byte position in the first expression (counting from 1 for the first byte) at which $ZSUBSTR() begins to derive the character sequence.
The optional third expression specifies the number of bytes from the starting byte position specified by the second expression that contribute to the result. If the third expression is not specified, the $ZSUBSTR() function returns the sequence of characters starting from the byte position specified by the second expression up to the end of the byte string.
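As an illustration of the starting position and byte count described above, the following direct-mode session is a minimal sketch. It assumes a process started in UTF-8 mode and a terminal that supplies the literals in UTF-8; the variable name y is chosen only for this illustration.

```mumps
GTM>write $zchset
UTF-8
GTM>set y="a"_"ç"_"新" ; 1 + 2 + 3 = 6 bytes
GTM>write $zlength(y)
6
GTM>write $zsubstr(y,2,2) ; two bytes starting at byte 2
ç
GTM>write $zsubstr(y,4) ; byte 4 through the end of the byte string
新
```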
The $ZSUBSTR() function never returns a string with illegal or invalid characters. With VIEW "NOBADCHAR", the $ZSUBSTR() function ignores all byte sequences within the specified range that do not correspond to valid UTF-8 code-points. With VIEW "BADCHAR", the $ZSUBSTR() function triggers a run-time error if the specified byte sequence contains a code-point value that is not in the character set.
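The two VIEW settings can be tried with a string that contains a byte which is never valid in UTF-8. The commands below are a sketch under assumptions: the process runs in UTF-8 mode, and x is a hypothetical variable built with $ZCHAR(255). No output is shown; the comments restate what the description above leads one to expect.

```mumps
GTM>view "NOBADCHAR"
GTM>set x="a"_$zchar(255)_"b" ; $zchar(255) is a single byte that is not valid UTF-8
GTM>write $zsubstr(x,1,3) ; per the text above, the invalid byte should be ignored
GTM>view "BADCHAR"
GTM>write $zsubstr(x,1,3) ; per the text above, this should trigger a run-time error
```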
$ZSUBSTR() is similar to the byte-oriented $ZEXTRACT() function, but differs from it by restricting its result to valid characters in the current encoding.
Example:
GTM>write $ZCHSET
M
GTM>set char1="a" ; one-byte character
GTM>set char2="ç" ; two-byte character
GTM>set char3="新" ; three-byte character
GTM>set y=char1_char2_char3
GTM>write $zsubstr(y,1,3)=$zsubstr(y,1,5)
0
With character set M specified, the expression $ZSUBSTR(y,1,3)=$ZSUBSTR(y,1,5) evaluates to 0 ("false") because each byte is treated as a character, so $ZSUBSTR(y,1,5) returns more characters than $ZSUBSTR(y,1,3).
Example:
GTM>write $zchset
UTF-8
GTM>set char1="a" ; one-byte character
GTM>set char2="ç" ; two-byte character
GTM>set char3="新" ; three-byte character
GTM>set y=char1_char2_char3
GTM>write $zsubstr(y,1,3)=$zsubstr(y,1,5)
1
For a process started in UTF-8 mode, the expression $ZSUBSTR(y,1,3)=$ZSUBSTR(y,1,5) evaluates to 1 ("true"). $ZSUBSTR(y,1,5) returns only char1 and char2: the three-byte char3 is excluded because it does not fit completely within the specified byte length.
In many ways, the $ZSUBSTR() function is similar to the $ZEXTRACT() function. For example, for a string that contains only valid characters, $ZSUBSTR(expr,intexpr1) is equivalent to $ZEXTRACT(expr,intexpr1,$ZLENGTH(expr)). With the M character set, where every byte is a character, $ZSUBSTR() therefore selects the same bytes as $EXTRACT() and $ZEXTRACT() over the equivalent range. The differences are as follows:
$ZSUBSTR() cannot appear on the left of the equal sign in the SET command, whereas $ZEXTRACT() can.
In both modes, the third expression of $ZSUBSTR() is a byte count, rather than a character position, within the first expression.
$EXTRACT() operates on characters, irrespective of byte length.
$ZEXTRACT() operates on bytes, irrespective of multi-byte character boundaries.
$ZSUBSTR() is the only way to extract valid UTF-8 encoded characters from a byte string containing mixed UTF-8 and non-UTF-8 data. It operates on whole Unicode® characters and ensures that its result does not exceed the given byte length.
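The three functions can be compared side by side on the same string. This is a sketch that assumes a process started in UTF-8 mode and UTF-8 encoded literals; $ZLENGTH() is used to report byte lengths so that partial characters do not need to be displayed.

```mumps
GTM>set y="a"_"ç"_"新"
GTM>write $zlength($extract(y,1,2)) ; two characters: "a" and "ç"
3
GTM>write $zlength($zextract(y,1,2)) ; two bytes: "a" plus the first byte of "ç"
2
GTM>write $zlength($zsubstr(y,1,2)) ; only "a" fits completely within two bytes
1
GTM>write $zextract(y,4,6)=$zsubstr(y,4,3) ; end position 6 versus a count of 3 bytes
1
```

The last line shows the difference in the third argument: $ZEXTRACT() takes an ending byte position, while $ZSUBSTR() takes a number of bytes.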