Encoding schemes (the bit pattern that represents a given character) vary among platforms (System i, Windows) and also between languages (Chinese, English, Spanish, etc.). For example, the character '¢' has many different encodings. On a System i we can find it in US English as x'4A', most Latin languages as x'B0', and Japanese as x'B1'. It is encoded on the IBM PC code page as x'9B', on the PC Multilingual code page as x'BD' and in Windows as x'A2'.
In order for characters to appear properly across the various platforms and languages, they need to be translated from their stored encoding to the encoding used by the observer. IBM i accomplishes this automatically through the use of CCSID values. Characters that are "tagged" with a particular CCSID are automatically translated whenever they are accessed by a job with a different CCSID value. This means that a '¢' stored by a U.S. user will appear as a '¢' on a workstation accessed by a user in Quebec, Canada even though the character has a different encoding for each of them.
Although translation is automatic among IBM i based jobs, Sequel software must manage the translation when sending results to stream files or a personal computer. This translation occurs whenever the EXECUTE command is used to create results in email, IFS stream files, HFS documents, and PC files or results windows displayed via ViewPoint. REPORT results sent to an IFS file or converted to PDF format must also be converted to a PC based encoding, as are PRINT results that are emailed.
This document describes the method of the Sequel directed translation and how to configure Sequel so that the translation is performed correctly.
The important factors affecting character translation with Sequel/Viewpoint and Esend are:
On System i:
- SQTRN data area (for Sequel)
- EMTRN data area (for Esend)
See “HOST Character Translation” for more information.
From PC control panel:
- Start\Program \ iSeries Access for Windows \ iSeries Access for Windows Properties \ Language tab.
Almost always must be 1252. See “Viewpoint Character Translation” for more information. This setting in very important.
- Regional and Language options.
This is moderately Important (more so for double byte considerations). See “DBCS and Asian Language support” for more information.
In iSeries Emulator: (green screen - does not impact ViewPoint)
- Menu option: Communication \ Configure.
Host code-page should be set with an appropriate CCSID.
The SQTRN data area in the SEQUEL library is critical to proper character translation when using SEQUEL host commands to create PC file results. The SQTRN data area specifies the target CCSID for the character conversion and also a translation table. Every SEQUEL user must have *USE authority to the SQTRN data area in order for the translation routines to discover the desired preferences.
If multiple local languages are in use on a given system, each user's library list will need to be arranged so that SEQUEL finds a data area with an appropriate specification in it.
The structure of the data area is as follows.
|1||Translation table name||char(10)|
|11||Translation table library||char(10)|
|21||Primary target CCSID||zoned(5,0)|
|26||Secondary target CCSID||zoned(5,0)|
Sequel is shipped with a SQTRN data area value of 'QA6YBFA93 QUSRSYS 0125201252'. This provides proper translation between US English encoded data and the Microsoft Windows codepage. If SEQUEL is installed or used with a different local language, the data area must be changed.
The table below can be used to determine the translation table name in library QUSRSYS that allows proper translation between host EBCDIC encoding and the target CCSID on a personal computer. The proper table can be located using the host CCSID or language ID. For example, a suitable Italian translation table is QUSRSYS/QA63BFA93. This provides translation between the Italian (Euro) 1144 CCSID and the PC-Windows 1252 CCSID.
To determine and set the table to use the correct value, follow these steps:
- Determine the correct IBM i CCSID by using DSPJOB OPTION(*DFNA), scrolling to the last page, and finding the Default coded character set identifier and Language identifier** for the job.
**The Default CCSID for a job is initially determined by a hierarchical process that begins by using the system value (QCCSID). If this value is not 65535, then the job and default CCSID are set to the system value. If the system value is 65535, then the job's CCSID is set to 65535 and the default CCSID is derived based on the job's language identifier. The job's language identifier is assigned by the device being used. If the device is emulated via iSeries Access, the language identifier is determined by the "Host Code Page" setting that is available by choosing menu options Communication: Configure in the emulator. If the device is not emulated through iSeries Access, the default CCSID is inferred by the language ID which is assigned from the keyboard type of the device.Update the data area as follows:
- Look up the Default CCSID or Language ID value in the columns of the table below.
- Update the data as follows:
CHGDTAARA SEQUELU/SQTRN VALUE('ttttttttttQUSRSYSbbbnnnnn')
The table name (ttt) must occupy the first 10 positions of the data area, the table library (QUSRSYS) must occupy the next 10 positions. Positions 21-25 (nnn) identify the target CCSID for the translation and this will normally be set to 01252 (Windows Latin 1 code page) with the following exceptions:
- 01250 if you are using QA7BBFA91. This is the Windows Latin 2 code page
- 01253 if you are using Q875BFA94. This is the Windows Greek code page
- 01254 if you are using QA7DBFA95. This is the Windows Turkish code page
- 00850 if you are using QA64BFA93 and you are having problems with variant Spanish characters in dbase output or working with Client Report Option
|Table name in QUSRSYS||Host
|QA6YBFA93||1140||37, 1047||ENU, NLD, PTG, PTB, ENA, ENP, AFR||USA, Netherlands, Portugal, Afrikaans|
|QA61BFA93||1142||277||DAN, NON, NOR||Denmark, Norway|
|QA62BFA93||1143||278||FIN, SVE||Finland, Sweden|
|QA64BFA93||1145||284||ESP, CAT||Spain, Latin America|
|QA65BFA93||1146||285||ENG, GAE||United Kingdom|
|QA67BFA93||1148||500||DES, FRB, FRC, FRS, ITS, NLB, SQI||Belgium, French Canada, Switzerland (Fr/Gr),MNCS (Multinational Character set)|
|QA7BBFA91||1153||870||CSY, HRV, HUN, PLK, ROM, SKY, SLO, SRL||Czech, Poland, Romania, Slovakia, Slovenia, Hungary, Croatia, Serbia|
Character translation in the Viewpoint environment is accomplished using iSeries Access APIs. When a ViewPoint connection to a target system is made, the character converter is created based on the "OEM code page value" in the Language tab of the iSeries Access for Windows Properties panel. If this value is incorrect, there is a possibility that the conversion will be performed incorrectly and data viewed through the ViewPoint interface will appear garbled.
The character converter maps between the chosen client code page and a target CCSID. The target CCSID is acquired from the QZRCSRVS job on the target system that ViewPoint communicates with.
As a general rule, the "OEM code page" value should reflect the Windows code page being used, rather than "Default". If set to "Default", iSeries Access can choose a code page that won't necessarily map the extended Latin characters to their proper Windows code points. For example, users with "English(United States)" specified as their "Language for non-Unicode programs" will be assigned a iSeries Access OEM code page equal to 437. This is sufficient for the U.S. character set, but will not properly map extended characters such as (é, ï, or ñ) to their correct Windows encodings. As a result, ViewPoint will display the different characters in place of the expected ones. We recommend these values for the "OEM code page" setting:
|1252||In all cases except clients using...|
|1250||Windows Latin 2 MultiLingual|
Windows clients that use DBCS version of iSeries Access should select the "Default" setting so that the appropriate mixed byte code page is selected.
Some languages include more characters than can be specified in a single byte of storage. These languages use two bytes of storage to represent a single character and are known as DBCS (Double Byte Character Set) languages. Storing DBCS data on an IBM i based system prior to V5R3M0 requires a special version of the operating system that can accommodate double byte operations. In V5R3M0 and later versions, DBCS capability is automatically enabled. If a DBCS capable version of the operating system is installed, the system value QIGC will equal '1', otherwise it will always be zero.
Even though a DBCS version of the operating system is installed, an emulated session will not necessarily be able to view double byte character data. In order to view DBCS data, the device that is being used must support a double byte character set.
The iSeries Access client controls whether DBCS capability is enabled. Prior to the V5R3 client, it was required that the DBCS version of iSeries Access (SK3T-4100-01) be installed and the language choice set to a DBCS language.
Beginning with client version V5R3, this is no longer required, but it is required that a DBCS capable version of the emulator be installed. If only "Standard PC5250" is chosen for the emulator during iSeries Access installation, users will not be able to display graphical characters. Instead, choose "Traditional Chinese" or another DBCS capable type as the emulator to obtain DBCS capability when installing iSeries Access, or use Selective Setup to add this feature. This does not install the Chinese language objects; the emulator menus, help text, etc. will match the client's Regional Settings properties when the emulator is run.
In order to properly display double byte characters, the Windows client must specify an appropriate mixed byte language identifier (i.e. Chinese) for non-Unicode programs on the Advanced tab of the Regional Settings applet. Once this is completed, an iSeries Access emulator session can be configured to use a Host Code Page corresponding to the selected non-Unicode language. The interactive job can then be established using a DBCS CCSID. Once this is done, graphic characters should present properly.
The iSeries system value QCCSID must match the desired DBCS code page. An interactive job started through the emulator will be set to the CCSID of the emulator's host code page. However, ViewPoint jobs will always start with a CCSID matching the QCCSID value (regardless of the value in the "Language" tab of the iSeries Access properties page) so DBCS capability will not be available unless QCCSID is properly set.
When the proper mixed byte language identifier for non-Unicode programs is used and the system value QCCSID is properly set, ViewPoint should properly display, print, and provide host reporting capability for files containing double byte data. The ViewPoint log will indicate the source/ target code page values that are used for character conversion. The Client Report Option (CRO) does not support double byte results because Crystal reads our results from a dBase file and dBase doesn't support Unicode or East Asian double byte characters.
In May 2004 we enhanced the XLS output capability of Sequel to support BIFF8 formatted files. This enables the proper creation of XLS workbooks containing both Unicode and East Asian DBCS data. The target CCSID in the SQTRN data area is ignored during XLS output creation; all worksheet data will be converted to Unicode as required by the BIFF8 format. The translation table is still used for performance reasons and must properly reflect the codepage of the current system.
In 2006, we further enhanced the way that Sequel produces output containing Asian characters. Asian character output produced by the iSeries typically uses a double wide graphic character and a single wide Latin character. There is also a space at the beginning and ending location of an Asian character string. Conversion of these results from spooled output to PDF causes alignment problems because the PC fonts consistently use a single wide character. To fix this, extra spaces are inserted to the right of an Asian character string so that columns to the right of the Asian characters will continue to be properly aligned. The spaces are inserted at the location of the first space character following the Asian string.
In a similar manner, some care must be exercised in creating RTF results from spooled files. Sequel uses the CPYSPLIFS command to convert spooled results to RTF format and uses the values stored by the ESNDUSR command to determine the font name and size of the results. The font name specified in the RTF settings must specify a fixed pitch font that consistently represents Asian characters as twice the width of Latin characters. The Microsoft TrueType modern font named MS Gothic is a good choice for this.
Text files and XLS results are not changed to preserve alignment nor are space characters inserted into the results in place of the Shift In/Out control characters. Translation is performed according to the rules of the chosen target CCSID.
Sequel Host Report Character Translation
Sequel Report (SQLRPT) objects are "tagged" with the CCSID of the job when they are created. This enables Sequel translation between literals stored in the report and used in expressions. Translation will be circumvented if the job's CCSID=65535 (no translation).
In most cases, IBM i systems will no longer be configured to use CCSID=65535. However, if CCSID=65535 is in use by jobs on a system that will be using the "green screen" report writer, the SEQUEL report database needs to be converted to the CCSID of the local language for the job. If the data in the files is not converted to the proper code page, references to special fields, characters, and operators (e.g. @, $, #) will not appear properly and may be mishandled by the RPG programs that work with them.
We have created a program that completes this automatically. Once Sequel is installed and the "global" functions restored, the database can be updated by using the command: CALL CVTRPTDB ccccc (where ccccc is the CCSID of the local language - use 280 or 1144 for Italy). Once the report database is converted, the DSNREPORT command will work properly with jobs having CCSID 65535.
In December 2002, we updated SEQUEL so that it could return data stored in GRAPHIC fields with Unicode encoding. SEQUEL will now ALWAYS create views that translate UCS-2, UTF-8 and UTF-16 data to EBCDIC character fields. While this works well for DISPLAY, REPORT, etc. it can have a negative implication in the case of EXECUTE because the created physical file's field definition (EBCDIC) will not match the input file's field definition (Unicode).
Unfortunately, we are forced into this behavior by IBM's Create File API (QZRCRTPF) because it is unable to create files with Unicode fields in them. Although the file seems to create properly, and displays properly with DSPFD/DSPFFD, a function check occurs in QDBCRTME when adding a member.
In February 2004, we enhanced Sequel to perform translations using double byte characters and also enabled Unicode as a target CCSID for stream file output. If the Primary CCSID value in the SQTRN data area is 13488, Sequel will produce Unicode (big-endian) results for text, RTF, and PDF files produced through EXECUTE and REPORT.
When the primary CCSID is set to 13488 Unicode, then it is also necessary to specify an appropriate single byte secondary CCSID in positions 26 - 30. This secondary CCSID is used to create the mime header in each email message, and as a "fall back" code page in the event that the PDF conversion is unable to match the current job's CCSID to one of the built-in PDF Unicode types.