Asian Document Style Standardization for Information Interchange (DocSII)
DocSII N52 2004-08-15
Virach SORNLERTLAMVANICH and Yushi KOMACHI
1st Working Draft
For Review and Comment
Members of DocSII
The requirements for the "Implementation Guide for Document Style Processing" were proposed at the DocSII (Asian Document Style Standardization for Information Interchange) Symposium 2003, which was held in Ulaan Baatar, Mongolia, on 30 September and 1 October 2003. Working drafts of the Implementation Guide are reviewed and discussed by the members of DocSII.
1. Document processing model
Logically structured documents, e.g., XML documents, consist of logical elements and a description of their structure. A style specification for such documents assigns style attributes to the logical elements and may specify some structure conversions. Document formatting and rendering can be viewed as solving a mathematical problem: finding a mapping of the logical elements onto a physical presentation medium that satisfies the corresponding style specification. This document processing is illustrated in Figure 1.
When the style specification is too restrictive, this mapping problem has no solution. To avoid formatting failures of this kind, the following approaches have been employed:
- a) Some reconciliation rules are applied.
- b) Some style specifications are relaxed, and some are incorporated into the formatting and rendering systems.
NOTE 1: Line-end and page-end processing rules have usually been incorporated in actual formatting and rendering systems. When such systems are assumed, external style specifications include no line-end or page-end processing rules. On this assumption, ISO/IEC TR 19758 (DSSSL library for style specification) and its amendments provide no style library for line-end and page-end processing rules.
Approach b) causes slight differences between the page images rendered by the sending system and by the receiving system. To minimize these differences, an Implementation Guide for Document Style Processing is essential.
A Web browser is a powerful application for visualizing XML documents. In a Web browser, a style sheet converts an XML document into a suitable HTML document that the browser can display. The Extensible Stylesheet Language (XSL) is used to style XML documents in two different ways:
- XSL Transformations (XSLT), to transform XML documents into other document formats.
- XSL Formatting Objects, to add styles to XML documents by using special formatting rules.
This guideline presents some specifications that formatting and rendering systems should support. Negotiating those specifications between formatting and rendering systems will help preserve the page images of documents interchanged between the systems.
ISO/IEC TR 19758:2003, DSSSL library for complex compositions, 2003-04
4. Terms and definitions
For the purpose of this document, the following terms and definitions apply.
5. Line-end processing
5.1 Japanese text
5.1.1 Line head wrap
The characters shown in Figure 5.1 should not be located at the beginning of a line.
Such a character should instead be composed
- at the end of the previous line, or
- at the second character position of the line, by moving the last character of the previous line to the beginning of this line.
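The two options above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a normative algorithm: the set `NO_LINE_START` is a small hypothetical subset of closing punctuation (the authoritative set is the one in Figure 5.1), the function name `wrap_line_head` and its `pull_in` parameter are invented for this example, and every character is assumed to occupy one position.

```python
# Hypothetical subset for illustration; Figure 5.1 defines the real set.
NO_LINE_START = set("、。，．）」』】")

def wrap_line_head(text, width, pull_in=True):
    """Split text into lines of `width` characters so that no line
    begins with a character in NO_LINE_START.

    pull_in=True  : compose the character at the end of the previous
                    line (that line becomes longer than `width`).
    pull_in=False : move the last character(s) of the previous line to
                    the next line, so the prohibited character is no
                    longer first on its line.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    lines, i = [], 0
    while i < len(text):
        end = min(i + width, len(text))
        if pull_in:
            # Extend this line over any prohibited characters that follow.
            while end < len(text) and text[end] in NO_LINE_START:
                end += 1
        else:
            # Shorten this line until the next line starts legally.
            while (end < len(text) and end - i > 1
                   and text[end] in NO_LINE_START):
                end -= 1
        lines.append(text[i:end])
        i = end
    return lines

print(wrap_line_head("あいうえお、かきく", 5))
print(wrap_line_head("あいうえお、かきく", 5, pull_in=False))
```

With `pull_in=True` the first line carries six characters; a real formatter would absorb the extra character by hanging it or by compressing spacing, which this sketch does not model.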
5.1.2 Line end wrap
The characters shown in Figure 5.2 should not be located at the end of a line.
Such a character should instead be composed
- at the beginning of the next line, or
- at the second character position from the end of the line, by moving the first character of the next line to the end of this line.
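A corresponding sketch for the line-end rule is given below. The set `NO_LINE_END` is a hypothetical subset of opening brackets chosen only for illustration (Figure 5.2 defines the actual set), and the names `wrap_line_end` and `push_out` are assumptions of this example.

```python
# Hypothetical subset for illustration; Figure 5.2 defines the real set.
NO_LINE_END = set("（「『【")

def wrap_line_end(text, width, push_out=True):
    """Split text into lines of `width` characters so that no line
    (other than the last) ends with a character in NO_LINE_END.

    push_out=True  : move the character to the beginning of the next line.
    push_out=False : pull the first character of the next line up, so the
                     prohibited character is second from the line end.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    lines, i = [], 0
    while i < len(text):
        end = min(i + width, len(text))
        if push_out:
            # Shorten the line while it would end with a prohibited character.
            while (end < len(text) and end - i > 1
                   and text[end - 1] in NO_LINE_END):
                end -= 1
        else:
            # Lengthen the line until it ends with a permitted character.
            while end < len(text) and text[end - 1] in NO_LINE_END:
                end += 1
        lines.append(text[i:end])
        i = end
    return lines

print(wrap_line_end("あいうえ「かきく」", 5))
print(wrap_line_end("あいうえ「かきく」", 5, push_out=False))
```

The sketch treats the two rules independently; a formatter that applies both the line-head and line-end rules needs to check each candidate break against both sets.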
5.2 Thai text
5.2.1 Processing unit
There is no explicit word boundary in Thai text. To wrap a line, the application must be able to determine an appropriate break position, which can be the end of a composed character, the end of a syllable, the end of a word, or a space or symbol character.
- A composed character is a composite of a consonant and one or more bound vowel signs and tone marks. Figure 5.3 shows the bound characters, which cannot stand alone in the text. Figure 5.4 shows a sample of a composed character.
- Syllables and words are units whose boundaries can be determined only with higher-level processing support. A word list with syllable information is necessary for weighting the appropriateness of a wrapping position.
- A change of character type also marks the boundary of a breakable unit. A space character must be kept at the end of the line to preserve the spacing information.
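The lowest processing level, the composed character, can be sketched as follows. As an assumption of this example, the bound characters are approximated by the Unicode nonspacing marks (general category Mn) of the Thai block; Figure 5.3 remains the authoritative list and may differ. The function names are hypothetical.

```python
import unicodedata

def is_bound_mark(ch):
    """Assumption: a bound character is a Thai nonspacing mark (Mn),
    i.e. an above/below vowel sign or a tone mark."""
    return "\u0e00" <= ch <= "\u0e7f" and unicodedata.category(ch) == "Mn"

def composed_characters(text):
    """Group each base character with the bound marks that follow it.
    A line must not be broken inside any returned unit."""
    units = []
    for ch in text:
        if units and is_bound_mark(ch):
            units[-1] += ch      # attach the mark to its base character
        else:
            units.append(ch)     # start a new composed character
    return units

# THO THAHAN + SARA II + MAI EK, then NO NU + SARA II + MAI EK
sample = "\u0e17\u0e35\u0e48\u0e19\u0e35\u0e48"
print(composed_characters(sample))   # two units of three code points each
```

Syllable-level and word-level boundaries are outside the scope of this sketch, since they depend on a word list with syllable information.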
Hyphenation is not preferred. When it is unavoidable, a hyphen (-) is inserted according to the following order of preference:
- Between words that compose a compound.
- Between full-sound syllables.
- Between half-sound syllables.
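The order of preference can be expressed as a ranking over candidate hyphenation points. The sketch below assumes that a higher-level analysis has already produced the candidates as hypothetical `(position, kind)` pairs; the kind labels and the function name are inventions of this example.

```python
# Kinds of hyphenation point, most preferred first (labels are hypothetical).
PREFERENCE = ("compound", "full_syllable", "half_syllable")

def choose_hyphen_point(candidates, max_pos):
    """Return the best (position, kind) whose position fits on the line,
    or None when no candidate fits.

    candidates : iterable of (position, kind) pairs
    max_pos    : last character position that still fits on the line
    """
    usable = [(pos, kind) for pos, kind in candidates
              if 0 < pos <= max_pos and kind in PREFERENCE]
    if not usable:
        return None
    # Prefer the better kind first, then the position that fills the line most.
    return min(usable, key=lambda c: (PREFERENCE.index(c[1]), -c[0]))

points = [(3, "half_syllable"), (5, "full_syllable"), (8, "compound")]
print(choose_hyphen_point(points, max_pos=6))    # (5, 'full_syllable')
print(choose_hyphen_point(points, max_pos=10))   # (8, 'compound')
```

Choosing the rightmost position within the preferred kind is a design choice of this sketch, not a requirement stated in this guide.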
5.2.2 Line head wrap
Figure 5.5 shows the non-leading characters, which cannot be placed at the beginning of a line. Such a character must be combined with the preceding character(s) to form a syllable or a word, depending on the processing level.
5.2.3 Line end wrap
Figure 5.6 shows the non-ending characters, which cannot be placed at the end of a line. Such a character must be combined with the following character(s) to form a syllable or a word, depending on the processing level.
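A character-level check for the rules in 5.2.2 and 5.2.3 might look like the sketch below. The two sets are assumptions made for illustration only: leading vowels are taken as non-ending characters, and nonspacing marks plus a few following vowels and signs are taken as non-leading characters. Figures 5.5 and 5.6 define the actual sets, and all names here are hypothetical.

```python
import unicodedata

# Assumed non-ending characters: the leading vowels U+0E40..U+0E44.
NON_ENDING = set("\u0e40\u0e41\u0e42\u0e43\u0e44")
# Assumed spacing non-leading characters: PAIYANNOI, SARA A, SARA AA,
# SARA AM, LAKKHANGYAO, MAIYAMOK.
NON_LEADING_SPACING = set("\u0e2f\u0e30\u0e32\u0e33\u0e45\u0e46")

def is_non_leading(ch):
    """Assumed non-leading: the spacing set above or any Thai nonspacing mark."""
    return ch in NON_LEADING_SPACING or (
        "\u0e00" <= ch <= "\u0e7f" and unicodedata.category(ch) == "Mn")

def legal_break(text, pos):
    """True if a line may end just before text[pos]."""
    if pos <= 0 or pos >= len(text):
        return True
    return not is_non_leading(text[pos]) and text[pos - 1] not in NON_ENDING

def adjust_break(text, pos):
    """Move a break position backwards to the nearest legal one.
    Returns 0 when no legal break exists before `pos`."""
    while pos > 0 and not legal_break(text, pos):
        pos -= 1
    return pos

# SARA AI MAIMALAI + PO PLA + MO MA + SARA AA
sample = "\u0e44\u0e1b\u0e21\u0e32"
print(adjust_break(sample, 3))   # 2: the break moves in front of MO MA
```

This check guarantees only character-level correctness. Forming a syllable or a word, as the text requires at higher processing levels, needs the word list described in 5.2.1.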
6. Page-end processing
6.1 Japanese text
6.2 Thai text
The bottom right corner shows the page number followed by a slash "/", a few words of the following page, and dots "...". Figures 6.1 and 6.2 show this indication of the following page.
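The composition of this page-end indication can be sketched as a simple string operation. The exact spacing and the number of words shown are given by Figures 6.1 and 6.2, so the choices below (no space around the slash, two words, words passed in as an already segmented list) are assumptions of this example, as is the function name.

```python
def continuation_mark(page_number, next_page_words, word_count=2, joiner=""):
    """Build the bottom-right indication: page number, a slash, the first
    few words of the following page, and dots.

    next_page_words : words of the following page, already segmented
    joiner          : text placed between words ("" suits Thai text,
                      which is written without spaces between words)
    """
    lead = joiner.join(next_page_words[:word_count])
    return f"{page_number}/{lead}..."

print(continuation_mark(3, ["ab", "cd", "ef"]))   # 3/abcd...
```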
7. Number representation
7.1 Japanese text
7.1.1 Conversion between vertical and horizontal compositions in Japanese text
7.2 Thai text
Conversion between Thai and Arabic numerals is provided so that either system can be selected to suit the document. However, mixing the two numeral systems is not allowed. Thai numerals are obligatory in Thai official documents. Figure 7.1 shows the correspondence between Thai and Arabic numerals.
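The conversion and the no-mixing rule can be sketched with translation tables. The Thai digits occupy the consecutive Unicode code points U+0E50 to U+0E59, which the example relies on; the function names are hypothetical.

```python
ARABIC_DIGITS = "0123456789"
# Thai digits zero to nine are U+0E50..U+0E59.
THAI_DIGITS = "".join(chr(0x0E50 + d) for d in range(10))

TO_THAI = str.maketrans(ARABIC_DIGITS, THAI_DIGITS)
TO_ARABIC = str.maketrans(THAI_DIGITS, ARABIC_DIGITS)

def to_thai_numerals(text):
    """Replace every Arabic digit in text with the matching Thai digit."""
    return text.translate(TO_THAI)

def to_arabic_numerals(text):
    """Replace every Thai digit in text with the matching Arabic digit."""
    return text.translate(TO_ARABIC)

def mixes_numeral_systems(text):
    """True if text contains both Thai and Arabic digits, which is not allowed."""
    has_thai = any(ch in THAI_DIGITS for ch in text)
    has_arabic = any(ch in ARABIC_DIGITS for ch in text)
    return has_thai and has_arabic

print(to_thai_numerals("2547"))
print(to_arabic_numerals(to_thai_numerals("2547")))   # 2547
```

A formatter could run `mixes_numeral_systems` over the whole document before rendering and normalize the text with one of the two conversion functions when it returns True.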
8.1 Japanese text
8.2 Thai text
8.2.1 Numbered listing
Either the Arabic or the Thai numbering system may be used. A conversion between the numbering systems needs to be provided. An item is divided into up to 4 levels. In addition, a sub-item is defined in brackets "()" attached to the lowest level. Figure 8.1 shows the numbered listing with sub-items.
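A label generator for such a listing might be sketched as follows. The label layout is an assumption of this example: levels are joined with "." and the sub-item is appended directly in brackets. Figure 8.1 shows the actual layout, which may differ. The function name and parameters are hypothetical.

```python
MAX_LEVELS = 4
# Arabic-to-Thai digit table; Thai digits are U+0E50..U+0E59.
TO_THAI = str.maketrans("0123456789",
                        "".join(chr(0x0E50 + d) for d in range(10)))

def item_label(levels, sub_item=None, thai=False):
    """Build a label such as "1.2.3" or "1.2(3)".

    levels   : item numbers from the top level down (1 to 4 entries)
    sub_item : optional sub-item number, shown in brackets
    thai     : render all digits as Thai numerals
    """
    if not 1 <= len(levels) <= MAX_LEVELS:
        raise ValueError("an item has between 1 and 4 levels")
    label = ".".join(str(n) for n in levels)
    if sub_item is not None:
        label += f"({sub_item})"
    # Convert the whole label at once so the two systems are never mixed.
    return label.translate(TO_THAI) if thai else label

print(item_label([1, 2, 3]))            # 1.2.3
print(item_label([1, 2], sub_item=3))   # 1.2(3)
```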
8.2.2 Alphabetical listing
Only Thai consonants are used to label the items. Conventionally, uncommon consonants are avoided in the listing i.e. . The consonant can be enclosed in brackets "()" or followed by a "." to form the item label. Labels made of multiple letters are not preferred.
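Label generation from consonants can be sketched as below. The consonants are taken from the Unicode range U+0E01 to U+0E2E, excluding the two vowel letters U+0E24 and U+0E26 that sit in that range. Which consonants count as uncommon is left to the caller through the hypothetical `skip` parameter; the two letters skipped in the usage line are only an assumed example, not the list intended by this guide.

```python
# The 44 Thai consonants in Unicode order (RU and LU are vowel letters).
CONSONANTS = [chr(cp) for cp in range(0x0E01, 0x0E2F)
              if cp not in (0x0E24, 0x0E26)]

def consonant_labels(count, skip=(), style="dot"):
    """Return `count` single-consonant labels, omitting those in `skip`.

    style="dot"     : consonant followed by "."
    style="bracket" : consonant enclosed in "()"
    """
    usable = [c for c in CONSONANTS if c not in skip]
    if count > len(usable):
        # Multi-letter labels are not preferred, so refuse instead.
        raise ValueError("not enough single consonants for this many items")
    template = "({})" if style == "bracket" else "{}."
    return [template.format(c) for c in usable[:count]]

# Assumed example: skip KHO KHUAT (U+0E03) and KHO KHON (U+0E05).
print(consonant_labels(3, skip={"\u0e03", "\u0e05"}, style="bracket"))
```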
8.2.3 Combined Number-alphabetical listing
Number and alphabetical listings may be combined to label the items. However, Thai and Arabic numbers must not be combined in the same document.
9.1 Japanese text
9.2 Thai text
A paragraph always begins with an indentation. The size of the indentation can vary but is usually kept the same throughout the document. There is no blank line between paragraphs. Figure 8.1 shows the indentation of paragraphs.
Annex A Bibliography
The following documents have served as informative references in the preparation of this Implementation Guide.
1) DocSII N37, Summary of DocSII Symposium 2003, 2003-10-01
2) JIS Z 8126:2004, Graphic arts — Glossary — Digital printing terms, 2004-02-20
3) Thai Document Style, Virach Sornlertlamvanich and Thatsanee Charoenporn, Asian Document Standard Workshop, 2002-09-17
4) Acquisitive Examples of Thai Document Styles and Layouts, Virach Sornlertlamvanich and Thatsanee Charoenporn, DocSII Symposium 2003, 2003-09-30
5) Thai Font, National Electronics and Computer Technology Center (NECTEC), 2001