Asian Document Style Standardization for Information Interchange (DocSII)
DocSII N52 2004-08-15

SOURCE:Virach SORNLERTLAMVANICH and Yushi KOMACHI
STATUS:1st Working Draft
ACTION:For Review and Comment
DATE:2004-08-15
DISTRIBUTION:Members of DocSII

Introduction

The requirements for an "Implementation Guide for Document Style Processing" were proposed at the DocSII (Asian Document Style Standardization for Information Interchange) Symposium 2003, held in Ulaan Baatar, Mongolia, on 30 September and 1 October 2003. Working drafts of the Implementation Guide are reviewed and discussed by the members of DocSII.

1. Document processing model

Logically structured documents, e.g., XML documents, consist of logical elements and a description of their structure. A style specification for such documents assigns style attributes to the logical elements and may specify some structure conversions. Document formatting and rendering is a process, which can be described mathematically, that maps the logical elements onto a physical presentation medium in accordance with the corresponding style specification. These processing steps are illustrated in Figure 1.1.

Figure 1.1 — Document processing model

When the style specification is too restrictive, this mapping has no solution. To avoid formatting situations that have no solution, the following approaches have been employed:

  • a) Some reconciliation rules are applied.
  • b) Some style specifications are relaxed, and some are incorporated into the formatting and rendering systems themselves.

NOTE 1: Line-end and page-end processing rules have usually been incorporated into actual formatting and rendering systems. When such systems are assumed, external style specifications contain no specification of line-end and page-end processing rules. Accordingly, ISO/IEC TR 19758 (DSSSL library for complex compositions) and its amendments provide no style library for line-end and page-end processing rules.

Approach b) can cause slight differences between the page images rendered by the sending system and those rendered by the receiving system. An Implementation Guide for Document Style Processing is therefore essential to minimize these differences.

A Web browser is a powerful application for visualizing XML documents. In a Web browser, a style sheet converts an XML document into an HTML document that the browser can display. The Extensible Stylesheet Language (XSL) is used to style XML documents in two different ways:

  • XSL Transformations (XSLT), which transform XML documents into other document formats.
  • XSL Formatting Objects (XSL-FO), which add styles to XML documents by means of special formatting rules.
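The XML-to-HTML conversion performed by a style sheet can be pictured with the minimal sketch below. It is not XSLT: it uses Python's standard `xml.etree.ElementTree` as a stand-in, and the element names `doc`, `title` and `para` and their HTML counterparts are hypothetical. In a real system the same mapping would be written as XSLT template rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from logical element names to HTML element names.
# In practice this mapping would be expressed as XSLT template rules.
TAG_MAP = {"doc": "body", "title": "h1", "para": "p"}


def convert(src: ET.Element) -> ET.Element:
    """Copy one logical element and its subtree, renaming each element via TAG_MAP."""
    dst = ET.Element(TAG_MAP.get(src.tag, "div"))  # unknown elements become <div>
    dst.text = src.text
    dst.tail = src.tail
    for child in src:
        dst.append(convert(child))
    return dst


def to_html(xml_text: str) -> str:
    """Parse an XML document and serialize the converted tree as an HTML fragment."""
    return ET.tostring(convert(ET.fromstring(xml_text)), encoding="unicode")


print(to_html("<doc><title>DocSII</title><para>Hello</para></doc>"))
# <body><h1>DocSII</h1><p>Hello</p></body>
```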

2. Scope

This guideline describes specifications that should be supported by formatting and rendering systems. Negotiating these specifications between formatting and rendering systems helps preserve the page images of documents interchanged between the systems.

3. References

ISO/IEC TR 19758:2003, DSSSL library for complex compositions, 2003-04

4. Terms and definitions

For the purposes of this document, the following terms and definitions apply.

4.1 ***

TBD

5. Line-end processing

5.1 Japanese text

5.1.1 Line head wrap

The characters shown in Figure 5.1 should not be located at the beginning of a line.

Figure 5.1 — Line-head-wrap characters

Such a character should instead be composed

  • at the end of the previous line, or
  • at the second character position of the line, by moving the last character of the previous line to the beginning of this line.
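Both options can be illustrated with a fixed-pitch line breaker, sketched below. Because Figure 5.1 is not reproduced in this draft, the set `NO_LINE_HEAD` is only an assumed subset (closing brackets, punctuation, small kana, prolonged sound mark), and the names `wrap_line_head`, `prev_line` and `second_position` are hypothetical. Every character is assumed to occupy one full-width position.

```python
# Assumed subset of line-head-wrap characters; the normative set is the one in Figure 5.1.
NO_LINE_HEAD = set("、。，．）」』】ーっゃゅょ々")


def wrap_line_head(text: str, width: int, mode: str = "prev_line") -> list[str]:
    """Break fixed-pitch Japanese text into lines of `width` characters,
    keeping NO_LINE_HEAD characters away from the start of a line.

    mode="prev_line":       forbidden characters stay at the end of the previous
                            line, which then holds more than `width` characters
                            (a real system would absorb this by adjusting spacing).
    mode="second_position": characters of the previous line move down until the
                            line starts with an allowed character, so a single
                            forbidden character ends up in the second position.
    """
    if width < 1 or mode not in ("prev_line", "second_position"):
        raise ValueError("width must be >= 1 and mode must be known")
    lines: list[str] = []
    i = 0
    while i < len(text):
        end = min(i + width, len(text))
        if mode == "prev_line":
            while end < len(text) and text[end] in NO_LINE_HEAD:
                end += 1
        else:
            # Never shrink the current line to nothing.
            while end < len(text) and text[end] in NO_LINE_HEAD and end - i > 1:
                end -= 1
        lines.append(text[i:end])
        i = end
    return lines


print(wrap_line_head("あいう。えお", 3, "prev_line"))        # ['あいう。', 'えお']
print(wrap_line_head("あいう。えお", 3, "second_position"))  # ['あい', 'う。え', 'お']
```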

5.1.2 Line end wrap

The characters shown in Figure 5.2 should not be located at the end of a line.

Figure 5.2 — Line-end-wrap characters

Such a character should instead be composed

  • at the beginning of the next line, or
  • at the second character position from the end of the line, by moving the first character of the next line to the end of this line.
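The line-end counterpart can be sketched the same way. Figure 5.2 is not reproduced here, so `NO_LINE_END` is an assumed set of opening brackets, and `wrap_line_end`, `next_line` and `second_from_end` are hypothetical names for the two options above.

```python
# Assumed subset of line-end-wrap characters (opening brackets);
# the normative set is the one in Figure 5.2.
NO_LINE_END = set("（「『【〔")


def wrap_line_end(text: str, width: int, mode: str = "next_line") -> list[str]:
    """Break fixed-pitch text into lines of `width` characters, keeping
    NO_LINE_END characters away from the end of a line.

    mode="next_line":       the forbidden character moves to the start of the next line.
    mode="second_from_end": characters of the next line are pulled up until the
                            line no longer ends with a forbidden character, so a
                            single forbidden character ends up second from the end
                            (the line may then exceed `width`).
    """
    if width < 1 or mode not in ("next_line", "second_from_end"):
        raise ValueError("width must be >= 1 and mode must be known")
    lines: list[str] = []
    i = 0
    while i < len(text):
        end = min(i + width, len(text))
        if mode == "next_line":
            # Never shrink the current line to nothing.
            while end < len(text) and text[end - 1] in NO_LINE_END and end - i > 1:
                end -= 1
        else:
            while end < len(text) and text[end - 1] in NO_LINE_END:
                end += 1
        lines.append(text[i:end])
        i = end
    return lines


print(wrap_line_end("あい「うえ」お", 3, "next_line"))        # ['あい', '「うえ', '」お']
print(wrap_line_end("あい「うえ」お", 3, "second_from_end"))  # ['あい「う', 'え」お']
```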

5.2 Thai text

5.2.1 Processing unit

There is no explicit word boundary in Thai text. To wrap a line, an application must be able to determine an appropriate break position, which can be the end of a composed character, the end of a syllable, the end of a word, or a space or symbol character.

  • A composed character is a composite of a consonant and one or more bound vowel signs and tone marks. Figure 5.3 shows the bound characters, which cannot be placed alone in the text. Figure 5.4 shows a sample of a composed character.
Figure 5.3 — Bound characters
Figure 5.4 — Composed character
  • Syllables and words are units whose boundaries can only be determined with higher-level processing support. A word list with syllable information is necessary for weighting the appropriateness of a wrapping position.
Figure 5.5 — Syllable unit
  • A change of character type marks a boundary of a breakable unit. A space character must be kept at the end of the line to preserve the spacing information.
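The lowest processing level, the composed character, can be sketched as follows. The sketch assumes that the bound characters of Figure 5.3 coincide with the Thai combining marks of Unicode (general category Mn: the above and below vowel signs and the tone marks); the function names are hypothetical. The resulting offsets are only the places where a line can be cut without splitting a composed character; syllable and word boundaries still need the word list mentioned above.

```python
import unicodedata


def composed_characters(text: str) -> list[str]:
    """Split Thai text into composed characters: each base character plus the
    combining marks (Unicode general category Mn) that follow it."""
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch) == "Mn":
            clusters[-1] += ch  # bound character: attach it to the preceding base
        else:
            clusters.append(ch)
    return clusters


def safe_break_offsets(text: str) -> list[int]:
    """Offsets (in code points) where a line may be cut without splitting a composed character."""
    offsets: list[int] = []
    pos = 0
    for cluster in composed_characters(text):
        pos += len(cluster)
        offsets.append(pos)
    return offsets


word = "\u0e17\u0e35\u0e48\u0e19\u0e35\u0e48"  # ที่นี่
print(composed_characters(word))  # ['ที่', 'นี่']
print(safe_break_offsets(word))   # [3, 6]
```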

Hyphenation should be avoided. When it is unavoidable, a hyphen (-) is inserted in the following order of preference:

  • Between the words that make up a compound word.
  • Between full-sound syllables.
  • Between half-sound syllables.
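The order of preference above can be applied as a ranking over candidate hyphenation points, as in the sketch below. It assumes that an upstream analysis (using the word list with syllable information) has already produced candidate offsets labelled `"compound"`, `"full"` or `"half"`; these labels, the function name and the tie-break rule (prefer the latest fitting offset so the line is filled as much as possible) are assumptions, not part of this guide.

```python
from typing import Optional

# Order of preference from the list above (lower rank is preferred).
PREFERENCE = {"compound": 0, "full": 1, "half": 2}


def choose_hyphen_point(candidates: list[tuple[int, str]], max_offset: int) -> Optional[int]:
    """Choose where to insert a hyphen.

    candidates: (offset, kind) pairs; kind is "compound" (between the words of a
                compound), "full" (between full-sound syllables) or "half"
                (between half-sound syllables).
    max_offset: the largest offset that still leaves room for "-" on the line.
    Returns None when no candidate fits.
    """
    fitting = [c for c in candidates if c[0] <= max_offset]
    if not fitting:
        return None
    # Best kind first; among equal kinds, the latest offset fills the line most.
    offset, _kind = min(fitting, key=lambda c: (PREFERENCE[c[1]], -c[0]))
    return offset


print(choose_hyphen_point([(2, "half"), (4, "full"), (6, "compound")], 5))  # 4
```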

5.2.2 Line head wrap

Figure 5.6 shows the non-leading characters, which cannot be placed at the beginning of a line. Such a character must be combined with the preceding character(s) to form a syllable or a word, depending on the processing level.

Figure 5.6 — Non-leading characters
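A line breaker can enforce this rule by moving a candidate break position to the left, as sketched below. Figure 5.6 is not reproduced in this draft, so `NON_LEADING` is an assumed approximation (the following vowels ะ า ำ ๅ and the repetition mark ๆ, plus every combining mark); the function names are hypothetical.

```python
import unicodedata

# Assumed approximation of Figure 5.6: following vowels and the repetition
# mark, plus every combining mark (Unicode general category Mn).
NON_LEADING = set("\u0e30\u0e32\u0e33\u0e45\u0e46")  # ะ า ำ ๅ ๆ


def is_non_leading(ch: str) -> bool:
    return ch in NON_LEADING or unicodedata.category(ch) == "Mn"


def move_break_back(text: str, offset: int) -> int:
    """Move a candidate break offset to the left until the next line would not
    start with a non-leading character. Returns 0 if no such offset exists."""
    while 0 < offset < len(text) and is_non_leading(text[offset]):
        offset -= 1
    return offset


print(move_break_back("\u0e01\u0e32\u0e23", 1))  # การ: "า" cannot lead, so 0
```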

5.2.3 Line end wrap

Figure 5.7 shows the non-ending characters, which cannot be placed at the end of a line. Such a character must be combined with the following character(s) to form a syllable or a word, depending on the processing level.

Figure 5.7 — Non-ending characters
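The line-end rule can be enforced in the same way. The sketch assumes that the non-ending characters of Figure 5.7 include the leading vowels เ แ โ ใ ไ, which are written before the consonant they belong to; the actual figure may list more characters, and the function name is hypothetical.

```python
# Assumed set for Figure 5.7: leading vowels, written before their consonant.
NON_ENDING = set("\u0e40\u0e41\u0e42\u0e43\u0e44")  # เ แ โ ใ ไ


def move_break_back(text: str, offset: int) -> int:
    """Move a candidate break offset to the left until the line would not end
    with a non-ending character. Returns 0 if no such offset exists."""
    while 0 < offset < len(text) and text[offset - 1] in NON_ENDING:
        offset -= 1
    return offset


print(move_break_back("\u0e14\u0e35\u0e44\u0e1b", 3))  # ดีไป: break before ไ, so 2
```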

6. Page-end processing

6.1 Japanese text

TBD

6.2 Thai text

The bottom right corner of a page shows a page number followed by a slash "/", the first few words of the following page, and dots "...". Figures 6.1 and 6.2 show this indication of the consecutive page.

Figure 6.1 — Previous page document
Figure 6.2 — Following page document
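The continuation indication can be assembled as sketched below. Because Thai text has no explicit word boundaries, the sketch assumes the following page's text has already been segmented into words (a hypothetical upstream step), joins them without spaces, and uses two words as an assumed default for "a few words". Which page number is shown is left to the caller, following Figures 6.1 and 6.2.

```python
def continuation_marker(page_number: int, next_page_words: list[str], count: int = 2) -> str:
    """Build the bottom-right indication: page number, "/", the first words
    of the following page, and "...".

    next_page_words: the following page's text, already split into words.
    count: how many words to show; 2 is an assumed default.
    """
    preview = "".join(next_page_words[:count])  # Thai words are written without spaces
    return f"{page_number}/{preview}..."


print(continuation_marker(3, ["\u0e02\u0e49\u0e2d", "\u0e04\u0e27\u0e32\u0e21", "\u0e15\u0e48\u0e2d"]))  # 3/ข้อความ...
```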

7. Number representation

7.1 Japanese text

7.1.1 Conversion between vertical and horizontal compositions in Japanese text

7.2 Thai text

Conversion between Thai and Arabic numerals should be available so that either system can be selected according to the desired document. However, the two numeral systems must not be mixed. Thai numerals are obligatory in Thai official documents. Figure 7.1 shows the correspondence between Thai and Arabic numerals.

Figure 7.1 — Correspondence between Thai and Arabic numerals
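Since the Thai digits ๐ to ๙ occupy the contiguous Unicode range U+0E50 to U+0E59, the conversion reduces to a character-by-character translation, as in the minimal sketch below. The mixture check reflects the rule that the two systems must not be mixed; the function names are hypothetical.

```python
# Thai digits ๐..๙ occupy U+0E50..U+0E59 in Unicode.
ARABIC_DIGITS = "0123456789"
THAI_DIGITS = "".join(chr(0x0E50 + d) for d in range(10))

TO_THAI = str.maketrans(ARABIC_DIGITS, THAI_DIGITS)
TO_ARABIC = str.maketrans(THAI_DIGITS, ARABIC_DIGITS)


def to_thai_digits(text: str) -> str:
    """Replace every Arabic digit with the corresponding Thai digit."""
    return text.translate(TO_THAI)


def to_arabic_digits(text: str) -> str:
    """Replace every Thai digit with the corresponding Arabic digit."""
    return text.translate(TO_ARABIC)


def mixes_numeral_systems(text: str) -> bool:
    """True if the text uses both Thai and Arabic digits, which is not allowed."""
    return any(ch in ARABIC_DIGITS for ch in text) and any(ch in THAI_DIGITS for ch in text)


print(to_thai_digits("2004"))  # ๒๐๐๔
```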

8. Listing

8.1 Japanese text

TBD

8.2 Thai text

8.2.1 Numbered listing

Either the Arabic or the Thai numbering system is used, so a conversion between the two numbering systems needs to be provided. Items can be divided into up to 4 levels. In addition, sub-items are labelled with numbers in brackets "()" attached to the lowest level. Figure 8.1 shows a numbered listing with sub-items.

Figure 8.1 — Numbered listing
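The labelling rules can be sketched as two small functions. The dotted form of the level labels (for example "2.1.3") is an assumption, since Figure 8.1 is not reproduced in this draft; the 4-level limit, the bracketed sub-item label and the optional Thai digits follow the text above. The function names are hypothetical.

```python
THAI_DIGITS = str.maketrans("0123456789", "".join(chr(0x0E50 + d) for d in range(10)))
MAX_LEVELS = 4


def item_label(path: list[int], thai: bool = False) -> str:
    """Label of a numbered item from its position in the hierarchy, e.g. [2, 1, 3] -> "2.1.3"."""
    if not 1 <= len(path) <= MAX_LEVELS:
        raise ValueError(f"items can be divided into at most {MAX_LEVELS} levels")
    label = ".".join(str(n) for n in path)
    return label.translate(THAI_DIGITS) if thai else label


def sub_item_label(n: int, thai: bool = False) -> str:
    """Label of a sub-item attached to the lowest level, e.g. 1 -> "(1)"."""
    label = f"({n})"
    return label.translate(THAI_DIGITS) if thai else label


print(item_label([2, 1, 3]), sub_item_label(1))  # 2.1.3 (1)
```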

8.2.2 Alphabetical listing

Only Thai consonants are used to label the items. Conventionally, uncommon consonants are avoided in the listing, i.e. . The consonant can be enclosed in brackets "()" or followed by a "." to form the item label. Labels made of multiple letters are not preferred.
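The letter labels can be generated as in the sketch below. The consonant range U+0E01 (ก) to U+0E2E (ฮ), with the vowels ฤ (U+0E24) and ฦ (U+0E26) skipped, is standard Unicode and yields the 44 Thai consonants. The list of uncommon consonants to avoid is not given in this draft, so the sketch takes it as a caller-supplied parameter `excluded`; the function name and the `style` values are hypothetical.

```python
# Thai consonants ก (U+0E01) .. ฮ (U+0E2E); U+0E24 ฤ and U+0E26 ฦ are vowels and are skipped.
CONSONANTS = [chr(cp) for cp in range(0x0E01, 0x0E2F) if cp not in (0x0E24, 0x0E26)]


def alpha_labels(count: int, style: str = "bracket", excluded: frozenset = frozenset()) -> list[str]:
    """Return `count` single-consonant labels such as "(ก)" (style="bracket") or "ก." (style="dot").

    excluded: consonants to skip (the caller supplies the uncommon ones).
    """
    letters = [c for c in CONSONANTS if c not in excluded]
    if count > len(letters):
        # Running out would require multiple-letter labels, which are not preferred.
        raise ValueError("not enough single consonants for the requested labels")
    fmt = "({})" if style == "bracket" else "{}."
    return [fmt.format(c) for c in letters[:count]]


print(alpha_labels(2, style="dot"))  # ['ก.', 'ข.']
```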

8.2.3 Combined number-alphabetical listing

Numbers and letters may be combined to label the items. However, Thai and Arabic numerals must not be combined in the same document.

9. Paragraph

9.1 Japanese text

TBD

9.2 Thai text

9.2.1 Indentation

A paragraph always begins with an indentation. The size of the indentation may vary but is usually kept the same throughout the document. There is no blank line between paragraphs. Figure 9.1 shows the indentation of paragraphs.

Figure 9.1 — Paragraph indentation
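For plain-text output, these paragraph rules amount to a uniform first-line indent and single line breaks between paragraphs, as in the minimal sketch below. Measuring the indentation in space characters, the default of 8, and the function name are assumptions; a formatting system would normally express the indent as a length in the style specification.

```python
def layout_paragraphs(paragraphs: list[str], indent: int = 8) -> str:
    """Start every paragraph with the same indentation and separate paragraphs
    by a single line break, i.e. with no blank line between them."""
    prefix = " " * indent
    return "\n".join(prefix + p.strip() for p in paragraphs)


print(layout_paragraphs(["first paragraph", "second paragraph"], indent=4))
```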

Annex A Bibliography

The following documents have served as informative references in the preparation of this Implementation Guide.

1) DocSII N37, Summary of DocSII Symposium 2003, 2003-10-01

2) JIS Z 8126:2004, Graphic arts — Glossary — Digital printing terms, 2004-02-20

3) Thai Document Style, Virach Sornlertlamvanich and Thatsanee Charoenporn, Asian Document Standard Workshop, 2002-09-17

4) Acquisitive Examples of Thai Document Styles and Layouts, Virach Sornlertlamvanich and Thatsanee Charoenporn, DocSII Symposium 2003, 2003-09-30

5) Thai Font, National Electronics and Computer Technology Center (NECTEC), 2001

Annex B *****