Diffstat (limited to 'intl/hyphenation/hyphen/README.nonstandard')
-rw-r--r-- | intl/hyphenation/hyphen/README.nonstandard | 122 |
1 files changed, 122 insertions, 0 deletions
diff --git a/intl/hyphenation/hyphen/README.nonstandard b/intl/hyphenation/hyphen/README.nonstandard
new file mode 100644
index 000000000..fd80d12c6
--- /dev/null
+++ b/intl/hyphenation/hyphen/README.nonstandard
@@ -0,0 +1,122 @@
+Non-standard hyphenation
+------------------------
+
+Some languages use non-standard hyphenation: `discretionary'
+character changes at hyphenation points. For example,
+Catalan: paral·lel -> paral-lel,
+Dutch: omaatje -> oma-tje,
+German (before the new orthography): Schiffahrt -> Schiff-fahrt,
+Hungarian: asszonnyal -> asz-szony-nyal (multiple occurrences!)
+Swedish: tillata -> till-lata.
+
+Using this extended library, you can define
+non-standard hyphenation patterns. For example:
+
+l·1l/l=l
+a1atje./a=t,1,3
+.schif1fahrt/ff=f,5,2
+.as3szon/sz=sz,2,3
+n1nyal./ny=ny,1,3
+.til1lata./ll=l,3,2
+
+or with narrow boundaries:
+
+l·1l/l=,1,2
+a1atje./a=,1,1
+.schif1fahrt/ff=,5,1
+.as3szon/sz=,2,1
+n1nyal./ny=,1,1
+.til1lata./ll=,3,1
+
+Note: Libhnj uses patterns preprocessed by substrings.pl.
+Unfortunately, this conversion step can currently generate bad non-standard
+patterns (non-standard -> standard pattern conversion), so using
+narrow boundaries may be better for recent Libhnj. For example,
+substrings.pl generates a few bad patterns for the Hungarian hyphenation
+patterns, resulting in bad non-standard hyphenation in a few cases. Using narrow
+boundaries solves this problem. The Java HyFo module can check for this problem.
+
+Syntax of the non-standard hyphenation patterns
+------------------------------------------------
+
+pat1tern/change[,start,cut]
+
+If this pattern matches the word, and this pattern wins (see README.hyphen)
+in the change region of the pattern, then the pattern[start, start + cut - 1]
+substring will be replaced with the "change".
+
+For example, a German ff -> ff-f hyphenation:
+
+f1f/ff=f
+
+or with expansion
+
+f1f/ff=f,1,2
+
+will replace every "ff" with "ff=f" at hyphenation.
+
+A more realistic example:
+
+% simple ff -> f-f hyphenation
+f1f
+% Schiffahrt -> Schiff-fahrt hyphenation
+%
+schif3fahrt/ff=f,5,2
+
+Specification
+
+- Pattern: matching patterns of Liang's original algorithm
+  - patterns must contain only one hyphenation point in the change region,
+    marked with a one-digit odd number (1, 3, 5, 7 or 9).
+    This point may be at a subregion boundary: schif3fahrt/ff=,5,1
+  - only the greater value guarantees the win (don't mix standard and
+    non-standard patterns with the same value, for example
+    instead of f3f and schif3fahrt/ff=f,5,2 use f3f and schif5fahrt/ff=f,5,2)
+
+- Change: new characters.
+  Arbitrary character sequence. The equals sign (=) marks hyphenation points
+  for OpenOffice.org (as in the example). (In a possible German LaTeX
+  preprocessor, ff could be replaced with "ff, and in a Hungarian one, ssz
+  with `ssz, according to the German and Hungarian Babel settings.)
+
+- Start: starting position of the change region.
+  - begins with 1 (not 0): schif3fahrt/ff=f,5,2
+  - start dot doesn't matter: .schif3fahrt/ff=f,5,2
+  - numbers don't matter: .s2c2h2i2f3f2ahrt/ff=f,5,2
+  - In UTF-8 encoding, use Unicode character positions: össze/sz=sz,2,3
+    ("össze" looks like "Ã¶ssze" in an ISO 8859-1 8-bit editor).
+
+- Cut: length of the removed character sequence in the original word.
+  - In UTF-8 encoding, use Unicode character length: paral·1lel/l=l,5,3
+    ("paral·1lel" looks like "paralÂ·1lel" in an ISO 8859-1 8-bit editor).
+
+Dictionary development
+----------------------
+
+There is no extended PatGen pattern generator for non-standard
+hyphenation patterns yet.
+
+Fortunately, non-standard hyphenation points are forbidden in the
+PatGen-generated hyphenation patterns, so with a little patching,
+non-standard hyphenation patterns can be developed in this case as well.
+
+Warning: If you use UTF-8 Unicode encoding in your patterns, call
+substrings.pl with the UTF-8 parameter to calculate the right
+character positions for non-standard hyphenation:
+
+./substrings.pl input output UTF-8
+
+Programming
+-----------
+
+Use hyphenate2() or hyphenate3() to handle non-standard hyphenation.
+See hyphen.h for the documentation of the hyphenate*() functions.
+See example.c for processing the output of the hyphenate*() functions.
+
+Warning: change characters are lower-cased in the source, so you may need
+to convert the case of the change characters based on the detected case of the input word.
+For example, see the OpenOffice.org source
+(lingucomponent/source/hyphenator/altlinuxhyph/hyphen/hyphenimp.cxx).
+
+László Németh
+<nemeth (at) openoffice.org>
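
Note on the pattern syntax added by this file: the rule "pattern[start, start + cut - 1] is replaced with the change" can be illustrated with a minimal Python sketch. This is not code from Libhnj. The function names parse_pattern and apply_change are hypothetical, and the sketch applies the change to the pattern's own letters rather than to a matched word. It also assumes that when start and cut are omitted, the change region covers the whole pattern, which is inferred from the README stating that f1f/ff=f can be expanded to f1f/ff=f,1,2.

```python
import re


def parse_pattern(spec):
    """Split 'pat1tern/change[,start,cut]' into (letters, change, start, cut).

    Hypothetical helper for illustration only, not part of Libhnj.
    """
    pattern, _, rest = spec.partition("/")
    # Digits are hyphenation weights and '.' marks a word boundary;
    # per the README, neither counts when computing positions.
    letters = re.sub(r"[0-9.]", "", pattern)
    fields = rest.split(",")
    change = fields[0]
    if len(fields) == 3:
        start, cut = int(fields[1]), int(fields[2])
    else:
        # Assumption: without explicit values the region is the whole pattern.
        start, cut = 1, len(letters)
    return letters, change, start, cut


def apply_change(spec):
    """Replace letters[start .. start + cut - 1] (1-based) with the change."""
    letters, change, start, cut = parse_pattern(spec)
    begin = start - 1  # the README counts positions from 1
    return letters[:begin] + change + letters[begin + cut:]


if __name__ == "__main__":
    for spec in ("schif3fahrt/ff=f,5,2", ".schif1fahrt/ff=,5,1",
                 ".as3szon/sz=sz,2,3", "paral·1lel/l=l,5,3"):
        print(spec, "->", apply_change(spec))
```

Because Python strings are indexed by Unicode code point, the Catalan pattern is counted in characters rather than UTF-8 bytes, which matches the README's rule for UTF-8 patterns. The wide form (ff=f,5,2) and the narrow form (ff=,5,1) of the German pattern give the same result in this sketch.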