<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
|
|
|
|
"http://www.w3.org/TR/html4/loose.dtd">
|
|
|
|
<html>
|
|
|
|
<head>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
|
|
<meta http-equiv="Content-Language" content="en-us">
|
|
<title>UCD: Unicode NamesList File Format</title>
|
|
<link rel="stylesheet" type="text/css" href="http://www.unicode.org/reports/reports-v2.css">
|
|
</head>
|
|
|
|
<body bgcolor="#ffffff">
|
|
|
|
<table class="header">
|
|
<tr>
|
|
<td class="icon"><a href="http://www.unicode.org"><img border="0" src="http://www.unicode.org/webscripts/logo60s2.gif" align="middle" alt="[Unicode]" width="34" height="33"></a> <a class="bar" href="http://www.unicode.org/ucd/">Unicode
|
|
Character Database</a></td>
|
|
</tr>
|
|
<tr>
|
|
<td class="gray"> </td>
|
|
</tr>
|
|
</table>
|
|
<div class="body">
|
|
<h1>Unicode® NamesList File Format</h1>
|
|
<table class="simple" width="90%">
|
|
<tbody>
|
|
<tr>
|
|
<td valign="top" width="144">Revision</td>
|
|
<td valign="top">12.1.0</td>
|
|
</tr>
|
|
<tr>
|
|
<td valign="top" width="144">Authors</td>
|
|
<td valign="top">Asmus Freytag, Ken Whistler</td>
|
|
</tr>
|
|
<tr>
|
|
<td valign="top" width="144">Date</td>
|
|
<td valign="top">2019-03-08</td>
|
|
</tr>
|
|
<tr>
|
|
<td valign="top" width="144">This Version</td>
|
|
<td valign="top">
|
|
<a href="http://www.unicode.org/Public/12.1.0/ucd/NamesList.html">
|
|
http://www.unicode.org/Public/12.1.0/ucd/NamesList.html</a></td>
|
|
</tr>
|
|
<tr>
|
|
<td valign="top" width="144">Previous Version</td>
|
|
<td valign="top">
|
|
<a href="http://www.unicode.org/Public/12.0.0/ucd/NamesList.html">
|
|
http://www.unicode.org/Public/12.0.0/ucd/NamesList.html</a></td>
|
|
</tr>
|
|
<tr>
|
|
<td valign="top" width="144">Latest Version</td>
|
|
<td valign="top"><a href="http://www.unicode.org/Public/UCD/latest/ucd/NamesList.html">http://www.unicode.org/Public/UCD/latest/ucd/NamesList.html</a></td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p> </p>
|
|
<h3><i>Summary</i></h3>
|
|
<blockquote>
|
|
<p>This file describes the format and contents of NamesList.txt.</p>
|
|
</blockquote>
|
|
<h3><i>Status</i></h3>
|
|
<blockquote>
|
|
<p><i>The file and the files described herein are part of the <a href="http://www.unicode.org/ucd/">Unicode
|
|
Character Database</a> (UCD). The Unicode <a href="http://www.unicode.org/terms_of_use.html">
|
|
Terms of Use</a> apply.</i></p>
|
|
</blockquote>
|
|
<hr width="50%">
|
|
|
|
<h2>1.0 <a name="Introduction" href="#Introduction">Introduction</a></h2>
|
|
|
|
<p>The Unicode name list file NamesList.txt (also NamesList.lst) is a plain
|
|
text file used to drive the layout of the character code charts in the Unicode
|
|
Standard. The information in this file is a combination of several fields from
|
|
the UnicodeData.txt and Blocks.txt files, together with additional annotations
|
|
for many characters.</p>
|
|
<p>This document describes the syntax rules for the file
|
|
format, but also gives brief information on how each construct is rendered
|
|
when laid out for the code charts. Some of the syntax elements are used only in
|
|
preparation of the drafts of the code charts and are not present in the final,
|
|
released form of the NamesList.txt file.</p>
|
|
|
|
<p>Over time, the syntax has been extended by adding new features. The syntax for formal aliases and index tabs was introduced with Unicode
|
|
5.0. The syntax for marginal sidebar comments is utilized extensively in
|
|
draft versions of the NamesList.txt file. The support for UTF-8 encoded files and the syntax for the UTF-8 charset
|
|
declaration in a comment at the head of the file were introduced after Unicode
|
|
6.1.0 was published, as was the syntax for the specification of variation sequences and alternate glyphs and their respective summaries. The repertoire restriction
|
|
in comments and aliases in the names list format was loosened from the prior
|
|
limitation to U+0020..U+00FF, to include the wider range U+0020..U+02FF, as of Unicode 11.0.</p>
|
|
|
|
<p>The same input file can be used for the preparation of drafts and final editions for ISO/IEC
|
|
10646. Earlier versions of that standard used a different style, referred to below as ISO-style. That style necessitated the presence of some
|
|
information in the name list file that is not needed (and in fact removed
|
|
during parsing) for the Unicode code charts.</p>
|
|
|
|
<p>With access to the layout program (<a href="http://www.unicode.org/unibook/">Unibook</a>), it is a simple matter to
create name lists for formatting working drafts or other documents containing
proposed characters.</p>
|
|
<p>The content of the NamesList.txt file is optimized for code chart creation.
|
|
Some information that can be inferred by the reader from context has been
|
|
suppressed to make the code charts more readable. See the chapter on Code
|
|
Charts in the <a href="http://www.unicode.org/versions/latest">Unicode
|
|
Standard</a>.</p>
|
|
|
|
<h3>1.1 <a name="Overview" href="#Overview">NamesList File Overview</a></h3>
|
|
|
|
<p>The NamesList files are plain text files which, in their simplest form, look
like this:</p>
|
|
|
|
<p>@@&lt;tab&gt;0020&lt;tab&gt;BASIC LATIN&lt;tab&gt;007F<br>
; this is a file comment (ignored)<br>
0020&lt;tab&gt;SPACE<br>
0021&lt;tab&gt;EXCLAMATION MARK<br>
0022&lt;tab&gt;QUOTATION MARK<br>
. . . <br>
007F&lt;tab&gt;DELETE</p>
|
|
|
|
<p>The semicolon (as first character), @ and &lt;tab&gt; characters are used
|
|
by the file syntax and must be provided as shown. Hexadecimal digits must be
|
|
in UPPERCASE. A double @@ introduces a block header, with the title, and
|
|
start and ending code of the block provided as shown.</p>
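<p>The minimal form described above can be read with very little code. The following Python sketch is illustrative only and is not part of the format definition: the names <code>parse_minimal</code>, <code>Block</code>, <code>BLOCKHEADER</code> and <code>NAME_LINE</code> are assumptions made for this example, and every line type other than block headers, name lines, empty lines and file comments is simply skipped.</p>

```python
import re
from typing import NamedTuple


class Block(NamedTuple):
    """One block header: first code point, block title, last code point."""
    start: int
    name: str
    end: int


# CHAR is four to six uppercase hexadecimal digits.
CHAR = r"[0-9A-F]{4,6}"
# "@@" TAB BLOCKSTART TAB BLOCKNAME TAB BLOCKEND (TAB is one or more tabs).
BLOCKHEADER = re.compile(rf"^@@\t+({CHAR})\t+(.+?)\t+({CHAR})$")
# CHAR TAB NAME
NAME_LINE = re.compile(rf"^({CHAR})\t+(.+)$")


def parse_minimal(text):
    """Return (blocks, names) for a minimal name list."""
    blocks = []
    names = {}
    for line in text.splitlines():
        if not line or line.startswith(";"):
            continue  # empty lines and file comments are ignored
        m = BLOCKHEADER.match(line)
        if m:
            blocks.append(Block(int(m.group(1), 16), m.group(2), int(m.group(3), 16)))
            continue
        m = NAME_LINE.match(line)
        if m:
            names[int(m.group(1), 16)] = m.group(2)
    return blocks, names


sample = (
    "@@\t0020\tBASIC LATIN\t007F\n"
    "; this is a file comment (ignored)\n"
    "0020\tSPACE\n"
    "0021\tEXCLAMATION MARK\n"
)
blocks, names = parse_minimal(sample)
```

<p>A parser for the full syntax would also need to handle the annotation lines that follow each name line; this sketch deliberately ignores them.</p>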
|
|
|
|
<p>For a minimal name list, only the NAME_LINE and BLOCKHEADER and
|
|
their constituent syntax elements are needed.</p>
|
|
|
|
<p>The full syntax with all the options is provided in the following sections.</p>
|
|
|
|
<h2>2.0 <a name="FileStructure" href="#FileStructure">NamesList File Structure</a></h2>
|
|
|
|
<p>This section defines the overall file structure.</p>
|
|
|
|
<pre><strong>NAMELIST: TITLE_PAGE* EXTENDED_BLOCK*
|
|
</strong>
|
|
<strong>TITLE_PAGE: TITLE
|
|
| TITLE_PAGE SUBTITLE
|
|
| TITLE_PAGE SUBHEADER
|
|
| TITLE_PAGE IGNORED_LINE
|
|
| TITLE_PAGE EMPTY_LINE
|
|
| TITLE_PAGE NOTICE_LINE
|
|
| TITLE_PAGE COMMENT_LINE
|
|
| TITLE_PAGE PAGEBREAK
|
|
| TITLE_PAGE FILE_COMMENT
|
|
| FILE_COMMENT
|
|
|
|
|
|
EXTENDED_BLOCK: BLOCK
|
|
| BLOCK SUMMARY
|
|
|
|
|
|
BLOCK: BLOCKHEADER
|
|
| BLOCKHEADER INDEX_TAB
|
|
| BLOCK CHAR_ENTRY
|
|
| BLOCK SUBHEADER
|
|
| BLOCK NOTICE_LINE
|
|
| BLOCK EMPTY_LINE
|
|
| BLOCK IGNORED_LINE
|
|
| BLOCK SIDEBAR_LINE
|
|
| BLOCK PAGEBREAK
|
|
| BLOCK FILE_COMMENT
|
|
| BLOCK CROSS_REF
|
|
|
|
|
|
CHAR_ENTRY: NAME_LINE | RESERVED_LINE
|
|
| CHAR_ENTRY ALIAS_LINE
|
|
| CHAR_ENTRY FORMALALIAS_LINE
|
|
| CHAR_ENTRY COMMENT_LINE
|
|
| CHAR_ENTRY CROSS_REF
|
|
| CHAR_ENTRY DECOMPOSITION
|
|
| CHAR_ENTRY COMPAT_MAPPING
|
|
| CHAR_ENTRY IGNORED_LINE
|
|
| CHAR_ENTRY EMPTY_LINE
|
|
| CHAR_ENTRY NOTICE_LINE
|
|
| CHAR_ENTRY FILE_COMMENT
|
|
| CHAR_ENTRY VARIATION_LINE
|
|
</strong></pre>
|
|
|
|
<p>In other words:</p>
|
|
<p>
|
|
Neither TITLE nor SUBTITLE may occur after the first BLOCKHEADER. </p>
|
|
<p>Only TITLE, SUBTITLE, SUBHEADER, PAGEBREAK, COMMENT_LINE, NOTICE_LINE,
|
|
EMPTY_LINE, IGNORED_LINE and FILE_COMMENT may occur before the first BLOCKHEADER.</p>
|
|
<ul>
|
|
<li>CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, VARIATION_LINE, ALIAS_LINE and FORMALALIAS_LINE lines
|
|
occurring before the first block header are treated as if they were
|
|
COMMENT_LINEs.</li>
|
|
</ul>
|
|
<p>Directly following either a NAME_LINE or a RESERVED_LINE an uninterrupted
|
|
sequence of the following lines may occur (in any order and repeated as often
|
|
as needed): ALIAS_LINE, COMMENT_LINE, CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, FORMALALIAS_LINE, NOTICE_LINE,
|
|
EMPTY_LINE, IGNORED_LINE, VARIATION_LINE and FILE_COMMENT.</p>
|
|
<ul>
|
|
<li>The conventional order of elements in a char entry (NAME_LINE,
FORMALALIAS_LINE, ALIAS_LINE, COMMENT_LINE or NOTICE_LINE, CROSS_REF, VARIATION_LINE, optionally
ending in either DECOMPOSITION or COMPAT_MAPPING) is not enforced by the layout program
(<a href="http://www.unicode.org/unibook/">Unibook</a>).</li>
|
|
</ul>
|
|
<p>Except for CROSS_REF, NOTICE_LINE, SIDEBAR_LINE, EMPTY_LINE, IGNORED_LINE and
|
|
FILE_COMMENT, none of these lines may
|
|
occur in any other place.</p>
|
|
<ul>
|
|
<li>A NOTICE_LINE or CROSS_REF displays differently depending on whether it follows a header or title
|
|
or is part of a CHAR_ENTRY.</li>
|
|
</ul>
|
|
<p>A PAGEBREAK may appear anywhere, except in the middle of a CHAR_ENTRY.
|
|
A PAGEBREAK before the file title lines may not be supported. INDEX_TABs may
|
|
appear after any block header.</p>
|
|
<p>If the first line of a file is a file comment, it may contain a UTF-8
|
|
charset declaration (see below). Alternatively, or in addition, a BOM may be
|
|
present at the very beginning of the file, forcing the encoding to be
|
|
interpreted as UTF-16 (little-endian only) or UTF-8. When
|
|
declared as UTF-8, the names list format will support use of characters in
|
|
the range U+0020..U+02FF in LINE and LABEL elements. Otherwise,
|
|
the supported repertoire is limited to Latin-1, and attempted use of characters outside
|
|
the Latin-1 range will result in data corruption.</p>
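<p>The encoding rules in the preceding paragraph can be sketched as a small detection routine. This is a hedged illustration in Python, not a description of how Unibook is implemented: the function name <code>detect_encoding</code> is an assumption of this example, and the returned strings are Python codec names chosen for convenience.</p>

```python
import codecs
import re

# Any case variation of "UTF-8" in the first-line file comment counts.
CHARSET_DECL = re.compile(rb"utf-8", re.IGNORECASE)


def detect_encoding(data: bytes) -> str:
    """Guess the Python codec name for the raw bytes of a names list file."""
    # A BOM at the very beginning forces the interpretation.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the UTF-8 BOM when decoding
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16"  # this codec reads the BOM and decodes little-endian
    # Otherwise look for a charset declaration in a first-line file comment.
    first_line = re.split(rb"[\r\n]", data, maxsplit=1)[0]
    if first_line.startswith(b";") and CHARSET_DECL.search(first_line):
        return "utf-8"
    # No BOM and no declaration: fall back to Latin-1.
    return "latin-1"
```

<p>For example, <code>data.decode(detect_encoding(data))</code> would decode a file read in binary mode under these assumptions.</p>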
|
|
<p>Several of these elements, while part of the formal definition of the
|
|
file format, do not occur in final published versions of
|
|
NamesList.txt in the UCD.</p>
|
|
|
|
<h4>Blocks followed by Summaries</h4>
|
|
<p>A block may be extended by a summary of standard variation sequences or selected alternate glyphs (or both) defined for characters in the block:</p>
|
|
<pre><strong>
|
|
SUMMARY: ALTGLYPH_SUMMARY
|
|
| VARIATION_SUMMARY
|
|
| ALTGLYPH_SUMMARY VARIATION_SUMMARY
|
|
| MIXED_SUMMARY
|
|
|
|
ALTGLYPH_SUMMARY: ALTGLYPH_SUBHEADER
|
|
| ALTGLYPH_SUMMARY SUMMARY_LINE
|
|
|
|
VARIATION_SUMMARY: VARIATION_SUBHEADER
|
|
| VARIATION_SUMMARY SUMMARY_LINE
|
|
|
|
MIXED_SUMMARY: MIXED_SUBHEADER
|
|
| MIXED_SUMMARY SUMMARY_LINE
|
|
|
|
SUMMARY_LINE: SUBHEADER
|
|
| NOTICE_LINE
|
|
| FILE_COMMENT
|
|
| EMPTY_LINE</strong>
|
|
</pre>
|
|
|
|
<p>When formatted for display, each summary will recap the information presented in the VARIATION_LINE elements
|
|
of the preceding block, grouped by alternate glyph variants and standardized variation sequences, and
|
|
preceded by the corresponding subheader. Additional SUBHEADER and NOTICE lines, if provided, immediately
|
|
follow the ALTGLYPH_SUBHEADER, VARIATION_SUBHEADER or MIXED_SUBHEADER. There is no provision to provide subheaders that are
|
|
interspersed between items in the summary.</p>
|
|
|
|
<p>These syntax constructs are entirely optional. If the ALTGLYPH_SUBHEADER or VARIATION_SUBHEADER are
|
|
omitted from the names list, but the preceding block nevertheless contains VARIATION_LINE elements
|
|
as described below, Unibook will automatically generate any required summaries using a default format for the headers.</p>
|
|
|
|
<p>Thus, the main purpose for providing ALTGLYPH_SUBHEADER or VARIATION_SUBHEADER elements would be to
|
|
provide specific contents for these summary titles as well as allow the ability to add additional
|
|
information via SUBHEADER and NOTICE elements. The final published version of the Unicode names list
|
|
is machine generated and will always explicitly provide any summary subheaders.</p>
|
|
|
|
<h3>2.1 <a name="FileElements" href="#FileElements">NamesList File Elements</a></h3>
|
|
|
|
<p>This section provides the details of the syntax for the individual elements.</p>
|
|
|
|
<pre><strong>ELEMENT SYNTAX</strong> // How rendered
|
|
|
|
<strong>NAME_LINE: CHAR TAB NAME LF</strong>
|
|
// The CHAR and the corresponding image are echoed,
|
|
// followed by the name as given in NAME
|
|
|
|
<strong> | CHAR TAB "&lt;" LCNAME "&gt;" LF</strong>
|
|
// Control and noncharacters use this form of
|
|
// lowercase, bracketed pseudo character name
|
|
|
|
<strong> | CHAR TAB NAME SP COMMENT LF</strong>
|
|
// Names may have a comment, which is stripped off
|
|
// unless the file is parsed for an ISO style list
|
|
|
|
<strong> | CHAR TAB "&lt;" LCNAME "&gt;" SP COMMENT LF</strong>
|
|
// Control and noncharacters may also have comments
|
|
|
|
<strong>RESERVED_LINE: CHAR TAB "&lt;reserved&gt;" LF</strong>
|
|
// The CHAR is echoed followed by an icon for the
|
|
// reserved character and a fixed string, e.g. "&lt;reserved&gt;"
|
|
|
|
<strong>COMMENT_LINE: TAB "*" SP EXPAND_LINE</strong>
|
|
// * is replaced by BULLET, output line as comment
|
|
|
|
<strong> | TAB EXPAND_LINE</strong>
|
|
// Output line as comment
|
|
|
|
<strong>ALIAS_LINE: TAB "=" SP LINE</strong>
|
|
// Replace = by itself, output line as alias
|
|
|
|
<strong>FORMALALIAS_LINE:
|
|
TAB "%" SP NAME LF</strong>
|
|
// Replace % by U+203B, output line as formal alias
|
|
|
|
<strong>CROSS_REF: TAB "x" SP CHAR SP LCNAME LF
|
|
| TAB "x" SP CHAR SP "&lt;" LCNAME "&gt;" LF</strong>
|
|
// x is replaced by a right arrow
|
|
|
|
<strong> | TAB "x" SP "(" LCNAME SP "-" SP CHAR ")" LF
|
|
| TAB "x" SP "(" "&lt;" LCNAME "&gt;" SP "-" SP CHAR ")" LF</strong>
|
|
// x is replaced by a right arrow;
|
|
// (second type as used for control and noncharacters)
|
|
|
|
// In the forms with parentheses the "(","-" and ")" are removed
|
|
// and the order of CHAR and LCNAME is reversed;
|
|
// i.e. all inputs result in the same order of output
|
|
|
|
<strong> | TAB "x" SP CHAR LF</strong>
|
|
// x is replaced by a right arrow
|
|
// (this type is the only one without LCNAME
|
|
// and is used for ideographs)
|
|
|
|
<strong>VARIATION_LINE: TAB "~" SP CHAR VARSEL SP LABEL LF
|
|
| TAB "~" SP CHAR VARSEL SP LABEL "(" LCTAG ")" LF</strong>
|
|
// output standardized variation sequence or simply the char code in case of alternate
|
|
// glyphs, followed by the alternate glyph or variation glyph and the label and context
|
|
|
|
<strong>FILE_COMMENT: ";" LINE</strong>
|
|
|
|
<strong>EMPTY_LINE: LF</strong>
|
|
// Empty and ignored lines as well as
|
|
// file comments are ignored
|
|
|
|
<strong>IGNORED_LINE: TAB ";" LINE</strong>
|
|
// Ignore LINE
|
|
|
|
<strong>SIDEBAR_LINE: ";;" LINE</strong>
|
|
// Output LINE as marginal note
|
|
|
|
<strong>DECOMPOSITION: TAB ":" SP EXPAND_LINE</strong>
|
|
// Replace ':' by EQUIV, expand line into
|
|
// decomposition
|
|
|
|
<strong>COMPAT_MAPPING: TAB "#" SP EXPAND_LINE
|
|
| TAB "#" SP "&lt;" TAG "&gt;" SP EXPAND_LINE</strong>
|
|
// Replace '#' by APPROX, output line as mapping;
|
|
// check for balanced &lt; &gt;
|
|
|
|
<strong>NOTICE_LINE: "@+" TAB LINE</strong>
|
|
// Output LINE as notice
|
|
|
|
<strong> | "@+" TAB "*" SP LINE</strong>
|
|
// Output LINE as notice
|
|
// "*" expands to a bullet character
|
|
// Notices following a character code apply to the
|
|
// character and are indented. Notices not following
|
|
// a character code apply to the page/block/column
|
|
// and are italicized, but not indented
|
|
|
|
<strong>TITLE: "@@@" TAB LINE</strong>
|
|
// Output LINE as text
|
|
// Title is used in page headers
|
|
|
|
<strong>SUBTITLE: "@@@+" TAB LINE</strong>
|
|
// Output LINE as subtitle
|
|
|
|
<strong>SUBHEADER: "@" TAB LINE</strong>
|
|
// Output LINE as column header
|
|
|
|
<strong>VARIATION_SUBHEADER:</strong> <strong>"@~" TAB LINE</strong>
|
|
// Output LINE as column header (summary subheader)
|
|
<strong>| "@~"</strong>
|
|
// Output a default standard variation sequences summary subheader
|
|
<strong>| "@~" TAB "!"</strong>
|
|
// Suppress output of a default standard variation sequences summary subheader
|
|
// and disable display of summary
|
|
<strong>| "@~" TAB "!" VARSEL_LIST</strong>
|
|
<strong>| "@~" TAB "!" VARSEL_LIST LINE</strong>
|
|
// Output a standard summary subheader, using default or LINE respectively
|
|
// Suppress any std variation sequences using selectors from the list
|
|
|
|
<strong>ALTGLYPH_SUBHEADER:</strong> <strong>"@@~" TAB LINE</strong>
|
|
// Output LINE as column header (summary subheader)
|
|
<strong>| "@@~"</strong>
|
|
// Output a default alternate glyph summary subheader
|
|
<strong>| "@@~" TAB "!"</strong>
|
|
// Suppress output of a default alternate glyph summary subheader
|
|
// and disable display of summary
|
|
|
|
<strong>MIXED_SUBHEADER: </strong><strong>"@@@~" TAB LINE</strong>
|
|
// Output LINE as column header (summary subheader)
|
|
<strong>| "@@@~"</strong>
|
|
// Output a default combined variation and alternate glyph summary subheader
|
|
<strong>| "@@@~" TAB "!"</strong>
|
|
// Suppress output of a default combined variation and alternate glyph summary subheader
|
|
// and disable display of summary
|
|
<strong>| "@@@~" TAB "!" VARSEL_LIST</strong>
|
|
<strong>| "@@@~" TAB "!" VARSEL_LIST LINE</strong>
|
|
// Output a combined summary subheader, using default or LINE respectively
|
|
// Suppress any std variation sequences using selectors from the list
|
|
|
|
<strong>BLOCKHEADER: "@@" TAB BLOCKSTART TAB BLOCKNAME TAB BLOCKEND LF</strong>
|
|
// Cause a page break and optional
|
|
// blank page, then output one or more charts
|
|
// followed by the list of character names.
|
|
// Use BLOCKSTART and BLOCKEND to define
|
|
// what characters belong to a block.
|
|
// Use BLOCKNAME in page and table headers
|
|
|
|
<strong>BLOCKNAME: LABEL
|
|
| LABEL SP "(" LABEL ")"</strong>
|
|
// If an alternate label is present it replaces
|
|
// the BLOCKNAME when an ISO-style names list is
|
|
// laid out; it is ignored in the Unicode charts
|
|
|
|
<strong>BLOCKSTART: CHAR</strong> // First character position in block
|
|
<strong>BLOCKEND: CHAR</strong> // Last character position in block
|
|
<strong>PAGEBREAK: "@@"</strong> // Insert a (column) break
|
|
<strong>INDEX_TAB: "@@+"</strong> // Start a new index tab at latest BLOCKSTART
|
|
|
|
<strong>EXPAND_LINE: {ESC_CHAR | CHAR | STRING | ESC +}+ LF</strong>
|
|
// Instances of CHAR (see Notes) are replaced by
|
|
// CHAR NBSP x NBSP where x is the single Unicode
|
|
// character corresponding to CHAR.
|
|
// If character is combining, it is replaced with
|
|
// CHAR NBSP &lt;circ&gt; x NBSP where &lt;circ&gt; is the
|
|
// dotted circle
|
|
</pre>
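<p>Most elements above can be told apart by their leading characters alone. The following Python sketch shows one way to do that; it is an illustration under stated assumptions, not the logic used by Unibook. The names <code>classify</code>, <code>AT_PREFIXES</code> and <code>TAB_MARKERS</code> are hypothetical, the result <code>"UNKNOWN"</code> is an invention of this example, and context-dependent rules (for instance, that a CROSS_REF before the first block header is treated as a COMMENT_LINE) are not modeled.</p>

```python
import re

# A line starting with CHAR TAB is a NAME_LINE or RESERVED_LINE.
CHAR_START = re.compile(r"^[0-9A-F]{4,6}\t")

# Longer prefixes come first so that "@@@+" is not mistaken for "@@@".
AT_PREFIXES = [
    ("@@@+", "SUBTITLE"),
    ("@@@~", "MIXED_SUBHEADER"),
    ("@@@", "TITLE"),
    ("@@+", "INDEX_TAB"),
    ("@@~", "ALTGLYPH_SUBHEADER"),
    ("@+", "NOTICE_LINE"),
    ("@~", "VARIATION_SUBHEADER"),
]

# Markers that follow the leading TAB and must be followed by SP.
TAB_MARKERS = {
    "*": "COMMENT_LINE",
    "=": "ALIAS_LINE",
    "%": "FORMALALIAS_LINE",
    "x": "CROSS_REF",
    "~": "VARIATION_LINE",
    ":": "DECOMPOSITION",
    "#": "COMPAT_MAPPING",
}


def classify(line: str) -> str:
    """Return the element name for a single line, judged by its prefix only."""
    line = line.rstrip("\r\n")
    if line == "":
        return "EMPTY_LINE"
    if line.startswith(";;"):
        return "SIDEBAR_LINE"
    if line.startswith(";"):
        return "FILE_COMMENT"
    if line.startswith("\t"):
        body = line.lstrip("\t")
        if body.startswith(";"):
            return "IGNORED_LINE"
        if len(body) >= 2 and body[1] == " " and body[0] in TAB_MARKERS:
            return TAB_MARKERS[body[0]]
        return "COMMENT_LINE"  # TAB followed by free text
    for prefix, kind in AT_PREFIXES:
        if line.startswith(prefix):
            return kind
    if line.startswith("@@"):
        # "@@" with fields is a block header; a bare "@@" is a page break.
        return "BLOCKHEADER" if "\t" in line else "PAGEBREAK"
    if line.startswith("@"):
        return "SUBHEADER"
    if CHAR_START.match(line):
        is_reserved = line.split("\t")[-1] == "<reserved>"
        return "RESERVED_LINE" if is_reserved else "NAME_LINE"
    return "UNKNOWN"
```

<p>Ordering the prefix table from longest to shortest keeps the function a flat sequence of checks; a full parser would additionally track whether it is inside a title page, a block or a char entry.</p>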
|
|
|
|
|
|
<b>Notes:</b><ul>
|
|
<li>Blocks must be aligned on a 16-code point boundary and contain an integer
|
|
multiple of 16-code point columns. The exception to that rule is for blocks of
|
|
ideographs, <i>etc.</i>, for which no names are listed in the file. The BLOCKEND for such blocks
|
|
must correspond to the last assigned character, and not the actual end of the block.</li>
|
|
<li>Blocks must be non-overlapping and in ascending order. NAME_LINEs
|
|
must be in ascending order and follow the block header for the block to
|
|
which they belong. </li>
|
|
<li>Reserved entries are optional, and will normally be supplied automatically. They are
|
|
required whenever followed by ALIAS_LINE, COMMENT_LINE, NOTICE_LINE or CROSS_REF.
|
|
</li>
|
|
<li>An empty alternative glyph summary subheader expression will result in the default header "Selected Alternative Glyphs"</li>
|
|
<li>An empty standard variation subheader expression will result in the default header "Standardized Variation Sequences"</li>
|
|
<li> A VARSEL_LIST may only contain code points for standard variation selectors (including script specific ones)</li>
|
|
<li>When displaying a VARIATION_LINE for alternate glyphs, the "ALTn" selector is not displayed. </li>
|
|
<li>If a glyph is unavailable for the variant glyph in a VARIATION_LINE it is replaced by the glyph for LIGHT SCREEN.</li>
|
|
</ul>
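<p>The alignment and ordering constraints on blocks stated in the notes above lend themselves to a mechanical check. The Python sketch below is one possible formulation, offered as an illustration rather than a reference implementation: the function <code>check_blocks</code>, its tuple layout and the <code>unnamed</code> parameter (the set of block names exempt from the end-of-column rule because no names are listed for them) are all assumptions of this example.</p>

```python
def check_blocks(blocks, unnamed=()):
    """Check (start, end, name) tuples against the block constraints.

    Returns a list of human-readable problem descriptions (empty if none).
    """
    problems = []
    previous_end = -1
    for start, end, name in blocks:
        # Blocks must start on a 16-code point boundary.
        if start % 16 != 0:
            problems.append(f"{name}: start {start:04X} is not on a 16-code point boundary")
        # Blocks must end at the end of a 16-code point column, unless they
        # are exempt (e.g. ideograph blocks ending at the last assigned character).
        if name not in unnamed and (end + 1) % 16 != 0:
            problems.append(f"{name}: end {end:04X} does not complete a 16-code point column")
        # Blocks must be non-overlapping and in ascending order.
        if start <= previous_end:
            problems.append(f"{name}: overlaps or precedes the previous block")
        previous_end = end
    return problems
```

<p>For instance, <code>check_blocks([(0x0000, 0x007F, "A"), (0x0080, 0x00FF, "B")])</code> reports no problems, since both blocks are aligned and in ascending order.</p>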
|
|
|
|
|
|
<h3>2.2 <a name="FilePrimitives" href="#FilePrimitives">NamesList File Primitives</a></h3>
|
|
|
|
<p>The following are the primitives and terminals for the NamesList syntax.</p>
|
|
|
|
<pre><strong>LINE</strong>: <strong>STRING LF
|
|
COMMENT: "(" LABEL ")"
|
|
| "(" LABEL ")" SP "*"
|
|
| "*"</strong>
|
|
|
|
<strong>NAME</strong>: &lt;sequence of uppercase ASCII letters, digits, space and hyphen&gt;
<strong>LCNAME</strong>: &lt;sequence of lowercase ASCII letters, digits, space and hyphen&gt;
<strong>| LCNAME "-" CHAR</strong>

<strong>TAG</strong>: &lt;sequence of ASCII letters&gt;
<strong>LCTAG</strong>: &lt;sequence of lowercase ASCII letters&gt;
<strong>STRING</strong>: &lt;sequence of characters in the range U+0020..U+02FF, except controls&gt;
<strong>LABEL</strong>: &lt;sequence of characters in the range U+0020..U+02FF, except controls, "(" or ")"&gt;
|
|
<strong>VARSEL</strong>: <strong>CHAR
|
|
| ALT ( "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" )</strong>
|
|
<strong>VARSEL_LIST</strong>: <strong>"{" CHAR_LIST "}"</strong>
|
|
<strong>CHAR_LIST</strong>: <strong>CHAR
|
|
| CHAR_LIST SP CHAR</strong>
|
|
<strong>CHAR</strong>: <strong>X X X X</strong>
|
|
<strong>| X X X X X </strong>
|
|
<strong>| X X X X X X </strong>
|
|
<strong>X</strong>: <strong>"0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"|"A"|"B"|"C"|"D"|"E"|"F"</strong>
|
|
<strong>ESC_CHAR</strong>: <strong>ESC CHAR</strong>
|
|
<strong>ESC</strong>: <strong>"\"</strong>
|
|
// Special semantics of backslash (\) are supported
|
|
// only in EXPAND_LINE.
|
|
<strong>TAB</strong>: &lt;sequence of one or more ASCII tab characters 0x09&gt;
<strong>SP</strong>: &lt;ASCII 20&gt;
<strong>LF</strong>: &lt;any sequence of ASCII 0A and 0D&gt;
|
|
</pre>
|
|
|
|
<p><b>Notes:</b></p>
|
|
<ul>
|
|
<li>Multiple or leading spaces, multiple or leading hyphens, as well as
|
|
word-initial digits in NAMEs or LCNAMEs are illegal.</li>
|
|
<li>The French version of the names list uses French rules, which allow
|
|
apostrophe and accented letters in character names.</li>
|
|
<li>When names containing code points are lowercased to make them LCNAMEs,
|
|
the code point values remain uppercase. Such code points by convention
|
|
follow a hyphen and are the last element in the name.</li>
|
|
<li>Special lookahead logic prevents a four-digit number for a standard, such
as ISO 9999, from being misinterpreted as ISO CHAR. Currently recognized are
"ISO", "DIN", "IEC" and "S X" as well as "S C" for the JIS X and JIS C series of
standards. For other standards, or for four-digit years in a comment, use a
NOTICE_LINE instead, which prevents expansion, or use "\" to escape the digits.</li>
|
|
<li>Single and double straight quotes in an EXPAND_LINE are replaced by curly quotes using English rules.
|
|
Smart apostrophes are supported, but nested quotes are not.
|
|
Single quotes can only be applied around a single word.</li>
|
|
<li>A CHAR inside ' or " is expanded, but only its glyph image is printed; the
code value is not echoed.</li>
|
|
<li>Inside an EXPAND_LINE, backslash is treated as an escape character that
|
|
removes the special meaning of any literal character and also prevents
|
|
the following digit sequence from being expanded. A backslash character in
|
|
isolation is never displayed. A sequence of two backslash characters results
|
|
in display of a single backslash, but has no effect on the interpretation
|
|
of following characters.</li>
|
|
<li>The hyphen in a character range CHAR-CHAR is replaced by an EN DASH on
|
|
output.</li>
|
|
<li>The NamesList.txt file is encoded in UTF-8 if the <i>first line</i> is a
|
|
FILE_COMMENT containing the declaration "UTF-8" or any casemap variation
|
|
thereof. Otherwise the file is encoded in Latin-1 (older versions). Beyond
|
|
detecting the charset declaration (typically: "; charset=utf-8") the
|
|
remainder of that comment is ignored.
|
|
If the file is not encoded as
|
|
UTF-8, the character repertoire for running text (anything
|
|
other than CHAR) is effectively restricted to the repertoire of Latin-1.
|
|
Otherwise, characters in the range U+0020..U+02FF
|
|
are allowed in STRING or LABEL elements, and elements derived from them.</li>
|
|
<li>The code chart layout program
|
|
(<a href="http://www.unicode.org/unibook/">Unibook</a>)
|
|
can accept files in several other formats. These include little-endian UTF-16,
|
|
prefixed with a BOM, or UTF-8 prefixed with the UTF-8 BOM.</li>
|
|
<li>While the format allows multiple &lt;tab&gt; characters, by convention the
|
|
actual number of tabs is always one or two, chosen to provide the best
|
|
layout of the plain text file.</li>
|
|
<li>Earlier published versions of the NamesList.txt file may contain trailing or otherwise extraneous
|
|
spaces or tab characters; while these are errors in the files, they are not
|
|
being corrected, to retain stability of the published versions. Anyone
|
|
writing a parser for older versions of this file may need to be prepared to
|
|
handle such exceptions.</li>
|
|
<li>The final LF in the file must be present.</li>
|
|
</ul>
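<p>The first note above, on illegal sequences in NAMEs, can be expressed as a short validation routine. This Python sketch is a hedged reading of that note and not an official validator: the function name <code>name_problems</code> is hypothetical, "leading" is interpreted as "at the start of the name", "multiple" as "two or more in a row", and a "word" is taken to be a space-separated token. The French rules mentioned above are not covered.</p>

```python
import re

# NAME: uppercase ASCII letters, digits, space and hyphen only.
NAME_CHARS = re.compile(r"[A-Z0-9 -]+")


def name_problems(name: str) -> list:
    """Return a list of rule violations found in a candidate NAME."""
    problems = []
    if not NAME_CHARS.fullmatch(name):
        problems.append("characters outside A-Z, 0-9, space and hyphen")
    if name.startswith((" ", "-")):
        problems.append("leading space or hyphen")
    if "  " in name or "--" in name:
        problems.append("multiple spaces or hyphens")
    # A digit may follow a hyphen (as in IDEOGRAPH-4E00) but may not start a word.
    if any(word[:1].isdigit() for word in name.split(" ")):
        problems.append("word-initial digit")
    return problems
```

<p>Under these assumptions, <code>name_problems("CJK UNIFIED IDEOGRAPH-4E00")</code> returns an empty list, because the digits follow a hyphen rather than starting a word.</p>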
|
|
<h2><a name="Modifications" href="#Modifications">Modifications</a></h2>
|
|
|
|
<p><b>Version 12.1.0</b></p>
|
|
<ul>
|
|
<li>Reissued for Unicode 12.1.0.</li>
|
|
</ul>
|
|
<p><b>Version 12.0.0</b></p>
|
|
<ul>
|
|
<li>Reissued for Unicode 12.0.0.</li>
|
|
<li>Added definition of TAG (allowing uppercase letters), distinct from LCTAG.</li>
|
|
<li>Corrected definition of VARIATION_LINE to use LCTAG instead of LCNAME.</li>
|
|
<li>Corrected definition of COMPAT_MAPPING to use TAG instead of LCTAG.</li>
|
|
<li>Corrected the documentation regarding which elements allow use of characters
|
|
in the range U+0020..U+02FF.</li>
|
|
</ul>
|
|
<p><b>Version 11.0.0</b></p>
|
|
<ul>
|
|
<li>Reissued for Unicode 11.0.0.</li>
|
|
<li>Loosened the limitation on repertoire allowed in LINE and LABEL
|
|
elements to include characters outside Latin-1, in the range
|
|
U+0100..U+02FF.</li>
|
|
</ul>
|
|
<p><b>Version 10.0.0</b></p>
|
|
<ul>
|
|
<li>Reissued for Unicode 10.0.0.</li>
|
|
</ul>
|
|
<p><b>Version 9.0.0</b></p>
|
|
<ul>
|
|
<li>Reissued for Unicode 9.0.0.</li>
|
|
</ul>
|
|
<p><b>Version 8.0.0</b></p>
|
|
<ul>
|
|
<li>Reissued for Unicode 8.0.0.</li>
|
|
<li>Added MIXED_SUBHEADER, VARSEL_LIST, and CHAR_LIST to the syntax.</li>
|
|
<li>Tweaked BNF and notes for variation summaries.</li>
|
|
</ul>
|
|
<p><b>Version 7.0.0</b></p>
|
|
<ul>
|
|
<li>Reissued for Unicode 7.0.0.</li>
|
|
</ul>
|
|
<p><b>Version 6.3.0</b></p>
|
|
<ul>
|
|
<li>Reissued for Unicode 6.3.0.</li>
|
|
</ul>
|
|
<p><b>Version 6.2.0</b></p>
|
|
<ul>
|
|
<li>Edited the variation syntax definitions, description and corresponding notes for wording.</li>
|
|
<li>Minor tweaks to the layout of BNF syntax, mostly adding tabs and | characters as needed.</li>
|
|
<li>Fixed some typographical errors and minor inconsistencies.</li>
|
|
<li>Added syntax for elements required by variation sequence and alternate glyph summaries.</li>
|
|
<li>Edited and reformatted some notes for readability.</li>
|
|
<li>Documented the permitted presence of CROSS_REF outside character entries within blocks.
|
|
Such CROSS_REFs have been present in published names lists, but that information was missing in
|
|
the syntax description. For an example see the Currency Symbols block in the code charts.</li>
|
|
<li>Added description of UTF-8 charset declaration and file encoding.</li>
|
|
</ul>
|
|
<p><b>Version 6.1.0</b></p>
|
|
<ul>
|
|
<li>Removed constraint that LCTAG consist only of lowercase letters,
|
|
because of the existence of the "noBreak" tag.</li>
|
|
</ul>
|
|
<p><b>Version 6.0.0</b></p>
|
|
<ul>
|
|
<li>Added definitions for ESC_CHAR and ESC primitives.</li>
|
|
<li>Clarified interpretation of backslash escapes in EXPAND_LINE.</li>
|
|
</ul>
|
|
<p><b>Version 5.2.0</b></p>
|
|
<ul>
|
|
<li>Better aligned the rules section with the actual published files and
|
|
behavior of existing parsers. This included fixing some obvious typos
|
|
and clarifying some notes as well as the following changes, which are
|
|
listed individually.</li>
|
|
<li>Replaced instances of &lt;tab&gt; by TAB throughout.</li>
|
|
<li>NAME_LINE for special names may have trailing COMMENTs including COMMENTs
|
|
consisting entirely of "*".</li>
|
|
<li>In CROSS_REF added the form without LCNAME, fixed the literal to the
|
|
correct lowercase "x" and noted that LCNAME may have "&lt;" and "&gt;" around
|
|
it in the data. Also added missing LF in the rules.</li>
|
|
<li>Removed a redundant rule for BLOCKHEADER.</li>
|
|
<li>Changed FORMALALIAS_LINE from LINE to NAME to match actual restriction
|
|
on contents.</li>
|
|
<li>Extended the documentation of lookahead logic for CHAR.</li>
|
|
<li>Accounted for FILE_COMMENT in overall file structure.</li>
|
|
</ul>
|
|
<p><b>Version 5.1.0</b></p>
|
|
<ul>
|
|
<li>Noted that comments in NAME_LINEs must be preceded by SP.</li>
|
|
<li>Provided additional information on allowable characters in names.</li>
|
|
<li>Added SIDEBAR_LINE.</li>
|
|
<li>Noted that CROSS_REF must contain a SP and CHAR, and that
|
|
COMPAT_MAPPING must contain a SP and may contain a &lt;tag&gt;.</li>
|
|
<li>Noted that LCNAME may contain uppercase characters under
|
|
exceptional circumstances.</li>
|
|
<li>Relaxed the restriction on lines starting with #, :, %, x and = on
|
|
the TITLE_PAGE. These are now treated as comments.</li>
|
|
</ul>
|
|
<p><b>Version 5.0.0</b></p>
|
|
<ul>
|
|
<li>Added FORMALALIAS_LINE and INDEX_TAB to syntax.</li>
|
|
<li>Fixed the list of lines that may appear before a BLOCKHEADER by
|
|
adding NOTICE_LINE.</li>
|
|
<li>Minor fixes to the wording of several syntax definitions.</li>
|
|
</ul>
|
|
<p><b>Version 4.0.0</b></p>
|
|
<ul>
|
|
<li>Fixed syntax to better reflect restrictions on characters
|
|
in character and block names.</li>
|
|
<li>Better documented the treatment of comments in block names, plus
|
|
French name rules.</li>
|
|
</ul>
|
|
<p><b>Version 3.2.0</b></p>
|
|
<ul>
|
|
<li>Fixed several broken links, added a left margin,
|
|
changed version numbering.</li>
|
|
</ul>
|
|
<p><b>Version 3.1.0 (2)</b></p>
|
|
<ul>
|
|
<li>Use of 4-6 digit hex notation is now supported.</li>
|
|
</ul>
|
|
<hr width="50%">
|
|
<div align="center">
|
|
<center>
|
|
<table cellspacing="0" cellpadding="0" border="0">
|
|
<tr>
|
|
<td><a href="http://www.unicode.org/copyright.html">
|
|
<img src="http://www.unicode.org/img/hb_notice.gif" border="0" alt="Access to Copyright and terms of use" width="216" height="50"></a></td>
|
|
</tr>
|
|
</table>
|
|
<script language="Javascript" type="text/javascript" src="http://www.unicode.org/webscripts/lastModified.js">
|
|
</script>
|
|
</center>
|
|
</div>
|
|
</div>
|
|
|
|
</body>
|
|
|
|
</html>
|
|
|