2.1 Character Set

   {character set} The only characters allowed outside of comments are the graphic_characters and format_effectors.
Ramification: Any character, including an other_control_function, is allowed in a comment.
Note that this rule doesn't really have much force, since the implementation can represent characters in the source in any way it sees fit. For example, an implementation could simply define that what seems to be a non-graphic, non-format-effector character is actually a representation of the space character.
Discussion: It is our intent to follow the terminology of ISO 10646 BMP where appropriate, and to remain compatible with the character classifications defined in A.3, ``Character Handling''. Note that our definition for graphic_character is more inclusive than that of ISO 10646-1.


character ::= graphic_character | format_effector | other_control_function
graphic_character ::= identifier_letter | digit | space_character | special_character

Static Semantics

   The character repertoire for the text of an Ada program consists of the collection of characters called the Basic Multilingual Plane (BMP) of the ISO 10646 Universal Multiple-Octet Coded Character Set, plus a set of format_effectors and, in comments only, a set of other_control_functions; the coded representation for these characters is implementation defined [(it need not be a representation defined within ISO-10646-1)].
Implementation defined: The coded representation for the text of an Ada program.
   The description of the language definition in this International Standard uses the graphic symbols defined for Row 00: Basic Latin and Row 00: Latin-1 Supplement of the ISO 10646 BMP; these correspond to the graphic symbols of ISO 8859-1 (Latin-1); no graphic symbols are used in this International Standard for characters outside of Row 00 of the BMP. The actual set of graphic symbols used by an implementation for the visual representation of the text of an Ada program is not specified. {unspecified [partial]}
   The categories of characters are defined as follows:
   {identifier_letter} identifier_letter
upper_case_identifier_letter | lower_case_identifier_letter
Discussion: We use identifier_letter instead of simply letter because ISO 10646 BMP includes many other characters that would generally be considered "letters."
   {upper_case_identifier_letter} upper_case_identifier_letter
Any character of Row 00 of ISO 10646 BMP whose name begins ``Latin Capital Letter''.
   {lower_case_identifier_letter} lower_case_identifier_letter
Any character of Row 00 of ISO 10646 BMP whose name begins ``Latin Small Letter''.
This paragraph was deleted. To be honest: {8652/0001} The above rules do not include the ligatures Æ and æ. However, the intent is to include these characters as identifier letters. This problem was pointed out by a comment from the Netherlands.
    {digit} digit
One of the characters 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9.
    {space_character} space_character
The character of ISO 10646 BMP named ``Space''.
    {special_character} special_character
Any character of the ISO 10646 BMP that is not reserved for a control function, and is not the space_character, an identifier_letter, or a digit.
Ramification: Note that the no-break space and soft hyphen are special_characters, and therefore graphic_characters. They are not the same characters as space and hyphen-minus.
    {format_effector} format_effector
The control functions of ISO 6429 called character tabulation (HT), line tabulation (VT), carriage return (CR), line feed (LF), and form feed (FF). {control character: See also format_effector}
    {other_control_function} other_control_function
Any control function, other than a format_effector, that is allowed in a comment; the set of other_control_functions allowed in comments is implementation defined. {control character: See also other_control_function}
Implementation defined: The control functions allowed in comments.
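The category definitions above can be sketched as a small classifier. This is only an illustration, not part of the language definition: the function name `classify` and the labels `control_function` and `excluded` are hypothetical, and the sketch assumes that the character names in Python's `unicodedata` module (which follow current Unicode) are an acceptable stand-in for the ISO 10646 names. The two sets of names may differ for some Row 00 characters, such as Æ and æ.

```python
import unicodedata

# Format effectors named in the text: HT, LF, VT, FF, CR.
FORMAT_EFFECTORS = {0x09, 0x0A, 0x0B, 0x0C, 0x0D}


def classify(ch: str) -> str:
    """Return a 2.1 category name for a single BMP character (sketch)."""
    cp = ord(ch)
    if cp > 0xFFFF:
        raise ValueError("character is outside the BMP")
    if cp in FORMAT_EFFECTORS:
        return "format_effector"
    # Other positions reserved for control functions. Which of these count
    # as other_control_functions is implementation defined, so this sketch
    # uses a neutral (hypothetical) label.
    if cp <= 0x1F or 0x7F <= cp <= 0x9F:
        return "control_function"
    # FFFE and FFFF are not graphic_characters; the label is hypothetical.
    if cp in (0xFFFE, 0xFFFF):
        return "excluded"
    if "0" <= ch <= "9":
        return "digit"
    if ch == " ":
        return "space_character"
    if cp <= 0xFF:  # identifier_letters come from Row 00 only
        name = unicodedata.name(ch, "")
        if name.startswith("LATIN CAPITAL LETTER"):
            return "upper_case_identifier_letter"
        if name.startswith("LATIN SMALL LETTER"):
            return "lower_case_identifier_letter"
    # Everything else that is graphic is a special_character.
    return "special_character"
```

Under these assumptions, `classify("\u00a0")` (no-break space) and `classify("_")` both yield `special_character`, while a letter outside Row 00, such as Greek capital alpha, is also a `special_character` rather than an identifier_letter.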
    {names of special_characters} {special_character (names)} The following names are used when referring to certain special_characters: {quotation mark} {number sign} {ampersand} {apostrophe} {tick} {left parenthesis} {right parenthesis} {asterisk} {multiply} {plus sign} {comma} {hyphen-minus} {minus} {full stop} {dot} {point} {solidus} {divide} {colon} {semicolon} {less-than sign} {equals sign} {greater-than sign} {low line} {underline} {vertical line} {left square bracket} {right square bracket} {left curly bracket} {right curly bracket}
Discussion: These are the ones that play a special role in the syntax of Ada 95, or in the syntax rules; we don't bother to define names for all characters. The first name given is the name from ISO 10646-1; the subsequent names, if any, are those used within the standard, depending on context.
      symbol  name                       symbol  name
         "    quotation mark                :    colon
         #    number sign                   ;    semicolon
         &    ampersand                     <    less-than sign
         '    apostrophe, tick              =    equals sign
         (    left parenthesis              >    greater-than sign
         )    right parenthesis             _    low line, underline
         *    asterisk, multiply            |    vertical line
         +    plus sign                     [    left square bracket
         ,    comma                         ]    right square bracket
         -    hyphen-minus, minus           {    left curly bracket
         .    full stop, dot, point         }    right curly bracket
         /    solidus, divide
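As a cross-check of the first name given for each symbol in the table above, the following sketch compares those names with the character names reported by Python's `unicodedata` module. It assumes that the Unicode names in that module coincide with the ISO 10646 names for these characters. The dictionary and helper names are hypothetical.

```python
import unicodedata

# Each symbol from the table, paired with the first name listed for it.
SPECIAL_CHARACTER_NAMES = {
    '"': "quotation mark",
    "#": "number sign",
    "&": "ampersand",
    "'": "apostrophe",
    "(": "left parenthesis",
    ")": "right parenthesis",
    "*": "asterisk",
    "+": "plus sign",
    ",": "comma",
    "-": "hyphen-minus",
    ".": "full stop",
    "/": "solidus",
    ":": "colon",
    ";": "semicolon",
    "<": "less-than sign",
    "=": "equals sign",
    ">": "greater-than sign",
    "_": "low line",
    "|": "vertical line",
    "[": "left square bracket",
    "]": "right square bracket",
    "{": "left curly bracket",
    "}": "right curly bracket",
}


def character_name(ch: str) -> str:
    """Lower-cased character name as reported by unicodedata."""
    return unicodedata.name(ch).lower()


# Collect any symbol whose table name differs from the unicodedata name.
mismatches = {
    ch: (expected, character_name(ch))
    for ch, expected in SPECIAL_CHARACTER_NAMES.items()
    if character_name(ch) != expected
}
print(mismatches)
```

The subsequent names in the table (tick, multiply, minus, dot, point, divide, underline) are the ones used within the standard and are not expected to appear in any character-name database.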

Implementation Permissions

    In a nonstandard mode, the implementation may support a different character repertoire[; in particular, the set of characters that are considered identifier_letters can be extended or changed to conform to local conventions].
Ramification: If an implementation supports other character sets, it defines which characters fall into each category, such as ``identifier_letter,'' and what the corresponding rules of this section are, such as which characters are allowed in the text of a program.
NOTES

1  Every code position of ISO 10646 BMP that is not reserved for a control function is defined to be a graphic_character by this International Standard. This includes all code positions other than 0000 - 001F, 007F - 009F, and FFFE - FFFF.
2  The language does not specify the source representation of programs.
Discussion: Any source representation is valid so long as the implementer can produce an (information-preserving) algorithm for translating both directions between the representation and the standard character set. (For example, every character in the standard character set has to be representable, even if the output devices attached to a given computer cannot print all of those characters properly.) From a practical point of view, every implementer will have to provide some way to process the ACVC. It is the intent to allow source representations, such as parse trees, that are not even linear sequences of characters. It is also the intent to allow different fonts: reserved words might be in bold face, and that should be irrelevant to the semantics.
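The code-position ranges in the first numbered note above can be expressed directly. The following is a minimal sketch, with hypothetical names, that treats every BMP code position outside the three listed ranges as a graphic_character and counts how many there are.

```python
# Inclusive ranges of code positions that are not graphic_characters.
NON_GRAPHIC_RANGES = [(0x0000, 0x001F), (0x007F, 0x009F), (0xFFFE, 0xFFFF)]


def is_graphic_position(cp: int) -> bool:
    """True if the BMP code position cp is a graphic_character position."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("not a BMP code position")
    return not any(lo <= cp <= hi for lo, hi in NON_GRAPHIC_RANGES)


# 65536 positions in total, minus 32 + 33 + 2 excluded positions.
graphic_count = sum(is_graphic_position(cp) for cp in range(0x10000))
print(graphic_count)  # 65469
```

Note that the format_effectors fall inside the first excluded range, so they are allowed in program text without being graphic_characters.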

Extensions to Ada 83

{extensions to Ada 83} Ada 95 allows 8-bit and 16-bit characters, as well as implementation-specified character sets.

Wording Changes from Ada 83

The syntax rules in this clause are modified to remove the emphasis on basic characters vs. others. (In this day and age, there is no need to point out that you can write programs without using (for example) lower case letters.) In particular, character (representing all characters usable outside comments) is added, and basic_graphic_character, other_special_character, and basic_character are removed. Special_character is expanded to include Ada 83's other_special_character, as well as new 8-bit characters not present in Ada 83. Note that the term ``basic letter'' is used in A.3, ``Character Handling'' to refer to letters without diacritical marks.
Character names now come from ISO 10646.
We use identifier_letter rather than letter since ISO 10646 BMP includes many "letters" that are not permitted in identifiers (in the standard mode).
