Examples

Comparison of the New Language with C, Pascal, and Java

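		C	Pascal	Java	New language
Memory access	Pointers	Explicit	Explicit	Implicit	Implicit
	Arrays	Shallow assignment (pointer)	Deep copy	Shallow assignment (pointer)	Shallow assignment (pointer)
	Records	Deep copy	Deep copy	Shallow copy	Shallow copy
Strings		Character array	Built-in	Built-in	Built-in
Global variables		Allowed	Allowed	Class member variables	Not allowed
Forward declarations		Required	Required	Not required	Not required
Nested functions		Not allowed	Allowed	Inner classes allowed	Not allowed
Variable declarations		At the start of any block	At the start of a function	Anywhere	At the start of a function
Multidimensional arrays		Contiguous storage	Contiguous storage	Array of arrays	Array of arrays
Comments		Non-nested	Nested	Non-nested	Nested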
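
The array rows of the table can be illustrated with a small sketch. Python lists are used here only as a stand-in, on the assumption that their reference semantics match the "shallow assignment" and "array of arrays" entries for the new language; the variable names are made up for the illustration.

```python
# Shallow assignment: b refers to the same array as a, so a write
# through b is visible through a.
a = [1, 2, 3]
b = a
b[0] = 99

# Array of arrays: each row is a separate array object, so a row can be
# taken out and modified on its own.
grid = [[0] * 3 for _ in range(2)]
row = grid[0]
row[1] = 7
```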

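Suggested improvements (not applicable this semester)

1. Allow global variables

2. Give functions, variables, and types separate namespaces

3. Require variables to be initialized

4. Allow variables to be declared in the middle of a block

5. Remove break and continue

6. Remove the comma expression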

—by marong

Lexical Aspects

A token can be a keyword, an identifier, an integer constant, a character constant, or a string constant. Tokens are separated by whitespace and comments.

An identifier is a sequence of letters, digits, and underscores that begins with a letter and is not a keyword. Note that (i) identifiers cannot start with an underscore, and (ii) case is significant in identifiers.
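
The identifier rule can be sketched as a small check. This is only an illustration: it assumes "letters" means the ASCII letters, and the names KEYWORDS, IDENT, and is_identifier are hypothetical. The keyword set is copied from the reserved word list in this document.

```python
import re

# Word-like reserved words from this document's reserved word list.
KEYWORDS = {
    "native", "record", "new", "int", "string", "char", "null",
    "if", "else", "while", "for", "return", "break", "continue",
}

# A letter first, then any mix of letters, digits, and underscores.
# Assumption: only ASCII letters count as letters.
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def is_identifier(text: str) -> bool:
    """Return True if text is a valid identifier under the rule above."""
    return IDENT.fullmatch(text) is not None and text not in KEYWORDS
```

Because case is significant, While passes this check even though while does not.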

Line terminators are \n, \r, and \r\n. Whitespace, including spaces, tabs, line terminators, and form feeds (\f), may appear between tokens.
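
When a lexer counts lines for error messages, \r\n has to be treated as one terminator rather than two. A minimal sketch, with the hypothetical helper name split_lines:

```python
import re

# Try \r\n first so that it is consumed as a single terminator.
LINE_TERMINATOR = re.compile(r"\r\n|\r|\n")

def split_lines(text: str) -> list[str]:
    """Split source text at each of the three line terminators."""
    return LINE_TERMINATOR.split(text)
```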

Comments

There are two types of comments: line comments and block comments.

A line comment starts with two slashes (//). The text after the slashes is ignored up to the next line terminator.

A block comment starts with /* and ends with */. The text in between is ignored. Block comments may nest.
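
Because block comments may nest, a lexer needs a depth counter rather than a search for the first */. The sketch below shows one way to do this. It is a simplification: it does not recognize string or character constants, so a // or /* inside a string would be misread, and treating an unterminated block comment as an error is an assumption. The name strip_comments is hypothetical.

```python
def strip_comments(src: str) -> str:
    """Remove line comments and (possibly nested) block comments."""
    out = []
    depth = 0          # current block comment nesting depth
    i = 0
    n = len(src)
    while i < n:
        two = src[i:i + 2]
        if depth > 0:
            # Inside a block comment only /* and */ matter.
            if two == "/*":
                depth += 1
                i += 2
            elif two == "*/":
                depth -= 1
                i += 2
            else:
                i += 1
        elif two == "/*":
            depth = 1
            i += 2
        elif two == "//":
            # Skip to the line terminator but keep the terminator itself.
            while i < n and src[i] not in "\r\n":
                i += 1
        else:
            out.append(src[i])
            i += 1
    if depth > 0:
        raise SyntaxError("unterminated block comment")
    return "".join(out)
```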

Constants

An integer constant is a sequence of decimal digits (i.e., 0123456789). There are no negative integer constants.

A character constant is one printable character, a space, or an escape sequence that represents one character, surrounded by a pair of single quotes '.

A string constant is a sequence of zero or more printable characters, spaces, or escape sequences, surrounded by a pair of double quotes ".

Escape sequences begin with a backslash \ and represent special characters. The escape sequences are:

Escape sequence	Meaning
`\n`	Linefeed
`\r`	Carriage return
`\t`	Tab
`\\`	Backslash
`\ddd`	The character with ASCII code ddd (three decimal digits)
`\"`	Double quote (only allowed in a string constant)
`\'`	Single quote (only allowed in a character constant)
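
The table can be turned into a small decoder for the text between the quotes of a constant. This is a sketch under assumptions: the function name decode_escapes is hypothetical, the quote argument is the delimiter of the constant being decoded (so \" is accepted only in strings and \' only in characters), and no range check is applied to ddd because the text does not specify one.

```python
SIMPLE_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\"}
DIGITS = "0123456789"

def decode_escapes(body: str, quote: str) -> str:
    """Decode the escape sequences in the body of a constant.

    quote is '"' for a string constant and "'" for a character constant.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        nxt = body[i + 1:i + 2]     # empty if the backslash is the last char
        code = body[i + 1:i + 4]    # candidate for the \ddd form
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        elif nxt == quote:
            out.append(quote)
            i += 2
        elif len(code) == 3 and all(d in DIGITS for d in code):
            out.append(chr(int(code)))
            i += 4
        else:
            raise ValueError(f"invalid escape sequence at offset {i}")
    return "".join(out)
```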

Reserved words

The reserved words and symbols are:

`native`	`record`	`new`	`int`	`string`	`char`	`null`
`if`	`else`	`while`	`for`	`return`	`break`	`continue`
`;`	`[`	`]`	`{`	`}`	`(`	`)`
`,`	`=`	`||`	`&&`	`==`	`!=`	`<`
`<=`	`>`	`>=`	`+`	`-`	`*`	`/`
`%`	`!`	`.`

Input & Output

native int readInt();
native char readChar();
native int printInt(int i);
native int printChar(char c);
native int printString(string s);
native int printLine(string s);

The line break character is \n.

Type Conversions

int i;
char c;
string s;

Convert from int

i = 97;
c = chr(i); // c == 'a'
s = "" + i; // s == "97"

Convert from char

c = 'a';
i = ord(c); // i == 97
s = "" + c; // s == "a"

Convert from string

s = "97";
c = s[0];        // c == '9'
c = s[1];        // c == '7'
i = parseInt(s); // i == 97

Note that parseInt is a contributed function.
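
The text does not define parseInt beyond the example above. The sketch below shows one plausible reading, assuming the input must be a non-empty sequence of decimal digits; rejecting anything else is an assumption, and parse_int is a hypothetical name.

```python
def parse_int(s: str) -> int:
    """Convert a string of decimal digits to an integer."""
    if s == "" or any(ch not in "0123456789" for ch in s):
        raise ValueError("parse_int expects one or more decimal digits")
    value = 0
    for ch in s:
        # Shift the accumulated value one decimal place and add the digit.
        value = value * 10 + (ord(ch) - ord("0"))
    return value
```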

String Operations

Creation

string a, b;
a = "hello";
b = a;

b should share the same storage as a.

Indexing

string s;
s = "hello";
s[0] == 'h';
s[1] == 'e';

Strings are read-only. Assignments to string elements (e.g. s[1] = 'a';) should cause errors in semantic analysis.

Length

string s;
s = "hello";
s.length == 5;
"hello".length == 5;

Comparison

Strings are compared by value, in alphabetical (lexicographic) order.

string x, y;
x = "a";
y = "ab";
x == "a" && y == "ab";
x < y && y < "b";
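
A comparison that is consistent with the examples above can be sketched as follows. It assumes characters are ordered by their character codes and that a proper prefix sorts before the longer string; the function name compare is hypothetical.

```python
def compare(x: str, y: str) -> int:
    """Return -1, 0, or 1 as x is less than, equal to, or greater than y."""
    for cx, cy in zip(x, y):
        if cx != cy:
            # The first differing character decides the order.
            return -1 if ord(cx) < ord(cy) else 1
    # All shared positions are equal, so the shorter string comes first.
    return (len(x) > len(y)) - (len(x) < len(y))
```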

Substring

string s;
s = "hello";
substring(s, 0, s.length) == "hello";
substring(s, 1, 2) == "el";
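
Both examples fit a reading in which the second argument is the start index and the third is the length of the result. The sketch below follows that reading; raising an error for out-of-range arguments is an assumption, since the text does not say what should happen.

```python
def substring(s: str, start: int, length: int) -> str:
    """Return length characters of s beginning at index start."""
    if start < 0 or length < 0 or start + length > len(s):
        raise IndexError("substring range is outside the string")
    return s[start:start + length]
```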

Concatenation

string s;
s = "hello";
s = s + ", " + 2012;
s == "hello, 2012";

BNF Grammar

%precedence: ELSE => right
%precedence: LBRACKET => left

translation_unit : external_decl
translation_unit : translation_unit external_decl

external_decl : prototype_decl
              | function_def
              | record_def

prototype_decl : NATIVE function_head SEMICOLON

function_def : function_head LBRACE variable_decl_list stmt_list RBRACE
             | function_head LBRACE                    stmt_list RBRACE

record_def : RECORD ID LBRACE variable_decl_list RBRACE

variable_decl_list : variable_decl
variable_decl_list : variable_decl_list variable_decl

function_head : type_specifier ID LPAREN parameter_list RPAREN
              | type_specifier ID LPAREN                RPAREN

parameter_list : parameter_decl
parameter_list : parameter_list COMMA parameter_decl

parameter_decl : type_specifier ID

variable_decl : type_specifier id_list SEMICOLON

type_specifier : INT
               | STRING
               | CHAR
               | ID
type_specifier : type_specifier LRBRACKET

id_list : ID
id_list : id_list COMMA ID

stmt_list : stmt
stmt_list : stmt_list stmt

stmt : compound_stmt
     | expr_stmt
     | selection_stmt
     | iteration_stmt
     | jump_stmt

compound_stmt : LBRACE stmt_list RBRACE
              | LBRACE           RBRACE

expr_stmt : expr SEMICOLON

selection_stmt : IF LPAREN expr RPAREN stmt
               | IF LPAREN expr RPAREN stmt ELSE stmt

iteration_stmt : WHILE LPAREN expr RPAREN stmt
               | FOR LPAREN expr_stmt expr_stmt expr RPAREN stmt
               | FOR LPAREN expr_stmt expr_stmt      RPAREN stmt
               | FOR LPAREN expr_stmt SEMICOLON expr RPAREN stmt
               | FOR LPAREN expr_stmt SEMICOLON      RPAREN stmt
               | FOR LPAREN SEMICOLON expr_stmt expr RPAREN stmt
               | FOR LPAREN SEMICOLON expr_stmt      RPAREN stmt
               | FOR LPAREN SEMICOLON SEMICOLON expr RPAREN stmt
               | FOR LPAREN SEMICOLON SEMICOLON      RPAREN stmt

jump_stmt : RETURN expr SEMICOLON
          | BREAK SEMICOLON
          | CONTINUE SEMICOLON

expr : assignment_expr
expr : expr COMMA assignment_expr

assignment_expr : logical_or_expr
assignment_expr : unary_expr ASSIGN assignment_expr

logical_or_expr : logical_and_expr
logical_or_expr : logical_or_expr OR logical_and_expr

logical_and_expr : equality_expr
logical_and_expr : logical_and_expr AND equality_expr

equality_expr : relational_expr
equality_expr : equality_expr EQ  relational_expr
              | equality_expr NEQ relational_expr

relational_expr : additive_expr
relational_expr : relational_expr LESS       additive_expr
                | relational_expr LESS_EQ    additive_expr
                | relational_expr GREATER    additive_expr
                | relational_expr GREATER_EQ additive_expr

additive_expr : mult_expr
additive_expr : additive_expr PLUS  mult_expr
              | additive_expr MINUS mult_expr

mult_expr : unary_expr
mult_expr : mult_expr MULTIPLY unary_expr
          | mult_expr  DIVIDE  unary_expr
          | mult_expr  MODULO  unary_expr

unary_expr : postfix
unary_expr : PLUS  unary_expr
           | MINUS unary_expr
           | NOT   unary_expr

postfix : primary
postfix : postfix LBRACKET expr RBRACKET
        | postfix LPAREN expr RPAREN
        | postfix LPAREN      RPAREN
        | postfix DOT ID

primary : ID
        | NULL
        | INTEGER
        | CHARACTER
        | STRING_LITERAL
        | LPAREN expr RPAREN
        | NEW type_specifier LBRACKET expr RBRACKET
        | NEW ID
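
The layering of additive_expr, mult_expr, unary_expr, and primary is what gives the operators their precedence and left associativity. The recursive-descent sketch below covers only that arithmetic subset, with integer constants and parentheses, and builds a nested tuple instead of evaluating anything. The class, the tuple shapes, and the "unary" tags are all hypothetical, and a parenthesized expression descends to additive_expr here rather than to the full expr rule.

```python
import re

TOKEN = re.compile(r"\s*([0-9]+|[-+*/%!()])")

def tokenize(text):
    """Split text into integer and operator tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def additive(self):
        # additive_expr : mult_expr | additive_expr (PLUS | MINUS) mult_expr
        node = self.mult()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.mult())   # left-associative
        return node

    def mult(self):
        # mult_expr : unary_expr | mult_expr (MULTIPLY | DIVIDE | MODULO) unary_expr
        node = self.unary()
        while self.peek() in ("*", "/", "%"):
            op = self.take()
            node = (op, node, self.unary())
        return node

    def unary(self):
        # unary_expr : postfix | (PLUS | MINUS | NOT) unary_expr
        if self.peek() in ("+", "-", "!"):
            op = self.take()
            return ("unary" + op, self.unary())
        return self.primary()

    def primary(self):
        # primary : INTEGER | LPAREN expr RPAREN (subset only)
        tok = self.take()
        if tok == "(":
            node = self.additive()
            if self.take() != ")":
                raise SyntaxError("expected )")
            return node
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(text):
    parser = Parser(tokenize(text))
    node = parser.additive()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return node
```

Since there are no negative integer constants, an input such as -2 * 3 parses as a unary minus applied to 2, and that result is then multiplied by 3.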

Compiler 2012
