<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Mozilla/4.75 [en] (Windows NT 5.0; U) [Netscape]">
</head>
<body>
Hi,
<p>Here are my thoughts regarding positions and the AST. The major goal was
to come up with a consistent view, even though this causes some more work
for refactoring and for the implementation of the new AST. But I think
it is worth it if we end up with a consistent story for positions.
<h3>
Some general statements</h3>
<ol>
<li>
sourceStart and sourceEnd should always cover the whole node. This differs
from the current implementation, where for some nodes declarationSourceStart
and declarationSourceEnd cover the whole node and sourceStart and sourceEnd
cover only the name (examples are LocalVariableDeclaration, TypeDeclaration,
...).</li>
<li>
sourceStart and sourceEnd should also cover all subnodes</li>
<li>
whenever possible we should follow the grammar as defined in The Java Language
Specification book. So if the grammar says that a production includes the
semicolon, then the AST node should include it too. For example, the grammar
defines a return statement as
<br> return (expression) ;
<br>so the corresponding AST node should include the ;</li>
</ol>
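<p>As a minimal sketch of general statements (1) to (3), the following check
verifies that a node's range covers all of its subnodes. PositionedNode and
ReturnStatementDemo are hypothetical names used only for illustration, and
inclusive end offsets are an assumption; none of this is taken from the
compiler's AST or the new AST.

```java
// Hypothetical node type for illustration only; not a class of the compiler's AST.
final class PositionedNode {
    final String kind;
    final int sourceStart; // offset of the node's first character
    final int sourceEnd;   // offset of the node's last character (assumed inclusive)
    final PositionedNode[] children;

    PositionedNode(String kind, int sourceStart, int sourceEnd, PositionedNode... children) {
        this.kind = kind;
        this.sourceStart = sourceStart;
        this.sourceEnd = sourceEnd;
        this.children = children;
    }

    // Statement (2): every subnode must lie inside its parent's range, recursively.
    boolean coversAllSubnodes() {
        for (PositionedNode child : children) {
            boolean inside = sourceStart <= child.sourceStart && child.sourceEnd <= sourceEnd;
            if (!inside || !child.coversAllSubnodes()) {
                return false;
            }
        }
        return true;
    }
}

final class ReturnStatementDemo {
    static final String SOURCE = "return x;";

    // Statement (3): the return statement's range runs up to and including the ';'.
    static PositionedNode build() {
        int nameOffset = SOURCE.indexOf('x');
        PositionedNode expression = new PositionedNode("SimpleName", nameOffset, nameOffset);
        return new PositionedNode("ReturnStatement", 0, SOURCE.indexOf(';'), expression);
    }
}
```

<p>A check like this could run over a whole tree in a test to catch nodes
whose positions only cover the name.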
<h3>
Some statements from earlier discussions (mainly between Jim, Philippe,
and me)</h3>
<ul>
<li>
There will be an ExpressionStatement node for expressions used as statements.
For example, "if (isChecked()) {}" versus "isChecked();". We agreed that the
expression will not include the semicolon and the ExpressionStatement will.
For the isChecked() example this will look like [[isChecked()];]. This is
consistent with the grammar rule in general statement (3). Together with general
statement (2), it leads to the conclusion that statements that have child statements
will include the semicolon if the child statement has one. For the example
<p>for (int i= 0; i &lt; 10; i++)
<br> foo();
<p>sourceEnd of the for statement will include the semicolon of the expression
statement.</li>
</ul>
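<p>To make the bracket notation concrete, here is a small sketch that renders
node ranges as brackets around the source text. BracketDemo and its methods
are hypothetical helpers, and the ranges use inclusive end offsets as an
assumption.

```java
final class BracketDemo {
    // Renders each {start, end} range (end inclusive, an assumption) as brackets
    // around the source text it covers.
    static String bracket(String source, int[][] ranges) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < source.length(); i++) {
            for (int[] range : ranges) {
                if (range[0] == i) out.append('[');
            }
            out.append(source.charAt(i));
            for (int[] range : ranges) {
                if (range[1] == i) out.append(']');
            }
        }
        return out.toString();
    }

    // Ranges for a source like "isChecked();": the expression stops before the ';',
    // the ExpressionStatement includes it.
    static int[][] expressionStatementRanges(String source) {
        int semicolon = source.indexOf(';');
        int[] expression = {0, semicolon - 1};
        int[] statement = {0, semicolon};
        return new int[][] {expression, statement};
    }
}
```

<p>For the source isChecked(); this yields [[isChecked()];]. Adding a third
range for an enclosing for statement that ends at the same semicolon gives the
nesting described above.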
<h3>
Open issues</h3>
<h4>
Multiple local declarations</h4>
Currently, multiple local declarations appear in the AST as n separate local
declarations without any relationship to each other. This raises various
questions:
<ul>
<li>
what are the positions of those local declarations</li>
<li>
how can a visitor of that AST figure out that it is dealing with a multiple
local declaration?</li>
</ul>
Since the new AST isn't a 1:1 mapping of the compiler's AST anyway (we
have the ExpressionStatement node), I propose introducing new nodes as defined
in the grammar. Since the semicolon doesn't belong to the variable declaration,
it should be managed by the parent node that ties together multiple declarations.
Here is an example:
<p>int x= 10, y[]= null, i;
<p>LocalVariableDeclaration node manages:
<br> the type (e.g. int)
<br> the positions of the commas (if needed)
<br> the actual variable declarators
<br> sourceStart= start of the type
<br> sourceEnd= ;
<p>VariableDeclarator node manages:
<br> the variable name and its positions
<br> the initialization
<br> sourceStart= start of variable name
<br> sourceEnd= end of initialization. Doesn't include
the comma.
<p>If we want to do some optimization, we could also have a node SingleLocalVariableDeclaration
for declarations like int x; or int y= 10; The node would have the following
fields:
<br> the type
<br> the variable name and its positions
<br> the initialization
<br> sourceStart= start of type
<br> sourceEnd= ;
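<p>The proposed node shapes could be sketched as follows for a declaration
with three declarators. The class and field names, the inclusive end offsets,
and the hand-computed positions are assumptions made for illustration; a real
parser would record the positions while parsing.

```java
// Hypothetical node shapes following the proposal; names and fields are assumptions.
final class VariableDeclarator {
    final String name;
    final int sourceStart; // start of the variable name
    final int sourceEnd;   // end of the initialization (or of the name if there is none); never the comma

    VariableDeclarator(String name, int sourceStart, int sourceEnd) {
        this.name = name;
        this.sourceStart = sourceStart;
        this.sourceEnd = sourceEnd;
    }
}

final class LocalVariableDeclaration {
    final String type;
    final VariableDeclarator[] declarators;
    final int[] commaPositions; // kept only if needed
    final int sourceStart;      // start of the type
    final int sourceEnd;        // position of the ';'

    LocalVariableDeclaration(String type, VariableDeclarator[] declarators,
            int[] commaPositions, int sourceStart, int sourceEnd) {
        this.type = type;
        this.declarators = declarators;
        this.commaPositions = commaPositions;
        this.sourceStart = sourceStart;
        this.sourceEnd = sourceEnd;
    }
}

final class LocalDeclarationDemo {
    static final String SOURCE = "int x= 10, y[]= null, i;";

    // Positions are computed by hand with indexOf for this one source string.
    static LocalVariableDeclaration build() {
        int firstComma = SOURCE.indexOf(',');
        int secondComma = SOURCE.indexOf(',', firstComma + 1);
        int semicolon = SOURCE.indexOf(';');
        int lastName = SOURCE.lastIndexOf('i');
        VariableDeclarator[] declarators = {
            new VariableDeclarator("x", SOURCE.indexOf("x="), firstComma - 1),
            new VariableDeclarator("y", SOURCE.indexOf("y[]"), secondComma - 1),
            new VariableDeclarator("i", lastName, lastName)
        };
        return new LocalVariableDeclaration("int", declarators,
                new int[] {firstComma, secondComma}, 0, semicolon);
    }
}
```

<p>With this layout a visitor sees one parent node for the whole declaration,
and each declarator's range stops before the following comma.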
<h4>
Updates in for statements</h4>
Analogous to the local variable declaration, the commas that separate the
update expressions cannot be part of the expressions (expressions don't
contain a semicolon, so they can't contain a comma either). To know the
positions of the commas, the for statement should manage them in a separate
array.
<p><i>The general rule is that whenever language elements are separated
by commas (for example an interface list in the implements clause,
arguments of a method declaration, ...) the node containing the separated
nodes should manage the positions of the commas, if they are of any interest.
In a first implementation we could leave these positions out and use the
scanner to find them if they are of interest.</i>
<h4>
Treatment of semicolon</h4>
From our experience with refactoring, it is helpful in some cases to know
the position of the semicolon. For example, if the user extracts
a for statement and doesn't select the semicolon of its action (the body
statement), we still allow the extraction. So what can we do in these cases:
<ul>
<li>
simply don't allow the case. To support better selection we can offer some
actions to extend the text selection to span valid AST nodes. We have
a running prototype for this.</li>
<li>
do some parsing of the source code to find the position of the semicolon.
We could use the scanner for this.</li>
</ul>
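<p>The second option could look roughly like the sketch below. SeparatorFinder
is a hypothetical stand-in for the scanner, not its actual API: starting right
after a node, it skips whitespace and comments and reports the position of the
expected separator if that is the next thing in the source. The same helper
would work for commas between separated nodes.

```java
// Hypothetical helper standing in for the scanner; it only knows whitespace and comments.
final class SeparatorFinder {
    // Returns the index of `expected` if it is the first character at or after `from`
    // that is neither whitespace nor part of a comment; otherwise returns -1.
    static int findNext(String source, int from, char expected) {
        int i = from;
        int n = source.length();
        while (i < n) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (source.startsWith("//", i)) {
                // Line comment: continue after the end of the line.
                int eol = source.indexOf('\n', i);
                if (eol < 0) return -1;
                i = eol + 1;
            } else if (source.startsWith("/*", i)) {
                // Block comment: continue after the closing delimiter.
                int close = source.indexOf("*/", i + 2);
                if (close < 0) return -1;
                i = close + 2;
            } else {
                return c == expected ? i : -1;
            }
        }
        return -1;
    }
}
```

<p>A refactoring could call this with the offset just past the selected node
to decide whether the selection can be extended to include the semicolon.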
<h4>
Answers to explicit questions from Olivier</h4>
<ul>
<li>
for (;;); : in this case the for statement should cover the semicolon.
The best way to achieve this is to have an empty statement as defined in
the grammar.</li>
<li>
declaration source start of an argument: yes, Adam uses argument.type.sourceStart
as the start not declarationSourceStart.</li>
<li>
tests: the tests we have are the refactoring tests. We don't have special
tests that check whether the AST positions are correct.</li>
</ul>
<br>
<br>
</body>
</html>