Synopsis - Issue Tracker

Task25 Editing

Title implement symbol tables / symbol lookup for the C++ parser
Created on 2004-10-01.13:47:06 by stefan, last changed 2005-06-19.02:27:27. by stefan
assigned to stefan, volodya priority normal
type design solves
topic components C++ parser
status closed result done
depends on superseded by
Add Comment:
File
CC List:? stefan, volodya

Messages
Author: stefan Date: 2004-10-01.13:47:06
In lack of documentation, here are some general 
ideas:

The PTree::Encoding class holds encoded types /
names for declarators. Instances are generated by
the parser and stored as members of specific 
PTree::Node objects.
I'v added Symbol and Scope classes that I intend
to use for symbol lookup. The idea is to have
a 'Scope' base class that provides (abstract)
methods to store and lookup symbols based on 
encodings.
The Scope class then will be subclassed to provide
specific lookup semantics polymorphically 
depending on context and language (I'm hoping to 
be able to support C++ as well as C).

As far as the Symbol itself is concerned, I'm 
storing right now only its (meta)type, i.e. 
whether it refers to a type, variable, constant, 
etc., as well as its declarator (i.e. ptree 
reference).

This is all still pretty rough, and so I'd 
appreciate if the approach as a whole could be 
validated.
Then we may think about setting up more unit tests
that will show how symbol lookup works in specific 
contexts such as classes involving inheritance and 
template parameters, etc., etc.
Author: volodya Date: 2004-10-01.14:50:49
I surely can't prove the approach is valid, but I've tried the same one here:
https://zigzag.cs.msu.su:7813/svn/ghost/trunk/ocean/freeplusplus/src/scopes.h
https://zigzag.cs.msu.su:7813/svn/ghost/trunk/ocean/freeplusplus/src/scopes_test.cpp

Now I think we need separate lookup for qualified and nonqualified names. For 
example, when looking up 'n' in context of namespace 'N', parent namespaces 
of 'N' are considered. When doing lookup of 'N::n', they are not.

Tests for lookup logic would be great. In what order we'll be testing that? Say, first namespaces, then classes?
Author: stefan Date: 2004-10-01.15:20:53
Encodings can hold qualified names, so we don't 
need to override the 'lookup' method. Of course, 
we can provide private methods 'lookup_qualified' 
and 'lookup_unqualified' to which the 'lookup' can 
delegate depending on the encoding.

As to the tests, there's already the 
'OpenCxx.Scope' test suite which we may add new 
tests to. It would be best to provide descriptive 
names for the tests (that translates to test input 
files for this suite), and then may be put 
comments into the input file explaining the rules 
we try to test (may be with references to the 
appropriate specs).

Initially they will certainly all fail, but having
them in place will be a wonderful tool to measure 
the progress we are making in implementing symbol 
lookup.
Author: volodya Date: 2004-10-02.08:55:03
I'm lost what "Encoding" mean. Is that a 'name'? Is Encoding passed to the lookup method?

As I've mentioned, the problem with OpenCxx.Scope suite is that it dumps the scope, but does not do any lookup. We'd need some way to do lookups from tests. Something like

namespace N
{
     ////lookup N::n
}
and check the output.
Author: stefan Date: 2004-10-02.16:33:24
Vladimir Prus via synopsis issue tracker wrote:
> Vladimir Prus <ghost@cs.msu.su> added the comment:
> 
> I'm lost what "Encoding" mean. Is that a 'name'? Is Encoding passed to the lookup method?

PTree::Encoding is a type (using std::basic_string<unsigned char> as its buffer) that
holds encoded types / names. Details are documented in the code (and extracted into the
ref manual).
The Parser creates Encodings and passes them to the PTree::Nodes during the parsing.
It can then be used to look up symbols. Currently a function's name isn't mangled.
So to look up functions unambiguously, you have to look at both, the encoded name and
the encoded type. We'll have to think about how to deal with that when looking for functions
in a given scope, as sometimes you know the signature you are looking for, sometimes you don't.

> As I've mentioned, the problem with OpenCxx.Scope suite is that it dumps the scope, 
> but does not do any lookup. We'd need some way to do lookups from tests. Something like
> 
> namespace N
> {
>      ////lookup N::n
> }
> and check the output.

Indeed. Should that be another test suite ? Or may be once we have that the 'OpenCxx.Scope'
suite isn't useful any more, so we should instead change the OpenCxx.Scope executable to
do lookup and not dump the whole scope any more.
Author: stefan Date: 2004-10-02.21:37:21
to get an idea of the current parsing capabilities
I suggest you compile the 'display-ptree' applet:
Once you'v got everything built, call 'make -C 
build/temp.<platform>/Synopsis/Parsers/Cxx 
display-ptree'. Use this to parse and dump 
arbitrary C++ source code.

Use it like 'display-ptree [-t] [-r] <input>'
where the -t option will print encodings
and -r the typeid().name() of the ptree nodes.

That will get you started as to what encodings
contain right now. (See the PTree/Display.cc file
for the code that is used to dump the tree)
Author: volodya Date: 2004-10-04.06:27:37
I've compiled 'display-ptree' it, but when I try to parse:

namespace A
{
    int f;
}

I get:
./display-ptree -t a.cpp
[namespace A [{
 [nil [int] [#i@[1]f[f]] ;]
}]]
unknown:5: parse error before `} }'
Author: volodya Date: 2004-10-04.06:32:27
Stefan Seefeld via synopsis issue tracker wrote:

> PTree::Encoding is a type (using std::basic_string<unsigned char> as its
> buffer) that holds encoded types / names. Details are documented in the
> code (and extracted into the ref manual).

Ok, so it's type/name combo.

> The Parser creates Encodings and passes them to the PTree::Nodes during the
> parsing. It can then be used to look up symbols. Currently a function's
> name isn't mangled. So to look up functions unambiguously, you have to look
> at both, the encoded name and the encoded type. We'll have to think about
> how to deal with that when looking for functions in a given scope, as
> sometimes you know the signature you are looking for, sometimes you don't.

When do you know the signature you're looking for? I think the only case where 
signature is provided is (void (*)(int)(&foo)

> > As I've mentioned, the problem with OpenCxx.Scope suite is that it dumps
> > the scope, but does not do any lookup. We'd need some way to do lookups
> > from tests. Something like
> >
> > namespace N
> > {
> >      ////lookup N::n
> > }
> > and check the output.
>
> Indeed. Should that be another test suite ? Or may be once we have that the
> 'OpenCxx.Scope' suite isn't useful any more, so we should instead change
> the OpenCxx.Scope executable to do lookup and not dump the whole scope any
> more.

I think we should change OpenCxx.Scope. The primary question is what 
directives to insert in code which will trigger lookup. Are comments 
represented in AST, so that we can make a visitor which will visit each 
comment, looking for comments of some special form?
Author: stefan Date: 2004-10-04.12:34:47
Vladimir Prus via synopsis issue tracker wrote:

> I've compiled 'display-ptree' it, but when I try to parse:
> 
> namespace A
> {
>     int f;
> }
> 
> I get:
> ./display-ptree -t a.cpp
> [namespace A [{
>  [nil [int] [#i@[1]f[f]] ;]
> }]]
> unknown:5: parse error before `} }'

Funny, I can't confirm that. I c&p'ed the four lines into
a new file and everything looks good. Note that right now
the parse() method returns individual statements, not a
list of all statements found in the file (I'll change that
soon). That's why you see the dump of the first statement
(the namespace definition), but something at the end of
the file is causing the trouble. No idea what, though.
Author: volodya Date: 2004-10-04.12:41:03
Huh, the problem was with missing newline at the end of the file ;-)
Author: stefan Date: 2004-10-04.23:23:39
Vladimir Prus via synopsis issue tracker wrote:

> I think we should change OpenCxx.Scope. The primary question is what 
> directives to insert in code which will trigger lookup. Are comments 
> represented in AST, so that we can make a visitor which will visit each 
> comment, looking for comments of some special form?

that would be a possibility. On the other hand, why do we need special
markup for that ? Why can't we just look for all identifiers that are not
declarators ?

Some issues that we have to resolve before any of this can work:

* We need to be able to generate a fully qualified name from any symbol
   returned by a lookup.

* As an extension of the above requirement: we need to define a 'scope
   type tree' in particular to contain 'named scopes' such as namespaces
   and classes/structs/unions.

* We need a way to access the current scope that applies while we are
   traversing the parse tree. This is more or less easy during the parse,
   as there we track the 'current scope'. But how can that be reproduced
   in a separate pass ?
Author: volodya Date: 2004-10-05.09:12:35
Stefan Seefeld via synopsis issue tracker wrote:

> Why can't we just look for all identifiers that are not
> declarators ?

In which scopes? Or you mean, for all identifiers in all scopes?

> Some issues that we have to resolve before any of this can work:
>
> * We need to be able to generate a fully qualified name from any symbol
>    returned by a lookup.

I think that basically, the Symbol class should have a 'name' member. Now it 
only has 'type' member which returns Encoding, and it's not obvious how to 
extact fully qualified name from there.

Alternatively, we can just output the like where the looked up symbols are 
defined.

> * As an extension of the above requirement: we need to define a 'scope
>    type tree' in particular to contain 'named scopes' such as namespaces
>    and classes/structs/unions.

That's true.

> * We need a way to access the current scope that applies while we are
>    traversing the parse tree. This is more or less easy during the parse,
>    as there we track the 'current scope'. But how can that be reproduced
>    in a separate pass ?

I don't know about of AST. If namespaces, classes, function bodies, etc are 
separate nodes, then a visitor can be written which which adjusts the scope 
as it traverses the tree.
Author: stefan Date: 2004-10-05.12:35:16
In my initial design the Symbol didn't contain a 
'name' member because that the user already knew as it was the key in the lookup.
Now that I realize that we need to access the 
fully qualified name, I think Symbol should 
contain a 'name' member of type Encoding that is 
the fully qualified name, probably constructed 
when the Symbol is created / inserted into a 
scope.

Yes, I agree a Visitor can keep track of the 'current scope' in the unit test.
Author: stefan Date: 2004-10-07.15:42:33
I just run into the papers listed here:
http://www.cs.may.ie/~jpower/Research/Papers/2000/
FWIW...
Author: volodya Date: 2004-10-08.09:27:18
Stefan Seefeld via synopsis issue tracker wrote:

> I just run into the papers listed here:
> http://www.cs.may.ie/~jpower/Research/Papers/2000/
> FWIW...

Interesting. I've printed two of them, and will read in the evening.
Author: stefan Date: 2004-10-09.19:51:46
As I'm thinking about the 'Scope' hierarchy, I'm
wondering how to name these classes, what 
namespace to put them in, etc.
Up to now I'v lumped low level things into the 
'PTree' namespace but symbol lookup seems to be
a self-contained module worth its own namespace.
However: there are a number of interdependencies
between Scopes and PTree::Nodes, so it's not clear
exactly how to proceed. Do you have suggestions ?
Author: volodya Date: 2004-10-12.08:03:19
Stefan Seefeld via synopsis issue tracker wrote:

> As I'm thinking about the 'Scope' hierarchy, I'm
> wondering how to name these classes, what
> namespace to put them in, etc.
> Up to now I'v lumped low level things into the
> 'PTree' namespace but symbol lookup seems to be
> a self-contained module worth its own namespace.
> However: there are a number of interdependencies
> between Scopes and PTree::Nodes, so it's not clear
> exactly how to proceed. Do you have suggestions ?

You mean,  that PTree:: classes should contain pointer to Scope, and 
Scope::lookup must return PTree:: classes?

Hmm... I'd prefer Scope hierarchy to be completely self-contained and 
independent. Maybe, we can make the 'lookup' method return boost::any, or 
void*, just to break the dependency.

There's another question relevant to Scope::lookup. How do you pass N::n<1>::m 
to it? As you can guess I would not like Scope::lookup to accept Encoding 
(and dependend on all PTree::). But another issue is that qualified name must 
be parsed element-wise.

Once you've parsed 'N::' the "is-is-typename" query should operate in "N::" 
scope -- to make sure 'n' is correctly classified as template name. I see two 
alternatives:

1. You lookup "N::" and change the current scope. Then 'n' is recognized the 
template name, "n<1>" is parsed, and you lookup "N::n<1>", and change the 
scope again and then you lookup N::n<1>::m. In other words, for each level 
you do lookup with a fully qualified name gathered so far.

2. The lookup is done incrementally. You have a class 'NameLooker' and you 
feed "N", "n", "<1>", "m" to it, and it returns the current scope.

3. The reason why you can't just change current scope after seeng "N::" and 
lookup 'n' (unqualified) there, is that rules for qualified lookup are 
different. However, probably it's possible to have "QualifiedNamespaceScope" 
class which will inherit from "NamespaceScope" and will do qualified lookup. 
Then, after "N::" is parsed the current scope will point to 
QualifiedNamespaceScope instance and everything will work.

(I think we need more, much more usecases)
Author: stefan Date: 2004-10-12.11:35:40
Vladimir Prus via synopsis issue tracker wrote:

> Hmm... I'd prefer Scope hierarchy to be completely self-contained and 
> independent. Maybe, we can make the 'lookup' method return boost::any, or 
> void*, just to break the dependency.

To break the circular dependency the PTree module shouldn't depend on Scopes,
but Scopes (which I'v tentatively put into the 'SymbolTable' module) have
to depend on PTree, as that's what Scopes and Symbols operate on.

> There's another question relevant to Scope::lookup. How do you pass N::n<1>::m 
> to it? As you can guess I would not like Scope::lookup to accept Encoding 
> (and dependend on all PTree::). But another issue is that qualified name must 
> be parsed element-wise.

ok, I understand your point, and agree. So we need two 'lookup' methods, i.e.
lookup_qualified() and lookup_unqualified() (say). I'm not sure about the
Encoding yet. I agree we have to clarify a lot more what Encodings are so
we can see whether we can enhance the semantics through a better design,
but the question seems mostly unrelated to the SymbolTable / lookup issues,
so we can keep the Encoding discussion on task27.

> (I think we need more, much more usecases)

I'll pull out the iso standard and copy the examples that are used in one
of the papers into the test suite. That will be a start...
Author: stefan Date: 2004-10-13.05:01:47
Attached is a little diagram showing the processing of

class A
{
public:
   static int n;
};
int main()
{
   int A;
   A::n = 42; // OK
   A b;       // ill-formed: A does not name a type
}

(from section 3.4.3/1 from the standard)

The diagram is based on the 'Modeling Name Lookup in C++'
paper, but adjusted / refined to my design proposal.

Vladimir Prus via synopsis issue tracker wrote:

> 3. The reason why you can't just change current scope after seeng "N::" and 
> lookup 'n' (unqualified) there, is that rules for qualified lookup are 
> different. However, probably it's possible to have "QualifiedNamespaceScope" 
> class which will inherit from "NamespaceScope" and will do qualified lookup. 
> Then, after "N::" is parsed the current scope will point to 
> QualifiedNamespaceScope instance and everything will work.

You'v got a scope as a (persistent) container that encodes the specific
semantics of the search. As the search algorithm also depends on the
argument, you somehow want to change one scope type into another ?
Sorry, I'm a bit lost here.

Here is my model (see attached diagram):

1. the parser asks the current (local) scope for 'A::n'
2. LocalScope realizes that the name is qualified, and
    therefor looks for a scope named 'A' (i.e. it ignores
    objects, functions, enumerators with that name)
3. as it can't find a scope name 'A', it passes the lookup
    to its enclosing scope (the global scope)
4. the global scope finds a scope (here: class) with name 'A'
5. the global scope now asks class A for the symbol 'n',
    and to restrict the search space to A's own scope it calls
    'lookup_qualified'.
6. A finds 'n' ...
7. ...and returns it

 > (I think we need more, much more usecases)

I'm in the process of integrating the tests I found at
http://www.cs.clemson.edu/~malloy/projects/ddj/ .
Author: volodya Date: 2004-10-14.01:54:22
Stefan Seefeld via synopsis issue tracker wrote:

> To break the circular dependency the PTree module shouldn't depend on
> Scopes, but Scopes (which I'v tentatively put into the 'SymbolTable'
> module) have to depend on PTree, as that's what Scopes and Symbols operate
> on.

I think that's OK, too

> > There's another question relevant to Scope::lookup. How do you pass
> > N::n<1>::m to it? As you can guess I would not like Scope::lookup to
> > accept Encoding (and dependend on all PTree::). But another issue is that
> > qualified name must be parsed element-wise.
>
> ok, I understand your point, and agree. So we need two 'lookup' methods,
> i.e. lookup_qualified() and lookup_unqualified() (say). 

Hmm.. the papers you've linked say there are some other cases, but I haven't 
looked closure.

> I'm not sure about 
> the Encoding yet. I agree we have to clarify a lot more what Encodings are
> so we can see whether we can enhance the semantics through a better design,
> but the question seems mostly unrelated to the SymbolTable / lookup issues,
> so we can keep the Encoding discussion on task27.

Agreed.
Author: stefan Date: 2004-10-28.21:16:41
I'v reorganized the code a little bit:

* the module is called 'ScopeLookup'
* a 'ScopeLookup::Table' is provided that acts as
  a facade for the parser to use to minimize 
  dependencies and allow polymorphism in the
  implementation, for example depending on the
  language / dialect
* a 'dump-symbols' demo applet is provided that 
  will dump the scope tree, as an aid while 
  declaring scopes from within the parser
* a Scope contains a 'ScopeTable' that maps
  PTree::Nodes to nested scopes (to be used by
  visitors that traverse the parse tree and the
  scope tree in parallel
Author: stefan Date: 2004-11-18.20:58:46
Even though this might be considered premature
optimization, I'm wondering how best to implement
the per-scope dictionary:
If we use the unencoded symbol name as key, we
need to provide the equivalent of a multimap to
allow for function overloading.
Alternatively we have to provide an intelligent
comparison operator for the key type.
If we use the multimap approach, there's still 
room for improvement as overloading only applies to functions, and thus most (?) symbols only 
appear once. May be some 'proxy symbol' that 
points to a list of (function name) symbols could
be used instead, just to keep the overhead small.
Thoughts ?
Author: volodya Date: 2004-11-20.12:05:41
What about:

   void foo(int) {}

   template<class T>
   void bar() { foo(10); }

   void foo(double) {}

Here, the call of 'foo' will see only first overload, no matter where 'bar' is 
instantiated. If we want to capture accurate meaning of names, we should 
handle this. I'm thinking that symbol table must map name to a list of names. 
New symbols would be added to the front of the list. Then, after looking up 
'foo' we'll assign pointer to the head of current symbols list of 'foo' to 
that name occurence. After adding second 'foo', the set of symbols assigned 
to name occurence won't change.

Also, I would not try to separate functions and non-functions. After all, what 
if one day standard will say function objects should participate in overload 
resolution? I've heard some suggestions to that effect.
Author: stefan Date: 2004-11-20.18:18:50
You raise a couple of issues in a single message. Let me try to respond
to the individual points:

It seems clear to me that symbol lookup has to take into account the
exact point at which a symbol was referenced, or more to the point,
the state of the symbol lookup table at that moment. I'm not sure yet
how that can translate to an actual design. The simplest thing seems
to lookup the symbol 'just in time', i.e. during the parsing when the
table has only populized by the declarations parsed up to this point.
(A notably exception to this is class scope where a function can call
  a method of the same class even if it hasn't yet been declared.)

I'm not sure about the rules for 'two phase lookup', i.e. in your example:
what role could the type of 'T' play in your example to look up 'foo' ?

As to separation (or not) of functions and non-functions, you guess you
are referring to the Symbol class hierarchy. That's an interesting question
in itself, as I'm currently struggling to understand whether 'ConstName'
should really be a distinct type (which seems to make sense insofar as
const integral values can hold initializers, i.e. have a value known
at compile time (and thus be used as a discriminator for types (array types,
as well as template instantiations).
Author: stefan Date: 2004-12-07.15:45:37
Working on the distinction between declarations / 
definitions, etc., I'm wondering: what information
does the symbol table need to represent ? Should 
it record all declarations ? Only the first one 
and possibly the definition (if found) ?

Or should we keep this kind of feature (i.e. the 
ability to look up a symbol's definition given a 
declaration) to a separate representation ?

By the way, an attempt to re-declare a symbol with
a conflicting type / storage class / etc. will 
just throw. 
Right now the parser catches that and tries to 
continue to let the user get a set of errors at 
once, not one on each run. See task30...
Author: stefan Date: 2005-01-25.04:37:28
Here is a little update on the symbol lookup work:

I worked a bit more on the SymbolLookup API to
implement the basics for qualified and unqualified
lookup.
So far it is working nicely, i.e. the two first
tests pass. ('pass' in the sense that running the
SymbolLookup binary on the two input files
generates output that looks as if the module is
doing the right thing. I have yet to decide how
to format the output to make it most useful.)
Author: stefan Date: 2005-02-10.15:14:05
While looking a bit closer at the issues involved in
implementing overload resolution I realize how much
work that actually is (quite complex type analysis).

Therefor, I'm tempted to put that into its own module, i.e. let symbol lookup return all matched 
functions and declare victory (for as far as symbol
lookup is concerned).
Does this appear to be a sensible choice ? Or would
there be reasons not to separate symbol lookup
and overload resolution ?
Author: stefan Date: 2005-06-19.02:27:27
While this surely requires more work to be considered complete, I would like to declare this
task done and open new bug reports whenever a
specific issue comes up.
Files
File nameUploaded
lookup-qual.png stefan, 2004-10-13.05:01:47
History
Date User Action Args
2004-10-01 13:47:07stefancreate
2004-10-01 13:51:24stefanlinktask23 superseder
2004-10-01 14:50:52volodyasetmessages: + msg268
2004-10-01 15:20:54stefansetmessages: + msg269
2004-10-01 15:54:07stefanlinktask15 dependson
2004-10-02 08:55:05volodyasetmessages: + msg273
2004-10-02 16:33:27stefansetmessages: + msg274
2004-10-02 21:37:22stefansetmessages: + msg275
2004-10-04 06:27:38volodyasetmessages: + msg276
2004-10-04 06:32:28volodyasetmessages: + msg277
2004-10-04 12:34:48stefansetmessages: + msg278
2004-10-04 12:41:08volodyasetmessages: + msg279
2004-10-04 23:23:39stefansetmessages: + msg281
2004-10-05 09:12:35volodyasetmessages: + msg283
2004-10-05 12:35:17stefansetmessages: + msg285
2004-10-07 15:42:34stefansetmessages: + msg290
2004-10-08 09:27:19volodyasetmessages: + msg292
2004-10-09 19:51:47stefansetmessages: + msg301
2004-10-12 08:03:20volodyasetmessages: + msg303
2004-10-12 11:35:41stefansetmessages: + msg304
2004-10-13 05:01:48stefansetfiles: + lookup-qual.png
messages: + msg306
2004-10-14 01:54:23volodyasetmessages: + msg307
2004-10-21 15:32:51stefanlinktask32 dependson
2004-10-28 21:16:42stefansetmessages: + msg314
2004-11-18 20:58:47stefansetmessages: + msg316
2004-11-20 12:05:41volodyasetmessages: + msg319
2004-11-20 18:18:50stefansetmessages: + msg320
2004-12-07 15:45:38stefansetmessages: + msg332
2005-01-25 04:37:29stefansetmessages: + msg351
2005-02-10 15:14:06stefansetmessages: + msg368
2005-06-19 02:27:27stefansetstatus: open
messages: + msg451
result: (no value)