Archived on: 2019/10/18 22:06:02 Time: 2013-01-26 20:57:49 Tags: programming-languages,language-design,grammar

I just came across this question in the Go FAQ, and it reminded me of something that's been bugging me for a while. Unfortunately, I don't really see what the answer is getting at.

It seems like almost every non-C-like language puts the type after the variable name, like so:

var : int

Just out of sheer curiosity, why is this? Are there advantages to choosing one or the other?

There is a parsing issue, as Keith Randall says, but it isn't what he describes. The "not knowing whether it is a declaration or an expression" simply doesn't matter - you don't care whether it's an expression or a declaration until you've parsed the whole thing anyway, at which point the ambiguity is resolved.

Using a context-free parser, it doesn't matter in the slightest whether the type comes before or after the variable name. What matters is that you don't need to look up user-defined type names to understand the type specification - you don't need to have understood everything that came before in order to understand the current token.

Pascal syntax is context-free - if not completely, at least WRT this issue. The fact that the variable name comes first is less important than details such as the colon separator and the syntax of type descriptions.

C syntax is context-sensitive. In order for the parser to determine where a type description ends and which token is the variable name, it needs to have already interpreted everything that came before so that it can determine whether a given identifier token is the variable name or just another token contributing to the type description.
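A minimal C++ sketch of that ambiguity (the same holds in C). The names `T`, `x`, `as_declaration` and `as_expression` are made up for illustration. The token sequence `T * x;` appears in both functions, and what it means depends entirely on how `T` was declared earlier:

```cpp
#include <cassert>

// Here T names a type, so `T * x;` declares x as a pointer to int.
int as_declaration() {
    typedef int T;
    int target = 5;
    T * x;          // declaration: x has type int*
    x = &target;
    assert(*x == 5);
    return *x;
}

// Here T names a variable, so the same tokens form a multiplication
// whose result is discarded.
int as_expression() {
    int T = 6;
    int x = 7;
    T * x;          // expression statement: computes 42, then throws it away
    return T * x;
}
```

A compiler will probably warn about the unused result in the second function, which is itself a hint that it had to consult the symbol table to decide what the line was.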

Because C syntax is context-sensitive, it is very difficult (if not impossible) to parse using traditional parser-generator tools such as yacc/bison, whereas Pascal syntax is easy to parse using the same tools. That said, there are parser generators now that can cope with C and even C++ syntax. Although it's not properly documented or in a 1.? release etc, my personal favorite is Kelbt, which uses backtracking LR and supports semantic "undo" - basically undoing additions to the symbol table when speculative parses turn out to be wrong.

In practice, C and C++ parsers are usually hand-written, mixing recursive descent and precedence parsing. I assume the same applies to Java and C#.

Incidentally, similar issues with context sensitivity in C++ parsing have created a lot of nasties. The "Alternative Function Syntax" for C++0x is working around a similar issue by moving a type specification to the end and placing it after a separator - very much like the Pascal colon for function return types. It doesn't get rid of the context sensitivity, but adopting that Pascal-like convention does make it a bit more manageable.
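For reference, that syntax was standardized in C++11 as the trailing return type. A small sketch of one commonly cited use, with the made-up names `add` and `trailing_demo`: the return type is written after the parameter list, so it can refer to the parameters by name.

```cpp
#include <cassert>
#include <type_traits>

// With the return type in front, a and b are not yet in scope, so
// `decltype(a + b) add(A a, B b)` would not compile. The trailing form
// puts the type after the parameters, where both names are visible.
template <typename A, typename B>
auto add(A a, B b) -> decltype(a + b) {
    return a + b;
}

// The return type follows the usual arithmetic conversions.
static_assert(std::is_same<decltype(add(1, 2.5)), double>::value,
              "int + double yields double");
static_assert(std::is_same<decltype(add(1, 2)), int>::value,
              "int + int yields int");

int trailing_demo() {
    assert(add(2, 3) == 5);
    return add(2, 3);
}
```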

The 'most other' languages you speak of are those that are more declarative. They aim to allow you to program more along the lines you think in (assuming you aren't boxed into imperative thinking).

Type last reads as 'create a variable called NAME of type TYPE'.

This is of course the opposite of saying 'create a TYPE called NAME', but when you think about it, what the value is for is more important than the type; the type is merely a programmatic constraint on the data.

An increasing trend is to not state the type at all, or to optionally state the type. This could be a dynamically typed language where there really is no type on the variable, or it could be a statically typed language which infers the type from the context.

If the type is sometimes given and sometimes inferred, then it's easier to read if the optional bit comes afterwards.

There are also trends related to whether a language regards itself as coming from the C school or the functional school or whatever, but these are a waste of time. The languages which improve on their predecessors and are worth learning are the ones that are willing to accept input from all different schools based on merit, not be picky about a feature's heritage.

If the name of the variable starts at column 0, it's easier to find the name of the variable.

Compare

QHash<QString, QPair<int, QString> > hash;

and

hash : QHash<QString, QPair<int, QString> >;

Now imagine how much more readable your typical C++ header could be.
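Modern C++ can get part of the way there without a new syntax. A rough sketch, assuming `std::map` as a stand-in for Qt's `QHash` so the example needs only the standard library; `Table` and `name_first_demo` are made-up names. Writing `auto name = Type{}` keeps the variable name near the left margin and pushes the long type to the right:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// An alias gives the long type a short name once.
using Table = std::map<std::string, std::pair<int, std::string>>;

int name_first_demo() {
    // Type first: the name `a` sits at the far end of the line.
    std::map<std::string, std::pair<int, std::string>> a;

    // Name near the left margin, type spelled on the right.
    auto b = std::map<std::string, std::pair<int, std::string>>{};
    auto c = Table{};

    a["x"] = std::make_pair(1, std::string("one"));
    b["x"] = std::make_pair(1, std::string("one"));
    c["x"] = std::make_pair(1, std::string("one"));

    // All three variables have the same type and the same contents.
    assert(a == b && b == c);
    return static_cast<int>(c.size());
}
```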

In formal language theory and type theory, it's almost always written as var: type. For instance, in the typed lambda calculus you'll see proofs containing statements such as:

x : A   y : B
-------------
 \x.y : A->B

I don't think it really matters, but I think there are two justifications: one is that "x : A" is read "x is of type A", the other is that a type is like a set (e.g. int is the set of integers), and the notation is related to "x ∈ A".

Some of this stuff pre-dates the modern languages you're thinking of.

"Those who cannot remember the past are condemned to repeat it."

Putting the type before the variable started innocuously enough with Fortran and Algol, but it got really ugly in C, where some type modifiers are applied before the variable, others after. That's why in C you have such beauties as

int (*p)[10];

or

void (*signal(int x, void (*f)(int)))(int)

together with a utility (cdecl) whose purpose is to decrypt such gibberish.

In Pascal, the type comes after the variable, so the first example becomes

p: pointer to array[10] of int

Contrast with

q: array[10] of pointer to int

which, in C, is

int *q[10]

In C, you need parentheses to distinguish this from int (*p)[10]. Parentheses are not required in Pascal, where only the order matters.
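The "only the order matters" reading can be imitated in C++ with alias templates. This is just a sketch: `pointer_to`, `array_of` and `order_demo` are hypothetical names, not anything from the standard library. The `static_assert` lines check that the left-to-right spellings really are the same types as the parenthesized C forms.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper aliases that spell types left to right.
template <typename T> using pointer_to = T*;
template <typename T, std::size_t N> using array_of = T[N];

// "pointer to array[10] of int" is C's int (*)[10].
static_assert(std::is_same<pointer_to<array_of<int, 10>>, int (*)[10]>::value, "");
// "array[10] of pointer to int" is C's int *[10].
static_assert(std::is_same<array_of<pointer_to<int>, 10>, int *[10]>::value, "");

int order_demo() {
    int row[10] = {0};
    int value = 7;

    pointer_to<array_of<int, 10>> p = &row;      // same as: int (*p)[10] = &row;
    array_of<pointer_to<int>, 10> q = {&value};  // same as: int *q[10] = {&value};

    (*p)[3] = 4;             // writes through the pointer into row
    assert(row[3] == 4);
    return (*p)[3] + *q[0];  // 4 + 7
}
```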

The signal function would be

signal: function(x: int, f: function(int) to void) to (function(int) to void)

Still a mouthful, but at least within the realm of human comprehension.
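For comparison, here is roughly how the same declaration can be tamed in C++ with a type alias and a trailing return type. Everything here is illustrative: `handler`, `my_signal`, `c_style_signal`, `on_event` and `signal_demo` are made-up names, and `my_signal` only mimics the shape of the real `signal`, not its behavior.

```cpp
#include <cassert>
#include <type_traits>

// A name for "pointer to function taking int and returning void".
using handler = void (*)(int);

// The C-style spelling, declared only so its type can be compared below.
void (*c_style_signal(int x, void (*f)(int)))(int);

// Hypothetical stand-in: stores the new handler and returns the previous one.
static handler current = nullptr;

auto my_signal(int /*signum*/, handler f) -> handler {
    handler previous = current;
    current = f;
    return previous;
}

// Both declarations describe exactly the same function type.
static_assert(std::is_same<decltype(my_signal), decltype(c_style_signal)>::value, "");

void on_event(int) {}

int signal_demo() {
    handler first = my_signal(2, on_event);  // nothing was installed before
    handler second = my_signal(2, nullptr);  // returns on_event, resets state
    assert(first == nullptr);
    return second == on_event ? 1 : 0;
}
```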

In fairness, the problem isn't that C put the types before the name, but that it perversely insists on putting bits and pieces before, and others after, the name.

But if you try to put everything before the name, the order is still unintuitive:

int [10] a // an int, ahem, ten of them, called a
int [10]* a // an int, no wait, ten, actually a pointer thereto, called a

So, the answer is: A sensibly designed programming language puts the variables before the types because the result is more readable for humans.

It's just how the language was designed. Visual Basic has always been this way.

Most (if not all) curly brace languages put the type first. This is more intuitive to me, as the same position also specifies the return type of a method. So the inputs go into the parentheses, and the output goes out the back of the method name.

I'm not sure, but I think it's got to do with the "name vs. noun" concept.

Essentially, if you put the type first (such as "int varname"), you're declaring an "integer named 'varname'"; that is, you're giving an instance of a type a name. However, if you put the name first, and then the type (such as "varname : int"), you're saying "this is 'varname'; it's an integer". In the first case, you're giving an instance of something a name; in the second, you're defining a noun and stating that it's an instance of something.

It's a bit like if you were defining a table as a piece of furniture; saying "this is furniture and I call it 'table'" (type first) is different from saying "a table is a kind of furniture" (type last).

I always thought the way C does it was slightly peculiar: instead of constructing types, the user has to declare them implicitly. It's not just before/after the variable name; in general, you may need to embed the variable name among the type attributes (or, in some usage, to embed an empty space where the name would be if you were actually declaring one).

As a weak form of pattern-matching, it is intelligible to some extent, but it doesn't seem to provide any particular advantages, either. And, trying to write (or read) a function pointer type can easily take you beyond the point of ready intelligibility. So overall this aspect of C is a disadvantage, and I'm happy to see that Go has left it behind.

Putting the type first helps in parsing. For instance, suppose C declared variables like this:

x int;

When you have parsed just the x, you don't yet know whether x begins a declaration or an expression. In contrast, with

int x;

When you parse the int, you know you're in a declaration (types always start a declaration of some sort).

Given progress in parsing languages, this slight help isn't terribly useful nowadays.

Fortran puts the type first:

REAL*4 I,J,K
INTEGER*4 A,B,C

And yes, there's a (very feeble) joke there for those familiar with Fortran.

There is room to argue that this is easier than C, which puts the type information around the name when the type is complex enough (pointers to functions, for example).

What about dynamically (cheers @wcoenen) typed languages? You just use the variable.