Sunday, November 30, 2008

On implicit self, v2.0

Vassili Bykov posted a reply on his blog to my previous post about the use of implicit self, so I will reply to his reply on mine. I usually reread what I write and make corrections, so I have a feeling this post will be edited heavily. Please make sure you get the latest version.

Regarding implicit self in Newspeak, Vassili writes:

As far as Newspeak is concerned, what we have is not “implicit self”, and its purpose is not saving keystrokes. What Newspeak has are implicit receivers. Because of class nesting, a message with an implicit receiver may really be sent to an object different from the “real” self (the receiver of the current message). This feature is very important in supporting the minimalist module system of Newspeak. Thus, an implicit receiver is not simply an omitted self, and inserting “self” into a message send with an implicit receiver is not a behavior-preserving transformation.

OK, I did not understand implicit receivers in Newspeak, and therefore what I wrote regarding implicit self does not apply to them. However, how does my argument for consistency, to which Vassili replied with the text above, namely this,

I find that the consistency offered by a few keystrokes makes it easier for me to read and understand code faster and more accurately. Therefore, since we read code much more often than we write it, I think that favoring reading speed over typing speed is the right decision to make.

apply to implicit receivers in general? Doesn't the fact that the receiver is implicit become confusing over time? I think that, as Vassili says near the end of his post, perhaps experience will tell and unfortunately there is not a whole lot of it yet. On the other hand, maybe an example of what Vassili is referring to is in order.

The next piece from my post that Vassili quotes is this.

I’d rather see self than having to assume it by scanning the first token until the first occurrence of $: (or ‘::’) to only then be able to disambiguate between a receiver and a keyword.

In short: I prefer the work of my internal parser to be made easier by the use of prefixes, rather than to have to keep a stack that only goes down in the presence of a suffix.

Vassili's answer is the following.

This aurgmnet is falwed for the smiple raeson that our percpetion dose’nt wrok this way. We do’nt hvae an intarenl parser. What we rellay hvae culod be desrcbeid as a comlepx adpative, preditcive and bakcptaching paettrn recogznier. This is why we can still read the above even though most of the words are messed up.

Well, there is something worth noting. All the words that are misspelled have something in common: their first letter, a prefix, is always correct!

I've always felt disappointed when VW's spell checker tries to offer suggestions assuming the first letter is not at fault, and that the first letter is always present. When that does not happen, lewl, higstn moceeb rmoe tlufcidif ot dratsnedun. Tralinen erspra ro ton, het klac fo trecocr gliaden sortacindi edos esem ot ucaes tiandadiol sharphid*.

Jokes aside, I would suggest that although our idea of parser may be limited as implemented in a computer, we are quite able to draw distinctions on text based on a number of criteria. My observation was simply a matter of my personal preference.

I prefer the work of my internal parser to be made easier by the use of prefixes, rather than to have to keep a stack that only goes down in the presence of a suffix.

I still think it's worthy of consideration though, particularly because I am not sure Vassili's argument holds in every case:

We don’t scan the text linearly one character and one token at a time. Words are pictures, not character arrays.

Sure; however, sentences are read left to right, and the precedence of certain words does matter. My thing with $: (or '::') is that they are a suffix added on to a word, and that the presence of this suffix has the ability to change the sentence being read quite strongly: it controls whether the first word is a receiver or not.

To put it differently, when it comes to sentences that represent a message send, perhaps the issue here is that Smalltalk basically imposed that the most important thing is the receiver, and that is why it comes first in Smalltalk sentences. In fact, since receivers come first, it is not necessary to mark them with suffixes or anything else.

But in Newspeak this is not so. I was of the impression that implicit self (in Self) or receivers (in Newspeak) were a matter of economy of typing. To some extent I get the same impression from Vassili's comment here:

On the other hand, there are situations when they improve readability by eliminating noise. A good example are DSLs embedded in Newspeak. So far we have two such languages widely used in the system: Gilad’s parser combinators and my UI combinators in Hopscotch. The feature common to both are definitions written in a declarative style combining smaller things into larger ones. Compare an example of such a definition the way it’s commonly written:
heading: (
    row: {
        image: fileIcon.
        label: fileName.
    })
details:
    [column: folderContents]

with the same definition with explicit receivers:

self heading: (
    self row: {
        self image: self fileIcon.
        self label: self fileName.
    })
details:
    [self column: self folderContents]

The first example has nothing but the structure it defines. It’s important what the expressions say. The fact that they are message sends is an implementation detail. The second example leaks this implementation, and it takes some effort to see what it really says in between all the “self”s.

The effect I can't help seeing though is that the receiver appears to stop being the most important element of a sentence, so much so that sometimes it is implicit and it is not equivalent to self --- even though in the code above the implicit receiver is self.

How does Newspeak disambiguate between an implicit receiver of "self" and some other implicit receiver? Is the disambiguation cheap? Perhaps part of the answer is in Vassili's comment:

Those left unconvinced should also consider that modern IDEs, Newspeak’s Hopscotch included, do the parsing for you by colorizing the source code.

However, I find this particular argument unconvincing because, even though I did work on more than one project that had syntax coloring, I found it most useful when the code was convoluted. So is the coloring good in itself, or does it become valuable only when there are other things to consider, such as the inherent entropy of each symbol being read?**

But I digress. Personally, I would prefer the receiver to always be explicit, or at least the indication of whether there is a receiver first or not be a prefix, but what can I say... that's my biased preference today. I do not have a good record: 13 years ago I thought that programming assembler on my 386 was the greatest thing since sliced bread, and yet here I am writing books about Smalltalk... nevertheless, I hope that this is not seen in these terms:

There’s much to be said about the human nature and the tendency to instinctively resist change to something familiar while trying to rationalize that resistance.

Resistance to what? I am not observing the alleged change in my environment, so I cannot possibly be resisting it. The more interesting bit though is this.

It takes some time and experimenting to see a change for what it is and get a feel of the new tradeoffs.

So, I also hope that it is clear that some of the tradeoffs seen in Newspeak seem a bit strange to me at first sight. Not wrong, not incorrect, nor anything like that. Just not something I'd naturally think of today because my preferences are currently somewhere else, that's all.

Now, Gilad says in his presentations that one of the goals of Newspeak is to improve what was achieved with Smalltalk (and other languages such as Self). Well... perhaps the arguments are a bit too long to fit in 45 minutes or 2 hours, and so the essence behind them is missed. However, using implicit receivers for the sake of modularity (and to type less as a side effect)... it just makes me curious. What other alternatives were considered? What tradeoffs were attractive for this one as compared to the ones that were discarded?

To summarize: I think explicit receivers are better because sentences are less ambiguous and because a key distinction of a sentence, the receiver of the message, is always present in the same place. It also seems fitting that it comes first, given its importance. On the other hand, Newspeak's use of implicit receivers has the advantage of making it easier to implement a minimalist modularity scheme, and as a side effect you type considerably less in some cases.

Is that a fair assessment? Where do we go from here?

*: When that does not happen, well, things become more difficult to understand. Internal parser or not, the lack of correct leading indicators does seem to cause additional hardship.

**: Now talking exclusively about Smalltalk for a moment: if coloring is there and I can manage the namespace of a rather complex method better, does that end up helping me? Or does it simply make it easier for poorly written code to live on, thus making syntax coloring necessary and apparently useful? If methods are short and no more than 5 lines long, like we always say they should be, do we really need syntax coloring? Would we even care much about formatting? Which one is the egg and which one is the chicken?

And now talking about C: coloring really helps, but I think the existence of large files with lots of code and little to no visual cues as to where the boundaries between each of the pieces are is what makes coloring helpful in the first place. Nevertheless, I'd rather have a browser.

10 comments:

Russell said...

At least in Self, the receiver is always either (implicitly) self, an expression, or an object literal; i.e.,

a: 20 + b.
b printLine.
23 printLine.
('Stop', 'Hammertime!') printLine.
c: list copyWith: 'aaa'.

is:

self a: 20 + self b.
self b printLine.
23 printLine.
('Stop', 'Hammertime!') printLine.
self c: self list copyWith: 'aaa'.

It feels quite natural once you start using it.

Jecel said...

Actually, in Self (which I consider a Smalltalk and not a separate language, by the way) a missing receiver is more like "thisContext". Since the current context inherits from "self" this detail is not obvious, but when accessing arguments and temporaries it makes a difference.

But about implicit receivers, I have read and written a ton of code in Self and Slate and the only problem I have seen is that they make noticing a missing "." slightly harder than in Smalltalk-80.
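The idea that a missing receiver behaves like the current context, which in turn inherits from self, can be sketched roughly in Python. This is only an illustration under stated assumptions, not how Self is implemented: the `Activation` class, its `send` method, and the `Account` example are all hypothetical names, and real Self slots are richer than a dictionary of locals.

```python
class Activation:
    """Hypothetical model of a method activation ("thisContext").

    It holds the arguments and temporaries of the running method and
    falls back to the receiver ("self") for everything else.
    """

    def __init__(self, receiver, local_slots):
        self.receiver = receiver
        self.local_slots = dict(local_slots)

    def send(self, name):
        # Arguments and temporaries are found first, in the activation itself.
        if name in self.local_slots:
            return self.local_slots[name]
        # Otherwise the send is "inherited" by the receiver, as if it were self.
        return getattr(self.receiver, name)()


class Account:
    """Hypothetical receiver with two unary messages."""

    def balance(self):
        return 100

    def rate(self):
        return 2


# The temporary `rate` shadows the receiver's `rate` message,
# while `balance` is not a local and so reaches the receiver.
activation = Activation(Account(), {"rate": 5})
local_rate = activation.send("rate")
inherited_balance = activation.send("balance")
```

In this model the difference only shows up for names that are also arguments or temporaries; every other implicit send ends up at self.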

Andres said...

Jecel,

Doesn't it also allow ambiguity between a unary message and a name visible in the local scope of execution? For example,

negated := something negated.
^negated + 5

What is the answer in this case?

Andres.

Vassili said...

Hi Andres,

It's an almost fair assessment, except for the use of the word "better". My point is that your concerns are not nearly as serious as you believe. I've used the thing for nearly two years, so by saying this I'm sharing my experience and offering an explanation why, not making a hypothetical counter-argument.

On a more narrow subject: you still stick to computer analogies in reading text, and that is fundamentally wrong. The way humans read has nothing in common with linear scanning of characters. The words in my messed-up example were recognizable not only because the first characters were untouched. I also didn't touch the last characters and tried to keep the overall word shape the same. These are the primary characteristics that allow fast recognition of word forms. A word is taken in at once in one saccade, and a terminating colon is perceived immediately. You don't have to "scan to the end" to see if a word is an identifier or a keyword, you just see it. Even if you don't believe it. Exceptions are words too long to fit in the fovea (more than 10 characters or so for the usual reading conditions), but even so a colon falls in the right parafoveal area and is still recognizable thanks to its distinctive shape (some typefaces can make it easier or harder than others).

Andres said...

Gilad,

Vassili's post made me more aware of, and curious about, what I was missing. I did ask a number of questions regarding how implicit receivers work, particularly in Newspeak, and hopefully the answers will make things clearer for me.

Andres.

Andres said...

Vassili,

> I've used the thing for nearly two years, so by saying this I'm sharing my experience and offering an explanation why, not making a hypothetical counter-argument.

Please share the details of the experience. For example, what is the disambiguation process for determining whether an implicit receiver is self or something else? Do you have a clear example of how implicit receivers are used for the sake of modularity in mind that you might want to bring up?

Andres.

Jecel said...

Andres,

the following code in Self

| negated |
negated: something negated.
^negated + 5

does not cause any confusion for me or slow me down at all to think about it. I will add a SELF (meaning thisContext, as I said before) to the places where I imagine an implicit receiver:

| negated |
SELF negated: SELF something negated.
^ SELF negated + 5

It probably took me no more than an hour or so of Self programming to automatically reject SELF in the middle of "something negated", "negated +" and "+ 5". When you get to that point, the two negated that are a local variable and the one that isn't feel rather different. But not too different since the point of selecting the same name was probably to unify the concepts.

Think of this Smalltalk-80 expression, for example:

^(x + arg x)@(y + arg y)

The two "x" are different, but not too different.

Andres said...

Jecel,

How does Self determine whether the return in

| negated |
negated := "whatever".
^negated + 5

is the temporary variable negated + 5, as opposed to self negated + 5?... or is it the same thing because self meant "thisContext"? Is that what you mean?

Andres.

Gilad Bracha said...

Andres,

Sorry, I'm coming very late to this discussion. You ask about the semantics for implicit receiver sends.

Vassili has made a new post or two about implicit receivers. The semantics are described in the Newspeak spec.

There is also a paper at the Dyla workshop at ECOOP 2007. All of this is available at http://newspeakprogramminglanguage.org

The short answer is that it feels like lexical scope, even though it is entirely dynamic. If you see a name declared in the surrounding scope, and it is not shadowed, then that is what the name means. Otherwise, it is inherited from your superclass (i.e., lexical scope wins over inheritance).

My opinion (unbiased, of course) is that this is extremely natural, and helps structure code very nicely. The team agrees with me.

You'll be able to try it quite soon and decide for yourself.
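The lookup order in the short answer (lexical scope first, inheritance second) can be sketched in Python. This is a simplified model under stated assumptions, not the semantics from the Newspeak spec: `Scope`, `resolve_receiver`, and the example names are hypothetical, receivers are represented by plain strings, and details such as shadowing rules are ignored.

```python
class Scope:
    """Hypothetical lexical level: the names declared there and their owner."""

    def __init__(self, owner, declared, enclosing=None):
        self.owner = owner              # object that receives sends resolved here
        self.declared = set(declared)   # names declared directly at this level
        self.enclosing = enclosing      # next lexical level outward, or None


def resolve_receiver(name, scope, self_obj):
    """Pick the receiver for an implicit-receiver send of `name`."""
    level = scope
    while level is not None:
        # A lexically visible declaration wins over anything inherited.
        if name in level.declared:
            return level.owner
        level = level.enclosing
    # Nothing declared in any surrounding scope: an ordinary send to self,
    # which may then be answered by a superclass.
    return self_obj


# Hypothetical nesting: a class declared inside a module-like outer class.
module_scope = Scope("enclosing module object", {"fileIcon", "fileName"})
inner_scope = Scope("self", {"row"}, enclosing=module_scope)

row_receiver = resolve_receiver("row", inner_scope, "self")
icon_receiver = resolve_receiver("fileIcon", inner_scope, "self")
column_receiver = resolve_receiver("column", inner_scope, "self")
```

Here `row` is declared in the innermost class and `column` is not declared lexically at all, so both go to self, while `fileIcon` goes to the enclosing object. That last case is why inserting "self" in front of an implicit-receiver send is not always behavior-preserving.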

Andres said...

Gilad,

Thank you for your comments. You know, I've been thinking about namespaces (a la Smalltalk) lately, and I found the features of Newspeak quite interesting to consider. I'll try to do more of my homework first and then perhaps we can talk later.

Andres.