merged branch arnaud-lb/str-interpolation (PR #515)
Commits
-------
d41d10c simplification
2dbb420 moved hard-coded regexes
61986c1 shortcut for non-interpolated double quoted strings
062ed5c regex optimization
0e63abb integration tests for interpolated strings
773fff5 parser tests for interpolated strings
098634e parser support for interpolated strings
c8bba24 lexer tests for interpolated strings
a203686 lexer support for interpolated strings
0bd63dd stack lexer states
Discussion
----------
String interpolation syntax
This adds support for string interpolation:
```
{{ "foo #{ any expression here } bar" }}
```
I find string interpolation to be often more readable than concatenation, and doesn't have the precedence ambiguity of the concatenation operator:
```
Interpolation:
{{ "foo #{bar} baz" }}
{{ "foo #{1+2} baz" }}
{{ "foo #{1+2} baz"|escape }}
Concatenation:
{{ "foo "~bar~" baz" }}
{{ "foo "~(1+2)~" baz" }} extra parentheses
{{ ("foo "~(1+2)~" baz")|escape }} more extra parentheses
```
---------------------------------------------------------------------------
by nikic at 2011/11/12 12:41:54 -0800
Really nice idea. We should discuss the exact syntax though :) Dart for example uses `${expr}` for interpolation or `$name` if the expression is just a name. I think it makes sense to have a special short syntax for the name case, because it is the most common one.
---------------------------------------------------------------------------
by stof at 2011/11/12 13:06:51 -0800
@nikic could you comment on the PR instead of commenting on each commit individually ? it would make it easier to see what is concerned in the PR.
---------------------------------------------------------------------------
by arnaud-lb at 2011/11/12 14:22:14 -0800
@nikic Thanks for your review, I'll take care of your remarks.
For the syntax, I have no preference. I've taken this one because it has the same semantics. In Ruby and Coffeescript the `#{...}` allows any expression between the brackets. In PHP the expression in the `${...}` must start with a variable name. As you mentioned Dart's `${...}` seems to allow any expression.
---------------------------------------------------------------------------
by nikic at 2011/11/12 15:03:48 -0800
@arnaud-lb If Ruby and Coffeescript use this, the #{...} syntax should be okay. Especially as ${...} might be confusing for PHP developers as it has the semantics of variable variables in PHP.
@stof Thanks, will do so next time.
---------------------------------------------------------------------------
by damianb at 2011/11/12 15:38:31 -0800
@stof he doesn't need to - this is something github handles already. in the PR view, beside the "@nikic commented" bits, there's a set of links that shows where exactly the comment was in the format of commit sha, filename, and line inside the diff.
You can see for yourself right here, upper left:
![pr snapshot](http://i42.tinypic.com/zl6s82.png)
---------------------------------------------------------------------------
by stof at 2011/11/13 01:20:38 -0800
@damianb The comment appears, but not the context in which the comment was. We have to follow each link to see it whereas comments done on the PR appear directly with their context. This is why I said it is easier when comments are on the PR.
---------------------------------------------------------------------------
by nikic at 2011/12/02 15:13:30 -0800
So what about this now?
Btw, `REGEX_STRING` and `REGEX_DQ_STRING_PART` still are hardcoded and do not depend on the specified `interpolation` tags. But this looks hard to change, at least the regexes are quite complicated.
---------------------------------------------------------------------------
by arnaud-lb at 2011/12/02 15:22:45 -0800
I think I've addressed all issues (except hardcoding of `#{` and `}` in `REGEX_STRING` and `REGEX_DQ_STRING_PART`.)
Can this be merged after v1.4 is release if the syntax is accepted ?
---------------------------------------------------------------------------
by nikic at 2011/12/03 01:53:41 -0800
This would be my go for `STRING` and `DQ_STRING_PART` (isn't tested):
```php
<?php
$interpolChar = preg_quote(substr($options['interpolation'][0], 0, 1), '/');
$interpolPart = '[^' . $interpolChar . '"\\\\]*';
if (strlen($options['interpolation'][0]) > 1) {
$interpolRest = preg_quote(substr($options['interpolation'][0], 1), '/');
$interpolRegex = $interpolPart . '(?:(?:\\\\.|' . $interpolChar . '(?!' . $interpolRest . '))' . $interpolPart . ')*';
} else {
$interpolRegex = $interpolPart . '(?:\\\\.' . $interpolPart . ')*';
}
$this->options['dq_string_part_regex'] = '/' . $interpolRegex . '/As';
$this->options['string_regex'] = '/"' . $interpolRegex . '"|\'([^\'\\\\]*(?:\\\\.[^\'\\\\]*)*)\'/As';
```
It should allow both a single char interpolation syntax (like `{`) and multichar ones (like `#{`).
By the way, could we maybe move the compiled regexes out of `$this->options` into `$this->regexes`?