brandon.hornseth

Adventures with the PHP Internals

I’ve been tinkering with some PHP lately and stumbled into this syntax caveat that existed prior to PHP 5.4:

As of PHP 5.4 it is possible to array dereference the result of a function or method call directly. Before it was only possible using a temporary variable.

So, what we’re talking about is essentially this:

<?php 

// PHP Parse error:  syntax error, unexpected '[', 
// expecting ',' or ';'
echo '[1]: ' . array('one', 'two', 'three')[1];

// or
function array_me() {
  return array('one', 'two', 'three');
}

// raises the same Parse error
array_me()[2];

// ...but this works
$tmp = array_me();
echo $tmp[2];
// > three
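
For comparison, here is a minimal sketch of what the function-call forms look like once you're on PHP 5.4 or later. The nested_me() helper is my own addition for illustration, not something from the manual. Note that dereferencing an array literal directly, as in the first echo above, only arrived in PHP 5.5, so I've left it out here.

```php
<?php
// Assumes PHP 5.4 or later; on older versions this file fails to parse.
function array_me() {
  return array('one', 'two', 'three');
}

// Hypothetical helper (not from the original example): returns a nested array
function nested_me() {
  return array(array('a', 'b'), array('c', 'd'));
}

// Dereference the return value directly, no temporary variable needed
echo array_me()[2], "\n";
// > three

// Subscripts can also be chained on the returned value
echo nested_me()[1][0], "\n";
// > c
```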

While I don’t have terribly broad experience in languages, this quirk really made me pause. What was done to add support for accessing return values in such a way, and why didn’t it “just work” in the first place? It quickly turned into one of those problems I couldn’t stop thinking about until I understood it fully.

In spite of all the disdain for PHP: The Language, it ends up being a really accessible reference implementation for questions like this. In fact, the mini-pascal language we used in my college compilers course was built using the same core tools used to implement PHP, so this question seemed right up my alley.

My first Google hits turned up release notes for PHP 5.4 which confirmed this was, indeed, a new feature. That version was released in March 2012, so that gave me a rough starting point for an implementation timeframe. Some additional searching turned up a mail thread from June 2010 where Felipe Pena proposed the improvement and supplied a patch. Digging through the commits on GitHub around that timeframe, we ultimately end up at this changeset, which includes a bunch of tests and some changes to zend_language_parser.y.

The .y extension is uncommon in the programming world: this file is an input file for Bison, a parser generator. If you ever wondered where T_PAAMAYIM_NEKUDOTAYIM comes from, it’s this file. It contains the entire grammar for the PHP language, and the changeset adds a new non-terminal symbol, array_function_dereference, defined as such:

array_function_dereference:
    array_function_dereference '[' dim_offset ']' { fetch_array_dim(&$$, &$1, &$3 TSRMLS_CC); }
  | function_call { zend_do_begin_variable_parse(TSRMLS_C); $$.EA = ZEND_PARSED_FUNCTION_CALL; } 
   '[' dim_offset ']' { fetch_array_dim(&$$, &$1, &$4 TSRMLS_CC); }
;

Let’s break this down line by line. We’ll ignore the code inside the braces for the time being:

array_function_dereference:

defines a rule for the non-terminal array_function_dereference. This rule comes in one of two forms, separated by a |. The first:

array_function_dereference '[' dim_offset ']'

references itself, followed by an open bracket, the non-terminal dim_offset, and finally a closed bracket. This is how PHP is now able to handle chaining like some of the more elaborate test cases included with the changeset. The second format:

function_call '[' dim_offset ']'

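is the base case: the non-terminal function_call, followed by the same bracketed dim_offset. function_call is defined by its own rule elsewhere in the grammar. Once the parser has finished matching it, the function call will have been processed, and there will be a semantic value available for that construct.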
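
To get a feel for how the two forms cooperate, here is a toy sketch of the same idea in plain PHP. It is not how the real parser works (that one is generated by Bison and consumes tokens, not strings), and parse_dereference() is a made-up helper that only understands input shaped like name()[k][k]. It does show why the self-referencing form gives you chaining: every extra subscript wraps whatever has been matched so far.

```php
<?php
// Toy illustration only: the real parser is generated from
// zend_language_parser.y and works on PHP's token stream, not on strings.

// Hypothetical helper: turn "name()[k][k]..." into a nested description.
function parse_dereference($src) {
  // Base case: function_call '[' dim_offset ']'
  if (!preg_match('/^(\w+)\(\)\[(\w+)\]/', $src, $m)) {
    return null; // not a dereferenced function call
  }
  $node = "fetch(call({$m[1]}), {$m[2]})";
  // Cast because substr() returns false at end of string before PHP 8
  $rest = (string) substr($src, strlen($m[0]));

  // Recursive case: array_function_dereference '[' dim_offset ']'
  // Each additional subscript wraps the node built so far.
  while (preg_match('/^\[(\w+)\]/', $rest, $m)) {
    $node = "fetch($node, {$m[1]})";
    $rest = (string) substr($rest, strlen($m[0]));
  }

  // Anything left over means the input did not fit the rule
  return $rest === '' ? $node : null;
}

echo parse_dereference('array_me()[2]'), "\n";
// > fetch(call(array_me), 2)

echo parse_dereference('nested_me()[1][0]'), "\n";
// > fetch(fetch(call(nested_me), 1), 0)
```

The innermost fetch is always the function call itself, which mirrors the grammar: the second form can only appear once, at the bottom, and the first form stacks on top of it.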

Now, about the statements inside the braces: those are called semantic actions, and they are the C code that runs to produce a value for the rule being processed (you can take a look at the implementation for fetch_array_dim here). So why didn’t PHP support dereferencing returned arrays prior to this change? The short answer is that the grammar didn’t support it, and the syntax is complicated enough that it had to be specifically added.
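
One way to convince yourself that the gap was in the grammar rather than in the tokens is to look at what the lexer produces for the offending line. The sketch below uses the tokenizer extension's token_get_all(), which only splits source into tokens and doesn't check syntax. The token_names() wrapper is my own helper, not part of PHP.

```php
<?php
// Requires the tokenizer extension (bundled and enabled by default).

// Hypothetical helper: list the token names for a snippet of PHP source.
function token_names($code) {
  $names = array();
  foreach (token_get_all('<?php ' . $code) as $token) {
    if (is_array($token)) {
      // Skip the open tag we added and any whitespace
      if ($token[0] === T_OPEN_TAG || $token[0] === T_WHITESPACE) {
        continue;
      }
      $names[] = token_name($token[0]);
    } else {
      // Single-character tokens come back as plain strings
      $names[] = $token;
    }
  }
  return $names;
}

echo implode(' ', token_names('array_me()[2];')), "\n";
// > T_STRING ( ) [ T_LNUMBER ] ;
```

Nothing exotic there: an identifier, parentheses, brackets and a number. The pieces were always recognizable, there just wasn't a rule that allowed them in that order.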

Lastly, this post is an extremely high-level view of how PHP is implemented. The documentation for Bison is surprisingly great, though. If you’re at all interested in compilers or learning how programming languages work, I recommend you read up on Bison and Flex (the lexical analyzer generator typically used with Bison) and implement a simple parser on your own. Ruby folks can check out Racc, which is a comparable program written almost entirely in Ruby.

Thanks for reading!

If you have any comments or feedback on this article, I’d love to hear from you. The best way to reach me is on Twitter.