mahills edited this page Jun 4, 2013 · 2 revisions

Parsing PHP Files

To parse PHP files, you first need to import the modules which provide the AST datatypes and the parsing functionality:

import lang::php::ast::AbstractSyntax;
import lang::php::util::Utils;

To parse a single PHP file, use function loadPHPFile:

ast = loadPHPFile(|file:///Users/mhills/Projects/phpsa/corpus/MediaWiki/mediawiki-1.19.1/profileinfo.php|);

This parses the given file and returns an AST representation of its contents, tagged with source locations.
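As a rough illustration of working with the result, the sketch below loads one file and reports how many top-level statements it contains. It assumes that loadPHPFile returns a Script and that Script has a constructor of the form script(list[Stmt] body); neither is stated on this page, so check AbstractSyntax before relying on it. The helper name reportParse is made up for this example.

```rascal
import lang::php::ast::AbstractSyntax;
import lang::php::util::Utils;
import IO;
import List;

// Hypothetical helper, not part of the library.
// Assumes Script has a constructor script(list[Stmt] body).
void reportParse(loc l) {
    Script ast = loadPHPFile(l);
    if (script(list[Stmt] body) := ast) {
        println("Parsed <l>: <size(body)> top-level statements");
    } else {
        // Any other Script constructor ends up here.
        println("No statement list available for <l>");
    }
}
```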

To parse an entire system, use function loadPHPFiles. This function comes in two varieties. The first takes a location, which should be a directory, and parses all files with either a .php or a .inc extension:

sys = loadPHPFiles(|file:///Users/mhills/Projects/phpsa/corpus/MediaWiki/mediawiki-1.19.1|);

The second also takes a set of file extensions; only files with one of these extensions are parsed:

sys = loadPHPFiles(|file:///Users/mhills/Projects/phpsa/corpus/MediaWiki/mediawiki-1.19.1|, {"php","inc"});

Both return a System, imported as follows:

import lang::php::util::System;

A System is an alias for a map from locations (the location of each file) to ASTs (the AST representing the file at that location).
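Because a System is a map, the usual Rascal map operations apply to it. The following is a minimal sketch, assuming System is the map alias described above and that successfully parsed files are represented with a script(...) constructor (an assumption, as the constructors are not listed on this page). The function name summarize is hypothetical.

```rascal
import lang::php::ast::AbstractSyntax;
import lang::php::util::Utils;
import lang::php::util::System;
import IO;
import Map;
import Set;

// Hypothetical helper, not part of the library.
void summarize(System sys) {
    // Enumerating a map with <- yields its keys, here the file locations.
    set[loc] parsed = { l | l <- sys, script(_) := sys[l] };
    println("Files in system: <size(sys)>");
    println("Files with a statement list: <size(parsed)>");
    for (l <- sys, l notin parsed) {
        println("Check manually: <l>");
    }
}
```

A call would look like summarize(loadPHPFiles(dir)), where dir is a directory location such as the one used above.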

All of the functions shown above throw the runtime exception AssertionFailed when the location provided is not valid. In all cases the location must use the file scheme and must exist. For loadPHPFile the location must be a file, while for loadPHPFiles the location must be a directory.
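If the location comes from user input, it may be worth guarding the call. The sketch below catches the exception rather than letting it propagate. It assumes the exception carries a message string, as in AssertionFailed(str) from Rascal's Exception module; the second, catch-all clause is there in case it does not. The wrapper name tryLoad is hypothetical.

```rascal
import lang::php::ast::AbstractSyntax;
import lang::php::util::Utils;
import Exception;
import IO;

// Hypothetical wrapper, not part of the library.
// Returns true if the file was loaded, false if the location was rejected.
bool tryLoad(loc l) {
    try {
        Script ast = loadPHPFile(l);
        println("Loaded <l>");
        return true;
    } catch AssertionFailed(str msg): {
        // Assumed shape: the exception carries an explanatory message.
        println("Rejected <l>: <msg>");
        return false;
    } catch: {
        // Fallback for any other exception.
        println("Could not load <l>");
        return false;
    }
}
```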

Parser Changes

The parser was updated on 4 and 5 June, 2013, to match the output of the current version of the external parser and to fix a couple of bugs. To summarize:

  • Support for yield in PHP 5.5 has been added to the AST
  • List assignment now works correctly for multi-level lists
  • List assignment now works correctly for empty positions, e.g., list($a,,$b)
  • List assignment now uses the standard assignment expression constructor with a list expression as its target, instead of the list assignment constructor
  • Namespaces without blocks are differentiated from namespaces with empty blocks by using namespaceHeader for the former

Translating back to the original AST is straightforward for everything except yield (which most likely does not appear in existing ASTs) and nested lists. The functions that do this are in NormalizeAST and are named oldListAssignments and oldNamespaces. Each takes a script and returns a script that uses the original features. The translation does not currently propagate annotations.
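For code written against the original AST, the two functions can be chained. This is only a sketch: the module path lang::php::ast::NormalizeAST is an assumption (the page names the module but not its package), and the helper name toOldAST is made up here.

```rascal
import lang::php::ast::AbstractSyntax;
import lang::php::ast::NormalizeAST; // assumed module path

// Hypothetical helper: apply both backward translations to one script.
// Annotations such as source locations may be lost on rewritten nodes.
Script toOldAST(Script s) = oldNamespaces(oldListAssignments(s));
```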