In the last lesson we started with an abstract syntax tree and wrote some functions that did different interesting things to it. Now let's back up a step and figure out how to generate abstract syntax trees.
Abstract syntax trees are designed to be easy for computers to manipulate,
they're not necessarily designed for human understanding. While you're
composing your next opera I doubt you'll want to keep writing
again and again.
Parsing is the general problem of turning raw text into structured data. In the case of parsing programming languages, parsing converts the text the programmer writes into an abstract syntax tree. Parsers turn this:
Some interesting things to notice in the example are that the parser was smart and knew the rule for multiply being done first before the addition. Also, the parser is itself a function. It takes an input that is a string and returns an abstract syntax tree.
Creating a parser
How do you create a parser? A parser is just a function, so one way is to just write a function that does what you want. For simple languages this is not too hard, but as you add features to the language this becomes a lot of tedious work.
There is also a very handy online PEG.js editor that you might want to play around with. It lets you interactively write grammars and test them on input quickly.
Our first PEG parser
Here is a little PEG parser.
When you give this grammar to PEG.js, it will create a parsing function
parse for you. This function
parse takes a string as input. If
the input string is a single digit from 0 to 9, the
will return the digit. Otherwise the
parse function will throw
an exception indicating that the input could not be parsed. (In
the tests for each exercise we convert the exception into a return
undefined so that all the tests run.)
A grammar for PEG.js is a list of rules. Each rule has a name
and then a body. The
start name is special because parsing always
starts there. The notation
[0-9] indicates a single character
from a range of possibilities, in this case digits from 0 through 9.
Interpreting the example above in English, I would say it means, "Start by expecting a digit. A digit is a single character from the set of characters from 0 through 9."
Now it's your turn.
Write a PEG grammar that parses a single uppercase character from