Defining a Language (Part 3)
Sunday, December 30th, 2007Now we are getting some where. We will now define a grammar for our language. A grammar is a set of rules for a part of our language. It has four parts: a set of terminals, a set of non-terminals, a set of productions, and a starting string.
The overall set will be the combination of the terminal characters and the non-terminal characters, and the “null character”. Terminals are usually represented by lower case characters and don’t change after they’ve been placed (this will make more since in a bit). They are essentially a final product. Non-terminals are what have to be eliminated in the up coming process. Sometimes, one step in the process will replace a non-terminal with another non-terminal, but in the end, they will be replaced with terminals. The null character is a character that doesn’t really exist. It doesn’t take up a space in the string. It is basically a terminal when you don’t need to replace a non-terminal with anything.
Productions are rules to transform the non-terminals into terminals. They are the rather important part. This is the stuff that is represented in BNF. In this current form, though, the symbols would look different. However, I can’t type an arrow. So I’ll use the BNF symbols as they are easy enough to understand.
Finally, the starting string is a string of symbols, either terminal or non-terminal that will be transformed via the productions into all terminal symbols.
Now, while the symbols in these examples are just character, and while case has a meaning here, that doesn’t mean that it has to remain that way. For instance, if converting from BASIC to C, the BASIC form would be non-terminals and some terminals (like literals and variable names), while the code in C form would be all terminals.
I’ll do some examples of this in part 4, I think. I’m re-reading this post and see how long and confusing it already is. I’ll follow that up with a post actually about BNF.
