banner logo
Updates Blog Support Contact Links
Programming JdiWeb JdiMOO JdiBoy JdiMail JdiMOOpp NetCheck Tutorials
Gaming AVR Othello

Archive for January, 2008

Compilers (Part 1)

Wednesday, January 16th, 2008

Ok, I know I said I was going to stop, but I found a book about compilers. Through out the book, the talk about how compilers work, and work you through making most of a compiler for a stripped down version of java. The main thing they leave out is linking and assembling.

I started looking through the book, and find it rather easy (for me) to read, so I’m enjoying it. I figured I would start a new set on compilers, but this time, only every once in a while, not all at once like before.

First, a compiler isn’t a black box that does lots of stuff all at once. Rather, it has many portions that build on the previous one to do it all. The two main portions are the front end and the back end. The front end reads the input, does some magic that I’ll talk about later, and outputs to a psudo-assembly. This psudo-assembly isn’t something that an assembler can deal with, but it is similar to many kinds of assembly. It makes it easy for the back end to make it into assembly. The back end will read this psudo-assembly, and convert it to the desired assembly form, then assemble it, and link it, and turn it into machine code.

This two part arrangement allows for one to write one back end for the type of computer, with many front ends for the different languages. Or, you could make one frond end for your language, with many back ends, so that you can compile to many platforms with your new language. It might also be that there is a common psudo-assembly that you could use so you can make use of pre-existing back ends, or front ends, depending on if you are developing a language, or a architecture.

The front end consists of a part that will “tokenize” the code. This is basically looking at each word, rather than each letter. When programming a tokenizer, you define what your keywords will be and what blocks there are, like identifiers, literals, symbols, and comments. You can tell it to skip stuff, like comments, and also, have it pay attention or ignore white space. The tokenizer will basically figure out what the words are in the source. In my book, they use Regular Expressions to determine what is valid or not. They had an example of determine a valid identifier as “[a-zA-Z_][a-zA-Z_0-9]*” which is a word that starts with a letter or underscore, and can be followed by any combination of letter, numbers, or underscores. I’ll do more on Regular Expressions in another post.

The tokenizer will feed into a lexical analyzer that will determine if the syntax is mostly correct. This is where BNF comes in. Your BNF could define what the above tokens are, so some Compiler-Compilers do this. The one in the book separated it out. The Lexical Analyzer looks at each word and determines if you can derive what is there from the BNF. There are a few ways of doing this, but I’m not sure what they exactly are (I’m not there yet).

If it is valid, it passes it on to a section that determines if things are in the correct order. Like using a variable only after it’s been declared. While the BNF doesn’t define this specifically, this is important, and so this section has to check that.

After all that is done, it converts the tolkens into the psudo-assembly, kinda in sections. So this part of the code does this, so we use this block of psudo-assembly. After that, the back end will convert the psudo-assembly into real assembly for the target platform, then will link it to the libraries (not sure on this one), and assemble it into machine code.

Linux’s iTunes

Tuesday, January 15th, 2008

When I first got a computer I could kinda call my own, it was in the 9th grade, and the school provided it. It was an iBook, and it had this new thing called iTunes. Well, I started using iTunes and started developing a library and playlists. Over the 4 years I had that computer, I continued to use iTunes. It let me manage my music the way I wanted. When I got a windows desktop, I put iTunes on that. When I got a windows laptop, I put iTunes on that, too (by then, I didn’t have the iBook anymore).

Fast forward to about 8 months ago. I was so fed up with Windows on my desktop that I was wiping that harddrive and installing linux. I got nearly everything set up. Just about everything I used was either Open source, or had equivalents. All, that is, except for iTunes. I needed something to manage my music with, and work with my iPod. There was Rhythmbox, Amarok, and a few others. Neither worked like iTunes, and, out of the two, Amarok had the most features. So I set up Amarok to be my music player. But I never really liked it. It had a play queue, but no way to view a play list. It had some cool things, like a script manager to do interesting things. But it wasn’t intuitive.

Fast forward again to just the other day. I was browsing around and saw a mock up of what Ubuntu 8.04 might look like. Hiding behind a demo window, was what looked like iTunes’s evil steptwin. Black on dark, yet it had a similar layout to iTunes. I don’t care if it looks evil, if it works, it works. Looking close, I saw that the program was called Songbird.

After some careful searching, I found their site. It is apparently a program built on top of Mozilla. It has a browser with tabs and what not, but primarily, it has a music manager. It has extensions like Firefox, but they are tailored to songbird. You can theme it to look differently, add an extention to work with iPods, and even import an iTunes music library. It only really lacks three features: 1) smart playlists, 2) a way to export an iTunes library (so I can put it back into iTunes if I ever need to), and 3) a way to use it as an alarm clock.

Songbird is cross platform, if you happen to want something that isn’t iTunes. But I don’t see the point, myself. But if you are on linux, and want the closest thing to iTunes you can get, check their site here.

It LIVES!

Friday, January 4th, 2008

In 10th grade, I won a first generation iPod. After about 3 years, it decided to give up it’s ghost. Mostly. I still have it in case I ever can figure out how to fix it. But about that time, my parents gave me an iPod mini for my birthday. That was 4 years ago.

Recently, it had been acting weird, charging, but then saying that it wasn’t charged. I would end up banging on it and waiting about 10 minutes and it would go again for a few hours. But no where near the 8-12 hours promised. It was getting so bad that I had taken to leaving it plugged into my car to listen to there, but I didn’t take it anywhere. It wasn’t useful anymore. I looked into saving up to buy a new iPod. The new iPod Touch looks really nice. Only it wouldn’t work with my car’s stereo system that was suppose to work with iPods. So I dismissed that plan.

I then became aware that iPod batteries where cheep. As long as you installed them yourself and didn’t get them from Apple. Apple wanted 80 bucks last I checked. So I had not thought to look for new batteries, 80 bucks is way too much. I found out that they went for less than 30 after searching for laptop batteries (120 bucks, by the way).

I did some research and found a good deal on iPod mini batteries from Other World Computing. It was about 20 with about 5 dollars standard shipping (UPS in about a week). Well, I bought it and installed it just the other day. I posted pictures of it here. I got to say that while not for the faint of heart, it was easy enough. It gave new life to my ailing iPod, which now seems like it could last a good 24 playback hours.

Some thoughts for those who might be reading this and have an iPod mini that needs revitalizing. OWC has instructions on how to replace your battery. Other places do, too, but their instructions are more detailed and actually have the guy doing it, not like a cooking show. Also, while the set on OWC is a good deal and I recommend it, it doesn’t have that blue plastic wedge tool that he recommends. I also recommend it, and which I had one. I used the screwdriver that came with it and it scratched things up. I have yet to find one that isn’t in some expensive set. Finally, if you have them, do use C-pliers. They would have made this so much easier. You can see the C-clip in the photos. It’s in there tight.

Defining a Language (Part 5)

Tuesday, January 1st, 2008

In this, the, probably, final installment of our series, we will discuss the practical nature of BNF and how it relates to compiler design.

I’ve read a few articles that discuss BNF in practical form. I won’t go into all the details here, though. Basically, you can have a perfectly good BNF grammar, but need to change it. The reasoning is that how you parse the input (top down, bottom up) changes how you want the BNF structured. This helps simplify parsing or lookup tables. I’m not exactly clear on how this works right now, but I might see about this more latter.

Also, there are different forms of BNF. EBNF looks a lot like RegEx in that you can say you will allow any number of something to exist (like any number of digits in a row). Some variants will allow one to specify an “action” to go with the production. This sort of tells a compiler what to do on certain steps.

On that note, making a psudo-compiler is rather easy, provided you have the language defined. The GNU project has made a program that takes appropriately defined grammar and churns out a C/C++ framework for that grammar. But you have to specify what you want it to do in C.

As an example they had in their big manual, you could make a Reverse Polish Notation calculator with this. From the manual, the grammar would be defined as follows:

input:   /* empty */
       | input line
;
line:    '\n'
       | exp '\n'      { printf ("\t%.10g\n", $1); }
;
exp:     NUM           { $$ = $1;           }
       | exp exp '+'   { $$ = $1 + $2;      }
       | exp exp '-'   { $$ = $1 - $2;      }
       | exp exp '*'   { $$ = $1 * $2;      }
       | exp exp '/'   { $$ = $1 / $2;      }
        /* Exponentiation */
       | exp exp '^'   { $$ = pow ($1, $2); }
        /* Unary minus    */
       | exp 'n'       { $$ = -$1;          }
;

What this does is setup a parser in C that can be called (for instance, from the main function of a compiler) that would tolkenize the input. The stuff in the curly brackets is the “action” associated with the line and is the resultant C code… mostly. $$ represents the value for that part of the expression. $1 is the value for the first expression inside the one being dealt with. $2 is the second one, and so on. So if you input “1 1 +, it would see that as valid, and would handle it like the C code “1+1″. If you had “1 1 2 + -” it would turn that into “1+2 -1″ because it would see that “1 2 +” is valid and parse that, then it would see the rest as valid and parse it.

Now that is great for scripting and helping make calculators, but what about compilers? You could think about what this is as compiling the code as you enter it, and immediately executing it. Not useful for compilers, but you could turn all those direct C statements into file output statements. If you wanted to write an RPN compiler, you could change the {$$ = $1 + $2} into something like {”fprintf(/”$1 + /$2/”)”} or so and, when run, it would print to a file the converted version of the RPN function, which you could then compile in C.

I recommend you check out the bison project’s site and read the manual to see more. It provides ways to parse a file, and to provide meaningful feedback when compiling (Syntax Error is too vague for me ;) ).

Unless prompted to do so, this is the conclusion of this series. Questions can and will be answered, but this seems like a good stopping point. Now I just need ideas for more things to post.

EDIT: I had found, and meant to post, the BNF for ANSI C. This might help you get an idea for how they break things down. I also found this page, which seems to be for a commercial parser, but it has grammars for several languages, including the original form of BASIC.

Defining a Language (Part 4)

Tuesday, January 1st, 2008

For this installment, we’ll cover examples of both these grammar things, and of BNF. I lied (well, not really) before when I said that I would use BNF for the notation all the time. I won’t. I’ll use -> to be an arrow to the right, and I’ll use => to be a double arrow to the right. They look close enough to the real thing.

For our example, let N (the non-terminals) be {o, S}, T (the terminals) be {a, b}, and P (the productions) be {o -> bo, o -> aS, S -> bS, S -> b}. o will be our starting string.

We can say that bbab is derivable from o ( noted as o => bbab ). We can prove this as follows.

o => bo => bbo => bbaS => bbab

Each step above follows the use of one production from the grammar. But, wait, you say, Can’t you get other results? Yes. You can get other results. The shortest one you can get is ab ( o => aS => ab ), but you’d find that all possible results are in the following form

b^n a b^m for n >= 0, m >= 1

Which means, you will have some number of b’s between 0 and infinity in a row, followed by a single a, followed by at least one more b. So bbbbbbbbbbbbbbab

We could take the much more complicated grammar of T={a,b,c}, N={o, A, B, C, D, E} and P={o -> aAB, o ->aB, A -> aAC, A->aC, B->Dc, D->b, CD-> CE, CE->DE, DE->DC, Cc->Dcc}. You’re probably thinking… whoa, that’s a lot. Let’s break it down.

We can condense a fair bit this way: CD => CE => DE => DC. We can use this derivation to skip a few steps in the applications of the productions.

Let’s say we wanted to prove that string aaabbbaaa is a valid string in this grammar. We could with the following derivation.

o => aAB => aaACB => aaaCCDc => aaaDCCc => aaaDCDcc => aaaDDCcc => aaaDDDccc => aaabbbaaa

And if we wanted to, we could show that this grammar will produce strings in the form of a^n b^n c^n, or some number of a’s, followed by the same number of b’s, followed by the same number of c’s. This result is the Language the grammar makes.

Now onto BNF. Non-terminals are surrounded by <>’s. Terminals are not. Productions are in the form S ::= T, and the | character represents “or”. So something like S ::= T1 | T2 | T3 would mean that S could become T1, T2, or T3.

Let us define a grammar in BNF as follows:

<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0
<integer> ::= < signed integer > | < unsigned integer>
<signed integer> ::= + <unsigned integer> | - <unsigned integer>
<unsigned integer> ::= <digit>|<digit><unsigned integer>

and lets have <integer> as the starting symbol.

With this grammar in BNF, we can parse all integers. If we had -901, we could derive it this way.

<integer> => <signed integer> => – <unsigned integer> => – <digit><unsigned integer> => -<digit><digit><unsigned integer> => -<digit><digit><digit> => -9<digit><digit> => -90<digit> => -901

You will notice that in BNF, you don’t explicitly define what are non-terminals and what are terminals. You have a bunch of productions that you apply to a string. This makes it more ideal for defining larger languages. Many programming languages are defined in terms of BNF, like C++, FORTRAN, and Pascal (according to my book). I’m going to do more research on how they do that latter.

And because I don’t like to plagiarize, the examples in this installment all came from “Discrete Mathematics” 6th edition by Richard Johnsonbaugh, ISBN 0-13-117686-2, from section 12.3 on Languages and Grammars. I did rewrite them and the explanations, but the details of the examples did come from the book. It is a very good book in my opinion, however, it’s about the only one I’ve had on the topic.

The next installment will be on applications of BNF. I will hopefully have some solid, real world examples of how to apply BNF to a programming language and compilers.