
<html>
<body bgcolor="#ffffff">
<title> PyLR -- Fast LR parsing in python </title>
<!-- Changed by: Scott, 15-Dec-1997 -->
<center>
<h2>PyLR -- Fast LR parsing in python</h2>
<hr>
</center>

<ul>
<li> <a href="#whatis"> What is PyLR? </a>
<li> <a href="#status"> What is the current state of PyLR? </a>
<li> <a href="#where"> Where do I get PyLR? </a>
<li> <a href="#directions"> What will be added to PyLR? </a>
<li> <a href="#parsing"> Where do I find out about parsing theory? </a>
<li> <a href="#contrib"> How can I contribute to PyLR? </a>
</ul>
<hr>
<p><p>
<a name="whatis"><h2>What is PyLR?</h2></a>

PyLR is a package of tools for creating efficient parsers in Python; such a tool is
commonly known as a compiler compiler.  PyLR is currently under
development.  A full release is almost complete, but a few features that would
make it much nicer are still missing.

<p>
PyLR (pronounced 'pillar') was motivated by the frequency with which parsers are hand-coded
in Python, the performance demands that these parsers are subject to (you just can't beat
native machine code for speed...), and academic curiosity (I wanted to really know how LR
parsing works).
<p><p>


<a name="status"> <h2>What is the current state of PyLR? </h2></a>
PyLR currently consists of a Grammar class, a Lexer class, an extension module
defining a built-in parsing engine type, and a parser generator script.  All of these components
are based on sound parsing theory, but nevertheless haven't been tested by anyone but its author.
The code as it stands can definitely be of use to anyone hand-writing a parser in Python, but some
of the nicer things in the complete package <em> just haven't been done yet </em>.  <p>
PyLR is therefore under development, as it will always be.  PyLR will be given a release number
once the following tasks are complete:
<ul>


  <LI>  write an 'engine' module that implements the LR parsing
algorithm in C with callbacks to Python functions. (done) </LI>


  <LI> write a Lexer class using re (done)</LI>


  <LI>  write a Grammar class that will take as input a context-free
grammar and produce the parsing tables necessary to complement
the engine.  This is to be done for LR(1) grammars (done and then
deleted -- extremely inefficient) and LALR(1) grammars (done,
except for epsilon (empty) productions; <EM>much</EM> more efficient). </LI>


  <LI> add a user interface -- manually write a Lexer and Grammar
using the existing classes to parse lexer and grammar specifications
modelled after lex/flex and yacc/bison. (done for Grammars)
 </LI>

  <LI>  write documentation. (usable, but not done)
 </LI>

  <LI>  (post release) add grammars for various languages to the
	distribution.
 </LI>
</ul>
In addition, I have the following plan for the project:
<UL>
  <LI> make 'epsilon' (empty) productions work  (many of them work now, but not all) </LI>

  <LI> optimize the Lexer.  Try to join it into one regular expression and derive
       function calls from match object data. (done, still the slowest part of parsing)</LI>

  <LI> add error specification routines. </LI>

  <LI> change the parser generation algorithm to use only kernel LALR(1) items
       in the computation of shift actions and of gotos in the goto table.  This
       should significantly speed up parser generation, which is currently
       a bit slow, but certainly acceptable for medium-sized grammars (&lt; ~100 productions)
       (done in this version!)
</LI>


  <LI> write a parser for SQL, as used in <A HREF="http://www.pythonpros.com/arw/kwParsing/">gadfly</A>
  </LI>

  <LI> add operator precedence as an option to the parser specification (further down the road...)</LI>

</UL>
These things will probably be done over the next month or two (as I only have free time to give
to this project...Ahemmm...).
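<p>
The single-regular-expression lexer idea from the list above can be sketched roughly as follows.
This is only an illustration using the standard <code>re</code> module, not PyLR's actual Lexer
class: the token names, the <code>TOKEN_SPEC</code> and <code>CALLBACKS</code> tables, and the
<code>tokenize</code> function are all made up for the example.  Each token pattern becomes a
named group in one combined expression, and the name of the group that matched
(<code>lastgroup</code> on the match object) selects the function to call.

```python
import re

# Hypothetical token table: (name, pattern) pairs.
# This is NOT PyLR's real lexer specification format.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/()]"),
    ("SKIP", r"\s+"),
]

# Hypothetical callbacks keyed by token name; each one turns the
# matched text into the value handed on to the parser.
CALLBACKS = {
    "NUMBER": int,
    "NAME": str,
    "OP": str,
}

# Join every pattern into a single regular expression of named groups.
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token name, value) pairs for text."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError("illegal character %r at position %d" % (text[pos], pos))
        kind = m.lastgroup  # name of the alternative that matched
        if kind != "SKIP":  # whitespace is matched but produces no token
            tokens.append((kind, CALLBACKS[kind](m.group())))
        pos = m.end()
    return tokens
```

Calling <code>tokenize("x + 42")</code> scans the string with one compiled expression instead of
trying each token pattern in turn, which is the point of the optimization.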
<p><p>
<a name="where"><h2>Where do I get PyLR? </h2></a>
You can get PyLR in one of two places, <a href="ftp://chronis.icgroup.com/pub/">here</a>
or <a href="PyLR.tgz"> here</a>.  Both versions will be in sync with each other.
<p><p>

<a name="directions"><h2>What will be added to PyLR? </h2></a>
In addition to the <a href ="#status">list of things to finish </a> before a full release
is published, PyLR could be used as the basis for an efficient datapath analyzer (optimizer),
for a front end to translation from one language to another, for type checking code, etc.<p>
As soon as the first release is completed, tools to aid in all these things could well be added
to the package.  Also, anyone wanting to contribute parser specifications for
languages of general use is most welcome.
<p><p>

<a name="parsing"> <h2>Where do I find out more about parsing? </h2></a>
Parsing was for a long time a big challenge for computer scientists.  The need for
computer parsing originally arose with the writing of the first compilers.  Since then, the
theory behind parsing has been studied in depth and has pretty much stabilized, as parsing today's
computer languages no longer really presents a big problem in terms of speed or size.
One standard means of parsing that has been used for years because of its efficiency
is LR parsing (more particularly, LALR parsing).  A lot of good information is in
<a href="http://www.amazon.com/exec/obidos/ISBN=1565920007">
Lex and Yacc</a> and
<a href="http://www.amazon.com/exec/obidos/ISBN=0201100886">
The Dragon Book</a>, but
it seems like the only place to find good info on LALR parsing is in

<pre>
DeRemer, F.; and Pennello, T. Efficient computation of LALR(1) look-ahead sets, ACM Trans.
 Program. Lang. Syst. 4 (1982), 615-649.
</pre>
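<p>
To give a flavour of what a table-driven LR parser does, here is a minimal sketch in pure Python.
It is not PyLR's engine (which is written in C) and does not use PyLR's interfaces: the toy grammar,
the <code>ACTION</code>, <code>GOTO</code> and <code>RULES</code> tables, and the <code>parse</code>
function are all assumptions made for the example, with the tables worked out by hand rather than
generated.  The callbacks attached to each rule compute the nesting depth of the input.

```python
# Toy grammar (hypothetical, not shipped with PyLR):
#   rule 1:  S -> ( S )
#   rule 2:  S -> x
# The tables below were built by hand from the LR(0) item sets of this grammar.

# ACTION maps (state, lookahead token) to what the parser should do.
ACTION = {
    (0, "("): ("shift", 2), (0, "x"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "("): ("shift", 2), (2, "x"): ("shift", 3),
    (3, ")"): ("reduce", 2), (3, "$"): ("reduce", 2),
    (4, ")"): ("shift", 5),
    (5, ")"): ("reduce", 1), (5, "$"): ("reduce", 1),
}

# GOTO maps (state, nonterminal) to the state entered after a reduction.
GOTO = {(0, "S"): 1, (2, "S"): 4}

# Each rule: (left-hand side, length of right-hand side, callback).
# The callback receives one value per right-hand-side symbol.
RULES = {
    1: ("S", 3, lambda lparen, depth, rparen: depth + 1),
    2: ("S", 1, lambda x: 0),
}


def parse(tokens):
    """Run the shift/reduce loop and return the value built by the callbacks."""
    states = [0]   # stack of parser states
    values = []    # stack of semantic values, parallel to states[1:]
    stream = list(tokens) + ["$"]  # "$" marks end of input
    i = 0
    while True:
        tok = stream[i]
        act = ACTION.get((states[-1], tok))
        if act is None:
            raise SyntaxError("unexpected %r at token %d" % (tok, i))
        kind, arg = act
        if kind == "shift":
            states.append(arg)
            values.append(tok)
            i += 1
        elif kind == "reduce":
            lhs, length, callback = RULES[arg]
            args = values[-length:]
            del values[-length:]
            del states[-length:]
            values.append(callback(*args))
            states.append(GOTO[(states[-1], lhs)])
        else:  # accept
            return values[-1]
```

Here <code>parse("((x))")</code> walks the token stream once, shifting tokens onto the stack and
calling a Python function each time a rule is reduced.  A parser generator exists to
compute tables like these automatically from a grammar.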

Finally, to find out how to use PyLR, see the <A HREF="manual.html">PyLR manual</A>.

<a name="contrib"> <h2>How do I contribute to PyLR? </h2></a>
<a href="mailto:scott@chronis.icgroup.com">mail me. </a>