- all threads should regularly poll a status variable
  this can be used to make Ctrl-C respond faster, and to print messages

- the HTML parser should be even more forgiving with badly formatted HTML
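
A minimal sketch of the status-variable polling from the first item above, assuming a threading.Event serves as the shared status variable. The names stop_requested, check_urls and run are hypothetical and only illustrate the idea; the sleep stands in for the real per-URL work.

```python
import threading
import time

# Hypothetical shared status variable; every worker polls it.
stop_requested = threading.Event()


def check_urls(urls, results):
    """Worker loop: poll the status variable between units of work."""
    for url in urls:
        if stop_requested.is_set():
            break  # stop early instead of finishing the whole queue
        time.sleep(0.01)  # placeholder for the actual check
        results.append(url)


def run(urls, num_threads=2):
    """Start the workers and keep the main thread responsive to Ctrl-C."""
    results = []
    chunks = [urls[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=check_urls, args=(chunk, results))
               for chunk in chunks]
    for thread in threads:
        thread.start()
    try:
        for thread in threads:
            # Join with a short timeout so KeyboardInterrupt is seen quickly.
            while thread.is_alive():
                thread.join(0.1)
    except KeyboardInterrupt:
        stop_requested.set()  # tell all workers to stop at the next poll
        for thread in threads:
            thread.join()
    return results
```

The shorter the unit of work between two polls, the faster an interrupt takes effect. The same polling point is also a natural place to print progress messages.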

possible Python 2.3 improvements (i.e. these need Python >= 2.3)
- get rid of timeoutsocket.py, the default socket has timeouts
- use optparse instead of getopt with more flexible commandline help
- replace the debug() function with the logging module
  we'll see how we can fit multiple debug levels into this
- use the bool type (True and False)
- get rid of the patched robotparser.py
- use new csv module
- use the Set type instead of hashmaps (did I use hashmaps for sets here?)
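
A rough sketch of how several of the items above could fit together, written against a current Python, where the built-in set has replaced the sets.Set type from 2.3. The option names, the logger name "checker" and the mapping from -v counts to log levels are assumptions for illustration, not decisions already made.

```python
import csv
import io
import logging
import optparse
import socket


def parse_args(argv):
    """optparse instead of getopt: the help text is generated from the options."""
    parser = optparse.OptionParser(usage="%prog [options] url...")
    parser.add_option("-v", "--verbose", action="count", dest="verbose",
                      default=0, help="more debug output (may be repeated)")
    parser.add_option("-t", "--timeout", type="float", dest="timeout",
                      default=30.0,
                      help="socket timeout in seconds [default: %default]")
    return parser.parse_args(argv)


def debug_level(verbose):
    """One possible mapping of -v counts to logging levels (an assumption)."""
    return {0: logging.WARNING, 1: logging.INFO}.get(verbose, logging.DEBUG)


def write_csv(rows):
    """csv module instead of hand-rolled joining and quoting."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


def main(argv):
    options, urls = parse_args(argv)
    # Built-in socket timeouts replace a separate timeout wrapper module.
    socket.setdefaulttimeout(options.timeout)
    log = logging.getLogger("checker")  # hypothetical logger name
    log.setLevel(debug_level(options.verbose))
    seen = set()  # a real set instead of a dict with dummy values
    rows = []
    for url in urls:
        if url in seen:
            log.debug("skipping duplicate %s", url)
            continue
        seen.add(url)
        rows.append((url, "queued"))
    return write_csv(rows)
```

Calling main(["-vv", "-t", "5", "http://a/", "http://a/"]) sets a 5 second default timeout, enables debug logging, and returns one CSV row for the single distinct URL.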

include some web check and/or spider features:
- warn if overall size of page (including images/flash/etc.) is too big
- save downloaded pages
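
A small sketch of what these two features might look like, assuming the sizes of a page and its embedded resources are already known from the download step. The 200 KB limit, the function names and the file naming scheme are all assumptions chosen for illustration.

```python
import os
import urllib.parse

MAX_TOTAL_BYTES = 200 * 1024  # assumed limit, would need to be configurable


def check_page_size(url, html_size, resource_sizes, limit=MAX_TOTAL_BYTES):
    """Return a warning string if the page plus its resources is too big.

    resource_sizes maps resource URLs (images, flash, ...) to byte counts.
    Returns None when the total stays within the limit.
    """
    total = html_size + sum(resource_sizes.values())
    if total > limit:
        return "%s: total size %d bytes exceeds limit of %d bytes" % (
            url, total, limit)
    return None


def save_page(directory, url, content):
    """Save downloaded bytes under a file name derived from the URL."""
    # Percent-encode everything unsafe so the URL becomes one flat file name.
    filename = urllib.parse.quote(url, safe="")
    path = os.path.join(directory, filename)
    with open(path, "wb") as handle:
        handle.write(content)
    return path
```

Using the percent-encoded URL as the file name keeps the mapping reversible, at the cost of long and ugly names. A directory tree that mirrors the URL path would be the obvious alternative.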