2848 commits

Author SHA1 Message Date
gerdneuman
de6a82b378
Added whatsapp:// to ignored protocols
Fixes https://github.com/wummel/linkchecker/issues/595
2018-08-09 13:49:15 +02:00
regexaurus
50a9ff65b8 Updated support (issues) URL 2018-08-03 00:53:47 -04:00
Marius Gedminas
6f55f446ae Load cookies from the --cookiefile correctly
requests.cookies.merge_cookies() requires a dict or a CookieJar as the second argument.
We've been passing lists of Cookie objects instead.

Fixes #62, harder this time.
2018-03-16 13:23:26 +02:00
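A minimal sketch of the kind of conversion this commit message implies: wrapping a plain list of Cookie objects in a CookieJar, which is one of the two types the message says requests.cookies.merge_cookies() accepts. It uses only the standard library. The helpers make_cookie() and jar_from_list() are hypothetical names for illustration and are not taken from the linkchecker source.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain):
    """Build a simple session cookie (hypothetical helper for this example)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def jar_from_list(cookies):
    """Copy a list of Cookie objects into a CookieJar."""
    jar = CookieJar()
    for cookie in cookies:
        jar.set_cookie(cookie)
    return jar


cookie_list = [
    make_cookie("session", "abc", "example.com"),
    make_cookie("lang", "en", "example.com"),
]
jar = jar_from_list(cookie_list)
```

The resulting jar could then be handed to code that expects a CookieJar rather than a list.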
Marius Gedminas
6becc08284 Fix internal error when using cookies
There was some kind of confusion between a module and a function argument,
introduced in commit 90257a1b5e.

Fixes #62.
2018-03-15 23:30:41 +02:00
Petr Dlouhý
e615480850 Python3: fix reading Safari bookmarks 2018-01-19 09:52:43 +01:00
Petr Dlouhý
256202a20b fixes for Python 3: fix proxy support 2018-01-19 09:52:43 +01:00
Petr Dlouhý
f128c9c168 Python3: fix gzip2 format 2018-01-19 09:52:43 +01:00
Petr Dlouhý
a1b300c892 Python3: fix imports 2018-01-19 09:52:43 +01:00
Petr Dlouhý
0a13fae3b4 remove third party packages and use them as dependency 2018-01-09 23:25:27 +01:00
Reinhold Füreder
e864bbdabf
Use os.makedirs(...) instead of os.mkdir(...) 2018-01-03 11:33:53 +01:00
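For context, os.mkdir() creates only the last path component and fails when a parent directory is missing, while os.makedirs() also creates the intermediate directories. The sketch below assumes Python 3.2 or later for the exist_ok argument, which the commit itself may not have used. ensure_dir() is a hypothetical helper name.

```python
import os


def ensure_dir(path):
    """Create path and any missing parents; do nothing if it already exists."""
    # exist_ok=True is an assumption of this sketch (Python 3.2+).
    os.makedirs(path, exist_ok=True)
    return path
```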
Philipp Hahn
1368643a50 Fix fragment identifier quoting
According to <https://tools.ietf.org/html/rfc3986>:
 fragment    = *( pchar / "/" / "?" )
 pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
 unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
 pct-encoded = "%" HEXDIG HEXDIG
 sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

Fixes #96
2017-11-10 08:03:03 -05:00
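The grammar quoted in this commit message can be turned into a quoting rule roughly as follows. This is a sketch built on urllib.parse.quote, not the code from the commit; quote_fragment() and FRAGMENT_SAFE are hypothetical names. Note that "%" is deliberately not in the safe set, so this version assumes the input is a raw, not yet percent-encoded fragment.

```python
from urllib.parse import quote

# Characters RFC 3986 allows unescaped in a fragment, besides ALPHA / DIGIT.
UNRESERVED_PUNCT = "-._~"
SUB_DELIMS = "!$&'()*+,;="
FRAGMENT_SAFE = UNRESERVED_PUNCT + SUB_DELIMS + ":@" + "/?"


def quote_fragment(fragment):
    """Percent-encode everything the fragment grammar does not allow."""
    return quote(fragment, safe=FRAGMENT_SAFE)


print(quote_fragment("section 1"))      # space is not allowed, gets encoded
print(quote_fragment("sec/1?x=y&z=1"))  # all allowed, stays unchanged
```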
Antoine Beaupré
71be9b941b
fix incorrect call to the logging module (Closes: #847208) 2017-11-03 09:47:01 -04:00
Félix Sipma
c8d9038ae8 improve get_plugin_folders() docstring 2017-10-18 15:58:18 +02:00
Félix Sipma
deca8c667e introduce linkcheck.configuration.get_user_data() 2017-10-18 15:55:55 +02:00
Félix Sipma
a03e2e4ada use xdg dirs for config & data
~/.linkchecker is used instead of the xdg equivalents if the directory
exists (backward compatibility).
2017-10-17 18:48:07 +02:00
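The fallback described in this commit message could look roughly like the sketch below: prefer the legacy ~/.linkchecker directory when it exists, otherwise use the XDG configuration directory. get_config_dir() and its parameters are hypothetical and not the project's actual function; the ~/.config default follows the XDG Base Directory convention.

```python
import os


def get_config_dir(home, environ):
    """Return the legacy directory if present, else the XDG config directory."""
    legacy = os.path.join(home, ".linkchecker")
    if os.path.isdir(legacy):
        # Backward compatibility: keep using an existing ~/.linkchecker.
        return legacy
    base = environ.get("XDG_CONFIG_HOME") or os.path.join(home, ".config")
    return os.path.join(base, "linkchecker")
```

Passing home and environ explicitly, instead of reading os.environ inside the function, keeps the sketch easy to test.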
Antoine Beaupré
9b12b5d66f
workaround new limitation in requests
newer requests do not expose the internal SSL socket object so we
cannot verify certificates. there was work to allow custom
verification routines which we could use, but this never finished:

https://github.com/shazow/urllib3/pull/257

so right now, just treat missing socket information as if the cert was
missing.

Closes: #76
2017-10-02 20:19:25 -04:00
Marius Gedminas
4a092c218c Whitespace bigotry 2017-03-14 17:18:27 +02:00
anarcat
5471b63ceb Merge pull request #39 from PetrDlouhy/fix/cache
Fix cache: Don't check one url multiple times
2017-03-14 09:26:07 -04:00
Marius Gedminas
fb1debaa68 Fix incompatible pointer type warnings
The warnings looked like this:

    htmlparse.c: In function ‘yyparse’:
    htmlparse.c:1810:18: warning: passing argument 1 of ‘yyerror’ from incompatible pointer type [-Wincompatible-pointer-types]
    htmlparse.y:40:13: note: expected ‘PyObject ** {aka struct _object **}’ but argument is of type ‘PyObject * {aka struct _object *}’
    htmlparse.c:1927:12: warning: passing argument 1 of ‘yyerror’ from incompatible pointer type [-Wincompatible-pointer-types]
    htmlparse.y:40:13: note: expected ‘PyObject ** {aka struct _object **}’ but argument is of type ‘PyObject * {aka struct _object *}’

The argument is not used, so it doesn't really matter what pointer type
it is.
2017-02-24 15:04:09 +02:00
Petr Dlouhý
eaa538c814 don't check one url multiple times 2017-02-14 10:23:25 +01:00
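One common way to avoid checking the same URL twice is to remember which URLs have already been seen. The sketch below illustrates that idea only; it is not the cache change made in this commit, and unique_urls() is a hypothetical name.

```python
def unique_urls(urls):
    """Yield each URL once, in first-seen order."""
    seen = set()
    for url in urls:
        if url in seen:
            continue  # already queued or checked, skip it
        seen.add(url)
        yield url
```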
Marius Gedminas
03dfe3d3a1 Fix "operation on ... may be undefined" [-Wsequence-point] warnings
Fixes a bunch of warnings like

  htmlparse.y:509:25: warning: operation on ‘self->userData->buf’ may be undefined [-Wsequence-point]
  htmlparse.y:518:29: warning: operation on ‘self->userData->tmp_buf’ may be undefined [-Wsequence-point]

which were a result of (macro-expanded) code like this (simplified):

  if ((tmp = (tmp = PyMem_Realloc(...))) == NULL) return NULL;

The PyMem_Resize(p, ...) macro assigns the new value to p before
returning it, so there's no need to assign it again.

See http://bugs.python.org/issue1668036 for evidence (from 2007) that
this is indeed a documented side-effect of the macro API.
2017-02-13 15:20:33 +02:00
Graham Seaman
233e7dcf68 Allow wayback-format urls without affecting atom 'feed' urls 2017-02-09 11:43:45 +00:00
Marius Gedminas
743a5f31cb Crawl HTML attributes in deterministic order
Fixes #17.
2017-02-01 19:19:53 +02:00
Graham Seaman
2e32780dc7 Force header names to lower to allow for CaseInsensitiveDict variability 2017-02-01 16:28:07 +00:00
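A minimal sketch of normalizing header names to lower case, so that lookups do not depend on how a particular mapping type handles case. lower_header_names() is a hypothetical helper and is not taken from the commit.

```python
def lower_header_names(headers):
    """Return a plain dict whose keys are the lower-cased header names."""
    return {name.lower(): value for name, value in headers.items()}


headers = lower_header_names({"Content-Type": "text/html", "ETag": "abc"})
```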
Marius Gedminas
3c99b6aa30 Fix TypeError: hasattr(): attribute name must be string
The one test failure in Travis happens in
TestConsole.test_internal_error, but only if you have the argcomplete
package installed.

This was a real bug in error reporting code.
2017-02-01 16:02:35 +02:00
Antoine Beaupré
d51b7f34b6 Merge branch '9.3.x' 2017-01-31 19:21:22 -05:00
Antoine Beaupré
da8cecd83c Merge remote-tracking branch 'anarcat/norobots' 2017-01-31 11:34:09 -05:00
Antoine Beaupré
bf45fb1884 fix HTTPS URL checks
in Debian Jessie, linkchecker fails because of an API problem.

it completely breaks HTTPS checks.

this patch fixes the problem

from https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=772947
2017-01-31 11:25:45 -05:00
Bastian Kleineidam
1e291afdfa Fix python requests version check 2017-01-31 11:25:38 -05:00
Antoine Beaupré
46d96d0aa0 fix HTTPS URL checks
in Debian Jessie, linkchecker fails because of an API problem.

it completely breaks HTTPS checks.

this patch fixes the problem

from https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=772947
2016-09-30 11:20:38 -04:00
Bastian Kleineidam
c2ce810c3f Fix python requests version check 2016-06-28 21:55:10 +02:00
Antoine Beaupré
9d899d1dfa add --no-robots commandline flag
While this flag can be abused, it seems to me like a legitimate use
case that you want to check a fairly small document for mistakes,
which includes references to a website which has a robots.txt that
denies all robots. It turns out that most websites do *not* add a
permission for LinkCheck to use their site, and some sites, like the
Debian BTS for example, are very hostile with bots in general.

Between me using linkcheck and me using my web browser to check those
links one by one, there is not a big difference. In fact, using
linkcheck may be *better* for the website because it will use HEAD
requests instead of a GET, and will not fetch all page elements
(javascript, images, etc) which can often be fairly big.

Besides, hostile users will patch the software themselves: it took me
only a few minutes to disable the check, and a few more to make that
into a proper patch.

By forcing robots.txt without any other option, we are hurting our
good users and not keeping hostile users from doing harm.

The patch is still incomplete, but works. It lacks: documentation and
unit tests.

Closes: #508
2016-05-19 14:43:59 -04:00
Bastian Kleineidam
0ef00eea56 Move GUI files to separate project 2016-01-23 13:28:15 +01:00
Bastian Kleineidam
549533d701 Improved debugging 2016-01-19 21:55:50 +01:00
wummel
a40c39be59 Merge pull request #560 from xvadim/feature
Added plugin for parsing and checking links in Markdown files
2016-01-19 07:30:34 +01:00
wummel
e2556abbb6 Merge pull request #561 from nbigaouette/issue555
Detect if "url_data" contains proxy attributes before using them.
2016-01-17 21:59:35 +01:00
Bastian Kleineidam
3d711666e1 Fix parser for changes in bison 3.0.x 2015-11-26 12:33:44 +01:00
Nicolas Bigaouette
4e56eceb35 Detect if "url_data" contains proxy attributes before using them.
Fix proposed by @colwilson in issue #555.
2014-11-12 09:58:30 -05:00
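Checking for an optional attribute before using it can be done with getattr() and a default value. The sketch below assumes attribute names proxy and proxytype purely for illustration; the real attribute names on url_data are not shown in this log, and describe_proxy() is hypothetical.

```python
from types import SimpleNamespace


def describe_proxy(url_data):
    """Describe the proxy of url_data, tolerating missing proxy attributes."""
    # "proxy" and "proxytype" are assumed attribute names for this example.
    proxy = getattr(url_data, "proxy", None)
    if proxy is None:
        return "no proxy"
    proxytype = getattr(url_data, "proxytype", "unknown")
    return "%s proxy %s" % (proxytype, proxy)


with_proxy = SimpleNamespace(proxy="localhost:8080", proxytype="http")
without_proxy = SimpleNamespace()
```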
Vadim Khohlov
d4352fc828 Added plugin for parsing and checking links in Markdown files 2014-11-11 15:35:18 +02:00
Bastian Kleineidam
27937e6f83 Fix requests module version check. 2014-09-22 22:45:04 +02:00
Bastian Kleineidam
228bce1ba2 Add to instead of replace the HTTP client headers. 2014-09-20 12:17:42 +02:00
Bastian Kleineidam
92c4ca9a5e Debug request headers 2014-09-20 12:16:24 +02:00
Bastian Kleineidam
029c20ed98 More python3 fixes 2014-09-12 21:59:07 +02:00
Bastian Kleineidam
35eb30432e Added some Python3 fixes. 2014-09-12 19:36:30 +02:00
Bastian Kleineidam
697e7b82e1 Search for system certs 2014-09-11 21:19:49 +02:00
Bastian Kleineidam
21c7200360 Reactivate paging of help pages. 2014-09-11 19:42:42 +02:00
Bastian Kleineidam
06c6b80ed3 Fix proxy support. 2014-09-05 22:48:10 +02:00
wummel
6580d37dc9 Merge pull request #545 from ArloL/patch-1
Use correct attribute
2014-09-05 21:13:40 +02:00
Bastian Kleineidam
ee4545399d Support itms-services: URLs. #532 2014-09-05 21:06:10 +02:00
Bastian Kleineidam
37d4ed6f83 Add hyphen and dot to the allowed scheme characters. 2014-09-05 20:59:54 +02:00
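RFC 3986 defines a scheme as a letter followed by any number of letters, digits, "+", "-" or ".". A pattern along those lines accepts schemes such as itms-services, mentioned in the neighboring commit. The sketch below is an illustration, not the project's actual regular expression; SCHEME_RE and is_valid_scheme() are hypothetical names.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )   (RFC 3986)
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(scheme):
    """Return True if scheme matches the RFC 3986 scheme grammar."""
    return SCHEME_RE.fullmatch(scheme) is not None
```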