LinkChecker
============

|Build Status|_ |License|_

.. |Build Status| image:: https://travis-ci.com/linkchecker/linkchecker.svg?branch=master
.. _Build Status: https://travis-ci.com/linkchecker/linkchecker
.. |License| image:: http://img.shields.io/badge/license-GPL2-d49a6a.svg
.. _License: http://opensource.org/licenses/GPL-2.0

Check for broken links in web sites.

Features
---------

- recursive and multithreaded checking and site crawling
- output in colored or normal text, HTML, SQL, CSV, XML or a sitemap graph in different formats
- support for HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet and local file links
- restrict link checking with regular expression filters for URLs
- proxy support
- username/password authorization for HTTP, FTP and Telnet
- honors robots.txt exclusion protocol
- cookie support
- HTML5 support
- a command line and web interface
- various check plugins available, e.g. HTML syntax and antivirus checks
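
To illustrate the regular expression filters mentioned in the feature list, here is a minimal sketch using Python's ``re`` module. It is not LinkChecker's implementation: the function ``should_check`` and the patterns are hypothetical and only show the idea of skipping every URL that matches an ignore pattern.

```python
import re

# Hypothetical ignore patterns, chosen for illustration only.
# LinkChecker reads its own patterns from its options or configuration.
IGNORE_PATTERNS = [
    re.compile(r"^mailto:"),
    re.compile(r"/private/"),
]


def should_check(url, ignore_patterns=IGNORE_PATTERNS):
    """Return True if no ignore pattern matches anywhere in the URL."""
    return not any(pattern.search(url) for pattern in ignore_patterns)


urls = [
    "http://www.example.com/index.html",
    "http://www.example.com/private/report.html",
    "mailto:webmaster@example.com",
]

# Keep only the URLs that pass the filter.
to_check = [url for url in urls if should_check(url)]
print(to_check)
```

Here only the first URL survives the filter, because the other two each match one of the ignore patterns.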

Installation
-------------

See `doc/install.txt`_ in the source code archive for general information. In addition to the information given there, please note the following:

.. _doc/install.txt: doc/install.txt

Python 3 is required.

The version on PyPI is old. Instead, you can use pip to install the latest release directly from git: ``pip install git+https://github.com/linkchecker/linkchecker.git@v9.4.0``. See `#4 <https://github.com/linkchecker/linkchecker/pull/4>`_.

Windows builds are seriously lagging behind the Linux releases; see `#53 <https://github.com/linkchecker/linkchecker/issues/53>`_ for details. For now, the only two options are to install from source or to use `Docker for Windows <https://www.docker.com/docker-windows>`_.

Usage
------
Execute ``linkchecker http://www.example.com``.
For other options see ``linkchecker --help``.
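
If you want to start a check from a script rather than from a shell, the command above can be wrapped as in the following sketch. The helper names ``build_command`` and ``run_linkchecker`` are hypothetical, and only ``--verbose`` (used elsewhere in this README) is passed by default; consult ``linkchecker --help`` for the options your version supports.

```python
import shutil
import subprocess


def build_command(url, verbose=False, extra_options=()):
    """Build the argument list for one linkchecker run."""
    command = ["linkchecker"]
    if verbose:
        command.append("--verbose")
    # Any further options are passed through unchanged.
    command.extend(extra_options)
    command.append(url)
    return command


def run_linkchecker(url, **kwargs):
    """Run linkchecker if it is on PATH and return its exit status.

    Returns None when the executable cannot be found.
    """
    if shutil.which("linkchecker") is None:
        return None
    return subprocess.run(build_command(url, **kwargs)).returncode


command = build_command("http://www.example.com", verbose=True)
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as ``&`` or ``?``. The sketch returns the exit status without interpreting it.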

Docker usage
-------------

If you do not want to install any additional libraries or dependencies, you can use the Docker image.

Example of checking an external web site:
```
docker run --rm -it -u $(id -u):$(id -g) linkchecker/linkchecker --verbose https://google.com
```

Local HTML file check:
```
docker run --rm -it -u $(id -u):$(id -g) -v "$PWD":/mnt linkchecker/linkchecker --verbose index.html
```
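
The two Docker commands differ only in the volume mount. The sketch below builds both argument lists in Python and annotates what each option does. The function name ``docker_command`` is hypothetical, and the sketch assumes a POSIX system, since ``os.getuid`` and ``os.getgid`` are not available on Windows.

```python
import os


def docker_command(target, mount_dir=None, image="linkchecker/linkchecker"):
    """Return the docker argument list for checking ``target``."""
    command = [
        "docker", "run",
        "--rm",  # remove the container when the run finishes
        "-it",   # keep stdin open and allocate a terminal
        "-u", f"{os.getuid()}:{os.getgid()}",  # run as the calling user
    ]
    if mount_dir is not None:
        # Bind-mount a host directory at /mnt so local files are visible
        # inside the container.
        command += ["-v", f"{os.path.abspath(mount_dir)}:/mnt"]
    command += [image, "--verbose", target]
    return command


external = docker_command("https://google.com")
local = docker_command("index.html", mount_dir=".")
print(" ".join(local))
```

The list returned for ``local`` can be handed to ``subprocess.run`` as is. The ``-u`` option mirrors ``$(id -u):$(id -g)`` from the shell commands above.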