LinkChecker
============

|Build Status|_ |License|_

.. |Build Status| image:: https://travis-ci.org/linkchecker/linkchecker.svg?branch=master
.. _Build Status: https://travis-ci.org/linkchecker/linkchecker
.. |License| image:: http://img.shields.io/badge/license-GPL2-d49a6a.svg
.. _License: http://opensource.org/licenses/GPL-2.0

Check for broken links in web sites.

Features
---------

- recursive and multithreaded checking and site crawling
- output in colored or normal text, HTML, SQL, CSV, XML or a sitemap graph in different formats
- support for HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet and local file links
- restrict link checking with regular expression filters for URLs
- proxy support
- username/password authorization for HTTP, FTP and Telnet
- honors robots.txt exclusion protocol
- cookie support
- HTML5 support
- a command line and web interface
- various check plugins available, e.g. HTML syntax and antivirus checks
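
Several of these features map onto command-line options. The following Python sketch assembles such a command line. The helper ``build_command`` is hypothetical (it is not part of LinkChecker), and the option names are assumed from ``linkchecker --help``, so check them against your installed version before relying on them.

```python
def build_command(url, recursion_level=None, threads=None,
                  ignore_url=None, output=None, check_extern=False):
    """Assemble an argument list for the linkchecker command line.

    Hypothetical helper; the option names are assumptions taken from
    `linkchecker --help` and may differ between versions.
    """
    cmd = ["linkchecker"]
    if recursion_level is not None:
        # Limit how deep the recursive crawl goes.
        cmd.append("--recursion-level=%d" % recursion_level)
    if threads is not None:
        # Number of checker threads to run in parallel.
        cmd.append("--threads=%d" % threads)
    if ignore_url is not None:
        # Regular expression for URLs that should be skipped.
        cmd.append("--ignore-url=%s" % ignore_url)
    if output is not None:
        # Output format, for example "text", "html", "csv" or "xml".
        cmd.append("--output=%s" % output)
    if check_extern:
        # Also check links that point outside the given site.
        cmd.append("--check-extern")
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    print(" ".join(build_command("http://www.example.com",
                                 recursion_level=2, threads=5,
                                 ignore_url="^mailto:", output="csv")))
```

Returning a list rather than a single string means the result can be passed to ``subprocess.call`` without shell quoting.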

Installation
-------------

See `doc/install.txt`_ in the source code archive for general information. In addition to the information given there, please note the following:

.. _doc/install.txt: doc/install.txt

Python 2.7.2 or later is needed. LinkChecker does not work with Python 3 yet; see `#40 <https://github.com/linkchecker/linkchecker/pull/40>`_ for details.

The version on PyPI is outdated. Instead, you can use pip to install the latest release directly from git: ``pip install git+https://github.com/linkchecker/linkchecker.git@v9.4.0``. See `#4 <https://github.com/linkchecker/linkchecker/pull/4>`_.

Windows builds lag seriously behind the Linux releases; see `#53 <https://github.com/linkchecker/linkchecker/issues/53>`_ for details. For now, the only two options are to install from source or to use `Docker for Windows <https://www.docker.com/docker-windows>`_.

Usage
------
Execute ``linkchecker http://www.example.com``.
For other options see ``linkchecker --help``.
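
When LinkChecker runs from a script or a CI job, its exit status is usually what matters. The sketch below wraps the call in Python. The names ``check_site`` and ``describe_exit`` are hypothetical, and the meaning of each exit status is assumed from the LinkChecker manual page, so verify it for your version.

```python
import subprocess

# Exit statuses as described in the linkchecker manual page.
# This mapping is an assumption; verify it for your version.
EXIT_MEANINGS = {
    0: "no invalid links found",
    1: "invalid links found (or warnings, if warnings are enabled)",
    2: "program error",
}


def describe_exit(code):
    """Return a human-readable description of an exit status."""
    return EXIT_MEANINGS.get(code, "unexpected exit status %s" % code)


def check_site(url):
    """Run linkchecker on url and return (exit status, description).

    Returns (None, message) if the linkchecker executable is missing.
    """
    try:
        code = subprocess.call(["linkchecker", url])
    except OSError:
        return None, "linkchecker executable not found"
    return code, describe_exit(code)
```

A caller could, for example, run ``status, meaning = check_site("http://www.example.com")`` and fail the build when ``status`` is not ``0``.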

Docker usage
-------------

If you do not want to install any additional libraries or dependencies, you can use the Docker image.

Check an external web site:
```
docker run --rm -it -u $(id -u):$(id -g) linkchecker/linkchecker --verbose https://google.com
```

Check a local HTML file (the current directory is mounted at ``/mnt`` in the container):
```
docker run --rm -it -u $(id -u):$(id -g) -v "$PWD":/mnt linkchecker/linkchecker --verbose index.html
```
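
The two invocations above differ only in the volume mount and the target. As a rough sketch, a script could build them with a small Python helper. The function ``docker_command`` is hypothetical, and ``os.getuid``/``os.getgid`` are only available on Unix-like systems.

```python
import os


def docker_command(target, mount_dir=None, image="linkchecker/linkchecker"):
    """Build the `docker run` argument list shown above.

    Hypothetical helper. target is a URL or a file name; mount_dir, if
    given, is mounted at /mnt in the container.
    """
    cmd = ["docker", "run", "--rm", "-it",
           # Run as the calling user, mirroring -u $(id -u):$(id -g).
           "-u", "%d:%d" % (os.getuid(), os.getgid())]
    if mount_dir is not None:
        # Mirror -v "$PWD":/mnt, using an absolute host path.
        cmd += ["-v", "%s:/mnt" % os.path.abspath(mount_dir)]
    cmd += [image, "--verbose", target]
    return cmd
```

For instance, ``docker_command("index.html", mount_dir=".")`` corresponds to the local file check, and ``docker_command("https://google.com")`` to the external one. Because of ``-it``, the resulting command expects to be started from an interactive terminal.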