add more information on what our crawler is used for

husky 2024-01-30 16:44:08 -08:00
parent 540a91659f
commit fcec37a9f2
GPG key ID: 6B3D8CB511646891

@@ -24,6 +24,11 @@
that software such as search engines can use to help find specific websites.
<br/>
<br/>
our web crawler is specifically used for indexing for the work-in-progress search engine <a href="https://asklyphe.com">askLyphe</a>,
which aims to not rely on the results of other search engines and as such needs its own web crawler to function.
we do not use our indexes to train neural networks, and currently do not store full pages in their entirety whatsoever.
<br/>
<br/>
our web crawler attempts to respect standard <a href="https://en.wikipedia.org/wiki/Robots.txt">robots.txt files</a>,
and should also respect robots.txt blocks for googlebot (unless you specifically allow vorebot);
however, no one is a perfect programmer and we may have made a mistake.
@@ -51,6 +56,12 @@
so on.
<br/>
<br/>
Our web crawler is specifically used for indexing for the search engine <a href="https://asklyphe.com">askLyphe</a>,
which is currently in development and not available to the public. Our design goal is to not rely on other search engines for our results,
thus we must run our own web crawler.
We do not use our indexes to train neural networks, and currently do not store full pages in their entirety.
<br/>
<br/>
Our web crawler attempts to respect "robots.txt" files (<a href="https://en.wikipedia.org/wiki/Robots.txt">https://en.wikipedia.org/wiki/Robots.txt</a>)
and will also respect blocks on "googlebot" (unless you specifically allow "vorebot"). However, our
program may make errors. If our program has made an error, please email us at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a>