add more information on what our crawler is used for
parent 540a91659f
commit fcec37a9f2

1 changed file with 11 additions and 0 deletions:

crawler.html (+11)
@@ -24,6 +24,11 @@
 that software such as search engines can use to help find specific websites.
+<br/>
+<br/>
+our web crawler is specifically used for indexing for the work-in-progress search engine <a href="https://asklyphe.com">askLyphe</a>,
+which aims to not rely on the results of other search engines and as such needs its own web crawler to function.
+we do not use our indexes to train neural networks, and currently do not store full pages in their entirety whatsoever.
 <br/>
 <br/>
 our web crawler attempts to respect standard <a href="https://en.wikipedia.org/wiki/Robots.txt">robots.txt files</a>,
 and should also respect robots.txt blocks for googlebot (unless you specifically allow vorebot);
 however, no one is a perfect programmer and we may have made a mistake.
@@ -51,6 +56,12 @@
 so on.
+<br/>
+<br/>
+Our web crawler is specifically used for indexing for the search engine <a href="https://asklyphe.com">askLyphe</a>,
+which is currently in development and not available to the public. Our design goal is to not rely on other search engines for our results,
+thus we must run our own web crawler.
+We do not use our indexes to train neural networks, and currently do not store full pages in their entirety.
 <br/>
 <br/>
 Our web crawler attempts to respect "robots.txt" files (<a href="https://en.wikipedia.org/wiki/Robots.txt">https://en.wikipedia.org/wiki/Robots.txt</a>)
 and will also respect blocks on "googlebot" (unless you specifically allow "vorebot"). However, our
 program may make errors. If our program has made an error, please email us at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a>
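The googlebot-fallback behaviour the page describes (honour rules for "vorebot" if present, otherwise fall back to the "googlebot" group, otherwise "*") can be sketched roughly as below. The parsing and prefix-matching logic here is an illustrative assumption for this commit's description, not vorebot's actual implementation.

```python
def parse_robots(text: str) -> dict[str, list[str]]:
    """Parse robots.txt text into {user-agent: [disallowed path prefixes]}.

    Simplified sketch: one user-agent per group, no Allow/Crawl-delay,
    no wildcard patterns inside paths.
    """
    groups: dict[str, list[str]] = {}
    current: list[str] = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            current = groups.setdefault(value.lower(), [])
        elif field == "disallow" and value:  # empty Disallow: means "allow all"
            current.append(value)
    return groups


def can_fetch(groups: dict[str, list[str]], path: str) -> bool:
    """Check `path` against the most specific applicable group:
    vorebot first, then googlebot, then the catch-all "*"."""
    for agent in ("vorebot", "googlebot", "*"):
        if agent in groups:
            return not any(path.startswith(rule) for rule in groups[agent])
    return True  # no applicable group: fetching is allowed
```

With this precedence, a site that blocks googlebot also blocks a vorebot-like crawler, unless it adds an explicit (even empty) vorebot group of its own.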