<html lang="en">
<head>
<title>info on vore microcomputers crawler bots</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<link rel="stylesheet" href="microsoft.css"/>
<link rel="icon" type="image/x-icon" href="/favicon.ico">
</head>
<body>
<div class="container">
<div class="container-item">
<div class="typeset-logo">
<img src="/assets/logo_typeset.svg" alt="vore microcomputers logo"/>
</div>
<div class="container">
<h1>information on vore microcomputers crawler bots (vorebot)</h1>
<p>
if you have seen activity on your website and/or api endpoints coming from a user-agent
containing "vorebot", that is likely activity from our work-in-progress web crawler.
if you do not know what a "web crawler" is, we suggest you take a look at
<a href="https://developers.google.com/search/docs/fundamentals/how-search-works#crawling">this google page, which explains it well</a>.
<br/>
in short, since there is no central list of every website on the internet, we have
to visit every page we can, collect the urls on that page, and repeat the process for those urls, in order to build a list of pages
that software such as search engines can use to help find specific websites.
<br/>
<br/>
our web crawler builds the index for the work-in-progress search engine <a href="https://asklyphe.com">askLyphe</a>,
which aims not to rely on the results of other search engines and therefore needs its own web crawler.
we do not use our indexes to train neural networks, and we currently do not store full copies of pages at all.
<br/>
<br/>
our web crawler attempts to respect standard <a href="https://en.wikipedia.org/wiki/Robots.txt">robots.txt files</a>,
and should also respect robots.txt rules that block googlebot (unless you specifically allow vorebot);
however, no one is a perfect programmer, and we may have made a mistake.
if we have made a mistake and accidentally indexed your site, please email us
at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a> with your site's
url, and we can prevent your site from being indexed in the future.
<br/>
<br/>
if you do not have a robots.txt file and still do not want your site to be indexed, we can still
block your site manually if you email us, but we STRONGLY recommend setting up a robots.txt file,
since it will also keep other web crawlers from indexing your site in the future.
<br/>
<br/>
for further questions or comments, feel free to email us at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a>.
</p>
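<p>
the sketch below illustrates how the robots.txt rules above could be interpreted. it is not vorebot's actual code.
it assumes one reading of the googlebot rule: if your robots.txt has a group naming vorebot, those rules apply;
otherwise the googlebot rules apply, and if there is no googlebot group either, the "*" rules apply.
it uses python's standard <code>urllib.robotparser</code> module. the names <code>has_group_for</code>,
<code>vorebot_may_fetch</code> and <code>ROBOTS_TXT</code> are made up for this example, though the
<code>ROBOTS_TXT</code> string is a minimal robots.txt you could adapt to block vorebot from your whole site.
</p>
<pre>
```python
from urllib.robotparser import RobotFileParser

# a minimal robots.txt that blocks vorebot from the whole site,
# while only keeping other crawlers out of /private/
ROBOTS_TXT = """\
User-agent: vorebot
Disallow: /

User-agent: *
Disallow: /private/
"""


def has_group_for(lines, agent):
    """Return True if some User-agent line names `agent` explicitly."""
    agent = agent.lower()
    for line in lines:
        line = line.split("#", 1)[0]  # drop comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "user-agent" and value.strip().lower() == agent:
            return True
    return False


def vorebot_may_fetch(robots_txt, url):
    """Hypothetical check: may a vorebot-like crawler fetch `url`?

    Assumed policy (not vorebot's real code): obey the vorebot group if
    robots.txt has one; otherwise obey the googlebot group, and
    RobotFileParser falls back to the "*" group when that is missing too.
    """
    lines = robots_txt.splitlines()
    parser = RobotFileParser()
    parser.parse(lines)
    agent = "vorebot" if has_group_for(lines, "vorebot") else "googlebot"
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(vorebot_may_fetch(ROBOTS_TXT, "https://example.com/"))  # prints False
```
</pre>
<p>
under this reading, a robots.txt containing only "User-agent: Googlebot" and "Disallow: /" also blocks vorebot,
while adding a separate "User-agent: vorebot" group with "Allow: /" lets it through again.
</p>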
<h3>Proper English for Machine Translations</h3>
<p>
If you have seen activity on your website or API endpoints coming from a User-Agent containing the word
"vorebot", that is likely activity from our work-in-progress web crawler.
If you do not know what a web crawler is, we suggest you take a look at <a href="https://developers.google.com/search/docs/fundamentals/how-search-works#crawling">https://developers.google.com/search/docs/fundamentals/how-search-works#crawling</a>.
In short, to build tools such as search engines, one must use a robot that visits every website it can,
finds the URLs on each website, then visits those URLs and finds the URLs on them, and
so on.
<br/>
<br/>
Our web crawler is used specifically to build the index for the search engine <a href="https://asklyphe.com">askLyphe</a>,
which is currently in development and not available to the public. Our design goal is not to rely on other search engines for our results,
so we must run our own web crawler.
We do not use our indexes to train neural networks, and we currently do not store full pages in their entirety.
<br/>
<br/>
Our web crawler attempts to respect "robots.txt" files (<a href="https://en.wikipedia.org/wiki/Robots.txt">https://en.wikipedia.org/wiki/Robots.txt</a>)
and will also respect rules that block "googlebot" (unless you specifically allow "vorebot"). However, our
program may make errors. If our program has made an error, please email us at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a>
and include your website URL, and we can block your website from being automatically visited in the future.
<br/>
<br/>
If you do not have a "robots.txt" file and still do not want your site to be visited by us, we can still
manually block your site if you email us.
</p>
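<p>
The following Python sketch illustrates the basic loop described above: visit a page, collect its URLs, then visit those URLs, and so on.
It is a simplified illustration, not the code used by our web crawler. The function name <code>crawl</code>, its
<code>get_links</code> parameter (a stand-in for downloading a page and extracting its links) and the example link graph are hypothetical.
A real crawler would also need to check "robots.txt" files before each visit.
</p>
<pre>
```python
from collections import deque


def crawl(start_url, get_links, max_pages=100):
    """Breadth-first crawl: visit a page, collect its URLs, repeat.

    get_links is a hypothetical stand-in for "download this page and
    extract its URLs"; it is any function mapping a URL to a list of URLs.
    Returns the URLs in the order they were visited.
    """
    seen = {start_url}  # every URL ever queued, so no page is visited twice
    frontier = deque([start_url])  # URLs waiting to be visited, oldest first
    visited = []
    for _ in range(max_pages):
        if not frontier:
            break  # nothing left to visit
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# a made-up link graph standing in for real web pages
EXAMPLE_LINKS = {
    "https://example.com/": ["https://example.com/a", "https://example.org/"],
    "https://example.com/a": ["https://example.com/", "https://example.net/"],
    "https://example.org/": ["https://example.net/"],
    "https://example.net/": [],
}

if __name__ == "__main__":
    print(crawl("https://example.com/", lambda url: EXAMPLE_LINKS.get(url, [])))
```
</pre>
<p>
A first-in, first-out queue makes the crawl breadth-first, so pages close to the starting page are visited first;
with the example graph the visit order is example.com/, example.com/a, example.org/, example.net/.
</p>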
</div>
<p id="footer">
<br>contact us at <a href="mailto:nikocs@voremicrocomputers.com">nikocs@voremicrocomputers.com</a> <a href="/pgp.asc">pgp key</a>
<br>
for abuse, copyright, or other legal issues, contact <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a><br>
<!-- or, call at <a href="tel:1-888-519-0437">+1 (888) 519-0437 (toll-free)</a> or <a href="tel:1-507-767-9433">+1 (507) RMS-YIFF (local)</a> -->
(our phone system is currently down, sorry!)
<br><br>
<a href="/archive.html">archives</a>
<br><br>
Vore Microcomputers is an unregistered trademark of Real Microsoft, LLC. All references to "the company"
are in reference to Real Microsoft, LLC, and the name Vore Microcomputers refers only to the
hardware/software-focused development project held under Real Microsoft, LLC. Real Microsoft, LLC is
in no way associated with the Microsoft Corporation or its products/projects.
<br><br>
<a href="https://climate.stripe.com/4MO1d9">Our Carbon Footprint</a>
</p>
</div>
<div class="container-item">
<div class="credit">
<img src="xenia.png" alt="image of xenia, an anthropomorphic fox who was a contender for the linux mascot"/>
<span>image made by <a href="https://twitter.com/cathodegaytube/">@cathodegaytube</a>!</span>
</div>
</div>
</div>
</body>
</html>