publish information on web crawlers

husky 2024-01-14 02:00:34 -08:00
parent 9c8178fe32
commit 540a91659f
GPG key ID: 6B3D8CB511646891

90
crawler.html Normal file

@ -0,0 +1,90 @@
<html lang="en">
<head>
<title>info on vore microcomputers crawler bots</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<link rel="stylesheet" href="microsoft.css"/>
<link rel="icon" type="image/x-icon" href="/favicon.ico"/>
</head>
<body>
<div class="container">
<div class="container-item">
<div class="typeset-logo">
<img src="/assets/logo_typeset.svg" alt="vore microcomputers logo"/>
</div>
<div class="container">
<h1>information on vore microcomputers crawler bots (vorebot)</h1>
<p>
if you have seen activity on your website and/or api endpoints coming from a user-agent
containing "vorebot", that is likely activity from our work-in-progress web crawler.
if you do not know what a "web crawler" is, we suggest you take a look at
<a href="https://developers.google.com/search/docs/fundamentals/how-search-works#crawling">this google page which explains it well</a>.
<br/>
however, the main idea is that since there is no central list of every website on the internet, we have
to look at every page we can, find the urls on that page, and repeat, in order to create a list of pages
that software such as search engines can use to help find specific websites.
<br/>
<br/>
our web crawler attempts to respect standard <a href="https://en.wikipedia.org/wiki/Robots.txt">robots.txt files</a>,
and should also respect robots.txt blocks for googlebot (unless you specifically allow vorebot);
however, no one is a perfect programmer and we may have made a mistake.
if we have made a mistake and accidentally indexed your site, please email us
at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a> with your site
url and we can prevent your site from being indexed in the future.
<br/>
<br/>
if you do not have a robots.txt file and still do not want your site to be indexed, we can still
block your site manually if you email us, but we STRONGLY recommend that you set up a robots.txt file
in order to prevent other web crawlers from indexing your site in the future.
<br/>
<br/>
for further questions or comments, feel free to email us at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a>.
</p>
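<p>
as a rough illustration of the robots.txt rules described above (a sketch under assumptions, not a
specification of how vorebot parses these files: the user-agent token is assumed here to be exactly
"vorebot", and the most specific matching group is assumed to win, as is conventional), a file like this
should keep vorebot out while leaving other crawlers unaffected:
</p>
<pre>
User-agent: vorebot
Disallow: /
</pre>
<p>
under the same assumptions, a file like this should block googlebot while still allowing vorebot:
</p>
<pre>
User-agent: googlebot
Disallow: /

User-agent: vorebot
Allow: /
</pre>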
<h3>Proper English for Machine Translations</h3>
<p>
If you have seen activity on your website or API endpoints coming from a User-Agent containing the word
"vorebot", that is likely activity from our work-in-progress web crawler.
If you do not know what a web crawler is, we suggest you take a look at <a href="https://developers.google.com/search/docs/fundamentals/how-search-works#crawling">https://developers.google.com/search/docs/fundamentals/how-search-works#crawling</a>.
However, the main idea is that tools such as search engines need a list of web pages. To build that list,
a robot visits every page it can find, collects the URLs on that page, visits those URLs, and repeats
this process.
<br/>
<br/>
Our web crawler attempts to respect "robots.txt" files (<a href="https://en.wikipedia.org/wiki/Robots.txt">https://en.wikipedia.org/wiki/Robots.txt</a>)
and will also respect rules that block "googlebot" (unless you specifically allow "vorebot"). However, our
program may make errors. If our program has made an error, please email us at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a>
and give us your website URL, and we can block your website from being automatically visited in the future.
<br/>
<br/>
If you do not have a "robots.txt" file and still do not want your site to be visited by us, we can still
manually block your site if you email us.
</p>
</div>
<p id="footer">
<br>contact us at <a href="mailto:nikocs@voremicrocomputers.com">nikocs@voremicrocomputers.com</a> <a
href="/pgp.asc">pgp key</a>
<br>
for abuse, copyright, or other legal issues, contact <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a><br>
<!-- or, call at <a href="tel:1-888-519-0437">+1 (888) 519-0437 (toll-free)</a> or <a href="tel:1-507-767-9433">+1 (507) RMS-YIFF (local)</a> -->
(our phone system is currently down, sorry!)
<br><br>
<a href="/archive.html">archives</a>
<br><br>
Vore Microcomputers is an unregistered trademark of Real Microsoft, LLC. all references to "the company"
are in reference to Real Microsoft, LLC and the name Vore Microcomputers is only a reference to the
hardware/software focused development project held under Real Microsoft, LLC. Real Microsoft, LLC is
in no way associated with the Microsoft Corporation or its products/projects.
<br><br>
<a href="https://climate.stripe.com/4MO1d9">Our Carbon Footprint</a>
</p>
</div>
<div class="container-item">
<div class="credit">
<img src="xenia.png" alt="image of xenia, an anthropomorphic fox who was a contender for the linux mascot"/>
<span>image made by <a href="https://twitter.com/cathodegaytube/">@cathodegaytube</a>!</span>
</div>
</div>
</div>
</body>
</html>