publish information on web crawlers
parent 9c8178fe32
commit 540a91659f
1 changed file with 90 additions and 0 deletions
crawler.html (normal file, 90 additions)
@@ -0,0 +1,90 @@
<html lang="en">
<head>
    <title>info on vore microcomputers crawler bots</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <link rel="stylesheet" href="microsoft.css"/>
    <link rel="icon" type="image/x-icon" href="/favicon.ico">
</head>
<body>
<div class="container">
    <div class="container-item">
        <div class="typeset-logo">
            <img src="/assets/logo_typeset.svg" alt="vore microcomputers logo"/>
        </div>
        <div class="container">
            <h1>information on vore microcomputers crawler bots (vorebot)</h1>
            <p>
                if you have seen activity on your website and/or api endpoints coming from a user-agent
                containing "vorebot", that is likely activity from our work-in-progress web crawler.
                if you do not know what a "web crawler" is, we suggest you take a look at
                <a href="https://developers.google.com/search/docs/fundamentals/how-search-works#crawling">this google page, which explains it well</a>.
                <br/>
                in short, since there is no central list of every website on the internet, we have
                to visit every page we can find, collect the urls linked from that page, and then repeat the process
                for each of those urls. this builds a list of pages that software such as search engines can use
                to help find specific websites.
                <br/>
                <br/>
                our web crawler attempts to respect standard <a href="https://en.wikipedia.org/wiki/Robots.txt">robots.txt files</a>,
                and should also respect robots.txt blocks for googlebot (unless your robots.txt specifically allows vorebot).
                however, no one is a perfect programmer, and we may have made a mistake.
                if we have made a mistake and accidentally indexed your site, please email us
                at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a> with your site's
                url, and we can prevent your site from being indexed in the future.
                <br/>
                <br/>
                if you do not have a robots.txt file and still do not want your site to be indexed, we can still
                block your site manually if you email us, but we STRONGLY recommend that you set up a robots.txt file
                so that other web crawlers will not index your site in the future either.
                <br/>
                <br/>
                for further questions or comments, feel free to email us at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a>.
            </p>
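            <p>
                here is a rough sketch, in python, of the crawl loop described above: visit a page, collect the urls it
                links to, queue any urls not seen yet, and repeat, skipping urls that robots.txt rules out. it also models
                the googlebot rule described above: if a site's robots.txt has a group naming vorebot, only that group
                applies; otherwise a url must be allowed for both vorebot and googlebot. this is a hypothetical
                illustration, not vorebot's actual code. the functions crawl and robots_allows, the fetch and robots_for
                callbacks, and the limit parameter are all made up for this example; it only relies on python's standard
                library (urllib.robotparser and html.parser).
            </p>
<pre>
```python
# hypothetical sketch of a vorebot-style crawl loop (not vorebot's real code)
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser

AGENT = "vorebot"       # assumed robots.txt token for our crawler
FALLBACK = "googlebot"  # googlebot blocks also apply unless vorebot is named


class LinkCollector(HTMLParser):
    """collects the href attribute of every anchor (a) tag it is fed"""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def robots_allows(robots_lines, url):
    """decide whether url may be crawled under the given robots.txt lines"""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    # does any "User-agent:" line name vorebot explicitly?
    names_agent = False
    for line in robots_lines:
        key, _, value = line.split("#", 1)[0].partition(":")
        if key.strip().lower() == "user-agent" and value.strip().lower() == AGENT:
            names_agent = True
    if names_agent:
        # an explicit vorebot group wins; googlebot rules are not consulted
        return parser.can_fetch(AGENT, url)
    # otherwise the url must be allowed for both vorebot and googlebot
    return parser.can_fetch(AGENT, url) and parser.can_fetch(FALLBACK, url)


def crawl(start_url, fetch, robots_for, limit=100):
    """breadth-first crawl: visit a page, collect its links, queue new ones

    fetch(url) returns the page html or None; robots_for(host) returns that
    host's robots.txt as a list of lines (both are hypothetical callbacks).
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        if len(visited) >= limit:
            break
        url = queue.popleft()
        # a real crawler would cache robots.txt per host instead of re-parsing it
        if not robots_allows(robots_for(urlparse(url).netloc), url):
            continue
        page = fetch(url)
        if page is None:
            continue  # missing page or failed request
        visited.append(url)
        collector = LinkCollector()
        collector.feed(page)
        for href in collector.links:
            # resolve relative links and drop "#fragment" parts
            link = urljoin(url, href).split("#", 1)[0]
            if urlparse(link).scheme in ("http", "https") and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```
</pre>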
            <h3>Proper English for Machine Translations</h3>
            <p>
                If you have seen activity on your website or API endpoints coming from a User-Agent containing the word
                "Vorebot", that is likely activity from our work-in-progress web crawler.
                If you do not know what a web crawler is, we suggest you take a look at <a href="https://developers.google.com/search/docs/fundamentals/how-search-works#crawling">https://developers.google.com/search/docs/fundamentals/how-search-works#crawling</a>.
                In short, to develop tools such as search engines, one must use a program (a "robot") that visits
                every website it can find, collects the URLs on each website, then visits those URLs and collects the
                URLs found there, and so on.
                <br/>
                <br/>
                Our web crawler attempts to respect "robots.txt" files (<a href="https://en.wikipedia.org/wiki/Robots.txt">https://en.wikipedia.org/wiki/Robots.txt</a>)
                and should also respect blocks on "googlebot" (unless your "robots.txt" file specifically allows "vorebot"). However, our
                program may make errors. If our program has made an error, please email us at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a>
                with your website URL, and we can block your website from being visited automatically in the future.
                <br/>
                <br/>
                If you do not have a "robots.txt" file and still do not want your site to be visited by us, we can still
                manually block your site if you email us.
            </p>
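            <p>
                For example, a "robots.txt" file is a plain text file placed at the root of your website (for example,
                https://example.com/robots.txt, where "example.com" is a placeholder for your own domain). The sketch
                below assumes that our crawler matches the token "vorebot": the first group asks only Vorebot to stay
                away from every page, and the second group asks every crawler that follows the robots.txt standard to
                do the same. Keep only the group you need.
            </p>
<pre>
```text
# group 1: ask only vorebot to stay away from the whole site
User-agent: vorebot
Disallow: /

# group 2: ask every compliant crawler to stay away from the whole site
User-agent: *
Disallow: /
```
</pre>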
        </div>
        <p id="footer">
            <br>contact us at <a href="mailto:nikocs@voremicrocomputers.com">nikocs@voremicrocomputers.com</a> <a href="/pgp.asc">pgp key</a>
            <br>
            for abuse, copyright, or other legal issues, contact <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a><br>
            <!-- or, call at <a href="tel:1-888-519-0437">+1 (888) 519-0437 (toll-free)</a> or <a href="tel:1-507-767-9433">+1 (507) RMS-YIFF (local)</a> -->
            (our phone system is currently down, sorry!)
            <br><br>
            <a href="/archive.html">archives</a>
            <br><br>
            Vore Microcomputers is an unregistered trademark of Real Microsoft, LLC. all references to "the company"
            are in reference to Real Microsoft, LLC and the name Vore Microcomputers is only a reference to the
            hardware/software focused development project held under Real Microsoft, LLC. Real Microsoft, LLC is
            in no way associated with the Microsoft Corporation or its products/projects.
            <br><br>
            <a href="https://climate.stripe.com/4MO1d9">Our Carbon Footprint</a>
        </p>
    </div>
    <div class="container-item">
        <div class="credit">
            <img src="xenia.png" alt="image of xenia, an anthropomorphic fox who was a contender for the linux mascot"/>
            <span>image made by <a href="https://twitter.com/cathodegaytube/">@cathodegaytube</a>!</span>
        </div>
    </div>
</div>
</body>
</html>