publish information on web crawlers

2024-01-14 02:00:34 -08:00 · 540a91659f
commit 540a91659f
parent 9c8178fe32
1 changed file with 90 additions and 0 deletions
--- a/crawler.html
+++ b/crawler.html
@@ -0,0 +1,90 @@
+<html lang="en">
+<head>
+    <title>info on vore microcomputers crawler bots</title>
+    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
+    <link rel="stylesheet" href="microsoft.css"/>
+    <link rel="icon" type="image/x-icon" href="/favicon.ico">
+</head>
+<body>
+<div class="container">
+    <div class="container-item">
+        <div class="typeset-logo">
+            <img src="/assets/logo_typeset.svg" alt="vore microcomputers logo"/>
+        </div>
+        <div class="container">
+            <h1>information on vore microcomputers crawler bots (vorebot)</h1>
+            <p>
+                if you have seen activity on your website and/or api endpoints coming from a user-agent
+                containing "vorebot", that is likely activity from our work-in-progress web crawler.
+                if you do not know what a "web crawler" is, we suggest you take a look at
+                <a href="https://developers.google.com/search/docs/fundamentals/how-search-works#crawling">this google page which explains it well</a>.
+                <br/>
+                however, the main idea is that since there is no central list of every website on the internet, we have
+                to look at every page we can, find the urls on that page, and repeat, in order to create a list of pages
+                that software such as search engines can use to help find specific websites.
+                <br/>
+                <br/>
+                our web crawler attempts to respect standard <a href="https://en.wikipedia.org/wiki/Robots.txt">robots.txt files</a>,
+                and should also respect robots.txt blocks for googlebot (unless you specifically allow vorebot);
+                however, no one is a perfect programmer and we may have made a mistake.
+                if we have made a mistake and accidentally indexed your site, please email us
+                at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a> with your site
+                url and we can prevent your site from being indexed in the future.
+                <br/>
+                <br/>
+                if you do not have a robots.txt file and still do not want your site to be indexed, we can still
+                block your site manually if you email us, but we STRONGLY recommend that you set up a robots.txt file
+                in order to prevent other web crawlers from indexing your site in the future.
+                <br/>
+                <br/>
+                for further questions or comments, feel free to email us at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a>.
+
+            </p>
+
+            <h3>Proper English for Machine Translations</h3>
+            <p>
+                If you have seen activity on your website or API endpoints coming from a User-Agent containing the word
+                "vorebot", that is likely activity from our work-in-progress web crawler.
+                If you do not know what a web crawler is, we suggest you take a look at <a href="https://developers.google.com/search/docs/fundamentals/how-search-works#crawling">https://developers.google.com/search/docs/fundamentals/how-search-works#crawling</a>.
+                However, the main idea is that in order to develop tools such as search engines, one must use a robot to
+                visit every web page it can, find the URLs on that page, then visit those URLs and find the URLs on
+                those pages, and so on.
+                <br/>
+                <br/>
+                Our web crawler attempts to respect "robots.txt" files (<a href="https://en.wikipedia.org/wiki/Robots.txt">https://en.wikipedia.org/wiki/Robots.txt</a>)
+                and will also respect blocks on "googlebot" (unless you specifically allow "vorebot"). However, our
+                program may make errors. If our program has made an error, please email us at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a>
+                and give us your website URL, and we can block your website from being automatically visited in the future.
+                <br/>
+                <br/>
+                If you do not have a "robots.txt" file and still do not want your site to be visited by us, we can still
+                manually block your site if you email us.
+            </p>
+        </div>
+        <p id="footer">
+            <br>contact us at <a href="mailto:nikocs@voremicrocomputers.com">nikocs@voremicrocomputers.com</a> <a
+                href="/pgp.asc">pgp key</a>
+            <br>
+            for abuse, copyright, or other legal issues, contact <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a><br>
+            <!-- or, call at <a href="tel:1-888-519-0437">+1 (888) 519-0437 (toll-free)</a> or <a href="tel:1-507-767-9433">+1 (507) RMS-YIFF (local)</a> -->
+            (our phone system is currently down, sorry!)
+            <br><br>
+            <a href="/archive.html">archives</a>
+            <br><br>
+            Vore Microcomputers is an unregistered trademark of Real Microsoft, LLC. all references to "the company"
+            are in reference to Real Microsoft, LLC and the name Vore Microcomputers is only a reference to the
+            hardware/software focused development project held under Real Microsoft, LLC. Real Microsoft, LLC is
+            in no way associated with the Microsoft Corporation or its products/projects.
+            <br><br>
+            <a href="https://climate.stripe.com/4MO1d9">Our Carbon Footprint</a>
+        </p>
+    </div>
+    <div class="container-item">
+        <div class="credit">
+            <img src="xenia.png" alt="image of xenia, an anthropomorphic fox who was a contender for the linux mascot"/>
+            <span>image made by <a href="https://twitter.com/cathodegaytube/">@cathodegaytube</a>!</span>
+        </div>
+    </div>
+</div>
+</body>
+</html>