From 540a91659fa540e20a302131489fd1635abe87e1 Mon Sep 17 00:00:00 2001
From: husky
Date: Sun, 14 Jan 2024 02:00:34 -0800
Subject: [PATCH] publish information on web crawlers

---
 crawler.html | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)
 create mode 100644 crawler.html

diff --git a/crawler.html b/crawler.html
new file mode 100644
index 0000000..65ae05d
--- /dev/null
+++ b/crawler.html
@@ -0,0 +1,90 @@
+
+
+ info on vore microcomputers crawler bots
+
+
+
+
+
+
+
+ +
+

information on vore microcomputers crawler bots (vorebot)

+

+ if you have seen activity on your website and/or api endpoints coming from a user-agent
+ containing "vorebot", that is likely activity from our work-in-progress web crawler.
+ if you do not know what a "web crawler" is, we suggest you take a look at
+ this google page which explains it well
+
+ however, the main idea is that since there is no central list of every website on the internet, we have
+ to look at every page we can, find the urls on that page, and repeat, in order to create a list of pages
+ that software such as search engines can use to help find specific websites.
+
+
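+ as a rough illustration of the "look at a page, find its urls, repeat" idea above, here is a
+ minimal python sketch. it is not vorebot's actual code: the FAKE_WEB table, the crawl function,
+ and the example.com-style urls are all made up for this example, and no real network requests are made.

```python
from collections import deque

# hypothetical in-memory "web": page url -> urls found on that page
FAKE_WEB = {
    "https://a.example/": ["https://b.example/", "https://c.example/"],
    "https://b.example/": ["https://a.example/", "https://d.example/"],
    "https://c.example/": [],
    "https://d.example/": ["https://c.example/"],
}


def crawl(start, get_links):
    """Visit pages breadth-first, returning them in the order they were visited."""
    seen = {start}          # urls already queued, so no page is visited twice
    queue = deque([start])  # urls waiting to be visited
    visited = []
    while queue:
        page = queue.popleft()
        visited.append(page)
        for url in get_links(page):
            if url not in seen:
                seen.add(url)
                queue.append(url)
    return visited


if __name__ == "__main__":
    # look up links in the fake table instead of downloading anything
    for page in crawl("https://a.example/", lambda u: FAKE_WEB.get(u, [])):
        print(page)
```

+ a real crawler would download each page and check robots.txt before visiting it; the sketch
+ only shows how following links builds up a list of pages.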
+ our web crawler attempts to respect standard robots.txt files,
+ and should also respect robots.txt blocks for googlebot (unless you specifically allow vorebot);
+ however, no one is a perfect programmer and we may have made a mistake.
+ if we have made a mistake and accidentally indexed your site, please email us
+ at devnull@voremicrocomputers.com with your site
+ url and we can prevent your site from being indexed in the future.
+
+
+ if you do not have a robots.txt file and still do not want your site to be indexed, we can still
+ block your site manually if you email us, but we STRONGLY recommend that you set up a robots.txt file
+ in order to prevent other web crawlers from indexing your site in the future.
+
+
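+ for reference, here is a minimal sketch of a robots.txt that blocks vorebot while leaving other
+ crawlers alone, checked with python's standard urllib.robotparser module. the robots.txt contents,
+ the is_allowed helper, and the example.com url are assumptions made up for illustration, and the
+ check uses python's parser rather than vorebot's own code, so treat it as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# hypothetical robots.txt: blocks vorebot everywhere, allows every other crawler
ROBOTS_TXT = """\
User-agent: vorebot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url (per python's parser)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some/page"
    print("vorebot allowed:", is_allowed(ROBOTS_TXT, "vorebot", url))
    print("other bot allowed:", is_allowed(ROBOTS_TXT, "someotherbot", url))
```

+ the text inside ROBOTS_TXT is what would go in the robots.txt file at the root of your site.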
+ for further questions or comments, feel free to email us at devnull@voremicrocomputers.com
+
+

+ +

Proper English for Machine Translations

+

+ If you have seen activity on your website or API endpoints coming from a User-Agent containing the word
+ "vorebot", that is likely activity from our work-in-progress web crawler.
+ If you do not know what a web crawler is, we suggest you take a look at https://developers.google.com/search/docs/fundamentals/how-search-works#crawling.
+ However, the main idea is that in order to develop tools such as search engines, one must use a robot that
+ visits every website it can find, collects the URLs on each website, then visits those URLs, and
+ so on.
+
+
+ Our web crawler attempts to respect "robots.txt" files (https://en.wikipedia.org/wiki/Robots.txt)
+ and will also respect blocks on "googlebot" (unless you specifically allow "vorebot"). However, our
+ program may make errors. If our program has made an error, please email us at devnull@voremicrocomputers.com
+ and give us your website URL, and we can block your website from being automatically visited in the future.
+
+
+ If you do not have a "robots.txt" file and still do not want your site to be visited by us, we can still
+ manually block your site if you email us.
+

+
+ +
+
+
+ image of xenia, an anthropomorphic fox who was a contender for the linux mascot
+ image made by @cathodegaytube!
+
+
+
+ +