diff --git a/crawler.html b/crawler.html
new file mode 100644
index 0000000..65ae05d
--- /dev/null
+++ b/crawler.html
@@ -0,0 +1,90 @@
+
+
+ if you have seen activity on your website or api endpoints coming from a user-agent
+ containing "vorebot", that is likely activity from our work-in-progress web crawler.
+ if you do not know what a "web crawler" is, we suggest you take a look at
+ https://developers.google.com/search/docs/fundamentals/how-search-works#crawling, which explains it well.
+
+ however, the main idea is that since there is no central list of every website on the internet, we have
+ to look at every page we can, find the urls on that page, and repeat, in order to create a list of pages
+ that software such as search engines can use to help find specific websites.
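+
+ as a very rough sketch of that loop (in python, purely for illustration; this
+ is not vorebot's actual code), a minimal crawler might look like this:
+
+     import urllib.request
+     from urllib.parse import urljoin
+     from html.parser import HTMLParser
+
+     class LinkCollector(HTMLParser):
+         """collect href attributes from anchor tags."""
+         def __init__(self):
+             super().__init__()
+             self.links = []
+
+         def handle_starttag(self, tag, attrs):
+             if tag == "a":
+                 for name, value in attrs:
+                     if name == "href" and value:
+                         self.links.append(value)
+
+     def crawl(start_url, limit=100):
+         seen, frontier = set(), [start_url]
+         while frontier and len(seen) < limit:
+             url = frontier.pop()
+             if url in seen:
+                 continue
+             seen.add(url)
+             try:
+                 with urllib.request.urlopen(url) as resp:
+                     html = resp.read().decode("utf-8", errors="replace")
+             except Exception:
+                 continue  # skip pages that fail to load
+             collector = LinkCollector()
+             collector.feed(html)
+             # resolve relative links and add them to the frontier
+             frontier.extend(urljoin(url, link) for link in collector.links)
+         return seen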
+
+
+ our web crawler attempts to respect standard robots.txt files
+ (https://en.wikipedia.org/wiki/Robots.txt), and should also respect robots.txt
+ blocks aimed at googlebot (unless you specifically allow vorebot);
+ however, no one is a perfect programmer and we may have made a mistake.
+ if we have made a mistake and accidentally indexed your site, please email us
+ at devnull@voremicrocomputers.com with your site
+ url and we can prevent your site from being indexed in the future.
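+
+ as a sketch of what that check could look like (again just an illustration of
+ the idea, not vorebot's actual source), python's standard urllib.robotparser
+ can test a url against both user-agents:
+
+     import urllib.robotparser
+
+     def may_crawl(robots_url, page_url):
+         """conservative robots.txt check for the 'vorebot' user-agent."""
+         parser = urllib.robotparser.RobotFileParser()
+         parser.set_url(robots_url)
+         parser.read()
+         # require permission under both names; this simplification cannot
+         # see an explicit vorebot allow overriding a googlebot block, so
+         # it errs on the side of not crawling
+         return (parser.can_fetch("vorebot", page_url)
+                 and parser.can_fetch("googlebot", page_url))
+
+     # e.g. may_crawl("https://example.com/robots.txt", "https://example.com/a")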
+
+
+ if you do not have a robots.txt file and still do not want your site to be indexed, we can still
+ block your site manually if you email us, but we STRONGLY recommend that you set up a robots.txt file
+ in order to prevent other web crawlers from indexing your site in the future.
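+
+ for example, a robots.txt file at the root of your site containing the
+ following two lines asks vorebot specifically to stay away (assuming the
+ user-agent token "vorebot", as above):
+
+     User-agent: vorebot
+     Disallow: /
+
+ or, to turn away every well-behaved crawler at once:
+
+     User-agent: *
+     Disallow: /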
+
+
+ for further questions or comments, feel free to email us at devnull@voremicrocomputers.com
+