<html lang="en">
<head>
    <title>info on vore microcomputers crawler bots</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <link rel="stylesheet" href="microsoft.css"/>
    <link rel="icon" type="image/x-icon" href="/favicon.ico">
</head>
<body>
<div class="container">
    <div class="container-item">
        <div class="typeset-logo">
            <img src="/assets/logo_typeset.svg" alt="vore microcomputers logo"/>
        </div>
        <div class="container">
            <h1>information on vore microcomputers crawler bots (vorebot)</h1>
            <p>
                if you have seen activity on your website and/or api endpoints coming from a user-agent
                containing "vorebot", that is most likely activity from our work-in-progress web crawler.
                if you do not know what a "web crawler" is, we suggest taking a look at
                <a href="https://developers.google.com/search/docs/fundamentals/how-search-works#crawling">this google page, which explains it well</a>.
                <br/>
                in short, since there is no central list of every website on the internet, we have
                to visit every page we can, collect the urls linked from that page, and then repeat the process for those urls.
                this builds a list of pages that software such as search engines can use to help people find specific websites.
                <br/>
                <br/>
                our web crawler is used specifically to build the index for <a href="https://asklyphe.com">askLyphe</a>,
                a work-in-progress search engine that aims not to rely on the results of other search engines and
                therefore needs its own web crawler.
                we do not use our indexes to train neural networks, and we currently do not store complete copies of pages at all.
                <br/>
                <br/>
                our web crawler attempts to respect standard <a href="https://en.wikipedia.org/wiki/Robots.txt">robots.txt files</a>,
                and should also respect robots.txt blocks for googlebot (unless you specifically allow vorebot);
                however, no one is a perfect programmer, and we may have made a mistake.
                if we have made a mistake and accidentally indexed your site, please email us
                at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a> with your site's
                url, and we can prevent your site from being indexed in the future.
                <br/>
                <br/>
                if you do not have a robots.txt file and still do not want your site to be indexed, we can still
                block your site manually if you email us, but we STRONGLY recommend setting up a robots.txt file
                so that other web crawlers will not index your site either.
                <br/>
                <br/>
                for further questions or comments, feel free to email us at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a>.
            </p>
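            <p>
                as an illustration, here are two minimal robots.txt files, each served from the root of your site
                (for example https://example.com/robots.txt, where example.com is only a placeholder for your own domain).
                the first blocks vorebot from every page. the second blocks googlebot everywhere but specifically allows
                vorebot, which, going by the behavior described above, should let vorebot keep crawling your site.
                both are sketches that assume our crawler matches on the user-agent token "vorebot".
            </p>
            <pre><code># example 1: block vorebot from the whole site
User-agent: vorebot
Disallow: /

# example 2: block googlebot, but specifically allow vorebot
User-agent: googlebot
Disallow: /

User-agent: vorebot
Allow: /</code></pre>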

            <h3>Proper English for Machine Translations</h3>
            <p>
                If you have seen activity on your website or API endpoints coming from a user agent containing the word
                "vorebot", that is most likely activity from our work-in-progress web crawler.
                If you do not know what a web crawler is, we suggest you read <a href="https://developers.google.com/search/docs/fundamentals/how-search-works#crawling">https://developers.google.com/search/docs/fundamentals/how-search-works#crawling</a>.
                In short, to build tools such as search engines, one must use a robot (a program) that finds every website
                it can, finds the URLs on those websites, then finds the URLs on the websites those URLs point to, and
                so on.
                <br/>
                <br/>
                Our web crawler is used specifically to build the index for the search engine <a href="https://asklyphe.com">askLyphe</a>,
                which is currently in development and not available to the public. Our design goal is to not rely on other
                search engines for our results, so we must run our own web crawler.
                We do not use our indexes to train neural networks, and we currently do not store complete copies of pages.
                <br/>
                <br/>
                Our web crawler attempts to respect "robots.txt" files (<a href="https://en.wikipedia.org/wiki/Robots.txt">https://en.wikipedia.org/wiki/Robots.txt</a>)
                and will also respect blocks on "googlebot" (unless you specifically allow "vorebot"). However, our
                program may make errors. If our program has made an error, please email us at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a>
                with your website's URL, and we can block your website from being automatically visited in the future.
                <br/>
                <br/>
                If you do not have a "robots.txt" file and still do not want your site to be visited by us, we can still
                manually block your site if you email us.
            </p>
        </div>
        <p id="footer">
            <br>contact us at <a href="mailto:nikocs@voremicrocomputers.com">nikocs@voremicrocomputers.com</a> <a
                href="/pgp.asc">pgp key</a>
            <br>
            for abuse, copyright, or other legal issues, contact <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a><br>
            <!-- or, call at <a href="tel:1-888-519-0437">+1 (888) 519-0437 (toll-free)</a> or <a href="tel:1-507-767-9433">+1 (507) RMS-YIFF (local)</a> -->
            (our phone system is currently down, sorry!)
            <br><br>
            <a href="/archive.html">archives</a>
            <br><br>
            Vore Microcomputers is an unregistered trademark of Real Microsoft, LLC. All references to "the company"
            are in reference to Real Microsoft, LLC, and the name Vore Microcomputers is only a reference to the
            hardware/software focused development project held under Real Microsoft, LLC. Real Microsoft, LLC is
            in no way associated with the Microsoft Corporation or its products/projects.
            <br><br>
            <a href="https://climate.stripe.com/4MO1d9">Our Carbon Footprint</a>
        </p>
    </div>
    <div class="container-item">
        <div class="credit">
            <img src="xenia.png" alt="image of xenia, an anthropomorphic fox who was a contender for the linux mascot"/>
            <span>image made by <a href="https://twitter.com/cathodegaytube/">@cathodegaytube</a>!</span>
        </div>
    </div>
</div>
</body>
</html>