add more information on what our crawler is used for
parent 540a91659f
commit fcec37a9f2

1 changed file with 11 additions and 0 deletions

crawler.html  +11 −0
@@ -24,6 +24,11 @@
                 that software such as search engines can use to help find specific websites.
                 <br/>
                 <br/>
+                our web crawler is specifically used to index pages for the work-in-progress search engine <a href="https://asklyphe.com">askLyphe</a>,
+                which aims not to rely on the results of other search engines and as such needs its own web crawler to function.
+                we do not use our indexes to train neural networks, and we currently do not store pages in their entirety.
+                <br/>
+                <br/>
                 our web crawler attempts to respect standard <a href="https://en.wikipedia.org/wiki/Robots.txt">robots.txt files</a>,
                 and should also respect robots.txt blocks for googlebot (unless you specifically allow vorebot);
                 however, no one is a perfect programmer and we may have made a mistake.
@@ -51,6 +56,12 @@
                 so on.
                 <br/>
                 <br/>
+                Our web crawler is specifically used to index pages for the search engine <a href="https://asklyphe.com">askLyphe</a>,
+                which is currently in development and not available to the public. Our design goal is not to rely on other search engines for our results,
+                so we must run our own web crawler.
+                We do not use our indexes to train neural networks, and we currently do not store pages in their entirety.
+                <br/>
+                <br/>
                 Our web crawler attempts to respect "robots.txt" files (<a href="https://en.wikipedia.org/wiki/Robots.txt">https://en.wikipedia.org/wiki/Robots.txt</a>)
                 and will also respect blocks on "googlebot" (unless you specifically allow "vorebot"). However, our
                 program may make errors. If our program has made an error, please email us at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a>
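As a rough illustration of the googlebot fallback described in the added text, the sketch below shows one way a crawler could decide which robots.txt group applies: an explicit "vorebot" group wins, otherwise googlebot's rules are honoured (so a block on googlebot also blocks vorebot), otherwise the "*" wildcard group. This is a hedged example only, not vorebot's actual implementation; the RobotsGroup struct, group_for_vorebot, longest_match, and the longest-prefix rule resolution are all assumptions made for the illustration.

use std::collections::HashMap;

/// Hypothetical representation of one parsed robots.txt group
/// (the Allow/Disallow path prefixes declared for a single user-agent).
struct RobotsGroup {
    allow: Vec<String>,
    disallow: Vec<String>,
}

/// Length of the longest rule prefix that matches `path`, or 0 if none match.
fn longest_match(rules: &[String], path: &str) -> usize {
    rules
        .iter()
        .filter(|prefix| path.starts_with(prefix.as_str()))
        .map(|prefix| prefix.len())
        .max()
        .unwrap_or(0)
}

impl RobotsGroup {
    /// The most specific matching rule wins; Allow wins ties,
    /// and a group with no matching rules allows the path.
    fn is_path_allowed(&self, path: &str) -> bool {
        longest_match(&self.allow, path) >= longest_match(&self.disallow, path)
    }
}

/// Pick the group that applies to vorebot: an explicit "vorebot" group wins,
/// otherwise googlebot's rules are honoured, otherwise the "*" wildcard group.
fn group_for_vorebot(groups: &HashMap<String, RobotsGroup>) -> Option<&RobotsGroup> {
    ["vorebot", "googlebot", "*"]
        .iter()
        .find_map(|ua| groups.get(*ua))
}

fn main() {
    // Hypothetical parsed robots.txt that blocks googlebot from /private/
    // and never mentions vorebot at all.
    let mut groups = HashMap::new();
    groups.insert(
        "googlebot".to_string(),
        RobotsGroup {
            allow: vec![],
            disallow: vec!["/private/".to_string()],
        },
    );

    let group = group_for_vorebot(&groups).expect("robots.txt had at least one group");
    // With no explicit vorebot group, the googlebot block applies to vorebot too.
    assert!(!group.is_path_allowed("/private/page.html"));
    assert!(group.is_path_allowed("/public/page.html"));
    println!("googlebot fallback behaves as described on the page");
}

Under this fallback order, a site that only blocks googlebot is treated as blocking vorebot as well, which matches the promise made on the page, while a site that explicitly allows vorebot overrides the googlebot rules.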