
<html lang="en">
<head>
    <title>info on vore microcomputers crawler bots</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <link rel="stylesheet" href="microsoft.css"/>
    <link rel="icon" type="image/x-icon" href="/favicon.ico">
</head>
<body>
<div class="container">
    <div class="container-item">
        <div class="typeset-logo">
            <img src="/assets/logo_typeset.svg" alt="vore microcomputers logo"/>
        </div>
        <div class="container">
            <h1>information on vore microcomputers crawler bots (vorebot)</h1>
            <p>
                if you have seen activity on your website and/or api endpoints coming from a user-agent
                containing "vorebot", that is likely activity from our work-in-progress web crawler.
                if you do not know what a "web crawler" is, we suggest you take a look at
                <a href="https://developers.google.com/search/docs/fundamentals/how-search-works#crawling">this google page which explains it well</a>.
                <br/>
                however, the main idea is that since there is no central list of every website on the internet, we have
                to look at every page we can, find the urls on that page, and repeat, in order to create a list of pages
                that software such as search engines can use to help find specific websites.
                <br/>
                <br/>
                our web crawler is specifically used for indexing for the work-in-progress search engine <a href="https://asklyphe.com">askLyphe</a>,
                which aims to not rely on the results of other search engines and as such needs its own web crawler to function.
                we do not use our indexes to train neural networks, and we currently do not store full copies of the pages we visit.
                <br/>
                <br/>
                our web crawler attempts to respect standard <a href="https://en.wikipedia.org/wiki/Robots.txt">robots.txt files</a>,
                and should also respect robots.txt blocks for googlebot (unless you specifically allow vorebot);
                however, no one is a perfect programmer and we may have made a mistake.
                if we have made a mistake and accidentally indexed your site, please email us
                at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a> with your site
                url and we can prevent your site from being indexed in the future.
                <br/>
                <br/>
                if you do not have a robots.txt file and still do not want your site to be indexed, we can still
                block your site manually if you email us, but we STRONGLY recommend that you set up a robots.txt file
                in order to prevent other web crawlers from indexing your site in the future.
                <br/>
                <br/>
                for further questions or comments, feel free to email us at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a>.

            </p>
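            <p>
                as a rough illustration of the robots.txt behaviour described above, here is a minimal sketch of a
                robots.txt file that asks our crawler to stay away from an entire site. it assumes the user-agent token
                to match is "vorebot", the same word that appears in our user-agent; treat it as an example rather
                than a guarantee, and email us if the crawler does not behave as expected.
            </p>
            <pre><code># example only: ask vorebot not to crawl anything on this site
User-agent: vorebot
Disallow: /</code></pre>
            <p>
                going the other way, this sketch blocks googlebot but specifically allows vorebot, which is the
                exception mentioned above. the paths are placeholders; adjust them to your own site.
            </p>
            <pre><code># example only: block googlebot everywhere
User-agent: googlebot
Disallow: /

# example only: explicitly allow vorebot, overriding the googlebot block for us
User-agent: vorebot
Allow: /</code></pre>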

            <h3>Proper English for Machine Translations</h3>
            <p>
                If you have seen activity on your website or API endpoints coming from a User-Agent containing the word
                "vorebot", that is likely activity from our work-in-progress web crawler.
                If you do not know what a web crawler is, we suggest you take a look at <a href="https://developers.google.com/search/docs/fundamentals/how-search-works#crawling">https://developers.google.com/search/docs/fundamentals/how-search-works#crawling</a>.
                However, the main idea is that in order to develop tools such as search engines, one must use a program to
                visit every web page it can, collect the URLs on each page, visit those URLs in turn, and
                so on.
                <br/>
                <br/>
                Our web crawler is specifically used for indexing for the search engine <a href="https://asklyphe.com">askLyphe</a>,
                which is currently in development and not available to the public. Our design goal is to not rely on other search engines for our results,
                so we must run our own web crawler.
                We do not use our indexes to train neural networks, and currently do not store full pages in their entirety.
                <br/>
                <br/>
                Our web crawler attempts to respect "robots.txt" files (<a href="https://en.wikipedia.org/wiki/Robots.txt">https://en.wikipedia.org/wiki/Robots.txt</a>)
                and will also respect blocks on "googlebot" (unless you specifically allow "vorebot"). However, our
                program may make errors. If our program has made an error, please email us at <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a>
                and give us your website URL, and we can block your website from being automatically visited in the future.
                <br/>
                <br/>
                If you do not have a "robots.txt" file and still do not want your site to be visited by us, we can still
                manually block your site if you email us.
            </p>
        </div>
        <p id="footer">
            <br>contact us at <a href="mailto:nikocs@voremicrocomputers.com">nikocs@voremicrocomputers.com</a> <a
                href="/pgp.asc">pgp key</a>
            <br>
            for abuse, copyright, or other legal issues, contact <a href="mailto:devnull@voremicrocomputers.com">devnull@voremicrocomputers.com</a><br>
            <!-- or, call at <a href="tel:1-888-519-0437">+1 (888) 519-0437 (toll-free)</a> or <a href="tel:1-507-767-9433">+1 (507) RMS-YIFF (local)</a> -->
            (our phone system is currently down, sorry!)
            <br><br>
            <a href="/archive.html">archives</a>
            <br><br>
            Vore Microcomputers is an unregistered trademark of Real Microsoft, LLC. all references to "the company"
            are in reference to Real Microsoft, LLC and the name Vore Microcomputers is only a reference to the
            hardware/software focused development project held under Real Microsoft, LLC. Real Microsoft, LLC is
            in no way associated with the Microsoft Corporation or its products/projects.
            <br><br>
            <a href="https://climate.stripe.com/4MO1d9">Our Carbon Footprint</a>
        </p>
    </div>
    <div class="container-item">
        <div class="credit">
            <img src="xenia.png" alt="image of xenia, an anthropomorphic fox who was a contender for the linux mascot"/>
            <span>image made by <a href="https://twitter.com/cathodegaytube/">@cathodegaytube</a>!</span>
        </div>
    </div>
</div>
</body>
</html>