The advice usually given to site owners is all about getting your content ranking on Google. But what if you don’t want your site to appear in search results?
There are various reasons why you might want to keep your pages hidden from search results. Perhaps your content is private and meant for a specific audience, and you don’t want the general public landing on your pages. Or perhaps a page is under development, and you don’t want people to see it until it’s ready.
Hiding pages can also have other benefits. For example, it can preserve your privacy and reduce your exposure to scrapers and opportunistic attackers. But how can you stop Google and other search engines from crawling your site?
Well, there are several ways to protect yourself from search engine spiders. We’ll be exploring them here.
In order to stop search engines from finding your site, you first need to understand how they work.
Search engines employ ‘web crawlers’ (also known as ‘spiders’). These spiders continuously crawl through billions and billions of web pages, looking for keywords and other indicators of what each page is about, so that when someone types a query into a search engine, relevant pages can be matched against it.
Spiders travel by jumping from link to link. As they do so, they pull the pages they find into an index via a data ingestion pipeline.
The search engine then uses that index to present a list of results to the user in order of perceived relevance.
So, if you want to keep your website private, you need to stop the spiders.
Here’s how.
A VPN won’t entirely protect your site from spiders, but it does help to preserve your privacy. If staying private and keeping your online activity away from prying eyes are top of your priority list, a VPN is a must.
How does a VPN work? Well, a good VPN (Virtual Private Network) works by encrypting communications. Rather than travelling out in the open, your online activity (searches, sites visited, and so on) is encrypted on your device and sent through a tunnel to the VPN provider’s server, which then forwards your requests on your behalf.
Essentially, this means that your actions on the internet are anonymized. Your web traffic and your IP address are far harder to track. What’s more, because websites see the VPN server’s IP address rather than yours, you aren’t tied down by geographic restrictions. You can explore sites all over the world without being blocked by geofilters.
This isn’t just useful for personal browsing purposes. If you deal with sensitive information, a VPN can protect your customers. For example, financial firms can keep their customer data safe by encrypting their traffic via a VPN.
If you want to maintain your and your site’s privacy, getting a VPN should definitely be the first item on your agenda.
If you don’t want to entirely block search spiders but do want to keep your content for an exclusive, select audience, password protection is your friend.
Search engines tend not to rank password-protected pages highly: crawlers can’t see the content behind the login, and search engines know that users want quick results. Search engine users don’t want to have to create an account and enter a password to get the value they were searching for.
So, by password-protecting your website, you ensure that not just anyone can wander around in your content. Plus, you know that the people who have access to your site genuinely want to be there (why else would they bother to create a password-protected account?).
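If you host your own site, one common way to add password protection is HTTP basic authentication. As a minimal sketch, assuming an Apache server (managed platforms such as Wix handle password protection through their own dashboard settings instead), an .htaccess file like the following prompts visitors for credentials stored in a separate .htpasswd file; the /path/to/.htpasswd location is a placeholder you would adjust for your own setup:

# Require a username and password before serving any page in this directory
AuthType Basic
AuthName "Private area"
# Credentials file created with the htpasswd utility (placeholder path)
AuthUserFile /path/to/.htpasswd
Require valid-user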
There is one caveat here. People who rely on screen readers or voice assistants may struggle to navigate password-protected pages, because login forms, page HTML, and alt text don’t always translate well via text-to-speech technology.
So, make sure that your page HTML is readable by screen readers and other assistive tools if you want visually impaired users to be able to create accounts and enter passwords.
This is probably the easiest way to keep spiders out of your site. You can use robots.txt to ask well-behaved crawlers not to crawl your pages.
Your domain will have a robots.txt file at its root (i.e. example.com/robots.txt). You can use that file to tell spiders which parts of your site they may and may not crawl.
When you open the robots.txt file, you will find a set of rules (directives such as User-agent and Disallow) that can be changed to block spider access to your site. When you set the rules to block crawling, people will still be able to access your site with a direct link, but it won’t show up in search results.
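For illustration, a minimal robots.txt that asks every crawler to stay away from the whole site looks like this (the single ‘/’ covers everything; swap it for a specific path, such as /private/, to hide only part of your site):

# Applies to all crawlers
User-agent: *
# Ask them not to crawl any page on the site
Disallow: /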
If you use a hosting service like Wix, you may not be able to edit your Robots.txt file directly. However, hosting services often allow you to customize settings in other ways, so check out their policies and procedures for blocking crawling.
You can create your own robots.txt file easily with any plain text editor. It’s a simple case of creating the file, adding rules, and uploading it to the root of your site. Think of a custom robots.txt file as being like a local vanity phone number: people who know the number can contact you easily, but it’s not automatically listed in the phone book.
Blocking access to your site via a robots.txt file will stop compliant spiders from getting to your site content. However, it won’t hide your IP address, and it won’t necessarily stop search engines from indexing your site: if other pages link to yours, the URLs can still appear in results. To prevent that, you need to block indexing.
To block indexing, your site needs a noindex robots meta tag. When you use noindex, you actively tell search engines not to include your content in search results.
The noindex tag should be included in the source code of your content, like this:
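<meta name="robots" content="noindex">

Place this tag inside the <head> section of every page you want to keep out of search results.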
As well as telling search engines not to include your content going forward, the noindex tag also tells them to drop pages they have already indexed. So, as well as keeping your site hidden moving forward, it removes your content from existing results the next time those pages are recrawled.
There is an important thing to note here: spiders can’t read the noindex meta tag if they can’t crawl your site. So, in order to prevent your site from being indexed, you need to allow spiders in via your robots.txt file.
So, if you want search engines not to index and list your site, use the noindex meta tag. If you want search engines to be unable to ‘read’ your content, use the ‘Disallow’ directive in your robots.txt file.
If you’re not sure which to use or aren’t certain that your chosen method is working, we suggest you employ continuous testing. What is continuous testing? It’s a way of monitoring your systems and processes on an ongoing basis, for example by regularly searching for your pages or checking your server logs for crawler activity, to ensure that they are still working as expected.
Continuous testing is important when you’re dealing with search engines because their algorithms are constantly changing. What works with them this month may not work next month.
There are lots of reasons to hide your site from search engines. Maybe your site contains sensitive information that you don’t want just anyone to access, or it’s under construction, and you don’t want people to see it until it’s ready. Then again, maybe you just like your privacy.
Whatever the reason, there are several ways to keep search engines out of your business. By using a robots.txt file to disallow spiders, you will prevent search engines from crawling your content. However, you won’t necessarily stop them from indexing your site.
To prevent search engines from indexing your site, you need a noindex meta tag. However, in order for your noindex tag to be effective, you need to allow spiders via your robots.txt file. Thus, think carefully about what’s most important to you. Do you want to keep search engines out of your content, or do you want your content to definitively disappear from search indexes?
If you want to browse privately, view profiles discreetly, keep your data safe, and prevent ISPs from tracking your digital footprint, you need a VPN. While you can’t block crawling and block indexing at the same time, you can use a VPN alongside either approach.
So, if you haven’t already got a VPN, now is the time to get one!