Before anything else, determine whether your website already has a robots.txt file; you may not want to override anything that is currently there. The absence of a robots.txt file will not stop search engines from crawling and indexing your website, but creating one is still highly recommended.
What is Robots.txt?
Search engine crawlers (also known as spiders or bots) examine your site and index whatever they can. This happens whether you like it or not, and you may not want sensitive or auto-generated pages, such as internal search results, showing up on Google.
Fortunately, crawlers check for a robots.txt file at the root of the site. If one exists, they'll follow the crawl instructions inside; otherwise, they'll assume the entire site can be crawled and indexed.
Here’s a simple robots.txt file:
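A minimal example, matching the rules explained in the points below:

```
User-agent: *
Allow: /wp-content/uploads/
Disallow: /
```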
1. The first line clarifies which agent (crawler) the rules apply to. In this case, User-agent: * means the rules apply to every crawler.
2. The subsequent lines set which paths can (or can’t) be crawled. Allow: /wp-content/uploads/ permits crawling of your uploads folder (images), and Disallow: / means no other file or page should be crawled aside from what has been explicitly allowed. You can have multiple rules for a given crawler.
3. The rules for different crawlers can be listed in sequence, in the same file.
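For instance, a file might give one crawler its own rules while applying a general rule to everyone else (the private-folder path here is purely illustrative):

```
User-agent: Googlebot-Image
Disallow: /wp-content/uploads/private/

User-agent: *
Allow: /wp-content/uploads/
Disallow: /
```

Each User-agent line starts a new group, and a crawler follows the most specific group that matches its name.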
Why Do We Need a Robots.txt File?
There are some situations in which a robots.txt file can be very handy.
Some common use cases include:
- Preventing duplicate content from appearing in SERPs (note that meta robots is often a better choice for this)
- Keeping entire sections of a website private (for instance, your engineering team’s staging site)
- Keeping internal search results pages from showing up on a public SERP
- Specifying the location of sitemap(s)
- Preventing search engines from indexing certain files on your website (images, PDFs, etc.)
- Specifying a crawl delay in order to prevent your servers from being overloaded when crawlers load multiple pieces of content at once
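Several of the use cases above map directly onto robots.txt directives. A sketch (the paths and sitemap URL are placeholders):

```
User-agent: *
Disallow: /search/
Disallow: /*.pdf$
Crawl-delay: 10

Sitemap: https://www.example.com/sitemap.xml
```

Note that wildcard patterns (* and $) and Crawl-delay are extensions that not every crawler honors; Google, for example, ignores Crawl-delay.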
What Could Go Wrong If I Don’t Use a Robots.txt File?
Without a robots.txt file, your website is:
- Not optimized for crawlability
- Prone to SEO errors
- Easier for malicious users to probe
- Unable to block misbehaving bots from crawling pages
- Open to sensitive data being discovered
- Likely to fall behind the competition
- Likely to have indexation problems
- A mess to sort out in webmaster tools
- Sending confusing signals to search engines
How Do I Create a Robots.txt File?
The robots.txt file resides in your site’s root folder. To view it, connect to your site using an FTP client or the cPanel file manager.
It is just like any ordinary text file, and you can open it with a plain text editor like Notepad.
If you do not have a robots.txt file in your site’s root directory, then you can always create one. All you need to do is create a new text file on your computer and save it as robots.txt. Next, simply upload it to your site’s root folder.
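Before uploading, you can sanity-check your rules with Python’s standard-library urllib.robotparser; the sketch below feeds it the example rules discussed earlier (the example.com URLs are placeholders):

```python
# Sanity-check robots.txt rules using Python's standard library.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /wp-content/uploads/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The uploads folder is allowed; everything else is blocked.
print(parser.can_fetch("*", "https://example.com/wp-content/uploads/photo.jpg"))  # True
print(parser.can_fetch("*", "https://example.com/secret-page/"))  # False
```

One caveat: Python’s parser applies the first matching rule, so in this example the Allow line must come before Disallow: /; Google instead uses the most specific (longest) matching rule.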
Be aware that the robots.txt file is publicly available: anyone can see which sections of a server the webmaster has blocked crawlers from. This means that if you have private user information you don’t want publicly searchable, you should use a more secure approach, such as password protection, to keep visitors away from any confidential pages you don’t want indexed.
By Muthumali Tharuka Wickramarachchi