Indexed, though blocked by robots.txt

Hi,

I received the 'Indexed, though blocked by robots.txt' warning on Google Webmaster Tools.

I don't want this page appearing in search results, so I had ticked the 'Hide this page from search engines' box in Weebly's SEO settings for the page. As I couldn't see anything wrong with it, I asked Google for a revalidation, and the same error appeared. I then tried unticking the box and entering <meta name="robots" content="noindex"> in the header for the page, as I had read that Google can't exclude a page from the index while it is blocked by robots.txt. I asked for revalidation again and the same issue reappeared.

I suspect the robots.txt file is preventing the page from being crawled, which in turn stops the page from being removed from the index because Google can't see the noindex directive. Is the fact that the page is password-protected related?
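The mechanism suspected here can be sketched with Python's standard `urllib.robotparser` module. This is a minimal sketch, not Weebly's or Google's actual logic: it assumes a hypothetical robots.txt that disallows a hypothetical `/private-page` path, and it encodes only Google's documented rule that a crawler blocked by robots.txt never downloads the page, so a `noindex` tag on that page is never seen. The function name `noindex_visible` is also made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; /private-page is an
# assumed path, not one taken from the poster's site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-page
"""


def noindex_visible(robots_txt: str, url: str, has_noindex: bool,
                    user_agent: str = "Googlebot") -> str:
    """Report whether a crawler could ever read the page's noindex tag."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        # Blocked: the HTML is never fetched, so a noindex meta tag is
        # invisible, yet the bare URL can still be indexed from links.
        return "blocked: noindex cannot be seen"
    if has_noindex:
        return "crawlable: noindex can take effect"
    return "crawlable: page may be indexed"


print(noindex_visible(ROBOTS_TXT, "https://example.com/private-page", True))
# prints: blocked: noindex cannot be seen
```

Under this simplified model, adding the meta tag while the robots.txt block stays in place changes nothing, which is consistent with what the poster reports after each revalidation.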

I have since reticked the box, removed the noindex meta tag from the header, and asked for a further revalidation, but I'm expecting the same result as essentially nothing has changed. I don't think waiting for Google to recrawl my site will help, as asking for a revalidation is the same thing, isn't it?

How can I fix this, and how did it happen in the first place, given that I hadn't changed the page's SEO settings when the error first occurred?

Cheers.

Thanks for your question, @Magoo. Google will still know the page exists, but because of your robots.txt it's not actually going to include it in search results. Was the page ever not protected by a password and hidden from search engines?

You can see what I mean if you look through the results of a Google site search:

https://www.google.com/search?client=safari&rls=en&q=site:johnwakelin.net&ie=UTF-8&oe=UTF-8

Thanks, Adam, but that doesn't resolve my problem. What do I need to do to my website to make the Google 'Indexed, though blocked by robots.txt' warning disappear? I'd rather not have a Google warning associated with my website due to the negative repercussions on my Google search ranking. I've just received notice that the fourth attempt at a Google validation has resulted in the same error.

The page has always been password protected and hidden from search engines.

Cheers.

I'm having the same problem. Any solution?

Hi Vinions,

I had success by temporarily disabling the password on the affected page, waiting for Google to recrawl the page, and then re-enabling the password.

If you don't want to disable the password, you could try making an identical password-protected page, redirecting the old page to the new one, and then deleting the old page.

I hope this helps.

I am having the same issue. The page is hidden from navigation, password protected, as well as marked hidden from search engines. However, Google is flagging it with an "Indexed, though blocked by robots.txt" error. 

The page was never indexed: I tested with a site:eyetechds.com search and the URL in question did not come up.

Duplicating the page and setting up redirects to resolve the issue seems like a lot more work and creates the potential for broken links to this page. Is there an alternative that a Weebly support specialist can recommend?

Thank you,

Veronica

I'm not sure why Google would say it's indexed when you can demonstrate the page definitely is not indexed, or at least isn't ranked or included in any search results. It might just be poor wording on Google's part: in other words, they're saying "we know this page exists as part of your site, but we can't read it or include it in results."

I'm getting the same issue for the website georgetowntxparkinson.weebly.com.

No pages are password-protected or marked as hidden from search engines.

Google's URL Inspection tool says "URL is not available to Google".

The robots.txt file looks like this:

Sitemap: http://georgetowntxparkinson.weebly.com/sitemap.xml

User-agent: NerdyBot
Disallow: /


User-agent: *
Disallow: /ajax/
Disallow: /apps/
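To see what these rules actually permit, they can be fed to Python's standard `urllib.robotparser` parser. This sketch uses the robots.txt exactly as quoted; the `/ajax/example` and `/apps/example` paths and the `allowed` helper are placeholders made up for illustration. Python's parser does simple prefix matching and takes the first matching rule, which is close to, but not identical to, Google's own matcher (Google also supports `*`/`$` wildcards and longest-match precedence).

```python
from urllib.robotparser import RobotFileParser

# robots.txt as posted in the thread for georgetowntxparkinson.weebly.com
ROBOTS_TXT = """\
Sitemap: http://georgetowntxparkinson.weebly.com/sitemap.xml

User-agent: NerdyBot
Disallow: /


User-agent: *
Disallow: /ajax/
Disallow: /apps/
"""

SITE = "http://georgetowntxparkinson.weebly.com"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """True if `agent` may crawl SITE + path under the rules above."""
    return parser.can_fetch(agent, SITE + path)


# Placeholder paths; only the /ajax/ and /apps/ prefixes come from the file.
for agent, path in [("Googlebot", "/"), ("Googlebot", "/ajax/example"),
                    ("Googlebot", "/apps/example"), ("NerdyBot", "/")]:
    verdict = "allowed" if allowed(agent, path) else "blocked"
    print(f"{agent:10} {path:15} {verdict}")
```

Googlebot falls under the `User-agent: *` group, so it is allowed on the homepage and blocked only under `/ajax/` and `/apps/`, while NerdyBot is blocked everywhere. If the warning names an ordinary page URL, this robots.txt alone would not seem to explain it.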

It looks like the site is indexed, although Google doesn't seem to have a description for the homepage:

https://www.google.com/search?client=safari&rls=en&q=site:georgetowntxparkinson.weebly.com&ie=UTF-8&...

You might want to use their fetch-as-google tool to have Google index the site again:

https://support.google.com/webmasters/answer/6065812?hl=en

Hi Adam,

I'm having this same problem with most of the pages on my site. In the example images, I'm using the link below. Google says the page has been crawled, but then the error messages say it's blocked by robots.txt. How can I fix this?

https://www.thegiftbulb.com/blog

Also, I'm not sure if this is part of the problem, but I'm including it based on what other people have said. I have this page hidden from navigation (although I'm having the same problem with pages that aren't hidden). It is not password-protected and never has been, yet the 'Hide this page from search engines' box keeps getting checked. I know I have unchecked it at least twice. Is there something I'm doing that makes the box check itself? Could it happen when I publish a blog post?

Thank you,

Katherine

This shouldn't affect anything, @KM1. This file is always blocked, and there's no way to unblock it. Again, it shouldn't affect your site in any way.

My two sites are getting indexed by Google, but around 50 pages on one of them are reported as having a noindex rule. As far as I can tell, they should be readable. We are actively selling most of the products on the pages being blocked. Help!

Submitted URL marked ‘noindex’
First detected: 9/7/21
Status: Error
Affected pages: 48
[Chart: affected pages over time, 6/10/21 to 9/6/21]
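For pages reported as "Submitted URL marked ‘noindex’", a first check is whether the page really carries the directive, either as a robots meta tag like the `<meta name="robots" content="noindex">` quoted earlier in the thread or as an `X-Robots-Tag` HTTP header (both forms are documented by Google). The Python sketch below only inspects HTML and headers you have already downloaded; the class and function names are hypothetical, and fetching the live page is left out.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            self.directives += [d.strip().lower()
                                for d in attr.get("content", "").split(",")]


def has_noindex(html: str, headers: Optional[Dict[str, str]] = None) -> bool:
    """True if the page's meta tags or X-Robots-Tag header say noindex.

    Simplification: a user-agent-scoped header value such as
    "googlebot: noindex" is not recognised.
    """
    finder = RobotsMetaFinder()
    finder.feed(html)
    # HTTP header names are case-insensitive, so normalise the keys.
    lowered = {k.lower(): v for k, v in (headers or {}).items()}
    header_values = lowered.get("x-robots-tag", "").split(",")
    directives = finder.directives + [d.strip().lower() for d in header_values]
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in directives or "none" in directives


print(has_noindex('<head><meta name="robots" content="noindex"></head>'))  # True
```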