Having the same content on an HTML page and in a PDF won’t cause duplicate content issues for your website
In a recent Google SEO office-hours session held on February 18, 2022, Google’s John Mueller shared insights on how Google handles PDF content.
Corina Burri, an attendee of the session, asked John Mueller whether publishing the same content both as an HTML page and as a PDF would have a negative impact on SEO.
Here’s the question:
“I have a question regarding internal duplicate content. So I have the content of a PDF file in a case study. I submit it to my website. Now I want to present it as well in an HTML blog article. Does this have any negative impact for my site, because of duplicate content?”
Google’s John Mueller replied that this would not be treated as duplicate content, because an HTML page and a PDF are different types of documents.
“So we wouldn’t see it as duplicate content, because it’s different content. One is an HTML page, one is a PDF. Even if the primary piece of content on there is the same, the whole thing around it is different. So from that level, we wouldn’t see it as duplicate content”, said Mueller.
Issues with having similar content on an HTML page and in a PDF
Even though similar content won’t trigger duplicate content issues, it can still have a side effect you may not want. Mueller described a scenario in which the HTML page and the PDF both appear in the search results and end up competing with each other.
“I think, at most, the difficulty might be that, in the search results, it can happen that both of these show up at the same time. And whether or not you want that to happen, that’s more almost a strategic question on your side. So from my point of view, I wouldn’t see it as a negative when it comes to SEO. But maybe you have strategic reasons to have either the PDF or the HTML page more visible”, said Mueller.
Here’s how to prevent your PDF from appearing in search results
There are two ways to keep your PDF from showing up in Google’s search results:
Use a canonical tag
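A PDF has no HTML head section, so you can’t add a canonical link element to it. Instead, send a Link HTTP header with rel="canonical" in the PDF’s response, pointing to the main HTML page, for example: Link: <https://www.example.com/case-study>; rel="canonical". Google treats a canonical as a strong hint rather than a strict rule, so it will usually, but not always, show the HTML page instead of the PDF.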
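As a rough illustration of the canonical header, here is a minimal sketch of a Python WSGI app that serves a PDF with a rel="canonical" Link header. It assumes your site can run Python WSGI code; the path /case-study.pdf, the URL https://www.example.com/case-study and the placeholder PDF bytes are all hypothetical stand-ins for your own files. Most web servers and CMSs can add the same header through their own configuration instead.

```python
# Hypothetical values: replace with your real PDF path and HTML article URL.
PDF_PATH = "/case-study.pdf"
CANONICAL_URL = "https://www.example.com/case-study"


def canonical_link_header(url: str) -> str:
    """Build the value of a Link header that marks `url` as the canonical version."""
    return f'<{url}>; rel="canonical"'


def app(environ, start_response):
    """Minimal WSGI app: serves a placeholder PDF with a canonical Link header."""
    if environ.get("PATH_INFO") == PDF_PATH:
        # Placeholder bytes only; a real site would read the actual PDF file.
        body = b"%PDF-1.4\n% placeholder, not a real PDF\n"
        start_response("200 OK", [
            ("Content-Type", "application/pdf"),
            ("Content-Length", str(len(body))),
            # Tells search engines the HTML page is the preferred version.
            ("Link", canonical_link_header(CANONICAL_URL)),
        ])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]
```

To try it locally, you could pass app to wsgiref.simple_server.make_server("", 8000, app), call serve_forever() on the result, and request http://localhost:8000/case-study.pdf with curl -i to check that the Link header is present.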
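Set a noindex directive
You can also send an X-Robots-Tag: noindex HTTP header with the PDF. Once Google recrawls the file and sees this header, it keeps the PDF out of its index, so only the HTML page can appear in the search results.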
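The noindex option can be sketched the same way. The example below is a minimal, hypothetical Python WSGI middleware that adds X-Robots-Tag: noindex to every response whose path ends in .pdf and leaves all other pages untouched. It assumes your site runs as a WSGI app; demo_app is only a stand-in for your existing site.

```python
def noindex_pdfs(wrapped_app):
    """WSGI middleware: add 'X-Robots-Tag: noindex' to responses for *.pdf paths."""
    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            # Only PDFs get the noindex header; HTML pages stay indexable.
            if path.lower().endswith(".pdf"):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return wrapped_app(environ, patched_start_response)
    return middleware


def demo_app(environ, start_response):
    """Hypothetical stand-in for an existing site: same tiny body for every path."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [b"placeholder"]


# Wrap the site so every PDF response carries the noindex header.
app = noindex_pdfs(demo_app)
```

Applying the header by file extension means new PDFs are covered automatically, which is simpler to maintain than adding a canonical target for each document one by one.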
Key Takeaway
There are good reasons to publish the same content both as an HTML page and as a PDF. As Google has confirmed, this doesn’t count as internal duplicate content. However, because the two versions are so similar, both may rank in the search results at the same time. If you don’t want your PDF to compete with the page you are targeting, add a canonical Link header or an X-Robots-Tag: noindex header to the PDF, as described above.