How not to use JavaScript for SEO?

The use of JavaScript in modern web development and design is inevitable. JavaScript improves interactivity and is responsible for visually engaging websites. However, if a website relies heavily on it and is not optimised so that Google can crawl and index it properly, having a modern-looking website will be worth nothing.

If a site is not indexed in Google or any other search engine, it surely won't get traffic. And, in order for the site to be indexed, the content must first be visible to search engines and crawlable.

So, what should we avoid, and how can we use JavaScript with SEO in mind, so that our site is interactive and modern while at the same time crawlable and indexable?

Your site can be entirely or only partially reliant on JavaScript. However, even when the entire website is not built with JavaScript, I have come across situations where the Main Content or important resources are dynamically injected and heavily reliant on JavaScript execution.

In most of these instances, JavaScript is rendered client-side, so the content only becomes visible to search engines after it has been rendered in the browser, which is not a good practice. This means that the Main Content is not visible to search engines and often depends on user interaction to appear. Besides, client-side rendering impacts page loading times, as it takes longer to render the JavaScript content in the browser.

How to inspect this?

You can inspect whether the site relies on client-side rendering by looking at the HTML code of the page. Usually, pages heavily reliant on client-rendered JavaScript have very little code in the raw HTML.
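For illustration, the raw HTML of a heavily client-side rendered page often looks something like the simplified snippet below (the file name and element id are hypothetical): the body is almost empty, and the Main Content only appears once the JavaScript bundle executes in the browser.

<!DOCTYPE html>
<html>
  <head>
    <title>Product category</title>
  </head>
  <body>
    <!-- Nothing meaningful for search engines here: the content is injected later by app.js -->
    <div id="root"></div>
    <script src="/app.js"></script>
  </body>
</html>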

Tool: View Rendered Source

Another way to inspect the code is by running a crawl in Screaming Frog and comparing the rendered code against the raw HTML. In order to do this, you must ensure that you enable JavaScript rendering (Configuration > Spider > Rendering). In addition, enable the crawler to store rendered HTML (Configuration > Spider > Extraction).

How to fix?

Turn away from client-side rendering and opt for a different rendering option, such as server-side rendering, dynamic rendering or static site generation. Which of these is most suitable will depend on multiple factors, in particular how often your content is updated. However, any of these is better than client-side rendering.

How to know which solution to use?

This will depend on what the easiest deployment option is given the current site setup, but also on how often your content updates. Sites with less frequent content updates should go for static site generation, as this is the fastest option.

(Image: SEO-friendly JavaScript rendering options)
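As a rough sketch of what server-side rendering looks like in practice, the hypothetical Node.js/Express handler below returns the full HTML, including the Main Content, in the initial response, so crawlers do not depend on client-side JavaScript execution. The route, product data and markup are invented purely for illustration.

const express = require('express');
const app = express();

// Hypothetical product data; in a real setup this would come from a database or an API.
const products = { shoes: { title: 'Shoes', description: 'Our full range of shoes.' } };

app.get('/products/category/:slug', (req, res) => {
  const product = products[req.params.slug];
  if (!product) return res.status(404).send('Not found');

  // The content is already present in the HTML the server returns,
  // so search engines can see it without rendering any JavaScript.
  res.send(`<!DOCTYPE html>
<html>
  <head><title>${product.title}</title></head>
  <body>
    <h1>${product.title}</h1>
    <p>${product.description}</p>
  </body>
</html>`);
});

app.listen(3000);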

JavaScript should not be used to serve different content versions to users and search engines. For example, if you feature content in Spanish in the raw HTML on the static version of the URL, do not rely on JavaScript to inject the English content into that same URL, as only one version will be visible to search engines.

In this example, JavaScript is altering the content, so we have different content featured on the same URL: one version is only visible to the user (English), while search engines are not able to see it.

The only content present in the raw HTML is the Spanish version, so this is the only content search engines will see.
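As a simplified illustration of this anti-pattern (the markup and script are invented for the example), the raw HTML below only contains the Spanish copy, while a script swaps it for English in the browser, so the English version never exists for crawlers:

<div id="intro">
  <!-- Only this Spanish copy exists in the raw HTML, so it is all search engines see -->
  <p>Bienvenido a nuestra tienda online.</p>
</div>
<script>
  // Client-side swap to English for the user; crawlers reading the raw HTML never see this version
  if ((navigator.language || '').startsWith('en')) {
    document.querySelector('#intro p').textContent = 'Welcome to our online shop.';
  }
</script>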

(Image: snapshot of a page serving content in English)
(Image: raw HTML of the same page, with the content present only in Spanish)

This can be interpreted as cloaking, because we are showing different content to users and search engines. Cloaking is a black-hat SEO technique and can result in penalties!

Moreover, this setup generally has a negative impact on performance, because only one version of the content exists for search engines. If we want the English content to rank and get traffic, this setup achieves the complete opposite, as that content is not being crawled or indexed.

How to inspect and detect this issue?

Start by inspecting the indexation of the pages. If there are issues with Google not being able to see your content, this will be reflected in the indexation, as these pages won't be indexed.

Tools: Screaming Frog, Google Search Console, SERP

Check and com­pare raw and ren­dered HTML code. Is the con­tent vis­i­ble at all? What is the con­tent present in the HTML code?

How to fix?

Create unique URLs for each language version, with unique content in the raw HTML. In this particular case, create a unique URL path for the English version, where the content will be found in the raw HTML. Likewise, the Spanish version should be served in a dedicated site section for that language. This way, there will be two unique URLs, each featuring content in the raw HTML for its audience.
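A minimal sketch of this fix, assuming hypothetical /en/ and /es/ paths, is to give each language its own URL with its content in the raw HTML and to cross-reference the versions with hreflang annotations:

<!-- https://example.com/en/shoes : English content in the raw HTML -->
<link rel="alternate" hreflang="en" href="https://example.com/en/shoes" />
<link rel="alternate" hreflang="es" href="https://example.com/es/zapatos" />

<!-- https://example.com/es/zapatos : Spanish content in the raw HTML -->
<link rel="alternate" hreflang="en" href="https://example.com/en/shoes" />
<link rel="alternate" hreflang="es" href="https://example.com/es/zapatos" />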

Blocking crawlers from accessing JavaScript or resources such as CSS with the robots.txt disallow directive is such a common error! I cannot highlight enough how important it is that search engines are able to fetch and crawl these resources, so they can render the pages correctly.

If internal JavaScript (and CSS) or other important resources, such as images, are blocked, search engines cannot crawl and render them, meaning that any content on the page depending on these resources won't be visible and, as a result, won't be indexed either.

This happens so often! For example, the MANGO shop website's robots.txt is blocking Google from accessing and crawling images, so Google is not able to see any of the images on their pages (see below).

I checked their robots.txt file and identified the directive blocking search engines from accessing these resources. Hopefully, they will see this post and fix the issue! 😄

(Image: Mango shop page with images not crawlable)

The image above depicts how Google currently sees this page and, as we can see, the image is not accessible.

There is a directive in their robots.txt (see below) that is blocking Google from crawling these resources. For this reason, when creating disallow directives in your robots.txt file, you should always ensure that important site sections and resources such as JavaScript, CSS or images remain crawlable.

Tool: Robots.txt Validator

(Image: section of the Mango robots.txt file)

How to fix?

Ensure that you check your robots.txt file and remove any directive blocking search engines from crawling JavaScript (and CSS) resources.
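For example, directives along the lines of the first block below (the paths are hypothetical) would keep Googlebot away from the JavaScript, CSS and image files the pages need; the second block keeps private sections blocked while leaving those resources crawlable:

# Problematic: rendering resources are blocked
User-agent: *
Disallow: /assets/js/
Disallow: /assets/css/
Disallow: /images/

# Better: block only what genuinely should not be crawled
User-agent: *
Disallow: /checkout/
Allow: /assets/js/
Allow: /assets/css/
Allow: /images/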

I have come across this practice more than once. Using fragment URLs (#) for pagination, or for loading any kind of content we want indexed, is bad practice. The simple reason for this is that Google does not recognize any content that appears after the # in the URL.

URL fragments can be used to anchor to different sections on the page, for example, but not to introduce new content, as for Google this content won't exist.

How to fix?

When it comes to pagination, we ideally want each page in the paginated sequence to be indexed, which means that pagination introduced with fragments is not an option, and parameters should be used instead. See Google's documentation on Pagination Best Practices.
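To make the difference concrete, the hypothetical links below behave very differently for Google: everything after the # is ignored, while the parameter-based version gives each page in the sequence its own indexable URL.

<!-- Not indexable as a separate page: Google ignores everything after the # -->
<a href="https://example.com/blog#page=2">Page 2</a>

<!-- Indexable: each page in the paginated sequence has a unique URL -->
<a href="https://example.com/blog?page=2">Page 2</a>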

When to use # URLs then?

Fragment (#) URLs can be used for sorting, filtering and faceted navigation, as well as for anchoring to different sections on one page.

If you rely on JavaScript to insert links into a page dynamically, they will only be crawlable for Google if you use any of the following HTML markup:

<a href="https://example.com">

<a href="/products/category/shoes">

<a href="./products/category/shoes">

<a href="/products/category/shoes" onclick="javascript:goTo('shoes')">

<a href="/products/category/shoes" class="pretty">

Google can generally only crawl links within the <a> HTML element with an href attribute. Thus, if links are included in the content in any other way, they won't be crawled, which will lead to indexation issues.
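Conversely, markup along the lines of the examples below (common patterns, not taken from any specific site) will generally not be crawled as links, because there is no <a> element with a resolvable href attribute:

<span onclick="goTo('shoes')">Shoes</span>

<a onclick="goTo('shoes')">Shoes</a>

<a routerlink="/products/category/shoes">Shoes</a>

<button onclick="location.href='/products/category/shoes'">Shoes</button>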

The main navigation is super important for crawling and indexing. The most important site links are found in the main navigation, signaling to Google that these are the pages we want to be found in the SERP.

If the links in our main navigation are only visible after JavaScript is rendered client-side, search engines won't see the links until the code is rendered in the browser. This means that the links in the main navigation will basically be invisible to search engines.

How to fix?

The links in the main navigation should always be present in the raw HTML. This is the first part of the site Google will crawl and, apart from navigational links, other important elements, such as a language selector, can be found in the main navigation.

Having in mind how important the navigation is, the links here should always be present in the raw HTML in an <a> HTML element with an href attribute.
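As a minimal sketch, a main navigation like the hypothetical one below keeps every important link, including the language selector, visible in the raw HTML regardless of how JavaScript later enhances it:

<nav>
  <ul>
    <li><a href="/products/category/shoes">Shoes</a></li>
    <li><a href="/products/category/bags">Bags</a></li>
    <li><a href="/en/">English</a></li>
    <li><a href="/es/">Español</a></li>
  </ul>
</nav>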

✔️ Move away from client-side rendering to server-side rendering or any other more SEO-friendly method.

✔️ Ensure that each page has a unique URL path with unique content featured in the raw HTML, and avoid using JavaScript to serve different content versions.

✔️ Don't block JavaScript, CSS or images from being crawled. Ensure that robots.txt does not block any of these important resources.

✔️ Don't use # URLs in internal linking or pagination on the site, as fragment URLs are not recognized by Google as unique.

✔️ Use the <a> HTML element with an href attribute for internal linking, in particular in the main navigation.
