Information Architecture SEO

Optimising your website’s information architecture improves site crawlability and indexation on search engines.

In this guide, we will introduce advanced SEO concepts that go to the very heart of your website: its architecture, code, content organisation and data. This goes well beyond keyword research and content optimisation.

We’ll also touch on tools and resources to improve the technical SEO aspects of your website, such as the Screaming Frog SEO Spider, Google PageSpeed Insights and Chrome Developer Tools.

By the end, you’ll learn advanced SEO techniques to build an SEO-friendly website from scratch and understand how Google sees your pages.

What is an SEO optimised website architecture?

In today’s Information Age, modern search engines rule the web. And we know how ranking well on the big G can have a serious impact on your business’ bottom line.

As everyone competes to get ahead in search result pages, we invest in quality content to attract the most relevant traffic to our site.

Think of all the content, blogs, videos, ebooks and web pages on the Internet as bait on a fishing rod. Technical SEO is in some ways like a fishing rod. You can have the highest quality bait ever, but if you’re fishing with a branch and shoestring, then you’re setting yourself up for failure.

An SEO optimised website architecture is like a fishing rod made from the Elder Tree (a Harry Potter reference). It sets the technical foundation for all your SEO efforts to get the right traffic.

This optimisation process is a departure from content development and off-page SEO. We’ll look at creating an SEO optimised information architecture and the inner workings of your website.


#1 Indexable Content

Search engines interact with your web content in two ways: crawling and indexing.

Crawling is the process of finding a page to analyse its content.

Indexing is the storage of that page and its related information in the search engine’s database.

These two functions of crawling and indexing are the premise of technical SEO.

So what is indexable content?

Proper indexing means web crawlers are able to parse all the content on a page. But the way web crawlers see your website is different from how we see it from a user perspective.

If you’re not in Google’s index, you’re not ranking. If you’re in the index but not indexed correctly, you’re not ranking where you want to be.

For example, say you’ve built an incredible website about the best macarons in France.

But your primary content is a featured image without an alt tag. An image is not real HTML text, so Google doesn’t know what that image is about.

Then, you’ve got a paragraph of text supported by another body of text about ingredients, but that supporting text is hidden in a collapsed text block.

As it stands, Google doesn’t have enough context to know that your page is about the best macarons in France. Googlebot has a hard time parsing the page and figuring out whether it’s about macaron recipes, equipment for macarons, or how to bake macarons.

Even though your site was crawled successfully, you won’t be indexed or ranked for the keyword terms you’re targeting.
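The fix is to give Googlebot real HTML text to work with. As a minimal sketch (the file name, alt text and copy below are made up for illustration), that means adding descriptive alt text to the image and keeping the supporting copy visible in the markup:

  <!-- Hypothetical markup: descriptive alt text plus visible, crawlable HTML copy -->
  <img src="macarons-paris.jpg" alt="Pastel macarons from a patisserie in Paris, France">
  <h1>The Best Macarons in France</h1>
  <p>Our guide to the patisseries serving the best macarons in France, city by city.</p>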

Thankfully, that was the situation before Google caught up with crawling sites built on JavaScript.

Use of JavaScript to render content on a site

Originally, Google could not handle websites that rendered content using JavaScript. It could only render and understand the HTML, text and tags laid out on a page. Google wasn’t able to gather much information from JavaScript pages, and if Googlebot can’t understand those pages, it will have a hard time ranking them for keyword queries.

But the use of JavaScript frameworks to render content on websites has advanced quickly.

Since 2008, Google has become incredibly advanced at rendering sites, even those built on JavaScript frameworks. It accomplishes this by using a headless browser to crawl and index content.

Crawling websites with a headless browser is computationally expensive and slower. A website using JavaScript to render its view may not always be evaluated by Google correctly, because it takes time for all the elements on the page to load and Google might misinterpret what it sees.

For example, Googlebot might not know to click on an arrow or menu item to navigate through the content on your homepage. As a result, it might have trouble crawling a website that’s built on JavaScript.
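As a hedged illustration of that problem (the markup and the loadPage() handler below are hypothetical, not taken from any particular site), compare a menu item that only works through a JavaScript click handler with a plain, crawlable link:

  <!-- Risky: navigation depends on a script running, so Googlebot may never reach the page -->
  <span onclick="loadPage('/macaron-recipes')">Macaron recipes</span>

  <!-- Crawlable: a real anchor tag with an href that Googlebot can follow without clicking anything -->
  <a href="/macaron-recipes">Macaron recipes</a>

Exposing navigation as standard anchor tags with real href attributes is the simplest way to keep a JavaScript-heavy site crawlable.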

The question is how can you make your JavaScript website more search engine friendly?

Review how search engines view your website with Screaming Frog and Google Search Console

It’s important to take the time to detect and analyse the visual differences between how Googlebot sees your page and how users see your page.

We can’t know for sure how Google sees your website, but we can check with Screaming Frog.

Screaming Frog Rendered View

In Screaming Frog, click on ‘rendered page’ to see how the crawled page compares with the page loaded normally in a browser.

Screaming Frog JavaScript Crawl

In Screaming Frog, under ‘configuration’ -> ‘spider’ -> ‘rendering’:

  • set rendering to JavaScript
  • make sure the box to enable rendered page screenshots is ticked


This way, you will be able to select the rendered page view and compare it with how the page normally looks in your browser.

Another good way to check is Google Search Console.

Google Search Console: ‘Fetch as Google’

Log in to Google Search Console, click on the ‘crawl’ tab and select ‘fetch as Google’.

  • Over here, you can fetch and render as Google does.
  • Leave the URL box blank and click ‘fetch and render’ to fetch the homepage, or enter the URL you want to check.
  • You can select between mobile and desktop to compare the differences.
  • After the fetch and render process is completed, click on the ‘complete’ status and you’ll see a comparison of how Googlebot rendered your page and how users see it.

#2 Improving site indexation

Test your site’s index status by crawling your website as Google would.

For example, let’s crawl Moving Jar, a travel and lifestyle blog.

Using Screaming Frog to check Indexation Status

In the configuration settings, we want to set things up for a cleaner crawl:

Spider settings:

  • check ‘links outside of start folder’
  • check ‘crawl canonicals’
  • check ‘extract hreflang’
  • uncheck Images, CSS, JavaScript, SWF, External Links
  • uncheck the other options


Robots.txt settings

  • check ‘show internal URLs blocked by robots.txt’
  • uncheck the other options


After the crawl is completed, we see that we’ve crawled (XXXX) pages. This means (XXX) pages on my website are accessible to Google.

Using a Google site: search to check Indexation Status

Just open Google and type site:movingjar.com

Here, we can see there are about 176 results. Okay, that’s pretty close to (XXXX).

Verify Indexation Status using Google Search Console

Now, this is the best way to check and verify your website’s indexation status.

After you’ve verified your web property on search console, look at the index status and see how many pages Google has actually indexed.


Okay, we see that Google has indexed 519 pages. Now I’m feeling pretty confident that things are going well.

Because if I had crawled my website expecting 1,000 pages to be indexed, and Google had only indexed 519, then something would be wrong and I’d need to start investigating indexation errors.

#3 Crawlability

The crawlability of your website determines how likely it is that Google will find all the pages you want indexed.

The Internet is huge and new content pops up every second. While Google has an incredible technology stack, they’re up against an infinite amount of content being created. With finite resources, Googlebot can only crawl a limited portion of web pages.

Web crawlers have a limited amount of time to crawl and index your site – also known as crawl budget.

Check your Crawl Stats in Google Search Console

You can find out your website’s crawl budget on Google Search Console, under ‘Crawl’ -> ‘Crawl Stats’. This is the average number of pages on your website that Google crawled per day.

Here we can see:

  • pages being crawled per day – we can see a decreasing trend in pages being crawled for this site which might indicate issues
  • kilobytes downloaded per day
  • time spent downloading a page – it’s a good sign if Google is spending less time downloading your pages, meaning the site is getting faster

Improve crawlability with internal linking

Search engines find new or updated pages by using web crawlers to follow links on webpages. They go from link to link and bring information about the content of those webpages back to Google’s servers.

If you have orphaned pages with no internal links pointing to them, Google will not be able to find and index them.

Remember that Google needs direct, crawlable links between all your pages in order to crawl them.

And a well optimised crawl is built upon good internal linking structures that web crawlers can navigate.


For instance, the homepage has links pointing to pages A and D, but no internal links point to pages B and C. Even though pages B and C are on the site, there are no crawlable internal links connecting them, so as far as Google is concerned these pages don’t exist: web crawlers simply can’t reach them.
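As a minimal sketch (the URLs are made up), fixing this is as simple as adding standard anchor tags somewhere crawlable, such as the site navigation, so pages B and C are connected to the rest of the site:

  <!-- Hypothetical homepage navigation: every important page gets a crawlable internal link -->
  <nav>
    <a href="/page-a">Page A</a>
    <a href="/page-b">Page B</a>
    <a href="/page-c">Page C</a>
    <a href="/page-d">Page D</a>
  </nav>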

Common problems in search engine crawlability

Many crawl problems come from URL structure.

  • Sorting parameters and session IDs in URLs.
    • Since such parameters don’t change the content of a page, they can be removed from the URL structure and stored in a cookie instead.
    • When we put this sorting/tracking information into a cookie and 301 redirect to the original URL, we end up with fewer URLs pointing to the same content. This eliminates duplicate content and keeps a cleaner information architecture on our site.
  • Additive filtering in e-commerce sites
    • In online stores or hotel sites, notice that as you apply different filters, the URL structure changes. Additive filtering creates many different list URLs for the same content.
    • Ideally, Google shouldn’t need to access many different list URLs to get to the same content: one piece of unique content should only be available via one URL (see the canonical tag sketch below).
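Alongside redirects, a common way to consolidate parameterised URLs is a rel="canonical" tag. As a hedged sketch with made-up URLs, a filtered or sorted listing page can point search engines back to its clean version:

  <!-- Placed in the <head> of /dresses?sort=price&sessionid=123 to consolidate signals on the clean URL -->
  <link rel="canonical" href="https://www.example.com/dresses">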

#4 Robots.txt

The robots.txt file is the first file on your website that web crawlers will access.

When Googlebot wants to visit your site, it first accesses robots.txt to check whether there are any pages it shouldn’t visit.

The robots.txt lives at the root directory. If you have several subdomains, you’ll have to add a robots.txt file for each subdomain.

In this case, I’m looking at craigslist’s robots.txt file, and it’s simply at http://craigslist.org/robots.txt


Notice that this file tells search bots, “I want to disallow you from crawling any of these directories”.

Indexing problems usually arise when you disallow a particular directory or subdirectory from being crawled, or when you accidentally block JavaScript or CSS files from being crawled.

Another issue is not having any robots.txt file at all, which basically tells search bots, “Go ahead, crawl my entire website.” This isn’t normally a problem for small websites, but on large sites you’ll want to maximise crawl efficiency by blocking low-value and unnecessary areas from being crawled, so the more important sections of your website can be crawled and indexed faster (a simple example follows below).
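As a hedged sketch (the directory names and sitemap URL are made up), a simple robots.txt that blocks low-value areas while leaving the rest of the site open to crawlers might look like this:

  # Hypothetical robots.txt: block low-value areas and point crawlers at the sitemap
  User-agent: *
  Disallow: /cart/
  Disallow: /internal-search/

  Sitemap: https://www.example.com/sitemap.xml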

Check your robots.txt file from Google Search Console


Log in to Search Console and go to ‘Crawl’ -> ‘robots.txt Tester’

  • Check for any errors or warnings that might have popped up
  • Use the robots.txt Tester from time to time to make sure your file is keeping your crawls clean and organised

#5 Site information architecture

A good website information architecture is well thought out for both user experience and search engines.

In terms of user experience, your website should be easy to navigate and visitors can find what they need within a few clicks. It should be intuitive and structured with good internal linking, so users won’t be confused about where to click and how to find the information they want.

To do this, we need to design or redesign a site taxonomy.

Creating a content site taxonomy

Think of your site taxonomy as a content hierarchy that starts with the parent topic then branches out to more specific details and sections.
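As a made-up illustration, a simple taxonomy for a baking blog might branch out like this:

  Home
  ├── Recipes
  │   ├── Macarons
  │   └── Croissants
  ├── Equipment
  │   └── Baking tools
  └── About

Each branch should map to a theme or keyword group you want that section to rank for.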

Creating a site taxonomy is a necessity for large websites and businesses, especially those with multiple product lines and target audiences.

Method 1: create with internal team 

  • List all the content pages you have, including terms and conditions
  • Define how this content is reached, by primary or secondary navigation.
    • Your primary navigation should be for the most important and most frequently visited content pages
    • Your secondary navigation consists of the less important pages and it lives in the footer. This could include your contact details, links to other locations, etc.
  • Outline themes and keywords you’ll target for each section.

Method 2: create with users/customers

  • Use card sorting – ask your users to help you evaluate the information architecture of your site
    • Open sorting: Participants come up with topics/themes that are intuitive to them and sort the content into those categories
    • Closed sorting: You create the topic categories in advance, because you’ve done the keyword research and they accurately describe your business. Then your audience simply sorts the various content on your site into those pre-determined categories.
    • Through card sorting, your audience determines which categories and topic names resonate with them. This helps you understand their expectations of your business.

My recommendation is to start with an open card sort, then after the topic categories are decided, use a closed sort to see how everything works.

It’s also helpful to do another round of keyword research against the taxonomy after it’s finalised.

For instance, high-level topic categories will likely have more search volume. As your categories get more specific, you might discover the different search intents and purchase journeys your customers have.

Combine keyword research and content strategy here to get the best information flow.

#6 HTTP vs HTTPS

It’s important that your website makes the move to HTTPS.

HTTP stands for Hypertext Transfer Protocol.

It’s designed to enable two different systems to communicate, so a web browser can communicate with a web server to deliver a website. The protocol defines how information should be transmitted and what actions web servers and browsers should take in response to commands.

HTTP transfers everything openly in plain text; it’s like a postcard. The details of your message are visible to anyone who handles the postcard and anyone who comes in between, before it even arrives at its destination.

If you want your content to be secure, you need to protect and hide its contents.

That’s where the S in HTTPS comes from: secure.

HTTPS serves the same purpose of transmitting information, but it encrypts that information before delivering it. HTTPS also provides authenticity: you need to get and install an SSL certificate to enable that encryption and to prove that the website is who it says it is, and that nobody is impersonating it in an attempt to intercept the message.

This is important: encrypted traffic and HTTPS matter to Google.

“We are investing and working to make sure that our sites and services provide modern HTTPS by default. Our goal is to achieve 100% encryption across our products and services.”

– Source: Google Transparency Report, HTTPS encryption on the web.

Now it should be clear that having a secure website with HTTPS is not only a security measure but also a direction that Google is actively working towards.

Migrating from HTTP to HTTPS

If you’re migrating your website from HTTP to HTTPS, you’ll need to be very careful.

Make sure that after installing an SSL certificate on your site, the code libraries and file hosting services you load are also served securely. This ensures your website will validate as true HTTPS.

Checklist after migrating to HTTPS:

  • Verify the new HTTPS URL in Google Search Console and select your default as either www or non-www
  • Submit a new sitemap to Google Search Console after verifying your sitemap is listing your web pages as HTTPS
  • Implement 301 redirects from HTTP to HTTPS (see the sketch after this checklist)
  • Update all internal links so you’re pointing to HTTPS
  • Check that all canonical tags are using HTTPS
  • Check that robots.txt references are correct
  • Check and resubmit any disavows done to Google on the HTTPS domain
  • Update and re-verify analytics platform and tracking tags as necessary
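For the redirect step mentioned above, here is a hedged sketch assuming an Apache server (the rules are generic, not specific to any particular host); a site-wide 301 from HTTP to HTTPS can be handled in .htaccess:

  # Permanently (301) redirect every HTTP request to its HTTPS equivalent
  RewriteEngine On
  RewriteCond %{HTTPS} off
  RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]

If your host or CDN offers a force-HTTPS setting, that achieves the same result without editing server config.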

Yes, this is definitely no easy feat. Messing up your HTTPS migration will cause mixed content warnings on your site, and a sloppy migration usually hurts web performance too.

Bonus tip:

For WordPress sites: I highly recommend using Siteground and Cloudflare. There’s really nothing more reliable, plus their customer support is fantastic. Engage their services and leave the hard technical work in their good hands.

#7 Sitemaps

A sitemap lays out all the content on your website. This helps search engine crawlers and users find the information they need when they visit your site.

There are two types of sitemaps:

  • HTML sitemap
  • XML sitemap

The HTML sitemap is key for both visitors and search engine crawlers.

The XML sitemap is only important for search engine crawlers. It gives search engines an easy way to see all the links on your site, how often they are updated and what you see as the overall priority.
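As a hedged sketch (the URL, date and values are made up), a minimal XML sitemap with a single entry looks like this:

  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
      <loc>https://www.example.com/best-macarons-in-france/</loc>
      <lastmod>2018-06-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
    </url>
  </urlset>

Submit the sitemap’s URL in Google Search Console so crawlers can find it easily.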

 

Final takeaway

An optimised website gives you a real competitive advantage. From user experience to content structure and page speed, these factors all play a significant role in how you rank.