How does your website appear to search engines? Have you crafted proper titles and descriptions? Do you need a bird’s-eye view of webpage information?
Screaming Frog SEO Spider is a free website page auditing tool for crawling, analysing and highlighting how exactly a search engine like Google sees your website.
Enter any URL and the software fetches all onsite elements, presenting the information in rows and columns, like a spreadsheet. There are loads of uses for the tool even under the free plan.
It’s not just for public websites either. Do you build your websites offline in a localhost?
I use WAMP to develop sites locally, and I still use Screaming Frog to comb them for errors such as dead links producing 404 pages. It will also show missing meta information.
On Page Criteria
Enter the address of the website you’d like to crawl and click Start.
Once you’ve finished crawling a website you can scroll down to see how many pages have been included in the analysis.
Scroll to the right and you’ll see various columns about the website page address, the type of content it is, status code, page title, meta description, headings, meta robots, canonical links, size of page, word count, internal links and external links.
This all sounds like a lot to digest, and it is!
There’s an official user guide on the Screaming Frog site and I suggest you refer to it if you want some documentation from the horse’s mouth.
However, if you want to hear me put it all in my own words (of course you do!) read on. 🙂
Resource Info Summary
Each of the rows represents a “resource”. Resources include, among other things, webpages and images.
If you click a row, the lower pane of the Screaming Frog interface displays information about that resource.
The same information is located in the horizontal columns at the top. I’ve broken down each column one by one, below…
Address Column
The address column shows the URI, which stands for uniform resource identifier. This is computing lingo for something uploaded to your server, whether it be text, image, video, audio or a script.
The screenshot below shows rows containing each URI:
The most common type of URI, also known as a URL, points to an HTML webpage. Screaming Frog comes equipped with another column to the right of the Address column showing what type of URI we are looking at.
Content Column
The Content column sits to the right hand side of the Address column. Each row indicates what type of content is on each URI.
In the screenshot below we have a mixture of HTML webpages (with character set information), images (JPEG files, in this case) and applications (JavaScript).
Status Code & Status Column
The next two columns return the status of each URI. A status code of 200 means that everything is functioning successfully and that search engines are able to fetch those particular resources.
A 404 (Not Found) suggests resources from a particular URI are unavailable. There are several possible reasons.
A webpage might have been deleted, or the URI might have been changed to something else while a page on your site is still trying to link to that resource.
Other reasons for seeing a 404 may be because the URI is included in your website sitemap or exists only as a canonical link with no actual page.
Click the row and use the lower pane to look for clues as to the cause of the 404:
Clicking the Inlinks and Outlinks buttons will normally reveal where the error is originating:
The cause of the errors shown above was simply that I had changed the page slug on a blog post and forgotten to update the corresponding canonical link. Tut tut for me! But I found the error thanks to this tool.
One more example: An eBook download link was broken. I could tell instantly by looking at the Inlinks information that I must be linking from an image, because there was no Anchor Text.
There was only ALT text (which I add to all my images) and that was the clue I needed to check the images on that particular webpage to locate the broken link and fix it:
By the way, I recommend looking at this useful list of HTTP status codes if you want to know what the most common ones mean.
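If you’d rather look up a code without leaving your terminal, Python’s standard library ships with the registered HTTP status codes. Here is a minimal sketch; the function name `describe_status` is my own invention and has nothing to do with Screaming Frog itself:

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return 'CODE Phrase' for a registered HTTP status code."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # The code is not in Python's table of registered statuses.
        return f"{code} (unregistered status code)"
    return f"{status.value} {status.phrase}"


# Print the phrase for a few codes you are likely to meet in a crawl.
for code in (200, 301, 404, 500):
    print(describe_status(code))
```

This only translates a code into its standard phrase. Working out why a URI returned a 404 still means checking the Inlinks and Outlinks as described above.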
Title Column
The next column is perhaps one of the most important, since webpage titles are the headlines used in search engine snippets.
Page titles are what search engines use to help decide which results to return, and they are what users see when choosing which result to click.
If you’re using WordPress, do not confuse the webpage title (the HTML title tag) with the WordPress post/page title. The webpage title can be seen at the top of your browser by hovering the mouse cursor over one of the tabs like so:
I cannot emphasise enough how important titles are. And that leads me to the next column…
Title Length Column
A good page title should ideally fit into search engine results snippets without being truncated. If a title is cut off, the people reading it won’t get the whole title and might not click.
Keeping page titles short is good practice, and if you keep them to around 55 characters they are likely to fit.
The column for title length in Screaming Frog is invaluable because it lets you see where you can make improvements.
In 2014 Moz released a handy tool for inputting title tags and generating realistic Google preview snippets. They also emphasised the importance of understanding that there really is no “magic number”, but that on average, 55 characters gives some wiggle room.
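If you export your titles, the same length check is easy to script. The sketch below assumes you have the titles in a plain dictionary of URL to title; the sample data and the name `titles_over_limit` are made up for illustration, and 55 is only the rough guideline discussed above, not a hard rule:

```python
TITLE_LIMIT = 55  # rough guideline from the text, not a magic number


def titles_over_limit(titles, limit=TITLE_LIMIT):
    """Return {url: length} for every title longer than `limit` characters."""
    return {url: len(title) for url, title in titles.items() if len(title) > limit}


# Hypothetical crawl data for illustration only.
crawl = {
    "/": "Small Biz Geek",
    "/long-post/": "A Very Long Blog Post Title That Keeps Going And Will Probably Be Truncated",
}
print(titles_over_limit(crawl))
```

Anything the function returns is a candidate for a shorter rewrite.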
Another reason for keeping page titles short is that “Tweet This” style Twitter widgets on websites usually fetch page titles when a user shares the content.
Keep your titles shorter to ensure they fit into the 140 character Twitter limit when someone tweets your stuff.
Title Pixel Width Column
This column is interesting because often we should be more concerned with title pixel width than with title character length. Why?
Well, characters come in different shapes and sizes, as well as uppercase and lowercase. It’s all very well saying the ideal character limit for search engine titles is 55ish characters but in 2014 Google changed the text size of titles in search results, making the font larger.
I have found 480 pixels is a fairly safe width for containing page titles.
Meta Description Column
If meta descriptions haven’t been crafted for a particular webpage, you’ll know about it when you run the software.
When we were discussing page titles earlier, I showed a screenshot from the Google search engine results page. Now let me highlight the meta description of that page:
Each description should be unique and provide a summary of what each page is about. Assuming you have added these, they’ll show in the corresponding Screaming Frog column.
For the eagle-eyed readers, you might have spotted a sentence fragment in the screenshot description above! Yes, I noticed that in the meta description column earlier and have now fixed it!
- By the way, if you think you’ve spotted a missing description, it might be because those rows correspond to a resource that is not a webpage, in which case, no description is necessary.
Like page titles, meta descriptions should be kept to a certain length, and that’s what the next column is for.
Meta Description Length Column
I suggest a meta description length of no more than 160 characters. As you can see in the screenshot below, some of mine have exceeded the recommended length but not by much.
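To check a single page by hand, you can pull the meta description out of the HTML and measure it. This is a rough sketch using only Python’s standard library; the class and function names are my own, and it is not how Screaming Frog itself is implemented:

```python
from html.parser import HTMLParser

DESCRIPTION_LIMIT = 160  # the suggested maximum from the text


class MetaDescriptionParser(HTMLParser):
    """Remember the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "description":
            self.description = attr.get("content") or ""


def description_length(html):
    """Return the description length, or None if the page has no description."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    return None if parser.description is None else len(parser.description)


# Hypothetical page source for illustration only.
page = '<html><head><meta name="description" content="A short summary."></head></html>'
length = description_length(page)
print(length, "too long" if length and length > DESCRIPTION_LIMIT else "ok")
```

A result of `None` means the page has no description at all, which is a different problem from one that is too long.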
Meta Description Pixel Width Column
As with title tags, character limits for webpage meta information can often be a moot point. The pixel width is the amount of space the characters used in the description take up.
There are a number of useful filters you can run, in addition to rearranging columns, that reveal if you’ve hit or exceeded the pixel width truncation limit for the description.
As of August 2015 – at least with version 4.1 of Screaming Frog – the pixel width truncation limit is set at 928.
Meta Keywords Column
If the webpage you’re analysing uses meta keywords, they will be displayed here. The larger search engines pay little attention to meta keywords and have not done so since 2009.
Meta Keywords Length Column
One way for a webmaster to hurt their own site is to “stuff” keywords in an attempt to gain some ranking advantages. Too many keywords send up a red flag for the search engines.
If you do choose to include keywords, 10 – 12 key phrases is probably plenty. This translates to around about 100 – 150 characters maximum.
H1 Column (First Occurrence)
Heading 1 tags are also known as H1. Screaming Frog looks for two occurrences of the H1 tag and denotes them as H1-1 and H1-2.
The screenshot below shows that the words “Small Biz Geek” are used inside every page, except the “Subscribe” heading which is a full width landing page.
The reason “Small Biz Geek” is used multiple times is because by default WordPress has assigned a H1 tag to the title of my blog.
The Site Title and Tagline are used for H1 & H2 respectively
You can’t actually see those words on the webpage from a user’s perspective because CSS has been used to position the H1 tag well outside of the browser window and out of view.
This is because I replaced the text H1 heading and its smaller H2 strapline with a graphic logo. Although the text is not rendered visible to users, the search engines can still see those words.
Here’s the proof that the HTML H1 tag is alive and well:
H1 Length Column (First Occurrence)
This is the character length of the first occurrence of the H1 tag.
H1 Column (Second Occurrence)
The second occurrence of the H1 tag is denoted as H1-2. Not all websites have multiple uses of the H1 tag and I would say any more than three H1 tags is heading into risky territory. I just made a “heading” joke there! 😛
The titles of my posts and pages use H1 tags since that is what each page is about. Think of Heading 1 tags as hanging a lantern on the words you want to illuminate for the benefit of the search engine spiders.
It is probably worth pointing out that since my blog post titles are identical to webpage titles, the second occurrence of the H1 tag on each of these pages is also the same as the webpage title I talked about earlier.
H1 Length Column (Second Occurrence)
This is the character length of the second occurrence of the H1 tag.
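The H1-1 and H1-2 columns and their lengths can be approximated for a single page with a few lines of Python. This is only a sketch with names of my own choosing (`H1Collector`, `first_two_h1`), and the sample markup is hypothetical:

```python
from html.parser import HTMLParser


class H1Collector(HTMLParser):
    """Collect the text of every <h1> element in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        # Text inside nested tags (such as a link in the heading) is kept too.
        if self._in_h1:
            self.headings[-1] += data


def first_two_h1(html):
    """Return the first two H1 texts with whitespace normalised."""
    collector = H1Collector()
    collector.feed(html)
    return [" ".join(text.split()) for text in collector.headings][:2]


# Hypothetical markup: a site title in the first H1, a post title in the second.
sample = '<h1><a href="/">Small Biz Geek</a></h1><h2>Tagline</h2><h1>My Post Title</h1>'
for number, heading in enumerate(first_two_h1(sample), start=1):
    print(f"H1-{number}: {heading!r} ({len(heading)} characters)")
```

Note that the parser reads the HTML, not the rendered page, so a heading hidden with CSS is still picked up, just as described above.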
H2 Column (First Occurrence)
Heading 2 tags are denoted as H2-1 in their first occurrence. In this case I’ve used the phrase “My passion for small business design, marketing & tech” as my WordPress website tagline.
WordPress has then automatically wrapped the website tagline inside the H2 tag.
As I mentioned a minute ago, I chose to replace the text H1 heading and H2 heading with a graphic logo. When I did, WordPress used CSS to effectively hide these words from the website although they’re still there in the HTML.
H2 Length Column (First Occurrence)
This is the character length of the first occurrence of the H2 tag.
Canonical Link Element Column
If you’ve included canonical links for each of your pages they will show up here. As I mentioned earlier, a 404 error could be caused by a typo in a canonical link.
I forgot to correct one of these links a few weeks back and it was showing as a sort of phantom page and therefore returned the 404.
Basically, all you need to know is that a canonical link for a webpage must match the URI exactly.
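That exact-match rule is simple to test for one page. The sketch below reads the canonical link from the HTML and compares it with the page address; the helper names and the example.com URLs are hypothetical:

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        if "canonical" in (attr.get("rel") or "").lower().split():
            self.canonical = attr.get("href")


def canonical_of(html):
    """Return the canonical URL declared in the page, or None if there is none."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


def canonical_matches(page_url, html):
    """True only when the declared canonical is exactly the page address."""
    return canonical_of(html) == page_url


# Hypothetical example: the slug changed but the canonical link did not.
html = '<head><link rel="canonical" href="https://example.com/old-slug/"></head>'
print(canonical_matches("https://example.com/new-slug/", html))
```

The comparison is deliberately strict, so a missing trailing slash or a different protocol counts as a mismatch.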
Size Column
This figure is measured in bytes. To calculate the size in KB, divide it by 1024.
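As a quick worked example, here is that division in Python. The figure of 20480 bytes is made up for illustration:

```python
def bytes_to_kb(size_bytes):
    """Convert a size in bytes to kilobytes (1 KB = 1024 bytes)."""
    return size_bytes / 1024


print(f"{bytes_to_kb(20480):.1f} KB")  # 20480 / 1024 = 20.0
```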
Word Count Column
The word count is all the words between the <body> and </body> HTML tags. This can be a bit misleading because dynamic websites usually have headers, sidebars and footers.
I’m more interested in the word count of the main content area on a page.
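To get a feel for how a body word count comes about, here is a rough sketch that counts whitespace-separated words inside the body of a page. Skipping script and style text is my own assumption; the exact counting rules Screaming Frog applies are not covered in this post:

```python
from html.parser import HTMLParser


class BodyWordCounter(HTMLParser):
    """Count whitespace-separated words between <body> and </body>."""

    SKIPPED = {"script", "style"}  # assumption: code and CSS are not "words"

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.in_body and not self.skip_depth:
            self.words += len(data.split())


def body_word_count(html):
    counter = BodyWordCounter()
    counter.feed(html)
    return counter.words


# Hypothetical page source for illustration only.
page = (
    "<html><head><title>Ignored title</title></head>"
    "<body><h1>Hello world</h1><script>var x = 1;</script>"
    "<p>Three more words</p></body></html>"
)
print(body_word_count(page))
```

Because headers, sidebars and footers all live inside the body, they are counted as well, which is exactly why the figure can mislead.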
Inlinks Column
Number of internal inlinks to the URI. ‘Internal inlinks’ are links pointing to a given URI from the same subdomain that is being crawled.
Click a URI row containing a webpage with Inlinks and you can view those links in the lower pane:
Outlinks Column
Number of internal outlinks from the URI. ‘Internal outlinks’ are links from a given URI to another URI on the same subdomain that is being crawled.
External Outlinks Column
Number of external outlinks from the URI. ‘External outlinks’ are links from a given URI to another subdomain.
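The internal versus external split comes down to comparing hostnames. A minimal sketch, assuming “same subdomain” means an identical hostname; the function name and the example URLs are hypothetical:

```python
from urllib.parse import urljoin, urlparse


def classify_links(page_url, hrefs):
    """Split the links found on a page into (internal, external) lists."""
    host = urlparse(page_url).hostname
    internal, external = [], []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript: and similar
        if parsed.hostname == host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external


# Hypothetical links found on a hypothetical page.
links = [
    "/about",
    "post-1",
    "https://shop.example.com/",
    "https://twitter.com/x",
    "mailto:me@example.com",
]
internal, external = classify_links("https://www.example.com/blog/", links)
print(len(internal), "internal,", len(external), "external")
```

Under this assumption a link to shop.example.com counts as external when you crawl www.example.com, because it is a different subdomain.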
Hash Column
Hash value of the page. This is a duplicate content check. If two hash values match, the pages are exactly the same in content.
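The idea behind a hash-based duplicate check can be sketched in a few lines. I use SHA-256 here as a stand-in, since this post does not say which algorithm Screaming Frog uses; the function names and sample pages are made up:

```python
import hashlib
from collections import defaultdict


def content_hash(body):
    """Return a hex digest of the raw page bytes (SHA-256 as a stand-in)."""
    return hashlib.sha256(body).hexdigest()


def find_duplicates(pages):
    """Group URLs whose content hashes to the same value."""
    groups = defaultdict(list)
    for url, body in pages.items():
        groups[content_hash(body)].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl: two URLs serving byte-for-byte identical content.
pages = {
    "/page/": b"<html>same content</html>",
    "/page/?ref=home": b"<html>same content</html>",
    "/other/": b"<html>different content</html>",
}
print(find_duplicates(pages))
```

A single changed character produces a completely different hash, so this only catches exact duplicates and not pages that are merely similar.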
Response Time Column
Time in seconds to download the URI.
Last Modified Column
Read from the Last-Modified header in the server’s HTTP response. If the server does not provide this, the value will be empty.
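If you ever need to read that header yourself, Python’s standard library can parse the date format it uses. A small sketch, assuming the response headers have already been collected into a plain dictionary with the exact key `Last-Modified`; the function name and the sample date are hypothetical:

```python
from email.utils import parsedate_to_datetime


def parse_last_modified(headers):
    """Return the Last-Modified header as a datetime, or None if absent or invalid."""
    value = headers.get("Last-Modified")
    if not value:
        return None  # the server did not send the header
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # the header was present but not a valid date


# Hypothetical response headers for illustration only.
headers = {"Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"}
print(parse_last_modified(headers))
```

Returning `None` mirrors the empty cell you see in the column when the server sends no date.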
Redirect URI Column
If you’ve created a 301 redirect for an old link, the Redirect URI column shows where it is pointing.
The example below shows that I originally published a page but later redirected the link:
500 URI Crawl Limitation
Although you run a small business, is your website small?
The reason I ask is that Screaming Frog is limited to 500 URIs under the free/lite version. At first, I found this confusing because I thought it meant I could only crawl 500 top-level domains.
What it actually means is that if the website you crawl exceeds 500 URIs (pages, images and other resources), you’ll need to upgrade to a paid licence in order to get all the data.
This might be a moot point, since many small business websites are indeed small. You might have as few as 5 pages (5 URIs). However, if you have an eCommerce store, the product listings could run into the thousands of URIs, especially if you have lots of images.
Analysis
This is an impressive piece of software and a must-have tool for anyone running a website. The level of insight offered under the free/lite version is simply fantastic because it offers up errors that are glaringly obvious from a search engine perspective but invisible to the naked eye.
For a technical audit, I also recommend looking at SiteAnalyzer, a free program for auditing and technical analysis of a site. Its set of functions is practically on par with paid counterparts.