Client side vs server side analytics: What's the gap in data?
- TL;DR: Here’s a summary
- How big of a data gap is there between server log analysis and web analytics?
- What are client and server-side analytics and how do they work?
- Advantages of server-side analytics
- Problems with server logs
- Are server logs more privacy-friendly than client-side analytics?
- Are server-side analytics a realistic alternative to client-side analytics for the average website?
TL;DR: Here’s a summary
The main benefits of server-side analytics are that they have no impact on your page speed and the fact that adblockers cannot block server logs. But are these advantages worth the side effects?
With server log analysis, it is harder to filter out robots, crawlers, spiders and other non-visitors. And there is a lot of automated bot traffic on the internet.
How big of a data gap is there between server log analysis and web analytics?
To learn more about the differences in the data between these approaches, I’ve compared the data between Plausible Analytics as client-side analytics and AWStats as server-side analytics on my own website in June 2020.
Comparison from my own website
With access to your server logs, you can feed the logs into server log analysis tools such as AWStats, Analog, GoAccess and Webalizer to get a dashboard with charts and graphs. This is exactly what I’ve done. There are many other server-side analytics providers depending on where you host your site. Netlify and Cloudflare are some of the examples.
I’m lucky to have access to the cPanel through my web hosting provider and cPanel has integration with AWStats. I installed the Plausible Analytics script on my site too (I’ve opened up my Plausible Analytics dashboard to the public and you can view it here), so I can now compare the complete stats for June 2020 between these two services.
I compared all the data points that were measured by both tools. This means that I excluded things such as the bounce rate and the visitor devices as those are not measured by AWStats. I also excluded things such as bandwidth consumed as that’s not measured by web analytics.
I pretty much ignored my site in June to see what would happen. I didn’t publish any new content and I didn’t share any of my posts anywhere at all. Here are the stats.
My website stats for June
Let’s start with Plausible Analytics. You can see my complete data for the month of June in this open dashboard.
Here are my top line stats for June in Plausible Analytics:
And here are my top line stats from AWStats:
There is a great data disparity between the two.
The total number of unique visitors
- Plausible Analytics: 10.3k
- AWStats: 21.2k
More than 100% higher number of visitors according to AWStats.
The total number of page views
- Plausible Analytics: 16.8k
- AWStats: 305.5k
18 times higher number of page views according to AWStats.
The definition for the “pages” metric according to AWStats is: “The number of “pages” viewed by visitors”.
- Plausible Analytics: Google 6.4k
- AWStats: Google USA 7k, Google UK 3.1k
Plausible Analytics alongside other web analytics such as Google Analytics display all traffic from Google under one referral. AWStats splits traffic from Google according to the locality. Between Google USA, Google UK, Google India and others, there were more than 10,000 visitors from Google combined according to AWStats.
- Plausible Analytics: /blog-or-vlog/ 2.5k
- AWStats: /wp-cron.php 93k, /feed/ 78k, home page 67k
Most of the page views and pages that AWStats counts are made by bots to the backend pages that are not customer facing. My actual top post, /blog-or-vlog/, is 6th in the list on AWStats with 3.2k views.
- Plausible Analytics: USA 3.2k
- AWStats: USA 207k (doesn’t show the number of unique visitors but a number of page views)
Interestingly enough, Russia is listed second on AWStats while it doesn’t even make it into the top 10 on Plausible Analytics.
Top operating systems
- Plausible Analytics: Windows 32%
- AWStats: “Unknown” 67.5% (doesn’t show the number of unique visitors but a number of page views). Windows is second with 15.6% of all page views.
- Plausible Analytics: Chrome 59%
- AWStats: “Unknown” 64.7% (doesn’t show the number of unique visitors but a number of page views). Chrome is second with 18.5% of all page views.
What are client and server-side analytics and how do they work?
Let’s take a step back and look at the differences in the way that client and server-side analytics work.
Server-side analytics are server-side programs that monitor network traffic on the server itself and only process server-side information. Every request on your server, no matter what it is and by whom it is made, is recorded in the server logs.
The main issue with the server-side approach is that not every request on your website is from real people. On most websites, the number of requests from non-human visitors largely outnumbers the number of requests from real people.
Bot filtering in server-side analytics
It seems very clear that the majority of data included in AWStats is made by robots. AWStats tries to isolate and exclude robots. They automatically exclude the “not viewed traffic” which includes “traffic generated by robots, worms, or replies with special HTTP status codes”.
Despite these efforts, it seems to miss out on a lot of bot traffic. This seems to be a general issue with server log analysis as a whole rather than an issue with AWStats software itself.
There could be a way to make it more accurate by manually excluding other bots. If I exclude back-end pages such as wp-cron.php, /feed/ and wp-login.php that the bots are trying to access, I may get something closer to the actual numbers.
Similar is the case with top operating systems and top browsers. If I exclude the “unknown” category in AWStats, the remaining table becomes a bit more usable. Same for top countries with Russia, for instance, having way too inflated numbers on AWStats compared to the reality.
Advantages of server-side analytics
Server-side analytics have several native advantages:
- No impact on your site speed and loading time
- Not blocked by privacy-friendly browsers, adblocking extensions and other content blockers
- No third-party integrations needed
Problems with server logs
Here are some disadvantages of using server logs:
- Server logs are not pretty, not user friendly and the user experience is not ideal. It’s a bit of a pain to set up and the user interface is a little bit difficult to use and understand. Many site owners want a prettier and simpler plug and play solution.
- Server log stats are not accurate when compared to what stats you see in tools such as Plausible Analytics. An overwhelming amount of website traffic is automated and comes from search engines, web crawlers and other bots. How to find signal-in-the-noise of server-side analytics and how to filter out bots are very time-consuming tasks.
- Despite many hosting providers such as Netlify and Cloudflare offering server-side analytics, many sites such as those on GitHub pages have no access to server logs at all.
- Server logs only keep a limited amount of traffic data so you need a way to archive them before they are deleted.
Are server logs more privacy-friendly than client-side analytics?
How is Plausible Analytics different from the average client side analytics?
Privacy-friendly web analytics tools such as Plausible Analytics have more in common with server logs than they do with Google’s surveillance capitalism tools.
I would actually say that server logs record and display more personal information than a tool such as Plausible Analytics. Server log analysis on my server using AWStats show all visitor IP addresses, the number of times they have accessed my site and the last time they have visited.
Here’s a quick summary on Plausible Analytics and the way it differs from tools such as Google Analytics:
- We don’t collect nor store any personal data. Unlike server logs, we don’t even collect nor store IP addresses.
- We are lightweight with the script being under 1 KB so no impact on the site speed. Much more comparable to server logs than to Google Analytics.
- We are isolated to one site only and don’t do any tracking of people around the web and across their different devices.
- Our business model has nothing to do with personalized advertising so we have no interest in collecting, sharing or selling your website data to anyone. You are fully in control and have full ownership of your website stats. Our business model is the subscription fee we charge you to host and manage your website stats for you.
- Our script can be served from your domain name as a first-party connection if you’re concerned about adblockers and other content blockers.
- We can be self-hosted on your server if you’re concerned about third-party connections. Our self-hosted product is free as in beer but you do need to install, maintain and manage it yourself on your server.
Are server-side analytics a realistic alternative to client-side analytics for the average website?
Google Analytics is the most popular client-side analytics tool on the web. It is installed on more than 50% of all websites. From speaking to many website owners over the last few weeks, server logs, in general, are not seen as an ideal replacement for Google Analytics by most.
Some web hosting packages may come with server logs pre-installed and ready to use. For most of the others, it is a bit more complicated process to get them set up.
Trying to get the average website or business that is using Google Analytics right now to remove Google Analytics and switch to server-side analytics reminds me of that famous Hacker News comment about the announcement of Dropbox:
“You can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.”
How would the average website or business owner that runs Google Analytics switch to a server log analytics system?
- You need to have access to a server. This is especially difficult when considering that many sites these days are hosted on proprietary CMS tools that don’t allow you to have server access
- Then you need to set up your server logs in the first place unless your server already supports them
- Then you need to enable archiving as by default server logs get discarded after the system processes them
- Then you need to redirect your server logs to another tool to parse them, organize the information into a format readable by people that are used to Google Analytics dashboard and then show the analytics reports with pretty charts and tables
- Then you need to figure out how to filter out bots from the data to get something that resembles the accuracy you are used to from client-side analytics
Some site owners are happy to do this process and that’s fine. I’ve been using server logs for a year or so since I decided to remove Google Analytics and before discovering and becoming part of Plausible Analytics.
But this is not a realistic solution for millions of websites running Google Analytics and donating their visitor and customer data to Google today. There is a need for a better plug and play solution.
Server logs are an option for some site owners with basic analytics needs that are happy to do the work to get it all set up and that can live with inaccuracies caused by so many web visitors being robots.
These site owners need something user friendly, accurate and to actually convince them to make the move away from Google Analytics, they need something that can improve upon some of the issues they experience with Google Analytics.
If we insist all the websites remove Google Analytics and replace them with server logs, we will ultimately fail in the goal of restricting Google’s grip on the web.