
Guide To The Google Tag Manager API


In October 2014, Google Tag Manager V2 was officially released. In the wake of this major UI and functional overhaul, the developer team also published an API service that allows anyone to create value-adding components on top of Google Tag Manager (GTM).

The API really opens a whole new world of interaction with your GTM accounts and containers. Tired of waiting for some feature to be developed by Google in GTM? Well, build it yourself!

In this article, I'll walk you through how the API works and what you can do with it, and I'll wrap things up with a simple example, where I use Python to list the GTM accounts and containers your Google ID has access to. Fasten your seat belts!

Available API Services

The API gives you a number of services you can invoke with an authenticated user. Most of the services support typical CRUD (create, read, update and delete) operations, and communications with the API are performed using the HTTP protocol.

Google provides a number of useful client libraries which make it very easy for you to build the service calls.

The services you can access through the GTM API are:

google tag manager api services

  • Accounts: Lets you access GTM accounts the user has access to.
  • Permissions: Allows you to modify permissions for GTM accounts and containers. Also lets you view and modify permissions for a given user.
  • Containers: Gives you full control over container resources in a given GTM account. You can, for example, create a new container, delete a container, or list all containers under an account (that the user has access to).
  • Container Versions: Perform versioning operations on a container, such as restore, publish and delete. Also gives you access to the entire version resource, together with all tags, triggers, and variables within the version.
  • Tags / Triggers / Variables: Gives you full control over the assets in a given container. Also lets you create new assets, delete assets, update them, etc.

In short, you can do everything that's available through the UI, but as an added bonus you can create your own solutions that facilitate things that are not available in the UI, such as mass operations, cloning assets from one container to another, and so on.

For example, to clone a container from one account to another, you would need to write a solution that does the following:

  1. Get the container version you want to clone.
  2. Create a container in the target account, using the container version as the body of the request (so that the new container has the same settings).
  3. For each tag, trigger, and variable in the container version to be cloned, create a new tag, trigger, and variable in the target container, using the respective asset as the body of the request.

Operations like these sound complex, but they are actually trivial tasks for a client application communicating with a service endpoint; a rough sketch of the cloning steps follows below.
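
The sketch assumes a service object that has already been authorized with edit rights on both accounts (building such an object is covered later in this article); the IDs are placeholders, and the version resource keys ('tag', 'trigger', 'variable') follow the v1 API reference.

# Hedged sketch: clone a container version into another account
SOURCE_ACCOUNT_ID = '111111'     # placeholder account ID
SOURCE_CONTAINER_ID = '222222'   # placeholder container ID
VERSION_ID = '3'                 # the version you want to clone
TARGET_ACCOUNT_ID = '999999'     # placeholder target account ID

# 1. Get the container version you want to clone
version = service.accounts().containers().versions().get(
    accountId=SOURCE_ACCOUNT_ID,
    containerId=SOURCE_CONTAINER_ID,
    containerVersionId=VERSION_ID).execute()

# 2. Create a container in the target account, reusing the version's container settings
new_container = service.accounts().containers().create(
    accountId=TARGET_ACCOUNT_ID,
    body=version['container']).execute()

# 3. Recreate every tag (and, with the same pattern, every trigger and variable)
for tag in version.get('tag', []):
    service.accounts().containers().tags().create(
        accountId=TARGET_ACCOUNT_ID,
        containerId=new_container['containerId'],
        body=tag).execute()
# ...repeat with triggers().create() for version.get('trigger', [])
# and variables().create() for version.get('variable', [])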

    How The Google Tag Manager API Service Works

    Google Tag Manager API is very similar to all the other Google APIs out there. That is, it uses OAuth 2.0 for authentication, and it provides you with a client library that you can use to build the requests to the web service.

    If you've never worked with Google APIs before, you might want to follow this guide for the Google Analytics API. It has a very nice walkthrough of all the things you need to do to get started with using Google APIs. The steps you need to take, in a nutshell, are:

    1. Register a new project in the Google Developers Console, and make sure you've enabled access to the Google Tag Manager API.

    google developer console

    2. Create new credentials for the application. What type of credentials you want to create depends on whether or not you're working with a web application, an installed application (e.g. command-line), or with a service account. If you want to try out the command-line example at the end of this article, make sure to create credentials for a native application.

    application credentials

    3. Download and install the right client library, depending on which programming language you want to use. The example at the end of this article will be using Python.

    4. In the application code, you will first need to create a service object, using authorization credentials the native application requires. Depending on the scopes you choose, the user will need to authorize the application to access their Google Tag Manager data.

    google tag manager data access

    5. Using this service object, you can then proceed to call all the services the Google Tag Manager API provides. The service object will be valid for as long as the user doesn't revoke authorization to the application. Naturally, you might need to store the access credentials in some file to ensure that the user doesn't need to provide authorization every time they run the command-line application.

    The most difficult thing to grasp in this process is that of authentication and authorization. The OAuth 2.0 protocol isn't difficult to understand, but it has multiple layers and the process is ambiguous to many. But if you follow the steps outlined in the next chapter, you should have a better idea of how authentication works in the Google API universe.

    Simple Command-Line Application

    Command-Line Application

    The application we'll create is a simple Python program, which defines the service object, and then proceeds to get the names of all GTM accounts your user has access to. For each account, it also gets the names of all containers you have access to. Everything is output into a text file, which will look like the image on the left.

    So, let's get started.

    I'm assuming you've read the previous chapter. At this point, you should have a new project created, and you've also created credentials for a native application. You should also have the Python client libraries installed in your development environment.

    If you're using Mac OS X, and you've installed the client library following the instructions, you won't even need to set up a complicated development environment. All you need to do is edit the Python file (.py) directly in a text editor, and then run the application from the command line!

    First things first, download the client secret JSON from the Google Developers Console, and store the JSON file as client_secrets.json in the directory you've created for your application. This file will link your application with the project you created in the API console.

    JSON File

    To kick things off with the code, you will need to create a new text file that will become the Python application. I've used the imaginative name gtm2txt.py, but you can choose whatever name works for you.

    First things first, we'll need to import a bunch of modules to help us work with the GTM API:

    import argparse
    import sys

    import httplib2

    from apiclient.discovery import build
    from oauth2client import client
    from oauth2client import file
    from oauth2client import tools

    These modules are required for the rest of the code to work. Next, as this is a Python command-line application, we'll need to define the main method and invoke it with any command-line arguments:

    def main(argv):
        pass  # Content coming soon

    if __name__ == '__main__':
        main(sys.argv)

    This is standard boilerplate for any Python command-line application.

    All the rest of the code comes in the main method, so remove the placeholder line pass  # Content coming soon, and let's get on with the code!

    # Define variable constants for the application
    CLIENT_SECRETS = 'client_secrets.json'
    SCOPE = ['https://www.googleapis.com/auth/tagmanager.readonly']

    First, we'll create some constants. If you followed my instructions, you downloaded the client secret JSON from the Google Developers Console, and you renamed it to client_secrets.json, saving it in the same directory as your Python application.

    Next, we're defining a scope for the application. As this is a very simple app, all we'll need is read-only rights to GTM accounts and containers. You can view all the available scopes behind this link.

    # Parse command-line arguments
    parser = argparse.ArgumentParser(parents=[tools.argparser])
    flags = parser.parse_args()
       
    # Set up a Flow object to be used if we need to authenticate
    flow = client.flow_from_clientsecrets(
        CLIENT_SECRETS,
        scope=SCOPE,
        message=tools.message_if_missing(CLIENT_SECRETS))

    These lines set up a Flow object. In the world of Google APIs, a flow is how the credentials are passed from the application to the web service and back, after a hopefully successful authentication. The Flow object is built using command-line flags, but since this is a very simple app, we won't be actually using any flags. As you can see, the client_secrets.json file and the scope are passed as arguments to the Flow object.

    # Prepare credentials, and authorize the HTTP object with them.
    # If the credentials don't exist or are invalid, run through the native client
    # flow. The Storage object will ensure that if successful, the good
    # credentials will be written back to a file.
    storage = file.Storage('tagmanager.dat')
    credentials = storage.get()
    if credentials is None or credentials.invalid:
        credentials = tools.run_flow(flow, storage, flags)
    http = credentials.authorize(http=httplib2.Http())
       
    # Build the service object
    service = build('tagmanager', 'v1', http=http)

    These are very important lines. First, the application checks if a file called tagmanager.dat is located in the application directory. This is the file where we'll save your credentials after a successful authorization. If the file isn't found, or the credentials within are invalid, the run_flow() method is invoked, which opens a browser window and asks for your authorization to the scopes you've defined. A successful authorization returns a credentials object, which we then use to authorize all API requests.

    Finally, the service object is built, using the credentials we got back from the authorization flow.

    This is how OAuth 2.0 works. Authorization is requested, and if it's provided, a service object can be built with the credentials.

    Now that we've built the service object, we can start calling the API and performing tasks.

    # Get all accounts the user has access to
    accounts = service.accounts().list().execute()
       
    # If the user has access to accounts, open accounts.txt and
    # write the account and container names the user can access
    if len(accounts):
        with open('accounts.txt', 'w') as f:
            for a in accounts['accounts']:
                f.write('Account: ' +
                        unicode(a['name']).encode('utf-8') +
                        '\n')
                # Get all the containers under each account
                containers = service.accounts().containers().list(
                    accountId=a['accountId']).execute()
                if len(containers):
                    for c in containers['containers']:
                        f.write('Container: ' +
                                unicode(c['name']).encode('utf-8') +
                                '\n')

    The very first command we're executing shows exactly how Google APIs work. As you can see, we're invoking a number of methods of the service object, which we built around the Google Tag Manager API.

    So, to get all the accounts the user has access to, you will have to run the accounts().list() query method against the service object. It takes no parameters. I know this, because I've consulted the instructions for invoking the list() method in the Google Tag Manager API Reference Guide. The execute() command in the end runs the actual service call.

    Because the call is on the right-hand side of an assignment, whatever this API method returns is stored in the accounts object. By looking at the reference guide again, I can see that the API returns a JSON list object, with all the accounts I have access to as objects within this list.

    As I know now what the response resource is like, I can confidently first check if there are any accounts, using Python's built-in len() call to check the length of the list. Next, I can iterate over all the account resources in this list, storing the value of the name property in the text file. The reason I'm re-encoding the value in unicode is because I might have access to accounts that have been created with a character set not supported natively by the default encoding.

    Look at the call I'm doing on the service object next. I'm invoking the accounts().containers().list() method to get a list of containers the user has access to. This time, I will need to add a parameter to this call, namely the account ID I want to get the containers for. Luckily I'm in the process of looping through the accounts returned by the first API call, so all I have to do is send the accountId property of the account currently being looped through. Again, I can check the detail for the containers().list() method from the reference guide.
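
    To make the response shapes concrete, they look roughly like the following (the IDs and names are made up for illustration; only the 'accounts'/'containers' list keys and the 'accountId'/'name' properties used above are taken from the reference guide):

    # Roughly what accounts().list() returns (illustrative values only)
    {
        "accounts": [
            {"accountId": "123456", "name": "My GTM Account"}
        ]
    }

    # Roughly what accounts().containers().list(accountId='123456') returns
    {
        "containers": [
            {"accountId": "123456", "containerId": "7654321", "name": "www.example.com"}
        ]
    }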

    And that's the application right there. Short and sweet. Once you have the code in a Python file, you can run it with the command python gtm2txt.py.
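
    With a hypothetical account and a couple of containers, the resulting accounts.txt would look something like this (the names are made up):

    Account: My GTM Account
    Container: www.example.com
    Container: m.example.com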

    You can download this application code from my GitHub repository.

    It's not difficult if you have a modest understanding of Python, if you understand how authorization is passed between your application and the web service, and if you're prepared to consult the reference guide for the API multiple times while debugging your application.

    What to do next?

    Well, the world is your oyster. The API opens up a lot of possibilities. I've created a set of tools for accounts created in the new Google Tag Manager interface. The tools are free to use, and can be found at v2.gtmtools.com. The toolset is built on a number of API methods similar to the ones we've explored today. Be sure to check out the user guide I've written as well.

    Feel free to use your imagination with the API. Here are a couple of ideas you might want to try:

  • A branch/merge publishing workflow for containers.
  • A diff for container versions, which shows what changed, what was removed, and what was added between two container versions.
  • A tool which lets you add a trigger condition to multiple tags at once.
  • An app which is similar to the one we just created, but which outputs the details into a .csv file, and includes also account IDs and container IDs.

Best Practices For Table Filters In Google Analytics

Advanced Table Filters on Google Analytics

Table filters are a very powerful feature in Google Analytics. They allow you to perform deep analysis from within the interface. From experience, I can say that many Google Analytics users don't know how to use this feature effectively - a missed opportunity for sure!

Most reports in Google Analytics contain one dimension and several metrics by default. However, it is easy to add a secondary dimension to your report, as seen in the screenshot below. And table filters really help you to slice and dice through your reports and their building blocks - metrics and dimensions - in a more efficient way.

Table Secondary Dimensions

In this article I will guide you through table filters and a few related topics.

View Filters vs. Table Filters

I would like you to understand the basic concepts first, as this will make it easier to get the complete picture.

View filters are applied before the Google Analytics data is saved in your account. They are set up in the Admin interface and will apply to all data in the view, forever. I won't go into much detail and examples about view filters here. Read this guide to Google Analytics filters for in-depth information on view filters.

Table filters work in a different way: they are ad-hoc segmentation filters. Contrary to view filters, they don't permanently affect any data in your reporting view. You can think of them as filters in Excel. There are two types of table filters:

  • Standard table filters allow you to filter data for the first dimension in your report and this can sometimes be limiting.
  • Advanced table filters are more powerful as they allow you to filter on all available dimensions and metrics in your report.

Now that you understand the difference between the filter types that are available in Google Analytics, here is a quick overview.

Filter Comparison Table

A good knowledge of regular expressions is very handy when working with both filter types. I recommend learning at least the basics.
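
If you are new to regular expressions, these are the building blocks you will use most often in table filters:

  • | acts as an OR, e.g. firefox|chrome matches rows containing either browser
  • . matches any single character, and .* matches any string of characters
  • ^ and $ anchor the match to the start or end of the field
  • \ escapes a special character, e.g. \. matches a literal dot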

Filtering Standard Reports with Advanced Table Filters

Below I discuss how to create advanced table filters on Google Analytics.

1. Click on the advanced link to the left of the search field

Advanced Filter link

2. Choose your filter

Advanced Filter Levels

  • First level: include or exclude
  • Second level: dimension or metric
  • Third level: matching type
  • Fourth level: filter field

3. (optional) Filter two dimensions simultaneously
You have the option to filter on two different dimensions at the same time if you add a secondary dimension to your report, as shown below. Here both Browser and Source / Medium are visible in the Google Analytics report and advanced filter field.

Filtering Google Analytics using two dimensions

Note: the advanced table filter is limited to the dimensions and metrics that are included in your report. Also, by default the conditions in an advanced table filter are combined with AND logic. By working smartly with regular expressions you can build an OR condition as well - for example, a single include filter on Source / Medium matching the RegEx google|bing behaves like an OR. Google Analytics accepts RegEx in both standard and advanced table filters.

Concluding Thoughts

  • Make sure to understand the difference between view filters and table filters
  • Start every data deep dive with a business question; what are you trying to solve?
  • Advanced table filters provide you with a great option to filter out anomalies before exporting or presenting a report
  • Start with one or two advanced filters and expand if you need to
  • Make sure enough data on the level of dimensions remains to draw statistically significant conclusions.

Five Essential Facts About Google Analytics Goals

Google Analytics Goals

Google Analytics goals are a fantastic way to measure key activities on your website and enable a slew of analytics reports right in the GA interface. You are doing yourself a disservice if you haven't set them up. Make sure you do it, make sure you do it correctly, and make sure you do it ASAP!

Our analytics team sees heaps of common, easily avoidable mistakes when working with companies large and small. In this article I will provide a few tips and tricks to make goals work better for you and ensure your data is reliable.

In summary, don't lose time, make sure your goals are tidy and improve your Google Analytics reports today.

1. Goals only collect data once they are configured

Tracking Goals

Unfortunately, Google Analytics Goals are not retroactive. When you set up a goal, the way GA processes your data changes immediately: from that moment on, GA will store additional data for you. You can access this data in a variety of audience, acquisition, behaviour and conversion reports right in the user interface.

So, even though you may have already been tracking your key activities, now you can easily view performance across dozens of new dimensions like traffic channel or landing page.

2. Goals fire once per session per goal

Goal Session

Goals function as counters and once a particular goal has fired, that's it for the remainder of that user's session. A user can trigger all twenty goals in a session but each goal will only be captured once. So even if a user submits your promotion form offering free Nutella for life 100 times in one session, that goal will only fire once. That's right, once.

Another key component to consider, in view of this fact, is whether or not you want to aggregate or separate conversions for certain activities across your site. For example, if you have PDF downloads on your product page and also on your careers page, you may want to set these up as two unique goals. Make sure to lay out a measurement plan defining your site KPIs and align your goals with your website or mobile app objectives.

3. Goal funnels don't impact completions, they apply to Funnel Visualisations

Funnel Visualization

While it is very useful to set up goal funnels, they solely affect the goal's Funnel Visualisation report. A goal with a required funnel will not have a lower number of goal completions than a similar goal without a required funnel (except in the Funnel Visualization report). If you want to track two different paths to conversion your best bet is to view the Funnel Visualisation report or tag the paths differently. Additionally, you might also consider using the Behavior Flow report.

And a word to the wise, if you do set up funnelled goals be sure not to aggregate the goal values as you will be double counting conversions!

4. You can use Regular Expressions (RegEx) to build more robust goals

Goals can be turbo-charged when you combine them with regular expressions (RegEx). Regular expressions are pattern-matching expressions that allow for advanced conditional statements in input fields. For example, if you want a goal to fire on multiple pages or events you can utilise operators like the RegEx pipe or 'any string' to set it up - e.g. a destination goal matching /thank-you|/order-confirmation will fire on either page. RegEx is fun! For a RegEx library and testing visit regexr.com.

Regular Expression RegEx

5. Goals values can be used to measure relative value

Goal values don't have to represent actual currency; they can also be used relationally. For example, if you've set ten goals on your site you can define the importance of each activity on a scale relative to the others. As a user navigates through your site they may fire off various goals. I like to think about this as a pinball machine or a Mario Bros. video game: as key activities occur, users collect value, or coins.

Once you are tracking relational goal value you can go into most of your reports - for example the Location report - select 'All Goals' and compare the total goal value per session for each country. This will tell you not just which country was most likely to convert but also which country was most engaged with the important, valuable sections of your site.

If you are already using goals properly, good on ya! If not, get started today or reach out to a qualified professional to get you up and running.


Using Google Analytics to Understand Your Customers

Customer Analytics

There are multiple analytics tools that excel at Customer Analytics and fill gaps in areas where Google Analytics has traditionally fallen short. But over the last year or so, Google Analytics has been consistently pumping out new updates and has some solid offerings to help you understand and analyze your customers more effectively and close some of those gaps. Some highlights from these offerings are Flow reporting, Enhanced E-commerce, User ID, Data Import, and improved Audience Reporting.

In this post I will discuss each of those separately and provide a short summary of what you should know about them and why you should be using each of those features.

Behavioral Flow Reporting

Behavioral Flow reporting has been around for quite some time, and is incredibly helpful when trying to uncover how your customers are behaving on your site. Flow reports display how a customer moves through your site one interaction at a time. What makes flow reporting especially powerful is the ability to alter your starting dimension. You can choose from any number of dimensions to see how users are traveling through your site from specific sources, mediums, campaigns, geographical locations and more!

Behavioral Flow reporting

While studying the paths your customers may take, you can uncover more detail by highlighting specific lines of traffic or viewing segments of the dimension you're investigating. Combine this with an advanced segment and the sky's the limit.

Enhanced E-commerce

One of the largest announcements to come out of Google Analytics last year is Enhanced Ecommerce. Moving past basic transactional detail, Enhanced Ecommerce provides analysts with even deeper insights surrounding the customer journey. Included in Enhanced Ecommerce is the ability to track all phases of the purchase process, upload product data and refund data, and a slew of new reporting dimensions and metrics. With this new functionality, analysts can easily answer questions like "Where are my customers falling off in the transaction process?", "Which of my products are viewed most frequently?" and "What products are most frequently purchased or abandoned?"

Enhanced E-Commerce

Similar to Behavioral Flow reports, Shopping Behavior Analysis provides an overarching view of your customers' journey from site entrance to transaction completion. Using the visual above, analysts can quickly identify where the biggest fall-out during a site session is occurring. The steps within this report are customizable to best fit your website's needs, and are based on your site's implementation.

User ID

It's no secret that consumers have overwhelmingly transitioned to a multi-device lifestyle. Home computers, work computers, smartphones, tablets and even gaming systems all provide individuals with a means to view online content. Historically, visiting sites from these different devices resulted in a unique user being counted for each one. To create a more complete picture of the user, Google announced the User ID with its roll-out of Universal Analytics earlier this year.

User ID report

This is a big deal. User ID functionality provides the ability to tie together how consumers interact with brands across devices and answer questions like "Do my products sell more frequently on smartphones or desktops?" and "Which devices are used primarily for research?" The User ID can also be associated with authentication systems, providing the ability to create custom segments based on attributes specific to your organization. In short, the User ID allows for a more complete picture of a customer's online journey, so you can promote and optimize your site more effectively.

Data Import

If you haven't already migrated to Universal Analytics, another reason to do so is Data Import. By leveraging either a customer ID or transaction ID, you can upload corresponding data directly into Google Analytics. This could include information such as:

  • age
  • gender
  • customer lifetime value (total purchases)
  • # of transactions
  • loyalty card holder
  • and much more!

Data Import

With these added dimensions, you can discover new trends among your customers. Just remember not to upload Personally Identifiable Information (PII). Uploading PII is against Google's terms and conditions, and in any case protecting your customers' data and personal information is simply best practice.
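
To make the mechanics a bit more concrete, here is a hedged sketch of what an upload could look like through the Management API's uploadData method, assuming you have already defined a matching Data Set in the Admin section and have an authorized Analytics (v3) service object; the IDs, file name and dimension slots below are all placeholders.

# Hypothetical customer_data.csv, keyed on a Customer ID custom dimension.
# The header names must match the Data Set schema defined in GA Admin, e.g.:
#
#   ga:dimension1,ga:dimension2,ga:dimension3
#   CUST-001,34,loyalty-card-holder
#   CUST-002,52,no-loyalty-card

from apiclient.http import MediaFileUpload

media = MediaFileUpload('customer_data.csv',
                        mimetype='application/octet-stream',
                        resumable=False)

# 'analytics' is an authorized Google Analytics Management API (v3) service object
analytics.management().uploads().uploadData(
    accountId='12345678',            # placeholder Analytics account ID
    webPropertyId='UA-12345678-1',   # placeholder property ID
    customDataSourceId='abcdEFGHij', # the Data Set ID shown in GA Admin
    media_body=media).execute()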

Whether you're investigating how customers move through your site, where they fall off before making a purchase, or how they're interacting across devices, or you want to include additional information in your reporting, Google Analytics provides solutions to answer all of these questions. We've only scratched the surface of the capabilities of Flow Reporting, Enhanced Ecommerce, User IDs and Data Import, but the case is clear: Google Analytics paints a dynamic picture of how your customers are behaving online.


Customer Happiness with Google Analytics & FanExam

Google Analytics & FanExam

Colonoscopies are no fun. The exam is one of the most disagreeable screenings in medicine; but scientists have found a way to make patients "dislike it less". The solution is contrary to what any web analyst would propose as a hypothesis to "optimize the experience": increasing the duration of the exam.

That's right: increasing the duration of the exam!

A study by Nobel Prize winner Daniel Kahneman assessed patients' appraisals of an uncomfortable colonoscopy and correlated the remembered experience with real-time findings. They found that patients consistently evaluated the discomfort of the experience based on the intensity of pain at the worst (peak) and final (end) moments. So why were the patients happier because of the deliberately prolonged colonoscopy? Because the doctors added a few extra minutes of mild discomfort after the end of the examination - making the "end moment" less disagreeable. This is explained by Kahneman's theory called the Peak-end rule - here is a great interview with him.

Website Exams

This is a great example of how we can sometimes jump to conclusions while analyzing data and assume that we are creating better user experiences on our online properties by simply "reducing pain" instead of "increasing happiness". And increased happiness (you can also call this "customer satisfaction") means more conversions and more referrals. So, how do we know that we are really "increasing happiness"? We have to do what the doctors did in the exams - ask the patients.

Voice Of the Customer (VOC) is essential to really gauge visitor experience. There are various tools and frameworks available to measure VOC - but I am here to talk about the Net Promoter Score (NPS). NPS is a 1-question survey: How likely is it that you would recommend our company/product/service to a friend or colleague? The scoring for this answer is based on a 0 (not at all) to 10 (extremely likely) scale, and a follow-up question is then presented asking the client the reason they gave that score.

Visitors who gave you a 0-6 rating are called Detractors: unhappy customers who can damage your brand through negative word-of-mouth. Visitors who gave you a 7-8 score are the Passives: satisfied but unenthusiastic customers who are vulnerable to competitive offerings. And last are the Promoters, who gave you a 9-10 score: loyal enthusiasts who will keep buying and referring others, fueling growth.

Just like any customer feedback framework, NPS isn't perfect. But it's simple, and it gets things done. It gets you feedback, and it opens the door to improvement. If you're not getting client feedback from every single client, it's time to start.

Net Promoter Score in Google Analytics

Net Promoter Score on Google Analytics

Besides the valuable insights you will get from the comments on why certain customers gave specific ratings, you can bring this information into Google Analytics and try to uncover behaviors that correlate to happiness.

The first thing to consider is to survey your customers while they are actually using your website - instead of an email message. This will allow you to gauge a more precise perception of their experience.

The survey timing is also critical. You don't want to survey your clients before they have had time to acquaint themselves with your site or SaaS. But at the same time, you don't want to limit your surveys to very heavy users - they skew away from the average visitor and keep coming back precisely because they already love your site.

I recommend using a Custom Dimension to hold the NPS rating. This will allow you to slice & dice pretty much any report to observe different behavior patterns based on their happiness. A quick way to get started measuring NPS is to use FanExam (disclosure: I am the founder), it has a hefty free plan with automatic GA integration.
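
If you prefer to wire this up yourself, one server-side option is the Measurement Protocol. The sketch below (using the requests library) sends the rating to Google Analytics as an event with the score stored in a custom dimension slot; the property ID, dimension index and client ID handling are assumptions you would adapt to your own setup.

import requests

def send_nps_hit(client_id, score):
    """Send an NPS rating to Google Analytics via the Measurement Protocol."""
    payload = {
        'v': '1',              # protocol version
        'tid': 'UA-XXXXXX-Y',  # placeholder property ID
        'cid': client_id,      # the visitor's Google Analytics client ID
        't': 'event',          # hit type
        'ec': 'NPS',           # event category
        'ea': 'rating',        # event action
        'ev': str(score),      # event value
        'cd1': str(score),     # custom dimension 1 holds the rating
    }
    requests.post('https://www.google-analytics.com/collect', data=payload)

# Example: record a score of 9 for a known client ID
send_nps_hit('35009a79-1a05-49d7-b876-2b884d0f825b', 9)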

Things to look for in NPS segmented reports include:

  • Top content read / features used by promoters but not by passives or detractors.
  • Traffic sources by type - focus investments in promoter sources.
  • The new cohort analysis can also bring valuable insights.

This will also help validate whether certain hypotheses intended to increase conversions are also affecting happiness.

Closing Thought

I really don't think that if the doctors who optimized the exam for happiness started measuring Net Promoter Score for their colonoscopies, people would be handing out 9s and 10s! But I am also sure that patients who had previously had an exam elsewhere would definitely recommend these doctors.

On the web, it might be pretty hard to get a great score for a support website, where people only come when they are already frustrated with your service... but you should definitely look at the trends: can you increase the score from 6 to 7? That would already represent a huge win.


A Better Way to Measure Users' Content Engagement

Content Engagement Measurement

A pretty common need for content-driven websites is to know how well their long-form text content engages visitors. Content is the lifeblood of their value proposition, and provides the pageviews and return visits that justify investments from advertisers. This got me thinking about ways to use Google Analytics to measure content engagement more accurately than the current metrics allow.

Right now, content marketers are stuck with the rather crude "time on page", "time on site", and "pages per session" metrics. Time on page and time on site, for example, measure how long visitors spend on the site whether or not they actually have content on screen. Pages per session can include pages that have nothing to do with content, such as "About Us" or "Contact" pages. There has to be a better, more precise, and more useful way to crack this puzzle.

I would propose a better way: we could measure how long the "container" for our main piece of content is actually visible on the visitor's screen. We can measure this by page or by session, just as we do with time on page and time on site, but the main difference is that we only measure what is relevant to tell us how engaging the content is, or how long our audience takes to read it.

What would we need in order to do this? We would need to add an "id" attribute to the HTML container that our content is displayed in. We would also need to ensure that this id only appears when the content inside the container is the actual blog post.

The next step would be to come up with a mechanism that does the following for us:

  1. Watches for the presence of the container on screen (inside the browser viewport).
  2. Records the time the container was first visible on screen.
  3. Records the time at which the container stopped being visible on screen or the time at which the visitor navigated away from the page or the website.
  4. Subtracts the time the container was first visible from the time the container was last visible in order to get the amount of time the container was on screen.
  5. Sends the data to Google Analytics as an Event when the page changes so we can get data for a per-page "time on content". We'll need to be able to briefly intercept the user's navigation, make the calculation, fire off our Google Analytics data, and then let the visitor continue on to their original destination. It's possible; the same technique has been used to implement "outbound link" tracking on other websites.
  6. Sets a visit-level cookie to add the amount of time content was visible to a running total so we can get the per-session "time on content". Dividing this total by the number of content pages viewed gives an average per page.
  7. Sends the data to Google Analytics as an Event when the page changes so we can get data for a per-session "average time on content". Again, we'll need to intercept the user's navigation to calculate and send the data to Google Analytics.

It would also be important to set a maximum amount of time in case a user leaves the page open in the background. Consider getting someone else who is less used to reading content on the Internet to read a selection of articles of varying length on your website while timing themselves. The amount of time it takes that person to read each of the articles can be averaged out to provide you with a benchmark for how long the average user might take to read the article. Add a few seconds to that average, and then use the resulting number as your "time-out".

So there you have it: an idea for more effectively defining and measuring visitor engagement with text-based content like blogs, articles, or essays, on a website whose central value proposition is based on that content.

The more we know about how visitors are using content, and the more accurate that knowledge is, the better we can find out what sort of content works best for our target audience and adjust our strategy and posting schedules accordingly. Content may be king, but data is its queen.

Building Google Analytics Powered Widgets

Google Analytics Powered Widgets

There is a lot of useful and interesting data held in your Google Analytics account that could be used to drive content on your site and apps. For example, you might want to show your website visitors what are the most viewed products, or the most viewed articles, or the best performing authors, etc.

In this tutorial I provide a step-by-step guide showing how to create a Top Authors widget using the Google Analytics API, Google App Engine (Python) and Google Tag Manager. You can create a free Google App Engine account that should give you enough allowance to build and use your widget. You can see the end result of this tutorial on the right-hand side of this site: the "Top Authors" widget.

There are 2 reasons we are using Google App Engine as a proxy instead of just calling the Google Analytics API directly:

  • Avoid exposing any sensitive information held in Google Analytics. E.g. instead of sharing raw pageview counts we will calculate and share a percentage of the maximum pageviews.
  • There is a limit to the number of API calls that can be made and with this method we only need to call the API once a day as we will cache the results. Therefore we don't risk exceeding the API quota; also, as the data is cached, the results will return a lot faster.

The steps below will take you through all the way from creating your app engine project to adding the widget to your site using Google Tag Manager.

  1. Create a New Google Cloud Project
  2. Create Your Google App Engine App
  3. Enable the Google Analytics API
  4. Use Import.io To Scrape Extra Data
  5. Create the Top Authors API
  6. Serve the Widget using Google Tag Manager

1. Create a New Google Cloud Project

If you have not used Google Cloud before, sign up and create a new project at https://console.developers.google.com. For this tutorial you will be using the free version of App Engine and therefore you do not need to enable billing. Name the project and create a brand-friendly project ID, as this will become your appspot domain, e.g. yourbrandwidgets.appspot.com

Google Cloud Project

2. Create Your Google App Engine App

Download the Google App Engine SDK for Python and create a folder on your computer called yourbrandwidgets.

In the folder create a file called app.yaml and add the code below. This is the configuration file, and it is important that the application name matches the project ID created in the first step.

application: onlinebehaviorwidgets
version: 1
runtime: python27
api_version: 1
threadsafe: yes

handlers:
- url: .*
  script: main.app

libraries:
- name: jinja2
  version: "2.6"
- name: markupsafe
  version: "0.15"

In the folder create a file called main.py and add the following code

from flask import Flask

app = Flask(__name__)
app.config['DEBUG'] = True

# Note: We don't need to call run() since our application is embedded within the App Engine WSGI application server.

@app.route('/')
def home():
    """Return a friendly HTTP greeting."""
    return 'Online Behavior Widgets'

@app.errorhandler(404)
def page_not_found(e):
    """Return a custom 404 error."""
    return 'Sorry, nothing at this URL.', 404

Create a file called appengine_config.py and add the following code.

"""'appengine_config' gets loaded when starting a new application instance."""
import sys
import os.path

# add 'lib' subdirectory to 'sys.path', so our 'main' module can load third-party libraries.

sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'lib'))

Create a folder called lib in the main folder.

Download the file called google-api-python-client-gae-.zip from this page.

Unzip the folder and add the 4 folders to the lib folder in your project.

Install the other required libs for Flask by creating a file called requirements.txt and adding the following text.

# This requirements file lists all third-party dependencies for this project.
# Run 'pip install -r requirements.txt -t lib/' to install these dependencies in 'lib/' subdirectory.
# Note: The 'lib' directory is added to 'sys.path' by 'appengine_config.py'.
Flask>=0.10

Run pip install -r requirements.txt -t lib/ in the terminal to install these dependencies. You should now be ready to test locally. Using the Google App Engine Launcher, add the application as described in this tutorial.

Then, select the app as shown in the screenshot below and click run; this will run locally and open a new tab in your current open browser.

Run Widget Locally

If this works as expected you should be able to visit the site on your localhost at the port you set.

You are now ready to deploy this to the cloud. Click deploy and keep an eye on the logs to check that there are no errors.

If successful, you can test the app at yourbrandwidgets.appspot.com.

3. Enable the Google Analytics API

To use the Google Analytics API you will need to enable it for your project. Go to the API portal in the developer console under APIs & Auth and click on the Analytics API as shown in the screenshot below. Then, click on the Enable API button.

Enable Google Analytics API

Get the App Engine service account email, which will look something like yourbrandwidgets@appspot.gserviceaccount.com, under the Permissions tab following the steps shown in the screenshot below and add the email to your Google Analytics account with collaborate, read and analyze permission (learn more about User Permissions).

Google Analytics Permissions

4. Use Import.io To Scrape Extra Data

One issue we had while creating the widget in the sidebar of this site was that the author images and links are not stored in Google Analytics. We therefore have 2 options to overcome this.

Option 1

If you are using Google Tag Manager, create a variable to capture the author image and author urls on each pageview as custom dimensions.

Option 2 (the option we will use in this tutorial)

We used import.io to scrape the authors page and turn it into an API that we can use in app engine.

In order to see how this works, go to https://import.io and copy and paste this URL into the box and press try it out. You should see the page scraped into a structured format that you can use by clicking on the Get API button, as shown below.

import.io API

As you can see, the API has a record for each author in a neat JSON format including the 3 pieces of data we needed. The author’s name is under "value", the author’s page link is under "picture_link" and the author’s image is under "picture_image". That really is magic.

We can now create a function in our code that will call the import.io API, extract the 3 data points that we need, cache them for 24 hours, and then return the result. We can test this by creating a URL for it. Update the main.py file with the code below; you will notice we have now included some new modules at the top.

import json
import pickle
import httplib2

from google.appengine.api import memcache
from google.appengine.api import urlfetch
from apiclient.discovery import build
from oauth2client.appengine import OAuth2Decorator
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
from oauth2client.appengine import AppAssertionCredentials
from flask import Flask
from flask import request
from flask import Response

app = Flask(__name__)
app.config['DEBUG'] = True

# Note: We don't need to call run() since our application is embedded within the App Engine WSGI application server.

@app.route('/')
def hello():
    """Return a friendly HTTP greeting."""
    return 'Hello World!'

@app.route('/importioauthors.json')
def importio():
    authors = importioOnlineBehaviorAuthors()

    return json.dumps(authors)

@app.errorhandler(404)
def page_not_found(e):
    """Return a custom 404 error."""
    return 'Sorry, nothing at this URL.', 404

def importioOnlineBehaviorAuthors():

    ob_authors_check = memcache.get('importioOnlineBehaviorsAuthors')
    if ob_authors_check:
        ob_authors_output = pickle.loads(memcache.get('importioOnlineBehaviorsAuthors'))
        ob_authors_output_method = 'memcache'
    else:
        importio_url = "https://api.import.io/store/data/6f4772f4-67ce-4f78-83f3-fa382e87c658/_query?input/webpage/url=http%3A%2F%2Fonline-behavior.com%2Fabout%2Fauthors&_user=ENTER-YOUR-USERID-HERE&_apikey=ENTER-YOUR-API-KEY-HERE"
        importio_url_result = urlfetch.fetch(importio_url)
        importio_result = json.loads(importio_url_result.content)
        importio_author_images = {}

        for row in importio_result['results']:
            name = row['value']
            importio_author_images[name] = {
                    'picture_image': row['picture_image'],
                    'picture_link': row['picture_link']
                    }

        ob_authors_output = importio_author_images

        memcache.set('importioOnlineBehaviorsAuthors', pickle.dumps(ob_authors_output), 86400)


    return ob_authors_output

You can run this locally or deploy to live and then go to yourbrandwidgets.appspot.com/importioauthors.json to test this is working.

5. Create the Top Authors API

The code shown below will authenticate and call the Google Analytics API using the App Engine service account email we added earlier. As you will see in the API request below, we are getting Unique Pageviews for the top 20 authors from the past 30 days. The code then stitches the import.io data to the Google Analytics data so that we have the author images and links ready to be used.

The results are cached for 24 hours so that the Google Analytics API is only called once a day, no matter how many users request the widget, and the data is returned wrapped in the callback function name we define when calling the URL.

Add the following code to your main.py file above the line of code @app.errorhandler(404).

@app.route('/topauthors.jsonp')
def topauthors():
    # Get the callback function name from the URL
    callback = request.args.get("callback")

    # Check if the data is stored in the cache (it resets after 24 hours)
    output_check = memcache.get('gaApiTopAuthors')

    # If yes then used the cached data in the response
    if output_check:
      output = pickle.loads(memcache.get('gaApiTopAuthors'))

      # If no then request the Google Analytics API
    else:

      # Authenticate and connect to the Google Analytics service
      credentials = AppAssertionCredentials(
      scope='https://www.googleapis.com/auth/analytics.readonly')
      http = credentials.authorize(httplib2.Http(memcache))
      analytics = build("analytics", "v3", http=http)

      # Set the Google Analytics View ID
      view_id = '32509579'

      # Set the report options
      result = analytics.data().ga().get(
        ids='ga:' + view_id,
        start_date='30daysAgo',
        end_date='yesterday',
        dimensions='ga:contentGroup2',
        metrics='ga:uniquePageviews',
        sort='-ga:uniquePageviews',
        filters='ga:contentGroup2!~Online Behavior|admin|(not set)|Miklos Matyas',
        max_results='20'
        ).execute()

      # Get the authors extra data
      authors = importioOnlineBehaviorAuthors()

      # Loop through the results from Google Analytics API and push into output only the data we want to share publicly
      output = []
      max_unique_pageviews = float(result['rows'][0][1])

      for row in result['rows']:
        author = row[0]
        unique_pageviews = float(row[1])
        perc_of_max = str(int(100*(unique_pageviews/max_unique_pageviews)))

        # Only push the author if their image and link exist in the import.io API
        if (author in authors):
            output.append({
              "author":author,
              "perc":perc_of_max,
              "image":authors[author]['picture_image'],
              "link":authors[author]['picture_link']
              })

      # Save the output in cache for 24 hours (60 seconds * 60 minutes * 24 hours)
      memcache.set('gaApiTopAuthors', pickle.dumps(output), 86400)

    # Create the response in the JSONP format
    jsonp_callback = callback+'('+json.dumps(output)+')'

    resp = Response(jsonp_callback, status=200, mimetype='application/json')
    resp.headers['Access-Control-Allow-Origin'] = '*'

    # Return the response
    return resp

You will not be able to test this locally as it accesses the Google Analytics API so you will have to deploy to App Engine to see the output.

If it is all working as expected you should see the result by directly accessing the URL in the browser, e.g. http://yourbrandwidgets.appspot.com/topauthors.jsonp?callback=anyFunctionName

You can check for any errors in the developer console under Monitoring > Logs. Select App Engine and click the refresh icon on the right to see the latest logs for every time a URL is requested.

Developer Console Logs

6. Serve the Widget using Google Tag Manager

Using the API we just created, which returns the top authors data, we can add a custom HTML tag to Google Tag Manager that will loop through the results and (using a bit of HTML and CSS) output the results in nice looking widgets, complete with bar charts based on the percentage of the maximum Pageviews we calculated server-side.

You will want to design the widget first; a tip is to reuse as much as possible of the website's existing CSS styles.

a) Add the widget code as a tag

Add the following code to Google Tag Manager as a Custom HTML tag.

<script>
// create a function that will be called when the API is called
function topAuthorsCallback(data){
    // do something with the data that is returned

    // append any new CSS styling to the head tag
    $('head').append(
    '<style>' +
    '.gawidget-author { float: left; width: 100%; }' +
    '.gawidget-author-img { width: 40px; float: left; }' +
    '.gawidget-author-chart { display: inline-block; vertical-align: top; width: 85%; height: 40px; margin-bottom: 5px; }' +
    '.gawidget-author-bar { height: 60%; background: #62B6BA; }' +
    '.gawidget-author-name { height: 40%; padding: 2px 5px; color: #666;}' +
    '</style>' )

    // Create a new div for where the widget will be inserted
    $( '#block-block-18' ).before(
     '<div id="block-top-authors-0" class="clear-block block"><h2>Top Authors</h2></div>' );

    // Create a header for Social links for consistency
    $( '#block-top-authors-0' ).after(
     '<div id="block-social-0" class="clear-block block"><h2>Social Links</h2></div>' );

    // loop through the first 5 results to create the widget
    for (var i = 0; i < 5; i++){

        var authorName = data[i]['author'];
        var authorUrl = data[i]['link'];
        var authorPerc = data[i]['perc'];
        var authorImg = data[i]['image'];
        var authorPosition = i + 1;

        var html_output = '<div class="gawidget-author">' +
        '<a href="' + authorUrl + '">' +
        '<div class="gawidget-author-img">' +
        '<img src="' + authorImg + '" style="width: 100%;">' +
        '</div>' +
        '<div class="gawidget-author-chart"><div class="gawidget-author-bar" style="width: '+ authorPerc +'%;"></div>' +
        '<div class="gawidget-author-name">' + authorName + '</div>' +
        '</div></a></div>'

        $(html_output).hide().appendTo('#block-top-authors-0').fadeIn(2000)

    }

}

// The URL for the API on App Engine
var api_url = 'http://onlinebehaviorwidgets.appspot.com/topauthors.jsonp'
// The function created that will add the widget content to the site
var callback_function = 'topAuthorsCallback'
// Join the above to create the final URL
var url = api_url + '?callback=' + callback_function

// Call the jsonp API
$.ajax({
    "url": url,
    "crossDomain":true,
    "dataType": "jsonp"
});
</script>

b) Create a variable and trigger

In this example, we will be adding the new widget right above the Google+ widget, so first we create a Custom JS variable that returns true if the div element holding the Google+ widget exists and false if it does not, as shown below.

GTM Variable Trigger

c) Custom JS Variable - Google Plus Widget Exists

function(){
  if ($( '#block-block-18' ).length > 0){
    return true
  } else {
    return false
  }
}

d) Preview and Publish the widget

Set the tag to trigger on all pages where the div we are appending the widget to exists, as shown below.

Publish Widget

Save the tag and before you publish, go into preview mode to test that the tag is triggering as expected and the widget is appearing as you have designed. If you are happy it is all working you can publish the tag and launch the widget to all your users.

Your Turn To Create A Widget

This is just one simple example of what is possible and we would love to see what you create. How about sharing your top 10 products based on sales, or your top-performing brands or categories? The possibilities are endless!


A Conversation on Google Analytics Integrations

Google Analytics Integrations

Last week, during the annual Google Analytics Summit in San Francisco, I had a conversation on stage with my colleague Kerri Jacobs, Head of Sales, Data and Analytics. We spoke about my recent book Google Analytics Integrations and the importance of integrating data in general. Kerri is an extremely smart, knowledgeable and fun person, which made the conversation really pleasant!

This article is a summary of most of the things we spoke about. Kerri asked the questions and I answered them, so the answers are mostly in the first person.

Why are data integrations so important?

Last winter I traveled to the UK Lake District with my family, and we stopped at the Quarry Bank Mill, a cotton mill from the 18th century. We were walking around and learning about the weaving process, and the idea for the book intro came to my mind. It is amazing how similar the world of data is to the weaving process.

Basically, the weaving process has three main steps: raw cotton, threads and cloth. The raw cotton is pretty useless without treatment; threads are a bit more useful, they can be used to tie things up; but it is only when you weave the threads that you get something truly actionable, something that warms you, protects you, and makes you more beautiful. Below is an illustration showing how weaving works, where we have multiple parallel threads (the warps) that are interwoven by one single thread (the weft).

Weaving Analytics

And the same is true for Analytics: raw data and datasets are not very useful; it is only when you manage to bring the data together into a centralized platform that you will be able to make meaningful decisions using it. In summary, I believe that Google Analytics can (and should!) be the weft of your business data.

What is the biggest challenge faced by companies that don’t have integrated data?

The other day I was talking to a colleague that used to be a Marketing Director at a large retailer. She told me that when she got the job she scheduled a meeting with all her teams where each of them presented their revenue numbers. At the end, she took a marker and summed up all the numbers in the white board: the result was twice the actual revenue. This is a great example of the damage siloed data can bring: if we look at data using different tools we are likely to be caught in this kind of scenario.

A good analogy is the Hindoo Fable of the Blind Men and the Elephant. Basically, the fable shows how six blind men each touch a part of an elephant, and each comes out with a different view of what this thing is: a wall, a spear, a snake, a tree, a fan, and a rope. As the conclusion goes:

So, oft in theologic wars
the disputants, I ween,
rail on in utter ignorance
of what each other mean,
and prate about an Elephant
not one of them has seen!

The same happens with data when you don’t have ONE source of truth, because every person in the company can look at a different source and come up with a different theory about what is going on.

How did you choose the integrations to cover in the book?

This was certainly the hardest thing about the book. I had several versions of the Table of Contents, and after a lot of thinking I closed in on one, but halfway through the book I had to change it as the product was evolving in a different direction. It was quite stressful, but Wiley, my publisher, was pretty cool about it, which helped.

But one decision was pretty clear to me: to include only those integrations that bring data into Google Analytics, not out of it. I am a big fan of the tool's visualizations and user interface in general; I think the whole design looks amazing. So I believe professionals should take advantage of that, because it makes a difference in how much people actually use data: if it is easy to use and pleasing to the eyes, people will be less threatened by it.

How do you actually make the integrated data actionable?

If we look closely at the value of integrated data, we can see amazing ways it empowers Google Analytics users to make decisions about how to better spend their marketing budgets and better monetize their digital properties. And some of this actionable data is now available only through Google Analytics, which is pretty amazing.

Take Remarketing, for example: it is incredible that you can build remarketing lists based on online and offline behavior using the Measurement Protocol and Data Import. So, basically, you can bring data from your CRM systems about your users' offline interactions (provided that they logged in or told you somehow who they are), and remarket to those people through Google Analytics and AdWords; this might result in a better user experience and more successful advertising. In my opinion this is a deep and actionable integration!
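For illustration only, here is a minimal sketch of what a server-side Measurement Protocol hit for such an offline interaction might look like. This is not the book's exact implementation; the property ID, client ID and event names are placeholders, and it assumes a Node.js 18+ environment with the global fetch API:

// Hypothetical example: report an offline purchase from a CRM system to
// Google Analytics via the Measurement Protocol.
fetch('https://www.google-analytics.com/collect', {
  method: 'POST',
  body: new URLSearchParams({
    v: '1',                  // Measurement Protocol version
    tid: 'UA-XXXXX-YY',      // placeholder property ID
    cid: 'stored-client-id', // client ID captured when the user logged in (placeholder)
    t: 'event',              // hit type
    ec: 'CRM',               // event category (illustrative)
    ea: 'offline-purchase'   // event action (illustrative)
  })
});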

How can Google Tag Manager help with integrating data?

Google Tag Manager (GTM) is absolutely awesome, and it is touching to see how passionate the community is about the product. And I think they are right to be passionate, because GTM really brings power and scale to the digital professional, it is an incredible tool.

One of the solutions featured on the book, contributed by Stephane Hamel, is the YouTube integration. The official integration only allows us to bring channel behavior into Google Analytics, how users behave inside a channel page. Unfortunately, this data is only a subset of the interactions that happen with video content. Using Google Tag Manager and the YouTube JavaScript Player API it is possible to expose user interactions with videos embedded in a website. For any video embedded in a website it is possible to see whether or not it was played and what percentage was watched by users.
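As a rough sketch of the mechanics (not Stephane Hamel's actual solution from the book), a GTM Custom HTML tag could use the YouTube IFrame Player API to push video interactions into the dataLayer, where a trigger can then fire a Google Analytics event tag. The element ID, video ID and dataLayer event names below are illustrative assumptions:

// Load the YouTube IFrame Player API (assumes a video container <div id="player"> exists).
var tag = document.createElement('script');
tag.src = 'https://www.youtube.com/iframe_api';
document.head.appendChild(tag);

// The API calls this global function once it is ready.
window.onYouTubeIframeAPIReady = function() {
  new YT.Player('player', {
    videoId: 'VIDEO_ID', // placeholder video ID
    events: {
      onStateChange: function(e) {
        window.dataLayer = window.dataLayer || [];
        if (e.data === YT.PlayerState.PLAYING) {
          window.dataLayer.push({ event: 'videoPlay', videoId: 'VIDEO_ID' });
        } else if (e.data === YT.PlayerState.ENDED) {
          window.dataLayer.push({ event: 'videoEnd', videoId: 'VIDEO_ID' });
        }
      }
    }
  });
};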

With this data it is possible to create remarketing lists that include only users that watched certain videos on the website, such as a product or explanation video. Then you can use those lists to remarket to users through the Google Display Network to remind your potential customers about your great products.

Why is Data Import so important?

I believe Data Import (along with the Measurement Protocol) is a critical feature in that it transformed Google Analytics from a tool into a platform. In the past, the tool was great for measuring interactions with websites and apps; but it was only when Google Analytics started offering the capability to upload/send custom data into it that it became a powerful and multi-purpose data platform.

There are two chapters in the book about Data Import, one contributed by Corey Koberg from Cardinal Path, and another contributed by Benjamin Mangold from Loves Data focusing on Cost Data Import.

One of the examples Corey wrote about is refund data, which I think is critical for data accuracy. You will be using Google Analytics data to optimize your campaigns and make actual decisions, and if some campaigns are bringing a lot of people that buy stuff just to return a few days later, you cannot afford to ignore this data; if you do so, you may do real damage to your business.

Corey also writes about profit data. Optimizing for revenue alone is sub-optimal; it is clear that if you know which products are the most profitable ones, you will be in a better position to bring more money to the table. And Corey makes a great point when he writes that this is certainly something you should import through Data Import, otherwise you run the risk of your customers seeing it. This is a great point in general: you should know what kind of data you send through your code (which your users can potentially see) and what you send through the Google Analytics interface, which is seen just by you and your team.

Benjamin focused on Cost Data. I think it's just unbelievable that people use multiple platforms to decide on their marketing budget; it is so suboptimal! When you upload cost data from other advertising platforms into Google Analytics, you can not only compare performance across campaigns but also gain a much deeper understanding of the users coming from them.

What do you think people should be doing differently?

Stop using Google Analytics to kill time! And I must admit that I am guilty of that too; who in this industry can say that he or she never went to Google Analytics just to spend some time browsing around, or watching Real Time in complete hypnosis :-) So, the next time you see a customer with a huge TV screen in their office showing Real Time reports, don't tweet it... it is cool, I know, but the problem is that it brands Analytics as something you watch, not something you do. So when you go to Google Analytics, change your mindset to action: do something with the data.

What’s one thing we can do right now to make an impact?

One of the things that became clear to me in the last few years is that we have to broaden our horizons if we want ourselves and our industry to move forward. When I joined Google, I was already a Google Analytics fanboy; I thought it was the biggest thing on earth and that everyone loved it. But that's not true. There are whole ecosystems out there that don't even know what Google Analytics is. And if we don't go out and tell them how awesome Google Analytics is, they will never know.

So one thing I believe we should all do is to think about all the products that can bring data into Google Analytics (DoubleClick, AdWords, AdSense, etc), choose one you don’t know anything about, and start learning about what it does, what data it collects and how you can use this data together with Google Analytics to make better decisions. Make a plan to learn it.

Bonus!

At the end of our chat, Kerri had the very cool idea to take a selfie with the whole audience holding my book up (everyone got a copy). That was pretty nice :-)

Analytics Selfie


Building A Data Landscape


"The only fence against the world is a thorough knowledge of it." ~ John Locke

Consider your world. It is data now. Data is in everything we do. Especially in business. But wait, what does that really mean? Does it mean that you are behind in a race that everyone else is winning and you didn't know you were running? Maybe! But more likely it means you or someone you are working with is thinking about how data is going to impact them or their business. You may already be doing something and want to do more. You may want more or better analytics or machine learning. You may want a deeper understanding of some aspect of your business.

When you start to search for tools or try to build your own data product, you will be overwhelmed by the options. Any sane person would be. The data industry is a fast-growing mess of ideas and technologies, old and new. It is possible to dive in, spend a lot of money, work really hard and not get anywhere.

At least to me, that doesn't sound very appealing.

What you need to do is work out where you are, what you want to do, and prioritise a set of actions to get there. You can do this using a SWOT analysis or a set of box-and-line diagrams if you wish. But as Simon Wardley would recommend, you would do better by understanding 'position and movement'.

Like any great general going into battle, to do this, you need to use a map. Going through the process of Data Landscaping will provide you with a map and a set of prioritised actions that move you towards answers.

What is Data Landscaping?

Data Landscaping gives you knowledge of your world. It is a technique I have developed that can be done very quickly as an individual or in small groups. All you need is a question you think data will help you answer, a pen, and a piece of paper. By the end, you will have a prioritised list of highly actionable next steps to get you started in your data work.

Data Landscaping is not magic, it doesn't unlock the 'Secrets of Big Data' and it won't set up your clusters, algorithms and processing pipelines.

Data Landscaping is based on a 'four box matrix'. This is a grid divided by two axes of opposing extremes. You may have met such examples as 'The Ease / Value Matrix' or the 'Urgent / Important Matrix'. If not, they are well worth the research.

Generally, any four box matrix should be set up so that 'Good' is top right and 'Bad' is bottom left. In Data Landscaping the axes are 'Closeness' and 'Lightness'. The horizontal axis goes from 'Distant' to 'Close' and the vertical axis from 'Dark' to 'Light'. You can see an example below.

Data Landscape

The Dark / Light Axis

  • Dark data means data that you know exists but are not able to use (email, archives and server telemetry)
  • Light data is data that you know exists and are able to use in your business (spreadsheets, web analytics and supplier costs)

The Distant / Close Axis

  • Distant data exists outside your organisation and has some distance to travel to be used (Social media, government and research data)
  • Close data is data within your organisation and can be used immediately (Media spend, transaction and sales data)

Combined

  • Light and Close is where you want data to be. It is the data you have and know how to use. This is most often the data you 'start with'. While very useful, it can often be the least surprising data. This data has well understood value and can be described as business critical.
  • Distant and Dark, the opposite side of the map, is the data you know nothing about and don't know how to get. When most data projects start it is considered a distraction. Thankfully, Data Landscaping gives you an opportunity to discuss these data sets and avoid glossing over potential value.
  • Dark and Close is the data you know you have but don't know how to use. Especially with the cheapening of storage media and the emergence of 'Big Data' as an industry, many organisations decided to 'capture everything' even if they didn't know what to do with it upfront. Often these data sets are aspirational. Organisations that have them would be happy to develop new insight or revenue from them but don't know how.
  • Distant and Light is the data you know what to do with but don't have. These data sets are the ones organisations desire. You may have overheard 'I wish I knew where our customers lived', 'Do we know which roads our delivery drivers use most often?', 'I wish we could afford to run a survey to find out what people really think'. These all describe Distant Light data.

A little bit of history

While Data Landscaping has been developed to fill a gap in data understanding, it has been inspired by the work of others. Here are two examples.

1. Gartner

In his Gartner paper from October 2012, Big Data Strategy Components: Business Essentials, Douglas Laney describes 5 types of data that represent potentially valuable sources for a business.

  • Operational Data – Readily available data, typically in databases
  • Dark Data – Data that remains in archives or logs, or is not generally accessible
  • Commercial Data – Syndicated data from the likes of Nielsen and IRI
  • Public Data – Published governmental data
  • Social Media Data – Captured data about participation by individuals and businesses in social media

You can see them plotted against our axes here:

Gartner Landscape

2. Simon Wardley

Simon Wardley has developed a technique known as Value Chain Mapping (great resources can be found at Wardley Maps and on his blog). The key to a map, of any kind, is that it shows you position and movement as opposed to just a state or, at worst, behaviour.

In the case of Data Landscaping, once you have identified where your data sets are in the quadrants you have a sense of position. Next, you consider movement.

Moving data between quadrants

The goal of moving data is to get it into the Light / Close quadrant. This means you have the data and you know what you are going to do with it. In general there are two kinds of action that move data sets on the map: Access and Education.

Moving Data Landscape

1. Access moves data from Distant to Close

Access actions are most often financial or technological. Financial actions are simply the purchase of data, but can also involve an investment in matching, integrating or uniting teams or technologies. Technological actions are the implementation of acquisition, ingest, integration or unification, matching and storage technologies. This can range from learning how to call a web API on demand to setting up an automated data pipeline.

Other access barriers can include data quality and trust. If data is poor quality users might reject the data and block the value expected from accessing the data.

Access actions have the clearest value to an organisation. They are easiest to ask for and often come with a well understood return on investment.

2. Education moves data from Dark to Light

Education actions teach people what data contains and how it can be used. Education can be as simple as reading documentation and press releases or searching online. These actions can also include training courses and experimentation. Data sources that provide a well defined data product often come with the best documentation and the easiest education actions. Data that was 'just collected because it seemed like a good idea' often has the most complex and challenging Education actions. Often a large amount of experimentation is necessary to discover the value within dark data.

Education actions have a less well defined return on investment, especially as they relate most closely to aspirational data sets (those in the Dark / Close quadrant). Often a 'self starter' will investigate the possibilities of a data set, then evangelise its use within an organisation.

Interestingly, quite often Dark / Close datasets become Distant / Light after education because access considerations are one of the things that were not known.

There are many 'Tools' available for working with data. Tools both assist with Access actions and Education actions. When deciding which tools to invest in it is important to understand which kind of movement they are creating. As with the actions themselves, it is easiest to understand the value of tools that give you access to data you knew you wanted. However, it is possible that when you receive the data it is much 'Darker' than you were hoping for. Similarly, tools that help educate people about data sets but don't provide any improved access can be frustrating.

How To Build a Data Landscape: An Example

Business X does not sell directly to their customer, but has a strong brand and sells their products through large retailers and supermarkets. Their product has a strong association with health and vitality. As part of their marketing activity they want to create a 'honey pot' of content for sharing in social media. This will include advice videos, motivational images and articles. This content is categorised, themed and carries keyword tags to help users find it on their site and through search engines.

To understand the effectiveness of this strategy they want to answer the following questions through data:

  1. What content is available on the website?
  2. How much traffic does each type of article get?
  3. Which articles are shared the most?
  4. What are the sales of the key brands promoted?

The data landscape applied to the given model looks like this:

Data Landscape Example

The company has implemented Google Analytics on the honey pot site and can see traffic to various articles. This puts Google Analytics in the Close / Light quadrant. However, because of the structure of the website the details and content of the articles themselves are not available within Google Analytics. The company knows that the content is generated in a CMS but this is a bought-in system and they do not have direct access to the underlying data. This makes the CMS data Close and Dark. They do know, however, that an RSS feed is available that provides the data required. The data as RSS is therefore lighter and closer than the CMS - however they don't know how to use it as data.

The company built a share button into the CMS and configured it to allow sharing to Facebook, Twitter and Pinterest. Each of these systems gives out some form of data. Twitter and Facebook have well developed APIs for data or insights. In the case of Twitter, there are tools that allow deeper analysis beyond the API provided by Twitter. Pinterest does not provide a similar service at this time. This puts Facebook and Twitter on the light side but distant from the company, and Pinterest on the dark side of the line. Google Analytics is also able to track the 'click' on the share buttons for each site.

Nielsen and Kantar provide volume and penetration data about the products in question, but the data is quite expensive to access so is more distant than Facebook and Twitter. The company also knows that Tesco and Holland & Barrett have data about the sales of their product but no idea what they could use it for or how to get access. This makes these two sources Distant and Dark.

The company can start getting insights about their strategy now because they have a Light and Close source (Google Analytics). Educating an analyst in what is available in RSS will move this data Closer. However, based on the technical capabilities of the organisation it is likely to get more distant because RSS is not a useful data format and needs processing to be useful to an analyst.

Similarly, reading the documentation on the CMS will help lighten this data. Again, it is likely to push it more distant because it will be locked into the way the CMS works. If the RSS is complete, this may not be needed.

Connecting to the Facebook and Twitter APIs, or using another appropriate tool, will give access to data from these sources and allow the analyst to understand how articles are shared further.

To understand sales impact, the company can decide to pay for Nielsen or Kantar data to bring this data closer. Alternatively, they could contact the retailers directly about any data products that are available. It is often possible to barter for data as part of the wholesale process.

The further the data source has to move towards Light and Close the more investment is needed in access, education and tools.

Getting started

There is no real trick to getting started: draw the axes on a piece of paper or whiteboard and lay out what you know. To get the maximum benefit from the framework you should be brutally honest with yourself and ask questions such as: Is data from a source really as Light and Close as it could be? Do I really know what is available from public data? Do I need to invest more in education, at this stage, than access or tools?

Data Landscaping should be done quickly. There is little point in agonising over the exact positioning of a piece of data; as long as it is relative to the others on the landscape, that is fine. Also, for your business the exact definition of Dark or Light might be different from others'. The key is to have the discussion and agree on the relative position of items of data.

The next step is to prioritise a list of actions that will move data closer and lighten it for your business. It is easiest to focus on small movements first and, as you get comfortable with the process of moving data, you can dig deeper into what is available.

The breadth and richness of data sources available is ever increasing; Data Landscaping is designed to help stop this from becoming overwhelming.


Measuring The Full Customer Journey


The idea of measuring the full customer journey has been around for quite a while, and we have seen solutions that partially solved this challenge, but within the boundaries of click-only traffic or paid-only channels.

Historically it has been difficult to integrate ad views without clicks, generic traffic channels (direct, organic, referral) and cross-device journeys within one holistic view. With the integration of Google Analytics Premium and the DoubleClick Campaign Manager (DCM) all of this is available NOW.

All touchpoints included in the channel path

Once you integrate Google Analytics Premium and DoubleClick Campaign Manager, when you take a closer look at the Google Analytics Multi-Channel-Funnels (MCF) path analysis report, you will notice some special features, as seen in the screenshot below.

Multi-Channel-Funnels path analysis report

All touchpoints where users viewed a display ad on the journey to conversion (even without clicking on it) are marked with the eye icon. Bear in mind that in this case we do not only measure classic display ad views, but also email newsletters that have been opened but not clicked. This leads to insights as illustrated in the following path.

Measuring Ad Views

In this scenario the newsletter was opened, but obviously didn't attract enough attention to yield a click. Later on, a display ad supported the customer journey, which led to a reopening of the email, a click within the email and lastly a conversion.

Look around for articles on how to gain insights from attribution modeling that lead to campaign and media optimization. Typical questions are:

  • How do my e-mail campaigns support my revenue?
  • Which other channels get support from email?
  • What other channels are necessary for email to perform well?
  • How do display views influence the path length?

Path length and time with and without view attribution

Having only partial insights (and data) from the customer journey leads to wrong assumptions and decisions. Here is the comparison of the path length for a specific goal, with and without taking the view contacts into account.

Cumulated Conversions

Without taking the views into account, we would think that we need fewer interactions than we actually do. This assumption is made even worse when we look into the assists. In the screenshot below, the views are NOT included. Here we see 42 assisting clicks to the conversion and an assisted conversion value of €2,285.

Path length with no views

When we look into the same data WITH attribution of the ad-views, we see the ads assisting the goal 895 times - 20 times more! And the assisted conversion value is €33,716, which is €31,431 higher than in the previous screenshot!

Path length including ad views

Without attributing the views, the display channel seems worthless. A wrong conclusion would be to decrease media spend for that channel and shift it to the "performance" channels. This could result in a loss of awareness in the upper funnel, which could lead to lower conversions and ROI: a fatal error for most businesses out there.

But what is the value of a view?

We often get asked: "But does a simple ad view have the same value as a click?" This is a valid question, as a click is a clear indication of interest, whereas a view is only the technical delivery of an ad to the user's browser; we cannot determine whether it has really been seen or whether it was of any interest.

This is why the Google Analytics attribution modeling engine offers us the option to customize its models in multiple ways. In this article I want to emphasize the adjustments regarding ad views:

As shown in the screenshot below, we can define that

  • in general a view should only be attributed 50% of the value of a click (0.5)
  • however, if a view is followed by a click within 10 minutes, it should be counted as 150% (1.5)
  • and if a click did not lead to conversion directly but to a visit with a higher user engagement (time on site > x:xx), it should be valued at yyy%

Create attribution model

Conclusion

As we all know, the future of marketing is data driven. Having only a fraction of the data leads to suboptimal and sometimes inaccurate decisions. The 360 integration of all channels (click AND view interactions) and successful cross-device measurement (online AND offline) is the key to success. Google Analytics Premium and DoubleClick Campaign Manager offer a supreme solution for that, with the power of data in a well known user interface. That enables businesses to gain the right insights for the best strategies.

This article is also available in German: Full customer Journey Analyse mit Analytics und DoubleClick


Google Analytics Art: Symmetry and Patterns


Like many of you, I use Google Analytics quite a lot, and I love the visualizations and the interface in general. But once in a while I am gifted with little data jewels that really make my day. We all know (or should know) that the chance of getting an exact 50-50 rate is the same as getting a 51-49, but round numbers have a certain glamour, and it is hard not to notice and appreciate them, especially when talking about very large numbers.

Usually, we look at it and appreciate the beauty for a few moments, maybe even share it with a colleague… but I also like to collect them :-) So, today I am sharing a few screenshots I collected over time with some nice patterns and symmetries.

Note: I am aware that it is possible to create any visualization on Google Analytics with fake data by simply going into the interface, right-clicking on numbers to "Inspect Element", and changing the data in the source code. While that's a great way to create a prank or just admire beautiful visualizations, in this article I am sharing charts I saw on real data, as I find it really mind-blowing, as mentioned above.

Symmetry: Lines, Pies, Bars

"The universe is built on a plan the profound symmetry of which is somehow present in the inner structure of our intellect." Paul Valery

Symmetric Line Chart

Symmetric Pie Charts

Symmetric Comparison Chart

Patterns: Multi-Channel Funnels

"Art is the imposing of a pattern on experience, and our aesthetic enjoyment is recognition of the pattern." Alfred North Whitehead

Google Analytics Channel pattern

Display seems quite strong!

Multi-Channel Funnel pattern

Colorful Data Patterns

Colorful data pattern


Running Semi-Automated Tasks on Google Analytics


Tired of manually clicking through the Google Analytics interface to retrieve data? Want to take advantage of the Google Management API, but the functions you need just aren't available? Well then, let me share with you my temporary solution and long-term vision for creating an easy interface for such tasks.

As a web analytics implementation consultant at Cardinal Path, I work extensively with Google Analytics and Google Tag Manager. The standard approach that I often take before attempting any form of tagging or code changes is to take a peek at the client's existing configuration and data. This initial audit checks for data integrity, overlooked resources, and implementation practices.

My role offers me the opportunity to work with enterprise-level clients with hundreds of views. This means that I may end up manually clicking tens of thousands of times, with a high likelihood of repetition, when making a global change, whether for maintenance or to resolve an issue.

As a web programmer, I refuse to let the web take advantage of me and thus started my hunt for an automatic and robust solution for repetitive tasks within Google Analytics.

A Sample use case: Adding Annotations to multiple views

A recent client audit revealed that the Annotations feature isn't being used consistently, if at all, during certain time periods. This is a powerful Google Analytics feature that allows anyone analyzing data within the reports to understand the reasons for possible spikes in the data.

That this feature may have been overlooked does not come as a total surprise for this particular client, due to the sheer number of views being maintained. As a result, we would like to offer insights on how to carry out a possible solution, as well as how to approach similar tasks in the future.

Using Google Chrome Console

In the aforementioned use case, the client needs a way to start annotating, and often the same annotations are required for multiple views / properties. A quick way to achieve this is to automate a series of manual tasks in succession in the browser console. In the case of Chrome, I rely on my handy Chrome console (shortcut: CTRL + SHIFT + J on Windows, or CMD + OPT + J on Mac).

Before you drop the code provided below into your Chrome console, ensure you are inside the Google Analytics reporting interface and find the starting view by searching for its property in the dropdown menu at the top (as highlighted in the green box in the screenshot below). This step ensures the code runs through the list of views following and including the selected view for that property (the code stops running once it finds a new property).

Google Analytics Semi Automated Tasks

If I select the view MyLabel - David Xue -liftyourspirit david.com in the screenshot above, the code will affect all the views down to the last view in this property, MyLabel - David Xue - Youtube.

Running the semi-automated Google Analytics tasks

You may run the code after pasting it into the console tab by pressing 'enter'. Caution: this will add annotations to your views, so be sure you are in a test property with test views, or just delete them manually afterwards (better yet, modify this current code for deletion across the property).

count = 0;
start = setInterval(function() {
    var current_view = $('li[aria-selected=true]');
    setTimeout(function() {
        setAnnotation();
        if (current_view.next()[0] == undefined) {
            console.log('Swept ' + count + ' views');
            clearInterval(start);
        }
    }, 2000);

    setTimeout(function() {
        getNextViewInReporting();
        count++;
    }, 6000);
}, 8000);


getNextViewInReporting = function() {
    $('._GAx5').click();
    var current_view = $('li[aria-selected=true]');
    current_view.next().click()
}

setAnnotation = function() {
    var view = $('.ID-accounts-summary-1').text()
    var date = 'Sep 13, 2015' //Enter the date
    var annotation = "A test annotation" //Enter the annotation
    var visibility = "shared" //Default is shared, other is private
    $('#AnnotationDrawer_wrapper').css('display', '');
    setTimeout(function() {
        $('a._GAkAb._GAlo')[0].click()
        $('input[name="date"]').val(date) //"Sep 17, 2015"
        $('._GATb._GACm').find('tbody textarea[name="text"]').val(annotation)
        $('._GATb._GACm').find('tbody textarea[name="text"]').click()
        if (visibility == 'private') {
            $('#AnnotationsDrawer_private_radiobutton').prop("checked", true)
        } else {
            $('#AnnotationsDrawer_public_radiobutton').prop("checked", true)
        }
    }, 1000)

    setTimeout(function() {
        $('._GATb._GACm').find('form').find('a._GAE._GADq b b b').click();
        console.log(count + ' view: ' + view + ' added annotation: ' + annotation)
    }, 1800);
}

The actions that follow mimic how we would manually produce the annotation. On the scheduled eight-second interval, the code first ensures the drawer near the timeline is open, so the 'Create new annotation' object is exposed. Then we click this object in order to fill the expanded form with pre-filled data from our code. Lastly, we click on the save button before the process is repeated until the last view in the property.

Please note that the code works as of October 2015, but Google may change their HTML markup, so make sure to test the code before you use it.

Further Development

By modifying the code provided above, you can expand the use cases to include the following (but the sky's the limit!):

  1. Check for campaign data
  2. Check for social network data
  3. Check and edit view configurations
  4. Check and edit most of what is provided in the Google Management API; the reason for this duplicate method is that there is a quota limit, which you can reference here
  5. Check, create and edit calculated metrics

Since we can semi-automate so much without relying on the Google Management API, my next step would be to create a long term solution in the form of a plugin that will automate and provide a quick summary of the general audit we typically perform at Cardinal Path. Note that Cardinal Path provides a more in-depth and personalized audit with our current clients, so definitely reach out if you would like to learn more.


Bulk Analytics Configurations with Google Sheets


Google Analytics is known for its simple, turnkey approach to getting started with analyzing your traffic data. Just drop the snippet on your page and GA does the rest! Right?

...right?

Ok, fine. While there are plenty of Google Analytics users who manage just one property, a growing chunk of the analytics market is composed of large companies with complex account structures with dozens upon dozens of properties. Managing things like filters, channel groups, goals, and custom dimensions across all of these entities is far from trivial, and time-intensive at best.

Take custom dimensions, for instance. Imagine having to edit custom dimensions across, say, 20 properties. You'll be moving from property to property, drilling into each dimension setting and waiting for the save to happen before moving to the next property. In short, you'll be clicking around the Admin section of the interface for quite a while!

Custom Dimensions settings

As it happens, most of the people in these large companies with complex requirements are often unable to get through the day without opening up a spreadsheet. In fact, more often than I'm comfortable admitting, people have told me that what they want isn't necessarily a robust tool; they just want a tool that will allow them to get the job done, and ideally it would be in a spreadsheet because that's where they live.

"But spreadsheet add-ons aren't robust enough for enterprise software."

Me: "Yeah, well, that's just like - you know - your opinion, man."

Look, enterprise data is messy. You have to manage it in a way that is sustainable and flexible, but getting it done is better than not. And for better or worse, in large organizations, people live in spreadsheets. By providing tools that let people get their jobs done with as gradual a learning curve as possible, larger organizations will be better positioned to use powerful features of Google Analytics in a way that would otherwise be too cumbersome to consider.

So, for this growing group of users, wouldn't it be great if they could manage the configuration of their custom dimensions in a spreadsheet, copy and paste the configurations across multiple properties (maybe with some slight differences), and then upload the whole thing back into Google Analytics?

Custom Dimension spreadsheet

Google Analytics Management API and Custom Dimensions

The Google Analytics Management API can be used to manage many things, including common entities such as custom dimensions. The API can be accessed through a Google Sheet using Apps Script, and Google Sheets automatically handles authentication for Google APIs. This makes Sheets add-ons a convenient way to distribute functionality for important business processes, such as listing and updating custom dimension information in the tabular form to which spreadsheet users everywhere are accustomed. In fact, the API was built with the expectation that users would develop their own ways of accessing and processing their data.
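To make the mechanics concrete, here is a minimal Apps Script sketch (not the add-on's actual code) that lists a property's custom dimensions into a new sheet. It assumes the Google Analytics advanced service is enabled for the script project; the account and property IDs are placeholders:

function listCustomDimensionsToSheet() {
  var accountId = '123456';        // placeholder account ID
  var propertyId = 'UA-123456-1';  // placeholder property ID
  // Fetch the custom dimensions for the property via the Management API.
  var response = Analytics.Management.CustomDimensions.list(accountId, propertyId);
  var rows = (response.items || []).map(function(cd) {
    return [cd.index, cd.name, cd.scope, cd.active];
  });
  // Write the results into a new sheet with a header row.
  var sheet = SpreadsheetApp.getActiveSpreadsheet().insertSheet('Custom Dimensions');
  sheet.getRange(1, 1, 1, 4).setValues([['Index', 'Name', 'Scope', 'Active']]);
  if (rows.length > 0) {
    sheet.getRange(2, 1, rows.length, 4).setValues(rows);
  }
}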

One of the configurations that the Management API enables you to manage from your own systems (as opposed to the Google Analytics interface) is the Custom Dimension feature. This is an important feature that allows you to add custom data to the information Google Analytics is automatically getting for you. For example, you can add a dimension to capture:

  • The type of users (silver, gold, platinum)
  • The level of engagement in the current session (maybe based on scroll percentage)
  • The name of the author on an article page

If you do not use this feature, take a look at these 5 questions.

But when you're a marketing organization with limited engineering resources, who is going to write a robust tool to manage these entities at scale in a way that is easy to use and gets the job done?

Your friendly neighborhood Googler, that's who!

Working with Custom Dimensions in Google Sheets

With that in mind, I rolled up my sleeves and started working on an add-on that would help users manage their custom dimensions in a more robust and organized way, while using an interface they are comfortable with. What I came up with can be found in the add-on store: GA Management Magic

Below I provide a step-by-step guide on how to use the add-on to manage your custom dimensions, but if you are the video type of person, you can also see me using the add-on in the following screencast.

1. Install the GA Management Magic add-on

The add-on is available through the add-on store.

2. Listing Custom Dimensions

To list custom dimensions from a property, run the List custom dimensions command from the add-on menu (see screenshot below). Enter the property ID from which to list custom dimension settings into the prompt.

Google Analytics configuration add-on

A new sheet will be added, formatted, and populated with the values from the property. You're welcome!

3. Updating Custom Dimensions

To update custom dimension settings within 1 or more properties, run the Update custom dimensions command from the add-on menu (see screenshot above). Enter the property IDs (separated by commas) of the properties that should be updated with the custom dimension settings in your sheet.

The properties listed in the sheet will be updated with these values. Neat, right?

If you have not named the range(s) as described above, the script will format a new sheet for you into which you can enter your custom dimension settings. It is also recommended that you not upload blank values to the property, as this may result in undesirable behavior.

The code for this add-on is available on GitHub. Feel free to grab, improve and share!


Measuring Page Velocity with Google Analytics


This article was contributed by Bill Tripple, Senior Consultant of Digital Intelligence and Alex Clemmons, Manager of Analysis & Insights, Digital Intelligence, both from the award winning digital data analytics firm, Cardinal Path.

One of the most basic questions asked by marketers is: "How much content is being consumed on my website?" Traditionally, content consumption has been measured through the use of pages per session (total pageviews / total sessions). This metric has served as a simple barometer for content consumption for many years, but it has its limitations: namely that we can't drill down to easily see which page(s) on the website are actually driving additional content consumption vs. which pages are just the most popular.

Enter Page Velocity, a custom built metric within Google Analytics that allows us to drill deeper and understand which pages have the greatest influence in driving users deeper into a website. Through the use of page velocity we have the ability to see that Page A drives, on average, five additional page views, whereas Page B only drives three. Perhaps we optimize some of our landing pages to take elements of Page A and test them on Page B. Or divert media traffic from one page to another.

As you can imagine, this metric can be very useful for content-driven websites that depend on advertising dollars, as we can now look into which pages are driving the highest ROI (propelling users deeper into the website).

Defining Page Velocity

The basic principle of page velocity is as follows: Page Velocity = (number of pages seen after the current page is viewed / unique page views to the current page)

The following example measures page velocity from three sessions:

Page Velocity Calculation

Page Velocity Values

From here you can start to see how this comes together. Page A is tied with Page B for the highest Page Velocity: both were seen within two sessions and both drove a total of 9 additional pageviews, which works out to a velocity of 9 / 2 = 4.5. Page G is on the low end with a velocity of 0; it was seen within two sessions but did not drive any additional pageviews.

Of course, you can also start to imagine where this metric has its flaws, in that some pages will have a low Page Velocity by design (like the thank-you page of a form, for example). Understanding the purpose of each page will be critical for successful analysis using this metric.

Using Page Value to Measure Page Velocity

The measurement of Page Velocity takes advantage of the Page Value metric within Google Analytics. Before diving into specifics, it's important to understand how Google Analytics evaluates Page Value. In Google's official documentation they provide an example where page B would receive a Page Value of $110 (($10 + $100) / 1 session):

Google Analytics Page Value

To measure Page Velocity, we will need to send an Ecommerce transaction with an arbitrary value of $1 on every pageview so it receives credit for the future pageviews as well.

If you are already using Ecommerce, this will inflate your real Ecommerce metrics / reports, so we recommend that you create one view specific for page velocity, and filter out these Ecommerce transactions in your remaining views. Additionally, you'll want to filter legitimate Ecommerce transactions from your Page Velocity view.
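A minimal analytics.js sketch of such a hit might look like the following. It assumes the standard analytics.js tracker is already on the page and that, as recommended above, these transactions are filtered out of your other views; the transaction ID scheme is illustrative:

// Send a $1 "dummy" transaction on every pageview so the current page earns
// Page Value credit for the pageviews that follow it.
ga('require', 'ecommerce');
ga('ecommerce:addTransaction', {
  id: 'pv-' + Date.now() + '-' + Math.floor(Math.random() * 1e6), // unique, arbitrary ID
  revenue: '1.00'
});
ga('ecommerce:send');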

Using this trick, we can now answer the question of which pages on your website are actually driving visitors the deepest. The next step is to segment your visitors, to further optimize existing and future content. Below you will find a pages report showing our Page Velocity metric via the Page Value column.

Page Velocity Report

Even though we're seeing a dollar value, this really represents the velocity. So the page in row 10 of the above screenshot drives an average of 5.27 additional pages.

Going a level deeper with segmentation

The next step is to segment your visitors, looking at page velocity by different source/mediums, by custom dimensions, etc. You can find two useful examples below.

1. Pages segmented by "Medium" with a secondary dimension

Segmenting Page Velocity

2. Pages segmented by "Landing Pages" with a secondary dimension

Landing Page Velocity alongside bounce rate can be very telling. For example, a page with a low bounce rate but also a low Page Velocity may be one to look deeper into.

Landing Pages Velocity

Additional use cases

At first glance, Page Velocity may seem to be useful only for content-heavy sites. But this is not the case. We have found that marketers across verticals have an appetite for this metric including in Ecommerce, Banking, Finance, Higher Education, Non-Profit and many others using it in various capacities. One major benefit of the metric is that it is completely customizable to the needs of the website and stakeholders utilizing it.

For example, on an Ecommerce site you may want to understand which products are driving additional research. Tailoring Page Velocity to only be captured on research focused pages (say a product detail page and a product gallery page) would allow us to tailor this metric to be something like Product Research Velocity.
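Building on the earlier analytics.js sketch, one way to scope the metric like this would be to send the $1 transaction only on research-focused pages; the URL pattern below is purely illustrative:

// Only credit "Product Research Velocity" on product detail and gallery pages
// (hypothetical URL structure).
if (/^\/products\//.test(location.pathname)) {
  ga('require', 'ecommerce');
  ga('ecommerce:addTransaction', {
    id: 'prv-' + Date.now(),
    revenue: '1.00'
  });
  ga('ecommerce:send');
}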

About the Authors

Bill Tripple is a Senior Consultant of Digital Intelligence at Cardinal Path; his areas of expertise include analytics implementations for both Google and Adobe products, and he is convinced that he can track nearly anything. He is certified in Google Analytics, Adobe Analytics, and Eloqua. His current and past experience in digital marketing and development has given him a competitive edge to easily associate and relate with both marketers and developers. Learn more about his professional experience on LinkedIn.

Alex Clemmons is a Manager of Analysis & Insights, Digital Intelligence. He leads day-to-day operations across multiple clients at the award winning digital data analytics firm, Cardinal Path. He has a passion for finding meaning in mountains of data and will jump at any opportunity to help clients apply their data to drive results. His areas of expertise include measurement strategy, testing, and deep-dive analysis focused on identifying areas for optimization. Learn more about his professional experience on LinkedIn.


Explaining Google Analytics to Your Boss


Opening Google Analytics for the first time can be overwhelming. Graphs and reports, never-ending menus, and configuration settings that you may or may not need to know about; it's all there waiting for you. Are you prepared to speak confidently about what you see in your Google Analytics?

Generally speaking, you'll find two main types of articles about Google Analytics: setup and reporting. Setting up the tracking on your website starts easily enough, but can quickly take on barnacles as you encounter challenges with your particular site, third-party vendors, and multiple systems, just to name a few. Reporting seems like it should be much simpler, everyone gets the same set of reports – your reports just have data about your website, assuming proper setup.

In practice, reporting brings its own unique difficulties. Even if you didn't set up the tracking on your site, you still need to understand how the data is collected and processed to understand the data you've been tasked with interpreting.

This guide is meant to cover the basics about how Google Analytics works, what the numbers actually mean, and how you should begin to report on it. If you've used Google Analytics extensively, forgive me for the review, but I feel it's worth re-familiarizing yourself with the core concepts and definitions, if only to solidify your understanding.

Setting Up Our Google Analytics

Let's start at the very beginning. Inside the GA interface, you first create an Account. This will usually correspond with your company name. Inside the Account, you create separate Properties for each website that you own that you'd like to track.

Each website gets assigned its own Property ID, which is how GA will keep your data organized. This ID looks something like UA-XXXXX-YY. Think of the Property like an email inbox and the ID like the email address. You send data to this particular ID and GA collects it for us.

Underneath Properties are Views, which are different ways to view the data that was collected in the property. In your email inbox, you may sort emails into different folders or use tags to identify them. Similarly in GA, we can sort data into different Views so that we can easily look at a smaller section of data. All of the data lives at the Property level, Views are just different pre-sorted or pre-formatted ways to look at it. Here is an example from the Google Analytics Help Center.

Google Analytics Hierarchy

How Does Your Data Get To Google Analytics?

That's it for the interface! Let's talk about how you send data to the right place. GA automatically generates a little bit of JavaScript for you that you then need to place on every page of your website. How you add this to your website is very specific to how your site was built. There are a number of ways to accomplish this, like plugins for popular platforms such as WordPress, and systems like Google Tag Manager that make it easier to add Google Analytics to your site.

Once the code is installed correctly (we won't cover that here), it will immediately start sending data to GA. Again, let's go back to the email address metaphor. The code automatically sends small pieces of data to GA to track what pages are loaded and information about the person and browser loading the page. These pieces of data are called hits and are sent to that unique UA-XXXXX-YY ID that is specific to your site. Here is how to find your tracking code.
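For reference, the business end of that generated snippet boils down to two calls (the loader function is omitted here; UA-XXXXX-YY stands in for your own Property ID):

ga('create', 'UA-XXXXX-YY', 'auto'); // tie the tracker to your Property ID
ga('send', 'pageview');              // send a pageview hit when the page loads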

This isn't, however, like a video where GA can see someone's mouse moving around on the screen. By default, information gets collected only when the page is loaded and the hit is sent to GA. This is important when we get to how metrics are defined below.

Potential Tracking Challenges

Let's be upfront about a few issues that may occur. Due to the nature of this type of tracking (called client-side tracking) these numbers will never be 100% complete. Google Analytics isn't all-knowing and all-seeing. There are a number of things that may affect data from ever being sent to GA, but these usually only impact a small percentage of your traffic.

Depending on how your particular users access the site, there are several reasons why the data may not be sent to GA. JavaScript is required to execute the GA code that tracks users accessing your site. This feature can be turned off in a particular browser by a person or the company that owns the device, though arguably many pages on the internet would flat out stop working without JavaScript.

For a typical implementation, GA also requires the ability to store first-party cookies on the person's device. First-party cookies are small pieces of data that are stored on the user's computer to help remember if they've been to the site before. This is how GA determines New vs Returning Users. Generally, first-party cookies are considered trustworthy - they can only be accessed by the page that set the cookie. If there's an issue with cookie storage, several metrics will be severely affected.

There are technologies out there to help block advertising and web tracking like Google Analytics. There are many reasons why someone would actively try to block Google Analytics; I've heard everything from privacy concerns to data usage. Unfortunately, there's not much you can do from a technical perspective. If someone blocks Google Analytics from loading properly, then they won't be included in your Google Analytics data. This goes for regular site traffic, as well as ecommerce and other important conversions you may have defined.

Then there's the implementation – it's amazing how often people accidentally install the same tracking more than once or make other small mistakes with big impacts. It should go without saying, but if Google Analytics isn't set up properly, the numbers you get back won't be accurate.

Types of Reports

There are four main categories of reports in Google Analytics and they answer very different questions.

Audience

These reports tell you about the users that accessed your website. More specifically, they'll tell you everything that can be gleaned just by the user arriving on your site: what type of device they are on, whether they have been here before, and where they are coming from geographically. Note that you won't see personal information here. (We're back to Google Analytics not being omnipotent!) There are reports called Demographics and Interests reports that can be turned on, but those use Google's best guesses and estimates for people's gender, age, interests and affinities.

Acquisition

These reports help determine how someone arrived at your site. Without going into too much detail, this is determined by looking at the previous page someone was on before they arrived on your site. Users are placed into different channels like Organic, Paid Search, Social, and Referral.

Behavior

This section focuses on what users did on your website. By default, you get information about what pages people look at – which landing pages are most popular, how long people spend on specific pages, etc. With some setup, you can also see other actions that take place on your site like downloads and site search.

Conversions

This last group of reports is your chance to tell Google what's most important to you. You can define certain pages or actions that you hope visitors to your site will accomplish, and then be able to see how often those conversions occur. You can also track ecommerce purchases and related actions. This section requires configuration to make it specific to your site.

What Should You Report On?

So you've got your site tracking set up properly and you can see the data flowing into Google Analytics – now what?

It's important to lay out what you'll be tracking and why. Don't get hung up on specifics that are too small to matter. Rather, focus on comparing time periods and trying to identify places for additional research. If traffic went up from last month to this month, where did you see your most growth? Mobile vs Desktop? Were there specific types of content that did better or a particular channel that sent more traffic this month?

Monthly reporting is common but problematic. Keep in mind the complexities that come with months of different lengths. Most website traffic ebbs and flows with the work week, so it would make sense that if one month has more weekdays, it would have more traffic. Holidays throw a wrench into calculations, and depending on your industry, so can world news or political events. Often these are the simplest explanations for differences between two months.

We talk often of Measurement Strategies and forward thinking. Try to anticipate decisions you could make with the right data and then create the reports to help influence future decision making. If you're a content or service website, knowing which pages are getting the most traffic can help influence future articles or business opportunities. Traffic performance can influence paid marketing campaigns, and with the right tracking, you can answer questions about how various channels are driving conversions that may influence budgeting discussions.

You can also use reporting to identify potential errors on your website. Hopefully, most errors are due to tracking code issues and easily corrected, but you should also be using reporting to monitor if your site is performing appropriately across browsers and devices. Large spikes or dips can identify areas for future research.

Common Google Analytics Metrics

When you see the term Metrics, think of any sort of number or count in Google Analytics. The most common metrics that we use to report on are listed below.

Users

First, it is important to understand that users do NOT tell us the number of PEOPLE that arrive on our website. In practice, users is a count of the number of unique devices that access our website. Even more specifically, a unique browser on a unique device. Remember those cookies we talked about earlier? Each set of cookies is a different user.

Think about your own digital life – how many computers/devices do you use during the day to access the internet? Work PC vs Home PC? Phone vs Laptop? Each different device counts as a different user. There are ways to get this number to more accurately represent people instead of devices, but that requires additional setup and a situation where you know someone's actual identity (for instance if they log in to your website.)

Typically, this number is higher than it should be. We have more users than actual people visiting the website and some tracking issues will artificially increase this number. People can clear their cookies and get new computers. It's still worth reporting on, but be clear when talking about this particular metric.

Sessions

A session is all of a user's activities on your site within a given time period. If I come to your website and view five pages, that is all grouped into my one session. Remember that GA doesn't have a live camera feed to watch someone browse your site, so there's really no way for it to know when a person leaves your site. It determines that a session is over after a user has been inactive for more than 30 minutes.

Each session gets attributed back to a specific channel in the Acquisition reports, so if someone arrives on our site from Social media or a Google search, all of the activity in that particular session gets credited to that particular channel. If they come back from a different source (or after 30 minutes of inactivity) then a new session is started.

This is a great metric to track and report on. We clearly want to see more sessions coming to our site, and sessions are a great indicator of activity on the site.

Pageviews

This counts how many pages are viewed on the website. Pretty easy, right? For general reporting, month over month, it's an OK metric to use to see ups and downs. Keep in mind what it's really measuring though. If you have "hub" pages, like your homepage, where people branch off from and then return to frequently, your pageview numbers will go up, but you haven't necessarily increased value from those extra pageviews.

Typically, this number is higher than it should be, because it includes multiple views of the same page, even during the same session. Use this as a benchmark month over month or year over year, but for more in-depth analysis, use the Unique Pageviews metric for individual pages.

Avg. Session Duration

One of the most misunderstood metrics, Avg. Session Duration would ideally be just that – how long, on average, users spent on your site. Instead, it reports how much time GA actually measured users spending on your site. It may seem like a small distinction, but it's worth making. Remember that data is only sent to GA by default on page load; everything after that page load is a mystery until the user views another page. For example (hypothetical numbers), if a user lands at 10:00, views a second page at 10:04 and then reads for ten more minutes before leaving, GA only measures 4 minutes.

Typically, this number is lower than the real figure. You know people are spending more time on your site, and you can take steps to make this number more accurate. Think of a digital image: the more data you have, the clearer the picture becomes. You can add events to track engagements like downloads or video plays, which in turn gives GA more data points and makes Avg. Session Duration a more accurate calculation.

Bounce Rate

This metric tells you the percentage of sessions on your site that only completed one action. Typically this means how many people arrived on your site and then left without doing anything else, or "bounced." This metric is extremely helpful for gauging the effectiveness of landing pages or of traffic from specific channels.

(Sidenote – I rarely give “good" and “bad" numbers for metrics, as every site and industry is unique, but if your bounce rate is very low, it can be a sign that you have a tracking issue. It'd be nice to think that your site has a 5% bounce rate, but most often that's not the case.)

Typically, this number is higher than the real rate. Again, we have an issue that can be solved with more data. By default, the GA code tracks only when pages are loaded. If someone arrives on your page and leaves after 2 seconds, we want that to be counted as a bounce. If they stay for 10 minutes reading an article and then leave, they'll still be counted as a bounce, because GA has no idea they were there for that long. Adding additional events will not only move your Avg. Session Duration closer to accurate, but it will also help clarify your bounce rate.

Bounces are also included in the time-on-site calculations as zeros, which really brings down your average.

Taking The Next Step

Reporting through Google Analytics can be a rewarding experience when paired with active decision-making based on those reports. It's important to know how your website is performing, but even more so, the insight gleaned from Google Analytics reports can influence strategy, affect budgets, guide development, and more.

There will always be ways to improve your data. You can collect more data, you can collect better data, and even after all of that, you can address data issues that make reporting more difficult.

Take the time to know what is being tracked and how the numbers are calculated to help clarify your own understanding of Google Analytics and how it can impact your website.

For continued education, there are a number of great options out there. Google's Analytics Academy is a great (free!) option for those who enjoy self-guided learning. Our company, as well as many others, offers in-person training around the world, covering everything from beginner-level reporting to advanced implementation and Google Tag Manager setup.

Of course, there's no better way to learn than to start doing. It takes a company-wide commitment to identify the decisions that can be helped with data from Google Analytics and then to put it into practice.


Horses, Jockeys and Digital Analytics


This post is not about horse racing but bear with me for a second (the analogy is from my Dad, who loves horse racing).

When I was a kid, I asked my Dad a lot of questions about horse racing. He would say: "No matter how fast the horse is, if the jockey is no good, don't bet on that horse" (I am not giving horse betting advice here - and take this with a grain of salt; he hasn't won anything big in the last... uhmmm, 30 years).

You Gotta Love The New Shiny Tool

Now, why am I talking about horse racing? Because there is something that drives me crazy. Tools, tools, tools... what tools do you use? Do you use Google Analytics? Do you use Adobe? Do you use Tableau? What about Power BI? It's cool... you get it. Whenever I talk to someone from the industry, I usually get a tool question, pun intended. Everybody wants to know what tool I use at work. And as you will find out by the end of the post, this has been bothering me for a long while because, to be totally honest, I don't care about the tools you use. I've used pretty much every tool out there, from Adobe to Webtrends to multiple versions of Google Analytics, but what I care most about is what you do with the data these tools spit out.

I've been fortunate (!) enough to spend some time with Executives, and I was never asked about the tools I use. They don't give a damn about my tools. It's all about the analyses my team does and the impact these analyses have on the business.

Perfect Data: The Utopia

If you've been around for a while, you will most likely agree with me that data has never been and will never be perfect. Period. I talk to friends who work at the top 10 banks in the world and it's the same story. There will always be some problem: the data collection methodology is different in the new system, there was a bug in the code, the implementation was inaccurate because the developer forgot to append a semicolon, the analyst didn't write the requirements based on what the business asked for, QA didn't do its job properly because they didn't invest enough time before the release, yada yada yada.

As analysts, I think our job is not to make the data perfect (I am not suggesting that you shouldn't try), but as long as your dataset aligns with the business question or challenge at hand and you understand how it's collected and pulled, you should be in good shape. And this doesn't have anything to do with the tool, the horse; it's always about the analyst, the jockey.

Digital Analytics Jockey (Source: Tyler Baze and other Jockeys)

If you tried to hire in the last couple of years, I am sure you are aware of the challenges regarding the talent, or lack thereof – there are not enough of us. I would, without hesitation, give away my best tools in exchange for a great analyst. An analyst who understands the business well, and who understands the system from both the technical and business perspectives, is worth more than some shiny tool. And to be blunt, if you ask me about the tools we use in an interview, I will probably thank you for coming in and wish you luck (unless you are an entry-level analyst, of course).

I'd Prefer an Abacus

In summary, if you have deep enough pockets to be able to put the best analysts in front of the best tools, go right ahead. You are going to ace your game. But if you have to cut down on your hiring budget to be able to afford the next shiny tool, I would recommend that you think twice. I'd rather give an abacus to a great analyst than give the Most-Universally-Powerful-Super-Automated-Cool-Attribution-Model generating tool to the average analyst.

So let's talk about analysis, let's talk about the business challenges, let's talk about the impact on business when we talk about digital analytics – not tools.

Oh, we use Google Analytics Premium and Google BigQuery at autoTRADER... in case you are wondering!


Funnel Analysis with Google Analytics Data in BigQuery


Conversion funnels are a basic concept in web analytics, and if you've worked with them enough, you may have gotten to a point where you want to perform a deeper analysis than your tools will allow.

"Which steps in my funnel are being skipped? What was going on in this funnel before I defined it? Which user-characteristics determine sets of segments across which progression through my funnel differs?" These questions can be answered using the solution described in this article. In particular, I'm going to talk about how to use BigQuery (BQ) to analyze Google Analytics (GA) page-hit data, though the principles could be applied to any page-hit data stored in a relational database.

The Google Analytics Funnel Visualization report (see below) makes certain abstractions and has certain limitations; advanced users can benefit from using Google BigQuery (BQ) - an infrastructure-as-a-service offering which allows for SQL-like queries over massive datasets.

Funnel Analysis

In this article, we'll discuss the benefits of using BigQuery for funnel analysis as opposed to the Google Analytics user interface. In order to make the solution clear I will go over the basic structure of an SQL query for funnel analysis and explain how to use Funneler, a simple Windows application to automate query-writing. The source code of Funneler is also provided as a Python 3 script. Please note that in order to use the specific examples provided here you will need a Google Analytics Premium account linked to BigQuery (learn more about the BigQuery Export feature).

Funnel Analysis - Google Analytics UI vs. BigQuery

The solution I propose below works as follows: using a Windows application (or Python script), a BigQuery-dialect SQL query is generated which tracks user-sessions through a set of web properties, optionally segmenting and/or filtering the sessions based on session characteristics. BigQuery's output is a table with two columns per funnel stage: one for session-counts, and one for exit-counts.

Below is a list of the most significant differences between GA Funnel Visualization and the solution I will be discussing.

  1. Loopbacks: If a user goes from steps 1 -> 2 -> 1, GA will register two sessions: one which goes to step 1, one which goes to step 2, and an exit from step 2 to step 1. Our query will only count one session in the above instance: a session which goes from step 1 to step 2. Furthermore, since progress through the funnel is measured by the "deepest" page reached, the above scenario will not be distinguished from a session which simply goes from step 1 -> 2.
  2. Backfilling funnel steps: GA will backfill any skipped steps between the entrance and the exit. This solution will only register actual page-hits, so you'll get real numbers of page-hits.
  3. Historical Information: GA Funnels cannot show historical data on a new funnel, whereas this workflow can be used on any date range during which GA was tracking page-hits on the selected funnel-stage pages.
  4. Advanced Segmentation: GA Funnels don't support advanced segmentation, whereas with Group By clauses in BigQuery, you can segment the funnel on any column.
  5. Sampling: GA Funnel Visualization shows up to 50,000 unique paths, whereas BQ will contain all the page-hits that GA recorded, and allow you to query them all.

The Query

For Google Analytics data, the basis of a funnel query is a list of URLs or Regular Expressions (regex), each representing a stage in the conversion funnel.

If you have a pre-existing funnel in GA, follow the steps below to find your funnel settings:

  1. Go to Admin in GA
  2. Select the correct Account, Property, and View
  3. Go to Goals
  4. Select a Goal
  5. Click Goal Details

In this screen you will find a regex or URL for each step of the funnel. They may look like this: "/job/apply/".

The basic process of writing the query, given the list of regexes or URLs, is as follows:

1. Create a base-level subquery for each regex

For each row which has a regex-satisfying value in the URL column, pull out fullVisitorId and visitId (this works as a unique session ID), and the smallest hit-number. The smallest hit-number just serves as a non-null value which will be counted later. The result sets of these subqueries have one row per session.

SELECT
  fullVisitorId,
  visitId,
  MIN(hits.hitNumber) AS firstHit
FROM
  TABLE_DATE_RANGE([<id>.ga_sessions_], TIMESTAMP('YYYY-MM-DD'),
    TIMESTAMP('YYYY-MM-DD'))
WHERE
  REGEXP_MATCH(hits.page.pagePath, '<regex or URL>')
  AND totals.visits = 1
GROUP BY
  fullVisitorId,
  visitId

2. Join the first subquery to the second on session ID

Select session ID, hit-number from the first subquery, and hit-number from the second subquery. Using full outer joins means sessions can enter the funnel at any step; if you want to count a session at a stage only when it has also hit a previous stage, use a left join instead.


SELECT
  s0.fullVisitorId,
  s0.visitId,
  s0.firstHit,
  s1.firstHit
FROM (
  # Begin Subquery #1 aka s0
  SELECT
    fullVisitorId,
    visitId,
    MIN(hits.hitNumber) AS firstHit
  FROM
    TABLE_DATE_RANGE([<id>.ga_sessions_], TIMESTAMP('2015-11-01'),
      TIMESTAMP('2015-11-04'))
  WHERE
    REGEXP_MATCH(hits.page.pagePath, '<regex or URL>')
    AND totals.visits = 1
  GROUP BY
    fullVisitorId,
    visitId) s0
  # End Subquery #1 aka s0
FULL OUTER JOIN EACH (
  # Begin Subquery #2 aka s1
  SELECT
    fullVisitorId,
    visitId,
    MIN(hits.hitNumber) AS firstHit
  FROM
    TABLE_DATE_RANGE([<id>.ga_sessions_], TIMESTAMP('2015-11-01'),
      TIMESTAMP('2015-11-04'))
  WHERE
    REGEXP_MATCH(hits.page.pagePath, '<regex or URL>')
    AND totals.visits = 1
  GROUP BY
    fullVisitorId,
    visitId) s1
  # End Subquery #2 aka s1
ON
  s0.fullVisitorId = s1.fullVisitorId
  AND s0.visitId = s1.visitId

3. Join the third subquery to the result of the above join on session ID

Select session ID, hit-number from the first subquery, hit-number from the second subquery, and hit-number from the third subquery.

4. Join the fourth subquery to the result of the above join on session ID

Select session ID, hit-number from the first subquery, hit-number from the second subquery, hit-number from the third subquery, and hit-number from the fourth subquery.

5. Continue until all subqueries are joined in this way

6. Aggregate results

Instead of a row for each session, we want one row with counts of non-null hit-numbers per funnel-step. Take the query so far, and wrap it with this:

SELECT
  COUNT(s0.firstHit) AS _job_details_,
  COUNT(s1.firstHit) AS _job_apply_
FROM (
  (query from 2. goes here if the funnel has two steps))

The query has a recursive structure, which means that we could use a recursive program to generate the query mechanically. This is a major advantage, because for longer funnels, the query can grow quite large (500+ lines for a 13-step funnel). By automating the process, we can save lots of development time. We'll now go over how to use Funneler to generate the query.
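To give a feel for what such a generator does before we look at Funneler itself, here is a minimal, simplified sketch in Python. This is not the actual Funneler code: it only produces the session-count columns (not the exit counts), joins every step back to the first subquery, and uses made-up step aliases.

# Simplified funnel-query generator (illustrative sketch, not the real Funneler).
def step_subquery(table, start, end, regex):
    """One base-level subquery: first hit-number per session for one funnel step."""
    return (
        "SELECT fullVisitorId, visitId, MIN(hits.hitNumber) AS firstHit\n"
        f"FROM TABLE_DATE_RANGE([{table}], TIMESTAMP('{start}'), TIMESTAMP('{end}'))\n"
        f"WHERE REGEXP_MATCH(hits.page.pagePath, '{regex}') AND totals.visits = 1\n"
        "GROUP BY fullVisitorId, visitId"
    )

def funnel_query(table, start, end, regexes):
    """Join one subquery per step on session ID, then count non-null hits per step."""
    joined = f"(\n{step_subquery(table, start, end, regexes[0])}) s0"
    for i, regex in enumerate(regexes[1:], start=1):
        joined += (
            f"\nFULL OUTER JOIN EACH (\n{step_subquery(table, start, end, regex)}) s{i}\n"
            f"ON s0.fullVisitorId = s{i}.fullVisitorId AND s0.visitId = s{i}.visitId"
        )
    counts = ",\n  ".join(f"COUNT(s{i}.firstHit) AS step_{i}" for i in range(len(regexes)))
    return f"SELECT\n  {counts}\nFROM {joined}"

print(funnel_query("123456789.ga_sessions_", "2015-11-01", "2015-11-04",
                   ["/job/details/", "/job/apply/"]))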

Funneler

Funneler is an executable Python script (no need to have Python installed) which, when fed a json containing a list of regexes or URLs, generates the SQL query in the BigQuery dialect to build that funnel. It manipulates and combines strings of SQL code recursively. It extends the functionality of the query described in section 2 and it allows for segmenting and filtering of sessions based on any column in the BigQuery table.

Funneler and funneler.py can be found on my Github page (https://github.com/douug).

The input to Funneler is a json document with the following name/value pairs:

  • Table name, with the following format: [(Dataset ID).ga_sessions_]
  • Start date: 'YYYY-MM-DD'
  • End date: 'YYYY-MM-DD'
  • List of regexes: one regex per funnel-step
  • Segmode: True for segmenting, False otherwise
  • Segment: The column to segment on
  • Filtermode: True for filtering, False otherwise
  • Filtercol: The column to filter on
  • Filterval: The value to filter on in the above-mentioned column

Here is an example of an input json:


{
  "table": "[123456789.ga_sessions_]",
  "start": "'2015-11-01'",
  "end": "'2015-11-04'",
  "regex_list": ["'/job/details/'",
        "'/job/apply/'",
        "'/job/apply/upload-resume/'",
        "'/job/apply/basic-profile/'",
        "'/job/apply/full-profile/'",
        "'/job/apply/(assessment/external|thank-you)'"],
  "segmode": "True",
  "segment": "device.deviceCategory",
  "filtermode": "False",
  "filtercol" : "hits.customDimensions.index",
  "filterval" : "23"
}

Please note the quoted quotes (e.g. in the elements of the value of the key "regex_list" above). These are included because after the json is ingested into a Python dictionary, the Python strings may contain SQL strings, which themselves require quotes. But, the value of the key "filterval" has no inside quotes because 23 is of type int in SQL and wouldn't be quoted.
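As a quick illustration of the quoting (a hypothetical snippet, not part of Funneler itself), note how the inner single quotes survive the JSON parse and arrive in Python ready to be dropped into SQL as string literals:

import json

# The inner single quotes survive the JSON parse, so each regex is already a
# quoted SQL string literal; filterval stays bare because it is an int in SQL.
raw = '{"regex_list": ["\'/job/apply/\'"], "filterval": "23"}'
config = json.loads(raw)
print(config["regex_list"][0])  # '/job/apply/'
print(config["filterval"])      # 23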

To run Funneler, go to \dist_funneler\data. Open input.json and modify the contents, then go back to \dist_funneler and run funneler.exe. Three files should appear - std_error.log, std_out.log (which contains feedback about whether Segmode or Filtermode are engaged, and where the generated query can be found), and your query. Copy and paste your query into BigQuery. Try starting with a short funnel, as it may take a few tries to format the input correctly.

Alternatively, if you are running funneler.py, it can be executed from the command line with the following:

python funneler.py input.json

In this case, the contents of the above-mentioned std_error.log and std_out.log files will appear in-console. This query can then be copied into your BQ instance. The resulting table should have two columns per regex/funnel-step - one for hits, and one for exits - and one row. If segmode is set to True, then there will be a row per value in the segment column.

Hopefully these tools help you quickly create complex queries, meet your analysis objectives, and perform deeper analysis of GA page-hit data.


DoubleClick for Publishers & Google Analytics Premium


Recently the Google Analytics team released two new integrations: DoubleClick for Publishers (DFP) for Google Analytics Premium and DoubleClick Ad Exchange (AdX) for all Google Analytics users. While these new integrations were not widely publicized, I believe they are major game changers: they effectively embrace Publishers as first-class citizens, providing a robust solution to measure and optimize ad-supported websites.

Up to July 2015, Google Analytics provided only one integration for publishers, with AdSense (did you read my book?), where they could analyze AdSense effectiveness and find insights to optimize results. If you were serving only AdSense ads on your site, that integration worked well and you were all set.

However, DoubleClick for Publishers (DFP) is a widely used solution to serve Direct Deals, AdSense and DoubleClick Ad Exchange (AdX). That's where the DFP integration enters the scene. Before this integration, only AdSense metrics were available, but since July 2015, you can report on Ad Exchange (for all Google Analytics users) and DFP (Google Analytics Premium only). This means Publishers can now understand the intersection of their content and monetization strategies; it also means that a user who left the website through a click on a DFP or AdX unit, previously counted as a simple abandonment, now shows up as an ad click - a considerable improvement in both the accuracy and completeness of your Google Analytics reporting.

Below is a quick explanation of what those new integrations will bring to publishers when it comes to understanding and reporting new data.

Publisher Metrics

Publisher Metrics

Following the integration, you will have access to dozens of new metrics in Google Analytics, which can be seen on the interface or used in Custom Reports and the Segment Builder. The metrics mirror the ones you already see for AdSense, with equivalent new ones for Ad Exchange. Below is a list of the overall Publisher metrics and their official definitions.

  • Publisher Impressions: An ad impression is reported whenever an individual ad is displayed on your website (AdSense, AdX, DFP). For example, if a page with two ad units is viewed once, Google will display two impressions.
  • Publisher Coverage: Coverage is the percentage of ad requests that returned at least one ad. Generally, coverage can help you identify sites where your publisher account (AdSense, AdX, DFP) isn't able to provide targeted ads. (Ad Impressions / Total Ad Requests) * 100
  • Publisher Monetized Pageviews: Monetized Pageviews measures the total number of pageviews on your property that were shown with an ad from one of your linked publisher accounts (AdSense, AdX, DFP). Note - a single page can have multiple ad slots.
  • Publisher Impressions / Session: The ratio of linked publisher account (AdSense, AdX, DFP) ad impressions to Analytics sessions (Ad Impressions / Analytics Sessions).
  • Publisher Viewable Impressions %: The percentage of ad impressions that were viewable. An impression is considered a viewable impression when it has appeared within a user's browser and had the opportunity to be seen.
  • Publisher Click: The number of times ads from a linked publisher account (AdSense, AdX, DFP) were clicked on your site.
  • Publisher CTR: The percentage of pageviews that resulted in a click on a linked publisher account ad (AdSense, AdX, DFP).
  • Publisher Revenue: The total estimated revenue from all linked publisher account ads (AdSense, AdX, DFP).
  • Publisher Revenue / 1000 sessions: The total estimated revenue from all linked publisher accounts (AdSense, AdX, DFP) per 1000 Analytics sessions.
  • Publisher eCPM: The effective cost per thousand pageviews. It is your total estimated revenue from all linked publisher accounts (AdSense, AdX, DFP) per 1000 pageviews (see the worked example after this list).
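To make the last two definitions concrete, here is a quick worked example with made-up numbers: if your linked accounts generated $450 in estimated revenue across 300,000 pageviews and 60,000 sessions, Publisher eCPM would be ($450 / 300,000) * 1000 = $1.50, and Publisher Revenue / 1000 sessions would be ($450 / 60,000) * 1000 = $7.50.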

Those metrics are clearly a major improvement to the measurement capability of publishers, as they will now be able not only to see all those ad interactions from separate DFP networks in one centralized platform, but also to combine this information with other behavioral data that is already being collected by Google Analytics.

As mentioned, in addition to the metrics described above, you might have additional sets: AdSense, Ad Exchange, DFP, and DFP Backfill. Most publishers will not have all 4 sets; they will either have 1 (AdSense or AdX) or 2 (DFP and DFP Backfill [DFP Backfill includes AdSense and AdX served through that DFP network]).

In the next section I describe the difference between those groups and where each ad interaction would appear depending on the tags you are using. I will also go through the new default reports and reporting capabilities in general.

Publisher Reports

Publisher Reports

After you set up the integrations (and assuming that you have all of them, which is not likely), you will have access to quite a few default reports. You will still see similar options in the sidebar navigation, under the Publishers menu, which will read: Overview, Publisher Pages, Publisher Referrers. But for each of those three reports, you will now have 5 options in the report tabs (right above the chart):

  1. Total: sums up all the interactions below, i.e. for Publisher Impressions you would see all impressions including AdSense, DFP and AdX.
  2. AdSense: only AdSense served through the AdSense tag. Note that you do not need the AdSense tag if you are serving it through DFP, and if that is your case you will see AdSense data as DFP Backfill, not as AdSense.
  3. Ad Exchange: only AdX served through the Ad Exchange tag. Note that you do not need the AdX tag if you are serving it through DFP, and if that is your case you will see AdX data as DFP Backfill, not as AdX.
  4. DFP: only DFP metrics (excluding AdSense and AdX backfills) for directly sold campaigns and house ads served through the Google Publisher Tag.
  5. DFP Backfill: all the AdSense and AdX interactions (indirect / programmatic campaigns) when served through the Google Publisher Tag.

But to be honest, I am way more excited about the custom reports that we are now capable of building. For example, for Online Behavior, I wanted to check the total revenue I am getting from each of the authors contributing articles. So I used a Content Group I have been populating with the author name (which is publicly available in the article, hence not PII). Below is a screenshot of the custom report I built.

As you will see, the 5th line is Allison Hartsoe, who wrote two articles for Online Behavior. Even though she has significantly lower impressions and clicks, she produced a pretty high revenue in this period. My conclusion: I should promote her articles heavily and reach out to her for another post :-)

Publisher Analytics insights

The options are really endless here: you could use any Content Group (e.g. interest, content type, page template, etc.), Custom Dimension (paid user, loyalty status, etc.), or any default dimension to measure your full publisher interactions. One especially interesting section to look at is the Interests report, which will show you the interests of users clicking on ads. This will open up a deep understanding of your customer segments and how they perform when it comes to publisher revenue.

This example illustrates the power of centralizing all your interactions from AdSense, Ad Exchange and DFP in one place. This allows publishers not only to have a deeper understanding of their revenue, but also to act on it.

Linking AdSense, Ad Exchange and DoubleClick For Publishers

In summary, if you are a Google Analytics standard user, you can link your account to AdSense (tutorial) or Ad Exchange (tutorial), but you will have to consider Google Analytics Premium if you are thinking about integrating DoubleClick for Publishers (learn more). If you are a Google Analytics Premium client and did not link your DFP account, do it today! You should contact your Account Manager (or Authorized Reseller) and ask them about it.

As I wrote in the beginning, I think this is a huge step for the Analytics industry when it comes to Publishers, which now have a robust and comprehensive measurement solution.


Understanding Google Analytics Goals & Funnels


In previous discussions, I've addressed the need for an analytical mindset, which involves understanding things and the interrelationships between them. Developing an analytical mindset increases and deepens your appreciation of complex systems, including the behavior of users of your website or app.

To promote analytical thinking, I recently posed a question on social media, which I believe to be an excellent place to pose sensible questions:

  • "What features of Google Analytics (GA) are the least understood?"
  • "Which ones appear to make the least sense?"
  • "What features do you want to understand better?"

Some of the responses I got were great:

  1. How do goals and funnels work?
  2. What does direct traffic really mean?
  3. Why does my own website appear in the list of referrals?
  4. What private data are Google Analytics legally allowed to collect?

In this article, I'll focus on the first question: "How do goals and funnels work?" (stay tuned if you're interested in answers to the others). My answer will also demonstrate how approaching a question from an analytical perspective helps you develop a more complete understanding than otherwise. This analytical approach involves breaking apparently complex issues down into constituent components that are simpler and therefore more easily understood. Paradoxically, understanding something at a fundamental level allows you to build up to an understanding of the thing as a whole.

Dissecting Google Analytics Goals

Let's start by setting goals in Google Analytics. Ultimately, you want to know when and how frequently users do what you want them to do on your site or app. Therefore, setting goals on GA involves obtaining meaningful measurements of these key outcomes, and the configuration of your Google Analytics setup must properly reflect these goals for you to get the information you need. Some examples of key outcomes are the following:

  • Make a Purchase
  • Sign up for a newsletter
  • Play a video
  • Download a pdf

These are all actions users can take on your site that deliver value to your business, and all are easily tracked using goals. Because they enable you to quantify the ROI of your site or app, these are likely to be derived from your business Key Performance Indicators (KPIs). As you'll see below, measuring these is an essential part of your GA setup.

Knowing why we need goals clarifies the importance of goal configuration in GA and explains its role as one of the most critical customisation actions you take in setting up GA. However, to get the most out of GA, you also need to understand how goals are used and how they work.

Setting up Goals

Google Analytics offers 20 goals per View, organized in 4 groups of 5. Depending on a user's rights with respect to a view, the user can perform a number of goal-related functions by accessing Goals as shown below. Edit rights on the view allow the user to add new goals, change existing goals, and enable/disable goals in the View Admin section whereas Read & Analyse rights permit only reading of goals (learn more about user permissions).

Goals setting

GA begins tracking from the moment you set a goal but isn't retrospective, that is, it doesn't track the time prior to your setting the goal (unless... well, keep reading to find out more). Once it is set, you can test a goal using the handy utility Verify this Goal link in the goal setup (screenshot below).

Goal verification

Furthermore, goal data are subject to non-standard data processing, meaning that they are available only after 24 hours (estimated processing time) have elapsed, even in Google Analytics Premium.

Pre-requirements to understanding Goals

At this juncture, we need to discuss key topics that are fundamental to understanding goals:

  • Tracking: Regardless of goal type, goal metrics (completions, conversion rate, abandonment rate, and value) are derived from your existing GA tracking of pageviews and events; goals require no additional data capture, as they are only configurations.
  • Sessions: "a group of interactions that take place on your website within a given time frame" - there are other nuances to this definition, learn more.
  • Conversions: "the process of changing or causing something to change from one form to another." Huh?! (see Google definition below)

Conversion definition

Let's look at an example. Assume your goal involves signing up to receive a newsletter. The sign-up process begins when a visitor who is not already a newsletter subscriber arrives on your site. If the visitor decides to become a subscriber, he or she then finds, completes, and submits the sign-up form. Aren't we actually measuring the moment at which a user converts from not being a subscriber to being a subscriber?

In this case, the goal, which is to capture the moment at which this conversion occurs, defines the conditions within Google Analytics' data that represent a user having taken a meaningful action, that is, converting from nonsubscriber to subscriber.

Google Analytics Goal types and definitions

Goals are of four types:

  • Destination
  • Duration
  • Pages/Screens per Session
  • Event

You can set your own goal settings, use a preconfigured template, or download a setting from the Solutions Gallery.

Goal definitions vary by goal type, that is, by destination, duration, screens/pages per session, and event.

Destination

A conversion for a Destination goal is defined using a screen or pageview:

Page/Screen [equals] a value (e.g. /blog/dec/my-article)
Page/Screen [begins with] a value (e.g. /blog/dec)
Page/Screen [matches Regular Expression] a value (e.g. \/blog\/(nov|dec))

Duration

As the name implies, a conversion for a Duration goal is defined using the Google Analytics session duration:

Session Duration [greater than] x hours y minutes and z seconds

Pages/Screens per session

A conversion for a Pages/Screens per session goal is defined using the number of screens or pageviews in a Google Analytics session:

Pages/Screens per session [greater than] a value (e.g. 10 pageviews)

Event

A conversion for an Event goal is defined using a Google Analytics event. At least one of the following conditions is needed, but all four or any combination can be used:

Event Category [equals] a value (e.g. Video)
Event Action [begins with] a value (e.g. Play)
Event Label [matches Regular Expression] value (e.g. Homepage Video)
Event Value [greater than, equals, or less than] a value (e.g. 20)

As you can see, destination and event goals use either pageviews or events to define the conditions that constitute a conversion, while the other two goal types clearly use session metrics (duration and pages/screens). In fact, destination and event goals are actually session based as well. How so? What do you mean? you may be asking yourself... Excellent, the analytical and curious mind seeks clarity!

Events, Sessions & Users in Goal conversion rates

This subtle point requires crystal clear understanding on your part to grasp the tricky and, some would argue, flawed nuance of the conversion rate metric. First, let's assume I define a goal based on an event firing, playing a video for instance. Since a user can play a video multiple times during a session, each playing would represent an event and so a goal conversion, correct? Although that interpretation seems logical, it is not correct. To understand why, consider the nature of the conversion in this case:

  1. Before - The user hasn't seen the video
  2. After - The user has played and therefore seen the video

As you can see, the event occurs when the user converts from not having seen the video to having seen it, and so the goal measures the number of sessions that include at least one video play. The key to understanding events is to remember that goal conversions are recorded per session, and so the conversion can occur only once within a session. The specific metric used here is conversion rate, which is calculated as follows:

Goal Conversion Rate = Sessions with at least one Goal completion / Total sessions
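For example, with hypothetical numbers: if 1,200 out of 30,000 sessions include at least one video play, the goal conversion rate is 1,200 / 30,000 = 4%, even if some of those sessions played the video several times.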

Using session rather than user as the goal's container provokes intense disagreement. One side argues that, since the user and not the session has converted, conversions and conversion rates should be user based. The other side counters with an argument involving the personal area network (PAN) device issue. This side asks, can you consistently and accurately determine that the user is the same individual when multiple devices are included in the user's PAN? The answer is, not easily and not all the time, and therefore session is a more appropriate container for the goal than user.

Although Google Analytics typically uses sessions in goal tracking, it does provide a means to use goal completions by users to calculate conversion rate: the recently rolled out Calculated Metric functionality. This subject has been covered thoroughly and in sufficient detail elsewhere, so I won't belabour the point here.

Dissecting Google Analytics Funnels

Our understanding of goals within Google Analytics is now significantly deeper than it was at the beginning of this article. Let's take another step deeper into the rabbit hole and consider goals having multiple steps.

The funnel goal still uses a final destination page to determine when a conversion occurs. This goal type uses pages only; event sequences are not supported. The sequence of pages or screens is defined using the same constructs (equals, begins with, and regex) and, as shown in the screenshot below, each step is given a name to make the steps human readable and so aid in analysis and reporting.

Goal funnel

The screen above shows a standard ecommerce checkout funnel modelled with a goal funnel. Regular expression matches are employed, and the match method used for the destination page is also used in the goal funnel.

As one example, assume that this goal is a guest checkout in which users enter the minimal information required to complete the order but don't log in or create an account. Now consider the identical example except that the users do log in to an account prior to checking out. Although the checkout steps differ, the goal destination remains the same, meaning that GA will record the same number of conversions for the goal in both examples. Hence, even though the steps are essentially different, the goal conversion rate for both will also be the same, whilst the funnel conversion rate will differ (because the steps in the goal funnel differ).

Where two divergent funnels are used to represent different outcomes, ideally the destination pages will also differ so that two distinct funnel conversions and funnel conversion rates will be delivered, one for each funnel.

Another possible funnel configuration involves the starting point. Notice the example above stipulates that the first step in the funnel is Required. This is therefore known as a closed funnel, which allows users to enter the funnel only by first visiting a page matching the condition defined in step 1. An open funnel, where the Required flag is not set, allows users to enter the funnel at any stage of the process. You can use this setting to ensure that the tracking matches the functionality of your site. If users can enter the checkout, for example, at any stage, then the funnel needs to reflect this. If, on the other hand, users can only check out starting at the basket page, the funnel needs to be closed using the Required flag on the basket stage.

Importantly, for the same number of checkouts, open and closed funnels will show the same number of goal conversions and the same conversion rates, but their funnel conversions and funnel conversion rates will differ.

Goal Funnel vs. Goal Flow reports

Now you should have a firm grasp of the basic goal types as well as of the more complex goal funnel. When you begin asking questions about typical user behaviour and how to model it in a funnel, you add further intricacies and complexity to the funnel functionality. For instance, staying with the standard ecommerce checkout example, consider users who fill in their delivery details, proceed to the billing section, and then remember that the address they used is the wrong one. No biggie, they just navigate back one step, update, and carry on. But what effect does this detour have on the goal funnel?

The answer lies in the goal funnel report. There you'll see the user hitting the delivery goal step, progressing normally to the billing step, but then exiting back to the delivery goal step. This is called a loop-back and is more clearly visible in the goal flow report than in the goal funnel report.

What if you have an optional step in the checkout, for a coupon, for example? Although the goal will be configured to incorporate the coupon checkout step, users who skip the coupon step and go straight on to the order confirmation will be subject to backfilling in the goal funnel report. Therefore, you'll see an entrance to the step prior to the coupon stage, progression to the coupon stage, and then the hit on the order confirmation stage. The goal flow report shows skipped steps with more clarity, with a separate flow shown for users who skipped a step.

The power of the goal flow report is not widely appreciated, and these simple examples illustrate how the goal flow report offers clearer reporting for funnels. Read this help center article in the GA docs to learn more about the differences.

Google Analytics Premium Custom Funnels

As mentioned earlier, traditional goal funnels are session based and only employ pageviews. However, your checkout funnel may employ pageviews and events to more finely tune tracking of user behaviour during this essential part of a transaction. You may want to track converting journeys across sessions and decide yourself whether to use sessions or users in your conversion calculation. Whilst the standard goal funnel is a poor fit for these requirements, Google Analytics Premium users have the wonderful Custom Funnel functionality at their disposal.

The screenshot below shows the wealth of powerful options that can be used to customise funnels:

Custom funnel options

Moreover, powerful rules concerning pages and events can further define funnel stages:

Custom funnel rules

Indeed, pretty much every standard reporting dimension is available to allow fine tuning of funnel stages:

Custom funnel dimensions

Deep joy ensues when the funnel is set up and you see it applied to data retrospectively, thereby allowing further fine tuning of the funnel for historical analysis:

Custom funnels

Custom funnels are still flagged as beta functionality, and so, whilst massive power is currently available, expect more in the future.


Demystifying Google Analytics Direct Traffic


A common question you might ask when going through your Google Analytics data is "Where do my users come from?" Answering it will help you understand how users find your website or app and which sources of traffic are working well (or not). Using this information wisely, you can quantify the value of your paid marketing campaigns, optimise Organic traffic, find out how well your newsletter emails are performing, and more!

You probably know your main traffic sources, but do you know how many possible traffic sources there are? Literally, how many sources do you think your site could have? Apart from the campaigns you've set up in AdWords or emails, have you ever considered what other traffic sources are available? There are more traffic sources than you may think.

You might not know this, but there are hundreds, if not thousands, of potential traffic sources! This poses a major challenge for Google Analytics: making sense of where all your users came from and presenting accurate data in reports. Google Analytics always tries as hard as possible to give you accurate data about your traffic sources, but sometimes it's not so straightforward.

Enter the Direct source and (none) medium. You'd expect Direct traffic to represent your most loyal users who know how to find your app or site by name. Right? Not always. There's more to Direct than you might initially think.

This article will cover exactly what Direct means, how to understand it, when it is correct, when it is due to an error, and how to fix those errors.

How does Google Analytics calculate a traffic source?

When a user arrives on your site, how does GA decide whence they came? GA uses a fairly simple algorithm, which is described in the flow chart below. It is long, but keep on going!

Traffic Source processing flow chart

There are a few technical terms in this diagram so let's work through a series of use cases to better understand what they mean.

How GA identifies visits from Organic search?

Go to Google, search for ConversionWorks and click through to my homepage. How would GA know you came from a Google search? Thanks to the magic that enables the internet to run (HTTP), we can see in the Request Headers that the site you came from, the Referrer, was https://www.google.co.uk

HTTP Request

Okay, but looking at my GA data in the Acquisitions report (Source/Medium) I can see google / organic, google / cpc and a bunch of referrals from other Google properties. How does GA differentiate between a Google search visit, a click on an AdWords ad and the other referrals?

Being a Google product and knowing that Google knows a thing or two about search, GA knows how to recognise a visit from a search engine. You can see the list of known search engines and how they're identified using the Referrer value here.

How GA identifies visits from a paid ad click?

Okay, so GA can spot a referring search engine but what about paid traffic? When we talk about paid traffic we might mean AdWords or DoubleClick for example. If paid campaigns using these ad networks are setup using auto-tagging then GA will look for a query parameter on the URL when users land on the site. A user coming to ConversionWorks from a click on an AdWords ad will have a query string parameter named gclid appended on the URL like this:

http://www.conversionworks.co.uk/?gclid=gfjsgWSAR45jdxn32hjkh4324n

A DoubleClick ad click will have a query string parameter named dclid. GA can see these parameters (gclid and dclid) and can decide not only that this visitor came from a paid ad click but also which ad they clicked, which campaign, the cost of the click and so on. Pretty amazing and super valuable data!

If the user came from a Google property with no query string data and it wasn't a search property then the visit must be a referral.

How GA identifies visits from custom campaigns?

GA looks for some other standard query string parameters too - the utm parameters:

  • utm_source
  • utm_medium
  • utm_campaign
  • utm_content
  • utm_term

You can append these querystring parameters to links in emails, social shares, non-Google ad networks or links in pdf documents. You can use these very powerful parameters to control the data in your acquisition reports. You can use the official Google URL Builder to help build your links.
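For example, a correctly tagged newsletter link (with hypothetical values) might look like this:

http://www.example.com/?utm_source=newsletter&utm_medium=email&utm_campaign=spring_sale

GA would then attribute the session to newsletter / email under the spring_sale campaign.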

Be careful though! Using these parameters on internal links on your website (from one page on your site to another page on your site) will cause a new session to start artificially. If you change the campaign value, you'll start a new session which you probably do not want to happen. Best use these guys only when linking from an external source that you want to track explicitly.

What Direct means and when can it happen?

If GA can't determine that a user landed on your site from a recognised campaign, search or social source, there's no existing campaign data, it's a new session, and no referral data is available, then you'll find yourself at the very bottom of the traffic-source algorithm flow chart: it's a Direct visit.
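As a rough summary of the flow chart, here is a simplified sketch in Python. It is illustrative only: the real processing has more nuances (social sources, existing campaign data, session timeouts), and the list of search engines is just an example.

from urllib.parse import urlparse, parse_qs

# Illustrative subset of the known search engines GA recognises.
SEARCH_ENGINES = {"www.google.co.uk", "www.google.com", "www.bing.com", "search.yahoo.com"}

def classify(landing_url, referrer):
    """Very simplified traffic-source decision, loosely mirroring the flow chart."""
    params = parse_qs(urlparse(landing_url).query)
    if "gclid" in params or "dclid" in params:
        return "paid ad click (auto-tagging)"
    if "utm_source" in params:
        return f"campaign: {params['utm_source'][0]} / {params.get('utm_medium', ['(none)'])[0]}"
    host = urlparse(referrer).netloc if referrer else ""
    if host in SEARCH_ENGINES:
        return "organic search"
    if host:
        return "referral"
    return "direct / (none)"

print(classify("http://www.conversionworks.co.uk/?gclid=abc123", ""))
print(classify("http://www.conversionworks.co.uk/", "https://www.google.co.uk"))
print(classify("http://www.conversionworks.co.uk/", ""))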

But is it really Direct?

Just because GA decided a user was Direct doesn't necessarily mean this is exactly what happened. The analytical mind will always question a data source. Is this data a true reflection of reality? How can I trust this data? How can I prove this is correct?

What grounds do we have to doubt the accuracy of the data? How do we reasonably question the data and go about calibrating it?

Questioning your data

Let's assume you have some expectations regarding the volume of traffic you'll get from 'owned' campaigns such as Email, AdWords, DoubleClick or Social. In addition to expected volume, you probably also know where users are going to land on the site. Now you need to check your data. Ask:

  • Are users landing on the pages you expect?
  • Are the landing pages being hit by the right campaigns?
  • Does the session volume look right?

Acquisition report gross error checking

This kind of "gross error checking" exercise is seldom performed, but it is essential if you're going to trust your data.

Compare your click data from AdWords with GA data. Are you seeing the right number of sessions compared to clicks? The numbers may not match exactly but anywhere within 5% is about right.

Check other source/medium combinations for your landing pages in GA. Seeing anything untoward or untrustworthy? If your campaign traffic volume is south of your expected value, this is where you might see more Direct traffic than anticipated.

Is this a good or a bad thing? How to spot and fix issues?

Users can always type a URL into their address bar or click a bookmark, of course, and this is genuine Direct traffic. However, there is also a chance Google Analytics may not be able to correctly attribute the user's session to a traffic source, in which case the session is flagged as Direct. Here are a few scenarios:

  • Clicking from a secure site that uses https to an insecure site that uses http
  • Clicks from apps
  • Untagged or incorrectly tagged links (most common)
  • Measurement protocol hit

If any of the scenarios listed above happens, it will cause GA to flag the session as Direct, which is potentially not right.

HTTPS to HTTP

This is the way the internet works. If you're on a secure site that uses https, part of the security is that when you click through to an insecure site using http, the insecure site is prevented from seeing where you came from - no referrer data is available to GA on the insecure site.

Secure sites like Google and Facebook are quite clever in that they do expose referrer information when you click through from their secure pages to insecure pages on other sites. We don't need to go into how they do this in this post but the simplest solution is to run your site on https. This is good for your users. Give them peace of mind knowing their browsing experience is secure and you'll have no worries about losing referral data. That's an easy trade. Talk to your engineers and get it done already!

Clicks from apps

If users click on links to your site from within an app, GA can't see which site they came from because they didn't come from a site! They came from an app, not a site in a browser. The app won't necessarily send referral information, which confounds GA, and you end up with incorrectly attributed Direct traffic.

It's quite possible the clicks from apps are valuable. If you treat clicks from apps as a monetisable channel then you need to track these clicks properly.

Use utm tagging (also known as manual tagging) to decorate the links in the app with campaign data. If you've never done this, take a look at this handy resource provided by Google.

Untagged or incorrectly tagged links

This is a very similar scenario to the last one. Maybe you don't have links in apps but if you have links in emails or maybe even pdf documents, Word documents or Excel spreadsheets, these are not browsers and might not send referral data for GA to latch on to. You need to use manual tagging again.

What if you are using manual tagging but had a little finger trouble? You did test the links right? They went through to the right page but did you check the GA data for the right source, medium and campaign values?

If you click this link, you'll end up on our homepage:

http://conversionworks.co.uk?utm_sorce=onlineBTest&utm_medium=onlineBTest&utm_campaign=onlineBTest

Looks okay? Can you spot the issue? utm_sorce is not a correct utm parameter. Make sure to double check every time you create a campaign link or (better) use an automatic solution. A great way to verify a link is to click it once and then watch the Real-Time reports in GA to confirm the session is attributed correctly.

Measurement protocol hits

Have you heard of the Internet of things? Internet connected devices that can talk to other things on the internet: fridges, fitness trackers, cows... yes, even farm animals. None of these things are browsers but they can all potentially send data to Google Analytics using the Measurement Protocol. The Measurement Protocol is what makes Universal Analytics truly universal. It's a technique provided by Google Analytics for non-web browser technology to be measured using GA.
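To make this concrete, here is a minimal sketch of what a Measurement Protocol pageview hit might look like when sent from a Python script. The property ID, client ID and page path below are placeholders; the cs/cm campaign parameters are what keep such a hit from being flagged as Direct, as discussed next.

from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder values - replace with your own property ID and a stable client ID.
payload = {
    "v": "1",                  # protocol version
    "tid": "UA-XXXXX-Y",       # GA property ID (placeholder)
    "cid": "5555",             # anonymous client ID
    "t": "pageview",           # hit type
    "dp": "/fridge/door-open", # document path (hypothetical)
    "cs": "fridge",            # campaign source
    "cm": "iot",               # campaign medium
}
urlopen("https://www.google-analytics.com/collect",
        data=urlencode(payload).encode("utf-8"))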

GA data sent via the Measurement Protocol might be flagged as Direct if it is not decorated with campaign information. You can quite simply check whether these hits are from things rather than users. Knowing that things are things and not browsers means we can use common dimensions in GA to spot real browsers. Real browsers will automatically expose the screen resolution, the computer operating system and the flash version being used, amongst others. These appear as dimensions in GA reports.

So, for example, a property that was only populated with Measurement Protocol hits might show (not set) for all Measurement Protocol hits on the Operating System dimension. Similarly, you would see no Flash Version, no screen resolution or Screen Colours. These dimensions are all available to see in the Audience -> Technology -> Browsers & OS report.

See how adding a secondary dimension of Source / Medium helps us narrow down the data to check exactly what's going on? This is a useful technique to learn and use.

Conclusion

This essay has shown how GA decides where a user came from. You've seen how this can work and you've seen how this can fail. Knowing these details, plan a review of your traffic source data. Do some gross error checking. Do some calibration. Check your data and build confidence in the numbers.

If you find any holes, you're better armed with explanations and fixes. You may find more value in certain channels and optimisation opportunities in others.

You're on your way to using data more wisely. Good!
