People Identification

26 Mar 2017 » Analytics Tips

One of the main challenges of digital marketing, if not the most important, is identifying real people. This is the key to marketing objectives like “360-degree view of the customers” or “single view of consumers”. The problem is that web analytics tools track only visitors, so we need to find a way to perform this people identification. Let me explain what options we have.

Understanding the visitors metric

First, let’s understand what a visitor is. The traditional way of identifying a visitor is with a cookie. In fact, this is one of the most common purposes of a cookie. The web analytics system generates a random cookie value that uniquely identifies a browser. This random string is generated for every new browser that does not yet have this unique identifier cookie. The cookie is then sent with every analytics request, so that the web analytics servers can process it and identify individual visitors. In other words, the visitors metric in fact equates to unique cookies.
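As a minimal sketch of this mechanism (not the code of any real analytics tool), the snippet below assigns a random identifier to a browser that does not have one yet and then counts visitors as distinct cookie values. The cookie name `visitor_id` and the helper names are hypothetical, and each cookie jar is modelled as a plain dictionary.

```python
import uuid


def ensure_visitor_id(cookie_jar: dict) -> str:
    """Return the visitor ID stored in the cookie jar, creating one if missing."""
    if "visitor_id" not in cookie_jar:
        # New browser (or cleared cookies): generate a random identifier.
        cookie_jar["visitor_id"] = uuid.uuid4().hex
    return cookie_jar["visitor_id"]


def count_visitors(hits: list) -> int:
    """The visitors metric is the number of distinct cookie values seen."""
    return len({hit["visitor_id"] for hit in hits})


# One person using two browsers: each browser has its own cookie jar.
chrome_jar, firefox_jar = {}, {}
hits = [
    {"visitor_id": ensure_visitor_id(chrome_jar), "page": "/home"},
    {"visitor_id": ensure_visitor_id(chrome_jar), "page": "/products"},
    {"visitor_id": ensure_visitor_id(firefox_jar), "page": "/home"},
]
visitor_total = count_visitors(hits)  # counts browsers, not people
```

In this toy example a single person is reported as two visitors, because the count follows cookie jars rather than people.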

In the particular case of the Adobe Marketing Cloud, visitor identification relies on the AMCV cookie. Within it, there is a parameter named MID, which is the value that uniquely identifies the browser.

AMCV cookie MID in Analytics

This approach was useful to identify people when we only had one desktop computer, with only one browser. However, over time, this metric has become less accurate for various reasons:

Multiple devices. Most of us now have multiple devices. Each one will get its own cookie.
Multiple browsers. On my work laptop, I have 5 browsers. Each browser has its own cookie jar within the same operating system.
Incognito/private mode. In this mode, the browser generates a new, temporary cookie jar.
Cookie deletion. It is now common to delete the cookies after a certain period of time.

As a metric, unique visitors is still useful, as it gives an idea of volume, but it is no longer useful for understanding real customers. We need new tools to be able to identify individual people.

BI Tools

This was the first solution, and it has been in use for some time now. It relies on re-processing the raw web analytics data. If there is a field in the clickstream that identifies the person, like a CRM ID, BI tools can link each of those IDs to the various visitor identifiers seen with it. These tools can even process offline data, as long as there is a common key.
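The linking step can be sketched as a simple batch job over the raw data. This is only an illustration: the row layout, the identifiers and the function name `link_visitors_to_customers` are assumptions, not the format of any particular BI tool.

```python
from collections import defaultdict

# Hypothetical clickstream rows: (visitor_id, crm_id), where crm_id is None
# for hits made before the user was identified.
clickstream = [
    ("cookie-A", None),
    ("cookie-A", "crm-1001"),
    ("cookie-B", "crm-1001"),
    ("cookie-C", "crm-2002"),
    ("cookie-D", None),
]


def link_visitors_to_customers(rows):
    """Group visitor identifiers under the CRM ID they were seen with."""
    customers = defaultdict(set)
    for visitor_id, crm_id in rows:
        if crm_id is not None:
            customers[crm_id].add(visitor_id)
    return dict(customers)


profiles = link_visitors_to_customers(clickstream)
# crm-1001 is linked to two cookies; cookie-D never exposes a CRM ID,
# so it cannot be attributed to anyone.
```

Offline data could be joined in the same way, provided each offline record carries the same CRM ID.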

For example, this is what Data Workbench does to provide a 360-degree view of a customer.

With these tools, we have managed to solve people identification, but the main problem is that the processing happens offline, in batch, and the result can generally only be used for reporting.

Fingerprinting

I am sure you must have heard this word in the last few years. The basic idea is to find “something” that uniquely identifies a device. This means that, even if you delete cookies, use incognito mode or different browsers, it will still be possible to uniquely identify the device. There are many researchers working in this area and many techniques have been created.

My knowledge in this area is very thin, so I will not explain them in detail. Suffice it to say that modern algorithms have a very high match rate, over 90%. This means that, in more than 90% of cases, these techniques have been successful at uniquely identifying a device. In general, these techniques execute a very specific JavaScript algorithm and server-side processes analyse its output. Apparently, these algorithms are able to uncover traits that are based on the hardware, not the browser or the operating system.
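To give a rough idea of the server-side half only, the sketch below hashes a set of collected traits into a single identifier. It is a deliberately naive illustration: the trait names are hypothetical, and real fingerprinting techniques are far more elaborate than hashing a handful of values.

```python
import hashlib
import json


def fingerprint(traits: dict) -> str:
    """Derive a stable identifier from a dictionary of device traits."""
    # Serialise with sorted keys so the same traits always give the same hash.
    canonical = json.dumps(traits, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical traits reported by the client-side script.
visit_in_chrome = {"screen": "2560x1440", "cpu_cores": 8, "gpu": "ExampleGPU 500"}
visit_in_firefox = {"gpu": "ExampleGPU 500", "cpu_cores": 8, "screen": "2560x1440"}
other_device = {"screen": "1920x1080", "cpu_cores": 4, "gpu": "OtherGPU 200"}

same_device = fingerprint(visit_in_chrome) == fingerprint(visit_in_firefox)
different_device = fingerprint(visit_in_chrome) != fingerprint(other_device)
```

Because no cookie is involved, the identifier stays the same across browsers and after cookie deletion, as long as the underlying traits do not change.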

The main limitation of this technique is that it cannot “follow” people across devices. In other words, we have device identification, but not people identification. Besides, these techniques are frowned upon and are not welcomed by the general public.

If you want more information about it, I suggest you perform a search on “techniques to browser fingerprinting”.

The other approach goes by several names: visitor stitching, device matching or identity resolution. It still relies on cookies but, with a large enough cookie pool and some additional information, a server-side algorithm can link various cookies into a single profile. Adobe has branded its own solution as device co-op, and this is the basis of the People core service.

Deterministic algorithms

As the name implies, deterministic algorithms rely on hard facts. In general, this means user identification: logging in to the same website with the same credentials, but using different devices. Each device will have its own unique identification (i.e. multiple visitors), but the user identification will be the same on all devices after logging in. By exposing and capturing this user identification, server-side algorithms can “stitch” various cookies into a single person, thus linking visitors to people. This is basically what BI tools do. So, what do deterministic algorithms offer beyond BI tools?

If your cookie pool is large enough, you can go one step further. Let’s consider the following scenario of a user with three devices:

Laptop, used to check bank accounts.
Mobile phone, used both for bank account and supermarket purchases.
Tablet, used only for supermarket purchases.

If you consider only the bank or only the supermarket, you can only stitch two devices at a time. However, by combining all the information we have, we can deduce that all three devices belong to the same person, thanks to the mobile phone used for both the bank and the supermarket. This is, generally speaking, how Adobe device co-op works.
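The three-device scenario can be sketched with a union-find structure: every login links a device to the other devices seen with the same user on the same site, and the links are then merged across sites. This is only an illustration of the idea under assumed data (the login tuples and the function name `stitch_devices` are hypothetical); it is not Adobe's actual implementation.

```python
from collections import defaultdict


def stitch_devices(logins):
    """Group devices that are linked by shared (site, user_id) logins."""
    parent = {}

    def find(device):
        parent.setdefault(device, device)
        while parent[device] != device:
            parent[device] = parent[parent[device]]  # path halving
            device = parent[device]
        return device

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_device = {}  # first device seen for each (site, user_id)
    for site, user_id, device in logins:
        find(device)  # register the device even if it is never merged
        key = (site, user_id)
        if key in first_device:
            union(first_device[key], device)
        else:
            first_device[key] = device

    groups = defaultdict(set)
    for device in list(parent):
        groups[find(device)].add(device)
    return list(groups.values())


logins = [
    ("bank", "user-123", "laptop"),
    ("bank", "user-123", "phone"),
    ("supermarket", "shopper-9", "phone"),
    ("supermarket", "shopper-9", "tablet"),
    ("bank", "user-456", "desktop"),  # a different person
]
people = stitch_devices(logins)
```

Neither site on its own can connect the laptop to the tablet; only the pooled data can, because the phone appears under both logins.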

Adobe’s algorithms rely on using the visitor.setCustomerIDs() call. This applies to device co-op, profile merge rules and customer attributes. I therefore recommend you implement it as soon as you can, if you have not done so yet. Even if you are not using any of the features now, I am quite sure you will in the future.

Probabilistic algorithms

It is not always possible to get a unique user identifier. In this case, more complex algorithms can be used to try to infer who is behind the device. I am not going to explain these algorithms, as I am not an expert on them. However, it is easy to understand some traits these algorithms take into account:

IP address
Time of the day
Visited pages
Unique identifiers

The output of these algorithms is a group of cookies that belong to the same user, with a degree of confidence. It is worth noting that Adobe’s device co-op also has a probabilistic part.
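As a toy illustration of what “a degree of confidence” could look like, the sketch below scores pairs of cookies by how much their traits overlap. Everything in it is an assumption made for the example: the trait names, the weights, the threshold and the use of Jaccard similarity. Real probabilistic algorithms are considerably more sophisticated.

```python
from itertools import combinations

# Hypothetical weights: how much each trait contributes to the score.
WEIGHTS = {"ip": 0.5, "hour": 0.2, "pages": 0.3}


def jaccard(a: set, b: set) -> float:
    """Share of elements that two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_confidence(x: dict, y: dict) -> float:
    """Weighted trait overlap between two cookie profiles."""
    return sum(WEIGHTS[trait] * jaccard(x[trait], y[trait]) for trait in WEIGHTS)


def likely_same_user(profiles: dict, threshold: float = 0.6) -> list:
    """Return (cookie, cookie, score) for pairs scoring at or above threshold."""
    pairs = []
    for (id_a, a), (id_b, b) in combinations(profiles.items(), 2):
        score = match_confidence(a, b)
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 2)))
    return pairs


profiles = {
    "cookie-A": {"ip": {"203.0.113.7"}, "hour": {20, 21},
                 "pages": {"/home", "/offers", "/account"}},
    "cookie-B": {"ip": {"203.0.113.7"}, "hour": {21, 22},
                 "pages": {"/home", "/offers"}},
    "cookie-C": {"ip": {"198.51.100.4"}, "hour": {9},
                 "pages": {"/careers"}},
}
matches = likely_same_user(profiles)
```

Here only cookie-A and cookie-B clear the threshold, since they share an IP address, an overlapping time of day and most visited pages; cookie-C shares nothing with either.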

Ethics and legality

I did not want to finish this post without a call-out to the ethics and legality of these techniques. The fact that something can be done does not necessarily mean that it should be done. There are certain boundaries which, I believe, should never be crossed. An aggressive marketer will want to know the smell of your perfume, but is this ethical or, even, legal?

For starters, unethical marketers run a very high risk of being spotted. Privacy advocacy groups are always scrutinising marketing techniques that go beyond what is reasonable. You would not want these groups to single out your company name, as it can end up in the press.

Finally, some countries clearly specify what you can and cannot do with users’ data. This legislation is always evolving and, depending on the country, in opposite directions. If you are reading me from the USA, you are probably going to be tracked by your broadband provider in the near future. On the other hand, Germany has some of the most restrictive legislation, protecting consumers from large corporations.

My recommendation is to stay clear of any suspicion.
