The following is excerpted from It Is More Than Metadata – The Meaning Behind The Mining by Big Men on Content.
The power of metadata is two-fold. First, it allows the information it describes to be classified. That classification makes it possible to filter out information that is not interesting to the analyst. Data in these volumes is useless without a way to group it into logical divisions and then essentially ignore everything you are not interested in. Second, metadata makes it possible to identify relationships between pieces of information that you would otherwise not see as connected.
Crime shows have been educating the public on this for years. The detectives start with a list of suspects who share the same modus operandi (classification). They then cross-reference that list with the location data from the suspects' phones at the time of the crime (phone call metadata). What began as a list of every possible suspect is narrowed to a manageable few who can be investigated by other means. Other metadata, such as who else the suspects have texted or called, can establish relationships, and associating those contacts with location data can identify possible accomplices. All of this is done from the comfort of your data center.
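The detectives' workflow is essentially a filter followed by a cross-reference, and it can be sketched in a few lines. This is a minimal illustration, not how any agency's tools actually work: the record types (`Ping`, `Contact`), the cell identifiers, the date, and the function names are all hypothetical, and matching a phone to the scene by a single cell tower is a deliberate simplification.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Ping:
    """Hypothetical location record: a phone seen at a cell tower at a given time."""
    phone: str
    cell: str
    time: datetime


@dataclass(frozen=True)
class Contact:
    """Hypothetical call/text record: who contacted whom, with no content."""
    caller: str
    callee: str


def near_scene(pings, cell, start, end):
    """Return the phones that pinged the crime-scene cell during [start, end]."""
    return {p.phone for p in pings if p.cell == cell and start <= p.time <= end}


def narrow_suspects(suspects, pings, cell, start, end):
    """Cross-reference a classified suspect list with location metadata."""
    return suspects & near_scene(pings, cell, start, end)


def possible_accomplices(narrowed, contacts, pings, cell, start, end):
    """People a narrowed suspect called or texted who were also near the scene."""
    linked = set()
    for c in contacts:
        if c.caller in narrowed:
            linked.add(c.callee)
        if c.callee in narrowed:
            linked.add(c.caller)
    return (linked - narrowed) & near_scene(pings, cell, start, end)


if __name__ == "__main__":
    def at(h, m):
        # Hypothetical date of the crime.
        return datetime(2024, 1, 1, h, m)

    suspects = {"A", "B", "C"}  # step 1: same modus operandi (classification)
    pings = [
        Ping("A", "X", at(22, 10)),
        Ping("B", "Y", at(22, 15)),
        Ping("C", "X", at(9, 0)),
        Ping("D", "X", at(22, 20)),
        Ping("E", "Z", at(22, 5)),
    ]
    contacts = [Contact("A", "D"), Contact("A", "E"), Contact("B", "D")]
    start, end = at(22, 0), at(23, 0)

    narrowed = narrow_suspects(suspects, pings, "X", start, end)
    print(sorted(narrowed))  # → ['A']
    print(sorted(possible_accomplices(narrowed, contacts, pings, "X", start, end)))  # → ['D']
```

Only who, where, and when are used here; nothing in the sketch knows why A contacted D, which is exactly the gap the rest of this piece is concerned with.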
Notice that in this example I use a past event – the crime – as the context for the relationships. What happens, though, when you try to use the same data predictively? You can still build the web of relationships and touch points, but without more information about the context, the reason those relationships matter must be inferred. I text a man every seven days who happens to be a drug dealer. I could be scheduling a drop, or he could just be running a lawn service on the side and I need help with a different kind of grass.
The experts will tell you that analyzing the other data in our respective networks will increase or decrease the likelihood that I am involved in illegal behavior. For that to work, though, you must have everybody's data, not just the data of those who are of immediate interest. The point is that while the predictive accuracy of these tools increases with more data, the result remains an inference. A guess. An accusation with no more evidence behind it than a passing business relationship.
How do you solve for that? The natural next step is more data: data that will provide the context needed to establish a threat. That means the content of the calls and messages themselves, and it will be what is captured next. The technology for capturing and automatically analyzing that content is advancing as well, and this is critical. As impressive as the numbers of records being collected are, they are trivial compared with the number of ones and zeros that make up the calls themselves.