Fraud Detection: it's not just about data, it's about relationships

By some estimates, insurance fraud siphons $80 billion out of consumers’ pockets annually. That should be evidence enough that insurance fraud is not a victimless crime. Insurance companies are not going to swallow the loss; they are going to pass it along to you and me in the form of higher premiums.

Companies are increasingly looking at Big Data and data mining in efforts to fight fraud and gain a competitive advantage. Blogs and media discussions abound on the potential benefits of Big Data; it’s sexy, exciting, and topical.

But looking at data is not enough. Fraud detection is not strictly about the data; it is about the relationships between the data. This is the less glamorous, hard-slogging, methodical work of connecting the dots: identifying patterns and connections in the data and interpreting what those connections mean.

Let’s look at an illustrative example. Suppose a workers’ compensation insurer is concerned that a medical provider is involved in insurance fraud. A suspicion is not enough; facts are needed.

So, the insurance company looks at the claims where this provider is involved in treatment of the injured worker and finds that in 100 such claims, this provider was the treating provider. What does this mean? In and of itself, this number is meaningless.

A better question might be: what percentage of all claims is this provider involved in? If you found that number was 20%, would that indicate fraud? Not necessarily, but maybe.
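As a rough illustration of turning a raw count into a share, here is a minimal Python sketch. The claim records, the provider names, and the `provider_share` helper are all made up for the example; they are not drawn from any real claims system.

```python
from collections import Counter

# Hypothetical claim records: (claim_id, treating_provider).
# 100 claims for "Provider A" out of 500 in total, to mirror the 20% example.
claims = [(i, "Provider A") for i in range(100)]
claims += [(i, "Provider B") for i in range(100, 500)]

def provider_share(claims):
    """Return each provider's fraction of all claims."""
    counts = Counter(provider for _, provider in claims)
    total = sum(counts.values())
    return {provider: n / total for provider, n in counts.items()}

shares = provider_share(claims)
```

The share alone still says nothing about fraud; it only gives the count a baseline to be compared against.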

Through additional investigation, what if you then discovered that most of these claims resulted from one large employer, the main employer in a small town, and the provider was one of only a handful in the town? Fraud is now looking less likely. As almost the only game in town, the provider is naturally likely to see a large proportion of claims.

However, what if your investigation revealed that the claimants, rather than all living close to each other and to the provider in a small town, were scattered across a large metropolitan area and travelled significant distances to attend appointments with this provider? Not looking so good now. And what if your investigation also revealed that all of the claimants had retained the same attorney? Looking worse. And what if all the injured workers claimed a soft tissue back injury, an injury that is notoriously difficult to confirm objectively? There could definitely be a problem here.
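The warning signs in this example (long travel distances, one shared attorney, one recurring injury type) can be sketched as simple per-provider checks. This is only an illustration: the `Claim` fields, the `provider_flags` function, the sample names, and the thresholds of 25 miles and 80% are assumptions chosen for the example, not industry standards.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median

@dataclass
class Claim:
    claim_id: str
    provider: str
    attorney: str
    injury: str
    miles_to_provider: float

def provider_flags(claims, provider, max_miles=25.0, max_share=0.8):
    """List the warning signs that apply to one provider's claims."""
    mine = [c for c in claims if c.provider == provider]
    if not mine:
        return []
    flags = []
    # Claimants typically travel a long way to see this provider.
    if median([c.miles_to_provider for c in mine]) > max_miles:
        flags.append("claimants travel far")
    # One attorney represents most of this provider's claimants.
    top_attorney, n = Counter(c.attorney for c in mine).most_common(1)[0]
    if n / len(mine) > max_share:
        flags.append(f"attorney concentration: {top_attorney}")
    # Most claims share the same hard-to-verify injury.
    soft = sum(c.injury == "soft tissue back" for c in mine)
    if soft / len(mine) > max_share:
        flags.append("mostly soft tissue back injuries")
    return flags

# Made-up sample data: one suspicious cluster, one ordinary provider.
suspicious = [Claim(f"S{i}", "Clinic X", "Attorney Y", "soft tissue back", 40.0)
              for i in range(10)]
ordinary = [Claim(f"O{i}", "Clinic Z", f"Attorney {i}", "fracture", 3.0)
            for i in range(10)]
all_claims = suspicious + ordinary

flags_x = provider_flags(all_claims, "Clinic X")
flags_z = provider_flags(all_claims, "Clinic Z")
```

No single flag proves anything. As in the small-town scenario, each one may have an innocent explanation, and it is the combination that warrants a closer look.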

This simple example illustrates that a collection of data, in and of itself, is meaningless; it is the relationships, the connections between the bits of data, that reveal the complete picture.

Years ago, in California (arguably the epicenter of insurance fraud), there was a multiple-fatality rear-end accident on the freeway. A large, heavily laden tractor-trailer rear-ended the vehicle in front at high speed, crushing it and killing the rear seat passengers. On the face of it, this was a simple yet tragic rear-end accident, common on freeways, where speeds are high and traffic is dense.

When interviewed, the truck driver said that a vehicle ahead in the fast lane had made a sudden lane change in front of the vehicle he hit and then stopped suddenly, causing the driver in front of him to slam on the brakes, resulting in the truck rear-ending it.

Again, a common enough occurrence; frequent sudden stops on packed freeways are, as we have all experienced, a fact of freeway driving.

However, a sharp-eyed claims adjuster with a good memory began to see similarities between this case and some others he had handled: rear-end accidents on the freeway where the at-fault driver said there had been a sudden lane change and a stop. The adjuster began to dig further. He found those files, and many others, and noted that in all cases the same attorney represented the “innocent” victims. Additionally, shortly after the notice of representation was received, bills from the same medical provider began to arrive on the file for all the passengers in the vehicle. All of the diagnoses were soft tissue back injuries (except for the one fatality). All claimants began attending the same physical therapy and chiropractic clinics, even though they lived throughout the metro area, many of them miles from the clinics.
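The kind of link the adjuster made from memory, the same attorney and the same medical provider turning up together across separate claims, can be sketched as a simple grouping. The claim IDs, names, `recurring_pairs` helper, and minimum of three claims are all hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict

def recurring_pairs(claims, min_claims=3):
    """Group claim IDs by (attorney, provider) and keep pairs seen repeatedly."""
    by_pair = defaultdict(list)
    for claim_id, attorney, provider in claims:
        by_pair[(attorney, provider)].append(claim_id)
    return {pair: ids for pair, ids in by_pair.items() if len(ids) >= min_claims}

# Made-up records: (claim_id, attorney, medical_provider)
claims = [
    ("C1", "Attorney A", "Clinic X"),
    ("C2", "Attorney A", "Clinic X"),
    ("C3", "Attorney A", "Clinic X"),
    ("C4", "Attorney B", "Clinic Y"),
    ("C5", "Attorney C", "Clinic X"),
]

recurring = recurring_pairs(claims)
```

A recurring pair is only a starting point for investigation; a busy attorney and a busy clinic in the same area may overlap for entirely legitimate reasons.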

Ultimately, the connections made by the adjuster resulted in murder charges against the attorney and various insurance fraud charges being laid against 26 other accomplices.

Even though this was well before the days of Big Data, it was not the data that identified the fraud; it was the relationships between the data that the adjuster identified.

It’s important to remember this in any discussion regarding fraud identification. Collecting the data is really the easy part. What to do with that data, how to analyze it, and how to create flags or reports designed to highlight certain claims or parties: that is the difficult part. It’s not about the data, it’s about the relationships.