Securing the Digital World

Demystifying the Black Box of Machine Learning

Aug 13, 2017 | by Heidi Bleau

Nowadays, it is common to use machine learning to detect online fraud. In fact, machine learning is everywhere. Due to its independent nature and human-like intelligence qualities, machine learning does, at times, seem like an inexplicable “black box.” But truth be told, machine learning doesn’t have to be like that.

Here is what you should know if you decide to give “computers the ability to learn without being explicitly programmed.”

Before choosing fraud detection technologies that leverage machine learning, weigh the advantages and disadvantages of the different algorithms against your need for transparency, prediction accuracy and the ability to adjust to a rapidly changing landscape (be it fraud or something else). The specific algorithm you use, combined with your domain knowledge, can make all the difference to your success. Here are a few algorithms and factors you should consider:

Artificial Neural Networks (ANNs), along with their more advanced counterpart, Deep Neural Nets, are considered “universal approximators” because they can be fitted to almost any scenario and field. But are they the best choice in every field?

While Deep Neural Nets outperform other algorithms in image or speech recognition (in other words, when working with huge data sets), there are many examples where an ANN produces results inferior to other classification techniques, especially when the training sample is small. An ANN requires a large set of training data and is prone to over-fitting. ANNs are sometimes referred to as “the second best way to solve any problem,” while the best way is to understand the different parameters of the problem you are trying to solve and then implement a model that closely resembles reality.

Another statistical method, the naive Bayes classifier, is a probabilistic supervised classifier with a well-understood mathematical foundation and a strong record of efficiency and reliability. The naive Bayes algorithm, which is leveraged in risk-based authentication technologies, affords fast, highly scalable model building and scoring. Bayesian classifiers usually learn new fraud patterns faster on smaller datasets (e.g., when less fraud/genuine feedback is available). They readily accommodate new predictors, which is crucial in the ever-changing fraud reality, and their simplicity makes them less likely to fit their training data too closely.
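To make the idea concrete, here is a minimal sketch of a categorical naive Bayes classifier in plain Python. It is an illustration only, not the implementation used in any risk-based authentication product; the class name `NaiveBayes`, the predictor names (`new_device`, `country_change`) and the six training events are all invented for this example. It shows why model building is fast (training is just counting) and why a predictor the model has never seen can be skipped without breaking scoring.

```python
import math
from collections import defaultdict


class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        # counts[label][feature][value] -> how often that value was seen
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        self.values = defaultdict(set)  # feature -> distinct values seen

    def update(self, features, label):
        """Learn from one labeled event; training is only counting."""
        self.class_counts[label] += 1
        for name, value in features.items():
            self.counts[label][name][value] += 1
            self.values[name].add(value)

    def _log_likelihood(self, features, label):
        total = self.class_counts[label]
        score = 0.0
        for name, value in features.items():
            if value not in self.values.get(name, ()):
                continue  # predictor or value never seen in training: skip it
            seen = self.counts[label][name][value]
            k = len(self.values[name])
            score += math.log((seen + 1) / (total + k))
        return score

    def fraud_probability(self, features):
        """Posterior probability of the 'fraud' label for one event."""
        n = sum(self.class_counts.values())
        log_post = {
            label: math.log(count / n) + self._log_likelihood(features, label)
            for label, count in self.class_counts.items()
        }
        top = max(log_post.values())  # subtract the max for numerical stability
        weights = {label: math.exp(v - top) for label, v in log_post.items()}
        return weights.get("fraud", 0.0) / sum(weights.values())


# Invented feedback: two confirmed fraud events and four genuine ones.
history = [
    ({"new_device": "yes", "country_change": "yes"}, "fraud"),
    ({"new_device": "yes", "country_change": "no"}, "fraud"),
    ({"new_device": "no", "country_change": "no"}, "genuine"),
    ({"new_device": "no", "country_change": "no"}, "genuine"),
    ({"new_device": "yes", "country_change": "no"}, "genuine"),
    ({"new_device": "no", "country_change": "yes"}, "genuine"),
]

model = NaiveBayes()
for features, label in history:
    model.update(features, label)

risky = model.fraud_probability({"new_device": "yes", "country_change": "yes"})
safe = model.fraud_probability({"new_device": "no", "country_change": "no"})
print(f"risky event: {risky:.2f}, safe event: {safe:.2f}")
```

Even with only six feedback events, the model ranks the risky event well above the safe one, and calling `update` with an extra predictor later on requires no retraining from scratch.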

With the Bayesian approach, the parameters that contribute to the final result can be made visible, so the model is not a “black box.” This means that Bayesian classifiers are free from an intrinsic disadvantage of methods like ANN, which do not readily provide information about the relative significance of the various parameters; those are the real black box models. As a result, users of risk-based authentication can see the top parameters that contributed most to the risk assessment, and these factors are visualized through a Case Management application.
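One common way to surface those contributions, sketched here under stated assumptions, is to work in log-odds: in a naive Bayes model each observed predictor adds its own log-likelihood ratio to the score, so the terms can be ranked directly. The likelihoods, the prior and the predictor names below are invented for illustration, and the function `explain` is hypothetical; this is not how any particular Case Management application computes its display.

```python
import math

# Hypothetical per-predictor likelihoods P(value | class), as a trained
# naive Bayes model might store them. All numbers are invented.
LIKELIHOODS = {
    "new_device=yes": {"fraud": 0.75, "genuine": 0.30},
    "country_change=yes": {"fraud": 0.50, "genuine": 0.10},
    "known_payee=yes": {"fraud": 0.20, "genuine": 0.80},
}
PRIOR = {"fraud": 0.01, "genuine": 0.99}  # assumed base rate of fraud


def explain(observed):
    """Return (fraud probability, contributions ranked by absolute impact)."""
    # Each predictor contributes its log-likelihood ratio to the log-odds.
    contributions = {
        name: math.log(LIKELIHOODS[name]["fraud"] / LIKELIHOODS[name]["genuine"])
        for name in observed
    }
    log_odds = math.log(PRIOR["fraud"] / PRIOR["genuine"]) + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return probability, ranked


probability, ranked = explain(
    ["new_device=yes", "country_change=yes", "known_payee=yes"]
)
for name, weight in ranked:
    direction = "raises" if weight > 0 else "lowers"
    print(f"{name}: {weight:+.2f} ({direction} risk)")
print(f"fraud probability: {probability:.3f}")
```

A positive term pushes the score toward fraud and a negative term pushes it toward genuine, which is the kind of per-parameter breakdown an analyst can read at a glance.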

Relying on artificial intelligence does not necessarily mean living in the dark or losing control over what your system is doing. When you have a robust machine learning algorithm that adapts to changes and partial data, accommodates constant additions and produces predictions that are easy to interpret, leading fraud detection results can coexist with transparency and a clear understanding of machine-provided risk assessments. Learn more about RSA’s use of machine learning for fraud detection here.


The content in this blog was contributed by Maya Herskovic, Senior Product Manager for RSA Adaptive Authentication.