8th May 2017

Is automated redaction of sensitive data reliable enough to ensure regulatory compliance?

Written by Spitch

Automated sensitive data redaction software based on speech technologies can help remove confidential information from customer conversation recordings. Although the technology has been available for at least a decade, questions remain over its reliability. The key issue is whether such solutions can be trusted to ensure full compliance with changing regulations, such as the new PCI DSS 3.2, which forbids the storage of Sensitive Authentication Data (SAD) after authorization.

The problem for many operators is that manual pause/resume solutions are not accepted as an adequate measure for ensuring compliance, while machine-driven redaction methods employing automatic speech recognition (ASR) are based on probabilistic approaches. In other words, if you asked a machine for a simple binary “yes” or “no” answer to the question of whether all sensitive data has been removed, it would instead reply that there is perhaps a 90% probability of success.
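To make that concrete, here is a minimal Python sketch of why the answer comes back as a probability rather than a yes/no. The function and figures are purely illustrative, not part of any vendor's product: each detected SAD segment carries its own ASR confidence, and the chance that every segment was caught is, at best, the product of those confidences.

    # Illustrative only: why an ASR-based redactor reports a probability,
    # not a binary yes/no. Each detected SAD segment carries a confidence
    # score; the chance that *all* sensitive segments were caught is
    # (naively, assuming independence) their product.
    from math import prod

    def probability_all_redacted(segment_confidences: list[float]) -> float:
        """Estimate the probability that every SAD segment was detected."""
        return prod(segment_confidences)

    # Example: three detected segments, each with high but imperfect confidence.
    confidences = [0.98, 0.95, 0.97]
    print(f"Estimated probability of complete redaction: "
          f"{probability_all_redacted(confidences):.2%}")
    # -> roughly 90%, never a hard "yes"

Because several imperfect confidences compound, the honest answer is always a probability rather than a guarantee.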

There is apparently no theory or statistical model to replace a purely empirical approach in this particular case. Lab testing, then? Experience shows that when seasoned financial services executives face this dilemma, they invariably conclude that lab test results are good… but unfortunately not good enough where compliance is concerned. Scientists also know that the effort required to produce the lab test results one wants is always considerably lower than the time and energy required to refute them. A widely preferred approach, therefore, is to run a simple and quick proof-of-concept project with real customers and conversation recordings.

This can be as simple as continuously checking existing data records for PCI DSS non-compliant information and providing reliable tools to highlight and locate such information, if it is there, for deletion or editing.
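As an illustration of what that checking step might look like, the sketch below (hypothetical code, not any vendor's API) scans call transcripts for runs of 13 to 19 digits that pass a Luhn checksum, a common heuristic for spotting stored card numbers, and reports where they occur so they can be flagged for deletion or editing.

    import re

    def luhn_valid(digits: str) -> bool:
        """Standard Luhn checksum used to validate candidate card numbers."""
        total, double = 0, False
        for d in reversed(digits):
            n = int(d)
            if double:
                n *= 2
                if n > 9:
                    n -= 9
            total += n
            double = not double
        return total % 10 == 0

    def find_candidate_pans(transcript: str):
        """Yield (start, end, digits) for 13-19 digit runs passing the Luhn check."""
        for match in re.finditer(r"(?:\d[ -]?){13,19}", transcript):
            digits = re.sub(r"[ -]", "", match.group())
            if 13 <= len(digits) <= 19 and luhn_valid(digits):
                yield match.start(), match.end(), digits

    # Example transcript fragment containing a (test) card number read out by a caller.
    text = "caller: my card number is 4111 1111 1111 1111 and the expiry is ..."
    for start, end, pan in find_candidate_pans(text):
        print(f"Possible PAN at characters {start}-{end}: {pan[:6]}...{pan[-4:]}")

In a real deployment the same kind of check would run over ASR transcripts of stored calls, with the matched positions mapped back to audio timestamps for redaction.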

The latest innovations combine a bespoke ASR engine tuned for UK English, delivering accurate detection of SAD targets, with effective redaction tools based on machine learning and neural networks.

Here’s a simple example of how such tools work in practice:

Problem:

  • Reference: 1 2 3 4; Recognized: 1 2 3 4
  • Reference: 1 2 3 4; Recognized: 1 2 3 for
  • Reference: 1 2 3 4; Recognized: won 2 3 4

Five-step solution:

  • Identify SAD information in stored calls and audio records;
  • Redact identified audio segments so that they fully correspond to the SAD segments in the reference audio recordings (100% accuracy);
  • Achieve a minimum of 90% (or similar) recall on targeted SAD criteria, in accordance with ASR accuracy and recall definitions;
  • Deploy a remediation mechanism addressing recognition inaccuracies caused by homophones (words that sound the same but have different meanings/spellings), as sketched below;
  • Deliver end results that meet compliance requirements.
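Here is a minimal sketch of the homophone remediation mentioned in the fourth step, assuming a simple token-normalization approach (the mapping and function names are illustrative, not Spitch's actual implementation): spoken words that sound like digits are mapped back onto digits before the recognized sequence is matched against SAD patterns, so the “Problem” cases above still line up with the reference “1 2 3 4”.

    # Illustrative homophone remediation: map common homophones and spelled-out
    # numbers onto digits so that "won 2 3 for" is treated as "1 2 3 4".
    HOMOPHONE_TO_DIGIT = {
        "oh": "0", "zero": "0", "one": "1", "won": "1",
        "two": "2", "to": "2", "too": "2", "three": "3",
        "four": "4", "for": "4", "fore": "4", "five": "5",
        "six": "6", "seven": "7", "eight": "8", "ate": "8",
        "nine": "9",
    }

    def normalize_tokens(tokens: list[str]) -> list[str]:
        """Replace homophones and spelled-out numbers with their digit form."""
        return [HOMOPHONE_TO_DIGIT.get(t.lower(), t) for t in tokens]

    # The three "Problem" cases from above, after normalization:
    for recognized in (["1", "2", "3", "4"],
                       ["1", "2", "3", "for"],
                       ["won", "2", "3", "4"]):
        print(recognized, "->", normalize_tokens(recognized))
    # All three now match the reference digit string "1 2 3 4",
    # so the corresponding audio segment can still be located and muted.

In practice such normalization would sit between the ASR output and the pattern-matching step, so that a matched segment's timestamps can be passed on to the redaction tool.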

For more information please register here for our webinar to be held on 11 May 2017, or contact us directly.
