A New Battleground: The Impact of Gen AI on Call Security
Remote workforces and evolving AI tools provide new opportunities for attackers
This week I saw an amazing product demo that perfectly showcased the convergence of the two technologies I will discuss in this post: video conferencing and generative AI.
Created by the team at Pickle, the demo (available on their website) features their new commercial product, which uses real-time AI cloning to let users appear professional and engaged on a live video call while the actual human “on the call” is free to do whatever they please without having to sit in front of the camera.
At first (and even second) glance, the avatar appears virtually indistinguishable from a real human on a Zoom call. It is certainly good enough to fool me if I were not expecting or looking for it.
According to their website, the process is incredibly easy, with customers able to create their clones using as little as 3 minutes of pre-recorded video:
1. Get your avatar: Submit a 3-minute video of yourself talking to generate your own AI avatar with trained appearance and movements.
2. Enter calls with the console turned on: The console activates a virtual camera and replaces your camera input with generated video driven by your live audio stream.
3. Live video generation: The AI avatar appears in your camera feed and lip-syncs to your words every time you speak.
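To make the mechanics concrete, here is a minimal Python sketch of the virtual-camera technique described above. pyvirtualcam is a real library that exposes a fake webcam which conferencing apps list as an ordinary camera (it requires a virtual-camera driver, such as OBS's, to be installed); generate_avatar_frame is a hypothetical stand-in for the proprietary real-time avatar model.

```python
# Minimal sketch of the virtual-camera swap, assuming pyvirtualcam and
# an installed virtual-camera driver. generate_avatar_frame() is a
# HYPOTHETICAL placeholder for the real-time avatar model.
import numpy as np
import pyvirtualcam

WIDTH, HEIGHT, FPS = 1280, 720, 30

def generate_avatar_frame(audio_chunk: bytes) -> np.ndarray:
    """Hypothetical: return one RGB frame of the avatar, lip-synced to
    the latest chunk of the user's live microphone audio."""
    return np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8)  # placeholder frame

with pyvirtualcam.Camera(width=WIDTH, height=HEIGHT, fps=FPS) as cam:
    for _ in range(FPS * 10):  # stream ~10 seconds for illustration
        audio_chunk = b""  # in practice, read from the live mic stream
        cam.send(generate_avatar_frame(audio_chunk))  # apps see a normal webcam
        cam.sleep_until_next_frame()  # hold a steady frame rate
```

The point of the sketch is that, from Zoom's or Teams' perspective, nothing unusual is happening: the generated video arrives through the same camera interface as any real webcam.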
While I’m sure remote workers everywhere are rejoicing at the prospect of increased “freedom” during their working hours, for cybersecurity professionals this tech is a perfect example of the new tools and attack vectors now available to attackers.
Before we explore this further, let’s take a moment to recap the key advancements that led us to this point.
2020: Adoption of video conferencing
During COVID, remote work with distributed teams became normalized. Companies previously unreceptive to this concept were forced to adapt and, as a result, began using video conferencing software to conduct their everyday business. Now, several years after COVID, virtually every company continues to use Zoom, Meet, Teams, etc. to communicate both internally and externally. While each company tends to have its preferred platform of choice, the reality is that most employees (especially external-facing ones) join calls across a variety of platforms on a regular basis.
One of the interesting dynamics of this mass adoption of video conferencing was the lack of concern for the security of these platforms. There was a generally accepted understanding that video conferences were as close to in-person interaction as you could get without actually being face to face. Since the authenticity of a person was never in doubt when speaking to them in person, the sentiment was that video conferencing was largely the same: in both scenarios you could see and hear the person you were talking to in real time.
And to be fair, this status quo was largely fine up until another, much more recent, shift in tech took place.
The second great shift
Fueled by decades of quiet foundational work, mountains of venture capital dollars, and landmark product releases, generative AI has burst onto the scene over the last couple of years and continues to remain at the forefront of public consciousness. Each year we have seen increasingly rapid improvements in the quality, accessibility, and adoption of this tech across voice, video, and text. New applications for generative AI are being discovered, funded, and developed faster than society can collectively keep up with and prepare for.
A (not so) slow motion car crash
This past year we have seen these two technologies collide.
This is largely because this past year, for the first time, generative voice and video became accessible at high enough speed and quality to allow convincing impersonation of human likeness in real time.
There is no going back.
From a security perspective, this presents a significant new risk. The channels of communication considered most secure and trustworthy are now vulnerable.
Since AI now allows real-time replication of human likeness good enough to fool people on a consistent basis, humans can no longer be relied upon to enforce the security of these communication channels. This is especially true given that they have never had to question the security of these channels before, and more often than not have been taught to use voice and video calls to verify any questionable requests received via email, text, etc.
While this new vulnerability affects all businesses, for those with remote or distributed teams, who often use voice and video conferencing platforms on a daily basis as an integral part of their business, this presents an existential threat.
In early 2024, the first widely publicized breach of this kind resulted in over $25 million in losses after a finance department employee at a large distributed engineering firm correctly questioned a phishing email, jumped on a video call to verify the suspicious request, and was duped by a real-time deepfake of their CFO.
While this attack generated headlines, the narrative revolved around the first high-profile usage of deepfake technology in an attack, and the ensuing conversation around security focused on how deepfakes could be identified in the future.
However, this missed the nuance that deepfakes are just one more tool in the social engineer's arsenal. Concerning as they are, what mattered more than the use of deepfakes in this attack was the new attack vector that was exploited.
For the first time, an effective social engineering attack resulting in significant financial losses was executed using a video conferencing platform.
There was no security layer for this communication channel to protect the employee because it was never considered that the eyes and ears of the employee could be deceived in real-time.
While the milestone discussed in headlines was the usage of deepfakes in a social engineering attack, the real significance lay in the explicit signal that generative AI had unlocked video communication channels as a new attack vector going forward.
With this, similar attacks are no longer a question of "if" but of "when", especially for businesses operating in regulated industries.
And the data is starting to confirm this is the case.
According to a recent survey from Medius, 53% of finance professionals in the US and UK have already been targeted by attacks leveraging deepfake technology, with 43% admitting they fell victim to the attack. Even worse, “when asked, the vast majority of professionals (87%) admitted that they would make a payment if they were ‘called’ by their CEO or CFO to do so. This is concerning as more than half (57%) of financial professionals can independently make financial transactions without additional approval.”
The unfortunate reality is that we are only at the beginning, and this is only going to get worse as the tools available to attackers continue to improve. Just as generative AI is unlocking amazing new legitimate capabilities for businesses, it is also allowing attackers to conduct new types of attacks and exploit new attack vectors.
Going Forward
With this, we have entered a new era where we can no longer trust what we are seeing and hearing in real time across all of the voice and video communication channels we regularly use. This includes not just malicious use of generative AI to attack these channels, but also legitimate commercial products like Pickle that blur the lines between real and fake in our everyday communication.
The question then becomes: how do we navigate this new reality?
From a security perspective, it starts by acknowledging that this irreversible shift has taken place, and that humans can no longer be relied upon as the only security layer for voice and video communication channels. In fact, if anything, they are what is being targeted across these platforms.
Then, if this is accepted, the next step is to identify how these communication channels can be protected and defended. Updates to company trainings, policies, and processes will need to be made, along with the adoption of technology specifically built to help secure these channels - not just from deepfakes, but from all types of social engineering, voice phishing, and fraud, whether they use deepfakes or not.
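As one hedged illustration of what such a process update might look like in code, the sketch below encodes a simple out-of-band verification rule for payment requests. Every name and number here (PaymentRequest, UNVERIFIED_CHANNELS, the $10,000 threshold) is an assumption for illustration, not a prescription.

```python
# Illustrative sketch of an out-of-band verification policy for payment
# requests. All names and thresholds are HYPOTHETICAL; a real deployment
# would hook into your identity provider and approval workflows.
from dataclasses import dataclass

@dataclass
class PaymentRequest:
    requester: str      # identity claimed on the call
    amount_usd: float
    channel: str        # e.g. "zoom", "teams", "phone"

# Channels where real-time impersonation is now possible.
UNVERIFIED_CHANNELS = {"zoom", "teams", "meet", "phone"}
OOB_THRESHOLD_USD = 10_000  # assumed cutoff: above this, always verify

def requires_out_of_band_check(req: PaymentRequest) -> bool:
    """A request made over a voice/video channel is never trusted on its
    own; high-value requests must be re-confirmed via an independent,
    pre-registered channel (not the call where the request was made)."""
    return req.channel in UNVERIFIED_CHANNELS and req.amount_usd >= OOB_THRESHOLD_USD

req = PaymentRequest(requester="cfo@example.com", amount_usd=25_000_000, channel="zoom")
if requires_out_of_band_check(req):
    print("Hold transfer: confirm via a pre-registered device, not the call itself.")
```

The design point is simply that the verification step must live outside the channel being attacked; the specific threshold and channel list are policy decisions each company makes for itself.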
This is where my startup DeepTrust can help.
At DeepTrust we are on a mission to protect human authenticity in a new era where AI allows the convincing replication of anyone's likeness.
What this means in practice is that we help security teams identify and defend against social engineering, voice phishing, and deepfakes across all of their voice and video communication channels. We do this by integrating with VoIP services like Zoom, Microsoft Teams, Google Meet, RingCentral, and others to verify audio sources, detect deepfakes, and alert both users and security teams to suspicious requests - all in real time.
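For a rough sense of the shape of this kind of real-time protection, here is a minimal monitoring-loop sketch. It is not DeepTrust's actual implementation: score_deepfake() stands in for whatever synthetic-speech classifier a real system would run, and the audio source is a placeholder for a conferencing platform's media stream.

```python
# A rough sketch of a real-time call-monitoring loop. This is NOT
# DeepTrust's implementation: score_deepfake() is a HYPOTHETICAL
# stand-in for a synthetic-speech classifier, and the audio chunks
# stand in for a conferencing platform's media stream.
from typing import Iterable

ALERT_THRESHOLD = 0.9  # assumed probability cutoff for alerting

def score_deepfake(audio_chunk: bytes) -> float:
    """Hypothetical classifier: probability the chunk is synthetic speech."""
    return 0.0  # placeholder; a real system would run a trained model here

def monitor_call(audio_chunks: Iterable[bytes]) -> None:
    """Score each chunk of live call audio as it arrives and alert when
    the score crosses the threshold, so users and the security team can
    pause before acting on anything said in the call."""
    for chunk in audio_chunks:
        score = score_deepfake(chunk)
        if score >= ALERT_THRESHOLD:
            # In practice: surface an in-call warning and notify the SOC.
            print(f"ALERT: likely synthetic speech (score={score:.2f})")

# Example: feed 1-second chunks pulled from the call's audio stream.
monitor_call([b"\x00" * 16000])  # placeholder chunk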
If you’re a security leader interested in learning more and understanding your options when it comes to protecting the voice and video communication channels for your company, I’d love to chat.
For everyone else reading this, I encourage you to begin having conversations around security in this new era of digital communication with your friends and family. Institute shared passwords, pause to ask questions if you get an urgent call, and, most importantly, ensure those around you are also aware of the developments in technology over the last couple of years. You would be surprised how many people simply have no idea.
If you have any questions, I’m always happy to chat. Stay safe out there!