Why not all AI is created equal and how the wrong choice could be hurting your business [Q&A]
AI seems to be everywhere at the moment. But despite its ubiquity, it isn't all the same.
Steve Benton, VP of threat research for Anomali, talked to us about why not all AI is equal and what businesses need to consider to ensure they get the most from the technology.
BN: Why isn't all AI created equally?
SB: At the most fundamental level, AI is based on Large Language Models (LLMs), and although they are all huge, some are way 'huger' than others.
Right now, there are six full commercial LLM offerings and over 25 open source offerings, and this is only going to grow. There will be an almost Darwinian survival-of-the-fittest to a smaller set that will endure and sustain.
The furor and fun around ChatGPT's launch came from the advent of this completely jack-of-all-trades AI -- it captured the imagination of us all and stoked doomsday fears in some. But in my opinion, just as with human beings, when you come to employ them in your organization (and I use 'employ' deliberately here) in a particular domain and across a set of tasks, you should expect to have to both train and supervise them. This is a key part of my approach to AI -- I see AI as another employee in the organization.
The truth is AI can only get as far as we allow it to… but this is like the frog sitting in a pan of heating water -- it will not realize it is being boiled until it's too late. So here, my approach is one of eyes wide open. Again, I go back to treating AI in your organization as an employee. For example, in a hospital, you wouldn't just appoint anyone to perform surgery on patients. You'd look for training, qualifications, and a track record, and once they were employed, you would continue training and supervising them.
BN: Why isn't more data feeding into AI necessarily the best idea?
SB: There is a danger in treating AI like big data -- I'm sure we all remember the approach many organizations took of creating a huge data lake with everything dumped in it, with the expectation that useful things would flow from that. The risk is that you have created a huge, messy, noisy pool with the potential to confuse the AI, as the data is inherently of questionable fidelity and value. You have detuned your AI, potentially undoing the good work you have spent many months on.
As I said at the AI and ChatGPT Solutions Forum 2023 in September -- "AI simplifies the complex, spans the vast, finds insights and patterns, learns and remembers, intuits and suggests, at breakneck speed…" -- so you need to strike a balance: keep the huge span of data it is assisting you with at as high a fidelity and relevance as possible, while enhancing its training to deal with the noise and mess and produce the insights and answers you need.
The most important data for an AI is the data on which it is trained. This is how you create the quality and integrity for the AI you are intending to have as an important employee in your organization. That data needs to be of the right quality and be protected from unintended or deliberate modification -- it is what you will rely upon to rebuild your AI should that need arise.
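One practical way to protect training data from unintended or deliberate modification, as described above, is to keep an integrity manifest of content hashes and verify it before any rebuild. This is a minimal, hypothetical sketch -- the function names, manifest format, and directory layout are illustrative assumptions, not part of any specific product or the interviewee's toolchain:

```python
# Hypothetical sketch: detect modification of training data by keeping a
# manifest of SHA-256 hashes, then re-checking the files against it before
# the data is relied upon (e.g. to rebuild a model). Names are illustrative.

import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file's contents so any later change is detectable."""
    h = hashlib.sha256()
    h.update(path.read_bytes())
    return h.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Record a content hash for every training file under data_dir."""
    return {str(p): sha256_of(p)
            for p in sorted(data_dir.rglob("*")) if p.is_file()}


def verify(manifest: dict[str, str], data_dir: Path) -> list[str]:
    """Return the files whose contents no longer match the manifest."""
    current = build_manifest(data_dir)
    return [name for name, digest in manifest.items()
            if current.get(name) != digest]
```

In practice the manifest itself would be stored and signed separately from the data it describes, so an attacker who can alter the training set cannot also alter the record of what the training set should be.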
BN: How will the next iteration of AI -- Personal instead of Generative -- benefit CISOs?
SB: CISOs are all facing the challenge of achieving more at greater cost efficiency.
In the beginning, I can see AI systems working very much as a generic 'assistant' to the security team, accelerating areas of analytics and summarization. As the system is allowed to learn more about the environment being protected and the threat intelligence relevant to the organization, it can evolve into the generic 'advisor.' At this stage the AI has established its position of trust in the SOC -- trust is really, really important, and for AI it must never be unconditional.
The final evolution is AI becoming the 'Iron Man suit' specific to each analyst, and perhaps also a suit for the CISO. In this role, the AI has not only learned the organization's threat landscape and security posture but also how the individual it is wrapped around thinks through issues and incidents and reaches decisions. It understands what worries the human it is serving and highlights relevant trends and threats. Here it has become the 'personal' adviser and partner. The human, however, is still and always accountable.
BN: When shouldn't companies use AI?
SB: Going back to my 'treat AI as an employee' principle, I think we should continue to rationalize this in human terms -- responsibility vs accountability. In traditional RACI models these are defined as:
- Accountable: The one ultimately answerable for the correct and thorough completion of the deliverable or task, the one who ensures the prerequisites of the task are met and who delegates the work to those responsible. In other words, the accountable must sign off (approve) the work that the responsible provides. There must be only one accountable specified for each task or deliverable.
- Responsible: Those who do the work to complete the task. There is at least one role with a participation type of responsible, although others can be delegated to assist in the work required.
You won't be surprised to hear that I think AI must be restricted to the 'responsible' role and that the 'accountable' role must always be a human. Unilateral, unsupervised operation of AI is dangerous unless it has been assessed as fully safe: a wrong or bad decision cannot cause harm (especially directly to humans), and any errors can be detected and corrected via training (just as with human employees). A useful addition to AI systems would be a 'four eyes' approach, e.g. a rules-based system that checks the output from the AI to prevent harmful decisions from being executed.
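The 'four eyes' idea above can be sketched as a small rules-based gate that sits between an AI's proposed action and its execution, escalating anything risky to the accountable human. This is a minimal illustration under assumed conventions -- the `Action` type, the risk categories, and the confidence threshold are all hypothetical, not from any real SOC product:

```python
# Hypothetical 'four eyes' guardrail: a rules-based check on AI output.
# The AI stays 'responsible' (it proposes actions); a human stays
# 'accountable' (anything the rules flag must be escalated for sign-off).

from dataclasses import dataclass


@dataclass
class Action:
    kind: str          # e.g. "block_ip", "isolate_host", "delete_account"
    target: str        # what the action would be applied to
    confidence: float  # the AI's own confidence in the recommendation


# Illustrative rules: which action types can cause real harm, and how
# confident the AI must be before acting without a human in the loop.
HIGH_RISK_KINDS = {"isolate_host", "delete_account"}
MIN_CONFIDENCE = 0.9


def review(action: Action) -> str:
    """Return 'execute' only when the rules deem the action safe;
    otherwise 'escalate' so a human approves it first."""
    if action.kind in HIGH_RISK_KINDS:
        return "escalate"   # potential for direct harm: human must sign off
    if action.confidence < MIN_CONFIDENCE:
        return "escalate"   # low confidence: human review needed
    return "execute"


# Usage: a low-risk, high-confidence action passes; anything else escalates.
print(review(Action("block_ip", "203.0.113.7", 0.95)))  # execute
print(review(Action("isolate_host", "db-01", 0.99)))    # escalate
```

The point of keeping the checker rules-based rather than another model is that its behavior is auditable and deterministic, so the gate itself cannot drift in the way the AI it supervises might.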
As information is placed within the models, it becomes part of those models -- whether that is PII, corporate sensitive information, or IP. So you need to apply your information security management principles here. If the LLM you are using is public, then the only information that can go in there is information that you are happy to publish in the public domain -- because essentially that's what you're doing. If you are using a private AI, then first make sure you read up on exactly what 'private' means -- you need to know where your information will go and how its access and confidentiality are maintained right through its lifecycle.
An area where debate is still raging is data deletion. It's like trying to wipe a memory from a brain: how do you find the bunch of neurons that constitute that memory? Essentially the only way being offered at present is to retrain the AI from scratch. Data protection regulators are currently wrestling with this problem for the public AIs -- the right to be forgotten -- because with things like ChatGPT, we may already be in there. And were we asked whether we wanted to be when it was being trained?
But back to individual organizations making decisions about their own private AI. Deletion of data is in essence 'machine unlearning', and just as learning is training-based, so will unlearning be. As I said, it seems the only reliable way is to burn the current model and train it back from scratch without the data you want deleted. But that means reapplying all the data it has learned from (not just the original training set). This sounds like a monumental undertaking, and as such it underlines the importance of that 'eyes-wide-open' principle in determining the application of AI in any organization.