Building cost-effective AI models: Creating accessible AI for all
With recent AI development, big tech companies have prioritized the creation of ever-larger language models. While sheer scale does have its benefits, the escalating costs associated with training and running these massive models have become a significant obstacle, particularly for small and medium-sized enterprises. With this in mind, and as new developers enter the space, the trend is slowly shifting: the focus is moving away from scale for its own sake and towards making AI technologies accessible and affordable for everyone.
With limited funds posing a significant challenge for smaller organizations looking to invest in AI, we could see prohibitively expensive models stifling innovation and diversity within the market. Customizing models like GPT-4 for specific business use cases currently comes with a hefty price tag, often reaching tens of thousands of dollars. Moreover, as models become more complex, long-term operational expenses soar. For instance, maintaining servers for ChatGPT can incur a staggering daily cost comfortably in the six figures. Smaller enterprises require intelligent and optimized model architectures that can compete with the capabilities of larger models at a price point commensurate with their business size.
These exorbitant costs could easily render the customization of large models for domain-specific applications unachievable for many organizations. There are, however, potential solutions to this problem as there are methods to address and lower the costs associated with the training and operation of AI models.
Strategies for Cost-Effective Development
There are several stages in training and tuning AI models, each with its own methods and considerations when it comes to cost reduction. One common approach is to take an existing open-source model and train it to align with an organization's specific needs. In order to minimize training and running costs, it is essential to determine the minimum number of parameters, and the least amount of tuning, necessary for each specific use case.
Three types of model tuning play a vital role in making models fit for purpose: fine-tuning, instruction set tuning, and prompt tuning.
Optimizing Models through Fine-tuning
Fine-tuning involves making small adjustments to how models comprehend and encode language within a particular domain. It allows users to address the problem of underrepresented tokens, thereby enhancing the model's contextual understanding. For example, if a model is initially designed to recognize and categorize scientific papers, fine-tuning it for a similar use case, such as patent research, could well be more efficient than training a more general model from scratch.
By diligently curating a dataset, often incorporating a business's proprietary data, a fine-tuned model can surpass the accuracy of the generic model it originated from. An approach of emphasizing quality rather than quantity enhances accuracy while minimizing total training time.
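The idea can be sketched with a toy model: a frozen "base" feature extractor standing in for a pretrained network, plus a small trainable head adjusted with gradient descent on domain data. Everything here — the features, the data, the learning rate — is illustrative, not a real fine-tuning pipeline.

```python
def base_features(x):
    # Stand-in for a pretrained encoder: frozen, never updated during tuning.
    return [x, x * x]

def predict(weights, x):
    # The trainable "head": a linear layer over the frozen features.
    return sum(w * f for w, f in zip(weights, base_features(x)))

def fine_tune(weights, data, lr=0.01, epochs=1000):
    # Plain stochastic gradient descent on squared error, touching only
    # the small head rather than the (frozen) base -- the essence of
    # cheap, parameter-efficient fine-tuning.
    for _ in range(epochs):
        for x, y in data:
            feats = base_features(x)
            err = predict(weights, x) - y
            weights = [w - lr * err * f for w, f in zip(weights, feats)]
    return weights

# A small, curated domain dataset following y = 3x + 0.5x^2.
data = [(x, 3 * x + 0.5 * x * x) for x in (0.1, 0.5, 1.0, 1.5, 2.0)]
head = fine_tune([0.0, 0.0], data)
```

Because only the head is updated, the training cost scales with the handful of head parameters rather than the full model — the same reasoning behind quality-over-quantity curation keeping total training time low.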
A Smarter Approach to Model Instructions
Instruction set tuning can offer cost and data efficiency compared to fine-tuning, albeit requiring the meticulous formulation of instructions and prompts. Automating data gathering to enable scalability poses another challenge for these approaches.
Introduced in 2021 through the ground-breaking 'Finetuned Language Models Are Zero-Shot Learners' paper by Google researchers, instruction set tuning is a relatively recent technique. It entails providing the model with an understanding of specific instructions, eliminating the need for users to provide step-by-step guidance.
However, this approach does have limits, particularly in managing performance losses caused by counterproductive or overlapping instructions. Overcoming this challenge involves the use of highly customized and curated datasets with discrete instructions, typically requiring manual creation and curation. Alternatively, deploying a "swarm" of intelligent, specialized language models can automatically generate high-quality datasets, reducing the need for extensive human labor.
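A minimal sketch of what assembling such a dataset can look like: instruction/input/output records rendered into a single prompt string, with a crude verbatim-overlap check guarding against the duplicate instructions described above. The template and field names are illustrative assumptions, not a fixed standard.

```python
def format_example(instruction, inp, output):
    # Render one record into a single training string.
    prompt = f"### Instruction:\n{instruction}\n"
    if inp:
        prompt += f"### Input:\n{inp}\n"
    prompt += f"### Response:\n{output}"
    return prompt

def deduplicate(examples):
    # Drop examples whose instructions overlap verbatim (case-insensitive) --
    # one simple guard against counterproductive, overlapping instructions.
    seen, kept = set(), []
    for ex in examples:
        key = ex["instruction"].strip().lower()
        if key not in seen:
            seen.add(key)
            kept.append(ex)
    return kept

raw = [
    {"instruction": "Summarize the abstract.",
     "input": "The abstract text.", "output": "A short summary."},
    {"instruction": "summarize the abstract.",
     "input": "The abstract text.", "output": "A short summary."},
    {"instruction": "List the claims in the patent.",
     "input": "The patent text.", "output": "Claim 1, Claim 2."},
]
dataset = [format_example(e["instruction"], e["input"], e["output"])
           for e in deduplicate(raw)]
```

In practice the "swarm" approach mentioned above would generate the `raw` records automatically, with curation logic like `deduplicate` filtering them before training.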
Extracting Knowledge from Models
Prompt tuning allows organizations to extract specific knowledge from a model based on its encoded information, much like formulating a search engine query to obtain precise results. At this higher level of optimization, it is important to remember that the effectiveness of prompt tuning depends on the quality of any fine-tuning and instruction tuning that has already taken place.
If the required information has been properly encoded in the model, fine-tuning may not be necessary. However, since language holds a multitude of meanings in differing contexts, fine-tuning often becomes indispensable for optimizing a model to cater to specialized domains. Similarly, if a model possesses the ability to execute multi-step instructions and present information in a user-friendly manner, instruction tuning may not be required.
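The search-query analogy can be made concrete with a toy prompt-selection loop: try several phrasings against a stand-in scorer and keep the best. The scorer here is a simple keyword matcher, purely illustrative — in a real system it would be a call to the model plus an evaluation metric.

```python
def toy_model(prompt):
    # Stand-in for scoring a real LLM's output: rewards prompts that name
    # the domain and the desired output format explicitly.
    score = 0
    for cue in ("patent", "bullet points", "cite"):
        if cue in prompt.lower():
            score += 1
    return score

candidates = [
    "Summarize this document.",
    "Summarize this patent in bullet points.",
    "Summarize this patent in bullet points and cite claim numbers.",
]

# Keep the phrasing that scores highest -- the prompt-tuning loop in miniature.
best = max(candidates, key=toy_model)
```

As the article notes, no amount of prompt iteration helps if the underlying knowledge was never encoded into the model in the first place.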
Balancing Size and Capability
The number of parameters in a model, often measured in the billions, is the count of learned weights the model draws on when generating a response. While a higher parameter count may suggest better results, scaling up is often simply inefficient.
To increase cost-effectiveness, developers must stop treating size and capability as a linear progression. Instead, models should be smarter, with optimized architectures that move beyond a brute-force approach for every use case.
Developers should consider the tasks their AI models aim to accomplish. For instance, in the case of language models, it is essential to determine if the model needs to excel in specific areas of natural language processing, such as sentence boundary disambiguation, factual validation, or part-of-speech tagging. This analysis will highlight areas within the architecture that require focused attention and identify opportunities for simplification.
The creation of a streamlined model requires the ability to perform fine-tuning, instruction set tuning, and prompt tuning in a cost-effective manner. This establishes a "temperate zone" for the optimal number of parameters. Too few may compromise performance, while too many will exceed the threshold and possibly render the price point unattainable for smaller businesses. As with any resource-intensive endeavor, striking a balance is key.
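A back-of-the-envelope calculation shows why the parameter count dominates the price point: just holding the weights in GPU memory scales linearly with parameters. Real serving costs add activations, caching, and batching on top, so treat these figures as lower bounds.

```python
def weight_memory_gb(params_billions, bytes_per_param=2):
    # 2 bytes per parameter corresponds to fp16/bf16 weights;
    # fp32 would double this, 8-bit quantization would halve it.
    return params_billions * 1e9 * bytes_per_param / 1e9

for size in (1, 7, 13, 70):
    print(f"{size}B params ~ {weight_memory_gb(size):.0f} GB of weights (fp16)")
```

A 7B-parameter model fits comfortably on a single commodity GPU, while a 70B model already demands multi-GPU serving — a concrete illustration of the "temperate zone" where capability is sufficient but the hardware bill stays within reach of smaller businesses.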
Data-Centric Approach to AI
The concept of 'data-centric AI', advocated by influential figures like Andrew Ng, emphasizes the significance of data quality over quantity. With the progression of algorithms and the proliferation of open-source large language models for training models, it is time to focus on how to engineer data in constructing cost-efficient models without compromising performance. Industry leaders like Microsoft, through initiatives like Phi-1, are already heading in this direction.
A key aspect is the emphasis on collecting high-quality, carefully curated datasets for fine-tuning. This approach ensures both high accuracy and reduces the risk of generating false information while minimizing total training time. Looking ahead, the use of synthetic datasets might become a feasible option, making it possible to obtain the necessary data at volume, even for highly specialized domains.
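Quality-first curation often starts with mundane filters: dropping near-empty records, exact duplicates, and outliers before any fine-tuning happens. The thresholds and normalization below are illustrative assumptions, not a prescribed recipe.

```python
def curate(records, min_words=5, max_words=2000):
    # Keep records within a sane length band and drop duplicates after
    # normalising case and whitespace -- a minimal data-centric filter.
    seen, kept = set(), []
    for text in records:
        n_words = len(text.split())
        key = " ".join(text.lower().split())
        if min_words <= n_words <= max_words and key not in seen:
            seen.add(key)
            kept.append(text)
    return kept

corpus = [
    "Short.",  # too short: dropped
    "A curated example sentence about patent classification methods.",
    "a curated example sentence about patent classification methods.",  # dup
    "Another distinct, high-quality training example for the target domain.",
]
clean = curate(corpus)
```

Filters like this shrink the training set — and hence training time and cost — while tending to improve accuracy, which is exactly the quality-over-quantity trade the data-centric approach advocates.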
In making AI tools financially viable for smaller organizations, it is crucial to develop smarter language models that consume only the minimum computational resources necessary. The resulting savings would widen access to these powerful tools and take significant strides towards democratizing AI, making it available to all regardless of enterprise size or domain specialization.
Image credit: Laurent T / Shutterstock
Victor Botev is CTO and Co-founder of Iris.ai. Victor is an AI researcher from Chalmers University of Technology, and has served as a tech lead at international companies and developed several autonomous systems.