"AI will not take your job. The person using AI will take your job." –Jensen Huang
All the buzz right now is artificial intelligence. The phrase "AI" dominates earnings calls for publicly traded tech companies and extends far beyond the technology sector due to its seemingly limitless applications. But what makes AI so special? From my months of extensive research, if I had to boil all of AI's advantages down to one word, it would be "productivity." Productivity lies at the heart of economic sustainability for any country, business, or civilization. Productivity means growth, growth means innovation, innovation means efficiency, and efficiency means margin expansion. This article delves into the current state of AI, explores the distinctions between large and small models, discusses the challenges inherent in AI development, and examines why businesses might increasingly favor specialized models over large language models (LLMs) moving forward.
AI has been around for a long time, but it only really started to gain traction in the mainstream media when OpenAI, headed by Sam Altman, broke the internet with the release of ChatGPT. In technical terms, ChatGPT is powered by a Large Language Model, or LLM for short. LLMs are an advanced type of artificial intelligence designed to understand, generate, and interact with human language by training on vast amounts of text data. LLMs, such as OpenAI's GPT-4, use deep learning techniques to predict and produce coherent text based on context provided by the user. These models are capable of performing a wide range of language tasks, including translation, summarization, question-answering, and creative writing. In contrast to large generative models, specialized models are designed for specific tasks, such as image recognition or sentiment analysis. These models are optimized for their particular functions, often resulting in greater efficiency and cost-effectiveness for those specific applications compared to the more generalized LLMs.
Creating an AI model, regardless of its size, involves several critical steps:
- Define the Model's Purpose: Clearly articulate what the model is intended to achieve.
- Determine the Model Size: Decide on the complexity based on the scope and requirements of the task.
- Collect and Clean Data: Gather relevant data and preprocess it by removing duplicates and irrelevant information to ensure quality.
- Design Model Architecture: Develop a blueprint that outlines how the model will process data.
- Train the Model: Use iterative algorithms to adjust the model's parameters for optimal performance.
- Evaluate and Test: Assess the model's effectiveness and make necessary adjustments before deployment to end users.
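To make the steps above concrete, the data-collection-and-cleaning step can be sketched as a toy deduplication and normalization pass. This is an illustration only; a real pipeline would also handle language filtering, PII scrubbing, and format conversion.

```python
# Toy illustration of the "Collect and Clean Data" step: remove duplicate
# and irrelevant (empty) records and normalize whitespace.

def clean_dataset(raw_records: list[str]) -> list[str]:
    seen = set()
    cleaned = []
    for record in raw_records:
        text = " ".join(record.split())  # collapse runs of whitespace
        if text and text not in seen:    # drop empties and duplicates
            seen.add(text)
            cleaned.append(text)
    return cleaned

print(clean_dataset(["hello  world", "hello world", "", "other"]))
# ['hello world', 'other']
```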
The methods for completing each step differ vastly between creating an LLM and a specialized model. One key difference is in the data collection and cleaning step. Specialized models require datasets that are very specific to the end task. These datasets are often smaller, with cleaning focused on the model's designated task. For LLMs, data collection is significantly more resource-intensive: pretraining demands massive, diverse datasets that need extensive formatting and noise removal to ensure uniformity. LLMs are also typically extremely large, often with tens or hundreds of billions of trainable parameters, which greatly increases the costs associated with data storage, hardware, and computational power. In contrast, specialized models use smaller, task-specific datasets and simpler architectures, making them faster and more economical to develop and deploy. The architecture is like the blueprint of a building: it outlines the structure and design of a neural network and determines how it processes and interprets data. It not only determines the efficiency and cost of training and deployment but also defines the model's purpose and versatility.
The difference in architecture is where I would like to spend more time because it affects so many other facets of a model's inputs and outputs. Because LLMs require much larger datasets, more data cleaning, and more training, costs rise across the board. Companies that create foundation models in-house, such as Google (Gemini) or Meta (Llama 2), must purchase, store, and clean larger datasets, buy more hardware (GPUs and TPUs) for training, and absorb inference costs that rise drastically once the model is deployed. These costs are ongoing, and there will always be a need for the latest and greatest GPUs and TPUs because their efficiency can shorten the number of GPU hours needed for training, saving both time and money.
Inference refers to the process by which a model interprets user prompts and generates corresponding responses. When a large language model is deployed to millions of users, the number of queries grows with demand, causing inference costs to scale accordingly. In such scenarios, inference costs can eventually surpass the initial training costs, making them a significant long-term expense. But how is the cost of inference measured? Inference is billed in a discrete unit called a "token." One token roughly equates to 3/4 of a word, so 100 tokens correspond to roughly 75 words, counting both the prompt and the model's output.
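Using the 3/4-words-per-token rule of thumb, a back-of-the-envelope inference cost estimate might look like this. The per-token price below is a hypothetical example rate, not any provider's actual pricing.

```python
# Rough inference-cost estimator based on the ~3/4 words-per-token rule of
# thumb. PRICE_PER_1K_TOKENS is a hypothetical example rate in USD.

WORDS_PER_TOKEN = 0.75
PRICE_PER_1K_TOKENS = 0.002

def estimate_tokens(word_count: int) -> int:
    """Approximate the token count for a given number of words."""
    return round(word_count / WORDS_PER_TOKEN)

def estimate_cost(prompt_words: int, output_words: int) -> float:
    """Estimate one request's cost: prompt and output both consume tokens."""
    total_tokens = estimate_tokens(prompt_words) + estimate_tokens(output_words)
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS

# A 150-word prompt producing a 300-word answer uses ~600 tokens.
print(estimate_tokens(150) + estimate_tokens(300))  # 600
print(f"${estimate_cost(150, 300):.4f}")            # $0.0012
```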
To illustrate, let's consider a hypothetical example scenario of a relatively small LLM:
Training Costs:
- Compute Power:
- Number of GPUs: 1,024
- Training Duration: 2,000 hours
- Cost per GPU Hour: $2.50 (average rate for cloud providers)
- Total GPU Cost: 1,024 GPUs * 2,000 hours * $2.50 = $5,120,000
- Memory and Storage: $30,000
- Data Costs:
- Data Acquisition and Cleaning: $50,000
- Data Storage: $2,000
- Personnel:
- Data Scientists, Engineers, and Researchers: $200,000+
Inference Costs (Monthly):
- Compute Power for Inference: $20,000
- Usage Assumption: Running 10 million inferences per month
- Cost per Inference: Estimated at $0.002
- Total Inference Cost per Month: 10 million * $0.002 = $20,000
- Operational Management:
- Cost for Monitoring, Logging, and Management: $10,000
- Scaling Infrastructure for High Availability: $5,000 per month
Summary
- Total Training Cost: $5,402,000
- Total Running Cost (Monthly): $35,000
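The summary figures above can be verified with a short calculation. All rates are the hypothetical example figures from this scenario, not actual cloud-provider pricing.

```python
# Reproduces the hypothetical cost summary above.

# One-time training costs
gpus = 1_024
training_hours = 2_000
cost_per_gpu_hour = 2.50
gpu_cost = gpus * training_hours * cost_per_gpu_hour   # $5,120,000

memory_and_storage = 30_000
data_acquisition = 50_000
data_storage = 2_000
personnel = 200_000

total_training = (gpu_cost + memory_and_storage + data_acquisition
                  + data_storage + personnel)
print(f"Total training cost: ${total_training:,.0f}")  # $5,402,000

# Recurring monthly costs
inferences_per_month = 10_000_000
cost_per_inference = 0.002
inference_cost = inferences_per_month * cost_per_inference  # $20,000

monitoring = 10_000
scaling = 5_000

total_monthly = inference_cost + monitoring + scaling
print(f"Total monthly cost: ${total_monthly:,.0f}")    # $35,000
```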
As shown above, creating an LLM involves substantial and ongoing costs, making a strong case for why companies will likely continue to rely on smaller models. While significant progress has been made in improving the efficiency of LLMs, smaller models designed for specific tasks, such as image recognition or sequence processing, inherently have far fewer parameters, making them faster and more cost-effective to train and deploy.
There is an endless amount of data in the world. More data means more information, more information means more knowledge, and more knowledge means better models; therefore, data collection is an ongoing process, and the need for memory and storage keeps growing. It's a chain reaction: when you need more hardware, software, and computing power, your energy, maintenance, personnel, and data center costs increase significantly.
Reid Hoffman, co-founder of LinkedIn and a member of Microsoft's board of directors since 2017, said recently on the All-In Podcast:
“Already today, for example one of the things that happens with all the model providers at Microsoft and OpenAI which I’ve seen is they’ll sometimes sub in GPT 3.5 as opposed to 4 to see what the answers are because there is a cost to compute. Even as you learn to bring the cost of compute of the larger model down, larger models are always going to be more expensive loosely on the order of magnitude. So, what I think you’re going to see is networks of models and traffic control and escalations. The AI agents are not going to be one model, they are going to be blends of models. You can train very specific models on high quality data along with the larger model helping train it, then all of a sudden you have a functional smaller model.” (source)
The age of productivity has just begun and there will be a resurgence in the coming years of creating smaller models because many companies will need AI models to increase productivity and efficiency, but do not have the resources to create an LLM from scratch. Instead, they will use already existing LLMs to create smaller and less expensive specialized models trained on the company's own data to make them more efficient. The most famous example of this was performed by Microsoft’s Research team in their published study, “Textbooks Are All You Need.” The research team created Phi-1, a smaller model, whose development heavily relied on the capabilities of a larger model (GPT-4) to curate and generate the high-quality training data that enabled its superior performance. This highlights a teacher-student dynamic, where the larger model effectively "teaches" the smaller one by providing curated and synthesized knowledge.
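The teacher-student pattern described above looks roughly like this in code. Both `teacher_generate` and `fine_tune` are hypothetical placeholders for whatever LLM API and training framework a team actually uses; this is a sketch of the workflow, not an implementation of Phi-1's method.

```python
# Sketch of the teacher-student pattern: a large "teacher" model generates
# high-quality synthetic training examples, and a smaller "student" model is
# fine-tuned on them. The callables are hypothetical placeholders.

from typing import Callable

def build_training_set(topics: list[str],
                       teacher_generate: Callable[[str], str]) -> list[dict]:
    """Ask the teacher model for a textbook-style example per topic."""
    examples = []
    for topic in topics:
        prompt = f"Write a clear, self-contained explanation of: {topic}"
        examples.append({"prompt": prompt,
                         "completion": teacher_generate(prompt)})
    return examples

def distill(topics: list[str],
            teacher_generate: Callable[[str], str],
            fine_tune: Callable[[list[dict]], object]) -> object:
    """Curate synthetic data with the teacher, then train the student."""
    dataset = build_training_set(topics, teacher_generate)
    return fine_tune(dataset)  # returns the specialized student model
```

In the actual Phi-1 study, this kind of synthetic "textbook" data was paired with careful filtering of real web data; the sketch only shows the data-generation loop.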
There are countless benefits for businesses in implementing similar approaches within their operations. Creating or adopting smaller, specialized AI models in this way offers several significant advantages:
- Increased Efficiency: Employees can rapidly access information and insights without relying on colleagues or supervisors, streamlining workflows and reducing bottlenecks.
- Cost Savings: Automating tasks with AI reduces labor costs, enabling businesses to accomplish more with fewer resources and reallocating human talent to more strategic initiatives.
- Enhanced Productivity: Customized AI models designed to meet a company's specific needs can improve accuracy and speed in various processes, driving overall performance improvements.
Consider the following examples of how this will likely function in everyday business situations:
Example 1: Personalized Financial Planning Assistant
An LLM can be used to curate and preprocess data from financial planning textbooks, articles, tax codes, regulatory documents, and case studies. This processed data can then be used to train a smaller, specialized model on proprietary client data stored in your CRM (Customer Relationship Management) system, such as Salesforce or SharePoint. Advisors could then prompt this tailored model to generate a briefing on a client's history before a meeting. Additionally, it could assist in brainstorming ideas, recommending questions to ask based on the client's prior history, and even creating detailed post-meeting action plans.
How It Works
Step 1: Data Integration
The smaller model is trained on:
- Client financial history: Investments, goals, and risk tolerance from CRM.
- Meeting notes: Action items and key decisions from past sessions.
- Portfolio performance: Current and historical performance metrics.
- Behavioral data: Communication preferences and key milestones (e.g., retirement age, college funds).
Step 2: Meeting Preparation
Before a meeting, the model could:
- Summarize client data: Provide a briefing on the client’s financial situation, including recent portfolio performance, progress toward goals, and any flagged issues (e.g., underfunded accounts, market risks).
- Generate questions: Recommend tailored questions based on prior history, such as:
- "You mentioned in our meeting 8 months ago that you were thinking about restructuring your business to an S-Corp or C-Corp, and I recommended breaking up your LLC into two different S-Corps. Did you end up speaking with your CPA on how to go about this or would you like me to recommend a tax attorney for any assistance?"
- "How is your son Timothy doing since we last spoke? I remember you said he was going to have a child. Are you excited about being a grandparent?"
- Highlight opportunities: Suggest relevant topics based on curated insights from LLMs:
- New tax strategies relevant to their income bracket or business structure.
- Investment opportunities aligned with their risk tolerance and market trends.
- Potential plan adjustments for life events (e.g., inheritance, college savings).
Step 3: Post-Meeting Action Plans
The model could help:
- Draft a follow-up email summarizing key points.
- Recommend action items for the advisor, such as sending educational materials, scheduling additional meetings with specialists, or setting periodic distributions for RMDs depending on the client’s tax needs.
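Putting the steps of Example 1 together, the briefing request an advisor's tool sends to the specialized model might be assembled like this. The client fields and the function name are hypothetical illustrations, not a real CRM schema.

```python
# Sketch of assembling a meeting-prep prompt from CRM fields for the
# specialized model described above. All field names are hypothetical.

def build_briefing_prompt(client: dict) -> str:
    """Combine CRM fields into a single briefing request for the model."""
    return (
        f"Prepare a pre-meeting briefing for {client['name']}.\n"
        f"Goals: {client['goals']}\n"
        f"Risk tolerance: {client['risk_tolerance']}\n"
        f"Last meeting notes: {client['last_notes']}\n"
        "Include: portfolio summary, flagged issues, three tailored "
        "questions, and a draft post-meeting action plan."
    )
```

The structured prompt keeps the model grounded in the client's actual record, which is what makes the tailored questions in the example above possible.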
Example 2: Personalized Patient Chart and History Analysis Tool
In a hospital setting, a nurse or doctor can use a proprietary AI model trained on the hospital’s electronic health records (EHR) system to streamline patient care. This smaller, task-specific model would be trained using a large language model (LLM) to process and synthesize vast medical knowledge (e.g., textbooks, clinical guidelines, and medical articles) and then fine-tuned on the hospital’s proprietary patient data.
How It Works
1. Data Integration and Training:
- The LLM is used to curate a foundational understanding of medical conditions, treatment protocols, and diagnostic approaches.
- A smaller model is then trained on the hospital’s proprietary EHR data, including:
- Patient demographics.
- Historical diagnoses and treatments.
- Lab results and imaging reports.
- Notes from prior visits or specialists.
2. Real-Time Use by Healthcare Providers:
When a nurse or doctor meets a patient for the first time, the model can:
- Generate a Detailed Patient Summary:
- Summarize key information from the patient’s medical history, such as:
- Chronic conditions (e.g., diabetes, hypertension).
- Recent test results (e.g., abnormal blood sugar levels or cholesterol).
- Current medications and potential interactions.
- Flag urgent concerns (e.g., risk of sepsis based on vitals and lab trends).
- Propose Relevant Questions:
- Suggest tailored questions based on patient history to guide the provider’s examination:
- "Have you been experiencing shortness of breath since your last hospital visit?"
- "Are you adhering to your prescribed medication regimen for hypertension?"
- Highlight areas needing further clarification, such as unexplained symptoms or missed follow-ups.
- Assist in Charting:
- Automatically draft chart entries summarizing the visit and ensuring all required documentation is completed:
- "Patient reports persistent lower back pain for 2 weeks. No radiation to legs. Taking ibuprofen intermittently."
- Reduce documentation burden, enabling the provider to focus more on patient interaction.
3. Post-Visit Assistance:
- The model can suggest next steps for care, such as:
- Scheduling follow-up tests or imaging.
- Referring the patient to a specialist.
- Generating patient-friendly educational materials based on their diagnosis.
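The charting step in Example 2 could be sketched as follows: structured visit facts are formatted into a draft note for the provider to review and sign. The `visit` fields are hypothetical; a real system would pull them from the EHR.

```python
# Sketch of drafting a chart entry from structured visit facts.
# All field names are hypothetical illustrations, not an EHR schema.

def draft_chart_entry(visit: dict) -> str:
    """Format visit facts into a draft note the provider can edit."""
    meds = ", ".join(visit["medications"]) or "none reported"
    return (
        f"Patient reports {visit['chief_complaint']} "
        f"for {visit['duration']}. "
        f"{visit['pertinent_negatives']} "
        f"Current medications: {meds}."
    )
```

The model drafts; the clinician verifies, which is why the output is a plain editable note rather than an auto-filed record.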
Here are the main benefits the two examples share, each of which can increase company margins and productivity:
Scalability
- Doctor: Allows medical professionals to manage more patients effectively without compromising care quality.
- Financial Planner: Enables advisors to serve a larger client base while maintaining a personalized touch.
Better Client/Patient Experience
- Doctor and Financial Planner: Creates a sense of personalized attention and thorough care, improving patient satisfaction and trust.
Cost
- Doctor and Financial Planner: For doctors, these models eliminate the time spent asking nurses or administrators for patient details by providing instant, detailed summaries of medical records. Financial planners similarly save time by automating client briefing preparation and generating tailored recommendations without needing extra support staff. This efficiency allows professionals to focus on high-value tasks, improving productivity and service quality while cutting operational expenses.
These are just two examples from finance and healthcare, but the potential applications across every industry and sector are vast. The productivity gains these services can deliver to employees and businesses are likely to be significant and, over time, nearly ubiquitous. Moreover, the shift from cloud-based LLMs to smaller, specialized models is still in its early stages.
By integrating specialized AI models, businesses will continue to unlock AI's transformative potential in ways that align with their operational goals and budget. The opportunities are vast, and the creativity of business managers and executives in identifying use cases will yield significant benefits both for their companies and their customers. It’s time for employees, managers, and innovators to think creatively and use AI to their advantage. If they don’t, they will fall behind those that do.
If you have any questions regarding our research or investment strategy, please do not hesitate to contact us at (888) 486-3939.
To find out more about Financial Sense® Wealth Management, click here to contact us.
Sources for hypothetical training cost example:
- Models Table – Dr Alan D. Thompson – LifeArchitect.ai
- What is the cost of training large language models?
- Amazon EC2 – Secure and resizable compute capacity – AWS
- High-Performance Block Storage – Amazon EBS Pricing – Amazon Web Services
- [2005.14165] Language Models are Few-Shot Learners
- Amazon S3 Pricing - Cloud Object Storage - AWS
- Salary: Machine Learning Engineer in United States 2024 | Glassdoor
- Amazon CloudWatch Pricing – Amazon Web Services (AWS)
- Application Scaling - AWS Auto Scaling - AWS
Advisory services offered through Financial Sense® Advisors, Inc., a registered investment adviser. Securities offered through Financial Sense® Securities, Inc., Member FINRA/SIPC. DBA Financial Sense® Wealth Management. Past Performance is not indicative of future results.
Copyright © 2024 Xavier Stonehouse