Tyler Weitzman, cofounder, President and Head of AI at Speechify.
Every day, open-source AI technology powers more and more models. The 2023 “Open Source Security and Risk Analysis” report by Synopsys found that 76% of code scanned in codebases was open source. In artificial intelligence, with an ecosystem dependent on public research and baseline dependencies, I believe the percentage is likely even higher.
A few months ago, a software engineer at Google released a document on a public Discord server. He said the “gap is closing” in terms of the edge Google and OpenAI have on open-source technology. The document states, “We (Google) have no moat, and neither does OpenAI.”
According to the leaked document, the engineer thinks open-source AI will outclass Google and OpenAI because the open-source community has now solved the “major open problems” and made the solutions accessible to the general public. OpenAI’s and Google’s models are still better in quality, the document said, but “open-source models are faster, more customizable, more private, and pound-for-pound more capable.”
At my company, we have combined closed-source AI research with open-source language and audio models to create text-to-speech tools for users, and this experience has shown me the capabilities of open-source AI in today’s fast-paced digital landscape.
Benefits And Risks Of Open-Source AI
In the early days, implementing large language models was considered an expensive approach to driving research and innovation. But today, open-source LLMs can provide benefits for businesses with little implementation investment. Here are some of the benefits of open-source LLMs versus paying for third-party APIs:
• Security: Integrating LLMs into their own infrastructure can give companies control over their data and secure their sensitive information, whether on-premises or in the cloud. This can help prevent unauthorized access and data leaks.
• Transparency: An open-source LLM also gives companies visibility into its working mechanism, training data and architecture. This can make it easier to understand the shortcomings and biases of the model, as well as the tasks in which it may specialize.
• Price: In my experience, open-source LLMs are typically less expensive than proprietary LLMs, as they primarily involve hosting fees rather than margins and licensing fees paid to the developer. Models are available in a variety of sizes, and a range of open-source performance optimizations can lower the cost further.
• Customization: Pretrained open-source LLMs can be fine-tuned and customized. Businesses can specialize these models on their own datasets and benefit from free community development support for issues they encounter.
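To make the price comparison concrete, here is a minimal back-of-the-envelope sketch of a break-even calculation between a pay-per-token API and a self-hosted model. Every number and function name in it is a hypothetical placeholder chosen for illustration, not a real vendor price, and it ignores engineering time, which can be significant.

```python
def monthly_api_cost(tokens_per_month: int, price_per_1k_tokens: float) -> float:
    """Cost of a pay-per-token API for one month of traffic."""
    return tokens_per_month / 1000 * price_per_1k_tokens


def monthly_hosting_cost(gpu_hourly_rate: float, gpus: int, hours: float = 730.0) -> float:
    """Cost of keeping `gpus` instances running all month (about 730 hours)."""
    return gpu_hourly_rate * gpus * hours


def break_even_tokens(price_per_1k_tokens: float, hosting_cost: float) -> float:
    """Monthly token volume above which self-hosting costs less than the API."""
    return hosting_cost / price_per_1k_tokens * 1000


# Hypothetical figures for illustration only (assumptions, not real prices).
API_PRICE_PER_1K = 0.002  # dollars per 1,000 tokens
GPU_HOURLY_RATE = 1.50    # dollars per GPU-hour

hosting = monthly_hosting_cost(GPU_HOURLY_RATE, gpus=1)
threshold = break_even_tokens(API_PRICE_PER_1K, hosting)

print(f"Hosting: ${hosting:,.2f}/month")
print(f"Break-even volume: {threshold:,.0f} tokens/month")
```

Below the break-even volume, the API is the cheaper option in this simplified model; above it, fixed hosting costs are spread over enough traffic to win out.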
However, businesses should also consider some shortcomings of self-hosting open-source AI before using or offering this technology. Open-source software often requires a larger time investment than paying an API provider, and the community is not paid to solve every issue for you, so users often have to debug problems themselves and search for answers on online forums. Some tools might also be less user-friendly, which can limit their usage to experienced developers or data scientists.
I’ve found that many open-source AI solutions also lack quality assurance and testing, which can lead to sudden bugs or instability. There can also be security concerns, and there’s a need to validate that the repository has the appropriate licenses for a company’s intended use. I’d recommend organizations address and plan proactively for these challenges.
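As one way to plan for the licensing concern, a team could keep a simple inventory of the models it uses and flag any whose declared license is not on an approved list. The sketch below assumes a hypothetical allowlist and made-up model names; which licenses are acceptable is a decision for legal counsel, not something the code can determine.

```python
from dataclasses import dataclass

# Hypothetical allowlist: license identifiers a legal team has approved
# for commercial use. The real list is a policy decision.
APPROVED_FOR_COMMERCIAL_USE = {"apache-2.0", "mit", "bsd-3-clause"}


@dataclass(frozen=True)
class ModelRecord:
    name: str
    license_id: str  # identifier as declared by the repository


def needs_legal_review(models: list[ModelRecord]) -> list[ModelRecord]:
    """Return the models whose declared license is not on the allowlist."""
    return [
        m for m in models
        if m.license_id.strip().lower() not in APPROVED_FOR_COMMERCIAL_USE
    ]


# Made-up inventory entries for illustration.
candidates = [
    ModelRecord("example-model-a", "Apache-2.0"),
    ModelRecord("example-model-b", "custom-community-license"),
    ModelRecord("example-model-c", ""),  # missing license: always flagged
]

for model in needs_legal_review(candidates):
    label = model.license_id or "no license declared"
    print(f"Review before use: {model.name} ({label})")
```

A check like this only catches what the repository declares; it does not replace reading the license terms themselves.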
Organizations Releasing Open-Source Models
Given the rising popularity of open-source models, many growing startups and bigger corporations, such as Meta, have made open-source AI part of their strategy. Some examples of open-source models include:
• Llama 2: Developed by Meta, Llama 2 is a family of pretrained LLMs with up to 70 billion parameters. Its license permits commercial use.
• Vicuna and Alpaca: These models were fine-tuned from Llama to follow instructions and hold conversations, with the aim of approaching the quality of proprietary chat models.
• Falcon: The Falcon LLM is the Technology Innovation Institute’s open-source model that can be used for research or commercial use.
• Bloom: BigScience released Bloom, a multilingual language model built by 1,000-plus researchers.
• MPT-7B and MPT-30B: Databricks acquired MosaicML, an open-source startup that developed MPT-7B and MPT-30B.
• Mistral 7B and Mixtral 8x7B: Mistral AI, a French startup, released Mistral 7B, a 7.3 billion parameter model that surpassed larger models in benchmarks, and Mixtral 8x7B, a 46.7 billion parameter model that uses a mixture-of-experts architecture. Both are released under the Apache 2.0 license.
Some other open-source models include Google’s FLAN-T5, Hugging Face and ServiceNow’s StarCoder (from the BigCode project), Together AI’s RedPajama-INCITE-3B and Cerebras’ Cerebras-GPT. These models are built with modern training techniques, and releasing them lets companies become a part of the open-source community.
Why did Meta open-source Llama? According to a company blog post titled “Introducing LLaMA: A foundational, 65-billion-parameter large language model,” releasing the model was meant to further democratize access for researchers who lack the infrastructure to study large models. In other cases, companies might open-source to attract customers and developers. I believe OpenAI, for example, built much of its initial branding by open-sourcing the earlier versions of its GPT systems.
Meanwhile, I’ve found that many individual developers believe enough in the mission of democratized AI that they contribute to open-source voluntarily and at no charge. Companies that open-source can benefit from contributions from this community of developers.
Getting Started With Open-Source AI
The future seems promising for the open-source community, and, from my view, it won’t take much longer for this technology to catch up to the proprietary models of large organizations. Thus, I believe participating in this ecosystem will position businesses to thrive in the market.
It is also important to address the security, usability and stability concerns surrounding open-source AI. Today, I find that people want little to no restrictions when using LLMs. Corporations will need to take the necessary steps to establish themselves as leaders in the open-source community while safeguarding against abuse.
Companies should also ensure effective governance around open-source models with robust legal frameworks that mitigate threats. Implementing these strategies can help businesses set themselves up for success in the open-source AI ecosystem.
Forbes Business Council is the foremost growth and networking organization for business owners and leaders.