What is open-source software?
Open-source software, or free and open-source software (FOSS), is the collective name for software whose source code is publicly available and which is released under an open licence that grants anyone, without charge or royalty, the right to use, modify, and distribute it (in its original or modified form). This contrasts with typical proprietary software licences, which usually restrict the licensee from accessing the source code, using the software across multiple users, locations, or computers, and making and distributing copies. FOSS emphasises freedom over restriction. Although FOSS is generally available free of charge (gratis), the word ‘free’ is intended to connote liberty, not price. There are also subtle differences between free software and open-source software. The term ‘free software’ is more closely associated with the Free Software Foundation, whose philosophy differs from that of the open-source movement: it places more emphasis on the ethical and philosophical aspects of software freedom and focuses on the individual user.
Open-source software development has grown from a niche activity into the mainstream, and open-source software now provides the foundation for virtually all modern software development. Open-source components underpin Apple’s iOS and macOS operating systems, Google’s Android phones, the software running in cars and on consumer devices, and much of the infrastructure behind data centres and the internet.
People often believe that open-source software is completely unrestricted. This is not the case: it is still subject to software licences. There are many different licences, and although they all guarantee the freedom to use, modify, examine, and distribute the code, they may still impose conditions. These conditions may be straightforward (such as retaining disclaimers and copyright notices in redistributed code) or more complex (such as requiring redistributed code to be licensed under the same open-source licence and the source code of any modifications to be made available).
‘Open source’ is the opposite of ‘closed source’, which in the open-source world is normally called ‘proprietary’. Proprietary code (sometimes inaccurately referred to as commercial software) is owned and developed by a company or individuals and is typically licensed only in object-code (compiled) form, with the source code usually kept secret. Sometimes the source code is provided for specific purposes, but it usually cannot be redistributed and may be released only under a non-disclosure agreement. This type of licence is sometimes called a ‘shared source’ agreement and, despite providing access to the source code, is still a proprietary, non-open-source licence.
The Open Source Initiative provides a definition of open-source software
The Open Source Initiative (OSI) is a California public benefit corporation actively involved in open-source community building, education, and public advocacy, with the aim of promoting awareness of the importance of non-proprietary software.
The OSI created the ‘Open Source Definition’ (OSD), a set of ten criteria that a licence must meet to qualify as an OSI-approved open-source software licence. Transactional lawyers frequently use the OSD to define open-source software in contracts.
Developers are not required to select any specific licence in order to describe their software as ‘open source’. However, the OSI will consider software open source only if its licence terms meet all of the OSD requirements. The OSI runs a review process to determine whether a licence complies with the OSD; licences that comply are approved.
Note that software may be open source even if its licence is not on the approved list: provided that the licence meets the OSD criteria, the software is open source. The OSI has a triage process intended to limit the approval of new licences (even potentially OSD-compliant ones), partly to reduce the workload on its approvals committee but mainly to reduce licence proliferation. The prevailing view in the open-source community is that more than enough licences already exist to cover almost every possible use case, so developers should select an existing approved licence rather than write their own or adopt a non-approved one.
Why is open-source artificial intelligence more complex?
Many of the artificial intelligence (AI) models that have recently attracted widespread attention, such as OpenAI’s ChatGPT and Google’s Gemini, are proprietary or ‘closed’ AI models. Their internal workings, algorithms, and training data are not made publicly available, and access is provided under commercial licences and overseen by the respective providers. By contrast, open-source AI models are, broadly speaking, made available under licences that allow anyone to use, modify, and distribute them without oversight by the licensor(s). However, the relationship between AI and open source, and the precise definition of open-source AI, are complex for several reasons:
• AI systems have complex components—AI systems have inherent complexities that distinguish them from traditional software, including components that are not always covered by standard open-source software licences. In its Model Openness Framework (MOF), the Linux Foundation identifies 16 critical components that it states ‘constitute a truly complete model release’. These include the core model (architecture, parameters, and documentation); the full suite of code used for training, evaluating, and running the model; the key datasets and raw training data; a thorough research paper detailing the entire model development process; and further artefacts such as intermediate checkpoints and log files.
Data is a particularly important aspect of AI systems. In a blog post dated 9 February 2024, the Linux Foundation emphasises that ‘[w]hile software played a central role in the evolution of IT systems over their first few decades, data has played the central role in the advances of AI over the past 20 years. Data is not merely the fuel for AI systems but the determining factor in the system’s overall quality.’ Despite this, companies describing their AI model releases as ‘open source’ do not always publish the underlying datasets and sometimes add commercial terms to their licences, which has drawn criticism of their use of the term.
• Barriers to access—Developing large-scale AI systems also requires significant computational, financial, and research resources. Access to these resources has generally been concentrated among a small number of leading tech companies, such as Microsoft, Google, Meta, and Amazon. Large developers have therefore been able to maintain their hold on state-of-the-art development and have generally chosen to preserve their proprietary rights over key developments. Limited access to computing power can be a significant practical obstacle to reusing and further developing large models, and some providers embed open models in proprietary ecosystems.
Some large tech companies, such as Meta and IBM, publicly support the open release of AI models. Meta has made its Llama 3 model available under a relatively permissive licence and published the model weights and inference code, a release that many consider significant. The open-source company Hugging Face wrote in a blog post dated 18 April 2024: ‘It’s wonderful to see Meta continuing its commitment to open AI, and we’re excited to support the launch fully with comprehensive integration in the Hugging Face ecosystem.’ However, Meta has also been criticised for using the term ‘open source’ without releasing the underlying datasets for Llama 3. The Llama 3 licence also requires companies with more than 700 million monthly active users to request a separate licence from Meta, which Meta may grant at its sole discretion, effectively restricting use of the model by Meta’s largest competitors.
There are, however, indications that these barriers to access may be starting to change. In January 2025, the Chinese start-up DeepSeek disrupted the AI market by releasing a freely available reasoning model, R1, reported to have capabilities similar to those of OpenAI’s o1 model. DeepSeek claimed that the model was developed at a cost of around £6m and that training required significantly less computing power, using less powerful Nvidia H800 GPUs. Some media outlets were sceptical of these claims; however, DeepSeek’s technical paper for R1 indicated that the company had made innovative use of efficient algorithms and optimised its use of hardware. DeepSeek’s innovation may signal a trend towards greater accessibility of AI models.
In conclusion
Open-source software continues to be a pillar of creativity, cooperation, and accessibility as the software development landscape evolves. The values of openness and collaborative development have significantly shaped the digital infrastructure that supports much of the modern world. However, the intersection of artificial intelligence and open source presents novel and complex challenges. The nature of AI systems, especially the significance of training data and computational resources, raises subtle considerations that traditional open-source frameworks do not fully address, even though the fundamental open-source freedoms to use, modify, and distribute remain applicable.
There are serious concerns about what exactly qualifies as open-source AI, given the growing trend of releasing AI models under ‘open’ terms without full transparency or access to the underlying datasets. Clearer definitions, strong governance, and responsible licensing that account for the distinctive characteristics of AI are urgently needed as stakeholders across the tech ecosystem, from large corporations to emerging start-ups, navigate this changing environment.
Ultimately, if open-source AI is to survive, a collective effort will be required to uphold the principles of openness while adapting to the ethical and technical demands of advanced AI development. Striking the right balance between innovation, accountability, and fair access will require sustained commitment from developers, legislators, and the general public, but a more inclusive, transparent, and accessible AI ecosystem is achievable.


