Afterwork EPITA Alumni x LINAGORA at the Villa Good Tech

A few days ago, LINAGORA welcomed the EPITA Alumni Association to the Villa Good Tech for an afterwork session, followed by a networking cocktail, on the challenges of artificial intelligence in business. The event was an opportunity for LINAGORA's Managing Director and co-founder, Michel-Marie Maudet, to talk to Augustin Abelé and Robin Champsaur from Moqa Studio about generative AI technologies, and in particular RAG (retrieval-augmented generation) in the field of customer service.

AI has become an unavoidable topic in today's technological landscape. Generative language models, such as ChatGPT, Claude, and Perplexity, have revolutionised the way we interact with technology. However, behind these advances lies a crucial question: that of technological independence and digital sovereignty.

 

The need for a sovereign European AI

 

Technological dependence

Europe, and France in particular, has often lagged behind when it comes to major technological advances. Michel-Marie confirms: ‘If we don't act now, we will become completely dependent on American technologies.’

Today, around 80% of Internet requests pass through American servers from the first click, making it difficult for European clouds to be independent. Without action, this dependency could reach 100%. Another crucial point is dependence on GPUs: following the ban on Nvidia selling its latest GPUs to China, there is no guarantee that Europe will not be the next target. This issue goes beyond the GAFAM and touches on our technological and civilisational sovereignty.

"Basically, as soon as you type a URL and press ‘Enter’, you are already potentially in a system managed by the Americans."

Michel-Marie MAUDET

 

The importance of culture and diversity

The opinions in AI models depend on the data and the culture of their sources. If these data come mainly from American sources, the opinions and values they generate may not reflect European cultural diversity.

"Personally, I don't want my children's future to be guided by AIs that don't have the culture or knowledge of what we are today."

Michel-Marie MAUDET

 

AI models are profoundly influenced by the data they are trained on. For example, the Llama model is trained mainly on English-language data, which limits the cultural diversity it represents. Just imagine: if you ask a model to draw a house for a family near Arras or in Alsace, it will probably have difficulty creating an image that is representative of the region, because it has not been exposed to much local data: ‘the information is there, but it's less than 1% compared to 90% of the other data’.

This limitation can be seen in the models' answers: ask, for example, what the first personal computer was. Although the Micral N was the first, many models will mention the Altair 8800, because American data dominates. By choosing the training data, designers influence the model's biases and preferences, making it difficult to obtain diverse perspectives.

The OpenLLM-France initiative, led by LINAGORA and other Open Source AI players, aims to reverse these trends by developing fully open and independent AI models.

 


 

LINAGORA and the OpenLLM-France community

 

A long-standing commitment

"What we've been doing for 25 years at LINAGORA is developing alternatives to the giants!"

Michel-Marie MAUDET

 

At LINAGORA, we have been committed for 25 years to offering alternatives to the technological giants (American and Chinese) through Open Source solutions.
The OpenLLM-France initiative is part of this approach: creating AI models that are 100% open and accessible to all, offering technological sovereignty to governments and guaranteeing technological independence for the private sector.

 

OpenLLM-France: community and collaboration

The success of OpenLLM-France rests largely on the collaboration and strength of this community. The idea behind this initiative is to unite a community of Open Source generative AI enthusiasts around work that we call the ‘digital commons’. With this objective in mind, the initiative has succeeded in mobilising hundreds of specialists from laboratories and academic institutions to work together on ambitious projects such as LUCIE: the very first 100% Open Source generative AI model in training.

Michel-Marie explains:

 

"Out of 800 people (last count in September), 10%, i.e. nearly 80 people, are currently working with us. These are specialists from laboratories who have already worked on BLOOM's training, who work at GENCI, and who give us access to the Jean Zay machine."

It's a collaborative approach that reflects the community values of Open Source, enabling us to leverage our efforts and benefit from a wide range of qualified expertise.

 

Challenges and prospects for OpenLLM-France

 

Technical and financial challenges

Training large-scale AI models requires considerable resources, both in terms of data and computing power. The OpenLLM-France consortium is fortunate to be one of the winners of the France 2030 call for projects (AAP) on digital commons, for which it receives financial support from the French government. Thanks to LINAGORA's various private/public partnerships, the community also has access to substantial computational resources such as GENCI's Jean Zay supercomputer.

Michel-Marie Maudet stresses the need to use small models. For AI to be used effectively, it is preferable to create Small Language Models (SLMs) that can be run on ordinary machines, rather than Large Language Models (LLMs) that require massive resources. This preference also avoids dependence on heavy infrastructures such as data centres. These kinds of specialised models also represent a major opportunity, both economically and ecologically:

"Those famous H100 cards you've been dreaming about. There are 1,474 of them at a purchase price of 40,000 euros each. So you do need a big budget."

Michel-Marie MAUDET
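
To give a rough sense of the budget mentioned in the quote, the two figures can simply be multiplied. This is only a back-of-the-envelope sketch: it assumes all 1,474 cards are bought at the quoted unit price of 40,000 euros and leaves out servers, networking, energy and operating costs. The function and variable names are illustrative.

```python
# Back-of-the-envelope estimate using only the two figures from the quote.
# Assumption: every card is bought at the quoted unit price; servers,
# networking, energy and staffing costs are NOT included.
H100_COUNT = 1_474
UNIT_PRICE_EUR = 40_000


def gpu_purchase_cost(count: int, unit_price_eur: int) -> int:
    """Return the total purchase price of the cards, in euros."""
    return count * unit_price_eur


total_eur = gpu_purchase_cost(H100_COUNT, UNIT_PRICE_EUR)
print(f"{total_eur:,} EUR")  # 58,960,000 EUR
```

Under these assumptions, the cards alone come to just under 59 million euros.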

 

In addition, using the OpenAI API shows that queries in French cost 30% more than those in English, due to a tokenizer trained mainly on English, which leads to an increase in the number of tokens needed for French, and therefore additional costs.
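
The link between token count and cost can be sketched as follows, assuming usage is billed purely per token. The price and the token counts are hypothetical placeholders, not actual OpenAI rates or measured values: the French count is simply set 30% above the English one to mirror the figure quoted above.

```python
# Hypothetical numbers for illustration only: neither the price nor the
# token counts are real measurements or actual API rates.
PRICE_PER_1K_TOKENS_EUR = 0.01  # assumed flat per-token pricing


def request_cost(token_count: int, price_per_1k: float = PRICE_PER_1K_TOKENS_EUR) -> float:
    """Cost of a request billed purely by its number of tokens."""
    return token_count / 1000 * price_per_1k


def relative_overhead(tokens_a: int, tokens_b: int) -> float:
    """How much more a request with tokens_b costs than one with tokens_a."""
    return request_cost(tokens_b) / request_cost(tokens_a) - 1


english_tokens = 1000  # assumed token count for an English prompt
french_tokens = 1300   # same content in French, assumed to need 30% more tokens

print(f"overhead: {relative_overhead(english_tokens, french_tokens):.0%}")
```

Because billing is per token in this sketch, the overhead depends only on the ratio of token counts, not on the price itself.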

 

The outlook

OpenLLM-France aims not only to create high-performance AI models, but also to make them accessible to everyone. The LUCIE model, which is currently being trained, will be published under an Open Source licence, allowing anyone to use and modify it. This approach aims to democratise access to AI and encourage innovation.

There's no doubt that France, and more broadly Europe, has the capacity to develop high-level artificial intelligence models, even if the comparison with foreign models is not always fair: French-language models are rarely evaluated using appropriate benchmarks, which makes it difficult to compare them with other models such as Llama or ChatGPT.

The community's aim in building this LLM is to meet specific needs, such as those of the education sector. Unlike generalist models such as ChatGPT, which require an Internet connection, LUCIE will be able to operate offline, on a simple computer, an asset for teaching in France. The aim is to ensure that French pupils, particularly those at the end of primary and secondary school, have access to AI that reflects French values and culture, rather than content that is globalised and biased towards English.

The community is also turning its attention to multimodal projects, integrating voice and speech recognition in French. The model could thus be used for voice interactions in French, a functionality that English-language AIs do not offer optimally in this language. Its long-term vision is to develop ‘action models’, i.e. agents capable of carrying out specific actions, going beyond traditional large language models (LLMs).


As you can see, the challenges are many, but the prospects are promising. Through this initiative, Europe could well become a major player in the field of generative artificial intelligence.


 

 
