LUCIE: towards open source, ethical and sovereign AI in France!

A look back at the talk given by our AI Program Director, Andrzej Neugebauer, at PIAF Saclay on the challenges of open source AI.

In an environment dominated by American and Chinese tech giants, we at LINAGORA believe in a different path: open source artificial intelligence that is transparent, ethical and in line with our own values.
Andrzej takes us behind the scenes of LUCIE, a compact, sovereign and ethical language model for the French-speaking world, developed with the OpenLLM France/Europe community.

LUCIE is an AI that is:

  • Entirely open source, covering both the model and its training data,
  • Ethical, respecting privacy and copyright,
  • Sovereign, developed in France with no dependence on foreign infrastructure,
  • Francophone, optimised for European languages.

The LUCIE model has 7 billion parameters and was trained on 3,000 billion tokens of carefully selected data, collected with respect for copyright and privacy:

  • All newspapers, monographs, magazines and legislative documents in the corpus, as well as most of its books, are in the public domain.
  • Other data is published under permissive licences (CC BY or CC BY-SA).
  • All web data comes from websites that do not object to scraping.

The data sets are therefore open source, transparent and ethical.