
ChatGPT: Data is at the heart of the AI process

by admin

ChatGPT has been downloaded by millions of users within a few months, bringing the question of data sources back to the fore. What data does artificial intelligence rely on? Can we trust it blindly?

The discriminatory behavior observed in many AI systems that preceded ChatGPT raises the question of the quality and diversity of the data used to train them. To create trustworthy AI systems, organizations can play a key role by making their data “shareable”, that is, by opening up the non-personal, non-confidential data that forms the invaluable mass of knowledge they accumulate through their activities. Unless organizations willingly share this information, AI will not reflect the diversity of subjects and actors, and will produce false or partial information.

Since its launch, ChatGPT has been the talk of the tech world. Designed by the American company OpenAI, this artificial intelligence, downloaded by millions of Internet users since its release last November, fascinates and worries at the same time. A conversational AI, ChatGPT can, at a user’s request, write long texts on specific topics, summarize a long document, draw up an agenda, generate ideas for writing a book or even write one, solve a complex arithmetic operation, spot errors in developers’ code, and so on. ChatGPT is an AI that seduces by its simplicity of use and democratizes AI among the general public. But how does it work? What data does this AI rely on to produce information? While web content is widely used, other data produced by companies or public bodies also feeds this knowledge base. In all, ChatGPT concentrates hundreds of billions of data points, but it does not ingest data from the web on the fly, and for the time being its knowledge base stops in 2021. By leaving out data produced since that date, ChatGPT deprives itself of a whole block of knowledge that could have been processed, which for certain requests undoubtedly leads to different answers. Given such truncated data, can we blindly trust ChatGPT?

Bias: A flaw inherent in many AI systems

Many AI systems older than ChatGPT have revealed discriminatory biases. We remember Google’s advertising system, which showed ads for high-paying jobs to men more often than to women; Microsoft’s Tay chatbot, whose racist remarks were posted on social networks; or Facebook’s content recommendation algorithm, which equated Black people with monkeys. Other societal, cultural, and economic biases produced by corporate AI systems have also been noted. But where do they come from? From the design of the algorithms and from the quality and quantity of the data that feeds them. Once an algorithm has been trained on data that flattens the complexity of a subject or incorporates the programmer’s cognitive biases, its conclusions are poor, and the AI loses its usefulness. Combating these abuses therefore means making the professionals responsible for algorithm design aware of their own prejudices, and using representative data sets to avoid distorting the training process, as sketched below.
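As a minimal illustration of that last point, the Python sketch below assumes a tabular training set carrying a hypothetical demographic attribute (the field names “group” and “label” are illustrative, not drawn from any real system) and simply reports how each group is represented before training.

from collections import Counter

# Hypothetical training examples: each record carries a demographic
# attribute ("group") alongside its label. Both field names are
# illustrative, not taken from any specific system.
training_data = [
    {"group": "women", "label": "hired"},
    {"group": "men", "label": "hired"},
    {"group": "men", "label": "hired"},
    {"group": "men", "label": "rejected"},
    {"group": "women", "label": "rejected"},
]

def representation_report(records, attribute="group"):
    """Count how often each value of a demographic attribute appears,
    so obvious imbalances can be spotted before training."""
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

print(representation_report(training_data))
# e.g. {'women': 0.4, 'men': 0.6} -- a large skew would suggest
# collecting more data or reweighting before training.

A check this simple obviously does not prove a data set is representative, but making such reports routine is one concrete way to surface the distortions described above.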

Publishing shareable data reduces AI drift

In 2018, the mathematician and MP Cédric Villani insisted on the question of data when presenting his report on artificial intelligence: “Data is the raw material for contemporary AI, and the emergence of many uses and applications depends on it.” Companies therefore have every interest in adopting a strategy of making their shareable data (non-personal and non-confidential data) available, both for reuse and for AI training. Indeed, choosing not to use external data significantly reduces the richness of a company’s analytics, a situation that can lead to poor decisions regarding business, research and development, or customer relationships. On the supply side, depriving the market of one’s shareable data, and therefore of knowledge and objectivity, does not contribute to the creation of trustworthy AI. Making all shareable data available has thus become an economic and competitive issue for the entire French business ecosystem. However, an open data strategy is not without impact on corporate information systems, which must be able to protect sensitive or confidential data and to anonymize shareable data in order to produce open data while respecting the legal data protection framework; a minimal sketch of such anonymization follows.
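The deliberately simplified Python sketch below assumes hypothetical customer records (the field names and the salt are illustrative): direct identifiers are dropped and the customer ID is replaced by a salted hash. Strictly speaking this is pseudonymization rather than full anonymization; a real pipeline would also assess re-identification risk before any publication.

import hashlib

# Hypothetical customer records: field names are illustrative only.
records = [
    {"customer_id": "C-1001", "name": "Alice Martin",
     "email": "alice@example.com", "city": "Lyon", "orders": 12},
    {"customer_id": "C-1002", "name": "Karim Ben Ali",
     "email": "karim@example.com", "city": "Lille", "orders": 3},
]

DIRECT_IDENTIFIERS = {"name", "email"}  # removed outright

def anonymize(record, salt="replace-with-a-secret-salt"):
    """Drop direct identifiers and replace the customer ID with a
    salted hash, so records can still be linked together without
    exposing the original identifier."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    digest = hashlib.sha256((salt + record["customer_id"]).encode()).hexdigest()
    cleaned["customer_id"] = digest[:16]
    return cleaned

shareable = [anonymize(r) for r in records]
print(shareable)  # only non-identifying fields remain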

Today we are also seeing a rise in synthetic data production. Created by artificial intelligence algorithms from original public or anonymized data, this synthetic data has the same characteristics as the original data. The acceleration of AI development is driving the growth of synthetic data production, an activity that has become a specialty in its own right, with dedicated professions.
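As a purely illustrative sketch of the principle, assuming a single anonymized numeric column, the snippet below fits a normal distribution to the original values and samples new ones from it. Real synthetic-data generators rely on much richer models (GANs, variational autoencoders, copulas) to preserve correlations across many columns.

import random
import statistics

# Hypothetical anonymized numeric data (e.g. monthly order amounts).
original = [120.0, 95.5, 130.2, 110.8, 99.9, 125.4, 105.1, 118.6]

def synthesize(values, n):
    """Draw new values from a normal distribution fitted to the
    original column, so the synthetic data reproduces its mean and
    spread without copying any real record."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [random.gauss(mu, sigma) for _ in range(n)]

synthetic = synthesize(original, n=8)
print(round(statistics.mean(original), 1), round(statistics.mean(synthetic), 1))
# The two means should be close; richer generators also preserve
# the relationships between several columns at once.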

While some companies remain wary of AI, employees who use ChatGPT in their private lives are very likely to bring AI into their professional activities anyway, a situation reminiscent of BYOD and shadow IT in the 2010s. Companies therefore also have an interest in increasing the amount of available data so that they are represented in the information produced by AI systems. This strategy will allow them to rely on trustworthy AI while maintaining their presence in the digital information landscape.
