
Does ChatGPT understand climate change correctly?

Artificial intelligence at the limit

ChatGPT can be a valuable source of information on climate change, but it can also spread misinformation.


Berlin researchers are investigating how reliably ChatGPT provides scientifically sound information on climate change. They found that the AI usually gives correct answers, but that it should never be trusted blindly. Checking sources is more important than ever, yet anything but easy.

ChatGPT and other large language models based on machine learning and large data sets are penetrating almost all areas of society. Companies and researchers who do not make use of them are increasingly seen as anachronistic. But is the information provided by artificial intelligence reliable enough? Scientists at the Technical University of Berlin have tested this using the example of climate change: they asked ChatGPT questions on the subject and examined the answers for accuracy and relevance, as well as for errors and contradictions.

Its impressive capabilities have made ChatGPT a potential source of information on many different topics, the Berlin team writes in the paper published in "Ökologisches Wirtschaften". However, not even the developers themselves can explain how a particular answer comes about. That may be acceptable for creative tasks such as writing a poem, but for topics such as the consequences of climate change, where accurate, fact-based information is important, it is a problem.

It is therefore important to examine the quality of the answers that ChatGPT provides in such subject areas, the researchers argue. Among other things, this means separating misinformation circulating in public debate and the media from scientifically sound findings.

Hallucinations and pointless assumptions

This is not easy. To make matters worse, the AI can "hallucinate". This means that ChatGPT makes factual claims that cannot be substantiated by any sources. In addition, the language model tends to "make meaningless assumptions instead of rejecting unanswerable questions", according to the TU team.

The great danger is that ChatGPT users take incorrect or false answers at face value because they are worded plausibly and semantically correctly. Previous research has shown that people give more weight to the AI's advice when they are unfamiliar with the topic in question, have used ChatGPT before, and have previously received accurate advice from the model, the researchers write.

The Berlin team has a particular interest in the topic because, as part of the Green Consumption Assistant research project, it is developing an AI-supported assistant that helps consumers make more sustainable purchasing decisions online. Previous research had merely highlighted ChatGPT's possibilities but had not examined its ability to answer questions about climate change, the researchers write.

To clarify this, they asked ChatGPT a total of 95 questions and evaluated the answers for accuracy, relevance, and consistency. The team checked the quality of the answers against publicly available, reliable sources of information on climate change, such as the latest report of the Intergovernmental Panel on Climate Change (IPCC).
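The paper does not detail the researchers' tooling, but the procedure is easy to picture. As a minimal sketch, assuming the questions were posed programmatically via OpenAI's Python client rather than the ChatGPT web interface (the model name and data structure here are illustrative, not taken from the study):

```python
# Minimal sketch of one question round; assumes the OpenAI Python client
# and an OPENAI_API_KEY environment variable. Model name is illustrative.
from openai import OpenAI

client = OpenAI()

# Two of the study's questions quoted in this article; the full set had 95.
questions = [
    "How is marine life affected by climate change and how can negative impacts be reduced?",
    "What percentage of recyclable waste is actually recycled by Germany?",
]

answers = []
for q in questions:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # the study's first round used ChatGPT-3.5
        messages=[{"role": "user", "content": q}],
    )
    # Store question and answer for later expert rating on accuracy,
    # relevance, and consistency.
    answers.append({"question": q, "answer": resp.choices[0].message.content})
```

The stored answers would then be rated by experts against reliable references such as the IPCC report.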

Mostly high-quality answers

The researchers took into account that the language model is constantly being developed further. For example, they checked whether the same input (prompt) delivered different results at different points in time. The first round took place last February with ChatGPT-3.5, while the second set of questions was run in mid-May this year with the model's subsequent version. Recently, its knowledge base was updated and now extends to April 2023; previously, the model only had information up to September 2021.

The results could therefore be different today. For follow-up studies, the researchers suggest further rounds of questions at shorter intervals. They also see limitations of their work in the possibly insufficient number of experts who evaluated the answers. In addition, the questions and their wording were not based on current user data: people today might ask ChatGPT different questions, phrased differently, which could produce different results.

The now-published study found that the quality of the model's answers is generally high: on average, they were rated 8.25 out of 10 points. "We observed that ChatGPT provides balanced and nuanced arguments and concludes many answers with a comment that encourages critical examination in order to avoid biased answers," says Maike Gossen from TU Berlin. For example, ChatGPT's answer to the question "How is marine life affected by climate change and how can negative impacts be reduced?" mentioned not only the reduction of greenhouse gas emissions, but also the reduction of non-climatic impacts of human activities such as overfishing and pollution.

A significant error rate

More than half of the answers received the top accuracy score of 10 points. However, one should not count on the results always being this good: 6.25 percent of the answers scored no more than 3 points for accuracy, and 10 percent scored no higher than 3 for relevance.
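For illustration, such shares are straightforward to compute from expert ratings. A small sketch with invented scores (the function and values are hypothetical; only the thresholds mirror the paper's figures):

```python
# Hypothetical summary of expert ratings on the study's 10-point scale.
def summarize(scores: list[int]) -> dict[str, float]:
    n = len(scores)
    return {
        "mean": sum(scores) / n,                        # paper reports 8.25 overall
        "share_top": sum(s == 10 for s in scores) / n,  # more than half of answers
        "share_low": sum(s <= 3 for s in scores) / n,   # 6.25% for accuracy
    }

accuracy_scores = [10, 10, 10, 10, 10, 9, 8, 3]  # invented example values
print(summarize(accuracy_scores))
```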

Among the inaccurately answered questions, the most common source of error was hallucinated facts. For example, ChatGPT's answer to the question "What percentage of recyclable waste is actually recycled by Germany?" was correct in broad strokes but not in detail: according to the Federal Environment Agency, the figure was 67.4 percent in 2020, while ChatGPT stated 63 percent.

ChatGPT invents, but appears credible

In some cases, ChatGPT generated false or fabricated information, such as invented references or fake links, including links to supposed articles in scientific publications. Other errors occurred where ChatGPT cited concrete and correct scientific sources or literature but drew false conclusions from them.

The researchers also observed that inaccurate answers were formulated so plausibly by ChatGPT that they were falsely perceived as correct. "Since text generators like ChatGPT are trained to give answers that sound right to people, the confident answering style can mislead people into believing that the answer is correct," says Maike Gossen.

The team also came across misinformation and biases picked up from public discourse. For example, some of ChatGPT's incorrect answers reflected misconceptions about effective climate action: they overvalued individual behavioral changes and low-impact individual measures, which can slow down structural and collective changes with greater effect. At times, answers also seemed overly optimistic about technological solutions as a key way to mitigate climate change.

Valuable but fallible source

Large language models such as ChatGPT can be a valuable source of information on climate change, the researchers conclude. However, there is a risk that they spread and reinforce false information about climate change because they reflect outdated facts and misconceptions.

Their short study shows that checking sources of environmental and climate information is more important than ever. However, recognizing false answers often requires detailed expertise in the respective subject area, precisely because they appear plausible at first glance.


Source: www.ntv.de
