It is a question that more and more readers of scientific articles are asking themselves. Large language models (LLMs) are now more than good enough to help write a scientific article. They can bring dense scientific prose to life and speed up the writing process, especially for non-native English speakers. Such use also carries risks: LLMs are particularly susceptible to reproducing biases, for example, and can produce huge amounts of plausible nonsense. Just how widespread the practice has become, however, has been unclear.
In a paper recently published on arXiv, researchers from the University of Tübingen in Germany and Northwestern University in the United States provide some clarification. Their research, which has not yet been peer-reviewed, suggests that at least one in ten new scientific papers contains material produced by an LLM. That means that more than 100,000 such papers will be published this year. And that is a lower limit. In some fields, such as computer science, it is estimated that more than 20% of research abstracts contain text generated by an LLM. Among papers by Chinese computer scientists, the figure is one in three.
Detecting LLM-generated texts is not easy. Researchers have typically turned to one of two methods: detection algorithms trained to identify the telltale rhythms of human prose, and a more direct search for suspect words that LLMs disproportionately favor, such as “pivotal” or “realm.” Both approaches rely on “ground truth” data—a stack of texts written by humans and another written by machines. These are surprisingly difficult to collect: both human- and machine-generated texts change over time, as languages evolve and models are updated. Moreover, researchers typically collect LLM texts by prompting these models themselves, and the way they do this can be different from the way scientists behave.
The latest research by Dmitry Kobak of the University of Tübingen and colleagues shows a third way, one that bypasses the need for ground-truth data altogether. The team’s method is inspired by demographic work on excess deaths, which allows one to determine mortality associated with an event by looking at differences between expected and observed death counts. Just as the excess deaths method looks for abnormal mortality rates, their excess vocabulary method looks for abnormal word usage. Specifically, the researchers looked for words that appear in scientific abstracts significantly more frequently than expected from the existing literature (see figure 1). The corpus they chose to analyze consisted of the abstracts of virtually all English-language articles available on PubMed, a biomedical research search engine, published between January 2010 and March 2024—about 14.2 million in total.
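The comparison of expected and observed word usage can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' actual procedure: the article does not say how the expected frequency is computed, so the sketch assumes a simple linear extrapolation from earlier years. The function names and all numbers are hypothetical.

```python
def expected_frequency(history, target_year):
    """Extrapolate a word's frequency to target_year.

    history: list of (year, share_of_abstracts_containing_word) pairs.
    Assumption: a least-squares straight line through the baseline years.
    """
    n = len(history)
    mean_year = sum(year for year, _ in history) / n
    mean_freq = sum(freq for _, freq in history) / n
    sxx = sum((year - mean_year) ** 2 for year, _ in history)
    sxy = sum((year - mean_year) * (freq - mean_freq) for year, freq in history)
    slope = sxy / sxx if sxx else 0.0
    # A frequency cannot be negative, so clamp the projection at zero.
    return max(0.0, mean_freq + slope * (target_year - mean_year))


def excess_usage(history, target_year, observed):
    """Return (expected, gap, ratio) for one word in the target year."""
    expected = expected_frequency(history, target_year)
    gap = observed - expected  # excess in absolute terms
    ratio = observed / expected if expected > 0 else float("inf")
    return expected, gap, ratio


# Made-up baseline: the word appears in 0.10% and 0.12% of abstracts.
history = [(2021, 0.0010), (2022, 0.0012)]
expected, gap, ratio = excess_usage(history, 2024, observed=0.0100)
print(f"expected={expected:.4f} gap={gap:.4f} ratio={ratio:.2f}")
```

A word whose observed share sits far above the projected one, by gap or by ratio, would count as excess vocabulary in this simplified picture.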
The researchers found that in most years, word usage remained relatively stable: In no year between 2013 and 2019 did a word increase in frequency beyond what was expected by more than 1%. That changed in 2020, when the words “SARS,” “coronavirus,” “pandemic,” “disease,” “patients,” and “severe” all skyrocketed. (COVID-related words continued to command abnormally high usage into 2022.)
In early 2024, about a year after LLMs like ChatGPT became widely available, a different set of words became popular. Of the 774 words whose usage increased significantly between 2013 and 2024, 329 became popular in the first three months of 2024. Of these, 280 were related to style, rather than topic. Notable examples include: “delve deeper,” “potential,” “intricate,” “meticulously,” “crucial,” “significant,” and “perspectives” (see chart 2).
According to the researchers, the most likely explanation for these increases is LLM assistance. When they calculated the proportion of abstracts that used at least one of the excess words (omitting words that are widely used anyway), they found that at least 10% had probably been written with LLM input. Since PubMed indexes about 1.5 million articles a year, that would mean that more than 150,000 articles a year are currently written with the help of LLMs.
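The arithmetic behind that estimate can be made concrete with a short Python sketch. It is a simplification that assumes the lower bound is the observed share of abstracts containing at least one marker word minus the share expected without LLMs. The marker list, the sample abstracts and the function names are all hypothetical; only the 10% and 1.5 million figures come from the article.

```python
import re


def share_with_any(abstracts, markers):
    """Fraction of abstracts containing at least one marker word."""
    if not abstracts:
        return 0.0
    hits = 0
    for text in abstracts:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        if tokens & markers:
            hits += 1
    return hits / len(abstracts)


def lower_bound_papers(observed_share, expected_share, papers_per_year):
    """Scale the excess share up to a yearly count of affected papers."""
    excess = max(0.0, observed_share - expected_share)
    return excess * papers_per_year


# Hypothetical marker words and toy abstracts, for illustration only.
markers = {"delves", "intricate", "meticulously"}
abstracts = [
    "This study delves into the intricate role of protein folding.",
    "We measured blood pressure in 200 patients.",
    "Samples were meticulously prepared before sequencing.",
    "A randomized trial of two vaccines is reported.",
]
observed = share_with_any(abstracts, markers)

# The article's figures: an excess of at least 10% of about
# 1.5 million PubMed articles a year.
estimate = lower_bound_papers(0.10, 0.0, 1_500_000)
print(observed, round(estimate))
```

The result is a lower bound because abstracts written with LLM help that happen to avoid every marker word are not counted.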
LLM use seems to be more widespread in some fields than in others. The researchers found that computer science had the highest rate, at over 20%, while ecology had the lowest, with a lower limit of just under 5%. There was also variation by geography: scientists in Taiwan, South Korea, Indonesia, and China were the most frequent users, and those in Britain and New Zealand used LLMs the least (see figure 3). (Researchers in other English-speaking countries also used LLMs infrequently.) Different journals also yielded different results. Journals in the Nature family, as well as other prestigious publications such as Science and Cell, appear to have a low rate of LLM assistance (below 10%), while Sensors (a journal about, unimaginatively, sensors) exceeded 24%.
The results of the excess-vocabulary method are roughly consistent with those of older detection algorithms, which analyzed smaller samples from more limited sources. For example, in a preprint published in April 2024, a Stanford team found that 17.5% of sentences in computer science abstracts were likely generated by LLMs. They also found a lower prevalence in Nature publications and math papers (LLMs are terrible at math). The excess vocabulary identified also fits with existing lists of suspect words.
These results should not come as too much of a surprise. Researchers routinely acknowledge the use of LLMs to write papers. In a survey of 1,600 researchers in September 2023, more than 25% told Nature that they used LLMs to write manuscripts. The biggest benefit identified by respondents, many of whom studied or used AI in their own work, was help with editing and translation for those whose first language was not English. Faster and easier coding came in second, along with streamlining administrative tasks, summarizing or tracking the scientific literature and, tellingly, speeding up the writing of research manuscripts.
Despite all these benefits, using LLMs to write manuscripts is not without risks. Scientific papers depend, for example, on the accurate communication of uncertainty, which is an area where LLMs’ capabilities remain murky. Hallucination (whereby LLMs confidently assert their fantasies) remains common, as does the tendency to regurgitate other people’s words, verbatim and without attribution.
Studies also indicate that LLMs preferentially cite other papers that are highly cited in a field, potentially reinforcing existing biases and limiting creativity. As algorithms, they also cannot be credited as authors of a paper or take responsibility for errors they introduce. Perhaps most worryingly, the speed at which LLMs can produce prose risks flooding the scientific world with low-quality publications.
Academic policies on the use of LLMs are changing. Some journals ban them outright, while others have changed their minds. Until November 2023, Science labelled all LLM texts as plagiarism, saying: “Ultimately, the product must come from, and be expressed by, the wonderful computers in our heads.” They have since modified their policy: LLM texts are now permitted if detailed notes on how they were used are provided in the methods section of papers as well as in accompanying cover letters. Nature and Cell also allow their use, provided it is clearly acknowledged.
It is unclear to what extent these policies will be enforceable. For now, there is no reliable method for detecting LLM prose. Even the excess-vocabulary method, while useful for detecting large-scale trends, cannot determine whether a specific abstract had input from an LLM. And researchers need only avoid certain words to evade detection entirely. As the new preprint puts it, these are challenges that must be meticulously addressed.
© 2024, The Economist Newspaper Limited. All rights reserved. From The Economist, published under license. The original content can be found at www.economist.com