AI to publish knowledge: a tectonic shift

https://www.embopress.org/doi/full/10.1038/s44319-024-00119-4

Thomas Lemberger (ORCID: https://orcid.org/0000-0002-2499-4025), thomas.lemberger@embo.org

EMBO Rep (2024) 25: 1687–1689

https://doi.org/10.1038/s44319-024-00119-4

The rise of generative AI will transform scientific publishing, but it also poses risks. While AI enables the dissemination of knowledge in computable form, preserving transparency and human values will become even more important for maintaining trust in the scientific discourse.

Preserving the chain of trust

In 2023, we witnessed a pivotal moment: conversations between humans and machines became commonplace after the launch of ChatGPT. This moment reflects an era of rapid innovation in AI, propelled by papers such as the seminal “Attention is All You Need” (Vaswani et al, 2017), which has garnered more than a hundred thousand citations. This was also evident in the biomedical literature, where mentions of ‘generative AI’ and ‘large language models’ in PubMed abstracts surged from a handful in 2022 to thousands by 2023. The increase in dataset size, model complexity and computing power has enabled the creation of versatile ‘foundation models’ that can perform tasks they were not specifically trained for: so-called “zero-shot generalization”. This approach was crucial for the success of generative applications, including large language models, as well as many non-generative uses, such as advanced computer vision.
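
To make “zero-shot generalization” concrete, the short sketch below classifies a text into labels the model was never explicitly trained on. It assumes the open-source Hugging Face transformers library; the model name, example abstract, and candidate labels are illustrative choices, not taken from this article.

```python
# A minimal sketch of zero-shot generalization, assuming the Hugging Face
# `transformers` library: a model trained on natural-language inference
# classifies text into labels it was never explicitly trained on.
# The model name, abstract, and labels are illustrative, not from the article.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

abstract = ("We fine-tune a large language model to extract "
            "gene-disease associations from PubMed abstracts.")
labels = ["genomics", "clinical trial", "natural language processing"]

result = classifier(abstract, candidate_labels=labels)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.2f}")  # labels ranked by predicted relevance
```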

Generative models, by design, produce synthetic data that appear realistic, such as images of human faces or fictional stories. Depending on the context, this is either a fatal flaw when models generate ‘confabulations’—also referred to as 'hallucinations' (Smith et al, 2023)—or a powerful capability, for example when designing novel molecular structures. Progress is gradually addressing some of these limitations: see, for example, metrics on ‘hallucinations’ and ‘factual consistency’ at https://github.com/vectara/hallucination-leaderboard. Notably, generative models are increasingly integrated into broader systems where AI agents consult external databases before returning a response (Gao et al, 2024), search the internet, and autonomously cooperate with other models, thereby enhancing their power and robustness (Durante et al, 2024).
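
As a simplified illustration of the retrieval-augmented pattern described above (Gao et al, 2024), the sketch below retrieves the most relevant passages from a small external corpus before assembling a prompt. The corpus and query are toy examples, and returning the assembled prompt stands in for the call a real system would make to a language model.

```python
# A simplified sketch of retrieval-augmented generation (RAG): relevant
# passages are retrieved from an external corpus and injected into the
# prompt before a model answers. The corpus and query are toy examples;
# a real system would send the assembled prompt to a language model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Transformers rely on self-attention rather than recurrence.",
    "Retrieval grounds model outputs in verifiable source documents.",
    "Hallucinations are fluent statements unsupported by evidence.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    vec = TfidfVectorizer().fit(corpus + [query])
    scores = cosine_similarity(vec.transform([query]), vec.transform(corpus))[0]
    ranked = sorted(zip(scores, corpus), reverse=True)
    return [doc for _, doc in ranked[:k]]

def build_prompt(query: str) -> str:
    # The retrieved evidence constrains the model's answer to the sources.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("Why does retrieval reduce hallucinations?"))
```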

While this technological revolution unfolds, it is already clear that it will profoundly impact science and scientific publishing. AI touches upon each stage of what we refer to as the ‘chain of trust’: authors are entrusted with reporting genuine results; editors with selecting knowledgeable and fair reviewers; reviewers with providing competent and constructive evaluations; and publishers with disseminating papers efficiently and responsibly. Each of these stages is poised to be significantly influenced by AI in positive or concerning ways.

Assistive writing

The ability of AI models to generate content—texts and figures—that is seemingly accurate but nonetheless fabricated or even incorrect increases the risk of polluting the scientific record if authors employ such tools without due vigilance. More concerning still, AI can be misused to deliberately create fake papers, amplifying the threat of paper mills and the spread of misinformation. Whether publishers can rely on advances in forensics technology—ironically also AI-powered—to detect synthetic content remains an open question; we may enter an arms race between fraud sophistication and fraud detection with no winner in sight. ‘Watermarking’ AI-generated data or experimental raw data could be a solution, but it remains to be seen whether industry-wide standards can be adopted, let alone whether tamper-proof watermarking is even possible (Fernandez et al, 2023; Zhang et al, 2023). Ultimately, safeguarding the record requires human-based verification processes and a reinvigorated culture of individual responsibility, fostered through training and clear policies.
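
To illustrate the principle of watermarking in the simplest possible terms, the toy sketch below assigns tokens to a secret, key-derived “green list” and measures how strongly a text skews toward it. This is not the method of the papers cited above, which concern image watermarks in latent diffusion models; it only conveys why watermark detection can reduce to a statistical test.

```python
# A toy illustration of statistical text watermarking: a generator that
# favors tokens from a secret, key-derived 'green list' leaves a detectable
# statistical skew. This is NOT the method of the cited papers (which
# address image watermarks); it only shows the underlying principle.
import hashlib

def is_green(token: str, key: str = "secret-key") -> bool:
    """Deterministically assign roughly half of all tokens to the green list."""
    digest = hashlib.sha256((key + token).encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(text: str) -> float:
    tokens = text.lower().split()
    return sum(is_green(t) for t in tokens) / max(len(tokens), 1)

# Unwatermarked text hovers near 0.5; watermarked text skews well above it,
# so a simple statistical test on this fraction flags synthetic content.
sample = "results were consistent across all replicate experiments"
print(f"green fraction: {green_fraction(sample):.2f}")
```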

While AI poses challenges, it also levels the playing field in scientific writing. Capable AI assistants can help non-native speakers or those of us who suffer from blank-page syndrome. Brainstorming, editing, and developing a narrative through interactions with colleagues is integral to writing a manuscript; that process can now include discussions with infinitely patient machines. However, irrespective of whether interactions involve humans or machines, authors must remain fully responsible for their work.

AI can also help to generate figures from data and to link results to methods, protocols, and computer code, which provides a unique opportunity for open science. Proof-of-principle demonstrations of end-to-end automated pipelines that draft an entire manuscript from data already exist. This includes the autonomous generation of a research project based on data, iterative cycles of data analysis, interpretation, and self-criticism, and the drafting of reports with traceable code and methods (Roy Kishony, personal communication). These radically new approaches open opportunities to share research findings in a faster and more modular way that will unlock a broader range of credit mechanisms to reward data production and quality assessment. AI is thus set to strengthen the trustworthiness of published papers by facilitating greater transparency, reproducibility, and the adoption of open-science practices.
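
The schematic sketch below illustrates the kind of analyse-interpret-criticise loop such automated drafting pipelines run. Every function here is a hypothetical placeholder for what would, in a real system, be an LLM- or script-driven step, and the history list stands in for the traceable provenance mentioned above.

```python
# A schematic sketch of the analyse-interpret-criticise loop behind such
# automated drafting pipelines. Every function is a hypothetical stand-in
# for an LLM- or script-driven step; the history list mimics the traceable
# provenance that links results to methods and code.
from dataclasses import dataclass, field

@dataclass
class Draft:
    analysis: str = ""
    critique: str = ""
    history: list[str] = field(default_factory=list)

def analyse(data: list[float]) -> str:      # placeholder analysis step
    return f"mean={sum(data)/len(data):.2f}, n={len(data)}"

def criticise(analysis: str) -> str:        # placeholder self-critique step
    return "ok" if "n=" in analysis else "report the sample size"

def draft_report(data: list[float], max_rounds: int = 3) -> Draft:
    draft = Draft()
    for _ in range(max_rounds):             # iterate until the critique passes
        draft.analysis = analyse(data)
        draft.critique = criticise(draft.analysis)
        draft.history.append(draft.critique)
        if draft.critique == "ok":
            break
    return draft

print(draft_report([4.2, 3.9, 5.1]))
```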

Even though AI will alter the way authors compile data and rapidly communicate new results, we anticipate that many researchers will still want to present a ‘story’ that conveys their insights and interpretations of their findings. The value of a narrative to communicate important science is here to stay.

Synergies in evaluation

The evaluation and quality assessment of research manuscripts are at the heart of scientific publishing. Despite the impressive capabilities of current AI models, their ability to fulfill these tasks still falls short of human reasoning abilities, knowledge, and experience. While this may change in the future, in-depth peer review remains for now a human domain. Nonetheless, text, image, and data comprehension by AI models has advanced to the point that computer assistance can extend beyond mere data-processing tasks. As such, AI can potentially assist in performing systematic technical checks at scale and at low cost to ease the burden on editorial teams and reviewers. Instead of replacing humans in the peer-review and editorial process, we foresee synergies between humans and machines that will lead to more robust and trustworthy reviews.
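
As one hedged illustration of what systematic technical checks at scale could look like, the sketch below screens a manuscript against a few simple reporting rules. The checks themselves are invented examples; production systems would combine many more checks, including model-based ones.

```python
# An illustrative sketch of automated technical screening: cheap, systematic
# checks applied to every submission. The rules here are invented examples;
# real systems would combine many more checks, including model-based ones.
import re

CHECKS = {
    "data availability statement": r"data (are|is) available",
    "statistical test named":      r"\b(t-test|ANOVA|chi-squared)\b",
    "sample size reported":        r"\bn\s*=\s*\d+",
}

def screen(manuscript: str) -> dict[str, bool]:
    """Flag which routine reporting items a manuscript satisfies."""
    return {name: bool(re.search(pattern, manuscript, re.IGNORECASE))
            for name, pattern in CHECKS.items()}

example = "We compared groups with a t-test (n = 12); data are available."
for check, passed in screen(example).items():
    print(f"{'PASS' if passed else 'FLAG'}: {check}")
```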

A key concern with involving AI models, even peripherally, in decision-making is their potential to inherit and propagate biases from their training datasets. Humans are, of course, similarly prone to biases, yet systematically assessing if and how those biases influence human decisions is challenging because we are dealing with human psychology. In contrast, AI models can be quantitatively benchmarked on test datasets, a practice that has become standard when reporting model performance through ‘model cards’, making it possible to reveal, measure, and correct for inherent biases.
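
The sketch below shows the kind of subgroup benchmarking that model cards report: the same metric computed separately for each group, so that performance gaps, and hence potential biases, become visible. The groups and records are fabricated toy data.

```python
# A minimal sketch of the subgroup benchmarking that 'model cards' report:
# the same metric is computed per group so that performance gaps, i.e.
# potential biases, become visible. The records are fabricated toy data.
from collections import defaultdict

# (group, true_label, predicted_label)
records = [
    ("native-speaker", 1, 1), ("native-speaker", 0, 0),
    ("non-native",     1, 0), ("non-native",     1, 1),
]

def per_group_accuracy(records):
    hits, totals = defaultdict(int), defaultdict(int)
    for group, truth, prediction in records:
        totals[group] += 1
        hits[group] += int(truth == prediction)
    return {group: hits[group] / totals[group] for group in totals}

for group, accuracy in per_group_accuracy(records).items():
    print(f"{group}: accuracy {accuracy:.2f}")  # gaps across groups signal bias
```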

Redefining content as computable knowledge

As we consider the ‘chain of trust’ in scientific publishing, one of the most significant transformations brought about by AI may in fact occur at the chain’s end: the dissemination of knowledge. How will researchers access knowledge in the future (Metzler et al, 2021)? Will they still read papers or will they ask an AI to compile the relevant literature on the fly?

A key aspect of AI models is their ability to create high-dimensional representations—‘embeddings’ or vector representations—of the data they process, be it language, images, sound, or other data types. Advanced models are even trained across multiple modalities, such as images paired with text, sounds, or experimental data (Girdhar et al, 2022; Xie et al, 2023). Crucially, these representations are computable and can serve as the basis for classification, generation of new content, reasoning, or summarization by AI. In essence, AI models encode the semantics of human-readable content into a computationally amenable form, thereby narrowing the divide between human- and machine-readable content.
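
The sketch below gives a minimal picture of such embeddings, assuming the open-source sentence-transformers library; the model name is a common public choice, not one specified by the article. Semantically related sentences map to nearby vectors, which is precisely what makes the content computable.

```python
# A minimal sketch of text embeddings, assuming the open-source
# `sentence-transformers` library; the model name is a common public
# choice, not one specified by the article. Semantically related
# sentences map to nearby vectors, making the content computable.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "CRISPR enables precise editing of genomic sequences.",
    "Gene editing tools can modify DNA at targeted sites.",
    "The conference dinner was held by the harbour.",
]
embeddings = model.encode(sentences)  # one vector per sentence

# Cosine similarity of the vectors reflects semantic relatedness.
print(util.cos_sim(embeddings[0], embeddings[1]).item())  # related: high
print(util.cos_sim(embeddings[0], embeddings[2]).item())  # unrelated: low
```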

The full impact of converting the traditional ‘content’ of research papers into computable knowledge, in terms of how scientists will access published research, remains to be seen. However, early indications of change are already visible. Software engineers and computational scientists, for instance, increasingly consult AI assistants for technical information, problem-solving, debugging, and tailored code examples rather than navigating extensive documentation. It is plausible that similar behavioral shifts will occur in biomedical research. Such changes could significantly alter publishers’ business models, from maintaining elaborate websites to providing computable, verified knowledge.

Towards an AI-powered human-centric publishing environment

The rise of generative AI marks a ‘tectonic shift’ in scientific publishing. While it opens exciting new possibilities for writers, editors, reviewers, and readers, AI-generated deepfakes remain a major risk that could erode the essence of truth and threaten rationality and science itself. This dichotomy cannot be ignored, and the risks are exacerbated when misuse intersects with inadequate education and a lack of critical thinking that together foster the spread of conspiracy theories or beliefs in alternate ‘metaversical’ worlds. As we build AI-powered science communication, we must therefore preserve the integrity of the publishing process at every step. This means upholding crucial human and scientific values through training, education, and emphasizing the value of human interactions.

We believe that trust, transparency and fostering communities with shared values and interests will be central to preserving scientific discourse and knowledge dissemination. Scientific journals that represent human communities and facilitate human-human as well as human-machine interactions while upholding rigorous quality standards will become instrumental in this endeavor. On this perilous journey, we may remember the words of Fei-Fei Li, a visionary pioneer in AI: “If AI was going to help people, our thinking had to begin with the people themselves”.

Disclosure and competing interests statement

TL is Deputy Head of Scientific Publications and Head of Open Science Implementation at EMBO.

Acknowledgements

We thank Holger Breithaupt and Bernd Pulverer for critical reading of the manuscript. ChatGPT4, Claude2.1 and Grammarly were used for drafting and editing.

References

Durante Z, Sarkar B, Gong R, Taori R, Noda Y, Tang P, Adeli E, Lakshmikanth S, Schulman K, Milstein A, Terzopoulos D, Famoti A, Kuno N, Llorens A, Vo H, Ikeuchi K, Fei-Fei L, Gao J, Wake N, Huang Q (2024) An interactive agent foundation model. Preprint at arXiv

Fernandez P, Couairon G, Jégou H, Douze M, Furon T (2023) The stable signature: rooting watermarks in latent diffusion models. International Conference on Computer Vision (ICCV)

Gao Y, Xiong Y, Gao X, Jia K, Pan J, Bi Y, Dai Y, Sun J, Guo Q, Wang M, Wang H (2024) Retrieval-augmented generation for large language models: a survey. Preprint at arXiv

Girdhar R, El-Nouby A, Liu Z, Singh M, Alwala KV, Joulin A, Misra I (2022) ImageBind: one embedding space to bind them all. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023, pp 15180–15190
