Analysing publications to reveal research strategies

Thanks to powerful and innovative tools, comparative analyses of publications put out by INRA and the US Department of Agriculture (USDA) over a period of 40 years are shedding light on the history and functioning of these two giants of the world of agriculture.

Presentation of CORTEX by Guy Riba. Text analysis of INRA and USDA publications over the period 1976-2016 using the CORTEX software and WILDER technology. © INRA, Bertrand Nicolas
By Pascale Mollier, translated by Inge Laino
Updated on 08/16/2016
Published on 06/22/2016

Everyone is familiar with the tracks animals leave behind them. But scientific publications also leave traces… New software called CorTexT has analysed some 230,000 publications put out over a period of 40 years by INRA and the USDA. These analyses allow researchers to compare the traces left by these two giants of agricultural research: their main concerns and research strategies, but also their history and governance.

Giving a voice to publications

From 1976 to 2016, the USDA published close to 160,000 scientific articles; INRA’s count hovers around 70,000. In agricultural research, the two bodies rank first and second in the world, both in the number of publications and in how often those publications are cited.

The CorTexT software extracts the most frequent terms from this scientific literature and measures how often they appear together (their co-occurrences). The resulting groups of words, called “clusters”, represent key research topics. Very high-resolution images of these clusters, made possible by innovative display technology, allow for a detailed comparison of the top research priorities of INRA and the USDA.
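To make the counting step concrete, here is a minimal Python sketch. It is not CorTexT's actual method, which the article does not describe. It assumes that a term's frequency is the number of titles or abstracts containing it, that two terms co-occur when they appear in the same title or abstract, and that terms are single words filtered through a tiny stopword list. The names `STOPWORDS`, `tokenize`, `top_terms` and `cooccurrences` are hypothetical, chosen for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "of", "and", "in", "a", "on", "for", "to", "with", "under"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_terms(docs, n):
    """Return the n terms found in the most documents (document frequency)."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    # Break ties alphabetically so the result does not depend on set ordering.
    ranked = sorted(df, key=lambda t: (-df[t], t))
    return ranked[:n]


def cooccurrences(docs, vocabulary):
    """Count, for each pair of selected terms, the documents containing both."""
    vocab = set(vocabulary)
    pairs = Counter()
    for doc in docs:
        present = sorted(vocab.intersection(tokenize(doc)))
        pairs.update(combinations(present, 2))
    return pairs


# Made-up titles, for illustration only.
titles = [
    "Water stress in maize",
    "Maize yield under water stress",
    "Dairy cattle genetics",
    "Genetics of dairy cattle milk yield",
]
vocab = top_terms(titles, 7)
pairs = cooccurrences(titles, vocab)
print(pairs[("stress", "water")])  # prints 2
```

The pair counts are the raw material for a term network: the more often two terms share a title or abstract, the more strongly they are linked.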

Overlap of research topics between INRA and USDA

When the 500 most frequent words are selected, more than 95% of them appear for both INRA and the USDA. They are even distributed across eleven analogous clusters.
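The article does not say exactly how the overlap figure was computed. One plausible reading is the share of one institute's top-n terms that also appear among the other's top-n terms. The minimal Python sketch below computes that share. The helper names `top_n` and `shared_share` are hypothetical, and the term counts are invented for illustration.

```python
from collections import Counter


def top_n(counts, n):
    """Return the n most frequent terms, breaking ties alphabetically."""
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return set(ranked[:n])


def shared_share(counts_a, counts_b, n):
    """Share of A's top-n terms that also belong to B's top-n terms."""
    top_a, top_b = top_n(counts_a, n), top_n(counts_b, n)
    return len(top_a & top_b) / len(top_a) if top_a else 0.0


# Made-up term counts, for illustration only.
inra = Counter({"water": 50, "stress": 40, "wheat": 30, "cattle": 20, "prairie": 10})
usda = Counter({"water": 60, "cattle": 45, "groundwater": 35, "stress": 25, "yield": 5})
print(shared_share(inra, usda, 4))  # prints 0.75
```

With n = 500 and real counts, a result above 0.95 would match the figure quoted above. Under a looser reading, a term might count as shared whenever it appears anywhere in the other corpus, which would give a different number.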

Nevertheless, there are differences within the word clusters themselves. For example, the cluster “Pathogens and diseases” groups together words related to animal health and food safety for the USDA, while the same cluster contains words linked to plant health for INRA.

Comparing the robustness of topics

One way to test the robustness of a topic is to vary the number of words selected, which amounts to varying the frequency threshold. For example, selecting the 200 most common words yields fewer clusters than selecting the top 500. As fewer words are selected, only the most robust clusters continue to appear. And the terms that are robust in INRA literature differ from those that are robust in USDA publications. Take water: for INRA, the term “water stress” is robust, linked as it is to a whole slew of words having to do with how plants interact with water. For the USDA, it is the “ground water” cluster (the study of runoff, drainage, seepage, water reserves, dams, etc.) that is very robust, whereas the term does not even appear in INRA literature.
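The robustness test can be illustrated with a short Python sketch. It keeps the n most frequent terms, links two terms when they co-occur in at least `min_cooc` documents, and treats each connected component as a cluster. Connected components are a deliberately simple stand-in, not CorTexT's own clustering method, which the article does not describe. The function name `clusters_at`, its parameters and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def clusters_at(docs, n, min_cooc=1):
    """Cluster the n most frequent terms by co-occurrence.

    docs is a list of already-tokenized documents (lists of terms). Two
    selected terms are linked when they co-occur in at least min_cooc
    documents; each connected component with two or more terms is a cluster.
    """
    df = Counter()
    for terms in docs:
        df.update(set(terms))
    ranked = sorted(df, key=lambda t: (-df[t], t))  # ties broken alphabetically
    vocab = set(ranked[:n])

    pairs = Counter()
    for terms in docs:
        pairs.update(combinations(sorted(vocab.intersection(terms)), 2))

    neighbours = defaultdict(set)
    for (a, b), count in pairs.items():
        if count >= min_cooc:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Depth-first search to collect connected components.
    clusters, seen = [], set()
    for start in sorted(vocab):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            term = stack.pop()
            if term in seen:
                continue
            seen.add(term)
            component.add(term)
            stack.extend(neighbours[term] - seen)
        if len(component) > 1:
            clusters.append(frozenset(component))
    return clusters


# Toy corpus of tokenized titles, for illustration only.
docs = [
    ["water", "stress", "drought"],
    ["water", "stress", "drought"],
    ["water", "stress"],
    ["cattle", "milk"],
    ["cattle", "milk"],
    ["cheese", "ripening"],
]
for n in (7, 5, 2):
    print(n, sorted(sorted(c) for c in clusters_at(docs, n)))
```

On this toy corpus, the three clusters found with 7 terms shrink to two with 5 terms and to one with 2 terms. Only the water/stress topic survives every cut, which is what "robust" means in the paragraph above.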

Changing trends

Lastly, an analysis of how the clusters evolved over the past 40 years is very revealing. At INRA, for example, the topic “cattle” split into two clusters, “meat” and “milk, genetics”, in the 1980s. In 1990, these two clusters spawned a further four, for a total of six clusters linked to specialisations in the field (genetics, functional genomics, microbiology, etc.). In 2000, a “prairies and environment” cluster appeared, while the genomics and cheese production clusters waned.

In the 1980s, agricultural yield was the USDA’s top priority, as it was in France. But the issue of pastureland soon emerged, especially in the USA, where pastureland was in short supply. In the 1990s, terms related to food safety began to appear, while the topics of pastureland and drought began to converge.

Similarities and differences in the history of INRA and USDA

“Text mining” is like “storytelling”, says Guy Riba (1), who ran the study. Looking at the different word clusters brings the histories and functioning of INRA and the USDA into focus. The USDA takes an interest in applied research (labour, local water management, etc.) and relies on basic knowledge developed by American universities. INRA, on the other hand, has a monopoly on agricultural research in France and also conducts basic research (genetics, physiology, pathology, etc.).

This difference in missions is reflected in the governance of the two bodies: the USDA is under the sole aegis of the US ministry of agriculture, while INRA falls under two ministries, agriculture and research. The two bodies also go about their missions differently. The USDA focuses on local solutions with local players and relevant universities, while INRA tackles issues in its most competent regional centres, regardless of where in France the issue arises.

In broad strokes, the two institutes’ profiles can be summed up as follows: the USDA has an applied, locally autonomous vocation, while INRA has both an applied and a basic vocation, within a national research policy built on integrated, long-term programmes.

Ultimately, although bibliometrics has long been used to monitor scientific trends, the tools used in this study stand out for their exceptional analytical power. They herald a technological breakthrough that should allow researchers to extract ever more strategic information from the scientific literature.

 

        (1) Guy Riba was the head of INRA’s Zoology division, then scientific director of the “Plants and Plant Products” division, and finally Deputy Director General.

Cortex: the manual approach.
Analysing clusters on paper is laborious and imprecise. © INRA, Bertrand Nicolas

Performance and innovation multiplied by the power of two

CorTexT is an INRA software programme that builds networks of terms based on how frequently they appear, and how often they co-occur, in article titles and abstracts. These word networks take the form of clusters. In condensed form, the result is a sort of space map in which each “planet” is a thematic cluster.

WILDER is an INRA technology that creates high-resolution images from the results obtained with CorTexT. The images are displayed on a 12.2 m² wall made up of 75 individual screens, each measuring 40 cm by 40 cm.

“Only the combination of textual data (CorTexT) and its translation into high-definition images (WILDER) allows for such in-depth analysis”, explains Guy Riba, who ran the study. “If you want to compare two clusters that each contain more than 1,000 words, it’s impossible to do on paper. On screen, however, it is possible, because the resolution is high enough to see every word.”