Network analysis and data mining in food science: the emergence of computational gastronomy
© Ahnert; licensee BioMed Central Ltd. 2013
Received: 7 November 2012
Accepted: 8 November 2012
Published: 9 January 2013
The rapidly growing body of publicly available data on food chemistry and food usage can be analysed using data mining and network analysis methods. Here we discuss how these approaches can yield new insights both into the sensory perception of food and the anthropology of culinary practice. We also show that this development is part of a larger trend. Over the past two decades large-scale data analysis has revolutionized the biological sciences, which have experienced an explosion of experimental data as a result of the advent of high-throughput technology. Large datasets are also changing research methodologies in the social sciences due to the data generated by mobile communication technology and online social networks. Even the arts and humanities are seeing the establishment of ‘digital humanities’ research centres in order to cope with the increasing digitization of literary and historical sources. We argue that food science is likely to be one of the next beneficiaries of large-scale data analysis, perhaps resulting in fields such as ‘computational gastronomy’.
Large-scale data analysis
The past two decades have seen the advent of high-throughput technologies in biology, making it possible to sequence genomes cheaply and quickly, to measure gene expression for thousands of genes in parallel, and to test large numbers of potential regulatory interactions between genes in a single experiment. The large amounts of data created by these technologies have given rise to entirely new research areas in biology, such as computational biology and systems biology. The latter, which attempts to understand biological processes at a ‘systems’ level, is particularly indicative of the potential advantage that large datasets and their analysis can offer to biology, and to other fields of research. This advantage is a ‘bird's-eye’ perspective, which, with the right kind of analysis, can complement the more established research methods that take place ‘on the ground’ and investigate the system in much more detail. An example would be the analysis of high-throughput gene expression data of tumour tissues in order to highlight a set of candidate genes that may play a role in causing a particular cancer. These candidates would then be investigated one by one, for instance by creating mutant organisms in which one of these genes is deactivated.
Similar large-scale data analysis methods have more recently arrived in the social sciences as a result of rapidly growing mobile communications networks and online social networking sites. Here too data analysis offers a bird's-eye perspective of large social networks and the opportunity to study social dynamics and human mobility on an unprecedented scale. The most recent research areas to be transformed by information technology are the arts and humanities, which have witnessed the emergence of ‘digital humanities’. As more and more literary and historical documents are digitized, it becomes possible to uncover fundamental relationships that underlie large corpora of literary texts, or long-term historical and political developments. A striking example is the discovery by Lieberman et al. that the regularisation of verbs across 12 centuries of English is governed by a simple quantitative relationship between the frequency of a verb's usage and the speed at which it is regularised.
Network analysis of flavour compounds
The chef Heston Blumenthal, together with flavour scientists, has suggested that two foods that share chemical flavour compounds are more likely to taste good in combination. By comparing a network of ingredients, in which two ingredients are linked if they share flavour compounds, to a body of 56,498 online recipes, downloaded from epicurious.com, allrecipes.com, and menupan.com, we were able to show that this hypothesis is confirmed in most Western cuisines, but not in Eastern ones. This result indicates that shared compounds may offer one of several possible mechanisms that can make two ingredients compatible.
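The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis used in the study: the ingredient-to-compound table, the compound labels (`c1`, `c2`, ...) and the function names are all hypothetical, and the baseline draws ingredients uniformly at random, which is a simplification that may differ from the null model actually used.

```python
from itertools import combinations
import random

# Hypothetical toy data: ingredient -> set of flavour compounds.
# The compound labels are placeholders, not real chemistry.
COMPOUNDS = {
    "tomato": {"c1", "c2", "c3"},
    "basil": {"c2", "c3", "c4"},
    "cheese": {"c1", "c3", "c5"},
    "soy": {"c6", "c7"},
    "ginger": {"c8", "c9"},
    "rice": {"c10"},
}


def shared(a, b):
    """Number of flavour compounds two ingredients have in common."""
    return len(COMPOUNDS[a] & COMPOUNDS[b])


def mean_shared(recipe):
    """Average number of shared compounds over all ingredient pairs in a recipe."""
    pairs = list(combinations(recipe, 2))
    return sum(shared(a, b) for a, b in pairs) / len(pairs)


def cuisine_score(recipes):
    """Average of mean_shared over all recipes of a cuisine."""
    return sum(mean_shared(r) for r in recipes) / len(recipes)


def random_baseline(recipes, trials=1000, seed=0):
    """Score of randomised recipes of the same sizes (assumed uniform sampling)."""
    rng = random.Random(seed)
    ingredients = sorted(COMPOUNDS)
    total = 0.0
    for _ in range(trials):
        randomised = [rng.sample(ingredients, len(r)) for r in recipes]
        total += cuisine_score(randomised)
    return total / trials


if __name__ == "__main__":
    cuisine = [["tomato", "basil", "cheese"]]
    # A positive difference means the recipes share more compounds than chance.
    print(cuisine_score(cuisine) - random_baseline(cuisine))
```

A cuisine whose recipes score above the random baseline is consistent with the shared compound hypothesis in this toy setting, while a score at or below the baseline is not.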
Our network of ingredients and flavour compounds is just a first step towards a true network of shared flavour compound perception, which would have to include compound concentrations and detection thresholds in order to further investigate the shared compound hypothesis. Its most important purpose is to open up a new way in which data analysis can aid sensory science and the study of culinary practice.
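One way to picture how concentrations and detection thresholds might enter such a network is sketched below. Everything here is an assumption made for illustration: the ingredient names, the concentration and threshold values, and in particular the rule that a compound counts as perceptible when its concentration is at or above its detection threshold. Real perception is likely to be more complicated than this simple cut-off.

```python
# Hypothetical data: compound concentrations per ingredient and
# detection thresholds per compound, all in the same arbitrary units.
CONCENTRATION = {
    "ingredient_a": {"x": 5.0, "y": 0.1, "z": 2.0},
    "ingredient_b": {"x": 3.0, "y": 4.0, "z": 0.5},
}
THRESHOLD = {"x": 1.0, "y": 0.5, "z": 1.0}


def perceptible(ingredient):
    """Compounds whose concentration reaches the detection threshold (assumed rule)."""
    return {
        compound
        for compound, conc in CONCENTRATION[ingredient].items()
        if conc >= THRESHOLD[compound]
    }


def shared_all(a, b):
    """Compounds present in both ingredients, regardless of concentration."""
    return set(CONCENTRATION[a]) & set(CONCENTRATION[b])


def shared_perceptible(a, b):
    """Compounds that are perceptible in both ingredients under the assumed rule."""
    return perceptible(a) & perceptible(b)


if __name__ == "__main__":
    print(sorted(shared_all("ingredient_a", "ingredient_b")))
    print(sorted(shared_perceptible("ingredient_a", "ingredient_b")))
```

In this toy example the two ingredients have three compounds in common, but only one of them is above threshold in both, so a link weighted by perceptible compounds would be much weaker than one based on mere presence.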
More broadly, the increasing availability of data on food usage, food chemistry and sensory biology is likely to result in the establishment of new research disciplines, such as ‘computational gastronomy’.
SEA is supported by the Royal Society, UK.
- Lieberman E, Michel JB, Jackson J, Tang T, Nowak MA: Quantifying the evolutionary dynamics of language. Nature. 2007, 449: 713-716. 10.1038/nature06137.
- Watts D, Strogatz S: Collective dynamics of “small-world” networks. Nature. 1998, 393: 440-442. 10.1038/30918.
- Barabási AL, Albert R: Emergence of scaling in random networks. Science. 1999, 286: 509-512. 10.1126/science.286.5439.509.
- Ahn YY, Ahnert SE, Bagrow JP, Barabási AL: Flavor network and the principles of food pairing. Sci Rep. 2011, 1: 196.
- Burdock GA: Fenaroli’s Handbook of Flavor Ingredients. 2004, CRC Press, Boca Raton, 5th edition.
- Serrano MA, Boguña M, Vespignani A: Extracting the multiscale backbone of complex weighted networks. Proc Natl Acad Sci USA. 2009, 106: 6483-6488. 10.1073/pnas.0808904106.
- Blumenthal H: The Big Fat Duck Cookbook. 2008, Bloomsbury, London.
- Nijssen LM, van Ingen-Visscher CA, Donders JJH: VCF Volatile Compounds in Food: database. Version 13.2. 1963–2012, TNO Triskelion, Zeist, The Netherlands.
- van Gemert LJ: Flavour threshold values in water and other media. 2011, Leffingwell, Canton, GA, 2nd edition.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.