Top Research Papers on Data Science
Uncover influential research that defines the field of Data Science. Our curated list includes innovative studies that push the boundaries of data analysis, machine learning, and predictive modeling. Whether you're an academic, a professional, or an enthusiast, these papers offer valuable insights and advancements in the ever-evolving domain of Data Science.
Foundations of Data Science
308 Citations 2020 Avrim Blum, John E. Hopcroft, Ravindran Kannan
Cambridge University Press eBooks
Computer science as an academic discipline began in the 1960s with an emphasis on programming languages, compilers, operating systems, and the mathematical theory that supported these areas. Today a fundamental change is taking place, and the focus is more on applications.
The introduction discusses data journeys and their characteristics as an investigative tool and theoretical framework, both for this volume and for broader scholarship on data, and explains the significance of this approach for addressing the challenges raised by data-centric science and the emergence of big and open data.
Veridical data science
154 Citations 2020 Bin Yu, Karl Kumbier
Proceedings of the National Academy of Sciences
This work proposes (basic) PCS (predictability, computability, and stability) inference for reliability measures on data results, extending statistical inference to the much broader scope that current data science practice entails. It also proposes PCS documentation based on R Markdown or Jupyter Notebook to back up human choices made throughout an analysis.
Spatial Data Science introduces fundamental aspects of spatial data that every data scientist should know before they start working with spatial data. These aspects include how geometries are represented, coordinate reference systems (projections, datums), the fact that the Earth is round and its consequences for analysis, and how attributes of geometries can relate to geometries. In the second part of the book, these concepts are illustrated with data science examples using the R language. In the third part, statistical modelling approaches are demonstrated using real world data examples. Aft...
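A minimal sketch of one consequence of "the Earth is round", written in Python rather than the R used in the book, and not taken from it. It compares a great-circle (haversine) distance with a naive calculation that treats longitude and latitude as flat Cartesian coordinates. The function names, the spherical radius of 6371 km, and the sample coordinates are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius; the Earth is treated as a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def naive_planar_km(lat1, lon1, lat2, lon2):
    """Distance if degrees were flat Cartesian units of equal length everywhere."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360
    return km_per_degree * math.hypot(lat2 - lat1, lon2 - lon1)


# Ten degrees of longitude apart, once on the equator and once at 60 degrees north.
equator_great = haversine_km(0, 0, 0, 10)
equator_naive = naive_planar_km(0, 0, 0, 10)
north_great = haversine_km(60, 0, 60, 10)
north_naive = naive_planar_km(60, 0, 60, 10)

print(f"equator: great-circle {equator_great:.1f} km, naive {equator_naive:.1f} km")
print(f"60N:     great-circle {north_great:.1f} km, naive {north_naive:.1f} km")
```

On the equator the two calculations agree, while at 60 degrees north the naive figure is roughly double the great-circle distance, because meridians converge toward the poles.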
The twenty-first century has ushered in the age of big data and data economy, in which data DNA, which carries important knowledge, insights and potential, has become an intrinsic constituent of all data-based organisms. An appropriate understanding of data DNA and its organisms relies on the new field of data science and its keystone, analytics. Although it is widely debated whether big data is only hype and buzz, and data science is still in a very early phase, significant challenges and opportunities are emerging or have been inspired by the research, innovation, business, profession, and e...
Data-Driven Science and Engineering
564 Citations 2022 Steven L. Brunton, J. Nathan Kutz
Cambridge University Press eBooks
Data-driven discovery is revolutionizing how we model, predict, and control complex systems. Now with Python and MATLAB®, this textbook trains mathematical scientists and engineers for the next generation of scientific discovery by offering a broad overview of the growing intersection of data-driven methods, machine learning, applied optimization, and classical fields of engineering mathematics and mathematical physics. With a focus on integrating dynamical systems modeling and control with modern methods in applied machine learning, this text includes methods that were chosen for their releva...
Statistical Foundations of Data Science
186 Citations 2020 Jianqing Fan, Runze Li, Cun‐Hui Zhang + 1 more
journal unavailable
Statistical Foundations of Data Science gives a thorough introduction to commonly used statistical models, contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theories. It aims to serve as a graduate-level textbook and a research monograph on high-dimensional statistics, sparsity and covariance learning, machine learning, and statistical inference. It includes ample exercises that involve both theoretical studies as well as empirical applications. The book begins with an introduction to the stylized features of big data an...
A Survey on Data Pricing: From Economics to Data Science
108 Citations 2020 Jian Pei
IEEE Transactions on Knowledge and Data Engineering
A unified, interdisciplinary and comprehensive overview of the economics of data pricing is presented and the development and evolution of pricing models according to a series of fundamental principles are reviewed.
The Critical Importance of Citizen Science Data
159 Citations 2021 Alex de Sherbinin, Anne Bowser, Tyng-Ruey Chuang + 10 more
Frontiers in Climate
Citizen science is an important vehicle for democratizing science and promoting the goal of universal and equitable access to scientific data and information. Data generated by citizen science groups have become an increasingly important source for scientists, applied users and those pursuing the 2030 Agenda for Sustainable Development. Citizen science data are used extensively in studies of biodiversity and pollution; crowdsourced data are being used by UN operational agencies for humanitarian activities; and citizen scientists are providing data relevant to monitoring the sustainable develop...
From Open Data to Open Science
118 Citations 2021 Rahul Ramachandran, Kaylin Bugbee, Kevin Murphy
Earth and Space Science
This study defines open science as a collaborative culture enabled by technology that empowers the open sharing of data, information, and knowledge within the scientific community and the wider public to accelerate scientific research and understanding.
Data science applications for predictive maintenance and materials science in context to Industry 4.0
116 Citations 2021 S. M. Sajid, Abid Haleem, Shashi Bahl + 3 more
Materials Today Proceedings
Five critical processes of data scientists for predictive maintenance are identified through a literature review, discussed briefly, and presented as essential for Industry 4.0.
Ridge Regularization: An Essential Concept in Data Science
103 Citations 2020 Trevor Hastie
Technometrics
Some of the magic and beauty of ridge that my colleagues and I have encountered over the past 40 years in applied statistics are collected together.
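As a rough illustration of the ridge idea (not code from the paper), the sketch below computes ridge coefficients from the closed form, solving (XᵀX + λI)β = Xᵀy, and compares them with ordinary least squares. The function name `ridge_fit`, the synthetic data, and the penalty `lam=10.0` are assumptions chosen for the example. It omits the intercept and the feature standardization a real analysis would usually include.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients: solve (X^T X + lam * I) beta = X^T y."""
    n_features = X.shape[1]
    gram = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ y)


# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_beta = np.array([2.0, -1.0, 0.0, 0.5, 3.0])
y = X @ true_beta + rng.normal(scale=0.1, size=50)

beta_ols = ridge_fit(X, y, lam=0.0)     # lam = 0 reduces to least squares
beta_ridge = ridge_fit(X, y, lam=10.0)  # lam > 0 shrinks the coefficients

print("OLS coefficient norm:  ", np.linalg.norm(beta_ols))
print("ridge coefficient norm:", np.linalg.norm(beta_ridge))
```

With any positive penalty the coefficient vector has a smaller Euclidean norm than the least-squares solution, which is the shrinkage effect that makes ridge useful when predictors are correlated or numerous.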
Data Science Meets Physical Organic Chemistry
102 Citations 2021 Jennifer M. Crawford, Cian Kingston, F. Dean Toste + 1 more
Accounts of Chemical Research
This work investigated whether enantioselectivity data from a reaction can be quantitatively connected to the attributes of reaction components, such as catalyst and substrate structural features, to harness data for asymmetric catalyst design. It developed a workflow that uses data science tools to relate computationally derived features of reaction components to enantioselectivity.
Automated Experimentation Powers Data Science in Chemistry
109 Citations 2021 Yao Shi, Paloma L. Prieto, Tara Zepel + 2 more
Accounts of Chemical Research
This Account contextualizes the need for more complex and diverse experimental data and highlights how the seamless integration of robotics, machine learning, and data-rich monitoring techniques can be used to access it with minimal human labor.
The R Language: An Engine for Bioinformatics and Data Science
200 Citations 2022 Federico M. Giorgi, Carmine Ceraolo, Daniele Mercatelli
Life
A historical chronicle of how R became what it is today is provided, describing its current features and capabilities, and the role of R in science in general as a driver for reproducibility is discussed.
Big data and machine learning for materials science
154 Citations 2021 José F. Rodrigues, Larisa Florea, Maria Cristina Ferreira de Oliveira + 2 more
Discover Materials
A roadmap for future developments is proposed, with emphasis on computer-aided discovery of new materials and analysis of chemical sensing compounds, both prominent research fields for ML in the context of materials science.
The Science of Visual Data Communication: What Works
285 Citations 2021 Steven Franconeri, Lace Padilla, Priti Shah + 2 more
Psychological Science in the Public Interest
Effectively designed data visualizations allow viewers to use their powerful visual systems to understand patterns in data across science, education, health, and public policy. But ineffectively designed visualizations can cause confusion, misunderstanding, or even distrust—especially among viewers with low graphical literacy. We review research-backed guidelines for creating effective and intuitive visualizations oriented toward communicating data to students, coworkers, and the general public. We describe how the visual system can quickly extract broad statistics from a display, whereas poor...
Data-science driven autonomous process optimization
223 Citations 2021 Melodie Christensen, Lars P. E. Yunker, Folarin Adedeji + 8 more
Communications Chemistry
A closed-loop system capable of carrying out parallel autonomous process optimization experiments in batch with significantly reduced cycle times is developed, and it is found that the definition of a set of meaningful, broad, and unbiased process parameters was the most critical aspect of a successful optimization.
Network analysis of multivariate data in psychological science
1067 Citations 2021 Denny Borsboom, Marie K. Deserno, Mijke Rhemtulla + 12 more
Nature Reviews Methods Primers
In recent years, network analysis has been applied to identify and analyse patterns of statistical association in multivariate psychological data. In these approaches, network nodes represent variables in a data set, and edges represent pairwise conditional associations between variables in the data, while conditioning on the remaining variables. This Primer provides an anatomy of these techniques, describes the current state of the art and discusses open problems. We identify relevant data structures in which network analysis may be applied: cross-sectional data, repeated measures and intensi...
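To make "pairwise conditional associations ... conditioning on the remaining variables" concrete, here is a minimal sketch, assuming roughly Gaussian data, that derives partial correlations from the inverse covariance (precision) matrix. Plain matrix inversion without regularization is a simplification chosen for this example, not the Primer's estimation procedure. The function name `partial_correlations` and the simulated chain A → B → C are hypothetical.

```python
import numpy as np


def partial_correlations(data):
    """Partial correlation of every variable pair given all remaining variables.

    data: array of shape (n_samples, n_variables).
    """
    cov = np.cov(data, rowvar=False)
    precision = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Simulated chain: A influences B, and B influences C (illustrative data only).
rng = np.random.default_rng(1)
n = 5000
a = rng.normal(size=n)
b = a + rng.normal(size=n)
c = b + rng.normal(size=n)
data = np.column_stack([a, b, c])

marginal = np.corrcoef(data, rowvar=False)
network = partial_correlations(data)

print("marginal corr(A, C):      ", round(marginal[0, 2], 3))
print("partial corr(A, C | B):   ", round(network[0, 2], 3))
print("partial corr(A, B | C):   ", round(network[0, 1], 3))
```

A and C are clearly correlated marginally, but their partial correlation given B is close to zero, so a network drawn from these values would have edges A–B and B–C and no direct A–C edge.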
Small data machine learning in materials science
588 Citations 2023 Pengcheng Xu, Xiaobo Ji, Minjie Li + 1 more
npj Computational Materials
This review discussed the dilemma of small data faced by materials machine learning, and the methods of dealing with small data, including data extraction from publications, materials database construction, high-throughput computations and experiments from the data source level.
Eleven grand challenges in single-cell data science
1326 Citations 2020 David Lähnemann, Johannes Köster, Ewa Szczurek + 48 more
Genome Biology
This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years in single-cell data science.
Genetic Algorithms in the Fields of Artificial Intelligence and Data Sciences
254 Citations 2021 Ayesha Sohail
Annals of Data Science
Time series forecasting and Bayesian inference, in combination with genetic algorithms, can prove to be powerful artificial intelligence tools.
The Neurodata Without Borders ecosystem for neurophysiological data science
133 Citations 2022 Oliver Rübel, Andrew Tritt, Ryan Ly + 9 more
eLife
The neurophysiology of cells and tissues are monitored electrophysiologically and optically in diverse experiments and species, ranging from flies to humans. Understanding the brain requires integration of data across this diversity, and thus these data must be findable, accessible, interoperable, and reusable (FAIR). This requires a standard language for data and metadata that can coevolve with neuroscience. We describe design and implementation principles for a language for neurophysiology data. Our open-source software (Neurodata Without Borders, NWB) defines and modularizes the interdepend...
PANGAEA - Data Publisher for Earth & Environmental Science
133 Citations 2023 Janine Felden, Lars Möller, Uwe Schindler + 5 more
Scientific Data
The information system PANGAEA provides targeted support for research data management as well as long-term data archiving and publication, and is an integral component of national and international science and technology activities.
Cost data in implementation science: categories and approaches to costing
147 Citations 2022 Heather T. Gold, Cara L. McDermott, Ties Hoomans + 1 more
Implementation Science
This paper explains how implementation researchers might optimize their measurement and inclusion of costs, building on traditional economic evaluations comparing costs and effectiveness of health interventions.
Data Science Methodologies: Current Challenges and Future Approaches
106 Citations 2021 Iñigo Martínez, Elisabeth Viles, Igor G. Olaizola
Big Data Research
A conceptual framework is proposed that sets out the general characteristics a methodology for managing data science projects from a holistic point of view should have. Other researchers can use it as a roadmap for designing new data science methodologies or updating existing ones.
Data quantity governance for machine learning in materials science
153 Citations 2023 Yue Liu, Zhengwei Yang, Xinxin Zou + 4 more
National Science Review
A synergistic data quantity governance flow with the incorporation of materials domain knowledge is proposed, paving the way for obtaining the required high-quality data to accelerate materials design and discovery based on ML.
Leveraging Data Science to Combat COVID-19: A Comprehensive Review
245 Citations 2020 Siddique Latif, Muhammad Usman, Sanaullah Manzoor + 8 more
IEEE Transactions on Artificial Intelligence
This paper attempts to systematise the various COVID-19 research activities leveraging data science, where data science is defined broadly to encompass the various methods and tools that can be used to store, process, and extract insights from data.
Surgical data science – from concepts toward clinical translation
289 Citations 2022 Lena Maier‐Hein, Matthias Eisenmann, Duygu Sarıkaya + 48 more
univOAK (4 institutions : Université de Strasbourg, Université de Haute Alsace, INSA Strasbourg, Bibliothèque Nationale et Universitaire de Strasbourg)
Recent developments in data science in general and machine learning in particular have transformed the way experts envision the future of surgery. Surgical Data Science (SDS) is a new research field that aims to improve the quality of interventional healthcare through the capture, organization, analysis and modeling of data. While an increasing number of data-driven approaches and clinical applications have been studied in the fields of radiological and clinical data science, translational success stories are still lacking in surgery. In this publication, we shed light on the underlying reason...
Extending the Global Mass Change Data Record: GRACE Follow‐On Instrument and Science Data Performance
702 Citations 2020 Felix W. Landerer, Frank Flechtner, Himanshu Save + 25 more
Geophysical Research Letters
Since June 2018, the Gravity Recovery and Climate Experiment Follow‐On (GRACE‐FO) is extending the 15‐year monthly mass change record of the GRACE mission, which ended in June 2017. The GRACE‐FO instrument and flight system performance has improved over GRACE. Better attitude solutions and enhanced pointing performance result in reduced fuel consumption and gravity range rate post‐fit residuals. One accelerometer requires additional calibrations due to unexpected measurement noise. The GRACE‐FO gravity and mass change fields from June 2018 through December 2019 continue the GRACE rec...
Big Earth Data science: an information framework for a sustainable planet
126 Citations 2020 Huadong Guo, Stefano Nativi, Dong Liang + 10 more
International Journal of Digital Earth
This paper introduces the universe of discourse characterizing a new engineering discipline, Big Earth Data science, along with its foundational paradigms and methodologies and a possible technological framework to be implemented by applying an ecosystem approach.
Cybersecurity data science: an overview from machine learning perspective
679 Citations 2020 Iqbal H. Sarker, A. S. M. Kayes, Shahriar Badsha + 3 more
Journal of Big Data
The goal is to focus the applicability towards data-driven intelligent decision making for protecting the systems from cyber-attacks by providing a machine learning based multi-layered framework for the purpose of cybersecurity modeling.
The BioImage Archive – Building a Home for Life-Sciences Microscopy Data
129 Citations 2022 Matthew Hartley, Gerard J. Kleywegt, Ardan Patwardhan + 3 more
Journal of Molecular Biology
Despite the huge impact of data resources in genomics and structural biology, until now there has been no central archive for biological data for all imaging modalities. The BioImage Archive is a new data resource at the European Bioinformatics Institute (EMBL-EBI) designed to fill this gap. In its initial development BioImage Archive accepts bioimaging data associated with publications, in any format, from any imaging modality from the molecular to the organism scale, excluding medical imaging. The BioImage Archive will ensure reproducibility of published studies that derive results from imag...
2020 ACR Data Science Institute Artificial Intelligence Survey
150 Citations 2021 Bibb Allen, Sheela Agarwal, Laura P. Coombs + 2 more
Journal of the American College of Radiology
Information from the survey will help researchers and industry develop AI tools that will enhance radiological practice and improve quality and efficiency in patient care.
Machine Learning Methods for Small Data Challenges in Molecular Science
380 Citations 2023 Bozheng Dou, Zailiang Zhu, Ekaterina Merkurjev + 7 more
Chemical Reviews
This review summarizes and analyzes several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences and briefly discusses the latest advances in these methods.
Pragmatic approaches to analyzing qualitative data for implementation science: an introduction
237 Citations 2021 Shoba Ramanadhan, Anna Revette, Rebekka M. Lee + 1 more
Implementation Science Communications
Qualitative methods are critical for implementation science as they generate opportunities to examine complexity and include a diversity of perspectives. However, it can be a challenge to identify the approach that will provide the best fit for achieving a given set of practice-driven research needs. After all, implementation scientists must find a balance between speed and rigor, reliance on existing frameworks and new discoveries, and inclusion of insider and outsider perspectives. This paper offers guidance on taking a pragmatic approach to analysis, which entails strategically combining an...
A data-science approach to predict the heat capacity of nanoporous materials
117 Citations 2022 Seyed Mohamad Moosavi, Balázs Álmos Novotny, Daniele Ongari + 10 more
Nature Materials
The heat capacity of a material is a fundamental property of great practical importance. For example, in a carbon capture process, the heat required to regenerate a solid sorbent is directly related to the heat capacity of the material. However, for most materials suitable for carbon capture applications, the heat capacity is not known, and thus the standard procedure is to assume the same value for all materials. In this work, we developed a machine learning approach, trained on density functional theory simulations, to accurately predict the heat capacity of these materials, that is, zeolite...
How to address data privacy concerns when using social media data in conservation science
179 Citations 2021 Enrico Di Minin, Christoph Fink, Anna Hausmann + 2 more
Conservation Biology
It is recommended that conservation scientists carefully consider the recommendations in devising their research objectives so as to facilitate responsible use of social media data in conservation science research, for example, in conservation culturomics and investigations of illegal wildlife trade online.
Web of Science as a data source for research on scientific and scholarly activity
1065 Citations 2020 Caroline Birkle, David Pendlebury, Joshua D. Schnell + 1 more
Quantitative Science Studies
The Institute for Scientific Information (ISI) continues to work closely with bibliometric groups around the world to the benefit of both the community and the services that the company provides to researchers and analysts.
Smart City Data Science: Towards data-driven smart cities with open research issues
140 Citations 2022 Iqbal H. Sarker
Internet of Things
Cities are undergoing huge shifts in technology and operations in recent days, and ‘data science’ is driving the change in the current age of the Fourth Industrial Revolution (Industry 4.0 or 4IR). Extracting useful knowledge or actionable insights from city data and building a corresponding data-driven model is the key to making a city system automated and intelligent. Data science is typically the scientific study and analysis of actual happenings with historical data using a variety of scientific methodologies, machine learning techniques, processes, and systems. In this paper, we concentra...