Top Research Papers on Data Science
Uncover influential research that defines the field of Data Science. Our curated list includes innovative studies that push the boundaries of data analysis, machine learning, and predictive modeling. Whether you're an academic, a professional, or an enthusiast, these papers offer valuable insights and advancements in the ever-evolving domain of Data Science.
Foundations of Data Science
308 Citations · 2020 · Avrim Blum, John E. Hopcroft, Ravindran Kannan
Cambridge University Press eBooks
Computer science as an academic discipline began in the 1960s with emphasis on programming languages, compilers, operating systems, and the mathematical theory that supported these areas, but today, a fundamental change is taking place and the focus is more on applications.
The introduction discusses the idea of data journeys and their characteristics as an investigative tool and theoretical framework for this volume and for broader scholarship on data, and the significance of this approach for addressing the challenges raised by data-centric science and the emergence of big and open data.
Veridical data science
154 Citations · 2020 · Bin Yu, Karl Kumbier
Proceedings of the National Academy of Sciences
This work proposes basic PCS (predictability, computability, and stability) inference to measure the reliability of data results, extending statistical inference to the much broader scope that current data science practice entails. It also proposes PCS documentation, based on R Markdown or Jupyter Notebook, to back up the human choices made throughout an analysis.
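The stability idea behind PCS can be illustrated with a minimal sketch: perturb the data, refit, and check how much the result moves. The specific choices below (bootstrap resampling of rows as the perturbation, ordinary least squares as the model, the hypothetical function name `bootstrap_coef_stability`, and the synthetic data) are assumptions for illustration, not the paper's own procedure.

```python
import numpy as np


def bootstrap_coef_stability(X, y, n_boot=200, seed=0):
    """Refit least squares on bootstrap resamples of the rows of (X, y) and
    return the mean and standard deviation of each coefficient.

    Assumption: bootstrap resampling stands in for a generic data
    perturbation; PCS allows many other kinds of perturbation.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    coefs = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # rows drawn with replacement
        coefs[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    return coefs.mean(axis=0), coefs.std(axis=0)


# Synthetic toy data (illustration only): y depends on two features.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=200)
mean, spread = bootstrap_coef_stability(X, y)
print(mean, spread)  # a small spread suggests the estimates are stable
```

A coefficient whose spread is large relative to its mean would be flagged as unstable under this particular perturbation.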
Spatial Data Science introduces fundamental aspects of spatial data that every data scientist should know before they start working with spatial data. These aspects include how geometries are represented, coordinate reference systems (projections, datums), the fact that the Earth is round and its consequences for analysis, and how attributes of geometries can relate to geometries. In the second part of the book, these concepts are illustrated with data science examples using the R language. In the third part, statistical modelling approaches are demonstrated using real world data examples. Aft...
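The fact that the Earth is round has a direct numerical consequence: treating latitude and longitude degrees as a flat grid distorts distances away from the equator. The sketch below is written in Python rather than the book's R, assumes a spherical Earth with a mean radius of 6371 km (real coordinate reference systems typically use an ellipsoid such as WGS84), and uses hypothetical function names.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def naive_planar_km(lat1, lon1, lat2, lon2):
    """Euclidean distance on raw degrees, scaled by km per degree of latitude.
    Ignores that meridians converge toward the poles."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0
    return km_per_degree * math.hypot(lat2 - lat1, lon2 - lon1)


# One degree of longitude at the equator versus at 60 degrees north.
for lat in (0.0, 60.0):
    print(lat, round(haversine_km(lat, 0, lat, 1), 1),
          round(naive_planar_km(lat, 0, lat, 1), 1))
```

At the equator both functions give about 111.2 km; at 60 degrees north the flat-grid version still reports about 111.2 km, roughly double the great-circle distance of about 55.6 km.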
The twenty-first century has ushered in the age of big data and data economy, in which data DNA, which carries important knowledge, insights and potential, has become an intrinsic constituent of all data-based organisms. An appropriate understanding of data DNA and its organisms relies on the new field of data science and its keystone, analytics. Although it is widely debated whether big data is only hype and buzz, and data science is still in a very early phase, significant challenges and opportunities are emerging or have been inspired by the research, innovation, business, profession, and e...
Data-Driven Science and Engineering
564 Citations · 2022 · Steven L. Brunton, J. Nathan Kutz
Cambridge University Press eBooks
Data-driven discovery is revolutionizing how we model, predict, and control complex systems. Now with Python and MATLAB®, this textbook trains mathematical scientists and engineers for the next generation of scientific discovery by offering a broad overview of the growing intersection of data-driven methods, machine learning, applied optimization, and classical fields of engineering mathematics and mathematical physics. With a focus on integrating dynamical systems modeling and control with modern methods in applied machine learning, this text includes methods that were chosen for their releva...
Statistical Foundations of Data Science
186 Citations · 2020 · Jianqing Fan, Runze Li, Cun-Hui Zhang + 1 more
Statistical Foundations of Data Science gives a thorough introduction to commonly used statistical models, contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theories. It aims to serve as a graduate-level textbook and a research monograph on high-dimensional statistics, sparsity and covariance learning, machine learning, and statistical inference. It includes ample exercises that involve both theoretical studies as well as empirical applications. The book begins with an introduction to the stylized features of big data an...
A Survey on Data Pricing: From Economics to Data Science
108 Citations · 2020 · Jian Pei
IEEE Transactions on Knowledge and Data Engineering
The survey presents a unified, interdisciplinary, and comprehensive overview of the economics of data pricing, and reviews the development and evolution of pricing models according to a series of fundamental principles.
The Critical Importance of Citizen Science Data
159 Citations · 2021 · Alex de Sherbinin, Anne Bowser, Tyng-Ruey Chuang + 10 more
Frontiers in Climate
Citizen science is an important vehicle for democratizing science and promoting the goal of universal and equitable access to scientific data and information. Data generated by citizen science groups have become an increasingly important source for scientists, applied users and those pursuing the 2030 Agenda for Sustainable Development. Citizen science data are used extensively in studies of biodiversity and pollution; crowdsourced data are being used by UN operational agencies for humanitarian activities; and citizen scientists are providing data relevant to monitoring the sustainable develop...
From Open Data to Open Science
118 Citations · 2021 · Rahul Ramachandran, Kaylin Bugbee, Kevin Murphy
Earth and Space Science
This study defines open science as a collaborative culture enabled by technology that empowers the open sharing of data, information, and knowledge within the scientific community and the wider public to accelerate scientific research and understanding.
Data science applications for predictive maintenance and materials science in context to Industry 4.0
116 Citations · 2021 · S. M. Sajid, Abid Haleem, Shashi Bahl + 3 more
Materials Today Proceedings
Through a literature review, the paper identifies five critical data-science processes for predictive maintenance, discusses each briefly, and identifies them as essential for Industry 4.0.
Ridge Regularization: An Essential Concept in Data Science
103 Citations · 2020 · Trevor Hastie
Technometrics
This article collects some of the magic and beauty of ridge regularization that my colleagues and I have encountered over the past 40 years in applied statistics.
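For readers new to ridge, a minimal worked form: with centred data, the ridge estimate has the closed form beta(lam) = (X^T X + lam I)^(-1) X^T y, which stays well defined even when X^T X is singular. The sketch below is a standard textbook construction rather than code from the article; the function name `ridge` and the synthetic collinear data are illustrative assumptions.

```python
import numpy as np


def ridge(X, y, lam):
    """Closed-form ridge estimate beta = (X^T X + lam * I)^(-1) X^T y.

    Assumes X and y are centred, so no intercept is fitted. Uses
    np.linalg.solve rather than forming the inverse explicitly.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


rng = np.random.default_rng(0)
x = rng.normal(size=50)
X_dup = np.column_stack([x, x])  # perfectly collinear: X^T X is singular
y = 3.0 * x

beta = ridge(X_dup, y, lam=1.0)
print(beta)  # the weight is split equally between the duplicate columns
for lam in (0.1, 1.0, 10.0):
    print(lam, np.linalg.norm(ridge(X_dup, y, lam)))  # shrinks as lam grows
```

Ordinary least squares has no unique solution for `X_dup`, while the ridge penalty both selects a unique, symmetric solution and shrinks it toward zero as `lam` increases.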
Data Science Meets Physical Organic Chemistry
102 Citations · 2021 · Jennifer M. Crawford, Cian Kingston, F. Dean Toste + 1 more
Accounts of Chemical Research
This work investigated whether enantioselectivity data from a reaction can be quantitatively connected to attributes of the reaction components, such as catalyst and substrate structural features, in order to harness data for asymmetric catalyst design. The authors developed a workflow that uses data science tools to relate computationally derived features of reaction components to enantioselectivity.
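One common way to make such a connection quantitative, assumed here rather than taken from the Account, is to convert enantiomeric excess into a free-energy difference, ddG = RT ln(er) with er = (1 + ee) / (1 - ee), and regress it on structural descriptors. The function names are hypothetical and the descriptor values are synthetic placeholders, not data from the paper.

```python
import numpy as np

R_KCAL = 1.987204e-3  # gas constant, kcal / (mol K)


def ddg_from_ee(ee, temp_k=298.15):
    """Convert enantiomeric excess (a fraction, 0 <= ee < 1) to ddG in kcal/mol.

    er = (1 + ee) / (1 - ee) is the enantiomeric ratio; ddG = R T ln(er).
    """
    ee = np.asarray(ee, dtype=float)
    er = (1.0 + ee) / (1.0 - ee)
    return R_KCAL * temp_k * np.log(er)


def fit_linear_model(descriptors, ddg):
    """Least-squares fit of ddg ~ b0 + descriptors @ b; returns [b0, b...]."""
    A = np.column_stack([np.ones(len(ddg)), descriptors])
    return np.linalg.lstsq(A, ddg, rcond=None)[0]


print(round(float(ddg_from_ee(0.90)), 2))  # 90% ee at 25 C -> 1.74
# Synthetic descriptor values (hypothetical), one column per feature.
descriptors = np.array([[0.0], [1.0], [2.0], [3.0]])
ddg = 0.5 + 0.3 * descriptors[:, 0]
print(fit_linear_model(descriptors, ddg))  # recovers intercept and slope
```

Working in ddG rather than raw ee puts selectivity on an energy scale, which is why a linear model in the descriptors is a natural first choice.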
Automated Experimentation Powers Data Science in Chemistry
109 Citations · 2021 · Yao Shi, Paloma L. Prieto, Tara Zepel + 2 more
Accounts of Chemical Research
This Account contextualizes the need for more complex and diverse experimental data and highlights how the seamless integration of robotics, machine learning, and data-rich monitoring techniques can be used to access it with minimal human labor.
The R Language: An Engine for Bioinformatics and Data Science
200 Citations · 2022 · Federico M. Giorgi, Carmine Ceraolo, Daniele Mercatelli
Life
The paper provides a historical chronicle of how R became what it is today, describes its current features and capabilities, and discusses the role of R in science in general as a driver of reproducibility.
Big data and machine learning for materials science
154 Citations · 2021 · José F. Rodrigues, Larisa Florea, Maria Cristina Ferreira de Oliveira + 2 more
Discover Materials
The authors propose a roadmap for future developments, with emphasis on computer-aided discovery of new materials and analysis of chemical sensing compounds, both prominent research fields for ML in the context of materials science.
The Science of Visual Data Communication: What Works
285 Citations · 2021 · Steven Franconeri, Lace Padilla, Priti Shah + 2 more
Psychological Science in the Public Interest
Effectively designed data visualizations allow viewers to use their powerful visual systems to understand patterns in data across science, education, health, and public policy. But ineffectively designed visualizations can cause confusion, misunderstanding, or even distrust—especially among viewers with low graphical literacy. We review research-backed guidelines for creating effective and intuitive visualizations oriented toward communicating data to students, coworkers, and the general public. We describe how the visual system can quickly extract broad statistics from a display, whereas poor...
Data-science driven autonomous process optimization
223 Citations · 2021 · Melodie Christensen, Lars P. E. Yunker, Folarin Adedeji + 8 more
Communications Chemistry
The authors develop a closed-loop system that carries out parallel autonomous process optimization experiments in batch with significantly reduced cycle times, and find that defining a set of meaningful, broad, and unbiased process parameters was the most critical aspect of a successful optimization.
Network analysis of multivariate data in psychological science
1067 Citations · 2021 · Denny Borsboom, Marie K. Deserno, Mijke Rhemtulla + 12 more
Nature Reviews Methods Primers
In recent years, network analysis has been applied to identify and analyse patterns of statistical association in multivariate psychological data. In these approaches, network nodes represent variables in a data set, and edges represent pairwise conditional associations between variables in the data, while conditioning on the remaining variables. This Primer provides an anatomy of these techniques, describes the current state of the art and discusses open problems. We identify relevant data structures in which network analysis may be applied: cross-sectional data, repeated measures and intensi...
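When the variables are treated as jointly Gaussian, the pairwise conditional associations described above are partial correlations, which can be read off the inverse covariance (precision) matrix. The minimal, unregularized sketch below assumes that setting; estimators used in practice often add regularization (for example the graphical lasso), and the Primer's specific methods are not reproduced here. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def partial_correlation_network(data):
    """Return a matrix of partial correlations between the columns of `data`.

    Entry (i, j) is the association between variables i and j conditioning on
    all remaining variables: pcor_ij = -P_ij / sqrt(P_ii * P_jj), where P is
    the inverse of the sample covariance matrix (precision matrix).
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 0.0)  # no self-loops in the network
    return pcor


# Synthetic chain A -> B -> C: A and C are marginally correlated,
# but nearly independent once B is conditioned on.
rng = np.random.default_rng(0)
a = rng.normal(size=5000)
b = a + rng.normal(size=5000)
c = b + rng.normal(size=5000)
data = np.column_stack([a, b, c])
print(np.round(np.corrcoef(data, rowvar=False)[0, 2], 2))  # marginal A-C
print(np.round(partial_correlation_network(data), 2))       # A-C edge near 0
```

The contrast between the marginal A-C correlation (about 0.58 in theory for this chain) and the near-zero A-C partial correlation is what lets the network omit a direct edge between A and C.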
Small data machine learning in materials science
588 Citations · 2023 · Pengcheng Xu, Xiaobo Ji, Minjie Li + 1 more
npj Computational Materials
This review discusses the dilemma of small data faced by materials machine learning and methods for dealing with small data at the data-source level, including data extraction from publications, materials database construction, and high-throughput computations and experiments.