Uncover influential research that defines the field of Data Science. Our curated list includes innovative studies that push the boundaries of data analysis, machine learning, and predictive modeling. Whether you're an academic, a professional, or an enthusiast, these papers offer valuable insights and advancements in the ever-evolving domain of Data Science.
Sirui Hong, Yizhang Lin, Bangbang Liu + 22 more
ArXiv
The Data Interpreter incorporates two key modules: 1) Hierarchical Graph Modeling, which breaks down complex problems into manageable subproblems, enabling dynamic node generation and graph optimization; and 2) Programmable Node Generation, a technique that refines and verifies each subproblem to iteratively improve code generation results and robustness.
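The decomposition idea in this summary can be illustrated with a small dependency graph. The following is a minimal sketch, not the Data Interpreter's actual implementation: the task names, the `solve` and `verify` callbacks, and the retry limit are all hypothetical, and the standard-library `graphlib` module stands in for whatever graph machinery the paper uses.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key depends on the tasks in its value set.
task_graph = {
    "load_data": set(),
    "clean_data": {"load_data"},
    "engineer_features": {"clean_data"},
    "train_model": {"engineer_features"},
    "evaluate_model": {"train_model", "clean_data"},
}


def execution_order(graph):
    """Return one valid order in which the subproblems can be solved."""
    return list(TopologicalSorter(graph).static_order())


def run_with_verification(graph, solve, verify, max_retries=2):
    """Solve each node in dependency order, retrying nodes that fail verification.

    `solve(node, results, attempt)` and `verify(node, candidate)` are
    caller-supplied, hypothetical callbacks (for example, code generation
    followed by a check of the generated code).
    """
    results = {}
    for node in execution_order(graph):
        for attempt in range(max_retries + 1):
            candidate = solve(node, results, attempt)
            if verify(node, candidate):
                results[node] = candidate
                break
        else:
            # No attempt passed verification within the retry budget.
            raise RuntimeError(f"node {node!r} failed verification")
    return results
```

Each node only sees results from nodes solved before it, so a failed subproblem can be retried without redoing the rest of the graph.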
Koby Mike and Orit Hazzan consider why multiple definitions are needed to pin down data science.
Data Science (DS), as defined by Jim Gray, is an emerging paradigm in all research areas that helps find non-obvious patterns of relevance in large distributed data collections, but it will take much more time to implement Open Science (OS) than the authors may have expected.
Narender Chinthamu, Manideep Karukuri
Journal of Data Science and Intelligent Systems
The paper highlights the significance of data science in enhancing the functionality of enterprise resource planning (ERP) systems, with AI-based solutions such as those offered by MahaaAi and other firms automating human tasks through chat-based ERP applications and virtual assistant support that reduce manual effort.
A purported 'AI Singularity' has been in the public eye recently. Mass media and US national political attention focused on 'AI Doom' narratives hawked by social media influencers. The European Commission is announcing initiatives to forestall 'AI Extinction'. In my opinion, 'AI Singularity' is the wrong narrative for what's happening now; recent happenings signal something else entirely. Something fundamental to computation-based research really changed in the last ten years. In certain fields, progress is dramatically more rapid than previously, as the fields undergo a transition to friction...
Neil D. Lawrence, Jessica Montgomery
Royal Society Open Science
This work suggests a framework for accelerating AI adoption that requires action to build supply chains of ideas between disciplines; rapidly transfer technological capabilities through open research; create AI tools that empower researchers; and embed effective data stewardship to cultivate an environment of open data science.
Elena Parmiggiani, Thomas Østerlie, P. Almklov
J. Assoc. Inf. Syst.
This paper draws on a longitudinal study of data management in the oil and gas industry to shed light on backroom data work, finding that this type of work is qualitatively different from the front-stage data analytics in the realm of data science but is also deeply interwoven with it.
Tijl De Bie, Luc de Raedt, J. Hernández-Orallo + 3 more
Communications of the ACM
Given the complexity of data science projects and related demand for human expertise, automation has the potential to transform the data science process.
Michael L. Brodie
ArXiv
This paper presents an axiology of data science, its purpose, nature, importance, risks, and value for problem solving, by exploring and evaluating its remarkable, definitive features.
This detailed guide to data modeling in the sciences is ideal for students and researchers keen to develop their understanding of probabilistic data modeling beyond the basics of p-values and fitting residuals.
Gabriel Neagu
Studies in Informatics and Control
The content focuses on concepts, principles and practical applications that are applicable to any industry and technology environment, and the learning is supported and explained with examples that you can replicate using open-source software.
Detailed information about each course unit is available by clicking the course code. In particular, the detailed schedule, day by day, and the corresponding classrooms are provided under the “Schedule” sub-title.
Hoang Thanh Lam, Beat Buesser, Hong Min + 7 more
2021 IEEE 37th International Conference on Data Engineering (ICDE)
A novel system called OneBM (One Button Machine), that enables data scientists to increase their efficiency with automated feature engineering for relational data by automatically identifying and executing relevant joins and aggregates in the data.
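To make "joins and aggregates" concrete, here is a minimal sketch of generating features for a parent table from a related child table. It is an illustration only, not OneBM's algorithm: the tables, the column names, and the fixed set of aggregate functions are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical relational data: one customer has many orders.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]

# Assumed aggregate set; a real system would choose these per column type.
AGGREGATES = {"count": len, "sum": sum, "mean": mean, "max": max}


def aggregate_features(parents, children, key, column, prefix):
    """Join children to parents on `key` and add one feature per aggregate."""
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row[column])

    enriched = []
    for parent in parents:
        values = grouped.get(parent[key], [])
        features = dict(parent)
        for name, fn in AGGREGATES.items():
            # Parents with no matching child rows get None for every feature.
            features[f"{prefix}_{column}_{name}"] = fn(values) if values else None
        enriched.append(features)
    return enriched


features = aggregate_features(customers, orders, "customer_id", "amount", "orders")
```

Here the first customer row gains `orders_amount_count`, `orders_amount_sum`, `orders_amount_mean`, and `orders_amount_max` columns computed from its two orders.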
M. S. El-Nasr, Alessandro Canossa, Truong-Huy D. Nguyen + 1 more
journal unavailable
This book aims to give readers an introduction to the practical side of game data science and thus can be used as a textbook for a game analytics or game user research class, or as a reference for self-learners and enthusiasts.
Alex Bogatu, N. Paton, Mark Douthwaite + 1 more
journal unavailable
The design decisions made in the development of a system to support data discovery and integration are reported, along with an evaluation that investigates both usability and task efficiency.
Yazhen Wang
Harvard Data Science Review
Rotem Israel-Fishelson, Peter F. Moon, Rachel Tabak + 1 more
Issue 6.2, Spring 2024
A qualitative analysis of four data science curricula shows that the curricula use relatively recent and small datasets covering a range of topics and that there is limited learner involvement in dataset selection, and reveals gaps between the datasets used and students' self-reported interests.
Lijing Wang, David Zhen Yin, J. Caers
journal unavailable
Data Science for the Geosciences focuses on techniques that address common characteristics of geoscientific data, including extremes, multivariate, compositional, geospatial and space-time methods, and is the perfect text for those with limited mathematical or coding experience.
P. Rodrigues, E. Carfagna
Environmetrics
This opinion piece will include a limited number of examples that highlight the usefulness of data science in environmetrics, and a specific illustration of the behavior of the wildfires in Brazil between January and December of 2021.
Liqiang Jing, Zhehui Huang, Xiaoyang Wang + 6 more
ArXiv
DSBench is introduced, a comprehensive benchmark designed to evaluate data science agents with realistic tasks, and shows that they struggle with most tasks, with the best agent solving only 34.12% of data analysis tasks and achieving a 34.74% Relative Performance Gap (RPG).
Allison S. Theobold
Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1
This article details how alternative grading, specifically "ungrading," was integrated into an introductory data science course and discusses how infusing alternative methods of assessment into the classroom stands to cultivate the diversity continually lacking in computer science and data science.
Michael J. Muller, Angelika Strohmayer
Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems
A taxonomy of data silences in data work is used to analyze how data workers forget, erase, and unknow aspects of data and an analytic vocabulary for future work in remembering, forgetting, and erasing in HCI and the data sciences is contributed.
Christopher Dawson, Steve Mann, Edward Roske + 1 more
journal unavailable
A novel application of natural language processing techniques classifies unstructured text into toxic and non-toxic categories; among the algorithms compared, LSTM showed the most promising performance, with an accuracy of more than 70%.
Anjali N. Nair, Ambika Biradar, R. Menon
INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH
The role of mathematics in data science is discussed, with the aim of understanding how an algorithm works: what is happening, why it is happening, and how to optimize it to obtain the required result.
The techniques discussed include k-anonymity, differential privacy, homomorphic encryption, and zero-knowledge proofs to address privacy concerns; measures to assess, and techniques to remove, discrimination against sensitive groups; and various explainable AI techniques.
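Two of the privacy techniques named here are simple enough to sketch. The example below is a hedged illustration, not taken from the work summarized: the records, the choice of quasi-identifiers, and the function names are hypothetical. It checks k-anonymity (every combination of quasi-identifier values occurs at least k times) and adds Laplace noise to a count, which is the standard mechanism for a counting query with sensitivity 1 under epsilon-differential privacy.

```python
import random
from collections import Counter

# Hypothetical, already-generalized records (age bands, 3-digit ZIP prefixes).
records = [
    {"age_band": "30-39", "zip3": "130", "diagnosis": "flu"},
    {"age_band": "30-39", "zip3": "130", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip3": "148", "diagnosis": "flu"},
    {"age_band": "40-49", "zip3": "148", "diagnosis": "diabetes"},
]


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination appears in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(size >= k for size in groups.values())


def laplace_count(true_count, epsilon, rng=random):
    """Return a count perturbed with Laplace(0, 1/epsilon) noise.

    A counting query has sensitivity 1, so the noise scale is 1/epsilon.
    The difference of two i.i.d. exponential draws is Laplace-distributed.
    """
    scale = 1.0 / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise
```

With these records, `is_k_anonymous(records, ["age_band", "zip3"], 2)` holds while the same check with `k=3` fails; a smaller `epsilon` in `laplace_count` gives stronger privacy at the cost of noisier answers.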
R. Peng, H. Parker
Annual Review of Statistics and Its Application
This review attempts to distill some core ideas from data science by focusing on the iterative process of data analysis and develop some generalizations from past experience that form the basis of a theory of data science.
S. Lipovetsky
Technometrics
This section will review those books whose content and level reflect the general editorial policy of Technometrics. Publishers should send books for review to Ejaz Ahmed, Department of Mathematics and Sciences, Brock University, St. Catharines, ON L2S 3A1 (dean.fms@brocku.ca). The opinions expressed in this section are those of the reviewers. These opinions do not represent positions of the reviewers’ organization and may not reflect those of the editors or the sponsoring societies. Listed prices reflect information provided by the publisher and may not be current. The book purchase programs o...
Jacob Sagrans, Janice Mokros, Christine Voyer + 1 more
The Science Teacher
The authors have been implementing the NSF-funded "Data Clubs" project to examine using data sets on topics such as ticks and Lyme disease, COVID-19, and sports and leisure injuries with youth in out-of-school settings.
Kayla O'Leary, Abebe Rorissa, Jeonghyun Kim
Proceedings of the ALISE Annual Conference
This study attempts to address current and future workforce demands in data librarianship and analytics, employer skill demand, and acquisition of skills, through a content analysis of 294 distinct graduate program curricula involving significant data science coursework in 43 LIS schools and iSchools across North America.
Thomas Neifer, Dennis Lawo, Margarita Esau
journal unavailable
The Data Science Canvas was developed in an expert workshop and evaluated by practitioners to find out whether such an instrument could support data-driven value creation.
A systematic, end-to-end framing of the field is described, based on an inclusive definition of data science.
Maria-Theresia Verwega, Carola Trahms, A. Antia + 8 more
journal unavailable
Perspectives for defining Marine Data Science as a distinct discipline are presented, and its methods are characterized as a toolbox of skills drawn from its two parent sciences, which form the foundation of Marine Data Science.
Margarita Boenig-Liptsin, A. Tanweer, A. Edmundson
Journal of Statistics and Data Science Education
The theoretical foundations from the fields of Science, Technology and Society, feminist theory, and critical race theory that animate the Ethos Lifecycle are discussed and it is shown how these orient the tool toward a normative commitment to justice and what is called the “world-making” view of data science.
M. Zarbin, Aaron Lee, P. Keane + 1 more
Translational Vision Science & Technology
Typically, data scientists undertake exploratory data analysis by deploying machine learning principles and algorithms to identify patterns in raw data with the purpose of understanding processes and predicting outcomes.
Yuge Zhang, Qiyang Jiang, Xingyu Han + 3 more
journal unavailable
DSEval is introduced -- a novel evaluation paradigm, as well as a series of innovative benchmarks tailored for assessing the performance of large language models throughout the entire data science lifecycle, incorporating a novel bootstrapped annotation method.
Alfred Hero, S. Kar, José M. F. Moura + 4 more
Issue 5.1, Winter 2023
To address these emerging vulnerabilities, data science has much to contribute, including methods of distributed statistical inference and data fusion.
George-John Nychas, E. Sims, P. Tsakanikas + 1 more
Annual Review of Biomedical Data Science
A massive amount of data is generated, not only from the next generation of food safety monitoring systems and along the entire food chain but also from the Internet of things, media, and other devices, and the scientific field of data science should be a vital player in helping to make this possible.
Data Science in the Library: Tools and Strategies for Supporting Data-Driven Research and Instruction brings together an international group of librarians and faculty to consider the opportunities afforded by data science for research libraries.
Mohammed Mahmoud
Technologies
Big Data analysis is one of the most contemporary areas of development and research in the present day [...]
Adriane P. Chapman, Luca Lauro, P. Missier + 1 more
Proc. VLDB Endow.
The DPDS toolkit implements an observer pattern that is able to capture the fine-grained provenance associated with each individual element of a dataframe, across multiple transformation steps, with the goal of helping engineers and analysts to justify and explain their choice of data operations.
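The observer pattern mentioned in this summary can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the DPDS toolkit's API: the class names `ObservedTable` and `ProvenanceLog`, and the row-of-dicts table representation, are hypothetical.

```python
class ProvenanceLog:
    """Observer that records every element-level change it is notified of."""

    def __init__(self):
        self.records = []

    def notify(self, step, row_index, column, before, after):
        self.records.append((step, row_index, column, before, after))


class ObservedTable:
    """Toy table (list of dict rows) that reports changes to attached observers."""

    def __init__(self, rows):
        self.rows = [dict(row) for row in rows]  # copy so inputs stay untouched
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def apply(self, step, column, fn):
        """Apply `fn` to one column; notify observers for each changed element."""
        for index, row in enumerate(self.rows):
            before = row[column]
            after = fn(before)
            if after != before:
                row[column] = after
                for observer in self._observers:
                    observer.notify(step, index, column, before, after)
        return self


log = ProvenanceLog()
table = ObservedTable([{"age": None}, {"age": 41}])
table.attach(log)
table.apply("impute_age", "age", lambda value: 0 if value is None else value)
```

After the imputation step, `log.records` holds one entry describing which element changed, in which step, and its values before and after, which is the kind of fine-grained trail an analyst could use to explain a transformation.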
Towards the end of the Cold War in 1985, in reference to the theory of leadership for the first time, in the book ‘Leaders: The Strategies For Taking Charge’ by Warren Bennis and Burt Nanus [...]
Chengcheng Qian, Baoxiang Huang, Xueqing Yang + 1 more
Big Earth Data
This paper analyzes the present and makes predictions for the future regarding the use of big and small data in ocean science.
A broad range of topics is covered, including correlation, regression, classification, clustering, neural networks, random forests, boosting, kernel methods, evolutionary algorithms and deep learning, as well as the recent merging of machine learning and physics.
This work will outline how existing data visualization techniques are already successfully employed in different data science workflow stages and highlight the differences among the libraries and applications currently available.
C. Akcora, Murat Kantarcioglu, Y. Gel
Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
This tutorial will detail the state of the art in Blockchain data analytics for graph, security, and finance domains and offer a holistic view of applied Data Science on Blockchains.
Christian Varner, Vivak Patel
journal unavailable
It is shown that optimization problems arising in data science are particularly difficult to solve, and that there is a need for methods that can reliably and practically solve these problems.
Jessica A. R. Logan, S. Hart, C. Schatschneider
AERA Open
This article addresses the wide range of benefits of data sharing and the many ways in which data can be shared, provides a step-by-step guide to start sharing data, and responds to common concerns.
Dhrithi Deshpande, Karishma Chhugani, Yutong Chang + 15 more
Frontiers in Genetics
The purpose of this review is to explain basic concepts in the computational analysis of RNA-seq data and define discipline-specific jargon.
This Perspective surveys the rapidly expanding new field of data science in cell imaging, highlighting how data science tools are used within current image analysis pipelines, proposing a computation-first approach to derive new hypotheses from cell image data, and describing the next frontiers where data science will make an impact.
Alwiyah
International Transactions on Artificial Intelligence (ITALIC)
With the development of technology, new data analysis methods can be used to overcome the problem of managing and analyzing the increasing amount of data available.