Uncover influential research that defines the field of Data Science. Our curated list includes innovative studies that push the boundaries of data analysis, machine learning, and predictive modeling. Whether you're an academic, a professional, or an enthusiast, these papers offer valuable insights and advancements in the ever-evolving domain of Data Science.
D. Blei, Padhraic Smyth
Proceedings of the National Academy of Sciences
This article discusses data science from three perspectives: statistical, computational, and human, and argues that the effective combination of all three components is the essence of what data science is about.
I came into data science from the industrial side, and when I saw that Harvard Business Review already in 2012 had declared “Data Scientist” to be “The Sexiest Job of the 21st Century” [3], I wanted to become one too.
This report summarizes two talks that I gave at the Advanced Future Studies at Kyoto University in February of 2016, which provided an overview of an emerging research trend—the emergence of a new discipline called the Science of Science.
Learn about data science resources, analysis, communities, and data management, as well as the openly available datasets and the dataset purchase program.
Data Science (DS) as defined by Jim Gray is an emerging paradigm in all research areas to help find non-obvious patterns of relevance in large distributed data collections, but it will take much more time to implement Open Science (OS) than the authors may have expected.
This discussion is about changes in the world, changes that affect us as academics, changes that affect us as empirical researchers (qualitative and quantitative), and changes that affect our students and our universities/colleges. I am an empirical social scientist, a political methodologist, and a statistician, so I will only discuss topics along these lines where I am qualified to make comments. As of this re-rewriting, the country and the world are in various levels of lockdown and recovery because of Covid-19. During this difficult process it is clear that data and privacy issues are changi...
A VP of Engineering at a startup doing data mining and machine learning research explains how to get research into the hands of customers faster.
A new methodology for analysis of precipitate shapes using a segmentation-free approach based on the histogram of oriented gradients feature descriptor (HOG), a classic tool in image analysis, is demonstrated.
Will Sherman, Kati Schuerger, Randy Kim + 1 more
journal unavailable
The M3-Competition found that simple models outperform more complex ones for time series forecasting. As part of these competitions, several claims were made that statistical models exceeded machine learning (ML) techniques, such as recurrent neural networks (RNN), in prediction performance. These findings may over-generalize the capabilities of statistical models since the analysis measured the total forecasting accuracy across a wide range of industries and fields and with different interval lengths. This investigation aimed to assess how statistical and ML methods compared when individuat...
Lessons learned managing a data science research team are shared to help improve the quality of research and reduce the amount of uncertainty in the research process.
J. D. Horn, Lily Fierro, Jeana Kamdar + 11 more
Pacific Symposium on Biocomputing
The activities of the BD2K TCC are described, with a focus on the construction of the Educational Resource Discovery Index (ERuDIte), which identifies, collects, describes, and organizes online data science materials from BD2K awardees, open online courses, and videos from scientific lectures and tutorials.
K. McGrail, K. Jones
International Journal of Population Data Science
These implications are the beginnings of a research agenda for Population Data Science, which if approached as a collective field will catalyze significant advances in the understanding of society, health, and human behavior and increase the impact of the research.
Triston Hudgins, Shijo Joseph, Douglas Yip + 1 more
journal unavailable
This study concludes that review helpfulness can be effectively predicted from model features, enabling strategies to moderate a user review post and improve the helpfulness of a review.
Ravindra Thanniru, Gautam Kapila, Nibhrat Lohia
journal unavailable
This paper researches application of Machine Learning approaches for memory element failure analysis which could mimic simulation-like accuracy and minimize the need for engineers to rely heavily on simulators for their validations.
J. Saltz, F. Armour, R. Sharda
Commun. Assoc. Inf. Syst.
Information system educators seeking a better understanding of current trends in data science/analytics education, and other information system researchers interested in how data science/analytics might impact the broader field of information systems and management education, should find this report of interest.
Tai Chowdhury, Ravi Sivaraman, Apurv Mittal + 2 more
journal unavailable
A novel machine learning-based framework, the DARTH framework, that characterizes and combines multiple models, with one model for each composite feature, that enables the accurate identification of phishing emails is presented.
Anthony Yeung, Emmanuel Onyeka, Joe Chung + 1 more
journal unavailable
This paper explores in-depth the simulation model of Moving Average and Moving Average Convergence/Divergence (MACD) to come up with optimized parameters that will allow traders to profit from trading Dow Jones Industrial Index and Hang Seng Index.
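For readers unfamiliar with the indicator, the MACD line is the difference between a fast and a slow exponential moving average (EMA) of the price, and the signal line is an EMA of the MACD line. The Python sketch below illustrates that definition only. The function names are hypothetical, the 12/26/9 spans are the conventional defaults rather than the optimized parameters reported in the paper, and seeding each EMA with the first value is an assumption of this sketch.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1).

    Assumption of this sketch: the average is seeded with the first value.
    """
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(prices, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) for a non-empty price list.

    The 12/26/9 defaults are the conventional choice, not tuned values.
    """
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    # MACD line: fast EMA minus slow EMA.
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    # Signal line: EMA of the MACD line itself.
    signal_line = ema(macd_line, signal)
    # Histogram: gap between the MACD line and its signal line.
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


# Illustrative data only: a steadily rising price series.
prices = [float(p) for p in range(1, 61)]
macd_line, signal_line, histogram = macd(prices)
print(macd_line[-1] > 0)  # a rising series yields a positive MACD line
```

A common trading rule treats the MACD line crossing above the signal line as a buy signal and crossing below as a sell signal; searching over the `fast`, `slow`, and `signal` spans is one way such parameters could be tuned.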
A broad discussion of data management in the sciences, and of how libraries and librarians can embed themselves in the data lifecycle, is presented, along with specific examples of how libraries have become involved with research data services.
This article provides a comprehensive survey and tutorial of the fundamental aspects of data science: the evolution from data analysis to data science, the data science concepts, a big picture of the era of data science, the major challenges and directions in data innovation, the nature of data analytics, new industrialization and service opportunities in the data economy, the profession and competency of data education, and the future of data science.
Machine learning is a highly influential field that has made major contributions to the increased effectiveness of artificial intelligence by utilizing different methods, four of which have been particularly effective.
While it may not be possible to build a data brain identical to a human, data science can still aspire to imaginative machine thinking.
Mahsa Ghasemi
International Journal of Advanced Research in Science, Communication and Technology
The main aim of Data Science is to turn large sets of both unstructured and structured data into valuable information that can help organisations make strong data-driven decisions.
W. Auzinger, I. Březinová, Alexander Grosz + 3 more
journal unavailable
Among the most popular integrators such as Runge–Kutta methods, time-splitting, exponential integrators and Lawson methods, exponential Lawson multistep methods with one predictor–corrector step provide the best stability and accuracy at the least effort.
Understanding published research results should be through one’s own eyes and include the raw diffraction data, an option that has recently become viable at various data archives.
Detailed information about each course unit is available by clicking the course code. In particular, the detailed schedule, day by day, and the corresponding classrooms are provided under the “Schedule” sub-title.
Herlambang Dwi Prasetyo, Pandu Ananto, Ika Nurlaili Isnainiyah
journal unavailable
The author wants to create a diabetes prediction system independently through a website-based application system using the XGBoost algorithm, which has an accuracy of 74.67%, a precision value of 57.40%, a recall value of 65.94% and a specificity value of 78.50%.
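The four figures quoted above are all standard functions of a binary confusion matrix. The Python sketch below shows those definitions only. The function name and the example counts are hypothetical and are not the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives, false negatives.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are correct
        "precision": tp / (tp + fp),     # share of predicted positives that are real
        "recall": tp / (tp + fn),        # share of real positives that are found
        "specificity": tn / (tn + fp),   # share of real negatives that are found
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=30, fp=20, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these illustrative counts, accuracy is 70/100, precision is 30/50, recall is 30/40, and specificity is 40/60, which shows how a model can have moderate accuracy while its precision is noticeably lower.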
The Bachelor of Science in Data Science studies the collection, manipulation, storage, retrieval, and computational analysis of data in its various forms, including numeric, textual, image, and video data from small to large volumes. The program combines computer science, information science, mathematics, statistics, and probability theory into an integrated curriculum that prepares students for careers or graduate studies in big data analysis, data science, and data analytics. The coursework covers exploratory data analysis, data manipulation in a variety of programming languages, large-scale...
The technological revolution has led to an explosion of data in domains of knowledge, and new methodologies have emerged to power intelligent systems, make more accurate predictions, and gain new insight using the large volumes of data generated by scientists, entrepreneurs, and analysts.
This paper aims to reveal the obstacles and limitations that keep other sciences from fully becoming a data science. On that basis, the definition of data science needs to be elaborated, confirming data science as a new science that does not depend directly on several other sciences.
Design, development, and evaluation of 3D user interfaces; symbolic, menu, gestural, and multimodal interaction; interaction techniques and metaphors; immersive.
A review of the impact of information security on the government and companies is described in terms of threats and types of information safety, including application security, cloud security, cryptography, security infrastructure, incident response, and vulnerability management.
An increasing number of consequential decisions are made automatically by software that employs machine learning, data analytics, and artificial intelligence to discover decision rules from data, which makes it important to ensure good governance of these technologies and to build accountable algorithms.
Zarek Drozda, J. Walker, Kathi Fisler + 1 more
Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 2
This panel will explore what DS education and CS education can learn from each other, how each can contribute and advance the goals of the other, and how these two intertwined disciplines can productively live alongside each other in K-12 settings.
Alessandro Mantelero, G. Vaciago
journal unavailable
This chapter investigates the limits and criticisms of the existing legal framework and the possible options to provide adequate answers to the new challenges of Big Data processing and suggests a broader approach that encompasses the collective dimension of data protection.
Jeff Reed (CFA, MBA), Allen Hoskins (MBA), Robert Slater (PhD)
journal unavailable
A chasm exists between the active public equity investment management industry's fundamental, momentum, and quantitative styles. In this study, the researchers explore ways to bridge this gap by leveraging domain knowledge, fundamental analysis, momentum, crowdsourcing, and data science methods. This research also seeks to test the developed tools and strategies during the volatile time period of 2020 and 2021.
Samuel Onalaja, Eric Romero, Bo Yun + 1 more
journal unavailable
The study shows that assigning higher driving factors to certain aspects and genres results in higher accuracy for the sentiment prediction models used in this research.
H. Uzunalioglu, Jin Cao, C. Phadke + 5 more
ArXiv
Key features of ADS are the replacement of rudimentary data exploration and processing steps with automation and the augmentation of data scientist judgment with automatically-generated insights in a domain-agnostic way to facilitate the data science process.
Michael L. Brodie
ArXiv
This paper presents an axiology of data science, its purpose, nature, importance, risks, and value for problem solving, by exploring and evaluating its remarkable, definitive features.
Michael Schulte, Ranjan Karki, Nibhrat Lohia
journal unavailable
This paper aims to develop a method to build an interpretable model for univariate and multivariate nonlinear time series data using wavelets and symbolic regression and relies on multilayer perceptron (MLP) neural networks as a form of dimensionality reduction and the PySR algorithm to determine the symbolic relationships.
Rajesh Satluri, Suchismita Moharana, Venkat Kasarla + 1 more
journal unavailable
Using the Jaccard similarity coefficient in a knowledge graph, this study identifies and explores relationships between COVID-19 cases and predicts the vulnerability of the general population in a vicinity.
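For readers unfamiliar with the measure, the Jaccard similarity coefficient of two sets is the size of their intersection divided by the size of their union. The Python sketch below illustrates the formula only. The function name and the example attribute sets are hypothetical and are not taken from the study, and returning 0.0 for two empty sets is a convention chosen for this sketch.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Return |a ∩ b| / |a ∪ b|.

    Convention assumed here: two empty sets have similarity 0.0.
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical attribute sets for two cases (illustrative values only).
case_a = {"fever", "cough", "travel"}
case_b = {"fever", "cough", "contact"}
print(jaccard_similarity(case_a, case_b))  # 2 shared / 4 distinct = 0.5
```

In a graph setting, one plausible use is to connect two nodes with an edge whenever their similarity exceeds a chosen threshold; the threshold would be a modelling decision.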
Christopher Dawson, Steve Mann, Edward Roske + 1 more
journal unavailable
Over 87% of the streaming music is owned by four major record labels (Jones, 2018). Yet, the songs owned by those labels account for <1% of the total amount of music created each year. These labels are historically better at identifying talent (though this talent identification is becoming more difficult). Even though Spotify has 36% of the streaming market share (T4, 2021), Spotify has not been profitable because of the large licensing costs paid to the large music labels. If Spotify could identify hit songs & artists before the large labels, they would sign those artists and dramaticall...
Alan Abadzic, Milan Patel, Jacquie Cheun-Jensen
journal unavailable
This study presents a comprehensive analysis of player performance data to forecast the top 12 fantasy points performers per position for the upcoming season and identifies key performance indicators and trends to inform player evaluations.
S. Zaheri, J. Leath, David Stroud
journal unavailable
A novel application of Natural Language Processing techniques classifies unstructured text into toxic and non-toxic categories; among all algorithms tested, LSTM showed a promising accuracy of more than 70%.
Matthew Baldree, Paul Widhalm, Brandon Hill + 1 more
journal unavailable
A tool provides trading recommendations for cryptocurrency using a stochastic gradient boosting classifier trained from a model labeled by technical indicators; the study concludes that Bitcoin is a unique asset with similarities to gold.
Matthew David, William Jones, Hayley Horn
journal unavailable
In recent years, the adoption of complex machine learning algorithms, often perceived as “black box” models, has grown exponentially across various disciplines. However, the lack of understanding regarding how these models come to their predictions often fosters skepticism and mistrust. In response to the demand for transparency and interpretability, Explainable AI techniques, such as SHapley Additive exPlanations (SHAP), have emerged as powerful tools for comprehending and trusting these algorithms. However, SHAP has an exponential computational demand O(x^2), where x is the number of f...
Hannah Kosinovsky, Sita Daggubati, Kumar Ramasundaram + 1 more
journal unavailable
Using the previous years of parts sales data for a supplier to the oil and gas industry in North America, a novel method to predict demand with a minimal error rate is found.
Helene Barrera, Justin Ehly, Blake A. Freeman + 2 more
journal unavailable
This study focuses on the exploration of feature selection through building multiple models, one simple linear model and one decision tree model for prediction on inpatient hospitalization rates, which will result in a highly interpretable model that can be more readily understood and easily used.
Tanya Garg, Reenu Rani
PARIPEX INDIAN JOURNAL OF RESEARCH
This paper examines the Finance and Banking industry, highlighting its issues and emphasizing the crucial role that Data Science plays in solving them.
Daanesh Ibrahim, Jules Stacy, D. Stroud + 2 more
journal unavailable
A Proximal Policy Optimization model was found that was able to learn how to play the game and consistently increase its reward function scores over time; it is recommended for future tasks in similar spaces.
T. Wiktorski, Y. Demchenko, A. Belloum
2017 IEEE International Conference on Cloud Computing Technology and Science (CloudCom)
The DSBoK provides a basis for structuring the proposed MC-DS by Knowledge Area Groups (KAG) defined in correspondence with the CF-DS competence groups.