Top Research Papers on Data Mining
Looking to dive deep into data mining? This curated list of top research papers on data mining covers methodologies, surveys, and recent advances. It is aimed at enthusiasts, researchers, and professionals in the field.
Data Mining and Machine Learning
129 Citations 2020 Mohammed J. Zaki, Wagner Meira
Cambridge University Press eBooks
The fundamental algorithms in data mining and machine learning form the basis of data science, utilizing automated methods to analyze patterns and models for all kinds of data in applications ranging from scientific discovery to business analytics. This textbook for senior undergraduate and graduate courses provides a comprehensive, in-depth overview of data mining, machine learning and statistics, offering solid guidance for students, researchers, and practitioners. The book lays the foundations of data analysis, pattern mining, clustering, classification and regression, with a focus on the a...
Data augmentation in microscopic images for material data mining
113 Citations 2020 Boyuan Ma, Xiaoyan Wei, Chuni Liu + 11 more
npj Computational Materials
A novel transfer learning strategy to address problems of small or insufficient data by fusing the images obtained from simulating the physical mechanism of grain formation and the “image style” information in real images to generate synthetic data.
Text Mining in Big Data Analytics
301 Citations 2020 Hossein Hassani, Christina Beneki, Stephan Unger + 2 more
Big Data and Cognitive Computing
The state-of-the-art text mining approaches and techniques used for analyzing transcripts and speeches, meeting transcripts, and academic journal articles, as well as websites, emails, blogs, and social media platforms, are investigated.
Fast Utility Mining on Sequence Data
101 Citations 2020 Wensheng Gan, Jerry Chun‐Wei Lin, Jiexiong Zhang + 3 more
IEEE Transactions on Cybernetics
An efficient algorithm for high-utility sequential pattern (HUSP) mining, HUSP mining with UL-list (HUSP-ULL), which utilizes a lexicographic $q$-sequence (LQS)-tree and a utility-linked (UL)-list structure to quickly discover HUSPs.
Deep learning in mining biological data
421 Citations 2021 Mufti Mahmud, M. Shamim Kaiser, T.M. McGinnity + 1 more
Nottingham Trent University's Institutional Repository (Nottingham Trent Repository)
Focusing on the use of DL to analyse patterns in data from diverse biological domains, this work investigates the applications of different DL architectures to these data, performs a meta-analysis, and critically analyses the resulting resources.
Big data management in the mining industry
194 Citations 2020 Chongchong Qi
International Journal of Minerals Metallurgy and Materials
A brief introduction to big data and big data management (BDM) is provided, the precautions for the utilization of BDM in the mining industry are outlined, and a future is envisioned in which a global database project is established and big data is used together with other technologies, supported by government policies and following international standards.
Machine learning and data mining in manufacturing
623 Citations 2020 Alican Doğan, Derya Birant
Expert Systems with Applications
A comprehensive literature review provides an overview of how machine learning techniques can be applied to realize manufacturing mechanisms with intelligent actions, and points to several significant research questions left unanswered in the recent literature with the same target.
Big Data Analysis and Perturbation using Data Mining Algorithm
177 Citations 2021 Haoxiang Wang, S. Smys
Journal of Soft Computing Paradigm
Experimental analysis indicates that the proposed work is more successful in terms of attack resistance, scalability, execution speed and accuracy when compared with other algorithms that are used for privacy preservation.
Mining Facebook Data for Predictive Personality Modeling
134 Citations 2021 Dejan Markovikj, Sonja Gievska, Michał Kosiński + 1 more
Proceedings of the International AAAI Conference on Web and Social Media
This paper explores the feasibility of modeling user personality based on a proposed set of features extracted from the Facebook data, and explores the suitability and performance of several classification techniques.
A global-scale data set of mining areas
250 Citations 2020 Victor Maus, Stefan Giljum, Jakob Gutschlhofer + 6 more
Scientific Data
The area used for mineral extraction is a key indicator for understanding and mitigating the environmental impacts caused by the extractive sector. To date, worldwide data products on mineral extraction do not report the area used by mining activities. In this paper, we contribute to filling this gap by presenting a new data set of mining extents derived by visual interpretation of satellite images. We delineated mining areas within a 10 km buffer from the approximate geographical coordinates of more than six thousand active mining sites across the globe. The result is a global-scale data set ...
Mining Big Data in Education: Affordances and Challenges
373 Citations 2020 Christian Fischer, Zachary A. Pardos, Ryan S. Baker + 6 more
Review of Research in Education
This chapter outlines current challenges of accessing, analyzing, and using big data and argues that addressing these challenges is worthwhile given the potential benefits of mining big data in education.
Brief introduction of medical database and data mining technology in big data era
576 Citations 2020 Jin Yang, Yuanjie Li, Qingqing Liu + 6 more
Journal of Evidence-Based Medicine
This work has introduced several databases and data mining techniques to help a wide range of clinical researchers better understand and apply database technology.
The Secondary Use of Electronic Health Records for Data Mining: Data Characteristics and Challenges
109 Citations 2022 Tabinda Sarwar, Sattar Seifollahi, Jeffrey Chan + 5 more
ACM Computing Surveys
An overview of information found in EHR systems and their characteristics that could be utilized for secondary applications is provided and can serve as a primer for researchers to understand the use of EHRs for data mining and analytics purposes.
Internet of things and data mining: An application oriented survey
106 Citations 2020 Priyank Sunhare, Rameez Raja Chowdhary, Manju K. Chattopadhyay
Journal of King Saud University - Computer and Information Sciences
A systematic and detailed review of various data mining techniques employed in the large and small scale IoT applications to formulate an intelligent environment and an overview of cloud-assisted IoT Big data mining system are presented to better understand the importance of data mining for an IoT environment.
A community resource for paired genomic and metabolomic data mining
124 Citations 2021 Michelle Schorn, Stefan Verhoeven, Lars Ridder + 107 more
Nature Chemical Biology
The Paired Omics Data Platform is a community initiative to systematically document links between metabolome and (meta)genome data, aiding identification of natural product biosynthetic origins and metabolite structures.
Educational data mining and learning analytics: An updated survey
891 Citations 2020 Cristóbal Romero, Sebastián Ventura
Wiley Interdisciplinary Reviews Data Mining and Knowledge Discovery
The current state of the art in data mining in education is provided by reviewing the main publications, the key milestones, the knowledge discovery cycle, the main educational environments, the specific tools, the freely available datasets, the most used methods, the main objectives, and the future trends in this research area.
Deep Learning for Spatio-Temporal Data Mining: A Survey
713 Citations 2020 Senzhang Wang, Jiannong Cao, Philip S. Yu
IEEE Transactions on Knowledge and Data Engineering
A comprehensive review of recent progress in applying deep learning techniques for spatio-temporal data mining (STDM) in different domains including transportation, on-demand service, climate & weather analysis, human mobility, location-based social network, crime analysis, and neuroscience is provided.
Next-Generation Morphometry for pathomics-data mining in histopathology
122 Citations 2023 David L. Hölscher, Nassim Bouteldja, Mehdi Joodaki + 14 more
Nature Communications
This study provides a concept for Next-generation Morphometry (NGM), enabling comprehensive quantitative pathology data mining, i.e., pathomics, and shows that the extracted features are independent predictors of long-term clinical outcomes in IgA-nephropathy.
Spatiotemporal data mining: a survey on challenges and open problems
133 Citations 2022 Ali Hamdi, Khaled Shaban, Abdelkarim Erradi + 3 more
RMIT Research Repository (RMIT University Library)
This work investigates the challenging issues with regard to spatiotemporal relationships, interdisciplinarity, discretisation, and data characteristics related to STDM tasks of classification, clustering, hotspot detection, association and pattern mining, outlier detection, visualisation, visual analytics, and computer vision tasks.
Proceedings of the 2020 SIAM International Conference on Data Mining
312 Citations 2020 Carlotta Demeniconi, Nitesh V. Chawla; SIAM International Conference on Data Mining 2020, Cincinnati, Ohio
Society for Industrial and Applied Mathematics eBooks
Data mining is an important tool in science, engineering, industrial processes, healthcare, business, and medicine. The datasets in these fields are large, complex, and often noisy. Extracting knowledge requires the use of sophisticated, high performance and principled analysis techniques and algorithms, based on sound theoretical and statistical foundations. These techniques in turn require implementations that are carefully tuned for performance; powerful visualization technologies; interface systems that are usable by scientists, engineers, and physicians as well as researchers; and infrast...
Sentiment analysis and opinion mining on educational data: A survey
163 Citations 2022 Thanveer Shaik, Xiaohui Tao, Christopher Dann + 3 more
Natural Language Processing Journal
The role of emotional analysis in education at four levels (document level, sentence level, entity level, and aspect level) and the role of AI in sentiment analysis, with methodologies like machine learning, deep learning, and transformers, are discussed.
Data mining approach to shipping route characterization and anomaly detection based on AIS data
215 Citations 2020 Hao Rong, A.P. Teixeira, C. Guedes Soares
Ocean Engineering
The approach consists of identifying relevant waypoints along a route where significant changes in the ships’ navigational behaviour are observed, such as changes in heading, using trajectory compression and clustering algorithms to provide a vector-based representation of the ship routes consisting of straight legs and connecting turning sections that facilitates route probabilistic characterization and anomaly detection.
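The waypoint idea in this abstract can be illustrated with a small sketch. The abstract does not say which trajectory compression algorithm is used, so the Douglas-Peucker algorithm below is an assumption chosen for illustration, as are the function names, the planar (x, y) coordinates, and the sample track. Real AIS positions are latitude/longitude pairs and would need a suitable projection or distance measure first.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, epsilon):
    """Keep only points that deviate more than epsilon from a straight leg."""
    if len(points) < 3:
        return list(points)
    dists = [point_segment_distance(p, points[0], points[-1])
             for p in points[1:-1]]
    idx = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[idx - 1] <= epsilon:
        # Every interior point is close to the leg: keep the endpoints only.
        return [points[0], points[-1]]
    # Otherwise split at the farthest point and compress both halves.
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right

# Hypothetical track: a nearly straight eastward leg, then a turn north.
track = [(0, 0), (1, 0.05), (2, 0), (3, 0.02), (4, 0),
         (4.05, 1), (4, 2), (4, 3)]
waypoints = douglas_peucker(track, epsilon=0.1)
print(waypoints)
```

The retained points mark where the heading changes, which is the kind of vector-based "straight legs plus turning sections" representation the abstract describes. Clustering waypoints across many ships is a separate step that this sketch does not cover.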
A data visualization and data mining approach to response and non-response analysis in survey research
129 Citations 2020 Chong Ho Yu, Angel Jannasch‐Pennell, Samuel DiGangi + 2 more
Scholarworks (University of Massachusetts Amherst)
This study reveals that academic level, gender, and race were not identified as crucial factors in determining the response rate, and that the nature of the subject matter might be more important, as science/engineering and law students seemed more interested in this technology-related survey.
Data mining in clinical big data: the frequently used databases, steps, and methodological models
496 Citations 2021 Wentao Wu, Yuan-Jie Li, Aozi Feng + 4 more
Military Medical Research
The goal of this work was to aid clinical researchers in gaining a clear and intuitive understanding of the application of data-mining technology on clinical big-data in order to promote the production of research results that are beneficial to doctors and patients.
lipidr: A Software Tool for Data Mining and Analysis of Lipidomics Datasets
115 Citations 2020 Ahmed Mohamed, Jeffrey Molendijk, Michelle M. Hill
Journal of Proteome Research
The ease of use and innovative features of lipidr, an open-source R/Bioconductor package for data mining and analysis of lipidomics datasets, are expected to allow the lipidomics research community to gain novel detailed insights from lipidomics data.
Systematic ensemble model selection approach for educational data mining
162 Citations 2020 MohammadNoor Injadat, Abdallah Moubayed, Ali Bou Nassif + 1 more
Knowledge-Based Systems
This work proposes a systematic approach based on Gini index and p-value to select a suitable ensemble learner from a combination of six potential machine learning algorithms.
Concept Drift Detection in Data Stream Mining: A literature review
183 Citations 2021 Supriya Agrahari, Anil Kumar Singh
Journal of King Saud University - Computer and Information Sciences
In recent years, the availability of time series streaming information has been growing enormously. Learning from real-time data has received increasing attention over the last decade. Online learning encounters changes in the distribution of data while extracting considerable information from data streams. Hidden data contexts, which are not known to the learning algorithms, are known as concept drift. A classifier classifies incoming instances using past training instances of the data stream. The accuracy of the classifier deteriorates because of the concept drift. The traditi...
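The accuracy deterioration described above can be made concrete with a minimal sketch. This is not a method taken from the review: it is a simple sliding-window monitor, assumed here for illustration, that compares accuracy on a fixed reference window with accuracy on the most recent window. The class name, window size, and threshold are all hypothetical.

```python
from collections import deque

class WindowDriftMonitor:
    """Flag drift when recent accuracy falls well below reference accuracy."""

    def __init__(self, window=50, threshold=0.2):
        self.window = window
        self.threshold = threshold
        self.reference = deque(maxlen=window)  # first `window` outcomes
        self.recent = deque(maxlen=window)     # latest `window` outcomes

    def update(self, correct):
        """Record one prediction outcome; return True if drift is flagged."""
        if len(self.reference) < self.window:
            self.reference.append(int(correct))
            return False
        self.recent.append(int(correct))
        if len(self.recent) < self.window:
            return False
        ref_acc = sum(self.reference) / self.window
        rec_acc = sum(self.recent) / self.window
        return (ref_acc - rec_acc) > self.threshold

# Synthetic stream: the classifier is always right for 100 instances,
# then right only every other time after the data distribution changes.
outcomes = [True] * 100 + [i % 2 == 0 for i in range(100)]
monitor = WindowDriftMonitor(window=50, threshold=0.2)
first_flag = next(
    (i for i, ok in enumerate(outcomes) if monitor.update(ok)), None
)
print("drift first flagged at index:", first_flag)
```

A fixed reference window keeps the sketch short. Detectors that come with statistical guarantees would replace the plain threshold comparison with a hypothesis test or an error bound.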
PharmKG: a dedicated knowledge graph benchmark for biomedical data mining
152 Citations 2020 Shuangjia Zheng, Jiahua Rao, Song Ying + 5 more
Briefings in Bioinformatics
This work introduced PharmKG, a multi-relational, attributed biomedical KG, composed of more than 500 000 individual interconnections between genes, drugs and diseases, with 29 relation types over a vocabulary of ~8000 disambiguated entities, and established a comprehensive KG system for the biomedical field.
Detection and Prediction of Diabetes Using Data Mining: A Comprehensive Review
102 Citations 2021 Farrukh Aslam Khan, Khan Zeb, Mabrook Al‐Rakhami + 2 more
IEEE Access
This paper provides a comprehensive classification and comparison of the techniques that have been frequently used for diagnosis and prediction of diabetes based on important key metrics and highlights the challenges and future research directions in this area that can be considered in order to develop optimized solutions for diabetes detection and prediction.
A survey on swarm intelligence approaches to feature selection in data mining
380 Citations 2020 Bach Hoai Nguyen, Bing Xue, Mengjie Zhang
Swarm and Evolutionary Computation
A comprehensive survey on the state-of-the-art works applying swarm intelligence to achieve feature selection in classification, with a focus on the representation and search mechanisms.
Applications of Data Mining Techniques in Healthcare and Prediction of Heart Attacks
181 Citations 2020 K Kasikumar, M. Mohamed Najumuddeen, R Suresh
International Journal of Data Mining Techniques and Applications
Data mining is the computer-based process of analyzing huge sets of data and then extracting the meaning of the data. Data mining tools predict future trends, allowing businesses to make positive, knowledge-driven decisions. The huge amounts of data generated by traditional methods for prediction of heart disease are too complex and voluminous to be processed and analyzed. Data mining provides the technologies to transform these huge sets of data into useful information for decision making. Data mining techniques take less time for the prediction of the disease with more accuracy. In this paper...
microeco: an R package for data mining in microbial community ecology
1402 Citations 2020 Chi Liu, Yaoming Cui, Xiangzhen Li + 1 more
FEMS Microbiology Ecology
An integrated R package, 'microeco', is presented. It is developed on the R6 class system, combines a series of commonly used and advanced approaches in microbial community ecology research, and provides powerful and convenient tools for researchers.
An overview of deep learning methods for multimodal medical data mining
140 Citations 2022 Fatemeh Behrad, Mohammad Saniee Abadeh
Expert Systems with Applications
Deep learning methods have achieved significant results in various fields. Due to the success of these methods, many researchers have used deep learning algorithms in medical analyses. Using multimodal data to achieve more accurate results is a successful strategy because multimodal data provide complementary information. This paper first introduces the most popular modalities, fusion strategies, and deep learning architectures. We also explain learning strategies, including transfer learning, end-to-end learning, and multitask learning. Then, we give an overview of deep learning methods for m...
An outliers detection and elimination framework in classification task of data mining
217 Citations 2023 Ch. Sanjeev Kumar Dash, Ajit Kumar Behera, Satchidananda Dehuri + 1 more
Decision Analytics Journal
An outlier is a datum that is far from other data points in which it occurs. It can have a considerable impact on the output. Therefore, removing or resolving it before the analysis is essential to prevent skewing. Outliers in a survey sampling can have a significant outcome on statistical results. The goal of discovering outliers in data mining is to find a pattern in data that does not conform to expected behavior. In this paper, we have proposed a framework in which a popular statistical approach termed Inter-Quartile Range (IQR) is used to detect outliers in data and deal with them by Wins...
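The IQR rule named in this abstract is easy to illustrate. The sketch below covers only the detection step. The conventional fence multiplier k = 1.5, the function name, and the sample data are assumptions for illustration rather than details from the paper, and the follow-up treatment of flagged values is left out because the abstract is cut off at that point.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the IQR rule."""
    # Quartiles via linear interpolation over the sample itself.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical sample with one value far from the rest.
data = [10, 12, 11, 13, 12, 11, 95]
lower, upper, outliers = iqr_outliers(data)
print(lower, upper, outliers)
```

Different quartile conventions give slightly different fences on small samples, so the choice of `method="inclusive"` is itself an assumption.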
Knowledge Discovery: Methods from data mining and machine learning
216 Citations 2022 Xiaoling Shu, Yiwan Ye
Social Science Research
The interdisciplinary field of knowledge discovery and data mining emerged from a necessity of big data requiring new analytical methods beyond the traditional statistical approaches to discover new knowledge to produce improved models that combine explanation and prediction.
Data Mining Framework for Nutrition Ranking: Methodology: SPSS Modeller
100 Citations 2021 Nauman Aziz, Shabib Aftab
International Journal of Technology Innovation and Management (IJTIM)
The goal of this research is to apply data mining to a dataset in order to rank three diets based on respondents' data, and to investigate the advantages and limitations of such tools, such as the large amount of manipulation required before analysis, replications with the same results, and analysis bias. The ethical consequences of such programs are also examined in detail.
Spatial structures of tourism destinations: A trajectory data mining approach leveraging mobile big data
158 Citations 2020 Sangwon Park, Yang Xu, Jiang Liu + 2 more
Annals of Tourism Research
A large-scale mobile phone dataset that captures the cellphone traces of international travelers who visited South Korea is analyzed to understand the spatial structures of tourist activities within three different destinations. The analysis reveals multiple “hot spots” in travel destinations and spatial interactions across these places.
Applications of data mining and machine learning framework in aquaculture and fisheries: A review
169 Citations 2022 J. Gladju, Biju Sam Kamalam, A. Kanagaraj
Smart Agricultural Technology
Aquaculture and fisheries sectors are finding ingenious ways to grow and meet the soaring human demand for nutrient-rich fish and seafood by efficiently utilizing the vast water resources and biodiversity of aquatic life on earth. This includes the progressive integration of information technology, data science and artificial intelligence with fishing and fish farming methods to enable intensification of aquaculture production, sustainable exploitation of natural fishery resources and mechanization-automation of allied activities. Exclusive data mining and machine learning systems are being de...
Review on the Application of Machine Learning Algorithms in the Sequence Data Mining of DNA
145 Citations 2020 Aimin Yang, Wei Zhang, Jiahao Wang + 3 more
Frontiers in Bioengineering and Biotechnology
This review introduces the development process of sequencing technology, expounds on the concept of DNA sequence data structure and sequence similarity, analyzes the basic process of data mining and several major machine learning algorithms, and puts forward the challenges faced by machine learning algorithms in the mining of biological sequence data.
MOF Synthesis Prediction Enabled by Automatic Data Mining and Machine Learning
205 Citations 2022 Yi Luo, Saientan Bag, Orysia Zaremba + 5 more
Angewandte Chemie International Edition
It is shown how ML can be used for rationalization and acceleration of the MOF discovery process by directly predicting the synthesis conditions of a MOF based on its crystal structure, outperforming human expert predictions obtained through a synthesis survey.