
Top Research Papers on Data Engineering

Dive into a curated selection of the most influential research papers on Data Engineering. This collection covers groundbreaking approaches, methodologies, and applications that are shaping the future of this critical field. Expand your knowledge and keep up with the latest trends and innovations in Data Engineering.


Data-Driven Science and Engineering

564 Citations 2022

Steven L. Brunton, J. Nathan Kutz

Cambridge University Press eBooks

Data-driven discovery is revolutionizing how we model, predict, and control complex systems. Now with Python and MATLAB®, this textbook trains mathematical scientists and engineers for the next generation of scientific discovery by offering a broad overview of the growing intersection of data-driven methods, machine learning, applied optimization, and classical fields of engineering mathematics and mathematical physics. With a focus on integrating dynamical systems modeling and control with modern methods in applied machine learning, this text includes methods that were chosen for their releva...

Data engineering for fraud detection

135 Citations 2021

Bart Baesens, Sebastiaan Höppner, Tim Verdonck

Decision Support Systems

This work proposes several data engineering techniques that improve the performance of an analytical model while retaining its interpretability, and demonstrates the performance gains of these steps for popular analytical models on a real payment-transaction data set.

The R Language: An Engine for Bioinformatics and Data Science

200 Citations 2022

Federico M. Giorgi, Carmine Ceraolo, Daniele Mercatelli

Life

The paper provides a historical account of how R became what it is today, describes its current features and capabilities, and discusses the role of R in science as a driver of reproducibility.

Automated data processing and feature engineering for deep learning and big data applications: A survey

102 Citations 2024

Alhassan Mumuni, Fuseini Mumuni

Journal of Information and Intelligence

A thorough review is presented of approaches for automating data processing tasks in deep learning pipelines, including automated data preprocessing, data augmentation (including synthetic data generation with generative AI methods), feature engineering, and the use of AutoML methods and tools to optimize all stages of the machine learning pipeline simultaneously.

Low-N protein engineering with data-efficient deep learning

383 Citations 2021

Surojit Biswas, Grigory Khimulya, Ethan C. Alley + 2 more

Nature Methods

The authors introduce a machine-learning-guided paradigm that can use as few as 24 functionally assayed mutant sequences to build an accurate virtual fitness landscape and to screen ten million sequences via in silico directed evolution.

Google Earth Engine for geo-big data applications: A meta-analysis and systematic review

1181 Citations 2020

Haifa Tamiminia, Bahram Salehi, Masoud Mahdianpari + 3 more

ISPRS Journal of Photogrammetry and Remote Sensing

A meta-analysis of recent peer-reviewed GEE articles, examining features including data, sensor type, study area, spatial resolution, application, strategy, and analytical methods, confirmed that GEE has made, and continues to make, substantive progress on global challenges involving the processing of geo-big data.

Sentinel-1 SAR Backscatter Analysis Ready Data Preparation in Google Earth Engine

335 Citations 2021

Adugna Mullissa, Andreas Vollrath, Christelle Odongo-Braun + 5 more

Remote Sensing

The paper presents a framework for preparing Sentinel-1 SAR backscatter Analysis-Ready Data in Google Earth Engine. It combines existing and new Google Earth Engine implementations for additional border noise correction, speckle filtering, and radiometric terrain normalization.

Extracting accurate materials data from research papers with conversational language models and prompt engineering

249 Citations 2024

Maciej P. Polak, Dane Morgan

Nature Communications

This work proposes ChatExtract, a method that uses an advanced conversational LLM to fully automate highly accurate data extraction with minimal initial effort and background knowledge, and argues that approaches similar to ChatExtract are likely to become powerful tools for data extraction in the near future.

Sustainable industrial and operation engineering trends and challenges Toward Industry 4.0: a data driven analysis

316 Citations 2021

Ming‐Lang Tseng, Thi Phuong Thuy Tran, Hiền Minh Hà + 2 more

Journal of Industrial and Production Engineering

This study contributes a state-of-the-art bibliometric review of sustainable industrial and operation engineering as the field moves toward Industry 4.0, along with guidance for future studies and practical achievements. Although industrial and operation engineering is being pushed toward sustainability, a systematization of the knowledge that shapes firms' manufacturing and operations, encompassing their broad concepts and abundant complementary elements, is still absent. This study aims to analyze contemporary sustainable industrial and operations e...

Google Earth Engine Cloud Computing Platform for Remote Sensing Big Data Applications: A Comprehensive Review

1035 Citations 2020

Meisam Amani, Arsalan Ghorbanian, Seyed Ali Ahmadi + 9 more

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

This study comprehensively explores different aspects of the GEE platform, including its datasets, functions, advantages and limitations, and various applications, and finds that Landsat and Sentinel datasets are used extensively by GEE users.

A XGBoost-Based Lane Change Prediction on Time Series Data Using Feature Engineering for Autopilot Vehicles

103 Citations 2022

Yi Zhang, Xiupeng Shi, Sheng Zhang + 1 more

IEEE Transactions on Intelligent Transportation Systems

The authors propose a lane change prediction framework built on feature learning, aiming to gain a deep and comprehensive understanding of lane change behavior and to achieve high prediction performance with the selected features.

Rapid and robust monitoring of flood events using Sentinel-1 and Landsat data on the Google Earth Engine

459 Citations 2020

Ben DeVries, Chengquan Huang, John Armston + 3 more

Remote Sensing of Environment

An algorithm is presented that exploits all available Sentinel-1 SAR images in combination with historical Landsat and other auxiliary data sources hosted on the GEE to rapidly map surface inundation during flood events, relying on multi-temporal SAR statistics to identify unexpected floods in near real-time.

Combining machine learning and process engineering physics towards enhanced accuracy and explainability of data-driven models

177 Citations 2020

Timur Bikmukhametov, Johannes Jäschke

Computers & Chemical Engineering

By adding physics-based models to machine learning, it is possible not only to improve the performance of the purely black-box machine learning models, but also to make them more transparent and interpretable.

Aircraft engine remaining useful life estimation via a double attention-based data-driven architecture

235 Citations 2022

Lu Liu, Xiao Song, Zhetao Zhou

Reliability Engineering & System Safety

Remaining useful life (RUL) estimation has been intensively studied, given its important role in prognostics and health management (PHM) in industry. Recently, data-driven structures such as convolutional neural networks (CNNs) have achieved outstanding RUL prediction performance. However, conventional CNNs lack an adequate mechanism for adaptively weighting input features. In this paper, we propose a double attention-based data-driven framework for aircraft engine RUL prognostics. Specifically, a channel attention-based CNN was utilized to apply greater weights to more significant f...

Engineering Is Elementary: An Engineering And Technology Curriculum For Children

162 Citations 2020

Kate Hester, Christine M. Cunningham

journal unavailable

As our society becomes increasingly dependent on engineering and technology, it is more important than ever that everyone have a basic understanding of what engineers do and of the uses and implications of the technologies they create. Yet few citizens are technologically literate, in large part because technology and engineering are not taught in our schools. Just as it is important to begin science instruction in the elementary grades by building on children's curiosity about the natural world, it is important to begin engineering instruction in elementary school by building on children's natu...

Atomically engineered, high-speed non-volatile flash memory device exhibiting multibit data storage operations

116 Citations 2023

Ghulam Dastgeer, Sobia Nisar, Aamir Rasheed + 6 more

Nano Energy

Non-volatile memory devices, which offer large capacity and mechanical dependability as a mainstream technology, have played a key role in fostering innovation in modern electronics. Despite the advantages of non-volatile memory devices, their low ON/OFF ratio and slow operational speed have limited their performance compared to their volatile counterparts. In this study, we present a non-volatile floating-gate memory device based on van der Waals heterostructures, which exhibits ultrahigh-speed memory operations in the range of a hundred nanoseconds with an extinction ratio of up to 10⁶. T...

Dynamic predictive maintenance for multiple components using data-driven probabilistic RUL prognostics: The case of turbofan engines

108 Citations 2023

Mihaela Mitici, Ingeborg de Pater, Anne Barros + 1 more

Reliability Engineering & System Safety

The increasing availability of condition-monitoring data for components and systems has incentivized the development of data-driven Remaining Useful Life (RUL) prognostics in recent years. However, most studies focus on point RUL prognostics, offering limited insight into the uncertainty associated with these estimates. This limits the applicability of such prognostics to maintenance planning, which is by definition a stochastic problem. In this paper, we therefore develop probabilistic RUL prognostics using Convolutional Neural Networks. These prognostics are further integrated into maintenan...

Illuminating Engineering

177 Citations 2020

Elizabeth M. Parry, Laura Bottomley

journal unavailable

(Session 2480; Laura J. Bottomley and Elizabeth A. Parry, North Carolina State University/Science Surround. The first page of text is reproduced here in lieu of an abstract.) Engineering is a difficult profession to explain to the average person, much less to a student, and is probably one of the most frequently misunderstood. The session described in this paper was developed to put engineering in common terms for the lay person, as well as to provide an interesting and fun way to explore different concentration areas of the p...

Aerodynamics for Engineers

350 Citations 2025

John J. Bertin, Mike L. Smith

Cambridge University Press eBooks

Revised and expanded to reflect cutting-edge innovation in aerodynamics, and packed with new features to support learning, the seventh edition of this classic textbook introduces the fundamentals of aerodynamics using clear explanations and real-world examples. Structured around clear learning objectives, this is the ideal textbook for undergraduate students in aerospace engineering, and for graduate students and professional engineers seeking a readable and accessible reference. Over 10 new Aerodynamics Computation boxes that bring students up to speed on modern computational approaches for p...

Aerodynamics for Engineers

174 Citations 2021

John J. Bertin, Russell M. Cummings

Cambridge University Press eBooks

Now reissued by Cambridge University Press, this sixth edition covers the fundamentals of aerodynamics using clear explanations and real-world examples. Aerodynamics concept boxes throughout showcase real-world applications, chapter objectives provide readers with a better understanding of the goal of each chapter and highlight the key 'take-home' concepts, and example problems aid understanding of how to apply core concepts. Coverage also includes the importance of aerodynamics to aircraft performance, applications of potential flow theory to aerodynamics, high-lift military airfoils, subsoni...

Combustion Engineering

141 Citations 2022

Kenneth M. Bryden, Kenneth W. Ragland, Song‐Charng Kong

journal unavailable

Combustion Engineering, Third Edition introduces the analysis, design, and building of combustion energy systems. It discusses current global energy, climate, and air pollution challenges and considers the increasing importance of renewable energy sources, such as biomass fuels. Mathematical methods are presented, along with qualitative descriptions of their use, which are supported by numerous tables with practical data and formulae, worked examples, chapter-end problems, and updated references. The new edition features new and updated sections on solid biofuels, spark-ignition, compression-i...

Engineering organoids

1034 Citations 2021

Moritz Hofer, Matthias P. Lütolf

Nature Reviews Materials

The authors argue that many limitations of traditional organoid culture can be addressed by engineering approaches at all levels of organoid systems, and they examine approaches such as cellular engineering, designer matrices, and microfluidics for improving the reproducibility and physiological relevance of organoids.

Engineering Electromagnetics

141 Citations 2020

Nathan Ida

journal unavailable

The fourth edition of this textbook features 600 new and revised end-of-chapter problems and applications, adds new topics such as energy harvesting and renewable energy, and includes a host of online videos and experiments. It is aimed at upper-level undergraduates taking courses in electromagnetics.

Engineering Thermodynamics

117 Citations 2022

Mohamed Aboudou Kassim

journal unavailable

Designed to cover the fundamental concepts of thermodynamics used in engineering, the book introduces topics such as the laws of thermodynamics, exergy analysis, thermodynamic cycles, measurement theory, and applications. Using step by step examples and numerous illustrations, the book is designed with a self-teaching methodology, including a variety of exercises with corresponding answers to enhance mastery of the content. The book provides an engineer with a basic understanding or review of thermodynamic principles. Features: Designed to cover the fundamental concepts of thermodynamics used ...

Electromagnetics for Engineers

100 Citations 2025

F.T. Ulaby

Michigan Publishing Services eBooks

Contents: Preface; 1. Introduction; 2. Vector Algebra; 3. Vector Calculus; 4. Electrostatics; 5. Magnetostatics; 6. Maxwell's Equations for Time-Varying Fields; 7. Plane-Wave Propagation; 8. Transmission Lines; 9. Wave Reflection and Transmission; 10. Radiation and Antennas.

Engineered probiotics

117 Citations 2022

Junheng Ma, Yuhong Lyu, Xin Liu + 5 more

Microbial Cell Factories

The paper introduces the theoretical basis of gene-editing technology and focuses on recent research into engineered probiotics for conditions including inflammatory bowel disease, bacterial infection, tumors, and metabolic diseases.

Machine learning data-driven approaches for land use/cover mapping and trend analysis using Google Earth Engine

171 Citations 2021

Bakhtiar Feizizadeh, Davoud Omarzadeh, Mohammad Kazemi Garajeh + 2 more

Journal of Environmental Planning and Management

This study applies machine learning algorithms on the GEE cloud computing platform to land use/land cover (LULC) mapping and change detection using a Landsat satellite image time series. It confirms the potential of machine learning techniques for time-series LULC mapping on the GEE platform while lowering the barriers to analyzing large amounts of satellite data.

Quality Analysis of Covid-19 Time Series Data: A Case Study of the Covid-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University

116 Citations 2020

Dhika Surya Pangestu, Yuyun Hidayat

journal unavailable

Covid-19 is an infectious disease caused by SARS-CoV-2, a type of coronavirus. From its emergence at the end of 2019 until 2 August 2020, more than 17.7 million people worldwide had been infected. During that period, many studies appeared that examined the Covid-19 pandemic, among them research on the growth in the number of Covid-19 cases. One of the many datasets used to study the growth in Covid-19 case numbers is the data from the COVID-19 Data Repository by the Center for Systems Science...

Engineering Multi‐Cellular Spheroids for Tissue Engineering and Regenerative Medicine

220 Citations 2020

Se‐Jeong Kim, Eun Mi Kim, Masaya Yamamoto + 2 more

Advanced Healthcare Materials

Multi‐cellular spheroids are formed as a 3D structure with dense cell–cell/cell–extracellular matrix interactions, and thus, have been widely utilized as implantable therapeutics and various ex vivo tissue models in tissue engineering. In principle, spheroid culture methods maximize cell–cell cohesion and induce spontaneous cellular assembly while minimizing cellular interactions with substrates by using physical forces such as gravitational or centrifugal forces, protein‐repellant biomaterials, and micro‐structured surfaces. In addition, biofunctional materials including magnetic nan...

Mapping the Land Cover of Africa at 10 m Resolution from Multi-Source Remote Sensing Data with Google Earth Engine

101 Citations 2020

Qingyu Li, Chunping Qiu, Lei Ma + 2 more

Remote Sensing

Based on an experimental evaluation of different land cover classes across different cities, the authors conclude that continental land cover mapping results can be considerably improved when training samples of natural land cover classes are collected and combined from areas covering each Köppen climate zone.

Monitoring Forest Change in the Amazon Using Multi-Temporal Remote Sensing Data and Machine Learning Classification on Google Earth Engine

107 Citations 2020

Maria Antonia Brovelli, Yaru Sun, Vasil Yordanov

ISPRS International Journal of Geo-Information

The results demonstrate that such a fusion of satellite observations, machine learning, and cloud processing benefits the analysis of forest dynamics and can provide useful information for the development of forest policies.

Mapping Three Decades of Changes in the Brazilian Savanna Native Vegetation Using Landsat Data Processed in the Google Earth Engine Platform

224 Citations 2020

Ane Alencar, Julia Z. Shimbo, Felipe Lenti + 13 more

Remote Sensing

These results were fundamental in identifying areas with higher rates of change over a long time series in the Brazilian Cerrado and in highlighting the challenges of mapping distinct native vegetation (NV) types in a highly seasonal and heterogeneous savanna biome.

The Art of Feature Engineering

105 Citations 2020

Pablo Duboue

Cambridge University Press eBooks

When machine learning engineers work with data sets, they may find the results aren't as good as they need. Instead of improving the model or collecting more data, they can use the feature engineering process to help improve results by modifying the data's features to better capture the nature of the problem. This practical guide to feature engineering is an essential addition to any data scientist's or machine learning engineer's toolbox, providing new ideas on how to improve the performance of a machine learning solution. Beginning with the basic concepts and techniques, the text builds up t...

Chemical Engineering Thermodynamics

170 Citations 2020

Ravi Varma Rambha

journal unavailable

The book presents concepts and equations of equilibrium thermodynamics or thermostatics. Key features that distinguish this book from others on chemical engineering thermodynamics are: a mathematical treatment of the developments leading to the discovery of the internal energy and entropy; a clear distinction between the classical thermodynamics of Carnot, Clausius and Kelvin and the thermostatics of Gibbs; an intensive/specific variable formalism from which the extensive variable formalism is obtained as a special case; a systematic method of obtaining the central equations of thermostatics w...

Wind Energy Engineering

315 Citations 2023

authors unavailable

Elsevier eBooks

This chapter discusses the development of financial models for planning and executing wind projects, as well as environmental impacts and other business aspects of such projects.

Principles of Tissue Engineering

218 Citations 2020

Janos M. Kanczler, Julia Anne Wells, David Gibbs + 3 more

Elsevier eBooks

This research highlights the need to understand more fully the role of “cell reprogramming” in the development of Parkinson’s disease.

Engineered Living Hydrogels

190 Citations 2022

Xinyue Liu, María Eugenia Inda, Yong Lai + 2 more

Advanced Materials

Living biological systems, ranging from single cells to whole organisms, can sense, process information, and actuate in response to changing environmental conditions. Inspired by such systems, researchers bring engineered living cells and nonliving matrices together, giving rise to the technology of engineered living materials. By designing the functionalities of living cells and the structures of nonliving matrices, engineered living materials can be created to detect variability in the surrounding environment and to adjust their functions accordingly, thereby enabling ap...

System Requirements Engineering

308 Citations 2020

authors unavailable

journal unavailable

This book considers requirements engineering as a combination of three concurrent and interacting processes: eliciting knowledge related to a problem domain, ensuring the validity of such knowledge and specifying the problem in a formal way.

Phase engineering of nanomaterials

708 Citations 2020

Ye Chen, Zhuangchai Lai, Xiao Zhang + 4 more

Nature Reviews Chemistry

Phase has emerged as an important structural parameter that, in addition to composition, morphology, architecture, facet, size, and dimensionality, determines the properties and functionalities of nanomaterials. In particular, unconventional phases in nanomaterials that are unattainable in the bulk state can potentially endow nanomaterials with intriguing properties and innovative applications. Great progress has been made in the phase engineering of nanomaterials (PEN), including synthesis of nanomaterials with unconventional phases and phase transformation of nanomaterials. This Review prov...

Engineering cytokine therapeutics

173 Citations 2023

Jeroen Deckers, Tom Anbergen, Ayla M. Hokke + 13 more

Nature Reviews Bioengineering

The review discusses how bioanalytical methods such as sequencing and high-resolution imaging, combined with genetic techniques, have facilitated a better understanding of cytokine biology, and highlights bioengineering approaches for designing clinically applicable and safe cytokine-based therapeutics.