Big Data Analytics Software

88 Citations • 2015

A. Peter

The definition of big data software is proposed as “software that supports the time-constrained processing of continuous information flows to provide actionable intelligence.”

Abstract

What is big data software? How is it different than non-big-data software? Can it be engineered? Answering these questions requires [...] at defining big data, varying based on context, domain, and perspective.

From the infrastructure perspective, big data has been defined as data with high volume, velocity, and variety (3V), and unpredictability. In this context, it has also been defined as data with some aspect that’s so large that current, typical methods can’t be used to process it [1,2]. From the analytics perspective, big data has been defined as data so large that it contains significant low-probability events that would be absent from traditional statistical sampling methods [3]. From the business user’s perspective, big data represents opportunities for gaining a competitive advantage by gaining actionable intelligence [4]. Each of these definitions describes important aspects that big data software must support.

Borrowing from these definitions, we propose a definition for big data software as “software that supports the time-constrained processing of continuous information flows to provide actionable intelligence.”

The phrase “software that supports” acknowledges that big data software includes both infrastructure and analytics software; these have been referred to as big throughput and big analytics software, respectively [5]. Infrastructure software is needed to store, retrieve, transmit, and process big data. Although infrastructure software is essential to developing big data software, much of the emphasis and hype has been placed on the analytics portion. Nonetheless, our definition of big data software encompasses both types of software.

The term “time-constrained” denotes the urgency in providing solutions. In a way, big data software shares a property with real-time software: late responses are wrong responses.

The phrase “continuous information flows” generalizes the input of big data software, which has the unique properties of volume, velocity, and variety. This generalization also extends to other important information properties of big data input, such as continuity (data in motion versus data at rest). Data in motion (or data streams) [...]