Approaching big data
The explosion of social networks, combined with the unstoppable spread of smartphones, explains why big data has become one of the recurring terms in the worlds of innovation, marketing, and information technology in recent years. The term indicates data produced in large quantities, at considerable speed, and in the most diverse formats, whose processing requires technologies and resources that go well beyond conventional data management and storage systems. But what exactly does this term encompass?
In a widely quoted article, The Age of Big Data, Steve Lohr (a technology reporter for The New York Times) helped popularize the term.
The term big data should not mislead us: at first sight, it might suggest that the phenomenon concerns only data size. Although size is certainly part of the problem, big data has other properties that are not necessarily tied to sheer volume.
"Big data has three dimensions—volume, variety, and velocity," says Michael Minelli. "And within each of those three dimensions is a wide range of variables."
Let's take a closer look at the three dimensions associated with big data:
- Volume: Big data implies huge volumes of data. In the past, data was created by humans. Now that data is generated by machines, networks, and social media, the volume of data to be analyzed is enormous. Yet volume is not the only problem that needs to be addressed.
- Variety: Variety refers to the many sources and formats of data, both structured and unstructured. In the past, data was stored in spreadsheets and databases; now it also arrives as photos, videos, audio, emails, and so on. This variety of unstructured data creates problems for storing, extracting, and analyzing data.
- Velocity: Finally, velocity refers to the speed at which data arrives from sources such as industrial processes, machines, networks, social media, and mobile devices. The resulting flow of data is massive and continuous. If they can manage this speed, researchers and companies can use such real-time data to make important decisions that offer strategic competitive advantages.
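To make the three Vs a little more concrete, here is a minimal, purely illustrative Python sketch (all record formats, field names, and values are invented for the example): it normalizes a handful of heterogeneous records into a common shape (variety), counts them (volume), and times the pass over them (velocity).

```python
import json
import time

# Invented sample records illustrating variety: structured, semi-structured
# (JSON), and unstructured (free text) data arriving from different sources.
records = [
    {"type": "sensor", "value": 21.5},                 # structured
    json.dumps({"type": "social", "text": "hello"}),   # semi-structured
    "2024-01-01 purchase completed at store #42",      # unstructured
]

def normalize(record):
    """Coerce an incoming record into a common dict shape."""
    if isinstance(record, dict):
        return record
    try:
        return json.loads(record)                      # try JSON first
    except ValueError:
        return {"type": "text", "text": record}        # fall back to raw text

start = time.perf_counter()
normalized = [normalize(r) for r in records]           # volume: every record
elapsed = time.perf_counter() - start                  # velocity: time taken

print("records:", len(normalized))
print("source types:", sorted(r["type"] for r in normalized))
print("seconds:", round(elapsed, 6))
```

Real big data pipelines face the same three pressures, only at a scale where a single machine and a simple loop like this are no longer enough.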
Companies are generating ever-increasing amounts of data, capturing trillions of bytes of information on their customers, suppliers, and operations. This flood of information arises because data pours in from sources such as:
- Sensors that collect different types of data
- GPRS packets from mobile phones that map the position of potential customers
- Contents on social media
- Digital images and video
- Online recordings of purchase transactions
- Any other source that can produce information of our interest
These are shown in the following diagram:

Functionally, gathering this large amount of structured and unstructured data can help organizations to:
- Reduce costs
- Improve operational efficiency and production performance
- Improve customer relationships
- Develop new products in a more informed way
- Accelerate and synchronize deliveries
- Formulate and answer more in-depth questions
- Improve and simplify decision-making
All this is already a reality for many large companies. The challenge for the future is to ensure that small companies, and even individuals, can also access resources that let them process data in a simple and functional way.
Thanks to data storage and cloud computing, the ability to store, aggregate, and combine data (and then use the results to perform deep analysis) is gradually becoming more accessible. In other words, these services keep reducing their costs and other technological barriers while delivering increasingly performant and efficient service. For example, with cloud computing, highly scalable computing resources can be accessed through the internet, often at lower cost than installing equivalent resources on one's own computers, because the resources are shared among many users.