The term Big Data is used almost everywhere these days, online and offline, and it is not only about computers. It falls under the blanket term information technology, which is now part of almost every other technology, field of study, and business. Big Data is not a big deal. The hype surrounding it, though, is big enough to confuse you. This article takes a look at what Big Data is. It also includes an example of how Netflix used its data, or Big Data, to better serve its customers’ needs.
What is Big Data
The data lying in your company’s servers was just data until yesterday, sorted and filed. Then the buzzword Big Data got popular, and now the data in your company is Big Data. The term covers every piece of data your organization has stored until now. It includes data stored in the cloud and even the URLs that you bookmarked. Your company might not have digitized all of its data, and it may not have structured all of it either. Even so, all the data your company holds, whether digital, on paper, structured, or unstructured, is now Big Data.
In short, all the data in your servers, whether categorized or not, is collectively called Big Data. This data can be used to get different results using different types of analysis. Not every analysis needs all the data: each one uses the parts of the Big Data it needs to produce the required results and predictions.
Big Data is the data you analyze for results that you can use for predictions and other purposes. Once it adopts the term Big Data, your company or organization is suddenly working with top-level information technology to derive different types of results from the same data it stored, intentionally or unintentionally, over the years.
How big is Big Data
Essentially, all the data combined is Big Data, but many researchers agree that Big Data, as such, cannot be handled with ordinary spreadsheets and regular database management tools. It needs special analysis tools like Hadoop (we’ll study this in a separate post) so that all the data can be analyzed in one go (which may include several iterations of analysis).
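To give a rough feel for how such tools can analyze everything in one go, here is a minimal Python sketch of the split, map, and reduce idea that tools like Hadoop are built around. It is not Hadoop's API. The function names (`split_into_chunks`, `map_chunk`, `reduce_counts`, `word_count`) and the sample lines are made up for illustration, and a real tool would spread the chunks across many machines instead of looping over them on one.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, chunk_size):
    """Yield consecutive slices of `records`, each at most `chunk_size` long."""
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def map_chunk(chunk):
    """Map step: count the words in one chunk, independently of all other chunks."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


def word_count(records, chunk_size=2):
    """Run the map step on every chunk, then fold the partial results together."""
    partials = [map_chunk(chunk) for chunk in split_into_chunks(records, chunk_size)]
    return reduce(reduce_counts, partials, Counter())


# Hypothetical sample lines, invented purely for this illustration.
sample = ["big data is data", "data is everywhere", "big tools for big data"]
totals = word_count(sample)
print(totals.most_common(2))
```

Because each chunk is counted on its own, the map step could in principle run on separate machines at the same time, with only the small partial counts sent back to be merged.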
Contrary to the above, though I am not an expert on the subject, I would say that data from any organization—big or small, organized or unorganized—is Big Data for that organization and that the organization may choose its own tools to analyze the data.
Normally, people create different data sets based on one or more common fields to make data analysis easy. In the case of Big Data, there is no need to create subsets. We now have tools that can analyze data regardless of its size. Probably, these tools themselves categorize the data even as they analyze it.
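As a small illustration of categorizing data while analyzing it, the Python sketch below groups records by a field and totals a value in a single pass, without building separate subsets first. The function `summarize_by_field`, the field names, and the sample orders are all hypothetical. This is only a toy version of the idea, not how any particular Big Data tool works internally.

```python
from collections import defaultdict


def summarize_by_field(records, group_field, value_field):
    """Group records by `group_field` on the fly and total `value_field` per group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    # One pass over the data; `records` can be any iterable, including a generator.
    for record in records:
        # Records that lack the grouping field go into an "unknown" bucket.
        key = record.get(group_field, "unknown")
        totals[key] += record[value_field]
        counts[key] += 1
    return {key: {"count": counts[key], "total": totals[key]} for key in totals}


# Hypothetical records, invented purely for this illustration.
orders = [
    {"region": "north", "amount": 120.0},
    {"region": "south", "amount": 80.0},
    {"amount": 15.5},
    {"region": "north", "amount": 30.0},
]
summary = summarize_by_field(iter(orders), "region", "amount")
print(summary)
```

The records are never copied into per-region data sets. Each one is read once, assigned to a group, and folded into that group's running totals.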
Two passages from the book “Big Data” by Jimmy Guterman are worth quoting here:
“Big Data: when the size and performance requirements for data management become significant design and decision factors for implementing a data management and analysis system.”
-And-
“For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration.”
So you see that what counts as Big Data depends on both the volume of the data and what it takes to analyze it.
Big Data Concepts
This is another point on which people don’t agree. Some experts describe the concept of Big Data in terms of three V’s:
- Volume
- Velocity
- Variety
Others add a few more V’s to the concept:
- Visualization
- Veracity (Reliability)
- Variability
- Value
I will cover concepts of Big Data in a separate article as this post is already getting big. In my opinion, the first three V’s are enough to explain the concept of Big Data.
Big Data Example – How Netflix used it to fix its problems
Several years back, there was an outage at Netflix that left many customers in the dark. While some could still access the streaming services, most could not. Some customers managed to get their rented DVDs, whereas others did not. According to a blog post on the Wall Street Journal, Netflix had just started on-demand streaming at the time.
The outage made management think about possible future problems, and so it turned to Big Data. Using that data, it analyzed high-traffic areas, susceptible points, network throughput, etc., and worked on reducing downtime should a problem arise in the future as it went global. The Wall Street Journal blog has more examples of Big Data if you wish to check them out.
The above summarizes Big Data in layman’s language. You can call it a very basic introduction. I plan to write a few more articles on related topics such as the concepts (including the three V’s), analysis, tools, and uses of Big Data. Meanwhile, if you would like to add anything to the above, please comment and share it with us.