The term Big Data is used increasingly everywhere – online and offline. And it is not related only to computers. It falls under the blanket term information technology, which is now part of almost every other technology, field of study, and business. Big Data itself is not a big deal, but the hype surrounding it is certainly big enough to confuse you. This article takes a look at what Big Data is. It also contains an example of how Netflix used its data, or Big Data, to better serve its customers’ needs.
What is Big Data
The data lying on your company’s servers was just data until yesterday – sorted and filed. Suddenly, the buzzword Big Data became popular, and now the data in your company is Big Data. The term covers every piece of data your organization has stored until now. It includes data stored in clouds and even the URLs that you bookmarked. Your company might not have digitized all that data, and you may not have structured all of it yet. But all the digital, paper-based, structured, and unstructured data in your company is now Big Data.
In short, all the data on your servers—whether or not it is categorized—is collectively called BIG DATA. This data can be used to get different results using different types of analysis. Not every analysis needs to use all the data; an analysis uses different parts of the BIG DATA to produce the necessary results and predictions.
Big Data is the data you analyze for results that you can use for predictions and other purposes. When you use the term Big Data, your company or organization is suddenly working with top-level information technology to deduce different kinds of results from the same data you stored, intentionally or unintentionally, over the years.
Read: Data Science vs Computer Science explained.
How big is Big Data
Essentially, all the data combined is Big Data, but many researchers agree that Big Data – as such – cannot be manipulated using normal spreadsheets and regular database-management tools. It needs special analysis tools like Hadoop (we’ll study this in a separate post) so that all the data can be analyzed in one go (possibly over several iterations of analysis).
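The “analyze it all in one go” idea behind tools like Hadoop is essentially the map-and-reduce pattern: summarize each chunk of data independently (these steps can run in parallel on many machines), then merge the partial results. Here is a minimal sketch of that pattern in plain Python; the chunks and event names are made up for illustration:

```python
from collections import Counter
from functools import reduce

# Hypothetical sample: each "chunk" stands in for a slice of a much larger dataset,
# e.g. one log file out of thousands.
chunks = [
    ["error", "ok", "ok"],
    ["ok", "error", "timeout"],
    ["ok", "ok"],
]

# Map step: count events within each chunk independently (parallelizable).
partials = [Counter(chunk) for chunk in chunks]

# Reduce step: merge the partial counts into one overall result.
totals = reduce(lambda a, b: a + b, partials)

print(totals["ok"])     # 5
print(totals["error"])  # 2
```

Real frameworks add distribution, fault tolerance, and disk-backed storage on top of this, but the shape of the computation is the same.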
Contrary to the above, though I am not an expert on the subject, I would say that data from any organization—big or small, organized or unorganized—is Big Data for that organization and that the organization may choose its own tools to analyze the data.
Normally, people create different data sets based on one or more common fields to make data analysis easy. In the case of Big Data, there is no need to create subsets. We now have tools that can analyze data regardless of its size. Probably, these tools themselves categorize the data even as they analyze it.
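To illustrate the traditional subset approach mentioned above, here is a small hypothetical sketch in Python: records are split into subsets keyed on a common field, and each subset is then analyzed on its own (the field name and numbers are invented for the example):

```python
# Hypothetical records; "region" is the common field used to form subsets.
records = [
    {"region": "east", "sales": 120},
    {"region": "west", "sales": 80},
    {"region": "east", "sales": 100},
]

# Traditional approach: split the data into subsets keyed on the common field...
subsets = {}
for rec in records:
    subsets.setdefault(rec["region"], []).append(rec)

# ...then analyze each subset separately.
east_total = sum(r["sales"] for r in subsets["east"])
print(east_total)  # 220
```

Big Data tools aim to remove this manual splitting step by handling the grouping internally while the analysis runs.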
I find it important to mention two sentences from the book “Big Data” by Jimmy Guterman:
“Big Data: when the size and performance requirements for data management become significant design and decision factors for implementing a data management and analysis system.”
-And-
“For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration.”
So you see that volume and analysis are important to Big Data.
Read: What is Data Mining?
Big Data Concepts
This is another point on which most people don’t agree. Some experts say the concept of Big Data rests on three V’s:
- Volume
- Velocity
- Variety
Others add a few more V’s to the concept:
- Visualization
- Veracity (Reliability)
- Variability and
- Value
I will cover concepts of Big Data in a separate article as this post is already getting big. In my opinion, the first three V’s are enough to explain the concept of Big Data.
Big Data Example – How Netflix used it to fix its problems
Several years back, there was an outage at Netflix that left many customers in the dark. While some could still access the streaming service, most could not. Some customers managed to get their rented DVDs, whereas others did not. A blog post on the Wall Street Journal says Netflix had just started on-demand streaming at the time.
The outage made management think about possible future problems, and hence it turned to Big Data. Using that data, it analyzed high-traffic areas, vulnerable points, network throughput, and so on, and worked on reducing downtime should a future problem arise as the company went global. Here is the link to the Wall Street Journal blog if you wish to check out the examples of Big Data.
The above summarizes Big Data in layman’s language—you can call it a very basic introduction. I plan to write a few more articles on associated factors such as Concepts, Analysis, Tools, and uses of Big Data, Big Data 3 V’s, etc. Meanwhile, if you would like to add anything to the above, please comment and share it with us.
Read next: What is Web Scraping?
The company I used to own wrote a GUI for a “BIG DATA” analysis program for the state DOT to conduct traffic studies. The DOT purchased data subscriptions that were automatically imported into our GUI and then the DOT could organize the data into things like travel distance to an intersection, traffic volumes at all hours, speed, demographics of vehicle passengers, as well as dozens of other criteria. It gave the DOT much more useful information than the standard axle counters stretched across the road gave them.
Arun, very nice Big Data article. When considering a big data strategy, I think it’s worth mentioning HPCC Systems from LexisNexis. Designed by data scientists, HPCC Systems is an open source data-intensive supercomputing platform to process and solve Big Data analytical problems and can help companies derive actionable insights from their data.
HPCC Systems provides proven solutions to handle what are now called Big Data problems, and have been doing so for more than a decade. The main advantages over other alternatives are the real-time delivery of data queries and the extremely powerful ECL language programming model. More info at http://hpccsystems.com
I am reading about the different options available for data analysis. The link you gave is a useful resource. Thank you :)
Do you mean that using a customized option is better than going for ready-made products like Hadoop?
You’re welcome! You might also want to check out their free online ECL training at http://learn.lexisnexis.com/hpcc
Detailed archiving of your info, be it family albums or music files, will be an ongoing task, but a worthwhile one.
With a few keystrokes you can automatically catalog existing “MPG” video, “MP3” audio, “JPG” pictures and “TXT” email records & notes etc.
Moments later you will be randomly sampling your data treasures.
It takes some time for your personal database to become large enough to make searching interesting. If your memory is short, then not so long.
This app plows through text files at 20,000,000 CPS and beyond on an 8-year-old laptop.
I have completed over four dozen telephone and cable-system billing conversions (ETL). 95% of that data came in text files; a small percentage was packed integers and real numbers, etc. The export data included toll files, work orders, customer details, etc. These data were no problem for this app.
Many of these IT jobs were for some of the largest companies in Western Canada and the US.
The so-called “Big Data” isn’t that big for today’s computers. I have more personal data than all the ETLs combined!
To keep you in touch with the massive amounts of data you’ll collect, the app can randomly sample video or audio segments as easily as family pictures.
Text data can be displayed “in context” or “matching lines only” along with match counts, line counts and elapsed time.
Without a random-sampling option, computer resources go unused and your data-mining tools will fall short.
Video playback options such as fast forward, slow motion, and large-font captioning mixed with video segments are a few of the main features. There is no more useful app than this.
See the thread “nobody shares knowledge better than this” for all the details
Nice Explanation