301.1.4-Handling Big Data

Bigdata Tool

  • Analyzing big data can give us valuable insights.
  • But, by definition, big data can’t be handled using conventional tools.
  • The datasets are complex, huge and difficult to process.
  • What is the solution?

Handling Bigdata – Using supercomputers

  • A supercomputer is one solution.
  • Put multiple CPUs in one machine (100?) and it will produce results quickly.
  • On a normal laptop, handling big data is very difficult. The dataset itself may be 1 PB or 16 PB, while a normal system might have just 1 TB of hard disk space, so even acquiring and storing the data is difficult, let alone analyzing it.
  • A supercomputer addresses this: instead of one CPU it has many CPUs, and instead of one hard disk it has a huge amount of storage.
  • The problem is cost. Building a supercomputer is so expensive that only large institutes like NASA or ISRO, or really big companies, can afford one.
  • The cost of buying a supercomputer can sometimes exceed the value of the results you get out of the big data.
  • A large dataset should not, by itself, mean that we have to invest heavily in the computer.
  • So a supercomputer is a solution, but not a cost-effective one. For individuals it is almost impossible to buy a supercomputer just to perform these operations.
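To put the storage gap described above in numbers, here is a small arithmetic sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes); the function name disks_needed is made up for this illustration.

```python
# Decimal storage units (an assumption; binary units would give 1024 instead of 1000).
TB = 10**12
PB = 10**15

def disks_needed(dataset_bytes, disk_bytes):
    """Return how many disks of size disk_bytes are needed to hold the dataset."""
    # Ceiling division using integers only, so there is no rounding error.
    return -(-dataset_bytes // disk_bytes)

for size_pb in (1, 16):
    n = disks_needed(size_pb * PB, 1 * TB)
    print(f"{size_pb} PB dataset -> at least {n} disks of 1 TB")
```

Even the smaller 1 PB dataset needs about a thousand laptop-sized disks, which is why a single normal machine cannot even store the data.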

Handling Bigdata: Is there a better way?

  • Until about 1985, there was no practical, widely available way to connect multiple computers.
  • Computers were centralized, individual systems.
  • Multi-processor systems or supercomputers were the only options for big data problems.
  • From about 1985 onward, powerful microprocessors and high-speed computer networks became available.

Handling Bigdata: Distributed systems

  • Computer networks (LANs and WANs) led to distributed systems.
  • A distributed system is a collection of independent computers that appears to its users as a single coherent system.
  • We can connect some low-priced computers and use them to process our big data.
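The idea of several independent computers appearing as a single coherent system can be sketched roughly as follows. This is a toy, in-process illustration, not how any particular distributed system is built: the Node and Cluster classes are hypothetical, a dict stands in for each machine's disk, and picking a node by hashing the key is just one assumed placement rule.

```python
import zlib

class Node:
    """One low-priced machine; a dict stands in for its local storage."""
    def __init__(self, name):
        self.name = name
        self.store = {}

class Cluster:
    """Presents several nodes to the user as one key-value store."""
    def __init__(self, node_names):
        self.nodes = [Node(n) for n in node_names]

    def _node_for(self, key):
        # Stable hash, so the same key always maps to the same node.
        index = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        self._node_for(key).store[key] = value

    def get(self, key):
        return self._node_for(key).store[key]

cluster = Cluster(["node-a", "node-b", "node-c"])
for i in range(9):
    cluster.put(f"record-{i}", i * i)

# The user talks to the cluster and never names a node.
print(cluster.get("record-4"))  # 16
# Number of records each node ended up holding.
print([len(n.store) for n in cluster.nodes])
```

The user only ever calls put and get on the cluster; which machine actually holds a record stays hidden, which is the "single coherent system" property in miniature.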

Cluster Computing

  • A cluster is simply a few machines connected to each other through a network.
  • A collection of independent computers that are joined together using a LAN is called a computer cluster.
  • Because it is really difficult for a single machine to handle big data, we can use distributed computing or cluster computing instead.

Handling Bigdata: Distributed computing

  • We start with the overall final task and divide the data into smaller pieces, placing them on different machines.
  • Each low-end machine can handle a smaller dataset, so a huge dataset is divided into smaller pieces and distributed across all these machines.
  • We then connect the machines using a LAN or WAN, and the whole cluster of computers can be made to look and work like one really big supercomputer.
  • With a piece of the data on each machine, divide the overall problem into smaller pieces as well and run them locally on each machine.
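The divide, run locally, and combine pattern described above can be sketched on a single computer. This is only a simulation under stated assumptions: worker threads stand in for the separate machines, the task is a simple sum, and the names split, local_task and run_on_cluster are invented for this example. A real cluster would ship the pieces over the network instead.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, n_parts):
    """Divide data into n_parts contiguous pieces of nearly equal size."""
    size, extra = divmod(len(data), n_parts)
    pieces, start = [], 0
    for i in range(n_parts):
        # The first `extra` pieces get one additional element.
        end = start + size + (1 if i < extra else 0)
        pieces.append(data[start:end])
        start = end
    return pieces

def local_task(piece):
    """Work done on one machine using only its own piece (here: a partial sum)."""
    return sum(piece)

def run_on_cluster(data, n_machines):
    pieces = split(data, n_machines)
    # Each worker thread stands in for one machine in the cluster.
    with ThreadPoolExecutor(max_workers=n_machines) as pool:
        partial_results = list(pool.map(local_task, pieces))
    # Combine the small local results into the final answer.
    return sum(partial_results)

data = list(range(1, 1_000_001))
print(run_on_cluster(data, n_machines=4))  # 500000500000
```

Each simulated machine only ever sees its own piece, and only the small partial results travel back to be combined. That is what lets many low-end machines together handle a dataset none of them could handle alone.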


Statinfer, a name derived from "statistical inference", is a company that focuses on data science training and R&D. We offer training on Machine Learning, Deep Learning and Artificial Intelligence using tools like R, Python and TensorFlow.

© 2020. All Rights Reserved.