Statinfer

301.1.1-Introduction To Bigdata


Really big data:

  • Can you think of running a query on a 20,980,000 GB file?
  • What if we get a new dataset like this, every day?
  • What if we need to execute complex queries on this data set every day?
  • Does anybody really deal with this type of data set?
  • Is it possible to store and analyze this data?
    • Yes, Google was already dealing with more than 20 PB of data per day back in 2008, and it is collecting even more data every day now.
  • Running queries on one such data set of around 20 PB with a conventional SQL database is very difficult; the quick unit conversion below shows how large this really is.
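
A quick sanity check that the 20,980,000 GB figure quoted above really is about 20 PB. This is a minimal Python sketch; it assumes binary units, i.e. 1 PB = 1024 × 1024 GB:

  size_gb = 20_980_000
  gb_per_pb = 1024 * 1024        # 1,048,576 GB in one petabyte
  print(size_gb / gb_per_pb)     # prints ~20.0, i.e. roughly 20 PB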

Are there really big datasets?

  • Google processes 20 PB a day (2008).
  • The Wayback Machine has 3 PB + 100 TB/month (3/2009).
  • Facebook has 2.5 PB of user data + 15 TB/day (4/2009).
  • eBay has 6.5 PB of user data + 50 TB/day (5/2009).
  • CERN’s Large Hadron Collider (LHC) generates 15 PB a year.

In fact, in a minute…

  • Email users send more than 204 million messages;
  • Mobile Web receives 217 new users;
  • Google receives over 2 million search queries;
  • YouTube users upload 48 hours of new video;
  • Facebook users share 684,000 bits of content;
  • Twitter users send more than 100,000 tweets;
  • Consumers spend $272,000 on Web shopping;
  • Apple receives around 47,000 application downloads;
  • Brands receive more than 34,000 Facebook ‘likes’;
  • Tumblr blog owners publish 27,000 new posts;
  • Instagram users share 3,600 new photos;
  • Flickr users add 3,125 new photos;
  • Foursquare users perform 2,000 check-ins;
  • WordPress users publish close to 350 new blog posts.
  • These are just a few of the many places where huge amounts of data are generated every day, and all of the above happens in just one minute.

Conventional tools and their limitations

  • Traditional data handling tools have hard limits on the data sizes they can handle:
  • Excel: Have you ever tried a pivot table on a 500 MB file?
    • Excel is a good tool for ad-hoc analysis, but opening a file of 500 MB or 1 GB starts hanging the system; on a usual machine Excel cannot handle much more than 1 GB of data.
  • SAS/R: Have you ever tried a frequency table on a 2 GB file?
    • SAS and R are analytical tools, but they tend to give up when the data grows beyond about 2 GB.
  • Access: Have you ever tried running a query on a 10 GB file?
    • Access can handle data and queries up to around 10 GB, but beyond that it is not really going to help you.
  • SQL: Have you ever tried running a query on a 50 GB file?
    • SQL on very powerful hardware can handle up to around 50 GB of data, but beyond that it will not be able to cope.

  • Thus, these are the conventional tools: SQL, Excel, Access, R and SAS.
  • They cannot handle data that arrives at this scale and speed, whether every minute or every day, as the sketch below illustrates.
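
To make the limitation concrete, here is a minimal Python sketch (the file name transactions.csv and the column country are hypothetical, used only for illustration). Conventional tools load the whole dataset into memory; reading in chunks keeps memory bounded, but a single machine still has to scan every byte, which is exactly what breaks down at the petabyte scale described above.

  import pandas as pd

  # Naive approach: load everything into RAM -- this is what hangs Excel, R or
  # SAS once the file grows to a few GB on a typical machine:
  #   df = pd.read_csv("transactions.csv")
  #   counts = df["country"].value_counts()

  # Chunked approach: memory stays bounded, but one machine still scans every
  # row sequentially, so it cannot keep up with data arriving at PB/day rates.
  counts = None
  for chunk in pd.read_csv("transactions.csv", chunksize=1_000_000):
      part = chunk["country"].value_counts()
      counts = part if counts is None else counts.add(part, fill_value=0)

  print(counts.sort_values(ascending=False).head(10))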