Definition of Bigdata
Any data that is hard to handle using conventional tools and techniques is Bigdata.
- If you only want the number of rows, the number of columns, or the average of a particular column, you can load the file into memory and compute the answer directly.
- But if you use the same technique on a much bigger dataset, it stops working, for example when the file no longer fits in memory.
- Thus, any dataset that is hard to handle using conventional tools and techniques is called big data.
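The contrast above can be sketched in Python. This is a minimal illustration, not a prescribed method: the function name `summarize_csv` and the sample columns `price` and `qty` are hypothetical. Instead of loading the whole file, it reads one row at a time, so memory use stays roughly constant as the file grows.

```python
import csv
import io

def summarize_csv(stream, column):
    """Return (row_count, column_count, mean of `column`) in a single pass.

    `stream` is any iterable of text lines, such as an open file.
    Only one row is held in memory at a time.
    """
    reader = csv.DictReader(stream)
    rows = 0
    total = 0.0
    for record in reader:
        rows += 1
        total += float(record[column])
    columns = len(reader.fieldnames or [])
    mean = total / rows if rows else float("nan")
    return rows, columns, mean

# Hypothetical sample data standing in for a large file on disk.
sample = io.StringIO("price,qty\n10,1\n20,2\n30,3\n")
print(summarize_csv(sample, "price"))  # (3, 2, 20.0)
```

Streaming like this only works around the memory limit of a single machine. Once the data is too large or arrives too fast for one machine to read, the new architectures and tools discussed in the rest of these notes are needed.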
Bigdata
Any data that is difficult to:
- Capture
- Curate
- Store
- Search
- Transfer
- Analyze
- Visualize
Bigdata is
- Data that is so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
- Data whose scale, diversity, and complexity require new architectures, techniques, tools, algorithms, and hardware to manage it and to extract value and hidden knowledge from it.
Bigdata is not just about size
- Volume:
- Data volumes are becoming unmanageable.
- Variety:
- Text Data (Web), Numerical, Images, Audio, Video, Social Network, Semantic Web (RDF), Semi-Structured Data (XML), Multi-Dimensional Arrays, etc.
- Social Networking Sites data
- Velocity:
- Some data arrives so rapidly that it must be processed immediately or it is lost. Handling such data is a whole subfield called stream processing.
- E-commerce data
- Stock exchange data
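As a rough sketch of what stream processing means in practice, the Python generator below handles each value once, as it arrives, and keeps only a small window of recent values instead of storing the whole stream. The function name `moving_average`, the window size, and the sample prices are assumptions made for illustration; real stream processing systems are far more elaborate.

```python
from collections import deque

def moving_average(ticks, window=3):
    """Yield the mean of the last `window` values as each tick arrives."""
    recent = deque(maxlen=window)
    running_sum = 0.0
    for price in ticks:
        # When the window is full, the oldest value is about to be evicted.
        if len(recent) == window:
            running_sum -= recent[0]
        recent.append(price)
        running_sum += price
        yield running_sum / len(recent)

# Hypothetical stock prices arriving one at a time.
prices = iter([10.0, 20.0, 30.0, 40.0])
print(list(moving_average(prices, window=3)))  # [10.0, 15.0, 20.0, 30.0]
```

Memory use depends only on the window size, not on how much data has passed through, which is what makes this style suitable for data that cannot be stored before it is processed.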
Some examples of Bigdata
- Stock exchange transactions data
- Voice clips data
- Video clips data
- Social network data
- Smartphone-generated data
- Weblog data
- E-commerce customer data
What can be done with Bigdata?
Uses of Bigdata Analysis
- Ford:
- Collects data from sensors in more than 4,000,000 vehicles.
- The sensors send large amounts of data to Ford's servers every day, whenever a vehicle is active.
- Ford analyzes the sensor data to improve vehicle quality and to reduce fatal accidents.
- Amazon:
- Collects click-stream and product-search log data from millions of users.
- Uses it to improve its product recommendation engine: after you buy a first, second, or third product, the fourth product that appears in the recommendations is likely to be the one you are looking for.
- AT&T:
- Collects signal data from millions of mobile customers.
- Analyzes cell tower network usage data and shares the results with urban planners, traffic engineers, and traffic police, who use them to plan traffic flow and routing.
- Walmart:
- Has one of the largest civilian data warehouses in the world.
- Collects large amounts of customer data and uses it to find interesting relationships through market basket analysis.
- Market basket analysis crunches the data and finds hidden patterns, such as what hurricanes, strawberries, and beer have in common.
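To give a feel for market basket analysis, the sketch below counts how often each pair of items appears in the same basket and reports it as a fraction of all baskets (commonly called the support of the pair). It is a simplified illustration only: the function name `pair_support` and the baskets are made up, and this is not a description of Walmart's actual system.

```python
from collections import Counter
from itertools import combinations

def pair_support(baskets):
    """Map each item pair to the fraction of baskets that contain both items."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so each pair is counted once per basket
        # and always appears in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

# Hypothetical purchase data, invented for this example.
baskets = [
    ["beer", "strawberries", "batteries"],
    ["beer", "strawberries"],
    ["beer", "chips"],
    ["milk", "bread"],
]
support = pair_support(baskets)
print(support[("beer", "strawberries")])  # 0.5
```

Pairs with unusually high support are candidates for the "hidden patterns" mentioned above. At the scale of a large retailer the number of possible pairs is enormous, which is one reason the analysis needs big data tooling.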