By Jatin Madaan

Introduction To Big Data & Hadoop


1 GB * 1024 = 1 TB (Tera Byte), and 1 TB * 1024 = 1 PB (Peta Byte).

Facebook processes approximately 7 PB of data daily, and the webpages on internet servers total approximately 20 PB.


Before data reached this scale, what we used was a DS (Distributed System): divide the data and process it in parallel. It was not a platform; we simply connected different machines together to work.
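The divide-and-process-in-parallel idea can be sketched in a few lines of Python. This is a hypothetical illustration on a single machine (using `multiprocessing` in place of separate networked nodes); the chunk size and worker count are arbitrary choices, not anything from the original text.

```python
from multiprocessing import Pool

def process_chunk(chunk):
    # Hypothetical per-chunk work: here we just sum the numbers.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    # Divide: split the data into fixed-size chunks.
    chunks = [data[i:i + 250_000] for i in range(0, len(data), 250_000)]
    # Process in parallel: each worker handles one chunk independently.
    with Pool(processes=4) as pool:
        partial_results = pool.map(process_chunk, chunks)
    # Combine the partial results.
    total = sum(partial_results)
    print(total == sum(data))  # True
```

Note that this sketch assumes nothing ever fails; the failure modes listed below are exactly what a bare distributed system like this does not handle.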

Whenever multiple machines cooperate with one another, the problem of failures arises:

  • Network failures.

  • Individual compute nodes may overheat, crash, or experience hard-drive failures.

  • Data may be corrupted during transmission.

  • Clocks may not be synchronised.

  • Locks may not be released.

Whenever such an issue occurs, we would have to write code to fix it ourselves. The platform that handles all of the above is Hadoop.



Predecessors of Hadoop


Grid Computing: data is stored in one place, and multiple CPUs work together in parallel to process it.




MPI - It gives control to the programmer, but it requires explicitly handling the mechanics of the data flow, exposed via low-level C routines such as sockets.
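To see what "explicitly handling the mechanics of the data flow" means, here is a minimal sketch of low-level message passing, written in Python rather than C for brevity. A `socketpair` stands in for two communicating processes; the length-prefix framing is an illustrative convention, not part of MPI itself.

```python
import socket
import struct

# A connected socket pair stands in for two cooperating processes.
sender, receiver = socket.socketpair()

# Sender side: the programmer must serialize the data and frame the
# message explicitly (here, a 4-byte big-endian length prefix).
payload = b"partial result from worker 3"
sender.sendall(struct.pack("!I", len(payload)) + payload)

# Receiver side: the programmer must read the length, then read
# exactly that many bytes, by hand.
(length,) = struct.unpack("!I", receiver.recv(4))
message = receiver.recv(length)
print(message.decode())

sender.close()
receiver.close()
```

Every detail of the data flow (framing, byte order, who sends when) is the programmer's problem, which is exactly the burden the platforms below remove.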




To solve all these problems we have platforms that enable coders to work on the actual problem, while everything else is handled by the platform.


HADOOP - It is a platform (problems like the above are solved via MR - MapReduce).

SPARK - It is a platform (different tools for different needs).




HADOOP:


  • Data is distributed and processed in parallel where the data is independent.

  • Processing in Hadoop operates only at a high level, i.e. programmers think in terms of data models (such as key-value pairs for MR).

  • The MapReduce framework spares the programmer from having to think about failures, i.e. the Hadoop framework detects failed tasks and reschedules them on healthy nodes.


History:




