support an attachment of this size.
Therefore, with respect to email, this 100 MB file
can be referred to as big data. Similarly, if we want to
process 1 TB of data in a given time frame, we cannot
do this with a traditional system, since its resources
are not sufficient to accomplish the task.
As you are aware, various social sites such as
Facebook, Twitter, Google+, LinkedIn and YouTube
contain data in huge amounts. But as the number of users
on these social sites grows, storing and processing
this enormous data is becoming a challenging task.
Storing this data is important for various firms to
generate revenue, which is not possible with a
traditional file system. This is where Hadoop comes
into existence.
Big data simply means a huge amount
of structured, unstructured and semi-structured
data that has the ability to be processed for
information. Nowadays a massive amount of data is
produced because of growth in technology,
digitalization and a variety of sources, including
business application transactions, videos, pictures,
electronic mails, social media, and so on. To process
this data, the big data concept was introduced.
Structured data: data that has a proper format
associated with it is known as structured data. For example,
data stored in database tables or data stored in Excel
sheets.
Semi-structured data: data that does not have a
strict format but still carries some organization is known
as semi-structured data. For example, data stored in mail
files or in .docx files.
Unstructured data: data that does not have any format
associated with it is known as unstructured data. For example,
image files, audio files and video files.
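The three categories can be illustrated with a short sketch. The records below are invented for illustration only; the point is how addressable each kind of data is:

```python
import csv
import io
import json

# Structured: tabular data with a fixed schema, e.g. rows in a database or Excel sheet.
structured = "id,name,age\n1,Asha,34\n2,Ravi,29\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing but without a rigid schema, e.g. a mail message as JSON.
semi_structured = '{"from": "a@example.com", "subject": "hello", "tags": ["intro"]}'
message = json.loads(semi_structured)

# Unstructured: raw bytes with no schema at all, e.g. the start of an image file.
unstructured = bytes([0xFF, 0xD8, 0xFF])  # first bytes of a JPEG header

print(rows[0]["name"])     # structured: fields addressable by column name
print(message["subject"])  # semi-structured: fields addressable by key, shape may vary
print(len(unstructured))   # unstructured: only byte-level operations make sense
```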
Big data is characterized by the three V's, which
are as follows: [1]
Volume: the amount of data generated, i.e.
in a huge quantity.
Velocity: the speed at which the data is being
generated.
Variety: the different kinds of data that are
generated.
A. Challenges Faced by Big Data
There are two main challenges faced by big data: [2]
i. How to store and manage a huge volume of data
efficiently.
ii. How to process and extract valuable
information from a huge volume of data within a given
time frame.
These main challenges led to the development of the
Hadoop framework.
Hadoop is an open-source framework developed by
Doug Cutting in 2006 and managed by the Apache
Software Foundation. Hadoop was named after a yellow
toy elephant.
Hadoop was designed to store and process data
efficiently. The Hadoop framework comprises two main
components:
i. HDFS: it stands for Hadoop Distributed File
System, which takes care of the storage of data within a
Hadoop cluster.
ii. MapReduce: it takes care of the processing of the
data that is present in HDFS.
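The division of labour between the two components can be made concrete with word count, the canonical MapReduce example. The sketch below simulates the map, shuffle and reduce phases in plain Python; in a real cluster these functions would run as distributed tasks over data stored in HDFS, and the shuffle would be performed by the framework itself:

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one line of input."""
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Shuffle phase: group all emitted values by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(word, counts):
    """Reduce phase: sum the counts for a single word."""
    return word, sum(counts)

lines = ["Hadoop stores data", "Hadoop processes data"]
pairs = [pair for line in lines for pair in mapper(line)]
result = dict(reducer(w, c) for w, c in shuffle(pairs).items())
print(result)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

Because each mapper needs only its own lines and each reducer only its own key group, both phases can run in parallel across the nodes that hold the data.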
Now let us have a look at a Hadoop cluster:
here there are two kinds of nodes, the master node
and the slave nodes.
The master node is responsible for running the Name Node and Job
Tracker daemons. Here "node" is the technical term used to
denote a machine present in the cluster, and "daemon" is
the technical term for a background
process running on a Linux machine.
The slave node, on the other hand, is responsible for
running the Data Node and Task Tracker daemons.
The Name Node and Data Node are responsible for
storing and managing the data and are commonly referred
to as storage nodes, whereas the Job Tracker and Task
Tracker are responsible for processing and computing the
data and are commonly known as compute nodes.
Normally the Name Node and Job Tracker run on a
single machine, whereas the Data Nodes and Task Trackers
run on different machines.
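The daemon placement just described can be encoded as a small mapping. The node names below are invented for illustration; the placement itself follows the classic (Hadoop 1.x) layout from the text, with the master running the storage and compute coordinators and each slave running their workers:

```python
# Daemon placement in a classic Hadoop cluster: the master node runs the
# NameNode and JobTracker; every slave node runs a DataNode and a TaskTracker.
cluster = {
    "master-01": ["NameNode", "JobTracker"],      # hypothetical master machine
    "slave-01":  ["DataNode", "TaskTracker"],     # hypothetical slave machines
    "slave-02":  ["DataNode", "TaskTracker"],
}

storage_daemons = {"NameNode", "DataNode"}        # store and manage the data
compute_daemons = {"JobTracker", "TaskTracker"}   # process and compute the data

for node, daemons in cluster.items():
    roles = {d: ("storage" if d in storage_daemons else "compute") for d in daemons}
    print(node, roles)
```

Note that every machine in the cluster hosts one storage daemon and one compute daemon, which is what lets Hadoop move computation to where the data already lives.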
B. Features of Hadoop: [3]
i. Cost-effective system: it does not require any
special hardware. It can simply be implemented
on common machines, technically known as
commodity hardware.
ii. Large cluster of nodes: a Hadoop cluster can
support a large number of nodes, which provides
huge storage and processing capacity.
iii. Parallel processing: a Hadoop cluster provides the
ability to access and process data in parallel,
which saves a lot of time.
iv. Distributed data: it takes care of splitting and
distributing the data across all nodes within a cluster.
It also replicates the data over the entire cluster.
v. Automatic failover management: once automatic
failover management (AFM) is configured on a cluster, the
administrator need not worry about a failed machine. Hadoop replica
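The splitting and replication behaviour behind features iv and v can be sketched as follows. The block size and replication factor mirror common HDFS defaults (64 MB blocks in Hadoop 1.x, three replicas), but the round-robin placement here is a simplification of HDFS's real rack-aware policy:

```python
def split_into_blocks(file_size_mb, block_size_mb=64):
    """Split a file into fixed-size blocks, as HDFS does.
    The last block may be smaller than the block size."""
    blocks = []
    offset = 0
    while offset < file_size_mb:
        blocks.append(min(block_size_mb, file_size_mb - offset))
        offset += block_size_mb
    return blocks

def place_replicas(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes.
    Simplified round-robin placement, not HDFS's rack-aware policy."""
    placement = {}
    for i, _ in enumerate(blocks):
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

nodes = ["slave-01", "slave-02", "slave-03", "slave-04"]  # hypothetical data nodes
blocks = split_into_blocks(200)           # a 200 MB file -> blocks of 64, 64, 64, 8 MB
placement = place_replicas(blocks, nodes)

# If slave-01 fails, every block it held still has two live replicas on other
# nodes, which is why the administrator need not worry about the failed machine.
print(blocks)
print(placement[0])
```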