Big data refers to a volume of data so large that it cannot
be stored and processed within a given time frame by a
traditional file system.
The next question that comes to mind is how big this data
needs to be in order to be classified as big data. There is a
lot of misconception around the term big data. We
usually call data big if its size is in gigabytes,
terabytes, petabytes or exabytes, or anything larger than
this. This does not define big data completely.
Even a small file can be referred to as big
data, depending on the context in which it is used.
Let us take an example to make this clear. If we try to attach
a 100 MB file to an email, we will not be able to do so,
because email does not support an attachment of this size.
Therefore, with respect to email, this 100 MB file
can be referred to as big data. Similarly, if we want to
process 1 TB of data in a given time frame, we cannot
do this with a traditional system, since its resources
are not sufficient to accomplish the task.
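The two examples above can be sketched in a few lines of Python. This is only an illustration of the idea that "big" is relative to the system handling the data. The 25 MB attachment limit, the 100 MB/s throughput and the one-hour deadline are assumed figures chosen for the sketch, and the function names are hypothetical.

```python
MB = 10**6   # decimal megabyte, in bytes
TB = 10**12  # decimal terabyte, in bytes

# Assumed figures for illustration only; real systems differ.
EMAIL_ATTACHMENT_LIMIT = 25 * MB   # hypothetical mail server limit
SINGLE_DISK_THROUGHPUT = 100 * MB  # hypothetical bytes read per second
DEADLINE_SECONDS = 3600            # hypothetical one-hour time frame


def exceeds_capacity(size_bytes: int, limit_bytes: int) -> bool:
    """Data is 'big' for a system when it is larger than the system accepts."""
    return size_bytes > limit_bytes


def fits_time_frame(size_bytes: int, bytes_per_second: int, deadline_s: int) -> bool:
    """Check whether the data can be read once within the deadline."""
    return size_bytes / bytes_per_second <= deadline_s


# A 100 MB file is "big" with respect to email.
print(exceeds_capacity(100 * MB, EMAIL_ATTACHMENT_LIMIT))
# 1 TB on one machine at the assumed speed misses the one-hour deadline.
print(fits_time_frame(1 * TB, SINGLE_DISK_THROUGHPUT, DEADLINE_SECONDS))
```

Under these assumptions, reading 1 TB once takes 10,000 seconds on a single machine, which is why the work has to be spread over several machines to meet the deadline.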
As you are aware, social sites such as
Facebook, Twitter, Google+, LinkedIn and YouTube
contain huge amounts of data. As the number of users
on these sites grows, storing and processing
this enormous volume of data is becoming a challenging task.
Storing this data is important for firms that want to
generate revenue from it, which is not possible with a
traditional file system. This is where Hadoop comes
into the picture.
Big data simply means a huge amount
of structured, semi-structured and unstructured
data that can be processed to extract information. Nowadays a massive amount of data
is produced because of the growth of technology and
digitalization, and by a variety of sources, including
business application transactions, videos, pictures,
electronic mail, social media, and so on. The big data
concept was introduced to process this data.
Structured data: data that has a proper format
associated with it is known as structured data, for example
the data stored in database files or in Excel
sheets.
Semi-structured data: data that has only a partial
format associated with it is known as semi-structured data,
for example the data stored in mail files or in .docx
files.
Unstructured data: data that does not have any format
associated with it is known as unstructured data, for example
image files, audio files and video files.
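The difference between the three kinds of data can be illustrated with a small Python sketch. The sample values below are invented for the illustration, and JSON is used here only as a convenient stand-in for a semi-structured record such as a mail message.

```python
import csv
import io
import json

# Structured: every row follows the same fixed columns, as in a table.
structured_text = "id,name,age\n1,Asha,30\n2,Ravi,41\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: named fields exist, but the body is free text and
# fields may vary from one record to the next.
semi_structured_text = '{"from": "asha@example.com", "subject": "Hi", "body": "Any free text"}'
mail = json.loads(semi_structured_text)

# Unstructured: raw bytes with no fields a program can address directly
# (made-up bytes standing in for image, audio or video content).
unstructured = bytes([0x00, 0x1F, 0xA4, 0x7C])

print(rows[1]["name"])   # look up a value by column name
print(mail["subject"])   # look up a value by field name
print(len(unstructured)) # only the size is known without further decoding
```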
Big data is characterized by the 3 V's associated with it, which
are as follows [1]:
Volume: the amount of data generated, i.e.
a huge quantity.
Velocity: the speed at which the data is
generated.
Variety: the different kinds of data that are
generated.
A. Challenges Faced by Big Data
There are two main challenges faced by big data [2]:
i. How to store and manage a huge volume of data
efficiently.
ii. How to process and extract valuable
information from a huge volume of data within a given
time frame.
These main challenges led to the development of the
Hadoop framework.
Hadoop is an open source framework developed by
Doug Cutting in 2006 and managed by the Apache
Software Foundation. Hadoop was named after a yellow
toy elephant.
Hadoop was designed to store and process data
efficiently. The Hadoop framework comprises two main
components:
i. HDFS: stands for Hadoop Distributed File
System, which takes care of storing data within the
Hadoop cluster.
ii. MapReduce: takes care of processing the
data that is present in HDFS.
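To give a feel for the MapReduce style of processing, here is a minimal single-machine sketch in plain Python of the classic word count. It does not use the Hadoop API and does not read from HDFS. The function names are hypothetical, and a real job would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data", "Big Hadoop"]))
```

The point of the split is that each line can be mapped independently and each key can be reduced independently, which is what allows the work to be distributed.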
Now let us look at a Hadoop cluster.
It has two kinds of node: the master node
and the slave node.
The master node is responsible for running the name node and job
tracker daemons. Here node is the technical term used to
denote a machine in the cluster, and daemon is
the technical term for a background
process running on a Linux machine.
The slave node, on the other hand, is responsible for
running the data node and task tracker daemons.
The name node and data node are responsible for
storing and managing the data and are commonly referred
to as storage nodes, whereas the job tracker and task
tracker are responsible for processing and computing the
data and are commonly known as compute nodes.
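The division of roles described above can be summarized as a small lookup table. This is only an illustrative Python sketch of the layout in the text. The dictionary and function names are hypothetical and are not part of Hadoop.

```python
# Which daemon runs on which kind of node, split by responsibility.
CLUSTER_LAYOUT = {
    "master": {"storage": "NameNode", "compute": "JobTracker"},
    "slave": {"storage": "DataNode", "compute": "TaskTracker"},
}


def daemons_on(node_kind):
    """Return the daemons that run on a master or slave node."""
    return sorted(CLUSTER_LAYOUT[node_kind].values())


def daemons_for(role):
    """Return the daemons that share a role: 'storage' or 'compute'."""
    return sorted(layout[role] for layout in CLUSTER_LAYOUT.values())


print(daemons_on("master"))
print(daemons_for("storage"))
```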
Normally the name node and job tracker run on a
single machine wher