A brief introduction to HDFS

HDFS (Hadoop Distributed File System)

Structure

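NameNode: manages the file system metadata (which blocks make up each file and which DataNodes hold them) and coordinates the cluster; it does not store file data itself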

StandbyNode: replays the NameNode's edit log to keep an up-to-date copy of the metadata, and serves as a backup of the NameNode

DataNode: stores the actual data blocks

Write:

Client sends a write request to the NameNode->the file is split into blocks->NameNode chooses the DataNodes that will store each block (balancing workload)->client sends each block to the first DataNode->each DataNode passes the block on to the next DataNode (the same block is copied and stored on multiple DataNodes, 3 by default)
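
The write path can be sketched as a small in-memory simulation. This is only a toy model, not the Hadoop API: the function names, the tiny block size, and the "least-loaded node first" placement rule are all assumptions made for illustration (real HDFS placement also considers rack topology).

```python
# Toy, in-memory model of the HDFS write path; not the real Hadoop API.
BLOCK_SIZE = 4      # bytes per block, tiny for illustration (HDFS defaults to 128 MB)
REPLICATION = 3     # copies kept of each block (the HDFS default)


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut the file contents into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def choose_datanodes(stored, replication=REPLICATION):
    """NameNode role: pick the least-loaded DataNodes (a simplified, assumed policy)."""
    by_load = sorted(stored, key=lambda name: (len(stored[name]), name))
    return by_load[:replication]


def write_file(data, stored):
    """Write each block through a pipeline; return the block map the NameNode would keep."""
    block_map = {}
    for block_id, block in enumerate(split_into_blocks(data)):
        pipeline = choose_datanodes(stored)
        # The client sends the block to the first node; each node forwards it to the next.
        for node in pipeline:
            stored[node][block_id] = block
        block_map[block_id] = pipeline
    return block_map


# Four hypothetical DataNodes, each mapping block id -> block contents.
datanodes = {"dn1": {}, "dn2": {}, "dn3": {}, "dn4": {}}
block_map = write_file(b"hello hdfs!", datanodes)
print(block_map)
```

The returned block map is the kind of metadata the NameNode keeps: for every block, the list of DataNodes holding a replica.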

Read:

Client sends a read request to the NameNode->NameNode tells the client which DataNodes hold each block->client reads each block from the nearest DataNode that holds it
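
A minimal sketch of the read path, again as a toy model rather than real Hadoop code. The block locations, stored contents, and network distances below are made-up values; the point is only that the client picks the closest replica for each block and joins the blocks in order.

```python
# Toy model of the HDFS read path; all names and data here are illustrative.
# Metadata held by the NameNode: for each block, the DataNodes that store a replica.
BLOCK_LOCATIONS = {0: ["dn1", "dn2", "dn3"], 1: ["dn2", "dn3", "dn4"]}

# Block contents as stored on each DataNode.
DATANODE_STORAGE = {
    "dn1": {0: b"hello "},
    "dn2": {0: b"hello ", 1: b"hdfs"},
    "dn3": {0: b"hello ", 1: b"hdfs"},
    "dn4": {1: b"hdfs"},
}

# Hypothetical network distance from the client to each DataNode (lower is closer).
DISTANCE = {"dn1": 0, "dn2": 2, "dn3": 4, "dn4": 2}


def nearest_replica(block_id):
    """Pick the closest DataNode that holds the given block (ties broken by name)."""
    return min(BLOCK_LOCATIONS[block_id], key=lambda node: (DISTANCE[node], node))


def read_file():
    """Fetch every block, in order, from its nearest replica and join them."""
    parts = []
    for block_id in sorted(BLOCK_LOCATIONS):
        node = nearest_replica(block_id)
        parts.append(DATANODE_STORAGE[node][block_id])
    return b"".join(parts)


print(read_file())
```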

How to handle node failure:

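DataNode: each DataNode periodically sends a heartbeat to the NameNode. If the heartbeats from a DataNode stop, the NameNode treats it as dead and re-replicates its blocks on other DataNodes.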
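
The heartbeat check can be sketched as follows. This is an illustrative model only: the class name, the 30-second timeout, and the node names are assumptions, and timestamps are passed in explicitly so the example is deterministic.

```python
# Toy heartbeat monitor; the timeout value and node names are illustrative.
HEARTBEAT_TIMEOUT = 30.0   # seconds of silence before a node is treated as dead (assumed)


class HeartbeatMonitor:
    """NameNode-side bookkeeping of when each DataNode was last heard from."""

    def __init__(self, timeout=HEARTBEAT_TIMEOUT):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, node, now):
        """Record a heartbeat received from `node` at time `now` (in seconds)."""
        self.last_seen[node] = now

    def dead_nodes(self, now):
        """Nodes whose last heartbeat is older than the timeout."""
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


def under_replicated(block_locations, dead, replication=3):
    """Block ids left with fewer than `replication` live replicas."""
    return sorted(
        block_id
        for block_id, nodes in block_locations.items()
        if len([n for n in nodes if n not in dead]) < replication
    )


monitor = HeartbeatMonitor()
for node in ("dn1", "dn2", "dn3", "dn4"):
    monitor.heartbeat(node, now=0.0)
for node in ("dn1", "dn3", "dn4"):       # dn2 goes silent after t=0
    monitor.heartbeat(node, now=25.0)

dead = monitor.dead_nodes(now=40.0)
locations = {0: ["dn1", "dn2", "dn3"], 1: ["dn1", "dn3", "dn4"]}
print(dead, under_replicated(locations, dead))
```

Blocks reported as under-replicated are the ones the NameNode would schedule for copying to another live DataNode.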

NameNode: the StandbyNode serves as a backup and takes over if the NameNode fails.
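
As a rough sketch of why the backup can take over, the example below has a standby node replay a shared edit log so that its metadata matches the active node's before it is promoted. The class, the log entry format, and the file paths are hypothetical; real HDFS failover involves more machinery than is shown here.

```python
# Toy model of a standby node replaying an edit log; not the real Hadoop classes.
def apply_edit(namespace, edit):
    """Apply one logged operation to a namespace (a dict of path -> block ids)."""
    op, path, blocks = edit
    if op == "create":
        namespace[path] = list(blocks)
    elif op == "delete":
        namespace.pop(path, None)


class NameNodeSketch:
    def __init__(self, role):
        self.role = role        # "active" or "standby"
        self.namespace = {}
        self.applied = 0        # number of edit-log entries applied so far

    def catch_up(self, edit_log):
        """Replay any log entries this node has not applied yet."""
        for edit in edit_log[self.applied:]:
            apply_edit(self.namespace, edit)
        self.applied = len(edit_log)


# Shared edit log written by the active node (hypothetical entries).
edit_log = [
    ("create", "/a.txt", [0, 1]),
    ("create", "/b.txt", [2]),
    ("delete", "/a.txt", []),
]

standby = NameNodeSketch("standby")
standby.catch_up(edit_log)

# Failover: the active node is gone, so the standby is promoted.
standby.role = "active"
print(standby.role, standby.namespace)
```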