A brief introduction to HDFS
HDFS (Hadoop Distributed File System)
Structure
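NameNode: manages the file system namespace and metadata (which blocks make up each file and which DataNodes hold them); file data itself does not pass through it
StandbyNode (Standby NameNode): keeps its state in sync with the active NameNode by replaying its edit log, checkpoints the namespace, and serves as a backup of the NameNode
DataNode: stores the actual data blocks and serves read and write requests from clients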
Write:
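Client sends a write request to the NameNode->NameNode checks the request and picks the DataNodes that will store each block (balancing workload across nodes and racks)->Client splits the file into blocks and sends each block directly to the first DataNode->that DataNode passes the data to the next DataNode in the pipeline (each block is copied and stored on multiple DataNodes, 3 by default)->acknowledgements travel back along the pipeline to the client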
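The write path above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not HDFS code: the classes, the 4-byte block size, and the "least-loaded nodes first" placement rule are all made up for the example (real HDFS uses rack-aware placement and much larger blocks).

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4      # bytes per block; tiny on purpose, purely illustrative
REPLICATION = 3     # number of copies kept of each block


@dataclass
class DataNode:
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes

    def receive(self, block_id, data, downstream):
        """Store the block, then forward it to the next node in the pipeline."""
        self.blocks[block_id] = data
        if downstream:
            downstream[0].receive(block_id, data, downstream[1:])


@dataclass
class NameNode:
    datanodes: list
    block_map: dict = field(default_factory=dict)  # block_id -> DataNode names

    def allocate(self, block_id):
        """Choose target DataNodes for a block. Only metadata is recorded here.

        Assumption: pick the nodes currently holding the fewest blocks.
        """
        targets = sorted(self.datanodes, key=lambda d: len(d.blocks))[:REPLICATION]
        self.block_map[block_id] = [d.name for d in targets]
        return targets


def write_file(namenode, path, data):
    """Client side: split into blocks, ask for targets, start each pipeline."""
    for offset in range(0, len(data), BLOCK_SIZE):
        block_id = f"{path}#{offset // BLOCK_SIZE}"
        pipeline = namenode.allocate(block_id)
        chunk = data[offset:offset + BLOCK_SIZE]
        # The client talks only to the first DataNode; it forwards the rest.
        pipeline[0].receive(block_id, chunk, pipeline[1:])


nodes = [DataNode(f"dn{i}") for i in range(4)]
nn = NameNode(nodes)
write_file(nn, "/demo.txt", b"hello hdfs")
```

Note that the NameNode object never sees the bytes being written; it only records where each block went.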
Read:
Client sends a read request to the NameNode->NameNode returns the locations of the DataNodes holding each block, nearest to the client first->Client reads each block directly from the nearest DataNode
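A minimal sketch of the read path, assuming hypothetical block names, DataNode names, and network distances. The dictionaries stand in for NameNode metadata and DataNode storage; none of this is the real HDFS API.

```python
# Hypothetical NameNode metadata: block -> DataNodes holding a replica.
BLOCK_LOCATIONS = {
    "blk_0": ["dn1", "dn2", "dn3"],
    "blk_1": ["dn1", "dn3", "dn4"],
}

# Hypothetical network distance from the client to each DataNode (lower is closer).
DISTANCE = {"dn1": 4, "dn2": 0, "dn3": 2, "dn4": 4}

# Hypothetical block contents stored on each DataNode.
STORAGE = {
    "dn1": {"blk_0": b"hello ", "blk_1": b"hdfs"},
    "dn2": {"blk_0": b"hello "},
    "dn3": {"blk_0": b"hello ", "blk_1": b"hdfs"},
    "dn4": {"blk_1": b"hdfs"},
}


def get_block_locations(blocks):
    """NameNode side: return replica locations per block, nearest first."""
    return {b: sorted(BLOCK_LOCATIONS[b], key=DISTANCE.get) for b in blocks}


def read_file(blocks):
    """Client side: fetch each block directly from its nearest DataNode."""
    locations = get_block_locations(blocks)
    data, sources = b"", []
    for b in blocks:
        nearest = locations[b][0]
        data += STORAGE[nearest][b]
        sources.append(nearest)
    return data, sources


data, sources = read_file(["blk_0", "blk_1"])
```

Here the two blocks end up being read from different DataNodes, because the closest node does not hold a replica of every block.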
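How to handle node failure:
DataNode: each DataNode sends periodic heartbeats to the NameNode. If the heartbeats from a DataNode stop, the NameNode marks it as dead and re-replicates its blocks onto other DataNodes.
NameNode: the StandbyNode (Standby NameNode) takes over as the active NameNode.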
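The heartbeat check for DataNodes can be sketched as follows. The class and function names, the 30-second timeout, and the timestamps are assumptions chosen for illustration; they are not HDFS defaults or HDFS code.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat time per DataNode (NameNode side)."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence before a node counts as dead
        self.last_seen = {}      # node name -> time of last heartbeat

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def dead_nodes(self, now):
        """Nodes whose last heartbeat is older than the timeout."""
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


def under_replicated(block_map, dead, target=3):
    """Return block -> number of replicas to re-create on live nodes."""
    missing = {}
    for block, holders in block_map.items():
        live = [h for h in holders if h not in dead]
        if len(live) < target:
            missing[block] = target - len(live)
    return missing


monitor = HeartbeatMonitor(timeout=30)
for node in ["dn1", "dn2", "dn3", "dn4"]:
    monitor.heartbeat(node, now=0)
for node in ["dn1", "dn2", "dn4"]:   # dn3 goes silent after t=0
    monitor.heartbeat(node, now=20)

dead = monitor.dead_nodes(now=40)
block_map = {
    "blk_0": ["dn1", "dn2", "dn3"],
    "blk_1": ["dn1", "dn2", "dn4"],
}
to_repair = under_replicated(block_map, dead)
```

Only the block that had a replica on the silent node needs a new copy; the other block is still fully replicated.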