Big Data Analysis using Apache Hadoop
HDFS (Hadoop Distributed File System)
Hadoop File System was developed using ditributed file system design. It is run on commodity hardware. Unlike other distributed systems, HDFS is highly fault tolerant and designed using low-cost hardware.
HDFS holds very large amount of data and provides easier access. To store such huge data, the files are stored across multiple machines. These files are stored in redundant fashion to rescue the system from possible data losses in case of failure. HDFS also makes applications available to parallel processing.
- Parallelize data so that Access can be parallelized.
- Appears as a single task.
- Based on GFS Google file system.
- Fault tolerant – handle disk crashes, machine crash.
- Storage large amount of files.
- Streaming Data.
- Multiple reads write once.
Get In Touch