Analyse Hadoop fsimage using the Offline Image Viewer (OIV) Tool
The Hadoop fsimage is a binary “image” file, so its contents cannot be read with ordinary Unix tools such as cat or more. At times it is important to read a clear-text version of the fsimage, which holds the metadata of the file system. With it you can perform namespace analysis, check the health of your fsimage, and explore interesting usage patterns.
The Offline Image Viewer is a tool that dumps the contents of HDFS fsimage files to human-readable formats, allowing offline analysis and examination of a Hadoop cluster’s namespace. The tool can process very large image files relatively quickly, converting them to one of several output formats. If it cannot process an image file, it exits cleanly. The Offline Image Viewer does not require a running Hadoop cluster; it operates entirely offline.
Let's now read and analyse the fsimage using the OIV tool.
STEP 1: Download the latest fsimage copy.
$ hdfs dfsadmin -fetchImage /tmp
14/07/08 07:27:49 INFO namenode.TransferFsImage: Opening connection to http://<nn_hostname>:50070/getimage?getimage=1&txid=latest
14/07/08 07:27:49 INFO namenode.TransferFsImage: Transfer took 0.23s at 89.74 KB/s
$ ls -ltr /tmp | grep -i fsimage
-rw-r--r-- 1 root root 22164 Jul 8 07:27 fsimage_0000000000000001386
STEP 2: Convert the fsimage to text. In this example no processor is specified, and the output is an ls-style listing of the namespace.
$ hdfs oiv -i /tmp/fsimage_0000000000000001386 -o /tmp/fsimage.txt
$ head /tmp/fsimage.txt
drwxrwxrwx - hdfs supergroup 0 2014-07-08 06:05 /
drwxr-xr-x - root supergroup 0 2014-07-08 06:05 /conf.d
-rw-r--r-- 3 root supergroup 5314 2014-07-08 06:05 /gmetad.conf
-rw-r--r-- 3 root supergroup 8283 2014-07-08 06:05 /gmond.conf
....<output truncated for brevity>
STEP 3: For a more detailed view, select the Indented processor with -p Indented. It prints the image header followed by the attributes stored for each inode.
$ hdfs oiv -i /tmp/fsimage_0000000000000001386 -o /tmp/fsimage.txt -p Indented
$ head -n 20 /tmp/fsimage.txt
FS_IMAGE
  IMAGE_VERSION = -40
  NAMESPACE_ID = 1591147859
  GENERATION_STAMP = 1209
  TRANSACTION_ID = 1386
  IS_COMPRESSED = false
  INODES [NUM_INODES = 228]
    INODE
      INODE_PATH = /
      REPLICATION = 0
      MODIFICATION_TIME = 2014-07-08 06:05
      ACCESS_TIME = 1969-12-31 18:00
      BLOCK_SIZE = 0
      BLOCKS [NUM_BLOCKS = -1]
      NS_QUOTA = 2147483647
      DS_QUOTA = -1
      PERMISSIONS
        USER_NAME = hdfs
        GROUP_NAME = supergroup
        PERMISSION_STRING = rwxrwxrwx
Once the image is in a human-readable format, you can use it to, for example:
- Determine the number of files each user has created on the file system
- Find files that were created but have not been accessed
- Find probable duplicates of large files by comparing the size of each file
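The analyses listed above can be sketched with awk over the text dumps. The shell functions below are hypothetical helpers, not part of Hadoop: files_per_user and probable_duplicates read the ls-style listing produced in STEP 2, and never_accessed reads the Indented dump from STEP 3. They assume paths without whitespace and the field layouts shown in the samples above. The access-time check is only a heuristic: it relies on HDFS access-time tracking being enabled, and reads that fall inside the access-time precision window are not recorded.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for analysing Offline Image Viewer text dumps with awk.
# Assumption: paths contain no whitespace (awk splits fields on spaces), and the
# ls-style listing has the columns seen in the sample output:
#   permissions replication user group size date time path

# Count regular files (permission string starting with "-") per owning user.
# Usage: files_per_user LS_LISTING
files_per_user() {
  awk '$1 ~ /^-/ { count[$3]++ }
       END { for (u in count) print u, count[u] }' "$1" | sort
}

# Group regular files of at least MIN_BYTES (default 0) by size and print every
# size shared by two or more files. Equal size only suggests duplication;
# confirm with a checksum before acting on it.
# Usage: probable_duplicates LS_LISTING [MIN_BYTES]
probable_duplicates() {
  awk -v min="${2:-0}" '
    $1 ~ /^-/ && $5 + 0 >= min + 0 { n[$5]++; paths[$5] = paths[$5] " " $8 }
    END { for (s in n) if (n[s] > 1) print s ":" paths[s] }' "$1" | sort -n
}

# Heuristic: list files in an Indented dump whose access time equals their
# modification time, i.e. files apparently not read since they were written.
# Usage: never_accessed INDENTED_DUMP
never_accessed() {
  awk '
    { sub(/^[ \t]+/, "") }                        # drop the indentation
    /^INODE_PATH = /        { path = substr($0, 14); mtime = ""; atime = "" }
    /^MODIFICATION_TIME = / { mtime = substr($0, 21) }
    /^ACCESS_TIME = /       { atime = substr($0, 15) }
    index($0, "BLOCKS [NUM_BLOCKS = ") == 1 {
      nblocks = substr($0, 22) + 0                 # numeric prefix, e.g. -1
      # Directories are stored with NUM_BLOCKS = -1, so report only nblocks >= 0.
      if (nblocks >= 0 && atime != "" && atime == mtime) print path
    }' "$1"
}
```

For example, after saving the two dumps under separate (hypothetical) names such as /tmp/fsimage_ls.txt and /tmp/fsimage_indented.txt, run files_per_user /tmp/fsimage_ls.txt, then probable_duplicates /tmp/fsimage_ls.txt 104857600 to consider only files of 100 MiB and larger, and finally never_accessed /tmp/fsimage_indented.txt.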