Repairing the NameNode Edits Log and the HBase .META. Table

For the last six months I have been working on a Hadoop cluster with HBase running on top of it. During this period I encountered two very interesting and challenging issues: one was editing a corrupted NameNode edits file, and the other was restoring the HBase .META. table.

The cluster I was working on consisted of 13 physical nodes: 11 datanodes, a primary NameNode, and a Secondary NameNode, with a total storage capacity of 11 TB. Each node was a dual-core Xeon server. The trouble started when I was running a lot of MapReduce jobs that, for some reason, were failing and producing an enormous amount of logging output. Since these jobs were started on the NameNode, its disk filled up pretty quickly and reached 100% usage overnight.

Editing "edits" file
=============

For beginners, some very basic background on the NameNode as it concerns this post: it is the master node of the cluster and keeps a log of every edit done on HDFS in a binary "edits" file. If the disk hits 100% usage, the NameNode may be unable to write a full edit record, and when you then try to restart it, it throws a fatal error whose type depends on where the edits file was corrupted; in my case it was a NullPointerException. I was using Hadoop 0.20.205.0 and HBase 0.90.5. Initial searches showed that you can replace your "edits" file from a checkpoint kept by HDFS, but that too was corrupted in my case. If your Secondary NameNode is correctly configured you can restore from it instead, but when I installed Hadoop I hadn't checked whether the secondary was actually keeping a copy of the state, and it was not, due to some binding issues that I resolved later. So the only option left was to edit the binary "edits" file and remove the last record written by the NameNode. There are two wikis that might help you: http://wiki.apache.org/hadoop/TroubleShooting and http://wiki.apache.org/hadoop/NameNodeFailover
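For the record, had the secondary been checkpointing correctly, the recovery would have looked roughly like this. This is only a sketch based on the NameNodeFailover wiki; the directory paths are illustrative and in reality come from dfs.name.dir and fs.checkpoint.dir in your hdfs-site.xml:

    # Copy the checkpoint from the secondary into fs.checkpoint.dir on the
    # NameNode host, move the damaged name directory aside, and let the
    # NameNode import the checkpoint on startup.
    mv /data/dfs/name /data/dfs/name.corrupt   # keep the damaged copy around
    mkdir /data/dfs/name
    hadoop namenode -importCheckpoint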

If the records were written in a fixed, structured way, removing the last one wouldn't be an issue, but that's not the case, and when your HDFS is around 70% full, formatting the filesystem is not an option. So, with no other option left, I took up the challenge of editing this file with hexedit… :)

Long story short, I looked at the stack trace of the exception and found that the main function that tries to load the edits file is loadFSEdits in org.apache.hadoop.hdfs.server.namenode.FSEditLog. I then went through that function line by line and seeked to the corresponding location in the actual "edits" file.
The file header is an int holding the layout version; if that is read successfully, the loader starts reading records. Each record starts with a one-byte opcode, which can be between -1 and 14 inclusive. Here is the trick: not every opcode's record occupies the same number of bytes, so you have to look carefully at each opcode and the bytes its record consumes, and seek through the file accordingly. To debug the corrupted log, you should always start from the end.
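To make the layout concrete, here is a minimal sketch of inspecting the start of the file before opening it in hexedit. The path is illustrative; the file lives under your dfs.name.dir:

    # The first 4 bytes are the layout version, a negative int stored
    # big-endian; the byte immediately after it is the first record's
    # opcode. The offsets in hexdump's left column are what you seek to
    # in hexedit while walking the file record by record.
    hexdump -C /data/dfs/name/current/edits | head -n 4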

After you find the corrupted record, you can overwrite its opcode and all the bytes after it with OP_INVALID (-1), until you reach the run of "00" padding bytes; after that you can restart your NameNode without any exceptions. :)
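The same patch can be applied with standard tools instead of an interactive hexedit session. This is a sketch, where OFFSET and COUNT are the values you worked out by hand: the offset where the corrupted record starts, and the number of bytes between it and the existing "00" padding:

    cp edits edits.bak        # always keep a backup before touching the file
    OFFSET=4971               # illustrative: start of the corrupted record
    COUNT=13                  # illustrative: bytes up to the 00 padding
    # Stamp OP_INVALID (0xff) over the corrupted record in place; the
    # loader stops reading when it hits the first OP_INVALID opcode.
    printf '\xff%.0s' $(seq 1 $COUNT) | dd of=edits bs=1 seek=$OFFSET conv=notrunc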

Recovering HBase .META. Table
=====================

The above edits corruption can have the side effect of corrupting your HBase ".META." table: if, during recovery, HDFS is unable to restore some of the ".META." table's blocks, HBase won't start successfully. It will come up, but you won't be able to see any regions in it. The ".META." table contains all of your region-to-regionserver mappings, so if it gets corrupted you won't be able to boot HBase. Before meddling with HBase logs and region files, you should try your best to recover the blocks at the HDFS level itself. If that fails, version 0.92.0 onwards includes a tool for offline repair of the .META. table from filesystem data, as mentioned in the Cloudera release page:

" This code is used to rebuild meta off line from file system data.If there
   * are any problem detected, it will fail suggesting actions for the user to do
   * to "fix" problems. If it succeeds, it will backup the previous .META. and
   * -ROOT- dirs and write new tables in place.
   *
   * This is an advanced feature, so is only exposed for use if explicitly
   * mentioned.
   *
   * hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair .. "


You can use this tool to recover the ".META." table.
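In practice the invocation is just the class name passed to the hbase launcher, run while HBase is fully stopped; the stop script below is the stock one shipped with HBase:

    # Stop HBase, then rebuild .META. from the region directories on HDFS.
    # The tool backs up the old .META. and -ROOT- dirs before writing.
    ${HBASE_HOME}/bin/stop-hbase.sh
    hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair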

I found these two issues very intriguing. For the first one, patches have since gone in to monitor disk usage and move the NameNode into safe mode once usage crosses a threshold; see HDFS-1594.

Hope you find this post helpful. Thanks
