HDFS File System Metadata Backup
The NameNode daemon in an Apache Hadoop cluster runs an HTTP web interface (an HTTP servlet). One of the features exposed by this interface is the ability to retrieve the HDFS filesystem metadata files, fsimage and edits.
The fsimage file contains a serialised form of the filesystem metadata (file names, permissions, owners etc.) and the edits file contains a log of all filesystem changes (e.g. file or directory create operations, permission changes etc.).
The fsimage is held both on disk and in memory. When changes are made to HDFS they are reflected in the in-memory version of the fsimage and in the edits log, but not in the fsimage stored on disk (although that copy is periodically updated by the checkpoint process and on NameNode start-up).
I've just posted the initial version of a Python (2.x) script for backing up HDFS metadata to GitHub. I hope to improve the script by modularising it further, cleaning it up and adding more error handling. Feel free to modify the script for your own use (I'd ask that you feed improvements back via GitHub).
The script can be used with Apache Hadoop 1.0/CDH3 and Apache Hadoop 2.0/CDH4. As you may be aware, the 2.0 release uses a slightly different metadata storage scheme: with 1.0 there was a single fsimage file and a single edits file, whereas with 2.0 there can be more than one fsimage file, each labelled with the most recent transaction ID it covers, and multiple edits files, each covering a range of transaction IDs.
In the screenshot above you can see the edits files, numbered by transaction, and the fsimage files. The MD5 hash of each fsimage is also stored so that corruption can be detected. You may also have noticed a file named edits_inprogress_xxx: this is the current edits file, which stores all transactions since transaction 3170 - there's no ending transaction ID because it is still being written to.
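If the screenshot isn't visible, an illustrative listing of a NameNode metadata directory might look like the following (the transaction IDs, other than the in-progress 3170 mentioned above, are invented for illustration):
edits_0000000000000003001-0000000000000003169
edits_inprogress_0000000000000003170
fsimage_0000000000000003000
fsimage_0000000000000003000.md5
fsimage_0000000000000003169
fsimage_0000000000000003169.md5
seen_txid
VERSION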
The seen_txid file contains the ID of the last transaction seen in that directory. This sounds odd, but remember that the metadata can be written to multiple directories, as specified by the dfs.namenode.name.dir configuration parameter in hdfs-site.xml. In fact you can specify separate directories for the edits and fsimage files via the dfs.namenode.edits.dir parameter. The NameNode writes synchronously to all the metadata directories, so why might the seen_txid file in one directory contain a different value? Well, if one of the directories suffers a temporary failure it may not contain the latest metadata.
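As a minimal sketch, and purely for illustration (the paths below are invented examples), two metadata directories could be configured in hdfs-site.xml like this:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/1/dfs/nn,/data/2/dfs/nn</value>
</property>
dfs.namenode.edits.dir can be set in the same way if you want the edits files stored separately.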
Here are the URLs for retrieving the fsimage and edits files (assuming your cluster is using the default ports):
Apache Hadoop 1.0/CDH3:
fsimage: http://<namenode>:50070/getimage?getimage=1
edits: http://<namenode>:50070/getimage?getedit=1
Apache Hadoop 2.0/CDH4:
fsimage: http://<namenode>:50070/getimage?getimage=1&txid=latest
edits: http://<namenode>:50070/getimage?getedit=1&startTxId=X&endTxId=Y
You can retrieve the files from the command line using wget or curl. The files cannot be viewed as-is since they are serialised; however, you can use the hdfs oiv and hdfs oev commands to deserialise them.
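For example, fetching the latest fsimage with wget might look like the following (the output file name is arbitrary; quote the Hadoop 2.0 URL so the shell doesn't treat the & as a background operator):
Apache Hadoop 1.0/CDH3: wget -O fsimage.backup 'http://<namenode>:50070/getimage?getimage=1'
Apache Hadoop 2.0/CDH4: wget -O fsimage.backup 'http://<namenode>:50070/getimage?getimage=1&txid=latest'
(As noted in the comments below, this URL may not work on every CDH4 release; hdfs dfsadmin -fetchImage is an alternative there.)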
The screenshot below shows how to use the hdfs oiv command to read the serialised version of the fsimage and write out the deserialised version to another file.
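Since the screenshot may not be visible here, a typical invocation looks something like the following (the input and output paths are placeholders; the -p flag can be used to choose a different output processor):
hdfs oiv -i /data/1/dfs/nn/current/fsimage_0000000000000003169 -o /tmp/fsimage.txt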
The next screenshot shows the hdfs oev command (note the 'e') to deserialise the edits file.
Note that the deserialised output is XML (the default output format for hdfs oev).
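Again as a sketch, with placeholder paths:
hdfs oev -i /data/1/dfs/nn/current/edits_0000000000000003001-0000000000000003169 -o /tmp/edits.xml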
I hope you found this blog post useful.
Hi.
This is Harry Haller, from Spain.
First of all: great blog!
I'm currently working on my own, designing automated cluster deployment and management.
I'm writing to you to ask your consent to use your current scripts, developed in Python, and adapt them to CDH 4.5.0 and CDH 5.0 (currently in beta).
Thanks in advance.
P.S.: if you want to contact me, my email address is telemaco230@gmail.com.
Hi Harry, the code is on GitHub so feel free to use it. Just to be clear, this script is not part of a suite of cluster management and deployment scripts; it's standalone.
I believe the preferred method in CDH4 is 'dfsadmin -fetchImage'
'wget http://<namenode>:50070/getimage?getimage=1&txid=latest' fails in our CDH4.2.1 env with an error "410 GetImage failed. java.io.IOException: Invalid request has no txid parameter"
Hi Steve, thanks for your comments, having looked into this again I believe you are correct.