Cloudera Certified Administrator for Apache Hadoop CDH4 Exam

I took the Cloudera Certified Administrator for Apache Hadoop CDH4 (CCAH) exam last week and I'm happy to say that I passed.

The CCAH is a 90-minute, 60-question exam with a pass mark of 70%. It's a multiple-choice exam where you have to select either the single best answer or multiple correct answers. The emphasis on "best" is there because in such questions several statements may be factually correct, but one of them provides more information about the topic and is therefore the 'best' answer.

You are provided with a sheet of laminated paper for jotting down notes or performing calculations (don't worry, they're just simple calculations).

Although the syllabus provided for CCAH on the Cloudera website is, shall we say, quite succinct, the exam questions do closely follow it - you just have to think about the topics mentioned in the syllabus and what the key points are in each topic.

I attended the Cloudera Administrator Training for Apache Hadoop but you can pass the exam by simply reading the project documentation, Hadoop Operations (if you take the course before March 2013 you receive an electronic copy of the book on the training course) and Hadoop The Definitive Guide (currently in 3rd edition). I have reviewed both books in a previous post. The documentation is quite brief in places and it may not be immediately obvious how something works by reading the documentation alone (I'm assuming you'd rather not dig through the Java source code to work it out!). In light of this I'd recommend investing in one or both of the aforementioned books. I have listed the chapters that map to each of the syllabus topics in the table below.


Syllabus Topic | Hadoop Operations | Hadoop The Definitive Guide
HDFS (38%) | Chapters 1 and 6 | Chapter 3, covered in more detail than in Hadoop Operations.
MapReduce (10%) | Chapter 3 | Chapter 6, covered in more detail than in Hadoop Operations.
Apache Hadoop Cluster Planning (12%) | Chapter 4, much more detail than Hadoop The Definitive Guide. | Chapter 9
Apache Hadoop Cluster Installation and Administration (17%) | Chapters 5 and 8 | Chapters 9 and 10, but generally better coverage in Hadoop Operations.
Resource Management (6%) | Chapter 7. Reading both books' information on this topic helped me to better understand scheduling. | Chapter 6, the section on Job Scheduling.
Monitoring and Logging (12%) | Chapter 10, slightly better coverage in this book. | Chapter 10
Ecosystem (5%) | Not covered beyond briefly discussing ecosystem projects in the introduction and in the context of configuring security (Kerberos). | Excellent coverage of ecosystem projects including Hive, Pig, HBase and Sqoop.
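
As a rough guide to what these weightings mean in practice, the short Python sketch below turns the figures quoted in this post (60 questions, 70% pass mark, and the syllabus percentages) into question counts. It assumes that the weightings translate proportionally into numbers of questions, which is my assumption and not something Cloudera states; the function and variable names are purely illustrative.

```python
# Figures taken from this post; the proportional split is an assumption.
TOTAL_QUESTIONS = 60
PASS_MARK_PERCENT = 70

SYLLABUS_WEIGHTS = {
    "HDFS": 38,
    "MapReduce": 10,
    "Apache Hadoop Cluster Planning": 12,
    "Apache Hadoop Cluster Installation and Administration": 17,
    "Resource Management": 6,
    "Monitoring and Logging": 12,
    "Ecosystem": 5,
}


def questions_needed_to_pass(total, pass_mark_percent):
    """Smallest number of correct answers that reaches the pass mark."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (total * pass_mark_percent + 99) // 100


def approximate_questions_per_topic(total, weights):
    """Estimate question counts, assuming weights map directly to questions."""
    return {topic: round(total * pct / 100) for topic, pct in weights.items()}


if __name__ == "__main__":
    needed = questions_needed_to_pass(TOTAL_QUESTIONS, PASS_MARK_PERCENT)
    print(f"Correct answers needed to pass: {needed}")
    estimates = approximate_questions_per_topic(TOTAL_QUESTIONS, SYLLABUS_WEIGHTS)
    for topic, count in estimates.items():
        print(f"{topic}: about {count} questions")
```

Under that assumption HDFS alone accounts for well over a third of the paper, so it is the obvious place to spend most of your revision time.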


The other advantage of the training course is that you have the opportunity to build a working cluster across multiple machines (as opposed to pseudo-distributed mode, where all Hadoop services/daemons run on one machine) and can ask the trainer questions (admittedly you could ask such questions on the Hadoop mailing lists). Last of all, the course material unsurprisingly covers everything you need to know for the exam and more. I felt that the material helped in answering some questions that I would not necessarily have been able to answer by just reading the books.


Comments

  1. Thanks for the information on this, Vijay. I'm looking forward to taking this exam after preparing for at least 2-3 months by running some machines in my lab. I'm a Windows system administrator and have little experience with Linux administration. How tough do you think it'll be for me to prepare? Apart from the books, did you have any kind of exam prep guide to test your knowledge? Going through Hadoop Operations, I was expecting each chapter to have a small quiz section, which it doesn't.

    Can you please point to some guides, apart from the books and documentation mentioned? Thanks in advance.

  2. Hi Vijay, congrats on your success. I am planning to prepare for this certification. Could you share your email address or drop me a message at ssharma2@babson.edu? Also, are you working in the US or looking for a job in Big Data in the US?

  3. Hi Vijay,

    I would like to discuss Hadoop with you.

    Could you please send an email to one of my addresses:

    tejasvi_kt@yahoo.co.in
    tejaswi.kt@gmail.com

  4. Sunny, sorry for the late response; I haven't had a chance lately to keep the blog up to date. Since I wrote this post, a large number of books on Hadoop have become available. The questions in the exam are focused on testing your understanding of how Hadoop works. They don't test you on specifics of Linux (that wouldn't be very fair), so I don't think being familiar with Linux is a prerequisite. Cloudera themselves now provide sample questions to help people prepare: http://www.cloudera.com/content/cloudera/en/training/certification/ccah/practice.html. This wasn't available when I took the test so I don't know how useful it is - it's also not free.

  5. Hi, I am planning to take the CCAH exam by Feb and I have a question. I have prepared by installing Hadoop (tar.gz) and reading "Hadoop Operations" and "Hadoop: The Definitive Guide". Is this enough, or do I need to install CDH4 and get used to it?

  6. Muthu, the books are generally enough to pass the exam, but if you're serious about understanding Hadoop I would recommend that you install it and play around with it. I personally find that reading about things in books is good, but I tend to understand things better if I put them into practice.

