How to Determine the Location on HDFS of a Hive Database or Table
When troubleshooting an issue or making changes, you may need to determine the location on HDFS used by a particular database or table. The steps below show how to find this information.
Finding the location on HDFS for a database
1. If your cluster is kerberised, run `kinit` first, then start beeline
2. Run the command `describe database extended MyDatabaseName;`. This will produce output similar to the example below.
```
describe database extended text_analysis;
INFO : OK
+---------------+----------+--------------------------------------------------------------------------+-------------+-------------+-------------+--+
| db_name | comment | location | owner_name | owner_type | parameters |
+---------------+----------+--------------------------------------------------------------------------+-------------+-------------+-------------+--+
| text_analysis | | hdfs://node-0005.mydomain.net:8020/Data/hive/warehouse/text_analysis.db | admin | USER | |
+---------------+----------+--------------------------------------------------------------------------+-------------+-------------+-------------+--+
1 row selected (1.5 seconds)
```
3. In the example above we can see the location on HDFS where the database named text_analysis resides.
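When you need this in a script rather than interactively, the location column can be pulled out of beeline's tabular output with a small filter. This is a minimal sketch: the sample line below mirrors the example output, the field number assumes beeline's default `|`-delimited table format, and in practice you would pipe real output into the filter (for example `beeline --silent=true -e "describe database extended text_analysis;" | awk ...`).

```shell
# The location is the 4th '|'-separated field of the data row
# ($1 is the empty string before the leading '|').
sample='| text_analysis |  | hdfs://node-0005.mydomain.net:8020/Data/hive/warehouse/text_analysis.db | admin | USER |  |'

# Strip the padding spaces from the field before printing it.
location=$(printf '%s\n' "$sample" | awk -F'|' '{gsub(/ /, "", $4); print $4}')
echo "$location"
```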
Finding the location on HDFS for a table
1. If your cluster is kerberised, run `kinit` first, then start beeline
2. Select the database by running `use MyDatabaseName;`
3. Run the command `describe formatted MyTableName;`. An example is shown below.
```
describe formatted t_fuzzymatching_results;
+-------------------------------+-----------------------------------------------------------------------------------------------------------------+-----------------------+--+
| col_name | data_type | comment |
+-------------------------------+-----------------------------------------------------------------------------------------------------------------+-----------------------+--+
| # col_name | data_type | comment |
| | NULL | NULL |
| item_name | varchar(200) | |
| item_code | varchar(200) | |
<removed for brevity>
| # Detailed Table Information | NULL | NULL |
| Database: | text_analysis | NULL |
| Owner: | jbloggs | NULL |
| CreateTime: | Wed Oct 26 12:10:30 UTC 2016 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Protect Mode: | None | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://node-0005.mydomain.net:8020/Data/hive/warehouse/text_analysis.db/t_fuzzymatching_results | NULL |
| Table Type: | MANAGED_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | COLUMN_STATS_ACCURATE | true |
| | numFiles | 5 |
| | orc.compress | SNAPPY |
| | totalSize | 3997148 |
| | transient_lastDdlTime | 1477483844 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.ql.io.orc.OrcSerde | NULL |
| InputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | serialization.format | 1 |
+-------------------------------+-----------------------------------------------------------------------------------------------------------------+-----------------------+--+
```
4. From the example above we can see that:
* The location of the table on HDFS is shown in the Location row under # Detailed Table Information
* Table Type shows whether this is a Hive internal (managed) table or an external table
* The InputFormat and OutputFormat rows show the data format used by the table
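The Location row can likewise be extracted in a script. A minimal sketch, with the same assumptions as above: the sample line mirrors the example output, and real use would pipe beeline output into the filter (for example `beeline --silent=true -e "describe formatted t_fuzzymatching_results;" | awk ...`). With the location in hand, `hdfs dfs -ls` on it lists the table's files.

```shell
# Match the row whose label field is 'Location:' and print its value field.
sample='| Location: | hdfs://node-0005.mydomain.net:8020/Data/hive/warehouse/text_analysis.db/t_fuzzymatching_results | NULL |'

# Field 2 is the label column, field 3 the value column; strip padding spaces.
table_location=$(printf '%s\n' "$sample" | awk -F'|' '$2 ~ /Location:/ {gsub(/ /, "", $3); print $3}')
echo "$table_location"

# Then, on the cluster:  hdfs dfs -ls "$table_location"
```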