
HDPV2 - Services Review
A review of all services running. This blog discusses all the services / Java processes running or configured so far as part of the Hadoop V2 configuration. First let me list out all...
A journey from Database Administrator to Data Administrator.
In this blog I discuss how to configure the Fair Scheduler. The Fair Scheduler is also one of the schedulers used in production environments. In my words it is fairer than...
In this blog I discuss how to do fair scheduler configuration for Hadoop 2. I will design queues and capacity as per the diagram below. (All the detailed configuration is present...
In this blog I will demonstrate how to import data using Sqoop from Oracle to HDFS. If you have followed my last blog, you have your Sqoop installation ready. Step 1...
Symptoms - Sqoop job fails when importing in Avro format. Container logs show org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: org.apache.avro.reflect.ReflectData.addLogicalTypeConversion(Lorg/apache/avro/Conversion;)V Solution - Run the job by adding arguments to...
In this blog I will discuss how to track down container launch issues when you are running your jobs / applications. Looking at the error trace below from an MR...
In this blog I discuss Sqoop deployment. Sqoop stands for SQL to Hadoop. Sqoop is a tool which can import / export data from an RDBMS. Sqoop - Comes bundled with special...
In this blog I discuss my configuration of automatic failover using QJM. This is a continuation of my previous QJM blog on manual configuration. Automatic failover is configured using ZKFC -...
In this blog I discuss usage of the haadmin command. I have already set up a manual HA configuration using QJM in my last blog. The haadmin command is supported for failing over, switching...
In this blog I discuss setting up Namenode High Availability using QJM (Quorum Journal Manager). Functioning of QJM: 1. 3 (or 5, or another odd number of) nodes running QJM. 2. NN writes edit logs...
In this short blog I will discuss the getconf class of hdfs. This class lists the configuration as given in the configuration file. It can be used to get details on...
In this blog I will discuss Safe Mode in the Namenode. Safe Mode is a special mode of Hadoop which is a read-only mode (no client connections still) and no changes...
This blog covers details of the Namenode and Namenode-related parameters. Checkpoint Frequency: Checkpointing can be configured by setting - dfs.namenode.checkpoint.period. This parameter controls the time between 2 checkpoints. - dfs.namenode.checkpoint.txns. This...
In this blog I discuss setup of HttpFS in Hadoop. In Hadoop, HttpFS: 1. Acts as a proxy server catering to REST requests. 2. Acts as a single point of contact for all...
In this blog I discuss how to set up WebHDFS. Set the property below on the namenode in hdfs-site.xml: <property> <name>dfs.webhdfs.enabled</name> <value>true</value> </property> Distribute this to all nodes. This will need a bounce of all Datanodes...
FSCK is one of the key commands used for monitoring HDFS. FSCK: 1. Similar to Linux fsck, finds block corruptions and issues with the file system. 2. Does...
In this blog I discuss the HDFS snapshot feature. HDFS Snapshot - 1. Feature to take snapshots of a directory to prevent errors. 2. It is used to query old versions of data. 3....
In this blog I discuss the HDFS Trash feature. Trash is a feature provided by HDFS, similar to the Recycle Bin in Windows. However, there are a few differences / changes: 1. It...