My earlier notes on setting up Hadoop were short on detail in many steps, and several of the tuning options in the configuration files were never really explained, so this is a rewrite.
Update 2012.06.22: verified compatible with Hadoop up to 1.0.3.
0. Passwordless SSH login
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
echo "StrictHostKeyChecking no" >> ~/.ssh/config
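A quick sanity check that key-based login works (localhost here stands in for any node the public key was copied to):

# Should log in and return without asking for a password
ssh localhost true && echo "passwordless ssh OK"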
1. Install JDK 7

# Download, unpack, and install
wget http://download.oracle.com/otn-pub/java/jdk/7u2-b13/jdk-7u2-linux-i586.tar.gz
tar -xzf jdk-7u2-linux-i586.tar.gz
mv ./jdk1.7.0_02 ~/jdk

# Set the JAVA_HOME environment variable
vim ~/.bashrc
export JAVA_HOME=/home/hadoop/jdk/
export JAVA_BIN=/home/hadoop/jdk/bin
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
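After reloading the profile, a quick check that this JDK is actually the one on the PATH:

source ~/.bashrc
java -version    # should report java version "1.7.0_02"
which java       # should point into /home/hadoop/jdk/bin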
2. Install Hadoop (1.0.3)

# Download and unpack Hadoop
wget http://labs.mop.com/apache-mirror/hadoop/common/hadoop-1.0.3/hadoop-1.0.3-bin.tar.gz
tar -xzvf hadoop-1.0.3-bin.tar.gz
mv ./hadoop-1.0.3 ~/hadoop_home

# Create the runtime directories
cd ~/hadoop_home
mkdir var
cd var
mkdir tmp mapred hdfs
cd hdfs
mkdir name data

# Set JAVA_HOME in hadoop-env.sh
cd ~/hadoop_home/conf/
vim ./hadoop-env.sh
export JAVA_HOME=/home/hadoop/jdk/
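A quick check that the unpacked distribution runs against the configured JDK:

~/hadoop_home/bin/hadoop version    # should report Hadoop 1.0.3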
Update: mind the permissions. In newer releases, all HDFS directories must be 755, not 775.
chmod 755 data name
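To verify the permissions took effect (run inside ~/hadoop_home/var/hdfs):

ls -ld data name    # both should show drwxr-xr-x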
3. Prepare environment variables

The main one is HADOOP_HOME; since 1.0, HADOOP_HOME_WARN_SUPPRESS is also needed, to silence the "$HADOOP_HOME is deprecated" warning.
export HADOOP_HOME=/home/hadoop/hadoop_home/
export HADOOP_HOME_WARN_SUPPRESS=1
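Assuming these lines went into ~/.bashrc like the JDK variables above, they can be verified with:

source ~/.bashrc
echo $HADOOP_HOME    # should print /home/hadoop/hadoop_home/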
4. Configure hosts (Linux and Hadoop)
# Edit the hosts file on every node
sudo vim /etc/hosts

#Hosts for hadoop
10.70.0.101 hadoop1
10.70.0.102 hadoop2
......

# Configure masters and slaves
cd ~/hadoop_home/conf
vim masters
hadoop1
vim slaves
hadoop1
hadoop2
......
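A small loop to confirm that every hostname resolves and is reachable over passwordless ssh (extend the list with your actual nodes):

for h in hadoop1 hadoop2; do ssh "$h" hostname; done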
5. Configuration files:

Detailed parameter documentation: http://hadoop.apache.org/common/docs/current/cluster_setup.html
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop1:54310</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop_home/var/tmp</value>
  </property>
  <!-- The following settings trade memory for speed -->
  <property>
    <name>fs.inmemory.size.mb</name>
    <value>200</value>
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>100</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>200</value>
  </property>
</configuration>
hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/hadoop_home/var/hdfs/data</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/hadoop_home/var/hdfs/name</value>
  </property>
  <!-- 128 MB = 128 * 1024 * 1024 = 134217728 bytes -->
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
  </property>
  <!-- Parallel RPC handler threads for the namenode -->
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>40</value>
  </property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop1:54311</value>
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>20</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/hadoop/hadoop_home/var/mapred</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>12</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>6</value>
  </property>
  <!-- The following settings trade memory for speed -->
  <property>
    <name>mapred.map.child.java.opts</name>
    <value>-Xmx512M</value>
  </property>
  <property>
    <name>mapred.reduce.child.java.opts</name>
    <value>-Xmx512M</value>
  </property>
</configuration>
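These three files (plus hadoop-env.sh, masters, and slaves) must be identical on every node. One way to push them out, assuming the same directory layout on each slave (rsync and the host list are just an example):

for h in hadoop2; do    # add the rest of your slaves here
  rsync -a ~/hadoop_home/conf/ "$h":~/hadoop_home/conf/
done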
6. Format the namenode

cd ~/hadoop_home/bin
./hadoop namenode -format
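If the format succeeds, the name directory configured in hdfs-site.xml should now be populated:

ls ~/hadoop_home/var/hdfs/name    # should show a current/ subdirectory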
7. Start Hadoop

cd ~/hadoop_home/bin
./start-all.sh

# Check that everything came up
jps
7532 SecondaryNameNode
7346 NameNode
7433 DataNode
7605 JobTracker
7759 Jps
7701 TaskTracker
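As a smoke test, the examples jar bundled with the 1.0.3 tarball can run a tiny job (the pi arguments mean 2 maps with 10 samples each):

cd ~/hadoop_home
./bin/hadoop jar hadoop-examples-1.0.3.jar pi 2 10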
GUI: http://localhost:50030 (cluster / JobTracker)
GUI: http://hadoop1:50070 (HDFS / NameNode)
8. Other notes:
mapred.tasktracker.map.tasks.maximum: the maximum number of map tasks that may run concurrently on each node
mapred.tasktracker.reduce.tasks.maximum: the maximum number of reduce tasks that may run concurrently on each node
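One sanity check worth doing with the values above: a fully loaded node can run (12 + 6) × 512 MB = 9 GB of task JVM heap on top of the DataNode and TaskTracker daemons, so these slot counts should be sized to the node's actual RAM.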