構築中。まだメモの途中なので。
マシン構成
- themis ... namenode01
- maria ... datanode01
- pallas ... datanode02
update-alternatives
$ sudo cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.cluster
$ sudo update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.cluster 50
$ update-alternatives --display hadoop-0.20-conf
hadoop-0.20-conf - auto mode
リンクは現在 /etc/hadoop-0.20/conf.cluster を指しています
/etc/hadoop-0.20/conf.cluster - 優先度 50
/etc/hadoop-0.20/conf.empty - 優先度 10
/etc/hadoop-0.20/conf.pseudo - 優先度 30
現在の `最適' バージョンは /etc/hadoop-0.20/conf.cluster です
設定ファイルいろいろ
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://themis:54310</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/lib/hadoop-0.20/cache/${user.name}</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>true</value>
</property>
<property>
<name>dfs.hosts</name>
<value>${hadoop.tmp.dir}/hosts.include</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>${hadoop.tmp.dir}/hosts.exclude</value>
</property>
<property>
<name>dfs.http.address</name>
<value>themis:50070</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/var/lib/hadoop-0.20/cache/hadoop/dfs/name</value>
</property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>themis:54311</value>
</property>
<property>
<name>mapred.hosts</name>
<value>${hadoop.tmp.dir}/hosts.include</value>
</property>
<property>
<name>mapred.hosts.exclude</name>
<value>${hadoop.tmp.dir}/hosts.exclude</value>
</property>
</configuration>
hosts.include, hosts.excludeは作成しないとnamenode起動時に怒られる
masters
themis
slaves
maria pallas
ポートは全開
namenodeをフォーマット
hadoopユーザでフォーマット。
maki@themis:~$ sudo su -s /bin/bash - hadoop -c "hadoop namenode -format"
スタート
namenode
$ sudo /etc/init.d/hadoop-0.20-namenode start $ sudo /etc/init.d/hadoop-0.20-jobtracker start
datanode
$ sudo /etc/init.d/hadoop-0.20-datanode start $ sudo /etc/init.d/hadoop-0.20-tasktracker start
次のようなスクリプトを書いておくと便利
node.sh
#!/bin/sh
NAMENODE="themis"
DATANODE="maria pallas"
NODE="$NAMENODE $DATANODE"
sendAll.sh
#!/bin/sh
. ~/node.sh
for n in $NODE;
do
CMD="ssh $n $*"
echo "== $n =="
$CMD;
done
実行例
$ ./sendAll.sh hostname
== themis ==
themis
== maria ==
maria
== pallas ==
pallas
sync.sh
#!/bin/sh
. ~/node.sh
for n in $NODE;
do
if [ `hostname` != "$n" ];then
CMD="sudo rsync --progress -av /etc/hadoop/conf.cluster $n:/etc/hadoop/conf.cluster"
echo "== $n =="
echo $CMD
$CMD
fi
done
実行例(マスターを変更すれば全部同期する)
$ ./sync.sh
== maria ==
sudo rsync --progress -av /etc/hadoop/conf.cluster/capacity-scheduler.xml /etc/hadoop/conf.cluster/configuration.xsl /etc/hadoop/conf.cluster/core-site.xml /etc/hadoop/conf.cluster/core-site.xml~ /etc/hadoop/conf.cluster/fair-scheduler.xml /etc/hadoop/conf.cluster/hadoop-env.sh /etc/hadoop/conf.cluster/hadoop-metrics.properties /etc/hadoop/conf.cluster/hadoop-policy.xml /etc/hadoop/conf.cluster/hdfs-site.xml /etc/hadoop/conf.cluster/hdfs-site.xml~ /etc/hadoop/conf.cluster/log4j.properties /etc/hadoop/conf.cluster/mapred-site.xml /etc/hadoop/conf.cluster/mapred-site.xml~ /etc/hadoop/conf.cluster/masters /etc/hadoop/conf.cluster/masters~ /etc/hadoop/conf.cluster/slaves /etc/hadoop/conf.cluster/slaves~ /etc/hadoop/conf.cluster/ssl-client.xml.example /etc/hadoop/conf.cluster/ssl-server.xml.example maria:/etc/hadoop/conf.cluster
sending incremental file list
sent 382 bytes received 12 bytes 788.00 bytes/sec
total size is 24285 speedup is 61.64
== pallas ==
sudo rsync --progress -av /etc/hadoop/conf.cluster/capacity-scheduler.xml /etc/hadoop/conf.cluster/configuration.xsl /etc/hadoop/conf.cluster/core-site.xml /etc/hadoop/conf.cluster/core-site.xml~ /etc/hadoop/conf.cluster/fair-scheduler.xml /etc/hadoop/conf.cluster/hadoop-env.sh /etc/hadoop/conf.cluster/hadoop-metrics.properties /etc/hadoop/conf.cluster/hadoop-policy.xml /etc/hadoop/conf.cluster/hdfs-site.xml /etc/hadoop/conf.cluster/hdfs-site.xml~ /etc/hadoop/conf.cluster/log4j.properties /etc/hadoop/conf.cluster/mapred-site.xml /etc/hadoop/conf.cluster/mapred-site.xml~ /etc/hadoop/conf.cluster/masters /etc/hadoop/conf.cluster/masters~ /etc/hadoop/conf.cluster/slaves /etc/hadoop/conf.cluster/slaves~ /etc/hadoop/conf.cluster/ssl-client.xml.example /etc/hadoop/conf.cluster/ssl-server.xml.example pallas:/etc/hadoop/conf.cluster
sending incremental file list
sent 382 bytes received 12 bytes 262.67 bytes/sec
total size is 24285 speedup is 61.64
ここ読め