IK.AM

@making's tech note


Notes on Building a Fully Distributed Hadoop Environment

🗓 Updated at 2010-09-14T17:25:08Z  🗓 Created at 2010-09-14T17:25:08Z

Still under construction, so these notes are incomplete.

Machine configuration

  • themis ... namenode01
  • maria ... datanode01
  • pallas ... datanode02

update-alternatives

$ sudo cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.cluster
$ sudo update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.cluster 50
$ update-alternatives --display hadoop-0.20-conf
hadoop-0.20-conf - auto mode
 link currently points to /etc/hadoop-0.20/conf.cluster
/etc/hadoop-0.20/conf.cluster - priority 50
/etc/hadoop-0.20/conf.empty - priority 10
/etc/hadoop-0.20/conf.pseudo - priority 30
Current `best' version is /etc/hadoop-0.20/conf.cluster.
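
To confirm which directory the alternatives link finally resolves to, without reading the whole --display output, a small helper can follow the symlink chain. This is a minimal sketch, assuming GNU `readlink` with the `-f` option is installed; the function name `active_conf` is hypothetical.

```shell
#!/bin/sh
# active_conf LINK (hypothetical helper)
# Prints the path that a configuration symlink finally resolves to.
# Assumes GNU readlink, whose -f option follows every link in the chain.
active_conf() {
    readlink -f "$1"
}

# Example (not run here):
#   active_conf /etc/hadoop-0.20/conf
```

If the alternative was installed as shown above, the example call would be expected to print the conf.cluster directory.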

Configuration files

core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://themis:54310</value>
  </property>

  <property>
     <name>hadoop.tmp.dir</name>
     <value>/var/lib/hadoop-0.20/cache/${user.name}</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.hosts</name>
    <value>${hadoop.tmp.dir}/hosts.include</value>
  </property>
  <property>
    <name>dfs.hosts.exclude</name>
    <value>${hadoop.tmp.dir}/hosts.exclude</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>themis:50070</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/lib/hadoop-0.20/cache/hadoop/dfs/name</value>
  </property>
</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>themis:54311</value>
  </property>
  <property>
    <name>mapred.hosts</name>
    <value>${hadoop.tmp.dir}/hosts.include</value>
  </property>
  <property>
    <name>mapred.hosts.exclude</name>
    <value>${hadoop.tmp.dir}/hosts.exclude</value>
  </property>
</configuration>

hosts.include and hosts.exclude must be created; otherwise the namenode complains at startup.
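
Both files can be created ahead of the first start. The following is a minimal sketch, not a required procedure: the function name `make_host_lists` is hypothetical, and the directory is passed in as an argument because the real location depends on how `${hadoop.tmp.dir}` resolves (with the settings above and daemons running as the hadoop user, that is assumed to be /var/lib/hadoop-0.20/cache/hadoop).

```shell
#!/bin/sh
# make_host_lists DIR HOST... (hypothetical helper)
# Writes DIR/hosts.include with one host per line and creates an
# empty DIR/hosts.exclude if it does not exist yet.
make_host_lists() {
    dir=$1
    shift
    # At least one host is required for the include list.
    [ "$#" -gt 0 ] || return 1
    mkdir -p "$dir" || return 1
    # One permitted node per line.
    printf '%s\n' "$@" > "$dir/hosts.include"
    # An empty exclude file means no node is excluded.
    [ -e "$dir/hosts.exclude" ] || : > "$dir/hosts.exclude"
}

# Example (assumed path, not run here):
#   make_host_lists /var/lib/hadoop-0.20/cache/hadoop maria pallas
```

The files need to be readable by the user that runs the daemons, so running the helper as that user is probably the simplest choice.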

masters

themis

slaves

maria
pallas

All ports are left open.

Format the namenode

Format as the hadoop user.

maki@themis:~$ sudo su -s /bin/bash - hadoop -c "hadoop namenode -format"

Start

namenode

$ sudo /etc/init.d/hadoop-0.20-namenode start
$ sudo /etc/init.d/hadoop-0.20-jobtracker start

datanode

$ sudo /etc/init.d/hadoop-0.20-datanode start
$ sudo /etc/init.d/hadoop-0.20-tasktracker start
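
The per-role commands above can also be produced by one helper, which is convenient when they are passed to ssh or a loop. This is a minimal sketch that only prints the commands (a dry run); the function name `daemon_cmds` and the role names `master` and `worker` are hypothetical, while the init script names are the ones used above.

```shell
#!/bin/sh
# daemon_cmds ROLE [ACTION] (hypothetical helper)
# Prints, without running them, the init-script commands for a node role.
# ROLE is "master" (namenode + jobtracker) or "worker" (datanode + tasktracker).
# ACTION defaults to "start".
daemon_cmds() {
    role=$1
    action=${2:-start}
    case $role in
        master) daemons="namenode jobtracker" ;;
        worker) daemons="datanode tasktracker" ;;
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac
    for d in $daemons; do
        echo "sudo /etc/init.d/hadoop-0.20-$d $action"
    done
}
```

For example, `daemon_cmds worker start` prints the two datanode-side commands; the output can be reviewed first and then piped to `sh` on the target node.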

Scripts like the following are handy to have.

node.sh

#!/bin/sh
NAMENODE="themis"
DATANODE="maria pallas"
NODE="$NAMENODE $DATANODE"

sendAll.sh

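#!/bin/sh
# Run the given command on every node listed in node.sh.
. ~/node.sh

for n in $NODE
do
    echo "== $n =="
    # "$@" passes the arguments through as given instead of re-splitting a flattened string.
    ssh "$n" "$@"
done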

Example run

$ ./sendAll.sh hostname
== themis ==
themis
== maria ==
maria
== pallas ==
pallas

sync.sh

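#!/bin/sh
# Push the local conf.cluster files to every other node listed in node.sh.
. ~/node.sh

for n in $NODE
do
if [ "$(hostname)" != "$n" ]; then
    # The /* is expanded by the shell when $CMD is used unquoted below, so rsync
    # receives the individual files (without it, rsync would nest conf.cluster inside itself).
    CMD="sudo rsync --progress -av /etc/hadoop/conf.cluster/* $n:/etc/hadoop/conf.cluster"
    echo "== $n =="
    echo $CMD
    $CMD
fi
done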

Example run (edit the files on the master and they are synced to all the other nodes)

$ ./sync.sh
== maria ==
sudo rsync --progress -av /etc/hadoop/conf.cluster/capacity-scheduler.xml /etc/hadoop/conf.cluster/configuration.xsl /etc/hadoop/conf.cluster/core-site.xml /etc/hadoop/conf.cluster/core-site.xml~ /etc/hadoop/conf.cluster/fair-scheduler.xml /etc/hadoop/conf.cluster/hadoop-env.sh /etc/hadoop/conf.cluster/hadoop-metrics.properties /etc/hadoop/conf.cluster/hadoop-policy.xml /etc/hadoop/conf.cluster/hdfs-site.xml /etc/hadoop/conf.cluster/hdfs-site.xml~ /etc/hadoop/conf.cluster/log4j.properties /etc/hadoop/conf.cluster/mapred-site.xml /etc/hadoop/conf.cluster/mapred-site.xml~ /etc/hadoop/conf.cluster/masters /etc/hadoop/conf.cluster/masters~ /etc/hadoop/conf.cluster/slaves /etc/hadoop/conf.cluster/slaves~ /etc/hadoop/conf.cluster/ssl-client.xml.example /etc/hadoop/conf.cluster/ssl-server.xml.example maria:/etc/hadoop/conf.cluster
sending incremental file list

sent 382 bytes  received 12 bytes  788.00 bytes/sec
total size is 24285  speedup is 61.64
== pallas ==
sudo rsync --progress -av /etc/hadoop/conf.cluster/capacity-scheduler.xml /etc/hadoop/conf.cluster/configuration.xsl /etc/hadoop/conf.cluster/core-site.xml /etc/hadoop/conf.cluster/core-site.xml~ /etc/hadoop/conf.cluster/fair-scheduler.xml /etc/hadoop/conf.cluster/hadoop-env.sh /etc/hadoop/conf.cluster/hadoop-metrics.properties /etc/hadoop/conf.cluster/hadoop-policy.xml /etc/hadoop/conf.cluster/hdfs-site.xml /etc/hadoop/conf.cluster/hdfs-site.xml~ /etc/hadoop/conf.cluster/log4j.properties /etc/hadoop/conf.cluster/mapred-site.xml /etc/hadoop/conf.cluster/mapred-site.xml~ /etc/hadoop/conf.cluster/masters /etc/hadoop/conf.cluster/masters~ /etc/hadoop/conf.cluster/slaves /etc/hadoop/conf.cluster/slaves~ /etc/hadoop/conf.cluster/ssl-client.xml.example /etc/hadoop/conf.cluster/ssl-server.xml.example pallas:/etc/hadoop/conf.cluster
sending incremental file list

sent 382 bytes  received 12 bytes  262.67 bytes/sec
total size is 24285  speedup is 61.64
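
To check that every node ended up with the same files after a sync, one option is to compare a single digest per directory. This is a minimal sketch, assuming `md5sum` from GNU coreutils is installed on each node; the function name `conf_digest` is hypothetical.

```shell
#!/bin/sh
# conf_digest DIR (hypothetical helper)
# Prints one MD5 digest summarising the paths and contents of all
# regular files under DIR. Equal digests on two nodes suggest that
# the directories hold identical files.
conf_digest() {
    (
        cd "$1" || exit 1
        # Hash each file, sort by path in a fixed locale for a stable order,
        # then hash the resulting list.
        find . -type f -exec md5sum {} + | LC_ALL=C sort -k 2 | md5sum | cut -d ' ' -f 1
    )
}

# Example (not run here):
#   conf_digest /etc/hadoop/conf.cluster
```

Running the example on each node and comparing the printed values shows whether any node is out of date.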

Read this.

