This article documents the lab assignment for the Distributed Computing / Cloud Computing course.

In this lab we purchase Huawei Cloud ECS instances and provision the Object Storage Service (OBS), providing the environment that the later experiments build on. (Huawei issued vouchers that cover the cost.)

Lab workflow

The basic steps of this lab are: purchase and configure the ECS instances; provision OBS and obtain the AK/SK credentials; set up the Hadoop cluster; run MapReduce on the cluster and verify compute-storage separation.


Lab objectives

・Learn the steps for purchasing ECS instances on Huawei Cloud.

・Learn how to provision the Object Storage Service (OBS) on Huawei Cloud.

・Learn how to set up a Hadoop cluster.

・Learn how to implement compute-storage separation in a MapReduce experiment.

# Environment Preparation

Purchase four ECS servers.


Provision OBS and create a bucket.


Create an Access Key and Secret Key.


# Setting Up the Hadoop Cluster

Log in to the server over SSH from the command line

```
C:\Users\cy>ssh [email protected]
The authenticity of host '124.70.xxx.xxx (124.70.xxx.xxx)' can't be established.
ECDSA key fingerprint is SHA256:niKW2UdeZOz5jDXQSbmpHY1vFt******************.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '124.70.xxx.xxx' (ECDSA) to the list of known hosts.
[email protected]'s password:

Welcome to Huawei Cloud Service

[root@ecs-****-0002 ~]#
```

# Downloading the Required Software

Download Hadoop

```
[root@ecs-****-0002 ~]# wget https://archive.apache.org/dist/hadoop/common/hadoop-2.8.3/hadoop-2.8.3.tar.gz
--2022-12-30 12:10:51-- https://archive.apache.org/dist/hadoop/common/hadoop-2.8.3/hadoop-2.8.3.tar.gz
Resolving archive.apache.org (archive.apache.org)... 138.201.131.134, 2a01:4f8:172:2ec5::2
Connecting to archive.apache.org (archive.apache.org)|138.201.131.134|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 244469481 (233M) [application/x-gzip]
Saving to: ‘hadoop-2.8.3.tar.gz’

5% [===> ] 14,147,584 295KB/s eta 4m 21s
```

With only 5 Mbit/s of bandwidth this is really slow. Since the download was crawling, I switched to WinSCP: from here on, files are downloaded to the local machine first and then transferred to the server with WinSCP.

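As an alternative to WinSCP, the same transfer can be done with plain scp from the local machine. A minimal sketch, assuming the downloads sit in the current local directory and that 124.70.xxx.xxx is the node's elastic IP (both placeholders):

```bash
# Run on the local machine, not on the ECS node.
# Copies the Hadoop, OBSFileSystem and OpenJDK downloads to /root on the node.
scp hadoop-2.8.3.tar.gz \
    hadoop-huaweicloud-2.8.3-hw-39.jar \
    OpenJDK8U-jdk_aarch64_linux_hotspot_8u191b12.tar.gz \
    [email protected]:/root/
```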

Download OBSFileSystem

```bash
wget https://github.com/huaweicloud/obsa-hdfs/raw/master/release/hadoop-huaweicloud-2.8.3-hw-39.jar
```

Download OpenJDK

```bash
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u191-b12/OpenJDK8U-jdk_aarch64_linux_hotspot_8u191b12.tar.gz
```

Pitfall: SSH sessions to Huawei Cloud keep dropping automatically, so the `sshd` and `ssh` keep-alive settings need to be adjusted.

```bash
vim /etc/ssh/sshd_config
```

Find the following lines:

```
#ClientAliveInterval 0
#ClientAliveCountMax 3
```

Uncomment them and change them to:

```
ClientAliveInterval 30     # how often (in seconds) the server sends a keep-alive probe to the client
ClientAliveCountMax 86400  # after how many unanswered probes the server drops the connection
```

Press Esc, then type `:wq` to save and exit.

Restart the sshd service:

```bash
service sshd restart
```

Modify the ssh client configuration

```bash
vim /etc/ssh/ssh_config
```

Add these two lines:

```
ServerAliveInterval 20   # how often (in seconds) the client sends a keep-alive probe to the server
ServerAliveCountMax 999  # how many unanswered probes before the client considers the SSH connection dead and closes it
```

Save and exit, then reconnect over SSH.

Configure mutual trust between the nodes

Disable the firewall on node1

```bash
systemctl stop firewalld
systemctl disable firewalld
```

Configure SSH RSA key trust among node1-node4

On each node, generate an RSA key pair and print the public key:

```bash
ssh-keygen -t rsa
cat /root/.ssh/id_rsa.pub
```

Copy the text of all the public keys, collect it into a single file (nodes_keys), and then paste that content into /root/.ssh/authorized_keys on every node:

```bash
vim /root/.ssh/authorized_keys
```
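If you prefer to script this instead of pasting by hand, a minimal sketch using ssh-copy-id is shown below. It assumes the node names already resolve (see the /etc/hosts step that follows, or substitute IPs) and prompts once for each target's root password; the node names are placeholders:

```bash
# Run on every node after ssh-keygen; appends this node's public key to
# authorized_keys on all four nodes (including itself).
for host in ecs-xxxx-0001 ecs-xxxx-0002 ecs-xxxx-0003 ecs-xxxx-0004; do
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@${host}
done
```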

Then add each node's IP address and hostname to /etc/hosts on every node.
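A sketch of what the entries might look like (the private IPs and node names are placeholders; the node1-node4 aliases are optional but match the names used in the Hadoop configs below):

```
192.168.0.11  ecs-xxxx-0001  node1
192.168.0.12  ecs-xxxx-0002  node2
192.168.0.13  ecs-xxxx-0003  node3
192.168.0.14  ecs-xxxx-0004  node4
```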

```
[root@ecs-****-0001 .ssh]# ssh ecs-****-0002
The authenticity of host 'ecs-****-0002 (124.70.***.***)' can't be established.
ECDSA key fingerprint is SHA256:******************************************.
ECDSA key fingerprint is MD5:**:**:**:**:**:**:**:**:**:**:**:**:**:**:**:**.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ecs-****-0002,124.70.***.***' (ECDSA) to the list of known hosts.
Last failed login: Sat Dec 31 15:22:29 CST 2022 from 68.183.***.*** on ssh:notty
There were 5 failed login attempts since the last successful login.
Last login: Sat Dec 31 14:52:26 2022 from 223.90.***.***

Welcome to Huawei Cloud Service

[root@ecs-****-0002 ~]# exit
```

Make sure every node can log in to every other node without a password.

# Installing OpenJDK

Create directories

Run the following commands on every node:

```bash
mkdir -p /home/modules/data/buf/
mkdir -p /home/test_tools/
mkdir -p /home/nm/localdir
```

Copy the JDK archive

On node1, run the following to copy the JDK archive into /usr/lib/jvm:

```bash
cp OpenJDK8U-jdk_aarch64_linux_hotspot_8u191b12.tar.gz /usr/lib/jvm/
```

On node1, run the following to copy the JDK archive to the same directory on the other nodes:

```bash
for i in {2..4};do scp /usr/lib/jvm/OpenJDK8U-jdk_aarch64_linux_hotspot_8u191b12.tar.gz root@ecs-***-000${i}:/usr/lib/jvm/;done
```

Here root@ecs-****-000 is the node-name prefix.

```
[root@ecs-****-0001 ~]# for i in {2..4};do scp /usr/lib/jvm/OpenJDK8U-jdk_aarch64_linux_hotspot_8u191b12.tar.gz root@ecs-****-000${i}:/usr/lib/jvm/;done
OpenJDK8U-jdk_aarch64_linux_hotspot_8u191b12.tar.gz 100% 72MB 10.3MB/s 00:07
OpenJDK8U-jdk_aarch64_linux_hotspot_8u191b12.tar.gz 100% 72MB 1.2MB/s 01:02
OpenJDK8U-jdk_aarch64_linux_hotspot_8u191b12.tar.gz 100% 72MB 110.9MB/s 00:00
```

Extract the archive

Run the following on all four nodes to extract the archive:

```bash
cd /usr/lib/jvm/
tar zxvf OpenJDK8U-jdk_aarch64_linux_hotspot_8u191b12.tar.gz
```

Update the configuration

On all four nodes, append export JAVA_HOME=/usr/lib/jvm/jdk8u191-b12 to the end of /etc/profile.

```bash
vim /etc/profile
```
```bash
****************
for i in /etc/profile.d/*.sh /etc/profile.d/sh.local ; do
    if [ -r "$i" ]; then
        if [ "${-#*i}" != "$-" ]; then
            . "$i"
        else
            . "$i" >/dev/null
        fi
    fi
done

unset i
unset -f pathmunge

export JAVA_HOME=/usr/lib/jvm/jdk8u191-b12   # add this line
```

Verify the Java version

```bash
source /etc/profile
java -version
```
```
[root@ecs-****-0001 jvm]# java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-b09)
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)
[root@ecs-****-0001 jvm]#
```

(Note: this still reports 1.8.0_232, which is likely a pre-installed system OpenJDK; the downloaded 8u191 build only takes precedence on the PATH after $JAVA_HOME/bin is prepended to PATH in the environment-variable step later on.)

# Setting Up the Hadoop Cluster

Extract the Hadoop archive on node1

```bash
cd /root
cp hadoop-2.8.3.tar.gz /home/modules/
cd /home/modules/
tar zxvf hadoop-2.8.3.tar.gz
```

Configure the Hadoop environment variables

```bash
vim /home/modules/hadoop-2.8.3/etc/hadoop/hadoop-env.sh
```

Append at the end:

```bash
export JAVA_HOME=/usr/lib/jvm/jdk8u191-b12
```
```bash
# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by
# the user that will run the hadoop daemons. Otherwise there is the
# potential for a symlink attack.
export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}

# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER
export JAVA_HOME=/usr/lib/jvm/jdk8u191-b12   # add this line
```

XML configuration files

Edit the Hadoop core-site.xml configuration file

```bash
vim /home/modules/hadoop-2.8.3/etc/hadoop/core-site.xml
```

Since pasting into vim is cumbersome, I used WinSCP's built-in text editor to edit the file instead.

Update fs.obs.access.key, fs.obs.secret.key, and fs.obs.endpoint in it.

The first two correspond to the Access Key Id and Secret Access Key fields in the credentials.csv downloaded earlier.

The endpoint can be found in the Huawei Cloud console under Object Storage Service - Overview.


```xml
<configuration>
<property>
<name>fs.obs.readahead.inputstream.enabled</name>
<value>true</value>
</property>
<property>
<name>fs.obs.buffer.max.range</name>
<value>6291456</value>
</property>
<property>
<name>fs.obs.buffer.part.size</name>
<value>2097152</value>
</property>
<property>
<name>fs.obs.threads.read.core</name>
<value>500</value>
</property>
<property>
<name>fs.obs.threads.read.max</name>
<value>1000</value>
</property>
<property>
<name>fs.obs.write.buffer.size</name>
<value>8192</value>
</property>
<property>
<name>fs.obs.read.buffer.size</name>
<value>8192</value>
</property>
<property>
<name>fs.obs.connection.maximum</name>
<value>1000</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://****:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/modules/hadoop-2.8.3/tmp</value>
</property>
<property>
<name>fs.obs.access.key</name>
<value>***********</value>
</property>
<property>
<name>fs.obs.secret.key</name>
<value>*************************</value>
</property>
<property>
<name>fs.obs.endpoint</name>
<value>obs.cn-north-4.myhuaweicloud.com:5080</value>
</property>
<property>
<name>fs.obs.buffer.dir</name>
<value>/home/modules/data/buf</value>
</property>
<property>
<name>fs.obs.impl</name>
<value>org.apache.hadoop.fs.obs.OBSFileSystem</value>
</property>
<property>
<name>fs.obs.connection.ssl.enabled</name>
<value>false</value>
</property>
<property>
<name>fs.obs.fast.upload</name>
<value>true</value>
</property>
<property>
<name>fs.obs.socket.send.buffer</name>
<value>65536</value>
</property>
<property>
<name>fs.obs.socket.recv.buffer</name>
<value>65536</value>
</property>
<property>
<name>fs.obs.max.total.tasks</name>
<value>20</value>
</property>
<property>
<name>fs.obs.threads.max</name>
<value>20</value>
</property>
</configuration>
```

Configure hdfs-site.xml

```xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node1:50090</value>
</property>
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>node1:50091</value>
</property>
</configuration>
```

Configure yarn-site.xml

Here node1 stands for the actual node name.

```xml
<configuration>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/nm/localdir</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>28672</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>3072</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>28672</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>38</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>38</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node1</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>106800</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
<description>Ratio between virtual memory to physical memory when setting memory limits for
containers</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://node1:19888/jobhistory/logs</value>
</property>
</configuration>
```

Configure mapred-site.xml

Rename mapred-site.xml.template:

```bash
cd /home/modules/hadoop-2.8.3/etc/hadoop/
mv mapred-site.xml.template mapred-site.xml
```

Edit mapred-site.xml; again, node1 stands for the actual node name.

```xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>node1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>node1:19888</value>
</property>
<property>
<name>mapred.task.timeout</name>
<value>1800000</value>
</property>
</configuration>
```

Configure slaves

```bash
vim /home/modules/hadoop-2.8.3/etc/hadoop/slaves
```

Change its contents to:

```
ecs-****-0002
ecs-****-0003
ecs-****-0004
```

Install the OBSFileSystem jar

```bash
cd /root
```

Upload the jar to the server with WinSCP, then run the following commands to copy it into the relevant directories:

```bash
cp hadoop-huaweicloud-2.8.3-hw-39.jar /home/modules/hadoop-2.8.3/share/hadoop/common/lib/
cp hadoop-huaweicloud-2.8.3-hw-39.jar /home/modules/hadoop-2.8.3/share/hadoop/tools/lib/
cp hadoop-huaweicloud-2.8.3-hw-39.jar /home/modules/hadoop-2.8.3/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/
cp hadoop-huaweicloud-2.8.3-hw-39.jar /home/modules/hadoop-2.8.3/share/hadoop/hdfs/lib/
```

Distribute the Hadoop package to the other nodes

Run the following on node1 to distribute the package to the other nodes:

```bash
for i in {2..4};do scp -r /home/modules/hadoop-2.8.3 root@ecs-****-000${i}:/home/modules/;done
```

Another long wait...

Configure environment variables

Add the following to /etc/profile on every node:

```bash
export HADOOP_HOME=/home/modules/hadoop-2.8.3
export PATH=$JAVA_HOME/bin:$PATH
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_CLASSPATH=/home/modules/hadoop-2.8.3/share/hadoop/tools/lib/*:$HADOOP_CLASSPATH
```

Then run source /etc/profile on every node to apply the changes.
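A quick sanity check that the variables took effect (output not shown; this is only a suggested check):

```bash
# Should print the Hadoop 2.8.3 version banner and the install path.
hadoop version
echo $HADOOP_HOME
```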

Format the NameNode

Run hdfs namenode -format on node1 to initialize it:

```
........................................
22/12/31 17:52:29 INFO common.Storage: Storage directory /home/modules/hadoop-2.8.3/tmp/dfs/name has been successfully formatted.
22/12/31 17:52:29 INFO namenode.FSImageFormatProtobuf: Saving image file /home/modules/hadoop-2.8.3/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
22/12/31 17:52:30 INFO namenode.FSImageFormatProtobuf: Image file /home/modules/hadoop-2.8.3/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds.
22/12/31 17:52:30 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
22/12/31 17:52:30 INFO util.ExitUtil: Exiting with status 0
22/12/31 17:52:30 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/
[root@ecs-****-0001 jvm]#
```

Run start-dfs.sh on node1 to start HDFS.

A warning message appears:

```
22/12/31 18:07:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
```

The native-hadoop library is missing; in practice this warning should have no impact.
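To confirm the daemons actually came up, checking with jps is a reasonable extra step (a sketch; the worker node name is a placeholder, and the expected process list depends on each node's role):

```bash
# On node1 (master): expect NameNode and SecondaryNameNode.
jps
# On a worker node: expect DataNode. The full path is used because a
# non-interactive ssh session may not have sourced /etc/profile.
ssh ecs-xxxx-0002 /usr/lib/jvm/jdk8u191-b12/bin/jps
```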

Continue with some hdfs commands:

```bash
hdfs dfs -mkdir /bigdata
hdfs dfs -ls /
```

```
[root@ecs-****-0001 ~]# hdfs dfs -mkdir /bigdata
22/12/31 18:19:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@ecs-****-0001 ~]# hdfs dfs -ls /
22/12/31 18:19:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x - root supergroup 0 2022-12-31 18:18 /bigdata
```

# Testing Interoperation with OBS

Upload a file to OBS

In the Huawei Cloud console, go to Object Storage Service (OBS) and upload an object.


Run an hdfs command to list the OBS files

```bash
# obs-ecs is the actual bucket name
hdfs dfs -ls obs://obs-ecs/
```
```
.........................................
2022-12-31 18:25:03 816|com.obs.services.ObsClient|doActionWithResult|2824|Storage|1|HTTP+XML|listObjects||||2022-12-31 18:25:03|2022-12-31 18:25:03|||0|
2022-12-31 18:25:03 816|com.obs.services.ObsClient|doActionWithResult|2827|ObsClient [listObjects] cost 21 ms

Found 1 items
-rw-rw-rw- 1 root root 118 2022-12-31 18:24 obs://obs-ecs/palyerinfo.txt
[root@ecs-****-0001 ~]#
```

The listing shows -rw-rw-rw- 1 root root 118 2022-12-31 18:24 obs://obs-ecs/palyerinfo.txt, so the Hadoop cluster is successfully connected to OBS.
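Writing from the cluster into the bucket should also work through the connector; a small hedged check (the target file name is a placeholder, not part of the original experiment):

```bash
# Upload a local file into the OBS bucket via the OBSFileSystem connector,
# then list it back.
hdfs dfs -put /etc/hosts obs://obs-ecs/test_upload.txt
hdfs dfs -ls obs://obs-ecs/
```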

# MapReduce Program Experiment

This experiment uses MapReduce to verify compute-storage separation: the data resides in OBS while the computation runs on the ECS cluster. It demonstrates how to operate a big-data setup with storage and compute decoupled.

# Testing the Hadoop Cluster

Start YARN on node1

Run start-yarn.sh:

```
[root@ecs-****-0001 ~]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/modules/hadoop-2.8.3/logs/yarn-root-resourcemanager-ecs-****-0001.out
ecs-****-0002: starting nodemanager, logging to /home/modules/hadoop-2.8.3/logs/yarn-root-nodemanager-ecs-****-0002.out
ecs-****-0004: starting nodemanager, logging to /home/modules/hadoop-2.8.3/logs/yarn-root-nodemanager-ecs-****-0004.out
ecs-****-0003: starting nodemanager, logging to /home/modules/hadoop-2.8.3/logs/yarn-root-nodemanager-ecs-****-0003.out
[root@ecs-****-0001 ~]#
```

Test file

The palyerinfo.txt uploaded earlier contains:

```
Alex James Lax Genu
Kerry Mary Olivia William
Hale Edith Vera Robert
Mary Olivia James Lax
Edith Vera Robertm Genu
```

Run hdfs dfs -cat obs://obs-ecs/palyerinfo.txt to print the file's contents:

```
..............................
2022-12-31 18:41:21 153|com.obs.services.ObsClient|doActionWithResult|2827|ObsClient [getObject] cost 44 ms

Alex,James,Lax,Genu
Kerry,Mary,Olivia,William
Hale,Edith,Vera,Robert
Mary,Olivia,James,Lax
Edith,Vera,Robertm,Genu[root@ecs-****-0001 ~]#
```

Run the hadoop wordcount example to count word frequencies:

```bash
hadoop jar /home/modules/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.3.jar wordcount obs://obs-ecs/ /output
```

The following error appeared: `There are 0 datanode(s) running and no node(s) are excluded in this operation.`
```
.................................
22/12/31 19:20:15 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hadoop-yarn/staging/root/.staging/job_1672483928513_0004/job.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1726)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2561)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:829)
................................
```

This has not been resolved yet; some possible starting points for diagnosing it are sketched below.
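These are generic Hadoop checks, not a confirmed fix for this cluster; the worker node name is a placeholder and the log path assumes the default log directory under the Hadoop installation:

```bash
# Does the NameNode see any live DataNodes?
hdfs dfsadmin -report

# Is the DataNode process running on a worker node?
ssh ecs-xxxx-0002 /usr/lib/jvm/jdk8u191-b12/bin/jps

# If not, the DataNode log usually explains why; one common cause after
# re-running "hdfs namenode -format" is a clusterID mismatch between the
# NameNode and the DataNodes' storage directories.
ssh ecs-xxxx-0002 "tail -n 50 /home/modules/hadoop-2.8.3/logs/hadoop-root-datanode-*.log"
```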