---
title: pgvector, PostgresMLに対応したVMware Greenplum® 7をvSphere上のRocky Linux VMにインストールするメモ
tags: ["Greenplum", "Tanzu", "PostgreSQL", "Rocky", "pgvector", "PostgresML", "vSphere", "Machine Learning"]
categories: ["Middleware", "RDBMS", "Greenplum"]
date: 2023-10-02T06:52:05Z
updated: 2023-10-31T08:01:23Z
---

[VMware Greenplum®](https://network.tanzu.vmware.com/products/vmware-greenplum) 7がリリースされました。[pgvector](https://github.com/pgvector/pgvector) や[PostgresML](https://postgresml.org/)に対応し、AI対応が強化されています。<br>
VMware Greenplum® 7を vSphere上にインストールしてみます。<br>

Coordinator 1台 (`gp-coordinator`) と Segment 2台 (`gp-segment1`, `gp-segment2`) の構成を作ります。本記事ではVMware Greenplum® 7.0.0をインストールします。

基本的には [インストールガイド](https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/install_guide-install_guide.html) にしたがって作業しましたが、以下の手順はとりあえず動作させるための設定しか行っておらず、推奨設定を全て実施しているわけではありません。
自己責任でご参照ください。

**目次**
<!-- toc -->

### Rocky Linux VMの作成

LinuxのDistroには [サポートされている](https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/install_guide-platform-requirements-overview.html#operating-systems) Rocky Linux 8を使用します。

ISOファイルからVMを作成します。

https://download.rockylinux.org/pub/rocky/8/isos/x86_64/Rocky-8.8-x86_64-minimal.iso のISOファイルをコンテンツライブラリに保存しておきます。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/08b75d45-aca7-4603-b889-cd86f762fa48">

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/acc05cd8-e17a-4fc4-b78c-eba481fee3bd">

VMを新規作成します。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/7498b51f-c662-4c82-8b41-d0a7b59d0969">

名前は`gp-coordinator`にします。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/bd070f85-3a72-438b-883b-4d8f96547e94">

以下、適当に選択します。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/6b8270d7-3dd8-42cc-b58c-ae4bf7213337">

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/70037fd9-3608-4846-8c10-c9cd6bafecf2">

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/6c0320bd-eff0-4db2-bc31-73e90de43859">

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/38bc4633-35b2-4694-b269-65cf37791aa8">

CPUは8、メモリは16GBにしましたが、もっと小さくても動作すると思います。CD/DVDドライブにコンテンツライブラリを指定し、ISOイメージをマウントします。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/6a1750b3-16f0-445d-8b8b-5ecd685a94b4">

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/ee28ea81-458a-4895-892a-443273352995">

VMを作成します。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/5371e224-f77f-40b3-b9ed-52729fc7f2ca">

パワーオンします。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/770be933-aece-4915-9171-0cc4d28fe9bc">

コンソールを起動します。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/9f45ae4e-e48e-4735-b0af-c4fea73648ae">


<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/24fca66f-99c0-4c6e-abef-f20bf92b6281">

"Install Rocky Linux 8.8" を選択します。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/06d9eda0-f2aa-432b-9260-54f71ae05322">

Englishを選択。


<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/bc24199e-49c5-4c4d-ad9b-106f435037d5">


<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/64fb91ff-50ce-48f0-8555-ad773be6caac">


Networkを選択し、ONにします。IPアドレス(ここでは10.220.46.50)を控えておきます。Hostnameも`gp-coordinator`にします。"Done"をクリックします。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/1f4d13f3-25d0-4c48-914e-0b09db8aacd7">

rootユーザーのパスワードを設定し、"Done"をクリックします。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/a0f9dc9d-f5a3-4aeb-82aa-e92b985e4924">

(オプショナルで)Timezoneを選択し、"Done"をクリックします。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/0c614f95-a492-4ce7-8fe1-cd2754a086d7">

Installation Destinationはデフォルトのまま、"Done"をクリックします。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/6d66443f-dea3-452b-ad8d-d88e52392a8a">

"Begin Installation"をクリックします。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/a869f06f-857f-41ec-b241-611bf686a179">

完了したら"Reboot System"をクリックします。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/cb2a8bb1-e946-4ef3-b83c-8745982df88a">

ターミナルを開き、控えたIPアドレスに対してrootユーザーでsshログインします。

```
ssh root@10.220.46.50
```

### OSの初期セットアップ

`gp-coordinator` VM上で次のコマンドを実行します

```bash
sudo yum -y install epel-release
sudo yum -y config-manager --set-enabled powertools
sudo yum -y install apr \
apr-util \
bash \
bzip2 \
curl \
lsof \
cmake \
bind-utils \
krb5-libs \
libcgroup-tools \
libcurl \
libevent \
libxml2 \
libyaml \
zlib \
openldap \
openssh-clients \
openssh-server \
openssl \
openssl-libs \
sshpass \
perl \
python39 \
readline \
rsync \
R \
sed \
tar \
zip \
apr \
apr-util \
libyaml \
libevent \
java-11-openjdk-devel

echo 2 | sudo update-alternatives --config java

cat <<EOF | sudo tee -a /etc/sysctl.d/99-sysctl.conf > /dev/null
net.ipv4.ip_local_reserved_ports=65330
EOF
sudo sysctl --system

cat <<EOF | sudo tee -a /etc/security/limits.conf > /dev/null
* soft nofile 65536
* hard nofile 65536
EOF

```

### VMware Greenplum® のダウンロード

VMware Greenplum® のrpmをダウンロードします。ここでは [`pivnet`](https://github.com/pivotal-cf/pivnet-cli) CLIを使用します。API Tokenは [こちら](https://network.tanzu.vmware.com/users/dashboard/edit-profile) から取得できます。


```bash
mkdir -p ~/Downloads/
cd ~/Downloads

curl -sL https://github.com/pivotal-cf/pivnet-cli/releases/download/v3.0.1/pivnet-linux-amd64-3.0.1 > pivnet
chmod +x pivnet
sudo mv pivnet /usr/local/bin/
pivnet login --api-token=*********************-r
pivnet download-product-files --product-slug='vmware-greenplum' --release-version='7.0.0' --glob='*.rpm'
pivnet download-product-files --product-slug='vmware-greenplum' --release-version='7.0.0' --glob='*.gppkg'
pivnet download-product-files --product-slug='vmware-greenplum' --release-version='7.0.0' --glob='*.tar.gz'
```

### VMware Greenplum®のインストール

`gpadmin`ユーザーを作ってrpmをダウンロードしたインストールします。

`gp-coordinator`,`gp-segment1`, `gp-segment2` それぞれで以下を実行します。

```bash
sudo groupadd gpadmin
sudo useradd -m gpadmin -g gpadmin
echo gpadmin:Greenplum123 | sudo chpasswd
echo 'gpadmin ALL=(ALL) NOPASSWD: ALL' | sudo EDITOR='tee -a' visudo

sudo yum -y install ./greenplum-db-7.0.0-el8-x86_64.rpm
sudo chown -R gpadmin:gpadmin /usr/local/greenplum*
sudo chown -R gpadmin:gpadmin /root/Downloads
sudo chgrp -R gpadmin /usr/local/greenplum*
```

Greenplum用の環境変数などが設定されたスクリプトを `.bashrc` で読み込むようにします。

```bash
cat <<EOF | sudo su - gpadmin bash -c 'tee -a /home/gpadmin/.bashrc'
source /usr/local/greenplum-db/greenplum_path.sh
EOF
```

Firewallを無効にします。

https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/install_guide-prep_os.html#deactivate-or-configure-firewall-software-1

```
sudo systemctl stop firewalld.service
sudo systemctl disable firewalld.service
```

### VMテンプレートの作成

ここまでの作業を何度も実施するのが大変なので、VMをテンプレート化します。

一旦、`gp-coordinator`のVMをパワーオフします。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/b31e3089-daa6-4ff5-8c6e-fb7424ac4dd3">

"クローン作成" => "テンプレートにクローン作成" を選択します。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/29e7fb09-6020-4bf4-8c9b-a0bc14287d06">

テンプレート名を`gp-template`にし、適当な場所に保存します。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/afb3bb55-8993-41d6-ad26-9061a0a7c305">

`gp-coordinator`を再びパワーオンしておきます。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/32da41aa-9bea-42df-a144-757188f89ec8">


`gp-template`テンプレートから仮想マシンを作成します。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/dcddf7e6-263f-4111-ae14-1b7d450e15a9">

名前は`gp-segment1`にします。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/a86c32ae-fcdc-4166-8849-8f3ef640ba17">

"作成後に仮想マシンをパワーオン" にチェックを入れます。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/155921af-87e6-4c40-8632-b67a286fbc7f">

VMを作成します。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/d007fb25-56ab-49c4-8619-c78356df1ee6">

同じ手順で、`gp-segment2`も作成します。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/10043783-f280-489d-8d7d-2d87a58c9c89">

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/f051a10d-a205-492e-b19c-51c24fbb310d">

`gp-segment1`のWebコンソールを開きます。

<img width="1912" alt="image" src="https://github.com/making/blog.ik.am/assets/106908/a946774f-bd37-4cd6-8518-13db1896945f">

次のコマンドを実行し、hostnameを変更します。

```
hostnamectl set-hostname gp-segment1
```

また、`ip addr`コマンドでIPアドレスを確認します。



同様に`gp-segment2`のWebコンソールを開き、次のコマンドを実行し、hostnameを変更します。

```
hostnamectl set-hostname gp-segment2
```

また、`ip addr`コマンドでIPアドレスを確認します。

`gp-coordinator`、`gp-segment1`、`gp-segment2`それぞれで次のコマンドを実行し、`/etc/hosts`ファイルにそれぞれのIPアドレスを記述します。IPアドレスは自分の環境に合わせて変更してください。

```
cat <<EOF | sudo tee /etc/hosts > /dev/null
10.220.46.50 gp-coordinator
10.220.46.55 gp-segment1
10.220.46.56 gp-segment2
127.0.0.1 localhost
EOF
```

この記事ではDHCPのまま作業しますが、必要に応じてstatic ipを設定してください。

### SSHの設定

各ノード間をパスワードレスでssh通信できるようにする必要があります。

まずは`gp-coordinator`、`gp-segment1`、`gp-segment2`それぞれで次のコマンドを実行して鍵を生成します。

```bash
sudo su - gpadmin bash -c 'ssh-keygen -m PEM -t rsa -b 4096 -q -N "" -f /home/gpadmin/.ssh/id_rsa'
```

以下の作業は `gp-coordinator` 上でのみで行います。`gpadmin` ユーザーで作業します。

`gp-coordinator` の公開鍵を各ホストの `/home/gpadmin/.ssh/authorized_keys` に追記します。

```bash
sudo su - gpadmin

SSHPASS=Greenplum123 sshpass -e ssh-copy-id -o StrictHostKeyChecking=no gp-coordinator
SSHPASS=Greenplum123 sshpass -e ssh-copy-id -o StrictHostKeyChecking=no gp-segment1
SSHPASS=Greenplum123 sshpass -e ssh-copy-id -o StrictHostKeyChecking=no gp-segment2
```

次のコマンドで各ホストの公開鍵を各ホストの `/home/gpadmin/.ssh/known_hosts` に追記します。

```bash
cat <<EOF > hostfile_exkeys
gp-coordinator
gp-segment1
gp-segment2
EOF

gpssh-exkeys -f hostfile_exkeys
```

### VMware Greenplum®のセットアップ

以下の作業は `gp-coordinator` 上でのみで行います。引き続き `gpadmin` ユーザーで作業します。

次のコマンドで各ホスト用のディレクトリの作成をします。

```bash
sudo mkdir -p /data/coordinator
sudo chown gpadmin:gpadmin /data/coordinator

cat <<EOF > hostfile_gpssh_segonly
gp-segment1
gp-segment2
EOF

gpssh -f hostfile_gpssh_segonly -e 'sudo mkdir -p /data/primary'
gpssh -f hostfile_gpssh_segonly -e 'sudo mkdir -p /data/mirror'
gpssh -f hostfile_gpssh_segonly -e 'sudo chown -R gpadmin /data/*'
```

次のコマンドでVMware Greenplum®を初期化します。

```bash
mkdir -p gpconfigs
cat <<EOF > gpconfigs/hostfile_gpinitsystem
gp-segment1
gp-segment2
EOF

cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config /home/gpadmin/gpconfigs/gpinitsystem_config
sed -i.bak \
  -e 's|/data1/primary /data1/primary /data1/primary /data2/primary /data2/primary /data2/primary|/data/primary|' \
  -e 's|/data1/mirror /data1/mirror /data1/mirror /data2/mirror /data2/mirror /data2/mirror|/data/mirror|' \
  -e 's/#MIRROR_PORT_BASE/MIRROR_PORT_BASE/' \
  -e 's/#declare -a MIRROR_DATA_DIRECTORY/declare -a MIRROR_DATA_DIRECTORY/' \
  -e 's/COORDINATOR_HOSTNAME=cdw/COORDINATOR_HOSTNAME=gp-coordinator/' \
  gpconfigs/gpinitsystem_config

gpinitsystem -c gpconfigs/gpinitsystem_config -h gpconfigs/hostfile_gpinitsystem
```

次のようなログが出力され、確認が求められます。

```
20231002:14:33:40:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Checking configuration parameters, please wait...
20231002:14:33:40:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Reading Greenplum configuration file gpconfigs/gpinitsystem_config
20231002:14:33:40:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Locale has not been set in gpconfigs/gpinitsystem_config, will set to default value
20231002:14:33:40:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-No DATABASE_NAME set, will exit following template1 updates
20231002:14:33:40:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-COORDINATOR_MAX_CONNECT not set, will set to default value 250
20231002:14:33:40:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Checking configuration parameters, Completed

20231002:14:33:40:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Commencing multi-home checks, please wait...
..
20231002:14:33:41:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Configuring build for standard array
20231002:14:33:41:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Commencing multi-home checks, Completed
20231002:14:33:41:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Building primary segment instance array, please wait...
..
20231002:14:33:43:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Building group mirror array type , please wait...
..
20231002:14:33:45:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Checking Coordinator host
20231002:14:33:45:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Checking new segment hosts, please wait...
....
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Checking new segment hosts, Completed
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Greenplum Database Creation Parameters
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:---------------------------------------
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator Configuration
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:---------------------------------------
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator hostname       = gp-coordinator
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator port           = 5432
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator instance dir   = /data/coordinator/gpseg-1
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator LOCALE         = 
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Greenplum segment prefix   = gpseg
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator Database       = 
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator connections    = 250
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator buffers        = 128000kB
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Segment connections        = 750
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Segment buffers            = 128000kB
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Encoding                   = UNICODE
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Postgres param file        = Off
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Initdb to be used          = /usr/local/greenplum-db-7.0.0/bin/initdb
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-GP_LIBRARY_PATH is         = /usr/local/greenplum-db-7.0.0/lib
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-HEAP_CHECKSUM is           = on
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-HBA_HOSTNAMES is           = 0
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Ulimit check               = Passed
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Array host connect type    = Single hostname per node
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator IP address [1]      = ::1
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator IP address [2]      = 10.220.46.50
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator IP address [3]      = fe80::250:56ff:feb3:718
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Standby Coordinator             = Not Configured
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Number of primary segments = 1
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Total Database segments    = 2
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Trusted shell              = ssh
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Number segment hosts       = 2
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Mirror port base           = 7000
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Number of mirror segments  = 1
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Mirroring config           = ON
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Mirroring type             = Group
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:----------------------------------------
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Greenplum Primary Segment Configuration
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:----------------------------------------
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-gp-segment1 	6000 	gp-segment1 	/data/primary/gpseg0 	2
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-gp-segment2 	6000 	gp-segment2 	/data/primary/gpseg1 	3
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:---------------------------------------
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Greenplum Mirror Segment Configuration
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:---------------------------------------
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-gp-segment2 	7000 	gp-segment2 	/data/mirror/gpseg0 	4
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-gp-segment1 	7000 	gp-segment1 	/data/mirror/gpseg1 	5

Continue with Greenplum creation Yy|Nn (default=N):
> 
```

`y`を入力すると、次のようなログが出力されます。

```
20231002:14:33:54:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Building the Coordinator instance database, please wait...
20231002:14:33:57:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Starting the Coordinator in admin mode
20231002:14:33:57:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Commencing parallel build of primary segment instances
20231002:14:33:57:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Spawning parallel processes    batch [1], please wait...
..
20231002:14:33:58:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Waiting for parallel processes batch [1], please wait...
............
20231002:14:34:10:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:------------------------------------------------
20231002:14:34:10:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Parallel process exit status
20231002:14:34:10:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:------------------------------------------------
20231002:14:34:10:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Total processes marked as completed           = 2
20231002:14:34:10:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Total processes marked as killed              = 0
20231002:14:34:10:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Total processes marked as failed              = 0
20231002:14:34:10:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:------------------------------------------------
20231002:14:34:10:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Removing back out file
20231002:14:34:10:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-No errors generated from parallel processes
20231002:14:34:10:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Restarting the Greenplum instance in production mode
20231002:14:34:10:005447 gpstop:gp-coordinator:gpadmin-[INFO]:-Starting gpstop with args: -a -l /home/gpadmin/gpAdminLogs -m -d /data/coordinator/gpseg-1
20231002:14:34:10:005447 gpstop:gp-coordinator:gpadmin-[INFO]:-Gathering information and validating the environment...
20231002:14:34:10:005447 gpstop:gp-coordinator:gpadmin-[INFO]:-Obtaining Greenplum Coordinator catalog information
20231002:14:34:10:005447 gpstop:gp-coordinator:gpadmin-[INFO]:-Obtaining Segment details from coordinator...
20231002:14:34:10:005447 gpstop:gp-coordinator:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 7.0.0 build commit:0a7a3566873325aca1789ae6f818c80f17a9402d'
20231002:14:34:10:005447 gpstop:gp-coordinator:gpadmin-[INFO]:-Commencing Coordinator instance shutdown with mode='smart'
20231002:14:34:10:005447 gpstop:gp-coordinator:gpadmin-[INFO]:-Coordinator segment instance directory=/data/coordinator/gpseg-1
20231002:14:34:11:005447 gpstop:gp-coordinator:gpadmin-[INFO]:-Stopping coordinator segment and waiting for user connections to finish ...
server shutting down
20231002:14:34:12:005447 gpstop:gp-coordinator:gpadmin-[INFO]:-Attempting forceful termination of any leftover coordinator process
20231002:14:34:12:005447 gpstop:gp-coordinator:gpadmin-[INFO]:-Terminating processes for segment /data/coordinator/gpseg-1
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Starting gpstart with args: -a -l /home/gpadmin/gpAdminLogs -d /data/coordinator/gpseg-1
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Gathering information and validating the environment...
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 7.0.0 build commit:0a7a3566873325aca1789ae6f818c80f17a9402d'
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Greenplum Catalog Version: '302307241'
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Starting Coordinator instance in admin mode
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-CoordinatorStart pg_ctl cmd is env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /data/coordinator/gpseg-1 -l /data/coordinator/gpseg-1/log/startup.log -w -t 600 -o " -c gp_role=utility " start
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Obtaining Greenplum Coordinator catalog information
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Obtaining Segment details from coordinator...
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Setting new coordinator era
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Coordinator Started...
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Shutting down coordinator
20231002:14:34:18:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Commencing parallel segment instance startup, please wait...
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Process results...
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-----------------------------------------------------
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-   Successful segment starts                                            = 2
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-   Failed segment starts                                                = 0
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-   Skipped segment starts (segments are marked down in configuration)   = 0
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-----------------------------------------------------
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Successfully started 2 of 2 segment instances 
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-----------------------------------------------------
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Starting Coordinator instance gp-coordinator directory /data/coordinator/gpseg-1 
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-CoordinatorStart pg_ctl cmd is env GPSESSID=0000000000 GPERA=ab03d3b04e6ea60f_231002143415 $GPHOME/bin/pg_ctl -D /data/coordinator/gpseg-1 -l /data/coordinator/gpseg-1/log/startup.log -w -t 600 -o " -c gp_role=dispatch " start
20231002:14:34:20:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Command pg_ctl reports Coordinator gp-coordinator instance active
20231002:14:34:20:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Connecting to db template1 on host localhost
20231002:14:34:20:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-No standby coordinator configured.  skipping...
20231002:14:34:20:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Database successfully started
20231002:14:34:20:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Completed restart of Greenplum instance in production mode
20231002:14:34:20:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Creating core GPDB extensions
20231002:14:34:21:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Importing system collations
20231002:14:34:28:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Commencing parallel build of mirror segment instances
20231002:14:34:28:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Spawning parallel processes    batch [1], please wait...
..
20231002:14:34:28:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Waiting for parallel processes batch [1], please wait...
......
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:------------------------------------------------
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Parallel process exit status
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:------------------------------------------------
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Total processes marked as completed           = 2
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Total processes marked as killed              = 0
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Total processes marked as failed              = 0
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:------------------------------------------------
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Scanning utility log file for any warning messages
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Log file scan check passed
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Greenplum Database instance successfully created
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-------------------------------------------------------
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-To complete the environment configuration, please 
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-update gpadmin .bashrc file with the following
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-1. Ensure that the greenplum_path.sh file is sourced
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-2. Add "export COORDINATOR_DATA_DIRECTORY=/data/coordinator/gpseg-1"
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-   to access the Greenplum scripts for this instance:
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-   or, use -d /data/coordinator/gpseg-1 option for the Greenplum scripts
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-   Example gpstate -d /data/coordinator/gpseg-1
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Script log file = /home/gpadmin/gpAdminLogs/gpinitsystem_20231002.log
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-To remove instance, run gpdeletesystem utility
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-To initialize a Standby Coordinator Segment for this Greenplum instance
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Review options for gpinitstandby
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-------------------------------------------------------
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-The Coordinator /data/coordinator/gpseg-1/pg_hba.conf post gpinitsystem
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-has been configured to allow all hosts within this new
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-array to intercommunicate. Any hosts external to this
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-new array must be explicitly added to this file
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Refer to the Greenplum Admin support guide which is
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-located in the /usr/local/greenplum-db-7.0.0/docs directory
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-------------------------------------------------------
```

`Greenplum Database instance successfully created` と出力されたのでセットアップが成功したように見えます。

`.barshrc`に次の環境変数を追加します。

```
cat <<EOF | tee -a /home/gpadmin/.bashrc > /dev/null
export COORDINATOR_DATA_DIRECTORY=/data/coordinator/gpseg-1
export PGPORT=5432
export PGUSER=gpadmin
export PGDATABASE=gpadmin
export LD_PRELOAD=/lib64/libz.so.1 ps
EOF

source /home/gpadmin/.bashrc
```


### データベースの作成

データベースの作成はPostgreSQLと基本的に同じです。以下の作業は `gp-coordinator` 上でのみで行います。引き続き `gpadmin` ユーザーで作業します。

```bash
createdb test
```

### VMware Greenplum®へのアクセス

VMware Greenplum®へのアクセスもPostgreSQLと基本的に同じです。以下の作業は `gp-coordinator` 上でのみで行います。引き続き `gpadmin` ユーザーで作業します。


```bash
$ psql -d test
psql (12.12)
Type "help" for help.
```

次のSQLを実行します。

```sql
CREATE TABLE IF NOT EXISTS organization
(
    organization_id   BIGINT PRIMARY KEY,
    organization_name VARCHAR(255) NOT NULL
);
INSERT INTO organization(organization_id, organization_name) VALUES(1, 'foo');
INSERT INTO organization(organization_id, organization_name) VALUES(2, 'bar');
```

デフォルではPrimary Keyでデータが分散されるようです。`gp_segment_id`というカラムでデータがどのセグメントに配置されているか確認することができます。


```sql
test=# select organization_id,organization_name,gp_segment_id from organization;
 organization_id | organization_name | gp_segment_id 
-----------------+-------------------+---------------
               2 | bar               |             0
               1 | foo               |             1
(2 rows)
```

とりあえず、ここまで確認できました。

### Extensionの有効化

https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/install_guide-install_modules.html

いくつかのExtensionを有効化します。

#### uuid-ossp

https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/ref_guide-modules-uuid-ossp.html

```
test=# CREATE EXTENSION "uuid-ossp";
CREATE EXTENSION
test=# select uuid_generate_v4();
           uuid_generate_v4           
--------------------------------------
 230d1dd3-aca8-4140-b2fe-83b32cacf954
(1 row)
```


#### pgvector

https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/ref_guide-modules-pgvector-pgvector.html


```sql
test=# CREATE EXTENSION vector;
CREATE EXTENSION
```

```sql
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
```

```
 id | embedding 
----+-----------
  1 | [1,2,3]
  2 | [4,5,6]
(2 rows)
```

#### postgresml

https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/ref_guide-modules-postgresml.html

gppkgファイルは`pivnet`コマンドでrpmとともにダウンロード済みです。

```
sudo cp /root/Downloads/DataSciencePython3.9-2.0.0-gp7-el8_x86_64.gppkg ./
gppkg install -a DataSciencePython3.9-2.0.0-gp7-el8_x86_64.gppkg 
```

```
createdb gpadmin
gpconfig -c shared_preload_libraries -v 'pgml' 
```

Greenplumを再起動します。

```
gpstop -r -a
```

```
gpconfig -c pgml.venv -v '$GPHOME/ext/DataSciencePython3.9'
```

> 次のエラーが出る場合は、`gpstop -r`が漏れています。
> 
> ```
> gpconfig:gp-coordinator:gpadmin-[CRITICAL]:-not a valid GUC: pgml.venv
> not a valid GUC: pgml.venv
> ```

設定をリロードします。

```
gpstop -u
```


```sql
test=# CREATE EXTENSION pgml;
INFO:  Python version: 3.9.16 (main, Jul  3 2023, 20:07:32) 
[GCC 8.5.0 20210514 (Red Hat 8.5.0-18)]
INFO:  Scikit-learn 1.1.2, XGBoost 1.6.2, LightGBM 4.0.0, NumPy 1.22.1
CREATE EXTENSION
```

> 次のエラーが出る場合は、`gpstop -u`が漏れています。
>
> ```
> test=# CREATE EXTENSION IF NOT EXISTS pgml;
> INFO:  Python version: 3.9.16 (main, Jul  3 2023, 20:07:32)
> [GCC 8.5.0 20210514 (Red Hat 8.5.0-18)]
> ERROR:  The xgboost package is missing. Install it with `sudo pip3 install xgboost`
> ModuleNotFoundError: No module named 'xgboost' (api.rs:36)
> ```

PostgresMLのtransformを試します。

```sql
SELECT pgml.transform(
    'question-answering',
    inputs => ARRAY[
        '{
            "question": "What does the customer want?",
            "context": "Dear Amazon, last week I ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead! As a lifelong enemy of the Decepticons, I hope you can understand my dilemma. To resolve the issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered. Enclosed are copies of my records concerning this purchase. I expect to hear from you soon. Sincerely, Bumblebee."}'
    ]
) AS answer;
```

初回はモデルのロードに時間がかかりますが、2回目以降は速いです。

```
                                            answer                                            
----------------------------------------------------------------------------------------------
 {"end": 358, "score": 0.6312912106513977, "start": 335, "answer": "an exchange of Megatron"}
(1 row)
```

### ユーザー追加

以下のSQLを実行して、`postgresml`ユーザーと`postgresml`データベースを作ります。手順はPostgreSQLと同じです。


```
psql -c "CREATE ROLE postgresml PASSWORD 'postgresml' SUPERUSER LOGIN"
createdb postgresml --owner postgresml
psql -c 'ALTER ROLE postgresml SET search_path TO public,pgml'
```

外からアクセスできるように`gp-coordinator`上で、次の設定ファイルを変更し、Greenplumを再起動します。

```
cat <<EOF | tee -a ${COORDINATOR_DATA_DIRECTORY}/pg_hba.conf > /dev/null
host     postgresml  postgresml      0.0.0.0/0      md5
EOF
gpstop -r -a
```

`gp-coordinator`の外から`gp-coordinator`のIPにアクセスします。

```
$ PGPASSWORD=postgresml psql -U postgresml -d postgresml -h 10.220.46.50
psql (15.3, server 12.12)
Type "help" for help.

postgresml=#
```

以下のSQLを実行して、結果が変えればOKです。

```sql
CREATE EXTENSION IF NOT EXISTS pgml;
SELECT pgml.transform(
    'question-answering',
    inputs => ARRAY[
        '{
            "question": "What does the customer want?",
            "context": "Dear Amazon, last week I ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead! As a lifelong enemy of the Decepticons, I hope you can understand my dilemma. To resolve the issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered. Enclosed are copies of my records concerning this purchase. I expect to hear from you soon. Sincerely, Bumblebee."}'
    ]
) AS answer;
```

---

並列分散処理が得意なGreenplumでvgvectorやPostgresMLが利用できることで、[Retrieval Augmented Generation (RAG)](https://ai.meta.com/blog/retrieval-augmented-generation-streamlining-the-creation-of-intelligent-natural-language-processing-models/)パターンに大きく活用できそうです。
