Notes on installing VMware Greenplum® 7, with pgvector and PostgresML support, on Rocky Linux VMs on vSphere

🗃 {Middleware/RDBMS/Greenplum}
🏷 Greenplum 🏷 Tanzu 🏷 PostgreSQL 🏷 Rocky 🏷 pgvector 🏷 PostgresML 🏷 vSphere 🏷 Machine Learning 
🗓 Updated at 2023-10-31T08:01:23Z  🗓 Created at 2023-10-02T06:52:05Z

⚠️ The content of this article is not supported by VMware. Resolve any issues that arise from following it at your own risk, and do not contact VMware support about them.

VMware Greenplum® 7 has been released. It supports pgvector and PostgresML, strengthening its AI capabilities.
In this article, I install VMware Greenplum® 7 on vSphere.

I will build a cluster with one Coordinator (gp-coordinator) and two Segment hosts (gp-segment1, gp-segment2). This article installs VMware Greenplum® 7.0.0.

I basically followed the installation guide, but the steps below only apply the settings needed to get the cluster running; they do not implement all of the recommended settings. Use them at your own risk.


Creating the Rocky Linux VMs

For the Linux distribution, I use Rocky Linux 8, which is supported.

Create the VM from an ISO file.

Save the ISO file https://download.rockylinux.org/pub/rocky/8/isos/x86_64/Rocky-8.8-x86_64-minimal.iso to a content library.

Create a new VM.

Name it gp-coordinator.

Make reasonable choices on the following screens.

I set 8 CPUs and 16 GB of memory, but it should work with less. For the CD/DVD drive, select the content library and mount the ISO image.

Create the VM.

Power it on.

Launch the console.

"Install Rocky Linux 8.8" を選択します。

image

Select English.

Select Network and turn it ON. Note down the IP address (10.220.46.50 here). Also set the hostname to gp-coordinator, then click "Done".

Set the root user's password and click "Done".

(Optionally) select a Timezone and click "Done".

Leave Installation Destination at the default and click "Done".

"Begin Installation"をクリックします。

image

When it finishes, click "Reboot System".

Open a terminal and log in to the noted IP address via ssh as the root user.

ssh root@10.220.46.50

Initial OS Setup

Run the following commands on the gp-coordinator VM. The yum packages are dependencies needed by Greenplum; the update-alternatives line non-interactively selects the second entry in the java alternatives list (check the numbering in your environment); the sysctl setting reserves port 65330 so it is not handed out as an ephemeral port; and the limits.conf entries raise the open-file limits.

sudo yum -y install epel-release
sudo yum -y config-manager --set-enabled powertools
sudo yum -y install apr \
apr-util \
bash \
bzip2 \
curl \
lsof \
cmake \
bind-utils \
krb5-libs \
libcgroup-tools \
libcurl \
libevent \
libxml2 \
libyaml \
zlib \
openldap \
openssh-clients \
openssh-server \
openssl \
openssl-libs \
sshpass \
perl \
python39 \
readline \
rsync \
R \
sed \
tar \
zip \
java-11-openjdk-devel

echo 2 | sudo update-alternatives --config java

cat <<EOF | sudo tee -a /etc/sysctl.d/99-sysctl.conf > /dev/null
net.ipv4.ip_local_reserved_ports=65330
EOF
sudo sysctl --system

cat <<EOF | sudo tee -a /etc/security/limits.conf > /dev/null
* soft nofile 65536
* hard nofile 65536
EOF

Downloading VMware Greenplum®

Download the VMware Greenplum® rpm. Here I use the pivnet CLI. An API token can be obtained here.

mkdir -p ~/Downloads/
cd ~/Downloads

curl -sL https://github.com/pivotal-cf/pivnet-cli/releases/download/v3.0.1/pivnet-linux-amd64-3.0.1 > pivnet
chmod +x pivnet
sudo mv pivnet /usr/local/bin/
pivnet login --api-token=*********************-r
pivnet download-product-files --product-slug='vmware-greenplum' --release-version='7.0.0' --glob='*.rpm'
pivnet download-product-files --product-slug='vmware-greenplum' --release-version='7.0.0' --glob='*.gppkg'
pivnet download-product-files --product-slug='vmware-greenplum' --release-version='7.0.0' --glob='*.tar.gz'
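
Once the downloads finish, the files used in the later steps should be present. A quick check (exact file names depend on the release artifacts; other downloaded files are omitted here):

$ ls ~/Downloads/*.rpm ~/Downloads/*.gppkg
/root/Downloads/DataSciencePython3.9-2.0.0-gp7-el8_x86_64.gppkg
/root/Downloads/greenplum-db-7.0.0-el8-x86_64.rpm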

Installing VMware Greenplum®

Create the gpadmin user and install the downloaded rpm.

Run the following on each of gp-coordinator, gp-segment1, and gp-segment2.

sudo groupadd gpadmin
sudo useradd -m gpadmin -g gpadmin
echo gpadmin:Greenplum123 | sudo chpasswd
echo 'gpadmin ALL=(ALL) NOPASSWD: ALL' | sudo EDITOR='tee -a' visudo

sudo yum -y install ./greenplum-db-7.0.0-el8-x86_64.rpm
sudo chown -R gpadmin:gpadmin /usr/local/greenplum*
sudo chown -R gpadmin:gpadmin /root/Downloads
sudo chgrp -R gpadmin /usr/local/greenplum*
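
If the rpm installed correctly, /usr/local now contains the versioned directory and a greenplum-db symlink pointing at it, both owned by gpadmin (output abbreviated):

$ ls -ld /usr/local/greenplum-db*
lrwxrwxrwx ... /usr/local/greenplum-db -> /usr/local/greenplum-db-7.0.0
drwxr-xr-x ... /usr/local/greenplum-db-7.0.0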

Configure gpadmin's .bashrc to source the script that sets up the Greenplum environment variables.

cat <<EOF | sudo su - gpadmin bash -c 'tee -a /home/gpadmin/.bashrc'
source /usr/local/greenplum-db/greenplum_path.sh
EOF

Disable the firewall.

https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/install_guide-prep_os.html#deactivate-or-configure-firewall-software-1

sudo systemctl stop firewalld.service
sudo systemctl disable firewalld.service

Creating a VM Template

Since repeating the steps so far on every VM would be tedious, turn the VM into a template.

First, power off the gp-coordinator VM.

"クローン作成" => "テンプレートにクローン作成" を選択します。

image

Name the template gp-template and save it in a suitable location.

Power gp-coordinator back on.

Create virtual machines from the gp-template template.

Name it gp-segment1.

"作成後に仮想マシンをパワーオン" にチェックを入れます。

image

Create the VM.

Create gp-segment2 with the same steps.

Open the web console for gp-segment1.

Run the following command to change the hostname.

hostnamectl set-hostname gp-segment1

Also, check the IP address with the ip addr command.

Similarly, open the web console for gp-segment2 and change the hostname with the following command.

hostnamectl set-hostname gp-segment2

Also, check the IP address with the ip addr command.

Run the following command on each of gp-coordinator, gp-segment1, and gp-segment2 to record each host's IP address in the /etc/hosts file. Change the IP addresses to match your environment.

cat <<EOF | sudo tee /etc/hosts > /dev/null
10.220.46.50 gp-coordinator
10.220.46.55 gp-segment1
10.220.46.56 gp-segment2
127.0.0.1 localhost
EOF

This article keeps using DHCP, but set static IPs as needed.
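
Name resolution can be checked on each host with a small loop; it should print the addresses written above:

for h in gp-coordinator gp-segment1 gp-segment2; do getent hosts $h; done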

SSH Configuration

The nodes need to be able to communicate with each other over ssh without passwords.

First, run the following command on each of gp-coordinator, gp-segment1, and gp-segment2 to generate a key pair.

sudo su - gpadmin bash -c 'ssh-keygen -m PEM -t rsa -b 4096 -q -N "" -f /home/gpadmin/.ssh/id_rsa'

The following steps are performed only on gp-coordinator, working as the gpadmin user.

Append gp-coordinator's public key to /home/gpadmin/.ssh/authorized_keys on each host.

sudo su - gpadmin

SSHPASS=Greenplum123 sshpass -e ssh-copy-id -o StrictHostKeyChecking=no gp-coordinator
SSHPASS=Greenplum123 sshpass -e ssh-copy-id -o StrictHostKeyChecking=no gp-segment1
SSHPASS=Greenplum123 sshpass -e ssh-copy-id -o StrictHostKeyChecking=no gp-segment2

The following commands register every host's public key in /home/gpadmin/.ssh/known_hosts on each host.

cat <<EOF > hostfile_exkeys
gp-coordinator
gp-segment1
gp-segment2
EOF

gpssh-exkeys -f hostfile_exkeys
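
Passwordless ssh can now be verified by running a command across all hosts at once:

gpssh -f hostfile_exkeys -e hostname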

Setting Up VMware Greenplum®

The following steps are performed only on gp-coordinator, continuing as the gpadmin user.

Create the data directories for each host with the following commands.

sudo mkdir -p /data/coordinator
sudo chown gpadmin:gpadmin /data/coordinator

cat <<EOF > hostfile_gpssh_segonly
gp-segment1
gp-segment2
EOF

gpssh -f hostfile_gpssh_segonly -e 'sudo mkdir -p /data/primary'
gpssh -f hostfile_gpssh_segonly -e 'sudo mkdir -p /data/mirror'
gpssh -f hostfile_gpssh_segonly -e 'sudo chown -R gpadmin /data/*'
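
The created directories and their ownership can be checked on the segment hosts in the same way:

gpssh -f hostfile_gpssh_segonly -e 'ls -ld /data/primary /data/mirror'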

Initialize VMware Greenplum® with the following commands.

mkdir -p gpconfigs
cat <<EOF > gpconfigs/hostfile_gpinitsystem
gp-segment1
gp-segment2
EOF

cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config /home/gpadmin/gpconfigs/gpinitsystem_config
sed -i.bak \
  -e 's|/data1/primary /data1/primary /data1/primary /data2/primary /data2/primary /data2/primary|/data/primary|' \
  -e 's|/data1/mirror /data1/mirror /data1/mirror /data2/mirror /data2/mirror /data2/mirror|/data/mirror|' \
  -e 's/#MIRROR_PORT_BASE/MIRROR_PORT_BASE/' \
  -e 's/#declare -a MIRROR_DATA_DIRECTORY/declare -a MIRROR_DATA_DIRECTORY/' \
  -e 's/COORDINATOR_HOSTNAME=cdw/COORDINATOR_HOSTNAME=gp-coordinator/' \
  gpconfigs/gpinitsystem_config
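
After the sed edits, the relevant lines of gpconfigs/gpinitsystem_config should look roughly like this (a sketch; the other defaults in the template file are left untouched):

declare -a DATA_DIRECTORY=(/data/primary)
COORDINATOR_HOSTNAME=gp-coordinator
MIRROR_PORT_BASE=7000
declare -a MIRROR_DATA_DIRECTORY=(/data/mirror)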

gpinitsystem -c gpconfigs/gpinitsystem_config -h gpconfigs/hostfile_gpinitsystem

Output like the following appears, and you are asked for confirmation.

20231002:14:33:40:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Checking configuration parameters, please wait...
20231002:14:33:40:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Reading Greenplum configuration file gpconfigs/gpinitsystem_config
20231002:14:33:40:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Locale has not been set in gpconfigs/gpinitsystem_config, will set to default value
20231002:14:33:40:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-No DATABASE_NAME set, will exit following template1 updates
20231002:14:33:40:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-COORDINATOR_MAX_CONNECT not set, will set to default value 250
20231002:14:33:40:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Checking configuration parameters, Completed

20231002:14:33:40:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Commencing multi-home checks, please wait...
..
20231002:14:33:41:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Configuring build for standard array
20231002:14:33:41:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Commencing multi-home checks, Completed
20231002:14:33:41:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Building primary segment instance array, please wait...
..
20231002:14:33:43:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Building group mirror array type , please wait...
..
20231002:14:33:45:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Checking Coordinator host
20231002:14:33:45:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Checking new segment hosts, please wait...
....
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Checking new segment hosts, Completed
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Greenplum Database Creation Parameters
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:---------------------------------------
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator Configuration
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:---------------------------------------
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator hostname       = gp-coordinator
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator port           = 5432
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator instance dir   = /data/coordinator/gpseg-1
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator LOCALE         = 
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Greenplum segment prefix   = gpseg
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator Database       = 
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator connections    = 250
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator buffers        = 128000kB
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Segment connections        = 750
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Segment buffers            = 128000kB
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Encoding                   = UNICODE
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Postgres param file        = Off
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Initdb to be used          = /usr/local/greenplum-db-7.0.0/bin/initdb
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-GP_LIBRARY_PATH is         = /usr/local/greenplum-db-7.0.0/lib
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-HEAP_CHECKSUM is           = on
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-HBA_HOSTNAMES is           = 0
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Ulimit check               = Passed
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Array host connect type    = Single hostname per node
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator IP address [1]      = ::1
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator IP address [2]      = 10.220.46.50
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Coordinator IP address [3]      = fe80::250:56ff:feb3:718
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Standby Coordinator             = Not Configured
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Number of primary segments = 1
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Total Database segments    = 2
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Trusted shell              = ssh
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Number segment hosts       = 2
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Mirror port base           = 7000
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Number of mirror segments  = 1
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Mirroring config           = ON
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Mirroring type             = Group
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:----------------------------------------
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Greenplum Primary Segment Configuration
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:----------------------------------------
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-gp-segment1 	6000 	gp-segment1 	/data/primary/gpseg0 	2
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-gp-segment2 	6000 	gp-segment2 	/data/primary/gpseg1 	3
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:---------------------------------------
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Greenplum Mirror Segment Configuration
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:---------------------------------------
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-gp-segment2 	7000 	gp-segment2 	/data/mirror/gpseg0 	4
20231002:14:33:52:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-gp-segment1 	7000 	gp-segment1 	/data/mirror/gpseg1 	5

Continue with Greenplum creation Yy|Nn (default=N):
> 

Enter y, and output like the following appears.

20231002:14:33:54:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Building the Coordinator instance database, please wait...
20231002:14:33:57:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Starting the Coordinator in admin mode
20231002:14:33:57:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Commencing parallel build of primary segment instances
20231002:14:33:57:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Spawning parallel processes    batch [1], please wait...
..
20231002:14:33:58:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Waiting for parallel processes batch [1], please wait...
............
20231002:14:34:10:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:------------------------------------------------
20231002:14:34:10:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Parallel process exit status
20231002:14:34:10:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:------------------------------------------------
20231002:14:34:10:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Total processes marked as completed           = 2
20231002:14:34:10:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Total processes marked as killed              = 0
20231002:14:34:10:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Total processes marked as failed              = 0
20231002:14:34:10:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:------------------------------------------------
20231002:14:34:10:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Removing back out file
20231002:14:34:10:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-No errors generated from parallel processes
20231002:14:34:10:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Restarting the Greenplum instance in production mode
20231002:14:34:10:005447 gpstop:gp-coordinator:gpadmin-[INFO]:-Starting gpstop with args: -a -l /home/gpadmin/gpAdminLogs -m -d /data/coordinator/gpseg-1
20231002:14:34:10:005447 gpstop:gp-coordinator:gpadmin-[INFO]:-Gathering information and validating the environment...
20231002:14:34:10:005447 gpstop:gp-coordinator:gpadmin-[INFO]:-Obtaining Greenplum Coordinator catalog information
20231002:14:34:10:005447 gpstop:gp-coordinator:gpadmin-[INFO]:-Obtaining Segment details from coordinator...
20231002:14:34:10:005447 gpstop:gp-coordinator:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 7.0.0 build commit:0a7a3566873325aca1789ae6f818c80f17a9402d'
20231002:14:34:10:005447 gpstop:gp-coordinator:gpadmin-[INFO]:-Commencing Coordinator instance shutdown with mode='smart'
20231002:14:34:10:005447 gpstop:gp-coordinator:gpadmin-[INFO]:-Coordinator segment instance directory=/data/coordinator/gpseg-1
20231002:14:34:11:005447 gpstop:gp-coordinator:gpadmin-[INFO]:-Stopping coordinator segment and waiting for user connections to finish ...
server shutting down
20231002:14:34:12:005447 gpstop:gp-coordinator:gpadmin-[INFO]:-Attempting forceful termination of any leftover coordinator process
20231002:14:34:12:005447 gpstop:gp-coordinator:gpadmin-[INFO]:-Terminating processes for segment /data/coordinator/gpseg-1
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Starting gpstart with args: -a -l /home/gpadmin/gpAdminLogs -d /data/coordinator/gpseg-1
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Gathering information and validating the environment...
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 7.0.0 build commit:0a7a3566873325aca1789ae6f818c80f17a9402d'
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Greenplum Catalog Version: '302307241'
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Starting Coordinator instance in admin mode
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-CoordinatorStart pg_ctl cmd is env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /data/coordinator/gpseg-1 -l /data/coordinator/gpseg-1/log/startup.log -w -t 600 -o " -c gp_role=utility " start
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Obtaining Greenplum Coordinator catalog information
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Obtaining Segment details from coordinator...
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Setting new coordinator era
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Coordinator Started...
20231002:14:34:15:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Shutting down coordinator
20231002:14:34:18:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Commencing parallel segment instance startup, please wait...
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Process results...
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-----------------------------------------------------
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-   Successful segment starts                                            = 2
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-   Failed segment starts                                                = 0
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-   Skipped segment starts (segments are marked down in configuration)   = 0
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-----------------------------------------------------
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Successfully started 2 of 2 segment instances 
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-----------------------------------------------------
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Starting Coordinator instance gp-coordinator directory /data/coordinator/gpseg-1 
20231002:14:34:19:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-CoordinatorStart pg_ctl cmd is env GPSESSID=0000000000 GPERA=ab03d3b04e6ea60f_231002143415 $GPHOME/bin/pg_ctl -D /data/coordinator/gpseg-1 -l /data/coordinator/gpseg-1/log/startup.log -w -t 600 -o " -c gp_role=dispatch " start
20231002:14:34:20:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Command pg_ctl reports Coordinator gp-coordinator instance active
20231002:14:34:20:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Connecting to db template1 on host localhost
20231002:14:34:20:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-No standby coordinator configured.  skipping...
20231002:14:34:20:005858 gpstart:gp-coordinator:gpadmin-[INFO]:-Database successfully started
20231002:14:34:20:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Completed restart of Greenplum instance in production mode
20231002:14:34:20:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Creating core GPDB extensions
20231002:14:34:21:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Importing system collations
20231002:14:34:28:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Commencing parallel build of mirror segment instances
20231002:14:34:28:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Spawning parallel processes    batch [1], please wait...
..
20231002:14:34:28:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Waiting for parallel processes batch [1], please wait...
......
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:------------------------------------------------
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Parallel process exit status
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:------------------------------------------------
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Total processes marked as completed           = 2
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Total processes marked as killed              = 0
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Total processes marked as failed              = 0
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:------------------------------------------------
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Scanning utility log file for any warning messages
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Log file scan check passed
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Greenplum Database instance successfully created
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-------------------------------------------------------
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-To complete the environment configuration, please 
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-update gpadmin .bashrc file with the following
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-1. Ensure that the greenplum_path.sh file is sourced
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-2. Add "export COORDINATOR_DATA_DIRECTORY=/data/coordinator/gpseg-1"
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-   to access the Greenplum scripts for this instance:
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-   or, use -d /data/coordinator/gpseg-1 option for the Greenplum scripts
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-   Example gpstate -d /data/coordinator/gpseg-1
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Script log file = /home/gpadmin/gpAdminLogs/gpinitsystem_20231002.log
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-To remove instance, run gpdeletesystem utility
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-To initialize a Standby Coordinator Segment for this Greenplum instance
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Review options for gpinitstandby
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-------------------------------------------------------
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-The Coordinator /data/coordinator/gpseg-1/pg_hba.conf post gpinitsystem
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-has been configured to allow all hosts within this new
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-array to intercommunicate. Any hosts external to this
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-new array must be explicitly added to this file
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-Refer to the Greenplum Admin support guide which is
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-located in the /usr/local/greenplum-db-7.0.0/docs directory
20231002:14:34:34:002300 gpinitsystem:gp-coordinator:gpadmin-[INFO]:-------------------------------------------------------

"Greenplum Database instance successfully created" was printed, so the setup appears to have succeeded.

Add the following environment variables to .bashrc.

cat <<EOF | tee -a /home/gpadmin/.bashrc > /dev/null
export COORDINATOR_DATA_DIRECTORY=/data/coordinator/gpseg-1
export PGPORT=5432
export PGUSER=gpadmin
export PGDATABASE=gpadmin
export LD_PRELOAD=/lib64/libz.so.1
EOF

source /home/gpadmin/.bashrc
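
At this point, the state of the cluster can be checked with the gpstate utility; it should report the coordinator, two primary segments, and two mirrors:

gpstate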

Creating a Database

Creating a database is basically the same as in PostgreSQL. The following is done only on gp-coordinator, continuing as the gpadmin user.

createdb test

Accessing VMware Greenplum®

Accessing VMware Greenplum® is also basically the same as PostgreSQL. The following is done only on gp-coordinator, continuing as the gpadmin user.

$ psql -d test
psql (12.12)
Type "help" for help.

Run the following SQL.

CREATE TABLE IF NOT EXISTS organization
(
    organization_id   BIGINT PRIMARY KEY,
    organization_name VARCHAR(255) NOT NULL
);
INSERT INTO organization(organization_id, organization_name) VALUES(1, 'foo');
INSERT INTO organization(organization_id, organization_name) VALUES(2, 'bar');

By default, data seems to be distributed by the primary key. The gp_segment_id column lets you check which segment each row is placed on.

test=# select organization_id,organization_name,gp_segment_id from organization;
 organization_id | organization_name | gp_segment_id 
-----------------+-------------------+---------------
               2 | bar               |             0
               1 | foo               |             1
(2 rows)
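
The distribution key can also be set explicitly with a DISTRIBUTED BY clause. For example, to distribute by organization_name instead (organization_by_name is just an illustrative table name):

psql -d test -c "CREATE TABLE organization_by_name (organization_id BIGINT, organization_name VARCHAR(255) NOT NULL) DISTRIBUTED BY (organization_name);"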

That confirms the basics work.

Enabling Extensions

https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/install_guide-install_modules.html

Enable some extensions.

uuid-ossp

https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/ref_guide-modules-uuid-ossp.html

test=# CREATE EXTENSION "uuid-ossp";
CREATE EXTENSION
test=# select uuid_generate_v4();
           uuid_generate_v4           
--------------------------------------
 230d1dd3-aca8-4140-b2fe-83b32cacf954
(1 row)

pgvector

https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/ref_guide-modules-pgvector-pgvector.html

test=# CREATE EXTENSION vector;
CREATE EXTENSION
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
 id | embedding 
----+-----------
  1 | [1,2,3]
  2 | [4,5,6]
(2 rows)
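
Upstream pgvector also supports approximate nearest-neighbor indexes; assuming the Greenplum module includes them, an IVFFlat index for L2 distance could be added like this (lists = 100 is just an illustrative value):

psql -d test -c 'CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);'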

postgresml

https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/ref_guide-modules-postgresml.html

The gppkg file was already downloaded along with the rpm by the pivnet command. Install it, create the gpadmin database (the PGDATABASE set earlier), and add pgml to shared_preload_libraries.

sudo cp /root/Downloads/DataSciencePython3.9-2.0.0-gp7-el8_x86_64.gppkg ./
gppkg install -a DataSciencePython3.9-2.0.0-gp7-el8_x86_64.gppkg 
createdb gpadmin
gpconfig -c shared_preload_libraries -v 'pgml' 

Restart Greenplum, then set pgml.venv.

gpstop -r -a
gpconfig -c pgml.venv -v '$GPHOME/ext/DataSciencePython3.9'

If the following error occurs, you forgot to run gpstop -r.

gpconfig:gp-coordinator:gpadmin-[CRITICAL]:-not a valid GUC: pgml.venv
not a valid GUC: pgml.venv

Reload the configuration and create the extension.

gpstop -u
test=# CREATE EXTENSION pgml;
INFO:  Python version: 3.9.16 (main, Jul  3 2023, 20:07:32) 
[GCC 8.5.0 20210514 (Red Hat 8.5.0-18)]
INFO:  Scikit-learn 1.1.2, XGBoost 1.6.2, LightGBM 4.0.0, NumPy 1.22.1
CREATE EXTENSION

If the following error occurs, you forgot to run gpstop -u.

test=# CREATE EXTENSION IF NOT EXISTS pgml;
INFO:  Python version: 3.9.16 (main, Jul  3 2023, 20:07:32)
[GCC 8.5.0 20210514 (Red Hat 8.5.0-18)]
ERROR:  The xgboost package is missing. Install it with `sudo pip3 install xgboost`
ModuleNotFoundError: No module named 'xgboost' (api.rs:36)

Try PostgresML's transform.

SELECT pgml.transform(
    'question-answering',
    inputs => ARRAY[
        '{
            "question": "What does the customer want?",
            "context": "Dear Amazon, last week I ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead! As a lifelong enemy of the Decepticons, I hope you can understand my dilemma. To resolve the issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered. Enclosed are copies of my records concerning this purchase. I expect to hear from you soon. Sincerely, Bumblebee."}'
    ]
) AS answer;

The first run takes time to load the model, but subsequent runs are fast.

                                            answer                                            
----------------------------------------------------------------------------------------------
 {"end": 358, "score": 0.6312912106513977, "start": 335, "answer": "an exchange of Megatron"}
(1 row)

Adding a User

Run the following to create a postgresml user and a postgresml database. The steps are the same as for PostgreSQL.

psql -c "CREATE ROLE postgresml PASSWORD 'postgresml' SUPERUSER LOGIN"
createdb postgresml --owner postgresml
psql -c 'ALTER ROLE postgresml SET search_path TO public,pgml'

To allow access from outside, edit the following configuration file on gp-coordinator and restart Greenplum.

cat <<EOF | tee -a ${COORDINATOR_DATA_DIRECTORY}/pg_hba.conf > /dev/null
host     postgresml  postgresml      0.0.0.0/0      md5
EOF
gpstop -r -a

Access gp-coordinator's IP from outside gp-coordinator.

$ PGPASSWORD=postgresml psql -U postgresml -d postgresml -h 10.220.46.50
psql (15.3, server 12.12)
Type "help" for help.

postgresml=#

Run the following SQL; if it returns a result, everything is working.

CREATE EXTENSION IF NOT EXISTS pgml;
SELECT pgml.transform(
    'question-answering',
    inputs => ARRAY[
        '{
            "question": "What does the customer want?",
            "context": "Dear Amazon, last week I ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead! As a lifelong enemy of the Decepticons, I hope you can understand my dilemma. To resolve the issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered. Enclosed are copies of my records concerning this purchase. I expect to hear from you soon. Sincerely, Bumblebee."}'
    ]
) AS answer;

With pgvector and PostgresML available on Greenplum, whose strength is parallel distributed processing, this should be a great fit for the Retrieval Augmented Generation (RAG) pattern.

