Serving models on Vertex AI with a custom container

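Hello, this is the UPSIDER Data team.

The Data team is currently working on a project to rebuild our credit system platform. The strengths of the UPSIDER card, "issuance in as little as one business day" and "flexible credit limits", are both supported by this credit system. The system combines machine learning models with a web application, and sets credit limits for new customers and adjusts them for existing customers through daily batch jobs and through operations that the Credit team* performs in the UI.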

*Credit team: the business-side team responsible for managing card credit

Several teams are working together on this project, with responsibilities divided as follows.

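Corp App team: owns the web application part of the credit system
Data team: owns the data platform, the machine learning platform, and model development, split between Data Engineers and Data Scientists
Credit team: provides requirements for making the system better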

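The credit system has been updated incrementally in the past, for example by adding features to the web application and by moving the model training environment from local Jupyter Notebooks to Vertex AI Pipelines. As the business has grown, however, we started a project to rebuild the platform from scratch.

For the machine learning platform in particular, we want to rely more on managed services in order to reduce operational load and make it easier to connect to the application. One concrete task is a planned migration of model management and serving to Vertex AI Model / Endpoint.

While working on this task, I had little experience with MLOps or machine learning in general, and my investigation of Vertex AI did not go as smoothly as I had hoped. I wrote this article in the hope that it helps others who are looking into Vertex AI as well.

Of the ways to serve a model on Vertex AI, this article covers how to deploy an online inference endpoint with a custom container, along with the key implementation points.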

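Serving models on Vertex AI

The Vertex AI features relevant here are the following two:

Registering a trained model in Model Registry
Creating an Endpoint for the model

In the Python SDK, these are represented by the Model class and the Endpoint class respectively. The implementation looks roughly like the following: once you have a Model, calling model.deploy() is all it takes to create an Endpoint.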

from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION)

# ... prepare whatever is needed to create the Model ...

model = aiplatform.Model(
  # settings used to create the Model
)
endpoint = model.deploy(machine_type="n1-standard-4")

References:

Model class: https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Model
Endpoint class: https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Endpoint

Serving options

The simplest way to serve a model on Vertex AI is to use a "prebuilt container", a container image provided by Vertex AI.

Prebuilt containers are available for the following ML frameworks.

TensorFlow
PyTorch
XGBoost
scikit-learn

If your model was built with one of these ML frameworks, you can use a prebuilt container.

https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers

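For any other framework, you need to serve the model in one of the following ways:

Use a custom container
Use a custom inference routine

As the name suggests, a custom container means building the container from scratch: you provide the HTTP server yourself and write the preprocessing and inference code yourself.

With a custom inference routine, part of the code is already provided by Vertex AI, and you enable serving by implementing abstract classes, so you write less code than with a custom container. The trade-off is flexibility: your code has to follow the shape of the custom inference routine's abstract classes, so you have less freedom than with a custom container.

In this project we had to carry over some legacy implementation that was somewhat awkward to express as a custom inference routine, so we decided to use a custom container.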

https://cloud.google.com/vertex-ai/docs/predictions/use-custom-container

https://cloud.google.com/vertex-ai/docs/predictions/custom-prediction-routines

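Implementing a custom container

Custom container requirements

When you use a custom container, the implementation has to meet several requirements, as described in the documentation below.

https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements

The list of requirements looks intimidating at first, but the sample code shows that the implementation turns out to be fairly simple.

The sample code below is annotated with my comments.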

# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from fastapi import FastAPI, Request

import joblib
import json
import numpy as np
import pickle
import os

from google.cloud import storage
from preprocess import MySimpleScaler
from sklearn.datasets import load_iris

app = FastAPI()
gcs_client = storage.Client()

# Download the model artifacts from the GCS path given by AIP_STORAGE_URI
with open("preprocessor.pkl", 'wb') as preprocessor_f, open("model.joblib", 'wb') as model_f:
    gcs_client.download_blob_to_file(
        f"{os.environ['AIP_STORAGE_URI']}/preprocessor.pkl", preprocessor_f
    )
    gcs_client.download_blob_to_file(
        f"{os.environ['AIP_STORAGE_URI']}/model.joblib", model_f
    )

with open("preprocessor.pkl", "rb") as f:
    preprocessor = pickle.load(f)

_class_names = load_iris().target_names
# Load the model
_model = joblib.load("model.joblib")
# Keep a reference to the preprocessor loaded above
_preprocessor = preprocessor

# Health check endpoint; its route is the path given by AIP_HEALTH_ROUTE
@app.get(os.environ['AIP_HEALTH_ROUTE'], status_code=200)
def health():
    return {}

# Inference endpoint; its route is the path given by AIP_PREDICT_ROUTE
@app.post(os.environ['AIP_PREDICT_ROUTE'])
async def predict(request: Request):
    body = await request.json()

    # Extract the features from the request
    instances = body["instances"]
    inputs = np.asarray(instances)
    # Preprocess
    preprocessed_inputs = _preprocessor.preprocess(inputs)
    # Run inference
    outputs = _model.predict(preprocessed_inputs)

    # Return the predictions
    return {"predictions": [_class_names[class_num] for class_num in outputs]}

(Based on https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/custom/SDK_Custom_Container_Prediction.ipynb, with comments added by the author)

The key points of the requirements are as follows.

A route for health checks and a route for inference are required
The paths of these routes are given by AIP_HEALTH_ROUTE and AIP_PREDICT_ROUTE
- Vertex AI sets the AIP_* environment variables automatically
- see: https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables
Model artifacts are uploaded to GCS in advance, and the container loads the model by downloading them from the path given by AIP_STORAGE_URI
The request and the response each have a fixed format
- see: https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#inference
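
As a small illustration of the environment-variable points above, here is a minimal sketch that reads the AIP_* variables in one place and splits a gs:// URI into a bucket name and an object prefix. The names `AipSettings`, `load_aip_settings`, and `split_gcs_uri` are hypothetical, and the fallback routes `/health` and `/predict` are my own assumptions for running the server locally; on Vertex AI the real values come from the platform.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class AipSettings:
    health_route: str
    predict_route: str
    storage_uri: str


def load_aip_settings(environ: Mapping[str, str] = os.environ) -> AipSettings:
    """Read the AIP_* variables, falling back to local-only defaults."""
    return AipSettings(
        # The fallback values are assumptions for local testing only.
        health_route=environ.get("AIP_HEALTH_ROUTE", "/health"),
        predict_route=environ.get("AIP_PREDICT_ROUTE", "/predict"),
        # Drop a trailing slash so that f"{storage_uri}/file" stays well formed.
        storage_uri=environ.get("AIP_STORAGE_URI", "").rstrip("/"),
    )


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/dir' into ('bucket', 'path/to/dir')."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, prefix = uri[len("gs://"):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, prefix.strip("/")
```

Reading the variables through one function like this makes it easier to start the same server on a laptop, where the AIP_* variables are not set, without changing the route definitions.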

Requests and responses for a custom container

To add some detail on the request and response formats, a request looks like

{ "instances": [...], "parameters": {} }

and accepts the following two keys.

instances: required; it is an "array of one or more JSON values of any type", so you can pass arrays of features or IDs of some kind
parameters: an arbitrary JSON object

A response looks like

{ "predictions": [...] }

and returns the predictions that correspond to the instances in the request.
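
The two formats can be sketched with a pair of small helpers. The function names are hypothetical, and the check that there is exactly one prediction per instance is my own reading of "predictions that correspond to the instances", not something enforced by this code on Vertex AI's behalf.

```python
import json
from typing import Any, Dict, List, Optional


def build_predict_request(
    instances: List[Any], parameters: Optional[Dict[str, Any]] = None
) -> str:
    """Serialize a request body with the required 'instances' key."""
    if not instances:
        raise ValueError("instances must contain at least one element")
    body: Dict[str, Any] = {"instances": instances}
    # 'parameters' is optional, so it is only included when provided.
    if parameters is not None:
        body["parameters"] = parameters
    return json.dumps(body)


def build_predict_response(predictions: List[Any], instances: List[Any]) -> Dict[str, Any]:
    """Wrap predictions in the response format, one per instance (assumption)."""
    if len(predictions) != len(instances):
        raise ValueError("expected one prediction per instance")
    return {"predictions": predictions}
```

For example, `build_predict_request([[5.1, 3.5, 1.4, 0.2]])` produces a body with a single instance, and the server side would answer with something like `build_predict_response(["setosa"], [[5.1, 3.5, 1.4, 0.2]])`.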

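The overall flow, starting from model creation

The documentation and the sample code above do cover everything from creating the model to deploying the Endpoint, but I found them hard to follow the first time I read them, so here is an example run in a Jupyter Notebook.

Personally, I think the flow is easier to follow once you have the following points down.

Calling model.deploy() on a Model creates the Endpoint resource
To create a Model, you need to specify both the URI of the model artifacts (such as a .model file) and the image URI of the custom container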

# %%
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# %% [markdown]
Prerequisites:

- The model has already been trained, and files such as the `.model` file are in `MODEL_ARTIFACT_DIR`
- The custom container (Flask or FastAPI code + Dockerfile) is ready under `CONTAINER_SRC_DIR`
- The Artifact Registry repository and the GCS bucket have already been created

# %%
PROJECT_ID = "your-project-id"
LOCATION = "your-location"
CONTAINER_SRC_DIR = "your-container-src-dir" # local directory containing the custom container code
MODEL_ARTIFACT_DIR = "your-model-artifact-dir" # directory containing the model artifacts
GCS_MODEL_ARTIFACT_URI = "gs://your-bucket/your-directory" # GCS path to upload the model artifacts to
REPOSITORY = "your-ar-repo-name" # Artifact Registry repository name
IMAGE = "your-image-name" # custom container image name

# %%
# The model file is assumed to be in this directory
!ls $MODEL_ARTIFACT_DIR
# my-model.model

# %%
# Upload the model file to GCS
!gsutil cp $MODEL_ARTIFACT_DIR/my-model.model $GCS_MODEL_ARTIFACT_URI/my-model.model

# %%
# Build the custom container image
# (--platform linux/amd64 is needed when building on macOS)
!docker build \
  --platform linux/amd64 \
  -t $LOCATION-docker.pkg.dev/$PROJECT_ID/$REPOSITORY/$IMAGE \
  $CONTAINER_SRC_DIR

# %%
# Push the image to Artifact Registry
!docker push $LOCATION-docker.pkg.dev/$PROJECT_ID/$REPOSITORY/$IMAGE

# %%
# Create the Model
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION)

# Create the Model by specifying both the artifact URI and the image URI
model = aiplatform.Model.upload(
  display_name="my-model",
  artifact_uri=GCS_MODEL_ARTIFACT_URI, # note: this is the directory, not the file path
  serving_container_image_uri=f"{LOCATION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{IMAGE}",
)

# %%
# Create the Endpoint
endpoint = model.deploy(machine_type="n1-standard-4")

# %%
# Send a request to the Endpoint
instances = [[1, 2, 3]] # set this to match your model's input
response = endpoint.predict(instances)
response

(Written by the author with reference to https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/custom/SDK_Custom_Container_Prediction.ipynb)
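
The same image URI appears in the build, push, and Model.upload steps above, so it can help to build it in one place. The following is a minimal sketch; the helper name `artifact_registry_image_uri` and the explicit `tag` argument are my own additions (the notebook above omits the tag, which Docker treats as `latest`).

```python
def artifact_registry_image_uri(
    location: str, project_id: str, repository: str, image: str, tag: str = "latest"
) -> str:
    """Build an image URI of the form LOCATION-docker.pkg.dev/PROJECT/REPO/IMAGE:TAG."""
    for name, value in (
        ("location", location),
        ("project_id", project_id),
        ("repository", repository),
        ("image", image),
        ("tag", tag),
    ):
        # An empty component would silently produce a malformed URI.
        if not value:
            raise ValueError(f"{name} must not be empty")
    return f"{location}-docker.pkg.dev/{project_id}/{repository}/{image}:{tag}"
```

Pinning an explicit tag instead of relying on `latest` makes it clearer which image a given Model was created from.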

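When you want multiple routes

https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#inference

The inference route is normally the path given by AIP_PREDICT_ROUTE, but you can expose multiple routes if you need to. With the corresponding setting applied when creating the Model resource, a request sent to /invoke/foo/bar is routed to /foo/bar on the server side.

See the documentation linked above for details.

Note: as the documentation also states, this feature is at the Pre-GA stage as of October 2025.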
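
The path mapping described above can be illustrated with a few lines of plain Python. This is only a sketch of the /invoke/foo/bar to /foo/bar correspondence as described in this article; it is not Vertex AI's implementation, and the function name `to_container_path` is hypothetical.

```python
def to_container_path(request_path: str, prefix: str = "/invoke") -> str:
    """Map an '/invoke/...' request path to the path the container sees."""
    if not request_path.startswith(prefix + "/"):
        raise ValueError(f"path does not start with {prefix}/: {request_path!r}")
    rest = request_path[len(prefix):]
    # '/invoke/' alone would map to the root route, which this sketch rejects.
    if rest == "/":
        raise ValueError("no route given after the prefix")
    return rest
```

In other words, the server inside the container would define its handlers on paths such as /foo/bar, without the /invoke prefix.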

Endpoint types

Vertex AI has the following four types of Endpoint.

public
- shared public endpoint
- dedicated public endpoint
private
- private endpoint
- dedicated private endpoint using Private Service Connect

You should of course choose based on your security requirements, but note that a shared public endpoint has the following limits:

Request/response size of up to 1.5MB
Timeout of 60 seconds

If you exceed either of them, you need to consider a dedicated public endpoint or another type.

https://cloud.google.com/vertex-ai/docs/predictions/choose-endpoint-type
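
A quick way to see whether a request is close to the size limit is to measure the serialized JSON body before sending it. The sketch below does that; the function names are hypothetical, and reading "1.5MB" as 1,500,000 bytes is my own (conservative) assumption, so check the documentation above for the exact figure.

```python
import json
from typing import Any, Dict, List, Optional

# Assumption: "1.5MB" interpreted as 1.5 * 10**6 bytes.
SHARED_PUBLIC_MAX_BYTES = 1_500_000


def request_size_bytes(
    instances: List[Any], parameters: Optional[Dict[str, Any]] = None
) -> int:
    """Return the UTF-8 size of the JSON request body."""
    body: Dict[str, Any] = {"instances": instances}
    if parameters is not None:
        body["parameters"] = parameters
    return len(json.dumps(body).encode("utf-8"))


def fits_shared_public_endpoint(
    instances: List[Any],
    parameters: Optional[Dict[str, Any]] = None,
    limit: int = SHARED_PUBLIC_MAX_BYTES,
) -> bool:
    """True if the serialized request is within the assumed size limit."""
    return request_size_bytes(instances, parameters) <= limit
```

The size measured here depends on how the client serializes the body, so treat the result as an estimate and leave some headroom.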

Reference: implementing a custom inference routine

Sample code for using a custom inference routine is available at the following link. https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/prediction/custom_prediction_routines/SDK_Custom_Predict_and_Handler_SDK_Integration.ipynb

(Across Vertex AI's serving features, the terms "inference" and "prediction" are used inconsistently. The documentation says "custom inference routine", while the code says "custom prediction routine".)

With a custom inference routine you do not need to prepare a Dockerfile. You enable serving by implementing two classes: a Predictor, which performs the inference, and, if needed, a Handler, which processes the request (implementing the Handler is optional).

# %%
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# %%
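PROJECT_ID = "your-project-id"
LOCATION = "your-location"
USER_SRC_DIR = "your-user-src-dir" # directory containing handler.py and predictor.py
REGION = "your-region"
REPOSITORY = "your-ar-repo-name" # Artifact Registry repository name
IMAGE = "your-image-name" # custom container image name
# Placeholders for the names used when uploading the Model below
MODEL_DISPLAY_NAME = "your-model-display-name"
BUCKET_URI = "gs://your-bucket"
MODEL_ARTIFACT_DIR = "your-model-artifact-dir"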

# %%
# Create a `LocalModel` resource by passing in the Handler/Predictor implementations
import os

from google.cloud.aiplatform.prediction import LocalModel
# Replace src_dir_handler_sdk with the directory that contains your handler.py
from src_dir_handler_sdk.handler import CprHandler
# Replace src_dir_handler_sdk with the directory that contains your predictor.py
from src_dir_handler_sdk.predictor import CprPredictor

local_model = LocalModel.build_cpr_model(
    USER_SRC_DIR,
    f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{IMAGE}",
    predictor=CprPredictor,  # Update this to the custom predictor class.
    handler=CprHandler,  # Update this to the custom handler class.
    requirements_path=os.path.join(USER_SRC_DIR, "requirements.txt"),
    base_image="python:3.13", # needed if you want to change the default base image
    platform="linux/amd64", # needed when building on macOS
)

# %%
# Calling `push_image` on the `LocalModel` pushes the image, built with the internally generated Dockerfile, to Artifact Registry
local_model.push_image()

# %%
# Turn the `LocalModel` resource into a `Model` resource, then create the `Endpoint` resource
from google.cloud import aiplatform
aiplatform.init(project=PROJECT_ID, location=LOCATION)
model = aiplatform.Model.upload(
    local_model=local_model,
    display_name=MODEL_DISPLAY_NAME,
    artifact_uri=f"{BUCKET_URI}/{MODEL_ARTIFACT_DIR}",
)
endpoint = model.deploy(machine_type="n1-standard-4")

(Based on https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/prediction/custom_prediction_routines/SDK_Custom_Predict_and_Handler_SDK_Integration.ipynb, with comments added by the author)
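
To give a feel for what a Predictor looks like, here is a minimal sketch of a class with the load / preprocess / predict / postprocess methods. To keep it runnable without the SDK, it is a plain class that mirrors those method names as I understand them from the SDK's Predictor interface, not an actual subclass; the class name `WeightedSumPredictor`, the fixed weights, and the `run_pipeline` helper are all hypothetical.

```python
from typing import Any, Dict, List


class WeightedSumPredictor:
    """Stand-in that mirrors the Predictor method names (not an SDK subclass)."""

    def load(self, artifacts_uri: str) -> None:
        # A real predictor would download and deserialize the model found
        # under artifacts_uri; this sketch uses fixed weights instead.
        self._weights = [1.0, 1.0, 1.0]

    def preprocess(self, prediction_input: Dict[str, Any]) -> List[List[float]]:
        # Pull the instances out of the request body and normalize the types.
        return [[float(x) for x in row] for row in prediction_input["instances"]]

    def predict(self, instances: List[List[float]]) -> List[float]:
        return [sum(w * x for w, x in zip(self._weights, row)) for row in instances]

    def postprocess(self, prediction_results: List[float]) -> Dict[str, Any]:
        # Wrap the results in the response format.
        return {"predictions": prediction_results}


def run_pipeline(predictor: WeightedSumPredictor, request_body: Dict[str, Any]) -> Dict[str, Any]:
    """Chain preprocess -> predict -> postprocess for one request."""
    return predictor.postprocess(predictor.predict(predictor.preprocess(request_body)))
```

Compared with the custom container server shown earlier, the HTTP layer and the routes are gone, and only the model-specific steps remain.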

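Reference: offline inference

This article has not covered offline inference, but you can run it by creating a BatchPredictionJob for a Model resource.

Below is an excerpt of the sample code from the documentation, annotated with my comments.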

from google.cloud import aiplatform

project = "your-project-id"
location = "your-region"

# Initialize
aiplatform.init(project=project, location=location)

# Create a Model instance
my_model = aiplatform.Model("your-model-name")

# Create the BatchPredictionJob
batch_prediction_job = my_model.batch_predict(
    job_display_name=job_display_name,
    gcs_source=gcs_source,
    gcs_destination_prefix=gcs_destination,
    instances_format=instances_format,
    machine_type=machine_type,
    accelerator_count=accelerator_count,
    accelerator_type=accelerator_type,
    starting_replica_count=starting_replica_count,
    max_replica_count=max_replica_count,
    sync=sync,
)

# Wait for the job to finish
batch_prediction_job.wait()

(Based on https://cloud.google.com/vertex-ai/docs/predictions/get-batch-predictions#request_a_batch_inference, with comments added by the author)
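
The excerpt above leaves the input file itself implicit. As a rough sketch, assuming `instances_format` is set to "jsonl" and that each line of the input file holds one instance as JSON (please confirm the exact input format in the documentation above), the file contents could be produced like this. The function name `to_jsonl` is hypothetical.

```python
import json
from typing import Any, Iterable


def to_jsonl(instances: Iterable[Any]) -> str:
    """Serialize instances as JSON Lines: one JSON value per line."""
    return "".join(json.dumps(instance) + "\n" for instance in instances)
```

The resulting text would then be written to a file and uploaded to the GCS path passed as `gcs_source`.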

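Closing

This article showed how to serve a model on Vertex AI with the Model and Endpoint resources when a custom container is required.

Vertex AI's documentation is not yet very thorough and there are not many technical blog posts about it, so it is easy to get stuck on how to implement things. What worked relatively well for me was reading sample code: following the links on pages that reference related samples, or looking through the sample list below for one that matches your use case, makes it much easier to picture the implementation.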

Vertex AI notebook tutorials | Google Cloud Documentation