Building Classification ML Model With Google BigQuery

  • Post last modified:July 16, 2022


Google BigQuery lets you train and run ML models with SQL queries, which bridges the gap between data analysts and data scientists. As a data analyst, you don’t have to learn Python, R, or yet another ML framework or library.

A basic understanding of ML concepts is enough; with SQL alone, data analysts can step into the complex-looking world of machine learning.

BigQuery ML supports various types of ML models, such as:

  • Linear Regression
  • Binary Logistic Regression
  • Multiclass Logistic Regression
  • K-means clustering, and many more.

In this blog, we will build a binary classification model using BigQuery ML to predict whether a customer will file a travel insurance claim.

Kaggle Dataset

We will use the Travel Insurance dataset from Kaggle for this tutorial.
Download the Travel Insurance Dataset.

Load Data Into BigQuery

There are many ways to upload data into BigQuery, but to keep this tutorial simple I will use the Cloud Console from Google Cloud to load the data into a BigQuery table named “travel_insurance”. Enable the Auto detect checkbox so that you don’t have to define a schema for the table.
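To confirm that the load worked, a quick query like the one below can help. This is only a sketch: the dataset name `mydataset` is a placeholder, and it assumes the label column was loaded as `claim`, which is what the predicted_claim column in the prediction results suggests. Adjust both to match your own table.

```sql
-- Count rows per class to verify the load and see how balanced the labels are.
-- `mydataset` is a placeholder dataset name; `claim` is an assumed column name.
SELECT
  claim,
  COUNT(*) AS row_count
FROM `mydataset.travel_insurance`
GROUP BY claim
ORDER BY row_count DESC;
```

If one class is much rarer than the other, keep that in mind when reading the accuracy figure later, since a model can score high accuracy simply by predicting the majority class.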

Creating Logistic Regression Model

After loading the data into the table successfully, we are ready to create our first binary logistic regression classification model. The syntax is pretty simple and self-explanatory.
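A minimal sketch of such a statement is shown below. The dataset name `mydataset` and the model name `travel_insurance_model` are hypothetical, and `claim` is assumed to be the label column; the exact query used for this post may have differed.

```sql
-- Train a logistic regression classifier on the uploaded table.
-- `mydataset` and `travel_insurance_model` are placeholder names.
-- `claim` is assumed to be the label column; every other column
-- returned by the SELECT is used as an input feature.
CREATE OR REPLACE MODEL `mydataset.travel_insurance_model`
OPTIONS (
  model_type = 'LOGISTIC_REG',
  input_label_cols = ['claim']
) AS
SELECT *
FROM `mydataset.travel_insurance`;
```

Selecting explicit columns instead of `SELECT *` is a reasonable alternative when some columns should not be used as features.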

Model Evaluation

Once the model is created, we will evaluate it to judge whether it is accurate and precise enough to make predictions on our input data.

The evaluation query returns the following logistic regression metrics as columns:
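Evaluation in BigQuery ML is done with the ML.EVALUATE function. The sketch below reuses the hypothetical `mydataset.travel_insurance_model` name and, for demo purposes only, evaluates against the same table used for training.

```sql
-- Evaluate the trained model; the result is a single row of metrics.
-- Model and dataset names are placeholders, and the input table is the
-- training table itself, which is acceptable only for a demo.
SELECT *
FROM ML.EVALUATE(
  MODEL `mydataset.travel_insurance_model`,
  (SELECT * FROM `mydataset.travel_insurance`)
);
```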

  • precision
  • recall
  • accuracy
  • f1_score
  • log_loss
  • roc_auc

Model Prediction

Once we are happy with the model evaluation results, we can run our test data against the model to classify customers based on whether they will file a claim or not. Here we have reused the same input data that we used for training, just for demo purposes; in practice, separate test data should be run against the trained model.
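Prediction uses the ML.PREDICT function. As before, this is a sketch with the placeholder names `mydataset` and `travel_insurance_model`; swap in a separate table of unseen customers where one is available.

```sql
-- Score every row of the input table with the trained model.
-- The output contains all input columns plus the prediction columns.
-- Model and dataset names are placeholders.
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.travel_insurance_model`,
  (SELECT * FROM `mydataset.travel_insurance`)
);
```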

The prediction query adds the predicted_claim, predicted_claim_probs.label, and predicted_claim_probs.prob columns to the result table. These columns give the predicted class and the probability of a customer filing a claim.
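Because predicted_claim_probs is a repeated field with one label and prob pair per class, it can be handy to flatten it. The sketch below (again with placeholder dataset and model names) pulls out the probability that belongs to the predicted class, so it does not depend on how the label values are stored.

```sql
-- For each row, keep the predicted class and the probability assigned to it.
-- UNNEST expands the repeated predicted_claim_probs field into (label, prob) rows;
-- the subquery picks the entry whose label matches the prediction.
SELECT
  predicted_claim,
  (
    SELECT p.prob
    FROM UNNEST(predicted_claim_probs) AS p
    WHERE p.label = predicted_claim
  ) AS predicted_claim_prob
FROM ML.PREDICT(
  MODEL `mydataset.travel_insurance_model`,
  (SELECT * FROM `mydataset.travel_insurance`)
);
```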


BigQuery ML is narrowing the gap between data analysts and data scientists. In my opinion, it’s a great effort from BigQuery to put the power of machine learning in the hands of data analysts, who understand the data much better but, lacking ML knowledge, cannot apply machine learning principles to it and have to depend on data scientists.

Let me know what you think about BigQuery ML! ✌️
Happy Analyzing!
