Building Classification ML Model With Google BigQuery

Post last modified: July 16, 2022

Background

Google BigQuery supports creating and running ML models with SQL queries, which bridges the gap between data analysts and data scientists. As a data analyst, you don’t have to learn Python, R, or yet another popular ML framework or library.

A basic understanding of ML is enough: with the help of SQL, data analysts can enter the complex-looking, fancy world of machine learning.

BigQuery ML supports various types of ML models, such as:

Linear Regression
Binary Logistic Regression
Multiclass Logistic Regression
K-means clustering and many more.

In this blog, we will build a binary classification model using BigQuery ML to predict whether or not a customer will file a travel insurance claim.

Kaggle Dataset

We will use the Travel Insurance dataset from Kaggle for this tutorial.
Download the Travel Insurance Dataset.

Load Data Into BigQuery

We can load data into BigQuery in many ways, but to keep this tutorial simple I will use the Google Cloud console to load the data into a BigQuery table named “travel_insurance”. Enable the Auto detect checkbox so that you don’t have to define a schema for the table.
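If you prefer SQL over the console, the same load can be sketched with a LOAD DATA statement. This is only a sketch under assumptions: the dataset name mydataset and the Cloud Storage path are hypothetical placeholders, and it assumes the CSV file has already been copied to a bucket you control.

```sql
-- Sketch: load the CSV from Cloud Storage into the travel_insurance table.
-- `mydataset` and the gs:// URI are hypothetical placeholders; use your own.
-- No column list is given, so BigQuery detects the schema from the file.
LOAD DATA INTO mydataset.travel_insurance
FROM FILES (
  format = 'CSV',
  uris = ['gs://my-bucket/travel_insurance.csv']
);
```

Either route should leave you with the same table, so the rest of the tutorial does not depend on which one you pick.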

Creating Logistic Regression Model

After loading the data into the table successfully, we are ready to create our first binary logistic regression classification model. The syntax is pretty simple and self-explanatory.

https://medium.com/media/50c6eef1c286c0fe9f2fc62bbf0b7dd4
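As a minimal sketch, a CREATE MODEL statement for this table could look like the following. It is not necessarily the exact query used in this post: the dataset name mydataset and the model name travel_insurance_model are hypothetical, and the label column is assumed to be claim, which matches the predicted_claim column that shows up in the prediction results.

```sql
-- Sketch: train a binary logistic regression model on the loaded table.
-- `mydataset` and `travel_insurance_model` are hypothetical names.
CREATE OR REPLACE MODEL `mydataset.travel_insurance_model`
OPTIONS (
  model_type = 'LOGISTIC_REG',   -- logistic regression classifier
  input_label_cols = ['claim']   -- assumed label column: was a claim filed?
) AS
SELECT *
FROM `mydataset.travel_insurance`;
```

Every selected column other than the label is treated as a feature, so listing columns explicitly instead of using SELECT * is a simple way to leave some of them out.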

Model Evaluation

Once the model is created, we evaluate it to judge whether it is accurate and precise enough to make predictions on our data.

https://medium.com/media/62e3dc154dd05c21420d296b17c2bebf
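A hedged sketch of such an evaluation query, using ML.EVALUATE, might look like this. The dataset and model names are the same hypothetical ones as before, and the query evaluates against the training table purely for demo purposes.

```sql
-- Sketch: compute evaluation metrics for the trained model.
-- `mydataset` and `travel_insurance_model` are hypothetical names.
SELECT *
FROM ML.EVALUATE(
  MODEL `mydataset.travel_insurance_model`,
  (SELECT * FROM `mydataset.travel_insurance`)  -- rows to evaluate on
);
```

The result is a single row with one column per metric.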

Running the evaluation query returns the following logistic regression metrics as columns:

precision
recall
accuracy
f1_score
log_loss
roc_auc

Model Prediction

Once we are happy with the model evaluation results, we can run our test data against the model to classify customers based on whether they will file a claim or not. For demo purposes we have used the same input data that we used for training, but in reality separate test data should be run against the trained model.

https://medium.com/media/d5a250b3d332db8c2b88bfd3221c93c6
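A prediction query along these lines can be sketched with ML.PREDICT. As before, the dataset and model names are hypothetical, and the input is the training table only because this is a demo.

```sql
-- Sketch: classify every row of the input table with the trained model.
-- `mydataset` and `travel_insurance_model` are hypothetical names.
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.travel_insurance_model`,
  (SELECT * FROM `mydataset.travel_insurance`)  -- swap in real test data here
);
```

The output keeps the input columns and adds the prediction columns next to them.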

The prediction query adds the predicted_claim, predicted_claim_probs.label and predicted_claim_probs.prob columns to the result table. predicted_claim holds the predicted class, while the predicted_claim_probs entries give the probability of each label, that is, how likely the customer is to file a claim.
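Because predicted_claim_probs is a repeated field with one entry per label, it can be easier to read once flattened. The following sketch, again with hypothetical dataset and model names, uses UNNEST to return one row per customer and label together with its probability.

```sql
-- Sketch: flatten the per-label probabilities into separate rows.
-- `mydataset` and `travel_insurance_model` are hypothetical names.
SELECT
  pred.predicted_claim,      -- class chosen by the model
  p.label AS label,          -- one of the possible claim values
  p.prob  AS probability     -- model probability for that value
FROM ML.PREDICT(
  MODEL `mydataset.travel_insurance_model`,
  (SELECT * FROM `mydataset.travel_insurance`)
) AS pred,
UNNEST(pred.predicted_claim_probs) AS p
ORDER BY probability DESC;
```

For a binary model this yields two rows per customer, one for each claim value.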

Conclusion

BigQuery ML is narrowing the gap between data analysts and data scientists. In my opinion, it’s a great effort from BigQuery to put the power of machine learning in the hands of data analysts, who understand the data much better but, for lack of ML knowledge, cannot apply machine learning principles to it and have to depend on data scientists.

Let me know what you think about BigQuery ML! ✌️
Happy Analyzing!

Tags: BigQuery, BigQuery ML, Classification Model, GCP


