Anand Jain

35 Followers

Published in Towards Dev · Mar 22

How to load and query invalid JSON files in BigQuery

In collaboration with Anisha Wadawadigi (Google). Overview: BigQuery natively supports newline-delimited JSON, but only if the JSON files are properly formatted. Some common issues that arise when JSON files are ingested from various source systems are: Column names do not conform to the…

BigQuery

3 min read



Published in Towards Dev · May 6, 2022

Back of envelope calculation of your BigQuery cost

Overview: It is sometimes important to estimate how much running BigQuery will cost and how to optimize that cost. Google provides tools to estimate how many slots you would need and to manage cost once you have some workloads (data and processing, i.e., queries, reports…

BigQuery

2 min read



Published in Towards Dev · Mar 14, 2022

How to find duplicate objects in our Google Cloud Storage (GCS) buckets

Problem Statement: A little arithmetic shows that storing 1 PB of data in a US multi-region Standard bucket costs about $26,000 per month. Google Cloud Storage provides eleven 9's of durability. To reduce Google Cloud Storage cost, it is important to store only the data we need. When…

Google Cloud Platform

2 min read
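The $26,000 figure can be reproduced with a quick calculation. This sketch assumes a list price of $0.026 per GB per month for Standard storage in the US multi-region (check the current Cloud Storage pricing page) and treats 1 PB as 1,000,000 GB, which is what the $26,000 figure implies. The 10% duplicate fraction is purely hypothetical and only illustrates why finding duplicates matters.

```python
# Back-of-envelope GCS storage cost. Assumptions (not from the article):
# - US multi-region Standard storage at $0.026 per GB per month;
#   verify against current Cloud Storage pricing.
# - 1 PB treated as 1,000,000 GB, which the $26,000 figure implies.
PRICE_PER_GB_MONTH_USD = 0.026
GB_PER_PB = 1_000_000


def monthly_storage_cost(petabytes: float, price_per_gb: float = PRICE_PER_GB_MONTH_USD) -> float:
    """Approximate monthly storage cost in USD for the given volume."""
    return petabytes * GB_PER_PB * price_per_gb


if __name__ == "__main__":
    cost = monthly_storage_cost(1.0)
    # Hypothetical: suppose 10% of the stored bytes are duplicate objects.
    duplicate_fraction = 0.10
    print(f"Storing 1 PB: ~${cost:,.0f}/month")
    print(f"Removing {duplicate_fraction:.0%} duplicates saves ~${cost * duplicate_fraction:,.0f}/month")
```

Running it prints about $26,000 per month for 1 PB and about $2,600 per month of savings under the hypothetical 10% duplicate rate.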



Published in Towards Dev · Jan 14, 2022

How to query the latest partition data in BigQuery

Partitioning is one of the most important features to consider for BigQuery performance. BigQuery supports several partitioning schemes, including: time-unit (date or timestamp) columns; ingestion time; integer-range columns. Sometimes we need to find out how many records are in the latest partition. In this article we will see how…

BigQuery

2 min read
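One way to think about "the latest partition" is sketched below; it is not necessarily the approach the article takes. It assumes you already have (partition_id, total_rows) pairs, for example rows read from the table's INFORMATION_SCHEMA.PARTITIONS view, for a time-unit or ingestion-time partitioned table. The special IDs `__NULL__` and `__UNPARTITIONED__` are skipped, and time-based IDs of a single granularity (such as `20220114`) sort chronologically as strings. The function name and sample data are hypothetical, and the sketch does not handle integer-range partitioning.

```python
# Hypothetical sketch: choose the most recent partition of a time-unit or
# ingestion-time partitioned table from (partition_id, total_rows) pairs.
from typing import Iterable, Optional, Tuple

# Partition IDs that do not correspond to a point in time.
SPECIAL_PARTITIONS = {"__NULL__", "__UNPARTITIONED__"}


def latest_partition(rows: Iterable[Tuple[str, int]]) -> Optional[Tuple[str, int]]:
    """Return (partition_id, total_rows) for the most recent partition.

    Assumes IDs like "20220114" (daily) or "2022011410" (hourly), all at the
    same granularity, so string order matches chronological order.
    """
    candidates = [(pid, n) for pid, n in rows if pid not in SPECIAL_PARTITIONS]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row[0])


if __name__ == "__main__":
    sample = [  # hypothetical data
        ("20220112", 1200),
        ("20220114", 950),
        ("20220113", 1100),
        ("__NULL__", 3),
    ]
    print(latest_partition(sample))  # ('20220114', 950)
```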



Published in Towards Dev · Dec 7, 2021

How to “label” BigQuery stored procedure calls for monitoring and troubleshooting?

Overview: As many organizations use BigQuery as their modern data warehouse, it is very important for any Enterprise Data Warehouse (EDW) system to support batch jobs with strict Service Level Agreements (SLAs) for batch windows. …

BigQuery

2 min read



Published in Towards Dev · Mar 10, 2021

How to use labels with Google Cloud Platform Dataflow workers

Resource labels are used in Google Cloud Platform to group resources. These labels are passed along to Google Cloud billing, which is very helpful if you want to allocate cost across business units or environments (e.g., Development, Test, Production). The official documentation states that you can label the…

Google Cloud Platform

2 min read



Jun 12, 2020

Heuristics for choosing Google Cloud persistence layer services

Are you confused by the multitude of Google Cloud persistence layer options available to you when designing or migrating your applications to Google Cloud? …

Google Cloud Platform

3 min read



Strategic Cloud Engineer — Google
