Nick’s Data Blog | Engineering and Data

What actually happens when you COPY in Postgres?

Someone recently asked me why the COPY command is more performant than INSERT INTO. While working out an answer, I realized I was starting from a deficit: I didn’t know how COPY works under the hood, so anything I said was at best a guess. Through this post, I hope to narrow that knowledge gap and help myself and others get a deeper understanding of my favorite database.

12 min read

Modifying Nginx settings on ElasticBeanStalk with Docker

We run our stack on ElasticBeanStalk and have potentially large payloads. The default maximum payload size for nginx is 1MB, which was too small for us. Here’s how to update that in ElasticBeanStalk if you run a Dockerfile.

1 min read

Pixel Art Challenge

This one was pretty fun. During a team social, we were asked to draw a picture in Google Sheets given a color palette: 1 was red, 2 was blue, 3 was black, etc.

1 min read

Connecting Databricks to Redshift with SparklyR

Databricks provides documentation for hooking up Spark with Redshift using the raw Spark libraries, but not with SparklyR, which gives you some great functions (notably dplyr syntax). This post shows how to connect the two.

1 min read

Airflow & Kubernetes

I’ve seen a lot of people confused about the difference between the KubernetesExecutor and the KubernetesPodOperator. They are similarly named and both use Kubernetes Pods, yet they are very different in how they run, so the goal of this post is to lay out the differences and help you decide which to use.

5 min read

Generating a non-self signed SSL Certificate for Kubernetes

We had an issue with our Kubernetes cluster running Astronomer: the cluster had an SSL certificate, but it had a 90-day expiration. The configuration was managed through Terraform, but due to some version skew, as well as complicated dependency trees we didn’t want to address, we needed to generate a valid SSL certificate that day and update a secret that containers in EKS were using. Typically you should automate this process somehow (an Airflow job or some similarly scheduled task), but if you need the manual version, here’s how we did it.

1 min read

Turning csv into list for SQL

This is something I’ve been doing for a long time: if you have a CSV of values and don’t want to write a Python script to read in the results and execute SQL, you can use Excel/GoogleSheets/{your favorite spreadsheet program} to turn it into a comma-separated list.

~1 min read

Killing long running redshift connections

We recently ran into an issue where we needed to kill database connections to Redshift. To do so, we wrote the following script. Nothing fancy, but useful.

~1 min read