Analytical Functions

I mentioned recently that you sometimes see people programming in sql. Well, one entry to that is the analytical function. The syntax always escapes me, and the idea of them are always a little bit foggy. So now that I had an opportunity to use it I thought I’d jot it down here for future reference to myself.
You can read all about analytical (or window) functions elsewhere I know. But this is for me! A window function allows you to compute some value over a running window of your data, say a running average or running sum based off of some value. I’ve used that to have a running monthly average of some expense over a year for my expense data (where you’d partition over a year for each transaction and order the window by transaction_date). For example:
SELECT
amount,
avg(amount) OVER (PARTITION BY EXTRACT(YEAR FROM transaction_date) ORDER BY transaction_date) as running_monthly_average
FROM transactions
WHERE category = "food"
ORDER BY transaction_date DESC
But today I had a different usecase: I wanted to find duplicate entries in my data! And I’m glad I did, turned out that I had quite a few. This led me into the ranking functions, specifically LEAD() and LAG() which does a pretty simple thing really. They show the next or previous record respectively, for the column you select. This allows some pretty funky stuff:
SELECT
transaction_date,
LEAD(transaction_date) OVER (ORDER BY transaction_date DESC) AS next_date,
FROM transactions
ORDER BY transaction_date DESC
will simply give you the date for each row, and the date for the next row. Since I knew that duplicate entries would specifically have the same title, date, amount (and potentially category) I could extract all of those into their own CTE and then just select the sequential rows that have identical titles, dates and names.
WITH
dupl AS (
SELECT
transaction_date AS date,
LEAD(transaction_date) OVER (ORDER BY transaction_date DESC) AS next_date,
title,
LEAD(title) OVER (ORDER BY transaction_date DESC) AS next_title,
amount,
LEAD(amount) OVER (ORDER BY transaction_date DESC) AS next_amount,
category,
LEAD(category) OVER (ORDER BY transaction_date DESC) AS next_cat,
FROM transactions
ORDER BY transaction_date DESC, amount, title
)
SELECT date, title, amount, category
FROM dupl
WHERE
next_date = date
AND next_title = title
AND next_amount = amount
ORDER BY amount DESC, date DESC
Et Presto! Only the entries that have been duplicated show up! Pretty sweet when you actually have a usecase for a sql function. I’m coming for you next =VLOOKUP()…