Visualizing Cable Outages using Power Query

Bryce Canyon Climb

I’ve long been a fan of PowerQuery (PQ), even more so since Microsoft finally made it available on Mac. If you’re unfamiliar, here’s a worked example of how to use it for a, sadly, every day problem.

Since moving house this summer we’ve gone from a rock solid fiber Internet connection to a less than glorious Xfinity cable connection that suffers an outage or two every day (as my team can attest to with frozen Teams calls). It’s infuriating.

Getting customer service from any cable company is always a challenge (‘have you rebooted your router?’) so I resolved to collect some outage data that will perhaps nudge Xfinity to take me seriously.

The first thing to do is collect good quality data. I installed the netcheck package on my hardwired Raspberry PI  and started to collect data. The software logs outages like this: 

-----------------------------------------------------------------------------
LINK DOWN:                                       Wed 20 Sep 2023 10:29:17 CDT
LINK RECONNECTED:                                Wed 20 Sep 2023 10:30:51 CDT
TOTAL DOWNTIME:                                  1 minutes and 34 seconds.
RECONNECTED LINK SPEED:                         
-----------------------------------------------------------------------------
                                                 Ping: 22.03 ms
                                                 Download: 26.02 Mbit/s
                                                 Upload: 39.22 Mbit/s
-----------------------------------------------------------------------------

The key data we need is those LINK DOWN and RECONNECTED lines. Those are easily extracted with a Grep statement:

grep -P "LINK (D|R)" connection.log > temp.log

That gives us a log file like this:

LINK DOWN:                                       Tue 19 Sep 2023 07:39:40 CDT
LINK RECONNECTED:                                Tue 19 Sep 2023 07:42:26 CDT

My goal is to get to a three column file, outage start, outage end, duration (in seconds), so I can chart the data, perhaps using a Pivot Chart: Pivot Chart

To manipulate the file using PowerQuery, first open Excel and select the Get Data (Power Query) option under the Data Tab (my screenshots are Mac but they are almost identical for PC).

Get data

Select the Text/CSV option, choose the file and you’re presented with a preview of the data:

Preview

Click the transform button to launch the PowerQuery Editor. What I love about this environment is that if you prefer to edit your data visually, you can, yet the editing steps are stored as code you can directly edit if you wish.

A bit like Excel, each line of code (in this case, loading a document) is shown in the function bar with the resulting data shown below:

Transform

Notice all the columns are imported as text (the ‘ABC’ icon in the column header. The first thing to do convert the time to a timestamp. Click that column, click Transform and change the type to time:

Change data type

Some of the columns we don’t need: the timezone, day and ‘LINK’ columns. Click each column header and select remove column. As you transform the data, you’ll see each step being recorded on the right hand side:

Applied Steps

This is one of the things I love about PQ. Click on each step and you can see the exact changes you’ve made to your data.

The next thing to do is to recombine those three date columns into a new column with the date type. Click ‘Custom Column’ on the ‘Add Column’ tab. A quick search of the PQ website shows me the function to convert a string to a date:

Custom column

Next, I combine the date and time columns with a new custom column called StartTime. The formula is as simple as  [Timestamp] & [Column7]

I can now remove the columns I don’t need. Unlike Excel, PQ actually copies the data when you create a new column, so removing superfluous columns won’t ‘break’ your formula. It’s always good practice to remove unneeded columns to reduce storage and keep things tidy, leaving me with two columns.

Two columns left

To chart this data, I need to combine each pair of rows so I have a start and end time. There’s multiple ways of doing this. An easy way is to use Fill Up to copy data from a subsequent row. To do that, we need a new column that will become the end time. I use a conditional column here to only copy the end time if it truly is:

Add conditional column

This results in a half filled End Time column:

but with nulls

Next, I use a ‘Fill Up’ (under the Transform tab, Fill button) command to replace the ’null’ values with the value from the row below:

replaced with fill up

Only two steps remain. We need to remove the ‘RECONNECTED’ rows as we don’t need them any more and then calculate the duration of the outage.

There are multiple ways to achieve the first goal, an easy way is ‘Remove Alternate Rows’ under the ‘Remove Rows’ button

Remove alternates

Finally we add a custom column for duration with the following formula to count seconds: Duration.TotalSeconds(Duration.From([EndTime]-[StartTime]))

After removing the Status column we end up with three columns. Click ‘Close and Load’ to insert them into Excel:

Data loaded to Excel

One thing is missing.. we don’t have rows for days when there were no outages. That’s a post for another day… and with the number of outages there’s not many needed anyway!

Here’s all the steps. While it might seem a lot, each is a very simple transformation that we can easily understand and change or delete if we make a mistake:

Full steps

Hopefully this has inspired you to give PQ a try. yourself. Let me know how you get on.