I’m using node_exporter to generate host metrics for several of the nodes in my lab. Today I was reworking one of my thermal graphs, with the goal of getting good historical temperatures for my Pis and my Ubuntu-based homebuilt NAS into a single readable graph. node_exporter has two relevant time series:
- node_thermal_zone_temp, which was exported by all of the Raspberry Pis
- node_hwmon_temp_celsius, which was exported by the NAS and the Raspberry Pi 4s. The Raspberry Pi 3 did not export this metric.
I liked node_hwmon_temp_celsius a lot, and opted to spend some time getting it to fit as well as I could. It’s an [instant vector][instant_vector], and it returned the following with my config:
node_hwmon_temp_celsius{chip="0000:00:01_1_0000:01:00_0", class="nas server", environment="storage", hostname="20-size", instance="10.0.1.217:9100", job="node-exporter", sensor="temp1"} 29.85
node_hwmon_temp_celsius{chip="0000:00:01_1_0000:01:00_0", class="nas server", environment="storage", hostname="20-size", instance="10.0.1.217:9100", job="node-exporter", sensor="temp2"} 29.85
node_hwmon_temp_celsius{chip="0000:00:01_1_0000:01:00_0", class="nas server", environment="storage", hostname="20-size", instance="10.0.1.217:9100", job="node-exporter", sensor="temp3"} 32.85
node_hwmon_temp_celsius{chip="0000:20:00_0_0000:21:00_0", class="nas server", environment="storage", hostname="20-size", instance="10.0.1.217:9100", job="node-exporter", sensor="temp1"} 52.85
node_hwmon_temp_celsius{chip="0000:20:00_0_0000:21:00_0", class="nas server", environment="storage", hostname="20-size", instance="10.0.1.217:9100", job="node-exporter", sensor="temp2"} 52.85
node_hwmon_temp_celsius{chip="0000:20:00_0_0000:21:00_0", class="nas server", environment="storage", hostname="20-size", instance="10.0.1.217:9100", job="node-exporter", sensor="temp3"} 58.85
node_hwmon_temp_celsius{chip="pci0000:00_0000:00:18_3", class="nas server", environment="storage", hostname="20-size", instance="10.0.1.217:9100", job="node-exporter", sensor="temp1"} 37.75
node_hwmon_temp_celsius{chip="pci0000:00_0000:00:18_3", class="nas server", environment="storage", hostname="20-size", instance="10.0.1.217:9100", job="node-exporter", sensor="temp2"} 37.75
node_hwmon_temp_celsius{chip="pci0000:00_0000:00:18_3", class="nas server", environment="storage", hostname="20-size", instance="10.0.1.217:9100", job="node-exporter", sensor="temp3"} 27
node_hwmon_temp_celsius{chip="thermal_thermal_zone0", class="raspberry pi", environment="cluster", hostname="cluster1", instance="10.0.1.201:9100", job="node-exporter", sensor="temp0"} 37.485
node_hwmon_temp_celsius{chip="thermal_thermal_zone0", class="raspberry pi", environment="cluster", hostname="cluster1", instance="10.0.1.201:9100", job="node-exporter", sensor="temp1"} 37.972
node_hwmon_temp_celsius{chip="thermal_thermal_zone0", class="raspberry pi", environment="cluster", hostname="cluster2", instance="10.0.1.252:9100", job="node-exporter", sensor="temp0"} 32.128
node_hwmon_temp_celsius{chip="thermal_thermal_zone0", class="raspberry pi", environment="cluster", hostname="cluster2", instance="10.0.1.252:9100", job="node-exporter", sensor="temp1"} 32.128
The class, environment, and hostname labels are added when the targets are scraped.
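An instant vector is just a set of series, where each series is a label set plus a single sample value. To make that concrete, here’s a minimal Python sketch that parses output lines like the ones above into (name, labels, value) tuples. The parse_sample helper is hypothetical, and its regexes are simplified. They assume label values contain no escaped quotes, which holds for the output above but not for arbitrary Prometheus output.

```python
import re
from typing import Dict, Tuple

# Optional metric name, a {label="value", ...} block, whitespace, then the value.
# Simplified: assumes label values contain no escaped quotes or closing braces.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)?\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
# One label="value" pair; values may contain spaces, colons, dots, etc.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line: str) -> Tuple[str, Dict[str, str], float]:
    """Hypothetical helper: split one sample line into name, labels, value."""
    m = SAMPLE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(m.group("labels")))
    # Query results (like the joined output later on) have no metric name.
    return m.group("name") or "", labels, float(m.group("value"))
```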
The chip label looked interesting, but it appears to be an identifier rather than a name, and I’m terrible at mentally mapping hard-to-read identifiers to something meaningful. Digging around a little more, I found node_hwmon_chip_names, which returned the following when queried:
node_hwmon_chip_names{chip="0000:00:01_1_0000:01:00_0", chip_name="nvme", class="nas server", environment="storage", hostname="20-size", instance="10.0.1.217:9100", job="node-exporter"} 1
node_hwmon_chip_names{chip="0000:20:00_0_0000:21:00_0", chip_name="nvme", class="nas server", environment="storage", hostname="20-size", instance="10.0.1.217:9100", job="node-exporter"} 1
node_hwmon_chip_names{chip="pci0000:00_0000:00:18_3", chip_name="k10temp", class="nas server", environment="storage", hostname="20-size", instance="10.0.1.217:9100", job="node-exporter"} 1
node_hwmon_chip_names{chip="platform_rpi_poe_fan_0", chip_name="rpipoefan", class="raspberry pi", environment="cluster", hostname="cluster0", instance="10.0.1.42:9100", job="node-exporter"} 1
node_hwmon_chip_names{chip="platform_rpi_poe_fan_0", chip_name="rpipoefan", class="raspberry pi", environment="cluster", hostname="cluster1", instance="10.0.1.201:9100", job="node-exporter"} 1
node_hwmon_chip_names{chip="platform_rpi_poe_fan_0", chip_name="rpipoefan", class="raspberry pi", environment="cluster", hostname="cluster2", instance="10.0.1.252:9100", job="node-exporter"} 1
node_hwmon_chip_names{chip="power_supply_hidpp_battery_0", chip_name="hidpp_battery_0", class="nas server", environment="storage", hostname="20-size", instance="10.0.1.217:9100", job="node-exporter"} 1
node_hwmon_chip_names{chip="soc:firmware_raspberrypi_hwmon", chip_name="rpi_volt", class="raspberry pi", environment="cluster", hostname="cluster0", instance="10.0.1.42:9100", job="node-exporter"} 1
node_hwmon_chip_names{chip="soc:firmware_raspberrypi_hwmon", chip_name="rpi_volt", class="raspberry pi", environment="cluster", hostname="cluster1", instance="10.0.1.201:9100", job="node-exporter"} 1
node_hwmon_chip_names{chip="soc:firmware_raspberrypi_hwmon", chip_name="rpi_volt", class="raspberry pi", environment="cluster", hostname="cluster2", instance="10.0.1.252:9100", job="node-exporter"} 1
node_hwmon_chip_names{chip="thermal_thermal_zone0", chip_name="cpu_thermal", class="raspberry pi", environment="cluster", hostname="cluster1", instance="10.0.1.201:9100", job="node-exporter"} 1
node_hwmon_chip_names{chip="thermal_thermal_zone0", chip_name="cpu_thermal", class="raspberry pi", environment="cluster", hostname="cluster2", instance="10.0.1.252:9100", job="node-exporter"} 1
You might notice that the chip label appears in both vectors, which made me think I could cross-reference one against the other. This turned out to be way more hacky than I expected.
Prometheus only supports joining labels from one vector onto another through the group_left and group_right modifiers, which are very poorly documented. Fortunately, I came across these two posts by Brian Brazil, which got me started. This answer on Stack Overflow helped me get the rest of the way there.
I’ll start with my working query and work backwards.
avg (node_hwmon_temp_celsius) by (chip,hostname,instance,class,job) * ignoring(chip_name) group_left(chip_name) avg (node_hwmon_chip_names) by (chip,chip_name,hostname,instance,class,job)
We’ll break the query above into three parts: the left side, the right side, and the operator that joins them:
avg (node_hwmon_temp_celsius) by (chip,hostname,instance,class,job)
avg (node_hwmon_chip_names) by (chip,chip_name,hostname,instance,class,job)
* ignoring(chip_name) group_left(chip_name)
Let’s go through each.
The left side groups the series by chip and the host-identifying labels (hostname, instance, class, job), then averages each group. As the output above showed, some chips have multiple series that differ only in their sensor label (temp1, temp2, and so on). I don’t really care about those individually, so I averaged them. Averaging a group that contains a single series just returns that series’ value, so this works for every chip.
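To see why this works, here’s a rough Python model of avg(...) by (...) over a list of (labels, value) series. It’s an illustration, not Prometheus’s implementation, and the avg_by name is hypothetical. A label named in by() that a series doesn’t carry is treated as empty and left off the output, which mirrors how Prometheus treats empty labels.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Labels = Dict[str, str]
Series = Tuple[Labels, float]


def avg_by(vector: List[Series], by: Sequence[str]) -> List[Series]:
    """Rough model of PromQL `avg(...) by (...)` over an instant vector."""
    groups: Dict[Tuple[Tuple[str, str], ...], List[float]] = defaultdict(list)
    for labels, value in vector:
        # Keep only the `by` labels. A label the series doesn't carry is treated
        # as empty and left out of the group key (and so out of the output).
        key = tuple(sorted((name, labels[name]) for name in by if labels.get(name, "")))
        groups[key].append(value)
    # One output series per group, labeled only with the grouping labels.
    return [(dict(key), sum(values) / len(values)) for key, values in groups.items()]
```

For example, feeding it the three nvme samples from the NAS (29.85, 29.85, 32.85) grouped by chip and hostname yields a single series of roughly 30.85 that no longer has a sensor label.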
The right side returns one series per chip, labeled with both its chip and its chip_name, plus the other requisite labels. Every one of these series has the value 1, effectively saying “this chip exists.”
The operator is where it gets both interesting and hacky.
- The * (multiplication) operator does the vector match. Because the right-side value is always 1, it’s safe to multiply my left-side values by it without changing them.
- The ignoring() keyword lists labels to be ignored when looking for identical label sets. In this case I used ignoring(chip_name), because that label only exists on the right side, so the two sides could never match with it included.
- The grouping modifiers (group_left() and group_right()) allow many-to-one or one-to-many matching. They also carry labels across: group_left() copies any labels listed from the right-hand (“one”) side into the result. Since I used group_left(chip_name), chip_name shows up in the list of labels after matching.
Here’s what makes this hacky: as far as I can tell, this is the only way to take matching labels and use them in reference to one another.
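Here’s a rough Python model of what * ignoring(chip_name) group_left(chip_name) does with the two sides. It’s a simplified sketch under my own assumptions, not Prometheus’s actual code, and mul_group_left is a hypothetical name. The sketch makes four simplifications:
- Series match on every label except __name__ and the ignored ones.
- The right (“one”) side may have at most one series per match group.
- The labels listed in group_left are copied from the right side.
- Left series without a partner are dropped.

```python
from typing import Dict, List, Sequence, Tuple

Labels = Dict[str, str]
Series = Tuple[Labels, float]
Key = Tuple[Tuple[str, str], ...]


def _match_key(labels: Labels, ignoring: Sequence[str]) -> Key:
    # Matching signature: every label except the ignored ones and __name__.
    skip = set(ignoring) | {"__name__"}
    return tuple(sorted((k, v) for k, v in labels.items() if k not in skip))


def mul_group_left(left: List[Series], right: List[Series],
                   ignoring: Sequence[str], include: Sequence[str]) -> List[Series]:
    """Rough model of `left * ignoring(...) group_left(...) right`."""
    one_side: Dict[Key, Series] = {}
    for labels, value in right:
        key = _match_key(labels, ignoring)
        if key in one_side:
            # The "one" side must be unique per match group.
            raise ValueError(f"duplicate series on the one side for {dict(key)}")
        one_side[key] = (labels, value)

    result: List[Series] = []
    for labels, value in left:
        match = one_side.get(_match_key(labels, ignoring))
        if match is None:
            continue  # no partner on the right: this left series is dropped
        rhs_labels, rhs_value = match
        out = {k: v for k, v in labels.items() if k != "__name__"}
        for name in include:
            # group_left copies the listed labels from the right ("one") side.
            if rhs_labels.get(name, ""):
                out[name] = rhs_labels[name]
            else:
                out.pop(name, None)
        result.append((out, value * rhs_value))
    return result
```

Since every right-side value is 1, the multiplication leaves the temperatures untouched. All the real work happens in the matching and the label copying. Right-side series with no left-side partner never appear in the result.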
The query returns:¹
{chip="0000:00:01_1_0000:01:00_0",chip_name="nvme",class="nas server",hostname="20-size",instance="10.0.1.217:9100",job="node-exporter"} 28.85
{chip="0000:20:00_0_0000:21:00_0",chip_name="nvme",class="nas server",hostname="20-size",instance="10.0.1.217:9100",job="node-exporter"} 54.85
{chip="pci0000:00_0000:00:18_3",chip_name="k10temp",class="nas server",hostname="20-size",instance="10.0.1.217:9100",job="node-exporter"} 30.166666666666668
{chip="thermal_thermal_zone0",chip_name="cpu_thermal",class="raspberry pi",hostname="cluster1",instance="10.0.1.201:9100",job="node-exporter"} 36.998000000000005
{chip="thermal_thermal_zone0",chip_name="cpu_thermal",class="raspberry pi",hostname="cluster2",instance="10.0.1.252:9100",job="node-exporter"} 32.128
Pretty sweet.
¹ You’ll notice that the series for chip="platform_rpi_poe_fan_0" and for hostname="cluster0" were dropped (as were the rpi_volt and hidpp_battery_0 chips), because there’s no series with matching labels in the left-side results.