My Power BI Dashboards

I wanted to share some of my personal dashboards that I run off of Power BI.

The first one is my aquarium monitoring dashboard, which I use to monitor my KPIs. It's been a while since I did a water change, so you can see my TDS is out of spec, and I recently moved my turbidity probe to a more appropriate location to properly manage the tank. I have 3 quick cards to see if I am green or red, then further charts to show daily variances, and the time series trend at the bottom. I have some alerts that go out when I drift out of tolerance, but as can be clearly seen I don't necessarily change my water immediately. (Before anyone freaks out, I have a heavily planted tank, so I have a healthy equilibrium that allows me to do infrequent water changes.)

Next I have my temperature monitoring dashboard that I use to track 3 main temperatures: my room temperature, my sun-room temperature (and by extension my outside temperature) and my server rack temperature. There is not much to note, but I do use the server rack temperature to ensure good operation of equipment. I don't want to run my server too hot as that may cause early device failure.

Lastly I have my power monitoring dashboard, where I track my desktop computer, server rack and fridge power draw. I track my desktop power usage to understand the costs associated with use and to track GPU mining profitability. Server rack power is an important one as well, as server hardware is known to be a big power draw. I spent a fair bit of time optimizing power usage and this dashboard was important in tracking those changes.

Overall I have many more work-related dashboards, but these are the ones I run on my personal Power BI workspace.

Custom Fan Curve For Dell Servers (R530)

I was very happy to get my Dell R530 for what was effectively a steal, right up until I heard it turn on. For those that have never heard a server turn on, it is close to an airplane turbine spinning up (I am not kidding, my server fans can reach 15K RPM).

Now of course it promptly idled down, but the problem I had is that it was idling at around 20% (3000 RPM), which produced a noticeable hum that could be heard 2 floors up. There are some extenuating circumstances, as I had added a few PCIe devices that cause Dell's firmware to compensate, but that is beside the point.

A background note on fan curves and how computers stay cool: Dell servers have a fan curve which dictates the PWM output % for the system fans based on the air temperature. The problem is that these servers are designed to run with cold, refrigerated incoming air in datacenters, and in my case 20°C ambient translates to 20% fan speed, with no easy way to go lower through the stock controls.

In a weird twist, the consumer space already had a solution for this: the custom fan curve. In the consumer space, end users are able to adjust the original fan curve to what is best for their use case instead of being forced into the OEM curve.

Doing something like this on a Dell server is a bit more difficult though. I had to borrow from a GitHub project where the host machine measures CPU temperature and then issues IPMI commands to manually set a fan speed. Through this we can make a makeshift fan curve, and I have shown mine below. This curve is more user-centric: under idle conditions (<40°C) fan speeds are relatively low and only ramp up when needed. This provides a good balance of thermal performance under load and loudness at idle.
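
For the curious, here is a minimal sketch of the kind of loop that GitHub project runs. It assumes ipmitool is installed on the host and that the iDRAC accepts the commonly documented Dell raw fan-control commands; the temperature path and curve thresholds are placeholders rather than my exact values, so treat it as an outline, not a drop-in script.

```python
import subprocess
import time

# Commonly documented Dell iDRAC raw commands (assumption: your firmware accepts them).
MANUAL_MODE = ["ipmitool", "raw", "0x30", "0x30", "0x01", "0x00"]   # take over fan control
AUTO_MODE   = ["ipmitool", "raw", "0x30", "0x30", "0x01", "0x01"]   # hand control back to iDRAC

def set_fan_percent(percent: int) -> None:
    """Set all system fans to a fixed duty cycle (0-100%)."""
    subprocess.run(
        ["ipmitool", "raw", "0x30", "0x30", "0x02", "0xff", f"0x{percent:02x}"],
        check=True,
    )

def cpu_temp_c() -> float:
    """Read CPU temperature; this path is a placeholder and varies by OS/sensor setup."""
    with open("/sys/class/thermal/thermal_zone0/temp") as f:
        return int(f.read()) / 1000.0

# Makeshift fan curve: (temperature threshold in °C, fan %), low at idle, ramping under load.
CURVE = [(40, 10), (50, 20), (60, 35), (70, 60)]

def fan_percent_for(temp: float) -> int:
    percent = CURVE[0][1]
    for threshold, pct in CURVE:
        if temp >= threshold:
            percent = pct
    return percent

if __name__ == "__main__":
    subprocess.run(MANUAL_MODE, check=True)
    try:
        while True:
            set_fan_percent(fan_percent_for(cpu_temp_c()))
            time.sleep(10)
    finally:
        subprocess.run(AUTO_MODE, check=True)  # fail safe: give the BMC control back on exit
```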

Now the server is not bothering anyone at idle and I don't have to worry about over-temperature while under significant load.

CI Power Savings at Home

With new power monitoring plugs (flashed with custom Tasmota firmware) I wanted to have a look at my server rack power usage and see if there were any savings opportunities. I ran a "top" command on the VM that had the highest usage and found that my database (PostgreSQL) was running higher than I thought it should. I dug deeper into the jobs running on the database and found some stuck system jobs. I was able to resolve most of them and implemented a query timeout to prevent anything from running too long, and went from a mean of 104 W down to 90 W, a 14 W savings. This may not be much, but this server runs 24/7, so the savings accumulate over time.
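
For reference, this is roughly how the investigation and fix look in code. It is only a sketch using psycopg2 with a placeholder connection string, database name and timeout value: the pg_stat_activity query surfaces long-running statements, and the ALTER DATABASE sets the blanket query timeout.

```python
import psycopg2

# Connection string is a placeholder for my setup.
conn = psycopg2.connect("dbname=mydb user=postgres")
conn.autocommit = True

with conn.cursor() as cur:
    # Find anything that has been running suspiciously long (stuck jobs show up here).
    cur.execute("""
        SELECT pid, now() - query_start AS runtime, state, left(query, 60)
        FROM pg_stat_activity
        WHERE state <> 'idle'
        ORDER BY runtime DESC;
    """)
    for row in cur.fetchall():
        print(row)

    # Cap how long any single statement may run (timeout value here is illustrative).
    cur.execute("ALTER DATABASE mydb SET statement_timeout = '5min';")

conn.close()
```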

I made a quick I-chart to show the power savings and the optimization period where I was hammering the server to figure out what was going on. There are still some cyclic increases in power that are related to cleanup jobs.
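
If you want to build a similar I-chart yourself, the recipe is just the mean plus control limits at ±2.66 times the average moving range. The sketch below uses made-up readings, not my actual data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder power readings (W) spanning before and after the optimization.
power_w = np.array([104, 106, 103, 105, 104, 102, 98, 93, 91, 90, 89, 91, 90, 92, 90])

mean = power_w.mean()
moving_range = np.abs(np.diff(power_w))
mr_bar = moving_range.mean()

# Standard individuals-chart limits: mean ± 2.66 * average moving range.
ucl = mean + 2.66 * mr_bar
lcl = mean - 2.66 * mr_bar

plt.plot(power_w, marker="o")
plt.axhline(mean, linestyle="--", label=f"mean = {mean:.1f} W")
plt.axhline(ucl, color="red", label=f"UCL = {ucl:.1f} W")
plt.axhline(lcl, color="red", label=f"LCL = {lcl:.1f} W")
plt.xlabel("Observation")
plt.ylabel("Rack power (W)")
plt.legend()
plt.show()
```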

How Accurate Are House Thermostats?

I have a programmable thermostat at home and, being the indulgent human I am, I have a rule set so that the house starts to warm up at 5am and it is warm, not cold, when I get out of bed 😀

Of course that sounds good in theory, but what proof is there that the thermostat actually delivers? ;)

I set up a temperature logger to test how effective this actually was, and I must admit I was pleasantly surprised. Overnight temperatures stayed at 21.3°C, but as soon as 5am came around temperatures went up to 21.9°C, enough that I feel much better getting out of bed in the morning.

“Quis custodiet ipsos custodes” & Process Control

“Who watches the watchmen?”, a question popularized by Alan Moore's Watchmen and originally posed by the Roman poet Juvenal, has much broader implications in the realm of process control. The core of this political sentiment is accountability, and it has broad applications to sensors and how we interpret their data.

Equipment fails, and it is not a question of if but rather when. A good example is the 2019 Boeing 737 MAX crash, where investigations found that the cause was a faulty angle-of-attack sensor. A very good research paper goes deep into this and I will be borrowing from its idea: https://www.sciencedirect.com/science/article/abs/pii/S0952197622000744. The root of the argument is how you trust sensor data. With 1 sensor providing faulty data, you simply don't know. With 2 sensors (one good and one faulty), you can see the readings differ, but you can't tell which one is right. With 3 sensors (two good and one faulty), you can have the sensors 'vote' on the correct value and then tell which one is faulty. The obvious failure mode is that if 2 of the 3 sensors are faulty, the wrong value will be chosen. This can be extended indefinitely, but the probability of 2 simultaneously faulty sensors that are regularly checked is low, so 3 sensors is the accepted norm for critical voting processes. The previously noted research paper goes into how 2 real sensors and a virtual sensor could be used to the same effect, which has some really interesting implications for saving on sensor cost.
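
The voting logic itself is tiny: a median across the three readings plus a disagreement tolerance is enough to illustrate it. The readings and tolerance below are purely illustrative.

```python
import statistics

def vote(readings: list[float], tolerance: float) -> tuple[float, list[int]]:
    """2-out-of-3 style voting: take the median as the trusted value and
    flag any sensor that disagrees with it by more than `tolerance`."""
    voted = statistics.median(readings)
    suspects = [i for i, r in enumerate(readings) if abs(r - voted) > tolerance]
    return voted, suspects

# Two healthy sensors and one drifting one (values are illustrative).
value, faulty = vote([20.1, 19.8, 27.4], tolerance=2.0)
print(f"voted value: {value}, suspect sensors: {faulty}")
# voted value: 20.1, suspect sensors: [2]
```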

Extending this to process control, we have the exact same failure modes. One example is a control valve used to control fluid flow rates. If said control valve fails (“Gasp, by God, how could a control valve ever fail or stick” /s), how would we know? The most trivial solution is to have a “watcher” for every control point: if we have a control valve controlling fluid flow rate, we need a flow meter to measure the effect of that control valve. That way, if the control element is faulty, we can measure it and then investigate. The concern is that if the flow meter itself is faulty, we can get false alarms or miss real failures. For truly critical-to-safety (CTS) or critical-to-quality (CTQ) parameters, it may be beneficial to design processes with redundant sensors or innovative intertwined solutions.

One real life example is a steam heater used to heat a process stream. We had a control valve fail (after root cause analysis with maintenance we found that it was damaged and very prone to sticking), and the sensor measuring the flow rate (well, not really measuring flow rate, but rather pressure differential and then estimating flow rate) was not able to pick up the failure. This would have been a catastrophic failure leading to over-temperature material, but we had temperature sensors in the material being heated by the steam and so were able to catch the condition. The main point here is that processes overlap. A surface-level view may suggest the faulty valve and sensor are isolated, but their effects propagate to downstream processes, so it is possible to detect their failures through downstream sensors.

What this also implies is that failures in large, complex processes can be caught by direct process sensors as well as downstream sensors much further down the line. Another example to show the point is a boiler: if the heater (gas, electric, etc.) fails and the temperature sensor itself fails to catch it, downstream processes that use the boiler stream would still be able to catch the failure due to low pressure or low temperature.

Interconnecting a process like this and creating a system to catch failures this way can be very difficult work and poses its own risks, but the core tenet of “Who watches the watchmen” is something all process engineers should be aware of, as both equipment and sensors fail and can have a significant impact on the process.

Parametric vs Non-parametric Statistics

Many people assume perfect normal distributions when looking at data, but this is not always the case. An example data set is shown below where the data is clearly non-normal, with both the normal and non-parametric tolerance intervals and means shown. The means are obviously the same, but the 99% tolerance intervals show a key distinction. The normal tolerance interval overestimates the spread and its lower bound unrealistically goes to -1.907, because it assumes a normal distribution with “equal” data points on both sides of the mean. This is not the case here, and the non-parametric tolerance interval is clearly more accurate, as it takes into account the real shape of the data.
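
For anyone wanting to reproduce the comparison, here is a sketch with a placeholder skewed data set (not my actual example data). The normal interval uses Howe's k-factor approximation at 99% coverage with an assumed 95% confidence, while the non-parametric side is a simplified percentile-based interval.

```python
import numpy as np
from scipy import stats

# Placeholder skewed data standing in for the example set (a quantity that cannot go below zero).
rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=0.6, size=200)

n = len(data)
coverage, confidence = 0.99, 0.95       # 99% coverage tolerance interval at 95% confidence
xbar, s = data.mean(), data.std(ddof=1)

# Normal (parametric) two-sided interval using Howe's k-factor approximation.
z = stats.norm.ppf((1 + coverage) / 2)
chi2 = stats.chi2.ppf(1 - confidence, n - 1)
k = np.sqrt((n - 1) * (1 + 1 / n) * z**2 / chi2)
normal_ti = (xbar - k * s, xbar + k * s)

# Non-parametric alternative: empirical percentiles of the data
# (a simplification; a full order-statistic interval also accounts for confidence).
nonparam_ti = tuple(np.quantile(data, [(1 - coverage) / 2, (1 + coverage) / 2]))

print(f"mean             : {xbar:.3f}")
print(f"normal TI        : {normal_ti[0]:.3f} to {normal_ti[1]:.3f}")   # lower bound can go negative
print(f"non-parametric TI: {nonparam_ti[0]:.3f} to {nonparam_ti[1]:.3f}")
```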

I know many people just default to always assuming their distributions are normal but this is not always true and wrong assumptions tend to backfire when it comes to statistics.

Refractometers and Fluid Density Measurements

In the past I thought the typical way of measuring a material's density is to measure its volume and then weigh it. As there are obvious difficulties and flaws (fluids can evaporate during measurement, volume measurement errors), a different way is to use a refractometer, which uses a fluid's index of refraction to estimate its density and related properties. These all have to be calibrated for the particular working fluid, and I have one made for ethanol. It is a good tool and a non-destructive, small-sample way of finding the alcohol %.

After calibration at the low and high end, I was able to measure the alcohol % in various drinks and found some that under-deliver. One key thing to note is to never take measurements blindly, as alcoholic beverages with high sugar content can artificially skew the apparent alcohol %, since the sugar changes the fluid density.
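
The low/high calibration itself is just a two-point linear fit between the refractometer reading and a known standard. The numbers below are illustrative only, and real ethanol-water refractive index is not perfectly linear, so this just shows the idea.

```python
def calibrate_two_point(reading_low, known_low, reading_high, known_high):
    """Return a linear function mapping a refractometer reading to alcohol %,
    anchored at a low and a high calibration standard."""
    slope = (known_high - known_low) / (reading_high - reading_low)
    return lambda reading: known_low + slope * (reading - reading_low)

# Illustrative numbers only: a 0% standard (water) and a 40% standard (spirit).
to_abv = calibrate_two_point(reading_low=1.3330, known_low=0.0,
                             reading_high=1.3590, known_high=40.0)
print(f"{to_abv(1.3460):.1f}% ABV")   # a sample reading between the two standards
```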

Microbial Death Kinetics & Pasteurization

Being interested in tech and food, and having a chemical engineering degree, this is a personal favourite topic of mine (and may be a bit of a rant). Pasteurization is all about statistics and killin' microbes.

Microbes aren't smart. They don't sit around waiting for the 165°F safe internal temperature for chicken (as deemed by the FDA) and then die. The 165°F chosen by the FDA is based on a bit of science and statistics, and by using the same tricks you can actually go lower.

Microbial death follows a first-order rate equation, dN/dt = -k*N, where N is the size of the microbial population and k is the death rate constant. The rate constant k itself follows an Arrhenius relationship, k = A*exp(-E/(R*T)). Long story short, the decrease in a microbial population is driven by both time and temperature (and different microbes behave differently).

This equation isn't used much in industry; instead we work with the decimal reduction time D, the time required at a given temperature to reduce a microbial count by 90%, and the z-value, z = (T2 - T1)/[log(D1) - log(D2)], the temperature change needed to shift D by a factor of 10.

What this all boils down to (great pun) is that the FDA safe internal temperature values are based on the temperature at which we see a 7 log10 reduction in microbial content essentially instantaneously. This is good for the government, as their recommendation is very conservative and they can't be blamed for anything other than dry chicken.

To cheat the system, we can achieve the same 7 log10 reduction at a lower temperature held for a longer time. There are published curves for this; for example, with chicken we can achieve it at 58°C for 64 minutes, meaning we can have both safe and juicy chicken (talk about having your cake and eating it too).
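
As a rough illustration of the D/z arithmetic, the sketch below anchors on the 58°C / 64 minute point above (so D at 58°C is about 64/7 ≈ 9.1 minutes) and assumes an illustrative z-value of 6°C; published D and z values for the pathogen of interest should be used for anything real.

```python
LOG_REDUCTION = 7             # the 7 log10 reduction target
T_REF, TIME_REF = 58.0, 64.0  # the 58 °C / 64 min point quoted above
Z_VALUE = 6.0                 # illustrative z-value (°C per 10x change in D); an assumption

D_ref = TIME_REF / LOG_REDUCTION   # decimal reduction time at 58 °C, ~9.1 min

def hold_time_minutes(temp_c: float) -> float:
    """Time needed at temp_c for the same 7-log reduction, via D(T) = D_ref * 10^((T_ref - T)/z)."""
    d_at_temp = D_ref * 10 ** ((T_REF - temp_c) / Z_VALUE)
    return LOG_REDUCTION * d_at_temp

for t in (58, 61, 64, 67, 70):
    print(f"{t} °C -> {hold_time_minutes(t):6.1f} min")
```

With these assumed numbers the hold time drops from 64 minutes at 58°C to well under a minute at 70°C, which is why the high official temperatures are treated as effectively instantaneous.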

This is cool and all, but the astute will note that a 7 log reduction is 99.99999% effective, not 100%, and microbes, just like cockroaches, will multiply: leave one alive and it will come back with more. For this the government uses a bit of statistics, and for shelf-stable foods requires a 12 log reduction. Now, a 12 log reduction also doesn't guarantee 0 microbes, but we get down to such small numbers that spoilage probability and economics take over. For a given starting microbial load, a 12 log reduction, and R containers, you can estimate how many containers in a batch will still harbour microbes and whether that is an acceptable amount. So unfortunately the chances of buying processed food and getting food poisoning are never 0, but they are managed to safe levels.
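
Here is a back-of-the-envelope version of that container estimate, treating survivors per container as a Poisson process. The starting load and batch size are assumptions purely for illustration.

```python
import math

# Illustrative numbers: initial load per container and a 12-log process.
n0_per_container = 1e4        # starting microbes in one container (assumption)
log_reduction = 12
containers = 1_000_000        # batch size R (assumption)

survivors_per_container = n0_per_container * 10 ** (-log_reduction)   # expected survivors, << 1

# Probability that a given container still holds at least one viable microbe (Poisson model).
p_contaminated = 1 - math.exp(-survivors_per_container)
expected_bad_containers = containers * p_contaminated

print(f"P(container contaminated) ≈ {p_contaminated:.2e}")
print(f"Expected contaminated containers in the batch ≈ {expected_bad_containers:.2f}")
```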

Pasteurization and Sous Vide

Working in the food manufacturing industry, I know pasteurization is a critical step to ensure food safety. Now, I don't have any industrial ovens or equipment at home, but I can get fairly close with home equipment. As a follow-up to my strawberry compote recipe (https://adam-s.ca/strawberry-rhubarb-compote/), I decided to try pasteurizing the drink to extend its shelf life, as I will be bringing these on an outdoor trip a week from now.

I will not get too deep into thermal death kinetics (but it's a great topic for another time). For pasteurization, time and temperature matter most: I can use a high temperature for a short time or a lower temperature for a longer time. In this case I pasteurized at 90°C for 1 hour (plus the startup time). I set up my sous vide for 194°F (90°C) with a probe alongside it and put in my bottles and jars to be pasteurized (picture below). In industry I have used Datapaq loggers at a few different jobs to confirm the time at temperature, which is critical to the process, and I was able to emulate that with my own probe and logger. I was able to confirm a full hour at 90°C and almost 2 hours at or above 80°C, which is more than sufficient for pasteurization.
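
Checking the time at temperature from the logger export is only a few lines of pandas; the file and column names below are placeholders for whatever your logger produces.

```python
import pandas as pd

# Placeholder: the logger export with a timestamp and a temperature column.
log = pd.read_csv("pasteurization_log.csv", parse_dates=["timestamp"])

# Minutes represented by each logged sample.
interval_min = log["timestamp"].diff().dt.total_seconds().div(60).fillna(0)

for threshold in (90, 80):
    minutes_above = interval_min[log["temp_c"] >= threshold].sum()
    print(f"time at or above {threshold} °C: {minutes_above:.0f} min")
```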

This was a fun project and I was very happy to see such a consistent temperature profile. Now if only temperature profiles were so “textbook” at work :D

Sous Vide Temperature Accuracy & Reliability

I purchased a new sous vide machine, and as always it has to be broken in and tested. I have found digital temperature sensors to be repeatable but sometimes inaccurate, so an offset has to be calculated for each of these machines.

I have a Master Chef unit, and the test was controlling water at 132°F (55.56°C). I dropped in a logging probe to confirm this. The chart below shows a very tight temperature grouping, but the average is at 55°C and not the 55.56°C the sous vide was controlling to. So in the end I have a very repeatable and controllable unit, but it is about 0.5°C off, which I can always account for.
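
Working out the offset from the log is trivial, but for completeness here is the calculation with placeholder readings standing in for my logged data.

```python
import statistics

# Placeholder readings from the logging probe while the sous vide held its 55.56 °C setpoint.
setpoint_c = 55.56
probe_readings_c = [55.0, 55.1, 54.9, 55.0, 55.0, 55.1, 55.0]

# Positive offset means the unit runs cold and future setpoints should be bumped up by this much.
offset = setpoint_c - statistics.mean(probe_readings_c)
print(f"apply a +{offset:.2f} °C offset to future setpoints")
```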