Archived

This topic is now archived and is closed to further replies.

Custom heartbeat rules for problematic devices

SpivR

I am monitoring two Nest thermostats and getting a lot of device down/device up alerts. The Nest devices seem to "bounce" quite a bit while all the other devices are stable, so there is no obvious network configuration or other problem.

 

I would like to keep monitoring the Nest devices: if they actually go down for more than a few minutes, I want to be aware of it.

 

I don't want to design the solution, but I would like some way to configure more relaxed "heartbeat lost" time intervals for problematic devices like these, or perhaps to define how many minutes a device must be "down" before a "device down" alert is reported. Essentially, I would like to keep the Nest thermostats listed as "important" devices but not receive up or down alerts for relatively transient behavior (e.g. if a thermostat went down for 5 minutes or more, I would consider that an alertable event).

 

On a related note, I have a few specialized IoT devices with embedded processors that have very low computing power. Their TCP/IP stack is not very tolerant of multiple pings, so I know from prior testing that pings will often fail. I would like to monitor these devices as important but, again, not receive a stream of up/down alerts unless the device is truly down (that is, unreachable for a long period such as 5 or 10 minutes) rather than by the normal Domotz heartbeat detection limits. For these devices, the interval between heartbeat checks should ideally be longer than 30 seconds, as they will consistently miss heartbeats if the default interval is used.
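
The alerting rule requested above can be sketched in a few lines. This is only an illustration of the idea, not how Domotz implements it: the class name `DownAlertDebouncer`, its `observe` method, and the `grace_seconds` parameter are all hypothetical. A "down" alert is reported only once a device has been unreachable for the whole grace period, and an "up" alert only if a "down" alert was actually sent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DownAlertDebouncer:
    """Hypothetical helper: report 'down' only after a sustained outage."""

    grace_seconds: float = 300.0         # 5 minutes, the example threshold from the post
    _down_since: Optional[float] = None  # time of the first missed heartbeat
    _alerted: bool = False               # whether a 'down' alert was already reported

    def observe(self, now: float, reachable: bool) -> Optional[str]:
        """Feed one heartbeat result; return 'down', 'up', or None."""
        if reachable:
            was_alerted = self._alerted
            self._down_since = None
            self._alerted = False
            # Report 'up' only if a 'down' alert was actually sent earlier.
            return "up" if was_alerted else None
        if self._down_since is None:
            self._down_since = now
        if not self._alerted and now - self._down_since >= self.grace_seconds:
            self._alerted = True
            return "down"
        return None
```

With a 300-second grace period, a device that misses one heartbeat and answers the next produces no alerts at all, while a device that stays unreachable for 5 minutes produces a single "down" followed by a single "up" when it returns.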

Giancarlo

Good news: a customizable "period" after which a device is considered down is coming. I am not sure yet when it will be available, but it should be before the end of the year.

 

With regard to the lost heartbeat: each heartbeat is actually a train of 5 consecutive pings (one per second). So for the IoT devices you mentioned as well, simply lengthening the period before they are considered down should do the job.
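
To illustrate why a train of pings helps with devices that drop some packets, here is a minimal sketch. It assumes that a heartbeat counts as received if at least one ping in the train is answered; the post does not state the exact rule Domotz uses, so that rule, the function name `heartbeat`, and the injected `ping` callable are assumptions made for the example.

```python
from typing import Callable


def heartbeat(ping: Callable[[], bool], pings_per_train: int = 5) -> bool:
    """Return True if any ping in the train is answered (assumed rule)."""
    # A real check would wait about one second between pings; the delay
    # is omitted here so the sketch runs instantly.
    for _ in range(pings_per_train):
        if ping():
            return True
    return False
```

Passing the ping as a callable keeps the sketch free of network access: a simulated device that drops the first three pings and answers the fourth still counts as alive, whereas one that answers none of the five registers a lost heartbeat.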

imanuk

+1 for this request, as it doesn't appear to have materialised and we are well into 2017 now.

Giancarlo

Dear Imanuk,

 

I confirm that the customizable period has been available for a few months now.

 

You can configure the status check to any value from a few minutes up to 24 hours. See the attachment.

 

Regards,

Giancarlo

Screen Shot 2017-05-18 at 12.13.37 PM.png

imanuk

Brilliant :-D  

I had tried clicking the status cog in Chrome to no avail; having now tried it in the app and in Internet Explorer, it works. So perhaps it is a Chrome issue. If you want to look at Chrome compatibility, I'm running version 57.0.2987.133 on Windows 10.

 

Also, it would be great if this could be changed for several devices in one go rather than individually.

Thanks, 

John
