boojew

Frequent Agent Disconnects

boojew

I've been getting notices of frequent disconnects of the Domotz agent, sometimes up to 4 times per day. I don't believe it is an issue with my network, but possibly with the agent. This has happened twice today. I wanted to look through the logs, but everything in var/log/domotz is from yesterday or earlier.

Giancarlo

Hi Boojew,

 

Logs are not written continuously; they are flushed after a while. This extends the life of the SD card when running on a Raspberry Pi and allows the disks to sleep when running on a NAS. As a result, log entries appear on disk some time after they are produced. If you wish to flush the logs right away, just send a SIGHUP signal to the Domotz process (kill -1 `cat /tmp/domotz_listener.pid`).
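If you would rather trigger the flush from a script, here is a minimal Python sketch of the same step. It only assumes what the command above shows: that the PID file is /tmp/domotz_listener.pid and that SIGHUP (signal 1) asks the process to flush. The function names are made up for illustration and are not part of Domotz.

```python
import os
import signal
from pathlib import Path

# Path taken from the kill command in the post above.
PID_FILE = Path("/tmp/domotz_listener.pid")


def read_pid(pid_file: Path = PID_FILE) -> int:
    """Read the process ID stored in the PID file."""
    return int(pid_file.read_text().strip())


def request_log_flush(pid_file: Path = PID_FILE) -> int:
    """Send SIGHUP to the process named in the PID file (same as `kill -1`)."""
    pid = read_pid(pid_file)
    os.kill(pid, signal.SIGHUP)
    return pid
```

Calling request_log_flush() needs the same permissions as the kill command, so run it as a user allowed to signal the Domotz process.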

 

With regard to disconnections, you can check the history of disconnections within the Domotz WebApp --> Agent Dashboard --> Network Health Status --> Downtime tab

 

Please note that the Alerts you receive via email and the Downtime history are based on two different mechanisms, so that you always have a way to double-check what was going on with your network. If you receive a Down/Up alert and there are entries in the Downtime history, there is a very high probability that the Domotz Agent lost its connection to the Internet or to our cloud infrastructure. You can also run a constant ping on the Raspberry Pi hosting the agent to have a third piece of proof.

 

I would like to provide you with a bit of background on how the Network Health Analysis --> Downtime Tab is populated versus how we detect that the Agent has lost its Internet connection (Agent Connection Lost in the Alert Manager). Sorry for this long note, but it's worth explaining in order to understand the overall scenario and why it is built this way.

 

The two features (Downtime Tab and Agent Connection Lost Alert) are similar in concept, but technically speaking they have been implemented in a very different way. This is very important, because it is a good way to guarantee that both are working correctly, or to identify if there is a misalignment in one of the two mechanisms.

Basically, every Domotz Agent performs periodic pings toward the Domotz Cloud servers. It also sends a Heartbeat to Domotz Cloud, to notify the back-end that the agent is still alive.

Agent Perspective: Missing Pings --> triggers an entry in the Downtime Tab 

As soon as a train of 5 pings is missing (because DNS is failing, there is no internet connection, the network cable is detached from the Domotz Box/Pi, or simply the Domotz ping server is not available), the Domotz Agent starts keeping track of the time at which the first train of pings was lost.

Two minutes after the ping responses resume, the Agent notifies our back-end that there was a Downtime of X minutes (starting from the first ping lost). This populates the Downtime Tab.

This means that when pings are missed, you won't see the entry in the Tab during the downtime itself, because we wait for two minutes of consistently received pings before we report it. In your monitoring, you can therefore assume that if you see two minutes of downtime, we only lost a few ping responses during that time. Longer periods indicate that multiple ping messages were lost (we send these pings every 30 seconds) for whatever reason.
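To make the timing easier to follow, here is a rough Python sketch of the agent-side logic as described above. It is not the actual agent code. It assumes one ping train every 30 seconds, treats each train as simply answered or lost, and measures the reported downtime from the first lost train up to the moment of the report (my reading of the two-minute remark above). The function name and the list-of-booleans input are made up for illustration.

```python
from typing import List, Optional, Tuple

PING_INTERVAL_S = 30     # from the post: pings are sent every 30 seconds
RECOVERY_WINDOW_S = 120  # from the post: two minutes of good pings before reporting


def find_downtimes(train_ok: List[bool]) -> List[Tuple[int, int]]:
    """Return (first_loss_s, reported_at_s) for each downtime the agent would report.

    train_ok[i] is True when the ping train sent at i * PING_INTERVAL_S was answered.
    """
    reports: List[Tuple[int, int]] = []
    down_since: Optional[int] = None       # time of the first lost train
    recovered_since: Optional[int] = None  # time of the first good train after a loss
    for i, ok in enumerate(train_ok):
        now = i * PING_INTERVAL_S
        if not ok:
            if down_since is None:
                down_since = now
            recovered_since = None  # a new loss restarts the recovery window
        elif down_since is not None:
            if recovered_since is None:
                recovered_since = now
            if now - recovered_since >= RECOVERY_WINDOW_S:
                # Two minutes of consistent replies: report the downtime now.
                reports.append((down_since, now))
                down_since = None
                recovered_since = None
    return reports


# Two lost trains (at 30 s and 60 s), then steady replies.
print(find_downtimes([True, False, False, True, True, True, True, True, True]))
```

Note that a downtime still in progress when the input ends produces no report, which matches the point above that the entry only shows up after recovery.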

What happens if the Domotz Agent is turned off (unplug the power cable from the Domotz Box/Pi)? In this case, the Agent does not send any ping (of course) and cannot be aware if there is connectivity or not. Therefore, there will be no entry in the Downtime Tab, for the whole period the Domotz Agent is off.

However, Domotz Cloud is still aware that something is going on. See next.

Cloud Perspective: Missing Heartbeat --> triggers the alert (if configured in the Alert management - Agent Connection Lost) 

After three minutes have passed from when our back-end receives the last Heartbeat from an Agent, the back-end declares that Agent to be OFFLINE.

If the Agent Connection Lost alert is configured, you will receive an email from our back-end stating that we have not received any info from that agent in the last three minutes.

As soon as the Agent starts sending the Heartbeats again you will receive an email from Domotz Cloud stating that the connection to the Agent was recovered.

This last mechanism works whether you disconnect the power cable or the network cable from the Domotz Box/Pi/NAS, because it's triggered by Domotz Cloud.
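The cloud-side check can be sketched in a similar way. Again, this is only an illustration of the rule described above (OFFLINE once three minutes pass without a Heartbeat, back ONLINE when Heartbeats resume), not our back-end code. The function name, the event labels, and the timestamps in the example are made up.

```python
from typing import List, Tuple

HEARTBEAT_TIMEOUT_S = 180  # from the post: three minutes without a Heartbeat


def cloud_events(heartbeat_times: List[int], end_time: int) -> List[Tuple[int, str]]:
    """Return (time_s, state) transitions the cloud would see up to end_time."""
    events: List[Tuple[int, str]] = []
    last = None  # time of the most recent Heartbeat
    for t in sorted(heartbeat_times):
        if last is not None and t - last > HEARTBEAT_TIMEOUT_S:
            # The timeout expired before this Heartbeat arrived.
            events.append((last + HEARTBEAT_TIMEOUT_S, "OFFLINE"))
            events.append((t, "ONLINE"))
        last = t
    # The agent may still be silent at the end of the observation period.
    if last is not None and end_time - last > HEARTBEAT_TIMEOUT_S:
        events.append((last + HEARTBEAT_TIMEOUT_S, "OFFLINE"))
    return events


# Heartbeats stop after 116 s and resume at 500 s.
print(cloud_events([0, 58, 116, 500, 558], end_time=600))
```

Because only the arrival of Heartbeats matters here, it makes no difference to this check whether the agent lost power or lost its network link.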

Sorry for the long response, but I hope this provides sufficient info on how the network monitoring works.

 

boojew

Thanks for the details, they really are appreciated. A few questions: for the downtime monitoring, what IP/host is it pinging? How frequent are heartbeats between the backend and the agent? I.e., in the 3 minutes, how many heartbeats is it missing?

 

Thanks!

Giancarlo

No problem. Glad to help explain the system.

 

The heartbeat is performed every 60 seconds (actually 58). So 3 heartbeats are missed in 3 minutes.
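As a quick check of that arithmetic, here is a small Python sketch using the 58-second interval and the three-minute window from this thread. The function name is made up for illustration.

```python
HEARTBEAT_INTERVAL_S = 58  # from the post: a Heartbeat roughly every 58 seconds
OFFLINE_TIMEOUT_S = 180    # from the post: OFFLINE after three minutes of silence


def missed_heartbeat_times(interval_s: int = HEARTBEAT_INTERVAL_S,
                           timeout_s: int = OFFLINE_TIMEOUT_S) -> list:
    """Seconds after the last received Heartbeat at which one was expected but missed."""
    return list(range(interval_s, timeout_s + 1, interval_s))


print(missed_heartbeat_times())       # expected Heartbeats at 58, 116 and 174 seconds
print(len(missed_heartbeat_times()))  # 3 missed Heartbeats before the timeout
```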
