Dash Modem Times Out and Won't Send Data After 1 Hour

Hey @ajandar @MichaelM @bitshift @AlexS @jasoveen @FrankVigilante,

I can give an update.

The problem:

  • On some cellular towers for some carriers, the PSD context (i.e. PPP) was being dropped after inactivity
  • This left the modem’s internal state machine in unexpected states, which were not being reset as expected when the modem tried to self-heal immediately
  • Seemingly not every carrier/tower exhibited this behavior in the same way (strange, no?); nonetheless, it was a widely reported issue

The potential solutions:

  1. Send a keep-alive packet every so often (not acceptable, since that would waste cellular data)
  2. Get the modem module to renegotiate properly if/when this occurs (desirable solution)

Option #2 proved difficult to implement reliably. We added a number of new states to our state machine, along with tests that detect when the modem module is in a state where the PSD context cannot be recovered without a deeper reset.

We have successfully reproduced the issue and have just finished a 0.9.4 release candidate, which required extensive testing of the additional states in our state machine. The official release will be early next week, but we’ll post the release candidate here before then.

In firmware version 0.9.4, we implement Option #2: in the event that the carrier drops the PSD context, and the modem module enters a state where a more significant renegotiation must occur, we detect that and correct the state of the modem module. This improves the robustness of our self-healing code. No keep-alives are necessary.
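For anyone curious what this kind of escalation looks like, here is a minimal sketch in Python. It is not our firmware code: the state names, the `FakeModem` class, and its `reactivate_context` and `deep_reset` methods are all hypothetical stand-ins. It only illustrates the idea of trying a light recovery first and escalating to a deeper reset when the light recovery cannot bring the PSD context back.

```python
from enum import Enum, auto


class LinkState(Enum):
    # Hypothetical state names, chosen for this illustration only.
    CONNECTED = auto()
    CONTEXT_DROPPED = auto()
    NEEDS_DEEP_RESET = auto()
    FAILED = auto()


class FakeModem:
    """Hypothetical stand-in for the modem module, not a real driver API."""

    def __init__(self, light_recovery_works):
        self.light_recovery_works = light_recovery_works
        self.context_up = False
        self.deep_resets = 0

    def reactivate_context(self):
        # Light recovery: ask the module to bring the PSD context back up.
        self.context_up = self.light_recovery_works
        return self.context_up

    def deep_reset(self):
        # Heavier recovery: reset the module, then bring the context up again.
        self.deep_resets += 1
        self.context_up = True
        return True


def recover(modem):
    """Walk the recovery states and return the list of states visited."""
    state = LinkState.CONTEXT_DROPPED
    visited = [state]
    while state not in (LinkState.CONNECTED, LinkState.FAILED):
        if state is LinkState.CONTEXT_DROPPED:
            # Try the cheap path first.
            state = (LinkState.CONNECTED if modem.reactivate_context()
                     else LinkState.NEEDS_DEEP_RESET)
        elif state is LinkState.NEEDS_DEEP_RESET:
            # Escalate only when the cheap path did not work.
            state = (LinkState.CONNECTED if modem.deep_reset()
                     else LinkState.FAILED)
        visited.append(state)
    return visited
```

Calling `recover(FakeModem(light_recovery_works=False))` walks through the deep-reset state before reaching `CONNECTED`, while a modem whose light recovery works never triggers a deep reset. No traffic is sent while the link is idle, which is the point of preferring this over keep-alives.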

Also in version 0.9.4: we identified and reproduced a socket-level networking stack issue that, under certain circumstances, caused an error the first time a connection to a server was attempted after initial module boot. We implemented a workaround in our modem driver, which is included in this release.
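If you are on an older version and want to guard against the first-connection error in your own code, one common approach is to retry the connect once. The sketch below is only an assumption about what an application-side guard could look like; it does not describe the workaround inside the modem driver. The `connect_with_retry` helper and the `FlakyFirstConnect` class are hypothetical names made up for this example.

```python
def connect_with_retry(connect, attempts=2):
    """Call connect() up to `attempts` times; return its result or re-raise."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return connect()
        except OSError as err:
            # ConnectionError and socket errors are subclasses of OSError.
            last_error = err
    raise last_error


class FlakyFirstConnect:
    """Hypothetical connect function that fails only on its first call."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("first connect after boot failed")
        return "socket"
```

With `FlakyFirstConnect`, the first call fails and the second succeeds, so `connect_with_retry` returns on its second attempt. In real code, `connect` would be whatever function opens your socket.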

Best,
PFW