SARA R410M loses USB connection to RPi

I am developing a smart agriculture application on a Raspberry Pi 3 B+ (Buster). For cellular comms I use a Nova SARA-R410M-02B modem with a Hologram SIM card.
Also part of the application is an XBee RF radio that receives sensor data from the field. Both devices use serial IO via the Pi USB ports.

My application is written in Ruby and uses the Serialport gem for serial communications. I set up udev rules that create a symlink called ttyNOVA for the Nova modem and ttyXBEE for the XBee radio:

SUBSYSTEM=="tty", ATTRS{idVendor}=="05c6", ATTRS{idProduct}=="90b2", KERNEL=="ttyUSB*", SYMLINK+="ttyNOVA", MODE="0666"
SUBSYSTEM=="tty", ATTRS{idVendor}=="0403", ATTRS{idProduct}=="6015", KERNEL=="ttyUSB*", SYMLINK+="ttyXBEE", MODE="0666"

When the Pi boots up everything is set up ok:

lrwxrwxrwx 1 root root 7 Mar 5 08:45 ttyNOVA -> ttyUSB2
lrwxrwxrwx 1 root root 7 Mar 5 08:45 ttyXBEE -> ttyUSB0

After opening the serial port for ttyNOVA the port functions normally with:

cell port signals = {"rts"=>1, "dtr"=>1, "cts"=>1, "dsr"=>0, "dcd"=>0, "ri"=>0} (Note that "cts"=>1)

After starting the application everything hums along until the serial port is suddenly lost. It might take a few minutes or many hours.
Grepping the syslog for entries with "usb" in them, I find a pattern like this:

Mar 2 14:52:38 raspberrypi kernel: [ 131.566043] usb 1-1.3: USB disconnect, device number 8
Mar 2 14:52:38 raspberrypi kernel: [ 131.568025] qmi_wwan 1-1.3:1.3 wwan0: unregister 'qmi_wwan' usb-3f980000.usb-1.3, WWAN/QMI device
Mar 2 14:52:39 raspberrypi kernel: [ 132.632517] usb 1-1.3: new high-speed USB device number 9 using dwc_otg
Mar 2 14:52:39 raspberrypi kernel: [ 132.763246] usb 1-1.3: New USB device found, idVendor=05c6, idProduct=90b2, bcdDevice= 0.00
Mar 2 14:52:39 raspberrypi kernel: [ 132.763263] usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Mar 2 14:52:39 raspberrypi kernel: [ 132.763273] usb 1-1.3: Product: QHSUSB__BULK
Mar 2 14:52:39 raspberrypi kernel: [ 132.763282] usb 1-1.3: Manufacturer: Qualcomm CDMA Technologies MSM
Mar 2 14:52:39 raspberrypi kernel: [ 132.763291] usb 1-1.3: SerialNumber: db4cad1d
Mar 2 14:52:39 raspberrypi kernel: [ 132.764775] usb 1-1.3: GSM modem (1-port) converter now attached to ttyUSB1
Mar 2 14:52:39 raspberrypi mtp-probe: checking bus 1, device 9: "/sys/devices/platform/soc/3f980000.usb/usb1/1-1/1-1.3"
Mar 2 14:52:39 raspberrypi mtp-probe: checking bus 1, device 9: "/sys/devices/platform/soc/3f980000.usb/usb1/1-1/1-1.3"

There is no other USB activity in the syslog around this time. Any idea what might be causing this? XBee port 'interference'?
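A later reply in this thread points to the `Product: QHSUSB__BULK` line as the signature of the modem dropping back into its bootloader. As a rough aid for spotting these events in a long syslog, here is a minimal sketch. It is written in Python purely for illustration (the application in this post is Ruby), and the function name and regular expression are my own assumptions based only on the log format shown above.

```python
import re

# Matches kernel lines such as:
#   Mar 2 14:52:39 raspberrypi kernel: [ 132.763273] usb 1-1.3: Product: QHSUSB__BULK
# The pattern is an assumption derived from the log excerpt in this post.
BULK_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+kernel:.*"
    r"usb (?P<path>[\d.\-]+): Product: QHSUSB__BULK"
)

def find_bulk_reenumerations(lines):
    """Return (timestamp, usb_path) for every QHSUSB__BULK product line."""
    events = []
    for line in lines:
        match = BULK_LINE.search(line)
        if match:
            events.append((match.group("stamp"), match.group("path")))
    return events
```

Passing it an open file, for example `find_bulk_reenumerations(open("/var/log/syslog", errors="replace"))`, should list when and on which USB path the modem re-enumerated in this mode.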

After this, the next time the Nova serial port is accessed (eg. to = serialport.read_timeout) an exception is thrown:

Input/output error - tcgetattr on port /dev/ttyNOVA

As far as I can tell, this error implies that the serial port no longer exists, or is in some way invalid.
This makes sense as ttyNOVA is now linked to a disconnected port (ttyUSB2) but reconnected to another port (ttyUSB1).
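One way to detect this condition without waiting for the I/O error is to remember which device node the symlink pointed at when the port was opened, and compare it later. The sketch below is Python for illustration only (not the Ruby used in the application); `resolve_port` and `port_changed` are hypothetical helper names.

```python
import os

def resolve_port(link_path):
    """Return the real device node behind a udev symlink, or None if it is gone."""
    target = os.path.realpath(link_path)
    return target if os.path.exists(target) else None

def port_changed(link_path, opened_target):
    """True when the symlink no longer resolves to the node that was opened."""
    return resolve_port(link_path) != opened_target
```

The application would record `resolve_port("/dev/ttyNOVA")` at open time and, when `port_changed` later returns true, close and reopen the port instead of continuing to use the stale handle.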

Currently, the application is designed to shut down and restart at this point (sudo shutdown -r now).
Often, it comes up and there are two immediate observations.
First, after opening the ttyNOVA port we get:

cell port signals = {"rts"=>1, "dtr"=>1, "cts"=>0, "dsr"=>0, "dcd"=>0, "ri"=>0}

Note that "cts"=>0. Whenever this happens it indicates the serial port is not properly initialized.
No AT commands execute successfully. Bytes appear to be written but usually there is no valid response.
Sometimes there is a 'response' of sorts, but it is nonsense. For example, I will often see a byte pattern something like:

1, 0, 0, 0, 48, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 4, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

This response has 48 bytes, but sometimes there are hundreds.

Second, before issuing the first AT command (ATE0) the input buffer is cleared. Invariably if there are characters in the buffer, they are always the same:

0x60 0x0a 0x00 0x9e 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xef 0xad 0x7e

followed by a 'nil'. Does anyone recognize this byte pattern? (I don't.)

Usually after a few restarts, things return to normal and communications are restored.

I should also note that I have three test systems running atm and only one is exhibiting this poor behaviour. Also, the problem modem has had the latest firmware upgrade installed (https://support.hologram.io/hc/en-us/articles/360035212594-Updating-the-Cat-M1-R410-Nova-s-Firmware).

I have the same thing going on with a Pi. I'm using the Hologram Python API and the R410M, and have also updated the firmware to the latest version. When the modem stops working, both LEDs are out. I've made sure I'm using a substantial power supply.

The only pattern I think I've noted is that the problem happens much more quickly if I'm not interacting with the modem (listening for inbound cloud data).

There are normally two ttyUSB devices for the modem. After the failure, there is only one.

Yes, I've also noticed the problem with the LEDs. It seems particularly odd that the blue power LED would stop working. I use only the standard power blocks that come with the basic RPi3B+. This is a 2.5A supply which should be more than sufficient. The only external draw on the PS is a small cooling fan which only comes on rarely and does not appear to be connected with the issue.

Sometimes the problem takes many hours or even days before it occurs. The syslog indicates that, until the port is disconnected as shown in the original post, there is no other USB activity from any source. Also, the application is running basically unattended so there is no human interaction either.

I just ran another test and found that after a restart the app started up correctly: initialized the modem, connected to the network, read some modem information, etc - IOW, everything is working exactly as it should. The next thing the app tried to do was register MQTT agent information. This was immediately followed by a system disconnect (as indicated in the original post):

Mar 9 09:25:01 raspberrypi kernel: [ 127.302880] usb 1-1.4: new high-speed USB device number 9 using dwc_otg
Mar 9 09:25:01 raspberrypi kernel: [ 127.434059] usb 1-1.4: New USB device found, idVendor=05c6, idProduct=90b2
Mar 9 09:25:01 raspberrypi kernel: [ 127.434077] usb 1-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Mar 9 09:25:01 raspberrypi kernel: [ 127.434088] usb 1-1.4: Product: QHSUSB__BULK
Mar 9 09:25:01 raspberrypi kernel: [ 127.434098] usb 1-1.4: Manufacturer: Qualcomm CDMA Technologies MSM
Mar 9 09:25:01 raspberrypi kernel: [ 127.434109] usb 1-1.4: SerialNumber: 8b17c662
Mar 9 09:25:01 raspberrypi kernel: [ 127.435691] option 1-1.4:1.0: GSM modem (1-port) converter detected
Mar 9 09:25:01 raspberrypi kernel: [ 127.436317] usb 1-1.4: GSM modem (1-port) converter now attached to ttyUSB1
Mar 9 09:25:01 raspberrypi mtp-probe: checking bus 1, device 9: "/sys/devices/platform/soc/3f980000.usb/usb1/1-1/1-1.4"
Mar 9 09:25:01 raspberrypi mtp-probe: bus: 1, device: 9 was not an MTP device

Again, at this time - about 2 minutes after a restart of the Pi - there was no user interaction and no RF communications from the XBee. Only the spontaneous disconnection and re-connection of the USB port to which the modem was attached, leading to an exception the next time the port was accessed (for the MQTT init).

Have you tried the "Troubleshooting" section on that firmware upgrade page? I've found that doing that refresh of the MNO Profile settings can help with a case like this.

That was something I planned to do because sometimes the modem does not make or maintain a connection with the carrier. However, the issue in this post is more about connecting and retaining a USB serial connection with the modem. Maybe the issues are connected? In any case, I can't issue an AT command unless I can establish a connection to the modem.

Also, if I want to use the AT+UMNOPROF= command I need to know the MNO number of my carrier, Rogers (Rogers Communication Partnership). I have not been able to find it listed anywhere. Do you know what it might be?

Regards.

For whatever reason, refreshing the profile back and forth seems to help with the board itself powering off. My theory is that the firmware upgrade might not be restoring all files properly in all cases and doing this forces things to refresh.

You can just set the profile number back to whatever you're already using if it's working for you right now, or you can follow the instructions in the UMNOPROF and Bandmask section of this guide: https://support.hologram.io/hc/en-us/articles/360038810214-u-blox-SARA-R410

That will get you to a pretty general profile that should work on carriers that we support.

After switching to the 100 profile and restarting, I get an error on the first bandmask. The second one seems OK. I ended up putting the bandmask from profile zero in for the errored one and the connection to ATT popped right up. I'll report back on the power issue.
I'm wondering if the power issue is the modem going into deep sleep mode, even though that appears to be disabled.

The shutdown problem happened within a few hours of being up on profile 100. Maybe a hardware issue? How do I swap out the modem?

Not sure if I saw that above, but what kind of power supply are you using on the Pi?

If you want to do an exchange, send us an email at success@hologram.io and someone can help you out.

I've tried both a 3A PS that came with the Pi and a big HP laptop supply. The modem shuts off after a random period of time even when plugged into a powered USB hub, without the Pi connected at all.

Reuben,
I haven't had any success getting a response from the success team. Could you give them a nudge for me please?
Jon

Yeah, I'll have them look out for your message.

Just a note that I'm having exactly the same issue with a custom Linux embedded system [running another control application written in Ruby :-] I updated to .05.08 firmware last April. I've tried different power supplies, adding a ferrite RF choke to the USB, bypassing the USB series resistors, etc. I'd been concentrating on the USB signaling because I've observed that the Nova will not function when plugged directly into the USB port. It requires at least a 1 meter cable.
I've "solved" that with the ferrite choke, but it seems there's still this occasional firmware crash where the R410 reverts back to its bootloader (Product QHSUSB__BULK).
My current hypothesis is that the firmware sometimes crashes when it loses the cell signal, as I don't see these failures as often when signal strength is high (>22).
I'd really like to use this modem in our battery operated application, but this issue is a show stopper for us.
Please keep hitting this issue with updates once every 3wks until we get some sort of resolution.

Is L0.0.00.00.05.08 still the latest firmware?

Thanks!

Not exactly sure what you mean by "the R410 reverts back to its bootloader (Product QHSUSB__BULK)". At the very top of this post I describe the USB activity that was written to the system log around the time that the USB serial port seems to reset itself, leading to a fatal failure because the software thinks the port is still open and functioning normally. Is that the activity that you are referring to as reverting "back to its bootloader"? With the hypothesis that this happens when the firmware crashes because "it loses the cell signal"? If so, it sounds like a reasonable hypothesis...

When my hardware and software experience the problem there does not appear to be anything going on in my software or with other processes on the Pi that could account for annihilation of the USB port. So the idea that the modem firmware either crashes, reboots or restarts itself in response to an internal condition (like losing its cell signal) makes sense to me. Thanks for your insight.

I am also running into this issue.

At times I get "ERROR - Could not configure port: (5, 'Input/output error')" while using the modem to query for SMSs.

My code does automatically reboot the RP in this case, but that does not power cycle the USB port, and on the next load I will first get "INFO - Detected modem NovaM" and next "ERROR - Unable to detect a usable serial port". No LEDs on the modem are on when this happens.

Only a complete power cycle gets the modem out of this mode.

@Sven I don't use SMSs but the failure mode is similar in many other scenarios. Basically, the serial port appears unusable and only a hard reboot (power cycle) seems to fix the problem... eventually. However, recently when the modem seems to lose its brains (e.g., no LEDs), I have just let it go and reboot, multiple times. I have found that it may take 5-10 reboots, more or less, but eventually the modem resets properly, connects to the host network, and everything works again. This assumes that everything had been working properly before the failure.

So it appears that the modem may just need some time and several reboots to reset and repair itself. You are correct that a power cycle seems to expedite the process. However, in practical use cases, a power cycle is often not an option. My gateway runs in 'unattended' mode and while it may be possible to eventually get some human intervention to cycle power, this is often not a timely option. And the application cannot afford to lose its (sensor) data for hours or days. Therefore it would be a show-stopper if the Pi+application could not recover on its own. Inability to do this would mean that it would be impossible to move forward with the Hologram solution (like @genosensor).

Are you guys on Pi Model A's? We've found that the USB design on those is kind of weird. It never powers down the USB port on reboot, unlike on the Model B.

RP 3 A+

You are right about the power design vs. the other RPs. There is no way to control USB power as there is on other models.

One possible fix would be to wire a GPIO from the RP to the reset pin of the modem to force a hard reset. The pin is exposed on the modem and needs to be pulled low for 10s to reset it.
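A minimal sketch of that reset pulse, in Python for illustration: the 10 second hold comes from the post above, while `assert_low` and `release` are hypothetical callables that the caller supplies, since how the GPIO is actually driven depends on the wiring and GPIO library in use. I have not tested this against a modem.

```python
import time

RESET_HOLD_SECONDS = 10  # hold time quoted in the post above

def pulse_modem_reset(assert_low, release, hold=RESET_HOLD_SECONDS, sleep=time.sleep):
    """Hold the modem reset line low for `hold` seconds, then release it.

    `assert_low` and `release` are hypothetical callables supplied by the
    caller; they wrap whatever GPIO library and pin are actually used.
    """
    assert_low()
    try:
        sleep(hold)
    finally:
        release()  # always release the line, even if the wait is interrupted
```

Passing the GPIO operations in as callables keeps the timing logic separate from the hardware access, so it can be exercised without a Pi attached. Check the logic-level requirements of the modem's reset pin before driving it from a 3.3V GPIO.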

EDIT: I just tested a B model RP and it does power cycle the modem when the system resets.

Still would like to better understand why the modem sometimes hangs (and only exposes the debug port in this stage). Looks like it might have something to do with the tower it is connected to as I see the issue more frequently in one location and cannot repro it when moving the system to another location. Reception is similar in both locations.

All of my tests were running on the RPi 3 B+.

The USB kernel log messages I see are identical to the ones you initially posted.
The SARA-R410M USB device disconnects and reenumerates on the USB bus with its usual vendor and product ID numbers, but with "QHSUSB__BULK" substituted for the product string.
The web is full of references to Qualcomm smart phones reverting back to this "QHSUSB__BULK" mode.
There are drivers that can upload firmware to the "bricked" phones in this mode to recover them.
However, the SARA-R410M modules seem to just need a power cycle to recover from this state.

Good News:
Over the past week, I have gotten to the point where the SARA-R410M can be used in my embedded application. But there's a lot of voodoo involved. Here's what works for me:

  1. This sounds crazy, but I've verified it again and again...
    You must start your PPP session about four seconds from the time the SARA-R410M enumerates on the USB bus.
    a. If you start it too soon, the connection will not be established
    b. If you start the PPP more than 10 seconds after enumeration, the connection will succeed, but the modem will disconnect and likely crash within 10 minutes.
    c. If you start the PPP session 4 seconds after enumeration, sessions typically run as long as 20 hours. However, they still ultimately terminate with the modem crashing into its QHSUSB__BULK mode

  2. Signal strength doesn't seem to correlate with these crashes.
    I tried disconnecting the antenna or wrapping it in tin foil. Both methods blocked the signal, but neither caused the modem to crash.

  3. Create a udev rule that recognizes when the modem reenumerates as QHSUSB__BULK. This runs a utility that cuts power to the USB bus for 5 seconds. When the modem then enumerates correctly, another udev rule reestablishes the PPP session after a 4 second delay.

Here's my udev rule that fires when the modem crashes (you must create your own resetUSB utility):

# Cycle USB power whenever the SARA-R4 modem firmware crashes
ATTRS{idVendor}=="05c6", ATTRS{idProduct}=="90b2", ATTRS{product}=="*USB__BULK", RUN+="/usr/sbin/resetUSB", GOTO="skipNormalCase"

# ... normal case rules ... Handle starting the PPP session after a 4 second delay here

LABEL="skipNormalCase"

Of course, the problem with this approach is that, if you have other peripherals on the USB bus, they get reset too! Fortunately, most of the systems we field use the USB bus only for the cellular modem.
I'd love to find a USB command I could send to the QHSUSB__BULK device to cause it to reset!
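For anyone who wants to drive the same recovery flow from a script rather than purely from udev, the decision logic described above can be sketched as follows. This is Python for illustration only. The 5 second power-off and 4 second PPP delay are the values from this post; `power_off`, `power_on` and `start_ppp` are hypothetical callables standing in for whatever board-specific utility (such as the resetUSB tool mentioned above) and PPP launcher you use.

```python
import time

BOOTLOADER_SUFFIX = "USB__BULK"  # same suffix the udev rule above matches
POWER_OFF_SECONDS = 5            # power-off time used in this post
PPP_DELAY_SECONDS = 4            # delay after normal enumeration used in this post

def handle_enumeration(product, power_off, power_on, start_ppp, sleep=time.sleep):
    """React to the modem appearing on the bus with the given product string.

    The three callables are hypothetical hooks supplied by the caller.
    """
    if product.endswith(BOOTLOADER_SUFFIX):
        # Modem came up in bootloader mode: cycle USB power to recover it.
        power_off()
        sleep(POWER_OFF_SECONDS)
        power_on()
        return "power-cycled"
    # Normal enumeration: wait the empirically chosen delay, then start PPP.
    sleep(PPP_DELAY_SECONDS)
    start_ppp()
    return "ppp-started"
```

As with the udev approach, cutting bus power resets every other device on that bus, so this only suits systems where the modem is the sole USB peripheral.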

  1. I succeeded in getting the Nova to run when plugged into my CPU board without a long cable or ferrite choke by installing two 47 pF capacitors between USB D+/D- and ground. My CPU board has not had similar trouble with any of the dozen or so other peripherals I've tested, including a half dozen other types of LTE modem dongles.

  2. I noticed that the schematic calls for three 330 uF bypass capacitors on the Nova's internal 3.8V supply. The back side of the board has pads for these, but they are not populated. I installed a 220 uF cap as it was the only surface mount cap I had on hand. I'm not sure this was critical; just a note.

  3. This is a problem with the SARA-R4 module, not the Hologram Nova board. I have an IoT Click board from www.mikroe.com that exhibits the same modem crashes. That board also allows one to communicate with the modem module via its TTL RS232 serial I/O pins. The same crashes occur via RS232 as occur via the USB port. Of course, in the RS232 case, the modem does not reenumerate. It just freezes until power is cycled or its Power On signal is toggled.

Hope this info helps.
The SARA-R4 is a unique part with great potential.
I'm hoping the next firmware update from uBlox will address the crashing we're observing.
Please let me know what works for you in the end.
