Can’t send from Cloud to Device, redux...

Blake · March 20, 2019, 2:14am

Hi,

I’ve read this thread:
Can’t send from Cloud to Device

and have the same issue.

I’ve finally got the Hologram Nova up and running using CLI commands (after turning off NB-IoT with AT commands through screen). I’m on a Raspberry Pi 3 B+, using the latest builds of the Hologram SDK and Raspbian.

I can send messages out to the dashboard from the Pi/Nova with sudo hologram send "message". But receive just won’t happen most of the time. In fact, after trying for the last two hours, I finally decided to join this forum and ask this question. Only on the very last try did it manage to send a message to the device. So rhavourd, I guess I’m someone who’s done it on the NovaM.

Order of operations:

1. Put the Nova into receive mode from the CLI. It indicates it is waiting for a message on port 4010.
2. On the dashboard, send a simple message “via cloud data” on port 4010 using TCP.
3. Within seconds, the dashboard console shows socket error: timed out, with topic _API_RESP_, as the last poster in the previous thread indicated.

That is, until it just started working as I typed this post, for no apparent reason. I’ve changed no settings between not working/working.

I’m connected to “AT&T Hologram”, signal strength “29,99”

Any ideas on why sending messages to the nova seems fragile? What are some failure points of this connection? Maybe it takes a certain amount of time after connecting to a network? Just shooting in the dark here…

Has anything changed on Hologram’s end since Dec 2018 in how timeouts are handled (this was mentioned by Reuben in the other thread)?

Thanks,
Blake

Reuben · March 20, 2019, 3:20am

Hey @Blake unfortunately we don’t have a fix for this on Cat-M1 yet, but we’re hoping to get it resolved soon.

Blake · March 20, 2019, 3:44am

Just so we’re totally clear, what is it that you do not yet have a fix for?

In other words, what does Hologram think is the problem that needs fixing here?

Is the problem “sometimes the Nova doesn’t work”? Or can you be more specific?

Is it a problem with the dashboard sending commands? Is it a problem with the Nova hardware? The code? The network? Our reality?

Thanks.

Reuben · March 20, 2019, 3:56am

We’re pretty sure we can fix this on our end by raising the TCP timeout that one of our backend systems uses to send the data. We actually even have a patch put together but we need to confirm that it actually fixes the issue and doesn’t break anything with our 3G modems.

I don’t want to promise anything, but we might actually have time this week to get this rolled out assuming we don’t stumble into another layer of complexity.

To explain further, our current theory that we’ve developed from experimentation is that the longer radio power-downs on idle that are built into the Cat-M1 protocol make it harder for packets to reach the modem before the TCP timeout expires. The modem can’t receive the packet until it decides to power back up and it might be over a minute before that happens.

Reuben · March 20, 2019, 4:00am

BTW, one way you can see this in action is to start sending out pings at the same time as you’re trying to receive. In that case, since the modem is staying powered up you should be able to get stuff right away.

Then stop pinging for a minute and try to send and it will probably not work.

Blake · March 20, 2019, 4:04am

Thank you for this explanation.

I think (as I’m sure you and everyone else using these products do) that being able to reliably send data to the device is absolutely critical for functionality/usefulness within remotely accessed equipment.

I look forward to your resolution of the timeout issue with the CAT-M1 devices.

Quick final question for now: it seems that the problem does not present itself on the 3G devices (probably due to different power usage states with the radio). Is any difference noticed when using the NB-IoT radio instead?

Thank you,
Blake

Blake · March 20, 2019, 4:07am

Please excuse my lack of knowledge:

How would you send out a ping from the Nova? With AT commands?

Would you ping at the same time that the Nova is in receive mode, or directly before going into the receive state?

Reuben · March 20, 2019, 4:08am

Yep, we totally agree, which is why we really want to get this tested.
Definitely understand your frustration here and we do want to get this working for everyone.

You’re correct about the different power down states on 3G making this a non-issue there.
As far as NB, we’re still in early stages with supporting NB with our SIMs and so this feature doesn’t work at all on NB yet. It’s unclear if it can be supported since NB is incredibly stripped down. We won’t know that until we’re farther along with our NB discussions with carriers.

Reuben · March 20, 2019, 5:50pm

Oh you can send via AT, but I just meant doing a hologram network connect and then something like ping -I ppp0 8.8.8.8
I guess you’d have to do the receive manually at that point though, since the modem will be operating over PPP. You could try receiving on port 4010 with the Linux nc utility.

I just simulated this by opening up a couple of terminals on my Pi, then running hologram network connect and then nc -l 4010.
I sent a message from the dashboard and it didn’t show up. Then I ran ping -I ppp0 8.8.8.8 in another terminal and the messages started showing up immediately.
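
For anyone who would rather do the nc -l 4010 step from Python, here is a minimal sketch using only the standard socket module. It assumes the cloud side opens one TCP connection, sends the message, and closes it; receive_once is a hypothetical helper name, not an SDK function.

```python
import socket


def receive_once(port=4010, host="0.0.0.0", timeout=None, bufsize=4096):
    """Accept one inbound TCP connection and return everything it sends."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Allow quick restarts without waiting for the old socket to expire.
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen(1)
        server.settimeout(timeout)  # None means wait indefinitely
        conn, _addr = server.accept()
        with conn:
            conn.settimeout(timeout)
            chunks = []
            while True:
                data = conn.recv(bufsize)
                if not data:  # the sender closed the connection
                    break
                chunks.append(data)
    return b"".join(chunks)
```

Calling print(receive_once().decode()) while hologram network connect is up should behave roughly like the nc command: it blocks until one message arrives on port 4010.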

AndrewGifft · March 20, 2019, 8:05pm

@Reuben are you talking about eDRX / PSM (see: LTE eDRX and PSM Technology Explained for LTE-M1 | Link Labs) when talking about the longer radio power-downs on idle that are built into Cat-M1?

I think on most modems both are disabled by default, at least on the Sara R410M. Monitoring the current draw of my R410M-based boards, I can see current spikes every ~1.28s for the paging cycle.

If it is one of these features, a fix should be to explicitly disable them, right?

Reuben · March 20, 2019, 8:23pm

Yes, I think that would be it. I believe the network itself has an ability to assign timing there, so I wonder if maybe some operators are using different values, which might also explain some of the variation we’ve seen. I’m curious whether, with your quick paging cycle, you see the same phenomenon that I’ve described in here.

That page is also very interesting to me since it still claims a paging cycle in seconds even with M1 which shouldn’t cause a problem like this. This seems to conflict with some info we’ve gotten from some carrier partners. We’ll have to do a little more research there. I wonder if there’s a difference here between when it can receive TCP/IP vs when it can receive SMS or something more at the network level.

Reuben · March 20, 2019, 8:53pm

One other thing @AndrewGifft can you post the output of AT+CEDRXS? on that board?

AndrewGifft · March 20, 2019, 9:04pm

Blank response basically:

AT+CEDRXS?
+CEDRXS:

OK

ATI command for reference:

Manufacturer: u-blox
Model: SARA-R410M-02B
Revision: L0.0.00.00.05.06 [Feb 03 2018 13:00:41]
SVN: 00
IMEI: 3527xxxxxxxxxxx

OK

Response from AT+CEDRXRDP:

CEDRXRDP: 0

OK

Here is a plot of my device current, note there is an LED that blinks with a period of 1/2s which is the lower amplitude square wave, you can see the network pings at ~1.25s. This behavior persists indefinitely on these devices (have had them up monitoring current for days, same behavior).

Reuben · March 20, 2019, 9:32pm

Interesting. Yeah it is disabled on yours but on mine it is not. I never did that intentionally so I think some networks might be pushing that setting.

Though I can disable that if I want to.

We just did some more testing around here. If I do turn it off, then sending messages no longer has an issue.
With it on, we’re seeing that the TCP SYN packet needs to be retried a bunch of times, and with TCP backoff each retry takes longer and longer. So even though my Nova is only going 9 seconds between tower connects, it ends up taking a lot longer for the message to hit it, since we’re basically waiting for the timing to line up right.
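
A rough way to see why a 9-second wake interval can still mean a long wait is to simulate it. This is only a sketch under assumed numbers: a sender whose first SYN retry comes after 1 second and doubles each time (the Linux default pattern, not confirmed to be what the Hologram backend uses), a hypothetical 1-second listen window, and the 9-second cycle mentioned above. The function names are made up for illustration.

```python
def syn_attempt_times(initial_rto=1.0, retries=6):
    """Times (in seconds) at which SYNs go out, doubling the wait after each retry."""
    times, t, rto = [0.0], 0.0, initial_rto
    for _ in range(retries):
        t += rto
        times.append(t)
        rto *= 2
    return times


def first_delivery(phase, cycle=9.0, window=1.0, attempts=None):
    """Return the time of the first SYN that lands in a listen window, or None.

    The modem is modeled as listening during
    [k * cycle + phase, k * cycle + phase + window) for every integer k.
    """
    attempts = syn_attempt_times() if attempts is None else attempts
    for t in attempts:
        if (t - phase) % cycle < window:
            return t
    return None
```

With these assumed values, a modem whose listen window starts 4 seconds after the first SYN is not reached until the retry at 31 seconds, and one whose window starts 5 seconds after is missed by all seven attempts.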

So anyway, this does show that the new features built into Cat-M are the cause of this, but we think it also points to an issue with some of our carrier partners’ networking settings and we’re going to be reaching out to them.

In the meantime, the fix we were already working on should make this more reliable and it sounds like we have a new workaround which is to turn off eDRX on the board to speed things up as well. We might build that into the next version of the SDK receive command.

AndrewGifft · March 20, 2019, 9:45pm

Great to hear, I will be adding that command to my initialization as well, although it would be interesting to know if the carrier can override that setting and re-enable (meaning it would have to be re-disabled periodically).

Hopefully AT+CEDRXS=3 actually disables eDRX and does not allow the network to re-enable. Note this setting is not saved in NVM so needs to be re-issued each power cycle.

Reuben · March 22, 2019, 5:09pm

Hey @Blake we deployed the update this morning. On top of increasing the timeout, it will give better error responses now about what actually went wrong.

That being said, I am still seeing some messages get missed when eDRX is enabled so we may need to tweak things further. We’ll keep looking at it. I think at this point, we’d recommend disabling eDRX on the modem if you want to be able to receive stuff without any issues.

You can do this by issuing the command AT+CEDRXS=3
We’ll probably build an easier way to do this into the SDK at some point in the future.

Note that this will increase overall power usage a little bit.
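
If you want to script this instead of typing it into screen, a minimal sketch might look like the following. It assumes a serial-like object with write() and readline() methods (for example one opened with the third-party pyserial package); the helper names send_at and disable_edrx are hypothetical and not part of the Hologram SDK.

```python
def send_at(port, command, max_lines=20):
    """Write one AT command and collect response lines until OK or ERROR."""
    port.write((command + "\r\n").encode("ascii"))
    lines = []
    for _ in range(max_lines):
        line = port.readline().decode("ascii", errors="replace").strip()
        if not line:  # blank line or read timeout, keep waiting
            continue
        lines.append(line)
        if line == "OK" or "ERROR" in line:
            break
    return lines


def disable_edrx(port):
    """Issue AT+CEDRXS=3 and report whether the modem answered OK."""
    return send_at(port, "AT+CEDRXS=3")[-1:] == ["OK"]
```

With pyserial installed, the call would be something like disable_edrx(serial.Serial("/dev/ttyUSB0", 115200, timeout=2)), where the device path and baud rate are guesses that will differ per setup. As noted earlier in the thread, the setting may not survive a power cycle, so it would need to run on every boot.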

Reuben · March 22, 2019, 5:10pm

Oh, I should also mention that the R410 can be a little slow just to close the socket at the end of the receive for some reason so it can take a minute for the message to print out.

Blake · March 22, 2019, 5:42pm

@Reuben @AndrewGifft Thanks for the information and updates. I’ve been away from the project the past two days but will try out the updates this weekend.

Does the update affect whether you need to disable eDRX each power cycle, or will it need to be re-issued each time as Andrew indicated?

I’m very new to cellular communication. I’ve googled around on this question but can’t find a straightforward answer: When using eDRX, the radio is off, disconnecting the device from the cell network. It seems like this can be anywhere from 10 seconds up to 40 minutes or longer between check-ins. If data is sent to the device while it is disconnected, what happens to that data? Will the network wait for the device to come back online and then send the data? Is it lost? Is it up to the sender to keep track of the device’s ability to receive (like pinging it first or something)?

Thanks.

Edit: I see this:

It is recommended that a “store and forward” policy should be supported for PSM. The operator should consider storing/forwarding the last received packets or an SMS (whichever is supported) to be sent to the device when it awakens. At a minimum, the last packet of 100 bytes should be sent, to allow the customer to send a simple message. Any store/forward limitations should be communicated to the customer as part of a service level agreement.

In this document:
https://www.gsma.com/newsroom/wp-content/uploads/CLP.28v1.0.pdf

Would the operator in this instance be AT&T, Hologram, or my sending device?

Reuben · March 22, 2019, 5:57pm

Yeah, so if data is sent while disconnected at the moment, it really only gets stored for maybe a minute or so until the usual TCP timeout. In this case, the store and forward policy is kind of a combo of different operators so that’s why it’s a little messy right now.

The update we pushed does not affect whether the local operator can try to suggest that the modem use eDRX, so you might still need to send that command occasionally. It’s a little unclear when that switch might happen by itself. When I turn it off on my modem now it doesn’t seem to always get turned back on. It might only be when it sees an operator for the first time or something.

Sorry, can’t give super definite answers to these questions. As you may have seen from other threads, Cat-M1 is still pretty new and there are still a lot of policies getting formulated as more carriers add support.

Blake · March 22, 2019, 6:09pm

Right, but wasn’t today’s update meant to increase the TCP timeout?

Did it get increased to a minute from something less? Or what is it? Because, if I understand correctly, the DRX sleep cycle (or minimum eDRX cycle with 1 hyperframe) is 10.24 seconds, which is less than a minute. So shouldn’t it theoretically wake up and get the message before TCP times out?
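
For reference, the arithmetic behind that question can be sketched as follows. One LTE hyperframe is 1024 radio frames of 10 ms each, which gives the 10.24 seconds; the 60-second timeout is only the "maybe a minute or so" figure from this thread, not a confirmed value, and the function names are hypothetical.

```python
FRAME_MS = 10                  # one LTE radio frame lasts 10 ms
FRAMES_PER_HYPERFRAME = 1024   # one hyperframe is 1024 radio frames


def edrx_cycle_ms(hyperframes=1):
    """Length of an eDRX cycle, in milliseconds, for a given hyperframe count."""
    return hyperframes * FRAMES_PER_HYPERFRAME * FRAME_MS


def wake_cycles_within(timeout_ms, hyperframes=1):
    """Number of whole eDRX cycles that fit inside a timeout."""
    return timeout_ms // edrx_cycle_ms(hyperframes)
```

With a 60-second timeout this gives 5 full cycles, so the modem should wake several times before the timeout expires. Whether a packet actually gets through during one of those wake-ups depends on whether the network buffers it or a retransmission happens to land in the window, which this count does not model.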