Minimizing data usage and protocol timeouts

I’m trying to understand how to minimize communication overhead in order to get the most data transferred per $$. In other posts it is explained that the typical per-message overhead is around 500 bytes, which sounds like the TCP SYN + SYN-ACK + data + ACK + FIN + FIN-ACK overheads. With this context I’m wondering about keeping TCP connections alive across messages in order to avoid all the connection establishment and tear-down overhead.

I have a variety of use-cases of devices that send relatively small messages with ~20-30 bytes of payload. Some send every minute, some every 5, others every 60 minutes. I have the feeling that slightly different strategies are needed for these. Some specific questions I have:

  • After how many seconds of inactivity does the hologram socket API server disconnect? E.g. can I send a message every 60 seconds on the same socket or do I have to establish a fresh TCP connection each time?
  • Is there something causing TCP connections to break after some time in the cell network (e.g., NAT timeout or IP address change)?
  • Is my evaluation correct that if I want to minimize overhead by keeping connections open I better use my own server and avoid the Hologram cloud?

My last question related to all this is whether I’m being billed for the PPP layer encapsulation or only for the actual IP-level bytes.


Good question! I was wondering about the overhead as well; it seems high at 500-700 bytes per message.

I know it is not raw TCP, but still … what does it need that much overhead for?

(I have a similar usage case to yours, with small packets)

Well, 500 bytes is roughly what it takes to set up a TCP connection, send a small message, and tear the connection down. The minimum IP header is 20 bytes and the minimum TCP header is also 20 bytes.
A minimal TCP connection that transfers data in both directions and is closed by a server-side timeout consists of a SYN packet, a SYN-ACK reply, an ACK completing the handshake, a data packet, a response packet with piggy-backed ACK, an ACK for that, and then on timeout a FIN packet, a FIN-ACK reply, and a final ACK. Depending on timing at the server end the initial data packet may see a bare ACK followed by the response in a separate packet. So that’s easily 9 packets times 40 bytes of header overhead, i.e. 360 bytes, or 10 packets and 400 bytes if the ACK and the response are split.
Often there are TCP options that take up some more bytes and I’m not clear whether we’re charged for the PPP protocol overhead which adds a few bytes to each packet as well.
Then comes the message itself. The minimal 1-byte message with one 1-byte topic would be something like
{k:"12345678",d="X",t="T"}
where X is the message and T is the topic; the wrapper adds another 24 bytes of overhead. So I’m not quite getting to 500 bytes, but you can see what’s happening. If I had a SIM I could dump the packets and provide a concrete example.
If we can keep a connection open, then each message would have ~80 bytes of overhead (the headers on the message packet itself plus the ACK reply), assuming that no explicit response comes back from the server, and more like 120 bytes of overhead otherwise.
If you use UDP you lower the per-packet overhead a bit and you get to control exactly when a packet is sent, but you also have to handle lost packets yourself. However, for some applications this is actually pretty easy.
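
To make the “handle lost packets yourself” part concrete, the simplest scheme is: send, wait briefly for an application-level ACK, resend a couple of times if nothing comes back. A sketch in Go (the address is just a placeholder since there is no UDP endpoint today, and the 2-second timeout and 3 attempts are arbitrary):

package main

import (
    "log"
    "net"
    "time"
)

// sendWithRetry sends one datagram and treats any reply as the ACK,
// retransmitting a couple of times if no reply arrives in time.
func sendWithRetry(addr string, payload []byte) error {
    conn, err := net.Dial("udp", addr)
    if err != nil {
        return err
    }
    defer conn.Close()

    buf := make([]byte, 64)
    var lastErr error
    for attempt := 0; attempt < 3; attempt++ {
        if _, err := conn.Write(payload); err != nil {
            return err
        }
        conn.SetReadDeadline(time.Now().Add(2 * time.Second))
        if _, rerr := conn.Read(buf); rerr == nil {
            return nil // some reply came back; treat it as the ACK
        } else {
            lastErr = rerr
        }
    }
    return lastErr
}

func main() {
    // Placeholder address and key for illustration only.
    if err := sendWithRetry("198.51.100.1:9999", []byte(`{"k":"12345678","d":"X","t":"T"}`)); err != nil {
        log.Fatal(err)
    }
}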

After posting I realized I didn’t need to wait for a SIM, I could easily try this on any linux box. So I ran a tcpdump in one window and ran my little example from above in another:
echo '{k:"12345678",d="X",t="T"}' | nc -q 10 23.253.146.203 9999
which printed [2,0] since a key of 12345678 isn’t valid, but that doesn’t really change the example very much. The tcpdump output (I’m omitting the packet details except for the one that carries the message payload):

> sudo tcpdump -n -X -e host 23.253.146.203
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
22:10:52.296729 f4:6d:04:ed:62:ca > 36:e6:6a:0e:97:b1, ethertype IPv4 (0x0800), length 74: 192.168.0.3.42996 > 23.253.146.203.9999: Flags [S], seq 3484201343, win 29200, options [mss 1460,sackOK,TS val 910947861 ecr 0,nop,wscale 7], length 0
22:10:52.569254 36:e6:6a:0e:97:b1 > f4:6d:04:ed:62:ca, ethertype IPv4 (0x0800), length 74: 23.253.146.203.9999 > 192.168.0.3.42996: Flags [S.], seq 3967561882, ack 3484201344, win 14480, options [mss 1298,sackOK,TS val 2405650858 ecr 910947861,nop,wscale 9], length 0
22:10:52.569296 f4:6d:04:ed:62:ca > 36:e6:6a:0e:97:b1, ethertype IPv4 (0x0800), length 66: 192.168.0.3.42996 > 23.253.146.203.9999: Flags [.], ack 1, win 229, options [nop,nop,TS val 910947929 ecr 2405650858], length 0
22:10:52.569366 f4:6d:04:ed:62:ca > 36:e6:6a:0e:97:b1, ethertype IPv4 (0x0800), length 93: 192.168.0.3.42996 > 23.253.146.203.9999: Flags [P.], seq 1:28, ack 1, win 229, options [nop,nop,TS val 910947929 ecr 2405650858], length 27
        0x0000:  4500 004f 476c 4000 4006 87c9 c0a8 0003  E..OGl@.@.......
        0x0010:  17fd 92cb a7f4 270f cfac b180 ec7c 309b  ......'......|0.
        0x0020:  8018 00e5 7ae6 0000 0101 080a 364b f659  ....z.......6K.Y
        0x0030:  8f63 51aa 7b6b 3a22 3132 3334 3536 3738  .cQ.{k:"12345678
        0x0040:  222c 643d 2258 222c 743d 2254 227d 0a    ",d="X",t="T"}.
22:10:52.844227 36:e6:6a:0e:97:b1 > f4:6d:04:ed:62:ca, ethertype IPv4 (0x0800), length 66: 23.253.146.203.9999 > 192.168.0.3.42996: Flags [.], ack 28, win 29, options [nop,nop,TS val 2405650927 ecr 910947929], length 0
22:10:52.846726 36:e6:6a:0e:97:b1 > f4:6d:04:ed:62:ca, ethertype IPv4 (0x0800), length 71: 23.253.146.203.9999 > 192.168.0.3.42996: Flags [P.], seq 1:6, ack 28, win 29, options [nop,nop,TS val 2405650928 ecr 910947929], length 5
22:10:52.846755 f4:6d:04:ed:62:ca > 36:e6:6a:0e:97:b1, ethertype IPv4 (0x0800), length 66: 192.168.0.3.42996 > 23.253.146.203.9999: Flags [.], ack 6, win 229, options [nop,nop,TS val 910947998 ecr 2405650928], length 0
22:10:52.846763 36:e6:6a:0e:97:b1 > f4:6d:04:ed:62:ca, ethertype IPv4 (0x0800), length 66: 23.253.146.203.9999 > 192.168.0.3.42996: Flags [F.], seq 6, ack 28, win 29, options [nop,nop,TS val 2405650928 ecr 910947929], length 0
22:10:52.846840 f4:6d:04:ed:62:ca > 36:e6:6a:0e:97:b1, ethertype IPv4 (0x0800), length 66: 192.168.0.3.42996 > 23.253.146.203.9999: Flags [F.], seq 28, ack 7, win 229, options [nop,nop,TS val 910947999 ecr 2405650928], length 0
22:10:53.110595 36:e6:6a:0e:97:b1 > f4:6d:04:ed:62:ca, ethertype IPv4 (0x0800), length 66: 23.253.146.203.9999 > 192.168.0.3.42996: Flags [.], ack 29, win 29, options [nop,nop,TS val 2405650996 ecr 910947999], length 0

The packet lengths include a 14-byte Ethernet header which doesn’t apply over cellular; subtracting that, we have 60+60+52+79+52+57+52+52+52+52 = 568 bytes!

I then proceeded to turn off TCP timestamps (one of the TCP options enabled on my system; on Linux this is the net.ipv4.tcp_timestamps sysctl) and ran the same test, this time using tcpdump’s more verbose output so I get the actual IP packet length without the Ethernet header:

22:31:18.983050 f4:6d:04:ed:62:ca > 36:e6:6a:0e:97:b1, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 64, id 62682, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.0.3.43698 > 23.253.146.203.9999: Flags [S], cksum 0x873e (correct), seq 3886895327, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
22:31:19.272847 36:e6:6a:0e:97:b1 > f4:6d:04:ed:62:ca, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 47, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    23.253.146.203.9999 > 192.168.0.3.43698: Flags [S.], cksum 0xa2c3 (correct), seq 446628723, ack 3886895328, win 14600, options [mss 1298,nop,nop,sackOK,nop,wscale 9], length 0
22:31:19.272889 f4:6d:04:ed:62:ca > 36:e6:6a:0e:97:b1, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 64, id 62683, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.0.3.43698 > 23.253.146.203.9999: Flags [.], cksum 0x1b19 (correct), seq 1, ack 1, win 229, length 0
22:31:19.272959 f4:6d:04:ed:62:ca > 36:e6:6a:0e:97:b1, ethertype IPv4 (0x0800), length 81: (tos 0x0, ttl 64, id 62684, offset 0, flags [DF], proto TCP (6), length 67)
    192.168.0.3.43698 > 23.253.146.203.9999: Flags [P.], cksum 0x0697 (correct), seq 1:28, ack 1, win 229, length 27
22:31:19.547341 36:e6:6a:0e:97:b1 > f4:6d:04:ed:62:ca, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 47, id 64657, offset 0, flags [DF], proto TCP (6), length 40)
    23.253.146.203.9999 > 192.168.0.3.43698: Flags [.], cksum 0x1bc6 (correct), seq 1, ack 28, win 29, length 0
22:31:19.557319 36:e6:6a:0e:97:b1 > f4:6d:04:ed:62:ca, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 47, id 64658, offset 0, flags [DF], proto TCP (6), length 45)
    23.253.146.203.9999 > 192.168.0.3.43698: Flags [P.], cksum 0x3756 (correct), seq 1:6, ack 28, win 29, length 5
22:31:19.557347 f4:6d:04:ed:62:ca > 36:e6:6a:0e:97:b1, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 64, id 62685, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.0.3.43698 > 23.253.146.203.9999: Flags [.], cksum 0x1af9 (correct), seq 28, ack 6, win 229, length 0
22:31:19.558083 36:e6:6a:0e:97:b1 > f4:6d:04:ed:62:ca, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 47, id 64659, offset 0, flags [DF], proto TCP (6), length 40)
    23.253.146.203.9999 > 192.168.0.3.43698: Flags [F.], cksum 0x1bc0 (correct), seq 6, ack 28, win 29, length 0
22:31:19.558170 f4:6d:04:ed:62:ca > 36:e6:6a:0e:97:b1, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 64, id 62686, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.0.3.43698 > 23.253.146.203.9999: Flags [F.], cksum 0x1af7 (correct), seq 28, ack 7, win 229, length 0
22:31:19.846464 36:e6:6a:0e:97:b1 > f4:6d:04:ed:62:ca, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 47, id 64660, offset 0, flags [DF], proto TCP (6), length 40)
    23.253.146.203.9999 > 192.168.0.3.43698: Flags [.], cksum 0x1bbf (correct), seq 7, ack 29, win 29, length 0

This now results in packets of length 52+52+40+67+40+45+40+40+40+40 = 456 bytes. So that eliminated 112 bytes right there! It’s probably possible to shave off another few bytes from the first two packets, but that might get iffy. After that, well, need to find a strategy that doesn’t require setting up a fresh TCP connection for each packet exchange…

I should note that a message with a correct key will look slightly different because, as far as I understand, the server won’t close the connection immediately. But in broad strokes it should look very much the same.
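
For reference, reusing one connection for several messages would look something like this from the device side (just a sketch: the key is a placeholder, and whether the server or the carrier NAT actually keeps the connection alive between sends is exactly the open question here):

package main

import (
    "log"
    "net"
    "time"
)

func main() {
    // One connection for several messages, to amortize the setup/teardown
    // overhead counted above.
    conn, err := net.Dial("tcp", "23.253.146.203:9999")
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    buf := make([]byte, 64)
    for i := 0; i < 3; i++ {
        // Placeholder key; payload/topic as in the examples above.
        if _, err := conn.Write([]byte(`{"k":"12345678","d":"X","t":"T"}` + "\n")); err != nil {
            log.Fatal(err) // a write error here likely means the connection was dropped
        }
        conn.SetReadDeadline(time.Now().Add(10 * time.Second))
        if n, err := conn.Read(buf); err == nil {
            log.Printf("response: %q", buf[:n])
        } else {
            log.Printf("no response: %v", err)
        }
        time.Sleep(60 * time.Second)
    }
}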

This is a great analysis.
One thing: that error message is because it’s not proper JSON. Try putting quotes around the keys (and use : instead of =).
You can also test with our compact message format, which is what the Dash uses. That would shave a few bytes:
A12345678 TT SX

(The Dash would send slightly more bytes than that because it uses a different authentication scheme but it should be close)

Thanks for catching the JSON error; corrected, it’s:

echo '{"k":"12345678","d":"X","t":"T"}' | nc -q 10 23.253.146.203 9999
[0,0]

Can you explain why I’m not getting an auth error?

Also, I have to admit I’m left really wondering why Hologram doesn’t support UDP.

Yeah, so for security and efficiency reasons the actual authentication happens at a different step. Basically nobody can brute-force a key this way.

Agreed on the benefits of UDP. It’s something we’re looking into.

Some further thoughts:

If my device is sending at an interval smaller than the cell carrier’s timeout, then UDP can be more efficient for many simple use-cases. I wonder what happens at intervals that are greater than the timeout. If there is not some special charge for reconnecting after timeout then I may not care about the timeout. The two downsides I see are power consumption to reconnect and inability to send packets to the device. The latter may or may not be an issue if the cloud can queue packets. E.g., in many use-cases I may be fine if I can only send something to the device when it sends an outgoing message to the cloud. If I need two-way comms for troubleshooting or such I could always send it a message to switch to a different mode for some time-period.

Over at particle.io I read that carriers use shorter timeouts for UDP than TCP (see Network Timers | AT&T Developer for an example), which makes me wonder: why not send connectionless datagrams using fake TCP headers? That requires some network hackery on the Linux side but should not be that difficult. It’s probably not a good avenue for Hologram, though, as it would eventually sour relationships with carriers.

I also wonder whether one couldn’t keep a TCP NAT entry alive by abusing the TCP protocol a little. E.g., is a reply by the cloud required, or is an outgoing packet sufficient? The outgoing packet could be specially crafted to be the minimum 40 bytes and the cloud server could be configured not to reply. Actually, one can probably craft the outgoing packet with a sequence number that causes the Linux stack not to reply; I need to go and read up on those details…

Mhhh, somehow I’m sure others have played with such tricks before…

(BTW the comment that “Basically nobody can brute force a key this way” has me concerned, 'cause I don’t see how not providing a proper auth response helps much here and so I wonder how secure your system is.)

Sounds like you really know your networking. Really appreciate your thoughts here.

We can’t get into too much detail on carrier stuff but in some cases we have control over those timeouts and in some cases we don’t depending on our agreements and relationships.

We’re going down similar lines of thought as you and are experimenting with different protocols to see what would work well here. Some sort of UDP-based custom protocol that took certain concepts from TCP in order to acknowledge delivery would be a good way to save data while preventing lost messages but it might be a little while before we have that fully fleshed out.

Reply is actually not required right now so if you close the connection immediately after sending it should still go through (depending on things being buffered by the OS) and you may not have to incur any bytes on the response. It’s very timing dependent though and we’ve seen on some devices that it’s a coin flip on whether it actually sends the message or not.
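
For illustration, the fire-and-forget pattern described here is just connect, write, close, without waiting for the response. A sketch (placeholder key; endpoint taken from the captures earlier in the thread):

package main

import (
    "log"
    "net"
    "time"
)

func main() {
    conn, err := net.DialTimeout("tcp", "23.253.146.203:9999", 10*time.Second)
    if err != nil {
        log.Fatal(err)
    }
    if _, err := conn.Write([]byte(`{"k":"12345678","d":"X","t":"T"}` + "\n")); err != nil {
        log.Fatal(err)
    }
    // Close right away instead of waiting for the [0,0] response; as noted,
    // whether the message actually makes it out first is timing-dependent.
    conn.Close()
}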

Heh, good point. I didn’t mean to come across as that confident, as nothing like that is really impossible. The main advantage of this architecture on the security side is that an attacker doesn’t know whether he actually found the right key, and really would have no idea whether his message made it into a customer’s account unless he had access to the account itself, at which point this is moot. This makes it incredibly difficult to inject invalid data into someone’s account, as you never really know what their key is.

That being said, we realize that this isn’t great feedback for customers either and are doing some reevaluation around here about whether we want to make changes to our architecture to provide that authentication response. It’s a slightly bigger change because there’s actually a physical separation between that TCP endpoint service and our actual cloud service that does the authentication and routing.

Thanks for the thoughtful reply.

Reply is actually not required right now so if you close the connection immediately after sending it should still go through (depending on things being buffered by the OS) and you may not have to incur any bytes on the response. It’s very timing dependent though and we’ve seen on some devices that it’s a coin flip on whether it actually sends the message or not.

If you close the connection you will at the very least incur a FIN packet. What I’m wondering is whether a crafted packet isn’t a better solution. If you have a long-lived TCP connection, you could craft a packet that the server doesn’t respond to.

E.g., open a raw IP socket (if using Linux), craft a TCP SYN packet and send to cloud, receive corresponding SYN-ACK, respond with ACK. Now you have an open connection. Then, every 1700 seconds (assuming 1800 sec timeout) send a data packet but use a sequence number that is bogus. I haven’t looked in years but IIRC there is a range where the server considers the packet to be an attack and neither responds nor sends a connection reset. If you can’t make the Linux TCP stack on the cloud end cooperate you can use a raw socket there too. All you have to do is send valid TCP packets in the whole process and hope the carriers keep their timeout logic simple enough not to notice that this is a pure keep-alive socket with no real data.

If this works, it seems to me that you could get away with ~63KB/mo of data to keep the connection alive 24x7: 3600×24×31 / 1700 ≈ 1576 packets at 40 bytes each. OK, I’m sure there’s a bit more, but hopefully <100KB.
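
For concreteness, crafting that minimal keep-alive packet could look roughly like the following, using the third-party gopacket library. This is just a sketch: all addresses, ports, and sequence numbers are placeholders for a connection you track yourself, it needs root (or CAP_NET_RAW), and note that a modern stack may answer an out-of-window segment with a challenge ACK (RFC 5961) rather than staying silent, so this would need testing against the actual server. If the whole connection lives in user space you also have to stop the kernel from RST-ing the server’s replies (e.g. with an iptables rule).

package main

import (
    "log"
    "net"

    "github.com/google/gopacket"
    "github.com/google/gopacket/layers"
)

func main() {
    srcIP := net.ParseIP("10.170.0.77").To4()    // placeholder: device address
    dstIP := net.ParseIP("23.253.146.203").To4() // placeholder: server address

    // The IPv4 layer is only used for the TCP pseudo-header checksum; the
    // kernel builds the real 20-byte IP header for the raw socket below.
    ip := &layers.IPv4{SrcIP: srcIP, DstIP: dstIP, Protocol: layers.IPProtocolTCP}
    tcp := &layers.TCP{
        SrcPort: 42996, // placeholder: ports of the live connection
        DstPort: 9999,
        Seq:     12345, // deliberately bogus sequence number
        Ack:     1,
        ACK:     true,
        Window:  229,
        // No payload: a bare ACK, so IP + TCP = 40 bytes on the wire.
    }
    if err := tcp.SetNetworkLayerForChecksum(ip); err != nil {
        log.Fatal(err)
    }

    buf := gopacket.NewSerializeBuffer()
    opts := gopacket.SerializeOptions{ComputeChecksums: true, FixLengths: true}
    if err := gopacket.SerializeLayers(buf, opts, tcp); err != nil {
        log.Fatal(err)
    }

    conn, err := net.ListenPacket("ip4:tcp", "0.0.0.0")
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()
    if _, err := conn.WriteTo(buf.Bytes(), &net.IPAddr{IP: dstIP}); err != nil {
        log.Fatal(err)
    }
}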

Thanks for the analysis tve!

In the past I made a protocol (wrapper around UDP) for a SCADA system where I did not want the overhead of setting up / tearing down a TCP connection for small data packets.

Given that the cellular packets go to the hologram servers for translation, perhaps UDP should be used with some scheme to set up a (virtual) connection with one message, returning a handle to the client - similar to what I did for that system.

Subsequent messages could consist of the handle, sequence number, and an encrypted minimal data frame (packet size, data, CRC).

It should be possible to keep the overhead to ~60 bytes.

Another message could be used to close the connection and release the handle.

This would take the minimum packet down to about 60-80 bytes of overhead, instead of 500-800 bytes of overhead.

Heck, it may be possible to reduce the header down to ~20 bytes by storing more state (ip, port, etc) on the hologram server, and only using the handle+sequence+length+crc+encrypted user data
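
To make that concrete, the framing I have in mind would be something like the following sketch (the field sizes are just guesses, and the payload is assumed to be encrypted by the caller):

package main

import (
    "encoding/binary"
    "fmt"
    "hash/crc32"
)

// encodeFrame lays out: 8-byte handle | 4-byte sequence | 2-byte length |
// payload | 4-byte CRC32 over everything before it.
func encodeFrame(handle uint64, seq uint32, payload []byte) []byte {
    frame := make([]byte, 8+4+2+len(payload)+4)
    binary.BigEndian.PutUint64(frame[0:8], handle)
    binary.BigEndian.PutUint32(frame[8:12], seq)
    binary.BigEndian.PutUint16(frame[12:14], uint16(len(payload)))
    copy(frame[14:], payload)
    crc := crc32.ChecksumIEEE(frame[:14+len(payload)])
    binary.BigEndian.PutUint32(frame[14+len(payload):], crc)
    return frame
}

func main() {
    f := encodeFrame(0x1122334455667788, 1, []byte("20-30 byte sensor data"))
    // 18 bytes of framing plus the payload, on top of 28 bytes of IP+UDP headers.
    fmt.Printf("frame: %d bytes + 28 bytes of IP/UDP headers\n", len(f))
}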

Right now, I am waiting for my developer SIM to arrive (I haven’t received a confirmation email yet) and thinking of applications.

I wonder what the packet overhead is for sending SMS, and what the latency would be.

Reuben - glad to see you guys are working on making the overhead lower 🙂

Most of my potential applications would involve small packets in the range of 50-100 bytes of data, so unless the overhead gets much lower, SMS transmission would be a better fit for me - as otherwise 80-90% of the data traffic would be overhead.

Packing multiple data sets per message would reduce the frequency of updates - which is undesirable for my potential uses.

Heck, it may be possible to reduce the header down to ~20 bytes by storing more state (ip, port, etc) on the hologram server, and only using the handle+sequence+length+crc+encrypted user data

You still have to talk IP across the cellular network; that’s 20 bytes per packet right there. As far as I can tell, you’re also best off talking a known protocol, or the carrier NAT machinery will not work in your favor. Look at how AT&T has different timeouts for different IP protocols.

You probably also need some form of ACK in many use-cases. If you can tell the server “expect a packet at least every minute from me” then you could use a NACK scheme where the cloud sends a NACK if it doesn’t receive a packet. You may still need infrequent ACKs as a fall-back and it’s more complicated overall, but that’s the only way I can imagine to get below 2x(IP+UDP) overhead per message.
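
The cloud-side half of such a NACK scheme is basically a watchdog that remembers when it last heard from each device. A sketch (device IDs and the actual NACK delivery are placeholders):

package main

import (
    "log"
    "sync"
    "time"
)

// watchdog tracks when each device was last heard from and raises a NACK-style
// alert when a device misses its expected reporting interval. How the NACK
// actually reaches the device (queued for its next contact, etc.) is left open.
type watchdog struct {
    mu       sync.Mutex
    lastSeen map[string]time.Time
    interval time.Duration
}

func newWatchdog(interval time.Duration) *watchdog {
    w := &watchdog{lastSeen: make(map[string]time.Time), interval: interval}
    go w.check()
    return w
}

// heard is called by the receive loop for every packet that arrives.
func (w *watchdog) heard(device string) {
    w.mu.Lock()
    w.lastSeen[device] = time.Now()
    w.mu.Unlock()
}

func (w *watchdog) check() {
    for range time.Tick(10 * time.Second) {
        w.mu.Lock()
        for dev, t := range w.lastSeen {
            if overdue := time.Since(t); overdue > w.interval {
                log.Printf("NACK %s: nothing heard for %v", dev, overdue)
            }
        }
        w.mu.Unlock()
    }
}

func main() {
    w := newWatchdog(time.Minute)
    w.heard("device-1") // in a real server this call sits in the UDP receive loop
    select {}
}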

Thanks - I have essentially zero knowledge of how the data is carried over the cellular network.

Revised suggestion based on your response:

Standard minimalist IP packet to the Hologram server, UDP, with an embedded “transaction id” and encrypted user data. CRC, length, etc. left as normal, so basically a standard UDP packet with an embedded custom encrypted frame.

Assuming an eight byte transaction id, that would make it 28 byte minimum + encrypted user data, after the “virtual conversation” was set up.

Even 50 bytes overhead would be 1/10th or less of the current method.

For proper encryption I believe you need a little more than that, but not a lot. Also, if you compare to the existing TCP method you have to include the cost of some ACK since the TCP method does provide one. The ACK would have the same overhead as a minimal payload (the ACK needs to be encrypted to be proper). If you don’t need ACKs then you are correct.

Probably - I was pretty much waving my hands in the air, hoping to store most of that state information on the server, with the “transaction id” used on the server to retrieve the associated information and with encryption keys shared during the “open connection” transaction.

Good point about the need for an ACK (and NACK) message, but that could be kept really light: just a minimal packet with the “handle” and the ack/nack in a tiny encrypted frame (so a man-in-the-middle attack could not easily spoof NACKs).

Basically, I love the idea of affordable IoT cell connectivity; however, a 500-700 byte per-message overhead kills a lot of potential applications, with 1MB of data being consumed by roughly 1000 messages per month - not sufficient for a lot of potential uses.

If the overhead is lowered sufficiently (say 50 bytes or less per message) we are looking at approx 10k messages a month - much more reasonable for devices generating say 50 bytes of user data per message.

I got my SIMs a week ago and finally found time to prototype something… I wrote a tiny service that listens on UDP port 9999, prints incoming packets, and responds with “OK”. So far it looks encouraging. Here’s the client side (cell connection), where I send something that looks like a real Hologram message but over UDP:

> echo '{"k":"=xxxxxxx","d":"Hello, World!","t":"test"}' | nc -q 10 -u 52.40.91.111 9999
OK

On the server side it shows the source IP and port as well as the message:

2017/08/12 23:11:07 Listening...
RX 185.3.54.10:8817: "{\"k\":\"=xxxxxxx\",\"d\":\"Hello, World!\",\"t\":\"test\"}\n"

I’m running a tcpdump on the wwan0 cell interface, which shows:

16:17:17.863713 IP (tos 0x0, ttl 64, id 20523, offset 0, flags [DF], proto UDP (17), length 76)
    10.170.0.77.46778 > 52.40.91.171.9999: UDP, length 48
        0x0000:  4500 004c 502b 4000 4011 4fac 0aaa 004d  E..LP+@.@.O....M
        0x0010:  3428 5bab b6ba 270f 0038 eff4 7b22 6b22  4([...'..8..{"k"
        0x0020:  3a22 3d65 0000 0000 2976 222c 2264 223a  :"=xxxxxxx","d":
        0x0030:  2248 656c 6c6f 2c20 576f 726c 6421 222c  "Hello,.World!",
        0x0040:  2274 223a 2274 6573 7422 7d0a            "t":"test"}.
16:17:18.800495 IP (tos 0x5c, ttl 39, id 54387, offset 0, flags [DF], proto UDP (17), length 31)
    52.40.91.171.9999 > 10.170.0.77.46778: UDP, length 3
        0x0000:  455c 001f d473 4000 2711 e434 3428 5bab  E\...s@.'..44([.
        0x0010:  0aaa 004d 270f b6ba 000b 2df9 4f4b 0a    ...M'.....-.OK.

So that’s 76+31 bytes for a 13 byte message and an ACK :-).

I’m currently on T-Mobile, and doing the same exchange 9 minutes later worked like a charm but yielded a fresh source port, i.e., a fresh NAT entry. However, it looks like the exchange takes 4-5 seconds; perhaps the modem first has to acquire a carrier? Need to figure that one out… (I use a somewhat ancient Sierra 313U.)

My ultra-simple GW code can be found at GitHub - tve/hologw: UDP gateway for hologram cell service. If you have the Go compiler, compile with go build. On an x64 Linux box you can use the provided hologw binary if you trust me (statically linked, runs on any distro).

I think the next step will be to write something a tad more sophisticated to test out the reliability and the NAT response duration…

I did some more testing, more of the anecdotal form than rigorous… I added a delay option to my GW code so that I can send it a JSON message with an “a” field holding an integer that tells the GW to respond after that many seconds of delay. If I choose a delay of 28 seconds the response comes through. If I choose a delay of 32 seconds it doesn’t. Looks like T-Mobile’s UDP NAT timeout is 30 seconds…
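
The probe itself is trivial: send a message with the “a” delay field and see whether the delayed reply still makes it back. Something like this (the gateway address and the other JSON fields are placeholders):

package main

import (
    "fmt"
    "log"
    "net"
    "time"
)

// natHolds asks the gateway to delay its reply by `delay` seconds (the "a"
// field) and reports whether the reply still made it back through the NAT.
func natHolds(gw string, delay int) bool {
    conn, err := net.Dial("udp", gw)
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    msg := fmt.Sprintf(`{"k":"12345678","d":"ping","t":"test","a":%d}`, delay)
    if _, err := conn.Write([]byte(msg)); err != nil {
        log.Fatal(err)
    }
    conn.SetReadDeadline(time.Now().Add(time.Duration(delay+10) * time.Second))
    buf := make([]byte, 64)
    _, err = conn.Read(buf)
    return err == nil
}

func main() {
    for _, d := range []int{28, 32} {
        fmt.Printf("delay %2ds: reply received = %v\n", d, natHolds("192.0.2.1:9999", d))
    }
}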

I also played a bit with long-lasting TCP connections to see what happens if I have a TCP connection open and send a few bytes after a couple of minutes. Well, anecdotally, the first packet takes a while to get through, causing the TCP stack to retransmit almost immediately. So what should be a ~70 byte outgoing packet with a ~60 byte ACK coming back becomes two copies of the outgoing packet followed by two ACKs. That’s not cool.

My conclusion so far is:

  • If you only care about outbound (dev->GW), use UDP, most likely with an explicit ACK packet, and retransmit according to your app’s needs.
  • If you have regular outbound packets but also want inbound, then queue the inbound messages at the GW and deliver them via UDP in the 30-second window after the next outbound packet.
  • If you want more interactive inbound, then send the device a “start TCP” message that causes it to open a TCP connection so you can use the standard inbound stuff.
  • If the above schemes don’t work for you, then what Hologram currently does is probably the best you can do and what it’s gonna cost…
  • I guess you could do better by faking a TCP connection; that way you can probably get away with one packet per keep-alive and for sure avoid the duplication due to retransmits that I mentioned above.

Last message of me talking to myself… I finished off a very, very minimally working GW: send it a Hologram Socket API message via UDP and it forwards it to Hologram via TCP, reads the response and sends that back via UDP. Bonus feature is to delay the response in order to be able to test the carrier’s UDP NAT timeout. 85 lines of code, no frills, x64 binary provided: GitHub - tve/hologw: UDP gateway for hologram cell service
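
Conceptually the forwarding loop boils down to something like this (a sketch only, not the actual code; the delay feature and most error handling are omitted):

package main

import (
    "log"
    "net"
    "time"
)

// Accept a Hologram Socket API message over UDP, relay it over TCP, and send
// the response back over UDP to whoever sent the datagram.
func main() {
    pc, err := net.ListenPacket("udp", ":9999")
    if err != nil {
        log.Fatal(err)
    }
    defer pc.Close()

    buf := make([]byte, 2048)
    for {
        n, addr, err := pc.ReadFrom(buf)
        if err != nil {
            log.Fatal(err)
        }
        go relay(pc, addr, append([]byte(nil), buf[:n]...))
    }
}

func relay(pc net.PacketConn, addr net.Addr, msg []byte) {
    // 23.253.146.203:9999 is the Socket API endpoint used earlier in the thread.
    conn, err := net.DialTimeout("tcp", "23.253.146.203:9999", 10*time.Second)
    if err != nil {
        log.Println("dial:", err)
        return
    }
    defer conn.Close()
    if _, err := conn.Write(msg); err != nil {
        log.Println("write:", err)
        return
    }
    conn.SetReadDeadline(time.Now().Add(10 * time.Second))
    resp := make([]byte, 256)
    n, err := conn.Read(resp)
    if err != nil {
        log.Println("read:", err)
        return
    }
    if _, err := pc.WriteTo(resp[:n], addr); err != nil {
        log.Println("udp reply:", err)
    }
}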