---
title: "How to talk to your microcontroller over serial"
date: 2018-05-19T08:09:46+02:00
---
Scroll to the end for the TL;DR.
In this article I will give an overview of the protocols spoken on serial ports, highlighting common pitfalls. I will summarize some points on how to design a serial protocol that is simple to implement and works reliably even under error conditions.
If you have done low-level microcontroller firmware you will regularly have had to stuff some data up a serial port to another microcontroller or to a computer. In the age of USB, a serial port is still the simplest and quickest way to get communication to a control computer up and running. Integrating a ten thousand-line USB stack into your firmware and writing the necessary low-level drivers on the host side might take days. Poking a few registers to set up your UART to talk to an external hardware USB to serial converter is a matter of minutes.
This simplicity is treacherous, though. Oftentimes, you start writing
your serial protocol as needs arise. Things might start harmless with
something like SET_LED ON\n, but unless you proceed carefully it is
easy to end up in a hot mess of command modes and protocol states that
breaks under stress. The ways in which serial protocols break are
manifold. The simplest one is that at some point a character is mangled,
leading to both ends of the conversation ending up in misaligned
protocol states. With a fragile protocol, you might end up in a state
that is hard to recover from. In extreme cases, this leads to code such
as this
gem performing some sort of arcane ritual to get back to some known
state, and all just because someone did not do their homework. Below
we'll embark on a journey through the lands of protocol design,
exploring the facets of this deceptively simple problem.
Text-based serial protocols
The first serial protocol you've likely written is a human-readable, text-based one. Text-based protocols have the big advantage that you can just print them on a terminal and you can immediately see what's happening. In most cases you can even type out the protocol with your bare hands, meaning that you don't really need a debugging tool beyond a serial console.
However, text-based protocols also have a number of disadvantages. Depending on your application, these might not matter and in many cases a text-based protocol is the most sensible solution. But then, in some cases they might and it's good to know when you hit one of them.
Problems
Low information density
Generally, you won't be able to stuff much more than four or five bits
of information per transmitted byte down a serial port using a
human-readable protocol. In many cases you will get much less. If you
have 10 commands that are only issued a couple of times a second, nobody
cares that you spend maybe ten bytes per command on nice, verbose
strings such as SET LED. But if you're trying to squeeze a half-kilobyte
framebuffer down the line you might start to notice the difference
between hex and base-64 encoding, and a binary protocol might really be
more up to the job.
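To put numbers on that difference, here is a minimal sketch of the size arithmetic. The function names are made up for illustration, and the base-64 figure assumes the usual padded variant.

```c
#include <stddef.h>

/* Characters needed to send n raw bytes as hex: two digits per byte. */
size_t hex_encoded_len(size_t n)
{
    return 2 * n;
}

/* Characters needed for padded base-64: four characters for every
 * group of up to three raw bytes. */
size_t base64_encoded_len(size_t n)
{
    return 4 * ((n + 2) / 3);
}
```

For a 512-byte framebuffer this works out to 1024 characters in hex versus 684 in base-64, compared to 512 bytes raw.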
Complex parsing code
On the computer side of things, with the whole phalanx of an operating
system, the standard library of your programming language of choice and
for all intents and purposes unlimited CPU and memory resources to spare,
you can easily parse anything spoken on a serial port in real time, even
at a blazing fast full Megabaud. The microcontroller side however is an
entirely different beast. On a small microcontroller, printf
alone will eat about half your flash. On most small microcontrollers,
you just won't get a regex library even though it would make parsing
textual commands so much simpler. Lacking these resources, you
might end up hand-knitting a lot of low-level C code to do something
seemingly simple such as parsing
set_channel (13, 1.1333)\n. These issues have to be taken
into account in the protocol design from the beginning. If you don't
really need matching parentheses, don't use them.
Fragile protocol state
Say you have a SET_DISPLAY command. Now say your display
can display four lines of text. The obvious approach to this is probably
the SMTP
or HTTP
way of sending
SET_DISPLAY\nThis is line 1\nThis is line 2\n\n. This would
certainly work, but it is very fragile. With this protocol, you're in
trouble if at any point the terminating second newline character gets
mangled (say, someone unplugs the cable, or the control computer
reboots, or a cosmic ray hits something and 0x0a '\n' turns
into 0x4a 'J').
Timeouts don't work
You might try to solve the problem of your protocol state machine
tangling up with a timeout. "If I don't get a valid command for more
than 200ms I go back to default state." But consider the above example.
Say your control computer sends a SET_DISPLAY command
every 100ms. If the state machine gets tangled up during one of them,
the parser stays stuck: a new line of text arrives every 100ms, so the
200ms timeout never expires.
Framing is hard
You might also try to drop the second newline and use a convention
such as "SET_DISPLAY is followed by two lines of text, then
commands resume." This works as long as your display contents never
look like commands. If you are only ever displaying the same three
messages on a character LCD that might work, but if you're displaying
binary framebuffer data you've lost.
Solutions
Keep the state machine simple
Always use a single line of text to represent a single command. Don't
do protocol states or modes where you can toggle between different
interpretations for a line. If you have to send human-readable text as
part of a command (such as SET_DISPLAY) escape it so it
doesn't contain any newlines.
This way, you keep your protocol state machine simple. If at any time your serial trips and flips a bit or loses a byte, your protocol will recover on the next newline character, returning to its base state.
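As a rough sketch of what such a receiver could look like on the microcontroller side: the only state is the current line buffer, and every newline resets it. The names line_rx_t and line_rx_feed and the 64-byte limit are assumptions for this example, not part of any particular library.

```c
#include <stdbool.h>
#include <stddef.h>

#define LINE_MAX_LEN 64 /* assumed maximum command length */

typedef struct {
    char buf[LINE_MAX_LEN + 1];
    size_t len;
    bool overflow; /* current line is too long and will be dropped */
} line_rx_t;

/* Feed one received byte. Returns true when buf holds a complete,
 * NUL-terminated line. The caller must handle it before feeding the
 * next byte. Whatever went wrong before, a newline always brings the
 * receiver back to its base state. */
bool line_rx_feed(line_rx_t *rx, char c)
{
    if (c == '\n') {
        bool ok = !rx->overflow && rx->len > 0;
        rx->buf[rx->len] = '\0';
        rx->len = 0;
        rx->overflow = false;
        return ok;
    }
    if (c == '\r')
        return false; /* tolerate CRLF line endings */
    if (rx->len < LINE_MAX_LEN)
        rx->buf[rx->len++] = c;
    else
        rx->overflow = true;
    return false;
}
```

Call line_rx_feed from your UART receive loop or interrupt handler with a zero-initialized line_rx_t and dispatch on the command word whenever it returns true. A mangled or overlong line costs you exactly that one line.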
Encode numbers in hex when possible
Printing a number in hexadecimal is a very tidy operation, even on the smallest 8-bit microcontrollers. In contrast, printing decimal requires both division and remainder in a loop, which might get annoyingly code- and time-intensive on large numbers (say a 32-bit int) and small microcontrollers.
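To illustrate how little is needed, here is a minimal sketch of a hex formatter that uses only shifts, masks and a 16-entry lookup table. The name fmt_hex32 is made up for this example.

```c
#include <stdint.h>

/* Write v as exactly eight lowercase hex digits followed by a NUL.
 * out must have room for 9 bytes. No division or remainder needed. */
void fmt_hex32(uint32_t v, char out[9])
{
    static const char digits[] = "0123456789abcdef";
    for (int i = 0; i < 8; i++) {
        out[i] = digits[(v >> 28) & 0xf]; /* take the top nibble */
        v <<= 4;                          /* move the next one up */
    }
    out[8] = '\0';
}
```

Always emitting a fixed number of digits also keeps the host-side parser trivial, since every field has a known width.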
If you have to send fractional values, consider their precision.
Instead of sending a 12-bit ADC result as a 32-bit float formatted like
0.176513671875, sending 0x2d3 and dividing by 4096 on the host might be
more sensible. If you really have to communicate big floats and you
can't take the overhead of including both printf and scanf, you can use
hexadecimal floating point, which is basically
hex((int)foo) + "." + hex((int)(65536*(foo - (int)foo)))
for four digits. You can also just hex-encode the binary IEEE-754
representation of the float, sending hex(*(int *)&float). Most
programming languages will have a simple, built-in means to parse this
sort of thing.
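A minimal sketch of the last variant follows. The pointer cast above is shorthand. In real C, memcpy is the well-defined way to get at the bits, and compilers typically turn it into a plain register move. The function names are made up, and the code assumes float is a 32-bit IEEE-754 type, which is true on common microcontroller toolchains but not guaranteed by the C standard.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of f.
 * Assumes sizeof(float) == 4 and IEEE-754 single precision. */
uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Inverse operation, for the receiving side. */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

With this, 1.0f goes over the wire as 3f800000, and the receiver gets back exactly the value that was sent, with no rounding through decimal.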
Escape multiline strings
If you have to send arbitrary strings, escape special characters.
This not only has the advantage of yielding a robust protocol: It also
ensures you can actually see everything that's going on when debugging.
The string "\r\n" is easy to distinguish from
"\n" while your terminal emulator might not care.
The simplest encoding to use is the C-style backslash encoding. Host-side, most languages will have a built-in means of escaping a string like that.
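On the microcontroller side you will probably have to write the escaping yourself. Here is a minimal sketch under a few assumptions of mine: the function name is made up, only \n, \r, \t and the backslash get short escapes, and everything else outside printable ASCII is sent as \x followed by exactly two hex digits (the same convention Python string literals use).

```c
#include <stddef.h>

/* Escape len bytes from src into dst as a NUL-terminated, printable
 * ASCII string without raw newlines. Returns the number of characters
 * written (excluding the NUL), or -1 if dst_size is too small. */
int escape_string(const char *src, size_t len, char *dst, size_t dst_size)
{
    static const char hex[] = "0123456789abcdef";
    size_t o = 0;

    if (dst_size == 0)
        return -1;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];
        char esc = 0;
        if (c == '\n') esc = 'n';
        else if (c == '\r') esc = 'r';
        else if (c == '\t') esc = 't';
        else if (c == '\\') esc = '\\';

        size_t need = esc ? 2 : (c >= 0x20 && c < 0x7f) ? 1 : 4;
        if (o + need + 1 > dst_size) /* keep room for the NUL */
            return -1;

        if (esc) {
            dst[o++] = '\\';
            dst[o++] = esc;
        } else if (need == 1) {
            dst[o++] = (char)c;
        } else {
            dst[o++] = '\\';
            dst[o++] = 'x';
            dst[o++] = hex[c >> 4];
            dst[o++] = hex[c & 0xf];
        }
    }
    dst[o] = '\0';
    return (int)o;
}
```

In the worst case every input byte turns into four output characters, so size the output buffer accordingly.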
Encoding binary data
For binary data, hex and base-64 are the most common encodings. Since hex is simpler to implement I'd go with it unless I really need the 30% bandwidth improvement base-64 brings.
Binary serial protocols
In contrast to anything human-readable, binary protocols are generally more bandwidth-efficient and are easier to format and parse. However, binary protocols come with their own version of the caveats we discussed for text-based protocols.
The framing problem in binary protocols
The most basic problem with binary protocols, as with text-based ones, is framing, i.e. splitting up the continuous serial data stream into discrete packets. The issue is that you have to somehow mark boundaries between frames. The simplest way would be to use some special character to delimit frames, but then any 8-bit character you could choose could also occur within a frame.
SLIP/PPP-like special character framing
Some protocols solve this problem much like we have solved it above
for strings in line-based protocols, by escaping any occurrence of the
special delimiter character within frames. That is, if you want to use
0x00 as a delimiter, you would encode a packet containing
0xde 0xad 0x00 0xbe 0xef as something like
0xde 0xad 0x01 0x02 0xbe 0xef, replacing the null byte with
a magic sequence. This framing works, but it has one critical
disadvantage: The length of the resulting escaped data is dependent on
the raw data, and in the worst case twice as long. In a raw packet
consisting entirely of null bytes, every byte must be replaced with two
escape bytes. This means that in this case the packet length doubles,
and in this particular case we're even less efficient than base-64.
Highly variable packet length is also bad since it makes it very hard to make any timing guarantees for our protocol.
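For a concrete instance of this scheme, here is a minimal sketch of an encoder using the byte values SLIP itself defines in RFC 1055: 0xC0 as the frame delimiter and 0xDB as the escape byte, rather than the 0x00 delimiter of the example above. The function name is made up.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    SLIP_END     = 0xC0, /* frame delimiter */
    SLIP_ESC     = 0xDB, /* escape byte */
    SLIP_ESC_END = 0xDC, /* ESC ESC_END stands for a literal 0xC0 */
    SLIP_ESC_ESC = 0xDD  /* ESC ESC_ESC stands for a literal 0xDB */
};

/* Encode len bytes from src into dst and append the END delimiter.
 * dst must have room for the worst case of 2*len + 1 bytes.
 * Returns the encoded length. */
size_t slip_encode(const uint8_t *src, size_t len, uint8_t *dst)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == SLIP_END) {
            dst[o++] = SLIP_ESC;
            dst[o++] = SLIP_ESC_END;
        } else if (src[i] == SLIP_ESC) {
            dst[o++] = SLIP_ESC;
            dst[o++] = SLIP_ESC_ESC;
        } else {
            dst[o++] = src[i];
        }
    }
    dst[o++] = SLIP_END;
    return o;
}
```

The worst-case buffer requirement in the comment is exactly the problem described above: a four-byte packet of all 0xC0 comes out as nine bytes on the wire.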
9-bit framing
A framing mode sometimes used is to configure the UARTs to transmit 9-bit characters and to use the 9th bit to designate control characters. This works really well, and gives plenty of control characters to work with. The main problem with this is that a 9-bit serial interface is highly nonstandard and you need UARTs on both ends that actually support this mode. Another issue is that though more efficient than both delimiter-based and purely text-based protocols, it still incurs about 10% of extra bandwidth overhead. This is not a lot if all you're sending is a little command every now and then, but if you're trying to push large amounts of data through your serial it's still bad.
COBS
Given the limitations of the two above-mentioned framing formats, we really want something better. The Serial Line Internet Protocol (SLIP) as well as the Point to Point Protocol (PPP), standardized in 1988 and 1994 respectively, both use escape sequences. This might come as a surprise, but humanity has actually still made significant technological progress on protocols for 8-bit serial interfaces until the turn of the millennium. In 1999, Consistent Overhead Byte Stuffing (COBS) was published by a few researchers from Apple Computer and Stanford University. In response to the bandwidth doubling problem present in PPP, COBS adds only about one byte of overhead for every 254 bytes of payload, no matter what the packet contains. For short packets that is a single byte.
COBS uses the null byte as a delimiter and interleaves all the raw packet data with a run-length encoding of the non-zero portions of the raw packet. That is, it prepends the number of bytes until the first zero byte to the packet, plus one. Then it takes all the leading non-zero bytes of the packet, unmodified. Then, it again encodes the distance from the first zero to the second zero, plus one. And then it takes the second non-zero run of bytes unmodified. And so on. At the end, the packet is terminated with a zero byte.
The result of this scheme is that the encoded packet does not contain any zero bytes, as every zero byte has been replaced with the number of bytes until the next zero byte, plus one, and that can't be zero. Both formatter and parser each have to keep a counter running to keep track of the distances between zero bytes. The first byte of the packet initializes that counter and is dropped by the parser. After that, every encoded byte received results in one raw byte parsed.
While this might sound more complicated than the escaping explained above, the gains in predictability and efficiency are worth it. An implementation of encoder and decoder should each be about ten lines of C or two lines of Python. A minor asymmetry of the protocol is that while decoding can be done in-place, encoding either needs two passes or you need to scan forward for the next null byte.
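To make the description above concrete, here is a minimal sketch of an encoder and a decoder. This is my own straightforward version and not a reference implementation. Function names and the error convention are assumptions. It also handles the detail glossed over above: a run of 254 non-zero bytes gets the code 0xff, which means "no zero follows".

```c
#include <stddef.h>
#include <stdint.h>

/* Encode len bytes from src into dst and append the zero delimiter.
 * dst must have room for len + len/254 + 2 bytes.
 * Returns the encoded length including the delimiter. */
size_t cobs_encode(const uint8_t *src, size_t len, uint8_t *dst)
{
    size_t code_idx = 0; /* where the current run's code byte goes */
    size_t o = 1;        /* next write position in dst */
    uint8_t code = 1;    /* distance to the next zero, plus one */

    for (size_t i = 0; i < len; i++) {
        if (src[i] != 0) {
            dst[o++] = src[i];
            code++;
        }
        if (src[i] == 0 || code == 0xff) { /* run finished */
            dst[code_idx] = code;
            code = 1;
            code_idx = o++;
        }
    }
    dst[code_idx] = code;
    dst[o++] = 0; /* frame delimiter */
    return o;
}

/* Decode one frame of len bytes (without the trailing zero delimiter).
 * dst may be the same buffer as src. Returns the decoded length,
 * or -1 if the frame is malformed. */
int cobs_decode(const uint8_t *src, size_t len, uint8_t *dst)
{
    size_t i = 0, o = 0;
    while (i < len) {
        uint8_t code = src[i++];
        if (code == 0 || (size_t)(code - 1) > len - i)
            return -1;
        for (uint8_t k = 1; k < code; k++)
            dst[o++] = src[i++];
        if (code != 0xff && i < len)
            dst[o++] = 0; /* the zero this code byte stood for */
    }
    return (int)o;
}
```

With this sketch, the packet 0xde 0xad 0x00 0xbe 0xef from the escaping example is sent as 0x03 0xde 0xad 0x03 0xbe 0xef 0x00. The receiver splits the stream on zero bytes and hands each chunk to cobs_decode, so a corrupted frame can never affect more than itself and its neighbour.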
State machines and error recovery
In binary protocols even more than in textual ones it is tempting to build complex state machines triggering actions on a sequence of protocol packets. Please resist that temptation. As with textual protocols, keeping the protocol state to the minimum possible allows for a self-synchronizing protocol. A serial protocol should be designed such that after a dropped packet or two, both ends will naturally re-synchronize within another packet or two. A simple way of doing that is to always transmit one semantic command per packet and to design these commands in the most idempotent way possible. For example, when filling a framebuffer piece by piece, include the offset in each piece instead of keeping track of it on the receiving side.
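A minimal sketch of the framebuffer example could look like this. The packet layout (a 16-bit little-endian offset followed by the data), the 512-byte buffer size and all names are assumptions made up for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FB_SIZE 512 /* assumed framebuffer size in bytes */

static uint8_t framebuffer[FB_SIZE];

/* Handle one hypothetical "write framebuffer" packet laid out as
 * [offset low][offset high][data ...]. The receiver keeps no write
 * position of its own, so a lost or repeated packet only affects
 * the bytes that packet covers. Returns false on a bad packet. */
bool handle_fb_write(const uint8_t *pkt, size_t len)
{
    if (len < 2)
        return false;
    size_t offset = pkt[0] | ((size_t)pkt[1] << 8);
    size_t n = len - 2;
    if (offset > FB_SIZE || n > FB_SIZE - offset)
        return false; /* would write past the end of the buffer */
    memcpy(&framebuffer[offset], &pkt[2], n);
    return true;
}
```

Sending the same packet twice leaves the framebuffer in the same state as sending it once, so the host can simply retransmit whenever it is in doubt.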
Conclusion
Here's your five-step guide to serial bliss:
- Unless you have super-special requirements, always use the slowest baud rate you can get away with out of 9600Bd, 115200Bd or 1MBd. Use 8N1 framing if you're talking to anything but another microcontroller on the same board. These settings are the most common and cover any use case. You'll inevitably have to guess these at some point in the future.
- If you're doing something simple and speed is not a particular concern, use a human-readable text-based protocol. Use one command/reply per line, begin each line with some sort of command word and format numbers in hexadecimal. You get bonus points if the device replies to unknown commands with a human-readable status message and prints a brief protocol overview on boot.
- If you're doing something even slightly nontrivial or need moderate throughput (>1k commands per second or >20 bytes of data per command), use a COBS-based protocol. If you don't have a better idea, go for an [target MAC][command ID][command arguments] packet format for multidrop busses. For single-drop you may decide to drop the MAC address.
- Always include some sort of "status" command that prints life stats such as VCC, temperature, serial framing errors and uptime. You'll need some sort of ping command anyway and that one might as well do something useful.
- If at all possible, keep your protocol context-free across
packets/lines. That is, a certain command should always be
self-contained, and no command should change the meaning of the next
packet or line that is sent. This is really important to allow for
self-synchronization. If you really need to break up something into
multiple commands, say you want to set a large framebuffer in pieces, do
it in an idempotent
way: Instead of sending something like
FRAMEBUFFER INCOMING:\n[byte 0-16]\n[byte 17-32]\n[...]\nEND OF FRAME rather send FRAMEBUFFER DATA FOR OFFSET 0: [byte 0-16]\nFRAMEBUFFER DATA FOR OFFSET 17: [byte 17-32]\n[...]\nSWAP BUFFERS\n.