serial protocol post: light proofreading, fix link

jaseg 2021-11-25 12:40:49 +01:00
parent e7a91af6b5
commit fdb87d6c0b


@ -10,13 +10,13 @@ summarize some points on how to design a serial protocol that is simple to imple
conditions.
If you have done low-level microcontroller firmware you will regularly have had to stuff some data up a serial port to
another microcontroller or to a computer. In the age of USB, a serial port is still the simplest and quickest way to get
communication to a control computer up and running. Integrating a ten thousand-line USB stack into your firmware and
writing the necessary low-level drivers on the host side might take days. Poking a few registers to set up your UART to
talk to an external hardware USB to serial converter is a matter of minutes.
another microcontroller or to a computer. In the age of USB, an old-school serial port is still the simplest and
quickest way to get communication to a control computer up and running. Integrating a ten thousand-line USB stack into
your firmware and writing the necessary low-level drivers on the host side might take days. Poking a few registers to
set up your UART to talk to an external hardware USB to serial converter is a matter of minutes.
This simplicity is treacherous, though. Oftentimes, you start writing your serial protocol as needs arise. Things might
start harmless with something like ``SET_LED ON\n``, but unless you proceed it is easy to end up in a hot mess of command
start harmless with something like ``SET_LED ON\n``, but as the code grows it is easy to end up in a hot mess of command
modes and protocol states that breaks under stress. The ways in which serial protocols break are manifold. The simplest one
is that at some point a character is mangled, leading to both ends of the conversation ending up in misaligned protocol
states. With a fragile protocol, you might end up in a state that is hard to recover from. In extreme cases, this leads
@ -24,7 +24,7 @@ to code such as `this gem`_ performing some sort of arcane ritual to get back to
someone did not do their homework. Below we'll embark on a journey through the lands of protocol design, exploring the
facets of this deceptively simple problem.
.. _`this gem`: https://github.com/juhasch/pyBusPirateLite/blob/master/pyBusPirateLite/BBIO_base.py#L68
.. _`this gem`: https://github.com/juhasch/pyBusPirateLite/blob/dece35f6e421d4f6a007d1db98d148e2f2126ebb/pyBusPirateLite/base.py#L113
Text-based serial protocols
===========================
@ -45,10 +45,10 @@ Low information density
~~~~~~~~~~~~~~~~~~~~~~~
Generally, you won't be able to stuff much more than four or five bits of information down a serial port using a
human-readable protocol. In many cases you will get much less. If you have 10 commands that are only issued a couple
times a second nobody cares that you spend maybe ten bytes per command on nice, verbose strings such as ``SET LED``. But
if you're trying to squeeze a half-kilobyte framebuffer down the line you might start to notice the difference between
hex and base-64 encoding, and a binary protocol might really be more up to the job.
single byte of a human-readable protocol. In many cases you will get much less. If you have 10 commands that are only
issued a couple times a second nobody cares that you spend maybe ten bytes per command on nice, verbose strings such as
``SET LED``. But if you're trying to squeeze a half-kilobyte framebuffer down the line you might start to notice the
difference between hex and base-64 encoding, and a binary protocol might really be more up to the job.
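To put rough numbers on the hex versus base-64 difference, here is a small Python sketch. The 512-byte buffer stands in for the half-kilobyte framebuffer mentioned above; its contents and the helper name ``bits_per_char`` are made up for illustration.

```python
import base64
import binascii

# Hypothetical 512-byte framebuffer; the actual contents do not matter here.
framebuffer = bytes(range(256)) * 2

hex_encoded = binascii.hexlify(framebuffer)  # 2 characters per payload byte
b64_encoded = base64.b64encode(framebuffer)  # 4 characters per 3 payload bytes


def bits_per_char(payload: bytes, encoded: bytes) -> float:
    """Payload bits carried by each character sent down the line."""
    return 8 * len(payload) / len(encoded)


print(len(hex_encoded), bits_per_char(framebuffer, hex_encoded))
print(len(b64_encoded), bits_per_char(framebuffer, b64_encoded))
```

Hex carries exactly 4 payload bits per transmitted character and base-64 just under 6, so the same framebuffer takes 1024 versus 684 characters on the wire, before any command word or line ending is added.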
Complex parsing code
~~~~~~~~~~~~~~~~~~~~
@ -60,7 +60,7 @@ an entirely different beast. On a small microcontroller, printf_ alone will eat
microcontrollers, you just won't get a regex library even though it would make parsing textual commands *so much
simpler*. Lacking these resources, you might end up hand-knitting a lot of low-level C code to do something seemingly
simple such as parsing ``set_channel (13, 1.1333)\n``. These issues have to be taken into account in the protocol design
from the beginning. If you don't really need matching parentheses, don't use them.
from the beginning. For example, if you don't really need matching parentheses, don't use them.
Fragile protocol state
~~~~~~~~~~~~~~~~~~~~~~
@ -80,7 +80,7 @@ Timeouts don't work
You might try to solve the problem of your protocol state machine tangling up with a timeout. "If I don't get a valid
command for more than 200ms I go back to default state." But consider the above example. Say, your control computer
sends a ``SET_DISPLAY`` command every 100ms. If in one of them the state machine tangles up, the parser hangs since the
timeout is never hit, a new line of text arriving every 100ms.
timeout is never hit, because a new line of text is arriving every 100ms.
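The timing argument can be checked with a toy simulation. This is only a sketch under stated assumptions: the parser starts out tangled up, the timeout is an idle timer that restarts whenever a line arrives, and the names ``TIMEOUT_MS`` and ``state_after`` are hypothetical.

```python
TIMEOUT_MS = 200  # the "go back to default state" timeout from the text


def state_after(num_lines: int, period_ms: int) -> str:
    """Return the parser state after num_lines lines spaced period_ms apart.

    Assumption: the parser starts out tangled up ("STUCK") and the timeout
    is measured from the last received line, as a simple idle timer would be.
    """
    state = "STUCK"
    last_rx_ms = 0
    for i in range(1, num_lines + 1):
        now_ms = i * period_ms
        if now_ms - last_rx_ms > TIMEOUT_MS:
            state = "DEFAULT"  # the idle gap was long enough, timeout fired
        last_rx_ms = now_ms
    return state


print(state_after(1000, 100))  # lines every 100ms
print(state_after(3, 300))     # lines every 300ms
```

With a line arriving every 100ms the idle gap never exceeds 200ms, so the simulated parser stays stuck no matter how long it runs. Only when the sender pauses for longer than the timeout does it recover.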
Framing is hard
~~~~~~~~~~~~~~~
@ -96,9 +96,9 @@ Solutions
Keep the state machine simple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Always use a single line of text to represent a single command. Don't do protocol states or modes where you can toggle
between different interpretations for a line. If you have to send human-readable text as part of a command (such as
``SET_DISPLAY``) escape it so it doesn't contain any newlines.
In a text-based protocol, always use a single line of text to represent a single command. Don't do protocol states or
modes where you can toggle between different interpretations for a line. If you have to send human-readable text as part
of a command (such as ``SET_DISPLAY``), escape it so it doesn't contain any newlines.
This way, you keep your protocol state machine simple. If at any time your serial trips and flips a bit or loses a byte,
your protocol will recover on the next newline character, returning to its base state.
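As a minimal sketch of this idea, the following Python snippet parses every line on its own and escapes newlines in free-form text. The command set, the backslash escaping scheme and all function names are assumptions made for illustration, not part of any particular protocol.

```python
KNOWN_COMMANDS = {"SET_LED", "SET_DISPLAY"}  # hypothetical command set


def escape(text: str) -> str:
    """Escape backslashes and newlines so the text fits on a single line."""
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def unescape(text: str) -> str:
    """Reverse escape()."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            out.append("\n" if nxt == "n" else nxt)
        else:
            out.append(ch)
    return "".join(out)


def parse_line(line: bytes):
    """Parse one line on its own; return None if it is damaged or unknown."""
    try:
        text = line.decode("ascii")
    except UnicodeDecodeError:
        return None
    command, _, argument = text.partition(" ")
    if command not in KNOWN_COMMANDS:
        return None
    return command, unescape(argument)


def parse_stream(data: bytes):
    """Split on newlines; no state is carried from one line to the next."""
    return [parse_line(line) for line in data.split(b"\n") if line]


# The second line has one mangled byte; the lines around it are unaffected.
stream = b"SET_LED ON\nSET_L\xc5D OFF\nSET_DISPLAY two\\nlines\nSET_LED OFF\n"
results = parse_stream(stream)
print(results)
```

Because the parser keeps no state between lines, the mangled second line is rejected and the lines after it are parsed normally.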
@ -229,19 +229,19 @@ Conclusion
Here's your five-step guide to serial bliss:
1. Unless you have super-special requirements, always use the slowest baud rate you can get away with out of 9600Bd, 115200Bd or
1MBd. 8N1 framing if you're talking to anything but another microcontroller on the same board. These settings are
the most common and cover any use case. You'll inevitably have to guess these at some point in the future.
1MBd. Use 8N1 framing if you're talking to anything but another microcontroller on the same board. Using common values
like these makes it easier when you inevitably have to guess them at some point in the future ;)
2. If you're doing something simple and speed is not a particular concern, use a human-readable text-based protocol. Use
one command/reply per line, begin each line with some sort of command word and format numbers in hexadecimal. You get
bonus points if the device replies to unknown commands with a human-readable status message and prints a brief
protocol overview on boot.
one command/reply per line, begin each line with some sort of command word and format numbers in hexadecimal. Bonus
points for the device replying to unknown commands with a human-readable status message and printing a brief protocol
overview on boot.
3. If you're doing something even slightly nontrivial or need moderate throughput (>1k commands per second or >20 bytes of
data per command) use a COBS-based protocol. If you don't have a better idea, go for an ``[target MAC][command
ID][command arguments]`` packet format for multidrop busses. For single-drop you may decide to drop the MAC address.
data per command) use a COBS-based protocol. A good starting point is a ``[target MAC][command ID][command
arguments]`` packet format for multidrop busses. For single-drop you may decide to drop the MAC address.
4. Always include some sort of "status" command that prints life stats such as VCC, temperature, serial framing errors
and uptime. You'll need some sort of ping command anyway and that one might as well do something useful.
and uptime. You'll need some sort of ping command anyway and that command might as well do something useful.
5. If at all possible, keep your protocol context-free across packets/lines. That is, a certain command should always be
self-contained, and no command should change the meaning of the next packet or line that is sent. This is really
self-contained, and no command should change the meaning of the next packet/line/command that is sent. This is really
important to allow for self-synchronization. If you really need to break up something into multiple commands, say you
want to set a large framebuffer in pieces, do it in an idempotent_ way: Instead of sending something like ``FRAMEBUFFER
INCOMING:\n[byte 0-16]\n[byte 17-32]\n[...]\nEND OF FRAME`` rather send ``FRAMEBUFFER DATA FOR OFFSET 0: [byte