User:CounterPillow/CM4 Carrier Board
Inspired by the DeskPi Super6C Carrier Board, but disappointed by its lack of UART, I decided to draft specifications for my own "ideal" cluster carrier board.
Potential Use Cases
- Compute cluster
- distributed compilation
- MPI (e.g. OpenMPI)
- load-balanced web applications
- container host (K8s, K3s)
- Map/Reduce (e.g. Hadoop)
- Virtual machine host
- Storage cluster
- GlusterFS
- Ceph
- For pine64.org itself, which is currently running out of space
- Distributed database server
- Replicated PostgreSQL
- Elasticsearch
- Redis
Also, generally for playing around with node redundancy and "serverless" architectures.
Suggested Specs
- Can carry at least 6 CM4 boards
- 6×4 cores = 24 Cortex-A55 cores for under $600 (6× 4GB SOQuartz + 6× 16GB eMMC + board), which is not a bad value proposition
- Built-in ethernet switch
- 8-port Gigabit Ethernet switch IC: 6 ports for the modules, leaving 2 ports for upstream connectivity
- If affordable, a switch IC with 2.5 Gbps ports could be interesting
- Power from either barrel jack or ATX PSU
- Mini-ITX form factor, so people can use PC cases
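The core and port counts in this list are simple arithmetic. A minimal sketch, assuming one Gigabit link per module and the worst case where every module sends upstream at once (the function and parameter names are made up for illustration):

```python
def cluster_totals(slots=6, cores_per_module=4, upstream_ports=2, port_gbps=1.0):
    """Return total cores, switch ports needed, and uplink oversubscription."""
    total_cores = slots * cores_per_module
    # One switch port per module, plus the upstream ports.
    switch_ports = slots + upstream_ports
    # Ratio of aggregate module bandwidth to aggregate upstream bandwidth.
    oversubscription = (slots * port_gbps) / (upstream_ports * port_gbps)
    return {
        "cores": total_cores,
        "switch_ports": switch_ports,
        "oversubscription": oversubscription,
    }


print(cluster_totals())
# {'cores': 24, 'switch_ports': 8, 'oversubscription': 3.0}
```

With six modules and two uplinks, upstream traffic is oversubscribed 3:1, which is part of why 2.5 Gbps ports could be interesting.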
I/O
General
- One CM4 module is the "Monitor", gets special I/O.
- 2× Gigabit Ethernet from switch IC
- Fan controller with either its own temperature probe or an I2C or SPI interface to the "Monitor" board
- Standard 12V PC fan header, at least 1, maybe more if controller allows for it
- Allow every board to be replaced while the rest are still turned on (hot-swapping)
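If the fan controller is driven from the "Monitor" board, the control logic could be as simple as a linear fan curve. This is only a sketch: the temperature thresholds and duty cycle limits are hypothetical defaults, and writing the result to the controller over I2C or SPI is left out because it depends on the part chosen.

```python
def fan_duty_percent(temp_c, min_temp_c=40.0, max_temp_c=70.0,
                     min_duty=20.0, max_duty=100.0):
    """Linearly map a temperature in degrees Celsius to a PWM duty cycle in percent."""
    if temp_c <= min_temp_c:
        return min_duty  # keep the fan spinning slowly when cool
    if temp_c >= max_temp_c:
        return max_duty  # full speed at or above the upper threshold
    fraction = (temp_c - min_temp_c) / (max_temp_c - min_temp_c)
    return min_duty + fraction * (max_duty - min_duty)


for temp in (30.0, 55.0, 80.0):
    print(temp, fan_duty_percent(temp))
```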
For Each Board
- UART (important for bring-up and debugging!)
- I2C would be nice, but not required
- Maybe even as STEMMA (JST PH-4) or Qwiic (JST SH-4)? Make sure the pin order is right if choosing either
- SPI would be nice, but also not required
- 5V/3.3V/GND for I2C and SPI, and debugging
- PCIe as M.2
- Regulators should be able to feed even power hungry M.2 NVMe SSDs
- SD card
- Consider putting SD card slot on top side of board for easier access
- User LED visible from back
- PWR and RST buttons
- SATA for each board over the USB 3.0 CM4 pins, if signal can be made reliable
- Example use case: NVMe SSD as fast cache drive with bcache, SATA HDD powered from ATX supply as backing storage.
- With SATA, potential for raw storage exceeding 100TB!
- Tie some of the GPIO to the reset pins of the other boards, so every board can reset every other board
- The CSI pins aren't needed, so those could be used
- Make sure that removing a board while the carrier board is powered on doesn't accidentally reset the other boards
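Two numbers from this list can be checked quickly: the raw SATA capacity, and how many GPIO lines a full "every board can reset every other board" arrangement needs. A rough sketch, assuming one HDD per board; the 18 TB drive size is an example value, not a recommendation, and redundancy overhead is ignored:

```python
def raw_storage_tb(boards=6, hdd_tb_per_board=18.0):
    """Raw capacity with one SATA HDD per board, before any redundancy."""
    return boards * hdd_tb_per_board


def min_hdd_tb_for_target(target_tb=100.0, boards=6):
    """Smallest per-board drive size that reaches the target raw capacity."""
    return target_tb / boards


def reset_lines(boards=6):
    """Each board drives the reset pin of every other board.

    Returns (GPIOs needed per board, total reset connections on the carrier).
    """
    per_board = boards - 1
    return per_board, boards * per_board


print(raw_storage_tb())                   # 108.0
print(round(min_hdd_tb_for_target(), 2))  # 16.67
print(reset_lines())                      # (5, 30)
```

So exceeding 100 TB raw needs drives larger than about 16.7 TB each, and a full reset mesh costs 5 GPIOs per module and 30 traces on the carrier.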
For "Monitor" Board
- 1× HDMI
- 2× USB 2.0
Price
A price of around $200 would be good, considering that 6 × SOQuartz 4GB would be $300. The aim is a more integrated alternative to 6 SOQuartz blades + a switch.
Store could also introduce a bundle consisting of carrier board, SOQuartz modules and eMMC modules for 5% off to promote fully populating the board.
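Putting the numbers from this page together gives a rough idea of how much room is left for the eMMC modules. A sketch under the assumptions that the full setup should stay under $600, that the board costs $200, and that a SOQuartz 4GB costs $50 (derived from the $300 for six); eMMC prices are not assumed, only the remaining budget is computed:

```python
def bundle_budget(modules=6, module_usd=50.0, board_usd=200.0,
                  total_cap_usd=600.0, bundle_discount=0.05):
    """Work out the eMMC budget left under the price cap, and the bundle price at the cap."""
    modules_usd = modules * module_usd
    emmc_total_usd = total_cap_usd - modules_usd - board_usd
    return {
        "modules_usd": modules_usd,
        "emmc_total_usd": emmc_total_usd,
        "emmc_each_usd": round(emmc_total_usd / modules, 2),
        # Price of a bundle that exactly hits the cap, after the discount.
        "discounted_cap_usd": round(total_cap_usd * (1 - bundle_discount), 2),
    }


print(bundle_budget())
```

Under these assumptions, the six eMMC modules together need to stay under $100 for the whole setup to remain under $600 before any bundle discount.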
Accessories
- Stamped metal ATX I/O shield
- DC 12V power brick with enough power to supply the board with all NVMe and CM4 slots populated
- Ignore SATA HDD power requirements, tell people to use an ATX power supply for that
- How much more complex would it be to support two redundant power supplies with automatic switchover in case one fails? Could be very useful for high availability
- Apparently redundant ATX power supplies like Fortron Twins Pro exist, so maybe this isn't needed
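Sizing the 12V power brick comes down to summing the worst-case loads and adding headroom. The sketch below shows the calculation only: every wattage in it is a placeholder assumption, not a measurement, and would need to be replaced with figures from the module and SSD datasheets. SATA HDDs are left out, as suggested above.

```python
def brick_requirements(slots=6, module_w=5.0, nvme_w=8.0,
                       overhead_w=10.0, headroom=0.2, rail_v=12.0):
    """Return (rated watts, amps at rail_v) for a fully populated board.

    module_w, nvme_w and overhead_w are placeholder values, not measured data.
    overhead_w stands in for the switch IC, fans and regulator losses.
    """
    load_w = slots * (module_w + nvme_w) + overhead_w
    rated_w = load_w * (1 + headroom)
    return rated_w, rated_w / rail_v


watts, amps = brick_requirements()
print(f"{watts:.1f} W, {amps:.1f} A at 12 V")
```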
Challenges
Power Delivery
Which PMU would be used for this? Each board needs to be able to turn itself off individually. Each M.2 slot will probably also need its own 3.3V regulator.
Unplugging a module and plugging it back in while the board is powered up may present challenges. Maybe have one PMU per board so that they can be individually powered down?
Software
While RK3566 support is in good shape, the device tree and related support for SOQuartz are not yet ready as of the time of writing (July 2022). There should be at least a Debian image with a fully functional device tree for both monitor and client boards before public release.
Space Constraints
With additional I/O like SATA, we might run into space issues on the board layout. Keeping it a standard PC form factor would be great though.
Signalling
It's a large board with high-speed signals such as PCIe and SATA running through it. Make absolutely sure these work correctly and reliably before series production.
SoC and CM4 Constraints
The RK3566 and the CM4 form factor only allow for a single lane of PCIe 2.1, so any NVMe SSD connected to it will be bottlenecked. The PCIe 3.0 on the RK3568 might fare better, and the RK3568 would also allow for ECC (both cache and RAM). However, I don't have any RK3568 hardware to verify this; if the bottleneck is in the interconnect, then the PCIe generation doesn't really matter.
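For a sense of scale, the theoretical per-lane ceiling can be computed from the transfer rate and line encoding: PCIe 2.x runs at 5 GT/s with 8b/10b encoding, PCIe 3.0 at 8 GT/s with 128b/130b. A small sketch; it ignores packet and protocol overhead, so real throughput will be lower, and it says nothing about whether the SoC interconnect is the actual limit:

```python
def pcie_lane_mb_per_s(gt_per_s, payload_bits, encoded_bits):
    """Theoretical one-direction throughput of a single PCIe lane in MB/s."""
    bits_per_s = gt_per_s * 1e9 * payload_bits / encoded_bits
    return bits_per_s / 8 / 1e6


gen2 = pcie_lane_mb_per_s(5.0, 8, 10)     # PCIe 2.x, 8b/10b encoding
gen3 = pcie_lane_mb_per_s(8.0, 128, 130)  # PCIe 3.0, 128b/130b encoding

print(f"PCIe 2.x x1: {gen2:.1f} MB/s")  # 500.0 MB/s
print(f"PCIe 3.0 x1: {gen3:.1f} MB/s")  # 984.6 MB/s
```

A single PCIe 2.x lane therefore tops out at about 500 MB/s, well below what typical NVMe SSDs can deliver over wider or faster links.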