EK
EJ Kreinar
Thu, Mar 1, 2018 8:46 PM
Hi All,
I have an RFNoC setup with 5+ blocks contributing to a transmit path. I've
measured the latency through this RFNoC graph, and it is non-negligible at
my bit-rate -- I'm seeing several tens of ms.
Before I go crazy consolidating RFNoC blocks, I decided to try shrinking
the STR_SINK_FIFOSIZE from 2^11 down to 2^9. I have a known MTU packet
limit of something like 1024 bytes, which should be manageable with a
2^9 FIFO size. This could potentially reduce the latency due to input FIFOs
by 4x, which might be a good start. But I hit a few problems here...
When setting the STR_SINK_FIFOSIZE to 9, I get the following error:
-- Assuming max packet size for 0/he360encoder_0
Traceback (most recent call last):
  File "./e310_ground_modem.py", line 330, in <module>
    main()
  File "./e310_ground_modem.py", line 319, in main
    tb = top_block_cls(bitstream=options.bitstream, clk_rate=options.clk_rate, conv=options.conv, device=options.device, freq=options.freq, hdlc_enable=options.hdlc_enable, lo_offset=options.lo_offset, probeconsole=options.probeconsole, probecsv=options.probecsv, rate=options.rate, rs_enable=options.rs_enable, rxaddr=options.rxaddr, tx_gain=options.tx_gain)
  File "./e310_ground_modem.py", line 169, in __init__
    self.device3.connect(self.fpgacomms_he360encoder_0.get_block_id(), 0, self.hawkeye_qpsk_modulator_0.get_block_id(), 0)
  File "/deploy/dev-ejk/lib/python2.7/site-packages/ettus/ettus_swig.py", line 1595, in connect
    return _ettus_swig.device3_sptr_connect(self, *args)
RuntimeError: RuntimeError: Input FIFO for block 0/qpskmodulator_0 is too
small (4 kiB) for packets of size 7 kiB
coming from block 0/he360encoder_0.
When setting the STR_SINK_FIFOSIZE to 10, I get a different error:
-- [0/he360encoder_0] source_block_ctrl_base::configure_flow_control_out()
buf_size_pkts==1
Traceback (most recent call last):
  File "./e310_ground_modem.py", line 330, in <module>
    main()
  File "./e310_ground_modem.py", line 319, in main
    tb = top_block_cls(bitstream=options.bitstream, clk_rate=options.clk_rate, conv=options.conv, device=options.device, freq=options.freq, hdlc_enable=options.hdlc_enable, lo_offset=options.lo_offset, probeconsole=options.probeconsole, probecsv=options.probecsv, rate=options.rate, rs_enable=options.rs_enable, rxaddr=options.rxaddr, tx_gain=options.tx_gain)
  File "./e310_ground_modem.py", line 169, in __init__
    self.device3.connect(self.fpgacomms_he360encoder_0.get_block_id(), 0, self.hawkeye_qpsk_modulator_0.get_block_id(), 0)
  File "/deploy/dev-ejk/lib/python2.7/site-packages/ettus/ettus_swig.py", line 1595, in connect
    return _ettus_swig.device3_sptr_connect(self, *args)
RuntimeError: RuntimeError: Invalid window size 1 for block
0/he360encoder_0. Window size must at least be 2.
Both errors trace back to uhd/host/lib/rfnoc/graph_impl.cc, where it
appears that the graph is attempting to read the FIFO size, compare it to
the expected packet size, and intelligently throw errors when there's a
potential issue. I appreciate this feature, but I can't find how to set the
pkt_size argument... Passing a pkt_size stream_arg into the rfnoc block
constructor does not change the port_def to have a different pkt_size.
Any ideas? What am I missing? Do I manually need to access the tree to edit
the port_def parameter at "_root_path/ports/direction/port_index" for my
block?
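The two error messages are consistent with a simple size check, which can be sketched as follows. This is not the code in graph_impl.cc; it is a minimal model that assumes 64-bit (8-byte) FIFO words and takes the "7 kiB" in the error message at face value as 7 * 1024 bytes. The helper names fifo_bytes and window_pkts are made up for illustration.

```python
WORD_BYTES = 8                 # assumption: each FIFO entry is one 64-bit word
ASSUMED_PKT_BYTES = 7 * 1024   # assumption: "7 kiB" from the error, taken literally


def fifo_bytes(log2_depth, word_bytes=WORD_BYTES):
    """Capacity in bytes of a FIFO with 2**log2_depth entries."""
    return (1 << log2_depth) * word_bytes


def window_pkts(log2_depth, pkt_bytes):
    """How many whole packets of pkt_bytes fit in the FIFO."""
    return fifo_bytes(log2_depth) // pkt_bytes


if __name__ == "__main__":
    for log2_depth in (9, 10, 11):
        print(
            "STR_SINK_FIFOSIZE=%d: %d KiB, window=%d (assumed max pkt), window=%d (1024-byte pkt)"
            % (
                log2_depth,
                fifo_bytes(log2_depth) // 1024,
                window_pkts(log2_depth, ASSUMED_PKT_BYTES),
                window_pkts(log2_depth, 1024),
            )
        )
```

Under these assumptions, 2^9 gives a 4 KiB FIFO that cannot hold even one assumed-maximum packet, 2^10 holds exactly one (the "window size 1" case), and the default 2^11 holds two. With 1024-byte packets a 2^9 FIFO would hold four, so the check only fails because the graph assumes the maximum packet size.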
Cheers,
EJ
NF
Nick Foster
Thu, Mar 1, 2018 9:44 PM
EJ,
Reducing the stream sink FIFO size won't really help here unless I'm
misunderstanding something. You need to send shorter packets, yes, but
changing the stream sink FIFO size won't change that. The FIFO doesn't have
to be full before it passes your packet along. That said, I don't think
I've had to specify packet size on packets from the PS on E310 before, so
I'm no help figuring that out. On X310 you can just set your MTU, and for
RX applications you can pass "spp=<xxx>" to the RX radio block. But I
haven't had to do it on E310 before.
But on another note, because whole packets are buffered before processing
by each block, your first instinct was good -- consolidating your blocks
will help more than just about anything else to reduce latency.
On E310, there's another approach that can help as well: force the CE clock
to run at full rate. By default, E310 runs the compute engine clock at the
minimum required rate to process samples -- i.e., the radio clock rate. But
you can force the compute engines to run at the full bus rate (64 MHz) by
changing rfnoc_ce_auto_inst.v so that instead of:
  wire ce_clk = radio_clk;
  wire ce_rst = radio_rst;
...it says:
  wire ce_clk = bus_clk;
  wire ce_rst = bus_rst;
You'll consume more power, but you'll reduce latency some. It's also a
helpful trick to be able to run 1-input-N-output blocks in loopback, at
least up to something less than half the bus_clk rate.
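To get a feel for how much the clock change could matter, here is a rough back-of-the-envelope sketch. It assumes each block moves one 64-bit word per ce_clk cycle and must take in a whole packet before passing it on; the 16 MHz radio clock and the block count of 5 are placeholder values, not figures from this setup.

```python
def packet_clock_time_us(pkt_bytes, clk_hz, word_bytes=8):
    """Microseconds needed to clock one packet through a block,
    assuming one word of word_bytes moves per clock cycle."""
    words = -(-pkt_bytes // word_bytes)  # ceiling division
    return words / clk_hz * 1e6


if __name__ == "__main__":
    PKT_BYTES = 1024          # packet size mentioned in the thread
    N_BLOCKS = 5              # placeholder block count
    BUS_CLK_HZ = 64e6         # bus rate quoted above
    RADIO_CLK_HZ = 16e6       # placeholder radio clock, not from the thread

    for name, clk in (("radio_clk", RADIO_CLK_HZ), ("bus_clk", BUS_CLK_HZ)):
        per_block = packet_clock_time_us(PKT_BYTES, clk)
        print("%s: %.2f us per block, %.2f us for %d blocks"
              % (name, per_block, per_block * N_BLOCKS, N_BLOCKS))
```

This only counts the time to shift a packet through each block and ignores any queueing in the FIFOs, so treat it as a lower bound rather than a prediction.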
Nick
EK
EJ Kreinar
Thu, Mar 1, 2018 10:18 PM
Hey Nick,
I realize I've forgotten a key piece of information: My first block in the
RFNoC transmit chain accepts bursty data packets from the host, and encodes
the data using HDLC formatting, including inserting fill frames -- the goal
here is to generate a continuous transmit output without fully saturating
the PS->PL transport, keeping the PS load down, etc. FPGA fill-frame
creation is throttled via axi-stream backpressure from the radio endpoint,
which means the HDLC fill-frames will fill up all FIFO buffers of
downstream blocks. So, the latency of a single data packet to travel in the
FPGA from the PS to Radio output is largely driven by the size of the FIFOs
in the RFNoC path, and the output sample rate of the radio.
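As a rough illustration of that relationship: if every input FIFO sits full of fill frames, a new data packet waits for everything ahead of it to drain. The sketch below assumes 64-bit FIFO words and a single drain rate in bytes per second for the whole chain, which is a simplification since blocks such as a modulator change the data rate along the way. The 1 MB/s drain rate and the FIFO count of 5 are placeholders, not measurements.

```python
def queued_latency_ms(n_fifos, log2_depth, drain_bytes_per_s, word_bytes=8):
    """Milliseconds needed to drain n_fifos full FIFOs of 2**log2_depth
    words each, at a single assumed drain rate."""
    queued_bytes = n_fifos * (1 << log2_depth) * word_bytes
    return queued_bytes / drain_bytes_per_s * 1e3


if __name__ == "__main__":
    N_FIFOS = 5                # placeholder: one input FIFO per block
    DRAIN_BYTES_PER_S = 1e6    # placeholder drain rate

    for log2_depth in (11, 9):
        print("STR_SINK_FIFOSIZE=%d: %.2f ms"
              % (log2_depth,
                 queued_latency_ms(N_FIFOS, log2_depth, DRAIN_BYTES_PER_S)))
```

Whatever the real drain rate is, the model scales linearly with FIFO depth, which is where the hoped-for 4x reduction from 2^11 to 2^9 comes from.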
I agree consolidation is the "right" answer for best performance. But, I'm
trying to use a few off-the-shelf RFNoC blocks to reduce development time,
so if possible I'd prefer to tweak the exposed parameters rather than go
through the full process for a new FPGA block and software interface.
The STR_SINK_FIFOSIZE looks like a parameter that ought to be user
configurable -- but the limitation (from what I can tell) is that there
doesn't appear to be an easy way to change the pkt_size parameter on the
block port to stop graph_impl.cc from erroring...
> On X310 you can just set your MTU, and for RX applications you can pass
> "spp=<xxx>" to the RX radio block. But I haven't had to do it on E310 before
On the E310 (as on the X310), the spp parameter does influence packet size,
but the relevant port_def "pkt_size" parameter seems to be a different
beast which is not updated to reflect the estimated spp from the
stream_args. Perhaps it should?
Thanks!
EJ
NF
Nick Foster
Thu, Mar 1, 2018 11:40 PM
Oh! That makes complete sense. The dreaded fill-frame dilemma.
Stop me if this is nuts, but... I wonder if, instead of using AXI
backpressure to stop fill frame creation, you could set up a short custom
FIFO just upstream of the TX radio and pull an "almost empty" watermark out
of it. The custom FIFO could be made into an RFNoC block, if the added
propagation latency of the framer/deframer isn't too much for you. With
some tuning, that could prompt just-in-time creation of a fill frame by the
modulator -- every rising edge of the "almost empty" watermark signals the
creation of one fill frame. The connection back to the modulator could come
out of band (i.e., made in rfnoc_ce_auto_inst.v as extra interfaces to the
blocks) if you didn't want to muck about with the command port stuff in
RFNoC. The FIFO could be made just long enough so that when it's not full,
it's time to generate a fill frame, and use !full as the watermark.
This wouldn't help with the propagation latency through the RFNoC
flowgraph, but it would keep the FIFOs largely empty (up to the TX radio
FIFO itself, unless you put your watermarking inside the TX Radio block).
RFNoC is a crossbar, so I think it should all be deterministic enough
that you wouldn't have to keep too much in flight at any given time. But I
haven't tried operating it in this way.
It's a little Rube Goldberg, and it's a little antithetical to how AXI
streams are supposed to work in the first place, but it might save you from
having to reengineer a bunch of things to get your latency down.
Nick
On Thu, Mar 1, 2018 at 2:18 PM EJ Kreinar <ejkreinar@gmail.com> wrote:
> Hey Nick,
>
> I realize I've forgotten a key piece of information: My first block in the
> RFNoC transmit chain accepts bursty data packets from the host, and encodes
> the data using HDLC formatting, including inserting fill frames -- the goal
> here is to generate a continuous transmit output without fully saturating
> the PS->PL transport, keeping the PS load down, etc. FPGA fill-frame
> creation is throttled via axi-stream backpressure from the radio endpoint,
> which means the HDLC fill-frames will fill up all FIFO buffers of
> downstream blocks. So, the latency of a single data packet to travel in the
> FPGA from the PS to Radio output is largely driven by the size of the FIFOs
> in the RFNoC path, and the output sample rate of the radio.
>
> I agree consolidation is the "right" answer for best performance. But, I'm
> trying to use a few off-the-shelf RFNoC blocks to reduce development time,
> so if possible I'd prefer to tweak the exposed parameters rather than go
> through the full process for a new FPGA block and software interface.
>
> The STR_SINK_FIFOSIZE looks like a parameter that ought to be user
> configurable -- but the limitation (from what I can tell) is that there
> doesn't appear to be an easy way to change the pkt_size parameter on the
> block port to stop the graph_impl.cc from erroring....
>
> > On X310 you can just set your MTU, and for RX applications you can pass
> > "spp=<xxx>" to the RX radio block. But I haven't had to do it on E310 before
>
> On the E310 (as on the X310), the spp parameter does influence packet
> size, but the relevant port_def "pkt_size" parameter seems to be a
> different beast which is not updated to reflect the estimated spp from the
> stream_args.. Perhaps it should?
>
> Thanks!
> EJ
>
>
>
>
>
> On Thu, Mar 1, 2018 at 4:44 PM, Nick Foster <bistromath@gmail.com> wrote:
>
>> EJ,
>>
>> Reducing the stream sink FIFO size won't really help here unless I'm
>> misunderstanding something. You need to send shorter packets, yes, but
>> changing the stream sink FIFO size won't change that. The FIFO doesn't have
>> to be full before it passes your packet along. That said, I don't think
>> I've had to specify packet size on packets from the PS on E310 before, so
>> I'm no help figuring that out. On X310 you can just set your MTU, and for
>> RX applications you can pass "spp=<xxx>" to the RX radio block. But I
>> haven't had to do it on E310 before.
>>
>> But on another note, because whole packets are buffered before processing
>> by each block, your first instinct was good -- consolidating your blocks
>> will help more than just about anything else to reduce latency.
>>
>> On E310, there's another approach that can help as well: force the CE
>> clock to run at full rate. By default, E310 runs the compute engine clock
>> at the minimum required rate to process samples -- i.e., the radio clock
>> rate. But you can force the compute engines to run at the full bus rate
>> (64MHz) by changing rfnoc_ce_auto_inst.v so that instead of:
>>
>> wire ce_clk = radio_clk;
>> wire ce_rst = radio_rst;
>>
>> ...it says:
>>
>> wire ce_clk = bus_clk;
>> wire ce_rst = bus_rst;
>>
>> You'll consume more power, but you'll reduce latency some. It's also a
>> helpful trick to be able to run 1-input-N-output blocks in loopback, at
>> least up to something less than half the bus_clk rate.
>>
>> Nick
>>
>> On Thu, Mar 1, 2018 at 12:48 PM EJ Kreinar via USRP-users <
>> usrp-users@lists.ettus.com> wrote:
>>
>>> Hi All,
>>>
>>> I have an RFNoC setup with 5+ blocks contributing to a transmit path.
>>> I've measured the latency to get a signal through this RFNoC graph is
>>> non-negligible at my bit-rate -- I'm seeing several 10s of ms.
>>>
>>> Before I go crazy consolidating RFNoC blocks, I decided to try shrinking
>>> the STR_SINK_FIFOSIZE from 2^11 down to 2^9. I have a known MTU packet
>>> limit that's something like 1024 *bytes*, which should be manageable with a
>>> 2^9 FIFO size. This could potentially reduce the latency due to input FIFOs
>>> by 4x, which might be a good start. But, I hit a few problems here...
>>>
>>> When setting the STR_SINK_FIFOSIZE to 9, I get the following error:
>>>
>>> -- Assuming max packet size for 0/he360encoder_0
>>> Traceback (most recent call last):
>>> File "./e310_ground_modem.py", line 330, in <module>
>>> main()
>>> File "./e310_ground_modem.py", line 319, in main
>>> tb = top_block_cls(bitstream=options.bitstream,
>>> clk_rate=options.clk_rate, conv=options.conv, device=options.device,
>>> freq=options.freq, hdlc_enable=options.hdlc_enable,
>>> lo_offset=options.lo_offset, probeconsole=options.probeconsole,
>>> probecsv=options.probecsv, rate=options.rate, rs_enable=options.rs_enable,
>>> rxaddr=options.rxaddr, tx_gain=options.tx_gain)
>>> File "./e310_ground_modem.py", line 169, in __init__
>>> self.device3.connect(self.fpgacomms_he360encoder_0.get_block_id(),
>>> 0, self.hawkeye_qpsk_modulator_0.get_block_id(), 0)
>>> File
>>> "/deploy/dev-ejk/lib/python2.7/site-packages/ettus/ettus_swig.py", line
>>> 1595, in connect
>>> return _ettus_swig.device3_sptr_connect(self, *args)
>>> RuntimeError: RuntimeError: Input FIFO for block 0/qpskmodulator_0 is
>>> too small (4 kiB) for packets of size 7 kiB
>>> coming from block 0/he360encoder_0.
>>>
>>>
>>> When setting the STR_SINK_FIFOSIZE to 10, I get a different error:
>>>
>>> -- [0/he360encoder_0]
>>> source_block_ctrl_base::configure_flow_control_out() buf_size_pkts==1
>>> Traceback (most recent call last):
>>> File "./e310_ground_modem.py", line 330, in <module>
>>> main()
>>> File "./e310_ground_modem.py", line 319, in main
>>> tb = top_block_cls(bitstream=options.bitstream,
>>> clk_rate=options.clk_rate, conv=options.conv, device=options.device,
>>> freq=options.freq, hdlc_enable=options.hdlc_enable,
>>> lo_offset=options.lo_offset, probeconsole=options.probeconsole,
>>> probecsv=options.probecsv, rate=options.rate, rs_enable=options.rs_enable,
>>> rxaddr=options.rxaddr, tx_gain=options.tx_gain)
>>> File "./e310_ground_modem.py", line 169, in __init__
>>> self.device3.connect(self.fpgacomms_he360encoder_0.get_block_id(),
>>> 0, self.hawkeye_qpsk_modulator_0.get_block_id(), 0)
>>> File
>>> "/deploy/dev-ejk/lib/python2.7/site-packages/ettus/ettus_swig.py", line
>>> 1595, in connect
>>> return _ettus_swig.device3_sptr_connect(self, *args)
>>> RuntimeError: RuntimeError: Invalid window size 1 for block
>>> 0/he360encoder_0. Window size must at least be 2.
>>>
>>>
>>> Both errors trace back to uhd/host/lib/rfnoc/graph_impl.cc, where it
>>> appears that the graph is attempting to read the FIFO size, compare to the
>>> expected packet size, and intelligently throw errors when there's a
>>> potential issue. I appreciate this feature, but I can't find how to set the
>>> pkt_size argument... Passing a pkt_size stream_arg into the rfnoc block
>>> constructor does not change the port_def to have a different pkt_size.
>>>
>>> Any ideas? What am I missing? Do I manually need to access the tree to
>>> edit the port_def parameter at "_root_path/ports/direction/port_index" for
>>> my block?
>>>
>>> Cheers,
>>> EJ
EK
EJ Kreinar
Fri, Mar 2, 2018 12:22 AM
Aha, that's a pretty wild idea! I like it! It would essentially become a
tuning exercise: output just enough fill frames to keep the radio saturated
while keeping the FIFOs mostly empty. Still, I'm not sure
if I'll go down this "back channel" watermark approach now, or just
consolidate blocks... It's one new block either way, eh? :D
> This wouldn't help with the propagation latency through the RFNoC
> flowgraph
Agreed... I'm really not worried about the latency through the crossbar,
though. I fully expect a few milliseconds at most through the crossbar, but
I'm literally getting like 30-40 ms latency attributed to this TX path
right now.
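
As a rough sanity check on that number, here is a back-of-the-envelope estimate in Python of how long the radio takes to drain a chain of completely full input FIFOs. The assumptions are mine and loudly so: 8 bytes per FIFO word (inferred from the error message, where STR_SINK_FIFOSIZE=9 is reported as 4 kiB), 4 bytes per sample, every FIFO holding radio-format samples (not true for blocks that carry bits), and a made-up 500 kS/s output rate that is not taken from this thread.

```python
def full_fifo_latency_s(n_fifos, fifosize_log2, sample_rate_hz,
                        word_bytes=8, bytes_per_sample=4):
    """Time for the radio to drain n_fifos completely full input FIFOs.

    word_bytes and bytes_per_sample are assumptions, not values read
    from the FPGA image.
    """
    fifo_bytes = (1 << fifosize_log2) * word_bytes
    queued_samples = n_fifos * fifo_bytes / float(bytes_per_sample)
    return queued_samples / sample_rate_hz


if __name__ == "__main__":
    rate = 500e3  # hypothetical radio output rate in samples/s
    for size in (11, 10, 9):
        latency_ms = 1e3 * full_fifo_latency_s(5, size, rate)
        print("STR_SINK_FIFOSIZE=%d: %.2f ms" % (size, latency_ms))
```

Under these assumptions the latency scales linearly with FIFO depth, so going from 2^11 to 2^9 gives the 4x reduction mentioned in the original post.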
Of course, I still feel like the UHD software should support what I'm
trying to do with the STR_SINK_FIFOSIZE, and it may be applicable in other
situations perhaps, too...
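
For reference, the two errors from the original post are consistent with a simple size check like the Python sketch below. This is a reconstruction from the error messages only, not a copy of graph_impl.cc. The function name, the 8-byte FIFO word, and the exact 7 KiB packet size are assumptions (the message only says "7 kiB").

```python
def check_connection(fifosize_log2, pkt_size_bytes, word_bytes=8):
    """Mimic the FIFO-vs-packet-size check suggested by the error messages.

    Returns the flow-control window (whole packets that fit in the FIFO)
    or raises RuntimeError, loosely following the wording seen in the thread.
    """
    fifo_bytes = (1 << fifosize_log2) * word_bytes
    if fifo_bytes < pkt_size_bytes:
        raise RuntimeError(
            "Input FIFO is too small (%d kiB) for packets of size %d kiB"
            % (fifo_bytes // 1024, pkt_size_bytes // 1024))
    window = fifo_bytes // pkt_size_bytes
    if window < 2:
        raise RuntimeError(
            "Invalid window size %d. Window size must at least be 2." % window)
    return window


if __name__ == "__main__":
    assumed_pkt = 7 * 1024  # assumed default max packet size
    for size in (9, 10, 11):
        try:
            print(size, check_connection(size, assumed_pkt))
        except RuntimeError as err:
            print(size, "error:", err)
    # With a 1024-byte packet size, a 2^9 FIFO would pass this check.
    print(9, check_connection(9, 1024))
```

If the check really works this way, the blocker is the assumed packet size and not the FIFO depth, which is why being able to set pkt_size on the port matters.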
EJ
On Thu, Mar 1, 2018 at 6:40 PM, Nick Foster <bistromath@gmail.com> wrote:
> Oh! That makes complete sense. The dreaded fill-frame dilemma.
>
> Stop me if this is nuts, but... I wonder if, instead of using AXI
> backpressure to stop fill frame creation, you could set up a short custom
> FIFO just upstream of the TX radio and pull an "almost empty" watermark out
> of it. The custom FIFO could be made into an RFNoC block, if the added
> propagation latency of the framer/deframer isn't too much for you. With
> some tuning, that could prompt just-in-time creation of a fill frame by the
> modulator -- every rising edge of the "almost empty" watermark signals the
> creation of one fill frame. The connection back to the modulator could come
> out of band (i.e., made in rfnoc_ce_auto_inst.v as extra interfaces to the
> blocks) if you didn't want to muck about with the command port stuff in
> RFNoC. The FIFO could be made just long enough so that when it's not full,
> it's time to generate a fill frame, and use !full as the watermark.
>
> This wouldn't help with the propagation latency through the RFNoC
> flowgraph, but it would keep the FIFOs largely empty (up to the TX radio
> FIFO itself, unless you put your watermarking inside the TX Radio block).
> RFNoC is a crossbar, so I *think* it should all be deterministic enough
> that you wouldn't have to keep too much in flight at any given time. But I
> haven't tried operating it in this way.
>
> It's a little Rube Goldberg, and it's a little antithetical to how AXI
> streams are supposed to work in the first place, but it might save you from
> having to reengineer a bunch of things to get your latency down.
>
> Nick
>
> On Thu, Mar 1, 2018 at 2:18 PM EJ Kreinar <ejkreinar@gmail.com> wrote:
>
>> Hey Nick,
>>
>> I realize I've forgotten a key piece of information: My first block in
>> the RFNoC transmit chain accepts bursty data packets from the host, and
>> encodes the data using HDLC formatting, including inserting fill frames --
>> the goal here is to generate a continuous transmit output without fully
>> saturating the PS->PL transport, keeping the PS load down, etc. FPGA
>> fill-frame creation is throttled via axi-stream backpressure from the radio
>> endpoint, which means the HDLC fill-frames will fill up all FIFO buffers of
>> downstream blocks. So, the latency of a single data packet to travel in the
>> FPGA from the PS to Radio output is largely driven by the size of the FIFOs
>> in the RFNoC path, and the output sample rate of the radio.
>>
>> I agree consolidation is the "right" answer for best performance. But,
>> I'm trying to use a few off-the-shelf RFNoC blocks to reduce development
>> time, so if possible I'd prefer to tweak the exposed parameters rather than
>> go through the full process for a new FPGA block and software interface.
>>
>> The STR_SINK_FIFOSIZE looks like a parameter that ought to be user
>> configurable -- but the limitation (from what I can tell) is that there
>> doesn't appear to be an easy way to change the pkt_size parameter on the
>> block port to stop the graph_impl.cc from erroring....
>>
>> > On X310 you can just set your MTU, and for RX applications you can
>> > pass "spp=<xxx>" to the RX radio block. But I haven't had to do it on E310
>> > before
>>
>> On the E310 (as on the X310), the spp parameter does influence packet
>> size, but the relevant port_def "pkt_size" parameter seems to be a
>> different beast which is not updated to reflect the estimated spp from the
>> stream_args.. Perhaps it should?
>>
>> Thanks!
>> EJ
>>
>>
>>
>>
>>
>> On Thu, Mar 1, 2018 at 4:44 PM, Nick Foster <bistromath@gmail.com> wrote:
>>
>>> EJ,
>>>
>>> Reducing the stream sink FIFO size won't really help here unless I'm
>>> misunderstanding something. You need to send shorter packets, yes, but
>>> changing the stream sink FIFO size won't change that. The FIFO doesn't have
>>> to be full before it passes your packet along. That said, I don't think
>>> I've had to specify packet size on packets from the PS on E310 before, so
>>> I'm no help figuring that out. On X310 you can just set your MTU, and for
>>> RX applications you can pass "spp=<xxx>" to the RX radio block. But I
>>> haven't had to do it on E310 before.
>>>
>>> But on another note, because whole packets are buffered before
>>> processing by each block, your first instinct was good -- consolidating
>>> your blocks will help more than just about anything else to reduce latency.
>>>
>>> On E310, there's another approach that can help as well: force the CE
>>> clock to run at full rate. By default, E310 runs the compute engine clock
>>> at the minimum required rate to process samples -- i.e., the radio clock
>>> rate. But you can force the compute engines to run at the full bus rate
>>> (64MHz) by changing rfnoc_ce_auto_inst.v so that instead of:
>>>
>>> wire ce_clk = radio_clk;
>>> wire ce_rst = radio_rst;
>>>
>>> ...it says:
>>>
>>> wire ce_clk = bus_clk;
>>> wire ce_rst = bus_rst;
>>>
>>> You'll consume more power, but you'll reduce latency some. It's also a
>>> helpful trick to be able to run 1-input-N-output blocks in loopback, at
>>> least up to something less than half the bus_clk rate.
>>>
>>> Nick
>>>
>>> On Thu, Mar 1, 2018 at 12:48 PM EJ Kreinar via USRP-users <
>>> usrp-users@lists.ettus.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I have an RFNoC setup with 5+ blocks contributing to a transmit path.
>>>> I've measured the latency to get a signal through this RFNoC graph is
>>>> non-negligible at my bit-rate -- I'm seeing several 10s of ms.
>>>>
>>>> Before I go crazy consolidating RFNoC blocks, I decided to try
>>>> shrinking the STR_SINK_FIFOSIZE from 2^11 down to 2^9. I have a known MTU
>>>> packet limit that's something like 1024 *bytes*, which should be manageable
>>>> with a 2^9 FIFO size. This could potentially reduce the latency due to
>>>> input FIFOs by 4x, which might be a good start. But, I hit a few problems
>>>> here...
>>>>
>>>> When setting the STR_SINK_FIFOSIZE to 9, I get the following error:
>>>>
>>>> -- Assuming max packet size for 0/he360encoder_0
>>>> Traceback (most recent call last):
>>>> File "./e310_ground_modem.py", line 330, in <module>
>>>> main()
>>>> File "./e310_ground_modem.py", line 319, in main
>>>> tb = top_block_cls(bitstream=options.bitstream,
>>>> clk_rate=options.clk_rate, conv=options.conv, device=options.device,
>>>> freq=options.freq, hdlc_enable=options.hdlc_enable,
>>>> lo_offset=options.lo_offset, probeconsole=options.probeconsole,
>>>> probecsv=options.probecsv, rate=options.rate, rs_enable=options.rs_enable,
>>>> rxaddr=options.rxaddr, tx_gain=options.tx_gain)
>>>> File "./e310_ground_modem.py", line 169, in __init__
>>>> self.device3.connect(self.fpgacomms_he360encoder_0.get_block_id(),
>>>> 0, self.hawkeye_qpsk_modulator_0.get_block_id(), 0)
>>>> File "/deploy/dev-ejk/lib/python2.7/site-packages/ettus/ettus_swig.py",
>>>> line 1595, in connect
>>>> return _ettus_swig.device3_sptr_connect(self, *args)
>>>> RuntimeError: RuntimeError: Input FIFO for block 0/qpskmodulator_0 is
>>>> too small (4 kiB) for packets of size 7 kiB
>>>> coming from block 0/he360encoder_0.
>>>>
>>>>
>>>> When setting the STR_SINK_FIFOSIZE to 10, I get a different error:
>>>>
>>>> -- [0/he360encoder_0] source_block_ctrl_base::configure_flow_control_out()
>>>> buf_size_pkts==1
>>>> Traceback (most recent call last):
>>>> File "./e310_ground_modem.py", line 330, in <module>
>>>> main()
>>>> File "./e310_ground_modem.py", line 319, in main
>>>> tb = top_block_cls(bitstream=options.bitstream,
>>>> clk_rate=options.clk_rate, conv=options.conv, device=options.device,
>>>> freq=options.freq, hdlc_enable=options.hdlc_enable,
>>>> lo_offset=options.lo_offset, probeconsole=options.probeconsole,
>>>> probecsv=options.probecsv, rate=options.rate, rs_enable=options.rs_enable,
>>>> rxaddr=options.rxaddr, tx_gain=options.tx_gain)
>>>> File "./e310_ground_modem.py", line 169, in __init__
>>>> self.device3.connect(self.fpgacomms_he360encoder_0.get_block_id(),
>>>> 0, self.hawkeye_qpsk_modulator_0.get_block_id(), 0)
>>>> File "/deploy/dev-ejk/lib/python2.7/site-packages/ettus/ettus_swig.py",
>>>> line 1595, in connect
>>>> return _ettus_swig.device3_sptr_connect(self, *args)
>>>> RuntimeError: RuntimeError: Invalid window size 1 for block
>>>> 0/he360encoder_0. Window size must at least be 2.
>>>>
>>>>
>>>> Both errors trace back to uhd/host/lib/rfnoc/graph_impl.cc, where it
>>>> appears that the graph is attempting to read the FIFO size, compare to the
>>>> expected packet size, and intelligently throw errors when there's a
>>>> potential issue. I appreciate this feature, but I can't find how to set the
>>>> pkt_size argument... Passing a pkt_size stream_arg into the rfnoc block
>>>> constructor does not change the port_def to have a different pkt_size.
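
The two errors above are consistent with simple arithmetic, sketched below. This is not the code in graph_impl.cc; the function names and the order of the checks are hypothetical, inferred only from the two error messages. It assumes one FIFO entry is 8 bytes (a 64-bit word), which matches the first error reporting a 2^9 FIFO as 4 kiB, and it takes the packet size to be the 7 kiB the error reports, which may be a rounded figure.

```python
WORD_BYTES = 8  # assumption: one FIFO entry is a 64-bit word


def fifo_bytes(log2_depth, word_bytes=WORD_BYTES):
    """FIFO capacity in bytes for a depth of 2**log2_depth entries."""
    return (1 << log2_depth) * word_bytes


def check_fifo(log2_depth, pkt_bytes):
    """Mimic the two checks suggested by the error messages (hypothetical)."""
    size = fifo_bytes(log2_depth)
    if size < pkt_bytes:
        return "fifo too small"
    window = size // pkt_bytes  # whole packets that fit in the FIFO
    if window < 2:
        return "window size %d < 2" % window
    return "ok (window %d)" % window


PKT_BYTES = 7 * 1024  # packet size as reported in the error (7 kiB)

for n in (9, 10, 11):
    print("STR_SINK_FIFOSIZE=%d: %d bytes, %s"
          % (n, fifo_bytes(n), check_fifo(n, PKT_BYTES)))
```

With these assumptions, a setting of 9 gives 4096 bytes (smaller than one packet), 10 gives 8192 bytes (room for one packet, so a window of 1), and 11 gives 16384 bytes (a window of 2). That would explain why only the default passes while the assumed packet size stays at its maximum.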
>>>>
>>>> Any ideas? What am I missing? Do I need to manually access the tree to
>>>> edit the port_def parameter at "_root_path/ports/direction/port_index"
>>>> for my block?
>>>>
>>>> Cheers,
>>>> EJ
>>>>
>>>
>>