I have written my own test program. I will not publish it here because it can be very dangerous: used wrongly, it can destroy a device's layout (partitions + contents) within a few seconds.

However, the test program is freely available; PM me if you want it (or maybe I will publish it in the Advanced users section).
How is the test performed? The original question was about the optimal format for a USB stick; this in turn may depend greatly on the read/write unit used, a.k.a. the block size.
Each filesystem type has its own algorithms and allocation strategies for writing to the device, so I decided to bypass all the fancy (and great) features offered by the kernel and the filesystem, and test the raw device directly.
How is it done? Very simple: the device is opened for writing with the O_SYNC flag, which ensures that each block is really written to the device and does not stay in buffers.
The program performs 6 tests, for both read and write operations, using the following block sizes: 4K, 8K, 16K, 32K, 64K, and 128K. Each test transfers a total of 128 MB.
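For readers who want to see the idea without the real tool, here is a minimal Python sketch of the write half of such a test. It is not the testflash program, only an illustration under my own assumptions: the function name timed_sync_write and the fill pattern are made up, and the demo at the bottom deliberately writes to a temporary file instead of a device. Pointing it at a raw device such as /dev/sdX would need root and would overwrite whatever is stored there.

```python
import os
import tempfile
import time

# Block sizes from the post; the 128 MiB total is only used as the default.
BLOCK_SIZES = [4096, 8192, 16384, 32768, 65536, 131072]
TOTAL_BYTES = 128 * 1024 * 1024


def timed_sync_write(path, block_size, total_bytes=TOTAL_BYTES):
    """Write total_bytes to path in block_size chunks using O_SYNC.

    Returns (elapsed_seconds, megabytes_per_second).
    Hypothetical helper, not the author's implementation.
    """
    buf = b"\xa5" * block_size  # arbitrary fill pattern (assumption)
    # O_SYNC makes each write() return only after the data reached the device.
    flags = os.O_WRONLY | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags)
    try:
        written = 0
        start = time.perf_counter()
        while written < total_bytes:
            written += os.write(fd, buf)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    elapsed = max(elapsed, 1e-9)  # avoid division by zero on tiny runs
    return elapsed, written / (1024 * 1024) / elapsed


if __name__ == "__main__":
    # Safe demo: 1 MiB per block size into a scratch file, never a device.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        scratch = tmp.name
    try:
        for size in BLOCK_SIZES:
            secs, rate = timed_sync_write(scratch, size, 1024 * 1024)
            print(f"{scratch}: block {size}: Write {secs:.2f} secs {rate:.2f} MB/s")
    finally:
        os.remove(scratch)
```

On a scratch file the figures mostly reflect the filesystem and the disk behind it, so they are not comparable with the raw-device results; the sketch only shows how O_SYNC and the block size enter the measurement.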
Now things start to get interesting. Here are some results:
EDIT: results modified because of a slightly improved version of the test program
Transcend 2 Gb:
scsi 14:0:0:0: Direct-Access JetFlash TS2GJFV30 8.07 PQ: 0 ANSI: 2
Disk /dev/sdc: 2055 MB, 2055207936 bytes
64 heads, 62 sectors/track, 1011 cylinders, total 4014078 sectors
# ./testflash /dev/sdc
/dev/sdc: block 4096: Read 11.42 secs 11.21 MB/s, Write 36.61 secs 3.50 MB/s
/dev/sdc: block 8192: Read 11.42 secs 11.21 MB/s, Write 31.94 secs 4.01 MB/s
/dev/sdc: block 16384: Read 11.47 secs 11.16 MB/s, Write 29.54 secs 4.33 MB/s
/dev/sdc: block 32768: Read 11.52 secs 11.11 MB/s, Write 28.21 secs 4.54 MB/s
/dev/sdc: block 65536: Read 11.52 secs 11.11 MB/s, Write 28.18 secs 4.54 MB/s
/dev/sdc: block 131072: Read 11.54 secs 11.09 MB/s, Write 28.23 secs 4.53 MB/s
Kingston 2 Gb:
scsi 15:0:0:0: Direct-Access Kingston DataTraveler 2.0 PMAP PQ: 0 ANSI: 0 CCS
Disk /dev/sdc: 2031 MB, 2031091712 bytes
63 heads, 62 sectors/track, 1015 cylinders, total 3966976 sectors
# ./testflash /dev/sdc
/dev/sdc: block 4096: Read 6.51 secs 19.66 MB/s, Write 102.43 secs 1.25 MB/s
/dev/sdc: block 8192: Read 6.52 secs 19.65 MB/s, Write 30.36 secs 4.22 MB/s
/dev/sdc: block 16384: Read 6.53 secs 19.59 MB/s, Write 25.59 secs 5.00 MB/s
/dev/sdc: block 32768: Read 6.54 secs 19.56 MB/s, Write 22.64 secs 5.65 MB/s
/dev/sdc: block 65536: Read 6.53 secs 19.61 MB/s, Write 21.54 secs 5.94 MB/s
/dev/sdc: block 131072: Read 6.53 secs 19.61 MB/s, Write 21.27 secs 6.02 MB/s
Verbatim 4 Gb:
scsi 16:0:0:0: Direct-Access General USB Flash Disk 1100 PQ: 0 ANSI: 0 CCS
Disk /dev/sdc: 4001 MB, 4001366016 bytes
124 heads, 62 sectors/track, 1016 cylinders, total 7815168 sectors
# ./testflash /dev/sdc
/dev/sdc: block 4096: Read 8.90 secs 14.39 MB/s, Write 101.54 secs 1.26 MB/s
/dev/sdc: block 8192: Read 8.58 secs 14.93 MB/s, Write 60.41 secs 2.12 MB/s
/dev/sdc: block 16384: Read 8.38 secs 15.27 MB/s, Write 37.37 secs 3.43 MB/s
/dev/sdc: block 32768: Read 8.45 secs 15.15 MB/s, Write 38.26 secs 3.35 MB/s
/dev/sdc: block 65536: Read 8.35 secs 15.32 MB/s, Write 34.95 secs 3.66 MB/s
/dev/sdc: block 131072: Read 8.32 secs 15.38 MB/s, Write 41.30 secs 3.10 MB/s
HP 4Gb:
scsi 17:0:0:0: Direct-Access HP v100w PMAP PQ: 0 ANSI: 0 CCS
Disk /dev/sdc: 4007 MB, 4007657472 bytes
124 heads, 62 sectors/track, 1018 cylinders, total 7827456 sectors
# ./testflash /dev/sdc
/dev/sdc: block 4096: Read 7.06 secs 18.13 MB/s, Write 225.12 secs 0.57 MB/s
/dev/sdc: block 8192: Read 7.06 secs 18.14 MB/s, Write 136.26 secs 0.94 MB/s
/dev/sdc: block 16384: Read 7.06 secs 18.13 MB/s, Write 83.80 secs 1.53 MB/s
/dev/sdc: block 32768: Read 6.88 secs 18.60 MB/s, Write 11.86 secs 10.79 MB/s
/dev/sdc: block 65536: Read 6.87 secs 18.63 MB/s, Write 8.12 secs 15.77 MB/s
/dev/sdc: block 131072: Read 6.89 secs 18.58 MB/s, Write 8.09 secs 15.81 MB/s
Kingston G3 8Gb:
scsi 18:0:0:0: Direct-Access Kingston DataTraveler G3 PMAP PQ: 0 ANSI: 0 CCS
Disk /dev/sdc: 8011 MB, 8011120640 bytes
247 heads, 62 sectors/track, 1021 cylinders, total 15646720 sectors
# ./testflash /dev/sdc
/dev/sdc: block 4096: Read 6.26 secs 20.43 MB/s, Write 109.36 secs 1.17 MB/s
/dev/sdc: block 8192: Read 6.26 secs 20.44 MB/s, Write 56.98 secs 2.25 MB/s
/dev/sdc: block 16384: Read 6.26 secs 20.44 MB/s, Write 41.33 secs 3.10 MB/s
/dev/sdc: block 32768: Read 6.10 secs 20.99 MB/s, Write 13.22 secs 9.68 MB/s
/dev/sdc: block 65536: Read 6.14 secs 20.86 MB/s, Write 11.50 secs 11.13 MB/s
/dev/sdc: block 131072: Read 6.10 secs 20.97 MB/s, Write 11.36 secs 11.27 MB/s
Interesting things to note:
- the read speed practically doesn't change when you change the block size
- the HP 4Gb device gives both the best and the worst write performance, depending on the block size (15.81 MB/s with 128K blocks, 0.57 MB/s with 4K blocks)
- nearly all devices write faster with large blocks
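The notes above can be checked directly against the printed numbers. The following Python sketch parses result lines in the format shown in this post and computes two simple figures for the HP v100w run: how much longer the slowest write test took than the fastest, and how much the read throughput varies across block sizes. The helper names (parse_results, write_speedup, read_spread) are my own and are not part of the test program.

```python
import re

# Result lines copied verbatim from the HP v100w run in this post.
HP_RESULTS = """\
/dev/sdc: block 4096: Read 7.06 secs 18.13 MB/s, Write 225.12 secs 0.57 MB/s
/dev/sdc: block 8192: Read 7.06 secs 18.14 MB/s, Write 136.26 secs 0.94 MB/s
/dev/sdc: block 16384: Read 7.06 secs 18.13 MB/s, Write 83.80 secs 1.53 MB/s
/dev/sdc: block 32768: Read 6.88 secs 18.60 MB/s, Write 11.86 secs 10.79 MB/s
/dev/sdc: block 65536: Read 6.87 secs 18.63 MB/s, Write 8.12 secs 15.77 MB/s
/dev/sdc: block 131072: Read 6.89 secs 18.58 MB/s, Write 8.09 secs 15.81 MB/s
"""

LINE_RE = re.compile(
    r"block (\d+): Read ([\d.]+) secs ([\d.]+) MB/s, "
    r"Write ([\d.]+) secs ([\d.]+) MB/s"
)


def parse_results(text):
    """Turn result lines into a list of dicts, one per block size."""
    rows = []
    for m in LINE_RE.finditer(text):
        rows.append({
            "block": int(m.group(1)),
            "read_secs": float(m.group(2)),
            "read_mbs": float(m.group(3)),
            "write_secs": float(m.group(4)),
            "write_mbs": float(m.group(5)),
        })
    return rows


def write_speedup(rows):
    """Slowest write time divided by fastest write time."""
    times = [r["write_secs"] for r in rows]
    return max(times) / min(times)


def read_spread(rows):
    """Relative spread of read throughput: (max - min) / max."""
    speeds = [r["read_mbs"] for r in rows]
    return (max(speeds) - min(speeds)) / max(speeds)


if __name__ == "__main__":
    rows = parse_results(HP_RESULTS)
    print(f"write: slowest/fastest = {write_speedup(rows):.1f}x")
    print(f"read spread = {read_spread(rows):.1%}")
```

For the HP stick this gives a write-time ratio of roughly 28 between 4K and 128K blocks, while the read throughput varies by less than 3%. Pasting in the lines of another device gives the same comparison for that device.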
... to be continued with filesystem-related tests