Pushing the limits

The nice thing about Swift for Arduino these days is that we are having a lot of conversations like "exactly how much performance can you get from an atmega328p? How much program space is the limit? What about RAM?"

To me it shows we've gone far beyond the 'hobbyist' place we started from. Now that we build professional products, we are dealing with the same issue all professional embedded developers grapple with... in a word, efficiency.

To that end I thought it might be worth putting down some thoughts on what the numbers are, where the overheads are, and what the real limits on RAM and program memory are.

I'm talking here about the atmega328p as that's what most of our work uses... the classic Arduino UNO heart, but of course similar things apply to other chips.

On paper, we have 2k of RAM and 32k of program memory. Can I use all of that?

Of course not. ;-)

Let's break it down.

PROGRAM MEMORY

You might have seen my earlier article with a breakdown of what each part of a typical Swift for Arduino program is used for: http://swiftforarduino.blogspot.com/2020/07/where-do-all-my-bytes-go-analysis-of.html

With the latest Swift for Arduino, if you want to save space on simple programs you can start to go "bare metal" by replacing the usual import of the AVR module with an import of ATmega328P (note the capitalisation).

import ATmega328P

DDRB = 1<<5

while true {
    PORTB = 1<<5
    _delay_ms(500)
    PORTB = 0
    _delay_ms(500)
}

...in this program I added a simple C function containing assembly to perform delays for us. You could use libc instead (note: if you want to make this work in your S4A IDE, see FOOTNOTE 1).

The point is by avoiding the import of the AVR module, you have saved a lot of program memory. How much? Let's build it and see...

reading project.swift settings...
finished reading project.swift settings
Emitted main.bc
Compiled main.bc
Emitted delay.c.bc
Compiled delay.c.bc
Linking ELF file
Linked main.elf
Made HEX
   text    data     bss     dec     hex filename
    974       0       0     974     3ce main.elf
Size of program is 974 bytes. Size of global memory is 0 bytes.
2% (rounded) of flash memory used.
0% (rounded) of RAM used for global variables. Stack and heap will use more on top of that.
finished

That's a lot smaller! What is the program memory used for in this case? Let's break it down. First disassemble...

Disassembly of section .text:

00000000 <__vectors>:

0: 0c 94 34 00 jmp 0x68 ; 0x68 <__ctors_end>

4: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

8: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

c: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

10: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

14: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

18: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

1c: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

20: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

24: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

28: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

2c: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

30: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

34: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

38: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

3c: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

40: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

44: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

48: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

4c: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

50: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

54: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

58: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

5c: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

60: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

64: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>

00000068 <__ctors_end>:

68: 11 24 eor r1, r1

6a: 1f be out 0x3f, r1 ; 63

6c: cf ef ldi r28, 0xFF ; 255

6e: d8 e0 ldi r29, 0x08 ; 8

70: de bf out 0x3e, r29 ; 62

72: cd bf out 0x3d, r28 ; 61

00000074 <__do_copy_data>:

74: 11 e0 ldi r17, 0x01 ; 1

76: a0 e0 ldi r26, 0x00 ; 0

78: b1 e0 ldi r27, 0x01 ; 1

7a: ee ec ldi r30, 0xCE ; 206

7c: f3 e0 ldi r31, 0x03 ; 3

7e: 02 c0 rjmp .+4 ; 0x84 <__do_copy_data+0x10>

80: 05 90 lpm r0, Z+

82: 0d 92 st X+, r0

84: a0 30 cpi r26, 0x00 ; 0

86: b1 07 cpc r27, r17

88: d9 f7 brne .-10 ; 0x80 <__do_copy_data+0xc>

0000008a <__do_clear_bss>:

8a: 21 e0 ldi r18, 0x01 ; 1

8c: a0 e0 ldi r26, 0x00 ; 0

8e: b1 e0 ldi r27, 0x01 ; 1

90: 01 c0 rjmp .+2 ; 0x94 <.do_clear_bss_start>

00000092 <.do_clear_bss_loop>:

92: 1d 92 st X+, r1

00000094 <.do_clear_bss_start>:

94: a0 30 cpi r26, 0x00 ; 0

96: b2 07 cpc r27, r18

98: e1 f7 brne .-8 ; 0x92 <.do_clear_bss_loop>

9a: 0e 94 53 00 call 0xa6 ; 0xa6 <main>

9e: 0c 94 e5 01 jmp 0x3ca ; 0x3ca <_exit>

000000a2 <__bad_interrupt>:

a2: 0c 94 00 00 jmp 0 ; 0x0 <__vectors>

000000a6 <main>:

a6: cf 92 push r12

a8: df 92 push r13

aa: ef 92 push r14

ac: ff 92 push r15

ae: 0f 93 push r16

b0: 1f 93 push r17

b2: 80 e2 ldi r24, 0x20 ; 32

b4: 0e 94 c6 00 call 0x18c ; 0x18c <_setDDRB>

b8: 80 e0 ldi r24, 0x00 ; 0

ba: 90 e0 ldi r25, 0x00 ; 0

bc: 6c 01 movw r12, r24

be: 0a ef ldi r16, 0xFA ; 250

c0: 13 e4 ldi r17, 0x43 ; 67

c2: 80 e2 ldi r24, 0x20 ; 32

c4: 0e 94 c8 00 call 0x190 ; 0x190 <_setPORTB>

c8: 76 01 movw r14, r12

DDRB = 1<<5

while true {

PORTB = 1<<5

_delay_ms(500)

ca: b7 01 movw r22, r14

cc: c8 01 movw r24, r16

ce: 0e 94 71 00 call 0xe2 ; 0xe2 <_delay_ms>

d2: 80 e0 ldi r24, 0x00 ; 0

d4: 0e 94 c8 00 call 0x190 ; 0x190 <_setPORTB>

PORTB = 0

_delay_ms(500)

d8: b7 01 movw r22, r14

da: c8 01 movw r24, r16

dc: 0e 94 71 00 call 0xe2 ; 0xe2 <_delay_ms>

e0: f0 cf rjmp .-32 ; 0xc2 <main+0x1c>

000000e2 <_delay_ms>:

e2: 8f 92 push r8

e4: 9f 92 push r9

e6: af 92 push r10

e8: bf 92 push r11

ea: cf 92 push r12

ec: df 92 push r13

ee: ef 92 push r14

f0: ff 92 push r15

f2: 0f 93 push r16

f4: 1f 93 push r17

f6: 4c 01 movw r8, r24

f8: 7b 01 movw r14, r22

fa: 00 e0 ldi r16, 0x00 ; 0

fc: 10 e0 ldi r17, 0x00 ; 0

fe: 4a e7 ldi r20, 0x7A ; 122

100: 55 e4 ldi r21, 0x45 ; 69

102: 98 01 movw r18, r16

104: 0e 94 50 01 call 0x2a0 ; 0x2a0 <__mulsf3>

108: 6b 01 movw r12, r22

10a: 5c 01 movw r10, r24

10c: 40 e8 ldi r20, 0x80 ; 128

10e: 5f e3 ldi r21, 0x3F ; 63

110: 98 01 movw r18, r16

112: 0e 94 ca 00 call 0x194 ; 0x194 <__cmpsf2>

116: 88 23 and r24, r24

118: 1a f4 brpl .+6 ; 0x120 <_delay_ms+0x3e>

11a: 81 e0 ldi r24, 0x01 ; 1

11c: 90 e0 ldi r25, 0x00 ; 0

11e: 29 c0 rjmp .+82 ; 0x172 <_delay_ms+0x90>

120: 20 e0 ldi r18, 0x00 ; 0

122: 3f ef ldi r19, 0xFF ; 255

124: 4f e7 ldi r20, 0x7F ; 127

126: 57 e4 ldi r21, 0x47 ; 71

128: b6 01 movw r22, r12

12a: c5 01 movw r24, r10

12c: 0e 94 4b 01 call 0x296 ; 0x296 <__gesf2>

130: 10 e0 ldi r17, 0x00 ; 0

132: 18 17 cp r17, r24

134: cc f4 brge .+50 ; 0x168 <_delay_ms+0x86>

136: 20 e0 ldi r18, 0x00 ; 0

138: 30 e0 ldi r19, 0x00 ; 0

13a: 40 e2 ldi r20, 0x20 ; 32

13c: 51 e4 ldi r21, 0x41 ; 65

13e: b7 01 movw r22, r14

140: c4 01 movw r24, r8

142: 0e 94 50 01 call 0x2a0 ; 0x2a0 <__mulsf3>

146: 0e 94 cf 00 call 0x19e ; 0x19e <__fixunssfsi>

14a: cb 01 movw r24, r22

14c: 80 30 cpi r24, 0x00 ; 0

14e: 91 07 cpc r25, r17

150: 91 f0 breq .+36 ; 0x176 <_delay_ms+0x94>

152: 20 e9 ldi r18, 0x90 ; 144

154: 31 e0 ldi r19, 0x01 ; 1

156: 40 e0 ldi r20, 0x00 ; 0

158: f9 01 movw r30, r18

15a: 31 97 sbiw r30, 0x01 ; 1

15c: f1 f7 brne .-4 ; 0x15a <_delay_ms+0x78>

15e: 01 97 sbiw r24, 0x01 ; 1

160: 80 30 cpi r24, 0x00 ; 0

162: 94 07 cpc r25, r20

164: c9 f7 brne .-14 ; 0x158 <_delay_ms+0x76>

166: 07 c0 rjmp .+14 ; 0x176 <_delay_ms+0x94>

168: b6 01 movw r22, r12

16a: c5 01 movw r24, r10

16c: 0e 94 cf 00 call 0x19e ; 0x19e <__fixunssfsi>

170: cb 01 movw r24, r22

172: 01 97 sbiw r24, 0x01 ; 1

174: f1 f7 brne .-4 ; 0x172 <_delay_ms+0x90>

176: 1f 91 pop r17

178: 0f 91 pop r16

17a: ff 90 pop r15

17c: ef 90 pop r14

17e: df 90 pop r13

180: cf 90 pop r12

182: bf 90 pop r11

184: af 90 pop r10

186: 9f 90 pop r9

188: 8f 90 pop r8

18a: 08 95 ret

0000018c <redacted>:

ATmega328P module getters and setters

00000194 <__cmpsf2>:

194: 0e 94 fe 00 call 0x1fc ; 0x1fc <__fp_cmp>

198: 08 f4 brcc .+2 ; 0x19c <__cmpsf2+0x8>

19a: 81 e0 ldi r24, 0x01 ; 1

19c: 08 95 ret

0000019e <__fixunssfsi>:

19e: 0e 94 2a 01 call 0x254 ; 0x254 <__fp_splitA>

1a2: 88 f0 brcs .+34 ; 0x1c6 <__fixunssfsi+0x28>

1a4: 9f 57 subi r25, 0x7F ; 127

1a6: 98 f0 brcs .+38 ; 0x1ce <__fixunssfsi+0x30>

1a8: b9 2f mov r27, r25

1aa: 99 27 eor r25, r25

1ac: b7 51 subi r27, 0x17 ; 23

1ae: b0 f0 brcs .+44 ; 0x1dc <__fixunssfsi+0x3e>

1b0: e1 f0 breq .+56 ; 0x1ea <__fixunssfsi+0x4c>

1b2: 66 0f add r22, r22

1b4: 77 1f adc r23, r23

1b6: 88 1f adc r24, r24

1b8: 99 1f adc r25, r25

1ba: 1a f0 brmi .+6 ; 0x1c2 <__fixunssfsi+0x24>

1bc: ba 95 dec r27

1be: c9 f7 brne .-14 ; 0x1b2 <__fixunssfsi+0x14>

1c0: 14 c0 rjmp .+40 ; 0x1ea <__fixunssfsi+0x4c>

1c2: b1 30 cpi r27, 0x01 ; 1

1c4: 91 f0 breq .+36 ; 0x1ea <__fixunssfsi+0x4c>

1c6: 0e 94 44 01 call 0x288 ; 0x288 <__fp_zero>

1ca: b1 e0 ldi r27, 0x01 ; 1

1cc: 08 95 ret

1ce: 0c 94 44 01 jmp 0x288 ; 0x288 <__fp_zero>

1d2: 67 2f mov r22, r23

1d4: 78 2f mov r23, r24

1d6: 88 27 eor r24, r24

1d8: b8 5f subi r27, 0xF8 ; 248

1da: 39 f0 breq .+14 ; 0x1ea <__fixunssfsi+0x4c>

1dc: b9 3f cpi r27, 0xF9 ; 249

1de: cc f3 brlt .-14 ; 0x1d2 <__fixunssfsi+0x34>

1e0: 86 95 lsr r24

1e2: 77 95 ror r23

1e4: 67 95 ror r22

1e6: b3 95 inc r27

1e8: d9 f7 brne .-10 ; 0x1e0 <__fixunssfsi+0x42>

1ea: 3e f4 brtc .+14 ; 0x1fa <__fixunssfsi+0x5c>

1ec: 90 95 com r25

1ee: 80 95 com r24

1f0: 70 95 com r23

1f2: 61 95 neg r22

1f4: 7f 4f sbci r23, 0xFF ; 255

1f6: 8f 4f sbci r24, 0xFF ; 255

1f8: 9f 4f sbci r25, 0xFF ; 255

1fa: 08 95 ret

000001fc <__fp_cmp>:

1fc: 99 0f add r25, r25

1fe: 00 08 sbc r0, r0

200: 55 0f add r21, r21

202: aa 0b sbc r26, r26

204: e0 e8 ldi r30, 0x80 ; 128

206: fe ef ldi r31, 0xFE ; 254

208: 16 16 cp r1, r22

20a: 17 06 cpc r1, r23

20c: e8 07 cpc r30, r24

20e: f9 07 cpc r31, r25

210: c0 f0 brcs .+48 ; 0x242 <__fp_cmp+0x46>

212: 12 16 cp r1, r18

214: 13 06 cpc r1, r19

216: e4 07 cpc r30, r20

218: f5 07 cpc r31, r21

21a: 98 f0 brcs .+38 ; 0x242 <__fp_cmp+0x46>

21c: 62 1b sub r22, r18

21e: 73 0b sbc r23, r19

220: 84 0b sbc r24, r20

222: 95 0b sbc r25, r21

224: 39 f4 brne .+14 ; 0x234 <__fp_cmp+0x38>

226: 0a 26 eor r0, r26

228: 61 f0 breq .+24 ; 0x242 <__fp_cmp+0x46>

22a: 23 2b or r18, r19

22c: 24 2b or r18, r20

22e: 25 2b or r18, r21

230: 21 f4 brne .+8 ; 0x23a <__fp_cmp+0x3e>

232: 08 95 ret

234: 0a 26 eor r0, r26

236: 09 f4 brne .+2 ; 0x23a <__fp_cmp+0x3e>

238: a1 40 sbci r26, 0x01 ; 1

23a: a6 95 lsr r26

23c: 8f ef ldi r24, 0xFF ; 255

23e: 81 1d adc r24, r1

240: 81 1d adc r24, r1

242: 08 95 ret

00000244 <__fp_split3>:

244: 57 fd sbrc r21, 7

246: 90 58 subi r25, 0x80 ; 128

248: 44 0f add r20, r20

24a: 55 1f adc r21, r21

24c: 59 f0 breq .+22 ; 0x264 <__fp_splitA+0x10>

24e: 5f 3f cpi r21, 0xFF ; 255

250: 71 f0 breq .+28 ; 0x26e <__fp_splitA+0x1a>

252: 47 95 ror r20

00000254 <__fp_splitA>:

254: 88 0f add r24, r24

256: 97 fb bst r25, 7

258: 99 1f adc r25, r25

25a: 61 f0 breq .+24 ; 0x274 <__fp_splitA+0x20>

25c: 9f 3f cpi r25, 0xFF ; 255

25e: 79 f0 breq .+30 ; 0x27e <__fp_splitA+0x2a>

260: 87 95 ror r24

262: 08 95 ret

264: 12 16 cp r1, r18

266: 13 06 cpc r1, r19

268: 14 06 cpc r1, r20

26a: 55 1f adc r21, r21

26c: f2 cf rjmp .-28 ; 0x252 <__fp_split3+0xe>

26e: 46 95 lsr r20

270: f1 df rcall .-30 ; 0x254 <__fp_splitA>

272: 08 c0 rjmp .+16 ; 0x284 <__fp_splitA+0x30>

274: 16 16 cp r1, r22

276: 17 06 cpc r1, r23

278: 18 06 cpc r1, r24

27a: 99 1f adc r25, r25

27c: f1 cf rjmp .-30 ; 0x260 <__fp_splitA+0xc>

27e: 86 95 lsr r24

280: 71 05 cpc r23, r1

282: 61 05 cpc r22, r1

284: 08 94 sec

286: 08 95 ret

00000288 <__fp_zero>:

288: e8 94 clt

0000028a <__fp_szero>:

28a: bb 27 eor r27, r27

28c: 66 27 eor r22, r22

28e: 77 27 eor r23, r23

290: cb 01 movw r24, r22

292: 97 f9 bld r25, 7

294: 08 95 ret

00000296 <__gesf2>:

296: 0e 94 fe 00 call 0x1fc ; 0x1fc <__fp_cmp>

29a: 08 f4 brcc .+2 ; 0x29e <__gesf2+0x8>

29c: 8f ef ldi r24, 0xFF ; 255

29e: 08 95 ret

000002a0 <__mulsf3>:

2a0: 0e 94 74 01 call 0x2e8 ; 0x2e8 <__mulsf3x>

2a4: 0c 94 54 01 jmp 0x2a8 ; 0x2a8 <__fp_round>

000002a8 <__fp_round>:

2a8: 09 2e mov r0, r25

2aa: 03 94 inc r0

2ac: 00 0c add r0, r0

2ae: 11 f4 brne .+4 ; 0x2b4 <__fp_round+0xc>

2b0: 88 23 and r24, r24

2b2: 52 f0 brmi .+20 ; 0x2c8 <__fp_round+0x20>

2b4: bb 0f add r27, r27

2b6: 40 f4 brcc .+16 ; 0x2c8 <__fp_round+0x20>

2b8: bf 2b or r27, r31

2ba: 11 f4 brne .+4 ; 0x2c0 <__fp_round+0x18>

2bc: 60 ff sbrs r22, 0

2be: 04 c0 rjmp .+8 ; 0x2c8 <__fp_round+0x20>

2c0: 6f 5f subi r22, 0xFF ; 255

2c2: 7f 4f sbci r23, 0xFF ; 255

2c4: 8f 4f sbci r24, 0xFF ; 255

2c6: 9f 4f sbci r25, 0xFF ; 255

2c8: 08 95 ret

2ca: 0e 94 ce 01 call 0x39c ; 0x39c <__fp_pscA>

2ce: 38 f0 brcs .+14 ; 0x2de <__fp_round+0x36>

2d0: 0e 94 d5 01 call 0x3aa ; 0x3aa <__fp_pscB>

2d4: 20 f0 brcs .+8 ; 0x2de <__fp_round+0x36>

2d6: 95 23 and r25, r21

2d8: 11 f0 breq .+4 ; 0x2de <__fp_round+0x36>

2da: 0c 94 dc 01 jmp 0x3b8 ; 0x3b8 <__fp_inf>

2de: 0c 94 e2 01 jmp 0x3c4 ; 0x3c4 <__fp_nan>

2e2: 11 24 eor r1, r1

2e4: 0c 94 45 01 jmp 0x28a ; 0x28a <__fp_szero>

000002e8 <__mulsf3x>:

2e8: 0e 94 22 01 call 0x244 ; 0x244 <__fp_split3>

2ec: 70 f3 brcs .-36 ; 0x2ca <__fp_round+0x22>

000002ee <__mulsf3_pse>:

2ee: 95 9f mul r25, r21

2f0: c1 f3 breq .-16 ; 0x2e2 <__fp_round+0x3a>

2f2: 95 0f add r25, r21

2f4: 50 e0 ldi r21, 0x00 ; 0

2f6: 55 1f adc r21, r21

2f8: 62 9f mul r22, r18

2fa: f0 01 movw r30, r0

2fc: 72 9f mul r23, r18

2fe: bb 27 eor r27, r27

300: f0 0d add r31, r0

302: b1 1d adc r27, r1

304: 63 9f mul r22, r19

306: aa 27 eor r26, r26

308: f0 0d add r31, r0

30a: b1 1d adc r27, r1

30c: aa 1f adc r26, r26

30e: 64 9f mul r22, r20

310: 66 27 eor r22, r22

312: b0 0d add r27, r0

314: a1 1d adc r26, r1

316: 66 1f adc r22, r22

318: 82 9f mul r24, r18

31a: 22 27 eor r18, r18

31c: b0 0d add r27, r0

31e: a1 1d adc r26, r1

320: 62 1f adc r22, r18

322: 73 9f mul r23, r19

324: b0 0d add r27, r0

326: a1 1d adc r26, r1

328: 62 1f adc r22, r18

32a: 83 9f mul r24, r19

32c: a0 0d add r26, r0

32e: 61 1d adc r22, r1

330: 22 1f adc r18, r18

332: 74 9f mul r23, r20

334: 33 27 eor r19, r19

336: a0 0d add r26, r0

338: 61 1d adc r22, r1

33a: 23 1f adc r18, r19

33c: 84 9f mul r24, r20

33e: 60 0d add r22, r0

340: 21 1d adc r18, r1

342: 82 2f mov r24, r18

344: 76 2f mov r23, r22

346: 6a 2f mov r22, r26

348: 11 24 eor r1, r1

34a: 9f 57 subi r25, 0x7F ; 127

34c: 50 40 sbci r21, 0x00 ; 0

34e: 9a f0 brmi .+38 ; 0x376 <__mulsf3_pse+0x88>

350: f1 f0 breq .+60 ; 0x38e <__mulsf3_pse+0xa0>

352: 88 23 and r24, r24

354: 4a f0 brmi .+18 ; 0x368 <__mulsf3_pse+0x7a>

356: ee 0f add r30, r30

358: ff 1f adc r31, r31

35a: bb 1f adc r27, r27

35c: 66 1f adc r22, r22

35e: 77 1f adc r23, r23

360: 88 1f adc r24, r24

362: 91 50 subi r25, 0x01 ; 1

364: 50 40 sbci r21, 0x00 ; 0

366: a9 f7 brne .-22 ; 0x352 <__mulsf3_pse+0x64>

368: 9e 3f cpi r25, 0xFE ; 254

36a: 51 05 cpc r21, r1

36c: 80 f0 brcs .+32 ; 0x38e <__mulsf3_pse+0xa0>

36e: 0c 94 dc 01 jmp 0x3b8 ; 0x3b8 <__fp_inf>

372: 0c 94 45 01 jmp 0x28a ; 0x28a <__fp_szero>

376: 5f 3f cpi r21, 0xFF ; 255

378: e4 f3 brlt .-8 ; 0x372 <__mulsf3_pse+0x84>

37a: 98 3e cpi r25, 0xE8 ; 232

37c: d4 f3 brlt .-12 ; 0x372 <__mulsf3_pse+0x84>

37e: 86 95 lsr r24

380: 77 95 ror r23

382: 67 95 ror r22

384: b7 95 ror r27

386: f7 95 ror r31

388: e7 95 ror r30

38a: 9f 5f subi r25, 0xFF ; 255

38c: c1 f7 brne .-16 ; 0x37e <__mulsf3_pse+0x90>

38e: fe 2b or r31, r30

390: 88 0f add r24, r24

392: 91 1d adc r25, r1

394: 96 95 lsr r25

396: 87 95 ror r24

398: 97 f9 bld r25, 7

39a: 08 95 ret

0000039c <__fp_pscA>:

39c: 00 24 eor r0, r0

39e: 0a 94 dec r0

3a0: 16 16 cp r1, r22

3a2: 17 06 cpc r1, r23

3a4: 18 06 cpc r1, r24

3a6: 09 06 cpc r0, r25

3a8: 08 95 ret

000003aa <__fp_pscB>:

3aa: 00 24 eor r0, r0

3ac: 0a 94 dec r0

3ae: 12 16 cp r1, r18

3b0: 13 06 cpc r1, r19

3b2: 14 06 cpc r1, r20

3b4: 05 06 cpc r0, r21

3b6: 08 95 ret

000003b8 <__fp_inf>:

3b8: 97 f9 bld r25, 7

3ba: 9f 67 ori r25, 0x7F ; 127

3bc: 80 e8 ldi r24, 0x80 ; 128

3be: 70 e0 ldi r23, 0x00 ; 0

3c0: 60 e0 ldi r22, 0x00 ; 0

3c2: 08 95 ret

000003c4 <__fp_nan>:

3c4: 9f ef ldi r25, 0xFF ; 255

3c6: 80 ec ldi r24, 0xC0 ; 192

3c8: 08 95 ret

000003ca <_exit>:

3ca: f8 94 cli

000003cc <__stop_program>:

3cc: ff cf rjmp .-2 ; 0x3cc <__stop_program>

Most of it is C runtime and GCC runtime:

0x00-0x67 ... vectors (104 bytes)

0x68-0xa5 ... program setup (62 bytes)

0xa6-0xe1 ... our program, including function calls (60 bytes)

0xe2-0x18b ... _delay_ms function (170 bytes)

0x18c-0x193 ... DDRB and PORTB setting code within ATmega328P module (8 bytes)

0x194-0x3cd ... libGCC runtime (570 bytes)
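A quick way to sanity check a breakdown like this is to add the ranges back up and compare against the "text" column of the size report. Here is a minimal sketch in plain C that can be run on a desktop machine. The address ranges are copied from the list above; the function names and the region labels are my own, purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One address range from the disassembly; both ends are inclusive. */
struct region {
    const char *label;   /* illustrative label, not a linker symbol */
    uint16_t first;
    uint16_t last;
};

/* Ranges copied from the breakdown in the text. */
static const struct region regions[] = {
    { "vectors",           0x000, 0x067 },
    { "program setup",     0x068, 0x0a5 },
    { "main",              0x0a6, 0x0e1 },
    { "_delay_ms",         0x0e2, 0x18b },
    { "register setters",  0x18c, 0x193 },
    { "runtime and _exit", 0x194, 0x3cd },
};

/* Size in bytes of one inclusive address range. */
uint16_t range_size(uint16_t first, uint16_t last) {
    assert(last >= first);
    return (uint16_t)(last - first + 1u);
}

/* Sum of all ranges; should match the "text" column of the size report. */
uint16_t total_size(void) {
    uint16_t total = 0;
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        total = (uint16_t)(total + range_size(regions[i].first, regions[i].last));
    }
    return total;
}
```

If the total does not match the linker's number, a range has been missed or counted twice.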

Note that the GCC runtime is there largely because we did floating point arithmetic, which clang lowers into compiler runtime function calls. If we rewrote the delay function we could probably make it more efficient.

170 bytes is pretty much unavoidable: vectors and program overhead (including _exit, __stop_program, __ctors_end, __do_copy_data, __do_clear_bss, __bad_interrupt and __vectors).

Note: in this program about 740 bytes are being used by the inefficiently written delay function (170 bytes for _delay_ms itself plus roughly 570 bytes of floating point runtime it pulls in). We could probably improve on that.
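To give an idea of what an improved delay might look like, here is a rough integer-only sketch in C. It is not the delay.c used in the build above. The float constants loaded before the __mulsf3 calls in the disassembly decode to 4000.0 and 10.0, and the inner loop is an sbiw/brne pair, so I am assuming a 16 MHz clock and 4 cycles per inner pass. The names passes_per_ms and delay_ms_int are hypothetical, and the function returns a pass count only so that it can be checked on a desktop machine.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 16 MHz clock, as on a stock Arduino UNO. */
#define F_CPU_HZ 16000000UL
/* Assumption: one pass of the inner loop costs 4 cycles on AVR
   (an sbiw + brne pair, as seen in the disassembly). */
#define CYCLES_PER_PASS 4UL

/* Inner-loop passes needed for one millisecond, using integer maths only. */
uint16_t passes_per_ms(uint32_t f_cpu_hz) {
    uint32_t passes = f_cpu_hz / (CYCLES_PER_PASS * 1000UL);
    assert(passes <= UINT16_MAX);   /* must fit the 16-bit loop counter */
    return (uint16_t)passes;
}

/* Hypothetical stand-in for _delay_ms: busy-waits for ms milliseconds and
   returns how many inner passes were executed. */
uint32_t delay_ms_int(uint16_t ms) {
    const uint16_t passes = passes_per_ms(F_CPU_HZ);
    uint32_t executed = 0;
    while (ms-- > 0) {
        /* volatile stops the compiler from deleting the empty wait loop */
        for (volatile uint16_t n = passes; n > 0; n--) {
            executed++;
        }
    }
    return executed;
}
```

Because nothing here touches a float, the compiler has no reason to pull in __mulsf3, __fixunssfsi and friends. On real hardware the volatile loop will not cost exactly 4 cycles per pass, so the timing would need calibrating, or the inner loop replacing with a cycle-counted assembly loop.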

So, how do I know when I'm hitting my limit on program memory? The program size output above is helpful but it's not the full story. It shows a program size and a data size (in classic linker terms, the .text and .data segments). The .data segment holds the values that are copied into RAM at startup to initialise global variables that don't start at zero (more on that later), so it takes up flash as well. You have to add the two together, and then the total should be <= 32,768, right?

Well... close. However, Arduino boards normally also carry a bootloader of at least 512 bytes, and often it's a 1k bootloader... so that flash memory has gone too.

But the point is that AVR uses a lot of fixed flash memory even in the most basic programs. As well as the C runtime and GCC runtime overhead described above, there are a lot of fixed overheads in ISRs, etc. (see the other article).
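The arithmetic can be sketched as a small C helper, runnable on a desktop machine. The 32,768 byte total and the 974 byte program come from this post; the bootloader size is whatever your board actually carries, and the function names are just illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_TOTAL_BYTES 32768UL   /* atmega328p program memory */

/* Flash consumed by a program: code plus the initial values of globals. */
uint32_t flash_used(uint32_t text_bytes, uint32_t data_bytes) {
    return text_bytes + data_bytes;
}

/* Flash left over once the bootloader is accounted for.
   A negative result means the program will not fit. */
int32_t flash_free(uint32_t text_bytes, uint32_t data_bytes,
                   uint32_t bootloader_bytes) {
    assert(bootloader_bytes <= FLASH_TOTAL_BYTES);
    int32_t available = (int32_t)(FLASH_TOTAL_BYTES - bootloader_bytes);
    return available - (int32_t)flash_used(text_bytes, data_bytes);
}
```

For the bare metal blink above with a 512 byte bootloader, flash_free(974, 0, 512) gives 31,282 bytes still available.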

RAM

RAM *always* starts at 0x100 - why? On AVR the data addresses below that are needed for the CPU registers r0-r31 and for all the hardware registers... the CPU status register (I/O address 0x3f), the stack pointer (0x3d, 0x3e) and many others like PORTB. On the atmega328p those 256 addresses sit in front of the 2k of SRAM rather than being carved out of it, so RAM runs from 0x100 to 0x8FF, which is why the startup code above loads 0x08FF into the stack pointer. You haven't lost any RAM yet, but it does mean a RAM address is never below 0x100. (https://www.farnell.com/datasheets/1693866.pdf... this link might be broken, just google "atmega328p datasheet")

Add in global variables: that's the .data segment above plus .bss (variables initialised to zero). Again, both will be larger with AVR imported because of the various fixed-size global buffers we use for things like I2C, UART and int-to-string conversion. All of these are fixed overheads when you use the AVR module.

So that can easily be 256 bytes or so gone. And the stack? Well, that will always be using a few bytes at the top of RAM... estimate maybe 10 bytes for each level of function call you go down, varying a lot. If you use sophisticated Swift structs and pass them on the stack (occasionally buggy in S4A version 4.8 still... but we're working on it!) then you can easily add 50-100 bytes to each stack frame... if you are 4 functions deep, that's up to 400 bytes, about 20% of your RAM, just for the stack! If you do recursion it can quickly get worse and blow all the RAM.

So you can see that the room remaining for UnsafeMutableBufferPointers or Arrays can shrink dramatically.
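To put rough numbers on that, here is a back-of-envelope RAM budget in C, runnable on a desktop machine. It is only as good as its inputs: the per-frame stack cost is an estimate, as discussed above, and the function name and parameters are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_TOTAL_BYTES 2048L   /* atmega328p SRAM */

/* Rough RAM left for buffers and arrays after globals and the stack.
   bytes_per_frame is an estimate, not something the linker reports. */
int32_t ram_free(uint16_t data_bytes, uint16_t bss_bytes,
                 uint8_t call_depth, uint16_t bytes_per_frame) {
    int32_t globals = (int32_t)data_bytes + (int32_t)bss_bytes;
    assert(globals <= RAM_TOTAL_BYTES);   /* globals alone must fit */
    int32_t stack = (int32_t)call_depth * (int32_t)bytes_per_frame;
    return (int32_t)RAM_TOTAL_BYTES - globals - stack;
}
```

With 256 bytes of globals and 4 calls deep at 100 bytes per frame, ram_free(0, 256, 4, 100) leaves 1,392 bytes. With lean 10 byte frames the same program would have 1,752 bytes left.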

So the final conclusion? When S4A says...

60% (rounded) of flash memory used.

10% (rounded) of RAM used for global variables. Stack and heap will use more on top of that.

...it's not the whole story! It's a reasonable estimate, but you'll never be able to use 100% of flash memory or 100% of RAM for global variables.
