The nice thing about Swift for Arduino these days is that we're having a lot of conversations like "Exactly how much performance can you get from an ATmega328P? How much program space is the limit? What about RAM?"
To me it shows we've gone far beyond the 'hobbyist' place we started from. Now that we're building professional products, we're dealing with the same issues all professional embedded developers grapple with... in a word, efficiency.
To that end, I thought it might be worth putting down some thoughts on what the numbers are, where the overheads are, and what the limits on RAM and program memory really are.
I'm talking here about the ATmega328P, as that's what most of our work uses... the heart of the classic Arduino UNO. Of course, similar things apply to other chips.
The ATmega328P has 32KB of flash and 2KB of RAM, so can you actually use all of it? Of course not. ;-)
Let's break it down.
You might have seen my earlier article breaking down what each part of a typical Swift for Arduino program is used for: http://swiftforarduino.blogspot.com/2020/07/where-do-all-my-bytes-go-analysis-of.html
With the latest Swift for Arduino, if you want to save space on simple programs you can start to go "bare metal": instead of importing the AVR module as normal programs do, import ATmega328P (note the capitalisation).
In this program I added a simple C function containing assembly to perform delays for us; you could use libc instead (note: if you want to make this work in your S4A IDE, see FOOTNOTE 1).
The point is that by avoiding the import of the AVR module you save a lot of program memory. How much? Let's build it and see...
reading project.swift settings...
Size of program is 974 bytes. Size of global memory is 0 bytes.
2% (rounded) of flash memory used.
0% (rounded) of RAM used for global variables. Stack and heap will use more on top of that.
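The percentages appear to be computed against the chip's full memory with the fraction truncated rather than rounded: 974 x 100 / 32,768 is about 2.97, yet the output says 2%. If the 512-byte bootloader were subtracted first, the same formula would give 3%. That is our inference from this one output, not documented S4A behaviour. The helper below (the name percent_used is ours) reproduces the figures under that assumption.

```c
/* Assumed formula (a guess that matches the output above): share of the
   chip's total memory as a whole percent, truncated by integer division. */
static unsigned percent_used(unsigned used_bytes, unsigned total_bytes) {
    return used_bytes * 100u / total_bytes;
}
```

With this formula, percent_used(974, 32768) gives 2 and percent_used(0, 2048) gives 0.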
That's a lot smaller! What is the program memory used for in this case? Let's break it down. First disassemble...
Disassembly of section .text:
00000000 <__vectors>:
0: 0c 94 34 00 jmp 0x68 ; 0x68 <__ctors_end>
4: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
8: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
c: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
10: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
14: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
18: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
1c: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
20: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
24: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
28: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
2c: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
30: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
34: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
38: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
3c: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
40: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
44: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
48: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
4c: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
50: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
54: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
58: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
5c: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
60: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
64: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
00000068 <__ctors_end>:
68: 11 24 eor r1, r1
6a: 1f be out 0x3f, r1 ; 63
6c: cf ef ldi r28, 0xFF ; 255
6e: d8 e0 ldi r29, 0x08 ; 8
70: de bf out 0x3e, r29 ; 62
72: cd bf out 0x3d, r28 ; 61
00000074 <__do_copy_data>:
74: 11 e0 ldi r17, 0x01 ; 1
76: a0 e0 ldi r26, 0x00 ; 0
78: b1 e0 ldi r27, 0x01 ; 1
7a: ee ec ldi r30, 0xCE ; 206
7c: f3 e0 ldi r31, 0x03 ; 3
7e: 02 c0 rjmp .+4 ; 0x84 <__do_copy_data+0x10>
80: 05 90 lpm r0, Z+
82: 0d 92 st X+, r0
84: a0 30 cpi r26, 0x00 ; 0
86: b1 07 cpc r27, r17
88: d9 f7 brne .-10 ; 0x80 <__do_copy_data+0xc>
0000008a <__do_clear_bss>:
8a: 21 e0 ldi r18, 0x01 ; 1
8c: a0 e0 ldi r26, 0x00 ; 0
8e: b1 e0 ldi r27, 0x01 ; 1
90: 01 c0 rjmp .+2 ; 0x94 <.do_clear_bss_start>
00000092 <.do_clear_bss_loop>:
92: 1d 92 st X+, r1
00000094 <.do_clear_bss_start>:
94: a0 30 cpi r26, 0x00 ; 0
96: b2 07 cpc r27, r18
98: e1 f7 brne .-8 ; 0x92 <.do_clear_bss_loop>
9a: 0e 94 53 00 call 0xa6 ; 0xa6 <main>
9e: 0c 94 e5 01 jmp 0x3ca ; 0x3ca <_exit>
000000a2 <__bad_interrupt>:
a2: 0c 94 00 00 jmp 0 ; 0x0 <__vectors>
000000a6 <main>:
a6: cf 92 push r12
a8: df 92 push r13
aa: ef 92 push r14
ac: ff 92 push r15
ae: 0f 93 push r16
b0: 1f 93 push r17
b2: 80 e2 ldi r24, 0x20 ; 32
b4: 0e 94 c6 00 call 0x18c ; 0x18c <_setDDRB>
b8: 80 e0 ldi r24, 0x00 ; 0
ba: 90 e0 ldi r25, 0x00 ; 0
bc: 6c 01 movw r12, r24
be: 0a ef ldi r16, 0xFA ; 250
c0: 13 e4 ldi r17, 0x43 ; 67
c2: 80 e2 ldi r24, 0x20 ; 32
c4: 0e 94 c8 00 call 0x190 ; 0x190 <_setPORTB>
c8: 76 01 movw r14, r12
DDRB = 1<<5
while true {
PORTB = 1<<5
_delay_ms(500)
ca: b7 01 movw r22, r14
cc: c8 01 movw r24, r16
ce: 0e 94 71 00 call 0xe2 ; 0xe2 <_delay_ms>
d2: 80 e0 ldi r24, 0x00 ; 0
d4: 0e 94 c8 00 call 0x190 ; 0x190 <_setPORTB>
PORTB = 0
_delay_ms(500)
d8: b7 01 movw r22, r14
da: c8 01 movw r24, r16
dc: 0e 94 71 00 call 0xe2 ; 0xe2 <_delay_ms>
e0: f0 cf rjmp .-32 ; 0xc2 <main+0x1c>
000000e2 <_delay_ms>:
e2: 8f 92 push r8
e4: 9f 92 push r9
e6: af 92 push r10
e8: bf 92 push r11
ea: cf 92 push r12
ec: df 92 push r13
ee: ef 92 push r14
f0: ff 92 push r15
f2: 0f 93 push r16
f4: 1f 93 push r17
f6: 4c 01 movw r8, r24
f8: 7b 01 movw r14, r22
fa: 00 e0 ldi r16, 0x00 ; 0
fc: 10 e0 ldi r17, 0x00 ; 0
fe: 4a e7 ldi r20, 0x7A ; 122
100: 55 e4 ldi r21, 0x45 ; 69
102: 98 01 movw r18, r16
104: 0e 94 50 01 call 0x2a0 ; 0x2a0 <__mulsf3>
108: 6b 01 movw r12, r22
10a: 5c 01 movw r10, r24
10c: 40 e8 ldi r20, 0x80 ; 128
10e: 5f e3 ldi r21, 0x3F ; 63
110: 98 01 movw r18, r16
112: 0e 94 ca 00 call 0x194 ; 0x194 <__cmpsf2>
116: 88 23 and r24, r24
118: 1a f4 brpl .+6 ; 0x120 <_delay_ms+0x3e>
11a: 81 e0 ldi r24, 0x01 ; 1
11c: 90 e0 ldi r25, 0x00 ; 0
11e: 29 c0 rjmp .+82 ; 0x172 <_delay_ms+0x90>
120: 20 e0 ldi r18, 0x00 ; 0
122: 3f ef ldi r19, 0xFF ; 255
124: 4f e7 ldi r20, 0x7F ; 127
126: 57 e4 ldi r21, 0x47 ; 71
128: b6 01 movw r22, r12
12a: c5 01 movw r24, r10
12c: 0e 94 4b 01 call 0x296 ; 0x296 <__gesf2>
130: 10 e0 ldi r17, 0x00 ; 0
132: 18 17 cp r17, r24
134: cc f4 brge .+50 ; 0x168 <_delay_ms+0x86>
136: 20 e0 ldi r18, 0x00 ; 0
138: 30 e0 ldi r19, 0x00 ; 0
13a: 40 e2 ldi r20, 0x20 ; 32
13c: 51 e4 ldi r21, 0x41 ; 65
13e: b7 01 movw r22, r14
140: c4 01 movw r24, r8
142: 0e 94 50 01 call 0x2a0 ; 0x2a0 <__mulsf3>
146: 0e 94 cf 00 call 0x19e ; 0x19e <__fixunssfsi>
14a: cb 01 movw r24, r22
14c: 80 30 cpi r24, 0x00 ; 0
14e: 91 07 cpc r25, r17
150: 91 f0 breq .+36 ; 0x176 <_delay_ms+0x94>
152: 20 e9 ldi r18, 0x90 ; 144
154: 31 e0 ldi r19, 0x01 ; 1
156: 40 e0 ldi r20, 0x00 ; 0
158: f9 01 movw r30, r18
15a: 31 97 sbiw r30, 0x01 ; 1
15c: f1 f7 brne .-4 ; 0x15a <_delay_ms+0x78>
15e: 01 97 sbiw r24, 0x01 ; 1
160: 80 30 cpi r24, 0x00 ; 0
162: 94 07 cpc r25, r20
164: c9 f7 brne .-14 ; 0x158 <_delay_ms+0x76>
166: 07 c0 rjmp .+14 ; 0x176 <_delay_ms+0x94>
168: b6 01 movw r22, r12
16a: c5 01 movw r24, r10
16c: 0e 94 cf 00 call 0x19e ; 0x19e <__fixunssfsi>
170: cb 01 movw r24, r22
172: 01 97 sbiw r24, 0x01 ; 1
174: f1 f7 brne .-4 ; 0x172 <_delay_ms+0x90>
176: 1f 91 pop r17
178: 0f 91 pop r16
17a: ff 90 pop r15
17c: ef 90 pop r14
17e: df 90 pop r13
180: cf 90 pop r12
182: bf 90 pop r11
184: af 90 pop r10
186: 9f 90 pop r9
188: 8f 90 pop r8
18a: 08 95 ret
0000018c <redacted>:
ATmega328P module getters and setters
00000194 <__cmpsf2>:
194: 0e 94 fe 00 call 0x1fc ; 0x1fc <__fp_cmp>
198: 08 f4 brcc .+2 ; 0x19c <__cmpsf2+0x8>
19a: 81 e0 ldi r24, 0x01 ; 1
19c: 08 95 ret
0000019e <__fixunssfsi>:
19e: 0e 94 2a 01 call 0x254 ; 0x254 <__fp_splitA>
1a2: 88 f0 brcs .+34 ; 0x1c6 <__fixunssfsi+0x28>
1a4: 9f 57 subi r25, 0x7F ; 127
1a6: 98 f0 brcs .+38 ; 0x1ce <__fixunssfsi+0x30>
1a8: b9 2f mov r27, r25
1aa: 99 27 eor r25, r25
1ac: b7 51 subi r27, 0x17 ; 23
1ae: b0 f0 brcs .+44 ; 0x1dc <__fixunssfsi+0x3e>
1b0: e1 f0 breq .+56 ; 0x1ea <__fixunssfsi+0x4c>
1b2: 66 0f add r22, r22
1b4: 77 1f adc r23, r23
1b6: 88 1f adc r24, r24
1b8: 99 1f adc r25, r25
1ba: 1a f0 brmi .+6 ; 0x1c2 <__fixunssfsi+0x24>
1bc: ba 95 dec r27
1be: c9 f7 brne .-14 ; 0x1b2 <__fixunssfsi+0x14>
1c0: 14 c0 rjmp .+40 ; 0x1ea <__fixunssfsi+0x4c>
1c2: b1 30 cpi r27, 0x01 ; 1
1c4: 91 f0 breq .+36 ; 0x1ea <__fixunssfsi+0x4c>
1c6: 0e 94 44 01 call 0x288 ; 0x288 <__fp_zero>
1ca: b1 e0 ldi r27, 0x01 ; 1
1cc: 08 95 ret
1ce: 0c 94 44 01 jmp 0x288 ; 0x288 <__fp_zero>
1d2: 67 2f mov r22, r23
1d4: 78 2f mov r23, r24
1d6: 88 27 eor r24, r24
1d8: b8 5f subi r27, 0xF8 ; 248
1da: 39 f0 breq .+14 ; 0x1ea <__fixunssfsi+0x4c>
1dc: b9 3f cpi r27, 0xF9 ; 249
1de: cc f3 brlt .-14 ; 0x1d2 <__fixunssfsi+0x34>
1e0: 86 95 lsr r24
1e2: 77 95 ror r23
1e4: 67 95 ror r22
1e6: b3 95 inc r27
1e8: d9 f7 brne .-10 ; 0x1e0 <__fixunssfsi+0x42>
1ea: 3e f4 brtc .+14 ; 0x1fa <__fixunssfsi+0x5c>
1ec: 90 95 com r25
1ee: 80 95 com r24
1f0: 70 95 com r23
1f2: 61 95 neg r22
1f4: 7f 4f sbci r23, 0xFF ; 255
1f6: 8f 4f sbci r24, 0xFF ; 255
1f8: 9f 4f sbci r25, 0xFF ; 255
1fa: 08 95 ret
000001fc <__fp_cmp>:
1fc: 99 0f add r25, r25
1fe: 00 08 sbc r0, r0
200: 55 0f add r21, r21
202: aa 0b sbc r26, r26
204: e0 e8 ldi r30, 0x80 ; 128
206: fe ef ldi r31, 0xFE ; 254
208: 16 16 cp r1, r22
20a: 17 06 cpc r1, r23
20c: e8 07 cpc r30, r24
20e: f9 07 cpc r31, r25
210: c0 f0 brcs .+48 ; 0x242 <__fp_cmp+0x46>
212: 12 16 cp r1, r18
214: 13 06 cpc r1, r19
216: e4 07 cpc r30, r20
218: f5 07 cpc r31, r21
21a: 98 f0 brcs .+38 ; 0x242 <__fp_cmp+0x46>
21c: 62 1b sub r22, r18
21e: 73 0b sbc r23, r19
220: 84 0b sbc r24, r20
222: 95 0b sbc r25, r21
224: 39 f4 brne .+14 ; 0x234 <__fp_cmp+0x38>
226: 0a 26 eor r0, r26
228: 61 f0 breq .+24 ; 0x242 <__fp_cmp+0x46>
22a: 23 2b or r18, r19
22c: 24 2b or r18, r20
22e: 25 2b or r18, r21
230: 21 f4 brne .+8 ; 0x23a <__fp_cmp+0x3e>
232: 08 95 ret
234: 0a 26 eor r0, r26
236: 09 f4 brne .+2 ; 0x23a <__fp_cmp+0x3e>
238: a1 40 sbci r26, 0x01 ; 1
23a: a6 95 lsr r26
23c: 8f ef ldi r24, 0xFF ; 255
23e: 81 1d adc r24, r1
240: 81 1d adc r24, r1
242: 08 95 ret
00000244 <__fp_split3>:
244: 57 fd sbrc r21, 7
246: 90 58 subi r25, 0x80 ; 128
248: 44 0f add r20, r20
24a: 55 1f adc r21, r21
24c: 59 f0 breq .+22 ; 0x264 <__fp_splitA+0x10>
24e: 5f 3f cpi r21, 0xFF ; 255
250: 71 f0 breq .+28 ; 0x26e <__fp_splitA+0x1a>
252: 47 95 ror r20
00000254 <__fp_splitA>:
254: 88 0f add r24, r24
256: 97 fb bst r25, 7
258: 99 1f adc r25, r25
25a: 61 f0 breq .+24 ; 0x274 <__fp_splitA+0x20>
25c: 9f 3f cpi r25, 0xFF ; 255
25e: 79 f0 breq .+30 ; 0x27e <__fp_splitA+0x2a>
260: 87 95 ror r24
262: 08 95 ret
264: 12 16 cp r1, r18
266: 13 06 cpc r1, r19
268: 14 06 cpc r1, r20
26a: 55 1f adc r21, r21
26c: f2 cf rjmp .-28 ; 0x252 <__fp_split3+0xe>
26e: 46 95 lsr r20
270: f1 df rcall .-30 ; 0x254 <__fp_splitA>
272: 08 c0 rjmp .+16 ; 0x284 <__fp_splitA+0x30>
274: 16 16 cp r1, r22
276: 17 06 cpc r1, r23
278: 18 06 cpc r1, r24
27a: 99 1f adc r25, r25
27c: f1 cf rjmp .-30 ; 0x260 <__fp_splitA+0xc>
27e: 86 95 lsr r24
280: 71 05 cpc r23, r1
282: 61 05 cpc r22, r1
284: 08 94 sec
286: 08 95 ret
00000288 <__fp_zero>:
288: e8 94 clt
0000028a <__fp_szero>:
28a: bb 27 eor r27, r27
28c: 66 27 eor r22, r22
28e: 77 27 eor r23, r23
290: cb 01 movw r24, r22
292: 97 f9 bld r25, 7
294: 08 95 ret
00000296 <__gesf2>:
296: 0e 94 fe 00 call 0x1fc ; 0x1fc <__fp_cmp>
29a: 08 f4 brcc .+2 ; 0x29e <__gesf2+0x8>
29c: 8f ef ldi r24, 0xFF ; 255
29e: 08 95 ret
000002a0 <__mulsf3>:
2a0: 0e 94 74 01 call 0x2e8 ; 0x2e8 <__mulsf3x>
2a4: 0c 94 54 01 jmp 0x2a8 ; 0x2a8 <__fp_round>
000002a8 <__fp_round>:
2a8: 09 2e mov r0, r25
2aa: 03 94 inc r0
2ac: 00 0c add r0, r0
2ae: 11 f4 brne .+4 ; 0x2b4 <__fp_round+0xc>
2b0: 88 23 and r24, r24
2b2: 52 f0 brmi .+20 ; 0x2c8 <__fp_round+0x20>
2b4: bb 0f add r27, r27
2b6: 40 f4 brcc .+16 ; 0x2c8 <__fp_round+0x20>
2b8: bf 2b or r27, r31
2ba: 11 f4 brne .+4 ; 0x2c0 <__fp_round+0x18>
2bc: 60 ff sbrs r22, 0
2be: 04 c0 rjmp .+8 ; 0x2c8 <__fp_round+0x20>
2c0: 6f 5f subi r22, 0xFF ; 255
2c2: 7f 4f sbci r23, 0xFF ; 255
2c4: 8f 4f sbci r24, 0xFF ; 255
2c6: 9f 4f sbci r25, 0xFF ; 255
2c8: 08 95 ret
2ca: 0e 94 ce 01 call 0x39c ; 0x39c <__fp_pscA>
2ce: 38 f0 brcs .+14 ; 0x2de <__fp_round+0x36>
2d0: 0e 94 d5 01 call 0x3aa ; 0x3aa <__fp_pscB>
2d4: 20 f0 brcs .+8 ; 0x2de <__fp_round+0x36>
2d6: 95 23 and r25, r21
2d8: 11 f0 breq .+4 ; 0x2de <__fp_round+0x36>
2da: 0c 94 dc 01 jmp 0x3b8 ; 0x3b8 <__fp_inf>
2de: 0c 94 e2 01 jmp 0x3c4 ; 0x3c4 <__fp_nan>
2e2: 11 24 eor r1, r1
2e4: 0c 94 45 01 jmp 0x28a ; 0x28a <__fp_szero>
000002e8 <__mulsf3x>:
2e8: 0e 94 22 01 call 0x244 ; 0x244 <__fp_split3>
2ec: 70 f3 brcs .-36 ; 0x2ca <__fp_round+0x22>
000002ee <__mulsf3_pse>:
2ee: 95 9f mul r25, r21
2f0: c1 f3 breq .-16 ; 0x2e2 <__fp_round+0x3a>
2f2: 95 0f add r25, r21
2f4: 50 e0 ldi r21, 0x00 ; 0
2f6: 55 1f adc r21, r21
2f8: 62 9f mul r22, r18
2fa: f0 01 movw r30, r0
2fc: 72 9f mul r23, r18
2fe: bb 27 eor r27, r27
300: f0 0d add r31, r0
302: b1 1d adc r27, r1
304: 63 9f mul r22, r19
306: aa 27 eor r26, r26
308: f0 0d add r31, r0
30a: b1 1d adc r27, r1
30c: aa 1f adc r26, r26
30e: 64 9f mul r22, r20
310: 66 27 eor r22, r22
312: b0 0d add r27, r0
314: a1 1d adc r26, r1
316: 66 1f adc r22, r22
318: 82 9f mul r24, r18
31a: 22 27 eor r18, r18
31c: b0 0d add r27, r0
31e: a1 1d adc r26, r1
320: 62 1f adc r22, r18
322: 73 9f mul r23, r19
324: b0 0d add r27, r0
326: a1 1d adc r26, r1
328: 62 1f adc r22, r18
32a: 83 9f mul r24, r19
32c: a0 0d add r26, r0
32e: 61 1d adc r22, r1
330: 22 1f adc r18, r18
332: 74 9f mul r23, r20
334: 33 27 eor r19, r19
336: a0 0d add r26, r0
338: 61 1d adc r22, r1
33a: 23 1f adc r18, r19
33c: 84 9f mul r24, r20
33e: 60 0d add r22, r0
340: 21 1d adc r18, r1
342: 82 2f mov r24, r18
344: 76 2f mov r23, r22
346: 6a 2f mov r22, r26
348: 11 24 eor r1, r1
34a: 9f 57 subi r25, 0x7F ; 127
34c: 50 40 sbci r21, 0x00 ; 0
34e: 9a f0 brmi .+38 ; 0x376 <__mulsf3_pse+0x88>
350: f1 f0 breq .+60 ; 0x38e <__mulsf3_pse+0xa0>
352: 88 23 and r24, r24
354: 4a f0 brmi .+18 ; 0x368 <__mulsf3_pse+0x7a>
356: ee 0f add r30, r30
358: ff 1f adc r31, r31
35a: bb 1f adc r27, r27
35c: 66 1f adc r22, r22
35e: 77 1f adc r23, r23
360: 88 1f adc r24, r24
362: 91 50 subi r25, 0x01 ; 1
364: 50 40 sbci r21, 0x00 ; 0
366: a9 f7 brne .-22 ; 0x352 <__mulsf3_pse+0x64>
368: 9e 3f cpi r25, 0xFE ; 254
36a: 51 05 cpc r21, r1
36c: 80 f0 brcs .+32 ; 0x38e <__mulsf3_pse+0xa0>
36e: 0c 94 dc 01 jmp 0x3b8 ; 0x3b8 <__fp_inf>
372: 0c 94 45 01 jmp 0x28a ; 0x28a <__fp_szero>
376: 5f 3f cpi r21, 0xFF ; 255
378: e4 f3 brlt .-8 ; 0x372 <__mulsf3_pse+0x84>
37a: 98 3e cpi r25, 0xE8 ; 232
37c: d4 f3 brlt .-12 ; 0x372 <__mulsf3_pse+0x84>
37e: 86 95 lsr r24
380: 77 95 ror r23
382: 67 95 ror r22
384: b7 95 ror r27
386: f7 95 ror r31
388: e7 95 ror r30
38a: 9f 5f subi r25, 0xFF ; 255
38c: c1 f7 brne .-16 ; 0x37e <__mulsf3_pse+0x90>
38e: fe 2b or r31, r30
390: 88 0f add r24, r24
392: 91 1d adc r25, r1
394: 96 95 lsr r25
396: 87 95 ror r24
398: 97 f9 bld r25, 7
39a: 08 95 ret
0000039c <__fp_pscA>:
39c: 00 24 eor r0, r0
39e: 0a 94 dec r0
3a0: 16 16 cp r1, r22
3a2: 17 06 cpc r1, r23
3a4: 18 06 cpc r1, r24
3a6: 09 06 cpc r0, r25
3a8: 08 95 ret
000003aa <__fp_pscB>:
3aa: 00 24 eor r0, r0
3ac: 0a 94 dec r0
3ae: 12 16 cp r1, r18
3b0: 13 06 cpc r1, r19
3b2: 14 06 cpc r1, r20
3b4: 05 06 cpc r0, r21
3b6: 08 95 ret
000003b8 <__fp_inf>:
3b8: 97 f9 bld r25, 7
3ba: 9f 67 ori r25, 0x7F ; 127
3bc: 80 e8 ldi r24, 0x80 ; 128
3be: 70 e0 ldi r23, 0x00 ; 0
3c0: 60 e0 ldi r22, 0x00 ; 0
3c2: 08 95 ret
000003c4 <__fp_nan>:
3c4: 9f ef ldi r25, 0xFF ; 255
3c6: 80 ec ldi r24, 0xC0 ; 192
3c8: 08 95 ret
000003ca <_exit>:
3ca: f8 94 cli
000003cc <__stop_program>:
3cc: ff cf rjmp .-2 ; 0x3cc <__stop_program>
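Why does a delay pull in a multiply, two comparisons and a float-to-integer conversion? The values loaded into r18-r21 (and r22-r25) before each runtime call are IEEE-754 single-precision bit patterns. As a host-side illustration (not part of S4A; the helper names are ours), this C sketch turns the register bytes from the listing back into floats:

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

/* Assemble a 32-bit value from four AVR registers. avr-gcc passes a float in
   four consecutive registers with the lowest-numbered register holding the
   least significant byte (r22..r25 for the first argument, r18..r21 for the
   second), so pass them here from most to least significant. */
static uint32_t from_regs(uint8_t msb, uint8_t b2, uint8_t b1, uint8_t lsb) {
    return ((uint32_t)msb << 24) | ((uint32_t)b2 << 16) |
           ((uint32_t)b1 << 8) | (uint32_t)lsb;
}

/* Reinterpret an IEEE-754 single-precision bit pattern as a float.
   memcpy avoids the undefined behaviour of pointer type-punning. */
static float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Decoding the listing this way gives 500.0 for the argument set up in main (r25:r24 = 0x43FA, copied from r17:r16), then 4000.0, 1.0, 65535.0 and 10.0 inside _delay_ms, plus a plain 16-bit count of 400 (0x0190) for the inner busy loop. That sequence (scale by 4000, which is F_CPU/4000 at 16 MHz, clamp below 1.0, fall back to a x10 loop above 65535.0) appears to follow the same shape as the floating-point path of avr-libc's `_delay_ms()` in <util/delay.h>, and on a chip with no FPU each of those steps becomes a soft-float call.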
0x00-0x67 ... vectors (104 bytes)
0x68-0xa5 ... program setup (62 bytes)
0xa6-0xe1 ... our program (main), including function calls (60 bytes)
0xe2-0x18b ... _delay_ms function (170 bytes)
0x18c-0x193 ... DDRB and PORTB setting code within the ATmega328P module (8 bytes)
0x194-0x3c9 ... libgcc floating-point runtime (566 bytes)
0x3ca-0x3cd ... _exit and __stop_program (4 bytes)
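As a quick sanity check, the address ranges add up to exactly the 974 bytes the build reported. This throwaway C just encodes the breakdown (start and end addresses inclusive, with the runtime and exit code lumped together as one range) and sums it:

```c
#include <stddef.h>

/* One row of the breakdown: first and last byte address, inclusive. */
struct region {
    const char *name;
    unsigned first;
    unsigned last;
};

static const struct region regions[] = {
    {"vectors",                    0x000, 0x067},
    {"program setup",              0x068, 0x0a5},
    {"main",                       0x0a6, 0x0e1},
    {"_delay_ms",                  0x0e2, 0x18b},
    {"DDRB/PORTB accessors",       0x18c, 0x193},
    {"libgcc runtime incl. _exit", 0x194, 0x3cd},
};

static unsigned region_size(const struct region *r) {
    return r->last - r->first + 1;
}

/* Sum of all regions; should equal the reported program size. */
static unsigned total_size(void) {
    unsigned total = 0;
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; i++)
        total += region_size(&regions[i]);
    return total;
}
```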
Note that the GCC runtime is mostly there because we did floating point arithmetic, which clang lowers into calls to compiler runtime functions (__mulsf3, __cmpsf2, __gesf2 and __fixunssfsi, plus the __fp_* helpers they use). If we rewrote the delay function we could probably make it more efficient.
Another 170 bytes are pretty much unavoidable: the vectors and program overhead (__vectors, __ctors_end, __do_copy_data, __do_clear_bss, __bad_interrupt, _exit and __stop_program).
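What does the setup code actually do? Apart from zeroing r1, clearing SREG and pointing the stack at 0x08FF, __do_copy_data copies the initial values of .data from flash (Z starts at 0x03CE, the first byte after the code) into RAM at 0x0100, and __do_clear_bss zeroes .bss. In this program both loops compare against an end address of 0x0100, so they run zero times, matching the 0 bytes of global memory reported. Here's a host-side C model of the two loops; the function names and the byte arrays standing in for flash and SRAM are ours:

```c
#include <stddef.h>
#include <stdint.h>

/* Model of __do_copy_data: `ram` stands for SRAM starting at 0x0100 and
   `flash_init` for the .data initial values stored in flash after the code.
   On AVR this is the lpm r0, Z+ / st X+, r0 loop. */
static void do_copy_data(uint8_t *ram, const uint8_t *flash_init,
                         size_t data_size) {
    for (size_t i = 0; i < data_size; i++)
        ram[i] = flash_init[i];
}

/* Model of __do_clear_bss: .bss follows .data in RAM and is filled with
   zeros (st X+, r1, where r1 is always kept at zero). */
static void do_clear_bss(uint8_t *ram, size_t data_size, size_t bss_size) {
    for (size_t i = data_size; i < data_size + bss_size; i++)
        ram[i] = 0;
}
```

With data_size and bss_size both 0, as in this program, neither loop body runs, but the loop code is in flash regardless.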
Note: in this program roughly 740 bytes (the _delay_ms function plus the floating-point runtime it pulls in) are being used by an inefficiently written delay function; we could probably improve on that.
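One possible direction, as a minimal sketch assuming a 16 MHz clock and avr-libc's `_delay_loop_2()` (a busy loop taking 4 cycles per iteration) on the target: keep everything in integers so none of the soft-float helpers are linked. The name delay_ms_int is hypothetical, we haven't measured its size, and the loop overhead around each millisecond makes it run very slightly long. The non-AVR branch only exists so the sketch compiles and can be exercised on a desktop.

```c
#include <stdint.h>

#ifndef F_CPU
#define F_CPU 16000000UL /* assumption: 16 MHz, as on the Arduino UNO */
#endif

/* Iterations of a 4-cycle busy loop that make up one millisecond
   (4000 at 16 MHz). Must fit in 16 bits for _delay_loop_2. */
#define LOOPS_PER_MS ((uint16_t)(F_CPU / 4000UL))

#ifdef __AVR__
#include <util/delay_basic.h>
/* avr-libc's _delay_loop_2 spins for 4 CPU cycles per count. */
static void delay_1ms(void) { _delay_loop_2(LOOPS_PER_MS); }
#else
/* Host stand-in: just counts the loop iterations that would have run. */
static volatile uint32_t host_loop_counter;
static void delay_1ms(void) { host_loop_counter += LOOPS_PER_MS; }
#endif

/* Integer-only millisecond delay: no float maths, so none of the soft-float
   helpers (__mulsf3, __fixunssfsi, ...) are needed. */
void delay_ms_int(uint16_t ms) {
    while (ms--)
        delay_1ms();
}
```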
So, how do I know when I'm hitting my limit on program memory? The program size output above is helpful, but it's not the full story. It shows basically a program size and a data size (in classic linker terms, the .text and .data segments). The .data segment contains the values that are copied into RAM at startup to initialise global variables that aren't initialised to zero (more on that later), and those initial values are stored in flash too. So you have to add both numbers together, and the total should be <= 32,768 bytes, right?
Well... close. Arduino boards normally also have a 512 byte bootloader, and often it's a 1k bootloader... so that flash memory is gone too.
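For the 974-byte program above on a board with a 512-byte bootloader, that works out to 32,768 - 512 - 974 = 31,282 bytes of flash still free. A trivial helper for the same sum (the names are ours; text_bytes and data_bytes are the program and global-memory sizes from the build output):

```c
#include <stdbool.h>
#include <stdint.h>

#define ATMEGA328P_FLASH_BYTES 32768 /* 32KB of flash */

/* Flash left over once the bootloader, the code (.text) and the initial
   values of .data (also stored in flash) are accounted for.
   A negative result means the program won't fit. */
static int32_t flash_headroom(int32_t text_bytes, int32_t data_bytes,
                              int32_t bootloader_bytes) {
    return ATMEGA328P_FLASH_BYTES - bootloader_bytes - text_bytes - data_bytes;
}

static bool fits_in_flash(int32_t text_bytes, int32_t data_bytes,
                          int32_t bootloader_bytes) {
    return flash_headroom(text_bytes, data_bytes, bootloader_bytes) >= 0;
}
```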
But the broader point is that AVR uses a lot of fixed flash memory even in the most basic programs: as well as the C runtime and GCC runtime overheads described above, there are a lot of fixed overheads in ISRs, etc. (see the other article).
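RAM *always* starts at 0x100. Why? On AVR the first 256 addresses of the data space are taken by the CPU registers r0-r31 and all the hardware registers: the CPU status register (I/O address 0x3f), the stack pointer (0x3d, 0x3e) and many others like PORTB. On the ATmega328P that block sits below the 2KB of SRAM rather than inside it: SRAM runs from 0x100 to 0x8FF, which is why the startup code above sets the stack pointer to 0x08FF. So the registers don't cost you any of your 2,048 bytes, but every global you declare lives at 0x100 or above. (https://www.farnell.com/datasheets/1693866.pdf ... this link might be broken, just google "atmega328p datasheet")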
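The memory map can be cross-checked against the startup code in the disassembly. The constants below come from the ATmega328P datasheet's data memory map; the helper names are ours. Note that the `out` instruction uses I/O addresses, which sit 0x20 higher in the data address space, so SREG at I/O address 0x3f is data address 0x5f.

```c
#include <stdint.h>

/* ATmega328P data address space, per the datasheet memory map. */
#define REGISTER_FILE_START 0x0000u /* 32 CPU registers r0-r31 */
#define IO_START            0x0020u /* 64 I/O registers (SREG, SPL, SPH, PORTB...) */
#define EXT_IO_START        0x0060u /* 160 extended I/O registers */
#define SRAM_START          0x0100u /* first byte of internal SRAM */
#define SRAM_END            0x08FFu /* last byte of internal SRAM (RAMEND) */

/* Number of bytes of internal SRAM. */
static unsigned sram_bytes(void) {
    return SRAM_END - SRAM_START + 1u;
}

/* I/O addresses (as used by in/out) map to data addresses 0x20 higher. */
static unsigned io_to_data_address(unsigned io_address) {
    return io_address + IO_START;
}

/* The startup code loads r29:r28 = 0x08:0xFF and writes them to SPH:SPL. */
static uint16_t stack_pointer_from_startup(uint8_t sph, uint8_t spl) {
    return (uint16_t)(((uint16_t)sph << 8) | spl);
}
```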
Add in global variables: the .data segment above plus .bss (globals initialised to 0). Again, these will be larger with AVR imported, because of the various global fixed-size buffers we use for things like I2C, UART and int-to-string conversion. All of these are fixed overheads when you use the AVR module.
So that's another 256 bytes or so gone. And the stack? That will always be using a few bytes at the top of RAM; estimate maybe 10 bytes for each level of function call you go down, though it varies a lot. If you're using sophisticated Swift structs and passing them on the stack (still occasionally buggy in S4A version 4.8... but we're working on it!), you can easily add 50-100 bytes to each stack frame. Four functions deep, that's over 20% of your RAM just for the stack! If you use recursion it can quickly get worse and blow through all the RAM.
So you can see that the room remaining for UnsafeMutableBufferPointers or Arrays can shrink dramatically.
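To put rough numbers on it, using the guesses above rather than measurements: treat the ~256 bytes of AVR-module buffers as .bss and assume four call levels of 10 + 100 bytes each. That's 440 bytes of stack, about 21% of the 2,048 bytes of SRAM, leaving roughly 1,352 bytes for your own arrays and buffers. A small C sketch of the sum (the helper names are ours):

```c
#define ATMEGA328P_SRAM_BYTES 2048 /* 2KB of internal SRAM */

/* Stack guess: call depth times (baseline frame + extra bytes for structs
   passed on the stack). The inputs are rough guesses, not measurements. */
static int stack_estimate(int depth, int base_frame, int struct_extra) {
    return depth * (base_frame + struct_extra);
}

/* SRAM left for buffers, arrays and the heap after globals (.data + .bss)
   and the estimated stack. Negative means you've already run out. */
static int ram_headroom(int data_bytes, int bss_bytes, int stack_bytes) {
    return ATMEGA328P_SRAM_BYTES - data_bytes - bss_bytes - stack_bytes;
}

/* Share of SRAM, rounded down to a whole percent. */
static int percent_of_sram(int bytes) {
    return bytes * 100 / ATMEGA328P_SRAM_BYTES;
}
```

For example, ram_headroom(0, 256, stack_estimate(4, 10, 100)) gives 1352.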
So the final conclusion? When S4A says...
60% (rounded) of flash memory used.
10% (rounded) of RAM used for global variables. Stack and heap will use more on top of that.
...it's not the whole story! It's a reasonable estimate, but you'll never be able to use 100% of flash memory or 100% of RAM for global variables.