3845 Commits

Author SHA1 Message Date
Scott Shawcroft
0b94638aeb
Changes based on Dan's feedback 2020-09-03 16:32:12 -07:00
Jeff Epler
cbfd38d1ce Rename functions to encode_ngrams / decode_ngrams 2020-09-02 19:09:23 -05:00
Jeff Epler
c34cb82ecb makeqstrdata: correct range of low code points to 0x80..0x9f inclusive
The previous range was unintentionally big and overlaps some characters
we'd like to use (and also 0xa0, which we don't intentionally use)
2020-09-02 15:52:02 -05:00
Jeff Epler
07740d19f3 add bigram compression to makeqstrdata
Compress common unicode bigrams by making code points in the range
0x80 - 0xbf (inclusive) represent them.  Then, they can be greedily
encoded and the substituted code points handled by the existing Huffman
compression.  Normally code points in the range 0x80-0xbf are not used
in Unicode, so we stake our own claim.  Using the more arguably correct
"Private Use Area" (PUA) would mean that for scripts that only use
code points under 256 we would use more memory for the "values" table.

bigram means "two letters", and is also sometimes called a "digram".
It's nothing to do with "big RAM".  For our purposes, a bigram represents
two successive unicode code points, so for instance in our build on
trinket m0 for english the most frequent are:
['t ', 'e ', 'in', 'd ', ...].

The bigrams are selected based on frequency in the corpus, but the
selection is not necessarily optimal, for these reasons I can think of:
 * Suppose the corpus was just "tea" repeated 100 times.  The
   top bigrams would be "te", and "ea".  However,
   overlap, "te" could never be used.  Thus, some bigrams might actually
   waste space
    * I _assume_ this has to be why e.g., bigram 0x86 "s " is more
      frequent than bigram 0x85 " a" in English for Trinket M0, because
      sequences like "can't add" would get the "t " digram and then
      be unable to use the " a" digram.

 * And generally, if a bigram is frequent then so are its constituents.
   Say that "i" and "n" both encode to just 5 or 6 bits, then the huffman
   code for "in" had better compress to 10 or fewer bits or it's a net
   loss!
    * I checked though!  "i" is 5 bits, "n" is 6 bits (lucky guess)
      but the bigram 0x83 also just 6 bits, so this one is a win of
      5 bits for every "it" minus overhead.  Yay, this round goes to team
      compression.
    * On the other hand, the least frequent bigram 0x9d " n" is 10 bits
      long and its constituent code points are 4+6 bits so there's no
      savings, but there is the cost of the table entry.
    * and somehow 0x9f 'an' is never used at all!

With or without accounting for overlaps, there is some optimum number
of bigrams.  Adding one more bigram uses at least 2 bytes (for the
entry in the bigram table; 4 bytes if code points >255 are in the
source text) and also needs a slot in the Huffman dictionary, so
adding bigrams beyond the optimim number makes compression worse again.

If it's an improvement, the fact that it's not guaranteed optimal
doesn't seem to matter too much.  It just leaves a little more fruit
for the next sweep to pick up.  Perhaps try adding the most frequent
bigram not yet present, until it doesn't improve compression overall.

Right now, de_DE is again the "fullest" build on trinket_m0.  (It's
reclaimed that spot from the ja translation somehow)  This change saves
104 bytes there, increasing free space about 6.8%.  In the larger
(but not critically full) pyportal build it saves 324 bytes.

The specific number of bigrams used (32) was chosen as it is the max
number that fit within the 0x80..0xbf range.  Larger tables would
require the use of 16 bit code points in the de_DE build, losing savings
overall.

(Side note: The most frequent letters in English have been said
to be: ETA OIN SHRDLU; but we have UAC EIL MOPRST in our corpus)
2020-09-01 17:12:22 -05:00
Scott Shawcroft
f0e60da51f
Merge pull request #3310 from dhalbert/ble_hci
_bleio HCI implementation
2020-09-01 11:28:05 -07:00
Dan Halbert
6dbd369272 merge from upstream 2020-08-30 14:39:03 -04:00
Dan Halbert
b27d511251 address review; use constructor for HCI Adapter 2020-08-30 14:06:48 -04:00
Jeff Epler
455226ffde builtinimport: Fix a crash with 'import ulab.linalg' on unix port only
A crash like the following occurs in the unix port:
```
Program received signal SIGSEGV, Segmentation fault.
0x00005555555a2d7a in mp_obj_module_set_globals (self_in=0x55555562c860 <ulab_user_cmodule>, globals=0x55555562c840 <mp_module_ulab_globals>) at ../../py/objmodule.c:145
145	    self->globals = globals;
(gdb) up
#1  0x00005555555b2781 in mp_builtin___import__ (n_args=5, args=0x7fffffffdbb0) at ../../py/builtinimport.c:496
496	                mp_obj_module_set_globals(outer_module_obj,
(gdb)
#2  0x00005555555940c9 in mp_import_name (name=824, fromlist=0x555555621f10 <mp_const_none_obj>, level=0x1) at ../../py/runtime.c:1392
1392	    return mp_builtin___import__(5, args);
```

I don't understand how it doesn't happen on the embedded ports, because
the module object should reside in ROM and the assignment of self->globals
should trigger a Hard Fault.

By checking VERIFY_PTR, we know that the pointed-to data is on the heap
so we can do things like mutate it.
2020-08-30 11:09:49 -05:00
Scott Shawcroft
767ca5c3dc
Merge remote-tracking branch 'adafruit/main' into native_wifi 2020-08-27 11:42:31 -07:00
Jeff Epler
2e0a109331
Merge pull request #3318 from jepler/interrupt-serial-rx
supervisor: check for interrupt during rx_chr
2020-08-25 21:01:33 -05:00
Scott Shawcroft
8b71e26abd
Merge remote-tracking branch 'adafruit/main' into native_wifi 2020-08-25 16:39:23 -07:00
Jeff Epler
c0753c1afb mp_obj_print_helper: Handle a ctrl-c that comes in during printing
In #2689, hitting ctrl-c during the printing of an object with a lot of sub-objects could cause the screen to stop updating (without showing a KeyboardInterrupt).  This makes the printing of such objects acutally interruptable, and also correctly handles the KeyboardInterrupt:

```
>>> l = ["a" * 100] * 200
>>> l
['aaaaaaaaaaaaaaaaaaaaaa...aaaaaaaaaaa', Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyboardInterrupt:
>>>
```
2020-08-25 11:47:50 -05:00
Scott Shawcroft
701e80a025
Make socket reads interruptable 2020-08-21 11:00:02 -07:00
Dan Halbert
0e30dd8bcc merge from upstream; working; includes debug_out code for debugging via Saleae for posterity 2020-08-20 20:29:57 -04:00
Scott Shawcroft
eb8b42aff1
Add basic error handling 2020-08-19 14:23:28 -07:00
Scott Shawcroft
1034cc1217
Add espidf module. 2020-08-19 14:23:28 -07:00
Scott Shawcroft
430530c74b
SSL works until it runs out of memory 2020-08-19 14:23:28 -07:00
Scott Shawcroft
c9ece21c28
SocketPool stubbed out 2020-08-19 14:22:13 -07:00
Scott Shawcroft
3860991111
Ping work and start to add socketpool 2020-08-19 14:22:13 -07:00
Scott Shawcroft
c53a72d3f5
Fix ipaddress import and parse ipv4 strings 2020-08-19 14:22:13 -07:00
Scott Shawcroft
c62ab6e09a
Add ipaddress 2020-08-19 14:22:12 -07:00
Scott Shawcroft
1a6f4e0fe0
Scanning WIP. Need to sort out supervisor memory 2020-08-19 14:22:12 -07:00
Scott Shawcroft
c5b8401a15
First crack at native wifi API 2020-08-19 14:21:59 -07:00
Scott Shawcroft
6857f98426
Split pulseio.PWMOut into pwmio
This gives us better granularity when implementing new ports because
PWMOut is commonly implemented before PulseIn and PulseOut.

Fixes #3211
2020-08-18 13:08:33 -07:00
Scott Shawcroft
24ca5c0218
Merge pull request #3295 from tannewt/turn_off_terminalio
Turn off terminalio for ja and ko
2020-08-18 12:10:31 -07:00
Taku Fukada
79a3796b1c Calculate the Huffman codebook without MP_QSTRs 2020-08-18 23:21:14 +09:00
Scott Shawcroft
d01f5dc0bd
Turn off terminalio for ja and ko
The font is missing many characters and the build needs the space.
We can optimize font storage when we get a good font.

The serial output will work as usual.
2020-08-17 17:17:59 -07:00
Jeff Epler
08ed09acc6 makeqstrdata: don't print "compression incrased length" messages
This check as implemented is misleading, because it compares the
compressed size in bytes (including the length indication) with the source
string length in Unicode code points.  For English this is approximately
fair, but for Japanese this is quite unfair and produces an excess of
"increased length" messages.

This message might have existed for one of two reasons:
 * to alert to an improperly function huffman compression
 * to call attention to a need for a "string is stored uncompressed" case
We know by now that the huffman compression is functioning as designed and
effective in general.

Just to be on the safe side, I did some back-of-the-envelope estimates.
I considered these three replacements for "the true source string size, in bytes":
+    decompressed_len_utf8 = len(decompressed.encode('utf-8'))
+    decompressed_len_utf16 = len(decompressed.encode('utf-16be'))
+    decompressed_len_bitsize = ((1+len(decompressed)) * math.ceil(math.log(1+len(values), 2)) + 7) // 8

The third counts how many bits each character requires (fewer than 128
characters in the source character set = 7, fewer than 256 = 8, fewer than 512
= 9, etc, adding a string-terminating value) and is in some way representative
of the best way we would be able to store "uncompressed strings".  The Japanese
translation (largest as of writing) has just a few strings which increase by
this metric.  However, the amount of loss due to expansion in those cases is
outweighed by the cost of adding 1 bit per string to indicate whether it's
compressed or not.  For instance, in the BOARD=trinket_m0 TRANSLATION=ja build
the loss is 47 bytes over 300 strings.  Adding 1 bit to each of 300 strings will
cost about 37 bytes, leaving just 5 Thumb instructions to implement the code to
check and decode "uncompressed" strings in order to break even.
2020-08-16 20:50:48 -05:00
Jeff Epler
cff448205f Don't define SHARPDISPLAY when !DISPLAYIO
.. even if FULL_BUILD
2020-08-12 07:39:28 -05:00
Jeff Epler
c1400bae9b sharpmemory: Implement support for Sharp Memory Displays in framebufferio 2020-08-12 07:32:18 -05:00
Jeff Epler
93b373d617 "pop from empty %q"
Saves 12 bytes code on trinket m0
2020-08-04 18:42:09 -05:00
Jeff Epler
65e26f4a06 py: mp_obj_get_type_qstr as macro saves 24 bytes 2020-08-04 14:45:45 -05:00
Jeff Epler
024c8da578 Combine some "can't convert" messages 2020-08-04 14:45:45 -05:00
Jeff Epler
c849b781c0 Combine 'index out of range' messages 2020-08-04 14:45:45 -05:00
Jeff Epler
89797fd3f9 various: Use mp_obj_get_type_qstr more widely
This removes runtime allocations of the cstring version of the qstring.

It is not a size improvement
2020-08-04 14:45:45 -05:00
Jeff Epler
c37a25f0e5 Use qstrs to save an additional 4 bytes 2020-08-04 14:45:45 -05:00
Jeff Epler
92917b84f1 fix exception type for pop from empty set 2020-08-04 13:58:29 -05:00
Jeff Epler
67eb93fc98 py: introduce, use mp_raise_msg_vlist
This saves a very small amount of flash, 8 bytes on trinket_m0
2020-08-04 13:34:29 -05:00
Jeff Epler
dddd25a776 Combine similar strings to reduce size of translations
This is a slight trade-off with code size, in places where a "_varg"
mp_raise variant is now used.  The net savings on trinket_m0 is
just 32 bytes.

It also means that the translation will include the original English
text, and cannot be translated.  These are usually names of Python
types such as int, set, or dict or special values such as "inf" or
"Nan".
2020-08-04 13:34:29 -05:00
Dan Halbert
0a60aee3e4 wip: compiles 2020-08-02 11:36:38 -04:00
Jeff Epler
d69f081c04 Merge remote-tracking branch 'origin/main' into blm_badge 2020-07-30 07:24:48 -05:00
Scott Shawcroft
61d1148bb3
Merge pull request #3222 from WarriorOfWire/pick_micropython
py/compile: Don't await __aiter__ special method in async-for.
2020-07-29 10:54:37 -07:00
Jeff Epler
9b8df7f635 Upgrade ulab
This version
 * moves source files to reflect module structure
 * adds inline documentation suitable for extract_pyi
 * incompatibly moves spectrogram to fft
 * incompatibly removes "extras"

There are some remaining markup errors in the specific revision of
extmod/ulab but they do not prevent the doc building process from
completing.
2020-07-28 16:57:48 -05:00
Dan Halbert
aa97ea2501 Merge remote-tracking branch 'adafruit/main' into blm_badge 2020-07-28 14:15:02 -04:00
Dan Halbert
e5e132a364 add blm_badge; add CIRCUITPY_AUDIOBUSIOIO_I2SOUT 2020-07-28 11:49:54 -04:00
Jerry Needell
2bdd62220e adjust stack for SAMD21 to accomodate larger pystack -- update frozen module adafruit_busdevice 2020-07-27 21:50:36 -04:00
Jonathan Hogg
901f3dce6e py/compile: Don't await __aiter__ special method in async-for.
MicroPython's original implementation of __aiter__ was correct for an
earlier (provisional) version of PEP492 (CPython 3.5), where __aiter__ was
an async-def function.  But that changed in the final version of PEP492 (in
CPython 3.5.2) where the function was changed to a normal one.  See
https://www.python.org/dev/peps/pep-0492/#why-aiter-does-not-return-an-awaitable
See also the note at the end of this subsection in the docs:
https://docs.python.org/3.5/reference/datamodel.html#asynchronous-iterators
And for completeness the BPO: https://bugs.python.org/issue27243

To be consistent with the Python spec as it stands today (and now that
PEP492 is final) this commit changes MicroPython's behaviour to match
CPython:  __aiter__ should return an async-iterable object, but is not
itself awaitable.

The relevant tests are updated to match.

See #6267.
2020-07-24 22:55:37 -07:00
Kenny
9a1f1236cc require async for and async with to actually be in an async def method instead of just a generator 2020-07-24 19:47:34 -07:00
Kenny
e9b4e0bd35 remove new char*s because m0 is way oversubscribed 2020-07-23 20:41:10 -07:00
Kenny
51a79b1af7 add coroutine behavior for generators
coroutines don't have __next__; they also call themselves coroutines.
This does not change the fact that `async def` methods are generators,
but it does make them behave more like CPython.
2020-07-23 20:40:16 -07:00