circuitpython

Commit Graph

Author	SHA1	Message	Date
Scott Shawcroft	1ba28b3edc	Merge pull request #3370 from jepler/compression-bigrams add bigram compression to makeqstrdata (save ~100 bytes on trinket m0 de_DE)	2020-09-10 11:44:56 -07:00
Scott Shawcroft	683462c1b1	Merge pull request #3326 from tannewt/native_wifi Add native wifi API with ESP32S2 support	2020-09-10 11:20:44 -07:00
Jeff Epler	bdb07adfcc	translations: Make decompression clearer Now this gets filled in with values e.g., 128 (0x80) and 159 (0x9f).	2020-09-08 19:07:53 -05:00
Jeff Epler	73858ea682	circuitpy_mpconfig: enable 3-arg pow() with CIRCUITPY_FULL_BUILD This is needed for a port of python3's decimal.py module.	2020-09-06 10:07:57 -05:00
Jeff Epler	20c2dd0c08	core: add int.bit_length() when MICROPY_CPYTHON_COMPAT is enabled This method of integer objects is needed for a port of python3's decimal.py module. MICROPY_CPYTHON_COMPAT is enabled by CIRCUITPY_FULL_BUILD.	2020-09-06 09:53:16 -05:00
Scott Shawcroft	96cf60fbbd	Merge remote-tracking branch 'adafruit/main' into native_wifi	2020-09-03 16:34:56 -07:00
Scott Shawcroft	0b94638aeb	Changes based on Dan's feedback	2020-09-03 16:32:12 -07:00
Jeff Epler	cbfd38d1ce	Rename functions to encode_ngrams / decode_ngrams	2020-09-02 19:09:23 -05:00
Jeff Epler	c34cb82ecb	makeqstrdata: correct range of low code points to 0x80..0x9f inclusive The previous range was unintentionally big and overlaps some characters we'd like to use (and also 0xa0, which we don't intentionally use)	2020-09-02 15:52:02 -05:00
Jeff Epler	07740d19f3	add bigram compression to makeqstrdata Compress common unicode bigrams by making code points in the range 0x80 - 0xbf (inclusive) represent them. Then, they can be greedily encoded and the substituted code points handled by the existing Huffman compression. Normally code points in the range 0x80-0xbf are not used in Unicode, so we stake our own claim. Using the more arguably correct "Private Use Area" (PUA) would mean that for scripts that only use code points under 256 we would use more memory for the "values" table. bigram means "two letters", and is also sometimes called a "digram". It's nothing to do with "big RAM". For our purposes, a bigram represents two successive unicode code points, so for instance in our build on trinket m0 for English the most frequent are: ['t ', 'e ', 'in', 'd ', ...]. The bigrams are selected based on frequency in the corpus, but the selection is not necessarily optimal, for these reasons I can think of: * Suppose the corpus was just "tea" repeated 100 times. The top bigrams would be "te", and "ea". However, because of overlap, "te" could never be used. Thus, some bigrams might actually waste space * I _assume_ this has to be why e.g., bigram 0x86 "s " is more frequent than bigram 0x85 " a" in English for Trinket M0, because sequences like "can't add" would get the "t " digram and then be unable to use the " a" digram. * And generally, if a bigram is frequent then so are its constituents. Say that "i" and "n" both encode to just 5 or 6 bits, then the huffman code for "in" had better compress to 10 or fewer bits or it's a net loss! * I checked though! "i" is 5 bits, "n" is 6 bits (lucky guess) but the bigram 0x83 is also just 6 bits, so this one is a win of 5 bits for every "in" minus overhead. Yay, this round goes to team compression. * On the other hand, the least frequent bigram 0x9d " n" is 10 bits long and its constituent code points are 4+6 bits so there's no savings, but there is the cost of the table entry.
* and somehow 0x9f 'an' is never used at all! With or without accounting for overlaps, there is some optimum number of bigrams. Adding one more bigram uses at least 2 bytes (for the entry in the bigram table; 4 bytes if code points >255 are in the source text) and also needs a slot in the Huffman dictionary, so adding bigrams beyond the optimum number makes compression worse again. If it's an improvement, the fact that it's not guaranteed optimal doesn't seem to matter too much. It just leaves a little more fruit for the next sweep to pick up. Perhaps try adding the most frequent bigram not yet present, until it doesn't improve compression overall. Right now, de_DE is again the "fullest" build on trinket_m0. (It's reclaimed that spot from the ja translation somehow) This change saves 104 bytes there, increasing free space about 6.8%. In the larger (but not critically full) pyportal build it saves 324 bytes. The specific number of bigrams used (32) was chosen as it is the max number that fit within the 0x80..0xbf range. Larger tables would require the use of 16 bit code points in the de_DE build, losing savings overall. (Side note: The most frequent letters in English have been said to be: ETA OIN SHRDLU; but we have UAC EIL MOPRST in our corpus)	2020-09-01 17:12:22 -05:00
Scott Shawcroft	f0e60da51f	Merge pull request #3310 from dhalbert/ble_hci _bleio HCI implementation	2020-09-01 11:28:05 -07:00
Dan Halbert	6dbd369272	merge from upstream	2020-08-30 14:39:03 -04:00
Dan Halbert	b27d511251	address review; use constructor for HCI Adapter	2020-08-30 14:06:48 -04:00
Jeff Epler	455226ffde	builtinimport: Fix a crash with 'import ulab.linalg' on unix port only A crash like the following occurs in the unix port: ``` Program received signal SIGSEGV, Segmentation fault. 0x00005555555a2d7a in mp_obj_module_set_globals (self_in=0x55555562c860 <ulab_user_cmodule>, globals=0x55555562c840 <mp_module_ulab_globals>) at ../../py/objmodule.c:145 145 self->globals = globals; (gdb) up #1 0x00005555555b2781 in mp_builtin___import__ (n_args=5, args=0x7fffffffdbb0) at ../../py/builtinimport.c:496 496 mp_obj_module_set_globals(outer_module_obj, (gdb) #2 0x00005555555940c9 in mp_import_name (name=824, fromlist=0x555555621f10 <mp_const_none_obj>, level=0x1) at ../../py/runtime.c:1392 1392 return mp_builtin___import__(5, args); ``` I don't understand how it doesn't happen on the embedded ports, because the module object should reside in ROM and the assignment of self->globals should trigger a Hard Fault. By checking VERIFY_PTR, we know that the pointed-to data is on the heap so we can do things like mutate it.	2020-08-30 11:09:49 -05:00
Scott Shawcroft	767ca5c3dc	Merge remote-tracking branch 'adafruit/main' into native_wifi	2020-08-27 11:42:31 -07:00
Jeff Epler	2e0a109331	Merge pull request #3318 from jepler/interrupt-serial-rx supervisor: check for interrupt during rx_chr	2020-08-25 21:01:33 -05:00
Scott Shawcroft	8b71e26abd	Merge remote-tracking branch 'adafruit/main' into native_wifi	2020-08-25 16:39:23 -07:00
Jeff Epler	c0753c1afb	mp_obj_print_helper: Handle a ctrl-c that comes in during printing In #2689, hitting ctrl-c during the printing of an object with a lot of sub-objects could cause the screen to stop updating (without showing a KeyboardInterrupt). This makes the printing of such objects actually interruptable, and also correctly handles the KeyboardInterrupt: ``` >>> l = ["a" * 100] * 200 >>> l ['aaaaaaaaaaaaaaaaaaaaaa...aaaaaaaaaaa', Traceback (most recent call last): File "<stdin>", line 1, in <module> KeyboardInterrupt: >>> ```	2020-08-25 11:47:50 -05:00
Scott Shawcroft	701e80a025	Make socket reads interruptable	2020-08-21 11:00:02 -07:00
Dan Halbert	0e30dd8bcc	merge from upstream; working; includes debug_out code for debugging via Saleae for posterity	2020-08-20 20:29:57 -04:00
Scott Shawcroft	eb8b42aff1	Add basic error handling	2020-08-19 14:23:28 -07:00
Scott Shawcroft	1034cc1217	Add espidf module.	2020-08-19 14:23:28 -07:00
Scott Shawcroft	430530c74b	SSL works until it runs out of memory	2020-08-19 14:23:28 -07:00
Scott Shawcroft	c9ece21c28	SocketPool stubbed out	2020-08-19 14:22:13 -07:00
Scott Shawcroft	3860991111	Ping work and start to add socketpool	2020-08-19 14:22:13 -07:00
Scott Shawcroft	c53a72d3f5	Fix ipaddress import and parse ipv4 strings	2020-08-19 14:22:13 -07:00
Scott Shawcroft	c62ab6e09a	Add ipaddress	2020-08-19 14:22:12 -07:00
Scott Shawcroft	1a6f4e0fe0	Scanning WIP. Need to sort out supervisor memory	2020-08-19 14:22:12 -07:00
Scott Shawcroft	c5b8401a15	First crack at native wifi API	2020-08-19 14:21:59 -07:00
Scott Shawcroft	6857f98426	Split pulseio.PWMOut into pwmio This gives us better granularity when implementing new ports because PWMOut is commonly implemented before PulseIn and PulseOut. Fixes #3211	2020-08-18 13:08:33 -07:00
Scott Shawcroft	24ca5c0218	Merge pull request #3295 from tannewt/turn_off_terminalio Turn off terminalio for ja and ko	2020-08-18 12:10:31 -07:00
Taku Fukada	79a3796b1c	Calculate the Huffman codebook without MP_QSTRs	2020-08-18 23:21:14 +09:00
Scott Shawcroft	d01f5dc0bd	Turn off terminalio for ja and ko The font is missing many characters and the build needs the space. We can optimize font storage when we get a good font. The serial output will work as usual.	2020-08-17 17:17:59 -07:00
Jeff Epler	08ed09acc6	makeqstrdata: don't print "compression increased length" messages This check as implemented is misleading, because it compares the compressed size in bytes (including the length indication) with the source string length in Unicode code points. For English this is approximately fair, but for Japanese this is quite unfair and produces an excess of "increased length" messages. This message might have existed for one of two reasons: * to alert to an improperly functioning huffman compression * to call attention to a need for a "string is stored uncompressed" case We know by now that the huffman compression is functioning as designed and effective in general. Just to be on the safe side, I did some back-of-the-envelope estimates. I considered these three replacements for "the true source string size, in bytes": + decompressed_len_utf8 = len(decompressed.encode('utf-8')) + decompressed_len_utf16 = len(decompressed.encode('utf-16be')) + decompressed_len_bitsize = ((1+len(decompressed)) * math.ceil(math.log(1+len(values), 2)) + 7) // 8 The third counts how many bits each character requires (fewer than 128 characters in the source character set = 7, fewer than 256 = 8, fewer than 512 = 9, etc, adding a string-terminating value) and is in some way representative of the best way we would be able to store "uncompressed strings". The Japanese translation (largest as of writing) has just a few strings which increase by this metric. However, the amount of loss due to expansion in those cases is outweighed by the cost of adding 1 bit per string to indicate whether it's compressed or not. For instance, in the BOARD=trinket_m0 TRANSLATION=ja build the loss is 47 bytes over 300 strings. Adding 1 bit to each of 300 strings will cost about 37 bytes, leaving just 5 Thumb instructions to implement the code to check and decode "uncompressed" strings in order to break even.	2020-08-16 20:50:48 -05:00
Jeff Epler	cff448205f	Don't define SHARPDISPLAY when !DISPLAYIO .. even if FULL_BUILD	2020-08-12 07:39:28 -05:00
Jeff Epler	c1400bae9b	sharpmemory: Implement support for Sharp Memory Displays in framebufferio	2020-08-12 07:32:18 -05:00
Jeff Epler	93b373d617	"pop from empty %q" Saves 12 bytes code on trinket m0	2020-08-04 18:42:09 -05:00
Jeff Epler	65e26f4a06	py: mp_obj_get_type_qstr as macro saves 24 bytes	2020-08-04 14:45:45 -05:00
Jeff Epler	024c8da578	Combine some "can't convert" messages	2020-08-04 14:45:45 -05:00
Jeff Epler	c849b781c0	Combine 'index out of range' messages	2020-08-04 14:45:45 -05:00
Jeff Epler	89797fd3f9	various: Use mp_obj_get_type_qstr more widely This removes runtime allocations of the cstring version of the qstring. It is not a size improvement	2020-08-04 14:45:45 -05:00
Jeff Epler	c37a25f0e5	Use qstrs to save an additional 4 bytes	2020-08-04 14:45:45 -05:00
Jeff Epler	92917b84f1	fix exception type for pop from empty set	2020-08-04 13:58:29 -05:00
Jeff Epler	67eb93fc98	py: introduce, use mp_raise_msg_vlist This saves a very small amount of flash, 8 bytes on trinket_m0	2020-08-04 13:34:29 -05:00
Jeff Epler	dddd25a776	Combine similar strings to reduce size of translations This is a slight trade-off with code size, in places where a "_varg" mp_raise variant is now used. The net savings on trinket_m0 is just 32 bytes. It also means that the translation will include the original English text, and cannot be translated. These are usually names of Python types such as int, set, or dict or special values such as "inf" or "Nan".	2020-08-04 13:34:29 -05:00
Dan Halbert	0a60aee3e4	wip: compiles	2020-08-02 11:36:38 -04:00
Jeff Epler	d69f081c04	Merge remote-tracking branch 'origin/main' into blm_badge	2020-07-30 07:24:48 -05:00
Scott Shawcroft	61d1148bb3	Merge pull request #3222 from WarriorOfWire/pick_micropython py/compile: Don't await __aiter__ special method in async-for.	2020-07-29 10:54:37 -07:00
Jeff Epler	9b8df7f635	Upgrade ulab This version * moves source files to reflect module structure * adds inline documentation suitable for extract_pyi * incompatibly moves spectrogram to fft * incompatibly removes "extras" There are some remaining markup errors in the specific revision of extmod/ulab but they do not prevent the doc building process from completing.	2020-07-28 16:57:48 -05:00
Dan Halbert	aa97ea2501	Merge remote-tracking branch 'adafruit/main' into blm_badge	2020-07-28 14:15:02 -04:00
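
The bigram scheme described in commit 07740d19f3 can be illustrated with a small sketch. This is not the code in makeqstrdata; the function names `pick_bigrams`, `encode_bigrams`, and `decode_bigrams` and the left-to-right scanning strategy are assumptions made for illustration. Only the starting code point (0x80) and the table size (32) come from the commit message.

```python
from collections import Counter

FIRST_CODE = 0x80   # first substituted code point (from the commit message)
MAX_BIGRAMS = 32    # table size named in the commit message


def pick_bigrams(corpus, count=MAX_BIGRAMS):
    """Return the `count` most frequent two-character sequences in `corpus`."""
    freq = Counter()
    for text in corpus:
        freq.update(text[i:i + 2] for i in range(len(text) - 1))
    return [bigram for bigram, _ in freq.most_common(count)]


def encode_bigrams(text, bigrams):
    """Greedily replace known bigrams, scanning left to right (an assumption)."""
    table = {b: chr(FIRST_CODE + i) for i, b in enumerate(bigrams)}
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in table:
            out.append(table[pair])  # one code point stands in for two
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def decode_bigrams(text, bigrams):
    """Expand substituted code points back into their two-character bigrams."""
    out = []
    for ch in text:
        index = ord(ch) - FIRST_CODE
        out.append(bigrams[index] if 0 <= index < len(bigrams) else ch)
    return "".join(out)
```

With the table `["te", "ea"]`, `encode_bigrams("teatea", ...)` consumes every "te" first, so in this sketch the overlapping "ea" entry never appears in the output. That is the kind of wasted table slot the commit message warns about. The encoded text would then be passed to the existing Huffman stage, which is not shown here.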

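The three size estimates quoted in commit 08ed09acc6 can be tried out on sample strings with a small helper. The three formulas are taken from the commit message; the wrapper function `uncompressed_size_estimates` and its argument names are hypothetical, and `values` is assumed to be the collection of distinct characters in the source character set.

```python
import math


def uncompressed_size_estimates(decompressed, values):
    """Return (utf8, utf16, bitsize) byte estimates for an uncompressed string.

    `values` is assumed to hold the distinct characters of the character set.
    """
    utf8 = len(decompressed.encode("utf-8"))
    utf16 = len(decompressed.encode("utf-16be"))
    # Bits needed per character, leaving room for a string-terminating value.
    bits_per_char = math.ceil(math.log(1 + len(values), 2))
    # One extra symbol for the terminator, rounded up to whole bytes.
    bitsize = ((1 + len(decompressed)) * bits_per_char + 7) // 8
    return utf8, utf16, bitsize
```

For a character set of 100 distinct characters each symbol needs 7 bits, so a 3-character string plus terminator takes 28 bits, which rounds up to 4 bytes. The break-even figure in the commit message follows from similar arithmetic: 300 one-bit flags cost about 37 bytes, which leaves roughly 10 of the 47 lost bytes, or 5 instructions if each Thumb instruction is taken to be 2 bytes.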
3801 Commits