Journaled Flash Storage – Emulating EEPROM over Flash, ACID Transactions, and More. Part II – Existing Implementations by Atmel, SiLabs, TI, STM, and Microchip

	Author:	“No Bugs” Hare Follow:
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

This is the second part of the article about peculiarities of using Flash as a persistent storage. In the first part of the article we described Flash specifics and defined the desired properties of our persistent storage layer.

Previously published part of the article:
Part I. Flash vs EEPROM

In this second part, we’ll take a look at approaches which are available for emulating EEPROM over Flash. Apparently, most of the proposed implementations are by chip manufacturers; however, it should be noted that most of proposed algorithms are not really MCU-specific, and can be applied to any of Flash-enabled MCUs. As we will see below, all the five implementations we’ve found, have problems with preserving data (in particular, with providing guarantees which are normally provided by non-emulated EEPROM).

Upcoming part of the article:
Part III. JoFS Itself

Really Naïve Implementation – Circular Buffer with Whole-EEPROM-SIze Frames

As it was noted in Part I, one of the problems specific to Flash is that Flash has much less durability – in other words, it can sustain much smaller number of write/erase cycles (typical numbers are 10’000 erasures for Flash, and 100’000 writes for EEPROM). One common way to improve the number of writes (for Flash or for EEPROM) is to use circular buffers.

A very naïve implementation of the circular buffer is described in [Atmel]. While the description is quite sketchy, it seems that the idea is to implement a circular buffer over several pages, and when the buffer is full – to erase everything and to start anew. Each “frame” (“block” in [Atmel]) in the circular buffer starts with a status byte which indicates whether the “frame” is in use.

“There is no question if to use circular buffers, the question is how to use themThe idea of the circular buffer is to reduce the number of Flash erasures (which are necessary only when the whole circular buffer becomes full) while still storing all the necessary information. For example, if we need 100’000 write cycles over Flash which supports only 10’000 erasures, we can use Flash of 10*EEPROM_size, organized into a circular buffer, to get necessary 100’000 write cycles. All the implementations discussed in this article, use circular buffers in one way or another. There is no question if to use circular buffers, the question is how to use them, and on this way there are quite a few caveats.

The first problem with implementation in [Atmel] (at least as I read it) is that they seem to have size of each “frame” to be the whole size of emulated EEPROM. Which means that the size of the whole emulated EEPROM has to be written on every update. Which, of course, means that if we need to update a single byte of EEPROM multiple times, it will result in exhausting the buffer space much quicker than it is really necessary (which in turn will cause more-than-necessary load on Flash). To deal with it [Atmel] suggests to reduce the number of writes by caching data in-memory and to perform writes only on power loss, relying on “bulk capacitor to provide last gasp power”. However, such approaches tend to be not too reliable in practice (especially in an MCU world), and generally don’t provide sufficient protection to be relied on. More formally, when using such a write caching, the algorithm described in [Atmel] doesn’t provide Durability property, which violates our Requirements.

???rabbit_pullquote: unknown img attribute???Moreover, even without write caching the algorithm described in [Atmel] suffers from two further major problems. First, if power loss occurs during or soon after buffer erasure – there is nothing in Flash (as [Atmel] seems to erase the whole buffer at once), which means that power loss at that very moment will cause a violation at least of Durability property. Second, [Atmel] doesn’t specify how status byte is written in relation to the EEPROM data; however, if status byte is written before the EEPROM data, it may cause a violation of Validity property (in the extreme case, we may end up reading EEPROM which consists of all 0xFF bytes, and has nothing to do with anything which we have ever written there).

Integrity for Whole-EEPROM-Size Frames

[SiLabs] describes an approach which is similar to that of [Atmel], but addresses these Validity and Durability issues. First of all, they’re specifying the order of writes (in a manner similar – but not identical – to suggested above for status byte), which allows to prevent violating Validity property. Second, [SiLabs] also replaces erase-all-at-once approach with erase-per-page one (and requires at least two pages), which allows to help with Durability during page erasure. To deal with it, [SiLabs] replaces status byte (which can take only one of two values) with a tag (which contains, in addition to validity information, also a counter which says “how recent this page is” – let’s name it time_counter).

However, while [SiLabs] represents a very substantial improvement over [Atmel] in terms of Validity and Durability, there is still a gap in their approach, which is related to operations which are partially completed because of power loss (see also “Resilience to Arbitrary Data Resulting from Write Operation Interrupted by Power Loss” in our Requirements). The problem is related to partial page erasure: there is a chance that when the page is erased, at least some of the data within the erased page will be modified towards 0xFF state first, while the tag within the same page keeps it’s “Valid Tag” value. Moreover, there is a chance that a time_counter within the tag is modified to a more recent one (for example, 0x03 can be easily modified to 0x07, making it more recent than any of the pages, and causing to violate Validity property). While vulnerability window for this to happen is small, potential for the problem still exists (and, as usual, will manifest itself at the worst possible moment), especially as page erasure is a relatively lengthy operation.

Also it should be noted that [SiLabs] implementation, as the one writing the whole EEPROM at once, seems to have a very high overhead in terms of number of writes, especially if the size of emulated EEPROM is relatively large. [SiLabs] doesn’t mention using write cache; while write caching can be used with [SiLabs] in a manner similar to that of [Atmel], such caching would negate most of the Durability/Validity improvements achieved by [SiLabs] implementation.

Another Naïve Implementation – Multiple Circular Buffers per Page

[TI] describes an approach with a circular buffer for each byte of emulated EEPROM. In other words, for each EEPROM-byte-to-be-emulated we have an N-byte circular buffer of Flash memory, and write over it in circles, erasing the whole page every time when the buffer is about to overflow. However (as it is noted in [TI]), with such an approach it is not possible to distinguish legal value of 0xFF from not-written-yet value; to deal with it, a second circular buffer (“shadow buffer” in [TI]) of the same size is added to indicate whether the entry in the “main” circular buffer is already in use or not. When reading the circular buffer, it is the last value in the “main” circular buffer which has a non-0xFF value in the “shadow buffer”, which is the value to be read.

In [TI] they consider an MCU with 512-byte Flash pages, and suggest two 70-byte circular buffers per page, which provides emulation of 5 bytes of EEPROM per 512-byte Flash page (this is to increase number of writes 10-fold). This kind of 100x overhead is very large, and is related to an inherent inefficiency of the multiple-buffers-per-page method. That is, whenever one of the buffers is about to overflow, all the circular buffers need to be erased and rewritten (there is simply no way to erase a part of the page in Flash, see part I for details); this leads to 100x overhead to emulate 10x improvement in the number of writes.

Now let’s see how this approach copes with our requirements specified in Part I. To start with, approach described in [TI] is not Durable when sudden power loss is considered. As there is only one page present, which needs to be erased from time to time, it means that if power loss has occurred during or right after page erasure, than all the information is lost. While “vulnerability window” is not as large here as for write-caching, it is still pretty bad.

In addition, [TI] doesn’t specify the order of writing for “shadow byte” and “main byte”; if “shadow byte” is written before “main byte”, and power loss occurs after writing “shadow byte” and before writing “main byte”, it may lead to violating Validity property from our Requirements (in the same manner as violating Validity property for the implementation with Whole-EEPROM-Size Frames). Here “vulnerability window” is smaller than for erasure (as erasure tends to take longer than byte writing), but it still exists.

More Complicated Implementation – Two-Page “Pump” with Small Frames

A somewhat more complex implementation of EEPROM-over-Flash emulation is described in [STM]. They use exactly two pages, with each page storing a header (which contains page state), plus a lot of “frames” (“variable elements” in [STM]). Each of the “frames” contains an address and new value at that address. In [STM] address size is 16-bit, and the new value is 16-bit too (so updating any of two bytes leads to the creation of a “frame”).

Normally, one page is in “valid_page” state, and another one is in “erased” state. All the writes are done as “frames” into a “valid_page” page.

Whenever one of the pages becomes filled, then:

status of the “erased” page is changed from “erased” to a temporary “receive_data” state
all the most-recent-data-for-each-address from the “valid_page” page are moved to the “receive_data” page. Note that the size of the most-recent-data-for-each-address is usually much smaller than the size of the page itself (when the “valid_page” page has multiple modifications for the same address, only the last one needs to be written to the “receive_data” page). In a sense, this process can be seen as a very simple “garbage collection”.
“valid page” is erased; it also leads to its status becoming “erased”
status of the “receive_data” page is changed to “valid page”

“From a different perspective, this implementation can be seen as a circular buffer consisting of exactly two pages, and with some special rules regarding transition between the pages.In a sense, this implementation operates as a kind of two-stroke “pump”, with data being “pumped” between two pages. From a different perspective, it can be seen as a circular buffer consisting of exactly two pages, and with some special rules regarding transition between the pages.

Multiple Circular Buffers vs Small Frames: Coping with Real-World Write Distributions

The approach described in [STM] tends to have significantly less overhead that of [TI] in real-world scenarios. It can be observed that for real-world uneven write loads, the overhead of two-page pump is significantly smaller than that of multiple-circular buffers. Let’s consider an example with two 512-byte pages, and we need to provide E combined writes to the 10 bytes of emulated EEPROM over the Flash, where the Flash is able to withstand N erasures. With multiple-circular-buffers, as each of the buffers is 50 bytes in size, we’re able to provide 50*N writes to each of the emulated bytes, i.e. if the distribution of writes across bytes is perfectly flat, then we have E=50*10*N=500*N. However, if we consider more realistic distribution of writes over those 10 bytes, such as them having probabilities of 1/1023-2/1023-…-256/1023-512/1023, then multiple-circular-buffers implementation will fail after performing only 50*N writes to the most-used byte, so E will be only (1/1023+2/2023+…+512/1023)*10*N/(512/1023)=(1/512+1/256+…+1/2+1)*50*N~=250*N. In the very worst case, when all the writes go into one single byte, multiple circular buffers will provide E=50*N.

For two-page pump approach, the number E won’t depend on the distribution of writes across the bytes, and for an implementation in [STM] will always be around E=2*512*N/bytes_per_frame=2*512*N/4=256*N.

Two-Page Pump: Integrity

As we noted above, for multiple-circular-buffers there is a moment when there is no information in Flash at all, which means that if there is a power loss at that point, Durability property is violated for all the data in Flash. With two-page pump, there is no such problem – instead of erasing everything, data is being moved from one page to another one. In general, such an approach can lead to fully Durable implementations; however, the one described in [STM], still has some ways to fail. More specifically, implementation as described in [STM] has two problems.

The first problem is quite bad, as it potentially violates Validity property (this problem is quite similar to the problem with Validity mentioned for both naïve approaches). That is, if the “frame” is only partially written when power loss has occurred, we can end up with a “frame” which looks valid, but contains an invalid value. For example, if the implementation writes frame address first, and frame value second, and the power loss has occurred between writing address and writing value, then we have a frame, which has a valid address, and a value of 0xFF (which is a valid one too). This would violate Validity property from our Requirements. While chances of this happening are quite small (i.e. vulnerability window is quite narrow), the problem can still happen. While the problem can be solved by adding a status byte to each frame and by writing it after the frame itself, this additional logic is not described for [STM] (and would result in some reduction in space efficiency too).

The second problem with two-page pump is related to potential power loss at the very moment of erasing one of the pages. Due to the nature of the erasure it is impossible to guarantee that the page being erased, will end up in any given state (see also “Resilience to Arbitrary Data Resulting from Write Operation Interrupted by Power Loss” in our Requirements). It means that if such a power loss occurs (with erasure operation being only partially completed), then after the power is restored, the page which has been erased can hold pretty much any data: in particular, state within page header may end up in a “receive_data” state, making the choice between two pages impossible. As it will be shown in Part III, this problem can also be addressed, though in a much more complicated manner.

Circular Log with Small Frames

As a further improvement over two-page pump, it can be generalized to having more than two pages, as described in [Microchip]. Similar to the one described in [STM], implementation described in [Microchip] uses small frames, and similar to [STM], it uses a kind of “garbage collection” (named “pack” in [Microchip]) when a page becomes full, but unlike [STM], the number of pages can be more than two. However, both Durability/Validity issues remaining for [STM] and described above, seem to apply to [Microchip] implementation too. In a sense, [Microchip] implementation can be described as a circular log (similar to the one commonly used in databases, though with all information kept in the log itself, and not in the database).

On Faithful EEPROM Emulation

“Whenever a developer is using something named 'X Emulation over Y', she may expect that a faithful emulation exhibits the same properties as original (non-emulated) X.Whenever a developer is using something named “X Emulation over Y”, she may expect (unless it is written in Big Red Bold Letters at the first page of emulation description) that a faithful emulation exhibits the same properties as original (non-emulated) X. The same should stand for “EEPROM Emulation over Flash”. Let’s define “faithful EEPROM Emulation” as “EEPROM Emulation which provides an EEPROM API, and provides at least the same guarantees as non-emulated EEPROM”. Let’s see how implementations-described-above look from the point of view of this definition.

In terms of our Requirements from Part I, normal EEPROM guarantees Validity and Durability properties.¹ That is, with EEPROM the following guarantees are provided – and can be relied on in face of arbitrary power failures:

Validity: after you wrote a byte to EEPROM, you should never get spurious-data-which-you-didn’t-write-there
Durability: after you wrote a byte, it should stay there

“Unfortunately, as discussed in detail above, none of five implementations is a really faithful EEPROM emulation.Unfortunately, as discussed in detail above, none of five implementations really provide these guarantees, and therefore, none of them is a really faithful EEPROM emulation. Each and every of the implementations above has a”vulnerability window”, and if power is lost during this “vulnerability window”, the emulation may fail to provide data which is both Valid and Durable. Still, it is possible to compare these implementations in terms of the size of this vulnerability window (which can be interpreted as “how often it is expected to fail to be a faithful EEPROM emulation”).

¹ normally, Atomicity is not guaranteed by EEPROM (i.e. if power is lost in the middle of writing byte, after restoring the power we may get some bits of the byte being new, and some bits of the byte being old).

From “vulnerability window” point of view, by far the best out of those implementations above, is the one described in [SiLabs]. While there is a vulnerability window for partially completed erasures, it is not too bad compared to the problems exhibited by other implementations (as erasures are relatively rare). The second best one from this perspective is implementations by [STM] and [Microchip]; for them, vulnerability window includes both partially completed erasures and partially completed frame writes. The next one is [TI]: while it has potential to address power loss while writing a frame, it is still vulnerable to partially completed erasures, and also to power loss around completed erasures (more specifically: after the moment when we have started erasure, and until the moment when we finished writing outstanding data to erased page). The next one is [Atmel] (the one without write caching) – its vulnerability window includes partially completed erasures, partially completed writes, power loss while write a frame, and power loss around completed erasures. And by far the least faithful implementations are those with write cache: they are vulnerable all the time from the time when the first write occurs, to the moment when write cache is flushed.

Resilience to different events and size of vulnerability window, for all the implementations described above, are summarized in the following Table (sorted by “Vulnerability Window”):

Implementation	Resilience to Power Loss while Outside of EEPROM Emulator	Resilience to Power Loss around Erasure	Resilience to Power Loss while Writing Frame	Resilience to Partially Completed Writes while Writing Frame	Resilience to Partially Completed Erasure	Is a Faithful EEPROM Emulation?	Vulnerability Window
Any with Write Caching	No	No	No	No	No	No	Largest (Worst)
[Atmel]	Yes	No	No	No	No	No	2nd Largest
[TI]	Yes	No	Potentially Yes²		No	No	3rd Largest
[STM], [Microchip]	Yes	Yes	No	No	No	No	2nd Smallest
[SiLabs]	Yes	Yes	Yes	Probably Yes³	No	Not Exactly	Smallest (Best)

² see discussion in “Another Naïve Implementation – Multiple Circular Buffers per Page” section above

³ under “Weaker” version of corresponding Requirement; full analysis is complicated and is beyond the scope of present article

Summary of Existing Implementations

Taking into account both resilience and number of writes, existing implementations can be furhter summarized in the following Table (sorted by “Vulnerability Window”, which answers the question “How faithful implementation is?”):

Implementation	Is a Faithful EEPROM Emulation?	Vulnerability Window	Number of Writes Under Example Conditions⁴
Implementation	Is a Faithful EEPROM Emulation?	Vulnerability Window	EEPROM_Size= 10 bytes	EEPROM_Size= 64 bytes	EEPROM_Size= 256 bytes
Any with Write Caching	No	Largest (Worst)	Depends on Caching	Depends on Caching	Depends on Caching
[Atmel]	No	2nd Largest	100*N	16*N	4*N
[TI]	No	3rd Largest	50N to 500N⁵	N/A	N/A
[STM], [Microchip]	No	2nd Smallest	256*N	256*N	256*N
[SiLabs]	Not Exactly	Smallest (Best)	100*N	16*N	4*N

⁴ Number of writes is calculated for the following conditions: EEPROM of specified EEPROM_Size is emulated over two 512-byte Flash pages, with each page capable of sustaining N erasures. Writes are random single-byte writes. Note that the numbers provided are related to algorithms as such, and are not related to the libraries (which may have their own further restrictions)

⁵ depends on distribution of writes

It should be noted that the algorithms in this table have nothing to do with the quality of MCUs produced by respective manufacturers, or with their implementations of Flash in hardware; the table above is only about algorithms (and respective libraries), and all these algorithms (though not libraries) can work on any of the MCUs which provide standard Flash capabilities.

To be Continued…

Today, we’ve briefly described five of known implementations for emulating-EEPROM-over-Flash, and discussed their drawbacks. In the next part, we’ll discuss how to fix these problems, and how to satisfy our Requirements which were specified in Part I.

Stay tuned!

EDIT: Part III, JoFS Itself, has been published.

[+]References

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.