Ext4 Data Losses Explained, Worked Around

timothy posted more than 5 years ago | from the you-did-back-up-right dept.

Data Storage 421

ddfall writes "H-Online has a follow-up on the Ext4 file system: last week's news about data loss with the Linux Ext4 file system is explained, and Ted Ts'o has provided new solutions to allow Ext4 to behave more like Ext3."


LOL: Bug Report (5, Funny)

Em Emalb (452530) | more than 5 years ago | (#27258731)

User: My data, it's gone!
EXT4:"Ext4 developer Ted Ts'o stresses in his answer to the bug report that Ext4 behaves precisely as demanded by the POSIX standard for file operations."

Solution: WORKS AS DESIGNED

Re:LOL: Bug Report (-1, Troll)

Anonymous Coward | more than 5 years ago | (#27258761)

LOL I KNO RITE you are so witty

Re:LOL: Bug Report (0)

jd (1658) | more than 5 years ago | (#27258853)

I wish to suggest that this is the immediate solution. The complete solution involves a truckload of pissed-off users storming a POSIX committee meeting and bashing the committee members over the head with clue sticks.

Re:LOL: Bug Report (5, Insightful)

Anonymous Coward | more than 5 years ago | (#27259073)

Rubbish. Sorry, if the syncs were implicit, app developers would just be demanding a way to turn them off most of the time because they were killing performance.

Re:LOL: Bug Report (4, Insightful)

Z00L00K (682162) | more than 5 years ago | (#27258863)

This is the problem with new features - the users have problems using them until they fully understand and appreciate the advantages and disadvantages.

And also consider - ext4 is relatively new, so it will improve over time. If you want stability stick to ext3 or ext2. If you want a really stupid filesystem go FAT and prepare for a patent attack.

Re:LOL: Bug Report (4, Insightful)

von_rick (944421) | more than 5 years ago | (#27259141)

And also consider - ext4 is relatively new, so it will improve over time. If you want stability stick to ext3 or ext2.

QFT

The filesystem was first released towards the end of December 2008. The Linux distros that incorporated it offered it as an option, but the default for /root and /home was always EXT3.

In addition, this problem is not a week old like the article states. People have been discussing this problem on forums ever since mid-January, when the benchmarks for EXT4 were published and several people decided to try it out to see how it fares. I have been using EXT4 for my /root partition since January. Fortunately I haven't had any data loss, but if I do end up losing some data, I'd understand that since I have been using a brand new file-system which has not been thoroughly tested by users, nor has it been used on any servers that I know of.

Re:LOL: Bug Report (5, Insightful)

try_anything (880404) | more than 5 years ago | (#27259617)

This is the problem with new features - the users have problems using them until they fully understand and appreciate the advantages and disadvantages.

Advantages: Filesystem benchmarks improve. Real performance... I guess that improves, too. Does anybody know?

Disadvantages: You risk data loss with 95% of the apps you use on a daily basis. This will persist until the apps are rewritten to force data commits at appropriate times, but hopefully not frequently enough to eat up all the performance improvements and more.

Ext4 might be great for servers (where crucial data is stored in databases, which are presumably written by storage experts who read the Posix spec), but what is the rationale for using it on the desktop? Ext4 has been coming for years, and everyone assumed it was the natural successor to ext3 for *all* contexts where ext3 is used, including desktops. I hope distros don't start using or recommending ext4 by default until they figure out how to configure it for safe usage on the desktop. (That will happen long before the apps are rewritten.) Filesystem benchmarks be damned.

Re:LOL: Bug Report (3, Interesting)

causality (777677) | more than 5 years ago | (#27259871)

Disadvantages: You risk data loss with 95% of the apps you use on a daily basis. This will persist until the apps are rewritten to force data commits at appropriate times, but hopefully not frequently enough to eat up all the performance improvements and more.

For those of us who are not so familiar with the data loss issues surrounding EXT4, can someone please explain this? The first question that came to mind when I read that is "why would the average application need to concern itself with filesystem details?" I.e. if I ask OpenOffice to save a file, it should do that the exact same way whether I ask it to save that file to an ext2 partition, an ext3 partition, a reiserfs partition, etc. What would make ext4 an exception? Isn't abstraction of lower-level filesystem details a good thing?

Re:LOL: Bug Report (0)

Anonymous Coward | more than 5 years ago | (#27258889)

If an application decides to check the name of the file system and if the name is "ext4" it erases everything in your home directory, should that be considered a file system bug too?

Re:LOL: Bug Report (0)

Anonymous Coward | more than 5 years ago | (#27259129)

Yes. All new kernel features should do anything it takes to ensure they work with popular applications. If a new kernel feature breaks an application, even if it is because the developers made incorrect assumptions about how things work, then the new kernel feature should be discarded. This is simple common sense, and something that even Microsoft gets right.

Re:LOL: Bug Report (1)

Aphoxema (1088507) | more than 5 years ago | (#27259697)

If an application decides to check the name of the file system and if the name is "ext4" it erases everything in your home directory, should that be considered a file system bug too?

No, I'd call that malice.

Re:LOL: Bug Report (1, Interesting)

berend botje (1401731) | more than 5 years ago | (#27259147)

Ted Ts'o stresses in his answer to the bug report that Ext4 behaves precisely as demanded by the POSIX standard for file operations.

Mr Ts'o is mistaken about this. When he introduces optimisation features that other filesystems (Reiser, for example) have already tried and backed out because they don't work, he is not fit to write filing systems. First learn how others did it, then do it better.

With Ext4 now proven unstable, the only viable new filesystem is ZFS. Or just stick with ext3 or UFS.

Re:LOL: Bug Report (1, Insightful)

Anonymous Coward | more than 5 years ago | (#27259795)

ZFS isn't all that viable for Linux users. ZFS-FUSE is too slow.

With that said, I think someone should just go ahead and put ZFS in the Linux kernel and release a patch only. This will get around the GPL issues. All it would mean is that you couldn't redistribute a kernel binary or source with ZFS stuff in it. Anyone wanting ZFS would have to patch and compile their own kernel, not that big a deal. If it's internal use only then GPL is compatible with the ZFS license.

Personally I have lost a lot of data with all the ext filesystems (and Reiser3 too). I still use it on OS and boot partitions but all my important big data partitions are XFS. I have run for years on failing hardware with XFS. I have never lost data with XFS except for the sectors that were physically damaged and even then I never lost anything important. XFS has been fairly bulletproof for me, whereas I have lost entire ext2/3 partitions due to corruption that wasn't even a hardware failure.

Re:LOL: Bug Report (2, Informative)

larry bagina (561269) | more than 5 years ago | (#27259933)

My one experience with XFS involved the partition being corrupted beyond recoverability within 15 minutes. Too bad, in theory XFS is great.

Anyhow, ZFS is raid, lvm, and fs rolled up into one, so keeping the patch up to date with linux changes could be a bit of work.

Re:LOL: Bug Report (2, Insightful)

shentino (1139071) | more than 5 years ago | (#27259939)

Ext4 is still alpha-ish, and declared as such.

Any *user* who trusts production data to an experimental filesystem is already too stupid to have the right to gripe about losing said data.

Re:LOL: Bug Report (1)

nedlohs (1335013) | more than 5 years ago | (#27260167)

Actually

Solution: an update to the code to behave as idiot application programmers require with a simple mount option.

Those who fail to learn the lessons of history.... (5, Insightful)

morgan_greywolf (835522) | more than 5 years ago | (#27258767)

FTFA, this is the problem:

Ext4, on the other hand, has another mechanism: delayed block allocation. After a file has been closed, up to a minute may elapse before data blocks on the disk are actually allocated. Delayed block allocation allows the filing system to optimise its write processes, but at the price that the metadata of a newly created file will display a size of 0 bytes and occupy no data blocks until the delayed allocation takes place. If the system crashes during this time, the rename() operation may already be committed in the journal, even though the new file still contains no data. The result is that after a crash the file is empty: both the old and the new data have been lost.

And now my question: Why did the Ext4 developers make the same mistakes Reiser and XFS both made (and later corrected) years ago? Before you get to write any filesystem code, you should have to study how other people have done it, including all the change history. Seriously.

Those who fail to learn the lessons of [change] history are doomed to repeat it.
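
For reference, the write-replace idiom being discussed looks roughly like this in C (a minimal sketch with error handling omitted; the function and file names are illustrative):

#include <stdio.h>

/* Classic write-replace: write the new contents to a temporary file,
 * then rename() it over the old one. rename() is atomic, so a reader
 * sees either the complete old file or the complete new file. */
int save_file(const char *path, const char *tmp_path, const char *data)
{
    FILE *fp = fopen(tmp_path, "w");
    if (fp == NULL)
        return -1;
    fputs(data, fp);
    fclose(fp);  /* the data may still sit in the kernel's cache here */

    /* On ext4 with delayed allocation, the rename below can reach the
     * journal up to a minute before the data blocks are allocated; a
     * crash in that window leaves an empty file. */
    return rename(tmp_path, path);
}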

Re:Those who fail to learn the lessons of history. (2, Insightful)

Samschnooks (1415697) | more than 5 years ago | (#27259051)

Speaking as someone who has developed commercial OS code (OS/2), I always assumed that the person before me understood what they were doing, because if you didn't, you'd spend all your time researching how the 'wheel' was invented. Also, aside from this very rare occurrence, it is pretty arrogant to think that your predecessors are incompetent or, to be generous, ignorant.

This problem is just something that slipped through the cracks and I'm sure the originator of this bug is kicking himself in the ass for being so "stupid".

Re:Those who fail to learn the lessons of history. (0)

Anonymous Coward | more than 5 years ago | (#27259789)

You, sir, are an idiot.

You need to look at your competition or those you are following and see what they've done so you don't repeat their mistakes. Then you can ask yourself "Do we need to look at that too?" or "Do we need to change that too?"

If they had spent just a couple of hours reviewing the change logs of those file systems, this probably would never have happened, as it might have been fixed long ago along with whatever else is new and extremely immature with EXT4.

Even if you are creating something new, or think you are, you still need to look at history (EXT4 isn't something new; it's just another file system, so people creating file systems need to review the history of all other file systems, whether what they are doing is "new" or not). You don't need to go through their code with a fine-tooth comb and pick it apart, but just reviewing what you or your competition has done or changed in the past will make your product a better product.

Re:Those who fail to learn the lessons of history. (-1, Troll)

Profane MuthaFucka (574406) | more than 5 years ago | (#27259159)

OK, so you like Reiser FS. Just please don't kill us, roll us up in the rug, and transport our bloody bodies to an unmarked grave in your passenger-seatless Honda.

Re:Those who fail to learn the lessons of history. (2, Insightful)

dotancohen (1015143) | more than 5 years ago | (#27259249)

Before you get to write any filesystem code, you should have to study how other people have done it...

No. Being innovative means being original, and that means taking new and different paths. Once you have seen somebody else's path, it is difficult to go out on your own original path. That is why there are alpha and beta stages to a project, so that watchful eyes can find the mistakes that you will undoubtedly make, even those that have been made before you.

Shoulders of Giants (1)

turgid (580780) | more than 5 years ago | (#27259385)

Standing on the shoulders of giants is usually the best way to make progress.

Re:Shoulders of Giants (2, Insightful)

Evanisincontrol (830057) | more than 5 years ago | (#27259481)

Standing on the shoulders of giants is usually the best way to make progress.

Sure, if the only direction you want to go is the direction that the giant is already moving. Doesn't help you get anywhere else, though.

Re:Shoulders of Giants (1)

turgid (580780) | more than 5 years ago | (#27259529)

Learning from the mistakes of others is a good practice no matter what direction you're going in.

Re:Shoulders of Giants (1)

TemporalBeing (803363) | more than 5 years ago | (#27260005)

Learning from the mistakes of others is a good practice no matter what direction you're going in.

only so long as they apply to the direction you are going. Not all mistakes by others apply to every direction - in fact, most probably don't.

That doesn't mean that "Lessons Learned" aren't useful - just that they're not always applicable.

Re:Those who fail to learn the lessons of history. (1)

CannonballHead (842625) | more than 5 years ago | (#27259555)

Making the same mistakes someone else made is NOT being innovative, it's being stupid or ignorant... or a number of other predicate adjectives.

Innovation is using something in a new way, not making the same mistake in a new way. That's still considered a mistake, and if it can be shown that you should have known about the mistake from someone else making it, you're still "making the same mistake" and not "innovating." Not to say you're not going to make mistakes and not know everything, but it's still a valid criticism.

Re:Those who fail to learn the lessons of history. (0)

Anonymous Coward | more than 5 years ago | (#27259853)

No. Being innovative means being original, and that means taking new and different paths.

Sounds like you'd fit right in at Microsoft. Ignoring technology "not invented here" isn't innovation, it's reinventing the wheel, aka a stupid waste of time.

Re:Those who fail to learn the lessons of history. (1)

Mr. Underbridge (666784) | more than 5 years ago | (#27260101)

No. Being innovative means being original, and that means taking new and different paths.

Yeah, but you still have to get on the road so you can blaze your own trail off of it. That means knowing how other people have done things.

Otherwise, how far do you go with this? First principles? Hell, really, the only way to ensure a totally creative being is to have a baby and hand it over to wolves for rearing. You can be sure that that kid's ideas will be totally uncorrupted by the ideas of other humans. Of course, the kid's ideas will also be useless, but that's the price you pay for creativity.

If this was a Windows issue (-1, Troll)

Anonymous Coward | more than 5 years ago | (#27259343)

I like how if this was a Windows data loss problem, the Slashdot summary would be followed by 300 posts of Microsoft-bashing and jokes.

Re:If this was a Windows issue (1)

Anarke_Incarnate (733529) | more than 5 years ago | (#27259835)

if it was a default FS on the latest version of the $OS_Shipped_On_95_Percent_Of_Desktops and had this bug, sure. If it is a relatively new and untested file system on an OS with choices of stable FS like Reiser, Ext2/3, JFS, XFS, OCFS2, etc, then no, not as big a deal....

Re:If this was a Windows issue (1)

Extide (1002782) | more than 5 years ago | (#27260103)

Interestingly enough NTFS is probably one of the best things about Windows. It has most of the modern features, is incredibly resilient, and has existed for a LONG time.

No kidding (5, Insightful)

Sycraft-fu (314770) | more than 5 years ago | (#27259445)

All the stuff with Ext4 strikes me as amazingly arrogant, and ignorant of the past. The issue that FS authors (well, authors of any system programs/tools/etc.) need to understand is that your tool being usable is the #1 important thing. In the case of a file system, that means that it reliably stores data on the drive. So, if you do something that really screws that over, well then you probably did it wrong. Doesn't matter if you fully documented it, doesn't matter if it technically "follows the spec"; what matters is that it isn't usable.

I mean I could write a spec for a file system that says "No write is guaranteed to be written to disk until the OS is shut down, everything can be cached in RAM for an indefinite amount of time." However that'd be real flaky and lead to data loss. That makes my FS useless. Doesn't matter if it is well documented, what matters is that the damn thing loses data on a regular basis.

I'd give these guys more credit if I was aware of any other major OS/FS combo that did shit like this, but I'm not. Linux/Ext3 doesn't, Windows/NTFS doesn't, OS-X/HFS+ doesn't, Solaris/ZFS doesn't, etc. Well that tells me something. That says that the way they are doing things isn't a good idea. If it is causing problems AND it is something else nobody else does, then probably you ought not do it.

This is just bad design, in my opinion.

Re:No kidding (2, Insightful)

mr_mischief (456295) | more than 5 years ago | (#27259999)

It does reliably store data on the drive once that data has been properly synchronized by the application's author. The data that is lost is what has been sent to a filehandle but not yet synchronized when the system loses power or crashes.

The FS isn't the problem, but it is exposing problems in applications. If you need your FS to be a safety net for such applications, nobody is taking ext3 away just because ext4 is available. If you want the higher performance of ext4, buy a damn UPS already.

Re:Those who fail to learn the lessons of history. (1, Informative)

ChienAndalu (1293930) | more than 5 years ago | (#27259637)

As explained in the article - he hasn't made a mistake. The behaviour of ext4 is perfectly compatible with the POSIX standard.

man fsync

Re:Those who fail to learn the lessons of history. (1)

Extide (1002782) | more than 5 years ago | (#27260133)

I like the saying: working as designed, too bad it's a shitty design. I understand it complies with POSIX, but isn't the goal to make something that is perceived as better? In any case I think the semi-arrogance of the authors is the real issue here, not the behavior of the fs.

rename completes before the write (5, Insightful)

Spazmania (174582) | more than 5 years ago | (#27258781)

Ext4 developer Ted Ts'o stresses in his answer to the bug report that Ext4 behaves precisely as demanded by the POSIX standard for file operations.

I couldn't disagree more:

When applications want to overwrite an existing file with new or changed data [...] they first create a temporary file for the new data and then rename it with the system call - rename(). [...] Delayed block allocation allows the filing system to optimise its write processes, but at the price that the metadata of a newly created file will display a size of 0 bytes and occupy no data blocks until [up to 60 seconds later].

Application developers reasonably expect that writes to the disk which happen far apart in time will happen in order. If I write to a file and then rename the file, I expect that the rename will not complete significantly before the write. Certainly not 60 seconds before the write. It seems dead obvious, at least to me, that the update of the directory entry should be deferred until after ext4 flushes that part of the file written prior to the change in the directory entry.

Re:rename completes before the write (1, Insightful)

Anonymous Coward | more than 5 years ago | (#27259887)

You disagree with his interpretation of the spec?

Well then, show us the relevant part of the spec that says things should happen in order.

It doesn't say that? It says instead to use fsync()?

Blame the FS all you people want, but the fact remains that the application writers screwed up big time; their code is not robust and will probably fail again in the future. Even with Ext3, the code was a ticking time bomb. If power is lost at the right time, the same results would happen.

Sure, it would be nice to have a FS that fixed the poorly made code people write, but that does not remove the blame from the application writers; it simply adds some to the FS writers for taking what was a good desktop FS and trying to turn it into a server FS. Desktop FSs need to deal with poor application code, and with frequent power losses, but poor code is still poor code.

the workaround is bad design (3, Insightful)

girlintraining (1395911) | more than 5 years ago | (#27258809)

Short version: "We're sorry we changed something that worked and everyone was used to, but hey -- it's compliant with a standard." If this were Microsoft, we'd give them a healthy helping of humble pie, but because it's Linux and the magic word "POSIX" gets used, I'm sure we'll forgive them for it. The workaround is laughable -- "call fsync(), and then wait(), wait(), wait(), for the Wizard to see you." How about writing a filesystem that actually does journaling in a reliable fashion, instead of finger-pointing after the user loses data due to your snazzy new optimization and saying "The developer did it! It wasn't us, honest." Microsoft does it and we tar and feather them, but the guys making the "latest and greatest" Linux feature we salute?

We let our own off with heinous mistakes while professionals who do the same thing we hang, simply because they dared to ask to be paid for their effort. Lame.

Re:the workaround is bad design (5, Funny)

jd (1658) | more than 5 years ago | (#27258929)

But... those of us who learned the Ancient And Most Wise ways always triple-sync. We also sacrifice Peeps and use red food colouring in voodoo ceremonies (hey, it really is blood, so it should work) to keep the hardware from failing.

On next week's Slashdot, there will be a brief tutorial on the right way to burn a Windows CD at the stake, and how to align the standing stones known as RAM Chips to points of astronomical significance.

Re:the workaround is bad design (0)

Anonymous Coward | more than 5 years ago | (#27259851)

Yeah, if I had mod points (and was logged in!) I'd give them to you.
I'm an old Unix administrator who worked on Unix systems back in the early 1980s and always always always did a triple sync especially before shutdown.

Re:the workaround is bad design (2, Interesting)

morgan_greywolf (835522) | more than 5 years ago | (#27258933)

No, we don't salute them. If you ask me, no matter what Ted Ts'o says about it complying with the POSIX standard, sorry, but it's a bug if it causes known, popular applications to seriously break, IMHO.

Broken is broken, whether we're talking about Ted Ts'o or Microsoft.

Re:the workaround is bad design (0)

Anonymous Coward | more than 5 years ago | (#27259877)

While we're talking about Microsoft, I've had several instances of NTFS files being zero filled on system crash.

Dunno (4, Insightful)

Shivetya (243324) | more than 5 years ago | (#27258983)

but if you want a write later file system shouldn't it be restricted to hardware that can preserve it?

I understand that doing writes immediately when requested leads to performance degradation but that is why business systems which defer writes to disk only do so when the hardware can guarantee it. In other words, we have a battery backed cache, if the battery is low or nearing end of life the cache is turned off and all writes are made when the data changes.

Trying to make performance gains to overcome limitations of the hardware never wins out.

Re:Dunno (1)

gnasher719 (869701) | more than 5 years ago | (#27260061)

I understand that doing writes immediately when requested leads to performance degradation but that is why business systems which defer writes to disk only do so when the hardware can guarantee it. In other words, we have a battery backed cache, if the battery is low or nearing end of life the cache is turned off and all writes are made when the data changes.

You don't even need to do this. The reported problem happened (I think) during some installation of five hundred files. The computer crashed just after the installation was finished, at a time when half the changes were written to disk. If the computer had crashed _before_ the installation started, everything would have been fine. If the computer had delayed _all_ writes by two minutes, and the computer crashed a minute after it said "installation finished", but before anything was actually written to disk, everything would have been fine (Ok, you would have to repeat the installation process, but that is no problem).

What the file system must do is group together changes that belong together, and minimize the time interval where a crash would have bad results, preferably to zero.

Re:Dunno (1)

mewsenews (251487) | more than 5 years ago | (#27260149)

In other words, we have a battery backed cache, if the battery is low or nearing end of life the cache is turned off and all writes are made when the data changes.

A capacitor would probably have enough juice to do an emergency flush of the cache without wearing out like a battery. I am not an electrical engineer.

O rly? (-1, Troll)

Anonymous Coward | more than 5 years ago | (#27259165)

And how many lines of code have you contributed to the Linux kernel? How much did you pay for your Linux distro?

Right. STFU.

Re:the workaround is bad design (0)

Anonymous Coward | more than 5 years ago | (#27259375)

You may let our own off with heinous mistakes while professionals who do the same thing get hung.

I do exactly the same thing on both cases... Not use it till it is fixed.

Re:the workaround is bad design (2, Insightful)

Dan667 (564390) | more than 5 years ago | (#27259383)

I believe a major difference is that Microsoft would just deny there was a problem at all. If they did acknowledge it, they certainly would not detail what it is.

Re:the workaround is bad design (1)

sakdoctor (1087155) | more than 5 years ago | (#27259417)

wait(), wait(), wait(), for the Wizard to see you

There's no place like /home.
There's no place like /home.
There's no place like /home.

Re:the workaround is bad design (1)

CannonballHead (842625) | more than 5 years ago | (#27260165)

ln -s /home /away
or...
mkdir /away; cp -Rf /home/* /away;

... yes there is!

Re:the workaround is bad design (2, Informative)

ManWithIceCream (1503883) | more than 5 years ago | (#27259607)

We let our own off with heinous mistakes while professionals who do the same thing we hang, simply because they dared to ask to be paid for their effort. Lame.

Is Ted Ts'o not a professional? Does he not get paid? Ts'o is employed by the Linux Foundation, on leave from IBM. Free Software does not mean volunteer-made software!

Re:the workaround is bad design (3, Insightful)

TheMMaster (527904) | more than 5 years ago | (#27259629)

Actually, no.

Microsoft runs a proprietary show where they 'set the standard' themselves. Which basically means 'there is no standard except how we do it'.
Linux, however, tries to adhere to standards. When it turns out that something doesn't adhere to standards, it gets fixed.

Another problem is that most users of proprietary software on their proprietary OS don't have the sources to the software they use, so if the OS fixes something that was previously broken, but the software version used is 'no longer supported' the 'fix' in the OS breaks the users' software and the user has no option of fixing his software.

THIS is why a) microsoft can't ever truly fix something and b) why using proprietary software screws over the user.

Or would you rather have OSS software do the same as proprietary software vendors and work around problems forever, never fixing them? Saw that shiny 'run in IE7 mode' button in IE8? That's what you'll get...

Re:the workaround is bad design (4, Insightful)

Hatta (162192) | more than 5 years ago | (#27259667)

If this were Microsoft, we'd give them a healthy helping of humble pie, but because it's Linux and the magic word "POSIX" gets used, I'm sure we'll forgive them for it.

You must be reading a different slashdot than I am. The popular opinion I see is that this is very bad design. If the spec allows this behavior, it's time to revisit the spec.

Re:the workaround is bad design (0)

try_anything (880404) | more than 5 years ago | (#27259811)

Short version: "We're sorry we changed something that worked and everyone was used to, but hey -- it's compliant with a standard." If this were Microsoft, we'd give them a healthy helping of humble pie, but because it's Linux and the magic word "POSIX" gets used, I'm sure we'll forgive them for it.

I think what we've learned is that there's a bug in the POSIX standard, and Ext4 exploits the bug to deliver high measured performance in a way that is actually bad for users. So it's a benchmark hack on top of a flawed spec -- all in all, a shit sandwich for users.

That's not to say that Ext4 is bad technology. It sounds like it will deliver on its performance promises on systems that run well-written, failure-resistant software. It just won't work with the software that desktop users currently use. It will take a while for this to get sorted out, and we have to moderate our expectations from "everyone switches to ext4 and gets an automatic speed boost" to "wait and see; desktop users might not benefit from it anytime soon."

Re:the workaround is bad design (1)

gnasher719 (869701) | more than 5 years ago | (#27260117)

I think what we've learned is that there's a bug in the POSIX standard, ...

It is not exactly a bug in the standard. There is a standard, and there is QOI (Quality of Implementation). When you write data, POSIX says that the data is vulnerable for a time interval of unknown length. A good implementation will replace "unknown length" with "length zero", or "length almost zero". Ext4 decided that "unknown length" can mean "two minutes". QOI = zero.

Re:the workaround is bad design (1)

DragonWriter (970822) | more than 5 years ago | (#27259849)

Short version: "We're sorry we changed something that worked and everyone was used to, but hey -- it's compliant with a standard." If this were Microsoft, we'd give them a healthy helping of humble pie

If Microsoft simultaneously sacrificed backwards compatibility and correctly implemented a standard, we'd probably be left completely speechless.

voting (3, Funny)

Skapare (16644) | more than 5 years ago | (#27259905)

So is this why we can't have voting (where correctness is paramount over performance) systems developed on Linux?

Re:the workaround is bad design (1)

Xtravar (725372) | more than 5 years ago | (#27260055)

What? When Microsoft made IE more standards-compliant, everyone was happy even if it broke legacy applications/sites.

You, sir, are making no sense.

If Microsoft broke stuff to make their OS POSIX compliant, we'd all be really happy!

Show some respect! (5, Funny)

LotsOfPhil (982823) | more than 5 years ago | (#27258831)

...new solutions have been provided by Ted Ts'o to...

That's General Ts'o to you!

Re:Show some respect! (0)

Anonymous Coward | more than 5 years ago | (#27259181)

I don't think his file system will ever top his chicken.

Re:Show some respect! (1, Funny)

TheGratefulNet (143330) | more than 5 years ago | (#27259777)

"what's a matter, colonel? CHICKEN?"

sorry.

More like ext3? (0)

Anonymous Coward | more than 5 years ago | (#27258891)

...does that make it ext4-, ext3.99, ext4less?

I sit just me? (2, Insightful)

IMarvinTPA (104941) | more than 5 years ago | (#27258961)

I sit just me, or would you expect that the change would only be committed once the data was written to disk under all circumstances?
To me, it sounds like somebody screwed up a part of the POSIX specification. I should look for the line that says "During a crash, lose the user's recently changed file data and wipe out the old data too."

IMarv

Re:I sit just me? (1, Insightful)

Anonymous Coward | more than 5 years ago | (#27258991)

Aye, standards aren't perfect. If part of one doesn't make sense, that part should be avoided and an updated standard created to address these issues. What somebody decided years back isn't always the best solution.

Re:I sit just me? (3, Funny)

Em Emalb (452530) | more than 5 years ago | (#27259003)

Nope, not just you, I sit also.

Re:I sit just me? (1)

IMarvinTPA (104941) | more than 5 years ago | (#27259169)

I sit is it? Hmm.

IMarv

POSIX spec is fine, ext4 is flawed (2, Informative)

iYk6 (1425255) | more than 5 years ago | (#27259411)

Someone above says that the POSIX standard is fine, but that ext4 violates it. Here is his quote:
"When applications want to overwrite an existing file with new or changed data [...] they first create a temporary file for the new data and then rename it with the system call - rename()."

It seems that ext4 renames the file first, and then writes the file up to 60 seconds later.

Re:POSIX spec is fine, ext4 is flawed (1)

renoX (11677) | more than 5 years ago | (#27260013)

No, POSIX doesn't guarantee a write happens before you do an fsync; an added rename doesn't change this.

This situation is identical to read/write memory ordering: because of caches, different CPUs may see different values of a variable.
Different architectures place different limits on how reads and writes may be reorganised; with x86 it's not too bad, but with the Alpha, which can truly reorganise things a lot, it becomes very difficult to put in all the needed memory barriers.

IMHO, there is a performance/usability tradeoff here, and Ext4 shouldn't reorganise operations too much: it's too difficult for application programmers to use. If you have 'write then rename', then the write should always be done *before* the rename.

Re:I sit just me? (0)

Anonymous Coward | more than 5 years ago | (#27259465)

There are proper ways to do what both GNOME and KDE are doing - they just choose to do it wrong, and dependent on a specific implementation's behaviour. They then discover their implementation is complete shit, wrong, and broken according to the POSIX specifications. They then decide to bitch and moan about the FS rather than fix their horribly broken code.

Next, a mass of ignorant users who don't understand what the hell they are talking about in the first place then complain loudly because they are clueless parrots - who are seemingly lucky to locate their keyboard.

Simply put, GNOME/KDE both need to fix their shit instead of passing the buck on their horribly buggy code, which is based on notions well known to be false by any competent POSIX coder. In short, anyone wagging a finger at ext4 is ignorant. Anyone not wagging a finger at GNOME/KDE and DEMANDING they fix their broken behaviour is an idiot.

Workaround is disaster for laptops (5, Insightful)

victim (30647) | more than 5 years ago | (#27258973)

The workaround (flushing everything to disk before the rename) is a disaster for laptops or anything else which might wish to spin down a disk drive.

The write-replace idiom is used when a program is updating a file and can tolerate the update being lost in a crash, but wants either the old or the new version intact and uncorrupted. The proposed sync solution accomplishes this, but at the cost of spinning up the drive and writing the blocks at each write-replace. How often does your browser update a file while you surf? Every cache entry? Every history entry? What about your music player? Desktop manager? All of these will spin up your disk drive.

Hiding behind POSIX is not the solution. There needs to be a solution that supports write-replace without spinning up the disk drive.

The ext4 people have kindly illuminated the problem. Now it is time to define a solution. Maybe it will be some sort of barrier logic, maybe a new kind of sync syscall. But it needs to be done.

Re:Workaround is disaster for laptops (0)

Anonymous Coward | more than 5 years ago | (#27259209)

"There needs to be a solution that supports write-replace without spinning up the disk drive."

You mean to write on the disk without spinning it up?

How does ext3 do that?

The fix for the ext4-problem is so easy:

Good code:
fwrite()
fclose() - no extra spinning of disc

Bad code:
fwrite()
fclose()
rename() - rename may replace old file without new file on disk

Fixed code:
fwrite()
fsync() - sync this file before close
fclose()
rename()
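
Spelled out as real C, that fixed sequence looks roughly like this (a sketch only; note that with stdio you must fflush() the stream before calling fsync() on the descriptor from fileno(), or the data may still be sitting in stdio's user-space buffer):

#include <stdio.h>
#include <unistd.h>

/* Write-replace with an explicit sync: the data blocks are forced to
 * disk before the rename can hit the journal. */
int save_file_synced(const char *path, const char *tmp_path, const char *data)
{
    FILE *fp = fopen(tmp_path, "w");
    if (fp == NULL)
        return -1;
    fputs(data, fp);
    fflush(fp);            /* push stdio's buffer into the kernel */
    fsync(fileno(fp));     /* force the kernel to write it to the disk */
    fclose(fp);
    return rename(tmp_path, path);
}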

Re:Workaround is disaster for laptops (0)

Anonymous Coward | more than 5 years ago | (#27259547)

No. He wants to write to the disk eventually (and is quite happy for the write not to happen for a long time) without the file being lost.

As long as the rename is not permitted to happen until the data has actually been written, there's no problem with nothing being written. The file on disk still has the entire old contents until the data is written out and then the rename is done, after which it has the entire new contents. If the file is overwritten 20 times, then whenever the OS decides to spin up the disk and clear the cache, the last updated copy of the file is written, and the others could potentially (if the file system is smart enough) never have to be written.

Re:Workaround is disaster for laptops (5, Informative)

Kjella (173770) | more than 5 years ago | (#27259631)

Fixed code:
fwrite()
fsync() - sync this file before close
fclose()
rename()

Either you're a troll or an idiot; since you're AC'ing I guess I got trolled. This will sync immediately and kill performance and battery life, since every block must be confirmed written before the process can continue. What you need to fix this is a delayed rename that happens after the delayed write.

Problem:
fwrite()
fclose()
rename()
*ACTUAL RENAME*
*TIME PASSES* <-- crash happens here = lose old file
*ACTUAL WRITE*

Real solution:
fwrite()
fclose()
rename()
*TIME PASSES* <-- crash happens here = keep old file
*ACTUAL WRITE*
*ACTUAL RENAME*

This is a real troll (-1, Troll)

Anonymous Coward | more than 5 years ago | (#27259757)

What you actually need to fix is the damn SHOW STOPPING CRASH BUG in your SHITTY CODE. After you have done that, then we can discuss your perceived issues with other people's code.

Re:Workaround is disaster for laptops (0)

Anonymous Coward | more than 5 years ago | (#27259847)

Why do you want to rename before the new file is synced?

Writing data and renaming are independent actions and don't have to be executed in order. This is what the standard says. I assume that you, too, did not read the standard before you made up your opinion about what it should say about the issue, but doesn't.

Re:Workaround is disaster for laptops (3, Informative)

david_thornley (598059) | more than 5 years ago | (#27260073)

In which case the standard sucks, big time, and finding a loophole that trashes normal expected behavior should not be cause for rejoicing.

There needs to be a way to write a file such that either the old or the new is preserved. Agreed on this?

Now, in a file system that's going to run real well, there needs to be a way to delay writes in order to batch them. Agreed on this?

We have two reasonable demands here. Pick one, because that's all you're going to get.

Currently, in order to keep either the old or new file, it's necessary to write the new file right now. This is the standard behavior, and it trashes performance. Alternatively, the writes can be batched up for later, for good performance, and we run the risk of losing both old and new versions of a file.

In other words, in order to use the heavily optimized file system safely, it's necessary to trash the performance.

What we need is a way to do the write-rename thing so that it can be safely delayed, letting the file system batch up a lot of writes to do in a really fancy optimized way, while still writing the new file fully before renaming it. There's no obvious reason to me why the file system can't keep track of this and guarantee the order. It may not be required by the standard, but that's no excuse for not implementing it.

Re:Workaround is disaster for laptops (3, Informative)

dshadowwolf (1132457) | more than 5 years ago | (#27259975)

And you don't get it... The truth is that Ext4 was writing the journal out before any changes took place. This means that when the crash happens between the metadata write and the actual write a replay of the journal will cause data loss.

Other filesystems with delayed allocation solve this by not writing the journal before the actual data commits happen. The fix that TFA is talking about introduces this to Ext4.

Re:Workaround is disaster for laptops (1)

RiotingPacifist (1228016) | more than 5 years ago | (#27259799)

good code - unless there is a crash during the writing of the file, in which case your software is screwed next time you try and read the config file.

bad code - safe as long as the filesystem isn't ext4 or really old versions of XFS/reiserfs.

fixed code - yeah lets abuse fsync and slow the users system down.

Re:Workaround is disaster for laptops (0)

Anonymous Coward | more than 5 years ago | (#27260095)

(1) good code - unless there is a crash during the writing of the file, in which case your software is screwed next time you try and read the config file.

(2) bad code - safe as long as the filesystem isn't ext4 or really old versions of XFS/reiserfs.

(3) fixed code - yeah lets abuse fsync and slow the users system down.

1: The write is either committed or not, as in ext3, thanks to the journal. The software doesn't care whether the write is committed at any particular point in time.

2: This uses a feature that is not in the POSIX standard. So don't rely on it, or change the standard.

3: In what way is the system slowed down if the write needs to be committed anyway and you only sync that particular file?

Sure, there should be a function fsync_then_rename(); this would recreate the old behaviour.

Re:Workaround is disaster for laptops (2, Insightful)

GMFTatsujin (239569) | more than 5 years ago | (#27259211)

If the issue is drive spin-up, how have the new generation of flash drives been taken into account? It seems to me that rotational drives are on their way out.

That doesn't do anything for the contemporary generations of laptop, but what would the ramifications be for later ones?

Re:Workaround is disaster for laptops (0)

Anonymous Coward | more than 5 years ago | (#27259283)

All that temporary file usage should reside in /tmp, which anyone with a modicum of knowledge will mount to RAM, especially on desktops and laptops.

Re:Workaround is disaster for laptops (0)

Anonymous Coward | more than 5 years ago | (#27259413)

All that temporary file usage should reside in /tmp, which anyone with a modicum of knowledge will mount to RAM, especially on desktops and laptops.

Are you nuts? My 4 gig laptop has a 10 gig /tmp partition. When I replace it next year, it becomes a "california server", so that's why I set the fs up that way.

It already serves as a linux development platform, so a big /tmp is needed.

Re:Workaround is disaster for laptops (2, Informative)

BigBuckHunter (722855) | more than 5 years ago | (#27260159)

There needs to be a solution that supports write-replace without spinning up the disk drive.

How do you intend on writing to the disk drive... without spinning it up? Is this not what you're asking? If this is indeed your question, the answer is already "by using a battery backed cache".

BBH

wow! i thought linux was flawless. (-1, Flamebait)

Anonymous Coward | more than 5 years ago | (#27259035)

what shit. thank god i got out of it before i wasted too much time on it.

yeah old data in a crash cool no data not so cool (0)

Anonymous Coward | more than 5 years ago | (#27259045)

That is the issue. Ext3 generally gives me a consistent previous point in time in power failure or crash. I would expect ext4 to too. I used XFS and had a power cable get yanked accidentally in the middle of a project. Everything was gone. I immediately dumped XFS over this.

This is unacceptable behavior. Open files should not be zeroed by design; they should be at the last point in time. I understand HW issues from a power failure, but that is different from the filesystem doing it on purpose. Any system dev who thinks it's acceptable is a fool.

Re:yeah old data in a crash cool no data not so co (0)

Anonymous Coward | more than 5 years ago | (#27259583)

Any system dev who thinks it's acceptable is a fool.

Yes, fools are the ones who actually understand the POSIX specification and plan accordingly. Those foolish admins who experience excellent performance and no data-loss. Those fools!

Surely some day they will see the error of their ways by refusing to understand the job for which they are paid! Damn them! Damn their intelligence! Damn their comprehension abilities. Damn them to hell!

Re:yeah old data in a crash cool no data not so co (0)

Anonymous Coward | more than 5 years ago | (#27259785)

Read the comment. With a system that behaves in a certain way, aka ext3, after a "crash" you generally have either the old data or the new data. On ext4 (whiz-bang new and improved) you have no data, and that is deemed acceptable. Yes, they are fools. It is a regression. You want to offer it as a laptop mode, fine. Better give warnings, though.

Sorry, not as a default behavior. This is the difference between theory and practice, also called the REAL WORLD. I can't guarantee that every app works according to spec. It seems there is some debate over whether POSIX even addresses this.

What is at issue is that ext3 was very good in this respect and ext4 not so much. This is a step backward for the vast majority of systems, especially servers and desktops.

I don't care what the excuse is. If I have a crash or a power cable failure etc., I expect that the FS hasn't trashed a bunch of open files; at least, that it's not DESIGNED to.

CanSecWest security conference (0, Offtopic)

rs232 (849320) | more than 5 years ago | (#27259371)

Pwn2Own 2009 Day 1 - Safari, Internet Explorer, and Firefox Taken Down by Four Zero-Day Exploits [tippingpoint.com]

Charlie Miller got the luck of the draw, and had the first time slot for the browser competition. His target- Safari on Mac OS X. Before I could even pull my camera out, it was over within 2 minutes- and Charlie (coincidentally also last year's first winner of the day) is now the proud owner of yet another MacBook, and $5,000 from the Zero Day Initiative.

Next up, Nils. Just Nils- you know, like "Prince" or "Madonna". With a little tweaking, he ran a sleek exploit against IE8, defying Microsoft's latest built in protection technologies- DEP (Data Execution Prevention) as well as ASLR (Address Space Layout Randomization) to take home the Sony Vaio and $5,000 from ZDI.

Quick workaround - no patches required (5, Informative)

canadiangoose (606308) | more than 5 years ago | (#27259381)

If you mount your ext4 partitions with nodelalloc you should be fine. You will of course no longer benefit from the performance enhancements that delayed allocation brings, but at least you'll have all of your freaking data. I'm running Debian on Linux 2.6.29-rc8-git4, and so far my limited testing has shown this to be very effective.
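
For anyone wanting to try this, the option goes wherever your ext4 mounts are configured; for example (the device and mount point here are illustrative):

# remount an existing ext4 filesystem without delayed allocation
mount -o remount,nodelalloc /home

# or make it permanent in /etc/fstab:
/dev/sda3  /home  ext4  defaults,nodelalloc  0  2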

Re:Quick workaround - no patches required (0)

Anonymous Coward | more than 5 years ago | (#27259493)

nodelalloc is fantastic. I hope most distros consider this as a DEFAULT.

The funny thing is... (0)

Anonymous Coward | more than 5 years ago | (#27259535)

The funny thing is Theodore claimed "all modern filesystems" suffered from this issue, when in reality, ZFS and others do not :-)

The odd thing is... (2, Insightful)

DragonWriter (970822) | more than 5 years ago | (#27259815)

I'm a hobbyist, and I don't program system level stuff, essentially, at all anymore, but way back when I did do C programming on Linux (~10 years ago), ISTR that this (from Ts'o in TFA) was advice you couldn't go anywhere without getting hit repeatedly over the head with:

if an application wants to ensure that data have actually been written to disk, it must call the function fsync() before closing the file.

Is this really something that is often missed in serious applications?

Re:The odd thing is... (0)

Anonymous Coward | more than 5 years ago | (#27260031)

I've worked on so-called "enterprise" applications that were critical to the functioning of multi-billion dollar companies, and YES absolutely I can tell you that C programmers skip this kind of thing all the time. I've seen file I/O's done without error checking, I've seen bizarre recursive TCP select functions that only worked by accident, I've seen it all. The problem here is that programmers are seen by companies as an expense, not an asset, so they are constantly pressured to do their work faster and with less resources.

Re:The odd thing is... (0)

Anonymous Coward | more than 5 years ago | (#27260107)

Is this really something that is often missed in serious applications?

No, it's that ext4 applies it inconsistently.

App writes data to a file, then closes the file, then renames the file to something else. It's reasonable to assume that the data is either written to the disk, or it isn't. In this case, the rename (write) happens, but the actual data write doesn't.

Bad POSIX (4, Interesting)

Skapare (16644) | more than 5 years ago | (#27259823)

Ext4, on the other hand, has another mechanism: delayed block allocation. After a file has been closed, up to a minute may elapse before data blocks on the disk are actually allocated. Delayed block allocation allows the filing system to optimise its write processes, but at the price that the metadata of a newly created file will display a size of 0 bytes and occupy no data blocks until the delayed allocation takes place. If the system crashes during this time, the rename() operation may already be committed in the journal, even though the new file still contains no data. The result is that after a crash the file is empty: both the old and the new data have been lost.

Ext4 developer Ted Ts'o stresses in his answer to the bug report that Ext4 behaves precisely as demanded by the POSIX standard for file operations.

If that is true, then to the extent that it is true, POSIX is "broken". Related changes to a file system really need to take place in an orderly way. Creating a file, writing its data, and renaming it are related. Letting the latter change persist while the former is lost is just wrong. Does POSIX really require this behavior, or just allow it? If it requires it, then IMHO POSIX is indeed broken. And if POSIX is broken, then companies like Microsoft are vindicated in their non-conformance.

Easier Fix (3, Insightful)

maz2331 (1104901) | more than 5 years ago | (#27260001)

Why not just make the actual "flushing" process work primarily on memory cache data - including any "renames", "deletes", etc.?

If any "writes" are pending, then the other operations should be done in the strict order in which they were requested. There should be no pattern possible where cache and file metadata can be out of sync with one another.

Data loss? Schmata loss! (-1, Troll)

Anonymous Coward | more than 5 years ago | (#27260119)

Who cares about a little data loss, at least it doesn't MURDER YOUR WIFE! [wikipedia.org]

Explain? What wasn't known? (1)

SIR_Taco (467460) | more than 5 years ago | (#27260127)

Ok...
A) Data loss is due to corruption/interruption in the time it takes for the file-system to write pending items to the disk. We know that.
B) The time it takes to write items that are not specifically (in code) told to write to disk NOW is longer than in previous incarnations. We know that.
C) The main reason no one complained about this behaviour in ext3 was that the pending time was about 5 seconds, so it was often never noticed. We know that.

Honestly, any distro that would make this the default on install may be brain-dead... The average user is more concerned with data retention than performance. However, having a mechanism to scale the pending write times variably is a good option, and scalable to anyone's needs (home -> large data centre).

The applications are broken, not the FS (1)

k-zed (92087) | more than 5 years ago | (#27260137)

So as expected, there is a veritable army of people demanding the old behavior be restored; also, most of them will probably "downgrade" or stay with EXT3.

Of course, the things at fault are really the buggy applications. But even deeper than that, the *paradigm* of having a lot of generated files (that store important user data) that are rewritten unconditionally at each program startup is wrong. What the hell is up with that?

Can't they come up with a method where you rewrite a file only when absolutely necessary? Why must all icon locations, thumbnails and other such GUI desktop bullshit be written and rewritten zillions of times?

Not to mention that EXT3 is just one file system out of many, and arguably not even a very good one. It's rather weird that it was chosen as a default option for so many "popular" distributions (maybe out of some misguided desire to be backwards compatible?). If your application (or again, *paradigm*) works well on only one file system, then it's most probably not the file system's fault.

Misleading headline; try "Buggy Apps Lose Data" (1)

mkcmkc (197982) | more than 5 years ago | (#27260161)

...under ext4.
