EXT4 Data Corruption Bug Hits Linux Kernel

Soulskill posted about 2 years ago | from the plenty-of-time-to-fix dept.

Bug 249

An anonymous reader writes "An EXT4 file-system data corruption issue has reached the stable Linux kernel. The latest Linux 3.4, 3.5, 3.6 stable kernels have an EXT4 file-system bug described as an apparent serious progressive ext4 data corruption bug. Kernel developers have found and bisected the kernel issue but are still working on a proper fix for the stable Linux kernel. The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often."

249 comments

Bisected? (0, Troll)

Anonymous Coward | about 2 years ago | (#41755841)

Kernel developers have found and bisected the kernel issue...

They split it in half? I suspect you mean disected.

Re:Bisected? (5, Informative)

Slayne (10400) | about 2 years ago | (#41755891)

Nope - bisection is a common technique for tracking down the cause of a bug by doing a binary search through the code history.
https://en.wikipedia.org/wiki/Code_Bisection

Re:Bisected? (1)

Tough Love (215404) | about 2 years ago | (#41756089)

The summary should say "bisected and found" not "found and bisected". Bisecting is a way of finding bugs.

Re:Bisected? (1, Funny)

fireman sam (662213) | about 2 years ago | (#41757365)

Bisecting is also a way of killing bugs - or perhaps Bisecting is when you act like an insect that goes both ways.

Re:Bisected? (4, Funny)

Gothmolly (148874) | about 2 years ago | (#41755899)

No, this means the kernel has bug-like tendencies from time to time, but is not exclusively buggy. For instance when it's in college, or if it's at a bar and has had a few drinks, well, then it might be buggy, but normally at work and at home and to all its friends it acts stable.

Re:Bisected? (0)

Anonymous Coward | about 2 years ago | (#41755971)

Whoooooooooosh!!

Re:Bisected? (2)

CheshireDragon (1183095) | about 2 years ago | (#41757287)

I think YOU are the one who didn't get the joke...

Re:Bisected? (0)

Alan Shutko (5101) | about 2 years ago | (#41755925)

No, they mean bisected.

That's a procedure [ubuntu.com] by which you do a binary search to find which patch caused a problem.

Re:Bisected? (-1, Troll)

kallisti5 (1321143) | about 2 years ago | (#41756021)

Kernel developers have found and bisected the kernel issue...

They split it in half? I suspect you mean disected.

AHAHAHAHAHAHA!!! *WOW* You should avoid making technical comments when you're not very technical. Bisecting means git can track down a bug by taking a known working and a known non-working point in a source tree and narrowing down the broken revision by working inward from both ends.

Re:Bisected? (0)

Anonymous Coward | about 2 years ago | (#41756111)

I'm pretty sure the GP was making a grammatical comment that was, while technically correct, not applicable technically. English is a harsh mistress...

Re:Bisected? (-1)

kallisti5 (1321143) | about 2 years ago | (#41756159)

I know.. it's just nice to see one of the many grammar nazi's out there get a taste of their own medicine :P

Your Papers Please (5, Funny)

Anonymous Coward | about 2 years ago | (#41756393)

grammar nazi's

grammar Nazis

Re:Bisected? (1)

mcgrew (92797) | about 2 years ago | (#41757007)

grammar nazi's?

*facepalm* I hope that was deliberate.

Re:Bisected? (1)

i_ate_god (899684) | about 2 years ago | (#41756425)

presumably from this post, "being technical" only means complete knowledge of all tools.

I'm guessing you find it very hard to find work with that kind of understanding of what "being technical" implies.

Re:Bisected? (4, Informative)

petermgreen (876956) | about 2 years ago | (#41756179)

What they actually split in half is a sequence of changesets (also known as commits).

The idea is you have a sequence of changesets that take you from the last known good revision to the first known bad revision. By splitting that sequence in half and determining if the revision in the middle is good or bad, you can in principle halve the number of revisions between last known good and first known bad until you find the revision that introduced the bug. Reality is messier because of nonlinear history, because some revisions may be "broken" such that it is not possible to determine if they are "good" or "bad", and because some bugs may be difficult to test for, but bisection is still a useful tool for finding problem revisions among a long history relatively easily.
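
The halving described above can be sketched in a few lines of Python. This is only a minimal illustration of the idea, not how git implements `git bisect`: it assumes a strictly linear history in which every revision after the culprit is also bad, and the revision names (`r1` to `r16`) and the `is_bad` test are hypothetical stand-ins for real commits and a real build-and-test step.

```python
from typing import Callable, List, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad revision in a linear history.

    Assumes revisions[0] is known good, revisions[-1] is known bad,
    and every revision after the culprit is bad as well.
    """
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(revisions[mid]):
            bad = mid       # culprit is at mid or earlier
        else:
            good = mid      # culprit is after mid
    return revisions[bad]


# Hypothetical history r1..r16; pretend r11 introduced the bug.
history: List[str] = [f"r{i}" for i in range(1, 17)]
tested: List[str] = []


def is_bad(rev: str) -> bool:
    tested.append(rev)      # record which revisions had to be built and tested
    return int(rev[1:]) >= 11


culprit = first_bad(history, is_bad)
print(culprit, tested)
```

With 16 revisions only four of them need testing. The messier cases mentioned above (merges, untestable revisions) are exactly what this sketch leaves out.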

Re:Bisected? (1)

partyguerrilla (1597357) | about 2 years ago | (#41756725)

Bisect and disect are synonymous, they both mean "splitting in half."

Re:Bisected? (2)

newcastlejon (1483695) | about 2 years ago | (#41756771)

Perhaps, if disect is a real word, but dissect means "cut up/apart", not specifically into two parts.

Re:Bisected? (3, Funny)

EMR (13768) | about 2 years ago | (#41757221)

If God forks the Universe every time you roll a die, he'd better have a damned good memory.

Nah, He only needs the latest SHA1 for each roll outcome commit as that'll point up the GIT tree :-D

Re:Bisected? (2)

FatdogHaiku (978357) | about 2 years ago | (#41757081)

They split it in half?

I know it's wrong but I just got this mental image of someone moving all the 0's to one side of a page and all the 1's to the other side...

Low impact (0)

Anonymous Coward | about 2 years ago | (#41755907)

It's a good thing most stable releases are on 3.2 or 3.0 with commercial systems on even earlier versions.

Re:Low impact (0, Flamebait)

Anonymous Coward | about 2 years ago | (#41756265)

Still, for all of the shit that Linux users talk about Windows, Windows has never had anything as serious as a file system corruption bug.

Re:Low impact (0)

hierofalcon (1233282) | about 2 years ago | (#41756353)

I suspect they were just more likely to find them during development since they have to reboot so often when updating Microsoft products. Reboots aren't nearly as frequent on Linux boxes. To say they never had them would be a stretch.

Re:Low impact (1)

Anonymous Coward | about 2 years ago | (#41756445)

Actually, XP is incompatible with the newest version of NTFS, as you will notice if you ever move HDs around various computers for some reason. Not quite the same thing, but easy to overlook. It can produce some very nasty problems.

Re:Low impact (2)

negRo_slim (636783) | about 2 years ago | (#41756567)

Source?
Cuz I'm looking:

http://en.wikipedia.org/wiki/Ntfs#Microsoft_Windows [wikipedia.org]
http://www.tomshardware.com/forum/1249-63-ntfs-win7-windows [tomshardware.com]
http://en.wikipedia.org/wiki/Ntfs#Versions [wikipedia.org]

And just not seeing "XP is incompatible with the newest version of NTFS"

Re:Low impact (0)

Anonymous Coward | about 2 years ago | (#41757169)

That isn't a file system bug, that is progress. Would you consider it a bug if a Linux system from 1998 caused corruption on an ext4 volume?

Re:Low impact (1)

Anonymous Coward | about 2 years ago | (#41756489)

Windows can fuck up its file system just fine. It's just that Microsoft never warns its users about defects in Windows unless someone goes public first. Mostly they silently slip the fixes in with a bunch of other fixes. That is, if they fix the bugs at all.

Backups are important regardless of file system. In the absence of human error or hardware failure... sure enough your file system will still get fucked.

Also if I had a dollar for every time Windows fucked a partition table, I'd be driving a much nicer car.

Re:Low impact (5, Insightful)

jedidiah (1196) | about 2 years ago | (#41756511)

> Windows has never had anything as serious as a file system corruption bug.

That you know of...

Since the Windows development process isn't open, there's no way for you to tell. You don't get to see Microsoft's development versions and you don't get to see Microsoft's bug database.

Re:Low impact (1)

bertok (226922) | about 2 years ago | (#41757225)

You don't get to see Microsoft's development versions and you don't get to see Microsoft's bug database.

You're looking in the wrong place!

They're called features, and they're on the technet website for all the world to see.

Like how in older Windows versions, disks would be auto-mounted, and NTFS didn't have native active/active capability. In other words, if you made the slightest mistake in your FC zoning, then you could kiss your multi-terabyte cluster volume goodbye.

Re:Low impact (1)

Bengie (1121981) | about 2 years ago | (#41756513)

I love Chan9 and MS Research and I think a lot of what MS makes is "cool", but we are all human and mistakes WILL be made. Linux has a great track record. This is also why BTRFS will take a while to get traction in the Enterprise. EXT4 and ZFS are still getting bug fixes.

Re:Low impact (4, Informative)

h4rr4r (612664) | about 2 years ago | (#41756543)

http://answers.microsoft.com/en-us/windows/forum/windows_cp-files/bug-report-serious-filesystem-corruption-and-data/17f69e19-92ca-4e1e-b9d5-f78f1ac4e963 [microsoft.com]

Bugs happen. The difference here is that Linux development is done in the open so people find out about them.

Re:Low impact (1)

Anonymous Coward | about 2 years ago | (#41757381)

1) Windows 7 fucked up the Windows 8 partition not because of a bug, but because it isn't forward compatible
2) Microsoft does not recommend dual booting Windows 8 with older Windows versions
3) The guy was using a prerelease version of Windows 8

Show me a bug where a specific version of Windows corrupts its OWN filesystem (ie. the filesystem that comes with it). You can't, because it never happens.

This is why I stick to Reiser (5, Funny)

Anonymous Coward | about 2 years ago | (#41755929)

I know he'd never do anything to harm me or my data.

Re:This is why I stick to Reiser (2, Funny)

Anonymous Coward | about 2 years ago | (#41755993)

Or your wife?

Re:This is why I stick to Reiser (1)

Anonymous Coward | about 2 years ago | (#41756667)

Only if she needed .... correction.....

Re:This is why I stick to Reiser (1)

Anonymous Coward | about 2 years ago | (#41756127)

There's a kind of sad irony to your comment. The people most enamored by the beauty of logic/algorithms/pure mathematics probably find it difficult [wikipedia.org] to deal with the ugly realities of real life.

Re:This is why I stick to Reiser (2, Funny)

localhost8080 (819098) | about 2 years ago | (#41756485)

yeah, reiser 4 has some killer features

Re:This is why I stick to Reiser (1)

psm321 (450181) | about 2 years ago | (#41756629)

I know you're making a joke about the person, but I've had many corruption issues with ReiserFS. Granted, this was in its earlier days, but after it had been declared stable for use. I gave up on it after the problems, so no idea if later versions improved.

Reinventing the wheel (0)

Anonymous Coward | about 2 years ago | (#41755933)

It's a pity they can't use ZFS instead of re-inventing the wheel. The other pity is that the newest distros seem to force you to use EXT4 at installation (on your desktop).

Re:Reinventing the wheel (4, Interesting)

UnknownSoldier (67820) | about 2 years ago | (#41756449)

I have to agree with you. This is one of the best demos of ZFS around :)
http://www.youtube.com/watch?v=QGIwg6ye1gE [youtube.com]

ZFS solves 3 problems by taking a holistic approach:

* Volume Management
* File System
* Data Integrity

Instead of fragmenting the problem into 3 layers, each with only limited access and knowledge, a unified layer has more meta-information available to make smarter decisions.

Some interesting essays:

https://blogs.oracle.com/bonwick/entry/raid_z [oracle.com]
https://blogs.oracle.com/bonwick/en_US/entry/rampant_layering_violation [oracle.com]

Re:Reinventing the wheel (1, Troll)

h4rr4r (612664) | about 2 years ago | (#41756565)

Hopefully BTRFS will conquer this.

Blame SUN, they choose a license for ZFS to ensure it never had proper in kernel linux support. They did that because Linux was eating their lunch and still is.

Re:Reinventing the wheel (4, Interesting)

UnknownSoldier (67820) | about 2 years ago | (#41756969)

> Blame SUN, they choose a license for ZFS to ensure it never had proper in kernel linux support.

That's a myth / blatant lie.

Fork Yeah! The Rise and Development of illumos
http://www.youtube.com/watch?feature=player_detailpage&v=-zRN7XLCRhc#t=1460s [youtube.com]

Why You Need ZFS
http://www.youtube.com/watch?v=6F9bscdqRpo [youtube.com]
@5:40 I just want to clarify your comment "It would be illegal to ship"
@5:45 I think there is a perception issue that we need to tackle.
@5:55 One point that I would like to make, because I think I said earlier that we have much more in common than separates us.
@5:58 One of the most important things we all have in common is we are all open source systems.
@6:02 And we need to end this self-inflicted madness of open source licensing compatibility.
@6:12 I think that it is a bogeyman and we are letting it hold us back.
@6:19 You say it would be illegal to ship. I say no one has standing.
@6:24 The GPL was never, ever designed to counteract other open source licenses.
@6:33 That is a complete rewrite of history to believe the GPL was designed to be at war with BSD or with CDDL.
@6:39 The GPL was at war with proprietary software. And thank the GPL and Stallman, open source won.
@6:45 That is the whole point. Open source won.
@6:49 We are pissing on our own victory parade by not allowing these technologies to flow between systems.

Re:Reinventing the wheel (0)

UnknownSoldier (67820) | about 2 years ago | (#41757049)

> Hopefully BTRFS will conquer this.

While I agree btrfs looks very interesting, they are unfortunately not taking a holistic approach to the design, so as it stands they will never match what ZFS has. Now IF they take a step back and incorporate ALL the layers like ZFS does, then they will have a chance.

But do you really want to wait another few years for btrfs to get it "right" when ZFS has already been debugged?

Re:Reinventing the wheel (1)

h4rr4r (612664) | about 2 years ago | (#41757241)

ZFS has not already been debugged on linux. Is there even a non-FUSE ZFS implementation for linux?

I am not sure everything has to be done in one step. Do one thing and do it well. This holistic idea is nice in concept but often leads to the windows outcome. Not much gets done and what gets done is not that great if at any point "just works" just doesn't.

Re:Reinventing the wheel (1)

dimeglio (456244) | about 2 years ago | (#41756523)

...or XFS with a recent kernel.

I don't see the problem then... (5, Funny)

Zapotek (1032314) | about 2 years ago | (#41755939)

The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often.

We're talking about Linux users here...move along.

Re:I don't see the problem then... (1)

Anonymous Coward | about 2 years ago | (#41756013)

Knew i shouldn't have dual booted my machine...

Re:I don't see the problem then... (0)

tessellated (265314) | about 2 years ago | (#41756035)

Where are my moderator points when I *need* them?
+1 funny

Re:I don't see the problem then... (-1)

Anonymous Coward | about 2 years ago | (#41756229)

Where are my moderator points when I *need* them?

They're waiting for you to make more interesting/informative/insightful comments of course. Given that you've only had a half-dozen or so comments modded up I think it'll take a while if this is the sort of thing you usually post.

Re:I don't see the problem then... (0)

Anonymous Coward | about 2 years ago | (#41756087)

To add, a lot of people like to wait months before upgrading, if not years. At least I do, I'm always a number behind.

Re:I don't see the problem then... (1, Troll)

vistapwns (1103935) | about 2 years ago | (#41756125)

What is it about Linux users' jokes that remind me of the Iraqi Information Minister? ;)

Re:I don't see the problem then... (1)

starless (60879) | about 2 years ago | (#41757353)

Even though my linux desktop machine runs for long periods without needing rebooting, there are exceptions:
My several year old Pioneer television runs linux. It crashes and reboots if I change HD channels more than 5 or 6 times.
My roku box needs to be rebooted from time to time.
So does my android phone.

Really clever... (5, Funny)

K. S. Kyosuke (729550) | about 2 years ago | (#41755963)

The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often."

They're trying to boost the average uptime of all installations by making people keep their machines turned on. It's just a continuation of the uptime war waged with the BSD folks!

Re:Really clever... (0)

Anonymous Coward | about 2 years ago | (#41756519)

BSD will always win the uptime war. Linux should just stay home

LKML Slashdotted (1)

o'reor (581921) | about 2 years ago | (#41755991)

Brilliant. Well, it certainly worries this Linux developer -- although I mostly rely on pre-3.0 kernels. Wasn't there a rule on Slashdot about mirroring articles before posting links to them ?

Re:LKML Slashdotted (1)

Bill, Shooter of Bul (629286) | about 2 years ago | (#41756401)

Not that I've ever remembered. It was oft suggested in comments, but most websites are nearly slashdot-proof these days. Kind of surprised that lkml is so sluggish under the load.

Re:LKML Slashdotted (1)

Anonymous Coward | about 2 years ago | (#41757023)

Kind of surprised that lkml is so sluggish under the load.

That's because they never wanted to reboot because of possible file system corruption. So they are still running on that 386DX they got 23 years ago.

Re:LKML Slashdotted (1)

Bill, Shooter of Bul (629286) | about 2 years ago | (#41757315)

I just pray someone hit the turbo button, we need all of that DX and all of the number co-processing it can give us.

Interesting bug, but don't get excited. (5, Informative)

dacut (243842) | about 2 years ago | (#41756001)

From Ted Ts'o's commentary, it's an optimization ("jbd2: don't write superblock when if its empty") gone awry:

The reason why the problem happens rarely is that the effect of the buggy commit is that if the journal's starting block is zero, we fail to truncate the journal when we unmount the file system. This can happen if we mount and then unmount the file system fairly quickly, before the log has a chance to wrap.

Basically, this optimization has the side effect of not updating the transaction log in this rare case. You can end up replaying old transactions after new ones, which will scramble metadata blocks. Given the rather unique conditions needed to hit this one, I'm not going to lose any sleep over any servers running without Ted's fix (though I'll certainly apply it once RedHat releases the patch).
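
To make "replaying old transactions after new ones" concrete, here is a deliberately simplified toy model in Python. It is not jbd2's on-disk format or recovery logic (real journals carry sequence numbers, commit blocks and a superblock, none of which are modelled), and the class, method and block names are all invented for illustration. The only assumption taken from the commentary above is that an unmount which fails to truncate the log leaves stale records that a later replay will apply.

```python
class ToyJournaledFS:
    """Toy model only: a tiny log that is replayed on every mount."""

    def __init__(self, truncate_on_unmount: bool):
        self.disk = {}   # block name -> contents at its final location
        self.log = []    # journal records: (block name, contents)
        self.head = 0    # next log slot to write in this mount session
        self.truncate_on_unmount = truncate_on_unmount

    def mount(self):
        # Replay every record the log still claims to hold, in log order.
        for block, contents in self.log:
            self.disk[block] = contents
        self.head = 0    # each session starts writing at the first slot

    def write(self, block, contents):
        record = (block, contents)
        if self.head < len(self.log):
            self.log[self.head] = record   # overwrite a stale slot
        else:
            self.log.append(record)
        self.head += 1
        self.disk[block] = contents

    def unmount(self):
        if self.truncate_on_unmount:
            self.log.clear()   # correct behaviour: log is marked empty
        # Modelled bug: otherwise the stale records are simply left behind.


def run(truncate_on_unmount: bool) -> str:
    fs = ToyJournaledFS(truncate_on_unmount)
    fs.mount()
    fs.write("inode_7", "v1")
    fs.write("bitmap", "v1")
    fs.write("inode_7", "v2")
    fs.unmount()
    fs.mount()
    fs.write("inode_7", "v3")   # short session: only one log slot is reused
    fs.unmount()
    fs.mount()                  # replay happens here
    return fs.disk["inode_7"]


with_truncate = run(True)
without_truncate = run(False)
print(with_truncate, without_truncate)
```

In the toy, the short second session overwrites only the first log slot, so the final replay applies the new record and then the two stale ones, and `inode_7` silently reverts to an older version. That mirrors the "mount and then unmount fairly quickly" condition only in spirit.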

Re:Interesting bug, but don't get excited. (0)

Anonymous Coward | about 2 years ago | (#41756177)

though I'll certainly apply it once RedHat releases the patch

So, in a couple of years you'll be 100% ok.

Re:Interesting bug, but don't get excited. (4, Informative)

Tough Love (215404) | about 2 years ago | (#41756273)

It means you could get an incorrect replay after a crash and end up needing to do a fsck. Good thing Ext2/3/4 fsck is awesome. Of course, having no replay bug will be much better. Note: the bug was introduced this October 8th. You are not running this kernel on your server or workstation unless you are a dev, it hasn't filtered through to distros yet.

Re:Interesting bug, but don't get excited. (1)

NotBorg (829820) | about 2 years ago | (#41756737)

You are not running this kernel on your server or workstation unless you are a dev, it hasn't filtered through to distros yet.

I'm a crazy, bad ass rebel that uses ArchLinux for my workstation. Living wild and dangerous, I recklessly shut down my heathen ext4 computer every night. I feel like I'm that evil mayhem guy on the Allstate commercials. RECALCULATING!

Re:Interesting bug, but don't get excited. (0)

Anonymous Coward | about 2 years ago | (#41756795)

The summary says kernels 3.4, 3.5 and 3.6 are affected. There are certainly distributions out there using 3.4 and 3.5 kernels.

Re:Interesting bug, but don't get excited. (3, Insightful)

Shimbo (100005) | about 2 years ago | (#41756943)

There are certainly distributions out there using 3.4 and 3.5 kernels.

Yes, but not many of them will push kernel updates all the way through to end users in a couple of weeks.

Re:Interesting bug, but don't get excited. (1)

Bradmont (513167) | about 2 years ago | (#41757299)

> it hasn't filtered through to distros yet.

FTA:
> Linux 3.4, 3.5, 3.6 stable kernels

I'm running Ubuntu 12.10 stock kernel:
% uname -r
3.5.0-17-generic
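
For anyone wanting to script the same check across several machines, a rough Python sketch follows. It only tests whether the running kernel belongs to one of the series named in the summary (3.4, 3.5, 3.6); that is an assumption taken from the summary, and being in one of those series does not by itself mean a given build contains the buggy commit, since per other comments in this thread it only landed in October. The function names are invented for this example.

```python
import platform
import re

# Kernel series named in the story summary (assumption: taken from the summary only).
NAMED_SERIES = {(3, 4), (3, 5), (3, 6)}


def kernel_series(release: str):
    """Extract (major, minor) from a string like '3.5.0-17-generic', or None."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def in_named_series(release: str) -> bool:
    """True if the release string belongs to a series named in the summary."""
    return kernel_series(release) in NAMED_SERIES


running = platform.release()   # same string that `uname -r` prints on Linux
print(running, "in named series:", in_named_series(running))
```

A `True` here only means the distro's changelog is worth a look to see whether the offending jbd2 patch was backported.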

Re:Interesting bug, but don't get excited. (2)

WuphonsReach (684551) | about 2 years ago | (#41757367)

Note: the bug was introduced this October 8th.

Probably one of the more informative comments here.

Re:Interesting bug, but don't get excited. (0)

Anonymous Coward | about 2 years ago | (#41756311)

So, you'd have to do two or more reboots in quick sequence to trigger it?

I do that sometimes when tweaking stuff in /etc, just to make sure it comes up in a coherent state.

Re:Interesting bug, but don't get excited. (-1)

Anonymous Coward | about 2 years ago | (#41756753)

You trust RedHat? It was a RedHat developer (Eric Sandeen) who introduced the bug in the first place!

Not defined (0)

Anonymous Coward | about 2 years ago | (#41756045)

Please define "too often" .... ?!?!

Re:Not defined (1)

Anonymous Coward | about 2 years ago | (#41756263)

Write large chunks of data to every filesystem and force the journals to cycle before reboot. If you have to ask "how often is too often?", then you're probably already in trouble.

Article suggested that people who shut down every day, say a laptop owner who doesn't use suspend/hibernate, will probably bump up against this. My suspicion is that those of us with uptimes of several months will have no trouble, but YMMV.

Re:Not defined (1)

zonky (1153039) | about 2 years ago | (#41756443)

I'm a laptop owner who uses dm-crypt, and with a 2 second boot time off SSD, I never bother hibernating. Better check what kernel....

Re:Not defined (2)

h4rr4r (612664) | about 2 years ago | (#41756609)

This one occurred in October, so pretty doubtful, since none of the major distros are that up to date.

The file system dug too greedily... (3, Funny)

Bovius (1243040) | about 2 years ago | (#41756077)

...and too deep. It awoke a being of segfaults and kernel panics.

Part of the game (2)

ntropia (939502) | about 2 years ago | (#41756153)

At first I had mixed feelings of slight disappointment and concern, especially because it is the default filesystem in several distros, (including Android) [wikipedia.org] . Although, after some second thoughts, I have come to the following conclusions:

1) It is part of the game: continuous development toward improvement (most of the time) and new features implies some pitfalls. So far, benefits [wikipedia.org] are much larger than costs.

2) Despite the fact developers are still working on a fix, I wouldn't be surprised if it would be found soon.

3) ...please, guys, don't do it again!

Re:Part of the game (1)

compro01 (777531) | about 2 years ago | (#41756809)

This bug is only 10 days old. It's rather unlikely this has percolated down to anything important, much less Android, which still runs 3.0.31 from May.

too new (1)

Anonymous Coward | about 2 years ago | (#41756155)

This is why I don't use file systems less than 10 years old.

For butts sake (-1)

buttfuckinpimpnugget (662332) | about 2 years ago | (#41756195)

Just use BSD. I'd rather use windows than linux.

Re:For butts sake (2)

bluefoxlucid (723572) | about 2 years ago | (#41756551)

I have used BSD. I found it .... quite striking. There's a hell of a lot of performance enhancement in Linux, and it really shows when you try to boot BSD and find it's ass-slow from the get-go. I even tried slapping down Debian-kfreebsd to compare something roughly the same and ... yeah it's just slow as shit. Solaris (both Sun Solaris and Nexenta = Ubuntu/Solaris) wasn't that slow.

Re:For butts sake (0)

Anonymous Coward | about 2 years ago | (#41756617)

BSD died about 10 years ago.

Reiserfs became 'murderfs'... (1)

Omnifarious (11933) | about 2 years ago | (#41756207)

What term do we get to use for ext4 now? It's unfortunate that Theodore Tso is actually a pretty decent guy instead of being a murderer (and a jerk). So there aren't any obviously negative terms that come to mind.

But clearly, something needs to be done along these lines, as well as a legion of people who forever more claim that ext4 corrupts your data and you should never use it and stick with ext3 instead.

Re:Reiserfs became 'murderfs'... (5, Funny)

Anonymous Coward | about 2 years ago | (#41756295)

So clearly the answer is General Tso's FS. Delicious, but you'll lose your data an hour later.

Re:Reiserfs became 'murderfs'... (0)

Anonymous Coward | about 2 years ago | (#41756405)

Must be all the MSG.

Re:Reiserfs became 'murderfs'... (1)

corychristison (951993) | about 2 years ago | (#41756505)

What term do we get to use for ext4 now?

EXTerminator 4. Because it's just awful. (Not really)
EXTerminator 4. Because it's corrupt.
EXTerminator 4. Because it's on a (data) killing spree.

Re:Reiserfs became 'murderfs'... (0)

Anonymous Coward | about 2 years ago | (#41756681)

He should have stuck with chicken. And the military.

The New STABLE Is Not So Stable NOW Is It ? (-1)

Anonymous Coward | about 2 years ago | (#41756341)

Wake me up when it gets fixed. I was used to this, oh, back in the 80s, so it's nice to see disk data corruption make a comeback. Thank you, Linus and fellow corruptors. Thank you!

Workaround (0)

Anonymous Coward | about 2 years ago | (#41756365)

After recently discovering pm-suspend on my desktop, I have found I never need to turn off my computer again! Use "sudo visudo" to get rid of the annoying pw prompt.

Summary is wrong (5, Informative)

DrJimbo (594231) | about 2 years ago | (#41756397)

The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often.

This is wrong. The problem occurs when the fs is unmounted too *soon*. Twice in a row. The bug only appears if the journal buffer does not wrap. You only get catastrophic results if this happens twice in a row.

Re:Summary is wrong (5, Interesting)

Anonymous Coward | about 2 years ago | (#41756669)

This appears to be untrue. My latest tests suggest that it happens if a single unclean umount happens while the fs is mounted in 3.6.3. (At least, I saw corruption in /var after a single boot, followed by a rescue boot into 3.6.1 and fsck: every filesystem that had journal replay invoked also had corruption.)

  -- N., original reporter, not much enjoying his fifteen minutes of fame since it comes with happy fun filesystem corruption attached: captcha is 'contrite', how appropriate

Re:Summary is wrong (1)

Anonymous Coward | about 2 years ago | (#41756689)

Aside: let nobody say Oracle doesn't contribute to Linux, thankyouverymuch. No complaints from anyone @work while I tracked this down, even though it is quite far removed from DTrace work. (Admittedly it is sort of hard to work on anything while your filesystem is fried.)

Re:Summary is wrong (1)

DrJimbo (594231) | about 2 years ago | (#41757117)

I suspect that unclean umounts may trigger the bug too but that does not contradict anything I said. I did not say there was no corruption when you hit the bug once, I said there was catastrophic corruption when you hit it twice in a row. If a bug can be triggered by a clean umount, it is not very surprising if it also gets triggered by an unclean umount.

Your experience seems to confirm my correction. It is not about how *often* you mount, it is about how you umount. This is a non-trivial distinction because the misleading summary could tend to encourage some people who have been safely using a buggy kernel to unwittingly engage in behavior that triggers the bug, perhaps catastrophically.

In other words... (1, Funny)

Anonymous Coward | about 2 years ago | (#41756577)

This is what you get when you use a filesystem that wasn't developed by a real company.

Because if they had to worry about losing money, they would make damned sure that problem didn't exist. Or at least make it go away. I thought this "problem" existed with ext4 for years.

Yeah, Micro$oft is evil, but their FS works. And file corruption isn't a serious issue except when hard drives fail, and, well, in that case...DERP!

Re:In other words... (1)

interval1066 (668936) | about 2 years ago | (#41756929)

Figures... AC calls out FOSS.

This is what you get when you use a filesystem that wasn't developed by a real company.

Sounds like M$ FUD to me, but whatever. Is M$ the only "real" company?

Because if they had to worry about losing money, they would make damned sure that problem didn't exist. Or at least make it go away.

I got a list of "real"companies that haven't made good on many high-level flaws.

I thought this "problem" existed with ext4 for years.

You did? Would've made a nice /. article. Where are your notes regarding this flaw only you uncovered?

Yeah, Micro$oft is evil, but their FS works.

http://serverfault.com/questions/31709/how-to-workaround-the-ntfs-move-copy-design-flaw

Re:In other words... (0)

Anonymous Coward | about 2 years ago | (#41757039)

Ted Ts'o, the lead developer of EXT4, works for Google. Specifically, he's paid to work on the Linux kernel and filesystems. If Google isn't a company with a lot at stake with regard to filesystems, I don't know what is.

Re:In other words... (0)

Anonymous Coward | about 2 years ago | (#41757223)

And the guy who found the bug (me) works for Oracle (though the bug struck his own personal idiosyncratically-configured system).

Clearly Oracle have an interest in robust data storage :)

LOL (0, Funny)

Anonymous Coward | about 2 years ago | (#41756699)

The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often."

"You're just rebooting it wrong."
-Loonix filesystem developer

Re:LOL (0)

Anonymous Coward | about 2 years ago | (#41756759)

And the followup: "You remounted it wrong."

How many times (1)

MetalliQaZ (539913) | about 2 years ago | (#41756803)

... can we get the words "stable", "linux", and "kernel" into a single summary? I like this game.

Well of course! (2)

Panaflex (13191) | about 2 years ago | (#41756937)

They're mounting it wrong!

When you mount your disks, you need to be sure of proper head alignment. Make sure she's spun up properly as well, otherwise the disks could be surprised and jump away causing a crash. Lastly, my geek friends, mounting too often can cause burning friction which can destroy data and cause irritation and discomfort.

EXT4 has had other issues... (0)

Anonymous Coward | about 2 years ago | (#41757091)

...I believe that it had problems with large files (I don't know all of the details) at one point, too.
This may still be an open issue.

I stick with EXT3, but it has the "forever to perform a mkdir" issue after your filesystem crosses
some file count threshold. But I've not had anything go sour with EXT3 even when the box has
gone down hard from a power failure.

Also, we're running Win 2008 server and this is the second time we've seen this where a whole
partition becomes unusable. We have to restore the entire image from backup; it can't be repaired.

CAPTCHA = sour grapes they're not!

then good thing i switched to OS X years ago! (-1)

Anonymous Coward | about 2 years ago | (#41757347)

Can Linux even run on an Apple Fusion drive? Probably not, since open source is chronically behind the curve.
