
MineralMan

(146,317 posts)
Wed Mar 30, 2016, 09:29 AM

How to scramble data in one easy step.

There's been a lot of talk about how problems with Arizona's voter records messed up the Democratic primary there. No doubt many people found that their party affiliation had somehow been switched, which blocked them from voting any way other than on a provisional ballot.

Was that deliberately done? It could have been. It could have been done to mess with Democratic Party voters, I suppose. We're not hearing about Republicans finding their party affiliation changed, but this is DU, so we don't talk a lot about Republicans.

Databases are funny things, and are subject to all sorts of errors, usually caused by people making mistakes.

Now, I know an Excel spreadsheet isn't a database, although it can be used as one. Still, there's a boneheaded error people make with Excel's Sort feature that has destroyed data integrity so many times it's not even funny. Similar sorting problems can occur in other flat-file databases, too. Here's what happens.

Someone is looking at a huge spreadsheet and wonders what sorting it by some column, say "Party," would show. So this person, inexperienced with Excel, clicks the header of that column to select the entire column, then goes to Data > Sort. Excel pops up a little warning box explaining that data next to the selection will not be sorted, and offers two choices: "Expand the selection" or "Continue with the current selection."

The second choice sorts the damned worksheet exactly the way it was selected. When you pick it, Excel sorts that one column but not the rest of the data. Suddenly the worksheet is scrambled: the "Party" column no longer lines up with the rest of each row. Everything is FUBAR. Now imagine this person saves the spreadsheet after doing this. The data is irretrievably scrambled.
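A minimal Python sketch of what goes wrong, using made-up rows (the voter data and the helper functions are hypothetical, for illustration only): sorting whole rows keeps each record together, while sorting the "Party" column by itself, roughly what "Continue with the current selection" does, can hand every voter someone else's party.

```python
# Hypothetical voter rows: (name, party). Illustrative data only.
rows = [
    ("Alice", "REP"),
    ("Bob", "DEM"),
    ("Carol", "IND"),
]

def sort_whole_rows(rows, col):
    """Sort complete records by one column; every row stays together."""
    return sorted(rows, key=lambda r: r[col])

def sort_column_only(rows, col):
    """Mimic sorting a single selected column: the column's values are
    sorted on their own and written back, the other columns stay put."""
    sorted_col = sorted(r[col] for r in rows)
    return [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, sorted_col)]

print(sort_whole_rows(rows, 1))   # [('Bob', 'DEM'), ('Carol', 'IND'), ('Alice', 'REP')]
print(sort_column_only(rows, 1))  # [('Alice', 'DEM'), ('Bob', 'IND'), ('Carol', 'REP')]
```

In the second result every name is still there and every party value is still there, which is exactly why the damage is so easy to miss at a glance.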

A stupid thing to do? Yes, indeed, but it happens so often that it's not even funny. There are many ways to screw up data, in spreadsheets and in database applications, too. Most databases let you view a dataset as a table and operate on it just like a spreadsheet. Despite warnings and other safeguards, the same sorting error can happen there. In many cases, nobody notices it has happened until later.

So, it's premature to assume that someone deliberately sabotaged the AZ primary election. Occam's Razor suggests that some error by some boneheaded worker is a more likely cause.

 

randome

(34,845 posts)
1. It's the curse of the Digital Age: easy to store huge amounts of data and easy to screw it all up.
Wed Mar 30, 2016, 09:32 AM

[hr][font color="blue"][center]“If you're not committed to anything, you're just taking up space.”
Gregory Peck, Mirage (1965)
[/center][/font][hr]

MineralMan

(146,317 posts)
5. Ain't it the truth?
Wed Mar 30, 2016, 09:42 AM

I've personally scrambled a monstrous Excel spreadsheet that very way, on a day I was tired, rushed and frustrated. I didn't even notice I had screwed up until later, when I was comparing rows for another reason. Fortunately, I never do anything to any data except on a working copy saved specifically to mess around with. The original data is kept in a read-only state, precisely to prevent me from screwing it up.

I've had others scramble my spreadsheets that way, too, destroying the integrity of data I had put into spreadsheet form. Once, the scrambled table ended up being published in a major national magazine, after someone at the magazine sorted a single column that way. It was a definite SNAFU. Fortunately, I still had the original file, so I could demonstrate that it wasn't my fault. That put egg on a lot of faces.

kristopher

(29,798 posts)
2. Occam's Razor tells us no such thing.
Wed Mar 30, 2016, 09:33 AM

In the middle of a political campaign in a strategically critical primary, the simplest answer involves sabotage, not random failure.

 

HumanityExperiment

(1,442 posts)
3. Or...
Wed Mar 30, 2016, 09:34 AM

This is a presidential election with a tremendous amount at stake. Given the long history of dirty tricks in presidential politics, it's more likely that some nefarious actions were taken to give one candidate an advantage over another.

lostnfound

(16,184 posts)
14. Absolutely. And the candidate who benefits doesn't need to know a thing about it.
Wed Mar 30, 2016, 10:09 AM

The shortage of voting machines in Maricopa was probably engineered intentionally by the GOP in AZ, to disadvantage Democrats in the general election, or even in other races in the primary. It had a side effect on the presidential primary.

Or, some zealous partisan has succeeded in hacking into registrations.

I don't jump to conclusions about any one individual. But obviously there was systematic suppression in AZ, and it probably affected the race a great deal. No surprise. People go to huge lengths to get their type of candidate into power.

Buns_of_Fire

(17,183 posts)
4. Been there. Done that. Learned my lesson well.
Wed Mar 30, 2016, 09:42 AM

It's amazing how going through 1500 rows of a spreadsheet -- one by one -- solidifies the learning experience.

MineralMan

(146,317 posts)
6. Yup. SAVE AS is the answer.
Wed Mar 30, 2016, 09:45 AM

Always work on a working copy. Never work on the original file. Always maintain multiple read-only backups of the original file. It's a lesson that everyone should learn, but it gets forgotten every day, causing enormous problems.
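The same habit can be scripted. This is a minimal Python sketch, assuming the data lives in a single file; `make_working_copy` and the `_working` suffix are hypothetical names, and "read-only" here means clearing the file's write-permission bits, which guards against accidental saves but not against someone who deliberately changes the permissions back.

```python
import os
import shutil
import stat
from pathlib import Path

def make_working_copy(original: str, suffix: str = "_working") -> Path:
    """Mark the original read-only, then return a separate copy to edit."""
    src = Path(original)
    # Clear every write bit on the original so it can't be saved over by accident.
    mode = src.stat().st_mode
    os.chmod(src, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    # e.g. voters.csv -> voters_working.csv, in the same folder
    dst = src.with_name(src.stem + suffix + src.suffix)
    # copy2 copies the contents and metadata, including the permission bits...
    shutil.copy2(src, dst)
    # ...so the copy arrives read-only; give the owner write access back.
    os.chmod(dst, dst.stat().st_mode | stat.S_IWUSR)
    return dst
```

For the "multiple backups" part, the same idea can be repeated with different suffixes, keeping each extra copy read-only instead of restoring write access.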

 

cherokeeprogressive

(24,853 posts)
8. You can also turn on the "Track Changes" feature.
Wed Mar 30, 2016, 09:52 AM

In addition to that, Excel tracks the last 100 actions. Any number of them can be undone, most recent first. The only time that list is emptied is when the file is closed, not when it's merely saved.

MineralMan

(146,317 posts)
10. Yes, that's true, too.
Wed Mar 30, 2016, 09:59 AM

The problem is that a mis-sorted column often goes unnoticed until it's too late to fix. A working copy is the only safe way to mess around with big spreadsheets of data. Then it's fairly easy to check some records at random against the original to make sure no columns have been misaligned.
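Here is a minimal Python sketch of that random spot check, assuming each record has a unique key column (a voter ID, say) and both versions are loaded as lists of tuples; `spot_check` and the sample data are hypothetical. Matching on the key means a legitimate whole-row sort still passes, while a single-column sort shows up as mismatched rows.

```python
import random

def spot_check(original, working, key_col=0, samples=5, seed=None):
    """Compare randomly chosen records in the working copy against the
    original, matched by a key column assumed to be unique.
    Returns the keys of sampled rows that no longer match."""
    by_key = {row[key_col]: row for row in original}
    rng = random.Random(seed)
    picked = rng.sample(working, min(samples, len(working)))
    return [row[key_col] for row in picked if by_key.get(row[key_col]) != row]

# Hypothetical data: (voter_id, name, party)
original = [(1, "Alice", "REP"), (2, "Bob", "DEM"), (3, "Carol", "IND")]
good = sorted(original, key=lambda r: r[2])                          # whole rows sorted
bad = [(1, "Alice", "DEM"), (2, "Bob", "IND"), (3, "Carol", "REP")]  # Party sorted alone

print(spot_check(original, good, samples=3))          # []
print(sorted(spot_check(original, bad, samples=3)))   # [1, 2, 3]
```

A random sample can miss a problem on a big sheet, so the number of samples is a trade-off between effort and how confident you want to be.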

Personally, I've always thought that Excel's dialog box warning is unclear. In fact, I think sorting a column without selecting the entire dataset should be almost impossible in Excel. It's one of the most commonly made errors by inexperienced workers.

 

cherokeeprogressive

(24,853 posts)
15. Yes a working copy IS the only safe way. No doubt about that.
Wed Mar 30, 2016, 10:11 AM

My way around sorting a column by itself, though, is to select a single cell within the column rather than the entire column. Excel then assumes you're sorting the entire table, and no dialog box is even displayed.

 

Travis_0004

(5,417 posts)
7. I would hope no serious business or government uses Excel for vital data
Wed Mar 30, 2016, 09:51 AM

Another problem occurs when there's a blank column in Excel. Excel treats the data on either side of that column as unrelated and sorts only one side. I usually put an asterisk in that blank column just so Excel sees all the data as related.
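A rough Python model of why the blank column matters. This is a simplified assumption about how Excel picks the block of data to sort (grow outward from the active column until hitting a completely empty column), not Excel's actual algorithm, and it ignores blank rows entirely; the function names and grid are hypothetical.

```python
def column_is_blank(grid, c):
    """A column counts as blank only if every cell in it is empty."""
    return all(row[c] == "" for row in grid)

def contiguous_columns(grid, active_col):
    """Return (first, last) indices of the run of non-blank columns
    around active_col. Simplified model, not Excel's real algorithm."""
    first = last = active_col
    ncols = len(grid[0])
    while first > 0 and not column_is_blank(grid, first - 1):
        first -= 1
    while last < ncols - 1 and not column_is_blank(grid, last + 1):
        last += 1
    return first, last

# Hypothetical sheet with an empty spacer column at index 2.
grid = [
    ["Name",  "Party", "", "Precinct"],
    ["Alice", "REP",   "", "12"],
    ["Bob",   "DEM",   "", "7"],
]
print(contiguous_columns(grid, 1))  # (0, 1): Precinct is left out of the sort

# The asterisk workaround: fill the spacer column so it is no longer blank.
for row in grid:
    row[2] = "*"
print(contiguous_columns(grid, 1))  # (0, 3): the whole table is one region
```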

Kip Humphrey

(4,753 posts)
9. The registration databases you refer to are entirely different in architecture and functioning
Wed Mar 30, 2016, 09:56 AM

from an Excel spreadsheet. The registration databases in question are SQL relational databases, not flat files. What you describe in terms of sorting an Excel worksheet cannot happen to a relational database.
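To illustrate the difference, here is a minimal sketch using Python's built-in `sqlite3` module with a hypothetical `voters` table (the table and its contents are made up). In SQL, sorting is something a query does to its result: `ORDER BY` returns whole rows in a new order and leaves the stored table as it was. This only shows that sorting for display can't misalign a record; it says nothing about other ways wrong data can get into a database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE voters (id INTEGER PRIMARY KEY, name TEXT, party TEXT)")
conn.executemany(
    "INSERT INTO voters (id, name, party) VALUES (?, ?, ?)",
    [(1, "Alice", "REP"), (2, "Bob", "DEM"), (3, "Carol", "IND")],
)

# ORDER BY reorders whole rows in the query result only.
sorted_rows = conn.execute("SELECT id, name, party FROM voters ORDER BY party").fetchall()
print(sorted_rows)  # [(2, 'Bob', 'DEM'), (3, 'Carol', 'IND'), (1, 'Alice', 'REP')]

# The stored rows are untouched: each name still has its own party.
print(conn.execute("SELECT id, name, party FROM voters ORDER BY id").fetchall())
# [(1, 'Alice', 'REP'), (2, 'Bob', 'DEM'), (3, 'Carol', 'IND')]
```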

 

randome

(34,845 posts)
12. The resulting lists are probably spit out as spreadsheets, though, and then it's easy to screw up.
Wed Mar 30, 2016, 10:01 AM

No one does reporting through SQL Services. Well, almost no one.

Kip Humphrey

(4,753 posts)
13. The system contains incorrect data: the database data is wrong - this has nothing to do with
Wed Mar 30, 2016, 10:03 AM

spreadsheets. Spreadsheets are a red herring.
