Fog Creek Software
Discussion Board




Catching undercover posters

The thread about moving to NZ gave me something to think about.

Undoubtedly there are many people who post under several aliases on this forum. They change their nicknames, they change their mail addresses, and since nowadays everyone has an internet connection at home, they can even have different IP addresses.

So, how can we catch those posters?

By analyzing what they write. Not by subject, but by the style of their writing. (If we analyzed by subject, then the first guy who posted something about outsourcing a couple of months ago would be blamed for all the hundreds of posts on outsourcing that followed.)

Is there any commercial/OSS/whatever package that does this? Read a string, analyze the way it's written, and pick similar ones from a pile of them.

Actually, this would be very interesting...

RP
Monday, June 07, 2004

Yes, writing "style" is just the uneven distribution of usage of components of English grammar.

Why do you care about 'rooting out' these posters?


Monday, June 07, 2004

Yow yow yow! Check it out!

I wanna, I wanna, I wanna... catch these ev1l posters wh0 post in my n4me!!!

Yo, I'm so 31337 b.cause I can katch them by analyzzzing their writing1

George
Monday, June 07, 2004

It was just an academic curiosity. Nobody's been posting under my name, as far as I know. I just thought it could be kind of cool.

RP
Monday, June 07, 2004

:)
(Just kidding)

RP II
Monday, June 07, 2004

OK, I'll confess. This whole site was an elaborate joke played on you, RP. I am ALL the other posters.
Fooled you, didn't I. Nehnehneneneh :-p

Just me (Sir to you)
Monday, June 07, 2004

Maybe someone should post as Joel. If only we could post to the main JoS page too.

I'm sure we could create some interesting articles in Joel's style.

Well, not me obviously, as I'm too much of a perfectionist to even start, due to being too busy shaving and indenting my HTML ;-)

Steve Jones (UK)
Monday, June 07, 2004

You know that comment is going to bring the wrath of God upon us, don't you? At least upon this thread it will.....

RP
Monday, June 07, 2004

It's all the fault of the Kiwis

Moe Hawke
Monday, June 07, 2004

The Kiwi fruits, the kiwi fruits...

RP
Monday, June 07, 2004

To respond to the curiosity question about analyzing writings based on content, not responding to "how can we catch those posters"...

This is something that I have thought about once in a while, and I read an interesting article in a local newspaper several months ago (titled "Much ado about data" or something like that) where they had analyzed the words, grammar, and writing style of all of Shakespeare's works in an attempt to prove or disprove whether some of the works had been written or ghost-written by other authors.

I noticed in this forum some time ago, that I could reasonably guess the author of some replies (assuming the reply was long enough to both not immediately show the author's name at the bottom of the reply and also to have enough content to "analyze"). I guess this interests the "engineering" part of my mind.

Philip Dickerson
Monday, June 07, 2004

I remember an old Atari or XT program that could analyze a text and check how well it matched a previous text, though I think it worked more on sentence length and word complexity.

GD
Monday, June 07, 2004
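
The sentence-length and word-complexity approach GD describes could be sketched roughly as follows. This is only an illustration, not the program he remembers: the function names are made up, word length stands in for "complexity", and the sentence splitting is a crude assumption.

```python
import re

def style_features(text):
    """Return (average sentence length in words, average word length in letters)."""
    # Assumption: a sentence ends at '.', '!' or '?'; real text is messier.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return (0.0, 0.0)
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    return (avg_sentence_len, avg_word_len)

def style_distance(a, b):
    """Euclidean distance between the feature pairs of two texts (0 means identical)."""
    fa, fb = style_features(a), style_features(b)
    return sum((x - y) ** 2 for x, y in zip(fa, fb)) ** 0.5
```

A smaller `style_distance` would suggest a closer match, though with only two features and short posts it is unlikely to say much on its own.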

GD, still remember its name?

RP
Monday, June 07, 2004

Here is a 1992 text about this:

http://www.textfiles.com/bbs/safter

Jack
Monday, June 07, 2004

Apart from a couple of windbags (such as myself), most posts on here I would guess at being too short to conclusively identify the writer. Text analysis generally requires a considerable body of work (and even then remains somewhat iffy).

Dennis Forbes
Monday, June 07, 2004

Yeah! Good point Dennis! You rock!

Joe1 on Software
Monday, June 07, 2004

Quote from the text above:

"Do you guys realize how much socializing is done on the boards? It is all politics, diplomacy, socialization, communciation, and having skill in each of those is really helpful. Yuo know that you all have most likely learned that you can read someone's personality by the way they organize their thoughts on paper? Neat I think."

:)

Jack
Monday, June 07, 2004

RP: Nope, sorry.

GD
Monday, June 07, 2004

People did use to impersonate Joel, which is why he added Fog Creek to his signature.

Stephen Jones
Monday, June 07, 2004

---"where they had analyzed the words, grammar, and writing style of all of Shakespeare's works in an attempt to prove or disprove whether some of the works had been written or ghost-written by other authors."----

This was first done in the 1960s. The Bacon society (an American society devoted to the works of Francis Bacon and to propagating the idea that he was the real author of Shakespeare's plays) commissioned an American university professor to analyze the average number of syllables in a word in the works of Shakespeare, Bacon, Marlowe, Ben Jonson, Fletcher and the Earl of Essex. Three years, and tens of thousands of dollars, later (everything had to be done by hand because it was before the time of computers) he came to the firm conclusion that Shakespeare's plays were written by .... Christopher Marlowe! The author had to switch sponsorship to the Marlowe society to get the book published.

Stephen Jones
Monday, June 07, 2004
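
For the curious, the "average number of syllables in a word" measure from the story above is easy to approximate by machine. A minimal sketch, assuming a syllable can be estimated by counting runs of vowels; the function names are illustrative, and the heuristic overcounts words with a silent "e" (it sees two syllables in "made").

```python
import re

def count_syllables(word):
    """Rough estimate: one syllable per run of consecutive vowels, minimum 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def avg_syllables_per_word(text):
    """Mean of the syllable estimates over all alphabetic words in the text."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    return sum(count_syllables(w) for w in words) / len(words)
```

Comparing this single number across authors is the whole method as described in the post, which may explain why the result was so contestable.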

Yeah, then everyone tried to exploit the weaknesses in the forum software to get their own version of the "Fog Creak Software" line.

Steve Jones (UK)
Monday, June 07, 2004

A better and far more interesting assignment would be to find out *why* people go undercover, given that the net is anonymous and JoS doubly so.

.
Monday, June 07, 2004

Kiwi Fruit is redundant

Moe Hawke
Monday, June 07, 2004

When referring to Kiwi, the fruit, there's no need to say "fruit" because you know it's a fruit.

Same for New Zealanders.

Staunch Kiwi Lover
Monday, June 07, 2004

If you're interested in identifying authors by their style of writing, you might want to look into Donald Foster's Author Unknown: On the Trail of Anonymous. He's an English professor who deals with such methods. It's not a technical book, but fascinating nonetheless, relating anecdotes about establishing authorship of the Unabomber manifesto, Primary Colors, and a bunch of other literary and non-literary works.

http://www.amazon.com/exec/obidos/tg/detail/-/0805063579/qid=1086623677/sr=8-1/ref=pd_ka_1/103-6913033-7939863?v=glance&s=books&n=507846

  -tim

a2800276
Monday, June 07, 2004

Unfortunately, in order for Marlowe to have written more than half of the First Folio plays he'd have to have been alive, which he wasn't, having been killed in a tavern brawl (or executed by the state).

Yes, there is some story about the fight being staged and him being smuggled across to France, living out his days anonymously writing the plays that Shakespeare produced for him.

On a scale of believability this ranks a quite high negative.

Simon Lucy
Monday, June 07, 2004

I wrote them all. In fact I write them all. Guess what I am all

Feynman's Electron
Monday, June 07, 2004

Is a Kiwi a?
a) Flightless bird
b) fruit
c) Human resident of NZ
d) dyslexic spelling of wiki

Miles Archer
Monday, June 07, 2004

There was a DOS application back in the late eighties.

Can't remember the title, but it did stuff like analyze your text and compare it against that of famous authors. Alternatively, you could type stuff in and have it try to guess who the author was.

More fun than anything but a good start.

Tapiwa
Monday, June 07, 2004

Check out the bullshit analyzer. It was released about a year ago by one of the big consultancy firms.

Another area to look in would be writers' resources. A lot of the software does stuff like comparing your text to that of other writers.

Tapiwa
Monday, June 07, 2004

DOT GUY SAYS:
>A better and far more interesting assignment would be to find out *why* people go undercover, given that the net is anonymous and JoS doubly so.

Look who's talking
Monday, June 07, 2004

LOL! I am giving you an opportunity to express your prowess. You must thank me! Rather you.....!

.
Monday, June 07, 2004


No, I am just making a feeble attempt to get your goat!

;-)

Look who's talking
Monday, June 07, 2004

LOL!

.
Monday, June 07, 2004

RP, brother

You have way too much time on your hands. Get a life!

Jason
Monday, June 07, 2004

Dear Simon

If you'd read the book you would have found Marlowe's being dead to be a minor impediment to his having written the plays :)

After all he was a secret agent, and Walsingham, who was Elizabeth's spymaster, was more ruthless and devious than anything in Fleming or Le Carré.

Of course, really Marlowe was abducted by aliens who dictated all of Shakespeare's plays to him.

The reason I don't believe the theory is simple. The author of 'Dr. Faustus' would be quite capable of writing 'Hamlet', but no way could he have produced the kind of junk that Shakespeare was writing at the same time ('Titus Andronicus' and 'Henry VI' for example).

Stephen Jones
Monday, June 07, 2004

I haven't seen the "Much ado about data" article available online anywhere (except at the newspaper's paid subscription online archive), but this is an abstract for the article:
Shakespeare Analyzed: Much ado about data — New computer analyses can identify Shakespeare as well as cardiac problems [...] about a team of researchers at Beth Israel Deaconess Medical Center who have used their software for detecting diagnostic patterns in people’s heart rhythms. [In the literature analysis section ...] they argue that Shakespeare was Shakespeare and not Marlowe (based on what they say are characteristic patterns of word use, especially of high frequency items). The research paper this article reports on is entitled “Information categorization approach to literary authorship disputes”. The authors are: Albert C.-C. Yang, C.-K. Peng, H.-W. Yien and Ary L. Goldberger.

This article http://www.sciencenews.org/articles/20031220/bob8.asp ("Statistical tests are unraveling knotty literary mysteries") discusses some of the recent research in this area (called "stylometry") and also mentions some author-analysis work that was done in the 1960s.

Philip Dickerson
Monday, June 07, 2004
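
The "patterns of word use, especially of high frequency items" idea also gives one possible answer to the original question of picking the most similar text from a pile. The sketch below is not the method from the paper cited above; it is a plain frequency profile compared by cosine similarity, and the word list and function names are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical short list of common function words; a real study would use many more.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "for"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def most_similar(sample, candidates):
    """Return the key in `candidates` (name -> text) whose profile is closest to `sample`."""
    target = profile(sample)
    return max(candidates, key=lambda name: cosine(target, profile(candidates[name])))
```

Function words are used because they depend little on the subject being discussed, which matches the original request to compare by style rather than by topic.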

RP, catching "undercover posters" implies they're doing something wrong. Since just about everyone is anonymous, what's the issue?

anon
Monday, June 07, 2004

I repeat: this was purely an academic exercise. I'm not looking for ghosts here.

Do post anonymously. Let my paranoia be.

RP
Monday, June 07, 2004

There is software out there that does what you are looking for (I believe it is web based) but you have to pay to use it. The software was created to help professors catch plagiarizing college students. Several universities use this software on a subscription basis. I don't have a URL for you -- try Google.

One Programmer's Opinion
Monday, June 07, 2004

The plagiarism detector just matches significant phrases in particular subjects.

It's not going to identify writers discussing generic topics.

By the way, how come plagiarism is condemned in most fields, yet the same thing in software (open source) is praised by much the same people?


Monday, June 07, 2004
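
To illustrate the distinction drawn above: phrase matching of the kind attributed to plagiarism detectors can be sketched as overlap between sets of word sequences. How any commercial detector actually works is not stated in this thread, so the shingle size of three words and the Jaccard score below are assumptions.

```python
import re

def shingles(text, n=3):
    """Set of all runs of n consecutive words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def phrase_overlap(a, b, n=3):
    """Jaccard similarity of the two texts' n-word phrase sets (0.0 to 1.0)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

Two posts by the same person on different topics would share almost no three-word phrases, so a score like this detects copied text rather than a shared author.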

Clearly because all open source developers are code stealing donkeys.

Proven beyond reasonable doubt by the book which has exposed Linux itself as being written by someone other than Linus.

Thank you, Microsoft, for opening our eyes to this terrible theft in such a selfless way.

FullNameRequired
Monday, June 07, 2004

"So, how can we catch those posters?"

Why would you want to?  If you do, you'll only encourage us not to post anonymously.  As the /. FAQ says, there are legit reasons for wanting anonymity.  (Note that Joel could have made this a registration-with-email-and-full-name-required forum, but didn't.)

Wally
Wednesday, June 09, 2004

It is fairly easy to catch regular posters who use a different alias. Text analysis is pretty hard if you are comparing Shakespeare, Marlowe and Bacon because
a) they all wrote at the same time, 400 years ago
b) they used the same vernacular, etc
c) they were all pretty accurate in terms of the grammatical rules, spelling and structure conventions of their time. Teasing them apart is hard.

Teasing you guys apart is NOT a hard problem.

Why ? Because there is a hierarchy of grammatical correctness hereabouts; posters make the same distinctive spelling errors; posters use the same shorthand for short posts; posters use the same structures/whitespace ideas/etc. Short posts are as much of a giveaway as long ones.

On top of all that, a simple writing-style analysis program of the kind that have been around for 20 years gives a distinctive profile too.

It is very hard to be truly anonymous. You cannot hide.

WoodenTongue
Monday, June 14, 2004
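
The giveaways WoodenTongue lists (spelling, shorthand, whitespace) can be checked mechanically even on short posts. A minimal sketch, assuming a handful of hypothetical habit patterns; which quirks are actually distinctive on this board is a guess.

```python
import re

# Hypothetical habit patterns, chosen only for illustration.
HABITS = {
    "space_before_punct": r"\w [?!]",        # e.g. "Why ?"
    "double_space_after_period": r"\.  \S",
    "lowercase_i": r"\bi\b",                 # "i think" instead of "I think"
    "ellipsis": r"\.{3,}",
}

def habit_fingerprint(text):
    """Return the set of habit names whose pattern occurs at least once in the text."""
    return {name for name, pattern in HABITS.items() if re.search(pattern, text)}

def shared_habits(a, b):
    """Habits that appear in both texts."""
    return habit_fingerprint(a) & habit_fingerprint(b)
```

Many shared habits between two aliases would be a hint rather than proof, since common quirks are shared by plenty of unrelated posters.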
