timothy_gray_el34lojg1aih1

308 Messages • 7.1K Points

Monday, May 27th, 2024 6:13 PM

Solved

Automated system incorrectly capitalizing some titles

Occasionally, a film is released with two interchangeable titles linked by the word "or". The most famous example is probably Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1964). Most people reading this will understand that the second part of such a title is an alternative title and should be capitalized accordingly. The automated system, however, cannot detect this nuance. As a result, some IMDb display titles have been erroneously "corrected" so that the first word of the alternative title, usually "the", is lowercased.

Here are some examples of this happening:

The Astronomer's Dream; or, the Man in the Moon (1898)

Princess Nicotine; or, the Smoke Fairy (1909)

The Dover Boys at Pimento University or the Rivals of Roquefort Hall (1942)

Spider Baby or, the Maddest Story Ever Told (1967)

Dealing: Or the Berkeley-to-Boston Forty-Brick Lost-Bag Blues (1972)

Salò, or the 120 Days of Sodom (1975)

Soldier Jack or the Man Who Caught Death in a Sack (1989)

Twin Peaks: Zen, or the Skill to Catch a Killer (1990)

Philophobia: or the Fear of Falling in Love (2019)

Our Last Crusade or the Rise of a New World (2020-)

WeWork: Or the Making and Breaking of a $47 Billion Unicorn (2021)

The Extortionist, or the Revelatory Admissions of a High-Rise Loser (2022)

Scala!!! Or, the Incredibly Strange Rise and Fall of the World's Wildest Cinema and How It Influenced a Mixed-up Generation of Weirdos and Misfits (2023)

Spider Baby, or the Maddest Story Ever Told (2024)

In each of the above instances, "the" is the first word of an alternative title and should be capitalized. A human can figure this out easily, but the automated system only sees a capitalized "The" in the middle of a title, assumes it is an error, and misses the bigger picture.

This issue goes to show the problem with relying on automated systems. There is simply no way with our current technology for an automated system to detect the subtle difference between titles like the ones shown above and titles like The Beginning or the End (1947) and The Wing or The Thigh? (1976), which are not meant to be read as two alternate titles.

Champion • 14.2K Messages • 328.1K Points

4 months ago

I think you are only partly right because some of those titles have been listed with lowercase "the" for much longer than the current autocorrection has been in place.

IMDb's written policy is quite strict, and that is presumably what the autocorrection follows: "English language words which must begin with a lower-case letter (unless at the end of a title) are: a an and as at by for from in of on or the to with."
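To make the quoted rule concrete, here is a minimal Python sketch of how an autocorrection that follows it literally might behave. It is a hypothetical reconstruction, not IMDb's actual code: the name `apply_lowercase_rule`, the way punctuation attached to a word is ignored, and the assumption that the first word of a title is never changed are all assumptions made for illustration.

```python
# Words the quoted IMDb policy says must begin with a lower-case letter,
# unless they are the last word of the title.
LOWERCASE_WORDS = {
    "a", "an", "and", "as", "at", "by", "for", "from",
    "in", "of", "on", "or", "the", "to", "with",
}


def apply_lowercase_rule(title: str) -> str:
    """Hypothetical literal application of the quoted capitalization rule.

    Assumptions (not confirmed by IMDb): the first word is never changed,
    and punctuation attached to a word is ignored when matching it.
    """
    words = title.split(" ")
    result = []
    for i, word in enumerate(words):
        # Strip surrounding punctuation so "Or," still matches "or".
        core = word.strip(".,;:!?\"'")
        is_first = i == 0
        is_last = i == len(words) - 1
        if not is_first and not is_last and core.lower() in LOWERCASE_WORDS:
            # Lowercase only the word itself, keeping its punctuation.
            word = word.replace(core, core.lower(), 1)
        result.append(word)
    return " ".join(result)
```

Given "Salò, Or The 120 Days of Sodom", this returns "Salò, or the 120 Days of Sodom", the form currently displayed. The titles in the first post suggest the real system has further exceptions this sketch does not model; for example, "Or" stays capitalized after the colon in Dealing: Or the Berkeley-to-Boston Forty-Brick Lost-Bag Blues (1972).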

Perhaps there is a need to discuss the policy.

That said, we already have a few examples of things that should probably not be autocorrected, so I do think autocorrection is causing errors.

https://community-imdb.sprinklr.com/conversations/data-issues-policy-discussions/the-shouldnt-always-be-lowercase-in-titles-automatic-fix-in-editor-is-wrong/61d9bb95f5b9527873e6cc59

https://community-imdb.sprinklr.com/conversations/data-issues-policy-discussions/strange-capitalization/6557ba769a86502528eee927

https://community-imdb.sprinklr.com/conversations/data-issues-policy-discussions/the-word-and-is-usually-written-as-and/65693f3c58976622ce9e278f

https://community-imdb.sprinklr.com/conversations/data-issues-policy-discussions/unable-to-fix-capitalization-of-titles-through-system-charlie-ben-podcast-aka-dropping-in-with-charlie-houpert/65f8a896907f9d030710f023


308 Messages • 7.1K Points

@Peter_pbn I guess the problem is not just with the automation, but also with overly rigid rules. I've gotten frustrated with IMDb's policy-makers/enforcers for similar situations in the past.

I may have gotten carried away with the examples I found. Not all of them are as straightforwardly bad as the worst offenders. Salò, or the 120 Days of Sodom (1975), for example, is listed with the lowercase "the" in many resources, not just IMDb. When it comes to some of the others on this list, particularly the first three, the lack of capitalization just looks plain wrong.

Regarding the strict written policy, I think the titles I've highlighted are a special case: in a way, each is simultaneously one title and two separate titles that just happen to be merged together. By this logic, the word "the" is at the start of a title, if not in the most technical sense then in an intuitive, less literal one, and the first word of a title should be capitalized.
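The distinction argued for here can be sketched in Python: if a person supplies the boundaries between the alternative titles (which, as noted above, an automated system cannot reliably infer), the lowercase rule can be applied to each alternative title on its own, so each keeps its capitalized first word. This is a hypothetical sketch; `format_with_alternatives`, its `parts` and `joiner` parameters, and the idea of editors marking the boundaries are assumptions, not an existing IMDb feature.

```python
# Words the quoted IMDb policy says must begin with a lower-case letter.
LOWERCASE_WORDS = {
    "a", "an", "and", "as", "at", "by", "for", "from",
    "in", "of", "on", "or", "the", "to", "with",
}


def _lowercase_minor_words(part: str) -> str:
    """Lowercase listed words, leaving the first and last word untouched."""
    words = part.split(" ")
    for i in range(1, len(words) - 1):
        core = words[i].strip(".,;:!?\"'")
        if core.lower() in LOWERCASE_WORDS:
            words[i] = words[i].replace(core, core.lower(), 1)
    return " ".join(words)


def format_with_alternatives(parts: list[str], joiner: str) -> str:
    """Treat each alternative title separately, then join them.

    The joiner (e.g. " or " or "; or, ") is inserted exactly as given.
    """
    return joiner.join(_lowercase_minor_words(p) for p in parts)
```

For example, `format_with_alternatives(["The Dover Boys At Pimento University", "The Rivals Of Roquefort Hall"], " or ")` returns "The Dover Boys at Pimento University or The Rivals of Roquefort Hall". The design choice is to leave the judgment about where an alternative title begins to a human and apply the mechanical rule only within each part.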

Just look at the title card for The Dover Boys cartoon. It's blatantly obvious that these are supposed to be read as two different versions of the title, not one inexplicably long sentence. They even have separate quotation marks around each. The automated system capitalizes it as if it were a single thought, as if the title were a question asking us to choose between the "Dover Boys at Pimento University" and the "Rivals of Roquefort Hall", rather than a tongue-in-cheek way of giving the film two overly serious-sounding titles. It just looks very silly.

Whether IMDb chooses to acknowledge and fix this issue or just ignore it is out of my hands, I guess. But it's clear to me that they need to either change the rules or change the way the rules are being applied. The way it's set up currently just ain't doing it.


Employee • 5.3K Messages • 55.5K Points

4 months ago

Hi @timothy_gray_el34lojg1aih1 -

These cases are exceptions; usually we follow the regular capitalization rules. If you find any, feel free to correct them through our online form. Of course, if you run into any problems doing so, let me know.

Cheers!

308 Messages • 7.1K Points

@Bethanny I tried doing this before posting; it does not work. The system automatically "corrects" the capitalization before the user can submit.

In my second attempt, I tried just deleting the alternate name altogether, which I assumed would cause it to revert to the original title. This submission came back as "badly formatted."

Employee • 5.3K Messages • 55.5K Points

@timothy_gray_el34lojg1aih1 I will fix the titles manually and let you know when they are done :)

Champion • 14.2K Messages • 328.1K Points

@Bethanny

Not everyone will come here to get you to make edits when the site does not allow them.

Employee • 5.3K Messages • 55.5K Points

@timothy_gray_el34lojg1aih1 Hi! I have fixed the titles above and sent the details to the team in charge so they can consider future improvements.

Cheers!