johnny_m's profile

7 Messages


150 Points

Tuesday, July 26th, 2022 8:09 AM

title.basics.tsv.gz is broken -

The title.basics.tsv.gz dataset is broken in .

It now only includes 3,477,496 titles. It should have almost three times that number.

The data is corrupted after the title "Kneeling for Justice: A San Francisco Memorial for George Floyd". This value is found in titleType. The value in tconst for that record is "ial for George Floyd". 
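A quick way to locate this kind of break in a local copy is to scan for the first row that does not look like a title record. This is only a rough sketch: the function name `find_first_bad_row` is made up for illustration, and it assumes title.basics has 9 tab-separated columns and that every tconst looks like `tt` followed by digits.

```python
import gzip
import re

# Assumption: title.basics has 9 tab-separated columns
# (tconst, titleType, primaryTitle, originalTitle, isAdult,
#  startYear, endYear, runtimeMinutes, genres).
EXPECTED_COLUMNS = 9
TCONST_RE = re.compile(r"^tt\d+$")


def find_first_bad_row(path):
    """Return (line_number, fields) for the first malformed row, or None."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        next(handle, None)  # skip the header line
        for line_number, line in enumerate(handle, start=2):
            fields = line.rstrip("\n").split("\t")
            # A row is suspicious if it has the wrong number of columns
            # or if its first column is not a tconst-like value.
            if len(fields) != EXPECTED_COLUMNS or not TCONST_RE.match(fields[0]):
                return line_number, fields
    return None
```

Calling `find_first_bad_row("title.basics.tsv.gz")` on the downloaded file would return the line number and the split fields of the first row that fails either check, or `None` if every row passes.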

Could someone at IMDb please correct this?

Thank you!

11 Messages


250 Points

2 years ago

Some tconst entries appear to be missing from title.basics.tsv. The entry tt0055928, "Dr. No", was missing for a few days but is present once again. Here are some missing entries of which I am aware, though there could be others as well. These tconst values also have no entries in the title.crew.tsv, title.ratings.tsv, or title.episode.tsv files.


Note: This comment was created from a merged conversation originally titled Some tconst titles missing from titles.basic.tsv

2 Messages


70 Points

2 years ago

Sometime over the last two to three weeks (between files downloaded on 2022-07-10 and 2022-07-24), it seems as if the IMDb datasets available from no longer include some movies.

Download for instance, and try to find the following IMDb ID entries, which were there on July 10 but not on July 24:

tt0044502  Clash by Night (1952)
tt0047573  Them! (1954)
tt0048977  The Bad Seed (1956)
tt0050539  The Incredible Shrinking Man (1957)
tt0053290  Solomon and Sheba (1959)
tt0056700  The Wonderful World of the Brothers Grimm (1962)
tt0057449  The Raven (1963)
tt0060980  The Silencers (1966)
tt0065421  The AristoCats (1970)

The same IMDB-ids seem to have disappeared from as well.

I re-downloaded the files on July 25 and the same entries were still missing.
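For anyone who kept an older download, the comparison between two snapshots can be sketched roughly as below. The helper names `read_tconsts` and `disappeared` are hypothetical, and the sketch assumes the tconst is the first tab-separated column of each file.

```python
import gzip


def read_tconsts(path):
    """Collect the first-column IDs of a TSV file (plain or gzipped)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        next(handle, None)  # skip the header line
        return {
            line.split("\t", 1)[0].rstrip("\n")
            for line in handle
            if line.strip()
        }


def disappeared(old_path, new_path):
    """Return the sorted IDs present in the old snapshot but not the new one."""
    return sorted(read_tconsts(old_path) - read_tconsts(new_path))
```

With two local copies saved under names of your own choosing (for example one from July 10 and one from July 24), `disappeared(old_copy, new_copy)` lists the IDs that dropped out between the two downloads.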

What could explain this?

Note: This comment was created from a merged conversation originally titled IMDB Datasets no longer including some movies?

7 Messages


150 Points

The dataset is broken. It now only includes 3,477,496 titles. It should have almost three times that number.

The data is corrupted after the title "Kneeling for Justice: A San Francisco Memorial for George Floyd". The value in tconst for that record is "ial for George Floyd".

Could someone at IMDb please correct this?

Thank you!

7 Messages


150 Points

2 years ago

The source of the issue might be another record. See tconst = tt14491350. The value in genre contains the value for tconst of another record.
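A heuristic for spotting records like this one is to look for a tconst-like token in any column other than the first. This is a sketch only: `rows_with_embedded_tconst` is an illustrative name, and the pattern (`tt` followed by at least seven digits) is an assumption that could occasionally match a legitimate title.

```python
import re

# Assumption: a tconst looks like "tt" followed by at least seven digits.
EMBEDDED_TCONST_RE = re.compile(r"tt\d{7,}")


def rows_with_embedded_tconst(lines):
    """Yield (tconst, column_index, value) for rows where a later
    column contains something that looks like another record's tconst."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        for index, value in enumerate(fields[1:], start=1):
            if EMBEDDED_TCONST_RE.search(value):
                yield fields[0], index, value
                break  # report each row at most once
```

The function takes any iterable of lines, so an open file handle works directly. Each hit gives the row's own tconst, the zero-based column index where the stray ID was found, and the offending value.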



17.2K Messages


310.3K Points

2 years ago

Hi @johnny_m & @christian_sauve -

Thanks for reporting the issue with the datasets.  I have filed a ticket for the appropriate team to investigate further.  As soon as I have an update on the status I will relay that information here.

Thanks again!



17.2K Messages


310.3K Points

2 years ago

Hi All -

I'm just following up here to confirm that the issue with the 'title.basics.tsv.gz' dataset should now be resolved and the titles should now be included.


11 Messages


250 Points


I'm replying about a smaller issue that still exists with the IMDB datasets available from

Several titles remain missing from various files there.


tt8084176 -- "Mr. Robot"; Season 4, Episode 7; "407 Proxy Authentication Required" is available via the web UI, but is not present in any of title.basics.tsv, title.ratings.tsv, title.crew.tsv, or title.episode.tsv.

tt0562856, tt0811802, tt0811803, tt0811804, tt0562972, and tt0811753 -- "Doctor Who (1963)"; Season 19, Episodes 9-14 are available via the web UI, but are not present in any of title.basics.tsv, title.ratings.tsv, title.crew.tsv, or title.episode.tsv.
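A check like this across several files can be sketched as follows. The function name `presence_by_file` is hypothetical, and the sketch assumes each file keys its rows by tconst in the first tab-separated column.

```python
import gzip


def presence_by_file(wanted_ids, paths):
    """Map each wanted ID to the list of files whose first column contains it."""
    wanted = set(wanted_ids)
    found = {tconst: [] for tconst in wanted}
    for path in paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", encoding="utf-8") as handle:
            for line in handle:
                key = line.split("\t", 1)[0].rstrip("\n")
                # Record each file at most once per ID.
                if key in wanted and path not in found[key]:
                    found[key].append(path)
    return found
```

An ID that maps to an empty list was not found as a first-column key in any of the files passed in.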

On Sat, Jul 23, 2022, I mentioned a longer list of titles that are/were missing in a post titled "Some tconst titles missing from titles.basic.tsv".




17.2K Messages


310.3K Points

Hi @hduston -

Thanks for these additional missing-title reports. I have filed a new ticket for the appropriate team to investigate further.

15 Messages


224 Points

2 years ago

Here is some additional information that could help find the source of the problem, which still persists.

The missing IDs in the TSV files can be as old as the 1923 movie "The Hunchback of Notre Dame" (tt0014142) or François Truffaut's 1970 film "The Wild Child" (tt0064285), and as recent as "Feast" (tt13097910) or "Sinkhole" (tt21953638), both released in 2021.

There are also some TV shows in that list (e.g., "Norman", tt4191702).

What is peculiar is that some IDs can be found in title.akas and title.principals, or even only in name.basics, without appearing in the main title.basics file.

After cross-checking what appears to be the main file, title.basics, against the 4 "sub-files" (name.basics, title.akas, title.crew and title.principals), there seem to be 8,712 inconsistencies (mismatches), which fall into 2 categories:

  • 2,670 IMDb IDs (tconst) correspond to an existing movie/TV show/video/... on the website: 2,130 of them land on a regular page (HTTP 200) and 540 are 302 redirects to another ID.
  • 6,042 do not exist on the website (Not Found / HTTP 404).
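The cross-check and the grouping above can be sketched roughly like this. All names here (`orphan_ids`, `classify_status`, `summarize`) are illustrative, and the sketch assumes the HTTP status for each ID has already been collected separately, without following redirects.

```python
from collections import Counter


def orphan_ids(basics_ids, referenced_ids):
    """IDs referenced by the sub-files that are absent from title.basics."""
    return set(referenced_ids) - set(basics_ids)


def classify_status(status_code):
    """Group an HTTP status code into the categories used in this post."""
    if status_code == 200:
        return "existing page"
    if status_code == 302:
        return "redirect"
    if status_code == 404:
        return "not found"
    return "other"


def summarize(status_by_id):
    """Count how many orphan IDs fall into each category."""
    return Counter(classify_status(code) for code in status_by_id.values())
```

Here `status_by_id` is a plain dictionary from tconst to status code. Anything other than 200, 302 or 404 is counted as "other" so that unexpected responses are not silently folded into one of the main groups.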

Here are some examples of the IDs found (top/bottom five for each group):

- IDs absent from title.basics that correspond to an existing movie/series


total count = 2130

- IDs absent from title.basics that correspond to an existing movie/series after being redirected (302)

tt0014327 -> tt0014325
tt0047941 -> tt0047940
tt0059860 -> tt0059845
tt0088641 -> tt0085111
tt0103358 -> tt0103357
tt21312196 -> tt14604694
tt21336160 -> tt7227442
tt21931516 -> tt21905038
tt21943026 -> tt21926422
tt21944362 -> tt14794336

total count = 540

- IDs found in one of the 4 "sub-files" but not in title.basics, and which return a 404 error on the website


total count = 6042

I can provide the full list if need be.




17.2K Messages


310.3K Points

Hi @Adren -

Thanks for this additional information. I have passed this along to the technical team!



5.3K Messages


55.5K Points

1 year ago

Hello everyone-

This has now been solved.


1 Message


60 Points

2 months ago

I think the problem has re-appeared: title.basics.tsv.gz downloaded July 9, 2024, does not seem to have Oppenheimer (tt15398776) for example, as well as other major movies.
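A simple sanity check on a fresh download, before relying on it, could look like the sketch below. The name `sanity_check` is made up for illustration, and it assumes a gzipped TSV with a header line and the tconst in the first column.

```python
import gzip


def sanity_check(path, must_have):
    """Return (data_row_count, sorted list of must-have IDs not found)."""
    remaining = set(must_have)
    rows = 0
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            rows += 1
            # Tick off each expected ID as soon as it is seen.
            remaining.discard(line.split("\t", 1)[0].rstrip("\n"))
    return rows, sorted(remaining)
```

For example, `sanity_check("title.basics.tsv.gz", ["tt15398776"])` would return the number of data rows together with a list that is empty only if the Oppenheimer entry is present. A row count far below the previous download is a second warning sign.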

15 Messages


224 Points

As of today (July 10, 2024, with 4 files still dated 2024-07-09), the problem seems to be fixed: title.basics.tsv now has more than 109M lines.

But there are still some inconsistencies between the files, with tconst values that are missing from title.basics (see this other ticket).