CCP-Database-Leak

Simple goal: Translate the CCP leaked database into English and make all the data available, unfiltered, for everyone to see.

Note: GitHub has placed a bandwidth cap on the repository's CSV (LFS) files. As a result, we have moved our repository to Codeberg.org.

Input Data:

Original Data Source (removed): https://gitlab.com/shanghai-ccp-member-db/shanghai-ccp-member-db/-/blob/master/shanghai-ccp-member.csv
Mirror: https://git.rip/botayhard/shanghai-ccp-member-db/-/raw/master/shanghai-ccp-member.csv
This repository: https://codeberg.org/StopTheCCP/CCP-Database-Leak/raw/branch/main/Data/shanghai-ccp-member.csv

Pre-Processing:

Place the leaked CSV file in the /Data directory.
Run SplitFileIntoParts.py to split it into 40 separate files of 50,000 lines each.
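
The splitting step can be sketched roughly as follows. This is a minimal illustration, not the contents of SplitFileIntoParts.py: the function name `split_file`, its parameters, and the `part-N.csv` output naming are all assumptions made for the example.

```python
import itertools
from pathlib import Path


def split_file(input_path, output_dir, lines_per_part=50_000):
    """Split a text file into numbered parts of at most lines_per_part lines.

    Returns the list of part paths written, in order.
    """
    input_path = Path(input_path)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    parts = []
    # newline="" keeps the original line endings untouched on read and write.
    with input_path.open("r", encoding="utf-8", newline="") as source:
        for index in itertools.count(1):
            # Read the next block of lines without loading the whole file.
            chunk = list(itertools.islice(source, lines_per_part))
            if not chunk:
                break
            # Hypothetical naming scheme: part-1.csv, part-2.csv, ...
            part_path = output_dir / f"part-{index}.csv"
            with part_path.open("w", encoding="utf-8", newline="") as target:
                target.writelines(chunk)
            parts.append(part_path)
    return parts
```

Because this sketch splits on physical lines, a quoted CSV field that contains a line break would be cut across two parts, and only the first part would carry a header row if the source has one.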

Processing:

Run TranslateCSVPartials.py with python3. You can change the input file range to target specific files to process.

inputFileRange = [*range(1, 41)]  # all files from 1-40
inputFileRange = [1, 2]  # specific files
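
As a rough sketch of how a file range could drive the processing, the example below loops over the selected part numbers and translates every cell of each part. It does not reproduce TranslateCSVPartials.py: the function names, the `part-N.csv` and `part-N.en.csv` file names, and the `translate` callable (any function that maps a string to a string) are assumptions for illustration.

```python
import csv
from pathlib import Path


def translate_part(in_path, out_path, translate):
    """Translate every non-empty cell of one CSV part using the given callable."""
    with open(in_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Empty cells are passed through without calling the translator.
            writer.writerow([translate(cell) if cell else cell for cell in row])


def translate_parts(file_range, data_dir, translate):
    """Process the part files whose numbers appear in file_range.

    Missing parts are skipped. Returns the output paths that were written.
    """
    data_dir = Path(data_dir)
    written = []
    for number in file_range:
        in_path = data_dir / f"part-{number}.csv"  # hypothetical input name
        if not in_path.exists():
            continue
        out_path = data_dir / f"part-{number}.en.csv"  # hypothetical output name
        translate_part(in_path, out_path, translate)
        written.append(out_path)
    return written
```

Note that `range(1, 41)` stops before 41, so `[*range(1, 41)]` covers parts 1 through 40. A call such as `translate_parts([1, 2], "Data", my_translate)` would process only the first two parts, where `my_translate` stands in for whatever translation function is used.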

Post-Processing:

TBD: Reference MergeFiles.py
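
Until this step is documented, one plausible shape for a merge is plain concatenation of the translated parts in numeric order. The sketch below is an assumption about what such a step might look like, not a description of MergeFiles.py; the function name `merge_files` and its parameters are hypothetical.

```python
def merge_files(part_paths, output_path):
    """Concatenate the given files, in the order supplied, into output_path.

    Returns the number of lines written.
    """
    lines_written = 0
    with open(output_path, "w", encoding="utf-8", newline="") as target:
        for part_path in part_paths:
            with open(part_path, "r", encoding="utf-8", newline="") as source:
                for line in source:
                    # Make sure the last line of a part ends with a newline,
                    # so it does not run into the first line of the next part.
                    if not line.endswith(("\n", "\r")):
                        line += "\n"
                    target.write(line)
                    lines_written += 1
    return lines_written
```

The caller is responsible for ordering, for example by building the path list from `range(1, 41)`. Sorting file names alphabetically would place part 10 before part 2.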

Status:

2020-12-18

Migrated repository to Codeberg.org
A few individuals have been running TranslateCSVPartials.py against the Google Translate service, but Google has been heavily throttling our requests. So far, the translation is only about halfway done.
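
A common way to cope with throttling is to retry failed calls with exponential backoff. The sketch below is a generic illustration and is not taken from TranslateCSVPartials.py: `call_with_backoff` and its parameters are hypothetical, and `func` stands in for whatever function performs a single translation request.

```python
import random
import time


def call_with_backoff(func, *args, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call func(*args), retrying with exponential backoff when it raises.

    The delay doubles after each failure, up to max_delay, with a small
    random jitter added. The last failure is re-raised to the caller.
    """
    for attempt in range(max_attempts):
        try:
            return func(*args)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries when several workers run at once.
            sleep(delay + random.uniform(0, delay * 0.1))
```

In practice the `except` clause would be narrowed to the specific error the translation client raises when it is rate limited. The `sleep` parameter exists only so the waiting can be replaced in tests.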