Tutorial For Large Database Sorting / Cleaning

I need help finding the best program or method to clean and manage my database. I'm looking for simple instructions on the best software or method to solve the following problem:

I have 10+ million entries in an Excel file (split across 10 sheets because of the row limit). I want to move them into one program that is good for managing large CSV files, as I have outgrown Excel.

Each day I add roughly 500,000 entries. After importing, I want to remove duplicates across the whole dataset, keeping only the first occurrence. For example, if the initial 10 million entries contain a value such as "EXAMPLE1" in a given column, and one or more of the newly imported entries also contains "EXAMPLE1", I want to remove all of those newly imported duplicates while keeping the original that already existed. The same applies to any duplicates introduced within the new batch itself, not just one value.

Basically the goal is to keep one column free of duplicates, so that each day, as the database grows, I can ensure the data added doesn't match any of the existing data in that column.

Currently I use Excel with Kutools to do this, and the operation takes 24+ hours to complete. I am looking for a faster method.
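For context, the keep-first deduplication described above can be sketched in plain Python without Excel at all. This is a minimal illustration, not a bid deliverable; the function name `append_unique` and the file layout (master CSV plus a daily batch CSV with identical headers) are assumptions for the example:

```python
import csv

def append_unique(master_path, new_path, key_col):
    """Append rows from new_path to master_path, skipping any row whose
    key column value already appears in the master file (or earlier in
    the same new batch). The first occurrence always wins."""
    # Load existing keys into a set; ~10M short strings fit in memory
    # on a typical machine and make each lookup O(1).
    with open(master_path, newline="") as f:
        seen = {row[key_col] for row in csv.DictReader(f)}

    added = 0
    with open(new_path, newline="") as src, \
         open(master_path, "a", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        for row in reader:
            if row[key_col] not in seen:
                seen.add(row[key_col])   # also blocks duplicates within the batch
                writer.writerow(row)
                added += 1
    return added
```

A single pass like this over a 500,000-row daily file should take seconds rather than hours, because it never re-sorts or re-scans the full 10M-row history per lookup.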

Skills: Excel, Data Entry, Database Programming, Data Processing, MySQL


About the Employer:
( 2 reviews ) London, Morocco

Project ID: #25630475

Awarded to:


I'd like to take a crack at your problem. Could you share a sample of your data and tell me which "id" column is meant to stay unique? Looking forward. Balaji

$20 USD in 7 days
(0 Reviews)

6 freelancers are bidding on average $22 for this job


I can write you a PHP script to address your outlined requirements in approximately 3 hours; the script is unlikely to take more than 10 minutes to process that many records. Get in touch to kick-start the …

$48 USD in 1 day
(3 Reviews)

I would propose using MongoDB, MySQL, etc. If you require full-text and fast searching, Elasticsearch might be a good choice too. Please open a quick chat to discuss :)
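The database suggestion above works because a UNIQUE index rejects duplicate keys at insert time, so no separate dedup pass is ever needed. A minimal sketch, using Python's built-in sqlite3 as a stand-in for MySQL (in MySQL the equivalent would be `INSERT IGNORE` against a `UNIQUE KEY`); the table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id TEXT, val TEXT)")
# The UNIQUE index enforces the "no duplicates in this column" rule.
conn.execute("CREATE UNIQUE INDEX idx_id ON entries(id)")

# A daily batch: the second EXAMPLE1 row is a duplicate of the first.
batch = [("EXAMPLE1", "a"), ("EXAMPLE2", "b"), ("EXAMPLE1", "x")]
# INSERT OR IGNORE keeps the first row per key and silently drops the rest.
conn.executemany("INSERT OR IGNORE INTO entries VALUES (?, ?)", batch)
conn.commit()
```

With this setup, each day's 500,000-row import is just a bulk insert; duplicates against the full history are discarded automatically by the index.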

$10 USD in 1 day
(1 Review)

I have done this in the past and have 20 years of experience. I have a team of 11 people. My contact number is 8586898790. Feel free to shoot an email to [login to view URL] Relevant Skills and Experience …

$25 USD in 1 day
(0 Reviews)

Hi sir, I can do this task according to your description with the highest precision and quality. I would like to complete your project error-free and with full delivery. I have extensive experience in your pro…

$10 USD in 7 days
(1 Review)

Dear Sir/Madam, I have excellent skills in Microsoft Office, especially Word and Excel. I have worked on a number of Excel automation macros. Your requirement of combining your data and removing duplicates could be done…

$20 USD in 7 days
(0 Reviews)