Working with very large .CSV files

sumohax0r

what that be like?
Nov 1, 2009
740
41
0
127.0.0.1
Hey everyone,

Hoping to find a simple solution to this problem.

I have about 7 million email leads right now and a ton more coming in but I want to separate them based on what category they are.

Ok so HERE is the problem.

I have a lot of the .CSV files that I am putting together about 80 of them and I can easily sort based on the "Categories" column and break them up into separate files.

but I have a few that are 1.4GB+ and have 1 Million + Rows and I cannot open it in Excel.

Is there anyway to put all these files (All formatted the same) together into one large .CSV file then break it up based on the 10 or so different "Categories" the leads are, or do I have to dump this into a MySQL database to work with them then export?

Any help is greatly appreciated!:1zhelp:
 


@DewChugr yea I'm using Excel 2007, but some of these 1GB+ files have over a million rows some are like 3-4 million rows. :/
 
I have using Excel for CSV files because often - if the last column is an optional/null allowed field Excel will just strip it.

I routinely use Textpad for very manipulating very large text files. It may or may not be a good tool for you but you might want to check it out -and it's free.
 
Use this: PHP: fgetcsv - Manual

Run through every line and add it to a database table. Itll be much easier to manipulate and work with. As you said 1gb is a buttload for a textfile but its nothing for the database.

Itll take a while so increase your time limit using PHP: set_time_limit - Manual before the script runs. We're talking possibly hours to run through the whole set.
 
Use this: PHP: fgetcsv - Manual

Run through every line and add it to a database table. Itll be much easier to manipulate and work with. As you said 1gb is a buttload for a textfile but its nothing for the database.

Itll take a while so increase your time limit using PHP: set_time_limit - Manual before the script runs. We're talking possibly hours to run through the whole set.

Could import with myPhpAdmin.
 
MySQL 5.1 supports CSV as a database engine now.

If you have a large csv file you can just create matching CSV table then put your file in it's place. Once you've done that it's pretty simple to use a CREATE TABLE LIKE 'original_table' to mirror the table structure then insert records into it from the original table based on your criteria.

Suggest you install mySQL GUI tools. Makes it a lot easier to get your head around the DB structure.

Some info the CSV engine here:
MySQL :: A Look at the MySQL CSV Storage Engine
 
'scuse the double post... my edit's went past the time limit....

MySQL 5.1 supports CSV as a database engine now.

If you have a large csv file you can just create matching CSV table then put your file in it's place. Once you've done that it's pretty simple to use a CREATE TABLE LIKE 'original_table' to mirror the table structure then insert records into it from the original table based on your criteria.

Suggest you install mySQL GUI tools. Makes it a lot easier to get your head around the DB structure. Just choose 'create new table' and you get a nice window to enter all your column names. If you don't want to be learning anything about sql just use TEXT for all the column data types. It will do for your purpose.

Some info the CSV engine here:
MySQL :: A Look at the MySQL CSV Storage Engine

With the GUI tools you can directly export query results to CSV as well. So once your data is in there it a simple matter of a select like:
SELECT * FROM my_table WHERE category = 'my_category';
then right click and export.

Honestly it would be simpler to just use a 'LOAD DATA INFILE' statement as it allows you to account for anomolies in the CSV structure but that seems to scare a lot of MySQL noobs...
 
Hey everyone,

Hoping to find a simple solution to this problem.

I have about 7 million email leads right now and a ton more coming in but I want to separate them based on what category they are.

Ok so HERE is the problem.

I have a lot of the .CSV files that I am putting together about 80 of them and I can easily sort based on the "Categories" column and break them up into separate files.

but I have a few that are 1.4GB+ and have 1 Million + Rows and I cannot open it in Excel.

Is there anyway to put all these files (All formatted the same) together into one large .CSV file then break it up based on the 10 or so different "Categories" the leads are, or do I have to dump this into a MySQL database to work with them then export?

Any help is greatly appreciated!:1zhelp:

Using a database is the right way to go with this. If you have to have a windows desktop application I would check out send safe list manager. I used it a long time ago on some pretty large 500m+ files and it handled them pretty easily.
 
Thank you to everyone for the Help! I'm defiantly going to go the MySQL route and mange it that way.

@Xentech - hahahahahaha.... Oh Good try :)