C++ :: Parsing A Large File Into Smaller Units
Dec 16, 2013
I have a large binary file (84GB) that needs to be broken down into smaller file sizes (~1GB to 8GB) for analysis. The binary file is on a 32-bit machine and cannot be copied to another machine for analysis. The PC has Visual Studio 6.0 and is not upgradable. My issue is I'm using the following generic code to create the smaller files.
fseek(file, start, SEEK_SET);
end = start + (variable based on file size);
for (i = start; i < end; i++) {
    byte = fgetc(file);          /* fgetc returns EOF at end of file */
    if (byte == EOF) break;
    fputc(byte, new_file);
}
However, on a 32-bit machine, the iterator can only count up to ~2 billion, which means that I'm unable to copy anything past ~2GB. My original idea was to delete from the large binary file as I read from it so that I can reset the iterator on every read. However, I haven't come across a way to delete binary file entries.
Is there any other way to break down a large binary file into smaller units? Or is there a way to delete binary file entries in sections or per entry?
On a 64-bit machine I could use _fseeki64. I've been reading that some versions of Visual Studio 6.0 are capable of supporting 64-bit numbers, but when using _fseeki64 or _lseeki64 on this machine it's an "undeclared identifier".
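One approach that may sidestep the seek problem entirely is to read the large file strictly from front to back and start a new output file every time a size limit is reached, so that no offset above 2GB is ever passed to fseek. The following is a minimal sketch in standard C++, not tested on Visual Studio 6.0; the function name split_file and the piece naming scheme are made up for illustration, and on that compiler std::to_string and unsigned long long would presumably need replacing (for example with sprintf and unsigned __int64).

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Splits `input_path` into pieces of at most `chunk_bytes` bytes each.
// Pieces are named <prefix>0, <prefix>1, ... (a hypothetical naming scheme).
// Returns the number of pieces written, or -1 on error. The input is read
// strictly sequentially, so no file offset is ever passed to fseek.
int split_file(const std::string& input_path,
               const std::string& prefix,
               unsigned long long chunk_bytes) {
    std::FILE* in = std::fopen(input_path.c_str(), "rb");
    if (!in || chunk_bytes == 0) { if (in) std::fclose(in); return -1; }

    std::vector<char> buffer(1 << 20);   // 1 MiB copy buffer
    std::FILE* out = 0;
    int pieces = 0;
    unsigned long long written = 0;      // bytes in the current piece
    bool failed = false;

    for (;;) {
        // Never read more than what still fits in the current piece.
        unsigned long long room = chunk_bytes - written;
        std::size_t want = room < buffer.size()
                               ? static_cast<std::size_t>(room)
                               : buffer.size();
        std::size_t got = std::fread(&buffer[0], 1, want, in);
        if (got == 0) break;             // end of file or read error

        if (!out) {                      // open the next piece lazily
            std::string name = prefix + std::to_string(pieces);
            out = std::fopen(name.c_str(), "wb");
            if (!out) { failed = true; break; }
            ++pieces;
        }
        if (std::fwrite(&buffer[0], 1, got, out) != got) { failed = true; break; }
        written += got;
        if (written == chunk_bytes) {    // piece is full, start a new one
            std::fclose(out);
            out = 0;
            written = 0;
        }
    }
    if (std::ferror(in)) failed = true;
    if (out) std::fclose(out);
    std::fclose(in);
    return failed ? -1 : pieces;
}
```

Only the per-piece byte counter needs to be 64 bits wide, and only when a single piece may exceed 4GB; the loop itself never needs to know the absolute position in the 84GB file.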
View 7 Replies
Feb 10, 2013
I have a massive text file containing many thousands of directory and file names with / at the root, like so:
/dir/
/dir/dir/
/dir/dir/dir/
/file
/dir/file
/dir/dir/file
/dir/dir/dir/file
I need to parse the file in such a way that I can create a filesystem hierarchy as if I were enumerating files/directories. Ultimately I want to add these to a tree GUI control with everything under its proper node without duplicating anything. It should look roughly like so:
dir
-file
-dir
-file
-dir
-file
I can open the file and add nodes/children to the tree control, but how should I go about doing the actual parsing? How can I find a filename and say "this belongs under this node"? I want to do this as efficiently as possible, even if I must use multiple threads.
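A common way to decide "this belongs under this node" is to split each path on / and walk down from the root one component at a time, creating a child only when it does not exist yet. The sketch below assumes every line starts with / and that a trailing / marks a directory, as in the sample; the names Node, child_of and insert_path are invented for the example and have nothing to do with any particular GUI toolkit.

```cpp
#include <map>
#include <memory>
#include <sstream>
#include <string>

// One node of the hierarchy. Children are keyed by name, so inserting the
// same directory twice reuses the existing node instead of duplicating it.
struct Node {
    bool is_dir = false;
    std::map<std::string, std::unique_ptr<Node>> children;
};

// Returns the child called `name`, creating it the first time it is seen.
Node* child_of(Node& parent, const std::string& name) {
    std::unique_ptr<Node>& slot = parent.children[name];
    if (!slot) slot.reset(new Node());
    return slot.get();
}

// Inserts one line such as "/dir/dir/file" or "/dir/" under `root`.
void insert_path(Node& root, const std::string& path) {
    bool ends_with_slash = !path.empty() && path[path.size() - 1] == '/';
    std::istringstream parts(path);
    std::string name;
    Node* current = &root;
    while (std::getline(parts, name, '/')) {
        if (name.empty()) continue;      // skips the empty piece before the leading "/"
        current->is_dir = true;          // anything that has a child is a directory
        current = child_of(*current, name);
    }
    if (ends_with_slash) current->is_dir = true;
}
```

Once the whole file has been inserted, a single recursive walk over the children maps can add the nodes to the tree control in sorted order. Each lookup is logarithmic in the number of siblings, which is likely fast enough for many thousands of lines without any threading.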
View 1 Replies
May 30, 2013
So I'm attempting to write a program that will parse through a large file (genome sequences) and I'm basically wondering what options I should consider if I wanted to either:
a) store the entire genome in memory and then parse through it
b) parse through a file in small portions
If I go with "a", should I just read a file into a vector and then parse through it? And if I go with "b" would I just use an input/output stream?
View 5 Replies
Jun 23, 2014
It will not run and I'm not sure why. I have a couple of errors that I don't understand.
Here is my code.
//Reads input from user to determine different discounts by number of units sold
#include <iostream>
#include <string>
using namespace std;
int main() {
//Declaration and Initialization of variables
int quantity;
double discount,price = 99.00,totalCost;
[Code] ....
View 1 Replies
Apr 11, 2014
I'm parsing an XML file full of payslips and using the data in another application. I've got it all working but I suspect it isn't the most elegant piece of code. I run through the XML file finding a series of "Text" attributes/elements and then I run through it again finding a series of "Field" attributes/elements. Here is a sample of the code:
For getting the "Text" fields :
XNamespace ns = "urn:crystal-reports:schemas:report-detail";
//
// Get all the Text attributes & Elements
//
foreach (XElement xtxt in xdoc.Descendants(ns + "Text"))
[Code]...
This works fine: I extract all the data I'm interested in and go on to do my thing with it. However, I really need to know when each record ends. I was doing that by looking for "Text24" in the text fields and "EeRef2" in the field fields, which wasn't very elegant in the first place. Then a "Text16" was added to the end of each record, which was fine because I could just look for "Text16", but now it's apparent that "Text16" isn't always there. I've got it all working for now, but I'd prefer to process one record at a time, i.e. extract all the "Text" and "Field" values for one record, do whatever I need to do with it, update the XML file to indicate this progress (if possible) and then move on to the next record. I've attached a sample of the XML, but basically it has the following structure:
<Details>
<Section>
<Text></Text>
"
<Field></Field>
[Code]...
So a record is everything between the first <Details> and the last </Details> with two <Details> and two </Details> in between.
View 2 Replies
Aug 26, 2014
Requirements in filtering the text file.
1. first my professor required me NOT to change the MAIN function(because he made it)
2. I have to make 3 getlogs() STRING FUNCTIONS:
a. string getlogs(); - accepts no parameters, SHOWS ALL THE CONTENTS OF TEXT FILE
b. string getLogs(const string & a); - accepts 1 parameter -SHOWS ONLY THE LINE WHICH CONTAINS THE SPECIFIED DATE FROM MAIN FUNCTION which is "2014-08-01"
c. string getLogs(const string & b, const string & c); - accepts 2 parameters, SHOWS ONLY THE LINES FROM THE DATE START to DATE END specified at THE MAIN FUNCTION which is date start-"2014-08-01";DateEnd = "2014-08-10";
3. all COUT should be in the MAIN FUNCTION
TEXTFILE CONTAINS:
2014-08-01 06:13:14,Name,4.5,CustomUnit,CustomType
2014-08-02 06:13:14,Name,4.5,CustomUnit,CustomType
2014-08-03 06:13:14,Name,4.5,CustomUnit,CustomType
2014-08-04 06:13:14,Name,4.5,CustomUnit,CustomType
[Code] .....
My code so far:
//MAIN
#include <iostream>
#include <string>
#include <fstream>
#include <dirent.h>
[Code].....
View 1 Replies
Mar 23, 2013
We have developed an MDI application on a 17 inch monitor in VC++ 2010 MFC, but now management wants the application to run on a 10 inch laptop.
The application does not fit; it runs off the screen. How can we make the application fit on a smaller screen?
View 1 Replies
Apr 19, 2013
I have a std::map<int, foo>
What's the ideal way to get an iterator to the item that has the largest key (int) smaller than a given value?
basically, the item before upper_bound(). I can use upper_bound() and then decrement, but it needs special cases for both end() and begin(), and in the case of end() I'm not sure how I get it to the last item in the map, afaik, we're not allowed to decrement end().
Code:
auto it = mymap.upper_bound(x);
if (it==mymap.begin()) // first item in the map is already too large. reject
NotFound();
else if (it==mymap.end())
[Code] .....
// here it points to largest item smaller than x.
I can iterate over the entire map and do a compare, but then I pretty much lose the benefit of the binary search.
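For what it's worth, std::map iterators are bidirectional, so decrementing end() is well defined whenever the map is not empty, and then yields the last element. That leaves begin() as the only special case. The sketch below uses lower_bound rather than upper_bound, on the assumption that "smaller than" means strictly smaller; the helper name largest_below is made up.

```cpp
#include <map>

// Returns an iterator to the element with the largest key strictly less
// than `x`, or m.end() if every key is >= x (or the map is empty).
template <typename V>
typename std::map<int, V>::const_iterator
largest_below(const std::map<int, V>& m, int x) {
    // lower_bound gives the first key >= x; its predecessor, if any,
    // is the largest key < x.
    typename std::map<int, V>::const_iterator it = m.lower_bound(x);
    if (it == m.begin()) return m.end();   // nothing smaller (covers the empty map)
    --it;                                   // fine even when it == m.end()
    return it;
}
```

With upper_bound in place of lower_bound the same two lines would return the largest key less than or equal to x instead. Either way the lookup stays logarithmic.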
View 2 Replies
Mar 26, 2015
I have this file that I would like to read into a multidimensional array in C#. If I take the first set of lines as a first example, I would like the print_r to look something like this:
Array (
[SiiNunit] => Array (
[0] => Array (
[nameless.2BB8.4FB8] => Array
[0] => Array (
[bank] = _nameless.2917.43B0
[player] = _nameless.2813.6928
[companies] = 312
[code].....
This (as I expected it to) wrote each line of the file into a 1D array; however, I would like it to write to a multidimensional array.
View 7 Replies
Aug 10, 2014
If I do this:
void* testPtr = malloc ( 1000 );
And then this:
testPtr = realloc ( testPtr, 500 );
Will the realloc just reduce the allocated size and keep the same pointer, or is there a chance of it finding another place for that allocation (meaning that it will expensively move the memory to another location)?
I am trying to create efficient programs by making my dynamic allocations as light on resources as possible during runtime.
View 2 Replies
Dec 2, 2013
Are there any examples of how to parse a matrix:
Code:
const string ABC =
    "  A B C D\n"
    "A 1 -1 2 14\n"
    "B 0 -2 -4 8\n"
    "C 6 2 2 3\n";
So if I have it as a string stream and then loop through each line like this:
Code:
istringstream in(ABC);
for (string line; getline(in, line); ) {
    vector<char> vec(line.begin(), line.end());
    for (size_t i = 0; i < vec.size(); i++)
        cout << vec[i] << "\n";
}
I get my strings chopped into characters. But how do I chop them into "meaningful" tokens, so that -1 is not split into - and 1? Is there a quick way to do that?
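Since the fields are separated by whitespace, one option is to wrap each line in a second string stream and extract with >>, which reads one whitespace-delimited token at a time and therefore keeps -1 and 14 whole. A minimal sketch, with the function name tokenize_lines invented for the example:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits each line of `text` on whitespace, so "-1" and "14" stay whole.
// Blank lines are skipped.
std::vector<std::vector<std::string> > tokenize_lines(const std::string& text) {
    std::vector<std::vector<std::string> > rows;
    std::istringstream in(text);
    for (std::string line; std::getline(in, line); ) {
        std::istringstream fields(line);
        std::vector<std::string> row;
        for (std::string token; fields >> token; ) row.push_back(token);
        if (!row.empty()) rows.push_back(row);
    }
    return rows;
}
```

The header row and the first token of each data row come back as labels, and the remaining tokens can be converted with std::stoi, or extracted directly into an int when the column is known to be numeric.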
View 3 Replies
Sep 20, 2013
I have been attempting to store mathematical functions in a file by parsing them into a linked list with a variable sized char ** array as my storage device. I have run into problems with the memory management details. The program crashes before output is flushed to the console, so printf() wasn't a debugging option. Neither is my actual debugger, since it seems to get a SIGTRAP every time. I have my warnings turned all the way up, but no errors or warnings are appearing. The part I know works is the actual code that opens the file and gets a line from the file. The two functions that implement the linked list are most likely where the problem lies. My current attempt is basically to store the size of the dynamic array in the structure and keep resizing it until there are no more tokens. Then I will store the number of elements of the array in the structure and move on to the next node.
Here is my text file I use :
Code:
sqrt( 25 ); pow( 6 ); sin( 2 );
pow( 4 ); tan( pow( 2 ) );
Main.c :
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct file_data
[Code]...
View 7 Replies
Dec 8, 2014
I'm parsing a text file, and I'd like to detect when a certain Compilation Condition - i.e. #ifdef - begins. The challenge is, that the condition can take any of the following patterns:
#ifdef (FLAG)
#if defined (FLAG)
#if (defined (FLAG))
(And perhaps I missed more)
I'd of course need to treat them all the same, as they are indeed the same. How would you know to treat them all the same?
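One way to treat the variants alike is to reduce each line to the flag name with a single regular expression that tolerates optional whitespace and parentheses. The sketch below uses std::regex from C++11; the function name is_ifdef is made up, compound conditions and #ifndef are deliberately rejected, and parentheses are not checked for balance, so this is a heuristic rather than a real preprocessor.

```cpp
#include <regex>
#include <string>

// If `line` is a simple "is FLAG defined" directive, stores the flag name
// in `flag` and returns true. Accepts "#ifdef FLAG", "#ifdef (FLAG)",
// "#if defined FLAG", "#if defined (FLAG)" and "#if (defined (FLAG))".
// Compound conditions (&&, ||, !) and #ifndef are not matched.
bool is_ifdef(const std::string& line, std::string& flag) {
    static const std::regex pattern(
        R"re(\s*#\s*(?:ifdef|if(?:\s+|\s*\(\s*)defined)(?:\s+|\s*\(\s*)(\w+)[\s)]*)re");
    std::smatch m;
    if (!std::regex_match(line, m, pattern)) return false;
    flag = m[1].str();   // the only capture group is the flag name
    return true;
}
```

A fully general answer would have to evaluate the #if expression, because something like #if defined(A) && !defined(B) cannot be reduced to a single flag.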
View 2 Replies
Jan 4, 2014
I am parsing a binary data file by casting a buffer to a struct. It seems to work really well apart from this one double which is always being accessed two bytes off, despite being in the correct place.
Code:
typedef struct InvoiceRow {
uint INVOICE_NUMBER;
...
double GROSS;
...
char VAT_NUMBER[31];
} InvoiceRow;
If I attempt to print GROSS using printf("%f", row->GROSS) I get 0.0000. However, if I change the type of GROSS to char[8] and then use the following code, I am presented with the correct number...
Code:
typedef struct example {
double d;
}
example;
example *blah = (example*)row->GROSS;
printf("%f", blah->d);
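A symptom like this is often caused by the compiler inserting padding in front of the double so that it is aligned, while the file stores the fields back to back; whether that is what happens here cannot be confirmed from the elided struct. A layout-independent alternative is to copy each field out of the raw buffer at its known file offset with memcpy. The helper name read_double_at and the offsets used with it are hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Reads a double stored `offset` bytes into `buffer`. memcpy has no
// alignment requirement, so this works even when the file places the value
// at an offset where a struct member of type double would not be placed.
// Assumes the file uses the same byte order and double format as the host.
double read_double_at(const unsigned char* buffer, std::size_t offset) {
    double value;
    std::memcpy(&value, buffer + offset, sizeof value);
    return value;
}
```

Printing offsetof(InvoiceRow, GROSS) and comparing it with the field's position in the file format would show quickly whether padding is the culprit.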
View 7 Replies
Nov 5, 2014
I am searching for an API that can give me the thumbnails (with large icons) of all folders on all drives in Windows 7. Like
c:/--->Folder1,Folder2.....FolderN
d:/--->Folder1,Folder2.....FolderN
.
.
n:/--->Folder1,Folder2.....FolderN
How can I get the associated thumbnail (with large icons) of each folder on any of the drives?
View 5 Replies
Feb 21, 2015
I have a specific byte (that is unsigned char) array, that I need to find in a very big file (2GB or so), currently I'm using:
size_t fsFind(FILE* device, byte* data, size_t size) {
int SIZE = (1024 > size) ? 1024 : size;
byte buffer[SIZE];
int pos = 0;
int loc = ftell(device);
[Code] ....
This seems to find the proper result on first use, but on subsequent searches it seems to find invalid positions. Is there something I'm doing wrong, or is there a better way?
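Because the posted function is truncated, the cause of the bad positions cannot be pinned down here. Two things commonly go wrong with chunked searches, though: a match that straddles two chunks is missed, and the position is tracked in an int or taken from ftell after the stream has already read past the match. The sketch below is one way to handle both; the name find_bytes and the 64 KiB chunk size are arbitrary choices for illustration.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Searches `file` from its current position for the first occurrence of
// `pattern`. Returns the offset of the match relative to where the search
// started, or -1 if the pattern is not found.
long long find_bytes(std::FILE* file, const std::vector<unsigned char>& pattern) {
    if (pattern.empty()) return -1;
    const std::size_t chunk = 64 * 1024;
    std::vector<unsigned char> block(chunk);
    std::vector<unsigned char> buf;   // carried-over tail plus newly read data
    long long buf_start = 0;          // offset of buf[0] relative to the start

    for (;;) {
        std::size_t got = std::fread(&block[0], 1, chunk, file);
        if (got == 0) return -1;      // end of file (or read error)
        buf.insert(buf.end(), block.begin(), block.begin() + got);

        std::vector<unsigned char>::iterator hit =
            std::search(buf.begin(), buf.end(), pattern.begin(), pattern.end());
        if (hit != buf.end()) return buf_start + (hit - buf.begin());

        // Keep the last pattern.size() - 1 bytes, because a match could
        // straddle the boundary between this chunk and the next one.
        std::size_t keep = std::min(buf.size(), pattern.size() - 1);
        buf_start += static_cast<long long>(buf.size() - keep);
        buf.erase(buf.begin(), buf.end() - keep);
    }
}
```

After a hit, the stream position is wherever the last fread ended, not at the match, so a follow-up search has to seek to the desired restart point first.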
View 6 Replies
Dec 10, 2013
I need a large string vector for file content manipulation.
v1.max_size() gives me 153 391 689.
But if I make a test by looping a vector pushing back strings it crashes around 16 - 26 000 000 though I still have a lot of ram (1GB left).
How come and why isn't vector size limited by ram instead?
View 11 Replies
Jan 1, 2014
I am currently working on a problem in which a C program is to be made that shows a large text file in parts.
For example: if the file contains 200 lines, 50 lines are shown on the first page and the user is asked to press any key to move to the next page until EOF is reached. The user is allowed to return to the previous page as well, and this is a very complicated task for me. I tried to move the cursor to a specific position using fseek etc., but the paging doesn't stop and reaches the end quickly.
View 1 Replies
Nov 7, 2014
I am writing a spell checker for an exercise for a class I am taking online. I have to load a huge text file of 143092 words into a hash table.
The program segfaults on word number 63197. Is there a way to debug this in gdb without having to step through the function 63197 times?
View 3 Replies
Apr 1, 2015
I'm trying to read and store several students' information so that I can have an options menu where I can enter a student number and the program prints all the information stored about that student. The way I have it set up now doesn't work for this, because all info is reinitialized to stud1. Is there another way to store this info other than defining stud1, stud2, ..., stud200?
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
struct student_info {
char first[20];
char last[20];
[Code]....
View 1 Replies
Oct 1, 2012
So I'm trying to parse a string into an IP address, but I have a problem: the IPAddress.Parse method only seems to work for IPv4 addresses. If I use IPAddress.Parse on my public (remote) IP it throws an exception, but on my local IPv4 address it doesn't. How do I parse any IP address that the user inputs as a string into an IPAddress?
View 5 Replies
View Related
Sep 10, 2014
I have been here for almost 3 months looking for answers to my C++ problems. Here is the kind of code I mean:
cout << "Enter value of x: " << endl; //Let's say 5.
cin >> x;
cout << "Enter equation: "; //Let's say x+1
cin >> equation;
Then the program should work out that the character "x" has an initial value of 5. I already have the parser for the equation operators (+, -, *, /). This is the only thing lacking. Is there some type of function that I missed?
View 6 Replies
Nov 3, 2013
I have an input file that contains any number of lines. Each line will follow the same structure. There will be 5 fields, separated by 4 commas. Fields 1 and 3 will be single characters, fields 2,4,5 will be integers. Example:
<A, 123, B, 456, 789>
I need to iterate over each line, parse out the fields, and capture each field value into a separate variables.
header file:
Code:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>
[Code]....
I've used similar code to this before, but this time I'm getting the dreaded pointer from integer error on my strtok lines:
warning: passing argument 2 of 'strtok' makes pointer from integer without a cast
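That warning typically appears when the second argument to strtok is a character literal such as ',' instead of a string such as ","; the elided code would have to be checked to confirm that this is the cause. For a fixed five-field layout, sscanf can also pull out all the values in one call without any tokenizing. The struct and field names in this sketch are made up.

```cpp
#include <cstdio>

// Holds one parsed record; the field names are invented for this sketch.
struct Record {
    char c1;
    int  n2;
    char c3;
    int  n4;
    int  n5;
};

// Parses a line such as "<A, 123, B, 456, 789>". Returns true on success.
bool parse_record(const char* line, Record& out) {
    // A space in a scanf format skips any amount of whitespace, so the
    // format tolerates blanks around '<' and around each comma.
    int matched = std::sscanf(line, " < %c , %d , %c , %d , %d",
                              &out.c1, &out.n2, &out.c3, &out.n4, &out.n5);
    return matched == 5;   // sscanf reports how many fields it converted
}
```

Checking the return value against 5 catches short or malformed lines, which strtok-based code has to detect by hand.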
View 2 Replies
Mar 14, 2014
Suppose I have read a line from an ASCII file with fgets(). Now I want to parse the line, which looks something like this:
Code:
# John Q. Public et al. 2014, to be submitted
The name, "John Q. Public", is what I want. However, the name can be anything, consisting of 1 or more tokens separated by spaces. It could be "John", or "John Public", or "Thurston Howell the 3rd", etc. Basically, I need to get the entire substring between the first hash mark and the "et al" in the line. I tried this:
Code:
sscanf(line,"# %s et al.",name);
But I can only get the first token (which, in this case, is "John").
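The %s conversion stops at the first whitespace character, which is why only "John" comes back. Since the name is delimited by the hash mark on one side and " et al" on the other, searching for those two markers and taking what lies between them is one straightforward alternative. A minimal sketch using std::string, with the function name extract_name invented for the example:

```cpp
#include <string>

// Returns the text between the first '#' and the first " et al" after it,
// with surrounding spaces trimmed, or an empty string if the line does not
// have that form.
std::string extract_name(const std::string& line) {
    std::string::size_type hash = line.find('#');
    if (hash == std::string::npos) return "";
    std::string::size_type end = line.find(" et al", hash);
    if (end == std::string::npos) return "";
    std::string::size_type begin = line.find_first_not_of(' ', hash + 1);
    if (begin == std::string::npos || begin >= end) return "";
    while (end > begin && line[end - 1] == ' ') --end;   // trim trailing spaces
    return line.substr(begin, end - begin);
}
```

In plain C the same idea works with strchr and strstr followed by a bounded copy into the name buffer.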
View 2 Replies
May 19, 2013
I have to make a C++ program in which, using an algorithm, I have to encode the text from one file and write it to another file. The input should look like this: "code forCoding.txt toBeWritten.txt", or like this: "decode toBeReadFor.txt toBeWrittenIn". I have done everything except one thing: the assignment says I have to be able to accept input parameters.
How should I write this? I read [URL] ....., but I still don't get it. The input of my program has to have 3 strings, so I guess argc should be 3, but I don't really understand it. What should I have in my main for parsing these command line parameters?
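One detail that often causes confusion: argv[0] holds the program name, so three user-supplied strings mean argc is 4, with the parameters in argv[1] to argv[3]. The sketch below keeps the checking in a small helper so that main stays short; the names Options and parse_args are made up for illustration.

```cpp
#include <string>

// The three user-supplied parameters; the names are invented for this sketch.
struct Options {
    bool decode;
    std::string input;
    std::string output;
};

// Fills `opts` from the arguments that main() receives. argv[0] is the
// program name, so three user parameters mean argc == 4.
// Returns false on bad usage.
bool parse_args(int argc, const char* const argv[], Options& opts) {
    if (argc != 4) return false;
    std::string mode = argv[1];
    if (mode == "code") opts.decode = false;
    else if (mode == "decode") opts.decode = true;
    else return false;
    opts.input = argv[2];
    opts.output = argv[3];
    return true;
}
```

In a program declared as int main(int argc, char* argv[]), the call would be parse_args(argc, argv, opts), followed by printing a usage message and returning a nonzero exit code when it returns false.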
View 4 Replies
Nov 30, 2013
For a rather complex and strange reason that I won't explain right now, I need to have this going on in my program.
class FVF{
private:
vector<vector<float>> data; //Contains fvf data for Direct3D stuff
public:
[Code].....
The FVF allows this Model3D class to also be compatible with file handling methods I've got, but here's the problem. D3D buffers require an array to feed them the information, and I know that for a single dimension of vector I can use vec.data(), but how do I do this for multiple dimensions?
I think the best idea I've got so far is to set the vector within the Model3D class as a pointer, then I can union it with a float pointer... Once I can guarantee the information is correct and complete, manually transfer the contents of the vectors into the float pointer. (The union is to reduce memory needed instead of having the data repeated in vectors and arrays.)
How could I just pass these as arrays?
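Each inner vector owns a separate allocation, so the outer vector's data() points at vector objects rather than at one run of floats, and no cast or union can make it look like a flat array. The usual workaround is to copy the rows into a single contiguous vector just before the upload. A minimal sketch, with the helper name flatten made up:

```cpp
#include <cstddef>
#include <vector>

// Copies every row of `rows` into one contiguous block, row after row.
// The result's data() pointer can be passed wherever a float array is needed.
std::vector<float> flatten(const std::vector<std::vector<float> >& rows) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < rows.size(); ++i) total += rows[i].size();

    std::vector<float> flat;
    flat.reserve(total);                 // one allocation for the whole copy
    for (std::size_t i = 0; i < rows.size(); ++i)
        flat.insert(flat.end(), rows[i].begin(), rows[i].end());
    return flat;
}
```

If the duplicate copy is a concern, an alternative design is to store the data in a single std::vector<float> from the start and address element (row, col) as row * stride + col, which avoids the copy altogether.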
View 4 Replies