C :: Read Enormous Binary Files (10-100GB) And Parse Their Contents Bit At A Time
Dec 5, 2013
I am trying to read enormous binary files (10-100GB) and parse their contents a bit at a time. As part of the process I need to get the size of the file in bytes. The simple solution
Code: fseek(file,0,SEEK_END);
size=ftell(file);
fails because the file size overflows the long int type returned by ftell. I need a long long int.
Is there a reasonably efficient way to do this? The good news is that it only needs to be done once. I suppose I could read it one character at a time until I hit the end and keep count, but that just seems inelegant...
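One possible approach, assuming a POSIX system: `fseeko` and `ftello` work with `off_t`, which is 64 bits wide when `_FILE_OFFSET_BITS` is 64, so sizes beyond the range of `long` fit. The helper name `file_size_bytes` is made up for this sketch. Windows compilers offer `_fseeki64` and `_ftelli64` for the same purpose.

```c
#define _FILE_OFFSET_BITS 64      /* request a 64-bit off_t on 32-bit POSIX systems */
#define _POSIX_C_SOURCE 200809L   /* make fseeko/ftello visible in strict modes */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: returns the size of an open file in bytes,
   or -1 on error. The stream is rewound to the start afterwards. */
long long file_size_bytes(FILE *file)
{
    off_t size;

    assert(file != NULL);
    if (fseeko(file, 0, SEEK_END) != 0)
        return -1;
    size = ftello(file);
    if (size < 0)
        return -1;
    rewind(file);                 /* go back to the start for later reads */
    return (long long)size;
}
```

Calling `stat()` and reading `st_size` is another option when only the path is known and the file does not need to be open.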
I'm trying to do file compression/decompression in C and I need to handle one bit at a time.
I currently try to do:
unsigned char byte = fgetc(fptr);
and later
byte >>= 1;
but the problem is that I have to use the first bit of the byte and then treat the next 8 bits as one byte. The byte usage keeps shifting over in this way. It's probably quite clear that I'm a bit lost.
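A common way to handle this is a small bit reader that keeps the current byte and a count of unread bits, so the caller never has to track byte boundaries. This is a minimal sketch, assuming bits are consumed most-significant first; the names `BitReader`, `read_bit` and `read_bits` are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bit reader: hands out bits most-significant first. */
typedef struct {
    FILE *fp;
    int current;     /* byte currently being consumed */
    int bits_left;   /* unread bits remaining in current */
} BitReader;

void bit_reader_init(BitReader *r, FILE *fp)
{
    r->fp = fp;
    r->current = 0;
    r->bits_left = 0;
}

/* Returns 0 or 1, or -1 at end of file. */
int read_bit(BitReader *r)
{
    if (r->bits_left == 0) {
        r->current = fgetc(r->fp);
        if (r->current == EOF)
            return -1;
        r->bits_left = 8;
    }
    r->bits_left--;
    return (r->current >> r->bits_left) & 1;
}

/* Reads count bits (1..16) as one value, or returns -1 at end of file.
   The bits may straddle a byte boundary. */
long read_bits(BitReader *r, int count)
{
    long value = 0;

    assert(count >= 1 && count <= 16);
    while (count-- > 0) {
        int bit = read_bit(r);
        if (bit < 0)
            return -1;
        value = (value << 1) | bit;
    }
    return value;
}
```

With this, "use one bit, then treat the next 8 bits as a byte" becomes `read_bit(&r)` followed by `read_bits(&r, 8)`.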
I am working on a program that reads in a text file and parses it to load data objects into their appropriate classes. I have a class that loads this file and parses its contents. I am at the point where I would like to support both C and C++ style comments in this text file. My current class has a constructor that takes a const std::string for its file name. Upon construction, a while loop calls the member function getNextLine(); as long as it returns true, the loader object then calls parseGui(). All of this works correctly so far. The function getNextLine() looks like this:
// ----------------------------------------------------------------------------
// getNextLine()
// Returns True If It Got A Line Of Text (Could Be Blank If It Is All Commented Out Or
// If It Truly Was A Blank Line). False Is Returned When There Is No More Data In The File
bool GuiLoader::getNextLine() {
    if ( !_file.readLine( _strLine ) ) {
        return false;
[Code] ....
As you can see, this getNextLine() method reads in a single line of text and saves it into its member variable, which is a std::string. It increments the line number, then a utility function trims leading and trailing white space. Next, the class calls a member function to remove comments. This is where I need to parse the string to look for comments: either "//" C++ style line comments or "/* ... */" C style block comments that can span multiple lines. I have tried many different ways of doing this, and I am stuck on the C style comments. A note to consider: the code is designed to read one line of text at a time from the file into a string variable, so it is not practical for me to read in the complete file and do a pre-parse scan.
This is what I have so far; however, there are still cases where this code fails.
One example is when it encounters a single '/' anywhere on a line of text: it goes into an infinite loop. I am not sure how to skip this lone '/' and advance the index to look for a possible next '/' that belongs to a '//' or a '/*'. I know that this is fairly trivial, but for some reason I am having writer's block.
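One way to avoid the infinite loop is to always advance the index, and to carry an "inside a block comment" flag from one line to the next. Below is a minimal sketch in C rather than C++; `strip_comments` is a hypothetical function, and it assumes comment markers never appear inside quoted strings.

```c
#include <assert.h>
#include <stddef.h>

/* Removes comments from line in place and returns the new length.
   *in_block carries the "inside a block comment" state across lines.
   Assumption: comment markers inside string literals are not handled. */
size_t strip_comments(char *line, int *in_block)
{
    size_t read = 0, write = 0;

    assert(line != NULL && in_block != NULL);
    while (line[read] != '\0') {
        if (*in_block) {
            if (line[read] == '*' && line[read + 1] == '/') {
                *in_block = 0;            /* block comment ends here */
                read += 2;
            } else {
                read++;                   /* skip commented-out character */
            }
        } else if (line[read] == '/' && line[read + 1] == '/') {
            break;                        /* rest of the line is a comment */
        } else if (line[read] == '/' && line[read + 1] == '*') {
            *in_block = 1;                /* block comment starts here */
            read += 2;
        } else {
            line[write++] = line[read++]; /* ordinary character, lone '/' included */
        }
    }
    line[write] = '\0';
    return write;
}
```

Because every branch moves `read` forward or leaves the loop, a lone '/' is copied like any other character. The same flag could live as a member variable next to the line buffer in a loader class.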
I'm trying to read a binary file. A binary file consists of bytes (ASCII characters), and the first position is position 0 (zero).
I'm trying to read just some values from an ICO file:
- the 3rd value is in the 4th-1 position (number of icons) (see the table: [URL] ....);
- the width is at the (numberoficons*16) + 4 position (16 is the Entries structure size);
- the height is at the (numberoficons*16) + 4 + 4 position (16 is the Entries structure size).
now see the code:
int iconwidth;
int iconheight;
int iconcount;
FILE *iconfile = fopen(filename.c_str(), "rb"); //open the file
fseek(iconfile,4-1,SEEK_SET); //put the file in position 6(the position starts from 0)
fread(&iconcount,sizeof(char),2,iconfile); //get 2 blocks with char size(2 bytes).. i'm getting the number of icons
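For comparison, here is a sketch based on the ICO layout as I understand it, which differs from the offsets above: a 6-byte header (reserved, type, count, each a 2-byte little-endian value) followed by 16-byte directory entries whose first two bytes are width and height. Under that layout the count sits at byte offset 4 and the first entry starts at offset 6. The function name `read_ico_info` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads the icon count and the first entry's
   width and height. Returns 0 on success, -1 on failure. */
int read_ico_info(FILE *f, int *count, int *width, int *height)
{
    unsigned char header[6];    /* reserved (2), type (2), count (2) */
    unsigned char entry[16];    /* first directory entry */

    assert(f != NULL);
    if (fseek(f, 0, SEEK_SET) != 0)
        return -1;
    if (fread(header, 1, sizeof header, f) != sizeof header)
        return -1;
    *count = header[4] | (header[5] << 8);   /* little-endian 16-bit value */
    if (*count < 1)
        return -1;
    if (fread(entry, 1, sizeof entry, f) != sizeof entry)
        return -1;
    *width  = entry[0] ? entry[0] : 256;     /* a stored 0 means 256 pixels */
    *height = entry[1] ? entry[1] : 256;
    return 0;
}
```

Reading into a byte array and assembling the value by hand avoids two pitfalls of reading 2 bytes straight into an `int`: the upper bytes of the `int` stay uninitialised, and the result depends on the host byte order.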
I am trying to get the code to read from the txt file one byte at a time and then write this byte into the binary file, but I can't seem to get it working.
FILE *fpcust, *fpcustbin; //<<<<<-----point to both sales and customers text files, and the new .bin files for both
char buffer;
int ch;
int ch1;
fpcust = fopen("c:customers.txt", "r"); //<<<<-----pointing to the file
fpcustbin = fopen("c:customers.bin", "wb"); //<<<<<-----pointing to the new binary file, opening in writing binary
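A byte-at-a-time copy can be sketched as below. The important detail is that the value returned by `fgetc` is stored in an `int`, not a `char`, so that `EOF` can be told apart from real data. `copy_bytes` is a made-up helper name, and both streams are assumed to be already open.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies in to out one byte at a time.
   Returns the number of bytes copied, or -1 on a write error. */
long copy_bytes(FILE *in, FILE *out)
{
    int ch;              /* int, not char, so EOF is distinguishable */
    long copied = 0;

    assert(in != NULL && out != NULL);
    while ((ch = fgetc(in)) != EOF) {
        if (fputc(ch, out) == EOF)
            return -1;
        copied++;
    }
    return copied;
}
```

It would be called as `copy_bytes(fpcust, fpcustbin)` after checking that both `fopen` calls returned non-NULL.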
I'm writing a program that needs to parse executable files. I've got an "executable" base-class, and currently an "elf" class which inherits from it for parsing ELF files, and I will add more parsers (COM, MZ, PE, a.out, MACH-O, whatever) later on.
I want the program to automatically detect which kind of executable it's loading at runtime. It should be easy because every executable format I'm aware of or plan to support starts with a magic number. But I can't have the parsers skip the file-type check (what if I re-use the code?), and I don't want to check each file twice (not just for performance, but also because only the ELF parser should know that ELF files start with "\x7fELF", etc.), so I've come up with a pretty lazy solution: just try to parse the file with each known parser and have them throw an exception ("exe_type_error") if they can't parse it. If that exception gets thrown, try the next parser; if not, stop.
The remaining problem is how, at runtime, my program will know what parsers are available. I don't want to hard-code it in the main function; instead, I'd like the parsers to "register" themselves as available. That way, if I decide to go down the route of adding new parsers via dynamic linking, I will only have to add an API for dynamic libraries to register their parser, without recompiling any of the main program's code. I also want to do the same thing for another key part of the program (it's a static executable optimizer; it will run a series of "tests" (e.g. "is xor eax, eax faster than mov eax, 0 on this machine?") and optimizations ("if yes, change all mov eax, 0 to xor eax, eax") and I want to load those at runtime too).
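The registration idea can be sketched with a table of function pointers. This version is in C and uses a return code where the post uses an exception, so it illustrates the pattern rather than the poster's C++ design. All names here (`register_parser`, `detect_format`, `try_elf`, `MAX_PARSERS`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARSERS 16

/* A parser returns 1 if it recognises the data, 0 otherwise. */
typedef int (*parser_fn)(const unsigned char *data, size_t len);

typedef struct {
    const char *name;
    parser_fn try_parse;
} Parser;

static Parser registry[MAX_PARSERS];
static size_t registry_count = 0;

/* Adds a parser to the table; returns 0 on success, -1 if the table is full. */
int register_parser(const char *name, parser_fn fn)
{
    assert(fn != NULL);
    if (registry_count == MAX_PARSERS)
        return -1;
    registry[registry_count].name = name;
    registry[registry_count].try_parse = fn;
    registry_count++;
    return 0;
}

/* Tries each registered parser in turn; returns the first matching name or NULL. */
const char *detect_format(const unsigned char *data, size_t len)
{
    size_t i;

    for (i = 0; i < registry_count; i++)
        if (registry[i].try_parse(data, len))
            return registry[i].name;
    return NULL;
}

/* Example parser: only this function knows the ELF magic number. */
int try_elf(const unsigned char *d, size_t len)
{
    return len >= 4 && d[0] == 0x7f && d[1] == 'E' && d[2] == 'L' && d[3] == 'F';
}
```

The main program only calls `detect_format`; a dynamically loaded library could call `register_parser` from its initialisation routine, and the same table shape would work for the tests and optimizations.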
I want to read the contents of a file block by block (512 bytes each) using low-level I/O read calls. Each record is 64 bytes long and has a pre-defined structure. The first 4 bytes are an unsigned integer; the next 20 bytes are ASCII text, etc.
I have a buffer which I can access with buf[0] to buf[63] to read the first record and then buf[64] to buf[127] for the second, etc. However, I was wondering how to map a record so that I can refer to an integer as an integer and a float as a float, etc. I can't create a struct and move the 64 bytes to it, as I will have alignment/padding problems.
What is the standard way to deal with records in C?
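A widely used approach is to copy each field out of the buffer with `memcpy` at its known offset, which sidesteps struct padding and unaligned access. The sketch below assumes the file uses the host's byte order, and it assumes a float at offset 24 purely for illustration, since the post only gives the first two fields. `Record` and `decode_record` are invented names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RECORD_SIZE 64

typedef struct {
    uint32_t id;      /* bytes 0..3 of the record */
    char name[21];    /* bytes 4..23, plus a terminator added here */
    float value;      /* hypothetical field, assumed at offset 24 */
} Record;

/* Decodes one 64-byte record starting at buf into out.
   Assumption: fields are stored in the host's byte order. */
void decode_record(const unsigned char *buf, Record *out)
{
    assert(buf != NULL && out != NULL);
    memcpy(&out->id, buf, 4);
    memcpy(out->name, buf + 4, 20);
    out->name[20] = '\0';
    memcpy(&out->value, buf + 24, sizeof out->value);
}
```

For a 512-byte block, record `i` would be decoded with `decode_record(block + i * RECORD_SIZE, &rec)` for `i` from 0 to 7. If the file's byte order differs from the host's, the integer would need to be assembled from individual bytes instead.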
I have dabbled in programming with C, but I am not proficient enough to write a program dealing with files. I have to take an extremely long list of numbers and place each one into another file inside a line of code, for validation purposes. How can I take the numbers from one file and write them to another file with a string of code around them?
I need to write C++ code that merges the contents of several .txt files into a single file. I used the following code. It works, but after merging, the result file contains the merged contents twice. I think it overwrites the result. I want to do it without using the command line.
#include<iostream>
#include<fstream>
using namespace std;

int main() {
    std::ofstream("merge.txt");
    system("type *.txt >> merge.txt");
    system("pause");
    return 0;
}
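A possible cause of the doubled output is that merge.txt itself matches *.txt, so the shell command reads the result file as one of its inputs. Merging from an explicit list of input names avoids that and needs no call to `system`. This is a sketch in C; `merge_files` is a hypothetical helper and the file names passed to it are up to the caller.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes the contents of each input file, in order,
   to out_name. Returns 0 on success, -1 on any error. */
int merge_files(const char *out_name, const char *const inputs[], int count)
{
    char buf[4096];
    size_t n;
    int i;
    FILE *out;

    assert(out_name != NULL && inputs != NULL);
    out = fopen(out_name, "wb");
    if (out == NULL)
        return -1;
    for (i = 0; i < count; i++) {
        FILE *in = fopen(inputs[i], "rb");
        if (in == NULL) {
            fclose(out);
            return -1;
        }
        while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
            if (fwrite(buf, 1, n, out) != n) {
                fclose(in);
                fclose(out);
                return -1;
            }
        }
        fclose(in);
    }
    return fclose(out) == 0 ? 0 : -1;
}
```

Keeping the output file out of the input list (or giving it a different extension) is what prevents the result from being merged into itself.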
How does one go about copying one binary FILE variable to another in C++? Say I have the following:
FILE* tempFile; //is a binary file and assume already filled with contents
FILE* localFile;
tempFile, as the name implies, is stored in a temporary directory (AND has a randomized temp name) within Windows and I want to copy its contents to another file with a predefined name that is within a valid local directory (e.g. C:\Users\user\My Documents\test.exe). What do I have to use?
I am coding an implementation of B-Tree insertion in C++. I want to display the contents of the tree in a per-level format. After writing the functions and trying to run the program, I get the error "Undefined reference".
// C++ program for B-Tree insertion
#include<iostream>
using namespace std;

// A BTree node
class BTreeNode{
    int *keys; // An array of keys
    int order; // Minimum degree (defines the range for number of keys)
    BTreeNode **child; // An array of child pointers
    int size; // Current number of keys
I have written the following code but I am stuck. The task: write a program that will prompt the user for a file name and open that file for reading. Print out all the information in the file, numbering each new line of text.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <ctype.h>

int main() {
    char line[81], filename[21], c;
    int i = 1;
    FILE *inFile;
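The numbering part can be sketched without a fixed-size line buffer by copying characters and printing a number whenever a new line begins. `print_numbered` is a made-up name, and the "N: " prefix format is an assumption, since the assignment does not specify one.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies in to out, prefixing each line with its
   number. Returns the number of lines written. */
int print_numbered(FILE *in, FILE *out)
{
    int ch;
    int count = 0;
    int at_line_start = 1;

    assert(in != NULL && out != NULL);
    while ((ch = fgetc(in)) != EOF) {
        if (at_line_start) {
            fprintf(out, "%d: ", ++count);   /* assumed prefix format */
            at_line_start = 0;
        }
        fputc(ch, out);
        if (ch == '\n')
            at_line_start = 1;
    }
    return count;
}
```

In `main`, the file name would be read from the user, the file opened with `fopen(filename, "r")` and checked for NULL, and then passed in as `print_numbered(inFile, stdout)`.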
I am trying to create a service that will monitor a folder. Whenever a new file gets created, I am trying to read the contents of the new file and copy them (under the same file name) to a new location.
The problem I am facing is that only the first file gets its name copied and a file created at the new location, but without any contents. The subsequent files do not get created at all.
My Services.cs looks like this:
public partial class Service1 : ServiceBase {
    public Service1() {
I'm trying to calculate a series of times from start to end, and find out the duration between them, sum them up and see if they're above a certain value or not, for each particular instance.
My goal is to provide a prepared text file with time tags such as this:
And the program is able to calculate the total time relevant to each instance (instances separated by a line of '=').
Some form of number should somehow identify each instance or something similar and a text file is generated with total time printed for each instance. E.g.
Now I'm currently working on making the logic to calculate time within the ranges I'd like based on several parameters.
Are there any references I can use when it comes to working with strings in order to seek and extract these values in order to work with them? The documentation available on this website, despite being very informative, does not show practical applications of said class and I'm at a loss on how to implement the functionality.
I have a TCP client-server implementation running in the same program, on different background worker threads. There will be instances of this program on multiple computers so they can send and receive files between each other. I can send files sequentially between computers using a network stream, but how would I send multiple files at the same time from computer A to B?
Sending multiple files over one connection (socket) is fine, but with multiple network streams sending data to a client, how does the client know which chunk of data is a part of which file?
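One common answer is to frame every chunk with a small header that carries a file identifier and the payload length, so the receiver can route each chunk to the right file. The sketch below shows only the header encoding, in C; the 8-byte big-endian layout and the names `FrameHeader`, `encode_header` and `decode_header` are assumptions made for illustration, not part of any existing protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_HEADER_SIZE 8

/* Hypothetical frame header: which file the chunk belongs to,
   and how many payload bytes follow the header. */
typedef struct {
    uint32_t file_id;
    uint32_t length;
} FrameHeader;

static void put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);   /* big-endian: high byte first */
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

static uint32_t get_u32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Serialises h into the 8 bytes sent ahead of each chunk. */
void encode_header(const FrameHeader *h, unsigned char out[FRAME_HEADER_SIZE])
{
    assert(h != NULL && out != NULL);
    put_u32(out, h->file_id);
    put_u32(out + 4, h->length);
}

/* Parses the 8 header bytes read from the stream back into h. */
void decode_header(const unsigned char in[FRAME_HEADER_SIZE], FrameHeader *h)
{
    assert(in != NULL && h != NULL);
    h->file_id = get_u32(in);
    h->length = get_u32(in + 4);
}
```

The sender writes a header followed by exactly `length` payload bytes; the receiver reads 8 bytes, decodes them, then reads `length` bytes and appends them to the file identified by `file_id`. Chunks from different files can then be interleaved on one connection.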
I am attempting to merge binary files. However, this is to no avail. The program keeps segfaulting. I want to merge the buffers the files are stored in and then write the new one to disk. Anyway, here is my code.
I have a file called example.txt. In that file, the int 123456 is stored. How can I read one individual digit from that int? Let's say I have an int variable called "Weight". How do I set Weight equal to the digit 1 from the int 123456 in the file?
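One simple way to pick out a digit is to convert the number to text and index into it. A minimal sketch, assuming the number is non-negative; `digit_at` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the digit at position index (0 = leftmost)
   of a non-negative number, or -1 if index is out of range. */
int digit_at(long number, int index)
{
    char text[32];
    int len;

    assert(number >= 0);
    len = snprintf(text, sizeof text, "%ld", number);
    if (index < 0 || index >= len)
        return -1;
    return text[index] - '0';
}
```

The number itself could be read with `fscanf(fp, "%ld", &number)` after opening example.txt, and then `Weight = digit_at(number, 0);` would give the leftmost digit.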