C++ :: Column Based Database - Constructing Very Fast Radix Sort
Apr 12, 2014
I'm attempting to build a column based database, and I'm new to C++ (just wanted to play around with building a column base database "for the fun of it"). I want to construct a very fast radix sort, that would allow me to quickly sort groups of columns based on integer values. My general preference is to take up more RAM to get more performance.
I'd like to build the radix sort by allowing 256 "buckets" to drop values in as I'm sorting. This allows me to sort through a group of 4 byte integers in only 4 passes. But assuming I'm trying to sort a large group of values (say 50+ million), I'm not sure what type of container to use for these. Also note I'm pretty unfamiliar with the "standard library", so below are my thoughts:
Vectors:
-Pros: Easy to use, and very fast for sequential and random access inserts / reads
-Cons: If they have to dynamically resize because a given vector wasn't large enough, this can apparently really slow performance. Unless I make another pass over the numbers before I start sorting, I wouldn't know how big to make individual the individual vectors. This means I either have to make them "too big" and waste space, or pay a performance price for either resizing, or scanning data first.
Lists:
-Pros: Seems like I wouldn't have to specify size ahead of time, so I could just easily insert values to a given list. Also, since I don't need random access reads (I'll ready the "0" list sequentially, then the "1" list, etc. they should work fine.
-Cons: I don't really know much about lists, but I'm not sure how easy it is to append a new value to the end of a list. I've read that standard library lists include both "forward" and "backward" pointers, which I don't need. Also, I find it hard to believe that there isn't some time taken up with memory allocation. If I build a list and append x million records in it, is it calling memory allocation routines x million times?
Or maybe there's another container type I should learn?
Again, my goal is to make this "fast", not "memory efficient". But having said that, the fastest way I could think of (use 256 vectors, each sized equal to the total number of members to be sorted) is just too much memory to burn - 256 times a vector big enough to hold millions of elements is too much.
Example radix sort function to sort an array of 64 bit unsigned integers. To allow for variable bin sizes, the array is scanned one time to create a matrix of 8 histograms of 256 counts each, corresponding to the number of instances of each possible 8 bit value in the 8 bytes of each integer, and the histograms are then converted into indices by summing the histograms counts. Then a radix sort is performed using the matrix of indices, post incrementing each index as it is used.
Code: typedef unsigned long long UI64; typedef unsigned long long *PUI64; PUI64 RadixSort(PUI64 pData, PUI64 pTemp, size_t count) { size_t mIndex[8][256] = {0}; /* index matrix */ PUI64 pDst, pSrc, pTmp; size_t i,j,m,n; UI64 u;
I was just going through Radix Sort algorithm. Though I got the logic and the concept used in it, I am still pretty much confused about the space complexity being O(k+n) for this algorithm. (where, k is the max no of digits in a number, and n is the no. of inputs to be sorted).
I'm trying ultimately to do a radix sort. I'm going with the implementation using two tables. One will initially hold the unsorted values then hold the partially sorted values thereafter. The other will be an array of linked lists which will be where the radix sort is done. Here is my code thus far:
Main
#include <iostream> #include <fstream> #include "radix.h" #include "radix.cpp" using namespace std ; int main ( int argc , char * argv [ ] ) { ifstream input ( argv [ 1 ] ) ;
[Code] ....
It appears to be crashing during void insert_into_sorted ( int to_insert , int place ) ; and I'm not sure why.
I Need to write a function using C wherein I should do the following:
(i) The function will receive a string in a character pointer
(ii) This string will adhere to the following structure: "Kentucky+New York+Arizona+Nevada" The number of states can differ from 4 to 50 The delimiter between States can differ from '+' to ',', hence I would like to pass the delimiter to the function.
(iii) This string should then be sorted alphabetically from left to right. The above example would then become: "Arizona+Kentucky+Nevada+New York"
(iv) This string needs to be returned from the function using a character pointer.
I'm using a database and im trying to sort a 2D array info[51][10] that contains 51 pieces of records and 10 different fields for each record. And, now I'm trying to sort a desired field in the 2D array alphabetically.
However, the other fields within the same record should be swapped together. (So that the information of the same records stays together).
Also, I am constantly experiencing a run-time error. Sometimes my program works sometimes the whole thing crashes... By opening the code in different folders sometimes works. But the problem is still here. Is there any way to fix this error?
bool swapped =true; int j=0; string tmp[10]; swapped =true; j=0;
What I want to do with the below code is to construct the vector containing 'Ability' objects in the class 'Card'. I have searched for the solution in the past, and have been unsuccessful, mainly because the vector contains child classes of the parent class 'Ability'. The below code is a snippet of the larger program that I am working on, and should compile:
As you can see, in the class 'Card' I have a pretty large constructor. Up to this point, however, I have failed in my attempts to construct the abilities vector, because it contains those child classes.
i have read a lot of about lists but i dont understand this. I know its something like dogs on leash where we have
dog1->dog2->dog3->.... and Code: struct DOG {char* (name of a dog of first leash) DOG* (next dog ) } I have written something like this but this doesnt work as i wanted Code: #include <iostream> using namespace std; struct line {
[Code]....
I wanted to make program where i can type XX numbers , then cout those numbers without changing the order, and my next exercise is to change order in this programme from end to start.
I am currently making pong using visual C++ with the SDL libary. I find that the ball moves too fast for the players and needs to slow down. The code for the movement is in a while loop and vairies speeds 1 - 2 (X and Y).
I just want to know how fast can C++ generate data? For example, I have a downstream device that is connected to my pc via a Gigabit Ethernet, and I have to generate some pattern and send it over the Gigabit interface.
I was curious if there is a way that I can see how fast I can generate data? I was curious if I can exercise a good portion of the bandwidth ! for example, sending about 600 Mbits/sec.
How do I find out, first, whether I can do this with C/C++, and, second, how do I know how fast I am sending data?
I want to build a server which holds hundreds of thousands of active users in memory. To keep all the users organized i would like to store them in a Vector.
The problem is how i could quickly and easy find the object whenever i need it? All users will have a unique ID. Would it be possible to keep some form of a Vector Index on the unique id number?
I am working with a new text adventure. The way i want to construct it is by having a class for all living things. in the class you have basic things as: health, gold, vector for inventory holding "struct item". etc...
There is also a class called world, wich navigates through the world.
World class contains of: player location, and a map containing info about the room etc...
Here comes the problem. I want there to be characters to be placed out in different maps, so basically i want the world class to hold objects from Character.
How to do it. In world class i made a map...
std::map<int,"content">
content is a struct i made above in world class:
struct content{ std::string name; // location name std::string info; // info about location std::vector<Character>characters; std::vector<item>items; };
To sum it up, i have a std::map<int,content>map the int stands for location id. Content holds more info about the room and what's in it
btw the classes are in different files and that means i have to include "Character.h" in the world file so i can set up the vector of characters.
i am looking for a verified code in simple c, for generating fft of an input image (.jpg). the output can be a text file including fft coefficients.can you recommend me any source except open-cv?
I have a SSD and I am trying to use it to simulate my program I/O performance, however, IOPS calculated from my program is much much faster than IOMeter.
My SSD is PLEXTOR PX-128M3S, by IOMeter, its max 512B random read IOPS is around 94k (queue depth is 32). However my program (32 windows threads) can reach around 500k 512B IOPS, around 5 times of IOMeter!!! I did data validation but didn't find any error in data fetching. It's because my data fetching in order?
I paste my code belwo (it mainly fetch 512B from file and release it; I did use 4bytes (an int) to validate program logic and didn't find problem).
#include <stdio.h> #include <Windows.h> /* ** Purpose: Verify file random read IOPS in comparison with IOMeter **/
//Global variables long completeIOs = 0; long completeBytes = 0; int threadCount = 32; unsigned long long length = 1073741824; //4G test file
You will write a program that uses a multidimensional array having 3 rows and 8 columns and sorts each of the rows using both a bubble sort and a selection sort.
You must declare the array inside of main. You will have a for loop containing the calls to bubbleSort and selectionSort. You need to pass into function bubbleSort and selectionSort the following: 1) each column of the multidimensional array, 2) the size of the column, and 3) a particular row number of the multidimensional array to be used for printing out the "pass" shown on the following pages.
I keep getting an error that the identifier for bubbleSort and selectionSort is not found. (Error C3861) Also, I feel like I'm missing something in int main() to get it to sort properly.
# include <iostream> using namespace std; int main() {
Suppose I want to represent an apartment building in terms of floors and rooms with a single data object. Each floor has the same number of rooms at any given time. During the course of the program, I will need to change the number of floors occasionally and will need to change the number of rooms on each floor occasionally. (Don't worry about the building permits needed to do this.) Now, I also want to be able to iterate over either each floor (think "row") or each room (think "column"). For example, I might want to program the equivalent of "Johnson, go fix all the front doors on the 8th floor" or the equivalent of "Johnson, go fix all the front doors to room B on each floor". (Yeah, I know, for the latter I could iterate over each floor and then access room B but that doesn't feel as "clean" as being able to iterate over a separate room/column.)