How can you sort a long list of data (strings, floats etc) that is read from a large file (say a few million lines) using a Data.Vector.Generic.Mutable object and a sort algorithm from Data.Vector.Algorithms?
Here's how to do it in the general case.
First, you need a mutable vector. You can build this incrementally as you scan the file; allocate a vector that's about as big as you need, and increase the size and copy when you run out of space. Or you can read the entire file, count the record separators, and allocate the right amount of space all at once. This is easier but probably not acceptable In Real Life. (The expand-as-needed strategy is pretty common; if you ever use a language like Perl and push lines you read from a file onto the end of an array, this is what's happening. Perl allocates some space for an array, when you fill it, it increases the amount of space, allocates new space, and copies.)
Anyway, I'm too lazy to do this, so I am just going to create a vector with some random numbers in it.
We need a bunch of libraries for this:
import Control.Monad import System.Random import qualified Data.Vector as IV import qualified Data.Vector.Mutable as MV import qualified Data.Vector.Generic as V import qualified Data.Vector.Algorithms.Intro as VA
We don't need this all at once, but we'll need it eventually, so I thought I'd get it out of the way.
Anyway, our mutable vector is going to be a "normal" mutable vector,
The idea of a mutable vector is that you create it and modify it in a
number of steps. In Haskell, there are a few ways to make this look
pure to the calling code; one is to do it all inside the
Your ST action is creating the vector, modifying it, and "freezing" it
into an immutable vector. Internally, you are using fast
modify-this-memory-location-a-bunch-of-times operations, but
externally, you have something that is pure. (Read the paper about
ST if you want an argument as to why this is safe.)
Another way to deal with mutable data is to just do it inside the
IO, monad. That's what we're going to do here, as
it's most convenient.
Data.Vector.Mutable has two predefined vector types for you,
STVector. We're using
IOVector, which puts all
the vector operations into
So like 8 paragraphs ago, we were going to create a mutable vector to sort. And here we are:
randVector :: IO (MV.IOVector Int) randVector = do v <- MV.new 10 forM [0..9] $ \x -> do r <- randomIO :: IO Int MV.write v x r return v
This is an IO action that returns a new mutable vector with 10 random numbers inside it. (Random number generation can also be conveniently lifted into the IO monad, so we did that too, for convenience! It's like we're writing C, but with nicer syntax and more type safety.)
That's actually the hard part. To do the sorting, I imported
Data.Vector.Algorithms.Intro which is basically an in-place
quicksort. A function called
sort does the actual sorting (in
whichever monad the mutable vector is in).
An action that creates the random mutable vector and sorts it in place looks like:
sort = VA.sort =<< randVector
Now, to print that out, all we need to do is "freeze" the vector into
an immutable vector, and print out the
.toList. Or you can just
iterate over every element and print it.
Here's what I came up with as an example:
main = do v <- randVector VA.sort v iv <- V.unsafeFreeze v :: IO (IV.Vector Int) print . V.toList $ iv
V.unsafeFreeze is from
Data.Vector.Generic (how you interact with
all vector types with the same API), as is
Anyway, it's worth noting that IO is purely for convenience here.
Since you're building a vector from file data, it's appropriate. 99%
of the time, though, you should use
ST. In your ST action, create
the vector, sort it, freeze it, and return the frozen version.
A similar example using
randVector :: ST s (Vector Int) randVector = do vec <- new 10 rand <- newSTRef 17 forM_ [0..9] $ \index -> do randN <- readSTRef rand let randN' = (fst . next . mkStdGen) randN writeSTRef rand randN' write vec index randN' unsafeFreeze vec
Then run with:
*Main> runST randVector fromList [679560,1422110406,306332632,1905242129,692062628,393451229,355476175,1240028023,873588529,1181443777] :: Data.Vector.Vector