Some other asynchronous I/O questions (not programming only)

Hello.
I think I have some misunderstandings about how FreeBSD works.
1) As far as I know (and have tried), on Windows I cannot open a file/disk for direct (uncached) access and then use async I/O operations on it: direct access on Windows is used in synchronous mode. So, what about FreeBSD? If I open a disk (partition) with g_open(), can it be used with the AIO functions?
2) I changed my code's algorithm to use AIO (to run many reads "in parallel") and to serialize each read. I wrote my data files to empty HDDs. Each file is about 125-135 MB, and the files are laid out sorted by name (they were written in sorted order, without parallel copying).
This lets me read them one by one without making the head "rock'n'roll" across the disk surface. Each reading "stream" uses its own disk (one reader per disk). All computation on a block that has already been read is done while the next read I/O is in progress. Within each file I also read the blocks (4K block size) in sorted order. So I hoped to get close to the results of a random 4K read benchmark. For each file group (256 files, about 32 GB) I issue 12000-15000 block read operations (15000 * 4K bytes / 0.341 MB/s ≈ 180 s = 3 minutes). But in reality it takes more than 20 minutes - more than 7 times longer.
That is my file access class:
Code:
struct FDBIdxRec {
    uint64_t offs;
    int      len;
};

void FileDB::open( std::string fileName, bool recheckCluster )
{
    aFileName = fileName;
    if( fd != -1 )
    {
        close(fd);
    }
    fd = ::open( fileName.c_str(), O_RDONLY | O_DIRECT, S_IREAD );
    currentIdxSector = -1;
    currentDataSector = -1;
    lastDataPos = -1;
}


FDBIdxRec FileDB::readIdx( int key )
{    // use blocking reads here: while reading indices we do no computation, so AIO would give no benefit
    lseek( fd, key * 8, SEEK_SET );
    uint32_t tmp[2];
    ::read( fd, (char*)tmp, 8  );
    FDBIdxRec res;
    res.offs = (uint64_t)tmp[0] * 8 + 0x80000; // widen before multiplying to avoid 32-bit overflow
    res.len = tmp[1] * 8;
    return res;
} 

void FileDB::readDataRec_aio( FDBIdxRec idx )
{
    memset(&aio, 0,sizeof( aio )); 
    aio.aio_fildes = fd;
    aio.aio_offset = idx.offs;     
    aio.aio_buf = (char*)dataSectorBuf;
    aio.aio_nbytes = idx.len;
    aio_read(&aio);
    rdOffs = 0;
    rdSize = idx.len;
    asyncOpInProgress = true; // mark that an aio_read is in flight
    count = rdSize/8;
}

void FileDB::getLastKeyData_aio2(uint32_t* buffer)
{ 
    if( asyncOpInProgress )
    {
        while( (err_r = aio_error(&aio)) == EINPROGRESS )
        {
            cblist[0] = &aio;
            aio_suspend( cblist, 1, NULL );
        }
        if (err_r == 0)
        {
            aio_return(&aio);                     
        }
        asyncOpInProgress = false;
    } 
    memcpy( buffer, ((char*)dataSectorBuf)+rdOffs, rdSize );
}
And this is how I use it:
Code:
    uint32_t* tmpbuf = new uint32_t[2048];
    std::list<DataReadRec> readQueue;
    FileDB *db = new FileDB();
    for (int thrdBaseI = 0; thrdBaseI < thrdBaseCount; thrdBaseI++)
    { // iterate over all the databases on a single device
        int baseNo = dbPriorityMap[thrdBaseI][arg->threadId];
        int currFileNo = -1;
        int recCnt;
        int fileNo;
        int currIdx;
        bool isFirstItem = true;
        readQueue.clear();
        for (auto& i : (*chklist))  // chklist - sorted multimap with data to check
        {         
            fileNo = DBKEY_FILE(i.first);   // macros to extract file number and index from the multimap iterator
            currIdx = DBKEY_IDX(i.first);
            if( fileNo != currFileNo )
            {                        // open next file
                if(!isFirstItem)     // for the first file there is no prepared readQueue yet, so skip this step
                {
                    for( auto& rdrec : readQueue )
                    {   // asynchronously read each needed record in the file and check it
                        db->readDataRec_aio(rdrec.idxRec);
                        db->getLastKeyData_aio2( tmpbuf );
                        recCnt = db->getLastKeyRecordCount();         
                        if( checkCurrIdxData( tmpbuf, recCnt, rdrec.srcRec, baseNo, arg  ) )
                        {
                            readQueue.clear();
                            goto hct13_exit; // exit from searching thread if data found
                        }
                        arg->step++;                     
                    }                 
                }             
                isFirstItem = false;
                currFileNo = fileNo;
                db->open( bases[ baseNo ].path + "/" + db->fileNoToName( fileNo ) );
                readQueue.clear(); // clear reading queue for next file             
            }
            DataReadRec rdrec; // add new index item to reading queue
            rdrec.idxRec = db->readIdx( currIdx ); // data of file block to read
            rdrec.srcRec = i.second;               // data for block to check
            readQueue.push_back(rdrec);
        }
        for( auto& rdrec : readQueue )
        {   // check data for last file
            db->readDataRec_aio(rdrec.idxRec);
            db->getLastKeyData_aio2( tmpbuf );
            recCnt = db->getLastKeyRecordCount();         
            if( checkCurrIdxData( tmpbuf, recCnt, rdrec.srcRec, baseNo, arg  ) )
            {
                readQueue.clear();
                goto hct13_exit;// exit from searching thread if data found
            }
            arg->step++;                     
        }                 
        readQueue.clear();     
    }
hct13_exit:
    delete []tmpbuf;
    tmpbuf = NULL;
    delete db;
    db = NULL;

So I checked the reading order by adding logging to each read function. The reads are serialized: no block number ever goes backwards, and each new offset is bigger than the previous one. Yet the total time is 7 times longer than the theoretical value. Could it be due to system caching, which reads not in 4K units but in bigger blocks? Or maybe I do not manage to send the next read command to the system before the HDD head has already passed the block addressed by that command?
Watching the block-read log, there are only 5-10 reads with the same or "+1" block number; all the others are "+2" or more. (Maybe only the file open makes the HDD head seek back... but I think the system uses cached data for that too.)

So, can anyone point me to a way to get results close to a random 4K read benchmark? My ideas so far, though I don't know how to realize them:
1) switch off caching for those disks, or reduce it to 4K blocks;
2) use g_open() - in that mode there is no caching, but I don't know whether AIO would work with it;
3) maybe something else I don't know about FreeBSD and UFS;
4) would lio_listio optimize the reads and "ask" the disk for the next block in time, or does it act just like my reading queue? (I have not found any information about its internals and optimizations.)
 