Serial I/O problems in 8.0

I am having a rather strange serial (EIA-232) I/O problem under 8.0-RELEASE-p2. I have seen this on 2 different systems that I've tried: an AMD Athlon 64 X2 4200+ on a Biostar MC-P6P-M2+, and a Pentium III 450 in an HP Vectra VLI8. In both cases I'm using the first built-in serial port on the motherboard.

I have code that communicates with 1-wire devices (DS18B20, DS2406) through a DS9097U-S09 EIA-232 adapter. This code works very well under FreeBSD 6.4 and 7.x on all hardware I've tried, but fails miserably under 8.0. I know the default driver has changed from the sio driver to the uart driver, and have made the appropriate name changes. It almost works on 8.0, but it seems that every data transfer results in a CRC error. I have confirmed that the modem control signals and TX and RX data look right with an oscilloscope. It is hard to believe that data overruns are occurring because I never see a value above 27 interrupts per second with "systat -v" when it is failing, and I am only running one port at this time. I run this code on a Pentium-100 under 6.4 with 4 serial ports going without any issues.

The UART is reported like this at boot time on the P3 box:

Code:
Mar 10 17:27:03 fillhole kernel: uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
Mar 10 17:27:03 fillhole kernel: uart0: [FILTER]
Mar 10 17:27:03 fillhole kernel: uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
Mar 10 17:27:03 fillhole kernel: uart1: [FILTER]

Any suggestions (I mean, other than waiting for 8.1)?
 
More info...

I've done a bit more digging into this problem. I tried just spitting byte patterns out the serial port with a loop-back connector to see if I could get it to fail. It never did.

I found a part of my code that seems to be closely related to the problem. In fact, the addition of delays in (or stepping through) this part of the code makes it work much of the time.

The code sends a 1-wire "Match ROM" command with a slave device address. It then needs to wait for the current transmission to complete and discard any response, as quickly as possible. So it calls tcdrain(3) to wait for the UART to finish sending the "Match ROM" command, then waits a few milliseconds (via usleep(3)) to account for the serialization of the last byte and the lag until the response arrives (about 1.5 character times, where a character time is roughly one millisecond at 9600 bps). Then the code calls tcflush(3) with the TCIFLUSH parameter to discard the response. Under 8.0, the next transmission/response pair gets corrupted. For example:

Code:
sent:     f5 0d ff ff 55 ff ff

expected: f5 0d ff c0 55 d1 49

seen:     55 12 a4 8b 21 00 00

The response is the correct length, but otherwise it looks like gibberish. Now an interesting thing is that the slave device address is 12-a4-8b-21-00-00-00-45, and the "Match ROM" command code is 0x55, so it seems that the system has given me back a copy of the start of my previous transmission (which was 9 bytes long, not 7).
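
One way to spot this failure automatically is to compare each response against the start of the previous transmission. The following is only a sketch: looks_like_stale_echo() is a hypothetical helper that is not part of my code, and it assumes the caller keeps a copy of the last buffer it wrote.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption, not from the original program).
 * Returns 1 if the received bytes are identical to the first rx_len
 * bytes of the previous transmission, which suggests that stale echoed
 * data was delivered instead of a fresh response.  Returns 0 otherwise.
 */
int looks_like_stale_echo( const unsigned char *prev_tx, size_t prev_len,
                           const unsigned char *rx, size_t rx_len )
{
  if ( rx_len == 0 || rx_len > prev_len ) {
    return 0;
  }
  return memcmp( prev_tx, rx, rx_len ) == 0;
}
```

With the data above (previous transmission 55 followed by the 8 address bytes, response 55 12 a4 8b 21 00 00) the check reports a match, while the expected response f5 0d ff c0 55 d1 49 does not. A match is only a hint, since a real response could coincidentally equal the previous transmission.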

When I change the code to:
Code:
    usleep( 100000 );
    tcdrain( fd );
    usleep( 100000 );
    tcflush( fd, TCIFLUSH );
    usleep( 100000 );
it works most of the time, but I still get CRC errors occasionally, and I haven't looked to see if the failure is like the one noted above.

Any comments?
 
Test program using looped-back port

Here is a test program that shows that tcdrain(3) is not working right.

Code:
#include <fcntl.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

static struct termios active_port_info;
static struct termios starting_port_info;
static int fd = -1;


/*
 * Open serial port device and set line attributes.
 * A non-zero return value indicates an error occurred.
 */
int serial_open( char *path )
{
  fd = open( path, O_RDWR );
  if ( fd < 0 ) {
    return -1;
  }

  /*
   * Get/set line parameters.
   */
  if ( tcgetattr( fd, &starting_port_info ) < 0 ) {
    return -1;
  }

  active_port_info = starting_port_info;

  active_port_info.c_iflag = 0;
  active_port_info.c_oflag = 0;
  active_port_info.c_cflag = CS8 | CREAD | CLOCAL;
  cfmakeraw( &active_port_info );
  active_port_info.c_cc[ VMIN ] = 0;
  active_port_info.c_cc[ VTIME ] = 1;
  cfsetspeed( &active_port_info, B9600 );
  if ( tcsetattr( fd, TCSAFLUSH, &active_port_info ) < 0 ) {
    return -1;
  }

  return 0;
}



/*
 * Function to read a data sequence from the serial port.
 * Handle timeouts, etc.  The number of characters
 * received is returned, but there is no indication of
 * timeout except a short count.
 */
int serial_read( unsigned char *data, int len )
{
  int count = 0;
  int i;

  while ( len > 0 && (i = read( fd, data, len )) > 0 ) {
    count += i;
    data += i;
    len -= i;
  }
  return count;
}


static char xmit1[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
static char xmit2[] = "abcdefghijklmnopqrstuvwxyz";
static char xmit3[] = "0123456789:;<=>?@";

static unsigned char rcvbuf[ 256 ];	/* matches serial_read()'s parameter type */

int main( int argc, char **argv )
{
  int count;
  int i;
  int len;

  if ( serial_open( "/dev/cuau0" ) ) {
    perror( "Open failed" );
    exit( EXIT_FAILURE );
  }

  /* Check loopback */
  write( fd, xmit1, strlen( xmit1 ) );
  count = serial_read( rcvbuf, sizeof(rcvbuf) );
  if ( count != strlen( xmit1 ) ||
       memcmp( xmit1, rcvbuf, count ) != 0 ) {
    printf( "Loopback failed\n" );
    exit( EXIT_FAILURE );
  }

  sleep( 1 );

  /* Check flush */
  write( fd, xmit1, strlen( xmit1 ) );
  write( fd, xmit1, strlen( xmit1 ) );
  write( fd, xmit2, strlen( xmit2 ) );
  tcdrain( fd );
  tcflush( fd, TCIFLUSH );

  write( fd, xmit3, strlen( xmit3 ) );
  count = serial_read( rcvbuf, sizeof(rcvbuf) );
  if ( count != strlen( xmit3 ) ||
       memcmp( xmit3, rcvbuf, count ) != 0 ) {
    printf( "Post-flush loopback failed.  Count off by %d\n",
            count - (int) strlen( xmit3 ) );
    exit( EXIT_FAILURE );
  }

  exit( EXIT_SUCCESS );
}
 
Still in 9.0-CURRENT as of 19-March-2010

Just to be sure this problem hasn't already been fixed by the changes since 8.0-RELEASE, I set up a system with 9.0-CURRENT. The loop back test program still shows that tcdrain(3) doesn't work right.
 
Email ed@ with the details. He's a really courteous guy about the syscons work, so I'd definitely approach him if you're having an issue.
 
Sorry to resurrect an old thread...

I can report that tcdrain(3) still doesn't behave properly in 8.3-RELEASE. I'm using the Maxim/Dallas OWPD library (more or less what's in comms/mlan3) to talk to some 1-Wire devices. My code worked under 6.3-RELEASE and 7.4-RELEASE. Under 8.3-RELEASE it fails.

Under 8.3 if I precede the tcdrain(3) call with a usleep(1000*num_bytes) [I'm running at 9600 bps, so each byte is roughly 1000 microseconds] then my code works OK.
 
I wrote my own code to talk to the 1-wire devices, and I eventually rewrote it to work without depending on tcdrain(3). At this time I still have one system running 6.4-stable (won't boot anything newer, probably due to BIOS bugs), and one running 7-stable, but everything else is running 8-stable or 9-stable.

Essentially, my fix (which makes sense in my code, but I'm not sure about the OWPD library) depends on a function that reads back as many data bytes as were written, so that even if the 1-wire responses are not needed there is essentially nothing to drain. The situation is slightly more complex for command mode on the DS2480B/DS9097U and/or iButtonLink adapters. I wound up hauling my old HP 4952A protocol analyzer out to help me get the command mode code right.
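
The idea can be sketched roughly as below. This is not my actual code: write_and_read_back() is a hypothetical name, and it takes separate write and read descriptors only so that it can be exercised with a pipe. For a serial port you would pass the same descriptor for both, opened with a read timeout (VMIN = 0, VTIME > 0) so that a missing response yields a short count instead of blocking forever.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to wfd, then read back up to
 * len bytes from rfd into rx.  Returns the number of bytes read back
 * (a short count means a timeout or end of input), or -1 if the write
 * fails.  Because the response is consumed byte for byte, nothing is
 * left over that would need tcdrain(3) or tcflush(3).
 */
ssize_t write_and_read_back( int wfd, int rfd, const unsigned char *tx,
                             unsigned char *rx, size_t len )
{
  size_t sent = 0;
  size_t got = 0;
  ssize_t n;

  /* Push out the whole request, allowing for partial writes. */
  while ( sent < len ) {
    n = write( wfd, tx + sent, len - sent );
    if ( n <= 0 ) {
      return -1;
    }
    sent += (size_t) n;
  }

  /* Collect one response byte per byte written, stopping on timeout. */
  while ( got < len && (n = read( rfd, rx + got, len - got )) > 0 ) {
    got += (size_t) n;
  }
  return (ssize_t) got;
}
```

When the response is not needed, the caller simply ignores the contents of rx and checks only the returned count.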

Kinda interesting that the tcdrain(3) problem is noted in two PRs submitted just about a year apart: kern/144696 and kern/155752.
 