The problem is clearly some string/Unicode conversion issue involving file names. The underlying cause is a deep design flaw in Unix. In the old days (30 and more years ago), all strings were intended to be displayed in a fixed and known character set, typically 7-bit US-ASCII. Unix implemented strings as arrays of 8-bit bytes, which worked well with the ASCII character set.

Later, the definition of a string had to be generalized to allow more complex character sets, for i18n. Initially this was done with 8-bit character sets, and as long as all processes on a set of connected computers (a cluster) used the same character set (for example ISO 8859-1 in Western Europe), this worked fine: the kernel and C-library routines didn't have to know what the sequence of 8-bit bytes actually meant. But this technique was quickly found to be insufficient in two areas: first, where multiple 8-bit character sets have to coexist (for example a computer used simultaneously by users working in French with ISO 8859-1 and in Ukrainian with KOI8-U); and second, for CJKV languages, where 8 bits are simply not enough. Slowly, over the last 30 years, this has led to most string data being converted to Unicode, usually stored in a UTF encoding (often UTF-8).

Userspace applications slowly learned how to interpret stored data in various string formats and, where necessary, how to convert from one encoding to another. To do that, they rely on locale settings; in Unix, the default locale for a process comes from the LANG and LC_* environment variables. In userspace this either works fine, or if it doesn't, it's a bug in an application.
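To make the locale mechanism concrete, here is a small sketch (the locale names are just examples; the locale(1) utility reports the effective setting per category):

    # Precedence is LC_ALL > individual LC_* variable > LANG.
    export LANG=en_US.UTF-8          # default for every locale category
    export LC_TIME=de_DE.ISO8859-1   # overrides only the LC_TIME category
    locale                           # prints the effective setting per category
    # If LC_ALL were set, it would override both of the variables above.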
The problem is that the kernel doesn't know the locale of a user process. There are very few places where text strings cross the userspace/kernel boundary, the main one being file names. If a process running in one encoding (for example ISO 8859-1) puts a text string into the kernel (by creating a file with that name), and another process running in a different encoding (for example Unicode encoded as UTF-8) retrieves that string (by doing a readdir() or opening the file), it gets back the original 8-bit bytes; but if it wrongly interprets them as a UTF-8 encoded string, the ISO 8859-1 characters come out as gibberish. This can cause bugs in the software, as your example shows.
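As a quick illustration of the effect (a sketch assuming bash and GNU ls; other tools may print '?' or a replacement character instead of an escape):

    # Create a file whose name contains the single byte 0xE9 (octal 351),
    # which is 'é' in ISO 8859-1 but an invalid byte sequence in UTF-8:
    touch "$(printf 'caf\351')"

    # A UTF-8 locale cannot decode that byte; recent GNU ls escapes it:
    LANG=en_US.UTF-8 ls caf*
    # -> 'caf'$'\351'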
This design flaw remains in all Unix-derived operating systems. To my knowledge, only one such system has solved it, and that's Mac OS X when using Apple's HFS+ file system: the file system itself enforces that all file names are Unicode, stored on disk as UTF-16 (in a decomposed normalization).
The real fix will have to be to read the source code of cpdup, or to find the author or maintainer of that software, and fix the bug: when dealing with file names, one cannot assume any specific encoding; they need to be transported as opaque binary blobs. The only bytes with clearly specified semantics in a file name are NUL (the zero byte that terminates the string) and '/' (the separator between directory and file names).
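For instance, a shell pipeline that has to hand file names from one program to another can avoid decoding them entirely by delimiting them with NUL, the one byte guaranteed never to occur inside a name (a sketch assuming GNU find and cpio; /src and /dest are placeholder paths):

    # Copy a tree while treating every file name as an opaque byte string;
    # no encoding conversion happens anywhere in the pipeline:
    cd /src && find . -print0 | cpio -0 -pdm /dest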
Here is a suggestion for a hack, which might temporarily get around the problem. For all processes involved, clear (meaning unset) all environment variables that begin with LC or LANG; in sh-derived shells (such as ksh or bash) that can be done with unset, and in csh-derived shells the equivalent is unsetenv (see the sketch below). Then set exactly one language variable, namely LANG=C. Do this not only for the local process that starts cpdup, but also on any remote machines that are involved via ssh, for example by putting it into the .profile or .cshrc startup file. It *might* solve the problem by preventing some string library from attempting a conversion.
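Spelled out, the hack might look like this (a sketch; the exact set of LC_* category names varies a little between systems, so check locale(1) on yours):

    # sh-derived shells (ksh, bash), e.g. in ~/.profile:
    unset LANGUAGE LC_ALL LC_CTYPE LC_COLLATE LC_MESSAGES \
          LC_MONETARY LC_NUMERIC LC_TIME
    export LANG=C

    # csh-derived shells, e.g. in ~/.cshrc:
    #   unsetenv LC_ALL
    #   unsetenv LC_CTYPE
    #   setenv LANG C

    # Verify: every category should now report "C" or "POSIX":
    locale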
Good luck!